Pristine www.edupristine.com
I. CHAID Analyses
Data Mining
Data mining is the nontrivial extraction of implicit, previously unknown, and
potentially useful information from data.
Goals differ, so data mining techniques vary accordingly. At a high level, data mining techniques can be classified as:
- Directed, or
- Undirected
Directed: the goal is to predict, estimate, classify, or characterize the behavior of some pre-identified target variable.
- Classification
- Estimation
- Prediction

Undirected: the goal is to discover structure in the data set as a whole.
- Description & Visualization
- Association Rule or Affinity Grouping
- Clustering
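[Figure: example CHAID tree. Root node: Response 20%, n = 1000. Split on marital status: Married 25% (n = 500), Divorced 15% (n = 200), Single 15% (n = 300). Married is split by gender (Male 40%, Female 15%); Divorced is split by pet ownership (Pet=no 40%, n = 50; Pet=yes 6.67%, n = 150).]
Segments in order of responsiveness:
1. Married Male
2. Divorced, Pet=no (sample size 50)
3. Married Female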
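A node's response rate is the sample-weighted average of its children's rates, which gives a quick consistency check on a tree like this one. The following is a minimal Python sketch (the deck itself works in R) that recomputes the root rate from the marital-status split and the Divorced rate from the pet split; the helper name `parent_rate` is hypothetical.

```python
def parent_rate(children):
    """Sample-weighted average of child-node rates.

    children: list of (rate, sample_size) pairs for the child nodes.
    """
    total = sum(n for _, n in children)
    responders = sum(rate * n for rate, n in children)
    return responders / total


# Marital-status split of the 1,000-customer root node: Married, Divorced, Single
marital = [(0.25, 500), (0.15, 200), (0.15, 300)]
# Pet split of the Divorced node: Pet=no, Pet=yes (6.67% is 10 responders out of 150)
divorced = [(0.40, 50), (10 / 150, 150)]

print(f"root: {parent_rate(marital):.1%}")
print(f"divorced: {parent_rate(divorced):.1%}")
```

If a child's rate and size do not reproduce the parent's rate, one of the numbers on the tree has been mis-transcribed.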
CHAID Algorithms
A CHAID tree is a decision tree constructed by repeatedly splitting subsets of the space into two or more child nodes, beginning with the entire data set.
To determine the best split at any node, CHAID merges any allowable pair of
categories of the predictor variable (the set of allowable pairs is determined by
the type of predictor variable being studied) if there is no statistically significant
difference within the pair with respect to the target variable.
The process is repeated until no non-significant pair is found.
The resulting set of categories of the predictor variable is the best split with
respect to that predictor variable.
This process is followed for all predictor variables.
The predictor whose best split is the most significant is selected, and the node is split on it.
The process repeats recursively until one of the stopping rules is triggered.
The significance of a split is tested with a chi-square test on the contingency table of predictor categories against the target variable.
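The category-merging step can be sketched for a binary target, where each pair of categories forms a 2×2 contingency table and the chi-square test has one degree of freedom. This is a simplified Python illustration, not the full CHAID algorithm: it omits the re-splitting check and the Bonferroni adjustment used when comparing predictors, and the names `chi2_pvalue_2x2`, `chaid_merge` and the default `alpha=0.05` are assumptions.

```python
import math
from itertools import combinations


def chi2_pvalue_2x2(a, b):
    """Pearson chi-square p-value comparing two categories on a binary target.

    a, b: (observations, events) tuples. A 2x2 table has 1 degree of freedom.
    """
    table = [[a[1], a[0] - a[1]], [b[1], b[0] - b[1]]]
    row_totals = [a[0], b[0]]
    col_totals = [a[1] + b[1], (a[0] - a[1]) + (b[0] - b[1])]
    grand_total = a[0] + b[0]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(stat / 2))


def chaid_merge(categories, alpha=0.05, ordinal=False):
    """Repeatedly merge the least significantly different pair of categories.

    categories: list of (label, (observations, events)) in their natural order.
    ordinal=True allows only adjacent pairs; otherwise any pair may merge.
    Returns a list of (labels_tuple, (observations, events)) groups.
    """
    groups = [((label,), counts) for label, counts in categories]
    while len(groups) > 1:
        if ordinal:
            pairs = [(i, i + 1) for i in range(len(groups) - 1)]
        else:
            pairs = list(combinations(range(len(groups)), 2))
        # The most similar pair is the one with the largest p-value
        i, j = max(pairs, key=lambda p: chi2_pvalue_2x2(groups[p[0]][1], groups[p[1]][1]))
        if chi2_pvalue_2x2(groups[i][1], groups[j][1]) <= alpha:
            break  # every remaining pair differs significantly
        (labels_i, (n_i, e_i)), (labels_j, (n_j, e_j)) = groups[i], groups[j]
        groups[i] = (labels_i + labels_j, (n_i + n_j, e_i + e_j))
        del groups[j]  # j > i, so index i is unaffected
    return groups


# Credit_History counts from the univariate tables: (observations, defaults)
credit_history = [("A30", (200, 125)), ("A31", (245, 140)), ("A32", (2650, 840)),
                  ("A33", (440, 140)), ("A34", (1465, 250))]
for labels, (n, events) in chaid_merge(credit_history):
    print("; ".join(labels), n, f"{events / n:.2%}")
```

With these counts the sketch first merges A32 with A33 (31.70% vs. 31.82%), then A30 with A31 (62.50% vs. 57.14%), and stops with three groups; the A30/A31 pairing is consistent with the "(A30; A31)" segment in the CHAID results.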
- Importing data in R
- Selecting the variables
- Running the analysis
- Interpreting the results
Status_Checking_Acc           # Observations   Default_On_Payment   Default Rate
A11                           1370             675                  49.27%
A12                           1345             520                  38.66%
A13                           315              70                   22.22%
A14                           1970             230                  11.68%

Duration_in_Months            # Observations   Default_On_Payment   Default Rate
12                            1795             380                  21.2%
24                            2055             610                  29.7%
36                            715              285                  39.9%
48                            355              180                  50.7%
60                            80               40                   50.0%
72                            5                5                    100.0%

Credit_History                # Observations   Default_On_Payment   Default Rate
A30                           200              125                  62.50%
A31                           245              140                  57.14%
A32                           2650             840                  31.70%
A33                           440              140                  31.82%
A34                           1465             250                  17.06%
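The Default Rate column is defaults divided by observations for each category, and pooling the Status_Checking_Acc rows gives the overall default rate (1,495 of 5,000 = 29.9%) that later slides use as the benchmark. A minimal Python sketch of this profiling step (the deck itself uses R; `default_rate_table` is a hypothetical helper):

```python
def default_rate_table(counts):
    """counts: dict category -> (observations, defaults).

    Returns per-category default rates and the overall (pooled) default rate.
    """
    rates = {cat: d / n for cat, (n, d) in counts.items()}
    total_obs = sum(n for n, _ in counts.values())
    total_defaults = sum(d for _, d in counts.values())
    return rates, total_defaults / total_obs


status_checking_acc = {"A11": (1370, 675), "A12": (1345, 520),
                       "A13": (315, 70), "A14": (1970, 230)}
rates, overall = default_rate_table(status_checking_acc)
for cat, rate in rates.items():
    flag = "above" if rate > overall else "below"
    print(f"{cat}: {rate:.2%} ({flag} overall {overall:.1%})")
```

Categories far from the overall rate (here A11 and A14) are the ones most likely to drive a split.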
Purposre_Credit_Taken         # Observations   Default_On_Payment   Default Rate
A40                           1170             445                  38.03%
A41                           515              85                   16.50%
A42                           905              290                  32.04%
A43                           1400             305                  21.79%
A44                           60               20                   33.33%
A45                           110              40                   36.36%
A46                           250              110                  44.00%
A48                           45               5                    11.11%
A49                           485              170                  35.05%
A410                          60               25                   41.67%

Credit_Amount                 # Observations   Default_On_Payment   Default Rate
4000                          3770             975                  25.86%
11000                         1085             420                  38.71%
12000                         145              100                  68.97%

Savings_Acc                   # Observations   Default_On_Payment   Default Rate
A61                           3015             1080                 35.82%
A62                           515              170                  33.01%
A63                           315              55                   17.46%
A64                           240              30                   12.50%
A65                           915              160                  17.49%
Years_At_Present_Employment   # Observations   Default_On_Payment   Default Rate
A71                           310              115                  37.10%
A72                           860              350                  40.70%
A73                           1695             515                  30.38%
A74                           870              195                  22.41%
A75                           1265             320                  25.30%

Inst_Rt_Income                # Observations   Default_On_Payment   Default Rate
1                             680              170                  25.00%
2                             1155             305                  26.41%
3                             785              225                  28.66%
4                             2380             795                  33.40%

Marital_Status_Gender         # Observations   Default_On_Payment   Default Rate
A91                           250              100                  40.00%
A92                           1550             540                  34.84%
A93                           2740             730                  26.64%
A94                           460              125                  27.17%

Other_Debtors_Guarantors      # Observations   Default_On_Payment   Default Rate
A101                          4535             1355                 29.88%
A102                          205              90                   43.90%
A103                          260              50                   19.23%
Current_Address_Yrs           # Observations   Default_On_Payment   Default Rate
1                             650              180                  27.69%
2                             1540             480                  31.17%
3                             745              215                  28.86%
4                             2065             620                  30.02%

Property                      # Observations   Default_On_Payment   Default Rate
A121                          1410             295                  20.92%
A122                          1160             355                  30.60%
A123                          1660             510                  30.72%
A124                          770              335                  43.51%

Age: refer to Analysis_of_Default.xlsx

Other_Inst_Plans              # Observations   Default_On_Payment   Default Rate
A141                          695              285                  41.01%
A142                          235              95                   40.43%
A143                          4070             1115                 27.40%

Housing                       # Observations   Default_On_Payment   Default Rate
A151                          895              350                  39.11%
A152                          3565             925                  25.95%
A153                          540              220                  40.74%
Num_CC                        # Observations   Default_On_Payment   Default Rate
1                             3165             995                  31.44%
2                             1665             460                  27.63%
3                             140              30                   21.43%
4                             30               10                   33.33%

Job                           # Observations   Default_On_Payment   Default Rate
A171                          110              35                   31.82%
A172                          1000             280                  28.00%
A173                          3150             925                  29.37%
A174                          740              255                  34.46%

Dependents                    # Observations   Default_On_Payment   Default Rate
1                             4225             1265                 29.94%
2                             775              230                  29.68%

Telephone                     # Observations   Default_On_Payment   Default Rate
A191                          2980             930                  31.21%
A192                          2020             565                  27.97%

Foreign_Worker                # Observations   Default_On_Payment   Default Rate
A201                          4815             1475                 30.63%
A202                          185              20                   10.81%
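The univariate tables can also be summarized the way CHAID scores a candidate split: a Pearson chi-square statistic comparing observed defaults and non-defaults with what the overall rate predicts. For a binary target, a table with k categories has k − 1 degrees of freedom, so statistics from tables of different sizes should be compared through p-values rather than raw values. A minimal Python sketch (`chi_square_kx2` is a hypothetical helper; the closed-form p-value applies only to 1 degree of freedom):

```python
import math


def chi_square_kx2(counts):
    """Pearson chi-square statistic for a k x 2 table (category vs. default / no default).

    counts: dict category -> (observations, defaults). Returns (statistic, degrees_of_freedom).
    """
    total_obs = sum(n for n, _ in counts.values())
    overall_rate = sum(d for _, d in counts.values()) / total_obs
    stat = 0.0
    for n, d in counts.values():
        # Compare observed defaults and non-defaults with the counts the overall rate predicts
        for observed, expected in ((d, n * overall_rate), (n - d, n * (1 - overall_rate))):
            stat += (observed - expected) ** 2 / expected
    return stat, len(counts) - 1


dependents = {"1": (4225, 1265), "2": (775, 230)}
foreign_worker = {"A201": (4815, 1475), "A202": (185, 20)}

for name, table in [("Dependents", dependents), ("Foreign_Worker", foreign_worker)]:
    stat, df = chi_square_kx2(table)
    line = f"{name}: chi-square = {stat:.2f}, df = {df}"
    if df == 1:
        # Closed-form survival function of chi-square with 1 degree of freedom
        line += f", p = {math.erfc(math.sqrt(stat / 2)):.3g}"
    print(line)
```

Dependents (29.94% vs. 29.68%) yields a statistic near zero, while Foreign_Worker (30.63% vs. 10.81%) yields a large one, matching what the default-rate columns suggest.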
sample = 235}
A12 - (A30; A31) {rate = 63.6%, sample = 165}
A11 - A32 {rate = 51.2%, sample = 800}
I. CART Analyses
CART Algorithms
- Importing data in R
- Selecting the variables
- Running the analysis
- Interpreting the results
Terminal nodes with a default rate greater than the overall average of 29.9%:
- Node 14: 32.7%
- Node 19: 51.2%
- Node 20: 75%
- Node 21: 38.4%

Terminal nodes with a default rate less than the overall average of 29.9%:
- Node 6: 25.8%
- Node 12: 20%
- Node 13: 26.9%
- Node 17: 6.5%
- Node 18: 12.3%
- Node 22: 25.7%
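Sorting the terminal nodes by default rate and dividing each by the 29.9% average turns the two lists into a ranked set of risk segments, in the same spirit as the "segments in order of responsiveness" in the CHAID example. A minimal Python sketch using the node rates listed; expressing each node as a lift (node rate divided by overall rate) is a common convention, not something the slides state.

```python
OVERALL_DEFAULT_RATE = 0.299  # overall average default rate quoted on the slides

# Terminal-node default rates as listed for the CART tree
terminal_nodes = {6: 0.258, 12: 0.20, 13: 0.269, 14: 0.327, 17: 0.065,
                  18: 0.123, 19: 0.512, 20: 0.75, 21: 0.384, 22: 0.257}

above = sorted(node for node, rate in terminal_nodes.items() if rate > OVERALL_DEFAULT_RATE)
below = sorted(node for node, rate in terminal_nodes.items() if rate < OVERALL_DEFAULT_RATE)

# Rank nodes from riskiest to safest and report lift relative to the overall rate
ranked = sorted(terminal_nodes, key=terminal_nodes.get, reverse=True)
for node in ranked:
    rate = terminal_nodes[node]
    print(f"Node {node}: {rate:.1%}, lift {rate / OVERALL_DEFAULT_RATE:.2f}")
```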