UNIVERSITY OF CAPE TOWN


DEPARTMENT OF STATISTICAL SCIENCES
STA2020F: BUSINESS STATISTICS
Test 2 - Memo
Question 1 [50]
a)
i) $\hat{y} = [...] + [...]\,X_1 + [...]\,X_3$ (1)
ii) $R^2 = 0.7416$; this means that 74.16% of the variability in the response
variable (sales) can be explained by the explanatory variables ($X_1$, $X_2$, $X_3$).
(2)
iii) It is an indication of the strength of the linear relationship between $y$ and
$\hat{y}$ (i.e. the correlation coefficient calculated between $y$ and $\hat{y}$). (1)
iv) $H_0: \beta_1 = \beta_2 = \beta_3 = 0$ vs. $H_1$: at least one regression
coefficient is not 0.
$F \approx 38.27$ with a p-value = $[...] \times 10^{-12}$, which is less than any
reasonable level of significance. We therefore conclude that we have
sufficient evidence to say that at least one of the explanatory variables is
linearly related to the response variable, sales. The model is valid. (4)
v) None of the variables are significant, since the p-values are all greater than
0.05. (2)
vi) Multicollinearity; one would check whether the variables included in the
model are highly correlated (using a correlation matrix). (2)
vii) The local media variable ($X_2$) was removed. This variable was selected as
it had the largest p-value and this p-value exceeded $\alpha_{out} = [...]$. (3)
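
As a rough cross-check of the overall F-test in (iv), the F statistic can be recovered from $R^2$ alone. The sketch below assumes n = 44 observations (the sample size that appears in the degrees-of-freedom terms of part (b)) and k = 3 explanatory variables; the function name `overall_f` is illustrative only.

```python
def overall_f(r_squared: float, n: int, k: int) -> float:
    """Overall F statistic for a regression with k explanatory variables.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1))
    """
    df_regression = k          # one degree of freedom per explanatory variable
    df_error = n - k - 1       # observations minus estimated parameters
    return (r_squared / df_regression) / ((1.0 - r_squared) / df_error)


# R^2 = 0.7416 is taken from (ii); n = 44 and k = 3 are assumptions.
f_stat = overall_f(0.7416, n=44, k=3)
print(round(f_stat, 2))
```

A large value on (3, 40) degrees of freedom is consistent with the very small p-value reported in the memo.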

b) With respect to the model constructed using the forward selection method:
i) $\hat{y} = [...] + [...]\,X_2 + [...]\,X_3$

Reason: Need to consider all two-variable models. The variable, in addition
to the local media variable, that has the lowest p-value (which is lower than
$\alpha_{in} = [...]$) is national media. Adding the remaining variable results in
the full model, and the added variable is insignificant at 10%. (2)
ii) The local media variable ($X_2$), since it has the highest correlation with sales.
(2)
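iii)
A. $\sqrt{0.7098} = 0.8425$
B. $\text{adj } R^2 = 1 - \dfrac{44-1}{44-2} \cdot \dfrac{149.7371}{515.9451} = 0.7029$
C. $\sqrt{3.5651} = 1.8881$
D. $F = \dfrac{MSR}{MSE} = \dfrac{366.2080}{3.5651} = 102.72$
E. $n - p = 44 - 2 = 42$
F. $t = \dfrac{2.8315}{0.6989} = 4.0514$ (6)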
iv) The 95% CI:
$\hat{\beta}_j \pm t_{n-p,\,\alpha/2}\; s_{\hat{\beta}_j}$
$[...] \pm t_{44-3,\,0.05/2} \times [...]$
$[...] \pm [...] \times [...]$
$[...] \pm [...]$
$(-[...],\ [...])$
{allow for rounding and table value selected}
Since 0 is an element of this interval, we would conclude that the coefficient
is not significantly different from zero at the 5% level of significance.
(5)
v) The local media coefficient is 1.6577, which means that 1.6577 thousand
additional cases are sold for every additional R10000 spent on local media
advertising (all other variables remaining the same). (2)
vi) Yes it would; the national media variable would not have been included
in the model, as $[...] > \text{p-value} = [...] > [...] = \alpha_{in}$. (3)
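
The quantities A to F in (iii) and the interval in (iv) can be reproduced with a few lines of arithmetic. The sketch below uses only the sums of squares, sample size and ratio quoted in (iii). The coefficient and standard error passed to `conf_interval` are hypothetical placeholders, not the memo's values, and 2.0195 is an assumed table value for the two-sided 5% critical point of a t distribution with 41 degrees of freedom.

```python
import math

# Values quoted in (iii): SSE, SST, n = 44 observations, p = 2 parameters.
SSE = 149.7371
SST = 515.9451
n, p = 44, 2

SSR = SST - SSE                                      # regression sum of squares
r_squared = SSR / SST
multiple_r = math.sqrt(r_squared)                    # A
adj_r_squared = 1 - (n - 1) / (n - p) * SSE / SST    # B
MSE = SSE / (n - p)
std_error = math.sqrt(MSE)                           # C
f_stat = (SSR / (p - 1)) / MSE                       # D
df_error = n - p                                     # E
t_stat = 2.8315 / 0.6989                             # F (ratio quoted in the memo)


def conf_interval(estimate, std_err, t_crit):
    """Two-sided interval: estimate +/- t_crit * std_err."""
    half_width = t_crit * std_err
    return estimate - half_width, estimate + half_width


# Hypothetical coefficient (0.5) and standard error (0.4), for illustration only.
lower, upper = conf_interval(0.5, 0.4, t_crit=2.0195)
contains_zero = lower < 0 < upper   # True means "not significant at 5%"

print(round(multiple_r, 4), round(adj_r_squared, 4), round(std_error, 4))
print(round(f_stat, 2), df_error, round(t_stat, 4))
```

Small differences in the last decimal place can arise depending on whether intermediate values such as MSE are rounded before use.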

c) Comparing the forward and backward selection models:
i) The models differ: both models had two explanatory variables, of which
one was the national media variable, but the other variable differed.
Reason: The variable selection method differed. In forward selection the
variable that explains most of the unexplained variation in $y$ was included,
while in backward selection the variable that was most insignificant was
removed. This need not result in the same model (as can be seen here). (3)
ii) Backward selection model: (1) highest $R^2$, (2) highest adjusted $R^2$ and (3)
lowest MSE. It also has the highest $F$-test statistic value, corresponding to the
lowest p-value, and all the variables are significant at a lower level of
significance. {for any one of these reasons} (2)
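
The decision rules behind the two selection methods can be sketched as single steps. This is a minimal illustration, not the procedure used to produce the memo's output: the function names, the p-values and the thresholds 0.10 and 0.05 are all hypothetical, and in practice the p-values must be recomputed by refitting the model after every step.

```python
def backward_step(p_values, alpha_out):
    """Return the variable to remove from the current model, or None.

    p_values maps each variable in the model to its p-value.
    The least significant variable is dropped if its p-value exceeds alpha_out.
    """
    worst = max(p_values, key=p_values.get)
    return worst if p_values[worst] > alpha_out else None


def forward_step(p_values, alpha_in):
    """Return the candidate variable to add to the current model, or None.

    p_values maps each candidate to the p-value it would have if added.
    The most significant candidate is added if its p-value is below alpha_in.
    """
    best = min(p_values, key=p_values.get)
    return best if p_values[best] < alpha_in else None


# Hypothetical p-values in a full three-variable model.
full_model = {"X1": 0.12, "X2": 0.35, "X3": 0.08}
removed = backward_step(full_model, alpha_out=0.10)

# Hypothetical p-values of the remaining candidates once X2 is in the model.
candidates = {"X1": 0.40, "X3": 0.07}
added_at_10 = forward_step(candidates, alpha_in=0.10)
added_at_05 = forward_step(candidates, alpha_in=0.05)

print(removed, added_at_10, added_at_05)
```

With these illustrative numbers, tightening the entry threshold from 0.10 to 0.05 stops the forward procedure from adding a second variable, which is the kind of sensitivity discussed in b) vi).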

d) Regarding the regression assumptions:
i) $E[e_j] = 0$ for all $j$ (errors have a mean of 0), $E[e_j^2] = \sigma^2$ for all $j$
(errors have a fixed variance, i.e. homoscedasticity), $E[e_i e_j] = 0$ for all
$i \neq j$ (errors are independent) and $e_j \sim N(0, \sigma^2)$ for all $j$ (errors
are normally distributed). (4)
ii) Figure 1: a straight line through the origin
Figure 2: the residuals should be scattered randomly about the 0 value. (2)
iii) Figure 1: Concerns about normality as there is some deviation from a straight
line in the left tail.
Figure 2: Some concerns about heteroscedasticity as the variance is not
constant throughout (more variation in the middle than towards the ends).
Figure 2: Concerns about the residuals having a mean of 0 as most of the
residuals lie above the 0 line. 2 {mark any two of these} (4)
