Professional Documents
Culture Documents
Introduction
Since the seminal paper by Box and Cox(1964), the Box-Cox type
of power transformations have generated a great deal of interests,
both in theoretical work and in practical applications. In this
presentation, I intend to go over the following topics:
The following are Q-Q Normal plots for a random sample of size
500 from Exp(1000) distribution.
5000
5
Sample Quantiles
Sample Quantiles
4
3000
3
0 1000
2
3 2 1 0 1 2 3 3 2 1 0 1 2 3
Exp(1000) Weibull(5,1000)
20
10
Sample Quantiles
Sample Quantiles
15
8
6
10
4
5
2
3 2 1 0 1 2 3 3 2 1 0 1 2 3
Weibull(3.64,1000) Weibull(2.86,1000)
Since the work of Box and Cox(1964), there have been many
modifications proposed.
Manly(1971) proposed the following exponential transformation:
ey 1 , if 6= 0;
y() =
y, if = 0.
where
1, if y 0;
Sign(y) =
1, if y < 0.
() = (X X) Xy(),
y() (In G)y()
2 () = ,
n
where G = ppo(X) = X(X X) X .
Substitute () and 2 () into the likelihood equation, and note
that for the original form of the Box-Cox transformation,
Qn
J(, y) = i=1 yi1 , we could obtain the profile log likelihood(i.e.,
the likelihood function maximized over (, 2 )) for alone.
n
lP () = C log(s2 ),
2
where s2 is the residual sum of squares divided by n from fitting
the linear model y(, g) N(X, 2 In ). So to maximize the profile
log-likelihood, we only need to find a that minimizes
2 y(,g) (In G)y(,g)
s = n .
Without any further effort and just use the standard likelihood
methods, we could easily give a likelihood ratio test. For test
H0 : = 0 , the test statistic is W = 2[lP () lP (0 )].
Asymptotically W is distributed as 21 . Carefully note that W is a
function of both the data (through ) and 0 .
A large sample CI for is easily obtainable by inverting the
likelihood ratio test. Let be the MLE of , then an approximate
(1 )100% CI for is
SSE()
{ | n log( ) 21 (1 )},
SSE()
where SSE() = y(, g) (In G)y(, g). The accuracy of the
approximation is given by the following fact:
12
P (W 21 (1 )) = 1 + O(n ).
Its also not hard to derive a test using Raos score statistic.
Atkinson(1973) first proposed a score-type statistic for test
H0 : = 0 , although the derivation were not based on likelihood
theory. Lawrence(1987) modified the result by Atkinson(1973), by
employing the standard likelihood theory.
1
Box and Cox(1964) chose b = J(, y) n = g 1 , which they
admitted that such choice was somewhat arbitrary. This gives
the Box-Cox version of the prior distribution.
Pericchi(1981) followed exactly the same argument, with the
exception that the use of Jefferys prior for (, ) instead of
invariant non-informative prior.
Clearly the Box and Coxs prior is outcome-dependent, which
seems to be an undesirable property.
Its not hard to see that the posterior distribution for Box and
Cox prior is
1 b A A( )
S + ( ) b
(1)(nr)
1 (, , |y, A) n+1 exp( 2
)g (),
2
where S = (y() A) (y() A).
The posterior distribution for Pericchis prior is
1 b A A( )
S + ( ) b
(1)n
2 (, , |y, A) n+r+1 exp( 2
)g ().
2
613.0
95%
613.5
614.0
614.5
0.5 1.0 1.5 2.0 2.5
Pengfei Li Apr 11,2005
Box-Cox Transformation: An Overview
4
3
log of fitted slope
2
1
lambda
5
log of residual standard deviation
4
3
2
1
lambda
0.38
ratio of fitted slope to residual standard deviation
0.36
0.34
0.32
0.30
lambda
4
1000 2000
33 33
Standardized residuals
3
18 18
2
Residuals
1
0
0
1000
2 1
64
64
0.5
33
77
18
Standardized residuals
0.4
1.5
64
Cooks distance
0.3
1.0
17
0.2
0.5
0.1
64
0.0
0.0
11.5
11.0
10.5
fitted slope
10.0
9.5
9.0
lambda
30.1
30.0
residual standard deviation
29.9
29.8
29.7
29.6
lambda
0.38
ratio of fitted slope to residual standard deviation
0.36
0.34
0.32
0.30
lambda
4
1000 2000
33 33
Standardized residuals
3
18 18
2
Residuals
1
0
0
1000
2 1
64
64
0.5
33
77
18
Standardized residuals
0.4
1.5
64
Cooks distance
0.3
1.0
17
0.2
0.5
0.1
64
0.0
0.0
However, such scaling has not much effect other than producing
comparable estimates. The test statistic remain the same as if
unscaled. Thus it may not be useful for ANOVA model.