
Appendix A1

Review of Linear Algebra


Matrix and vector
An $m \times n$ matrix A is an array of m rows and n columns of elements:

$$A = [a_{ij}]_{m\times n} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$

where $a_{ij}$ denotes the element in the ith row and jth column of A, often referred
to as the (i,j)-element of A.

The elements in a matrix can be anything of interest (numbers, functions or
symbols), but we will focus on matrices with real numbers as elements.

A vector of dimension n is an $n \times 1$ matrix.

The transpose of an $m \times n$ matrix A, denoted by $A^T$, is the $n \times m$ matrix with $a_{ij}$
at its jth row and ith column:

$$A^T = [a_{ji}]_{n\times m} = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{pmatrix}$$

In particular, a vector v of dimension n can be written as

$$v = (v_1\; v_2\; \cdots\; v_n)^T \quad\text{or}\quad v^T = (v_1\; v_2\; \cdots\; v_n)$$

A matrix A is said to be symmetric if $A^T = A$, or equivalently, $a_{ij} = a_{ji}$ for all
$i, j = 1, \ldots, n = m$ (it must be a square matrix).

The scalar multiple of a matrix: $\lambda A = [\lambda a_{ij}]_{m\times n}$.

The sum of matrices: $A + B = [a_{ij} + b_{ij}]_{m\times n}$.

The product of $A = [a_{ij}]_{m\times n}$ and $B = [b_{ij}]_{n\times l}$:

$$AB = C = [c_{ij}]_{m\times l} \quad\text{with}\quad c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}$$

Note that $AB \neq BA$ in general.

In particular, for $v = (v_1\; v_2\; \cdots\; v_n)^T$ and $u = (u_1\; u_2\; \cdots\; u_n)^T$,

$$u^T v = v^T u = \sum_{i=1}^{n} u_i v_i = u_1v_1 + u_2v_2 + \cdots + u_nv_n$$

An $n \times n$ matrix I is called an identity matrix if it has the form

$$I = \operatorname{diag}(1, 1, \ldots, 1) = \begin{pmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{pmatrix}$$

It satisfies $AI = IA = A$ for any $n \times n$ matrix A.
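A minimal numerical sketch of these operations in NumPy (the matrices below are arbitrary examples chosen for illustration, not taken from the notes):

```python
import numpy as np

# Two arbitrary example matrices (assumptions for illustration only).
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

print(A.T)            # transpose A^T
print(2.5 * A)        # scalar multiple
print(A + B)          # matrix sum
print(A @ B)          # matrix product AB
print(B @ A)          # BA, which differs from AB here
print(np.allclose(A @ B, B @ A))   # False: AB != BA in general

I = np.eye(2)         # 2 x 2 identity matrix
print(np.allclose(A @ I, A) and np.allclose(I @ A, A))  # True: AI = IA = A
```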

Quadratic form

For any $n \times n$ matrix $A = [a_{ij}]$ and vector $x = (x_1\; x_2\; \cdots\; x_n)^T$, the product

$$x^T A x = (x_1\; x_2\; \cdots\; x_n)
\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
= \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} x_i x_j$$

is called a quadratic form. In particular,

$$(x\; y)\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}
= a_{11}x^2 + (a_{12} + a_{21})xy + a_{22}y^2$$

An $n \times n$ symmetric matrix A is said to be:

- positive definite if $x^T A x > 0$ for any $n \times 1$ vector $x \neq 0$;
- positive semi-definite if $x^T A x \geq 0$ for any $n \times 1$ vector $x$.
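As a quick sketch (with an arbitrary symmetric example matrix assumed for illustration), the quadratic form can be evaluated either as $x^TAx$ or via the double sum:

```python
import numpy as np

# Arbitrary 2 x 2 symmetric example matrix (an assumption for illustration).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
x = np.array([1.0, -2.0])

q1 = x @ A @ x                                   # x^T A x
q2 = sum(A[i, j] * x[i] * x[j]                   # double-sum form
         for i in range(2) for j in range(2))
print(q1, q2)   # both equal a11*x1^2 + (a12 + a21)*x1*x2 + a22*x2^2 = 10.0
```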

Determinant

For a $2 \times 2$ matrix $A = [a_{ij}]$, its determinant is denoted and defined by

$$|A| = a_{11}a_{22} - a_{12}a_{21}$$

For a $3 \times 3$ matrix $A = [a_{ij}]$, the minor $M_{ij}$ of $a_{ij}$ is the determinant of the
$2 \times 2$ matrix obtained by deleting row i and column j from A.

The co-factor $A_{ij}$ of $a_{ij}$ is defined by $A_{ij} = (-1)^{i+j} M_{ij}$.

The determinant of a $3 \times 3$ matrix A can be calculated by

$$|A| = a_{i1}A_{i1} + a_{i2}A_{i2} + a_{i3}A_{i3} \quad\text{(expansion by row i)}$$

or

$$|A| = a_{1j}A_{1j} + a_{2j}A_{2j} + a_{3j}A_{3j} \quad\text{(expansion by column j)}$$

for any row i = 1, 2, 3 or any column j = 1, 2, 3.

Similarly, we can calculate $|A|$ for an $n \times n$ matrix A with n = 4, 5, ….

For example,

$$|A| = \begin{vmatrix} 2 & 4 & 5 \\ 0 & 3 & -2 \\ -3 & 6 & 8 \end{vmatrix}
= 2\begin{vmatrix} 3 & -2 \\ 6 & 8 \end{vmatrix}
- 4\begin{vmatrix} 0 & -2 \\ -3 & 8 \end{vmatrix}
+ 5\begin{vmatrix} 0 & 3 \\ -3 & 6 \end{vmatrix}
= 2[24 - (-12)] - 4[0 - 6] + 5[0 - (-9)] = 141$$

(expansion by row 1); or

$$|A| = 2\begin{vmatrix} 3 & -2 \\ 6 & 8 \end{vmatrix}
- 0\begin{vmatrix} 4 & 5 \\ 6 & 8 \end{vmatrix}
+ (-3)\begin{vmatrix} 4 & 5 \\ 3 & -2 \end{vmatrix}
= 2[24 - (-12)] + (-3)[-8 - 15] = 141$$

(expansion by column 1).
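A short NumPy check of the same calculation (using the matrix as reconstructed above):

```python
import numpy as np

# Matrix from the determinant example above.
A = np.array([[ 2.0, 4.0,  5.0],
              [ 0.0, 3.0, -2.0],
              [-3.0, 6.0,  8.0]])

print(np.linalg.det(A))          # about 141.0

def minor(M, i, j):
    """Determinant of M with row i and column j deleted."""
    sub = np.delete(np.delete(M, i, axis=0), j, axis=1)
    return np.linalg.det(sub)

# Cofactor expansion along row 1 (index 0).
det_row1 = sum((-1) ** (0 + j) * A[0, j] * minor(A, 0, j) for j in range(3))
print(det_row1)                  # also about 141.0
```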


In particular, for a triangular matrix the determinant is the product of the
diagonal elements:

$$\begin{vmatrix} a_{11} & & & 0 \\ a_{21} & a_{22} & & \\ \vdots & \vdots & \ddots & \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}
= a_{11}a_{22}\cdots a_{nn}$$

Linear independence of vectors

Vectors $v_1, v_2, \ldots, v_k$ are said to be linearly independent if

$$\alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_k v_k = 0$$

holds only for scalars $\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$.

Vectors $v_1, v_2, \ldots, v_k$ are said to be linearly dependent if

$$\alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_k v_k = 0$$

holds for at least one $\alpha_i \neq 0$.

Equivalently, $v_1, v_2, \ldots, v_k$ are linearly dependent if there exists a $v_i$ such that,
for some scalars $\beta_j$, $j \neq i$,

$$v_i = \sum_{j \neq i}\beta_j v_j = \sum_{j \neq i}\left(-\frac{\alpha_j}{\alpha_i}\right)v_j \qquad (\alpha_i \neq 0)$$

That is, one of $v_1, v_2, \ldots, v_k$ is a linear combination of the others.

Rank of matrix

For an $m \times n$ matrix A, the maximum number of linearly independent rows
(columns) is called the row (column) rank of A.

It can be proven that the row and column ranks of A are always equal. This
common value is defined to be the rank of A, denoted by Rank(A).

Obviously, $\operatorname{Rank}(A) \leq \min(m, n)$ for an $m \times n$ matrix A.

A row operation on a matrix is one of: (i) multiply a row by a non-zero scalar;
(ii) interchange two rows; and (iii) add a scalar multiple of one row to
another.

Rank(A) can be found by performing row operations on matrix A until
reaching an upper triangular matrix. The number of non-zero rows in this
upper triangular matrix is equal to Rank(A).

Example A1.1.
$$A = \begin{pmatrix} 1 & 3 & 2 & 0 \\ 2 & 8 & 8 & 2 \\ 3 & 6 & 9 & 0 \\ 4 & 10 & 1 & -3 \end{pmatrix}
\;\xrightarrow{\substack{R2 - 2R1 \\ R3 - 3R1 \\ R4 - 4R1}}\;
\begin{pmatrix} 1 & 3 & 2 & 0 \\ 0 & 2 & 4 & 2 \\ 0 & -3 & 3 & 0 \\ 0 & -2 & -7 & -3 \end{pmatrix}$$

$$\;\xrightarrow{\substack{R3 + 1.5R2 \\ R4 + R2}}\;
\begin{pmatrix} 1 & 3 & 2 & 0 \\ 0 & 2 & 4 & 2 \\ 0 & 0 & 9 & 3 \\ 0 & 0 & -3 & -1 \end{pmatrix}
\;\xrightarrow{R4 + \frac{1}{3}R3}\;
\begin{pmatrix} 1 & 3 & 2 & 0 \\ 0 & 2 & 4 & 2 \\ 0 & 0 & 9 & 3 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

Thus Rank(A) = 3.
Note: A need not be square for its rank to be determined in this way.
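A brief numerical check of Example A1.1 (using the matrix as reconstructed above):

```python
import numpy as np

# Matrix of Example A1.1 as reconstructed above.
A = np.array([[1.0,  3.0, 2.0,  0.0],
              [2.0,  8.0, 8.0,  2.0],
              [3.0,  6.0, 9.0,  0.0],
              [4.0, 10.0, 1.0, -3.0]])

print(np.linalg.matrix_rank(A))   # 3
```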

Inverse matrix

For an $n \times n$ matrix A, its inverse matrix, denoted by $A^{-1}$, is an $n \times n$ matrix
(if it exists) such that $AA^{-1} = A^{-1}A = I$. If $A^{-1}$ exists, A is said to be invertible.

An $n \times n$ matrix A is invertible if and only if $|A| \neq 0$, which is equivalent to
Rank(A) = n; A is said to have full rank in this case.

The inverse $A^{-1}$ of a square matrix A, if it exists, can be found by row
operations as follows:

$$(A \mid I) \;\xrightarrow{\text{row operations}}\; (I \mid B) \qquad\text{(A1.1)}$$

Then $A^{-1} = B$.

An $n \times n$ matrix A is said to be orthogonal if $AA^T = A^TA = I$, or $A^T = A^{-1}$.

$A = (a_1\; \cdots\; a_n)$ is orthogonal if and only if $a_1, \ldots, a_n$ are orthonormal vectors
in the sense that $a_i^T a_i = 1$ and $a_i^T a_j = 0$ for $i \neq j$, $1 \leq i, j \leq n$.

Example A1.2.
$$A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 3 & 2 & -1 \end{pmatrix}, \qquad
(A \mid I) = \left(\begin{array}{ccc|ccc} 0 & 1 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 \\ 3 & 2 & -1 & 0 & 0 & 1 \end{array}\right)$$

$$\xrightarrow{R1 \leftrightarrow R2}
\left(\begin{array}{ccc|ccc} 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 \\ 3 & 2 & -1 & 0 & 0 & 1 \end{array}\right)
\xrightarrow{R3 - 3R1}
\left(\begin{array}{ccc|ccc} 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 \\ 0 & 2 & -4 & 0 & -3 & 1 \end{array}\right)$$

$$\xrightarrow{R3 - 2R2}
\left(\begin{array}{ccc|ccc} 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 \\ 0 & 0 & -6 & -2 & -3 & 1 \end{array}\right)
\xrightarrow{R3 \div (-6)}
\left(\begin{array}{ccc|ccc} 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1/3 & 1/2 & -1/6 \end{array}\right)$$

$$\xrightarrow{\substack{R1 - R3 \\ R2 - R3}}
\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -1/3 & 1/2 & 1/6 \\ 0 & 1 & 0 & 2/3 & -1/2 & 1/6 \\ 0 & 0 & 1 & 1/3 & 1/2 & -1/6 \end{array}\right)$$

$$A^{-1} = \begin{pmatrix} -1/3 & 1/2 & 1/6 \\ 2/3 & -1/2 & 1/6 \\ 1/3 & 1/2 & -1/6 \end{pmatrix}
= \frac{1}{6}\begin{pmatrix} -2 & 3 & 1 \\ 4 & -3 & 1 \\ 2 & 3 & -1 \end{pmatrix}$$
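A quick verification of Example A1.2 with NumPy (using the matrix as reconstructed above):

```python
import numpy as np

# Matrix of Example A1.2 as reconstructed above.
A = np.array([[0.0, 1.0,  1.0],
              [1.0, 0.0,  1.0],
              [3.0, 2.0, -1.0]])

A_inv = np.linalg.inv(A)
print(A_inv * 6)                              # 6*A^{-1}: the integer matrix above
print(np.allclose(A @ A_inv, np.eye(3)))      # True: A A^{-1} = I
```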

Alternatively, $A^{-1}$ of $A = [a_{ij}]_{n\times n}$ can be determined by

$$A^{-1} = \frac{1}{|A|}\,[A_{ij}]^T \qquad\text{(A1.2)}$$

where $A_{ij}$ is the co-factor of $a_{ij}$.

For example, the inverse of a $2 \times 2$ matrix $A = [a_{ij}]$ is given by

$$A^{-1} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}^{-1}
= \frac{1}{|A|}\begin{pmatrix} A_{11} & A_{21} \\ A_{12} & A_{22} \end{pmatrix}
= \frac{1}{a_{11}a_{22} - a_{12}a_{21}}\begin{pmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{pmatrix}$$

provided $|A| = a_{11}a_{22} - a_{12}a_{21} \neq 0$.

For n > 2, method (A1.1) is usually more efficient than (A1.2).

Linear equations

A system of linear equations can be expressed in matrix form as

$$Ax = b \qquad\text{(A1.3)}$$

where A is an $m \times n$ matrix, x is an $n \times 1$ vector and b is an $m \times 1$ vector, with x
as the unknown to be solved, while b is given.

(A1.3) consists of m equations for the n unknowns in $x = (x_1\; \cdots\; x_n)^T$.

The solutions to (A1.3) have three possibilities:

- a unique solution;
- no solution;
- infinitely many solutions.

If $m < n$, a unique solution is impossible. We will focus on the case with
$m = n$, i.e., A is a square matrix.

Assume A to be an $n \times n$ square matrix from now on.

$Ax = b$ has a unique solution $x = A^{-1}b$ if and only if A is invertible, or
equivalently, $|A| \neq 0$ or A has full rank n.

In particular, the homogeneous equation $Ax = 0$ has the unique solution
$x = 0$ if and only if A is invertible.

If $|A| = 0$ (hence $\operatorname{Rank}(A) < n$), the homogeneous equation $Ax = 0$ must have
infinitely many solutions.

If $\operatorname{Rank}(A) = k < n$, then $n - k$ of the elements $x_1, x_2, \ldots, x_n$ in the solution to
$Ax = 0$ can be taken free. Thus the set of all solutions to $Ax = 0$ forms a linear
space of dimension $n - k$.

For the non-homogeneous equation $Ax = b$ with $b \neq 0$ and $|A| = 0$:

- there is no solution if $\operatorname{Rank}(A) < \operatorname{Rank}(A \mid b)$;
- there are infinitely many solutions if $\operatorname{Rank}(A) = \operatorname{Rank}(A \mid b)$.
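A small NumPy sketch of these cases (the matrices below are arbitrary illustrations, not examples from the notes):

```python
import numpy as np

# Invertible case: unique solution x = A^{-1} b.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)
print(x, np.allclose(A @ x, b))     # unique solution with A x = b

# Singular case: |A2| = 0, so solvability depends on Rank(A2) vs Rank([A2 | b]).
A2 = np.array([[1.0, 2.0],
               [2.0, 4.0]])
b_no  = np.array([1.0, 0.0])        # Rank([A2 | b]) > Rank(A2): no solution
b_inf = np.array([1.0, 2.0])        # ranks equal: infinitely many solutions
for b2 in (b_no, b_inf):
    r_A  = np.linalg.matrix_rank(A2)
    r_Ab = np.linalg.matrix_rank(np.column_stack([A2, b2]))
    print(r_A, r_Ab, "no solution" if r_Ab > r_A else "infinitely many")
```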

Eigenvalues and eigenvectors

Given an $n \times n$ matrix A, if $Av = \lambda v$ for some scalar $\lambda$ and vector $v \neq 0$, then
$\lambda$ is called an eigenvalue of A and v is an eigenvector corresponding to $\lambda$.

An eigenvalue $\lambda$ of A and its corresponding eigenvector v must satisfy

$$(A - \lambda I)v = 0 \qquad\text{(A1.4)}$$

This can be viewed as a homogeneous equation with unknown v.

Equation (A1.4) has a non-zero solution if and only if

$$|A - \lambda I| = 0 \qquad\text{(A1.5)}$$

Since an eigenvector satisfies $v \neq 0$, any eigenvalue $\lambda$ of A must satisfy equation (A1.5).

For an $n \times n$ matrix A, $|A - \lambda I|$ is a polynomial of $\lambda$ with degree n.
Hence equation (A1.5) has n roots, labeled as $\lambda_1, \lambda_2, \ldots, \lambda_n$ (they need not be
distinct), which give all eigenvalues of A.

For each eigenvalue solved from (A1.5), there is at least one eigenvector.

If $v_1, v_2, \ldots, v_k$ are eigenvectors corresponding to distinct eigenvalues
$\lambda_1, \lambda_2, \ldots, \lambda_k$ ($k \leq n$), then they are linearly independent.

This can be seen as follows. If $v_1, v_2, \ldots, v_k$ are linearly dependent, then one of
them can be expressed as a linear combination of other linearly independent
vectors from $v_1, v_2, \ldots, v_k$. Without loss of generality, let

$$v_1 = \alpha_2 v_2 + \cdots + \alpha_m v_m \neq 0 \quad (2 \leq m \leq k) \qquad\text{(A1.6)}$$

Since $Av_i = \lambda_i v_i$, $i = 1, 2, \ldots, n$, $\lambda_1 v_1 = Av_1$ and (A1.6) imply

$$\lambda_1(\alpha_2 v_2 + \cdots + \alpha_m v_m) = A(\alpha_2 v_2 + \cdots + \alpha_m v_m) = \alpha_2\lambda_2 v_2 + \cdots + \alpha_m\lambda_m v_m$$

$$\Rightarrow\quad \alpha_2(\lambda_1 - \lambda_2)v_2 + \cdots + \alpha_m(\lambda_1 - \lambda_m)v_m = 0$$

$$\Rightarrow\quad \alpha_2(\lambda_1 - \lambda_2) = \cdots = \alpha_m(\lambda_1 - \lambda_m) = 0$$

(since $v_2, \ldots, v_m$ are linearly independent).

Thus $\alpha_2 = \cdots = \alpha_m = 0$ as $\lambda_1, \lambda_2, \ldots, \lambda_k$ are distinct. This contradicts (A1.6)
and hence $v_1, v_2, \ldots, v_k$ must be linearly independent.

If the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ of A are all distinct, then A has n linearly
independent eigenvectors $v_1, v_2, \ldots, v_n$. If not, it is possible (but not necessary)
that A has fewer than n linearly independent eigenvectors.

If $\lambda$ is a single root of equation (A1.5), there is only one linearly independent
eigenvector v corresponding to $\lambda$.

If m is the largest number such that $(\lambda - \lambda_1)^m$ is a factor of $|A - \lambda I|$, then $\lambda_1$ is
said to be an eigenvalue of multiplicity m.

For an eigenvalue $\lambda_1$ of multiplicity m:

- There are $k \leq m$ linearly independent eigenvectors $v_1, v_2, \ldots, v_k$;
- Any other eigenvector must be a linear combination $c_1v_1 + \cdots + c_kv_k$ of
  $v_1, v_2, \ldots, v_k$, where $c_1, \ldots, c_k$ are scalars;
- There exist m linearly independent vectors $v_1, v_2, \ldots, v_m$, each satisfying

$$(A - \lambda_1 I)^l v = 0 \quad\text{for some } l = 1, 2, \ldots, m. \qquad\text{(A1.7)}$$

Similar transform

Let A be a square matrix with an eigenvalue $\lambda$ and its corresponding
eigenvector v. If V is an invertible matrix, then

$$(V^{-1}AV)(V^{-1}v) = V^{-1}Av = V^{-1}(\lambda v) = \lambda(V^{-1}v) \qquad\text{(A1.8)}$$

Since V is invertible and $v \neq 0$, we must have $V^{-1}v \neq 0$. Hence (A1.8) shows
that $\lambda$ is also an eigenvalue of $V^{-1}AV$; in other words, A and $V^{-1}AV$ have
the same eigenvalues.

$V^{-1}AV$ is called a similar transform of A. Therefore we have shown that
eigenvalues are invariant under a similar transform.

If an $n \times n$ matrix A has n linearly independent eigenvectors $v_1, v_2, \ldots, v_n$, then
$V = (v_1\; v_2\; \cdots\; v_n)$ has an inverse $V^{-1}$.

Since $Av_j = \lambda_j v_j$, $j = 1, \ldots, n$,

$$AV = A(v_1\; v_2\; \cdots\; v_n) = (Av_1\; Av_2\; \cdots\; Av_n) = (\lambda_1 v_1\; \lambda_2 v_2\; \cdots\; \lambda_n v_n)$$

Let $v_j = (v_{1j}\; v_{2j}\; \cdots\; v_{nj})^T$, $j = 1, \ldots, n$, and $D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$. Then

$$AV = (\lambda_1 v_1\; \lambda_2 v_2\; \cdots\; \lambda_n v_n)
= \begin{pmatrix} \lambda_1 v_{11} & \lambda_2 v_{12} & \cdots & \lambda_n v_{1n} \\ \lambda_1 v_{21} & \lambda_2 v_{22} & \cdots & \lambda_n v_{2n} \\ \vdots & \vdots & & \vdots \\ \lambda_1 v_{n1} & \lambda_2 v_{n2} & \cdots & \lambda_n v_{nn} \end{pmatrix}
= \begin{pmatrix} v_{11} & v_{12} & \cdots & v_{1n} \\ v_{21} & v_{22} & \cdots & v_{2n} \\ \vdots & \vdots & & \vdots \\ v_{n1} & v_{n2} & \cdots & v_{nn} \end{pmatrix}
\begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \lambda_n \end{pmatrix}
= VD \qquad\text{(A1.9)}$$

Thus $V^{-1}AV = D$ is a diagonal matrix. That is, A can be diagonalised by a
similar transform.

This is possible even if the eigenvalues of A are not all distinct.

Example A1.3. The matrix


$$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 1 & 2 \end{pmatrix}$$

has one double eigenvalue $\lambda_1 = \lambda_2 = 1$ and one single eigenvalue $\lambda_3 = 2$.

It is easy to find three linearly independent eigenvectors $v_1, v_2, v_3$ to form the
matrix $V = (v_1\; v_2\; v_3)$ to diagonalise A:

$$(A - \lambda_1 I)(v_1\; v_2) = (A - I)(v_1\; v_2)
= \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{pmatrix}(v_1\; v_2) = 0
\;\Rightarrow\; v_1 = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix},\; v_2 = \begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix}, \quad\text{and}$$

$$(A - \lambda_3 I)v_3 = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 1 & 1 & 0 \end{pmatrix}v_3 = 0
\;\Rightarrow\; v_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}
\;\Rightarrow\; V = (v_1\; v_2\; v_3) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & -1 & 1 \end{pmatrix}$$

It can be easily checked that $V^{-1}AV = \operatorname{diag}(\lambda_1, \lambda_2, \lambda_3) = \operatorname{diag}(1, 1, 2)$.
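A short NumPy check of Example A1.3 (with A and V as reconstructed above):

```python
import numpy as np

# Matrix and eigenvector matrix of Example A1.3 as reconstructed above.
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 2.0]])
V = np.array([[ 1.0,  0.0, 0.0],
              [ 0.0,  1.0, 0.0],
              [-1.0, -1.0, 1.0]])

D = np.linalg.inv(V) @ A @ V
print(np.round(D, 10))                 # diag(1, 1, 2)

# np.linalg.eig returns the same eigenvalues (possibly in another order,
# with differently scaled eigenvectors).
print(np.linalg.eig(A)[0])
```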

Triangularisation

An $n \times n$ matrix A with fewer than n linearly independent eigenvectors cannot
be diagonalised. However, it can be triangularised by a similar transform:

$$V^{-1}AV = U, \text{ an upper triangular matrix} \qquad\text{(A1.10)}$$

The transform matrix V in (A1.10) can be obtained by solving the linear
equations in (A1.7):

$$(A - \lambda_i I)^l v = 0, \qquad l = 1, 2, \ldots, m_i \qquad\text{(A1.11)}$$

for each eigenvalue $\lambda_i$ of A with multiplicity $m_i$, i = 1, 2, …, k, where
$m_1 + \cdots + m_k = n$.

Equations (A1.11) can give n linearly independent vectors $v_1, v_2, \ldots, v_n$, and
then $V = (v_1\; v_2\; \cdots\; v_n)$ satisfies (A1.10).

To see why, look at the case with m = 2 for example. Let $\lambda$ be an eigenvalue
of multiplicity 2 for a $2 \times 2$ matrix A.

If $(A - \lambda I)v = 0$ has only one linearly independent solution $v_1$, then any other
eigenvector of A has the form $cv_1$ for some scalar $c \neq 0$.

Let $v_2$ be a solution to $(A - \lambda I)^2 v = 0$, linearly independent of $v_1$. Then
$(A - \lambda I)v_2$ is an eigenvector, which implies $(A - \lambda I)v_2 = cv_1$ for some $c \neq 0$,
so that $Av_2 = cv_1 + \lambda v_2$. Thus

$$AV = A(v_1\; v_2) = (Av_1\; Av_2) = (\lambda v_1 \;\; cv_1 + \lambda v_2) = \lambda V + (0 \;\; cv_1) \qquad\text{(A1.12)}$$

Since $(V^{-1}v_1\; V^{-1}v_2) = V^{-1}(v_1\; v_2) = V^{-1}V = I$, we have
$cV^{-1}v_1 = c\binom{1}{0} = \binom{c}{0} \neq 0$, and it follows from (A1.12) that $V^{-1}AV$ is upper triangular:

$$V^{-1}AV = V^{-1}\bigl[\lambda V + (0 \;\; cv_1)\bigr] = \lambda I + (0 \;\; cV^{-1}v_1)
= \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix} + \begin{pmatrix} 0 & c \\ 0 & 0 \end{pmatrix}
= \begin{pmatrix} \lambda & c \\ 0 & \lambda \end{pmatrix}$$

Orthogonal transform

By the spectral theorem in matrix theory, any $n \times n$ symmetric real matrix A
has real eigenvalues $\lambda_1, \ldots, \lambda_n$ and there is an orthogonal matrix V such that

$$V^TAV = V^{-1}AV = \operatorname{diag}(\lambda_1, \ldots, \lambda_n) \qquad\text{(A1.13)}$$

which is referred to as an orthogonal transform of A.

To obtain this orthogonal matrix V, we can first find linearly independent
eigenvectors $u_1, \ldots, u_n$ of A corresponding to $\lambda_1, \ldots, \lambda_n$ (which exist by the
spectral theorem). Then $u_i$ and $u_j$ are orthogonal (i.e., $u_i^T u_j = 0$) for $\lambda_i \neq \lambda_j$.

If $\lambda_1 = \cdots = \lambda_m$ is an eigenvalue of multiplicity m and $u_1, \ldots, u_m$ ($1 < m \leq n$) are
the corresponding eigenvectors, take (Gram–Schmidt orthonormalisation)

$$v_1 = u_1, \quad v_1 \leftarrow \frac{v_1}{\|v_1\|}; \qquad
v_2 = u_2 - (v_1^T u_2)v_1, \quad v_2 \leftarrow \frac{v_2}{\|v_2\|}; \qquad \ldots;$$

$$v_m = u_m - (v_1^T u_m)v_1 - \cdots - (v_{m-1}^T u_m)v_{m-1}, \quad v_m \leftarrow \frac{v_m}{\|v_m\|},
\qquad\text{where } \|v\| = \sqrt{v^T v}$$

It is easy to check that $v_1, \ldots, v_m$ are orthonormal eigenvectors corresponding
to the eigenvalue $\lambda_1 = \cdots = \lambda_m$.

Do this for each eigenvalue of A to obtain orthonormal eigenvectors $v_1, \ldots, v_n$
corresponding to $\lambda_1, \ldots, \lambda_n$. Let $V = (v_1\; \cdots\; v_n)$. Then

$$V^TV = \begin{pmatrix} v_1^T \\ \vdots \\ v_n^T \end{pmatrix}(v_1\; \cdots\; v_n)
= \begin{pmatrix} v_1^Tv_1 & \cdots & v_1^Tv_n \\ \vdots & & \vdots \\ v_n^Tv_1 & \cdots & v_n^Tv_n \end{pmatrix}
= \begin{pmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{pmatrix} = I$$

Thus V is an orthogonal matrix and satisfies (A1.13).

Furthermore, it can be shown that a symmetric matrix A is positive definite if
and only if all eigenvalues of A are positive; and A is positive semi-definite if
and only if all eigenvalues of A are nonnegative.

Thus if A is an $n \times n$ positive semi-definite matrix, then (A1.13) holds with an
orthogonal matrix V and $\lambda_i \geq 0$ for $i = 1, \ldots, n$.

Square root of a matrix

Any matrix B satisfying $B^2 = A$ is a square root of A, which is not unique.

However, a positive semi-definite matrix has a unique positive semi-definite
square root. To see this, recall that (A1.13) holds with an orthogonal matrix V
and $\lambda_i \geq 0$, $i = 1, \ldots, n$, if A is positive semi-definite. Hence we can define

$$A^{1/2} = V \operatorname{diag}\bigl(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_n}\bigr) V^T$$

Then $A^{1/2}$ is the unique positive semi-definite square root of A:

$$A^{1/2}A^{1/2}
= V \operatorname{diag}\bigl(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_n}\bigr) V^T V \operatorname{diag}\bigl(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_n}\bigr) V^T
= V \operatorname{diag}(\lambda_1, \ldots, \lambda_n) V^T$$

$$= V V^T A V V^T = A$$

(since $VV^T = V^TV = I$ for an orthogonal matrix V).
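A minimal sketch of (A1.13) and the matrix square root in NumPy, using an arbitrary positive definite example matrix (an assumption for illustration):

```python
import numpy as np

# Arbitrary symmetric positive definite example matrix.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

lam, V = np.linalg.eigh(A)        # eigen-decomposition for a symmetric matrix
print(np.allclose(V.T @ A @ V, np.diag(lam)))   # orthogonal transform (A1.13)
print(np.all(lam > 0))                          # all eigenvalues positive here

A_half = V @ np.diag(np.sqrt(lam)) @ V.T        # A^{1/2} as defined above
print(np.allclose(A_half @ A_half, A))          # True: (A^{1/2})^2 = A
```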

Appendix A2

Law of iterated expectation


Given any two continuous random variables X and Y, let

- $f(x, y)$ denote the joint density of (X, Y);
- $f_X(x)$ and $f_Y(y)$ the marginal densities of X and Y, respectively;
- $f_{X|Y}(x \mid y)$ the conditional density of X given Y = y.

Then

$$f(x, y) = f_{X|Y}(x \mid y)\,f_Y(y) \quad\text{and}\quad f_X(x) = \int f(x, y)\,dy$$

Define

$$g(y) = E[X \mid Y = y] = \int x\, f_{X|Y}(x \mid y)\,dx$$

Then by definition, $E[X \mid Y] = g(Y)$.

It follows that

$$E\bigl\{E[X \mid Y]\bigr\} = E[g(Y)] = \int g(y)\,f_Y(y)\,dy
= \iint x\, f_{X|Y}(x \mid y)\,f_Y(y)\,dx\,dy$$

$$= \iint x\, f(x, y)\,dx\,dy
= \int x \left(\int f(x, y)\,dy\right) dx
= \int x\, f_X(x)\,dx = E[X]$$

Thus we have the law of iterated expectation:

$$E[X] = E\bigl\{E[X \mid Y]\bigr\} \qquad\text{(A2.1)}$$

This holds for discrete and mixed distributions as well.

More generally, for any bivariate function $h(x, y)$,

$$E[h(X, Y)] = E\bigl\{E[h(X, Y) \mid Y]\bigr\}
= \int E\bigl[h(X, y) \mid Y = y\bigr]\,f_Y(y)\,dy \qquad\text{(A2.2)}$$

if Y is continuous, and (because $\Pr(E) = E[I_E]$ for any event E)

$$\Pr\bigl(h(X, Y) \in A\bigr) = \int \Pr\bigl(h(X, y) \in A \mid Y = y\bigr)\,f_Y(y)\,dy \qquad\text{(A2.3)}$$
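A quick Monte Carlo sketch of (A2.1), using an arbitrary example where the conditional mean is known (Y ~ Exp(1) and X | Y = y ~ N(y, 1), so E[X | Y] = Y and E[X] = E[Y] = 1; these model choices are assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Illustrative model (an assumption): Y ~ Exp(1), X | Y = y ~ N(y, 1).
Y = rng.exponential(scale=1.0, size=n)
X = rng.normal(loc=Y, scale=1.0)

g_Y = Y                       # here E[X | Y] = Y exactly
print(X.mean())               # ~ 1.0 : direct estimate of E[X]
print(g_Y.mean())             # ~ 1.0 : estimate of E{ E[X | Y] }
```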

Appendix A3

Maximum Likelihood Estimator


Consistency

Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed (i.i.d.) random
variables with a parametric distribution function $F(x \mid \theta)$ and representative
$X \sim F(x \mid \theta)$.

First consider a univariate parameter $\theta$ with true value $\theta_0$.

Let $l(\theta \mid X_i)$ denote the contribution of $X_i$ to the likelihood function,

$$s_i(\theta) = \frac{\partial}{\partial\theta}\log l(\theta \mid X_i)
\quad\text{and}\quad
v_i(\theta) = \frac{\partial^2}{\partial\theta^2}\log l(\theta \mid X_i)$$

The log-likelihood function is

$$\log L(\theta) = \sum_{i=1}^{n}\log l(\theta \mid X_i)$$

Define

$$\dot L(\theta) = \frac{\partial}{\partial\theta}\log L(\theta) = \sum_{i=1}^{n} s_i(\theta)
\quad\text{and}\quad
\ddot L(\theta) = \frac{\partial^2}{\partial\theta^2}\log L(\theta) = \sum_{i=1}^{n} v_i(\theta)$$

If the sample is complete with true density $f(x \mid \theta_0)$, then

$$E[s_1(\theta)] = \int \left(\frac{\partial}{\partial\theta}\log f(x \mid \theta)\right) f(x \mid \theta_0)\,dx
= \int \frac{\partial f(x \mid \theta)/\partial\theta}{f(x \mid \theta)}\, f(x \mid \theta_0)\,dx$$

Assume that the support of $f(x \mid \theta)$ does not depend on $\theta$. Then

$$E[s_1(\theta_0)] = \int \frac{\partial f(x \mid \theta)}{\partial\theta}\bigg|_{\theta=\theta_0}\,dx
= \frac{\partial}{\partial\theta}\int f(x \mid \theta)\,dx\bigg|_{\theta=\theta_0}
= \frac{\partial}{\partial\theta}(1) = 0 \qquad\text{(A3.1)}$$

Hence

$$E[\dot L(\theta_0)] = n\,E[s_1(\theta_0)] = 0 \qquad\text{(A3.2)}$$

Let $\hat\theta$ be the maximum likelihood estimator (MLE) of $\theta_0$, so that $\dot L(\hat\theta) = 0$.
Then by Taylor expansion,

$$0 = \dot L(\hat\theta) \approx \dot L(\theta_0) + \ddot L(\theta_0)(\hat\theta - \theta_0) \qquad\text{(A3.3)}$$

By the law of large numbers and (A3.1), as $n \to \infty$,

$$\frac{1}{n}\dot L(\theta_0) \to E[s_1(\theta_0)] = 0
\quad\text{and}\quad
\frac{1}{n}\ddot L(\theta_0) \to E[v_1(\theta_0)]$$

Hence by (A3.3), for large n,

$$\hat\theta - \theta_0 \approx -\frac{\dot L(\theta_0)}{\ddot L(\theta_0)}
= -\frac{n^{-1}\dot L(\theta_0)}{n^{-1}\ddot L(\theta_0)}
\approx -\frac{E[s_1(\theta_0)]}{E[v_1(\theta_0)]} = 0 \qquad\text{(A3.4)}$$

This explains the consistency of the MLE.

Asymptotic normality

By the central limit theorem and (A3.1),

$$\frac{1}{\sqrt n}\dot L(\theta_0) = \frac{1}{\sqrt n}\sum_{i=1}^{n} s_i(\theta_0)
\to N\bigl(0, \operatorname{Var}[s_1(\theta_0)]\bigr) \qquad\text{(A3.5)}$$

By (A3.3),

$$\sqrt n\,(\hat\theta - \theta_0) \approx -\frac{n^{-1/2}\dot L(\theta_0)}{n^{-1}\ddot L(\theta_0)}
\to N\!\left(0, \frac{\operatorname{Var}[s_1(\theta_0)]}{\bigl(E[v_1(\theta_0)]\bigr)^2}\right) \qquad\text{(A3.6)}$$

Moreover, let $f_1 = f(X_1 \mid \theta)$. Then

$$v_1(\theta) = \frac{\partial^2\log f_1}{\partial\theta^2}
= \frac{\partial}{\partial\theta}\left(\frac{1}{f_1}\frac{\partial f_1}{\partial\theta}\right)
= \frac{1}{f_1}\frac{\partial^2 f_1}{\partial\theta^2} - \frac{1}{f_1^2}\left(\frac{\partial f_1}{\partial\theta}\right)^2
= \frac{1}{f_1}\frac{\partial^2 f_1}{\partial\theta^2} - \left(\frac{\partial\log f_1}{\partial\theta}\right)^2
= \frac{1}{f_1}\frac{\partial^2 f_1}{\partial\theta^2} - s_1(\theta)^2 \qquad\text{(A3.7)}$$

By (A3.7) and similar to the derivation of (A3.1), we get

$$E[v_1(\theta_0)] = \int \frac{\partial^2 f(x \mid \theta)/\partial\theta^2\big|_{\theta_0}}{f(x \mid \theta_0)}\, f(x \mid \theta_0)\,dx - E\bigl[s_1^2(\theta_0)\bigr]
= \frac{\partial^2}{\partial\theta^2}(1) - E\bigl[s_1^2(\theta_0)\bigr]
= -E\bigl[s_1^2(\theta_0)\bigr] \qquad\text{(A3.8)}$$

By (A3.1) and (A3.8),

$$\operatorname{Var}[s_1(\theta_0)] = E\bigl[s_1^2(\theta_0)\bigr] - \bigl(E[s_1(\theta_0)]\bigr)^2 = -E[v_1(\theta_0)] \qquad\text{(A3.9)}$$

Consequently,

$$\operatorname{Var}[\dot L(\theta_0)] = n\operatorname{Var}[s_1(\theta_0)] = -n\,E[v_1(\theta_0)] = -E[\ddot L(\theta_0)] \equiv I(\theta_0) \qquad\text{(A3.10)}$$

where $I(\theta)$ is the Fisher information.

It follows from (A3.6) and (A3.10) that

$$\hat\theta - \theta_0 \sim N\!\left(0, \frac{-E[v_1(\theta_0)]}{n\bigl(E[v_1(\theta_0)]\bigr)^2}\right)
= N\!\left(0, \frac{-1}{n\,E[v_1(\theta_0)]}\right)
= N\!\left(0, \frac{1}{I(\theta_0)}\right)$$

approximately for large n.

Therefore the asymptotic distribution of the MLE is given by

$$\hat\theta \sim N\bigl(\theta_0, I(\theta_0)^{-1}\bigr)$$

This also shows that, for large n, the variance of the MLE $\hat\theta$ of $\theta_0$ can be
approximated by

$$\operatorname{Var}(\hat\theta) \approx I(\theta_0)^{-1}
\quad\text{and estimated by}\quad
\widehat{\operatorname{Var}}(\hat\theta) = I(\hat\theta)^{-1}$$
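A small simulation sketch of these asymptotics, using the exponential distribution with rate $\theta$ as an arbitrary illustrative model (so the MLE is $1/\bar X$ and $I(\theta_0) = n/\theta_0^2$; this choice of model is an assumption for illustration, not taken from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
theta0, n, n_rep = 2.0, 500, 5000

# Illustrative model (an assumption): X ~ Exponential with rate theta0,
# so the MLE is 1 / (sample mean) and the Fisher information is n / theta0^2.
X = rng.exponential(scale=1.0 / theta0, size=(n_rep, n))
theta_hat = 1.0 / X.mean(axis=1)

print(theta_hat.mean())                 # close to theta0 (consistency)
print(theta_hat.var())                  # close to 1/I(theta0) = theta0^2 / n
print(theta0**2 / n)

# Standardised MLE should be roughly N(0, 1):
z = (theta_hat - theta0) * np.sqrt(n) / theta0
print(z.mean(), z.std())                # roughly 0 and 1
```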

Multivariate parameter vector

For a multivariate vector $\theta$ of k parameters, the above results remain valid in
a multivariate version.

The Taylor expansion in (A3.3):

$$0 = \dot L(\hat\theta) \approx \dot L(\theta_0) + \ddot L(\theta_0)(\hat\theta - \theta_0) \qquad\text{(A3.11)}$$

still holds, but with a $k \times 1$ vector $\dot L(\theta_0)$ and a $k \times k$ matrix $\ddot L(\theta_0)$.

The consistency in (A3.4) now becomes

$$\hat\theta - \theta_0 \approx -\left(\frac{1}{n}\ddot L(\theta_0)\right)^{-1}\frac{1}{n}\dot L(\theta_0)
\approx -\bigl(E[v_1(\theta_0)]\bigr)^{-1}E[s_1(\theta_0)] = 0$$

By the multivariate central limit theorem, (A3.5) remains valid, and similar
arguments to (A3.8) show that the variance matrix in (A3.5) is

$$\operatorname{Var}[s_1(\theta_0)] = E\bigl[s_1(\theta_0)s_1(\theta_0)^T\bigr] = -E[v_1(\theta_0)] \qquad\text{(A3.12)}$$

Combining (A3.5), (A3.11), (A3.12) and the law of large numbers, we get

$$\frac{1}{\sqrt n}I(\theta_0)(\hat\theta - \theta_0)
\approx -\frac{1}{\sqrt n}\ddot L(\theta_0)(\hat\theta - \theta_0)
\approx \frac{1}{\sqrt n}\dot L(\theta_0)
\to N\bigl(0, -E[v_1(\theta_0)]\bigr) \text{ in distribution as } n \to \infty \qquad\text{(A3.13)}$$

For a $k \times 1$ random vector X with $E[X] = 0$, we have $\operatorname{Var}(X) = E[XX^T]$, so
that with any $m \times k$ constant matrix A,

$$\operatorname{Var}(AX) = E\bigl[(AX)(AX)^T\bigr] = A\,E[XX^T]\,A^T = A\operatorname{Var}(X)A^T \qquad\text{(A3.14)}$$

Thus (A3.13) shows that asymptotically as $n \to \infty$,

$$\operatorname{Var}(\hat\theta) \approx I(\theta_0)^{-1}\bigl(-n\,E[v_1(\theta_0)]\bigr)I(\theta_0)^{-1}
= I(\theta_0)^{-1}I(\theta_0)I(\theta_0)^{-1} = I(\theta_0)^{-1}$$

and consequently,

$$\hat\theta \sim N\bigl(\theta_0, I(\theta_0)^{-1}\bigr)
\quad\text{with variance estimator}\quad
\widehat{\operatorname{Var}}(\hat\theta) = I(\hat\theta)^{-1}$$

Moreover, it follows from (A3.12) that $I(\theta_0) = -n\,E[v_1(\theta_0)] = n\operatorname{Var}[s_1(\theta_0)]$ is
a positive-definite matrix, and hence has a positive-definite square root

$$I(\theta_0)^{1/2} = \bigl(-n\,E[v_1(\theta_0)]\bigr)^{1/2}
\quad\text{with inverse}\quad
I(\theta_0)^{-1/2} = \frac{1}{\sqrt n}\bigl(-E[v_1(\theta_0)]\bigr)^{-1/2}$$

As a result, (A3.13) and (A3.14) imply

$$I(\theta_0)^{1/2}(\hat\theta - \theta_0)
= \bigl(-E[v_1(\theta_0)]\bigr)^{-1/2}\,\frac{1}{\sqrt n}I(\theta_0)(\hat\theta - \theta_0)
\to N\Bigl(0, \bigl(-E[v_1(\theta_0)]\bigr)^{-1/2}\bigl(-E[v_1(\theta_0)]\bigr)\bigl(-E[v_1(\theta_0)]\bigr)^{-1/2}\Bigr)
= N(0, I_k)$$

where $I_k$ is the $k \times k$ identity matrix.

It follows that

$$(\hat\theta - \theta_0)^T I(\theta_0)(\hat\theta - \theta_0) \to \chi_k^2 \qquad\text{(A3.15)}$$

MLE for the U(0, θ) distribution

Let $X_1, \ldots, X_n$ be a (complete) random sample from the $U(0, \theta)$ distribution.
Then the likelihood of $\theta$ is

$$L(\theta) = \theta^{-n}\, I_{(0,\theta)}(x_1)\cdots I_{(0,\theta)}(x_n)
= \theta^{-n}\, I_{(0,\theta)}(x_{(1)})\, I_{(0,\theta)}(x_{(n)})
= \begin{cases} \theta^{-n} & \text{if } x_{(n)} \leq \theta \\ 0 & \text{if } x_{(n)} > \theta \end{cases}$$

where $x_{(1)} = \min\{x_1, x_2, \ldots, x_n\}$ and $x_{(n)} = \max\{x_1, x_2, \ldots, x_n\}$.

Thus $L(\theta)$ reaches its maximum at $\theta = x_{(n)}$.

Therefore the maximum likelihood estimator of $\theta$ is $\hat\theta = X_{(n)}$.

The cdf of $X_{(n)}$ is given by

$$\Pr\bigl(X_{(n)} \leq x\bigr) = \Pr(X_1 \leq x, \ldots, X_n \leq x)
= \left(\frac{x}{\theta}\right)^n I_{(0,\theta)}(x) + I_{[\theta,\infty)}(x)$$

It follows that for large n,

$$\Pr\bigl(n(\theta - X_{(n)}) > x\bigr) = \Pr\!\left(X_{(n)} < \theta - \frac{x}{n}\right)
= \left(\frac{\theta - x/n}{\theta}\right)^n = \left(1 - \frac{x}{n\theta}\right)^n$$

Consequently,

$$\lim_{n\to\infty}\Pr\bigl(n(\theta - X_{(n)}) > x\bigr)
= \lim_{n\to\infty}\left(1 - \frac{x}{n\theta}\right)^n = e^{-x/\theta}$$

This shows that the limiting distribution of $n(\theta - X_{(n)})$ is
exponential with mean $\theta$.

Thus the MLE of $\theta$ for the uniform distribution over the interval $(0, \theta)$ is not
asymptotically normal.
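A short simulation sketch of this limit ($\theta_0 = 2$ is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
theta0, n, n_rep = 2.0, 1000, 20000

# The MLE of theta for U(0, theta) is the sample maximum.
X = rng.uniform(0.0, theta0, size=(n_rep, n))
theta_hat = X.max(axis=1)

T = n * (theta0 - theta_hat)       # approximately Exp with mean theta0
print(T.mean(), T.std())           # both close to theta0 = 2 for an exponential
```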

Properties of MLE with survival data

The above arguments for consistency and asymptotic normality are based on
complete data. For survival data subject to censoring, the results remain valid,
but the derivations become more complex.

For example, the contribution $l(\theta \mid X_i)$ to the likelihood becomes

$$l(\theta \mid X_i) = f(X_i \mid \theta)^{\delta_i}\, S(X_i \mid \theta)^{1-\delta_i}$$

where $\delta_i$ is the censoring indicator of $X_i$.

(A3.1) and (A3.8) still hold with the log-likelihood of $X_i$:

$$\log l(\theta \mid X_i) = \delta_i\log f(X_i \mid \theta) + (1 - \delta_i)\log S(X_i \mid \theta) \qquad\text{(A3.16)}$$

under any censoring distribution.

Let $X^*$ denote the failure time (not subject to censoring) with cdf $F(x \mid \theta)$,
and C the censoring random variable.

Given C = c, the cdf of $X = X^* \wedge c$ is

$$\Pr(X \leq x) = \Pr(X^* \wedge c \leq x)
= \begin{cases} F(x \mid \theta_0) & \text{if } x < c \\ 1 & \text{if } x \geq c \end{cases}$$

That is, given C = c, X has a mixed distribution with density $f(x \mid \theta_0)$ for
$x < c$ and a mass at c with probability

$$\Pr(X = c) = \Pr(X^* \geq c) = 1 - F(c \mid \theta_0) = S(c \mid \theta_0)$$

Note that the censoring indicator is $\delta = I\{X^* < c\}$. Therefore by (A3.16),

$$s_1(\theta) = \frac{\partial\log l}{\partial\theta}
= \delta\,\frac{1}{f}\frac{\partial f}{\partial\theta}
+ (1 - \delta)\,\frac{1}{S}\frac{\partial S}{\partial\theta} \qquad\text{(A3.17)}$$

where for convenience we write

$$l = l(\theta \mid X_1), \qquad f = f(X_1 \mid \theta) \qquad\text{and}\qquad S = S(X_1 \mid \theta)$$

It follows that

$$E[s_1(\theta) \mid C]
= E\!\left[\delta\,\frac{1}{f}\frac{\partial f}{\partial\theta}\,\Big|\,C\right]
+ E\!\left[(1 - \delta)\,\frac{1}{S}\frac{\partial S}{\partial\theta}\,\Big|\,C\right]
= \int_0^C \frac{\partial f(x \mid \theta)/\partial\theta}{f(x \mid \theta)}\, f(x \mid \theta_0)\,dx
+ S(C \mid \theta_0)\,\frac{\partial S(C \mid \theta)/\partial\theta}{S(C \mid \theta)}$$

$$E[s_1(\theta_0)] = E\bigl\{E[s_1(\theta_0) \mid C]\bigr\}
= E\!\left[\int_0^C \frac{\partial f(x \mid \theta)}{\partial\theta}\bigg|_{\theta_0}\,dx
+ \frac{\partial S(C \mid \theta)}{\partial\theta}\bigg|_{\theta_0}\right] \qquad\text{(A3.18)}$$

If the integration and differentiation in (A3.18) are interchangeable, then

$$E[s_1(\theta_0)]
= E\!\left[\frac{\partial}{\partial\theta}\int_0^C f(x \mid \theta)\,dx\bigg|_{\theta_0}
+ \frac{\partial S(C \mid \theta)}{\partial\theta}\bigg|_{\theta_0}\right]
= E\!\left[\frac{\partial}{\partial\theta}\bigl\{F(C \mid \theta) + S(C \mid \theta)\bigr\}\bigg|_{\theta_0}\right]
= E\!\left[\frac{\partial}{\partial\theta}(1)\bigg|_{\theta_0}\right] = 0 \qquad\text{(A3.19)}$$

Furthermore, (A3.16) and (A3.17) imply

$$v_1(\theta) = \frac{\partial^2\log l}{\partial\theta\,\partial\theta^T}
= \delta\,\frac{\partial}{\partial\theta^T}\!\left(\frac{1}{f}\frac{\partial f}{\partial\theta}\right)
+ (1 - \delta)\,\frac{\partial}{\partial\theta^T}\!\left(\frac{1}{S}\frac{\partial S}{\partial\theta}\right)$$

$$= \delta\left[\frac{1}{f}\frac{\partial^2 f}{\partial\theta\,\partial\theta^T}
- \frac{1}{f^2}\frac{\partial f}{\partial\theta}\left(\frac{\partial f}{\partial\theta}\right)^T\right]
+ (1 - \delta)\left[\frac{1}{S}\frac{\partial^2 S}{\partial\theta\,\partial\theta^T}
- \frac{1}{S^2}\frac{\partial S}{\partial\theta}\left(\frac{\partial S}{\partial\theta}\right)^T\right]$$

$$= -s_1(\theta)s_1(\theta)^T
+ \delta\,\frac{1}{f}\frac{\partial^2 f}{\partial\theta\,\partial\theta^T}
+ (1 - \delta)\,\frac{1}{S}\frac{\partial^2 S}{\partial\theta\,\partial\theta^T} \qquad\text{(A3.20)}$$

(using $\delta^2 = \delta$, $(1-\delta)^2 = 1-\delta$ and $\delta(1-\delta) = 0$, since $\delta = I\{X^* < C\}$ takes only
the values 0 and 1).

Thus similar arguments to (A3.19) lead to

$$E[v_1(\theta_0)] = -E\bigl[s_1(\theta_0)s_1(\theta_0)^T\bigr]
+ E\!\left[\int_0^C \frac{\partial^2 f(x \mid \theta)}{\partial\theta\,\partial\theta^T}\bigg|_{\theta_0}\,dx
+ \frac{\partial^2 S(C \mid \theta)}{\partial\theta\,\partial\theta^T}\bigg|_{\theta_0}\right]$$

$$= -E\bigl[s_1(\theta_0)s_1(\theta_0)^T\bigr]
+ E\!\left[\frac{\partial^2}{\partial\theta\,\partial\theta^T}\bigl\{F(C \mid \theta) + S(C \mid \theta)\bigr\}\bigg|_{\theta_0}\right]
= -E\bigl[s_1(\theta_0)s_1(\theta_0)^T\bigr] \qquad\text{(A3.21)}$$

By the law of large numbers and central limit theorem, (A3.19) and (A3.21)
show that the following properties of the MLE remain true for survival data:

(i) $\hat\theta \to \theta_0$ in probability as $n \to \infty$; and

(ii) $\hat\theta \sim N\bigl(\theta_0, I(\theta_0)^{-1}\bigr)$ approximately for large n.

MLE for U(0, θ) with censoring

If the data include both uncensored and censored points, let $x_j$ be the largest
censored point, so that $\delta_j = 0$. The likelihood becomes

$$L(\theta) = \prod_{i=1}^{n}\left(\frac{1}{\theta}\right)^{\delta_i}\left(1 - \frac{x_i}{\theta}\right)^{1-\delta_i}
= \theta^{-n}\prod_{i=1}^{n}(\theta - x_i)^{1-\delta_i} \qquad\text{(A3.22)}$$

for $\theta \geq x_{(n)} = \max\{x_1, \ldots, x_n\}$, and $L(\theta) = 0$ if $\theta < x_{(n)}$.

Extend $L(\theta)$ in (A3.22) to all $\theta > x_j$. Then

$$\frac{\partial\log L(\theta)}{\partial\theta}
= \frac{\partial}{\partial\theta}\left[-n\log\theta + \sum_{i=1}^{n}(1 - \delta_i)\log(\theta - x_i)\right]
= -\frac{n}{\theta} + \sum_{i=1}^{n}\frac{1 - \delta_i}{\theta - x_i}$$

$$\to +\infty \text{ as } \theta \downarrow x_j,
\qquad\text{and}\qquad
\approx \frac{-n + \sum_i(1 - \delta_i)}{\theta} < 0,\;\; \to 0 \text{ as } \theta \to \infty$$

Thus $\partial\log L(\theta)/\partial\theta = 0$ has a solution $\tilde\theta > x_j$, which is a maximum point of the
extended $L(\theta)$ over $\theta > x_j$. The MLE of $\theta$ is then $\hat\theta = \max\{\tilde\theta, x_{(n)}\}$, since the
actual likelihood $L(\theta) = 0$ for $\theta < x_{(n)}$, which implies $\hat\theta \geq x_{(n)}$.

If $x_{(n)}$ is censored, then $x_j = x_{(n)} < \tilde\theta$. Hence $\hat\theta = \tilde\theta > x_{(n)}$.

If $x_{(n)}$ is uncensored ($x_{(n)} > x_j$), then either $\tilde\theta > x_{(n)}$ or $\tilde\theta \leq x_{(n)}$ is possible
(depending on the data). When $\tilde\theta \leq x_{(n)}$, the MLE of $\theta$ is $\hat\theta = x_{(n)}$.

To see an example, let n = 2 and $x_2 > x_1 = 1$. Suppose that $x_1 = 1$ is censored
and $x_2 = x_{(n)}$ is uncensored. Then for $\theta > x_j = 1$,

$$\frac{\partial\log L(\theta)}{\partial\theta} = -\frac{2}{\theta} + \frac{1}{\theta - 1} = 0
\quad\Rightarrow\quad \tilde\theta = 2$$

Hence $\hat\theta = 2$ if $1 < x_2 = x_{(n)} \leq 2$; or $\hat\theta = x_{(n)} = x_2$ if $x_2 > 2$.
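A tiny numerical sketch of this two-point example, maximising the extended likelihood (A3.22) on a grid (the values x2 = 1.5 and x2 = 3 are assumptions chosen only to show both cases):

```python
import numpy as np

def log_L(theta, x, delta):
    """Extended censored U(0, theta) log-likelihood (A3.22), theta above all data points."""
    return -len(x) * np.log(theta) + np.sum((1 - delta) * np.log(theta - x))

delta = np.array([0, 1])                 # x1 = 1 is censored, x2 is uncensored
for x2 in (1.5, 3.0):                    # assumed values illustrating x2 <= 2 and x2 > 2
    x = np.array([1.0, x2])
    grid = np.linspace(x.max() + 1e-6, 10.0, 20001)
    values = [log_L(t, x, delta) for t in grid]
    theta_hat = grid[int(np.argmax(values))]
    print(x2, round(theta_hat, 3))       # about 2.0 when x2 = 1.5, about 3.0 when x2 = 3.0
```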

Furthermore, by (A3.18), with $f(x \mid \theta) = 1/\theta$ and $S(x \mid \theta) = 1 - x/\theta$ for $U(0, \theta)$,

$$E[s_1(\theta_0)] = E\!\left[-\int_0^C \frac{1}{\theta_0^2}\, I_{(0,\theta_0)}(x)\,dx
+ \frac{C}{\theta_0^2}\, I_{\{C < \theta_0\}}\right]$$

$$= E\!\left[-\frac{C}{\theta_0^2}\, I_{\{C < \theta_0\}} - \frac{1}{\theta_0}\, I_{\{C \geq \theta_0\}}
+ \frac{C}{\theta_0^2}\, I_{\{C < \theta_0\}}\right]
= -\frac{1}{\theta_0}E\bigl[I_{\{C \geq \theta_0\}}\bigr]
= -\frac{1}{\theta_0}\Pr(C \geq \theta_0) \qquad\text{(A3.23)}$$

Similarly, by (A3.20), with $\partial^2 f(x \mid \theta)/\partial\theta^2\big|_{\theta_0} = 2/\theta_0^3$ and
$\partial^2 S(C \mid \theta)/\partial\theta^2\big|_{\theta_0} = -2C/\theta_0^3$,

$$E[v_1(\theta_0)] = -E\bigl[s_1^2(\theta_0)\bigr]
+ E\!\left[\int_0^C \frac{2}{\theta_0^3}\, I_{(0,\theta_0)}(x)\,dx
- \frac{2C}{\theta_0^3}\, I_{\{C < \theta_0\}}\right]$$

$$= -E\bigl[s_1^2(\theta_0)\bigr]
+ E\!\left[\frac{2C}{\theta_0^3}\, I_{\{C < \theta_0\}} + \frac{2}{\theta_0^2}\, I_{\{C \geq \theta_0\}}
- \frac{2C}{\theta_0^3}\, I_{\{C < \theta_0\}}\right]
= -E\bigl[s_1^2(\theta_0)\bigr] + \frac{2}{\theta_0^2}\Pr(C \geq \theta_0) \qquad\text{(A3.24)}$$
(A3.23) and (A3.24) show that

$$E[s_1(\theta_0)] = 0 \quad\text{and}\quad
\operatorname{Var}[s_1(\theta_0)] = E\bigl[s_1^2(\theta_0)\bigr] = -E[v_1(\theta_0)] \qquad\text{(A3.25)}$$

if and only if $\Pr(C \geq \theta_0) = 0$, i.e., $\Pr(C < \theta_0) = 1$.

Thus the MLE of $\theta$ in the censored $U(0, \theta)$ model remains asymptotically normal if
$\dot L(\hat\theta) = 0$ (i.e., $\hat\theta = \tilde\theta$ solves the score equation) and $\Pr(C < \theta_0) = 1$.

Recall that a key regularity condition for the properties of the MLE is that the
support of the failure distribution does not depend on unknown parameters,
which can generally ensure interchangeable integration and differentiation in
(A3.1) and (A3.8), or in (A3.19) and (A3.21), in order to obtain (A3.25).

For example, $U(0, \theta)$ does not satisfy this condition, and in this case

$$\int \frac{\partial f(x \mid \theta)}{\partial\theta}\,dx
= \int_0^\theta\left(-\frac{1}{\theta^2}\right)dx = -\frac{1}{\theta}
\;\neq\; 0 = \frac{\partial}{\partial\theta}(1) = \frac{\partial}{\partial\theta}\int_0^\theta f(x \mid \theta)\,dx$$

This is, however, not a necessary condition. Even when it fails, (A3.25) may still
hold, as in the censored $U(0, \theta)$ model with $\Pr(C < \theta_0) = 1$.

When the support of the failure distribution depends on unknown parameters,
we need to check (A3.25) specifically instead of relying on the general theory.
