Class 9.

07 Fall 2004
Handout: Two Sample Hypothesis Testing And Inference For Difference In Means
Hypothesis Testing
I. Two independent samples from Normal distributions.
Suppose $X_1, \dots, X_{n_1}$ is an independent sample from a Normal($\mu_1$, $\sigma_1^2$) distribution. Independently of the first sample, suppose $Y_1, \dots, Y_{n_2}$ is an independent sample from a Normal($\mu_2$, $\sigma_2^2$) distribution (possibly different from the first one):
                     Group 1                      Group 2                      Known?
Mean                 $\mu_1$                      $\mu_2$                      Unknown
Variance             $\sigma_1^2$                 $\sigma_2^2$                 Either
Data                 $X_1, \dots, X_{n_1}$        $Y_1, \dots, Y_{n_2}$        Known
Sample size          $n_1$                        $n_2$                        Known
Sample mean          $m_1 = \bar{X}$              $m_2 = \bar{Y}$              Known
Standard deviation   $SD_1$                       $SD_2$                       Known
Distribution         Normal($\mu_1$, $\sigma_1^2$)   Normal($\mu_2$, $\sigma_2^2$)   Assumed
A reasonable estimate for the difference of the population means $\mu_1 - \mu_2$ is
$$m_1 - m_2 = \bar{X} - \bar{Y}.$$
Note that
$$E(m_1 - m_2) = \mu_1 - \mu_2$$
and
$$SE(m_1 - m_2) = \sqrt{Var(m_1 - m_2)} = \sqrt{Var(m_1) + Var(m_2)} = \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$$
for independent samples $X_1, \dots, X_{n_1}$ and $Y_1, \dots, Y_{n_2}$.
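The variance step rests on two standard facts that are not spelled out here: the mean of $n$ independent observations with variance $\sigma^2$ has variance $\sigma^2/n$, and the covariance of two independent quantities is zero. A short worked derivation under those assumptions:

```latex
\begin{aligned}
Var(m_1) &= Var\Big(\frac{1}{n_1}\sum_{i=1}^{n_1} X_i\Big)
          = \frac{1}{n_1^2}\, n_1 \sigma_1^2
          = \frac{\sigma_1^2}{n_1}, \\
Var(m_1 - m_2) &= Var(m_1) + Var(m_2) - 2\,Cov(m_1, m_2)
          = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}
          \quad \text{since } Cov(m_1, m_2) = 0 .
\end{aligned}
```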


For testing
$$H_0: \mu_1 = \mu_2 \qquad (1)$$
against
$H_1$: 1) $\mu_1 \neq \mu_2$ or
2) $\mu_1 < \mu_2$ or
3) $\mu_1 > \mu_2$
use the test statistic
$$d_{obt} = \frac{m_1 - m_2}{SE(m_1 - m_2)},$$
which follows some distribution $d$.


Theorem. Under the above assumptions about the two samples $X_1, \dots, X_{n_1}$ and $Y_1, \dots, Y_{n_2}$, for testing $H_0: \mu_1 - \mu_2 = 0$ at the $\alpha$ significance level
vs
1) $H_1: \mu_1 - \mu_2 \neq 0$. Reject $H_0$ if $|d_{obt}| \geq d_{crit}(\alpha/2)$.

2) $H_1: \mu_1 - \mu_2 < 0$. Reject $H_0$ if $d_{obt} \leq -d_{crit}(\alpha)$.

3) $H_1: \mu_1 - \mu_2 > 0$. Reject $H_0$ if $d_{obt} \geq d_{crit}(\alpha)$.
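The three rejection rules of the theorem can be sketched as a small Python function. The function name and the alternative labels ("two-sided", "less", "greater") are illustrative choices, not part of the handout; the caller is assumed to pass the critical value that matches the alternative.

```python
def reject_h0(d_obt: float, d_crit: float, alternative: str) -> bool:
    """Apply the rejection rule for one of the three alternatives.

    d_crit must already be the critical value for the right tail area:
    d_crit(alpha/2) for "two-sided", d_crit(alpha) for "less"/"greater".
    """
    if alternative == "two-sided":   # H1: mu1 - mu2 != 0
        return abs(d_obt) >= d_crit
    if alternative == "less":        # H1: mu1 - mu2 < 0
        return d_obt <= -d_crit
    if alternative == "greater":     # H1: mu1 - mu2 > 0
        return d_obt >= d_crit
    raise ValueError(f"unknown alternative: {alternative!r}")
```

For example, `reject_h0(2.1, 1.96, "two-sided")` rejects, while `reject_h0(-1.0, 1.645, "less")` does not.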
Computation of $SE(m_1 - m_2)$ and choice of distribution $d$:
1. $\sigma_1$ and $\sigma_2$ are known
$$SE(m_1 - m_2) = \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$$
The test statistic
$$d_{obt} = z_{obt} = \frac{m_1 - m_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$
follows the standard Normal distribution $z$.
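As a minimal sketch of case 1, the following Python code computes the standard error and $z_{obt}$ from known population standard deviations and compares against a two-sided critical value. The sample values are made up for illustration, and `two_sample_z` is a hypothetical helper name; the critical value comes from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(m1, m2, sigma1, sigma2, n1, n2):
    """Return (z_obt, SE) for known population standard deviations."""
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return (m1 - m2) / se, se

# Hypothetical data: sample means 52 and 50, known sigmas 4 and 3.
z_obt, se = two_sample_z(m1=52.0, m2=50.0, sigma1=4.0, sigma2=3.0, n1=16, n2=9)

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # d_crit(alpha/2) for the z distribution
reject = abs(z_obt) >= z_crit                  # two-sided rule
print(f"SE={se:.4f}  z_obt={z_obt:.4f}  z_crit={z_crit:.4f}  reject={reject}")
```

With these numbers both variance terms equal 1, so the standard error and $z_{obt}$ are both $\sqrt{2}$ and the two-sided test does not reject at the 5% level.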
2. $\sigma_1$ and $\sigma_2$ are unknown, but $n_1$ and $n_2$ are large ($\geq 30$)
In this case the Normality assumption can be omitted.
$$SE(m_1 - m_2) = \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2} \approx \sqrt{SD_1^2/n_1 + SD_2^2/n_2}$$
and the test statistic
$$d_{obt} = z_{obt} = \frac{m_1 - m_2}{\sqrt{SD_1^2/n_1 + SD_2^2/n_2}}$$
approximately follows the standard Normal distribution $z$.
3. $\sigma_1$ and $\sigma_2$ are unknown, and $n_1$ and $n_2$ are not large enough ($< 30$)
a) $\sigma_1^2 = \sigma_2^2 = \sigma^2$ (unknown)
$$SE(m_1 - m_2) = \sqrt{\sigma^2 (1/n_1 + 1/n_2)}$$
with the pooled estimate of $\sigma^2$:
$$\sigma^2_{pooled} = \frac{(n_1 - 1) SD_1^2 + (n_2 - 1) SD_2^2}{n_1 + n_2 - 2}$$
and the test statistic
$$d_{obt} = t_{obt} = \frac{m_1 - m_2}{\sqrt{\sigma^2_{pooled} (1/n_1 + 1/n_2)}}$$
follows the t distribution with $df = n_1 + n_2 - 2$ degrees of freedom.
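The pooled computation can be sketched in Python as follows. The helper name `pooled_t` and the numbers are illustrative assumptions. The sketch returns only the statistic and its degrees of freedom; the critical value $t_{crit}$ would still be looked up in a t table or computed with statistical software.

```python
from math import sqrt

def pooled_t(m1, m2, sd1, sd2, n1, n2):
    """Return (t_obt, df, pooled variance) assuming equal population variances."""
    df = n1 + n2 - 2
    var_pooled = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = sqrt(var_pooled * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df, var_pooled

# Hypothetical small samples: both groups have n = 8 and SD = 2.
t_obt, df, var_pooled = pooled_t(m1=10.0, m2=8.0, sd1=2.0, sd2=2.0, n1=8, n2=8)
print(f"t_obt={t_obt:.4f}  df={df}  pooled variance={var_pooled:.4f}")
```

When both sample standard deviations are equal, the pooled variance reduces to that common value squared, which makes this example easy to check by hand.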
b) $\sigma_1^2 \neq \sigma_2^2$ are unknown
$$SE(m_1 - m_2) = \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2} \approx \sqrt{SD_1^2/n_1 + SD_2^2/n_2}$$
and the test statistic
$$d_{obt} = t_{obt} = \frac{m_1 - m_2}{\sqrt{SD_1^2/n_1 + SD_2^2/n_2}}$$
follows the t distribution with degrees of freedom
$$df = \frac{(SD_1^2/n_1 + SD_2^2/n_2)^2}{\dfrac{(SD_1^2/n_1)^2}{n_1 - 1} + \dfrac{(SD_2^2/n_2)^2}{n_2 - 1}} \quad \text{(estimated by MATLAB)}$$
or alternatively $df = \min(n_1 - 1, n_2 - 1)$.
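The two choices of degrees of freedom for the unequal-variance case can be compared with a short Python sketch. The function names and sample values are illustrative assumptions. The first function implements the ratio formula, and the second the simpler $\min(n_1 - 1, n_2 - 1)$ rule.

```python
def welch_df(sd1, sd2, n1, n2):
    """Degrees of freedom from the ratio formula for unequal variances."""
    v1 = sd1**2 / n1   # estimated variance of m1
    v2 = sd2**2 / n2   # estimated variance of m2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

def conservative_df(n1, n2):
    """Simpler alternative: the smaller of the two one-sample df values."""
    return min(n1 - 1, n2 - 1)

# Hypothetical samples with clearly different spreads.
df_formula = welch_df(sd1=2.0, sd2=4.0, n1=10, n2=16)
df_simple = conservative_df(10, 16)
print(f"formula df={df_formula:.2f}  min-rule df={df_simple}")
```

The formula value is generally not an integer and never falls below the min-rule value, so the min rule gives a larger critical value and a more cautious test.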
If instead of testing (1), we want to test
$$H_0: \mu_1 - \mu_2 = d \qquad (2)$$
against
$$H_1: \mu_1 - \mu_2 \neq d \ (\text{or } < d \text{ or } > d)$$
use the test statistic
$$d_{obt} = \frac{m_1 - m_2 - d}{SE(m_1 - m_2)}.$$
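A minimal Python sketch of the shifted statistic, with a hypothetical helper name and made-up numbers; the standard error is assumed to have been computed already by whichever case applies.

```python
def shifted_statistic(m1, m2, se, d=0.0):
    """Test statistic for H0: mu1 - mu2 = d; d = 0 recovers hypothesis (1)."""
    return (m1 - m2 - d) / se

# Hypothetical values: observed difference 3 with standard error 1.5.
stat_zero = shifted_statistic(12.0, 9.0, 1.5)          # tests mu1 - mu2 = 0
stat_three = shifted_statistic(12.0, 9.0, 1.5, d=3.0)  # tests mu1 - mu2 = 3
print(stat_zero, stat_three)
```

The same data that give a large statistic against a null difference of 0 give a statistic of 0 when the hypothesized difference equals the observed one.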
II. Proportions
For a random variable $X$ drawn from a Binomial($n_1$, $p_1$) distribution, and an independent random variable $Y$ drawn from a Binomial($n_2$, $p_2$) distribution, let $\hat{p}_1 = X/n_1$ and $\hat{p}_2 = Y/n_2$. For testing
$$H_0: p_1 = p_2 \qquad (3)$$
against
$H_1$: 1) $p_1 \neq p_2$ or
2) $p_1 < p_2$ or
3) $p_1 > p_2$

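use the test statistic
$$d_{obt} = \frac{\hat{p}_1 - \hat{p}_2}{SE(\hat{p}_1 - \hat{p}_2)}.$$
Since for a Binomial($n$, $p$) random variable $X$ and $\hat{p} = X/n$, $Var(\hat{p}) = \dfrac{p(1 - p)}{n}$,
$$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{Var(\hat{p}_1 - \hat{p}_2)} = \sqrt{Var(\hat{p}_1) + Var(\hat{p}_2)} = \sqrt{\frac{p_1 (1 - p_1)}{n_1} + \frac{p_2 (1 - p_2)}{n_2}}.$$
Then the test statistic
$$d_{obt} = z_{obt} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}}}$$
has approximately the Normal $z$ distribution if $n_1 \hat{p}_1$, $n_1 (1 - \hat{p}_1)$, $n_2 \hat{p}_2$, and $n_2 (1 - \hat{p}_2)$ are all $\geq 10$.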
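The two-proportion test can be sketched in Python using only the standard library. The helper name and the counts are illustrative assumptions. The sketch uses the unpooled standard error given above, with sample proportions plugged in, and checks the sample-size condition on the observed counts.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x, n1, y, n2):
    """Return (z_obt, SE, size_ok) for success counts x of n1 and y of n2."""
    p1_hat, p2_hat = x / n1, y / n2
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    # n*p_hat and n*(1 - p_hat) are the observed success and failure counts.
    size_ok = all(count >= 10 for count in (x, n1 - x, y, n2 - y))
    return (p1_hat - p2_hat) / se, se, size_ok

# Hypothetical counts: 60 of 100 successes versus 45 of 100.
z_obt, se, size_ok = two_proportion_z(x=60, n1=100, y=45, n2=100)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z_obt)))
print(f"z_obt={z_obt:.3f}  SE={se:.4f}  size_ok={size_ok}  p={p_two_sided:.4f}")
```

The two-sided p-value is an addition for convenience; comparing $|z_{obt}|$ with $z_{crit}(\alpha/2)$ leads to the same decision.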
III. Dependent samples (Paired data)
For paired measurements $(X_1, Y_1), \dots, (X_n, Y_n)$ (e.g., measurements before and after) the previous theory does not hold.
If the sample $X_1, \dots, X_n$ (an independent sample from a Normal($\mu_1$, $\sigma_1^2$) distribution) is not independent of the sample $Y_1, \dots, Y_n$ (an independent sample from a Normal($\mu_2$, $\sigma_2^2$) distribution), then
$$SE(m_1 - m_2) = \sqrt{Var(m_1 - m_2)} = \sqrt{Var(m_1) + Var(m_2) - 2\,Cov(m_1, m_2)} \neq \sqrt{Var(m_1) + Var(m_2)}$$
since $Cov(m_1, m_2) \neq 0$ for non-independent data!
In such a case, testing (1) is equivalent to testing a one-sample hypothesis for the data $D_1 (= X_1 - Y_1), \dots, D_n (= X_n - Y_n)$:
$$H_0: \mu_D = 0 \qquad (4)$$
against
$H_1$: 1) $\mu_D \neq 0$ or
2) $\mu_D < 0$ or
3) $\mu_D > 0$.
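The reduction to a one-sample problem can be sketched in Python. The handout does not restate the one-sample test here, so this sketch assumes the usual one-sample t statistic, the mean of the differences divided by $SD_D/\sqrt{n}$ with $n - 1$ degrees of freedom. The helper name and the before/after values are made up.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """Return (t_obt, df) for H0: mu_D = 0 using differences D_i = X_i - Y_i."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    se = stdev(d) / sqrt(n)   # stdev uses the n - 1 denominator
    return mean(d) / se, n - 1

# Hypothetical before/after measurements on the same four subjects.
before = [12.0, 15.0, 11.0, 14.0]
after = [10.0, 14.0, 10.0, 11.0]
t_obt, df = paired_t(before, after)
print(f"t_obt={t_obt:.4f}  df={df}")
```

Only the differences enter the computation, so the dependence between the two measurements on each subject no longer needs to be modelled.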
Confidence Intervals
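For a test statistic
$$d_{obt} = \frac{m_1 - m_2}{SE(m_1 - m_2)}$$
we reject $H_0$ if $|d_{obt}| \geq d_{crit}(\alpha/2)$. If
$$-d_{crit}(\alpha/2) < d_{obt} < d_{crit}(\alpha/2)$$
we conclude that the evidence against $H_0$ is not statistically significant at the $\alpha$ significance level.
A confidence interval for $\mu_1 - \mu_2$ is computed by inverting the non-rejection region
$$-d_{crit}(\alpha/2) < \frac{(m_1 - m_2) - (\mu_1 - \mu_2)}{SE(m_1 - m_2)} < d_{crit}(\alpha/2)$$
$$-d_{crit}(\alpha/2)\, SE(m_1 - m_2) < (m_1 - m_2) - (\mu_1 - \mu_2) < d_{crit}(\alpha/2)\, SE(m_1 - m_2)$$
which gives the $(1 - \alpha) 100\%$ confidence interval for $\mu_1 - \mu_2$:
$$\big( (m_1 - m_2) - d_{crit}(\alpha/2)\, SE(m_1 - m_2);\ (m_1 - m_2) + d_{crit}(\alpha/2)\, SE(m_1 - m_2) \big)$$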
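For the cases where $d$ is the standard Normal distribution, the interval can be sketched in Python with the standard library. The helper name and the numbers are illustrative assumptions. For the t-based cases, the critical value would instead come from a t table with the appropriate degrees of freedom.

```python
from statistics import NormalDist

def z_confidence_interval(m1, m2, se, alpha=0.05):
    """(1 - alpha)100% interval for mu1 - mu2 using the z critical value."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    diff = m1 - m2
    return diff - z_crit * se, diff + z_crit * se

# Hypothetical values: observed difference 2 with standard error 1.
lower, upper = z_confidence_interval(m1=52.0, m2=50.0, se=1.0)
print(f"95% CI for mu1 - mu2: ({lower:.3f}; {upper:.3f})")
```

Here $z_{obt} = 2$ exceeds the two-sided 5% critical value, and correspondingly the 95% interval lies entirely above 0, which illustrates the link between the test and the interval.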