
Introductory Statistics

THOMAS H. WONNACOTT
Associate Professor of Mathematics
University of Western Ontario

RONALD J. WONNACOTT
Professor of Economics
University of Western Ontario

JOHN WILEY & SONS, INC.
New York · London · Sydney · Toronto

Copyright © 1969 by John Wiley & Sons, Inc.

All rights reserved. No part of this book may be reproduced by any means, nor transmitted, nor translated into a machine language without the written permission of the publisher.

10 9 8 7 6 5 4 3

Library of Congress Catalog Card Number: 69-16041
SBN 471 95965 0
Printed in the United States of America

INTRODUCTORY STATISTICS

To Monique and Eloise

PREFACE

Our objective has been to write a text that would come into the statistics market between the two texts written by Paul G. Hoel (or the two texts written by John E. Freund). We have tried to cover most of the material in their mathematical statistics books, but we have used mathematics only slightly more difficult than that used in their elementary books. Calculus is used only in sections where the argument is difficult to develop without it; although this puts the calculus student at an advantage, we have made a special effort to design these sections so that a student without calculus can also follow.
By requiring little more mathematics than many other elementary texts, we have been able to treat many important topics normally covered only in books on mathematical statistics: for example, the relation of sampling and inference to the theory of probability and random variables. Another objective has been to show the logical relation between topics that have often appeared in texts as separate and isolated chapters: for example, the equivalence of interval estimation and hypothesis testing, of the t test and F test, and of analysis of variance and regression using dummy variables. In every case our motivation has been twofold: to help the student appreciate, indeed enjoy, the underlying logic, and to help him arrive at answers to practical problems.

We have placed high priority on the regression model, not only because it is widely regarded as the most powerful tool of the practicing statistician, but also because it provides a good focal point for understanding such related techniques as correlation and analysis of variance.

Our original aim was to write an introduction to statistics for economics students, but as our efforts increased, so it seems did our ambitions. Accordingly, this book is now written for students in economics and other social sciences, for business schools, and for service courses in statistics provided by mathematics departments. Some of the topics covered are typically omitted from introductory courses, but are of interest to such a broad audience: for example, multiple comparisons, multiple regression, Bayesian decisions, and game theory.

A statistics text aimed at several audiences raises major problems of evenness and design. The text itself is kept simple, with the more difficult interpretations and developments reserved for footnotes and starred (*) sections. In all instances these are optional; a special effort has been made to allow the student to skip these completely without losing continuity, and some of the finer points are deferred to the instructor's manual. Thus the instructor is allowed, at least to some degree, to tailor the course to his students' background. Problems are also starred (*) if they are more difficult, set with an arrow (=>) if they introduce important ideas taken up later in the text, or bracketed ( ) if they duplicate previous problems and thus provide optional exercise only.

Our experience has been that this is about the right amount of material for a two-semester course; a single-semester introduction is easily designed to include the first 7, 8, or 9 chapters. We have also found that majors in economics who may be pushed a bit harder can cover the first 10 chapters in one semester. This has allowed us in the second semester to use our forthcoming Econometrics text, which provides more detailed coverage of the material in Chapters 11 to 15 of this book, plus additional material on serial correlation, identification, and other econometric problems.

So many have contributed to this book that it is impossible to thank them all individually. However, a special vote of thanks should go, without implication, to the following for their thoughtful reviews: Harvey J. Arnold, David A. Belsley, Ralph A. Bradley, Edward Greenberg, Leonard Kent, R. W. Pfouts, and Franklin M. Fisher. We are also indebted to our teaching assistants and students in both mathematics and economics at the University of Western Ontario and Wesleyan (Connecticut), who suggested many improvements during a two-year classroom test.

London, Ontario, Canada
September, 1968

Thomas H. Wonnacott
Ronald J. Wonnacott

CONTENTS

1 Introduction
    1-1 Example
    1-2 Induction and Deduction
    1-3 Why Sample?
    1-4 How to Sample

2 Descriptive Statistics for Samples
    2-1 Introduction
    2-2 Frequency Tables and Graphs
    2-3 Centers (Measures of Location)
    2-4 Deviations (Measures of Spread)
    2-5 Linear Transformations (Coding)

3 Probability
    3-1 Introduction
    3-2 Elementary Properties of Probability
    3-3 Events and Their Probabilities
    3-4 Conditional Probability
    3-5 Independence
    3-6 Other Views of Probability

4 Random Variables and Their Distributions
    4-1 Discrete Random Variables
    4-2 Mean and Variance
    4-3 Binomial Distribution
    4-4 Continuous Distributions
    4-5 The Normal Distribution
    4-6 A Function of a Random Variable
    4-7 Notation

5 Two Random Variables
    5-1 Distributions
    5-2 Functions of Two Random Variables
    5-3 Covariance
    5-4 Linear Combination of Two Random Variables

6 Sampling
    6-1 Introduction
    6-2 Sample Sum
    6-3 Sample Mean
    6-4 Central Limit Theorem
    6-5 Sampling from a Finite Population, without Replacement
    6-6 Sampling from Bernoulli Populations
    6-7 Summary of Sampling Theory

7 Estimation I
    7-1 Introduction: Confidence Interval for the Mean
    7-2 Desirable Properties of Estimators
    7-3 Maximum-Likelihood Estimation (MLE)

8 Estimation II
    8-1 Difference in Two Means
    8-2 Small Sample Estimation: the t Distribution
    8-3 Estimating Population Proportions: The Election Problem Once Again
    8-4 Estimating the Variance of a Normal Population: The Chi-Square Distribution

9 Hypothesis Testing
    9-1 Testing a Simple Hypothesis
    9-2 Composite Hypotheses
    9-3 Two-Sided Tests vs. One-Sided Tests
    9-4 The Relation of Hypothesis Tests to Confidence Intervals
    9-5 Conclusions

10 Analysis of Variance
    10-1 Introduction
    10-2 One-Factor Analysis of Variance
    10-3 Two-Factor Analysis of Variance

11 Introduction to Regression
    11-1 An Example
    11-2 Possible Criteria for Fitting a Line
    11-3 The Least Squares Solution
    Appendix 11-1 An Alternative Derivation of Least Squares Estimates Without Calculus

12 Regression Theory
    12-1 The Mathematical Model
    12-2 The Nature of the Error Term
    12-3 Estimating α and β
    12-4 The Mean and Variance of a and b
    12-5 The Gauss-Markov Theorem
    12-6 The Distribution of a and b
    12-7 Confidence Intervals and Testing Hypotheses about β
    12-8 Prediction Interval for Y0
    12-9 Dangers of Extrapolation
    12-10 Maximum Likelihood Estimation
    12-11 The Characteristics of the Independent Variable

13 Multiple Regression
    13-1 Introductory Example
    13-2 The Mathematical Model
    13-3 Least Squares Estimation
    13-4 Multicollinearity
    13-5 Interpreting an Estimated Regression
    13-6 Dummy Variables
    13-7 Regression, Analysis of Variance, and Analysis of Covariance

14 Correlation
    14-1 Simple Correlation
    14-2 Partial Correlation
    14-3 Multiple Correlation

15 Decision Theory
    15-1 Prior and Posterior Distributions
    15-2 Optimal Decisions
    15-3 Estimation as a Decision
    15-4 Estimation: Bayesian Versus Classical
    15-5 Critique of Bayesian Methods
    15-6 Hypothesis Testing as a Bayesian Decision
    15-7 Game Theory

Appendix Tables
    Table I Squares and Square Roots
    Table II Random Digits and Normal Variates
    Table III Binomial Coefficients and Probabilities
    Table IV Standard Normal Probabilities
    Table V Student's t Critical Points
    Table VI Modified Chi-Square Critical Points
    Table VII F Critical Points
    Table VIII Common Logarithms

Acknowledgements
Answers to Odd-Numbered Problems
Glossary of Symbols
Index

1
Introduction

The word "statistics" originally meant the collection of population and economic information vital to the state. From that modest beginning, statistics has grown into a scientific method of analysis now applied to all the social and natural sciences, and one of the major branches of mathematics. The present aims and methods of statistics are best illustrated with a familiar example.

1-1 EXAMPLE

Before every presidential election, the pollsters try to pick the winner; specifically, they try to guess the proportion of the population that will vote for each candidate. Clearly, canvassing all voters would be a hopeless task. As the only alternative, they survey a sample of a few thousand, in the hope that the sample proportion will be a good estimate of the total population proportion. This is a typical example of statistical inference, or statistical induction: the (voting) characteristics of an unknown population are inferred from the (voting) characteristics of an observed sample.

As any pollster will admit, it is an uncertain business. To be sure of the population, one has to wait until election day when all votes are counted. Yet if the sampling is done fairly and adequately, we can have high hopes that the sample proportion will be close to the population proportion. This allows us to estimate the unknown population proportion π from the observed sample proportion (P), as follows:

    π = P ± small error        (1-1)

with the crucial questions being, "How small is this error?" and "How sure are we that we are right?"

Since this typifies the very core of the book, we state it more precisely in the language of Chapter 7 (where the reader will find the proof and a fuller understanding). If the sampling is random and the sample size n is large enough, we can state with 95% confidence that

    π = P ± 1.96 √(P(1 − P)/n)        (1-2)

where π and P are the population and sample proportion, and n is the sample size.

As an illustration of how this formula works, suppose we have sampled 1,000 voters, with 600 choosing the Democratic candidate. With this sample proportion of .60, equation (1-2) becomes

    π = .60 ± 1.96 √(.60(1 − .60)/1000)

or approximately

    π = .60 ± .03        (1-3)

Thus, with 95% confidence, we estimate the population proportion voting Democrat to be between .57 and .63.
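The arithmetic behind (1-2) and (1-3) is easy to verify with a short computation. This is only a sketch; the helper name is ours, not the book's:

```python
from math import sqrt

def proportion_interval(p, n, z=1.96):
    # Equation (1-2): pi = P +/- z * sqrt(P(1 - P)/n)
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half

# The election example: 600 Democrats among 1,000 sampled voters, P = .60
low, high = proportion_interval(0.60, 1000)
print(round(low, 2), round(high, 2))  # 0.57 0.63
```

Rounding to two decimals reproduces the interval (.57, .63) quoted in the text.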


This kind of estimate is referred to as a confidence interval, and making estimates of this kind will be one of our major objectives in this book. The other objective is to test hypotheses. For example, suppose we wish to test the hypothesis that the Republican candidate will win the election. On the basis of the information in equation (1-3) we would reject this claim; it is no surprise that a sample result that pointed to a Democratic majority of 57 to 63% of the vote will also allow us to reject the hypothesis of a Republican victory. In general, there is a very close association of this kind between confidence intervals and hypothesis tests; indeed, we will show that in many instances they are equivalent procedures.

We pause to make several other crucial observations about equation (1-3).

1. The estimate is not made with certainty; we are only 95% confident. We must concede the possibility that we are wrong, and wrong because we were unlucky enough to draw a misleading sample. Thus, even if less than half the population is in fact Democratic, it is still possible, although unlikely, for us to run into a string of Democrats in our sample. In such circumstances, our conclusion (1-3) would be dead wrong. Since this sort of bad luck is possible, but not likely, we can be 95% confident of our conclusion.

2. Luck becomes less of a factor as sample size increases; the more voters we canvass, the less likely we are to draw a predominantly Democratic

sample from a Republican population. Hence, the more precise our prediction. Formally, this is confirmed in equation (1-2); in this formula we note that the error term decreases with sample size. Thus, if we increased our sample to 10,000 voters, and continued to observe a Democratic proportion of .60, our 95% confidence interval would become the more precise

    π = .60 ± .01        (1-4)
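The narrowing of the interval from ±.03 to ±.01 is just the 1/√n effect, and can be checked numerically. A minimal sketch (the function name is ours):

```python
from math import sqrt

def error_term(p, n, z=1.96):
    # The +/- term of equation (1-2)
    return z * sqrt(p * (1 - p) / n)

print(round(error_term(0.60, 1000), 2))   # 0.03
print(round(error_term(0.60, 10000), 2))  # 0.01
```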
3. Suppose our employer indicates that 95% confidence is not good enough. "Come back when you are 99% sure of your conclusion." We now have two options. One is to increase our sample size; as a result of this additional cost and effort we will be able to make an interval estimate with the precision of (1-4), but at a higher level of confidence. But if the additional resources for further sampling are not available, then we can increase our confidence only by making a less precise statement, i.e., that the proportion of Democrats is

    π = .60 ± .02

The less we commit ourselves to a precise prediction, the more confident we can be that we are right. In the limit, there are only two ways we can be certain of avoiding an erroneous conclusion. One is to make a statement so imprecise that it cannot be contradicted.¹ The other is to sample the whole population²; but this is not statistics ... it is just counting. Meaningful statistical conclusions must be prefaced by some degree of uncertainty.

1-2 INDUCTION AND DEDUCTION

Figure 1-1 illustrates the difference between inductive and deductive reasoning. Induction involves arguing from the specific to the general, i.e., from the sample to the population. Deduction is the reverse: arguing from the general to the specific, i.e., from the population to the sample.³ Equation (1-1) represents inductive reasoning; we are arguing from a sample proportion to a population proportion. But this is only possible (in our case) if we study the simpler problem of deduction first.

[FIG. 1-1 Induction and deduction contrasted. (a) Induction (statistical inference): the sample is known, and the population is inferred. (b) Deduction (probability): the population is known, and the sample behavior is deduced.]

Specifically, in equation (1-1), we note that the inductive statement (that the population proportion can be inferred from the sample proportion) is based on a prior deduction (that the sample proportion is likely to be close to the population proportion). Chapters 2 through 5 are devoted to deduction. This involves, for example, the study of probability, which is useful for its own sake (e.g., in game theory).

¹ E.g., π = .50 ± .50.

² Or, almost the whole population. Thus it would not be necessary to poll the whole population to determine the winner of an election; it would only be necessary to continue canvassing until one candidate comes up with a majority. (It is always possible, of course, that some people change their mind between the sample survey and their actual vote, but we don't deal with this issue here.)

³ The student can easily keep these straight with the help of a little Latin, and recognition that the population is the point of reference. The prefix in means "into" or "towards"; thus induction is arguing towards the population. The prefix de means "away from"; thus deduction is arguing away from the population. Finally, statistical inference is based on induction.

But probability is even more useful as the basis for statistical induction, dealt with in Chapters 6 through 10. In short, in the first 6 chapters we ask the deductive question, "With a known population, how will a sample behave? Will the sample be on target?" Only when this issue is resolved can we turn the argument around and ask, "How precisely can we make statistical inferences about an unknown population from an observed sample?"

1-3 WHY SAMPLE?

We may study a sample, rather than the whole population, for any one of three reasons:

(1) Limited resources.
(2) Limited data available.
(3) Destructive testing.

1. Limited resources almost always play some part. In our example of preelection polls, funds were not available to observe the whole population; but this is not the only reason for sampling.

2. Sometimes there is only a small sample available, no matter what cost may be incurred.

For example, an anthropologist may wish to test the theory that the two civilizations on islands A and B have developed independently, with their own distinctive characteristics of weight, height, etc. But there is no way in which he can compare the two civilizations in toto. Instead he must make an inference from the small sample of the 50 surviving inhabitants of island A and the 100 surviving inhabitants of island B. The sample size is fixed by nature, rather than by the researcher's budget.
There are many examples in business. An allegedly more efficient machine may be introduced for testing, with a view to the purchase of additional similar units. The manager of quality control simply cannot wait around to observe the entire population this machine will produce. Instead a sample run must be observed, with the decision on efficiency based on an inference from this sample.
3. Sampling may involve destructive testing. For example, suppose we have produced a thousand light bulbs and wish to know their average life. It would be folly to insist on observing the whole population of bulbs until they burn out.

1-4 HOW TO SAMPLE

In statistics, as in business or any other profession, it is essential to distinguish between bad luck and bad management. For example, suppose a man bets you $100 at even odds that you will get an ace (i.e., 1 dot) in rolling a die. You accept the challenge, roll an ace, and he wins. He's a bad manager and you're a good one; he has merely overcome his bad management with extremely good luck. Your only defense against this combination is to get him to keep playing the game with your dice.
If we now return to our original example of preelectionpolls, we note
that
the sample proportion of Democrats may
badly
misrepresent
the
population proportion for either
(or both)
of these reasons. No matter
how
well managed
and designed our sampling
procedure
may
be, we may be
unlucky
enough
to turn up a Democratic sample from
a Republican
population. Equation (1-2) relates to this case; it is assumed that the only complication is the luck of the draw,
and not mismanagement.
From that equation
we confirm
that the best defense against
bad
luck is to \"keep playing\";
by
increasing
our sample size, we improve the reliability
of our estimate.
The other problem is that sampling
can be badly mismanaged or biased.
For example, in sampling a population of voters, it is a mistake to take their
names from
a phone
book,
since poor voters who
often
cannot
afford
telephones are badly underrepresented.
Other
examples
of biased samples are easy to find
and
often
amusing.
\"Straw polls\" of peopleon the street are often biased because the interviewer
tends to selectpeoplethat seem civil and well dressed; the surly worker
or
harassed
mother is overlooked. A congressman
can not rely on his mail
as
an unbiased
sample of his constituency, for this is a sample of people with
strong
opinions,
and includes an inordinate number of cranksand members
of pressure
groups.
The simplest way to ensure an unbiased sample is to give each member
of the population an equal chance of being included in the sample.
This, in
fact, is our definition
of a \"random\"
sample. 4 For a sample to be random,
it
cannot
be chosen
in a sloppy or haphazard
way;
it must be carefully
designed. A sample of the first thousand people encountered on a New York
street corner will not be a random sample of the U.S. population.
Instead,
it is necessary to draw
some
of our sample from the West, some from
the
East,
and so on. Only if our sample
is randomized, will it be free of bias and,
equally
important,
only then will
it satisfy
the assumptions
of probability
theory,
and
allow
us to make scientific inferences
of the form of (1-2).
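The definition above (every member of the population gets an equal chance of inclusion) is exactly what a standard library routine for sampling without replacement provides. A toy sketch; the population of ID numbers is hypothetical:

```python
import random

# Hypothetical population: ID numbers standing in for all voters.
population = range(100000)

# Simple random sample of 1,000, drawn without replacement:
# every ID has the same chance of being included.
sample = random.sample(population, 1000)

print(len(sample), len(set(sample)))
```

Drawing without replacement guarantees that no member appears twice in the sample.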
In some circumstances, the only available sample will be a nonrandom one. While probability theory often cannot be strictly applied to such a sample, it still may provide the basis for a good educated guess, or what we might term the art of inference. Although this art is very important, it cannot be taught in an elementary text; we, therefore, consider only scientific inference, based on the assumption that samples are random. The techniques for ensuring this are discussed further in Chapter 6.

⁴ Strictly speaking, this is called "simple random sampling," to distinguish it from more complex types of random sampling.

FURTHER READINGS

For the reader who wishes a more extensive introduction, we highly recommend the following:

1. Huff, Darrell, "How to Lie with Statistics." New York: Norton, 1954.
2. Huff, Darrell, "How to Take a Chance." New York: Norton, 1957.
3. Wallis, W. A., and Roberts, H. V., "The Nature of Statistics." Free Press Paperback, 1956.
4. McDonald, J., and Osborn, R., "Strategy in Poker, Business, and War." New York: Norton, 1950.
5. Slonim, M. J., "Sampling." Simon and Schuster Paperback, 1966.

2
Descriptive Statistics for Samples

2-1 INTRODUCTION

We have already discussed the very simple example of Chapter 1, in which the pollster would record the answers of the 1000 people in his sample, obtaining a sequence such as D D R D R ..., where D and R represent Democrat and Republican. The primary purpose of statistics is to make an inference to the whole population from a sample. As a preliminary step, the sample must be simplified and reduced to a few descriptive numbers; each is called a sample statistic.¹ In the simple example of Chapter 1, the best way of describing the sample by a single number is the statistic P, the sample proportion of Democrats; this will be used to make an inference about the population proportion π.

Admittedly, computing this sample statistic was trivial: the sample proportion (.60) required only a count of the number voting Democrat (600), followed by division by sample size (n = 1,000). We now turn to the more substantial computations required to describe two other samples:

(a) The results when a die is thrown 50 times. Each time we toss the die, we record the number of dots; this is called a "discrete" random variable X, because it assumes only a finite (or countably infinite) number of values: 1, 2, ..., 6.

(b) The average height of a sample of 200 American men.

¹ Later, we shall have to define a statistic more rigorously; but for now, this will suffice.
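The Chapter 1 statistic really is just a count and a division. As a sketch (the response list is illustrative, not real poll data):

```python
# Illustrative responses: 600 Democrats, 400 Republicans.
answers = ["D"] * 600 + ["R"] * 400

# The sample statistic P of Chapter 1: count, then divide by n.
P = answers.count("D") / len(answers)
print(P)  # 0.6
```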

2-2 FREQUENCY TABLES AND GRAPHS

(a) Discrete Example

The 50 throws of the die yield a string of 50 numbers, such as given in Table 2-1. To simplify, we keep a running tally of each of the six possible outcomes in Table 2-2. In column 3 we note, for example, that 9 is the frequency f (or total number of times) that we rolled a 1; i.e., we obtained this outcome on 9/50 of our tosses. Formally, this proportion (.18) is called the relative frequency (f/n); it is computed in column 4.

[TABLE 2-1 Results of Tossing a Die 50 Times]

TABLE 2-2 Calculation of the Frequency, and Relative Frequency, of the Number of Dots in 50 Tosses of a Die

    (1)              (2)              (3)            (4)
    Number of Dots   Tally            Frequency (f)  Relative Frequency (f/n)
    1                ///// ////       9              .18
    2                ///// ///// //   12             .24
    3                ///// ///        8              .16
    4                /////            5              .10
    5                ///// /          6              .12
    6                ///// /////      10             .20
                                      Σf = 50 = n    Σ(f/n) = 1.00

The information in column 3 is called a "frequency distribution," and is graphed in Figure 2-1. The relative frequency distribution in column 4 can be similarly graphed; the student who does so will note that the two graphs are identical except for the vertical scale. Hence, a simple change of the vertical scale transforms Figure 2-1 into a relative frequency distribution. This gives us an immediate picture of the sample result.
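The tally-and-divide routine of Table 2-2 is easy to mechanize. A sketch with simulated tosses (the outcomes below are randomly generated, not the book's actual 50 throws):

```python
import random
from collections import Counter

random.seed(1)  # reproducible illustration
tosses = [random.randint(1, 6) for _ in range(50)]

freq = Counter(tosses)                           # column 3: frequency f
rel_freq = {x: f / 50 for x, f in freq.items()}  # column 4: f/n

print(sum(freq.values()), sum(rel_freq.values()))
```

As in the table, the frequencies sum to n and the relative frequencies sum to 1.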

(b) Continuous Example

Suppose that a sample of 200 men is drawn from a certain population, with the height of each recorded in inches. The ultimate aim will be an inference about the average height of the whole population; but first we must efficiently summarize and describe our sample.

[FIG. 2-1 Frequency and relative frequency distribution of the results of 50 tosses of a die.]

In this example,
height
(in inches) is our random
variable
X. In this
case, X is continuous; thus an individual's
height might be any value,
such
as 64.328 inches. 2 It no longer makes sense to talk about the frequency
of
this specific value of X; chances are we'll never
again
observe
anyone exactly
64.328 inches tall. Instead we can tally the frequency of heights within
a
cell, as in Table 2-3.

TABLE 2-3 Frequency, and Relative Frequency, of the Heights of a Sample of 200 Men

    (1)               (2)            (3)            (4)
    Cell Boundaries   Cell Midpoint  Frequency (f)  Relative Frequency (f/n)
    55.5-58.5         57             2              .010
    58.5-61.5         60             7              .035
    61.5-64.5         63             22             .110
    64.5-67.5         66             13             .065
    67.5-70.5         69             44             .220
    70.5-73.5         72             36             .180
    73.5-76.5         75             32             .160
    76.5-79.5         78             13             .065
    79.5-82.5         81             21             .105
    82.5-85.5         84             10             .050
                                     Σf = 200 = n   Σ(f/n) = 1.000

² We shall overlook the fact that the measured height is rounded to a few decimal places at most, and is therefore in practice discrete, although height is conceptually continuous.
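Under the cell scheme of Table 2-3 (cells of width 3, starting at 55.5), assigning an observation to its cell is a single integer division. A sketch; the function name is ours:

```python
def cell_index(height, lowest=55.5, width=3.0, n_cells=10):
    # Returns the 0-based cell number, or None if out of range.
    i = int((height - lowest) // width)
    return i if 0 <= i < n_cells else None

# 64.328 inches falls in the third cell, 61.5-64.5 (index 2).
print(cell_index(64.328))  # 2
```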

1. The grouping of the observations into cells is illustrated in Figure 2-2, where each of the 200 observations is represented by a dot. For simplicity, all sample values have been rounded off to the nearest integer, rather than being recorded exactly. (Rounding may, in fact, be regarded as a preliminary grouping of the data into cells of width 1.)

[FIG. 2-2 Grouping the 200 observations of height into cells (height axis: 57, 60, 63, ..., 84 inches).]

2. Each observation in a cell (e.g., 58.5" to 61.5") is then regarded as being at the cell midpoint (e.g., 60), a convenient whole number. The cells have been chosen somewhat arbitrarily, but with this convenience in mind; the number of cells is a reasonable compromise between too much detail and too little. The frequency and relative frequency of each cell are then tabulated as before, as in Table 2-3.

The grouped data is graphed in Figure 2-3. This frequency distribution, a so-called histogram, uses bars to represent frequencies, a reminder that the observations occurred throughout the cell, and not just at the midpoint.

We now turn to the question of how we may characterize a sample frequency distribution with a single descriptive measure, or sample statistic.

[FIG. 2-3 The frequency and relative frequency distribution of a sample of the heights of 200 men.]

2-3 CENTERS (MEASURES OF LOCATION)

In fact, there are two highly useful concepts for describing a frequency distribution: the first is the central point of the distribution, and the second is its spread. There are several different concepts of the "center" of a distribution. Three of these, the mode, the median, and the mean, are discussed below. We shall start with the simplest.

(a) The Mode

This is defined as the most frequent value.³ In our example of heights, the mode is 69 inches, since this cell has the greatest frequency, or highest bar in Figure 2-3. Generally, the mode is not a good measure of central tendency, since it often depends on the arbitrary grouping of the data. (The student will note that, by redefining cell boundaries, the mode can be shifted up or down considerably.) It is also possible to draw a sample where the largest frequency (highest bar in the group) occurs at two (or even more) heights; this unfortunate ambiguity is left unresolved, and the distribution is called "bimodal."

³ "Mode" means fashion, in French.

(b) The Median

This is the 50th percentile, i.e., the value below which half the values in the sample fall. Since it splits the observations into two halves, it is sometimes called the middle value. In the sample of 200 shown in Figure 2-2, the median (say, 71.46) is most easily derived by reading off the 100th value⁴ from the left; but if the only information available is the frequency distribution in Figure 2-3, the median must be calculated by choosing an appropriate value within the median cell.⁵

⁴ Or the 101st value. This ambiguity is best resolved by defining the median as the average of the 100th and 101st values. In a sample with an odd number of observations, this ambiguity does not arise.

⁵ The median cell is clearly the 6th, since this leaves 44% (i.e., 88) of the sample values below and 38% (i.e., 76) above. The median value can be closely approximated by moving through this median cell from left to right to pick up another 6% of the observations. Since this cell includes 18% of the observations, we move 6/18 of the way through this cell. Thus our median approximation is 70.5 + (6/18 × 3) = 71.5.
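The interpolation of footnote 5 generalizes to any grouped distribution. A sketch, using the frequencies as reconstructed in Table 2-3 (the helper function is ours):

```python
def grouped_median(lower_bounds, width, freqs, n):
    # Walk the cells until the one containing the n/2-th observation,
    # then move proportionately through it (footnote 5's scheme).
    cum = 0
    for lb, f in zip(lower_bounds, freqs):
        if cum + f >= n / 2:
            return lb + (n / 2 - cum) * width / f
        cum += f

bounds = [55.5 + 3 * i for i in range(10)]      # 55.5, 58.5, ..., 82.5
freqs = [2, 7, 22, 13, 44, 36, 32, 13, 21, 10]  # Table 2-3 as reconstructed
print(grouped_median(bounds, 3, freqs, 200))  # 71.5
```

Entering the sixth cell (70.5) with 88 observations behind us, we pick up the remaining 12 of the first 100 observations by moving 12/36 of the cell width, reproducing the 71.5 of footnote 5.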

(c) The Mean

This is sometimes called the arithmetic mean, or simply the average; it is the most common central measure. The original observations (X₁, X₂, ..., Xₙ) are simply summed, then divided by n. Thus

    X̄ ≡ (1/n)(X₁ + X₂ + ⋯ + Xₙ)        (2-1a)

where Xᵢ represents the ith value of X, and ≡ means "equals, by definition."

The average height of our sample could be computed by summing all 200 observations and dividing by 200. However, this tedious calculation can be greatly simplified by using the grouped data in Table 2-3. Let f₁ be the number of observations in cell 1, where each observation may be approximated⁶ by the cell midpoint, x₁. Similar approximations hold for all the other cells too, so that

    X̄ ≅ (1/n)[(x₁ + x₁ + ⋯ + x₁) + (x₂ + x₂ + ⋯ + x₂) + ⋯]
              (f₁ times)             (f₂ times)

    X̄ ≅ (1/n)(f₁x₁ + f₂x₂ + ⋯ + f₁₀x₁₀)

In general,

    X̄ ≅ Σᵢ₌₁ᵐ xᵢ(fᵢ/n)        (2-1b)

where m is the number of cells.

⁶ In approximating each observed value by the midpoint of its cell, we sometimes err positively, sometimes negatively; but unless we are very unlucky, these errors will tend to cancel. Even in the unluckiest case, the error must be smaller than half the cell width. Note that the cell midpoints are designated by the small xᵢ, to distinguish them from the observed values of X.

CENTERS
where

(f\177/n) =

numbe r

We

formulation

frequency

relative

equation

this

(2'Ia), appropriatefor
calculation ifof (2-Ib)i
s based on the
3 ot

column

value

each x
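Equation (2-1b) can be checked in a few lines of code. The midpoints and frequencies below are invented for illustration; they are not the book's Table 2-3.

```python
# A sketch of equation (2-1b): the grouped mean is a frequency-weighted
# average of the cell midpoints. Illustrative data, not the book's.
midpoints   = [60, 63, 66, 69, 72]
frequencies = [ 3,  7, 10,  6,  4]

n = sum(frequencies)
grouped_mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n

# Equivalent "weighted average" form using relative frequencies f_i / n:
weighted_mean = sum((f / n) * x for f, x in zip(frequencies, midpoints))

print(grouped_mean, weighted_mean)  # the two forms are identical
```

The two formulations agree exactly, since Σ(fᵢ/n)xᵢ is just (1/n)Σfᵢxᵢ with the constant factored differently.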

(d) Comparison of Mean, Median, and Mode

These three measures of center are compared in Figure 2-4. In part a we show a distribution which has a single peak and is symmetric (i.e., one half is the mirror image of the other); in this case all three central measures coincide at the point of symmetry. But when the distribution is skewed to the right, as in b, the median falls to the right of the mode; with the long scatter of observed values strung out in the right-hand tail, it is generally necessary to move from the mode to the right to pick up half the observations. Moreover, the mean will generally lie even further to the right, as explained in the next section.

FIG. 2-4 (a) A symmetric distribution with a single peak. The mode, median, and mean coincide at the point of symmetry. (b) A right-skewed distribution, showing mode < median < mean.
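The ordering mode < median < mean for a right-skewed sample can be seen on a small invented data set (not the book's data):

```python
# Illustrative only: a small right-skewed sample. The long tail of large
# values pulls the mean above the median, and the median sits above the
# most frequent value (the mode).
from statistics import mean, median, mode

skewed = [1, 1, 1, 2, 2, 3, 4, 7, 15]

print(mode(skewed), median(skewed), mean(skewed))
```

Here the mode is 1, the median is 2, and the mean is 4, exactly the ordering shown in Figure 2-4b.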

Interpreting the Mean by an Analogy from Physics. The 200 observations in the sample of heights appear in Figure 2-2 as points along the X-axis. If we think of these observations as masses (each observation a one-pound mass, for example), and the X-axis as a weightless supporting rod, we might ask where this rod balances. Our intuition suggests "the center." The precise balancing point, also called the center of gravity, is given by the formula

    (1/n) Σ Xᵢ

which is exactly the formula for the mean. Thus we are quite justified in thinking of the sample mean as the "balancing point" of the data, and representing it in graphs as a fulcrum.

It can easily be seen why the mean lies to the right of the median in a right-skewed distribution, as shown in Figure 2-4b. Experiment by trying to balance at the median. Fifty percent of the observed values now lie on either side, but the observations to the right tend to be further distant, tilting the distribution to the right. Balance can be achieved only by placing the fulcrum (mean) to the right of the median.
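The fulcrum argument can be checked numerically. The sketch below (invented data, not the text's) shows that deviations about the mean sum to zero, so the "rod" balances there, while deviations about the median of a right-skewed sample do not.

```python
# A numerical sketch of the fulcrum analogy: the total "torque" about the
# mean is zero, while about the median of a right-skewed sample it is
# positive (the rod tips toward the long right tail).
data = [1, 1, 2, 2, 3, 4, 12]          # skewed to the right
n = len(data)
mean_ = sum(data) / n                  # 25/7
median_ = sorted(data)[n // 2]         # 2

torque_about_mean = sum(x - mean_ for x in data)
torque_about_median = sum(x - median_ for x in data)

print(torque_about_mean)               # essentially zero: balance
print(torque_about_median)             # positive: tips to the right
```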

PROBLEMS

2-1 Show the mean, mode, and median of our example in Figure 2-3. Is the mode a good central measure in this case?

2-2 Find the mean, median, and mode of the following sample of litter sizes. Graph the frequency distribution.

    4  10  11  1  10  9  15  12  14  10  12  14

2-3 Sort the following data into 8 cells whose midpoints are 55, 60, ..., 90. Graph the frequency distribution. Approximately what are the mean, median, and mode?

    56.02  55.31  81.47  64.90  70.88  86.02  77.25  76.73  84.21
    84.92  90.23  78.01  88.05  73.37  87.09  57.41  85.43  74.76
    86.51  86.37  76.15  88.64  84.71  66.0   83.91  77.78  90.

2-4 Sort the data of Problem 2-3 into 4 cells whose midpoints are 60, 70, 80, 90. Then answer the same questions as in Problem 2-3.
(a) Do you get approximately the same answers with this coarse grouping as with the fine grouping of Problem 2-3, and as with the exact values computed from the original data?
(b) Will coarse grouping always give worse approximations (for the mean and median) than fine grouping, or will it do so usually?

2-5 Summarize the answers to the previous two problems in the following table.

              Original Data      Fine Grouping     Coarse Grouping
              (Exact Values)     (Problem 2-3)     (Problem 2-4)
    Mean
    Median
    Mode

2-4 DEVIATIONS (MEASURES OF SPREAD)

Although the average may be the most important characteristic of the sample, it is not the only one. It is also important to know how spread out or varied the observations are.

(a) The Range

The range is simply the distance between the largest and smallest observations:

    Range ≜ largest observation − smallest observation

It is not the most important measure of spread, but it is the simplest. For men's heights, the range is 30. The range may be fairly criticized on the grounds that it tells us nothing about the distribution except where its two ends lie; and these two extreme values may be very unreliable. We therefore turn to measures of spread that take account of all the observations.

The average deviation, as its name implies, is found by calculating the deviation of each observed value (Xᵢ) from the mean (X̄); these deviations are then averaged by summing and dividing by n:

    Average deviation ≜ (1/n) Σ (Xᵢ − X̄)    (2-2)

Although this sounds like a promising measure, in fact it is worthless; positive deviations always cancel negative deviations, leaving an average of exactly zero:

    (1/n) Σ (Xᵢ − X̄) = (1/n) Σ Xᵢ − (1/n)(nX̄) = X̄ − X̄ = 0    (2-3)

This sign problem can be avoided by ignoring all negative signs and taking the average of the absolute values of the deviations, as follows.

(b) The Mean Absolute Deviation

    MAD ≜ (1/n) Σ |Xᵢ − X̄|    (2-4)

Intuitively, this is a good measure of spread; the problem is that it is mathematically intractable.(8) We therefore turn to an alternative means of avoiding the sign problem: squaring each deviation.

(c) Mean Squared Deviation (MSD)

    MSD ≜ (1/n) Σ (Xᵢ − X̄)²    (2-5a)

If we use grouped data as in Table 2-4, this formula becomes

    MSD ≈ (1/n) Σ fᵢ(xᵢ − X̄)²    (2-5b)

This is a good measure, provided we wish only to describe the sample. But typically we shall want to go one step further, and use this sample to make a statistical inference about the population. For this purpose it is better to use the divisor n − 1 rather than n.(9) The resulting sample statistic is referred to as the variance.

(d) Variance

    s² ≜ (1/(n − 1)) Σ (Xᵢ − X̄)²    (2-6)

(8) One difficulty is the problem of differentiating the absolute value function.
(9) Technically, this makes the sample variance an unbiased estimator of the population variance. See Chapter 7.
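The only difference between the MSD of (2-5a) and the variance s² of (2-6) is the divisor. A quick sketch on invented data (not the book's):

```python
# Contrasting the MSD (divisor n) with the variance s^2 (divisor n - 1).
# Illustrative data only.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
n = len(data)
xbar = sum(data) / n                                   # 5.0

sum_sq_dev = sum((x - xbar) ** 2 for x in data)        # 24.0
msd = sum_sq_dev / n                                   # 4.0
variance = sum_sq_dev / (n - 1)                        # 4.8

print(xbar, msd, variance)  # the variance is slightly larger than the MSD
```

The variance always exceeds the MSD by the factor n/(n − 1), a difference that matters less and less as the sample grows.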

The values of the MSD and s² are calculated in Table 2-4, again exploiting the simplicities of grouped data.(10)

(e) Standard Deviation, s

This is the square root of the variance:

    s ≜ √s² = √[(1/(n − 1)) Σ (Xᵢ − X̄)²]    (2-10)

Note that by taking the square root, we compensate for having squared terms in defining the variance in (2-6), so that s is reduced to the same units as the X observations.

In conclusion, the sample mean X̄ is the most common measure of center, and the sample standard deviation s is the most common measure of spread. Borrowing the language of physics, we refer to X̄ and s² as the first and second moments of the sample.

(10) These grouped calculations closely parallel the grouped calculation of X̄ in Table 2-4.

A computational note: s² is often computed from the following easier formula:

    s² = (1/(n − 1))(Σ Xᵢ² − nX̄²)    (2-7a)

which for grouped data becomes

    s² ≈ (1/(n − 1))(Σ fᵢxᵢ² − nX̄²)    (2-7b)

Proof that (2-6) and (2-7a) are equivalent:

    Σ (Xᵢ − X̄)² = Σ (Xᵢ² − 2X̄Xᵢ + X̄²)
                 = Σ Xᵢ² − 2X̄ Σ Xᵢ + nX̄²
                 = Σ Xᵢ² − 2nX̄² + nX̄²
                 = Σ Xᵢ² − nX̄²

PROBLEMS

2-6 Compute the variance of the data of Problem 2-2:
(a) Using the definition (2-6).
(b) Using the easier formula (2-7a).

2-7 For the grouped data of Problem 2-4, compute the range, mean absolute deviation, MSD, variance, and standard deviation.

2-8 For the grouped data of Problem 2-3, compute the variance and standard deviation.

2-5 LINEAR TRANSFORMATIONS (CODING)

(a) Change of Origin

Suppose that the men's heights in our example are measured relative to a "norm" of 69 inches (i.e., 5 feet 9 inches). Since Xᵢ denotes the old height in inches (e.g., 83), let Xᵢ′ denote the new measurement. The two measures are related by the equation

    Xᵢ′ = Xᵢ − 69    (2-12)

In nonmathematical terms, this new measurement is simply "the number of inches an individual is taller (+) or shorter (−) than 69 inches." It is easy to guess that the mean using this new measure is just 69 less than the mean using the old measure, i.e.:

    X̄′ = X̄ − 69    (2-13)

On the other hand, the spread will be exactly the same, regardless of which measure is used, i.e.:

    s_X′ = s_X

These two points are illustrated in Figure 2-5, and may be stated in theorem form as follows:

Theorem I. If

    Xᵢ′ = Xᵢ − a    (2-14)

then

    X̄′ = X̄ − a    (2-15)

and

    s_X′ = s_X    (2-16)

FIG. 2-5 Change of origin (shift).

Proof. To prove (2-15), consider

    X̄′ = (1/n) Σ Xᵢ′ = (1/n) Σ (Xᵢ − a) = (1/n)(Σ Xᵢ) − (1/n)(na) = X̄ − a

To prove (2-16), it will be enough to prove the equality of the variances. Using (2-14) and (2-15),

    s_X′² = (1/(n − 1)) Σ (Xᵢ′ − X̄′)²
          = (1/(n − 1)) Σ [(Xᵢ − a) − (X̄ − a)]²
          = (1/(n − 1)) Σ (Xᵢ − X̄)² = s_X²
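Both Theorem I and the shortcut formula (2-7a) can be verified numerically. The data below are invented for illustration; the shift a = 69 matches the "norm" used in the text.

```python
# A numerical sketch (invented data) of Theorem I and of the shortcut
# formula: shifting every observation by a constant a moves the mean by a
# but leaves the variance unchanged, and sum((Xi - Xbar)^2) equals
# sum(Xi^2) - n * Xbar^2.
data = [63.0, 66.0, 69.0, 72.0, 75.0]
a = 69.0
shifted = [x - a for x in data]

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):                      # divisor n - 1, as in (2-6)
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

n, xbar = len(data), mean(data)
shortcut = (sum(x * x for x in data) - n * xbar ** 2) / (n - 1)

print(mean(shifted))                   # the mean drops by exactly a
print(variance(data), variance(shifted), shortcut)  # all three agree
```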

(b) Change of Scale

Suppose that 3 inches is given some standard name, for example a "trintal." If we let Xᵢ* denote the new height in trintals, an old height of Xᵢ = 81 inches would be converted to Xᵢ* = 27 trintals, and generally

    Xᵢ* = (1/3)Xᵢ    (2-17)

FIG. 2-6 Change of scale (shrink).

It is no surprise that the mean height in trintals is just 1/3 the mean height in inches, i.e.:

    X̄* = (1/3)X̄    (2-18)

Furthermore, the standard deviation will also be just 1/3 as much, as can be seen from Figure 2-6:

    s_X* = (1/3)s_X    (2-19)

These two points can be stated generally as

Theorem II. If

    Xᵢ* = bXᵢ    (2-20)

then

    X̄* = bX̄    (2-21)

and

    s_X* = |b| s_X    (2-22)

The proof parallels that of Theorem I directly above, and is left as an exercise for the student.

(c) General Linear Transformations

It is now appropriate to combine the above two theorems into one. Consider the general linear transformation(12)

    Yᵢ = a + bXᵢ    (2-23)

Theorem III. If (2-23) holds, then

    Ȳ = a + bX̄    (2-24)

and

    s_Y = |b| s_X    (2-25)

The proof is similar to that of Theorem I above, and again is left as an exercise for the student. This theorem may be interpreted very simply: if the individual observations (Xᵢ) are linearly transformed (into corresponding Yᵢ values), then the mean observation is transformed in exactly the same way, and the standard deviation is stretched (or shrunk)(13) by the factor |b|, with no effect from a.

(12) Called linear because, given any values of a and b, the graph of Y = a + bX is a straight line (with slope b and Y-intercept a).
(13) More precisely, stretched if |b| > 1, but shrunk if |b| < 1.

(d) Application to Coding

In future chapters we shall draw upon this theory of linear transformations in various contexts. However, it does have one immediate use: it can be applied to find X̄ and s_X with a simpler computation than that shown in Table 2-4. This involves three steps.

1. Code all the Xᵢ values into a new set of Yᵢ values. Our computations will be most simplified if we use the formula

    Yᵢ = (Xᵢ − one of the cell midpoints)/(cell width)    (2-26)

In our example of heights, this becomes

    Yᵢ = (Xᵢ − 69)/3    (2-27)

This is clearly a linear transformation of the form of (2-23), with a = −23 and b = 1/3.
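Theorem III is easy to confirm numerically. The sketch below uses invented heights together with the coding a = −23, b = 1/3 of equation (2-27):

```python
# A sketch (invented data) verifying Theorem III: under Y = a + b*X the
# mean transforms in exactly the same way, and the standard deviation is
# scaled by |b|.
import statistics

X = [57.0, 60.0, 66.0, 72.0, 75.0]
a, b = -23.0, 1.0 / 3.0                # the coding of equation (2-27)
Y = [a + b * x for x in X]

assert abs(statistics.mean(Y) - (a + b * statistics.mean(X))) < 1e-9
assert abs(statistics.stdev(Y) - abs(b) * statistics.stdev(X)) < 1e-9
print("Theorem III checks out on this sample")
```

The constant a shifts every value equally, so it affects the mean but not the spread; only the slope b touches the standard deviation.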

TABLE 2-5 Coded Computation of the Mean and Standard Deviation of 200 Men's Heights (Compare with Table 2-4)

    (1)      (2)        (3)     (4)      (5)
    x        Coded Y    f       fY       fY²

    57       −4          2      −8        32
    60       −3          7      −21       63
    63       −2         22      −44       88
    66       −1         13      −13       13
    69        0         44       0         0
    72        1         36       36       36
    75        2         32       64      128
    78        3         13       39      117
    81        4         21       84      336
    84        5         10       50      250

    n = Σf = 200        ΣfY = 187        ΣfY² = 1063

    Ȳ = 187/200 = .935            X̄ = 3Ȳ + 69 = 71.80

    For s_Y², using the easy calculation (2-7b):
    s_Y² = (1/199)(1063 − 200(.935)²) = (1/199)(888) = 4.46
    s_X = 3√4.46 = 6.35

FIG. 2-7 Coding from inches (X) into trintals (Y), involving both a change of origin and a change of scale.

It is evident that when Xᵢ = 69, Yᵢ = 0. Moreover, as Xᵢ progresses by steps of 3, Yᵢ progresses in unit steps. With these guidelines we can fill in the appropriate Y values in column 2 of Table 2-5; diagrammatically, this coding is illustrated in Figure 2-7.

2. Compute the mean and standard deviation of the Y values. We note in the successive columns of Table 2-5 how easily this is now done. With Ȳ and s_Y now in hand, we are in a position to:

3. Translate the mean and standard deviation back into X values. This involves applying the theory of linear transformations (Theorem III) to (2-27), which gives

    Ȳ = (1/3)X̄ − 23    (2-28)

and

    s_Y = (1/3)s_X    (2-29)

From (2-28),

    X̄ = 3Ȳ + 69 = 71.80    (2-30)

From (2-29),

    s_X = 3s_Y = 6.35    (2-31)

Thus the simple coded computation of X̄ and s_X is complete.
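The three coding steps can be replayed in code. The frequencies below are the ones reconstructed for column 3 of Table 2-5 (they reproduce the printed totals ΣfY = 187 and ΣfY² = 1063); the decoding follows (2-30) and (2-31), and the result agrees with the text's X̄ = 71.80 and s_X ≈ 6.35 up to rounding.

```python
# A sketch of the coded computation of Table 2-5: code, compute the mean
# and standard deviation of Y, then decode back into X units.
from math import sqrt

midpoints   = [57, 60, 63, 66, 69, 72, 75, 78, 81, 84]
frequencies = [ 2,  7, 22, 13, 44, 36, 32, 13, 21, 10]

n = sum(frequencies)                          # 200
Y = [(x - 69) / 3 for x in midpoints]         # coded values -4 ... 5

sum_fY  = sum(f * y for f, y in zip(frequencies, Y))       # 187
sum_fY2 = sum(f * y * y for f, y in zip(frequencies, Y))   # 1063

Y_bar = sum_fY / n                            # 0.935
s2_Y  = (sum_fY2 - n * Y_bar ** 2) / (n - 1)  # about 4.46, as in (2-7b)

X_bar = 3 * Y_bar + 69                        # (2-30)
s_X   = 3 * sqrt(s2_Y)                        # (2-31)

print(X_bar, s_X)
```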

PROBLEMS

2-9 By coding the heights shown in Table 2-5 from inches (X) into feet (Y), compute X̄ and s_X. Show your coding in a diagram similar to Figure 2-7. Why is the coding used in the text preferred?

2-10 Use coding to find the mean and standard deviation of the data in Problem 2-4.

2-11 Find the mean of the following:

    239510  239250  239860  239360  239480  239430
    239230  239370  239290  239850  239680

(Hint. It is natural, and mathematically justified, to simply drop the first digits (239,000) of every number, and just work with 510, 250, .... This is justified because it is just the linear transformation Y = X − 239,000.)

2-12 To show that nonlinear transformations are trickier, see if this is true: if Y = X², then Ȳ = X̄²? (Try it when there are three values of X: 1, 3, 5.)

2-13 Using coding, find the mean and standard deviation of:
(a) The data of Problem 2-3.
(b) The data of Problem 2-2.

2-14 Find the mean and standard deviation of the following sample of executive ages. Graph the relative frequency distribution.

    35  46  63  55  43  42  69  59  54  45  50
    50  62  68  38  40  44  57  47  48  46  60
    42  60  42  38  56  51  38  61  54  43  64
    49  36  59  51  50  66  63  57  50  44  48
    69  64  37  56  53  62  52

Review Problems

2-15 The weekly wage rates for 5 major industry groups are listed below, with the percent of industrial employment accounted for by each group. Find the average weekly industrial wage.

    [Table: weekly wage rates $150, $120, $100, $80, ...; percent of employment 30, 25, 20, 20, ...]

2-16 Suppose the number of children was recorded for each of 25 families, yielding the following data:

    2, 4, 1, 0, 1, 3, 0, 4, 2, 6, 0, 0, 2, 3, 1, 5, 4, 3, 1, 0, 2, 5, 3, 4, 1

(a) Construct a frequency table and graph.
(b) Find the mean and standard deviation.

2-17 The following table* gives the amount of farmland, and the actual percent of farmland harvested (as opposed to pasture, woodlot, etc.), in the U.S.A. in 1959, according to region. Compute the percent harvested for the U.S.A. as a whole.

    Region       Amount of Farmland       Percent Harvested
                 (millions of acres)
    North        421                      46.7
    South        357                      21.0
    Mountain     264                       8.7
    Pacific       80                      18.8
    U.S.A.     1,122

    * Source. Statistical Abstract of the United States, 1963, pp. 625, 614.

2-18 A certain species of beetle was sampled, yielding the following 10 lengths, in centimeters:

    1.0, 1.2, 1.0, 1.1, 1.0, 1.6, 1.2, 1.4, 1.5, 2.0

Find the median, mean, range, variance, and standard deviation:
(a) For the original lengths.
(b) If the lengths are expressed in mm (1 cm = 10 mm).
(c) If the lengths are expressed as "centimeters above a standard beetle height of 1.1" (i.e., the sample values become −.1, +.1, −.1, 0, −.1, +.5, +.1, +.3, +.4, +.9).

2-19 Throw a die 100 times (or else simulate this by consulting the random numbers in Appendix Table IIa). Graph the relative frequency distribution, and calculate the sample mean:
(a) After 10 throws;
(b) After 25 throws;
(c) After 100 throws;
(d) After millions of throws (guess).

chapter 3

Probability

3-1 INTRODUCTION

In Chapters 7 to 10, we shall make inferences (induction) about an unknown population from an observed sample. In the next few chapters we work in the opposite direction, making deductions about a sample drawn from a known population; this is a necessary prelude to the later chapters on induction. If the population of American voters is 55% Democrat, we cannot be certain exactly what percentage of Democrats will occur in a random sample. Nevertheless, it is "likely" that "close to" this percentage will turn up in our sample. Our objective is to define "likely" and "close to" more precisely; in this way we shall be able to make useful predictions. First, however, we must lay a good deal of groundwork. Predictions in the face of uncertainty or chance require a knowledge of the laws of probability, and the chapters that follow are devoted exclusively to their development.

Consider again our example of Chapter 1, in which the reader gambled against rolling an ace on a die. This gamble was based on the judgment that this outcome was unlikely. Now let us be more specific, and try to define its probability precisely. Intuitively, since this is but one of six equally probable outcomes, we might (correctly) guess its probability to be one in six, or one-sixth, provided it is an honest die. Alternatively, we might say that if the die were thrown a large number of times, the relative frequency (of rolling an ace) would approach one-sixth (as in Problem 2-19). This is a useful operational approach; thus, if we suspect that this die is not, in fact, a fair one, we could test by tossing it many times, and observing whether or not the relative frequency of this outcome approached one-sixth.

This definition of probability as "the limit of relative frequency" is formally stated as:

Definition.

    Pr (e₁) ≜ lim (n₁/n) as n → ∞    (3-1)

where

    e₁ is the outcome ("ace");
    n is the total number of times that the trial is repeated (die is thrown);
    n₁ is the number of times that the outcome e₁ occurs [also called n(e₁)];
    n₁/n is therefore the relative frequency of e₁.

We shall use this definition of probability because it provides the clearest intuitive idea. However, you will find in Section 3-6 that it involves conceptual difficulties; thus, if you choose to study probability further, you will soon be forced to turn to the axiomatic approach.
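Definition (3-1) can be explored by simulation, in the spirit of Problem 2-19. The sketch below (our own, not the text's) tracks the relative frequency of rolling an ace as the number of throws grows; the seed is fixed only to make the illustration reproducible.

```python
# A simulation sketch of definition (3-1): the relative frequency of
# rolling an ace settles near 1/6 as the number of throws grows.
import random

random.seed(1)  # reproducible illustration

def relative_frequency(throws):
    aces = sum(1 for _ in range(throws) if random.randint(1, 6) == 1)
    return aces / throws

for n in (100, 10_000, 100_000):
    # Successive values should drift toward 1/6, i.e. about .167.
    print(n, relative_frequency(n))
```

This is exactly the operational test suggested above for a suspect die: throw it many times and watch whether the relative frequency approaches 1/6.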

PROBLEMS

3-1 (a) Throw a thumbtack 50 times. Define tossing "the point up" as e₁. Record your results as in the following table, and keep them as a permanent record for future reference.

    Trial Number (n)    Point Up?    Frequency of "Ups" (n₁)    Accumulated Relative Frequency (n₁/n)
    1                   No           0                          .00
    2                   Yes          1                          .50
    3                   Yes          2                          .67
    4                   No           2                          .50
    5                   Yes          3                          .60
    ⋮
    10
    20
    30
    40
    50

(b) Show your results on a graph of accumulated relative frequency (n₁/n) against trial number n.
(c) What is your best guess of the probability of tossing the point up [i.e., of Pr (e₁)]?

3-2 In tossing a coin, define a "head" as e₁, and proceed as in Problem 3-1(a) and (b), tossing the coin 100 times. (Record your results for future use in Chapter 9.)

3-3 Roll a die 100 times. Define rolling a four as e₁, and proceed as in Problem 3-1(a) and (b). You may use the same data as in Problem 2-19. (Record your results for future use.)

3-4 Roll a pair of dice 50 times. Define rolling a total of 7 or 11 as the event E, and proceed as in Problem 3-1(a) and (b). What is your empirical estimate of Pr (E)? Can you also derive Pr (E) theoretically?

3-2 ELEMENTARY PROPERTIES OF PROBABILITY

We generalize by considering an experiment with N elementary outcomes e₁, e₂, ..., eᵢ, ..., e_N. The relative frequency nᵢ/n of any outcome eᵢ must be positive, since both the numerator and denominator are positive; moreover, since the numerator cannot exceed the denominator, the relative frequency cannot exceed 1. Thus

    0 ≤ nᵢ/n ≤ 1    (3-2)

To be exact, the same relations are true of the limit, so that from (3-1)

    0 ≤ Pr (eᵢ) ≤ 1    (3-3)

Next we note that the frequencies of all the elementary outcomes must sum to n:

    n₁ + n₂ + ⋯ + n_N = n

Dividing this equation by n, we find that all the relative frequencies sum to 1:

    n₁/n + n₂/n + ⋯ + n_N/n = 1

This same relation is true in the limit, so that

    Pr (e₁) + Pr (e₂) + ⋯ + Pr (e_N) = 1    (3-4)

3-3 EVENTS AND THEIR PROBABILITIES

(a) The Outcome Set

In the previous section, the die was an example of an experiment in which the outcomes e₁, e₂, ..., e₆ were numerical and involved no complications. Since most experiments of interest to the statistician are sampling experiments, we will usually have a more complex set of outcomes. For example, suppose an experiment consists of flipping a coin three times (or, equivalently, flipping three coins at once). A typical outcome (designated e₄) is the sequence (H, T, T). The list of all possible outcomes, or outcome set, is shown in Figure 3-1.

    (H, H, H) = e₁
    (H, H, T) = e₂
    (H, T, H) = e₃
    (H, T, T) = e₄
    (T, H, H) = e₅
    (T, H, T) = e₆
    (T, T, H) = e₇
    (T, T, T) = e₈

FIG. 3-1 Outcome set of the experiment of flipping a coin three times.

We note several practical features. The outcome set is also often known as the sample space S. It is a mathematical convention to write a set with curly brackets; in this case

    S = {e₁, e₂, ..., e₈}

The order in which the outcomes in the set are listed doesn't matter: the two outcome sets {e₁, e₂, ..., e₈} and {e₂, e₁, e₈, ...} are the same set. However, since (H, H, T) and (H, T, H) are separate and distinct outcomes, the order in which H and T appear is an essential feature; in this case we use round brackets and call the result an ordered triple. Finally, we note that an experimental outcome involves an entire ordered triple. It is tempting to try to tear each triple into three parts, and think of 24 outcomes. This mistake is avoided by writing down a dot for each of the 8 elementary outcomes. (Hereafter, we shall often refer to outcomes as "points" for short.)

To simplify calculations, let us suppose that the coin is fairly tossed, so that all 8 outcomes are equally probable. Since the probabilities must sum to 1 according to (3-4), we have, without modifying our concepts in any way,

    P(e₁) = P(e₂) = ⋯ = P(e₈) = 1/8    (3-5)

(b) Events

Continuing the example of the 3 coins, consider the event E: "at least 2 heads." This event includes the outcomes e₁, e₂, e₃, and e₅ in Figure 3-1. We might say that the event E is the collection of points {e₁, e₂, e₃, e₅}, as in Figure 3-2. In fact, this is a convenient way to define an event in general:

Definition. An event E is a subset of the outcome set S.    (3-6)

FIG. 3-2 An event E as a subset of points within the outcome set, or sample space.

We now ask, "What is the probability of E?" Using the definition of probability as limiting relative frequency, we may write

    Pr (E) = lim (n_E/n) as n → ∞    (3-7)

where n_E is the frequency of E. But of course E occurs whenever any of the outcomes e₁, e₂, e₃, or e₅ occurs. Thus

    n_E = n₁ + n₂ + n₃ + n₅

and from (3-7),

    Pr (E) = lim (n₁ + n₂ + n₃ + n₅)/n = Pr (e₁) + Pr (e₂) + Pr (e₃) + Pr (e₅)    (3-8)

TABLE 3-1 Several Events in the Experiment of Figure 3-1 (Tossing 3 Coins)

    (1)                 (2)                                   (3)                  (4)
    Arbitrary Symbol    Description                           Outcome List         Probability
    for Event

    E                   At least 2 heads                      {e₁, e₂, e₃, e₅}     4/8 = 1/2
    F                   Second coin head, followed by a tail  {e₂, e₆}             2/8 = 1/4
    G                   Fewer than 2 heads                    {e₄, e₆, e₇, e₈}     4/8 = 1/2
    H                   All coins the same                    {e₁, e₈}             1/8 + 1/8 = 1/4
    I                   No heads                              {e₈}                 1/8
    I₁                  Exactly 1 head                        {e₄, e₆, e₇}         3/8
    I₂                  Exactly 2 heads                       {e₂, e₃, e₅}         3/8
    I₃                  Exactly 3 heads                       {e₁}                 1/8
    J                   Exactly 2 tails                       {e₄, e₆, e₇}         3/8

The obvious generalization of (3-8) is that the probability of an event is the sum of the probabilities of all the points (or outcomes) included in that event:

    Pr (E) = Σᵢ Pr (eᵢ)    (3-9)

summing over just those outcomes eᵢ which are in E. We note an analogy between mass (in physics) and probability: the mass of an object is the sum of the masses of all the atoms in that object; the probability of an event is the sum of the probabilities of all the outcomes included in that event.

Various events are considered in Table 3-1; all the outcomes included in each event are listed in column 3.

Since the probability of each outcome is 1/8, the calculation of the probability of each event in column 4 is very simple; indeed, some of these probabilities would have been clear immediately, without any calculation at all. Note also, from the lists in column 3, that the events "exactly 1 head" and "exactly 2 tails" are one and the same event; although this is not evident from the two descriptions, the identical outcome lists make it obvious.
(c) Combin!ngEvents
we might ask for the
than 2 heads or all

example,

an

As

there

that

of \"G or
same (or

probability

less

be

Will

the

coins

H,\" that is,


both). This

\337
e2 %

\370
e3

(a)

FIG.3-3
each case

diagrams,
illustrating probability of combined
events.
(The rectangle in
the whole sample space;hence the probability
of all points (or out-

repre

comes)within

rots

sun,\177oand1.) H\";
(a)

rectangle

ev,

combined

well as

\"G

o\177

at is

denoted

lists

general,

fo

any

events

two

G(c)t.\177
I Htw shaded,
J shaded. \"G or
u H,\"

and

of Table

3-1

\"G

by

H.\"From the

G UH
In

(c)

(b)

Ven

--

{e4,

e6,

be

may
it

can

e7, es,

q}.

(b)

H\";

G rq H

\"G union

read

be seen

shaded,

H\" as

that

G, H:

Definition.

A
this

in G

H \177xset
little

of points

which are

in

G,

or in

H, or

in

both

(3-10)

definition

u H, its

)stract
Since
\177robability

art

in

Figure

3-3a,

five of the eight


is

5/8.

called

Venn

diagram,

illustrates

equiprobable outcomes are included

34

PROBABILITY

Similarly, we might be interested in the event "G and H," that is, the event that there will be fewer than 2 heads, and all coins the same. This is clearly a much more restricted combined event; any outcome in it must satisfy both G and H, rather than either G or H. Again, we can use a Venn diagram, as in Figure 3-3b; this shows clearly that there is only one outcome (3 tails) that qualifies. This combined event is denoted by G ∩ H, and may be read "G intersect H" as well as "G and H." The lists of G and H in Table 3-1 confirm that G ∩ H = {e₈}, since the only outcome appearing in both lists is e₈. Hence the probability of G ∩ H is 1/8. In general, for any two events G and H:(1)

Definition. G ∩ H is the set of points which are in both G and H.    (3-11)

(d) Probabilities of Combined Events

We have already shown how Pr (G ∪ H) may be found from the Venn diagram in Figure 3-3. Now we should like to develop a formula. First consider a pair of events that do not have any points in common, such as I and J from Table 3-1. (We also say that they are mutually exclusive, or do not overlap.) From Figure 3-3c it is obvious that

    Pr (I ∪ J) = Pr (I) + Pr (J)    (3-12)

But this simple addition does not always work. For example,

    Pr (G ∪ H) ≠ Pr (G) + Pr (H)    (3-13)

What has gone wrong in this case? Since G and H overlap, in summing Pr (G) and Pr (H) we count the intersection G ∩ H twice; this is why (3-13) overestimates. This is easily corrected: subtracting Pr (G ∩ H) eliminates the double counting. Thus, we have shown:

(1) To remember when ∪ or ∩ is used, it may help to recall that ∪ stands for "union," and ∩ resembles the letter "A" in the word "and." These technical symbols are used to avoid the ambiguity that might occur if we used ordinary English. For example, the sentence "E ∪ F has 5 points" has a precise meaning, but the informal "E or F has 5 points" is ambiguous.

Theorem.

    Pr (G ∪ H) = Pr (G) + Pr (H) − Pr (G ∩ H)    (3-14)

In our example, (3-14) works: 5/8 = 4/8 + 2/8 − 1/8. Formula (3-14) is in fact quite general, and applies to any two events. It also applies in cases like (3-12), where I and J do not overlap; then Pr (I ∩ J) = 0, and this last term simply disappears when (3-14) is applied. For emphasis we may write, in general:

Theorem.

    Pr (I ∪ J) = Pr (I) + Pr (J),  if I and J are mutually exclusive    (3-15)

But it must be recognized that this is just a special case of (3-14).

A collection of several events is defined as mutually exclusive if there is no overlap, i.e., if no outcome eᵢ belongs to more than one event. For example, in Table 3-1, the events I, I₁, and I₂ are mutually exclusive; but E, F, and I are not, because E and F overlap.

The collection of events {I, I₁, I₂, I₃} is mutually exclusive, and it also "covers" the whole sample space S. We therefore call it a partition of S. In general:

Definition. A partition of a sample space S is a collection of mutually exclusive events {I₁, ..., I_N} whose union is the whole sample space:

    I₁ ∪ I₂ ∪ ⋯ ∪ I_N = S    (3-16)

Thus a partition completely divides the sample space into nonoverlapping events, as illustrated in Figure 3-4b.

We note in Table 3-1 that G consists of exactly those points which are not in E. We therefore could call G the complement of E, or "not E," and denote it by Ē. In general, for any event E:

Definition. Ē is the set of points in the sample space S which are not in E.    (3-17)


FIG. 3-4 Venn diagrams to illustrate event definitions; the sample space S in each case is represented by a rectangle. (a) An event E and its complement Ē, shaded. (b) E₁, E₂, ..., Eₙ form a partition. (c) E and Ē form a partition.

(Note. Because an event E and its complement Ē are mutually exclusive, {E, Ē} form a very simple partition.) Since these events are mutually exclusive, by (3-15)

    Pr (E ∪ Ē) = Pr (E) + Pr (Ē)    (3-18)

and since {E, Ē} form a partition,

    1 = Pr (E) + Pr (Ē)    (3-19)

Substituting (3-19) into (3-18) yields a solution for Pr (E) in terms of Pr (Ē):

Theorem.
    Pr (E) = 1 − Pr (Ē)    (3-20)

As an example, consider the probability of getting at least one head. The complement is "no heads," and is very simple to calculate. Thus

    Pr (at least one head) = 1 − Pr (no heads) = 1 − 1/8 = 7/8

This is not the only way to answer this question, but it is by far the simplest, since Pr (no heads) is so easy to evaluate. The student should be on the alert for similar problems: the key words to watch for are "at least," "more than," "less than," "no more than," etc.
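The complement shortcut of (3-20) can be confirmed by enumeration. A minimal Python sketch (ours, not the text's), again on the 3-coin sample space:

```python
from itertools import product
from fractions import Fraction

space = [''.join(p) for p in product('HT', repeat=3)]

def pr(pred):
    # Probability as a count of favorable outcomes over all 8 outcomes.
    return Fraction(sum(map(pred, space)), len(space))

no_heads = pr(lambda o: 'H' not in o)        # 1/8, only TTT
at_least_one = pr(lambda o: 'H' in o)        # the complementary event

# (3-20): Pr(at least one head) = 1 - Pr(no heads) = 7/8
assert at_least_one == 1 - no_heads == Fraction(7, 8)
```

Counting the complement (one outcome) is far easier than counting the seven outcomes of the event itself, which is exactly the point of the theorem.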

PROBLEMS

3-5 Suppose a penny and a nickel are thrown on the table.
(a) The outcome set may be conveniently listed:

    (penny, nickel)
    (H, H) = e₁
    (H, T) = e₂
    (T, H) = e₃
    (T, T) = e₄

Satisfy yourself that all 4 outcomes are equally likely:
1. Philosophical argument. Obviously e₁ and e₂ are equally likely, because they differ only in what happens to the nickel. Similarly, e₃ and e₄ are equally likely. Finally, e₂ and e₄ are equally likely, because they differ only in what happens to the penny. Thus all 4 outcomes are equally likely.
2. Empirical argument. Have everyone in the class repeat the experiment 10 times, so that a large amount of data can be pooled. Is the relative frequency of each outcome about 1/4?
(Note that this outcome set could be written in 2 ways: as the list above, or as a two-way table with rows for the penny (H, T) and columns for the nickel (H, T).)
(b) Consider the following reduction of the outcome set:

    Both heads
    One of each
    Both tails

Are these three outcomes equally likely? What are their probabilities?
(c) What is the probability of at least one head? Answer alternatively using the symmetric outcome set [which is recognized in (a)] and the alternate outcome set, and verify that you get the same answer.

3-6 In the same way, list the outcome set when a pair of dice are thrown, one painted red, the other white. Then calculate the probability of:
(1) A total of 4 dots.
(2) A total of 7 dots.
(3) A total of 7 or 11 dots.
(4) A double.
(5) A total of at least 8 dots.
(6) A double, with a total of at least 8 dots.
(7) A 1 on one die, 5 on the other.
(8) Would you get the same answers to (1)-(7) if the dice were both painted white? In particular, compare the chance of a {3, 3} combination to the chance of a {1, 5} combination.

3-7 Suppose the coin of Figure 3-1 is not fairly thrown (as in Problem 3-4), and that in the long run the following relative frequencies are observed:

    e        Pr (e)
    (HHH)    .15
    (HHT)    .10
    (HTH)    .10
    (HTT)    .15
    (THH)    .15
    (THT)    .10
    (TTH)    .10
    (TTT)    .15

Recalling the definitions of Table 3-1,
    G: fewer than 2 heads
    H: all coins the same
find the following probabilities. (Hint. Use (3-9) and a Venn diagram.)
(a) Pr (G); Pr (H); Pr (G ∪ H); Pr (G ∩ H).
(b) Verify that (3-14) holds true.
Let us further define
    K: fewer than 2 tails
    L: some coins different
Then find
(c) Pr (K); Pr (L); Pr (K ∪ L); Pr (K ∩ L).
(d) Verify that (3-14) holds true.

3-8 (a) List the sample space of 4 coins tossed simultaneously.
(b) Define the events
    A: all coins the same
    B: precisely 1 head
    C: at least 2 heads
Evaluate Pr (A) + Pr (B) + Pr (C). Do the events A, B, C form a partition?
(c) Redefine A as "all tails." Do the events A, B, C now form a partition? What is Pr (A) + Pr (B) + Pr (C)?

3-9 When a coin is tossed 4 times, let Y denote the number of changes in sequence. For example, the outcome H T H H may be written H/T/HH, where the two changes in sequence are indicated by slashes; similarly, the outcome H/TTT has only 1 change. What is
(a) Pr (Y = 1)?
(b) Pr (Y = 2)?
(c) Do the events of (a) and (b) form a partition?

3-10 (a) What is the probability of at least one head when 4 fair coins are tossed?
(b) What is the probability of at least one head when 10 fair coins are tossed?

3-11 Suppose a class of 100 students consists of several groups, in the following proportions:

                       Men       Women
    Taking math       17/100     38/100
    Not taking math   23/100     22/100

If a student is chosen by lot to be class president, what is the chance that the student will be:
(a) A man?
(b) A woman?
(c) Taking math?
(d) A man, or taking math?
(e) A man, and taking math?
(f) If the class president in fact turned out to be a man, what is the chance that he is taking math?

(Problems preceded by arrows are important, because they introduce a later section in the text.)

3-12 The students of a certain school engage in various sports in the following proportions:

    Football, 30% of all students.
    Basketball, 20%.
    Baseball, 20%.
    Both football and basketball, 5%.
    Both football and baseball, 10%.
    Both basketball and baseball, 5%.
    All three sports, 2%.

If a student is chosen by lot for an interview, what is the chance that he will be:
(a) An athlete (playing at least one sport)?
(b) A football player only?
(c) A football player or a baseball player?
If an athlete is chosen by lot, what is the chance that he will be:
(d) A football player only?
(e) A football player or a baseball player?
Hint. Use a Venn diagram.
(f) Use your result in (a) to generalize (3-14).

3-4 CONDITIONAL PROBABILITY

Continuing with the experiment of fairly tossing 3 coins, suppose that the experiment is completed, and we are informed that there were fewer than 2 heads, i.e., that event G had occurred. Given this condition, what is the probability that I (no heads) occurred? This is an example of "conditional probability," and is denoted as Pr (I/G), or "the conditional probability of I, given G."

The problem may be solved by keeping in mind that our relevant outcome set is reduced to G. From Figure 3-5 it is evident that Pr (I/G) = 1/4.

The second illustration in this figure shows the conditional probability of H (all coins the same), given G (less than 2 heads). Our knowledge of G means that the only relevant part of H is H ∩ G ("no heads" = I) and thus Pr (H/G) = 1/4. This example is immediately recognized as equivalent to the preceding one; we are just asking the same question in two different ways.

Suppose Pr (G), Pr (H), and Pr (G ∩ H) have already been computed for the original sample space S. It may be convenient to have a formula for Pr (H/G) in terms of them. We therefore turn to the definition (3-1) of probability as relative frequency. We imagine repeating the experiment n times, with G occurring n(G) times, of which H also occurs n(H ∩ G) times.

FIG. 3-5 Venn diagrams to illustrate conditional probability. (a) Pr (I/G). Knowledge that G has occurred makes the original sample space S irrelevant; G becomes the new sample space, and the event I includes one of the four equiprobable outcomes in G. Thus Pr (I/G) = 1/4. (b) Pr (H/G). Knowledge that G has occurred makes the original sample space S (including the outcome in H that lies outside G) irrelevant; in G, which now becomes the new sample space, the only relevant part of H is H ∩ G. Note that Pr (H/G) is identical to Pr (I/G).

The conditional relative frequency is the ratio n(H ∩ G)/n(G), and the conditional probability is the limit of this relative frequency:

    Pr (H/G) = lim n(H ∩ G)/n(G)    (3-21)
On dividing numerator and denominator by n, we obtain

    Pr (H/G) = lim [n(H ∩ G)/n] / [n(G)/n] = Pr (H ∩ G) / Pr (G)    (3-22)

This formula is often used in a slightly different form, obtained by cross multiplying:

    Pr (H ∩ G) = Pr (G) Pr (H/G)    (3-23)
PROBLEMS

(In this section and the next, we shall assume that all events under consideration have nonzero probabilities. This permits us to divide by various probabilities legitimately at will.)

3-13 Flip 3 coins over and over again, recording your results as in the following table.

    Trial    G          If G Occurs, Then    Accumulated        Accumulated Conditional    Relative Frequency
    Number   Occurs?    H Also Occurs?       Frequency n(G)     Frequency n(H ∩ G)         n(H ∩ G)/n(G)
    1        Yes        Yes                  1                  1                          1.00
    2        Yes        No                   2                  1                          .50
    3        Yes        Yes                  3                  2                          .67
    ⋮

After 50 trials, is the relative frequency n(H ∩ G)/n(G) close to the probability calculated theoretically in the previous section? (If not, it is because of insufficient trials, so pool the data from the whole class.)

3-14 Using the unfair coins and definitions of Problem 3-7, calculate
(a) Pr (G/H)
(b) Pr (H/G)
(c) Pr (K/L)
(d) Pr (L/K)

3-15 (a) A consumer may buy brand X or brand Y, but not both. The probability that he buys brand X is .06, and brand Y is .15. Given that the consumer bought either X or Y, what is the probability that he bought brand X?
(b) If the events A and B are mutually exclusive (and of course nonempty, i.e., each includes at least one possible outcome), is it always true that

    Pr (A/A ∪ B) = Pr (A)/[Pr (A) + Pr (B)]?

3-16 A bowl contains 3 red chips (numbered R₁, R₂, R₃) and 2 white chips (numbered W₁, W₂). A sample of 2 chips is drawn, one after the other. List the sample space. For each of the following events, diagram the subset of outcomes included and find its probability.
(a) Second chip is red.
(b) First chip is red.
(c) Second chip is red, given the first chip is red.
(d) First chip is red, given the second chip is red.
(e) Both chips are red.
Then note the following features, which are perhaps intuitively obvious also:
(1) The answers to (a) and (b) agree, as do the answers to (c) and (d).
(2) Show that the answer to (e) can be found alternatively by applying (3-23) to parts (b) and (c).
(3) Extension of part (2): if 3 chips are drawn, what is the chance that they are all red? Can you now generalize Theorem (3-23)?

3-17 Two cards are drawn in order from an ordinary deck of cards. What is the chance that:
(a) They are both aces?
(b) They are both black?
(c) They are both honor cards (ace, king, queen, jack, or ten)?

3-18 A poker hand (5 cards) is drawn from an ordinary deck. What is the chance of drawing, in order,
(a) 2 aces, then 3 kings?
(b) 2 aces, 2 kings, then finally a queen?
(c) 4 aces, then a king?
What is the chance of drawing, in any order whatsoever,
(d) 4 aces and a king?
(e) 4 aces?
(f) "Four of a kind" (i.e., 4 aces, or 4 kings, or 4 jacks, etc.)?
If the 5 cards are drawn with replacement (i.e., each card is replaced in the deck before drawing the next card, so that it is no longer a real poker deal), what is the chance of
(g) Exactly 4 aces?
3-19 A supply of 10 light bulbs contains 2 defective bulbs. If the bulbs are picked up in random order, what is the probability that:
(a) The first two bulbs are good?
(b) The first defective bulb was picked 6th?
(c) The first defective bulb was not picked until the 9th?

⇒3-20 Two dice are thrown. Let
    E: first die is 5
    F: total is 7
    G: total is 10
Compute the relevant probabilities, using Venn diagrams, and show that:
(a) Pr (F/E) = Pr (F).
(b) Pr (G/E) ≠ Pr (G).
(c) Is Pr (E/F) = Pr (E)? Do you think that this is closely related to (a), or just an accident?

3-21 If E and F are any 2 mutually exclusive events (and both are nonempty, of course), what can be said about Pr (E/F)?

3-22 A company employs 100 persons, 75 men and 25 women. The accounting department provides jobs for 12% of the men and 20% of the women. If a name is chosen at random from the accounting department, what is the probability that it is a man? That it is a woman?

⇒3-23 (Bayes' Theorem). In a certain population of workers, suppose that 40% are grade school graduates, 50% are high school graduates, and 10% are college graduates. Among the grade school graduates, 10% are unemployed; among the high school graduates, 5% are unemployed; and among the college graduates, 2% are unemployed. If a worker is chosen at random and found to be unemployed, what is the probability that he is
(a) A grade school graduate?
(b) A high school graduate?
(c) A college graduate?
(This problem is important as an introduction to Chapter 15; therefore its answer is given in full.)

Answer. Think of probability as proportion of the population, if you like.

[Figure: the old sample space, the population of workers, is divided into the three classes of workers C₁, C₂, C₃. The effect E (unemployment) cuts across all three classes, and becomes the new sample space, shaded.]

    Pr (E) = Σⱼ Pr (E ∩ Cⱼ)
    Pr (E ∩ Cⱼ) = Pr (E/Cⱼ) Pr (Cⱼ)
    Pr (E) = .040 + .025 + .002 = .067

In the new sample space E, shaded, (3-22) gives

(a) Pr (C₁/E) = .040/.067 = .597
(b) Pr (C₂/E) = .025/.067 = .373
(c) Pr (C₃/E) = .002/.067 = .030
As a check, the sum is .067/.067 = 1.000.
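The arithmetic of this answer can be reproduced directly. A minimal Python sketch (ours, not the text's; the dictionary keys are just labels for C₁, C₂, C₃):

```python
# Priors Pr(C_j) and conditionals Pr(E/C_j) from Problem 3-23.
priors = {'grade school': .40, 'high school': .50, 'college': .10}
p_unemployed = {'grade school': .10, 'high school': .05, 'college': .02}

# Joint probabilities Pr(E n C_j) = Pr(E/C_j) Pr(C_j).
joint = {c: priors[c] * p_unemployed[c] for c in priors}
pr_E = sum(joint.values())                         # Pr(E) = .067

# Posterior probabilities Pr(C_j/E), by (3-22).
posterior = {c: joint[c] / pr_E for c in joint}

assert abs(pr_E - .067) < 1e-9
assert round(posterior['grade school'], 3) == .597
assert round(posterior['high school'], 3) == .373
assert round(posterior['college'], 3) == .030
```

The posteriors necessarily sum to 1, since the classes Cⱼ partition the population.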

Notes on Bayes' Theorem. Problem 3-23 is an example of Bayes' Theorem, which may be stated as follows. Certain "causes" C₁, ..., Cₙ (education levels) have prior probabilities Pr (Cᵢ). In a sense the causes produce an "effect" E (unemployment), not with certainty, but with conditional probabilities Pr (E/Cᵢ). Given that the effect E has eventually occurred, one then calculates, using conditional probability manipulations, the probability of a cause given the effect, Pr (Cᵢ/E):

    Given the prior probabilities Pr (Cᵢ) and the conditional probabilities Pr (E/Cᵢ) → Deduced posterior probabilities Pr (Cᵢ/E)

3-24 In a certain country it rains 40% of the days and shines 60% of the days. A barometer manufacturer, in testing his instrument in the lab, has found that it sometimes errs: on rainy days it erroneously predicts "shine" 10% of the time, and on shiny days it erroneously predicts "rain" 30% of the time.
(a) In predicting tomorrow's weather before looking at the barometer, the (prior) chance of rain is 40%. After looking at the barometer and seeing it predict "rain," what is the (posterior) chance of rain?
(b) What is the posterior chance of rain if the barometer predicts "shine"?
(c) What is the posterior chance of rain if an improved barometer (error rates of 10 and 20% respectively) predicts "rain"?

3-5 INDEPENDENCE

In Problem 3-20 we noticed that Pr (F/E) = Pr (F). This means that the chance of F, knowing E, is exactly the same as the chance of F without knowing E; knowledge of E does not change the probability of F at all. It seems reasonable, therefore, to call F statistically independent of E. In fact, this is the basis for the general definition:

Definition.
    An event F is called statistically independent of an event E if
    Pr (F/E) = Pr (F)    (3-24)

Of course, in the case of events G and E, where Pr (G/E) ≠ Pr (G), we would say that G is statistically dependent on E. In this case, knowledge of E changes the probability of G.

We now can develop the consequences of F being independent of E. Substituting (3-24) in (3-22), we obtain

    Pr (F) = Pr (F ∩ E) / Pr (E)

hence

    Pr (F ∩ E) = Pr (F) Pr (E)    (3-25)

We reverse this argument, and work backwards from (3-25) as follows:

    Pr (E/F) = Pr (F ∩ E) / Pr (F) = Pr (E)    (3-26)

That is, E is independent
of F whenever F is independent of E. In other words, the result in Problem 3-20(c) above was no accident. In view of this symmetry, we may henceforth simply state that E and F are statistically independent of each other, whenever any of the three logically equivalent statements (3-24), (3-25), or (3-26) is true. Usually, statement (3-25) is the preferred form, in view of its symmetry. Sometimes, in fact, this "multiplication formula" is taken as the definition of statistical independence. But this is just a matter of taste.

Notice that so far we have insisted on the phrase "statistical independence," in order to distinguish it from other forms of independence: philosophical, logical, or whatever. For example, we might be tempted to say that in our dice problem, F was "somehow" dependent on E because the total of the two tosses depends on the first die. This vague notion of dependence is of no use to the statistician, and will be considered no further. But let it serve as a warning that statistical independence is a very precise concept, defined by (3-24), (3-25), or (3-26) above.

Now that we clearly understand statistical independence, and agree that this is the only kind of independence we shall consider, we shall run no risk of confusion if we are lazy and drop the word "statistical." Our results so far are summarized as follows:

    General Theorem                                  Special Case

    Pr (E ∪ F) = Pr (E) + Pr (F) − Pr (E ∩ F)        Pr (E ∪ F) = Pr (E) + Pr (F),
                                                     if E and F are mutually exclusive;
                                                     i.e., if Pr (E ∩ F) = 0

    Pr (E ∩ F) = Pr (F) · Pr (E/F)                   Pr (E ∩ F) = Pr (F) · Pr (E),
                                                     if E and F are independent;
                                                     i.e., if Pr (E/F) = Pr (E)
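The multiplication criterion (3-25) makes independence easy to test by enumeration. A minimal Python sketch (ours, not the text's), using the dice events of Problem 3-20:

```python
from itertools import product
from fractions import Fraction

dice = list(product(range(1, 7), repeat=2))   # 36 equiprobable outcomes

def pr(pred):
    return Fraction(sum(map(pred, dice)), len(dice))

E = lambda d: d[0] == 5        # first die is 5
F = lambda d: sum(d) == 7      # total is 7
G = lambda d: sum(d) == 10     # total is 10

# (3-25) holds for E and F: 1/36 = (1/6)(1/6), so they are independent.
assert pr(lambda d: E(d) and F(d)) == pr(E) * pr(F)

# It fails for E and G: 1/36 differs from (1/6)(3/36), so they are dependent.
assert pr(lambda d: E(d) and G(d)) != pr(E) * pr(G)
```

The symmetry of (3-25) shows in the code: no conditioning is needed, only a product of unconditional probabilities.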

PROBLEMS
3-25 Three coins are fairly tossed. Let
    E₁: first two coins are heads;
    E₂: last coin is a head;
    E₃: all three coins are heads.
Try to answer the following questions intuitively (does knowledge of the condition affect your betting odds?). Then verify by drawing the relevant sample space and calculating the probabilities for (3-24).
(a) Are E₁ and E₂ independent?
(b) Are E₁ and E₃ independent?

3-26 Repeat Problem 3-25, using the three unfair coins whose sample space follows (compare Problem 3-7).

    e        Pr (e)
    (HHH)    .15
    (HHT)    .10
    (HTH)    .10
    (HTT)    .15
    (THH)    .15
    (THT)    .10
    (TTH)    .10
    (TTT)    .15

3-27 A certain electronic mechanism has 2 bulbs, which have been observed to be on or off with the following long-run relative frequencies:

                      Bulb 2
                   On      Off
    Bulb 1   On   .15     .45
             Off  .10     .30

This table means, for example, that both bulbs were simultaneously off 30 percent of the time.
(a) Is "bulb 1 on" independent of "bulb 2 on"?
(b) Is "bulb 1 off" independent of "bulb 2 on"?

3-28 A single card is drawn from a deck of cards, and let
    E: it is an ace
    F: it is a heart.
Are E and F independent, when we use:
(a) An ordinary 52-card deck?
(b) An ordinary deck, with all the spades deleted?
(c) An ordinary deck, with all the spades from 2 to 9 deleted?

3-6 OTHER VIEWS OF PROBABILITY

(a) Symmetric Probability

In Section 3-1 we defined probability as the limit of relative frequency. There are several other approaches, including symmetric probability, axiomatic probability, and subjective probability.

The physical symmetry of a fair die assures us that all six points are equally probable. Thus each point must have probability 1/6, in order that these six probabilities sum to one (compare to (3-5)):

    Pr (e₁) = Pr (e₂) = ··· = Pr (e₆) = 1/6

In general, for an experiment having equally probable outcomes, the probability of an event E consisting of N_E points is given by (3-9), where the summation extends only over the points eⱼ in E (N_E in number). Thus, for equally likely outcomes,

    Pr (E) = N_E / N    (3-27)

For example, for a fair die consider the event E: the number of dots is an even number. E consists of three of the six equiprobable elementary outcomes (2, 4, or 6 dots); thus its probability is 3/6.

Symmetric probability theory begins its development with (3-27) as the definition of probability, and gives a simpler theory than our earlier relative frequency approach. However, our earlier analysis was more general; although the examples we cited often involved equiprobable outcomes, the theory we developed was in no way limited to such cases. In reviewing it, you should confirm that it may be applied whether or not outcomes are equiprobable; special attention should be given to those cases (e.g., Problem 3-26) where the outcomes were not equiprobable.

Not only is symmetric probability limited because it lacks generality; it also has a major philosophical weakness. Note how the definition of probability in (3-27) involves the phrase "equally probable"; we are guilty of circular reasoning.

Our own relative frequency approach to probability suffers from the same philosophical weakness. We might ask what sort of limit is meant in equation (3-1)? It is logically possible that the relative frequency n₁/n behaves badly, even in the limit; for example, no matter how often we toss a die, it is just conceivable that the ace will keep turning up every time, making lim n₁/n = 1. Therefore, we should qualify equation (3-1) by stating that the limit occurs with high probability, not logical certainty. In using the concept of probability in the definition of probability, we are again guilty of circular reasoning.
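The symmetric formula (3-27) amounts to counting favorable points. A minimal Python sketch (ours, not the text's), for the fair-die example just given:

```python
from fractions import Fraction

# Six equally likely outcomes: the faces of a fair die.
outcomes = [1, 2, 3, 4, 5, 6]

def pr(event):
    # (3-27): Pr(E) = N_E / N for equally probable outcomes.
    return Fraction(len(event), len(outcomes))

# E: the number of dots is even, i.e., 2, 4, or 6.
even = [e for e in outcomes if e % 2 == 0]
assert pr(even) == Fraction(3, 6)
```

Note that this shortcut is valid only under the equal-likelihood assumption; with the unfair coins of Problem 3-26, one must instead sum the individual outcome probabilities as in (3-9).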

(b) Axiomatic (Objective) Probability

The only philosophically sound approach, in fact, is an abstract axiomatic approach. In a simplified version, the following properties are taken as axioms:

Axioms.
    Pr (eᵢ) ≥ 0    (3-2) repeated
    Pr (e₁) + Pr (e₂) + ··· + Pr (e_N) = 1    (3-4) repeated
    Pr (E) = Σ Pr (eᵢ)    (3-9) repeated

Then the other properties, such as (3-1), (3-3), and (3-20), are theorems derived from these axioms, with axioms and theorems together comprising a system of analysis that appropriately describes probability situations such as die tossing, etc.

Equation (3-1) is particularly important, and is known as the law of large numbers. Equations (3-3) and (3-20) may be proved very easily, in fact so easily that we shall give the proof to illustrate how nicely this axiomatic theory can be developed. We can prove even stronger results: for any event E,

Theorems.
    0 ≤ Pr (E)    (3-28), like (3-2)
    Pr (E) ≤ 1    (3-29), like (3-3)
    Pr (E) = 1 − Pr (Ē)    (3-30), repeating (3-20)

Proof. According to axioms (3-9) and (3-2), Pr (E) is the sum of positive terms, and is therefore positive; thus (3-28) is proved.

To prove (3-30), we write out axiom (3-4):

    [Pr (e₁) + Pr (e₂) + ···] + [··· + Pr (e_N)] = 1
      terms for E               terms for Ē

According to (3-9), this is just

    Pr (E) + Pr (Ē) = 1    (3-31)

from which (3-30) follows.

In (3-28) we proved that every probability is positive or zero; in particular, Pr (Ē) is positive or zero. Substituting this into (3-31) ensures that

    Pr (E) ≤ 1    (3-29)

Thus our above theorems are established; other theorems may be similarly derived.

(c) Subjective Probability

Sometimes called personal probability, this is an attempt to deal with events that cannot be repeated, even conceptually, and hence cannot be given any frequency interpretation. For example, consider events such as an increase in the stock market average tomorrow, or the overthrow of a certain government within the next month. These events are described by the layman as "likely" or "unlikely," even though there is no hope of estimating this by observing their relative frequency. Nevertheless, their likelihood vitally influences policy decisions, and as a consequence must be estimated in some rough-and-ready way. It is only then that practical decisions can be made on the risks worth taking.

To answer this need, an axiomatic theory of personal probability has been developed. Roughly speaking, personal probability is defined as the odds one would give in betting on an event; we shall find this a useful concept later (Chapter 15).

Review Problems

3-29 A tetrahedral (four-sided) die has been loaded. Find Pr (e₄), if possible, given the following conditions. (If the problem is impossible, state so.)
(a) Pr (e₁) = .2; Pr (e₂) = .4; Pr (e₃) = .1
(b) Pr (e₁) = .4; Pr (e₂) = .4; Pr (e₃) = .3
(c) Pr (e₁) = .4; Pr (e₂) = .6
(d) Pr (e₁) = .7; Pr (e₃) = .2

3-30 In a family of 3 children, what is the chance of
(a) At least one boy?
(b) At least 2 boys?
(c) At least 2 boys, given at least one boy?
(d) At least 2 boys, given that the eldest is a boy?

3-31 Suppose that the last 3 customers out of a restaurant all lose their hat-checks, so that the girl has to hand back their 3 hats in random order. What is the probability
(a) That no man will get the right hat?
(b) That exactly 1 man will?
(c) That exactly 2 men will?
(d) That all 3 men will?

3-32 What is the probability that
(a) 3 people picked at random have different birthdays?
(b) A roomful of 30 people all have different birthdays?
(c) In a roomful of 30 people there is at least one pair with the same birthday?

3-33 A bag contains a thousand coins, one of which has heads on both sides (so that it turns up heads without fail). A coin is drawn at random and flipped. What is the probability that it is the loaded coin, if it turns up heads
(a) 3 times in a row?
(b) 10 times in a row?
(c) 20 times in a row?

3-34 Repeat Problem 3-33 if the loaded coin has both H and T faces, but is biased so that its probability of H is 3/4.

chapter 4

Random Variables and Their Distributions

4-1 DISCRETE RANDOM VARIABLES

Again consider the experiment of fairly tossing 3 coins. Suppose that our only interest is the total number of heads. This is an example of a random variable or variate, and is customarily denoted by a capital letter thus:

    X = the total number of heads    (4-1)

The possible values of X are 0, 1, 2, 3; however, they are not equally likely. To find what the probabilities are, it is necessary to examine the original sample space in Figure 4-1. Thus, for example, the event "two heads" (X = 2) consists of 3 of the 8 equiprobable outcomes; hence its probability is 3/8. Similarly, the probability of each of the other events is computed. Thus in Figure 4-1 we obtain the probability function of X.

The mathematical definition of a random variable is "a numerical-valued function defined over a sample space."¹ But for our purposes we can be less abstract; it is sufficient to observe that:
    A discrete random variable takes on various values with probabilities specified in its probability function.    (4-2)

In our specific example, the random variable X (number of heads) takes on the values 0, 1, 2, 3, with probabilities specified by the probability function in Figure 4-1b.

¹ Although the intuitive definition (4-2) will serve our purposes well enough, it is not always as satisfactory as the more rigorous mathematical definition, which stresses the random variable's relation to the original sample space. Thus, for example, in tossing 3 coins, the random variable Y = total number of tails is seen to be a different random variable from X = total number of heads. Yet X and Y have the same probability distribution, (cont'd)

[Figure 4-1 sketch: the old sample space of 8 equiprobable outcomes (T,T,T), (T,T,H), (T,H,T), (T,H,H), (H,T,T), (H,T,H), (H,H,T), (H,H,H) is mapped onto the new, smaller sample space of values x = 0, 1, 2, 3, and the probability p(x) of each value is graphed.]

FIG. 4-1 (a) X, the random variable "number of heads in three tosses." (b) Graph of the

probability function.

In the general case of defining a probability function, as in Figure 4-2, we begin by considering in the original sample space events such as (X = 0), (X = 1), ..., in general (X = x); (note that capital X represents the random variable, and small x a specific value it may take). For these events we calculate the probabilities and denote them² p(0), p(1), ..., p(x), .... This probability function p(x) may be presented equally well in any of 3 ways:

1. Table form, as in Figure 4-1a.
2. Graph form, as in Figure 4-1b.
3. Formula form.

(cont'd) and anyone who used the loose definition (4-2) might be deceived into thinking that they were the same random variable. There is more to a random variable than its probability function.

² This notation, like any other, may be regarded simply as an abbreviation for convenience. Thus, for example, the number p(3) is short for Pr (X = 3), which in turn is short for "the probability that the number of heads is three." Note that when X = 3 is abbreviated to 3, Pr is correspondingly abbreviated to p.
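The tabulation of Figure 4-1a, mapping each outcome to a value of X and collecting probabilities, can be sketched programmatically. A minimal Python example (ours, not the text's):

```python
from itertools import product
from fractions import Fraction
from collections import Counter

# Original sample space: 8 equiprobable outcomes of 3 fair tosses.
space = [''.join(p) for p in product('HT', repeat=3)]

# The random variable X maps each outcome to its number of heads;
# p(x) collects the probability of each value x.
counts = Counter(o.count('H') for o in space)
p = {x: Fraction(n, len(space)) for x, n in sorted(counts.items())}

assert p == {0: Fraction(1, 8), 1: Fraction(3, 8),
             2: Fraction(3, 8), 3: Fraction(1, 8)}
```

The dictionary p is exactly the table form of the probability function; graphing it would give Figure 4-1b.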

FIG. 4-2 A general random variable X as a mapping of the original outcome set e₁, e₂, ..., with probabilities Pr (e₁), Pr (e₂), ..., onto a condensed set of numbers. (The set of numbers illustrated is 0, 1, 2, ..., the set of positive integers. We really ought to be more general, however, allowing both negative values and fractional (or even irrational) values as well. Thus our notation, strictly speaking, should be x₁, x₂, ..., xᵢ, ..., rather than 0, 1, 2, ..., x, ....)

Thus the old, complicated sample space (outcome set) is reduced to a much smaller, numerical sample space. The original sample space is introduced to enable us to calculate the probability function p(x) for the new space; having served its purpose, it is then forgotten. The interesting questions can be very easily answered in the new sample space. For example, referring to Figure 4-3, what is the probability of 1 head or fewer? We simply add up the relevant probabilities in the new sample space:

    Pr (X ≤ 1) = p(0) + p(1) = 1/8 + 3/8 = 4/8

FIG. 4-3 The event X ≤ 1 in both sample spaces, illustrating the easier calculation in the new sample space.
DISCRETE

The

RANDOM

VARiABLEs

answe\177

found,

been

have

could

spa,

sample

more

with

but

trouble,

in the original

EXAMPL

In the
in t

changes

is 1,

;ame experiment of

tosses
of a coin, let Y = number
of
for the sequence HTT, the value of Y
there is one changeover from
H to T. In Figure 4-4 we use the

becaus,

\337(H, H,

\337(H,

fair

example,

For

\177esequence.

T)\177-.

\177

T, H)

(T, r. H).

FIG.4-4

The

random variable

Y (\"number of
its probability

and

technique p(y)
de,'eloped above to
function

define

changes in sequence
distribution.

random

this

of 3

tosses of a coin\

variable and its probability

!-

PROBLEMs

In each
by

4-1 In

.\177se,

constr

first

4 fair

(a) 2' =
(b)

4-2 Let

4-3

Two

paper

Y=

2' be

tabulate

ucting

the

a sample

probability
function of the random variable,
space of the experimental outcomes.

ossesof a coin,let
heads.

lumber

of

lumber

of changes

in

he total

number of

dots showing

box(

are

between

t]

s each contain
drawn,
\177enumbers

6 slips

one from
drawn

sequence.
when

of paper

each of
(absolute

the

two

fair dice

are tossed.

numbered 1 to 6. Two slips of


Let 2' be the difference

boxes.

value).

56    RANDOM VARIABLES

⇒ 4-4 To review Chapter 2, consider the experiment of tossing 3 coins; the number of heads X may be 0, 1, 2, or 3. Repeat this experiment 50 times⁵ to obtain 50 values of X, so that you can:
(a) Construct a relative frequency table of X.
(b) Graph this relative frequency table.
(c) Calculate the sample mean X̄ from (2-1b).
(d) Calculate the mean squared deviation from (2-5b).
(e) If the experiment were repeated millions of times, to what value would:
1. The relative frequencies tend?
2. X̄ tend?
3. MSD (mean squared deviation) tend?
4. s² tend?

4-2 MEAN AND VARIANCE

In Chapter 3 we defined probability as limiting relative frequency, and we notice the close relation between the relative frequency table observed in Problem 4-4 and the probability table calculated in Figure 4-1, for tossing 3 coins. If the sample size were increased without limit (i.e., if we continued to toss ad infinitum), the relative frequency table would settle down to the probability table.

From the relative frequency table (Problem 4-4), we calculated the mean X̄ and variance s² of our sample.⁶ It is natural to calculate analogous population values from the probability table, and call them the mean μ and variance σ² of the probability distribution p(x), or of the random variable X itself. Thus

    Population mean,     μ ≡ Σ x p(x)              (4-3) cf. (2-1b)

    Population variance, σ² ≡ Σ (x − μ)² p(x)      (4-4) cf. (2-5b)
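Definitions (4-3) and (4-4) are weighted sums and translate directly into a few lines of code. An illustrative sketch in Python, using the three-coin p(x):

```python
from fractions import Fraction as F

# p(x) for X = number of heads in 3 tosses of a fair coin (Figure 4-1).
p = {0: F(1, 8), 1: F(3, 8), 2: F(3, 8), 3: F(1, 8)}

mu  = sum(x * p[x] for x in p)               # (4-3): mu = sum of x p(x)
var = sum((x - mu) ** 2 * p[x] for x in p)   # (4-4): sigma^2 = sum of (x-mu)^2 p(x)

print(mu, var)   # 3/2 3/4
```

These are exactly the values μ = 1.50 and σ² = .75 worked out by hand in Table 4-1.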

⁵ Or simulate this by consulting the random numbers in Appendix Table IIa (with an even number representing a head, and an odd number a tail); or else use the authors' results, as follows:

    03220 11232 11221 22213 13332 12212
    12121 11233 21112 11213

⁶ Strictly speaking, we calculated the mean squared deviation, rather than s². However, as n → ∞, they become practically indistinguishable.

MEAN AND VARIANCE

For our example of tossing three coins, we compute μ and σ² in Table 4-1. Note the analogy here to our calculations in Table 2-4. We call μ the "population mean," since it is based on the whole population of all possible tosses of the coins. On the other hand, X̄ is called a sample mean, since it is based on a mere sample of tosses drawn from the parent population of all possible tosses. Similarly, σ² and s² represent population and sample variance, respectively. A clear distinction between population and sample values is crucial; we return to this point in Chapters 6 and 7.

TABLE 4-1 Mean and Variance of a Random Variable

    Given probability   Calculation of μ   Calculation of σ²                       Easier calculation of
    function            from (4-3)         from (4-4)                              σ², using (4-5)
    x      p(x)         x p(x)             (x − μ)   (x − μ)²   (x − μ)² p(x)      x² p(x)
    0      1/8          0                  −3/2      9/4        9/32               0
    1      3/8          3/8                −1/2      1/4        3/32               3/8
    2      3/8          6/8                +1/2      1/4        3/32               12/8
    3      1/8          3/8                +3/2      9/4        9/32               9/8
                        μ = 12/8                     σ² = 24/32                    Σ x² p(x) = 24/8
                          = 1.50                        = .75                      σ² = 24/8 − (1.50)²
                                                                                      = 24/8 − 18/8 = 6/8 = .75

Since the definitions of μ and σ² parallel those of X̄ and s², we find parallel interpretations. We continue to think of the mean μ as a weighted average, using probability weights rather than relative frequency weights; the mean is also a fulcrum and center. The standard deviation is a measure of spread.

The computation of σ² is often simplified by using:

    σ² = Σ x² p(x) − μ²        (4-5)

This formula, whose computation is illustrated in the last column of Table 4-1, is analogous to (2-7).

* Proof that (4-5) is equivalent to (4-4). Reexpress (4-4) as:

    σ² = Σ (x² − 2μx + μ²) p(x)
       = Σ x² p(x) − 2μ Σ x p(x) + μ² Σ p(x)

Since μ is a constant, and noting that Σ x p(x) = μ and Σ p(x) = 1, we have

    σ² = Σ x² p(x) − 2μ(μ) + μ²
       = Σ x² p(x) − μ²        (4-5) proved
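The algebraic equivalence of (4-4) and (4-5) is easy to confirm numerically. A short illustrative check in Python, again for the three-coin distribution:

```python
from fractions import Fraction as F

p = {0: F(1, 8), 1: F(3, 8), 2: F(3, 8), 3: F(1, 8)}
mu = sum(x * p[x] for x in p)

var_def      = sum((x - mu) ** 2 * p[x] for x in p)     # definition (4-4)
var_shortcut = sum(x ** 2 * p[x] for x in p) - mu ** 2  # shortcut   (4-5)

assert var_def == var_shortcut == F(3, 4)   # both give sigma^2 = 3/4 = .75
```

The shortcut avoids forming each deviation (x − μ), which is why it is usually the more convenient route by hand.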

58    RANDOM VARIABLES

When a random variable is linearly transformed, the new mean and variance behave in exactly the same way as when sample observations were transformed in Section 2-5 (the proof is quite analogous and is left as an exercise). For future reference, we state these results in Table 4-2.

TABLE 4-2 Linear Transformation (Y) of a Random Variable (X)

    Random Variable    Mean        Variance     Standard Deviation
    X                  μX          σX²          σX
    Y = a + bX         a + bμX     b²σX²        |b| σX

We could write out all the information in this table verbally, working across the rows, as follows: "Consider the random variable X, with mean μX and variance σX². If we define a new random variable Y as a linear function of X (specifically Y = a + bX), then the mean of Y will be a + bμX, and its variance will be b²σX²."
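The rows of Table 4-2 can be verified directly by transforming the whole distribution. In the Python sketch below, the constants a = 2 and b = 3 are hypothetical values of our own choosing, used purely for illustration:

```python
from fractions import Fraction as F

p = {0: F(1, 8), 1: F(3, 8), 2: F(3, 8), 3: F(1, 8)}   # three-coin p(x)
a, b = 2, 3                                            # illustrative constants

def mean(dist):
    return sum(x * q for x, q in dist.items())

def variance(dist):
    m = mean(dist)
    return sum((x - m) ** 2 * q for x, q in dist.items())

# Build the distribution of Y = a + bX directly, then compare with Table 4-2.
p_y = {a + b * x: q for x, q in p.items()}

assert mean(p_y) == a + b * mean(p)              # mu_Y    = a + b mu_X
assert variance(p_y) == b ** 2 * variance(p)     # sigma_Y^2 = b^2 sigma_X^2
```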

PROBLEMS

4-5 Compute μ and σ² for the probability distributions in Problem 4-1. As a check, compute σ² in 2 ways: from the definition (4-4), and from the easy formula (4-5).

(4-6) Compute μ and σ² for the random variables of
(a) Problem 4-2.
(b) Problem 4-3.

4-7 Letting X be the number of dots rolled on a fair die, first find μX and σX. Then if Y = 2X + 4, calculate μY and σY in 2 ways:
(a) By tabulating the probability function of Y, then using (4-3) and (4-5).
(b) By using Table 4-2.

(4-8) A bowl contains tags numbered from 1 to 10. There are ten 10's, nine 9's, etc. Let X denote the number on a tag drawn at random.
(a) Make a table of its probability function.
(b) Find μX and σX.

4-9 A student is given 4 questions, each with a choice of 3 answers. Let X be the number of correct answers when the student has to guess each answer. Compute the probability function and the mean and variance of X.

BINOMIAL DISTRIBUTION    59

⇒ 4-10 Let Z = (X − μ)/σ, where X is a random variable with mean μ and standard deviation σ. What are the mean and standard deviation of Z? (This introduces Section 4-5.)

4-11 Suppose that the population of American families yields the following table for family size:⁷

    No. of children in family    0     1     2     3     4     5     6
    Proportion of families       .43   .18   .17   .11   .06   .03   .02

Source: Statistical Abstract of U.S., 1963, p. 41.

(a) Let X be the number of children in a family selected at random. (This selection may be done by lots: imagine each family represented on a slip of paper; the slips are well mixed, and then one slip is drawn at random.) The probability function of X is given in the table, of course. Find μX and σX.
(b) Now let a child be selected at random (rather than a family), and let Y be the number of children in his family. (This selection may be done by a teacher, for example, who picks a child by lot from the registered population of children.) What are the possible values of Y? Complete the probability table, and compute μY and σY.
(c) Is μX or μY more properly called the "average family size"?

⁷ Slightly altered, for simplicity, by truncating the number of children at 6.

4-3 THE BINOMIAL DISTRIBUTION

The number of heads in n tosses of a coin is an example of a classical discrete random variable. There are many types of discrete random variables; we shall study the binomial variable as an example of how a general formula can be developed for the probability function p(x).

In order to generalize, we shall speak of n independent "trials," each resulting in either "success" or "failure," with respective probabilities π and (1 − π). Then X, the total number of successes, is defined as a binomial random variable. There are many practical random variables of this type, some of which are listed in Table 4-3. We shall now derive a simple formula for the probability function

60    RANDOM VARIABLES

TABLE 4-3 Examples of Binomial Variables

    "Trial"                        "Success"   "Failure"       π                      X = number of successes
    Tossing a fair coin            Head        Tail            1/2                    Number of heads in n tosses
    Birth of a child               Boy         Girl            Practically 1/2        Number of boys in the family
    Throwing 2 dice                7 dots      Anything else   6/36                   Number of sevens in n throws
    Drawing a voter in a poll      Democrat    Republican      Proportion of          Number of Democrats in the
                                                               Democrats in the       sample (n = sample size)
                                                               population
    History of one radioactive     Decay       No change       Very small             Number of radioactive decays
    atom, which may decay                                                             in the sample (n = the number
    during a certain time period                                                      of atoms, very large)

p(x). First, consider the special case in which we compute the probability of getting 3 heads in tossing 5 coins (Figure 4-5a). Each point in our outcome set is represented as a sequence of five of the letters S (success) and F (failure). We concentrate on the event three heads (X = 3), and show all outcomes that comprise this event. In each of these outcomes S appears

[Figure 4-5: (a) the outcome set (SSSSS), (SSSSF), (SSSFS), ..., (FFSSS), ..., (FFFFF) for 5 trials, with the event X = 3 marked; (b) the general case, with outcomes such as (SSS···S FF···F), S appearing x times and F appearing n − x times, in n trials.]

FIG. 4-5 Computing binomial probability. (a) Special case: 3 heads in 5 tosses of a coin. (b) General case: x successes in n trials.

BINOMIAL DISTRIBUTION    61

three times and F twice. Since the probability of S is π and the probability of F is (1 − π), the probability of the sequence SSSFF is

    π · π · π · (1 − π)(1 − π) = π³(1 − π)²

this multiplication being justified by the independence of the trials. We further note that any other outcome in this event has the same probability; for example, the probability of SFSSF is π(1 − π)ππ(1 − π) = π³(1 − π)². In general, the probability of the sequence

    SS···S FF···F        (x times, then n − x times)

is π^x (1 − π)^(n−x), and every other outcome in this event, with its x S's and n − x F's ordered differently, has this same probability. We only have to determine how many such sequences (outcomes) are included in this event. This is precisely the number of ways in which three S's and two F's can be arranged; this number of ways is designated C(5, 3), and is⁹

    C(5, 3) = 5!/(3!(5 − 3)!) = 5!/(3! 2!) = 10

or, in general,

    C(n, x) = n!/(x!(n − x)!)

Our event thus includes 10 outcomes, each with probability π³(1 − π)². Hence its probability is:

    p(3) = [5!/(3! 2!)] π³(1 − π)²

and so on, in general:

    p(x) = C(n, x) π^x (1 − π)^(n−x)        (4-7)

We summarize this computation in Figure 4-6.

62    RANDOM VARIABLES

[Figure 4-6: the outcome set for n trials, with outcomes such as (FF···FFF), (FF···FSF), (SF···FSS), ..., (SSSS···SS); each outcome e has probability Pr(e) of the form π^x(1 − π)^(n−x).]

FIG. 4-6 Computing binomial probability of x successes in n trials.

As a final example, we return to our previous experiment of tossing three fair coins. What is the probability of two heads? Each toss is an independent trial in which π, the probability of success, is 1/2. Noting that n = 3 and x = 2, we have

    Pr(X = 2) = [3!/(2! 1!)] (1/2)²(1/2)¹ = 3/8        (4-8)

a confirmation of our previous result.

⁹ This formula can be developed as follows. Suppose we wish to fill five spots with the five ordered objects S₁, S₂, S₃, F₁, F₂. We have a choice of 5 objects to fill the first spot, 4 the second, and so on; thus the number of options we have is

    5 · 4 · 3 · 2 · 1 = 5!        (4-6)

But this is not the number of distinct arrangements in the problem at hand; in our case we cannot distinguish between S₁, S₂, and S₃, all of which appear as S. Thus many of our separate and distinct arrangements in (4-6) (e.g., S₁S₂S₃F₁F₂ and S₂S₁S₃F₁F₂) cannot be distinguished, and appear as the single arrangement SSSFF. Thus (4-6) involves serious double counting. How much? We double counted 3 · 2 · 1 = 3! times, because we assumed in (4-6) that we could distinguish between S₁, S₂, and S₃ when in fact we cannot. (3! is simply the number of distinct ways of arranging S₁, S₂, and S₃.) Similarly, we double counted 2 · 1 = 2! times, because we assumed in (4-6) that we could distinguish between F₁ and F₂, when in fact we cannot. When (4-6) is deflated for double counting in both these ways, we have

    5!/(3! 2!) = 5!/(3!(5 − 3)!)
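Formula (4-7) is easily put to work. The illustrative Python sketch below defines the binomial probability function and reproduces the check (4-8); exact fractions keep the arithmetic honest:

```python
from math import comb
from fractions import Fraction as F

def binom_pmf(x, n, pi):
    """(4-7): p(x) = C(n, x) * pi^x * (1 - pi)^(n - x)."""
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)

# Three tosses of a fair coin: Pr(X = 2) should be 3/8, as in (4-8).
print(binom_pmf(2, 3, F(1, 2)))   # 3/8

# Sanity check: the complete distribution sums to one.
assert sum(binom_pmf(x, 3, F(1, 2)) for x in range(4)) == 1
```

`math.comb` plays the role of C(n, x) = n!/(x!(n − x)!) from the derivation above.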

CONTINUOUS DISTRIBUTIONS    63

PROBLEMS⁸

4-12 (a) Construct a diagram similar to Figure 4-6 to obtain the probability function for the number of heads X when 4 coins are tossed; calculate μ and σ².
(b) Values of p(x), as well as the complete binomial distribution, are tabulated in Table III of the Appendix for π = 1/2; use the table to check (a).
(c) Graph the probability function of (b), showing μ and σ.

4-13 (a) Let X = the total number of red balls drawn blind from a bowl containing 2 red, 1 blue, and a number of black balls. The ball is replaced, a second ball is drawn, and so on, until 3 balls have been drawn (sampling with replacement). Tabulate its probability function.
(b) Repeat (a), for Y = the number of blue balls drawn.

4-14 Repeat Problem 4-12(a) in terms of a general π; then set π = 1/2 to check that you obtain the results for a fair coin.

(4-15) Check Problem 4-9, using the formulas of this section.

4-16 In rolling 3 dice, let X be the number of aces that occur. Tabulate the probability function of X. Find μ and σ². Graph.

4-17 Suppose the probability of hitting a target with one toss of a dart is π. In 6 tosses, what is the probability that you will hit the target exactly 2 times? At most 2 times? At least 3 times? (Note that the total probability is 1.) Can you guess the mean of a general binomial variable? Can you guess the variance? (This leads into Section 6-6.)

*4-18 (For calculus students only.) Graph the function e^(−z²/2), showing its
(a) Symmetry.
(b) Asymptotes.
(c) Maximum.
(d) Points of inflection.
(This leads into Section 4-5.)

⁸ Starred problems are optional, since they are more theoretical and/or difficult than the rest.

4-4 CONTINUOUS DISTRIBUTIONS

In Chapter 2 we saw how a continuous quantity such as height was best graphed with a relative frequency histogram. The histogram of heights in Figure 2-3 is reproduced in Figure 4-7a.
64    RANDOM VARIABLES

(For purposes of illustration, we measure height in feet rather than inches. Furthermore, the vertical axis has been shrunk to the same scale as the x-axis.) Note that in Figure 4-7a relative frequency is given by the height of each bar; but since its width (or base) is 1/4, its area (height times width) is numerically only 1/4 as large. Thus we can't use area in this figure to represent relative frequency, since it would badly understate. In fact, if we wish area to represent relative frequency, each height must be increased fourfold. This is done in Figure 4-7b, where the height of each bar is called relative frequency density. In general,

    area of any bar = (relative frequency density)(cell width) = relative frequency

[Figure 4-7: two histograms of men's heights, with vertical scales running from 0.25 to 1.00; in panel (b) a unit square of area = 1 is marked.]

FIG. 4-7 Relative frequency and relative frequency density. (a) Relative frequency histogram. (b) (a) transformed so that the area of each bar is relative frequency, making total area = 1.

There is but one more important observation. In Figure 4-7a, the relative frequencies must sum to one (the sum of the heights is one).

[Figure 4-8: four panels showing the relative frequency density of men's heights; in each, area = relative frequency of men in an interval (e.g., 5¼ to 5½ ft) ≈ probability of drawing a man in that interval.]

FIG. 4-8 How relative frequency density may be approximated by a probability density function as sample size increases, and cell size decreases. (a) Small n, as in Fig. 4-7b. (b) Large enough n to stabilize relative frequencies. (c) Even larger n, to permit finer cells while keeping relative frequencies stable. (d) For very large n, this becomes (approximately) a smooth probability density curve.

65

66    RANDOM VARIABLES

From the numerical equivalence of height in Figure 4-7a and area in Figure 4-7b, it follows that the areas in Figure 4-7b must also sum to one. And this is a key characteristic of a density function in statistics: it encloses an area numerically equal to 1.

In Figure 4-8 we show what happens to the relative frequency density of a continuous random variable as:

1. Sample size increases.
2. Cell size decreases.

With a small sample, chance fluctuations influence the picture. But as sample size increases, chance is averaged out, and relative frequencies settle down to probabilities. At the same time, the increase in sample size allows a finer definition of cells. While the area remains fixed at 1, the relative frequency density becomes approximately a curve, the so-called probability density function, which we shall refer to simply as the probability function, designated p(x).

If we wish to compute the mean and variance from Figure 4-8c, the discrete formulas (4-3) and (4-4) can be applied. But if we are working with the probability density function in Figure 4-8d, then integration (which calculus students will recognize as the limiting case of summation) must be used; if a and b are the limits of X, then (4-3) and (4-4) become

    Mean,     μ = ∫ x p(x) dx            (4-9)

    Variance, σ² = ∫ (x − μ)² p(x) dx    (4-10)

with both integrals taken from a to b.

All the theorems that we state about discrete random variables are equally valid for continuous random variables, with summations replaced by integrals. Proofs are also very similar. Therefore, to avoid tedious duplication, we give theorems for discrete random variables only, leaving it to the reader to supply the continuous case himself, if he so desires.
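For the student without calculus, (4-9) and (4-10) can be approximated by summing over many narrow cells, exactly in the spirit of Figure 4-8c. The Python sketch below does this for the illustrative density p(x) = 2x on [0, 1]; this density is our own example, not from the text, chosen because its integrals are easy to verify by hand (μ = 2/3, σ² = 1/18):

```python
def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

p = lambda x: 2 * x                                      # density on [0, 1]
mu  = integrate(lambda x: x * p(x), 0, 1)                # (4-9)
var = integrate(lambda x: (x - mu) ** 2 * p(x), 0, 1)    # (4-10)

print(round(mu, 4), round(var, 4))   # 0.6667 0.0556
```

Note that `integrate(p, 0, 1)` returns 1, the key characteristic of any density: it encloses unit area.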

4-5 THE NORMAL DISTRIBUTION

For many random variables, the probability density function is a specific bell-shaped curve, called the normal curve, or Gaussian curve, shown in Figures 4-9 to 4-12. It is the single most useful probability function in statistics. Many variables in physical and economic phenomena are normally distributed; for example, errors made in measuring physical variables often are normally distributed. In addition, there are other useful probability functions (such as the binomial) which often can be approximated by the normal curve.

(a) Standard Normal Distribution

The probability function of the standard normal variable Z is

    p(z) = (1/√(2π)) e^(−z²/2)        (4-11)

where 1/√(2π) is a scale factor required to make the total area 1. The symbols π and e denote important mathematical constants,⁷ approximately 3.14 and 2.7 respectively.

[Figure 4-9: the standard normal curve p(z), plotted from about z = −3 to z = +3; in panel (b) a unit square shows the rescaled vertical axis.]

FIG. 4-9 (a) Standard normal curve. (b) Vertical axis rescaled.

We draw the normal curve in Figure 4-9. In Problem 4-18 you may have confirmed that the graph of (4-11) is as shown there: it reaches a maximum at z = 0, and as we move to the left or right of 0, z² increases; since its negative exponent is increasing in size,

⁷ The mathematical constant π = 3.14 is not to be confused with the π used in Section 4-3 to designate probability of success.

68    RANDOM VARIABLES

p(z) decreases. Moreover, as z takes on very large (positive or negative) values, i.e., as we move further away from zero, the negative exponent in (4-11) becomes very large and p(z) approaches zero. Finally, since z appears only in squared form, −z generates the same probability in (4-11) as +z; this confirms that the curve is symmetric, with the shape of the standard normal curve as we have drawn it in Figure 4-9. The mean and variance of Z can be calculated by integration using (4-9) and (4-10); since this requires calculus, we quote the results without proof:

    μZ = 0
    σZ = 1

[Figure 4-10: the standard normal curve, with the area between 0 and z₀ shaded.]

FIG. 4-10 Probability enclosed by the normal curve between 0 and z₀.

In fact, it is for this very reason, that its mean is 0 and its standard deviation is 1, that Z is called a standard normal variable. Later, when we speak of "standardizing" any variable, this is precisely what we mean: shifting it so that its mean is 0, and shrinking (or stretching) it so that its standard deviation (or variance) is one.

The probability (area) enclosed by the normal curve between the mean (0) and any specified value (say z₀) also requires calculus to evaluate precisely, but may be easily pictured in Figure 4-10. This evaluation of probability, done once and for all, has been recorded in Table IV of the Appendix. Students without calculus can think of this as accumulating the area of the approximating rectangles, as in Figure 4-8c.

To illustrate this table, consider the probability that Z falls between .6 and 1.3, as shown in Figure 4-11a. From Table IV in the Appendix we note that the probability that Z falls between 0 and .6 is .2257; similarly, the probability that Z falls between 0 and 1.3 is .4032. We require the difference of these two, namely:

    Pr(.6 ≤ Z ≤ 1.3) = .4032 − .2257 = .1775

In Figure 4-11b we consider the probability that Z falls between −1 and +2. Because of the symmetry of the normal curve, the probability that Z falls between −1 and 0 is identical to the probability that Z falls between 0 and +1, which is .3413. To this we add the probability that Z falls between 0 and 2, namely .4772, which yields:

    Pr(−1 ≤ Z ≤ +2) = .3413 + .4772 = .8185

Finally, the probability that Z falls within one standard deviation of the mean, Pr(−1 ≤ Z ≤ +1), is 2(.3413) = .6826, or just over 2/3; the student may confirm this from Table IV.

[Figure 4-11: the standard normal curve, with the areas for these two examples shaded.]

FIG. 4-11 Standard normal probabilities.
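Where the text accumulates areas from Appendix Table IV, a modern reader can reproduce the same numbers from the error function. An illustrative Python sketch (the name `phi` for the cumulative area is our own):

```python
from math import erf, sqrt

def phi(z):
    """Cumulative standard normal probability, Pr(Z <= z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# The two worked examples above:
print(round(phi(1.3) - phi(0.6), 4))    # 0.1775, as in Pr(.6 <= Z <= 1.3)
print(round(phi(2.0) - phi(-1.0), 4))   # 0.8186; the text's .8185 differs only
                                        # because the table entries are rounded
```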

THE NORMAL DISTRIBUTION    69

PROBLEMS

4-19 If Z is a standard normal variable, use Appendix Table IV to evaluate:
(a) Pr(0 ≤ Z ≤ 1.64).
(b) Pr(−2 ≤ Z ≤ 2).
(c) Pr(−2.33 ≤ Z).
(d) Pr(Z ≤ 2).
(e) Pr(−∞ < Z < ∞).

4-20 (a) If Pr(Z ≤ z₀) = .99, what is z₀?
(b) If Pr(−z₀ ≤ Z ≤ z₀) = .95, what is z₀?

(b) General Normal Distribution

If a random variable X has a normal probability curve, with mean μ and standard deviation σ, its probability function is written:

    p(x) = (1/(σ√(2π))) e^(−½((x−μ)/σ)²)        (4-12)

To prove that (4-12) is centered at μ, we note that the peak of the curve occurs when the negative exponent attains its smallest value 0, i.e., when x = μ. It may also be shown that (4-12) is scaled by the factor σ. Finally, it is bell shaped for the same reasons given in part (a).

70    RANDOM VARIABLES

We notice that in the very special case in which μ = 0 and σ = 1, (4-12) reduces to the standard normal distribution (4-11). But more important, regardless of what μ and σ may be, we can translate any normal variable X in (4-12) into the standard form (4-11) by defining:

    Z = (X − μ)/σ        (4-13)

[Figure 4-12: a general normal variate X, with density p(x) = (1/(σ√(2π))) e^(−½((x−μ)/σ)²), transformed into the standard normal variate Z, with density p(z) = (1/√(2π)) e^(−z²/2). Sometimes the scale is stretched; sometimes it is shrunk.]

FIG. 4-12 Linear transformation of any normal variable into the standard normal variable.

Z is recognized as just a linear transformation of X, as shown in Figure 4-12. Notice that whereas the mean and standard deviation of a general normal variate X can take on any values, the standard normal variate Z is unique, with mean 0 and standard deviation 1, as proved in Problem 4-10. To evaluate any normal variate X, we therefore translate X into Z, and then evaluate Z in the standard normal table (Appendix Table IV).

For example, suppose that X is normal, with μ = 100 and σ = 5. What is the probability of getting an X value of 110 or more? That is, we wish to evaluate

    Pr(X ≥ 110)        (4-14)

First, (4-14) can be written equivalently* as

    Pr((X − 100)/5 ≥ (110 − 100)/5)        (4-15)

* Any inequality is preserved if both sides are diminished by the same amount (100) and divided by the same positive amount (5).

THE NORMAL DISTRIBUTION    71

which, standardizing according to (4-13), is

    Pr(Z ≥ 2)        (4-16)

We see that (4-16) is the standardized form of (4-14), and from Table IV we evaluate this probability to be .0228. Moreover, the standardized form (4-16) allows a clearer interpretation of our original question; in fact, we were asking "What is the probability of getting a normal value at least two standard deviations above the mean?" The answer is: very small, about one in fifty.

As a final example, suppose that the length of a bolt picked at random from a production line is a normal random variable with mean 10 cm and standard deviation 0.2 cm. What is the probability that its length will be between 9.9 and 10.1 cm? That is

    Pr(9.9 ≤ X ≤ 10.1)

This may be written in the standardized form

    Pr((9.9 − 10)/.2 ≤ (X − 10)/.2 ≤ (10.1 − 10)/.2)
        = Pr(−.50 ≤ Z ≤ .50)
        = .38
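Both worked examples reduce to (4-13) followed by a table lookup, and that two-step recipe is easy to mechanize. An illustrative Python sketch (function names are ours):

```python
from math import erf, sqrt

def phi(z):
    """Cumulative standard normal probability, Pr(Z <= z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def pr_between(lo, hi, mu, sigma):
    """Standardize with (4-13), then use the standard normal curve."""
    return phi((hi - mu) / sigma) - phi((lo - mu) / sigma)

# Pr(X >= 110) when mu = 100, sigma = 5: two standard deviations above the mean.
print(round(1 - phi((110 - 100) / 5), 4))         # 0.0228

# Bolt lengths: Pr(9.9 <= X <= 10.1) with mu = 10, sigma = 0.2.
print(round(pr_between(9.9, 10.1, 10, 0.2), 2))   # 0.38
```

Notice that only the standardization step depends on μ and σ; the normal-curve evaluation itself is always the same, which is the point made next.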

These calculations confirm our earlier observation from Figure 4-12: although there is any number of normal curves, there is only one standard normal curve. This is fortunate; instead of requiring a whole book of tables, we only need one (Appendix Table IV).

PROBLEMS

4-21 Draw a diagram similar to Figure 4-12 for both of the examples solved in the text directly above. Shade the area being evaluated.

4-22 If X is normal, calculate:
(a) Pr(4.5 ≤ X ≤ 6.5), where μX = 5 and σX = 1.
(b) Pr(X ≥ 800), where μX = 400 and σX = 200.
(c) Pr(X ≤ 800), where μX = 400 and σX = 200.

4-23 Suppose that the population of men's heights is normally distributed, with a mean of 68 inches and a standard deviation of 3 inches. Find the proportion of the men who are:
(a) Under 5 feet 6 inches.
(b) Between 5 feet 6 inches and 6 feet.
(c) Over 6 feet.
To check your 3 answers, see whether they sum to 1.

72    RANDOM VARIABLES

4-6 A FUNCTION OF A RANDOM VARIABLE

Looking again at the experiment of tossing three coins, let us suppose that there is a reward R depending upon the number of heads X. Formally, we might state that R is a function of X, the specific form being

    R = g(X) = (X − 1)²

which is equally well given by the tabled form in Table 4-4.

TABLE 4-4 Tabled Form of the Function R = g(X) = (X − 1)²

    Value of x    Value of r = g(x) = (x − 1)²
    0             (0 − 1)² = 1
    1             (1 − 1)² = 0
    2             (2 − 1)² = 1
    3             (3 − 1)² = 4

The values of R are customarily rearranged in order, as shown in Table 4-5. Furthermore, the values of R have certain probabilities, which may be deduced from the previous probabilities of X (just as the probabilities of X were deduced from the probabilities in our original sample space).

TABLE 4-5 Calculation of the Probabilities of Various R Values from the Probabilities of X Values

    (1) x    (2) p(x)    (3) r = g(x)        (4) R value    (5) p(r)
    0        1/8         1                   0              3/8
    1        3/8         0                   1              1/8 + 3/8 = 4/8
    2        3/8         1                   4              1/8
    3        1/8         4

(In the original table, arrows link each r in column (3) to its R value in column (4).)

NOTATION    73

The third column in this table shows the calculation of R from X; we note from Table 4-4 that two values of X (0 or 2) give an R value of 1. This is indicated with arrows in Table 4-5. The last column shows the probability distribution of R, "derived" from the probability distribution of X. Although R has been derived from X, it is a random variable in its own right, defined on the original sample space (Figure 4-1); thus it has all the properties of an ordinary random variable. The mean of R can be computed from its probability distribution, as in Table 4-5, and is found to be 1.0. But if it is more convenient, the answer can be derived from the probability distribution of X, as in Table 4-6.

TABLE 4-6 Mean of R = (X − 1)², Calculated from the Probability Function of X

    x    p(x)    g(x)    g(x) p(x)
    0    1/8     1       1/8
    1    3/8     0       0
    2    3/8     1       3/8
    3    1/8     4       4/8
                 μR = Σ g(x) p(x) = 8/8 = 1.0

It is easy to see why this works; in a disguised way we are calculating the same thing. The first and third lines of Table 4-6 appear together as the second line of Table 4-5. Also, the second and fourth lines of Table 4-6 correspond to the first and third lines of Table 4-5. Thus Table 4-6 contains precisely the same information as Table 4-5; it therefore yields the same value for μR. The only difference in the two tables is that 4-6 is ordered according to X values, while 4-5 is ordered (and condensed) according to R values.

This example can be generalized as follows. If X is a random variable, and g is any function, then R = g(X) is a random variable; its mean μR may be calculated either from the probability function of R, or alternatively from the probability function of X, according to:

    Theorem.    μR = Σ g(x) p(x)        (4-17a)
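Theorem (4-17a) says the two routes in Tables 4-5 and 4-6 must agree, and a short illustrative Python sketch makes the point concrete (the names are ours):

```python
from fractions import Fraction as F

p = {0: F(1, 8), 1: F(3, 8), 2: F(3, 8), 3: F(1, 8)}   # 3 coins, X = heads
g = lambda x: (x - 1) ** 2                             # the reward R = g(X)

# Route 1 (Table 4-5): derive the probability function of R, then average.
p_r = {}
for x, q in p.items():
    p_r[g(x)] = p_r.get(g(x), 0) + q
mu_r_from_r = sum(r * q for r, q in p_r.items())

# Route 2 (Table 4-6, Theorem 4-17a): work directly in the X space.
mu_r_from_x = sum(g(x) * q for x, q in p.items())

assert mu_r_from_r == mu_r_from_x == 1   # mu_R = 1.0 either way
```

Route 2 is often the more convenient one, since it never requires condensing the distribution of R.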

74    RANDOM VARIABLES

4-7 NOTATION

Some new notation will help us better understand the various viewpoints of the mean. For any random variable, X let us say, all the following terms mean exactly the same thing:¹⁰

    μX = mean of X = average of X = expectation of X = E(X), the expected value of X

The term E(X) is introduced because it is useful as a reminder that it represents a weighted sum, i.e.,

    E(X) = Σ x p(x)        (4-17b)

which we recall is just (4-3). With this new notation, the result (4-17a) can be written E(R) = Σ g(x) p(x). Finally, since R was just an abbreviation for g(X), we may equally well write this in an easily remembered form:

    Theorem.    E[g(X)] = Σ g(x) p(x)        (4-17c)

As an example of this notation, we may write

    E(X − μ)² = Σ (x − μ)² p(x)        (4-18)

By (4-4),

    σ² = E(X − μ)²        (4-19)

Thus we see that σ² may be regarded as just a kind of expectation; namely, the expectation of the random variable (X − μ)².

PROBLEMS

4-24 As in Problem 4-1, let X be the number of heads when 4 coins are fairly flipped.
(a) If R(X) = X² − 3X, find its probability function, and μR.
(b) Find E|X − 2| in 2 ways:
(1) Using the probability function of |X − 2|; and
(2) Using (4-17) and the probability function of X.
(c) Find E(X²).
(d) Find E(X − μX)². Is this related to σX in any way?

¹⁰ The reason for the plethora of names is historical. For example, gamblers and economists use the term "expected gain," meteorologists use the term "mean annual rainfall," and teachers use the term "average grade."

NOTATION    75

(4-25) Repeat Problem 4-24, letting X be the number of changes in sequence when 4 coins are tossed.

4-26 The time T required for a rat to run a maze, in seconds, is a random variable with the following probability function:

    t      p(t)
    23     .2
    24     .2
    25     .3
    26     .2
    27     .1

(a) Find the average time.
(b) Suppose the rat is rewarded with 1 biscuit for each second faster than 25. (For example, if he takes just 23 seconds, he gets a reward of 2 biscuits. Of course, if he takes 25 seconds or longer, he gets no reward.) What is the rat's average reward?

Review Problems

4-27 In a recent presidential election, 60% of the voters went Democratic, and 40% went Republican. If Gallup took a sample of 5 voters at random, find:
(a) The probability that the sample would be all Democrats.
(b) The probability that the sample would correctly forecast the election winner, i.e., that a majority of the sample would be Democratic.
(c) In what way is a sample of 5 better than a sample of 1?

4-28 Three coins are independently flipped; let X = number of heads. Make a table of the probability function, and find μX and σX:
(a) Assuming the coins are fair.
(b) Assuming the last coin is biased, coming "heads up" 3/4 of the time.

4-29 Suppose the amount of cereal in a package cannot be weighed exactly. In fact, it is a normally distributed random variable, with μ = 10.10 oz. and σ = .040 oz. On the package is claimed, "net weight, 10 oz."
(a) What proportion of the packages are underweight?
(b) To what value must the mean μ be raised in order that only 1/10 of 1% of the packages be underweight?

76

VARIABLES

RANDOM

volunteers had their breathing capacity


measured
before
and
after a certain treatment.
The
data
might have looked like this'
Eight

4-30

Capacity

Breathing

Before

After

Improvement

2750

2850

+100

2360

2380

2950

2800

Person

+20

-150

Let us concentrate
deteriorates,i.e., whether

so

that

his

chance

and 1/6the
so

(a)

Find

errors.
(b) What
*4-32

that

impossible.)

learns rapidly,
1/4 the second time,

time,

first

the
that

that

he

equally well from his successesand


may be consideredindependent.
table and mean
of X -- the total number of
learns

the three trials

the probability
is the

1/2 the

is

signs? (Assume
He

succession.

in

what

average,

is practically

a tie

times

improves or
is q- or --.

time.

third

We assume
failures,

that

a task 3
of error is

performs

person

a given person
of the improvement

has no effect, on
be 6 or more q-

precise

are so

measurements

(4-31) A

will

there

that

sign

the

treatment

that

Supposing

probability

whether

on

of more

probability

I error?

than

(Requires calculus)
A

random

variable

X is
p(x)

continuous, and has a


=

\177x2

-- 0
(a)

Graph

x <

function

otherwise

p(x).

(b) Find the

expect ?

(c) Find

0 <

probability

a 2.

mean, median, and mode.Are

they

in

the order you

chapter 5
Two Random Variables

5-1 DISTRIBUTIONS

The main problem of this section is a simple extension of the last two chapters. Therefore we outline in Table 5-1 both the review and the introduction of this first section. The problem will be to recognize the old ideas behind the new names.

TABLE 5-1  Review of Section 5-1, Showing the Origins of the Ideas (new terminology)

  Old Idea (for events)                          Applied to Random Variables
  -------------------------------------------   -------------------------------------------
  Event intersection,                            Joint probability function:
    Pr(G ∩ H), or Pr(E ∩ F) in general  (3-11)     Pr(X = 2 ∩ Y = 1) = p(2, 1)         (5-2a)
                                                   Pr(X = x ∩ Y = y) = p(x, y) in general  (5-2b)
  Conditional probability,                       Conditional probability function:
    Pr(H/G) = Pr(H ∩ G)/Pr(G)           (3-22)     p(2/Y = 1); p(x/Y = y), or p(x/y),
                                                   in general                           (3-24)
  F is independent of E if                       Variable X is independent of Y if
    Pr(F/E) = Pr(F)                     (3-25)     p(x/y) = p(x), or p(x, y) = p(x)p(y)

(a) Joint Probability Function

In the experiment of tossing a coin three times, let us define (on our single sample space) two random variables:

    X = number of heads
    Y = number of changes in sequence

TABLE 5-2  Two Random Variables Defined on the Original Sample Space

  (1) Outcomes    (2) Corresponding X value    (3) Corresponding Y value
      HHH                   3                            0
      HHT                   2                            1
      HTH                   2                            2
      HTT                   1                            1
      THH                   2                            1
      THT                   1                            2
      TTH                   1                            1
      TTT                   0                            0

We might be interested in the probability of 2 heads and 1 change of sequence occurring together. As usual, we refer to the sample space of the experiment (in column 1 of Table 5-2), and look for the intersection of these two events, obtaining

    Pr(X = 2 ∩ Y = 1) = 2/8        (5-1)

For convenience, Pr(X = 2 ∩ Y = 1) is abbreviated to

    p(2, 1)        (5-2a)

Similarly we could compute p(0, 0), p(0, 1), p(0, 2), p(1, 2) ..., obtaining in Table 5-3 what is called the joint (or bivariate) probability function of X and Y. The formal definition is

    p(x, y) ≜ Pr(X = x ∩ Y = y)        (5-2b)

The general case is illustrated in Figure 5-1. The events X = 0, X = 1, X = 2 ... form a partition of the sample space, shown schematically as a horizontal slicing. Similarly, the events Y = 0, Y = 1 ... form a partition of the sample space, shown as a vertical slicing. The intersection of the horizontal slice X = x and the vertical slice Y = y is the event (X = x ∩ Y = y). Its probability is collected into p(x, y) in the table.

FIG. 5-1  Two random variables (X, Y), showing their sample space and joint probability function.

TABLE 5-3  The Joint Probability p(x, y) of X and Y in Three Tosses of a Coin

   value of x     value of y
                 0      1      2      p(x)
       0        1/8     0      0      1/8
       1         0     2/8    1/8     3/8
       2         0     2/8    1/8     3/8
       3        1/8     0      0      1/8
     p(y)       2/8    4/8    2/8

This table, or specifically Table 5-3, may be graphed, but we run into some typographical difficulties in trying to represent 3 dimensions on a two-dimensional piece of paper. We shall suggest some possible ways to resolve this difficulty. First, since the outlay of the x and y in Table 5-3 is arbitrary, we shall change it for convenience, running x across and y up as in Figure 5-2a (this is the custom in analytic geometry). Then the functional values p(x, y) may be plotted in the direction of an axis which we imagine coming up out of the paper, as in Figure 5-2b, or the functional value may be represented by the size of the dot, as in Figure 5-2c.

FIG. 5-2  Various graphic presentations of Table 5-3. (a) Realignment of the axes. (b) p(x, y) represented by a line segment "coming up out of the paper." (c) p(x, y) represented by the size of the dot.
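The construction of Table 5-3 can also be checked mechanically: enumerate the eight equally likely outcomes and accumulate 1/8 into each (x, y) cell. A minimal Python sketch:

```python
from fractions import Fraction as F
from itertools import product

# X = number of heads, Y = number of changes in sequence,
# over the 8 equally likely outcomes of three coin tosses.
joint = {}
for outcome in product("HT", repeat=3):
    x = outcome.count("H")
    y = sum(a != b for a, b in zip(outcome, outcome[1:]))
    joint[(x, y)] = joint.get((x, y), F(0)) + F(1, 8)

# p(2, 1) collects the two outcomes HHT and THH, as in (5-1).
```

Printing `joint` reproduces the nonzero cells of Table 5-3; in particular joint[(2, 1)] is 2/8.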

(b) Marginal Probability Function

Suppose we are interested only in X, and yet have to work with the joint probability function of X and Y. How can we compute the probability function of X? In our example, the probability that X = 2 (in the schematic sample space of Figure 5-1) is the probability of the horizontal slice comprising it, i.e., the sum of the probabilities of all the chunks in it:

    p(2) = p(2, 0) + p(2, 1) + p(2, 2) + p(2, 3) + ...        (5-3)

    p(2) = Σ p(2, y)        (5-4)
           y

and in general, for any given x,

    p(x) = Σ p(x, y)        (5-5)
           y

For example, this idea may be applied to Table 5-3: we sum each row, and place this sum in the right-hand margin. Thus p(x) is computed for every x, and is called the marginal probability distribution of X, found in the right-hand margin. But, of course, the word "marginal" has no specific technical meaning; it simply describes how the distribution may be calculated and placed "in the margin." In conclusion, it is just the ordinary probability function of X (which could have been found without any reference to Y, as indeed it was in Figure 4-1); sometimes it is called "marginal" when another variable Y is in play.

Similarly, p(y), the (marginal) probability distribution of Y, is set out in the bottom row of Table 5-3; each element in this row is the sum of the column above. Finally, we note, as expected, the exact correspondence of this marginal probability distribution of Y with the probability distribution of Y calculated in an identical way in Figure 4-4, without any reference whatsoever to X.
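Equation (5-5) — sum each row of the joint table — can be verified in a few lines of Python (rebuilding the joint function of the coin example first):

```python
from fractions import Fraction as F
from itertools import product

# Rebuild the joint function of Table 5-3 from the sample space.
joint = {}
for outcome in product("HT", repeat=3):
    x = outcome.count("H")
    y = sum(a != b for a, b in zip(outcome, outcome[1:]))
    joint[(x, y)] = joint.get((x, y), F(0)) + F(1, 8)

# Marginal of X, as in (5-5): p(x) = sum over y of p(x, y).
p_x = {}
for (x, y), pr in joint.items():
    p_x[x] = p_x.get(x, F(0)) + pr
```

`p_x` agrees with the right-hand margin of Table 5-3: 1/8, 3/8, 3/8, 1/8.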

(c) Conditional Probability Function

In the example of tossing three coins, we might wish to know the probabilities of various numbers of heads, given one change in sequence. And, in general, it is often of interest to know the probability distribution of X, when Y is given. Thus, let us suppose that Y is known to be 1. The conditional probability distribution of X, given Y = 1, is designated as p(x/Y = 1). How is it to be evaluated?

Clearly, we should examine the vertical slice for Y = 1 in Figure 5-1, or specifically this column of Table 5-3, reproduced as the second column of Table 5-4 below. The problem is that the probabilities in this column do not sum to 1; hence they cannot represent a probability distribution. They do, however, give us the relative probabilities of various X values. Thus, if we know Y = 1, we know that X cannot be 0 or 3, but X values of 1 or 2 are equally probable. Intuitively, therefore, we arrive at the conditional distribution of X given Y = 1 as shown in the third column: since all elements in column 2 summed to only 1/2, we simply doubled them all. The result (column 3) is a bona fide probability distribution.

TABLE 5-4  Derivation of the Conditional Distribution of X, Given Y = 1

  Values of x      p(x, 1)         p(x/Y = 1)
       0              0                0
       1             2/8              1/2
       2             2/8              1/2
       3              0                0
              Sum = Pr(Y = 1)       Sum = 1
                  = p(1) = 1/2

Formally, doubling all elements in column 2 is justified rigorously by the theory in Chapter 3, where conditional probability was found to be:

    Pr(H/G) = Pr(H ∩ G)/Pr(G)        (3-22) repeated

We merely substitute the events defined in terms of random variables, as follows:

    For G, substitute (Y = 1)
    For H, substitute (X = x)        (5-6)

Thus

    Pr(X = x/Y = 1) = Pr(X = x ∩ Y = 1)/Pr(Y = 1)

Using our new notation, this becomes

    p(x/Y = 1) = p(x, 1)/p(1)        (5-7)

In our example, p(1) = 1/2, so that (5-7) becomes

    p(x/Y = 1) = 2 p(x, 1)        (5-8)

thus justifying the doubling in Table 5-4. The generalization of (5-7) is clearly

    p(x/Y = y) = p(x, y)/p(y)        (5-9)

The conditional probability distribution p(x/Y = y) may be further abbreviated to p(x/y), giving

    p(x/y) p(y) = p(x, y)        (5-10)

Note how similar this is to equation (3-22).

Since the conditional distribution is a bona fide probability distribution, it can be used, for example, to obtain the conditional mean:

    μx/y ≜ E(X/Y = y) = Σ x p(x/y)        (5-11)
                        x
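The doubling argument of Table 5-4, and the conditional mean (5-11), can both be checked numerically; a short Python sketch on the coin example:

```python
from fractions import Fraction as F
from itertools import product

# Joint function of Table 5-3.
joint = {}
for outcome in product("HT", repeat=3):
    x = outcome.count("H")
    y = sum(a != b for a, b in zip(outcome, outcome[1:]))
    joint[(x, y)] = joint.get((x, y), F(0)) + F(1, 8)

# p(x / Y = 1) = p(x, 1) / p(1), as in (5-7).
p1 = sum(pr for (x, y), pr in joint.items() if y == 1)     # p(1) = 1/2
cond = {x: pr / p1 for (x, y), pr in joint.items() if y == 1}

# Conditional mean, as in (5-11).
cond_mean = sum(x * pr for x, pr in cond.items())
```

Here `cond` is {1: 1/2, 2: 1/2} — exactly column 3 of Table 5-4 — and the conditional mean is 3/2.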

(d) Independence

We define the independence of 2 random variables by extending the concept of the independence of 2 events developed in Chapter 3.

Definition.
    The random variables X and Y are called independent if, for every x and y, the events (X = x) and (Y = y) are independent.        (5-12)

The consequences are easily derived. From (3-25) we know that the independence of the events (X = x) and (Y = y) means that

    Pr(X = x ∩ Y = y) = Pr(X = x) Pr(Y = y)

i.e.,

    p(x, y) = p(x) p(y)        (5-13)

Returning to our example, we easily show that X and Y are not independent. For independence, (5-13) must hold for every (x, y) combination. We ask whether it holds, for example, when x = 0 and y = 0. The answer is no; from the probabilities in Table 5-3, (5-13) is shown to be violated, since p(0, 0) = 1/8, while p(0) p(0) = (1/8)(2/8) = 1/32.
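Checking (5-13) over every (x, y) cell is mechanical; a short Python sketch confirming that the X and Y of Table 5-3 are not independent:

```python
from fractions import Fraction as F
from itertools import product

# Joint function of Table 5-3, with its marginals.
joint = {}
for outcome in product("HT", repeat=3):
    x = outcome.count("H")
    y = sum(a != b for a, b in zip(outcome, outcome[1:]))
    joint[(x, y)] = joint.get((x, y), F(0)) + F(1, 8)

xs = {x for x, _ in joint}
ys = {y for _, y in joint}
p_x = {x: sum(joint.get((x, y), F(0)) for y in ys) for x in xs}
p_y = {y: sum(joint.get((x, y), F(0)) for x in xs) for y in ys}

# (5-13) must hold for EVERY cell; a single failure disproves independence.
independent = all(
    joint.get((x, y), F(0)) == p_x[x] * p_y[y] for x in xs for y in ys
)
```

The cell (0, 0) alone already fails: p(0, 0) = 1/8 while p(0) p(0) = 1/32.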

PROBLEMS

5-1  In 4 tosses of a fair coin, let

         X = number of heads
         Y = number of changes of sequence

     List the sample space, and then find
     (a) The bivariate probability function; illustrate it with a dot graph as in Figure 5-2c.
     (b) The (marginal) probability function of X.
     (c) The mean and variance of X.
     (d) The conditional probability function p(x/Y = 2).
     (e) The conditional mean and variance of X, given Y = 2.
     (f) Are X and Y independent?

5-2  Suppose X and Y have the following joint probability function:

          x:      5     10
              .10    .20    …
          y   .15    .30    …

     (The remaining entries are .10 and .15.) Answer the same questions as in Problem 5-1.

5-3  Suppose X and Y have the following joint probability function:

              .1    .1    .1
              .4    .1    .1
              .1

     Answer the same questions as in Problem 5-1.

5-2 FUNCTIONS OF TWO RANDOM VARIABLES

In the previous chapter, we analyzed a derived random variable R which was some function of the (original) random variable X:

    R = g(X)        (5-14)

In this chapter we shall analyze a derived variable T which is some function of a pair of random variables X, Y:

    T = g(X, Y)        (5-15)

The concepts and proofs of this section will therefore run parallel to those in Section 4-7, the main difference being that the joint probability function p(x, y) will replace the probability function p(x). We shall be particularly interested in the distribution and mean of the new variable T.

Example

Following our normal procedure, we develop the argument in terms of a simple example, and then generalize. To use our example of tossing three coins, shown in Figure 5-3, suppose S is just the sum of X and Y; the specific case of (5-15) becomes:

    S = X + Y        (5-16)

We use the symbol S rather than T to emphasize that this function, being a simple sum, is a very special case of (5-15). In Figure 5-3, we show how p(s), the probability function of S, may be derived directly from the original sample space, or indirectly by means of the joint probability function p(x, y). In either case, the result is the same.

FIG. 5-3  Two views of the derivation of the probability function of S = X + Y. (a) Directly, from the original sample space. (b) Using the joint probability function of X and Y as an intermediate condensation of the sample space. In either case, the final probability function is:

      s     p(s)
      0     1/8
      2     2/8
      3     4/8
      4     1/8

To give some motivation as to why this random variable might be of practical interest, we may reinterpret the tossing of 3 coins as "having 3 children," and then consider X = number of girls and Y = number of sex changes. Since girls are more expensive to clothe than boys, and since sex changes interfere with the convenient passing on of clothing from one child to the next child, we might interpret S = X + Y as a rough index of the clothing costs for the family. Of course, a weighted average of X and Y might be even more appropriate.

On the one hand, consider the direct derivation in Figure 5-3a. To illustrate, we note that four of the eight equiprobable outcomes are associated with S = 3. Hence p(3) = Pr(S = 3) is 4/8. Other S probabilities are similarly evaluated. On the other hand, p(3) may be evaluated indirectly, by first deriving the joint probability function p(x, y) in Figure 5-3b. Then the three circled (x, y) combinations all yield S = 3, and the sum of their probabilities is 4/8.

The expectation E(S) may similarly be derived in two ways. On the one hand, applying (4-3) directly to the probability distribution of S, we have, by definition:

    E(S) = Σ s p(s)
         = 0(1/8) + 2(2/8) + 3(4/8) + 4(1/8) = 2 1/2        (5-17)

On the other hand, we may arrive at the same result by using the joint distribution of X and Y. Specifically, we wonder if (4-17c) can be extended to:

    E(X + Y) = Σ Σ (x + y) p(x, y)        (5-18)
               x y
             = (0 + 0)(1/8) + (1 + 1)(2/8) + (1 + 2)(1/8)
               + (2 + 1)(2/8) + (2 + 2)(1/8) + (3 + 0)(1/8) = 2 1/2

So we see that (5-18) does, in fact, work in this example. Why? On closer examination, (5-18) is just (5-17) in disguised form. For example, the three terms (1 + 2)(1/8), (2 + 1)(2/8), and (3 + 0)(1/8) amount to 3(1/8 + 2/8 + 1/8) = 3(4/8), which is just the second last term of (5-17). Continuing in this fashion, every term of (5-17) can be recovered from (5-18).

In a similar way we could prove generally:

Theorem.
    If T = g(X, Y) is any function of two random variables, then

        E(T) = E[g(X, Y)] = Σ Σ g(x, y) p(x, y)        (5-19)
                            x y

    [compare (4-17c)].

For an example of how this works for a more complicated function of X and Y, we return to the tossing of three coins, and consider

    T = X² − 2Y        (5-20)

Following the method of Figure 5-3(a), we can derive the probability distribution for T:

      t     p(t)     t p(t)
     −3     1/8      −3/8
     −1     2/8      −2/8
      0     2/8        0
      2     2/8       4/8
      9     1/8       9/8

from which E(T) is directly calculated to be 1. Alternatively, we could calculate E(T) from (5-19), using the p(x, y) given in Table 5-3. Thus, noting (5-20):

    E(T) = Σ Σ (x² − 2y) p(x, y)
           x y
         = (0² − 2(0))(1/8) + (0² − 2(1))(0) + (0² − 2(2))(0)
           + (1² − 2(0))(0) + ... + (3² − 2(2))(0)
         = 1
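Both routes to the expectation — the direct distribution of the derived variable, and theorem (5-19) applied to the joint function — can be verified in a few lines of Python:

```python
from fractions import Fraction as F
from itertools import product

# Joint function of Table 5-3 for three coin tosses.
joint = {}
for outcome in product("HT", repeat=3):
    x = outcome.count("H")
    y = sum(a != b for a, b in zip(outcome, outcome[1:]))
    joint[(x, y)] = joint.get((x, y), F(0)) + F(1, 8)

def expect(g, joint):
    """E[g(X, Y)] = sum of g(x, y) p(x, y), as in (5-19)."""
    return sum(g(x, y) * pr for (x, y), pr in joint.items())

e_s = expect(lambda x, y: x + y, joint)          # E(S) for S = X + Y
e_t = expect(lambda x, y: x**2 - 2 * y, joint)   # E(T) for T = X^2 - 2Y
```

Both agree with the direct calculations above: E(S) = 2 1/2 and E(T) = 1.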

PROBLEMS

5-4  Let

         U = X(X + Y)
         V = (X − 8)(Y − 4)

     where X and Y have the same joint distribution as in Problem 5-2.
     (a) Find the distribution of U, and from this its mean.
     (b) Find the mean of U using (5-19).
     (c) Find E(V).

5-5  Let

         U = XY − 1
         V = (X − 1)(Y − 2)

     where X and Y have the same joint distribution as in Problem 5-3.
     (a) Find the distribution of U, and from this its mean.
     (b) Find the mean of U using (5-19).
     (c) Find E(V).


5-3 COVARIANCE

This is a measure of the degree to which two variables are linearly related. As an example, consider the joint probability function of Table 5-5, graphed in Figure 5-4a. We notice some tendency for these two variables to move together (i.e., a large X tends to be associated with a large Y, and a small X with a small Y).

TABLE 5-5  Joint Probability p(x, y)

         x:   1     2     3     4     5
   y = 5                             .1
       4           .1          .2
       3                 .2
       2           .2          .1
       1     .1

   (blank cells represent zero probability)

Our measure of how the variables move together should be independent of our choice of origin. It will, therefore, be convenient in Figure 5-4b to translate both axes from the (0, 0) origin to μx and μy (which are calculated to be 3 and 3); this means defining two new variables,

    X − μx    and    Y − μy

FIG. 5-4  Translation of axes. (a) Original distribution. (b) Axes translated to the center of the distribution.

Now suppose we multiply the new coordinate values together, (X − μx)(Y − μy). For any point in the first quadrant of Figure 5-4b, both its (X − μx) and (Y − μy) coordinates will be positive; hence this product will be positive. It will also be positive for any point in the third quadrant, since both factors are negative. But for points in the other two quadrants the product is negative. If we sum all of these products, attaching the appropriate probability weights to each, i.e.,

    Σ Σ (x − μx)(y − μy) p(x, y)        (5-21)
    x y

this gives us a good measure of how the variables move together, and is in fact σxy, the "covariance of X and Y." In our example, the heavier probability weights appear in quadrants I and III; thus the positive terms in this calculation will outweigh the negative. Consequently, covariance will be positive, indicating, as expected, some tendency for the variables to move together. Alternatively, if the larger probabilities had occurred in quadrants II and IV, covariance would be negative, indicating, as expected, the tendency for X and Y to move in opposite directions. Finally, had the probabilities been evenly distributed in the four quadrants, there would be no discernible tendency for X and Y to move together and, as expected, their covariance would be zero.

We notice that (5-21) is equivalent to the following formal definition:

Definition.
    σxy = E(X − μx)(Y − μy)        (5-22)  Covariance of X and Y

The computation of σxy in our example is as follows:

    σxy = (−2)(−2)(.1) + (−1)(−1)(.2) + (−1)(+1)(.1)
          + (+1)(−1)(.1) + (+1)(+1)(.2) + (+2)(+2)(.1)
        = +1.0

The computation may often be simplified by using

    σxy = E(XY) − μx μy        (5-23)

This formula, with its proof, is analogous to (4-5).

Proof of (5-23): beginning with (5-21),

    σxy = Σ Σ (x − μx)(y − μy) p(x, y)        (5-24)
          x y
        = Σ Σ (xy − μx y − μy x + μx μy) p(x, y)
        = Σ Σ xy p(x, y) − μx Σ Σ y p(x, y) − μy Σ Σ x p(x, y)
          + μx μy Σ Σ p(x, y)        (5-25)

In the second term of (5-25), we find that

    Σ Σ y p(x, y) = Σ y [Σ p(x, y)] = Σ y p(y) = μy        (5-26)
    x y             y    x             y

by (5-5). Similarly, in the third term of (5-25),

    Σ Σ x p(x, y) = μx        (5-27)
    x y

Finally, in the last term of (5-25), Σ Σ p(x, y) = 1. Thus (5-25) reduces to

    σxy = Σ Σ xy p(x, y) − μx(μy) − μy(μx) + μx μy
        = E(XY) − μx μy        (5-23) proved

The variance of X [ref. (4-19)] is recognized as just a special case of covariance, being the covariance of X with itself.

Since covariance measures the extent to which the two variables move together, we find it plausible (indeed, it may be proved⁴) that:

Theorem.
    If X and Y are independent, then σxy = 0.        (5-28)

⁴ Proof. If X and Y are independent, then

    p(x, y) = p(x) p(y)        (5-13) repeated

Thus (5-21) becomes

    σxy = Σ Σ (x − μx)(y − μy) p(x) p(y)
        = [Σ (x − μx) p(x)] [Σ (y − μy) p(y)]
           x                  y
        = 0 · 0 = 0        (5-28) proved

PROBLEMS

5-6  For the following joint probability table, calculate σxy:
     (a) From the definition (5-22).
     (b) From the easier formula (5-23).

         x:    0     1     2
             .10   .40   .20
             .05   .05   .20

5-7  Repeat Problem 5-6 for the following joint distribution:

             .2    .4
             .2    .2

5-8  Suppose X and Y have the following joint distribution: …
     (a) Find p(x) and p(y); then, by verifying that p(x) p(y) = p(x, y), confirm that X and Y are independent [ref. equation (5-13)].
     (b) What is σxy?

5-9  (a) Referring to Problems 5-4 and 5-5, is it true that E(V) = σxy?
     (b) Referring to Problem 5-1, find σxy.

*5-10  Suppose X and Y have the following joint probability distribution:

         x:    0     1
             .1    .3
             .1    .1
             .2    .2

      (a) Find the probability function of X and the probability function of Y. Compute E(X) and E(Y).
      (b) Are X and Y independent?
      (c) Calculate σxy.
      (d) Which statements are true, for any X and Y?
          (1) If X and Y are independent, then σxy must be zero.
          (2) If σxy = 0, then X and Y must be independent.

⇒ 5-11  In a certain gambling game, a pair of honest three-sided dice are thrown. Let

            X₁ = number on the first die
            X₂ = number on the second die

      The joint probability distribution of X₁ and X₂ is, of course, 1/9 for each of the nine combinations. The total number of dots S is:

            S = X₁ + X₂

      (a) Find the distribution of S, and its mean and variance.
      (b) Find the mean and variance of X₁ and X₂.
      (c) Do you see the relation between (a) and (b)?

⇒ 5-12  Suppose the gambling game of Problem 5-11 is complicated by using a pair of loaded dice, with distributions

            x     p₁(x)    p₂(x)
            1      .5       .3
            2      .4       .3
            3      .1       .4

      Assuming that the dice are tossed independently, tabulate the joint distribution of X₁ and X₂, and then answer the same questions as in Problem 5-11.
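The two routes to the covariance of Table 5-5 — the definition (5-22) and the shortcut (5-23) — can be checked against each other in Python. A minimal sketch, taking the nonzero cells of Table 5-5 (all other cells zero) and using exact fractions so the .1 weights stay exact:

```python
from fractions import Fraction as F

# Nonzero cells of the joint probability of Table 5-5.
p = {(1, 1): F(1, 10), (2, 2): F(2, 10), (3, 3): F(2, 10),
     (4, 4): F(2, 10), (5, 5): F(1, 10),
     (2, 4): F(1, 10), (4, 2): F(1, 10)}

mu_x = sum(x * pr for (x, y), pr in p.items())
mu_y = sum(y * pr for (x, y), pr in p.items())

# (5-22): covariance as the expected product of deviations.
cov_def = sum((x - mu_x) * (y - mu_y) * pr for (x, y), pr in p.items())

# (5-23): the shortcut E(XY) - mu_x * mu_y.
cov_short = sum(x * y * pr for (x, y), pr in p.items()) - mu_x * mu_y
```

Both routes give μx = μy = 3 and σxy = +1.0, as in the hand computation above.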

5-4 LINEAR COMBINATION OF TWO RANDOM VARIABLES

(a) Mean

First, we leave the problems of more complicated functions, and return to the simple example of Section 5-2 in which S was just the sum of X and Y. When we calculated E(S), the student's suspicions may have been aroused: the mean of S turned out to be (2 1/2), which is simply the mean of X, (1 1/2), plus the mean of Y, (1). In fact, for any X and Y, it may be proved⁵ that:

Theorem.
    E(X + Y) = E(X) + E(Y)        (5-29)

Mathematicians often refer to this important property as the "additivity" or "linearity" of the expectation operator.

⁵ Proof. By (5-19),

    E(X + Y) = Σ Σ (x + y) p(x, y)
               x y
             = Σ Σ x p(x, y) + Σ Σ y p(x, y)

For the first term, we may write

    Σ Σ x p(x, y) = Σ x [Σ p(x, y)] = Σ x p(x) = E(X)
    x y             x    y             x

by (5-5). Similarly the second term reduces to E(Y), so that

    E(X + Y) = E(X) + E(Y)        (5-29) proved

This theorem may easily be generalized to cover the case of a "weighted sum," or linear combination:

    W = aX + bY        (5-30)

where a and b are any two constants. For example, S = X + Y is just a weighted sum with weights a = b = 1. As another example, the simple average of two random variables X and Y, which is (X + Y)/2 = (1/2)X + (1/2)Y, is just a linear combination with a and b = 1/2. Similarly, any weighted average of X and Y is a linear combination satisfying a + b = 1. We might guess that to find the mean of W we may simply plug the means of X and Y into (5-30). Fortunately, this simple operation is always justified; thus:

Theorem.
    E(W) = E(aX + bY) = aE(X) + bE(Y)        (5-31)

As a review, the student should compare (5-19) and (5-31). Both provide a means of calculating the expected value of a function of X and Y. However, (5-19) applies to any function of X and Y, whereas (5-31) is restricted to linear functions only. When we are dealing with this restricted class of linear functions, (5-31) is generally preferred to (5-19) because it is much simpler: whereas evaluation of (5-19) involves working through the whole joint probability distribution of X and Y (e.g., Table 5-3), (5-31) requires only the marginal distributions of X and Y (e.g., the last row and column of that table).
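The contrast between (5-19) and (5-31) can be seen directly in code: one route walks the whole joint table, the other touches only the marginal means, and for a linear combination the two must agree. A minimal Python sketch on the coin example, with a = b = 1/2 (the simple average):

```python
from fractions import Fraction as F
from itertools import product

# Joint function of Table 5-3.
joint = {}
for outcome in product("HT", repeat=3):
    x = outcome.count("H")
    y = sum(a != b for a, b in zip(outcome, outcome[1:]))
    joint[(x, y)] = joint.get((x, y), F(0)) + F(1, 8)

a = b = F(1, 2)

# Route 1, theorem (5-19): work through the whole joint distribution.
e_w_joint = sum((a * x + b * y) * pr for (x, y), pr in joint.items())

# Route 2, theorem (5-31): only the marginal means are needed.
e_x = sum(x * pr for (x, y), pr in joint.items())
e_y = sum(y * pr for (x, y), pr in joint.items())
e_w_marginal = a * e_x + b * e_y
```

With E(X) = 3/2 and E(Y) = 1, both routes give E(W) = 5/4.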

(b) Variance

Again, we consider a simple sum first, and a linear combination later. The variance of a sum is a little more complicated than its mean. It may be proved⁶ that:

Theorem.
    var(X + Y) = var X + var Y + 2 cov(X, Y)        (5-32)

where var X and cov(X, Y) are alternate notations for σx² and σxy, respectively.

⁶ Proof. It is time to simplify our proofs by using brief notation such as E(W), rather than the awkward Σ w p(w), or the even more awkward Σ Σ w(x, y) p(x, y). First, from (4-19),

    var S = E(S − μs)²

Substituting for S and μs, where S = X + Y:

    var S = E[(X + Y) − (μx + μy)]²
          = E[(X − μx) + (Y − μy)]²
          = E[(X − μx)² + 2(X − μx)(Y − μy) + (Y − μy)²]

Realizing that (5-31) holds for any random variables, and that each of these terms is a random variable,

    var S = E(X − μx)² + 2E(X − μx)(Y − μy) + E(Y − μy)²
          = var X + 2 cov(X, Y) + var Y        (5-32) proved

An interesting simplification of (5-32) occurs when X and Y have zero covariance, in which case they are called uncorrelated; this occurs, for example, whenever X and Y are independent, as in the dice Problems 5-11 and 5-12. Then (5-32) simplifies to:

Corollary.
    If X and Y are uncorrelated,

        var(X + Y) = var X + var Y        (5-33)

Finally, (5-32) may be generalized to any linear combination:

Theorem.
    var(aX + bY) = a² var X + b² var Y + 2ab cov(X, Y)        (5-34)

Since the proof of (5-34) parallels the proof of (5-32), it is left as an exercise. Note also that the corollary has a similar generalization.

The theorems of this section are summarized in Table 5-6, a very important table for future reference. The general function g(X, Y) is dealt with in the first row, while the succeeding rows represent increasingly restricted special cases.

TABLE 5-6  Summary of the Mean and Variance of Various Functions of the Random Variables X and Y

  Row 1. Any function g(X, Y):
         Mean:      E[g(X, Y)] = Σ Σ g(x, y) p(x, y)        (5-19)

  Row 2. Linear combination aX + bY:
         Mean:      E(aX + bY) = aE(X) + bE(Y)        (5-31)
         Variance:  var(aX + bY) = a² var X + b² var Y + 2ab cov(X, Y)        (5-34)

  Row 3. Simple sum X + Y (setting a = b = 1 in row 2):
         Mean:      E(X + Y) = E(X) + E(Y)        (5-29)
         Variance:  var(X + Y) = var X + var Y + 2 cov(X, Y)        (5-32)

  Row 4. Function of one variable, aX (setting b = 0 in row 2):
         Mean:      E(aX) = aE(X)        (ref. Table 4-2)
         Variance:  var(aX) = a² var X        (ref. Table 4-2)
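Theorem (5-32) can be verified numerically on the coin example of Table 5-3. A short Python sketch; in this particular example cov(X, Y) happens to be zero even though X and Y are dependent (compare Problem 5-10(d)):

```python
from fractions import Fraction as F
from itertools import product

# Joint function of Table 5-3.
joint = {}
for outcome in product("HT", repeat=3):
    x = outcome.count("H")
    y = sum(a != b for a, b in zip(outcome, outcome[1:]))
    joint[(x, y)] = joint.get((x, y), F(0)) + F(1, 8)

def expect(g):
    return sum(g(x, y) * pr for (x, y), pr in joint.items())

mu_x, mu_y = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mu_x) ** 2)
var_y = expect(lambda x, y: (y - mu_y) ** 2)
cov_xy = expect(lambda x, y: (x - mu_x) * (y - mu_y))

# var(X + Y) computed directly from the joint distribution.
var_sum = expect(lambda x, y: (x + y - mu_x - mu_y) ** 2)
```

Here var X = 3/4, var Y = 1/2, cov(X, Y) = 0, and var(X + Y) = 5/4 = var X + var Y + 2 cov(X, Y), as (5-32) requires.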

Example

Suppose we choose a family at random from a certain population, letting

    B = number of boys in the family
    G = number of girls in the family

so that

    C = B + G = number of children in the family

Suppose it is known that

    var B = 2.2        var G = 2.0        cov(B, G) = 0.3

Then we can calculate the average number of children, and the variance. From (5-29),

    E(C) = E(B) + E(G) = 2.3

From (5-32),

    var C = var B + var G + 2 cov(B, G) = 2.2 + 2.0 + 2(0.3) = 4.8

PROBLEMS

5-13

are

only

not

function

5-12,suppose

5-11 and

Problems

Continuing

but

loaded,

of the 2 numbers

mean

3,\177

the

distribution

.1

.1

.1

.i

.1 .i

.,5

of S

of 3-sided dice
probability

joint

.2

.1
'

'

(the ?otal

number
t.

2..

of dots), and its

and variance.

(b) Find the mean and variance


(c) Find the covariance of
and (5-32) hold true.

(5-14)When

the

.1

Find

pair

the
that

is

(a)

so

dependent,

a coin

is

fairly

tossed

X -- number of headson the


y--- number of headson the
Z -- total number of heads

of
X1

of X2.
X2, and then

X\177 and

and

3 times, let
first
two coins
last

coin

verify

that

(5-29)

COMBINATION

LINEAR

(a) Are X

(b) F\177r
ance.

tossed

, having

in

3-26)
sample space'

10

15

T'T)

\337
(H

. (T

_H)

15

\337
(T

H T)

10

\337
(T

T H)

.10

.15

\337
(TTT)

2nd

(b)

50

80

20

\177

20

\177

(,q +

(b) The instructor

Repea

covariance
%\177

of the

(5-18) Repeat

table, assuming
a simple average of

50

the

second

exam was

,t

such

tverage

two

the

grades,

twice as important,

average

Problem 5-16, if'the covariance

interpr

calculated

thought

a weighted

tool

in the

blanks

the

instructor

The

.V =

5-17

Variance

\177

',rage W

FJ[1 in

so

Standard

Deviation

characteristics:

following

the

with

ob-

X Veighted

av{

(a)

grades,

wrote 2 exams,each time

Mean

X2

\177xam,

(a) Average,

large class

Class

X\177

fairly

10

. (H T H)

1st e\177am,

not

Pr(e)

. (HHT)

udents
of a certain
tainin\177 a distribution
of

is

which

H H)

\177,(H

sl

and vari-

true.

5-16 The

mean,

(Problem

coin

following

the

fact

covariance ?

distribution,

the

find

(5-32) hold

and

(5-29)

that

Z,

Y, and

t Problem 5-14for

Repe\177

is their

What

independent?

of X,

each

(c) v \177rify

(5-15)

and

97

a negative
grade
?

problem 5-16,if

covariance ? What
the

covariance

--200.

is
has

is 0.

might you
to the variance

How

it done

98

Review

VARIABLES

RANDOM

TWO

Problems

X and

5-19 If X and Y have the following joint probability function:

    .1   .3   .1
    .1   .1   .3

Find:
(a) The probability distribution and mean of X.
(b) The probability distribution and mean of Y.
(c) The probability distribution and mean of the sum S = X + Y.
(d) The probability distribution and mean of Y, given X = 5.
(e) Are X and Y independent? Briefly, why?
(f) Find Pr(X < Y).
5-20 In a small community of ten working couples, yearly income (in thousands of dollars) has the following distribution:

Couple    Man's Income    Wife's Income
  1           10               10
  2           15               15
  3           10               10
  4           15               20
  5           10               15
  6           15               10
  7           20               20
  8           10               10
  9           15               15
 10           20               20

A couple is drawn by lot (at random) to represent the community at a convention. Let M and W be the yearly income of the man and wife, respectively. Find:
(a) The bivariate probability distribution and its dot graph.
(b) The probability distribution of M; also μ_M and σ_M.
(c) The probability distribution of W; also μ_W and σ_W.
(d) The covariance, σ_MW.
(e) E(W/M = 10), E(W/M = 15), E(W/M = 20). Note that the conditional mean of W increases as M increases. This is another expression of the "positive relation" between M and W.
(f) If C represents the total combined income of the man and wife, find its mean and variance too.
(g) What is Pr(C ≥ 25)?
(h) If income is taxed a straight 20 percent, what is the mean and variance of the tax on a couple's income?
(i) If the income of a couple is taxed according to the following progressive tax table, what is the mean and variance of the tax?

Combined Income    Tax
      10
      15
      20
      25
      30
      35            10
      40            13
(5-21) Ten people in a room have the following heights and weights:

Person    Height (inches)    Weight (pounds)
  A            70                 140
  B            65                 140
  C            65                 140
  D            75                 160
  E            70                 150
  F            70                 150
  G            65                 150
  H            75                 160
  I            75                 160
  J            70                 160

For a person drawn by lot (with height H and weight W), find:
(a) The bivariate probability distribution, and graph it.
(b) The probability distribution of H, and its mean and variance.
(c) The probability distribution of W, and its mean and variance.
(d) The covariance, σ_HW.
(e) E(W/H = 65), E(W/H = 70), E(W/H = 75). (As height increases, the conditional mean weight increases, which is another view of the positive covariance of H and W.)
(f) Are H and W independent?
(g) If a "size index" I were defined as

I = 2H + 3W

find the mean, variance, and standard deviation of I.
5-22 Suppose a game involves dropping 3 coins, a nickel, a dime, and a quarter, on the table. Each coin that lands "heads up" you are allowed to keep, so that the reward R ranges from 0 to 40 cents.
(a) List the sample space.
(b) What is the distribution of R, its mean, and variance?
We shall now work through an alternate way to find the mean and variance of R, without going to the trouble of finding its exact distribution. To begin with, let us define

X₁ = the nickel's contribution to the reward
X₂ = the dime's contribution to the reward
X₃ = the quarter's contribution to the reward

Thus

R = X₁ + X₂ + X₃    (5-35)

(c) What is the distribution of X₁? Find its mean and variance. Similarly find the mean and variance of X₂, and then of X₃.
(d) Are X₁, X₂, and X₃ independent?
(e) Apply (5-29) and (5-33) to find E(R) and var (R).

(5-23) Continuing, suppose that instead of 3 coins, we dropped 4 coins: a nickel, a dime, and 2 quarters. What is the range, mean, and variance of R?

5-24 Answer the same questions as in Problem 5-22, supposing that there were 10 coins: 3 nickels, 2 dimes, and 5 quarters.

=> 5-25 A bowl contains 6 chips numbered from 1 to 6. One chip is selected at random, and then a second is selected (random sampling without replacement). Let X₁ and X₂ be the numbers on the first and second chips drawn.
(a) Tabulate the joint probability function of X₁ and X₂.
(b) Tabulate the (marginal) probability functions of X₁ and X₂.
(c) Are X₁ and X₂ independent?
(d) What is the covariance of X₁ and X₂?
(e) Find the mean and variance of X₁ and of X₂.
(f) Find the mean and variance of S = X₁ + X₂ in two different ways.
=> 5-26 Repeat Problem 5-25 with the following change. The first chip is drawn and recorded, and then replaced in the bowl before the second is drawn (random sampling with replacement). Isn't this sampling mathematically identical to tossing a die twice?

5-27 Let Y be the total number of dots showing when 10 fair dice are tossed.
(a) What is the range of possible values of Y?
(b) What is the mean and variance of Y?

=> 5-28 A bowl contains 50 chips numbered 0, and 50 chips numbered 1. A sample of two chips is drawn (random sampling with replacement); the sum is denoted by S.
(a) Tabulate the probability function of S. What are the mean and variance of S?
(b) Repeat for a sample of three chips.
(c) Repeat for a sample of five chips.
(d) Do you recognize the probability functions in (a), (b), and (c)?
chapter 6

Sampling

6-1 INTRODUCTION

In the last three chapters we have analyzed probability and random variables; we shall now employ this essential theory to answer the basic deductive question in statistics: "What can we expect of a random sample drawn from a known population?"

We have already met several examples of sampling: the poll of voters sampled from the population of all voters; the sample of light bulbs drawn from the whole production of bulbs; a sample of men's heights drawn from the whole population; a sample of 2 chips drawn from a bowl of chips (Problem 5-25). All of these are sampling without replacement; an individual, once sampled, is out. Since he is no longer part of the population, he cannot appear again in the sample. On the other hand, sampling with replacement involves returning any sampled individual to the population. The population remains constant; hence any individual may appear more than once in a sample, as in Problems 5-26 and 5-28. Polls of voters are typically samples without replacement; but there is no reason why a poll could not be taken with replacement. Thus no record would be kept of those already selected, and, for example, John Q. Smith of Cincinnati might vote twice in the poll, a privilege he will not enjoy on election day.

As defined earlier, a random sample is one in which each individual in the population is equally likely to be sampled. There are several ways to actually carry out the physical process of random sampling. For example, suppose a random sample is to be drawn from the population of students in the classroom.

1. The most graphic method is to put each person's name on a cardboard chip, mix all these chips in a large bowl, and then draw the sample.

2. A more practical method is to assign each person a number, and then draw a random sample of numbers.

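Both sampling schemes, and method 2 itself, map directly onto standard-library calls. A minimal sketch in Python (the class size of 30 is an assumption for illustration):

```python
import random

random.seed(0)

# Method 2: assign each of 30 students a number, then draw the sample of numbers.
sample_numbers = random.sample(range(30), k=5)   # sampling WITHOUT replacement
assert len(set(sample_numbers)) == 5             # no one can appear twice

# Sampling WITH replacement: every draw is from the full population,
# so the same individual may be drawn more than once.
with_repl = random.choices(range(30), k=5)

print(sample_numbers, with_repl)
```

`random.sample` models the chips-out-of-the-bowl scheme; `random.choices` models returning each chip to the bowl before the next draw.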
Thus, for a population of less than a hundred, 2-digit random numbers suffice. A random 2-digit number may be obtained by throwing a 10-sided die twice, or by consulting a table of random digits (Appendix Table II) and reading off a pair of digits for each individual required in the sample.

These two methods of sampling are mathematically equivalent. Method 2 is simpler to employ; hence it is the one used in practical sampling. However, the first method is conceptually easier to deal with and to visualize; consequently, in our theoretical development of random sampling, we talk of drawing chips from a bowl. Moreover, if we are studying men's heights, then the height alone is all that is required on the chip, and the man's name is irrelevant. Hence we can view the population simply as a collection of numbered chips in a bowl, which is stirred and then sampled.

How can random sampling be mathematically specified? If we draw one chip at random, its number can be regarded as a random variable taking on values that range over the whole population of chip values, with probabilities corresponding to the relative frequencies in the population.

As an example, suppose a population of 80 million men's heights has the frequency distribution shown in Table 6-1. For future reference, we also compute the mean and variance of the parent population; we call them μ and σ², where X represents the height of men.¹

TABLE 6-1    Population of Men's Heights

(1) Height x            (2) Frequency      (3) Relative Frequency, p(x)
(Midpoint of cell)
51                          825,000                 .01
54                          791,000                 .01
57                        2,369,000                 .03
60                        5,505,000                 .07
63                        9,483,000                 .12
66                       16,087,000                 .20
69                       20,113,000                 .25
72                       14,480,000                 .18
75                        7,891,000                 .10
78                        1,633,000                 .02
81                          823,000                 .01
                         80,000,000               Σ = 1.00

¹ We have used a very approximate distribution, with each height represented by the midpoint of its cell, to keep concepts simple. To be more precise, we ought to have a very fine subdivision of height into many cells, as in Figure 4-8c.

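Column 3 of Table 6-1 defines the probability distribution p(x). As a numeric check on these relative frequencies (they must sum to 1) and on the mean and variance computed from them, here is a sketch in Python with the table typed in by hand:

```python
from fractions import Fraction

# Table 6-1: height midpoints (inches) and relative frequencies p(x)
heights = [51, 54, 57, 60, 63, 66, 69, 72, 75, 78, 81]
p = [Fraction(k, 100) for k in (1, 1, 3, 7, 12, 20, 25, 18, 10, 2, 1)]

assert sum(p) == 1  # the relative frequencies sum to 1

mu = sum(x * w for x, w in zip(heights, p))                # population mean
var = sum((x - mu) ** 2 * w for x, w in zip(heights, p))   # population variance

print(float(mu), float(var), float(var) ** 0.5)  # → 67.8 28.44 (so σ ≈ 5.33)
```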
From (4-3):

μ = 51(.01) + 54(.01) + ··· + 81(.01) = 67.8

From (4-4):

σ² = (51 − 67.8)²(.01) + (54 − 67.8)²(.01) + ··· + (81 − 67.8)²(.01) = 28.4
σ = 5.3

Random sampling from this population is equivalent mathematically to sampling with replacement from a bowl of 80 million chips, with each chip carrying the x value shown in column 1. The first chip selected at random can take on any of these x values, with the probabilities shown in column 3. This random variable we designate as X₁; the second draw is designated as the random variable X₂, and so on. But each of these random variables X₁, X₂, ... Xₙ (together representing our sample of n chips) has the same probability distribution p(x), the distribution of the parent population; that is,²

p(x₁) = p(x₂) = ··· = p(xₙ) = p(x)    (6-1)

This equality, of course, holds true if we sample with replacement, since the second chip is drawn from exactly the same bowlful as is the first chip, etc. We must show why (6-1) also holds true even for sampling without replacement, since this is not at all obvious. Once the first sample chip has been taken from the population (and not replaced), the population changes,³ along with its relative frequencies (probabilities). Thus the distribution of X₂ that is conditional on X₁ is not the same as the distribution of X₁; or, to restate, the conditional distribution of X₂ given X₁ does depend on the value of X₁ selected in the first draw. But this conditional distribution is not the issue in (6-1). In that equation, p(x₂) is not the conditional distribution, but rather the marginal distribution of X₂, without any condition, i.e., without any knowledge of X₁. And if we have no knowledge of X₁ and consider the distribution of X₂, there is no reason for it to differ from the distribution of X₁. Our intuition in this case is a good guide. We could formally confirm this result by considering the full table showing the joint probability function of X₁ and X₂. It is symmetric around its main diagonal; hence, although the conditional distributions (rows or columns) vary in this table, the marginal distributions of X₁ and of X₂ are necessarily identical. (See Problem 5-25b.) Thus equation (6-1) holds true, even in the case of sampling without replacement.

Before leaving this matter, we make one further observation. When the parent population is extremely large, such as 80 million, it hardly matters whether or not we replace. Removing one individual hardly changes the frequencies in column 2 or the relative frequencies in column 3; the parent population is left practically the same. Thus the second draw (X₂) will be practically independent of the first (X₁). This leads us to the conclusion that sampling without replacement from an infinite population is practically equivalent to sampling with replacement. This fact is important enough that we shall return to it in Section 6-5.

Conclusion. Any population to be sampled may be simulated by a bowl of chips, with the following mathematical characteristics:

1. The number on the first chip drawn is a random variable X₁, with a distribution identical to the distribution of the population random variable X.

2. The sample of n chips gives us n random variables (X₁, X₂, ... Xₙ). Each Xᵢ has the same (marginal) distribution as that of the population X. This fundamental characteristic (6-1) holds in all cases, regardless of replacement or population size.

However, the independence of X₁, X₂, ... is a more complex issue. If the population is finite and sampling is without replacement, then the Xᵢ are dependent, since the conditional distribution of any Xᵢ depends on the previous X values drawn. In all other cases the Xᵢ are independent; for simplicity, we shall assume this independence in the rest of the book (except Section 6-5).

² Strictly speaking, (6-1) is not precise enough. It would be more accurate to let p₁ denote the probability function of X₁, p₂ of X₂, etc., and then write p₁(x) ≡ p₂(x) ≡ ··· ≡ pₙ(x) ≡ p(x), where ≡ means "identically equal for all x."

³ In our example, with a population of 80 million heights, this change would be of no practical consequence. But with smaller populations it might.

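The key claim, that the marginal distribution of the second draw equals that of the first even without replacement, can be confirmed by exhaustive enumeration on a small bowl. A sketch (the four chip values are arbitrary):

```python
from collections import defaultdict
from fractions import Fraction

bowl = [51, 51, 66, 72]  # a tiny population of four chips

# Marginal distribution of the first draw.
p1 = defaultdict(Fraction)
for chip in bowl:
    p1[chip] += Fraction(1, len(bowl))

# Marginal distribution of the second draw, sampling WITHOUT replacement:
# sum over every equally likely ordered pair of distinct chips.
p2 = defaultdict(Fraction)
for i, first in enumerate(bowl):
    rest = bowl[:i] + bowl[i + 1:]
    for second in rest:
        p2[second] += Fraction(1, len(bowl)) * Fraction(1, len(rest))

assert p1 == p2  # the marginals are identical, as (6-1) asserts
```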
6-2 THE SAMPLE SUM

Now we are ready to use the heavy artillery drawn up in Chapter 5. First consider S, the sum of the sample observations, defined as:

S = X₁ + X₂ + ··· + Xₙ    (6-2)

The expected value of S is obtained by using (5-29):

E(S) = E(X₁ + X₂ + ··· + Xₙ)
     = E[(X₁ + X₂ + ··· + Xₙ₋₁) + Xₙ]
     = E(X₁ + X₂ + ··· + Xₙ₋₁) + E(Xₙ)    by Theorem (5-29)
     = E[(X₁ + X₂ + ··· + Xₙ₋₂) + Xₙ₋₁] + E(Xₙ)
     = E(X₁ + X₂ + ··· + Xₙ₋₂) + E(Xₙ₋₁) + E(Xₙ)    again by Theorem (5-29)

Generalizing,

E(S) = E(X₁) + E(X₂) + ··· + E(Xₙ)    (6-3)

Theorem (5-29) covers the special two-variable case; its repeated application here is an example of proof by induction.

Noting from (6-1) that each Xᵢ has the same distribution as the population, it follows that each has the same mean μ as the population; (6-3) can therefore be written:

E(S) = μ + μ + ··· + μ    (6-4)

or

μ_S = nμ    (6-5)

that is, the expected value of a sample sum is simply the mean of the parent population times the sample size.

In the same way, the variance of S is obtained by using Theorem (5-33):

var S = var (X₁ + X₂ + ··· + Xₙ) = var X₁ + var X₂ + ··· + var Xₙ    (6-6)

Note that this depends on the assumed independence of X₁, X₂, ... Xₙ. Again, since all the X₁, X₂, ... Xₙ have the same distribution as the population, they also have the variance σ² of the population. Thus (6-6) becomes:

var S = σ² + σ² + ··· + σ²    (6-7)

or

var S = nσ²    (6-8)

Formulas (6-5) and (6-8) are illustrated in Figure 6-1a. As another example, suppose a machine produces bicycle chain links with average length μ = .40 inch and standard deviation σ = .02 inch. A chain is made by joining together a random sample of 100 of these links. Its length S is a random variable, fluctuating from sample to sample. Its expected length is

μ_S = 100(.40) = 40.0 inches

Moreover, because our sample is drawn from an infinite population, X₁, X₂, ... X₁₀₀ are independent. Therefore, we may apply (6-8) to compute the standard deviation of S:

σ_S = √100 (.02) = 10(.02) = .20 inch

The student will notice that this is an example of statistical deduction: the characteristics of a sample (μ_S, σ_S) have been deduced from known characteristics (μ, σ) of the parent population.

We pause to interpret (6-5) and (6-8) intuitively. It was no surprise that μ_S is n times μ. But why should σ_S be only √n times σ? Typically, a sample will include some individuals which are oversized, and some which are undersized, so that cancellation occurs. Thus, while the spread of the whole chain (σ_S) does exceed the spread in an individual link (σ), it is substantially less than it would be if the errors in all the links were accumulated without cancellation (nσ).

FIG. 6-1 (a) Relation of the sample sum S to the parent population. (b) Relation of the sample mean X̄ to the parent population.

6-3 THE SAMPLE MEAN

Recall the definition of the sample mean,

X̄ = (1/n)(X₁ + X₂ + ··· + Xₙ)    (2-1a) repeated, (6-9)

We easily recognize that X̄ is just a linear transformation of S, and hence X̄ can be analyzed in terms of S.

It is important to remember that X̄, as well as S, is a random variable that fluctuates from sample to sample. It seems intuitively clear that X̄ will fluctuate about the same central value as an individual observation, but with less deviation because of "averaging out." We thus find plausible the formulas for the mean and the standard deviation:

μ_X̄ = μ    (6-10)

σ_X̄ = σ/√n    (6-11)

Proof. First, for the mean, we apply the last row of Table 5-6 to (6-9):

μ_X̄ = (1/n) μ_S = (1/n)(nμ) = μ    (6-10) proved, using (6-5)

Now, for the variance, we apply the last row of Table 5-6 to (6-9) again:

var X̄ = (1/n²) var S = (1/n²)(nσ²) = σ²/n    (6-12)

using (6-7); taking the square root proves (6-11).

Formulas (6-10) and (6-11) are illustrated in Figure 6-1b. A graph of the distribution of the sample mean for n = 9 and n = 25 is left as an exercise; this distribution concentrates more and more about μ as sample size increases.

We review this section by reconsidering a familiar problem: the sampling distribution of 2 rolls of a die. Two rolls (X₁, X₂) can be regarded as a sample of 2 chips from a bowl, as discussed in Problem 5-26. This is also equivalent to a sample taken from the infinite population of all possible rolls of the die. The probability distribution of the parent population is shown in Table 6-2a, along with its mean (μ) and standard deviation (σ).

Because this experiment has such simple probability characteristics, we can also compute the probability distribution of S and of X̄ for a sample of 2 rolls of the die, as shown in Table 6-2b; the moments of both S and X̄ are also calculated in this table.
TABLE 6-2 (a) Probability Distribution of the Roll of a Die (the Population)

x      p(x)     x p(x)
1      1/6       1/6
2      1/6       2/6
3      1/6       3/6
4      1/6       4/6
5      1/6       5/6
6      1/6       6/6
             μ = 21/6 = 3.5
             similarly, σ = 1.71

TABLE 6-2 (b) Probability Distribution of the Sample Sum S and the Sample Mean X̄, with n = 2

(1) Outcome Set                    (2) Sum   (3)        (4) Mean   (5)
(36 equiprobable outcomes)            s      p(s)          x̄      p(x̄)
·(1,1)                                2      1/36          1.0     1/36
·(1,2) ·(2,1)                         3      2/36          1.5     2/36
·(1,3) ·(2,2) ·(3,1)                  4      3/36          2.0     3/36
   ···                                5      4/36          2.5     4/36
                                      6      5/36          3.0     5/36
                                      7      6/36          3.5     6/36
                                      8      5/36          4.0     5/36
                                      9      4/36          4.5     4/36
                                     10      3/36          5.0     3/36
                                     11      2/36          5.5     2/36
·(6,6)                               12      1/36          6.0     1/36

μ_S = 252/36 = 7.0; similarly, σ_S = 2.4.    μ_X̄ = 126/36 = 3.5; similarly, σ_X̄ = 1.2.

TABLE 6-2 (c) Direct Calculation of the Moments from Table 6-2b, On the One Hand; On the Other Hand, the Short-cut Calculation the Relevant Formula Gives (using the population μ and σ from Table 6-2a)

Moment    Direct Calculation    Relevant Formula         Short-cut Calculation
μ_S             7.0             (6-5)  μ_S = nμ            2(3.5) = 7.0
σ_S             2.4             (6-8)  σ_S = √n σ           √2(1.71) = 2.4
μ_X̄            3.5             (6-10) μ_X̄ = μ             3.5
σ_X̄            1.2             (6-11) σ_X̄ = σ/√n          1.71/√2 = 1.2

FIG. 6-2 Throwing a die twice (a specific illustration of Fig. 6-1). (a) Relation of the sample sum S to the parent population. (b) Relation of the sample mean X̄ to the parent population. (Note. In order to facilitate graphing, the probabilities were converted to probability densities, so that they would all have the same comparable area = 1.)

Finally, in Table 6-2c we show how these moments could have been obtained more simply, using the formulas of this section. This die-tossing example is summarized in Figure 6-2.

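The moments in Table 6-2, and the short-cut formulas of Table 6-2c, can be reproduced by enumerating the 36 equiprobable outcomes; a sketch:

```python
import math
from itertools import product

faces = range(1, 7)
outcomes = list(product(faces, repeat=2))  # the 36 equiprobable (die1, die2) pairs

def moments(values):
    """Mean and standard deviation of equally likely values."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(var)

mu, sigma = moments(list(faces))                          # population: 3.5, 1.708
mu_S, sd_S = moments([a + b for a, b in outcomes])        # sample sum: 7.0, 2.415
mu_X, sd_X = moments([(a + b) / 2 for a, b in outcomes])  # sample mean: 3.5, 1.208

# Short-cut formulas (6-5), (6-8), (6-10), (6-11):
assert mu_S == 2 * mu
assert abs(sd_S - math.sqrt(2) * sigma) < 1e-9
assert mu_X == mu
assert abs(sd_X - sigma / math.sqrt(2)) < 1e-9
```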
PROBLEMS

6-1 True or false? If false, correct the errors: When a die is rolled twice, the average of the 2 numbers (X̄) is a random variable having the same expectation as for a single roll X; this illustrates μ_X̄ = μ = 3½. However, X̄ does not take on all its values equally likely; the extreme values are rare. Thus X̄ has a smaller standard deviation than X, illustrating σ_X̄ = σ/√n. Incidentally, the range of X̄ is 1 to 6, also the same as for a single roll; this illustrates why the range of a random variable is a better measure of spread than the standard deviation.

6-2 True or false? If false, correct the errors: If 10 men were randomly sampled from the population of Table 6-1 and laid end to end, the expectation of the total length would be nμ = 678 inches. The total length would vary (from sample to sample) with a standard deviation of nσ = 53 inches. On the other hand, if the 10 men in the random sample were averaged, the expectation of the average would be μ = 67.8 inches, and its standard deviation would be σ = 5.3 inches. This is how the long and short men in the sample tend to "average out," making X̄ fluctuate less than a single observation.

6-3 (Classroom Exercise)
(a) Make a relative frequency graph of the population of heights of the students in the class.
(b) For each of a few random samples of size 4 (drawn with replacement), calculate X̄. Plot the values of X̄ and compare with the graph in (a), showing how in each sample the tall students tend to be offset by short students.
(c) Repeat (a) and (b) for the weights of the students.

6-4 The weight of employees in a certain large office building is distributed around a mean of 150 pounds, with a standard deviation of 20 pounds. A random group of 25 employees gets in the elevator each morning. Find the mean and variance of:
(a) The total weight, S.
(b) The average weight, X̄.

=> 6-5 The population (probability) bowl consists of many chips: one-third marked 2, one-third marked 4, and one-third marked 6.
(a) When one chip is drawn, let X be its number. Find μ and σ (the population mean and standard deviation).
(b) When a sample of 2 chips is drawn, let X̄ be the sample mean. Find:
(1) The probability table of X̄.
(2) From this table, calculate μ_X̄ and σ_X̄; check your answers using (6-10) and (6-11).
(c) Repeat (b) for a sample of 3 chips.
(d) Graph p(x̄) for each case above, i.e., for sample size n = 1, 2, 3. (Comparison is facilitated by using probability density, i.e., bar graph area = (height)(width).) As n increases, notice how the distribution of X̄ is more and more concentrated around μ.

6-4 THE CENTRAL LIMIT THEOREM

In the preceding section we found the mean and standard deviation of X̄. One question we have not yet addressed is the shape of its distribution: what is happening to the shape of p(x̄)? We consider two cases.

(a) The Distribution of X̄ When the Population is Normal

In this case X̄ is exactly normal. This follows from a theorem on linear combinations, which we quote without proof:

If X and Y are normal, then any linear combination Z = aX + bY is also a normal random variable.    (6-13)

With a normal population, each observation X₁, X₂, ... Xₙ in the sample is normal. The sample mean X̄ can be written as a linear combination of these n normal variables,

X̄ = (1/n)X₁ + (1/n)X₂ + ··· + (1/n)Xₙ    (6-14)

so that (6-13) can be used to establish that X̄ is normal. Finally, we re-emphasize that the distribution of X̄ concentrates about μ as sample size n increases (ref. 6-11).

(b) The Distribution of X̄ When the Population is Not Normal

It is surprising that, even in this case, most of the same conclusions follow. As an example, consider the bowl of 3 kinds of chips in Problem 6-5. This is obviously a nonnormal population; in fact, it is a rectangular distribution, as graphed in Figure 6-3a. As a larger and larger sample is taken from this population, the distribution of X̄ changes, as shown in Figure 6-3a.⁴ As well as the increasing concentration of X̄ about μ, we notice the tendency of the distribution to the normal bell shape. This same tendency to the normal occurs for the sample of dice throws (n throws taken from the population of all possible throws of a die), as shown in Figure 6-3b. Finally, in Figure 6-3c a third population is shown, having chips numbered 2, 4, and 6, with proportions 1/4, 1/4, and 1/2. Sample means from this population also show the same tendency to normality.

These three examples display an astonishing pattern: the sample mean becomes normally distributed as n grows, no matter what the parent population is. This pattern is of such central importance that mathematicians have formulated it as:

The Central Limit Theorem. As the sample size n increases, the distribution of the mean, X̄, of a random sample taken from practically any population approaches a normal distribution (with mean μ and standard deviation σ/√n).⁵    (6-15)

The central limit theorem is not only remarkable, but very practical as well. For it completely specifies the distribution of X̄ in large samples, and is therefore the key to large-sample statistical inference. In fact, as a rule of thumb, X̄ is usually practically normal when the sample size n reaches about 10 or 20. In conclusion, we can assume that X̄ is normal for any sample taken from a normal population, and for large samples taken from practically any population. With these conclusions on the shape of the distribution of X̄, and those of the previous section on its mean and standard deviation, we can now be very specific in our deduction about a sample mean taken from a known population.

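The tendency pictured in Figure 6-3 is easy to reproduce by simulation. A sketch using the rectangular bowl of Problem 6-5 (chips 2, 4, 6, so μ = 4 and σ² = 8/3):

```python
import random
import statistics

random.seed(2)
bowl = [2, 4, 6]  # rectangular parent population: mu = 4, sigma^2 = 8/3

n = 10
means = [statistics.mean(random.choices(bowl, k=n)) for _ in range(50_000)]

# (6-15): X-bar should be centred on mu, with standard deviation sigma/sqrt(n)
print(statistics.mean(means))                        # close to 4.0
print(statistics.stdev(means), (8 / 3 / n) ** 0.5)   # both close to 0.516
```

A histogram of `means` would also show the bell shape that Figure 6-3 depicts.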
⁴ The first 3 graphs of Figure 6-3a have already been found (Problem 6-5), and the first 2 graphs of Figure 6-3b (in Table 6-2). The rest of the graphs may be similarly calculated.

⁵ The one qualification is that the population have a finite variance. For a proof of this theorem, see, for example, P. Hoel, Introduction to Mathematical Statistics, 3rd ed., pp. 143-5, John Wiley & Sons, 1962.

FIG. 6-3 The limiting normal shape for p(x̄), shown for n = 1, 2, 3, 5, 10. (a) Bowl of three kinds of chips. (b) Bowl of six kinds of chips (or die). (c) Bowl of three kinds of chips of different frequency.

Example

Consider the marks of all students on a statistics test. If the marks have a normal distribution with a mean of 72 and standard deviation of 9, compare (1) the probability that any one student will have a mark over 78 with (2) the probability that a sample of 10 students will have an average mark over 78.

1. The probability that a single student will have a mark over 78 is found by standardizing the normal population of marks:

Pr(X > 78) = Pr((X − μ)/σ > (78 − 72)/9) = Pr(Z > .67) = .5000 − .2486 = .2514

2. Now consider the distribution of the sample mean. From the theorems of this chapter we know it is normal, with a mean of 72 and a standard deviation of σ/√n = 9/√10. From this we calculate the probability of a sample mean exceeding 78 to be:

Pr(X̄ > 78) = Pr((X̄ − μ)/(σ/√n) > (78 − 72)/(9/√10)) = Pr(Z > 2.11) = .0174    (6-16)

Hence, although there is a reasonable chance (about 1/4) that a single student will get over 78, there is very little chance (about 1/60) that a sample of ten students will perform this well. This comparison is shown in Figure 6-4.

FIG. 6-4 Comparison of probabilities for the population X and for the sample mean X̄: Pr(X > 78), about 1/4, versus Pr(X̄ > 78), about 1/60.

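Both probabilities in this example can be computed from the normal cumulative distribution function. A sketch using only the standard library (Φ written in terms of the error function):

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

mu, sigma, cutoff = 72, 9, 78

# 1. One student: standardize X itself.
p_single = 1 - phi((cutoff - mu) / sigma)                 # about .25

# 2. Mean of n = 10 students: standardize X-bar, whose sd is sigma/sqrt(n).
n = 10
p_mean = 1 - phi((cutoff - mu) / (sigma / math.sqrt(n)))  # about .017

print(round(p_single, 4), round(p_mean, 4))
```

The small differences from the text's .2514 and .0174 come from the text rounding z to two decimals before using the normal table.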
PROBLEMS

6-6 The weights of packages filled by a machine are normally distributed about a mean of 25 ounces, with a standard deviation of one ounce. What is the probability that n packages drawn from the machine will have an average weight of less than 24 ounces if n = 1, 4, 16, 64?

6-7 Suppose that the education level among adults in a certain country has a mean of 11.1 years of schooling, and a variance of 9. What is the probability that in a random survey of 100 adults you will find an average level of schooling between 10 and 12 years?

6-8 Does the central limit theorem (6-15) also hold true for the sample sum? Justify briefly.

6-9 An elevator is designed with a load limit of 2000 lb. It claims a capacity of 10 persons. If the weights of all the people using the elevator are normally distributed with a mean of 185 lb and a standard deviation of 22 lb, what is the probability that a group of 10 persons will exceed the load limit of the elevator?

6-10 Suppose that bicycle chain links have lengths distributed around a mean μ = .50 cm, with a standard deviation σ = .04 cm. The manufacturer's standards require the chain to be between 49 and 50 cm long.
(a) If chains are made of 100 links, what proportion of them meets the standards?
(b) If chains are made of only 99 links, what proportion now meets the standards? How many links should be put in a chain?
(c) Using 99 links, to what value must σ be reduced (how much must the quality control on the links be improved) in order to have 90 percent of the chains meet the standards?

(6-11) The amount of pocket money that persons in a certain city carry has a nonnormal distribution, with a mean of $9.00 and a standard deviation of $2.50. What is the probability that a group of 225 individuals will be carrying a total of more than $2100?

6-12 In Problems 6-6 to 6-11, the formulas required that the individuals in the sample were independently drawn. Do you think this is a questionable assumption? Why?

*6-13 A farmer has 9 wheat fields planted. The distribution of yield from each field has a mean of 1000 bushels and variance 20,000. Furthermore, the yields of any 2 fields are correlated, because they share the same weather conditions, weed control, etc.; in fact, each covariance is 10,000. Letting S denote the total yield from all 9 fields, find:
(a) The mean and variance of S. [Hint. How must the proof of (5-32) be adjusted?]
(b) Pr(S < 8,000), assuming S is normal.

*6-5 SAMPLING FROM A FINITE POPULATION, WITHOUT REPLACEMENT

In the preceding analysis, we have assumed either an infinite population or, alternatively, sampling from a finite population with replacement; in either case it doesn't matter whether we replace or not. This leaves one remaining possibility: sampling without replacement from a finite population. (This is a starred section; like the footnotes, it is optional, and the student may skip it without loss of continuity.)

We have already argued in Section 6-1 that the observations (X₁, X₂, ..., Xₙ) will have the same (marginal) distribution whether or not we replace, because they share the distribution of the parent population. Thus (6-5) still follows from (6-3):

    μ_S = nμ    (6-5) repeated

and similarly,

    μ_X̄ = μ    (6-10) repeated

On the other hand, the variance of X̄ does depend on whether or not we replace; it is easy to see why. Suppose we sample 10 of the heights of all the male students on a college campus; suppose further that the first student we sample is the star of the basketball team (say Lew Alcindor, 7 feet 1 inch). Clearly, we now face the problem of a sample average that is "off target," specifically, too high. If we replace, then in the next 9 chosen, Alcindor could turn up once again, throwing our sample mean further off target on the high side. But if we don't replace, then we don't have to worry about Alcindor again. In summary, sampling without replacement yields a more reliable sample mean (i.e., X̄ has less variance), because extreme values, once sampled, cannot return to haunt us again.
Formally, the argument runs as follows. If we sample without replacement, then X₁, X₂, ..., Xₙ are not independent. Hence all our theorems above, based on the independence assumption, do not hold true. Specifically, (6-7), which assumed replacement, must now be modified to

    var S = nσ² (N − n)/(N − 1)    (6-17)
    (sampling without replacement)

where N = population size and n = sample size. Furthermore, (6-12), which also assumed replacement, must be similarly modified to

    var X̄ = (σ²/n) (N − n)/(N − 1)    (6-18)
    (sampling without replacement)
Although we do not prove these two formulas, we interpret them:

1. The variance of X̄ without replacement (6-18) is less than the variance with replacement (6-12); this is the formal confirmation of our intuitive example of the heights of college students. This occurs because the "reduction factor,"

    (N − n)/(N − 1)    (6-19)

appearing in (6-18), is less than one. [Unless, of course, the sample size n is only one. In this case, no distinction can be made between replacement and nonreplacement, and to logically make (6-12) and (6-18) equivalent, the reduction factor must be one. If you have wondered where the 1 in the denominator of (6-19) came from, you can now see that it is exactly what makes (6-12) and (6-18) necessarily coincide when n = 1.]

2. When n = N, the sample coincides with the whole population, every time. Hence every sample mean must be the same: the population mean. The variance of the sample mean, being a measure of its spread, must therefore be zero. This is reflected in (6-19) having a zero numerator, so that var X̄ in (6-18) becomes zero. (Note that with replacement this is not the case; in this instance, a sample of size n = N does not guarantee that the sample and the population are identical, as they must be for the variance of X̄ to be zero.)

3. On the other hand, when the sample size n is much smaller than N (e.g., when 200 men are sampled from 80 million), the reduction factor (6-19) is practically 1, so that var X̄ is practically the same as with replacement. This, of course, is common sense; if the population is very large, it makes very little difference whether or not the observations are thrown back in again before continuing sampling.
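The three interpretations above are easy to verify numerically. The following sketch (a hypothetical illustration with an assumed σ² = 25, not an example from the text) evaluates (6-18) and shows the reduction factor at work: below one in general, exactly zero at n = N, and practically one when N is enormous:

```python
def var_xbar(sigma2, n, N=None):
    # Variance of the sample mean. With N given, apply the reduction
    # factor (N - n)/(N - 1) of (6-18), for sampling without replacement;
    # N=None means an infinite population (or sampling with replacement),
    # i.e. equation (6-12).
    base = sigma2 / n
    if N is None:
        return base
    return base * (N - n) / (N - 1)

v_repl = var_xbar(25, 10)                  # (6-12): 2.5
v_finite = var_xbar(25, 10, N=50)          # (6-18): 2.5 * 40/49
v_census = var_xbar(25, 50, N=50)          # n = N: exactly 0
v_huge = var_xbar(25, 200, N=80_000_000)   # factor practically 1
```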

PROBLEMS

*6-14 In the game of bridge, cards are allotted points as follows:

    Cards                  Points
    All cards below jack     0
    Jack                     1
    Queen                    2
    King                     3
    Ace                      4

(a) For the population of 52 cards, find the mean and the variance of the number of points.
(b) In a randomly dealt hand of 13 cards, the number of points Y is a random variable. What are the mean and variance of Y? (Bridge players beware: no points counted for distribution.)
(c) What is Pr (Y ≥ 13)? (Hint. The distribution shape is approximately normal, as we might hope from the central limit theorem.)

*6-15 Rework Problem 6-9, assuming the population of people using the elevator is no longer very large, but rather
(a) N = 500.
(b) N = 50.

6-6 SAMPLING FROM BERNOULLI POPULATIONS

We have examined the distribution of a sample mean and a sample sum; the final statistic that we study is the sample proportion P, the one referred to in our poll of U.S. voters.

(a) The Bernoulli Population

First, we must be clear on the population from which the sample is drawn. We conceive of this as being made up of a large number of individuals, all marked D or R (Democrat or Republican). We can make this look like the familiar bowl of chips by relabelling each D with a 1 and each R with a 0. Thus, if the voting population of 150 million is comprised of 84 million Democrats and 66 million Republicans, the population probability distribution would be as shown in Table 6-3.

TABLE 6-3 A Bernoulli Population

    Variable x        Frequency      p(x)
    0 (Republican)    66,000,000     66,000,000/150,000,000 = .44
    1 (Democrat)      84,000,000     84,000,000/150,000,000 = .56

The population proportion π of Democrats is .56, which is also the probability, in sampling one individual at random, that a Democrat will be chosen. This is called a "Bernoulli" population, and its distribution is graphed later in Figure 6-6a. This is the simplest kind of probability distribution,

being lumped at only two values, 0 and 1. (Note that this population is as far from being normal as any that we will encounter.) Its mean and variance are easily computed, as in Table 6-4. In our example, μ = .56 and σ = .50. The reason that the arbitrary values of 0 and 1 were assigned to the population is now clear: this ensures that μ and π coincide.

TABLE 6-4 Calculation of μ and σ² for a Bernoulli Population

    x    p(x)       x p(x)    (x − μ)² p(x)
    0    (1 − π)    0         π²(1 − π)
    1    π          π         (1 − π)²π

    μ = π    (6-20)
    σ² = π(1 − π)    (6-21)
    σ = √(π(1 − π))    (6-22)
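The algebra of Table 6-4 can be checked directly from the definitions of mean and variance for a 0-1 variable; a sketch for the voter population with π = .56:

```python
pi = 0.56

# mu = sum of x * p(x) over the two values 0 and 1, equation (6-20)
mean = 0 * (1 - pi) + 1 * pi

# sigma^2 = sum of (x - mu)^2 * p(x), which simplifies to pi(1 - pi)
var = (0 - mean) ** 2 * (1 - pi) + (1 - mean) ** 2 * pi
sd = var ** 0.5   # about .50, as in the text
```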

(b) Bernoulli Sampling

We now ask, "What can we expect of a sample drawn from this sort of population?" The population is so large that even without replacement, the observations are practically independent; the probability of choosing a Democrat remains practically .56 regardless of whether or not we replace. If we take a sample of n = 50, let us say, we might obtain, for example, the following 50 numbers:

    0 1 1 0 1 0 0 1 0 1 1 ... 0 1 1    (6-23)

The sample sum, of course, will just be the number of Democrats in the sample. We recall encountering this before as a binomial random variable in Table 4-3; thus a binomial variable is simply a sample sum in disguise.

Why is this coincidence of any practical value? Suppose we wish to calculate the probability of at least 30 Democrats in a sample of 50. We could evaluate the probability of exactly 30 Democrats, of 31, of 32, and so on. This would require a major computational effort: not only are some twenty-odd probabilities involved, but in addition, each is extremely difficult to calculate. (As an exercise, the student should consider whether it is feasible to evaluate the probability of getting exactly 30 Democrats in a sample of 50, which is

    C(50, 30) (.56)³⁰ (.44)²⁰

where C(50, 30) is the binomial coefficient.) But we recognize that this is equivalent to calculating the probability that S, the sample sum taken from a Bernoulli population, is at least 30. This is very easy to calculate, because in the previous section we have completely described the distribution of any sample sum.

S in fact⁵ is approximately normally distributed, with the following mean and standard deviation. From (6-5) and (6-20),

    μ_S = nπ    (6-24) Binomial mean

and from (6-7), using (6-21),

    σ_S = √(nπ(1 − π))    (6-25) Binomial standard deviation

Hence the probability of at least 30 Democrats in a sample of 50 is Pr (S ≥ 30), which, in standardized form,⁶ is

    Pr ((S − μ_S)/σ_S ≥ (30 − 28)/√12.3) = Pr (Z ≥ .57) = .28    (6-26)

To confirm the usefulness of this normal approximation to the binomial, the student should compare this simple solution with the calculations involved in evaluating some twenty-odd expressions, each like the one in the footnote on p. 120.

⁵ S is graphed in Figure 6-5; the normal approximation is justified by the central limit theorem. A useful rule of thumb is that n should be large enough to make nπ > 5 and n(1 − π) > 5. If n is large, yet π is so small that nπ < 5, then there is a better approximation than the normal, called the Poisson distribution.

⁶ This graph clearly indicates that a better approximation to the binomial histogram would be the area under the curve above 29.5, not 30. This peculiarity arises from trying to approximate a discrete variable with a continuous one, and is therefore called the continuity correction. Our better approximation is

    Pr (S ≥ 30) ≈ Pr ((S − μ_S)/σ_S > (29.5 − 28)/√12.3) = Pr (Z > .43) ≈ .334    (6-27)

To keep the analysis uncluttered, this continuity correction is ignored in the rest of the book.
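With a computer, the "major computational effort" of the exact binomial sum is no longer prohibitive, which lets us check both approximations against it. A sketch:

```python
from math import comb, erf, sqrt

def normal_cdf(z):
    # Standard normal cumulative distribution, Phi(z), via erf.
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 50, 0.56
mu_S, sigma_S = n * p, sqrt(n * p * (1 - p))   # 28 and about 3.51

# Exact binomial: Pr(S >= 30) as a sum of twenty-one terms
exact = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(30, n + 1))

plain = 1 - normal_cdf((30 - mu_S) / sigma_S)        # as in (6-26): about .28
corrected = 1 - normal_cdf((29.5 - mu_S) / sigma_S)  # as in (6-27): about .334
```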

FIG. 6-5 Normal approximation to the binomial, with π known; S = number of Democrats in a sample of 50. Without the continuity correction (standardizing 30), Pr = .28; with the continuity correction (standardizing 29.5), Pr = .334; compare with the exact binomial answer, .337.

We now turn to the second major issue of this section: what is the distribution of the sample proportion P? Just as the total number of successes is merely the sample sum in disguise, so the sample proportion is merely the sample mean in disguise:

    P = S/n    (6-28)

(Compare Fig. 6-1a.) All our theory developed for X̄ can now be applied to determine the distribution of the sample statistic P. From (6-10) and (6-20) the mean of P is

    μ_P = π    (6-29)

From this we note that, on the average, the sample proportion P is on target; i.e., its average value is equal to the population proportion which (we shall see in Chapter 8) it will be used to estimate. But any specific sample P will be subject to sampling variation, and will typically fall above or below π. From (6-11) and (6-22) we discover that its standard deviation is

    σ_P = √(π(1 − π)/n)    (6-30)

Finally, since P is a sample mean, its distribution is normal for large samples (central limit theorem).

As an example, consider the population of voters shown in Figure 6-6a. What is the probability that in a random sample of 50 voters, between 50 and 60 percent will be Democrats? From (6-29) and (6-30),

    μ_P = π = .56
    σ_P = √(.56(1 − .56)/50) = .070

These two values completely define the normal distribution of P shown in Figure 6-6b. Even though our population distribution is nowhere near normal, our sample statistic P is approximately normal.

FIG. 6-6 Relation of the sample proportion to the population proportion of voters (compare Fig. 6-1b). (a) The population distribution, concentrated at the two values 0 and 1, with π = .56, μ = π = .56, and σ = √(π(1 − π)) = .50. (b) The distribution of the sample proportion P in a sample of 50 voters, with μ_P = π = .56 and σ_P = .070.

The evaluation of the area of this normal distribution between .50 and .60 is now a straightforward matter:

    Pr (.50 ≤ P ≤ .60) = Pr ((.50 − .56)/.070 ≤ Z ≤ (.60 − .56)/.070)
                       = Pr (−.857 ≤ Z ≤ .572) = .52
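The same standard-library helper reproduces this example's arithmetic; a sketch:

```python
from math import erf, sqrt

def normal_cdf(z):
    # Standard normal cumulative distribution, Phi(z), via erf.
    return 0.5 * (1 + erf(z / sqrt(2)))

pi, n = 0.56, 50
sd_P = sqrt(pi * (1 - pi) / n)           # about .070

lo = (0.50 - pi) / sd_P                  # about -.86
hi = (0.60 - pi) / sd_P                  # about .57
prob = normal_cdf(hi) - normal_cdf(lo)   # about .52
```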

PROBLEMS

(Note. If you want high accuracy in your answers, you should make continuity corrections.)

6-16 Suppose Gallup takes a poll of 1000 voters from a population which is 56 percent Democratic. Letting P be the sample proportion, find:
(a) Pr (.52 < P < .60).
(b) Pr (P > .5), i.e., the probability that the sample will correctly predict the election. Note how we are beginning to answer the problems raised in Chapter 1.

6-17 In tossing a fair coin 50 times, what is the probability that the sample proportion of heads will exceed .55?

6-18 If a fair die is rolled 100 times, what is the probability that at least a quarter of the rolls are aces? Answer two ways:
(a) Find Pr (P ≥ .25) for the sample proportion P of aces.
(b) Find Pr (total number of aces ≥ 25).

(6-19) What is the chance that of the first 100 babies born in the New Year, more than 60 will be boys?

6-20 What is the chance that of the first 8 babies born in the New Year, more than 6 will be boys? Answer two ways:
(a) Exactly, using the binomial distribution.
(b) Approximately, using the normal distribution.

6-7 SUMMARY OF SAMPLING THEORY

(a) General Sampling

1. The distribution of the sample mean X̄ is approximately normal for large samples, say n > 10 or 20 as a rule of thumb. (Moreover, if the population is near normal, then the mean of a much smaller sample will be approximately normal.)

2. X̄ will have an expectation equal to μ, the population expectation.

3. If we sample without replacement, X̄ will have a variance equal to:

    var X̄ = (σ²/n) (N − n)/(N − 1)

If the population (N) is very large, this reduces to, approximately:

    var X̄ = σ²/n

which is also the formula for the variance when we sample with replacement. Thus we may write:

    X̄ ~ N(μ, σ²/n)    (6-31)

which is a useful abbreviation for "X̄ is normally distributed with mean μ and variance σ²/n."

(b) Bernoulli Sampling

If we apply this sampling theory to a special population (chips coded 0 and 1), then we have the solution to the proportion problem. The sample proportion P is just a disguised X̄, and the population proportion π is just a disguised μ, so that

    P ~ N(π, π(1 − π)/n)    (6-32)

again assuming n is sufficiently large.

Review Problems

6-21 Five men, selected at random from a normal population with mean μ = 160 lb and standard deviation σ = 20 lb, get on an elevator. What is the probability that:
(a) The total weight is more than 850?
(b) The average weight is more than 170?
(c) All five men weigh more than 170?
(d) Give an intuitive reason why the answer to (c) is smaller than the answer to (b).

6-22 A man at a carnival pays $1 to play a game (roulette) with the following gross payoff:

    Y = Gross Winning    Probability
    0                    20/38
    $2                   18/38

so that his net winning in a game is Y − 1.
(a) What is the mean of his average net winning in a game?
(b) What is his approximate chance of ending up a loser (net loss) if he plays the game: (1) 5 times? (2) 25 times? (3) 125 times?
(c) How could you get an exact answer for (b)? How are these three answers related?
(d) How many times should he play if he wants to be 99% certain of losing?
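For part (a) of Problem 6-22, the mean net winning follows directly from the payoff table; a sketch using exact fractions so that no rounding intrudes:

```python
from fractions import Fraction

# Gross winning Y: $0 with probability 20/38, $2 with probability 18/38.
# He pays $1 per game, so his net winning is Y - 1.
p_win = Fraction(18, 38)
mean_gross = 2 * p_win       # E(Y) = 36/38
mean_net = mean_gross - 1    # E(Y - 1) = -1/19, about -5.3 cents a game
```

The negative mean is what drives parts (b) and (d): the more games he plays, the more certain a net loss becomes.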

6-23 Fill in the blanks.
(a) Suppose that in a certain election, the U.S. and California are alike in their proportion of Democrats, π, the only difference being that the U.S. is about 10 times as large a population. In order to get an equally reliable estimate of π, the U.S. sample should be ____ as large as the California sample.
(b) A certain length is measured with an error which, for simplicity, we suppose to be +2" or −2", equally likely. A sample of 100 independent measurements is taken, and the sample sum S computed.
1. Worst possible error: in a sample of 100, S could possibly be in error by as much as ____.
2. Likely error: however, we feel "fairly certain" (95%) that S is likely to be in error by no more than ____.

6-24 Let X̄ be the sample mean when a die is thrown 1000 times. Intuitively we feel that X̄ is "quite close" to μ. More precisely, calculate

    Pr (μ − .1 < X̄ < μ + .1)

(6-25) In making up a budget, a housewife rounds each item off to the nearest $1.00.
(a) If the budget consists of 200 items, what is the chance that the rounding error will exceed $1.00?
(b) Briefly state the assumptions necessary in the answer to (a).

(6-26) Suppose there are five men in a room, whose heights in inches are 62, 65, 68, 65, 65. One man is drawn at random, with his height denoted X.
(a) Graph the probability function of X, i.e., the population distribution. Find its mean μ and variance σ².
Suppose a sample of two men is drawn, with replacement, and the sample mean X̄ is calculated.
(b) Construct a table of the probability function of X̄. (Hint. List the possible samples, i.e., the sample space. Are the outcomes equally likely? For each outcome, calculate the sample mean X̄.)
(c) Graph the probability function of X̄.
(d) Find the mean and variance of X̄ from its probability distribution.
(e) Check your answers to (d) using the equations of this chapter.
(f) Is the following a valid interpretation of these formulas? If not, correct it: X̄ fluctuates around μ, sometimes larger, sometimes smaller, but exactly equal to μ on the average (μ_X̄ = μ). X̄, the average of n observations, does not fluctuate as much as a single observation, however (σ²_X̄ = σ²/n < σ²). This is to be expected, because in a sample, a large observation will often be "cancelled out" by a small observation, or at least swamped by the rest of the observations, which will be more typical.

(6-27) Repeat Problem 6-26 for a sample of 2 men drawn without replacement. Why is this sampling without replacement more precise, and hence preferable?

*6-28 In Chapter 3 we stated that relative frequency in the long run is "very likely" to be "close to" probability. To make this statement more precise for the rolling of a die, for example, let P denote the proportion of aces drawn in 10,000 throws, and calculate

    Pr (π − .01 < P < π + .01)
\177+.01)

chapter

Estimation

7-1

INTRODUCTION

induction,

statistical

beginning

Before

the concepts

we pause

Table

in

7-1

to review

of sample and population.

It

is essential

,it and

variance

calledpopulation

to remember
\177r
2

(though

so that its mean


unknown). These are

is fixed,

population

the

that

constants

are

generally

parameters.

the sample mean .V and sample variance s 2 are random


varying from sampleto sample,with a certain probability
distribution. For example,the distribution
of \177 was
found
to be approximately
.V \177. N (3t, o2/n) in Chapter
6. A random
variable
such as .V or s = which
is
calculated
from the observations in a sample
is given the technical name
By

contrast,

variables,

sample

statistic.

specific example of statistical


inference,
suppose
we wish to estimate
height
of American men on a large Midwestern campus.This
population'
mean
l.t is a fixed, but
unknown
parameter.
We estima_te it by
taking
a sample of 36 students,and
compute
the sample
mean X; let us
suppose this turns out to be 68 inches. We shall see in the next section that this
is our best single estimate or \"point
estimate\"
of/t.
But we also know, from
As a

the

average

TABLE 7-1 Review of Sample versus Population

Sample (a random subset of the population):
1. Relative frequencies f_i/n, which are used to compute
2. X̄ and s², which are examples of
3. Random statistics, or
4. Estimators.

Population:
1. Probabilities p(x), which are used to compute
2. μ and σ², which are examples of
3. Fixed parameters, or
4. Targets.

But we also know, from the theory of our previous chapter, that unless we are extremely lucky in our sample, this estimate X̄ will not be exactly on target, but rather a bit high or a bit low. Technically, X̄ is distributed around μ, above and below it, as shown in Figure 6-1b. If we want to be reasonably confident that our inference is correct, we cannot estimate μ to be precisely equal to our observed X̄; instead we must estimate that μ is bracketed by some interval known as a confidence interval, of the following form:

    μ = X̄ ± an error allowance    (7-1)

As an example, we might estimate the mean height to be

    μ = 68 ± 3 inches    (7-2)

In evaluating the right-hand side of (7-1), there is no problem with X̄; this is the simple calculation of the average of the sample values. The problem in this section is the evaluation of the error allowance.

First we must decide: "How confident do we wish to be that our interval estimate is right, that it does bracket μ?" It is common to choose a confidence level of 95%; in other words, we will use a technique that, in the long run, will give a correct interval estimate 19 times out of 20.

To get a confidence interval estimate of μ, we must be specific about the distribution of X̄. To keep inferences simple, we assume that X̄ is normally distributed according to the assumptions of Section 6-3; this is the normal distribution shown in Figure 7-1. Obviously, we will select the smallest range under the distribution of X̄ that will just enclose a 95% probability; this is the middle chunk, leaving 2½% probability excluded in each tail, as shown in Figure 7-1. From our normal tables, we note that this involves going above and below the mean by 1.96 standard deviations of X̄. We therefore write

    Pr (−1.96 < (X̄ − μ)/(σ/√n) < 1.96) = 95%    (7-3)

which follows directly from standardizing X̄ and noting, from the standard normal tables, that a standard normal variable lies between ±1.96 with 95% probability. The bracketed inequalities may be solved for μ, "turned around" so to speak, obtaining the equivalent statement:

    Pr (X̄ − 1.96 σ/√n < μ < X̄ + 1.96 σ/√n) = 95%    (7-4)

To prove (7-4), we could begin by writing (7-3) in the equivalent form

    Pr (μ − 1.96 σ/√n < X̄ < μ + 1.96 σ/√n) = 95%    (7-5)

The bracketed inequalities in (7-5) may then be solved for μ, obtaining the equivalent inequalities

    Pr (X̄ − 1.96 σ/√n < μ < X̄ + 1.96 σ/√n) = 95%    (7-6)

which proves (7-4).
FIG. 7-1 Distribution of the sample mean, X̄ ~ N[μ, (σ²/n)]. The central area, between μ − 1.96 σ/√n and μ + 1.96 σ/√n, is 95%; each excluded tail area is 2½%. (Note. μ is an unknown constant; we don't know what its value is; all we know is that, whatever μ may be, the variable X̄ is distributed around it as shown in this diagram.)

FIG. 7-2 Construction of twenty interval estimates: a typical result. At the top is the distribution of the sample mean, centered at 69 and running (with 95% probability) from 67 to 71; this is what we know, but the statistician does not. Below are his first, second, third, ..., twentieth interval estimates; so far, all bracket μ, and only one, the twelfth (based on X̄₁₂), misses.

We must be exceedingly careful not to misinterpret (7-4). μ has not changed its character in the course of this algebraic manipulation. It has not become a variable; it remains a population constant. Equation (7-4), like (7-3), is a probability statement about the random variable X̄, or more precisely, the "random interval" X̄ − 1.96(σ/√n) to X̄ + 1.96(σ/√n); it is this interval that varies, not μ.

To appreciate this fundamental point, let's clearly illustrate what is going on. Suppose we return to our problem of constructing an interval estimate of the average of men's heights on our campus. Moreover, suppose we have some supernatural knowledge of the population mean μ (which we know to be 69 inches) and of σ (which we know to be 6 inches). Now let's observe what happens when the statistician (poor mortal that he is) tries to estimate μ using (7-4) above. Just for the sake of illustration, let's suppose he makes 20 such interval estimates, each time from a different random sample of 36. Figure 7-2 illustrates his typical experience.

In this diagram we show the distribution of X̄, with mean equal to the population mean (69) and standard deviation

    σ/√n = 6/√36 = 1 inch

From (7-3) we know that there is a 95% probability that any sample mean X̄ will fall in the range 67 to 71 inches.

But the statistician doesn't know this; he blindly takes his first random sample, from which he computes the first sample mean X̄₁, which happens to be 70. From (7-4) he calculates his first interval estimate for μ:

    70 ± 1.96 (6/√36), that is, 68 to 72 inches    (7-7)

This is the first interval shown in Figure 7-2. In his second sample, the statistician happens to draw a shorter group of individuals, and duly computes X̄₂ to be 68 inches. From a similar evaluation of (7-4) he comes up with his second interval estimate shown in the diagram,

    68 ± 2 inches    (7-8)

and so on. We observe that nineteen of these twenty interval estimates bracket the constant μ. Only one, the twelfth, does not; in this case the statistician was wrong.

We gloss over one difficulty here. In evaluating (7-4) the statistician has an observed value for the sample mean, and he knows that the sample size n in this problem is 36. But there is one value he does not know: σ; for now, suppose he can use a reasonable approximation to it.
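The twenty-interval experiment of Figure 7-2 is easy to replay by simulation. In the sketch below (an illustration with an arbitrarily chosen seed, not part of the text), we draw 20 samples of n = 36 from a normal population with μ = 69 and σ = 6, form each interval as in (7-4), and count how many bracket μ; over many repetitions, about 95 percent will:

```python
import random
from math import sqrt

random.seed(1)  # arbitrary seed, for reproducibility only
mu, sigma, n = 69, 6, 36
half_width = 1.96 * sigma / sqrt(n)   # 1.96 inches, as in (7-7)

intervals = []
for _ in range(20):
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    intervals.append((xbar - half_width, xbar + half_width))

hits = sum(lo < mu < hi for lo, hi in intervals)  # typically 19 or 20
```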

We can easily see why he was right most of the time. For each interval estimate he is simply adding and subtracting 2 inches to his sample mean; this is the same ±2 inches that defines the range ab around μ. Thus, if and only if he observes a sample mean within the range ab, will his interval estimate bracket μ. Nineteen of the twenty sample means do fall in the range ab, and in all these instances his interval estimate was right. He was wrong in only the one instance when he observed a sample mean outside ab (i.e., X̄₁₂). In practice, of course, a statistician does not take many samples; he takes only one (e.g., X̄₁). And once this interval estimate is made, he is either right or wrong; this interval brackets μ or it does not. But the important point to recognize is that the statistician is using a method with a 95% probability of success; this follows because there is a 95% probability that his observed X̄ will fall within the range ab, and as a consequence his interval estimate will bracket μ. This is what is meant by a 95% confidence interval: the statistician knows that in the long run, 95% of the intervals he constructs in this way will bracket μ.

To review, we briefly emphasize the main points:

1. The population parameter μ is constant, and remains constant. It is the interval estimate that is a random variable, because it is computed from the random variable X̄. As long as X̄ is a random variable that can take on a whole range of values, the probability statement (7-4) is valid, and X̄ is referred to as an estimator of μ.

2. But once the sample has been observed and X̄ takes on one specific value (e.g., x̄ = 70 inches), it is then called an "estimate"³ of μ. Since it is no longer a random variable, probability statements are no longer strictly valid. For this reason, when x̄ is substituted into (7-4), it is no longer called a probability statement, but rather a 95% confidence interval:

    x̄ − 1.96 σ/√n < μ < x̄ + 1.96 σ/√n    (7-9)

Thus, our deduction in (7-4) that X̄ is within 1.96 σ/√n of μ is turned around into the induction that μ is within 1.96 σ/√n of the observed x̄. (7-9) is sometimes abbreviated to

    95% confidence interval: μ = x̄ ± 1.96 σ/√n    (7-10)

³ For emphasis, the random estimator is denoted by the capital letter X̄, while the estimate is denoted by the lower case letter x̄. We might call x̄ the realized value, and X̄ the potential value.

where 1.96 is the critical value leaving 2½% of the standard normal distribution in the upper tail.

To recapitulate once more:

3. Because of our omniscience (we knew the distribution of X̄ about the population mean), we know that the statistician erred only in his twelfth estimate. But the statistician himself has no idea which of his estimates, if any, are wrong. All he knows is that he will be right 95% of the time, in the long run.

4. It is not possible for him to be right for certain. Once an interval estimate is made, the "die is cast," and it is either right or wrong. Note how this verifies our casual conclusions in Chapter 1.

5. If we wish to be more confident of our conclusions, then we must leave less of the probability in each tail; thus the range ab increases, and our interval estimate becomes less precise.

6. As the sample size n is increased, the distribution of X̄ becomes more concentrated around μ (σ/√n decreases as n increases), and the confidence interval narrows (becomes more precise).

This raises an interesting question: "Why did we use the sample mean X̄ to estimate the population parameter μ?" (other sample statistics were feasible, for example). Intuitively, it seems preferable to estimate a mean with a mean; but there are stronger reasons, given in the next section.
PROBLEMS

7-1 An anthropologist measured the heights (in inches) of a random sample of 100 men from a certain population, and found the sample mean and variance to be 71 and 9 respectively.
(a) Find a 95% confidence interval for the mean height μ of the whole population.
(b) Find a 99% confidence interval.

7-2 A research study examines the consumption expenditures (in thousands of dollars) of a random sample of 50 American families (all at the same income and asset level). The sample mean is 5.2 and the standard deviation is .72. Construct a 95% confidence interval for the mean consumption of all American families (at this income and asset level).

7-3 The reaction times of 150 randomly selected drivers were found to have a mean of .83 sec and standard deviation of .20 sec. Find a 95% confidence interval for the mean reaction time of the whole population of drivers.
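Problems 7-1 to 7-3 all use (7-10), with the sample standard deviation standing in for the unknown σ (the approximation the text glossed over). A sketch, applied to the figures of Problem 7-1(a):

```python
from math import sqrt

def conf_interval(xbar, variance, n, z=1.96):
    # Large-sample confidence interval: xbar +/- z * s / sqrt(n),
    # as in (7-10), with the sample s used in place of sigma.
    half = z * sqrt(variance) / sqrt(n)
    return xbar - half, xbar + half

lo, hi = conf_interval(xbar=71, variance=9, n=100)   # 71 +/- 1.96(3/10)
```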

(7-4) From a very large class in statistics, the following 40 marks were randomly selected:

    74 83 78 64 55 58 86 71
    72 64 42 62 62 65 68 60
    76 86 65 58 49 64 56 50
    71 56 45 73 54 86 82 53
    73 70 58 74 57 75 58 87

Construct a 95% confidence interval for the average mark of the whole class. (Hint. Reduce your work to manageable proportions by grouping the marks into cells of width 5.)

7-5 What is the probability that a statistician who constructs 20 independent 95% confidence intervals will err:
(a) Once (as in our example in Section 7-1)?
(b) Not at all?
(c) More than once?

7-2 DESIRABLE PROPERTIES OF ESTIMATORS

To be perfectly general, we consider any population parameter θ, and denote an estimator of it by θ̂. (In our special example in the preceding section, μ is the population parameter θ, and X̄ is its estimator θ̂.) We would like the random variable θ̂ to vary within only a narrow range around its fixed target θ; thus, in our example in Figure 7-2, we should like the distribution of X̄ to be concentrated around μ, as close to μ as possible. We develop this notion of closeness in several ways.

(a) No Bias

An unbiased estimator is one that is, on the average, right on target, as shown in Figure 7-3a. Formally, we state:

Definition. θ̂ is an unbiased estimator of θ if

E(θ̂) = θ    (7-11)

For example, X̄ is an unbiased estimator of μ, because in fact

E(X̄) = μ    (6-10) repeated

Of course, an estimator θ̂ is called biased if E(θ̂) is different from θ; the bias is defined as this difference:

Definition.

Bias B ≜ E(θ̂) − θ    (7-12)

FIG. 7-3 Comparison of an unbiased and a biased estimator. (a) Unbiased estimator: the true θ = E(θ̂). (b) Biased estimator: E(θ̂) misses the true θ by the bias.

Bias is illustrated in Figure 7-3b. The distribution of θ̂ is "off target"; since E(θ̂) exceeds θ, there will be a tendency for θ̂ to over-estimate θ.

As an example of a biased estimator, consider the mean squared deviation of the sample,

MSD = (1/n) Σ (Xᵢ − X̄)²    (2-5a) repeated, (7-13)

On the average, it will underestimate the population variance σ² just a little. But if we inflate it by dividing by n − 1 instead of n, we obtain the sample variance

s² = (1/(n − 1)) Σ (Xᵢ − X̄)²    (2-6) repeated, (7-14)

which has been proved⁴ an unbiased estimator of the population variance σ².

The underestimation in (7-13) can be seen very easily in the case of n = 1. Then X̄ coincides with the single observation, so that Eq. (7-13) gives MSD = 0, an obvious underestimate of σ². On the other hand, Eq. (7-14), when n = 1, gives s² = 0/0, which is undefined. But this is not a drawback; in fact, it is a good way to warn the unwary that, since a sample of just one observation has no "spread," it cannot estimate the population variance σ² (assuming μ is unknown, of course).

⁴ When we say "it has been proved," we mean that it has been proved in the advanced texts. If it has been proved in this text, we shall usually say "we have proved."

FIG. 7-4 Comparison of an efficient and an inefficient estimator (both are unbiased). (a) Efficient. (b) Inefficient.

The student puzzled by our division by n − 1 in defining s² in Chapter 2 can now see why: we want to use this sample variance as an unbiased estimator of the population variance.

Both the sample mean and median are unbiased estimators of μ in a normal population; thus, in judging which is to be preferred, we must examine their other characteristics.
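The contrast between the biased MSD and the unbiased s² can be watched directly in a small sampling experiment. The sketch below, in modern Python, draws repeated samples of n = 5 from a hypothetical normal population with σ² = 1; the divisor (n versus n − 1) is the only difference between the two averages compared:

```python
import random
import statistics

random.seed(1)
n, reps = 5, 20000
msd_total = s2_total = 0.0
for _ in range(reps):
    x = [random.gauss(0, 1) for _ in range(n)]     # population variance is 1
    xbar = statistics.fmean(x)
    ssd = sum((xi - xbar) ** 2 for xi in x)        # sum of squared deviations
    msd_total += ssd / n                           # MSD, divisor n: biased low
    s2_total += ssd / (n - 1)                      # s^2, divisor n - 1: unbiased
print(round(msd_total / reps, 2), round(s2_total / reps, 2))
```

Averaged over many samples, MSD settles noticeably below 1 while s² settles near 1, which is the bias the text describes.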

(b) Efficiency

As well as being on target on the average, we should also like the distribution of an estimator to be concentrated; that is, to have a small variance. This is the notion of efficiency, shown in Figure 7-4; we describe the estimator in Figure 7-4a as more efficient because it has the smaller variance. A useful relative measure of the efficiency of two unbiased⁵ estimators θ̂ and θ̃ is:

Definition.

Relative efficiency of θ̂ (compared to θ̃) ≜ (var θ̃)/(var θ̂)    (7-15)

⁵ For biased estimators, the definition of relative efficiency is E(θ̃ − θ)²/E(θ̂ − θ)², which of course reduces to (7-15) if both estimators have 0 bias.

We are now in a position to pass judgement on the relative merits of the sample mean and median as estimators of μ in a normal population. We have already established that the sampling variance of X̄ is σ²/n. On the other hand, it has been proved that in large samples the variance of the sample median is

(π/2)(σ²/n)    (7-16)

Hence, in a large sample, the relative efficiency of the sample mean compared to the median is derived from (7-15) as

(π/2)(σ²/n) / (σ²/n) = π/2 ≈ 1.57    (7-17)

Because its variance is smaller, the sample mean is the more efficient estimator: it will tend to be closer to the target μ, and it will give us a more precise (i.e., smaller range) interval estimate. Of course, the variance of either estimator can be reduced by increasing the sample size (n); thus an alternative way of looking at the greater efficiency of the sample mean is to recognize that the sample median will yield as accurate a point or interval estimate only if we take a larger sample. Hence, using the sample mean is more efficient because it costs less to sample; note how the economic and statistical definitions of efficiency coincide. Therefore, the sample mean is preferred. Finally, an estimator that is more efficient than any other is called absolutely efficient, or simply "efficient."
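The π/2 ≈ 1.57 ratio can be checked by simulation. The modern Python sketch below uses a hypothetical normal population and a sample size of 101 (odd, so the median is a single observation):

```python
import random
import statistics

random.seed(2)
n, reps = 101, 4000          # odd n: the sample median is one observation
means, medians = [], []
for _ in range(reps):
    x = [random.gauss(0, 1) for _ in range(n)]
    means.append(statistics.fmean(x))
    medians.append(statistics.median(x))
# Relative efficiency as in (7-17): var(median) / var(mean), near pi/2 = 1.57
ratio = statistics.variance(medians) / statistics.variance(means)
print(round(ratio, 2))
```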

(c) Consistency⁶

Roughly speaking, a consistent estimator is one that concentrates completely on its target as the sample size increases indefinitely, as sketched in Figure 7-5. In the limiting case, as the sample size becomes infinite, a consistent estimator θ̂ will provide a perfect point estimate of the target θ.

We now state consistency more precisely. Just as the variance is a good measure of the spread of a distribution about its mean, so the

mean squared error ≜ E(θ̂ − θ)²    (7-18)

is a good measure of how the distribution of θ̂ is spread about its target value θ. Consistency requires this to be zero in the limit:

Definition. θ̂ is consistent⁶ if

E(θ̂ − θ)² → 0 as n → ∞    (7-19)

⁶ This definition is sometimes called "consistency in mean-square." It implies a condition called "consistency in probability": for any positive δ (no matter how small),

Pr (|θ̂ − θ| < δ) → 1 as n → ∞    (7-20)

This is often taken as the definition of consistency.

FIG. 7-5 A consistent estimator, showing how the distribution of θ̂ concentrates on its target θ as n increases (e.g., from n = 50 to n = 100).

Mean squared error is related to bias and variance by the following theorem.⁷

Theorem.

E(θ̂ − θ)² = var θ̂ + B²    (7-21)

⁷ Proof. Let μ* ≜ E(θ̂). Then E(θ̂ − θ)² = E[(θ̂ − μ*) + (μ* − θ)]² = E(θ̂ − μ*)² + 2(μ* − θ)E(θ̂ − μ*) + (μ* − θ)² = var θ̂ + 0 + B².

Corollary. θ̂ is a consistent estimator iff⁸ its variance and bias both approach zero, as n → ∞.    (7-22)

⁸ "iff" is an abbreviation for "if and only if."

If only the bias approaches zero, the estimator is called "asymptotically unbiased," a condition that is clearly weaker than consistency. Asymptotic unbiasedness is also a weaker condition than unbiasedness, since the latter applies for all n, not just as n → ∞.

Consistency does not guarantee that an estimator is a good one. For example, as an estimator of the mean of a normal population, the sample median is consistent;¹⁰ but it is not a good estimator, because the sample mean is preferred: it is both consistent and efficient.

¹⁰ To prove consistency, we use corollary (7-22), noting that the sample median has zero bias, and a variance given by (7-16) which approaches zero.

As a final example, the sample MSD is a consistent estimator of the population variance σ², even though it is a biased estimator; i.e., it is asymptotically unbiased.¹¹ As n → ∞, its bias tends to zero; it can also be proven that its variance tends to zero; thus the conditions of corollary (7-22) are satisfied. This concept of a biased, yet consistent estimator is a very important one in, for example, econometrics.

¹¹ Mathematically, note that MSD = ((n − 1)/n) s². Thus MSD → s² as n → ∞. Since s² is unbiased (for any n), it follows that MSD is asymptotically unbiased.
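Consistency of X̄ in the mean-square sense of (7-19) can also be watched numerically: the empirical mean squared error of X̄ about μ shrinks roughly like 1/n. A modern Python sketch, with a hypothetical population (μ = 0, σ² = 1):

```python
import random
import statistics

random.seed(3)

def mse_of_mean(n, reps=2000, mu=0.0):
    """Empirical E(Xbar - mu)^2 for samples of size n from N(mu, 1)."""
    return statistics.fmean(
        (statistics.fmean(random.gauss(mu, 1) for _ in range(n)) - mu) ** 2
        for _ in range(reps)
    )

mses = [mse_of_mean(n) for n in (10, 100, 1000)]
print([round(m, 4) for m in mses])  # roughly 1/n: about .1, .01, .001
```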

PROBLEMS

7-6 True or false? If false, correct it.
(a) The sample proportion P is used to estimate the population proportion π, and is unbiased.
(b) μ is a random variable (varying from sample to sample), used to estimate the fixed parameter X̄.

7-7 Based on a sample of 2 observations, X₁ and X₂, consider the following two estimators of μ:

X̄ = (1/2)X₁ + (1/2)X₂  and  W ≜ (1/4)X₁ + (3/4)X₂

(a) Prove that they are both unbiased.
(b) Which is more efficient? What is the relative efficiency of W compared to X̄?

7-8 A farmer has a square field, whose area he wants to estimate. He measures the length of the field, but he makes a random error, so that his observed length X₁ is a normal variate centered at 200 (the true but unknown value) with σ = 20. Worried about his possible error, he decides to take a second observation X₂ of the length, and is in a dilemma as to how to proceed:
(1) Should he average X₁ and X₂ first, and then square the average? or
(2) Should he square each observation first, and then average the squares?
(a) Are methods (1) and (2) really different, or are they just 2 different ways of saying the same thing? (Hint. Try a couple of actual values, like X₁ = 200 and X₂ = 230, and work out (1) and (2).)
(b) If they are different, which is preferable, i.e., which has less bias? (Hints. This will be easier if you avoid arithmetic by generalizing from a length of 200 feet to a length of μ, and use expectation. Furthermore, note that normality is irrelevant to questions of bias. Use equation (4-5): E(X²) = μ² + σ².)
(c) Generalize (b) to a sample of n measurements.

7-9 As in Problem 6-5b, consider a bowl of many chips: one-third marked 2, one-third marked 4, and one-third marked 6. When a sample of 2 chips is drawn, construct the probability table of X̄, and hence:
(a) Show (once more) that X̄ is an unbiased estimator of μ.
(b) Is (X̄)² an unbiased estimator of μ²? (Compare Problem 7-8.)
(c) Is 1/X̄ an unbiased estimator of 1/μ?
(d) Is (2X̄ − 1) an unbiased estimator of (2μ − 1)?

7-10 To illustrate bias very concretely, consider a sample of 2 tosses of a fair die. The population moments are easily computed:

μ = 3.5,  σ² = 35/12

We shall study the sample estimators in 2 ways.
(a) Empirical approach (Monte-Carlo technique). You can simulate a sample of 2 tosses of the die with the random digits of Appendix Table II. Repeat the experiment many times. (If each student does it, say, 5 times, and the results from the whole class are pooled, this will save work.) It will be convenient to array the data in a table like:

Result of 2 Tosses    X̄     MSD    s²
(3, 1)                2      1      2
(2, 5)                3.5    2¼     4½
. . .
Averages

After calculating the averages, answer:
(1) Does X̄ average close to μ?
(2) Does MSD average close to σ²?
(3) Does s² average close to σ²?
(b) Theoretical approach. In (a), if the experiment were repeated endlessly, the relative frequencies would settle down to the theoretical probabilities; these probabilities can be calculated very easily by exploiting the symmetry of the dice. After calculating the relevant probability table, answer questions (1), (2), and (3) again.
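The Monte-Carlo experiment of Problem 7-10(a) can equally well be run by machine instead of with the random digits of Appendix Table II. A modern Python sketch:

```python
import random

random.seed(5)
reps = 50000
xbar_sum = msd_sum = s2_sum = 0.0
for _ in range(reps):
    a, b = random.randint(1, 6), random.randint(1, 6)   # two tosses of a die
    xbar = (a + b) / 2
    ssd = (a - xbar) ** 2 + (b - xbar) ** 2
    xbar_sum += xbar
    msd_sum += ssd / 2          # MSD (divisor n = 2)
    s2_sum += ssd / 1           # s^2 (divisor n - 1 = 1)
# Compare the three averages with mu = 3.5 and sigma^2 = 35/12 = 2.92
print(round(xbar_sum / reps, 2), round(msd_sum / reps, 2), round(s2_sum / reps, 2))
```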

*7-3 MAXIMUM-LIKELIHOOD ESTIMATION (MLE)

(a) Introduction

The question arises, "Does some general technique exist for finding estimators with these attractive characteristics?" The maximum-likelihood technique is one that most statisticians use. We introduce it with the example of sampling from a Bernoulli population; to be concrete, suppose we flip a biased coin 10 times in order to estimate π, the population proportion of heads,¹² and get 4 heads.

We shall temporarily forget the common-sense solution (estimate π with the sample proportion P = 4/10), in order to develop some general ideas. With the sample result of 4 heads in 10 tosses (trials) before us, we might ask ourselves how likely this sample would be if π were, say, .1. If π = .1, the probability of getting four heads (successes) in our ten tosses would be, according to the binomial formula,

(10 choose 4) π⁴ (1 − π)⁶ = (10 choose 4) (.1)⁴ (.9)⁶ ≈ .011

That is, if π were .1, there is only about one chance in a hundred that this sort of population would yield the 4 heads we observed. Similarly, the student can verify that if π were .8, the probability that such a population would yield 4 heads is only about .006; again it seems implausible that a population with π = .8 would generate the sample result we observed.

¹² This π is, of course, the population probability of heads. But, for simplicity, we refer hereafter only to the population "proportion."

Similarly, we consider all other possible values for π, in each case asking how likely it is that this value of π would yield the sample that we in fact observed. The results are shown in the first column of Table 7-2, and graphed in Figure 7-6. We refer to this as the likelihood function: the sample values of 4 and 10 are fixed, and the only variable in the function is the hypothetical value of π. For emphasis, we often write this as a function of π alone:

L(π) = (10 choose 4) π⁴ (1 − π)⁶

The maximum-likelihood estimate (π = .4) is the value of π maximizing this likelihood function. In general:

Definition. The MLE is the hypothetical population value which maximizes the likelihood of the observed sample.    (7-23)

We note:
(a) The sample proportion P is our MLE of the population proportion π; it is often, but not always, the case that the corresponding sample value is the MLE of the population parameter.
(b) Figure 7-6 is the likelihood function for the particular sample we observed (i.e., 4 heads in 10 tosses). A different sample result would call for a different likelihood function, and hence a different MLE.
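The likelihood function graphed in Figure 7-6 can be tabulated directly. The modern Python sketch below evaluates L(π) = (10 choose 4) π⁴(1 − π)⁶ on a grid of hypothetical values of π and confirms that the maximum falls at π = .4:

```python
from math import comb

def L(pi, x=4, n=10):
    """Binomial likelihood of the observed sample: x heads in n tosses."""
    return comb(n, x) * pi**x * (1 - pi)**(n - x)

grid = [i / 100 for i in range(101)]   # hypothetical values of pi: 0, .01, ..., 1
mle = max(grid, key=L)
print(mle, round(L(mle), 3))   # the grid maximum is at pi = 0.4
```

The same code with x = 6, n = 8 sketches the likelihood function asked for in Problem 7-11.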
FIG. 7-6 An example of a likelihood function: L(π), the likelihood that various hypothetical population proportions π would yield the sample we observed. (Horizontal axis: hypothetical values of the population proportion π, from 0 to 1.0; the value π = .4 gives the maximum.)
FIG. 7-7 The binomial probability function p(x; π) plotted against both x and π. Slice a: p(x; π), the probability of x successes in 10 trials, given π = .8. Slice b: L(π), the likelihood of various hypothetical values of π yielding the given sample of 4 successes in 10 trials (same as in Fig. 7-6).

In Figure 7-7 this discussion is related to our previous deductions about the binomial in Chapters 3, 4, and 6. In this figure we graph the binomial probabilities p(x; n, π). [Since n is set at 10 regardless, this function is referred to simply as p(x; π).] In earlier chapters we regarded π as fixed and x as variable, as in slice a; thus the dotted function shows the probability of various numbers of heads if the population proportion is given as .8. In this chapter we regard x, the observed sample result, as fixed, while the population π is thought of as taking on a whole set of hypothetical values; thus slice b shows the likelihood that the various possible population proportions would yield 4 heads. Slices in the x direction are referred to as probability functions, while slices in the π direction are called likelihood functions.

We now generalize maximum-likelihood estimation. (A summary of our results is shown in the last three columns of Table 7-2 for reference.)

(b) General Binomial

It is very easy to show that our result in the previous section was no accident, and that the maximum-likelihood estimate of the binomial π is always the sample proportion P. Given any observed sample of x successes in n trials, the likelihood function is

L(π) = (n choose x) πˣ (1 − π)ⁿ⁻ˣ    (7-24)

With calculus, it can easily be shown¹³ that the maximum value of this likelihood function occurs when π = x/n, that is, at the sample proportion. Thus

[MLE of π = P, the sample proportion.]

We argued in Chapter 1 that it is reasonable to use the sample proportion to estimate the population proportion; but in addition to its intuitive appeal, we now add the more rigorous justification of maximum likelihood: a population with π = P would generate, with the greatest likelihood, the sample we observed.

(c) MLE of the Mean μ of a Normal Population

Suppose we have drawn a sample (x₁, x₂, x₃) from a parent population which is N(μ, σ²); our problem is to find the MLE of the unknown population mean μ. Because the population is normal, the probability of getting any value x, given μ, is

p(x; μ) = (1/(√(2π) σ)) e^−(1/2σ²)(x − μ)²    (7-26)

Specifically, the probabilities of drawing the values x₁ and x₂ that we observed in our sample are, respectively,

p(x₁; μ) = (1/(√(2π) σ)) e^−(1/2σ²)(x₁ − μ)²    (7-27)

p(x₂; μ) = (1/(√(2π) σ)) e^−(1/2σ²)(x₂ − μ)²    (7-28)

¹³ To find where L(π) is a maximum, set the derivative equal to zero:

dL(π)/dπ = (n choose x)[πˣ (n − x)(1 − π)ⁿ⁻ˣ⁻¹ (−1) + x πˣ⁻¹ (1 − π)ⁿ⁻ˣ] = 0    (7-25)

Dividing (7-25) by πˣ⁻¹ (1 − π)ⁿ⁻ˣ⁻¹, it becomes

−π(n − x) + x(1 − π) = 0
−nπ + x = 0
π = x/n

You can easily confirm that this is a maximum (rather than a minimum or inflection point).

FIG. 7-8 Maximum-likelihood estimation of the mean (μ) of a normal population, based on three sample observations (x₁, x₂, x₃). (a) Small likelihood L(μₐ), the product of the three ordinates. (b) Large likelihood L(μᵦ).

Similarly,

p(x₃; μ) = (1/(√(2π) σ)) e^−(1/2σ²)(x₃ − μ)²    (7-29)

We assume, as usual, that X₁, X₂, and X₃ are independent, so that the joint probability function is the product of (7-27), (7-28), and (7-29):

p(x₁, x₂, x₃; μ) = Π p(xᵢ; μ)    (7-30)

where Π means "the product of," just as Σ means "the sum of." But in our estimation problem the sample values xᵢ are fixed, and only μ is thought of as varying over hypothetical values; we shall speculate on these various possible values of μ, with a view to selecting the most plausible. Thus (7-30) can be written as a likelihood function of μ:

L(μ) = Π p(xᵢ; μ)    (7-31)

The MLE of μ is defined as the hypothetical value of μ maximizing the likelihood function (7-31). Its value may be derived with calculus; but we consider only a geometric interpretation, in Figure 7-8, in which the sample values x₁, x₂, x₃ are regarded as fixed. We "try out" two hypothetical values of μ and ask, "Which is more likely to generate the sample we observed?" The population with mean μₐ shown in Figure 7-8a is not very likely to generate the sample (x₁, x₂, x₃): the ordinate p(x₃; μₐ) is very small, because x₃ is so far distant from μₐ, and the product of the three ordinates [i.e., the likelihood of generating the sample] is therefore small. On the other hand, the likelihood of the population with mean μᵦ shown in Figure 7-8b is large: since the xᵢ are collectively closer to μᵦ, they have a greater joint probability. Thus a population with mean μᵦ is more likely to generate the sample than one with mean μₐ, and μᵦ is the more plausible estimate of μ. Indeed, little additional shift in the hypothetical mean seems to be required; μᵦ is very nearly the value of μ that maximizes the likelihood. It might seem that the MLE of μ is, in fact, the sample mean X̄; this can be proved mathematically, as the reader is asked to do in Problem 7-12.

Finally, the reader who has carefully learned that μ is a fixed population parameter may wonder how it can appear in the likelihood function (7-31) as a variable. This is simply a mathematical convenience. The true value of μ is, in fact, fixed; but since it is unknown, in MLE we must consider all of its possible, or hypothetical, values; i.e., we treat it as a variable.

(d) MLE of any Population Parameter θ

We now state MLE in full generality. A sample (x₁, x₂, ..., xₙ) is drawn (with replacement, or from an infinite population) from a population with the probability function p(x; θ), where θ is any unknown population parameter we wish to estimate. Since the xᵢ are independent, each with probability function p(xᵢ; θ), the probability of the whole sample is obtained by multiplying:

p(x₁, x₂, ..., xₙ; θ) = Π p(xᵢ; θ)    (7-32)

But, just as before, we regard the observed sample values as fixed, while the parameter θ takes on a whole set of hypothetical values.

Renaming (7-32) accordingly, the likelihood function is

L(θ) = Π p(xᵢ; θ)    (7-33)

The MLE is that hypothetical value of θ that maximizes this likelihood function.
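The claim that the MLE of a normal mean is X̄ (to be proved in Problem 7-12) can be checked numerically. Maximizing the logarithm of the likelihood (7-31) is equivalent and avoids very small products; the sample and grid below are hypothetical, with σ assumed known (a modern Python sketch):

```python
import math

sample = [3.1, 4.6, 5.3]                      # hypothetical observations
sigma = 1.0                                   # assume sigma is known

def log_L(mu):
    """Log of the likelihood (7-31): the sum of log p(x_i; mu)."""
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (x - mu)**2 / (2 * sigma**2) for x in sample)

grid = [i / 1000 for i in range(2000, 7001)]  # hypothetical mu from 2 to 7
best = max(grid, key=log_L)
xbar = sum(sample) / len(sample)
print(best, round(xbar, 3))   # both print as 4.333
```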

(e) Maximum Likelihood versus Method of Moments Estimation (MLE vs MME)

In the analysis above, we have estimated a population proportion with a sample proportion, and a population mean with a sample mean. Why not always use this technique, and estimate any population parameter with the corresponding sample value? This is known as method of moments estimation (MME). Its great advantage is that it is plausible and easy to understand. Moreover, the two methods, MME and MLE, often do coincide. But suppose the two methods differ (as in Problem 7-14)? In such a circumstance MLE is usually superior.

The intuitive appeal of MME is more than offset by the following impressive advantages of MLE. Since the MLE is the population value most likely to generate the sample values observed, it is in some sense the population value that "best matches" the observed sample. In addition, under broad conditions, MLE has the following asymptotic properties:

1. Efficient, that is, with smaller variance than any other estimator.
2. Consistent, and asymptotically unbiased, with variance tending to zero.
3. Normally distributed, with easily computed mean and variance; hence the MLE may be readily used to make inferences.

For example, we have already seen that these three properties are true for X̄, the MLE of μ in a normal population. [Property 3 follows from Theorem (6-13); Property 2 follows from (6-10) and (6-11); Property 1 is proved in advanced texts, and has been alluded to in (7-17).]

We emphasize that these properties are asymptotic; that is, they hold for large samples, as n → ∞. But for the small samples often used, by economists for example, MLE is not necessarily best.

PROBLEMS

*7-11 Following Figure 7-6, graph the likelihood function for a sample of 6 heads in 8 tosses of a coin; show the MLE.

*7-12 Derive the MLE of μ, for a sample from a normal population, using calculus.

*7-13 (a) Derive the MLE of σ² for a normal distribution, assuming μ is known.
(b) Is it unbiased?

*7-14 As the N delegates arrived at a convention, they were given tags numbered successively 1, 2, 3, ... N. In order to estimate the unknown number of delegates N, a brief walk in the corridor provided a random sample of 5 tags, numbered:

37, 16, 44, 43, 22

(a) What is the MLE of N? Is it biased?
(b) What is the MME of N? Is it biased?

FURTHER READING

For a more detailed description of the virtues of MLE, see for example:
1. Wilks, S. S., Mathematical Statistics, New York: John Wiley & Sons (1962).
2. Lindgren, B. W., Statistical Theory, New York: Macmillan (1959).

chapter 8

Estimation II

8-1 DIFFERENCE IN TWO MEANS

In the previous chapter, we used a sample mean to estimate a population mean. In this chapter we will develop several other similar examples of how a sample statistic is used to estimate a population parameter.

Whenever two population means are to be compared, it is usually their difference that is important, rather than their absolute values. Thus we often wish to estimate

μ₁ − μ₂    (8-1)

A reasonable estimate of this difference in population means is the difference in sample means

X̄₁ − X̄₂    (8-2)

(Assuming normality of the parent populations, this is the maximum-likelihood estimator, with many attractive properties.)

Again, because of the error in point estimates, we are typically interested in an interval estimate. Its development is comparable to the argument in Section 7-1, and involves two steps: the distribution of our estimator (X̄₁ − X̄₂) must be deduced; then this can be "turned around" to make an inference about the population parameter (μ₁ − μ₂).

First, how is the estimator (X̄₁ − X̄₂) distributed? From (6-31) we know that the first sample mean X̄₁ is approximately normally distributed around the population mean μ₁, as follows:

X̄₁ ~ N(μ₁, σ₁²/n₁)    (8-3)

where σ₁² represents the variance of the first population, and n₁ the size of the sample drawn from it. Similarly,

X̄₂ ~ N(μ₂, σ₂²/n₂)    (8-4)

FIG. 8-1 The distribution of the estimator (X̄₁ − X̄₂): a probability density centered at (μ₁ − μ₂), with standard deviation √(σ₁²/n₁ + σ₂²/n₂).

Independence of the two sampling procedures can be assumed; hence the random variables X̄₁ and X̄₂ are independent, and (5-31) and (5-34) can be applied directly:

(X̄₁ − X̄₂) ~ N(μ₁ − μ₂, σ₁²/n₁ + σ₂²/n₂)    (8-5)

This distribution of (X̄₁ − X̄₂) is shown in Figure 8-1. Equation (8-5) is exactly true, assuming both populations are normal; it still remains approximately true (by the central limit theorem) for large samples drawn from practically any two populations.

Under these conditions, our knowledge in (8-5) of how the estimator behaves can now be "turned around" to construct the confidence interval for the difference in the means. With 95% confidence,

(μ₁ − μ₂) = (X̄₁ − X̄₂) ± 1.96 √(σ₁²/n₁ + σ₂²/n₂)    (8-6)

When the two populations have a common variance, say σ², the 95% confidence interval (8-6) becomes

(μ₁ − μ₂) = (X̄₁ − X̄₂) ± 1.96 σ √(1/n₁ + 1/n₂)    (8-7)

The variances σ₁² and σ₂² in (8-6) are usually not known; the best the statistician can do is guess at them, with the values s₁² and s₂² he observed in his two samples. Provided his sample is large, this is an accurate enough approximation; but with a small sample, this introduces a new source of error. The student will recall that this same problem was encountered in estimating a single population mean in Section 7-1. In the next section we shall give the solution for these problems of small-sample estimation.
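Formula (8-6) translates directly into a short computation. A modern Python sketch (the two samples' figures are hypothetical, not drawn from the problems that follow):

```python
import math

def diff_ci(xbar1, s1, n1, xbar2, s2, n2, z=1.96):
    """95% interval (8-6) for mu1 - mu2, with s1, s2 standing in
    for the unknown sigma1, sigma2 (large-sample approximation)."""
    half_width = z * math.sqrt(s1**2 / n1 + s2**2 / n2)
    d = xbar1 - xbar2
    return d - half_width, d + half_width

# Hypothetical samples: means 10.0 and 9.2, sd's 1.5 and 2.0, sizes 80 and 60.
lo, hi = diff_ci(10.0, 1.5, 80, 9.2, 2.0, 60)
print(round(lo, 2), round(hi, 2))  # about 0.2 and 1.4
```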

PROBLEMS

8-1 A random sample of 100 workers in one large plant took an average of 12 minutes to complete a task, with a standard deviation of 2 minutes. A random sample of 50 workers in a second large plant took an average of 11 minutes to complete the task, with a standard deviation of 3 minutes. Construct a 95% confidence interval for the difference between the two population averages.

8-2 Two samples of 100 seedlings were grown with two different fertilizers. One sample had an average height of 10 inches and a standard deviation of 1 inch. The second sample had an average height of 10.5 inches and a standard deviation of 3 inches. Construct a confidence interval for the difference between the average population heights:
(a) At the 95% level of confidence.
(b) At the 90% level of confidence.

8-3 A random sample of 60 students was taken in two different universities. The first sample had an average mark of 77 and a standard deviation of 6. The second sample had an average mark of 68 and a standard deviation of 10.
(a) Find a 95% confidence interval for the difference between the mean marks in the two universities.
(b) What increase in the sample size would be necessary to cut the error allowance by 1/2?
(c) What increase in the sample size would be necessary to reduce the error allowance to 1.0?

8-2 SMALL SAMPLE ESTIMATION: THE t DISTRIBUTION

We shall assume in this section that the populations are normal.

(a) One Mean, μ

In estimating a population mean μ from a sample mean X̄, the statistician generally has no information on the population standard deviation σ; hence he uses the estimator s, the sample standard deviation.
into

FIG. 8-2 The standard normal distribution and the t distribution compared. (As d.f. → ∞, the t distribution, p(t), becomes the same as the normal, with t.₀₂₅ = z.₀₂₅ = 1.96; the curves for d.f. = 5 and d.f. = 2 are more spread out, and with d.f. = 2, t.₀₂₅ = 4.30.)

Substituting s into (7-10), he estimates the 95% confidence interval for μ as

μ = X̄ ± z.₀₂₅ (s/√n)    (8-8)

Provided his sample is large (at least 25 to 50, depending on the precision required), this will be an accurate approximation. But with a smaller sample size, this substitution introduces an appreciable source of error. Hence, if he wishes to remain 95% confident, his interval estimate must be broadened. How much?

Recall that X̄ has a normal distribution; when σ is known, we may standardize, obtaining

Z = (X̄ − μ)/(σ/√n)    (8-9)

where Z is the standard normal variable. By analogy, we introduce a new variable,¹ defined as

t = (X̄ − μ)/(s/√n)    (8-10)

The similarity of these two variables is immediately evident. The only difference is that Z involves σ, which is generally unknown; but t involves s, which can always be calculated from an observed sample. The precise distribution of t, like that of Z, has been derived by mathematicians and is shown in Table V of the Appendix. The distribution of t is compared to Z in Figure 8-2.

¹ This t variable was first introduced by Gosset, writing under the pseudonym "Student," and was later proved valid by R. A. Fisher. We make no attempt to develop the entire proof, because it is not very instructive. It can be found in almost any mathematical statistics text.

(We must emphasize a break in notation. Until now, to conform to this convention, capital letters denoted random variables, while small letters denoted their realized values. But from now on, in order to conform to common usage, we shall use the small letters t and s to represent either random variables or realized values, and entirely forget the distinction for them; we keep the capital letters X̄, X, Z, P, etc.)

As expected, the t distribution is more spread out than the normal, since the use of s rather than σ introduces a degree of uncertainty. Moreover, while there is one standard normal distribution, there is a whole family of t distributions. With small sample size, this distribution is considerably more spread out than the normal; but as sample size increases, the t distribution approaches the normal, and for samples of about 50 or more, the normal

becomesa very
The
rather

accurate

x 2 we

approximation.
t is
not tabled

may write

d.f.
For example, for a
Appendix Table V that
in the upper tail is

s'
-----

sample with
the

in

8-2.

Figure
Pr

Substituting

for

critical

(--4.30

t value

a sample

for

of size

now

be

--

(8-11)

d.f. --

2, and
2\177

leaves

which

we

find

from

probability

4.30

<

<\177t

it

that

follows

4.30)=

for any observed

(8-12)

95\177

(8-10)'

Pr --4.30<
This deduction can

3, then

n =

By symmetry,

! according to

of freedom

degrees

t.o\177os =

This is shown

to sample size (n), but


divisor
in s 2. Thus, in cal-

according

of freedom,\" the

to \"degrees

according

culating

of

distribution

s/x//\177

\"turned

3, the 95 \177 confidence


= x

<

= 955/0

4.30

around\"

into the following inference:

interval

+ 4.30

for/z

is

(8-14)
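The arithmetic of (8-14) can be sketched in a few lines of code. The three observations below are hypothetical, and the critical value t.025 = 4.30 for d.f. = 2 is the Table V figure quoted above.

```python
from statistics import mean, stdev

sample = [64, 70, 76]        # hypothetical sample of n = 3 observations
n = len(sample)
x_bar = mean(sample)         # X-bar = 70
s = stdev(sample)            # the divisor is n - 1 = 2, the degrees of freedom
t_crit = 4.30                # t.025 for d.f. = 2 (Appendix Table V)

allowance = t_crit * s / n ** 0.5
print(f"95% CI for mu: {x_bar - allowance:.1f} to {x_bar + allowance:.1f}")
```

Here s works out to 6 exactly, so the interval is 70 ± 4.30(6/√3), i.e., roughly 55.1 to 84.9; the wide interval reflects the very small sample.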

The phrase "degrees of freedom" is explained in the following intuitive way: Originally there are n degrees of freedom in a sample of n observations. But one degree of freedom is used up in calculating X̄, leaving only n − 1 degrees of freedom for the residuals (Xᵢ − X̄) to calculate s².

For example, consider a sample of two observations, 21 and 15, say. Since X̄ = 18, the residuals are +3 and −3, the second residual necessarily being just the negative of the first. While the first residual is "free," the second is strictly determined; hence there is only 1 degree of freedom in the residuals.

Generally, for a sample of size n, it may be shown that if the first n − 1 residuals are specified, then the last residual is automatically determined by the requirement that the sum of all residuals be zero, i.e., Σ(Xᵢ − X̄) = 0.
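The residual bookkeeping in this footnote is easy to check mechanically; the two-observation sample 21, 15 is the one used in the footnote.

```python
sample = [21, 15]
x_bar = sum(sample) / len(sample)        # X-bar = 18
residuals = [x - x_bar for x in sample]  # +3 and -3

# The residuals must sum to zero, so the last one is fully determined
# by the first n - 1: it equals minus their sum.
assert sum(residuals) == 0
last_residual = -sum(residuals[:-1])
print(residuals, last_residual)
```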

THE t DISTRIBUTION    155

For a general sample size n, the 95% confidence interval for μ is

μ = X̄ ± t.025 (s/√n)    (8-15)

where t.025 is the critical t value leaving 2½% of the probability in the upper tail, with n − 1 degrees of freedom.

To sum up, we note the similarity of t estimation and normal estimation: the only difference between (8-15) and (7-10) is that an observed sample value (s) must be substituted for σ, and a critical t value for the critical normal value. An important practical question is, "When do we use the normal and when the t distribution?" When σ is known, the normal distribution is used; if σ is unknown, then the t distribution is theoretically correct, regardless of sample size. However, if the sample size is large, the normal is an accurate enough approximation of the t.³ So in practice the t distribution is used only for small samples when σ is unknown, and the normal is used otherwise.

Estimation from the t distribution has one additional requirement for all our small-sample estimation: the parent population from which the sample is drawn is assumed normal. (But normality is a requirement even if σ is known. Recall that inference about a nonnormal population was validated by the central limit theorem only if the sample size was large.)
As sample size (n) decreases, estimation becomes less precise (i.e., interval estimates become wider). The two reasons for this are clearly distinguished in (8-15). First, the divisor √n becomes smaller. This appears in (7-10) as well as in (8-15); thus even if σ is known and inference is based on the normal distribution, the error allowance increases and the interval estimate becomes wider as a consequence. The secondary reason for loss of precision occurs if s must be substituted for an unknown σ. The smaller the sample, the more the appropriate t distribution will depart from the normal; and the more spread the t distribution, the broader the interval estimate.
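The two sources of widening can be seen side by side in a short computation. The standard deviation s = 10 is hypothetical, and the t.025 values for d.f. = 4 and d.f. = 29 are taken as 2.776 and 2.045, the standard Table V figures.

```python
s = 10.0
t_crit = {5: 2.776, 30: 2.045}   # t.025 for d.f. = n - 1 (Table V)
z_crit = 1.96                    # normal critical value, for comparison

for n in (30, 5):
    t_allow = t_crit[n] * s / n ** 0.5   # error allowance when sigma is unknown
    z_allow = z_crit * s / n ** 0.5      # allowance if sigma = s were known
    print(f"n={n}: t allowance {t_allow:.2f}, normal allowance {z_allow:.2f}")
```

As n falls from 30 to 5, the allowance grows both because √n shrinks and because the t multiplier pulls further away from 1.96.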

(b) Difference in Two Population Means (μ₁ − μ₂)

We shall assume, as often occurs in practice, that even though the two populations may have different means, they have a common variance σ².

³ This may be verified from Table V. For example, a 95% confidence interval constructed from a sample of size 61 should use a critical t value of 2.00; but the use of the normal value of 1.96 as an approximation involves very little (2%) error. As we scan down the t.025 column in Table V, these critical values approach z.025 = 1.96. This verifies Figure 8-2, where the t distributions approach the normal.
When σ² is known, the 95% confidence interval (8-7) is appropriate. When σ² is unknown, it must be estimated. The appropriate procedure is to add up all the squared deviations from both samples, and then divide by the degrees of freedom, (n₁ − 1) + (n₂ − 1), obtaining an unbiased estimator called the pooled sample variance:

sₚ² = [Σ(X₁ᵢ − X̄₁)² + Σ(X₂ᵢ − X̄₂)²] / [(n₁ − 1) + (n₂ − 1)]    (8-16)

where X₁ᵢ represents the ith observation in the first sample. Substitution of sₚ for σ in (8-7) also requires that the t distribution be used, obtaining the 95% confidence interval

(μ₁ − μ₂) = (X̄₁ − X̄₂) ± t.025 sₚ √(1/n₁ + 1/n₂)    (8-17)

where t.025 is the critical t value with d.f. = n₁ + n₂ − 2.
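A sketch of (8-16) and (8-17) in code; the two small samples and the Table V value t.025 = 2.776 for d.f. = 3 + 3 − 2 = 4 are hypothetical.

```python
from statistics import mean

x1 = [64, 66, 71]      # first sample
x2 = [58, 62, 69]      # second sample
n1, n2 = len(x1), len(x2)
m1, m2 = mean(x1), mean(x2)

# (8-16): pool the squared deviations from both samples,
# then divide by the degrees of freedom (n1 - 1) + (n2 - 1).
pooled_ss = sum((x - m1) ** 2 for x in x1) + sum((x - m2) ** 2 for x in x2)
sp2 = pooled_ss / (n1 + n2 - 2)

# (8-17): confidence interval for mu1 - mu2, with d.f. = n1 + n2 - 2.
t_crit = 2.776         # t.025 for d.f. = 4 (Appendix Table V)
allowance = t_crit * (sp2 * (1 / n1 + 1 / n2)) ** 0.5
print(f"(mu1 - mu2) = {m1 - m2:.1f} +/- {allowance:.1f}")
```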

PROBLEMS

8-4 Sixteen weather stations at random locations in a state measure rainfall. In 1967, they recorded an average of 10 inches and a standard deviation of 1.5 inches. For the mean rainfall for the state,
(a) Construct a 95% confidence interval.
(b) Construct a 99% confidence interval.

8-5 100 cars on a thruway were clocked at an average speed of 69 m.p.h., with a standard deviation of 4 m.p.h. Construct a 95% confidence interval for the mean speed of all cars on this thruway.

(8-6) A random sample of 4 students in a large statistics course received the following marks: 56, 70, 55, 59. Construct a 95% confidence interval for the average mark of all students in the course.

8-7 From a sample of five random normal numbers from Table IIb, find a 95% confidence interval for the mean of the population.

8-8 Five people selected at random had their breathing capacity measured before and after a certain treatment, obtaining the following data:

    Breathing Capacity

    Person    Before (X)    After (Y)    Improvement
    1         2750          2850         +100
    2         2360          2380         +20
    3         2950          2800         −150
    4         2830          2860         +30
    5         2250          2300         +50

Let μ_X (and μ_Y) be the mean capacity of the whole population before (and after) treatment.
(a) What is the (point) estimate of the mean improvement (μ_Y − μ_X)?
(b) Construct a 95% confidence interval for the mean improvement.

8-9 In a random sample of 10 football players, the average age was 27 and the sum of squared deviations was 300. In a random sample of 20 hockey players, the average age was 25 and the sum of squared deviations was 450. Estimate, with a 95% confidence interval, the difference in mean age between the two populations.

8-10 Given the following random samples from 2 populations:

    n₁ = 25    X̄₁ = 60.0    s₁ = …
    n₂ = 15    X̄₂ = 68.0    s₂ = …

and assuming σ₁ = σ₂, find a 95% confidence interval for the difference in the population means (μ₁ − μ₂).

*8-11 Derive the confidence interval (8-6) from (8-5), using practically the same method as in Problem 8-3.

8-12 Derive the confidence interval (8-14) from (8-13). (Hint: Use the footnote to equation (7-4).)

8-3 ESTIMATING POPULATION PROPORTIONS: THE ELECTION PROBLEM ONCE AGAIN

In Section 6-6, we saw that a sample proportion P is just a sample mean in disguise. For example, if we observe 4 Democrats in a sample of 10, then

P = (1 + 0 + 0 + 1 + 0 + 1 + 0 + 0 + 0 + 1)/10 = .4

Similarly, the population proportion π is just the population mean in disguise. The simplest method of deriving an interval estimate for π is therefore to modify (7-10), the confidence interval for a mean. Thus the 95% confidence interval for π is

π = P ± 1.96 √(π(1 − π)/n)    (8-18)

This is just a recasting of (7-10): the sample mean X̄ is replaced by the sample proportion P, and σ/√n is replaced by √(π(1 − π)/n) [the standard deviation of P, as given in (6-30)].

But we seem to have reached an impasse; the unknown π appears in the right-hand side of (8-18). Fortunately, the situation has a remedy: substitute the sample P for π in the right side of (8-18). This is a strategy used before, when we substituted s for σ in the confidence interval for μ. Again, this approximation introduces another source of error; but with a large sample size, this is no great problem. Thus, for large samples, the 95% confidence interval for π is

π = P ± z.025 √(P(1 − P)/n)    (8-19)

where z.025 is the critical normal value leaving 2½% of the probability in the upper tail. As an example, the voter poll of Section 1-1 used this formula.
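Formula (8-19) is a one-liner in code; the poll counts below are hypothetical.

```python
def proportion_ci_95(successes, n):
    # (8-19): P +/- 1.96 * sqrt(P(1 - P)/n), valid for large n
    p = successes / n
    allowance = 1.96 * (p * (1 - p) / n) ** 0.5
    return p - allowance, p + allowance

lo, hi = proportion_ci_95(600, 1000)   # e.g., 600 of 1000 voters favor a candidate
print(f"{lo:.3f} < pi < {hi:.3f}")
```

For 600 of 1000 the interval works out to roughly .57 < π < .63.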

For small samples, there are several options. The simplest is to read the interval estimate for π from Figure 8-4, a table which is constructed in the following manner. The first step is the mathematical deduction of how the estimator P is distributed, for any population π. This is shown for a sample size 20 in Figure 8-3. Thus, for example, if π = .4, then the sample P has the dotted distribution shown in this diagram, and there is a 95% probability that any P calculated from a random sample of 20 will lie in the interval ab. For each possible value of π, such a probability function of P defines two critical points like a and b. When all such points are joined, the result is the two curves enclosing a 95% probability band.

[Figure 8-3 shows the probability distribution of P for samples of size 20, including the outline of the discrete probability function of P when π = .4 and the hypothetical observed value P₁ = .55.]

FIG. 8-3    Distribution of P.

This description of how the statistic P is related to the population π can of course be "turned around" to draw a statistical inference about π from a given sample P. For example, if we have observed the sample proportion P₁ = 11/20 = .55, then the 95% confidence interval for π is defined by fg, the vertical width of the probability band above P₁, i.e.,

.31 < π < .77    (8-20)

Whereas the probability band is deduced in the vertical direction of the P axis, the confidence interval is induced in the horizontal direction of the π axis.

FIG. 8-4    95% confidence intervals for population proportions (π). [Reproduced with permission of Professor E. S. Pearson from C. J. Clopper and E. S. Pearson, "The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial," Biometrika, 26 (1934), p. 404.]

The logic we have used in deriving this confidence interval is the same as before, but we pause briefly to review it, because this is a more generalized argument than we have previously encountered. Suppose the true value of π is .4; then there is a probability of 95% that a sample P will fall between a and b. If and only if it does (e.g., P₁) will the confidence interval we construct bracket the true π of .4. We are therefore using an estimation procedure which is 95% probable to bracket the true value of π, and thus yield a correct statement. But we must nevertheless recognize the 5% probability that the sample P will fall beyond a or b (e.g., P₀); in this case our interval estimate will not bracket π = .4, and our conclusion will be wrong.

Why is this a more general theory of confidence interval estimation? In previous instances (e.g., estimating a population mean, μ) we constructed a confidence interval symmetrically about our point estimate X̄. But in estimating π, no such symmetry is generally involved.⁴ For example, with our observed sample proportion P₁ = .55, the confidence interval (8-20) we constructed for π was not symmetric about our point estimate .55.
The 95% probability band of Figure 8-3 is set out in Figure 8-4, along with the similar bands appropriate for other sample sizes. This neater diagram is used to construct 95% confidence intervals for π. As an example, if we have observed a P = .6 in a sample of 15, the 95% confidence interval for π is approximately

.32 < π < .84

For the same P = .6 in a larger sample of 100, the 95% confidence interval for π is narrower:

.50 < π < .70

Alternatively, with such a large sample, (8-19) could have been used, with the same result, i.e.,

π = .60 ± 1.96 √((.6)(.4)/100) = .60 ± .10

Finally, there is a third method of estimating π that we introduce, not so much for its practical value as for its illustration of this useful principle: with a little imagination, several alternative methods of solving a problem can often be developed, and the most appropriate one to use in a given set of circumstances is a matter of judgment. Let us be conservative, and ask: what is the maximum width of the interval estimate in (8-18), i.e., what is the maximum value that the error allowance 1.96 √(π(1 − π)/n) can have? It is easily shown⁵ that π(1 − π) can never exceed 1/4, so the maximum error allowance (with a large sample) is 1.96 √((1/4)/n) = .98/√n, or approximately 1/√n. Thus the interval estimate can be written

π = P ± 1/√n    (8-21)

(8-21) is an interval estimate for π with at least a 95% level of confidence. But this is assuming the worst; if, in fact, π is not close to 1/2, then π(1 − π) is less than 1/4, and our interval estimate is wider than it need be; or, to restate, the level of confidence is greater than 95%. For example, this very simple formula is sometimes used in political polls, where it is known on the basis of historical experience that the proportion of Democrats is reasonably close to 1/2. In these circumstances (8-21) becomes a very accurate approximation.

For completeness we write the 95% confidence interval for the difference in 2 population proportions: for large samples,

(π₁ − π₂) = (P₁ − P₂) ± 1.96 √(P₁(1 − P₁)/n₁ + P₂(1 − P₂)/n₂)    (8-22)

⁴ The student may have wondered why the 95% probability band does not converge on the two end points 0 and 1. It is true that one half of this band (made up of all points similar to b) does intersect the P axis at 0; this means that if π is zero (e.g., no Socialists in the U.S.), then any sample P must also be zero (no Socialists in the sample). But the other half of this band does not intersect the π axis at 0; instead it intersects at h. This means that an observed P of zero (e.g., no Socialists in a sample) does not necessarily imply that π is zero (no Socialists in the U.S.).

⁵ This is derived essentially in the same way as (8-6). The simplest way to prove it is with calculus, setting the derivative of π(1 − π) equal to zero. To prove it without calculus, we may simply graph f(π) = π(1 − π): the graph is a parabola, zero at π = 0 and π = 1. Note that for either extreme value of π (1 or 0) the value of f(π) is zero; and π(1 − π) reaches its maximum value of 1/4 at π = 1/2, because of the symmetry of the parabola.
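The conservative bound (8-21) and the exact formula (8-19) are easy to compare numerically; the poll counts are hypothetical.

```python
def allowance_exact(p, n):
    # (8-19): 1.96 * sqrt(P(1 - P)/n)
    return 1.96 * (p * (1 - p) / n) ** 0.5

def allowance_conservative(n):
    # (8-21): 1/sqrt(n), from the worst case pi(1 - pi) = 1/4
    return 1 / n ** 0.5

n = 400
for p in (0.5, 0.75):
    print(p, round(allowance_exact(p, n), 4), round(allowance_conservative(n), 4))

# pi(1 - pi) peaks at 1/4 when pi = 1/2, as the footnote's parabola shows:
peak = max(p / 1000 * (1 - p / 1000) for p in range(1, 1000))
```

When P is near 1/2 the two allowances nearly coincide; at P = .75 the conservative formula is noticeably wider, which is the sense in which (8-21) "assumes the worst."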


PROBLEMS

8-13 Construct a 95% confidence interval for π, the proportion of Republican voters in the U.S., if there were 4820 Republicans in a random sample of 10,000.

8-14 In a random sample of tires produced by a certain company, 20% did not meet the company's standards. Construct a 95% confidence interval for the proportion π (in the whole population of tires) which do not meet the standards,
(a) If the sample size n = 10.
(b) If n = 25.
(c) If n = 2500.

(8-15) By talking to 15 voters, you discover that only 3 favor a certain candidate. Construct a 95% confidence interval for the proportion of all voters favoring this candidate.

(8-16) In a random sample of 100 British smokers, 28 preferred brand X. Construct a 95% confidence interval to estimate the proportion of all British smokers who prefer brand X.

8-17 In a U.S. survey of consumer intentions, 498 families in a random sample of 2500 families indicated that they intended to buy a new car within a year. Construct a 95% confidence interval for the proportion of all U.S. families intending a new car purchase.
(a) Answer in two ways:
    (1) Using the usual formula (8-19).
    (2) Using the simplified formula (8-21).
(b) If the sample P had been .40, would the error in (a2) have been greater?

8-18 If π = 3/4, what is the precise percentage error introduced by using (8-21) rather than (8-19)? Does this suggest that (8-21) is a reasonable approximation, provided 1/4 < π < 3/4?

8-19 A sample of 100 cars was taken in each of 2 cities. In one city 72 of the cars passed the safety test; in the second, only 66 passed. Construct a 95% confidence interval for the difference between the proportions of safe cars in the two cities.

8-20 A sample of 3182 voters yielded the following frequency table,* relating their attitude to Senator Joseph McCarthy and their vote in 1948:

    Vote          Attitude to McCarthy, 1954
                  Pro        Anti
    Democrat      506        1381
    Republican    563        732

Construct a 95% confidence interval for (π₁ − π₂), where π₁ is the proportion of all Democrat voters who were pro-McCarthy, and π₂ is the proportion of Republican voters who were pro-McCarthy.

* From S. M. Lipset, "The Radical Right," in Bell, Daniel, ed., The New American Right, New York: Criterion Books [1955].

8-21 In an urban survey of 1000, 790 favored certain legislation. In a rural survey of 300, 180 opposed the same legislation. Construct a 99% confidence interval for the difference between the proportions of city and country voters who favor the legislation.

*8-4 ESTIMATING THE VARIANCE OF A NORMAL POPULATION: THE CHI-SQUARE DISTRIBUTION

There is one further example of a confidence interval, interesting not so much for its practical value⁶ as for the insight it provides. Consider a normal population N(μ, σ²) with both μ and σ² unknown. So far we have estimated σ² with s² only as a means of finding a confidence interval for μ. Now suppose, on the other hand, that our primary interest is in σ², rather than μ. For example, we may wish to ask "How much variance is there in Japan's balance of payments?" in order to get some indication of the country's requirement of foreign exchange reserves. Or, we may ask "How much variance is there in U.S. farm income?"⁷ in order to evaluate whether a policy aimed at stabilizing farm income is necessary.

To construct an interval estimate for σ², we shall assume that the random variable (e.g., farm income) is normally distributed. We have already seen, in Section 7-2, that s² is an unbiased estimator of σ²; but to proceed to an interval estimate, we must ask how the random estimator s² is distributed around its target σ².

⁶ One reason that the confidence interval for σ² is of limited practical use is that it depends crucially on the assumption that the parent population is normal. By contrast, most of the confidence intervals for means remain approximately true even if the parent population is nonnormal; such confidence intervals are called robust.

⁷ Income stabilization policies are almost always designed to stabilize income around a reasonably high level. Thus they aim both at reducing variance (σ²) and raising average income (μ). Here we concentrate only on the variance problem.

To answer this, it is customary to define a new variable:

C² = s²/σ²    (8-23)

[Figure 8-5 shows the distribution of C² for d.f. = 2, 10, and 50, with the critical points .325 and 2.05 marked for d.f. = 10.]

FIG. 8-5    Distribution of the modified chi-square, C².

Of course, when s² = σ², this ratio C² is 1; thus our question of how s² is distributed around σ² can be rephrased, "How is C² distributed around 1?"

C² is called a modified chi-square variable, with n − 1 degrees of freedom.* It has been proved by advanced calculus that the distribution of C² is that of Figure 8-5; critical values are given in Appendix Table VI. Since its numerator s² and denominator σ² are both positive, the variable C² is also always positive, with its distribution falling to the right of zero in Figure 8-5. For small sample sizes we note that it is also skewed to the right; but as n gets large, this skewness disappears and the C² distribution approaches normality. Since s² is an unbiased estimator of σ², this implies that the expected value of each of these C² distributions is 1. Moreover, as sample size increases, C² becomes more and more heavily concentrated around 1, indicating that s² is becoming an increasingly accurate estimator of σ².

With this deduction of how the estimator s² is distributed around its target σ², we may now infer a 95% confidence interval for σ², using the now-familiar technique. We illustrate with a sample size n = 11 (d.f. = 10). From Table VI we find the critical points cutting off 2½% of the distribution in each tail, as shown in Figure 8-5; thus

Pr (.325 < C² < 2.05) = 95%    (8-24)

Solving for σ², we obtain the equivalent statement

Pr (s²/2.05 < σ² < s²/.325) = 95%    (8-25)

If the observed value of s² turns out to be 3.6, then the 95% confidence interval for σ² is

1.76 < σ² < 11.1    (8-26)

We note that this is another example of an asymmetrical confidence interval. In general, the upper and lower critical values of C² are denoted C².025 and C².975, and the 95% confidence interval is written

s²/C².025 < σ² < s²/C².975    (8-27)

* C² is comprised of the constant parameter σ² and the variable s². Thus it has the same degrees of freedom as s² [explained in the footnote to equation (8-11)].
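The arithmetic of (8-24) through (8-26) takes only a few lines; the critical points .325 and 2.05 are the d.f. = 10 values quoted above from Table VI.

```python
s2 = 3.6                          # observed sample variance, n = 11
c2_low, c2_high = 0.325, 2.05     # Pr(c2_low < C^2 < c2_high) = 95% for d.f. = 10

# Invert s^2/sigma^2 between the two critical points, as in (8-25):
lower = s2 / c2_high
upper = s2 / c2_low
print(f"{lower:.2f} < sigma^2 < {upper:.1f}")
```

This reproduces the interval (8-26); note how asymmetric it is about the point estimate s² = 3.6.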

PROBLEMS

*8-22 If a sample of 25 IQ scores from a certain population has s² = 120, construct a 95% confidence interval for σ².

*8-23 From the sample of Problem 8-6, construct a 95% confidence interval for σ².

Review Problems

8-24 Two machines are used to produce the same good. In 400 articles produced by the first machine, 16 were substandard. In the same length of time, the second machine produced 600 articles, and 60 were substandard. Construct 95% confidence intervals for
(a) π₁, the true proportion of substandard articles produced by the first machine.
(b) π₂, the true proportion of substandard articles from the second machine.
(c) The difference between the two proportions (π₁ − π₂).

8-25 To determine the effectiveness of a certain vitamin supplement, the following table of data was obtained for the weight increases (in grams) of 2 groups of 3 mice, the second group treated with the supplement:

    Control Group    Treated Group
    12               18
    16               19
    …                …

Assume that σ₁ = σ₂, and that the mice are not paired [i.e., the first row of data (12 and 18) does not come from mice that are related by kinship or anything else]. Construct a 95% confidence interval for the "vitamin effect," the effect of the vitamin supplement on weight increase.

8-26 Suppose a psychologist runs 6 people through a certain experiment. In order to find its effect on heart rate, he collects the following before-and-after data:

    Heart Rate (Beats per Minute)

    Person       Before Experiment    After Experiment
    Smith        71                   84
    Jones        67                   72
    Gunther      71                   70
    Wilson       78                   85
    Pestritto    64                   71
    Seaforth     70                   81

Suppose it is known that heart rate is approximately normally distributed, and that people as a whole have an average heart rate of 73. Calculate a 95% confidence interval for the mean effect of the experiment on heart rate.

8-27 A certain scientist concluded his study of a fertility survey as follows: "So far one result has emerged from the before-and-after survey, and it is a key measure of the outcome: at the end of 1962, 14.2% of the women in the sample were pregnant, and at the end of 1963 (after the birth-control campaign) 11.4% of the women (in a second independent sample) were pregnant, a decline of about one fifth." If the samples (both before and after) included 2500 women, what statistical qualification would you add to the above statement in order to make its meaning clearer?

chapter 9

Hypothesis Testing

TESTING A SIMPLE HYPOTHESIS

We begin with a very simple example, in order to keep the philosophical issues clear. Suppose that I am playing a gambling game with a die, and lose whenever it shows an ace. After 100 throws I notice that I have suffered an inordinate number of losses: 27 aces. This makes me suspect that my opponent is using a loaded die; specifically, I begin to wonder whether this is one of the crooked dice recently advertised as giving aces one-quarter of the time. Is my suspicion well-founded? Should I make an accusation and terminate the game? My decision should depend on several factors.

1. How much did I trust my opponent even before I began the game (prior to collecting the evidence)? For example, if I am playing with a sharp-looking character I have just met on a Mississippi steamboat, I will be more inclined to terminate the game than if I am playing with an old and trusted friend.

2. What are the potential losses involved in making a wrong decision? I may be playing with very attractive odds in my favor; if the die is, after all, a true one, then I will have a good deal to lose if I erroneously conclude that it is crooked and terminate the game.

3. Does the evidence itself (27 aces in 100 tosses) indicate that I am being cheated?

If questions (1) and (2) can be answered, even roughly, then it is useful to put this whole problem into the larger framework of decision theory (Chapter 10). However, in many instances the first two questions cannot easily be answered; using a medical example, what is the cost of making the wrong decision in certifying a drug which may have serious side effects? In many practical problems, the only question which can be answered by the scientist is (3), and it is this limited but extremely important question which we address in this chapter.

First, we state the two conflicting hypotheses as precisely (mathematically) as possible. The hypothesis that the die is fair is really a statement that the Bernoulli population of all possible throws has a proportion of aces equal to 1/6. This is the hypothesis of "no cheating" or "nothing out of the ordinary." Customarily, it is called the null hypothesis,

H₀: π = 1/6 = .167    (9-1)

The other hypothesis is that the probability of an ace is 1/4; this is customarily called the alternate hypothesis,

H₁: π = 1/4 = .25    (9-2)

Suppose that before we started throwing the die (i.e., before the evidence was collected), we agreed on the following decision rule:

Reject H₀ (i.e., accept H₁) if more than 20 aces occur in the 100 throws (P > .20);
Accept H₀ if 20 aces or fewer occur (P ≤ .20).    (9-3)

This decision rule is shown in Figure 9-1a. The value .20 which separates the two regions is often called the critical point, and the observed values of P greater than .20 which lead us to reject H₀ make up the critical range. When H₀ is rejected, we call the results statistically significant.

We now ask, "How well will this rule work?" (Call the rule R, for "reject H₀ if P > .20.") Even if H₀ is true, a P greater than .20 will occasionally occur because of chance fluctuation (bad luck), causing R to give the wrong answer. We can hope, however, that the probability of this error is small. To find how small, we apply probability analysis. Recall from Chapter 6 that a sample proportion P has an approximately normal distribution. If H₀ is true, the distribution of P is concentrated around its mean value π = 1/6 = .167, with

σ_P = √(π(1 − π)/n) = .037    (9-4)

[Figure 9-1 shows: (a) the decision rule, with "Accept H₀" to the left of the critical point .20 and "Reject H₀" to its right; (b) the distribution of P if H₀ is true, centered on π = .167, with the type I error probability α = .18 shaded; (c) the distribution of P if H₀ is false (H₁ true), centered on π = .25, with the type II error probability β shaded.]

FIG. 9-1    Hypothesis testing. (a) Known decision rule, R. (b) and (c) Unknown world, with two possibilities. Either H₀ is true, in which case the probability of a type I error is α; or H₀ is false (i.e., H₁ is true), in which case the probability of a type II error is β.

This probability of error is easily calculated by evaluating the probability in the distribution of Figure 9-1b lying to the right of .20:

α = Pr (P > .20/H₀) = Pr (Z > (.20 − .167)/.037) = Pr (Z > .9) = .18    (9-5)

This error of rejecting H₀ when it is true is called a type I error, with its probability denoted α.¹

On the other hand, suppose H₀ is false (i.e., H₁ is true); what is the probability of error then? In this case π = .25, and the distribution of P is again approximately normal, but now concentrated around .25, as shown in Figure 9-1c. According to the rule R, we will make the error of accepting the false H₀ whenever we observe P < .20.

¹ In this chapter, we write Pr ( /H₀) to denote "probability, assuming H₀ is true."
The probability of this error is calculated by evaluating the probability in the distribution of Figure 9-1c lying to the left of .20. The mean of this distribution is .25 and its standard deviation is

σ_P = √(π(1 − π)/n) = .043    (9-6)

so it follows that

β = Pr (P < .20/H₁) = Pr (Z < (.20 − .25)/.043)    (9-7)
  = Pr (Z < −1.15) = .13    (9-8)

This error of accepting H₀ when it is false is called a type II error, with its probability denoted β.
TABLE 9-1    Possible Errors in Testing Hypothesis H₀

    Decision     State of the world
                 If H₀ is true                       If H₀ is false (H₁ true)
    Accept H₀    Correct decision.                   Type II error.
                 Probability = 1 − α;                Probability = β
                 also called "confidence level"
    Reject H₀    Type I error.                       Correct decision.
                 Probability = α;                    Probability = 1 − β;
                 also called "significance level"    also called "power"

The terminology of testing is reviewed in Table 9-1. Note that the probabilities in each column sum to 1; this must follow, so long as we use a rule which involves the decision either² to accept or reject H₀.

Now recall that our decision rule R in (9-3) was determined arbitrarily. We now ask, "Is there a better decision rule, i.e., a better critical point for our test than .20?" Of course, we should like to make the probabilities of error (α and β) as small as possible, but these two objectives are in conflict.

² Of course other more complicated decision rules may be used. For example, the statistician may decide to suspend judgement if the observed P is in the region around .20 (say .18 < P < .22). If he observes an ambiguous P in this range, he would then undertake a second stage of sampling, which might yield a clear-cut decision, or might lead to further stages of sequential sampling.
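The α and β calculations of (9-4) through (9-8) can be reproduced with a short normal-approximation sketch; `norm_cdf` is built from the standard library's error function, so the results differ slightly in the last decimal from the text's rounded z values.

```python
from math import erf, sqrt

def norm_cdf(z):
    # Standard normal cumulative probability, via the error function.
    return 0.5 * (1 + erf(z / sqrt(2)))

def error_probabilities(n=100, critical=0.20, pi0=1/6, pi1=0.25):
    sd0 = sqrt(pi0 * (1 - pi0) / n)                # sigma_P under H0, as in (9-4)
    sd1 = sqrt(pi1 * (1 - pi1) / n)                # sigma_P under H1, as in (9-6)
    alpha = 1 - norm_cdf((critical - pi0) / sd0)   # Pr(P > .20 / H0)
    beta = norm_cdf((critical - pi1) / sd1)        # Pr(P < .20 / H1)
    return alpha, beta

alpha, beta = error_probabilities()
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}")
```

Raising the `critical` argument from .20 to .22 reproduces the trade-off discussed next: α falls while β rises.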

A SIMPLE HYPOTHESIS TESTING    171

FIG. 9-2 Illustration of how reducing alpha increases beta (compare with Fig. 9-1). Shown are the distributions of P if H0 is true (H0: pi = .167) and if H0 is false (H1: pi = .25), with the critical point moved up to .22.

In Figure 9-2 the critical point has been moved up from .20 to .22. As we hoped, this does reduce alpha; but it also increases beta. Moreover, we note that the only way to eliminate alpha is to move the critical point far enough to the right; but as we do so, beta approaches 1, i.e., our test becomes "powerless," since we can no longer reject even the most dishonest die. Similarly, it is easy to confirm that any attempt to reduce beta (by lowering the critical point below .20) will only increase alpha. In statistics, as in economics, the problem is trading off conflicting objectives.

The only way to reduce both alpha and beta is to increase the sample size. From equation (9-4) it is clear that an increase in n will reduce the spread of the distribution of P, concentrating it more closely around its central value. Thus if n is increased from 100 to 200, we obtain the result shown in Figure 9-3. The only difference between this test and the one shown in Figure 9-1 is the increase in sample size; note how it reduces both alpha and beta.

These principles are illustrated with an interesting legal analogy. In a murder trial, the jury is being asked to decide between H0, the hypothesis that the accused is innocent, and the alternate H1, that he is guilty. A type I error is committed if an innocent man is condemned (innocence is rejected), while a type II error occurs if a guilty man is set free (innocence is accepted). The judge's admonition to the jury that "guilt must be proved (i.e., innocence rejected) beyond a reasonable doubt" means that alpha should be kept very small. There have been many legal reforms (for example, in limiting the evidence that can be used against an accused man) which have been designed to reduce alpha, the probability that an innocent man will be condemned. But these same reforms have increased beta, the probability that a guilty man will evade punishment. There is no way of pushing alpha down to zero, and insuring absolutely against convicting an innocent man, without letting every defendant

FIG. 9-3 How alpha and beta are both reduced by increasing sample size (compare with Fig. 9-1).

go free, thus raising beta to 1 and making the trial meaningless (powerless). It should also be noted that historically alpha and beta have both been reduced by improved crime detection, i.e., by the increased available evidence brought to bear on H0.
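The effect of sample size pictured in Figure 9-3 can be verified numerically. Keeping the critical point fixed at .20, we compute alpha = Pr(P > .20 | pi = 1/6) and beta = Pr(P < .20 | pi = .25) for n = 100 and n = 200; both shrink as n grows. (A sketch of our own, using the normal approximation; `phi` is the standard normal CDF.)

```python
from math import sqrt, erf

def phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def alpha_beta(n, cutoff=0.20, pi0=1/6, pi1=0.25):
    """Error probabilities for the rule 'reject H0 if P > cutoff'."""
    se0 = sqrt(pi0 * (1 - pi0) / n)         # spread of P under H0
    se1 = sqrt(pi1 * (1 - pi1) / n)         # spread of P under H1
    alpha = 1 - phi((cutoff - pi0) / se0)   # Pr(reject | H0 true)
    beta = phi((cutoff - pi1) / se1)        # Pr(accept | H1 true)
    return alpha, beta

a100, b100 = alpha_beta(100)   # roughly .19 and .12
a200, b200 = alpha_beta(200)   # both smaller
print(a100, b100)
print(a200, b200)
```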

Returning to the statistical problem, we conclude that (short of raising more funds for increasing sample size) the best we can do is to balance, or trade off, alpha and beta; we are left with the problem of how. Whenever possible the answer should take into consideration the two factors mentioned at the outset of the chapter.

1. The relative prior likelihood of the two competing hypotheses. To use an earlier example: if your opponent is a trusted friend, rather than a complete stranger, your greater prior confidence in H0 will make you more reluctant to reject it; thus you will keep alpha small.

2. The relative cost of making each type of error. To use the same example: suppose the cost of making a type I error (and accusing an old friend of being a cheat) is high, while the cost of making a type II error is relatively low (you continue to bet against a crooked die but it is only for peanuts); in these circumstances, your greater concern about making a type I error will lead you to reduce alpha to a relatively small value, even though beta is increased as a consequence. Or, drawing on our legal analogy, we may interpret legal reforms designed to protect the innocent (i.e., reduce alpha) as a reflection of the judgement that the cost of type I errors (condemning innocent men) exceeds the cost of type II errors (allowing the guilty to go free).

The difficulty is that, in any scientific inquiry, these questions cannot be answered with a great deal of precision. Because type I errors are usually regarded as quite serious, alpha is set at a small value, usually 5% or 1%. Then the test rule is constructed on this basis. We illustrate with our die-tossing example how hypotheses are typically tested. Three steps are involved.

1. The null hypothesis H0 and the alternative H1 are formally stated, as in (9-1) and (9-2). At the same time, the sample size (e.g., 100) and the significance level of the test (e.g., alpha = 5%) are set.

2. We now assume that the null hypothesis H0 is true, and we ask: "What can we expect of a sample drawn from this kind of world?" This question is answered in our die-tossing example thus: if H0 is true, then there is a probability of only 5% that we will observe a sample P greater than .228. This critical value (.228) is determined as follows. We note from Appendix Table IV that a Z value of 1.64 cuts a 5% tail off the standard normal distribution. This critical Z value is translated into a P value:

    P = pi + 1.64 sigma_P    (9-10)

and for the value of pi = 1/6:

    P = 1/6 + 1.64 sqrt((1/6)(5/6)/100)    (9-11)

which yields the critical value of P = .228. The resulting test, R', is shown in Figure 9-4. This shows us what to expect of a sample P, if H0 is true. (At this stage, the probability of a type II error (beta) and the power of this test (1 - beta) may also be calculated, but this is not always done. As an exercise, the reader should confirm that, for the alternative pi = .25, beta = .31 and the power of the test is .69.)

3. With the rule R' now established, the sample is taken and P observed; there remains only the last automatic step. We now ask: "Is this P consistent with H0?" If it is not (i.e., P > .228), we reject H0. As an example, recall that in our 100 tosses we rolled 27 aces. This observed P = .27 is in such conflict with H0 that it cannot "reasonably" be attributed to chance, and H0 is rejected.

Summary. Whereas in our first test (R) we arbitrarily specified the critical value (.20) and solved for alpha, in this more typical hypothesis test (R') we specify alpha (.05) and solve for the critical value.
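The three steps can be mirrored in a few lines of code. The sketch below (ours, not the text's) reproduces the critical value of (9-11) and the exercise result beta = .31, using 1.64 as the tabulated Z cutting a 5% upper tail and `phi` for the standard normal CDF.

```python
from math import sqrt, erf

def phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

pi0, n, z05 = 1/6, 100, 1.64

# Step 2: critical value from (9-11).
critical = pi0 + z05 * sqrt(pi0 * (1 - pi0) / n)   # about .228

# Exercise: beta and power if the alternative pi = .25 is true.
se1 = sqrt(0.25 * 0.75 / n)
beta = phi((critical - 0.25) / se1)   # about .30, the text's .31 up to rounding
power = 1 - beta

# Step 3: the observed P = .27 exceeds the critical value, so H0 is rejected.
reject = 0.27 > critical
print(round(critical, 3), round(beta, 2), reject)
```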

FIG. 9-4 Construction of a test of the hypothesis that pi = .167, at a 5% significance level (alpha = .05), with n = 100. The critical value .228 yields our decision rule: accept H0 if P falls below .228; reject H0 if P exceeds it.

Note that the "95% confidence level" of this test is similar to the concept of confidence used in interval estimation. We set up the test in Figure 9-4 on the assumption that H0 is true. If this is so, there is a 95% probability that the observed P will be below .228, and we will (correctly) accept H0. Thus we are using a method of testing which, when H0 is true, will be right 95% of the time.

There is another way of looking at this testing procedure. If we get an observed P exceeding .228, there are two possible explanations:

1. H0 is true, but we have been exceedingly unlucky and got a very improbable sample P. (We're born to be losers; even when we bet with odds of 19 to 1 in our favor, we still lose); or

2. H0 is not true after all; the die is crooked, and it is no surprise that we rolled so many aces.

Being reasonable, we opt for the second explanation. Although the first explanation is conceivable, it is not as plausible as the second. But we are left in some doubt; it is just possible that the first explanation is the correct one. For this reason we qualify our conclusion to be "at the 5% significance level (type I error level)."

PROBLEMS

9-1 Fill in the blanks. Consider the problem facing a radar operator whose job is to detect enemy aircraft. When something irregular appears on the screen, he must decide between

    H0: all is well; it is only a bit of interference on the screen; or
    H1: an enemy attack is coming.

In this case, the type _____ error is a "false alarm," and the type _____ error is a "missed alarm." To reduce both alpha and beta, the electronic equipment is made as sensitive and reliable as possible.

9-2 (a) To test whether a die has a fair number of aces, construct a test, at the 1% significance level, of

    H0: pi = .167    versus    H1: pi = .300

using a sample of 100 tosses.
(b) What is the beta of this test?
(c) Compare this test with the test developed in the text in Figure 9-4: (1) Is the critical value different? (2) Is alpha different? (3) Is beta different?

9-3 Construct the appropriate test of H0: Pr (heads) = 1/2 versus H1: Pr (heads) = .60, for a sample of 100 tosses of a coin, using a 5% level of significance. If this test were used by all the students in your class (as in Problem 3-2), about how many would be led to mistakenly reject H0? Interpret.

9-4 (a) Construct the appropriate test of H0: coin unbiased, versus the alternative H1: Pr (heads) = .25, using a 25% level of significance and a sample of 100 tosses.
(b) Do you reject H0?
(c) What is beta?

(a) Introduction

In our die-tossing example we have assumed that there is only one way a die can be crooked (i.e., pi = 1/4). Thus the alternative hypothesis H1 was a simple one. But usually there is no way of knowing how heavily the die may be biased against us. Thus our alternative hypothesis H1 (crooked die) would be a composite hypothesis, embracing a whole set of possibilities, including our previous simple alternative:

    H1: pi = .17, or .18, or .19, . . .    (9-12)

To summarize, we wish to test

    H0: pi = .167    (9-13)

against the composite alternative

    H1: pi > .167    (9-14)

Since there are many alternatives included in H1, we can no longer evaluate beta as simply as in the previous section. But note that H0 is still a simple hypothesis; thus the evaluation of alpha has not been complicated. There is now, therefore, an even stronger case for concentrating on alpha, which we set at .05; we shall return to an evaluation of the more complicated beta values later.

With this significance level of .05 given, the reader should now develop as an exercise the appropriate decision rule for accepting or rejecting H0. Note that this is identical to the rule R' developed in Figure 9-4. Since the rule is based on the level of significance selected (alpha = .05 in both cases), it remains the same, entirely independent of any considerations of beta. But there are two major changes in its interpretation.

(b) The Power Function

With a simple alternative H1 there was a single possible value of pi, giving beta a single value. With a composite H1 there are now many possible values of pi, each giving a different beta. We show three such calculations in Figure 9-5; each involves evaluating the area under a curve, lying to the left of the critical value (P = .228). Thus the middle curve shows how the sample P will be distributed if the true pi is .25, and yields beta = 31%. To interpret: if pi is in fact .25, then there is a probability of 31% that an observed P will be less than our critical

FIG. 9-5 Calculation of beta for several of the possible values of pi included in H1, with the critical value at P = .228. Shown are the distributions of P if pi = .25 (then beta = .31), if pi = .30, and if pi = .18.

value of .228, and we will erroneously conclude that the die is fair (accept H0).

Table 9-2 has been constructed using the whole set of possible values of pi in H1 (n = 100); the corresponding beta values are shown in column 2, and the power of the test (1 - beta) is shown in column 3. This is the probability that we will correctly reject H0, concluding that pi > .167, i.e., the probability of correctly detecting a crooked die. If a dishonest gambler uses a die which always turns up an ace, he knows he will be quickly found out, and the game abandoned; the more crooked the die he uses, the greater your "power" to uncover him as a cheat. The less crooked the die (i.e., as we move down this column), the more difficult it becomes to detect. The dishonest gambler will recognize this, and will prefer to get you to play against a slightly crooked die. The power of this test is thus seen to be our ability to uncover a crooked die; and if it is only slightly biased, our test has little power, and it becomes almost impossible to distinguish between the two conflicting hypotheses. This is confirmed from the last, limiting line in Table 9-2. Here the value of pi is H0, and to reject H0 would be wrong; the probability of this was alpha = 5% by definition.

The "power function" is graphed in Figure 9-6. Clearly we should like a power function that begins very close to the baseline, since its initial height is alpha, the level of significance, which we wish to keep low. At the same time, we wish the power function to be very steep; the more rapidly it rises, the greater is our power to distinguish between competing hypotheses.

TABLE 9-2  beta and the Power Function for the Test R'
(Test of the Fair Die, at the 5% Level of Significance)

    (1)                   (2)                    (3)
    Possible Values       Probability of         Power = 1 - beta
    of pi                 (Erroneously)          (Probability of Correctly
                          Accepting H0           Rejecting H0)

    .32                   .02                    .98
    .30                   .05                    .95
    .28                   .12                    .88
    .26                   .23                    .77
    .24                   .39                    .61
    .22                   .58                    .42
    .20                   .76                    .24
    .18                   .89                    .11
    .17                   .94                    .06
    Limit (.167) = H0     (.95)                  (.05)

FIG. 9-6 Graph of the power function of Table 9-2, for the test R' of the fair die at the 5% significance level. Power = 1 - beta = Pr (rejecting H0); beta = Pr (type II error); the curve starts at height alpha = Pr (type I error) = .05 where pi = .167 = H0, and rises over the possible values of pi.

(c) Warning About Accepting H0

This introduces a second reason for interpreting our test in this section (with a composite H1) differently from our test in the previous section (with a simple H1). It is now possible that this die is only slightly biased (e.g., pi = .18). If this is in fact the case, it is very likely (beta = .89) that we will observe P in the range below .228. R' tells us to accept H0 (true die); but this is a mistake. At the same time the evidence is not strong enough to reject H0. What to do?
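Table 9-2 can be regenerated in a few lines: each beta is the area of the normal curve for P, centered at the given pi, lying below the critical value .228, and the power is its complement. This is our own sketch of the calculation, so the last digits may differ slightly from the table's rounding.

```python
from math import sqrt, erf

def phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

critical, n = 0.228, 100
for pi in (0.32, 0.30, 0.28, 0.26, 0.24, 0.22, 0.20, 0.18, 0.17):
    se = sqrt(pi * (1 - pi) / n)
    beta = phi((critical - pi) / se)   # Pr(erroneously accepting H0)
    print(f"pi = {pi:.3f}   beta = {beta:.2f}   power = {1 - beta:.2f}")
```

Note how beta climbs toward its limit of .95 (and power falls toward alpha = .05) as pi slides down toward the null value .167.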

The only other apparent option is to suspend judgement; or formally, to "not reject H0." It is the only way that we can avoid the great risk of incurring a type II error. Earlier, with a simple H1, we could live with the risk of a type II error; but with our (substantially different) composite H1, this risk becomes prohibitive (beta can run up to .95). Thus we prefer to "not reject H0," "suspend judgement," or conclude that "our sample P is not statistically significant, i.e., P is not significantly greater than 1/6." But we do not accept H0 outright.

We confirm this from another point of view. Suppose we are using test rule R' on a biased die (pi = .21); we toss the die and observe P = .21. This would not be an unlucky result; in fact it is the best luck that we could have hoped for, since our estimate P is exactly on the true value pi. If we started out suspecting bias, it has now been confirmed by our sample. There are no grounds whatsoever for concluding that this is a fair die, so we cannot accept H0; since we also cannot reject H0, we suspend judgement.3

There is another alternative which is generally even more attractive, and to this we shall now turn.

(d) Prob-value

The prob-value4 (the probability that the sample value would be as extreme as the value we actually observed) is defined as:

    prob-value = Pr (sample value as extreme as the value actually observed | H0)    (9-15)

3 This illustrates a problem involved in accepting H0 if the sample size is small; an increase in sample size, by reducing our standard error, would eventually allow us to reject H0 (assuming, of course, that we continued to observe P = .21). On the other hand, if the sample size is extremely large we can fall into a trap in rejecting H0. To see why, we consider more carefully the question of whether any die is absolutely true, with pi = 1/6 exactly. The answer to this must be no; H0: pi = 1/6, like any other simple hypothesis, must be (slightly) false. And all we have to do to reject it is to take a large enough sample, thus reducing our standard error of estimate to the point where even an observed P just slightly different from 1/6 will call for rejection of H0, i.e., be statistically significant! But concluding from this that the die is dishonest misses the point: it is not dishonest enough to be of any practical consequence. We therefore must distinguish between statistical significance and practical importance. In conclusion, sample size is obviously an important consideration in hypothesis testing. If the sample is very large, rejecting H0 may be very dangerous; on the other hand, if the sample is very small, accepting H0 is dangerous (and this is a critical problem for economists and other scientists, with their limited available information).

4 Short for "probability-value." It is sometimes further contracted to "P-value"; we do not use this term, to avoid confusion.

For example, in the gambling example, tossing a die 100 times and observing the proportion of aces to be P = .27, we have

    prob-value = Pr (P >= .27 | H0)    (9-16)
               = Pr (Z >= 2.77)
               = .0028    (9-17)

This calculation is very similar to the calculation of alpha, and is shown in Figure 9-7a. We further note that if the observed value of P is extreme, the prob-value is very small. Thus the prob-value measures the credibility of H0. It is an excellent way for the scientist to summarize what the data says about the null hypothesis.

The relation of prob-value to testing H0 may be seen in Figure 9-7b.

FIG. 9-7 Prob-value for the gambling example; H0 is pi = 1/6 and sample size is n = 100. (a) Calculation of prob-value when the observed P = .27. (b) Fig. 9-4 repeated to show the relation of prob-value to alpha: reject H0 iff the prob-value < alpha.
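The prob-value of (9-16) and (9-17) is one line of arithmetic once the standard normal CDF is available; the sketch below is ours, with `phi` again playing the role of the normal table.

```python
from math import sqrt, erf

def phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

pi0, n, P = 1/6, 100, 0.27
se = sqrt(pi0 * (1 - pi0) / n)

z = (P - pi0) / se            # about 2.77, as in (9-17)
prob_value = 1 - phi(z)       # about .0028
print(round(z, 2), round(prob_value, 4))
```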

Since the prob-value is a measure of the credibility of H0, if this credibility sinks below alpha, then H0 must be rejected. To restate this, we recall that the observed value of P is in the rejection region of the test iff the prob-value is smaller than alpha, i.e.,

    Reject H0 iff prob-value < alpha    (9-18)

Figure 9-7 shows yet another possible interpretation: the prob-value is the smallest significance level alpha at which H0 may be rejected.

A major criticism of the traditional hypothesis testing of Section 9-1 is that alpha is set rather arbitrarily, and the simple decision to reject or not reject H0 does not allow the sample to "tell us" all that it might. Prob-value is therefore the preferred way of stating the result of a hypothesis test. Then each reader can set his own level of significance alpha at whatever value he deems appropriate, and make his own decision to reject H0 if the prob-value < alpha. [If the prob-value > alpha, he should suspend judgement, for the reasons cited in Section 9-2(c) above.]

Another Example. Suppose that an auto firm has been using brake linings with an average stopping distance of 90 feet. The firm is considering a switch to another type of lining, which is similar in all other respects, but alleged to have a shorter stopping distance. In a test run the new linings are installed on 64 cars; the average stopping distance is 87 feet, with a standard deviation of 16 feet. In your job of quality control, you are asked to evaluate whether or not the new lining is better.

Let mu = the average stopping distance for the population of new linings, and test

    H0: mu = 90

against the alternative

    H1: mu < 90

Noting that the observed sample mean is 87, you calculate the prob-value, using a method similar to (9-16):

    prob-value = Pr (sample mean <= 87 | H0)    (9-19)

In other words, this is the probability that the sample mean will be as extreme as the observed value, i.e., 3 feet or more below the hypothetical value of 90 feet. Translating (9-19) into Z values, we have

    prob-value = Pr (Z <= (87 - 90)/(16/sqrt(64)))
               = Pr (Z <= -1.5)
               = .067    (9-20)

You report therefore that there is evidence that the new linings are better, since there is only a 6.7% probability that you would get such extreme test results from an equivalent product. Thus you leave the decision to the vice-president. If he uses a 10% significance level, he will switch to the new linings. But if he uses a 5% significance level, he will not switch.
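The brake-lining report can be reproduced the same way (our sketch; `phi` is the standard normal CDF), and the decision compared against the two significance levels the vice-president might use:

```python
from math import sqrt, erf

def phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

xbar, mu0, s, n = 87, 90, 16, 64
z = (xbar - mu0) / (s / sqrt(n))   # (87 - 90)/2 = -1.5
prob_value = phi(z)                # about .067, as in (9-20)

for alpha in (0.10, 0.05):
    decision = "switch" if prob_value < alpha else "do not switch"
    print(alpha, decision)
```

Since .05 < .067 < .10, the same evidence leads to opposite decisions at the two levels, which is exactly the point of reporting the prob-value itself.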

(e) How to Select H0

So far we have tested a simple H0 against both simple and composite H1. Cases occasionally occur when both H0 and H1 are composite. As an example, suppose we are asking whether American men are more likely to vote Democratic than American women. The null hypothesis (that voting preference is the same) contains many simple hypotheses:

    H0: pi_M = pi_W

where pi_M and pi_W represent the proportions of men and women voting Democratic; for example, pi_M = pi_W = .50, or pi_M = pi_W = .51, and so on. Moreover, the alternate hypothesis is even more composite:

    H1: pi_M > pi_W

for example, pi_M = .51 and pi_W = .50, or pi_M = .52 and pi_W = .51, or pi_M = .52 and pi_W = .50, or pi_M = .50 and pi_W = .49; indeed, any pi_M = x and pi_W = y with 0 <= x <= 1, 0 <= y <= 1, and x > y.

Additional complications are introduced beyond those involved when only H1 was composite. Indeed, it is difficult to know where to start. The key is to define a new population parameter, specifically the difference between voting preferences:

    delta = pi_M - pi_W

We now refer to H0 above as one-dimensional, and H1 as two-dimensional.

The null hypothesis now becomes

    H0: delta = 0    (9-21)

against the composite alternative

    H1: delta > 0    (9-22)

For large samples, a test can be constructed on the basis of the difference in the sample proportions, p_M - p_W.

As this illustration makes clear, the null hypothesis may sometimes be uninteresting, one that we neither believe nor wish to establish, selected because of its simplicity. It is the alternative H1 that we are trying to establish, and we prove H1 by rejecting H0. We can see now why statistics is sometimes called "the science of disproof." H0 cannot be proven; and H1 can be proven only by disproving (rejecting) H0. It follows that if we wish to prove some proposition, we call it H1 and set up the contrary hypothesis H0 as the "straw man" we hope to destroy.

Another example. Suppose research engineers in an electronics company claim that they have developed a new television tube superior to the old, which had an average lifetime of 12,400 hours. They ask you to prove its superiority. You wish to establish

    H1: mu > 12,400

where mu is the average lifetime of all new tubes. The "straw man" you hope to destroy is that the new tube is no better, i.e.,

    H0: mu = 12,400

This is then tested in the hope that the observed sample mean is significantly greater than 12,400. If it is, then H0 is rejected and H1 is established.

This example emphasizes our earlier warning against accepting H0. Suppose the observed sample mean is slightly above 12,400, yielding a prob-value of 20%. If the vice-president specifies the significance level alpha at 5%, the evidence is not strong enough to allow us to reject H0. But we cannot accept H0 either, for two reasons: (1) we did not believe it in the first place; it was set up simply as a "straw man" we hoped to knock over in order to establish H1; (2) the tests suggest H0 is wrong (although not as strongly as we would have liked). We therefore opt to withhold judgement, simply quoting the prob-value,5 and wait for further evidence.

5 See Section 9-2(c) above.
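For the voting illustration, the large-sample test of (9-21) against (9-22) rests on the difference in sample proportions p_M - p_W. A standard way to carry it out (the details are not spelled out in this chapter) uses a pooled estimate of pi for the standard error; the poll counts below are hypothetical, purely to make the sketch runnable.

```python
from math import sqrt, erf

def phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Hypothetical polls: 540 of 1000 men and 500 of 1000 women vote Democratic.
x_m, n_m = 540, 1000
x_w, n_w = 500, 1000

p_m, p_w = x_m / n_m, x_w / n_w
p_pool = (x_m + x_w) / (n_m + n_w)   # pooled proportion under H0: delta = 0
se = sqrt(p_pool * (1 - p_pool) * (1 / n_m + 1 / n_w))

z = (p_m - p_w) / se
prob_value = 1 - phi(z)              # one-sided, matching H1: delta > 0
print(round(z, 2), round(prob_value, 3))
```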

PROBLEMS

9-5 A certain type of seed has always grown to a mean height of 8.5 inches. A sample of 49 seeds grown under new conditions has a mean height of 8.8 inches and a standard deviation of 1 inch.
(a) At the 5% significance level, test the hypothesis that the new conditions grow no better plants.
(b) Graph the power function of this test.

9-6 Whereas the power function involved graphing all the (1 - beta) values in column 3 of Table 9-2, an "operating characteristics curve" (OCC) is defined by graphing all the beta values (column 2). Draw the OCC for our die test, and compare it with the power function in Figure 9-6. What is the most desirable shape for an OCC?

9-7 A man makes the claim that the average yearly salary of men in a certain profession is only $6600. A random sample of 150 men in that profession shows a mean salary of $6730, with a standard deviation of $900.
(a) Calculate the prob-value, and interpret.
(b) At a 5% level of significance, would you reject the man's claim?
(c) At a 1% level of significance, would you reject his claim? Would you therefore accept it? Explain your answer.

9-8 A coffee shop sells on the average 320 cups of coffee per day, with a standard deviation of 40. After an advertising campaign, on the 7 days following, they sell an average of 350 cups.
(a) Has advertising left business unchanged? Calculate the prob-value.
(b) If the owner of the coffee shop specifies that the type I error of the test (significance level) is to be 5%, do you reject the hypothesis that business is unchanged?
(c) What assumptions have you made implicitly in (a) and (b)? Under what conditions are they questionable?
*(d) If coffee sales were observed for 25 days, what would the average have to be in order to justify a statement, at the 5% significance level, that business had improved?

9-9 In order to compare the yearly incomes in two professions, a survey was taken among 100 men in each. In one sample the mean income is $6000 with a standard deviation of $700; in the second sample the mean is $6200 with a standard deviation of $400. To weigh the claim that the mean salary in the second profession is no larger than in the first profession, calculate the prob-value. (Hint. Use the theory of the difference in means developed in Chapter 8.)

9-10 Records show that a machine produced an hourly average of 678 articles. After a safety device was installed, in a random sample of 100 hours the machine produced an hourly average of 674 articles, with a standard deviation of 25. Pointing to the drop of 4 articles per hour, management claimed the safety device was reducing production. The union countered that the drop of 4 articles was merely statistical fluctuation.
(a) To objectively summarize the evidence on whether production was left unchanged, calculate the prob-value.
(b) If an arbitration board decides on a 1% level of significance (type I error), do they rule in favor of management or the union?

*9-11 Consider the following sample of digits produced by a machine:

    2 4 8 4 4 3 0 2  2 4 5 1  1 8 0 8 6 1

At the 5% significance level, test the hypothesis that these are drawn from a population of random digits, against the alternative that there is a bias in favor of small digits.

*9-12 The output of machines in a factory is substandard 4% of the time. A machine suspected of being inferior produces X substandard articles in a sample of 500. How small would X have to be in order to reject, at the 5% significance level, the hypothesis that this machine is inferior?

9-3 TWO-SIDED TESTS

In the previous section we asked whether men voters were more heavily Democratic than women. Suppose instead we ask whether men voters are more or less Democratic than women. In either case we use the same simple null hypothesis

    H0: delta = 0    (9-23)

But whereas in (9-22) we used the one-sided alternative H1: delta > 0, we now must use the two-sided alternative

    H1: delta != 0, i.e., delta > 0 or delta < 0    (9-24)

We reject H0 if our sample estimate of delta is significantly greater than, or less than, 0. We test H0 "from both sides."

As a second illustration, suppose we are again testing the trueness of a die. But instead of testing a suspect die we are betting against, suppose we work in quality control in a die-making factory. We are now just as concerned about a die that shows too few aces as one which shows too many. Our appropriate test involves

    H0: pi = .167    (9-25)
                     [(9-13) repeated]

against the two-sided alternative

    H1: pi != .167, i.e., pi > .167 or pi < .167    (9-26)

[compare with (9-14)]. The critical region (for rejecting H0) now also must be two-sided. For a level of significance alpha = 5%, this is shown in Figure 9-8b,

FIG. 9-8 A one-sided and a two-sided test of a die compared. (a) A one-sided test of H0: pi = .167 against the alternative H1: pi > .167, with alpha = 5% and critical value .228 (Fig. 9-4 repeated). (b) A two-sided test of H0: pi = .167 against the alternative H1: pi != .167, with an area of 2.5% cut off in each tail.

where an equal area (2.5%) is cut off each tail, in order to keep the critical region for rejecting H0 as large as possible. Thus:

    Reject H0 if |Z| = |(P - pi0)/sqrt(pi0(1 - pi0)/n)| > 1.96    (9-27)

where pi0 is the null hypothesis value of pi. Equation (9-27) simply asks whether P differs from pi0 by a critical amount, on either the high side or the low side.

The final question is "How do we recognize when to use a two-tailed test or a one-tailed test?" The one-tailed test is recognized by an asymmetrical phrase like "more than, less than, at least, no more than, better, worse, . . ." and so on. Thus our first test, of whether the probability of an ace on the gambler's die was more than one sixth, required a one-sided test.
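Rule (9-27) is easy to put into code. With the same 100 tosses and observed P = .27, the two-sided test compares |Z| with 1.96; the sketch (ours) also recovers the two critical points for P itself that appear in Figure 9-8b.

```python
from math import sqrt

pi0, n, P = 1/6, 100, 0.27
se = sqrt(pi0 * (1 - pi0) / n)

z = (P - pi0) / se
reject = abs(z) > 1.96        # rule (9-27)

# The equivalent two-sided critical points for P, roughly .09 and .24:
low, high = pi0 - 1.96 * se, pi0 + 1.96 * se
print(reject, round(low, 3), round(high, 3))
```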

PROBLEMS

9-13 Test H0: pi = 1/2 versus H1: pi != 1/2, where pi is the probability of a tossed thumbtack landing "point up." Use a 5% level of significance, and the sample observations of Problem 3-1:
(a) After 10 tosses.
(b) After 100 tosses.

9-14 Referring to Problem 9-7, suppose that the man's claim that mu0 = 6600 is no longer implausibly low; i.e., suppose the alternate hypothesis is now two-sided: H1: mu > 6600 or mu < 6600. Using a two-sided test of H0, and also a two-sided prob-value, answer the same questions as in Problem 9-7.

9-4 THE RELATION OF HYPOTHESIS TESTS TO CONFIDENCE INTERVALS

In this s :ction we shall reach a very


important
conclusion:
a confidence
interval can >e used to test any hypothesis;
in fact, the two procedures are
with an example.
equivalent. \177re illustrate
a firm has been producing a light bulb with an average life of
Suppos\177
800 hours. I wishes to test a new
bulb.
A sample of 25 new bulbs has an
(\177), with
a standard deviation of 30 hours (s).
average life )f 810hours
of our small sample we should use the
t, rather
than the
Noting that oecause

188

TESTING

HYPOTHESIS

we can

distribution,

normal

1. Test the hypothesis

    H0: μ0 = 800    (9-30)

against the two-sided alternative

    H1: μ ≠ 800    (9-31)

H0 may be accepted⁷ at the 5% level of significance if the observed X̄ falls within the interval

    μ0 − 2.06 s/√n ≤ X̄ ≤ μ0 + 2.06 s/√n    (9-32)

where t.025 = 2.06. Given our sample, this condition becomes

    788 ≤ X̄ ≤ 812    (9-33)

Since our observed X̄ (810) does fall within this interval, this hypothesis μ0 is acceptable. This is shown in Figure 9-9a.

FIG. 9-9 Comparison of two-sided hypothesis test with confidence interval (using a sample with X̄ = 810 and s = 30). (a) Test of H0: μ0 = 800 versus H1: μ ≠ 800. (b) Confidence interval for μ. [Panel (a): accept H0 if X̄ falls between 788 and 812, around the hypothetical value μ0 = 800; the observed value is X̄ = 810. Panel (b): confidence interval 798 to 822, around X̄ = 810; in both panels the error allowance is t.025 s/√n = 2.06(30/√25) ≈ 12.]

⁷ More specifically, "H0 should not be rejected." To simplify the exposition in this section and avoid double negatives, we shall use "accept H0" rather than "do not reject H0," although as we have pointed out earlier, the latter (weaker) conclusion may be the only one justified.

2. Alternatively, the sample result could be used to construct a confidence interval for μ. Using the same 95% level of confidence, this confidence interval is defined in (8-15) as:

    μ = X̄ ± t.025 s/√n = 810 ± 2.06 (30/√25)    (9-34)

or

    798 ≤ μ ≤ 822    (9-35)

This is shown in Figure 9-9b.

This is the key point: μ0 is an acceptable hypothesis if and only if it falls within the confidence interval. In the hypothesis test we note that the observed X̄ of 810 falls in the acceptable region defined in Figure 9-9a; hence μ0 is acceptable. At the same time, in Figure 9-9b we note that μ0 falls within the confidence interval. This is clear from the diagram, since the interval is the same length in both cases: it is constructed by adding and subtracting precisely the same error allowance (t.025 s/√n = 12). Provided the sample mean X̄ and μ0 differ by less than this, μ0 will fall in the confidence interval, and will also be an acceptable hypothesis. This holds for any μ0. (To confirm, note that μ0 = 797.6 would be just barely contained in the confidence interval at the bottom; at the same time this hypothetical value would shift our acceptable region to the left in the top diagram to the point where our sample X̄ = 810 would just barely remain in that region. But any smaller hypothetical value of μ will fall outside our confidence region and be rejected.)

It can be proven, in general,⁸ that

    H0 is accepted if and only if the relevant confidence interval contains H0.    (9-36)

⁸ For a general algebraic proof (rather than geometric interpretation) of (9-36), consider the basis of both the confidence interval and the hypothesis test. (We illustrate with the normal distribution, but our remarks are equally valid for most tests.) With 95% probability,

    |X̄ − μ| / (σ/√n) < 1.96    (9-37)

In deciding whether to accept the null hypothesis μ0, we first fix μ0, and then see whether the observed X̄ satisfies this inequality. In constructing a confidence interval, we first observe X̄; then the values of μ which satisfy (9-37) form our confidence interval. Thus μ0 will be in the confidence interval if and only if μ0 is accepted, for in both cases we have

    |X̄ − μ0| / (σ/√n) < 1.96

noting, of course, that the level of confidence (e.g., 95%) must match the level of type I error (level of significance, 5%).
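The equivalence in (9-36) can be verified directly with the light-bulb numbers. The following sketch computes both the acceptance region (9-32) and the confidence interval (9-34), and confirms that X̄ = 810 lies in the first exactly when μ0 = 800 lies in the second.

```python
from math import sqrt

N, XBAR, S, T_CRIT = 25, 810.0, 30.0, 2.06   # t.025 with 24 d.f.
MU0 = 800.0
half = T_CRIT * S / sqrt(N)                  # common error allowance, about 12.4

accept_lo, accept_hi = MU0 - half, MU0 + half    # (9-32): roughly 788 to 812
ci_lo, ci_hi = XBAR - half, XBAR + half          # (9-34): roughly 798 to 822

accept_H0 = accept_lo <= XBAR <= accept_hi       # X-bar in acceptance region?
mu0_in_ci = ci_lo <= MU0 <= ci_hi                # mu0 in confidence interval?
# Both conditions reduce to |XBAR - MU0| <= half, so they always agree
```

The last comment is the algebraic content of footnote 8: both procedures test the same inequality, read in different directions.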

*(b) One-sided Hypothesis Tests

Equation (9-36) remains true for a one-sided test of a hypothesis, provided, of course, that we use a one-sided confidence interval, as shown in Figure 9-10. Using the same sample result, we see that the observed X̄ of 810 falls in the acceptable region defined in the hypothesis test in Figure 9-10a; hence μ0 is acceptable. At the same time, in Figure 9-10b we note that μ0 falls within the confidence interval. This illustrates once more that H0 is accepted if and only if the confidence interval contains H0.

FIG. 9-10 Comparison of one-sided hypothesis test and confidence interval (using the same sample result as Fig. 9-9). (a) Test of H0: μ0 = 800 versus H1: μ > 800. (b) Confidence interval for μ. [Panel (a): accept H0 if X̄ ≤ 810.3, where 810.3 = 800 + 1.71(30/√25) and t.05 = 1.71. Panel (b): one-sided confidence interval μ ≥ 799.7, where 799.7 = 810 − 10.3.]

The reasons for one-sided hypothesis tests have been established at length in this chapter. These same reasons justify the use of one-sided confidence intervals too. Suppose, for example, that the federal government is considering construction of a multipurpose dam in a river basin. Suppose further that the cost of this installation is $100 million. The problem is: do the benefits from the project exceed this cost?

To get an idea of irrigation benefits, suppose we run a careful calculation of the operation of a random sample of 25 farmers in the river basin, and estimate that the net profit (per 100 acres) will increase on the average by $810 (with a standard deviation of $30). To simplify the exposition, we have used the same numbers as in Figures 9-9 and 9-10, except that X̄ and μ now refer to the average increase in profit.

The best point estimate of μ (average profit increase) is 810. But if we use this in our benefit calculations, we will take no account of its reliability; i.e., it may be way too high, or way too low. Now consider the alternative estimate of 799.7, the critical point of the one-sided confidence interval in Figure 9-10. We can be 95% confident that this figure understates. We don't know by how much it understates, but this doesn't matter; the point is that we are almost certain that this figure understates benefits. Suppose we use similar underestimates of other benefits (flood control, recreation, etc.), and the sum of these estimates comes to $110 million.⁹ We can now be very confident that benefits exceed costs, since at each stage we have consciously underestimated benefits. From a policy point of view this is a much stronger conclusion than the claim that the "best estimate" of benefits is $120 million, since the reliability of this estimate remains a mystery. (This strategy clearly has a major drawback. An understatement of benefits may reduce the estimated benefits below cost, in which case we would have to start all over again.)

Thus, by "cooking the case" against our conclusion, it is strengthened. Economists often apply this general philosophy in another way, by selecting adverse assumptions in order to strengthen a policy conclusion; they may use one-sided confidence intervals in the future for the same reason.

⁹ I.e., the present value of these accumulated benefits must exceed costs. Issues such as the appropriate rate of discount, and the extent to which considerations other than benefits and costs justify the project, are also important; but we concentrate here on the statistical issues.
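The one-sided bound used in the dam example can be sketched the same way; 1.71 is the one-tailed t.05 value with 24 degrees of freedom.

```python
from math import sqrt

def lower_conf_bound(xbar, s, n, t_crit=1.71):
    """One-sided 95% lower confidence bound: mu >= xbar - t.05 * s / sqrt(n)."""
    return xbar - t_crit * s / sqrt(n)

# Irrigation-benefit sample: mean increase $810, s = $30, n = 25 farmers
bound = lower_conf_bound(810, 30, 25)
# bound is about 799.7: we are 95% confident the true mean exceeds this
```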

*(c) The Confidence Interval as a General Technique

The reader may ask: "Doesn't (9-36) reduce hypothesis testing to a very simple adjunct of interval estimation?" In a sense this is true. Whenever a confidence interval has been constructed, it can immediately be used to test any null hypothesis: the hypothesis is accepted if and only if it is in the confidence interval. To emphasize this point, we can restate (9-36) in the equivalent form:

    A confidence interval may be regarded as just the set of acceptable hypotheses.    (9-38)

The next question is whether, in view of this, our study of hypothesis testing in this chapter has been a waste of time. Why not simply construct the (single) appropriate confidence interval, and use this to test any null hypothesis that anyone may suggest? There is a good deal of validity to this conclusion; nevertheless, our brief study of hypothesis testing has been necessary for the following reasons:

1. Historically, hypothesis testing has been frequently used in physical and social science research. This technique must be understood if it is to be evaluated; specifically, the nature of type I and type II error and the warnings about accepting H0 must be understood.
2. Certain hypotheses have no corresponding simple confidence interval, and are consequently tested on their own.
3. The calculation of a prob-value provides additional information not available if the hypothesis is tested from a confidence interval.
4. Hypothesis testing plays an important role in statistical decision theory, developed in Chapter 15.

PROBLEMS

9-15 Three different sources claim that the average income in a certain profession is $7200, $6000, and $6400 respectively. You find from a sample of 16 persons in the profession that their mean salary is $6030 and the standard deviation is $570.
    (a) At the 5% significance level, test each of the three hypotheses, one at a time.
    (b) Construct a 95% confidence interval for μ. Then test each of the 3 hypotheses by simply noting whether it is included in the confidence interval.

(9-16) A sample of 8 students made the following marks: 3, 9, 6, 6, 8, 7, 8, 9. Assume the population of marks is normal. At a 5% level of significance, which of the following hypotheses about the mean mark (μ) would you reject?
    (a) μ0 = 8.    (b) μ0 = 6.3.    (c) μ0 = 4.    (d) μ0 = 9.

*9-17 As in the second example of Section 9-2(e), suppose a standard process of manufacturing television tubes has a mean of 12,400 hours. The engineers have found a new process which they hope is better than the old standard. To establish this, a sample of 100 tubes from the new process is found to have a mean of 12,760 hours, and a standard deviation of 4000 hours.
    (a) Construct a one-sided confidence interval for the new μ.
    (b) Calculate the prob-value associated with the null hypothesis of no improvement.
    (c) At a 5% level of significance, do you reject the null hypothesis?

9-5 CONCLUSIONS

Hypothesis testing is a technique that must be used with great care, for several reasons. First, the construction of a confidence interval is usually preferred to a hypothesis test: a confidence interval gives a clearer picture of the observed sample result, whereas a test merely indicates whether or not the sample is statistically significant. Second, there are real problems (especially with a small sample) in accepting H0; instead, the prob-value of the sample result should be calculated. This provides a clear and immediate picture of how well the statistical results match H0, leaving the rejection decision to the reader. Finally, rejection of H0 does not answer the question "Is there any practical economic (as opposed to statistically significant) difference between our sample result and H0?" This is the broader question of decision theory, developed in Chapter 15.

Review Problems

9-18 Four coins are tossed together 144 times. The average number of heads is 2.2. To answer a gambler who fears the coins are biased towards heads, calculate the prob-value associated with the null hypothesis of fair coins.

9-19 A sample of 784 men and 820 women in 1962 showed that 30 percent of the men and 22 percent of the women stated they were against the John Birch Society. The majority had no opinion.
    (a) Letting πM and πW be the population proportion of men and women respectively who are against the Society, construct a 95% confidence interval for the difference (πM − πW).
    (b) What is the prob-value for the null hypothesis that (πM − πW) = 0?
    (c) At the 5% significance level, is the difference between men and women statistically significant? (I.e., do you reject the null hypothesis?)
    (d) Would you judge this difference to be of sociological significance?

(9-20) Of 400 randomly selected townspeople in a certain city, 184 favored a certain presidential candidate. Of 100 randomly selected students in the same city, 40 favored the candidate.
    (a) To judge whether the student population and town population have the same proportion favoring the candidate, calculate the prob-value.
    (b) Is the difference in the students and townspeople statistically significant, at the 5% level?

9-21 To complete a certain task, a sample of 100 workers in one plant took an average of 12 minutes, with a standard deviation of 2.5 minutes. A sample of 100 workers in a second plant took an average of 11 minutes, with a standard deviation of 2.1 minutes.
    (a) Construct a 95% confidence interval for the difference in the two population means.
    (b) Calculate the prob-value for the null hypothesis that the two population means are the same.
    (c) Is the difference in the two sample means statistically significant at the 5% level?

9-22 By talking to a random sample of 50 students, suppose you find that 27 percent support a certain candidate for student government. To what extent does this invalidate the claim that only 20% of all the students support the candidate?

chapter 10

Analysis of Variance

10-1 INTRODUCTION

In the last three chapters we have made inferences about one population mean; moreover, in Section 8-1 we extended this to the difference in two population means. Now we compare r means, using techniques commonly called analysis of variance.¹ Since the development of this technique becomes complicated and mathematical, we shall give a plausible, intuitive description of what is involved, rather than rigorous proofs.

10-2 ONE-FACTOR ANALYSIS OF VARIANCE

As an example, suppose that three machines (A, B, and C) are being compared. Because these machines are operated by men, and for other inexplicable reasons, output per hour is subject to chance fluctuation. In the hope of "averaging out" and thus reducing the effect of chance fluctuation, a random sample of 5 hours is obtained from each machine and set out in Table 10-1, along with the mean of each sample.

Of the many questions which might be asked, the simplest are set out in Table 10-2.

¹ To keep the argument simple, we assume (among other things) that there is an equal size sample (n) drawn from each of the r populations. While such balanced samples are typical in the experimental sciences (such as biology and psychology), they are often impossible in the nonexperimental sciences (e.g., economics and sociology). While analysis of variance can be extended to take account of these circumstances, regression analysis (dealt with in Chapters 11 to 14) is an equally good, and often preferred, technique. But regardless of its limitations, analysis of variance is an enlightening way of introducing regression.

TABLE 10-1 Sample Output of Three Machines

    Machine, or        Sample Output from Machine         Average X̄i
    Sample Number
    i = 1              48.4  49.7  48.7  48.5  47.7       48.6
    i = 2              56.1  56.3  56.9  55.1  57.6       56.4
    i = 3              52.1  51.1  51.6  52.1  51.1       51.6
                                                     X̄ = 52.2

TABLE 10-2

    Question                                    How It Is Answered
    (a) Are the machines different?             Analysis of variance (test of hypothesis)
    (b) How much are the machines different?    Multiple comparisons (simultaneous
                                                confidence intervals)

(a) Hypothesis Test

The first question is "Are the machines really different?" That is, are the sample means X̄i in Table 10-1 different because of differences in the underlying population means μi (where μi represents the lifetime performance of machine i)? Or may these differences in X̄i be reasonably attributed to chance fluctuations alone?

To illustrate, suppose we collect three samples from one machine, as shown in Table 10-3. As expected, sample statistical fluctuations cause small differences in sample means even though the μ's are identical.

TABLE 10-3 Three Samples of the Output of One Machine

    Sample Number      Sample Values                      Average X̄i
    i = 1              51.7  53.0  52.0  51.8  51.0       51.9
    i = 2              52.1  52.3  52.9  53.6  51.1       52.4
    i = 3              52.8  51.8  52.3  52.8  51.8       52.3
                                                     X̄ = 52.2

So the question may be rephrased: "Are the differences in the X̄i of Table 10-1 of the same order as those of Table 10-3 (and thus attributable to chance fluctuation), or are they large enough to indicate a difference in the underlying μ's?" The latter explanation seems more plausible; but how do we develop a formal test?

As before, the hypothesis of "no difference" in the population means becomes the null hypothesis,

    H0: μ1 = μ2 = μ3    (10-1)

The alternate hypothesis is that some (but not necessarily all) of the μ's are different,

    H1: μi ≠ μj for some i and j    (10-2)

To develop a plausible test of this hypothesis we first require a numerical measure of the degree to which the sample means differ. We therefore take the three sample means in the last column of Table 10-1 and calculate their variance. Using formula (2-6) (and being very careful to note that we are calculating the variance of the sample means, and not the variance of all values in the table), we have

    sX̄² = (1/(r − 1)) Σᵢ (X̄i − X̄)²
        = ½[(48.6 − 52.2)² + (56.4 − 52.2)² + (51.6 − 52.2)²] = 15.5    (10-3)

where

    X̄ = (1/r) Σᵢ X̄i = 52.2    (10-4)

and r = the number of rows (i.e., the number of sample means), in this case 3.

Yet sX̄² does not tell the whole story. Consider, for example, the data of Table 10-4, which has the same sample means (and hence the same sX̄²) as Table 10-1, yet much more erratic chance fluctuations within each row.

TABLE 10-4 Samples of the Production of Three Different Machines

    Sample Number      Sample Output from Machine         Average X̄i
    i = 1              54.6  45.7  56.7  37.7  48.3       48.6
    i = 2              53.4  57.5  54.3  52.3  64.5       56.4
    i = 3              56.7  44.7  50.6  56.5  49.5       51.6
                                                     X̄ = 52.2

The implications of this are shown in Figure 10-1. In Figure 10-1a, the machines are so erratic that all the sample outputs could conceivably have come from one common population; i.e., the (same) differences in sample means may be explained by chance. On the other hand, in Figure 10-1b the machines are not erratic, so the differences in sample means can hardly be explained by chance. We conclude that the μ's are different, and reject H0, because the variance in sample means (sX̄²) is large relative to the chance fluctuation.

FIG. 10-1 (a) Graph of Table 10-4: all samples could have come from one common population. (b) Graph of Table 10-1: apparently 3 different populations. [Vertical axis: output, roughly 40 to 60.]

How can we measure this chance fluctuation? Intuitively, we seem to need the spread (or variance) of the observed values within each sample, interpreting it as our standard of comparison. Thus we compute the variance within the first sample in Table 10-1:

    s1² = (1/(n − 1)) Σⱼ₌₁ⁿ (X1j − X̄1)² = ¼[(48.4 − 48.6)² + ⋯] = .52    (10-5)

where X1j is the jth observed value in the first sample.

Similarly we compute the variance within the second sample (s2² = .87) and within the third (s3² = .25). The simplest measure of chance fluctuation is the average of these variances within each of the r samples, referred to as the "pooled variance" sp²:

    sp² = (1/r) Σᵢ si² = (.52 + .87 + .25)/3 = .547    (10-6)

Each si² has (n − 1) degrees of freedom, so that the pooled variance sp² has r(n − 1) degrees of freedom.

The key question can now be stated: is sX̄² large relative to sp²? In practice, we examine the ratio

    F = n sX̄² / sp²    (10-7)

called the "variance ratio." (The F distribution of this ratio is introduced below.) If H0 is true, this ratio will have, on the average, a value near 1; because of statistical fluctuation, it will sometimes be above 1, sometimes below. If H0 is not true (and the μ's are not the same), then n sX̄² will be relatively large compared to sp², and the F value in (10-7) will be greater than 1. Formally, H0 is rejected if the computed value of F is significantly greater than 1.
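The variance ratio (10-7) can be computed end to end in a few lines of code. The data below are transcribed from Table 10-1; the last entry of machine 2 is taken as 57.6 (an assumption on our part) so that the sample variances match the text's .52, .87, and .25, and the result differs slightly from the text's 141 only because the text rounds sp² to .547.

```python
def variance_ratio(samples):
    """One-factor ANOVA F statistic (10-7): F = n * s_xbar^2 / s_p^2."""
    r, n = len(samples), len(samples[0])
    means = [sum(s) / n for s in samples]
    grand = sum(means) / r
    s2_xbar = sum((m - grand) ** 2 for m in means) / (r - 1)        # (10-3)
    s2_pooled = sum(sum((x - m) ** 2 for x in s) / (n - 1)
                    for s, m in zip(samples, means)) / r            # (10-6)
    return n * s2_xbar / s2_pooled

table_10_1 = [
    [48.4, 49.7, 48.7, 48.5, 47.7],   # machine 1, mean 48.6
    [56.1, 56.3, 56.9, 55.1, 57.6],   # machine 2, mean 56.4
    [52.1, 51.1, 51.6, 52.1, 51.1],   # machine 3, mean 51.6
]
F = variance_ratio(table_10_1)   # about 141.6, far above the critical 3.89
```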

Before developing the formal test further, we interpret (10-7) from another point of view. Suppose H0 is true, and the three population means are the same. (In addition, the assumptions are necessary for the formal test that our samples are drawn from normal populations with the same variance σ².) Then all of our observations could be viewed as one large sample drawn from a single population. Now consider three alternative ways of estimating the variance σ² of that population.

1. The most obvious way would be to estimate it by computing the variance of the one large sample.
2. The second way is to estimate the variance within each of the 3 samples as in (10-5), and average these estimates as in (10-6). This is the denominator sp² of (10-7).
3. Infer σ² from sX̄², the observed variance of the sample means. Recall from Chapter 6 how the variance of sample means is related to the variance of the population:

    σX̄² = σ²/n    (10-9), (6-12) repeated

This suggests estimating σ² as n sX̄², which is recognized as the numerator of (10-7). We note that we are estimating the population variance by "blowing up" the observed variance of the sample means.

To recapitulate: if H0 is true, we can estimate σ² by three valid methods. Considering only the last two, we note that one appears in the numerator of (10-7), the other in the denominator; they should be about equal, and their ratio close to 1. [This establishes why n was introduced into the numerator of (10-7).] But if H0 is not true, the denominator will still reflect only chance fluctuation, while the numerator will be a blow-up of the differences between means; this ratio will consequently be large.

The formal test of H0, like any other test, requires knowledge of the distribution of the observed statistic (in this case F) if H0 is true. This is shown in Figure 10-2, along with the critical F.05 value, cutting off 5% of the upper tail of the distribution. Thus, if H0 is true, there is only a 5% probability that we would observe an F value exceeding 3.89, and consequently reject H0. It is conceivable, of course, that H0 is true and we were very unlucky; but we choose the more plausible explanation that H0 is false.

FIG. 10-2 The distribution of F when H0 is true (with 2, 12 degrees of freedom). [Reject H0 to the right of the critical point 3.89.]

To illustrate this procedure, let us reconsider the three sets of sample results shown in Tables 10-1, 10-3, and 10-4, and in each case ask whether the machines exhibit differences that are statistically significant. In other words, in each case we test H0: μ1 = μ2 = μ3 against the alternative that the μ's are not all equal. For the data in Table 10-3, an evaluation of (10-7) yields

    F = n sX̄² / sp² = .35/.547 = .64    (10-10)

Since this is below the critical F.05 value of 3.89, we conclude that the observed differences in the means of Table 10-3 can reasonably be explained by chance fluctuations. (This is no surprise; recall that these three samples were generated from the same machine.)

For the data in Table 10-4, the F ratio is

    F = 77.4/35.7 = 2.7    (10-11)

In this case, the difference between sample means (and consequently the numerator) is much greater. But so is the chance fluctuation (reflected in a large denominator). Again, the F value is less than the critical value 3.89.

However, for the data in Table 10-1, the F ratio is

    F = 77.4/.547 = 141    (10-12)

In this case, the difference in sample means is very large relative to the chance fluctuation, making the F ratio far exceed the critical value 3.89, so that H0 is rejected.

These formal tests confirm our earlier intuitive conclusions. Table 10-1 provides the only case in which we conclude that the underlying populations have different means.

(b) The F Distribution

This distribution is so important for later applications that it is worth considering in some detail. The F distribution shown in Figure 10-2 is only one of many; there is a different distribution depending on the degrees of freedom (r − 1) in the numerator, and the degrees of freedom r(n − 1) in the denominator. Intuitively we can see why this is so. The more degrees of freedom in calculating both numerator and denominator, the closer these two estimates of variance will likely be to their target σ²; thus the more closely their ratio will concentrate around 1. This is illustrated in Figure 10-3.

FIG. 10-3 The F distribution, with various degrees of freedom in numerator and denominator. Note how the critical point (for rejecting H0) moves toward 1 as degrees of freedom increase. [Critical F.05 values: 3.89 for d.f. = 2, 12; 2.85 for d.f. = 8, 12; 1.60 for d.f. = 50, 50.]

We could present a whole set of F tables, each corresponding to a different combination of degrees of freedom. For purposes of practical testing, however, only the critical 5% or 1% points are required; these are set out in Table VII in the Appendix. From this table, we confirm the critical point of 3.89 used in Figure 10-2.

(c) The ANOVA Table

This section is devoted to a summary shorthand of how analysis of variance calculations are usually done. The model is summarized in Table 10-5. We confirm in column 2 that all samples are assumed drawn from normal populations with the same variance σ², but with means that may, or may not, differ. (Indeed, it is possible differences in these means that are being tested.)

TABLE 10-5 Summary of Assumptions

    (1)           (2)                       (3)
    Population    Assumed Distribution      Observed Sample Values
    1             N(μ1, σ²)                 X1j (j = 1 ⋯ n)
    2             N(μ2, σ²)                 X2j (j = 1 ⋯ n)
    3             N(μ3, σ²)                 X3j (j = 1 ⋯ n)
    .             .                         .
    i             N(μi, σ²)                 Xij (j = 1 ⋯ n)

    H0: μ1 = μ2 = μ3
    H1: the μ's are not all equal

The calculations are conveniently laid out in Table 10-6, called an ANOVA (ANalysis Of VAriance) table. This table is mostly a bookkeeping arrangement: the first row shows the calculation of the numerator of the F ratio, and the second row the denominator; part (b) of the table applies this shorthand to the specific example of the three machines in Table 10-1. In addition, the table provides two handy intermediate checks on our calculations. One is on the degrees of freedom in column 3. The other is on the sums of squares in column 2: the sum of squares between rows plus the sum of squares within rows adds up to the total sum of squares.³

The variation between rows is "explained" by the fact that the rows may come from different parent populations (e.g., machines that perform differently). The variation within rows is "unexplained," because it is the random or chance variation that cannot be systematically explained. Thus F is sometimes referred to as the variance ratio²

    F = explained variance / unexplained variance    (10-17)

² When any variation (i.e., any sum of squares) is divided by the appropriate degrees of freedom, the result is a variance.

³ Proved as follows. The difference, or deviation, of any observed value (Xij) from the mean of all observed values (X̄) can be broken down into two parts:

    (Xij − X̄) = (X̄i − X̄) + (Xij − X̄i)    (10-13)
    total deviation = explained deviation + unexplained deviation

Thus, using Table 10-1 as an example, the third observation in the second sample (56.9) is 4.7 greater than X̄ = 52.2. This total deviation can be broken down into two parts:

    (56.9 − 52.2) = (56.4 − 52.2) + (56.9 − 56.4)
    4.7 = 4.2 + .5

Most of this deviation (4.2) is explained by the machine, while very little (.5) is unexplained, due to random or chance fluctuations. Clearly (10-13) must always be true, since the two occurrences of X̄i on the right-hand side cancel. Square both sides of (10-13) and sum over all i and j:

    Σᵢ Σⱼ (Xij − X̄)² = Σᵢ Σⱼ [(X̄i − X̄) + (Xij − X̄i)]²    (10-14)

On the right side, the middle (cross product) term is

    2 Σᵢ (X̄i − X̄) Σⱼ (Xij − X̄i)    (10-15)

which must be zero, since Σⱼ (Xij − X̄i), the sum of deviations about the sample mean, is always zero. [The first factor (X̄i − X̄) is independent of j, and so may be moved outside the sum over j.] Substituting these two conclusions back into (10-14), we have:

    Σᵢ Σⱼ (Xij − X̄)² = n Σᵢ (X̄i − X̄)² + Σᵢ Σⱼ (Xij − X̄i)²    (10-16)
    Total variation = explained variation + unexplained variation
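The decomposition (10-16) is easy to confirm numerically. The sketch below checks, for the Table 10-1 data (with machine 2's last value taken as 57.6, as assumed earlier), that the total sum of squares equals the explained plus the unexplained sums of squares.

```python
data = [
    [48.4, 49.7, 48.7, 48.5, 47.7],
    [56.1, 56.3, 56.9, 55.1, 57.6],
    [52.1, 51.1, 51.6, 52.1, 51.1],
]
n = len(data[0])
row_means = [sum(row) / n for row in data]
grand = sum(x for row in data for x in row) / (n * len(data))

total = sum((x - grand) ** 2 for row in data for x in row)
explained = n * sum((m - grand) ** 2 for m in row_means)            # between rows
unexplained = sum((x - m) ** 2
                  for row, m in zip(data, row_means) for x in row)  # within rows
# total == explained + unexplained, as (10-16) asserts
```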

This suggests a possible means of strengthening the F test. Suppose that these three machines are sensitive to differences in temperature. Why not introduce temperature explicitly into the analysis? If some of the previously unexplained variation can now be explained by temperature, the denominator of (10-17) will be reduced. With the larger F value that results, we will have a more powerful test of the machines (i.e., we will be in a stronger position to reject H0). Thus the introduction of other explanations of variance will assist us in detecting whether one specific influence (machine) is important. This brings us to two-way ANOVA in Section 10-3.

*(d) Confidence Intervals

The difficulties with hypothesis tests cited in Chapter 9 hold true in ANOVA as well. It may not be too enlightening to ask whether population means differ: by increasing sample size enough, a difference can nearly always be established, even though it is too small to be of any practical or economic importance. Again, it may be more important to ask "by how much do the population means differ?" If we wanted to compare only two machines in Table 10-1, this would be an easy question to answer: just construct a confidence interval for (μ1 − μ2) using the t distribution:

(8-17) repeated
In (8-17), the variance s² was pooled from the two samples. However, it is more reasonable to use all the information available, and pool the variance from all three samples as in (10-6), with 4 + 4 + 4 = 12 degrees of freedom. Using the pooled variance s² = .547 (s = .74) from (10-6), the 95% confidence interval is

(μ1 − μ2) = (48.6 − 56.4) ± 2.179 (.74) √(2/5)
          = −7.8 ± 1.0
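The interval just computed can be checked in a few lines; the figures (sample means, pooled variance, and the t value for 12 d.f.) are those quoted in the text:

```python
from math import sqrt

# Pooled-variance t interval for (mu1 - mu2), using the text's figures:
# sample means 48.6 and 56.4 (n = 5 each), pooled s^2 = .547, t.025 = 2.179.
xbar1, xbar2, n = 48.6, 56.4, 5
s2_pooled, t_025 = .547, 2.179

half_width = t_025 * sqrt(s2_pooled * (1 / n + 1 / n))
print(round(xbar1 - xbar2, 1), "+/-", round(half_width, 1))   # -7.8 +/- 1.0
```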

Similar confidence intervals may be constructed for (μ1 − μ3) and for (μ2 − μ3) [or, in general, for a total of r(r − 1)/2 such intervals for r populations]; in our example, the three intervals are:

(10-19)
(a) (μ1 − μ2) = −7.8 ± 1.0
(b) (μ1 − μ3) = −3.0 ± 1.0
(c) (μ2 − μ3) = +4.8 ± 1.0

The results of this piece-by-piece approach are summarized in Table 10-7.

TABLE 10-7 Differences in Population Means (μi − μj), Estimated from Sample Means (X̄i − X̄j). 95% Level of Confidence in Each Interval Estimate

        2            3
1   −7.8 ± 1.0   −3.0 ± 1.0
2                +4.8 ± 1.0

*(e) Simultaneous Confidence Intervals: Multiple Comparisons

There is just one difficulty with this approach. Although we can be 95% confident of each individual statement [e.g., 10-19(a)], we must be less confident that the whole system of statements (10-19) is true: there are three ways in which this system could go wrong. If the three individual confidence statements were independent, the level of confidence for the whole system would be only (.95)³ = .857. In fact they are not independent; for example, they all involve the common pooled variance s². Although this dependence means that the confidence level of the whole system is not exactly .857, it will still be well below the 95% level of each individual statement. The problem is how to allow for this, in order to obtain the correct simultaneous confidence. In fact, the problem is usually stated the other way around: how much wider must the individual intervals in (10-19) be, in order to yield a 95% level of confidence that all are simultaneously true? Of the many solutions, we quote without proof the simplest, due to Scheffé:³ with 95% confidence, all the following statements⁴ are true:

(10-20)
(μ1 − μ2) = (X̄1 − X̄2) ± √((r − 1)F.05) sp √(2/n)
(μ1 − μ3) = (X̄1 − X̄3) ± √((r − 1)F.05) sp √(2/n)
(μ2 − μ3) = (X̄2 − X̄3) ± √((r − 1)F.05) sp √(2/n)

³ H. Scheffé, The Analysis of Variance, pp. 66-73. New York: John Wiley, 1959.
⁴ And some other statements as well, as we shall see in (10-26). In fact, if we were interested in only the three comparisons of means in (10-20), our interval estimates could be made slightly narrower.

where

F.05 = the critical value of F with (r − 1) and r(n − 1) d.f., leaving 5% in the upper tail
sp² = the pooled sample variance, as calculated in Table 10-6 or in (10-6)
r = the number of rows (means) to be compared
n = the sample size in each row

We note the similarity of statements (10-20) and (10-19); these interval estimates differ only in their width. The actual calculations of the simultaneous confidence intervals are:

(10-21)
(a) (μ1 − μ2) = (48.6 − 56.4) ± √(2(3.89)) (.74) √(2/5) = −7.8 ± 1.3
(b) (μ1 − μ3) = −3.0 ± 1.3
(c) (μ2 − μ3) = +4.8 ± 1.3

These are summarized in Table 10-8. As expected, the interval width is greater than in Table 10-7 (compare ± 1.3 versus ± 1.0). Indeed, it is this increased width (vagueness) that makes us 95% confident that all statements are true.

TABLE 10-8 Differences in Population Means (μi − μj), Estimated from Sample Means (X̄i − X̄j). 95% Level of Confidence in All Interval Estimates. (Compare with Table 10-7.)

        2            3
1   −7.8 ± 1.3   −3.0 ± 1.3
2                +4.8 ± 1.3

As a bonus, this theory can be used to make any number of comparisons of means, called "contrasts." A "contrast of means" is defined as a linear combination, or weighted sum, with weights that add to zero:

(10-21a)  Σ Ci μi

provided

(10-22)  Σ Ci = 0
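The Scheffé allowance of (10-21) and Table 10-8 can be checked numerically, using the figures quoted in the text:

```python
from math import sqrt

# Scheffe's simultaneous allowance (10-20) for all pairwise differences,
# with the text's figures: r = 3 rows, n = 5 per row, pooled s = .74,
# and F.05 = 3.89 with (r - 1, 12) d.f.
r, n, s_p, F_05 = 3, 5, .74, 3.89

allowance = sqrt((r - 1) * F_05) * s_p * sqrt(2 / n)
print(round(allowance, 1))   # 1.3, versus 1.0 for one t interval alone
```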

For example, the simplest contrast is the difference in means estimated in (10-21a):

(10-23)  μ1 − μ2 = (+1)μ1 + (−1)μ2 + (0)μ3

As another interesting example, consider the difference between μ1 and the average of μ2 and μ3:

(10-24)  μ1 − ½(μ2 + μ3) = (+1)μ1 + (−½)μ2 + (−½)μ3

There is no limit to the number of such contrasts. It will be no surprise that each contrast of the population means is estimated by the same contrast of the sample means, plus or minus an error allowance. For example, the contrast in (10-24) is estimated as

(10-25)  [μ1 − ½(μ2 + μ3)] = [X̄1 − ½(X̄2 + X̄3)] ± error allowance

The general statement, from which statements like (10-20) and (10-25) were derived, is: with 95% confidence, all contrasts of the population means are bracketed by the bounds:

(10-26)  Σ Ci μi = Σ Ci X̄i ± √((r − 1)F.05) sp √((Σ Ci²)/n)

provided only that Σ Ci = 0, to satisfy the definition of "contrast." As before, sp² is the pooled variance, and F.05 is the critical value of F.

When we examine (10-26) more carefully, we discover that it defines a whole set of statements, which includes not only the three statements in (10-20) but also statements like (10-25); indeed, an infinite number of interval statements can be constructed. The student may justifiably wonder: "How can we be 95% confident that an infinite number of statements are simultaneously true?" The answer is: because these statements are dependent. Thus, for example, once we have made the first two statements in (10-21), our intuition tells us that the third is likely to follow. Moreover, once these
three
statements
are made, intervals like (10-25) tend
to follow,
and can be added
with
little
damage
to our level of confidence. As the number
of statements or
contrasts grows and grows, each new statement
tends
to become simply a
restatement
of contrasts
already specified, and essentially no damage
is done
to our level of confidence.
Thus,
it can be mathematically confirmed that
the
entire
(infinite) set of contrasts in (10-26)
are all simultaneously
estimated at

a 95% level

of confidence.
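The general interval (10-26) can be evaluated for any contrast; a sketch for the contrast (10-24), using the chapter's figures (means 48.6, 56.4, 51.6; n = 5; sp = .74; F.05 = 3.89):

```python
from math import sqrt

# Interval (10-26) for an arbitrary contrast sum(Ci * mu_i), sketched with
# the text's one-way figures.
means = [48.6, 56.4, 51.6]
C = [1, -0.5, -0.5]                 # mu1 contrasted with the average of mu2, mu3
assert abs(sum(C)) < 1e-12          # the weights must add to zero, per (10-22)

r, n, s_p, F_05 = 3, 5, .74, 3.89
point = sum(c * m for c, m in zip(C, means))
allowance = sqrt((r - 1) * F_05) * s_p * sqrt(sum(c * c for c in C) / n)
print(round(point, 1), "+/-", round(allowance, 1))   # -5.4 +/- 1.1
```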

PROBLEMS

10-1 A sample of 4 workers was drawn at random from each of two different industries, with their average annual incomes (in $00) recorded as follows:

Industry 1: 66 63 65 62
Industry 2: 58 61 53 56

(a) Using first a t test (as in Chapter 8) and then an ANOVA F test, calculate whether or not there is a statistically significant difference in income at the 5% level.
(b) Are the t and F tests exactly equivalent? Can you see why the t² distribution is often referred to as the F distribution with 1 degree of freedom in the numerator?
*(c) Using first the t distribution in (8-17), and then the F distribution in (10-20), construct a 95% confidence interval for the difference in income in the two industries.

10-2 Twelve plots of land are randomly divided into 3 groups. The first is held as a control group, while fertilizers A and B are applied to the other 2 groups. Yield is observed to be:

Control C:    60 64 65 55
Fertilizer A: 75 70 66 69
Fertilizer B: 74 78 72 68

(a) At a 5% significance level, does fertilizer affect yield?
*(b) Construct a table of differences in means, similar to Table 10-8, starring the differences that are statistically significant.
*(c) Can you be 95% confident that the two fertilizers have a different effect?
*(d) What is the difference between a contrast of means and a weighted average of means?

10-3 The annual incomes (Y) of a sample of 4 women and 4 men in a certain occupation were observed to be:

Women: 56 50 54 48
Men:   70 62 48 60

(a) At a 5% level of significance, can you reject the null hypothesis that the mean income is the same for men and women?
*(b) Construct a 95% confidence interval for the difference in the two means.
Since this problem is important later in Chapter 13, we state its solution. The sample means are Ȳ1 = 52, Ȳ2 = 60, and Ȳ = 56.

(a) ANOVA Table

Source of Variation    Variation    d.f.    Variance
Between sexes             128         1        128
Residual                  288         6         48
Total                     416         7

F = 128/48 = 2.67, which is less than the critical value of 5.99; thus the difference is not statistically significant.

*(b) Evaluate the first equation in (10-20); or, more simply, use (8-17), noting that t.025 = √F.05:

(μ1 − μ2) = (52 − 60) ± 2.45 √(2(48)/4)
          = −8 ± 12

This also confirms the answer in (a); since this interval includes zero,
this is not statistically significant.
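The ANOVA table in this solution can be verified directly from the income data; a sketch:

```python
# Recompute the Problem 10-3 ANOVA table from the income data
# (group means 52 and 60, grand mean 56).
groups = [[56, 50, 54, 48], [70, 62, 48, 60]]
means = [sum(g) / len(g) for g in groups]
grand = sum(sum(g) for g in groups) / sum(len(g) for g in groups)

between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)

F = (between / 1) / (within / 6)
print(between, within, round(F, 2))   # 128.0 288.0 2.67
```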

*10-4 Referring to the machine example of Table 10-1 and the ANOVA Table 10-6(b), use equation (10-26) to solve the following problem. Suppose one factory is to be outfitted entirely with machines of the first type, and a second factory is to be outfitted with machines of the second and third types, in the proportions 30% and 70%. Find a 95% confidence interval for the difference in the mean production of the 2 factories.

10-5 From each of three large classes, 50 students were sampled, with the following results:

Class    Average Grade X̄    Standard Deviation s
A              68                  11
B              73                  12
C              70                   8

Test whether the classes are equally good, at a 5% significance level.

10-3 TWO-FACTOR ANALYSIS OF VARIANCE

(a) The ANOVA Table

We have already seen that the F test (10-17) on the differences in machines would be strengthened if the unexplained variance in the denominator could be reduced. We suggested, for example, that some of the unexplained variance might be due to temperature, or to the human factor; if so, this might be taken into account. Suppose that the sample outputs given in Table 10-4 were produced by five different machinists, with each machinist producing one of the sample values on each machine. This data, reorganized according to a two-way classification (by machine and operator), is shown in Table 10-9.

It is necessary to complicate our notation somewhat. We are now interested in the average of each operator (X̄.j, each column average) as well as the average of each machine (X̄i., each row average*). Now the picture is clarified; some operators are efficient (the first and fourth), some are not. The machines are not that erratic after all; there is just a wide difference in the efficiency of the operators. If we can explicitly adjust for this, it will reduce our unexplained (or chance) variation in the denominator of (10-17); since the numerator will remain unchanged, the F ratio will be larger as a consequence, perhaps allowing us to reject H0. To sum up, it appears that another influence (difference in operators) was responsible for a lot of extraneous noise in our simple one-way analysis; by removing this noise, we hope to get a much more powerful test of the machines.

* The dot indicates the subscript over which summation occurs. For example, the dot in X̄i. suppresses the subscript j: X̄i. = (1/5) Σj Xij.

TABLE 10-9 Samples of Production (Xij) of Three Different Machines (as given in Table 10-4, but now rearranged according to machine operator)

                        Operator j
Machine i       1      2      3      4      5     Average X̄i.
1             56.7   45.7   48.3   54.6   37.7       48.6
2             64.5   53.4   54.3   57.5   52.3       56.4
3             56.7   50.6   49.5   56.5   44.7       51.6
Average X̄.j  59.3   49.9   50.7   56.2   44.9     X̄ = 52.2
The analysis is an extension of the one-factor ANOVA of the previous section; our test is summarized in Table 10-10. Of course, the total variation at the bottom of column 2 is the same as in the one-factor case; the small letter c represents the number of columns in Table 10-9, and replaces n. As before, the component sources of variation shown in column 2 add up to this total variation; i.e.,

(10-27)

Total variation = machine (row) variation + operator (column) variation + random variation

Σi Σj (Xij − X̄)² = c Σi (X̄i. − X̄)² + r Σj (X̄.j − X̄)² + Σi Σj (Xij − X̄i. − X̄.j + X̄)²

Machine (row) variation is the variation exhibited by the row means; operator (column) variation is defined, in a parallel way, as the variation exhibited by the column means. We note that (10-27) is established by manipulations like those used to establish (10-16) in the simpler one-factor case. (The last term is the random variation; it will be interpreted below.)

(b) Testing Hypotheses

The null hypothesis that there is no difference in the machine (row) population means is tested by constructing the variance ratio

(10-28)  F = variance explained by machines / unexplained variance = MSSr / MSSu

which, if H0 is true, has an F distribution. Thus, if the observed F value exceeds the critical F value, we reject the null hypothesis, concluding that there is a significant difference in machines, with the extraneous influence of operators taken into account. Our calculations are shown in full in Table 10-11, whence (10-28) is

(10-29)  F = 77.4/5.9 = 13.1

Since this exceeds the critical F value of 4.46 (with 2 and 8 d.f.), we reject the null hypothesis at the 5% level of significance.

TABLE 10-11 Two-Way ANOVA, for Observations Given in Table 10-9

(1) Source          (2) Variation (SS)   (3) d.f.   (4) Variance (MSS)   (5) F   (6) Critical F
Between machines         154.8               2             77.4           13.1        4.46
Between operators        381.6               4             95.4           16.2        3.84
Residual variation        47.3               8              5.9
Total                    583.7              14
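The sums of squares in Table 10-11 can be recomputed from the Table 10-9 production figures (cell layout inferred from the row and column averages quoted in the text); the results agree with the table up to its rounding:

```python
# Two-way decomposition (10-27) for the Table 10-9 data.
X = [
    [56.7, 45.7, 48.3, 54.6, 37.7],   # machine 1
    [64.5, 53.4, 54.3, 57.5, 52.3],   # machine 2
    [56.7, 50.6, 49.5, 56.5, 44.7],   # machine 3
]
r, c = len(X), len(X[0])
grand = sum(map(sum, X)) / (r * c)                                  # 52.2
row_means = [sum(row) / c for row in X]
col_means = [sum(X[i][j] for i in range(r)) / r for j in range(c)]

ss_rows = c * sum((m - grand) ** 2 for m in row_means)
ss_cols = r * sum((m - grand) ** 2 for m in col_means)
ss_resid = sum((X[i][j] - row_means[i] - col_means[j] + grand) ** 2
               for i in range(r) for j in range(c))
ss_total = sum((x - grand) ** 2 for row in X for x in row)

print(round(ss_rows, 1), round(ss_cols, 1), round(ss_resid, 1), round(ss_total, 1))
# approximately 154.8, 381.7, 47.3, 583.8 -- matching Table 10-11 up to rounding
```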

Had the critical value exceeded our observed F, we could not have rejected the null hypothesis that the machines are similar. Compare this with our one-factor test of the same null hypothesis: the numerator remains unchanged, but the chance variation in the denominator is much smaller, since the effect of differing operators has been netted out. This has given us greater statistical leverage, allowing rejection of the null hypothesis.

Similarly, we might test the null hypothesis that the operators perform equally well. Once again we use the ratio of an explained to an unexplained variance; but this time, of course, the numerator is the variance explained by operator differences. Thus, from column (5) of Table 10-11,

(10-30)  F = MSSc/MSSu = 95.4/5.9 = 16.2

Since our observed F value of 16.2 exceeds the critical F value⁸ of 3.84, we reject the null hypothesis, concluding that machinists do differ. In this case the "machine" noise has been isolated; as a consequence we have a strong test of how operators compare.

There is one issue that was passed over quickly that requires clarification. In our one-factor test, we calculated unexplained variation by looking at the spread of observed values within a whole row in Table 10-4. But in the two-way test (Table 10-9) we have split the observations columnwise as well as rowwise; this has left us with only a single observation in each category, or cell, e.g., within each machine-operator cell. Nevertheless, we have a stronger test, because we have gained more by reducing unexplained variance than we have lost because our degrees of freedom in the denominator have been reduced by 4. (The student will observe that if we are short of degrees of freedom, i.e., if we are near the top of F Table VII, loss of degrees of freedom may be serious.)

⁸ Different than in the previous test, since degrees of freedom are now 4 and 8.

Strictly speaking, unexplained variation can no longer be computed within each cell. Thus, for example, there is only one observation (57.5) of how much output is produced by operator 4 on machine 2. With a single observation in each cell, what should we do?

We ask, "If there were no random error, how would we predict the output of operator 4 on machine 2?" We note, informally, that this is a better-than-average machine (X̄2. = 56.4) and a relatively efficient operator (X̄.4 = 56.2). On both counts we would predict output to be above average. This strategy can easily be formalized⁹ to predict X̂2,4. We can do this for each cell, with the random element estimated as the difference in our observed value (Xij) and the corresponding predicted value (X̂ij). This yields a whole set of random elements, whose sum of squares is precisely the unexplained variation SSu [the last term in equation (10-27), also appearing in column 2 of Table 10-10]; divided by d.f., this becomes the unexplained variance used in the denominator of both tests in this section.

⁹ The predicted value X̂ij is defined as

(10-31)  X̂ij = X̄ + (X̄i. − X̄) + (X̄.j − X̄)

i.e., predicted performance = average performance + adjustment reflecting machine performance + adjustment reflecting operator performance. Specifically, in our example

(10-32)  X̂2,4 = 52.2 + (56.4 − 52.2) + (56.2 − 52.2) = 52.2 + 4.2 + 4.0 = 60.4

Thus, our prediction of the performance of operator 4 on machine 2 is calculated by adjusting the average performance (52.2) by the degree to which this machine is above average (4.2) and the degree to which this operator is above average (4.0). Cancelling the X̄ values in (10-31), the predicted value becomes

(10-33)  X̂ij = X̄i. + X̄.j − X̄

and the random element, being the difference between the observed and expected values, is

(10-34)  Xij − X̂ij = Xij − X̄i. − X̄.j + X̄

In our example, the random element left after adjustment is

(10-35)  X2,4 − X̂2,4 = 57.5 − 60.4 = −2.9

Thus, this observed output is 2.9 units below what we expected, and must be left unexplained: it is the result of random influences. The sum of squares of all such random elements is recognized to be the unexplained variation (SSu), the last term in (10-27).

One final warning: the two-way analysis of variance developed in this section is based on the assumption that there is no interaction between the two factors, as would occur, for example, if certain operators like some machines and dislike others. Such interaction would require a more complex model, and more sample observations.
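The footnote's prediction formula (10-33) and residual (10-34) can be applied to every cell of Table 10-9 (cell layout inferred from the marginal averages quoted in the text); the residual sum of squares then matches the unexplained SS of Table 10-11 up to rounding:

```python
# Fitted value (10-33) and residual (10-34) for each cell of Table 10-9,
# reproducing the worked example for operator 4 on machine 2.
X = [
    [56.7, 45.7, 48.3, 54.6, 37.7],
    [64.5, 53.4, 54.3, 57.5, 52.3],
    [56.7, 50.6, 49.5, 56.5, 44.7],
]
r, c = len(X), len(X[0])
grand = sum(map(sum, X)) / (r * c)                           # 52.2
row_means = [sum(row) / c for row in X]
col_means = [sum(X[i][j] for i in range(r)) / r for j in range(c)]

fitted = lambda i, j: row_means[i] + col_means[j] - grand    # (10-33)
resid = [[X[i][j] - fitted(i, j) for j in range(c)] for i in range(r)]

print(round(fitted(1, 3), 1), round(resid[1][3], 1))         # 60.4 -2.9
ss_u = sum(e * e for row in resid for e in row)
print(round(ss_u, 1))   # ~47.3, the residual SS of Table 10-11
```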

*(c) Multiple Comparisons

Turning from hypothesis tests to confidence intervals, we may write a statement for two-factor ANOVA which is quite similar to (10-26): with 95% confidence, all contrasts of the row means fall within the bounds:

(10-36)  Σ Ci μi = Σ Ci X̄i. ± √((r − 1)F.05) sv √((Σ Ci²)/c)

where

F.05 = the critical value of F, with (r − 1) and (r − 1)(c − 1) d.f.
sv = √MSSu, as calculated in Table 10-10, column 4
r = number of rows
c = number of columns

Note that (10-36) differs from (10-26) because the unexplained variance sv² is now smaller, making the confidence interval more precise.

As an example, consider the machines of Table 10-9, analyzed in ANOVA Table 10-11. With 95% confidence, all the following statements are true:

(10-37)
μ1 − μ2 = (48.6 − 56.4) ± √(2(4.46)) √5.9 √(2/5) = −7.8 ± 4.5*
μ1 − μ3 = −3.0 ± 4.5
μ2 − μ3 = +4.8 ± 4.5*

and all other possible contrasts. [Intervals that do not overlap zero are starred to indicate statistical significance: in these cases H0 (no difference in means) may be rejected. This is another illustration of how confidence intervals may be used to test hypotheses.]

Of course, we could construct confidence intervals for contrasts in the column means equally well, by simply interchanging r and c in equation (10-36). As an example, how do the operators of Table 10-9 compare, when analyzed in ANOVA Table 10-11? With 95% confidence, all the following statements are true:

(10-38)
μ.1 − μ.2 = (59.3 − 49.9) ± √(4(3.84)) √5.9 √(2/3) = 9.4 ± 7.8*
μ.1 − μ.3 = 8.6 ± 7.8*
μ.1 − μ.4 = 3.1 ± 7.8
μ.1 − μ.5 = 14.4 ± 7.8*

and all other possible contrasts. For example, if workers 1, 3, and 4 are men, and workers 2 and 5 are women, the difference in the average performance of men and women is the contrast

⅓(μ.1 + μ.3 + μ.4) − ½(μ.2 + μ.5) = (55.4 − 47.4) ± 5.5 √(5/6) = 8.0 ± 5.0*

This last contrast might be of interest; thus the average difference in men and women has been estimated, as a bonus.

The first part of (10-38), all the differences in operator means, can be presented more concisely in the form of Table 10-12.

TABLE 10-12 Differences in Operator Population Means (μ.i − μ.j), Estimated from Sample Means (X̄.i − X̄.j). [To construct 95% simultaneous confidence intervals, take each value ± 7.8. Statistically significant differences are starred.]

       2       3       4       5
1    9.4*    8.6*    3.1    14.4*
2           −0.8   −6.3      5.0
3                  −5.5      5.8
4                           11.3*
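The operator comparisons of (10-38) and Table 10-12 can be reproduced from the figures of Table 10-11; a sketch:

```python
from math import sqrt

# Scheffe allowance for operator (column) contrasts, using Table 10-11's
# figures: MSSu = 5.9 on 8 d.f., F.05(4, 8) = 3.84, r = 3 rows, c = 5 columns.
col_means = [59.3, 49.9, 50.7, 56.2, 44.9]
r, c, mss_u, F_05 = 3, 5, 5.9, 3.84

allowance = sqrt((c - 1) * F_05) * sqrt(mss_u) * sqrt(2 / r)
print(round(allowance, 1))   # 7.8

# The differences of Table 10-12, starred when they exceed the allowance:
for i in range(c):
    for j in range(i + 1, c):
        d = col_means[i] - col_means[j]
        print(i + 1, j + 1, round(d, 1), "*" if abs(d) > allowance else "")
```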

PROBLEMS

10-6 To refine the experimental design of Problem 10-2, suppose the twelve plots of land are on 4 farms (3 plots on each). Moreover, you suspect that there may be a difference in fertility between farms. You now retabulate the data in Problem 10-2, according to fertilizer and farm, as follows:

Fertilizer \ Farm    1    2    3    4
C                   60   64   65   55
A                   75   70   66   69
B                   74   78   72   68

(a) Reanalyze whether or not the fertilizers differ, at the 5% significance level.
(b) Is there, after all, a difference in fertility in the four farms? (Use a 5% significance level.)
*(c) Construct a table of differences in means, similar to Table 10-12, starring the differences that are statistically significant; also construct a table of simultaneous 95% confidence intervals.

10-7 Three men work on an identical task of packing boxes. The number of boxes packed by each in certain hours is shown in the table below.

Man \ Hour    11-12 A.M.    1-2 P.M.    4-5 P.M.
1                 22           22          18
2                 25           16          17
3                 21           18          21

(a) Test whether each factor (man, hour) is statistically significant at the 5% level.
*(b) For the factors which are statistically significant, construct a table of simultaneous 95% confidence intervals.

10-8 Five children were tested before and after a certain television program, with the following pulse rate results:

Child       1     2     3     4     5
Before     96   102   108    89    85
After     104   112   112    93    89

(a) Test whether pulse rate changes, at the 5% significance level.
*(b) Construct a 95% confidence interval for the change in the pulse rate of the population of all children.

10-9 Rework Problem 10-8 using the following technique (matched t-test). First, tabulate the sample of differences D in pulse rate:

Before (X)            96   102   108    89    85
After (Y)            104   112   112    93    89
Difference D = (Y − X)  +8   +10    +4    +4    +4

The sample of D's fluctuates around the true difference Δ. Now apply equation (8-15) to estimate Δ.
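A sketch of the matched t-test that Problem 10-9 asks for, working with the differences D; t.025 = 2.776 with n − 1 = 4 d.f. is a standard table value (the book's equation (8-15) is not reproduced in this excerpt):

```python
from math import sqrt

# Matched t interval for the mean pulse-rate change.
D = [8, 10, 4, 4, 4]
n = len(D)
d_bar = sum(D) / n                                # 6.0
s2 = sum((d - d_bar) ** 2 for d in D) / (n - 1)   # 8.0
half_width = 2.776 * sqrt(s2 / n)
print(d_bar, "+/-", round(half_width, 1))         # 6.0 +/- 3.5
```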

chapter 11

Introduction to Regression

Our first example of statistical inference (in Chapter 7) was estimating the mean of a single population. This was followed (Chapter 8) by a comparison of two population means. Finally (Chapter 10) r population means were compared, using analysis of variance. We now consider the question "Can the analysis be improved upon if the r populations do not fall in unordered categories, but are ranked numerically?"

For example, it is easy to see how the analysis of variance could be used to examine whether wheat yield depended on 7 different kinds of fertilizer.¹ Now we wish to consider whether yield depends on 7 different amounts of fertilizer; in this case, fertilizer application is defined in a numerical scale.

If yield (Y) at various fertilizer applications (X) is plotted, a scatter similar to Figure 11-1 might be observed. From this scatter it is clear

FIG. 11-1 Observed relation of wheat yield (bu/acre) to fertilizer application (lb/acre). [scatter diagram]

¹ By extending Problem 10-2.

that the amount of fertilizer affects yield. Moreover, it should be possible to define how; i.e., to describe the dependence of Y on X. Estimating this dependence geometrically amounts to fitting a curve through this scatter. This regression of Y on X is a simple mathematical model, useful as a brief and precise description, or as a means of predicting the yield Y for a given quantity of fertilizer X. As another example, regression is the most useful of all statistical techniques; in economics it provides a means of defining how the quantity of a good demanded depends on its price, or how consumption depends on income.

Furthermore, it is possible that Y is related to X in a nonlinear way, and the characteristics of the fitted line (e.g., its slope) may be subjected to statistical tests of significance; but these issues are deferred to Chapter 12, and are not dealt with here. Instead we assume that the appropriate description is a straight line, and this chapter is devoted exclusively to how such a straight line may best be fitted.

11-1 AN EXAMPLE

Since wheat yield depends on fertilizer, it is referred to as the "dependent" variable Y; since fertilizer application is not dependent on yield, but instead is determined by the experimenter, it is referred to as the "independent" variable X. Suppose funds are available for only seven experimental observations, so that the experimenter sets X at seven different values, taking only one observation of Y in each case, as shown in Figure 11-2 and Table 11-1.

FIG. 11-2 Observed wheat yields at various levels of fertilizer application (lb/acre).

222 INTRODUCTION TO REGRESSION

TABLE 11-1 Experimental Data Relating Yield of Wheat to the Amount of Applied Fertilizer, as in Figure 11-2

Fertilizer (lb/acre)    Yield (bu/acre)
100                          40
200                          45
300                          50
400                          65
500                          70
600                          70
700                          80

We first of all note that if the points were exactly in a line, as in Figure 11-3a, then the fitted line could be drawn in with a ruler "by eye" perfectly accurately. Even if the points were nearly in a line, as in Figure 11-3b, fitting by eye would be reasonably satisfactory. But in the highly scattered case, as in Figure 11-3c, fitting by eye is too subjective and too inaccurate. Furthermore, fitting by eye requires plotting all the points first.

FIG. 11-3 Various degrees of scatter.
If there were 100 observations, this would be very tedious, and an algebraic technique which an electronic computer could solve would be preferable. The following sections set forth various algebraic methods for fitting a line, successively more sophisticated and satisfactory.

11-2 POSSIBLE CRITERIA FOR FITTING A LINE

It is time to ask more precisely "What is a good fit?" The answer surely is "a fit that makes the total error small." One typical error is shown in Figure 11-4: it is defined as the vertical distance from the observed Yi to the fitted value Ŷi on the line, that is, (Yi − Ŷi), where Ŷi is the ordinate of the line at Xi. We note that the error is positive when the observed Yi is above the line, and negative when the observed Yi is below the line.

FIG. 11-4 Fitting points with a line; the error in fitting one point is (Yi − Ŷi).

1. As our first tentative criterion, consider the fitted line which minimizes the total of all these errors:

(11-1)  Σ (Yi − Ŷi)

But this criterion works badly. Using this criterion, the two lines shown in Figure 11-5 fit the observations equally well, even though the fit in Figure 11-5a is intuitively a good one, and the fit in Figure 11-5b is a very bad one. The problem is one of sign; in both cases positive errors just offset negative errors, leaving their sum equal to zero. This criterion must be rejected, since it provides no distinction between bad fits and good ones.

FIG. 11-5 Two lines that satisfy criterion (11-1) equally well.

2. There are two ways of overcoming the sign problem. The first is to minimize the sum of the absolute values of the errors:

(11-2)  Σ |Yi − Ŷi|

Since large positive errors are not allowed to offset large negative errors, this criterion would rule out bad fits like the one in Figure 11-5b. However, it still has a drawback. It is evident from Figure 11-6 that the fit in part b satisfies this criterion better than the fit in part a (total absolute error 3, rather than 4). In fact, the reader can satisfy himself that the line in part b joining the two end points satisfies this criterion better than any other line. But it is not a good common-sense solution to the problem, because it pays no attention whatever to the middle point. The fit in part a is preferable because it takes account of all points.

FIG. 11-6 Two fits compared by the absolute-error criterion (11-2).
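The behavior of criteria (11-1) and (11-2) can be illustrated numerically. The three points and two lines below are hypothetical stand-ins for Figure 11-6, not the book's own figures:

```python
# Why summed errors (11-1) and summed absolute errors (11-2) mislead:
# three hypothetical points, a "common-sense" flat line (a) and the
# line (b) joining the two end points.
points = [(0, 1), (1, 4), (2, 1)]

line_a = lambda x: 2.0   # passes through the middle of the scatter
line_b = lambda x: 1.0   # joins the two end points, ignores the middle

def errors(line):
    return [y - line(x) for x, y in points]

for name, line in [("a", line_a), ("b", line_b)]:
    e = errors(line)
    print(name, sum(abs(v) for v in e), sum(v * v for v in e))
# Criterion (11-2) prefers line b (total |error| 3 versus 4) even though it
# ignores the middle point; squared errors prefer line a (6 versus 9).
```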

3. As a second way to overcome the sign problem, we finally propose to minimize the sum of the squares of the errors:

(11-3)  Σ (Yi − Ŷi)²

This is the famous "least squares" criterion; its justifications include the following.

(a) Squaring overcomes the sign problem by making all errors positive.
(b) Squaring emphasizes the large errors, and in trying to satisfy this criterion large errors are avoided if at all possible. Hence all points are taken into account, and the fit in Figure 11-6a is selected by this criterion in preference to Figure 11-6b.
(c) The algebra of least squares is very manageable.
(d) There are two important theoretical justifications for least squares, developed in the next chapter.

11-3 THE LEAST SQUARES SOLUTION

Our scatter of observed X and Y values from Table 11-1 is graphed in Figure 11-7. Our objective is to fit a line

(11-4)  Y = a₀ + bX

This involves three steps.

Step 1. Translate X into deviations from its mean; i.e., define a new variable x:

(11-5)  x = X − X̄
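Step 1 can be checked numerically on the Table 11-1 fertilizer values; a minimal sketch:

```python
# The translation (11-5) applied to the X values of Table 11-1; the new
# x deviations sum to zero, as (11-6) below asserts.
X = [100, 200, 300, 400, 500, 600, 700]
X_bar = sum(X) / len(X)            # 400.0
x = [Xi - X_bar for Xi in X]
print(x, sum(x))                   # deviations -300.0 ... 300.0 sum to 0.0
```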

In Figure 11-7b we show how this involves a geometric translation of the axis, a procedure similar to the one developed in Section 5-3, where both axes were translated to study covariance. The new x value becomes positive or negative depending on whether X was above or below X̄. There is no change in the Y values. The translated intercept a differs from the original a₀, but the slope b remains the same.

One of the advantages of measuring X as deviations from the central value X̄ is that we can more explicitly ask the question "How is Y affected when X is unusually large, or unusually small?" In addition, the mathematics will be simplified because the sum of the new x values equals zero.
Proof: Noting that the mean X̄ is defined as (Σ Xᵢ)/n, it follows that Σ Xᵢ = nX̄, and

Σ xᵢ = Σ (Xᵢ − X̄) = nX̄ − nX̄ = 0    (11-6) proved
226 INTRODUCTION TO REGRESSION
FIG. 11-7 Translation of axis. (a) Regression, using original variables. (b) Regression, translating X.

Step 2. Fit the line

y = a + bx    (11-7)

to the translated scatter in Figure 11-7b; i.e., select the values for a and b that satisfy the least squares criterion, those values that minimize

Σ (Yᵢ − Ŷᵢ)²    (11-8)

Since the fitted value Ŷᵢ is on our estimated line (11-7),

Ŷᵢ = a + bxᵢ    (11-9)

When this is substituted into (11-8), the problem becomes one of selecting a and b to minimize the sum of squares

S(a, b) = Σ (Yᵢ − a − bxᵢ)²    (11-10)
The notation S(a, b) is used to emphasize that this expression depends on a and b. As a and b vary (i.e., as various lines are tried), S(a, b) will vary too, and we ask at what value of a and b it will be a minimum. This will be the optimum (least squares) line. The simplest minimization technique is calculus, and that is used in the next paragraph. [Readers without calculus can minimize (11-10) with some of the algebra of Appendix 11-1, and rejoin us where the resulting estimates are given below.]

Minimizing S(a, b) requires setting its partial derivatives with respect to a and b equal to zero. In the first instance, setting the partial derivative with respect to a equal to zero:

Σ 2(Yᵢ − a − bxᵢ)(−1) = 0

Dividing through by −2, and rearranging:

Σ Yᵢ − na − b Σ xᵢ = 0

Noting (11-6), that Σ xᵢ = 0, we can solve for a:

a = (Σ Yᵢ)/n = Ȳ    (11-13)
Thus our least squares estimate of a is simply the average value of Y; referring to Figure 11-7, we see that this ensures that our fitted regression line must pass through the point (X̄, Ȳ), which may be interpreted as the center of gravity of the sample of n points.

It is also necessary to set the partial derivative of (11-10) with respect to b equal to zero:

Σ 2(−xᵢ)(Yᵢ − a − bxᵢ) = 0    (11-14)

Rearranging:

Σ xᵢYᵢ − a Σ xᵢ − b Σ xᵢ² = 0    (11-15)

Noting that Σ xᵢ = 0, we can solve for b:

b = (Σ xᵢYᵢ)/(Σ xᵢ²)    (11-16)
Our results³ in (11-13) and (11-16) are important enough to restate as:

Theorem. With the x values measured as deviations from their mean, the least squares values of a and b are

a = Ȳ    (11-13)
b = (Σ xᵢYᵢ)/(Σ xᵢ²)    (11-16)
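As a quick check of this theorem, here is a minimal Python sketch applying (11-13) and (11-16). The X, Y values are a reconstruction of the Table 11-1 fertilizer data, chosen to reproduce the statistics quoted in this text (X̄ = 400, Ȳ = 60, Σ xᵢ² = 280,000, b ≈ .068); they are an assumption, not a quotation of the original table.

```python
# Least squares fit by the theorem: a = Y-bar (11-13), b = sum(x*Y)/sum(x^2) (11-16),
# with x measured as deviations from the mean.
X = [100, 200, 300, 400, 500, 600, 700]   # lb of fertilizer per acre (assumed)
Y = [40, 45, 50, 65, 70, 70, 80]          # bu of wheat per acre (assumed)

n = len(X)
X_bar = sum(X) / n
x = [Xi - X_bar for Xi in X]              # Step 1: translate X into deviations

a = sum(Y) / n                                                       # (11-13)
b = sum(xi * Yi for xi, Yi in zip(x, Y)) / sum(xi * xi for xi in x)  # (11-16)
print(a, round(b, 3))   # 60.0 0.068
```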

For the example problem, a and b are calculated in the first five columns of Table 11-2 (the last three columns may be ignored until the next chapter). It follows that the least squares equation is:

Ŷ = 60 + .068x    (11-17)

This fitted line is graphed in Figure 11-7b.

Step 3. If desired, this regression can now be retranslated back into our original frame of reference in Figure 11-7a. Express (11-17) in terms of the original X values:
Y = 60 + .068(X − X̄)
  = 60 + .068(X − 400)
  = 60 + .068X − 27.2

Y = 32.8 + .068X    (11-18)

This fitted line is graphed in Figure 11-7a. A comparison of (11-17) and (11-18) confirms that the slope of our fitted regression (b = .068) remains the same; the only difference is in the intercept. Moreover, the original intercept (a₀ = 32.8) is now easily recovered.

An application of the least squares equation (11-18) is now easily made. For example, if 350 lb of fertilizer is applied, our best estimate of yield is

Y = 32.8 + .068(350) = 56.6 bushels/acre

Alternatively, the least squares equation (11-17), derived from the translated frame of reference, yields exactly the same result: when X = 350, then x = −50, and

Y = 60 + .068(−50) = 56.6

³ To be perfectly rigorous, we could have shown that when the partial derivatives are set to zero, we actually do have a minimum sum of squares, rather than a maximum, saddle point, or local minimum.
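Step 3 amounts to substituting x = X − X̄. A short sketch (same reconstructed Table 11-1 data as above, an assumption) recovers the original intercept a₀ = a − bX̄ and checks that both frames give the same prediction; the text's 32.8 comes from first rounding b to .068.

```python
# Retranslate Y-hat = a + b*x back to the original origin: Y-hat = a0 + b*X,
# where a0 = a - b*X_bar, as in (11-18).
X = [100, 200, 300, 400, 500, 600, 700]   # reconstructed Table 11-1 values
Y = [40, 45, 50, 65, 70, 70, 80]

n = len(X)
X_bar = sum(X) / n
a = sum(Y) / n
x = [Xi - X_bar for Xi in X]
b = sum(xi * Yi for xi, Yi in zip(x, Y)) / sum(xi * xi for xi in x)

a0 = a - b * X_bar                   # original intercept (about 32.9 unrounded)
pred_new = a0 + b * 350              # predict via (11-18)
pred_old = a + b * (350 - X_bar)     # predict via (11-17)
print(round(a0, 1), round(pred_new, 1), round(pred_old, 1))   # 32.9 56.6 56.6
```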

PROBLEMS

11-1 (Save your work in this and the next three chapters, for future reference.) Suppose a random sample of 5 families had the following income and savings:

Family    Income Y    Savings S
1         $8,000      $600
2         11,000      1200
3         9,000       1000
4         6,000       700
5         6,000       300

(a) Estimate and graph the regression line of savings S on income Y.
(b) Interpret the intercepts a and a₀.

11-2 Use the data of Problem 11-1 to regress consumption C on income Y. (Economists define consumption C = Y − S.)
11-3 To interpret the slope b, use the regression equation (11-18) to answer the following questions.

(a) About how much is the yield increased for every pound of fertilizer applied?
(b) If wheat were worth 82¢ a bushel and fertilizer cost $.25 per pound, would it be economical to apply fertilizer?
(c) To approximately what price would fertilizer have to drop to make it economical to apply?

[The answer to (a) is simply the slope b. Economists refer to it as the "marginal" effect of fertilizer x on yield Y.]
11-4 If we translated both X and Y into deviations x and y (just as X was translated in Figure 11-7b), then:

(a) What would the new y-intercept be? Would the slope remain the same? Does not this imply that the fitted regression equation is simply

ŷ = bx

(b) Prove that Σ xᵢyᵢ = Σ xᵢYᵢ; hence we may alternatively write b in terms of deviations as

b = (Σ xᵢyᵢ)/(Σ xᵢ²)

11-5 (Requires calculus.) Suppose X is left in its original form, rather than being translated into x (deviations from the mean).
(a) Write out the sum of squared deviations as in (11-10), in terms of a₀ and b.
(b) Set equal to zero the partial derivatives with respect to a₀ and b, thus obtaining the two so-called "normal" equations.
(c) Evaluate these two normal equations using the data in Problem 11-1, and solve for a₀ and b. Do you get the same answer?
(d) Compare the two alternative methods of solution.

11-6 Suppose four firms had the following profits and research expenditures:

Firm    Profit P (thousands of dollars)    Research Expenditure R (thousands of dollars)
1       50                                 40
2       40                                 30
3       40                                 50
4       60                                 50

(a) Fit a regression line of P on R.
(b) Does this regression line "show how research generates profits"? Criticize.

APPENDIX 11-1

AN ALTERNATIVE DERIVATION OF THE LEAST SQUARES ESTIMATES OF a AND b, WITHOUT CALCULUS

Before estimating a and b, it is necessary to solve the theoretical problem of minimizing an ordinary quadratic function of one variable b, of the form

f(b) = k₂b² + k₁b + k₀    (11-19)

where k₀, k₁, k₂ are constants, with k₂ > 0. With a little algebraic manipulation, (11-19) may be written as

f(b) = k₂[b + k₁/(2k₂)]² + [k₀ − k₁²/(4k₂)]    (11-20)

Note that b appears in the first term, but not in the second.
Therefore our hope of minimizing the expression lies in selecting a value of b to minimize the first term. The first term, being a square and hence never negative, will be minimized when it is zero, that is, when

b + k₁/(2k₂) = 0    (11-21)

then

b = −k₁/(2k₂)    (11-22)
This result is shown graphically in Figure 11-8.

FIG. 11-8 The minimization of a quadratic function.

To restate: a quadratic function of the form (11-19) is minimized by setting

b = −(coefficient of first power) / 2(coefficient of second power)    (11-23)

With this theorem in hand, let us return to the problem of selecting values for a and b to minimize the quadratic

S(a, b) = Σ [(Yᵢ − a) − bxᵢ]²    (11-24)

It will be useful to manipulate this, as follows:

S(a, b) = Σ [(Yᵢ − a)² − 2b(Yᵢ − a)xᵢ + b²xᵢ²]    (11-25)
        = Σ (Yᵢ − a)² − 2b Σ (Yᵢ − a)xᵢ + b² Σ xᵢ²    (11-26)

In the middle term, consider

Σ (Yᵢ − a)xᵢ = Σ Yᵢxᵢ − a Σ xᵢ = Σ Yᵢxᵢ

since Σ xᵢ = 0 by (11-6). Using this to rewrite the middle term of (11-26), we have

S(a, b) = Σ (Yᵢ − a)² − 2b Σ Yᵢxᵢ + b² Σ xᵢ²    (11-27)

This is a useful recasting of (11-24), because the first term contains a alone, while the last two terms contain b alone.

To find the value of a which minimizes (11-27), only the first term is relevant. It may be written

Σ (Yᵢ − a)² = na² − 2(Σ Yᵢ)a + Σ Yᵢ²

a quadratic function of a, of the same form as f(b) = k₂b² + k₁b + k₀. According to (11-23), this is minimized when

a = (2 Σ Yᵢ)/(2n) = (Σ Yᵢ)/n = Ȳ    (11-13) proved

To find the value of b which minimizes (11-27), only the last two terms are relevant. According to (11-23), this is minimized when

b = −(−2 Σ Yᵢxᵢ)/(2 Σ xᵢ²) = (Σ xᵢYᵢ)/(Σ xᵢ²)    (11-16) proved
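A quick numerical check of rule (11-23): minimize a quadratic by brute-force grid search and compare with −k₁/2k₂. The coefficients here are arbitrary illustrative values.

```python
# Check the completing-the-square rule: f(b) = k2*b^2 + k1*b + k0 (k2 > 0)
# is minimized at b = -k1/(2*k2), per (11-22)/(11-23).
k2, k1, k0 = 3.0, -12.0, 5.0              # arbitrary example coefficients

def f(b):
    return k2 * b * b + k1 * b + k0

b_star = -k1 / (2 * k2)                   # the rule gives 2.0 here

# Independent check: brute-force scan of a fine grid.
grid = [i / 100 for i in range(-1000, 1001)]   # -10.00 .. 10.00, step .01
b_grid = min(grid, key=f)
print(b_star, b_grid)   # 2.0 2.0
```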

chapter 12

Regression Theory

12-1 THE MATHEMATICAL MODEL

So far we have only mechanically fitted a line. This yielded a and b, which are descriptive statistics of the sample (like X̄ in Chapter 2); now we wish to make inferences about the parent population (like our inferences about μ in Chapter 7). Specifically, we must consider the mathematical model which allows us to run tests of significance on a and b.
Turning back to the example in Section 11-1, suppose that the experiment could be repeated many times at a fixed value of x. Even though fertilizer application is fixed from experiment to experiment, we would not observe exactly the same yield each time. Instead, there would be some statistical fluctuation of the Y's, clustered about a central value. We can think of the many possible values of Y forming a population; the probability function of Y for a given x we shall call p(Y/x). Moreover, there will be a similar probability function for Y at any other experimental level of x. One possible sequence of Y populations is shown in Figure 12-1a. There would obviously be mathematical problems involved in analyzing such a population.
To keep the problem manageable, we make a reasonable set of assumptions about the regularity of these populations, as shown in Figure 12-1b. We assume the probability functions p(Yᵢ/xᵢ) have:

1. The same variance σ² for all xᵢ; and
2. Means E(Yᵢ) lying on a straight line, known as the true regression line:

E(Yᵢ) = α + βxᵢ    (12-1)

The population parameters α and β specify the line; they are to be estimated from sample information.¹

3. We also assume that the random variables Yᵢ are statistically independent. For example, a large value of Y₂ does not tend to make Y₃ large; i.e., Y₃ is "unaffected" by Y₂.

These assumptions may be written more concisely as:

The random variables Yᵢ are statistically independent, with mean = α + βxᵢ and variance σ²    (12-2)

FIG. 12-1 (a) General populations of Y, given x. (b) The special form of the populations of Y assumed in simple linear regression.

¹ Remember that our notation conventions are different from Chapters 4 to 7. Now a capital letter denotes an original observation, and a small letter denotes its deviation from the mean.
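The weak set of assumptions (12-2) is easy to simulate. The sketch below draws one sample from the model and refits the line by least squares; α, β, and σ are assumed illustrative values echoing the fertilizer example, not quantities taken from the text.

```python
import random

# Simulate the model (12-2): independent Y_i with mean alpha + beta*x_i and
# common variance sigma^2, then refit by least squares (11-13), (11-16).
# alpha, beta, sigma and the x design are assumed illustrative values.
random.seed(1)
alpha, beta, sigma = 60.0, 0.068, 3.5
x = [-300, -200, -100, 0, 100, 200, 300]       # deviations from X-bar; sum = 0

Y = [alpha + beta * xi + random.gauss(0, sigma) for xi in x]

a = sum(Y) / len(Y)                                                  # (11-13)
b = sum(xi * Yi for xi, Yi in zip(x, Y)) / sum(xi * xi for xi in x)  # (11-16)
print(round(a, 1), round(b, 4))   # one sample's estimates of alpha and beta
```

Rerunning with a different seed gives different a, b: exactly the sample-to-sample fluctuation the chapter goes on to study.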

On occasion, it is useful to describe the deviation of Yᵢ from its expected value as the error or disturbance term eᵢ, so that the model may be written

Yᵢ = α + βxᵢ + eᵢ    (12-3)

where the eᵢ are independent random variables, with mean = 0 and variance σ²    (12-4)

We note that the distributions of Y and e are identical, except that their means differ; in fact, the distribution of e is just the distribution of Y translated onto a zero mean. No assumption is yet made about the shape of the distribution (normal, or otherwise). We therefore refer to assumptions (12-4) as the "weak set"; we shall derive as many results as possible from these, before adding a more restrictive normality assumption later.

12-2 THE NATURE OF THE ERROR TERM

Now let us consider in more detail the error or disturbance term e, the "purely random" part of Yᵢ. Why does it exist? Or, why doesn't a precise and exact value of Yᵢ follow, once the value of xᵢ is given? The error may be regarded as the sum of two components:
(a) Measurement Error

There are various reasons why Y may be measured incorrectly. In measuring wheat yield, there may be an error due to sloppy harvesting or inaccurate weighing. If the example is a study of the consumption of families at various income levels, the measurement error in consumption might consist of budget and reporting inaccuracies.

(b) Stochastic Error

This occurs because of the inherent irreproducibility of biological and social phenomena. Even if there were no measurement error, continuous repetition of our wheat experiment using exactly the same amount of fertilizer would result in different yields; these differences are unpredictable, and are called stochastic differences. They may be reduced by tighter experimental control: for example, by holding constant soil conditions, amount of water, etc. But complete control is impossible; seeds, for example, cannot be duplicated. In this example, stochastic error may be regarded as the influence on Y of many omitted variables, each with an individually small effect.

In the social sciences, controlled experiments are usually not possible. For example, an economist cannot hold U.S. national income constant for several years while he examines the effect of the interest rate on investment. Since he cannot neutralize extraneous influences by holding them constant, his best alternative is to take them explicitly into account, by regressing Y on x and the extraneous factors.¹ This is called "multiple regression" and is discussed fully in the next chapter.

¹ This is a useful technique for reducing stochastic error.

12-3 ESTIMATING α AND β

Suppose that our true regression, Y = α + βx, is the dotted line shown in Figure 12-2. This will remain unknown to the statistician, whose job it is to estimate it as best he can by observing x and Y. Suppose at the first level x₁, the stochastic error e₁ takes on a negative value, as shown in the diagram; he will observe the Y and x combination at P₁. Similarly, suppose his only other two observations are P₂ and P₃, resulting from positive values of e.

FIG. 12-2 True (population) regression and estimated (sample) regression.
Further, suppose the statistician estimates the true line by fitting a least squares line Y = a + bx, applying the method of Chapter 11 to the only information he has: the points P₁, P₂, and P₃. He would then come up with the solid estimating line. This is a critical diagram; before proceeding, the reader should be sure he can distinguish between the true regression and its surrounding e distribution on the one hand, and the estimated regression line on the other.

Unless the statistician is very lucky indeed, it is obvious that his estimated line will not be exactly on the true population line. The best he can hope for is that the least squares method of estimation will be close to the target. Specifically, we now ask: "How is the estimator a distributed around its target α, and b around its target β?"
12-4 THE MEAN AND VARIANCE OF a AND b

We shall show that the random estimators a and b have the following moments:

E(a) = α    (12-5)
var(a) = σ²/n    (12-6)
E(b) = β    (12-7)
var(b) = σ²/(Σ xᵢ²)    (12-8)

where σ² is the variance of the error (the variance of Y). We note from (12-5) and (12-7) that both a and b are unbiased estimators of α and β. Because of its greater importance, we shall concentrate on the slope estimator b, rather than a, for the rest of the chapter.

Proof of (12-7) and (12-8). The formula for b in (11-16) may be rewritten as

b = (1/k) Σ xᵢYᵢ    (12-9)

where

k = Σ xᵢ²    (12-10)

Thus

b = Σ wᵢYᵢ    (12-11)

where

wᵢ = xᵢ/k    (12-12)

Since each xᵢ is a fixed constant, so is each wᵢ. Thus from (12-11) we establish the important conclusion:

b is a weighted sum (i.e., a linear combination) of the random variables Yᵢ    (12-13)

Hence we may write

E(b) = w₁E(Y₁) + w₂E(Y₂) + ··· + wₙE(Yₙ) = Σ wᵢE(Yᵢ)    (12-14)

Moreover, noting that the random variables Yᵢ are assumed independent, by (5-34) we may write

var(b) = w₁² var Y₁ + ··· + wₙ² var Yₙ = Σ wᵢ² var Yᵢ    (12-15)

For the mean, from (12-14) and (12-1),

E(b) = Σ wᵢ(α + βxᵢ) = α Σ wᵢ + β Σ wᵢxᵢ    (12-16)

From (12-12),

Σ wᵢ = (1/k) Σ xᵢ    (12-17)

but Σ xᵢ is zero, according to (11-6). Similarly, noting (12-10) and (12-12),

Σ wᵢxᵢ = (1/k) Σ xᵢ² = 1    (12-18)

Thus (12-16) reduces to

E(b) = β    (12-19)

and (12-7) is proved. For the variance, from (12-15) and (12-2),

var(b) = Σ wᵢ² σ²    (12-20)

Again noting (12-12) and (12-10),

var(b) = (σ²/k²) Σ xᵢ² = σ²/(Σ xᵢ²)    (12-21)

and (12-8) is proved, completing the proof. A similar derivation of the mean and variance of a is left as an exercise.

We observe from (12-12) that the weight wᵢ attached to the Yᵢ observation is proportional to the deviation xᵢ.
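These two moment results can be checked by simulation. The sketch below re-estimates b many times under the model; α, β, σ, and the x design are assumed illustrative values (chosen so that σ²/Σ xᵢ² = 9/280,000).

```python
import random

# Monte Carlo check of (12-7) E(b) = beta and (12-8) var(b) = sigma^2/sum(x^2).
# alpha, beta, sigma and the design are assumed illustrative values.
random.seed(0)
alpha, beta, sigma = 60.0, 0.068, 3.0
x = [-300, -200, -100, 0, 100, 200, 300]
Sxx = sum(xi * xi for xi in x)                 # 280,000

bs = []
for _ in range(20000):
    Y = [alpha + beta * xi + random.gauss(0, sigma) for xi in x]
    bs.append(sum(xi * Yi for xi, Yi in zip(x, Y)) / Sxx)

mean_b = sum(bs) / len(bs)
var_b = sum((bi - mean_b) ** 2 for bi in bs) / len(bs)
print(round(mean_b, 4), round(var_b / (sigma * sigma / Sxx), 2))
# mean_b sits at beta; the variance ratio sits near 1
```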

Hence the outlying observations will exert a relatively heavy influence in the calculation of b.

12-5 THE GAUSS-MARKOV THEOREM

This is the major justification of using the least squares method in the linear regression model.

Gauss-Markov Theorem. Within the class of linear unbiased estimators of β (or α), the least squares estimator has minimum variance.    (12-22)

This theorem is important because it follows from the weak set of assumptions (12-4), and hence requires no assumption about the shape of the distribution of the error term. A proof may be found in most mathematical statistics texts.

To interpret this important theorem, consider b, the least squares estimator of β. We have already seen in (12-13) that it is a linear estimator, and we restrict ourselves to linear estimators because they are easy to analyze and understand. We restrict ourselves even further, as shown in Figure 12-3; within this set of linear estimators we consider only the limited class that are unbiased. The least squares estimator not only is in this class, according to (12-7), but of all the estimators in this class it has the minimum variance. It is often, therefore, referred to as the "best linear unbiased estimator."

FIG. 12-3 Diagram of the restricted class of estimators considered in the Gauss-Markov theorem. In its class, the least squares estimator has least variance.

The Gauss-Markov theorem has an interesting corollary. As a special case of regression, we might ask what happens if we are explaining Y, but no independent variable x comes into play. Then β is 0 in (12-2), and from (11-13) the least squares estimator of the mean of Y is Ȳ, the sample mean. Thus, the sample mean is the least squares estimator of a population mean (μ), and the Gauss-Markov theorem fully applies: the sample mean is the best linear unbiased estimator of a population mean.

It must be emphasized that the Gauss-Markov theorem is restricted, applying only to estimators that are both linear and unbiased. It follows that there may be a biased or nonlinear estimator that is better (i.e., has smaller variance) than the least squares estimator. For example, to estimate a population mean, the sample median is a nonlinear estimator. It is better than the sample mean for certain kinds of nonnormal populations. The sample median is just one example of a whole collection of nonlinear statistical methods known as "distribution-free" or "nonparametric" statistics. These are expressly designed for inference when the population cannot be assumed to be normally distributed.

12-6 THE DISTRIBUTION OF b
With the mean and variance of b established in (12-7) and (12-8), we now ask: "What is the shape of the distribution of b?" If we add (for the first time) the strong assumption that the Yᵢ are normal, and recall that b is a linear combination of the Yᵢ, it follows from (6-13) that b will also be normal. But even without assuming the Yᵢ are normal, as sample size increases the distribution of b will usually approach normality; this can be justified by a generalized form² of the central limit theorem (6-15).

We are now in a position to graph the distribution of b in Figure 12-4, in order to develop a clear intuitive idea of how this estimator varies from sample to sample. First, of course, we note that (12-7) established that b is an unbiased estimator, so that the distribution of b is centered on its target β.

The interpretation of the variance of b in (12-8) is more difficult. Suppose that the experiment had been badly designed, with the Xᵢ's close together. This makes the deviations xᵢ small; hence Σ xᵢ² is small. Therefore the variance of b in (12-8) is large, and b is a comparatively unreliable estimator. To check the intuitive validity of this, consider the scatter diagram in Figure 12-5a. The bunching of the X's means that the small part of the line being

² The central limit theorem (6-15) concerned the normality of the sample mean X̄. In Problem 6-8 it was seen to apply equally well to the sample sum S. It applies also to a weighted sum of random variables such as b in (12-13), under most conditions. See, for example, D. A. S. Fraser, Nonparametric Statistics, New York: John Wiley, 1957. Similarly, the normality of a is justified.
FIG. 12-4 The probability distribution of the estimator b.

FIG. 12-5 (a) Unreliable estimate when the Xᵢ are very close. (b) More reliable fit because the Xᵢ are spread out.
investigated is obscured by the error e, making the slope estimate b very unreliable. In this specific instance, our estimate has been pulled badly out of line by the errors, in particular the one indicated by the arrow.

By contrast, in Figure 12-5b we show the case where the X's are reasonably spread out. Even though the error e remains the same, the estimate b is much more reliable, because errors no longer exert the same leverage.

As a concrete example, suppose we wish to examine how sensitive Canadian imports (Y) are to the international value of the Canadian dollar (x). A much more reliable estimate should be possible using the period 1948 to 1962, when the Canadian dollar was flexible (and took on a range of values), than in the period before or since, when this dollar was fixed (and only allowed to fluctuate within a very narrow range).
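Equation (12-8) makes this design effect concrete. A short sketch compares var(b) for a bunched design of X's against a spread-out one; σ and both designs are assumed illustrative values.

```python
# Design effect in (12-8): var(b) = sigma^2 / sum(x_i^2), so bunched X's
# inflate var(b).  sigma and the two designs are assumed illustrative values.
sigma = 3.0

def var_b(X):
    X_bar = sum(X) / len(X)
    return sigma * sigma / sum((Xi - X_bar) ** 2 for Xi in X)

bunched = [380, 390, 400, 410, 420]    # X's close together: sum x^2 = 1,000
spread  = [100, 250, 400, 550, 700]    # X's spread out:     sum x^2 = 225,000

ratio = var_b(bunched) / var_b(spread)
print(round(ratio))   # 225: the bunched design is far less reliable
```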

12-7 CONFIDENCE INTERVALS AND TESTING HYPOTHESES ABOUT β

With the mean, variance, and normality of the estimator b established, statistical inferences about β are now in order. Our argument will be similar to that of Section 8-2. First standardize the estimator b, obtaining

Z = (b − β) / (σ/√(Σ xᵢ²))    (12-23)

where Z is N(0, 1).

Since the variance σ² of Y is generally unknown, it is estimated by

s² = [1/(n − 2)] Σ (Yᵢ − Ŷᵢ)²    (12-24)

where Ŷᵢ is the fitted value of Y on the estimated regression line; i.e.,

Ŷᵢ = a + bxᵢ    (12-25)

s² is often referred to as "residual variance," a term similarly used in ANOVA. The divisor (n − 2) is used in (12-24) rather than n in order to make s² an unbiased estimator of σ².³ When this substitution of s² for σ² is made, the distribution of b is no longer normal, but instead has the slightly more spread-out t distribution:

t = (b − β) / (s/√(Σ xᵢ²))    (12-26)

³ As argued in the footnote to equation (8-11). But in the present calculation of s², two estimators a and b are required; thus there remain two fewer degrees of freedom for s². Hence (n − 2) is the divisor in s², and also the degrees of freedom of the subsequent t distribution in (12-26).
For the t distribution (12-26) to be strictly valid, we require the strong assumption that the distribution of the Yᵢ is normal. From (12-26) we may now proceed to construct a confidence interval or test an hypothesis.

(a) Confidence Intervals

Again letting t.₀₂₅ denote the value of t which leaves 2½% of the distribution in the upper tail,

Pr(−t.₀₂₅ < t < t.₀₂₅) = .95    (12-27)

Substituting for t from (12-26):

Pr[−t.₀₂₅ < (b − β)/(s/√(Σ xᵢ²)) < t.₀₂₅] = .95    (12-28)

The inequalities within the bracket may be reexpressed:

Pr[b − t.₀₂₅ s/√(Σ xᵢ²) < β < b + t.₀₂₅ s/√(Σ xᵢ²)] = .95    (12-29)
which yields the 95% confidence interval for β:

β = b ± t.₀₂₅ s/√(Σ xᵢ²)    (12-30)

where t.₀₂₅ has (n − 2) degrees of freedom. We note that this is very similar to the confidence interval for μ given by equation (8-15).⁴

For our example of wheat yield in the previous chapter, the confidence interval for β (the effect of fertilizer on yield) is computed as follows. s is evaluated in the last three columns of Table 11-2. Also noting the values for b and Σ xᵢ² calculated in that table, our 95% confidence interval (12-30) becomes

β = .068 ± 2.571 (3.48/√280,000)
  = .068 ± .017

.051 < β < .085    (12-32)

⁴ Using a similar argument, and noting (12-6), the 95% confidence interval for α is:

α = a ± t.₀₂₅ s/√n    (12-31)
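The arithmetic of (12-32) can be checked end to end. The sketch below refits the reconstructed Table 11-1 data (an assumption, chosen to reproduce the quoted statistics), computes s from (12-24), and forms (12-30); the value 2.571 is t.₀₂₅ for 5 degrees of freedom, from the t table.

```python
import math

# 95% confidence interval (12-30) for beta, wheat-yield example.
# Data reconstructed to reproduce the text's statistics (b = .068, s = 3.48).
X = [100, 200, 300, 400, 500, 600, 700]
Y = [40, 45, 50, 65, 70, 70, 80]
n = len(X)

X_bar = sum(X) / n
x = [Xi - X_bar for Xi in X]
Sxx = sum(xi * xi for xi in x)                 # 280,000
a = sum(Y) / n
b = sum(xi * Yi for xi, Yi in zip(x, Y)) / Sxx

s2 = sum((Yi - a - b * xi) ** 2 for xi, Yi in zip(x, Y)) / (n - 2)   # (12-24)
se_b = math.sqrt(s2) / math.sqrt(Sxx)

t_025 = 2.571                                  # t table, n - 2 = 5 d.f.
lo, hi = b - t_025 * se_b, b + t_025 * se_b
print(round(lo, 3), round(hi, 3))   # 0.051 0.085, matching (12-32)
```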

(b) Testing Hypotheses

A two-sided test of any hypothesis may be carried out simply by noting whether or not the confidence interval (12-30) contains that hypothesis. For example, the hypothesis typically tested is the null hypothesis

H₀: β = 0    (12-33)

i.e., using our example, that fertilizer has no effect on yield. H₀ must be rejected at a 5% significance level, since the value of zero is not contained in (12-32).

Since fertilizer is expected to affect yield favorably, it seems more appropriate to test (12-33) against the one-sided alternative:

H₁: β > 0    (12-34)

The first step is to calculate the t statistic. From (12-26) and (12-33), under the assumption that H₀ is true, this reduces to

t = b / (s/√(Σ xᵢ²))    (12-36)

and in our example:

t = .068 / (3.48/√280,000) = 10.3    (12-37)

Since this observed value exceeds the critical t.₀₅ value of 2.015, H₀ is rejected in favor of the conclusion that fertilizer favorably affects yield.
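Under H₀, the statistic (12-36) is just b divided by its estimated standard error. Continuing the same sketch (same reconstructed data, an assumption):

```python
import math

# t statistic (12-36) for H0: beta = 0, wheat-yield example (reconstructed data).
X = [100, 200, 300, 400, 500, 600, 700]
Y = [40, 45, 50, 65, 70, 70, 80]
n = len(X)

X_bar = sum(X) / n
x = [Xi - X_bar for Xi in X]
Sxx = sum(xi * xi for xi in x)
a = sum(Y) / n
b = sum(xi * Yi for xi, Yi in zip(x, Y)) / Sxx
s = math.sqrt(sum((Yi - a - b * xi) ** 2 for xi, Yi in zip(x, Y)) / (n - 2))

t = b / (s / math.sqrt(Sxx))          # (12-36)
print(round(t, 1))                    # 10.3, far beyond the critical 2.015
```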
12-8 PREDICTION INTERVAL FOR Y₀

If we plan one new application of 550 pounds of fertilizer (x₀ = 150), how do we predict the resulting yield? The best point estimate will be the corresponding fitted value on our estimated regression line, i.e.:

Ŷ₀ = a + bx₀ = 60 + .068(150) = 70.2 bu/acre    (12-38), (12-39)

But as a point estimate, this will almost certainly involve some error because, for example, of errors made in calculating a and b. Figure 12-6 illustrates the effect of these errors; the true regression is shown, along with an estimated regression. Note how the fitted Ŷ₀ in this case underestimates its target E(Y₀). In Figure 12-7 the true regression is again shown, along with several estimated regressions fitted from several possible sets of sample data. The fitted value is sometimes too low, sometimes too high, but on the average, just right.

FIG. 12-6 How the estimator Ŷ₀ is related to the target E(Y₀).

FIG. 12-7 Ŷ₀ as an unbiased estimator of E(Y₀).

The important observation in Figure 12-7 is that if x₀ were further to the right, our estimates would be spread out over an even wider range.
On the other hand, if x₀ were further to the left, closer to its central value of zero, then our estimates would be less spread out. Moreover, it is the error in b that causes this: an error in the slope b does little harm in predicting at the central (average) amount of fertilizer, but any prediction for an extreme amount of fertilizer will be thrown badly into error.

Formally, it may be shown⁵ that the 95% prediction interval for an individual Y observation is

Y₀ = Ŷ₀ ± t.₀₂₅ s √(1 + 1/n + x₀²/Σ xᵢ²)    (12-42)

where t.₀₂₅ has (n − 2) d.f.
For example, we can now construct a prediction interval for yield if 550 lb/acre of fertilizer were applied. With x₀ = 150, we predict:

Y₀ = 70.2 ± 2.571 (3.48) √(1 + 1/7 + 150²/280,000)

60.3 ≤ Y₀ ≤ 80.1    (12-43)

a prediction interval with a 95% chance of being right. This prediction interval is shown in Figure 12-8. Moreover, the same calculation for all possible x₀ values yields the dotted band of prediction intervals, expanding as x₀ moves farther away from its central value of zero.
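The interval (12-43) follows directly from (12-42). Continuing the same sketch (reconstructed data, an assumption; t.₀₂₅ = 2.571 for 5 d.f.):

```python
import math

# 95% prediction interval (12-42) for an individual Y0 at x0, wheat-yield example.
X = [100, 200, 300, 400, 500, 600, 700]
Y = [40, 45, 50, 65, 70, 70, 80]
n = len(X)

X_bar = sum(X) / n
x = [Xi - X_bar for Xi in X]
Sxx = sum(xi * xi for xi in x)
a = sum(Y) / n
b = sum(xi * Yi for xi, Yi in zip(x, Y)) / Sxx
s = math.sqrt(sum((Yi - a - b * xi) ** 2 for xi, Yi in zip(x, Y)) / (n - 2))

x0 = 550 - X_bar                      # 550 lb/acre of fertilizer -> x0 = 150
y0_hat = a + b * x0
half = 2.571 * s * math.sqrt(1 + 1 / n + x0 * x0 / Sxx)   # (12-42), 5 d.f.
print(round(y0_hat - half, 1), round(y0_hat + half, 1))   # 60.3 80.1
```

The 1 under the square root is what makes this a prediction interval for an individual Y₀ rather than a confidence interval for E(Y₀).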
It should be emphasized that x₀ may be any value of x. If x₀ lies among the observed values x₁ ··· xₙ, the process is called "interpolation." If x₀ is one of the values x₁ ··· xₙ, the process might be called "using also the other values of x to sharpen our knowledge of this Y population." If x₀ lies beyond x₁ or xₙ, then the process is called "extrapolation." The techniques developed in this section may be used for extrapolation, but only with considerable caution, as we shall see in the next section.

⁵ Without going into the proof of (12-42), we sketch its plausibility. The variance involved is roughly

var = var(a) + var(b) x₀² + var(Y)    (12-40)

that is, the variance of a, plus the variance of b weighted with x₀², plus the inherent variance of any Y observation. This last source of error must be included, of course; even if α and β were known exactly, the prediction of Y₀ would still be subject to error. Into (12-40) we substitute (12-6), (12-8), and (12-2):

var = σ²/n + (σ²/Σ xᵢ²) x₀² + σ² = σ² (1 + 1/n + x₀²/Σ xᵢ²)    (12-41)

When s is substituted for σ, (12-42) follows.
FIG. 12-8 Prediction interval for Y₀.
PROBLEMS

12-1 Construct a 95% confidence interval for the regression coefficient β in
(a) Problem 11-1.
(b) Problem 11-2.

12-2 Which of the following hypotheses does the data of Problem 11-1 prove to be unacceptable at the 5% level of significance?
(a) β = 0
(b) β = 1/2
(c)
(d) β = −.1

12-3 At the 1% level of significance, use the data of Problem 11-1 to test the hypothesis that savings does not depend on income, against the alternative hypothesis that savings increases with income.

12-4 Using the data of Problem 11-1, what is your 95% prediction interval for the savings of a family with an income of
(a) $6,000
EXTRAPOLATION

OF

DANGERs

249

(b) $8,000
(c) $10,000
(d) $12,000
(e) Which of these four intervals is least precise? Most precise?
(f) How is the answer to (b) related to the confidence interval (12-3?)?

12-5 Suppose you are trying to explain how the interest rate (i) affects investment (I) in the U.S. Would you prefer to take observations over a period in which the authorities were trying to hold i constant, or over a period in which i is allowed to vary widely?

12-9 DANGERS OF EXTRAPOLATION

There are two dangers in extrapolation, which we might call "mathematical" and "practical." In both cases, there is no sharp division between safe interpolation and dangerous extrapolation. Rather, there is continually increasing danger of misinterpretation as x0 gets further and further from its central value.

(a) Mathematical Danger. It was emphasized in the previous section that prediction intervals get larger as x0 moves away from zero. This is true even if all the assumptions underlying our mathematical model hold exactly.

(b) Practical Danger. In practice it must be recognized that a mathematical model is never absolutely correct. Rather, it is a useful approximation. In particular, one cannot take seriously the hypothesis that the population means are strung out in an exact straight line. If we consider the fertilizer example, it is likely that the true relation increases initially, but then bends down eventually as a "burning point" is approached, and the crop is overdosed. This is illustrated in Figure 12-9, which is an extension of Figure 11-2 with the scale appropriately reduced. In the region of interest, from 0 to 700 pounds, the relation is practically a straight line, and no great harm is done in assuming the linear model. However, if the linear model is extrapolated far beyond this region of experimentation, the result becomes meaningless.

FIG. 12-9 Comparison of linear model with the true relation. [x axis: fertilizer, 0 to 1,000 lb/acre; the two curves practically coincide from 0 to 700.]

There are "nonlinear" models available, and statistical tests to help determine whether or not they seem more appropriate. These topics are covered in more advanced texts.

*12-10 MAXIMUM LIKELIHOOD ESTIMATION

In Sections 12-1 to 12-5 (including the Gauss-Markov justification of least squares), no assumption of normality of the error term was required. In Sections 12-6 to 12-9, the assumption of a normally distributed error was required only for small sample estimation, i.e., to validate the t distribution; this follows the general principle that small sample estimation requires strong assumptions about the parent population. In these last two sections we make the quite strong assumption that the (true) model involves a normally distributed error throughout. On this premise, we derive the maximum likelihood estimates (MLE) of α and β, i.e., those hypothetical population values of α and β more likely than any others to generate the sample values we observed. These MLE of α and β turn out to be the least squares estimates; thus maximum likelihood provides a second justification for using least squares.

Before addressing the algebraic derivation, it is best to clarify what is going on with a bit of geometry. Specifically, why should the maximum likelihood line fit the data well? To simplify, assume a sample of only three observations (P1, P2, P3). Temporarily, let us try out the line shown in Figure 12-10a. (Before examining it carefully, we note that it seems to be a pretty bad fit for our three observed points.) First, suppose this were the true regression line; then the distribution of errors would be centered around it as shown. The likelihood that such a population would give rise to the sample we observed is the probability density that we would get the particular set of three e values shown in this diagram. The probability density of the three values is shown as the ordinates above the points P1, P2, and P3. Because our three observations are by assumption statistically independent, the likelihood of all three (i.e., the probability density of getting the sample we observed) is the product of these three ordinates. This likelihood seems relatively small, mostly because the very small ordinate of P1 reduces the product value. Our intuition that this is a bad estimate is confirmed; such a hypothetical population is not very likely to generate our sample values. We should be able to do better.

In Figure 12-10b it is evident that we can do much better. This hypothetical population is more likely to give rise to the sample we observed.

FIG. 12-10 Maximum likelihood estimation. (a) Note: this is not the true population; it is only a hypothetical population that the statistician is considering. But it is not very likely to generate the observed P1, P2, P3. (b) Another hypothetical population; this is more likely to generate P1, P2, P3. [Both panels plot p(Y/x), or p(e/x), above the values x1, x2, x3; the ordinate above each observation is its probability density.]

The e terms are collectively smaller, with their probability density being greater as a consequence.

The MLE technique is seen to involve speculating on various possible populations. How likely is each to give rise to the sample we observed? Geometrically, our problem would be to try them all out, by moving the population line and its surrounding e distribution through all possible α and β values, i.e., by moving the regression through all possible positions in space. Each position involves a different set of hypothetical population values of α and β. In each case the likelihood of observing P1, P2, P3 would be evaluated. For our MLE we shall choose that hypothetical population which maximizes this likelihood. It is evident that little further adjustment is required in Figure 12-10b to arrive at the MLE.

This procedure seems reasonable intuitively; moreover, the result is a good fit. Since it seems similar to the least squares fit, it is no surprise that we shall be able to show that the two coincide.

There are two other points worth noting. First, since the MLE is derived from our sample observations, another set of sample observations would almost certainly give rise to another MLE of α and β. The second point is more subtle. The likelihood of any population yielding our sample depends not only on the size of the e terms involved, but also on the shape of the e distribution, in particular σ², the variance of e. However, it can be shown that the maximum likelihood line does not depend on σ². In other words, if we assume σ² is larger, the geometry will look different, because e will have a flatter distribution; but the end result will be the same maximum likelihood line.

While geometry has clarified the maximum likelihood method, it hasn't provided a precise estimate. This must be done algebraically. For generality, suppose that we have a sample of size n, rather than just 3. We wish to know the likelihood of the sample

    p(Y1, Y2, ... Yn)                                   (12-44)

that is, the probability (or probability density) of the sample we observed, expressed as a function of the possible population values of α, β, and σ². First, consider the probability density of the first specific sample value of Y, which is

    p(Y1) = (1/√(2πσ²)) e^{−(1/2σ²)[Y1 − (α + βx1)]²}   (12-45)

This is simply the normal distribution of Y1, with its mean (α + βx1) and variance (σ²) substituted into the appropriate positions. [In terms of the geometry of Figure 12-10, p(Y1) is the ordinate above P1.] The probability density of the second Y value is similar to (12-45), except that the subscript 2 replaces 1 throughout, and so on, for all the other observed Y values. The independence of the Y values justifies multiplying all these probabilities together to find (12-44). Thus

    p(Y1, Y2, ..., Yn) = Π(i=1 to n) (1/√(2πσ²)) e^{−(1/2σ²)[Yi − (α + βxi)]²}   (12-46)

where Π represents the product of n factors. Using the familiar rule for exponentials, the product can be reexpressed by summing the exponents:

    p(Y1, Y2, ... Yn) = (2πσ²)^(−n/2) e^{−(1/2σ²) Σ[Yi − (α + βxi)]²}   (12-47)

Recall that the observed Y's are given. We are speculating on various possible values of α, β, and σ². To emphasize this, we rename (12-47) the likelihood function:

    L(α, β, σ²) = (2πσ²)^(−n/2) e^{−(1/2σ²) Σ[Yi − (α + βxi)]²}   (12-48)

We now ask which values of α and β make L largest. The only place α and β appear is in the exponent; moreover, maximizing a function with a negative exponent involves minimizing the exponent. Hence our problem is to choose α and β in order to minimize

    Σ [Yi − α − βxi]²                                   (12-49)

for the maximum likelihood solution; moreover, this holds regardless of the value of σ. This is the proposition suggested in the geometrical analysis in Figure 12-10: no matter what is assumed about the spread of the distribution, the maximum likelihood line is not affected by it.

But an even more important conclusion follows from comparing equation (12-49) with equation (11-10): maximum likelihood estimates are identical to least squares estimates. The selection of least squares estimates a and b to minimize (11-10) is identical to minimizing (12-49). The only difference is that we're calling our maximum likelihood estimates α̂ and β̂ rather than a and b; these are the same estimates with different names. This establishes the other important theoretical justification of the least squares method: it follows from applying maximum likelihood techniques to a model with a normally distributed error.
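The σ²-invariance just argued can be checked numerically: for any assumed σ, the (a, b) pair that maximizes the likelihood (12-48) is the least squares pair, because for fixed σ the likelihood falls as the sum of squares grows. A minimal Python sketch (the data and the alternative candidate lines are invented for illustration):

```python
import math

# For two very different sigmas, the likelihood-maximizing line among a set
# of candidates is always the least squares line.
x = [-3, -1, 0, 1, 3]              # x measured as deviations from its mean
y = [2.1, 3.9, 5.2, 6.1, 7.8]

a_ls = sum(y) / len(y)             # least squares intercept
b_ls = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

def log_lik(a, b, sigma):
    # Log of the likelihood function (12-48).
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    n = len(y)
    return -(n / 2) * math.log(2 * math.pi * sigma ** 2) - sse / (2 * sigma ** 2)

candidates = [(a_ls, b_ls), (5.0, 1.0), (4.5, 0.8), (6.0, 1.2)]
for sigma in (1.0, 5.0):           # a tight and a flat error distribution
    best = max(candidates, key=lambda ab: log_lik(ab[0], ab[1], sigma))
    print(sigma, best == (a_ls, b_ls))   # prints True for each sigma
```

Changing σ rescales every candidate's likelihood, but never changes their ranking; only the sum of squares matters.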

*12-11 THE CHARACTERISTICS OF THE INDEPENDENT VARIABLE

So far it has been assumed that the independent variable x takes on a given set of fixed values (for example, fertilizer application was set at certain specified levels). But in many cases x cannot be controlled in this way. Thus if we are examining the effect of rainfall on yield, it must be recognized that x (rainfall) is a random variable, completely outside our control. The surprising thing is that the same MLE follows whether x is fixed or a random variable, if we assume [as well as (12-4)] that

1. The distribution of x does not depend on α, β, or σ².    (12-50)
2. The distribution of e is independent of x, being N(0, σ²) for every xi.    (12-51)

The likelihood of our sample now involves the probability of both x and Y. If the xi and ei are independent, the likelihood function is

    L = p(x1) p(x2) ··· p(xn) · p(Y1/x1) ··· p(Yn/xn)   (12-52)

Because of the normality assumption (12-51), and collecting the exponents by the familiar rule, this becomes

    L = p(x1) p(x2) ··· p(xn) (2πσ²)^(−n/2) e^{−(1/2σ²) Σ(Yi − α − βxi)²}   (12-54)

Since p(x) does not depend on the parameters α, β, and σ² according to (12-50), the problem of maximizing this likelihood function with respect to these parameters reduces to the minimization of the same exponent as before. This holds true, in fact, even if the xi are not independent, but are determined by a joint probability distribution that does not involve α, β, or σ²; then (12-54) becomes

    L = p(x1, x2, ..., xn) (2πσ²)^(−n/2) e^{−(1/2σ²) Σ(Yi − α − βxi)²}   (12-55)

again requiring the same (least squares) minimization of the exponent. We conclude that MLE and least squares coincide regardless of whether the independent variable x is fixed, or a random variable, if x is independent of the error and parameters in the equation being estimated. This greatly generalizes the application of the regression model.

chapter 13

Multiple Regression

13-1 INTRODUCTORY EXAMPLE

Suppose that the wheat yield and fertilizer observations in Chapter 11 were taken at several different agricultural experiment stations across the country. Even if soil conditions and temperatures were essentially the same in all these areas, we still might ask, "Can't part of the fluctuation in Y (i.e., the disturbance term e) be explained by varying levels of rainfall in different areas?" A better prediction of wheat yield may be possible if both fertilizer and rainfall are examined. Notice how this argument is similar to the one used in two factor ANOVA: if the error e can be reduced by taking rainfall into account, we will get a better explanation of how the variables are related. The observed levels of rainfall are shown in Table 13-1, along with the original observations of wheat yield and fertilizer from Table 11-1.

TABLE 13-1  Observed Wheat Yield, Fertilizer Application, and Rainfall

Wheat Yield, Y    Fertilizer, X    Rainfall, Z
(bu/acre)         (lb/acre)        (inches)
    40                100              36
    45                200              33
    50                300              37
    65                400              37
    70                500              34
    70                600              32
    80                700              36

13-2 THE MATHEMATICAL MODEL

Multiple regression, the technique used to describe how a dependent variable is related to two or more independent variables, is in fact only an extension of the simple regression analysis of the previous two chapters. Yield Y is now to be regressed on the two independent variables, or "regressors," fertilizer X and rainfall Z. Let us suppose it is reasonable to argue that the model is of the form

    E(Yi) = α + βxi + γzi                               (13-1)

with both regressors x and z measured as deviations from their means. Geometrically, this equation is a plane in the three-dimensional space shown in Figure 13-1.¹ For any given combination of rainfall and fertilizer (xi, zi), the expected yield E(Yi) is the point on this plane directly above, shown as a hollow dot. Of course, the observed value of Yi is very unlikely to fall precisely on this plane. For example, the Yi observed at this particular fertilizer/rainfall combination is somewhat greater than its expected value, and is shown as the solid dot lying directly above it; the difference between the observed and expected value is the error ei.

FIG. 13-1 Scatter of observed points about the true regression plane E(Y) = α + βx + γz. [Axes: fertilizer x, rainfall z, yield Y.]

¹ It is a plane because it is linear in x and z. Looked at from another point of view, we could say that (13-1) is linear in α, β, and γ. In fact, this latter linearity assumption is the more important of the two, since we are involved in estimating α, β, and γ; it is this assumption that keeps our estimating equations (13-4) linear.

Thus any observed value Yi may be expressed as its expected value plus the stochastic or error term ei:

    Yi = α + βxi + γzi + ei                             (13-2)

with our assumptions about e the same as in Chapter 12. β is geometrically interpreted as the slope of the plane as we move in a direction parallel to the (x, Y) plane, i.e., keep z constant; thus β is the marginal effect of fertilizer x on yield Y. Similarly, γ is geometrically interpreted as the slope of the plane as we move in a direction parallel to the (z, Y) plane, i.e., keep x constant; hence γ is the marginal effect of z on Y.

13-3 LEAST SQUARES ESTIMATION

Least squares estimates are derived by selecting the estimates a, b, and c of α, β, and γ that minimize the sum of the squared deviations between the observed Y's and the fitted Y's; i.e., minimize

    Σ (Yi − a − bxi − czi)²                             (13-3)

This is done with calculus, by setting the partial derivatives of this function with respect to a, b, and c equal to zero² (or algebraically, by a technique similar to that used in Appendix 11-1). The result is the following three estimating equations:

    a = Ȳ
    Σ Yixi = b Σ xi² + c Σ xizi
    Σ Yizi = b Σ xizi + c Σ zi²                         (13-4)

Again, note that the intercept estimate a is the mean of Y. The second and third equations may be solved for b and c. These calculations are shown in Table 13-2, and yield the fitted multiple regression equation.

² Maximum likelihood estimates of α, β, and γ are derived in the same way as in the simple regression case, and this again coincides with least squares. Geometrically, this involves trying out all possible hypothetical regression planes in Figure 13-1, and selecting the one that is most likely to generate the solid-dot sample values we actually observed. But first, note that Figure 13-1 involves 3 parameters (α, β, and γ) and 3 variables (Y, x, and z). However, there is one additional variable in our system, p(Y/x, z), which has not yet been plotted. It may appear that there is no way of forcing 4 variables into a three-dimensional space, but this is not so. For example, economists often plot 3 variables (labor, capital, and output) in a two-dimensional labor-capital space by introducing the third output variable as a system of isoquants. Those for whom this is a familiar exercise should have little trouble in graphing four variables [Y, x, z, and p(Y/x, z)] in a three-dimensional (Y, x, z) space by introducing the fourth variable p(Y/x, z) as a system of isoplanes. Each of these isoplanes represents (Y, x, z) combinations that are equiprobable (i.e., for which the probability density of Y is constant). Thus the complete geometric model is the regression plane shown in Figure 13-1, with isoprobability planes stacked above and below it. Our assumptions about the error term (12-4) guarantee that the isoprobability planes will be parallel to the true regression plane. For MLE, we introduce the additional assumption that the error configuration is normal. Then we shift around a hypothetical regression plane along with its associated set of parallel isoprobability planes. In each position the probability density of the observed sample of points is evaluated by examining the isoprobability plane on which each point lies, and multiplying these together. That hypothetical regression which maximizes this likelihood is chosen. The algebra resembles the simple case in Section 12-10; it is easy to show that this again results in minimizing the sum of squares (13-3).
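The last two equations in (13-4) form a 2×2 linear system that is easy to solve directly. A minimal Python sketch applied to the Table 13-1 data (the coefficients below are computed here, not quoted from the text):

```python
# Solve the estimating equations (13-4) for a, b, c using the wheat data,
# with x and z measured as deviations from their means.
Y = [40, 45, 50, 65, 70, 70, 80]
X = [100, 200, 300, 400, 500, 600, 700]
Z = [36, 33, 37, 37, 34, 32, 36]

n = len(Y)
x = [xi - sum(X) / n for xi in X]
z = [zi - sum(Z) / n for zi in Z]

a = sum(Y) / n                                  # a = Y-bar
Sxy = sum(xi * yi for xi, yi in zip(x, Y))      # sum of x*Y
Szy = sum(zi * yi for zi, yi in zip(z, Y))      # sum of z*Y
Sxx = sum(xi * xi for xi in x)
Szz = sum(zi * zi for zi in z)
Sxz = sum(xi * zi for xi, zi in zip(x, z))

# Cramer's rule on:  Sxy = b*Sxx + c*Sxz ;  Szy = b*Sxz + c*Szz
det = Sxx * Szz - Sxz ** 2
b = (Sxy * Szz - Szy * Sxz) / det
c = (Szy * Sxx - Sxy * Sxz) / det
print(round(a, 1), round(b, 4), round(c, 3))    # prints: 60.0 0.0689 0.603
```

So the fitted plane is approximately Ŷ = 60 + .0689x + .603z, in deviation form.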

PROBLEMS

13-1 Suppose a random sample of 5 families yielded the following data (an extension of Problem 11-1):

Savings S    Income Y    Assets W
$  600       $ 8,000     $12,000
 1,200        11,000       6,000
 1,000         9,000       6,000
   700         6,000       3,000
   300         6,000      18,000

(a) Estimate the multiple regression of S on Y and W.
(b) Does the coefficient of Y differ from the answer to Problem 11-1(a)? Which coefficient better illustrates the relation of S to Y?
(c) For a family with assets of $5000 and income of $8000, what would you predict savings to be?
(d) Calculate the residual sum of squares, and residual variance s².
(e) Are you satisfied with the degrees of freedom you have for s² in this problem? Explain.

13-2 Suppose a random sample of 5 families (an extension of Problem 11-1) yielded the following data:

Savings S    Income Y    Number of Children N
$  600       $ 8,000     2
   700         6,000     1
   300         6,000     3
 1,200        11,000
 1,000         9,000

(a) Estimate the multiple regression of S on Y and N.
(b) For a family with 5 children and income of $6000, what would you predict savings to be?

*13-3 Combining the data of Problems 13-1 and 13-2, we obtain the following table:

Family Savings S    Income Y    Assets W    Number of Children N
$  600              $ 8,000     $12,000     2
   700                6,000       3,000     1
   300                6,000      18,000     3
 1,200               11,000       6,000
 1,000                9,000       6,000

Measuring the independent variables as deviations from the mean, we wish to estimate the regression equation

    S = α + βy + γw + δn

(a) Generalizing (13-4), use the least squares criterion to derive the system of 4 equations needed to estimate the four parameters.
(b) Using a table such as Table 13-2, calculate the estimates of the four parameters.

13-4 MULTICOLLINEARITY

(a) In Simple Regression

In Figure 12-5a it was shown how our estimate b became unreliable if the Xi's were closely bunched, i.e., if the regressor X had little variation. It will be instructive to consider the limiting case, where the Xi's are concentrated on one single value X0, as in Figure 13-2. Then b is not determined at all. There are any number of differently sloped lines passing through (X̄, Ȳ) which fit equally well: for each line in Figure 13-2, the sum of squared deviations is the same, since the deviations are measured vertically from (X̄, Ȳ). This geometric fact has an algebraic counterpart. If all Xi = X̄, then all xi = 0, and the term involving b in (11-10) is zero; hence the sum of squares does not depend on b at all. It follows that any b will do equally well in minimizing the sum of squares. An alternative way of looking at the same problem is that since all xi are zero, the denominator Σ xi² in (11-16) is zero, and b is not defined.

FIG. 13-2 Degenerate regression, because of no spread (variation) in X.

In conclusion, when the values of X show little or no variation, then the effect of X on Y can no longer be sensibly investigated. But if the problem is predicting Y, rather than investigating Y's dependence on X, this bunching of the X values doesn't matter, provided we stick to this same value of X. All the lines in Figure 13-2 predict Y equally well. The best prediction is Ȳ, and all these lines give us that result.

(b) In Multiple Regression

Again consider the limiting case, where the values of the independent variables X and Z are completely bunched up on a line L, as in Figure 13-3.

FIG. 13-3 Multicollinearity. [The observations lie on the vertical plane through L; planes π1 and π2 both pass through the fitted line F.]

This means that all the observed points in our scatter lie in the vertical plane running up through L. You can think of three-dimensional space as a room in a house; the observations are not scattered throughout this room, but instead lie embedded in an extremely thin pane of glass standing vertically on the floor.

In explaining Y, multicollinearity makes us lose one dimension. In the earlier case of simple regression, our best fit for Y was not a line, but rather a point (X̄, Ȳ); in this multiple regression case our best fit for Y is not a plane, but rather the line F. To get F, just fit the least squares line through the points on the vertical pane of glass. The problem is identical to the one shown in Figure 11-2; in one case a line is fitted on a flat pane of glass, in the other case, on a flat piece of paper. This regression line F is therefore our best fit for Y. As long as we stick to the same combination of X and Z, i.e., so long as we confine ourselves to predicting Y values on that pane of glass, no special problems³ arise. We can use the regression F on the glass to predict Y in exactly the same way as we did in the simple regression analysis of Chapter 11. But there is no way to examine how X affects Y. Any attempt to define β, the marginal effect of X on Y (holding Z constant), involves moving off that pane of glass, and we have no sample information whatsoever on what the world out there looks like. Or, to put it differently, if we try to explain Y with a plane rather than a line F, we find there are any number of planes running through F (e.g., π1 and π2) which do an equally good job. Since each passes through F, each yields an identical sum of squared deviations; thus each provides an equally good fit. This is confirmed in the algebra of the normal equations (13-4). When X is a linear function of Z (i.e., when x is a linear function of z) it may be shown that the last two equations are not independent, and cannot be solved uniquely for b and c.⁴

Now let's be less extreme in our assumptions and consider the near-limiting case, where z and x are almost on a line (i.e., where all our observations in the room lie very close to a vertical pane of glass). In this case, a plane may be fitted to our observations, but the estimating procedure is very

³ In practice, there can be a problem in getting the regression line F, since computer routines typically break down in the face of perfect multicollinearity.

⁴ Two equations usually can be solved for two unknowns, but not always. For example, suppose John's age (X) is twice Harry's (Y). Then we can write

    X = 2Y
or
    5X = 10Y                                            (13-5)

Note that these two equations tell us the same thing. We have two equations with two unknowns, but they don't generate a unique solution, because they don't give us independent information. The second just restates what the first told us.
unstable; it becomes very sensitive to random errors, reflected in the large variance of the estimators b and c. Thus, even though X may really affect Y, its statistical significance can't be established because the standard deviation of b is so large. This is analogous to the argument in the simple regression case in Section 12-6.

When the independent variables X and Z are collinear, or nearly so, it is called the problem of multicollinearity. For prediction purposes, it does not hurt, provided there is no attempt to predict for values of X and Z removed from their line of collinearity. But structural questions cannot be answered: the relation of Y to either X or Z cannot be sensibly investigated.
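Numerically, perfect multicollinearity shows up as a zero determinant in the last two normal equations of (13-4), so b and c cannot be solved for uniquely. A minimal sketch with hypothetical data, using the pounds/ounces relation Z = 16X of the example that follows:

```python
# When Z is an exact linear function of X, the 2x2 system for b and c
# in (13-4) is singular: its determinant is zero.
X = [100, 200, 300, 400, 500]
Z = [16 * xi for xi in X]              # fertilizer re-measured in ounces

n = len(X)
x = [xi - sum(X) / n for xi in X]      # deviations from the mean
z = [zi - sum(Z) / n for zi in Z]

Sxx = sum(v * v for v in x)
Szz = sum(v * v for v in z)
Sxz = sum(u * v for u, v in zip(x, z))

det = Sxx * Szz - Sxz ** 2             # determinant of the normal equations
print(det)                             # prints: 0.0
```

With near-collinearity the determinant is not exactly zero but very small, which is what makes the solved b and c so sensitive to random errors.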

Example 1

In our wheat yield example, suppose that the statistician makes the incredible error of defining one independent variable X as the amount of fertilizer measured in pounds per acre, and another independent variable Z as the amount of fertilizer measured in ounces per acre. Since any weight measured in ounces must be exactly sixteen times its measurement in pounds,

    Z = 16X                                             (13-6)

Thus we have an example of perfect multicollinearity: all combinations of X and Z must fall on this line. Now if we try to fit⁵ a regression plane to the observations given in Table 11-1, what would be our answer? Any satisfactory solution must equal the original regression of yield on fertilizer given in (11-18); thus one possible answer would be

    Y = 32.8 + .068X + 0Z                               (13-7)

Another equivalent answer would be to make a partial substitution of (13-6) for X in (13-7) as follows:

    Y = 32.8 + .068[λX + (1 − λ)X]
      = 32.8 + .068λX + .068(1 − λ)(1/16)Z
      = 32.8 + .068λX + .00425(1 − λ)Z                  (13-8)

In fact, (13-8) is a whole family of three-dimensional planes, depending on the arbitrary value assigned to λ. All these planes are equivalent expressions for our simple two-dimensional relationship between fertilizer and yield. While all give the same correct prediction of Y, no meaning can be attached to whatever coefficients of X and Z we may come up with.

⁵ The computer program would probably "hang up," trying to divide by zero. So suppose the calculations are handcrafted.

Example 2

While the previous extreme example may have clarified some of the theoretical issues, no statistician would make that sort of error in model specification. Instead, more subtle difficulties arise. In economics, for example, suppose demand for a group of goods is being related to prices and income, with the overall price index being the first independent variable. Suppose aggregate income measured in money terms is the second independent variable. Since this is real income multiplied by the same price index, the problem of multicollinearity may become a serious one. The solution is to use real income, rather than money income, as the second independent variable. This is a special case of a more general warning: in any multiple regression in which price is one independent variable, beware of other independent variables measured in prices.

The problem of multicollinearity may be solved if there happens to be a priori information about the relation of β and γ. For example, if it is known that

    γ = 5β                                              (13-9)

then even in the case of perfect collinearity this information will allow us to uniquely determine the regression plane. This is evident from the geometry of Figure 13-3: given a fixed relation between our two slopes (β and γ) there is only one regression plane π which can be fitted to pass through F. This is confirmed algebraically. Using (13-9), our model (13-2) can be written

    Yi = α + βxi + 5βzi + ei
       = α + β(xi + 5zi) + ei                           (13-11)

It is natural to define a new variable

    wi = xi + 5zi                                       (13-12)

Thus (13-11) becomes

    Yi = α + βwi + ei                                   (13-13)

and a regression of Y on w will yield estimates a and b. Finally, if we wish an estimate of γ, it is easily computed using (13-9):

    c = 5b                                              (13-14)
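The device in (13-9) through (13-14) is easy to carry out mechanically. A minimal Python sketch with invented, perfectly collinear data (z = x/5) and the assumed prior constraint γ = 5β:

```python
# Prior-information regression: collapse x and z into w = x + 5z, regress
# Y on w alone, then recover c = 5b from the constraint (13-9).
x = [-10, -5, 0, 5, 10]          # deviations from the mean
z = [-2, -1, 0, 1, 2]            # z = x/5: perfect collinearity
Y = [50, 55, 60, 66, 69]

w = [xi + 5 * zi for xi, zi in zip(x, z)]   # w_i = x_i + 5 z_i   (13-12)

a = sum(Y) / len(Y)                         # intercept estimate: Y-bar
b = sum(wi * yi for wi, yi in zip(w, Y)) / sum(wi * wi for wi in w)
c = 5 * b                                   # (13-14): estimate of gamma
print(a, b, round(c, 2))                    # prints: 60.0 0.49 2.45
```

Although b and c cannot be separated from the data alone, the single combined slope on w is perfectly well determined.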

INT El; PRETiNG

13-5

Suppose

baX' a

-Jr- b\177.l['\":\177
+

is fitted to 25 abservations of Y and the X \177s.The


are published
in the form, for example'

Y= 10.6 +
(So

2.6)

(q = 4.1)
The br

:'

265

regression

a nt- btX'I

REGRESSION

REGRESSION

ESTiM\177TED

AN

the multiple

ESTIMATED

AN

INTERPRETING

28.4.Y\177

4.0X\177

(st =

11.4) 02 =

(t\177, =

2.5)

(ta =

n L- b4X' 4

least

12.7Xa

squares

+ .84Xi

1.5)

(sa

14. I)

(s4 =

.76)

2.6)

(t4

.9)

(t5 =

1.1)

i13\177i5)

'

th, e reliability of t hle least


or hypOtheSis test.
The true'\177 effect of Xt on Y is the unknown population parameter fit;
we estimate
\177 with
the sample estimator b t. While
the
unknown
fi\177 is fixed,
our estimator'ib t is a random variable, differing
from
sample
to sampl e. The
propertiesof bt may be established, just as th e properties of b were established
in the
prewouis
chapter. Thus b\177may be ShOWn to be normal... again
provided
the sample si:':e is large, or the error
term is normal. bt can also be shown to
be unbiased, vith its mean fit. The magnitude of error involved
in estimation
is reflected in the standard
deviation
of bt which,
let us suppose, is estimated
to be st = I .4 as given in the first bracket below equation (13-155,and
shown
in Fig Ire 13-4. When
bt is standardized
with this estimated
standard
deviation,
it (/ill have a t distribution.
':

squares

ac

information

eted

fit, eliher

is used

often

estimate\177

in

a confidence

assessing

in

intervaI

To recapitulate: we don't know β₁; all we know is that whatever it may be, our estimator b₁ is distributed around it, as shown in Figure 13-4. This knowledge of how closely b₁ estimates β₁ can, of course, be "turned around" to infer a 95 percent confidence interval for β₁ from our observed sample b₁

FIG. 13-4 Distribution of the estimator b₁. (The estimated standard deviation of b₁ is 11.4; the true β₁ is unknown.)
as follows:

β₁ = b₁ ± t.₀₂₅s₁
   = 28.4 ± 2.09(11.4)
   = 28.4 ± 23.8    (13-16)

[n = 25 is the sample size, k = 5 is the number of parameters already estimated in (13-15), and t.₀₂₅ is the critical t value with n - k degrees of freedom.] Similar confidence intervals can be constructed for the other β's.

If we turn to testing hypotheses, extreme care is necessary to avoid very strange conclusions. Suppose it has been concluded on theoretical grounds that X₁ should positively influence Y, and we wish to see if we can statistically confirm this relation. This involves a one-tailed test of the null hypothesis

H₀: β₁ = 0

against the alternative

H₁: β₁ > 0

If H₀ is true, b₁ will be centered on β₁ = 0, and there will be only a 5% probability of observing a t value exceeding 1.72; this defines our rejection region in Figure 13-5a.
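The arithmetic in (13-16) and the 1.72 cutoff can be checked directly; a minimal sketch using scipy's t distribution with the 20 degrees of freedom computed above:

```python
from scipy.stats import t

b1, s1 = 28.4, 11.4
n, k = 25, 5
df = n - k                      # 20 degrees of freedom

t_025 = t.ppf(0.975, df)        # two-tailed 5% critical value, about 2.09
half_width = t_025 * s1         # about 23.8
ci = (b1 - half_width, b1 + half_width)

t_05 = t.ppf(0.95, df)          # one-tailed 5% critical value, about 1.72
reject = (b1 / s1) > t_05       # observed t = 2.5 falls in the rejection region
```

The same two calls give the critical values for any other confidence level or degrees of freedom.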
Our observed t value [2.5, as shown below equation (13-15)] falls in this region; hence we reject H₀, thus confirming (at a 5% significance level) that Y is positively related to X₁.
The similar t values [also shown for the other estimators below (13-15)] can be used for testing the null hypothesis on the other β parameters. As we see in Figure 13-5b, the null hypothesis β₂ = 0 can also be rejected, but a similar conclusion is not warranted for β₃ and β₄. We conclude therefore that
FIG. 13-5 (a) Test of β₁. (The rejection region lies beyond t.₀₅ = 1.72.) (b) Test of the other β's.

the results are "statistically significant" for X₁ and X₂; the evidence is that Y is related to each. But the results are not statistically significant for X₃ and X₄.

As long as we confine ourselves to rejecting hypotheses, as with β₁ and β₂, we won't encounter too much difficulty. But if we accept the null hypotheses about β₃ and β₄, we may run into a lot of trouble of the sort first encountered in Chapter 9. Since this is so important in regression analysis, the argument is reviewed for emphasis.

Suppose, for example, that although our t coefficient for X₃ (.9) is not statistically significant, there are strong prior theoretical grounds for believing that Y is

positively related to X₃. In (13-15) this belief is confirmed: Y is related to X₃ by a positive coefficient. Thus our statistical evidence is consistent with our prior belief (even though it is not as strong as we might like it to be).⁹ To accept the null hypothesis β₃ = 0, and conclude that X₃ doesn't affect Y, would be in direct contradiction to both our prior belief and the statistical evidence. We would be reversing a prior belief even though the statistical evidence weakly confirmed it. It would have been better had we not even looked at the evidence. And we note that this remains true as our t value becomes smaller and our statistical confirmation becomes weaker. Only if the statistical results contradict our prior belief (i.e., if the estimated coefficient is zero or negative, instead of positive) do the statistical results call that belief into question. It follows from this that if we had strong prior grounds for believing X₃ and X₄ to be positively related to Y, they should not be dropped from the estimating equation (13-15); they should be retained, with all the pertinent information on their t values.


It must be emphasized that those who have accepted hypotheses have not necessarily erred in this way. But that risk has been run by anyone who has mechanically accepted a null hypothesis because the t value was not statistically significant. The difficulty is especially acute when the null hypothesis was introduced strictly for convenience (because it was simple), and not because there is any reason to believe it in the first place. It becomes less acute if there is some expectation on theoretical grounds that H₀ is true, i.e., if there are prior grounds for concluding that Y and X are unrelated. Suppose for illustration that we expect a priori that H₀ is true; in such a case, a weak observed relationship (e.g., t = .6) would be in some conflict with our prior expectation of no relationship. But it is not a serious conflict, and easily explained by chance. Hence resolving it in favor of our prior expectation and continuing to use H₀ as a working hypothesis might be a reasonable judgment.

⁹ Perhaps because of too small a sample. Thus 12.7 may be a very accurate description of how Y is related to X₃; but our t value is not statistically significant because our sample is small, and the standard deviation of our estimator (s₃ = 14.1) is large as a consequence.

We conclude once again that classical statistical theory provides incomplete grounds for accepting H₀; acceptance must be based also on extra-statistical judgment, with prior belief playing a key role.

Prior belief plays a less critical role in the rejection of an hypothesis; but it is by no means irrelevant. Suppose, for example, that although you believed Y to be related to X₁, X₃, and X₄, you didn't really expect it to be related to X₂; someone had just suggested that you "try on" X₂ at a 5% level of significance. This means that if H₀ (no relation) is true, there is a 5% chance of ringing a false alarm. If this is the only variable "tried on," then this is a risk we can live with. However, if many such variables are "tried on" in a multiple regression, the chance of a false alarm increases dramatically.⁷ Of course, this risk can be kept small by reducing the level of error for each t test from 5% to 1% or less. This has led some authors to suggest a 1% level of significance for the variables just being "tried on," and a 5% level of significance for the other variables expected to affect Y. Using this criterion we would conclude that the relation of Y and X₁ is statistically significant; but the relation of Y to X₂ is not, despite its higher t value, because there are no prior grounds for believing it.⁸
To sum up: hypothesis tests require

1. Good judgment, and a good prior understanding of the theoretical model being tested;
2. An understanding of the assumptions and limitations of the statistical techniques.
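The compounding false-alarm risk from "trying on" many variables, each at the 5% level, is a one-line calculation; a minimal sketch:

```python
# If k independent t tests are each run at the 5% level and every null
# hypothesis is true, the chance of no false alarm at all is (.95)**k.
k = 10
p_no_error = 0.95 ** k          # about .60 for k = 10
p_false_alarm = 1 - p_no_error  # about .40
```

Tightening each test to the 1% level replaces .95 with .99 and shrinks the overall false-alarm probability accordingly.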

PROBLEMS

13-4 Suppose a multiple regression of Y on three independent variables, based on a sample of n = 30, yields the following estimate:

Y = 25.1 + 1.2X₁ + 1.0X₂ - 0.50X₃

Standard deviations:    (11.9)   (.060)   (1.5)    (1.3)
t-values:               (2.1)    ( )      ( )      ( )
95% confidence limits:  ( )      ( )      (±4.3)   ( )

⁷ Suppose, for simplicity, that the t tests for the significance of the several variables (say k of them) were independent. Then the probability of no error at all is (.95)ᵏ. For k = 10, this is .60, making the probability of some error (some false alarm) as high as .40.

⁸ Anyone who thinks he would never wish to use such a double standard might suppose that Y is the U.S. price level, X₁ is U.S. wages, and X₂ the number of rabbits in South Australia. With the t values shown in equation (13-15), what would he do?

(a) Fill in the blank spaces in the above estimate.
(b) The following statements are either true or false. If false, correct.
(1) The coefficient of X₁ is estimated to be 1.2. Other scientists might collect other samples and calculate other estimates. The distribution of these estimates would be centered around the true value of 1.2. Therefore the estimator is called unbiased.

(2) If there were strong prior reasons for believing that X₂ does influence Y, it is reasonable to reject the null hypothesis β₂ = 0 at the 5% level of significance.
(3) If there were strong prior reasons for believing that X₂ does not influence Y, it is reasonable to accept the null hypothesis β₂ = 0, rather than use the estimated coefficient 1.0.

13-6 DUMMY VARIABLES

There are two major categories of statistical information: cross section and time series. For example, econometricians estimating the consumption function sometimes use a detailed breakdown of the consumption of individuals at various income levels at one point in time (cross section); sometimes they examine how total consumption is related to national income over a number of time periods (time series); and sometimes they use a combination of the two. In this section we develop a method that is especially useful in analyzing cross-section data; as we shall see, it also has important applications in time series studies as well.
(a) Example

Suppose we wish to investigate how the public purchase of government bonds (B) is related to national income (Y). A hypothetical scatter of annual observations of these two variables is shown for Canada in Figure 13-6, and in Table 13-2. It is immediately evident that the relationship of bonds to income follows two distinct patterns: one applying in wartime (1940-5), the other in peacetime.

The normal relation of B to Y (say L₁) is subject to an upward shift (L₂) during wartime; the heavy bond purchases in those years are explained not by Y alone, but also by the patriotic wartime campaign to induce public bond purchases. B therefore should be related to Y and another variable, war (W). But this is only a categorical, or indicator, variable. It does not have a whole
(Footnote: i.e., how consumption expenditures are related to income.)

FIG. 13-6 Hypothetical scatter of public purchases of bonds (B) and national income (Y, in $ billions). (War years 1940-45 are marked; the fitted peacetime line is B = 1.26 + .68Y.)

range of values, but only two: on the one hand, we arbitrarily set its value at 1 for all wartime years; on the other hand, we set its value at 0 for all peacetime years. Since W is either "on" or "off," it is referred to as a "counter" or "dummy" variable. Our model is:

B = α + βY + γW + e    (13-19)

where

W = 1 for wartime years
  = 0 for peacetime years

This single equation is seen to be equivalent to the following two equations:

B = α + βY + γ + e    for wartime    (13-20)
B = α + βY + e        for peacetime    (13-21)

W may also be called a "switching" variable. With war and peace, we switch back and forth between (13-20) and (13-21). We note that γ represents the effect of wartime on bond sales, and β the effect of income changes. (The latter is assumed to remain the same in war or peace.) The important point to note is that one multiple regression of B on Y and W as in (13-19) will yield the two estimated lines shown in Figure 13-6; L₁ is the estimate of the peacetime function (13-21), and L₂ is the estimate of the wartime function (13-20).

Complete calculations for our example are set out in Table 13-3, and the procedure is interpreted in Figure 13-7. Since all observations are W = 0 or W = 1, the scatter is spread only in the two vertical planes π₁ and π₂. Estimation involves a multiple (least squares) regression fit of (13-19) to this


FIG. 13-7 Multiple regression with a dummy variable (W). (The fitted plane is B = 1.26 + .68Y + 2.43W; both fitted lines have slope .68.)

scatter. The resulting fitted plane

B = a + bY + cW    (13-22)

can be visualized as a plane resting on its two supporting buttresses π₁ and π₂. The slopes of L₁ and L₂ are (by assumption) equal¹⁰ to the estimated common value b, and c is the estimated wartime shift.
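The point that one regression yields both lines can be illustrated numerically; a minimal sketch with invented bond and income figures, where numpy's least squares routine stands in for the book's normal-equations calculation:

```python
import numpy as np

# Dummy variable regression B = a + b*Y + c*W, as in (13-19) and (13-22).
# Hypothetical data: peacetime years lie on one line, wartime years on the
# same line shifted upward. All numbers are invented for illustration.
rng = np.random.default_rng(1)
n = 20
Y = rng.uniform(3, 12, size=n)               # national income
W = (np.arange(n) < 6).astype(float)         # 1 in "wartime" years, else 0
B = 1.3 + 0.7 * Y + 2.4 * W + rng.normal(scale=0.2, size=n)

X = np.column_stack([np.ones(n), Y, W])
(a, b, c), *_ = np.linalg.lstsq(X, B, rcond=None)

# One regression gives both lines: peacetime B = a + b*Y,
# wartime B = (a + c) + b*Y, with the common slope b.
```

Reading off L₁ and L₂ from the single fitted plane is just the last comment: the two lines differ only by the estimated shift c.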

¹⁰ This restriction means that L₁ and L₂ are not independently fitted. In other words, our least squares plane (13-22) is fitted first; L₁ and L₂ are simply "read off" this plane. Thus L₁ does not represent a least squares fit to the left-hand scatter, nor does L₂ represent a least squares fit to the right-hand scatter.
Thus the dummy variable method of fitting a single multiple regression plane, and then reading off L₁ and L₂, can be compared to the alternative method of independently fitting two simple regression lines to the two scatters in Figure 13-7. Our model would be:

B = α₁ + β₁Y + e₁    for wartime
B = α₂ + β₂Y + e₂    for peacetime,

and the estimated slopes (β̂₁ and β̂₂) would generally not be the same. (cont'd)

In a dummy variable regression model, as in any regression problem, it is important to understand why both variables must be included. Even if our only interest is in B and Y, their relationship cannot be properly estimated unless W is explicitly taken into account. In other words, since experimental control over the "nuisance" variable W is not possible, its effects must be removed in the regression analysis. To ignore this variable is to invite a bias in our estimators, as well as an increased variance. To see how a bias occurs, consider what happens if the third dimension W is ignored. Geometrically, this involves projecting the three-dimensional scatter of Figure 13-7 onto the two-dimensional B-Y plane, as in Figure 13-8a.

This is immediately recognized as the same scatter as in Figure 13-6; we also reproduce from that diagram L₁ and L₂, our estimated multiple regression using W as a dummy variable. If we calculate L₃, the simple regression of B on Y, it clearly has too great a slope. This upward bias is due to the fact that war years tended to be high income years: thus higher bond sales that should be attributed in part to wartime would be (erroneously) attributed to higher income alone.

A similar error is involved in any investigation of B and W which ignores income; our scatter in Figure 13-7 would be projected onto the B-W plane, as in Figure 13-8b. With no income dimension, the only way to estimate the wartime effect is to look at the difference in the two means plotted in this diagram,¹¹ which is too large. This upward bias is due to the same cause: higher wartime bond sales that should be attributed in part to higher income are (erroneously) attributed to wartime alone.

This example has illustrated the general nature of dummy variables. The technique can be applied to a wide variety of problems, but one of the most useful applications is in removing seasonal shifts in time series data, as explained next.

¹⁰ (cont'd) Estimates of four parameters are required for this model, rather than the three in the dummy model (13-19); thus one advantage of the dummy model is that it conserves one extra degree of freedom. The disadvantage of the dummy model is that it requires an additional prior restriction: that the two slopes are equal. But this is not always a disadvantage. For instance, in our example it may be better to assume the two slopes equal than to independently fit a wartime function to only five observations. The very small wartime sample may yield a very unreliable estimate of slope, and it may make better sense to pool all the data to estimate one slope coefficient.

¹¹ This is equivalent to a simple regression of B on W. Because of the peculiar scatter involved, this regression line would pass through these two means; thus their difference represents the effect of W on B.
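The bias pictured in Figure 13-8a can be reproduced in a small simulation; all numbers below are invented, and the only point is that dropping W inflates the slope on Y whenever W and Y are positively related:

```python
import numpy as np

# Omitted variable bias: when the dummy W is dropped and W is positively
# correlated with Y, the simple regression of B on Y overstates the slope.
rng = np.random.default_rng(2)
n = 200
W = (rng.uniform(size=n) < 0.3).astype(float)
Y = 6 + 2.5 * W + rng.normal(size=n)          # "war" years are high-income years
B = 1.3 + 0.7 * Y + 2.4 * W + rng.normal(scale=0.3, size=n)

# Full (dummy variable) regression recovers the slope on Y:
X_full = np.column_stack([np.ones(n), Y, W])
b_full = np.linalg.lstsq(X_full, B, rcond=None)[0][1]

# Simple regression of B on Y alone gives a biased (larger) slope:
X_simple = np.column_stack([np.ones(n), Y])
b_simple = np.linalg.lstsq(X_simple, B, rcond=None)[0][1]
```

The gap between the two slope estimates is the bias; it grows with the correlation between W and Y and with the true wartime shift.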

FIG. 13-8 Error when one explanatory variable is ignored. (a) Biased estimate of slope (the effect of Y) because the categorical variable W is ignored: the simple regression of B on Y is B = 1.57 + .76Y, with biased slope (.76), while the dummy variable regression is B = 1.26 + .68Y + 2.43W, with unbiased slope (.68). (b) Biased estimate of the effect of W because the numerical variable Y is ignored: the difference between the wartime average of B (5.55) and the peacetime average gives a biased estimate (3.45) of γ, the effect of wartime on B, compared with the unbiased estimate (2.43) in part (a).
(b) Seasonal Adjustment

To illustrate, consider a spectacular example from real life. Suppose we wish to examine how department store sales of jewelry increase over time. When we plot quarterly sales (in Table 13-4) against time as in Figure 13-9a, we note how sales shoot up every fourth quarter because of Christmas. Since we are interested in the long-term secular increase in sales, these strange Christmas observations should be discounted. This calls for a dummy

variable¹² Q₄ for the fourth quarter, so that our model is

S = α + βT + β₄Q₄ + e    (13-23)

Even this model may not be adequate. If allowance should also be made for seasonal shifts in the other quarters, dummies Q₂ and Q₃ should also be added.

TABLE 13-4 Canadian Jewelry Sales (S) and Seasonal Dummies ($100,000's)

Year    T     S     Q₂    Q₃    Q₄
1957    1    24     0     0     0
        2    29     1     0     0
        3    29     0     1     0
        4    50     0     0     1
1958    5    24     0     0     0
        6    30     1     0     0
        7    29     0     1     0
        8    51     0     0     1
1959    9    26     0     0     0
       10    29     1     0     0
       11    30     0     1     0
       12    52     0     0     1
1960   13    25     0     0     0
       14    30     1     0     0
       15    29     0     1     0
       16    50     0     0     1

Source: Dominion Bureau of Statistics, Ottawa.

¹² There are three points in the analysis at which we might conclude that explicit account should be taken of seasonal swings. We may expect a strong seasonal influence from prior theoretical reasoning. Or, such an influence may be discovered after we plot the scatter. Finally, it may be discovered by examining residuals after the regression is fitted. Clearly those observations indicated by arrows (in Figure 13-9a) have consistently high residuals. To explain this, we look for something they have in common. Their common property is that they all occur in the fourth quarter. Hence the fourth quarter is introduced as a dummy regressor. This technique of "squeezing the residuals till they talk" is important in every kind of regression, not just time series; used with discretion, it indicates which further regressors may be introduced in order to reduce bias and residual variance.

FIG. 13-9 Secular growth in Canadian jewelry sales, with and without seasonal adjustment. (a) Inadequate simple regression of S on T alone: S = 31.4 + .31T (biased slope = .31). (b) Multiple regression of S on T, including seasonal adjustment: S = 24.2 + .075T + 25.5Q₄ + 4.4Q₂ + 4.7Q₃; the effect of T alone on S has unbiased slope = .075.

A dummy Q₁ is not needed for the first quarter, because Q₂, Q₃, and Q₄ measure the shift from a first quarter base. (Whether or not to include the various regressors Q₄, Q₃, Q₂ can be decided on statistical grounds, by testing for statistical significance. It is common to include them all in such a test, and reject or accept them as a group. But such a statistical test on data as extreme as ours would be superfluous.) Our modified model is now

S = α + β₁T + β₂Q₂ + β₃Q₃ + β₄Q₄ + e    (13-24)

The least squares fit to this model was calculated by a method similar to that of Table 13-3; equation system (13-4) was extended to a system of 5 estimating equations in the 5 unknowns. The least squares fit is graphed in Figure 13-9b. Notice that our seasonal adjustment is exactly the same every year; i.e., each year there is

the same upward shift (b₂) between the first and second quarters. (These seasonal shift coefficients need not always be positive, as in our example.)

By contrast, the simple regression of S on T without quarterly adjustment is graphed in Figure 13-9a. It is a poor fit, with large residual variance. Even worse, the calculated slope showing the relation of S to T is biased, for the same reasons as in the bond example of part (a).
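The mechanics of fitting a model of the form (13-24) can be sketched with synthetic quarterly data; the trend and seasonal shifts below are invented so that the fit can be checked against known values:

```python
import numpy as np

# Least squares fit of S = a + b1*T + b2*Q2 + b3*Q3 + b4*Q4, as in (13-24),
# on noise-free synthetic data with a known trend and a known Christmas spike.
T = np.arange(1, 17)                         # 16 quarters
quarter = (T - 1) % 4 + 1                    # 1, 2, 3, 4, 1, 2, ...
Q2 = (quarter == 2).astype(float)
Q3 = (quarter == 3).astype(float)
Q4 = (quarter == 4).astype(float)

S = 24 + 0.1 * T + 4 * Q2 + 4 * Q3 + 25 * Q4   # invented "sales"

X = np.column_stack([np.ones(16), T, Q2, Q3, Q4])
coef = np.linalg.lstsq(X, S, rcond=None)[0]     # [a, b1, b2, b3, b4]
```

Because the data are noise-free, least squares recovers the generating coefficients exactly; with real data the same call gives the estimated trend and seasonal shifts.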

(c) Seasonal Adjustment without Dummies (Moving Average)

Dummy variables are not the only means of seasonally adjusting a time series. Another common method is to take a moving average (over a whole year) of the time series, as shown in Table 13-5. Note how the wild seasonal swing of sales at Christmas is ironed out in this averaging process. The desired relation of sales to time can now be estimated by a simple regression of seasonally adjusted S' on T.

It is interesting to compare this method with the dummy variable alternative. An apparent disadvantage is that a total of three observations are lost at the beginning and end of the time series, in order to get the moving average started and finished. However, although it is less evident, the same loss is involved in using dummy variables, since three degrees of freedom are lost in estimating the shift coefficients β₂, β₃, and β₄.

An advantage of the moving average method is that it is not necessary to assume a constant seasonal shift; thus the adjustment for any quarter

TABLE 13-5 Moving Average

Time       S (Unadjusted)    S' (Adjusted by Four Quarter Moving Average)
'57  1          24
     2          29            ¼(24 + 29 + 29 + 50) = 33
     3          29            ¼(29 + 29 + 50 + 24) = 33
     4          50            ¼(29 + 50 + 24 + 30) = 33.25
'58  5          24            ¼(50 + 24 + 30 + 29) = 33.25
     6          30            ¼(24 + 30 + 29 + 51) = 33.5
     7          29            ...
     8          51

varies from year to year. The advantage of dummy variables is that both the seasonal shifts and the relation of S to T are estimated simultaneously in the same regression. (A moving average adjustment is only the first stage in a two-step process; only after it is completed can S' be regressed on T.) Another advantage is that the dummy coefficients (β₂, β₃, and β₄) give an index of the average seasonal shift, and tests of significance on them can be easily undertaken using standard procedures.
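The moving average column of Table 13-5 can be reproduced with a convolution:

```python
import numpy as np

# Four-quarter moving average of the first eight quarters of jewelry
# sales from Table 13-5 (24, 29, 29, 50, 24, 30, 29, 51).
S = np.array([24, 29, 29, 50, 24, 30, 29, 51], dtype=float)
S_adj = np.convolve(S, np.ones(4) / 4, mode="valid")
# S_adj: [33.0, 33.0, 33.25, 33.25, 33.5], matching the table
```

With mode="valid" the averaging window never runs off the ends of the series, which is exactly why observations are lost at the start and finish.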

PROBLEMS

13-5 (a) Using the seasonally adjusted sales of jewelry S' in Table 13-5, compute the simple regression of S' (adjusted) on T;
(b) Compute the simple regression of S (unadjusted) on T alone. Is this any better than (a)?
(c) Of the 2 slopes in (a) and (b):
(1) Which do you think better shows the time trend of sales?
(2) Which agrees more closely with the slope b₁ = .075 estimated by the multiple regression of S on T, including seasonal adjustment?

13-6 Referring to Figure 13-9, predict the jewelry sales in the 11th quarter; then predict the sales in the first quarter of 1961 (T = 17), and in the next 4th quarter.

13-7 Referring to Table 13-4, consider the jewelry sales in the first two years (the eight quarters of 1957-58). Supposing this were the only data available:
(a) Fit a simple regression line of S on T, without quarterly adjustment;
(b) Is your slope estimate (time trend) unbiased? Why?

13-8 Referring to Figures 13-6 and 13-8a, suppose the last 4 years are missing. If a simple regression of B on Y is calculated (ignoring W), will the bias of the slope be less or greater than before (when all the years were used)? Why?
REGRESSION, ANALYSIS OF VARIANCE, AND ANALYSIS OF COVARIANCE

(a) Regression with Dummies Equivalent to Analysis of Variance or Covariance

If all the independent variables are categorical (dummy) variables, then regression analysis is essentially the familiar analysis of variance (ANOVA). This can be proved in general; but it is more instructive to illustrate it in an applied analysis of the simplest case, one independent dummy variable.

In Problem 10-13 we tested whether the income (Y) of men and women differs. The data could alternatively have been analyzed with a regression model of the form

Y = α + βG + e    (13-25)

where

G = 0 for men
  = 1 for women

The regression analysis of this data is set out in Table 13-6. We find the estimate of the difference in group means to be the identical value (b = -8) that we found before; in both analyses the residual variance (48) is the same; and so is the test of the null hypothesis (√F = t). Hence the two procedures are seen to be identical.

Note also that our earlier example of explaining bond sales, as a regression on a numerical variable (income) and a dummy variable (wartime), could alternatively be described as a combination of standard regression and analysis of variance. Technically, this combination is referred to as analysis of covariance (ANOCOVA), although this term is often reserved for cases in which the effect of the dummy variable (wartime) is of prime interest and the other variable (income) is explicitly introduced only to remove its noise effects (i.e., to prevent the sort of error shown in Figure 13-8b).
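That a regression on a single 0/1 dummy reproduces the two-group comparison is easy to verify numerically; a minimal sketch with invented incomes:

```python
import numpy as np

# Regression on one 0/1 dummy, as in (13-25): the fitted slope equals the
# difference between the two group means, and the fitted intercept equals
# the mean of the G = 0 group. Incomes below are invented.
y0 = np.array([40.0, 44.0, 46.0, 50.0])    # group with G = 0
y1 = np.array([34.0, 38.0, 40.0, 32.0])    # group with G = 1

G = np.concatenate([np.zeros(4), np.ones(4)])
Y = np.concatenate([y0, y1])
X = np.column_stack([np.ones(8), G])
a, b = np.linalg.lstsq(X, Y, rcond=None)[0]

diff_in_means = y1.mean() - y0.mean()      # the same quantity as the slope b
```

These are exact algebraic identities of least squares, not approximations, which is the sense in which the regression and ANOVA procedures coincide.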

Another application of analysis of covariance might be the study of the effects of racial discrimination on income; here the major concern would be the effect on income of the dummy variable (negro versus white), with a simultaneous regression on other numerical variables (years of experience, education, etc.) simply a means of keeping these other influences from biasing the result.

Summary

Multiple regression is an extremely useful tool with many broad applications. We distinguish three cases, according to the nature of the independent variables:

1. "Standard regression" is regression on numerical variables only.
2. ANOVA (analysis of variance) is equivalent to regression on categorical (dummy) variables only.
3. ANOCOVA (analysis of covariance) is regression on both categorical and numerical variables.
These three techniques are compared using the hypothetical data of Figures 13-10 to 13-13, which show the possible ways that mortality may be analyzed. Figure 13-10 shows a sample of observations of how age affects the mortality of American men. Applying standard regression, if we reject the hypothesis that the true slope β = 0, we would conclude that age does affect the mortality rate; in the process we derive a useful estimate b of how age affects mortality.

If the data is collected into three groups, we come up with the scatter shown in Figure 13-11. Note that this is exactly the same set of mortality observations as in Figure 13-10. The only difference is that we are no longer specific about the age (X) variable. Now ANOVA can be applied to this data to test whether the population means of these three scatters are equal. Once again, the conclusion is that age affects mortality. However, ANOVA does not tell us how age affects mortality, unless we extend it to multiple comparisons. Moreover, multiple comparisons will yield a whole complicated table, whereas standard regression provides a single descriptive number (b) showing how age affects mortality.

So long as X is numerical, as in Figures 13-10 and 13-11, we conclude that standard regression is generally the preferred technique.¹⁴ But when X is categorical, it cannot be applied. For example, in Figure 13-12 we graph observations of mortality (Y) on nationality; our X variable ranges over various categories (American, British, etc.), and there is no natural way of placing these on a numerical scale, or even ordering them. Hence standard regression is out of the question,¹⁵ and ANOVA must be used.

If mortality is dependent on income as well as nationality, the analysis of covariance shown in Figure 13-13 is appropriate. This uses nationality dummies, with the numerical variable income explicitly introduced to eliminate the error that it might otherwise cause. We confirm that this has greatly improved our analysis. Whereas it appeared in Figure 13-12 that a national characteristic of the British was a lower mortality rate than the Chinese, we see in Figure 13-13 that it is not so simple. The heights of the fitted planes for China and the United Kingdom are practically the same. The lower U.K. mortality rate is explained solely by higher income.

¹⁴ Standard regression could also be applied, with a line fitted to the grouped scatter in Figure 13-11. However, if this technique is to be applied, it is more efficient to use the ungrouped data of Figure 13-10.

¹⁵ In Figures 13-12 and 13-13, all samples are assumed drawn from a single age group; we consider only the other factors influencing mortality. To confirm that standard regression cannot be applied, the student will note that a standard linear regression line fitted to the scatter in Figure 13-12 will yield b ≈ 0 and the conclusion that nationality does not matter. Yet if China is graphed last, rather than first, b ≠ 0 and it would be concluded that nationality does matter. Thus, the conclusion depends on the arbitrary ordering of our nationality variable.
FIG. 13-10 "Standard regression," since X (age) is numerical.

FIG. 13-11 X is grouped into age classifications (younger, middle, older), and ANOVA may be used.

FIG. 13-12 X (nationality: U.K., U.S.S.R., China, U.S.) is categorical, and ANOVA must be used.

FIG. 13-13 Analysis of covariance: nationality dummies, with per capita income as the numerical variable.
In summary, standard regression is the more powerful tool whenever the independent variable is numerical and the dependence of Y on X can be described by a simple function. Analysis of variance is appropriate if the independent variable X is a set of unordered categories. Analysis of covariance is a more powerful tool for handling both a categorical variable (nationality) and a numerical variable (income).

PROBLEMS

13-9 Construct a confidence interval for β using the data in Table 13-6. Compare with the answer to Problem 10-3b.

13-10 Using the data in Problem 10-2, estimate the regression of yield on fertilizer type, using two dummies. Is the result the same? Compare this answer with your answer to Problem 10-2.
13-11 The following is a sample of 6 cars, giving the results of a test of gas consumption:

            Miles Per Gallon    Engine Horsepower
Make A            21                  210
                  18                  240
                  15                  310
Make B            20                  220
                  18                  260
                  15                  320

(a) Determine the difference in the performance (miles per gallon) of the two makes, allowing for horsepower differences.
(b) Graph your results as in Figure 13-13.
284
(13-12)

REGRESSION

MULTIPLE

(a) Based

on the

how education

analysis of
father's income

use the

information,

sample

following

describe

to

covariance

is related to

and placeof residence.


(b)

Graph

your

results.

Years of

Formal
(E)

Education

Urban Sample

Sample

(?)

15

58,000

18

11,000

12
Rural

Income

Father's

9,000

16

12,000

13

S5,000

10

3,000

11

6,000

14

10,000

chapter 14

Correlation

14-1 SIMPLE CORRELATION

Regression analysis showed us how variables are linearly related; correlation analysis will show us the degree to which variables are linearly related. In regression analysis, a whole mathematical function is estimated (the regression equation); but correlation analysis yields only one number: an index designed to give an immediate picture of how closely two variables move together. In correlation analysis, we need not worry about cause and effect relations. Correlation between X and Y can be estimated regardless of whether: (a) X affects Y, or vice versa; (b) both affect each other; or (c) neither directly affects the other, but they move together because some third variable influences both. Although correlation is a less powerful technique than regression, the two are so closely related mathematically that correlation often becomes a useful aid in regression analysis.

(a) The Population Correlation Coefficient ρ (rho)

In equation (5-22) we have already defined a useful index of how two variables move together: σ_XY, the covariance of X and Y. The random variables used there were deviations from the mean:

    (X − μ_X)  and  (Y − μ_Y)    (14-1)

It will be useful to express these deviations in terms of fully standardized units; i.e., to define the new variables

    (X − μ_X)/σ_X  and  (Y − μ_Y)/σ_Y    (14-2)

Correlation ρ_XY is similar to covariance σ_XY in (5-22), the only difference being that the fully standardized variables in (14-2) replace those in (14-1). Thus

    ρ_XY = σ_XY / (σ_X σ_Y)    (14-3)

This will be interpreted in Section 14-1(c) below; for now we turn our attention to r, the sample correlation coefficient used to estimate this (generally) unknown population ρ.

(b) The Sample Correlation Coefficient

By analogy with (14-3), we define the sample correlation

    r_XY = Σ x_i y_i / [(n − 1) s_X s_Y]    (14-4)

Now consider an intuitive development of this index (because of the similarity of the two concepts, some of this interpretation will closely parallel the development of covariance in Section 5-3). As our example, we use the marks on a verbal (Y) and mathematical (X) test scored by a sample of eight college students. Each student's performance is represented by a dot on the scatter shown in Figure 14-1a; this information is set out in the first two columns of Table 14-1.

Since we are after a single measure of how closely the two variables are related, our index should be independent of our choice of origin. So we shift both axes in Figure 14-1b, with both x and y now defined as deviations from the mean; i.e.,

    x = X − X̄  and  y = Y − Ȳ    (14-5)

Values of the translated variables are shown in columns 3 and 4 of Table 14-1.

FIG. 14-1 Scatter of math and verbal scores. (a) Original observations (X̄ = 60, Ȳ = 50). (b) Shift axes: x = X − X̄, y = Y − Ȳ. (c) Change scale of axes to standard units.

Suppose we multiply the x and y coordinates for each student, and sum these products: Σxy. This gives us a good measure of how math and verbal results move together. Whenever an observation such as P₁ falls in the first quadrant of Figure 14-1b, both its x and y coordinates will be positive, yielding a positive product xy. This also holds true for an observation such as P₂ in the third quadrant, with both coordinates negative. If X and Y move together, most observations will fall in the first and third quadrants; consequently the sum Σxy will be a large positive value, a reflection of the positive relationship between X and Y. But if X and Y are negatively related (one rises when the other falls), the original scatter will run downhill rather than uphill; most observations will fall in the second or fourth quadrant, each with a negative product xy, and the sum Σxy will be a negative value.

We conclude that as an index of correlation, Σxy at least carries the right sign. Moreover, when there is no relationship between X and Y, and our observations are distributed evenly over the four quadrants, positive and negative terms will cancel, and this index will be zero.
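The sign behavior of this index is easy to verify numerically; a small sketch (the data here are our own illustrative numbers, not the text's):

```python
import numpy as np

def sum_xy(X, Y):
    # The raw index discussed above: sum of products of deviations from the means.
    x = X - X.mean()
    y = Y - Y.mean()
    return float(np.sum(x * y))

uphill   = sum_xy(np.array([1., 2., 3., 4.]), np.array([2., 3., 5., 6.]))  # move together
downhill = sum_xy(np.array([1., 2., 3., 4.]), np.array([6., 5., 3., 2.]))  # one rises, other falls
balanced = sum_xy(np.array([1., 1., 3., 3.]), np.array([1., 3., 1., 3.]))  # spread evenly over quadrants
print(uphill, downhill, balanced)   # positive, negative, zero
```

The three cases reproduce the three conclusions of the argument above: a positive sum for an uphill scatter, a negative sum for a downhill one, and exact cancellation when the points are spread evenly over the four quadrants.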

There are just two ways that Σxy can be improved. First, it depends on the units in which x and y are measured. (Suppose the math test had been marked out of 50 instead of 100; x values would be only half as large, even though the degree to which verbal and mathematical performance are related would not have changed.) This difficulty is avoided by measuring both x and y in terms of standard units; i.e., both are divided by their observed standard deviations, where

    s_X = √[Σ x_i² / (n − 1)]    (14-6)

    s_Y = √[Σ y_i² / (n − 1)]    (14-7)

This step is shown in Figure 14-1c.

Our new index Σ (x_i/s_X)(y_i/s_Y) has only one remaining flaw: it is dependent on sample size. (Suppose we observed exactly the same sort of scatter from a sample of double the size; our index would also double, even though the degree to which these variables move together is the same.) To avoid this problem, we divide by the sample size n, or rather n − 1, the divisor in (14-6) and (14-7). This yields the sample correlation coefficient:

    r = [1/(n − 1)] Σ (x_i/s_X)(y_i/s_Y)    (14-8)

which is recognized to be our definition (14-4). r may be expressed in terms of the original observations (X_i, Y_i) by substituting (14-6) and (14-7) into (14-8) and cancelling (n − 1):

    r = Σ(X_i − X̄)(Y_i − Ȳ) / √[Σ(X_i − X̄)² Σ(Y_i − Ȳ)²]    (14-9)

FIG. 14-2 Scatter diagrams and their associated sample correlation coefficients r (panels a to f).

Example. The data in Table 14-1 are applied to (14-9) to calculate the correlation coefficient between the math and verbal scores of our sample of eight students:

    r = 654 / √((1304)(836)) = .62    (14-10)
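The arithmetic of (14-10) is easy to reproduce from the summary figures of Table 14-1; a quick check, which also confirms that definitions (14-4) and (14-9) agree:

```python
import math

# Summary figures from Table 14-1 (x and y are deviations from the means):
sum_xy, sum_xx, sum_yy, n = 654.0, 1304.0, 836.0, 8

# Equation (14-9):
r = sum_xy / math.sqrt(sum_xx * sum_yy)

# Equation (14-4), via the standard deviations with divisor n - 1:
s_x = math.sqrt(sum_xx / (n - 1))
s_y = math.sqrt(sum_yy / (n - 1))
r_alt = sum_xy / ((n - 1) * s_x * s_y)

print(round(r, 3), round(r_alt, 3))   # both 0.626; the text quotes .62
```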

Some idea of how r behaves is given in Figure 14-2; especially note diagram b. When there is a perfect linear association, the product of the coordinates in every case is positive; thus their sum (and the resulting coefficient of correlation) is as large as possible. The same argument holds for the perfect inverse relation of Y and X shown in diagram d. This suggests that r has an upper limit of +1 and a lower limit of −1. (This is proved in Section (f) below.)

Finally, compare diagrams e and f. Our calculation of r in either case is zero, because positive products of the coordinates are offset by negative ones. Yet when we examine the two scatters, no relation between X and Y is confirmed in e, but a strong relation is evident in f; in this case a knowledge of X will tell us a great deal about Y. A zero value for r therefore does not imply "no relation"; rather, it means "no linear relation." Thus correlation is a measure of linear relation only; it is of no use in describing nonlinear relations.

This brings us to the next critical question: "In calculating r, what can we infer about the underlying population ρ?"

(c) Inference from r to ρ

Before we can draw any statistical inference about ρ from our sample statistic r, we must clarify our assumptions about the parent population from which our sample was drawn. In our example, this would be the math and verbal marks scored by all college entrants. This population might, of course, be graphed as a scatter, as in Figure 14-3, except that many more dots would appear, each representing another student. If we subdivide both X and Y into class intervals, the area will be divided up in a checkerboard pattern; from the relative frequency (sampling probability) in each of the squares, the histogram in Figure 14-4 is constructed.¹

FIG. 14-3 Bivariate population scattergram (math and verbal scores).

FIG. 14-4 Bivariate population histogram.

For a very large population, this histogram would have approximately the shape of the probability density shown in Figure 14-5. To conclude: in examining a random student, neither his math score X nor his verbal score Y is predetermined; both are random variables. Compare this with our example in Chapter 11, where one variable (fertilizer) was predetermined.

The distribution in Figure 14-5 is called "bivariate normal." This means that the conditional distribution of X or of Y is always normal. Specifically, if we slice the surface at any value of Y (say Y₀), the shape of the resulting cross section is normal. Similarly, if we select any X value (say X₀) and slice the surface in this other direction, the resulting cross section is also normal.

It is worthwhile pausing briefly to consider the alternative way that the bivariate normal population, shown in three dimensions in Figure 14-5, can be graphed in two dimensions. Instead of slicing the surface vertically as we did in that diagram, slice it horizontally as in Figure 14-6. The resulting cross section is an ellipse, representing the set of all X, Y combinations associated with the same probability density. This "isoprobability" ellipse is graphed in X, Y space in Figure 14-7, along with the ellipses defined when the surface is sliced horizontally at higher and lower levels; it will also be useful to mark in d, the major axis common to all these ellipses. (Once again, many social scientists will recognize this as the familiar strategy of forcing a three-dimensional function into a two-dimensional space, by showing isoquants, isobars, or whatever.) Several examples of populations and their correlation coefficients ρ are shown in Figure 14-8; note how the bivariate normal distribution concentrates about its major axis as ρ increases.

FIG. 14-5 Bivariate normal probability density.

FIG. 14-6 An isoprobability ellipse from a bivariate normal surface.

FIG. 14-7 The bivariate normal distribution shown as a set of isoprobability ellipses.

FIG. 14-8 Examples of population correlations.

Provided that the parent population is bivariate normal, inferences about ρ can easily be made from a sample correlation r. Recall the inferences about π from P in Chapter 8. Using the same reasoning that established Figure 8-4, Figure 14-9 is constructed. Thus from any sample r, a 95% confidence interval for the population ρ can be found. For example, if a sample of 25 students has r = .80, the 95% confidence interval for ρ is read vertically as

    .58 < ρ < .90    (14-11)

FIG. 14-9 95% confidence bands for correlation ρ in a bivariate normal population, for various sample sizes n. (This chart is reproduced with the permission of Professor E. S. Pearson from F. N. David, Tables of the Ordinates and Probability Integral of the Distribution of the Correlation Coefficient in Small Samples, Cambridge University Press, 1938.)

Because of space limitations, we shall concentrate in the balance of this chapter on sample correlations, and ignore the corresponding population correlations. But each time a sample correlation is introduced, it should be recognized that an equivalent population correlation is defined similarly, and inferences may be made about it from the sample correlation.

¹ Our example is of a finite population, but a similar argument would apply for an infinite population. Moreover, instead of using heights for probabilities, we could use dots of different sizes; see Figure 5-4a.
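When the chart of Figure 14-9 is not at hand, an interval like (14-11) can be approximated with Fisher's z-transformation, a standard large-sample device that this text does not develop; a sketch:

```python
import math

def rho_interval(r, n, z_crit=1.96):
    # Fisher z-transformation: artanh(r) is approximately normal with
    # standard error 1/sqrt(n - 3); transform the interval back with tanh.
    z = math.atanh(r)
    half = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half), math.tanh(z + half)

lo, hi = rho_interval(0.80, 25)
print(round(lo, 2), round(hi, 2))   # roughly .59 to .91, close to the chart's .58 < rho < .90
```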

PROBLEMS

14-1
    Son's Height    Father's Height
      (inches)         (inches)
         68               64
         66               66
         72               71
         73               70
         66               69

From the above random sample of 5 father-and-son heights, find:
(a) The sample correlation r;
(b) The 95% confidence interval for the population correlation ρ;
(c) At the 5% significance level, can you reject the hypothesis that ρ = 0?

14-2 From the following sample of student grades:

    First Test X    Second Test Y
        80               90
        60               70
        40               40
        30               40
        40               60

(a) Calculate r, and find a 95% confidence interval for ρ;
(b) Calculate the regression of Y on X, and find a 95% confidence interval for β;
(c) Graph the 5 data points and the estimated regression line;
(d) At the 5% significance level, can you reject
    (1) The null hypothesis ρ = 0?
    (2) The null hypothesis β = 0?

(d) Correlation and Regression

If regression and correlation analysis were both applied to the same scatter of math (X) and verbal (Y) scores, how would they be related? Specifically, consider the relation between the estimated correlation r and the estimated regression slope b. In Problem 11-4(b) it was confirmed that

    b = Σxy / Σx²    (14-12)

and from (14-9), noting that both x and y are defined as deviations,

    r = Σxy / √(Σx² Σy²)    (14-13)

When (14-12) is divided by (14-13),

    b/r = √(Σy² / Σx²)    (14-14)

If we divide both the numerator and denominator inside the square root sign by n − 1,

    b/r = √[Σy²/(n − 1)] / √[Σx²/(n − 1)] = s_Y / s_X    (14-15)

or

    b = r (s_Y / s_X)    (14-16)

This close correspondence of b and r will play an important role in the argument later. Note that if either r or b is zero, the other will also be zero.
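Identity (14-16) can be verified on the text's math-verbal summary figures (Σxy = 654, Σx² = 1304, Σy² = 836, from Table 14-1); a quick check:

```python
import math

# Summary figures for the math (x) and verbal (y) deviations, Table 14-1:
sum_xy, sum_xx, sum_yy = 654.0, 1304.0, 836.0

b = sum_xy / sum_xx                           # slope, equation (14-12)
r = sum_xy / math.sqrt(sum_xx * sum_yy)       # correlation, equation (14-13)
sy_over_sx = math.sqrt(sum_yy / sum_xx)       # s_Y/s_X; the (n - 1)'s cancel
print(round(b, 3), round(r * sy_over_sx, 3))  # the two agree, as (14-16) claims
```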

(e) Explained and Unexplained Variation

In Figure 14-10 we reproduce our sample of math (X) and verbal (Y) scores, along with the regression of Y on X, fitted in a straightforward way from the information set out in Table 14-1. Now, if we wished to predict a student's verbal score (Y) without knowing X, then the best prediction would be the average observed value Ȳ. At xᵢ, it is clear from this diagram that we would make a very large error, namely (Yᵢ − Ȳ), the deviation of Yᵢ from its mean. However, once our regression equation has been calculated, we predict Y to be Ŷᵢ. Note how this reduces our error, since the large part of our deviation, (Ŷᵢ − Ȳ), is now "explained." This leaves only a relatively small "unexplained" deviation (Yᵢ − Ŷᵢ).

FIG. 14-10 The value of regression in reducing unexplained variation in Y. (Yᵢ − Ȳ = total deviation; Ŷᵢ − Ȳ = deviation explained by regression; Yᵢ − Ŷᵢ = deviation not explained by regression.)

The total deviation of Y is the sum:

    (Yᵢ − Ȳ) = (Ŷᵢ − Ȳ) + (Yᵢ − Ŷᵢ)    (14-17)
    total deviation = explained deviation + unexplained deviation

It follows that, summing over all observations,

    Σ(Yᵢ − Ȳ) = Σ(Ŷᵢ − Ȳ) + Σ(Yᵢ − Ŷᵢ)    (14-18)

What is surprising is that this same equality holds when these deviations are squared,² i.e.,

    Σ(Yᵢ − Ȳ)² = Σ(Ŷᵢ − Ȳ)² + Σ(Yᵢ − Ŷᵢ)²    (14-19)

or,

    total variation = explained variation + unexplained variation

where variation is defined as the sum of squared deviations; recall (10-16). Since the estimated regression, according to Problem 11-4(a), is

    Ŷᵢ = Ȳ + bxᵢ    (14-21)

it is often convenient to rewrite the explained variation in (14-19) as

    Σ(Ŷᵢ − Ȳ)² = Σ(bxᵢ)² = b² Σxᵢ²    (14-22)

This makes explicit the fact that the explained variation is the variation in Y accounted for by the estimated regression coefficient b.

This procedure of decomposing total variation and analyzing its components is called "analysis of variance applied to regression." The components of variance are displayed in ANOVA Table 14-2, similar to Table 10-6.³

² For proof, square both sides of (14-17), and sum over all values of i:

    Σ(Yᵢ − Ȳ)² = Σ[(Ŷᵢ − Ȳ) + (Yᵢ − Ŷᵢ)]² = Σ(Ŷᵢ − Ȳ)² + Σ(Yᵢ − Ŷᵢ)² + 2Σ(Ŷᵢ − Ȳ)(Yᵢ − Ŷᵢ)    (14-20)

Using (14-21), the last term can be rewritten as 2b Σxᵢ(Yᵢ − Ŷᵢ). But this sum vanishes: in fact, it was set equal to zero in the normal equation (11-15) used to estimate our regression line. Thus the last term in (14-20) disappears, and (14-19) is proved. This same theorem can similarly be proved in the general case of multiple regression. A further justification of the least squares technique (not mentioned in Chapter 11) is that it results in this useful relation between explained, unexplained, and total variation.

³ And also noting that our terminology for degrees of freedom has changed; e.g., the total number of sample observations is now designated simply as n, rather than nr.

TABLE 14-2
(a) General ANOVA Table for Linear Regression

    Source of Variation          Variation                  d.f.      Variance
    Explained (by regression)    Σ(Ŷᵢ − Ȳ)², or b²Σxᵢ²       1        Σ(Ŷᵢ − Ȳ)²/1
    Unexplained (residual)       Σ(Yᵢ − Ŷᵢ)²               n − 2      s² = Σ(Yᵢ − Ŷᵢ)²/(n − 2)
    Total                        Σ(Yᵢ − Ȳ)²                n − 1

(b) For Sample of Verbal and Math Scores (Table 14-1)

    Source of Variation          Variation    d.f.    Variance
    Explained (by regression)       328         1       328
    Unexplained (residual)          508         6       84.7
    Total                           836         7

From this, a null hypothesis test may now be constructed; as before, the question is whether the ratio of explained to unexplained variance is sufficiently greater than 1 to reject the null hypothesis that Y is unrelated to X. Specifically, a test of the hypothesis

    H₀: β = 0    (14-23)

involves forming the ratio

    F = explained variance / unexplained variance = b²Σxᵢ² / s²    (14-24)

A 5% significance test involves finding the critical value of F which leaves 5% of the distribution in the right-hand tail. If the F value calculated from (14-24) exceeds this critical value, reject the hypothesis (14-23). We must emphasize that this is just an alternate way of testing the null hypothesis; the first method, finding a confidence interval for β using the t distribution (Section 12-7), is usually preferable.

Note that the F and t distributions are related, in general, by

    t² = F    (12-36)

where F has one degree of freedom in the numerator. Since the F calculated in (14-24) is just the square of the calculated t, the F test of this section is justified.

Example

In Table 14-2(b) the ANOVA calculations are presented for our verbal and math score example. (The necessary computational details are shown on the bottom of Table 14-1.) To test β = 0, (14-24) is evaluated:

    F = 328 / 84.7 = 3.87    (14-25)

Since this falls short of the critical 5% point of 5.99, we do not reject the null hypothesis.

The same test of β = 0 could be done equivalently using (12-36):

    t = b / (s/√Σx²) = .50 / (9.2/√1304) = 1.97

Since this falls short of 2.45 (the critical value leaving a total of 5% in both tails of the t distribution), the null hypothesis is not rejected. Since t² = F (both for the calculated and for the critical values), the same conclusion must follow from both tests.

Alternatively, a 95% confidence interval for β could be constructed from (12-30):

    β = .50 ± (2.45)(.254)
      = .50 ± .62

This includes the value β = 0, once more confirming that H₀ cannot be rejected. (Of course, this inconclusive result may be partly due to the smallness of the sample.)
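The whole of Table 14-2(b) and the two test statistics reduce to a few lines of arithmetic on the summary figures of Table 14-1 (the critical values 5.99 and 2.45 must still be looked up in the F and t tables):

```python
import math

# Summary figures for the eight students (Table 14-1):
sum_xy, sum_xx, sum_yy, n = 654.0, 1304.0, 836.0, 8

b = sum_xy / sum_xx                    # estimated slope (about .50)
explained = b * b * sum_xx             # explained variation, equation (14-22)
unexplained = sum_yy - explained       # by the decomposition (14-19)
s2 = unexplained / (n - 2)             # residual variance, d.f. = n - 2

F = explained / s2                     # equation (14-24)
t = b / math.sqrt(s2 / sum_xx)         # the equivalent t statistic
print(round(F, 2), round(t, 2))        # 3.87 and 1.97, matching (14-25); t*t equals F
```

Note also that explained/sum_yy ≈ .39 is exactly r², anticipating equation (14-29) in the next subsection.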

(f) Interpretation of Correlation

These variations in Y are now related to r. It follows from (14-14) that

    b = r √(Σy² / Σx²)    (14-26)

Substituting this value for b into (14-22), the explained variation is

    b² Σx² = r² Σy²    (14-27)

Noting that Σy² = Σ(Yᵢ − Ȳ)² is by definition the total variation in (14-19), the solution for r² is

    r² = Σ(Ŷᵢ − Ȳ)² / Σ(Yᵢ − Ȳ)²    (14-28)

Finally, we reexpress this as

    r² = explained variation of Y / total variation of Y    (14-29)

This equation provides a clear intuitive interpretation of r², the square of the correlation coefficient r (r² is often called the coefficient of determination). It is the proportion of the total variation in Y explained by fitting the regression. Since the numerator cannot exceed the denominator, the maximum value of the right-hand side of (14-29) is 1; hence the limits on r are ±1. These two limits were illustrated in Figure 14-2: in part (b), r = +1 and all observations lie on a straight line running uphill; in part (d), r = −1, and this perfect inverse correlation reflects the fact that all observations lie on a straight line running downhill. In either case, a perfect fit means that the regression explains all the variation in Y. At the other extreme, when b = 0 (and r² = 0), the regression explains nothing. Thus r = 0 and b = 0 are seen to be equivalent ways of formally stating that no linear relation between X and Y is observed.

(g) Regression Analysis Applied to a Bivariate Normal Population

In Figure 14-1a our regression was calculated from sample values. We now ask: "Is the b calculated from these sample values an estimator of a true population regression; i.e., for a bivariate normal population, does there exist a true regression line of Y on X?" It will now be shown that the answer is yes.

Our assumed bivariate normal population of X and Y is shown in Figure 14-11 as a set of isoprobability ellipses, with major axis d. Now consider the straight line defined by joining points of vertical tangency such as P₁. Concentrating on the vertical slice through X₁, for example (the cross section P₁Q₁), we note that this slice defines a normal conditional distribution of Y.
FIG. 14-11 Two regression lines (Y = α + βX and X = α* + β*Y) found from the isoprobability ellipses.

The mean of these Y values occurs at the point of tangency P₁; at this point our vertical line touches its highest isoprobability ellipse, and the highest point on any normal curve is at the mean. Thus we see that the means of the Y populations lie on the straight line Y = α + βX. Next, the variance of the Y populations can be shown to be constant.⁴ Thus the assumptions of the regression model (12-2) are satisfied by a bivariate normal (correlation) population. The line Y = α + βX may therefore be regarded as a true linear regression of Y on X.

Thus if we know a student's math score and we wish to predict his verbal score, this regression line would be appropriate; if his math score were X₁, we would predict his verbal score to be P₁. It is important to fully understand why we would not predict Q₁; i.e., why we do not use the major axis of the ellipse (line d) for prediction, even though this represents "equivalent" performance on the two tests. Since this student is far above average in mathematics, an equivalent verbal score seems too optimistic a prediction. Recall that there is a large random element involved in performance; there are a lot of students who will do well in one exam, but only moderately well in the other. (Technically, ρ is less than 1 for this population.) Therefore, instead of predicting at Q₁, we predict his verbal score at P₁, a sort of average⁵ of "equivalent" performance Q₁ and "average" performance Ȳ. Whatever a student's score in math, there will be a tendency for his verbal score to "regress" toward mediocrity (i.e., the average).⁶ It is evident from Figure 14-11 that this is equally true for a student with a math score below average; in this case the predicted verbal score regresses upward toward the average. This is the origin of the term regression.

Another interesting observation is that the correlation coefficient between X and Y is unique (i.e., ρ_XY is identically ρ_YX); but there are two regressions, the regression of Y on X and the regression of X on Y. This is immediately evident if we ask how we would predict a student's math score (X) if we knew his verbal score (e.g., Y₀). Exactly the same argument holds: equivalent performance (point Q₀ on line d) is a bad predictor; since he has done very well in the verbal test, we would expect him to do less well in math, although still better than average. Thus, the best prediction is P₀ on the line X = α* + β*Y, the regression of X (math) on Y (verbal). This regression is defined by joining points of horizontal, rather than vertical, tangency. Each of these horizontal slices defines a normal conditional distribution of X, given Y; each of these distributions has the same variance, with its mean lying on this regression line, thus satisfying the conditions of a true regression of X on Y. This is the direct analogue of the regression of Y on X; hence our least squares values a* and b* are used to estimate α* and β*.

⁴ This may seem like a curious conclusion, since in Figure 14-5 the size of each cross-section slice differs depending on the value of X₀. However, each slice p(X₀, Y) must be adjusted by division by p(X₀) in order to define the conditional distribution of Y. Thus, recalling the argument in Section 5-1(c), and in particular equation (5-10), the conditional distribution is

    p(Y/X₀) = p(X₀, Y) / p(X₀)

In fact, this adjustment makes all the conditional distributions of Y "look alike," and thus have the same variance.

⁵ P₁ is in fact a weighted average of Q₁ and Ȳ, with weights depending on ρ. Thus in the limiting case in which ρ = 1, X and Y are perfectly correlated, and we would predict Y at Q₁. At the other limit, in which ρ = 0, we can learn nothing about likely performance on one test from the result of the other, and we would predict Y at Ȳ. But for all cases between these two limits, we predict using both Q₁ and Ȳ; and the greater the ρ, the more heavily Q₁ is weighted.

⁶ A classical case, encountered by Pearson & Lee (Biometrika, 1903), involved trying to predict a son's height from his father's height. If the father is a giant the son is likely to be tall; but there are good reasons for expecting him to be shorter than his father. (For example, how tall was his mother? And his grandparents? And so on.) So the prediction for the son was derived by "regressing" his father's height towards the population average.
FIG. 14-12 Regressions estimated from a sample of verbal and math scores.

Example

Our sample of eight students' scores shown in Table 14-1 was, by assumption, drawn from a bivariate normal population like the one shown in Figure 14-11. We have already estimated ρ with

    r = .62    (14-10) repeated

and estimated the regression of Y on X, Y = α + βX, with

    Ŷ = 50 + .50x    (14-30)
      = 20 + .50X    (14-31)

The coefficients in this simple regression are calculated using the estimating equations (11-13) and (11-16), and the calculations set out in Table 14-1. We now estimate the regression of X on Y, X = α* + β*Y. This involves using the same estimating equations and Table 14-1, taking care to interchange X and Y throughout. Thus

    X̂ = 60 + .78y    (14-32)
      = 60 + .78(Y − Ȳ) = 21 + .78Y    (14-33)

The two estimated regressions (14-31) and (14-33) are shown in Figure 14-12. Thus, for example, the predicted verbal score of a student with a math result of 90 is 65; and the predicted math score of a student with a verbal result of 30 is 44.4.
(h) When Correlation, When Regression?

Both the regression and correlation models require that Y be a random variable. But the two models differ in the assumptions they make about X. The standard regression model makes few assumptions about X; the standard correlation model of this chapter is more restrictive, requiring that X be a random variable, having with Y a bivariate normal distribution. We therefore conclude that the regression model has wider application. Regression may be used, for example, to describe the fertilizer-yield problem in Chapter 11, where X was fixed, or the bivariate normal population of X and Y in this chapter; the standard correlation model describes only the latter. (It is true that r² can be calculated even when X is fixed, as an indication of how effectively regression reduces variation; but in this case inferences about ρ in Figure 14-9 cannot be used.)

In addition, regression answers more interesting questions. Like correlation, it indicates if two variables move together; but it also estimates how. Moreover, a key issue in correlation analysis, the test of the null hypothesis

    H₀: ρ = 0    (14-34)

can be answered directly from regression analysis by testing the equivalent null hypothesis

    H₀: β = 0    (14-35)

Thus rejection of β = 0 implies rejection of ρ = 0, and the conclusion that correlation does exist between X and Y. If this is the only correlation question, then it can be answered by the regression test (14-35), and there is no need to introduce correlation analysis at all. Since regression answers a broader and more interesting set of questions (and some correlation questions as well), it becomes the preferred technique; correlation is useful primarily as an aid to understanding regression, and as an auxiliary tool.
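The equivalence of (14-34) and (14-35) can be seen in the test statistics themselves: the t statistic for the slope b can be rewritten purely in terms of r as t = r√(n−2)/√(1−r²), a standard identity not derived in this text. A numerical check on the eight-student sample:

```python
import math

# Eight-student summary figures (Table 14-1):
sum_xy, sum_xx, sum_yy, n = 654.0, 1304.0, 836.0, 8

# t statistic computed from the regression slope b:
b = sum_xy / sum_xx
s2 = (sum_yy - b * sum_xy) / (n - 2)        # residual variance
t_slope = b / math.sqrt(s2 / sum_xx)

# The same statistic computed purely from the correlation r:
r = sum_xy / math.sqrt(sum_xx * sum_yy)
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

print(round(t_slope, 3), round(t_corr, 3))  # identical: testing beta = 0 tests rho = 0
```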

(i) "Nonsense" Correlations

In interpreting correlation, one must keep firmly in mind that no claim is made that this necessarily indicates cause and effect. For example, suppose that the correlation of teachers' salaries and the consumption of liquor over a period of years turns out to be .98. This would not prove that teachers drink; nor would it prove that liquor sales increase teachers' salaries. Instead, both variables moved together because both are influenced by a third variable: long-run growth in national income. If third factors of this kind could be kept constant, or fully discounted, then their correlation would become more meaningful; this is the objective of partial correlation in the next section.

Correlations such as the above are often called "nonsense" correlations. It would be more accurate to say that the observed mathematical correlation is real enough, but any naive inference of cause and effect is nonsense. Moreover, it should be recognized that the same charge can sometimes be leveled at conclusions drawn from regression analysis. For example, a regression applied to teachers' salaries and liquor sales would also yield a statistically significant b coefficient; any inference of cause and effect from this would still be nonsense.

Although correlation and regression cannot be used as proof of cause and effect, these techniques are very useful in two ways. First, they may provide further confirmation of a relation that theory tells us should exist (e.g., prices depend on wages). Second, they are often helpful in suggesting causal relations that were not previously suspected. For example, when cigarette smoking was found to be highly correlated with lung cancer, possible links between the two were investigated further. This included more correlation studies in which third factors were more rigidly controlled, as well as extra-statistical studies such as experiments with animals, and chemical theories.

PROBLEMS

14-3 For the following random sample of 5 shoes, find:
(a) The proportion of the variation in Y explained by regression on X.
(b) The proportion unexplained.
(c) Whether Y depends on X, at the 5% significance level. Answer this in three alternate ways: using the F test, the t test, and a 95% confidence interval.

Cost of Shoe X    Months of Wear Y
    10                  8
    15                 10
    10                 10
    20                 12
    20                 20
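A sketch of the computations for 14-3 (the tabulated shoe values are as reconstructed here and may differ from the original printing, but the relationships shown hold for any sample): parts (a) and (b) come from r², and for part (c) the F statistic is exactly the square of the t statistic, so the two tests must agree.

```python
import math

# Shoe data as printed above (values uncertain; any sample illustrates the identities)
x = [10, 15, 10, 20, 20]   # cost
y = [8, 10, 10, 12, 20]    # months of wear
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

r2 = sxy ** 2 / (sxx * syy)      # (a) proportion of variation explained
unexplained = 1 - r2             # (b) proportion unexplained

# (c) the t statistic for the slope, and the ANOVA-style F ratio
b = sxy / sxx
a = ybar - b * xbar
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s2 = sse / (n - 2)
t = b / math.sqrt(s2 / sxx)
F = r2 / (unexplained / (n - 2))  # explained variance over unexplained variance
print(r2, F, t ** 2)              # F equals t squared
```

The 95% confidence interval b ± t.₀₂₅ · SE(b) excludes zero exactly when the t test rejects, which is why the three answers in (c) must coincide.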

14-4 Suppose a bivariate normal distribution of scores is perfectly symmetric, with ρ = .50 and isoprobability ellipses as follows:

[Figure: isoprobability ellipses centered at (80, 80).]

True or False? If false, correct it.
(a) The regression curve of Y on X is a straight line.
(b) The regression line of Y on X has graph as follows:

Y = 80 + .5(X − 80)

(c) The variance of the residual Y values (after fitting X) is only 1/4 the variance of the original Y values.
(d) The proportion of the Y variation explained by X is 3/4.
(e) Thus …

14-5 Let b and b* be the sample regression slopes of Y on X, and X on Y, for any given scatter of points. True or False? If false, correct it.
(a) b = r (s_Y / s_X)
(b) b* = r (s_X / s_Y)
(c) b b* = r²
(d) If b > 1, then b* < 1 necessarily.
(e) If b < 1, then b* > 1 necessarily.

14-6 In the following graph of 4 students' marks, find geometrically (without doing any algebraic calculations):
(a) The regression line of Y on X.
(b) The regression line of X on Y.
(c) The correlation r (Hint. Problem 14-5c).
(d) The predicted Y-score of a student with X-score of 70.
(e) The predicted X-score of a student with Y-score of 70.

[Graph: grade Y (40 to 80) plotted against Term grade X (40 to 80), showing the 4 students' marks.]
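The key identity in Problem 14-5(c) can be checked numerically; with any scatter (the one below is invented), the product of the two sample slopes equals r².

```python
import math

# Any invented scatter will do; b * b_star = r^2 is an algebraic identity
x = [1.0, 2.0, 4.0, 5.0, 7.0]
y = [2.0, 4.0, 5.0, 4.0, 8.0]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b = sxy / sxx        # slope of the regression of Y on X
b_star = sxy / syy   # slope of the regression of X on Y
r = sxy / math.sqrt(sxx * syy)
print(b * b_star, r ** 2)   # equal
```

This is the fact exploited geometrically in 14-6(c): the two regression lines together determine r.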

14-2 PARTIAL CORRELATION

As soon as we move from the simple two-variable case to relations which involve more than two variables, complications arise. To illustrate, consider a simple three-variable example: suppose that yield of hay (Y) depends on spring temperature (X) and rainfall (Z). Following the techniques of Chapter 13, we could fit a regression plane to a scatter of observations of Y, X, and Z:

Y = a + bX + cZ    (14-36)

Recall how the multiple regression coefficient b is interpreted: it estimates how Y is related to X if Z were held constant. The partial correlation coefficient r_XY·Z is a similar concept. It estimates how X and Y move together if Z were held constant. (For convenience the variables Y, X, and Z are often defined as variables 1, 2, and 3; thus r_YX·Z becomes r_12·3, the partial correlation of the first two variables, when the third is assumed constant.)

While the previous section corresponds to the simple regression analysis of Chapter 12, the partial correlation analysis of this chapter corresponds to the multiple regression analysis of Chapter 13. Thus we could embark here on a whole chapter on partial correlation; but since we have argued in the previous section that correlation is relatively less important, we confine ourselves to a brief intuitive introduction to this concept, and how it may be used.
The following assumptions are generally made about the parent population: the joint distribution of X, Y, and Z is multivariate normal. This implies that for any value of Z, the conditional distribution of Y and X is bivariate normal, as shown in Figure 14-5. ρ_YX·Z is defined as the simple correlation of X and Y in this conditional joint distribution.

However, in computing its estimator r_YX·Z a problem arises. Since Z is a random variable, it is simply not possible to fix Z at one single value Z₀ and sample the corresponding conditional distribution of X and Y. Thus, unless the sample is extremely large, it is unlikely that more than a single Y, X, Z₀ combination involving Z₀ will be observed. The alternative is to compute r_YX·Z as the correlation of Y and X after the influence of Z has been removed from each.² The resulting partial correlation r_YX·Z can, after considerable manipulation, be expressed as the simple correlation of Y and X (r_YX), adjusted by applying the two simple correlations involving Z (namely r_XZ and r_YZ) as follows:

r_YX·Z = (r_YX − r_XZ r_YZ) / √[(1 − r_XZ²)(1 − r_YZ²)]    (14-40)

This formula shows explicitly that there need be no close correspondence between the partial and the simple correlation coefficient; however, in the special case that both X and Y are completely uncorrelated with Z (i.e., r_XZ = r_YZ = 0), then (14-40) reduces to:

r_YX·Z = r_YX    (14-41)

and, as we would expect, the partial and simple correlation coefficients are the same.

It is instructive to note what happens at the other extreme, when X becomes perfectly correlated with Z. In this case r_YX·Z cannot be calculated, since r_XZ = 1 and the denominator of (14-40) becomes zero as a consequence. This is recognized as the multicollinearity problem of Chapter 13, where the corresponding multiple regression estimate b could not be defined.

The parallel statistical properties of b and r_YX·Z can be extended further: rejection of the hypothesis that β = 0 in Chapter 13 is equivalent to rejecting the null hypothesis that ρ_YX·Z = 0. Again, one reason for emphasizing regression analysis is confirmed: multiple regression will not only answer its own set of regression questions, but also partial correlation questions as well.

² By the "influence" of Z on Y we mean the fitted value Ŷ obtained by the regression of Y on Z:

Ŷ = a + bZ    (14-37)

By "removing the influence," we mean obtaining the residual deviation by subtracting the fitted from the observed value:

û = Y − Ŷ = Y − a − bZ    (14-38)

which is the part of Y not explained by Z. Similarly, we obtain v̂, the residual deviation of X from its fitted value on Z. The partial correlation coefficient is the simple correlation of û and v̂; thus:

r_XY·Z ≡ r_ûv̂    (14-39)

14-3 MULTIPLE CORRELATION
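Formula (14-40) and the residual definition (14-39) give the same number; a minimal sketch with invented observations of Y, X, and Z (any nondegenerate triple works):

```python
import math

def mean(v):
    return sum(v) / len(v)

def corr(u, v):
    ub, vb = mean(u), mean(v)
    suu = sum((a - ub) ** 2 for a in u)
    svv = sum((a - vb) ** 2 for a in v)
    suv = sum((a - ub) * (b - vb) for a, b in zip(u, v))
    return suv / math.sqrt(suu * svv)

def residuals(y, z):
    # residuals of y after fitting y = a + b z by least squares
    zb, yb = mean(z), mean(y)
    b = sum((zi - zb) * (yi - yb) for zi, yi in zip(z, y)) / sum((zi - zb) ** 2 for zi in z)
    a = yb - b * zb
    return [yi - (a + b * zi) for zi, yi in zip(z, y)]

# Invented observations (not from the text)
y = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0]
x = [1.0, 2.0, 2.0, 4.0, 5.0, 6.0]
z = [3.0, 2.0, 4.0, 5.0, 4.0, 7.0]

r_yx, r_xz, r_yz = corr(y, x), corr(x, z), corr(y, z)

# (14-40): partial correlation from the three simple correlations
r_yx_z = (r_yx - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# (14-39): simple correlation of the two sets of residuals
r_resid = corr(residuals(y, z), residuals(x, z))
print(r_yx_z, r_resid)   # equal
```

The "considerable manipulation" mentioned in the text is exactly the algebra that makes these two computations coincide.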

A partial correlation coefficient may be computed for each independent variable in a multiple regression. In addition, one single overall index of the value of fitting the multiple regression equation can be defined: the multiple correlation coefficient, R, is the simple correlation coefficient of the observed Y and the corresponding fitted Ŷ. Thus, if our estimated regression is:

Ŷ = a + bX + cZ    (14-42)

then

R ≡ r_YŶ    (14-43)

This R has all the nice algebraic properties of any simple correlation. In particular, we note (14-29), which takes the form

R² = Σ(Ŷᵢ − Ȳ)² / Σ(Yᵢ − Ȳ)² = (variation of Y explained by multiple regression) / (total variation in Y)    (14-44)

Note how this relates to simple correlation. If there is only one regressor (independent variable), then the numerator represents the variation explained by it [with Ŷ estimated from the full model, e.g., (14-42)]; thus this R² is identical to r² in any simple correlation. Our conclusion is the same as in simple correlation: one of the major values of calculating R² is to clarify how successfully our multiple regression explains the variation in Y, and we can see how fast R² increases as we add additional explanatory variables to our multiple regression, by watching how helpful these variables are in improving our explanation of Y.

It remains, finally, to test the statistical significance of additional regressors, using the t-test of (13-15). We could also extend (14-22) to

Total variation = variation explained by the first regressor + additional variation explained by the others + unexplained variation    (14-45)

We could set this up in an ANOVA table like Table 10-10, and construct the ratio

F = (additional variance explained) / (unexplained variance)    (14-46)

just as (10-30) was constructed. An observed F ratio of this kind is thus seen to be a test of the significance of the additional regressor. Such F values, one for each regressor, appear under the estimated equation in (13-15), and lend themselves to tests of significance when translated into the values t = ±√F.
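Definitions (14-43) and (14-44) can be verified together. A sketch fitting the two-regressor plane (14-42) by the normal equations in deviation form, with invented data:

```python
import math

# Invented data for Y and two regressors X, Z (not from the text)
y = [6.0, 8.0, 9.0, 11.0, 12.0, 15.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
z = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
n = len(y)

ybar, xbar, zbar = sum(y) / n, sum(x) / n, sum(z) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
Szz = sum((zi - zbar) ** 2 for zi in z)
Sxz = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z))
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
Szy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))

# least-squares coefficients of (14-42) in deviation form
det = Sxx * Szz - Sxz ** 2
b = (Szz * Sxy - Sxz * Szy) / det
c = (Sxx * Szy - Sxz * Sxy) / det
yhat = [ybar + b * (xi - xbar) + c * (zi - zbar) for xi, zi in zip(x, z)]

# (14-43): R is the simple correlation of observed Y and fitted Y-hat
Sff = sum((f - ybar) ** 2 for f in yhat)   # fitted values average to ybar
Sfy = sum((f - ybar) * (yi - ybar) for f, yi in zip(yhat, y))
Syy = sum((yi - ybar) ** 2 for yi in y)
R = Sfy / math.sqrt(Sff * Syy)

# (14-44): R^2 equals the explained share of the total variation
print(R ** 2, Sff / Syy)   # equal
```

Adding a further regressor can only increase Sff, which is why R² never falls as variables are added; whether the increase is significant is the F-test question of (14-46).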

PROBLEMS

14-7 For the data of Problem 13-1, relating savings S to income Y and assets W, find:
(a) r_SY, the simple correlation of S and Y.
(b) r_SY·W, the partial correlation of S and Y, holding W fixed.
(c) R, the multiple correlation of S on Y and W.
(d) Using parts (a), (b), and (c), calculate the proportion of the variation of S which is explained by Y alone, and the additional proportion which is explained by adding the regressor W.
(e) Comparing 14-7(d): is R necessarily larger than r_SY? Is R always a better measure of how well S is explained, "other things being equal"?

14-8 Repeat Problem 14-7, using the data of Problem 13-2 and substituting … for W.

14-9 For the data of Problem 13-1, break the variation of S into its components, and construct the F ratio to test the statistical significance of adding W to the regression of S on Y. How many degrees of freedom are there in the numerator and denominator? Then find the t value to test the significance of W as a regressor after S is regressed on Y, noting that t = ±√F, with 1 degree of freedom in the numerator of F.

14-10 Repeat Problem 14-9 for the data of Problem 13-2.
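The mechanics of 14-9 in a sketch (the data below are invented stand-ins for S, Y, and W, since the actual figures are in Problem 13-1): the drop in unexplained variation when W is added gives the F ratio, and the t statistic for W in the full regression satisfies t² = F.

```python
import math

# Invented stand-ins for savings S, income Y, assets W
s = [3.0, 4.0, 6.0, 7.0, 9.0, 12.0]
y = [10.0, 12.0, 15.0, 16.0, 20.0, 24.0]
w = [2.0, 5.0, 3.0, 6.0, 4.0, 8.0]
n = len(s)

def dev(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

ds, dy, dw = dev(s), dev(y), dev(w)
Syy = sum(a * a for a in dy)
Sww = sum(a * a for a in dw)
Syw = sum(a * b for a, b in zip(dy, dw))
Sys = sum(a * b for a, b in zip(dy, ds))
Sws = sum(a * b for a, b in zip(dw, ds))
Sss = sum(a * a for a in ds)

# reduced model: S regressed on Y alone
b1 = Sys / Syy
sse_reduced = Sss - b1 * Sys

# full model: S regressed on Y and W
det = Syy * Sww - Syw ** 2
b = (Sww * Sys - Syw * Sws) / det
c = (Syy * Sws - Syw * Sys) / det
sse_full = Sss - b * Sys - c * Sws

# F ratio for adding W, and the equivalent t for W's coefficient
F = (sse_reduced - sse_full) / (sse_full / (n - 3))
t = c / math.sqrt((sse_full / (n - 3)) * Syy / det)
print(F, t ** 2)   # equal
```

The numerator of F has 1 degree of freedom (one added regressor) and the denominator n − 3, matching the t distribution with n − 3 degrees of freedom.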

chapter 15

Decision Theory

This chapter is devoted to making decisions in the face of uncertainty. A large part of the discussion involves Bayesian methods, which are not only useful for their own sake, but also sharpen our understanding of the limitations of classical statistics.

15-1 PRIOR AND POSTERIOR DISTRIBUTIONS

Problem 3-24b on Bayes' theorem is important enough to repeat, in slightly altered form. If we were to predict tomorrow's weather before consulting a barometer, we would use Table 15-1:

TABLE 15-1 Prior Probabilities p(θ)

State θ         Prior probability p(θ)
Rain (θ₁)            .40
Shine (θ₂)           .60

But we can do better, by using a barometer characterized by Table 15-2:

TABLE 15-2 Conditional Probabilities p(x/θ)

                        State
Prediction x    Rain (θ₁)    Shine (θ₂)
"Rain"             .90           .20
"Shine"            .10           .80
                  1.00          1.00

TABLE 15-3 Posterior Probabilities p(θ/x)

State θ         Posterior probability p(θ/"rain")
Rain (θ₁)            .75
Shine (θ₂)           .25

After the barometer's prediction is observed, the probabilities in Tables 15-1 and 15-2 are no longer relevant, and should be replaced by the posterior probabilities in Table 15-3. We recall that this table was derived by combining Tables 15-1 and 15-2 in Figure 15-1, so that the new sample space is "rain," explaining the two posterior probabilities. Since the relative size of the two hatched areas is what matters, we now give this diagrammatic derivation its full formal confirmation. We use (5-10) to write down the probability of rain and the prediction "rain" as

p(θ₁, x₁) = p(θ₁) p(x₁/θ₁)    (15-2)
          = (.4)(.9) = .36    (15-3)

Similarly, the probability of the state shine and the prediction "rain" is

p(θ₂, x₁) = p(θ₂) p(x₁/θ₂) = (.6)(.2) = .12    (15-4)

These two relations define the hatched areas in Figure 15-1. Comparing areas, we conclude that it is three times as likely for a "rain" prediction to be associated with rain as with shine. Formally, the hatched area in Figure 15-1 becomes the new sample space, within which we calculate the new probabilities. To express this, we note that

p("rain") = p(x₁) = .36 + .12 = .48    (15-5)

[FIG. 15-1 How posterior probabilities are determined. The prior divides into Rain (.4) and Shine (.6); the hatched areas mark the prediction "Rain" within each.]

Using (5-10) again,

p(θ₁/x₁) = p(θ₁, x₁) / p(x₁) = .36/.48 = .75    (15-6)

Similarly,

p(θ₂/x₁) = p(θ₂, x₁) / p(x₁) = .12/.48 = .25    (15-7)

When this new (hatched) sample space has its probabilities blown up in this way by the divisor p(x₁), the result is the posterior probability distribution in Table 15-3. This is often more conveniently written in the general form

p(θ/x) = p(θ, x) / p(x) = p(θ) p(x/θ) / p(x)    (15-8)

To keep the mathematical manipulations in perspective, we give the physical interpretation the proper emphasis. Before the evidence (barometer) is seen, the prior probabilities p(θ) give the proper betting odds on the weather. But after the evidence is in, we can do better; the posterior probabilities p(θ/x) now give the proper betting odds. (This may be intuitively grasped by appealing to the relative frequency interpretation. Of all the times the barometer registers "rain," in what proportion does rain actually occur? The answer is 75%.) As a simple summary, we note that the prior distribution is adjusted by the empirical evidence to yield the posterior distribution. Schematically:

Prior probabilities p(θ), combined with the probability of the empirical evidence p(x/θ), yield the posterior probabilities p(θ/x).    (15-9)
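The whole of (15-2) through (15-7) is a three-line computation; a sketch for the weather example:

```python
# Bayes' rule for the weather example: prior, then condition on a "rain" forecast
prior = {"rain": 0.40, "shine": 0.60}               # Table 15-1
p_forecast_rain = {"rain": 0.90, "shine": 0.20}     # "rain" column of Table 15-2

joint = {state: prior[state] * p_forecast_rain[state] for state in prior}  # (15-2), (15-4)
p_x1 = sum(joint.values())                                                 # (15-5): .48
posterior = {state: joint[state] / p_x1 for state in prior}                # (15-6), (15-7)
print(posterior)   # Table 15-3: rain .75, shine .25
```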

PROBLEMS

15-1 A factory has 3 machines (θ₁, θ₂, and θ₃) making bolts. The newer the machine, the larger and more accurate it is; that is, according to the following table:

Machine        Proportion of total output    Rate of defective bolts
θ₁ (oldest)          10%                           …
θ₂                   40%                           …
θ₃ (newest)          50%                           …

Thus, for example, θ₃ produces half of the factory's output, and of the bolts it produces, 1% are defective.

(a) A bolt is selected at random from the factory's output; before it is examined, what is the chance it was produced by machine θ₁? By θ₂? By θ₃?
(b) After it is examined and found defective, what is the chance it was produced by machine θ₁? By θ₂? By θ₃?

15-2 Of all the people in a roomful of ten people, the heights θ have the following distribution:

θ (inches):   70   71   72   73   74   75
p(θ):          …

(a) Graph this (prior) distribution of θ.
(b) Suppose also that a crude measuring device is available, that makes errors e with the following distribution:

e (error in inches):   −2    −1    0    1    2
p(e):                  .1    .2   .4   .2   .1

This device can help us to be more accurate in estimating the height of a man drawn at random from the room. For example, suppose his measured height using this crude device is x = 74 inches. We now have further information about θ; i.e., this measurement changes the probabilities for θ from the prior distribution p(θ) to a posterior distribution p(θ/x = 74). Calculate and graph his posterior distribution.
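Problem 15-2(b) is the same mechanics with a numerical θ: p(θ/x = 74) is proportional to p(θ) p(e = 74 − θ). A sketch, using an assumed prior (the p(θ) row of the height table is illegible in this copy, so these prior values are placeholders to show the method):

```python
# Assumed prior over heights; the text's own p(theta) values are illegible here
prior = {70: 0.1, 71: 0.1, 72: 0.2, 73: 0.3, 74: 0.2, 75: 0.1}
p_error = {-2: 0.1, -1: 0.2, 0: 0.4, 1: 0.2, 2: 0.1}   # measurement error e
x = 74                                                  # measured height

joint = {th: prior[th] * p_error.get(x - th, 0.0) for th in prior}
px = sum(joint.values())
posterior = {th: joint[th] / px for th in prior}
print(posterior)   # mass concentrates near the measurement
```

Heights more than 2 inches from the measurement get posterior probability zero, since the device never errs by more than 2 inches.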

15-2 DECISIONS

(a) Example

Suppose a salesman regularly sells umbrellas or lemonade on Saturday at football games. To keep matters simple, suppose he has just three possible options (actions, aᵢ):

a₁ = sell only umbrellas;
a₂ = sell some umbrellas, some lemonade;
a₃ = sell only lemonade.

If he chooses a₁ and it rains, his profit is $20; but if it shines, he loses $10 (negative profit). It will be more convenient to describe everything as a loss; thus his losses will be −20 or +10 respectively. If he chooses action a₂ or a₃, there will similarly be a certain loss in each state of the weather. All this information may be assembled conveniently in the following loss table:

TABLE 15-4 Loss Function l(a, θ)

                      State θ
Action         Rain (θ₁)    Shine (θ₂)
a₁               −20            10
a₂                −5             5
a₃                25            −7

Suppose further that the probability distribution (long-run relative frequency) of the weather is as shown in Table 15-5:

TABLE 15-5 Probability Distribution of θ

State θ          Rain (θ₁)    Shine (θ₂)
Probability p(θ)    .20           .80

What is the best action for the salesman to take? (You are urged to work out, before reading on, what your intuition suggests.)

Solution. If he chooses a₁, we calculate what he can expect his loss to be, on average:

L(a₁) = expected loss if he chooses a₁ = −20(.20) + 10(.80) = 4    (15-10)

We recognize this as the concept of expected value, as given by (4-17):

L(aᵢ) = Σ_θ l(aᵢ, θ) p(θ) = l(aᵢ, θ₁) p(θ₁) + l(aᵢ, θ₂) p(θ₂)    (15-11)

Similarly we evaluate:

L(a₂) = −5(.20) + 5(.80) = 3    (15-12)
L(a₃) = 25(.20) − 7(.80) = −.6    (15-13)

In general,

L(a) = Σ_θ l(a, θ) p(θ)    (15-14)

The optimal action is seen to be a₃, which minimizes the expected loss; in fact, it is the only option that allows any expected profit. To summarize, we assemble all our information and calculations in Table 15-6:

TABLE 15-6 Calculation of the Optimal Action

                     p(θ):   .20          .80       Expected loss
Action                     Rain (θ₁)   Shine (θ₂)       L(a)
a₁                           −20           10             4
a₂                            −5            5             3
a₃                            25           −7           −.6 ← minimum

(To give an alternative intuitive calculation of (15-10): in, say, 100 days, he would have about 20 rainy days at $−20 each, yielding $−400, and about 80 shiny days at $+10 each, yielding $+800, for a sum of about $+400 in 100 days, or an average loss of $4 per day.)

(b) Generalization

It hardly seems necessary to state that this problem can be generalized to any number of states θ or actions a (even an infinite number). The objective remains the same: to minimize the expected loss L(a). To review, we must consider those probabilities p(θ), and the loss function l(a, θ).
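The whole calculation of Table 15-6 takes a few lines (the a₂ row is partly illegible in this copy, so its entries below are assumed):

```python
p = {"rain": 0.20, "shine": 0.80}                     # Table 15-5
loss = {
    "a1 (umbrellas)": {"rain": -20, "shine": 10},
    "a2 (both)":      {"rain": -5,  "shine": 5},      # entries assumed
    "a3 (lemonade)":  {"rain": 25,  "shine": -7},
}

# expected losses by (15-14), then pick the minimizing action
L = {a: sum(loss[a][s] * p[s] for s in p) for a in loss}
best = min(L, key=L.get)
print(L, best)
```

The same code handles any number of states and actions, which is the point of the generalization above.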

[FIG. 15-2 The logic of Bayesian decisions: the prior distribution p(θ), the probability of data p(x/θ), and the loss function l(a, θ) combine to give the expected loss L(a); find that a with the smallest expected loss possible.]

1. The probabilities p(θ). These of course should represent the best possible intelligence on the subject. For example, suppose the salesman moves to another state, with weather probabilities as given in Table 15-1. If he has no barometer, he will have to use the (prior) probabilities in this table. But if he can consult the barometer (described in Table 15-2), then of course the posterior probabilities p(θ/x) in Table 15-3 should be used. (See Problem 15-3.)

The logic of Bayesian inference is laid out in the block diagram, Figure 15-2. Incidentally, in the calculation of the average loss L(a) in (15-14), it would not hurt to use kp(θ) instead of p(θ) as weights, where k is any constant (independent of θ and a). For kp(θ) would generate losses kL(a), which would rank in the same order as the true losses L(a), and hence point to the same correct optimizing action. This is a very useful observation. Thus, for our example, the umbrella salesman need not undertake the last step in calculating the posterior probabilities of rain p(θ/x₁) in (15-6) and (15-7); he can forget about the denominator p(x₁), and use (15-2) and (15-4) instead, without affecting his decision.²

2. The loss function, l(a, θ). In our example, we assumed that monetary loss is the appropriate consideration. This may be valid enough if the decision is made ("game is played") over and over again: whatever minimizes the expected loss in each game will minimize total expected loss in the long run.

² i.e., attaching weights of .36 and .12 to his losses would yield the same result as weights of .75 and .25.

Yet there are some decisions that are made only once, and then expected monetary loss may not be the right criterion. To illustrate: suppose you are given a (tax-free) choice between

(a) $100,000 for sure, or
(b) a 1/2 chance (lottery ticket) on a $210,000 prize.

Most people would prefer the first choice (a), even though its expected monetary value is less than that of choice (b):

(a) $100,000(1) = $100,000    (15-15)
(b) $210,000(1/2) = $105,000    (15-16)

The reason is that, for most people, the first hundred thousand is worth more than the second hundred thousand. (The student should speculate on how he would spend the first hundred thousand. Once these purchases have been made, less exciting opportunities would be available for spending the second hundred thousand; the sports car has already been bought, and so on.) Such a decision should not be based on the expected value of money itself as in (15-16), but rather on the expected "utility" of money; that is, the decision should be based on a subjective evaluation of money, U(M). As an illustration, Figure 15-3 shows one author's subjective evaluation of money.³ Using Figure 15-3, the expected utilities of the two choices are:

(a) u₁(1) = u₁
(b) u₂(1/2) = .7u₁, since u₂ = 1.4u₁    (15-17)

which is smaller; hence the preference for choice (a). Since utility is the more appropriate evaluation, in all decision situations of this kind a loss-of-utility function should typically be used as our loss function l(a, θ); hereafter we interpret losses in this way.

[FIG. 15-3 Author's subjective evaluation of money: utility U(M) plotted against dollars, with u₁ at $100,000 and u₂ = 1.4u₁ at $210,000.]

³ This utility is defined for an individual; it is highly personal and temporary. It is defined empirically by the bets he prefers. In other words, many bets like (15-15) are used to define his utility, rather than vice-versa.

PROBLEMS

15-3 Using the losses of Table 15-4, calculate the optimal action if:
(a) The only available probabilities are the prior probabilities of Table 15-1.
(b) The barometer reads "rain" (so that the posterior probabilities of Table 15-3 are relevant).
(c) The barometer reads "shine."
(d) Is the following a true or false summary of questions (a) to (c) above? If false, correct it.

If the salesman must choose his action (order his merchandise) before consulting his barometer, then a₂ (umbrellas and lemonade) is best. However, if the barometer can be consulted first, then the salesman should
Choose a₁ (umbrellas) if the barometer predicts "rain."
Choose a₃ (lemonade) if the barometer predicts "shine."
But a bright salesman could have seen this obvious solution without going to all the trouble of learning about Bayesian decisions.

15-4 A farmer has to decide whether to sell his corn for use A or use B. His losses depend on its water content (determined by the weather during processing, after the decision has been made) according to the following table:

                   State
Action         Wet      Dry
Use A           20       10
Use B          −10       30

(a) If long past experience indicates that his corn has been classified wet one third of the time, what should his decision be? How much is his expected loss?
(b) Suppose he has developed a rough-and-ready means of determining whether his corn is wet or dry, a method which is correct 3/4 of the time, regardless of the state of nature. If this method indicates that his corn is "dry," what should his decision be? How much does this additional information reduce his expected loss, i.e., how much is this method worth?
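The Bayesian step in 15-4(b) does not depend on the loss entries (which are hard to make out in this copy): the prior p(wet) = 1/3 is updated by a reading from a test that is correct 3/4 of the time regardless of the state. A sketch:

```python
prior = {"wet": 1 / 3, "dry": 2 / 3}
# p(reading = "dry" | state): the method is correct 3/4 of the time
p_dry_reading = {"wet": 1 / 4, "dry": 3 / 4}

joint = {s: prior[s] * p_dry_reading[s] for s in prior}
p_reading = sum(joint.values())
posterior = {s: joint[s] / p_reading for s in prior}
print(posterior["wet"])   # 1/7

# The decision then minimizes expected loss under this posterior,
# exactly as in Table 15-6 (loss entries omitted here).
```

A "dry" reading thus cuts the probability of wet corn from 1/3 to 1/7; a "wet" reading would raise it to 3/5 by the same calculation.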

15-5 A school is to be built to serve 125 students, who live along a single road. Let

xᵢ = distance student i lives from origin
a = distance of school from origin

Thus

(xᵢ − a) = distance of student i from school.

(a) Where is the optimum place (mean, median, mode, midrange?) to build the school, in order to:
(1) Minimize the distance that the farthest student has to walk.
(2) Minimize the total walking done, i.e., minimize the sum of the absolute deviations: Σ|xᵢ − a|
*(3) Minimize the sum of the squared deviations: Σ(xᵢ − a)²
(Hint. Calculus is required: differentiate with respect to a, setting the result equal to zero.)
(4) Maximize the number of students who live where the school is built, and so do not have to walk at all.

(b) Does the following accurately reflect your conclusions in question (a) above? If not, correct it.

In (2) we are concerned only about the total walking done; walking is a loss no matter who does it. In (1), on the other hand, only the walking done by the two extreme people is considered a loss; walking done by any others is of no concern whatsoever. (3) is a compromise: we imply that although all walking is some kind of loss, the more a student has walked, the greater his loss in walking one more mile. Thus the person who walks 3 miles (xᵢ − a = 3) contributes 9 to the loss function, whereas the person who walks 1 mile contributes only 1. (4) suggests that we are concerned only with students who need not walk at all.

15-3 ESTIMATION AS A DECISION

In our earlier example the states θ (rain and shine) and actions a were categorical (i.e., nonnumerical). But this was not an essential part of the theory; in this section we consider a numerical example.

Example

Suppose the judge at a beauty contest is asked to guess the height of the first contestant, whom he has never seen. Yet he is not in complete ignorance; suppose he knows that the heights of contestants follow the probability distribution shown in Figure 15-4:

θ (inches):   64   65   66   67   68   69
p(θ):         .1   .1   .2   .2   .3   .1

[FIG. 15-4 Prior distribution of heights θ.]

(i) Suppose, in order to encourage an intelligent guess, the judge is fined $1 if he makes a mistake (no matter how large or small); "a miss is as good as a mile." What should the rational judge guess?
(ii) Suppose the rules become more severe, by fining the judge $x for an error of x inches; the greater his error, the greater his loss. What is his rational guess?
(iii) Suppose the rules are made even more severe, by fining the judge $x² for an error of x inches; the loss now increases even faster with error. What is his rational guess now?

Solution. (i) The most likely (modal) value 68.
(ii) The median value 67.
(iii) The mean value 66.8.

Thus (i), (ii), and (iii) are like (4), (2), and (3) in the schoolhouse Problem 15-5, with the same solutions.

To translate this into the familiar language of decision theory: the girl's height is the state of nature θ, and the guessed height (estimate) is the action a; both are numerical. The fine the judge must pay is the loss function l(a, θ); rather than a formula, the loss function is most conveniently given in a table. Each of the 3 loss functions, along with its corresponding optimal estimator, is shown in Table 15-7.

TABLE 15-7 How the Loss Function l(a, θ) Determines the Optimal Estimator

Loss Function l(a, θ)                                        Optimal Estimator a
0 if a = θ, 1 otherwise ("a miss is as good as a mile")      Mode of p(θ)
|a − θ|                                                      Median
(a − θ)² ("quadratic")                                       Mean

The quadratic loss function (a − θ)² is the important one that is usually used in decision theory. It is graphed in Figure 15-5. It is justified not only by its intuitive attractiveness, but also by its attractive mathematical properties; for example, it is easily differentiated (an important requirement in minimization). On the other hand, (i) obviously cannot be differentiated, nor can (ii), since it is an absolute value function.

We reemphasize that the probability distribution p(θ) used in the decision process ought to reflect the best available information. Thus we may be forced to use the prior distribution p(θ) if we have not yet collected any data, but after data is collected, the posterior distribution p(θ/x) is appropriate.

[FIG. 15-5 The quadratic loss function.]
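Table 15-7 can be verified directly on the contest prior: search the candidate guesses under each fine schedule. A sketch (note the quadratic optimum is the mean 66.8, which need not lie on a whole inch):

```python
p = {64: 0.1, 65: 0.1, 66: 0.2, 67: 0.2, 68: 0.3, 69: 0.1}   # Figure 15-4

def expected_loss(a, loss):
    return sum(loss(a, th) * pr for th, pr in p.items())

def best_guess(loss):
    # restrict the search to the listed heights
    return min(p, key=lambda a: expected_loss(a, loss))

zero_one = lambda a, th: 0 if a == th else 1     # "a miss is as good as a mile"
absolute = lambda a, th: abs(a - th)
quadratic = lambda a, th: (a - th) ** 2

mean = sum(th * pr for th, pr in p.items())
print(best_guess(zero_one), best_guess(absolute), mean)   # modal 68, median 67, mean about 66.8
```

Under quadratic loss, guessing the mean 66.8 beats any whole-inch guess, since the expected squared loss at a is the variance plus (a − mean)².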

PROBLEMS

These are extensions of Problem 15-2.

15-6 Suppose you have to guess the height of the man drawn in Problem 15-2, with only the prior distribution p(θ) of θ known. Find the optimal estimate:

(a) Assuming l(a, θ) = 0 if a = θ, = 1 otherwise.    (15-18)
(b) Assuming l(a, θ) = |a − θ|    (15-19)
(c) Assuming l(a, θ) = (a − θ)²    (15-20)

15-7 Repeat Problem 15-6, after the man's height has been crudely measured as x = 74, as in Problem 15-2, so that the posterior distribution p(θ/x) is relevant.

15-4 ESTIMATION: BAYESIAN VERSUS CLASSICAL

This comparison is best shown with an extended example, illustrated in Figure 15-6; from this we shall draw conclusions later.

(a) Example
Suppose it is essential to estimate the length θ of a beetle accidentally caught in a delicate piece of machinery. A measurement x is possible, using a device which is subject to some error; suppose x is normally distributed about the true value θ, with σ = 1. Suppose x turns out to be 20 mm.

Question (a). What is the classical point estimate and 95% confidence interval for θ?

Solution. Our information on the sampling distribution of x, i.e.,

p(x/θ) = N(θ, σ²)    (15-21)

can be "turned around" to construct the following 95% confidence interval for θ:

θ = x ± 1.96σ    (15-22)
  = 20 ± 1.96(1)    (15-23)

and the point estimate of θ is of course x = 20.

Question (b). Suppose we take the effort to find, from a biologist, that the population of all beetles has a normally-distributed length, with mean θ₀ = 25 mm and variance σ₀² = 4. How can this be used, specifically to construct the posterior distribution of θ?

Solution. It will be useful to develop a general formula for any θ₀, σ₀, σ, etc., and then solve our specific example by applying it. The information from the biologist defines our prior distribution, while (15-21) gives the distribution of our empirical evidence x. Specifically, our prior distribution is

(15-24)

;V(Oo. .o

?(o)

(15-21)

= N(O.

repeated. It can be shown that the posterior distribution p(θ|x) is also normal. Specifically, (15-24) and (15-21) may be written

    p(θ) = K₁ e^[−(1/2σ₀²)(θ−θ₀)²]                        (15-25)

    p(x|θ) = K₂ e^[−(1/2σ²)(x−θ)²]                        (15-26)

where K₁, K₂, and other similar constants introduced in this argument are of a form not critical to the development. Then

    p(x, θ) = p(θ) p(x|θ)                                (15-27)

    = K₁K₂ e^E                                           (15-28)

where the exponent, which may be rearranged, is

    E = −(1/2)[(1/σ₀²)(θ² − 2θθ₀ + θ₀²) + (1/σ²)(x² − 2xθ + θ²)]     (15-29)

Let

    1/a = 1/σ₀² + 1/σ²                                   (15-30)

    b = θ₀/σ₀² + x/σ²                                    (15-31)

Using these definitions, the exponent (15-29) can be written

    −(1/2a)[θ² − 2abθ + K₃]                              (15-32)

    = −(1/2a)[(θ − ab)² + K₅]                            (15-33)

Finally

    p(x, θ) = K₆ e^[−(1/2a)(θ−ab)²]                       (15-34)

and

    p(θ|x) = p(x, θ)/p(x) = K₇ e^[−(1/2a)(θ−ab)²]         (15-35)

This means that θ, given x, is a normal variable with mean ab and variance a, provided a appears appropriately in K₇. But it must, since p(θ|x) is a bona fide probability function (integrating to 1), and K₇ is just the scale factor necessary to ensure this. Thus

    p(θ|x) = N(ab, a)                                    (15-36)
where

    1/a = 1/σ₀² + 1/σ²                                   (15-37)

    b = θ₀/σ₀² + x/σ²                                    (15-38)

Now apply this to our example. Since

    σ₀² = 4,   σ² = 1,   θ₀ = 25,   x = 20

it follows that

    1/a = 1/4 + 1/1 = 5/4,   so that   a = .8

and

    b = 25/4 + 20/1 = 105/4

Thus:

    mean = ab = (.8)(105/4) = 21.0

    variance = a = .8

Hence the posterior distribution may be formally written

    p(θ|x = 20) = N(21, .8)                              (15-39)

compared with the prior:

    p(θ) = N(25, 4)                                      (15-40)
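The normal-normal updating formulas (15-37) and (15-38) are easy to verify numerically. The following short script is a sketch (not part of the original text) that reproduces the posterior N(21, .8) of the beetle example:

```python
# Normal-normal Bayesian update, following (15-36)-(15-38):
# posterior p(theta|x) = N(ab, a), with
#   1/a = 1/sigma0^2 + 1/sigma^2   and   b = theta0/sigma0^2 + x/sigma^2

def posterior(theta0, var0, x, var):
    a = 1.0 / (1.0 / var0 + 1.0 / var)     # posterior variance
    b = theta0 / var0 + x / var
    return a * b, a                         # posterior mean, variance

# The beetle example: prior N(25, 4), measurement x = 20 with sigma^2 = 1
mean, var = posterior(theta0=25, var0=4, x=20, var=1)
print(mean, var)   # 21.0 0.8
```

The posterior mean is the weighted compromise between the prior mean 25 and the observation 20, exactly as described in the next paragraph.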
The Bayesian logic is shown in Figure 15-6. A prior distribution is adjusted to take account of the observed data (x), with the weight attached to the observed x depending on its probability p(x|θ). The result is the posterior distribution, with mean (21) falling, as expected, between the prior mean (25) and the observed value (20). (As a bonus, variance is reduced in the posterior distribution. Although this does not always happen, it is evident that it must happen for normal distributions; for (15-37) shows that the posterior variance a is less than σ₀², and incidentally also less than σ².)

Question (c). With the posterior distribution (15-39) now in hand, defining a Bayesian estimate of θ requires only a loss function. Suppose this is the quadratic loss function; what is the Bayesian point estimator of θ? Find also the 95% probability interval for θ.

[FIG. 15-6  Bayesian versus classical estimation. Top: the prior p(θ) = N(θ₀, σ₀²) = N(25, 4). Middle: the posterior p(θ|x = 20) = N(21, .8), with the Bayesian estimate 21 and the Bayesian 95% probability interval 21 ± 1.96√.8. Bottom: the classical estimate 20 and the classical 95% confidence interval 20 ± 1.96σ, based only on the evidence x and its sampling distribution.]
Solution. For the quadratic loss function, the optimum estimator is the posterior mean (21). (Note that because p(θ|x) is normal, this is also the posterior median and mode, so that all the loss functions in Table 15-7 yield the same answer. This is reassuring, and frequently happens in practice.)

To construct a 95% probability interval, we know from (15-39) that, given the observation x = 20, there is a 95% probability that θ will fall in the interval

    21 ± 1.96√.8

    = 21 ± 1.76

Note that this interval is narrower (more precise) than the classical interval (15-22), reflecting the information of the prior p(θ).

PROBLEMS

15-8 As our means of measuring (beetles) becomes more and more precise (σ² → 0), show that in the posterior distribution p(θ|x),

    mean → x,   variance → 0                             (15-41)

In other words, if we use an errorless measuring device, we can be certain that the true θ will be its measured value x.

⇒15-9 Using Figure 15-6, what would you intuitively expect the posterior mean of the beetle's length to be if, instead, two independent measurements had yielded an average of 20 mm? (For an extension of your answer, see the section immediately following.)

(b) Generalization

Suppose that a sample of n independent measurements x₁, x₂, ... xₙ can be taken, rather than just a single x. Using the sample mean x̄, what now is the Bayesian estimate of θ? In particular, what happens as we get more and more observations (n → ∞)?

This problem may be solved using (15-36) to (15-38), with one important change. Since our data now is x̄ instead of x, we must make this substitution in (15-38); and since

    the variance of a sample mean is σ²/n                (15-42)

when σ² is the variance of a single observation, we must also substitute σ²/n for σ² in (15-37) and (15-38). Thus our generalized definitions are:

    1/a = 1/σ₀² + n/σ²                                   (15-43)

    b = θ₀/σ₀² + nx̄/σ²                                   (15-44)

For n = 1, (15-43) reduces to (15-37) and (15-44) reduces to (15-38). In the limit, as sample size n → ∞:

    a → σ²/n                                             (15-45)

    b → nx̄/σ²                                            (15-46)

Incidentally, these exactly same results follow whether

    n → ∞   or   σ₀ → ∞                                  (15-47)

Thus, evaluating (15-36):

    posterior mean = ab → x̄                              (15-48)

    posterior variance = a → σ²/n                        (15-49)

Again the normality of this posterior distribution ensures that its mean, mode, and median coincide. Hence, regardless of which loss function

we may use:

    Bayesian estimator θ̂ → x̄                             (15-50)

    95% probability interval → x̄ ± 1.96 σ/√n             (15-51)

We conclude that, as more and more data are collected, Bayesian estimation approaches the classical. This is exactly as it should be: as more and more data are collected, less and less weight needs to be attached to the prior information; in the limit, the prior is completely disregarded, as in classical estimation.

The classical and Bayesian approaches are compared in more detail in Table 15-8.

TABLE 15-8  Relation of Classical and Bayesian Estimation (Although Normality Is Assumed, Results Are Instructive for Other Cases Too)

                         Classical                 Bayesian
Requires                 p(x|θ)                    p(x|θ), p(θ), and loss function
Point estimate
 (our example, n = 1)    The observed x: 20        Mean of posterior: 21
Interval estimate        Confidence interval,      Probability interval,
                         20 ± 1.96                 21 ± 1.76
With an unlimited
 sample (n → ∞)          (unchanged in form)       Same as classical

We now turn to the other condition that leads to the same result. Bayesian estimators also approach the classical if the prior information is very vague (i.e., if σ₀ → ∞), as stated in (15-47). Thus the less the prior distribution tells us, the less weight we attach to it. To sum up, the two reasons for completely disregarding prior information are (1) if present data is in unlimited supply, or (2) if prior information is useless.
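The sample-mean update in (15-43) and (15-44) can be sketched in code as well, and exhibits the first limiting result directly: as n grows, the posterior mean moves toward x̄ and the posterior variance shrinks toward σ²/n (illustrative code, not from the text):

```python
# Posterior from a sample of n measurements, following (15-43) and (15-44):
#   1/a = 1/sigma0^2 + n/sigma^2,   b = theta0/sigma0^2 + n*xbar/sigma^2
def posterior_n(theta0, var0, xbar, var, n):
    a = 1.0 / (1.0 / var0 + n / var)      # posterior variance
    b = theta0 / var0 + n * xbar / var
    return a * b, a                        # posterior mean, variance

# Beetle example: prior N(25, 4), measurements with sigma^2 = 1, xbar = 20.
# As n grows the prior is swamped: the mean approaches xbar = 20 and the
# variance approaches sigma^2/n, as in (15-48) and (15-49).
for n in (1, 4, 100):
    mean, var = posterior_n(25, 4, 20, 1, n)
    print(n, round(mean, 3), round(var, 4))
```

Setting a very large σ₀² instead of a large n illustrates the second limiting result, (15-47).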

(c) Is θ Fixed, or a Random Variable?

In this chapter we regard the target to be estimated as a random variable: for example, the beetle's length θ in Figure 15-6. Yet in all preceding chapters, we have regarded the target as a fixed parameter: for example, μ, the average height of American men. Nevertheless, we may often find it useful to think of μ as having a subjective probability distribution, with this being a description of the betting odds we would give that μ is bracketed by any two given values (see the description of subjective probability in Section 3-6). In the problem of men's heights it may be helpful to boil down our best prior knowledge of μ into a prior subjective distribution of μ. Then the posterior subjective distribution of μ would reflect how the sampling data changed the betting odds.

PROBLEMS

15-10 Following the beetle example in Section 15-4(b), suppose that

    θ₀ = 25,   σ² = 1

and a sample of 4 independent observations on the trapped beetle yields an average length x̄ of 20 mm.
(a) Calculate the Bayesian point estimate for θ, the length of the beetle. For two reasons this estimate is closer to the observed value of 20 than the Bayesian estimate (21) in Figure 15-6. Explain.
(b) Calculate the Bayesian 95% probability interval for θ.

⇒15-11 Suppose that, in a random sample of 10 students on an American college campus, you find only one is a Democrat. Which would you rather quote as your "best estimate" of the proportion π of Democrats in the population (whole campus):
(a) The classical estimate

    x/n = 1/10

or
(b) The Bayesian estimate which, assuming a certain subjective prior distribution p(π) and a quadratic loss function, yields⁵

    (x + 3)/(n + 6) = 4/16 = .25

15-5 CRITIQUE OF BAYESIAN METHODS

(a) The Bayesian method is the optimal method of statistical inference (in the sense of minimizing loss) if there is a known prior distribution p(θ) and loss function l(a, θ). Compared to classical methods, Bayesian methods often yield more credible point estimates (e.g., Problem 15-11), more credible interval estimates (e.g., Table 15-8), and more appropriate hypothesis tests (e.g., Problem 15-13 below). Bayesian methods are particularly useful in the social sciences and business, where sample size is often very small, and Bayesian estimates differ considerably from the classical methods.

(b) The prime criticism of Bayesian estimation is that the prior distribution p(θ) and loss function l(a, θ) are usually not known, nor is there often any way at all of specifying them exactly. For example, what is the right loss function for an economist measuring a population's unemployment rate?

With loss functions, the difficulty is not as serious as it seems at first glance, since in many problems any of the loss functions of Table 15-7 leads to the same Bayes estimator. The prior distribution p(θ) raises more serious difficulties. An economist cannot regard the unemployment rate θ as a random variable (as though it were drawn from a bowlful of chips). Instead, he may think of p(θ) as a subjective distribution, reflecting his prior betting odds on θ. But he may not view even this as entirely satisfactory; these Bayesian techniques would still require a rough-and-ready specification of p(θ), and such specifications usually do indeed involve subjective judgments.⁶

The other information required for Bayesian inference is p(x|θ), the distribution of the sample data; this can often be borrowed from classical statistics. [For example, recall how we borrowed the classical deduction in (15-42).]

⁶ See, for example, Lindgren, B. W., Statistical Theory, 2nd Ed. New York: Macmillan.
*(c) Classical Methods as Bayesian Methods in Disguise

One interesting observation, however, is that classical methods, which require no such explicit specifications, are by no means free of Bayesian elements. One of the major contributions of the Bayesian method has been to lay bare the assumptions implicit in classical techniques. As we shall see in the next section, some of these fare badly when exposed; in extreme cases any intelligent guess is substantially better.⁷

Suppose a Bayesian wishes to estimate θ with no prior knowledge. In desperation he might use the "equiprobable" prior:

    p(θ) = c, a constant                                 (15-52)

Suppose further that, rather than using the familiar and attractive quadratic loss function, he opts for the 0-1 loss function. He thus will estimate θ with the mode of the posterior distribution:

    p(θ|x) = p(θ) p(x|θ) / p(x)                          (15-8) repeated

    = [c/p(x)] p(x|θ)                                    (15-54)

To find the mode, he finds the value of θ which makes (15-54) largest. But since the bracketed term [c/p(x)] doesn't depend on θ, he only needs to find:

    the value of θ which makes p(x|θ) largest            (15-55)

But this is recognized as just the definition of the classical MLE.⁸

From this, we conclude that a classical statistician who uses MLE is getting the same result as a Bayesian using the 0-1 loss function and an "equiprobable" prior. This seems a very unflattering description of MLE, since neither this prior nor this loss function is easy to justify. But in many cases, MLE is not nearly this restrictive. If p(θ|x) is unimodal and symmetric, as it often is, then its mean, mode and median coincide; in such circumstances MLE is equivalent to Bayesian estimation using any of our three loss functions.

As if the discussion of MLE above has not been damaging enough, we consider an even more questionable application.

⁷ A further criticism of Bayesian methods is that there is too great a cost of computing Bayesian estimates (not to mention learning about them); but this criticism is being weakened with the advent of better computer programs.

⁸ Note that in developing MLE in Section 7-3, the notation p(x; θ) was used, equivalent to p(x|θ) used here.

Suppose we are estimating a population proportion π (as in Problem 15-11). It has been proved⁹ that a classical statistician using MLE will arrive at the same result (estimating π with x/n) as a Bayesian using the quadratic loss function and the prior distribution p(π) shown in Figure 15-7.

[FIG. 15-7  The prior distribution p(π) implicit in classical MLE.]

We recall that we may have been uncomfortable about the prior distribution graphed in Problem 15-11; but it was vastly better than this, the worst prior we have yet encountered. This one is obviously hopeless. (It means, for example, that either a huge majority of students are Republican, or a huge majority are Democratic.) There is very strong reason to reject it, and this explains why MLE can occasionally give a strange result in a small sample. In conclusion, although MLE has many attractive large-sample characteristics [see Section 7-3(e)], it should be used with great caution in small-sample estimation.

⁹ Again, Lindgren, B. W., Statistical Theory, 2nd Ed., New York: Macmillan.

15-6 HYPOTHESIS TESTING AS A BAYESIAN DECISION

Suppose there are two species of beetle. Species S₀ is harmless, while S₁ is a serious pest, requiring the use of a new, expensive insecticide. A beetle is sighted in a territory as yet uninfested; but this sighting provides no information useful in establishing whether the beetle was S₀ or S₁. Should the insecticide be used or not?

To answer this question, we need to know the costs l(a, θ) of a wrong decision, and the probabilities p(θ) of it being one species or the other; these are given in Table 15-9. Obviously action a₀ (don't spray) is appropriate if the state of nature is S₀ (harmless beetle), while a₁ is appropriate if the state is S₁.

It will be convenient to generalize the loss table, calling each element l(a, θ), or lᵢⱼ for short; thus, for example, l(a₁, θ₁) = l₁₁ = 15.

TABLE 15-9  Loss Function l(a, θ), and Probabilities of the States of Nature

                          S₀                     S₁
                          (Harmless Species)     (Harmful Species)
Action a₀ (don't spray)         5                     100
Action a₁ (spray)              15                      15
p(θ)                           .7                      .3

Question (a). Should we spray, or not?

Solution. As always, we calculate the expected losses L(a), by weighting the elements in each row of this table by their appropriate probabilities:

    L(a₀) = p(θ₀)l₀₀ + p(θ₁)l₀₁                          (15-56)

    = (.7)5 + (.3)100 = 33.5

and

    L(a₁) = (.7)15 + (.3)15 = 15 ← min

Thus we see that the optimal action is a₁: spray.

This problem may be expressed in terms of hypothesis testing: action a₀ (don't spray) may be interpreted as accepting hypothesis H₀ (harmless beetle), while action a₁ (spray) may be interpreted as accepting H₁ (harmful beetle).
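The expected-loss comparison in (15-56) is mechanical enough to sketch in a few lines of code (illustrative only; the action names are invented labels, and the numbers come from Table 15-9):

```python
# Expected loss L(a) = sum over states of p(theta) * l(a, theta), as in (15-56)
def expected_loss(losses, prior):
    return sum(p * l for p, l in zip(prior, losses))

loss = {"dont_spray": [5, 100],   # losses under S0 (harmless), S1 (harmful)
        "spray":      [15, 15]}
prior = [0.7, 0.3]                # p(theta) from Table 15-9

L = {a: expected_loss(l, prior) for a, l in loss.items()}
best = min(L, key=L.get)
print(L, best)   # {'dont_spray': 33.5, 'spray': 15.0} spray
```

Replacing `prior` with [0.9, 0.1] reproduces question (b) below, where the decision reverses.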

Question (b). Suppose that prior information about p(θ) shows the harmless species S₀ to be 9 times as common as S₁. Given this new information, what is the optimum action?

Solution. Don't spray, as shown in Table 15-10. In this case the harmful beetle is so rare that it is better to "take the risk," i.e., assume the beetle is harmless as our working hypothesis.

TABLE 15-10  Calculation of the Optimal Action, Given the New Prior

                    S₀      S₁      L(a)
a priori p(θ)       .9      .1
a₀ (Don't spray)     5     100      14.5 ← min
a₁ (Spray)          15      15      15

Question (c). So far we have assumed no statistical information about the beetle that has been sighted. Now suppose that it has been captured, and its length measured as x = 27 mm. Suppose further that the two species are distinguishable by their lengths, which are normal random variables with means θ₀ = 25 and θ₁ = 30 respectively, and with a common σ = 4. What now is the best action, a posteriori? [Assume the p(θ) and losses given in Table 15-9.]

Solution. It will be most instructive to develop a general solution, leaving substitution of particulars to the end. Losses are calculated as in (15-56), substituting the appropriate posterior probabilities p(θ|x) for p(θ):

    L(a₀) = p(θ₀|x)l₀₀ + p(θ₁|x)l₀₁                      (15-57)

Similarly,

    L(a₁) = p(θ₀|x)l₁₀ + p(θ₁|x)l₁₁                      (15-58)

We choose action a₀ if and only if

    L(a₀) < L(a₁)                                        (15-59)

Substituting (15-57) and (15-58) into (15-59), and collecting like terms, we obtain the criterion: choose a₀ iff

    p(θ₁|x)[l₀₁ − l₁₁] < p(θ₀|x)[l₁₀ − l₀₀]              (15-60)

The bracketed quantities

    r₀ = l₁₀ − l₀₀   and   r₁ = l₀₁ − l₁₁                (15-61)

are called regrets. It is easy to see why: r₀ is the extra loss incurred if the beetle is harmless but we used the wrong action, i.e., sprayed (a₁) rather than not sprayed (a₀). Evaluating (15-61) from the column elements in Table 15-9, we see that r₀ = 15 − 5 = 10. Our much larger regret

    r₁ = 100 − 15 = 85

represents our net loss if we employ the wrong action (don't spray) on a beetle that turns out to be harmful.

Returning to (15-60), it may now be written in terms of regrets:

    p(θ₁|x) r₁ < p(θ₀|x) r₀                              (15-62)

Recall that the posterior probabilities in this equation can be expressed using (15-8) as p(θᵢ|x) = p(θᵢ) p(x|θᵢ)/p(x); and noting that p(x) cancels, (15-62) becomes

    p(θ₁) p(x|θ₁) r₁ < p(θ₀) p(x|θ₀) r₀                  (15-65)

An appropriate cross-multiplication of (15-65) leads us to an important theorem, called the full

Bayesian Likelihood-Ratio Criterion:

    Accept H₀ iff   p(x|θ₁)/p(x|θ₀) < r₀ p(θ₀) / [r₁ p(θ₁)]     (15-66)

The left-hand side of (15-66) is called the "likelihood ratio." [Specifically, p(x|θᵢ) is the likelihood function of the parameter θᵢ given the data x, and is the distribution of the observed data if θᵢ is true; it appeared in maximum likelihood estimation in Section 7-3. p(θᵢ) is the prior probability of θᵢ, and rᵢ the regret (penalty for error). The prior distribution and the likelihood function are often borrowed directly from classical deduction.]

This criterion is certainly reasonable. H₀ is accepted if p(x|θ₁) is sufficiently small [i.e., sufficiently less than p(x|θ₀)] to satisfy the inequality; thus if the data is very implausible under θ₁, H₀ is accepted, as it should be.

To illustrate, consider the very simple case in which the regrets are assumed equal, and the prior probabilities p(θ₀) and p(θ₁) are also assumed equal. The right-hand side of (15-66) becomes 1; H₀ is accepted if the likelihood of θ₀ generating the sample [p(x|θ₀)] is greater than the likelihood of θ₁ generating the sample [p(x|θ₁)]. Otherwise, the alternative H₁ is accepted. In simplest terms: we select the hypothesis which is more likely to generate the observed x. In this sense, hypothesis testing, viewed within this context, could be
called a maximum likelihood technique, as illustrated in Figure 15-8a. In b we make the further assumption that the two likelihood functions (centered on θ₀ and θ₁, with θ₀ < θ₁) are normal distributions with a common σ. Then, in this special case, the criterion (15-66) reduces to:

    Accept H₀ iff the observed x is closer to θ₀ than to θ₁     (15-67)

Again, an obviously reasonable result.¹⁰

[FIG. 15-8  Bayesian hypothesis testing, using the likelihood ratio (special case when r₀ = r₁ and p(θ₀) = p(θ₁)). (a) For any two distributions p(x|θᵢ): accept H₀ if x is observed in the range where p(x|θ₀) exceeds p(x|θ₁); accept H₁ otherwise. (b) If p(x|θᵢ) = N(θᵢ, σ²): the critical value of x falls midway between θ₀ and θ₁.]

¹⁰ In fact, symmetry is not required; in case (a) the two distributions need only be unimodal.

If the regrets and priors are not assumed equal, the criterion (15-66) for accepting H₀ becomes

    e^[−(1/2σ²)(x−θ₁)²] / e^[−(1/2σ²)(x−θ₀)²] < r₀ p(θ₀) / [r₁ p(θ₁)]     (15-68)

This may be reduced¹¹ to: accept H₀ iff

    x < [σ²/(θ₁ − θ₀)] log{r₀ p(θ₀) / [r₁ p(θ₁)]} + (θ₁ + θ₀)/2     (15-69)

(The logarithms used throughout this section are natural logarithms, to the base e. The common logarithms of Appendix Table VIII can be converted to natural logarithms by multiplying by 2.30.) We note that the right-hand side of (15-69) is independent of x; as in all hypothesis tests, this critical value can be evaluated prior to observing x. At the same time it does depend, as expected, on the background information p(θ) and the regrets. Moreover, when r₀ = r₁ and p(θ₀) = p(θ₁), the log term disappears and (15-69) reduces to the special case (15-67).

Finally, the particular beetle problem can now be solved. Substituting the information given in question (c) and Table 15-9 into (15-69) yields: accept H₀ iff

    x < (16/5) log[10(.7)/(85(.3))] + 27.5               (15-76)

    = 3.2(−1.29) + 27.5                                  (15-77)

    = 23.4                                               (15-78)

¹¹ Details. Taking logarithms of (15-68):

    −(1/2σ²)(x − θ₁)² + (1/2σ²)(x − θ₀)² < K             (15-70)

where

    K = log{r₀ p(θ₀) / [r₁ p(θ₁)]}                       (15-71)

Rearranging (15-70):

    (1/2σ²)[2θ₁x − 2θ₀x − (θ₁² − θ₀²)] < K               (15-72)

i.e., accept H₀ iff:

    2(θ₁ − θ₀)x − (θ₁² − θ₀²) < 2σ²K                     (15-73)

    x < σ²K/(θ₁ − θ₀) + (θ₁ + θ₀)/2                      (15-74)

Using the definition of K in (15-71), (15-74) may be written as (15-69).
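The critical value computed in (15-76) to (15-78) can be checked directly; this sketch (not from the text) simply evaluates the right-hand side of (15-69):

```python
import math

# Critical value for accepting H0, per (15-69):
#   x < sigma^2/(theta1 - theta0) * ln(r0*p0 / (r1*p1)) + (theta1 + theta0)/2
def critical_value(theta0, theta1, sigma2, r0, r1, p0, p1):
    return (sigma2 / (theta1 - theta0)) * math.log(r0 * p0 / (r1 * p1)) \
           + (theta1 + theta0) / 2

c = critical_value(theta0=25, theta1=30, sigma2=16, r0=10, r1=85, p0=0.7, p1=0.3)
print(round(c, 1))   # 23.4
# The observed 27 mm beetle exceeds this critical value, so H0 is rejected.
```

Setting r0 = r1 and p0 = p1 makes the log term vanish, reproducing the midpoint rule (15-67).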

Since a 27 mm beetle was observed, this condition is violated, and we reject H₀: spray. [From (15-78) we can confirm an answer that at first does seem strange: even if the beetle measured 25 mm, exactly the mean length we would expect of a harmless beetle, the condition would still be violated, and we would still spray. With further thought, this turns out to be reasonable. The heavy damage involved if the beetle is harmful induces us to spray to avoid this risk; it is the relative size of the two regrets that explains this result, and that governs how the critical value is chosen.]

The hypothesis testing described here involves only two states of nature (two competing hypotheses, θ₀ and θ₁). Thus it is limited in scope, since hypothesis testing with a composite hypothesis is not covered; we have covered only the material of the first section of Chapter 9.

In recalling that material, we see that classical hypothesis testing had the advantage of being far simpler; but it was also less satisfying. It used only the probability function p(x|θ), while the Bayesian method also exploits the prior distribution p(θ) and the regrets; we have seen in the last section how important both these are in setting up an appropriate test. Restated, the classical method sets the critical value (for example, by setting α = 5%) sometimes arbitrarily, sometimes with implicit reference to vague considerations of loss and prior belief. Bayesians would argue that these considerations should be explicitly introduced, with all the assumptions exposed, and open to criticism and improvement.
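As a cross-check on question (c), one can also work directly with posterior probabilities, as in (15-57) and (15-58): compute p(θᵢ|x) by Bayes' rule with normal likelihoods, then compare expected losses (an illustrative sketch; the dictionary keys are invented labels):

```python
import math

# Normal likelihood p(x|theta_i) = N(theta_i, sigma^2)
def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

x, var = 27, 16                      # observed length; sigma^2 = 4^2
prior = {"S0": 0.7, "S1": 0.3}       # from Table 15-9
mean = {"S0": 25, "S1": 30}
loss = {"dont_spray": {"S0": 5, "S1": 100},
        "spray":      {"S0": 15, "S1": 15}}

# Bayes' rule: p(theta_i|x) proportional to p(theta_i) * p(x|theta_i)
joint = {s: prior[s] * normal_pdf(x, mean[s], var) for s in prior}
post = {s: joint[s] / sum(joint.values()) for s in joint}

# Expected losses (15-57) and (15-58), using the posterior probabilities
L = {a: sum(post[s] * l[s] for s in post) for a, l in loss.items()}
print(min(L, key=L.get))   # spray
```

The decision agrees with the likelihood-ratio route: 27 mm exceeds the critical value 23.4, so spraying minimizes expected loss.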

PROBLEMS

15-12 Repeat the hypothesis test of question (c) above, with the prior and losses given in Table 15-10.
(a) With your measurement of 27 mm, what would you do? Why does our argument in the preceding paragraph (spray even if the beetle is 25 mm) no longer hold?
(b) Suppose that species S₀ and S₁ were equally frequent. Would that change your decision?
(c) How frequent would species S₀ have to be in order to alter your decision?

15-13 Suppose a psychiatrist has to classify people as sick or well (hospitalized or not) on the basis of a psychological test. The test scores are normally distributed, with σ = 8, and mean θ₀ = 100 if they are well or θ₁ = 120 if they are sick. The losses (regrets) of a wrong classification are obvious: if a healthy person is hospitalized, resources are wasted and the person himself may even be hurt by the treatment. Yet the other loss is even worse: if a sick person is not hospitalized, he may do damage, conceivably fatal. Suppose this second loss is considered roughly five times as serious. From past records it has been found that of the people taking the test, 60% are sick and 40% are healthy.
(a) (1) What should be the critical score, above which the person is classified as sick? Then (2) what is α? (Probability of type I error.) (3) What is β? (Probability of type II error.)
(b) (1) If a classical test is used, arbitrarily setting α = 5%, what will be the critical score? Then (2) what is β? (3) By how much has the average loss been increased by using this less-than-optimal method?
(c) What would we have to assume the ratio of the two regrets to be, in order for it to be reasonable to arrive at a classical test having α = 5%? Do you think it is?

*15-7 GAME THEORY

At this point we leave statistical decision theory to consider a rather interesting branch of the general theory of decisions: the theory of games. Recall that the concept of probability was developed in Chapter 3 as a groundwork for the statistical deduction and induction that followed. Game theory is not part of this statistical theory; rather, it illustrates a quite different application of the concept of probability.

Game theory is a way of analyzing conflict situations. These may arise, for example, in poker, business, or politics; thus the conflicting parties might be card players playing for insignificant stakes, oligopolists in business, or military leaders engaged in a desperate set of moves and countermoves.

(a) Strictly Determined Games

The players employ strategies. Because a player can choose his strategy, he has some control over the outcome of the game. But this control is not complete; the outcome will also depend on the strategy of his opponent. The way the outcome of the game is related to the strategies of both players is shown in Table 15-11; this is called the "payoff matrix," and defines

the game.

TABLE 15-11  An Example of a Payoff Matrix (Loss Function for B)

[The 3 × 3 table of payoffs from B to A, over B's strategies 1, 2, 3 (columns) and A's strategies 1, 2, 3 (rows), is only partly legible here. The payoffs used in the discussion below: A's strategy 1 against B's strategy 1 pays 25, but against B's strategy 2 only 6; A's strategy 3 against B's strategy 3 pays only 5; A's strategy 2 pays 20, 10, and 18 against B's strategies 1, 2, and 3 respectively.]

Thus, if A selects strategy 2 and B strategy 1, the payoff is 20 (where B pays A). Alternatively, there might also be a payoff matrix for B, similarly dependent on the strategies selected by the two players. However, to keep the discussion simple, we assume that this is a "zero-sum" game, i.e., what A gains, B loses. Thus Table 15-11 defines not only the gain matrix for A, but also the loss matrix (cost function) for B. Obviously, A should be trying to make the outcome as large as possible, while B will keep it as small as possible.

B will normally have no interest in playing this game, where he can do nothing but lose. So we might think of a payoff matrix with some negative elements, where A pays B; alternatively, A might commit himself to a side payment, bribing B $12 for each time he plays, in order to induce B to play. (This is the assumption we now make, in order to keep the all-positive payoff matrix, with its easier geometric interpretation.) The question then is: "With this $12 side payment, is it in B's interest to play this game?"

If a player can select his strategy after he knows how his opponent is committed, his strategy is obvious, and the game becomes a trivial one. For example, if it is known that B has chosen strategy 3, A will just scan column 3, select the largest payoff (18), and then play his appropriate strategy 2. The essence of game theory, however, is that each player must commit himself without knowledge of his opponent's decision; he knows only the payoff matrix. We further assume that the game is repeated many times, so that the only clues a player has about his opponent's strategy must come from his past pattern of play.

In this circumstance A finds the continuous play of strategy 1 unattractive. It is true that this row has the largest possible payoff ($25). But this requires B's cooperation in playing his strategy 1, and it is clearly not in B's interest to cooperate. Indeed, if B observes that A is continuously playing strategy 1, he will select strategy 2, thus keeping the payoff down to 6. A finds strategy 3 similarly unattractive; B will counter with strategy 3, reducing the payoff to only 5. A chooses strategy 2; the very best play by B will still yield a payoff of 10. Now review why A chose strategy 2: he calculated the minimum value in each row and then selected the largest of these minimum values. This maximum of the row minima is called the "maximin."
Now consider the problem from B's point of view. Recall that he wants to keep the payoff as low as possible. Strategy 1 is ill-advised; when A observes him playing this, he may counter with strategy 1, leaving B with a loss of $25. Strategy 3 is also rejected; it may cost him $18. He selects strategy 2; the most it can cost him is $10. Note that B calculates the maximum value of each column, and then selects the smallest of these. This minimum of the column maxima is called the "minimax."

Note that in this game minimax occurs at the same point as maximin, with a payoff of 10. A will play his strategy 2, and B will play his strategy 2; this is called a "strictly determined" game, because minimax and maximin coincide.

This is illustrated in Figure 15-9, a diagram of the payoff matrix in Table 15-11, with each payoff measured vertically over A's strategies and B's strategies. At the point where minimax and maximin coincide, we note a special feature of the payoff matrix: a "saddle point," an element that is both the largest in its column and the smallest in its row.

[FIG. 15-9  Compare the payoff surface to a saddle.]

Summary: when a saddle point exists, the game is strictly determined; maximin and minimax coincide, and it is best for both players to play the saddle-point strategy. The saddle point is the payoff that is both the smallest element in its row and the largest element in its column.

343 GAME THEORY

In this game the payoff (from B to A) is always 10, so that B clearly will not play such a game unless he is, say, bribed $12 to do so.
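The maximin/minimax test above is easy to mechanize. The matrix in the sketch below is hypothetical (it is not the book's Table 15-11), chosen only so that, as in the text, maximin = minimax = 10 and the game is strictly determined:

```python
# Hypothetical 3 x 3 zero-sum payoff matrix (payoffs from B to A);
# rows are A's strategies, columns are B's.
payoff = [
    [20,  5, 18],
    [25, 10, 11],
    [12,  3,  7],
]

def maximin(m):
    """A's conservative choice: the row whose minimum payoff is largest."""
    mins = [min(row) for row in m]
    row = max(range(len(m)), key=lambda i: mins[i])
    return row, mins[row]

def minimax(m):
    """B's conservative choice: the column whose maximum loss is smallest."""
    cols = list(zip(*m))
    maxes = [max(c) for c in cols]
    col = min(range(len(cols)), key=lambda j: maxes[j])
    return col, maxes[col]

a_row, lower = maximin(payoff)
b_col, upper = minimax(payoff)
print(a_row + 1, b_col + 1, lower, upper)   # 2 2 10 10
print(lower == upper)                       # True: saddle point exists
```

Because the two values coincide here, each player can announce his strategy in advance without loss; when they differ, a mixed strategy is called for, as the next section shows.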

PROBLEMS

15-14 What is the appropriate strategy for each player in each of the following games? In each case decide which player the game favors.

[Payoff matrices (a) and (b); the surviving entries include -1, -2, -3, 1, 2, 3, 10, -20, -4, and -6.]

Mixed Strategies

Let us now try to apply the theory of part (a) to the following game:

TABLE 15-12

              B
          1    2    3
A   1     3    6    2
    2     5    4    8

A would select his strategy 2; this is the row with the largest minimum value (maximin = 4). At the same time B would select his strategy 1; this is the column with the smallest maximum value (minimax = 5). But now problems arise; because minimax and maximin do not coincide, there is no saddle point. Such a game is not strictly determined, with each playing only one strategy; it is easy to see why. B begins by playing column 1, while A plays row 2; the payoff is 5. Now B observes that, as long as A is playing row 2, he can do better by playing column 2, thus reducing the payoff to 4. But when B switches to column 2, it is now in A's interest to switch to row 1, raising the payoff to 6. As an exercise the student should confirm that a whole series of such moves and countermoves are set into play, eventually drawing the players in a circle around to the initial position. Then a new cycle begins. This will continue until the players recognize a fundamental point. Once a player allows his strategy to be predicted, he will be hurt. Thus, for example, when A's strategy becomes clear, he can be hurt by B. What is his defense?

A's best plan is to keep B guessing. Thus if A determines his strategy by a chance process, B will be unable to predict what he will do. For example, A might toss a coin, playing row 1 if heads, or row 2 if tails. He is using a "mixed strategy," weighting each row with a probability of .50. Now B doesn't know what to expect; the only question left for A is whether this 50/50 mix is the best set of odds to use.

The best mix of strategies for A is determined in Figure 15-10. Along the horizontal axis we consider various probabilities that A may attach to playing row 1. This is all A has to select; once this probability p is determined, his strategy is fully specified.

[FIG. 15-10. Determination of A's best mixed strategy (for the game of Table 15-12); A's probability p of playing row 1 is plotted horizontally, and the expected value V of the game vertically.]

On the vertical axis we plot the expected value of the game which, of course, depends not only on the probability p that A may select, but also on what B may do. If B plays only column 1, then the expected value of the game is a function of whatever probability p A may select; this appears in the diagram as the line V1, worth examining in detail. At the extreme left, if A sets p = 0 (i.e., never plays row 1, but always plays row 2), the value of the game V1 is 5. On the other hand, at the other extreme, if A sets p = 1 (and always plays row 1), then V1 = 3. Or if A sets p = 1/2, then

V1 = 3(1/2) + 5(1/2) = 4    (15-79)

Generally, for any probability p A may select:

V1 = 3p + 5(1 - p) = 5 - 2p    (15-80)

which confirms that V1 is a straight-line function of p. Similarly, if B plays only column 2, then

V2 = 6p + 4(1 - p) = 4 + 2p    (15-81)

Or, if B plays only column 3, then

V3 = 2p + 8(1 - p) = 8 - 6p    (15-82)

These last two equations are also graphed in Figure 15-10.

The game is now laid out for A to analyze, with his problem being to select p. If he selects p = 1/8, his opponent will counter by always playing 2, and keep the expected value of the game at 4¼. [This is shown geometrically, and confirmed by evaluating (15-81), setting p = 1/8.] Or if A selects p = 1/2, B will counter with 1, thus keeping the expected value of the game down to 4. Since A is dealing with an opponent who will be selecting strategies to keep V low, the expected value of the game from A's point of view is shown as the hatched line in Figure 15-10. The best A can do, therefore, is to select p = 1/4. This guarantees V = 4½; moreover, note that this is the intersection of V1 and V2. Thus this is the value of the game regardless of whether B plays 1 or 2. This geometric solution may be read from Figure 15-10, or determined algebraically by setting V1 = V2; using (15-80) and (15-81):

5 - 2p = 4 + 2p    (15-83)

p = 1/4    (15-84)

Finally, this value of p is substituted back into (15-81) for the value of the game:

V2 = 4 + 2(1/4) = 4½    (15-85)

Thus A decides to attach a probability of 1/4 to playing row 1. How does he put this into practice? There are several possibilities; for example, he might toss 2 coins. If they both come up heads (probability 1/4), then he plays row 1; if not, he plays row 2. If this game is repeated many times, A will insure that he receives an average payoff which will tend towards 4½, and there is nothing B can do to reduce this. All B can hope for is that A has bad luck (e.g., by the luck of the toss, A plays row 1 when B is playing column 1). This sort of bad luck can reduce A's average winnings below 4½ if the game is played only a few times (or A's good luck can raise his average winnings above 4½); but as the game is played over and over, the element of bad luck tends to fade out.
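The algebra of (15-83) through (15-85) can be checked in a few lines. The payoff rows below are reconstructed from equations (15-79) through (15-82); exact fractions keep the arithmetic transparent:

```python
from fractions import Fraction

# Table 15-12, reconstructed from equations (15-79)-(15-82):
# rows are A's strategies, columns are B's.
row1 = [3, 6, 2]
row2 = [5, 4, 8]

def V(p, col):
    """Expected payoff when A plays row 1 with probability p, row 2 otherwise."""
    return p * row1[col] + (1 - p) * row2[col]

# B picks whichever column makes the payoff smallest, so A's guarantee at a
# given p is the lower envelope over the columns (the hatched line of
# Figure 15-10).  Equating V1 and V2, i.e., 5 - 2p = 4 + 2p, gives p = 1/4.
p = Fraction(1, 4)                              # (15-84)
print([float(V(p, col)) for col in range(3)])   # [4.5, 4.5, 6.5]
print(min(V(p, col) for col in range(3)))       # 9/2, i.e. V = 4 1/2
```

At p = 1/4 the two binding columns both yield 4½, so B gains nothing by switching, exactly as (15-85) asserts.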

PROBLEMS

=> 15-15 (a) Let's play a variation on matching coins. Each of us will choose heads or tails, independently and secretly. I'll pay you $30 if I show tails and you show heads. I'll pay you $10 if I show heads and you show tails. Finally, to make it fair, you pay me $20 if we match (i.e., both show heads, or both show tails). Do you want to play? Why?

(b) What are the optimal strategies of the two players in an ordinary game of matching pennies? (Recall that in this game, one player gets the pennies if they match, the other gets them if they don't match.)

15-16 You find yourself on a long sea voyage, and a companion wants to play a game with you. Rather than matching pennies, he suggests that you match a penny against his cards: you secretly show a head or a tail, while he secretly selects one of the four aces.

(a) If he's chosen the spade, he pays you $-10; if the heart, diamond, or club, he pays you $15, $4, and $1 respectively when you show a head, and $-2, $1, and $-5, again depending on which ace he's chosen, when you show a tail. Do you agree to play? Why?

(b) Suppose you played this game five times, and found that you had won $5. Would you still toss your penny, or would you select a head or a tail depending on which ace you expect him to choose? What do you conclude?

(c) Are there any two lines in your diagram that do not intersect? From this and the payoff matrix, show that it is always preferable for him to select the club instead of the heart ace, no matter which you select (i.e., the heart strategy is "dominated by" the club strategy). By initially examining the payoff matrix, he could therefore have dropped the heart strategy from all further consideration.

In solving a game, then, the first step is to test whether maximin and minimax coincide. If they do, the game is strictly determined, and the single best strategy for each player is determined. If maximin and minimax do not coincide, the game is not strictly determined, and mixed strategies are called for; these are determined in simple cases geometrically or algebraically, as we have illustrated, while more complex cases require more advanced mathematical techniques extending this mechanical solution. But rather than pursue these, it is more important to consider the philosophy and fundamental assumptions underlying game theory:

1. A player using his best mixed strategy can guarantee a certain expected value for the game, regardless of what his opponent may do. However, this is only the value towards which the average of many games will tend. If the game is only played a few times, luck may raise or lower this payoff.

2. The optimal mixed strategy is determined by a random process (tossing a coin). It is simply not good enough to decide to play each strategy half the time (e.g., p = 1/2) by alternating: 1, then 2, then 1, then 2, and so on. Once the opponent observes this pattern, he can predict your next play and hurt you. Note, in the simplest game of matching pennies (Problem 15-15b), how badly a player would be hurt if he interchanged heads and tails rather than tossed the coin: once an intelligent opponent observed this pattern, he could win every time. Each player must be unpredictable, by deciding his play by chance.

3. The theory of games is a very conservative strategy. It is appropriate if a game is being replayed many times against an intelligent strategist, who is out to get you, knows the payoff matrix, and can observe your strategy mix. If these conditions are not met, chances are you can find a better strategy than game play. To illustrate, consider an extreme example. Your payoff is:

[2 x 2 payoff matrix, You versus Opponent; the entries include $4,000 and $4.]

Since maximin and minimax coincide, both you and your opponent should play strategy 2 every time. But on the first play, your opponent plays strategy 1! This means either that he is a fool, or unaware of the payoff matrix (and the $4,000 debacle that he faces). It doesn't matter which; in these circumstances you drop game strategy, play row 1, and punish your opponent for his stupidity or ignorance.
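A small simulation illustrates point 2: as long as A's plays come from a genuine chance process with p = 1/4 (the mix found for the game of Table 15-12), his long-run average is about 4½ or better no matter which single column B settles on. The payoffs below are those implied by (15-80) through (15-82); the seed and trial count are arbitrary:

```python
import random

# Table 15-12: rows are A's strategies, columns are B's.
payoff = [[3, 6, 2], [5, 4, 8]]
rng = random.Random(1)               # fixed seed, so the run is repeatable

def average_payoff(b_col, n=200_000):
    total = 0
    for _ in range(n):
        # two coin tosses: both heads (probability 1/4) -> play row 1
        row = 0 if (rng.random() < 0.5 and rng.random() < 0.5) else 1
        total += payoff[row][b_col]
    return total / n

# Whichever single column B settles on, A averages about 4.5 or better:
for col in range(3):
    print(col + 1, round(average_payoff(col), 2))
```

By contrast, a player who merely alternated rows in a fixed pattern could be read by his opponent and held below this value, which is the point of deciding each play by chance.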
Game theory also should not be used in games against nature. As an example, suppose you are trying to decide whether to hold a picnic indoors or outdoors. Your profit depends on both where the picnic is held, and the weather:

                       Nature
                    Rain    Not Rain
You  1 Indoors       100          0
     2 Outdoors        0      1,000

You can easily confirm that game theory means selecting p = 10/11 (i.e., holding the picnic indoors with probability 10/11), with an expected profit of just over $90. Clearly something has gone wrong. An intuitive glance at the payoff matrix confirms that you should go outdoors, providing there is a reasonable chance that it won't rain. Game strategy is inappropriate because it rests on the false premise that nature is a hostile opponent, determining the weather with the sole objective of ruining your picnic (i.e., minimizing V); this premise is dead wrong in this case. Instead, the weather odds are determined independently; let us suppose the probability is 4/5 that it will not rain. With these odds, you should simply hold the picnic outdoors, with an expected profit of

(1/5)(0) + (4/5)(1,000) = 800    (15-86)

In conclusion, if a hostile, intelligent, and informed opponent determines the conditions you face, then game theory is appropriate. But if these conditions are determined independently (e.g., rain versus shine), game theory may be too conservative a line of play, and the simpler course is the Bayesian solution outlined in Section 15-2: the prior distribution p(θ) is determined, and the strategy with the highest expected value is selected. The student will immediately see that one of the key game-theory assumptions (a hostile opponent) does not hold in the picnic example, and this is why the simpler Bayesian solution (15-86), rather than the more complicated game solution, is required.
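The picnic comparison can be reproduced directly. The zero entries in the matrix below are inferred from the stated results (the 10/11 mix, the value just over $90, and the $800 of (15-86)):

```python
from fractions import Fraction

# Reconstructed picnic payoffs; rows are your acts, columns are the weather.
profit = {"indoors":  {"rain": 100, "not_rain": 0},
          "outdoors": {"rain": 0,   "not_rain": 1000}}

# Game-theory solution: treat nature as an adversary and equate the two
# weather "columns":  100p = 1000(1 - p)  =>  p = 10/11 indoors.
p = Fraction(10, 11)
v = p * profit["indoors"]["rain"]     # guaranteed value: 1000/11, just over $90
print(float(v))

# Bayesian solution, as in (15-86): apply the prior P(not rain) = 4/5
# to each pure act and pick the larger expected profit.
prior = {"rain": Fraction(1, 5), "not_rain": Fraction(4, 5)}
ev = {act: sum(prior[w] * profit[act][w] for w in prior) for act in profit}
print(ev["outdoors"], ev["indoors"])  # 800 20 -> simply go outdoors
```

The contrast ($800 outdoors versus roughly $90 from the adversarial mix) is exactly the text's point: against an indifferent nature, the Bayesian expected-value rule dominates the game-theory solution.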


Appendix

LIST OF TABLES

I. N² and √N
II. (a) Random Digits
    (b) Random Normal Numbers, Standardized
III. (a) Binomial Coefficients
     (b) Binomial Probabilities, Individual
     (c) Binomial Probabilities, Cumulative
IV. Z, Standard Normal Distribution
V. Student's t Critical Points
VI. Modified χ² (C²) Critical Points
VII. F Critical Points
VIII. Common Logarithms

Citations for Tables

TABLE I

Squares and Square Roots: N, N², √N, and √10N

[Numerical entries for N = 1.00 through 7.82 in this excerpt; columns give N², √N, and √10N for each N.]

2.79645 8,84508

7.84

61.4656

2.80000 8.85458

7.86

61.7796

2.80357 8.86566

7.85 61,50892.79821
7.85 61,62252.80179

7.87

7.88
7.89

8.85742

8.84875
8.86002

61.9369

2.80555

8.87150

62.2521

2.80891

S.88257

62.0944 2.807158.87694

7.40

2.72029

8.60255

7.41

2.72215

8.60814

7.9I

62.568I

2.81247

8.61974

7.92

62.7264

2.81425

8.89582
8.89944

7.94

65.0456

2.81780

7.95

8.91067

8.91628
65.5616 2.821558.92188
65.5209
2.82512
8.92749

7.42

7.43

2.72597
2.72580

7.44
7.45
7.46

2.73150

7.47
7.48
7.49

7.50
N

2.72764
2.72947

8.61594

8.62554
8.63134

8.63713

2.73313

8.64292

2:73679

8.65448

2.75861

8.66025

2.73496 8.64870

7.90

7.95
7.96
7.97

7.98
7.99

62.4100 2.81069 8.88819

62.8849 2.816058.90505
65.2025

65.6804

65.8401

2.81957

2.82489

2.82666

8.00 64.00002.82845

8.95308

8.95868

8.94427

TABLE I

2.82845

(Continued)

8.94427

8.50 72.2500 2.915489.21954_

8.00

64.0000

8.0I

64.I601 2.85019 8.94986

8.5I

72.4201

2.91719 9.22497

8.02

64.3204

8.05

64.4809

8.55

72.7609

2.92062

8.54

72.9516 2.92255 9.24121

8.55

75.1025

8.56

75.2756

2.92404
2.92575

9.24662
9 25205

8.57
8.58

75.4449

2.92746

9.25745

75.7881

2.95087

9.26825

2.85196
2.85575

8.52 72.5904 2.918909.25038

8.95545

8.96105

8.04 64.6416 2.85549 8.96660


8.05
64.8025
2.85725
8.97218
8.06 64.9656 2.859018.97775
65 1249
65 2864
65.4481

8.07

8.08
8.09

-65.6100

-8.10

8.11

8.12

2.84077
2.84\177o5

8.98352
8.98888

2.84429

8.99444

2.84605

8.59

9.00000

2.84956

8.15

9.01110

66.0969 2.85152 9.01665

8.14

66.2596

2.85507

9.02219

8.15

'-8.61 74.152I 2.95428

66.4225 2.85482 9.02774


66.5856

8.16

2.85657

66.7489

2.85852

9.05881

8.19

67.0761

2.86182

9.04986

8.18

67.4041

2.86551

9.06091

8.22
8.25

67.5684
67,7529

2.86705
2.86880

9.06642
9.07195

8.24
8.25
8.26

67.8976
68.0625
68.2276

2.87054
2.87228
2.87402

9.07744
9.08295
),08845

8.27
8.28
8.29

68.5929
68.5584
68.7241

-8.30

2.87576
2.87750
2.87924

69.0561

69.2224 2.88444 9.12140


69.3889

2,88271

9.11592

2.88617

).12688

9.14877

8.59

70.5921

2.89655

9.15969

70.5600 2.898289.16515

-8.41

8.42

70.7281
70.8964

8,44

71,2336

2.90517

8\17745

71.4025

2.90689

8,45

\177.90000

2.90172

9.54545

8.74

76.5876

2.95655

9.34880

8.76

76.7576

2.95975

9.55949

8.77

76.9129

2.96142 9.56485

77.0884

2.96511

9.37017

77.4400

2.96648

9.58085

77.6161

2.96816

8.78
8.79

\177.80

8.81

8.82
8.85

V'\177

76.5625 2.958049.55414

77.2641

77.7924

77.9689

2.96479 9.57550_
9.58616
2.96985
7.59149
2.97155 9.59681

8.84 78.14562.97521
2.97489

9.40215
9.40744

2.97658

9.41276

8.85
8.86

78.5225
78.4996

8.87

78.6769 2.97825 9.41807


78.8544
2.97995
9.42558
79.0521 2.98161 9.42868

8.88

8.89
\177.90

79.2100

2.98529

9.45398

-8.91\"' 79.5881'\177.984969.45928

8,)2 79.5664 2.986649,44458

9.17061

9.17606

8,95

71,0649 2.905459,18150

Ns

9.52758

2.95466

9.18695
9.19259

71,5716 2,90861 9,19785


8.47 71.7409 2.910539.20526
8,48
71.9104
2.91204 9.20869
8.49 72.0801 2.915769.21412
8.5 O- -72.2500
2.91548 9.21954_

8,46

2.94958

76.2129

8.58 70.2244 2.894829.15425

8,40

-75.6900

8.75

69.5556 2.887919,15256
69.7225
2.88964
9.15785
8,56 69.8896 2.891579,14550
2.89510

9.50591

2.94449 9.51128
75.5424
2.94618
9.51665
75.5161 2.94788 9 52202_
75.8641 2.95127 9.55274

8.55

70.0569

9.30054

2.94279

8.71

8.54
8.37

2.94109

75.1689

8.75

68.8900 2.88097 9.11045_

8.33

74.9956

9.28440
9.28978

8.72 76.0584 2.95296 9.55809

9,10494

8.32

74.8225

8.69

9.09595
9.09945

8.51

74.6496 2.95959 9.29516

8.65

\177,\1770'

67.2400 2.86556 9.05559

8.21

8.64

8.68

66.9124 2.86007 9.04454

8.20

74.5044
74.4769

8.67

2.95598
2.95769

9.27901

8.62
8.65

8.66

\177.05527

8.17

2.92916 9.26285

75.9600 2.95258 9.27562

8,60

65.7721 2.84781 9.00555


65.9544

75.6164

9.23580

x 10N

358

79.7449

2.98851

9.44987

8,94 79.9256 2.989989.45516


8.95
80.1025
2.99166 9.46044
8,96 80,28162.99555
9.46575
8.97

80.4609

2.99500

9.47101

8.99

80.8201

2.99855

9.48156

.oo

81.0000 3.000009.48683

8.98

80.6404 2.996669.47629

TABLE

3.00000

81.0000

\177100

).01 81.1801

9.48683

90.,\17700

3.08221

5.001679.49210

9.51

90.4401

5.05585

9.73192

9.55

90.8209

5.08707

9.76217

0.,50
I

}.02

81.3604

3.00533

9.49737

LOS

81.5409

5.00500

9.50265

L04
\177.05

81.7216
81.9025

5.00666
3.00832

9.51315

!h06 82.0856

90.6304 5.08545 9.75705

9.54

91.0116 3.088699.76729

9.55

91.2023

9.57
9.58

9.50789

82.2649
82.4464
82.6281

5.0I 164
3.01330
3.01496

c\177.10

82.8100

3.01662

9.53959

t.). 11

82.9921

3.01828

9.54465

9.13

83.3569

3.02159 9.55510

\177.14

83.5396

9.15

85.7225

5.02324
\"' 9
0.0,,490

9.12 83.17443.01995

9.52.565

9.52890

9.53415

84.0889

\177.18
\177.19

84.2724
84.4561

9.65

\177.l\177/lo

92.9296 5.104859.81835
93.1225

3.10644

9,67

95.5089

5.10966

93.7024

3.11127

9.69

95.8961

9.;0

94.09003.11448

9,71
9.72

9.1.2841

9.68

84.6400 3.03515 9.59166


3.03809

9.79285

9.58645

9.58123

84.8241

85.0084

5.09677

9.57601

9.'.21

85.1.929

9.z8,64

3.09516

9.66 95.31565.10805

9.20

3.03480
5.05645

,5.09.554

92.1600 3.09839 9.79796


9.61
92.3521
3.10000 9.80506
9.62 92.5444 5.101619.S0816
9.63
92.7369
5.10322 9.81326

9,64

9.56556

9. I7

9I .5849
91.7764
91.968[

9.77241

9.77753

0.60

9.56033

85.9056 3.02655 9.57079


$.02820
3.02985
3.03150

9.59

9.54987

9.16

3.09031

9.56 91.39363.09192

!1.07

9'.23

9.52

9.74679

5.009989.51840

f\177.08
t.,.09

q.22

(Continued)

9159687

9.60208
9.60729

9.75

3.11288
3.11609

9.82344

9,82855

9.85362
9.84378
9.83870
9.84886

9.85395

9,85901

94.4784

5.11769

94,6729

5.11929

9.86408

94.8676

5.12090

9.86914

85.3776

$.05974 9.61249

85.5625

3.04138

9.61769

9.74
9.75
9.76

9.27

85.9329

3.04467

9.62808

9.77 95.45293.12570

3.29

86.3041

3.04795 9.65846

9.78
9.79

91.110

86.4900

3.04959

9.80 96.04005.15050

!,51

86.6761 5.05125 9.64883


86.8624
3.05287 9.65401

9,81

96.2361

5.15209 9.90454

5.05450

9.65919

96.4.524

87.0489

3.15369

9,85

96.6289

5.15528

4
\177125

85.7476 5.043029.62289

9.26
t
t

9.28 86.11843.04631

\177
.52
i .33

9.65328

9.64565

9.89949

9.90959

9.91464

9.67988

3.06431

9.69020

5.06594

9,69536

9.90

5.06757
3.06920

9.70052
9.70567

9.9549!)
9.92 98.40645.14960
9.959\177j2
9.95
98.6049
5.15119 9.96494

9.94 98.8056\177.15278 9.96995


9.95
99.0025
5.15456 9.97497
9.96 99.20165.15595
9.97998

' .56

\177'
87.6096

' 87.4225

: 87.9844

.39

88.1721

(.

,40

88.5600
': 88.5481

-:88.7564

3.05778 9.66954

3.062689.68504

9':43

88.9249

3.07083

9.71082

9144

89.1156

3.07246

9.71597

9k46

89.4916

5.07571

9.72625

9148

i89.8704

3.07896

9.73653

90.2500

3.08221

7.7.1679

91.45 89.5025

9.91

3.07409 9.72111

9.49 i90.06015.08058
- i

9.884,'75
9.88939

9.89444

9.67471

9.66457

0150

5.12890

9.87927

5.06105

5.05614

\177
,41
i .42

3.127.':;0

9.87421

3.05941

87.2356

i .38

95.6484
95.8441

5.12410

87.7969

' .37

95.2576

5.1 2250

9.84 96.82565,15688
9.91968
9.85
97.0225
5.13847 9.92472
9.86 97.21963.14006
9.92975
9.87
97,4169 3.14166 9.95479
9.88 97.61,t-t
5.14525
9.93982
9.89
97.8121 3.14484 9.94485

',34
i.35

9.82

\"
\"'
9o,06.\177

9.97
9.98
9.99

9.74166

98.0100

98.2081

99.4009
99.6004
99.8001

3.14645

5.15753
3.15911
3.16070

10.00 100.0003.16228

359

9.94987

5.14802

98\1779O
9.o80'\177

9.995(!0
10.0000

:'
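Each row of Table I can be regenerated with a couple of library calls; a minimal Python sketch, using the N = 6.74 row (45.4276, 2.59615, 8.20975) as a spot check. The √(10N) column extends the table to square roots of 63.0 through 100.

```python
import math

n = 6.74
sq = round(n * n, 4)                  # N^2 column
root = round(math.sqrt(n), 5)         # sqrt(N) column
root10 = round(math.sqrt(10 * n), 5)  # sqrt(10N) column
print(sq, root, root10)               # 45.4276 2.59615 8.20975, matching Table I
```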

TABLE IIa  Random Digits

[Rows of uniformly distributed random digits, printed in two-digit groups. Tabled digits omitted.]
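A printed page of random digits can be reproduced today with a seeded pseudo-random generator; a sketch (the seed 12345 is an arbitrary choice, fixed only so the draw is reproducible):

```python
import random

rng = random.Random(12345)                    # fixed seed: same "table" every run
row = [rng.randint(0, 9) for _ in range(50)]  # one 50-digit row
# print in the two-digit groups used by Table IIa
print(" ".join(f"{a}{b}" for a, b in zip(row[::2], row[1::2])))
```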

TABLE IIb  Random Normal Numbers (μ = 0, σ = 1)

[Rows of independent drawings from the standard normal distribution. Tabled values omitted.]
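Standard normal drawings like those of Table IIb can likewise be generated on demand; a sketch checking that a large seeded sample has mean near 0 and variance near 1 (seed 42 is arbitrary):

```python
import random

rng = random.Random(42)                           # arbitrary fixed seed
draws = [rng.gauss(0.0, 1.0) for _ in range(10000)]
mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / len(draws)
print(round(mean, 2), round(var, 2))              # should be close to 0 and 1
```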

TABLE IIIa  Binomial Coefficients

[Rows n = 0 through 20; the entry in row n, column k is (n choose k). Tabled values omitted.]

Note. For coefficients missing from the above table, use the relation (n choose k) = (n choose n − k).
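Binomial coefficients are now computed directly; a Python check of two Table IIIa entries, of Pascal's rule, and of the symmetry relation (n choose k) = (n choose n − k):

```python
import math

assert math.comb(16, 8) == 12870        # a Table IIIa entry
assert math.comb(20, 10) == 184756      # a Table IIIa entry
# Pascal's rule: C(n, k) = C(n-1, k-1) + C(n-1, k)
assert math.comb(20, 10) == math.comb(19, 9) + math.comb(19, 10)
# symmetry: C(n, k) = C(n, n-k), which supplies entries beyond the middle of a row
assert math.comb(20, 13) == math.comb(20, 7)
print("all checks pass")
```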

TABLE IIIb  Individual Binomial Probabilities p(x)

[Blocks n = 1 through 10, rows x = 0 through n; columns π = .05, .10, .15, ..., .50. Tabled values omitted.]

Note. If π > .50, interchange π and (1 − π), and x and (n − x).
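Each entry of Table IIIb is p(x) = (n choose x) π^x (1 − π)^(n−x); a sketch reproducing two entries:

```python
import math

def p(x, n, pi):
    """Individual binomial probability, the Table IIIb entry."""
    return math.comb(n, x) * pi ** x * (1 - pi) ** (n - x)

print(round(p(1, 3, 0.30), 4))   # .4410 (n = 3 block, pi = .30, x = 1)
print(round(p(2, 5, 0.50), 4))   # .3125
```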

TABLE IIIc  Cumulative Binomial Probabilities in Right-hand Tail

[Entries are P(X ≥ x0) for n = 2 through 10; columns π = .05, .10, .15, ..., .50. Tabled values omitted.]
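Each Table IIIc entry is a right-hand tail sum of the individual binomial probabilities; a sketch reproducing two entries:

```python
import math

def right_tail(x0, n, pi):
    """P(X >= x0) for X ~ Binomial(n, pi), the Table IIIc entry."""
    return sum(math.comb(n, x) * pi ** x * (1 - pi) ** (n - x)
               for x in range(x0, n + 1))

print(round(right_tail(2, 5, 0.50), 4))   # .8125
print(round(right_tail(1, 4, 0.50), 4))   # .9375
```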

TABLE IV  Areas for a Standard Normal Distribution

An entry in the table is the area under the standard normal curve between z = 0 and a positive value of z. Areas for negative values of z are obtained by symmetry.

[Rows z = 0.0 through 3.0; columns give the second decimal place of z, .00 through .09. Tabled areas omitted.]
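The Table IV entry for a given z is Φ(z) − 1/2, which the error function supplies directly; a sketch reproducing two familiar entries:

```python
import math

def area_0_to_z(z):
    """Area under the standard normal curve from 0 to z (the Table IV entry)."""
    return 0.5 * math.erf(z / math.sqrt(2.0))

print(round(area_0_to_z(1.00), 4))   # .3413
print(round(area_0_to_z(1.96), 4))   # .4750
```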

TABLE V  Student's t Critical Points

The entry is the critical point t0 exceeded with probability P; that is, Pr(t > t0) = P.

 d.f.     .10      .05      .025     .01      .005
   1     3.078    6.314   12.706   31.821   63.657
   2     1.886    2.920    4.303    6.965    9.925
   3     1.638    2.353    3.182    4.541    5.841
   4     1.533    2.132    2.776    3.747    4.604
   5     1.476    2.015    2.571    3.365    4.032
   6     1.440    1.943    2.447    3.143    3.707
   7     1.415    1.895    2.365    2.998    3.499
   8     1.397    1.860    2.306    2.896    3.355
   9     1.383    1.833    2.262    2.821    3.250
  10     1.372    1.812    2.228    2.764    3.169
  11     1.363    1.796    2.201    2.718    3.106
  12     1.356    1.782    2.179    2.681    3.055
  13     1.350    1.771    2.160    2.650    3.012
  14     1.345    1.761    2.145    2.624    2.977
  15     1.341    1.753    2.131    2.602    2.947
  16     1.337    1.746    2.120    2.583    2.921
  17     1.333    1.740    2.110    2.567    2.898
  18     1.330    1.734    2.101    2.552    2.878
  19     1.328    1.729    2.093    2.539    2.861
  20     1.325    1.725    2.086    2.528    2.845
  21     1.323    1.721    2.080    2.518    2.831
  22     1.321    1.717    2.074    2.508    2.819
  23     1.319    1.714    2.069    2.500    2.807
  24     1.318    1.711    2.064    2.492    2.797
  25     1.316    1.708    2.060    2.485    2.787
  26     1.315    1.706    2.056    2.479    2.779
  27     1.314    1.703    2.052    2.473    2.771
  28     1.313    1.701    2.048    2.467    2.763
  29     1.311    1.699    2.045    2.462    2.756
  30     1.310    1.697    2.042    2.457    2.750
  40     1.303    1.684    2.021    2.423    2.704
  60     1.296    1.671    2.000    2.390    2.660
 120     1.289    1.658    1.980    2.358    2.617
  ∞      1.282    1.645    1.960    2.326    2.576
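A table entry can be verified numerically from the definition Pr(t > t0) = P, by integrating the t density with Simpson's rule (stdlib only; the d.f. = 10, P = .025 entry is used as the example):

```python
import math

def t_pdf(x, df):
    """Student's t density with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1.0 + x * x / df) ** (-(df + 1) / 2)

def t_tail(t0, df, upper=60.0, n=20000):
    """Pr(t > t0) by Simpson's rule on [t0, upper]; the tail beyond is negligible."""
    h = (upper - t0) / n
    s = t_pdf(t0, df) + t_pdf(upper, df)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(t0 + i * h, df)
    return s * h / 3

print(round(t_tail(2.228, 10), 3))   # about .025, as the table asserts
```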

TABLE VI  Critical Points of C² (= χ²/d.f.)*

[Rows d.f. = 1 through 30, 40, 60, 120, ∞; columns give the probability of exceeding the critical point, from .99 down to .005. Tabled values omitted.]

* To obtain critical values of χ², multiply the critical value of C² by (d.f.). Interpolation should be performed using reciprocals of the degrees of freedom.
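The footnote's conversion runs both ways: a χ² critical value is the C² entry times d.f. For d.f. = 2 the tail probability has the closed form P(χ² > x) = e^(−x/2), so the upper-.05 point can be checked by hand:

```python
import math

x_crit = -2.0 * math.log(0.05)   # chi-square upper-.05 critical value for d.f. = 2
c2_crit = x_crit / 2.0           # the corresponding C^2 = chi^2/d.f. entry
print(round(x_crit, 3), round(c2_crit, 3))    # 5.991 and 2.996
assert abs(math.exp(-x_crit / 2.0) - 0.05) < 1e-12
```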


TABLE VIII
Common Logarithms* (Continued)

[Four-place mantissas for N = 1.0 through 9.9: the numeric entries are unrecoverable from the scan.]

* The log of N is "the power to which 10 must be raised to yield N." Thus log 100 = 2. In this table only the "mantissa" (the digits to the right of the decimal) is given for each log, because the characteristic (the integer to the left of the decimal) is 0 for N between 1.0 and 10.0; for example, log 1.91 = .2810. log 10N requires the characteristic 1, log 100N the characteristic 2, and so on. Thus log 537 = 2.7300.
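The footnote's decomposition of a common log into characteristic and mantissa can be checked directly; here `math.log10` plays the role of the printed table.

```python
import math

def common_log_parts(n):
    """Split log10(n) into its characteristic (the integer to the left of
    the decimal) and its mantissa (the digits to the right), as the
    table's footnote describes."""
    log = math.log10(n)
    characteristic = math.floor(log)
    return characteristic, log - characteristic

# The footnote's example: log 537 has characteristic 2 and mantissa .7300
# (the table entry for N = 5.37), so log 537 = 2.7300.
c, m = common_log_parts(537)
```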

APPENDIX

CITATIONS FOR TABLES

I. Reproduced, by permission, from the RAND Corporation.
II. (a) Reproduced, by permission, from R. C. Clelland et al., Basic Statistics with Business Applications, John Wiley and Sons, 1966.
(b) Reproduced, by permission, from …, John Wiley and Sons, 1945.
III. Reproduced, by permission, from Standard Mathematical Tables, 16th Edition, the Chemical Rubber Company.
IV. Reproduced, by permission, from P. Hoel, Elementary Statistics, 2nd Edition, John Wiley and Sons, 1966.
V. Reproduced, by permission, from R. Fisher and F. Yates, Statistical Tables, Oliver and Boyd, Edinburgh, 1938.
VI. Reproduced, by permission, from W. J. Dixon and F. J. Massey, Introduction to Statistical Analysis, 2nd Edition, McGraw-Hill, 1957.
VII. Reproduced, by permission, from Statistical Methods, 6th Edition, by George W. Snedecor and William G. Cochrane, © 1967, by the Iowa State University Press, Ames, Iowa.
VIII. Reproduced from J. E. Freund, Modern Elementary Statistics, 3rd Edition, © 1967, by permission of Prentice-Hall Inc., Englewood Cliffs, New Jersey.

Answers to Odd-Numbered Problems

The student is not expected always to calculate the answer as precisely as the answers given below. These answers are given to a fairly high degree of precision merely for the benefit of those who want it; even so, the last digit may be slightly in error because of slide rule inaccuracy.

[Answers, and parts of answers, whose figures could not be recovered from the scan are omitted.]

2-1 Mode < median < mean. The mode is not a bad central measure in this case, which is not very asymmetrical.
2-3 (a)
            Mean     Median    Mode
    raw     77.78    81.47     …
    fine    77.4     81.25     85
    coarse  78.4     80.00     80
(b) Mode depends too much on the degree of grouping. (c) Usually, but not always, coarse grouping does give worse approximations.
2-7 range = 30 or 40; MAD = 8.58; s = √114 = 10.67
2-9 sum = 1
2-11 239,483
2-13 (a) 77.4, 8.60
2-15 121.50
2-17 27.6% (NOT .952/4)

3-5 (b) Not equally likely: 1/4, 1/4, 1/2 (c) 3/4
3-7 (a) .50, .30, .65, .15 (c) .50, .70, .85, .35
3-9 (a) 6/16 = .375
3-11 (a) .40 (b) .60 (c) .55 (e) .17
3-15 (a) .21 to .29 (b) Yes
3-17 (d) .78
3-19 (a) .62
3-25 (a) Pr(E1 ∩ E2) = Pr(E1) Pr(E2) (b) Yes
3-31 (c) Impossible conditions: there must be an error of specification.
3-33 (a) p(x): 16/81 = .198, 32/81 = .395, 24/81 = .296, 8/81 = .099, 1/81 = .012 (sum 81/81)

4-3 (a) Yes (b) Yes; p: 6/36, 10/36, 8/36, 6/36, 4/36, 2/36 (sum 36/36)
4-7 (a) μX = 3.5
4-11 (b), (c) See how the probabilities grow toward certainty as n → ∞.
4-13 (a) μX = 1.36 (b) √2.43 = 1.56 (c) √2.16 = 1.47
4-15 (a) p(x): 64/125 = .512, 48/125 = .384, 12/125 = .096, 1/125 = .008 (sum 1.00); and 125/216 = .579, 75/216 = .347, 15/216 = .070, 1/216 = .004 (sum 1.00)
4-17 μ = nπ, σ² = nπ(1 − π)
4-19 (a) .9544 (b) .9495 (c) .9901 (d) .9772 (e) .9772
4-21 p(x): .729, .243, .027, .001 (sum 1.00); μ = .30, σ² = .27
4-27 (b) 10/48 = .21
4-29 (c) .683: a sample of 5 has a 68% chance of predicting correctly, whereas a single observation has only a 60% chance.

5-3 No. For example, p(0, 1) ≠ pX(0)pY(1).
5-5 (b) μ1 = 2.00, μ2 = 2.10 (c) cov (X1, X2) = 0, by symmetry
5-7 −.6
5-9 (e) 2, 2/3
5-11 (b) Because of symmetry, E(X) + E(Y) = 3½ (f) No, because, for example, p(0, 3) ≠ pX(0)pY(3).
5-13 (b) var (X1 + X2) = var X1 + var X2
5-15 (a) 4.10, σ² = 1.29 (e) No, because (b) and (d) are different: the variables are dependent.
5-17 (a) μY/X=5 = 6. Negative covariance means that a high grade on the first exam tends to be followed by a low grade on the second. This may be because a student who does well on the first exam becomes overconfident and fails to study for the second exam; similarly, a student who does poorly on the first exam may study very hard for the second. The negative covariance makes the average grade less fluctuating (σ = 10 instead of 15).
5-19 (a) p(h): 65 (.3), 70 (.4), 75 (.3); p(w): 140 (.3), 150 (.4), 160 (.3) (c) μH = 70, μW = 150 (e) 143.3, 156.7 (g) 590 (= 2μH + 3μW); 840 (= 4σH² + 9σW² + 12σHW); 29.0 (= √840)
5-23 (a) coded values x = (X − 600)/10; μ = 600 + 10(−1.0) = 590 (e) 65/2 = 32.5; 1375/4 = 343.75
5-25 (a) (H H H H), (H H H T), etc., each 1/16 (c) No, because, for example, p(1, 1) ≠ pX(1)pY(1) (d) −7/12 (e) 3.5, 35/12; μ = 7.0, σ² (= σ1² + σ2² + 2σ12) = 4.67. Or compute from p(s) directly (the hard way).
5-27 (a) 350/12 = 29.2

6-3 (a) p(x̄): 63.5 (.3), 65 (.4), 66.5 (.3) (c) μX̄ = 65 = μ
6-7 Pr(Z < 3.00) = .9987
6-9 .0115
6-11 .02
6-13 (a) 9000 and 900,000 (b) .147
6-15 (a) .014 (b) .008
6-17 .24
6-19 .018 with the continuity correction (.023 without)
6-21 (a) .0028 (b) .131 (c) .131 (d) Since 850 = 170 × 5, (b) and (c) are asking about exactly the same event. On the other hand, event (b) occurs whenever (a) occurs, and some other times as well.
6-23 (b) 200, 39.2
6-25 (a) Pr (|Z| > …) = .016
6-27 (a) p(x̄): 1/27, 3/27, 6/27, 7/27, 6/27, 3/27, 1/27

7-1 (a) 71 ± 1.96(3/√100) = 71 ± .59
7-3 .83 ± .032
7-5 (a), (b) can be roughly approximated by the normal, as .39 and .30 (or .243, if you like); the correct values are .377 and .358, respectively. (b) 1 − (sum of the answers above)
7-7 (a) unbiased, by (6-10) (b) unbiased (c) biased. Hence X̄ is preferable.
7-9 E(X²) − μ² = σ² for any random variable; in particular, for X̄ …
7-11 (a) E(X̄) = μ; E(2X̄ + 1) = 2μ + 1, unbiased
7-13 MLE = .75

8-17 (.142 − .114) ± 1.96√((.142)(.858)/2500 + (.114)(.886)/2500) = .028 ± .018. Although the best guess is that the relative decline was one-fifth, when sampling fluctuation is allowed for we can say, with 95% confidence, only that the relative decline was between 7% and 33%.
8-19 .06 ± 1.96√((.66)(.34)/100 + (.72)(.28)/100) = .060 ± .128
8-21 .39 ± 2.58√((.40)(.60)/300 + (.79)(.21)/1000) = .390 ± .080
8-23 15.2 < σ² < 658
8-25 4 ± 2.776√(s²(1/3 + 1/3)) = 4 ± 8.17
8-27 (π1 − π2) = … ± .92

9-1 Reject H0 iff P > .50 + .67√((.5)(.5)/100) = .533. About 25% of the students will make erroneous rejections of H0.
9-3 (a) .1992 ± .0157 (b) The closer P is to .5, the better the approximation (8-21) will be. (c) .085
9-5 (a) Reject H0 iff X̄ > 8.74, i.e., iff (X̄ − 8.5)/(s/√n) > 1.645. Since X̄ = 8.8, reject H0. (It would be more accurate to use the t critical value of 1.68.) (b) Reject H0 if P < .402 or P > .598. Again, because of the discrete (tenths) nature of P, it would be better to state the answer: Reject H0 if P < .40 or P > .60. Then α is found, by continuity correction, to be 5.7% (z = 1.90).
9-7 (a) Prob-value = .04 (z = 1.77); i.e., if the claim ($6600) is true, the chance of getting a sample as extreme as this ($6730) is only 4%. (b) 5726 < μ < 6334 (c) I would not reject. However, if possible, I would avoid accepting H0, in order to avoid the risk of a type II error.
9-9 (b) Yes
9-11 Reject H0 (z = 2.60); prob-value = .007
9-13 z = 2.34
9-15 (a) reject (t = 8.2); accept (t = .21); reject. Therefore reject, accept, reject.
9-17 (a) Three answers: (i) Using the normal approximation (which is very rough), reject H0 if P < .19 or P > .81. (ii) Using Fig. 8-4 (which is rough), reject H0 if P < .14 or P > .86. (iii) Since P is discrete (tenths), it is better to use the binomial Table II. It is seen that a 5% test is not possible; the best that can be done is a 2.16% test: Reject H0 if P = 0, .1, .9, or 1.0.
9-19 (a) .080 ± .043 (b) prob-value < .001 (z = 3.7) (c) The sample difference is statistically significant at the 5% level. (d) Whether the difference in populations is of sociological significance is a relative matter.
9-21 (a) 1 ± .64 (b) .002 (z = 3.08) (c) Yes

10-1 (a) t = 3.68; therefore reject H0 at the 5% level. (b) The confidence allowance is ±2.77 for the following differences in means; e.g., 6 ± 2.77√(8/5) = 6 ± 3.5 (c) 7 ± 4.7
10-5 (b) Since t² = F, we see in this particular case that the test using t gives exactly the same conclusion as the test using F. Mathematicians have proved in general that whenever F has 1 and k d.f., and t has k d.f., then t² and F have exactly the same distribution.
10-7 F falls short of F.05 = 6.94; therefore do not reject H0.

11-1 (a) S = 760 + .144Y (b) a0 = −396 = estimate of the savings of a person with zero income. However, this estimate is extrapolating recklessly.
11-3 (a) .068 bushel (all units are "per acre"); net return 13.6¢ …; not economical.
11-5 (a) S(a0, b) = Σ(Yi − a0 − bXi)²; ∂S/∂a0 = −2Σ(Yi − a0 − bXi) = 0; ∂S/∂b = −2ΣXi(Yi − a0 − bXi) = 0 (c) a0 = −396, b = .144, as before (d) The method in this problem is easier than the method in the text.

12-3 t = (2.6/18)/√(.0388/18) = 3.11, which falls short of the critical t = 4.54; therefore do not reject H0.
12-5 It is preferable to observe X over a period of wide variation.

13-1 (c) 878 (d) s² = 7863/2 = 3931 (e) Two degrees of freedom for s² are almost too few. It would be better to collect more data. This scarcity of data is even more acute in Problem 13-3.
13-3 Note that β̂* and β̂, and their error allowances, are the same.
13-5 (b) S = 760 + .115Y − .029W. The coefficient of Y is .115, which is less than the former value, .144. In fact, W is (negatively) correlated with both S and Y. The simple correlation coefficient measures the relation of S to Y, and thereby produces a misleadingly high correlation between S and Y themselves; the multiple correlation coefficient is the proper measure of "the relation of S to Y, other things being equal."
13-7 (a) c = .105, d = −.024
13-9 (a) … = 33.6 − 1.25(T − 7.5) (b) 25.5, which is much better. There is serious bias caused by the fact that we started at a seasonal high (Christmas), so that the time trend is of course biased downwards.
13-11 … is better by .38 mpg.

14-1 .62 < … < .95
14-3 It is false, and should be: "Action a1 is best when 'rain' is predicted, and also when no prediction is possible. Action a3 is best when 'shine' is predicted. Action a2 is never best."
14-7 (b) .92; R² ≥ r² necessarily
14-9 (c) α = .33, β = .05, which is unreasonable. Average loss increases by a factor of 3.16.

15-1 (a) .4, .5, .44, .28
15-3 (a) a1 (b) a2 (c) a3
15-5 (a) Midrange (b) Median (c) Mean, median, or mode (any of these 3)
15-7 (a) 73 or 74 (b) 73 to 74
15-9 Closer to 73.5, because those data are twice as reliable.
15-11 (a) is less believable, because it puts complete faith in a very small and unreliable sample.
15-13 (a) 103.54 (b) 113.12
15-15 (a) You do not want to play, because the value of the game is 10/8 to me, which I could win by using the strategy mix: H played 5/8 of the time, T played 3/8. (b) Each play H and T equally often, which results in a zero payoff. I would secretly choose my penny only if my opponent was also secretly choosing and seemed easy to outwit.
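Several of the answers above (e.g., 9-7 and 9-11) quote prob-values for an observed z statistic. A prob-value is just a standard normal tail area, so it can be sketched with the complementary error function; the .04 quoted for z = 1.77 in answer 9-7 is a one-sided tail.

```python
import math

def one_sided_prob_value(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def two_sided_prob_value(z):
    """P(|Z| > |z|), the two-sided prob-value."""
    return math.erfc(abs(z) / math.sqrt(2))

# e.g., z = 1.77 gives a one-sided prob-value of about .04, as in answer 9-7
p = one_sided_prob_value(1.77)
```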

GLOSSARY OF IMPORTANT SYMBOLS

[Entries whose symbol, meaning, or reference could not be recovered from the scan are omitted.]

Symbol / Meaning / Definition or Other Important Reference

ENGLISH LETTERS

a            estimated regression intercept                          (11-7), (11-16), (12-13)
ANOVA        analysis of variance                                    Table 10-6
b            estimated regression slope                              (11-13)
Bias                                                                 (7-12)
c            number of columns; or constant coefficient in
             regression; or a contrast of means in analysis
             of variance                                             (13-3)
C²           modified chi-square variable                            (8-23)
d.f.         degrees of freedom                                      (8-11)
e            regression error                                        (12-3), (12-4)
E (also F, G, etc.)   event                                          (3-6)
Ē            not E                                                   (3-17)
E(X)         expected value of X                                     (4-17b)
F            F ratio                                                 (10-7), (10-17), (10-28)
H0           null hypothesis                                         (9-1)
H1           alternate hypothesis                                    (9-2)
L( )         likelihood function                                     (7-24), (12-48)
MLE          maximum likelihood estimate (estimation)                Table 7-2
MSD          mean squared deviation                                  (2-5), (7-13)
MSS          mean sum of squares                                     Table 10-6
n            sample size                                             (6-17)
N            population size; or normal distribution with
             specified mean and variance                             (6-17)
P            sample proportion                                       (6-31), (1-2)
Pr (E)       probability of event E
Pr (E/F)     conditional probability of E, given F                   (3-22)
p(x)         probability function of X
p(x, y)      joint probability function                              (5-30)
p(x/y)       conditional probability function of X, given Y          (5-32)
r            simple correlation; or number of rows
r (partial)  partial correlation of X and Y if Z were held constant  (14-43), (14-44)
R            multiple correlation                                    (14-39), (14-40)
s²           sample variance, (2-6); pooled variance, (8-16),
             (10-26); residual variance in regression, (12-24)
SS           sum of squares, or variation                            (10-6)
t            Student's t, (8-10), (12-26); or time variable
             in regression, (13-24)
var          variance (= σ²)
X (also Y, V, W, etc.)   random variable, (4-1); or regressor
             in original form, (11-4)
x (also y, v, etc.)      realized value of X, (4-2); or regressor
             in terms of deviations from the mean, (11-5)
X̄            sample mean of X                                        (2-1), (6-9)
Z            standard normal variable; or a second regressor         Table 13-1

After Chapter 8, little distinction is made between capital letters (random variables) and small letters (their realized values); the distinction is forgotten.

GREEK LETTERS

Greek letters are generally reserved for population parameters, as follows:

α            probability of type I error; or population
             regression intercept                                    (9-9), (12-3)
β            probability of type II error; or population
             regression slope                                        (9-9), (12-3)
θ, θ̂         any population parameter, and its sample estimator      (7-11)
μ            population mean                                         (4-3)
π            population proportion                                   (6-20)
ρxy          population correlation of X and Y                       (14-3)
σ            population standard deviation                           (4-4)
σ²           population variance                                     (4-5), (4-19)
σxy          population covariance of X and Y                        (5-21), (5-22), (5-23)
Σ            sum of                                                  (2-2)
Π            product                                                 (7-30)

OTHER MATHEMATICAL SYMBOLS

E ∪ F        E or F, or both
E ∩ F        E and F, both                                           (3-10)
≡            equals, by definition                                   (2-1a)
≐            equals approximately                                    (2-1b)
~            is distributed as                                       (6-31)

GREEK ALPHABET

Letters   Names     English Equivalent     Letters   Names     English Equivalent
Α α       Alpha     a                      Ν ν       Nu        n
Β β       Beta      b                      Ξ ξ       Xi        x
Γ γ       Gamma     g                      Ο ο       Omicron   o
Δ δ       Delta     d                      Π π       Pi        p
Ε ε       Epsilon   e                      Ρ ρ       Rho       r
Ζ ζ       Zeta      z                      Σ σ       Sigma     s
Η η       Eta       ē                      Τ τ       Tau       t
Θ θ       Theta     th                     Υ υ       Upsilon   u
Ι ι       Iota      i                      Φ φ       Phi       ph
Κ κ       Kappa     k                      Χ χ       Chi       ch
Λ λ       Lambda    l                      Ψ ψ       Psi       ps
Μ μ       Mu        m                      Ω ω       Omega     ō
uory

18, 224

dev: ations,

Absoh\177t[

' 168
, 278

hyl\177ot. hesis

Alternate
Analysi\177of

c9variance

Analysbq of

assun4ptiorls, 202
confid

nce iintervals,

hypot
intera

esis !est,
tion\177 215

one

Best linear
Bias, 134

195

206, 216

in

to,

con >arep to, 195, 278


sum o sqm tres, 204
!11,299
table,
011\177
two

tors

fa

classical

mean, 121
al.\177o Variation

Measures

363

trial, 59

thod compared,

variance, 121

324, 332,

critique, 33I
ecisions,
estima\177tion,

C._,

\17722

ri\177tio

Classicalversus

test,

336

loss fu\177ctiq',
315; 318,323
prior and r\177d,
sterior
probability,
subsect,
utility:

\1773\177

weakn\177aS,

nayes'

241

estimation,

Bayesian

Coding. 22
312

Collinearity,

see

Multicollinearity'

event, 35

331

Compositehypothesis,

312

Confidence interval, acceptable hypotheses, as set of, 2, 191,216


in analysis
of variance, 205, 216

319

3} 1

thlleare4, 44,

164

modified,

324,339

Complementary

xve n}th,-e,
\177fqnctlon'

121

Chi-square variable,
table, 368

333

113

theorem,

binomial,

for regression,

332

large

st,'engt\177,

for

compared,349

hypothesis
}iests,
shmpl\177,
328

likelihood

164, 368

limit

Central

327, 329

MLq,

and

statistic,

301

Centers, 12

by interval,

game theor

78

distribution,

normal, 292,

327, 329

intervals,

120

sample sum, as a,

Bivariate
confidgnce

12

approximation,

normal

of location

312

\177ods,

inc

362

table, 365

cumulative

tabl\177e,

mett

59

coefficients, table,

211

\177lean;

Bayesiarl

Unbiasedness

distribution,

Binomial

\177

D4; see
,n, 2\177

variati

sampling,

see also

298

variable is

MSD, 135

of sample
in

regres ion, !applied

if some

240

273

ignored,

i 195

f\177:tor,

estimator,

unbiased

regression,

196, 213

119, 125
120

variance,

and

mean

(ANOVA),

v{\177riance

162

Daniel.

Bell,

Bernoulli population,

397

175,

182

398

INDEX

Covariance, 88,286

(cont.)

interval

Confidence

Bayesian, 327. 329


for difference

206,

means,

several

in

216

proportions, 161

example.2
mean,

for

small

large sample,
158

proportion,
sample,

small

for

2, 157

multiple,

coefficients,

regression

266

simple, 244
for

Degrees of

freedom, 154

in

analysis

in

multiple

of variance, 199
regression,
259,273,

31t

see Independence

17

Deviations,

in means and proportions,


see
Confidence Interval, for difference
Discrete variable, 8, 52
Distribution,
see Probability
functions
Dummy
variable
regression,
269
and analysis of covariance, 279
278

ANOVA,

compared to moving
average,
277
for seasonal adjustment, 274

137, 148
121

correction,

Continuity

3, 106

and

163

variance,

Consistency,

312

Deduction.

Difference

as a, 131

interval,

269

Dependence,
statistical,

difference, 161

for proportions,
random

information,

Destructive testing, 5

190

one-sided type,
for

see Confidence
difference

for

interval,

Crosssection

in simple

2, 131

several,

means,

in hypothesis

regression, 243
in single
sample, 154
in two samples,
156
Density function, 64

129, 132

large sample,
sample,
152

meaning of,
for

to, 187

relation

test,

hypothesis

Critical point

Decision theory.

155,205

in two

difference

223
testing, 168

a line,

fitting

large

150

sample,

small sample,
for

means,

in two

difference

for

91

independence,

and

Criteria for

293

correlation,

for

Continuous distributions, 63
9

variable,

Continuous

Contrast of means,

experiments,

Controlled

of MLE, asymptotically, 148


of sample mean and median, 137
Error, confidence interval
allowance,
129; see also Confidence interval
in hypothesis
testing, 169

288

confidence interval,

286

in

305

test,

hypothesis

293

compared to,

covariance,

independence,

relation to, 91

interpretation,

286,

in

291,300

306,

compared
286

simple,

285

point,

to, 285,296,

301,

see Confidence

128

Bayesian, 322
versus

Bayesian

classical,

estimator, comparedto,
and

Counted data, see Bernoulli population;


Binomial distribution
Counter
variable,
120, 157,270; see also
Dummy

interval,

variable

215

interval

305
sample,

in ANOVA,
243,275,297

fitting,

regresson,

Estimate,

308

population, 285
regression,

after

1,

236

model,

regression

residual,

multiple, 310
partial,

equivalence

137

of,

237

291

assumptions,

calculation,

and statistical

economic

285

Correlation,

136

Efficiency,

207,216

regression

function,

loss

324

132

323

properties of, 134


Estimating

equations,
259

(least-squares)
multiple

regression,

in simple regression,

227

in

399
INDEX
Event, 30
  independent, 45
  intersection of, 34
  mutually exclusive, 34
Expectation, see Expected value; Mean
Expected value
  definition, 74
  of linear combination, 93
  of random variables, 73
  of sample mean, 108, 117
  of a sum, 86, 93, 106; see also Mean
Extrapolation dangers in regression, 249
F distribution
  ANOVA use, 199, 204, 213
  regression use, 276
  relation to t, 209, 300, 311
  table, 371
Fisher, R. A., 370
Fitted (predicted) value, 245
Frequency, see Relative frequency
Functions of random variables, 72, 84
Game theory, 340
  Bayesian solution, compared to, 348
  loss (payoff) function, 341
  minimax and maximin, 349
  nature as opponent, 342
  saddle point, 342
  strategies, 342, 347
    dominated, 347
    mixed, 344, 347
    pure, 340
  strictly determined, 342
Gaussian distribution, see Normal distribution
Gauss-Markov theorem, 240
Glossary of symbols, 393
Gossett, W. S., 153
Histogram, 11
Hoel, P., 113
Huff, Darrell, 7
Hypothesis test, 167
  Bayesian, 333, 339
  confidence interval, relation to, 187
  in correlation, 305
  critical point, 216
  errors of type I and II, 169, 170
  in multiple regression, 266
  one-sided, 168, 190
  power, 170, 176
  prob-value, 179
  in regression, 245, 299
  for seasonal influence, 278
  simple versus composite, 175, 182
  two-sided, 185, 187
Independence, 45
  covariance, relation to, 91
  of events, 45
  statistical, 45
Induction and inference, 1, 3
Interpolating in regression, 247
Interval estimate, see Confidence interval
Isoprobability ellipses, 293
Joint distribution, see Bivariate distribution
Law of large numbers, 49
Least squares, in regression, 225
  attractive properties, 240
  calculations, 228
  coefficients, 229
  estimating equations, 227, 259
  in multiple regression, 257
Likelihood function, 143, 250
Likelihood ratio test, Bayesian, 336
Lindgren, B. W., 149, 331
Linear combination
  contrast of means, 207
  of normal variables, 95
  of random variables, 58, 93
  regression slope as, 239
Linear transformation of a random variable, 70
Logarithms, 375
Loss function, 315, 318, 323, 341
McDonald, J., 345
Maximum Likelihood Estimates (MLE), 141
  as Bayesian estimates, 333
  deficiencies, 142, 147
  geometric interpretation, 332
  large sample properties, 147, 251
  least squares, relation to, 250, 254
  of mean of normal population, 144, 145
  versus method of moments, 148
  of proportion (Bernoulli), 142, 145
  in regression, 253, 254
  small sample treatment, 148
Mean, 12
  of binomial, 121
  of population, 56
  of random variables, 52, 56, 66
  of sample, see Sample mean
  see also Expected value
Mean squared deviation (MSD), 18
  bias, 135
Mean squared error, 137
  and consistency, 138
  related to bias and variance, 138
Mean sum of squares, 204
Measures of location, 12
Measures of spread, 17
Median, 12
  as Bayesian estimator, 323
  efficiency, 137
  as unbiased estimator, 136
Minimax, 342, 347; see also Game theory
Mode, 12
  as Bayesian estimator, 332
Moments, 16, 19
  method of, 148
Monte Carlo, 140
Multicollinearity of regressors, 260, 264
Multiple comparisons, 206, 216, 281
Multiple correlation, 310
  and last regressor, 311
  and regression, 310
Multiple regression, 255
  ANOVA, relation to, 255
  bias reduced, 273
  calculations, 258
  confidence intervals, 266
  error reduced, 237
  estimating equations, 259
  hypothesis tests, 266
  interpretation, 265
  least squares estimation, 257
  mathematical model, 256
  and seasonal adjustment, 278, 283
Nonparametric statistics, 122
Nonsense correlations, 241
Normal distribution, 66, 103
  approximation to binomial, 122
  of linear combination, 58, 93
  of sample proportion, 122
  of sample sum, 106
  table, 367
Normal equations, 227, 259
Normal variable, Z, 66
Notation, glossary of symbols, 393
Null hypothesis, 168
  danger in accepting, 178, 267
  danger in rejecting, 179
Operating characteristics curve, 176
Outcome set, 30
Parameters of population, 128
  glossary, 395
Partial correlation, 308
  assumptions, 309

  computation, 309
  regression, relation to, 308
Partition of sample space, 35
Payoff, 341
Point estimate, see Estimate, point
Poisson distribution, 81
Pooled variance, 156, 199
Population, 102
Posterior distribution, 328
Posterior mean, 328
Posterior probabilities, 312, 314, 326
Power of hypothesis test, 170, 176
Prediction interval, 245; see also Confidence interval
Prior probabilities, 312, 326, 330, 331
Prob-value, 179, 181
Probability, 27
  axiomatic, 49
  conditional, 40
  as limit of relative frequency, 27, 48
  personal, 50, 330, 331
  subjective, 50, 330, 331
Probability density function, 66
Probability functions (distributions)
  Bernoulli, 120
  binomial, 59
  bivariate, 78
  conditional, 83
  continuous, 63
  discrete, 52
  joint, 78
  marginal, 81
  normal, 66
Properties of estimators, 134
Proportions, 10, 104
Random digits, table, 360
Random normal numbers, table, 361
Random sampling, 102
Random variable, 52
  continuous, 63
  definition, 52
  derived, 72, 84
  discrete, 52
  function of, 72, 84
Range of sample, 17
Regression, 220, 234
  assumptions, 235
  bias, 273
  coefficients, 229, 241
  confidence intervals, 243, 244
  correlation, compared to, 285, 296, 301, 305
  error term, 236
  error reduction, 234, 237
  fixed versus random regressor, 253, 254
  least squares estimation, 225
  limitations, 249
  multiple, 255; see also Multiple regression
  nonlinear, 250
  parameters, 235
  prediction, 245, 303
  prediction interval, 245
  residuals, 237, 275, 297
Regrets, 316
Relative frequency, 9, 63, 103
  density, 64
  limit is probability, 27
Residuals, see Error; Variation
Robustness, 163

Saddle point, 342
Sample mean, 13, 107
  as Bayesian estimator, 323
  and Bayesian estimates, 327
  and central limit theorem, 113
  distribution, 109, 112
  efficiency, 137
  as estimator of μ, 128, 136
  expected value, 108, 117
  as linear transformation of sample sum, 107
  normally distributed, 112
  variance, 108, 117
Sample proportion, 120, 125
  as sample mean, 122, 125
  variance, 122
Sample space, 30
Sample sum, 105
  distribution, 105
  expected value, 106, 117
  variance, 106, 117
Sample variance, 128
Sampling, 102
  bias, 6
  methods, 5
  reasons for, 5
  simulated, 26, 56, 105
  with replacement, 102, 124
  without replacement, 116, 124
  see also Random sampling
Scheffé, H., 206
Seasonal adjustment, 269
  with moving average, 270
  using dummy variables, 274
Sign test, 76
Significance level of test, 170
Skewed distribution, 15
Slonim, M. V., 7
Small sample estimation, 152; see also Confidence interval
Square root table, 351
Standard deviation, 19
Standardization
  of normal variable, 59
  of random variable, 70
Statistic, definition, 8, 19
Strategies, see Game theory
Student's t, see t Statistic
Sum of random variables, see Mean; Variance
Sum of squares, see Variation
Symmetric distribution, 15
t Statistic, 152
  distribution, 152
  F, relation to, 153, 155
  normal, relation to, 153
  table, 368
Tables, 350
Test of hypothesis, see Hypothesis test
Time series, 269; see also Seasonal adjustment
Translation of axes
  in correlation, 289
  in covariance, 88
  in regression, 225, 230
Type I error, 169
Type II error, 170
Unbiasedness, 134
  asymptotic, 138
Utility versus monetary loss, 319
Variance
  of Bernoulli variable, 120
  of binomial, 121
  of linear combination, 58, 95
  pooled, 156, 199
  of population, 56, 85
  of random variable, 56, 66
  of regression coefficients, 238
  of sample, 18, 135
  of sample mean, 108, 117
  of sample proportion, 122
  of sample sum, 106, 117
  of sum of random variables, 94
Variation (explained, unexplained, and total), 203, 204, 212, 213, 297
  ratio, see F statistic
  residual, 204, 237, 275
  unexplained, 205, 211, 215
Venn diagrams, 33
Wallis, W. A., 7
Weighted average, 15, 94
Wilks, S. S., 149
Z variable, see Normal variable
Zero-sum game, 341