
Introductory Statistics

THOMAS H. WONNACOTT
Associate Professor of Mathematics
University of Western Ontario

RONALD J. WONNACOTT
Professor of Economics
University of Western Ontario

JOHN WILEY & SONS, INC.
New York · London · Sydney · Toronto

Copyright © 1969 by John Wiley & Sons, Inc.

All rights reserved. No part of this book may be reproduced by any means, nor transmitted, nor translated into a machine language without the written permission of the publisher.

10 9 8 7 6 5 4 3

Library of Congress Catalog Card Number: 69-16041
SBN 471 95965 0
Printed in the United States of America

INTRODUCTORY STATISTICS

To Monique and Eloise

PREFACE

Our objective has been to write a text that would come into the statistics market between the two texts written by Paul G. Hoel (or the two texts written by John E. Freund). We have tried to cover most of the material in their mathematical statistics books, but we have used mathematics only slightly more difficult than that used in their elementary books. Calculus is used only in sections where the argument is difficult to develop without it; although this puts the calculus student at an advantage, we have made a special effort to design these sections so that a student without calculus can also follow.
By requiring little more mathematics than many other elementary texts, we have been able to treat many important topics normally covered only in books on mathematical statistics: for example, the relation of sampling and inference to the theory of probability and random variables. Another objective has been to show the logical relation between topics that have often appeared in texts as separate and isolated chapters: for example, the equivalence of interval estimation and hypothesis testing, of the t test and F test, and of analysis of variance and regression using dummy variables. In every case our motivation has been twofold: to help the student appreciate, indeed enjoy, the underlying logic, and to help him arrive at answers to practical problems.

We have placed high priority on the regression model, not only because it is widely regarded as the most powerful tool of the practicing statistician, but also because it provides a good focal point for understanding such related techniques as correlation and analysis of variance.

Our original aim was to write an introduction to statistics for economics students, but as our efforts increased, so it seems did our ambitions. Accordingly, this book is now written for students in economics and other social sciences, for business schools, and for service courses in statistics provided by mathematics departments. Some of the topics covered are typically omitted from introductory courses, but are of interest to such a broad audience: for example, multiple comparisons, multiple regression, Bayesian decisions, and game theory.

A statistics text aimed at several audiences raises major problems of evenness and design. The text itself is kept simple, with the more difficult interpretations and developments reserved for footnotes and starred (*) sections. In all instances these are optional; a special effort has been made to allow the student to skip these completely without losing continuity, and some of the finer points are deferred to the instructor's manual. Thus the instructor is allowed, at least to some degree, to tailor the course to his students' background. Problems are also starred (*) if they are more difficult, set with an arrow (=>) if they introduce important ideas taken up later in the text, or bracketed ( ) if they duplicate previous problems and thus provide optional exercise only.

Our experience has been that this is about the right amount of material for a two-semester course; a single-semester introduction is easily designed to include the first 7, 8, or 9 chapters. We have also found that majors in economics who may be pushed a bit harder can cover the first 10 chapters in one semester. This has allowed us in the second semester to use our forthcoming Econometrics text, which provides more detailed coverage of the material in Chapters 11 to 15 of this book, plus additional material on serial correlation, identification, and other econometric problems.

So many have contributed to this book that it is impossible to thank them all individually. However, a special vote of thanks should go, without implication, to the following for their thoughtful reviews: Harvey J. Arnold, David A. Belsley, Ralph A. Bradley, Edward Greenberg, Leonard Kent, R. W. Pfouts, and Franklin M. Fisher. We are also indebted to our teaching assistants and students in both mathematics and economics at the University of Western Ontario and Wesleyan (Connecticut), who suggested many improvements during a two-year classroom test.

London, Ontario, Canada
September, 1968

Thomas H. Wonnacott
Ronald J. Wonnacott

CONTENTS

1 Introduction
    1-1 Example
    1-2 Induction and Deduction
    1-3 Why Sample?
    1-4 How to Sample

2 Descriptive Statistics for Samples
    2-1 Introduction
    2-2 Frequency Tables and Graphs
    2-3 Centers (Measures of Location)
    2-4 Deviations (Measures of Spread)
    2-5 Linear Transformations (Coding)

3 Probability
    3-1 Introduction
    3-2 Elementary Properties of Probability
    3-3 Events and Their Probabilities
    3-4 Conditional Probability
    3-5 Independence
    3-6 Other Views of Probability

4 Random Variables and Their Distributions
    4-1 Discrete Random Variables
    4-2 Mean and Variance
    4-3 Binomial Distribution
    4-4 Continuous Distributions
    4-5 The Normal Distribution
    4-6 A Function of a Random Variable
    4-7 Notation

5 Two Random Variables
    5-1 Distributions
    5-2 Functions of Two Random Variables
    5-3 Covariance
    5-4 Linear Combination of Two Random Variables

6 Sampling
    6-1 Introduction
    6-2 Sample Sum
    6-3 Sample Mean
    6-4 Central Limit Theorem
    6-5 Sampling from a Finite Population, without Replacement
    6-6 Sampling from Bernoulli Populations
    6-7 Summary of Sampling Theory

7 Estimation I
    7-1 Introduction: Confidence Interval for the Mean
    7-2 Desirable Properties of Estimators
    7-3 Maximum-Likelihood Estimation (MLE)

8 Estimation II
    8-1 Difference in Two Means
    8-2 Small Sample Estimation: the t Distribution
    8-3 Estimating Population Proportions: The Election Problem Once Again
    8-4 Estimating the Variance of a Normal Population: The Chi-Square Distribution

9 Hypothesis Testing
    9-1 Testing a Simple Hypothesis
    9-2 Composite Hypotheses
    9-3 Two-Sided Tests vs. One-Sided Tests
    9-4 The Relation of Hypothesis Tests to Confidence Intervals
    9-5 Conclusions

10 Analysis of Variance
    10-1 Introduction
    10-2 One-Factor Analysis of Variance
    10-3 Two-Factor Analysis of Variance

11 Introduction to Regression
    11-1 An Example
    11-2 Possible Criteria for Fitting a Line
    11-3 The Least Squares Solution
    Appendix 11-1 An Alternative Derivation of Least Squares Estimates Without Calculus

12 Regression Theory
    12-1 The Mathematical Model
    12-2 The Nature of the Error Term
    12-3 Estimating α and β
    12-4 The Mean and Variance of a and b
    12-5 The Gauss-Markov Theorem
    12-6 The Distribution of a and b
    12-7 Confidence Intervals and Testing Hypotheses about β
    12-8 Prediction Interval for Y0
    12-9 Dangers of Extrapolation
    12-10 Maximum Likelihood Estimation
    12-11 The Characteristics of the Independent Variable

13 Multiple Regression
    13-1 Introductory Example
    13-2 The Mathematical Model
    13-3 Least Squares Estimation
    13-4 Multicollinearity
    13-5 Interpreting an Estimated Regression
    13-6 Dummy Variables
    13-7 Regression, Analysis of Variance, and Analysis of Covariance

14 Correlation
    14-1 Simple Correlation
    14-2 Partial Correlation
    14-3 Multiple Correlation

15 Decision Theory
    15-1 Prior and Posterior Distributions
    15-2 Optimal Decisions
    15-3 Estimation as a Decision
    15-4 Estimation: Bayesian Versus Classical
    15-5 Critique of Bayesian Methods
    15-6 Hypothesis Testing as a Bayesian Decision
    15-7 Game Theory

Appendix Tables
    Table I Squares and Square Roots
    Table II Random Digits and Normal Variates
    Table III Binomial Coefficients and Probabilities
    Table IV Standard Normal Probabilities
    Table V Student's t Critical Points
    Table VI Modified Chi-Square Critical Points
    Table VII F Critical Points
    Table VIII Common Logarithms

Acknowledgements
Answers to Odd-Numbered Problems
Glossary of Symbols
Index

1
Introduction

The word "statistics" originally meant the collection of population and economic information vital to the state. From that modest beginning, statistics has grown into a scientific method of analysis now applied to all the social and natural sciences, and one of the major branches of mathematics. The present aims and methods of statistics are best illustrated with a familiar example.

1-1 EXAMPLE

Before every presidential election, the pollsters try to pick the winner; specifically, they try to guess the proportion of the population that will vote for each candidate. Clearly, canvassing all voters would be a hopeless task. As the only alternative, they survey a sample of a few thousand, in the hope that the sample proportion will be a good estimate of the total population proportion. This is a typical example of statistical inference, or statistical induction: the (voting) characteristics of an unknown population are inferred from the (voting) characteristics of an observed sample.

As any pollster will admit, it is an uncertain business. To be sure of the population, one has to wait until election day when all votes are counted. Yet if the sampling is done fairly and adequately, we can have high hopes that the sample proportion will be close to the population proportion. This allows us to estimate the unknown population proportion π from the observed sample proportion (P), as follows:

    π = P ± small error        (1-1)

with the crucial questions being, "How small is this error?" and "How sure are we that we are right?"

Since this typifies the very core of the book, we state it more precisely in the language of Chapter 7 (where the reader will find the proof and a fuller understanding). If the sampling is random and the sample size n is large enough, we can state with 95% confidence that

    π = P ± 1.96 √(P(1 − P)/n)        (1-2)

where π and P are the population and sample proportion, and n is the sample size.

As an illustration of how this formula works, suppose we have sampled 1,000 voters, with 600 choosing the Democratic candidate. With this sample proportion of .60, equation (1-2) becomes

    π = .60 ± 1.96 √(.60(1 − .60)/1000)

or approximately

    π = .60 ± .03        (1-3)

Thus, with 95% confidence, we estimate the population proportion voting Democrat to be between .57 and .63.
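The arithmetic behind (1-2) and (1-3) is easy to verify with a short computation. This is only a sketch; the helper name is ours, not the book's:

```python
from math import sqrt

def proportion_interval(p, n, z=1.96):
    # Equation (1-2): pi = P +/- z * sqrt(P(1 - P)/n)
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half

# The election example: 600 Democrats among 1,000 sampled voters, P = .60
low, high = proportion_interval(0.60, 1000)
print(round(low, 2), round(high, 2))  # 0.57 0.63
```

Rounding to two decimals reproduces the interval (.57, .63) quoted in the text.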


This kind of estimate is referred to as a confidence interval, and making estimates of this kind will be one of our major objectives in this book. The other objective is to test hypotheses. For example, suppose we wish to test the hypothesis that the Republican candidate will win the election. On the basis of the information in equation (1-3) we would reject this claim; it is no surprise that a sample result that pointed to a Democratic majority of 57 to 63% of the vote will also allow us to reject the hypothesis of a Republican victory. In general, there is a very close association of this kind between confidence intervals and hypothesis tests; indeed, we will show that in many instances they are equivalent procedures.

We pause to make several other crucial observations about equation (1-3).

1. The estimate is not made with certainty; we are only 95% confident. We must concede the possibility that we are wrong, and wrong because we were unlucky enough to draw a misleading sample. Thus, even if less than half the population is in fact Democratic, it is still possible, although unlikely, for us to run into a string of Democrats in our sample. In such circumstances, our conclusion (1-3) would be dead wrong. Since this sort of bad luck is possible, but not likely, we can be 95% confident of our conclusion.

2. Luck becomes less of a factor as sample size increases; the more voters we canvass, the less likely we are to draw a predominantly Democratic

sample from a Republican population. Hence, the more precise our prediction. Formally, this is confirmed in equation (1-2); in this formula we note that the error term decreases with sample size. Thus, if we increased our sample to 10,000 voters, and continued to observe a Democratic proportion of .60, our 95% confidence interval would become the more precise

    π = .60 ± .01        (1-4)
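The narrowing of the interval from ±.03 to ±.01 is just the 1/√n effect, and can be checked numerically. A minimal sketch (the function name is ours):

```python
from math import sqrt

def error_term(p, n, z=1.96):
    # The +/- term of equation (1-2)
    return z * sqrt(p * (1 - p) / n)

print(round(error_term(0.60, 1000), 2))   # 0.03
print(round(error_term(0.60, 10000), 2))  # 0.01
```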
3. Suppose our employer indicates that 95% confidence is not good enough. "Come back when you are 99% sure of your conclusion." We now have two options. One is to increase our sample size; as a result of this additional cost and effort we will be able to make an interval estimate with the precision of (1-4), but at a higher level of confidence. But if the additional resources for further sampling are not available, then we can increase our confidence only by making a less precise statement, i.e., that the proportion of Democrats is

    π = .60 ± .02

The less we commit ourselves to a precise prediction, the more confident we can be that we are right. In the limit, there are only two ways we can be certain of avoiding an erroneous conclusion. One is to make a statement so imprecise that it cannot be contradicted.¹ The other is to sample the whole population²; but this is not statistics ... it is just counting. Meaningful statistical conclusions must be prefaced by some degree of uncertainty.

1-2 INDUCTION AND DEDUCTION

Figure 1-1 illustrates the difference between inductive and deductive reasoning. Induction involves arguing from the specific to the general, i.e., from the sample to the population. Deduction is the reverse: arguing from the general to the specific, i.e., from the population to the sample.³ Equation (1-1) represents inductive reasoning; we are arguing from a sample proportion to a population proportion. But this is only possible (in our case) if we study the simpler problem of deduction first.

[FIG. 1-1 Induction and deduction contrasted. (a) Induction (statistical inference): the sample is known, and the population is inferred. (b) Deduction (probability): the population is known, and the sample behavior is deduced.]

Specifically, in equation (1-1), we note that the inductive statement (that the population proportion can be inferred from the sample proportion) is based on a prior deduction (that the sample proportion is likely to be close to the population proportion). Chapters 2 through 5 are devoted to deduction. This involves, for example, the study of probability, which is useful for its own sake (e.g., in game theory).

¹ E.g., π = .50 ± .50.

² Or, almost the whole population. Thus it would not be necessary to poll the whole population to determine the winner of an election; it would only be necessary to continue canvassing until one candidate comes up with a majority. (It is always possible, of course, that some people change their mind between the sample survey and their actual vote, but we don't deal with this issue here.)

³ The student can easily keep these straight with the help of a little Latin, and recognition that the population is the point of reference. The prefix in means "into" or "towards"; thus induction is arguing towards the population. The prefix de means "away from"; thus deduction is arguing away from the population. Finally, statistical inference is based on induction.

But probability is even more useful as the basis for statistical induction, dealt with in Chapters 6 through 10. In short, in the first 6 chapters we ask the deductive question, "With a known population, how will a sample behave? Will the sample be on target?" Only when this issue is resolved can we turn the argument around and ask, "How precisely can we make statistical inferences about an unknown population from an observed sample?"

1-3 WHY SAMPLE?

We may study a sample, rather than the whole population, for any one of three reasons:

(1) Limited resources.
(2) Limited data available.
(3) Destructive testing.

1. Limited resources almost always play some part. In our example of preelection polls, funds were not available to observe the whole population; but this is not the only reason for sampling.

2. Sometimes there is only a small sample available, no matter what cost may be incurred.

For example, an anthropologist may wish to test the theory that the two civilizations on islands A and B have developed independently, with their own distinctive characteristics of weight, height, etc. But there is no way in which he can compare the two civilizations in toto. Instead he must make an inference from the small sample of the 50 surviving inhabitants of island A and the 100 surviving inhabitants of island B. The sample size is fixed by nature, rather than by the researcher's budget.
There are many examples in business. An allegedly more efficient machine may be introduced for testing, with a view to the purchase of additional similar units. The manager of quality control simply cannot wait around to observe the entire population this machine will produce. Instead a sample run must be observed, with the decision on efficiency based on an inference from this sample.
3. Sampling may involve destructive testing. For example, suppose we have produced a thousand light bulbs and wish to know their average life. It would be folly to insist on observing the whole population of bulbs until they burn out.

1-4 HOW TO SAMPLE

In statistics, as in business or any other profession, it is essential to distinguish between bad luck and bad management. For example, suppose a man bets you $100 at even odds that you will get an ace (i.e., 1 dot) in rolling a die. You accept the challenge, roll an ace, and he wins. He's a bad manager and you're a good one; he has merely overcome his bad management with extremely good luck. Your only defense against this combination is to get him to keep playing the game with your dice.
If we now return to our original example of preelectionpolls, we note
that
the sample proportion of Democrats may
badly
misrepresent
the
population proportion for either
(or both)
of these reasons. No matter
how
well managed
and designed our sampling
procedure
may
be, we may be
unlucky
enough
to turn up a Democratic sample from
a Republican
population. Equation (1-2) relates to this case; it is assumed that the only complication is the luck of the draw,
and not mismanagement.
From that equation
we confirm
that the best defense against
bad
luck is to \"keep playing\";
by
increasing
our sample size, we improve the reliability
of our estimate.
The other problem is that sampling
can be badly mismanaged or biased.
For example, in sampling a population of voters, it is a mistake to take their
names from
a phone
book,
since poor voters who
often
cannot
afford
telephones are badly underrepresented.
Other
examples
of biased samples are easy to find
and
often
amusing.
\"Straw polls\" of peopleon the street are often biased because the interviewer
tends to selectpeoplethat seem civil and well dressed; the surly worker
or
harassed
mother is overlooked. A congressman
can not rely on his mail
as
an unbiased
sample of his constituency, for this is a sample of people with
strong
opinions,
and includes an inordinate number of cranksand members
of pressure
groups.
The simplest way to ensure an unbiased sample is to give each member
of the population an equal chance of being included in the sample.
This, in
fact, is our definition
of a \"random\"
sample. 4 For a sample to be random,
it
cannot
be chosen
in a sloppy or haphazard
way;
it must be carefully
designed. A sample of the first thousand people encountered on a New York
street corner will not be a random sample of the U.S. population.
Instead,
it is necessary to draw
some
of our sample from the West, some from
the
East,
and so on. Only if our sample
is randomized, will it be free of bias and,
equally
important,
only then will
it satisfy
the assumptions
of probability
theory,
and
allow
us to make scientific inferences
of the form of (1-2).
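The definition above (every member of the population gets an equal chance of inclusion) is exactly what a standard library routine for sampling without replacement provides. A toy sketch; the population of ID numbers is hypothetical:

```python
import random

# Hypothetical population: ID numbers standing in for all voters.
population = range(100000)

# Simple random sample of 1,000, drawn without replacement:
# every ID has the same chance of being included.
sample = random.sample(population, 1000)

print(len(sample), len(set(sample)))
```

Drawing without replacement guarantees that no member appears twice in the sample.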
In some circumstances, the only available sample will be a nonrandom one. While probability theory often cannot be strictly applied to such a sample, it still may provide the basis for a good educated guess, or what we might term the art of inference. Although this art is very important, it cannot be taught in an elementary text; we, therefore, consider only scientific inference, based on the assumption that samples are random. The techniques for ensuring this are discussed further in Chapter 6.

⁴ Strictly speaking, this is called "simple random sampling," to distinguish it from more complex types of random sampling.

FURTHER READINGS

For the reader who wishes a more extensive introduction, we highly recommend the following:

1. Huff, Darrell, "How to Lie with Statistics." New York: Norton, 1954.
2. Huff, Darrell, "How to Take a Chance." New York: Norton, 1957.
3. Wallis, W. A., and Roberts, H. V., "The Nature of Statistics." Free Press Paperback, 1956.
4. McDonald, J., and Osborn, R., "Strategy in Poker, Business, and War." New York: Norton, 1950.
5. Slonim, M. J., "Sampling." Simon and Schuster Paperback, 1966.

2
Descriptive Statistics for Samples

2-1 INTRODUCTION

We have already discussed the very simple example of Chapter 1, in which the pollster would record the answers of the 1000 people in his sample, obtaining a sequence such as D D R D R ..., where D and R represent Democrat and Republican. The primary purpose of statistics is to make an inference to the whole population from a sample. As a preliminary step, the sample must be simplified and reduced to a few descriptive numbers; each is called a sample statistic.¹ In the simple example of Chapter 1, the best way of describing the sample by a single number is the statistic P, the sample proportion of Democrats; this will be used to make an inference about the population proportion π.

Admittedly, computing this sample statistic was trivial: the sample proportion (.60) required only a count of the number voting Democrat (600), followed by division by sample size (n = 1,000). We now turn to the more substantial computations required to describe two other samples:

(a) The results when a die is thrown 50 times. Each time we toss the die, we record the number of dots; this is called a "discrete" random variable X, because it assumes only a finite (or countably infinite) number of values: 1, 2, ..., 6.

(b) The average height of a sample of 200 American men.

¹ Later, we shall have to define a statistic more rigorously; but for now, this will suffice.
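The Chapter 1 statistic really is just a count and a division. As a sketch (the response list is illustrative, not real poll data):

```python
# Illustrative responses: 600 Democrats, 400 Republicans.
answers = ["D"] * 600 + ["R"] * 400

# The sample statistic P of Chapter 1: count, then divide by n.
P = answers.count("D") / len(answers)
print(P)  # 0.6
```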

2-2 FREQUENCY TABLES AND GRAPHS

(a) Discrete Example

The 50 throws of the die yield a string of 50 numbers, such as given in Table 2-1. To simplify, we keep a running tally of each of the six possible outcomes in Table 2-2. In column 3 we note, for example, that 9 is the frequency f (or total number of times) that we rolled a 1; i.e., we obtained this outcome on 9/50 of our tosses. Formally, this proportion (.18) is called the relative frequency (f/n); it is computed in column 4.

[TABLE 2-1 Results of Tossing a Die 50 Times]

TABLE 2-2 Calculation of the Frequency, and Relative Frequency, of the Number of Dots in 50 Tosses of a Die

    (1)              (2)              (3)            (4)
    Number of Dots   Tally            Frequency (f)  Relative Frequency (f/n)
    1                ///// ////       9              .18
    2                ///// ///// //   12             .24
    3                ///// ///        8              .16
    4                /////            5              .10
    5                ///// /          6              .12
    6                ///// /////      10             .20
                                      Σf = 50 = n    Σ(f/n) = 1.00

The information in column 3 is called a "frequency distribution," and is graphed in Figure 2-1. The relative frequency distribution in column 4 can be similarly graphed; the student who does so will note that the two graphs are identical except for the vertical scale. Hence, a simple change of the vertical scale transforms Figure 2-1 into a relative frequency distribution. This gives us an immediate picture of the sample result.
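The tally-and-divide routine of Table 2-2 is easy to mechanize. A sketch with simulated tosses (the outcomes below are randomly generated, not the book's actual 50 throws):

```python
import random
from collections import Counter

random.seed(1)  # reproducible illustration
tosses = [random.randint(1, 6) for _ in range(50)]

freq = Counter(tosses)                           # column 3: frequency f
rel_freq = {x: f / 50 for x, f in freq.items()}  # column 4: f/n

print(sum(freq.values()), sum(rel_freq.values()))
```

As in the table, the frequencies sum to n and the relative frequencies sum to 1.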

(b) Continuous Example

Suppose that a sample of 200 men is drawn from a certain population, with the height of each recorded in inches. The ultimate aim will be an inference about the average height of the whole population; but first we must efficiently summarize and describe our sample.

[FIG. 2-1 Frequency and relative frequency distribution of the results of 50 tosses of a die.]

In this example,
height
(in inches) is our random
variable
X. In this
case, X is continuous; thus an individual's
height might be any value,
such
as 64.328 inches. 2 It no longer makes sense to talk about the frequency
of
this specific value of X; chances are we'll never
again
observe
anyone exactly
64.328 inches tall. Instead we can tally the frequency of heights within
a
cell, as in Table 2-3.

TABLE 2-3 Frequency, and Relative Frequency, of the Heights of a Sample of 200 Men

    (1)               (2)            (3)            (4)
    Cell Boundaries   Cell Midpoint  Frequency (f)  Relative Frequency (f/n)
    55.5-58.5         57             2              .010
    58.5-61.5         60             7              .035
    61.5-64.5         63             22             .110
    64.5-67.5         66             13             .065
    67.5-70.5         69             44             .220
    70.5-73.5         72             36             .180
    73.5-76.5         75             32             .160
    76.5-79.5         78             13             .065
    79.5-82.5         81             21             .105
    82.5-85.5         84             10             .050
                                     Σf = 200 = n   Σ(f/n) = 1.000

² We shall overlook the fact that the measured height is rounded to a few decimal places at most, and is therefore in practice discrete, although height is conceptually continuous.
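Under the cell scheme of Table 2-3 (cells of width 3, starting at 55.5), assigning an observation to its cell is a single integer division. A sketch; the function name is ours:

```python
def cell_index(height, lowest=55.5, width=3.0, n_cells=10):
    # Returns the 0-based cell number, or None if out of range.
    i = int((height - lowest) // width)
    return i if 0 <= i < n_cells else None

# 64.328 inches falls in the third cell, 61.5-64.5 (index 2).
print(cell_index(64.328))  # 2
```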

1. The grouping of the observations into cells is illustrated in Figure 2-2, where each of the 200 observations is represented by a dot. For simplicity, all sample values have been rounded off to the nearest integer, rather than being recorded exactly. (Rounding may, in fact, be regarded as a preliminary grouping of the data into cells of width 1.)

[FIG. 2-2 Grouping the 200 observations of height into cells (height axis: 57, 60, 63, ..., 84 inches).]

2. Each observation in a cell (e.g., 58.5" to 61.5") is then regarded as being at the cell midpoint (e.g., 60), a convenient whole number. The cells have been chosen somewhat arbitrarily, but with this convenience in mind; the number of cells is a reasonable compromise between too much detail and too little. The frequency and relative frequency of each cell are then tabulated as before, as in Table 2-3.

The grouped data is graphed in Figure 2-3. This frequency distribution, a so-called histogram, uses bars to represent frequencies, a reminder that the observations occurred throughout the cell, and not just at the midpoint.

We now turn to the question of how we may characterize a sample frequency distribution with a single descriptive measure, or sample statistic.

[FIG. 2-3 The frequency and relative frequency distribution of a sample of the heights of 200 men.]

2-3 CENTERS (MEASURES OF LOCATION)

In fact, there are two highly useful concepts for describing a frequency distribution: the first is the central point of the distribution, and the second is its spread. There are several different concepts of the "center" of a distribution. Three of these, the mode, the median, and the mean, are discussed below. We shall start with the simplest.

(a) The Mode

This is defined as the most frequent value.³ In our example of heights, the mode is 69 inches, since this cell has the greatest frequency, or highest bar in Figure 2-3. Generally, the mode is not a good measure of central tendency, since it often depends on the arbitrary grouping of the data. (The student will note that, by redefining cell boundaries, the mode can be shifted up or down considerably.) It is also possible to draw a sample where the largest frequency (highest bar in the group) occurs at two (or even more) heights; this unfortunate ambiguity is left unresolved, and the distribution is called "bimodal."

³ "Mode" means fashion, in French.

(b) The Median

This is the 50th percentile, i.e., the value below which half the values in the sample fall. Since it splits the observations into two halves, it is sometimes called the middle value. In the sample of 200 shown in Figure 2-2, the median (say, 71.46) is most easily derived by reading off the 100th value⁴ from the left; but if the only information available is the frequency distribution in Figure 2-3, the median must be calculated by choosing an appropriate value within the median cell.⁵

⁴ Or the 101st value. This ambiguity is best resolved by defining the median as the average of the 100th and 101st values. In a sample with an odd number of observations, this ambiguity does not arise.

⁵ The median cell is clearly the 6th, since this leaves 44% (i.e., 88) of the sample values below and 38% (i.e., 76) above. The median value can be closely approximated by moving through this median cell from left to right to pick up another 6% of the observations. Since this cell includes 18% of the observations, we move 6/18 of the way through this cell. Thus our median approximation is 70.5 + (6/18 × 3) = 71.5.
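The interpolation of footnote 5 generalizes to any grouped distribution. A sketch, using the frequencies as reconstructed in Table 2-3 (the helper function is ours):

```python
def grouped_median(lower_bounds, width, freqs, n):
    # Walk the cells until the one containing the n/2-th observation,
    # then move proportionately through it (footnote 5's scheme).
    cum = 0
    for lb, f in zip(lower_bounds, freqs):
        if cum + f >= n / 2:
            return lb + (n / 2 - cum) * width / f
        cum += f

bounds = [55.5 + 3 * i for i in range(10)]      # 55.5, 58.5, ..., 82.5
freqs = [2, 7, 22, 13, 44, 36, 32, 13, 21, 10]  # Table 2-3 as reconstructed
print(grouped_median(bounds, 3, freqs, 200))  # 71.5
```

Entering the sixth cell (70.5) with 88 observations behind us, we pick up the remaining 12 of the first 100 observations by moving 12/36 of the cell width, reproducing the 71.5 of footnote 5.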

(c) The Mean

This is sometimes called the arithmetic mean, or simply the average; it is the most common central measure. The original observations (X₁, X₂, ..., Xₙ) are simply summed, then divided by n. Thus

    X̄ ≡ (1/n)(X₁ + X₂ + ⋯ + Xₙ)        (2-1a)

where Xᵢ represents the ith value of X, and ≡ means "equals, by definition."

The average height of our sample could be computed by summing all 200 observations and dividing by 200. However, this tedious calculation can be greatly simplified by using the grouped data in Table 2-3. Let f₁ be the number of observations in cell 1, where each observation may be approximated⁶ by the cell midpoint, x₁. Similar approximations hold for all the other cells too, so that

    X̄ ≅ (1/n)[(x₁ + x₁ + ⋯ + x₁) + (x₂ + x₂ + ⋯ + x₂) + ⋯]
              (f₁ times)             (f₂ times)

    X̄ ≅ (1/n)(f₁x₁ + f₂x₂ + ⋯ + f₁₀x₁₀)

In general,

    X̄ ≅ Σᵢ₌₁ᵐ xᵢ(fᵢ/n)        (2-1b)

where m is the number of cells.

⁶ In approximating each observed value by the midpoint of its cell, we sometimes err positively, sometimes negatively; but unless we are very unlucky, these errors will tend to cancel. Even in the unluckiest case, the error must be smaller than half the cell width. Note that the cell midpoints are designated by the small xᵢ, to distinguish them from the observed values of X.

CENTERS
where

(f\177/n) =

numbe r

We

formulation

frequency

relative

equation

this

(2'Ia), appropriatefor
calculation ifof (2-Ib)i
s based on the
3 ot

column

value

each x
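Equation (2-1b) can be checked in a few lines of code. The midpoints and frequencies below are invented for illustration; they are not the book's Table 2-3.

```python
# A sketch of equation (2-1b): the grouped mean is a frequency-weighted
# average of the cell midpoints. Illustrative data, not the book's.
midpoints   = [60, 63, 66, 69, 72]
frequencies = [ 3,  7, 10,  6,  4]

n = sum(frequencies)
grouped_mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n

# Equivalent "weighted average" form using relative frequencies f_i / n:
weighted_mean = sum((f / n) * x for f, x in zip(frequencies, midpoints))

print(grouped_mean, weighted_mean)  # the two forms are identical
```

The two formulations agree exactly, since Σ(fᵢ/n)xᵢ is just (1/n)Σfᵢxᵢ with the constant factored differently.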

(d) Comparison of Mean, Median, and Mode

These three measures of center are compared in Figure 2-4. In part a we show a distribution which has a single peak and is symmetric (i.e., one half is the mirror image of the other); in this case all three central measures coincide at the point of symmetry. But when the distribution is skewed to the right, as in b, the median falls to the right of the mode; with the long scatter of observed values strung out in the right-hand tail, it is generally necessary to move from the mode to the right to pick up half the observations. Moreover, the mean will generally lie even further to the right, as explained in the next section.

FIG. 2-4 (a) A symmetric distribution with a single peak. The mode, median, and mean coincide at the point of symmetry. (b) A right-skewed distribution, showing mode < median < mean.
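The ordering mode < median < mean for a right-skewed sample can be seen on a small invented data set (not the book's data):

```python
# Illustrative only: a small right-skewed sample. The long tail of large
# values pulls the mean above the median, and the median sits above the
# most frequent value (the mode).
from statistics import mean, median, mode

skewed = [1, 1, 1, 2, 2, 3, 4, 7, 15]

print(mode(skewed), median(skewed), mean(skewed))
```

Here the mode is 1, the median is 2, and the mean is 4, exactly the ordering shown in Figure 2-4b.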

Interpreting the Mean by an Analogy from Physics. The 200 observations in the sample of heights appear in Figure 2-2 as points along the X-axis. If we think of these observations as masses (each observation a one-pound mass, for example), and the X-axis as a weightless supporting rod, we might ask where this rod balances. Our intuition suggests "the center." The precise balancing point, also called the center of gravity, is given by the formula

    (1/n) Σ Xᵢ

which is exactly the formula for the mean. Thus we are quite justified in thinking of the sample mean as the "balancing point" of the data, and representing it in graphs as a fulcrum.

It can easily be seen why the mean lies to the right of the median in a right-skewed distribution, as shown in Figure 2-4b. Experiment by trying to balance at the median. Fifty percent of the observed values now lie on either side, but the observations to the right tend to be further distant, tilting the distribution to the right. Balance can be achieved only by placing the fulcrum (mean) to the right of the median.
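The fulcrum argument can be checked numerically. The sketch below (invented data, not the text's) shows that deviations about the mean sum to zero, so the "rod" balances there, while deviations about the median of a right-skewed sample do not.

```python
# A numerical sketch of the fulcrum analogy: the total "torque" about the
# mean is zero, while about the median of a right-skewed sample it is
# positive (the rod tips toward the long right tail).
data = [1, 1, 2, 2, 3, 4, 12]          # skewed to the right
n = len(data)
mean_ = sum(data) / n                  # 25/7
median_ = sorted(data)[n // 2]         # 2

torque_about_mean = sum(x - mean_ for x in data)
torque_about_median = sum(x - median_ for x in data)

print(torque_about_mean)               # essentially zero: balance
print(torque_about_median)             # positive: tips to the right
```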

PROBLEMS

2-1 Show the mean, mode, and median of our example in Figure 2-3. Is the mode a good central measure in this case?

2-2 Find the mean, median, and mode of the following sample of litter sizes. Graph the frequency distribution.

    4  10  11  1  10  9  15  12  14  10  12  14

2-3 Sort the following data into 8 cells whose midpoints are 55, 60, ..., 90. Graph the frequency distribution. Approximately what are the mean, median, and mode?

    56.02  55.31  81.47  64.90  70.88  86.02  77.25  76.73  84.21
    84.92  90.23  78.01  88.05  73.37  87.09  57.41  85.43  74.76
    86.51  86.37  76.15  88.64  84.71  66.0   83.91  77.78  90.

2-4 Sort the data of Problem 2-3 into 4 cells whose midpoints are 60, 70, 80, 90. Then answer the same questions as in Problem 2-3.
(a) Do you get approximately the same answers with this coarse grouping as with the fine grouping of Problem 2-3, and as with the exact values computed from the original data?
(b) Will coarse grouping always give worse approximations (for the mean and median) than fine grouping, or will it do so usually?

2-5 Summarize the answers to the previous two problems in the following table.

              Original Data      Fine Grouping     Coarse Grouping
              (Exact Values)     (Problem 2-3)     (Problem 2-4)
    Mean
    Median
    Mode

2-4 DEVIATIONS (MEASURES OF SPREAD)

Although the average may be the most important characteristic of the sample, it is not the only one. It is also important to know how spread out or varied the observations are.

(a) The Range

The range is simply the distance between the largest and smallest observations:

    Range ≜ largest observation − smallest observation

It is not the most important measure of spread, but it is the simplest. For men's heights, the range is 30. The range may be fairly criticized on the grounds that it tells us nothing about the distribution except where its two ends lie; and these two extreme values may be very unreliable. We therefore turn to measures of spread that take account of all the observations.

The average deviation, as its name implies, is found by calculating the deviation of each observed value (Xᵢ) from the mean (X̄); these deviations are then averaged by summing and dividing by n:

    Average deviation ≜ (1/n) Σ (Xᵢ − X̄)    (2-2)

Although this sounds like a promising measure, in fact it is worthless; positive deviations always cancel negative deviations, leaving an average of exactly zero:

    (1/n) Σ (Xᵢ − X̄) = (1/n) Σ Xᵢ − (1/n)(nX̄) = X̄ − X̄ = 0    (2-3)

This sign problem can be avoided by ignoring all negative signs and taking the average of the absolute values of the deviations, as follows.

(b) The Mean Absolute Deviation

    MAD ≜ (1/n) Σ |Xᵢ − X̄|    (2-4)

Intuitively, this is a good measure of spread; the problem is that it is mathematically intractable.(8) We therefore turn to an alternative means of avoiding the sign problem: squaring each deviation.

(c) Mean Squared Deviation (MSD)

    MSD ≜ (1/n) Σ (Xᵢ − X̄)²    (2-5a)

If we use grouped data as in Table 2-4, this formula becomes

    MSD ≈ (1/n) Σ fᵢ(xᵢ − X̄)²    (2-5b)

This is a good measure, provided we wish only to describe the sample. But typically we shall want to go one step further, and use this sample to make a statistical inference about the population. For this purpose it is better to use the divisor n − 1 rather than n.(9) The resulting sample statistic is referred to as the variance.

(d) Variance

    s² ≜ (1/(n − 1)) Σ (Xᵢ − X̄)²    (2-6)

(8) One difficulty is the problem of differentiating the absolute value function.
(9) Technically, this makes the sample variance an unbiased estimator of the population variance. See Chapter 7.
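The only difference between the MSD of (2-5a) and the variance s² of (2-6) is the divisor. A quick sketch on invented data (not the book's):

```python
# Contrasting the MSD (divisor n) with the variance s^2 (divisor n - 1).
# Illustrative data only.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
n = len(data)
xbar = sum(data) / n                                   # 5.0

sum_sq_dev = sum((x - xbar) ** 2 for x in data)        # 24.0
msd = sum_sq_dev / n                                   # 4.0
variance = sum_sq_dev / (n - 1)                        # 4.8

print(xbar, msd, variance)  # the variance is slightly larger than the MSD
```

The variance always exceeds the MSD by the factor n/(n − 1), a difference that matters less and less as the sample grows.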

The values of the MSD and s² are calculated in Table 2-4, again exploiting the simplicities of grouped data.(10)

(e) Standard Deviation, s

This is the square root of the variance:

    s ≜ √s² = √[(1/(n − 1)) Σ (Xᵢ − X̄)²]    (2-10)

Note that by taking the square root, we compensate for having squared terms in defining the variance in (2-6), so that s is reduced to the same units as the X observations.

In conclusion, the sample mean X̄ is the most common measure of center, and the sample standard deviation s is the most common measure of spread. Borrowing the language of physics, we refer to X̄ and s² as the first and second moments of the sample.

(10) These grouped calculations closely parallel the grouped calculation of X̄ in Table 2-4.

A computational note: s² is often computed from the following easier formula:

    s² = (1/(n − 1))(Σ Xᵢ² − nX̄²)    (2-7a)

which for grouped data becomes

    s² ≈ (1/(n − 1))(Σ fᵢxᵢ² − nX̄²)    (2-7b)

Proof that (2-6) and (2-7a) are equivalent:

    Σ (Xᵢ − X̄)² = Σ (Xᵢ² − 2X̄Xᵢ + X̄²)
                 = Σ Xᵢ² − 2X̄ Σ Xᵢ + nX̄²
                 = Σ Xᵢ² − 2nX̄² + nX̄²
                 = Σ Xᵢ² − nX̄²

PROBLEMS

2-6 Compute the variance of the data of Problem 2-2:
(a) Using the definition (2-6).
(b) Using the easier formula (2-7a).

2-7 For the grouped data of Problem 2-4, compute the range, mean absolute deviation, MSD, variance, and standard deviation.

2-8 For the grouped data of Problem 2-3, compute the variance and standard deviation.

2-5 LINEAR TRANSFORMATIONS (CODING)

(a) Change of Origin

Suppose that the men's heights in our example are measured relative to a "norm" of 69 inches (i.e., 5 feet 9 inches). Since Xᵢ denotes the old height in inches (e.g., 83), let Xᵢ′ denote the new measurement. The two measures are related by the equation

    Xᵢ′ = Xᵢ − 69    (2-12)

In nonmathematical terms, this new measurement is simply "the number of inches an individual is taller (+) or shorter (−) than 69 inches." It is easy to guess that the mean using this new measure is just 69 less than the mean using the old measure, i.e.:

    X̄′ = X̄ − 69    (2-13)

On the other hand, the spread will be exactly the same, regardless of which measure is used, i.e.:

    s_X′ = s_X

These two points are illustrated in Figure 2-5, and may be stated in theorem form as follows:

Theorem I. If

    Xᵢ′ = Xᵢ − a    (2-14)

then

    X̄′ = X̄ − a    (2-15)

and

    s_X′ = s_X    (2-16)

FIG. 2-5 Change of origin (shift).

Proof. To prove (2-15), consider

    X̄′ = (1/n) Σ Xᵢ′ = (1/n) Σ (Xᵢ − a) = (1/n)(Σ Xᵢ) − (1/n)(na) = X̄ − a

To prove (2-16), it will be enough to prove the equality of the variances. Using (2-14) and (2-15),

    s_X′² = (1/(n − 1)) Σ (Xᵢ′ − X̄′)²
          = (1/(n − 1)) Σ [(Xᵢ − a) − (X̄ − a)]²
          = (1/(n − 1)) Σ (Xᵢ − X̄)² = s_X²
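Both Theorem I and the shortcut formula (2-7a) can be verified numerically. The data below are invented for illustration; the shift a = 69 matches the "norm" used in the text.

```python
# A numerical sketch (invented data) of Theorem I and of the shortcut
# formula: shifting every observation by a constant a moves the mean by a
# but leaves the variance unchanged, and sum((Xi - Xbar)^2) equals
# sum(Xi^2) - n * Xbar^2.
data = [63.0, 66.0, 69.0, 72.0, 75.0]
a = 69.0
shifted = [x - a for x in data]

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):                      # divisor n - 1, as in (2-6)
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

n, xbar = len(data), mean(data)
shortcut = (sum(x * x for x in data) - n * xbar ** 2) / (n - 1)

print(mean(shifted))                   # the mean drops by exactly a
print(variance(data), variance(shifted), shortcut)  # all three agree
```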

(b) Change of Scale

Suppose that 3 inches is given some standard name, for example a "trintal." If we let Xᵢ* denote the new height in trintals, an old height of Xᵢ = 81 inches would be converted to Xᵢ* = 27 trintals, and generally

    Xᵢ* = (1/3)Xᵢ    (2-17)

FIG. 2-6 Change of scale (shrink).

It is no surprise that the mean height in trintals is just 1/3 the mean height in inches, i.e.:

    X̄* = (1/3)X̄    (2-18)

Furthermore, the standard deviation will also be just 1/3 as much, as can be seen from Figure 2-6:

    s_X* = (1/3)s_X    (2-19)

These two points can be stated generally as

Theorem II. If

    Xᵢ* = bXᵢ    (2-20)

then

    X̄* = bX̄    (2-21)

and

    s_X* = |b| s_X    (2-22)

The proof parallels that of Theorem I directly above, and is left as an exercise for the student.

(c) General Linear Transformations

It is now appropriate to combine the above two theorems into one. Consider the general linear transformation(12)

    Yᵢ = a + bXᵢ    (2-23)

Theorem III. If (2-23) holds, then

    Ȳ = a + bX̄    (2-24)

and

    s_Y = |b| s_X    (2-25)

The proof is similar to that of Theorem I above, and again is left as an exercise for the student. This theorem may be interpreted very simply: if the individual observations (Xᵢ) are linearly transformed (into corresponding Yᵢ values), then the mean observation is transformed in exactly the same way, and the standard deviation is stretched (or shrunk)(13) by the factor |b|, with no effect from a.

(12) Called linear because, given any values of a and b, the graph of Y = a + bX is a straight line (with slope b and Y-intercept a).
(13) More precisely, stretched if |b| > 1, but shrunk if |b| < 1.

(d) Application to Coding

In future chapters we shall draw upon this theory of linear transformations in various contexts. However, it does have one immediate use: it can be applied to find X̄ and s_X with a simpler computation than that shown in Table 2-4. This involves three steps.

1. Code all the Xᵢ values into a new set of Yᵢ values. Our computations will be most simplified if we use the formula

    Yᵢ = (Xᵢ − one of the cell midpoints)/(cell width)    (2-26)

In our example of heights, this becomes

    Yᵢ = (Xᵢ − 69)/3    (2-27)

This is clearly a linear transformation of the form of (2-23), with a = −23 and b = 1/3.
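Theorem III is easy to confirm numerically. The sketch below uses invented heights together with the coding a = −23, b = 1/3 of equation (2-27):

```python
# A sketch (invented data) verifying Theorem III: under Y = a + b*X the
# mean transforms in exactly the same way, and the standard deviation is
# scaled by |b|.
import statistics

X = [57.0, 60.0, 66.0, 72.0, 75.0]
a, b = -23.0, 1.0 / 3.0                # the coding of equation (2-27)
Y = [a + b * x for x in X]

assert abs(statistics.mean(Y) - (a + b * statistics.mean(X))) < 1e-9
assert abs(statistics.stdev(Y) - abs(b) * statistics.stdev(X)) < 1e-9
print("Theorem III checks out on this sample")
```

The constant a shifts every value equally, so it affects the mean but not the spread; only the slope b touches the standard deviation.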

TABLE 2-5 Coded Computation of the Mean and Standard Deviation of 200 Men's Heights (Compare with Table 2-4)

    (1)      (2)        (3)     (4)      (5)
    x        Coded Y    f       fY       fY²

    57       −4          2      −8        32
    60       −3          7      −21       63
    63       −2         22      −44       88
    66       −1         13      −13       13
    69        0         44       0         0
    72        1         36       36       36
    75        2         32       64      128
    78        3         13       39      117
    81        4         21       84      336
    84        5         10       50      250

    n = Σf = 200        ΣfY = 187        ΣfY² = 1063

    Ȳ = 187/200 = .935            X̄ = 3Ȳ + 69 = 71.80

    For s_Y², using the easy calculation (2-7b):
    s_Y² = (1/199)(1063 − 200(.935)²) = (1/199)(888) = 4.46
    s_X = 3√4.46 = 6.35

FIG. 2-7 Coding from inches (X) into trintals (Y), involving both a change of origin and a change of scale.

It is evident that when Xᵢ = 69, Yᵢ = 0. Moreover, as Xᵢ progresses by steps of 3, Yᵢ progresses in unit steps. With these guidelines we can fill in the appropriate Y values in column 2 of Table 2-5; diagrammatically, this coding is illustrated in Figure 2-7.

2. Compute the mean and standard deviation of the Y values. We note in the successive columns of Table 2-5 how easily this is now done. With Ȳ and s_Y now in hand, we are in a position to:

3. Translate the mean and standard deviation back into X values. This involves applying the theory of linear transformations (Theorem III) to (2-27), which gives

    Ȳ = (1/3)X̄ − 23    (2-28)

and

    s_Y = (1/3)s_X    (2-29)

From (2-28),

    X̄ = 3Ȳ + 69 = 71.80    (2-30)

From (2-29),

    s_X = 3s_Y = 6.35    (2-31)

Thus the simple coded computation of X̄ and s_X is complete.
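The three coding steps can be replayed in code. The frequencies below are the ones reconstructed for column 3 of Table 2-5 (they reproduce the printed totals ΣfY = 187 and ΣfY² = 1063); the decoding follows (2-30) and (2-31), and the result agrees with the text's X̄ = 71.80 and s_X ≈ 6.35 up to rounding.

```python
# A sketch of the coded computation of Table 2-5: code, compute the mean
# and standard deviation of Y, then decode back into X units.
from math import sqrt

midpoints   = [57, 60, 63, 66, 69, 72, 75, 78, 81, 84]
frequencies = [ 2,  7, 22, 13, 44, 36, 32, 13, 21, 10]

n = sum(frequencies)                          # 200
Y = [(x - 69) / 3 for x in midpoints]         # coded values -4 ... 5

sum_fY  = sum(f * y for f, y in zip(frequencies, Y))       # 187
sum_fY2 = sum(f * y * y for f, y in zip(frequencies, Y))   # 1063

Y_bar = sum_fY / n                            # 0.935
s2_Y  = (sum_fY2 - n * Y_bar ** 2) / (n - 1)  # about 4.46, as in (2-7b)

X_bar = 3 * Y_bar + 69                        # (2-30)
s_X   = 3 * sqrt(s2_Y)                        # (2-31)

print(X_bar, s_X)
```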

PROBLEMS

2-9 By coding the heights shown in Table 2-5 from inches (X) into feet (Y), compute X̄ and s_X. Show your coding in a diagram similar to Figure 2-7. Why is the coding used in the text preferred?

2-10 Use coding to find the mean and standard deviation of the data in Problem 2-4.

2-11 Find the mean of the following:

    239510  239250  239860  239360  239480  239430
    239230  239370  239290  239850  239680

(Hint. It is natural, and mathematically justified, to simply drop the first digits (239,000) of every number, and just work with 510, 250, .... This is justified because it is just the linear transformation Y = X − 239,000.)

2-12 To show that nonlinear transformations are trickier, see if this is true: if Y = X², then Ȳ = X̄²? (Try it when there are three values of X: 1, 3, 5.)

2-13 Using coding, find the mean and standard deviation of:
(a) The data of Problem 2-3.
(b) The data of Problem 2-2.

2-14 Find the mean and standard deviation of the following sample of executive ages. Graph the relative frequency distribution.

    35  46  63  55  43  42  69  59  54  45  50
    50  62  68  38  40  44  57  47  48  46  60
    42  60  42  38  56  51  38  61  54  43  64
    49  36  59  51  50  66  63  57  50  44  48
    69  64  37  56  53  62  52

Review Problems

2-15 The weekly wage rates for 5 major industry groups are listed below, with the percent of industrial employment accounted for by each group. Find the average weekly industrial wage.

    [Table: weekly wage rates $150, $120, $100, $80, ...; percent of employment 30, 25, 20, 20, ...]

2-16 Suppose the number of children was recorded for each of 25 families, yielding the following data:

    2, 4, 1, 0, 1, 3, 0, 4, 2, 6, 0, 0, 2, 3, 1, 5, 4, 3, 1, 0, 2, 5, 3, 4, 1

(a) Construct a frequency table and graph.
(b) Find the mean and standard deviation.

2-17 The following table* gives the amount of farmland, and the actual percent of farmland harvested (as opposed to pasture, woodlot, etc.), in the U.S.A. in 1959, according to region. Compute the percent harvested for the U.S.A. as a whole.

    Region       Amount of Farmland       Percent Harvested
                 (millions of acres)
    North        421                      46.7
    South        357                      21.0
    Mountain     264                       8.7
    Pacific       80                      18.8
    U.S.A.     1,122

    * Source. Statistical Abstract of the United States, 1963, pp. 625, 614.

2-18 A certain species of beetle was sampled, yielding the following 10 lengths, in centimeters:

    1.0, 1.2, 1.0, 1.1, 1.0, 1.6, 1.2, 1.4, 1.5, 2.0

Find the median, mean, range, variance, and standard deviation:
(a) For the original lengths.
(b) If the lengths are expressed in mm (1 cm = 10 mm).
(c) If the lengths are expressed as "centimeters above a standard beetle height of 1.1" (i.e., the sample values become −.1, +.1, −.1, 0, −.1, +.5, +.1, +.3, +.4, +.9).

2-19 Throw a die 100 times (or else simulate this by consulting the random numbers in Appendix Table IIa). Graph the relative frequency distribution, and calculate the sample mean:
(a) After 10 throws;
(b) After 25 throws;
(c) After 100 throws;
(d) After millions of throws (guess).

chapter 3

Probability

3-1 INTRODUCTION

In Chapters 7 to 10, we shall make inferences (induction) about an unknown population from an observed sample. In the next few chapters we work in the opposite direction, making deductions about a sample drawn from a known population; this is a necessary prelude to the later chapters on induction. If the population of American voters is 55% Democrat, we cannot be certain exactly what percentage of Democrats will occur in a random sample. Nevertheless, it is "likely" that "close to" this percentage will turn up in our sample. Our objective is to define "likely" and "close to" more precisely; in this way we shall be able to make useful predictions. First, however, we must lay a good deal of groundwork. Predictions in the face of uncertainty or chance require a knowledge of the laws of probability, and the chapters that follow are devoted exclusively to their development.

Consider again our example of Chapter 1, in which the reader gambled against rolling an ace on a die. This gamble was based on the judgment that this outcome was unlikely. Now let us be more specific, and try to define its probability precisely. Intuitively, since this is but one of six equally probable outcomes, we might (correctly) guess its probability to be one in six, or one-sixth, provided it is an honest die. Alternatively, we might say that if the die were thrown a large number of times, the relative frequency (of rolling an ace) would approach one-sixth (as in Problem 2-19). This is a useful operational approach; thus, if we suspect that this die is not, in fact, a fair one, we could test by tossing it many times, and observing whether or not the relative frequency of this outcome approached one-sixth.

This definition of probability as "the limit of relative frequency" is formally stated as:

Definition.

    Pr (e₁) ≜ lim (n₁/n) as n → ∞    (3-1)

where

    e₁ is the outcome ("ace");
    n is the total number of times that the trial is repeated (die is thrown);
    n₁ is the number of times that the outcome e₁ occurs [also called n(e₁)];
    n₁/n is therefore the relative frequency of e₁.

We shall use this definition of probability because it provides the clearest intuitive idea. However, you will find in Section 3-6 that it involves conceptual difficulties; thus, if you choose to study probability further, you will soon be forced to turn to the axiomatic approach.
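Definition (3-1) can be explored by simulation, in the spirit of Problem 2-19. The sketch below (our own, not the text's) tracks the relative frequency of rolling an ace as the number of throws grows; the seed is fixed only to make the illustration reproducible.

```python
# A simulation sketch of definition (3-1): the relative frequency of
# rolling an ace settles near 1/6 as the number of throws grows.
import random

random.seed(1)  # reproducible illustration

def relative_frequency(throws):
    aces = sum(1 for _ in range(throws) if random.randint(1, 6) == 1)
    return aces / throws

for n in (100, 10_000, 100_000):
    # Successive values should drift toward 1/6, i.e. about .167.
    print(n, relative_frequency(n))
```

This is exactly the operational test suggested above for a suspect die: throw it many times and watch whether the relative frequency approaches 1/6.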

PROBLEMS

3-1 (a) Throw a thumbtack 50 times. Define tossing "the point up" as e₁. Record your results as in the following table, and keep them as a permanent record for future reference.

    Trial Number (n)    Point Up?    Frequency of "Ups" (n₁)    Accumulated Relative Frequency (n₁/n)
    1                   No           0                          .00
    2                   Yes          1                          .50
    3                   Yes          2                          .67
    4                   No           2                          .50
    5                   Yes          3                          .60
    ⋮
    10
    20
    30
    40
    50

(b) Show your results on a graph of accumulated relative frequency (n₁/n) against trial number n.
(c) What is your best guess of the probability of tossing the point up [i.e., of Pr (e₁)]?

3-2 In tossing a coin, define a "head" as e₁, and proceed as in Problem 3-1(a) and (b), tossing the coin 100 times. (Record your results for future use in Chapter 9.)

3-3 Roll a die 100 times. Define rolling a four as e₁, and proceed as in Problem 3-1(a) and (b). You may use the same data as in Problem 2-19. (Record your results for future use.)

3-4 Roll a pair of dice 50 times. Define rolling a total of 7 or 11 as the event E, and proceed as in Problem 3-1(a) and (b). What is your empirical estimate of Pr (E)? Can you also derive Pr (E) theoretically?

3-2 ELEMENTARY PROPERTIES OF PROBABILITY

We generalize by considering an experiment with N elementary outcomes e₁, e₂, ..., eᵢ, ..., e_N. The relative frequency nᵢ/n of any outcome eᵢ must be positive, since both the numerator and denominator are positive; moreover, since the numerator cannot exceed the denominator, the relative frequency cannot exceed 1. Thus

    0 ≤ nᵢ/n ≤ 1    (3-2)

To be exact, the same relations are true of the limit, so that from (3-1)

    0 ≤ Pr (eᵢ) ≤ 1    (3-3)

Next we note that the frequencies of all the elementary outcomes must sum to n:

    n₁ + n₂ + ⋯ + n_N = n

Dividing this equation by n, we find that all the relative frequencies sum to 1:

    n₁/n + n₂/n + ⋯ + n_N/n = 1

This same relation is true in the limit, so that

    Pr (e₁) + Pr (e₂) + ⋯ + Pr (e_N) = 1    (3-4)

3-3 EVENTS AND THEIR PROBABILITIES

(a) The Outcome Set

In the previous section, the die was an example of an experiment in which the outcomes e₁, e₂, ..., e₆ were numerical and involved no complications. Since most experiments of interest to the statistician are sampling experiments, we will usually have a more complex set of outcomes. For example, suppose an experiment consists of flipping a coin three times (or, equivalently, flipping three coins at once). A typical outcome (designated e₄) is the sequence (H, T, T). The list of all possible outcomes, or outcome set, is shown in Figure 3-1.

    (H, H, H) = e₁
    (H, H, T) = e₂
    (H, T, H) = e₃
    (H, T, T) = e₄
    (T, H, H) = e₅
    (T, H, T) = e₆
    (T, T, H) = e₇
    (T, T, T) = e₈

FIG. 3-1 Outcome set of the experiment of flipping a coin three times.

We note several practical features. The outcome set is also often known as the sample space S. It is a mathematical convention to write a set with curly brackets; in this case

    S = {e₁, e₂, ..., e₈}

The order in which the outcomes in the set are listed doesn't matter: the two outcome sets {e₁, e₂, ..., e₈} and {e₂, e₁, e₈, ...} are the same set. However, since (H, H, T) and (H, T, H) are separate and distinct outcomes, the order in which H and T appear is an essential feature; in this case we use round brackets and call the result an ordered triple. Finally, we note that an experimental outcome involves an entire ordered triple. It is tempting to try to tear each triple into three parts, and think of 24 outcomes. This mistake is avoided by writing down a dot for each of the 8 elementary outcomes. (Hereafter, we shall often refer to outcomes as "points" for short.)

To simplify calculations, let us suppose that the coin is fairly tossed, so that all 8 outcomes are equally probable. Since the probabilities must sum to 1 according to (3-4), we have, without modifying our concepts in any way,

    P(e₁) = P(e₂) = ⋯ = P(e₈) = 1/8    (3-5)

(b) Events

Continuing the example of the 3 coins, consider the event E: "at least 2 heads." This event includes the outcomes e₁, e₂, e₃, and e₅ in Figure 3-1. We might say that the event E is the collection of points {e₁, e₂, e₃, e₅}, as in Figure 3-2. In fact, this is a convenient way to define an event in general:

Definition. An event E is a subset of the outcome set S.    (3-6)

FIG. 3-2 An event E as a subset of points within the outcome set, or sample space.

We now ask, "What is the probability of E?" Using the definition of probability as limiting relative frequency, we may write

    Pr (E) = lim (n_E/n) as n → ∞    (3-7)

where n_E is the frequency of E. But of course E occurs whenever any of the outcomes e₁, e₂, e₃, or e₅ occurs. Thus

    n_E = n₁ + n₂ + n₃ + n₅

and from (3-7),

    Pr (E) = lim (n₁ + n₂ + n₃ + n₅)/n = Pr (e₁) + Pr (e₂) + Pr (e₃) + Pr (e₅)    (3-8)

TABLE 3-1 Several Events in the Experiment of Figure 3-1 (Tossing 3 Coins)

    (1)                 (2)                                   (3)                  (4)
    Arbitrary Symbol    Description                           Outcome List         Probability
    for Event

    E                   At least 2 heads                      {e₁, e₂, e₃, e₅}     4/8 = 1/2
    F                   Second coin head, followed by a tail  {e₂, e₆}             2/8 = 1/4
    G                   Fewer than 2 heads                    {e₄, e₆, e₇, e₈}     4/8 = 1/2
    H                   All coins the same                    {e₁, e₈}             1/8 + 1/8 = 1/4
    I                   No heads                              {e₈}                 1/8
    I₁                  Exactly 1 head                        {e₄, e₆, e₇}         3/8
    I₂                  Exactly 2 heads                       {e₂, e₃, e₅}         3/8
    I₃                  Exactly 3 heads                       {e₁}                 1/8
    J                   Exactly 2 tails                       {e₄, e₆, e₇}         3/8

The obvious generalization of (3-8) is that the probability of an event is the sum of the probabilities of all the points (or outcomes) included in that event:

    Pr (E) = Σᵢ Pr (eᵢ)    (3-9)

summing over just those outcomes eᵢ which are in E. We note an analogy between mass (in physics) and probability: the mass of an object is the sum of the masses of all the atoms in that object; the probability of an event is the sum of the probabilities of all the outcomes included in that event.

Various events are considered in Table 3-1; all the outcomes included in each event are listed in column 3.

Since the probability of each outcome is 1/8, the calculation of the probability of each event in column 4 is very simple; indeed, some of these probabilities would have been clear immediately, without any calculation at all. Note also, from the lists in column 3, that the events "exactly 1 head" and "exactly 2 tails" are one and the same event; although this is not evident from the two descriptions, the identical outcome lists make it obvious.
(c) Combin!ngEvents
we might ask for the
than 2 heads or all

example,

an

As

there

that

of \"G or
same (or

probability

less

be

Will

the

coins

H,\" that is,


both). This

\337
e2 %

\370
e3

(a)

FIG.3-3
each case

diagrams,
illustrating probability of combined
events.
(The rectangle in
the whole sample space;hence the probability
of all points (or out-

repre

comes)within

rots

sun,\177oand1.) H\";
(a)

rectangle

ev,

combined

well as

\"G

o\177

at is

denoted

lists

general,

fo

any

events

two

G(c)t.\177
I Htw shaded,
J shaded. \"G or
u H,\"

and

of Table

3-1

\"G

by

H.\"From the

G UH
In

(c)

(b)

Ven

--

{e4,

e6,

be

may
it

can

e7, es,

q}.

(b)

H\";

G rq H

\"G union

read

be seen

shaded,

H\" as

that

G, H:

Definition.

A
this

in G

H \177xset
little

of points

which are

in

G,

or in

H, or

in

both

(3-10)

definition

u H, its

)stract
Since
\177robability

art

in

Figure

3-3a,

five of the eight


is

5/8.

called

Venn

diagram,

illustrates

equiprobable outcomes are included

34

PROBABILITY

Similarly, we might be interested in the event "G and H," that is, the event that there will be fewer than 2 heads, and all coins the same. This is clearly a much more restricted combined event; any outcome in it must satisfy both G and H, rather than either G or H. Again, we can use a Venn diagram, as in Figure 3-3b; this shows clearly that there is only one outcome (3 tails) that qualifies. This combined event is denoted by G ∩ H, and may be read "G intersect H" as well as "G and H." The lists of G and H in Table 3-1 confirm that G ∩ H = {e₈}, since the only outcome appearing in both lists is e₈. Hence the probability of G ∩ H is 1/8. In general, for any two events G and H:(1)

Definition. G ∩ H is the set of points which are in both G and H.    (3-11)

(d) Probabilities of Combined Events

We have already shown how Pr (G ∪ H) may be found from the Venn diagram in Figure 3-3. Now we should like to develop a formula. First consider a pair of events that do not have any points in common, such as I and J from Table 3-1. (We also say that they are mutually exclusive, or do not overlap.) From Figure 3-3c it is obvious that

    Pr (I ∪ J) = Pr (I) + Pr (J)    (3-12)

But this simple addition does not always work. For example,

    Pr (G ∪ H) ≠ Pr (G) + Pr (H)    (3-13)

What has gone wrong in this case? Since G and H overlap, in summing Pr (G) and Pr (H) we count the intersection G ∩ H twice; this is why (3-13) overestimates. This is easily corrected: subtracting Pr (G ∩ H) eliminates the double counting. Thus, we have shown:

(1) To remember when ∪ or ∩ is used, it may help to recall that ∪ stands for "union," and ∩ resembles the letter "A" in the word "and." These technical symbols are used to avoid the ambiguity that might occur if we used ordinary English. For example, the sentence "E ∪ F has 5 points" has a precise meaning, but the informal "E or F has 5 points" is ambiguous.

Theorem.

    Pr (G ∪ H) = Pr (G) + Pr (H) − Pr (G ∩ H)    (3-14)

In our example, (3-14) works: 5/8 = 4/8 + 2/8 − 1/8. Formula (3-14) is in fact quite general, and applies to any two events. It also applies in cases like (3-12), where I and J do not overlap; then Pr (I ∩ J) = 0, and this last term simply disappears when (3-14) is applied. For emphasis we may write, in general:

Theorem.

    Pr (I ∪ J) = Pr (I) + Pr (J),  if I and J are mutually exclusive    (3-15)

But it must be recognized that this is just a special case of (3-14).

A collection of several events is defined as mutually exclusive if there is no overlap, i.e., if no outcome eᵢ belongs to more than one event. For example, in Table 3-1, the events I, I₁, and I₂ are mutually exclusive; but E, F, and I are not, because E and F overlap.

The collection of events {I, I₁, I₂, I₃} is mutually exclusive, and it also "covers" the whole sample space S. We therefore call it a partition of S. In general:

Definition. A partition of a sample space S is a collection of mutually exclusive events {I₁, ..., I_N} whose union is the whole sample space:

    I₁ ∪ I₂ ∪ ⋯ ∪ I_N = S    (3-16)

Thus a partition completely divides the sample space into nonoverlapping events, as illustrated in Figure 3-4b.

We note in Table 3-1 that G consists of exactly those points which are not in E. We therefore could call G the complement of E, or "not E," and denote it by Ē. In general, for any event E:

Definition. Ē is the set of points in the sample space S which are not in E.    (3-17)


FIG. 3-4 Venn diagrams to illustrate event definitions; the sample space S in each case is represented by a rectangle. (a) An event E and its complement Ē, shaded. (b) E₁, E₂, ..., Eₙ form a partition. (c) E and Ē form a partition.

(Note. Because an event E and its complement Ē are mutually exclusive, {E, Ē} form a very simple partition.) Since these events are mutually exclusive, by (3-15)

    Pr (E ∪ Ē) = Pr (E) + Pr (Ē)    (3-18)

and since {E, Ē} form a partition,

    1 = Pr (E) + Pr (Ē)    (3-19)

Substituting (3-19) into (3-18) yields a solution for Pr (E) in terms of Pr (Ē):

Theorem.
    Pr (E) = 1 − Pr (Ē)    (3-20)

As an example, consider the probability of getting at least one head. The complement is "no heads," and is very simple to calculate. Thus

    Pr (at least one head) = 1 − Pr (no heads) = 1 − 1/8 = 7/8

This is not the only way to answer this question, but it is by far the simplest, since Pr (no heads) is so easy to evaluate. The student should be on the alert for similar problems: the key words to watch for are "at least," "more than," "less than," "no more than," etc.
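The complement shortcut of (3-20) can be confirmed by enumeration. A minimal Python sketch (ours, not the text's), again on the 3-coin sample space:

```python
from itertools import product
from fractions import Fraction

space = [''.join(p) for p in product('HT', repeat=3)]

def pr(pred):
    # Probability as a count of favorable outcomes over all 8 outcomes.
    return Fraction(sum(map(pred, space)), len(space))

no_heads = pr(lambda o: 'H' not in o)        # 1/8, only TTT
at_least_one = pr(lambda o: 'H' in o)        # the complementary event

# (3-20): Pr(at least one head) = 1 - Pr(no heads) = 7/8
assert at_least_one == 1 - no_heads == Fraction(7, 8)
```

Counting the complement (one outcome) is far easier than counting the seven outcomes of the event itself, which is exactly the point of the theorem.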

PROBLEMS

3-5 Suppose a penny and a nickel are thrown on the table.
(a) The outcome set may be conveniently listed:

    (penny, nickel)
    (H, H) = e₁
    (H, T) = e₂
    (T, H) = e₃
    (T, T) = e₄

Satisfy yourself that all 4 outcomes are equally likely:
1. Philosophical argument. Obviously e₁ and e₂ are equally likely, because they differ only in what happens to the nickel. Similarly, e₃ and e₄ are equally likely. Finally, e₂ and e₄ are equally likely, because they differ only in what happens to the penny. Thus all 4 outcomes are equally likely.
2. Empirical argument. Have everyone in the class repeat the experiment 10 times, so that a large amount of data can be pooled. Is the relative frequency of each outcome about 1/4?
(Note that this outcome set could be written in 2 ways: as the list above, or as a two-way table with rows for the penny (H, T) and columns for the nickel (H, T).)
(b) Consider the following reduction of the outcome set:

    Both heads
    One of each
    Both tails

Are these three outcomes equally likely? What are their probabilities?
(c) What is the probability of at least one head? Answer alternatively using the symmetric outcome set [which is recognized in (a)] and the alternate outcome set, and verify that you get the same answer.

3-6 In the same way, list the outcome set when a pair of dice are thrown, one painted red, the other white. Then calculate the probability of:
(1) A total of 4 dots.
(2) A total of 7 dots.
(3) A total of 7 or 11 dots.
(4) A double.
(5) A total of at least 8 dots.
(6) A double, with a total of at least 8 dots.
(7) A 1 on one die, 5 on the other.
(8) Would you get the same answers to (1)-(7) if the dice were both painted white? In particular, compare the chance of a {3, 3} combination to the chance of a {1, 5} combination.

3-7 Suppose the coin of Figure 3-1 is not fairly thrown (as in Problem 3-4), and that in the long run the following relative frequencies are observed:

    e        Pr (e)
    (HHH)    .15
    (HHT)    .10
    (HTH)    .10
    (HTT)    .15
    (THH)    .15
    (THT)    .10
    (TTH)    .10
    (TTT)    .15

Recalling the definitions of Table 3-1,
    G: fewer than 2 heads
    H: all coins the same
find the following probabilities. (Hint. Use (3-9) and a Venn diagram.)
(a) Pr (G); Pr (H); Pr (G ∪ H); Pr (G ∩ H).
(b) Verify that (3-14) holds true.
Let us further define
    K: fewer than 2 tails
    L: some coins different
Then find
(c) Pr (K); Pr (L); Pr (K ∪ L); Pr (K ∩ L).
(d) Verify that (3-14) holds true.

3-8 (a) List the sample space of 4 coins tossed simultaneously.
(b) Define the events
    A: all coins the same
    B: precisely 1 head
    C: at least 2 heads
Evaluate Pr (A) + Pr (B) + Pr (C). Do the events A, B, C form a partition?
(c) Redefine A as "all tails." Do the events A, B, C now form a partition? What is Pr (A) + Pr (B) + Pr (C)?

3-9 When a coin is tossed 4 times, let Y denote the number of changes in sequence. For example, the outcome H T H H may be written H/T/HH, where the two changes in sequence are indicated by slashes; similarly, the outcome H/TTT has only 1 change. What is
(a) Pr (Y = 1)?
(b) Pr (Y = 2)?
(c) Do the events of (a) and (b) form a partition?

3-10 (a) What is the probability of at least one head when 4 fair coins are tossed?
(b) What is the probability of at least one head when 10 fair coins are tossed?

3-11 Suppose a class of 100 students consists of several groups, in the following proportions:

                       Men       Women
    Taking math       17/100     38/100
    Not taking math   23/100     22/100

If a student is chosen by lot to be class president, what is the chance that the student will be:
(a) A man?
(b) A woman?
(c) Taking math?
(d) A man, or taking math?
(e) A man, and taking math?
(f) If the class president in fact turned out to be a man, what is the chance that he is taking math?

(Problems preceded by arrows are important, because they introduce a later section in the text.)

3-12 The students of a certain school engage in various sports in the following proportions:

    Football, 30% of all students.
    Basketball, 20%.
    Baseball, 20%.
    Both football and basketball, 5%.
    Both football and baseball, 10%.
    Both basketball and baseball, 5%.
    All three sports, 2%.

If a student is chosen by lot for an interview, what is the chance that he will be:
(a) An athlete (playing at least one sport)?
(b) A football player only?
(c) A football player or a baseball player?
If an athlete is chosen by lot, what is the chance that he will be:
(d) A football player only?
(e) A football player or a baseball player?
Hint. Use a Venn diagram.
(f) Use your result in (a) to generalize (3-14).

3-4 CONDITIONAL PROBABILITY

Continuing with the experiment of fairly tossing 3 coins, suppose that the experiment is completed, and we are informed that there were fewer than 2 heads, i.e., that event G had occurred. Given this condition, what is the probability that I (no heads) occurred? This is an example of "conditional probability," and is denoted as Pr (I/G), or "the conditional probability of I, given G."

The problem may be solved by keeping in mind that our relevant outcome set is reduced to G. From Figure 3-5 it is evident that Pr (I/G) = 1/4.

The second illustration in this figure shows the conditional probability of H (all coins the same), given G (less than 2 heads). Our knowledge of G means that the only relevant part of H is H ∩ G ("no heads" = I) and thus Pr (H/G) = 1/4. This example is immediately recognized as equivalent to the preceding one; we are just asking the same question in two different ways.

Suppose Pr (G), Pr (H), and Pr (G ∩ H) have already been computed for the original sample space S. It may be convenient to have a formula for Pr (H/G) in terms of them. We therefore turn to the definition (3-1) of probability as relative frequency. We imagine repeating the experiment n times, with G occurring n(G) times, of which H also occurs n(H ∩ G) times.

FIG. 3-5 Venn diagrams to illustrate conditional probability. (a) Pr (I/G). Knowledge that G has occurred makes the original sample space S irrelevant; G becomes the new sample space, and the event I includes one of the four equiprobable outcomes in G. Thus Pr (I/G) = 1/4. (b) Pr (H/G). Knowledge that G has occurred makes the original sample space S (including the outcome in H that lies outside G) irrelevant; in G, which now becomes the new sample space, the only relevant part of H is H ∩ G. Note that Pr (H/G) is identical to Pr (I/G).

The conditional relative frequency is the ratio n(H ∩ G)/n(G), and the conditional probability is the limit of this relative frequency:

    Pr (H/G) = lim n(H ∩ G)/n(G)    (3-21)
On dividing numerator and denominator by n, we obtain

    Pr (H/G) = lim [n(H ∩ G)/n] / [n(G)/n] = Pr (H ∩ G) / Pr (G)    (3-22)

This formula is often used in a slightly different form, obtained by cross multiplying:

    Pr (H ∩ G) = Pr (G) Pr (H/G)    (3-23)
PROBLEMS

(In this section and the next, we shall assume that all events under consideration have nonzero probabilities. This permits us to divide by various probabilities legitimately at will.)

3-13 Flip 3 coins over and over again, recording your results as in the following table.

    Trial    G          If G Occurs, Then    Accumulated        Accumulated Conditional    Relative Frequency
    Number   Occurs?    H Also Occurs?       Frequency n(G)     Frequency n(H ∩ G)         n(H ∩ G)/n(G)
    1        Yes        Yes                  1                  1                          1.00
    2        Yes        No                   2                  1                          .50
    3        Yes        Yes                  3                  2                          .67
    ⋮

After 50 trials, is the relative frequency n(H ∩ G)/n(G) close to the probability calculated theoretically in the previous section? (If not, it is because of insufficient trials, so pool the data from the whole class.)

3-14 Using the unfair coins and definitions of Problem 3-7, calculate
(a) Pr (G/H)
(b) Pr (H/G)
(c) Pr (K/L)
(d) Pr (L/K)

3-15 (a) A consumer may buy brand X or brand Y, but not both. The probability that he buys brand X is .06, and brand Y is .15. Given that the consumer bought either X or Y, what is the probability that he bought brand X?
(b) If the events A and B are mutually exclusive (and of course nonempty, i.e., each includes at least one possible outcome), is it always true that

    Pr (A/A ∪ B) = Pr (A)/[Pr (A) + Pr (B)]?

3-16 A bowl contains 3 red chips (numbered R₁, R₂, R₃) and 2 white chips (numbered W₁, W₂). A sample of 2 chips is drawn, one after the other. List the sample space. For each of the following events, diagram the subset of outcomes included and find its probability.
(a) Second chip is red.
(b) First chip is red.
(c) Second chip is red, given the first chip is red.
(d) First chip is red, given the second chip is red.
(e) Both chips are red.
Then note the following features, which are perhaps intuitively obvious also:
(1) The answers to (a) and (b) agree, as do the answers to (c) and (d).
(2) Show that the answer to (e) can be found alternatively by applying (3-23) to parts (b) and (c).
(3) Extension of part (2): if 3 chips are drawn, what is the chance that they are all red? Can you now generalize Theorem (3-23)?

3-17 Two cards are drawn in order from an ordinary deck of cards. What is the chance that:
(a) They are both aces?
(b) They are both black?
(c) They are both honor cards (ace, king, queen, jack, or ten)?

3-18 A poker hand (5 cards) is drawn from an ordinary deck. What is the chance of drawing, in order,
(a) 2 aces, then 3 kings?
(b) 2 aces, 2 kings, then finally a queen?
(c) 4 aces, then a king?
What is the chance of drawing, in any order whatsoever,
(d) 4 aces and a king?
(e) 4 aces?
(f) "Four of a kind" (i.e., 4 aces, or 4 kings, or 4 jacks, etc.)?
If the 5 cards are drawn with replacement (i.e., each card is replaced in the deck before drawing the next card, so that it is no longer a real poker deal), what is the chance of
(g) Exactly 4 aces?
3-19 A supply of 10 light bulbs contains 2 defective bulbs. If the bulbs are picked up in random order, what is the probability that:
(a) The first two bulbs are good?
(b) The first defective bulb was picked 6th?
(c) The first defective bulb was not picked until the 9th?

⇒3-20 Two dice are thrown. Let
    E: first die is 5
    F: total is 7
    G: total is 10
Compute the relevant probabilities, using Venn diagrams, and show that:
(a) Pr (F/E) = Pr (F).
(b) Pr (G/E) ≠ Pr (G).
(c) Is Pr (E/F) = Pr (E)? Do you think that this is closely related to (a), or just an accident?

3-21 If E and F are any 2 mutually exclusive events (and both are nonempty, of course), what can be said about Pr (E/F)?

3-22 A company employs 100 persons, 75 men and 25 women. The accounting department provides jobs for 12% of the men and 20% of the women. If a name is chosen at random from the accounting department, what is the probability that it is a man? That it is a woman?

⇒3-23 (Bayes' Theorem). In a certain population of workers, suppose that 40% are grade school graduates, 50% are high school graduates, and 10% are college graduates. Among the grade school graduates, 10% are unemployed; among the high school graduates, 5% are unemployed; and among the college graduates, 2% are unemployed. If a worker is chosen at random and found to be unemployed, what is the probability that he is
(a) A grade school graduate?
(b) A high school graduate?
(c) A college graduate?
(This problem is important as an introduction to Chapter 15; therefore its answer is given in full.)

Answer. Think of probability as proportion of the population, if you like.

[Figure: the old sample space, the population of workers, is divided into the three classes of workers C₁, C₂, C₃. The effect E (unemployment) cuts across all three classes, and becomes the new sample space, shaded.]

    Pr (E) = Σⱼ Pr (E ∩ Cⱼ)
    Pr (E ∩ Cⱼ) = Pr (E/Cⱼ) Pr (Cⱼ)
    Pr (E) = .040 + .025 + .002 = .067

In the new sample space E, shaded, (3-22) gives

(a) Pr (C₁/E) = .040/.067 = .597
(b) Pr (C₂/E) = .025/.067 = .373
(c) Pr (C₃/E) = .002/.067 = .030
As a check, the sum is .067/.067 = 1.000.
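The arithmetic of this answer can be reproduced directly. A minimal Python sketch (ours, not the text's; the dictionary keys are just labels for C₁, C₂, C₃):

```python
# Priors Pr(C_j) and conditionals Pr(E/C_j) from Problem 3-23.
priors = {'grade school': .40, 'high school': .50, 'college': .10}
p_unemployed = {'grade school': .10, 'high school': .05, 'college': .02}

# Joint probabilities Pr(E n C_j) = Pr(E/C_j) Pr(C_j).
joint = {c: priors[c] * p_unemployed[c] for c in priors}
pr_E = sum(joint.values())                         # Pr(E) = .067

# Posterior probabilities Pr(C_j/E), by (3-22).
posterior = {c: joint[c] / pr_E for c in joint}

assert abs(pr_E - .067) < 1e-9
assert round(posterior['grade school'], 3) == .597
assert round(posterior['high school'], 3) == .373
assert round(posterior['college'], 3) == .030
```

The posteriors necessarily sum to 1, since the classes Cⱼ partition the population.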

Notes on Bayes' Theorem. Problem 3-23 is an example of Bayes' Theorem, which may be stated as follows. Certain "causes" C₁, ..., Cₙ (education levels) have prior probabilities Pr (Cᵢ). In a sense the causes produce an "effect" E (unemployment), not with certainty, but with conditional probabilities Pr (E/Cᵢ). Given that the effect E has eventually occurred, one then calculates, using conditional probability manipulations, the probability of a cause given the effect, Pr (Cᵢ/E):

    Given the prior probabilities Pr (Cᵢ) and the conditional probabilities Pr (E/Cᵢ) → Deduced posterior probabilities Pr (Cᵢ/E)

3-24 In a certain country it rains 40% of the days and shines 60% of the days. A barometer manufacturer, in testing his instrument in the lab, has found that it sometimes errs: on rainy days it erroneously predicts "shine" 10% of the time, and on shiny days it erroneously predicts "rain" 30% of the time.
(a) In predicting tomorrow's weather before looking at the barometer, the (prior) chance of rain is 40%. After looking at the barometer and seeing it predict "rain," what is the (posterior) chance of rain?
(b) What is the posterior chance of rain if the barometer predicts "shine"?
(c) What is the posterior chance of rain if an improved barometer (error rates of 10 and 20% respectively) predicts "rain"?

3-5 INDEPENDENCE

In Problem 3-20 we noticed that Pr (F/E) = Pr (F). This means that the chance of F, knowing E, is exactly the same as the chance of F without knowing E; knowledge of E does not change the probability of F at all. It seems reasonable, therefore, to call F statistically independent of E. In fact, this is the basis for the general definition:

Definition.
    An event F is called statistically independent of an event E if
    Pr (F/E) = Pr (F)    (3-24)

Of course, in the case of events G and E, where Pr (G/E) ≠ Pr (G), we would say that G is statistically dependent on E. In this case, knowledge of E changes the probability of G.

We now can develop the consequences of F being independent of E. Substituting (3-24) in (3-22), we obtain

    Pr (F) = Pr (F ∩ E) / Pr (E)

hence

    Pr (F ∩ E) = Pr (F) Pr (E)    (3-25)

We reverse this argument, and work backwards from (3-25) as follows:

    Pr (E/F) = Pr (F ∩ E) / Pr (F) = Pr (E)    (3-26)

That is, E is independent
of F whenever F is independent of E. In other words, the result in Problem 3-20(c) above was no accident. In view of this symmetry, we may henceforth simply state that E and F are statistically independent of each other, whenever any of the three logically equivalent statements (3-24), (3-25), or (3-26) is true. Usually, statement (3-25) is the preferred form, in view of its symmetry. Sometimes, in fact, this "multiplication formula" is taken as the definition of statistical independence. But this is just a matter of taste.

Notice that so far we have insisted on the phrase "statistical independence," in order to distinguish it from other forms of independence: philosophical, logical, or whatever. For example, we might be tempted to say that in our dice problem, F was "somehow" dependent on E because the total of the two tosses depends on the first die. This vague notion of dependence is of no use to the statistician, and will be considered no further. But let it serve as a warning that statistical independence is a very precise concept, defined by (3-24), (3-25), or (3-26) above.

Now that we clearly understand statistical independence, and agree that this is the only kind of independence we shall consider, we shall run no risk of confusion if we are lazy and drop the word "statistical." Our results so far are summarized as follows:

    General Theorem                                  Special Case

    Pr (E ∪ F) = Pr (E) + Pr (F) − Pr (E ∩ F)        Pr (E ∪ F) = Pr (E) + Pr (F),
                                                     if E and F are mutually exclusive;
                                                     i.e., if Pr (E ∩ F) = 0

    Pr (E ∩ F) = Pr (F) · Pr (E/F)                   Pr (E ∩ F) = Pr (F) · Pr (E),
                                                     if E and F are independent;
                                                     i.e., if Pr (E/F) = Pr (E)
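The multiplication criterion (3-25) makes independence easy to test by enumeration. A minimal Python sketch (ours, not the text's), using the dice events of Problem 3-20:

```python
from itertools import product
from fractions import Fraction

dice = list(product(range(1, 7), repeat=2))   # 36 equiprobable outcomes

def pr(pred):
    return Fraction(sum(map(pred, dice)), len(dice))

E = lambda d: d[0] == 5        # first die is 5
F = lambda d: sum(d) == 7      # total is 7
G = lambda d: sum(d) == 10     # total is 10

# (3-25) holds for E and F: 1/36 = (1/6)(1/6), so they are independent.
assert pr(lambda d: E(d) and F(d)) == pr(E) * pr(F)

# It fails for E and G: 1/36 differs from (1/6)(3/36), so they are dependent.
assert pr(lambda d: E(d) and G(d)) != pr(E) * pr(G)
```

The symmetry of (3-25) shows in the code: no conditioning is needed, only a product of unconditional probabilities.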

PROBLEMS
3-25 Three coins are fairly tossed. Let
    E₁: first two coins are heads;
    E₂: last coin is a head;
    E₃: all three coins are heads.
Try to answer the following questions intuitively (does knowledge of the condition affect your betting odds?). Then verify by drawing the relevant sample space and calculating the probabilities for (3-24).
(a) Are E₁ and E₂ independent?
(b) Are E₁ and E₃ independent?

3-26 Repeat Problem 3-25, using the three unfair coins whose sample space follows (compare Problem 3-7).

    e        Pr (e)
    (HHH)    .15
    (HHT)    .10
    (HTH)    .10
    (HTT)    .15
    (THH)    .15
    (THT)    .10
    (TTH)    .10
    (TTT)    .15

3-27 A certain electronic mechanism has 2 bulbs, which have been observed to be on or off with the following long-run relative frequencies:

                      Bulb 2
                   On      Off
    Bulb 1   On   .15     .45
             Off  .10     .30

This table means, for example, that both bulbs were simultaneously off 30 percent of the time.
(a) Is "bulb 1 on" independent of "bulb 2 on"?
(b) Is "bulb 1 off" independent of "bulb 2 on"?

3-28 A single card is drawn from a deck of cards, and let
    E: it is an ace
    F: it is a heart.
Are E and F independent, when we use:
(a) An ordinary 52-card deck?
(b) An ordinary deck, with all the spades deleted?
(c) An ordinary deck, with all the spades from 2 to 9 deleted?

3-6 OTHER VIEWS OF PROBABILITY

(a) Symmetric Probability

In Section 3-1 we defined probability as the limit of relative frequency. There are several other approaches, including symmetric probability, axiomatic probability, and subjective probability.

The physical symmetry of a fair die assures us that all six points are equally probable. Thus each point must have probability 1/6, in order that these six probabilities sum to one (compare to (3-5)):

    Pr (e₁) = Pr (e₂) = ··· = Pr (e₆) = 1/6

In general, for an experiment having equally probable outcomes, the probability of an event E consisting of N_E points is given by (3-9), where the summation extends only over the points eⱼ in E (N_E in number). Thus, for equally likely outcomes,

    Pr (E) = N_E / N    (3-27)

For example, for a fair die consider the event E: the number of dots is an even number. E consists of three of the six equiprobable elementary outcomes (2, 4, or 6 dots); thus its probability is 3/6.

Symmetric probability theory begins its development with (3-27) as the definition of probability, and gives a simpler theory than our earlier relative frequency approach. However, our earlier analysis was more general; although the examples we cited often involved equiprobable outcomes, the theory we developed was in no way limited to such cases. In reviewing it, you should confirm that it may be applied whether or not outcomes are equiprobable; special attention should be given to those cases (e.g., Problem 3-26) where the outcomes were not equiprobable.

Not only is symmetric probability limited because it lacks generality; it also has a major philosophical weakness. Note how the definition of probability in (3-27) involves the phrase "equally probable"; we are guilty of circular reasoning.

Our own relative frequency approach to probability suffers from the same philosophical weakness. We might ask what sort of limit is meant in equation (3-1)? It is logically possible that the relative frequency n₁/n behaves badly, even in the limit; for example, no matter how often we toss a die, it is just conceivable that the ace will keep turning up every time, making lim n₁/n = 1. Therefore, we should qualify equation (3-1) by stating that the limit occurs with high probability, not logical certainty. In using the concept of probability in the definition of probability, we are again guilty of circular reasoning.
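The symmetric formula (3-27) amounts to counting favorable points. A minimal Python sketch (ours, not the text's), for the fair-die example just given:

```python
from fractions import Fraction

# Six equally likely outcomes: the faces of a fair die.
outcomes = [1, 2, 3, 4, 5, 6]

def pr(event):
    # (3-27): Pr(E) = N_E / N for equally probable outcomes.
    return Fraction(len(event), len(outcomes))

# E: the number of dots is even, i.e., 2, 4, or 6.
even = [e for e in outcomes if e % 2 == 0]
assert pr(even) == Fraction(3, 6)
```

Note that this shortcut is valid only under the equal-likelihood assumption; with the unfair coins of Problem 3-26, one must instead sum the individual outcome probabilities as in (3-9).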

(b) Axiomatic (Objective) Probability

The only philosophically sound approach, in fact, is an abstract axiomatic approach. In a simplified version, the following properties are taken as axioms:

Axioms.
    Pr (eᵢ) ≥ 0    (3-2) repeated
    Pr (e₁) + Pr (e₂) + ··· + Pr (e_N) = 1    (3-4) repeated
    Pr (E) = Σ Pr (eᵢ)    (3-9) repeated

Then the other properties, such as (3-1), (3-3), and (3-20), are theorems derived from these axioms, with axioms and theorems together comprising a system of analysis that appropriately describes probability situations such as die tossing, etc.

Equation (3-1) is particularly important, and is known as the law of large numbers. Equations (3-3) and (3-20) may be proved very easily, in fact so easily that we shall give the proof to illustrate how nicely this axiomatic theory can be developed. We can prove even stronger results: for any event E,

Theorems.
    0 ≤ Pr (E)    (3-28), like (3-2)
    Pr (E) ≤ 1    (3-29), like (3-3)
    Pr (E) = 1 − Pr (Ē)    (3-30), repeating (3-20)

Proof. According to axioms (3-9) and (3-2), Pr (E) is the sum of positive terms, and is therefore positive; thus (3-28) is proved.

To prove (3-30), we write out axiom (3-4):

    [Pr (e₁) + Pr (e₂) + ···] + [··· + Pr (e_N)] = 1
      terms for E               terms for Ē

According to (3-9), this is just

    Pr (E) + Pr (Ē) = 1    (3-31)

from which (3-30) follows.

In (3-28) we proved that every probability is positive or zero; in particular, Pr (Ē) is positive or zero. Substituting this into (3-31) ensures that

    Pr (E) ≤ 1    (3-29)

Thus our above theorems are established; other theorems may be similarly derived.

(c) Subjective Probability

Sometimes called personal probability, this is an attempt to deal with events that cannot be repeated, even conceptually, and hence cannot be given any frequency interpretation. For example, consider events such as an increase in the stock market average tomorrow, or the overthrow of a certain government within the next month. These events are described by the layman as "likely" or "unlikely," even though there is no hope of estimating this by observing their relative frequency. Nevertheless, their likelihood vitally influences policy decisions, and as a consequence must be estimated in some rough-and-ready way. It is only then that practical decisions can be made on the risks worth taking.

To answer this need, an axiomatic theory of personal probability has been developed. Roughly speaking, personal probability is defined as the odds one would give in betting on an event; we shall find this a useful concept later (Chapter 15).

Review Problems

3-29 A tetrahedral (four-sided) die has been loaded. Find Pr (e₄), if possible, given the following conditions. (If the problem is impossible, state so.)
(a) Pr (e₁) = .2; Pr (e₂) = .4; Pr (e₃) = .1
(b) Pr (e₁) = .4; Pr (e₂) = .4; Pr (e₃) = .3
(c) Pr (e₁) = .4; Pr (e₂) = .6
(d) Pr (e₁) = .7; Pr (e₃) = .2

3-30 In a family of 3 children, what is the chance of
(a) At least one boy?
(b) At least 2 boys?
(c) At least 2 boys, given at least one boy?
(d) At least 2 boys, given that the eldest is a boy?

3-31 Suppose that the last 3 customers out of a restaurant all lose their hat-checks, so that the girl has to hand back their 3 hats in random order. What is the probability
(a) That no man will get the right hat?
(b) That exactly 1 man will?
(c) That exactly 2 men will?
(d) That all 3 men will?

3-32 What is the probability that
(a) 3 people picked at random have different birthdays?
(b) A roomful of 30 people all have different birthdays?
(c) In a roomful of 30 people there is at least one pair with the same birthday?

3-33 A bag contains a thousand coins, one of which has heads on both sides (so that it turns up heads without fail). A coin is drawn at random and flipped. What is the probability that it is the loaded coin, if it turns up heads
(a) 3 times in a row?
(b) 10 times in a row?
(c) 20 times in a row?

3-34 Repeat Problem 3-33 if the loaded coin has both H and T faces, but is biased so that its probability of H is 3/4.

chapter 4

Random Variables and Their Distributions

4-1 DISCRETE RANDOM VARIABLES

Again consider the experiment of fairly tossing 3 coins. Suppose that our only interest is the total number of heads. This is an example of a random variable or variate, and is customarily denoted by a capital letter thus:

    X = the total number of heads    (4-1)

The possible values of X are 0, 1, 2, 3; however, they are not equally likely. To find what the probabilities are, it is necessary to examine the original sample space in Figure 4-1. Thus, for example, the event "two heads" (X = 2) consists of 3 of the 8 equiprobable outcomes; hence its probability is 3/8. Similarly, the probability of each of the other events is computed. Thus in Figure 4-1 we obtain the probability function of X.

The mathematical definition of a random variable is "a numerical-valued function defined over a sample space."¹ But for our purposes we can be less abstract; it is sufficient to observe that:
    A discrete random variable takes on various values with probabilities specified in its probability function.    (4-2)

In our specific example, the random variable X (number of heads) takes on the values 0, 1, 2, 3, with probabilities specified by the probability function in Figure 4-1b.

¹ Although the intuitive definition (4-2) will serve our purposes well enough, it is not always as satisfactory as the more rigorous mathematical definition, which stresses the random variable's relation to the original sample space. Thus, for example, in tossing 3 coins, the random variable Y = total number of tails is seen to be a different random variable from X = total number of heads. Yet X and Y have the same probability distribution, (cont'd)

[Figure 4-1 sketch: the old sample space of 8 equiprobable outcomes (T,T,T), (T,T,H), (T,H,T), (T,H,H), (H,T,T), (H,T,H), (H,H,T), (H,H,H) is mapped onto the new, smaller sample space of values x = 0, 1, 2, 3, and the probability p(x) of each value is graphed.]

FIG. 4-1 (a) X, the random variable "number of heads in three tosses." (b) Graph of the

probability function.

In the general case of defining a probability function, as in Figure 4-2, we begin by considering in the original sample space events such as (X = 0), (X = 1), ..., in general (X = x); (note that capital X represents the random variable, and small x a specific value it may take). For these events we calculate the probabilities and denote them² p(0), p(1), ..., p(x), .... This probability function p(x) may be presented equally well in any of 3 ways:

1. Table form, as in Figure 4-1a.
2. Graph form, as in Figure 4-1b.
3. Formula form.

(cont'd) and anyone who used the loose definition (4-2) might be deceived into thinking that they were the same random variable. There is more to a random variable than its probability function.

² This notation, like any other, may be regarded simply as an abbreviation for convenience. Thus, for example, the number p(3) is short for Pr (X = 3), which in turn is short for "the probability that the number of heads is three." Note that when X = 3 is abbreviated to 3, Pr is correspondingly abbreviated to p.
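The tabulation of Figure 4-1a, mapping each outcome to a value of X and collecting probabilities, can be sketched programmatically. A minimal Python example (ours, not the text's):

```python
from itertools import product
from fractions import Fraction
from collections import Counter

# Original sample space: 8 equiprobable outcomes of 3 fair tosses.
space = [''.join(p) for p in product('HT', repeat=3)]

# The random variable X maps each outcome to its number of heads;
# p(x) collects the probability of each value x.
counts = Counter(o.count('H') for o in space)
p = {x: Fraction(n, len(space)) for x, n in sorted(counts.items())}

assert p == {0: Fraction(1, 8), 1: Fraction(3, 8),
             2: Fraction(3, 8), 3: Fraction(1, 8)}
```

The dictionary p is exactly the table form of the probability function; graphing it would give Figure 4-1b.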

FIG. 4-2 A general random variable X as a mapping of the original outcome set e₁, e₂, ..., with probabilities Pr (e₁), Pr (e₂), ..., onto a condensed set of numbers. (The set of numbers illustrated is 0, 1, 2, ..., the set of positive integers. We really ought to be more general, however, allowing both negative values and fractional (or even irrational) values as well. Thus our notation, strictly speaking, should be x₁, x₂, ..., xᵢ, ..., rather than 0, 1, 2, ..., x, ....)

Thus the old, complicated sample space (outcome set) is reduced to a much smaller, numerical sample space. The original sample space is introduced to enable us to calculate the probability function p(x) for the new space; having served its purpose, it is then forgotten. The interesting questions can be very easily answered in the new sample space. For example, referring to Figure 4-3, what is the probability of 1 head or fewer? We simply add up the relevant probabilities in the new sample space:

    Pr (X ≤ 1) = p(0) + p(1) = 1/8 + 3/8 = 4/8

FIG. 4-3 The event X ≤ 1 in both sample spaces, illustrating the easier calculation in the new sample space.
DISCRETE

The

RANDOM

VARiABLEs

answe\177

found,

been

have

could

spa,

sample

more

with

but

trouble,

in the original

EXAMPL

In the
in t

changes

is 1,

;ame experiment of

tosses
of a coin, let Y = number
of
for the sequence HTT, the value of Y
there is one changeover from
H to T. In Figure 4-4 we use the

becaus,

\337(H, H,

\337(H,

fair

example,

For

\177esequence.

T)\177-.

\177

T, H)

(T, r. H).

FIG.4-4

The

random variable

Y (\"number of
its probability

and

technique p(y)
de,'eloped above to
function

define

changes in sequence
distribution.

random

this

of 3

tosses of a coin\

variable and its probability

!-

PROBLEMs

In each
by

4-1 In

.\177se,

constr

first

4 fair

(a) 2' =
(b)

4-2 Let

4-3

Two

paper

Y=

2' be

tabulate

ucting

the

a sample

probability
function of the random variable,
space of the experimental outcomes.

ossesof a coin,let
heads.

lumber

of

lumber

of changes

in

he total

number of

dots showing

box(

are

between

t]

s each contain
drawn,
\177enumbers

6 slips

one from
drawn

sequence.
when

of paper

each of
(absolute

the

two

fair dice

are tossed.

numbered 1 to 6. Two slips of


Let 2' be the difference

boxes.

value).

56    RANDOM VARIABLES

⇒ 4-4 To review Chapter 2, consider the experiment of tossing 3 coins; the number of heads X may be 0, 1, 2, or 3. Repeat this experiment 50 times⁵ to obtain 50 values of X, so that you can:
(a) Construct a relative frequency table of X.
(b) Graph this relative frequency table.
(c) Calculate the sample mean X̄ from (2-1b).
(d) Calculate the mean squared deviation from (2-5b).
(e) If the experiment were repeated millions of times, to what value would:
1. The relative frequencies tend?
2. X̄ tend?
3. MSD (mean squared deviation) tend?
4. s² tend?

4-2 MEAN AND VARIANCE

In Chapter 3 we defined probability as limiting relative frequency, and we notice the close relation between the relative frequency table observed in Problem 4-4 and the probability table calculated in Figure 4-1, for tossing 3 coins. If the sample size were increased without limit (i.e., if we continued to toss ad infinitum), the relative frequency table would settle down to the probability table.

From the relative frequency table (Problem 4-4), we calculated the mean X̄ and variance s² of our sample.⁶ It is natural to calculate analogous population values from the probability table, and call them the mean μ and variance σ² of the probability distribution p(x), or of the random variable X itself. Thus

    Population mean,     μ ≡ Σ x p(x)              (4-3) cf. (2-1b)

    Population variance, σ² ≡ Σ (x − μ)² p(x)      (4-4) cf. (2-5b)
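Definitions (4-3) and (4-4) are weighted sums and translate directly into a few lines of code. An illustrative sketch in Python, using the three-coin p(x):

```python
from fractions import Fraction as F

# p(x) for X = number of heads in 3 tosses of a fair coin (Figure 4-1).
p = {0: F(1, 8), 1: F(3, 8), 2: F(3, 8), 3: F(1, 8)}

mu  = sum(x * p[x] for x in p)               # (4-3): mu = sum of x p(x)
var = sum((x - mu) ** 2 * p[x] for x in p)   # (4-4): sigma^2 = sum of (x-mu)^2 p(x)

print(mu, var)   # 3/2 3/4
```

These are exactly the values μ = 1.50 and σ² = .75 worked out by hand in Table 4-1.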

⁵ Or simulate this by consulting the random numbers in Appendix Table IIa (with an even number representing a head, and an odd number a tail); or else use the authors' results, as follows:

    03220 11232 11221 22213 13332 12212
    12121 11233 21112 11213

⁶ Strictly speaking, we calculated the mean squared deviation, rather than s². However, as n → ∞, they become practically indistinguishable.

MEAN AND VARIANCE

For our example of tossing three coins, we compute μ and σ² in Table 4-1. Note the analogy here to our calculations in Table 2-4. We call μ the "population mean," since it is based on the whole population of all possible tosses of the coins. On the other hand, X̄ is called a sample mean, since it is based on a mere sample of tosses drawn from the parent population of all possible tosses. Similarly, σ² and s² represent population and sample variance, respectively. A clear distinction between population and sample values is crucial; we return to this point in Chapters 6 and 7.

TABLE 4-1 Mean and Variance of a Random Variable

    Given probability   Calculation of μ   Calculation of σ²                       Easier calculation of
    function            from (4-3)         from (4-4)                              σ², using (4-5)
    x      p(x)         x p(x)             (x − μ)   (x − μ)²   (x − μ)² p(x)      x² p(x)
    0      1/8          0                  −3/2      9/4        9/32               0
    1      3/8          3/8                −1/2      1/4        3/32               3/8
    2      3/8          6/8                +1/2      1/4        3/32               12/8
    3      1/8          3/8                +3/2      9/4        9/32               9/8
                        μ = 12/8                     σ² = 24/32                    Σ x² p(x) = 24/8
                          = 1.50                        = .75                      σ² = 24/8 − (1.50)²
                                                                                      = 24/8 − 18/8 = 6/8 = .75

Since the definitions of μ and σ² parallel those of X̄ and s², we find parallel interpretations. We continue to think of the mean μ as a weighted average, using probability weights rather than relative frequency weights; the mean is also a fulcrum and center. The standard deviation is a measure of spread.

The computation of σ² is often simplified by using:

    σ² = Σ x² p(x) − μ²        (4-5)

This formula, whose computation is illustrated in the last column of Table 4-1, is analogous to (2-7).

* Proof that (4-5) is equivalent to (4-4). Reexpress (4-4) as:

    σ² = Σ (x² − 2μx + μ²) p(x)
       = Σ x² p(x) − 2μ Σ x p(x) + μ² Σ p(x)

Since μ is a constant, and noting that Σ x p(x) = μ and Σ p(x) = 1, we have

    σ² = Σ x² p(x) − 2μ(μ) + μ²
       = Σ x² p(x) − μ²        (4-5) proved
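The algebraic equivalence of (4-4) and (4-5) is easy to confirm numerically. A short illustrative check in Python, again for the three-coin distribution:

```python
from fractions import Fraction as F

p = {0: F(1, 8), 1: F(3, 8), 2: F(3, 8), 3: F(1, 8)}
mu = sum(x * p[x] for x in p)

var_def      = sum((x - mu) ** 2 * p[x] for x in p)     # definition (4-4)
var_shortcut = sum(x ** 2 * p[x] for x in p) - mu ** 2  # shortcut   (4-5)

assert var_def == var_shortcut == F(3, 4)   # both give sigma^2 = 3/4 = .75
```

The shortcut avoids forming each deviation (x − μ), which is why it is usually the more convenient route by hand.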

58    RANDOM VARIABLES

When a random variable is linearly transformed, the new mean and variance behave in exactly the same way as when sample observations were transformed in Section 2-5 (the proof is quite analogous and is left as an exercise). For future reference, we state these results in Table 4-2.

TABLE 4-2 Linear Transformation (Y) of a Random Variable (X)

    Random Variable    Mean        Variance     Standard Deviation
    X                  μX          σX²          σX
    Y = a + bX         a + bμX     b²σX²        |b| σX

We could write out all the information in this table verbally, working across the rows, as follows: "Consider the random variable X, with mean μX and variance σX². If we define a new random variable Y as a linear function of X (specifically Y = a + bX), then the mean of Y will be a + bμX, and its variance will be b²σX²."
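The rows of Table 4-2 can be verified directly by transforming the whole distribution. In the Python sketch below, the constants a = 2 and b = 3 are hypothetical values of our own choosing, used purely for illustration:

```python
from fractions import Fraction as F

p = {0: F(1, 8), 1: F(3, 8), 2: F(3, 8), 3: F(1, 8)}   # three-coin p(x)
a, b = 2, 3                                            # illustrative constants

def mean(dist):
    return sum(x * q for x, q in dist.items())

def variance(dist):
    m = mean(dist)
    return sum((x - m) ** 2 * q for x, q in dist.items())

# Build the distribution of Y = a + bX directly, then compare with Table 4-2.
p_y = {a + b * x: q for x, q in p.items()}

assert mean(p_y) == a + b * mean(p)              # mu_Y    = a + b mu_X
assert variance(p_y) == b ** 2 * variance(p)     # sigma_Y^2 = b^2 sigma_X^2
```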

PROBLEMS

4-5 Compute μ and σ² for the probability distributions in Problem 4-1. As a check, compute σ² in 2 ways: from the definition (4-4), and from the easy formula (4-5).

(4-6) Compute μ and σ² for the random variables of
(a) Problem 4-2.
(b) Problem 4-3.

4-7 Letting X be the number of dots rolled on a fair die, first find μX and σX. Then if Y = 2X + 4, calculate μY and σY in 2 ways:
(a) By tabulating the probability function of Y, then using (4-3) and (4-5).
(b) By using Table 4-2.

(4-8) A bowl contains tags numbered from 1 to 10. There are ten 10's, nine 9's, etc. Let X denote the number on a tag drawn at random.
(a) Make a table of its probability function.
(b) Find μX and σX.

4-9 A student is given 4 questions, each with a choice of 3 answers. Let X be the number of correct answers when the student has to guess each answer. Compute the probability function and the mean and variance of X.

BINOMIAL DISTRIBUTION    59

⇒ 4-10 Let Z = (X − μ)/σ, where X is a random variable with mean μ and standard deviation σ. What are the mean and standard deviation of Z? (This introduces Section 4-5.)

4-11 Suppose that the population of American families yields the following table for family size:⁷

    No. of children in family    0     1     2     3     4     5     6
    Proportion of families       .43   .18   .17   .11   .06   .03   .02

Source: Statistical Abstract of U.S., 1963, p. 41.

(a) Let X be the number of children in a family selected at random. (This selection may be done by lots: imagine each family represented on a slip of paper; the slips are well mixed, and then one slip is drawn at random.) The probability function of X is given in the table, of course. Find μX and σX.
(b) Now let a child be selected at random (rather than a family), and let Y be the number of children in his family. (This selection may be done by a teacher, for example, who picks a child by lot from the registered population of children.) What are the possible values of Y? Complete the probability table, and compute μY and σY.
(c) Is μX or μY more properly called the "average family size"?

⁷ Slightly altered, for simplicity, by truncating the number of children at 6.

4-3 THE BINOMIAL DISTRIBUTION

The number of heads in n tosses of a coin is an example of a classical discrete random variable. There are many types of discrete random variables; we shall study the binomial variable as an example of how a general formula can be developed for the probability function p(x).

In order to generalize, we shall speak of n independent "trials," each resulting in either "success" or "failure," with respective probabilities π and (1 − π). Then X, the total number of successes, is defined as a binomial random variable. There are many practical random variables of this type, some of which are listed in Table 4-3. We shall now derive a simple formula for the probability function

60    RANDOM VARIABLES

TABLE 4-3 Examples of Binomial Variables

    "Trial"                        "Success"   "Failure"       π                      X = number of successes
    Tossing a fair coin            Head        Tail            1/2                    Number of heads in n tosses
    Birth of a child               Boy         Girl            Practically 1/2        Number of boys in the family
    Throwing 2 dice                7 dots      Anything else   6/36                   Number of sevens in n throws
    Drawing a voter in a poll      Democrat    Republican      Proportion of          Number of Democrats in the
                                                               Democrats in the       sample (n = sample size)
                                                               population
    History of one radioactive     Decay       No change       Very small             Number of radioactive decays
    atom, which may decay                                                             in the sample (n = the number
    during a certain time period                                                      of atoms, very large)

p(x). First, consider the special case in which we compute the probability of getting 3 heads in tossing 5 coins (Figure 4-5a). Each point in our outcome set is represented as a sequence of five of the letters S (success) and F (failure). We concentrate on the event three heads (X = 3), and show all outcomes that comprise this event. In each of these outcomes S appears

[Figure 4-5: (a) the outcome set (SSSSS), (SSSSF), (SSSFS), ..., (FFSSS), ..., (FFFFF) for 5 trials, with the event X = 3 marked; (b) the general case, with outcomes such as (SSS···S FF···F), S appearing x times and F appearing n − x times, in n trials.]

FIG. 4-5 Computing binomial probability. (a) Special case: 3 heads in 5 tosses of a coin. (b) General case: x successes in n trials.

BINOMIAL DISTRIBUTION    61

three times and F twice. Since the probability of S is π and the probability of F is (1 − π), the probability of the sequence SSSFF is

    π · π · π · (1 − π)(1 − π) = π³(1 − π)²

this multiplication being justified by the independence of the trials. We further note that any other outcome in this event has the same probability; for example, the probability of SFSSF is π(1 − π)ππ(1 − π) = π³(1 − π)². In general, the probability of the sequence

    SS···S FF···F        (x times, then n − x times)

is π^x (1 − π)^(n−x), and every other outcome in this event, with its x S's and n − x F's ordered differently, has this same probability. We only have to determine how many such sequences (outcomes) are included in this event. This is precisely the number of ways in which three S's and two F's can be arranged; this number of ways is designated C(5, 3), and is⁹

    C(5, 3) = 5!/(3!(5 − 3)!) = 5!/(3! 2!) = 10

or, in general,

    C(n, x) = n!/(x!(n − x)!)

Our event thus includes 10 outcomes, each with probability π³(1 − π)². Hence its probability is:

    p(3) = [5!/(3! 2!)] π³(1 − π)²

and so on, in general:

    p(x) = C(n, x) π^x (1 − π)^(n−x)        (4-7)

We summarize this computation in Figure 4-6.

62    RANDOM VARIABLES

[Figure 4-6: the outcome set for n trials, with outcomes such as (FF···FFF), (FF···FSF), (SF···FSS), ..., (SSSS···SS); each outcome e has probability Pr(e) of the form π^x(1 − π)^(n−x).]

FIG. 4-6 Computing binomial probability of x successes in n trials.

As a final example, we return to our previous experiment of tossing three fair coins. What is the probability of two heads? Each toss is an independent trial in which π, the probability of success, is 1/2. Noting that n = 3 and x = 2, we have

    Pr(X = 2) = [3!/(2! 1!)] (1/2)²(1/2)¹ = 3/8        (4-8)

a confirmation of our previous result.

⁹ This formula can be developed as follows. Suppose we wish to fill five spots with the five ordered objects S₁, S₂, S₃, F₁, F₂. We have a choice of 5 objects to fill the first spot, 4 the second, and so on; thus the number of options we have is

    5 · 4 · 3 · 2 · 1 = 5!        (4-6)

But this is not the number of distinct arrangements in the problem at hand; in our case we cannot distinguish between S₁, S₂, and S₃, all of which appear as S. Thus many of our separate and distinct arrangements in (4-6) (e.g., S₁S₂S₃F₁F₂ and S₂S₁S₃F₁F₂) cannot be distinguished, and appear as the single arrangement SSSFF. Thus (4-6) involves serious double counting. How much? We double counted 3 · 2 · 1 = 3! times, because we assumed in (4-6) that we could distinguish between S₁, S₂, and S₃ when in fact we cannot. (3! is simply the number of distinct ways of arranging S₁, S₂, and S₃.) Similarly, we double counted 2 · 1 = 2! times, because we assumed in (4-6) that we could distinguish between F₁ and F₂, when in fact we cannot. When (4-6) is deflated for double counting in both these ways, we have

    5!/(3! 2!) = 5!/(3!(5 − 3)!)
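Formula (4-7) is easily put to work. The illustrative Python sketch below defines the binomial probability function and reproduces the check (4-8); exact fractions keep the arithmetic honest:

```python
from math import comb
from fractions import Fraction as F

def binom_pmf(x, n, pi):
    """(4-7): p(x) = C(n, x) * pi^x * (1 - pi)^(n - x)."""
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)

# Three tosses of a fair coin: Pr(X = 2) should be 3/8, as in (4-8).
print(binom_pmf(2, 3, F(1, 2)))   # 3/8

# Sanity check: the complete distribution sums to one.
assert sum(binom_pmf(x, 3, F(1, 2)) for x in range(4)) == 1
```

`math.comb` plays the role of C(n, x) = n!/(x!(n − x)!) from the derivation above.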

CONTINUOUS DISTRIBUTIONS    63

PROBLEMS⁸

4-12 (a) Construct a diagram similar to Figure 4-6 to obtain the probability function for the number of heads X when 4 coins are tossed; calculate μ and σ².
(b) Values of p(x), as well as the complete binomial distribution, are tabulated in Table III of the Appendix for π = 1/2; use the table to check (a).
(c) Graph the probability function of (b), showing μ and σ.

4-13 (a) Let X = the total number of red balls drawn blind from a bowl containing 2 red, 1 blue, and a number of black balls. The ball is replaced, a second ball is drawn, and so on, until 3 balls have been drawn (sampling with replacement). Tabulate its probability function.
(b) Repeat (a), for Y = the number of blue balls drawn.

4-14 Repeat Problem 4-12(a) in terms of a general π; then set π = 1/2 to check that you obtain the results for a fair coin.

(4-15) Check Problem 4-9, using the formulas of this section.

4-16 In rolling 3 dice, let X be the number of aces that occur. Tabulate the probability function of X. Find μ and σ². Graph.

4-17 Suppose the probability of hitting a target with one toss of a dart is π. In 6 tosses, what is the probability that you will hit the target exactly 2 times? At most 2 times? At least 3 times? (Note that the total probability is 1.) Can you guess the mean of a general binomial variable? Can you guess the variance? (This leads into Section 6-6.)

*4-18 (For calculus students only.) Graph the function e^(−z²/2), showing its
(a) Symmetry.
(b) Asymptotes.
(c) Maximum.
(d) Points of inflection.
(This leads into Section 4-5.)

⁸ Starred problems are optional, since they are more theoretical and/or difficult than the rest.

4-4 CONTINUOUS DISTRIBUTIONS

In Chapter 2 we saw how a continuous quantity such as height was best graphed with a relative frequency histogram. The histogram of heights in Figure 2-3 is reproduced in Figure 4-7a.
64    RANDOM VARIABLES

(For purposes of illustration, we measure height in feet rather than inches. Furthermore, the vertical axis has been shrunk to the same scale as the x-axis.) Note that in Figure 4-7a relative frequency is given by the height of each bar; but since its width (or base) is 1/4, its area (height times width) is numerically only 1/4 as large. Thus we can't use area in this figure to represent relative frequency, since it would badly understate. In fact, if we wish area to represent relative frequency, each height must be increased fourfold. This is done in Figure 4-7b, where the height of each bar is called relative frequency density. In general,

    area of any bar = (relative frequency density)(cell width) = relative frequency

[Figure 4-7: two histograms of men's heights, with vertical scales running from 0.25 to 1.00; in panel (b) a unit square of area = 1 is marked.]

FIG. 4-7 Relative frequency and relative frequency density. (a) Relative frequency histogram. (b) (a) transformed so that the area of each bar is relative frequency, making total area = 1.

There is but one more important observation. In Figure 4-7a, the relative frequencies must sum to one (the sum of the heights is one).

[Figure 4-8: four panels showing the relative frequency density of men's heights; in each, area = relative frequency of men in an interval (e.g., 5¼ to 5½ ft) ≈ probability of drawing a man in that interval.]

FIG. 4-8 How relative frequency density may be approximated by a probability density function as sample size increases, and cell size decreases. (a) Small n, as in Fig. 4-7b. (b) Large enough n to stabilize relative frequencies. (c) Even larger n, to permit finer cells while keeping relative frequencies stable. (d) For very large n, this becomes (approximately) a smooth probability density curve.

65

66    RANDOM VARIABLES

From the numerical equivalence of height in Figure 4-7a and area in Figure 4-7b, it follows that the areas in Figure 4-7b must also sum to one. And this is a key characteristic of a density function in statistics: it encloses an area numerically equal to 1.

In Figure 4-8 we show what happens to the relative frequency density of a continuous random variable as:

1. Sample size increases.
2. Cell size decreases.

With a small sample, chance fluctuations influence the picture. But as sample size increases, chance is averaged out, and relative frequencies settle down to probabilities. At the same time, the increase in sample size allows a finer definition of cells. While the area remains fixed at 1, the relative frequency density becomes approximately a curve, the so-called probability density function, which we shall refer to simply as the probability function, designated p(x).

If we wish to compute the mean and variance from Figure 4-8c, the discrete formulas (4-3) and (4-4) can be applied. But if we are working with the probability density function in Figure 4-8d, then integration (which calculus students will recognize as the limiting case of summation) must be used; if a and b are the limits of X, then (4-3) and (4-4) become

    Mean,     μ = ∫ x p(x) dx            (4-9)

    Variance, σ² = ∫ (x − μ)² p(x) dx    (4-10)

with both integrals taken from a to b.

All the theorems that we state about discrete random variables are equally valid for continuous random variables, with summations replaced by integrals. Proofs are also very similar. Therefore, to avoid tedious duplication, we give theorems for discrete random variables only, leaving it to the reader to supply the continuous case himself, if he so desires.
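For the student without calculus, (4-9) and (4-10) can be approximated by summing over many narrow cells, exactly in the spirit of Figure 4-8c. The Python sketch below does this for the illustrative density p(x) = 2x on [0, 1]; this density is our own example, not from the text, chosen because its integrals are easy to verify by hand (μ = 2/3, σ² = 1/18):

```python
def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

p = lambda x: 2 * x                                      # density on [0, 1]
mu  = integrate(lambda x: x * p(x), 0, 1)                # (4-9)
var = integrate(lambda x: (x - mu) ** 2 * p(x), 0, 1)    # (4-10)

print(round(mu, 4), round(var, 4))   # 0.6667 0.0556
```

Note that `integrate(p, 0, 1)` returns 1, the key characteristic of any density: it encloses unit area.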

4-5 THE NORMAL DISTRIBUTION

For many random variables, the probability density function is a specific bell-shaped curve, called the normal curve, or Gaussian curve, shown in Figures 4-9 to 4-12. It is the single most useful probability function in statistics. Many variables in physical and economic phenomena are normally distributed; for example, errors made in measuring physical variables often are normally distributed. In addition, there are other useful probability functions (such as the binomial) which often can be approximated by the normal curve.

(a) Standard Normal Distribution

The probability function of the standard normal variable Z is

    p(z) = (1/√(2π)) e^(−z²/2)        (4-11)

where 1/√(2π) is a scale factor required to make the total area 1. The symbols π and e denote important mathematical constants,⁷ approximately 3.14 and 2.7 respectively.

[Figure 4-9: the standard normal curve p(z), plotted from about z = −3 to z = +3; in panel (b) a unit square shows the rescaled vertical axis.]

FIG. 4-9 (a) Standard normal curve. (b) Vertical axis rescaled.

We draw the normal curve in Figure 4-9. In Problem 4-18 you may have confirmed that the graph of (4-11) is as shown there: it reaches a maximum at z = 0, and as we move to the left or right of 0, z² increases; since its negative exponent is increasing in size,

⁷ The mathematical constant π = 3.14 is not to be confused with the π used in Section 4-3 to designate probability of success.

68    RANDOM VARIABLES

p(z) decreases. Moreover, as z takes on very large (positive or negative) values, i.e., as we move further away from zero, the negative exponent in (4-11) becomes very large and p(z) approaches zero. Finally, since z appears only in squared form, −z generates the same probability in (4-11) as +z; this confirms that the curve is symmetric, with the shape of the standard normal curve as we have drawn it in Figure 4-9. The mean and variance of Z can be calculated by integration using (4-9) and (4-10); since this requires calculus, we quote the results without proof:

    μZ = 0
    σZ = 1

[Figure 4-10: the standard normal curve, with the area between 0 and z₀ shaded.]

FIG. 4-10 Probability enclosed by the normal curve between 0 and z₀.

In fact, it is for this very reason, that its mean is 0 and its standard deviation is 1, that Z is called a standard normal variable. Later, when we speak of "standardizing" any variable, this is precisely what we mean: shifting it so that its mean is 0, and shrinking (or stretching) it so that its standard deviation (or variance) is one.

The probability (area) enclosed by the normal curve between the mean (0) and any specified value (say z₀) also requires calculus to evaluate precisely, but may be easily pictured in Figure 4-10. This evaluation of probability, done once and for all, has been recorded in Table IV of the Appendix. Students without calculus can think of this as accumulating the area of the approximating rectangles, as in Figure 4-8c.

To illustrate this table, consider the probability that Z falls between .6 and 1.3, as shown in Figure 4-11a. From Table IV in the Appendix we note that the probability that Z falls between 0 and .6 is .2257; similarly, the probability that Z falls between 0 and 1.3 is .4032. We require the difference of these two, namely:

    Pr(.6 ≤ Z ≤ 1.3) = .4032 − .2257 = .1775

In Figure 4-11b we consider the probability that Z falls between −1 and +2. Because of the symmetry of the normal curve, the probability that Z falls between −1 and 0 is identical to the probability that Z falls between 0 and +1, which is .3413. To this we add the probability that Z falls between 0 and 2, namely .4772, which yields:

    Pr(−1 ≤ Z ≤ +2) = .3413 + .4772 = .8185

Finally, the probability that Z falls within one standard deviation of the mean, Pr(−1 ≤ Z ≤ +1), is 2(.3413) = .6826, or just over 2/3; the student may confirm this from Table IV.

[Figure 4-11: the standard normal curve, with the areas for these two examples shaded.]

FIG. 4-11 Standard normal probabilities.
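Where the text accumulates areas from Appendix Table IV, a modern reader can reproduce the same numbers from the error function. An illustrative Python sketch (the name `phi` for the cumulative area is our own):

```python
from math import erf, sqrt

def phi(z):
    """Cumulative standard normal probability, Pr(Z <= z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# The two worked examples above:
print(round(phi(1.3) - phi(0.6), 4))    # 0.1775, as in Pr(.6 <= Z <= 1.3)
print(round(phi(2.0) - phi(-1.0), 4))   # 0.8186; the text's .8185 differs only
                                        # because the table entries are rounded
```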

THE NORMAL DISTRIBUTION    69

PROBLEMS

4-19 If Z is a standard normal variable, use Appendix Table IV to evaluate:
(a) Pr(0 ≤ Z ≤ 1.64).
(b) Pr(−2 ≤ Z ≤ 2).
(c) Pr(−2.33 ≤ Z).
(d) Pr(Z ≤ 2).
(e) Pr(−∞ < Z < ∞).

4-20 (a) If Pr(Z ≤ z₀) = .99, what is z₀?
(b) If Pr(−z₀ ≤ Z ≤ z₀) = .95, what is z₀?

(b) General Normal Distribution

If a random variable X has a normal probability curve, with mean μ and standard deviation σ, its probability function is written:

    p(x) = (1/(σ√(2π))) e^(−½((x−μ)/σ)²)        (4-12)

To prove that (4-12) is centered at μ, we note that the peak of the curve occurs when the negative exponent attains its smallest value 0, i.e., when x = μ. It may also be shown that (4-12) is scaled by the factor σ. Finally, it is bell shaped for the same reasons given in part (a).

70    RANDOM VARIABLES

We notice that in the very special case in which μ = 0 and σ = 1, (4-12) reduces to the standard normal distribution (4-11). But more important, regardless of what μ and σ may be, we can translate any normal variable X in (4-12) into the standard form (4-11) by defining:

    Z = (X − μ)/σ        (4-13)

[Figure 4-12: a general normal variate X, with density p(x) = (1/(σ√(2π))) e^(−½((x−μ)/σ)²), transformed into the standard normal variate Z, with density p(z) = (1/√(2π)) e^(−z²/2). Sometimes the scale is stretched; sometimes it is shrunk.]

FIG. 4-12 Linear transformation of any normal variable into the standard normal variable.

Z is recognized as just a linear transformation of X, as shown in Figure 4-12. Notice that whereas the mean and standard deviation of a general normal variate X can take on any values, the standard normal variate Z is unique, with mean 0 and standard deviation 1, as proved in Problem 4-10. To evaluate any normal variate X, we therefore translate X into Z, and then evaluate Z in the standard normal table (Appendix Table IV).

For example, suppose that X is normal, with μ = 100 and σ = 5. What is the probability of getting an X value of 110 or more? That is, we wish to evaluate

    Pr(X ≥ 110)        (4-14)

First, (4-14) can be written equivalently* as

    Pr((X − 100)/5 ≥ (110 − 100)/5)        (4-15)

* Any inequality is preserved if both sides are diminished by the same amount (100) and divided by the same positive amount (5).

THE NORMAL DISTRIBUTION    71

which, standardizing according to (4-13), is

    Pr(Z ≥ 2)        (4-16)

We see that (4-16) is the standardized form of (4-14), and from Table IV we evaluate this probability to be .0228. Moreover, the standardized form (4-16) allows a clearer interpretation of our original question; in fact, we were asking "What is the probability of getting a normal value at least two standard deviations above the mean?" The answer is: very small, about one in fifty.

As a final example, suppose that the length of a bolt picked at random from a production line is a normal random variable with mean 10 cm and standard deviation 0.2 cm. What is the probability that its length will be between 9.9 and 10.1 cm? That is

    Pr(9.9 ≤ X ≤ 10.1)

This may be written in the standardized form

    Pr((9.9 − 10)/.2 ≤ (X − 10)/.2 ≤ (10.1 − 10)/.2)
        = Pr(−.50 ≤ Z ≤ .50)
        = .38
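Both worked examples reduce to (4-13) followed by a table lookup, and that two-step recipe is easy to mechanize. An illustrative Python sketch (function names are ours):

```python
from math import erf, sqrt

def phi(z):
    """Cumulative standard normal probability, Pr(Z <= z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def pr_between(lo, hi, mu, sigma):
    """Standardize with (4-13), then use the standard normal curve."""
    return phi((hi - mu) / sigma) - phi((lo - mu) / sigma)

# Pr(X >= 110) when mu = 100, sigma = 5: two standard deviations above the mean.
print(round(1 - phi((110 - 100) / 5), 4))         # 0.0228

# Bolt lengths: Pr(9.9 <= X <= 10.1) with mu = 10, sigma = 0.2.
print(round(pr_between(9.9, 10.1, 10, 0.2), 2))   # 0.38
```

Notice that only the standardization step depends on μ and σ; the normal-curve evaluation itself is always the same, which is the point made next.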

These calculations confirm our earlier observation from Figure 4-12: although there is any number of normal curves, there is only one standard normal curve. This is fortunate; instead of requiring a whole book of tables, we only need one (Appendix Table IV).

PROBLEMS

4-21 Draw a diagram similar to Figure 4-12 for both of the examples solved in the text directly above. Shade the area being evaluated.

4-22 If X is normal, calculate:
(a) Pr(4.5 ≤ X ≤ 6.5), where μX = 5 and σX = 1.
(b) Pr(X ≥ 800), where μX = 400 and σX = 200.
(c) Pr(X ≤ 800), where μX = 400 and σX = 200.

4-23 Suppose that the population of men's heights is normally distributed, with a mean of 68 inches and a standard deviation of 3 inches. Find the proportion of the men who are:
(a) Under 5 feet 6 inches.
(b) Between 5 feet 6 inches and 6 feet.
(c) Over 6 feet.
To check your 3 answers, see whether they sum to 1.

72    RANDOM VARIABLES

4-6 A FUNCTION OF A RANDOM VARIABLE

Looking again at the experiment of tossing three coins, let us suppose that there is a reward R depending upon the number of heads X. Formally, we might state that R is a function of X, the specific form being

    R = g(X) = (X − 1)²

which is equally well given by the tabled form in Table 4-4.

TABLE 4-4 Tabled Form of the Function R = g(X) = (X − 1)²

    Value of x    Value of r = g(x) = (x − 1)²
    0             (0 − 1)² = 1
    1             (1 − 1)² = 0
    2             (2 − 1)² = 1
    3             (3 − 1)² = 4

The values of R are customarily rearranged in order, as shown in Table 4-5. Furthermore, the values of R have certain probabilities, which may be deduced from the previous probabilities of X (just as the probabilities of X were deduced from the probabilities in our original sample space).

TABLE 4-5 Calculation of the Probabilities of Various R Values from the Probabilities of X Values

    (1) x    (2) p(x)    (3) r = g(x)        (4) R value    (5) p(r)
    0        1/8         1                   0              3/8
    1        3/8         0                   1              1/8 + 3/8 = 4/8
    2        3/8         1                   4              1/8
    3        1/8         4

(In the original table, arrows link each r in column (3) to its R value in column (4).)

NOTATION    73

The third column in this table shows the calculation of R from X; we note from Table 4-4 that two values of X (0 or 2) give an R value of 1. This is indicated with arrows in Table 4-5. The last column shows the probability distribution of R, "derived" from the probability distribution of X. Although R has been derived from X, it is a random variable in its own right, defined on the original sample space (Figure 4-1); thus it has all the properties of an ordinary random variable. The mean of R can be computed from its probability distribution, as in Table 4-5, and is found to be 1.0. But if it is more convenient, the answer can be derived from the probability distribution of X, as in Table 4-6.

TABLE 4-6 Mean of R = (X − 1)², Calculated from the Probability Function of X

    x    p(x)    g(x)    g(x) p(x)
    0    1/8     1       1/8
    1    3/8     0       0
    2    3/8     1       3/8
    3    1/8     4       4/8
                 μR = Σ g(x) p(x) = 8/8 = 1.0

It is easy to see why this works; in a disguised way we are calculating the same thing. The first and third lines of Table 4-6 appear together as the second line of Table 4-5. Also, the second and fourth lines of Table 4-6 correspond to the first and third lines of Table 4-5. Thus Table 4-6 contains precisely the same information as Table 4-5; it therefore yields the same value for μR. The only difference in the two tables is that 4-6 is ordered according to X values, while 4-5 is ordered (and condensed) according to R values.

This example can be generalized as follows. If X is a random variable, and g is any function, then R = g(X) is a random variable; its mean μR may be calculated either from the probability function of R, or alternatively from the probability function of X, according to:

    Theorem.    μR = Σ g(x) p(x)        (4-17a)
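Theorem (4-17a) says the two routes in Tables 4-5 and 4-6 must agree, and a short illustrative Python sketch makes the point concrete (the names are ours):

```python
from fractions import Fraction as F

p = {0: F(1, 8), 1: F(3, 8), 2: F(3, 8), 3: F(1, 8)}   # 3 coins, X = heads
g = lambda x: (x - 1) ** 2                             # the reward R = g(X)

# Route 1 (Table 4-5): derive the probability function of R, then average.
p_r = {}
for x, q in p.items():
    p_r[g(x)] = p_r.get(g(x), 0) + q
mu_r_from_r = sum(r * q for r, q in p_r.items())

# Route 2 (Table 4-6, Theorem 4-17a): work directly in the X space.
mu_r_from_x = sum(g(x) * q for x, q in p.items())

assert mu_r_from_r == mu_r_from_x == 1   # mu_R = 1.0 either way
```

Route 2 is often the more convenient one, since it never requires condensing the distribution of R.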

74    RANDOM VARIABLES

4-7 NOTATION

Some new notation will help us better understand the various viewpoints of the mean. For any random variable, X let us say, all the following terms mean exactly the same thing:¹⁰

    μX = mean of X = average of X = expectation of X = E(X), the expected value of X

The term E(X) is introduced because it is useful as a reminder that it represents a weighted sum, i.e.,

    E(X) = Σ x p(x)        (4-17b)

which we recall is just (4-3). With this new notation, the result (4-17a) can be written E(R) = Σ g(x) p(x). Finally, since R was just an abbreviation for g(X), we may equally well write this in an easily remembered form:

    Theorem.    E[g(X)] = Σ g(x) p(x)        (4-17c)

As an example of this notation, we may write

    E(X − μ)² = Σ (x − μ)² p(x)        (4-18)

By (4-4),

    σ² = E(X − μ)²        (4-19)

Thus we see that σ² may be regarded as just a kind of expectation; namely, the expectation of the random variable (X − μ)².

PROBLEMS

4-24 As in Problem 4-1, let X be the number of heads when 4 coins are fairly flipped.
(a) If R(X) = X² − 3X, find its probability function, and μR.
(b) Find E|X − 2| in 2 ways:
(1) Using the probability function of |X − 2|; and
(2) Using (4-17) and the probability function of X.
(c) Find E(X²).
(d) Find E(X − μX)². Is this related to σX in any way?

¹⁰ The reason for the plethora of names is historical. For example, gamblers and economists use the term "expected gain," meteorologists use the term "mean annual rainfall," and teachers use the term "average grade."

NOTATION    75

(4-25) Repeat Problem 4-24, letting X be the number of changes in sequence when 4 coins are tossed.

4-26 The time T required for a rat to run a maze, in seconds, is a random variable with the following probability function:

    t      p(t)
    23     .2
    24     .2
    25     .3
    26     .2
    27     .1

(a) Find the average time.
(b) Suppose the rat is rewarded with 1 biscuit for each second faster than 25. (For example, if he takes just 23 seconds, he gets a reward of 2 biscuits. Of course, if he takes 25 seconds or longer, he gets no reward.) What is the rat's average reward?

Review Problems

4-27 In a recent presidential election, 60% of the voters went Democratic, and 40% went Republican. If Gallup took a sample of 5 voters at random, find:
(a) The probability that the sample would be all Democrats.
(b) The probability that the sample would correctly forecast the election winner, i.e., that a majority of the sample would be Democratic.
(c) In what way is a sample of 5 better than a sample of 1?

4-28 Three coins are independently flipped; let X = number of heads. Make a table of the probability function, and find μX and σX:
(a) Assuming the coins are fair.
(b) Assuming the last coin is biased, coming "heads up" 3/4 of the time.

4-29 Suppose the amount of cereal in a package cannot be weighed exactly. In fact, it is a normally distributed random variable, with μ = 10.10 oz. and σ = .040 oz. On the package is claimed, "net weight, 10 oz."
(a) What proportion of the packages are underweight?
(b) To what value must the mean μ be raised in order that only 1/10 of 1% of the packages be underweight?

76

VARIABLES

RANDOM

volunteers had their breathing capacity


measured
before
and
after a certain treatment.
The
data
might have looked like this'
Eight

4-30

Capacity

Breathing

Before

After

Improvement

2750

2850

+100

2360

2380

2950

2800

Person

+20

-150

Let us concentrate
deteriorates,i.e., whether

so

that

his

chance

and 1/6the
so

(a)

Find

errors.
(b) What
*4-32

that

impossible.)

learns rapidly,
1/4 the second time,

time,

first

the
that

that

he

equally well from his successesand


may be consideredindependent.
table and mean
of X -- the total number of
learns

the three trials

the probability
is the

1/2 the

is

signs? (Assume
He

succession.

in

what

average,

is practically

a tie

times

improves or
is q- or --.

time.

third

We assume
failures,

that

a task 3
of error is

performs

person

a given person
of the improvement

has no effect, on
be 6 or more q-

precise

are so

measurements

(4-31) A

will

there

that

sign

the

treatment

that

Supposing

probability

whether

on

of more

probability

I error?

than

(Requires calculus)
A

random

variable

X is
p(x)

continuous, and has a


=

\177x2

-- 0
(a)

Graph

x <

function

otherwise

p(x).

(b) Find the

expect ?

(c) Find

0 <

probability

a 2.

mean, median, and mode.Are

they

in

the order you

chapter 5
Two Random Variables

5-1 DISTRIBUTIONS

The main problem of this section is a simple extension of the last two chapters. Therefore we outline in Table 5-1 both the review and the introduction of this first section. The problem will be to recognize the old ideas behind the new names.

TABLE 5-1  Review of Section 5-1, Showing the Origins of the Ideas (new terminology)

  Old Idea (for events)                          Applied to Random Variables
  -------------------------------------------   -------------------------------------------
  Event intersection,                            Joint probability function:
    Pr(G ∩ H), or Pr(E ∩ F) in general  (3-11)     Pr(X = 2 ∩ Y = 1) = p(2, 1)         (5-2a)
                                                   Pr(X = x ∩ Y = y) = p(x, y) in general  (5-2b)
  Conditional probability,                       Conditional probability function:
    Pr(H/G) = Pr(H ∩ G)/Pr(G)           (3-22)     p(2/Y = 1); p(x/Y = y), or p(x/y),
                                                   in general                           (3-24)
  F is independent of E if                       Variable X is independent of Y if
    Pr(F/E) = Pr(F)                     (3-25)     p(x/y) = p(x), or p(x, y) = p(x)p(y)

(a) Joint Probability Function

In the experiment of tossing a coin three times, let us define (on our single sample space) two random variables:

    X = number of heads
    Y = number of changes in sequence

TABLE 5-2  Two Random Variables Defined on the Original Sample Space

  (1) Outcomes    (2) Corresponding X value    (3) Corresponding Y value
      HHH                   3                            0
      HHT                   2                            1
      HTH                   2                            2
      HTT                   1                            1
      THH                   2                            1
      THT                   1                            2
      TTH                   1                            1
      TTT                   0                            0

We might be interested in the probability of 2 heads and 1 change of sequence occurring together. As usual, we refer to the sample space of the experiment (in column 1 of Table 5-2), and look for the intersection of these two events, obtaining

    Pr(X = 2 ∩ Y = 1) = 2/8        (5-1)

For convenience, Pr(X = 2 ∩ Y = 1) is abbreviated to

    p(2, 1)        (5-2a)

Similarly we could compute p(0, 0), p(0, 1), p(0, 2), p(1, 2) ..., obtaining in Table 5-3 what is called the joint (or bivariate) probability function of X and Y. The formal definition is

    p(x, y) ≜ Pr(X = x ∩ Y = y)        (5-2b)

The general case is illustrated in Figure 5-1. The events X = 0, X = 1, X = 2 ... form a partition of the sample space, shown schematically as a horizontal slicing. Similarly, the events Y = 0, Y = 1 ... form a partition of the sample space, shown as a vertical slicing. The intersection of the horizontal slice X = x and the vertical slice Y = y is the event (X = x ∩ Y = y). Its probability is collected into p(x, y) in the table.

FIG. 5-1  Two random variables (X, Y), showing their sample space and joint probability function.

TABLE 5-3  The Joint Probability p(x, y) of X and Y in Three Tosses of a Coin

   value of x     value of y
                 0      1      2      p(x)
       0        1/8     0      0      1/8
       1         0     2/8    1/8     3/8
       2         0     2/8    1/8     3/8
       3        1/8     0      0      1/8
     p(y)       2/8    4/8    2/8

This table, or specifically Table 5-3, may be graphed, but we run into some typographical difficulties in trying to represent 3 dimensions on a two-dimensional piece of paper. We shall suggest some possible ways to resolve this difficulty. First, since the outlay of the x and y in Table 5-3 is arbitrary, we shall change it for convenience, running x across and y up as in Figure 5-2a (this is the custom in analytic geometry). Then the functional values p(x, y) may be plotted in the direction of an axis which we imagine coming up out of the paper, as in Figure 5-2b, or the functional value may be represented by the size of the dot, as in Figure 5-2c.

FIG. 5-2  Various graphic presentations of Table 5-3. (a) Realignment of the axes. (b) p(x, y) represented by a line segment "coming up out of the paper." (c) p(x, y) represented by the size of the dot.
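The construction of Table 5-3 can also be checked mechanically: enumerate the eight equally likely outcomes and accumulate 1/8 into each (x, y) cell. A minimal Python sketch:

```python
from fractions import Fraction as F
from itertools import product

# X = number of heads, Y = number of changes in sequence,
# over the 8 equally likely outcomes of three coin tosses.
joint = {}
for outcome in product("HT", repeat=3):
    x = outcome.count("H")
    y = sum(a != b for a, b in zip(outcome, outcome[1:]))
    joint[(x, y)] = joint.get((x, y), F(0)) + F(1, 8)

# p(2, 1) collects the two outcomes HHT and THH, as in (5-1).
```

Printing `joint` reproduces the nonzero cells of Table 5-3; in particular joint[(2, 1)] is 2/8.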

(b) Marginal Probability Function

Suppose we are interested only in X, and yet have to work with the joint probability function of X and Y. How can we compute the probability function of X? In our example, the probability that X = 2 (in the schematic sample space of Figure 5-1) is the probability of the horizontal slice comprising it, i.e., the sum of the probabilities of all the chunks in it:

    p(2) = p(2, 0) + p(2, 1) + p(2, 2) + p(2, 3) + ...        (5-3)

    p(2) = Σ p(2, y)        (5-4)
           y

and in general, for any given x,

    p(x) = Σ p(x, y)        (5-5)
           y

For example, this idea may be applied to Table 5-3: we sum each row, and place this sum in the right-hand margin. Thus p(x) is computed for every x, and is called the marginal probability distribution of X, found in the right-hand margin. But, of course, the word "marginal" has no specific technical meaning; it simply describes how the distribution may be calculated and placed "in the margin." In conclusion, it is just the ordinary probability function of X (which could have been found without any reference to Y, as indeed it was in Figure 4-1); sometimes it is called "marginal" when another variable Y is in play.

Similarly, p(y), the (marginal) probability distribution of Y, is set out in the bottom row of Table 5-3; each element in this row is the sum of the column above. Finally, we note, as expected, the exact correspondence of this marginal probability distribution of Y with the probability distribution of Y calculated in an identical way in Figure 4-4, without any reference whatsoever to X.
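Equation (5-5) — sum each row of the joint table — can be verified in a few lines of Python (rebuilding the joint function of the coin example first):

```python
from fractions import Fraction as F
from itertools import product

# Rebuild the joint function of Table 5-3 from the sample space.
joint = {}
for outcome in product("HT", repeat=3):
    x = outcome.count("H")
    y = sum(a != b for a, b in zip(outcome, outcome[1:]))
    joint[(x, y)] = joint.get((x, y), F(0)) + F(1, 8)

# Marginal of X, as in (5-5): p(x) = sum over y of p(x, y).
p_x = {}
for (x, y), pr in joint.items():
    p_x[x] = p_x.get(x, F(0)) + pr
```

`p_x` agrees with the right-hand margin of Table 5-3: 1/8, 3/8, 3/8, 1/8.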

(c) Conditional Probability Function

In the example of tossing three coins, we might wish to know the probabilities of various numbers of heads, given one change in sequence. And, in general, it is often of interest to know the probability distribution of X, when Y is given. Thus, let us suppose that Y is known to be 1. The conditional probability distribution of X, given Y = 1, is designated as p(x/Y = 1). How is it to be evaluated?

Clearly, we should examine the vertical slice for Y = 1 in Figure 5-1, or specifically this column of Table 5-3, reproduced as the second column of Table 5-4 below. The problem is that the probabilities in this column do not sum to 1; hence they cannot represent a probability distribution. They do, however, give us the relative probabilities of various X values. Thus, if we know Y = 1, we know that X cannot be 0 or 3, but X values of 1 or 2 are equally probable. Intuitively, therefore, we arrive at the conditional distribution of X given Y = 1 as shown in the third column: since all elements in column 2 summed to only 1/2, we simply doubled them all. The result (column 3) is a bona fide probability distribution.

TABLE 5-4  Derivation of the Conditional Distribution of X, Given Y = 1

  Values of x      p(x, 1)         p(x/Y = 1)
       0              0                0
       1             2/8              1/2
       2             2/8              1/2
       3              0                0
              Sum = Pr(Y = 1)       Sum = 1
                  = p(1) = 1/2

Formally, doubling all elements in column 2 is justified rigorously by the theory in Chapter 3, where conditional probability was found to be:

    Pr(H/G) = Pr(H ∩ G)/Pr(G)        (3-22) repeated

We merely substitute the events defined in terms of random variables, as follows:

    For G, substitute (Y = 1)
    For H, substitute (X = x)        (5-6)

Thus

    Pr(X = x/Y = 1) = Pr(X = x ∩ Y = 1)/Pr(Y = 1)

Using our new notation, this becomes

    p(x/Y = 1) = p(x, 1)/p(1)        (5-7)

In our example, p(1) = 1/2, so that (5-7) becomes

    p(x/Y = 1) = 2 p(x, 1)        (5-8)

thus justifying the doubling in Table 5-4. The generalization of (5-7) is clearly

    p(x/Y = y) = p(x, y)/p(y)        (5-9)

The conditional probability distribution p(x/Y = y) may be further abbreviated to p(x/y), giving

    p(x/y) p(y) = p(x, y)        (5-10)

Note how similar this is to equation (3-22).

Since the conditional distribution is a bona fide probability distribution, it can be used, for example, to obtain the conditional mean:

    μx/y ≜ E(X/Y = y) = Σ x p(x/y)        (5-11)
                        x
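The doubling argument of Table 5-4, and the conditional mean (5-11), can both be checked numerically; a short Python sketch on the coin example:

```python
from fractions import Fraction as F
from itertools import product

# Joint function of Table 5-3.
joint = {}
for outcome in product("HT", repeat=3):
    x = outcome.count("H")
    y = sum(a != b for a, b in zip(outcome, outcome[1:]))
    joint[(x, y)] = joint.get((x, y), F(0)) + F(1, 8)

# p(x / Y = 1) = p(x, 1) / p(1), as in (5-7).
p1 = sum(pr for (x, y), pr in joint.items() if y == 1)     # p(1) = 1/2
cond = {x: pr / p1 for (x, y), pr in joint.items() if y == 1}

# Conditional mean, as in (5-11).
cond_mean = sum(x * pr for x, pr in cond.items())
```

Here `cond` is {1: 1/2, 2: 1/2} — exactly column 3 of Table 5-4 — and the conditional mean is 3/2.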

(d) Independence

We define the independence of 2 random variables by extending the concept of the independence of 2 events developed in Chapter 3.

Definition.
    The random variables X and Y are called independent if, for every x and y, the events (X = x) and (Y = y) are independent.        (5-12)

The consequences are easily derived. From (3-25) we know that the independence of the events (X = x) and (Y = y) means that

    Pr(X = x ∩ Y = y) = Pr(X = x) Pr(Y = y)

i.e.,

    p(x, y) = p(x) p(y)        (5-13)

Returning to our example, we easily show that X and Y are not independent. For independence, (5-13) must hold for every (x, y) combination. We ask whether it holds, for example, when x = 0 and y = 0. The answer is no; from the probabilities in Table 5-3, (5-13) is shown to be violated, since p(0, 0) = 1/8, while p(0) p(0) = (1/8)(2/8) = 1/32.
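Checking (5-13) over every (x, y) cell is mechanical; a short Python sketch confirming that the X and Y of Table 5-3 are not independent:

```python
from fractions import Fraction as F
from itertools import product

# Joint function of Table 5-3, with its marginals.
joint = {}
for outcome in product("HT", repeat=3):
    x = outcome.count("H")
    y = sum(a != b for a, b in zip(outcome, outcome[1:]))
    joint[(x, y)] = joint.get((x, y), F(0)) + F(1, 8)

xs = {x for x, _ in joint}
ys = {y for _, y in joint}
p_x = {x: sum(joint.get((x, y), F(0)) for y in ys) for x in xs}
p_y = {y: sum(joint.get((x, y), F(0)) for x in xs) for y in ys}

# (5-13) must hold for EVERY cell; a single failure disproves independence.
independent = all(
    joint.get((x, y), F(0)) == p_x[x] * p_y[y] for x in xs for y in ys
)
```

The cell (0, 0) alone already fails: p(0, 0) = 1/8 while p(0) p(0) = 1/32.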

PROBLEMS

5-1  In 4 tosses of a fair coin, let

         X = number of heads
         Y = number of changes of sequence

     List the sample space, and then find
     (a) The bivariate probability function; illustrate it with a dot graph as in Figure 5-2c.
     (b) The (marginal) probability function of X.
     (c) The mean and variance of X.
     (d) The conditional probability function p(x/Y = 2).
     (e) The conditional mean and variance of X, given Y = 2.
     (f) Are X and Y independent?

5-2  Suppose X and Y have the following joint probability function:

          x:      5     10
              .10    .20    …
          y   .15    .30    …

     (The remaining entries are .10 and .15.) Answer the same questions as in Problem 5-1.

5-3  Suppose X and Y have the following joint probability function:

              .1    .1    .1
              .4    .1    .1
              .1

     Answer the same questions as in Problem 5-1.

5-2 FUNCTIONS OF TWO RANDOM VARIABLES

In the previous chapter, we analyzed a derived random variable R which was some function of the (original) random variable X:

    R = g(X)        (5-14)

In this chapter we shall analyze a derived variable T which is some function of a pair of random variables X, Y:

    T = g(X, Y)        (5-15)

The concepts and proofs of this section will therefore run parallel to those in Section 4-7, the main difference being that the joint probability function p(x, y) will replace the probability function p(x). We shall be particularly interested in the distribution and mean of the new variable T.

Example

Following our normal procedure, we develop the argument in terms of a simple example, and then generalize. To use our example of tossing three coins, shown in Figure 5-3, suppose S is just the sum of X and Y; the specific case of (5-15) becomes:

    S = X + Y        (5-16)

We use the symbol S rather than T to emphasize that this function, being a simple sum, is a very special case of (5-15). In Figure 5-3, we show how p(s), the probability function of S, may be derived directly from the original sample space, or indirectly by means of the joint probability function p(x, y). In either case, the result is the same.

FIG. 5-3  Two views of the derivation of the probability function of S = X + Y. (a) Directly, from the original sample space. (b) Using the joint probability function of X and Y as an intermediate condensation of the sample space. In either case, the final probability function is:

      s     p(s)
      0     1/8
      2     2/8
      3     4/8
      4     1/8

To give some motivation as to why this random variable might be of practical interest, we may reinterpret the tossing of 3 coins as "having 3 children," and then consider X = number of girls and Y = number of sex changes. Since girls are more expensive to clothe than boys, and since sex changes interfere with the convenient passing on of clothing from one child to the next child, we might interpret S = X + Y as a rough index of the clothing costs for the family. Of course, a weighted average of X and Y might be even more appropriate.

On the one hand, consider the direct derivation in Figure 5-3a. To illustrate, we note that four of the eight equiprobable outcomes are associated with S = 3. Hence p(3) = Pr(S = 3) is 4/8. Other S probabilities are similarly evaluated. On the other hand, p(3) may be evaluated indirectly, by first deriving the joint probability function p(x, y) in Figure 5-3b. Then the three circled (x, y) combinations all yield S = 3, and the sum of their probabilities is 4/8.

The expectation E(S) may similarly be derived in two ways. On the one hand, applying (4-3) directly to the probability distribution of S, we have, by definition:

    E(S) = Σ s p(s)
         = 0(1/8) + 2(2/8) + 3(4/8) + 4(1/8) = 2 1/2        (5-17)

On the other hand, we may arrive at the same result by using the joint distribution of X and Y. Specifically, we wonder if (4-17c) can be extended to:

    E(X + Y) = Σ Σ (x + y) p(x, y)        (5-18)
               x y
             = (0 + 0)(1/8) + (1 + 1)(2/8) + (1 + 2)(1/8)
               + (2 + 1)(2/8) + (2 + 2)(1/8) + (3 + 0)(1/8) = 2 1/2

So we see that (5-18) does, in fact, work in this example. Why? On closer examination, (5-18) is just (5-17) in disguised form. For example, the three terms (1 + 2)(1/8), (2 + 1)(2/8), and (3 + 0)(1/8) amount to 3(1/8 + 2/8 + 1/8) = 3(4/8), which is just the second last term of (5-17). Continuing in this fashion, every term of (5-17) can be recovered from (5-18).

In a similar way we could prove generally:

Theorem.
    If T = g(X, Y) is any function of two random variables, then

        E(T) = E[g(X, Y)] = Σ Σ g(x, y) p(x, y)        (5-19)
                            x y

    [compare (4-17c)].

For an example of how this works for a more complicated function of X and Y, we return to the tossing of three coins, and consider

    T = X² − 2Y        (5-20)

Following the method of Figure 5-3(a), we can derive the probability distribution for T:

      t     p(t)     t p(t)
     −3     1/8      −3/8
     −1     2/8      −2/8
      0     2/8        0
      2     2/8       4/8
      9     1/8       9/8

from which E(T) is directly calculated to be 1. Alternatively, we could calculate E(T) from (5-19), using the p(x, y) given in Table 5-3. Thus, noting (5-20):

    E(T) = Σ Σ (x² − 2y) p(x, y)
           x y
         = (0² − 2(0))(1/8) + (0² − 2(1))(0) + (0² − 2(2))(0)
           + (1² − 2(0))(0) + ... + (3² − 2(2))(0)
         = 1
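Both routes to the expectation — the direct distribution of the derived variable, and theorem (5-19) applied to the joint function — can be verified in a few lines of Python:

```python
from fractions import Fraction as F
from itertools import product

# Joint function of Table 5-3 for three coin tosses.
joint = {}
for outcome in product("HT", repeat=3):
    x = outcome.count("H")
    y = sum(a != b for a, b in zip(outcome, outcome[1:]))
    joint[(x, y)] = joint.get((x, y), F(0)) + F(1, 8)

def expect(g, joint):
    """E[g(X, Y)] = sum of g(x, y) p(x, y), as in (5-19)."""
    return sum(g(x, y) * pr for (x, y), pr in joint.items())

e_s = expect(lambda x, y: x + y, joint)          # E(S) for S = X + Y
e_t = expect(lambda x, y: x**2 - 2 * y, joint)   # E(T) for T = X^2 - 2Y
```

Both agree with the direct calculations above: E(S) = 2 1/2 and E(T) = 1.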

PROBLEMS

5-4  Let

         U = X(X + Y)
         V = (X − 8)(Y − 4)

     where X and Y have the same joint distribution as in Problem 5-2.
     (a) Find the distribution of U, and from this its mean.
     (b) Find the mean of U using (5-19).
     (c) Find E(V).

5-5  Let

         U = XY − 1
         V = (X − 1)(Y − 2)

     where X and Y have the same joint distribution as in Problem 5-3.
     (a) Find the distribution of U, and from this its mean.
     (b) Find the mean of U using (5-19).
     (c) Find E(V).


5-3 COVARIANCE

This is a measure of the degree to which two variables are linearly related. As an example, consider the joint probability function of Table 5-5, graphed in Figure 5-4a. We notice some tendency for these two variables to move together (i.e., a large X tends to be associated with a large Y, and a small X with a small Y).

TABLE 5-5  Joint Probability p(x, y)

         x:   1     2     3     4     5
   y = 5                             .1
       4           .1          .2
       3                 .2
       2           .2          .1
       1     .1

   (blank cells represent zero probability)

Our measure of how the variables move together should be independent of our choice of origin. It will, therefore, be convenient in Figure 5-4b to translate both axes from the (0, 0) origin to μx and μy (which are calculated to be 3 and 3); this means defining two new variables,

    X − μx    and    Y − μy

FIG. 5-4  Translation of axes. (a) Original distribution. (b) Axes translated to the center of the distribution.

Now suppose we multiply the new coordinate values together, (X − μx)(Y − μy). For any point in the first quadrant of Figure 5-4b, both its (X − μx) and (Y − μy) coordinates will be positive; hence this product will be positive. It will also be positive for any point in the third quadrant, since both factors are negative. But for points in the other two quadrants the product is negative. If we sum all of these products, attaching the appropriate probability weights to each, i.e.,

    Σ Σ (x − μx)(y − μy) p(x, y)        (5-21)
    x y

this gives us a good measure of how the variables move together, and is in fact σxy, the "covariance of X and Y." In our example, the heavier probability weights appear in quadrants I and III; thus the positive terms in this calculation will outweigh the negative. Consequently, covariance will be positive, indicating, as expected, some tendency for the variables to move together. Alternatively, if the larger probabilities had occurred in quadrants II and IV, covariance would be negative, indicating, as expected, the tendency for X and Y to move in opposite directions. Finally, had the probabilities been evenly distributed in the four quadrants, there would be no discernible tendency for X and Y to move together and, as expected, their covariance would be zero.

We notice that (5-21) is equivalent to the following formal definition:

Definition.
    σxy = E(X − μx)(Y − μy)        (5-22)  Covariance of X and Y

The computation of σxy in our example is as follows:

    σxy = (−2)(−2)(.1) + (−1)(−1)(.2) + (−1)(+1)(.1)
          + (+1)(−1)(.1) + (+1)(+1)(.2) + (+2)(+2)(.1)
        = +1.0

The computation may often be simplified by using

    σxy = E(XY) − μx μy        (5-23)

This formula, with its proof, is analogous to (4-5).

Proof of (5-23): beginning with (5-21),

    σxy = Σ Σ (x − μx)(y − μy) p(x, y)        (5-24)
          x y
        = Σ Σ (xy − μx y − μy x + μx μy) p(x, y)
        = Σ Σ xy p(x, y) − μx Σ Σ y p(x, y) − μy Σ Σ x p(x, y)
          + μx μy Σ Σ p(x, y)        (5-25)

In the second term of (5-25), we find that

    Σ Σ y p(x, y) = Σ y [Σ p(x, y)] = Σ y p(y) = μy        (5-26)
    x y             y    x             y

by (5-5). Similarly, in the third term of (5-25),

    Σ Σ x p(x, y) = μx        (5-27)
    x y

Finally, in the last term of (5-25), Σ Σ p(x, y) = 1. Thus (5-25) reduces to

    σxy = Σ Σ xy p(x, y) − μx(μy) − μy(μx) + μx μy
        = E(XY) − μx μy        (5-23) proved

The variance of X [ref. (4-19)] is recognized as just a special case of covariance, being the covariance of X with itself.

Since covariance measures the extent to which the two variables move together, we find it plausible (indeed, it may be proved⁴) that:

Theorem.
    If X and Y are independent, then σxy = 0.        (5-28)

⁴ Proof. If X and Y are independent, then

    p(x, y) = p(x) p(y)        (5-13) repeated

Thus (5-21) becomes

    σxy = Σ Σ (x − μx)(y − μy) p(x) p(y)
        = [Σ (x − μx) p(x)] [Σ (y − μy) p(y)]
           x                  y
        = 0 · 0 = 0        (5-28) proved

PROBLEMS

5-6  For the following joint probability table, calculate σxy:
     (a) From the definition (5-22).
     (b) From the easier formula (5-23).

         x:    0     1     2
             .10   .40   .20
             .05   .05   .20

5-7  Repeat Problem 5-6 for the following joint distribution:

             .2    .4
             .2    .2

5-8  Suppose X and Y have the following joint distribution: …
     (a) Find p(x) and p(y); then, by verifying that p(x) p(y) = p(x, y), confirm that X and Y are independent [ref. equation (5-13)].
     (b) What is σxy?

5-9  (a) Referring to Problems 5-4 and 5-5, is it true that E(V) = σxy?
     (b) Referring to Problem 5-1, find σxy.

*5-10  Suppose X and Y have the following joint probability distribution:

         x:    0     1
             .1    .3
             .1    .1
             .2    .2

      (a) Find the probability function of X and the probability function of Y. Compute E(X) and E(Y).
      (b) Are X and Y independent?
      (c) Calculate σxy.
      (d) Which statements are true, for any X and Y?
          (1) If X and Y are independent, then σxy must be zero.
          (2) If σxy = 0, then X and Y must be independent.

⇒ 5-11  In a certain gambling game, a pair of honest three-sided dice are thrown. Let

            X₁ = number on the first die
            X₂ = number on the second die

      The joint probability distribution of X₁ and X₂ is, of course, 1/9 for each of the nine combinations. The total number of dots S is:

            S = X₁ + X₂

      (a) Find the distribution of S, and its mean and variance.
      (b) Find the mean and variance of X₁ and X₂.
      (c) Do you see the relation between (a) and (b)?

⇒ 5-12  Suppose the gambling game of Problem 5-11 is complicated by using a pair of loaded dice, with distributions

            x     p₁(x)    p₂(x)
            1      .5       .3
            2      .4       .3
            3      .1       .4

      Assuming that the dice are tossed independently, tabulate the joint distribution of X₁ and X₂, and then answer the same questions as in Problem 5-11.
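The two routes to the covariance of Table 5-5 — the definition (5-22) and the shortcut (5-23) — can be checked against each other in Python. A minimal sketch, taking the nonzero cells of Table 5-5 (all other cells zero) and using exact fractions so the .1 weights stay exact:

```python
from fractions import Fraction as F

# Nonzero cells of the joint probability of Table 5-5.
p = {(1, 1): F(1, 10), (2, 2): F(2, 10), (3, 3): F(2, 10),
     (4, 4): F(2, 10), (5, 5): F(1, 10),
     (2, 4): F(1, 10), (4, 2): F(1, 10)}

mu_x = sum(x * pr for (x, y), pr in p.items())
mu_y = sum(y * pr for (x, y), pr in p.items())

# (5-22): covariance as the expected product of deviations.
cov_def = sum((x - mu_x) * (y - mu_y) * pr for (x, y), pr in p.items())

# (5-23): the shortcut E(XY) - mu_x * mu_y.
cov_short = sum(x * y * pr for (x, y), pr in p.items()) - mu_x * mu_y
```

Both routes give μx = μy = 3 and σxy = +1.0, as in the hand computation above.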

5-4 LINEAR COMBINATION OF TWO RANDOM VARIABLES

(a) Mean

First, we leave the problems of more complicated functions, and return to the simple example of Section 5-2 in which S was just the sum of X and Y. When we calculated E(S), the student's suspicions may have been aroused: the mean of S turned out to be (2 1/2), which is simply the mean of X, (1 1/2), plus the mean of Y, (1). In fact, for any X and Y, it may be proved⁵ that:

Theorem.
    E(X + Y) = E(X) + E(Y)        (5-29)

Mathematicians often refer to this important property as the "additivity" or "linearity" of the expectation operator.

⁵ Proof. By (5-19),

    E(X + Y) = Σ Σ (x + y) p(x, y)
               x y
             = Σ Σ x p(x, y) + Σ Σ y p(x, y)

For the first term, we may write

    Σ Σ x p(x, y) = Σ x [Σ p(x, y)] = Σ x p(x) = E(X)
    x y             x    y             x

by (5-5). Similarly the second term reduces to E(Y), so that

    E(X + Y) = E(X) + E(Y)        (5-29) proved

This theorem may easily be generalized to cover the case of a "weighted sum," or linear combination:

    W = aX + bY        (5-30)

where a and b are any two constants. For example, S = X + Y is just a weighted sum with weights a = b = 1. As another example, the simple average of two random variables X and Y, which is (X + Y)/2 = (1/2)X + (1/2)Y, is just a linear combination with a and b = 1/2. Similarly, any weighted average of X and Y is a linear combination satisfying a + b = 1. We might guess that to find the mean of W we may simply plug the means of X and Y into (5-30). Fortunately, this simple operation is always justified; thus:

Theorem.
    E(W) = E(aX + bY) = aE(X) + bE(Y)        (5-31)

As a review, the student should compare (5-19) and (5-31). Both provide a means of calculating the expected value of a function of X and Y. However, (5-19) applies to any function of X and Y, whereas (5-31) is restricted to linear functions only. When we are dealing with this restricted class of linear functions, (5-31) is generally preferred to (5-19) because it is much simpler: whereas evaluation of (5-19) involves working through the whole joint probability distribution of X and Y (e.g., Table 5-3), (5-31) requires only the marginal distributions of X and Y (e.g., the last row and column of that table).
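The contrast between (5-19) and (5-31) can be seen directly in code: one route walks the whole joint table, the other touches only the marginal means, and for a linear combination the two must agree. A minimal Python sketch on the coin example, with a = b = 1/2 (the simple average):

```python
from fractions import Fraction as F
from itertools import product

# Joint function of Table 5-3.
joint = {}
for outcome in product("HT", repeat=3):
    x = outcome.count("H")
    y = sum(a != b for a, b in zip(outcome, outcome[1:]))
    joint[(x, y)] = joint.get((x, y), F(0)) + F(1, 8)

a = b = F(1, 2)

# Route 1, theorem (5-19): work through the whole joint distribution.
e_w_joint = sum((a * x + b * y) * pr for (x, y), pr in joint.items())

# Route 2, theorem (5-31): only the marginal means are needed.
e_x = sum(x * pr for (x, y), pr in joint.items())
e_y = sum(y * pr for (x, y), pr in joint.items())
e_w_marginal = a * e_x + b * e_y
```

With E(X) = 3/2 and E(Y) = 1, both routes give E(W) = 5/4.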

(b) Variance

Again, we consider a simple sum first, and a linear combination later. The variance of a sum is a little more complicated than its mean. It may be proved⁶ that:

Theorem.
    var(X + Y) = var X + var Y + 2 cov(X, Y)        (5-32)

where var X and cov(X, Y) are alternate notations for σx² and σxy, respectively.

⁶ Proof. It is time to simplify our proofs by using brief notation such as E(W), rather than the awkward Σ w p(w), or the even more awkward Σ Σ w(x, y) p(x, y). First, from (4-19),

    var S = E(S − μs)²

Substituting for S and μs, where S = X + Y:

    var S = E[(X + Y) − (μx + μy)]²
          = E[(X − μx) + (Y − μy)]²
          = E[(X − μx)² + 2(X − μx)(Y − μy) + (Y − μy)²]

Realizing that (5-31) holds for any random variables, and that each of these terms is a random variable,

    var S = E(X − μx)² + 2E(X − μx)(Y − μy) + E(Y − μy)²
          = var X + 2 cov(X, Y) + var Y        (5-32) proved

An interesting simplification of (5-32) occurs when X and Y have zero covariance, in which case they are called uncorrelated; this occurs, for example, whenever X and Y are independent, as in the dice Problems 5-11 and 5-12. Then (5-32) simplifies to:

Corollary.
    If X and Y are uncorrelated,

        var(X + Y) = var X + var Y        (5-33)

Finally, (5-32) may be generalized to any linear combination:

Theorem.
    var(aX + bY) = a² var X + b² var Y + 2ab cov(X, Y)        (5-34)

Since the proof of (5-34) parallels the proof of (5-32), it is left as an exercise. Note also that the corollary has a similar generalization.

The theorems of this section are summarized in Table 5-6, a very important table for future reference. The general function g(X, Y) is dealt with in the first row, while the succeeding rows represent increasingly restricted special cases.

TABLE 5-6  Summary of the Mean and Variance of Various Functions of the Random Variables X and Y

  Row 1. Any function g(X, Y):
         Mean:      E[g(X, Y)] = Σ Σ g(x, y) p(x, y)        (5-19)

  Row 2. Linear combination aX + bY:
         Mean:      E(aX + bY) = aE(X) + bE(Y)        (5-31)
         Variance:  var(aX + bY) = a² var X + b² var Y + 2ab cov(X, Y)        (5-34)

  Row 3. Simple sum X + Y (setting a = b = 1 in row 2):
         Mean:      E(X + Y) = E(X) + E(Y)        (5-29)
         Variance:  var(X + Y) = var X + var Y + 2 cov(X, Y)        (5-32)

  Row 4. Function of one variable, aX (setting b = 0 in row 2):
         Mean:      E(aX) = aE(X)        (ref. Table 4-2)
         Variance:  var(aX) = a² var X        (ref. Table 4-2)
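Theorem (5-32) can be verified numerically on the coin example of Table 5-3. A short Python sketch; in this particular example cov(X, Y) happens to be zero even though X and Y are dependent (compare Problem 5-10(d)):

```python
from fractions import Fraction as F
from itertools import product

# Joint function of Table 5-3.
joint = {}
for outcome in product("HT", repeat=3):
    x = outcome.count("H")
    y = sum(a != b for a, b in zip(outcome, outcome[1:]))
    joint[(x, y)] = joint.get((x, y), F(0)) + F(1, 8)

def expect(g):
    return sum(g(x, y) * pr for (x, y), pr in joint.items())

mu_x, mu_y = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mu_x) ** 2)
var_y = expect(lambda x, y: (y - mu_y) ** 2)
cov_xy = expect(lambda x, y: (x - mu_x) * (y - mu_y))

# var(X + Y) computed directly from the joint distribution.
var_sum = expect(lambda x, y: (x + y - mu_x - mu_y) ** 2)
```

Here var X = 3/4, var Y = 1/2, cov(X, Y) = 0, and var(X + Y) = 5/4 = var X + var Y + 2 cov(X, Y), as (5-32) requires.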

Example

Suppose we choose a family at random from a certain population, letting

    B = number of boys in the family
    G = number of girls in the family

so that

    C = B + G = number of children in the family

Suppose it is known that

    var B = 2.2        var G = 2.0        cov(B, G) = 0.3

Then we can calculate the average number of children, and the variance. From (5-29),

    E(C) = E(B) + E(G) = 2.3

From (5-32),

    var C = var B + var G + 2 cov(B, G) = 2.2 + 2.0 + 2(0.3) = 4.8

PROBLEMS

5-13

are

only

not

function

5-12,suppose

5-11 and

Problems

Continuing

but

loaded,

of the 2 numbers

mean

3,\177

the

distribution

.1

.1

.1

.i

.1 .i

.,5

of S

of 3-sided dice
probability

joint

.2

.1
'

'

(the ?otal

number
t.

2..

of dots), and its

and variance.

(b) Find the mean and variance


(c) Find the covariance of
and (5-32) hold true.

(5-14)When

the

.1

Find

pair

the
that

is

(a)

so

dependent,

a coin

is

fairly

tossed

X -- number of headson the


y--- number of headson the
Z -- total number of heads

of
X1

of X2.
X2, and then

X\177 and

and

3 times, let
first
two coins
last

coin

verify

that

(5-29)

COMBINATION

LINEAR

(a) Are X

(b) F\177r
ance.

tossed

, having

in

3-26)
sample space'

10

15

T'T)

\337
(H

. (T

_H)

15

\337
(T

H T)

10

\337
(T

T H)

.10

.15

\337
(TTT)

2nd

(b)

50

80

20

\177

20

\177

(,q +

(b) The instructor

Repea

covariance
%\177

of the

(5-18) Repeat

table, assuming
a simple average of

50

the

second

exam was

,t

such

tverage

two

the

grades,

twice as important,

average

Problem 5-16, if'the covariance

interpr

calculated

thought

a weighted

tool

in the

blanks

the

instructor

The

.V =

5-17

Variance

\177

',rage W

FJ[1 in

so

Standard

Deviation

characteristics:

following

the

with

ob-

X Veighted

av{

(a)

grades,

wrote 2 exams,each time

Mean

X2

\177xam,

(a) Average,

large class

Class

X\177

fairly

10

. (H T H)

1st e\177am,

not

Pr(e)

. (HHT)

udents
of a certain
tainin\177 a distribution
of

is

which

H H)

\177,(H

sl

and vari-

true.

5-16 The

mean,

(Problem

coin

following

the

fact

covariance ?

distribution,

the

find

(5-32) hold

and

(5-29)

that

Z,

Y, and

t Problem 5-14for

Repe\177

is their

What

independent?

of X,

each

(c) v \177rify

(5-15)

and

97

a negative
grade
?

problem 5-16,if

covariance ? What
the

covariance

--200.

is
has

is 0.

might you
to the variance

How

it done

98

Review

VARIABLES

RANDOM

TWO

Problems

X and

5-19 If X and Y have the following joint probability function:

    .1   .3   .1
    .1   .1   .3

Find:
(a) The probability distribution and mean of X.
(b) The probability distribution and mean of Y.
(c) The probability distribution and mean of the sum S = X + Y.
(d) The probability distribution and mean of Y, given X = 5.
(e) Are X and Y independent? Briefly, why?
(f) Find Pr(X < Y).
5-20 In a small community of ten working couples, yearly income (in thousands of dollars) has the following distribution:

Couple    Man's Income    Wife's Income
  1           10               10
  2           15               15
  3           10               10
  4           15               20
  5           10               15
  6           15               10
  7           20               20
  8           10               10
  9           15               15
 10           20               20

A couple is drawn by lot (at random) to represent the community at a convention. Let M and W be the yearly income of the man and wife, respectively. Find:
(a) The bivariate probability distribution and its dot graph.
(b) The probability distribution of M; also μ_M and σ_M.
(c) The probability distribution of W; also μ_W and σ_W.
(d) The covariance, σ_MW.
(e) E(W/M = 10), E(W/M = 15), E(W/M = 20). Note that the conditional mean of W increases as M increases. This is another expression of the "positive relation" between M and W.
(f) If C represents the total combined income of the man and wife, find its mean and variance too.
(g) What is Pr(C ≥ 25)?
(h) If income is taxed a straight 20 percent, what is the mean and variance of the tax on a couple's income?
(i) If the income of a couple is taxed according to the following progressive tax table, what is the mean and variance of the tax?

Combined Income    Tax
      10
      15
      20
      25
      30
      35            10
      40            13
(5-21) Ten people in a room have the following heights and weights:

Person    Height (inches)    Weight (pounds)
  A            70                 140
  B            65                 140
  C            65                 140
  D            75                 160
  E            70                 150
  F            70                 150
  G            65                 150
  H            75                 160
  I            75                 160
  J            70                 160

For a person drawn by lot (with height H and weight W), find:
(a) The bivariate probability distribution, and graph it.
(b) The probability distribution of H, and its mean and variance.
(c) The probability distribution of W, and its mean and variance.
(d) The covariance, σ_HW.
(e) E(W/H = 65), E(W/H = 70), E(W/H = 75). (As height increases, the conditional mean weight increases, which is another view of the positive covariance of H and W.)
(f) Are H and W independent?
(g) If a "size index" I were defined as

I = 2H + 3W

find the mean, variance, and standard deviation of I.
5-22 Suppose a game involves dropping 3 coins, a nickel, a dime, and a quarter, on the table. Each coin that lands "heads up" you are allowed to keep, so that the reward R ranges from 0 to 40 cents.
(a) List the sample space.
(b) What is the distribution of R, its mean, and variance?
We shall now work through an alternate way to find the mean and variance of R, without going to the trouble of finding its exact distribution. To begin with, let us define

X₁ = the nickel's contribution to the reward
X₂ = the dime's contribution to the reward
X₃ = the quarter's contribution to the reward

Thus

R = X₁ + X₂ + X₃    (5-35)

(c) What is the distribution of X₁? Find its mean and variance. Similarly find the mean and variance of X₂, and then of X₃.
(d) Are X₁, X₂, and X₃ independent?
(e) Apply (5-29) and (5-33) to find E(R) and var (R).

(5-23) Continuing, suppose that instead of 3 coins, we dropped 4 coins: a nickel, a dime, and 2 quarters. What is the range, mean, and variance of R?

5-24 Answer the same questions as in Problem 5-22, supposing that there were 10 coins: 3 nickels, 2 dimes, and 5 quarters.

=> 5-25 A bowl contains 6 chips numbered from 1 to 6. One chip is selected at random, and then a second is selected (random sampling without replacement). Let X₁ and X₂ be the numbers on the first and second chips drawn.
(a) Tabulate the joint probability function of X₁ and X₂.
(b) Tabulate the (marginal) probability functions of X₁ and X₂.
(c) Are X₁ and X₂ independent?
(d) What is the covariance of X₁ and X₂?
(e) Find the mean and variance of X₁ and of X₂.
(f) Find the mean and variance of S = X₁ + X₂ in two different ways.
=> 5-26 Repeat Problem 5-25 with the following change. The first chip is drawn and recorded, and then replaced in the bowl before the second is drawn (random sampling with replacement). Isn't this sampling mathematically identical to tossing a die twice?

5-27 Let Y be the total number of dots showing when 10 fair dice are tossed.
(a) What is the range of possible values of Y?
(b) What is the mean and variance of Y?

=> 5-28 A bowl contains 50 chips numbered 0, and 50 chips numbered 1. A sample of two chips is drawn (random sampling with replacement); the sum is denoted by S.
(a) Tabulate the probability function of S. What are the mean and variance of S?
(b) Repeat for a sample of three chips.
(c) Repeat for a sample of five chips.
(d) Do you recognize the probability functions in (a), (b), and (c)?
chapter 6

Sampling

6-1 INTRODUCTION

In the last three chapters we have analyzed probability and random variables; we shall now employ this essential theory to answer the basic deductive question in statistics: "What can we expect of a random sample drawn from a known population?"

We have already met several examples of sampling: the poll of voters sampled from the population of all voters; the sample of light bulbs drawn from the whole production of bulbs; a sample of men's heights drawn from the whole population; a sample of 2 chips drawn from a bowl of chips (Problem 5-25). All of these are sampling without replacement; an individual, once sampled, is out. Since he is no longer part of the population, he cannot appear again in the sample. On the other hand, sampling with replacement involves returning any sampled individual to the population. The population remains constant; hence any individual may appear more than once in a sample, as in Problems 5-26 and 5-28. Polls of voters are typically samples without replacement; but there is no reason why a poll could not be taken with replacement. Thus no record would be kept of those already selected, and, for example, John Q. Smith of Cincinnati might vote twice in the poll, a privilege he will not enjoy on election day.

As defined earlier, a random sample is one in which each individual in the population is equally likely to be sampled. There are several ways to actually carry out the physical process of random sampling. For example, suppose a random sample is to be drawn from the population of students in the classroom.

1. The most graphic method is to put each person's name on a cardboard chip, mix all these chips in a large bowl, and then draw the sample.

2. A more practical method is to assign each person a number, and then draw a random sample of numbers.

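Both sampling schemes, and method 2 itself, map directly onto standard-library calls. A minimal sketch in Python (the class size of 30 is an assumption for illustration):

```python
import random

random.seed(0)

# Method 2: assign each of 30 students a number, then draw the sample of numbers.
sample_numbers = random.sample(range(30), k=5)   # sampling WITHOUT replacement
assert len(set(sample_numbers)) == 5             # no one can appear twice

# Sampling WITH replacement: every draw is from the full population,
# so the same individual may be drawn more than once.
with_repl = random.choices(range(30), k=5)

print(sample_numbers, with_repl)
```

`random.sample` models the chips-out-of-the-bowl scheme; `random.choices` models returning each chip to the bowl before the next draw.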
Thus, for a population of less than a hundred, 2-digit random numbers suffice. A random 2-digit number may be obtained by throwing a 10-sided die twice, or by consulting a table of random digits (Appendix Table II) and reading off a pair of digits for each individual required in the sample.

These two methods of sampling are mathematically equivalent. Method 2 is simpler to employ; hence it is the one used in practical sampling. However, the first method is conceptually easier to deal with and to visualize; consequently, in our theoretical development of random sampling, we talk of drawing chips from a bowl. Moreover, if we are studying men's heights, then the height alone is all that is required on the chip, and the man's name is irrelevant. Hence we can view the population simply as a collection of numbered chips in a bowl, which is stirred and then sampled.

How can random sampling be mathematically specified? If we draw one chip at random, its number can be regarded as a random variable taking on values that range over the whole population of chip values, with probabilities corresponding to the relative frequencies in the population.

As an example, suppose a population of 80 million men's heights has the frequency distribution shown in Table 6-1. For future reference, we also compute the mean and variance of the parent population; we call them μ and σ², where X represents the height of men.¹

TABLE 6-1    Population of Men's Heights

(1) Height x            (2) Frequency      (3) Relative Frequency, p(x)
(Midpoint of cell)
51                          825,000                 .01
54                          791,000                 .01
57                        2,369,000                 .03
60                        5,505,000                 .07
63                        9,483,000                 .12
66                       16,087,000                 .20
69                       20,113,000                 .25
72                       14,480,000                 .18
75                        7,891,000                 .10
78                        1,633,000                 .02
81                          823,000                 .01
                         80,000,000               Σ = 1.00

¹ We have used a very approximate distribution, with each height represented by the midpoint of its cell, to keep concepts simple. To be more precise, we ought to have a very fine subdivision of height into many cells, as in Figure 4-8c.

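Column 3 of Table 6-1 defines the probability distribution p(x). As a numeric check on these relative frequencies (they must sum to 1) and on the mean and variance computed from them, here is a sketch in Python with the table typed in by hand:

```python
from fractions import Fraction

# Table 6-1: height midpoints (inches) and relative frequencies p(x)
heights = [51, 54, 57, 60, 63, 66, 69, 72, 75, 78, 81]
p = [Fraction(k, 100) for k in (1, 1, 3, 7, 12, 20, 25, 18, 10, 2, 1)]

assert sum(p) == 1  # the relative frequencies sum to 1

mu = sum(x * w for x, w in zip(heights, p))                # population mean
var = sum((x - mu) ** 2 * w for x, w in zip(heights, p))   # population variance

print(float(mu), float(var), float(var) ** 0.5)  # → 67.8 28.44 (so σ ≈ 5.33)
```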
From (4-3):

μ = 51(.01) + 54(.01) + ··· + 81(.01) = 67.8

From (4-4):

σ² = (51 − 67.8)²(.01) + (54 − 67.8)²(.01) + ··· + (81 − 67.8)²(.01) = 28.4
σ = 5.3

Random sampling from this population is equivalent mathematically to sampling with replacement from a bowl of 80 million chips, with each chip carrying the x value shown in column 1. The first chip selected at random can take on any of these x values, with the probabilities shown in column 3. This random variable we designate as X₁; the second draw is designated as the random variable X₂, and so on. But each of these random variables X₁, X₂, ... Xₙ (together representing our sample of n chips) has the same probability distribution p(x), the distribution of the parent population; that is,²

p(x₁) = p(x₂) = ··· = p(xₙ) = p(x)    (6-1)

This equality, of course, holds true if we sample with replacement, since the second chip is drawn from exactly the same bowlful as is the first chip, etc. We must show why (6-1) also holds true even for sampling without replacement, since this is not at all obvious. Once the first sample chip has been taken from the population (and not replaced), the population changes,³ along with its relative frequencies (probabilities). Thus the distribution of X₂ that is conditional on X₁ is not the same as the distribution of X₁; or, to restate, the conditional distribution of X₂ given X₁ does depend on the value of X₁ selected in the first draw. But this conditional distribution is not the issue in (6-1). In that equation, p(x₂) is not the conditional distribution, but rather the marginal distribution of X₂, without any condition, i.e., without any knowledge of X₁. And if we have no knowledge of X₁ and consider the distribution of X₂, there is no reason for it to differ from the distribution of X₁. Our intuition in this case is a good guide. We could formally confirm this result by considering the full table showing the joint probability function of X₁ and X₂. It is symmetric around its main diagonal; hence, although the conditional distributions (rows or columns) vary in this table, the marginal distributions of X₁ and of X₂ are necessarily identical. (See Problem 5-25b.) Thus equation (6-1) holds true, even in the case of sampling without replacement.

Before leaving this matter, we make one further observation. When the parent population is extremely large, such as 80 million, it hardly matters whether or not we replace. Removing one individual hardly changes the frequencies in column 2 or the relative frequencies in column 3; the parent population is left practically the same. Thus the second draw (X₂) will be practically independent of the first (X₁). This leads us to the conclusion that sampling without replacement from an infinite population is practically equivalent to sampling with replacement. This fact is important enough that we shall return to it in Section 6-5.

Conclusion. Any population to be sampled may be simulated by a bowl of chips, with the following mathematical characteristics:

1. The number on the first chip drawn is a random variable X₁, with a distribution identical to the distribution of the population random variable X.

2. The sample of n chips gives us n random variables (X₁, X₂, ... Xₙ). Each Xᵢ has the same (marginal) distribution as that of the population X. This fundamental characteristic (6-1) holds in all cases, regardless of replacement or population size.

However, the independence of X₁, X₂, ... is a more complex issue. If the population is finite and sampling is without replacement, then the Xᵢ are dependent, since the conditional distribution of any Xᵢ depends on the previous X values drawn. In all other cases the Xᵢ are independent; for simplicity, we shall assume this independence in the rest of the book (except Section 6-5).

² Strictly speaking, (6-1) is not precise enough. It would be more accurate to let p₁ denote the probability function of X₁, p₂ of X₂, etc., and then write p₁(x) ≡ p₂(x) ≡ ··· ≡ pₙ(x) ≡ p(x), where ≡ means "identically equal for all x."

³ In our example, with a population of 80 million heights, this change would be of no practical consequence. But with smaller populations it might.

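The key claim, that the marginal distribution of the second draw equals that of the first even without replacement, can be confirmed by exhaustive enumeration on a small bowl. A sketch (the four chip values are arbitrary):

```python
from collections import defaultdict
from fractions import Fraction

bowl = [51, 51, 66, 72]  # a tiny population of four chips

# Marginal distribution of the first draw.
p1 = defaultdict(Fraction)
for chip in bowl:
    p1[chip] += Fraction(1, len(bowl))

# Marginal distribution of the second draw, sampling WITHOUT replacement:
# sum over every equally likely ordered pair of distinct chips.
p2 = defaultdict(Fraction)
for i, first in enumerate(bowl):
    rest = bowl[:i] + bowl[i + 1:]
    for second in rest:
        p2[second] += Fraction(1, len(bowl)) * Fraction(1, len(rest))

assert p1 == p2  # the marginals are identical, as (6-1) asserts
```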
6-2 THE SAMPLE SUM

Now we are ready to use the heavy artillery drawn up in Chapter 5. First consider S, the sum of the sample observations, defined as:

S = X₁ + X₂ + ··· + Xₙ    (6-2)

The expected value of S is obtained by using (5-29):

E(S) = E(X₁ + X₂ + ··· + Xₙ)
     = E[(X₁ + X₂ + ··· + Xₙ₋₁) + Xₙ]
     = E(X₁ + X₂ + ··· + Xₙ₋₁) + E(Xₙ)    by Theorem (5-29)
     = E[(X₁ + X₂ + ··· + Xₙ₋₂) + Xₙ₋₁] + E(Xₙ)
     = E(X₁ + X₂ + ··· + Xₙ₋₂) + E(Xₙ₋₁) + E(Xₙ)    again by Theorem (5-29)

Generalizing,

E(S) = E(X₁) + E(X₂) + ··· + E(Xₙ)    (6-3)

Theorem (5-29) covers the special two-variable case; its repeated application here is an example of proof by induction.

Noting from (6-1) that each Xᵢ has the same distribution as the population, it follows that each has the same mean μ as the population; (6-3) can therefore be written:

E(S) = μ + μ + ··· + μ    (6-4)

or

μ_S = nμ    (6-5)

that is, the expected value of a sample sum is simply the mean of the parent population times the sample size.

In the same way, the variance of S is obtained by using Theorem (5-33):

var S = var (X₁ + X₂ + ··· + Xₙ) = var X₁ + var X₂ + ··· + var Xₙ    (6-6)

Note that this depends on the assumed independence of X₁, X₂, ... Xₙ. Again, since all the X₁, X₂, ... Xₙ have the same distribution as the population, they also have the variance σ² of the population. Thus (6-6) becomes:

var S = σ² + σ² + ··· + σ²    (6-7)

or

var S = nσ²    (6-8)

Formulas (6-5) and (6-8) are illustrated in Figure 6-1a. As another example, suppose a machine produces bicycle chain links with average length μ = .40 inch and standard deviation σ = .02 inch. A chain is made by joining together a random sample of 100 of these links. Its length S is a random variable, fluctuating from sample to sample. Its expected length is

μ_S = 100(.40) = 40.0 inches

Moreover, because our sample is drawn from an infinite population, X₁, X₂, ... X₁₀₀ are independent. Therefore, we may apply (6-8) to compute the standard deviation of S:

σ_S = √100 (.02) = 10(.02) = .20 inch

The student will notice that this is an example of statistical deduction: the characteristics of a sample (μ_S, σ_S) have been deduced from known characteristics (μ, σ) of the parent population.

We pause to interpret (6-5) and (6-8) intuitively. It was no surprise that μ_S is n times μ. But why should σ_S be only √n times σ? Typically, a sample will include some individuals which are oversized, and some which are undersized, so that cancellation occurs. Thus, while the spread of the whole chain (σ_S) does exceed the spread in an individual link (σ), it is substantially less than it would be if the errors in all the links were accumulated without cancellation (nσ).

FIG. 6-1 (a) Relation of the sample sum S to the parent population. (b) Relation of the sample mean X̄ to the parent population.

6-3 THE SAMPLE MEAN

Recall the definition of the sample mean,

X̄ = (1/n)(X₁ + X₂ + ··· + Xₙ)    (2-1a) repeated, (6-9)

We easily recognize that X̄ is just a linear transformation of S, and hence X̄ can be analyzed in terms of S.

It is important to remember that X̄, as well as S, is a random variable that fluctuates from sample to sample. It seems intuitively clear that X̄ will fluctuate about the same central value as an individual observation, but with less deviation because of "averaging out." We thus find plausible the formulas for the mean and the standard deviation:

μ_X̄ = μ    (6-10)

σ_X̄ = σ/√n    (6-11)

Proof. First, for the mean, we apply the last row of Table 5-6 to (6-9):

μ_X̄ = (1/n) μ_S = (1/n)(nμ) = μ    (6-10) proved, using (6-5)

Now, for the variance, we apply the last row of Table 5-6 to (6-9) again:

var X̄ = (1/n²) var S = (1/n²)(nσ²) = σ²/n    (6-12)

using (6-7); taking the square root proves (6-11).

Formulas (6-10) and (6-11) are illustrated in Figure 6-1b. A graph of the distribution of the sample mean for n = 9 and n = 25 is left as an exercise; this distribution concentrates more and more about μ as sample size increases.

We review this section by reconsidering a familiar problem: the sampling distribution of 2 rolls of a die. Two rolls (X₁, X₂) can be regarded as a sample of 2 chips from a bowl, as discussed in Problem 5-26. This is also equivalent to a sample taken from the infinite population of all possible rolls of the die. The probability distribution of the parent population is shown in Table 6-2a, along with its mean (μ) and standard deviation (σ).

Because this experiment has such simple probability characteristics, we can also compute the probability distribution of S and of X̄ for a sample of 2 rolls of the die, as shown in Table 6-2b; the moments of both S and X̄ are also calculated in this table.
TABLE 6-2 (a) Probability Distribution of the Roll of a Die (the Population)

x      p(x)     x p(x)
1      1/6       1/6
2      1/6       2/6
3      1/6       3/6
4      1/6       4/6
5      1/6       5/6
6      1/6       6/6
             μ = 21/6 = 3.5
             similarly, σ = 1.71

TABLE 6-2 (b) Probability Distribution of the Sample Sum S and the Sample Mean X̄, with n = 2

(1) Outcome Set                    (2) Sum   (3)        (4) Mean   (5)
(36 equiprobable outcomes)            s      p(s)          x̄      p(x̄)
·(1,1)                                2      1/36          1.0     1/36
·(1,2) ·(2,1)                         3      2/36          1.5     2/36
·(1,3) ·(2,2) ·(3,1)                  4      3/36          2.0     3/36
   ···                                5      4/36          2.5     4/36
                                      6      5/36          3.0     5/36
                                      7      6/36          3.5     6/36
                                      8      5/36          4.0     5/36
                                      9      4/36          4.5     4/36
                                     10      3/36          5.0     3/36
                                     11      2/36          5.5     2/36
·(6,6)                               12      1/36          6.0     1/36

μ_S = 252/36 = 7.0; similarly, σ_S = 2.4.    μ_X̄ = 126/36 = 3.5; similarly, σ_X̄ = 1.2.

TABLE 6-2 (c) Direct Calculation of the Moments from Table 6-2b, On the One Hand; On the Other Hand, the Short-cut Calculation the Relevant Formula Gives (using the population μ and σ from Table 6-2a)

Moment    Direct Calculation    Relevant Formula         Short-cut Calculation
μ_S             7.0             (6-5)  μ_S = nμ            2(3.5) = 7.0
σ_S             2.4             (6-8)  σ_S = √n σ           √2(1.71) = 2.4
μ_X̄            3.5             (6-10) μ_X̄ = μ             3.5
σ_X̄            1.2             (6-11) σ_X̄ = σ/√n          1.71/√2 = 1.2

FIG. 6-2 Throwing a die twice (a specific illustration of Fig. 6-1). (a) Relation of the sample sum S to the parent population. (b) Relation of the sample mean X̄ to the parent population. (Note. In order to facilitate graphing, the probabilities were converted to probability densities, so that they would all have the same comparable area = 1.)

Finally, in Table 6-2c we show how these moments could have been obtained more simply, using the formulas of this section. This die-tossing example is summarized in Figure 6-2.

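The moments in Table 6-2, and the short-cut formulas of Table 6-2c, can be reproduced by enumerating the 36 equiprobable outcomes; a sketch:

```python
import math
from itertools import product

faces = range(1, 7)
outcomes = list(product(faces, repeat=2))  # the 36 equiprobable (die1, die2) pairs

def moments(values):
    """Mean and standard deviation of equally likely values."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(var)

mu, sigma = moments(list(faces))                          # population: 3.5, 1.708
mu_S, sd_S = moments([a + b for a, b in outcomes])        # sample sum: 7.0, 2.415
mu_X, sd_X = moments([(a + b) / 2 for a, b in outcomes])  # sample mean: 3.5, 1.208

# Short-cut formulas (6-5), (6-8), (6-10), (6-11):
assert mu_S == 2 * mu
assert abs(sd_S - math.sqrt(2) * sigma) < 1e-9
assert mu_X == mu
assert abs(sd_X - sigma / math.sqrt(2)) < 1e-9
```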
PROBLEMS

6-1 True or false? If false, correct the errors: When a die is rolled twice, the average of the 2 numbers (X̄) is a random variable having the same expectation as for a single roll X; this illustrates μ_X̄ = μ = 3½. However, X̄ does not take on all its values equally likely; the extreme values are rare. Thus X̄ has a smaller standard deviation than X, illustrating σ_X̄ = σ/√n. Incidentally, the range of X̄ is 1 to 6, also the same as for a single roll; this illustrates why the range of a random variable is a better measure of spread than the standard deviation.

6-2 True or false? If false, correct the errors: If 10 men were randomly sampled from the population of Table 6-1 and laid end to end, the expectation of the total length would be nμ = 678 inches. The total length would vary (from sample to sample) with a standard deviation of nσ = 53 inches. On the other hand, if the 10 men in the random sample were averaged, the expectation of the average would be μ = 67.8 inches, and its standard deviation would be σ = 5.3 inches. This is how the long and short men in the sample tend to "average out," making X̄ fluctuate less than a single observation.

6-3 (Classroom Exercise)
(a) Make a relative frequency graph of the population of heights of the students in the class.
(b) For each of a few random samples of size 4 (drawn with replacement), calculate X̄. Plot the values of X̄ and compare with the graph in (a), showing how in each sample the tall students tend to be offset by short students.
(c) Repeat (a) and (b) for the weights of the students.

6-4 The weight of employees in a certain large office building is distributed around a mean of 150 pounds, with a standard deviation of 20 pounds. A random group of 25 employees gets in the elevator each morning. Find the mean and variance of:
(a) The total weight, S.
(b) The average weight, X̄.

=> 6-5 The population (probability) bowl consists of many chips: one-third marked 2, one-third marked 4, and one-third marked 6.
(a) When one chip is drawn, let X be its number. Find μ and σ (the population mean and standard deviation).
(b) When a sample of 2 chips is drawn, let X̄ be the sample mean. Find:
(1) The probability table of X̄.
(2) From this table, calculate μ_X̄ and σ_X̄; check your answers using (6-10) and (6-11).
(c) Repeat (b) for a sample of 3 chips.
(d) Graph p(x̄) for each case above, i.e., for sample size n = 1, 2, 3. (Comparison is facilitated by using probability density, i.e., bar graph area = (height)(width).) As n increases, notice how the distribution of X̄ is more and more concentrated around μ.

6-4 THE CENTRAL LIMIT THEOREM

In the preceding section we found the mean and standard deviation of X̄. One question we have not yet addressed is the shape of its distribution: what is happening to the shape of p(x̄)? We consider two cases.

(a) The Distribution of X̄ When the Population is Normal

In this case X̄ is exactly normal. This follows from a theorem on linear combinations, which we quote without proof:

If X and Y are normal, then any linear combination Z = aX + bY is also a normal random variable.    (6-13)

With a normal population, each observation X₁, X₂, ... Xₙ in the sample is normal. The sample mean X̄ can be written as a linear combination of these n normal variables,

X̄ = (1/n)X₁ + (1/n)X₂ + ··· + (1/n)Xₙ    (6-14)

so that (6-13) can be used to establish that X̄ is normal. Finally, we re-emphasize that the distribution of X̄ concentrates about μ as sample size n increases (ref. 6-11).

(b) The Distribution of X̄ When the Population is Not Normal

It is surprising that, even in this case, most of the same conclusions follow. As an example, consider the bowl of 3 kinds of chips in Problem 6-5. This is obviously a nonnormal population; in fact, it is a rectangular distribution, as graphed in Figure 6-3a. As a larger and larger sample is taken from this population, the distribution of X̄ changes, as shown in Figure 6-3a.⁴ As well as the increasing concentration of X̄ about μ, we notice the tendency of the distribution to the normal bell shape. This same tendency to the normal occurs for the sample of dice throws (n throws taken from the population of all possible throws of a die), as shown in Figure 6-3b. Finally, in Figure 6-3c a third population is shown, having chips numbered 2, 4, and 6, with proportions 1/4, 1/4, and 1/2. Sample means from this population also show the same tendency to normality.

These three examples display an astonishing pattern: the sample mean becomes normally distributed as n grows, no matter what the parent population is. This pattern is of such central importance that mathematicians have formulated it as:

The Central Limit Theorem. As the sample size n increases, the distribution of the mean, X̄, of a random sample taken from practically any population approaches a normal distribution (with mean μ and standard deviation σ/√n).⁵    (6-15)

The central limit theorem is not only remarkable, but very practical as well. For it completely specifies the distribution of X̄ in large samples, and is therefore the key to large-sample statistical inference. In fact, as a rule of thumb, X̄ is usually practically normal when the sample size n reaches about 10 or 20. In conclusion, we can assume that X̄ is normal for any sample taken from a normal population, and for large samples taken from practically any population. With these conclusions on the shape of the distribution of X̄, and those of the previous section on its mean and standard deviation, we can now be very specific in our deduction about a sample mean taken from a known population.

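The tendency pictured in Figure 6-3 is easy to reproduce by simulation. A sketch using the rectangular bowl of Problem 6-5 (chips 2, 4, 6, so μ = 4 and σ² = 8/3):

```python
import random
import statistics

random.seed(2)
bowl = [2, 4, 6]  # rectangular parent population: mu = 4, sigma^2 = 8/3

n = 10
means = [statistics.mean(random.choices(bowl, k=n)) for _ in range(50_000)]

# (6-15): X-bar should be centred on mu, with standard deviation sigma/sqrt(n)
print(statistics.mean(means))                        # close to 4.0
print(statistics.stdev(means), (8 / 3 / n) ** 0.5)   # both close to 0.516
```

A histogram of `means` would also show the bell shape that Figure 6-3 depicts.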
⁴ The first 3 graphs of Figure 6-3a have already been found (Problem 6-5), and the first 2 graphs of Figure 6-3b (in Table 6-2). The rest of the graphs may be similarly calculated.

⁵ The one qualification is that the population have a finite variance. For a proof of this theorem, see, for example, P. Hoel, Introduction to Mathematical Statistics, 3rd ed., pp. 143-5, John Wiley & Sons, 1962.

FIG. 6-3 The limiting normal shape for p(x̄), shown for n = 1, 2, 3, 5, 10. (a) Bowl of three kinds of chips. (b) Bowl of six kinds of chips (or die). (c) Bowl of three kinds of chips of different frequency.

Example

Consider the marks of all students on a statistics test. If the marks have a normal distribution with a mean of 72 and standard deviation of 9, compare (1) the probability that any one student will have a mark over 78 with (2) the probability that a sample of 10 students will have an average mark over 78.

1. The probability that a single student will have a mark over 78 is found by standardizing the normal population of marks:

Pr(X > 78) = Pr((X − μ)/σ > (78 − 72)/9) = Pr(Z > .67) = .5000 − .2486 = .2514

2. Now consider the distribution of the sample mean. From the theorems of this chapter we know it is normal, with a mean of 72 and a standard deviation of σ/√n = 9/√10. From this we calculate the probability of a sample mean exceeding 78 to be:

Pr(X̄ > 78) = Pr((X̄ − μ)/(σ/√n) > (78 − 72)/(9/√10)) = Pr(Z > 2.11) = .0174    (6-16)

Hence, although there is a reasonable chance (about 1/4) that a single student will get over 78, there is very little chance (about 1/60) that a sample of ten students will perform this well. This comparison is shown in Figure 6-4.

FIG. 6-4 Comparison of probabilities for the population X and for the sample mean X̄: Pr(X > 78), about 1/4, versus Pr(X̄ > 78), about 1/60.

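Both probabilities in this example can be computed from the normal cumulative distribution function. A sketch using only the standard library (Φ written in terms of the error function):

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

mu, sigma, cutoff = 72, 9, 78

# 1. One student: standardize X itself.
p_single = 1 - phi((cutoff - mu) / sigma)                 # about .25

# 2. Mean of n = 10 students: standardize X-bar, whose sd is sigma/sqrt(n).
n = 10
p_mean = 1 - phi((cutoff - mu) / (sigma / math.sqrt(n)))  # about .017

print(round(p_single, 4), round(p_mean, 4))
```

The small differences from the text's .2514 and .0174 come from the text rounding z to two decimals before using the normal table.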
PROBLEMS

6-6 The weights of packages filled by a machine are normally distributed about a mean of 25 ounces, with a standard deviation of one ounce. What is the probability that n packages drawn from the machine will have an average weight of less than 24 ounces if n = 1, 4, 16, 64?

6-7 Suppose that the education level among adults in a certain country has a mean of 11.1 years of schooling, and a variance of 9. What is the probability that in a random survey of 100 adults you will find an average level of schooling between 10 and 12 years?

6-8 Does the central limit theorem (6-15) also hold true for the sample sum? Justify briefly.

6-9 An elevator is designed with a load limit of 2000 lb. It claims a capacity of 10 persons. If the weights of all the people using the elevator are normally distributed with a mean of 185 lb and a standard deviation of 22 lb, what is the probability that a group of 10 persons will exceed the load limit of the elevator?

6-10 Suppose that bicycle chain links have lengths distributed around a mean μ = .50 cm, with a standard deviation σ = .04 cm. The manufacturer's standards require the chain to be between 49 and 50 cm long.
(a) If chains are made of 100 links, what proportion of them meets the standards?
(b) If chains are made of only 99 links, what proportion now meets the standards? How many links should be put in a chain?
(c) Using 99 links, to what value must σ be reduced (how much must the quality control on the links be improved) in order to have 90 percent of the chains meet the standards?

(6-11) The amount of pocket money that persons in a certain city carry has a nonnormal distribution, with a mean of $9.00 and a standard deviation of $2.50. What is the probability that a group of 225 individuals will be carrying a total of more than $2100?

6-12 In Problems 6-6 to 6-11, the formulas required that the individuals in the sample were independently drawn. Do you think this is a questionable assumption? Why?

*6-13 A farmer has 9 wheat fields planted. The distribution of yield from each field has a mean of 1000 bushels and variance 20,000. Furthermore, the yields of any 2 fields are correlated, because they share the same weather conditions, weed control, etc.; in fact, each covariance is 10,000. Letting S denote the total yield from all 9 fields, find:
(a) The mean and variance of S. [Hint. How must the proof of (5-32) be adjusted?]
(b) Pr(S < 8,000), assuming S is normal.

*6-5 SAMPLING FROM A FINITE POPULATION, WITHOUT REPLACEMENT

In the preceding analysis, we have assumed either an infinite population or, alternatively, sampling from a finite population with replacement; in either case it doesn't matter whether we replace or not. This leaves one remaining possibility: sampling without replacement from a finite population. (This is a starred section; like the footnotes, it is optional, and the student may skip it without loss of continuity.)

We have already argued in Section 6-1 that the observations (X₁, X₂, ..., Xₙ) will have the same (marginal) distribution whether or not we replace, because they share the distribution of the parent population. Thus (6-5) still follows from (6-3):

    μ_S = nμ    (6-5) repeated

and similarly,

    μ_X̄ = μ    (6-10) repeated

On the other hand, the variance of X̄ does depend on whether or not we replace; it is easy to see why. Suppose we sample 10 of the heights of all the male students on a college campus; suppose further that the first student we sample is the star of the basketball team (say Lew Alcindor, 7 feet 1 inch). Clearly, we now face the problem of a sample average that is "off target," specifically, too high. If we replace, then in the next 9 chosen, Alcindor could turn up once again, throwing our sample mean further off target on the high side. But if we don't replace, then we don't have to worry about Alcindor again. In summary, sampling without replacement yields a more reliable sample mean (i.e., X̄ has less variance), because extreme values, once sampled, cannot return to haunt us again.
Formally, the argument runs as follows. If we sample without replacement, then X₁, X₂, ..., Xₙ are not independent. Hence all our theorems above, based on the independence assumption, do not hold true. Specifically, (6-7), which assumed replacement, must now be modified to

    var S = nσ² (N − n)/(N − 1)    (6-17)
    (sampling without replacement)

where N = population size and n = sample size. Furthermore, (6-12), which also assumed replacement, must be similarly modified to

    var X̄ = (σ²/n) (N − n)/(N − 1)    (6-18)
    (sampling without replacement)
Although we do not prove these two formulas, we interpret them:

1. The variance of X̄ without replacement (6-18) is less than the variance with replacement (6-12); this is the formal confirmation of our intuitive example of the heights of college students. This occurs because the "reduction factor,"

    (N − n)/(N − 1)    (6-19)

appearing in (6-18), is less than one. [Unless, of course, the sample size n is only one. In this case, no distinction can be made between replacement and nonreplacement, and to logically make (6-12) and (6-18) equivalent, the reduction factor must be one. If you have wondered where the 1 in the denominator of (6-19) came from, you can now see that it is exactly what makes (6-12) and (6-18) necessarily coincide when n = 1.]

2. When n = N, the sample coincides with the whole population, every time. Hence every sample mean must be the same: the population mean. The variance of the sample mean, being a measure of its spread, must therefore be zero. This is reflected in (6-19) having a zero numerator, so that var X̄ in (6-18) becomes zero. (Note that with replacement this is not the case; in this instance, a sample of size n = N does not guarantee that the sample and the population are identical, as they must be for the variance of X̄ to be zero.)

3. On the other hand, when the sample size n is much smaller than N (e.g., when 200 men are sampled from 80 million), the reduction factor (6-19) is practically 1, so that var X̄ is practically the same as with replacement. This, of course, is common sense; if the population is very large, it makes very little difference whether or not the observations are thrown back in again before continuing sampling.
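The three interpretations above are easy to verify numerically. The following sketch (a hypothetical illustration with an assumed σ² = 25, not an example from the text) evaluates (6-18) and shows the reduction factor at work: below one in general, exactly zero at n = N, and practically one when N is enormous:

```python
def var_xbar(sigma2, n, N=None):
    # Variance of the sample mean. With N given, apply the reduction
    # factor (N - n)/(N - 1) of (6-18), for sampling without replacement;
    # N=None means an infinite population (or sampling with replacement),
    # i.e. equation (6-12).
    base = sigma2 / n
    if N is None:
        return base
    return base * (N - n) / (N - 1)

v_repl = var_xbar(25, 10)                  # (6-12): 2.5
v_finite = var_xbar(25, 10, N=50)          # (6-18): 2.5 * 40/49
v_census = var_xbar(25, 50, N=50)          # n = N: exactly 0
v_huge = var_xbar(25, 200, N=80_000_000)   # factor practically 1
```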

PROBLEMS

*6-14 In the game of bridge, cards are allotted points as follows:

    Cards                  Points
    All cards below jack     0
    Jack                     1
    Queen                    2
    King                     3
    Ace                      4

(a) For the population of 52 cards, find the mean and the variance of the number of points.
(b) In a randomly dealt hand of 13 cards, the number of points Y is a random variable. What are the mean and variance of Y? (Bridge players beware: no points counted for distribution.)
(c) What is Pr (Y ≥ 13)? (Hint. The distribution shape is approximately normal, as we might hope from the central limit theorem.)

*6-15 Rework Problem 6-9, assuming the population of people using the elevator is no longer very large, but rather
(a) N = 500.
(b) N = 50.

6-6 SAMPLING FROM BERNOULLI POPULATIONS

We have examined the distribution of a sample mean and a sample sum; the final statistic that we study is the sample proportion P, the one referred to in our poll of U.S. voters.

(a) The Bernoulli Population

First, we must be clear on the population from which the sample is drawn. We conceive of this as being made up of a large number of individuals, all marked D or R (Democrat or Republican). We can make this look like the familiar bowl of chips by relabelling each D with a 1 and each R with a 0. Thus, if the voting population of 150 million is comprised of 84 million Democrats and 66 million Republicans, the population probability distribution would be as shown in Table 6-3.

TABLE 6-3 A Bernoulli Population

    Variable x        Frequency      p(x)
    0 (Republican)    66,000,000     66,000,000/150,000,000 = .44
    1 (Democrat)      84,000,000     84,000,000/150,000,000 = .56

The population proportion π of Democrats is .56, which is also the probability, in sampling one individual at random, that a Democrat will be chosen. This is called a "Bernoulli" population, and its distribution is graphed later in Figure 6-6a. This is the simplest kind of probability distribution,

being lumped at only two values, 0 and 1. (Note that this population is as far from being normal as any that we will encounter.) Its mean and variance are easily computed, as in Table 6-4. In our example, μ = .56 and σ = .50. The reason that the arbitrary values of 0 and 1 were assigned to the population is now clear: this ensures that μ and π coincide.

TABLE 6-4 Calculation of μ and σ² for a Bernoulli Population

    x    p(x)       x p(x)    (x − μ)² p(x)
    0    (1 − π)    0         π²(1 − π)
    1    π          π         (1 − π)²π

    μ = π    (6-20)
    σ² = π(1 − π)    (6-21)
    σ = √(π(1 − π))    (6-22)
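The algebra of Table 6-4 can be checked directly from the definitions of mean and variance for a 0-1 variable; a sketch for the voter population with π = .56:

```python
pi = 0.56

# mu = sum of x * p(x) over the two values 0 and 1, equation (6-20)
mean = 0 * (1 - pi) + 1 * pi

# sigma^2 = sum of (x - mu)^2 * p(x), which simplifies to pi(1 - pi)
var = (0 - mean) ** 2 * (1 - pi) + (1 - mean) ** 2 * pi
sd = var ** 0.5   # about .50, as in the text
```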

(b) Bernoulli Sampling

We now ask, "What can we expect of a sample drawn from this sort of population?" The population is so large that even without replacement, the observations are practically independent; the probability of choosing a Democrat remains practically .56 regardless of whether or not we replace. If we take a sample of n = 50, let us say, we might obtain, for example, the following 50 numbers:

    0 1 1 0 1 0 0 1 0 1 1 ... 0 1 1    (6-23)

The sample sum, of course, will just be the number of Democrats in the sample. We recall encountering this before as a binomial random variable in Table 4-3; thus a binomial variable is simply a sample sum in disguise.

Why is this coincidence of any practical value? Suppose we wish to calculate the probability of at least 30 Democrats in a sample of 50. We could evaluate the probability of exactly 30 Democrats, of 31, of 32, and so on. This would require a major computational effort: not only are some twenty-odd probabilities involved, but in addition, each is extremely difficult to calculate. (As an exercise, the student should consider whether it is feasible to evaluate the probability of getting exactly 30 Democrats in a sample of 50, which is

    C(50, 30) (.56)³⁰ (.44)²⁰

where C(50, 30) is the binomial coefficient.) But we recognize that this is equivalent to calculating the probability that S, the sample sum taken from a Bernoulli population, is at least 30. This is very easy to calculate, because in the previous section we have completely described the distribution of any sample sum.

S in fact⁵ is approximately normally distributed, with the following mean and standard deviation. From (6-5) and (6-20),

    μ_S = nπ    (6-24) Binomial mean

and from (6-7), using (6-21),

    σ_S = √(nπ(1 − π))    (6-25) Binomial standard deviation

Hence the probability of at least 30 Democrats in a sample of 50 is Pr (S ≥ 30), which, in standardized form,⁶ is

    Pr ((S − μ_S)/σ_S ≥ (30 − 28)/√12.3) = Pr (Z ≥ .57) = .28    (6-26)

To confirm the usefulness of this normal approximation to the binomial, the student should compare this simple solution with the calculations involved in evaluating some twenty-odd expressions, each like the one in the footnote on p. 120.

⁵ S is graphed in Figure 6-5; the normal approximation is justified by the central limit theorem. A useful rule of thumb is that n should be large enough to make nπ > 5 and n(1 − π) > 5. If n is large, yet π is so small that nπ < 5, then there is a better approximation than the normal, called the Poisson distribution.

⁶ This graph clearly indicates that a better approximation to the binomial histogram would be the area under the curve above 29.5, not 30. This peculiarity arises from trying to approximate a discrete variable with a continuous one, and is therefore called the continuity correction. Our better approximation is

    Pr (S ≥ 30) ≈ Pr ((S − μ_S)/σ_S > (29.5 − 28)/√12.3) = Pr (Z > .43) ≈ .334    (6-27)

To keep the analysis uncluttered, this continuity correction is ignored in the rest of the book.
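With a computer, the "major computational effort" of the exact binomial sum is no longer prohibitive, which lets us check both approximations against it. A sketch:

```python
from math import comb, erf, sqrt

def normal_cdf(z):
    # Standard normal cumulative distribution, Phi(z), via erf.
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 50, 0.56
mu_S, sigma_S = n * p, sqrt(n * p * (1 - p))   # 28 and about 3.51

# Exact binomial: Pr(S >= 30) as a sum of twenty-one terms
exact = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(30, n + 1))

plain = 1 - normal_cdf((30 - mu_S) / sigma_S)        # as in (6-26): about .28
corrected = 1 - normal_cdf((29.5 - mu_S) / sigma_S)  # as in (6-27): about .334
```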

FIG. 6-5 Normal approximation to the binomial, with π known; S = number of Democrats in a sample of 50. Without the continuity correction (standardizing 30), Pr = .28; with the continuity correction (standardizing 29.5), Pr = .334; compare with the exact binomial answer, .337.

We now turn to the second major issue of this section: what is the distribution of the sample proportion P? Just as the total number of successes is merely the sample sum in disguise, so the sample proportion is merely the sample mean in disguise:

    P = S/n    (6-28)

(Compare Fig. 6-1a.) All our theory developed for X̄ can now be applied to determine the distribution of the sample statistic P. From (6-10) and (6-20) the mean of P is

    μ_P = π    (6-29)

From this we note that, on the average, the sample proportion P is on target; i.e., its average value is equal to the population proportion which (we shall see in Chapter 8) it will be used to estimate. But any specific sample P will be subject to sampling variation, and will typically fall above or below π. From (6-11) and (6-22) we discover that its standard deviation is

    σ_P = √(π(1 − π)/n)    (6-30)

Finally, since P is a sample mean, its distribution is normal for large samples (central limit theorem).

As an example, consider the population of voters shown in Figure 6-6a. What is the probability that in a random sample of 50 voters, between 50 and 60 percent will be Democrats? From (6-29) and (6-30),

    μ_P = π = .56
    σ_P = √(.56(1 − .56)/50) = .070

These two values completely define the normal distribution of P shown in Figure 6-6b. Even though our population distribution is nowhere near normal, our sample statistic P is approximately normal.

FIG. 6-6 Relation of the sample proportion to the population proportion of voters (compare Fig. 6-1b). (a) The population distribution, concentrated at the two values 0 and 1, with π = .56, μ = π = .56, and σ = √(π(1 − π)) = .50. (b) The distribution of the sample proportion P in a sample of 50 voters, with μ_P = π = .56 and σ_P = .070.

The evaluation of the area of this normal distribution between .50 and .60 is now a straightforward matter:

    Pr (.50 ≤ P ≤ .60) = Pr ((.50 − .56)/.070 ≤ Z ≤ (.60 − .56)/.070)
                       = Pr (−.857 ≤ Z ≤ .572) = .52
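The same standard-library helper reproduces this example's arithmetic; a sketch:

```python
from math import erf, sqrt

def normal_cdf(z):
    # Standard normal cumulative distribution, Phi(z), via erf.
    return 0.5 * (1 + erf(z / sqrt(2)))

pi, n = 0.56, 50
sd_P = sqrt(pi * (1 - pi) / n)           # about .070

lo = (0.50 - pi) / sd_P                  # about -.86
hi = (0.60 - pi) / sd_P                  # about .57
prob = normal_cdf(hi) - normal_cdf(lo)   # about .52
```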

PROBLEMS

(Note. If you want high accuracy in your answers, you should make continuity corrections.)

6-16 Suppose Gallup takes a poll of 1000 voters from a population which is 56 percent Democratic. Letting P be the sample proportion, find:
(a) Pr (.52 < P < .60).
(b) Pr (P > .5), i.e., the probability that the sample will correctly predict the election. Note how we are beginning to answer the problems raised in Chapter 1.

6-17 In tossing a fair coin 50 times, what is the probability that the sample proportion of heads will exceed .55?

6-18 If a fair die is rolled 100 times, what is the probability that at least a quarter of the rolls are aces? Answer two ways:
(a) Find Pr (P ≥ .25) for the sample proportion P of aces.
(b) Find Pr (total number of aces ≥ 25).

(6-19) What is the chance that of the first 100 babies born in the New Year, more than 60 will be boys?

6-20 What is the chance that of the first 8 babies born in the New Year, more than 6 will be boys? Answer two ways:
(a) Exactly, using the binomial distribution.
(b) Approximately, using the normal distribution.

6-7 SUMMARY OF SAMPLING THEORY

(a) General Sampling

1. The distribution of the sample mean X̄ is approximately normal for large samples, say n > 10 or 20 as a rule of thumb. (Moreover, if the population is near normal, then the mean of a much smaller sample will be approximately normal.)

2. X̄ will have an expectation equal to μ, the population expectation.

3. If we sample without replacement, X̄ will have a variance equal to:

    var X̄ = (σ²/n) (N − n)/(N − 1)

If the population (N) is very large, this reduces to, approximately:

    var X̄ = σ²/n

which is also the formula for the variance when we sample with replacement. Thus we may write:

    X̄ ~ N(μ, σ²/n)    (6-31)

which is a useful abbreviation for "X̄ is normally distributed with mean μ and variance σ²/n."

(b) Bernoulli Sampling

If we apply this sampling theory to a special population (chips coded 0 and 1), then we have the solution to the proportion problem. The sample proportion P is just a disguised X̄, and the population proportion π is just a disguised μ, so that

    P ~ N(π, π(1 − π)/n)    (6-32)

again assuming n is sufficiently large.

Review Problems

6-21 Five men, selected at random from a normal population with mean μ = 160 lb and standard deviation σ = 20 lb, get on an elevator. What is the probability that:
(a) The total weight is more than 850?
(b) The average weight is more than 170?
(c) All five men weigh more than 170?
(d) Give an intuitive reason why the answer to (c) is smaller than the answer to (b).

6-22 A man at a carnival pays $1 to play a game (roulette) with the following gross payoff:

    Y = Gross Winning    Probability
    0                    20/38
    $2                   18/38

so that his net winning in a game is Y − 1.
(a) What is the mean of his average net winning in a game?
(b) What is his approximate chance of ending up a loser (net loss) if he plays the game: (1) 5 times? (2) 25 times? (3) 125 times?
(c) How could you get an exact answer for (b)? How are these three answers related?
(d) How many times should he play if he wants to be 99% certain of losing?
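For part (a) of Problem 6-22, the mean net winning follows directly from the payoff table; a sketch using exact fractions so that no rounding intrudes:

```python
from fractions import Fraction

# Gross winning Y: $0 with probability 20/38, $2 with probability 18/38.
# He pays $1 per game, so his net winning is Y - 1.
p_win = Fraction(18, 38)
mean_gross = 2 * p_win       # E(Y) = 36/38
mean_net = mean_gross - 1    # E(Y - 1) = -1/19, about -5.3 cents a game
```

The negative mean is what drives parts (b) and (d): the more games he plays, the more certain a net loss becomes.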

6-23 Fill in the blanks.
(a) Suppose that in a certain election, the U.S. and California are alike in their proportion of Democrats, π, the only difference being that the U.S. is about 10 times as large a population. In order to get an equally reliable estimate of π, the U.S. sample should be ____ as large as the California sample.
(b) A certain length is measured with an error which, for simplicity, we suppose to be +2" or −2", equally likely. A sample of 100 independent measurements is taken, and the sample sum S computed.
1. Worst possible error: in a sample of 100, S could possibly be in error by as much as ____.
2. Likely error: however, we feel "fairly certain" (95%) that S is likely to be in error by no more than ____.

6-24 Let X̄ be the sample mean when a die is thrown 1000 times. Intuitively we feel that X̄ is "quite close" to μ. More precisely, calculate

    Pr (μ − .1 < X̄ < μ + .1)

(6-25) In making up a budget, a housewife rounds each item off to the nearest $1.00.
(a) If the budget consists of 200 items, what is the chance that the rounding error will exceed $1.00?
(b) Briefly state the assumptions necessary in the answer to (a).

(6-26) Suppose there are five men in a room, whose heights in inches are 62, 65, 68, 65, 65. One man is drawn at random, with his height denoted X.
(a) Graph the probability function of X, i.e., the population distribution. Find its mean μ and variance σ².
Suppose a sample of two men is drawn, with replacement, and the sample mean X̄ is calculated.
(b) Construct a table of the probability function of X̄. (Hint. List the possible samples, i.e., the sample space. Are the outcomes equally likely? For each outcome, calculate the sample mean X̄.)
(c) Graph the probability function of X̄.
(d) Find the mean and variance of X̄ from its probability distribution.
(e) Check your answers to (d) using the equations of this chapter.
(f) Is the following a valid interpretation of these formulas? If not, correct it: X̄ fluctuates around μ, sometimes larger, sometimes smaller, but exactly equal to μ on the average (μ_X̄ = μ). X̄, the average of n observations, does not fluctuate as much as a single observation, however (σ²_X̄ = σ²/n < σ²). This is to be expected, because in a sample, a large observation will often be "cancelled out" by a small observation, or at least swamped by the rest of the observations, which will be more typical.

(6-27) Repeat Problem 6-26 for a sample of 2 men drawn without replacement. Why is this sampling without replacement more precise, and hence preferable?

*6-28 In Chapter 3 we stated that relative frequency in the long run is "very likely" to be "close to" probability. To make this statement more precise for the rolling of a die, for example, let P denote the proportion of aces drawn in 10,000 throws, and calculate

    Pr (π − .01 < P < π + .01)
\177+.01)

chapter

Estimation

7-1

INTRODUCTION

induction,

statistical

beginning

Before

the concepts

we pause

Table

in

7-1

to review

of sample and population.

It

is essential

,it and

variance

calledpopulation

to remember
\177r
2

(though

so that its mean


unknown). These are

is fixed,

population

the

that

constants

are

generally

parameters.

the sample mean .V and sample variance s 2 are random


varying from sampleto sample,with a certain probability
distribution. For example,the distribution
of \177 was
found
to be approximately
.V \177. N (3t, o2/n) in Chapter
6. A random
variable
such as .V or s = which
is
calculated
from the observations in a sample
is given the technical name
By

contrast,

variables,

sample

statistic.

specific example of statistical


inference,
suppose
we wish to estimate
height
of American men on a large Midwestern campus.This
population'
mean
l.t is a fixed, but
unknown
parameter.
We estima_te it by
taking
a sample of 36 students,and
compute
the sample
mean X; let us
suppose this turns out to be 68 inches. We shall see in the next section that this
is our best single estimate or \"point
estimate\"
of/t.
But we also know, from
As a

the

average

TABLE 7-1 Review of Sample versus Population

Sample (a random subset of the population):
1. Relative frequencies f_i/n, which are used to compute
2. X̄ and s², which are examples of
3. Random statistics, or
4. Estimators.

Population:
1. Probabilities p(x), which are used to compute
2. μ and σ², which are examples of
3. Fixed parameters, or
4. Targets.

But we also know, from the theory of our previous chapter, that unless we are extremely lucky in our sample, this estimate X̄ will not be exactly on target, but rather a bit high or a bit low. Technically, X̄ is distributed around μ, above and below it, as shown in Figure 6-1b. If we want to be reasonably confident that our inference is correct, we cannot estimate μ to be precisely equal to our observed X̄; instead we must estimate that μ is bracketed by some interval known as a confidence interval, of the following form:

    μ = X̄ ± an error allowance    (7-1)

As an example, we might estimate the mean height to be

    μ = 68 ± 3 inches    (7-2)

In evaluating the right-hand side of (7-1), there is no problem with X̄; this is the simple calculation of the average of the sample values. The problem in this section is the evaluation of the error allowance.

First we must decide: "How confident do we wish to be that our interval estimate is right, that it does bracket μ?" It is common to choose a confidence level of 95%; in other words, we will use a technique that, in the long run, will give a correct interval estimate 19 times out of 20.

To get a confidence interval estimate of μ, we must be specific about the distribution of X̄. To keep inferences simple, we assume that X̄ is normally distributed according to the assumptions of Section 6-3; this is the normal distribution shown in Figure 7-1. Obviously, we will select the smallest range under the distribution of X̄ that will just enclose a 95% probability; this is the middle chunk, leaving 2½% probability excluded in each tail, as shown in Figure 7-1. From our normal tables, we note that this involves going above and below the mean by 1.96 standard deviations of X̄. We therefore write

    Pr (−1.96 < (X̄ − μ)/(σ/√n) < 1.96) = 95%    (7-3)

which follows directly from standardizing X̄ and noting, from the standard normal tables, that a standard normal variable lies between ±1.96 with 95% probability. The bracketed inequalities may be solved for μ, "turned around" so to speak, obtaining the equivalent statement:

    Pr (X̄ − 1.96 σ/√n < μ < X̄ + 1.96 σ/√n) = 95%    (7-4)

To prove (7-4), we could begin by writing (7-3) in the equivalent form

    Pr (μ − 1.96 σ/√n < X̄ < μ + 1.96 σ/√n) = 95%    (7-5)

The bracketed inequalities in (7-5) may then be solved for μ, obtaining the equivalent inequalities

    Pr (X̄ − 1.96 σ/√n < μ < X̄ + 1.96 σ/√n) = 95%    (7-6)

which proves (7-4).
FIG. 7-1 Distribution of the sample mean, X̄ ~ N[μ, (σ²/n)]. The central area, between μ − 1.96 σ/√n and μ + 1.96 σ/√n, is 95%; each excluded tail area is 2½%. (Note. μ is an unknown constant; we don't know what its value is; all we know is that, whatever μ may be, the variable X̄ is distributed around it as shown in this diagram.)

FIG. 7-2 Construction of twenty interval estimates: a typical result. At the top is the distribution of the sample mean, centered at 69 and running (with 95% probability) from 67 to 71; this is what we know, but the statistician does not. Below are his first, second, third, ..., twentieth interval estimates; so far, all bracket μ, and only one, the twelfth (based on X̄₁₂), misses.

We must be exceedingly careful not to misinterpret (7-4). μ has not changed its character in the course of this algebraic manipulation. It has not become a variable; it remains a population constant. Equation (7-4), like (7-3), is a probability statement about the random variable X̄, or more precisely, the "random interval" X̄ − 1.96(σ/√n) to X̄ + 1.96(σ/√n); it is this interval that varies, not μ.

To appreciate this fundamental point, let's clearly illustrate what is going on. Suppose we return to our problem of constructing an interval estimate of the average of men's heights on our campus. Moreover, suppose we have some supernatural knowledge of the population mean μ (which we know to be 69 inches) and of σ (which we know to be 6 inches). Now let's observe what happens when the statistician (poor mortal that he is) tries to estimate μ using (7-4) above. Just for the sake of illustration, let's suppose he makes 20 such interval estimates, each time from a different random sample of 36. Figure 7-2 illustrates his typical experience.

In this diagram we show the distribution of X̄, with mean equal to the population mean (69) and standard deviation

    σ/√n = 6/√36 = 1 inch

From (7-3) we know that there is a 95% probability that any sample mean X̄ will fall in the range 67 to 71 inches.

But the statistician doesn't know this; he blindly takes his first random sample, from which he computes the first sample mean X̄₁, which happens to be 70. From (7-4) he calculates his first interval estimate for μ:

    70 ± 1.96 (6/√36), that is, 68 to 72 inches    (7-7)

This is the first interval shown in Figure 7-2. In his second sample, the statistician happens to draw a shorter group of individuals, and duly computes X̄₂ to be 68 inches. From a similar evaluation of (7-4) he comes up with his second interval estimate shown in the diagram,

    68 ± 2 inches    (7-8)

and so on. We observe that nineteen of these twenty interval estimates bracket the constant μ. Only one, the twelfth, does not; in this case the statistician was wrong.

We gloss over one difficulty here. In evaluating (7-4) the statistician has an observed value for the sample mean, and he knows that the sample size n in this problem is 36. But there is one value he does not know: σ; for now, suppose he can use a reasonable approximation to it.
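The twenty-interval experiment of Figure 7-2 is easy to replay by simulation. In the sketch below (an illustration with an arbitrarily chosen seed, not part of the text), we draw 20 samples of n = 36 from a normal population with μ = 69 and σ = 6, form each interval as in (7-4), and count how many bracket μ; over many repetitions, about 95 percent will:

```python
import random
from math import sqrt

random.seed(1)  # arbitrary seed, for reproducibility only
mu, sigma, n = 69, 6, 36
half_width = 1.96 * sigma / sqrt(n)   # 1.96 inches, as in (7-7)

intervals = []
for _ in range(20):
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    intervals.append((xbar - half_width, xbar + half_width))

hits = sum(lo < mu < hi for lo, hi in intervals)  # typically 19 or 20
```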

We can easily see why he was right most of the time. For each interval estimate he is simply adding and subtracting 2 inches to his sample mean; this is the same ±2 inches that defines the range ab around μ. Thus, if and only if he observes a sample mean within the range ab, will his interval estimate bracket μ. Nineteen of the twenty sample means do fall in the range ab, and in all these instances his interval estimate was right. He was wrong in only the one instance when he observed a sample mean outside ab (i.e., X̄₁₂). In practice, of course, a statistician does not take many samples; he takes only one (e.g., X̄₁). And once this interval estimate is made, he is either right or wrong; this interval brackets μ or it does not. But the important point to recognize is that the statistician is using a method with a 95% probability of success; this follows because there is a 95% probability that his observed X̄ will fall within the range ab, and as a consequence his interval estimate will bracket μ. This is what is meant by a 95% confidence interval: the statistician knows that in the long run, 95% of the intervals he constructs in this way will bracket μ.

To review, we briefly emphasize the main points:

1. The population parameter μ is constant, and remains constant. It is the interval estimate that is a random variable, because it is computed from the random variable X̄. As long as X̄ is a random variable that can take on a whole range of values, the probability statement (7-4) is valid, and X̄ is referred to as an estimator of μ.

2. But once the sample has been observed and X̄ takes on one specific value (e.g., x̄ = 70 inches), it is then called an "estimate"³ of μ. Since it is no longer a random variable, probability statements are no longer strictly valid. For this reason, when x̄ is substituted into (7-4), it is no longer called a probability statement, but rather a 95% confidence interval:

    x̄ − 1.96 σ/√n < μ < x̄ + 1.96 σ/√n    (7-9)

Thus, our deduction in (7-4) that X̄ is within 1.96 σ/√n of μ is turned around into the induction that μ is within 1.96 σ/√n of the observed x̄. (7-9) is sometimes abbreviated to

    95% confidence interval: μ = x̄ ± 1.96 σ/√n    (7-10)

³ For emphasis, the random estimator is denoted by the capital letter X̄, while the estimate is denoted by the lower case letter x̄. We might call x̄ the realized value, and X̄ the potential value.

where 1.96 is the critical value leaving 2½% of the standard normal distribution in the upper tail.

To recapitulate once more:

3. Because of our omniscience (we knew the distribution of X̄ about the population mean), we know that the statistician erred only in his twelfth estimate. But the statistician himself has no idea which of his estimates, if any, are wrong. All he knows is that he will be right 95% of the time, in the long run.

4. It is not possible for him to be right for certain. Once an interval estimate is made, the "die is cast," and it is either right or wrong. Note how this verifies our casual conclusions in Chapter 1.

5. If we wish to be more confident of our conclusions, then we must leave less of the probability in each tail; thus the range ab increases, and our interval estimate becomes less precise.

6. As the sample size n is increased, the distribution of X̄ becomes more concentrated around μ (σ/√n decreases as n increases), and the confidence interval narrows (becomes more precise).

This raises an interesting question: "Why did we use the sample mean X̄ to estimate the population parameter μ?" (other sample statistics were feasible, for example). Intuitively, it seems preferable to estimate a mean with a mean; but there are stronger reasons, given in the next section.
PROBLEMS

7-1 An anthropologist measured the heights (in inches) of a random sample of 100 men from a certain population, and found the sample mean and variance to be 71 and 9 respectively.
(a) Find a 95% confidence interval for the mean height μ of the whole population.
(b) Find a 99% confidence interval.

7-2 A research study examines the consumption expenditures (in thousands of dollars) of a random sample of 50 American families (all at the same income and asset level). The sample mean is 5.2 and the standard deviation is .72. Construct a 95% confidence interval for the mean consumption of all American families (at this income and asset level).

7-3 The reaction times of 150 randomly selected drivers were found to have a mean of .83 sec and standard deviation of .20 sec. Find a 95% confidence interval for the mean reaction time of the whole population of drivers.
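Problems 7-1 to 7-3 all use (7-10), with the sample standard deviation standing in for the unknown σ (the approximation the text glossed over). A sketch, applied to the figures of Problem 7-1(a):

```python
from math import sqrt

def conf_interval(xbar, variance, n, z=1.96):
    # Large-sample confidence interval: xbar +/- z * s / sqrt(n),
    # as in (7-10), with the sample s used in place of sigma.
    half = z * sqrt(variance) / sqrt(n)
    return xbar - half, xbar + half

lo, hi = conf_interval(xbar=71, variance=9, n=100)   # 71 +/- 1.96(3/10)
```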

(7-4) From a very large class in statistics, the following 40 marks were randomly selected:

    74 83 78 64 55 58 86 71
    72 64 42 62 62 65 68 60
    76 86 65 58 49 64 56 50
    71 56 45 73 54 86 82 53
    73 70 58 74 57 75 58 87

Construct a 95% confidence interval for the average mark of the whole class. (Hint. Reduce your work to manageable proportions by grouping the marks into cells of width 5.)

7-5 What is the probability that a statistician who constructs 20 independent 95% confidence intervals will err:
(a) Once (as in our example in Section 7-1)?
(b) Not at all?
(c) More than once?

7-2 DESIRABLE PROPERTIES OF ESTIMATORS

To be perfectly general, we consider any population parameter θ, and denote an estimator of it by θ̂. (In our special example in the preceding section, μ is the population parameter θ, and X̄ is its estimator θ̂.) We would like the random variable θ̂ to vary within only a narrow range around its fixed target θ; thus, in our example in Figure 7-2, we should like the distribution of X̄ to be concentrated around μ, as close to μ as possible. We develop this notion of closeness in several ways.

(a) No Bias

An unbiased estimator is one that is, on the average, right on target, as shown in Figure 7-3a. Formally, we state:

Definition. θ̂ is an unbiased estimator of θ if

E(θ̂) = θ    (7-11)

For example, X̄ is an unbiased estimator of μ, because in fact

E(X̄) = μ    (6-10) repeated

Of course, an estimator θ̂ is called biased if E(θ̂) is different from θ; the bias is defined as this difference:

Definition.

Bias B ≜ E(θ̂) − θ    (7-12)

FIG. 7-3 Comparison of an unbiased and a biased estimator. (a) Unbiased estimator: the true θ = E(θ̂). (b) Biased estimator: E(θ̂) misses the true θ by the bias.

Bias is illustrated in Figure 7-3b. The distribution of θ̂ is "off target"; since E(θ̂) exceeds θ, there will be a tendency for θ̂ to over-estimate θ.

As an example of a biased estimator, consider the mean squared deviation of the sample,

MSD = (1/n) Σ (Xᵢ − X̄)²    (2-5a) repeated, (7-13)

On the average, it will underestimate the population variance σ² just a little. But if we inflate it by dividing by n − 1 instead of n, we obtain the sample variance

s² = (1/(n − 1)) Σ (Xᵢ − X̄)²    (2-6) repeated, (7-14)

which has been proved⁴ an unbiased estimator of the population variance σ².

The underestimation in (7-13) can be seen very easily in the case of n = 1. Then X̄ coincides with the single observation, so that Eq. (7-13) gives MSD = 0, an obvious underestimate of σ². On the other hand, Eq. (7-14), when n = 1, gives s² = 0/0, which is undefined. But this is not a drawback; in fact, it is a good way to warn the unwary that, since a sample of just one observation has no "spread," it cannot estimate the population variance σ² (assuming μ is unknown, of course).

⁴ When we say "it has been proved," we mean that it has been proved in the advanced texts. If it has been proved in this text, we shall usually say "we have proved."

FIG. 7-4 Comparison of an efficient and an inefficient estimator (both are unbiased). (a) Efficient. (b) Inefficient.

The student puzzled by our division by n − 1 in defining s² in Chapter 2 can now see why: we want to use this sample variance as an unbiased estimator of the population variance.

Both the sample mean and median are unbiased estimators of μ in a normal population; thus, in judging which is to be preferred, we must examine their other characteristics.
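The contrast between the biased MSD and the unbiased s² can be watched directly in a small sampling experiment. The sketch below, in modern Python, draws repeated samples of n = 5 from a hypothetical normal population with σ² = 1; the divisor (n versus n − 1) is the only difference between the two averages compared:

```python
import random
import statistics

random.seed(1)
n, reps = 5, 20000
msd_total = s2_total = 0.0
for _ in range(reps):
    x = [random.gauss(0, 1) for _ in range(n)]     # population variance is 1
    xbar = statistics.fmean(x)
    ssd = sum((xi - xbar) ** 2 for xi in x)        # sum of squared deviations
    msd_total += ssd / n                           # MSD, divisor n: biased low
    s2_total += ssd / (n - 1)                      # s^2, divisor n - 1: unbiased
print(round(msd_total / reps, 2), round(s2_total / reps, 2))
```

Averaged over many samples, MSD settles noticeably below 1 while s² settles near 1, which is the bias the text describes.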

(b) Efficiency

As well as being on target on the average, we should also like the distribution of an estimator to be concentrated; that is, to have a small variance. This is the notion of efficiency, shown in Figure 7-4; we describe the estimator in Figure 7-4a as more efficient because it has the smaller variance. A useful relative measure of the efficiency of two unbiased⁵ estimators θ̂ and θ̃ is:

Definition.

Relative efficiency of θ̂ (compared to θ̃) ≜ (var θ̃)/(var θ̂)    (7-15)

⁵ For biased estimators, the definition of relative efficiency is E(θ̃ − θ)²/E(θ̂ − θ)², which of course reduces to (7-15) if both estimators have 0 bias.

We are now in a position to pass judgement on the relative merits of the sample mean and median as estimators of μ in a normal population. We have already established that the sampling variance of X̄ is σ²/n. On the other hand, it has been proved that in large samples the variance of the sample median is

(π/2)(σ²/n)    (7-16)

Hence, in a large sample, the relative efficiency of the sample mean compared to the median is derived from (7-15) as

(π/2)(σ²/n) / (σ²/n) = π/2 ≈ 1.57    (7-17)

Because its variance is smaller, the sample mean is the more efficient estimator: it will tend to be closer to the target μ, and it will give us a more precise (i.e., smaller range) interval estimate. Of course, the variance of either estimator can be reduced by increasing the sample size (n); thus an alternative way of looking at the greater efficiency of the sample mean is to recognize that the sample median will yield as accurate a point or interval estimate only if we take a larger sample. Hence, using the sample mean is more efficient because it costs less to sample; note how the economic and statistical definitions of efficiency coincide. Therefore, the sample mean is preferred. Finally, an estimator that is more efficient than any other is called absolutely efficient, or simply "efficient."
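The π/2 ≈ 1.57 ratio can be checked by simulation. The modern Python sketch below uses a hypothetical normal population and a sample size of 101 (odd, so the median is a single observation):

```python
import random
import statistics

random.seed(2)
n, reps = 101, 4000          # odd n: the sample median is one observation
means, medians = [], []
for _ in range(reps):
    x = [random.gauss(0, 1) for _ in range(n)]
    means.append(statistics.fmean(x))
    medians.append(statistics.median(x))
# Relative efficiency as in (7-17): var(median) / var(mean), near pi/2 = 1.57
ratio = statistics.variance(medians) / statistics.variance(means)
print(round(ratio, 2))
```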

(c) Consistency⁶

Roughly speaking, a consistent estimator is one that concentrates completely on its target as the sample size increases indefinitely, as sketched in Figure 7-5. In the limiting case, as the sample size becomes infinite, a consistent estimator θ̂ will provide a perfect point estimate of the target θ.

We now state consistency more precisely. Just as the variance is a good measure of the spread of a distribution about its mean, so the

mean squared error ≜ E(θ̂ − θ)²    (7-18)

is a good measure of how the distribution of θ̂ is spread about its target value θ. Consistency requires this to be zero in the limit:

Definition. θ̂ is consistent⁶ if

E(θ̂ − θ)² → 0 as n → ∞    (7-19)

⁶ This definition is sometimes called "consistency in mean-square." It implies a condition called "consistency in probability": for any positive δ (no matter how small),

Pr (|θ̂ − θ| < δ) → 1 as n → ∞    (7-20)

This is often taken as the definition of consistency.

FIG. 7-5 A consistent estimator, showing how the distribution of θ̂ concentrates on its target θ as n increases (e.g., from n = 50 to n = 100).

Mean squared error is related to bias and variance by the following theorem.⁷

Theorem.

E(θ̂ − θ)² = var θ̂ + B²    (7-21)

⁷ Proof. Let μ* ≜ E(θ̂). Then E(θ̂ − θ)² = E[(θ̂ − μ*) + (μ* − θ)]² = E(θ̂ − μ*)² + 2(μ* − θ)E(θ̂ − μ*) + (μ* − θ)² = var θ̂ + 0 + B².

Corollary. θ̂ is a consistent estimator iff⁸ its variance and bias both approach zero, as n → ∞.    (7-22)

⁸ "iff" is an abbreviation for "if and only if."

If only the bias approaches zero, the estimator is called "asymptotically unbiased," a condition that is clearly weaker than consistency. Asymptotic unbiasedness is also a weaker condition than unbiasedness, since the latter applies for all n, not just as n → ∞.

Consistency does not guarantee that an estimator is a good one. For example, as an estimator of the mean of a normal population, the sample median is consistent;¹⁰ but it is not a good estimator, because the sample mean is preferred: it is both consistent and efficient.

¹⁰ To prove consistency, we use corollary (7-22), noting that the sample median has zero bias, and a variance given by (7-16) which approaches zero.

As a final example, the sample MSD is a consistent estimator of the population variance σ², even though it is a biased estimator; i.e., it is asymptotically unbiased.¹¹ As n → ∞, its bias tends to zero; it can also be proven that its variance tends to zero; thus the conditions of corollary (7-22) are satisfied. This concept of a biased, yet consistent estimator is a very important one in, for example, econometrics.

¹¹ Mathematically, note that MSD = ((n − 1)/n) s². Thus MSD → s² as n → ∞. Since s² is unbiased (for any n), it follows that MSD is asymptotically unbiased.
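Consistency of X̄ in the mean-square sense of (7-19) can also be watched numerically: the empirical mean squared error of X̄ about μ shrinks roughly like 1/n. A modern Python sketch, with a hypothetical population (μ = 0, σ² = 1):

```python
import random
import statistics

random.seed(3)

def mse_of_mean(n, reps=2000, mu=0.0):
    """Empirical E(Xbar - mu)^2 for samples of size n from N(mu, 1)."""
    return statistics.fmean(
        (statistics.fmean(random.gauss(mu, 1) for _ in range(n)) - mu) ** 2
        for _ in range(reps)
    )

mses = [mse_of_mean(n) for n in (10, 100, 1000)]
print([round(m, 4) for m in mses])  # roughly 1/n: about .1, .01, .001
```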

PROBLEMS

7-6 True or false? If false, correct it.
(a) The sample proportion P is used to estimate the population proportion π, and is unbiased.
(b) μ is a random variable (varying from sample to sample), used to estimate the fixed parameter X̄.

7-7 Based on a sample of 2 observations, X₁ and X₂, consider the following two estimators of μ:

X̄ = (1/2)X₁ + (1/2)X₂  and  W ≜ (1/4)X₁ + (3/4)X₂

(a) Prove that they are both unbiased.
(b) Which is more efficient? What is the relative efficiency of W compared to X̄?

7-8 A farmer has a square field, whose area he wants to estimate. He measures the length of the field, but he makes a random error, so that his observed length X₁ is a normal variate centered at 200 (the true but unknown value) with σ = 20. Worried about his possible error, he decides to take a second observation X₂ of the length, and is in a dilemma as to how to proceed:
(1) Should he average X₁ and X₂ first, and then square the average? or
(2) Should he square each observation first, and then average the squares?
(a) Are methods (1) and (2) really different, or are they just 2 different ways of saying the same thing? (Hint. Try a couple of actual values, like X₁ = 200 and X₂ = 230, and work out (1) and (2).)
(b) If they are different, which is preferable, i.e., which has less bias? (Hints. This will be easier if you avoid arithmetic by generalizing from a length of 200 feet to a length of μ, and use expectation. Furthermore, note that normality is irrelevant to questions of bias. Use equation (4-5): E(X²) = μ² + σ².)
(c) Generalize (b) to a sample of n measurements.

7-9 As in Problem 6-5b, consider a bowl of many chips: one-third marked 2, one-third marked 4, and one-third marked 6. When a sample of 2 chips is drawn, construct the probability table of X̄, and hence:
(a) Show (once more) that X̄ is an unbiased estimator of μ.
(b) Is (X̄)² an unbiased estimator of μ²? (Compare Problem 7-8.)
(c) Is 1/X̄ an unbiased estimator of 1/μ?
(d) Is (2X̄ − 1) an unbiased estimator of (2μ − 1)?

7-10 To illustrate bias very concretely, consider a sample of 2 tosses of a fair die. The population moments are easily computed:

μ = 3.5,  σ² = 35/12

We shall study the sample estimators in 2 ways.
(a) Empirical approach (Monte-Carlo technique). You can simulate a sample of 2 tosses of the die with the random digits of Appendix Table II. Repeat the experiment many times. (If each student does it, say, 5 times, and the results from the whole class are pooled, this will save work.) It will be convenient to array the data in a table like:

Result of 2 Tosses    X̄     MSD    s²
(3, 1)                2      1      2
(2, 5)                3.5    2¼     4½
. . .
Averages

After calculating the averages, answer:
(1) Does X̄ average close to μ?
(2) Does MSD average close to σ²?
(3) Does s² average close to σ²?
(b) Theoretical approach. In (a), if the experiment were repeated endlessly, the relative frequencies would settle down to the theoretical probabilities; these probabilities can be calculated very easily by exploiting the symmetry of the dice. After calculating the relevant probability table, answer questions (1), (2), and (3) again.
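The Monte-Carlo experiment of Problem 7-10(a) can equally well be run by machine instead of with the random digits of Appendix Table II. A modern Python sketch:

```python
import random

random.seed(5)
reps = 50000
xbar_sum = msd_sum = s2_sum = 0.0
for _ in range(reps):
    a, b = random.randint(1, 6), random.randint(1, 6)   # two tosses of a die
    xbar = (a + b) / 2
    ssd = (a - xbar) ** 2 + (b - xbar) ** 2
    xbar_sum += xbar
    msd_sum += ssd / 2          # MSD (divisor n = 2)
    s2_sum += ssd / 1           # s^2 (divisor n - 1 = 1)
# Compare the three averages with mu = 3.5 and sigma^2 = 35/12 = 2.92
print(round(xbar_sum / reps, 2), round(msd_sum / reps, 2), round(s2_sum / reps, 2))
```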

*7-3 MAXIMUM-LIKELIHOOD ESTIMATION (MLE)

(a) Introduction

The question arises, "Does some general technique exist for finding estimators with these attractive characteristics?" The maximum-likelihood technique is one that most statisticians use. We introduce it with the example of sampling from a Bernoulli population; to be concrete, suppose we flip a biased coin 10 times in order to estimate π, the population proportion of heads,¹² and get 4 heads.

We shall temporarily forget the common-sense solution (estimate π with the sample proportion P = 4/10), in order to develop some general ideas. With the sample result of 4 heads in 10 tosses (trials) before us, we might ask ourselves how likely this sample would be if π were, say, .1. If π = .1, the probability of getting four heads (successes) in our ten tosses would be, according to the binomial formula,

(10 choose 4) π⁴ (1 − π)⁶ = (10 choose 4) (.1)⁴ (.9)⁶ ≈ .011

That is, if π were .1, there is only about one chance in a hundred that this sort of population would yield the 4 heads we observed. Similarly, the student can verify that if π were .8, the probability that such a population would yield 4 heads is only about .006; again it seems implausible that a population with π = .8 would generate the sample result we observed.

¹² This π is, of course, the population probability of heads. But, for simplicity, we refer hereafter only to the population "proportion."

Similarly, we consider all other possible values for π, in each case asking how likely it is that this value of π would yield the sample that we in fact observed. The results are shown in the first column of Table 7-2, and graphed in Figure 7-6. We refer to this as the likelihood function: the sample values of 4 and 10 are fixed, and the only variable in the function is the hypothetical value of π. For emphasis, we often write this as a function of π alone:

L(π) = (10 choose 4) π⁴ (1 − π)⁶

The maximum-likelihood estimate (π = .4) is the value of π maximizing this likelihood function. In general:

Definition. The MLE is the hypothetical population value which maximizes the likelihood of the observed sample.    (7-23)

We note:
(a) The sample proportion P is our MLE of the population proportion π; it is often, but not always, the case that the corresponding sample value is the MLE of the population parameter.
(b) Figure 7-6 is the likelihood function for the particular sample we observed (i.e., 4 heads in 10 tosses). A different sample result would call for a different likelihood function, and hence a different MLE.
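The likelihood function graphed in Figure 7-6 can be tabulated directly. The modern Python sketch below evaluates L(π) = (10 choose 4) π⁴(1 − π)⁶ on a grid of hypothetical values of π and confirms that the maximum falls at π = .4:

```python
from math import comb

def L(pi, x=4, n=10):
    """Binomial likelihood of the observed sample: x heads in n tosses."""
    return comb(n, x) * pi**x * (1 - pi)**(n - x)

grid = [i / 100 for i in range(101)]   # hypothetical values of pi: 0, .01, ..., 1
mle = max(grid, key=L)
print(mle, round(L(mle), 3))   # the grid maximum is at pi = 0.4
```

The same code with x = 6, n = 8 sketches the likelihood function asked for in Problem 7-11.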
FIG. 7-6 An example of a likelihood function: L(π), the likelihood that various hypothetical population proportions π would yield the sample we observed. (Horizontal axis: hypothetical values of the population proportion π, from 0 to 1.0; the value π = .4 gives the maximum.)
FIG. 7-7 The binomial probability function p(x; π) plotted against both x and π. Slice a: p(x; π), the probability of x successes in 10 trials, given π = .8. Slice b: L(π), the likelihood of various hypothetical values of π yielding the given sample of 4 successes in 10 trials (same as in Fig. 7-6).

In Figure 7-7 this discussion is related to our previous deductions about the binomial in Chapters 3, 4, and 6. In this figure we graph the binomial probabilities p(x; n, π). [Since n is set at 10 regardless, this function is referred to simply as p(x; π).] In earlier chapters we regarded π as fixed and x as variable, as in slice a; thus the dotted function shows the probability of various numbers of heads if the population proportion is given as .8. In this chapter we regard x, the observed sample result, as fixed, while the population π is thought of as taking on a whole set of hypothetical values; thus slice b shows the likelihood that the various possible population proportions would yield 4 heads. Slices in the x direction are referred to as probability functions, while slices in the π direction are called likelihood functions.

We now generalize maximum-likelihood estimation. (A summary of our results is shown in the last three columns of Table 7-2 for reference.)

(b) General Binomial

It is very easy to show that our result in the previous section was no accident, and that the maximum-likelihood estimate of the binomial π is always the sample proportion P. Given any observed sample of x successes in n trials, the likelihood function is

L(π) = (n choose x) πˣ (1 − π)ⁿ⁻ˣ    (7-24)

With calculus, it can easily be shown¹³ that the maximum value of this likelihood function occurs when π = x/n, that is, at the sample proportion. Thus

[MLE of π = P, the sample proportion.]

We argued in Chapter 1 that it is reasonable to use the sample proportion to estimate the population proportion; but in addition to its intuitive appeal, we now add the more rigorous justification of maximum likelihood: a population with π = P would generate, with the greatest likelihood, the sample we observed.

(c) MLE of the Mean μ of a Normal Population

Suppose we have drawn a sample (x₁, x₂, x₃) from a parent population which is N(μ, σ²); our problem is to find the MLE of the unknown population mean μ. Because the population is normal, the probability of getting any value x, given μ, is

p(x; μ) = (1/(√(2π) σ)) e^−(1/2σ²)(x − μ)²    (7-26)

Specifically, the probabilities of drawing the values x₁ and x₂ that we observed in our sample are, respectively,

p(x₁; μ) = (1/(√(2π) σ)) e^−(1/2σ²)(x₁ − μ)²    (7-27)

p(x₂; μ) = (1/(√(2π) σ)) e^−(1/2σ²)(x₂ − μ)²    (7-28)

¹³ To find where L(π) is a maximum, set the derivative equal to zero:

dL(π)/dπ = (n choose x)[πˣ (n − x)(1 − π)ⁿ⁻ˣ⁻¹ (−1) + x πˣ⁻¹ (1 − π)ⁿ⁻ˣ] = 0    (7-25)

Dividing (7-25) by πˣ⁻¹ (1 − π)ⁿ⁻ˣ⁻¹, it becomes

−π(n − x) + x(1 − π) = 0
−nπ + x = 0
π = x/n

You can easily confirm that this is a maximum (rather than a minimum or inflection point).

FIG. 7-8 Maximum-likelihood estimation of the mean (μ) of a normal population, based on three sample observations (x₁, x₂, x₃). (a) Small likelihood L(μₐ), the product of the three ordinates. (b) Large likelihood L(μᵦ).

Similarly,

p(x₃; μ) = (1/(√(2π) σ)) e^−(1/2σ²)(x₃ − μ)²    (7-29)

We assume, as usual, that X₁, X₂, and X₃ are independent, so that the joint probability function is the product of (7-27), (7-28), and (7-29):

p(x₁, x₂, x₃; μ) = Π p(xᵢ; μ)    (7-30)

where Π means "the product of," just as Σ means "the sum of." But in our estimation problem the sample values xᵢ are fixed, and only μ is thought of as varying over hypothetical values; we shall speculate on these various possible values of μ, with a view to selecting the most plausible. Thus (7-30) can be written as a likelihood function of μ:

L(μ) = Π p(xᵢ; μ)    (7-31)

The MLE of μ is defined as the hypothetical value of μ maximizing the likelihood function (7-31). Its value may be derived with calculus; but we consider only a geometric interpretation, in Figure 7-8, in which the sample values x₁, x₂, x₃ are regarded as fixed. We "try out" two hypothetical values of μ and ask, "Which is more likely to generate the sample we observed?" The population with mean μₐ shown in Figure 7-8a is not very likely to generate the sample (x₁, x₂, x₃): the ordinate p(x₃; μₐ) is very small, because x₃ is so far distant from μₐ, and the product of the three ordinates [i.e., the likelihood of generating the sample] is therefore small. On the other hand, the likelihood of the population with mean μᵦ shown in Figure 7-8b is large: since the xᵢ are collectively closer to μᵦ, they have a greater joint probability. Thus a population with mean μᵦ is more likely to generate the sample than one with mean μₐ, and μᵦ is the more plausible estimate of μ. Indeed, little additional shift in the hypothetical mean seems to be required; μᵦ is very nearly the value of μ that maximizes the likelihood. It might seem that the MLE of μ is, in fact, the sample mean X̄; this can be proved mathematically, as the reader is asked to do in Problem 7-12.

Finally, the reader who has carefully learned that μ is a fixed population parameter may wonder how it can appear in the likelihood function (7-31) as a variable. This is simply a mathematical convenience. The true value of μ is, in fact, fixed; but since it is unknown, in MLE we must consider all of its possible, or hypothetical, values; i.e., we treat it as a variable.

(d) MLE of any Population Parameter θ

We now state MLE in full generality. A sample (x₁, x₂, ..., xₙ) is drawn (with replacement, or from an infinite population) from a population with the probability function p(x; θ), where θ is any unknown population parameter we wish to estimate. Since the xᵢ are independent, each with probability function p(xᵢ; θ), the probability of the whole sample is obtained by multiplying:

p(x₁, x₂, ..., xₙ; θ) = Π p(xᵢ; θ)    (7-32)

But, just as before, we regard the observed sample values as fixed, while the parameter θ takes on a whole set of hypothetical values.

Renaming (7-32) accordingly, the likelihood function is

L(θ) = Π p(xᵢ; θ)    (7-33)

The MLE is that hypothetical value of θ that maximizes this likelihood function.
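The claim that the MLE of a normal mean is X̄ (to be proved in Problem 7-12) can be checked numerically. Maximizing the logarithm of the likelihood (7-31) is equivalent and avoids very small products; the sample and grid below are hypothetical, with σ assumed known (a modern Python sketch):

```python
import math

sample = [3.1, 4.6, 5.3]                      # hypothetical observations
sigma = 1.0                                   # assume sigma is known

def log_L(mu):
    """Log of the likelihood (7-31): the sum of log p(x_i; mu)."""
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (x - mu)**2 / (2 * sigma**2) for x in sample)

grid = [i / 1000 for i in range(2000, 7001)]  # hypothetical mu from 2 to 7
best = max(grid, key=log_L)
xbar = sum(sample) / len(sample)
print(best, round(xbar, 3))   # both print as 4.333
```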

(e) Maximum Likelihood versus Method of Moments Estimation (MLE vs MME)

In the analysis above, we have estimated a population proportion with a sample proportion, and a population mean with a sample mean. Why not always use this technique, and estimate any population parameter with the corresponding sample value? This is known as method of moments estimation (MME). Its great advantage is that it is plausible and easy to understand. Moreover, the two methods, MME and MLE, often do coincide. But suppose the two methods differ (as in Problem 7-14)? In such a circumstance MLE is usually superior.

The intuitive appeal of MME is more than offset by the following impressive advantages of MLE. Since the MLE is the population value most likely to generate the sample values observed, it is in some sense the population value that "best matches" the observed sample. In addition, under broad conditions, MLE has the following asymptotic properties:

1. Efficient, that is, with smaller variance than any other estimator.
2. Consistent, and asymptotically unbiased, with variance tending to zero.
3. Normally distributed, with easily computed mean and variance; hence the MLE may be readily used to make inferences.

For example, we have already seen that these three properties are true for X̄, the MLE of μ in a normal population. [Property 3 follows from Theorem (6-13); Property 2 follows from (6-10) and (6-11); Property 1 is proved in advanced texts, and has been alluded to in (7-17).]

We emphasize that these properties are asymptotic; that is, they hold for large samples, as n → ∞. But for the small samples often used, by economists for example, MLE is not necessarily best.

PROBLEMS

*7-11 Following Figure 7-6, graph the likelihood function for a sample of 6 heads in 8 tosses of a coin; show the MLE.

*7-12 Derive the MLE of μ, for a sample from a normal population, using calculus.

*7-13 (a) Derive the MLE of σ² for a normal distribution, assuming μ is known.
(b) Is it unbiased?

*7-14 As the N delegates arrived at a convention, they were given tags numbered successively 1, 2, 3, ... N. In order to estimate the unknown number of delegates N, a brief walk in the corridor provided a random sample of 5 tags, numbered:

37, 16, 44, 43, 22

(a) What is the MLE of N? Is it biased?
(b) What is the MME of N? Is it biased?

FURTHER READING

For a more detailed description of the virtues of MLE, see for example:
1. Wilks, S. S., Mathematical Statistics, New York: John Wiley & Sons (1962).
2. Lindgren, B. W., Statistical Theory, New York: Macmillan (1959).

chapter 8

Estimation II

8-1 DIFFERENCE IN TWO MEANS

In the previous chapter, we used a sample mean to estimate a population mean. In this chapter we will develop several other similar examples of how a sample statistic is used to estimate a population parameter.

Whenever two population means are to be compared, it is usually their difference that is important, rather than their absolute values. Thus we often wish to estimate

μ₁ − μ₂    (8-1)

A reasonable estimate of this difference in population means is the difference in sample means

X̄₁ − X̄₂    (8-2)

(Assuming normality of the parent populations, this is the maximum-likelihood estimator, with many attractive properties.)

Again, because of the error in point estimates, we are typically interested in an interval estimate. Its development is comparable to the argument in Section 7-1, and involves two steps: the distribution of our estimator (X̄₁ − X̄₂) must be deduced; then this can be "turned around" to make an inference about the population parameter (μ₁ − μ₂).

First, how is the estimator (X̄₁ − X̄₂) distributed? From (6-31) we know that the first sample mean X̄₁ is approximately normally distributed around the population mean μ₁, as follows:

X̄₁ ~ N(μ₁, σ₁²/n₁)    (8-3)

where σ₁² represents the variance of the first population, and n₁ the size of the sample drawn from it. Similarly,

X̄₂ ~ N(μ₂, σ₂²/n₂)    (8-4)

FIG. 8-1 The distribution of the estimator (X̄₁ − X̄₂): a probability density centered at (μ₁ − μ₂), with standard deviation √(σ₁²/n₁ + σ₂²/n₂).

Independence of the two sampling procedures can be assumed; hence the random variables X̄₁ and X̄₂ are independent, and (5-31) and (5-34) can be applied directly:

(X̄₁ − X̄₂) ~ N(μ₁ − μ₂, σ₁²/n₁ + σ₂²/n₂)    (8-5)

This distribution of (X̄₁ − X̄₂) is shown in Figure 8-1. Equation (8-5) is exactly true, assuming both populations are normal; it still remains approximately true (by the central limit theorem) for large samples drawn from practically any two populations.

Under these conditions, our knowledge in (8-5) of how the estimator behaves can now be "turned around" to construct the confidence interval for the difference in the means. With 95% confidence,

(μ₁ − μ₂) = (X̄₁ − X̄₂) ± 1.96 √(σ₁²/n₁ + σ₂²/n₂)    (8-6)

When the two populations have a common variance, say σ², the 95% confidence interval (8-6) becomes

(μ₁ − μ₂) = (X̄₁ − X̄₂) ± 1.96 σ √(1/n₁ + 1/n₂)    (8-7)

The variances σ₁² and σ₂² in (8-6) are usually not known; the best the statistician can do is guess at them, with the values s₁² and s₂² he observed in his two samples. Provided his sample is large, this is an accurate enough approximation; but with a small sample, this introduces a new source of error. The student will recall that this same problem was encountered in estimating a single population mean in Section 7-1. In the next section we shall give the solution for these problems of small-sample estimation.
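Formula (8-6) translates directly into a short computation. A modern Python sketch (the two samples' figures are hypothetical, not drawn from the problems that follow):

```python
import math

def diff_ci(xbar1, s1, n1, xbar2, s2, n2, z=1.96):
    """95% interval (8-6) for mu1 - mu2, with s1, s2 standing in
    for the unknown sigma1, sigma2 (large-sample approximation)."""
    half_width = z * math.sqrt(s1**2 / n1 + s2**2 / n2)
    d = xbar1 - xbar2
    return d - half_width, d + half_width

# Hypothetical samples: means 10.0 and 9.2, sd's 1.5 and 2.0, sizes 80 and 60.
lo, hi = diff_ci(10.0, 1.5, 80, 9.2, 2.0, 60)
print(round(lo, 2), round(hi, 2))  # about 0.2 and 1.4
```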

PROBLEMS

8-1 A random sample of 100 workers in one large plant took an average of 12 minutes to complete a task, with a standard deviation of 2 minutes. A random sample of 50 workers in a second large plant took an average of 11 minutes to complete the task, with a standard deviation of 3 minutes. Construct a 95% confidence interval for the difference between the two population averages.

8-2 Two samples of 100 seedlings were grown with two different fertilizers. One sample had an average height of 10 inches and a standard deviation of 1 inch. The second sample had an average height of 10.5 inches and a standard deviation of 3 inches. Construct a confidence interval for the difference between the average population heights:
(a) At the 95% level of confidence.
(b) At the 90% level of confidence.

8-3 A random sample of 60 students was taken in two different universities. The first sample had an average mark of 77 and a standard deviation of 6. The second sample had an average mark of 68 and a standard deviation of 10.
(a) Find a 95% confidence interval for the difference between the mean marks in the two universities.
(b) What increase in the sample size would be necessary to cut the error allowance by 1/2?
(c) What increase in the sample size would be necessary to reduce the error allowance to 1.0?

8-2 SMALL SAMPLE ESTIMATION: THE t DISTRIBUTION

We shall assume in this section that the populations are normal.

(a) One Mean, μ

In estimating a population mean μ from a sample mean X̄, the statistician generally has no information on the population standard deviation σ; hence he uses the estimator s, the sample standard deviation.
into

FIG. 8-2 The standard normal distribution and the t distribution compared. (As d.f. → ∞, the t distribution, p(t), becomes the same as the normal, with t.₀₂₅ = z.₀₂₅ = 1.96; the curves for d.f. = 5 and d.f. = 2 are more spread out, and with d.f. = 2, t.₀₂₅ = 4.30.)

Substituting s into (7-10), he estimates the 95% confidence interval for μ as

μ = X̄ ± z.₀₂₅ (s/√n)    (8-8)

Provided his sample is large (at least 25 to 50, depending on the precision required), this will be an accurate approximation. But with a smaller sample size, this substitution introduces an appreciable source of error. Hence, if he wishes to remain 95% confident, his interval estimate must be broadened. How much?

Recall that X̄ has a normal distribution; when σ is known, we may standardize, obtaining

Z = (X̄ − μ)/(σ/√n)    (8-9)

where Z is the standard normal variable. By analogy, we introduce a new variable,¹ defined as

t = (X̄ − μ)/(s/√n)    (8-10)

The similarity of these two variables is immediately evident. The only difference is that Z involves σ, which is generally unknown; but t involves s, which can always be calculated from an observed sample. The precise distribution of t, like that of Z, has been derived by mathematicians and is shown in Table V of the Appendix. The distribution of t is compared to Z in Figure 8-2.

¹ This t variable was first introduced by Gosset, writing under the pseudonym "Student," and was later proved valid by R. A. Fisher. We make no attempt to develop the entire proof, because it is not very instructive. It can be found in almost any mathematical statistics text.

(We must emphasize a break in notation. Until now, to conform to this convention, capital letters denoted random variables, while small letters denoted their realized values. But from now on, in order to conform to common usage, we shall use the small letters t and s to represent either random variables or realized values, and entirely forget the distinction for them; we keep the capital letters X̄, X, Z, P, etc.)

As expected, the t distribution is more spread out than the normal, since the use of s rather than σ introduces a degree of uncertainty. Moreover, while there is one standard normal distribution, there is a whole family of t distributions. With small sample size, this distribution is considerably more spread out than the normal; but as sample size increases, the t distribution approaches the normal, and for samples of about 50 or more, the normal

becomesa very
The
rather

accurate

x 2 we

approximation.
t is
not tabled

may write

d.f.
For example, for a
Appendix Table V that
in the upper tail is

s'
-----

sample with
the

in

8-2.

Figure
Pr

Substituting

for

critical

(--4.30

t value

a sample

for

of size

now

be

--

(8-11)

d.f. --

2, and
2\177

leaves

which

we

find

from

probability

4.30

<

<\177t

it

that

follows

4.30)=

for any observed

(8-12)

95\177

(8-10)'

Pr --4.30<
This deduction can

3, then

n =

By symmetry,

! according to

of freedom

degrees

t.o\177os =

This is shown

to sample size (n), but


divisor
in s 2. Thus, in cal-

according

of freedom,\" the

to \"degrees

according

culating

of

distribution

s/x//\177

\"turned

3, the 95 \177 confidence


= x

<

= 955/0

4.30

around\"

into the following inference:

interval

+ 4.30

for/z

is

(8-14)
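The arithmetic of (8-14) can be sketched in a few lines of code. The three observations below are hypothetical, and the critical value t.025 = 4.30 for d.f. = 2 is the Table V figure quoted above.

```python
from statistics import mean, stdev

sample = [64, 70, 76]        # hypothetical sample of n = 3 observations
n = len(sample)
x_bar = mean(sample)         # X-bar = 70
s = stdev(sample)            # the divisor is n - 1 = 2, the degrees of freedom
t_crit = 4.30                # t.025 for d.f. = 2 (Appendix Table V)

allowance = t_crit * s / n ** 0.5
print(f"95% CI for mu: {x_bar - allowance:.1f} to {x_bar + allowance:.1f}")
```

Here s works out to 6 exactly, so the interval is 70 ± 4.30(6/√3), i.e., roughly 55.1 to 84.9; the wide interval reflects the very small sample.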

The phrase "degrees of freedom" is explained in the following intuitive way: Originally there are n degrees of freedom in a sample of n observations. But one degree of freedom is used up in calculating X̄, leaving only n − 1 degrees of freedom for the residuals (Xᵢ − X̄) to calculate s².

For example, consider a sample of two observations, 21 and 15, say. Since X̄ = 18, the residuals are +3 and −3, the second residual necessarily being just the negative of the first. While the first residual is "free," the second is strictly determined; hence there is only 1 degree of freedom in the residuals.

Generally, for a sample of size n, it may be shown that if the first n − 1 residuals are specified, then the last residual is automatically determined by the requirement that the sum of all residuals be zero, i.e., Σ(Xᵢ − X̄) = 0.
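The residual bookkeeping in this footnote is easy to check mechanically; the two-observation sample 21, 15 is the one used in the footnote.

```python
sample = [21, 15]
x_bar = sum(sample) / len(sample)        # X-bar = 18
residuals = [x - x_bar for x in sample]  # +3 and -3

# The residuals must sum to zero, so the last one is fully determined
# by the first n - 1: it equals minus their sum.
assert sum(residuals) == 0
last_residual = -sum(residuals[:-1])
print(residuals, last_residual)
```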

THE t DISTRIBUTION    155

For a general sample size n, the 95% confidence interval for μ is

μ = X̄ ± t.025 (s/√n)    (8-15)

where t.025 is the critical t value leaving 2½% of the probability in the upper tail, with n − 1 degrees of freedom.

To sum up, we note the similarity of t estimation and normal estimation: the only difference between (8-15) and (7-10) is that an observed sample value (s) must be substituted for σ, and a critical t value for the critical normal value. An important practical question is, "When do we use the normal and when the t distribution?" When σ is known, the normal distribution is used; if σ is unknown, then the t distribution is theoretically correct, regardless of sample size. However, if the sample size is large, the normal is an accurate enough approximation of the t.³ So in practice the t distribution is used only for small samples when σ is unknown, and the normal is used otherwise.

Estimation from the t distribution has one additional requirement for all our small-sample estimation: the parent population from which the sample is drawn is assumed normal. (But normality is a requirement even if σ is known. Recall that inference about a nonnormal population was validated by the central limit theorem only if the sample size was large.)
As sample size (n) decreases, estimation becomes less precise (i.e., interval estimates become wider). The two reasons for this are clearly distinguished in (8-15). First, the divisor √n becomes smaller. This appears in (7-10) as well as in (8-15); thus even if σ is known and inference is based on the normal distribution, the error allowance increases and the interval estimate becomes wider as a consequence. The secondary reason for loss of precision occurs if s must be substituted for an unknown σ. The smaller the sample, the more the appropriate t distribution will depart from the normal; and the more spread the t distribution, the broader the interval estimate.
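The two sources of widening can be seen side by side in a short computation. The standard deviation s = 10 is hypothetical, and the t.025 values for d.f. = 4 and d.f. = 29 are taken as 2.776 and 2.045, the standard Table V figures.

```python
s = 10.0
t_crit = {5: 2.776, 30: 2.045}   # t.025 for d.f. = n - 1 (Table V)
z_crit = 1.96                    # normal critical value, for comparison

for n in (30, 5):
    t_allow = t_crit[n] * s / n ** 0.5   # error allowance when sigma is unknown
    z_allow = z_crit * s / n ** 0.5      # allowance if sigma = s were known
    print(f"n={n}: t allowance {t_allow:.2f}, normal allowance {z_allow:.2f}")
```

As n falls from 30 to 5, the allowance grows both because √n shrinks and because the t multiplier pulls further away from 1.96.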

(b) Difference in Two Population Means (μ₁ − μ₂)

We shall assume, as often occurs in practice, that even though the two populations may have different means, they have a common variance σ².

³ This may be verified from Table V. For example, a 95% confidence interval constructed from a sample of size 61 should use a critical t value of 2.00; but the use of the normal value of 1.96 as an approximation involves very little (2%) error. As we scan down the t.025 column in Table V, these critical values approach z.025 = 1.96. This verifies Figure 8-2, where the t distributions approach the normal.
When σ² is known, the 95% confidence interval (8-7) is appropriate. When σ² is unknown, it must be estimated. The appropriate procedure is to add up all the squared deviations from both samples, and then divide by the degrees of freedom, (n₁ − 1) + (n₂ − 1), obtaining an unbiased estimator called the pooled sample variance:

sₚ² = [Σ(X₁ᵢ − X̄₁)² + Σ(X₂ᵢ − X̄₂)²] / [(n₁ − 1) + (n₂ − 1)]    (8-16)

where X₁ᵢ represents the ith observation in the first sample. Substitution of sₚ for σ in (8-7) also requires that the t distribution be used, obtaining the 95% confidence interval

(μ₁ − μ₂) = (X̄₁ − X̄₂) ± t.025 sₚ √(1/n₁ + 1/n₂)    (8-17)

where t.025 is the critical t value with d.f. = n₁ + n₂ − 2.
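A sketch of (8-16) and (8-17) in code; the two small samples and the Table V value t.025 = 2.776 for d.f. = 3 + 3 − 2 = 4 are hypothetical.

```python
from statistics import mean

x1 = [64, 66, 71]      # first sample
x2 = [58, 62, 69]      # second sample
n1, n2 = len(x1), len(x2)
m1, m2 = mean(x1), mean(x2)

# (8-16): pool the squared deviations from both samples,
# then divide by the degrees of freedom (n1 - 1) + (n2 - 1).
pooled_ss = sum((x - m1) ** 2 for x in x1) + sum((x - m2) ** 2 for x in x2)
sp2 = pooled_ss / (n1 + n2 - 2)

# (8-17): confidence interval for mu1 - mu2, with d.f. = n1 + n2 - 2.
t_crit = 2.776         # t.025 for d.f. = 4 (Appendix Table V)
allowance = t_crit * (sp2 * (1 / n1 + 1 / n2)) ** 0.5
print(f"(mu1 - mu2) = {m1 - m2:.1f} +/- {allowance:.1f}")
```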

PROBLEMS

8-4 Sixteen weather stations at random locations in a state measure rainfall. In 1967, they recorded an average of 10 inches and a standard deviation of 1.5 inches. For the mean rainfall for the state,
(a) Construct a 95% confidence interval.
(b) Construct a 99% confidence interval.

8-5 100 cars on a thruway were clocked at an average speed of 69 m.p.h., with a standard deviation of 4 m.p.h. Construct a 95% confidence interval for the mean speed of all cars on this thruway.

(8-6) A random sample of 4 students in a large statistics course received the following marks: 56, 70, 55, 59. Construct a 95% confidence interval for the average mark of all students in the course.

8-7 From a sample of five random normal numbers from Table IIb, find a 95% confidence interval for the mean of the population.

8-8 Five people selected at random had their breathing capacity measured before and after a certain treatment, obtaining the following data:

    Breathing Capacity

    Person    Before (X)    After (Y)    Improvement
    1         2750          2850         +100
    2         2360          2380         +20
    3         2950          2800         −150
    4         2830          2860         +30
    5         2250          2300         +50

Let μ_X (and μ_Y) be the mean capacity of the whole population before (and after) treatment.
(a) What is the (point) estimate of the mean improvement (μ_Y − μ_X)?
(b) Construct a 95% confidence interval for the mean improvement.

8-9 In a random sample of 10 football players, the average age was 27 and the sum of squared deviations was 300. In a random sample of 20 hockey players, the average age was 25 and the sum of squared deviations was 450. Estimate, with a 95% confidence interval, the difference in mean age between the two populations.

8-10 Given the following random samples from 2 populations:

    n₁ = 25    X̄₁ = 60.0    s₁ = …
    n₂ = 15    X̄₂ = 68.0    s₂ = …

and assuming σ₁ = σ₂, find a 95% confidence interval for the difference in the population means (μ₁ − μ₂).

*8-11 Derive the confidence interval (8-6) from (8-5), using practically the same method as in Problem 8-3.

8-12 Derive the confidence interval (8-14) from (8-13). (Hint: Use the footnote to equation (7-4).)

8-3 ESTIMATING POPULATION PROPORTIONS: THE ELECTION PROBLEM ONCE AGAIN

In Section 6-6, we saw that a sample proportion P is just a sample mean in disguise. For example, if we observe 4 Democrats in a sample of 10, then

P = (1 + 0 + 0 + 1 + 0 + 1 + 0 + 0 + 0 + 1)/10 = .4

Similarly, the population proportion π is just the population mean in disguise. The simplest method of deriving an interval estimate for π is therefore to modify (7-10), the confidence interval for a mean. Thus the 95% confidence interval for π is

π = P ± 1.96 √(π(1 − π)/n)    (8-18)

This is just a recasting of (7-10): the sample mean X̄ is replaced by the sample proportion P, and σ/√n is replaced by √(π(1 − π)/n) [the standard deviation of P, as given in (6-30)].

But we seem to have reached an impasse; the unknown π appears in the right-hand side of (8-18). Fortunately, the situation has a remedy: substitute the sample P for π in the right side of (8-18). This is a strategy used before, when we substituted s for σ in the confidence interval for μ. Again, this approximation introduces another source of error; but with a large sample size, this is no great problem. Thus, for large samples, the 95% confidence interval for π is

π = P ± z.025 √(P(1 − P)/n)    (8-19)

where z.025 is the critical normal value leaving 2½% of the probability in the upper tail. As an example, the voter poll of Section 1-1 used this formula.
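Formula (8-19) is a one-liner in code; the poll counts below are hypothetical.

```python
def proportion_ci_95(successes, n):
    # (8-19): P +/- 1.96 * sqrt(P(1 - P)/n), valid for large n
    p = successes / n
    allowance = 1.96 * (p * (1 - p) / n) ** 0.5
    return p - allowance, p + allowance

lo, hi = proportion_ci_95(600, 1000)   # e.g., 600 of 1000 voters favor a candidate
print(f"{lo:.3f} < pi < {hi:.3f}")
```

For 600 of 1000 the interval works out to roughly .57 < π < .63.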

For small samples, there are several options. The simplest is to read the interval estimate for π from Figure 8-4, a table which is constructed in the following manner. The first step is the mathematical deduction of how the estimator P is distributed, for any population π. This is shown for a sample size 20 in Figure 8-3. Thus, for example, if π = .4, then the sample P has the dotted distribution shown in this diagram, and there is a 95% probability that any P calculated from a random sample of 20 will lie in the interval ab. For each possible value of π, such a probability function of P defines two critical points like a and b. When all such points are joined, the result is the two curves enclosing a 95% probability band.

[Figure 8-3 shows the probability distribution of P for samples of size 20, including the outline of the discrete probability function of P when π = .4 and the hypothetical observed value P₁ = .55.]

FIG. 8-3    Distribution of P.

This description of how the statistic P is related to the population π can of course be "turned around" to draw a statistical inference about π from a given sample P. For example, if we have observed the sample proportion P₁ = 11/20 = .55, then the 95% confidence interval for π is defined by fg, the vertical width of the probability band above P₁, i.e.,

.31 < π < .77    (8-20)

Whereas the probability band is deduced in the vertical direction of the P axis, the confidence interval is induced in the horizontal direction of the π axis.

FIG. 8-4    95% confidence intervals for population proportions (π). [Reproduced with permission of Professor E. S. Pearson from C. J. Clopper and E. S. Pearson, "The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial," Biometrika, 26 (1934), p. 404.]

The logic we have used in deriving this confidence interval is the same as before, but we pause briefly to review it, because this is a more generalized argument than we have previously encountered. Suppose the true value of π is .4; then there is a probability of 95% that a sample P will fall between a and b. If and only if it does (e.g., P₁) will the confidence interval we construct bracket the true π of .4. We are therefore using an estimation procedure which is 95% probable to bracket the true value of π, and thus yield a correct statement. But we must nevertheless recognize the 5% probability that the sample P will fall beyond a or b (e.g., P₀); in this case our interval estimate will not bracket π = .4, and our conclusion will be wrong.

Why is this a more general theory of confidence interval estimation? In previous instances (e.g., estimating a population mean, μ) we constructed a confidence interval symmetrically about our point estimate X̄. But in estimating π, no such symmetry is generally involved.⁴ For example, with our observed sample proportion P₁ = .55, the confidence interval (8-20) we constructed for π was not symmetric about our point estimate .55.
The 95% probability band of Figure 8-3 is set out in Figure 8-4, along with the similar bands appropriate for other sample sizes. This neater diagram is used to construct 95% confidence intervals for π. As an example, if we have observed a P = .6 in a sample of 15, the 95% confidence interval for π is approximately

.32 < π < .84

For the same P = .6 in a larger sample of 100, the 95% confidence interval for π is narrower:

.50 < π < .70

Alternatively, with such a large sample, (8-19) could have been used, with the same result, i.e.,

π = .60 ± 1.96 √((.6)(.4)/100) = .60 ± .10

Finally, there is a third method of estimating π that we introduce, not so much for its practical value as for its illustration of this useful principle: with a little imagination, several alternative methods of solving a problem can often be developed, and the most appropriate one to use in a given set of circumstances is a matter of judgment. Let us be conservative, and ask: what is the maximum width of the interval estimate in (8-18), i.e., what is the maximum value that the error allowance 1.96 √(π(1 − π)/n) can have? It is easily shown⁵ that π(1 − π) can never exceed 1/4, so the maximum error allowance (with a large sample) is 1.96 √((1/4)/n) = .98/√n, or approximately 1/√n. Thus the interval estimate can be written

π = P ± 1/√n    (8-21)

(8-21) is an interval estimate for π with at least a 95% level of confidence. But this is assuming the worst; if, in fact, π is not close to 1/2, then π(1 − π) is less than 1/4, and our interval estimate is wider than it need be; or, to restate, the level of confidence is greater than 95%. For example, this very simple formula is sometimes used in political polls, where it is known on the basis of historical experience that the proportion of Democrats is reasonably close to 1/2. In these circumstances (8-21) becomes a very accurate approximation.

For completeness we write the 95% confidence interval for the difference in 2 population proportions: for large samples,

(π₁ − π₂) = (P₁ − P₂) ± 1.96 √(P₁(1 − P₁)/n₁ + P₂(1 − P₂)/n₂)    (8-22)

⁴ The student may have wondered why the 95% probability band does not converge on the two end points 0 and 1. It is true that one half of this band (made up of all points similar to b) does intersect the P axis at 0; this means that if π is zero (e.g., no Socialists in the U.S.), then any sample P must also be zero (no Socialists in the sample). But the other half of this band does not intersect the π axis at 0; instead it intersects at h. This means that an observed P of zero (e.g., no Socialists in a sample) does not necessarily imply that π is zero (no Socialists in the U.S.).

⁵ This is derived essentially in the same way as (8-6). The simplest way to prove it is with calculus, setting the derivative of π(1 − π) equal to zero. To prove it without calculus, we may simply graph f(π) = π(1 − π): the graph is a parabola, zero at π = 0 and π = 1. Note that for either extreme value of π (1 or 0) the value of f(π) is zero; and π(1 − π) reaches its maximum value of 1/4 at π = 1/2, because of the symmetry of the parabola.
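The conservative bound (8-21) and the exact formula (8-19) are easy to compare numerically; the poll counts are hypothetical.

```python
def allowance_exact(p, n):
    # (8-19): 1.96 * sqrt(P(1 - P)/n)
    return 1.96 * (p * (1 - p) / n) ** 0.5

def allowance_conservative(n):
    # (8-21): 1/sqrt(n), from the worst case pi(1 - pi) = 1/4
    return 1 / n ** 0.5

n = 400
for p in (0.5, 0.75):
    print(p, round(allowance_exact(p, n), 4), round(allowance_conservative(n), 4))

# pi(1 - pi) peaks at 1/4 when pi = 1/2, as the footnote's parabola shows:
peak = max(p / 1000 * (1 - p / 1000) for p in range(1, 1000))
```

When P is near 1/2 the two allowances nearly coincide; at P = .75 the conservative formula is noticeably wider, which is the sense in which (8-21) "assumes the worst."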


PROBLEMS

8-13 Construct a 95% confidence interval for π, the proportion of Republican voters in the U.S., if there were 4820 Republicans in a random sample of 10,000.

8-14 In a random sample of tires produced by a certain company, 20% did not meet the company's standards. Construct a 95% confidence interval for the proportion π (in the whole population of tires) which do not meet the standards,
(a) If the sample size n = 10.
(b) If n = 25.
(c) If n = 2500.

(8-15) By talking to 15 voters, you discover that only 3 favor a certain candidate. Construct a 95% confidence interval for the proportion of all voters favoring this candidate.

(8-16) In a random sample of 100 British smokers, 28 preferred brand X. Construct a 95% confidence interval to estimate the proportion of all British smokers who prefer brand X.

8-17 In a U.S. survey of consumer intentions, 498 families in a random sample of 2500 families indicated that they intended to buy a new car within a year. Construct a 95% confidence interval for the proportion of all U.S. families intending a new car purchase.
(a) Answer in two ways:
    (1) Using the usual formula (8-19).
    (2) Using the simplified formula (8-21).
(b) If the sample P had been .40, would the error in (a2) have been greater?

8-18 If π = 3/4, what is the precise percentage error introduced by using (8-21) rather than (8-19)? Does this suggest that (8-21) is a reasonable approximation, provided 1/4 < π < 3/4?

8-19 A sample of 100 cars was taken in each of 2 cities. In one city 72 of the cars passed the safety test; in the second, only 66 passed. Construct a 95% confidence interval for the difference between the proportions of safe cars in the two cities.

8-20 A sample of 3182 voters yielded the following frequency table,* relating their attitude to Senator Joseph McCarthy and their vote in 1948:

    Vote          Attitude to McCarthy, 1954
                  Pro        Anti
    Democrat      506        1381
    Republican    563        732

Construct a 95% confidence interval for (π₁ − π₂), where π₁ is the proportion of all Democrat voters who were pro-McCarthy, and π₂ is the proportion of Republican voters who were pro-McCarthy.

* From S. M. Lipset, "The Radical Right," in Bell, Daniel, ed., The New American Right, New York: Criterion Books [1955].

8-21 In an urban survey of 1000, 790 favored certain legislation. In a rural survey of 300, 180 opposed the same legislation. Construct a 99% confidence interval for the difference between the proportions of city and country voters who favor the legislation.

*8-4 ESTIMATING THE VARIANCE OF A NORMAL POPULATION: THE CHI-SQUARE DISTRIBUTION

There is one further example of a confidence interval, interesting not so much for its practical value⁶ as for the insight it provides. Consider a normal population N(μ, σ²) with both μ and σ² unknown. So far we have estimated σ² with s² only as a means of finding a confidence interval for μ. Now suppose, on the other hand, that our primary interest is in σ², rather than μ. For example, we may wish to ask "How much variance is there in Japan's balance of payments?" in order to get some indication of the country's requirement of foreign exchange reserves. Or, we may ask "How much variance is there in U.S. farm income?"⁷ in order to evaluate whether a policy aimed at stabilizing farm income is necessary.

To construct an interval estimate for σ², we shall assume that the random variable (e.g., farm income) is normally distributed. We have already seen, in Section 7-2, that s² is an unbiased estimator of σ²; but to proceed to an interval estimate, we must ask how the random estimator s² is distributed around its target σ².

⁶ One reason that the confidence interval for σ² is of limited practical use is that it depends crucially on the assumption that the parent population is normal. By contrast, most of the confidence intervals for means remain approximately true even if the parent population is nonnormal; such confidence intervals are called robust.

⁷ Income stabilization policies are almost always designed to stabilize income around a reasonably high level. Thus they aim both at reducing variance (σ²) and raising average income (μ). Here we concentrate only on the variance problem.

To answer this, it is customary to define a new variable:

C² = s²/σ²    (8-23)

[Figure 8-5 shows the distribution of C² for d.f. = 2, 10, and 50, with the critical points .325 and 2.05 marked for d.f. = 10.]

FIG. 8-5    Distribution of the modified chi-square, C².

Of course, when s² = σ², this ratio C² is 1; thus our question of how s² is distributed around σ² can be rephrased, "How is C² distributed around 1?"

C² is called a modified chi-square variable, with n − 1 degrees of freedom.* It has been proved by advanced calculus that the distribution of C² is that of Figure 8-5; critical values are given in Appendix Table VI. Since its numerator s² and denominator σ² are both positive, the variable C² is also always positive, with its distribution falling to the right of zero in Figure 8-5. For small sample sizes we note that it is also skewed to the right; but as n gets large, this skewness disappears and the C² distribution approaches normality. Since s² is an unbiased estimator of σ², this implies that the expected value of each of these C² distributions is 1. Moreover, as sample size increases, C² becomes more and more heavily concentrated around 1, indicating that s² is becoming an increasingly accurate estimator of σ².

With this deduction of how the estimator s² is distributed around its target σ², we may now infer a 95% confidence interval for σ², using the now-familiar technique. We illustrate with a sample size n = 11 (d.f. = 10). From Table VI we find the critical points cutting off 2½% of the distribution in each tail, as shown in Figure 8-5; thus

Pr (.325 < C² < 2.05) = 95%    (8-24)

Solving for σ², we obtain the equivalent statement

Pr (s²/2.05 < σ² < s²/.325) = 95%    (8-25)

If the observed value of s² turns out to be 3.6, then the 95% confidence interval for σ² is

1.76 < σ² < 11.1    (8-26)

We note that this is another example of an asymmetrical confidence interval. In general, the upper and lower critical values of C² are denoted C².025 and C².975, and the 95% confidence interval is written

s²/C².025 < σ² < s²/C².975    (8-27)

* C² is comprised of the constant parameter σ² and the variable s². Thus it has the same degrees of freedom as s² [explained in the footnote to equation (8-11)].
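The arithmetic of (8-24) through (8-26) takes only a few lines; the critical points .325 and 2.05 are the d.f. = 10 values quoted above from Table VI.

```python
s2 = 3.6                          # observed sample variance, n = 11
c2_low, c2_high = 0.325, 2.05     # Pr(c2_low < C^2 < c2_high) = 95% for d.f. = 10

# Invert s^2/sigma^2 between the two critical points, as in (8-25):
lower = s2 / c2_high
upper = s2 / c2_low
print(f"{lower:.2f} < sigma^2 < {upper:.1f}")
```

This reproduces the interval (8-26); note how asymmetric it is about the point estimate s² = 3.6.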

PROBLEMS

*8-22 If a sample of 25 IQ scores from a certain population has s² = 120, construct a 95% confidence interval for σ².

*8-23 From the sample of Problem 8-6, construct a 95% confidence interval for σ².

Review Problems

8-24 Two machines are used to produce the same good. In 400 articles produced by the first machine, 16 were substandard. In the same length of time, the second machine produced 600 articles, and 60 were substandard. Construct 95% confidence intervals for
(a) π₁, the true proportion of substandard articles produced by the first machine.
(b) π₂, the true proportion of substandard articles from the second machine.
(c) The difference between the two proportions (π₁ − π₂).

8-25 To determine the effectiveness of a certain vitamin supplement, the following table of data was obtained for the weight increases (in grams) of 2 groups of 3 mice, the second group treated with the supplement:

    Control Group    Treated Group
    12               18
    16               19
    …                …

Assume that σ₁ = σ₂, and that the mice are not paired [i.e., the first row of data (12 and 18) does not come from mice that are related by kinship or anything else]. Construct a 95% confidence interval for the "vitamin effect," the effect of the vitamin supplement on weight increase.

8-26 Suppose a psychologist runs 6 people through a certain experiment. In order to find its effect on heart rate, he collects the following before-and-after data:

    Heart Rate (Beats per Minute)

    Person       Before Experiment    After Experiment
    Smith        71                   84
    Jones        67                   72
    Gunther      71                   70
    Wilson       78                   85
    Pestritto    64                   71
    Seaforth     70                   81

Suppose it is known that heart rate is approximately normally distributed, and that people as a whole have an average heart rate of 73. Calculate a 95% confidence interval for the mean effect of the experiment on heart rate.

8-27 A certain scientist concluded his study of a fertility survey as follows: "So far one result has emerged from the before-and-after survey, and it is a key measure of the outcome: at the end of 1962, 14.2% of the women in the sample were pregnant, and at the end of 1963 (after the birth-control campaign) 11.4% of the women (in a second independent sample) were pregnant, a decline of about one fifth." If the samples (both before and after) included 2500 women, what statistical qualification would you add to the above statement in order to make its meaning clearer?

chapter 9

Hypothesis Testing

TESTING A SIMPLE HYPOTHESIS

We begin with a very simple example, in order to keep the philosophical issues clear. Suppose that I am playing a gambling game with a die, and lose whenever it shows an ace. After 100 throws I notice that I have suffered an inordinate number of losses: 27 aces. This makes me suspect that my opponent is using a loaded die; specifically, I begin to wonder whether this is one of the crooked dice recently advertised as giving aces one-quarter of the time. Is my suspicion well-founded? Should I make an accusation and terminate the game? My decision should depend on several factors.

1. How much did I trust my opponent even before I began the game (prior to collecting the evidence)? For example, if I am playing with a sharp-looking character I have just met on a Mississippi steamboat, I will be more inclined to terminate the game than if I am playing with an old and trusted friend.

2. What are the potential losses involved in making a wrong decision? I may be playing with very attractive odds in my favor; if the die is, after all, a true one, then I will have a good deal to lose if I erroneously conclude that it is crooked and terminate the game.

3. Does the evidence itself (27 aces in 100 tosses) indicate that I am being cheated?

If questions (1) and (2) can be answered, even roughly, then it is useful to put this whole problem into the larger framework of decision theory (Chapter 10). However, in many instances the first two questions cannot easily be answered; using a medical example, what is the cost of making the wrong decision in certifying a drug which may have serious side effects? In many practical problems, the only question which can be answered by the scientist is (3), and it is this limited but extremely important question which we address in this chapter.

First, we state the two conflicting hypotheses as precisely (mathematically) as possible. The hypothesis that the die is fair is really a statement that the Bernoulli population of all possible throws has a proportion of aces equal to 1/6. This is the hypothesis of "no cheating" or "nothing out of the ordinary." Customarily, it is called the null hypothesis,

H₀: π = 1/6 = .167    (9-1)

The other hypothesis is that the probability of an ace is 1/4; this is customarily called the alternate hypothesis,

H₁: π = 1/4 = .25    (9-2)

Suppose that before we started throwing the die (i.e., before the evidence was collected), we agreed on the following decision rule:

Reject H₀ (i.e., accept H₁) if more than 20 aces occur in the 100 throws (P > .20);
Accept H₀ if 20 aces or fewer occur (P ≤ .20).    (9-3)

This decision rule is shown in Figure 9-1a. The value .20 which separates the two regions is often called the critical point, and the observed values of P greater than .20 which lead us to reject H₀ make up the critical range. When H₀ is rejected, we call the results statistically significant.

We now ask, "How well will this rule work?" (Call the rule R, for "reject H₀ if P > .20.") Even if H₀ is true, a P greater than .20 will occasionally occur because of chance fluctuation (bad luck), causing R to give the wrong answer. We can hope, however, that the probability of this error is small. To find how small, we apply probability analysis. Recall from Chapter 6 that a sample proportion P has an approximately normal distribution. If H₀ is true, the distribution of P is concentrated around its mean value π = 1/6 = .167, with

σ_P = √(π(1 − π)/n) = .037    (9-4)

[Figure 9-1 shows: (a) the decision rule, with "Accept H₀" to the left of the critical point .20 and "Reject H₀" to its right; (b) the distribution of P if H₀ is true, centered on π = .167, with the type I error probability α = .18 shaded; (c) the distribution of P if H₀ is false (H₁ true), centered on π = .25, with the type II error probability β shaded.]

FIG. 9-1    Hypothesis testing. (a) Known decision rule, R. (b) and (c) Unknown world, with two possibilities. Either H₀ is true, in which case the probability of a type I error is α; or H₀ is false (i.e., H₁ is true), in which case the probability of a type II error is β.

This probability of error is easily calculated by evaluating the probability in the distribution of Figure 9-1b lying to the right of .20:

α = Pr (P > .20/H₀) = Pr (Z > (.20 − .167)/.037) = Pr (Z > .9) = .18    (9-5)

This error of rejecting H₀ when it is true is called a type I error, with its probability denoted α.¹

On the other hand, suppose H₀ is false (i.e., H₁ is true); what is the probability of error then? In this case π = .25, and the distribution of P is again approximately normal, but now concentrated around .25, as shown in Figure 9-1c. According to the rule R, we will make the error of accepting the false H₀ whenever we observe P < .20.

¹ In this chapter, we write Pr ( /H₀) to denote "probability, assuming H₀ is true."
The probability of this error is calculated by evaluating the probability in the distribution of Figure 9-1c lying to the left of .20. The mean of this distribution is .25 and its standard deviation is

σ_P = √(π(1 − π)/n) = .043    (9-6)

so it follows that

β = Pr (P < .20/H₁) = Pr (Z < (.20 − .25)/.043)    (9-7)
  = Pr (Z < −1.15) = .13    (9-8)

This error of accepting H₀ when it is false is called a type II error, with its probability denoted β.
TABLE 9-1    Possible Errors in Testing Hypothesis H₀

    Decision     State of the world
                 If H₀ is true                       If H₀ is false (H₁ true)
    Accept H₀    Correct decision.                   Type II error.
                 Probability = 1 − α;                Probability = β
                 also called "confidence level"
    Reject H₀    Type I error.                       Correct decision.
                 Probability = α;                    Probability = 1 − β;
                 also called "significance level"    also called "power"

The terminology of testing is reviewed in Table 9-1. Note that the probabilities in each column sum to 1; this must follow, so long as we use a rule which involves the decision either² to accept or reject H₀.

Now recall that our decision rule R in (9-3) was determined arbitrarily. We now ask, "Is there a better decision rule, i.e., a better critical point for our test than .20?" Of course, we should like to make the probabilities of error (α and β) as small as possible, but these two objectives are in conflict.

² Of course other more complicated decision rules may be used. For example, the statistician may decide to suspend judgement if the observed P is in the region around .20 (say .18 < P < .22). If he observes an ambiguous P in this range, he would then undertake a second stage of sampling, which might yield a clear-cut decision, or might lead to further stages of sequential sampling.
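The α and β calculations of (9-4) through (9-8) can be reproduced with a short normal-approximation sketch; `norm_cdf` is built from the standard library's error function, so the results differ slightly in the last decimal from the text's rounded z values.

```python
from math import erf, sqrt

def norm_cdf(z):
    # Standard normal cumulative probability, via the error function.
    return 0.5 * (1 + erf(z / sqrt(2)))

def error_probabilities(n=100, critical=0.20, pi0=1/6, pi1=0.25):
    sd0 = sqrt(pi0 * (1 - pi0) / n)                # sigma_P under H0, as in (9-4)
    sd1 = sqrt(pi1 * (1 - pi1) / n)                # sigma_P under H1, as in (9-6)
    alpha = 1 - norm_cdf((critical - pi0) / sd0)   # Pr(P > .20 / H0)
    beta = norm_cdf((critical - pi1) / sd1)        # Pr(P < .20 / H1)
    return alpha, beta

alpha, beta = error_probabilities()
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}")
```

Raising the `critical` argument from .20 to .22 reproduces the trade-off discussed next: α falls while β rises.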

A SIMPLE HYPOTHESIS TESTING    171

FIG. 9-2 Illustration of how reducing alpha increases beta (compare with Fig. 9-1). Shown are the distributions of P if H0 is true (H0: pi = .167) and if H0 is false (H1: pi = .25), with the critical point moved up to .22.

In Figure 9-2 the critical point has been moved up from .20 to .22. As we hoped, this does reduce alpha; but it also increases beta. Moreover, we note that the only way to eliminate alpha is to move the critical point far enough to the right; but as we do so, beta approaches 1, i.e., our test becomes "powerless," since we can no longer reject even the most dishonest die. Similarly, it is easy to confirm that any attempt to reduce beta (by lowering the critical point below .20) will only increase alpha. In statistics, as in economics, the problem is trading off conflicting objectives.

The only way to reduce both alpha and beta is to increase the sample size. From equation (9-4) it is clear that an increase in n will reduce the spread of the distribution of P, concentrating it more closely around its central value. Thus if n is increased from 100 to 200, we obtain the result shown in Figure 9-3. The only difference between this test and the one shown in Figure 9-1 is the increase in sample size; note how it reduces both alpha and beta.

These principles are illustrated with an interesting legal analogy. In a murder trial, the jury is being asked to decide between H0, the hypothesis that the accused is innocent, and the alternate H1, that he is guilty. A type I error is committed if an innocent man is condemned (innocence is rejected), while a type II error occurs if a guilty man is set free (innocence is accepted). The judge's admonition to the jury that "guilt must be proved (i.e., innocence rejected) beyond a reasonable doubt" means that alpha should be kept very small. There have been many legal reforms (for example, in limiting the evidence that can be used against an accused man) which have been designed to reduce alpha, the probability that an innocent man will be condemned. But these same reforms have increased beta, the probability that a guilty man will evade punishment. There is no way of pushing alpha down to zero, and insuring absolutely against convicting an innocent man, without letting every defendant

FIG. 9-3 How alpha and beta are both reduced by increasing sample size (compare with Fig. 9-1).

go free, thus raising beta to 1 and making the trial meaningless (powerless). It should also be noted that historically alpha and beta have both been reduced by improved crime detection, i.e., by the increased available evidence brought to bear on H0.
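The effect of sample size pictured in Figure 9-3 can be verified numerically. Keeping the critical point fixed at .20, we compute alpha = Pr(P > .20 | pi = 1/6) and beta = Pr(P < .20 | pi = .25) for n = 100 and n = 200; both shrink as n grows. (A sketch of our own, using the normal approximation; `phi` is the standard normal CDF.)

```python
from math import sqrt, erf

def phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def alpha_beta(n, cutoff=0.20, pi0=1/6, pi1=0.25):
    """Error probabilities for the rule 'reject H0 if P > cutoff'."""
    se0 = sqrt(pi0 * (1 - pi0) / n)         # spread of P under H0
    se1 = sqrt(pi1 * (1 - pi1) / n)         # spread of P under H1
    alpha = 1 - phi((cutoff - pi0) / se0)   # Pr(reject | H0 true)
    beta = phi((cutoff - pi1) / se1)        # Pr(accept | H1 true)
    return alpha, beta

a100, b100 = alpha_beta(100)   # roughly .19 and .12
a200, b200 = alpha_beta(200)   # both smaller
print(a100, b100)
print(a200, b200)
```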

Returning to the statistical problem, we conclude that (short of raising more funds for increasing sample size) the best we can do is to balance, or trade off, alpha and beta; we are left with the problem of how. Whenever possible the answer should take into consideration the two factors mentioned at the outset of the chapter.

1. The relative prior likelihood of the two competing hypotheses. To use an earlier example: if your opponent is a trusted friend, rather than a complete stranger, your greater prior confidence in H0 will make you more reluctant to reject it; thus you will keep alpha small.

2. The relative cost of making each type of error. To use the same example: suppose the cost of making a type I error (and accusing an old friend of being a cheat) is high, while the cost of making a type II error is relatively low (you continue to bet against a crooked die but it is only for peanuts); in these circumstances, your greater concern about making a type I error will lead you to reduce alpha to a relatively small value, even though beta is increased as a consequence. Or, drawing on our legal analogy, we may interpret legal reforms designed to protect the innocent (i.e., reduce alpha) as a reflection of the judgement that the cost of type I errors (condemning innocent men) exceeds the cost of type II errors (allowing the guilty to go free).

The difficulty is that, in any scientific inquiry, these questions cannot be answered with a great deal of precision. Because type I errors are usually regarded as quite serious, alpha is set at a small value, usually 5% or 1%. Then the test rule is constructed on this basis. We illustrate with our die-tossing example how hypotheses are typically tested. Three steps are involved.

1. The null hypothesis H0 and the alternative H1 are formally stated, as in (9-1) and (9-2). At the same time, the sample size (e.g., 100) and the significance level of the test (e.g., alpha = 5%) are set.

2. We now assume that the null hypothesis H0 is true, and we ask: "What can we expect of a sample drawn from this kind of world?" This question is answered in our die-tossing example thus: if H0 is true, then there is a probability of only 5% that we will observe a sample P greater than .228. This critical value (.228) is determined as follows. We note from Appendix Table IV that a Z value of 1.64 cuts a 5% tail off the standard normal distribution. This critical Z value is translated into a P value:

    P = pi + 1.64 sigma_P    (9-10)

and for the value of pi = 1/6:

    P = 1/6 + 1.64 sqrt((1/6)(5/6)/100)    (9-11)

which yields the critical value of P = .228. The resulting test, R', is shown in Figure 9-4. This shows us what to expect of a sample P, if H0 is true. (At this stage, the probability of a type II error (beta) and the power of this test (1 - beta) may also be calculated, but this is not always done. As an exercise, the reader should confirm that, for the alternative pi = .25, beta = .31 and the power of the test is .69.)

3. With the rule R' now established, the sample is taken and P observed; there remains only the last automatic step. We now ask: "Is this P consistent with H0?" If it is not (i.e., P > .228), we reject H0. As an example, recall that in our 100 tosses we rolled 27 aces. This observed P = .27 is in such conflict with H0 that it cannot "reasonably" be attributed to chance, and H0 is rejected.

Summary. Whereas in our first test (R) we arbitrarily specified the critical value (.20) and solved for alpha, in this more typical hypothesis test (R') we specify alpha (.05) and solve for the critical value.
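The three steps can be mirrored in a few lines of code. The sketch below (ours, not the text's) reproduces the critical value of (9-11) and the exercise result beta = .31, using 1.64 as the tabulated Z cutting a 5% upper tail and `phi` for the standard normal CDF.

```python
from math import sqrt, erf

def phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

pi0, n, z05 = 1/6, 100, 1.64

# Step 2: critical value from (9-11).
critical = pi0 + z05 * sqrt(pi0 * (1 - pi0) / n)   # about .228

# Exercise: beta and power if the alternative pi = .25 is true.
se1 = sqrt(0.25 * 0.75 / n)
beta = phi((critical - 0.25) / se1)   # about .30, the text's .31 up to rounding
power = 1 - beta

# Step 3: the observed P = .27 exceeds the critical value, so H0 is rejected.
reject = 0.27 > critical
print(round(critical, 3), round(beta, 2), reject)
```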

FIG. 9-4 Construction of a test of the hypothesis that pi = .167, at a 5% significance level (alpha = .05), with n = 100. The critical value .228 yields our decision rule: accept H0 if P falls below .228; reject H0 if P exceeds it.

Note that the "95% confidence level" of this test is similar to the concept of confidence used in interval estimation. We set up the test in Figure 9-4 on the assumption that H0 is true. If this is so, there is a 95% probability that the observed P will be below .228, and we will (correctly) accept H0. Thus we are using a method of testing which, when H0 is true, will be right 95% of the time.

There is another way of looking at this testing procedure. If we get an observed P exceeding .228, there are two possible explanations:

1. H0 is true, but we have been exceedingly unlucky and got a very improbable sample P. (We're born to be losers; even when we bet with odds of 19 to 1 in our favor, we still lose); or

2. H0 is not true after all; the die is crooked, and it is no surprise that we rolled so many aces.

Being reasonable, we opt for the second explanation. Although the first explanation is conceivable, it is not as plausible as the second. But we are left in some doubt; it is just possible that the first explanation is the correct one. For this reason we qualify our conclusion to be "at the 5% significance level (type I error level)."

PROBLEMS

9-1 Fill in the blanks. Consider the problem facing a radar operator whose job is to detect enemy aircraft. When something irregular appears on the screen, he must decide between

    H0: all is well; it is only a bit of interference on the screen; or
    H1: an enemy attack is coming.

In this case, the type _____ error is a "false alarm," and the type _____ error is a "missed alarm." To reduce both alpha and beta, the electronic equipment is made as sensitive and reliable as possible.

9-2 (a) To test whether a die has a fair number of aces, construct a test, at the 1% significance level, of

    H0: pi = .167    versus    H1: pi = .300

using a sample of 100 tosses.
(b) What is the beta of this test?
(c) Compare this test with the test developed in the text in Figure 9-4: (1) Is the critical value different? (2) Is alpha different? (3) Is beta different?

9-3 Construct the appropriate test of H0: Pr (heads) = 1/2 versus H1: Pr (heads) = .60, for a sample of 100 tosses of a coin, using a 5% level of significance. If this test were used by all the students in your class (as in Problem 3-2), about how many would be led to mistakenly reject H0? Interpret.

9-4 (a) Construct the appropriate test of H0: coin unbiased, versus the alternative H1: Pr (heads) = .25, using a 25% level of significance and a sample of 100 tosses.
(b) Do you reject H0?
(c) What is beta?

(a) Introduction

In our die-tossing example we have assumed that there is only one way a die can be crooked (i.e., pi = 1/4). Thus the alternative hypothesis H1 was a simple one. But usually there is no way of knowing how heavily the die may be biased against us. Thus our alternative hypothesis H1 (crooked die) would be a composite hypothesis, embracing a whole set of possibilities, including our previous simple alternative:

    H1: pi = .17, or .18, or .19, . . .    (9-12)

To summarize, we wish to test

    H0: pi = .167    (9-13)

against the composite alternative

    H1: pi > .167    (9-14)

Since there are many alternatives included in H1, we can no longer evaluate beta as simply as in the previous section. But note that H0 is still a simple hypothesis; thus the evaluation of alpha has not been complicated. There is now, therefore, an even stronger case for concentrating on alpha, which we set at .05; we shall return to an evaluation of the more complicated beta values later.

With this significance level of .05 given, the reader should now develop as an exercise the appropriate decision rule for accepting or rejecting H0. Note that this is identical to the rule R' developed in Figure 9-4. Since the rule is based on the level of significance selected (alpha = .05 in both cases), it remains the same, entirely independent of any considerations of beta. But there are two major changes in its interpretation.

(b) The Power Function

With a simple alternative H1 there was a single possible value of pi, giving beta a single value. With a composite H1 there are now many possible values of pi, each giving a different beta. We show three such calculations in Figure 9-5; each involves evaluating the area under a curve, lying to the left of the critical value (P = .228). Thus the middle curve shows how the sample P will be distributed if the true pi is .25, and yields beta = 31%. To interpret: if pi is in fact .25, then there is a probability of 31% that an observed P will be less than our critical

FIG. 9-5 Calculation of beta for several of the possible values of pi included in H1, with the critical value at P = .228. Shown are the distributions of P if pi = .25 (then beta = .31), if pi = .30, and if pi = .18.

value of .228, and we will erroneously conclude that the die is fair (accept H0).

Table 9-2 has been constructed using the whole set of possible values of pi in H1 (n = 100); the corresponding beta values are shown in column 2, and the power of the test (1 - beta) is shown in column 3. This is the probability that we will correctly reject H0, concluding that pi > .167, i.e., the probability of correctly detecting a crooked die. If a dishonest gambler uses a die which always turns up an ace, he knows he will be quickly found out, and the game abandoned; the more crooked the die he uses, the greater your "power" to uncover him as a cheat. The less crooked the die (i.e., as we move down this column), the more difficult it becomes to detect. The dishonest gambler will recognize this, and will prefer to get you to play against a slightly crooked die. The power of this test is thus seen to be our ability to uncover a crooked die; and if it is only slightly biased, our test has little power, and it becomes almost impossible to distinguish between the two conflicting hypotheses. This is confirmed from the last, limiting line in Table 9-2. Here the value of pi is H0, and to reject H0 would be wrong; the probability of this was alpha = 5% by definition.

The "power function" is graphed in Figure 9-6. Clearly we should like a power function that begins very close to the baseline, since its initial height is alpha, the level of significance, which we wish to keep low. At the same time, we wish the power function to be very steep; the more rapidly it rises, the greater is our power to distinguish between competing hypotheses.

TABLE 9-2  beta and the Power Function for the Test R'
(Test of the Fair Die, at the 5% Level of Significance)

    (1)                   (2)                    (3)
    Possible Values       Probability of         Power = 1 - beta
    of pi                 (Erroneously)          (Probability of Correctly
                          Accepting H0           Rejecting H0)

    .32                   .02                    .98
    .30                   .05                    .95
    .28                   .12                    .88
    .26                   .23                    .77
    .24                   .39                    .61
    .22                   .58                    .42
    .20                   .76                    .24
    .18                   .89                    .11
    .17                   .94                    .06
    Limit (.167) = H0     (.95)                  (.05)

FIG. 9-6 Graph of the power function of Table 9-2, for the test R' of the fair die at the 5% significance level. Power = 1 - beta = Pr (rejecting H0); beta = Pr (type II error); the curve starts at height alpha = Pr (type I error) = .05 where pi = .167 = H0, and rises over the possible values of pi.

(c) Warning About Accepting H0

This introduces a second reason for interpreting our test in this section (with a composite H1) differently from our test in the previous section (with a simple H1). It is now possible that this die is only slightly biased (e.g., pi = .18). If this is in fact the case, it is very likely (beta = .89) that we will observe P in the range below .228. R' tells us to accept H0 (true die); but this is a mistake. At the same time the evidence is not strong enough to reject H0. What to do?
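Table 9-2 can be regenerated in a few lines: each beta is the area of the normal curve for P, centered at the given pi, lying below the critical value .228, and the power is its complement. This is our own sketch of the calculation, so the last digits may differ slightly from the table's rounding.

```python
from math import sqrt, erf

def phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

critical, n = 0.228, 100
for pi in (0.32, 0.30, 0.28, 0.26, 0.24, 0.22, 0.20, 0.18, 0.17):
    se = sqrt(pi * (1 - pi) / n)
    beta = phi((critical - pi) / se)   # Pr(erroneously accepting H0)
    print(f"pi = {pi:.3f}   beta = {beta:.2f}   power = {1 - beta:.2f}")
```

Note how beta climbs toward its limit of .95 (and power falls toward alpha = .05) as pi slides down toward the null value .167.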

The only other apparent option is to suspend judgement; or formally, to "not reject H0." It is the only way that we can avoid the great risk of incurring a type II error. Earlier, with a simple H1, we could live with the risk of a type II error; but with our (substantially different) composite H1, this risk becomes prohibitive (beta can run up to .95). Thus we prefer to "not reject H0," "suspend judgement," or conclude that "our sample P is not statistically significant, i.e., P is not significantly greater than 1/6." But we do not accept H0 outright.

We confirm this from another point of view. Suppose we are using test rule R' on a biased die (pi = .21); we toss the die and observe P = .21. This would not be an unlucky result; in fact it is the best luck that we could have hoped for, since our estimate P is exactly on the true value pi. If we started out suspecting bias, it has now been confirmed by our sample. There are no grounds whatsoever for concluding that this is a fair die, so we cannot accept H0; since we also cannot reject H0, we suspend judgement.3

There is another alternative which is generally even more attractive, and to this we shall now turn.

(d) Prob-value

The prob-value4 (the probability that the sample value would be as extreme as the value we actually observed) is defined as:

    prob-value = Pr (sample value as extreme as the value actually observed | H0)    (9-15)

3 This illustrates a problem involved in accepting H0 if the sample size is small; an increase in sample size, by reducing our standard error, would eventually allow us to reject H0 (assuming, of course, that we continued to observe P = .21). On the other hand, if the sample size is extremely large we can fall into a trap in rejecting H0. To see why, we consider more carefully the question of whether any die is absolutely true, with pi = 1/6 exactly. The answer to this must be no; H0: pi = 1/6, like any other simple hypothesis, must be (slightly) false. And all we have to do to reject it is to take a large enough sample, thus reducing our standard error of estimate to the point where even an observed P just slightly different from 1/6 will call for rejection of H0, i.e., be statistically significant! But concluding from this that the die is dishonest misses the point: it is not dishonest enough to be of any practical consequence. We therefore must distinguish between statistical significance and practical importance. In conclusion, sample size is obviously an important consideration in hypothesis testing. If the sample is very large, rejecting H0 may be very dangerous; on the other hand, if the sample is very small, accepting H0 is dangerous (and this is a critical problem for economists and other scientists, with their limited available information).

4 Short for "probability-value." It is sometimes further contracted to "P-value"; we do not use this term, to avoid confusion.

For example, in the gambling example, tossing a die 100 times and observing the proportion of aces to be P = .27, we have

    prob-value = Pr (P >= .27 | H0)    (9-16)
               = Pr (Z >= 2.77)
               = .0028    (9-17)

This calculation is very similar to the calculation of alpha, and is shown in Figure 9-7a. We further note that if the observed value of P is extreme, the prob-value is very small. Thus the prob-value measures the credibility of H0. It is an excellent way for the scientist to summarize what the data says about the null hypothesis.

The relation of prob-value to testing H0 may be seen in Figure 9-7b.

FIG. 9-7 Prob-value for the gambling example; H0 is pi = 1/6 and sample size is n = 100. (a) Calculation of prob-value when the observed P = .27. (b) Fig. 9-4 repeated to show the relation of prob-value to alpha: reject H0 iff the prob-value < alpha.
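The prob-value of (9-16) and (9-17) is one line of arithmetic once the standard normal CDF is available; the sketch below is ours, with `phi` again playing the role of the normal table.

```python
from math import sqrt, erf

def phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

pi0, n, P = 1/6, 100, 0.27
se = sqrt(pi0 * (1 - pi0) / n)

z = (P - pi0) / se            # about 2.77, as in (9-17)
prob_value = 1 - phi(z)       # about .0028
print(round(z, 2), round(prob_value, 4))
```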

Since the prob-value is a measure of the credibility of H0, if this credibility sinks below alpha, then H0 must be rejected. To restate this, we recall that the observed value of P is in the rejection region of the test iff the prob-value is smaller than alpha, i.e.,

    Reject H0 iff prob-value < alpha    (9-18)

Figure 9-7 shows yet another possible interpretation: the prob-value is the smallest significance level alpha at which H0 may be rejected.

A major criticism of the traditional hypothesis testing of Section 9-1 is that alpha is set rather arbitrarily, and the simple decision to reject or not reject H0 does not allow the sample to "tell us" all that it might. Prob-value is therefore the preferred way of stating the result of a hypothesis test. Then each reader can set his own level of significance alpha at whatever value he deems appropriate, and make his own decision to reject H0 if the prob-value < alpha. [If the prob-value > alpha, he should suspend judgement, for the reasons cited in Section 9-2(c) above.]

Another Example. Suppose that an auto firm has been using brake linings with an average stopping distance of 90 feet. The firm is considering a switch to another type of lining, which is similar in all other respects, but alleged to have a shorter stopping distance. In a test run the new linings are installed on 64 cars; the average stopping distance is 87 feet, with a standard deviation of 16 feet. In your job of quality control, you are asked to evaluate whether or not the new lining is better.

Let mu = the average stopping distance for the population of new linings, and test

    H0: mu = 90

against the alternative

    H1: mu < 90

Noting that the observed sample mean is 87, you calculate the prob-value, using a method similar to (9-16):

    prob-value = Pr (sample mean <= 87 | H0)    (9-19)

In other words, this is the probability that the sample mean will be as extreme as the observed value, i.e., 3 feet or more below the hypothetical value of 90 feet. Translating (9-19) into Z values, we have

    prob-value = Pr (Z <= (87 - 90)/(16/sqrt(64)))
               = Pr (Z <= -1.5)
               = .067    (9-20)

You report therefore that there is evidence that the new linings are better, since there is only a 6.7% probability that you would get such extreme test results from an equivalent product. Thus you leave the decision to the vice-president. If he uses a 10% significance level, he will switch to the new linings. But if he uses a 5% significance level, he will not switch.
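The brake-lining report can be reproduced the same way (our sketch; `phi` is the standard normal CDF), and the decision compared against the two significance levels the vice-president might use:

```python
from math import sqrt, erf

def phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

xbar, mu0, s, n = 87, 90, 16, 64
z = (xbar - mu0) / (s / sqrt(n))   # (87 - 90)/2 = -1.5
prob_value = phi(z)                # about .067, as in (9-20)

for alpha in (0.10, 0.05):
    decision = "switch" if prob_value < alpha else "do not switch"
    print(alpha, decision)
```

Since .05 < .067 < .10, the same evidence leads to opposite decisions at the two levels, which is exactly the point of reporting the prob-value itself.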

(e) How to Select H0

So far we have tested a simple H0 against both simple and composite H1. Cases occasionally occur when both H0 and H1 are composite. As an example, suppose we are asking whether American men are more likely to vote Democratic than American women. The null hypothesis (that voting preference is the same) contains many simple hypotheses:

    H0: pi_M = pi_W

where pi_M and pi_W represent the proportions of men and women voting Democratic; for example, pi_M = pi_W = .50, or pi_M = pi_W = .51, and so on. Moreover, the alternate hypothesis is even more composite:

    H1: pi_M > pi_W

for example, pi_M = .51 and pi_W = .50, or pi_M = .52 and pi_W = .51, or pi_M = .52 and pi_W = .50, or pi_M = .50 and pi_W = .49; indeed, any pi_M = x and pi_W = y with 0 <= x <= 1, 0 <= y <= 1, and x > y.

Additional complications are introduced beyond those involved when only H1 was composite. Indeed, it is difficult to know where to start. The key is to define a new population parameter, specifically the difference between voting preferences:

    delta = pi_M - pi_W

We now refer to H0 above as one-dimensional, and H1 as two-dimensional.

The null hypothesis now becomes

    H0: delta = 0    (9-21)

against the composite alternative

    H1: delta > 0    (9-22)

For large samples, a test can be constructed on the basis of the difference in the sample proportions, p_M - p_W.

As this illustration makes clear, the null hypothesis may sometimes be uninteresting, one that we neither believe nor wish to establish, selected because of its simplicity. It is the alternative H1 that we are trying to establish, and we prove H1 by rejecting H0. We can see now why statistics is sometimes called "the science of disproof." H0 cannot be proven; and H1 can be proven only by disproving (rejecting) H0. It follows that if we wish to prove some proposition, we call it H1 and set up the contrary hypothesis H0 as the "straw man" we hope to destroy.

Another example. Suppose research engineers in an electronics company claim that they have developed a new television tube superior to the old, which had an average lifetime of 12,400 hours. They ask you to prove its superiority. You wish to establish

    H1: mu > 12,400

where mu is the average lifetime of all new tubes. The "straw man" you hope to destroy is that the new tube is no better, i.e.,

    H0: mu = 12,400

This is then tested in the hope that the observed sample mean is significantly greater than 12,400. If it is, then H0 is rejected and H1 is established.

This example emphasizes our earlier warning against accepting H0. Suppose the observed sample mean is slightly above 12,400, yielding a prob-value of 20%. If the vice-president specifies the significance level alpha at 5%, the evidence is not strong enough to allow us to reject H0. But we cannot accept H0 either, for two reasons: (1) we did not believe it in the first place; it was set up simply as a "straw man" we hoped to knock over in order to establish H1; (2) the tests suggest H0 is wrong (although not as strongly as we would have liked). We therefore opt to withhold judgement, simply quoting the prob-value,5 and wait for further evidence.

5 See Section 9-2(c) above.
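For the voting illustration, the large-sample test of (9-21) against (9-22) rests on the difference in sample proportions p_M - p_W. A standard way to carry it out (the details are not spelled out in this chapter) uses a pooled estimate of pi for the standard error; the poll counts below are hypothetical, purely to make the sketch runnable.

```python
from math import sqrt, erf

def phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Hypothetical polls: 540 of 1000 men and 500 of 1000 women vote Democratic.
x_m, n_m = 540, 1000
x_w, n_w = 500, 1000

p_m, p_w = x_m / n_m, x_w / n_w
p_pool = (x_m + x_w) / (n_m + n_w)   # pooled proportion under H0: delta = 0
se = sqrt(p_pool * (1 - p_pool) * (1 / n_m + 1 / n_w))

z = (p_m - p_w) / se
prob_value = 1 - phi(z)              # one-sided, matching H1: delta > 0
print(round(z, 2), round(prob_value, 3))
```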

PROBLEMS

9-5 A certain type of seed has always grown to a mean height of 8.5 inches. A sample of 49 seeds grown under new conditions has a mean height of 8.8 inches and a standard deviation of 1 inch.
(a) At the 5% significance level, test the hypothesis that the new conditions grow no better plants.
(b) Graph the power function of this test.

9-6 Whereas the power function involved graphing all the (1 - beta) values in column 3 of Table 9-2, an "operating characteristics curve" (OCC) is defined by graphing all the beta values (column 2). Draw the OCC for our die test, and compare it with the power function in Figure 9-6. What is the most desirable shape for an OCC?

9-7 A man makes the claim that the average yearly salary of men in a certain profession is only $6600. A random sample of 150 men in that profession shows a mean salary of $6730, with a standard deviation of $900.
(a) Calculate the prob-value, and interpret.
(b) At a 5% level of significance, would you reject the man's claim?
(c) At a 1% level of significance, would you reject his claim? Would you therefore accept it? Explain your answer.

9-8 A coffee shop sells on the average 320 cups of coffee per day, with a standard deviation of 40. After an advertising campaign, on the 7 days following, they sell an average of 350 cups.
(a) Has advertising left business unchanged? Calculate the prob-value.
(b) If the owner of the coffee shop specifies that the type I error of the test (significance level) is to be 5%, do you reject the hypothesis that business is unchanged?
(c) What assumptions have you made implicitly in (a) and (b)? Under what conditions are they questionable?
*(d) If coffee sales were observed for 25 days, what would the average have to be in order to justify a statement, at the 5% significance level, that business had improved?

9-9 In order to compare the yearly incomes in two professions, a survey was taken among 100 men in each. In one sample the mean income is $6000 with a standard deviation of $700; in the second sample the mean is $6200 with a standard deviation of $400. To weigh the claim that the mean salary in the second profession is no larger than in the first profession, calculate the prob-value. (Hint. Use the theory of the difference in means developed in Chapter 8.)

9-10 Records show that a machine produced an hourly average of 678 articles. After a safety device was installed, in a random sample of 100 hours the machine produced an hourly average of 674 articles, with a standard deviation of 25. Pointing to the drop of 4 articles per hour, management claimed the safety device was reducing production. The union countered that the drop of 4 articles was merely statistical fluctuation.
(a) To objectively summarize the evidence on whether production was left unchanged, calculate the prob-value.
(b) If an arbitration board decides on a 1% level of significance (type I error), do they rule in favor of management or the union?

*9-11 Consider the following sample of digits produced by a machine:

    2 4 8 4 4 3 0 2  2 4 5 1  1 8 0 8 6 1

At the 5% significance level, test the hypothesis that these are drawn from a population of random digits, against the alternative that there is a bias in favor of small digits.

*9-12 The output of machines in a factory is substandard 4% of the time. A machine suspected of being inferior produces X substandard articles in a sample of 500. How small would X have to be in order to reject, at the 5% significance level, the hypothesis that this machine is inferior?

9-3 TWO-SIDED TESTS

In the previous section we asked whether men voters were more heavily Democratic than women. Suppose instead we ask whether men voters are more or less Democratic than women. In either case we use the same simple null hypothesis

    H0: delta = 0    (9-23)

But whereas in (9-22) we used the one-sided alternative H1: delta > 0, we now must use the two-sided alternative

    H1: delta != 0, i.e., delta > 0 or delta < 0    (9-24)

We reject H0 if our sample estimate of delta is significantly greater than, or less than, 0. We test H0 "from both sides."

As a second illustration, suppose we are again testing the trueness of a die. But instead of testing a suspect die we are betting against, suppose we work in quality control in a die-making factory. We are now just as concerned about a die that shows too few aces as one which shows too many. Our appropriate test involves

    H0: pi = .167    (9-25)
                     [(9-13) repeated]

against the two-sided alternative

    H1: pi != .167, i.e., pi > .167 or pi < .167    (9-26)

[compare with (9-14)]. The critical region (for rejecting H0) now also must be two-sided. For a level of significance alpha = 5%, this is shown in Figure 9-8b,

FIG. 9-8 A one-sided and a two-sided test of a die compared. (a) A one-sided test of H0: pi = .167 against the alternative H1: pi > .167, with alpha = 5% and critical value .228 (Fig. 9-4 repeated). (b) A two-sided test of H0: pi = .167 against the alternative H1: pi != .167, with an area of 2.5% cut off in each tail.

where an equal area (2.5%) is cut off each tail, in order to keep the critical region for rejecting H0 as large as possible. Thus:

    Reject H0 if |Z| = |(P - pi0)/sqrt(pi0(1 - pi0)/n)| > 1.96    (9-27)

where pi0 is the null hypothesis value of pi. Equation (9-27) simply asks whether P differs from pi0 by a critical amount, on either the high side or the low side.

The final question is "How do we recognize when to use a two-tailed test or a one-tailed test?" The one-tailed test is recognized by an asymmetrical phrase like "more than, less than, at least, no more than, better, worse, . . ." and so on. Thus our first test, of whether the probability of an ace on the gambler's die was more than one sixth, required a one-sided test.
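Rule (9-27) is easy to put into code. With the same 100 tosses and observed P = .27, the two-sided test compares |Z| with 1.96; the sketch (ours) also recovers the two critical points for P itself that appear in Figure 9-8b.

```python
from math import sqrt

pi0, n, P = 1/6, 100, 0.27
se = sqrt(pi0 * (1 - pi0) / n)

z = (P - pi0) / se
reject = abs(z) > 1.96        # rule (9-27)

# The equivalent two-sided critical points for P, roughly .09 and .24:
low, high = pi0 - 1.96 * se, pi0 + 1.96 * se
print(reject, round(low, 3), round(high, 3))
```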

PROBLEMS

9-13 Test H0: pi = 1/2 versus H1: pi != 1/2, where pi is the probability of a tossed thumbtack landing "point up." Use a 5% level of significance, and the sample observations of Problem 3-1:
(a) After 10 tosses.
(b) After 100 tosses.

9-14 Referring to Problem 9-7, suppose that the man's claim that mu0 = 6600 is no longer implausibly low; i.e., suppose the alternate hypothesis is now two-sided: H1: mu > 6600 or mu < 6600. Using a two-sided test of H0, and also a two-sided prob-value, answer the same questions as in Problem 9-7.

9-4 THE RELATION OF HYPOTHESIS TESTS TO CONFIDENCE INTERVALS

In this s :ction we shall reach a very


important
conclusion:
a confidence
interval can >e used to test any hypothesis;
in fact, the two procedures are
with an example.
equivalent. \177re illustrate
a firm has been producing a light bulb with an average life of
Suppos\177
800 hours. I wishes to test a new
bulb.
A sample of 25 new bulbs has an
(\177), with
a standard deviation of 30 hours (s).
average life )f 810hours
of our small sample we should use the
t, rather
than the
Noting that oecause

188

TESTING

HYPOTHESIS

we can

distribution,

normal

1. Test the hypothesis

    H0: μ0 = 800    (9-30)

against the two-sided alternative

    H1: μ ≠ 800    (9-31)

H0 may be accepted⁷ at the 5% level of significance if the observed X̄ falls within the interval

    μ0 − 2.06 s/√n ≤ X̄ ≤ μ0 + 2.06 s/√n    (9-32)

where t.025 = 2.06. Given our sample, this condition becomes

    788 ≤ X̄ ≤ 812    (9-33)

Since our observed X̄ (810) does fall within this interval, this hypothesis μ0 is acceptable. This is shown in Figure 9-9a.

FIG. 9-9 Comparison of two-sided hypothesis test with confidence interval (using a sample with X̄ = 810 and s = 30). (a) Test of H0: μ0 = 800 versus H1: μ ≠ 800. (b) Confidence interval for μ. [Panel (a): accept H0 if X̄ falls between 788 and 812, around the hypothetical value μ0 = 800; the observed value is X̄ = 810. Panel (b): confidence interval 798 to 822, around X̄ = 810; in both panels the error allowance is t.025 s/√n = 2.06(30/√25) ≈ 12.]

⁷ More specifically, "H0 should not be rejected." To simplify the exposition in this section and avoid double negatives, we shall use "accept H0" rather than "do not reject H0," although as we have pointed out earlier, the latter (weaker) conclusion may be the only one justified.

2. Alternatively, the sample result could be used to construct a confidence interval for μ. Using the same 95% level of confidence, this confidence interval is defined in (8-15) as:

    μ = X̄ ± t.025 s/√n = 810 ± 2.06 (30/√25)    (9-34)

or

    798 ≤ μ ≤ 822    (9-35)

This is shown in Figure 9-9b.

This is the key point: μ0 is an acceptable hypothesis if and only if it falls within the confidence interval. In the hypothesis test we note that the observed X̄ of 810 falls in the acceptable region defined in Figure 9-9a; hence μ0 is acceptable. At the same time, in Figure 9-9b we note that μ0 falls within the confidence interval. This is clear from the diagram, since the interval is the same length in both cases: it is constructed by adding and subtracting precisely the same error allowance (t.025 s/√n = 12). Provided the sample mean X̄ and μ0 differ by less than this, μ0 will fall in the confidence interval, and will also be an acceptable hypothesis. This holds for any μ0. (To confirm, note that μ0 = 797.6 would be just barely contained in the confidence interval at the bottom; at the same time this hypothetical value would shift our acceptable region to the left in the top diagram to the point where our sample X̄ = 810 would just barely remain in that region. But any smaller hypothetical value of μ will fall outside our confidence region and be rejected.)

It can be proven, in general,⁸ that

    H0 is accepted if and only if the relevant confidence interval contains H0.    (9-36)

⁸ For a general algebraic proof (rather than geometric interpretation) of (9-36), consider the basis of both the confidence interval and the hypothesis test. (We illustrate with the normal distribution, but our remarks are equally valid for most tests.) With 95% probability,

    |X̄ − μ| / (σ/√n) < 1.96    (9-37)

In deciding whether to accept the null hypothesis μ0, we first fix μ0, and then see whether the observed X̄ satisfies this inequality. In constructing a confidence interval, we first observe X̄; then the values of μ which satisfy (9-37) form our confidence interval. Thus μ0 will be in the confidence interval if and only if μ0 is accepted, for in both cases we have

    |X̄ − μ0| / (σ/√n) < 1.96

noting, of course, that the level of confidence (e.g., 95%) must match the level of type I error (level of significance, 5%).
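The equivalence in (9-36) can be verified directly with the light-bulb numbers. The following sketch computes both the acceptance region (9-32) and the confidence interval (9-34), and confirms that X̄ = 810 lies in the first exactly when μ0 = 800 lies in the second.

```python
from math import sqrt

N, XBAR, S, T_CRIT = 25, 810.0, 30.0, 2.06   # t.025 with 24 d.f.
MU0 = 800.0
half = T_CRIT * S / sqrt(N)                  # common error allowance, about 12.4

accept_lo, accept_hi = MU0 - half, MU0 + half    # (9-32): roughly 788 to 812
ci_lo, ci_hi = XBAR - half, XBAR + half          # (9-34): roughly 798 to 822

accept_H0 = accept_lo <= XBAR <= accept_hi       # X-bar in acceptance region?
mu0_in_ci = ci_lo <= MU0 <= ci_hi                # mu0 in confidence interval?
# Both conditions reduce to |XBAR - MU0| <= half, so they always agree
```

The last comment is the algebraic content of footnote 8: both procedures test the same inequality, read in different directions.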

*(b) One-sided Hypothesis Tests

Equation (9-36) remains true for a one-sided test of a hypothesis, provided, of course, that we use a one-sided confidence interval, as shown in Figure 9-10. Using the same sample result, we see that the observed X̄ of 810 falls in the acceptable region defined in the hypothesis test in Figure 9-10a; hence μ0 is acceptable. At the same time, in Figure 9-10b we note that μ0 falls within the confidence interval. This illustrates once more that H0 is accepted if and only if the confidence interval contains H0.

FIG. 9-10 Comparison of one-sided hypothesis test and confidence interval (using the same sample result as Fig. 9-9). (a) Test of H0: μ0 = 800 versus H1: μ > 800. (b) Confidence interval for μ. [Panel (a): accept H0 if X̄ ≤ 810.3, where 810.3 = 800 + 1.71(30/√25) and t.05 = 1.71. Panel (b): one-sided confidence interval μ ≥ 799.7, where 799.7 = 810 − 10.3.]

The reasons for one-sided hypothesis tests have been established at length in this chapter. These same reasons justify the use of one-sided confidence intervals too. Suppose, for example, that the federal government is considering construction of a multipurpose dam in a river basin. Suppose further that the cost of this installation is $100 million. The problem is: do the benefits from the project exceed this cost?

To get an idea of irrigation benefits, suppose we run a careful calculation of the operation of a random sample of 25 farmers in the river basin, and estimate that the net profit (per 100 acres) will increase on the average by $810 (with a standard deviation of $30). To simplify the exposition, we have used the same numbers as in Figures 9-9 and 9-10, except that X̄ and μ now refer to the average increase in profit.

The best point estimate of μ (average profit increase) is 810. But if we use this in our benefit calculations, we will take no account of its reliability; i.e., it may be way too high, or way too low. Now consider the alternative estimate of 799.7, the critical point of the one-sided confidence interval in Figure 9-10. We can be 95% confident that this figure understates. We don't know by how much it understates, but this doesn't matter; the point is that we are almost certain that this figure understates benefits. Suppose we use similar underestimates of other benefits (flood control, recreation, etc.), and the sum of these estimates comes to $110 million.⁹ We can now be very confident that benefits exceed costs, since at each stage we have consciously underestimated benefits. From a policy point of view this is a much stronger conclusion than the claim that the "best estimate" of benefits is $120 million, since the reliability of this estimate remains a mystery. (This strategy clearly has a major drawback. An understatement of benefits may reduce the estimated benefits below cost, in which case we would have to start all over again.)

Thus, by "cooking the case" against our conclusion, it is strengthened. Economists often apply this general philosophy in another way, by selecting adverse assumptions in order to strengthen a policy conclusion; they may use one-sided confidence intervals in the future for the same reason.

⁹ I.e., the present value of these accumulated benefits must exceed costs. Issues such as the appropriate rate of discount, and the extent to which considerations other than benefits and costs justify the project, are also important; but we concentrate here on the statistical issues.
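The one-sided bound used in the dam example can be sketched the same way; 1.71 is the one-tailed t.05 value with 24 degrees of freedom.

```python
from math import sqrt

def lower_conf_bound(xbar, s, n, t_crit=1.71):
    """One-sided 95% lower confidence bound: mu >= xbar - t.05 * s / sqrt(n)."""
    return xbar - t_crit * s / sqrt(n)

# Irrigation-benefit sample: mean increase $810, s = $30, n = 25 farmers
bound = lower_conf_bound(810, 30, 25)
# bound is about 799.7: we are 95% confident the true mean exceeds this
```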

*(c) The Confidence Interval as a General Technique

The reader may ask: "Doesn't (9-36) reduce hypothesis testing to a very simple adjunct of interval estimation?" In a sense this is true. Whenever a confidence interval has been constructed, it can immediately be used to test any null hypothesis: the hypothesis is accepted if and only if it is in the confidence interval. To emphasize this point, we can restate (9-36) in the equivalent form:

    A confidence interval may be regarded as just the set of acceptable hypotheses.    (9-38)

The next question is whether, in view of this, our study of hypothesis testing in this chapter has been a waste of time. Why not simply construct the (single) appropriate confidence interval, and use this to test any null hypothesis that anyone may suggest? There is a good deal of validity to this conclusion; nevertheless, our brief study of hypothesis testing has been necessary for the following reasons:

1. Historically, hypothesis testing has been frequently used in physical and social science research. This technique must be understood if it is to be evaluated; specifically, the nature of type I and type II error and the warnings about accepting H0 must be understood.
2. Certain hypotheses have no corresponding simple confidence interval, and are consequently tested on their own.
3. The calculation of a prob-value provides additional information not available if the hypothesis is tested from a confidence interval.
4. Hypothesis testing plays an important role in statistical decision theory, developed in Chapter 15.

PROBLEMS

9-15 Three different sources claim that the average income in a certain profession is $7200, $6000, and $6400 respectively. You find from a sample of 16 persons in the profession that their mean salary is $6030 and the standard deviation is $570.
    (a) At the 5% significance level, test each of the three hypotheses, one at a time.
    (b) Construct a 95% confidence interval for μ. Then test each of the 3 hypotheses by simply noting whether it is included in the confidence interval.

(9-16) A sample of 8 students made the following marks: 3, 9, 6, 6, 8, 7, 8, 9. Assume the population of marks is normal. At a 5% level of significance, which of the following hypotheses about the mean mark (μ) would you reject?
    (a) μ0 = 8.    (b) μ0 = 6.3.    (c) μ0 = 4.    (d) μ0 = 9.

*9-17 As in the second example of Section 9-2(e), suppose a standard process of manufacturing television tubes has a mean of 12,400 hours. The engineers have found a new process which they hope is better than the old standard. To establish this, a sample of 100 tubes from the new process is found to have a mean of 12,760 hours, and a standard deviation of 4000 hours.
    (a) Construct a one-sided confidence interval for the new μ.
    (b) Calculate the prob-value associated with the null hypothesis of no improvement.
    (c) At a 5% level of significance, do you reject the null hypothesis?

9-5 CONCLUSIONS

Hypothesis testing is a technique that must be used with great care, for several reasons. First, the construction of a confidence interval is usually preferred to a hypothesis test: a confidence interval gives a clearer picture of the observed sample result, whereas a test merely indicates whether or not the sample is statistically significant. Second, there are real problems (especially with a small sample) in accepting H0; instead, the prob-value of the sample result should be calculated. This provides a clear and immediate picture of how well the statistical results match H0, leaving the rejection decision to the reader. Finally, rejection of H0 does not answer the question "Is there any practical economic (as opposed to statistically significant) difference between our sample result and H0?" This is the broader question of decision theory, developed in Chapter 15.

Review Problems

9-18 Four coins are tossed together 144 times. The average number of heads is 2.2. To answer a gambler who fears the coins are biased towards heads, calculate the prob-value associated with the null hypothesis of fair coins.

9-19 A sample of 784 men and 820 women in 1962 showed that 30 percent of the men and 22 percent of the women stated they were against the John Birch Society. The majority had no opinion.
    (a) Letting πM and πW be the population proportion of men and women respectively who are against the Society, construct a 95% confidence interval for the difference (πM − πW).
    (b) What is the prob-value for the null hypothesis that (πM − πW) = 0?
    (c) At the 5% significance level, is the difference between men and women statistically significant? (I.e., do you reject the null hypothesis?)
    (d) Would you judge this difference to be of sociological significance?

(9-20) Of 400 randomly selected townspeople in a certain city, 184 favored a certain presidential candidate. Of 100 randomly selected students in the same city, 40 favored the candidate.
    (a) To judge whether the student population and town population have the same proportion favoring the candidate, calculate the prob-value.
    (b) Is the difference in the students and townspeople statistically significant, at the 5% level?

9-21 To complete a certain task, a sample of 100 workers in one plant took an average of 12 minutes, with a standard deviation of 2.5 minutes. A sample of 100 workers in a second plant took an average of 11 minutes, with a standard deviation of 2.1 minutes.
    (a) Construct a 95% confidence interval for the difference in the two population means.
    (b) Calculate the prob-value for the null hypothesis that the two population means are the same.
    (c) Is the difference in the two sample means statistically significant at the 5% level?

9-22 By talking to a random sample of 50 students, suppose you find that 27 percent support a certain candidate for student government. To what extent does this invalidate the claim that only 20% of all the students support the candidate?

chapter 10

Analysis of Variance

10-1 INTRODUCTION

In the last three chapters we have made inferences about one population mean; moreover, in Section 8-1 we extended this to the difference in two population means. Now we compare r means, using techniques commonly called analysis of variance.¹ Since the development of this technique becomes complicated and mathematical, we shall give a plausible, intuitive description of what is involved, rather than rigorous proofs.

10-2 ONE-FACTOR ANALYSIS OF VARIANCE

As an example, suppose that three machines (A, B, and C) are being compared. Because these machines are operated by men, and for other inexplicable reasons, output per hour is subject to chance fluctuation. In the hope of "averaging out" and thus reducing the effect of chance fluctuation, a random sample of 5 hours is obtained from each machine and set out in Table 10-1, along with the mean of each sample.

Of the many questions which might be asked, the simplest are set out in Table 10-2.

¹ To keep the argument simple, we assume (among other things) that there is an equal size sample (n) drawn from each of the r populations. While such balanced samples are typical in the experimental sciences (such as biology and psychology), they are often impossible in the nonexperimental sciences (e.g., economics and sociology). While analysis of variance can be extended to take account of these circumstances, regression analysis (dealt with in Chapters 11 to 14) is an equally good, and often preferred, technique. But regardless of its limitations, analysis of variance is an enlightening way of introducing regression.

TABLE 10-1 Sample Output of Three Machines

    Machine, or        Sample Output from Machine         Average X̄i
    Sample Number
    i = 1              48.4  49.7  48.7  48.5  47.7       48.6
    i = 2              56.1  56.3  56.9  55.1  57.6       56.4
    i = 3              52.1  51.1  51.6  52.1  51.1       51.6
                                                     X̄ = 52.2

TABLE 10-2

    Question                                    How It Is Answered
    (a) Are the machines different?             Analysis of variance (test of hypothesis)
    (b) How much are the machines different?    Multiple comparisons (simultaneous
                                                confidence intervals)

(a) Hypothesis Test

The first question is "Are the machines really different?" That is, are the sample means X̄i in Table 10-1 different because of differences in the underlying population means μi (where μi represents the lifetime performance of machine i)? Or may these differences in X̄i be reasonably attributed to chance fluctuations alone?

To illustrate, suppose we collect three samples from one machine, as shown in Table 10-3. As expected, sample statistical fluctuations cause small differences in sample means even though the μ's are identical.

TABLE 10-3 Three Samples of the Output of One Machine

    Sample Number      Sample Values                      Average X̄i
    i = 1              51.7  53.0  52.0  51.8  51.0       51.9
    i = 2              52.1  52.3  52.9  53.6  51.1       52.4
    i = 3              52.8  51.8  52.3  52.8  51.8       52.3
                                                     X̄ = 52.2

So the question may be rephrased: "Are the differences in the X̄i of Table 10-1 of the same order as those of Table 10-3 (and thus attributable to chance fluctuation), or are they large enough to indicate a difference in the underlying μ's?" The latter explanation seems more plausible; but how do we develop a formal test?

As before, the hypothesis of "no difference" in the population means becomes the null hypothesis,

    H0: μ1 = μ2 = μ3    (10-1)

The alternate hypothesis is that some (but not necessarily all) of the μ's are different,

    H1: μi ≠ μj for some i and j    (10-2)

To develop a plausible test of this hypothesis we first require a numerical measure of the degree to which the sample means differ. We therefore take the three sample means in the last column of Table 10-1 and calculate their variance. Using formula (2-6) (and being very careful to note that we are calculating the variance of the sample means, and not the variance of all values in the table), we have

    sX̄² = (1/(r − 1)) Σᵢ (X̄i − X̄)²
        = ½[(48.6 − 52.2)² + (56.4 − 52.2)² + (51.6 − 52.2)²] = 15.5    (10-3)

where

    X̄ = (1/r) Σᵢ X̄i = 52.2    (10-4)

and r = the number of rows (i.e., the number of sample means), in this case 3.

Yet sX̄² does not tell the whole story. Consider, for example, the data of Table 10-4, which has the same sample means (and hence the same sX̄²) as Table 10-1, yet much more erratic chance fluctuations within each row.

TABLE 10-4 Samples of the Production of Three Different Machines

    Sample Number      Sample Output from Machine         Average X̄i
    i = 1              54.6  45.7  56.7  37.7  48.3       48.6
    i = 2              53.4  57.5  54.3  52.3  64.5       56.4
    i = 3              56.7  44.7  50.6  56.5  49.5       51.6
                                                     X̄ = 52.2

The implications of this are shown in Figure 10-1. In Figure 10-1a, the machines are so erratic that all the sample outputs could conceivably have come from one common population; i.e., the (same) differences in sample means may be explained by chance. On the other hand, in Figure 10-1b the machines are not erratic, so the differences in sample means can hardly be explained by chance. We conclude that the μ's are different, and reject H0, because the variance in sample means (sX̄²) is large relative to the chance fluctuation.

FIG. 10-1 (a) Graph of Table 10-4: all samples could have come from one common population. (b) Graph of Table 10-1: apparently 3 different populations. [Vertical axis: output, roughly 40 to 60.]

How can we measure this chance fluctuation? Intuitively, we seem to need the spread (or variance) of the observed values within each sample, interpreting it as our standard of comparison. Thus we compute the variance within the first sample in Table 10-1:

    s1² = (1/(n − 1)) Σⱼ₌₁ⁿ (X1j − X̄1)² = ¼[(48.4 − 48.6)² + ⋯] = .52    (10-5)

where X1j is the jth observed value in the first sample.

Similarly we compute the variance within the second sample (s2² = .87) and within the third (s3² = .25). The simplest measure of chance fluctuation is the average of these variances within each of the r samples, referred to as the "pooled variance" sp²:

    sp² = (1/r) Σᵢ si² = (.52 + .87 + .25)/3 = .547    (10-6)

Each si² has (n − 1) degrees of freedom, so that the pooled variance sp² has r(n − 1) degrees of freedom.

The key question can now be stated: is sX̄² large relative to sp²? In practice, we examine the ratio

    F = n sX̄² / sp²    (10-7)

called the "variance ratio." (The F distribution of this ratio is introduced below.) If H0 is true, this ratio will have, on the average, a value near 1; because of statistical fluctuation, it will sometimes be above 1, sometimes below. If H0 is not true (and the μ's are not the same), then n sX̄² will be relatively large compared to sp², and the F value in (10-7) will be greater than 1. Formally, H0 is rejected if the computed value of F is significantly greater than 1.
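The variance ratio (10-7) can be computed end to end in a few lines of code. The data below are transcribed from Table 10-1; the last entry of machine 2 is taken as 57.6 (an assumption on our part) so that the sample variances match the text's .52, .87, and .25, and the result differs slightly from the text's 141 only because the text rounds sp² to .547.

```python
def variance_ratio(samples):
    """One-factor ANOVA F statistic (10-7): F = n * s_xbar^2 / s_p^2."""
    r, n = len(samples), len(samples[0])
    means = [sum(s) / n for s in samples]
    grand = sum(means) / r
    s2_xbar = sum((m - grand) ** 2 for m in means) / (r - 1)        # (10-3)
    s2_pooled = sum(sum((x - m) ** 2 for x in s) / (n - 1)
                    for s, m in zip(samples, means)) / r            # (10-6)
    return n * s2_xbar / s2_pooled

table_10_1 = [
    [48.4, 49.7, 48.7, 48.5, 47.7],   # machine 1, mean 48.6
    [56.1, 56.3, 56.9, 55.1, 57.6],   # machine 2, mean 56.4
    [52.1, 51.1, 51.6, 52.1, 51.1],   # machine 3, mean 51.6
]
F = variance_ratio(table_10_1)   # about 141.6, far above the critical 3.89
```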

Before developing the formal test further, we interpret (10-7) from another point of view. Suppose H0 is true, and the three population means are the same. (In addition, the assumptions are necessary for the formal test that our samples are drawn from normal populations with the same variance σ².) Then all of our observations could be viewed as one large sample drawn from a single population. Now consider three alternative ways of estimating the variance σ² of that population.

1. The most obvious way would be to estimate it by computing the variance of the one large sample.
2. The second way is to estimate the variance within each of the 3 samples as in (10-5), and average these estimates as in (10-6). This is the denominator sp² of (10-7).
3. Infer σ² from sX̄², the observed variance of the sample means. Recall from Chapter 6 how the variance of sample means is related to the variance of the population:

    σX̄² = σ²/n    (10-9), (6-12) repeated

This suggests estimating σ² as n sX̄², which is recognized as the numerator of (10-7). We note that we are estimating the population variance by "blowing up" the observed variance of the sample means.

To recapitulate: if H0 is true, we can estimate σ² by three valid methods. Considering only the last two, we note that one appears in the numerator of (10-7), the other in the denominator; they should be about equal, and their ratio close to 1. [This establishes why n was introduced into the numerator of (10-7).] But if H0 is not true, the denominator will still reflect only chance fluctuation, while the numerator will be a blow-up of the differences between means; this ratio will consequently be large.

The formal test of H0, like any other test, requires knowledge of the distribution of the observed statistic (in this case F) if H0 is true. This is shown in Figure 10-2, along with the critical F.05 value, cutting off 5% of the upper tail of the distribution. Thus, if H0 is true, there is only a 5% probability that we would observe an F value exceeding 3.89, and consequently reject H0. It is conceivable, of course, that H0 is true and we were very unlucky; but we choose the more plausible explanation that H0 is false.

FIG. 10-2 The distribution of F when H0 is true (with 2, 12 degrees of freedom). [Reject H0 to the right of the critical point 3.89.]

To illustrate this procedure, let us reconsider the three sets of sample results shown in Tables 10-1, 10-3, and 10-4, and in each case ask whether the machines exhibit differences that are statistically significant. In other words, in each case we test H0: μ1 = μ2 = μ3 against the alternative that the μ's are not all equal. For the data in Table 10-3, an evaluation of (10-7) yields

    F = n sX̄² / sp² = .35/.547 = .64    (10-10)

Since this is below the critical F.05 value of 3.89, we conclude that the observed differences in the means of Table 10-3 can reasonably be explained by chance fluctuations. (This is no surprise; recall that these three samples were generated from the same machine.)

For the data in Table 10-4, the F ratio is

    F = 77.4/35.7 = 2.7    (10-11)

In this case, the difference between sample means (and consequently the numerator) is much greater. But so is the chance fluctuation (reflected in a large denominator). Again, the F value is less than the critical value 3.89.

However, for the data in Table 10-1, the F ratio is

    F = 77.4/.547 = 141    (10-12)

In this case, the difference in sample means is very large relative to the chance fluctuation, making the F ratio far exceed the critical value 3.89, so that H0 is rejected.

These formal tests confirm our earlier intuitive conclusions. Table 10-1 provides the only case in which we conclude that the underlying populations have different means.

(b) The F Distribution

This distribution is so important for later applications that it is worth considering in some detail. The F distribution shown in Figure 10-2 is only one of many; there is a different distribution depending on the degrees of freedom (r − 1) in the numerator, and the degrees of freedom r(n − 1) in the denominator. Intuitively we can see why this is so. The more degrees of freedom in calculating both numerator and denominator, the closer these two estimates of variance will likely be to their target σ²; thus the more closely their ratio will concentrate around 1. This is illustrated in Figure 10-3.

FIG. 10-3 The F distribution, with various degrees of freedom in numerator and denominator. Note how the critical point (for rejecting H0) moves toward 1 as degrees of freedom increase. [Critical F.05 values: 3.89 for d.f. = 2, 12; 2.85 for d.f. = 8, 12; 1.60 for d.f. = 50, 50.]

We could present a whole set of F tables, each corresponding to a different combination of degrees of freedom. For purposes of practical testing, however, only the critical 5% or 1% points are required; these are set out in Table VII in the Appendix. From this table, we confirm the critical point of 3.89 used in Figure 10-2.

(c) The ANOVA Table

This section is devoted to a summary shorthand of how analysis of variance calculations are usually done. The model is summarized in Table 10-5. We confirm in column 2 that all samples are assumed drawn from normal populations with the same variance σ², but with means that may, or may not, differ. (Indeed, it is possible differences in these means that are being tested.)

TABLE 10-5 Summary of Assumptions

    (1)           (2)                       (3)
    Population    Assumed Distribution      Observed Sample Values
    1             N(μ1, σ²)                 X1j (j = 1 ⋯ n)
    2             N(μ2, σ²)                 X2j (j = 1 ⋯ n)
    3             N(μ3, σ²)                 X3j (j = 1 ⋯ n)
    .             .                         .
    i             N(μi, σ²)                 Xij (j = 1 ⋯ n)

    H0: μ1 = μ2 = μ3
    H1: the μ's are not all equal

The calculations are conveniently laid out in Table 10-6, called an ANOVA (ANalysis Of VAriance) table. This table is mostly a bookkeeping arrangement: the first row shows the calculation of the numerator of the F ratio, and the second row the denominator; part (b) of the table applies this shorthand to the specific example of the three machines in Table 10-1. In addition, the table provides two handy intermediate checks on our calculations. One is on the degrees of freedom in column 3. The other is on the sums of squares in column 2: the sum of squares between rows plus the sum of squares within rows adds up to the total sum of squares.³

The variation between rows is "explained" by the fact that the rows may come from different parent populations (e.g., machines that perform differently). The variation within rows is "unexplained," because it is the random or chance variation that cannot be systematically explained. Thus F is sometimes referred to as the variance ratio²

    F = explained variance / unexplained variance    (10-17)

² When any variation (i.e., any sum of squares) is divided by the appropriate degrees of freedom, the result is a variance.

³ Proved as follows. The difference, or deviation, of any observed value (Xij) from the mean of all observed values (X̄) can be broken down into two parts:

    (Xij − X̄) = (X̄i − X̄) + (Xij − X̄i)    (10-13)
    total deviation = explained deviation + unexplained deviation

Thus, using Table 10-1 as an example, the third observation in the second sample (56.9) is 4.7 greater than X̄ = 52.2. This total deviation can be broken down into two parts:

    (56.9 − 52.2) = (56.4 − 52.2) + (56.9 − 56.4)
    4.7 = 4.2 + .5

Most of this deviation (4.2) is explained by the machine, while very little (.5) is unexplained, due to random or chance fluctuations. Clearly (10-13) must always be true, since the two occurrences of X̄i on the right-hand side cancel. Square both sides of (10-13) and sum over all i and j:

    Σᵢ Σⱼ (Xij − X̄)² = Σᵢ Σⱼ [(X̄i − X̄) + (Xij − X̄i)]²    (10-14)

On the right side, the middle (cross product) term is

    2 Σᵢ (X̄i − X̄) Σⱼ (Xij − X̄i)    (10-15)

which must be zero, since Σⱼ (Xij − X̄i), the sum of deviations about the sample mean, is always zero. [The first factor (X̄i − X̄) is independent of j, and so may be moved outside the sum over j.] Substituting these two conclusions back into (10-14), we have:

    Σᵢ Σⱼ (Xij − X̄)² = n Σᵢ (X̄i − X̄)² + Σᵢ Σⱼ (Xij − X̄i)²    (10-16)
    Total variation = explained variation + unexplained variation
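The decomposition (10-16) is easy to confirm numerically. The sketch below checks, for the Table 10-1 data (with machine 2's last value taken as 57.6, as assumed earlier), that the total sum of squares equals the explained plus the unexplained sums of squares.

```python
data = [
    [48.4, 49.7, 48.7, 48.5, 47.7],
    [56.1, 56.3, 56.9, 55.1, 57.6],
    [52.1, 51.1, 51.6, 52.1, 51.1],
]
n = len(data[0])
row_means = [sum(row) / n for row in data]
grand = sum(x for row in data for x in row) / (n * len(data))

total = sum((x - grand) ** 2 for row in data for x in row)
explained = n * sum((m - grand) ** 2 for m in row_means)            # between rows
unexplained = sum((x - m) ** 2
                  for row, m in zip(data, row_means) for x in row)  # within rows
# total == explained + unexplained, as (10-16) asserts
```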

This suggests a possible means of strengthening the F test. Suppose that these three machines are sensitive to differences in temperature. Why not introduce temperature explicitly into the analysis? If some of the previously unexplained variation can now be explained by temperature, the denominator of (10-17) will be reduced. With the larger F value that results, we will have a more powerful test of the machines (i.e., we will be in a stronger position to reject H0). Thus the introduction of other explanations of variance will assist us in detecting whether one specific influence (machine) is important. This brings us to two-way ANOVA in Section 10-3.

*(d) Confidence Intervals

The difficulties with hypothesis tests cited in Chapter 9 hold true in ANOVA as well. It may not be too enlightening to ask whether population means differ: by increasing sample size enough, a difference can nearly always be established, even though it is too small to be of any practical or economic importance. Again, it may be more important to ask "by how much do the population means differ?" If we wanted to compare only two machines in Table 10-1, this would be an easy question to answer: just construct a confidence interval for (μ1 − μ2) using the t distribution:

(8-17) repeated
In (8-17), the variance s² was pooled from the two samples. However, it is more reasonable to use all the information available, and pool the variance from all three samples as in (10-6), with 4 + 4 + 4 = 12 degrees of freedom. Using the pooled variance s² = .547 (s = .74) from (10-6), the 95% confidence interval is

(μ1 − μ2) = (48.6 − 56.4) ± 2.179 (.74) √(2/5)
          = −7.8 ± 1.0
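The interval just computed can be checked in a few lines; the figures (sample means, pooled variance, and the t value for 12 d.f.) are those quoted in the text:

```python
from math import sqrt

# Pooled-variance t interval for (mu1 - mu2), using the text's figures:
# sample means 48.6 and 56.4 (n = 5 each), pooled s^2 = .547, t.025 = 2.179.
xbar1, xbar2, n = 48.6, 56.4, 5
s2_pooled, t_025 = .547, 2.179

half_width = t_025 * sqrt(s2_pooled * (1 / n + 1 / n))
print(round(xbar1 - xbar2, 1), "+/-", round(half_width, 1))   # -7.8 +/- 1.0
```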

Similar confidence intervals may be constructed for (μ1 − μ3) and for (μ2 − μ3) [or, in general, for a total of r(r − 1)/2 such intervals for r populations]; in our example, the three intervals are:

(10-19)
(a) (μ1 − μ2) = −7.8 ± 1.0
(b) (μ1 − μ3) = −3.0 ± 1.0
(c) (μ2 − μ3) = +4.8 ± 1.0

The results of this piece-by-piece approach are summarized in Table 10-7.

TABLE 10-7 Differences in Population Means (μi − μj), Estimated from Sample Means (X̄i − X̄j). 95% Level of Confidence in Each Interval Estimate

        2            3
1   −7.8 ± 1.0   −3.0 ± 1.0
2                +4.8 ± 1.0

*(e) Simultaneous Confidence Intervals: Multiple Comparisons

There is just one difficulty with this approach. Although we can be 95% confident of each individual statement [e.g., 10-19(a)], we must be less confident that the whole system of statements (10-19) is true: there are three ways in which this system could go wrong. If the three individual confidence statements were independent, the level of confidence for the whole system would be only (.95)³ = .857. In fact they are not independent; for example, they all involve the common pooled variance s². Although this dependence means that the confidence level of the whole system is not exactly .857, it will still be well below the 95% level of each individual statement. The problem is how to allow for this, in order to obtain the correct simultaneous confidence. In fact, the problem is usually stated the other way around: how much wider must the individual intervals in (10-19) be, in order to yield a 95% level of confidence that all are simultaneously true? Of the many solutions, we quote without proof the simplest, due to Scheffé:³ with 95% confidence, all the following statements⁴ are true:

(10-20)
(μ1 − μ2) = (X̄1 − X̄2) ± √((r − 1)F.05) sp √(2/n)
(μ1 − μ3) = (X̄1 − X̄3) ± √((r − 1)F.05) sp √(2/n)
(μ2 − μ3) = (X̄2 − X̄3) ± √((r − 1)F.05) sp √(2/n)

³ H. Scheffé, The Analysis of Variance, pp. 66-73. New York: John Wiley, 1959.
⁴ And some other statements as well, as we shall see in (10-26). In fact, if we were interested in only the three comparisons of means in (10-20), our interval estimates could be made slightly narrower.

where

F.05 = the critical value of F with (r − 1) and r(n − 1) d.f., leaving 5% in the upper tail
sp² = the pooled sample variance, as calculated in Table 10-6 or in (10-6)
r = the number of rows (means) to be compared
n = the sample size in each row

We note the similarity of statements (10-20) and (10-19); these interval estimates differ only in their width. The actual calculations of the simultaneous confidence intervals are:

(10-21)
(a) (μ1 − μ2) = (48.6 − 56.4) ± √(2(3.89)) (.74) √(2/5) = −7.8 ± 1.3
(b) (μ1 − μ3) = −3.0 ± 1.3
(c) (μ2 − μ3) = +4.8 ± 1.3

These are summarized in Table 10-8. As expected, the interval width is greater than in Table 10-7 (compare ± 1.3 versus ± 1.0). Indeed, it is this increased width (vagueness) that makes us 95% confident that all statements are true.

TABLE 10-8 Differences in Population Means (μi − μj), Estimated from Sample Means (X̄i − X̄j). 95% Level of Confidence in All Interval Estimates. (Compare with Table 10-7.)

        2            3
1   −7.8 ± 1.3   −3.0 ± 1.3
2                +4.8 ± 1.3

As a bonus, this theory can be used to make any number of comparisons of means, called "contrasts." A "contrast of means" is defined as a linear combination, or weighted sum, with weights that add to zero:

(10-21a)  Σ Ci μi

provided

(10-22)  Σ Ci = 0
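The Scheffé allowance of (10-21) and Table 10-8 can be checked numerically, using the figures quoted in the text:

```python
from math import sqrt

# Scheffe's simultaneous allowance (10-20) for all pairwise differences,
# with the text's figures: r = 3 rows, n = 5 per row, pooled s = .74,
# and F.05 = 3.89 with (r - 1, 12) d.f.
r, n, s_p, F_05 = 3, 5, .74, 3.89

allowance = sqrt((r - 1) * F_05) * s_p * sqrt(2 / n)
print(round(allowance, 1))   # 1.3, versus 1.0 for one t interval alone
```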

For example, the simplest contrast is the difference in means estimated in (10-21a):

(10-23)  μ1 − μ2 = (+1)μ1 + (−1)μ2 + (0)μ3

As another interesting example, consider the difference between μ1 and the average of μ2 and μ3:

(10-24)  μ1 − ½(μ2 + μ3) = (+1)μ1 + (−½)μ2 + (−½)μ3

There is no limit to the number of such contrasts. It will be no surprise that each contrast of the population means is estimated by the same contrast of the sample means, plus or minus an error allowance. For example, the contrast in (10-24) is estimated as

(10-25)  [μ1 − ½(μ2 + μ3)] = [X̄1 − ½(X̄2 + X̄3)] ± error allowance

The general statement, from which statements like (10-20) and (10-25) were derived, is: with 95% confidence, all contrasts of the population means are bracketed by the bounds:

(10-26)  Σ Ci μi = Σ Ci X̄i ± √((r − 1)F.05) sp √((Σ Ci²)/n)

provided only that Σ Ci = 0, to satisfy the definition of "contrast." As before, sp² is the pooled variance, and F.05 is the critical value of F.

When we examine (10-26) more carefully, we discover that it defines a whole set of statements, which includes not only the three statements in (10-20) but also statements like (10-25); indeed, an infinite number of interval statements can be constructed. The student may justifiably wonder: "How can we be 95% confident that an infinite number of statements are simultaneously true?" The answer is: because these statements are dependent. Thus, for example, once we have made the first two statements in (10-21), our intuition tells us that the third is likely to follow. Moreover, once these
three
statements
are made, intervals like (10-25) tend
to follow,
and can be added
with
little
damage
to our level of confidence. As the number
of statements or
contrasts grows and grows, each new statement
tends
to become simply a
restatement
of contrasts
already specified, and essentially no damage
is done
to our level of confidence.
Thus,
it can be mathematically confirmed that
the
entire
(infinite) set of contrasts in (10-26)
are all simultaneously
estimated at

a 95% level

of confidence.
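The general interval (10-26) can be evaluated for any contrast; a sketch for the contrast (10-24), using the chapter's figures (means 48.6, 56.4, 51.6; n = 5; sp = .74; F.05 = 3.89):

```python
from math import sqrt

# Interval (10-26) for an arbitrary contrast sum(Ci * mu_i), sketched with
# the text's one-way figures.
means = [48.6, 56.4, 51.6]
C = [1, -0.5, -0.5]                 # mu1 contrasted with the average of mu2, mu3
assert abs(sum(C)) < 1e-12          # the weights must add to zero, per (10-22)

r, n, s_p, F_05 = 3, 5, .74, 3.89
point = sum(c * m for c, m in zip(C, means))
allowance = sqrt((r - 1) * F_05) * s_p * sqrt(sum(c * c for c in C) / n)
print(round(point, 1), "+/-", round(allowance, 1))   # -5.4 +/- 1.1
```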

PROBLEMS

10-1 A sample of 4 workers was drawn at random from each of two different industries, with their average annual incomes (in $00) recorded as follows:

Industry 1: 66 63 65 62
Industry 2: 58 61 53 56

(a) Using first a t test (as in Chapter 8) and then an ANOVA F test, calculate whether or not there is a statistically significant difference in income at the 5% level.
(b) Are the t and F tests exactly equivalent? Can you see why the t² distribution is often referred to as the F distribution with 1 degree of freedom in the numerator?
*(c) Using first the t distribution in (8-17), and then the F distribution in (10-20), construct a 95% confidence interval for the difference in income in the two industries.

10-2 Twelve plots of land are randomly divided into 3 groups. The first is held as a control group, while fertilizers A and B are applied to the other 2 groups. Yield is observed to be:

Control C:    60 64 65 55
Fertilizer A: 75 70 66 69
Fertilizer B: 74 78 72 68

(a) At a 5% significance level, does fertilizer affect yield?
*(b) Construct a table of differences in means, similar to Table 10-8, starring the differences that are statistically significant.
*(c) Can you be 95% confident that the two fertilizers have a different effect?
*(d) What is the difference between a contrast of means and a weighted average of means?

10-3 The annual incomes (Y) of a sample of 4 women and 4 men in a certain occupation were observed to be:

Women: 56 50 54 48
Men:   70 62 48 60

(a) At a 5% level of significance, can you reject the null hypothesis that the mean income is the same for men and women?
*(b) Construct a 95% confidence interval for the difference in the two means.
Since this problem is important later in Chapter 13, we state its solution. The sample means are Ȳ1 = 52, Ȳ2 = 60, and Ȳ = 56.

(a) ANOVA Table

Source of Variation    Variation    d.f.    Variance
Between sexes             128         1        128
Residual                  288         6         48
Total                     416         7

F = 128/48 = 2.67, which is less than the critical value of 5.99; thus the difference is not statistically significant.

*(b) Evaluate the first equation in (10-20); or, more simply, use (8-17), noting that t.025 = √F.05:

(μ1 − μ2) = (52 − 60) ± 2.45 √(2(48)/4)
          = −8 ± 12

This also confirms the answer in (a); since this interval includes zero,
this is not statistically significant.
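The ANOVA table in this solution can be verified directly from the income data; a sketch:

```python
# Recompute the Problem 10-3 ANOVA table from the income data
# (group means 52 and 60, grand mean 56).
groups = [[56, 50, 54, 48], [70, 62, 48, 60]]
means = [sum(g) / len(g) for g in groups]
grand = sum(sum(g) for g in groups) / sum(len(g) for g in groups)

between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)

F = (between / 1) / (within / 6)
print(between, within, round(F, 2))   # 128.0 288.0 2.67
```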

*10-4 Referring to the machine example of Table 10-1 and the ANOVA Table 10-6(b), use equation (10-26) to solve the following problem. Suppose one factory is to be outfitted entirely with machines of the first type, and a second factory is to be outfitted with machines of the second and third types, in the proportions 30% and 70%. Find a 95% confidence interval for the difference in the mean production of the 2 factories.

10-5 From each of three large classes, 50 students were sampled, with the following results:

Class    Average Grade X̄    Standard Deviation s
A              68                  11
B              73                  12
C              70                   8

Test whether the classes are equally good, at a 5% significance level.

10-3 TWO-FACTOR ANALYSIS OF VARIANCE

(a) The ANOVA Table

We have already seen that the F test (10-17) on the differences in machines would be strengthened if the unexplained variance in the denominator could be reduced. We suggested, for example, that some of the unexplained variance might be due to temperature, or to the human factor; if so, this might be taken into account. Suppose that the sample outputs given in Table 10-4 were produced by five different machinists, with each machinist producing one of the sample values on each machine. This data, reorganized according to a two-way classification (by machine and operator), is shown in Table 10-9.

It is necessary to complicate our notation somewhat. We are now interested in the average of each operator (X̄.j, each column average) as well as the average of each machine (X̄i., each row average*). Now the picture is clarified; some operators are efficient (the first and fourth), some are not. The machines are not that erratic after all; there is just a wide difference in the efficiency of the operators. If we can explicitly adjust for this, it will reduce our unexplained (or chance) variation in the denominator of (10-17); since the numerator will remain unchanged, the F ratio will be larger as a consequence, perhaps allowing us to reject H0. To sum up, it appears that another influence (difference in operators) was responsible for a lot of extraneous noise in our simple one-way analysis; by removing this noise, we hope to get a much more powerful test of the machines.

* The dot indicates the subscript over which summation occurs. For example, the dot in X̄i. suppresses the subscript j: X̄i. = (1/5) Σj Xij.

TABLE 10-9 Samples of Production (Xij) of Three Different Machines (as given in Table 10-4, but now rearranged according to machine operator)

                        Operator j
Machine i       1      2      3      4      5     Average X̄i.
1             56.7   45.7   48.3   54.6   37.7       48.6
2             64.5   53.4   54.3   57.5   52.3       56.4
3             56.7   50.6   49.5   56.5   44.7       51.6
Average X̄.j  59.3   49.9   50.7   56.2   44.9     X̄ = 52.2
The analysis is an extension of the one-factor ANOVA of the previous section; our test is summarized in Table 10-10. Of course, the total variation at the bottom of column 2 is the same as in the one-factor case; the small letter c represents the number of columns in Table 10-9, and replaces n. As before, the component sources of variation shown in column 2 add up to this total variation; i.e.,

(10-27)

Total variation = machine (row) variation + operator (column) variation + random variation

Σi Σj (Xij − X̄)² = c Σi (X̄i. − X̄)² + r Σj (X̄.j − X̄)² + Σi Σj (Xij − X̄i. − X̄.j + X̄)²

Machine (row) variation is the variation exhibited by the row means; operator (column) variation is defined, in a parallel way, as the variation exhibited by the column means. We note that (10-27) is established by manipulations like those used to establish (10-16) in the simpler one-factor case. (The last term is the random variation; it will be interpreted below.)

(b) Testing Hypotheses

The null hypothesis that there is no difference in the machine (row) population means is tested by constructing the variance ratio

(10-28)  F = variance explained by machines / unexplained variance = MSSr / MSSu

which, if H0 is true, has an F distribution. Thus, if the observed F value exceeds the critical F value, we reject the null hypothesis, concluding that there is a significant difference in machines, with the extraneous influence of operators taken into account. Our calculations are shown in full in Table 10-11, whence (10-28) is

(10-29)  F = 77.4/5.9 = 13.1

Since this exceeds the critical F value of 4.46 (with 2 and 8 d.f.), we reject the null hypothesis at the 5% level of significance.

TABLE 10-11 Two-Way ANOVA, for Observations Given in Table 10-9

(1) Source          (2) Variation (SS)   (3) d.f.   (4) Variance (MSS)   (5) F   (6) Critical F
Between machines         154.8               2             77.4           13.1        4.46
Between operators        381.6               4             95.4           16.2        3.84
Residual variation        47.3               8              5.9
Total                    583.7              14
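The sums of squares in Table 10-11 can be recomputed from the Table 10-9 production figures (cell layout inferred from the row and column averages quoted in the text); the results agree with the table up to its rounding:

```python
# Two-way decomposition (10-27) for the Table 10-9 data.
X = [
    [56.7, 45.7, 48.3, 54.6, 37.7],   # machine 1
    [64.5, 53.4, 54.3, 57.5, 52.3],   # machine 2
    [56.7, 50.6, 49.5, 56.5, 44.7],   # machine 3
]
r, c = len(X), len(X[0])
grand = sum(map(sum, X)) / (r * c)                                  # 52.2
row_means = [sum(row) / c for row in X]
col_means = [sum(X[i][j] for i in range(r)) / r for j in range(c)]

ss_rows = c * sum((m - grand) ** 2 for m in row_means)
ss_cols = r * sum((m - grand) ** 2 for m in col_means)
ss_resid = sum((X[i][j] - row_means[i] - col_means[j] + grand) ** 2
               for i in range(r) for j in range(c))
ss_total = sum((x - grand) ** 2 for row in X for x in row)

print(round(ss_rows, 1), round(ss_cols, 1), round(ss_resid, 1), round(ss_total, 1))
# approximately 154.8, 381.7, 47.3, 583.8 -- matching Table 10-11 up to rounding
```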

Had the critical value exceeded our observed F, we could not have rejected the null hypothesis that the machines are similar. Compare this with our one-factor test of the same null hypothesis: the numerator remains unchanged, but the chance variation in the denominator is much smaller, since the effect of differing operators has been netted out. This has given us greater statistical leverage, allowing rejection of the null hypothesis.

Similarly, we might test the null hypothesis that the operators perform equally well. Once again we use the ratio of an explained to an unexplained variance; but this time, of course, the numerator is the variance explained by operator differences. Thus, from column (5) of Table 10-11,

(10-30)  F = MSSc/MSSu = 95.4/5.9 = 16.2

Since our observed F value of 16.2 exceeds the critical F value⁸ of 3.84, we reject the null hypothesis, concluding that machinists do differ. In this case the "machine" noise has been isolated; as a consequence we have a strong test of how operators compare.

There is one issue that was passed over quickly that requires clarification. In our one-factor test, we calculated unexplained variation by looking at the spread of observed values within a whole row in Table 10-4. But in the two-way test (Table 10-9) we have split the observations columnwise as well as rowwise; this has left us with only a single observation in each category, or cell, e.g., within each machine-operator cell. Nevertheless, we have a stronger test, because we have gained more by reducing unexplained variance than we have lost because our degrees of freedom in the denominator have been reduced by 4. (The student will observe that if we are short of degrees of freedom, i.e., if we are near the top of F Table VII, loss of degrees of freedom may be serious.)

⁸ Different than in the previous test, since degrees of freedom are now 4 and 8.

Strictly speaking, unexplained variation can no longer be computed within each cell. Thus, for example, there is only one observation (57.5) of how much output is produced by operator 4 on machine 2. With a single observation in each cell, what should we do?

We ask, "If there were no random error, how would we predict the output of operator 4 on machine 2?" We note, informally, that this is a better-than-average machine (X̄2. = 56.4) and a relatively efficient operator (X̄.4 = 56.2). On both counts we would predict output to be above average. This strategy can easily be formalized⁹ to predict X̂2,4. We can do this for each cell, with the random element estimated as the difference in our observed value (Xij) and the corresponding predicted value (X̂ij). This yields a whole set of random elements, whose sum of squares is precisely the unexplained variation SSu [the last term in equation (10-27), also appearing in column 2 of Table 10-10]; divided by d.f., this becomes the unexplained variance used in the denominator of both tests in this section.

⁹ The predicted value X̂ij is defined as

(10-31)  X̂ij = X̄ + (X̄i. − X̄) + (X̄.j − X̄)

i.e., predicted performance = average performance + adjustment reflecting machine performance + adjustment reflecting operator performance. Specifically, in our example

(10-32)  X̂2,4 = 52.2 + (56.4 − 52.2) + (56.2 − 52.2) = 52.2 + 4.2 + 4.0 = 60.4

Thus, our prediction of the performance of operator 4 on machine 2 is calculated by adjusting the average performance (52.2) by the degree to which this machine is above average (4.2) and the degree to which this operator is above average (4.0). Cancelling the X̄ values in (10-31), the predicted value becomes

(10-33)  X̂ij = X̄i. + X̄.j − X̄

and the random element, being the difference between the observed and expected values, is

(10-34)  Xij − X̂ij = Xij − X̄i. − X̄.j + X̄

In our example, the random element left after adjustment is

(10-35)  X2,4 − X̂2,4 = 57.5 − 60.4 = −2.9

Thus, this observed output is 2.9 units below what we expected, and must be left unexplained: it is the result of random influences. The sum of squares of all such random elements is recognized to be the unexplained variation (SSu), the last term in (10-27).

One final warning: the two-way analysis of variance developed in this section is based on the assumption that there is no interaction between the two factors, as would occur, for example, if certain operators like some machines and dislike others. Such interaction would require a more complex model, and more sample observations.
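The footnote's prediction formula (10-33) and residual (10-34) can be applied to every cell of Table 10-9 (cell layout inferred from the marginal averages quoted in the text); the residual sum of squares then matches the unexplained SS of Table 10-11 up to rounding:

```python
# Fitted value (10-33) and residual (10-34) for each cell of Table 10-9,
# reproducing the worked example for operator 4 on machine 2.
X = [
    [56.7, 45.7, 48.3, 54.6, 37.7],
    [64.5, 53.4, 54.3, 57.5, 52.3],
    [56.7, 50.6, 49.5, 56.5, 44.7],
]
r, c = len(X), len(X[0])
grand = sum(map(sum, X)) / (r * c)                           # 52.2
row_means = [sum(row) / c for row in X]
col_means = [sum(X[i][j] for i in range(r)) / r for j in range(c)]

fitted = lambda i, j: row_means[i] + col_means[j] - grand    # (10-33)
resid = [[X[i][j] - fitted(i, j) for j in range(c)] for i in range(r)]

print(round(fitted(1, 3), 1), round(resid[1][3], 1))         # 60.4 -2.9
ss_u = sum(e * e for row in resid for e in row)
print(round(ss_u, 1))   # ~47.3, the residual SS of Table 10-11
```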

*(c) Multiple Comparisons

Turning from hypothesis tests to confidence intervals, we may write a statement for two-factor ANOVA which is quite similar to (10-26): with 95% confidence, all contrasts of the row means fall within the bounds:

(10-36)  Σ Ci μi = Σ Ci X̄i. ± √((r − 1)F.05) sv √((Σ Ci²)/c)

where

F.05 = the critical value of F, with (r − 1) and (r − 1)(c − 1) d.f.
sv = √MSSu, as calculated in Table 10-10, column 4
r = number of rows
c = number of columns

Note that (10-36) differs from (10-26) because the unexplained variance sv² is now smaller, making the confidence interval more precise.

As an example, consider the machines of Table 10-9, analyzed in ANOVA Table 10-11. With 95% confidence, all the following statements are true:

(10-37)
μ1 − μ2 = (48.6 − 56.4) ± √(2(4.46)) √5.9 √(2/5) = −7.8 ± 4.5*
μ1 − μ3 = −3.0 ± 4.5
μ2 − μ3 = +4.8 ± 4.5*

and all other possible contrasts. [Intervals that do not overlap zero are starred to indicate statistical significance: in these cases H0 (no difference in means) may be rejected. This is another illustration of how confidence intervals may be used to test hypotheses.]

Of course, we could construct confidence intervals for contrasts in the column means equally well, by simply interchanging r and c in equation (10-36). As an example, how do the operators of Table 10-9 compare, when analyzed in ANOVA Table 10-11? With 95% confidence, all the following statements are true:

(10-38)
μ.1 − μ.2 = (59.3 − 49.9) ± √(4(3.84)) √5.9 √(2/3) = 9.4 ± 7.8*
μ.1 − μ.3 = 8.6 ± 7.8*
μ.1 − μ.4 = 3.1 ± 7.8
μ.1 − μ.5 = 14.4 ± 7.8*

and all other possible contrasts. For example, if workers 1, 3, and 4 are men, and workers 2 and 5 are women, the difference in the average performance of men and women is the contrast

⅓(μ.1 + μ.3 + μ.4) − ½(μ.2 + μ.5) = (55.4 − 47.4) ± 5.5 √(5/6) = 8.0 ± 5.0*

This last contrast might be of interest; thus the average difference in men and women has been estimated, as a bonus.

The first part of (10-38), all the differences in operator means, can be presented more concisely in the form of Table 10-12.

TABLE 10-12 Differences in Operator Population Means (μ.i − μ.j), Estimated from Sample Means (X̄.i − X̄.j). [To construct 95% simultaneous confidence intervals, take each value ± 7.8. Statistically significant differences are starred.]

       2       3       4       5
1    9.4*    8.6*    3.1    14.4*
2           −0.8   −6.3      5.0
3                  −5.5      5.8
4                           11.3*
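The operator comparisons of (10-38) and Table 10-12 can be reproduced from the figures of Table 10-11; a sketch:

```python
from math import sqrt

# Scheffe allowance for operator (column) contrasts, using Table 10-11's
# figures: MSSu = 5.9 on 8 d.f., F.05(4, 8) = 3.84, r = 3 rows, c = 5 columns.
col_means = [59.3, 49.9, 50.7, 56.2, 44.9]
r, c, mss_u, F_05 = 3, 5, 5.9, 3.84

allowance = sqrt((c - 1) * F_05) * sqrt(mss_u) * sqrt(2 / r)
print(round(allowance, 1))   # 7.8

# The differences of Table 10-12, starred when they exceed the allowance:
for i in range(c):
    for j in range(i + 1, c):
        d = col_means[i] - col_means[j]
        print(i + 1, j + 1, round(d, 1), "*" if abs(d) > allowance else "")
```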

PROBLEMS

10-6 To refine the experimental design of Problem 10-2, suppose the twelve plots of land are on 4 farms (3 plots on each). Moreover, you suspect that there may be a difference in fertility between farms. You now retabulate the data in Problem 10-2, according to fertilizer and farm, as follows:

Fertilizer \ Farm    1    2    3    4
C                   60   64   65   55
A                   75   70   66   69
B                   74   78   72   68

(a) Reanalyze whether or not the fertilizers differ, at the 5% significance level.
(b) Is there, after all, a difference in fertility in the four farms? (Use a 5% significance level.)
*(c) Construct a table of differences in means, similar to Table 10-12, starring the differences that are statistically significant; also construct a table of simultaneous 95% confidence intervals.

10-7 Three men work on an identical task of packing boxes. The number of boxes packed by each in certain hours is shown in the table below.

Man \ Hour    11-12 A.M.    1-2 P.M.    4-5 P.M.
1                 22           22          18
2                 25           16          17
3                 21           18          21

(a) Test whether each factor (man, hour) is statistically significant at the 5% level.
*(b) For the factors which are statistically significant, construct a table of simultaneous 95% confidence intervals.

10-8 Five children were tested before and after a certain television program, with the following pulse rate results:

Child       1     2     3     4     5
Before     96   102   108    89    85
After     104   112   112    93    89

(a) Test whether pulse rate changes, at the 5% significance level.
*(b) Construct a 95% confidence interval for the change in the pulse rate of the population of all children.

10-9 Rework Problem 10-8 using the following technique (matched t-test). First, tabulate the sample of differences D in pulse rate:

Before (X)            96   102   108    89    85
After (Y)            104   112   112    93    89
Difference D = (Y − X)  +8   +10    +4    +4    +4

The sample of D's fluctuates around the true difference Δ. Now apply equation (8-15) to estimate Δ.
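A sketch of the matched t-test that Problem 10-9 asks for, working with the differences D; t.025 = 2.776 with n − 1 = 4 d.f. is a standard table value (the book's equation (8-15) is not reproduced in this excerpt):

```python
from math import sqrt

# Matched t interval for the mean pulse-rate change.
D = [8, 10, 4, 4, 4]
n = len(D)
d_bar = sum(D) / n                                # 6.0
s2 = sum((d - d_bar) ** 2 for d in D) / (n - 1)   # 8.0
half_width = 2.776 * sqrt(s2 / n)
print(d_bar, "+/-", round(half_width, 1))         # 6.0 +/- 3.5
```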

chapter 11

Introduction to Regression

Our first example of statistical inference (in Chapter 7) was estimating the mean of a single population. This was followed (Chapter 8) by a comparison of two population means. Finally (Chapter 10) r population means were compared, using analysis of variance. We now consider the question "Can the analysis be improved upon if the r populations do not fall in unordered categories, but are ranked numerically?"

For example, it is easy to see how the analysis of variance could be used to examine whether wheat yield depended on 7 different kinds of fertilizer.¹ Now we wish to consider whether yield depends on 7 different amounts of fertilizer; in this case, fertilizer application is defined in a numerical scale.

If yield (Y) at various fertilizer applications (X) is plotted, a scatter similar to Figure 11-1 might be observed. From this scatter it is clear

FIG. 11-1 Observed relation of wheat yield (bu/acre) to fertilizer application (lb/acre). [scatter diagram]

¹ By extending Problem 10-2.

that the amount of fertilizer affects yield. Moreover, it should be possible to define how; i.e., to describe the dependence of Y on X. Estimating this dependence geometrically amounts to fitting a curve through this scatter. This regression of Y on X is a simple mathematical model, useful as a brief and precise description, or as a means of predicting the yield Y for a given quantity of fertilizer X. As another example, regression is the most useful of all statistical techniques; in economics it provides a means of defining how the quantity of a good demanded depends on its price, or how consumption depends on income.

Furthermore, it is possible that Y is related to X in a nonlinear way, and the characteristics of the fitted line (e.g., its slope) may be subjected to statistical tests of significance; but these issues are deferred to Chapter 12, and are not dealt with here. Instead we assume that the appropriate description is a straight line, and this chapter is devoted exclusively to how such a straight line may best be fitted.

11-1 AN EXAMPLE

Since wheat yield depends on fertilizer, it is referred to as the "dependent" variable Y; since fertilizer application is not dependent on yield, but instead is determined by the experimenter, it is referred to as the "independent" variable X. Suppose funds are available for only seven experimental observations, so that the experimenter sets X at seven different values, taking only one observation of Y in each case, as shown in Figure 11-2 and Table 11-1.

FIG. 11-2 Observed wheat yields at various levels of fertilizer application (lb/acre).

222 INTRODUCTION TO REGRESSION

TABLE 11-1 Experimental Data Relating Yield of Wheat to the Amount of Applied Fertilizer, as in Figure 11-2

Fertilizer (lb/acre)    Yield (bu/acre)
100                          40
200                          45
300                          50
400                          65
500                          70
600                          70
700                          80

We first of all note that if the points were exactly in a line, as in Figure 11-3a, then the fitted line could be drawn in with a ruler "by eye" perfectly accurately. Even if the points were nearly in a line, as in Figure 11-3b, fitting by eye would be reasonably satisfactory. But in the highly scattered case, as in Figure 11-3c, fitting by eye is too subjective and too inaccurate. Furthermore, fitting by eye requires plotting all the points first.

FIG. 11-3 Various degrees of scatter.
If there were 100 observations, this would be very tedious, and an algebraic technique which an electronic computer could solve would be preferable. The following sections set forth various algebraic methods for fitting a line, successively more sophisticated and satisfactory.

11-2 POSSIBLE CRITERIA FOR FITTING A LINE

It is time to ask more precisely "What is a good fit?" The answer surely is "a fit that makes the total error small." One typical error is shown in Figure 11-4: it is defined as the vertical distance from the observed Yi to the fitted value Ŷi on the line, that is, (Yi − Ŷi), where Ŷi is the ordinate of the line at Xi. We note that the error is positive when the observed Yi is above the line, and negative when the observed Yi is below the line.

FIG. 11-4 Fitting points with a line; the error in fitting one point is (Yi − Ŷi).

1. As our first tentative criterion, consider the fitted line which minimizes the total of all these errors:

(11-1)  Σ (Yi − Ŷi)

But this criterion works badly. Using this criterion, the two lines shown in Figure 11-5 fit the observations equally well, even though the fit in Figure 11-5a is intuitively a good one, and the fit in Figure 11-5b is a very bad one. The problem is one of sign; in both cases positive errors just offset negative errors, leaving their sum equal to zero. This criterion must be rejected, since it provides no distinction between bad fits and good ones.

FIG. 11-5 Two lines that satisfy criterion (11-1) equally well.

2. There are two ways of overcoming the sign problem. The first is to minimize the sum of the absolute values of the errors:

(11-2)  Σ |Yi − Ŷi|

Since large positive errors are not allowed to offset large negative errors, this criterion would rule out bad fits like the one in Figure 11-5b. However, it still has a drawback. It is evident from Figure 11-6 that the fit in part b satisfies this criterion better than the fit in part a (total absolute error 3, rather than 4). In fact, the reader can satisfy himself that the line in part b joining the two end points satisfies this criterion better than any other line. But it is not a good common-sense solution to the problem, because it pays no attention whatever to the middle point. The fit in part a is preferable because it takes account of all points.

FIG. 11-6 Two fits compared by the absolute-error criterion (11-2).
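The behavior of criteria (11-1) and (11-2) can be illustrated numerically. The three points and two lines below are hypothetical stand-ins for Figure 11-6, not the book's own figures:

```python
# Why summed errors (11-1) and summed absolute errors (11-2) mislead:
# three hypothetical points, a "common-sense" flat line (a) and the
# line (b) joining the two end points.
points = [(0, 1), (1, 4), (2, 1)]

line_a = lambda x: 2.0   # passes through the middle of the scatter
line_b = lambda x: 1.0   # joins the two end points, ignores the middle

def errors(line):
    return [y - line(x) for x, y in points]

for name, line in [("a", line_a), ("b", line_b)]:
    e = errors(line)
    print(name, sum(abs(v) for v in e), sum(v * v for v in e))
# Criterion (11-2) prefers line b (total |error| 3 versus 4) even though it
# ignores the middle point; squared errors prefer line a (6 versus 9).
```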

3. As a second way to overcome the sign problem, we finally propose to minimize the sum of the squares of the errors:

(11-3)  Σ (Yi − Ŷi)²

This is the famous "least squares" criterion; its justifications include the following.

(a) Squaring overcomes the sign problem by making all errors positive.
(b) Squaring emphasizes the large errors, and in trying to satisfy this criterion large errors are avoided if at all possible. Hence all points are taken into account, and the fit in Figure 11-6a is selected by this criterion in preference to Figure 11-6b.
(c) The algebra of least squares is very manageable.
(d) There are two important theoretical justifications for least squares, developed in the next chapter.

11-3 THE LEAST SQUARES SOLUTION

Our scatter of observed X and Y values from Table 11-1 is graphed in Figure 11-7. Our objective is to fit a line

(11-4)  Y = a₀ + bX

This involves three steps.

Step 1. Translate X into deviations from its mean; i.e., define a new variable x:

(11-5)  x = X − X̄
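Step 1 can be checked numerically on the Table 11-1 fertilizer values; a minimal sketch:

```python
# The translation (11-5) applied to the X values of Table 11-1; the new
# x deviations sum to zero, as (11-6) below asserts.
X = [100, 200, 300, 400, 500, 600, 700]
X_bar = sum(X) / len(X)            # 400.0
x = [Xi - X_bar for Xi in X]
print(x, sum(x))                   # deviations -300.0 ... 300.0 sum to 0.0
```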

In Figure 11-7b we show how this involves a geometric translation of the axis, a procedure similar to the one developed in Section 5-3, where both axes were translated to study covariance. The new x value becomes positive or negative depending on whether X was above or below X̄. There is no change in the Y values. The translated intercept a differs from the original a₀, but the slope b remains the same.

One of the advantages of measuring X as deviations from the central value X̄ is that we can more explicitly ask the question "How is Y affected when X is unusually large, or unusually small?" In addition, the mathematics will be simplified because the sum of the new x values equals zero.
Proof: Noting that the mean X̄ is defined as (Σ Xᵢ)/n, it follows that Σ Xᵢ = nX̄, and

Σ xᵢ = Σ (Xᵢ − X̄) = nX̄ − nX̄ = 0    (11-6) proved
226 INTRODUCTION TO REGRESSION
FIG. 11-7 Translation of axis. (a) Regression, using original variables. (b) Regression, translating X.

Step 2. Fit the line

y = a + bx    (11-7)

to the translated scatter in Figure 11-7b; i.e., select the values for a and b that satisfy the least squares criterion, those values that minimize

Σ (Yᵢ − Ŷᵢ)²    (11-8)

Since the fitted value Ŷᵢ is on our estimated line (11-7),

Ŷᵢ = a + bxᵢ    (11-9)

When this is substituted into (11-8), the problem becomes one of selecting a and b to minimize the sum of squares

S(a, b) = Σ (Yᵢ − a − bxᵢ)²    (11-10)
The notation S(a, b) is used to emphasize that this expression depends on a and b. As a and b vary (i.e., as various lines are tried), S(a, b) will vary too, and we ask at what value of a and b it will be a minimum. This will be the optimum (least squares) line. The simplest minimization technique is calculus, and that is used in the next paragraph. [Readers without calculus can minimize (11-10) with some of the algebra of Appendix 11-1, and rejoin us where the resulting estimates are given below.]

Minimizing S(a, b) requires setting its partial derivatives with respect to a and b equal to zero. In the first instance, setting the partial derivative with respect to a equal to zero:

Σ 2(Yᵢ − a − bxᵢ)(−1) = 0

Dividing through by −2, and rearranging:

Σ Yᵢ − na − b Σ xᵢ = 0

Noting (11-6), that Σ xᵢ = 0, we can solve for a:

a = (Σ Yᵢ)/n = Ȳ    (11-13)
Thus our least squares estimate of a is simply the average value of Y; referring to Figure 11-7, we see that this ensures that our fitted regression line must pass through the point (X̄, Ȳ), which may be interpreted as the center of gravity of the sample of n points.

It is also necessary to set the partial derivative of (11-10) with respect to b equal to zero:

Σ 2(−xᵢ)(Yᵢ − a − bxᵢ) = 0    (11-14)

Rearranging:

Σ xᵢYᵢ − a Σ xᵢ − b Σ xᵢ² = 0    (11-15)

Noting that Σ xᵢ = 0, we can solve for b:

b = (Σ xᵢYᵢ)/(Σ xᵢ²)    (11-16)
Our results³ in (11-13) and (11-16) are important enough to restate as:

Theorem. With the x values measured as deviations from their mean, the least squares values of a and b are

a = Ȳ    (11-13)
b = (Σ xᵢYᵢ)/(Σ xᵢ²)    (11-16)
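As a quick check of this theorem, here is a minimal Python sketch applying (11-13) and (11-16). The X, Y values are a reconstruction of the Table 11-1 fertilizer data, chosen to reproduce the statistics quoted in this text (X̄ = 400, Ȳ = 60, Σ xᵢ² = 280,000, b ≈ .068); they are an assumption, not a quotation of the original table.

```python
# Least squares fit by the theorem: a = Y-bar (11-13), b = sum(x*Y)/sum(x^2) (11-16),
# with x measured as deviations from the mean.
X = [100, 200, 300, 400, 500, 600, 700]   # lb of fertilizer per acre (assumed)
Y = [40, 45, 50, 65, 70, 70, 80]          # bu of wheat per acre (assumed)

n = len(X)
X_bar = sum(X) / n
x = [Xi - X_bar for Xi in X]              # Step 1: translate X into deviations

a = sum(Y) / n                                                       # (11-13)
b = sum(xi * Yi for xi, Yi in zip(x, Y)) / sum(xi * xi for xi in x)  # (11-16)
print(a, round(b, 3))   # 60.0 0.068
```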

For the example problem, a and b are calculated in the first five columns of Table 11-2 (the last three columns may be ignored until the next chapter). It follows that the least squares equation is:

Ŷ = 60 + .068x    (11-17)

This fitted line is graphed in Figure 11-7b.

Step 3. If desired, this regression can now be retranslated back into our original frame of reference in Figure 11-7a. Express (11-17) in terms of the original X values:
Y = 60 + .068(X − X̄)
  = 60 + .068(X − 400)
  = 60 + .068X − 27.2

Y = 32.8 + .068X    (11-18)

This fitted line is graphed in Figure 11-7a. A comparison of (11-17) and (11-18) confirms that the slope of our fitted regression (b = .068) remains the same; the only difference is in the intercept. Moreover, the original intercept (a₀ = 32.8) is now easily recovered.

An application of the least squares equation (11-18) is now easily made. For example, if 350 lb of fertilizer is applied, our best estimate of yield is

Y = 32.8 + .068(350) = 56.6 bushels/acre

Alternatively, the least squares equation (11-17), derived from the translated frame of reference, yields exactly the same result: when X = 350, then x = −50, and

Y = 60 + .068(−50) = 56.6

³ To be perfectly rigorous, we could have shown that when the partial derivatives are set to zero, we actually do have a minimum sum of squares, rather than a maximum, saddle point, or local minimum.
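Step 3 amounts to substituting x = X − X̄. A short sketch (same reconstructed Table 11-1 data as above, an assumption) recovers the original intercept a₀ = a − bX̄ and checks that both frames give the same prediction; the text's 32.8 comes from first rounding b to .068.

```python
# Retranslate Y-hat = a + b*x back to the original origin: Y-hat = a0 + b*X,
# where a0 = a - b*X_bar, as in (11-18).
X = [100, 200, 300, 400, 500, 600, 700]   # reconstructed Table 11-1 values
Y = [40, 45, 50, 65, 70, 70, 80]

n = len(X)
X_bar = sum(X) / n
a = sum(Y) / n
x = [Xi - X_bar for Xi in X]
b = sum(xi * Yi for xi, Yi in zip(x, Y)) / sum(xi * xi for xi in x)

a0 = a - b * X_bar                   # original intercept (about 32.9 unrounded)
pred_new = a0 + b * 350              # predict via (11-18)
pred_old = a + b * (350 - X_bar)     # predict via (11-17)
print(round(a0, 1), round(pred_new, 1), round(pred_old, 1))   # 32.9 56.6 56.6
```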

PROBLEMS

11-1 (Save your work in this and the next three chapters, for future reference.) Suppose a random sample of 5 families had the following income and savings:

Family    Income Y    Savings S
1         $8,000      $600
2         11,000      1200
3         9,000       1000
4         6,000       700
5         6,000       300

(a) Estimate and graph the regression line of savings S on income Y.
(b) Interpret the intercepts a and a₀.

11-2 Use the data of Problem 11-1 to regress consumption C on income Y. (Economists define consumption C = Y − S.)
11-3 To interpret the slope b, use the regression equation (11-18) to answer the following questions.

(a) About how much is the yield increased for every pound of fertilizer applied?
(b) If wheat were worth 82¢ a bushel and fertilizer cost $.25 per pound, would it be economical to apply fertilizer?
(c) To approximately what price would fertilizer have to drop to make it economical to apply?

[The answer to (a) is simply the slope b. Economists refer to it as the "marginal" effect of fertilizer x on yield Y.]
11-4 If we translated both X and Y into deviations x and y (just as X was translated in Figure 11-7b), then:

(a) What would the new y-intercept be? Would the slope remain the same? Does not this imply that the fitted regression equation is simply

ŷ = bx

(b) Prove that Σ xᵢyᵢ = Σ xᵢYᵢ; hence we may alternatively write b in terms of deviations as

b = (Σ xᵢyᵢ)/(Σ xᵢ²)

11-5 (Requires calculus.) Suppose X is left in its original form, rather than being translated into x (deviations from the mean).
(a) Write out the sum of squared deviations as in (11-10), in terms of a₀ and b.
(b) Set equal to zero the partial derivatives with respect to a₀ and b, thus obtaining the two so-called "normal" equations.
(c) Evaluate these two normal equations using the data in Problem 11-1, and solve for a₀ and b. Do you get the same answer?
(d) Compare the two alternative methods of solution.

11-6 Suppose four firms had the following profits and research expenditures:

Firm    Profit P (thousands of dollars)    Research Expenditure R (thousands of dollars)
1       50                                 40
2       40                                 30
3       40                                 50
4       60                                 50

(a) Fit a regression line of P on R.
(b) Does this regression line "show how research generates profits"? Criticize.

APPENDIX 11-1

AN ALTERNATIVE DERIVATION OF THE LEAST SQUARES ESTIMATES OF a AND b, WITHOUT CALCULUS

Before estimating a and b, it is necessary to solve the theoretical problem of minimizing an ordinary quadratic function of one variable b, of the form

f(b) = k₂b² + k₁b + k₀    (11-19)

where k₀, k₁, k₂ are constants, with k₂ > 0. With a little algebraic manipulation, (11-19) may be written as

f(b) = k₂[b + k₁/(2k₂)]² + [k₀ − k₁²/(4k₂)]    (11-20)

Note that b appears in the first term, but not in the second.
Therefore our hope of minimizing the expression lies in selecting a value of b to minimize the first term. The first term, being a square and hence never negative, will be minimized when it is zero, that is, when

b + k₁/(2k₂) = 0    (11-21)

then

b = −k₁/(2k₂)    (11-22)
This result is shown graphically in Figure 11-8.

FIG. 11-8 The minimization of a quadratic function.

To restate: a quadratic function of the form (11-19) is minimized by setting

b = −(coefficient of first power) / 2(coefficient of second power)    (11-23)

With this theorem in hand, let us return to the problem of selecting values for a and b to minimize the quadratic

S(a, b) = Σ [(Yᵢ − a) − bxᵢ]²    (11-24)

It will be useful to manipulate this, as follows:

S(a, b) = Σ [(Yᵢ − a)² − 2b(Yᵢ − a)xᵢ + b²xᵢ²]    (11-25)
        = Σ (Yᵢ − a)² − 2b Σ (Yᵢ − a)xᵢ + b² Σ xᵢ²    (11-26)

In the middle term, consider

Σ (Yᵢ − a)xᵢ = Σ Yᵢxᵢ − a Σ xᵢ = Σ Yᵢxᵢ

since Σ xᵢ = 0 by (11-6). Using this to rewrite the middle term of (11-26), we have

S(a, b) = Σ (Yᵢ − a)² − 2b Σ Yᵢxᵢ + b² Σ xᵢ²    (11-27)

This is a useful recasting of (11-24), because the first term contains a alone, while the last two terms contain b alone.

To find the value of a which minimizes (11-27), only the first term is relevant. It may be written

Σ (Yᵢ − a)² = na² − 2(Σ Yᵢ)a + Σ Yᵢ²

a quadratic function of a, of the same form as f(b) = k₂b² + k₁b + k₀. According to (11-23), this is minimized when

a = (2 Σ Yᵢ)/(2n) = (Σ Yᵢ)/n = Ȳ    (11-13) proved

To find the value of b which minimizes (11-27), only the last two terms are relevant. According to (11-23), this is minimized when

b = −(−2 Σ Yᵢxᵢ)/(2 Σ xᵢ²) = (Σ xᵢYᵢ)/(Σ xᵢ²)    (11-16) proved
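A quick numerical check of rule (11-23): minimize a quadratic by brute-force grid search and compare with −k₁/2k₂. The coefficients here are arbitrary illustrative values.

```python
# Check the completing-the-square rule: f(b) = k2*b^2 + k1*b + k0 (k2 > 0)
# is minimized at b = -k1/(2*k2), per (11-22)/(11-23).
k2, k1, k0 = 3.0, -12.0, 5.0              # arbitrary example coefficients

def f(b):
    return k2 * b * b + k1 * b + k0

b_star = -k1 / (2 * k2)                   # the rule gives 2.0 here

# Independent check: brute-force scan of a fine grid.
grid = [i / 100 for i in range(-1000, 1001)]   # -10.00 .. 10.00, step .01
b_grid = min(grid, key=f)
print(b_star, b_grid)   # 2.0 2.0
```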

chapter 12

Regression Theory

12-1 THE MATHEMATICAL MODEL

So far we have only mechanically fitted a line. This yielded a and b, which are descriptive statistics of the sample (like X̄ in Chapter 2); now we wish to make inferences about the parent population (like our inferences about μ in Chapter 7). Specifically, we must consider the mathematical model which allows us to run tests of significance on a and b.
Turning back to the example in Section 11-1, suppose that the experiment could be repeated many times at a fixed value of x. Even though fertilizer application is fixed from experiment to experiment, we would not observe exactly the same yield each time. Instead, there would be some statistical fluctuation of the Y's, clustered about a central value. We can think of the many possible values of Y forming a population; the probability function of Y for a given x we shall call p(Y/x). Moreover, there will be a similar probability function for Y at any other experimental level of x. One possible sequence of Y populations is shown in Figure 12-1a. There would obviously be mathematical problems involved in analyzing such a population.
To keep the problem manageable, we make a reasonable set of assumptions about the regularity of these populations, as shown in Figure 12-1b. We assume the probability functions p(Yᵢ/xᵢ) have:

1. The same variance σ² for all xᵢ; and
2. Means E(Yᵢ) lying on a straight line, known as the true regression line:

E(Yᵢ) = α + βxᵢ    (12-1)

The population parameters α and β specify the line; they are to be estimated from sample information.¹

3. We also assume that the random variables Yᵢ are statistically independent. For example, a large value of Y₂ does not tend to make Y₃ large; i.e., Y₃ is "unaffected" by Y₂.

These assumptions may be written more concisely as:

The random variables Yᵢ are statistically independent, with mean = α + βxᵢ and variance σ²    (12-2)

FIG. 12-1 (a) General populations of Y, given x. (b) The special form of the populations of Y assumed in simple linear regression.

¹ Remember that our notation conventions are different from Chapters 4 to 7. Now a capital letter denotes an original observation, and a small letter denotes its deviation from the mean.
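The weak set of assumptions (12-2) is easy to simulate. The sketch below draws one sample from the model and refits the line by least squares; α, β, and σ are assumed illustrative values echoing the fertilizer example, not quantities taken from the text.

```python
import random

# Simulate the model (12-2): independent Y_i with mean alpha + beta*x_i and
# common variance sigma^2, then refit by least squares (11-13), (11-16).
# alpha, beta, sigma and the x design are assumed illustrative values.
random.seed(1)
alpha, beta, sigma = 60.0, 0.068, 3.5
x = [-300, -200, -100, 0, 100, 200, 300]       # deviations from X-bar; sum = 0

Y = [alpha + beta * xi + random.gauss(0, sigma) for xi in x]

a = sum(Y) / len(Y)                                                  # (11-13)
b = sum(xi * Yi for xi, Yi in zip(x, Y)) / sum(xi * xi for xi in x)  # (11-16)
print(round(a, 1), round(b, 4))   # one sample's estimates of alpha and beta
```

Rerunning with a different seed gives different a, b: exactly the sample-to-sample fluctuation the chapter goes on to study.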

On occasion, it is useful to describe the deviation of Yᵢ from its expected value as the error or disturbance term eᵢ, so that the model may be written

Yᵢ = α + βxᵢ + eᵢ    (12-3)

where the eᵢ are independent random variables, with mean = 0 and variance σ²    (12-4)

We note that the distributions of Y and e are identical, except that their means differ; in fact, the distribution of e is just the distribution of Y translated onto a zero mean. No assumption is yet made about the shape of the distribution (normal, or otherwise). We therefore refer to assumptions (12-4) as the "weak set"; we shall derive as many results as possible from these, before adding a more restrictive normality assumption later.

12-2 THE NATURE OF THE ERROR TERM

Now let us consider in more detail the error or disturbance term e, the "purely random" part of Yᵢ. Why does it exist? Or, why doesn't a precise and exact value of Yᵢ follow, once the value of xᵢ is given? The error may be regarded as the sum of two components:
(a) Measurement Error

There are various reasons why Y may be measured incorrectly. In measuring wheat yield, there may be an error due to sloppy harvesting or inaccurate weighing. If the example is a study of the consumption of families at various income levels, the measurement error in consumption might consist of budget and reporting inaccuracies.

(b) Stochastic Error

This occurs because of the inherent irreproducibility of biological and social phenomena. Even if there were no measurement error, continuous repetition of our wheat experiment using exactly the same amount of fertilizer would result in different yields; these differences are unpredictable, and are called stochastic differences. They may be reduced by tighter experimental control: for example, by holding constant soil conditions, amount of water, etc. But complete control is impossible; seeds, for example, cannot be duplicated. In this example, stochastic error may be regarded as the influence on Y of many omitted variables, each with an individually small effect.

In the social sciences, controlled experiments are usually not possible. For example, an economist cannot hold U.S. national income constant for several years while he examines the effect of the interest rate on investment. Since he cannot neutralize extraneous influences by holding them constant, his best alternative is to take them explicitly into account, by regressing Y on x and the extraneous factors.¹ This is called "multiple regression" and is discussed fully in the next chapter.

¹ This is a useful technique for reducing stochastic error.

12-3 ESTIMATING α AND β

Suppose that our true regression, Y = α + βx, is the dotted line shown in Figure 12-2. This will remain unknown to the statistician, whose job it is to estimate it as best he can by observing x and Y. Suppose at the first level x₁, the stochastic error e₁ takes on a negative value, as shown in the diagram; he will observe the Y and x combination at P₁. Similarly, suppose his only other two observations are P₂ and P₃, resulting from positive values of e.

FIG. 12-2 True (population) regression and estimated (sample) regression.
Further, suppose the statistician estimates the true line by fitting a least squares line Y = a + bx, applying the method of Chapter 11 to the only information he has: the points P₁, P₂, and P₃. He would then come up with the solid estimating line. This is a critical diagram; before proceeding, the reader should be sure he can distinguish between the true regression and its surrounding e distribution on the one hand, and the estimated regression line on the other.

Unless the statistician is very lucky indeed, it is obvious that his estimated line will not be exactly on the true population line. The best he can hope for is that the least squares method of estimation will be close to the target. Specifically, we now ask: "How is the estimator a distributed around its target α, and b around its target β?"
12-4 THE MEAN AND VARIANCE OF a AND b

We shall show that the random estimators a and b have the following moments:

E(a) = α    (12-5)
var(a) = σ²/n    (12-6)
E(b) = β    (12-7)
var(b) = σ²/(Σ xᵢ²)    (12-8)

where σ² is the variance of the error (the variance of Y). We note from (12-5) and (12-7) that both a and b are unbiased estimators of α and β. Because of its greater importance, we shall concentrate on the slope estimator b, rather than a, for the rest of the chapter.

Proof of (12-7) and (12-8). The formula for b in (11-16) may be rewritten as

b = (1/k) Σ xᵢYᵢ    (12-9)

where

k = Σ xᵢ²    (12-10)

Thus

b = Σ wᵢYᵢ    (12-11)

where

wᵢ = xᵢ/k    (12-12)

Since each xᵢ is a fixed constant, so is each wᵢ. Thus from (12-11) we establish the important conclusion:

b is a weighted sum (i.e., a linear combination) of the random variables Yᵢ    (12-13)

Hence we may write

E(b) = w₁E(Y₁) + w₂E(Y₂) + ··· + wₙE(Yₙ) = Σ wᵢE(Yᵢ)    (12-14)

Moreover, noting that the random variables Yᵢ are assumed independent, by (5-34) we may write

var(b) = w₁² var Y₁ + ··· + wₙ² var Yₙ = Σ wᵢ² var Yᵢ    (12-15)

For the mean, from (12-14) and (12-1),

E(b) = Σ wᵢ(α + βxᵢ) = α Σ wᵢ + β Σ wᵢxᵢ    (12-16)

From (12-12),

Σ wᵢ = (1/k) Σ xᵢ    (12-17)

but Σ xᵢ is zero, according to (11-6). Similarly, noting (12-10) and (12-12),

Σ wᵢxᵢ = (1/k) Σ xᵢ² = 1    (12-18)

Thus (12-16) reduces to

E(b) = β    (12-19)

and (12-7) is proved. For the variance, from (12-15) and (12-2),

var(b) = Σ wᵢ² σ²    (12-20)

Again noting (12-12) and (12-10),

var(b) = (σ²/k²) Σ xᵢ² = σ²/(Σ xᵢ²)    (12-21)

and (12-8) is proved, completing the proof. A similar derivation of the mean and variance of a is left as an exercise.

We observe from (12-12) that the weight wᵢ attached to the Yᵢ observation is proportional to the deviation xᵢ.
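These two moment results can be checked by simulation. The sketch below re-estimates b many times under the model; α, β, σ, and the x design are assumed illustrative values (chosen so that σ²/Σ xᵢ² = 9/280,000).

```python
import random

# Monte Carlo check of (12-7) E(b) = beta and (12-8) var(b) = sigma^2/sum(x^2).
# alpha, beta, sigma and the design are assumed illustrative values.
random.seed(0)
alpha, beta, sigma = 60.0, 0.068, 3.0
x = [-300, -200, -100, 0, 100, 200, 300]
Sxx = sum(xi * xi for xi in x)                 # 280,000

bs = []
for _ in range(20000):
    Y = [alpha + beta * xi + random.gauss(0, sigma) for xi in x]
    bs.append(sum(xi * Yi for xi, Yi in zip(x, Y)) / Sxx)

mean_b = sum(bs) / len(bs)
var_b = sum((bi - mean_b) ** 2 for bi in bs) / len(bs)
print(round(mean_b, 4), round(var_b / (sigma * sigma / Sxx), 2))
# mean_b sits at beta; the variance ratio sits near 1
```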

Hence the outlying observations will exert a relatively heavy influence in the calculation of b.

12-5 THE GAUSS-MARKOV THEOREM

This is the major justification of using the least squares method in the linear regression model.

Gauss-Markov Theorem. Within the class of linear unbiased estimators of β (or α), the least squares estimator has minimum variance.    (12-22)

This theorem is important because it follows from the weak set of assumptions (12-4), and hence requires no assumption about the shape of the distribution of the error term. A proof may be found in most mathematical statistics texts.

To interpret this important theorem, consider b, the least squares estimator of β. We have already seen in (12-13) that it is a linear estimator, and we restrict ourselves to linear estimators because they are easy to analyze and understand. We restrict ourselves even further, as shown in Figure 12-3; within this set of linear estimators we consider only the limited class that are unbiased. The least squares estimator not only is in this class, according to (12-7), but of all the estimators in this class it has the minimum variance. It is often, therefore, referred to as the "best linear unbiased estimator."

FIG. 12-3 Diagram of the restricted class of estimators considered in the Gauss-Markov theorem. In its class, the least squares estimator has least variance.

The Gauss-Markov theorem has an interesting corollary. As a special case of regression, we might ask what happens if we are explaining Y, but no independent variable x comes into play. Then β is 0 in (12-2), and from (11-13) the least squares estimator of the mean of Y is Ȳ, the sample mean. Thus, the sample mean is the least squares estimator of a population mean (μ), and the Gauss-Markov theorem fully applies: the sample mean is the best linear unbiased estimator of a population mean.

It must be emphasized that the Gauss-Markov theorem is restricted, applying only to estimators that are both linear and unbiased. It follows that there may be a biased or nonlinear estimator that is better (i.e., has smaller variance) than the least squares estimator. For example, to estimate a population mean, the sample median is a nonlinear estimator. It is better than the sample mean for certain kinds of nonnormal populations. The sample median is just one example of a whole collection of nonlinear statistical methods known as "distribution-free" or "nonparametric" statistics. These are expressly designed for inference when the population cannot be assumed to be normally distributed.

12-6 THE DISTRIBUTION OF b
With the mean and variance of b established in (12-7) and (12-8), we now ask: "What is the shape of the distribution of b?" If we add (for the first time) the strong assumption that the Yᵢ are normal, and recall that b is a linear combination of the Yᵢ, it follows from (6-13) that b will also be normal. But even without assuming the Yᵢ are normal, as sample size increases the distribution of b will usually approach normality; this can be justified by a generalized form² of the central limit theorem (6-15).

We are now in a position to graph the distribution of b in Figure 12-4, in order to develop a clear intuitive idea of how this estimator varies from sample to sample. First, of course, we note that (12-7) established that b is an unbiased estimator, so that the distribution of b is centered on its target β.

The interpretation of the variance of b in (12-8) is more difficult. Suppose that the experiment had been badly designed, with the Xᵢ's close together. This makes the deviations xᵢ small; hence Σ xᵢ² is small. Therefore the variance of b in (12-8) is large, and b is a comparatively unreliable estimator. To check the intuitive validity of this, consider the scatter diagram in Figure 12-5a. The bunching of the X's means that the small part of the line being

² The central limit theorem (6-15) concerned the normality of the sample mean X̄. In Problem 6-8 it was seen to apply equally well to the sample sum S. It applies also to a weighted sum of random variables such as b in (12-13), under most conditions. See, for example, D. A. S. Fraser, Nonparametric Statistics, New York: John Wiley, 1957. Similarly, the normality of a is justified.
FIG. 12-4 The probability distribution of the estimator b.

FIG. 12-5 (a) Unreliable estimate when the Xᵢ are very close. (b) More reliable fit because the Xᵢ are spread out.
investigated is obscured by the error e, making the slope estimate b very unreliable. In this specific instance, our estimate has been pulled badly out of line by the errors, in particular the one indicated by the arrow.

By contrast, in Figure 12-5b we show the case where the X's are reasonably spread out. Even though the error e remains the same, the estimate b is much more reliable, because errors no longer exert the same leverage.

As a concrete example, suppose we wish to examine how sensitive Canadian imports (Y) are to the international value of the Canadian dollar (x). A much more reliable estimate should be possible using the period 1948 to 1962, when the Canadian dollar was flexible (and took on a range of values), than in the period before or since, when this dollar was fixed (and only allowed to fluctuate within a very narrow range).
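Equation (12-8) makes this design effect concrete. A short sketch compares var(b) for a bunched design of X's against a spread-out one; σ and both designs are assumed illustrative values.

```python
# Design effect in (12-8): var(b) = sigma^2 / sum(x_i^2), so bunched X's
# inflate var(b).  sigma and the two designs are assumed illustrative values.
sigma = 3.0

def var_b(X):
    X_bar = sum(X) / len(X)
    return sigma * sigma / sum((Xi - X_bar) ** 2 for Xi in X)

bunched = [380, 390, 400, 410, 420]    # X's close together: sum x^2 = 1,000
spread  = [100, 250, 400, 550, 700]    # X's spread out:     sum x^2 = 225,000

ratio = var_b(bunched) / var_b(spread)
print(round(ratio))   # 225: the bunched design is far less reliable
```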

12-7 CONFIDENCE INTERVALS AND TESTING HYPOTHESES ABOUT β

With the mean, variance, and normality of the estimator b established, statistical inferences about β are now in order. Our argument will be similar to that of Section 8-2. First standardize the estimator b, obtaining

Z = (b − β) / (σ/√(Σ xᵢ²))    (12-23)

where Z is N(0, 1).

Since the variance σ² of Y is generally unknown, it is estimated by

s² = [1/(n − 2)] Σ (Yᵢ − Ŷᵢ)²    (12-24)

where Ŷᵢ is the fitted value of Y on the estimated regression line; i.e.,

Ŷᵢ = a + bxᵢ    (12-25)

s² is often referred to as "residual variance," a term similarly used in ANOVA. The divisor (n − 2) is used in (12-24) rather than n in order to make s² an unbiased estimator of σ².³ When this substitution of s² for σ² is made, the distribution of b is no longer normal, but instead has the slightly more spread-out t distribution:

t = (b − β) / (s/√(Σ xᵢ²))    (12-26)

³ As argued in the footnote to equation (8-11). But in the present calculation of s², two estimators a and b are required; thus there remain two fewer degrees of freedom for s². Hence (n − 2) is the divisor in s², and also the degrees of freedom of the subsequent t distribution in (12-26).
For the t distribution (12-26) to be strictly valid, we require the strong assumption that the distribution of the Yᵢ is normal. From (12-26) we may now proceed to construct a confidence interval or test an hypothesis.

(a) Confidence Intervals

Again letting t.₀₂₅ denote the value of t which leaves 2½% of the distribution in the upper tail,

Pr(−t.₀₂₅ < t < t.₀₂₅) = .95    (12-27)

Substituting for t from (12-26):

Pr[−t.₀₂₅ < (b − β)/(s/√(Σ xᵢ²)) < t.₀₂₅] = .95    (12-28)

The inequalities within the bracket may be reexpressed:

Pr[b − t.₀₂₅ s/√(Σ xᵢ²) < β < b + t.₀₂₅ s/√(Σ xᵢ²)] = .95    (12-29)
which yields the 95% confidence interval for β:

β = b ± t.₀₂₅ s/√(Σ xᵢ²)    (12-30)

where t.₀₂₅ has (n − 2) degrees of freedom. We note that this is very similar to the confidence interval for μ given by equation (8-15).⁴

For our example of wheat yield in the previous chapter, the confidence interval for β (the effect of fertilizer on yield) is computed as follows. s is evaluated in the last three columns of Table 11-2. Also noting the values for b and Σ xᵢ² calculated in that table, our 95% confidence interval (12-30) becomes

β = .068 ± 2.571 (3.48/√280,000)
  = .068 ± .017

.051 < β < .085    (12-32)

⁴ Using a similar argument, and noting (12-6), the 95% confidence interval for α is:

α = a ± t.₀₂₅ s/√n    (12-31)
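The arithmetic of (12-32) can be checked end to end. The sketch below refits the reconstructed Table 11-1 data (an assumption, chosen to reproduce the quoted statistics), computes s from (12-24), and forms (12-30); the value 2.571 is t.₀₂₅ for 5 degrees of freedom, from the t table.

```python
import math

# 95% confidence interval (12-30) for beta, wheat-yield example.
# Data reconstructed to reproduce the text's statistics (b = .068, s = 3.48).
X = [100, 200, 300, 400, 500, 600, 700]
Y = [40, 45, 50, 65, 70, 70, 80]
n = len(X)

X_bar = sum(X) / n
x = [Xi - X_bar for Xi in X]
Sxx = sum(xi * xi for xi in x)                 # 280,000
a = sum(Y) / n
b = sum(xi * Yi for xi, Yi in zip(x, Y)) / Sxx

s2 = sum((Yi - a - b * xi) ** 2 for xi, Yi in zip(x, Y)) / (n - 2)   # (12-24)
se_b = math.sqrt(s2) / math.sqrt(Sxx)

t_025 = 2.571                                  # t table, n - 2 = 5 d.f.
lo, hi = b - t_025 * se_b, b + t_025 * se_b
print(round(lo, 3), round(hi, 3))   # 0.051 0.085, matching (12-32)
```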

(b) Testing Hypotheses

A two-sided test of any hypothesis may be carried out simply by noting whether or not the confidence interval (12-30) contains that hypothesis. For example, the hypothesis typically tested is the null hypothesis

H₀: β = 0    (12-33)

i.e., using our example, that fertilizer has no effect on yield. H₀ must be rejected at a 5% significance level, since the value of zero is not contained in (12-32).

Since fertilizer is expected to affect yield favorably, it seems more appropriate to test (12-33) against the one-sided alternative:

H₁: β > 0    (12-34)

The first step is to calculate the t statistic. From (12-26) and (12-33), under the assumption that H₀ is true, this reduces to

t = b / (s/√(Σ xᵢ²))    (12-36)

and in our example:

t = .068 / (3.48/√280,000) = 10.3    (12-37)

Since this observed value exceeds the critical t.₀₅ value of 2.015, H₀ is rejected in favor of the conclusion that fertilizer favorably affects yield.
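Under H₀, the statistic (12-36) is just b divided by its estimated standard error. Continuing the same sketch (same reconstructed data, an assumption):

```python
import math

# t statistic (12-36) for H0: beta = 0, wheat-yield example (reconstructed data).
X = [100, 200, 300, 400, 500, 600, 700]
Y = [40, 45, 50, 65, 70, 70, 80]
n = len(X)

X_bar = sum(X) / n
x = [Xi - X_bar for Xi in X]
Sxx = sum(xi * xi for xi in x)
a = sum(Y) / n
b = sum(xi * Yi for xi, Yi in zip(x, Y)) / Sxx
s = math.sqrt(sum((Yi - a - b * xi) ** 2 for xi, Yi in zip(x, Y)) / (n - 2))

t = b / (s / math.sqrt(Sxx))          # (12-36)
print(round(t, 1))                    # 10.3, far beyond the critical 2.015
```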
12-8 PREDICTION INTERVAL FOR Y₀

If we plan one new application of 550 pounds of fertilizer (x₀ = 150), how do we predict the resulting yield? The best point estimate will be the corresponding fitted value on our estimated regression line, i.e.:

Ŷ₀ = a + bx₀ = 60 + .068(150) = 70.2 bu/acre    (12-38), (12-39)

But as a point estimate, this will almost certainly involve some error because, for example, of errors made in calculating a and b. Figure 12-6 illustrates the effect of these errors; the true regression is shown, along with an estimated regression. Note how the fitted Ŷ₀ in this case underestimates its target E(Y₀). In Figure 12-7 the true regression is again shown, along with several estimated regressions fitted from several possible sets of sample data. The fitted value is sometimes too low, sometimes too high, but on the average, just right.

FIG. 12-6 How the estimator Ŷ₀ is related to the target E(Y₀).

FIG. 12-7 Ŷ₀ as an unbiased estimator of E(Y₀).

The important observation in Figure 12-7 is that if x₀ were further to the right, our estimates would be spread out over an even wider range.
On the other hand, if x₀ were further to the left, closer to its central value of zero, then our estimates would be less spread out. Moreover, it is the error in b that causes this: an error in the slope b does little harm in predicting at the central (average) amount of fertilizer, but any prediction for an extreme amount of fertilizer will be thrown badly into error.

Formally, it may be shown⁵ that the 95% prediction interval for an individual Y observation is

Y₀ = Ŷ₀ ± t.₀₂₅ s √(1 + 1/n + x₀²/Σ xᵢ²)    (12-42)

where t.₀₂₅ has (n − 2) d.f.
For example, we can now construct a prediction interval for yield if 550 lb/acre of fertilizer were applied. With x₀ = 150, we predict:

Y₀ = 70.2 ± 2.571 (3.48) √(1 + 1/7 + 150²/280,000)

60.3 ≤ Y₀ ≤ 80.1    (12-43)

a prediction interval with a 95% chance of being right. This prediction interval is shown in Figure 12-8. Moreover, the same calculation for all possible x₀ values yields the dotted band of prediction intervals, expanding as x₀ moves farther away from its central value of zero.
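The interval (12-43) follows directly from (12-42). Continuing the same sketch (reconstructed data, an assumption; t.₀₂₅ = 2.571 for 5 d.f.):

```python
import math

# 95% prediction interval (12-42) for an individual Y0 at x0, wheat-yield example.
X = [100, 200, 300, 400, 500, 600, 700]
Y = [40, 45, 50, 65, 70, 70, 80]
n = len(X)

X_bar = sum(X) / n
x = [Xi - X_bar for Xi in X]
Sxx = sum(xi * xi for xi in x)
a = sum(Y) / n
b = sum(xi * Yi for xi, Yi in zip(x, Y)) / Sxx
s = math.sqrt(sum((Yi - a - b * xi) ** 2 for xi, Yi in zip(x, Y)) / (n - 2))

x0 = 550 - X_bar                      # 550 lb/acre of fertilizer -> x0 = 150
y0_hat = a + b * x0
half = 2.571 * s * math.sqrt(1 + 1 / n + x0 * x0 / Sxx)   # (12-42), 5 d.f.
print(round(y0_hat - half, 1), round(y0_hat + half, 1))   # 60.3 80.1
```

The 1 under the square root is what makes this a prediction interval for an individual Y₀ rather than a confidence interval for E(Y₀).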
It should be emphasized that x₀ may be any value of x. If x₀ lies among the observed values x₁ ··· xₙ, the process is called "interpolation." If x₀ is one of the values x₁ ··· xₙ, the process might be called "using also the other values of x to sharpen our knowledge of this Y population." If x₀ lies beyond x₁ or xₙ, then the process is called "extrapolation." The techniques developed in this section may be used for extrapolation, but only with considerable caution, as we shall see in the next section.

⁵ Without going into the proof of (12-42), we sketch its plausibility. The variance involved is roughly

var = var(a) + var(b) x₀² + var(Y)    (12-40)

that is, the variance of a, plus the variance of b weighted with x₀², plus the inherent variance of any Y observation. This last source of error must be included, of course; even if α and β were known exactly, the prediction of Y₀ would still be subject to error. Into (12-40) we substitute (12-6), (12-8), and (12-2):

var = σ²/n + (σ²/Σ xᵢ²) x₀² + σ² = σ² (1 + 1/n + x₀²/Σ xᵢ²)    (12-41)

When s is substituted for σ, (12-42) follows.
FIG. 12-8 Prediction interval for Y₀.
PROBLEMS

12-1 Construct a 95% confidence interval for the regression coefficient β in
(a) Problem 11-1.
(b) Problem 11-2.

12-2 Which of the following hypotheses does the data of Problem 11-1 prove to be unacceptable at the 5% level of significance?
(a) β = 0
(b) β = 1/2
(c)
(d) β = −.1

12-3 At the 1% level of significance, use the data of Problem 11-1 to test the hypothesis that savings does not depend on income, against the alternative hypothesis that savings increases with income.

12-4 Using the data of Problem 11-1, what is your 95% prediction interval for the savings of a family with an income of
(a) $6,000
EXTRAPOLATION

OF

DANGERs

249

(b) $8,000
(c) $10,000
(d) $12,000
(e) Which of these four intervals is least precise? Most precise?
(f) How is the answer to (b) related to the confidence interval (12-3?)?

12-5 Suppose you are trying to explain how the interest rate (i) affects investment (I) in the U.S. Would you prefer to take observations over a period in which the authorities were trying to hold i constant, or over a period in which i is allowed to vary widely?

12-9 DANGERS OF EXTRAPOLATION

There are two dangers in extrapolation, which we might call "mathematical" and "practical." In both cases, there is no sharp division between safe interpolation and dangerous extrapolation. Rather, there is continually increasing danger of misinterpretation as x0 gets further and further from its central value.

(a) Mathematical Danger. It was emphasized in the previous section that prediction intervals get larger as x0 moves away from zero. This is true even if all the assumptions underlying our mathematical model hold exactly.

(b) Practical Danger. In practice it must be recognized that a mathematical model is never absolutely correct. Rather, it is a useful approximation. In particular, one cannot take seriously the hypothesis that the population means are strung out in an exact straight line. If we consider the fertilizer example, it is likely that the true relation increases initially, but then bends down eventually as a "burning point" is approached, and the crop is overdosed. This is illustrated in Figure 12-9, which is an extension of Figure 11-2 with the scale appropriately reduced. In the region of interest, from 0 to 700 pounds, the relation is practically a straight line, and no great harm is done in assuming the linear model. However, if the linear model is extrapolated far beyond this region of experimentation, the result becomes meaningless.

FIG. 12-9 Comparison of linear model with the true relation. [x axis: fertilizer, 0 to 1,000 lb/acre; the two curves practically coincide from 0 to 700.]

There are "nonlinear" models available, and statistical tests to help determine whether or not they seem more appropriate. These topics are covered in more advanced texts.

*12-10 MAXIMUM LIKELIHOOD ESTIMATION

In Sections 12-1 to 12-5 (including the Gauss-Markov justification of least squares), no assumption of normality of the error term was required. In Sections 12-6 to 12-9, the assumption of a normally distributed error was required only for small sample estimation, i.e., to validate the t distribution; this follows the general principle that small sample estimation requires strong assumptions about the parent population. In these last two sections we make the quite strong assumption that the (true) model involves a normally distributed error throughout. On this premise, we derive the maximum likelihood estimates (MLE) of α and β, i.e., those hypothetical population values of α and β more likely than any others to generate the sample values we observed. These MLE of α and β turn out to be the least squares estimates; thus maximum likelihood provides a second justification for using least squares.

Before addressing the algebraic derivation, it is best to clarify what is going on with a bit of geometry. Specifically, why should the maximum likelihood line fit the data well? To simplify, assume a sample of only three observations (P1, P2, P3). Temporarily, let us try out the line shown in Figure 12-10a. (Before examining it carefully, we note that it seems to be a pretty bad fit for our three observed points.) First, suppose this were the true regression line; then the distribution of errors would be centered around it as shown. The likelihood that such a population would give rise to the sample we observed is the probability density that we would get the particular set of three e values shown in this diagram. The probability density of the three values is shown as the ordinates above the points P1, P2, and P3. Because our three observations are by assumption statistically independent, the likelihood of all three (i.e., the probability density of getting the sample we observed) is the product of these three ordinates. This likelihood seems relatively small, mostly because the very small ordinate of P1 reduces the product value. Our intuition that this is a bad estimate is confirmed; such a hypothetical population is not very likely to generate our sample values. We should be able to do better.

In Figure 12-10b it is evident that we can do much better. This hypothetical population is more likely to give rise to the sample we observed.

FIG. 12-10 Maximum likelihood estimation. (a) Note: this is not the true population; it is only a hypothetical population that the statistician is considering. But it is not very likely to generate the observed P1, P2, P3. (b) Another hypothetical population; this is more likely to generate P1, P2, P3. [Both panels plot p(Y/x), or p(e/x), above the values x1, x2, x3; the ordinate above each observation is its probability density.]

The e terms are collectively smaller, with their probability density being greater as a consequence.

The MLE technique is seen to involve speculating on various possible populations. How likely is each to give rise to the sample we observed? Geometrically, our problem would be to try them all out, by moving the population line and its surrounding e distribution through all possible α and β values, i.e., by moving the regression through all possible positions in space. Each position involves a different set of hypothetical population values of α and β. In each case the likelihood of observing P1, P2, P3 would be evaluated. For our MLE we shall choose that hypothetical population which maximizes this likelihood. It is evident that little further adjustment is required in Figure 12-10b to arrive at the MLE.

This procedure seems reasonable intuitively; moreover, the result is a good fit. Since it seems similar to the least squares fit, it is no surprise that we shall be able to show that the two coincide.

There are two other points worth noting. First, since the MLE is derived from our sample observations, another set of sample observations would almost certainly give rise to another MLE of α and β. The second point is more subtle. The likelihood of any population yielding our sample depends not only on the size of the e terms involved, but also on the shape of the e distribution, in particular σ², the variance of e. However, it can be shown that the maximum likelihood line does not depend on σ². In other words, if we assume σ² is larger, the geometry will look different, because e will have a flatter distribution; but the end result will be the same maximum likelihood line.

While geometry has clarified the maximum likelihood method, it hasn't provided a precise estimate. This must be done algebraically. For generality, suppose that we have a sample of size n, rather than just 3. We wish to know the likelihood of the sample

    p(Y1, Y2, ... Yn)                                   (12-44)

that is, the probability (or probability density) of the sample we observed, expressed as a function of the possible population values of α, β, and σ². First, consider the probability density of the first specific sample value of Y, which is

    p(Y1) = (1/√(2πσ²)) e^{−(1/2σ²)[Y1 − (α + βx1)]²}   (12-45)

This is simply the normal distribution of Y1, with its mean (α + βx1) and variance (σ²) substituted into the appropriate positions. [In terms of the geometry of Figure 12-10, p(Y1) is the ordinate above P1.] The probability density of the second Y value is similar to (12-45), except that the subscript 2 replaces 1 throughout, and so on, for all the other observed Y values. The independence of the Y values justifies multiplying all these probabilities together to find (12-44). Thus

    p(Y1, Y2, ..., Yn) = Π(i=1 to n) (1/√(2πσ²)) e^{−(1/2σ²)[Yi − (α + βxi)]²}   (12-46)

where Π represents the product of n factors. Using the familiar rule for exponentials, the product can be reexpressed by summing the exponents:

    p(Y1, Y2, ... Yn) = (2πσ²)^(−n/2) e^{−(1/2σ²) Σ[Yi − (α + βxi)]²}   (12-47)

Recall that the observed Y's are given. We are speculating on various possible values of α, β, and σ². To emphasize this, we rename (12-47) the likelihood function:

    L(α, β, σ²) = (2πσ²)^(−n/2) e^{−(1/2σ²) Σ[Yi − (α + βxi)]²}   (12-48)

We now ask which values of α and β make L largest. The only place α and β appear is in the exponent; moreover, maximizing a function with a negative exponent involves minimizing the exponent. Hence our problem is to choose α and β in order to minimize

    Σ [Yi − α − βxi]²                                   (12-49)

for the maximum likelihood solution; moreover, this holds regardless of the value of σ. This is the proposition suggested in the geometrical analysis in Figure 12-10: no matter what is assumed about the spread of the distribution, the maximum likelihood line is not affected by it.

But an even more important conclusion follows from comparing equation (12-49) with equation (11-10): maximum likelihood estimates are identical to least squares estimates. The selection of least squares estimates a and b to minimize (11-10) is identical to minimizing (12-49). The only difference is that we're calling our maximum likelihood estimates α̂ and β̂ rather than a and b; these are the same estimates with different names. This establishes the other important theoretical justification of the least squares method: it follows from applying maximum likelihood techniques to a model with a normally distributed error.
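The σ²-invariance just argued can be checked numerically: for any assumed σ, the (a, b) pair that maximizes the likelihood (12-48) is the least squares pair, because for fixed σ the likelihood falls as the sum of squares grows. A minimal Python sketch (the data and the alternative candidate lines are invented for illustration):

```python
import math

# For two very different sigmas, the likelihood-maximizing line among a set
# of candidates is always the least squares line.
x = [-3, -1, 0, 1, 3]              # x measured as deviations from its mean
y = [2.1, 3.9, 5.2, 6.1, 7.8]

a_ls = sum(y) / len(y)             # least squares intercept
b_ls = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

def log_lik(a, b, sigma):
    # Log of the likelihood function (12-48).
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    n = len(y)
    return -(n / 2) * math.log(2 * math.pi * sigma ** 2) - sse / (2 * sigma ** 2)

candidates = [(a_ls, b_ls), (5.0, 1.0), (4.5, 0.8), (6.0, 1.2)]
for sigma in (1.0, 5.0):           # a tight and a flat error distribution
    best = max(candidates, key=lambda ab: log_lik(ab[0], ab[1], sigma))
    print(sigma, best == (a_ls, b_ls))   # prints True for each sigma
```

Changing σ rescales every candidate's likelihood, but never changes their ranking; only the sum of squares matters.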

*12-11 THE CHARACTERISTICS OF THE INDEPENDENT VARIABLE

So far it has been assumed that the independent variable x takes on a given set of fixed values (for example, fertilizer application was set at certain specified levels). But in many cases x cannot be controlled in this way. Thus if we are examining the effect of rainfall on yield, it must be recognized that x (rainfall) is a random variable, completely outside our control. The surprising thing is that the same MLE follows whether x is fixed or a random variable, if we assume [as well as (12-4)] that

1. The distribution of x does not depend on α, β, or σ².    (12-50)
2. The distribution of e is independent of x, being N(0, σ²) for every xi.    (12-51)

The likelihood of our sample now involves the probability of both x and Y. If the xi and ei are independent, the likelihood function is

    L = p(x1) p(x2) ··· p(xn) · p(Y1/x1) ··· p(Yn/xn)   (12-52)

Because of the normality assumption (12-51), and collecting the exponents by the familiar rule, this becomes

    L = p(x1) p(x2) ··· p(xn) (2πσ²)^(−n/2) e^{−(1/2σ²) Σ(Yi − α − βxi)²}   (12-54)

Since p(x) does not depend on the parameters α, β, and σ² according to (12-50), the problem of maximizing this likelihood function with respect to these parameters reduces to the minimization of the same exponent as before. This holds true, in fact, even if the xi are not independent, but are determined by a joint probability distribution that does not involve α, β, or σ²; then (12-54) becomes

    L = p(x1, x2, ..., xn) (2πσ²)^(−n/2) e^{−(1/2σ²) Σ(Yi − α − βxi)²}   (12-55)

again requiring the same (least squares) minimization of the exponent. We conclude that MLE and least squares coincide regardless of whether the independent variable x is fixed, or a random variable, if x is independent of the error and parameters in the equation being estimated. This greatly generalizes the application of the regression model.

chapter 13

Multiple Regression

13-1 INTRODUCTORY EXAMPLE

Suppose that the wheat yield and fertilizer observations in Chapter 11 were taken at several different agricultural experiment stations across the country. Even if soil conditions and temperatures were essentially the same in all these areas, we still might ask, "Can't part of the fluctuation in Y (i.e., the disturbance term e) be explained by varying levels of rainfall in different areas?" A better prediction of wheat yield may be possible if both fertilizer and rainfall are examined. Notice how this argument is similar to the one used in two factor ANOVA: if the error e can be reduced by taking rainfall into account, we will get a better explanation of how the variables are related. The observed levels of rainfall are shown in Table 13-1, along with the original observations of wheat yield and fertilizer from Table 11-1.

TABLE 13-1  Observed Wheat Yield, Fertilizer Application, and Rainfall

Wheat Yield, Y    Fertilizer, X    Rainfall, Z
(bu/acre)         (lb/acre)        (inches)
    40                100              36
    45                200              33
    50                300              37
    65                400              37
    70                500              34
    70                600              32
    80                700              36

13-2 THE MATHEMATICAL MODEL

Multiple regression, the technique used to describe how a dependent variable is related to two or more independent variables, is in fact only an extension of the simple regression analysis of the previous two chapters. Yield Y is now to be regressed on the two independent variables, or "regressors," fertilizer X and rainfall Z. Let us suppose it is reasonable to argue that the model is of the form

    E(Yi) = α + βxi + γzi                               (13-1)

with both regressors x and z measured as deviations from their means. Geometrically, this equation is a plane in the three-dimensional space shown in Figure 13-1.¹ For any given combination of rainfall and fertilizer (xi, zi), the expected yield E(Yi) is the point on this plane directly above, shown as a hollow dot. Of course, the observed value of Yi is very unlikely to fall precisely on this plane. For example, the Yi observed at this particular fertilizer/rainfall combination is somewhat greater than its expected value, and is shown as the solid dot lying directly above it; the difference between the observed and expected value is the error ei.

FIG. 13-1 Scatter of observed points about the true regression plane E(Y) = α + βx + γz. [Axes: fertilizer x, rainfall z, yield Y.]

¹ It is a plane because it is linear in x and z. Looked at from another point of view, we could say that (13-1) is linear in α, β, and γ. In fact, this latter linearity assumption is the more important of the two, since we are involved in estimating α, β, and γ; it is this assumption that keeps our estimating equations (13-4) linear.

Thus any observed value Yi may be expressed as its expected value plus the stochastic or error term ei:

    Yi = α + βxi + γzi + ei                             (13-2)

with our assumptions about e the same as in Chapter 12. β is geometrically interpreted as the slope of the plane as we move in a direction parallel to the (x, Y) plane, i.e., keep z constant; thus β is the marginal effect of fertilizer x on yield Y. Similarly, γ is geometrically interpreted as the slope of the plane as we move in a direction parallel to the (z, Y) plane, i.e., keep x constant; hence γ is the marginal effect of z on Y.

13-3 LEAST SQUARES ESTIMATION

Least squares estimates are derived by selecting the estimates a, b, and c of α, β, and γ that minimize the sum of the squared deviations between the observed Y's and the fitted Y's; i.e., minimize

    Σ (Yi − a − bxi − czi)²                             (13-3)

This is done with calculus, by setting the partial derivatives of this function with respect to a, b, and c equal to zero² (or algebraically, by a technique similar to that used in Appendix 11-1). The result is the following three estimating equations:

    a = Ȳ
    Σ Yixi = b Σ xi² + c Σ xizi
    Σ Yizi = b Σ xizi + c Σ zi²                         (13-4)

Again, note that the intercept estimate a is the mean of Y. The second and third equations may be solved for b and c. These calculations are shown in Table 13-2, and yield the fitted multiple regression equation.

² Maximum likelihood estimates of α, β, and γ are derived in the same way as in the simple regression case, and this again coincides with least squares. Geometrically, this involves trying out all possible hypothetical regression planes in Figure 13-1, and selecting the one that is most likely to generate the solid-dot sample values we actually observed. But first, note that Figure 13-1 involves 3 parameters (α, β, and γ) and 3 variables (Y, x, and z). However, there is one additional variable in our system, p(Y/x, z), which has not yet been plotted. It may appear that there is no way of forcing 4 variables into a three-dimensional space, but this is not so. For example, economists often plot 3 variables (labor, capital, and output) in a two-dimensional labor-capital space by introducing the third output variable as a system of isoquants. Those for whom this is a familiar exercise should have little trouble in graphing four variables [Y, x, z, and p(Y/x, z)] in a three-dimensional (Y, x, z) space by introducing the fourth variable p(Y/x, z) as a system of isoplanes. Each of these isoplanes represents (Y, x, z) combinations that are equiprobable (i.e., for which the probability density of Y is constant). Thus the complete geometric model is the regression plane shown in Figure 13-1, with isoprobability planes stacked above and below it. Our assumptions about the error term (12-4) guarantee that the isoprobability planes will be parallel to the true regression plane. For MLE, we introduce the additional assumption that the error configuration is normal. Then we shift around a hypothetical regression plane along with its associated set of parallel isoprobability planes. In each position the probability density of the observed sample of points is evaluated by examining the isoprobability plane on which each point lies, and multiplying these together. That hypothetical regression which maximizes this likelihood is chosen. The algebra resembles the simple case in Section 12-10; it is easy to show that this again results in minimizing the sum of squares (13-3).
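The last two equations in (13-4) form a 2×2 linear system that is easy to solve directly. A minimal Python sketch applied to the Table 13-1 data (the coefficients below are computed here, not quoted from the text):

```python
# Solve the estimating equations (13-4) for a, b, c using the wheat data,
# with x and z measured as deviations from their means.
Y = [40, 45, 50, 65, 70, 70, 80]
X = [100, 200, 300, 400, 500, 600, 700]
Z = [36, 33, 37, 37, 34, 32, 36]

n = len(Y)
x = [xi - sum(X) / n for xi in X]
z = [zi - sum(Z) / n for zi in Z]

a = sum(Y) / n                                  # a = Y-bar
Sxy = sum(xi * yi for xi, yi in zip(x, Y))      # sum of x*Y
Szy = sum(zi * yi for zi, yi in zip(z, Y))      # sum of z*Y
Sxx = sum(xi * xi for xi in x)
Szz = sum(zi * zi for zi in z)
Sxz = sum(xi * zi for xi, zi in zip(x, z))

# Cramer's rule on:  Sxy = b*Sxx + c*Sxz ;  Szy = b*Sxz + c*Szz
det = Sxx * Szz - Sxz ** 2
b = (Sxy * Szz - Szy * Sxz) / det
c = (Szy * Sxx - Sxy * Sxz) / det
print(round(a, 1), round(b, 4), round(c, 3))    # prints: 60.0 0.0689 0.603
```

So the fitted plane is approximately Ŷ = 60 + .0689x + .603z, in deviation form.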

PROBLEMS

13-1 Suppose a random sample of 5 families yielded the following data (an extension of Problem 11-1):

Savings S    Income Y    Assets W
$  600       $ 8,000     $12,000
 1,200        11,000       6,000
 1,000         9,000       6,000
   700         6,000       3,000
   300         6,000      18,000

(a) Estimate the multiple regression of S on Y and W.
(b) Does the coefficient of Y differ from the answer to Problem 11-1(a)? Which coefficient better illustrates the relation of S to Y?
(c) For a family with assets of $5000 and income of $8000, what would you predict savings to be?
(d) Calculate the residual sum of squares, and residual variance s².
(e) Are you satisfied with the degrees of freedom you have for s² in this problem? Explain.

13-2 Suppose a random sample of 5 families (an extension of Problem 11-1) yielded the following data:

Savings S    Income Y    Number of Children N
$  600       $ 8,000     2
   700         6,000     1
   300         6,000     3
 1,200        11,000
 1,000         9,000

(a) Estimate the multiple regression of S on Y and N.
(b) For a family with 5 children and income of $6000, what would you predict savings to be?

*13-3 Combining the data of Problems 13-1 and 13-2, we obtain the following table:

Family Savings S    Income Y    Assets W    Number of Children N
$  600              $ 8,000     $12,000     2
   700                6,000       3,000     1
   300                6,000      18,000     3
 1,200               11,000       6,000
 1,000                9,000       6,000

Measuring the independent variables as deviations from the mean, we wish to estimate the regression equation

    S = α + βy + γw + δn

(a) Generalizing (13-4), use the least squares criterion to derive the system of 4 equations needed to estimate the four parameters.
(b) Using a table such as Table 13-2, calculate the estimates of the four parameters.

13-4 MULTICOLLINEARITY

(a) In Simple Regression

In Figure 12-5a it was shown how our estimate b became unreliable if the Xi's were closely bunched, i.e., if the regressor X had little variation. It will be instructive to consider the limiting case, where the Xi's are concentrated on one single value X0, as in Figure 13-2. Then b is not determined at all. There are any number of differently sloped lines passing through (X̄, Ȳ) which fit equally well: for each line in Figure 13-2, the sum of squared deviations is the same, since the deviations are measured vertically from (X̄, Ȳ). This geometric fact has an algebraic counterpart. If all Xi = X̄, then all xi = 0, and the term involving b in (11-10) is zero; hence the sum of squares does not depend on b at all. It follows that any b will do equally well in minimizing the sum of squares. An alternative way of looking at the same problem is that since all xi are zero, the denominator Σ xi² in (11-16) is zero, and b is not defined.

FIG. 13-2 Degenerate regression, because of no spread (variation) in X.

In conclusion, when the values of X show little or no variation, then the effect of X on Y can no longer be sensibly investigated. But if the problem is predicting Y, rather than investigating Y's dependence on X, this bunching of the X values doesn't matter, provided we stick to this same value of X. All the lines in Figure 13-2 predict Y equally well. The best prediction is Ȳ, and all these lines give us that result.

(b) In Multiple Regression

Again consider the limiting case, where the values of the independent variables X and Z are completely bunched up on a line L, as in Figure 13-3.

FIG. 13-3 Multicollinearity. [The observations lie on the vertical plane through L; planes π1 and π2 both pass through the fitted line F.]

This means that all the observed points in our scatter lie in the vertical plane running up through L. You can think of three-dimensional space as a room in a house; the observations are not scattered throughout this room, but instead lie embedded in an extremely thin pane of glass standing vertically on the floor.

In explaining Y, multicollinearity makes us lose one dimension. In the earlier case of simple regression, our best fit for Y was not a line, but rather a point (X̄, Ȳ); in this multiple regression case our best fit for Y is not a plane, but rather the line F. To get F, just fit the least squares line through the points on the vertical pane of glass. The problem is identical to the one shown in Figure 11-2; in one case a line is fitted on a flat pane of glass, in the other case, on a flat piece of paper. This regression line F is therefore our best fit for Y. As long as we stick to the same combination of X and Z, i.e., so long as we confine ourselves to predicting Y values on that pane of glass, no special problems³ arise. We can use the regression F on the glass to predict Y in exactly the same way as we did in the simple regression analysis of Chapter 11. But there is no way to examine how X affects Y. Any attempt to define β, the marginal effect of X on Y (holding Z constant), involves moving off that pane of glass, and we have no sample information whatsoever on what the world out there looks like. Or, to put it differently, if we try to explain Y with a plane rather than a line F, we find there are any number of planes running through F (e.g., π1 and π2) which do an equally good job. Since each passes through F, each yields an identical sum of squared deviations; thus each provides an equally good fit. This is confirmed in the algebra of the normal equations (13-4). When X is a linear function of Z (i.e., when x is a linear function of z) it may be shown that the last two equations are not independent, and cannot be solved uniquely for b and c.⁴

Now let's be less extreme in our assumptions and consider the near-limiting case, where z and x are almost on a line (i.e., where all our observations in the room lie very close to a vertical pane of glass). In this case, a plane may be fitted to our observations, but the estimating procedure is very

³ In practice, there can be a problem in getting the regression line F, since computer routines typically break down in the face of perfect multicollinearity.

⁴ Two equations usually can be solved for two unknowns, but not always. For example, suppose John's age (X) is twice Harry's (Y). Then we can write

    X = 2Y
or
    5X = 10Y                                            (13-5)

Note that these two equations tell us the same thing. We have two equations with two unknowns, but they don't generate a unique solution, because they don't give us independent information. The second just restates what the first told us.
unstable; it becomes very sensitive to random errors, reflected in the large variance of the estimators b and c. Thus, even though X may really affect Y, its statistical significance can't be established because the standard deviation of b is so large. This is analogous to the argument in the simple regression case in Section 12-6.

When the independent variables X and Z are collinear, or nearly so, it is called the problem of multicollinearity. For prediction purposes, it does not hurt, provided there is no attempt to predict for values of X and Z removed from their line of collinearity. But structural questions cannot be answered: the relation of Y to either X or Z cannot be sensibly investigated.
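Numerically, perfect multicollinearity shows up as a zero determinant in the last two normal equations of (13-4), so b and c cannot be solved for uniquely. A minimal sketch with hypothetical data, using the pounds/ounces relation Z = 16X of the example that follows:

```python
# When Z is an exact linear function of X, the 2x2 system for b and c
# in (13-4) is singular: its determinant is zero.
X = [100, 200, 300, 400, 500]
Z = [16 * xi for xi in X]              # fertilizer re-measured in ounces

n = len(X)
x = [xi - sum(X) / n for xi in X]      # deviations from the mean
z = [zi - sum(Z) / n for zi in Z]

Sxx = sum(v * v for v in x)
Szz = sum(v * v for v in z)
Sxz = sum(u * v for u, v in zip(x, z))

det = Sxx * Szz - Sxz ** 2             # determinant of the normal equations
print(det)                             # prints: 0.0
```

With near-collinearity the determinant is not exactly zero but very small, which is what makes the solved b and c so sensitive to random errors.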

Example 1

In our wheat yield example, suppose that the statistician makes the incredible error of defining one independent variable X as the amount of fertilizer measured in pounds per acre, and another independent variable Z as the amount of fertilizer measured in ounces per acre. Since any weight measured in ounces must be exactly sixteen times its measurement in pounds,

    Z = 16X                                             (13-6)

Thus we have an example of perfect multicollinearity: all combinations of X and Z must fall on this line. Now if we try to fit⁵ a regression plane to the observations given in Table 11-1, what would be our answer? Any satisfactory solution must equal the original regression of yield on fertilizer given in (11-18); thus one possible answer would be

    Y = 32.8 + .068X + 0Z                               (13-7)

Another equivalent answer would be to make a partial substitution of (13-6) for X in (13-7) as follows:

    Y = 32.8 + .068[λX + (1 − λ)X]
      = 32.8 + .068λX + .068(1 − λ)(1/16)Z
      = 32.8 + .068λX + .00425(1 − λ)Z                  (13-8)

In fact, (13-8) is a whole family of three-dimensional planes, depending on the arbitrary value assigned to λ. All these planes are equivalent expressions for our simple two-dimensional relationship between fertilizer and yield. While all give the same correct prediction of Y, no meaning can be attached to whatever coefficients of X and Z we may come up with.

⁵ The computer program would probably "hang up," trying to divide by zero. So suppose the calculations are handcrafted.

Example 2

While the previous extreme example may have clarified some of the theoretical issues, no statistician would make that sort of error in model specification. Instead, more subtle difficulties arise. In economics, for example, suppose demand for a group of goods is being related to prices and income, with the overall price index being the first independent variable. Suppose aggregate income measured in money terms is the second independent variable. Since this is real income multiplied by the same price index, the problem of multicollinearity may become a serious one. The solution is to use real income, rather than money income, as the second independent variable. This is a special case of a more general warning: in any multiple regression in which price is one independent variable, beware of other independent variables measured in prices.

The problem of multicollinearity may be solved if there happens to be a priori information about the relation of β and γ. For example, if it is known that

    γ = 5β                                              (13-9)

then even in the case of perfect collinearity this information will allow us to uniquely determine the regression plane. This is evident from the geometry of Figure 13-3: given a fixed relation between our two slopes (β and γ) there is only one regression plane π which can be fitted to pass through F. This is confirmed algebraically. Using (13-9), our model (13-2) can be written

    Yi = α + βxi + 5βzi + ei
       = α + β(xi + 5zi) + ei                           (13-11)

It is natural to define a new variable

    wi = xi + 5zi                                       (13-12)

Thus (13-11) becomes

    Yi = α + βwi + ei                                   (13-13)

and a regression of Y on w will yield estimates a and b. Finally, if we wish an estimate of γ, it is easily computed using (13-9):

    c = 5b                                              (13-14)
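The device in (13-9) through (13-14) is easy to carry out mechanically. A minimal Python sketch with invented, perfectly collinear data (z = x/5) and the assumed prior constraint γ = 5β:

```python
# Prior-information regression: collapse x and z into w = x + 5z, regress
# Y on w alone, then recover c = 5b from the constraint (13-9).
x = [-10, -5, 0, 5, 10]          # deviations from the mean
z = [-2, -1, 0, 1, 2]            # z = x/5: perfect collinearity
Y = [50, 55, 60, 66, 69]

w = [xi + 5 * zi for xi, zi in zip(x, z)]   # w_i = x_i + 5 z_i   (13-12)

a = sum(Y) / len(Y)                         # intercept estimate: Y-bar
b = sum(wi * yi for wi, yi in zip(w, Y)) / sum(wi * wi for wi in w)
c = 5 * b                                   # (13-14): estimate of gamma
print(a, b, round(c, 2))                    # prints: 60.0 0.49 2.45
```

Although b and c cannot be separated from the data alone, the single combined slope on w is perfectly well determined.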

INT El; PRETiNG

13-5

Suppose

baX' a

-Jr- b\177.l['\":\177
+

is fitted to 25 abservations of Y and the X \177s.The


are published
in the form, for example'

Y= 10.6 +
(So

2.6)

(q = 4.1)
The br

:'

265

regression

a nt- btX'I

REGRESSION

REGRESSION

ESTiM\177TED

AN

the multiple

ESTIMATED

AN

INTERPRETING

28.4.Y\177

4.0X\177

(st =

11.4) 02 =

(t\177, =

2.5)

(ta =

n L- b4X' 4

least

12.7Xa

squares

+ .84Xi

1.5)

(sa

14. I)

(s4 =

.76)

2.6)

(t4

.9)

(t5 =

1.1)

i13\177i5)

'

th, e reliability of t hle least


or hypOtheSis test.
The true'\177 effect of Xt on Y is the unknown population parameter fit;
we estimate
\177 with
the sample estimator b t. While
the
unknown
fi\177 is fixed,
our estimator'ib t is a random variable, differing
from
sample
to sampl e. The
propertiesof bt may be established, just as th e properties of b were established
in the
prewouis
chapter. Thus b\177may be ShOWn to be normal... again
provided
the sample si:':e is large, or the error
term is normal. bt can also be shown to
be unbiased, vith its mean fit. The magnitude of error involved
in estimation
is reflected in the standard
deviation
of bt which,
let us suppose, is estimated
to be st = I .4 as given in the first bracket below equation (13-155,and
shown
in Fig Ire 13-4. When
bt is standardized
with this estimated
standard
deviation,
it (/ill have a t distribution.
':

squares

ac

information

eted

fit, eliher

is used

often

estimate\177

in

a confidence

assessing

in

intervaI

To recapitulate: we don't know β₁; all we know is that whatever it may be, our estimator b₁ is distributed around it, as shown in Figure 13-4. This knowledge of how closely b₁ estimates β₁ can, of course, be "turned around" to infer a 95 percent confidence interval for β₁ from our observed sample b₁

FIG. 13-4 Distribution of the estimator b₁. (The estimated standard deviation of b₁ is 11.4; the true β₁ is unknown.)
as follows:

β₁ = b₁ ± t.₀₂₅s₁
   = 28.4 ± 2.09(11.4)
   = 28.4 ± 23.8    (13-16)

[n = 25 is the sample size, k = 5 is the number of parameters already estimated in (13-15), and t.₀₂₅ is the critical t value with n - k degrees of freedom.] Similar confidence intervals can be constructed for the other β's.

If we turn to testing hypotheses, extreme care is necessary to avoid very strange conclusions. Suppose it has been concluded on theoretical grounds that X₁ should positively influence Y, and we wish to see if we can statistically confirm this relation. This involves a one-tailed test of the null hypothesis

H₀: β₁ = 0

against the alternative

H₁: β₁ > 0

If H₀ is true, b₁ will be centered on β₁ = 0, and there will be only a 5% probability of observing a t value exceeding 1.72; this defines our rejection region in Figure 13-5a.
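The arithmetic in (13-16) and the 1.72 cutoff can be checked directly; a minimal sketch using scipy's t distribution with the 20 degrees of freedom computed above:

```python
from scipy.stats import t

b1, s1 = 28.4, 11.4
n, k = 25, 5
df = n - k                      # 20 degrees of freedom

t_025 = t.ppf(0.975, df)        # two-tailed 5% critical value, about 2.09
half_width = t_025 * s1         # about 23.8
ci = (b1 - half_width, b1 + half_width)

t_05 = t.ppf(0.95, df)          # one-tailed 5% critical value, about 1.72
reject = (b1 / s1) > t_05       # observed t = 2.5 falls in the rejection region
```

The same two calls give the critical values for any other confidence level or degrees of freedom.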
Our observed t value [2.5, as shown below equation (13-15)] falls in this region; hence we reject H₀, thus confirming (at a 5% significance level) that Y is positively related to X₁.
The similar t values [also shown for the other estimators below (13-15)] can be used for testing the null hypothesis on the other β parameters. As we see in Figure 13-5b, the null hypothesis β₂ = 0 can also be rejected, but a similar conclusion is not warranted for β₃ and β₄. We conclude therefore that
FIG. 13-5 (a) Test of β₁. (The rejection region lies beyond t.₀₅ = 1.72.) (b) Test of the other β's.

the results are "statistically significant" for X₁ and X₂; the evidence is that Y is related to each. But the results are not statistically significant for X₃ and X₄.

As long as we confine ourselves to rejecting hypotheses, as with β₁ and β₂, we won't encounter too much difficulty. But if we accept the null hypotheses about β₃ and β₄, we may run into a lot of trouble of the sort first encountered in Chapter 9. Since this is so important in regression analysis, the argument is reviewed for emphasis.

Suppose, for example, that although our t coefficient for X₃ (.9) is not statistically significant, there are strong prior theoretical grounds for believing that Y is

positively related to X₃. In (13-15) this belief is confirmed: Y is related to X₃ by a positive coefficient. Thus our statistical evidence is consistent with our prior belief (even though it is not as strong as we might like it to be).⁹ To accept the null hypothesis β₃ = 0, and conclude that X₃ doesn't affect Y, would be in direct contradiction to both our prior belief and the statistical evidence. We would be reversing a prior belief even though the statistical evidence weakly confirmed it. It would have been better had we not even looked at the evidence. And we note that this remains true as our t value becomes smaller and our statistical confirmation becomes weaker. Only if the statistical results contradict our prior belief (i.e., if the estimated coefficient is zero or negative, instead of positive) do the statistical results call that belief into question. It follows from this that if we had strong prior grounds for believing X₃ and X₄ to be positively related to Y, they should not be dropped from the estimating equation (13-15); they should be retained, with all the pertinent information on their t values.


It must be emphasized that those who have accepted hypotheses have not necessarily erred in this way. But that risk has been run by anyone who has mechanically accepted a null hypothesis because the t value was not statistically significant. The difficulty is especially acute when the null hypothesis was introduced strictly for convenience (because it was simple), and not because there is any reason to believe it in the first place. It becomes less acute if there is some expectation on theoretical grounds that H₀ is true, i.e., if there are prior grounds for concluding that Y and X are unrelated. Suppose for illustration that we expect a priori that H₀ is true; in such a case, a weak observed relationship (e.g., t = .6) would be in some conflict with our prior expectation of no relationship. But it is not a serious conflict, and easily explained by chance. Hence resolving it in favor of our prior expectation and continuing to use H₀ as a working hypothesis might be a reasonable judgment.

⁹ Perhaps because of too small a sample. Thus 12.7 may be a very accurate description of how Y is related to X₃; but our t value is not statistically significant because our sample is small, and the standard deviation of our estimator (s₃ = 14.1) is large as a consequence.

We conclude once again that classical statistical theory provides incomplete grounds for accepting H₀; acceptance must be based also on extra-statistical judgment, with prior belief playing a key role.

Prior belief plays a less critical role in the rejection of an hypothesis; but it is by no means irrelevant. Suppose, for example, that although you believed Y to be related to X₁, X₃, and X₄, you didn't really expect it to be related to X₂; someone had just suggested that you "try on" X₂ at a 5% level of significance. This means that if H₀ (no relation) is true, there is a 5% chance of ringing a false alarm. If this is the only variable "tried on," then this is a risk we can live with. However, if many such variables are "tried on" in a multiple regression, the chance of a false alarm increases dramatically.⁷ Of course, this risk can be kept small by reducing the level of error for each t test from 5% to 1% or less. This has led some authors to suggest a 1% level of significance for the variables just being "tried on," and a 5% level of significance for the other variables expected to affect Y. Using this criterion we would conclude that the relation of Y and X₁ is statistically significant; but the relation of Y to X₂ is not, despite its higher t value, because there are no prior grounds for believing it.⁸
To sum up: hypothesis tests require

1. Good judgment, and a good prior understanding of the theoretical model being tested;
2. An understanding of the assumptions and limitations of the statistical techniques.
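The compounding false-alarm risk from "trying on" many variables, each at the 5% level, is a one-line calculation; a minimal sketch:

```python
# If k independent t tests are each run at the 5% level and every null
# hypothesis is true, the chance of no false alarm at all is (.95)**k.
k = 10
p_no_error = 0.95 ** k          # about .60 for k = 10
p_false_alarm = 1 - p_no_error  # about .40
```

Tightening each test to the 1% level replaces .95 with .99 and shrinks the overall false-alarm probability accordingly.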

PROBLEMS

13-4 Suppose a multiple regression of Y on three independent variables, based on a sample of n = 30, yields the following estimate:

Y = 25.1 + 1.2X₁ + 1.0X₂ - 0.50X₃

Standard deviations:    (11.9)   (.060)   (1.5)    (1.3)
t-values:               (2.1)    ( )      ( )      ( )
95% confidence limits:  ( )      ( )      (±4.3)   ( )

⁷ Suppose, for simplicity, that the t tests for the significance of the several variables (say k of them) were independent. Then the probability of no error at all is (.95)ᵏ. For k = 10, this is .60, making the probability of some error (some false alarm) as high as .40.

⁸ Anyone who thinks he would never wish to use such a double standard might suppose that Y is the U.S. price level, X₁ is U.S. wages, and X₂ the number of rabbits in South Australia. With the t values shown in equation (13-15), what would he do?

(a) Fill in the blank spaces in the above estimate.
(b) The following statements are either true or false. If false, correct.
(1) The coefficient of X₁ is estimated to be 1.2. Other scientists might collect other samples and calculate other estimates. The distribution of these estimates would be centered around the true value of 1.2. Therefore the estimator is called unbiased.

(2) If there were strong prior reasons for believing that X₂ does influence Y, it is reasonable to reject the null hypothesis β₂ = 0 at the 5% level of significance.
(3) If there were strong prior reasons for believing that X₂ does not influence Y, it is reasonable to accept the null hypothesis β₂ = 0, rather than use the estimated coefficient 1.0.

13-6 DUMMY VARIABLES

There are two major categories of statistical information: cross section and time series. For example, econometricians estimating the consumption function sometimes use a detailed breakdown of the consumption of individuals at various income levels at one point in time (cross section); sometimes they examine how total consumption is related to national income over a number of time periods (time series); and sometimes they use a combination of the two. In this section we develop a method that is especially useful in analyzing cross-section data; as we shall see, it also has important applications in time series studies as well.
(a) Example

Suppose we wish to investigate how the public purchase of government bonds (B) is related to national income (Y). A hypothetical scatter of annual observations of these two variables is shown for Canada in Figure 13-6, and in Table 13-2. It is immediately evident that the relationship of bonds to income follows two distinct patterns: one applying in wartime (1940-5), the other in peacetime.

The normal relation of B to Y (say L₁) is subject to an upward shift (L₂) during wartime; the heavy bond purchases in those years are explained not by Y alone, but also by the patriotic wartime campaign to induce public bond purchases. B therefore should be related to Y and another variable, war (W). But this is only a categorical, or indicator, variable. It does not have a whole
(Footnote: i.e., how consumption expenditures are related to income.)

FIG. 13-6 Hypothetical scatter of public purchases of bonds (B) and national income (Y, in $ billions). (War years 1940-45 are marked; the fitted peacetime line is B = 1.26 + .68Y.)

range of values, but only two: on the one hand, we arbitrarily set its value at 1 for all wartime years; on the other hand, we set its value at 0 for all peacetime years. Since W is either "on" or "off," it is referred to as a "counter" or "dummy" variable. Our model is:

B = α + βY + γW + e    (13-19)

where

W = 1 for wartime years
  = 0 for peacetime years

This single equation is seen to be equivalent to the following two equations:

B = α + βY + γ + e    for wartime    (13-20)
B = α + βY + e        for peacetime    (13-21)

W may also be called a "switching" variable. With war and peace, we switch back and forth between (13-20) and (13-21). We note that γ represents the effect of wartime on bond sales, and β the effect of income changes. (The latter is assumed to remain the same in war or peace.) The important point to note is that one multiple regression of B on Y and W as in (13-19) will yield the two estimated lines shown in Figure 13-6; L₁ is the estimate of the peacetime function (13-21), and L₂ is the estimate of the wartime function (13-20).

Complete calculations for our example are set out in Table 13-3, and the procedure is interpreted in Figure 13-7. Since all observations are W = 0 or W = 1, the scatter is spread only in the two vertical planes π₁ and π₂. Estimation involves a multiple (least squares) regression fit of (13-19) to this


FIG. 13-7 Multiple regression with a dummy variable (W). (The fitted plane is B = 1.26 + .68Y + 2.43W; both fitted lines have slope .68.)

scatter. The resulting fitted plane

B = a + bY + cW    (13-22)

can be visualized as a plane resting on its two supporting buttresses π₁ and π₂. The slopes of L₁ and L₂ are (by assumption) equal¹⁰ to the estimated common value b, and c is the estimated wartime shift.
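The point that one regression yields both lines can be illustrated numerically; a minimal sketch with invented bond and income figures, where numpy's least squares routine stands in for the book's normal-equations calculation:

```python
import numpy as np

# Dummy variable regression B = a + b*Y + c*W, as in (13-19) and (13-22).
# Hypothetical data: peacetime years lie on one line, wartime years on the
# same line shifted upward. All numbers are invented for illustration.
rng = np.random.default_rng(1)
n = 20
Y = rng.uniform(3, 12, size=n)               # national income
W = (np.arange(n) < 6).astype(float)         # 1 in "wartime" years, else 0
B = 1.3 + 0.7 * Y + 2.4 * W + rng.normal(scale=0.2, size=n)

X = np.column_stack([np.ones(n), Y, W])
(a, b, c), *_ = np.linalg.lstsq(X, B, rcond=None)

# One regression gives both lines: peacetime B = a + b*Y,
# wartime B = (a + c) + b*Y, with the common slope b.
```

Reading off L₁ and L₂ from the single fitted plane is just the last comment: the two lines differ only by the estimated shift c.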

¹⁰ This restriction means that L₁ and L₂ are not independently fitted. In other words, our least squares plane (13-22) is fitted first; L₁ and L₂ are simply "read off" this plane. Thus L₁ does not represent a least squares fit to the left-hand scatter, nor does L₂ represent a least squares fit to the right-hand scatter.
Thus the dummy variable method of fitting a single multiple regression plane, and then reading off L₁ and L₂, can be compared to the alternative method of independently fitting two simple regression lines to the two scatters in Figure 13-7. Our model would be:

B = α₁ + β₁Y + e₁    for wartime
B = α₂ + β₂Y + e₂    for peacetime,

and the estimated slopes (β̂₁ and β̂₂) would generally not be the same. (cont'd)

In a dummy variable regression model, as in any regression problem, it is important to understand why both variables must be included. Even if our only interest is in B and Y, their relationship cannot be properly estimated unless W is explicitly taken into account. In other words, since experimental control over the "nuisance" variable W is not possible, its effects must be removed in the regression analysis. To ignore this variable is to invite a bias in our estimators, as well as an increased variance. To see how a bias occurs, consider what happens if the third dimension W is ignored. Geometrically, this involves projecting the three-dimensional scatter of Figure 13-7 onto the two-dimensional B-Y plane, as in Figure 13-8a.

This is immediately recognized as the same scatter as in Figure 13-6; we also reproduce from that diagram L₁ and L₂, our estimated multiple regression using W as a dummy variable. If we calculate L₃, the simple regression of B on Y, it clearly has too great a slope. This upward bias is due to the fact that war years tended to be high income years: thus higher bond sales that should be attributed in part to wartime would be (erroneously) attributed to higher income alone.

A similar error is involved in any investigation of B and W which ignores income; our scatter in Figure 13-7 would be projected onto the B-W plane, as in Figure 13-8b. With no income dimension, the only way to estimate the wartime effect is to look at the difference in the two means plotted in this diagram,¹¹ which is too large. This upward bias is due to the same cause: higher wartime bond sales that should be attributed in part to higher income are (erroneously) attributed to wartime alone.

This example has illustrated the general nature of dummy variables. The technique can be applied to a wide variety of problems, but one of the most useful applications is in removing seasonal shifts in time series data, as explained next.

¹⁰ (cont'd) Estimates of four parameters are required for this model, rather than the three in the dummy model (13-19); thus one advantage of the dummy model is that it conserves one extra degree of freedom. The disadvantage of the dummy model is that it requires an additional prior restriction: that the two slopes are equal. But this is not always a disadvantage. For instance, in our example it may be better to assume the two slopes equal than to independently fit a wartime function to only five observations. The very small wartime sample may yield a very unreliable estimate of slope, and it may make better sense to pool all the data to estimate one slope coefficient.

¹¹ This is equivalent to a simple regression of B on W. Because of the peculiar scatter involved, this regression line would pass through these two means; thus their difference represents the effect of W on B.
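The bias pictured in Figure 13-8a can be reproduced in a small simulation; all numbers below are invented, and the only point is that dropping W inflates the slope on Y whenever W and Y are positively related:

```python
import numpy as np

# Omitted variable bias: when the dummy W is dropped and W is positively
# correlated with Y, the simple regression of B on Y overstates the slope.
rng = np.random.default_rng(2)
n = 200
W = (rng.uniform(size=n) < 0.3).astype(float)
Y = 6 + 2.5 * W + rng.normal(size=n)          # "war" years are high-income years
B = 1.3 + 0.7 * Y + 2.4 * W + rng.normal(scale=0.3, size=n)

# Full (dummy variable) regression recovers the slope on Y:
X_full = np.column_stack([np.ones(n), Y, W])
b_full = np.linalg.lstsq(X_full, B, rcond=None)[0][1]

# Simple regression of B on Y alone gives a biased (larger) slope:
X_simple = np.column_stack([np.ones(n), Y])
b_simple = np.linalg.lstsq(X_simple, B, rcond=None)[0][1]
```

The gap between the two slope estimates is the bias; it grows with the correlation between W and Y and with the true wartime shift.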

FIG. 13-8 Error when one explanatory variable is ignored. (a) Biased estimate of slope (the effect of Y) because the categorical variable W is ignored: the simple regression of B on Y is B = 1.57 + .76Y, with biased slope (.76), while the dummy variable regression is B = 1.26 + .68Y + 2.43W, with unbiased slope (.68). (b) Biased estimate of the effect of W because the numerical variable Y is ignored: the difference between the wartime average of B (5.55) and the peacetime average gives a biased estimate (3.45) of γ, the effect of wartime on B, compared with the unbiased estimate (2.43) in part (a).
(b) Seasonal Adjustment

To illustrate, consider a spectacular example from real life. Suppose we wish to examine how department store sales of jewelry increase over time. When we plot quarterly sales (in Table 13-4) against time as in Figure 13-9a, we note how sales shoot up every fourth quarter because of Christmas. Since we are interested in the long-term secular increase in sales, these strange Christmas observations should be discounted. This calls for a dummy

variable¹² Q₄ for the fourth quarter, so that our model is

S = α + βT + β₄Q₄ + e    (13-23)

Even this model may not be adequate. If allowance should also be made for seasonal shifts in the other quarters, dummies Q₂ and Q₃ should also be added.

TABLE 13-4 Canadian Jewelry Sales (S) and Seasonal Dummies ($100,000's)

Year    T     S     Q₂    Q₃    Q₄
1957    1    24     0     0     0
        2    29     1     0     0
        3    29     0     1     0
        4    50     0     0     1
1958    5    24     0     0     0
        6    30     1     0     0
        7    29     0     1     0
        8    51     0     0     1
1959    9    26     0     0     0
       10    29     1     0     0
       11    30     0     1     0
       12    52     0     0     1
1960   13    25     0     0     0
       14    30     1     0     0
       15    29     0     1     0
       16    50     0     0     1

Source: Dominion Bureau of Statistics, Ottawa.

¹² There are three points in the analysis at which we might conclude that explicit account should be taken of seasonal swings. We may expect a strong seasonal influence from prior theoretical reasoning. Or, such an influence may be discovered after we plot the scatter. Finally, it may be discovered by examining residuals after the regression is fitted. Clearly those observations indicated by arrows (in Figure 13-9a) have consistently high residuals. To explain this, we look for something they have in common. Their common property is that they all occur in the fourth quarter. Hence the fourth quarter is introduced as a dummy regressor. This technique of "squeezing the residuals till they talk" is important in every kind of regression, not just time series; used with discretion, it indicates which further regressors may be introduced in order to reduce bias and residual variance.

FIG. 13-9 Secular growth in Canadian jewelry sales, with and without seasonal adjustment. (a) Inadequate simple regression of S on T alone: S = 31.4 + .31T (biased slope = .31). (b) Multiple regression of S on T, including seasonal adjustment: S = 24.2 + .075T + 25.5Q₄ + 4.4Q₂ + 4.7Q₃; the effect of T alone on S has unbiased slope = .075.

A dummy Q₁ is not needed for the first quarter, because Q₂, Q₃, and Q₄ measure the shift from a first quarter base. (Whether or not to include the various regressors Q₄, Q₃, Q₂ can be decided on statistical grounds, by testing for statistical significance. It is common to include them all in such a test, and reject or accept them as a group. But such a statistical test on data as extreme as ours would be superfluous.) Our modified model is now

S = α + β₁T + β₂Q₂ + β₃Q₃ + β₄Q₄ + e    (13-24)

The least squares fit to this model was calculated by a method similar to that of Table 13-3; equation system (13-4) was extended to a system of 5 estimating equations in the 5 unknowns. The least squares fit is graphed in Figure 13-9b. Notice that our seasonal adjustment is exactly the same every year; i.e., each year there is

the same upward shift (b₂) between the first and second quarters. (These seasonal shift coefficients need not always be positive, as in our example.)

By contrast, the simple regression of S on T without quarterly adjustment is graphed in Figure 13-9a. It is a poor fit, with large residual variance. Even worse, the calculated slope showing the relation of S to T is biased, for the same reasons as in the bond example of part (a).
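The mechanics of fitting a model of the form (13-24) can be sketched with synthetic quarterly data; the trend and seasonal shifts below are invented so that the fit can be checked against known values:

```python
import numpy as np

# Least squares fit of S = a + b1*T + b2*Q2 + b3*Q3 + b4*Q4, as in (13-24),
# on noise-free synthetic data with a known trend and a known Christmas spike.
T = np.arange(1, 17)                         # 16 quarters
quarter = (T - 1) % 4 + 1                    # 1, 2, 3, 4, 1, 2, ...
Q2 = (quarter == 2).astype(float)
Q3 = (quarter == 3).astype(float)
Q4 = (quarter == 4).astype(float)

S = 24 + 0.1 * T + 4 * Q2 + 4 * Q3 + 25 * Q4   # invented "sales"

X = np.column_stack([np.ones(16), T, Q2, Q3, Q4])
coef = np.linalg.lstsq(X, S, rcond=None)[0]     # [a, b1, b2, b3, b4]
```

Because the data are noise-free, least squares recovers the generating coefficients exactly; with real data the same call gives the estimated trend and seasonal shifts.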

(c) Seasonal Adjustment without Dummies (Moving Average)

Dummy variables are not the only means of seasonally adjusting a time series. Another common method is to take a moving average (over a whole year) of the time series, as shown in Table 13-5. Note how the wild seasonal swing of sales at Christmas is ironed out in this averaging process. The desired relation of sales to time can now be estimated by a simple regression of seasonally adjusted S' on T.

It is interesting to compare this method with the dummy variable alternative. An apparent disadvantage is that a total of three observations are lost at the beginning and end of the time series, in order to get the moving average started and finished. However, although it is less evident, the same loss is involved in using dummy variables, since three degrees of freedom are lost in estimating the shift coefficients β₂, β₃, and β₄.

An advantage of the moving average method is that it is not necessary to assume a constant seasonal shift; thus the adjustment for any quarter

TABLE 13-5 Moving Average

Time       S (Unadjusted)    S' (Adjusted by Four Quarter Moving Average)
'57  1          24
     2          29            ¼(24 + 29 + 29 + 50) = 33
     3          29            ¼(29 + 29 + 50 + 24) = 33
     4          50            ¼(29 + 50 + 24 + 30) = 33.25
'58  5          24            ¼(50 + 24 + 30 + 29) = 33.25
     6          30            ¼(24 + 30 + 29 + 51) = 33.5
     7          29            ...
     8          51

varies from year to year. The advantage of dummy variables is that both the seasonal shifts and the relation of S to T are estimated simultaneously in the same regression. (A moving average adjustment is only the first stage in a two-step process; only after it is completed can S' be regressed on T.) Another advantage is that the dummy coefficients (β₂, β₃, and β₄) give an index of the average seasonal shift, and tests of significance on them can be easily undertaken using standard procedures.
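The moving average column of Table 13-5 can be reproduced with a convolution:

```python
import numpy as np

# Four-quarter moving average of the first eight quarters of jewelry
# sales from Table 13-5 (24, 29, 29, 50, 24, 30, 29, 51).
S = np.array([24, 29, 29, 50, 24, 30, 29, 51], dtype=float)
S_adj = np.convolve(S, np.ones(4) / 4, mode="valid")
# S_adj: [33.0, 33.0, 33.25, 33.25, 33.5], matching the table
```

With mode="valid" the averaging window never runs off the ends of the series, which is exactly why observations are lost at the start and finish.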

PROBLEMS

13-5 (a) Using the seasonally adjusted sales of jewelry S' in Table 13-5, compute the simple regression of S' (adjusted) on T;
(b) Compute the simple regression of S (unadjusted) on T alone. Is this any better than (a)?
(c) Of the 2 slopes in (a) and (b):
(1) Which do you think better shows the time trend of sales?
(2) Which agrees more closely with the slope b₁ = .075 estimated by the multiple regression of S on T, including seasonal adjustment?

13-6 Referring to Figure 13-9, predict the jewelry sales in the 11th quarter; then predict the sales in the first quarter of 1961 (T = 17), and in the next 4th quarter.

13-7 Referring to Table 13-4, consider the jewelry sales in the first two years (the eight quarters of 1957-58). Supposing this were the only data available:
(a) Fit a simple regression line of S on T, without quarterly adjustment;
(b) Is your slope estimate (time trend) unbiased? Why?

13-8 Referring to Figures 13-6 and 13-8a, suppose the last 4 years are missing. If a simple regression of B on Y is calculated (ignoring W), will the bias of the slope be less or greater than before (when all the years were used)? Why?
REGRESSION, ANALYSIS OF VARIANCE, AND ANALYSIS OF COVARIANCE

(a) Regression with Dummies Equivalent to Analysis of Variance or Covariance

If all the independent variables are categorical (dummy) variables, then regression analysis is essentially the familiar analysis of variance (ANOVA). This can be proved in general; but it is more instructive to illustrate it in an applied analysis of the simplest case, one independent dummy variable.

In Problem 10-13 we tested whether the income (Y) of men and women differs. The data could alternatively have been analyzed with a regression model of the form

Y = α + βG + e    (13-25)

where

G = 0 for men
  = 1 for women

The regression analysis of this data is set out in Table 13-6. We find the estimate of the difference in group means to be the identical value (b = -8) that we found before; in both analyses the residual variance (48) is the same; and so is the test of the null hypothesis (√F = t). Hence the two procedures are seen to be identical.

Note also that our earlier example of explaining bond sales, as a regression on a numerical variable (income) and a dummy variable (wartime), could alternatively be described as a combination of standard regression and analysis of variance. Technically, this combination is referred to as analysis of covariance (ANOCOVA), although this term is often reserved for cases in which the effect of the dummy variable (wartime) is of prime interest and the other variable (income) is explicitly introduced only to remove its noise effects (i.e., to prevent the sort of error shown in Figure 13-8b).
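That a regression on a single 0/1 dummy reproduces the two-group comparison is easy to verify numerically; a minimal sketch with invented incomes:

```python
import numpy as np

# Regression on one 0/1 dummy, as in (13-25): the fitted slope equals the
# difference between the two group means, and the fitted intercept equals
# the mean of the G = 0 group. Incomes below are invented.
y0 = np.array([40.0, 44.0, 46.0, 50.0])    # group with G = 0
y1 = np.array([34.0, 38.0, 40.0, 32.0])    # group with G = 1

G = np.concatenate([np.zeros(4), np.ones(4)])
Y = np.concatenate([y0, y1])
X = np.column_stack([np.ones(8), G])
a, b = np.linalg.lstsq(X, Y, rcond=None)[0]

diff_in_means = y1.mean() - y0.mean()      # the same quantity as the slope b
```

These are exact algebraic identities of least squares, not approximations, which is the sense in which the regression and ANOVA procedures coincide.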

Another application of analysis of covariance might be the study of the effects of racial discrimination on income; here the major concern would be the effect on income of the dummy variable (negro versus white), with a simultaneous regression on other numerical variables (years of experience, education, etc.) simply a means of keeping these other influences from biasing the result.

Summary

Multiple regression is an extremely useful tool with many broad applications. We distinguish three cases, according to the nature of the independent variables:

1. "Standard regression" is regression on numerical variables only.
2. ANOVA (analysis of variance) is equivalent to regression on categorical (dummy) variables only.
3. ANOCOVA (analysis of covariance) is regression on both categorical and numerical variables.
These three techniques are compared using the hypothetical data of Figures 13-10 to 13-13, which show the possible ways that mortality may be analyzed. Figure 13-10 shows a sample of observations of how age affects the mortality of American men. Applying standard regression, if we reject the hypothesis that the true slope β = 0, we would conclude that age does affect the mortality rate; in the process we derive a useful estimate b of how age affects mortality.

If the data is collected into three groups, we come up with the scatter shown in Figure 13-11. Note that this is exactly the same set of mortality observations as in Figure 13-10. The only difference is that we are no longer specific about the age (X) variable. Now ANOVA can be applied to this data to test whether the population means of these three scatters are equal. Once again, the conclusion is that age affects mortality. However, ANOVA does not tell us how age affects mortality, unless we extend it to multiple comparisons. Moreover, multiple comparisons will yield a whole complicated table, whereas standard regression provides a single descriptive number (b) showing how age affects mortality.

So long as X is numerical, as in Figures 13-10 and 13-11, we conclude that standard regression is generally the preferred technique.¹⁴ But when X is categorical, it cannot be applied. For example, in Figure 13-12 we graph observations of mortality (Y) on nationality; our X variable ranges over various categories (American, British, etc.), and there is no natural way of placing these on a numerical scale, or even ordering them. Hence standard regression is out of the question,¹⁵ and ANOVA must be used.

If mortality is dependent on income as well as nationality, the analysis of covariance shown in Figure 13-13 is appropriate. This uses nationality dummies, with the numerical variable income explicitly introduced to eliminate the error that it might otherwise cause. We confirm that this has greatly improved our analysis. Whereas it appeared in Figure 13-12 that a national characteristic of the British was a lower mortality rate than the Chinese, we see in Figure 13-13 that it is not so simple. The heights of the fitted planes for China and the United Kingdom are practically the same. The lower U.K. mortality rate is explained solely by higher income.

¹⁴ Standard regression could also be applied, with a line fitted to the grouped scatter in Figure 13-11. However, if this technique is to be applied, it is more efficient to use the ungrouped data of Figure 13-10.

¹⁵ In Figures 13-12 and 13-13, all samples are assumed drawn from a single age group; we consider only the other factors influencing mortality. To confirm that standard regression cannot be applied, the student will note that a standard linear regression line fitted to the scatter in Figure 13-12 will yield b ≈ 0 and the conclusion that nationality does not matter. Yet if China is graphed last, rather than first, b ≠ 0 and it would be concluded that nationality does matter. Thus, the conclusion depends on the arbitrary ordering of our nationality variable.
FIG. 13-10 "Standard regression," since X (age) is numerical.

FIG. 13-11 X is grouped into age classifications (younger, middle, older), and ANOVA may be used.

FIG. 13-12 X (nationality: U.K., U.S.S.R., China, U.S.) is categorical, and ANOVA must be used.

FIG. 13-13 Analysis of covariance: nationality dummies, with per capita income as the numerical variable.
In summary, standard regression is the more powerful tool whenever the independent variable is numerical and the dependence of Y on X can be described by a simple function. Analysis of variance is appropriate if the independent variable X is a set of unordered categories. Analysis of covariance is a more powerful tool for handling both a categorical variable (nationality) and a numerical variable (income).

PROBLEMS

13-9 Construct a confidence interval for β using the data in Table 13-6. Compare with the answer to Problem 10-3b.

13-10 Using the data in Problem 10-2, estimate the regression of yield on fertilizer type, using two dummies. Is the result the same? Compare this answer with your answer to Problem 10-2.
13-11 The following is a sample of 6 cars, giving the results of a test of gas consumption:

            Miles Per Gallon    Engine Horsepower
Make A            21                  210
                  18                  240
                  15                  310
Make B            20                  220
                  18                  260
                  15                  320

(a) Determine the difference in the performance (miles per gallon) of the two makes, allowing for horsepower differences.
(b) Graph your results as in Figure 13-13.
284
(13-12)

REGRESSION

MULTIPLE

(a) Based

on the

how education

analysis of
father's income

use the

information,

sample

following

describe

to

covariance

is related to

and placeof residence.


(b)

Graph

your

results.

Years of

Formal
(E)

Education

Urban Sample

Sample

(?)

15

58,000

18

11,000

12
Rural

Income

Father's

9,000

16

12,000

13

S5,000

10

3,000

11

6,000

14

10,000

chapter 14

Correlation

14-1 SIMPLE CORRELATION

Regression analysis showed us how variables are linearly related; correlation analysis will show us the degree to which variables are linearly related. In regression analysis, a whole mathematical function is estimated (the regression equation); but correlation analysis yields only one number: an index designed to give an immediate picture of how closely two variables move together. In correlation analysis, we need not worry about cause and effect relations. Correlation between X and Y can be estimated regardless of whether: (a) X affects Y, or vice versa; (b) both affect each other; or (c) neither directly affects the other, but they move together because some third variable influences both. Although correlation is a less powerful technique than regression, the two are so closely related mathematically that correlation often becomes a useful aid in regression analysis.

(a) The Population Correlation Coefficient ρ (rho)

In equation (5-22) we have already defined a useful index of how two variables move together: σ_XY, the covariance of X and Y. The random variables used there were deviations from the mean:

    (X − μ_X)  and  (Y − μ_Y)    (14-1)

It will be useful to express these deviations in terms of fully standardized units; i.e., to define the new variables

    (X − μ_X)/σ_X  and  (Y − μ_Y)/σ_Y    (14-2)

Correlation ρ_XY is similar to covariance σ_XY in (5-22), the only difference being that the fully standardized variables in (14-2) replace those in (14-1). Thus

    ρ_XY = σ_XY / (σ_X σ_Y)    (14-3)

This will be interpreted in Section 14-1(c) below; for now we turn our attention to r, the sample correlation coefficient used to estimate this (generally) unknown population ρ.

(b) The Sample Correlation Coefficient

By analogy with (14-3), we define the sample correlation

    r_XY = Σ x_i y_i / [(n − 1) s_X s_Y]    (14-4)

Now consider an intuitive development of this index (because of the similarity of the two concepts, some of this interpretation will closely parallel the development of covariance in Section 5-3). As our example, we use the marks on a verbal (Y) and mathematical (X) test scored by a sample of eight college students. Each student's performance is represented by a dot on the scatter shown in Figure 14-1a; this information is set out in the first two columns of Table 14-1.

Since we are after a single measure of how closely the two variables are related, our index should be independent of our choice of origin. So we shift both axes in Figure 14-1b, with both x and y now defined as deviations from the mean; i.e.,

    x = X − X̄  and  y = Y − Ȳ    (14-5)

Values of the translated variables are shown in columns 3 and 4 of Table 14-1.

FIG. 14-1 Scatter of math and verbal scores. (a) Original observations (X̄ = 60, Ȳ = 50). (b) Shift axes: x = X − X̄, y = Y − Ȳ. (c) Change scale of axes to standard units.

Suppose we multiply the x and y coordinates for each student, and sum these products: Σxy. This gives us a good measure of how math and verbal results move together. Whenever an observation such as P₁ falls in the first quadrant of Figure 14-1b, both its x and y coordinates will be positive, yielding a positive product xy. This also holds true for an observation such as P₂ in the third quadrant, with both coordinates negative. If X and Y move together, most observations will fall in the first and third quadrants; consequently the sum Σxy will be a large positive value, a reflection of the positive relationship between X and Y. But if X and Y are negatively related (one rises when the other falls), the original scatter will run downhill rather than uphill; most observations will fall in the second or fourth quadrant, each with a negative product xy, and the sum Σxy will be a negative value.

We conclude that as an index of correlation, Σxy at least carries the right sign. Moreover, when there is no relationship between X and Y, and our observations are distributed evenly over the four quadrants, positive and negative terms will cancel, and this index will be zero.
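The sign behavior of this index is easy to verify numerically; a small sketch (the data here are our own illustrative numbers, not the text's):

```python
import numpy as np

def sum_xy(X, Y):
    # The raw index discussed above: sum of products of deviations from the means.
    x = X - X.mean()
    y = Y - Y.mean()
    return float(np.sum(x * y))

uphill   = sum_xy(np.array([1., 2., 3., 4.]), np.array([2., 3., 5., 6.]))  # move together
downhill = sum_xy(np.array([1., 2., 3., 4.]), np.array([6., 5., 3., 2.]))  # one rises, other falls
balanced = sum_xy(np.array([1., 1., 3., 3.]), np.array([1., 3., 1., 3.]))  # spread evenly over quadrants
print(uphill, downhill, balanced)   # positive, negative, zero
```

The three cases reproduce the three conclusions of the argument above: a positive sum for an uphill scatter, a negative sum for a downhill one, and exact cancellation when the points are spread evenly over the four quadrants.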

There are just two ways that Σxy can be improved. First, it depends on the units in which x and y are measured. (Suppose the math test had been marked out of 50 instead of 100; x values would be only half as large, even though the degree to which verbal and mathematical performance are related would not have changed.) This difficulty is avoided by measuring both x and y in terms of standard units; i.e., both are divided by their observed standard deviations, where

    s_X = √[Σ x_i² / (n − 1)]    (14-6)

    s_Y = √[Σ y_i² / (n − 1)]    (14-7)

This step is shown in Figure 14-1c.

Our new index Σ (x_i/s_X)(y_i/s_Y) has only one remaining flaw: it is dependent on sample size. (Suppose we observed exactly the same sort of scatter from a sample of double the size; our index would also double, even though the degree to which these variables move together is the same.) To avoid this problem, we divide by the sample size n, or rather n − 1, the divisor in (14-6) and (14-7). This yields the sample correlation coefficient:

    r = [1/(n − 1)] Σ (x_i/s_X)(y_i/s_Y)    (14-8)

which is recognized to be our definition (14-4). r may be expressed in terms of the original observations (X_i, Y_i) by substituting (14-6) and (14-7) into (14-8) and cancelling (n − 1):

    r = Σ(X_i − X̄)(Y_i − Ȳ) / √[Σ(X_i − X̄)² Σ(Y_i − Ȳ)²]    (14-9)

FIG. 14-2 Scatter diagrams and their associated sample correlation coefficients r (panels a to f).

Example. The data in Table 14-1 are applied to (14-9) to calculate the correlation coefficient between the math and verbal scores of our sample of eight students:

    r = 654 / √((1304)(836)) = .62    (14-10)
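The arithmetic of (14-10) is easy to reproduce from the summary figures of Table 14-1; a quick check, which also confirms that definitions (14-4) and (14-9) agree:

```python
import math

# Summary figures from Table 14-1 (x and y are deviations from the means):
sum_xy, sum_xx, sum_yy, n = 654.0, 1304.0, 836.0, 8

# Equation (14-9):
r = sum_xy / math.sqrt(sum_xx * sum_yy)

# Equation (14-4), via the standard deviations with divisor n - 1:
s_x = math.sqrt(sum_xx / (n - 1))
s_y = math.sqrt(sum_yy / (n - 1))
r_alt = sum_xy / ((n - 1) * s_x * s_y)

print(round(r, 3), round(r_alt, 3))   # both 0.626; the text quotes .62
```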

Some idea of how r behaves is given in Figure 14-2; especially note diagram b. When there is a perfect linear association, the product of the coordinates in every case is positive; thus their sum (and the resulting coefficient of correlation) is as large as possible. The same argument holds for the perfect inverse relation of Y and X shown in diagram d. This suggests that r has an upper limit of +1 and a lower limit of −1. (This is proved in Section (f) below.)

Finally, compare diagrams e and f. Our calculation of r in either case is zero, because positive products of the coordinates are offset by negative ones. Yet when we examine the two scatters, no relation between X and Y is confirmed in e, but a strong relation is evident in f; in this case a knowledge of X will tell us a great deal about Y. A zero value for r therefore does not imply "no relation"; rather, it means "no linear relation." Thus correlation is a measure of linear relation only; it is of no use in describing nonlinear relations.

This brings us to the next critical question: "In calculating r, what can we infer about the underlying population ρ?"

(c) Inference from r to ρ

Before we can draw any statistical inference about ρ from our sample statistic r, we must clarify our assumptions about the parent population from which our sample was drawn. In our example, this would be the math and verbal marks scored by all college entrants. This population might, of course, be graphed as a scatter, as in Figure 14-3, except that many more dots would appear, each representing another student. If we subdivide both X and Y into class intervals, the area will be divided up in a checkerboard pattern; from the relative frequency (sampling probability) in each of the squares, the histogram in Figure 14-4 is constructed.¹

FIG. 14-3 Bivariate population scattergram (math and verbal scores).

FIG. 14-4 Bivariate population histogram.

For a very large population, this histogram would have approximately the shape of the probability density shown in Figure 14-5. To conclude: in examining a random student, neither his math score X nor his verbal score Y is predetermined; both are random variables. Compare this with our example in Chapter 11, where one variable (fertilizer) was predetermined.

The distribution in Figure 14-5 is called "bivariate normal." This means that the conditional distribution of X or of Y is always normal. Specifically, if we slice the surface at any value of Y (say Y₀), the shape of the resulting cross section is normal. Similarly, if we select any X value (say X₀) and slice the surface in this other direction, the resulting cross section is also normal.

It is worthwhile pausing briefly to consider the alternative way that the bivariate normal population, shown in three dimensions in Figure 14-5, can be graphed in two dimensions. Instead of slicing the surface vertically as we did in that diagram, slice it horizontally as in Figure 14-6. The resulting cross section is an ellipse, representing the set of all X, Y combinations associated with the same probability density. This "isoprobability" ellipse is graphed in X, Y space in Figure 14-7, along with the ellipses defined when the surface is sliced horizontally at higher and lower levels; it will also be useful to mark in d, the major axis common to all these ellipses. (Once again, many social scientists will recognize this as the familiar strategy of forcing a three-dimensional function into a two-dimensional space, by showing isoquants, isobars, or whatever.) Several examples of populations and their correlation coefficients ρ are shown in Figure 14-8; note how the bivariate normal distribution concentrates about its major axis as ρ increases.

FIG. 14-5 Bivariate normal probability density.

FIG. 14-6 An isoprobability ellipse from a bivariate normal surface.

FIG. 14-7 The bivariate normal distribution shown as a set of isoprobability ellipses.

FIG. 14-8 Examples of population correlations.

Provided that the parent population is bivariate normal, inferences about ρ can easily be made from a sample correlation r. Recall the inferences about π from P in Chapter 8. Using the same reasoning that established Figure 8-4, Figure 14-9 is constructed. Thus from any sample r, a 95% confidence interval for the population ρ can be found. For example, if a sample of 25 students has r = .80, the 95% confidence interval for ρ is read vertically as

    .58 < ρ < .90    (14-11)

FIG. 14-9 95% confidence bands for correlation ρ in a bivariate normal population, for various sample sizes n. (This chart is reproduced with the permission of Professor E. S. Pearson from F. N. David, Tables of the Ordinates and Probability Integral of the Distribution of the Correlation Coefficient in Small Samples, Cambridge University Press, 1938.)

Because of space limitations, we shall concentrate in the balance of this chapter on sample correlations, and ignore the corresponding population correlations. But each time a sample correlation is introduced, it should be recognized that an equivalent population correlation is defined similarly, and inferences may be made about it from the sample correlation.

¹ Our example is of a finite population, but a similar argument would apply for an infinite population. Moreover, instead of using heights for probabilities, we could use dots of different sizes; see Figure 5-4a.
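When the chart of Figure 14-9 is not at hand, an interval like (14-11) can be approximated with Fisher's z-transformation, a standard large-sample device that this text does not develop; a sketch:

```python
import math

def rho_interval(r, n, z_crit=1.96):
    # Fisher z-transformation: artanh(r) is approximately normal with
    # standard error 1/sqrt(n - 3); transform the interval back with tanh.
    z = math.atanh(r)
    half = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half), math.tanh(z + half)

lo, hi = rho_interval(0.80, 25)
print(round(lo, 2), round(hi, 2))   # roughly .59 to .91, close to the chart's .58 < rho < .90
```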

PROBLEMS

14-1
    Son's Height    Father's Height
      (inches)         (inches)
         68               64
         66               66
         72               71
         73               70
         66               69

From the above random sample of 5 father-and-son heights, find:
(a) The sample correlation r;
(b) The 95% confidence interval for the population correlation ρ;
(c) At the 5% significance level, can you reject the hypothesis that ρ = 0?

14-2 From the following sample of student grades:

    First Test X    Second Test Y
        80               90
        60               70
        40               40
        30               40
        40               60

(a) Calculate r, and find a 95% confidence interval for ρ;
(b) Calculate the regression of Y on X, and find a 95% confidence interval for β;
(c) Graph the 5 data points and the estimated regression line;
(d) At the 5% significance level, can you reject
    (1) The null hypothesis ρ = 0?
    (2) The null hypothesis β = 0?

(d) Correlation and Regression

If regression and correlation analysis were both applied to the same scatter of math (X) and verbal (Y) scores, how would they be related? Specifically, consider the relation between the estimated correlation r and the estimated regression slope b. In Problem 11-4(b) it was confirmed that

    b = Σxy / Σx²    (14-12)

and from (14-9), noting that both x and y are defined as deviations,

    r = Σxy / √(Σx² Σy²)    (14-13)

When (14-12) is divided by (14-13),

    b/r = √(Σy² / Σx²)    (14-14)

If we divide both the numerator and denominator inside the square root sign by n − 1,

    b/r = √[Σy²/(n − 1)] / √[Σx²/(n − 1)] = s_Y / s_X    (14-15)

or

    b = r (s_Y / s_X)    (14-16)

This close correspondence of b and r will play an important role in the argument later. Note that if either r or b is zero, the other will also be zero.
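Identity (14-16) can be verified on the text's math-verbal summary figures (Σxy = 654, Σx² = 1304, Σy² = 836, from Table 14-1); a quick check:

```python
import math

# Summary figures for the math (x) and verbal (y) deviations, Table 14-1:
sum_xy, sum_xx, sum_yy = 654.0, 1304.0, 836.0

b = sum_xy / sum_xx                           # slope, equation (14-12)
r = sum_xy / math.sqrt(sum_xx * sum_yy)       # correlation, equation (14-13)
sy_over_sx = math.sqrt(sum_yy / sum_xx)       # s_Y/s_X; the (n - 1)'s cancel
print(round(b, 3), round(r * sy_over_sx, 3))  # the two agree, as (14-16) claims
```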

(e) Explained and Unexplained Variation

In Figure 14-10 we reproduce our sample of math (X) and verbal (Y) scores, along with the regression of Y on X, fitted in a straightforward way from the information set out in Table 14-1. Now, if we wished to predict a student's verbal score (Y) without knowing X, then the best prediction would be the average observed value Ȳ. At xᵢ, it is clear from this diagram that we would make a very large error, namely (Yᵢ − Ȳ), the deviation of Yᵢ from its mean. However, once our regression equation has been calculated, we predict Y to be Ŷᵢ. Note how this reduces our error, since the large part of our deviation, (Ŷᵢ − Ȳ), is now "explained." This leaves only a relatively small "unexplained" deviation (Yᵢ − Ŷᵢ).

FIG. 14-10 The value of regression in reducing unexplained variation in Y. (Yᵢ − Ȳ = total deviation; Ŷᵢ − Ȳ = deviation explained by regression; Yᵢ − Ŷᵢ = deviation not explained by regression.)

The total deviation of Y is the sum:

    (Yᵢ − Ȳ) = (Ŷᵢ − Ȳ) + (Yᵢ − Ŷᵢ)    (14-17)
    total deviation = explained deviation + unexplained deviation

It follows that, summing over all observations,

    Σ(Yᵢ − Ȳ) = Σ(Ŷᵢ − Ȳ) + Σ(Yᵢ − Ŷᵢ)    (14-18)

What is surprising is that this same equality holds when these deviations are squared,² i.e.,

    Σ(Yᵢ − Ȳ)² = Σ(Ŷᵢ − Ȳ)² + Σ(Yᵢ − Ŷᵢ)²    (14-19)

or,

    total variation = explained variation + unexplained variation

where variation is defined as the sum of squared deviations; recall (10-16). Since the estimated regression, according to Problem 11-4(a), is

    Ŷᵢ = Ȳ + bxᵢ    (14-21)

it is often convenient to rewrite the explained variation in (14-19) as

    Σ(Ŷᵢ − Ȳ)² = Σ(bxᵢ)² = b² Σxᵢ²    (14-22)

This makes explicit the fact that the explained variation is the variation in Y accounted for by the estimated regression coefficient b.

This procedure of decomposing total variation and analyzing its components is called "analysis of variance applied to regression." The components of variance are displayed in ANOVA Table 14-2, similar to Table 10-6.³

² For proof, square both sides of (14-17), and sum over all values of i:

    Σ(Yᵢ − Ȳ)² = Σ[(Ŷᵢ − Ȳ) + (Yᵢ − Ŷᵢ)]² = Σ(Ŷᵢ − Ȳ)² + Σ(Yᵢ − Ŷᵢ)² + 2Σ(Ŷᵢ − Ȳ)(Yᵢ − Ŷᵢ)    (14-20)

Using (14-21), the last term can be rewritten as 2b Σxᵢ(Yᵢ − Ŷᵢ). But this sum vanishes: in fact, it was set equal to zero in the normal equation (11-15) used to estimate our regression line. Thus the last term in (14-20) disappears, and (14-19) is proved. This same theorem can similarly be proved in the general case of multiple regression. A further justification of the least squares technique (not mentioned in Chapter 11) is that it results in this useful relation between explained, unexplained, and total variation.

³ And also noting that our terminology for degrees of freedom has changed; e.g., the total number of sample observations is now designated simply as n, rather than nr.

TABLE 14-2
(a) General ANOVA Table for Linear Regression

    Source of Variation          Variation                  d.f.      Variance
    Explained (by regression)    Σ(Ŷᵢ − Ȳ)², or b²Σxᵢ²       1        Σ(Ŷᵢ − Ȳ)²/1
    Unexplained (residual)       Σ(Yᵢ − Ŷᵢ)²               n − 2      s² = Σ(Yᵢ − Ŷᵢ)²/(n − 2)
    Total                        Σ(Yᵢ − Ȳ)²                n − 1

(b) For Sample of Verbal and Math Scores (Table 14-1)

    Source of Variation          Variation    d.f.    Variance
    Explained (by regression)       328         1       328
    Unexplained (residual)          508         6       84.7
    Total                           836         7

From this, a null hypothesis test may now be constructed; as before, the question is whether the ratio of explained to unexplained variance is sufficiently greater than 1 to reject the null hypothesis that Y is unrelated to X. Specifically, a test of the hypothesis

    H₀: β = 0    (14-23)

involves forming the ratio

    F = explained variance / unexplained variance = b²Σxᵢ² / s²    (14-24)

A 5% significance test involves finding the critical value of F which leaves 5% of the distribution in the right-hand tail. If the F value calculated from (14-24) exceeds this critical value, reject the hypothesis (14-23). We must emphasize that this is just an alternate way of testing the null hypothesis; the first method, finding a confidence interval for β using the t distribution (Section 12-7), is usually preferable.

Note that the F and t distributions are related, in general, by

    t² = F    (12-36)

where F has one degree of freedom in the numerator. Since the F calculated in (14-24) is just the square of the calculated t, the F test of this section is justified.

Example

In Table 14-2(b) the ANOVA calculations are presented for our verbal and math score example. (The necessary computational details are shown on the bottom of Table 14-1.) To test β = 0, (14-24) is evaluated:

    F = 328 / 84.7 = 3.87    (14-25)

Since this falls short of the critical 5% point of 5.99, we do not reject the null hypothesis.

The same test of β = 0 could be done equivalently using (12-36):

    t = b / (s/√Σx²) = .50 / (9.2/√1304) = 1.97

Since this falls short of 2.45 (the critical value leaving a total of 5% in both tails of the t distribution), the null hypothesis is not rejected. Since t² = F (both for the calculated and for the critical values), the same conclusion must follow from both tests.

Alternatively, a 95% confidence interval for β could be constructed from (12-30):

    β = .50 ± (2.45)(.254)
      = .50 ± .62

This includes the value β = 0, once more confirming that H₀ cannot be rejected. (Of course, this inconclusive result may be partly due to the smallness of the sample.)
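The whole of Table 14-2(b) and the two test statistics reduce to a few lines of arithmetic on the summary figures of Table 14-1 (the critical values 5.99 and 2.45 must still be looked up in the F and t tables):

```python
import math

# Summary figures for the eight students (Table 14-1):
sum_xy, sum_xx, sum_yy, n = 654.0, 1304.0, 836.0, 8

b = sum_xy / sum_xx                    # estimated slope (about .50)
explained = b * b * sum_xx             # explained variation, equation (14-22)
unexplained = sum_yy - explained       # by the decomposition (14-19)
s2 = unexplained / (n - 2)             # residual variance, d.f. = n - 2

F = explained / s2                     # equation (14-24)
t = b / math.sqrt(s2 / sum_xx)         # the equivalent t statistic
print(round(F, 2), round(t, 2))        # 3.87 and 1.97, matching (14-25); t*t equals F
```

Note also that explained/sum_yy ≈ .39 is exactly r², anticipating equation (14-29) in the next subsection.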

(f) Interpretation of Correlation

These variations in Y are now related to r. It follows from (14-14) that

    b = r √(Σy² / Σx²)    (14-26)

Substituting this value for b into (14-22), the explained variation is

    b² Σx² = r² Σy²    (14-27)

Noting that Σy² = Σ(Yᵢ − Ȳ)² is by definition the total variation in (14-19), the solution for r² is

    r² = Σ(Ŷᵢ − Ȳ)² / Σ(Yᵢ − Ȳ)²    (14-28)

Finally, we reexpress this as

    r² = explained variation of Y / total variation of Y    (14-29)

This equation provides a clear intuitive interpretation of r², the square of the correlation coefficient r (r² is often called the coefficient of determination). It is the proportion of the total variation in Y explained by fitting the regression. Since the numerator cannot exceed the denominator, the maximum value of the right-hand side of (14-29) is 1; hence the limits on r are ±1. These two limits were illustrated in Figure 14-2: in part (b), r = +1 and all observations lie on a straight line running uphill; in part (d), r = −1, and this perfect inverse correlation reflects the fact that all observations lie on a straight line running downhill. In either case, a perfect fit means that the regression explains all the variation in Y. At the other extreme, when b = 0 (and r² = 0), the regression explains nothing. Thus r = 0 and b = 0 are seen to be equivalent ways of formally stating that no linear relation between X and Y is observed.

(g) Regression Analysis Applied to a Bivariate Normal Population

In Figure 14-1a our regression was calculated from sample values. We now ask: "Is the b calculated from these sample values an estimator of a true population regression; i.e., for a bivariate normal population, does there exist a true regression line of Y on X?" It will now be shown that the answer is yes.

Our assumed bivariate normal population of X and Y is shown in Figure 14-11 as a set of isoprobability ellipses, with major axis d. Now consider the straight line defined by joining points of vertical tangency such as P₁. Concentrating on the vertical slice through X₁, for example (the cross section P₁Q₁), we note that this slice defines a normal conditional distribution of Y.
FIG. 14-11 Two regression lines (Y = α + βX and X = α* + β*Y) found from the isoprobability ellipses.

The mean of these Y values occurs at the point of tangency P₁; at this point our vertical line touches its highest isoprobability ellipse, and the highest point on any normal curve is at the mean. Thus we see that the means of the Y populations lie on the straight line Y = α + βX. Next, the variance of the Y populations can be shown to be constant.⁴ Thus the assumptions of the regression model (12-2) are satisfied by a bivariate normal (correlation) population. The line Y = α + βX may therefore be regarded as a true linear regression of Y on X.

Thus if we know a student's math score and we wish to predict his verbal score, this regression line would be appropriate; if his math score were X₁, we would predict his verbal score to be P₁. It is important to fully understand why we would not predict Q₁; i.e., why we do not use the major axis of the ellipse (line d) for prediction, even though this represents "equivalent" performance on the two tests. Since this student is far above average in mathematics, an equivalent verbal score seems too optimistic a prediction. Recall that there is a large random element involved in performance; there are a lot of students who will do well in one exam, but only moderately well in the other. (Technically, ρ is less than 1 for this population.) Therefore, instead of predicting at Q₁, we predict his verbal score at P₁, a sort of average⁵ of "equivalent" performance Q₁ and "average" performance Ȳ. Whatever a student's score in math, there will be a tendency for his verbal score to "regress" toward mediocrity (i.e., the average).⁶ It is evident from Figure 14-11 that this is equally true for a student with a math score below average; in this case the predicted verbal score regresses upward toward the average. This is the origin of the term regression.

Another interesting observation is that the correlation coefficient between X and Y is unique (i.e., ρ_XY is identically ρ_YX); but there are two regressions, the regression of Y on X and the regression of X on Y. This is immediately evident if we ask how we would predict a student's math score (X) if we knew his verbal score (e.g., Y₀). Exactly the same argument holds: equivalent performance (point Q₀ on line d) is a bad predictor; since he has done very well in the verbal test, we would expect him to do less well in math, although still better than average. Thus, the best prediction is P₀ on the line X = α* + β*Y, the regression of X (math) on Y (verbal). This regression is defined by joining points of horizontal, rather than vertical, tangency. Each of these horizontal slices defines a normal conditional distribution of X, given Y; each of these distributions has the same variance, with its mean lying on this regression line, thus satisfying the conditions of a true regression of X on Y. This is the direct analogue of the regression of Y on X; hence our least squares values a* and b* are used to estimate α* and β*.

⁴ This may seem like a curious conclusion, since in Figure 14-5 the size of each cross-section slice differs depending on the value of X₀. However, each slice p(X₀, Y) must be adjusted by division by p(X₀) in order to define the conditional distribution of Y. Thus, recalling the argument in Section 5-1(c), and in particular equation (5-10), the conditional distribution is

    p(Y/X₀) = p(X₀, Y) / p(X₀)

In fact, this adjustment makes all the conditional distributions of Y "look alike," and thus have the same variance.

⁵ P₁ is in fact a weighted average of Q₁ and Ȳ, with weights depending on ρ. Thus in the limiting case in which ρ = 1, X and Y are perfectly correlated, and we would predict Y at Q₁. At the other limit, in which ρ = 0, we can learn nothing about likely performance on one test from the result of the other, and we would predict Y at Ȳ. But for all cases between these two limits, we predict using both Q₁ and Ȳ; and the greater the ρ, the more heavily Q₁ is weighted.

⁶ A classical case, encountered by Pearson & Lee (Biometrika, 1903), involved trying to predict a son's height from his father's height. If the father is a giant the son is likely to be tall; but there are good reasons for expecting him to be shorter than his father. (For example, how tall was his mother? And his grandparents? And so on.) So the prediction for the son was derived by "regressing" his father's height towards the population average.
FIG. 14-12 Regressions estimated from a sample of verbal and math scores.

Example

Our sample of eight students' scores shown in Table 14-1 was, by assumption, drawn from a bivariate normal population like the one shown in Figure 14-11. We have already estimated ρ with

    r = .62    (14-10) repeated

and estimated the regression of Y on X, Y = α + βX, with

    Ŷ = 50 + .50x    (14-30)
      = 20 + .50X    (14-31)

The coefficients in this simple regression are calculated using the estimating equations (11-13) and (11-16), and the calculations set out in Table 14-1. We now estimate the regression of X on Y, X = α* + β*Y. This involves using the same estimating equations and Table 14-1, taking care to interchange X and Y throughout. Thus

    X̂ = 60 + .78y    (14-32)
      = 60 + .78(Y − Ȳ) = 21 + .78Y    (14-33)

The two estimated regressions (14-31) and (14-33) are shown in Figure 14-12. Thus, for example, the predicted verbal score of a student with a math result of 90 is 65; and the predicted math score of a student with a verbal result of 30 is 44.4.
(h) When Correlation, When Regression?

Both the regression and correlation models require that Y be a random variable. But the two models differ in the assumptions they make about X. The standard regression model makes few assumptions about X; the standard correlation model of this chapter is more restrictive, requiring that X be a random variable, having with Y a bivariate normal distribution. We therefore conclude that the regression model has wider application. Regression may be used, for example, to describe the fertilizer-yield problem in Chapter 11, where X was fixed, or the bivariate normal population of X and Y in this chapter; the standard correlation model describes only the latter. (It is true that r² can be calculated even when X is fixed, as an indication of how effectively regression reduces variation; but in this case inferences about ρ in Figure 14-9 cannot be used.)

In addition, regression answers more interesting questions. Like correlation, it indicates if two variables move together; but it also estimates how. Moreover, a key issue in correlation analysis, the test of the null hypothesis

    H₀: ρ = 0    (14-34)

can be answered directly from regression analysis by testing the equivalent null hypothesis

    H₀: β = 0    (14-35)

Thus rejection of β = 0 implies rejection of ρ = 0, and the conclusion that correlation does exist between X and Y. If this is the only correlation question, then it can be answered by the regression test (14-35), and there is no need to introduce correlation analysis at all. Since regression answers a broader and more interesting set of questions (and some correlation questions as well), it becomes the preferred technique; correlation is useful primarily as an aid to understanding regression, and as an auxiliary tool.
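The equivalence of (14-34) and (14-35) can be seen in the test statistics themselves: the t statistic for the slope b can be rewritten purely in terms of r as t = r√(n−2)/√(1−r²), a standard identity not derived in this text. A numerical check on the eight-student sample:

```python
import math

# Eight-student summary figures (Table 14-1):
sum_xy, sum_xx, sum_yy, n = 654.0, 1304.0, 836.0, 8

# t statistic computed from the regression slope b:
b = sum_xy / sum_xx
s2 = (sum_yy - b * sum_xy) / (n - 2)        # residual variance
t_slope = b / math.sqrt(s2 / sum_xx)

# The same statistic computed purely from the correlation r:
r = sum_xy / math.sqrt(sum_xx * sum_yy)
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

print(round(t_slope, 3), round(t_corr, 3))  # identical: testing beta = 0 tests rho = 0
```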

(i) "Nonsense" Correlations

In interpreting correlation, one must keep firmly in mind that no claim is made that this necessarily indicates cause and effect. For example, suppose that the correlation of teachers' salaries and the consumption of liquor over a period of years turns out to be .98. This would not prove that teachers drink; nor would it prove that liquor sales increase teachers' salaries. Instead, both variables moved together because both are influenced by a third variable: long-run growth in national income. If third factors of this kind could be kept constant, or fully discounted, then their correlation would become more meaningful; this is the objective of partial correlation in the next section.

Correlations such as the above are often called "nonsense" correlations. It would be more accurate to say that the observed mathematical correlation is real enough, but any naive inference of cause and effect is nonsense. Moreover, it should be recognized that the same charge can sometimes be leveled at conclusions drawn from regression analysis. For example, a regression applied to teachers' salaries and liquor sales would also yield a statistically significant b coefficient; any inference of cause and effect from this would still be nonsense.

Although correlation and regression cannot be used as proof of cause and effect, these techniques are very useful in two ways. First, they may provide further confirmation of a relation that theory tells us should exist (e.g., prices depend on wages). Second, they are often helpful in suggesting causal relations that were not previously suspected. For example, when cigarette smoking was found to be highly correlated with lung cancer, possible links between the two were investigated further. This included more correlation studies in which third factors were more rigidly controlled, as well as extra-statistical studies such as experiments with animals, and chemical theories.

PROBLEMS

14-3 For the following random sample of 5 shoes, find:
(a) The proportion of the variation in Y explained by regression on X.
(b) The proportion unexplained.
(c) Whether Y depends on X, at the 5% significance level. Answer this in three alternate ways: using the F test, the t test, and a 95% confidence interval.

Cost of Shoe X    Months of Wear Y
    10                  8
    15                 10
    10                 10
    20                 12
    20                 20
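A sketch of the computations for 14-3 (the tabulated shoe values are as reconstructed here and may differ from the original printing, but the relationships shown hold for any sample): parts (a) and (b) come from r², and for part (c) the F statistic is exactly the square of the t statistic, so the two tests must agree.

```python
import math

# Shoe data as printed above (values uncertain; any sample illustrates the identities)
x = [10, 15, 10, 20, 20]   # cost
y = [8, 10, 10, 12, 20]    # months of wear
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

r2 = sxy ** 2 / (sxx * syy)      # (a) proportion of variation explained
unexplained = 1 - r2             # (b) proportion unexplained

# (c) the t statistic for the slope, and the ANOVA-style F ratio
b = sxy / sxx
a = ybar - b * xbar
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s2 = sse / (n - 2)
t = b / math.sqrt(s2 / sxx)
F = r2 / (unexplained / (n - 2))  # explained variance over unexplained variance
print(r2, F, t ** 2)              # F equals t squared
```

The 95% confidence interval b ± t.₀₂₅ · SE(b) excludes zero exactly when the t test rejects, which is why the three answers in (c) must coincide.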

14-4 Suppose a bivariate normal distribution of scores is perfectly symmetric, with ρ = .50 and isoprobability ellipses as follows:

[Figure: isoprobability ellipses centered at (80, 80).]

True or False? If false, correct it.
(a) The regression curve of Y on X is a straight line.
(b) The regression line of Y on X has graph as follows:

Y = 80 + .5(X − 80)

(c) The variance of the residual Y values (after fitting X) is only 1/4 the variance of the original Y values.
(d) The proportion of the Y variation explained by X is 3/4.
(e) Thus …

14-5 Let b and b* be the sample regression slopes of Y on X, and X on Y, for any given scatter of points. True or False? If false, correct it.
(a) b = r (s_Y / s_X)
(b) b* = r (s_X / s_Y)
(c) b b* = r²
(d) If b > 1, then b* < 1 necessarily.
(e) If b < 1, then b* > 1 necessarily.

14-6 In the following graph of 4 students' marks, find geometrically (without doing any algebraic calculations):
(a) The regression line of Y on X.
(b) The regression line of X on Y.
(c) The correlation r (Hint. Problem 14-5c).
(d) The predicted Y-score of a student with X-score of 70.
(e) The predicted X-score of a student with Y-score of 70.

[Graph: grade Y (40 to 80) plotted against Term grade X (40 to 80), showing the 4 students' marks.]
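The key identity in Problem 14-5(c) can be checked numerically; with any scatter (the one below is invented), the product of the two sample slopes equals r².

```python
import math

# Any invented scatter will do; b * b_star = r^2 is an algebraic identity
x = [1.0, 2.0, 4.0, 5.0, 7.0]
y = [2.0, 4.0, 5.0, 4.0, 8.0]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b = sxy / sxx        # slope of the regression of Y on X
b_star = sxy / syy   # slope of the regression of X on Y
r = sxy / math.sqrt(sxx * syy)
print(b * b_star, r ** 2)   # equal
```

This is the fact exploited geometrically in 14-6(c): the two regression lines together determine r.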

14-2 PARTIAL CORRELATION

As soon as we move from the simple two-variable case to relations which involve more than two variables, complications arise. To illustrate, consider a simple three-variable example: suppose that yield of hay (Y) depends on spring temperature (X) and rainfall (Z). Following the techniques of Chapter 13, we could fit a regression plane to a scatter of observations of Y, X, and Z:

Y = a + bX + cZ    (14-36)

Recall how the multiple regression coefficient b is interpreted: it estimates how Y is related to X if Z were held constant. The partial correlation coefficient r_XY·Z is a similar concept. It estimates how X and Y move together if Z were held constant. (For convenience the variables Y, X, and Z are often defined as variables 1, 2, and 3; thus r_YX·Z becomes r_12·3, the partial correlation of the first two variables, when the third is assumed constant.)

While the previous section corresponds to the simple regression analysis of Chapter 12, the partial correlation analysis of this chapter corresponds to the multiple regression analysis of Chapter 13. Thus we could embark here on a whole chapter on partial correlation; but since we have argued in the previous section that correlation is relatively less important, we confine ourselves to a brief intuitive introduction to this concept, and how it may be used.
The following assumptions are generally made about the parent population: the joint distribution of X, Y, and Z is multivariate normal. This implies that for any value of Z, the conditional distribution of Y and X is bivariate normal, as shown in Figure 14-5. ρ_YX·Z is defined as the simple correlation of X and Y in this conditional joint distribution.

However, in computing its estimator r_YX·Z a problem arises. Since Z is a random variable, it is simply not possible to fix Z at one single value Z₀ and sample the corresponding conditional distribution of X and Y. Thus, unless the sample is extremely large, it is unlikely that more than a single Y, X, Z₀ combination involving Z₀ will be observed. The alternative is to compute r_YX·Z as the correlation of Y and X after the influence of Z has been removed from each.² The resulting partial correlation r_YX·Z can, after considerable manipulation, be expressed as the simple correlation of Y and X (r_YX), adjusted by applying the two simple correlations involving Z (namely r_XZ and r_YZ) as follows:

r_YX·Z = (r_YX − r_XZ r_YZ) / √[(1 − r_XZ²)(1 − r_YZ²)]    (14-40)

This formula shows explicitly that there need be no close correspondence between the partial and the simple correlation coefficient; however, in the special case that both X and Y are completely uncorrelated with Z (i.e., r_XZ = r_YZ = 0), then (14-40) reduces to:

r_YX·Z = r_YX    (14-41)

and, as we would expect, the partial and simple correlation coefficients are the same.

It is instructive to note what happens at the other extreme, when X becomes perfectly correlated with Z. In this case r_YX·Z cannot be calculated, since r_XZ = 1 and the denominator of (14-40) becomes zero as a consequence. This is recognized as the multicollinearity problem of Chapter 13, where the corresponding multiple regression estimate b could not be defined.

The parallel statistical properties of b and r_YX·Z can be extended further: rejection of the hypothesis that β = 0 in Chapter 13 is equivalent to rejecting the null hypothesis that ρ_YX·Z = 0. Again, one reason for emphasizing regression analysis is confirmed: multiple regression will not only answer its own set of regression questions, but also partial correlation questions as well.

² By the "influence" of Z on Y we mean the fitted value Ŷ obtained by the regression of Y on Z:

Ŷ = a + bZ    (14-37)

By "removing the influence," we mean obtaining the residual deviation by subtracting the fitted from the observed value:

û = Y − Ŷ = Y − a − bZ    (14-38)

which is the part of Y not explained by Z. Similarly, we obtain v̂, the residual deviation of X from its fitted value on Z. The partial correlation coefficient is the simple correlation of û and v̂; thus:

r_XY·Z ≡ r_ûv̂    (14-39)

14-3 MULTIPLE CORRELATION
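Formula (14-40) and the residual definition (14-39) give the same number; a minimal sketch with invented observations of Y, X, and Z (any nondegenerate triple works):

```python
import math

def mean(v):
    return sum(v) / len(v)

def corr(u, v):
    ub, vb = mean(u), mean(v)
    suu = sum((a - ub) ** 2 for a in u)
    svv = sum((a - vb) ** 2 for a in v)
    suv = sum((a - ub) * (b - vb) for a, b in zip(u, v))
    return suv / math.sqrt(suu * svv)

def residuals(y, z):
    # residuals of y after fitting y = a + b z by least squares
    zb, yb = mean(z), mean(y)
    b = sum((zi - zb) * (yi - yb) for zi, yi in zip(z, y)) / sum((zi - zb) ** 2 for zi in z)
    a = yb - b * zb
    return [yi - (a + b * zi) for zi, yi in zip(z, y)]

# Invented observations (not from the text)
y = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0]
x = [1.0, 2.0, 2.0, 4.0, 5.0, 6.0]
z = [3.0, 2.0, 4.0, 5.0, 4.0, 7.0]

r_yx, r_xz, r_yz = corr(y, x), corr(x, z), corr(y, z)

# (14-40): partial correlation from the three simple correlations
r_yx_z = (r_yx - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# (14-39): simple correlation of the two sets of residuals
r_resid = corr(residuals(y, z), residuals(x, z))
print(r_yx_z, r_resid)   # equal
```

The "considerable manipulation" mentioned in the text is exactly the algebra that makes these two computations coincide.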

A partial correlation coefficient may be computed for each independent variable in a multiple regression. In addition, one single overall index of the value of fitting the multiple regression equation can be defined: the multiple correlation coefficient, R, is the simple correlation coefficient of the observed Y and the corresponding fitted Ŷ. Thus, if our estimated regression is:

Ŷ = a + bX + cZ    (14-42)

then

R ≡ r_YŶ    (14-43)

This R has all the nice algebraic properties of any simple correlation. In particular, we note (14-29), which takes the form

R² = Σ(Ŷᵢ − Ȳ)² / Σ(Yᵢ − Ȳ)² = (variation of Y explained by multiple regression) / (total variation in Y)    (14-44)

Note how this relates to simple correlation. If there is only one regressor (independent variable), then the numerator represents the variation explained by it [with Ŷ estimated from the full model, e.g., (14-42)]; thus this R² is identical to r² in any simple correlation. Our conclusion is the same as in simple correlation: one of the major values of calculating R² is to clarify how successfully our multiple regression explains the variation in Y, and we can see how fast R² increases as we add additional explanatory variables to our multiple regression, by watching how helpful these variables are in improving our explanation of Y.

It remains, finally, to test the statistical significance of additional regressors, using the t-test of (13-15). We could also extend (14-22) to

Total variation = variation explained by the first regressor + additional variation explained by the others + unexplained variation    (14-45)

We could set this up in an ANOVA table like Table 10-10, and construct the ratio

F = (additional variance explained) / (unexplained variance)    (14-46)

just as (10-30) was constructed. An observed F ratio of this kind is thus seen to be a test of the significance of the additional regressor. Such F values, one for each regressor, appear under the estimated equation in (13-15), and lend themselves to tests of significance when translated into the values t = ±√F.
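Definitions (14-43) and (14-44) can be verified together. A sketch fitting the two-regressor plane (14-42) by the normal equations in deviation form, with invented data:

```python
import math

# Invented data for Y and two regressors X, Z (not from the text)
y = [6.0, 8.0, 9.0, 11.0, 12.0, 15.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
z = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
n = len(y)

ybar, xbar, zbar = sum(y) / n, sum(x) / n, sum(z) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
Szz = sum((zi - zbar) ** 2 for zi in z)
Sxz = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z))
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
Szy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))

# least-squares coefficients of (14-42) in deviation form
det = Sxx * Szz - Sxz ** 2
b = (Szz * Sxy - Sxz * Szy) / det
c = (Sxx * Szy - Sxz * Sxy) / det
yhat = [ybar + b * (xi - xbar) + c * (zi - zbar) for xi, zi in zip(x, z)]

# (14-43): R is the simple correlation of observed Y and fitted Y-hat
Sff = sum((f - ybar) ** 2 for f in yhat)   # fitted values average to ybar
Sfy = sum((f - ybar) * (yi - ybar) for f, yi in zip(yhat, y))
Syy = sum((yi - ybar) ** 2 for yi in y)
R = Sfy / math.sqrt(Sff * Syy)

# (14-44): R^2 equals the explained share of the total variation
print(R ** 2, Sff / Syy)   # equal
```

Adding a further regressor can only increase Sff, which is why R² never falls as variables are added; whether the increase is significant is the F-test question of (14-46).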

PROBLEMS

14-7 For the data of Problem 13-1, relating savings S to income Y and assets W, find:
(a) r_SY, the simple correlation of S and Y.
(b) r_SY·W, the partial correlation of S and Y, holding W fixed.
(c) R, the multiple correlation of S on Y and W.
(d) Using parts (a), (b), and (c), calculate the proportion of the variation of S which is explained by Y alone, and the additional proportion which is explained by adding the regressor W.
(e) Comparing 14-7(d): is R necessarily larger than r_SY? Is R always a better measure of how well S is explained, "other things being equal"?

14-8 Repeat Problem 14-7, using the data of Problem 13-2 and substituting … for W.

14-9 For the data of Problem 13-1, break the variation of S into its components, and construct the F ratio to test the statistical significance of adding W to the regression of S on Y. How many degrees of freedom are there in the numerator and denominator? Then find the t value to test the significance of W as a regressor after S is regressed on Y, noting that t = ±√F, with 1 degree of freedom in the numerator of F.

14-10 Repeat Problem 14-9 for the data of Problem 13-2.
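The mechanics of 14-9 in a sketch (the data below are invented stand-ins for S, Y, and W, since the actual figures are in Problem 13-1): the drop in unexplained variation when W is added gives the F ratio, and the t statistic for W in the full regression satisfies t² = F.

```python
import math

# Invented stand-ins for savings S, income Y, assets W
s = [3.0, 4.0, 6.0, 7.0, 9.0, 12.0]
y = [10.0, 12.0, 15.0, 16.0, 20.0, 24.0]
w = [2.0, 5.0, 3.0, 6.0, 4.0, 8.0]
n = len(s)

def dev(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

ds, dy, dw = dev(s), dev(y), dev(w)
Syy = sum(a * a for a in dy)
Sww = sum(a * a for a in dw)
Syw = sum(a * b for a, b in zip(dy, dw))
Sys = sum(a * b for a, b in zip(dy, ds))
Sws = sum(a * b for a, b in zip(dw, ds))
Sss = sum(a * a for a in ds)

# reduced model: S regressed on Y alone
b1 = Sys / Syy
sse_reduced = Sss - b1 * Sys

# full model: S regressed on Y and W
det = Syy * Sww - Syw ** 2
b = (Sww * Sys - Syw * Sws) / det
c = (Syy * Sws - Syw * Sys) / det
sse_full = Sss - b * Sys - c * Sws

# F ratio for adding W, and the equivalent t for W's coefficient
F = (sse_reduced - sse_full) / (sse_full / (n - 3))
t = c / math.sqrt((sse_full / (n - 3)) * Syy / det)
print(F, t ** 2)   # equal
```

The numerator of F has 1 degree of freedom (one added regressor) and the denominator n − 3, matching the t distribution with n − 3 degrees of freedom.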

chapter 15

Decision Theory

This chapter is devoted to making decisions in the face of uncertainty. A large part of the discussion involves Bayesian methods, which are not only useful for their own sake, but also sharpen our understanding of the limitations of classical statistics.

15-1 PRIOR AND POSTERIOR DISTRIBUTIONS

Problem 3-24b on Bayes' theorem is important enough to repeat, in slightly altered form. If we were to predict tomorrow's weather before consulting a barometer, we would use Table 15-1:

TABLE 15-1 Prior Probabilities p(θ)

State θ         Prior probability p(θ)
Rain (θ₁)            .40
Shine (θ₂)           .60

But we can do better, by using a barometer characterized by Table 15-2:

TABLE 15-2 Conditional Probabilities p(x/θ)

                        State
Prediction x    Rain (θ₁)    Shine (θ₂)
"Rain"             .90           .20
"Shine"            .10           .80
                  1.00          1.00

TABLE 15-3 Posterior Probabilities p(θ/x)

State θ         Posterior probability p(θ/"rain")
Rain (θ₁)            .75
Shine (θ₂)           .25

After the barometer's prediction is observed, the probabilities in Tables 15-1 and 15-2 are no longer relevant, and should be replaced by the posterior probabilities in Table 15-3. We recall that this table was derived by combining Tables 15-1 and 15-2 in Figure 15-1, so that the new sample space is "rain," explaining the two posterior probabilities. Since the relative size of the two hatched areas is what matters, we now give this diagrammatic derivation its full formal confirmation. We use (5-10) to write down the probability of rain and the prediction "rain" as

p(θ₁, x₁) = p(θ₁) p(x₁/θ₁)    (15-2)
          = (.4)(.9) = .36    (15-3)

Similarly, the probability of the state shine and the prediction "rain" is

p(θ₂, x₁) = p(θ₂) p(x₁/θ₂) = (.6)(.2) = .12    (15-4)

These two relations define the hatched areas in Figure 15-1. Comparing areas, we conclude that it is three times as likely for a "rain" prediction to be associated with rain as with shine. Formally, the hatched area in Figure 15-1 becomes the new sample space, within which we calculate the new probabilities. To express this, we note that

p("rain") = p(x₁) = .36 + .12 = .48    (15-5)

[FIG. 15-1 How posterior probabilities are determined. The prior divides into Rain (.4) and Shine (.6); the hatched areas mark the prediction "Rain" within each.]

Using (5-10) again,

p(θ₁/x₁) = p(θ₁, x₁) / p(x₁) = .36/.48 = .75    (15-6)

Similarly,

p(θ₂/x₁) = p(θ₂, x₁) / p(x₁) = .12/.48 = .25    (15-7)

When this new (hatched) sample space has its probabilities blown up in this way by the divisor p(x₁), the result is the posterior probability distribution in Table 15-3. This is often more conveniently written in the general form

p(θ/x) = p(θ, x) / p(x) = p(θ) p(x/θ) / p(x)    (15-8)

To keep the mathematical manipulations in perspective, we give the physical interpretation the proper emphasis. Before the evidence (barometer) is seen, the prior probabilities p(θ) give the proper betting odds on the weather. But after the evidence is in, we can do better; the posterior probabilities p(θ/x) now give the proper betting odds. (This may be intuitively grasped by appealing to the relative frequency interpretation. Of all the times the barometer registers "rain," in what proportion does rain actually occur? The answer is 75%.) As a simple summary, we note that the prior distribution is adjusted by the empirical evidence to yield the posterior distribution. Schematically:

Prior probabilities p(θ), combined with the probability of the empirical evidence p(x/θ), yield the posterior probabilities p(θ/x).    (15-9)
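The whole of (15-2) through (15-7) is a three-line computation; a sketch for the weather example:

```python
# Bayes' rule for the weather example: prior, then condition on a "rain" forecast
prior = {"rain": 0.40, "shine": 0.60}               # Table 15-1
p_forecast_rain = {"rain": 0.90, "shine": 0.20}     # "rain" column of Table 15-2

joint = {state: prior[state] * p_forecast_rain[state] for state in prior}  # (15-2), (15-4)
p_x1 = sum(joint.values())                                                 # (15-5): .48
posterior = {state: joint[state] / p_x1 for state in prior}                # (15-6), (15-7)
print(posterior)   # Table 15-3: rain .75, shine .25
```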

PROBLEMS

15-1 A factory has 3 machines (θ₁, θ₂, and θ₃) making bolts. The newer the machine, the larger and more accurate it is; that is, according to the following table:

Machine        Proportion of total output    Rate of defective bolts
θ₁ (oldest)          10%                           …
θ₂                   40%                           …
θ₃ (newest)          50%                           …

Thus, for example, θ₃ produces half of the factory's output, and of the bolts it produces, 1% are defective.

(a) A bolt is selected at random from the factory's output; before it is examined, what is the chance it was produced by machine θ₁? By θ₂? By θ₃?
(b) After it is examined and found defective, what is the chance it was produced by machine θ₁? By θ₂? By θ₃?

15-2 Of all the people in a roomful of ten people, the heights θ have the following distribution:

θ (inches):   70   71   72   73   74   75
p(θ):          …

(a) Graph this (prior) distribution of θ.
(b) Suppose also that a crude measuring device is available, that makes errors e with the following distribution:

e (error in inches):   −2    −1    0    1    2
p(e):                  .1    .2   .4   .2   .1

This device can help us to be more accurate in estimating the height of a man drawn at random from the room. For example, suppose his measured height using this crude device is x = 74 inches. We now have further information about θ; i.e., this measurement changes the probabilities for θ from the prior distribution p(θ) to a posterior distribution p(θ/x = 74). Calculate and graph his posterior distribution.
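Problem 15-2(b) is the same mechanics with a numerical θ: p(θ/x = 74) is proportional to p(θ) p(e = 74 − θ). A sketch, using an assumed prior (the p(θ) row of the height table is illegible in this copy, so these prior values are placeholders to show the method):

```python
# Assumed prior over heights; the text's own p(theta) values are illegible here
prior = {70: 0.1, 71: 0.1, 72: 0.2, 73: 0.3, 74: 0.2, 75: 0.1}
p_error = {-2: 0.1, -1: 0.2, 0: 0.4, 1: 0.2, 2: 0.1}   # measurement error e
x = 74                                                  # measured height

joint = {th: prior[th] * p_error.get(x - th, 0.0) for th in prior}
px = sum(joint.values())
posterior = {th: joint[th] / px for th in prior}
print(posterior)   # mass concentrates near the measurement
```

Heights more than 2 inches from the measurement get posterior probability zero, since the device never errs by more than 2 inches.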

15-2 DECISIONS

(a) Example

Suppose a salesman regularly sells umbrellas or lemonade on Saturday at football games. To keep matters simple, suppose he has just three possible options (actions, aᵢ):

a₁ = sell only umbrellas;
a₂ = sell some umbrellas, some lemonade;
a₃ = sell only lemonade.

If he chooses a₁ and it rains, his profit is $20; but if it shines, he loses $10 (negative profit). It will be more convenient to describe everything as a loss; thus his losses will be −20 or +10 respectively. If he chooses action a₂ or a₃, there will similarly be a certain loss in each state of the weather. All this information may be assembled conveniently in the following loss table:

TABLE 15-4 Loss Function l(a, θ)

                      State θ
Action         Rain (θ₁)    Shine (θ₂)
a₁               −20            10
a₂                −5             5
a₃                25            −7

Suppose further that the probability distribution (long-run relative frequency) of the weather is as shown in Table 15-5:

TABLE 15-5 Probability Distribution of θ

State θ          Rain (θ₁)    Shine (θ₂)
Probability p(θ)    .20           .80

What is the best action for the salesman to take? (You are urged to work out, before reading on, what your intuition suggests.)

Solution. If he chooses a₁, we calculate what he can expect his loss to be, on average:

L(a₁) = expected loss if he chooses a₁ = −20(.20) + 10(.80) = 4    (15-10)

We recognize this as the concept of expected value, as given by (4-17):

L(aᵢ) = Σ_θ l(aᵢ, θ) p(θ) = l(aᵢ, θ₁) p(θ₁) + l(aᵢ, θ₂) p(θ₂)    (15-11)

Similarly we evaluate:

L(a₂) = −5(.20) + 5(.80) = 3    (15-12)
L(a₃) = 25(.20) − 7(.80) = −.6    (15-13)

In general,

L(a) = Σ_θ l(a, θ) p(θ)    (15-14)

The optimal action is seen to be a₃, which minimizes the expected loss; in fact, it is the only option that allows any expected profit. To summarize, we assemble all our information and calculations in Table 15-6:

TABLE 15-6 Calculation of the Optimal Action

                     p(θ):   .20          .80       Expected loss
Action                     Rain (θ₁)   Shine (θ₂)       L(a)
a₁                           −20           10             4
a₂                            −5            5             3
a₃                            25           −7           −.6 ← minimum

(To give an alternative intuitive calculation of (15-10): in, say, 100 days, he would have about 20 rainy days at $−20 each, yielding $−400, and about 80 shiny days at $+10 each, yielding $+800, for a sum of about $+400 in 100 days, or an average loss of $4 per day.)

(b) Generalization

It hardly seems necessary to state that this problem can be generalized to any number of states θ or actions a (even an infinite number). The objective remains the same: to minimize the expected loss L(a). To review, we must consider those probabilities p(θ), and the loss function l(a, θ).
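The whole calculation of Table 15-6 takes a few lines (the a₂ row is partly illegible in this copy, so its entries below are assumed):

```python
p = {"rain": 0.20, "shine": 0.80}                     # Table 15-5
loss = {
    "a1 (umbrellas)": {"rain": -20, "shine": 10},
    "a2 (both)":      {"rain": -5,  "shine": 5},      # entries assumed
    "a3 (lemonade)":  {"rain": 25,  "shine": -7},
}

# expected losses by (15-14), then pick the minimizing action
L = {a: sum(loss[a][s] * p[s] for s in p) for a in loss}
best = min(L, key=L.get)
print(L, best)
```

The same code handles any number of states and actions, which is the point of the generalization above.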

[FIG. 15-2 The logic of Bayesian decisions: the prior distribution p(θ), the probability of data p(x/θ), and the loss function l(a, θ) combine to give the expected loss L(a); find that a with the smallest expected loss possible.]

1. The probabilities p(θ). These of course should represent the best possible intelligence on the subject. For example, suppose the salesman moves to another state, with weather probabilities as given in Table 15-1. If he has no barometer, he will have to use the (prior) probabilities in this table. But if he can consult the barometer (described in Table 15-2), then of course the posterior probabilities p(θ/x) in Table 15-3 should be used. (See Problem 15-3.)

The logic of Bayesian inference is laid out in the block diagram, Figure 15-2. Incidentally, in the calculation of the average loss L(a) in (15-14), it would not hurt to use kp(θ) instead of p(θ) as weights, where k is any constant (independent of θ and a). For kp(θ) would generate losses kL(a), which would rank in the same order as the true losses L(a), and hence point to the same correct optimizing action. This is a very useful observation. Thus, for our example, the umbrella salesman need not undertake the last step in calculating the posterior probabilities of rain p(θ/x₁) in (15-6) and (15-7); he can forget about the denominator p(x₁), and use (15-2) and (15-4) instead, without affecting his decision.²

2. The loss function, l(a, θ). In our example, we assumed that monetary loss is the appropriate consideration. This may be valid enough if the decision is made ("game is played") over and over again: whatever minimizes the expected loss in each game will minimize total expected loss in the long run.

² i.e., attaching weights of .36 and .12 to his losses would yield the same result as weights of .75 and .25.

Yet there are some decisions that are made only once, and then expected monetary loss may not be the right criterion. To illustrate: suppose you are given a (tax-free) choice between

(a) $100,000 for sure, or
(b) a 1/2 chance (lottery ticket) on a $210,000 prize.

Most people would prefer the first choice (a), even though its expected monetary value is less than that of choice (b):

(a) $100,000(1) = $100,000    (15-15)
(b) $210,000(1/2) = $105,000    (15-16)

The reason is that, for most people, the first hundred thousand is worth more than the second hundred thousand. (The student should speculate on how he would spend the first hundred thousand. Once these purchases have been made, less exciting opportunities would be available for spending the second hundred thousand; the sports car has already been bought, and so on.) Such a decision should not be based on the expected value of money itself as in (15-16), but rather on the expected "utility" of money; that is, the decision should be based on a subjective evaluation of money, U(M). As an illustration, Figure 15-3 shows one author's subjective evaluation of money.³ Using Figure 15-3, the expected utilities of the two choices are:

(a) u₁(1) = u₁
(b) u₂(1/2) = .7u₁, since u₂ = 1.4u₁    (15-17)

which is smaller; hence the preference for choice (a). Since utility is the more appropriate evaluation, in all decision situations of this kind a loss-of-utility function should typically be used as our loss function l(a, θ); hereafter we interpret losses in this way.

[FIG. 15-3 Author's subjective evaluation of money: utility U(M) plotted against dollars, with u₁ at $100,000 and u₂ = 1.4u₁ at $210,000.]

³ This utility is defined for an individual; it is highly personal and temporary. It is defined empirically by the bets he prefers. In other words, many bets like (15-15) are used to define his utility, rather than vice-versa.

PROBLEMS

15-3 Using the losses of Table 15-4, calculate the optimal action if:
(a) The only available probabilities are the prior probabilities of Table 15-1.
(b) The barometer reads "rain" (so that the posterior probabilities of Table 15-3 are relevant).
(c) The barometer reads "shine."
(d) Is the following a true or false summary of questions (a) to (c) above? If false, correct it.

If the salesman must choose his action (order his merchandise) before consulting his barometer, then a₂ (umbrellas and lemonade) is best. However, if the barometer can be consulted first, then the salesman should
Choose a₁ (umbrellas) if the barometer predicts "rain."
Choose a₃ (lemonade) if the barometer predicts "shine."
But a bright salesman could have seen this obvious solution without going to all the trouble of learning about Bayesian decisions.

15-4 A farmer has to decide whether to sell his corn for use A or use B. His losses depend on its water content (determined by the weather during processing, after the decision has been made) according to the following table:

                   State
Action         Wet      Dry
Use A           20       10
Use B          −10       30

(a) If long past experience indicates that his corn has been classified wet one third of the time, what should his decision be? How much is his expected loss?
(b) Suppose he has developed a rough-and-ready means of determining whether his corn is wet or dry, a method which is correct 3/4 of the time, regardless of the state of nature. If this method indicates that his corn is "dry," what should his decision be? How much does this additional information reduce his expected loss, i.e., how much is this method worth?
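The Bayesian step in 15-4(b) does not depend on the loss entries (which are hard to make out in this copy): the prior p(wet) = 1/3 is updated by a reading from a test that is correct 3/4 of the time regardless of the state. A sketch:

```python
prior = {"wet": 1 / 3, "dry": 2 / 3}
# p(reading = "dry" | state): the method is correct 3/4 of the time
p_dry_reading = {"wet": 1 / 4, "dry": 3 / 4}

joint = {s: prior[s] * p_dry_reading[s] for s in prior}
p_reading = sum(joint.values())
posterior = {s: joint[s] / p_reading for s in prior}
print(posterior["wet"])   # 1/7

# The decision then minimizes expected loss under this posterior,
# exactly as in Table 15-6 (loss entries omitted here).
```

A "dry" reading thus cuts the probability of wet corn from 1/3 to 1/7; a "wet" reading would raise it to 3/5 by the same calculation.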

15-5 A school is to be built to serve 125 students, who live along a single road. Let

xᵢ = distance student i lives from origin
a = distance of school from origin

Thus

(xᵢ − a) = distance of student i from school.

(a) Where is the optimum place (mean, median, mode, midrange?) to build the school, in order to:
(1) Minimize the distance that the farthest student has to walk.
(2) Minimize the total walking done, i.e., minimize the sum of the absolute deviations: Σ|xᵢ − a|
*(3) Minimize the sum of the squared deviations: Σ(xᵢ − a)²
(Hint. Calculus is required: differentiate with respect to a, setting the result equal to zero.)
(4) Maximize the number of students who live where the school is built, and so do not have to walk at all.

(b) Does the following accurately reflect your conclusions in question (a) above? If not, correct it.

In (2) we are concerned only about the total walking done; walking is a loss no matter who does it. In (1), on the other hand, only the walking done by the two extreme people is considered a loss; walking done by any others is of no concern whatsoever. (3) is a compromise: we imply that although all walking is some kind of loss, the more a student has walked, the greater his loss in walking one more mile. Thus the person who walks 3 miles (xᵢ − a = 3) contributes 9 to the loss function, whereas the person who walks 1 mile contributes only 1. (4) suggests that we are concerned only with students who need not walk at all.

15-3 ESTIMATION AS A DECISION

In our earlier example the states θ (rain and shine) and actions a were categorical (i.e., nonnumerical). But this was not an essential part of the theory; in this section we consider a numerical example.

Example

Suppose the judge at a beauty contest is asked to guess the height of the first contestant, whom he has never seen. Yet he is not in complete ignorance; suppose he knows that the heights of contestants follow the probability distribution shown in Figure 15-4:

θ (inches):   64   65   66   67   68   69
p(θ):         .1   .1   .2   .2   .3   .1

[FIG. 15-4 Prior distribution of heights θ.]

(i) Suppose, in order to encourage an intelligent guess, the judge is fined $1 if he makes a mistake (no matter how large or small); "a miss is as good as a mile." What should the rational judge guess?
(ii) Suppose the rules become more severe, by fining the judge $x for an error of x inches; the greater his error, the greater his loss. What is his rational guess?
(iii) Suppose the rules are made even more severe, by fining the judge $x² for an error of x inches; the loss now increases even faster with error. What is his rational guess now?

Solution. (i) The most likely (modal) value 68.
(ii) The median value 67.
(iii) The mean value 66.8.

Thus (i), (ii), and (iii) are like (4), (2), and (3) in the schoolhouse Problem 15-5, with the same solutions.

To translate this into the familiar language of decision theory: the girl's height is the state of nature θ, and the guessed height (estimate) is the action a; both are numerical. The fine the judge must pay is the loss function l(a, θ); rather than a formula, the loss function is most conveniently given in a table. Each of the 3 loss functions, along with its corresponding optimal estimator, is shown in Table 15-7.

TABLE 15-7 How the Loss Function l(a, θ) Determines the Optimal Estimator

Loss Function l(a, θ)                                        Optimal Estimator a
0 if a = θ, 1 otherwise ("a miss is as good as a mile")      Mode of p(θ)
|a − θ|                                                      Median
(a − θ)² ("quadratic")                                       Mean

The quadratic loss function (a − θ)² is the important one that is usually used in decision theory. It is graphed in Figure 15-5. It is justified not only by its intuitive attractiveness, but also by its attractive mathematical properties; for example, it is easily differentiated (an important requirement in minimization). On the other hand, (i) obviously cannot be differentiated, nor can (ii), since it is an absolute value function.

We reemphasize that the probability distribution p(θ) used in the decision process ought to reflect the best available information. Thus we may be forced to use the prior distribution p(θ) if we have not yet collected any data, but after data is collected, the posterior distribution p(θ/x) is appropriate.

[FIG. 15-5 The quadratic loss function.]
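Table 15-7 can be verified directly on the contest prior: search the candidate guesses under each fine schedule. A sketch (note the quadratic optimum is the mean 66.8, which need not lie on a whole inch):

```python
p = {64: 0.1, 65: 0.1, 66: 0.2, 67: 0.2, 68: 0.3, 69: 0.1}   # Figure 15-4

def expected_loss(a, loss):
    return sum(loss(a, th) * pr for th, pr in p.items())

def best_guess(loss):
    # restrict the search to the listed heights
    return min(p, key=lambda a: expected_loss(a, loss))

zero_one = lambda a, th: 0 if a == th else 1     # "a miss is as good as a mile"
absolute = lambda a, th: abs(a - th)
quadratic = lambda a, th: (a - th) ** 2

mean = sum(th * pr for th, pr in p.items())
print(best_guess(zero_one), best_guess(absolute), mean)   # modal 68, median 67, mean about 66.8
```

Under quadratic loss, guessing the mean 66.8 beats any whole-inch guess, since the expected squared loss at a is the variance plus (a − mean)².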

PROBLEMS

These are extensions of Problem 15-2.

15-6 Suppose you have to guess the height of the man drawn in Problem 15-2, with only the prior distribution p(θ) of θ known. Find the optimal estimate:

(a) Assuming l(a, θ) = 0 if a = θ, = 1 otherwise.    (15-18)
(b) Assuming l(a, θ) = |a − θ|    (15-19)
(c) Assuming l(a, θ) = (a − θ)²    (15-20)

15-7 Repeat Problem 15-6, after the man's height has been crudely measured as x = 74, as in Problem 15-2, so that the posterior distribution p(θ/x) is relevant.

15-4 ESTIMATION: BAYESIAN VERSUS CLASSICAL

This comparison is best shown with an extended example, illustrated in Figure 15-6; from this we shall draw conclusions later.

(a) Example
Suppose it is essential to estimate the length θ of a beetle accidentally caught in a delicate piece of machinery. A measurement x is possible, using a device which is subject to some error; suppose x is normally distributed about the true value θ, with σ = 1. Suppose x turns out to be 20 mm.

Question (a). What is the classical point estimate and 95% confidence interval for θ?

Solution. Our information on the sampling distribution of x, i.e.,

p(x/θ) = N(θ, σ²)    (15-21)

can be "turned around" to construct the following 95% confidence interval for θ:

θ = x ± 1.96σ    (15-22)
  = 20 ± 1.96(1)    (15-23)

and the point estimate of θ is of course x = 20.

Question (b). Suppose we take the effort to find, from a biologist, that the population of all beetles has a normally-distributed length, with mean θ₀ = 25 mm and variance σ₀² = 4. How can this be used, specifically to construct the posterior distribution of θ?

Solution. It will be useful to develop a general formula for any θ₀, σ₀, σ, etc., and then solve our specific example by applying it. The information from the biologist defines our prior distribution, while (15-21) gives the distribution of our empirical evidence x. Specifically, our prior distribution is

(15-24)

;V(Oo. .o

?(o)

(15-21)

= N(O.

repeated. It can be shown that the posterior distribution p(θ|x) is also normal. Specifically, (15-24) and (15-21) may be written

    p(θ) = K₁ e^[−(1/2σ₀²)(θ−θ₀)²]                        (15-25)

    p(x|θ) = K₂ e^[−(1/2σ²)(x−θ)²]                        (15-26)

where K₁, K₂, and other similar constants introduced in this argument are of a form not critical to the development. Then

    p(x, θ) = p(θ) p(x|θ)                                (15-27)

    = K₁K₂ e^E                                           (15-28)

where the exponent, which may be rearranged, is

    E = −(1/2)[(1/σ₀²)(θ² − 2θθ₀ + θ₀²) + (1/σ²)(x² − 2xθ + θ²)]     (15-29)

Let

    1/a = 1/σ₀² + 1/σ²                                   (15-30)

    b = θ₀/σ₀² + x/σ²                                    (15-31)

Using these definitions, the exponent (15-29) can be written

    −(1/2a)[θ² − 2abθ + K₃]                              (15-32)

    = −(1/2a)[(θ − ab)² + K₅]                            (15-33)

Finally

    p(x, θ) = K₆ e^[−(1/2a)(θ−ab)²]                       (15-34)

and

    p(θ|x) = p(x, θ)/p(x) = K₇ e^[−(1/2a)(θ−ab)²]         (15-35)

This means that θ, given x, is a normal variable with mean ab and variance a, provided a appears appropriately in K₇. But it must, since p(θ|x) is a bona fide probability function (integrating to 1), and K₇ is just the scale factor necessary to ensure this. Thus

    p(θ|x) = N(ab, a)                                    (15-36)
where

    1/a = 1/σ₀² + 1/σ²                                   (15-37)

    b = θ₀/σ₀² + x/σ²                                    (15-38)

Now apply this to our example. Since

    σ₀² = 4,   σ² = 1,   θ₀ = 25,   x = 20

it follows that

    1/a = 1/4 + 1/1 = 5/4,   so that   a = .8

and

    b = 25/4 + 20/1 = 105/4

Thus:

    mean = ab = (.8)(105/4) = 21.0

    variance = a = .8

Hence the posterior distribution may be formally written

    p(θ|x = 20) = N(21, .8)                              (15-39)

compared with the prior:

    p(θ) = N(25, 4)                                      (15-40)
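The normal-normal updating formulas (15-37) and (15-38) are easy to verify numerically. The following short script is a sketch (not part of the original text) that reproduces the posterior N(21, .8) of the beetle example:

```python
# Normal-normal Bayesian update, following (15-36)-(15-38):
# posterior p(theta|x) = N(ab, a), with
#   1/a = 1/sigma0^2 + 1/sigma^2   and   b = theta0/sigma0^2 + x/sigma^2

def posterior(theta0, var0, x, var):
    a = 1.0 / (1.0 / var0 + 1.0 / var)     # posterior variance
    b = theta0 / var0 + x / var
    return a * b, a                         # posterior mean, variance

# The beetle example: prior N(25, 4), measurement x = 20 with sigma^2 = 1
mean, var = posterior(theta0=25, var0=4, x=20, var=1)
print(mean, var)   # 21.0 0.8
```

The posterior mean is the weighted compromise between the prior mean 25 and the observation 20, exactly as described in the next paragraph.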
The Bayesian logic is shown in Figure 15-6. A prior distribution is adjusted to take account of the observed data (x), with the weight attached to the observed x depending on its probability p(x|θ). The result is the posterior distribution, with mean (21) falling, as expected, between the prior mean (25) and the observed value (20). (As a bonus, variance is reduced in the posterior distribution. Although this does not always happen, it is evident that it must happen for normal distributions; for (15-37) shows that the posterior variance a is less than σ₀², and incidentally also less than σ².)

Question (c). With the posterior distribution (15-39) now in hand, defining a Bayesian estimate of θ requires only a loss function. Suppose this is the quadratic loss function; what is the Bayesian point estimator of θ? Find also the 95% probability interval for θ.

[FIG. 15-6  Bayesian versus classical estimation. Top: the prior p(θ) = N(θ₀, σ₀²) = N(25, 4). Middle: the posterior p(θ|x = 20) = N(21, .8), with the Bayesian estimate 21 and the Bayesian 95% probability interval 21 ± 1.96√.8. Bottom: the classical estimate 20 and the classical 95% confidence interval 20 ± 1.96σ, based only on the evidence x and its sampling distribution.]
Solution. For the quadratic loss function, the optimum estimator is the posterior mean (21). (Note that because p(θ|x) is normal, this is also the posterior median and mode, so that all the loss functions in Table 15-7 yield the same answer. This is reassuring, and frequently happens in practice.)

To construct a 95% probability interval, we know from (15-39) that, given the observation x = 20, there is a 95% probability that θ will fall in the interval

    21 ± 1.96√.8

    = 21 ± 1.76

Note that this interval is narrower (more precise) than the classical interval (15-22), reflecting the information of the prior p(θ).

PROBLEMS

15-8 As our means of measuring (beetles) becomes more and more precise (σ² → 0), show that in the posterior distribution p(θ|x),

    mean → x,   variance → 0                             (15-41)

In other words, if we use an errorless measuring device, we can be certain that the true θ will be its measured value x.

⇒15-9 Using Figure 15-6, what would you intuitively expect the posterior mean of the beetle's length to be if, instead, two independent measurements had yielded an average of 20 mm? (For an extension of your answer, see the section immediately following.)

(b) Generalization

Suppose that a sample of n independent measurements x₁, x₂, ... xₙ can be taken, rather than just a single x. Using the sample mean x̄, what now is the Bayesian estimate of θ? In particular, what happens as we get more and more observations (n → ∞)?

This problem may be solved using (15-36) to (15-38), with one important change. Since our data now is x̄ instead of x, we must make this substitution in (15-38); and since

    the variance of a sample mean is σ²/n                (15-42)

when σ² is the variance of a single observation, we must also substitute σ²/n for σ² in (15-37) and (15-38). Thus our generalized definitions are:

    1/a = 1/σ₀² + n/σ²                                   (15-43)

    b = θ₀/σ₀² + nx̄/σ²                                   (15-44)

For n = 1, (15-43) reduces to (15-37) and (15-44) reduces to (15-38). In the limit, as sample size n → ∞:

    a → σ²/n                                             (15-45)

    b → nx̄/σ²                                            (15-46)

Incidentally, these exactly same results follow whether

    n → ∞   or   σ₀ → ∞                                  (15-47)

Thus, evaluating (15-36):

    posterior mean = ab → x̄                              (15-48)

    posterior variance = a → σ²/n                        (15-49)

Again the normality of this posterior distribution ensures that its mean, mode, and median coincide. Hence, regardless of which loss function

we may use:

    Bayesian estimator θ̂ → x̄                             (15-50)

    95% probability interval → x̄ ± 1.96 σ/√n             (15-51)

We conclude that, as more and more data are collected, Bayesian estimation approaches the classical. This is exactly as it should be: as more and more data are collected, less and less weight needs to be attached to the prior information; in the limit, the prior is completely disregarded, as in classical estimation.

The classical and Bayesian approaches are compared in more detail in Table 15-8.

TABLE 15-8  Relation of Classical and Bayesian Estimation (Although Normality Is Assumed, Results Are Instructive for Other Cases Too)

                         Classical                 Bayesian
Requires                 p(x|θ)                    p(x|θ), p(θ), and loss function
Point estimate
 (our example, n = 1)    The observed x: 20        Mean of posterior: 21
Interval estimate        Confidence interval,      Probability interval,
                         20 ± 1.96                 21 ± 1.76
With an unlimited
 sample (n → ∞)          (unchanged in form)       Same as classical

We now turn to the other condition that leads to the same result. Bayesian estimators also approach the classical if the prior information is very vague (i.e., if σ₀ → ∞), as stated in (15-47). Thus the less the prior distribution tells us, the less weight we attach to it. To sum up, the two reasons for completely disregarding prior information are (1) if present data is in unlimited supply, or (2) if prior information is useless.
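The sample-mean update in (15-43) and (15-44) can be sketched in code as well, and exhibits the first limiting result directly: as n grows, the posterior mean moves toward x̄ and the posterior variance shrinks toward σ²/n (illustrative code, not from the text):

```python
# Posterior from a sample of n measurements, following (15-43) and (15-44):
#   1/a = 1/sigma0^2 + n/sigma^2,   b = theta0/sigma0^2 + n*xbar/sigma^2
def posterior_n(theta0, var0, xbar, var, n):
    a = 1.0 / (1.0 / var0 + n / var)      # posterior variance
    b = theta0 / var0 + n * xbar / var
    return a * b, a                        # posterior mean, variance

# Beetle example: prior N(25, 4), measurements with sigma^2 = 1, xbar = 20.
# As n grows the prior is swamped: the mean approaches xbar = 20 and the
# variance approaches sigma^2/n, as in (15-48) and (15-49).
for n in (1, 4, 100):
    mean, var = posterior_n(25, 4, 20, 1, n)
    print(n, round(mean, 3), round(var, 4))
```

Setting a very large σ₀² instead of a large n illustrates the second limiting result, (15-47).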

(c) Is θ Fixed, or a Random Variable?

In this chapter we regard the target to be estimated as a random variable: for example, the beetle's length θ in Figure 15-6. Yet in all preceding chapters, we have regarded the target as a fixed parameter: for example, μ, the average height of American men. Nevertheless, we may often find it useful to think of μ as having a subjective probability distribution, with this being a description of the betting odds we would give that μ is bracketed by any two given values (see the description of subjective probability in Section 3-6). In the problem of men's heights it may be helpful to boil down our best prior knowledge of μ into a prior subjective distribution of μ. Then the posterior subjective distribution of μ would reflect how the sampling data changed the betting odds.

PROBLEMS

15-10 Following the beetle example in Section 15-4(b), suppose that

    θ₀ = 25,   σ² = 1

and a sample of 4 independent observations on the trapped beetle yields an average length x̄ of 20 mm.
(a) Calculate the Bayesian point estimate for θ, the length of the beetle. For two reasons this estimate is closer to the observed value of 20 than the Bayesian estimate (21) in Figure 15-6. Explain.
(b) Calculate the Bayesian 95% probability interval for θ.

⇒15-11 Suppose that, in a random sample of 10 students on an American college campus, you find only one is a Democrat. Which would you rather quote as your "best estimate" of the proportion π of Democrats in the population (whole campus):
(a) The classical estimate

    x/n = 1/10

or
(b) The Bayesian estimate which, assuming a certain subjective prior distribution p(π) and a quadratic loss function, yields⁵

    (x + 3)/(n + 6) = 4/16 = .25

15-5 CRITIQUE OF BAYESIAN METHODS

(a) The Bayesian method is the optimal method of statistical inference (in the sense of minimizing loss) if there is a known prior distribution p(θ) and loss function l(a, θ). Compared to classical methods, Bayesian methods often yield more credible point estimates (e.g., Problem 15-11), more credible interval estimates (e.g., Table 15-8), and more appropriate hypothesis tests (e.g., Problem 15-13 below). Bayesian methods are particularly useful in the social sciences and business, where sample size is often very small, and Bayesian estimates differ considerably from the classical methods.

(b) The prime criticism of Bayesian estimation is that the prior distribution p(θ) and loss function l(a, θ) are usually not known, nor is there often any way at all of specifying them exactly. For example, what is the right loss function for an economist measuring a population's unemployment rate?

With loss functions, the difficulty is not as serious as it seems at first glance, since in many problems any of the loss functions of Table 15-7 leads to the same Bayes estimator. The prior distribution p(θ) raises more serious difficulties. An economist cannot regard the unemployment rate θ as a random variable (as though it were drawn from a bowlful of chips). Instead, he may think of p(θ) as a subjective distribution, reflecting his prior betting odds on θ. But he may not view even this as entirely satisfactory; these Bayesian techniques would still require a rough-and-ready specification of p(θ), and such specifications usually do indeed involve subjective judgments.⁶

The other information required for Bayesian inference is p(x|θ), the distribution of the sample data; this can often be borrowed from classical statistics. [For example, recall how we borrowed the classical deduction in (15-42).]

⁶ See, for example, Lindgren, B. W., Statistical Theory, 2nd Ed. New York: Macmillan.
*(c) Classical Methods as Bayesian Methods in Disguise

One interesting observation, however, is that classical methods, which require no such explicit specifications, are by no means free of Bayesian elements. One of the major contributions of the Bayesian method has been to lay bare the assumptions implicit in classical techniques. As we shall see in the next section, some of these fare badly when exposed; in extreme cases any intelligent guess is substantially better.⁷

Suppose a Bayesian wishes to estimate θ with no prior knowledge. In desperation he might use the "equiprobable" prior:

    p(θ) = c, a constant                                 (15-52)

Suppose further that, rather than using the familiar and attractive quadratic loss function, he opts for the 0-1 loss function. He thus will estimate θ with the mode of the posterior distribution:

    p(θ|x) = p(θ) p(x|θ) / p(x)                          (15-8) repeated

    = [c/p(x)] p(x|θ)                                    (15-54)

To find the mode, he finds the value of θ which makes (15-54) largest. But since the bracketed term [c/p(x)] doesn't depend on θ, he only needs to find:

    the value of θ which makes p(x|θ) largest            (15-55)

But this is recognized as just the definition of the classical MLE.⁸

From this, we conclude that a classical statistician who uses MLE is getting the same result as a Bayesian using the 0-1 loss function and an "equiprobable" prior. This seems a very unflattering description of MLE, since neither this prior nor this loss function is easy to justify. But in many cases, MLE is not nearly this restrictive. If p(θ|x) is unimodal and symmetric, as it often is, then its mean, mode and median coincide; in such circumstances MLE is equivalent to Bayesian estimation using any of our three loss functions.

As if the discussion of MLE above has not been damaging enough, we consider an even more questionable application.

⁷ A further criticism of Bayesian methods is that there is too great a cost of computing Bayesian estimates (not to mention learning about them); but this criticism is being weakened with the advent of better computer programs.

⁸ Note that in developing MLE in Section 7-3, the notation p(x; θ) was used, equivalent to p(x|θ) used here.

Suppose we are estimating a population proportion π (as in Problem 15-11). It has been proved⁹ that a classical statistician using MLE will arrive at the same result (estimating π with x/n) as a Bayesian using the quadratic loss function and the prior distribution p(π) shown in Figure 15-7.

[FIG. 15-7  The prior distribution p(π) implicit in classical MLE.]

We recall that we may have been uncomfortable about the prior distribution graphed in Problem 15-11; but it was vastly better than this, the worst prior we have yet encountered. This one is obviously hopeless. (It means, for example, that either a huge majority of students are Republican, or a huge majority are Democratic.) There is very strong reason to reject it, and this explains why MLE can occasionally give a strange result in a small sample. In conclusion, although MLE has many attractive large-sample characteristics [see Section 7-3(e)], it should be used with great caution in small-sample estimation.

⁹ Again, Lindgren, B. W., Statistical Theory, 2nd Ed., New York: Macmillan.

15-6 HYPOTHESIS TESTING AS A BAYESIAN DECISION

Suppose there are two species of beetle. Species S₀ is harmless, while S₁ is a serious pest, requiring the use of a new, expensive insecticide. A beetle is sighted in a territory as yet uninfested; but this sighting provides no information useful in establishing whether the beetle was S₀ or S₁. Should the insecticide be used or not?

To answer this question, we need to know the costs l(a, θ) of a wrong decision, and the probabilities p(θ) of it being one species or the other; these are given in Table 15-9. Obviously action a₀ (don't spray) is appropriate if the state of nature is S₀ (harmless beetle), while a₁ is appropriate if the state is S₁.

It will be convenient to generalize the loss table, calling each element l(a, θ), or lᵢⱼ for short; thus, for example, l(a₁, θ₁) = l₁₁ = 15.

TABLE 15-9  Loss Function l(a, θ), and Probabilities of the States of Nature

                          S₀                     S₁
                          (Harmless Species)     (Harmful Species)
Action a₀ (don't spray)         5                     100
Action a₁ (spray)              15                      15
p(θ)                           .7                      .3

Question (a). Should we spray, or not?

Solution. As always, we calculate the expected losses L(a), by weighting the elements in each row of this table by their appropriate probabilities:

    L(a₀) = p(θ₀)l₀₀ + p(θ₁)l₀₁                          (15-56)

    = (.7)5 + (.3)100 = 33.5

and

    L(a₁) = (.7)15 + (.3)15 = 15 ← min

Thus we see that the optimal action is a₁: spray.

This problem may be expressed in terms of hypothesis testing: action a₀ (don't spray) may be interpreted as accepting hypothesis H₀ (harmless beetle), while action a₁ (spray) may be interpreted as accepting H₁ (harmful beetle).
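The expected-loss comparison in (15-56) is mechanical enough to sketch in a few lines of code (illustrative only; the action names are invented labels, and the numbers come from Table 15-9):

```python
# Expected loss L(a) = sum over states of p(theta) * l(a, theta), as in (15-56)
def expected_loss(losses, prior):
    return sum(p * l for p, l in zip(prior, losses))

loss = {"dont_spray": [5, 100],   # losses under S0 (harmless), S1 (harmful)
        "spray":      [15, 15]}
prior = [0.7, 0.3]                # p(theta) from Table 15-9

L = {a: expected_loss(l, prior) for a, l in loss.items()}
best = min(L, key=L.get)
print(L, best)   # {'dont_spray': 33.5, 'spray': 15.0} spray
```

Replacing `prior` with [0.9, 0.1] reproduces question (b) below, where the decision reverses.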

Question (b). Suppose that prior information about p(θ) shows the harmless species S₀ to be 9 times as common as S₁. Given this new information, what is the optimum action?

Solution. Don't spray, as shown in Table 15-10. In this case the harmful beetle is so rare that it is better to "take the risk," i.e., assume the beetle is harmless as our working hypothesis.

TABLE 15-10  Calculation of the Optimal Action, Given the New Prior

                    S₀      S₁      L(a)
a priori p(θ)       .9      .1
a₀ (Don't spray)     5     100      14.5 ← min
a₁ (Spray)          15      15      15

Question (c). So far we have assumed no statistical information about the beetle that has been sighted. Now suppose that it has been captured, and its length measured as x = 27 mm. Suppose further that the two species are distinguishable by their lengths, which are normal random variables with means θ₀ = 25 and θ₁ = 30 respectively, and with a common σ = 4. What now is the best action, a posteriori? [Assume the p(θ) and losses given in Table 15-9.]

Solution. It will be most instructive to develop a general solution, leaving substitution of particulars to the end. Losses are calculated as in (15-56), substituting the appropriate posterior probabilities p(θ|x) for p(θ):

    L(a₀) = p(θ₀|x)l₀₀ + p(θ₁|x)l₀₁                      (15-57)

Similarly,

    L(a₁) = p(θ₀|x)l₁₀ + p(θ₁|x)l₁₁                      (15-58)

We choose action a₀ if and only if

    L(a₀) < L(a₁)                                        (15-59)

Substituting (15-57) and (15-58) into (15-59), and collecting like terms, we obtain the criterion: choose a₀ iff

    p(θ₁|x)[l₀₁ − l₁₁] < p(θ₀|x)[l₁₀ − l₀₀]              (15-60)

The bracketed quantities

    r₀ = l₁₀ − l₀₀   and   r₁ = l₀₁ − l₁₁                (15-61)

are called regrets. It is easy to see why: r₀ is the extra loss incurred if the beetle is harmless but we used the wrong action, i.e., sprayed (a₁) rather than not sprayed (a₀). Evaluating (15-61) from the column elements in Table 15-9, we see that r₀ = 15 − 5 = 10. Our much larger regret

    r₁ = 100 − 15 = 85

represents our net loss if we employ the wrong action (don't spray) on a beetle that turns out to be harmful.

Returning to (15-60), it may now be written in terms of regrets:

    p(θ₁|x) r₁ < p(θ₀|x) r₀                              (15-62)

Recall that the posterior probabilities in this equation can be expressed using (15-8) as p(θᵢ|x) = p(θᵢ) p(x|θᵢ)/p(x); and noting that p(x) cancels, (15-62) becomes

    p(θ₁) p(x|θ₁) r₁ < p(θ₀) p(x|θ₀) r₀                  (15-65)

An appropriate cross-multiplication of (15-65) leads us to an important theorem, called the full

Bayesian Likelihood-Ratio Criterion:

    Accept H₀ iff   p(x|θ₁)/p(x|θ₀) < r₀ p(θ₀) / [r₁ p(θ₁)]     (15-66)

The left-hand side of (15-66) is called the "likelihood ratio." [Specifically, p(x|θᵢ) is the likelihood function of the parameter θᵢ given the data x, and is the distribution of the observed data if θᵢ is true; it appeared in maximum likelihood estimation in Section 7-3. p(θᵢ) is the prior probability of θᵢ, and rᵢ the regret (penalty for error). The prior distribution and the likelihood function are often borrowed directly from classical deduction.]

This criterion is certainly reasonable. H₀ is accepted if p(x|θ₁) is sufficiently small [i.e., sufficiently less than p(x|θ₀)] to satisfy the inequality; thus if the data is very implausible under θ₁, H₀ is accepted, as it should be.

To illustrate, consider the very simple case in which the regrets are assumed equal, and the prior probabilities p(θ₀) and p(θ₁) are also assumed equal. The right-hand side of (15-66) becomes 1; H₀ is accepted if the likelihood of θ₀ generating the sample [p(x|θ₀)] is greater than the likelihood of θ₁ generating the sample [p(x|θ₁)]. Otherwise, the alternative H₁ is accepted. In simplest terms: we select the hypothesis which is more likely to generate the observed x. In this sense, hypothesis testing, viewed within this context, could be
called a maximum likelihood technique, as illustrated in Figure 15-8a. In b we make the further assumption that the two likelihood functions (centered on θ₀ and θ₁, with θ₀ < θ₁) are normal distributions with a common σ. Then, in this special case, the criterion (15-66) reduces to:

    Accept H₀ iff the observed x is closer to θ₀ than to θ₁     (15-67)

Again, an obviously reasonable result.¹⁰

[FIG. 15-8  Bayesian hypothesis testing, using the likelihood ratio (special case when r₀ = r₁ and p(θ₀) = p(θ₁)). (a) For any two distributions p(x|θᵢ): accept H₀ if x is observed in the range where p(x|θ₀) exceeds p(x|θ₁); accept H₁ otherwise. (b) If p(x|θᵢ) = N(θᵢ, σ²): the critical value of x falls midway between θ₀ and θ₁.]

¹⁰ In fact, symmetry is not required; in case (a) the two distributions need only be unimodal.

If the regrets and priors are not assumed equal, the criterion (15-66) for accepting H₀ becomes

    e^[−(1/2σ²)(x−θ₁)²] / e^[−(1/2σ²)(x−θ₀)²] < r₀ p(θ₀) / [r₁ p(θ₁)]     (15-68)

This may be reduced¹¹ to: accept H₀ iff

    x < [σ²/(θ₁ − θ₀)] log{r₀ p(θ₀) / [r₁ p(θ₁)]} + (θ₁ + θ₀)/2     (15-69)

(The logarithms used throughout this section are natural logarithms, to the base e. The common logarithms of Appendix Table VIII can be converted to natural logarithms by multiplying by 2.30.) We note that the right-hand side of (15-69) is independent of x; as in all hypothesis tests, this critical value can be evaluated prior to observing x. At the same time it does depend, as expected, on the background information p(θ) and the regrets. Moreover, when r₀ = r₁ and p(θ₀) = p(θ₁), the log term disappears and (15-69) reduces to the special case (15-67).

Finally, the particular beetle problem can now be solved. Substituting the information given in question (c) and Table 15-9 into (15-69) yields: accept H₀ iff

    x < (16/5) log[10(.7)/(85(.3))] + 27.5               (15-76)

    = 3.2(−1.29) + 27.5                                  (15-77)

    = 23.4                                               (15-78)

¹¹ Details. Taking logarithms of (15-68):

    −(1/2σ²)(x − θ₁)² + (1/2σ²)(x − θ₀)² < K             (15-70)

where

    K = log{r₀ p(θ₀) / [r₁ p(θ₁)]}                       (15-71)

Rearranging (15-70):

    (1/2σ²)[2θ₁x − 2θ₀x − (θ₁² − θ₀²)] < K               (15-72)

i.e., accept H₀ iff:

    2(θ₁ − θ₀)x − (θ₁² − θ₀²) < 2σ²K                     (15-73)

    x < σ²K/(θ₁ − θ₀) + (θ₁ + θ₀)/2                      (15-74)

Using the definition of K in (15-71), (15-74) may be written as (15-69).
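The critical value computed in (15-76) to (15-78) can be checked directly; this sketch (not from the text) simply evaluates the right-hand side of (15-69):

```python
import math

# Critical value for accepting H0, per (15-69):
#   x < sigma^2/(theta1 - theta0) * ln(r0*p0 / (r1*p1)) + (theta1 + theta0)/2
def critical_value(theta0, theta1, sigma2, r0, r1, p0, p1):
    return (sigma2 / (theta1 - theta0)) * math.log(r0 * p0 / (r1 * p1)) \
           + (theta1 + theta0) / 2

c = critical_value(theta0=25, theta1=30, sigma2=16, r0=10, r1=85, p0=0.7, p1=0.3)
print(round(c, 1))   # 23.4
# The observed 27 mm beetle exceeds this critical value, so H0 is rejected.
```

Setting r0 = r1 and p0 = p1 makes the log term vanish, reproducing the midpoint rule (15-67).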

Since a 27 mm beetle was observed, this condition is violated, and we reject H₀: spray. [From (15-78) we can confirm an answer that at first does seem strange: even if the beetle measured 25 mm, exactly the mean length we would expect of a harmless beetle, the condition would still be violated, and we would still spray. With further thought, this turns out to be reasonable. The heavy damage involved if the beetle is harmful induces us to spray to avoid this risk; it is the relative size of the two regrets that explains this result, and that governs how the critical value is chosen.]

The hypothesis testing described here involves only two states of nature (two competing hypotheses, θ₀ and θ₁). Thus it is limited in scope, since hypothesis testing with a composite hypothesis is not covered; we have covered only the material of the first section of Chapter 9.

In recalling that material, we see that classical hypothesis testing had the advantage of being far simpler; but it was also less satisfying. It used only the probability function p(x|θ), while the Bayesian method also exploits the prior distribution p(θ) and the regrets; we have seen in the last section how important both these are in setting up an appropriate test. Restated, the classical method sets the critical value (for example, by setting α = 5%) sometimes arbitrarily, sometimes with implicit reference to vague considerations of loss and prior belief. Bayesians would argue that these considerations should be explicitly introduced, with all the assumptions exposed, and open to criticism and improvement.
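As a cross-check on question (c), one can also work directly with posterior probabilities, as in (15-57) and (15-58): compute p(θᵢ|x) by Bayes' rule with normal likelihoods, then compare expected losses (an illustrative sketch; the dictionary keys are invented labels):

```python
import math

# Normal likelihood p(x|theta_i) = N(theta_i, sigma^2)
def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

x, var = 27, 16                      # observed length; sigma^2 = 4^2
prior = {"S0": 0.7, "S1": 0.3}       # from Table 15-9
mean = {"S0": 25, "S1": 30}
loss = {"dont_spray": {"S0": 5, "S1": 100},
        "spray":      {"S0": 15, "S1": 15}}

# Bayes' rule: p(theta_i|x) proportional to p(theta_i) * p(x|theta_i)
joint = {s: prior[s] * normal_pdf(x, mean[s], var) for s in prior}
post = {s: joint[s] / sum(joint.values()) for s in joint}

# Expected losses (15-57) and (15-58), using the posterior probabilities
L = {a: sum(post[s] * l[s] for s in post) for a, l in loss.items()}
print(min(L, key=L.get))   # spray
```

The decision agrees with the likelihood-ratio route: 27 mm exceeds the critical value 23.4, so spraying minimizes expected loss.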

PROBLEMS

15-12 Repeat the hypothesis test of question (c) above, with the prior and losses given in Table 15-10.
(a) With your measurement of 27 mm, what would you do? Why does our argument in the preceding paragraph (spray even if the beetle is 25 mm) no longer hold?
(b) Suppose that species S₀ and S₁ were equally frequent. Would that change your decision?
(c) How frequent would species S₀ have to be in order to alter your decision?

15-13 Suppose a psychiatrist has to classify people as sick or well (hospitalized or not) on the basis of a psychological test. The test scores are normally distributed, with σ = 8, and mean θ₀ = 100 if they are well or θ₁ = 120 if they are sick. The losses (regrets) of a wrong classification are obvious: if a healthy person is hospitalized, resources are wasted and the person himself may even be hurt by the treatment. Yet the other loss is even worse: if a sick person is not hospitalized, he may do damage, conceivably fatal. Suppose this second loss is considered roughly five times as serious. From past records it has been found that of the people taking the test, 60% are sick and 40% are healthy.
(a) (1) What should be the critical score, above which the person is classified as sick? Then (2) what is α? (Probability of type I error.) (3) What is β? (Probability of type II error.)
(b) (1) If a classical test is used, arbitrarily setting α = 5%, what will be the critical score? Then (2) what is β? (3) By how much has the average loss been increased by using this less-than-optimal method?
(c) What would we have to assume the ratio of the two regrets to be, in order for it to be reasonable to arrive at a classical test having α = 5%? Do you think it is?

*15-7 GAME THEORY

At this point we leave statistical decision theory to consider a rather interesting branch of the general theory of decisions: the theory of games. Recall that the concept of probability was developed in Chapter 3 as a groundwork for the statistical deduction and induction that followed. Game theory is not part of this statistical theory; rather, it illustrates a quite different application of the concept of probability.

Game theory is a way of analyzing conflict situations. These may arise, for example, in poker, business, or politics; thus the conflicting parties might be card players playing for insignificant stakes, oligopolists in business, or military leaders engaged in a desperate set of moves and countermoves.

(a) Strictly Determined Games

The players employ strategies. Because a player can choose his strategy, he has some control over the outcome of the game. But this control is not complete; the outcome will also depend on the strategy of his opponent. The way the outcome of the game is related to the strategies of both players is shown in Table 15-11; this is called the "payoff matrix," and defines

the game.

TABLE 15-11  An Example of a Payoff Matrix (Loss Function for B)

[The 3 × 3 table of payoffs from B to A, over B's strategies 1, 2, 3 (columns) and A's strategies 1, 2, 3 (rows), is only partly legible here. The payoffs used in the discussion below: A's strategy 1 against B's strategy 1 pays 25, but against B's strategy 2 only 6; A's strategy 3 against B's strategy 3 pays only 5; A's strategy 2 pays 20, 10, and 18 against B's strategies 1, 2, and 3 respectively.]

Thus, if A selects strategy 2 and B strategy 1, the payoff is 20 (where B pays A). Alternatively, there might also be a payoff matrix for B, similarly dependent on the strategies selected by the two players. However, to keep the discussion simple, we assume that this is a "zero-sum" game, i.e., what A gains, B loses. Thus Table 15-11 defines not only the gain matrix for A, but also the loss matrix (cost function) for B. Obviously, A should be trying to make the outcome as large as possible, while B will keep it as small as possible.

B will normally have no interest in playing this game, where he can do nothing but lose. So we might think of a payoff matrix with some negative elements, where A pays B; alternatively, A might commit himself to a side payment, bribing B $12 for each time he plays, in order to induce B to play. (This is the assumption we now make, in order to keep the all-positive payoff matrix, with its easier geometric interpretation.) The question then is: "With this $12 side payment, is it in B's interest to play this game?"

If a player can select his strategy after he knows how his opponent is committed, his strategy is obvious, and the game becomes a trivial one. For example, if it is known that B has chosen strategy 3, A will just scan column 3, select the largest payoff (18), and then play his appropriate strategy 2. The essence of game theory, however, is that each player must commit himself without knowledge of his opponent's decision; he knows only the payoff matrix. We further assume that the game is repeated many times, so that the only clues a player has about his opponent's strategy must come from his past pattern of play.

In this circumstance A finds the continuous play of strategy 1 unattractive. It is true that this row has the largest possible payoff ($25). But this requires B's cooperation in playing his strategy 1, and it is clearly not in B's interest to cooperate. Indeed, if B observes that A is continuously playing strategy 1, he will select strategy 2, thus keeping the payoff down to 6. A finds strategy 3 similarly unattractive; B will counter with strategy 3, reducing the payoff to only 5. A chooses strategy 2; the very best play by B will still yield a payoff of 10. Now review why A chose strategy 2: he calculated the minimum value in each row and then selected the largest of these minimum values. This maximum of the row minima is called the "maximin."
Now consider the problem from B's point of view. Recall that he wants to keep the payoff as low as possible. Strategy 1 is ill-advised; when A observes him playing this, he may counter with strategy 1, leaving B with a loss of $25. Strategy 3 is also rejected; it may cost him $18. He selects strategy 2; the most it can cost him is $10. Note that B calculates the maximum value of each column, and then selects the smallest of these. This minimum of the column maxima is called the "minimax."

Note that in this game minimax occurs at the same point as maximin, with a payoff of 10. A will play his strategy 2, and B will play his strategy 2; this is called a "strictly determined" game, because minimax and maximin coincide.

This is illustrated in Figure 15-9, a diagram of the payoff matrix in Table 15-11, with each payoff measured vertically over A's strategies and B's strategies. At the point where minimax and maximin coincide, we note a special feature of the payoff matrix: a "saddle point," an element that is both the largest in its column and the smallest in its row.

[FIG. 15-9  Compare the payoff surface to a saddle.]

Summary: when a saddle point exists, the game is strictly determined; maximin and minimax coincide, and it is best for both players to play the saddle-point strategy. The saddle point is the payoff that is both the smallest element in its row and the largest element in its column.

343 GAME THEORY

In this game the payoff (from B to A) is always 10, so that B clearly will not play such a game unless he is, say, bribed $12 to do so.
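The maximin/minimax test above is easy to mechanize. The matrix in the sketch below is hypothetical (it is not the book's Table 15-11), chosen only so that, as in the text, maximin = minimax = 10 and the game is strictly determined:

```python
# Hypothetical 3 x 3 zero-sum payoff matrix (payoffs from B to A);
# rows are A's strategies, columns are B's.
payoff = [
    [20,  5, 18],
    [25, 10, 11],
    [12,  3,  7],
]

def maximin(m):
    """A's conservative choice: the row whose minimum payoff is largest."""
    mins = [min(row) for row in m]
    row = max(range(len(m)), key=lambda i: mins[i])
    return row, mins[row]

def minimax(m):
    """B's conservative choice: the column whose maximum loss is smallest."""
    cols = list(zip(*m))
    maxes = [max(c) for c in cols]
    col = min(range(len(cols)), key=lambda j: maxes[j])
    return col, maxes[col]

a_row, lower = maximin(payoff)
b_col, upper = minimax(payoff)
print(a_row + 1, b_col + 1, lower, upper)   # 2 2 10 10
print(lower == upper)                       # True: saddle point exists
```

Because the two values coincide here, each player can announce his strategy in advance without loss; when they differ, a mixed strategy is called for, as the next section shows.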

PROBLEMS

15-14 What is the appropriate strategy for each player in each of the following games? In each case decide which player the game favors.

[Payoff matrices (a) and (b); the surviving entries include -1, -2, -3, 1, 2, 3, 10, -20, -4, and -6.]

Mixed Strategies

Let us now try to apply the theory of part (a) to the following game:

TABLE 15-12

              B
          1    2    3
A   1     3    6    2
    2     5    4    8

A would select his strategy 2; this is the row with the largest minimum value (maximin = 4). At the same time B would select his strategy 1; this is the column with the smallest maximum value (minimax = 5). But now problems arise; because minimax and maximin do not coincide, there is no saddle point. Such a game is not strictly determined, with each playing only one strategy; it is easy to see why. B begins by playing column 1, while A plays row 2; the payoff is 5. Now B observes that, as long as A is playing row 2, he can do better by playing column 2, thus reducing the payoff to 4. But when B switches to column 2, it is now in A's interest to switch to row 1, raising the payoff to 6. As an exercise the student should confirm that a whole series of such moves and countermoves are set into play, eventually drawing the players in a circle around to the initial position. Then a new cycle begins. This will continue until the players recognize a fundamental point. Once a player allows his strategy to be predicted, he will be hurt. Thus, for example, when A's strategy becomes clear, he can be hurt by B. What is his defense?

A's best plan is to keep B guessing. Thus if A determines his strategy by a chance process, B will be unable to predict what he will do. For example, A might toss a coin, playing row 1 if heads, or row 2 if tails. He is using a "mixed strategy," weighting each row with a probability of .50. Now B doesn't know what to expect; the only question left for A is whether this 50/50 mix is the best set of odds to use.

The best mix of strategies for A is determined in Figure 15-10. Along the horizontal axis we consider various probabilities that A may attach to playing row 1. This is all A has to select; once this probability p is determined, his strategy is fully specified.

[FIG. 15-10. Determination of A's best mixed strategy (for the game of Table 15-12); A's probability p of playing row 1 is plotted horizontally, and the expected value V of the game vertically.]

On the vertical axis we plot the expected value of the game which, of course, depends not only on the probability p that A may select, but also on what B may do. If B plays only column 1, then the expected value of the game is a function of whatever probability p A may select; this appears in the diagram as the line V1, worth examining in detail. At the extreme left, if A sets p = 0 (i.e., never plays row 1, but always plays row 2), the value of the game V1 is 5. On the other hand, at the other extreme, if A sets p = 1 (and always plays row 1), then V1 = 3. Or if A sets p = 1/2, then

V1 = 3(1/2) + 5(1/2) = 4    (15-79)

Generally, for any probability p A may select:

V1 = 3p + 5(1 - p) = 5 - 2p    (15-80)

which confirms that V1 is a straight-line function of p. Similarly, if B plays only column 2, then

V2 = 6p + 4(1 - p) = 4 + 2p    (15-81)

Or, if B plays only column 3, then

V3 = 2p + 8(1 - p) = 8 - 6p    (15-82)

These last two equations are also graphed in Figure 15-10.

The game is now laid out for A to analyze, with his problem being to select p. If he selects p = 1/8, his opponent will counter by always playing 2, and keep the expected value of the game at 4¼. [This is shown geometrically, and confirmed by evaluating (15-81), setting p = 1/8.] Or if A selects p = 1/2, B will counter with 1, thus keeping the expected value of the game down to 4. Since A is dealing with an opponent who will be selecting strategies to keep V low, the expected value of the game from A's point of view is shown as the hatched line in Figure 15-10. The best A can do, therefore, is to select p = 1/4. This guarantees V = 4½; moreover, note that this is the intersection of V1 and V2. Thus this is the value of the game regardless of whether B plays 1 or 2. This geometric solution may be read from Figure 15-10, or determined algebraically by setting V1 = V2; using (15-80) and (15-81):

5 - 2p = 4 + 2p    (15-83)

p = 1/4    (15-84)

Finally, this value of p is substituted back into (15-81) for the value of the game:

V2 = 4 + 2(1/4) = 4½    (15-85)

Thus A decides to attach a probability of 1/4 to playing row 1. How does he put this into practice? There are several possibilities; for example, he might toss 2 coins. If they both come up heads (probability 1/4), then he plays row 1; if not, he plays row 2. If this game is repeated many times, A will insure that he receives an average payoff which will tend towards 4½, and there is nothing B can do to reduce this. All B can hope for is that A has bad luck (e.g., by the luck of the toss, A plays row 1 when B is playing column 1). This sort of bad luck can reduce A's average winnings below 4½ if the game is played only a few times (or A's good luck can raise his average winnings above 4½); but as the game is played over and over, the element of bad luck tends to fade out.
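The algebra of (15-83) through (15-85) can be checked in a few lines. The payoff rows below are reconstructed from equations (15-79) through (15-82); exact fractions keep the arithmetic transparent:

```python
from fractions import Fraction

# Table 15-12, reconstructed from equations (15-79)-(15-82):
# rows are A's strategies, columns are B's.
row1 = [3, 6, 2]
row2 = [5, 4, 8]

def V(p, col):
    """Expected payoff when A plays row 1 with probability p, row 2 otherwise."""
    return p * row1[col] + (1 - p) * row2[col]

# B picks whichever column makes the payoff smallest, so A's guarantee at a
# given p is the lower envelope over the columns (the hatched line of
# Figure 15-10).  Equating V1 and V2, i.e., 5 - 2p = 4 + 2p, gives p = 1/4.
p = Fraction(1, 4)                              # (15-84)
print([float(V(p, col)) for col in range(3)])   # [4.5, 4.5, 6.5]
print(min(V(p, col) for col in range(3)))       # 9/2, i.e. V = 4 1/2
```

At p = 1/4 the two binding columns both yield 4½, so B gains nothing by switching, exactly as (15-85) asserts.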

PROBLEMS

=> 15-15 (a) Let's play a variation on matching coins. Each of us will choose heads or tails, independently and secretly. I'll pay you $30 if I show tails and you show heads. I'll pay you $10 if I show heads and you show tails. Finally, to make it fair, you pay me $20 if we match (i.e., both show heads, or both show tails). Do you want to play? Why?

(b) What are the optimal strategies of the two players in an ordinary game of matching pennies? (Recall that in this game, one player gets the pennies if they match, the other gets them if they don't match.)

15-16 You find yourself on a long sea voyage, and a companion wants to play a game with you. Rather than matching pennies, he suggests that you match a penny against his cards: you secretly show a head or a tail, while he secretly selects one of the four aces.

(a) If he's chosen the spade, he pays you $-10; if the heart, diamond, or club, he pays you $15, $4, and $1 respectively when you show a head, and $-2, $1, and $-5, again depending on which ace he's chosen, when you show a tail. Do you agree to play? Why?

(b) Suppose you played this game five times, and found that you had won $5. Would you still toss your penny, or would you select a head or a tail depending on which ace you expect him to choose? What do you conclude?

(c) Are there any two lines in your diagram that do not intersect? From this and the payoff matrix, show that it is always preferable for him to select the club instead of the heart ace, no matter which you select (i.e., the heart strategy is "dominated by" the club strategy). By initially examining the payoff matrix, he could therefore have dropped the heart strategy from all further consideration.

In solving a game, then, the first step is to test whether maximin and minimax coincide. If they do, the game is strictly determined, and the single best strategy for each player is determined. If maximin and minimax do not coincide, the game is not strictly determined, and mixed strategies are called for; these are determined in simple cases geometrically or algebraically, as we have illustrated, while more complex cases require more advanced mathematical techniques extending this mechanical solution. But rather than pursue these, it is more important to consider the philosophy and fundamental assumptions underlying game theory:

1. A player using his best mixed strategy can guarantee a certain expected value for the game, regardless of what his opponent may do. However, this is only the value towards which the average of many games will tend. If the game is only played a few times, luck may raise or lower this payoff.

2. The optimal mixed strategy is determined by a random process (tossing a coin). It is simply not good enough to decide to play each strategy half the time (e.g., p = 1/2) by alternating: 1, then 2, then 1, then 2, and so on. Once the opponent observes this pattern, he can predict your next play and hurt you. Note, in the simplest game of matching pennies (Problem 15-15b), how badly a player would be hurt if he interchanged heads and tails rather than tossed the coin: once an intelligent opponent observed this pattern, he could win every time. Each player must be unpredictable, by deciding his play by chance.

3. The theory of games is a very conservative strategy. It is appropriate if a game is being replayed many times against an intelligent strategist, who is out to get you, knows the payoff matrix, and can observe your strategy mix. If these conditions are not met, chances are you can find a better strategy than game play. To illustrate, consider an extreme example. Your payoff is:

[2 x 2 payoff matrix, You versus Opponent; the entries include $4,000 and $4.]

Since maximin and minimax coincide, both you and your opponent should play strategy 2 every time. But on the first play, your opponent plays strategy 1! This means either that he is a fool, or unaware of the payoff matrix (and the $4,000 debacle that he faces). It doesn't matter which; in these circumstances you drop game strategy, play row 1, and punish your opponent for his stupidity or ignorance.
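A small simulation illustrates point 2: as long as A's plays come from a genuine chance process with p = 1/4 (the mix found for the game of Table 15-12), his long-run average is about 4½ or better no matter which single column B settles on. The payoffs below are those implied by (15-80) through (15-82); the seed and trial count are arbitrary:

```python
import random

# Table 15-12: rows are A's strategies, columns are B's.
payoff = [[3, 6, 2], [5, 4, 8]]
rng = random.Random(1)               # fixed seed, so the run is repeatable

def average_payoff(b_col, n=200_000):
    total = 0
    for _ in range(n):
        # two coin tosses: both heads (probability 1/4) -> play row 1
        row = 0 if (rng.random() < 0.5 and rng.random() < 0.5) else 1
        total += payoff[row][b_col]
    return total / n

# Whichever single column B settles on, A averages about 4.5 or better:
for col in range(3):
    print(col + 1, round(average_payoff(col), 2))
```

By contrast, a player who merely alternated rows in a fixed pattern could be read by his opponent and held below this value, which is the point of deciding each play by chance.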
Game theory also should not be used in games against nature. As an example, suppose you are trying to decide whether to hold a picnic indoors or outdoors. Your profit depends on both where the picnic is held, and the weather:

                       Nature
                    Rain    Not Rain
You  1 Indoors       100          0
     2 Outdoors        0      1,000

You can easily confirm that game theory means selecting p = 10/11 (i.e., holding the picnic indoors with probability 10/11), with an expected profit of just over $90. Clearly something has gone wrong. An intuitive glance at the payoff matrix confirms that you should go outdoors, providing there is a reasonable chance that it won't rain. Game strategy is inappropriate because it rests on the false premise that nature is a hostile opponent, determining the weather with the sole objective of ruining your picnic (i.e., minimizing V); this premise is dead wrong in this case. Instead, the weather odds are determined independently; let us suppose the probability is 4/5 that it will not rain. With these odds, you should simply hold the picnic outdoors, with an expected profit of

(1/5)(0) + (4/5)(1,000) = 800    (15-86)

In conclusion, if a hostile, intelligent, and informed opponent determines the conditions you face, then game theory is appropriate. But if these conditions are determined independently (e.g., rain versus shine), game theory may be too conservative a line of play, and the simpler course is the Bayesian solution outlined in Section 15-2: the prior distribution p(θ) is determined, and the strategy with the highest expected value is selected. The student will immediately see that one of the key game-theory assumptions (a hostile opponent) does not hold in the picnic example, and this is why the simpler Bayesian solution (15-86), rather than the more complicated game solution, is required.
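The picnic comparison can be reproduced directly. The zero entries in the matrix below are inferred from the stated results (the 10/11 mix, the value just over $90, and the $800 of (15-86)):

```python
from fractions import Fraction

# Reconstructed picnic payoffs; rows are your acts, columns are the weather.
profit = {"indoors":  {"rain": 100, "not_rain": 0},
          "outdoors": {"rain": 0,   "not_rain": 1000}}

# Game-theory solution: treat nature as an adversary and equate the two
# weather "columns":  100p = 1000(1 - p)  =>  p = 10/11 indoors.
p = Fraction(10, 11)
v = p * profit["indoors"]["rain"]     # guaranteed value: 1000/11, just over $90
print(float(v))

# Bayesian solution, as in (15-86): apply the prior P(not rain) = 4/5
# to each pure act and pick the larger expected profit.
prior = {"rain": Fraction(1, 5), "not_rain": Fraction(4, 5)}
ev = {act: sum(prior[w] * profit[act][w] for w in prior) for act in profit}
print(ev["outdoors"], ev["indoors"])  # 800 20 -> simply go outdoors
```

The contrast ($800 outdoors versus roughly $90 from the adversarial mix) is exactly the text's point: against an indifferent nature, the Bayesian expected-value rule dominates the game-theory solution.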


Appendix

LIST OF TABLES

I. N² and √N
II. (a) Random Digits
    (b) Random Normal Numbers, Standardized
III. (a) Binomial Coefficients
     (b) Binomial Probabilities, Individual
     (c) Binomial Probabilities, Cumulative
IV. Z, Standard Normal Distribution
V. Student's t Critical Points
VI. Modified χ² (C²) Critical Points
VII. F Critical Points
VIII. Common Logarithms

Citations for Tables

TABLE I

Squares and Square Roots: N, N², √N, and √10N

[Numerical entries for N = 1.00 through 7.82 in this excerpt; columns give N², √N, and √10N for each N.]

2.79645 8,84508

7.84

61.4656

2.80000 8.85458

7.86

61.7796

2.80357 8.86566

7.85 61,50892.79821
7.85 61,62252.80179

7.87

7.88
7.89

8.85742

8.84875
8.86002

61.9369

2.80555

8.87150

62.2521

2.80891

S.88257

62.0944 2.807158.87694

7.40

2.72029

8.60255

7.41

2.72215

8.60814

7.9I

62.568I

2.81247

8.61974

7.92

62.7264

2.81425

8.89582
8.89944

7.94

65.0456

2.81780

7.95

8.91067

8.91628
65.5616 2.821558.92188
65.5209
2.82512
8.92749

7.42

7.43

2.72597
2.72580

7.44
7.45
7.46

2.73150

7.47
7.48
7.49

7.50
N

2.72764
2.72947

8.61594

8.62554
8.63134

8.63713

2.73313

8.64292

2:73679

8.65448

2.75861

8.66025

2.73496 8.64870

7.90

7.95
7.96
7.97

7.98
7.99

62.4100 2.81069 8.88819

62.8849 2.816058.90505
65.2025

65.6804

65.8401

2.81957

2.82489

2.82666

8.00 64.00002.82845

8.95308

8.95868

8.94427

TABLE I

2.82845

(Continued)

8.94427

8.50 72.2500 2.915489.21954_

8.00

64.0000

8.0I

64.I601 2.85019 8.94986

8.5I

72.4201

2.91719 9.22497

8.02

64.3204

8.05

64.4809

8.55

72.7609

2.92062

8.54

72.9516 2.92255 9.24121

8.55

75.1025

8.56

75.2756

2.92404
2.92575

9.24662
9 25205

8.57
8.58

75.4449

2.92746

9.25745

75.7881

2.95087

9.26825

2.85196
2.85575

8.52 72.5904 2.918909.25038

8.95545

8.96105

8.04 64.6416 2.85549 8.96660


8.05
64.8025
2.85725
8.97218
8.06 64.9656 2.859018.97775
65 1249
65 2864
65.4481

8.07

8.08
8.09

-65.6100

-8.10

8.11

8.12

2.84077
2.84\177o5

8.98352
8.98888

2.84429

8.99444

2.84605

8.59

9.00000

2.84956

8.15

9.01110

66.0969 2.85152 9.01665

8.14

66.2596

2.85507

9.02219

8.15

'-8.61 74.152I 2.95428

66.4225 2.85482 9.02774


66.5856

8.16

2.85657

66.7489

2.85852

9.05881

8.19

67.0761

2.86182

9.04986

8.18

67.4041

2.86551

9.06091

8.22
8.25

67.5684
67,7529

2.86705
2.86880

9.06642
9.07195

8.24
8.25
8.26

67.8976
68.0625
68.2276

2.87054
2.87228
2.87402

9.07744
9.08295
),08845

8.27
8.28
8.29

68.5929
68.5584
68.7241

-8.30

2.87576
2.87750
2.87924

69.0561

69.2224 2.88444 9.12140


69.3889

2,88271

9.11592

2.88617

).12688

9.14877

8.59

70.5921

2.89655

9.15969

70.5600 2.898289.16515

-8.41

8.42

70.7281
70.8964

8,44

71,2336

2.90517

8\17745

71.4025

2.90689

8,45

\177.90000

2.90172

9.54545

8.74

76.5876

2.95655

9.34880

8.76

76.7576

2.95975

9.55949

8.77

76.9129

2.96142 9.56485

77.0884

2.96511

9.37017

77.4400

2.96648

9.58085

77.6161

2.96816

8.78
8.79

\177.80

8.81

8.82
8.85

V'\177

76.5625 2.958049.55414

77.2641

77.7924

77.9689

2.96479 9.57550_
9.58616
2.96985
7.59149
2.97155 9.59681

8.84 78.14562.97521
2.97489

9.40215
9.40744

2.97658

9.41276

8.85
8.86

78.5225
78.4996

8.87

78.6769 2.97825 9.41807


78.8544
2.97995
9.42558
79.0521 2.98161 9.42868

8.88

8.89
\177.90

79.2100

2.98529

9.45398

-8.91\"' 79.5881'\177.984969.45928

8,)2 79.5664 2.986649,44458

9.17061

9.17606

8,95

71,0649 2.905459,18150

Ns

9.52758

2.95466

9.18695
9.19259

71,5716 2,90861 9,19785


8.47 71.7409 2.910539.20526
8,48
71.9104
2.91204 9.20869
8.49 72.0801 2.915769.21412
8.5 O- -72.2500
2.91548 9.21954_

8,46

2.94958

76.2129

8.58 70.2244 2.894829.15425

8,40

-75.6900

8.75

69.5556 2.887919,15256
69.7225
2.88964
9.15785
8,56 69.8896 2.891579,14550
2.89510

9.50591

2.94449 9.51128
75.5424
2.94618
9.51665
75.5161 2.94788 9 52202_
75.8641 2.95127 9.55274

8.55

70.0569

9.30054

2.94279

8.71

8.54
8.37

2.94109

75.1689

8.75

68.8900 2.88097 9.11045_

8.33

74.9956

9.28440
9.28978

8.72 76.0584 2.95296 9.55809

9,10494

8.32

74.8225

8.69

9.09595
9.09945

8.51

74.6496 2.95959 9.29516

8.65

\177,\1770'

67.2400 2.86556 9.05559

8.21

8.64

8.68

66.9124 2.86007 9.04454

8.20

74.5044
74.4769

8.67

2.95598
2.95769

9.27901

8.62
8.65

8.66

\177.05527

8.17

2.92916 9.26285

75.9600 2.95258 9.27562

8,60

65.7721 2.84781 9.00555


65.9544

75.6164

9.23580

x 10N

358

79.7449

2.98851

9.44987

8,94 79.9256 2.989989.45516


8.95
80.1025
2.99166 9.46044
8,96 80,28162.99555
9.46575
8.97

80.4609

2.99500

9.47101

8.99

80.8201

2.99855

9.48156

.oo

81.0000 3.000009.48683

8.98

80.6404 2.996669.47629

TABLE

3.00000

81.0000

\177100

).01 81.1801

9.48683

90.,\17700

3.08221

5.001679.49210

9.51

90.4401

5.05585

9.73192

9.55

90.8209

5.08707

9.76217

0.,50
I

}.02

81.3604

3.00533

9.49737

LOS

81.5409

5.00500

9.50265

L04
\177.05

81.7216
81.9025

5.00666
3.00832

9.51315

!h06 82.0856

90.6304 5.08545 9.75705

9.54

91.0116 3.088699.76729

9.55

91.2023

9.57
9.58

9.50789

82.2649
82.4464
82.6281

5.0I 164
3.01330
3.01496

c\177.10

82.8100

3.01662

9.53959

t.). 11

82.9921

3.01828

9.54465

9.13

83.3569

3.02159 9.55510

\177.14

83.5396

9.15

85.7225

5.02324
\"' 9
0.0,,490

9.12 83.17443.01995

9.52.565

9.52890

9.53415

84.0889

\177.18
\177.19

84.2724
84.4561

9.65

\177.l\177/lo

92.9296 5.104859.81835
93.1225

3.10644

9,67

95.5089

5.10966

93.7024

3.11127

9.69

95.8961

9.;0

94.09003.11448

9,71
9.72

9.1.2841

9.68

84.6400 3.03515 9.59166


3.03809

9.79285

9.58645

9.58123

84.8241

85.0084

5.09677

9.57601

9.'.21

85.1.929

9.z8,64

3.09516

9.66 95.31565.10805

9.20

3.03480
5.05645

,5.09.554

92.1600 3.09839 9.79796


9.61
92.3521
3.10000 9.80506
9.62 92.5444 5.101619.S0816
9.63
92.7369
5.10322 9.81326

9,64

9.56556

9. I7

9I .5849
91.7764
91.968[

9.77241

9.77753

0.60

9.56033

85.9056 3.02655 9.57079


$.02820
3.02985
3.03150

9.59

9.54987

9.16

3.09031

9.56 91.39363.09192

!1.07

9'.23

9.52

9.74679

5.009989.51840

f\177.08
t.,.09

q.22

(Continued)

9159687

9.60208
9.60729

9.75

3.11288
3.11609

9.82344

9,82855

9.85362
9.84378
9.83870
9.84886

9.85395

9,85901

94.4784

5.11769

94,6729

5.11929

9.86408

94.8676

5.12090

9.86914

85.3776

$.05974 9.61249

85.5625

3.04138

9.61769

9.74
9.75
9.76

9.27

85.9329

3.04467

9.62808

9.77 95.45293.12570

3.29

86.3041

3.04795 9.65846

9.78
9.79

91.110

86.4900

3.04959

9.80 96.04005.15050

!,51

86.6761 5.05125 9.64883


86.8624
3.05287 9.65401

9,81

96.2361

5.15209 9.90454

5.05450

9.65919

96.4.524

87.0489

3.15369

9,85

96.6289

5.15528

4
\177125

85.7476 5.043029.62289

9.26
t
t

9.28 86.11843.04631

\177
.52
i .33

9.65328

9.64565

9.89949

9.90959

9.91464

9.67988

3.06431

9.69020

5.06594

9,69536

9.90

5.06757
3.06920

9.70052
9.70567

9.9549!)
9.92 98.40645.14960
9.959\177j2
9.95
98.6049
5.15119 9.96494

9.94 98.8056\177.15278 9.96995


9.95
99.0025
5.15456 9.97497
9.96 99.20165.15595
9.97998

' .56

\177'
87.6096

' 87.4225

: 87.9844

.39

88.1721

(.

,40

88.5600
': 88.5481

-:88.7564

3.05778 9.66954

3.062689.68504

9':43

88.9249

3.07083

9.71082

9144

89.1156

3.07246

9.71597

9k46

89.4916

5.07571

9.72625

9148

i89.8704

3.07896

9.73653

90.2500

3.08221

7.7.1679

91.45 89.5025

9.91

3.07409 9.72111

9.49 i90.06015.08058
- i

9.884,'75
9.88939

9.89444

9.67471

9.66457

0150

5.12890

9.87927

5.06105

5.05614

\177
,41
i .42

3.127.':;0

9.87421

3.05941

87.2356

i .38

95.6484
95.8441

5.12410

87.7969

' .37

95.2576

5.1 2250

9.84 96.82565,15688
9.91968
9.85
97.0225
5.13847 9.92472
9.86 97.21963.14006
9.92975
9.87
97,4169 3.14166 9.95479
9.88 97.61,t-t
5.14525
9.93982
9.89
97.8121 3.14484 9.94485

',34
i.35

9.82

\"
\"'
9o,06.\177

9.97
9.98
9.99

9.74166

98.0100

98.2081

99.4009
99.6004
99.8001

3.14645

5.15753
3.15911
3.16070

10.00 100.0003.16228

359

9.94987

5.14802

98\1779O
9.o80'\177

9.995(!0
10.0000

:'
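Each row of Table I can be regenerated with a couple of library calls; a minimal Python sketch, using the N = 6.74 row (45.4276, 2.59615, 8.20975) as a spot check. The √(10N) column extends the table to square roots of 63.0 through 100.

```python
import math

n = 6.74
sq = round(n * n, 4)                  # N^2 column
root = round(math.sqrt(n), 5)         # sqrt(N) column
root10 = round(math.sqrt(10 * n), 5)  # sqrt(10N) column
print(sq, root, root10)               # 45.4276 2.59615 8.20975, matching Table I
```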

TABLE IIa  Random Digits

[Rows of uniformly distributed random digits, printed in two-digit groups. Tabled digits omitted.]
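A printed page of random digits can be reproduced today with a seeded pseudo-random generator; a sketch (the seed 12345 is an arbitrary choice, fixed only so the draw is reproducible):

```python
import random

rng = random.Random(12345)                    # fixed seed: same "table" every run
row = [rng.randint(0, 9) for _ in range(50)]  # one 50-digit row
# print in the two-digit groups used by Table IIa
print(" ".join(f"{a}{b}" for a, b in zip(row[::2], row[1::2])))
```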

TABLE IIb  Random Normal Numbers (μ = 0, σ = 1)

[Rows of independent drawings from the standard normal distribution. Tabled values omitted.]
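Standard normal drawings like those of Table IIb can likewise be generated on demand; a sketch checking that a large seeded sample has mean near 0 and variance near 1 (seed 42 is arbitrary):

```python
import random

rng = random.Random(42)                           # arbitrary fixed seed
draws = [rng.gauss(0.0, 1.0) for _ in range(10000)]
mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / len(draws)
print(round(mean, 2), round(var, 2))              # should be close to 0 and 1
```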

TABLE IIIa  Binomial Coefficients

[Rows n = 0 through 20; the entry in row n, column k is (n choose k). Tabled values omitted.]

Note. For coefficients missing from the above table, use the relation (n choose k) = (n choose n − k).
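Binomial coefficients are now computed directly; a Python check of two Table IIIa entries, of Pascal's rule, and of the symmetry relation (n choose k) = (n choose n − k):

```python
import math

assert math.comb(16, 8) == 12870        # a Table IIIa entry
assert math.comb(20, 10) == 184756      # a Table IIIa entry
# Pascal's rule: C(n, k) = C(n-1, k-1) + C(n-1, k)
assert math.comb(20, 10) == math.comb(19, 9) + math.comb(19, 10)
# symmetry: C(n, k) = C(n, n-k), which supplies entries beyond the middle of a row
assert math.comb(20, 13) == math.comb(20, 7)
print("all checks pass")
```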

TABLE IIIb  Individual Binomial Probabilities p(x)

[Blocks n = 1 through 10, rows x = 0 through n; columns π = .05, .10, .15, ..., .50. Tabled values omitted.]

Note. If π > .50, interchange π and (1 − π), and x and (n − x).
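Each entry of Table IIIb is p(x) = (n choose x) π^x (1 − π)^(n−x); a sketch reproducing two entries:

```python
import math

def p(x, n, pi):
    """Individual binomial probability, the Table IIIb entry."""
    return math.comb(n, x) * pi ** x * (1 - pi) ** (n - x)

print(round(p(1, 3, 0.30), 4))   # .4410 (n = 3 block, pi = .30, x = 1)
print(round(p(2, 5, 0.50), 4))   # .3125
```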

TABLE IIIc  Cumulative Binomial Probabilities in Right-hand Tail

[Entries are P(X ≥ x0) for n = 2 through 10; columns π = .05, .10, .15, ..., .50. Tabled values omitted.]
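Each Table IIIc entry is a right-hand tail sum of the individual binomial probabilities; a sketch reproducing two entries:

```python
import math

def right_tail(x0, n, pi):
    """P(X >= x0) for X ~ Binomial(n, pi), the Table IIIc entry."""
    return sum(math.comb(n, x) * pi ** x * (1 - pi) ** (n - x)
               for x in range(x0, n + 1))

print(round(right_tail(2, 5, 0.50), 4))   # .8125
print(round(right_tail(1, 4, 0.50), 4))   # .9375
```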

TABLE IV  Areas for a Standard Normal Distribution

An entry in the table is the area under the standard normal curve between z = 0 and a positive value of z. Areas for negative values of z are obtained by symmetry.

[Rows z = 0.0 through 3.0; columns give the second decimal place of z, .00 through .09. Tabled areas omitted.]
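The Table IV entry for a given z is Φ(z) − 1/2, which the error function supplies directly; a sketch reproducing two familiar entries:

```python
import math

def area_0_to_z(z):
    """Area under the standard normal curve from 0 to z (the Table IV entry)."""
    return 0.5 * math.erf(z / math.sqrt(2.0))

print(round(area_0_to_z(1.00), 4))   # .3413
print(round(area_0_to_z(1.96), 4))   # .4750
```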

TABLE V  Student's t Critical Points

The entry is the critical point t0 exceeded with probability P; that is, Pr(t > t0) = P.

 d.f.     .10      .05      .025     .01      .005
   1     3.078    6.314   12.706   31.821   63.657
   2     1.886    2.920    4.303    6.965    9.925
   3     1.638    2.353    3.182    4.541    5.841
   4     1.533    2.132    2.776    3.747    4.604
   5     1.476    2.015    2.571    3.365    4.032
   6     1.440    1.943    2.447    3.143    3.707
   7     1.415    1.895    2.365    2.998    3.499
   8     1.397    1.860    2.306    2.896    3.355
   9     1.383    1.833    2.262    2.821    3.250
  10     1.372    1.812    2.228    2.764    3.169
  11     1.363    1.796    2.201    2.718    3.106
  12     1.356    1.782    2.179    2.681    3.055
  13     1.350    1.771    2.160    2.650    3.012
  14     1.345    1.761    2.145    2.624    2.977
  15     1.341    1.753    2.131    2.602    2.947
  16     1.337    1.746    2.120    2.583    2.921
  17     1.333    1.740    2.110    2.567    2.898
  18     1.330    1.734    2.101    2.552    2.878
  19     1.328    1.729    2.093    2.539    2.861
  20     1.325    1.725    2.086    2.528    2.845
  21     1.323    1.721    2.080    2.518    2.831
  22     1.321    1.717    2.074    2.508    2.819
  23     1.319    1.714    2.069    2.500    2.807
  24     1.318    1.711    2.064    2.492    2.797
  25     1.316    1.708    2.060    2.485    2.787
  26     1.315    1.706    2.056    2.479    2.779
  27     1.314    1.703    2.052    2.473    2.771
  28     1.313    1.701    2.048    2.467    2.763
  29     1.311    1.699    2.045    2.462    2.756
  30     1.310    1.697    2.042    2.457    2.750
  40     1.303    1.684    2.021    2.423    2.704
  60     1.296    1.671    2.000    2.390    2.660
 120     1.289    1.658    1.980    2.358    2.617
  ∞      1.282    1.645    1.960    2.326    2.576
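A table entry can be verified numerically from the definition Pr(t > t0) = P, by integrating the t density with Simpson's rule (stdlib only; the d.f. = 10, P = .025 entry is used as the example):

```python
import math

def t_pdf(x, df):
    """Student's t density with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1.0 + x * x / df) ** (-(df + 1) / 2)

def t_tail(t0, df, upper=60.0, n=20000):
    """Pr(t > t0) by Simpson's rule on [t0, upper]; the tail beyond is negligible."""
    h = (upper - t0) / n
    s = t_pdf(t0, df) + t_pdf(upper, df)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(t0 + i * h, df)
    return s * h / 3

print(round(t_tail(2.228, 10), 3))   # about .025, as the table asserts
```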

TABLE VI  Critical Points of C² (= χ²/d.f.)*

[Rows d.f. = 1 through 30, 40, 60, 120, ∞; columns give the probability of exceeding the critical point, from .99 down to .005. Tabled values omitted.]

* To obtain critical values of χ², multiply the critical value of C² by (d.f.). Interpolation should be performed using reciprocals of the degrees of freedom.
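The footnote's conversion runs both ways: a χ² critical value is the C² entry times d.f. For d.f. = 2 the tail probability has the closed form P(χ² > x) = e^(−x/2), so the upper-.05 point can be checked by hand:

```python
import math

x_crit = -2.0 * math.log(0.05)   # chi-square upper-.05 critical value for d.f. = 2
c2_crit = x_crit / 2.0           # the corresponding C^2 = chi^2/d.f. entry
print(round(x_crit, 3), round(c2_crit, 3))    # 5.991 and 2.996
assert abs(math.exp(-x_crit / 2.0) - 0.05) < 1e-12
```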


TABLE VIII
Common Logarithms* (Continued)

[Four-place mantissas for N = 1.0 through 9.9: the numeric entries are unrecoverable from the scan.]

* The log of N is "the power to which 10 must be raised to yield N." Thus log 100 = 2. In this table only the "mantissa" (the digits to the right of the decimal) is given for each log, because the characteristic (the integer to the left of the decimal) is 0 for N between 1.0 and 10.0; for example, log 1.91 = .2810. log 10N requires the characteristic 1, log 100N the characteristic 2, and so on. Thus log 537 = 2.7300.
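The footnote's decomposition of a common log into characteristic and mantissa can be checked directly; here `math.log10` plays the role of the printed table.

```python
import math

def common_log_parts(n):
    """Split log10(n) into its characteristic (the integer to the left of
    the decimal) and its mantissa (the digits to the right), as the
    table's footnote describes."""
    log = math.log10(n)
    characteristic = math.floor(log)
    return characteristic, log - characteristic

# The footnote's example: log 537 has characteristic 2 and mantissa .7300
# (the table entry for N = 5.37), so log 537 = 2.7300.
c, m = common_log_parts(537)
```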

APPENDIX

CITATIONS FOR TABLES

I. Reproduced, by permission, from the RAND Corporation.
II. (a) Reproduced, by permission, from R. C. Clelland et al., Basic Statistics with Business Applications, John Wiley and Sons, 1966.
(b) Reproduced, by permission, from …, John Wiley and Sons, 1945.
III. Reproduced, by permission, from Standard Mathematical Tables, 16th Edition, the Chemical Rubber Company.
IV. Reproduced, by permission, from P. Hoel, Elementary Statistics, 2nd Edition, John Wiley and Sons, 1966.
V. Reproduced, by permission, from R. Fisher and F. Yates, Statistical Tables, Oliver and Boyd, Edinburgh, 1938.
VI. Reproduced, by permission, from W. J. Dixon and F. J. Massey, Introduction to Statistical Analysis, 2nd Edition, McGraw-Hill, 1957.
VII. Reproduced, by permission, from Statistical Methods, 6th Edition, by George W. Snedecor and William G. Cochrane, © 1967, by the Iowa State University Press, Ames, Iowa.
VIII. Reproduced from J. E. Freund, Modern Elementary Statistics, 3rd Edition, © 1967, by permission of Prentice-Hall Inc., Englewood Cliffs, New Jersey.

Answers to Odd-Numbered Problems

The student is not expected always to calculate the answer as precisely as the answers given below. These answers are given to a fairly high degree of precision merely for the benefit of those who want it; even so, the last digit may be slightly in error because of slide rule inaccuracy.

[Answers, and parts of answers, whose figures could not be recovered from the scan are omitted.]

2-1 Mode < median < mean. The mode is not a bad central measure in this case, which is not very asymmetrical.
2-3 (a)
            Mean     Median    Mode
    raw     77.78    81.47     …
    fine    77.4     81.25     85
    coarse  78.4     80.00     80
(b) Mode depends too much on the degree of grouping. (c) Usually, but not always, coarse grouping does give worse approximations.
2-7 range = 30 or 40; MAD = 8.58; s = √114 = 10.67
2-9 sum = 1
2-11 239,483
2-13 (a) 77.4, 8.60
2-15 121.50
2-17 27.6% (NOT .952/4)

3-5 (b) Not equally likely: 1/4, 1/4, 1/2 (c) 3/4
3-7 (a) .50, .30, .65, .15 (c) .50, .70, .85, .35
3-9 (a) 6/16 = .375
3-11 (a) .40 (b) .60 (c) .55 (e) .17
3-15 (a) .21 to .29 (b) Yes
3-17 (d) .78
3-19 (a) .62
3-25 (a) Pr(E1 ∩ E2) = Pr(E1) Pr(E2) (b) Yes
3-31 (c) Impossible conditions: there must be an error of specification.
3-33 (a) p(x): 16/81 = .198, 32/81 = .395, 24/81 = .296, 8/81 = .099, 1/81 = .012 (sum 81/81)

4-3 (a) Yes (b) Yes; p: 6/36, 10/36, 8/36, 6/36, 4/36, 2/36 (sum 36/36)
4-7 (a) μX = 3.5
4-11 (b), (c) See how the probabilities grow toward certainty as n → ∞.
4-13 (a) μX = 1.36 (b) √2.43 = 1.56 (c) √2.16 = 1.47
4-15 (a) p(x): 64/125 = .512, 48/125 = .384, 12/125 = .096, 1/125 = .008 (sum 1.00); and 125/216 = .579, 75/216 = .347, 15/216 = .070, 1/216 = .004 (sum 1.00)
4-17 μ = nπ, σ² = nπ(1 − π)
4-19 (a) .9544 (b) .9495 (c) .9901 (d) .9772 (e) .9772
4-21 p(x): .729, .243, .027, .001 (sum 1.00); μ = .30, σ² = .27
4-27 (b) 10/48 = .21
4-29 (c) .683: a sample of 5 has a 68% chance of predicting correctly, whereas a single observation has only a 60% chance.

5-3 No. For example, p(0, 1) ≠ pX(0)pY(1).
5-5 (b) μ1 = 2.00, μ2 = 2.10 (c) cov (X1, X2) = 0, by symmetry
5-7 −.6
5-9 (e) 2, 2/3
5-11 (b) Because of symmetry, E(X) + E(Y) = 3½ (f) No, because, for example, p(0, 3) ≠ pX(0)pY(3).
5-13 (b) var (X1 + X2) = var X1 + var X2
5-15 (a) 4.10, σ² = 1.29 (e) No, because (b) and (d) are different: the variables are dependent.
5-17 (a) μY/X=5 = 6. Negative covariance means that a high grade on the first exam tends to be followed by a low grade on the second. This may be because a student who does well on the first exam becomes overconfident and fails to study for the second exam; similarly, a student who does poorly on the first exam may study very hard for the second. The negative covariance makes the average grade less fluctuating (σ = 10 instead of 15).
5-19 (a) p(h): 65 (.3), 70 (.4), 75 (.3); p(w): 140 (.3), 150 (.4), 160 (.3) (c) μH = 70, μW = 150 (e) 143.3, 156.7 (g) 590 (= 2μH + 3μW); 840 (= 4σH² + 9σW² + 12σHW); 29.0 (= √840)
5-23 (a) coded values x = (X − 600)/10; μ = 600 + 10(−1.0) = 590 (e) 65/2 = 32.5; 1375/4 = 343.75
5-25 (a) (H H H H), (H H H T), etc., each 1/16 (c) No, because, for example, p(1, 1) ≠ pX(1)pY(1) (d) −7/12 (e) 3.5, 35/12; μ = 7.0, σ² (= σ1² + σ2² + 2σ12) = 4.67. Or compute from p(s) directly (the hard way).
5-27 (a) 350/12 = 29.2

6-3 (a) p(x̄): 63.5 (.3), 65 (.4), 66.5 (.3) (c) μX̄ = 65 = μ
6-7 Pr(Z < 3.00) = .9987
6-9 .0115
6-11 .02
6-13 (a) 9000 and 900,000 (b) .147
6-15 (a) .014 (b) .008
6-17 .24
6-19 .018 with the continuity correction (.023 without)
6-21 (a) .0028 (b) .131 (c) .131 (d) Since 850 = 170 × 5, (b) and (c) are asking about exactly the same event. On the other hand, event (b) occurs whenever (a) occurs, and some other times as well.
6-23 (b) 200, 39.2
6-25 (a) Pr (|Z| > …) = .016
6-27 (a) p(x̄): 1/27, 3/27, 6/27, 7/27, 6/27, 3/27, 1/27

7-1 (a) 71 ± 1.96(3/√100) = 71 ± .59
7-3 .83 ± .032
7-5 (a), (b) can be roughly approximated by the normal, as .39 and .30 (or .243, if you like); the correct values are .377 and .358, respectively. (b) 1 − (sum of the answers above)
7-7 (a) unbiased, by (6-10) (b) unbiased (c) biased. Hence X̄ is preferable.
7-9 E(X²) − μ² = σ² for any random variable; in particular, for X̄ …
7-11 (a) E(X̄) = μ; E(2X̄ + 1) = 2μ + 1, unbiased
7-13 MLE = .75

8-17 (.142 − .114) ± 1.96√((.142)(.858)/2500 + (.114)(.886)/2500) = .028 ± .018. Although the best guess is that the relative decline was one-fifth, when sampling fluctuation is allowed for we can say, with 95% confidence, only that the relative decline was between 7% and 33%.
8-19 .06 ± 1.96√((.66)(.34)/100 + (.72)(.28)/100) = .060 ± .128
8-21 .39 ± 2.58√((.40)(.60)/300 + (.79)(.21)/1000) = .390 ± .080
8-23 15.2 < σ² < 658
8-25 4 ± 2.776√(s²(1/3 + 1/3)) = 4 ± 8.17
8-27 (π1 − π2) = … ± .92

9-1 Reject H0 iff P > .50 + .67√((.5)(.5)/100) = .533. About 25% of the students will make erroneous rejections of H0.
9-3 (a) .1992 ± .0157 (b) The closer P is to .5, the better the approximation (8-21) will be. (c) .085
9-5 (a) Reject H0 iff X̄ > 8.74, i.e., iff (X̄ − 8.5)/(s/√n) > 1.645. Since X̄ = 8.8, reject H0. (It would be more accurate to use the t critical value of 1.68.) (b) Reject H0 if P < .402 or P > .598. Again, because of the discrete (tenths) nature of P, it would be better to state the answer: Reject H0 if P < .40 or P > .60. Then α is found, by continuity correction, to be 5.7% (z = 1.90).
9-7 (a) Prob-value = .04 (z = 1.77); i.e., if the claim ($6600) is true, the chance of getting a sample as extreme as this ($6730) is only 4%. (b) 5726 < μ < 6334 (c) I would not reject. However, if possible, I would avoid accepting H0, in order to avoid the risk of a type II error.
9-9 (b) Yes
9-11 Reject H0 (z = 2.60); prob-value = .007
9-13 z = 2.34
9-15 (a) reject (t = 8.2); accept (t = .21); reject. Therefore reject, accept, reject.
9-17 (a) Three answers: (i) Using the normal approximation (which is very rough), reject H0 if P < .19 or P > .81. (ii) Using Fig. 8-4 (which is rough), reject H0 if P < .14 or P > .86. (iii) Since P is discrete (tenths), it is better to use the binomial Table II. It is seen that a 5% test is not possible; the best that can be done is a 2.16% test: Reject H0 if P = 0, .1, .9, or 1.0.
9-19 (a) .080 ± .043 (b) prob-value < .001 (z = 3.7) (c) The sample difference is statistically significant at the 5% level. (d) Whether the difference in populations is of sociological significance is a relative matter.
9-21 (a) 1 ± .64 (b) .002 (z = 3.08) (c) Yes

10-1 (a) t = 3.68; therefore reject H0 at the 5% level. (b) The confidence allowance is ±2.77 for the following differences in means; e.g., 6 ± 2.77√(8/5) = 6 ± 3.5 (c) 7 ± 4.7
10-5 (b) Since t² = F, we see in this particular case that the test using t gives exactly the same conclusion as the test using F. Mathematicians have proved in general that whenever F has 1 and k d.f., and t has k d.f., then t² and F have exactly the same distribution.
10-7 F falls short of F.05 = 6.94; therefore do not reject H0.

11-1 (a) S = 760 + .144Y (b) a0 = −396 = estimate of the savings of a person with zero income. However, this estimate is extrapolating recklessly.
11-3 (a) .068 bushel (all units are "per acre"); net return 13.6¢ …; not economical.
11-5 (a) S(a0, b) = Σ(Yi − a0 − bXi)²; ∂S/∂a0 = −2Σ(Yi − a0 − bXi) = 0; ∂S/∂b = −2ΣXi(Yi − a0 − bXi) = 0 (c) a0 = −396, b = .144, as before (d) The method in this problem is easier than the method in the text.

12-3 t = (2.6/18)/√(.0388/18) = 3.11, which falls short of the critical t = 4.54; therefore do not reject H0.
12-5 It is preferable to observe X over a period of wide variation.

13-1 (c) 878 (d) s² = 7863/2 = 3931 (e) Two degrees of freedom for s² are almost too few. It would be better to collect more data. This scarcity of data is even more acute in Problem 13-3.
13-3 Note that β̂* and β̂, and their error allowances, are the same.
13-5 (b) S = 760 + .115Y − .029W. The coefficient of Y is .115, which is less than the former value, .144. In fact, W is (negatively) correlated with both S and Y. The simple correlation coefficient measures the relation of S to Y, and thereby produces a misleadingly high correlation between S and Y themselves; the multiple correlation coefficient is the proper measure of "the relation of S to Y, other things being equal."
13-7 (a) c = .105, d = −.024
13-9 (a) … = 33.6 − 1.25(T − 7.5) (b) 25.5, which is much better. There is serious bias caused by the fact that we started at a seasonal high (Christmas), so that the time trend is of course biased downwards.
13-11 … is better by .38 mpg.

14-1 .62 < … < .95
14-3 It is false, and should be: "Action a1 is best when 'rain' is predicted, and also when no prediction is possible. Action a3 is best when 'shine' is predicted. Action a2 is never best."
14-7 (b) .92; R² ≥ r² necessarily
14-9 (c) α = .33, β = .05, which is unreasonable. Average loss increases by a factor of 3.16.

15-1 (a) .4, .5, .44, .28
15-3 (a) a1 (b) a2 (c) a3
15-5 (a) Midrange (b) Median (c) Mean, median, or mode (any of these 3)
15-7 (a) 73 or 74 (b) 73 to 74
15-9 Closer to 73.5, because those data are twice as reliable.
15-11 (a) is less believable, because it puts complete faith in a very small and unreliable sample.
15-13 (a) 103.54 (b) 113.12
15-15 (a) You do not want to play, because the value of the game is 10/8 to me, which I could win by using the strategy mix: H played 5/8 of the time, T played 3/8. (b) Each play H and T equally often, which results in a zero payoff. I would secretly choose my penny only if my opponent was also secretly choosing and seemed easy to outwit.
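Several of the answers above (e.g., 9-7 and 9-11) quote prob-values for an observed z statistic. A prob-value is just a standard normal tail area, so it can be sketched with the complementary error function; the .04 quoted for z = 1.77 in answer 9-7 is a one-sided tail.

```python
import math

def one_sided_prob_value(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def two_sided_prob_value(z):
    """P(|Z| > |z|), the two-sided prob-value."""
    return math.erfc(abs(z) / math.sqrt(2))

# e.g., z = 1.77 gives a one-sided prob-value of about .04, as in answer 9-7
p = one_sided_prob_value(1.77)
```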

GLOSSARY OF IMPORTANT SYMBOLS

[Entries whose symbol, meaning, or reference could not be recovered from the scan are omitted.]

Symbol / Meaning / Definition or Other Important Reference

ENGLISH LETTERS

a            estimated regression intercept                          (11-7), (11-16), (12-13)
ANOVA        analysis of variance                                    Table 10-6
b            estimated regression slope                              (11-13)
Bias                                                                 (7-12)
c            number of columns; or constant coefficient in
             regression; or a contrast of means in analysis
             of variance                                             (13-3)
C²           modified chi-square variable                            (8-23)
d.f.         degrees of freedom                                      (8-11)
e            regression error                                        (12-3), (12-4)
E (also F, G, etc.)   event                                          (3-6)
Ē            not E                                                   (3-17)
E(X)         expected value of X                                     (4-17b)
F            F ratio                                                 (10-7), (10-17), (10-28)
H0           null hypothesis                                         (9-1)
H1           alternate hypothesis                                    (9-2)
L( )         likelihood function                                     (7-24), (12-48)
MLE          maximum likelihood estimate (estimation)                Table 7-2
MSD          mean squared deviation                                  (2-5), (7-13)
MSS          mean sum of squares                                     Table 10-6
n            sample size                                             (6-17)
N            population size; or normal distribution with
             specified mean and variance                             (6-17)
P            sample proportion                                       (6-31), (1-2)
Pr (E)       probability of event E
Pr (E/F)     conditional probability of E, given F                   (3-22)
p(x)         probability function of X
p(x, y)      joint probability function                              (5-30)
p(x/y)       conditional probability function of X, given Y          (5-32)
r            simple correlation; or number of rows
r (partial)  partial correlation of X and Y if Z were held constant  (14-43), (14-44)
R            multiple correlation                                    (14-39), (14-40)
s²           sample variance, (2-6); pooled variance, (8-16),
             (10-26); residual variance in regression, (12-24)
SS           sum of squares, or variation                            (10-6)
t            Student's t, (8-10), (12-26); or time variable
             in regression, (13-24)
var          variance (= σ²)
X (also Y, V, W, etc.)   random variable, (4-1); or regressor
             in original form, (11-4)
x (also y, v, etc.)      realized value of X, (4-2); or regressor
             in terms of deviations from the mean, (11-5)
X̄            sample mean of X                                        (2-1), (6-9)
Z            standard normal variable; or a second regressor         Table 13-1

After Chapter 8, little distinction is made between capital letters (random variables) and small letters (their realized values); the distinction is forgotten.

GREEK LETTERS

Greek letters are generally reserved for population parameters, as follows:

α            probability of type I error; or population
             regression intercept                                    (9-9), (12-3)
β            probability of type II error; or population
             regression slope                                        (9-9), (12-3)
θ, θ̂         any population parameter, and its sample estimator      (7-11)
μ            population mean                                         (4-3)
π            population proportion                                   (6-20)
ρxy          population correlation of X and Y                       (14-3)
σ            population standard deviation                           (4-4)
σ²           population variance                                     (4-5), (4-19)
σxy          population covariance of X and Y                        (5-21), (5-22), (5-23)
Σ            sum of                                                  (2-2)
Π            product                                                 (7-30)

OTHER MATHEMATICAL SYMBOLS

E ∪ F        E or F, or both
E ∩ F        E and F, both                                           (3-10)
≡            equals, by definition                                   (2-1a)
≐            equals approximately                                    (2-1b)
~            is distributed as                                       (6-31)

GREEK ALPHABET

Letters   Names     English Equivalent     Letters   Names     English Equivalent
Α α       Alpha     a                      Ν ν       Nu        n
Β β       Beta      b                      Ξ ξ       Xi        x
Γ γ       Gamma     g                      Ο ο       Omicron   o
Δ δ       Delta     d                      Π π       Pi        p
Ε ε       Epsilon   e                      Ρ ρ       Rho       r
Ζ ζ       Zeta      z                      Σ σ       Sigma     s
Η η       Eta       ē                      Τ τ       Tau       t
Θ θ       Theta     th                     Υ υ       Upsilon   u
Ι ι       Iota      i                      Φ φ       Phi       ph
Κ κ       Kappa     k                      Χ χ       Chi       ch
Λ λ       Lambda    l                      Ψ ψ       Psi       ps
Μ μ       Mu        m                      Ω ω       Omega     ō
uory

18, 224

dev: ations,

Absoh\177t[

' 168
, 278

hyl\177ot. hesis

Alternate
Analysi\177of

c9variance

Analysbq of

assun4ptiorls, 202
confid

nce iintervals,

hypot
intera

esis !est,
tion\177 215

one

Best linear
Bias, 134

195

206, 216

in

to,

con >arep to, 195, 278


sum o sqm tres, 204
!11,299
table,
011\177
two

tors

fa

classical

mean, 121
al.\177o Variation

Measures

363

trial, 59

thod compared,

variance, 121

324, 332,

critique, 33I
ecisions,
estima\177tion,

C._,

\17722

ri\177tio

Classicalversus

test,

336

loss fu\177ctiq',
315; 318,323
prior and r\177d,
sterior
probability,
subsect,
utility:

\1773\177

weakn\177aS,

nayes'

241

estimation,

Bayesian

Coding. 22
312

Collinearity,

see

Multicollinearity'

event, 35

331

Compositehypothesis,

312

Confidence interval, acceptable hypotheses, as set of, 2, 191,216


in analysis
of variance, 205, 216

319

3} 1

thlleare4, 44,

164

modified,

324,339

Complementary

xve n}th,-e,
\177fqnctlon'

121

Chi-square variable,
table, 368

333

113

theorem,

binomial,

for regression,

332

large

st,'engt\177,

for

compared,349

hypothesis
}iests,
shmpl\177,
328

likelihood

164, 368

limit

Central

327, 329

MLq,

and

statistic,

301

Centers, 12

by interval,

game theor

78

distribution,

normal, 292,

327, 329

intervals,

120

sample sum, as a,

Bivariate
confidgnce

12

approximation,

normal

of location

312

\177ods,

inc

362

table, 365

cumulative

tabl\177e,

mett

59

coefficients, table,

211

\177lean;

Bayesiarl

Unbiasedness

distribution,

Binomial

\177

D4; see
,n, 2\177

variati

sampling,

see also

298

variable is

MSD, 135

of sample
in

regres ion, !applied

if some

240

273

ignored,

i 195

f\177:tor,

estimator,

unbiased

regression,

196, 213

119, 125
120

variance,

and

mean

(ANOVA),

v{\177riance

162

Daniel.

Bell,

Bernoulli population,

397

175,

182

398

INDEX

Covariance, 88,286

(cont.)

interval

Confidence

Bayesian, 327. 329


for difference

206,

means,

several

in

216

proportions, 161

example.2
mean,

for

small

large sample,
158

proportion,
sample,

small

for

2, 157

multiple,

coefficients,

regression

266

simple, 244
for

Degrees of

freedom, 154

in

analysis

in

multiple

of variance, 199
regression,
259,273,

31t

see Independence

17

Deviations,

in means and proportions,


see
Confidence Interval, for difference
Discrete variable, 8, 52
Distribution,
see Probability
functions
Dummy
variable
regression,
269
and analysis of covariance, 279
278

ANOVA,

compared to moving
average,
277
for seasonal adjustment, 274

137, 148
121

correction,

Continuity

3, 106

and

163

variance,

Consistency,

312

Deduction.

Difference

as a, 131

interval,

269

Dependence,
statistical,

difference, 161

for proportions,
random

information,

Destructive testing, 5

190

one-sided type,
for

see Confidence
difference

for

interval,

Crosssection

in simple

2, 131

several,

means,

in hypothesis

regression, 243
in single
sample, 154
in two samples,
156
Density function, 64

129, 132

large sample,
sample,
152

meaning of,
for

to, 187

relation

test,

hypothesis

Critical point

Decision theory.

155,205

in two

difference

223
testing, 168

a line,

fitting

large

150

sample,

small sample,
for

means,

in two

difference

for

91

independence,

and

Criteria for

293

correlation,

for

Continuous distributions, 63
9

variable,

Continuous

Contrast of means,

experiments,

Controlled

of MLE, asymptotically, 148


of sample mean and median, 137
Error, confidence interval
allowance,
129; see also Confidence interval
in hypothesis
testing, 169

288

confidence interval,

286

in

305

test,

hypothesis

293

compared to,

covariance,

independence,

relation to, 91

interpretation,

286,

in

291,300

306,

compared
286

simple,

285

point,

to, 285,296,

301,

see Confidence

128

Bayesian, 322
versus

Bayesian

classical,

estimator, comparedto,
and

Counted data, see Bernoulli population;


Binomial distribution
Counter
variable,
120, 157,270; see also
Dummy

interval,

variable

215

interval

305
sample,

in ANOVA,
243,275,297

fitting,

regresson,

Estimate,

308

population, 285
regression,

after

1,

236

model,

regression

residual,

multiple, 310
partial,

equivalence

137

of,

237

291

assumptions,

calculation,

and statistical

economic

285

Correlation,

136

Efficiency,

207,216

regression

function,

loss

324

132

323

properties of, 134


Estimating

equations,
259

(least-squares)
multiple

regression,

in simple regression,

227

in

399
INDEX
Event, 30
  independent, 45
  intersection of, 34
  mutually exclusive, 34
Expectation, see Expected value; Mean
Expected value
  definition, 74
  of linear combination, 93
  of random variables, 73
  of sample mean, 108, 117
  of a sum, 86, 93, 106; see also Mean
Extrapolation dangers in regression, 249
F distribution
  ANOVA use, 199, 204, 213
  regression use, 276
  relation to t, 209, 300, 311
  table, 371
Fisher, R. A., 370
Fitted (predicted) value, 245
Frequency, see Relative frequency
Functions of random variables, 72, 84
Game theory, 340
  Bayesian solution, compared to, 348
  loss (payoff) function, 341
  minimax and maximin, 349
  nature as opponent, 342
  saddle point, 342
  strategies, 342, 347
    dominated, 347
    mixed, 344, 347
    pure, 340
  strictly determined, 342
Gaussian distribution, see Normal distribution
Gauss-Markov theorem, 240
Glossary of symbols, 393
Gossett, W. S., 153
Histogram, 11
Hoel, P., 113
Huff, Darrell, 7
Hypothesis test, 167
  Bayesian, 333, 339
  confidence interval, relation to, 187
  in correlation, 305
  critical point, 216
  errors of type I and II, 169, 170
  in multiple regression, 266
  one-sided, 168, 190
  power, 170, 176
  prob-value, 179
  in regression, 245, 299
  for seasonal influence, 278
  simple versus composite, 175, 182
  two-sided, 185, 187
Independence, 45
  covariance, relation to, 91
  of events, 45
  statistical, 45
Induction and inference, 1, 3
Interpolating in regression, 247
Interval estimate, see Confidence interval
Isoprobability ellipses, 293
Joint distribution, see Bivariate distribution
Law of large numbers, 49
Least squares, in regression, 225
  attractive properties, 240
  calculations, 228
  coefficients, 229
  estimating equations, 227, 259
  in multiple regression, 257
Likelihood function, 143, 250
Likelihood ratio test, Bayesian, 336
Lindgren, B. W., 149, 331
Linear combination
  contrast of means, 207
  of normal variables, 95
  of random variables, 58, 93
  regression slope as, 239
Linear transformation of a random variable, 70
Logarithms, 375
Loss function, 315, 318, 323, 341
McDonald, J., 345
Maximum Likelihood Estimates (MLE), 141
  as Bayesian estimates, 333
  deficiencies, 142, 147
  geometric interpretation, 332
  large sample properties, 147, 251
  least squares, relation to, 250, 254
  of mean of normal population, 144, 145
  versus method of moments, 148
  of proportion (Bernoulli), 142, 145
  in regression, 253, 254
  small sample treatment, 148
Mean, 12
  of binomial, 121
  of population, 56
  of random variables, 52, 56, 66
  of sample, see Sample mean
  see also Expected value
Mean squared deviation (MSD), 18
  bias, 135
Mean squared error, 137
  and consistency, 138
  related to bias and variance, 138
Mean sum of squares, 204
Measures of location, 12
Measures of spread, 17
Median, 12
  as Bayesian estimator, 323
  efficiency, 137
  as unbiased estimator, 136
Minimax, 342, 347; see also Game theory
Mode, 12
  as Bayesian estimator, 332
Moments, 16, 19
  method of, 148
Monte Carlo, 140
Multicollinearity of regressors, 260, 264
Multiple comparisons, 206, 216, 281
Multiple correlation, 310
  and last regressor, 311
  and regression, 310
Multiple regression, 255
  ANOVA, relation to, 255
  bias reduced, 273
  calculations, 258
  confidence intervals, 266
  error reduced, 237
  estimating equations, 259
  hypothesis tests, 266
  interpretation, 265
  least squares estimation, 257
  mathematical model, 256
  and seasonal adjustment, 278, 283
Nonparametric statistics, 122
Nonsense correlations, 241
Normal distribution, 66, 103
  approximation to binomial, 122
  of linear combination, 58, 93
  of sample proportion, 122
  of sample sum, 106
  table, 367
Normal equations, 227, 259
Normal variable, Z, 66
Notation, glossary of symbols, 393
Null hypothesis, 168
  danger in accepting, 178, 267
  danger in rejecting, 179
Operating characteristics curve, 176
Outcome set, 30
Parameters of population, 128
  glossary, 395
Partial correlation, 308
  assumptions, 309

  computation, 309
  regression, relation to, 308
Partition of sample space, 35
Payoff, 341
Point estimate, see Estimate, point
Poisson distribution, 81
Pooled variance, 156, 199
Population, 102
Posterior distribution, 328
Posterior mean, 328
Posterior probabilities, 312, 314, 326
Power of hypothesis test, 170, 176
Prediction interval, 245; see also Confidence interval
Prior probabilities, 312, 326, 330, 331
Prob-value, 179, 181
Probability, 27
  axiomatic, 49
  conditional, 40
  as limit of relative frequency, 27, 48
  personal, 50, 330, 331
  subjective, 50, 330, 331
Probability density function, 66
Probability functions (distributions)
  Bernoulli, 120
  binomial, 59
  bivariate, 78
  conditional, 83
  continuous, 63
  discrete, 52
  joint, 78
  marginal, 81
  normal, 66
Properties of estimators, 134
Proportions, 10, 104
Random digits, table, 360
Random normal numbers, table, 361
Random sampling, 102
Random variable, 52
  continuous, 63
  definition, 52
  derived, 72, 84
  discrete, 52
  function of, 72, 84
Range of sample, 17
Regression, 220, 234
  assumptions, 235
  bias, 273
  coefficients, 229, 241
  confidence intervals, 243, 244
  correlation, compared to, 285, 296, 301, 305
  error term, 236
  error reduction, 234, 237
  fixed versus random regressor, 253, 254
  least squares estimation, 225
  limitations, 249
  multiple, 255; see also Multiple regression
  nonlinear, 250
  parameters, 235
  prediction, 245, 303
  prediction interval, 245
  residuals, 237, 275, 297
Regrets, 316
Relative frequency, 9, 63, 103
  density, 64
  limit is probability, 27
Residuals, see Error; Variation
Robustness, 163

Saddle point, 342
Sample mean, 13, 107
  as Bayesian estimator, 323
  and Bayesian estimates, 327
  and central limit theorem, 113
  distribution, 109, 112
  efficiency, 137
  as estimator of μ, 128, 136
  expected value, 108, 117
  as linear transformation of sample sum, 107
  normally distributed, 112
  variance, 108, 117
Sample proportion, 120, 125
  as sample mean, 122, 125
  variance, 122
Sample space, 30
Sample sum, 105
  distribution, 105
  expected value, 106, 117
  variance, 106, 117
Sample variance, 128
Sampling, 102
  bias, 6
  methods, 5
  reasons for, 5
  simulated, 26, 56, 105
  with replacement, 102, 124
  without replacement, 116, 124
  see also Random sampling
Scheffé, H., 206
Seasonal adjustment, 269
  with moving average, 270
  using dummy variables, 274
Sign test, 76
Significance level of test, 170
Skewed distribution, 15
Slonim, M. V., 7
Small sample estimation, 152; see also Confidence interval
Square root table, 351
Standard deviation, 19
Standardization
  of normal variable, 59
  of random variable, 70
Statistic, definition, 8, 19
Strategies, see Game theory
Student's t, see t Statistic
Sum of random variables, see Mean; Variance
Sum of squares, see Variation
Symmetric distribution, 15
t Statistic, 152
  distribution, 152
  F, relation to, 153, 155
  normal, relation to, 153
  table, 368
Tables, 350
Test of hypothesis, see Hypothesis test
Time series, 269; see also Seasonal adjustment
Translation of axes
  in correlation, 289
  in covariance, 88
  in regression, 225, 230
Type I error, 169
Type II error, 170
Unbiasedness, 134
  asymptotic, 138
Utility versus monetary loss, 319
Variance
  of Bernoulli variable, 120
  of binomial, 121
  of linear combination, 58, 95
  pooled, 156, 199
  of population, 56, 85
  of random variable, 56, 66
  of regression coefficients, 238
  of sample, 18, 135
  of sample mean, 108, 117
  of sample proportion, 122
  of sample sum, 106, 117
  of sum of random variables, 94
Variation (explained, unexplained, and total), 203, 204, 212, 213, 297
  ratio, see F statistic
  residual, 204, 237, 275
  unexplained, 205, 211, 215
Venn diagrams, 33
Wallis, W. A., 7
Weighted average, 15, 94
Wilks, S. S., 149
Z variable, see Normal variable
Zero-sum game, 341