
Chapter 1 Descriptive Statistics

Data
Data are very important for scientific study, and statistics is a discipline that deals with the
collection, presentation and analysis of data. In this chapter we are going to study how we can
summarize and describe a set of data. When we study a set of data we need to identify the
following important characteristics of the dataset.
Primary and secondary data. Data collected by ourselves are called primary data; we then always have the individual values of the data. Data collected by others are called secondary data. Sometimes the data are grouped into a table, in which case they are called grouped data.
Population and sample data. The population refers to the totality of elements in which we are interested. Suppose we want to study the salaries of Hong Kong people; our population then includes all persons who work in Hong Kong. However, the population is so big that it is neither practical nor economical to collect salary data from all the working people, so we usually select a random subset of the population, and the resulting data are sample data.
Discrete and continuous data. It is important to identify whether the data are continuous or discrete. For example, data on the number of persons in a household are discrete, and data on salary are continuous. Different statistical techniques are used for handling discrete and continuous data.
Frequency Distribution
Statistical data obtained by means of censuses, sample surveys or experiments usually consist of raw, unorganized sets of numerical values. Before these data can be used as a basis for inferences about the phenomenon under investigation, or as a basis for decisions, they must be summarized and the pertinent information must be extracted.
Example 1
A random sample of 100 households in a town was selected, and their monthly town gas consumption (in cubic metres) for last month was recorded as follows:
 55   82   83  109   78   87   95   94   85   67
 80  109   83   89   91  104   90  103   67   52
107   78   86   29   72   66   92   99   60   75
 88  112   97   88   49   62   70   66   88   62
 72   85   81   78   77   41  105   92   94   74
 78   75   87   83   71   99   56   69   78   60
119   39  104   86   67   79   98  102   82   91
 46  120   73  125  132   86   48   55  112   28
 42   24  130  100   46   57   31  129  137   59
102   51  135   53  105  110  107   46  108  117

A useful method for summarizing a set of data is the construction of a frequency table, or frequency distribution. That is, we divide the overall range of values into a number of classes and count the number of observations that fall into each of these classes or intervals.
The general rules for constructing a frequency distribution are:
(i) There should be neither too few nor too many classes.
(ii) Insofar as possible, equal class intervals are preferred, but the first and last classes can be open-ended to cater for extreme values.
In Example 1, the sample size is 100 and the range of the data is 113 (137 − 24). A frequency distribution with six classes is appropriate, and it is shown below.
Frequency distribution of household town gas consumption
Monthly town gas consumption (cubic metres)   Number of households
 20 - 39                                          5
 40 - 59                                         15
 60 - 79                                         25
 80 - 99                                         30
100 - 119                                        18
120 - 139                                         7
Total                                           100
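The counting step can be checked with a short Python sketch. It transcribes the 100 readings from Example 1 and counts them into six classes of width 20. The helper `frequency_table` is a hypothetical name. The first reading of the seventh row is assumed to be 119, because the scanned value is unreadable; any value from 100 to 119 yields the same table.

```python
# Monthly town gas consumption (cubic metres) of the 100 sampled households.
# Assumption: first value of row 7 read as 119 (the source scan is unclear).
gas = [
    55, 82, 83, 109, 78, 87, 95, 94, 85, 67,
    80, 109, 83, 89, 91, 104, 90, 103, 67, 52,
    107, 78, 86, 29, 72, 66, 92, 99, 60, 75,
    88, 112, 97, 88, 49, 62, 70, 66, 88, 62,
    72, 85, 81, 78, 77, 41, 105, 92, 94, 74,
    78, 75, 87, 83, 71, 99, 56, 69, 78, 60,
    119, 39, 104, 86, 67, 79, 98, 102, 82, 91,
    46, 120, 73, 125, 132, 86, 48, 55, 112, 28,
    42, 24, 130, 100, 46, 57, 31, 129, 137, 59,
    102, 51, 135, 53, 105, 110, 107, 46, 108, 117,
]

def frequency_table(data, first_lower, width, n_classes):
    """Count observations in equal-width classes with integer class limits."""
    table = []
    for k in range(n_classes):
        lower = first_lower + k * width
        upper = lower + width - 1          # e.g. 20 - 39 for width 20
        count = sum(1 for x in data if lower <= x <= upper)
        table.append((lower, upper, count))
    return table

table = frequency_table(gas, first_lower=20, width=20, n_classes=6)
for lower, upper, count in table:
    print(f"{lower:>3} - {upper:<3} {count:>3}")
print("Total", sum(c for _, _, c in table), "Range", max(gas) - min(gas))
```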
Class limits are the numbers that typically serve to identify the classes in a listing of a frequency distribution. Thus, in the above frequency distribution, the class whose frequency is 30 has lower class limit 80 and upper class limit 99.
As contrasted with a class limit, a class boundary is the precise point that separates one class from another, rather than a value indicated in one of the classes. A class boundary is typically located midway between the upper limit of a class and the lower limit of the next higher class adjoining it. Therefore the class boundary separating the class 60 - 79 and the class 80 - 99 is halfway between 79 and 80, that is, at the point 79.5.
Class interval: the width of a class. The class interval of a class is computed by subtracting the lower limit (boundary) of the class from the lower limit (boundary) of the next class; for example, 80 − 60 = 20 for the class 60 - 79.

Class midpoint or class mark: the point dividing the class into equal halves on the basis of the class interval. It is obtained by adding the lower and upper limits (boundaries) of a class and dividing by 2.
Relative frequency of a class: the frequency of the class divided by the total frequency of the distribution.
Cumulative frequency distribution: shows the number of items of a series that are less than (or more than) certain specified values.
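The definitions above can be applied to the gas-consumption table in a few lines of Python. This is a sketch under one assumption: the readings are whole numbers, so each boundary lies 0.5 outside the class limits. The cumulative column gives "less than the upper boundary" counts.

```python
# (lower limit, upper limit, frequency) from the gas-consumption table.
classes = [(20, 39, 5), (40, 59, 15), (60, 79, 25),
           (80, 99, 30), (100, 119, 18), (120, 139, 7)]
total = sum(f for _, _, f in classes)

rows = []
cumulative = 0
for lower, upper, f in classes:
    lower_boundary = lower - 0.5      # midway to the previous class's upper limit
    upper_boundary = upper + 0.5      # midway to the next class's lower limit
    mark = (lower + upper) / 2        # class midpoint (class mark)
    cumulative += f                   # number of items below upper_boundary
    rows.append((lower_boundary, upper_boundary, mark, f / total, cumulative))

for lb, ub, mark, rel, cum in rows:
    print(f"{lb:6.1f} - {ub:6.1f}  width={ub - lb:.0f}  mark={mark:5.1f}  "
          f"rel={rel:.2f}  cum={cum}")
```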
Measure of Central Tendency
A value that describes the 'centre' of a distribution would be visually located near the spot where most of the data seem to be concentrated. Values that fulfil this role are called measures of central tendency.
The most common measures of the central tendency of a data set are the arithmetic mean (or simply the mean), the median and the mode.
The mean of a set of numerical data is the sum of the values divided by the number of observations, that is, their average.
The median of a distribution is the value which divides the distribution so that an equal number of values lie on either side of it, i.e., half of the items have values smaller than or equal to it and half of the items have values larger than or equal to it.
The mode of a set of numerical data is the value which occurs most frequently.
Example 1 (calculating mean, median and mode for individual data)
The following table shows the hourly wage rates of eight sampled construction workers.
Worker i                     1    2    3    4    5    6    7    8
Hourly wage rate x_i ($)    35   38   46   60   65   69   72   78

Mean:
$$\bar{x} = \frac{\sum_{i=1}^{8} x_i}{8} = \frac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8}{8} = \frac{463}{8} = 57.875 \ (\$)$$
Location of the median:
$$\frac{n+1}{2} = \frac{9}{2} = 4.5\text{th}$$
$$\text{Median} = \frac{x_4 + x_5}{2} = \frac{60 + 65}{2} = 62.5 \ (\$)$$
Mode: the sample size is too small, so the mode cannot be identified.
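The Python standard library can check these results. This is a minimal sketch that assumes Python 3.8+ for `statistics.multimode`. Every wage occurs exactly once, so `multimode` returns all eight values, which is another way of saying that no useful mode exists.

```python
from statistics import mean, median, multimode

wages = [35, 38, 46, 60, 65, 69, 72, 78]   # hourly wage rates ($)

print(mean(wages))       # arithmetic mean
print(median(wages))     # average of the 4th and 5th ordered values (n = 8)
print(multimode(wages))  # all values tie with frequency 1: no meaningful mode
```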
Example 2 (calculating mean, median and mode for grouped data)
The following table shows the daily wages of a random sample of construction workers. Calculate its mean, median and mode.
Daily wages ($)   Number of workers
 200 - 399               5
 400 - 599              15
 600 - 799              25
 800 - 999              30
1000 - 1199             18
1200 - 1399              7
Total                  100
Solution
Daily wages ($)   Number of workers f_i   Class mark x_i   x_i f_i
 200 - 399                 5                  299.5         1,497.5
 400 - 599                15                  499.5         7,492.5
 600 - 799                25                  699.5        17,487.5
 800 - 999                30                  899.5        26,985.0
1000 - 1199               18                1,099.5        19,791.0
1200 - 1399                7                1,299.5         9,096.5
Total                    100                               82,350.0

$$\text{Mean} = \bar{x} = \frac{\sum_{i=1}^{6} x_i f_i}{\sum_{i=1}^{6} f_i} = \frac{82{,}350.0}{100} = 823.5 \ (\$)$$
Daily wages ($)   Number of workers f_i   Cumulative frequency F_i
 200 - 399                 5                       5
 400 - 599                15                      20
 600 - 799                25                      45
 800 - 999                30                      75
1000 - 1199               18                      93
1200 - 1399                7                     100
Total                    100
As 0.5n = 0.5(100) = 50, the median lies in the 4th class, the first class whose cumulative frequency reaches 50.
$$\text{Median} = L + \frac{0.5n - F_3}{f_4}\,c = 799.5 + \frac{0.5(100) - 45}{30}(200) = 832.83 \ (\$)$$
where L is the lower class boundary of the median class, F_3 is the cumulative frequency of the class before it, f_4 is the frequency of the median class, and c is the class interval.
Daily wages ($)   Number of workers f_i   Class interval c_i   Relative density f_i* = (f_i / c_i)(200)
 200 - 399                 5                    200                       5
 400 - 599                15                    200                      15
 600 - 799                25                    200                      25
 800 - 999                30                    200                      30
1000 - 1199               18                    200                      18
1200 - 1399                7                    200                       7
Total                    100
As f_4* = 30 is the largest relative density, the mode lies in the 4th class.
$$\text{Mode} = L + \frac{f_4^* - f_3^*}{(f_4^* - f_3^*) + (f_4^* - f_5^*)}\,c = 799.5 + \frac{30 - 25}{(30 - 25) + (30 - 18)}(200) = 858.32 \ (\$)$$
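The three grouped-data formulas can be collected into one helper. This Python sketch assumes equal class intervals and integer class limits, so that class marks are `lower + (width - 1)/2` and boundaries are `lower - 0.5`. It also assumes a single modal class. The function name `grouped_stats` is hypothetical.

```python
def grouped_stats(lower_limits, freqs, width):
    """Mean, median and mode of grouped data with equal class intervals."""
    marks = [lo + (width - 1) / 2 for lo in lower_limits]   # 200-399 -> 299.5
    boundaries = [lo - 0.5 for lo in lower_limits]          # lower class boundaries
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(marks, freqs)) / n

    # Median class: first class whose cumulative frequency reaches 0.5n.
    cum = 0
    for i, f in enumerate(freqs):
        if cum + f >= 0.5 * n:
            median = boundaries[i] + (0.5 * n - cum) / f * width
            break
        cum += f

    # Modal class: largest relative density; with equal widths that is
    # simply the largest frequency.
    m = max(range(len(freqs)), key=lambda i: freqs[i])
    d1 = freqs[m] - (freqs[m - 1] if m > 0 else 0)
    d2 = freqs[m] - (freqs[m + 1] if m + 1 < len(freqs) else 0)
    mode = boundaries[m] + d1 / (d1 + d2) * width
    return mean, median, mode

g_mean, g_median, g_mode = grouped_stats(
    [200, 400, 600, 800, 1000, 1200], [5, 15, 25, 30, 18, 7], width=200)
print(round(g_mean, 2), round(g_median, 2), round(g_mode, 2))
```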
Advantages and disadvantages of each measure
Mean
Advantages: (i) All values in the distribution are used in its calculation, so it can be regarded as more representative than the other two measures.
(ii) Its method of calculation is simple, and most people understand the meaning of its result.
(iii) Its result can easily be used in further analysis.
Disadvantages: (i) Its result can easily be distorted by extreme values. As such, its result may be much lower or higher than the bulk of the values and become unrepresentative.
(ii) In the case of open-ended classes, the mean can be calculated only if their class marks are determined. If such classes contain a large proportion of the values, the mean may be subject to substantial error.
Median
Advantage: Its result is not affected by extreme values or open-ended classes.
Disadvantage: It has to be supplemented by other statistics because it does not reflect the distribution in the way that the mean does, that is, by including all values.
Mode
Advantages: (i) Its result is not affected by extreme values or open-ended classes.
(ii) If the data are not grouped, it can be determined easily.
Disadvantages: (i) It has to be supplemented by other statistics.
(ii) It is difficult to obtain an accurate estimate of the mode if the values are classified into a frequency distribution.
How to select a suitable measure
(i) Always select the mean when there is no special reason for choosing the other two measures.
(ii) Select the median if the distribution contains a substantial number of extremely large or small values.
(iii) Select the mode if an integral result is preferred, as in cases where the data are on an ordinal scale.

Measure of data variation (variability)
A measure of central tendency is almost never, by itself, sufficient to provide an adequate summary of the characteristics of a set of data. We will usually require, in addition, a measure of the amount of variation in the data.
Example 1
Consider the following measurements, in grams, for two samples of strawberry jam bottled by companies A and B:
Sample for Company A:   31   32   32   33   32
Sample for Company B:   28   29   32   35   36
Both samples have the same mean, 32 grams. It is obvious that company A, in comparison with company B, bottles strawberry jam with a more consistent content. We say that the variability of the observations is smaller for company A. Therefore, in buying strawberry jam, we would feel more confident that the bottle we select will be closer to the advertised average content if we buy from company A.
The most important measures of variability or dispersion are the range, mean deviation, standard deviation and variance.
(There are some other measures, such as the quartile deviation and percentiles. We shall not study these measures; read our textbook if interested.)
The range of a set of numbers is the difference between the largest and the smallest number in the set.
Example 2 (for individual data)
The following table shows the hourly wage rates of eight sampled construction workers.
Worker i                     1    2    3    4    5    6    7    8
Hourly wage rate x_i ($)    35   38   46   60   65   69   72   78
The range is $78 − $35 = $43.
Though the range is simple and can be obtained easily, its result is unstable, because it depends on only the two most extreme observations. This is particularly true if the sample size is large. So whenever the sample size is over 10, we seldom use the range to indicate the variability of the data.
Mean deviation is the average of the absolute deviations of the numerical data from their mean.

Worker i                     1        2        3       4       5        6        7        8
Hourly wage rate x_i ($)    35       38       46      60      65       69       72       78
|x_i − 57.875|          22.875   19.875   11.875   2.125   7.125   11.125   14.125   20.125

$$\text{Mean deviation} = \frac{\sum_{i=1}^{8} |x_i - 57.875|}{8} = \frac{109.25}{8} = 13.656 \ (\$)$$
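Here is a minimal Python sketch of the range and the mean deviation for the same eight wages. It applies the definitions directly, with the mean computed from the data rather than typed in.

```python
wages = [35, 38, 46, 60, 65, 69, 72, 78]   # hourly wage rates ($)
x_bar = sum(wages) / len(wages)           # 57.875

data_range = max(wages) - min(wages)      # largest minus smallest value
abs_devs = [abs(x - x_bar) for x in wages]
mean_deviation = sum(abs_devs) / len(wages)

print(data_range, mean_deviation)
```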
The mean deviation is a good measure of the extent of variation of the data in a distribution. However, when this measure is used in further analysis, its absolute-value term leads to unnecessarily tedious mathematics. To avoid this pitfall, we can use the standard deviation instead.
Standard deviation of a population (σ) is the square root of the average of the squared distances of the observations from the mean:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}},$$
where μ is the population mean.
To compute the sample standard deviation (s), we use the above formula, replacing μ by x̄ and N by n − 1:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
Worker i                     1    2    3    4    5    6    7    8   Total
Hourly wage rate x_i ($)    35   38   46   60   65   69   72   78    463

$$s = \sqrt{\frac{\sum_{i=1}^{8} (x_i - 57.875)^2}{7}} = \sqrt{\frac{1842.875}{7}} = 16.226 \ (\$)$$
Variance is the square of the standard deviation:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
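The sample formulas can be computed by hand in Python and compared with the `statistics` module. `stdev` and `variance` use the n − 1 divisor, as in the notes. `pstdev` uses N and treats the eight wages as a whole population, so it gives a smaller value.

```python
from math import sqrt
from statistics import pstdev, stdev, variance

wages = [35, 38, 46, 60, 65, 69, 72, 78]     # hourly wage rates ($)
x_bar = sum(wages) / len(wages)
ss = sum((x - x_bar) ** 2 for x in wages)    # sum of squared deviations
s_manual = sqrt(ss / (len(wages) - 1))       # divisor n - 1 for a sample

print(round(s_manual, 3), round(stdev(wages), 3))   # sample standard deviation
print(round(variance(wages), 3))                    # sample variance s^2
print(round(pstdev(wages), 3))                      # population formula (divisor N)
```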

Example 3 (for grouped data)
The following table shows the daily wages of a random sample of construction workers. Calculate its mean deviation, variance and standard deviation.

Daily wages ($)   Number of workers
 200 - 399               5
 400 - 599              15
 600 - 799              25
 800 - 999              30
1000 - 1199             18
1200 - 1399              7
Total                  100
Solution
Daily wages ($)   Number of workers f_i   Class mark x_i   |x_i − 823.5| f_i
 200 - 399                 5                  299.5             2,620
 400 - 599                15                  499.5             4,860
 600 - 799                25                  699.5             3,100
 800 - 999                30                  899.5             2,280
1000 - 1199               18                1,099.5             4,968
1200 - 1399                7                1,299.5             3,332
Total                    100                                   21,160
$$\text{Mean deviation} = \frac{\sum_{i=1}^{6} |x_i - \bar{x}|\, f_i}{\sum_{i=1}^{6} f_i} = \frac{21{,}160}{100} = 211.60 \ (\$)$$
Daily wages ($)   Number of workers f_i   Class mark x_i   f_i (x_i − 823.5)^2
 200 - 399                 5                  299.5          1,372,880
 400 - 599                15                  499.5          1,574,640
 600 - 799                25                  699.5            384,400
 800 - 999                30                  899.5            173,280
1000 - 1199               18                1,099.5          1,371,168
1200 - 1399                7                1,299.5          1,586,032
Total                    100                                 6,462,400
Variance:
$$s^2 = \frac{\sum_{i=1}^{6} f_i (x_i - \bar{x})^2}{\sum_{i=1}^{6} f_i - 1} = \frac{6{,}462{,}400}{99} = 65{,}276.77 \ (\$^2)$$
Standard deviation:
$$s = \sqrt{65{,}276.77} = 255.49 \ (\$)$$
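The grouped computations follow the same pattern, with each class mark weighted by its frequency. This Python sketch uses the class marks and frequencies from the tables above. Like the worked solution, it divides the sum of squares by Σf − 1 = 99.

```python
from math import sqrt

marks = [299.5, 499.5, 699.5, 899.5, 1099.5, 1299.5]   # class marks x_i ($)
freqs = [5, 15, 25, 30, 18, 7]                        # numbers of workers f_i

n = sum(freqs)
x_bar = sum(x * f for x, f in zip(marks, freqs)) / n                 # 823.5
mean_dev = sum(abs(x - x_bar) * f for x, f in zip(marks, freqs)) / n
ss = sum(f * (x - x_bar) ** 2 for x, f in zip(marks, freqs))
s2 = ss / (n - 1)          # sample variance of grouped data
s = sqrt(s2)               # sample standard deviation

print(round(mean_dev, 2), round(s2, 2), round(s, 2))
```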
Comparison of the variation of two distributions
The values of the standard deviations cannot be used as the basis of the comparison because:
(a) the units of measurement of the two distributions may be different, and
(b) the average values of the two distributions may be widely dissimilar.
The correct measure to use is the coefficient of variation (CV):
$$CV = \frac{s}{\bar{x}} \times 100\%$$
Example 4
The following table shows the summary statistics for the daily wages of two types of workers.
Worker's type   Mean daily wage   Standard deviation
I                    $100               $20
II                   $150               $24
Compare these two daily wage distributions.
Solution
In comparison       Distribution   Reason
Average magnitude   II > I         x̄_II = 150 > x̄_I = 100
Variation           I > II         CV_I = (20/100) × 100% = 20% > CV_II = (24/150) × 100% = 16%
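Here is a minimal Python sketch of the comparison. The helper `coefficient_of_variation` is a hypothetical name; it returns the CV as a percentage, so the two distributions are compared on a scale-free basis.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: standard deviation relative to the mean."""
    return sd / mean * 100

cv_I = coefficient_of_variation(20, 100)    # Type I workers
cv_II = coefficient_of_variation(24, 150)   # Type II workers
more_variable = "I" if cv_I > cv_II else "II"
print(round(cv_I, 1), round(cv_II, 1), more_variable)
```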
Chapter Two - Probability
Introduction and concepts
“Perhaps it was man's unquenchable thirst for gambling that led to the early development of probability theory. In an effort to increase their winnings, gamblers called upon the mathematicians to provide optimum strategies for various games of chance.” (Walpole, R.E., Introduction to Statistics)
Probability is the basis upon which the discipline of statistics has been developed, and it is applied in many fields associated with chance occurrences, such as politics, business, weather forecasting and scientific research. Probability may be taken as a tool with which we can solve problems involving uncertainty. In fact, uncertainty is a basic element of human experience. To cite some examples: travelling time, number of customers, rainfall, temperature, share price movements, length of our life, etc.
There are three approaches to understanding probability. In the empirical approach, probability is taken as a relative frequency. As such, the probability of an aeroplane arriving at its destination on time may be taken as the proportion of times the aeroplane has been on time in the past, say, over one thousand flights.
In the classical approach, suppose that in a trial of an experiment there are k possible outcomes which are equally likely. The probability of the occurrence of an outcome is therefore 1/k. Thus, in tossing a coin, the probability of getting a head is 1/2. In our course we shall adopt this approach, but the empirical approach is always useful in giving us some intuition to understand the problem.
The third approach is very mathematical. A number of axioms have been set up, and from these some theorems of probability have been developed. This approach is very abstract and is usually used by mathematicians.
Some Basic Concepts
Sample space: the set of all possible outcomes of an experiment.
Event: a subset of a sample space.
To find the probability of an event when all outcomes are equally likely, we count the number of outcomes in the event and the number of all possible outcomes of the experiment, and then divide the former by the latter. Hence the following counting rules may be helpful.
Some counting rules
Example 1
Three items are selected at random from a manufacturing process. Each item is inspected and classified as defective (D) or non-defective (N).
Its sample space is S = {NNN, NND, NDN, DNN, NDD, DND, DDN, DDD}.
Example 2
Consider the event that the number of defectives in the above example is greater than 1.
The event is A = {DDD, DDN, DND, NDD}.
The probability of the event is 4/8 or 1/2.
Example 3
Suppose a licence plate contains two letters followed by three digits, with the first digit not zero. How many different licence plates can be printed?
                     1st letter   2nd letter   1st digit   2nd digit   3rd digit
Number of choices    A - Z (26)   A - Z (26)   1 - 9 (9)   0 - 9 (10)  0 - 9 (10)
The number of different licence plates that can be printed is
(26)(26)(9)(10)(10) = 608,400
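The multiplication rule can be checked by brute force in Python. The sketch below multiplies the numbers of choices and then, as a cross-check, enumerates all 608,400 plates with `itertools.product`. The enumeration takes a fraction of a second.

```python
from itertools import product
from math import prod
from string import ascii_uppercase, digits

choices = [26, 26, 9, 10, 10]   # letter, letter, non-zero digit, digit, digit
count = prod(choices)           # multiplication rule

# Cross-check: enumerate every plate explicitly (digits[1:] excludes '0').
brute = sum(1 for _ in product(ascii_uppercase, ascii_uppercase,
                               digits[1:], digits, digits))
print(count, brute)
```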
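Example 4
Find the possible permutations (the number of ways in which the order of the letters is counted) of the 3 letters A, B, C.
Consider a tree diagram: the first letter can be chosen in 3 ways, the second in 2 ways and the last in 1 way, giving (3)(2)(1) = 6 permutations: ABC, ACB, BAC, BCA, CAB, CBA.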
The number of permutations of n distinct objects is
$${}_{n}P_{n} = n(n-1)(n-2)\cdots(2)(1) = n!$$
The number of permutations of n distinct objects taken r at a time is
$${}_{n}P_{r} = n(n-1)(n-2)\cdots(n-r+1) = \frac{n(n-1)\cdots(n-r+1)(n-r)(n-r-1)\cdots(2)(1)}{(n-r)(n-r-1)\cdots(2)(1)} = \frac{n!}{(n-r)!}$$
e.g. The number of 3-letter words formed from 5 letters is
$${}_{5}P_{3} = \frac{5!}{(5-3)!} = 60$$
The number of distinct permutations of n objects, of which n_1 are alike of the first kind, n_2 are alike of the second kind, ..., n_k are alike of the k-th kind, with n_1 + n_2 + ... + n_k = n, is
$$\frac{n!}{n_1!\, n_2! \cdots n_k!}$$
Find the possible permutations of the following 5 letters: A A A B C.
There are five objects, of which three are alike.
The answer is
$$\frac{5!}{3!\,1!\,1!} = 20$$
Example 5
How many 7-letter words can be formed using the letters of the word 'BENZENE'? (There are 1 B, 3 E, 2 N and 1 Z.)
The number of 7-letter words that can be formed is
$$\frac{7!}{1!\,3!\,2!\,1!} = 420$$
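The permutation formulas map directly onto `math.perm` and `math.factorial` (Python 3.8+). The helper `distinct_arrangements` is a hypothetical name; it applies n!/(n₁! n₂! ... n_k!) to the letter counts of a word. A brute-force count of distinct orderings confirms the A A A B C result.

```python
from collections import Counter
from itertools import permutations
from math import factorial, perm, prod

print(perm(3), perm(5, 3))     # 3P3 = 3! and 5P3 = 5!/(5-3)!

def distinct_arrangements(word):
    """n! / (n1! n2! ... nk!) for a word containing repeated letters."""
    counts = Counter(word)
    return factorial(len(word)) // prod(factorial(c) for c in counts.values())

print(distinct_arrangements("AAABC"), distinct_arrangements("BENZENE"))
print(len(set(permutations("AAABC"))))   # brute-force count of distinct orders
```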



The number of combinations (the number of ways in which the order is not counted) of n distinct objects taken r at a time is
$${}_{n}C_{r} = \frac{n!}{r!\,(n-r)!}$$
Find the possible combinations of 5 distinct objects taken 3 at a time.
The answer is
$${}_{5}C_{3} = \frac{5!}{3!\,(5-3)!} = 10$$
Example 6
The number of 3-person committees that can be formed from a group of 4 persons is
$${}_{4}C_{3} = \frac{4!}{3!\,(4-3)!} = 4$$
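In Python, `math.comb` computes nCr, and `itertools.combinations` lists the unordered selections themselves. In this sketch the four people are given the hypothetical labels P1 to P4.

```python
from itertools import combinations
from math import comb

print(comb(5, 3))                                      # 5C3
committees = list(combinations(["P1", "P2", "P3", "P4"], 3))
print(len(committees), comb(4, 3))                     # listed vs. formula
print(committees)
```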
Example 7
A box contains 8 eggs, 3 of which are rotten. Three eggs are picked at random. Find the probabilities of the following events.
(a) Exactly two eggs are rotten.
(b) All eggs are rotten.
(c) No egg is rotten.
Solution:
(a) The 8 eggs can be divided into 2 groups, namely, the 3 rotten eggs as the first group and the 5 good eggs as the second group.
Getting 2 rotten eggs among 3 randomly selected eggs can occur if we randomly select 2 eggs from the first group and 1 egg from the second group.
The number of such outcomes is (3C2)(5C1) = 15.
The total number of possible outcomes of selecting 3 eggs randomly from the 8 eggs is 8C3 = 56.
Thus the probability of having exactly two rotten eggs among the 3 randomly selected eggs is
$$\frac{({}_{3}C_{2})({}_{5}C_{1})}{{}_{8}C_{3}} = \frac{15}{56}$$
(b) Similarly, the probability of having all 3 rotten eggs is
$$\frac{({}_{3}C_{3})({}_{5}C_{0})}{{}_{8}C_{3}} = \frac{1}{56}$$
(c) The probability of having no rotten egg is
$$\frac{({}_{3}C_{0})({}_{5}C_{3})}{{}_{8}C_{3}} = \frac{10}{56} = \frac{5}{28}$$
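All three parts use the same counting argument: choose k rotten eggs and 3 − k good ones, then divide by 8C3. This Python sketch wraps that in a hypothetical helper `prob_rotten` that returns exact fractions. The probabilities for k = 0, 1, 2, 3 should sum to 1.

```python
from fractions import Fraction
from math import comb

def prob_rotten(k, rotten=3, good=5, drawn=3):
    """P(exactly k rotten eggs among `drawn` eggs picked at random)."""
    return Fraction(comb(rotten, k) * comb(good, drawn - k),
                    comb(rotten + good, drawn))

print(prob_rotten(2), prob_rotten(3), prob_rotten(0))
print(sum(prob_rotten(k) for k in range(4)))   # all cases together
```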
Rules of probability
The following rules may help us to find the probability of an event.
Addition Rule: For any two events (not necessarily mutually exclusive),
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
where A ∪ B is the union of two sets A and B, i.e. the set of elements that belong to A or to B or to both, and
A ∩ B is the intersection of two sets A and B, i.e. the set of elements that are common to A and B.
Illustrative example
180 students took examinations in English and Mathematics. Their results were as follows:
Number of students passing English = 80
Number of students passing Mathematics = 120
Number of students passing at least one subject = 144
We can then rewrite the above results as:
Probability that a randomly selected student passed English = 80/180 = 4/9
Probability that a randomly selected student passed Mathematics = 120/180 = 2/3
Probability that a randomly selected student passed at least one subject = 144/180 = 4/5
Find the probability that a randomly selected student passed both subjects.
Solution
Let E be the event of passing English, and M be the event of passing Mathematics.
It is given that: P(E) = 4/9; P(M) = 2/3; P(E ∪ M) = 4/5.
As P(E ∪ M) = P(E) + P(M) − P(E ∩ M),
$$P(E \cap M) = P(E) + P(M) - P(E \cup M) = \frac{4}{9} + \frac{2}{3} - \frac{4}{5} = \frac{14}{45} = 0.31$$
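Rearranging the addition rule can be done exactly with `fractions.Fraction`, as in this short Python sketch. Multiplying the result by 180 recovers the number of students who passed both subjects.

```python
from fractions import Fraction

students = 180
p_E = Fraction(80, students)        # passed English
p_M = Fraction(120, students)       # passed Mathematics
p_E_or_M = Fraction(144, students)  # passed at least one subject

p_E_and_M = p_E + p_M - p_E_or_M    # addition rule, solved for P(E ∩ M)
print(p_E_and_M, float(p_E_and_M), p_E_and_M * students)
```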
Example 8
A card is drawn from a complete deck of playing cards. What is the probability that the card is a heart or an ace?
Solution
Let A be the event of getting a heart, and B be the event of getting an ace.
The probability that the card is a heart or an ace is
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{13}{52} + \frac{4}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13}$$
For mutually exclusive events,
$$P(A \cup B) = P(A) + P(B)$$
What is the probability of getting a total of '7' or '11' when a pair of dice is tossed?
Solution
Total number of possible outcomes = (6)(6) = 36
Possible outcomes of getting a total of '7': {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}
Possible outcomes of getting a total of '11': {(5,6), (6,5)}
Let A be the event of getting a total of '7', and B be the event of getting a total of '11'.
The probability of getting a total of '7' or '11' is
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = P(A) + P(B) \quad (\text{as } A \text{ and } B \text{ are mutually exclusive}) = \frac{6}{36} + \frac{2}{36} = \frac{2}{9}$$
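With only 36 equally likely outcomes, the events can be enumerated directly. This Python sketch builds A and B as sets of outcome pairs, confirms that they share no outcome, and counts the union.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely pairs
A = {o for o in outcomes if sum(o) == 7}
B = {o for o in outcomes if sum(o) == 11}

assert not (A & B)          # mutually exclusive: no common outcome
p = Fraction(len(A | B), len(outcomes))
print(len(A), len(B), p)
```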
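If A and A' are complementary events, then
$$P(A) = 1 - P(A')$$
Example 9
A coin is tossed six times in succession. What is the probability that at least one head occurs?
Let A be the event that at least one head occurs in six successive tosses; its complement A' is the event that all six tosses are tails.
$$P(A) = 1 - P(A') = 1 - \left(\frac{1}{2}\right)\left(\frac{1}{2}\right)\left(\frac{1}{2}\right)\left(\frac{1}{2}\right)\left(\frac{1}{2}\right)\left(\frac{1}{2}\right) = \frac{63}{64}$$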
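The complement rule gives the answer in one line. This Python sketch checks it against a count over all 2⁶ = 64 equally likely toss sequences.

```python
from fractions import Fraction
from itertools import product

p_no_head = Fraction(1, 2) ** 6        # P(A'): all six tosses are tails
p_at_least_one = 1 - p_no_head         # complement rule

# Cross-check by listing all 2**6 sequences of tosses.
seqs = list(product("HT", repeat=6))
brute = Fraction(sum("H" in s for s in seqs), len(seqs))
print(p_at_least_one, brute)
```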
Conditional Probability
Let A and B be two events. The conditional probability of event A given that event B has occurred, denoted by P(A|B), is defined as
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad \text{provided that } P(B) > 0.$$
Similarly, the conditional probability of B given that event A has occurred is defined as
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}, \quad \text{provided that } P(A) > 0.$$
Example 10
A hamburger chain found that 75% of all customers use mustard, 80% use ketchup, and 65% use both when ordering a hamburger. What are the probabilities that:
(a) a ketchup-user uses mustard?
(b) a mustard-user uses ketchup?
Solution
Let A be the event of using mustard, and B be the event of using ketchup.
It is given that: P(A) = 0.75; P(B) = 0.80; P(A ∩ B) = 0.65.
(a)
$$P(\text{a ketchup-user uses mustard}) = P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0.65}{0.80} = 0.8125$$
(b)
$$P(\text{a mustard-user uses ketchup}) = P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{0.65}{0.75} = 0.8667$$
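This Python sketch implements the definition of conditional probability directly. The helper `conditional` is a hypothetical name. It refuses a conditioning event with zero probability, mirroring the "provided that" clause.

```python
def conditional(p_joint, p_given):
    """P(A | B) = P(A ∩ B) / P(B), defined only when P(B) > 0."""
    if p_given <= 0:
        raise ValueError("conditioning event must have positive probability")
    return p_joint / p_given

p_mustard, p_ketchup, p_both = 0.75, 0.80, 0.65
p_mustard_given_ketchup = conditional(p_both, p_ketchup)   # (a)
p_ketchup_given_mustard = conditional(p_both, p_mustard)   # (b)
print(round(p_mustard_given_ketchup, 4), round(p_ketchup_given_mustard, 4))
```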
Multiplicative Rule
$$P(A \cap B) = P(A)\,P(B \mid A) = P(B)\,P(A \mid B)$$
Statistical Independence: the occurrence or non-occurrence of one event has no effect on the probability of occurrence of the other event.
Two events A and B are independent if and only if
$$P(A \cap B) = P(A)\,P(B)$$
Example 11
A pair of fair dice is thrown twice. What is the probability of getting totals of 7 and 11?
Solution
Let A_i be the event of getting '7' in the i-th throw, and B_j be the event of getting '11' in the j-th throw.
$$\begin{aligned}
P(\text{getting totals of 7 and 11}) &= P(A_1 \cap B_2) + P(B_1 \cap A_2) \\
&= P(A_1)\,P(B_2 \mid A_1) + P(B_1)\,P(A_2 \mid B_1) \\
&= P(A_1)\,P(B_2) + P(B_1)\,P(A_2) \quad (A_i \text{ and } B_j \text{ are independent for } i \neq j) \\
&= \left(\frac{6}{36}\right)\left(\frac{2}{36}\right) + \left(\frac{2}{36}\right)\left(\frac{6}{36}\right) = \frac{1}{54}
\end{aligned}$$
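This Python sketch computes the answer with the independence formula and then checks it by brute force. The brute-force part runs over all 36 × 36 ordered pairs of throws and counts those whose totals are 7 and 11 in either order.

```python
from fractions import Fraction
from itertools import product

one_throw = list(product(range(1, 7), repeat=2))
p7 = Fraction(sum(1 for o in one_throw if sum(o) == 7), 36)     # P(A_i)
p11 = Fraction(sum(1 for o in one_throw if sum(o) == 11), 36)   # P(B_j)

# Throws are independent, so the joint probabilities multiply.
p_formula = p7 * p11 + p11 * p7

# Brute force over all ordered pairs of throws.
hits = sum(1 for t1, t2 in product(one_throw, repeat=2)
           if sorted((sum(t1), sum(t2))) == [7, 11])
p_brute = Fraction(hits, 36 * 36)
print(p_formula, p_brute)
```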

Theorem of Total Probability
If the events B_1, B_2, ..., B_k constitute a partition of the sample space S such that P(B_i) ≠ 0 for i = 1, 2, ..., k, then for any event A of S
$$\begin{aligned}
P(A) &= P(A \cap B_1) + P(A \cap B_2) + \cdots + P(A \cap B_k) \\
&= P(B_1)\,P(A \mid B_1) + P(B_2)\,P(A \mid B_2) + \cdots + P(B_k)\,P(A \mid B_k)
\end{aligned}$$
Example 12
Suppose 50% of the cars are manufactured in the United States and 15% of these are compact; 30% of the cars are manufactured in Europe and 40% of these are compact; and finally, 20% are manufactured in Japan and 60% of these are compact. If a car is picked at random from the lot, find the probability that it is a compact.
Let A be the event that the car is compact,
B_1 be the event that the car is manufactured in the United States,
B_2 be the event that the car is manufactured in Europe, and
B_3 be the event that the car is manufactured in Japan.
$$\begin{aligned}
P(A) &= P(A \cap B_1) + P(A \cap B_2) + P(A \cap B_3) \\
&= P(B_1)\,P(A \mid B_1) + P(B_2)\,P(A \mid B_2) + P(B_3)\,P(A \mid B_3) \\
&= (0.50)(0.15) + (0.30)(0.40) + (0.20)(0.60) = 0.315
\end{aligned}$$
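The theorem amounts to a weighted sum over the partition. In this Python sketch each origin is stored with its pair (P(B_i), P(compact | B_i)). The priors must add up to 1 because the origins partition the lot.

```python
# Partition of the cars by origin: (P(B_i), P(compact | B_i)).
origins = {
    "United States": (0.50, 0.15),
    "Europe": (0.30, 0.40),
    "Japan": (0.20, 0.60),
}
p_compact = sum(p_b * p_a_given_b for p_b, p_a_given_b in origins.values())
print(round(p_compact, 3))
```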
Bayes' Theorem
If E_1, E_2, ..., E_k are mutually exclusive events such that E_1 ∪ E_2 ∪ ... ∪ E_k contains all sample points of S, then for any event D of S with P(D) ≠ 0,
$$P(E_i \mid D) = \frac{P(E_i \cap D)}{P(D)} = \frac{P(E_i \cap D)}{\sum_{j=1}^{k} P(E_j \cap D)} = \frac{P(E_i)\,P(D \mid E_i)}{P(E_1)\,P(D \mid E_1) + P(E_2)\,P(D \mid E_2) + \cdots + P(E_k)\,P(D \mid E_k)}$$
Example 13
Suppose a box contains 2 red balls and 1 white ball, and a second box contains 2 red balls and 2 white balls. One of the boxes is selected by chance and a ball is drawn from it. If the drawn ball is red, what is the probability that it came from the 1st box?
Solution
Let A be the event of drawing a red ball, and B be the event of choosing the 1st box.
Given: P(B) = P(B') = 1/2; P(A|B) = 2/3; P(A|B') = 2/4.
$$\begin{aligned}
P(\text{coming from the 1st box} \mid \text{the drawn ball is red}) = P(B \mid A) &= \frac{P(B \cap A)}{P(A)} = \frac{P(B \cap A)}{P(B \cap A) + P(B' \cap A)} \\
&= \frac{P(B)\,P(A \mid B)}{P(B)\,P(A \mid B) + P(B')\,P(A \mid B')} \\
&= \frac{\left(\frac{1}{2}\right)\left(\frac{2}{3}\right)}{\left(\frac{1}{2}\right)\left(\frac{2}{3}\right) + \left(\frac{1}{2}\right)\left(\frac{2}{4}\right)} = \frac{4}{7}
\end{aligned}$$
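Bayes' theorem can be written as a small Python function. The function `bayes` is a hypothetical name. It forms the joint probabilities P(E_i)P(D | E_i), adds them to get P(D) by the theorem of total probability, and normalizes. Exact fractions reproduce the 4/7 above.

```python
from fractions import Fraction

def bayes(priors, likelihoods):
    """Posterior P(E_i | D) for mutually exclusive, exhaustive events E_i."""
    joints = [p * l for p, l in zip(priors, likelihoods)]   # P(E_i) P(D | E_i)
    total = sum(joints)                                      # P(D)
    return [j / total for j in joints]

half = Fraction(1, 2)
posterior = bayes([half, half], [Fraction(2, 3), Fraction(2, 4)])
print(posterior)   # [P(1st box | red), P(2nd box | red)]
```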
Chapter Three - Probability Distributions
To cope with the uncertainty of outcomes, a statistical model that describes the behavior of the outcome is needed. These theoretical models, which are very similar to relative frequency distributions, are called probability distributions.
Random Variables - A random variable is a variable that takes on different numerical values determined by the outcomes of a random experiment.
Example 1
An experiment of tossing a coin 3 times.
Let the random variable X be the number of heads obtained.
As S = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}, X takes the values {0, 1, 2, 3}.
Discrete random variable - in a given interval, only a specified (countable) number of values can occur.
Continuous random variable - in a given interval, any value can occur.
Probability distribution of a random variable - a representation of the probabilities for all the possible outcomes.
Example 2
The probability distribution of the number of heads obtained when a coin is tossed 4 times:
x           0      1      2      3      4
P(X = x)   1/16   4/16   6/16   4/16   1/16
That is,
$$P(X = x) = \frac{{}_{4}C_{x}}{16}, \quad x = 0, 1, 2, 3, 4$$
Example 3
Consider an experiment of tossing two fair dice.
Let the random variable X be the sum of the two dice. Then the probability distribution of X is:
x           2      3      4      5      6      7      8      9      10     11     12
P(X = x)   1/36   2/36   3/36   4/36   5/36   6/36   5/36   4/36   3/36   2/36   1/36
The probability function, f(x), of a discrete random variable X expresses the probability that X takes the value x, as a function of x. That is,
$$f(x) = P(X = x)$$
where the function is evaluated at all possible values of x.
Properties of the probability function P(X = x):
1. P(X = x) ≥ 0 for any value x.
2. $$\sum_{x} P(X = x) = 1$$
Mathematical Expectations
The expected value, E(X), of a discrete random variable X is defined as
$$E(X) = \mu_X = \sum_{x} x\,P(X = x)$$
It is the mean of the probability distribution.
Let X be a random variable. The expectation of the squared discrepancy about the mean, (X − μ_X)², is called the variance, denoted σ_X², and given by
$$\sigma_X^2 = \operatorname{Var}(X) = E\left[(X - \mu_X)^2\right] = \sum_{x} (x - \mu_X)^2\,P(X = x) = \sum_{x} x^2\,P(X = x) - \mu_X^2$$
Example 4
Calculate the mean and variance of the discrete probability distributions in Examples 2 and 3.
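Here is one way to check the answers to Example 4 in Python. The helper `mean_var` is a hypothetical name. It takes a distribution as a dictionary {x: P(X = x)} with exact fractions, checks property 2, and applies E(X) = Σ x P(X = x) and Var(X) = Σ x² P(X = x) − μ². The expected results are mean 2 and variance 1 for Example 2, and mean 7 and variance 35/6 for Example 3.

```python
from fractions import Fraction
from itertools import product
from math import comb

def mean_var(dist):
    """E(X) and Var(X) = sum x^2 P(X=x) - mu^2 for a dict {x: P(X=x)}."""
    assert sum(dist.values()) == 1          # property 2 of a probability function
    mu = sum(x * p for x, p in dist.items())
    var = sum(x * x * p for x, p in dist.items()) - mu ** 2
    return mu, var

heads = {x: Fraction(comb(4, x), 16) for x in range(5)}      # Example 2

dice = {}                                                    # Example 3
for a, b in product(range(1, 7), repeat=2):
    dice[a + b] = dice.get(a + b, 0) + Fraction(1, 36)

print(mean_var(heads), mean_var(dice))
```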
The Normal Distribution
The normal distribution is the probability distribution of a continuous random variable. It is based on the Law of Errors, which states that
1. Errors are inevitable.
2. Large errors are less likely than small errors.
3. Positive and negative errors are equally likely.
Definition:
A continuous random variable X is defined to be a normal random variable if its probability density function is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right], \quad -\infty < x < \infty$$
where μ = the mean of X, σ = the standard deviation of X, and π ≈ 3.14159.
Notation: X ~ N(μ, σ²)
Properties of the normal distribution:
1. It is a continuous distribution.
2. The curve is symmetric and bell-shaped about a vertical axis through the mean μ.
3. The total area under the curve and above the horizontal axis is equal to 1.
4. Area under the normal curve:
Approximately 68% of the values in a normally distributed population are within 1 standard deviation of the mean.
Approximately 95.5% of the values in a normally distributed population are within 2 standard deviations of the mean.
Approximately 99.7% of the values in a normally distributed population are within 3 standard deviations of the mean.
The standard normal curve:
The distribution of a normal random variable with μ = 0 and σ = 1 is called a standard normal distribution. A standard normal random variable is usually denoted by Z.
Notation: Z ~ N(0, 1)
Remark: A table of Z is usually set up to give the probability P(Z > z) for z ≥ 0.
Example 7
Given Z ~ N(0, 1):
(a) P(Z > 1.73) = 0.0418
(b) P(0 < Z < 1.73) = P(Z > 0) − P(Z > 1.73) = 0.5 − 0.0418 = 0.4582
(c) P(−2.42 < Z < 0.8) = 1 − P(Z < −2.42) − P(Z > 0.8)
    = 1 − 0.00776 − 0.2119 = 0.78034
(d) P(1.8 < Z < 2.8) = P(Z > 1.8) − P(Z > 2.8) = 0.0359 − 0.00256 = 0.03334
(e) Find the value z that has
(i) 5% of the area below it.
Let the corresponding z value be z_1; then P(Z < z_1) = 0.05.
From the standard normal distribution table, P(Z < −1.64) ≈ 0.05.
So z_1 = −1.64.
(ii) 39.44% of the area between 0 and z.
Let the corresponding z value be z_1; then P(0 < Z < z_1) = 0.3944.
From the standard normal distribution table, P(0 < Z < 1.25) = 0.3944.
So z_1 = 1.25.
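Table lookups can be reproduced with `statistics.NormalDist` (Python 3.8+). In this sketch, `cdf` gives P(Z < z) and `inv_cdf` inverts it. The exact values differ from the four-digit table entries only in the last digit or so; for example, the exact 5% point is −1.645 rather than −1.64. The helper `upper` is a hypothetical name for the upper-tail probability the table lists.

```python
from statistics import NormalDist

Z = NormalDist()                      # standard normal: mu = 0, sigma = 1

def upper(z):
    """P(Z > z), the quantity tabulated in the standard normal table."""
    return 1 - Z.cdf(z)

a = upper(1.73)                       # (a) P(Z > 1.73)
b = Z.cdf(1.73) - Z.cdf(0)            # (b) P(0 < Z < 1.73)
c = Z.cdf(0.8) - Z.cdf(-2.42)         # (c) P(-2.42 < Z < 0.8)
d = Z.cdf(2.8) - Z.cdf(1.8)           # (d) P(1.8 < Z < 2.8)
e_i = Z.inv_cdf(0.05)                 # (e)(i) 5% of the area below z
e_ii = Z.inv_cdf(0.5 + 0.3944)        # (e)(ii) 39.44% between 0 and z
print([round(v, 4) for v in (a, b, c, d, e_i, e_ii)])
```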
Theorem:
If X is a normal random variable with mean μ and standard deviation σ, then
$$Z = \frac{X - \mu}{\sigma}$$
is a standard normal random variable, and hence
$$P(x_1 < X < x_2) = P\left(\frac{x_1 - \mu}{\sigma} < Z < \frac{x_2 - \mu}{\sigma}\right)$$
Example 8
Given X ~ N(50, 10²), find P(45 < X < 62).
Solution:
$$\begin{aligned}
P(45 < X < 62) &= P\left(\frac{45 - 50}{10} < Z < \frac{62 - 50}{10}\right) = P(-0.5 < Z < 1.2) \\
&= 1 - P(Z < -0.5) - P(Z > 1.2) = 1 - 0.3085 - 0.1151 = 0.5764
\end{aligned}$$
Example 9
The charge account balances at a certain department store are approximately normally distributed, with an average balance of $80 and a standard deviation of $30. What is the probability that a randomly selected charge account has a balance
(a) over $125;
(b) between $65 and $95?
Let X be the balance in the charge account. X ~ N(80, 30²).
(a)
$$P(X > 125) = P\left(Z > \frac{125 - 80}{30}\right) = P(Z > 1.5) = 0.0668$$
(b)
$$\begin{aligned}
P(65 < X < 95) &= P\left(\frac{65 - 80}{30} < Z < \frac{95 - 80}{30}\right) = P(-0.5 < Z < 0.5) \\
&= 1 - P(Z < -0.5) - P(Z > 0.5) = 1 - 0.3085 - 0.3085 = 0.3830
\end{aligned}$$
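The standardization theorem can be checked in Python. The hypothetical helper `prob_between` converts both limits to z-scores and subtracts standard normal CDF values. Part (a) of Example 9 instead uses `NormalDist(80, 30)` directly, which gives the same answer without standardizing.

```python
from statistics import NormalDist

def prob_between(x1, x2, mu, sigma):
    """P(x1 < X < x2) for X ~ N(mu, sigma^2), via Z = (X - mu) / sigma."""
    Z = NormalDist()
    z1, z2 = (x1 - mu) / sigma, (x2 - mu) / sigma
    return Z.cdf(z2) - Z.cdf(z1)

ex8 = prob_between(45, 62, mu=50, sigma=10)       # Example 8
ex9a = 1 - NormalDist(80, 30).cdf(125)            # Example 9(a): P(X > 125)
ex9b = prob_between(65, 95, mu=80, sigma=30)      # Example 9(b)
print(round(ex8, 4), round(ex9a, 4), round(ex9b, 4))
```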
Example 10
In an examination the average grade was 74 and the standard deviation was 7. If 12% of the class are given A's, and the grades are curved to follow a normal distribution, what is the lowest possible A and the highest possible B?
Let X be the examination grade and x_1 be the lowest grade for an A.
$$P(X > x_1) = P\left(Z > \frac{x_1 - 74}{7}\right) = 0.12$$
From the standard normal distribution table, we get P(Z > 1.17) = 0.1210 and P(Z > 1.18) = 0.1190,
so P(Z > 1.175) ≈ 0.12.
Thus
$$\frac{x_1 - 74}{7} = 1.175,$$
i.e. x_1 = 74 + (7)(1.175) = 82.225.
Since grades are whole numbers, the lowest possible A is 83 and the highest possible B is 82.
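The inverse problem can be solved with `NormalDist.inv_cdf`, which avoids interpolating in the table. As in the example, this Python sketch assumes that grades are whole numbers. The lowest A is then the first integer above the cut-off.

```python
from math import ceil
from statistics import NormalDist

grades = NormalDist(mu=74, sigma=7)
cutoff = grades.inv_cdf(1 - 0.12)   # the top 12% of grades lie above this value
lowest_A = ceil(cutoff)             # assumes whole-number grades
highest_B = lowest_A - 1
print(round(cutoff, 3), lowest_A, highest_B)
```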

The Binomial Distribution
A binomial experiment possesses the following properties:
1. There are n identical observations or trials.
2. Each trial has two possible outcomes, one called “success” and the other “failure”. The outcomes are mutually exclusive and collectively exhaustive.
3. The probabilities of success, p, and of failure, 1 − p, remain the same for all trials.
4. The outcomes of the trials are independent of each other.
Example 11
1. Testing 10 items as they come off an assembly line, where each test or trial may indicate a defective or a non-defective item.
2. Drawing five cards with replacement from an ordinary deck, where each trial is labelled a success or a failure depending on whether the card is red or black.
Definition:
In a binomial experiment with a constant probability p of success at each trial, the probability distribution of the binomial random variable X, the number of successes in n independent trials, is called the binomial distribution.
Notation: X ~ b(n, p)
$$P(X = x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, \ldots, n$$
where p + q = 1.
Example 12
Of a large number of mass-produced articles, one-tenth is defective. Find the probabilities that a
random sample of 20 will obtain
(a) exactly two defective articles;
(b) at least two defective articles.
Let X be the number of defective articles in a random sample of 20. \( X \sim b\left(20, \frac{1}{10}\right) \)
(a) \[ P(X = 2) = \binom{20}{2}\left(\frac{1}{10}\right)^2\left(\frac{9}{10}\right)^{18} = 0.2852 \]
(b) \[ P(X \ge 2) = 1 - P(X = 0) - P(X = 1) = 1 - \binom{20}{0}\left(\frac{1}{10}\right)^0\left(\frac{9}{10}\right)^{20} - \binom{20}{1}\left(\frac{1}{10}\right)^1\left(\frac{9}{10}\right)^{19} = 1 - 0.12158 - 0.27017 = 0.60825 \]
Example 13
A test consists of 6 questions, and to pass the test a student has to answer at least 4 questions
correctly. Each question has three possible answers, of which only one is correct. If a student
guesses on each question, what is the probability that the student will pass the test?
Let X be the number of correctly answered questions among 6 questions. \( X \sim b\left(6, \frac{1}{3}\right) \)
\[ P(X \ge 4) = \sum_{x=4}^{6}\binom{6}{x}\left(\frac{1}{3}\right)^x\left(\frac{2}{3}\right)^{6-x} = \binom{6}{4}\left(\frac{1}{3}\right)^4\left(\frac{2}{3}\right)^2 + \binom{6}{5}\left(\frac{1}{3}\right)^5\left(\frac{2}{3}\right)^1 + \binom{6}{6}\left(\frac{1}{3}\right)^6\left(\frac{2}{3}\right)^0 = 0.10014 \]
Theorem
The mean and variance of the binomial distribution with parameters n and p are
\( \mu = np \) and \( \sigma^2 = npq \) respectively, where \( p + q = 1 \).
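Examples 12 and 13 and the theorem above can be checked with a direct implementation of the binomial formula. A minimal sketch, assuming Python 3.8+ for `math.comb`; the helper name `binom_pmf` is our own, not from the notes:

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ b(n, p): C(n, x) * p**x * q**(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Example 12: X ~ b(20, 0.1)
ex12_a = binom_pmf(2, 20, 0.1)
ex12_b = 1 - binom_pmf(0, 20, 0.1) - binom_pmf(1, 20, 0.1)

# Example 13: X ~ b(6, 1/3), the student passes if X >= 4
ex13 = sum(binom_pmf(x, 6, 1 / 3) for x in range(4, 7))

# Theorem check: mean and variance computed from the pmf equal np and npq
n, p = 20, 0.1
mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
var = sum((x - mean) ** 2 * binom_pmf(x, n, p) for x in range(n + 1))

print(round(ex12_a, 4), round(ex12_b, 4), round(ex13, 4))  # 0.2852 0.6083 0.1001
print(round(mean, 6), round(var, 6))                       # 2.0 1.8
```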
Example 14
A packaging machine produces 20 percent defective packages. A random sample of ten packages
is selected. What are the mean and standard deviation of the binomial distribution of that process?
Let X be the number of defective packages in a sample of 10 packages. \( X \sim b(10, 0.2) \)
Its mean is \( \mu = np = (10)(0.2) = 2 \).
Its standard deviation is \( \sigma = \sqrt{npq} = \sqrt{(10)(0.2)(0.8)} = 1.265 \).
The Normal Approximation to the Binomial Distribution
Theorem:
Given X is a random variable which follows the binomial distribution with parameters n and p,
then
\[ P(X = x) \approx P\left(\frac{x - 0.5 - np}{\sqrt{npq}} < Z < \frac{x + 0.5 - np}{\sqrt{npq}}\right) \]
if n is large and p is not close to 0 or 1. The \( \pm 0.5 \) terms are the continuity correction, which
accounts for approximating a discrete distribution by a continuous one.
Remark: If both np and nq are greater than 5, the approximation will be good.
Example 15
A process yields 10% defective items. If 100 items are randomly selected from the process, what
is the probability that the number of defectives exceeds 13?
Let X be the number of defectives in a random sample of 100 items. \( X \sim b(100, 0.1) \)
\( np = (100)(0.1) = 10 \), \( \sqrt{npq} = \sqrt{(100)(0.1)(0.9)} = 3 \)
By normal approximation (X' denotes the approximating normal variable),
\[ P(X > 13) = P(X' > 13.5) = P\left(Z > \frac{13.5 - 10}{3}\right) = P(Z > 1.167) = 0.121 \]
Example 16
A multiple-choice quiz has 200 questions each with four possible answers of which only one is
the correct answer. What is the probability that sheer guesswork yields from 25 to 30 correct
answers for 80 of the 200 problems about which the student has no knowledge?
Let X be the number of correct answers for 80 questions with sheer guesswork. \( X \sim b(80, 0.25) \)
\( np = (80)(0.25) = 20 \), \( \sqrt{npq} = \sqrt{(80)(0.25)(0.75)} = \sqrt{15} \)
By normal approximation,
\[ P(25 \le X \le 30) = P(24.5 < X' < 30.5) = P\left(\frac{24.5 - 20}{\sqrt{15}} < Z < \frac{30.5 - 20}{\sqrt{15}}\right) = P(1.16 < Z < 2.71) = 0.1230 - 0.00336 = 0.1196 \]
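To see how good the normal approximation with continuity correction is in Examples 15 and 16, one can compare it with the exact binomial sum. A minimal sketch, assuming Python 3.8+; the helper names `binom_tail_exact` and `binom_tail_normal` are our own:

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_tail_exact(lo: int, hi: int, n: int, p: float) -> float:
    """Exact P(lo <= X <= hi) for X ~ b(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(lo, hi + 1))

def binom_tail_normal(lo: int, hi: int, n: int, p: float) -> float:
    """Normal approximation to P(lo <= X <= hi), using the continuity correction."""
    approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return approx.cdf(hi + 0.5) - approx.cdf(lo - 0.5)

# Example 15: P(X > 13) = P(14 <= X <= 100) for X ~ b(100, 0.1)
ex15 = (binom_tail_exact(14, 100, 100, 0.1), binom_tail_normal(14, 100, 100, 0.1))

# Example 16: P(25 <= X <= 30) for X ~ b(80, 0.25)
ex16 = (binom_tail_exact(25, 30, 80, 0.25), binom_tail_normal(25, 30, 80, 0.25))

print("Example 15 (exact, normal):", ex15)
print("Example 16 (exact, normal):", ex16)
```

The normal values differ from the worked answers only in the third or fourth decimal place, because the worked answers round z before using the table.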
The Poisson Distribution
Experiments yielding numerical values of a random variable X, the number of successes
(observations) occurring during a given time interval (or in a specified region), are often called
Poisson experiments.
A Poisson experiment has the following properties:
1. The number of successes in any interval is independent of the number of successes in any
other interval.
2. The probability of a single success occurring during a short interval is proportional to the
length of the time interval and does not depend on the number of successes occurring
outside this time interval.
3. The probability of more than one success in a very small interval is negligible.
Examples of random variables following a Poisson distribution
1. The number of customers arriving during a time period of length t.
2. The number of telephone calls per hour received by an office.
3. The number of typing errors per page.
4. The number of accidents occurring at a junction per day.
Definition:
The probability distribution of the Poisson random variable X is called the Poisson distribution.
Notation: \( X \sim Po(\lambda) \)
where \( \lambda \) is the average number of successes occurring in the given time interval.
\[ P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots, \quad e = 2.71828\ldots \]
Theorem: In a Poisson distribution the mean is equal to the variance, i.e., \( \mu = \sigma^2 = \lambda \).
Example 17
The average number of radioactive particles passing through a counter during 1 millisecond in a
laboratory experiment is 4. What is the probability that 6 particles enter the counter in a given
millisecond?
Let X be the number of particles entering the counter in a given millisecond. \( X \sim Po(4) \)
\[ P(X = 6) = \frac{e^{-4}4^6}{6!} = 0.1042 \]
Example 18
Ships arrive in a harbour at a mean rate of two per hour. Suppose that this situation can be
described by a Poisson distribution. Find the probabilities for a 30-minute period that
(a) no ships arrive;
(b) three ships arrive.
Let X be the number of ships arriving in the harbour in a 30-minute period. \( X \sim Po\left(2 \times \frac{1}{2}\right) = Po(1) \)
(a) \[ P(X = 0) = \frac{e^{-1}1^0}{0!} = 0.3679 \]
(b) \[ P(X = 3) = \frac{e^{-1}1^3}{3!} = 0.0613 \]
Theorem:
The mean and variance of the Poisson distribution are both equal to \( \lambda \).
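Examples 17 and 18, and the mean-equals-variance theorem, can be checked with a direct implementation of the Poisson formula. A minimal sketch in Python; the helper name `poisson_pmf` is our own, and the moment sums are truncated at x = 100, where the remaining tail of Po(4) is negligible:

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Po(lam): e**(-lam) * lam**x / x!."""
    return exp(-lam) * lam**x / factorial(x)

print(round(poisson_pmf(6, 4), 4))      # Example 17: 0.1042

lam = 2 * 0.5                           # Example 18: 2 ships per hour over half an hour
print(round(poisson_pmf(0, lam), 4), round(poisson_pmf(3, lam), 4))   # 0.3679 0.0613

# Theorem check for Po(4): mean and variance from the (truncated) pmf are both 4
mean = sum(x * poisson_pmf(x, 4) for x in range(100))
var = sum((x - mean) ** 2 * poisson_pmf(x, 4) for x in range(100))
```

Note how Example 18 rescales the rate: λ is proportional to the length of the interval, so two per hour becomes one per half hour.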
Poisson approximation to the binomial distribution
If n is large and p is near 0 or near 1.00 in the binomial distribution, then the binomial distribution
can be approximated by the Poisson distribution with parameter \( \lambda = np \). (When p is near 1,
interchange the roles of success and failure, so that \( \lambda = nq \).)
Example 20
If the probability that an individual suffers a bad reaction from a certain injection is 0.001,
determine the probability that out of 2000 individuals, more than 2 individuals will suffer a bad
reaction.
Solution: n = 2000, p = 0.001
According to the binomial distribution, the required probability
\[ = 1 - \left[\binom{2000}{0}(0.001)^0(0.999)^{2000} + \binom{2000}{1}(0.001)^1(0.999)^{1999} + \binom{2000}{2}(0.001)^2(0.999)^{1998}\right] \]
Using the Poisson distribution, with \( \lambda = np = 2 \):
\( P(0 \text{ suffer}) = \frac{e^{-2}2^0}{0!} = e^{-2} \)
\( P(1 \text{ suffers}) = \frac{e^{-2}2^1}{1!} = 2e^{-2} \)
\( P(2 \text{ suffer}) = \frac{e^{-2}2^2}{2!} = 2e^{-2} \)
Then the required probability \( = 1 - \frac{5}{e^2} = 0.323 \).
Generally speaking, the Poisson distribution will provide a good approximation to the binomial when
(i) n is at least 20 and p is at most 0.05; or
(ii) n is at least 100, in which case the approximation will generally be excellent provided p < 0.1.
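Example 20 falls under rule (ii), and the closeness of the two answers can be checked directly. A minimal sketch, assuming Python 3.8+ for `math.comb`:

```python
from math import comb, exp

n, p = 2000, 0.001
lam = n * p   # 2.0

# Exact binomial: P(X > 2) = 1 - [P(0) + P(1) + P(2)]
exact = 1 - sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(3))

# Poisson approximation: 1 - e^-2 (1 + 2 + 2) = 1 - 5 / e^2
approx = 1 - exp(-lam) * (1 + lam + lam**2 / 2)

print(round(exact, 4), round(approx, 4))   # 0.3233 0.3233
```

With n = 2000 and p = 0.001 the two answers agree to four decimal places, which is why the much simpler Poisson calculation is preferred.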
Example 21
Two percent of the output of a machine is defective. A lot of 300 pieces will be produced.
Determine the probability that exactly four pieces will be defective.
Let X be the number of defective pieces among 300 pieces. \( X \sim b(300, 0.02) \)
\[ P(X = 4) = \binom{300}{4}(0.02)^4(0.98)^{296} = 0.1338 \]
By Poisson approximation: \( \lambda = np = (300)(0.02) = 6 \)
\[ P(X = 4) = \frac{e^{-6}6^4}{4!} = 0.1338 \]
CHAPTER 4 - SAMPLING DISTRIBUTIONS AND ESTIMATION
Definition
1. A sample statistic is a characteristic of a sample.
A population parameter is a characteristic of a population.
2. A statistic is a random variable that depends only on the observed random sample.
3. A sampling distribution is a probability distribution for a sample statistic. It indicates the
extent to which a sample statistic will tend to vary because of chance variation in random
sampling.
4. The standard deviation of the distribution of a sample statistic is known as the standard
error of the statistic.
An illustrative example
Suppose a population consists of four elements, {0, 1, 2, 3}. A simple random sample of two
elements is to be drawn.
The population has two parameters: a mean \( \mu \) of 1.5, and a variance \( \sigma^2 \) of 1.6667
(computed with divisor N - 1 = 3, i.e. \( \sigma^2 = \sum (x_i - \mu)^2/(N - 1) = 5/3 \)).
Obviously there are six possible samples (\( C^4_2 = 6 \)). They are:
Sample | Sample mean | Error (sample mean - \( \mu \)) | Probability
0, 1 | 0.5 | -1.0 | 1/6
0, 2 | 1.0 | -0.5 | 1/6
0, 3 | 1.5 | 0 | 1/6
1, 2 | 1.5 | 0 | 1/6
1, 3 | 2.0 | 0.5 | 1/6
2, 3 | 2.5 | 1.0 | 1/6
From the above table, we can see that if we draw a sample and use the sample mean to estimate
the population mean, the accuracy of our estimate depends on which sample we have drawn,
which in turn depends on chance.
The probability distribution of the sample mean is known as the sampling distribution of the sample mean,
as compiled in the following table:
Sample mean | 0.5 | 1.0 | 1.5 | 2.0 | 2.5
Probability | 1/6 | 1/6 | 2/6 | 1/6 | 1/6
The expected value of the sample mean is
\[ E(\bar{x}) = 0.5\left(\tfrac{1}{6}\right) + 1.0\left(\tfrac{1}{6}\right) + 1.5\left(\tfrac{2}{6}\right) + 2.0\left(\tfrac{1}{6}\right) + 2.5\left(\tfrac{1}{6}\right) = 1.5 = \mu \]
Hence the average value of the sample mean is equal to the population mean. We call the sample
mean an unbiased estimator of the population mean.
The variance of the sample mean (i.e., the average squared deviation of the sample mean from the
population mean) is:
\[ V(\bar{x}) = E[(\bar{x} - \mu)^2] = (0.5 - 1.5)^2\tfrac{1}{6} + (1.0 - 1.5)^2\tfrac{1}{6} + (1.5 - 1.5)^2\tfrac{2}{6} + (2.0 - 1.5)^2\tfrac{1}{6} + (2.5 - 1.5)^2\tfrac{1}{6} = 0.4167 \]
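The illustrative example is small enough to enumerate completely, which makes the unbiasedness of the sample mean and the value 0.4167 easy to verify. A minimal sketch using only the Python standard library; note that `statistics.variance` uses divisor N - 1, matching the 1.6667 in the text:

```python
from itertools import combinations
from statistics import mean, variance

population = [0, 1, 2, 3]
N, n = len(population), 2

# All C(4, 2) = 6 samples are equally likely under simple random sampling
sample_means = [mean(s) for s in combinations(population, n)]

mu = mean(population)                                   # 1.5
e_xbar = mean(sample_means)                             # E(x-bar)
v_xbar = mean([(m - mu) ** 2 for m in sample_means])    # V(x-bar)

# Finite-population formula (1 - n/N) * sigma^2 / n with sigma^2 = 5/3
sigma2 = variance(population)
v_formula = (1 - n / N) * sigma2 / n

print(e_xbar, round(v_xbar, 4), round(v_formula, 4))    # 1.5 0.4167 0.4167
```

The enumeration and the finite population correction formula agree exactly for this population.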
Sampling Distribution of Mean
The Central Limit Theorem
If repeated samples of size n are drawn from any infinite population with mean \( \mu \) and
variance \( \sigma^2 \), and n is large (\( n \ge 30 \)), the distribution of \( \bar{x} \), the sample mean, is
approximately normal, with mean \( \mu \) (i.e. \( E(\bar{x}) = \mu \)) and variance \( \sigma^2/n \) (i.e. \( V(\bar{x}) = \sigma^2/n \)),
and this approximation becomes better as n becomes larger.
Notes: As in the previous illustrative example, we can see the following modifications:
(i) If the population is finite, \( V(\bar{x}) = \left(1 - \frac{n}{N}\right)\frac{\sigma^2}{n} \), where \( (1 - n/N) \) is known as the finite
population correction factor. When N is very big, the factor is approximately equal to 1.
(ii) If n is small, say less than 30, the sampling distribution is not so normal. A t-distribution
will be used (discussed later).
In the above example, N = 4, n = 2, \( V(\bar{x}) = \left(1 - \frac{n}{N}\right)\frac{\sigma^2}{n} = (1 - 2/4)(1.6667/2) = 0.4167 \). If the
population is big (or the sample is drawn with replacement), then \( V(\bar{x}) = \frac{\sigma^2}{n} = 1.6667/2 = 0.8333 \).
In this course we assume a big population or sampling with replacement.
Example 1
An electrical firm manufactures light bulbs that have a length of life that is approximately normally
distributed with mean equal to 800 hours and a standard deviation of 40 hours. Find the
probability that a random sample of 16 bulbs will have an average life of less than 775 hours.
Let \( \bar{X} \) be the average life of the 16 bulbs. \( \bar{X} \sim N\left(\mu_{\bar{x}} = 800,\ \sigma^2_{\bar{x}} = \frac{\sigma^2}{n} = \frac{40^2}{16}\right) \)
\[ P(\bar{X} < 775) = P\left(Z < \frac{775 - 800}{40/\sqrt{16}}\right) = P(Z < -2.5) = 0.0062 \]
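The key step in Example 1 is replacing σ by the standard error σ/√n. A minimal sketch of that calculation, assuming Python 3.8+ for `statistics.NormalDist`:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 800, 40, 16

# Sampling distribution of the mean: N(800, (40 / sqrt(16))^2), i.e. standard error 10
xbar = NormalDist(mu=mu, sigma=sigma / sqrt(n))
p = xbar.cdf(775)
print(round(p, 4))   # 0.0062
```

Because the lifetimes themselves are normal here, this distribution of the sample mean is exact, not just a large-sample approximation.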
Example 2
The mean IQ score of all students attending a college is 110 with a standard deviation of 10.
(a) If the IQ scores are normally distributed, what is the probability that the score of any one
student is greater than 112?
(b) What is the probability that the mean score in a random sample of 36 students is greater
than 112?
(c) What is the probability that the mean score in a random sample of 100 students is greater
than 112?
Solution
(a) Let X be the student's IQ score. \( X \sim N(110, 10^2) \)
\[ P(X > 112) = P\left(Z > \frac{112 - 110}{10}\right) = P(Z > 0.2) = 0.4207 \]
(b) Let \( \bar{X}_1 \) be the mean score of a sample of 36 students. \( \bar{X}_1 \sim N\left(\mu_{\bar{x}} = 110,\ \sigma^2_{\bar{x}} = \frac{10^2}{36}\right) \)
\[ P(\bar{X}_1 > 112) = P\left(Z > \frac{112 - 110}{10/\sqrt{36}}\right) = P(Z > 1.2) = 0.1151 \]
(c) Let \( \bar{X}_2 \) be the mean score of a sample of 100 students. \( \bar{X}_2 \sim N\left(\mu_{\bar{x}} = 110,\ \sigma^2_{\bar{x}} = \frac{10^2}{100}\right) \)
\[ P(\bar{X}_2 > 112) = P\left(Z > \frac{112 - 110}{10/\sqrt{100}}\right) = P(Z > 2) = 0.0228 \]
Estimation
Estimation is the process of using statistics from sample data to estimate the parameters of the
population. A statistic is a random variable which depends on which sample is drawn from a
population.
The following are some examples:
Estimator | Population parameter
1. \( \bar{x} \) | \( \mu \)
2. \( s^2 \) | \( \sigma^2 \)
3. \( \hat{p} \) | P
There are two important properties for an estimator, namely, unbiasedness and efficiency.
Unbiased estimator: An estimator, for example \( \bar{x} \), is unbiased if and only if \( E(\bar{x}) = \mu \).
Efficiency: The efficiency of an estimator, for example \( \bar{x} \), is given by \( V(\bar{x}) \). The smaller
\( V(\bar{x}) \) is, the more accurate \( \bar{x} \) will be as an estimator.
There are two types of estimate:
1. A point estimate is a single-value estimate of a population parameter, for example,
\( \hat{\mu} = \bar{x} \), \( \hat{P} = \hat{p} \).
2. An interval estimate of a population parameter gives an interval that may contain the true
value of the parameter with a certain probability (i.e. confidence); for example,
\( \Pr(a < \mu < b) = 0.99 \).
For a point estimate, both the accuracy and reliability of the estimation are unknown. For an
interval estimate, the width of the interval gives the accuracy and the probability gives the
reliability of the estimation.
Example 3
(a) The mean and standard deviation for the quality point averages of a random sample of 36
college seniors are calculated to be 2.6 and 0.3, respectively. Find a 95% confidence
interval for the mean of the entire senior class.
(b) How large a sample is required in (a) if we want to be 95% confident that our estimate of \( \mu \)
is off by less than 0.05?
Solution
Let \( \mu \) be the mean of the entire senior class.
Given: n = 36, \( \bar{x} = 2.6 \), s = 0.3, \( (1 - \alpha) = 0.95 \), \( \alpha/2 = 0.025 \)
(a) A 95% confidence interval estimate for \( \mu \) is
\[ \bar{x} - z_{0.025}\frac{s}{\sqrt{n}} < \mu < \bar{x} + z_{0.025}\frac{s}{\sqrt{n}} \]
\[ 2.6 - 1.96\frac{0.3}{\sqrt{36}} < \mu < 2.6 + 1.96\frac{0.3}{\sqrt{36}} \]
\[ 2.502 < \mu < 2.698 \]
(b) Let \( n_1 \) be the required sample size.
To be 95% confident that our estimate of \( \mu \) is off by less than 0.05 would imply
\[ z_{0.025}\frac{s}{\sqrt{n_1}} < 0.05 \;\Rightarrow\; 1.96\frac{0.3}{\sqrt{n_1}} < 0.05 \;\Rightarrow\; n_1 > \left(\frac{(1.96)(0.3)}{0.05}\right)^2 = 138.30 \]
so the required sample size is \( n_1 = 139 \).
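Both parts of Example 3 can be reproduced in a few lines. A minimal sketch, assuming Python 3.8+; it computes z from `NormalDist().inv_cdf` rather than using the rounded table value 1.96, which does not change the answers here:

```python
from math import ceil, sqrt
from statistics import NormalDist

xbar, s, n, conf = 2.6, 0.3, 36, 0.95

# z_{alpha/2} for 95% confidence (about 1.96)
z = NormalDist().inv_cdf(1 - (1 - conf) / 2)

# (a) large-sample interval: x-bar +/- z * s / sqrt(n)
half_width = z * s / sqrt(n)
lower, upper = xbar - half_width, xbar + half_width

# (b) smallest n1 with z * s / sqrt(n1) < 0.05, i.e. n1 > (z * s / 0.05)^2, rounded up
n1 = ceil((z * s / 0.05) ** 2)

print(round(lower, 3), round(upper, 3), n1)   # 2.502 2.698 139
```

Rounding the sample size up, never to the nearest integer, is what guarantees the required precision.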
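A summary table for constructing \( (1 - \alpha)100\% \) confidence intervals for means and proportions
Estimating | Conditions | Formula
Mean | Large samples (\( n \ge 30 \)) or \( \sigma \) known | \( \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \)
Mean | Small samples and \( \sigma \) unknown | \( \bar{x} - t_{\alpha/2,v}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2,v}\frac{s}{\sqrt{n}} \), v = n - 1
Proportion | Large sample | \( \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \)
Difference of means | Large samples or \( \sigma_1 \) and \( \sigma_2 \) known | \( (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \)
Difference of means | Small samples; \( \sigma_1 \) and \( \sigma_2 \) unknown, assume \( \sigma_1 = \sigma_2 \) | \( (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,v}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \), \( v = n_1 + n_2 - 2 \), pooled estimate of sample standard deviation: \( s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \)
Difference of means | Small samples; \( \sigma_1 \) and \( \sigma_2 \) unknown, assume \( \sigma_1 \ne \sigma_2 \) | \( (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,v}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \), \( v = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} \)
Difference of means | Paired observations | \( \bar{d} \pm t_{\alpha/2,v}\frac{s_d}{\sqrt{n}} \), \( d = x_1 - x_2 \), v = n - 1
Difference of proportions | Large samples | \( (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \)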
Example 4
The contents of seven similar containers of sulfuric acid are 9.8, 10.2, 10.4, 9.8, 10.0, 10.2 and
9.6 liters. Find a 95% confidence interval for the mean of all such containers, assuming an
approximate normal distribution.
Solution
Let \( \mu \) be the mean of all such containers.
Given: n = 7, \( \sum x = 70 \), \( \sum x^2 = 700.48 \)
\[ \bar{x} = \frac{\sum x}{n} = \frac{70}{7} = 10 \]
\[ s^2 = \frac{\sum x^2 - (\sum x)^2/n}{n - 1} = \frac{700.48 - 70^2/7}{6} = 0.08, \quad s = \sqrt{0.08} = 0.2828 \]
\( (1 - \alpha) = 0.95 \), \( \alpha/2 = 0.025 \), \( t_{0.025,6} = 2.447 \)
A 95% confidence interval estimate for \( \mu \) is
\[ \bar{x} - t_{0.025,6}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{0.025,6}\frac{s}{\sqrt{n}} \]
\[ 10 - 2.447\frac{0.2828}{\sqrt{7}} < \mu < 10 + 2.447\frac{0.2828}{\sqrt{7}} \]
\[ 9.738 < \mu < 10.262 \]
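Example 4 can be reproduced from the raw data. A minimal sketch using only the Python standard library; because `statistics` has no t-distribution quantile function, the critical value 2.447 is copied from the t table used in the text:

```python
from math import sqrt
from statistics import mean, stdev

volumes = [9.8, 10.2, 10.4, 9.8, 10.0, 10.2, 9.6]
n = len(volumes)
xbar = mean(volumes)
s = stdev(volumes)          # sample standard deviation, divisor n - 1

# t_{0.025, 6} = 2.447, taken from a t table
t_crit = 2.447
half_width = t_crit * s / sqrt(n)
print(round(xbar - half_width, 3), round(xbar + half_width, 3))   # 9.738 10.262
```

`statistics.stdev` uses the n - 1 divisor, which is the s required by the small-sample formula.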
Example 5
In a random sample of n = 500 families owning television sets in the city of Hamilton, Canada, it
was found that x = 340 owned colour sets. Find a 95% confidence interval for the actual
proportion of families in this city with colour sets.
Let P be the actual proportion of families in this city with colour sets.
Given: n = 500, \( \hat{p} = \frac{x}{n} = \frac{340}{500} = 0.68 \), \( (1 - \alpha) = 0.95 \), \( \alpha/2 = 0.025 \)
A 95% confidence interval for P is
\[ \hat{p} - z_{0.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} < P < \hat{p} + z_{0.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \]
\[ 0.68 - 1.96\sqrt{\frac{(0.68)(0.32)}{500}} < P < 0.68 + 1.96\sqrt{\frac{(0.68)(0.32)}{500}} \]
\[ 0.64 < P < 0.72 \]
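Example 6
A standardized chemistry test was given to 50 girls and 75 boys. The girls made an average
grade of 76 with a standard deviation of 6, while the boys made an average grade of 82 with a
standard deviation of 8. Find a 96% confidence interval for the difference \( \mu_1 - \mu_2 \), where \( \mu_1 \) is
the mean score of all boys and \( \mu_2 \) is the mean score of all girls who might take this test.
Given: \( n_1 = 75 \), \( n_2 = 50 \), \( \bar{x}_1 = 82 \), \( s_1 = 8 \), \( \bar{x}_2 = 76 \), \( s_2 = 6 \), \( (1 - \alpha) = 0.96 \), \( \alpha = 0.04 \)
Since \( n_1 > 30 \) and \( n_2 > 30 \), we use \( \hat{\sigma}_1 = s_1 \) and \( \hat{\sigma}_2 = s_2 \).
A 96% confidence interval for \( \mu_1 - \mu_2 \) is:
\[ (\bar{x}_1 - \bar{x}_2) - z_{0.02}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_{0.02}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]
\[ (82 - 76) - 2.05\sqrt{\frac{8^2}{75} + \frac{6^2}{50}} < \mu_1 - \mu_2 < (82 - 76) + 2.05\sqrt{\frac{8^2}{75} + \frac{6^2}{50}} \]
\[ 3.43 < \mu_1 - \mu_2 < 8.57 \]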
Example 7
In a batch chemical process, two catalysts are being compared for their effect on the output of the
process reaction. A sample of 12 batches is prepared using catalyst 1 and a sample of 10 batches
is obtained using catalyst 2. The 12 batches for which catalyst 1 was used gave an average
yield of 85 with a sample standard deviation of 4, while the second sample gave
an average of 81 and a sample standard deviation of 5. Find a 90% confidence interval for the
difference between the population means, assuming the populations are approximately normally
distributed with equal variances.
Solution
Let \( \mu_1 \) and \( \mu_2 \) be the mean population yields using catalyst 1 and catalyst 2, respectively.
Given: \( n_1 = 12 \), \( n_2 = 10 \), \( \bar{x}_1 = 85 \), \( s_1 = 4 \), \( \bar{x}_2 = 81 \), \( s_2 = 5 \), \( (1 - \alpha) = 0.90 \), \( \alpha = 0.10 \)
\( v = n_1 + n_2 - 2 = 12 + 10 - 2 = 20 \), \( t_{0.05,20} = 1.725 \)
Pooled estimate of sample standard deviation:
\[ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(12 - 1)4^2 + (10 - 1)5^2}{12 + 10 - 2}} = 4.478 \]
A 90% confidence interval for \( \mu_1 - \mu_2 \) is:
\[ (\bar{x}_1 - \bar{x}_2) - t_{0.05,20}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{0.05,20}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]
\[ (85 - 81) - (1.725)(4.478)\sqrt{\frac{1}{12} + \frac{1}{10}} < \mu_1 - \mu_2 < (85 - 81) + (1.725)(4.478)\sqrt{\frac{1}{12} + \frac{1}{10}} \]
\[ 0.69 < \mu_1 - \mu_2 < 7.31 \]
Example 8
The weights of 10 adults selected randomly before and after a certain new diet was introduced were
recorded as follows:
Adult | Before (\( x_1 \)) | After (\( x_2 \)) | Difference (\( d = x_1 - x_2 \))
1 | 76 | 81 | -5
2 | 60 | 52 | 8
3 | 85 | 87 | -2
4 | 58 | 70 | -12
5 | 91 | 86 | 5
6 | 75 | 77 | -2
7 | 82 | 90 | -8
8 | 64 | 63 | 1
9 | 79 | 85 | -6
10 | 88 | 83 | 5
Find a 98% confidence interval for the mean difference in weight.
Solution
\[ \bar{d} = \frac{\sum d_i}{n} = -1.6 \]
\[ s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}} = \sqrt{40.7} = 6.38 \]
For v = n - 1 = 9; \( t_{0.01,9} = 2.821 \)
A 98% confidence interval is
\[ \bar{d} \pm t_{0.01,9}\frac{s_d}{\sqrt{n}} = -1.6 \pm (2.821)\frac{6.38}{\sqrt{10}} \]
That is, \( -7.29 < \mu_D < 4.09 \).
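The paired interval in Example 8 works on the differences alone, so it reduces to a one-sample t interval. A minimal sketch using only the Python standard library, with the critical value 2.821 copied from the t table because `statistics` has no t quantile:

```python
from math import sqrt
from statistics import mean, stdev

before = [76, 60, 85, 58, 91, 75, 82, 64, 79, 88]
after = [81, 52, 87, 70, 86, 77, 90, 63, 85, 83]

# Paired differences d = x1 - x2 (before minus after)
d = [b - a for b, a in zip(before, after)]
d_bar, s_d = mean(d), stdev(d)

t_crit = 2.821              # t_{0.01, 9}, taken from a t table
half_width = t_crit * s_d / sqrt(len(d))
print(d_bar, round(s_d, 2))                                        # -1.6 6.38
print(round(d_bar - half_width, 2), round(d_bar + half_width, 2))  # -7.29 4.09
```

Since the interval contains 0, the data do not show a clear change in mean weight at this confidence level.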
Example 9
A certain change in a manufacturing procedure for component parts is being considered. Samples
are taken using both the existing and the new procedure in order to determine if the new
procedure results in an improvement. If 75 of 1500 items from the existing procedure were found
to be defective and 80 of 2000 items from the new procedure were found to be defective, find a
90% confidence interval for the true difference in the fraction of defectives between the existing
and the new process.
Solution
Let \( P_1 \) and \( P_2 \) be the true fractions of defectives of the existing and the new processes,
respectively.
Given: \( n_1 = 1500 \), \( x_1 = 75 \), \( \hat{p}_1 = \frac{75}{1500} = 0.05 \); \( n_2 = 2000 \), \( x_2 = 80 \), \( \hat{p}_2 = \frac{80}{2000} = 0.04 \); \( (1 - \alpha) = 0.90 \), \( \alpha = 0.10 \)
A 90% confidence interval for \( P_1 - P_2 \) is:
\[ (\hat{p}_1 - \hat{p}_2) - z_{0.05}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} < P_1 - P_2 < (\hat{p}_1 - \hat{p}_2) + z_{0.05}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \]
\[ (0.01) - 1.64\sqrt{\frac{(0.05)(0.95)}{1500} + \frac{(0.04)(0.96)}{2000}} < P_1 - P_2 < (0.01) + 1.64\sqrt{\frac{(0.05)(0.95)}{1500} + \frac{(0.04)(0.96)}{2000}} \]
\[ -0.0017 < P_1 - P_2 < 0.0217 \]
Lecture 5 - Introduction to Test of Hypothesis
Statistical Hypothesis
Consider the following example:
A manufacturer of sports equipment has developed a new synthetic fishing line that he claims has a
mean breaking strength of 8 kilograms with a standard deviation of 0.5 kilogram. Test the
hypothesis that \( \mu = 8 \) kilograms against the alternative that \( \mu \ne 8 \) kilograms if a random
sample of 50 lines is tested and found to have a mean breaking strength of 7.8 kilograms.
Use a 0.01 level of significance.
When a random sample is drawn from a population (the 50 lines randomly selected), the sample
information can be used to assess the validity of some conjecture, or hypothesis. Here \( \mu = 8 \)
kilograms is known as the null hypothesis and \( \mu \ne 8 \) kilograms is the alternative hypothesis. They
are complementary to each other, and we need to decide which one to accept on the basis of the
sample result of 50 lines.
Now let us construct a 95% confidence interval for the mean breaking strength of the population as
below:
\[ P\left(7.8 - 1.96\frac{0.5}{\sqrt{50}} < \mu < 7.8 + 1.96\frac{0.5}{\sqrt{50}}\right) = 0.95 \]
i.e., \( P(7.6614 < \mu < 7.9386) = 0.95 \).
As we are 95% confident that the mean breaking strength lies between 7.66 kg and 7.94 kg, an
interval that does not contain 8 kg, it is highly unlikely that the null hypothesis \( \mu = 8 \) kg is true,
and hence it should be rejected.
There are four possible situations for the above decision-making exercise:
 | \( H_0 \) is correct | \( H_0 \) is wrong
Accept \( H_0 \) | Correct decision | Type 2 error
Reject \( H_0 \) | Type 1 error | Correct decision
We still have a probability of 1 - 0.95, or 0.05, of rejecting a true \( H_0 \). We call this probability the "level of
significance" or \( \alpha \), which is the probability of committing a type 1 error.
The rationale of hypothesis testing is simply outlined as above. There are, however, some formal
concepts and procedures for conducting the test. The details are set out below.
Some Hypothesis Testing Terminology
1. Null hypothesis, \( H_0 \)
A hypothesis that is held to be true until very strong evidence to the contrary is obtained.
\( H_0: \mu = \mu_0 \)
2. Alternative hypothesis, \( H_1 \)
It is a hypothesis that is complementary to the null hypothesis. Hence it will be accepted if
the null hypothesis is rejected.
\( H_1: \mu \ne \mu_0 \) (two-tail test)
\( H_1: \mu > \mu_0 \) (one-tail test)
\( H_1: \mu < \mu_0 \) (one-tail test)
In the one-tail test we have some expectation about the direction of the error when the
null hypothesis is wrong, while in the two-tail test we don't have such an expectation.
3. Test statistic
is the value, based on the sample, used to determine whether the null hypothesis should
be rejected or accepted.
4. Critical region
is the region such that, if the test statistic falls in it, the null hypothesis will be rejected.
5. Types of error
(a) Type I error: Reject \( H_0 \) when \( H_0 \) is true.
(b) Type II error: Accept \( H_0 \) when \( H_1 \) is true.
6. The significance level, \( \alpha \),
is the probability of committing a type 1 error, i.e., P(Type I error) = \( \alpha \).
The probability of committing a type 2 error is \( \beta \); i.e., P(Type II error) = \( \beta \).
Basic Steps in Testing Hypotheses
1. Formulate the null hypothesis.
2. Formulate the alternative hypothesis.
3. Specify the level of significance to be used.
4. Select the appropriate test statistic and establish the critical region.
5. Compute the value of the test statistic.
6. Conclusion: Reject \( H_0 \) if the statistic has a value in the critical region; otherwise accept
\( H_0 \).
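Tests concerning means
The tests concerning means and proportions are summarized in the following table.
\( H_0 \) | Conditions | Test statistic
\( \mu = \mu_0 \) | Large samples (\( n \ge 30 \)) or \( \sigma \) known | \( z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \)
\( \mu = \mu_0 \) | Small samples and \( \sigma \) unknown | \( t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \) with v = n - 1
\( \mu_1 - \mu_2 = d_0 \) | Large samples or \( \sigma_1 \) and \( \sigma_2 \) known | \( z = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \)
\( \mu_1 - \mu_2 = d_0 \) | Small samples; \( \sigma_1 \) and \( \sigma_2 \) unknown, assume \( \sigma_1 = \sigma_2 \) | \( t = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{s_p\sqrt{1/n_1 + 1/n_2}} \) with \( v = n_1 + n_2 - 2 \) and \( s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \)
\( \mu_1 - \mu_2 = d_0 \) | Small samples; \( \sigma_1 \) and \( \sigma_2 \) unknown, assume \( \sigma_1 \ne \sigma_2 \) | \( t = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \) with \( v = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} \)
\( \mu_D = d_0 \) | Paired observations | \( t = \frac{\bar{d} - d_0}{s_d/\sqrt{n}} \) with v = n - 1
\( p = p_0 \) | Large sample | \( z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0/n}} \)
\( p_1 = p_2 \) | Large samples | \( z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}(1/n_1 + 1/n_2)}} \) with \( \hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} \)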
Example 1
A manufacturer of sports equipment has developed a new synthetic fishing line that he claims has a
mean breaking strength of 8 kilograms with a standard deviation of 0.5 kilogram. Test the
hypothesis that \( \mu = 8 \) kilograms against the alternative that \( \mu \ne 8 \) kilograms if a random
sample of 50 lines is tested and found to have a mean breaking strength of 7.8 kilograms.
Use a 0.01 level of significance.
Null hypothesis: \( \mu = 8 \) kilograms
Alternative hypothesis: \( \mu \ne 8 \) kilograms
Level of significance: 0.01
Critical region: \( z < -z_{0.005} = -2.58 \) or \( z > z_{0.005} = 2.58 \)
Computation: n = 50, \( \bar{x} = 7.8 \), \( \sigma = 0.5 \)
\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{7.8 - 8}{0.5/\sqrt{50}} = -2.828 \]
Conclusion: As the sample z (= -2.828) falls inside the critical region, we reject the null
hypothesis at the 0.01 level of significance and conclude that \( \mu \) is significantly smaller than 8
kilograms.
Example 2
The average length of time for students to register for fall classes at a certain college has been 50
minutes with a standard deviation of 10 minutes. A new registration procedure using modern
computing machines is being tried. If a random sample of 12 students had an average registration
time of 42 minutes with a standard deviation of 11.9 minutes under the new system, test the
hypothesis that the population mean is now less than 50, using a level of significance of (1) 0.05,
and (2) 0.01. Assume the population of times to be normal.
Let \( \mu \) be the population mean time for students to register under the new registration procedure.
(1) Null hypothesis: \( \mu = 50 \) minutes
Alternative hypothesis: \( \mu < 50 \) minutes
Level of significance: 0.05
Critical region: (n = 12 < 30 and the new \( \sigma \) is unknown, so a t-test should be used.)
degrees of freedom (v) = n - 1 = 12 - 1 = 11
\( t < -t_{0.05,11} = -1.796 \)
Computation: n = 12, \( \bar{x} = 42 \), s = 11.9
\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{42 - 50}{11.9/\sqrt{12}} = -2.329 \]
Conclusion: As the sample t (= -2.329) falls inside the critical region, we reject the null
hypothesis at the 0.05 level of significance and conclude that \( \mu \) is significantly smaller than
50 minutes.
(2) Identical with those of (1) except that the critical region would be replaced by:
\( t < -t_{0.01,11} = -2.718 \)
and the corresponding conclusion would be changed as follows:
As the sample t (= -2.329) falls outside the critical region, we reject the alternative hypothesis
at the 0.01 level of significance and conclude that \( \mu \) is not highly significantly smaller than
50 minutes.
Example 3
An experiment was performed to compare the abrasive wear of two different laminated materials.
Twelve pieces of material 1 were tested by exposing each piece to a machine measuring wear.
Ten pieces of material 2 were similarly tested. In each case, the depth of wear was observed.
The samples of material 1 gave an average (coded) wear of 85 units with a standard deviation of
4, while the samples of material 2 gave an average of 81 and a standard deviation of 5. Test the
hypothesis that the two types of material exhibit the same mean abrasive wear at the 0.10 level of
significance. Assume the populations to be approximately normal with equal variances.
Let \( \mu_1 \) and \( \mu_2 \) be the mean abrasive wear of material 1 and 2 respectively.
Null hypothesis: \( \mu_1 = \mu_2 \), i.e. \( \mu_1 - \mu_2 = 0 \)
Alternative hypothesis: \( \mu_1 \ne \mu_2 \), i.e. \( \mu_1 - \mu_2 \ne 0 \)
Level of significance: 0.10
Critical region: (As both \( n_1 \) and \( n_2 \) are smaller than 30 and their standard deviations are
unknown, a t-test has to be used.)
\( v = n_1 + n_2 - 2 = 12 + 10 - 2 = 20 \)
\( t > t_{0.05,20} = 1.725 \) or \( t < -t_{0.05,20} = -1.725 \)
Computation: \( n_1 = 12 \), \( \bar{x}_1 = 85 \), \( s_1 = 4 \), \( n_2 = 10 \), \( \bar{x}_2 = 81 \), \( s_2 = 5 \)
\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(11)(4^2) + (9)(5^2)}{12 + 10 - 2} = 20.05 \]
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{s_p\sqrt{1/n_1 + 1/n_2}} = \frac{(85 - 81) - 0}{\sqrt{20.05}\sqrt{1/12 + 1/10}} = 2.086 \]
Conclusion: As the sample t (= 2.086) falls inside the critical region, we reject the null hypothesis
at the 0.10 level of significance and conclude that the mean abrasive wear of material 1 is
significantly higher than that of material 2.
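This example uses the same sample figures (12 and 10 observations, means 85 and 81, standard deviations 4 and 5) as the catalyst confidence-interval example (Example 7) in the estimation section, so one pooled-variance calculation serves both. A minimal sketch in Python; the helper name `pooled_sd` is our own, and the critical value 1.725 is copied from the t table because the standard library has no t quantile:

```python
from math import sqrt

def pooled_sd(n1: int, s1: float, n2: int, s2: float) -> float:
    """Pooled sample standard deviation s_p under the equal-variance assumption."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

n1, x1, s1 = 12, 85, 4
n2, x2, s2 = 10, 81, 5
sp = pooled_sd(n1, s1, n2, s2)              # sqrt(20.05), about 4.478
se = sp * sqrt(1 / n1 + 1 / n2)
t_crit = 1.725                              # t_{0.05, 20}, taken from a t table

# Two-tailed test of H0: mu1 - mu2 = 0 (this example)
t = (x1 - x2 - 0) / se

# 90% interval for mu1 - mu2 (the catalyst example)
lower, upper = (x1 - x2) - t_crit * se, (x1 - x2) + t_crit * se

print(round(t, 3), abs(t) > t_crit)          # 2.086 True
print(round(lower, 2), round(upper, 2))      # 0.69 7.31
```

The two results are consistent: H0 is rejected at the 0.10 level exactly when the 90% interval excludes 0.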
Example 4
Five samples of a ferrous-type substance are to be used to determine if there is a difference
between a laboratory chemical analysis and an X-ray fluorescence analysis of the iron content.
Each sample was split into two sub-samples and the two types of analysis were applied.
Following are the coded data showing the iron content analysis:
Analysis | Sample 1 | Sample 2 | Sample 3 | Sample 4 | Sample 5
X-ray | 2.0 | 2.0 | 2.3 | 2.1 | 2.4
Chemical | 2.2 | 1.9 | 2.5 | 2.3 | 2.4
Assuming the populations are normal, test at the 0.05 level of significance whether the two methods
of analysis give, on the average, the same result.
Let \( \mu_1 \) and \( \mu_2 \) be the mean iron content determined by the laboratory chemical analysis and
X-ray fluorescence analysis respectively; and \( \mu_D \) be the mean of the population of differences of
paired measurements, where each difference \( d_i \) is the X-ray value minus the chemical value.
Null hypothesis: \( \mu_1 = \mu_2 \) or \( \mu_D = 0 \)
Alternative hypothesis: \( \mu_1 \ne \mu_2 \) or \( \mu_D \ne 0 \)
Level of significance: 0.05
Critical region: (As n = 5 < 30, a t-test should be used.)
\( t > t_{0.025,4} = 2.776 \) or \( t < -t_{0.025,4} = -2.776 \)
Computation:
Analysis | Sample 1 | Sample 2 | Sample 3 | Sample 4 | Sample 5
X-ray | 2.0 | 2.0 | 2.3 | 2.1 | 2.4
Chemical | 2.2 | 1.9 | 2.5 | 2.3 | 2.4
\( d_i \) | -0.2 | 0.1 | -0.2 | -0.2 | 0
\( d_i^2 \) | 0.04 | 0.01 | 0.04 | 0.04 | 0
\( \sum_{i=1}^{5} d_i = -0.5 \), \( \sum_{i=1}^{5} d_i^2 = 0.13 \), \( \bar{d} = \frac{-0.5}{5} = -0.1 \)
\[ s_d^2 = \frac{n\sum d^2 - (\sum d)^2}{n(n - 1)} = \frac{(5)(0.13) - (-0.5)^2}{(5)(4)} = 0.02 \]
\[ t = \frac{\bar{d} - \mu_D}{s_d/\sqrt{n}} = \frac{-0.1 - 0}{\sqrt{0.02}/\sqrt{5}} = -1.5811 \]
Conclusion: As the sample t (= -1.5811) falls outside the critical region, we reject the alternative
hypothesis at the 0.05 level of significance and conclude that there is no significant difference in the
mean iron content determined by the above two analyses.
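The paired t-test reduces to a one-sample t-test on the differences. A minimal sketch using only the Python standard library; the critical value 2.776 is copied from the t table, and the differences are taken as X-ray minus chemical, as in the worked solution:

```python
from math import sqrt
from statistics import mean, stdev

xray = [2.0, 2.0, 2.3, 2.1, 2.4]
chemical = [2.2, 1.9, 2.5, 2.3, 2.4]

# Paired differences d_i = X-ray minus chemical, one per split sample
d = [x - c for x, c in zip(xray, chemical)]
t = (mean(d) - 0) / (stdev(d) / sqrt(len(d)))

t_crit = 2.776              # t_{0.025, 4}, taken from a t table
print(round(t, 4), abs(t) > t_crit)   # -1.5811 False
```

Pairing removes the sample-to-sample variation in iron content, which is why the differences, not the two columns separately, are analysed.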
Tests Concerning Proportions
Example 5
A manufacturing company has submitted a claim that 90% of items produced by a certain process
are non-defective. An improvement in the process is being considered that they feel will lower
the proportion of defectives below the current 10%. In an experiment, 100 items are produced with
the new process and 5 are defective. Is this evidence sufficient to conclude that the method has
been improved? Use a 0.05 level of significance.
Let P be the proportion of defective products in the new production process.
Null hypothesis: P = 0.1
Alternative hypothesis: P < 0.1
Level of significance: 0.05
Critical region: \( z < -z_{0.05} = -1.64 \)
Computation:
n = 100, x = 5, \( \hat{p} = \frac{x}{n} = \frac{5}{100} = 0.05 \)
\[ z = \frac{\hat{p} - P_0}{\sqrt{P_0(1 - P_0)/n}} = \frac{0.05 - 0.1}{\sqrt{(0.1)(0.9)/100}} = -1.667 \]
Conclusion: As the sample z (= -1.667) falls inside the critical region, we reject the null hypothesis
at the 0.05 level of significance and conclude that P is significantly smaller than 0.1. That is, the
production method has been improved in lowering the proportion of defectives below the current
10%.
Example 6
A vote is to be taken among the residents of a town and the surrounding county to determine
whether a proposed chemical plant should be constructed. The construction site is within the
town limits and for this reason many voters in the county feel that the proposal will pass because
of the large proportion of town voters who favor the construction. To determine if there is a
significant difference in the proportion of town voters and county voters favoring the proposal, a
poll is taken. If 120 of 200 town voters favor the proposal and 240 of 500 county residents favor
it, would you agree that the proportion of town voters favoring the proposal is higher than the
proportion of county voters? Use a 0.025 level of significance.
Let \( P_1 \) and \( P_2 \) be the proportions of town voters and county voters, respectively, favouring
the proposal.
Null hypothesis: \( P_1 = P_2 \) or \( P_1 - P_2 = 0 \)
Alternative hypothesis: \( P_1 > P_2 \) or \( P_1 - P_2 > 0 \)
Level of significance: 0.025
Critical region: \( z > z_{0.025} = 1.96 \)
Computation:
\( n_1 = 200 \), \( x_1 = 120 \), \( \hat{p}_1 = \frac{x_1}{n_1} = \frac{120}{200} = 0.6 \)
\( n_2 = 500 \), \( x_2 = 240 \), \( \hat{p}_2 = \frac{x_2}{n_2} = \frac{240}{500} = 0.48 \)
\[ \hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{(200)(0.6) + (500)(0.48)}{200 + 500} = 0.514 \]
\[ z = \frac{(\hat{p}_1 - \hat{p}_2) - (P_1 - P_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(0.6 - 0.48) - 0}{\sqrt{(0.514)(0.486)\left(\frac{1}{200} + \frac{1}{500}\right)}} = 2.870 \]
Conclusion: As the sample z (= 2.870) falls inside the critical region, we reject the null hypothesis at
the 0.025 level of significance and conclude that the proportion of town voters favouring the proposal
is significantly larger than that of the county voters.
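The two-proportion test pools the samples to estimate the common proportion under \( H_0 \). A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`; computing the pooled proportion from the counts avoids the rounding of 0.514 in the text:

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 120, 200   # town voters in favour
x2, n2 = 240, 500   # county voters in favour

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)      # equals (n1*p1 + n2*p2) / (n1 + n2)
z = (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

z_crit = NormalDist().inv_cdf(1 - 0.025)   # one-tailed critical value, about 1.96
print(round(z, 2), z > z_crit)               # 2.87 True
```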
Chapter 6 - Chi-square Tests
There are two types of chi-square tests: goodness-of-fit tests and tests for independence.
Goodness-of-fit Test
A test to determine if a population has a specified theoretical distribution. The test is based on
how good a fit we have between the frequency of occurrence of observations in an observed
sample and the expected frequencies obtained from the hypothesized distribution.
Theorem: A goodness-of-fit test between observed and expected frequencies is based on the
quantity
$$\chi^2_{\text{test}} = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$$
where $\chi^2_{\text{test}}$ is a value of a random variable whose sampling distribution is
approximated very closely by the chi-square distribution,
$O_i$ is the observed frequency of cell $i$, and $E_i$ is the expected frequency of cell $i$.
The number of degrees of freedom $\nu$ in a chi-square goodness-of-fit test is equal to
the number of cells minus the number of quantities obtained from the observed
data that are used in the calculation of the expected frequencies.
For a level of significance equal to $\alpha$, $\chi^2_{\text{test}} > \chi^2_{\alpha,\nu}$ constitutes the critical region. The decision
criterion described here should not be used unless each of the expected frequencies is at least
equal to 5.
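A minimal Python sketch of the theorem, the degrees-of-freedom rule and the "at least 5" rule follows. The function names, and the choice to raise an error rather than merely warn, are illustrative assumptions and not part of the text. The usage line feeds in the seven uncombined cells of Example 2 to show why cells must sometimes be combined.

```python
def chi_square_statistic(observed, expected, min_expected=5.0):
    """Return sum((O_i - E_i)**2 / E_i) over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    too_small = [e for e in expected if e < min_expected]
    if too_small:
        # The chi-square approximation is not trusted here: combine cells first.
        raise ValueError(f"expected frequencies below {min_expected}: {too_small}")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def goodness_of_fit_df(n_cells, n_quantities_from_data):
    """Degrees of freedom: cells minus quantities obtained from the observed data."""
    return n_cells - n_quantities_from_data


# The seven uncombined battery-life cells of Example 2 break the rule of thumb.
try:
    chi_square_statistic([2, 1, 4, 15, 10, 5, 3],
                         [0.716, 2.636, 6.832, 10.612, 10.38, 6.1, 2.724])
except ValueError as err:
    print(err)  # expected frequencies below 5.0: [0.716, 2.636, 2.724]
```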
Example 1
Consider tossing a die 120 times.
Faces       1   2   3   4   5   6
Observed   20  22  17  18  19  24
Expected   20  20  20  20  20  20
By comparing the observed frequencies with the expected frequencies, one has to decide whether
the die is a fair die or not.
Null hypothesis: the die is a fair die, i.e. $P(X = i) = \frac{1}{6}$ for i = 1, 2, 3, 4, 5 and 6
Alternative hypothesis: the die is not a fair die
Level of significance: 0.05
Critical region: $\chi^2 > \chi^2_{0.05,5} = 11.07$, where $\nu = n - 1 = 6 - 1 = 5$
Computation:
Expected value $E_i = nP(X = i) = (120)\left(\frac{1}{6}\right) = 20$

i                    1   2   3   4   5   6
Observed (O_i)      20  22  17  18  19  24
Expected (E_i)      20  20  20  20  20  20
O_i - E_i            0   2  -3  -2  -1   4

$$\chi^2 = \sum_{i=1}^{6}\frac{(O_i - E_i)^2}{E_i} = \frac{0^2}{20} + \frac{2^2}{20} + \frac{(-3)^2}{20} + \frac{(-2)^2}{20} + \frac{(-1)^2}{20} + \frac{4^2}{20} = 1.7$$
Conclusion: As the sample $\chi^2$ (= 1.7) falls outside the critical region, do not reject the null
hypothesis and conclude that there is no evidence at the 0.05 level of significance that the die is unfair.
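The die calculation can be reproduced in a few lines of Python, as a sketch. The critical value 11.07 is hard-coded from the text's chi-square table, because the standard library has no chi-square quantile function.

```python
observed = [20, 22, 17, 18, 19, 24]  # faces 1 to 6
n = sum(observed)                    # 120 tosses
expected = [n / 6] * 6               # E_i = n * P(X = i) = 20 under H0
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1               # only the total n is taken from the data
critical = 11.07                     # chi-square(0.05, 5) as quoted in the text
print(f"chi-square = {chi2:.2f} on {df} df")
print("reject H0" if chi2 > critical else "do not reject H0")
```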
Example 2
The following distribution of battery lives may be approximated by the normal distribution.
Class boundaries   O_i   z-value   p-value   E_i
1.45 - 1.95          2
1.95 - 2.45          1
2.45 - 2.95          4
2.95 - 3.45         15
3.45 - 3.95         10
3.95 - 4.45          5
4.45 - 4.95          3
A chi-square test can be applied to test whether the above frequency distribution can be
approximated by a normal distribution or not.
Null hypothesis: the distribution can be approximated by a normal distribution
Alternative hypothesis: the distribution cannot be approximated by a normal distribution
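Level of significance: 0.05
Critical region: $\chi^2 > \chi^2_{0.05,\nu}$, where $\nu = n - 3$ and $n$ is the number of cells (three quantities
are obtained from the observed data: the total frequency, the mean and the standard deviation).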
Computation:
To find the expected values, the mean and standard deviation of the frequency distribution
have to be found first.
Class boundaries   Frequency (f_i = O_i)   Class mark (x_i)
1.45 - 1.95          2                      1.7
1.95 - 2.45          1                      2.2
2.45 - 2.95          4                      2.7
2.95 - 3.45         15                      3.2
3.45 - 3.95         10                      3.7
3.95 - 4.45          5                      4.2
4.45 - 4.95          3                      4.7
                   n = 40
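$\sum fx = 136.5$ and $\sum fx^2 = 484.75$, so
$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{136.5}{40} = 3.4125$$
$$s = \sqrt{\frac{n\sum fx^2 - \left(\sum fx\right)^2}{n(n-1)}} = \sqrt{\frac{40(484.75) - (136.5)^2}{40(39)}} = 0.6969$$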
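The grouped-data mean and standard deviation above can be sketched in Python. Class marks are computed as midpoints of consecutive class boundaries, and the variable names are illustrative.

```python
from math import sqrt

boundaries = [1.45, 1.95, 2.45, 2.95, 3.45, 3.95, 4.45, 4.95]
freqs = [2, 1, 4, 15, 10, 5, 3]
# Class mark = midpoint of each pair of consecutive boundaries.
marks = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, marks))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, marks))
mean = sum_fx / n
s = sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))  # sample SD, divisor n - 1
print(f"n = {n}, sum fx = {sum_fx:.2f}, sum fx^2 = {sum_fx2:.2f}")
print(f"mean = {mean:.4f}, s = {s:.4f}")
```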
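The z-values of the class boundaries are found by standardizing with the sample mean and standard deviation:
$$\frac{L_i - 3.4125}{0.6969} < Z < \frac{U_i - 3.4125}{0.6969}$$
where $L_i$ and $U_i$ are the lower and upper boundaries of the $i$th class. The first class is taken to extend
down to $-\infty$ and the last class up to $+\infty$, so that the class probabilities add up to 1. Then
$$\text{p-value} = P\left(\frac{L_i - 3.4125}{0.6969} < Z < \frac{U_i - 3.4125}{0.6969}\right)$$
$$E_i = 40\,P\left(\frac{L_i - 3.4125}{0.6969} < Z < \frac{U_i - 3.4125}{0.6969}\right)$$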
Class boundaries   O_i   z-value              p-value   E_i
1.45 - 1.95          2   Z < -2.10            .0179      0.716
1.95 - 2.45          1   -2.10 < Z < -1.38    .0659      2.636
2.45 - 2.95          4   -1.38 < Z < -0.66    .1708      6.832
2.95 - 3.45         15   -0.66 < Z < 0.05     .2653     10.612
3.45 - 3.95         10   0.05 < Z < 0.77      .2595     10.38
3.95 - 4.45          5   0.77 < Z < 1.49      .1525      6.1
4.45 - 4.95          3   1.49 < Z             .0681      2.724
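The z-values, class probabilities and expected frequencies in the table can be sketched with `statistics.NormalDist` (Python 3.8+). The sketch rounds the z-values to two decimals, as a printed z-table does. Exact normal probabilities can still differ from four-decimal table lookups in the last digit (about .2594 instead of .2595 for the fifth class), so the computed E_i agree with the table only to about 0.01.

```python
from statistics import NormalDist

mean, s, n = 3.4125, 0.6969, 40          # values found above
boundaries = [1.45, 1.95, 2.45, 2.95, 3.45, 3.95, 4.45, 4.95]
std_normal = NormalDist()                # Z ~ N(0, 1)

# Standardize the inner boundaries, rounded to 2 decimals like a z-table.
cuts = [round((b - mean) / s, 2) for b in boundaries[1:-1]]
# Open-ended end classes: cumulative probability 0 at the bottom, 1 at the top.
cum = [0.0] + [std_normal.cdf(c) for c in cuts] + [1.0]
probs = [hi - lo for lo, hi in zip(cum, cum[1:])]
expected = [n * p for p in probs]

for lo, hi, p, e in zip(boundaries, boundaries[1:], probs, expected):
    print(f"{lo:.2f} - {hi:.2f}: p = {p:.4f}, E = {e:.3f}")
```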
In order to satisfy the rule that the expected frequency in each cell is greater than or equal to 5, we have
to combine the first three classes into one cell and the last two classes into another cell. As such,
the number of cells (n) is 4.
$$\chi^2 = \sum_{i=1}^{4}\frac{(O_i - E_i)^2}{E_i} = \frac{(7 - 10.184)^2}{10.184} + \frac{(15 - 10.612)^2}{10.612} + \frac{(10 - 10.38)^2}{10.38} + \frac{(8 - 8.824)^2}{8.824} = 2.901$$
Conclusion: Since $\chi^2_{0.05,1} = 3.841$ (here $\nu = 4 - 3 = 1$), the sample $\chi^2$ (= 2.901) falls outside the critical region.
As such, do not reject the null hypothesis: the data give no evidence against approximating the distribution
of battery lives by a normal distribution.
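Combining cells and finishing the test can be sketched as follows. The helper `combine` and its index-group argument are illustrative choices, not from the text. The E_i are the rounded table values, and the critical value 3.841 is taken from the text's chi-square table.

```python
observed = [2, 1, 4, 15, 10, 5, 3]
expected = [0.716, 2.636, 6.832, 10.612, 10.38, 6.1, 2.724]  # E_i from the table


def combine(values, groups):
    """Add together the cells listed in each group of indices."""
    return [sum(values[i] for i in group) for group in groups]


groups = [[0, 1, 2], [3], [4], [5, 6]]  # first three and last two classes merged
obs_c = combine(observed, groups)        # [7, 15, 10, 8]
exp_c = combine(expected, groups)        # about [10.184, 10.612, 10.38, 8.824]
if min(exp_c) < 5:
    raise ValueError("a combined cell still has an expected frequency below 5")

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_c, exp_c))
df = len(obs_c) - 3                      # total, mean and s were taken from the data
critical = 3.841                         # chi-square(0.05, 1) as quoted in the text
decision = "reject H0" if chi2 > critical else "do not reject H0"
print(f"chi-square = {chi2:.3f} on {df} df: {decision}")
```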
Test for Independence
The chi-square test procedure can also be used to test the hypothesis of independence of two
variables/attributes. The observed frequencies of the two variables are entered in a two-way
classification table, or contingency table.
Remark: The expected frequency of the cell in the $i$th row and $j$th column of the contingency
table is
$$E_{ij} = \frac{(\text{total of row } i) \times (\text{total of column } j)}{\text{grand total}}$$
The number of degrees of freedom for the contingency table is equal to $(r - 1)(c - 1)$, where $r$ is the number
of rows and $c$ is the number of columns in the table.
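The remark translates directly into code. Below is a minimal Python sketch in which a table is stored as a list of rows; the function names are illustrative. Applied to the observed counts of Example 3, it reproduces the expected frequencies shown in parentheses in that example's computation table.

```python
def expected_counts(table):
    """E_ij = (row i total) * (column j total) / grand total; table is a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def contingency_df(table):
    """Degrees of freedom (r - 1)(c - 1) for an r x c table."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Observed counts of Example 3 (rows: attractive, ordinary, unattractive).
gpa_table = [[14, 11, 10, 5], [10, 16, 16, 14], [3, 4, 7, 10]]
for row in expected_counts(gpa_table):
    print([round(e, 2) for e in row])
print("df =", contingency_df(gpa_table))
```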
Example 3
Suppose that we wish to study the relationship between grade point average and appearance.
                     Grade Point Average
Appearance       1     2     3     4    Totals
attractive      14    11    10     5      40
ordinary        10    16    16    14      56
unattractive     3     4     7    10      24
Totals          27    31    33    29     120
Null hypothesis: There is no relationship between grade point average and appearance. That is,
the two characteristics are independent.
Alternative hypothesis: There is a relationship between grade point average and appearance. That
is, the two characteristics are not independent.
Level of significance: 0.05
Critical region: $\chi^2 > \chi^2_{0.05,\nu}$, where $\nu = (r - 1)(c - 1) = (3 - 1)(4 - 1) = 6$
Computation (expected frequencies in parentheses):
                            Grade Point Average
Appearance        1           2            3            4         Totals
attractive      14 (9)      11 (10.33)   10 (11)       5 (9.67)     40
ordinary        10 (12.6)   16 (14.47)   16 (15.4)    14 (13.53)    56
unattractive     3 (5.4)     4 (6.2)      7 (6.6)     10 (5.8)      24
Totals          27          31           33           29           120
$$
\begin{aligned}
\chi^2 &= \frac{(14 - 9)^2}{9} + \frac{(11 - 10.33)^2}{10.33} + \frac{(10 - 11)^2}{11} + \frac{(5 - 9.67)^2}{9.67} \\
&\quad + \frac{(10 - 12.6)^2}{12.6} + \frac{(16 - 14.47)^2}{14.47} + \frac{(16 - 15.4)^2}{15.4} + \frac{(14 - 13.53)^2}{13.53} \\
&\quad + \frac{(3 - 5.4)^2}{5.4} + \frac{(4 - 6.2)^2}{6.2} + \frac{(7 - 6.6)^2}{6.6} + \frac{(10 - 5.8)^2}{5.8} \\
&= 10.818
\end{aligned}
$$
Conclusion: Since $\chi^2_{0.05,6} = 12.592$, the sample $\chi^2$ (= 10.818) falls outside the critical region.
So do not reject the null hypothesis and conclude that there is no evidence of a
relationship between grade point average and appearance.
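The whole independence test for Example 3 can be sketched in plain Python. The critical value 12.592 is hard-coded from the text's table. With unrounded expected frequencies the statistic comes out at about 10.816, slightly below the 10.818 obtained from two-decimal expected values; the decision is the same.

```python
table = [[14, 11, 10, 5],    # attractive
         [10, 16, 16, 14],   # ordinary
         [3, 4, 7, 10]]      # unattractive; columns are GPA groups 1 to 4
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(table, expected)
           for o, e in zip(obs_row, exp_row))
df = (len(table) - 1) * (len(table[0]) - 1)
critical = 12.592            # chi-square(0.05, 6) as quoted in the text
print(f"chi-square = {chi2:.3f} on {df} df")
print("reject H0" if chi2 > critical else "do not reject H0")
```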
Test for Homogeneity
This test is used to test the hypothesis that several population proportions are equal.
Remark: The approach for the test of homogeneity is the same as for the test of
independence of variables/attributes.
Example 4
A study of the purchase decisions of three stock portfolio managers, A, B and C, was conducted to
compare the rates of stock purchases that resulted in profits over a time period that was less than
or equal to one year. One hundred randomly selected purchases obtained for each of the managers
showed the results given in the table. Do these data provide evidence of differences among the
rates of successful purchases for the three portfolio managers? Test with $\alpha = 0.05$.
                                   Manager
Result                           A     B     C
Purchases show profit           63    71    55
Purchases do not show profit    37    29    45
Total                          100   100   100
Null hypothesis: the rates of stock purchases that resulted in profit were the same for the three
stock portfolio managers
Alternative hypothesis: their rates were not all the same
Level of significance: 0.05
Critical region: $\chi^2 > \chi^2_{0.05,2} = 5.991$, where $\nu = (2 - 1)(3 - 1) = 2$
Computation (expected frequencies in parentheses):
                                      Manager
Result                           A         B         C       Total
Purchases show profit         63 (63)   71 (63)   55 (63)     189
Purchases do not show profit  37 (37)   29 (37)   45 (37)     111
Total                          100       100       100        300
$$\chi^2 = \frac{(63 - 63)^2}{63} + \frac{(71 - 63)^2}{63} + \frac{(55 - 63)^2}{63} + \frac{(37 - 37)^2}{37} + \frac{(29 - 37)^2}{37} + \frac{(45 - 37)^2}{37} = 5.49$$
Conclusion: As the sample $\chi^2$ (= 5.49) falls outside the critical region, do not reject the null
hypothesis and conclude that there is not sufficient evidence that the rates of purchases
resulting in a profit differ among the three portfolio managers.
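As a cross-check not performed in the text, the homogeneity test can be sketched in Python together with an exact p-value. This relies on a standard fact: with 2 degrees of freedom the chi-square upper-tail probability is exactly $e^{-x/2}$, so no table or external library is needed. The p-value of roughly 0.064 is above 0.05, which agrees with the conclusion above.

```python
from math import exp

table = [[63, 71, 55],   # purchases showing a profit (managers A, B, C)
         [37, 29, 45]]   # purchases not showing a profit
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(table, expected)
           for o, e in zip(obs_row, exp_row))
df = (len(table) - 1) * (len(table[0]) - 1)
assert df == 2  # the closed form below is valid only for 2 degrees of freedom
p_value = exp(-chi2 / 2)
print(f"chi-square = {chi2:.2f} on {df} df, p-value = {p_value:.3f}")
```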
