Statistics notes

Quantitative Methods G
Unit 1 : Data Collection

Unit 2 : Tables, charts and graphs
Unit 3 : Summarising numerical data on one variable
Unit 4 : Describing the relationship between two variables
Unit 5 : Probability
Unit 6 : Discrete probability distributions
Unit 7 : Continuous probability distributions
Unit 8 : Sampling distributions
Unit 9 : Hypothesis testing: principles
Unit 10 : Hypothesis testing: applied
Unit 11 : Simple linear regression
Unit 12 : Time series, forecasting and index numbers
Copyright (c) 2002 University of Canberra.
UNIT 1: DATA COLLECTION
1.0 Contents
1.1 Learning objectives
1.2 Introduction
1.3 Why study statistics?
1.4 What is statistics?
1.5 Populations and samples
1.6 Errors in sampling
1.7 Sampling plans
1.8 Simple random samples
1.9 Systematic random samples
1.10 Stratified random samples
1.11 Cluster samples
1.12 Multistage samples
1.13 Measuring unemployment in Australia
1.14 Summary
1.15 Tutorial Exercise
1.1 Learning objectives
When you have completed this unit, you should be able to:
Explain why an understanding of statistics is essential for all business students;
Define the term statistics and differentiate between samples and populations and between descriptive and
inferential statistics;
Explain some of the errors that can be made in sampling;
Define the terms probability sample, non-probability sample and sample frame and explain the need for
randomisation in statistical sampling;
Distinguish between a simple random sample, a systematic random sample, a stratified random sample,
a cluster sample and a multistage sample and explain the advantages and disadvantages of each type of
sample;
Use random number tables to select a simple random sample, a systematic random sample and a stratified
random sample from a sample frame;
Explain how Australia's unemployment figures are calculated.
1.2 Introduction
The unit starts by explaining why business students need to study statistics. In the following two sections the term
statistics and its two components, descriptive statistics and inferential statistics, are defined. These definitions are
then used to provide a framework for the subject.
The remainder of the unit deals with the first stage of a statistical investigation: the collection of data. Most data
are sample data. Five different sampling procedures are discussed:
simple random sampling,
systematic random sampling,
stratified random sampling,
cluster sampling and
multistage sampling.
At the end of the unit you should know how to collect data. In the next unit you will learn how to organise these
data into tables, charts and graphs.
1.3 Why study statistics?
There are two main reasons why business students need to study statistics.
1. To test theoretical models.
Statistics is used to test accounting, economic and financial theories. Economic theory develops models which
attempt to explain and predict the behaviour of individuals and societies. These theoretical constructs are often
very persuasive and seem to provide an accurate picture of the societies in which we live. The models are
logically consistent. They make initial assumptions about the behaviour of individuals and societies and then,
from these assumptions, deduce how individuals and societies will react to changes in their environment.
Economic theories are often intellectually satisfying in the way that the conclusions are deduced logically from
the assumptions. However, the theories are not developed to provide intellectual stimulation but to explain and
predict human behaviour. The true test of an economic theory does not lie in its logical structure but in the
extent to which it provides accurate predictions of the responses of individuals and societies to changes in their
economic environment. Statistics develops the procedures used for collecting information and for measuring
the extent to which observed behaviour agrees with the predictions of theory. Through the use of statistics, the
validity of theories can be assessed.
2. For decision making.
Statistics is used in applying theory to decision-making. Even when a model has been shown to make predictions
that agree with observed behaviour, these predictions are often too general to be of use to decision
makers. From the theory of demand, a businessman could be advised that a cut in price would lead to an
increase in the quantity demanded; hardly a surprising conclusion. To be useful advice, the businessman
would also need to know by how much the quantity demanded will increase following the cut in price. In
macroeconomics, it is shown that an increase in government expenditure will lead, through the multiplier, to
an increase in the level of economic activity in the country. Again, to be useful, this result must be quantified.
What is the value of the multiplier for Australia? To answer these questions, data must be collected and used
to estimate the numerical values of the responses.
The process of collecting data and quantifying theories is the subject matter of statistics. Without the use
of statistics to measure the variables that economists discuss, economics has little practical value. Without
statistics, economics is little more than idle speculation. When combined with statistics, economics becomes
a powerful aid to decision-making in a wide range of practical situations.
A knowledge of statistics is essential for all business students.
1.4 What is statistics?
The common view of statistics is that it deals with tables of economic data similar to Table 1.1 below.
Table 1.1: Australian Unemployment 1996 to 2000 (%)
Age (years)    1996   1997   1998   1999   2000
15 up to 20*   18.6   18.3   18.3   15.6   15.2
20 up to 25    10.2   12.7   11.7   10.0    8.3
All persons     7.7    7.9    7.6    6.5    6.0
* This class includes observations with ages from 15 years old up to but not including 20 years old. The other classes are defined similarly.
Source: The Labour Force, Australia, ABS, Cat. No. 6203.0
This table was constructed by the Australian Bureau of Statistics to show the changes in the unemployment rate
over time and by age group. The rate of unemployment is one of the major factors in both government and private
sector policy decisions and so it is important that these figures be reliable.
To see the full range of a statistical investigation let us look at how these figures were obtained and the uses to
which the figures are put.
The first stage in the construction of the table was to survey a sample of approximately 30,000 dwellings each
month and record the labour force status of each person in the dwellings. This is called the Labour Force Survey.
The result is a huge quantity of completed forms. Data in this form are not useful; they must first be summarised.
The second stage was to summarise these data and present the data in tabular form.
Before these data are used, the reliability of the figures must be investigated. Decisions should not be based on
unreliable figures. The third stage, then, is to measure the reliability of the data.
The data can now be analysed. The effectiveness of different policy initiatives over this period can be measured
and this information used in making more informed government policy decisions. The major objective in constructing
this table is to aid the decision-making processes of the government.
The stages in the construction and application of Table 1.1 are typical of the processes used in statistics.
The four stages of a statistical investigation are:
1. collect the data;
2. summarise and present the data;
3. measure the reliability of the collected data;
4. analyse the data and use this knowledge in decision-making.
This suggests the following definition of statistics.
Statistics: The study of the methods used to collect, summarise and measure the reliability of data used in
decision-making.
1.5 Populations and samples
Through the Labour Force Survey the Bureau of Statistics attempts to measure the employment status of all members
of the Australian adult population. However, it bases its estimates on a survey of less than 1% of the total
adult population. It is a common feature of statistical investigations that data is only available for a part of the
group of interest.
Population: The set of all items of interest in a statistical investigation.
The items of interest can be objects such as individuals, households or countries, or measurements such as income,
savings etc.
Sample: A part, or subset, of the population.
In the Labour Force Survey the population is the usually resident population of Australia aged 15 and over and
the sample is the residents in the 30,000 selected dwellings aged 15 and over.
Most available data are sample data. Using sample data raises the most fundamental problem of statistics.
How reliable are figures calculated from samples as estimates of population values?
Consider for instance the Labour Force Survey outlined above. Although the sample used is large, it is still less
than 1% of the population of all adults in Australia. How likely is it that the households in the sample have a
very different employment status to the employment status in the population as a whole? If the sample figures
do differ from the population figures, the government acting on the sample data could make inappropriate policy
decisions.
Statistics is divided into two branches: descriptive statistics and inferential statistics.
Descriptive statistics: Methods of collecting, summarising, and presenting data.
Inferential statistics: Methods for drawing conclusions (that is, making inferences) about a population based on
sample data. When proper sampling procedures are used, measures of the reliability of the conclusions can be
calculated.
The reliability of a conclusion is specified by giving the probability that the conclusion is true.
e.g. The unemployment rate in the ACT is between 6.5% and 6.9% with probability 0.95.
In units 1 to 4 of this subject you will study descriptive statistics. We start with the process of collecting data and
then discuss the summarisation and presentation of data in tables, charts and graphs and through the calculation
of numerical summary measures.
In units 5 to 7 we discuss the probability theory necessary to understand inferential statistics. You will see that
we only introduce the most basic concepts of probability.
In units 8 to 12 this probability theory is applied to inferential statistics.
Example 1.1
A tutor asked all the students in her small tutorial group: "How many hours did it take you to complete the first
tutorial exercise?" The responses from the eight students in the tutorial were
4 6 6 4 8 7 6 4
(a) She forwarded this information to the Subject Coordinator. Are these figures population or sample data?
(b) The tutor concluded that on average the students in her tutorial were taking about 6 hours to complete the
first tutorial exercise. On the basis of this evidence from this tutor, the Subject Coordinator concluded
that the students of the subject were taking about 6 hours to complete the first tutorial exercise. Are these
analyses of the results exercises in descriptive statistics or exercises in inferential statistics?
Solution
(a) Whether the data are viewed as sample or population data depends upon the group of students about whom
conclusions are to be drawn on the basis of the data. If the tutor is only interested in how many hours this
group of students took to complete the exercise, then, for the tutor, these are population data. If, on the
other hand, the tutor's interest is in all the students that take this course now and in the future, then these are
sample data. The Coordinator is probably interested in all the students taking the subject and so for her/him,
these are sample data. Notice that it is possible for the same data to be population data for one study and
sample data for another.
(b) The tutor has summarised the responses of the eight students into a single number: the average time spent
on the tutorial exercise. This is an exercise in descriptive statistics.
The Coordinator has drawn a conclusion on the average time spent on the exercise by all the students in
the subject from the average time spent by this sample of eight students. This is an exercise in inferential
statistics. The Coordinator has now to decide whether this is a reliable estimate of the average time taken
by all students to complete the tutorial exercise.
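The tutor's summary in part (b) can be reproduced in a few lines of Python. This is only an illustrative sketch: the variable names are assumptions and are not part of the original notes.

```python
from statistics import mean

# Hours reported by the eight students in the tutorial (Example 1.1).
hours = [4, 6, 6, 4, 8, 7, 6, 4]

# Descriptive statistics: summarise the eight responses in a single number.
average_hours = mean(hours)
print(average_hours)  # 5.625, which the tutor reports as "about 6 hours"
```

Calculating the average is descriptive statistics. Deciding whether 5.625 hours is a reliable guide to the time taken by all students in the subject is the inferential step, and the code does not answer it.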
1.6 Errors in sampling
You now know that the objective of taking a sample is to use the sample data to make reliable statements about
the population. Many sample results are reported in the media and discussed at great length. Have you realised
that little consideration has been given to the reliability of the sample results? For example, the Canberra Times
reports on the results of surveys of public opinion. Are these polls reliable guides to public opinion? When
considering the results of such polls, there are a number of questions that should be asked.
1. Was the sample taken from the correct population?
The sample should be taken from the population of interest. We need to distinguish between the target population
and the sampled population.
Target population: The population about which conclusions are to be drawn on the basis of the sample data.
The sample should have been selected from this target population. However, it is often difficult to make sure
that the group from which the sample is selected is the same as the target population.
Sampled population: The population from which the sample is taken.
Some of the most spectacular errors in statistics have occurred because the sampled population was very different
from the target population. The most famous example of this was the 1936 Literary Digest prediction
for the US presidential election. The sample was selected from telephone directories and club membership
lists. This sampled population included a much higher percentage of rich Republican voters than existed in
the population as a whole. The Literary Digest predicted a victory for Landon, the Republican candidate,
but in fact there was a landslide victory for Roosevelt, the Democratic candidate!
2. What procedure was used to select the sample from the sampled population? Did this procedure introduce a
bias into the results?
One popular procedure used for opinion polls is for the interviewers from the polling agency to walk around
town, read prepared questions to individuals in the street and record the responses. The sample selection is
made by the interviewers. This can introduce a selection bias in the sample. The interviewers may only seek
the opinions of a particular type of passerby. For example, it is not unknown for young pollsters to only seek
the opinions of attractive young members of the opposite gender. The opinions of large drunken males are
seldom sought. Samples selected by interviewers have often given inaccurate estimates of population values.
The bias can be reduced, but not eliminated, by requiring each interviewer to fill a quota of different types of
passerby, e.g. 20 males aged 20-29, 20 females aged 20-29 etc. This is then called quota sampling. This
still leaves the interviewer free to select within the group and this selective sampling can result in the sample
values not being representative of the population as a whole.
3. What was the response rate?
Data are often collected through a questionnaire. Questionnaires are sent through the post and the recipients
are asked to fill in and return the completed questionnaire. The individuals to whom the questionnaires are
sent are the sampled units. Not all these individuals will choose to respond to the questionnaire. The results
would then be based not on the opinions of all those to whom the questionnaire was sent, but only on those
who chose to respond. The opinions of those who choose to respond may differ from the opinions of those
who choose not to respond. This is especially important where the response rate is low. With low response
rates, a follow-up survey of non-respondents should be used and the results of the initial and follow-up surveys
compared.
4. Were any of the questions asked biased?
Questions should not suggest a preferred response. It has been found that even small variations in the ways
questions are phrased can elicit very different answers. Constructing neutral questions is extremely difficult.
5. Did the respondents have any reason to give an incorrect reply?
Household expenditure surveys reveal that alcohol and cigarette consumption are regularly under-reported.
Surveys of television viewing often report that respondents claim to have watched more current affairs programmes
and fewer soap operas than is actually the case. In general, people like to present themselves in a
favourable light and may give responses that promote this.
6. What was the size of the sample?
At first, it may seem that the size of the sample is the main determinant of the reliability of a sample: the
bigger the better. This is not the case. A small carefully selected sample from the target population will give
more reliable results than a large sample selected in a non-random manner.
The choice of inappropriate sampling procedures, the asking of inappropriate questions and calculation errors can
cause the calculated sample values to differ from the true population values. Such errors are called non-sampling
errors. They can be reduced by careful sample selection, by asking more appropriate questions and by checking
calculations.
Even when great care is taken to reduce these non-sampling errors, it is possible that purely by chance the sample
selected has different values from the population. For example, if the names of all the students at UC are put in a
hat and a sample of 10 students selected, it is possible that the 10 tallest students are selected. Statements made
about the heights of students based on this sample will be seriously in error. Errors that arise through the random
nature of sampling are called sampling errors. These cannot be completely avoided by exercising care in sample
selection as they arise through the nature of the sampling process. However, using efficient sampling procedures
will reduce the probability of large sampling errors.
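The difference between the two kinds of error can be illustrated with a small simulation. The sketch below is hypothetical throughout: the population size, the 30% true rate, the response propensities and the seed are all assumptions chosen for illustration, not figures from any real survey.

```python
import random

rng = random.Random(2002)  # fixed seed (an arbitrary choice) so the run is repeatable

# Hypothetical population: 100,000 people, 30% of whom support a policy.
population = [1] * 30_000 + [0] * 70_000
true_rate = sum(population) / len(population)

# Sampling error: five simple random samples of 500 give slightly different
# estimates, all scattered around the true rate.
random_estimates = [sum(rng.sample(population, 500)) / 500 for _ in range(5)]

# Non-sampling error: a self-selected sample in which supporters are assumed
# to be four times as likely to respond as non-supporters.
respondents = [x for x in population if rng.random() < (0.40 if x else 0.10)]
biased_estimate = sum(respondents) / len(respondents)

print("true rate:", true_rate)
print("random samples of 500:", random_estimates)
print("self-selected sample of", len(respondents), "gives", round(biased_estimate, 2))
```

Under these assumptions the self-selected sample is far larger than any of the random samples, yet its estimate is much further from the true rate. This is the point made in question 6 above: size does not compensate for a biased selection procedure.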
In the remainder of this unit you will learn sampling procedures that enable samples to be taken without a sample
selection bias.
Example 1.2
A television current affairs program asked:
Do you believe that capital punishment should be introduced for cheating students who copy
their assignments? Ring in and give us your opinion.
Were the results reliable?
Solution
1. Was the sample taken from the correct population?
Target population: not specified but probably all adults.
Sampled population: viewers of the program.
Viewers of a current affairs program may have different opinions from those who watch cartoons or do
not watch television at all. The sampled population differs greatly from the target population.
2. What procedure was used to select the sample from the sampled population? Did this procedure introduce
a bias into the results?
The original sample was all the viewers of the program but the respondents were self-selected. This
makes the sample extremely unreliable.
3. What was the response rate?
Usually very low. Only a few people will have responded. Students who were copying their assignments
had a strong interest in the result and so may have been over-represented in the respondents.
4. Were the questions asked biased?
The inclusion of the emotive word "cheating" could have biased the responses.
5. Did the respondents have any reason to give an incorrect response?
Students who were copying their assignments had an interest in the result, but that was not a bias; it
only reflected their opinions.
6. What was the size of the sample?
Probably very few responded. Even if the response was large, the problems listed above would make
this an unreliable sample. On a recent Australian television program, it was estimated from 30,000
respondents that 65% of the population supported the policies of a particular politician. A properly
conducted survey showed that the true percentage was less than 30%!
In conclusion, the results of surveys of this kind are very unreliable and should never be taken seriously.
Example 1.3
We choose every 100th domestic phone number in the ACT phone book and ask the person answering the phone:
Were you unemployed last week?
The percentage unemployed in the sample is used to estimate the percentage unemployed in the ACT. Is this a
reliable estimate?
Solution
Let's go through the checklist again!
1. Was the sample taken from the correct population?
Target population: all adults who are usually resident in the ACT.
Sampled population: the population in domestic accommodation in the ACT with a listed telephone number.
The sampled and target population are similar but not identical. Non-adults could be included and people
without accommodation or in accommodation with an unlisted phone number would be excluded. These differences
can be minimised. The problem of non-adults being included in the sample could be removed by
asking for the respondent's age, and the problem of unlisted numbers could be overcome by dialling numbers
at random rather than using the phone book.
In developed countries phone polling with random dialling is a well tested and widely used polling methodology.
Most of the opinion polls that you read in the newspapers are telephone polls with random dialling. The
next time you see one of these polls study the method used to collect the data.
In less developed countries, where only the rich have phones, telephone polling is highly unreliable.
2. What procedure was used to select the sample from the sampled population? Did this procedure introduce a
bias into the results?
The phone numbers were randomly selected from the phone book and the person who answered the phone
questioned. Questioning the person who answers the phone could bias the results.
(a) The unemployed spend more time at home and so are more likely to answer the phone. This could bias
the estimated unemployment rate upwards.
(b) The young are generally the first member of the household to answer the phone and their unemployment
rate is higher than the rest of the population. Again this could bias the results upwards.
Professional pollsters overcome these biases by asking for a randomly selected specific member of the household,
for example the third oldest.
3. What was the response rate?
Telephone surveys generally have very high response rates. There is unlikely to be a major bias here.
4. Were the questions asked biased?
The only problem with the above question is the definition of unemployment. When should an individual
consider themselves to be unemployed? If an unemployed physics professor earns a little extra money by
delivering junk mail for two hours a week, is she still unemployed? Most people would say yes but the ABS
would say no! The interviewer should give the definition of unemployed to the respondent.
5. Did the respondents have any reason to give an incorrect response?
Some respondents may be unwilling to admit that they are unemployed. (Or that they are employed if the tax
department rings!)
6. What was the size of the sample?
This would be a large sample.
The major problem with this survey is asking the person who answers the phone to respond. This could
produce major biases in the estimated unemployment rate. The results of this survey would be unreliable.
1.7 Sampling plans
In the previous section, we have seen that errors can arise through the use of incorrect sample selection procedures.
Sample selection procedures that at first seem reasonable may introduce biases into the results.
Taking a sample in an ad-hoc manner is unlikely to produce reliable results. Some biases in sample selection can
be foreseen and avoided but even when great care is exercised, unforeseen biases may be introduced that make the
results unreliable. What is needed is a systematic approach to choosing a sample so that biases in the sampling
process are always avoided. The probability sample plans described from Section 1.8 onwards ensure that there
are no biases in the sampling procedure.
The two main categories of samples are: non-probability samples and probability samples.
Non-probability sample: A sample selected in such a way that the probability of some items of the population
being included in the sample cannot be calculated.
One common type of non-probability sample is a judgemental sample.
Judgemental sample: A sample in which the sample-taker uses her/his preferences to choose which items to
include in the sample.
It is usually impossible to quantify the decision processes used by the sample-taker in making her/his selection
and hence the probability of an item being included in the sample cannot be calculated.
Statisticians prefer probability samples.
Probability sample: A sample selected in such a way that each item in the population has a known probability
of being included in the sample.
Notice that it is not necessary that each item in the population has the same probability of selection, only that
each probability is known.
Statisticians favour probability samples over non-probability samples for two reasons:
1. The exercise of judgement in selecting a non-probability sample may bias the results;
2. With a probability sample, the likelihood that the sample value is close to the population value can be
calculated. This gives a precise measure of the confidence with which the sample value can be used in
making policy decisions.
In this unit we will look at five types of probability samples:
simple random samples
systematic random samples
stratified samples
cluster samples
multistage samples.
All these samples use the same basic rule of sample selection: the principle of randomisation.
Principle of randomisation: When a selection is to be made, the selection should be made at random.
This is one of the basic principles of statistics. By selecting at random, built-in biases can be avoided.
1.8 Simple random samples
In these samples every item in the population has the same probability of being selected.
Simple random sample: A simple random sample of n items is one selected from the population in such a way
that every different sample of size n has the same probability of being chosen.
Simple random samples can be selected with or without replacement and with or without ordering.
A sample is said to be selected with replacement if, after each selection has been made, the selected item is
replaced in the population and so can be selected again in the same sample. In sampling without replacement,
items are not replaced in the population after selection and so cannot be selected twice in the same sample.
A sample is said to be selected with ordering if the order of selection is important. In sampling with ordering,
selecting a first and then b second is considered a different sample from selecting b first and then a second. For
sampling without ordering, these two samples would be counted as the same sample.
Simple random sampling can only be used where there is a list of all the members of the population. This list is
called the sample frame.
Sample frame: The list of the items from which the sample is to be drawn.
The sample frame should be listed in a systematic manner, e.g. alphabetically.
The steps in selecting a simple random sample are:
1. obtain a sample frame that is as close as possible to the target population;
2. allocate each item in the list a number with the same number of digits;
3. use random number tables to select the required sample.
The use of an ordered list and a published set of random numbers enables the statistician to demonstrate that a
simple random sample has been taken and that the sampling procedure has not been manipulated. Selecting items
by writing each item's identifier on a separate piece of paper, and then drawing pieces of paper at random from
a hat is simpler and will also give a simple random sample but has the disadvantage that the statistician cannot
later prove that random selection was used. Auditors use random sampling and sometimes have to demonstrate in
a court of law that proper sampling procedures were used in an audit. In circumstances like these it is important
that the sampling process used be fully documented and replicable.
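When the selection is done by computer rather than from printed tables, a seeded pseudo-random number generator can play a similar documenting role: recording the ordered frame and the seed allows the same sample to be drawn again. The sketch below is a minimal illustration under that assumption. The frame of 500 numbered items, the function name and the seed are hypothetical and do not come from the notes.

```python
import random

# Hypothetical sample frame: an ordered list of 500 identifiers, "001" to "500".
frame = [f"{i:03d}" for i in range(1, 501)]

def simple_random_sample(frame, n, seed):
    """Select n items without replacement; the recorded seed makes the draw replicable."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

sample = simple_random_sample(frame, 20, seed=12345)
repeat = simple_random_sample(frame, 20, seed=12345)

print(sample)
print(sample == repeat)  # the same frame and seed reproduce the same sample
```

Here `random.sample` draws without replacement, so no item can appear twice. Whether a seeded generator would satisfy a court or an auditing standard is a separate question that this sketch does not address.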
Using a simple random sample has two main advantages:
1. It avoids the bias introduced by some non-random selection methods;
2. It is possible to quantify the reliability of the results.
There are some disadvantages of using a simple random sample.
1. Some clearly identifiable groups of interest may be under-represented. For example, if a random sample of
UC students is taken, it is possible that the sample has no student from some of the smaller schools. If the
object is to investigate a wide range of students, this is undesirable.
2. It requires that there be a sample frame. For some sampling problems, it is very difficult to obtain a list of
all the population.
3. It can be very expensive. The selected individuals may be spread over a wide area and contacting each
individual would then involve a long and expensive journey.
4. More reliable estimates can often be obtained by using a stratified random sample of the same size (see the
section on stratified random samples below).
Example 1.4
There are four employees in an office: Arthur, Ben, Clare and Daniel. A sample of size 2 is taken from this
population of 4 employees.
(a) List all the possible samples if the sampling is
i. with replacement and with ordering;
ii. with replacement and without ordering;
iii. without replacement and with ordering;
iv. without replacement and without ordering.
(b) If the sample selected is a simple random sample, what is the probability that the sample selected is Arthur
and Ben?
Solution
Let Arthur = a, Ben = b, Clare = c and Daniel = d.
(a) i. For sampling with replacement and with ordering, the set of possible samples is
S = {aa, ab, ac, ad, ba, bb, bc, bd, ca, cb, cc, cd, da, db, dc, dd}
Notice that if a is the first element selected, it is replaced in the population before the second element
is selected. Thus a could be selected again when the second element is selected. Thus the sample
could be a first and a second. Similarly the sample could be bb, cc or dd.
ii. For sampling with replacement and without ordering, the set of possible samples is
S = {aa, ab, ac, ad, bb, bc, bd, cc, cd, dd}
Here ba and ab are counted as the same sample and so are not listed separately.
iii. For sampling without replacement and with ordering, the set of possible samples is
S = {ab, ac, ad, ba, bc, bd, ca, cb, cd, da, db, dc}
Here the first element is not replaced before the second element is selected. Thus if a is the first
person selected then, when selecting the second person, only b, c and d are left in the population to
be selected.
iv. For sampling without replacement and without ordering, the set of possible samples is
S = {ab, ac, ad, bc, bd, cd}
(b) With a simple random sample, each sample has the same chance of being selected.
i. For sampling with replacement and with ordering, there are 16 possible samples and two of these
samples (ab and ba) include both Arthur and Ben.
Therefore P(Arthur and Ben) = 2/16 = 1/8
ii. For sampling with replacement and without ordering, there are 10 possible samples and one of these
samples (ab) includes both Arthur and Ben.
Therefore P(Arthur and Ben) = 1/10
iii. For sampling without replacement and with ordering, there are 12 possible samples and two of these
samples (ab and ba) include both Arthur and Ben.
Therefore P(Arthur and Ben) = 2/12 = 1/6
iv. For sampling without replacement and without ordering, there are 6 possible samples and only one of
these samples (ab) includes both Arthur and Ben.
Therefore P(Arthur and Ben) = 1/6
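The four lists of samples in Example 1.4 can be generated mechanically. The sketch below uses Python's standard `itertools` module. The dictionary labels and variable names are illustrative assumptions, and each listed sample is treated as equally likely, as the example's solution does.

```python
from fractions import Fraction
from itertools import combinations, combinations_with_replacement, permutations, product

population = "abcd"  # a = Arthur, b = Ben, c = Clare, d = Daniel

schemes = {
    "with replacement, with ordering": list(product(population, repeat=2)),
    "with replacement, without ordering": list(combinations_with_replacement(population, 2)),
    "without replacement, with ordering": list(permutations(population, 2)),
    "without replacement, without ordering": list(combinations(population, 2)),
}

results = {}
for name, samples in schemes.items():
    # Samples that contain both Arthur and Ben, in either order.
    favourable = [s for s in samples if set(s) == {"a", "b"}]
    # Each listed sample is treated as equally likely.
    results[name] = (len(samples), Fraction(len(favourable), len(samples)))
    print(name, results[name])
```

The counts come out as 16, 10, 12 and 6 samples, with probabilities 1/8, 1/10, 1/6 and 1/6 of selecting Arthur and Ben.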
Example 1.5
In the year 2001, 965 students were registered as studying in the Division of Management and Technology.
Explain how you would use the random number table below to select a simple random sample of 20 students.
Table 1.2: A Short Table of Random Numbers
7898 8002 4418 2747 8079 7993 6863 9542 0949 4531 6955 5826 9971 6233 7887
8640 3204 6906 5719 1116 5982 9532 2422 8333 8828 9002 2680 1928 8532 3600
4431 3453 3070 5239 3168 6490 0275 8443 9984 7503 0263 8086 3372 5454 1599
5868 4764 0158 1225 5558 7840 9394 8126 6974 1561 4765 0758 8717 6979 6306
8214 6959 7775 5844 5149 9173 4558 9107 0453 6119 2915 6586 9670 6580 5202
3137 1170 0345 6099 6352 6074 6142 1898 3657 1924 5625 3556 8178 0103 6107
3490 3349 7010 2045 6123 6271 8981 5274 2183 9820 0957 3988 6747 3508 8914
Solution
The steps to be followed in selecting this simple random sample are:
1. Obtain a sample frame that is as close as possible to the target population.
The University Registrar can provide a list of all the students enrolled in the Division of Management and
Technology. This list will contain many errors but it is the best available sample frame.
2. Allocate each item in the list a number with the same number of digits.
With 965 students, each student can be allocated a different three-digit number. The first student in the list
could be allocated the number 001, the second student the number 002, etc.
3. Use random number tables to select the required sample.
A random starting point should be selected from the above table. Suppose the starting point used is the start
of row 6. Then dividing the numbers in rows 6 and 7 into groups of three gives
313 711 700 345 609 963 526 074 614 218 983 657
192 456 253 556 817 801 036 107 349 033 497 010
The 20 selected students are the students with numbers
313 711 700 345 609 963 526 074 614 218 657 192
456 253 556 817 801 036 107 349
Note: The number 983 is ignored as it lies outside the range of numbers (001 to 965)
allocated to students.
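Step 3 of this example can be sketched in Python as a check on the hand calculation. The function name `select_from_digits` is hypothetical, and the sketch assumes that a repeated number would be skipped (sampling without replacement); no repeats actually occur in rows 6 and 7.

```python
def select_from_digits(rows, width, max_id, sample_size):
    """Read a random number table in groups of `width` digits.

    Groups outside 1..max_id are ignored. Repeats are also ignored,
    which is an assumption made for this sketch.
    """
    # Join the rows into one long string of digits, dropping the spaces.
    digits = "".join("".join(row.split()) for row in rows)
    selected = []
    for i in range(0, len(digits) - width + 1, width):
        number = int(digits[i:i + width])
        if 1 <= number <= max_id and number not in selected:
            selected.append(number)
        if len(selected) == sample_size:
            break
    return selected


# Rows 6 and 7 of Table 1.2.
row6 = "3137 1170 0345 6099 6352 6074 6142 1898 3657 1924 5625 3556 8178 0103 6107"
row7 = "3490 3349 7010 2045 6123 6271 8981 5274 2183 9820 0957 3988 6747 3508 8914"

sample = select_from_digits([row6, row7], width=3, max_id=965, sample_size=20)
print(sample)
```

The selected numbers are held as integers, so student 074 appears as 74 and student 036 as 36.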
Quick Question 1
1.9 Systematic random samples
The simple random sampling procedure outlined above can be extremely tedious when a large sample is to be
selected. To select a sample of 200 items, 200 random numbers are extracted from the random number tables.
Systematic random sampling gives a much quicker way of selecting a sample that is almost a simple random
sample and in practice can be analysed as if it is a simple random sample. It also has the advantage that the
selected items are evenly distributed over the sample frame and not concentrated in one area of the sample frame.
The four-step procedure is:
1. Obtain a sample frame that is as close as possible to the target population.
2. Determine the sample fraction of 1 in k by:
\( k = \frac{\text{population size}}{\text{sample size}} \)
(If the result is not an integer, round the number upwards to the next integer.) Then 1 in every k items is to
be chosen.
3. Renumber the first k items in the sample frame and then use a random number table to randomly select one
of these items. Suppose the selected item is number r.
4. The sample selected is the rth item and then every subsequent kth item.
It is possible that this could lead to odd samples. The problems arise where the items in the sample frame have a
pattern which coincides with the selection pattern used in systematic sampling. For example, if the sample frame
is a list of married couples ordered wife, husband, wife, husband, wife, etc. and k is even, then
a systematic sample will contain all husbands or all wives.
These cases are rare. In all other cases systematic and simple random samples have the same properties.
Systematic samples are therefore analysed as if they were simple random samples. Systematic sampling is a quick way of
selecting an almost simple random sample.
Example 1.6
In the year 2001, 965 students registered as studying in the Division of Management and Technology. Explain
how you would use the random number tables below to select a systematic random sample of 20 students.
Table 1.3: A Short Table of Random Numbers
7898 8002 4418 2747 8079 7993 6863 9542 0949 4531 6955 5826 9971 6233 7887
8640 3204 6906 5719 1116 5982 9532 2422 8333 8828 9002 2680 1928 8532 3600
4431 3453 3070 5239 3168 6490 0275 8443 9984 7503 0263 8086 3372 5454 1599
5868 4764 0158 1225 5558 7840 9394 8126 6974 1561 4765 0758 8717 6979 6306
8214 6959 7775 5844 5149 9173 4558 9107 0453 6119 2915 6586 9670 6580 5202
Solution
The steps in selecting this systematic random sample are:
1. Obtain a sample frame that is as close as possible to the target population.
The Registrar can provide a list of Division of Management and Technology students.
2. Determine the sample fraction of 1 in k by
\( k = \frac{\text{population size}}{\text{sample size}} = \frac{965}{20} = 48.25 \), which is 49 after rounding upwards.
Then 1 in every 49 students is to be chosen.
3. Renumber the first 49 students in the sample frame and then use a random number table to randomly select
one of these students.
As there are only 49 students in this part of the list, each student can be given a two-digit number from 01 to 49.
Choosing a random starting point in the above table, say the start of the second row, and dividing the row
into two-digit numbers gives:
86 40 32 04 69 06 57 19 11 16 59 82
After discarding 86 as too large (i.e. outside the range 01 to 49), student 40 is the student selected out of
the first 49 students.
4. The sample selected is the 40th item and then every subsequent 49th item.
The first selection is student 40 on the list.
The second selection is student number (40 + 49) = 89 on the list.
The third selection is student number (89 + 49) = 138 etc.
This will only give a sample of 19 students as the 20th number would be 971 and there are only 965 students.
The final student should be randomly selected from the population.
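The arithmetic of steps 2 and 4 can be sketched in Python as follows. The function name `systematic_positions` is hypothetical, and the random start r = 40 is taken from the worked solution rather than generated by the code.

```python
import math


def systematic_positions(population_size, sample_size, start):
    """Return k and the list positions r, r + k, r + 2k, ... in the frame."""
    # Step 2: sample fraction of 1 in k, rounding upwards.
    k = math.ceil(population_size / sample_size)
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k")
    # Step 4: the rth item and then every subsequent kth item.
    return k, list(range(start, population_size + 1, k))


k, positions = systematic_positions(965, 20, start=40)
print(k)               # sample fraction is 1 in k
print(positions[:3])   # first three selections
print(len(positions))  # number of students selected
```

Because k is rounded upwards, the list can fall short of the required sample size, as it does here; the sketch leaves the choice of the remaining student to a separate random selection.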
Quick Question 2
1.10 Stratified random samples
Where a population can be divided into a number of clearly defined groups, it is often advantageous to ensure
that the sample contains some items from each of these groups. This can be achieved by taking a separate simple
random sample from each of the groups. In this procedure each group is referred to as a stratum (singular word
form) of the population. The strata (plural word form) must be mutually exclusive and collectively exhaustive
groups so that each item of the population is in one and only one stratum.
Stratified random sample: A sample obtained by dividing the population into mutually exclusive subgroups,
called strata, and then drawing a simple random sample from each stratum.
With stratified random sampling, a decision has to be made on how to divide the sample between different strata.
One method, called proportional allocation, is to allocate the sample in proportion to the size of the stratum.
However, there are better ways of dividing the sample between strata that take into account both the cost of
sampling and the variability of the strata values. These methods are described in higher level statistics units but in
this subject we will only consider proportional allocation.
A stratified sample has the following advantages:
1. Ensuring that all the population groups are included may make the sample more representative of the
population than a simple random sample. It can give a more reliable estimate of a population value than a
simple random sample of the same size. Stratified sampling is particularly advantageous when the items in
each stratum are very similar to each other and different from the items in other strata.
2. It ensures that there are sufficient members of each group in the sample for the characteristics of each group
to be estimated and for comparisons to be made between the groups.
3. It is still possible to quantify the reliability of the results.
However, there are some disadvantages:
1. It requires that there be a sample frame.
2. It requires that a separate sample frame be constructed for each group. This may be difficult and
expensive when the required information is not readily available.
3. It can be very expensive. The selected individuals may be spread over a wide area and contacting each
individual would then involve a long and expensive journey.
Example 1.7
In the year 2001, 965 students registered as studying in the Division of Management and Technology. There were
425 male students and 540 female students. Explain how you would use the random number table below to select
a random sample of 20 students stratified by gender with proportional allocation.
Table 1.4: A Short Table of Random Numbers
7898 8002 4418 2747 8079 7993 6863 9542 0949 4531 6955 5826 9971 6233 7887
8640 3204 6906 5719 1116 5982 9532 2422 8333 8828 9002 2680 1928 8532 3600
4431 3453 3070 5239 3168 6490 0275 8443 9984 7503 0263 8086 3372 5454 1599
5868 4764 0158 1225 5558 7840 9394 8126 6974 1561 4765 0758 8717 6979 6306
8214 6959 7775 5844 5149 9173 4558 9107 0453 6119 2915 6586 9670 6580 5202
Solution
The steps to be followed in selecting this stratified random sample are:
1. Obtain a separate sample frame for each stratum that is as close as possible to the stratum in the target
population.
Obtain a list of Division of Management and Technology students with each student identified by name
and gender. This list should then be divided into two separate lists of the 425 male students and 540
female students.
2. Determine the number of students to be selected from each stratum.
The sample fraction is 1 in k where
\( k = \frac{\text{population size}}{\text{sample size}} = \frac{965}{20} = 48.25 \), which is 49 after rounding up,
i.e. 1 in every 49 students is to be selected. Thus
\( \frac{425}{49} \approx 9 \) (after rounding) of the male students and
\( \frac{540}{49} \approx 11 \) (after rounding) of the female students are to be selected.
3. First select the 9 male students.
With 425 male students, each student can be allocated a different three-digit number in the range 001 to
425. If the starting point used in Table 1.4 is the start of row 2, then dividing the numbers in rows 2 and
3 into groups of three gives:
864 032 046 906 571 911 165 982 953 224 228 333
882 890 022 680 192 885 323 600 443 134 533 070
Ignoring numbers outside the range 001 to 425, the selected 9 male students are the students with the
numbers:
032 046 165 224 228 333 022 192 323
4. The 11 female students can now be selected in the same way.
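The proportional allocation in step 2 can be sketched in Python. The function name `proportional_allocation` is hypothetical, and rounding each stratum's share to the nearest whole number is an assumption based on the "after rounding" wording in the solution.

```python
import math


def proportional_allocation(strata_sizes, sample_size):
    """Return k and the number to select from each stratum (1 in every k)."""
    population_size = sum(strata_sizes.values())
    # Sample fraction of 1 in k, rounding upwards.
    k = math.ceil(population_size / sample_size)
    # Each stratum contributes its size divided by k, rounded to the nearest integer.
    allocation = {name: round(size / k) for name, size in strata_sizes.items()}
    return k, allocation


k, allocation = proportional_allocation({"male": 425, "female": 540}, sample_size=20)
print(k, allocation)
```

In this example the rounded allocations add up to the required 20 students. With other stratum sizes the rounded figures might not sum exactly to the sample size, in which case a small adjustment would be needed.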
Quick Question 3
1.11 Cluster samples
In the sampling procedures discussed so far, each item in the sample is chosen individually. In cluster sampling,
the population is first divided into groups called clusters, with each item of the population being a member of
one and only one of the clusters. Then a simple random sample of clusters is selected and all the items in the
selected clusters are included in the sample.
Cluster sample: A simple random sample of groups or clusters of items.
With this method the clusters are usually geographical areas.
As an example, consider the problem of selecting a sample of Canberra households. There is no well defined
sample frame available. Even if a random sample of households could be selected, the households would be
spread over the city and each visit would require a certain amount of travel time. This would make the sample
expensive. To take a cluster sample of Canberra households the procedure would be:
1. divide Canberra into a large number of small areas (e.g. use streets);
2. randomly select some of these small areas;
3. visit each of the selected small areas and interview all the households in the area.
Note that with stratified sampling, the population is divided into groups and some items are selected from all of
the groups. With cluster sampling, the population is divided into groups and all the items are selected from some of
the groups.
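The cluster sampling procedure can be sketched in Python. Everything in the sketch is hypothetical: the street names, the household labels, the seed and the function name `cluster_sample` are made up for illustration, and the `random` module stands in for a random number table.

```python
import random


def cluster_sample(clusters, n_clusters, rng):
    """Select a simple random sample of clusters and keep every item in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    items = [item for name in chosen for item in clusters[name]]
    return chosen, items


# Hypothetical sample frame of clusters: each street lists its households.
streets = {
    "Street A": ["A1", "A2", "A3"],
    "Street B": ["B1", "B2"],
    "Street C": ["C1", "C2", "C3", "C4"],
    "Street D": ["D1", "D2"],
}

rng = random.Random(2001)  # fixed seed so that the run can be repeated
chosen_streets, households = cluster_sample(streets, 2, rng)
print(chosen_streets)
print(households)
```

Only the list of streets is needed as a sample frame; no list of all households in the population is required before the streets are chosen.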
Cluster sampling has the following advantages.
1. It does not require a sample frame of the items of the population. It does require a sample frame listing all
the clusters in the population. You cannot take a simple random sample of clusters without a sample frame
of clusters.
2. It is much cheaper than a simple random sample of the same size.
3. It is still possible to quantify the reliability of the results.
There are some disadvantages.
1. Some clearly identifiable groups of interest may be under-represented.
2. Where the clusters are geographical areas, the items within the geographical area may be very similar to
each other. For example, if the area selected is in a poor part of the town, nearly all the households in
the selected cluster will be low income households. The households selected through a cluster sample will
then show less variability and so yield less information than the same sized simple random sample. The
more similar are the items within each cluster, the more serious is the problem. However, the choice is not
usually between simple random samples and cluster samples of the same size. For the same cost, a much
larger cluster sample can be taken and this advantage often outweighs the loss of variability.
Notice that:
1. a stratified random sample is most efficient when the items in each stratum are very similar to each other.
2. a cluster sample is most efficient when the items in each cluster are very different from each other.
1.12 Multistage samples
Multistage sample: A sample selected through a sequence of two or more sampling procedures.
For example, to take a multistage sample of Canberra households:
1. divide Canberra into a number of areas
2. randomly select some of these areas
3. randomly select some of the households in each of the selected areas.
In this example two simple random samples have been taken (a simple random sample of areas and then a simple
random sample of the households within the selected areas).
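The two-stage procedure can be sketched in Python. The area names, household labels, seed and the function name `two_stage_sample` are all hypothetical, and the `random` module stands in for a random number table.

```python
import random


def two_stage_sample(areas, n_areas, n_per_area, rng):
    """Stage 1: sample areas. Stage 2: sample households within each chosen area."""
    chosen_areas = rng.sample(sorted(areas), n_areas)
    return {
        area: rng.sample(areas[area], min(n_per_area, len(areas[area])))
        for area in chosen_areas
    }


# Hypothetical frame: each area lists the households it contains.
areas = {
    "Area 1": ["1a", "1b", "1c", "1d"],
    "Area 2": ["2a", "2b", "2c"],
    "Area 3": ["3a", "3b", "3c", "3d", "3e"],
}

rng = random.Random(2001)  # fixed seed so that the run can be repeated
sample = two_stage_sample(areas, n_areas=2, n_per_area=2, rng=rng)
print(sample)
```

The difference from a cluster sample is the second stage: only some of the households in each selected area are taken, rather than all of them.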
Most large-scale samples are combinations of different sampling procedures. In the next section you will learn
how multistage sampling is used to estimate the rates of unemployment in Australia.
The advantages and disadvantages of multistage samples are simply the advantages and disadvantages of the
sampling procedures in the sequence of sampling procedures.
Quick Question 4
1.13 Measuring unemployment in Australia
The rates of unemployment for Australia are discussed at great length in the popular press and on television. One
of the major objectives of government economic policy is to achieve acceptable rates of unemployment. But what
is meant by a rate of unemployment and how is it calculated? Are the figures reliable? In this section you will
learn how the unemployment rates for Australia are defined and calculated.
The main source of data is the Australian Bureau of Statistics' monthly Labour Force Survey. This survey is
described below. For a more detailed description read:
Information Paper: Labour Force Survey Sample Design 1997, ABS, Cat. No. 6269.0.
Labour Statistics: Concepts Sources and Methods 2001, ABS, Cat. No. 6102.0.
Definitions
Before unemployment can be measured it must first be defined. Economic models do not usually define their
variables very precisely, but to collect figures in a systematic manner all the data collectors must use exactly the
same definitions of the terms used in the survey.
The target population for the Labour Force Survey is the usually resident civilian population aged 15 years and
over. A person in this population is counted as a member of the labour force if he/she is either employed or
unemployed.
The definitions of employed and unemployed used by the Bureau of Statistics are given below.
Read these definitions carefully. Notice that many people in the population do not count as either employed or
unemployed! They are not included in the unemployment figures at all.
Employed: Persons aged 15 and over who, during the reference week:
Worked for one hour or more for pay, profit, commission or payment in kind in a job or business, or on
a farm (comprising employees, employers and self-employed persons); or
Worked for one hour or more without pay in a family business or on a farm (i.e. contributing family
workers); or
Were employees who had a job but were not at work and were: on paid leave; on leave without pay
for less than four weeks up to the end of the reference week; stood down without pay because of bad
weather or plant breakdown at their place of employment for less than four weeks up to the end of the
reference period; on strike or locked out; on workers' compensation and expected to be returning to their
job; or receiving wages or salary while undertaking full-time study; or
Were employers, self-employed persons or unpaid family helpers who had a job, business or farm, but
were not at work.
Unemployed: Persons aged 15 and over who were not employed during the reference week, and
Had actively looked for full-time or part-time work at any time in the 4 weeks up to the end of the
reference week and
Were available for work in the reference week or would have been available except for temporary illness
(i.e. lasting for less than 4 weeks up to the end of the reference week); or
Were waiting to start a new job within 4 weeks of the end of the reference week and would have started
in the reference week if the job had been available then; or
Were waiting to be called back to a full-time or a part-time job from which they had been stood down
without pay for less than 4 weeks up to the end of the reference week (including the whole of the
reference week) for reasons other than bad weather or plant breakdown.
Labour force: The sum of the number employed and the number unemployed in the population.
Unemployment rate: The number unemployed expressed as a percentage of the labour force.
Notice that people who have been without work for a long period and have become so discouraged that they are
no longer actively seeking work are not counted as unemployed. In times of high unemployment this could lead
to a gross underestimate of the percentage of people who wish to work but cannot. To see the magnitude of this
problem always look at the participation rate as well as the unemployment rate.
Participation rate: The labour force expressed as a percentage of the usually resident civilian population aged
15 and over in the same group.
Falls in the participation rate could be because of hidden unemployment.
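The three definitions above translate directly into arithmetic. The following Python sketch uses made-up counts (in thousands) purely to illustrate the calculation; they are not Labour Force Survey figures, and the function name `labour_force_rates` is hypothetical.

```python
def labour_force_rates(employed, unemployed, civilian_population_15_plus):
    """Return the labour force, unemployment rate (%) and participation rate (%)."""
    # Labour force: the number employed plus the number unemployed.
    labour_force = employed + unemployed
    # Unemployment rate: unemployed as a percentage of the labour force.
    unemployment_rate = 100 * unemployed / labour_force
    # Participation rate: labour force as a percentage of the population aged 15 and over.
    participation_rate = 100 * labour_force / civilian_population_15_plus
    return labour_force, unemployment_rate, participation_rate


# Hypothetical counts in thousands, chosen only to illustrate the arithmetic.
labour_force, unemployment_rate, participation_rate = labour_force_rates(
    employed=9000, unemployed=600, civilian_population_15_plus=15000
)
print(labour_force, unemployment_rate, participation_rate)
```

If 300 thousand of the unemployed in this illustration stopped looking for work, they would leave the labour force: the unemployment rate would fall even though no one had found a job, and the participation rate would fall as well. This is why the two rates should be read together.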
Survey methodology
The Labour Force Survey information is obtained by specially trained interviewers, using face to face and
telephone interview collection methods, from the occupants of selected dwellings. Information is gathered about
each individual in the target population in each of the selected dwellings. Selected dwellings remain in the survey
for eight consecutive months. The first interview is conducted face to face and then subsequent interviews are
conducted by telephone.
Notice that dwellings are selected and then every individual in the dwellings is included in the sample. This is a
cluster sample. Using the dwellings as clusters reduces transport costs.
The response rate is over 98% and so there is no low response rate bias.
The sample design
How is the sample of dwellings selected? A multi-stage probability sample design is used. The probability of
each individual being included in the sample is shown in Table 1.5 below.
Table 1.5: The Labour Force Survey Sample Design
State/Territory                 Probability
New South Wales                 1 in 300
Victoria                        1 in 252
Queensland                      1 in 222
South Australia                 1 in 147
Western Australia               1 in 360
Tasmania                        1 in 83
Northern Territory              1 in 85
Australian Capital Territory    1 in 85
The Labour Force Survey is a probability sample and so the reliability of the estimates can be measured.
There are two conflicting criteria used to determine the sample size from each state.
The most reliable national estimates would be gained when the total sample for Australia is allocated in
proportion to the population of each State/Territory.
For each State/Territory to have estimates as reliable as every other State/Territory, equal sized samples
from each State/Territory would be used.
The allocation of the sample across the States and Territories is designed to achieve a compromise between the
national estimates and the State/Territory estimates. It results in the sample fraction for each State/Territory
differing but not to the extent that would realise identical sample sizes. Within each State/Territory each dwelling
has the same probability of selection.
The sample data is collected from:
a stratified multi-stage sample of private dwellings (the above sample proportions give a sample of
29,000 private dwellings);
a stratified multi-stage sample of 500 non-private dwellings (hotels etc.).
There is a total of approximately 61,500 people responding to the survey.
For the sample of private dwellings, the sample frame used was the list of all 34,000 collection districts
used in the census. Each of these collection districts contains approximately 250 private dwellings.
The collection districts are stratified geographically by state and territory and then further stratified to give
approximately 300 private dwelling strata. A separate simple random sample of collection districts is taken from
each of the 300 private dwelling strata.
This is a stratified random sample of collection districts with 300 strata. Taking separate samples from each of
the 300 private dwelling strata ensures that there are sufficient individuals included from each state and territory
for reliable estimates of state and territory unemployment rates to be obtained.
A multi-stage process is used to select the dwellings from within each private dwelling stratum.
1. Using the sample frame of collection districts for each stratum, a simple random sample of collection districts
is taken.
2. Using the sample frame of blocks of dwellings available for each selected collection district, a single block
is selected.
3. The selected blocks are visited and a list of all the dwellings in the block made. A systematic random
sample is taken from the list of dwellings in the block. This selected sample is then given to an interviewer
who visits the selected dwellings in the list.
With this sampling procedure:
1. the interviewer has no say in the selection of the dwellings. There is no judgemental selection factor here
to bias the results.
2. there is a second cluster effect here. By only visiting one block in each collection district there is a reduction
in transport costs.
A similar multi-stage design is also used to draw the sample of non-private dwellings.
The Labour Force Survey is a highly organised and well structured sample of individuals from the target
population. It is random but certainly not haphazard. The figures obtained are reliable estimates of the defined rates of
unemployment.
Notice that the figures do not come from counting the number of people visiting the CES as claimed by a former
deputy prime minister.
1.14 Summary
This unit introduces some of the basic vocabulary of statistics.
Distinctions are drawn between populations and samples, between descriptive and inferential statistics and
between probability and non-probability samples.
You learned that most data collected are sample data and that the fundamental problem of statistics is to assess
the reliability of these sample data. The reliability of sample data can only be measured with any precision for
probability samples.
You learned how to use random number tables to select different types of probability sample: a simple random
sample, a systematic random sample and a stratified random sample. You also learned the advantages and
disadvantages of each type of sample.
The result of the data collection process is a large volume of disorganised data. In the next unit you will learn
how to organise these data into tables and diagrams.
Read the learning objectives at the start of this unit. Have you achieved those objectives? If there are any
objectives about which you are unclear, re-read the appropriate sections before trying the tutorial exercise.
UNIT 2
TABLES, CHARTS
AND GRAPHS
2.0 Contents
2.1 Learning objectives
2.2 Introduction
2.3 The four levels of measurement
2.4 Discrete and continuous variables
2.5 Presenting data in tables
2.6 Raw data and arrays
2.7 Value frequency distributions
2.8 Stem-and-leaf displays
2.9 Class frequency distributions
2.10 Cumulative frequency distributions
2.11 Presenting data in charts: pie charts and bar charts
2.12 Presenting data in graphs: drawing histograms and cumulative frequency polygons
2.13 Misleading diagrams
2.14 Summary
2.1 Learning objectives
When you have completed this unit, you should be able to:
Distinguish between variables measured on a nominal, ordinal, interval and ratio scale;
Identify discrete and continuous variables;
Use arrays, stem-and-leaf displays and value frequency tables to organise data;
Construct class frequency tables;
Use pie charts and bar charts to emphasise particular features of data;
Draw histograms and cumulative frequency polygons;
Recognise when a diagram presents a misleading picture of data.
2.2 Introduction
In Unit 1 you learnt that the data collection process results in a large volume of disorganised data. For these
data to be useful, they must be organised and the resulting information communicated to decision makers.
This unit describes the process of organising and presenting data.
The way data are organised depends on the type of data collected. The unit starts by categorising data into four types
through the level of measurement of the variable. A distinction is then drawn between discrete and continuous
variables. These concepts guide us on the appropriate techniques to use in organising data.
Statisticians communicate with decision makers through tables, charts and graphs. Tables are used to display
detailed information and the unit provides some guidelines on the construction and presentation of tables. We
use charts and graphs to emphasise particular aspects of the data. The unit discusses several types of charts and
graphs: stem-and-leaf displays, pie charts, bar charts, histograms and cumulative frequency polygons. The unit
concludes by discussing ways in which charts and graphs can be used to misrepresent data and hence mislead us.
2.3 The four levels of measurement
First let us define two terms that we will use throughout this unit:
Variable: A characteristic of a sample or population that is of interest.
Data: The observed values of variables.
(Data is a plural word; the singular is datum.)
For example, suppose that in a survey of the readers of a magazine the following questions were asked:
(a) What is your gender?
(b) What is your highest level of education: primary, secondary or tertiary?
(c) What was your year of birth?
(d) What was your income last year?
The recorded responses would be the data for the four variables. Notice that the data would be of very different
forms for the four variables. The data for the first variable would be a list with values male and female. The data
for the third variable would be a list of numbers.
Upper case letters from the end of the alphabet (V, W, X, Y and Z) are used to represent the names of variables
and lower case letters, often with subscripts, are used for the data. For example we may refer to a variable X
with data \( x_1, x_2, \ldots, x_n \).
The way data are analysed depends upon the type of data that have been collected. In this section you will learn
that data are classified into four levels of measurement: nominal, ordinal, interval and ratio. Throughout this
subject you will be told to which levels of measurement each of the analytical tools developed in this subject can
be applied.
The level of measurement of a variable is determined by the arithmetic operations that can be carried out on the
values of the variable. In the discussion below, the notations \( v_1 \) and \( v_2 \) are used to represent two values of a
variable.
The questions to ask are:
1. Can we say whether every pair of values are either equal or unequal?
i.e. for every two values can we say that either \( v_1 = v_2 \) or \( v_1 \neq v_2 \)?
(Generally we can do this for all variables of interest.)
2. Can all unequal values be usefully ordered?
i.e. for all unequal values can we say that either \( v_1 < v_2 \) or \( v_1 > v_2 \)?
3. Can the difference between two values be usefully calculated?
i.e. can we calculate \( v_1 - v_2 \)?
4. Can the ratio of the two values be usefully calculated?
i.e. can we calculate \( v_1 / v_2 \) for \( v_2 \neq 0 \)?
Nominal level: A variable is said to be measured on a nominal level of measurement if
1. every pair of values can be said to be equal or unequal, but
2. there are unequal values of the variable that cannot be usefully ordered.
Ordinal level: A variable is said to be measured on an ordinal level of measurement if:
1. every pair of values can be said to be equal or unequal, and
2. every pair of unequal values can be usefully ordered, but
3. the difference \( (v_1 - v_2) \) between some pairs of values cannot be usefully calculated.
Interval level: A variable is said to be measured on an interval level of measurement if:
1. every pair of values can be said to be equal or unequal, and
2. every pair of unequal values can be usefully ordered, and
3. the difference \( (v_1 - v_2) \) between every pair of values can be usefully calculated, but
4. the ratio \( (v_1 / v_2) \) of some pairs of non-zero values cannot be usefully calculated.
Ratio level: A variable is said to be measured on a ratio level of measurement if:
1. every pair of values can be said to be equal or unequal, and
2. every pair of unequal values can be usefully ordered, and
3. the difference \( (v_1 - v_2) \) between every pair of values can be usefully calculated, and
4. the ratio \( (v_1 / v_2) \) of every pair of non-zero values can be usefully calculated.
Example 2.1
What is the level of measurement for the responses to the following questions in a questionnaire?
(a) What is your gender?
(b) What is your highest level of education: primary, secondary or tertiary?
(c) What was your year of birth?
(d) What was your income last year?
Solution
(a) The variable gender has two possible values: male and female.
Clearly two values are either of the same gender (equal) or different genders (unequal).
The two unequal values male and female cannot be usefully ordered. They can be ordered alphabetically
but this is not an important property of the data and so the ordering is not useful.
The variable gender is measured on a nominal level.
(b) Two possible responses to this question are tertiary and secondary.
The values tertiary and secondary are unequal. Every other pair of values can be said to be either equal
or unequal.
These responses can be ordered (tertiary > secondary, where > indicates "higher level than").
The difference cannot be calculated (tertiary - secondary = ?).
The variable highest level of education is measured on an ordinal level.
(c) Two possible responses are 1980 and 1960.
The two values 1980 and 1960 are unequal. Every other pair of possible values can be said to be equal
or unequal.
These two values can be ordered (1980 > 1960).
The difference between the values can be calculated (1980 - 1960 = 20). There was a difference of 20
years between the two years of birth.
The ratio of the two values (1980/1960 ≈ 1.01) can be calculated, but it is not a useful number. What
does the ratio mean?
The variable year of birth is measured on an interval level of measurement.
(d) Two possible responses are $20,000 and $5,000.
The values $20,000 and $5,000 are unequal. Every other pair of possible values can be said to be equal
or unequal.
The values can be ordered ($20,000 > $5,000).
The difference between the values can be calculated ($20,000 - $5,000 = $15,000).
The ratio of the two values can be calculated ($20,000/$5,000 = 4).
The variable income is measured on a ratio level of measurement.
The terms qualitative variable and quantitative variable are used in some texts.
Qualitative variable: A variable measured on a nominal or ordinal level.
Quantitative variable: A variable measured on an interval or ratio level.
The levels of measurement can be ordered as nominal, ordinal, interval and ratio. Nominal is referred to as the
lowest level of measurement and ratio as the highest level of measurement.
In the subsequent units of this subject, all methods are categorised by the levels of measurement of the data to
which the method can be applied. For example, if a method is said to be used for variables measured on an
ordinal or higher level, this should be interpreted as meaning that the method can be used for variables measured
on an ordinal, interval or ratio level. If a method is said to be used for variables on an interval or higher level,
this means that the method can be applied to interval or ratio data.
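The ordering of the four levels, and the meaning of "or higher", can be sketched in Python. The names `Level`, `classify` and `method_applies` are hypothetical, and the three yes/no arguments stand for questions 2 to 4 in Section 2.3 (question 1 is assumed to be answered yes for every variable of interest).

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of measurement, ordered from lowest to highest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


def classify(can_order, can_difference, can_ratio):
    """Return the level of measurement from the answers to questions 2, 3 and 4."""
    if not can_order:
        return Level.NOMINAL
    if not can_difference:
        return Level.ORDINAL
    if not can_ratio:
        return Level.INTERVAL
    return Level.RATIO


def method_applies(minimum_level, variable_level):
    """A method for 'minimum_level or higher' applies to that level and all above it."""
    return variable_level >= minimum_level


# The four variables from Example 2.1.
gender = classify(can_order=False, can_difference=False, can_ratio=False)
education = classify(can_order=True, can_difference=False, can_ratio=False)
year_of_birth = classify(can_order=True, can_difference=True, can_ratio=False)
income = classify(can_order=True, can_difference=True, can_ratio=True)

print(gender.name, education.name, year_of_birth.name, income.name)
print(method_applies(Level.ORDINAL, year_of_birth))  # ordinal or higher
print(method_applies(Level.INTERVAL, education))     # interval or higher
```

Using an ordered enumeration means that "ordinal or higher" becomes a simple comparison of levels.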
Quick Question 1
2.4 Discrete and continuous variables
Variables measured on an interval or ratio level may also be classified as either discrete or continuous. A precise
definition of these terms is beyond the scope of this subject but the following operational definitions should
suffice.
Discrete variable: A variable is said to be discrete if the possible values of the variable can be listed in such a
way that no observation could lie between the listed values.
Continuous variable: A variable is said to be continuous if, for every pair of values of the variable, it is possible
for an observation to lie between these values.
Example 2.2
Are the following variables discrete or continuous?
(a) The number of subjects taken by a student in a semester.
(b) The weight of a parcel in gms.
(c) The weight of a parcel recorded to the nearest 10 grams.
(d) Annual income.
Solution
(a) The number of subjects taken by a student in a semester.
The possible values for the number of subjects taken by a student in a semester are {1, 2, 3, 4, 5}. A student
cannot take a number of subjects that lies between these values, such as 1.2 subjects. The variable
number of subjects taken is discrete.
(b) The weight of a parcel.
For any two values of the weight of a parcel an observation could lie between the values. For example,
with the two weights of 1120.4 gms and 1120.5 gms, a parcel weighing 1120.4356 gms would lie between
these two values. The variable weight of a parcel is continuous.
(c) The weight of a parcel is recorded to the nearest 10 grams.
The possible recorded weights in grams are {0, 10, 20, 30, . . .}. Obviously, no recorded weight could lie
between these values. Recorded weight is a discrete variable.
(d) Annual income.
Assume that incomes are paid in dollars and cents.
Then, the possible incomes (in dollars), are {0.00, 0.01, 0.02, . . . }. No observed income can lie between
these values. The variable annual income is discrete.
A variable cannot be both discrete and continuous but it is possible for a variable to be neither discrete nor
continuous. For some discrete variables, such as annual income in Example 2.2, the size of the gaps between
the listed values is very small compared to the size of the values. When a discrete variable has relatively small
jumps between the values, the variable may be treated as a continuous variable.
The cases of the weight of a parcel and the recorded weight of a parcel in Example 2.2 illustrate an important
point. For measuring variables, such as height and weight, the true values are continuous but the recorded values
are always rounded and so are discrete. All recorded data are discrete.
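The point about recorded values can be illustrated with a short Python sketch. The function name recorded_weight and the sample weights other than 1120.4356 gms are assumptions made for illustration. Note also that Python's round() resolves exact ties to the nearest even number, which may differ from the rounding rule used when data are actually recorded.

```python
def recorded_weight(true_weight_g: float, unit_g: int = 10) -> int:
    """Round a true (continuous) weight to the nearest unit_g grams."""
    return round(true_weight_g / unit_g) * unit_g


# Hypothetical true weights in grams; only the first appears in Example 2.2.
true_weights = [1120.4356, 1123.9, 1126.0, 1131.2]

for w in true_weights:
    # Every recorded value is a multiple of 10, so no recorded value
    # can lie between two adjacent listed values such as 1120 and 1130.
    print(w, "->", recorded_weight(w))
```

Different true weights can share the same recorded weight, which is why the recorded variable is discrete even though the true variable is continuous.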
2.5 Presenting data in tables
The first stage in organising the collected data is often to present the data in a table. Tables can be a useful means
of communicating detailed information. Tables can be used to present data on any level of measurement. The
essential requirement of a successful table is that it should present the information in an easy-to-understand form.
The following principles of tabular presentation are aimed at making tables easy to understand. Not all the points
are relevant for all tables.
Every table should have a title and a number;
All the columns of the table should be clearly labelled with units given;
There should be a final row and/or column giving appropriate total values;
The primary (original) source of the data should be given;
There should not be any unnecessary columns.
In the following sections we will look at some common types of table.
2.6 Raw data and arrays
Raw data: The data in the form collected, before being organised in any way.
The first step in organising data depends upon the type of data and the number of observations that have been
collected.
With a small number of observations of a variable measured on an ordinal or higher level of measure-
ment, arrange the data into an array;
With a large number of observations, but only a small number of different values, draw up a value
frequency distribution;
With a large number of observations, and a large number of different values, use a stem-and-leaf display
or a class frequency distribution.
Arrays are described in this section and value frequency distributions, stem-and-leaf displays and class frequency
distributions are covered in the next 3 sections.
Data on an ordinal or higher level of measurement can be organised into a list with the values listed from lowest
to highest. This is called an array.
Array: A list of the observations ordered from lowest to highest.
Forming an array is a useful first step in organising data when there is only a small number of observations.
Example 2.3
A random sample of 20 part-time students was asked: "How many subjects are you taking this semester?" The raw
data, as collected, are given in Table 2.1.
Table 2.1: The Number of Subjects Taken by 20 Part-time Students
1 2 3 2 2 1 2 2 2 1
3 2 2 1 2 1 2 3 3 1
Source: Random sample of 20 part-time students in 2001
Organise these data into an array.
Solution
1 1 1 1 1 1 2 2 2 2
2 2 2 2 2 2 3 3 3 3
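For readers who prefer to check the array by computer, the following is a minimal Python sketch. The variable names are arbitrary; the data are those of Table 2.1.

```python
# Raw data from Table 2.1, in the order collected.
raw = [1, 2, 3, 2, 2, 1, 2, 2, 2, 1,
       3, 2, 2, 1, 2, 1, 2, 3, 3, 1]

# An array is the same observations ordered from lowest to highest.
array = sorted(raw)

# Print in two rows of ten, as in the solution above.
for start in range(0, len(array), 10):
    print(" ".join(str(x) for x in array[start:start + 10]))
```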
2.7 Value frequency distributions
Where a variable has only a small number of values, the data can be organised into a value frequency distribution.
Value frequency distribution: A table showing the number of times each value was observed.
The frequencies in the frequency distribution can be either the number of observations or the relative frequency
(proportion) of the values.
A value frequency distribution can be constructed for variables on any level of measurement.
Example 2.4
The number of subjects taken by a class of 20 part-time students were as shown in Table 2.3 below.
1 2 3 2 2 1 2 2 2 1
3 2 2 1 2 1 2 3 3 1
Organise these data into a value frequency distribution.
Solution
Here, 20 observations have only 3 different values {1,2,3}.
Table 2.4: Number of Subjects Taken By 20 Part-time Students
Number of Subjects    Frequency    Relative Frequency
1                      6           0.30
2                     10           0.50
3                      4           0.20
Total                 20           1.00
Source : Random sample of 20 part-time students in 2001.
Notice that when using a value frequency distribution, the recorded values of the observations are not lost. The
values are displayed as recorded.
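The counting in Example 2.4 can be sketched in Python as follows. This is an illustrative sketch rather than a prescribed method; it uses the standard library's collections.Counter, and the variable names are arbitrary.

```python
from collections import Counter

# Number of subjects taken by the 20 part-time students (Example 2.4).
subjects = [1, 2, 3, 2, 2, 1, 2, 2, 2, 1,
            3, 2, 2, 1, 2, 1, 2, 3, 3, 1]

counts = Counter(subjects)   # value -> number of times observed
n = len(subjects)

# One row per distinct value: (value, frequency, relative frequency).
table = [(value, counts[value], counts[value] / n) for value in sorted(counts)]

print(f"{'Subjects':>8} {'Frequency':>9} {'Relative':>9}")
for value, freq, rel in table:
    print(f"{value:>8} {freq:>9} {rel:>9.2f}")
print(f"{'Total':>8} {n:>9} {sum(r for _, _, r in table):>9.2f}")
```

Because each distinct value keeps its own row, the recorded values are not lost.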
2.8 Stem-and-leaf displays
When a variable has a large number of different values, the values can be displayed in a stem-and-leaf display.
A stem-and-leaf display can be constructed for variables measured on at least an ordinal level.
In a stem-and-leaf display divide the digits of an observation into the leading digit or digits, called the stem, and
the trailing digit, called the leaf. For example, an observed value of 182 has a stem of 18 and a leaf of 2. Then
organise the observations into rows with each row having a different stem and the leaves listed from smallest to
largest.
Example 2.5
The weekly food expenditure for a simple random sample of 30 low-income households was recorded to the
nearest dollar. The results are displayed in Table 2.5 below.
Table 2.5: Weekly Food Expenditure of Low Income Households
96 80 97 87 98 86 92 82 84 108
95 99 102 80 90 86 98 98 108 109
99 87 99 96 94 94 106 90 80 90
Source: Simple random sample of 30 low-income households in 2001.
Organise these data into a stem-and-leaf display.
Solution
Can we usefully organise these data into a value frequency distribution? No, there are too many different values
(17 different values from 30 observations). They can be displayed in a stem-and-leaf display. Table 2.6 shows the
stem-and-leaf display for these data.
Stem Leaf
8 000246677
9 0002445667888999
10 26889
Total 30
Source : Simple random sample of 30 low-income households in 2001.
The stem-and-leaf display is useful for seeing the general shape of the distribution. It can also be used as a first
step in forming a class frequency distribution. Note that the values of the observations can be reconstructed from
the table. For example, we can see in Table 2.6 that the values of the three highest observations were $108, $108
and $109.
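The construction of Table 2.6 can be sketched in Python. The function name stem_and_leaf is hypothetical, and the sketch assumes non-negative whole-number observations with a single trailing digit as the leaf, as in Example 2.5.

```python
from collections import defaultdict

# Weekly food expenditure data from Table 2.5 (nearest dollar).
expenditure = [96, 80, 97, 87, 98, 86, 92, 82, 84, 108,
               95, 99, 102, 80, 90, 86, 98, 98, 108, 109,
               99, 87, 99, 96, 94, 94, 106, 90, 80, 90]


def stem_and_leaf(values):
    """Return {stem: leaves}, with leaves listed from smallest to largest."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # e.g. 108 -> stem 10, leaf 8
        rows[stem].append(leaf)
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in sorted(rows.items())}


display = stem_and_leaf(expenditure)
for stem, leaves in display.items():
    print(f"{stem:>4} | {leaves}")
```

Each original observation can be rebuilt as 10 x stem + leaf, which is why no information is lost in this display.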
2.9 Class frequency distributions
Stem-and-leaf displays are a tool used by statisticians to organise data but they are seldom used to communicate
information to non-statisticians. Class frequency distributions provide an alternative way of displaying the values
of a variable that has a large number of different values. They are more easily understood by non-statisticians
than stem-and-leaf displays.
Class frequency distributions can be constructed for variables measured on at least an ordinal level.
Table 2.7 is an example of a class frequency distribution.
Table 2.7: Weekly Food Expenditures of Low Income Households
Food Expenditure ($)    Frequency
80 up to 85              5
85 up to 90              4
90 up to 95              6
95 up to 100            10
100 up to 105            1
105 up to 110            4
Total                   30
the first class includes all observations from $80 up to but less than $85
the second class includes all observations from $85 up to but less than $90 etc
Source: Simple random sample of 30 low income households in 2001.
In a class frequency distribution the observed values of the variable are grouped into classes and the values of the
original observations are lost.
Class frequency distribution: A grouping of data into classes showing the number of observations in each mutually
exclusive class.
The first, and most difficult, stage in constructing a class frequency distribution is to choose appropriate classes.
There are no hard and fast rules here but the following points should be borne in mind:
1. Choose mutually exclusive and collectively exhaustive classes so that each observation falls in one and only
one class.
2. Use between 5 and 15 classes. The object of the frequency distribution is to show the shape of the distri-
bution. With too many classes, the shape of the distribution is not clear and with too few classes, too much
information is lost.
3. Don't have many classes with a frequency of less than 5. Most data are sample data and a shape based on
very few observations in each class may only reflect the shape of the sample data and this may not be a
reliable guide to the shape of the population. With more observations, more classes can be used.
4. Try and make the classes natural rather than contrived. Classes like 2.7 to 3.1, 3.2 to 3.5 etc look odd and
detract from the clarity of the table. Choose class widths of 1, 2, 5 or multiples of 10 of these units.
5. Where possible, make the classes of equal width. This may not be possible where most of the observations
are concentrated in a narrow range but there are a few observations that are much larger or smaller than the
others.
These points are sometimes in conflict with each other and can be disregarded if they are inappropriate. They are
for guidance only.
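Points 2 and 3 above can be explored numerically before settling on a class width. The sketch below is only one possible way of doing this: the function name summarise_width is hypothetical, and it assumes equal-width classes whose limits are multiples of the class width. The data are the food expenditures of Table 2.5.

```python
# Weekly food expenditure data from Table 2.5 (nearest dollar).
expenditure = [96, 80, 97, 87, 98, 86, 92, 82, 84, 108,
               95, 99, 102, 80, 90, 86, 98, 98, 108, 109,
               99, 87, 99, 96, 94, 94, 106, 90, 80, 90]


def summarise_width(values, width):
    """Return (number of classes, average observations per class) when the
    classes have a common width and limits that are multiples of that width."""
    start = (min(values) // width) * width        # lower limit of first class
    end = (max(values) // width + 1) * width      # upper limit of last class
    n_classes = (end - start) // width
    return n_classes, len(values) / n_classes


for width in (2, 5, 10):
    n_classes, average = summarise_width(expenditure, width)
    print(f"width ${width}: {n_classes} classes, "
          f"about {average:.0f} observations per class")
```

A width that gives between 5 and 15 classes with an average of at least 5 observations per class is a reasonable starting point, subject to the other points listed above.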
The steps in constructing a class frequency distribution are:
1. Choose appropriate classes.
2. Work through the observations and place a tally mark in the appropriate class for each observation. Count
the tally marks to find the number of observations in each class.
3. Delete the tally marks and lay out the results in a table.
A table is constructed to convey information to the reader. When you have finished a table, look carefully at the
table and see if it does present the desired information in a clear and unambiguous form.
Example 2.6
The weekly food expenditure for a simple random sample of 30 low-income households was recorded to the
nearest dollar. The results are displayed in Table 2.8 below.
96 80 97 87 98 86 92 82 84 108
95 99 102 80 90 86 98 98 108 109
99 87 99 96 94 94 106 90 80 90
Source: Simple random sample of 30 low-income households in 2001.
Organise these data into a class frequency distribution.
Solution
1. Choose appropriate classes.
The 30 observations run from $80 to $109, a range of $29. With a common class width of $2 this would
give 15 classes with an average of about two observations per class: too few observations per class. With
a common class width of $5 there would be 6 classes with an average of 5 observations per class. With a
common class width of $10 there would only be 3 classes: too few. A class width of $5 is the best compromise,
giving classes of $80 up to $85, $85 up to $90, $90 up to $95 etc.
2. Work through the observations and place a tally mark in the appropriate class for each observation. Count
the tally marks to find the number of observations in each class.
The table now looks like Table 2.9
Food Expenditure ($)    Tally marks      Frequency
80 up to 85             |||||             5
85 up to 90             ||||              4
90 up to 95             ||||| |           6
95 up to 100            ||||| |||||      10
100 up to 105           |                 1
105 up to 110           ||||              4
Total                                    30
3. Delete the tally marks and lay out the results in a table.
A class frequency distribution can show either the number of observations in each class or the proportion
of observations in each class (called the relative frequency) or both.
Food Expenditure ($)    Frequency    Relative Frequency
80 up to 85              5           0.17
85 up to 90              4           0.13
90 up to 95              6           0.20
95 up to 100            10           0.33
100 up to 105            1           0.03
105 up to 110            4           0.13
Total                   30           1.00
the first class includes all observations from $80 up to but less than $85
the second class includes all observations from $85 up to but less than $90 etc
Notice how each of the classes is specified by giving a lower and an upper limit. The class frequency gives the
number of observations that have values from (and including) the lower limit up to (but not including) the upper
limit.
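The "from the lower limit up to but not including the upper limit" rule can be sketched in Python. The function name class_frequencies and its parameters are hypothetical; the sketch assumes equal-width classes and uses the data and classes of Example 2.6.

```python
# Weekly food expenditure data from Example 2.6 (nearest dollar).
expenditure = [96, 80, 97, 87, 98, 86, 92, 82, 84, 108,
               95, 99, 102, 80, 90, 86, 98, 98, 108, 109,
               99, 87, 99, 96, 94, 94, 106, 90, 80, 90]


def class_frequencies(values, start, width, n_classes):
    """Count observations in classes [lower, upper): the lower limit is
    included and the upper limit is excluded."""
    freqs = [0] * n_classes
    for v in values:
        index = (v - start) // width
        if not 0 <= index < n_classes:
            raise ValueError(f"{v} falls outside the chosen classes")
        freqs[index] += 1
    return [(start + i * width, start + (i + 1) * width, f)
            for i, f in enumerate(freqs)]


table = class_frequencies(expenditure, start=80, width=5, n_classes=6)
n = len(expenditure)
for lower, upper, freq in table:
    print(f"{lower} up to {upper}: {freq:>2}  {freq / n:.2f}")
```

The classes are mutually exclusive because each value maps to exactly one index, and the check inside the loop flags any observation that the chosen classes fail to cover.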
The following terminology is used for class frequency tables.
Stated class limits: The numbers used in the table to delimit the classes.
Class boundary: The dividing point between two adjacent classes.
[Diagram: Class 1, Class 2 and Class 3 shown side by side, with a class boundary marked at each end and between each pair of adjacent classes.]
Each class has a lower boundary and an upper boundary. Notice that the upper boundary of a class is the lower
boundary of the following class. (For some variables there is a degree of arbitrariness about the class boundaries.)
Class width: The distance between the class upper and lower boundaries:
class width = class upper boundary - class lower boundary
Class mark: The midpoint of the class:
class mark = (class upper boundary + class lower boundary) / 2
Example 2.7
For the second class in Table 2.10 find:
(a) the stated class limits;
(b) the class boundaries;
(c) the class width;
(d) the class mark.
Solution
(a) The stated class lower limit of the second class is $85 and the stated class upper limit is $90.
(b) The raw data used in constructing Table 2.10 had been recorded to the nearest dollar (see Table 2.8). An
observation of $84.50 could be rounded up to $85 and then included in the second class. Any value below
$84.50 would be in the first class. The lower boundary of the second class is $84.50. This provides a
dividing point between the first and second class. All values below the boundary lie in the first class and
all values above the boundary are in the second class. The class upper boundary is $89.50.
(c) The class width of the second class in the above table is:
$89.50 - $84.50 = $5.00
(d) The class mark of the second class in the above table is:
($84.50 + $89.50) / 2 = $87.00
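The calculations in Example 2.7 can be checked with a short Python sketch. The function name class_summary is hypothetical. The half-unit shift is an assumption that fits this example only: it applies when the data are rounded to the nearest recording unit and the classes are stated as "lower limit up to upper limit".

```python
def class_summary(lower_limit, upper_limit, recording_unit=1.0):
    """Boundaries, width and mark of a class 'lower_limit up to upper_limit'
    for data rounded to the nearest recording_unit."""
    half = recording_unit / 2
    lower_boundary = lower_limit - half   # e.g. $84.50 rounds up into the class
    upper_boundary = upper_limit - half   # e.g. $89.50 rounds up into the next class
    return {
        "boundaries": (lower_boundary, upper_boundary),
        "width": upper_boundary - lower_boundary,
        "mark": (lower_boundary + upper_boundary) / 2,
    }


second_class = class_summary(85, 90)   # data recorded to the nearest dollar
print(second_class)
```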
2.10 Cumulative frequency distributions
It is often useful to know how many observations were below or above a certain value. For example you may
want to know how many students obtained a lower or higher mark than you in a test. This type of information can
be displayed in a cumulative frequency distribution.
Or-less-than cumulative frequency distribution: A table giving the number of observations that are less than or
equal to a list of given values.
Or-more-than cumulative frequency distribution: A table giving the number of observations that are greater
than or equal to a list of given values.
Example 2.8
Construct the or-less-than cumulative frequency distribution for the data in Table 2.11 below.
Food Expenditure ($) Frequency
80 up to 85 5
85 up to 90 4
90 up to 95 6
95 up to 100 10
100 up to 105 1
105 up to 110 4
Total 30
Source: Random sample of 30 low income households in 2001.
Solution
The or-less-than cumulative frequencies are obtained by starting with the first frequency and summing down the
frequency column of the frequency distribution.
Table 2.12: Calculating the Cumulative Frequencies
Food Expenditure ($) Frequency Cumulative Frequency
80 up to 85 5 5
85 up to 90 4 4 + 5 = 9
90 up to 95 6 6 + 9 = 15
95 up to 100 10 10 + 15 = 25
100 up to 105 1 1 + 25 = 26
105 up to 110 4 4 + 26 = 30
Total 30
The cumulative frequencies in column 3 give the number of observations with value less than or equal to the
upper boundary of the class. For example, to be in the first four classes, an observation must have a value of
less than or equal to the upper boundary of class 4. From the cumulative frequency column you can see that there
are 25 observations in the first four classes. Thus there are 25 observations with a value of less than or equal to
$99.50.
This gives the following or-less-than cumulative frequency distribution.
Table 2.13: Or-less-than Cumulative Frequency Distribution
Household            Or-Less-Than            Or-Less-Than Cumulative
Expenditure ($)      Cumulative Frequency    Relative Frequency
 79.50                0                      0.00
 84.50                5                      0.17
 89.50                9                      0.30
 94.50               15                      0.50
 99.50               25                      0.83
104.50               26                      0.87
109.50               30                      1.00
There were no observations below the lower boundary of the first class. This gives a cumulative frequency of 0
below $79.50. It is usual to start an or-less-than cumulative frequency distribution with a 0 frequency.
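The "summing down" procedure of Example 2.8 can be sketched in Python using itertools.accumulate from the standard library. The variable names are arbitrary; the frequencies and boundaries are those of the example.

```python
from itertools import accumulate

frequencies = [5, 4, 6, 10, 1, 4]                      # from Table 2.11
boundaries = [79.5, 84.5, 89.5, 94.5, 99.5, 104.5, 109.5]

# Start with 0 below the first lower boundary, then sum down the column.
cumulative = [0] + list(accumulate(frequencies))
total = sum(frequencies)

or_less_than = [(b, c, round(c / total, 2))
                for b, c in zip(boundaries, cumulative)]

for boundary, cum, rel in or_less_than:
    print(f"{boundary:>7.2f} {cum:>3} {rel:>5.2f}")
```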
Example 2.9
Construct the or-more-than cumulative frequency distribution for the data in Table 2.14 below.
Food Expenditure ($) Frequency
80 up to 85 5
85 up to 90 4
90 up to 95 6
95 up to 100 10
100 up to 105 1
105 up to 110 4
Total 30
Solution
The or-more-than cumulative frequencies are obtained by starting with the last frequency and summing up the
frequency column of the frequency distribution.
Table 2.15: Calculating the Or-more-than Cumulative Frequencies
Food Expenditure ($) Frequency Cumulative Frequency
80 up to 85 5 5 +25 = 30
85 up to 90 4 4 + 21 = 25
90 up to 95 6 6 + 15 = 21
95 up to 100 10 10 + 5 = 15
100 up to 105 1 1 + 4 = 5
105 up to 110 4 4
Total 30
The cumulative frequencies in column 3 give the number of observations with value greater than or equal to
the lower boundary of the class. For example, to be in the last four classes, an observation must have a value of
greater than or equal to the lower boundary of class 3. From the cumulative frequency column you can see that
there are 21 observations in the last four classes. Thus there are 21 observations with a value of greater than or
equal to $89.50.
This gives the following or-more-than cumulative frequency distribution.
Table 2.16: Or-more-than Cumulative Frequency Distribution
Household            Or-More-Than            Or-More-Than Cumulative
Expenditure ($)      Cumulative Frequency    Relative Frequency
 79.50               30                      1.00
 84.50               25                      0.83
 89.50               21                      0.70
 94.50               15                      0.50
 99.50                5                      0.17
104.50                4                      0.13
109.50                0                      0.00
There were no observations above the upper boundary of the last class. This gives a cumulative frequency of 0
above $109.50. It is usual to end an or-more-than cumulative frequency distribution with a 0 frequency.
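The "summing up" procedure of Example 2.9 can be sketched in the same way, accumulating from the last class backwards. Again this is only an illustrative sketch with arbitrary variable names.

```python
from itertools import accumulate

frequencies = [5, 4, 6, 10, 1, 4]                      # from Table 2.14
boundaries = [79.5, 84.5, 89.5, 94.5, 99.5, 104.5, 109.5]

# Sum from the last class upwards, then restore the original order and
# finish with 0 above the last upper boundary.
cumulative = list(accumulate(reversed(frequencies)))[::-1] + [0]
total = sum(frequencies)

or_more_than = [(b, c, round(c / total, 2))
                for b, c in zip(boundaries, cumulative)]

for boundary, cum, rel in or_more_than:
    print(f"{boundary:>7.2f} {cum:>3} {rel:>5.2f}")
```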
2.11 Presenting data in charts
Tables provide detailed information. Charts and graphs are used to emphasise particular aspects of the data.
Charts are often used to display nominal and ordinal data but graphs can only be drawn for interval and ratio data.
In this section you will learn how to choose an appropriate chart to display data. In the next section you will learn
how to draw two types of graph.
Table 2.17: Employed Persons by Sector (1995/96)
Industry                    Male      Female     Total
                           ('000)     ('000)    ('000)
Agriculture and Mining      369.9      137.4     507.3
Manufacturing              1397.9      394.7    1792.6
Wholesale and Retail       1108.9      997.6    2106.5
Business Services           992.0      666.2    1658.2
Community Services          847.6     1375.5    2223.1
Total                      4716.3     3571.4    8287.7
Source: Labour Force, Australia, ABS, Cat. No. 6203.0
Carefully study Table 2.17.
What did you learn about the distribution of paid employment in Australia from looking at the table? Many of us
find that a table like this contains too much information. The important features are lost in the detail. Charts are
used to emphasise the important features of data. Use charts for impact and tables for detail.
Charts are a means of communicating information and to be effective, they must be clearly laid out. All charts
should be numbered and have a title. They should be fully labelled with a key given when necessary.
The type of chart used to display data depends on the features of the data to be emphasised. In the following
subsections you will learn when to use
1. pie charts
2. simple bar charts
3. component bar charts
4. multiple bar charts
5. 100% bar charts
All of these charts can be easily drawn using Excel. The important skill is to choose the right chart!
Before drawing a chart always ask yourself the question
What features of the data am I trying to show?
After drawing a chart, assess the chart and see if it does indeed clearly show the important features of the data.
2.11.1 Pie charts
Pie charts are used to show the breakdown of a total into its components. They do not show the absolute values
of the components, only their relative sizes. They are especially useful where the components are measured on a
nominal scale.
A pie chart could be used to show the breakdown of total paid employment between the different sectors of the
Australian economy (see Figure 2.1). This chart shows the relative importance of the different sectors but does
not show the numbers employed in each sector.
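The relative sizes that a pie chart displays can be computed directly from the totals in Table 2.17. The Python sketch below is an illustration only: it calculates each sector's share of the total and the corresponding angle of its slice, and does not draw the chart. The variable names are arbitrary.

```python
# Total employed persons ('000) by sector, from Table 2.17.
totals = {
    "Agriculture and Mining": 507.3,
    "Manufacturing": 1792.6,
    "Wholesale and Retail": 2106.5,
    "Business Services": 1658.2,
    "Community Services": 2223.1,
}

grand_total = sum(totals.values())

# Share of the total, and the angle of the slice in a 360-degree pie.
shares = {sector: value / grand_total for sector, value in totals.items()}
angles = {sector: 360 * share for sector, share in shares.items()}

for sector in totals:
    print(f"{sector:<24} {shares[sector]:6.1%} {angles[sector]:6.1f} degrees")
```

Only the shares and angles reach the chart, which is why the pie chart shows relative importance but not the numbers employed.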
Figure 2.1: Employed Persons by Sector (1995/96)
Source : Labour Force, Australia, ABS Cat No 6203.0
[Pie chart showing each sector's share of total employment: Agriculture and Mining, Manufacturing, Wholesale/Retail, Business Services, Community Services]
2.11.2 Simple bar charts
Simple bar charts are used to compare the sizes of different totals. Use simple bar charts for displaying nominal
and ordinal data, and for showing changes over time.
A simple bar chart could be used to show the numbers employed in the different sectors in Australia.
Source: Labour Force, Australia, ABS Cat. No. 6203.0
[Simple bar chart: number employed ('000), vertical axis from 0 to 2500, with one bar for each sector: Agriculture and Mining, Manufacturing, Wholesale and Retail, Business Services, Community Services]
This chart shows the total number employed in each sector more clearly than the pie chart, but it gives a less
clear picture of the importance of each sector in the total.
2.11.3 Component bar charts
Component bar charts are used to compare the sizes of different totals and to show the breakdown of these totals
into their components. They are especially useful for displaying nominal and ordinal data, and for showing
changes in totals and their components over time.
A component bar chart could be used to show the numbers employed in the different sectors in Australia and the
breakdown of employment into males and females.
[Figure: component bar chart of numbers employed, with an axis running from 0 to 2500. One bar per sector
(Agriculture Mining, Manufacturing, Wholesale Retail, Business Services, Community Services), each bar divided
into Males and Females.]
With this chart the total numbers employed in each sector can be compared. The numbers of males in each sector
can also be compared, but it is not so easy to compare the numbers of females in each sector because the female
segments do not start from a common baseline. To aid comparisons, put the most important component at the base
of the bar.
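The construction of a component bar chart can be sketched in a few lines of Python. This is only an illustrative sketch: the employment figures below are invented (they are not the values behind the chart in this unit), and the names EMPLOYMENT, component_bar and component_bar_chart are hypothetical helpers. Each bar is drawn as text, with the first component listed placed at the base of the bar.

```python
# Hypothetical figures in thousands, invented for illustration only.
# They are NOT the data behind the chart in this unit.
EMPLOYMENT = {
    "Manufacturing": {"Males": 800, "Females": 300},
    "Wholesale Retail": {"Males": 900, "Females": 800},
    "Community Services": {"Males": 500, "Females": 1000},
}


def component_bar(components, unit=100):
    """Build one stacked text bar.

    Each component contributes value // unit symbols (its initial letter).
    The first component listed sits at the base (left) of the bar.
    """
    return "".join(name[0] * (value // unit) for name, value in components.items())


def component_bar_chart(data, unit=100):
    """Return one text line per category: label, stacked bar, then the total."""
    lines = []
    for category, components in data.items():
        total = sum(components.values())
        lines.append(f"{category:<20}{component_bar(components, unit)} {total}")
    return lines


for line in component_bar_chart(EMPLOYMENT):
    print(line)
```

In the printed bars the M segments all start at the left margin, so they are easy to compare, while the F segments start at different positions. This is the difficulty described above, and it is why the most important component belongs at the base.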
2.11.4 Multiple bar charts
Multiple bar charts are used to show differences in the sizes of the components of some totals. They are used
where differences in the sizes of the components are more important than differences in the totals. Use multiple
bar charts for displaying differences in the sizes of the components of nominal and ordinal data, and for showing
changes in the sizes of components over time.
A multiple bar chart could be used to show how the male and female components of employment differ between
sectors. Compared to the component bar chart, the multiple bar chart gives a clearer picture of differences in the
number of females employed but a less clear picture of the total employment in each sector.
[Figure: multiple bar chart of numbers employed, with an axis running from 0 to 1500. For each sector
(Agriculture Mining, Manufacturing, Wholesale Retail, Business Services, Community Services) there are separate
bars for Males and Females.]
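A multiple bar chart can be sketched in the same spirit. Again this is a hedged illustration: the figures are invented, and EMPLOYMENT_BY_SEX and multiple_bar_chart are hypothetical names. Each component gets its own bar starting from a common baseline, so components are easy to compare across categories, but no bar shows the total.

```python
# Hypothetical figures in thousands, invented for illustration only.
EMPLOYMENT_BY_SEX = {
    "Mining": {"Males": 300, "Females": 100},
    "Community Services": {"Males": 500, "Females": 1000},
}


def multiple_bar_chart(data, unit=100):
    """Return text lines with one separate bar per component in each category.

    Every bar starts at the same left margin, so the sizes of a given
    component can be compared directly between categories.
    """
    lines = []
    for category, components in data.items():
        lines.append(category)
        for name, value in components.items():
            lines.append(f"  {name:<8}{'#' * (value // unit)} {value}")
    return lines


for line in multiple_bar_chart(EMPLOYMENT_BY_SEX):
    print(line)
```

Because the totals are not drawn, a reader who needs them has to add the bars mentally. That is the trade-off against the component bar chart noted in the text.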
2.11.5 100% Component bar charts
100% component bar charts are used to show the percentage of each component in the total. They are used where
differences in the sizes of the totals are not important.
A 100% component bar chart could be used to show the percentages of employment in each sector that are male
and female.
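The arithmetic behind a 100% component bar chart is simply each component divided by its total. The sketch below assumes invented figures (750 males and 250 females in one sector) and uses the hypothetical helper names percentage_components and percent_bar. Every bar is drawn to the same fixed width, so only the shares are visible.

```python
def percentage_components(components):
    """Convert component counts to percentages of their total."""
    total = sum(components.values())
    if total <= 0:
        raise ValueError("total must be positive to compute percentages")
    return {name: 100 * value / total for name, value in components.items()}


def percent_bar(components, width=50):
    """Draw a fixed-width text bar split in proportion to each component.

    Cumulative boundaries are rounded, so the bar is always exactly
    `width` symbols long whatever the counts are.
    """
    total = sum(components.values())
    if total <= 0:
        raise ValueError("total must be positive to draw a bar")
    bar, running = "", 0
    for name, value in components.items():
        start = round(running * width / total)
        running += value
        end = round(running * width / total)
        bar += name[0] * (end - start)
    return bar


# Hypothetical counts, invented for illustration only.
sector = {"Males": 750, "Females": 250}
print(percentage_components(sector))
print(percent_bar(sector, width=20))
```

Two sectors of very different size produce bars of the same length here, which is appropriate only when, as stated above, differences in the sizes of the totals are not important.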
[Figure: 100% component bar chart by sector, with one axis labelled Percentage Male (0 to 100) and the opposite
axis labelled Percentage Female (0 to 100). Sector labels include Agriculture Mining, Manufacturing and
Wholesale Retail.]
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Business
Services
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Community
Services
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Males
.
.
.
.
.
.
.
.
.
.
Females
2.12 Presenting data in graphs
You will recall from the previous section that tables are used to provide detailed information. Charts and graphs
are used to emphasise particular aspects of the data. In this section we will look at graphs. Graphs can only be
drawn for data measured on an interval or ratio scale.
Remember that graphs are drawn to communicate information. Graphs should be simple and easy to understand.
A complicated graph is a bad graph. To make your graphs easy to understand, always:
1. give the graph a title and a number;
2. give the primary source of the data;
3. clearly label the axes and give the units of measurement;
4. mark the scale on the axes;
5. for variables measured on a ratio scale, start the axis at zero whenever possible;
6. when the graph contains several lines, label each line or include a key.
In the next two subsections, you will learn how to draw two of the most useful graphs in statistics: the histogram
and the cumulative frequency polygon.
2.12.1 Drawing histograms
Histograms are graphs that use bars to display the shape of class frequency distributions. Histograms can only be
drawn when the class variable is measured on at least an interval scale. The histogram of the class frequency data
from Table 2.11 is shown in Figure 2.6 below.
Figure 2.6: Weekly Food Expenditure of Low Income Households
Source: Random sample of 30 low income households in 2001
[Histogram not reproduced. Horizontal axis: Food expenditure ($), marked from 75 to 115 in steps of 5. Vertical axis: Frequency per $5, marked from 0 to 12 in steps of 2.]
The following points should be noted when drawing histograms.
1. A histogram is a graph and not a bar chart. The horizontal axis should be marked as for a graph. Some texts
show the bars of the histogram labelled with the class limits. For example, the second bar in Figure
2.6 could be labelled as $85 up to $90. This is not good practice. Label the bars in a bar chart but do not
label the bars in a histogram.
2. The base of each bar stretches from the class lower boundary to the class upper boundary. There should be
no gaps between the bars of successive classes.
3. Where all the classes are of equal width, the height of the bar is the class frequency, but when the classes are
of unequal width the bar heights have to be adjusted. With unequal class widths:
height of bar = (class frequency) / (width of the class)
This adjustment is illustrated in Example 2.10 below.
4. Histograms can be drawn with either the frequency or the relative frequency up the vertical axis.
Example 2.10
A questionnaire was sent to a random sample of UC graduates one year after graduation. Each graduate was asked
to give her/his gross income in 2001 rounded to the nearest thousand dollars. The incomes of the responding
graduates are summarised in Table 2.18 below.
Table 2.18: Income Distribution of 40 UC Graduates
Income ($000)    Frequency
15 up to 20          3
20 up to 25          4
25 up to 30         11
30 up to 35         13
35 up to 40          4
40 up to 45          3
45 up to 75          2
Total               40
Source: A random sample of 40 UC graduates in 2001.
Display these data in a histogram.
Solution
Notice that the last class in Table 2.18 is wider than the other classes. The height of the histogram bar for this
class must be adjusted to allow for this different width.
As most classes are of width $5,000, choose $5,000 as the standard class width for income. (Any number can
be chosen for the standard class width; it does not affect the shape of the distribution.) The calculations for the
heights of the bars are shown in Table 2.19.
Table 2.19: Calculating the Bar Heights for the Histogram
Income         Frequency   Class Boundaries   Class Width   Class Width        Bar Height
($000)                     ($000)             ($000)        (standard units)
(1)            (2)         (3)                (4)           (5)                (6)
15 up to 20     3          14.5 to 19.5        5            1                   3
20 up to 25     4          19.5 to 24.5        5            1                   4
25 up to 30    11          24.5 to 29.5        5            1                  11
30 up to 35    13          29.5 to 34.5        5            1                  13
35 up to 40     4          34.5 to 39.5        5            1                   4
40 up to 45     3          39.5 to 44.5        5            1                   3
45 up to 75     2          44.5 to 74.5       30            6                   0.333
Total          40
Note:
1. In column (4) the class widths are calculated by
class width = class upper boundary - class lower boundary.
2. In column (5) the class widths are calculated by
class width (standard units) = class width ($000) / standard class width ($000)
In this example the standard class width was taken to be $5,000 and so the figures in column (5) are obtained
by dividing the figures in column (4) by 5.
3. In column (6) the bar heights are calculated by
bar height = frequency / (class width in standard units)
If the classes had all been of equal width and this class width was taken as the standard unit of measurement,
then all the figures in column (5) would be 1. Then the bar heights would just be the class frequencies, as
in Figure 2.6.
4. The bars are now drawn with a base of the class boundaries given in column (3) and height of the bar given
in column (6). The histogram is shown in Figure 2.7.
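The calculations in Table 2.19 can also be sketched in a few lines of Python. This is only an illustrative sketch: the names classes, STANDARD_WIDTH and bar_height are assumptions made for this example and are not part of the unit.

```python
# Class boundaries ($000) and frequencies taken from Tables 2.18 and 2.19.
classes = [
    (14.5, 19.5, 3),
    (19.5, 24.5, 4),
    (24.5, 29.5, 11),
    (29.5, 34.5, 13),
    (34.5, 39.5, 4),
    (39.5, 44.5, 3),
    (44.5, 74.5, 2),
]

# Standard class width in $000 (the choice made in Example 2.10).
STANDARD_WIDTH = 5.0


def bar_height(lower, upper, frequency, standard_width=STANDARD_WIDTH):
    """Return the frequency divided by the class width in standard units."""
    width_in_standard_units = (upper - lower) / standard_width
    return frequency / width_in_standard_units


heights = [bar_height(lower, upper, freq) for lower, upper, freq in classes]

for (lower, upper, freq), height in zip(classes, heights):
    print(f"{lower:5.1f} to {upper:5.1f}   frequency {freq:2d}   bar height {height:.3f}")
```

Only the last class, which is six standard widths wide, ends up with a bar height that differs from its frequency.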
Figure 2.7: Annual Salaries of UC Graduates
Source: Random sample of 40 UC Graduates
[Histogram not reproduced. Horizontal axis: Income ($000), marked from 0 to 75 in steps of 5. Vertical axis: Frequency per $5000, marked from 0 to 14 in steps of 2.]
With a histogram:
bar height = class frequency / (bar width in standard units)
Therefore (bar height) x (bar width in standard units) = class frequency,
that is, area of bar = class frequency.
In a histogram it is the area of the bar that represents the class frequency.
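The area result can be checked numerically. The sketch below is illustrative only: the function name bars and the alternative standard width of 10 are assumptions chosen for this example. It suggests that the bar areas equal the class frequencies whichever standard width is used, and that changing the standard width only rescales every bar height by the same factor, so the shape is unchanged.

```python
# Class boundaries ($000) and frequencies taken from Table 2.19.
classes = [
    (14.5, 19.5, 3),
    (19.5, 24.5, 4),
    (24.5, 29.5, 11),
    (29.5, 34.5, 13),
    (34.5, 39.5, 4),
    (39.5, 44.5, 3),
    (44.5, 74.5, 2),
]


def bars(standard_width):
    """Return (width in standard units, bar height) for every class."""
    result = []
    for lower, upper, frequency in classes:
        width = (upper - lower) / standard_width
        result.append((width, frequency / width))
    return result


for standard_width in (5.0, 10.0):
    # Area of each bar = width in standard units x bar height.
    areas = [width * height for width, height in bars(standard_width)]
    print(standard_width, [round(area, 6) for area in areas])

# Doubling the standard width doubles every bar height by the same factor.
heights_5 = [height for _, height in bars(5.0)]
heights_10 = [height for _, height in bars(10.0)]
print([round(h10 / h5, 6) for h5, h10 in zip(heights_5, heights_10)])
```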
Quick Question 9
2.12.2 Cumulative frequency polygons
Cumulative frequency polygons are also used to display the data in a class frequency table. They are the graphs of
the cumulative frequency distributions discussed in Section 2.10. Cumulative frequency polygons can be drawn
with either the cumulative frequency or the cumulative relative frequency or both, up the vertical axis. Cumulative
frequency polygons are also called ogives.
Example 2.11
Draw the or-less-than cumulative frequency polygon for the data in Table 2.20 below.
Table 2.20
Annual Income ($000)    Frequency
15 up to 20                 3
20 up to 25                 4
25 up to 30                11
30 up to 35                13
35 up to 40                 4
40 up to 45                 3
45 up to 75                 2
Total                      40
Solution
The or-less-than cumulative frequency distribution table is shown in Table 2.21 below:
Table 2.21: Or-less-than Cumulative Frequency Distribution
Annual Income ($000)   Or-less-than Cumulative   Or-less-than Cumulative
                       Frequency                 Relative Frequency
14.50                   0                        0.000
19.50                   3                        0.075
24.50                   7                        0.175
29.50                  18                        0.450
34.50                  31                        0.775
39.50                  35                        0.875
44.50                  38                        0.950
74.50                  40                        1.000
(Remember that the or-less-than cumulative frequencies relate to the upper boundaries of the classes.)
The graph of this distribution is displayed in Figure 2.8 below.
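The figures in Table 2.21 can be reproduced with a running total. The following Python sketch is illustrative only; the variable names are assumptions made for this example.

```python
from itertools import accumulate

# Class boundaries ($000) and class frequencies from Example 2.11.
lowest_boundary = 14.5
upper_boundaries = [19.5, 24.5, 29.5, 34.5, 39.5, 44.5, 74.5]
frequencies = [3, 4, 11, 13, 4, 3, 2]

total = sum(frequencies)

# The polygon starts at the lower boundary of the first class,
# where the or-less-than cumulative frequency is 0.
boundaries = [lowest_boundary] + upper_boundaries
cumulative = [0] + list(accumulate(frequencies))
relative = [count / total for count in cumulative]

for boundary, count, share in zip(boundaries, cumulative, relative):
    print(f"{boundary:6.2f}   {count:3d}   {share:.3f}")
```

Each pair (boundary, cumulative frequency) is one point of the polygon; joining successive points with straight lines gives the graph.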
Figure 2.8: Or-less-than Cumulative Frequency Polygon Of Incomes of UC Graduates
Source: Random Sample of 40 UC Graduates in 2001
[Graph not reproduced. Horizontal axis: Annual Income ($000), marked from 0 to 80 in steps of 5. Vertical axes: Cumulative Frequency, marked from 0 to 40 in steps of 5, and Cumulative Percentage, marked from 0 to 100 in steps of 25.]
Quick Question 10
2.13 Misleading diagrams
A common method of using diagrams to over-represent increases is to omit the origin. Always be wary of graphs
that have a vertical scale that does not start at zero. Be particularly careful with graphs where the vertical scale is
not marked at all!
A second method of over-representing increases is to represent values by the area or volume of a figure and then
increase each dimension of the figure by the amount of the increase. If a value doubles and this is represented by
doubling each dimension of the figure, the area will increase fourfold and the volume eightfold!
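The arithmetic behind this exaggeration can be sketched in Python. The function names apparent_increase and honest_scale are assumptions made for this example; the second function shows a standard remedy (scaling each dimension by a root of the ratio) that is not discussed in the unit itself.

```python
def apparent_increase(scale, dimensions):
    """Factor by which a figure grows when each of its `dimensions`
    (2 for an area, 3 for a volume) is multiplied by `scale`."""
    return scale ** dimensions


def honest_scale(value_ratio, dimensions):
    """Scale to apply to each dimension so that the area or volume
    grows by exactly `value_ratio`."""
    return value_ratio ** (1 / dimensions)


value_ratio = 2  # the value being represented doubles

print(apparent_increase(value_ratio, 2))  # area grows by a factor of 4
print(apparent_increase(value_ratio, 3))  # volume grows by a factor of 8
print(honest_scale(value_ratio, 2))       # scale per side that doubles the area
```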
Example 2.12
Table 2.22: Woolworths Sales Revenue for 1988/89 to 1992/93
Financial Year    Sales Revenue ($ million)
88/89              6,584.0
89/90              7,445.0
90/91              8,272.6
91/92              9,183.4
92/93             10,434.0
Source: Woolworths Prospectus 1993.
Display these data in a chart:
(a) that honestly represents the growth in sales;
(b) that exaggerates the growth in sales.
(a) Sales revenue is measured on a ratio scale. In calculating ratios, we compare the distances of values from
zero and these ratios can only be visualised if the graph includes the zero point. Where ratios are important
always include the zero point.
Figure 2.9: Woolworths Sales Revenue for 1988/89 to 1992/93
Source: Woolworths Prospectus, 1993
[Chart not reproduced. Horizontal axis: Financial Year, 1988/89 to 1992/93. Vertical axis: Revenue ($ million), marked from 0 to 12000 in steps of 2000.]
(b) To over-emphasise the growth in sales start the vertical axis at as high a value as possible.
Figure 2.10: Woolworths Sales Revenue for 1988/89 to 1992/93
Source: Woolworths Prospectus, 1993
[Line chart. Horizontal axis: Financial Year, 1988/89 to 1992/93. Vertical axis: Revenue ($ million), 6000 to 11000.]
In comparing the vertical heights, the distances away from $6,000 million are being compared and not the
distances from 0. This gives a misleading picture of the rate of growth of sales over the period.
2.14 Summary
This has been a unit about communication: how to summarise data and communicate the information to others.
In this unit you learned how to summarise data into tables. You also learned how to emphasise the important
features of data by using charts and graphs. Remember that tables are used to present detailed information and
charts and graphs are used to emphasise particular features of the data.
When selecting an appropriate form of presentation of data, always ask yourself the question:
What aspects of the data do I wish to show?
After drawing the diagram, critically examine the result and decide:
Does the diagram clearly show the important features of the data?
Read the learning objectives at the start of this unit. Have you achieved these objectives? If there are any
objectives about which you are unclear, re-read the appropriate sections before trying the tutorial exercise.
UNIT 3
SUMMARISING
NUMERICAL DATA
ON ONE VARIABLE
3.0 Contents
3.1 Unit objectives
3.2 Introduction
3.3 Symbols for populations and samples
3.4 Rounding
3.5 The four measures of the central value
3.6 Properties of the measures of the central value
3.7 The average rate of change over time
3.8 The four measures of spread
3.9 Properties of the measures of spread
3.10 A measure of skewness
3.11 Using statistics and box plots to compare data sets
3.12 Combining the mean and standard deviation: the empirical rule, z-scores and outliers
3.13 Calculating the measures of the central value and spread
3.14 Unit summary
3.1 Unit objectives
You know that the result of the data collection process is a large volume of disorganised data. In Unit 2 we dis-
cussed how these data can be presented in tables, charts and graphs. In this unit you will learn how to summarise
numerical data by calculating measures of the main features of the data.
When you have completed this unit, you should be able to:
Define the four measures of the central value of a data set: the mean, the median, the mode and the
geometric mean;
Select the most appropriate measure of the central value for any data set;
Calculate the average rate of change of a variable over time;
Define the four measures of the spread of a data set: the range, the standard deviation,
the quartile deviation and the coefficient of variation;
Select the most appropriate measure of the spread for any data set;
Calculate the mean and standard deviation for data in a value frequency table;
Calculate the mean and standard deviation for data in a class frequency table;
Define and interpret a measure of the skewness in a data set;
Use three point and five figure summaries and box plots to compare data sets;
Calculate z-scores and identify outliers.
3.2 Introduction
Consider the two sets of data given in Tables 3.1 and 3.2 below.
Table 3.1: Annual Income of 40 UC Graduates ($000)
21 19 33 15 27 21 31 43 36 25
31 26 27 32 31 57 30 34 18 25
20 34 22 26 32 40 30 27 29 36
37 26 38 27 44 31 27 32 33 70
Table 3.2: Annual Income of 30 Non-Graduates ($000)
10 19 22 20 23 29 22 24 16 22
28 20 25 20 29 28 22 15 15 21
21 15 16 34 29 33 11 24 24 23
Source: A random sample of 30 non-graduates in 1999.
These two sets of data can be compared by examining their histograms as shown in Figures 3.1 and 3.2.
Figure 3.1: Annual Salaries of 40 UC Graduates
[Histogram. Horizontal axis: Income ($000), 0 to 75. Vertical axis: Frequency per $5000, 0 to 14.]
Figure 3.2: Annual Salaries of 30 Non-graduates
Source: A random sample of 30 non-graduates in 1999.
[Histogram. Horizontal axis: Income ($000), 0 to 75. Vertical axis: Frequency per $5000, 0 to 14.]
From these histograms, you can see that there are three obvious differences between the income distributions of
graduates and non-graduates:
graduates have a higher average income than non-graduates,
graduates have a wider spread in their incomes than non-graduates, and
the income distribution of non-graduates is symmetric around the central value but the income distribu-
tion of graduates is skewed and has a longer tail in the positive direction.
Although sets of data can be compared by drawing their histograms, this method is of limited usefulness. The
comparisons are often subjective and become complicated when a large number of data sets are involved. A better
way is to compare numerical measures of the main features of the data. In this unit, you will learn to select and
compare:
measures of the central value of the data (also called measures of central tendency),
measures of the spread of the data around the central value, and
measures of the skewness of the distribution.
These measures are almost always calculated on a computer using either a statistical package or a spreadsheet.
However, in your degree studies you will often need to quickly calculate these measures for small data sets. Details
on the calculation of the measures are given at the end of the unit.
3.3 Symbols for populations and samples
You learned in Unit 1 that the central problem of statistics is the reliability of using sample information to make
inferences about populations. In statistics, the terminology and symbols used distinguish between values
calculated from populations and values calculated from samples.
Parameter: A descriptive measure of a population.
Statistic: A descriptive measure of a sample.
Parameters are usually denoted by Greek letters and statistics by Latin letters. The following symbols will be used
throughout this course.
Table 3.3: Some Statistical Symbols
Numerical measure     Population     Sample
Size                  N              n
Mean                  \(\mu\)        \(\bar{x}\)
Median                               m
Standard deviation    \(\sigma\)     s
Proportion            p              p
For example, the equation \(\mu = \$2{,}000\) states that the population mean is $2,000 and the equation
\(\bar{x} = 1.2\) metres asserts that the sample mean is 1.2 metres.
A hat placed over a parameter indicates an estimate of a population value from sample data. The equation
\(\hat{\mu} = \$1{,}880\) implies that, on the basis of sample data, the population mean has been estimated to be $1,880.
3.4 Rounding
Before we carry out our first calculations, it is important to know a few ground rules on rounding off in
calculations.
The result of rounding a number such as 13.64 to the nearest tenth is 13.6 because 13.64 is closer to 13.6 than
to 13.7. Similarly, 13.64 rounded to the nearest integer is 14 as 13.64 is closer to 14 than to 13. However, with
the number 13.65 a dilemma arises when rounding to the nearest tenth as it is equidistant from 13.6 and 13.7.
In deciding whether to round this number up or down follow the principle of randomisation and toss a coin or
perform some similar random act to decide on the rounded value. The number 13.65 should be rounded up to
13.7 or down to 13.6 at random.
The extent to which the result of a calculation is to be rounded should be decided before the calculation is started.
There are two considerations here.
1. Firstly, how accurate is the data used in the calculation? The result cannot be more accurate than the data
from which it was calculated. If, for example, incomes are recorded to the nearest $1,000, then values
calculated from these incomes should be rounded to the nearest $1,000. The answer is not more accurate
than this and so should not be presented with unjustified precision.
2. The second point to be considered is the needs of the user of the information. If the user wants a precise
answer, then give the answer with as much precision as is warranted by the data. If the user wants only an
approximate answer, then round the result further.
Rounding should be carried out at the end of calculations and not in the middle of calculations. If you have to
round within a sequence of calculations, carry at least two more decimal places than you intend to use in the
rounded answer.
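The tie-breaking rule described above can be sketched in a few lines of Python. This is only an illustration: the function name round_random_tie is hypothetical, and the values are passed as strings to Python's decimal module so that 13.65 is held exactly rather than as a binary approximation.

```python
import random
from decimal import Decimal, ROUND_HALF_DOWN, ROUND_HALF_UP


def round_random_tie(value, places="0.1"):
    """Round to the precision of `places`; an exact tie goes up or down at random.

    Hypothetical helper for illustration. `value` and `places` are strings
    such as "13.65" and "0.1".
    """
    # The two modes differ only when the value is exactly halfway,
    # so choosing one at random is the same as tossing a coin on a tie.
    mode = random.choice([ROUND_HALF_UP, ROUND_HALF_DOWN])
    return Decimal(value).quantize(Decimal(places), rounding=mode)


print(round_random_tie("13.64"))       # nearest tenth
print(round_random_tie("13.64", "1"))  # nearest integer
print(round_random_tie("13.65"))       # a tie: 13.6 or 13.7
```

Values that are not exactly halfway, such as 13.64, are rounded the same way on every run.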
Quick Question 1
3.5 The four measures of the central value
Suppose that we want to summarise a set of observations by a single value that is in some way representative of
the whole set of values. This single value would then be referred to as a central value or a measure of central
tendency for the set of observations. It would also be called an average value for the set of observations. There
are several different interpretations of what is meant by a representative value and this leads to several different
measures of the central value. Here, we will consider four measures of the central value:
the mean,
the median,
the mode,
the geometric mean.
In the next two sections you will learn the definitions and properties of these different measures. The details on
how to calculate the measures are given later in the unit.
The mean
The mean: The sum of all the values divided by the total number of values.
The symbols used for the mean are
\(\mu\) = the population mean
\(\bar{x}\) = the sample mean
This is the most widely understood measure of the average value of a set of observations. It is what most people
would use if asked to calculate an average value.
Example 3.1
A simple random sample of 9 low income households revealed the following weekly expenditures on food:
$80 $70 $70 $150 $70 $80 $110 $70 $110
What was the mean weekly food expenditure of these 9 households?
Solution
Let \(\bar{x}\) = the mean weekly food expenditure ($)
then
\[ \bar{x} = \frac{80 + 70 + 70 + 150 + 70 + 80 + 110 + 70 + 110}{9} = 90 \]
The mean weekly food expenditure was $90.
The median
The median: The middle value after the values have been ordered from the smallest to the largest.
The symbols used for the median are:
= the population median
M = the sample median.
These symbols are not universally accepted. When you read other texts, you will find different symbols used.
Example 3.2
$80 $70 $70 $150 $70 $80 $110 $70 $110
What was the median weekly food expenditure of these 9 households?
Solution
Ordering the values from the smallest to the largest gives:
$70 $70 $70 $70 $80 $80 $110 $110 $150
The middle observation is the fifth observation of $80. The median weekly food expenditure was $80.
The mode
The mode: The value that is observed most frequently.
There is not a standard symbol for the mode. Here we will just use population mode and sample mode as appro-
priate.
Example 3.3
$80 $70 $70 $150 $70 $80 $110 $70 $110
What was the modal weekly food expenditure of these 9 households?
Solution
The value of $70 was observed four times, the values of $80 and $110 were each observed twice and the value of
$150 was observed once. The value that occurs most often is $70.
The modal weekly food expenditure was $70.
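As a check on Examples 3.1 to 3.3, the same three measures can be obtained with Python's standard statistics module. This is a minimal sketch; the variable names are chosen here for illustration only.

```python
import statistics

# Weekly food expenditures ($) of the 9 low income households
food = [80, 70, 70, 150, 70, 80, 110, 70, 110]

mean_food = statistics.mean(food)      # sum of the values divided by 9
median_food = statistics.median(food)  # middle value after ordering
mode_food = statistics.mode(food)      # most frequently observed value

print(mean_food, median_food, mode_food)
```

The three results agree with the hand calculations: $90, $80 and $70.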
The geometric mean
The geometric mean: The geometric mean of n non-negative numbers is the nth root of the product of the num-
bers.
Again there is no standard symbol for the geometric mean. Notice that the geometric mean can only be calculated
for non-negative numbers.
Example 3.4
$80 $70 $70 $150 $70 $80 $110 $70 $110
What was the geometric mean food expenditure?
Solution
Let gm = the geometric mean food expenditure in the sample ($)
then
\[ gm = \sqrt[9]{80 \times 70 \times 70 \times 150 \times 70 \times 80 \times 110 \times 70 \times 110} = 86.77 \]
The geometric mean weekly food expenditure was $90. (Notice that the figure has been rounded because it
appears that the original data was only recorded to the nearest $10.)
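The ninth root in Example 3.4 can also be taken in Python rather than on a calculator. The sketch below follows the definition directly (the product of the values raised to the power 1/n); the variable names are illustrative.

```python
import math

food = [80, 70, 70, 150, 70, 80, 110, 70, 110]

# nth root of the product of the n values
gm = math.prod(food) ** (1 / len(food))

print(round(gm, 2))   # geometric mean to two decimal places
print(round(gm, -1))  # rounded to the nearest $10, as in the example
```

Python 3.8 and later also provide statistics.geometric_mean, which gives the same value for these data.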
3.6 Properties of the measures of the central value
In this section you will learn the most important properties of each of the four measures of the central value. This
will help you choose the best measure to use. Examples are given that illustrate the properties of the measures.
These examples are numbered by reference to the property illustrated. The topics covered are
the properties of the mean
the properties of the median
the properties of the mode
the properties of the geometric mean
selecting a measure of the central value
Properties of the mean
1. The mean can be calculated for variables measured on at least an interval scale.
2. It is the most widely used and understood average.
3. It uses all the values and responds to changes in any one value.
4. In the presence of extreme values, the mean can become very unrepresentative.
5. It is often a value which cannot occur.
6. The total value of the observations can be calculated by multiplying the mean by the number of
observations:
\[ \text{mean} = \frac{\text{sum of the observations}}{\text{number of observations}} \]
\[ \therefore \ \text{mean} \times \text{number of observations} = \text{sum of the observations} = \text{total value of the observations} \]
7. The sum of the deviations from the mean is zero:
\[ \sum (x - \bar{x}) = 0 \]
Example 3.5
3. The mean value of the sample observations 4, 12, 16, 9, and 4 is:
\[ \bar{x} = \frac{4 + 12 + 16 + 9 + 4}{5} = \frac{45}{5} = 9 \]
The mean is 9. If the third value increases to 26, then the mean increases to 11.
The mean responds to changes in any one value. You will see later that this is not true for some other
measures of the central value.
4. If the sale prices of a sample of 5 houses in Canberra were:
$90,000 $95,000 $100,000 $105,000 $750,000
then the mean price is (working in $000)
\[ \bar{x} = \frac{90 + 95 + 100 + 105 + 750}{5} = \frac{1140}{5} = 228 \]
The mean price is $228,000. Note that 4 out of the 5 houses sold for less than half the mean price. Here, in
the presence of a value much larger than the other values, the mean price is not representative of the typical
sale price. The mean is not a good central value for this set of data.
5. If four families had 3, 6, 6 and 2 children, then the mean number of children per family is 4.25, but no
family can ever have 4.25 children. Here the mean is a value that cannot occur.
6. If a man saved an average of $8 per week for 10 weeks, his total savings would be $80. The total is found
by multiplying the mean by the number of observations.
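Several of the properties illustrated in Example 3.5 can be verified numerically. The following Python sketch uses the example's own data; the variable names are illustrative, and the median of the house prices is included only for comparison with the mean.

```python
import statistics

sample = [4, 12, 16, 9, 4]
mean_before = statistics.mean(sample)

# Property 3: changing one value (16 becomes 26) changes the mean
mean_after = statistics.mean([4, 12, 26, 9, 4])

# Property 7: the deviations from the mean sum to zero
deviation_sum = sum(x - mean_before for x in sample)

# Property 4: one extreme house price (in $000) drags the mean upwards
prices = [90, 95, 100, 105, 750]
mean_price = statistics.mean(prices)
median_price = statistics.median(prices)
below_half_mean = sum(1 for p in prices if p < mean_price / 2)

print(mean_before, mean_after, deviation_sum)
print(mean_price, median_price, below_half_mean)
```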
Properties of the median
1. The median can be calculated for variables measured on at least an ordinal level of measurement.
2. It is an insensitive measure in that many changes in the observed values will have no effect on the median.
3. It is not affected by extreme values.
Example 3.6
2. To find the median value of the sample observations 4, 12, 16, 9 and 4, first order the observations from
smallest to largest as:
4 4 9 12 16
The median is now the third value which is 9. If the value of the largest observation increases to 26, the
median will be unchanged. Here, a change in one of the observations has had no effect on the median.
3. If the sale prices of a sample of 5 houses in Canberra were:
$90,000 $95,000 $100,000 $105,000 $750,000
then the median price is the value of the third observation, $100,000. The extreme value of $750,000
does not pull the median up and make it unrepresentative. The median is a central value and hence a good
average to use here.
Properties of the mode
1. The mode can be calculated for variables measured on any level of measurement.
2. It only uses the most frequently occurring value and ignores all other values. This can make the mode a
poor measure of the central value.
3. For some data sets, there are no modes and for other data sets, there are two or more modes.
Example 3.7
2. The modal value of the sample observations 4, 12, 16, 9, and 4 is 4. Here, the mode is the smallest of the
observations. It is not a central value.
3. If the sale prices of a sample of 5 houses in Canberra were:
$90,000 $95,000 $100,000 $105,000 $750,000
then each value has been observed once and so there is no modal value.
For the set of observations 30, 96, 30, 44, 81, 61 and 44 there are two modes, 30 and 44. The observations
are then said to be bimodal.
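The three cases in Example 3.7 (one mode, no mode, two modes) can be reproduced with a short Python function. The function modes below is a hypothetical helper written for this illustration; it follows the convention used in this unit that a data set in which no value repeats has no mode.

```python
from collections import Counter


def modes(values):
    """Return every most frequent value, or an empty list if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value observed once: no modal value
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([4, 12, 16, 9, 4]))               # one mode
print(modes([90, 95, 100, 105, 750]))         # no mode
print(modes([30, 96, 30, 44, 81, 61, 44]))    # bimodal
```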
Properties of the geometric mean
1. The geometric mean can be calculated for non-negative variables measured on at least an interval scale.
2. It uses all the values and responds to changes in any one value.
3. It is less affected by the presence of extreme values than the mean.
4. The main use of the geometric mean is in calculating average rates of change over time. We will look at
this in the next section.
Example 3.8
3. If the sale prices of a sample of 5 houses in Canberra were:
$90,000 $95,000 $100,000 $105,000 $750,000
the geometric mean price is (working in $000)
\[ gm = \sqrt[5]{90 \times 95 \times 100 \times 105 \times 750} = 146.4 \]
The geometric mean price is $146,000 which is much less than the arithmetic mean price of $228,000.
Selecting a measure of the central value
Let us pause for a while and ask: What have we learned so far? You now know that there are four measures of
the central value. Now you may be tempted to ask: Which is the best measure of the central value to use?
For nominal data, the mode is the only measure of the central value that can be calculated. Use the mode as the
measure of the central value for nominal data.
For ordinal data, the mode and the median can be calculated. The mode can be an unrepresentative value and is
seldom used for other than nominal data. The median is the middle value and is often a more typical value for the
data set than the mode. Generally, use the median as the measure of the central value for ordinal data.
For interval and ratio data, it is possible to calculate the mean, the median, the mode and the geometric mean. In
statistics we are concerned with communicating our conclusions to non-statisticians and generally, non-statisticians
feel more comfortable with the mean than with the median. For this reason, the mean is the preferred measure of
central value for interval and ratio data, provided it is not distorted by the presence of extreme values. If there
are extreme values, use the median as the measure of the central value for interval and ratio data. If there are no
extreme values, use the mean as the measure of the central value.
Quick Question 2
3.7 The average rate of change over time
Economists, accountants and financial analysts often calculate average rates of change for time series. For
example, they calculate:
average annual rates of inflation;
average annual rates of wage increases.
The percentage increase in a time series is calculated by:
\[ \text{percentage increase} = \frac{\text{final value} - \text{initial value}}{\text{initial value}} \times 100 \]
Example 3.9
The average weekly earnings of an adult in full time employment in Australia was $536.10 in June 1989 and
$789.10 in June 1999. What was the percentage increase in earnings over this period?
Solution
\[ \text{Percentage increase} = \frac{789.10 - 536.10}{536.10} \times 100 = 47.19 \]
Average earnings increased by 47% between June 1989 and June 1999.
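The percentage increase formula translates directly into code. In the Python sketch below, percentage_increase is a hypothetical function name used only for this illustration.

```python
def percentage_increase(initial_value, final_value):
    """Percentage increase from initial_value to final_value."""
    return (final_value - initial_value) / initial_value * 100


# Average weekly earnings, June 1989 and June 1999 (Example 3.9)
pct = percentage_increase(536.10, 789.10)
print(round(pct, 2))
```

A negative result indicates a decrease over the period.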
In the above example the increase was 47% over ten years, but what was the average percentage increase per
year? One method of calculating the average annual percentage increase is to find the percentage increase for
each year and then average these figures. The following example shows that this is not a good method.
Example 3.10
The annual profits of Company A from 1996 to 1999 were:
Year                   1996   1997   1998   1999
Profits ($ million)      10      4      5     10
What was the mean annual percentage increase in profits?
Solution
Table 3.4: Calculating the Annual Percentage Increases
Year            Percentage Increase
1996 to 1997    \(\frac{4 - 10}{10} \times 100 = -60\)
1997 to 1998    \(\frac{5 - 4}{4} \times 100 = 25\)
1998 to 1999    \(\frac{10 - 5}{5} \times 100 = 100\)
\[ \therefore \ \text{mean percentage increase} = \frac{-60 + 25 + 100}{3} = 21.67 \]
Profits were $10 million in 1996 and still $10 million in 1999 but the mean annual percentage increase in profits
over this period was 22%! Clearly the mean percentage increase is a misleading average here.
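The calculation in Example 3.10 is easy to reproduce, and doing so makes the problem visible: the yearly percentage changes average to about 22% even though the overall change is zero. The Python sketch below uses illustrative variable names.

```python
profits = [10, 4, 5, 10]  # $ million, 1996 to 1999

# Percentage change from each year to the next
yearly = [(later - earlier) * 100 / earlier
          for earlier, later in zip(profits, profits[1:])]

mean_pct = sum(yearly) / len(yearly)

# Overall percentage change from 1996 to 1999
overall = (profits[-1] - profits[0]) * 100 / profits[0]

print(yearly)
print(round(mean_pct, 2), overall)
```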
A better measure of the average rate of increase of a time series is the equivalent constant rate of increase.
Equivalent constant rate of increase: The constant percentage rate of increase that would take the time series
from its observed initial value to its observed final value.
There are two ways of calculating this figure depending on the data available. If the values of the time series are
known, use Method 1. If only the annual percentage increases are given, use Method 2.
Method 1: From the time series
\[ \text{Equivalent constant rate of increase} = \left( \sqrt[f - i]{\frac{S_f}{S_i}} - 1 \right) \times 100 \]
where \(f\) = the number of the final period
\(i\) = the number of the initial period
\(S_f\) = the value of the time series in the final period
\(S_i\) = the value of the time series in the initial period
Example 3.11
The average earnings of an adult in full time employment was $536.10 per week in June 1989 and $789.10 per
week in June 1999. What was the average annual percentage increase in wages over this period?
Solution
June is the midpoint of the year.
\(\therefore\) \(f = 1999.5\), \(S_f = 789.10\)
\(i = 1989.5\), \(S_i = 536.10\)
\[ \text{Equivalent constant rate of increase} = \left( \sqrt[1999.5 - 1989.5]{\frac{789.10}{536.10}} - 1 \right) \times 100 = 3.941 \]
Over the last 10 years earnings in Australia have grown at an average rate of 3.9% per annum.
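Method 1 can be sketched in Python as follows. The function name constant_rate is hypothetical; its third argument is the number of periods f - i, and the root is taken by raising the ratio to the power 1/(f - i).

```python
def constant_rate(initial_value, final_value, periods):
    """Equivalent constant percentage rate of increase per period (Method 1)."""
    return ((final_value / initial_value) ** (1 / periods) - 1) * 100


# Example 3.11: earnings in June 1989 and June 1999
rate = constant_rate(536.10, 789.10, 1999.5 - 1989.5)
print(round(rate, 3))

# Company A in Example 3.10: profits of 10 in both 1996 and 1999
print(constant_rate(10, 10, 3))
```

For Company A the function returns zero, which is a more sensible summary than the 22% obtained by averaging the yearly percentage increases.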
Method 2: From the percentage increases
When the original series is not available but the percentage increases for each period are given, use the following
method:
Let \(n\) = the number of percentage increases to be averaged
\(r_t\) = the percentage increase in period \(t\), \(t = 1, 2, \ldots, n\).
1. Calculate the change ratio, \(k_t\), for each year:
\[ k_t = 1 + \frac{r_t}{100} \qquad t = 1, 2, \ldots, n \]
2. Calculate the geometric mean of the change ratios:
\[ gm = \sqrt[n]{k_1 \times k_2 \times k_3 \times \cdots \times k_n} \]
3. Then
\[ \text{Equivalent constant rate of increase} = (gm - 1) \times 100 \]
Example 3.12
The rates of inflation in Australia for the last 4 years were 3.7%, 1.3%, -0.2% and 1.2%. What was the average
annual rate of inflation over this period?
Solution
1. Calculate the change ratio for each year by:
\[ k_1 = 1 + \frac{3.7}{100} = 1.037 \]
\[ k_2 = 1 + \frac{1.3}{100} = 1.013 \]
\[ k_3 = 1 + \frac{-0.2}{100} = 0.998 \]
\[ k_4 = 1 + \frac{1.2}{100} = 1.012 \]
2. Calculate the geometric mean of the change ratios:
\[ gm = \sqrt[4]{1.037 \times 1.013 \times 0.998 \times 1.012} = 1.0149 \]
3. Then
\[ \text{Equivalent constant rate of increase} = (1.0149 - 1) \times 100 = 1.49 \]
Over the last four years, inflation in Australia has been running at an average rate of 1.5% per annum.
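The three steps of Method 2 can be sketched in Python as follows. The function name constant_rate_from_percentages is hypothetical, and the third inflation rate is entered as -0.2 to match the change ratio of 0.998 used in the solution.

```python
import math


def constant_rate_from_percentages(rates):
    """Equivalent constant percentage rate of increase (Method 2)."""
    # Step 1: change ratio for each period
    ratios = [1 + r / 100 for r in rates]
    # Step 2: geometric mean of the change ratios
    gm = math.prod(ratios) ** (1 / len(ratios))
    # Step 3: convert back to a percentage rate
    return (gm - 1) * 100


inflation = [3.7, 1.3, -0.2, 1.2]  # annual rates of inflation (%)
average_inflation = constant_rate_from_percentages(inflation)
print(round(average_inflation, 2))
```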
Quick Question 3
3.8 The four measures of spread
In this unit you are learning how to summarise numerical data by calculating measures of:
the central value around which all the observations are distributed;
the spread around the central value;
the skewness of the spread around the central value.
In the previous three sections you learned of the different measures of the central value. This section introduces
the four main measures of spread:
the range;
the standard deviation;
the quartile deviation;
the coefficient of variation.
In the next section the properties of these measures are discussed and guidance given on selecting the best measure
of spread.
It is not necessary for a measure of spread to have a clear interpretation when looked at on its own. The objective
is to use the measures of spread to compare the spread in two or more data sets. The only requirement for a good
measure of spread is that the more widely spread the data the larger the value.
Example 3.13
A simple random sample of 9 low income households had the following weekly expenditures on food:
$80 $70 $70 $150 $70 $80 $110 $70 $110
A separate simple random sample of 7 middle income households had the following weekly expenditures on food:
$240 $250 $280 $150 $300 $80 $240
Is there a greater spread in the food expenditures of middle income households than low income households?
Solution
Always try to visualise the distribution of the observations. If there is a large number of observations, draw a
histogram. With a small number of observations, use a dotplot with each observation represented by a single dot.
The dotplots for these two sets of observations are shown in Figure 3.3.
Figure 3.3: Dotplots of the Food Expenditures of Low and Middle Income Households
[Two dotplots, one for low income households and one for middle income households. Horizontal axis of each: Household Food Expenditure ($ per week), 0 to 350.]
There is a greater spread in the food expenditures of middle income households than low income households.
This example is used to illustrate the four measures of spread. Any good measure of spread will give a larger
value of spread for middle income households than for low income households.
The range
Range: The numerical difference between the largest and smallest values.
Example 3.14
Calculate the range for each of the two data sets in Example 3.13. Is there a greater spread in the food expenditures
of middle income households than low income households?
Solution
For the low income families:
Range = $(150 − 70) = $80
For the middle income families:
Range = $(300 − 80) = $220
The range for low income households is $80 and the range for middle income households is $220. Middle income
households have a larger range of food expenditures than low income households. There is a greater spread in the
food expenditures of middle income households than low income households.
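As a quick check of Example 3.14, the range can be computed directly from the two samples. This is a minimal sketch; the function name `data_range` is an illustrative choice, not part of the course material.

```python
# Weekly food expenditures ($) from Example 3.13.
low_income = [80, 70, 70, 150, 70, 80, 110, 70, 110]
middle_income = [240, 250, 280, 150, 300, 80, 240]

def data_range(values):
    """Range: the largest value minus the smallest value."""
    return max(values) - min(values)

print("Low income range:   $", data_range(low_income))
print("Middle income range: $", data_range(middle_income))
```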
The standard deviation
If all the observations are closely grouped, they will all be close to the mean and so the average distance from the
mean will be small. The more widely spread the data the larger the average distance from the mean.
Figure 3.4: Comparing Distances from the Mean
[Figure: two distributions, one with a small spread and one with a large spread, each marking the mean and the distance of an observation from the mean.]
This suggests that the average distance of the observations from the mean would be a good measure of spread.
However if we use \((x - \bar{x})\) as the distance of the point \(x\) from the mean \(\bar{x}\), the points above the mean will be a
positive distance from the mean and the points below the mean will be a negative distance from the mean. These
positive and negative distances will always cancel out.
Thus for every dataset
\[ \frac{\sum (x - \bar{x})}{n} = 0 \]
and so the average deviation from the mean cannot be used as a measure of spread.
To overcome the problem of the positive and negative deviations cancelling out, all deviations are first squared,
making them all positive, and then the average of these squared deviations is used as a measure of spread.
Variance: The average (mean) of the squared deviations from the mean.
One disadvantage of using the variance as a measure of the spread in the data is that it can have peculiar units. For
example, if the data is in dollars, the distances from the mean are also measured in dollars and so the squares of
these distances are measured in square dollars. Thus, if the data is in dollars, the variance is measured in square
dollars, a unit which most people find difficult to visualise! To get back to the same units as the data take the
square root of the variance. The square root of the variance is called the standard deviation and has the same units
as the data.
Standard deviation: The square root of the variance.
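The symbols used for the variance and standard deviation are
\(\sigma^2\) = the population variance        \(\sigma\) = the population standard deviation
\(s^2\) = the sample variance        \(s\) = the sample standard deviation

             THE VARIANCE                                        THE STANDARD DEVIATION
Population   \(\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}\)          \(\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}\)
Sample       \(s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}\)       \(s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}\)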
Notice that when calculating the variance and standard deviation from a sample, the denominator is \(n - 1\) and
not \(n\). Why is this? Remember that the objective in taking a sample is to use the sample data to estimate
population parameters. The sample variance is calculated to give an estimate of the population variance. It
can be shown that calculating the sample variance with a denominator of \(n - 1\) will, on average, give a better
estimate of the population variance than using a denominator of \(n\). Thus to get the best possible estimate of the
population variance and standard deviation from sample data use \(n - 1\) in the denominator of the sample variance
calculations.
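The "on average" claim can be illustrated with a small exact calculation. The sketch below uses a hypothetical three-value population (not taken from the course) and assumes samples of size 2 drawn with replacement, so that every possible sample can be listed. Averaged over all possible samples, the \(n - 1\) version recovers the population variance, while the \(n\) version falls short.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population, chosen only for illustration.
population = [Fraction(2), Fraction(4), Fraction(6)]

def mean(values):
    return sum(values) / len(values)

def sum_squared_deviations(values):
    m = mean(values)
    return sum((x - m) ** 2 for x in values)

# Population variance uses the divisor N.
population_variance = sum_squared_deviations(population) / len(population)

# Every possible sample of size 2, drawn with replacement (an assumption of this sketch).
n = 2
samples = list(product(population, repeat=n))

# Average the two candidate estimators over all possible samples.
average_with_n_minus_1 = mean([sum_squared_deviations(s) / (n - 1) for s in samples])
average_with_n = mean([sum_squared_deviations(s) / n for s in samples])

print("population variance:       ", population_variance)
print("average estimate using n-1:", average_with_n_minus_1)
print("average estimate using n:  ", average_with_n)
```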
Example 3.15
Calculate the standard deviation for each of the two data sets in Example 3.13. Is there a greater spread in the
food expenditures of middle income households than low income households?
Solution
Let \(\bar{x}_L\) = mean food expenditure of the sampled low income households
\(s_L\) = standard deviation of food expenditures of the sampled low income households
Then
\[ \bar{x}_L = \frac{80 + 70 + 70 + 150 + 70 + 80 + 110 + 70 + 110}{9} = 90 \]
Table 3.5: Calculating the Variance for Low Income Households
\(x\)      \(x - \bar{x}\)        \((x - \bar{x})^2\)
80       80 − 90 = −10        100
70       70 − 90 = −20        400
70       70 − 90 = −20        400
150      150 − 90 = 60        3600
70       70 − 90 = −20        400
80       80 − 90 = −10        100
110      110 − 90 = 20        400
70       70 − 90 = −20        400
110      110 − 90 = 20        400
Total    0                    6200
\[ \therefore\ s_L^2 = \frac{6200}{9 - 1} = 775 \]
\[ \therefore\ s_L = \sqrt{775} = 27.839 \]
The standard deviation of food expenditures of low income households is $30 (after rounding).
(Notice that, as we expected, \(\sum (x - \bar{x}) = 0\). Positive and negative deviations from the mean cancel out.)

Now let's repeat the process for the middle income households.
Let \(\bar{x}_M\) = mean food expenditure of the sampled middle income households
\(s_M\) = standard deviation of food expenditures of the sampled middle income households
Then
\[ \bar{x}_M = \frac{240 + 250 + 280 + 150 + 300 + 80 + 240}{7} = 220 \]
Table 3.6: Calculating the Variance for Middle Income Households
\(x\)      \(x - \bar{x}\)          \((x - \bar{x})^2\)
240      240 − 220 = 20         400
250      250 − 220 = 30         900
280      280 − 220 = 60         3600
150      150 − 220 = −70        4900
300      300 − 220 = 80         6400
80       80 − 220 = −140        19600
240      240 − 220 = 20         400
Total    0                      36200
\[ \therefore\ s_M^2 = \frac{36200}{7 - 1} = 6033.33 \]
\[ \therefore\ s_M = \sqrt{6033.33} = 77.675 \]
The standard deviation of food expenditures of middle income households is $80 (after rounding).
The standard deviation of food expenditures is $30 for low income households and $80 for middle income
households. There is a greater spread in the food expenditures of middle income households than low income
households.
The numerical calculations described above are used to illustrate the definition of the standard deviation
and the variance. The method described above is very inefficient. Never use the method described above.
To calculate the standard deviation use a calculator or the method described in section 3.13 below.
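The results of Example 3.15 can also be checked in software. The sketch below uses Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample divisor \(n - 1\); the variable names are illustrative.

```python
import statistics

# Weekly food expenditures ($) from Example 3.13.
low_income = [80, 70, 70, 150, 70, 80, 110, 70, 110]
middle_income = [240, 250, 280, 150, 300, 80, 240]

for label, data in [("Low income", low_income), ("Middle income", middle_income)]:
    mean = statistics.mean(data)
    variance = statistics.variance(data)   # sample variance, divisor n - 1
    std_dev = statistics.stdev(data)       # square root of the sample variance
    print(f"{label}: mean = {mean}, variance = {variance:.2f}, s = {std_dev:.3f}")
```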
The quartile deviation
A major problem with the range is that it is affected by any extreme values. After all, the range is the difference
between the two most extreme values! The standard deviation is also highly sensitive to extreme values. The
quartile deviation attempts to overcome this problem by removing all the extreme values and then calculating the
range of the remaining observations.
First quartile (\(Q_1\)): The value x such that one quarter of the observations are less than x and three quarters of
the observations are more than x.
Third quartile (\(Q_3\)): The value x such that three quarters of the observations are less than x and one quarter of
the observations are more than x.
To calculate the quartiles use:
\(Q_1\) = the value of the \(\frac{n+1}{4}\)th observation after the observations have been ordered from
smallest to largest.
\(Q_3\) = the value of the \(\frac{3(n+1)}{4}\)th observation after the observations have been ordered from
smallest to largest.
What do we do if \(\frac{n+1}{4}\) and \(\frac{3(n+1)}{4}\) are not integers? In this subject we will follow the convention of using the mean
of the values on either side of \(\frac{n+1}{4}\) and \(\frac{3(n+1)}{4}\) when these values are non-integer. This is only a convention. Other
textbooks use other conventions.
Interquartile range: The difference between \(Q_3\) and \(Q_1\).
Quartile deviation: The interquartile range divided by 2.
Example 3.16
Calculate the quartile deviation for each of the two data sets in Example 3.13. Is there a greater spread in the food
expenditures of middle income households than low income households?
Solution
First order the food expenditures of the low income households from smallest to largest.
$70 $70 $70 $70 $80 $80 $110 $110 $150
\(Q_1\) = the value of the \(\frac{9+1}{4}\) = 2.5th observation
= the mean of the 2nd and 3rd observations (as 2.5 is not an integer)
= \(\frac{70 + 70}{2}\) = 70
\(Q_3\) = the value of the \(\frac{3(9+1)}{4}\) = 7.5th observation
= the mean of the 7th and 8th observations (as 7.5 is not an integer)
= \(\frac{110 + 110}{2}\) = 110
\[ QD = \frac{Q_3 - Q_1}{2} = \frac{110 - 70}{2} = 20 \]
The quartile deviation is $20 for low income households.
Now order the food expenditures of the middle income households from smallest to largest.
$80 $150 $240 $240 $250 $280 $300
\(Q_1\) = the value of the \(\frac{7+1}{4}\) = 2nd observation
= 150
\(Q_3\) = the value of the \(\frac{3(7+1)}{4}\) = 6th observation
= 280
\[ QD = \frac{Q_3 - Q_1}{2} = \frac{280 - 150}{2} = 65 \]
The quartile deviation is $65 for middle income households.
The quartile deviation of food expenditures is $20 for low income households and $65 for middle income
households. There is a greater spread in the food expenditures of middle income households than low income
households.
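The quartile convention used in this subject can be written as a short Python sketch. The function names are illustrative, and the code assumes at least three observations so that both quartile positions fall inside the data. Note that library routines often use other conventions and may give different quartiles.

```python
import math

def value_at_position(ordered, position):
    """Value at a 1-based position in ordered data.

    If the position is not a whole number, return the mean of the
    values on either side of it (the convention used in this subject).
    """
    lower = math.floor(position)
    upper = math.ceil(position)
    return (ordered[lower - 1] + ordered[upper - 1]) / 2

def quartile_deviation(values):
    """Return (Q1, Q3, QD) using the (n+1)/4 and 3(n+1)/4 positions."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = value_at_position(ordered, (n + 1) / 4)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    return q1, q3, (q3 - q1) / 2

low_income = [80, 70, 70, 150, 70, 80, 110, 70, 110]
middle_income = [240, 250, 280, 150, 300, 80, 240]

print("Low income (Q1, Q3, QD):   ", quartile_deviation(low_income))
print("Middle income (Q1, Q3, QD):", quartile_deviation(middle_income))
```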
The coefficient of variation
The final measure of spread is the coefficient of variation.
Coefficient of variation: The standard deviation expressed as a percentage of the mean.
\[ \therefore\ \text{for a population}\quad CV = \frac{\sigma}{\mu} \times 100 \qquad \text{for a sample}\quad cv = \frac{s}{\bar{x}} \times 100 \]
The previous measures of spread (the range, the standard deviation and the quartile deviation) all have the same
units as the data but the coefficient of variation is a percentage. The coefficient of variation can be used to compare
the spreads of data measured in different units.
Example 3.17
Calculate the coefficient of variation for each of the two data sets in Example 3.13. Is there a greater spread in
the food expenditures of middle income households than low income households?
Solution
In Example 3.15 we calculated that:
\(\bar{x}_L = 90\)      \(s_L = 27.839\)
\(\bar{x}_M = 220\)     \(s_M = 77.675\)
For low income households
\[ cv_L = \frac{27.839}{90} \times 100 = 30.932 \]
For middle income households
\[ cv_M = \frac{77.675}{220} \times 100 = 35.307 \]
For middle income households the standard deviation is 35% of the mean and for low income households the
standard deviation is 31% of the mean. There is a greater spread in the food expenditures of middle income
households than low income households.
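A minimal Python sketch of the sample coefficient of variation, applied to the data of Example 3.13. The function name is an illustrative choice; `statistics.stdev` uses the sample divisor \(n - 1\).

```python
import statistics

def coefficient_of_variation(values):
    """Sample cv: the standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

low_income = [80, 70, 70, 150, 70, 80, 110, 70, 110]
middle_income = [240, 250, 280, 150, 300, 80, 240]

print(f"cv, low income:    {coefficient_of_variation(low_income):.1f}%")
print(f"cv, middle income: {coefficient_of_variation(middle_income):.1f}%")
```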
3.9 Properties of the measures of spread
In this section you will learn the most important properties of each of the four measures of spread. This will
help you choose the best measure to use. Examples are given that illustrate the properties of the measures. These
examples are numbered by reference to the property illustrated. The topics covered are
the properties of the range
the properties of the standard deviation
the properties of the quartile deviation
the properties of the coefficient of variation
selecting a measure of spread
Properties of the range
1. It is easy to calculate and easy to understand.
2. It only uses the two extreme values and so differences in other values have no effect on the range.
Example 3.18
2. Suppose one marker marked a set of test papers as:
10 10 10 10 10 10 26
and a second marker marked the same set of papers as:
10 12 14 16 18 20 26.
The second marker is more varied in the marks awarded but the range for both markers is 16. The range
ignores the variation between the two extreme values and, therefore, does not adequately summarise the
variation in the data. This dependence on only the two extreme values makes the range a poor measure of
spread.
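The weakness of the range in Example 3.18 can be seen by computing it alongside a second measure. The sketch below compares the range with the quartile deviation (using this subject's quartile convention) for the two markers; the function names are illustrative.

```python
import math

def value_at_position(ordered, position):
    """Value at a 1-based position; average the neighbours if it is fractional."""
    lower = math.floor(position)
    upper = math.ceil(position)
    return (ordered[lower - 1] + ordered[upper - 1]) / 2

def quartile_deviation(values):
    ordered = sorted(values)
    n = len(ordered)
    q1 = value_at_position(ordered, (n + 1) / 4)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    return (q3 - q1) / 2

first_marker = [10, 10, 10, 10, 10, 10, 26]
second_marker = [10, 12, 14, 16, 18, 20, 26]

for label, marks in [("First marker", first_marker), ("Second marker", second_marker)]:
    marks_range = max(marks) - min(marks)
    print(f"{label}: range = {marks_range}, quartile deviation = {quartile_deviation(marks)}")
```

Both markers have the same range, but the quartile deviation is 0 for the first marker and 4 for the second, because the range ignores everything between the two extreme marks.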
Properties of the standard deviation
1. The standard deviation measures the spread around the mean. Therefore, if the mean is not an appropriate
measure of the central value, then the standard deviation is not an appropriate measure of spread.
2. It uses all the observations (unlike the range and the quartile deviation).
3. It is based on the mean of the squared deviations and hence, like other means, can be strongly affected by
extreme values.
4. It is very easy to analyse mathematically.
5. The standard deviation is by far the most widely used measure of spread.
Properties of the quartile deviation
1. The quartile deviation only reflects the range of the middle 50% of the observations; all the other observations
are discarded.
2. As all the large values and all the small values are discarded, the quartile deviation is not affected by any
extreme values.
Properties of the coefficient of variation
1. The coefficient of variation (cv) measures the size of the spread relative to the size of the mean. It is a
measure of relative spread.
2. The coefficient of variation has no units. It can be used to compare the spread in observations measured in
different units.
Example 3.19
2. A sample of 20 Fijian village households had a mean cash income of $12 per week with a standard deviation
of $3 per week. A sample of 60 village households in Vanuatu had a mean cash income of VT2,000 per
week with a standard deviation of VT400 per week. Which sample has the wider spread in cash incomes?
Solution
\[ cv_{Fiji} = \frac{3}{12} \times 100 = 25\% \]
\[ cv_{Vanuatu} = \frac{400}{2000} \times 100 = 20\% \]
For Fiji households, the standard deviation is 25% of the mean and for Vanuatu households, the standard
deviation is 20% of the mean. There is a greater relative spread of cash incomes between Fiji households
than between Vanuatu households.
Selecting a measure of spread
Let us pause for a while and ask: Which is the best measure of spread to use?
The standard deviation is the most commonly used measure of spread. It is the preferred measure of spread,
provided that it is not distorted by the presence of extreme values. In the presence of extreme values use the
quartile deviation as the measure of spread.
In general, the standard deviation is used as the measure of spread when the mean is used as the measure of the
central value. On the other hand, the quartile deviation is used as the measure of spread when the median is used
as the measure of the central value.
Quick Question 4
3.10 A measure of skewness
A data set is said to be symmetrically distributed if the observations have the same spread above and below the
central value. The dotplot below shows a symmetric distribution.
Figure 3.5: A Symmetric Distribution
[Figure: dotplot of a symmetric distribution, with the mean, median and mode all marked at the same central point.]
Notice that in this case the three measures of the central value (the mean, the median and the mode) are all
equal.
A data set is said to be positively skewed if the observations have a longer tail in the positive direction (to the
right) than in the negative direction (to the left). (In the diagrams below only the shape of the dotplot distribution
is shown.)
Figure 3.6: A Positively Skewed Distribution
[Figure: outline of a distribution with a longer tail to the right, with the positions of the mean, median and mode marked.]
In this case there are some large observations. The mean is more affected by extreme values than the median and
the mode. The mean is pulled up by the extreme values and so with a positively skewed distribution the mean is
larger than the median and the mode.
A data set is said to be negatively skewed if the observations have a longer tail in the negative direction (to the
left) than in the positive direction (to the right).
Figure 3.7: A Negatively Skewed Distribution
[Figure: outline of a distribution with a longer tail to the left, with the positions of the mean, median and mode marked.]
In this case there are some small observations. The mean is more affected by extreme values than the median and
the mode. The mean is pulled down by the extreme values and so with a negatively skewed distribution the mean
is smaller than the median and the mode.
You can see that:
1. for a symmetric distribution : mean = median.
2. for a positively skewed distribution : mean > median.
3. for a negatively skewed distribution : mean < median.
This simple relationship was exploited by the famous statistician Karl Pearson, who proposed the following
measure of skewness:
\[ \text{skewness} = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}} \]
For symmetric distributions, the measure of skewness will be 0. For positively skewed distributions (with a longer
tail in the positive direction than in the negative direction), the measure is positive. For negatively skewed
distributions, the measure is negative.
However, some care must be exercised in interpreting this measure of skewness as it is possible for a
non-symmetric distribution to have a measured skewness of 0. Symmetry is a sufficient but not a necessary condition
for a measured skewness of 0 and so if a set of data has a zero skewness we can only conclude that the data is not
skewed and may be symmetric.
Example 3.20
A simple random sample of 9 low income households had the following weekly expenditures on food:
$80 $70 $70 $150 $70 $80 $110 $70 $110
Calculate and interpret a measure of skewness.
Solution
From Examples 3.1, 3.2 and 3.15 we have:
mean = $90
median = $80
standard deviation = $27.839
\[ \therefore\ sk = \frac{3(90 - 80)}{27.839} = +1.078 \]
The skewness is +1.1. The distribution of food expenditures is positively skewed.
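Example 3.20 can be reproduced with a short Python sketch of Pearson's measure, 3(mean − median) divided by the sample standard deviation. The function name `pearson_skewness` is an illustrative choice.

```python
import statistics

def pearson_skewness(values):
    """Pearson's measure of skewness: 3(mean - median) / standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    std_dev = statistics.stdev(values)   # sample standard deviation (divisor n - 1)
    return 3 * (mean - median) / std_dev

low_income = [80, 70, 70, 150, 70, 80, 110, 70, 110]
print(f"skewness = {pearson_skewness(low_income):+.3f}")
```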
3.11 Using statistics and box plots to compare data sets
One method of comparing two or more sets of data is to calculate a three figure summary comprising:
a measure of the central value;
a measure of the spread;
a measure of the skewness
for each set of data and then compare these calculated values across the data sets.
Example 3.21
A simple random sample of 40 UC graduates and 30 non-graduates had the annual incomes displayed in Tables
3.7 and 3.8 below.
Table 3.7: Annual Income of 40 UC Graduates ($000)
21 19 33 15 27 21 31 43 36 25
31 26 27 32 31 57 30 34 18 25
20 34 22 26 32 40 30 27 29 36
37 26 38 27 44 31 27 32 33 70
Source: A sample of 40 UC graduates in 1999.
Table 3.8: Annual Income of 30 Non-Graduates ($000)
10 19 22 20 23 29 22 24 16 22
28 20 25 20 29 28 22 15 15 21
21 15 16 34 29 33 11 24 24 23
Source: A sample of 30 non-graduates in 1999.
Compare the income distributions of graduates and non-graduates.
Solution
Looking at the graduate incomes we can see that there is one exceptionally large value of $70,000. This value
will pull up the mean and make the mean a poor central value. For graduate incomes it is better to use the median
as the measure of the central value and the quartile deviation as the measure of spread. The same measures should
be calculated for all the data sets being compared.
Table 3.9: Comparing Graduate and Non-Graduate Incomes
                      Graduates     Non-Graduates
Median                $30,500       $22,000
Quartile deviation    $4,000        $3,000
Skewness              +0.17         0.00
From this table we can conclude:
The median income of graduates is $8,500 pa higher than the median income of non-graduates.
There is a wider spread in the incomes of graduates than in the incomes of non-graduates.
Graduate incomes are positively skewed but non-graduate incomes are not skewed. (Be careful here:
remember that zero skewness does not show that the distribution is symmetric!)
A second method of comparing two or more data sets is to use a five figure summary. The five figure summary
of a data set is:
the median
the first and third quartiles
the minimum and maximum values
These five measures are often displayed in a box plot. The form of the box plot is shown below.
Figure 3.8: Displaying Data in a Boxplot
[Figure: a box plot drawn along a horizontal X axis, with the minimum, \(Q_1\), median, \(Q_3\) and maximum labelled.]
The lines from the quartiles to the minimum and maximum are called the whiskers.
To compare several data sets draw side-by-side box plots for each data set using the same horizontal axis.
Example 3.22
Use side-by-side box plots to compare the graduate and non-graduate incomes in Tables 3.7 and 3.8.
Solution
The five figure summary of graduate and non-graduate incomes is displayed in Table 3.10 below.
Table 3.10: Comparing Graduate and Non-Graduate Incomes
             Graduates     Non-Graduates
Minimum      $15,000       $10,000
\(Q_1\)      $26,000       $19,000
Median       $30,500       $22,000
\(Q_3\)      $34,000       $25,000
Maximum      $70,000       $34,000
Figure 3.9: Side-by-side Boxplots of Graduate and Non-Graduate Incomes
[Figure: two box plots, one for graduates and one for non-graduates, drawn on a shared horizontal axis of Incomes ($000) running from 0 to 80.]
Source: Random samples of 40 UC graduates and 30 non-graduates in 1999
From the vertical lines at the medians, we can see that the median income of graduates is higher than the
median income of non-graduates.
From the length of the two plots, there is a greater spread in the incomes of graduates than non-graduates.
For the graduates the whisker is longer on the right hand side than the left. The incomes of graduates are
positively skewed.
For the non-graduates the whisker is the same length on both sides of the box and the median is located
in the centre of the box. Incomes of non-graduates are not skewed.
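A five figure summary can be computed with a short Python sketch. The example below is applied to the graduate incomes of Table 3.7 only and uses this subject's convention of averaging the observations on either side of a fractional position; the function names are illustrative, and other software may use different quartile conventions.

```python
import math

# Annual incomes of the 40 UC graduates in Table 3.7 ($000).
graduate_incomes = [
    21, 19, 33, 15, 27, 21, 31, 43, 36, 25,
    31, 26, 27, 32, 31, 57, 30, 34, 18, 25,
    20, 34, 22, 26, 32, 40, 30, 27, 29, 36,
    37, 26, 38, 27, 44, 31, 27, 32, 33, 70,
]

def value_at_position(ordered, position):
    """Value at a 1-based position; average the neighbours if it is fractional."""
    lower = math.floor(position)
    upper = math.ceil(position)
    return (ordered[lower - 1] + ordered[upper - 1]) / 2

def five_figure_summary(values):
    ordered = sorted(values)
    n = len(ordered)
    return {
        "minimum": ordered[0],
        "Q1": value_at_position(ordered, (n + 1) / 4),
        "median": value_at_position(ordered, (n + 1) / 2),
        "Q3": value_at_position(ordered, 3 * (n + 1) / 4),
        "maximum": ordered[-1],
    }

for name, value in five_figure_summary(graduate_incomes).items():
    print(f"{name}: ${value * 1000:,.0f}")
```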
3.12 Combining the mean and standard deviation
We hope that you have now learnt to calculate the mean and standard deviation. In this section you will learn
some important ways of combining the mean and standard deviation.
The empirical rule
We first ask: what can we deduce about the observed values from a knowledge of the mean and the standard
deviation? The empirical rule has been found to hold for most sets of data.
The empirical rule:
For most sets of data:
1. about 70% of all the observed values lie within 1 standard deviation of the mean.
2. about 95% of all the observed values lie within 2 standard deviations of the mean.
3. over 99% of all the observed values lie within 3 standard deviations of the mean.
This is only an empirical result, i.e. one that has been found to work in practice. It cannot be proved and these
results do not hold for some odd sets of data.
Example 3.23
A survey of 200 full-time UC students revealed that they worked an average of 30 hours per week with a standard
deviation of 4 hours per week. On the basis of the empirical rule, it is to be expected that:
1. about 70% of the 200 full-time students worked between (30 − 4) = 26 hours per week and (30 + 4) = 34
hours per week.
2. about 95% of the 200 full-time students worked between (30 − 2 × 4) = 22 hours per week and (30 + 2 × 4) =
38 hours per week.
3. over 99% of the 200 full-time students worked between (30 − 3 × 4) = 18 hours per week and (30 + 3 × 4) =
42 hours per week.
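The three intervals in Example 3.23 follow the same pattern, mean ± k standard deviations for k = 1, 2, 3. A minimal Python sketch (the function name is illustrative):

```python
def empirical_rule_intervals(mean, std_dev):
    """Intervals within 1, 2 and 3 standard deviations of the mean."""
    return {k: (mean - k * std_dev, mean + k * std_dev) for k in (1, 2, 3)}

# Example 3.23: mean of 30 hours per week, standard deviation of 4 hours per week.
expected_share = {1: "about 70%", 2: "about 95%", 3: "over 99%"}
for k, (low, high) in empirical_rule_intervals(30, 4).items():
    print(f"{expected_share[k]} of students: between {low} and {high} hours per week")
```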
z-scores
Notice that the empirical rule works in terms of standard deviations away from the mean. Distances in statistics
are often measured in terms of standard deviations.
z-score: The z-score of an observation is the number of standard deviations the observation is from the mean.
For population data the z-score of an observation x is: \( z = \dfrac{x - \mu}{\sigma} \)
For sample data the z-score of an observation x is: \( z = \dfrac{x - \bar{x}}{s} \)
Notice that a z-score is a number and has no units. Observations above the mean have a positive z-score and
observations below the mean have a negative z-score. These z-scores are very important and will be used
frequently in the later units of this course.
Example 3.24
The graduate incomes in Table 3.7 have a mean of $31,075 and a standard deviation of $10,075. Bruce has an
income of $12,000. How far is Bruce's income from the mean?
Solution
\[ z = \frac{12000 - 31075}{10075} = -1.89 \]
Bruce's income is 1.89 standard deviations below the mean.
The empirical rule can be re-stated in terms of z-scores.
For most sets of data:
1. about 70% of all the observed values have a z-score of between −1 and +1;
2. about 95% of all the observed values have a z-score of between −2 and +2;
3. over 99% of all the observed values have a z-score of between −3 and +3.
Outliers
Very few observations are more than 3 standard deviations from the mean. We will refer to such an observation
as an outlier.
Outlier: An observation that is more than 3 standard deviations from the mean.
Example 3.25
The graduate incomes in Table 3.7 have a mean of $31,075 and a standard deviation of $10,075. The highest
income is $70,000. Is this an outlier?
Solution
\[ z = \frac{70000 - 31075}{10075} = +3.86 \]
This income is 3.86 standard deviations above the mean. It is more than 3 standard deviations from the mean and
so it is an outlier.
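Examples 3.24 and 3.25 can be checked with a short Python sketch of the z-score and the 3 standard deviation outlier rule. The function names are illustrative; the mean and standard deviation are the values quoted in the examples.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

def is_outlier(x, mean, std_dev):
    """Outlier rule used in this unit: more than 3 standard deviations from the mean."""
    return abs(z_score(x, mean, std_dev)) > 3

# Graduate incomes in Table 3.7: mean $31,075 and standard deviation $10,075.
mean_income, sd_income = 31075, 10075

for income in (12000, 70000):
    z = z_score(income, mean_income, sd_income)
    print(f"${income:,}: z = {z:+.2f}, outlier = {is_outlier(income, mean_income, sd_income)}")
```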
Outliers are unusual values. Investigate any outliers in a data set:
was the observed value recorded correctly?
were there any special circumstances that led to this unusual observation?
3.13 Calculating the measures of the central value and spread
You now know the measures of the central value and spread of a data set. This section covers the details of the
calculations. In the real world all calculations are carried out on a computer, usually in a spreadsheet. However:
in this and other courses you will have small data sets that can be quickly summarised using a calculator;
some of the professional bodies (eg the Institute of Cost Accountants) require that their members be able
to calculate these measures by hand;
and so we need to cover these details.
This section shows the calculations for
raw data;
data in an exact value frequency distribution;
data in a class frequency distribution;
3.13.1 Calculations for raw data
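The formulae for the measures of the central value
mean : \( \bar{x} = \dfrac{\sum x}{n} \)
median : M = the value of the \(\frac{n+1}{2}\)th observation in the array
(if n is even use the mean of the values on either side of \(\frac{n+1}{2}\))
mode : mode = the most frequently occurring value
The formulae for the measures of spread
Range : Range = largest value − smallest value
Standard deviation :
\[ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} = \sqrt{\frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2} \]
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{1}{n - 1}\left[\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right]} \]
Quartile deviation : \( QD = \dfrac{Q_3 - Q_1}{2} \)
where \(Q_1\) = the value of the \(\frac{n+1}{4}\)th observation
\(Q_3\) = the value of the \(\frac{3(n+1)}{4}\)th observation
Note: With \(Q_1\), if \(\frac{n+1}{4}\) is a non-integer value use the mean of the values on either side of \(\frac{n+1}{4}\).
Similarly with \(Q_3\).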
Example 3.26
A questionnaire was distributed to all the rooms in a large hotel. One of the questions asked for the number of
guests staying in the hotel room. The first 8 responses to this question were:
1 4 2 2 2 1 4 5
Calculate the measures of the central value and the spread for these data.
Solution
Calculating the measures of the central value
The mean
Let \(\bar{x}\) = the mean number of guests per room
Then
\[ \bar{x} = \frac{1 + 4 + 2 + 2 + 2 + 1 + 4 + 5}{8} = \frac{21}{8} = 2.625 \]
The mean number of guests per room was 3.
The median
Ordering the observations from smallest to largest gives:
1 1 2 2 2 4 4 5
Let M = the median number of guests per room
= the value of the \(\frac{8+1}{2}\) = 4.5th observation
= the mean value of the 4th and 5th observations in the array
= \(\frac{2 + 2}{2}\) = 2
The median number of guests per room was 2.
The mode
Sample mode = 2
The modal number of guests per room was 2.
Calculating the measures of spread
The range
Range = 5 - 1 = 4
The range was 4 guests per room.
The standard deviation: method 1
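\(\sum x = 1 + 4 + 2 + 2 + 2 + 1 + 4 + 5 = 21\)
\(\sum x^2 = 1^2 + 4^2 + 2^2 + 2^2 + 2^2 + 1^2 + 4^2 + 5^2 = 71\)
\(\therefore s = \sqrt{\dfrac{1}{n - 1}\left[\sum x^2 - \dfrac{(\sum x)^2}{n}\right]} = \sqrt{\dfrac{1}{8 - 1}\left[71 - \dfrac{21^2}{8}\right]} = 1.5059\)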
The standard deviation of the number of guests per room was 1.5.
This is a much more efficient method for calculating the standard deviation than the method used earlier in
this unit.
The standard deviation: method 2
Use the standard deviation keys on your calculator. Read the instruction manual to find out how to use your
calculator for standard deviations. The instructions below relate to a simple Casio calculator.
Step 1: Clear the calculator.
Press AC SHIFT AC
Step 2: Go into SD (standard deviation) mode.
Press MODE
(This key differs between models; look at the mode list on
your calculator)
An SD indicator should appear in the calculator display.
Step 3: Enter the data. After entering each observation, press the DATA key.
Press 1 DATA
Press 4 DATA
Press 2 DATA
Press 2 DATA
Press 2 DATA
Press 1 DATA
Press 4 DATA
Press 5 DATA
Step 4: Extract the results.
For the mean,
Press SHIFT \(\bar{x}\) (the \(\bar{x}\) key is usually the 1 key)
This gives \(\bar{x}\) = 2.625
For the standard deviation of a sample,
Press SHIFT \(x\sigma_{n-1}\) (the \(x\sigma_{n-1}\) key is usually the 3 key)
This gives \(s\) = 1.5059.
For the standard deviation of a population,
Press SHIFT \(x\sigma_{n}\) (the \(x\sigma_{n}\) key is usually the 2 key)
This gives \(\sigma\) = 1.4087.
There are a number of exercises in this course that require the calculation of standard deviations from raw
data. Always use your calculator: it is by far the quickest way. In an exam situation you will not have time
to use any other method.
The quartile deviation
Ordering the observations from smallest to largest gives:
1 1 2 2 2 4 4 5
\(Q_1\) = the value of the \(\dfrac{8+1}{4}\) = 2.25th observation
= the mean of the values of the 2nd and 3rd observations
\(= \dfrac{1+2}{2} = 1.5\)
\(Q_3\) = the value of the \(\dfrac{3(8+1)}{4}\) = 6.75th observation
= the mean of the values of the 6th and 7th observations
\(= \dfrac{4+4}{2} = 4\)
\(QD = \dfrac{4 - 1.5}{2} = 1.25\)
The quartile deviation was 1.25 guests per room.
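The hand calculations in Example 3.26 can be checked with a short script. The following is a minimal Python sketch, not part of the course materials; the helper name `value_at` is an assumption introduced here, and it applies the course's rule of averaging the observations on either side of a non-integer position.

```python
import math
from collections import Counter

def value_at(sorted_obs, position):
    """Value at a 1-based position in an ordered list; for a non-integer
    position, the mean of the observations on either side of it."""
    lower = sorted_obs[math.floor(position) - 1]
    upper = sorted_obs[math.ceil(position) - 1]
    return (lower + upper) / 2

guests = [1, 4, 2, 2, 2, 1, 4, 5]   # raw data from Example 3.26
n = len(guests)
ordered = sorted(guests)

# Measures of the central value
mean = sum(guests) / n
median = value_at(ordered, (n + 1) / 2)
mode = Counter(guests).most_common(1)[0][0]

# Measures of spread (computational formulae)
data_range = max(guests) - min(guests)
sum_x = sum(guests)
sum_x2 = sum(x * x for x in guests)
s = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))    # sample
sigma = math.sqrt(sum_x2 / n - (sum_x / n) ** 2)       # population

q1 = value_at(ordered, (n + 1) / 4)
q3 = value_at(ordered, 3 * (n + 1) / 4)
qd = (q3 - q1) / 2

print(mean, median, mode, data_range, round(s, 4), round(sigma, 4), qd)
```

Statistical packages often use other quartile conventions, so their quartiles may differ slightly from the values obtained with this rule.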
Quick Question 5
3.13.2 Calculations for a value frequency distribution
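mean : \(\bar{x} = \dfrac{\sum fx}{n}\)
median : \(M\) = the value of the \(\dfrac{n+1}{2}\)th observation
(if \(n\) is even, \(\dfrac{n+1}{2}\) is not a whole number, so use the mean of the values on either side of it)
mode : mode = the most frequently occurring value
Range : Range = largest value - smallest value
Standard deviation (population) : \(\sigma = \sqrt{\dfrac{\sum f(x - \mu)^2}{N}} = \sqrt{\dfrac{\sum fx^2}{N} - \left(\dfrac{\sum fx}{N}\right)^2}\)
Standard deviation (sample) : \(s = \sqrt{\dfrac{\sum f(x - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{1}{n - 1}\left[\sum fx^2 - \dfrac{(\sum fx)^2}{n}\right]}\)
Quartile deviation : \(QD = \dfrac{Q_3 - Q_1}{2}\)
where \(Q_1\) = the value of the \(\dfrac{n+1}{4}\)th observation
\(Q_3\) = the value of the \(\dfrac{3(n+1)}{4}\)th observation
Note: With \(Q_1\), if \(\dfrac{n+1}{4}\) is a non-integer value, use the mean of the values on either side of \(\dfrac{n+1}{4}\).
Similarly with \(Q_3\).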
Example 3.27
A questionnaire was distributed to all the rooms in a large hotel. One of the questions asked for the number
of guests staying in the hotel room. The first 8 responses to this question were organised into an exact value
frequency table as shown in Table 3.11.
Table 3.11: The Number of Guests per Room
Number of Guests Frequency
x f
1 2
2 3
3 0
4 2
5 1
Total 8
Source: A sample of 8 rooms in 1999
Solution
The mean
The calculations can be laid out as shown in Table 3.12 below.
Table 3.12: Calculating the Mean
x f fx
1 2 2
2 3 6
3 0 0
4 2 8
5 1 5
Total 8 21
Let \(\bar{x}\) = the mean number of guests per room
Then \(\bar{x} = \dfrac{21}{8} = 2.625\)
The mean number of guests per room was 3.
The median
Summing down the frequency column to find the or-less-than cumulative frequency distribution gives Table
3.13 below.
Table 3.13: The Or-less-than Cumulative Frequency Distribution
Number of Guests Frequency Cumulative
x f Frequency
1 2 2
2 3 5
3 0 5
4 2 7
5 1 8
Total 8
Let \(M\) = the median number of guests per room
Then \(M\) = the value of the \(\dfrac{8+1}{2}\) = 4.5th observation
= the mean value of the 4th and 5th observations
From Table 3.13 it can be seen that the 3rd, 4th and 5th observations all have value 2.
\(\therefore M = \dfrac{2+2}{2} = 2\)
The median number of guests per room was 2.
The mode
Sample mode = the value with the highest frequency = 2
The modal number of guests per room was 2.
Range
Range = 5 - 1 = 4
The range was 4 guests per room.
The calculations can be laid out as shown in Table 3.14 below.
Table 3.14: Calculating the Standard Deviation
x f fx fx^2
1 2 2 2
2 3 6 12
3 0 0 0
4 2 8 32
5 1 5 25
Total 8 21 71
\(\therefore s = \sqrt{\dfrac{1}{7}\left[71 - \dfrac{(21)^2}{8}\right]} = \sqrt{\dfrac{15.875}{7}} = \sqrt{2.2679} = 1.5059\)
The standard deviation of the number of guests per room was 1.5 guests.
The quartile deviation
The cumulative frequency distribution is given in Table 3.15 below.
Table 3.15: The Cumulative Frequency Distribution
Number of Guests Frequency Cumulative
x f Frequency
1 2 2
2 3 5
3 0 5
4 2 7
5 1 8
Total 8
\(Q_1\) = the value of the \(\dfrac{8+1}{4}\) = 2.25th observation
= 1.5 (as from the table, observations 1 and 2 have value 1 and observations 3, 4 and 5
have value 2)
\(Q_3\) = the value of the \(\dfrac{3(8+1)}{4}\) = 6.75th observation
= 4 (as from the table, observations 6 and 7 have value 4)
\(QD = \dfrac{4 - 1.5}{2} = 1.25\)
The quartile deviation was 1.25 guests per room.
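The same measures can be computed directly from the exact value frequency table. The sketch below is illustrative only; the helper names `kth_value` and `value_at` are assumptions introduced here. It reads the k-th observation off the cumulative frequency column, as was done by eye with Tables 3.13 and 3.15.

```python
import math

# Exact value frequency table from Table 3.11: value x -> frequency f.
freq = {1: 2, 2: 3, 3: 0, 4: 2, 5: 1}

n = sum(freq.values())
sum_fx = sum(f * x for x, f in freq.items())
sum_fx2 = sum(f * x * x for x, f in freq.items())

mean = sum_fx / n
s = math.sqrt((sum_fx2 - sum_fx ** 2 / n) / (n - 1))
mode = max(freq, key=freq.get)          # value with the highest frequency

def kth_value(freq, k):
    """Value of the k-th observation (1-based), found by running down
    the or-less-than cumulative frequency column."""
    cumulative = 0
    for x in sorted(freq):
        cumulative += freq[x]
        if k <= cumulative:
            return x
    raise ValueError("k exceeds the number of observations")

def value_at(freq, position):
    """Mean of the observations on either side of a non-integer position."""
    lower = kth_value(freq, math.floor(position))
    upper = kth_value(freq, math.ceil(position))
    return (lower + upper) / 2

median = value_at(freq, (n + 1) / 2)
q1 = value_at(freq, (n + 1) / 4)
q3 = value_at(freq, 3 * (n + 1) / 4)
qd = (q3 - q1) / 2

print(mean, median, mode, round(s, 4), qd)
```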
Quick Question 6
3.13.3 Calculations for a class frequency distribution
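mean : \(\bar{x} = \dfrac{\sum fm}{n}\) where \(m\) is the class mark
median : \(M\) = the value of the \(\dfrac{n}{2}\)th observation
mode : mode = the class with the highest bar in the histogram
Range : Range = largest possible value - smallest possible value
Standard deviation (population) : \(\sigma = \sqrt{\dfrac{\sum f(m - \mu)^2}{N}} = \sqrt{\dfrac{\sum fm^2}{N} - \left(\dfrac{\sum fm}{N}\right)^2}\)
Standard deviation (sample) : \(s = \sqrt{\dfrac{\sum f(m - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{1}{n - 1}\left[\sum fm^2 - \dfrac{(\sum fm)^2}{n}\right]}\)
Quartile deviation : \(QD = \dfrac{Q_3 - Q_1}{2}\)
where \(Q_1\) = the first quartile = the value of the \(\dfrac{n}{4}\)th observation
\(Q_3\) = the third quartile = the value of the \(\dfrac{3n}{4}\)th observation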
Points to note about the calculations:
1. The mean
The calculations for the mean assume that the mean value of the observations in each class is equal to the
class mark. This enables the total value of the observations in each class to be estimated by multiplying
the class frequency by the class mark (see property (f) of the properties of the mean). This assumption is
only an approximation and means calculated from class frequency tables are therefore only approximate.
The results should be rounded accordingly.
2. The median
In calculating the median from a class frequency distribution, we have to estimate the value of the kth
observation in the median class. The interpolation formula for estimating the value of the kth observation
in a class is:
\(v_k = \ell + \dfrac{k}{f}\, w\)
where \(v_k\) = the estimated value of observation k
\(\ell\) = the class lower boundary
k = the number of the observation to be estimated
f = the class frequency
w = the class width
This formula assumes that the values are evenly spread over the class. This assumption is only an
approximation and medians calculated from class frequency tables are therefore approximate. The results
should be rounded accordingly. Example 3.28 illustrates the use of this formula.
Note that the median for the raw data and the value frequency distribution is the value of the \(\dfrac{n+1}{2}\)th
observation but the median from the class frequency distribution is the value of the \(\dfrac{n}{2}\)th observation.
3. The mode
There are procedures available for estimating a modal value within the modal class but they have little
merit. In this course we will only seek to identify the modal class.
4. The standard deviation
The calculations for the standard deviation assume that all of the observations in each class are equal
to the class mark. This assumption leads to an approximate standard deviation and the result should be
rounded accordingly.
5. The quartile deviation
In calculating the quartile deviation, it is assumed that the observations in the quartile classes are evenly
distributed over these classes. The interpolation formula described above is then used to estimate the
value of the kth observation in the quartile classes. This leads to an approximate quartile deviation and
the result should be rounded accordingly.
Note that the quartiles from the raw data and from the exact value frequency distribution are the values
of the \(\dfrac{n+1}{4}\)th and \(\dfrac{3(n+1)}{4}\)th observations, but the quartiles from the class frequency distribution are the
values of the \(\dfrac{n}{4}\)th and \(\dfrac{3n}{4}\)th observations.
Example 3.28
Estimate the values of the four observations in the fifth class of Table 3.16.
Solution
\(v_1 = 24.5 + \dfrac{1}{4} \times 5 = 25.75\)
\(v_2 = 24.5 + \dfrac{2}{4} \times 5 = 27.00\)
\(v_3 = 24.5 + \dfrac{3}{4} \times 5 = 28.25\)
\(v_4 = 24.5 + \dfrac{4}{4} \times 5 = 29.50\)
The estimated incomes of the four graduates are $25,750, $27,000, $28,250 and $29,500. However, this method
only provides an approximate figure for the incomes and so it is more realistic to round the results to $26,000,
$27,000, $28,000 and $29,000.
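The interpolation formula is easy to express as a function. This is a minimal Python sketch; the function name `interpolate` and its parameter names are assumptions chosen to mirror the symbols in the formula, and the inputs are the figures used in Example 3.28.

```python
def interpolate(lower_boundary, k, class_frequency, class_width):
    """Estimated value of the k-th observation in a class, assuming the
    observations are evenly spread over the class."""
    return lower_boundary + k / class_frequency * class_width

# Lower boundary 24.5, class frequency 4, class width 5 (in $000).
estimates = [interpolate(24.5, k, 4, 5) for k in range(1, 5)]
print(estimates)
```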
Example 3.29
A questionnaire was sent to a random sample of UC students one year after graduation. The incomes of the
responding graduates are summarised in Table 3.16 below.
15 up to 20 3
20 up to 25 4
25 up to 30 11
30 up to 35 13
35 up to 40 4
40 up to 45 3
45 up to 75 2
Total 40
Solution
The mean
Table 3.17: Calculating the Mean Income
Class Boundaries Frequency Class mark
($000) f m fm
14.5 19.5 3 17.0 51
19.5 24.5 4 22.0 88
24.5 29.5 11 27.0 297
29.5 34.5 13 32.0 416
34.5 39.5 4 37.0 148
39.5 44.5 3 42.0 126
44.5 74.5 2 59.5 119
Total 40 1245
Let \(\bar{x}\) = the mean income of the 40 graduates ($000)
Then \(\bar{x} = \dfrac{1245}{40} = 31.125\)
The mean income of the 40 graduates was $31,000.
The median
The cumulative frequency distribution is given in Table 3.18 below.
Table 3.18: The Cumulative Frequency Distribution of Incomes
Class Boundaries Frequency Cumulative Frequency
14.5 19.5 3 3
19.5 24.5 4 7
24.5 29.5 11 18
29.5 34.5 13 31
34.5 39.5 4 35
39.5 44.5 3 38
44.5 74.5 2 40
Total 40
Let \(M\) = the median income of the 40 graduates ($000)
Then \(M\) = the value of the \(\dfrac{40}{2}\) = 20th observation
= the value of observation 2 in class 4
\(= 29.5 + \dfrac{2}{13} \times 5 = 30.27\)
The median income of the 40 graduates was $30,000.
The mode
The modal class is the class with the highest bar on the histogram. Here, the modal class was $30,000 up to
$35,000.
The range
Range = 74.5 - 14.5 = 60
The range was $60,000.
Table 3.19: Calculating the Standard Deviation of Incomes
Class Boundaries Frequency Class mark
($000) f m fm fm^2
14.5 19.5 3 17.0 51 867.0
19.5 24.5 4 22.0 88 1936.0
24.5 29.5 11 27.0 297 8019.0
29.5 34.5 13 32.0 416 13312.0
34.5 39.5 4 37.0 148 5476.0
39.5 44.5 3 42.0 126 5292.0
44.5 74.5 2 59.5 119 7080.5
Total 40 1245 41982.5
\(\therefore s = \sqrt{\dfrac{1}{39}\left[41982.5 - \dfrac{1245^2}{40}\right]} = \sqrt{82.8686} = 9.1032\)
The standard deviation of graduate incomes was $9,000.
The quartile deviation
The cumulative frequency distribution is given in Table 3.20 below.
Table 3.20: The Cumulative Frequency Distribution of Incomes
Class Boundaries Frequency Cumulative Frequency
14.5 19.5 3 3
19.5 24.5 4 7
24.5 29.5 11 18
29.5 34.5 13 31
34.5 39.5 4 35
39.5 44.5 3 38
44.5 74.5 2 40
Total 40
\(Q_1\) = the value of the \(\dfrac{40}{4}\) = 10th observation
= the value of the 3rd observation in class 3
\(= 24.5 + \dfrac{3}{11} \times 5\) (using the interpolation formula)
= 25.864
\(Q_3\) = the value of the \(\dfrac{3 \times 40}{4}\) = 30th observation
= the value of the 12th observation of class 4
\(= 29.5 + \dfrac{12}{13} \times 5\) (using the interpolation formula)
= 34.115
\(QD = \dfrac{34.115 - 25.864}{2} = 4.126\)
The quartile deviation was $4,000.
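The grouped-data calculations in Example 3.29 can be reproduced as follows. This is a sketch under the assumptions stated in the points above (class marks stand in for the observations; values are evenly spread within a class); the helper name `grouped_value` is hypothetical.

```python
import math

# (lower boundary, upper boundary, frequency) from Table 3.17, in $000.
classes = [
    (14.5, 19.5, 3), (19.5, 24.5, 4), (24.5, 29.5, 11), (29.5, 34.5, 13),
    (34.5, 39.5, 4), (39.5, 44.5, 3), (44.5, 74.5, 2),
]

n = sum(f for _, _, f in classes)
marks = [(lo + hi) / 2 for lo, hi, _ in classes]           # class marks m
sum_fm = sum(f * m for (_, _, f), m in zip(classes, marks))
sum_fm2 = sum(f * m * m for (_, _, f), m in zip(classes, marks))

mean = sum_fm / n
s = math.sqrt((sum_fm2 - sum_fm ** 2 / n) / (n - 1))

def grouped_value(classes, position):
    """Estimate the observation at `position` by locating its class in the
    cumulative frequencies and applying the interpolation formula."""
    cumulative = 0
    for lo, hi, f in classes:
        if position <= cumulative + f:
            k = position - cumulative      # position within the class
            return lo + k / f * (hi - lo)
        cumulative += f
    raise ValueError("position exceeds the number of observations")

median = grouped_value(classes, n / 2)
q1 = grouped_value(classes, n / 4)
q3 = grouped_value(classes, 3 * n / 4)
qd = (q3 - q1) / 2

print(mean, round(median, 2), round(s, 4), round(qd, 3))
```

Because every figure here rests on an approximation, the printed values should still be rounded as in the worked solution before they are reported.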
Quick Question 7
3.14 Unit summary
This has been a Unit of calculations. You have learned how to calculate
four measures of the central value of a data set: the mean, the median, the mode and the geometric mean;
four measures of the spread in the values of a data set: the range, the standard deviation, the quartile deviation
and the coefficient of variation;
a measure of skewness.
You have also learned how to choose the most suitable measures of the central value and the spread. In Unit 4
you will learn how to measure the relationship between two variables. In all the calculation units, remember to
round your answers appropriately. If the calculation is based on approximations or uses unreliable or rounded
data, round your answers.
Read the learning objectives at the start of this unit. Have you achieved those objectives? If there are any
UNIT 4
DESCRIBING
THE RELATIONSHIP
BETWEEN
TWO VARIABLES
4.0 Contents
4.1 Unit objectives
4.2 Introduction
4.3 Scatter diagrams
4.4 Types of relationship between variables
4.5 The covariance between two variables
4.6 Spearman's rank correlation coefficient
4.7 Linear relationships
4.8 The correlation coefficient
4.9 Estimating linear relationships
4.10 The coefficient of determination
4.11 Estimating linear relationships by hand
4.12 Unit summary
4.1 Unit objectives
You now know how to summarise the observed values of a single variable. In this unit you will learn how to
describe the relationship between the observed values of two variables.
After completing this unit you should be able to:
Draw a scatter diagram of the values of two variables.
Explain what is meant by a monotonic relationship.
Distinguish between a monotonic and a linear relationship.
Calculate and interpret the covariance between the values of two variables.
Calculate and interpret Spearman's rank correlation coefficient for the values of two variables.
Calculate and interpret the correlation coefficient for the values of two variables.
Explain the principle of least squares.
Estimate and interpret a simple linear regression.
Calculate and interpret the coefficient of determination for a simple linear regression.
4.2 Introduction
Most fields of study investigate the relationship between two or more variables. This is certainly true of
accounting, economics and finance.
For example, in economics we have:
1. the quantity demanded of a good depends on the price of the good. An increase in the price of a good
results in a fall in the quantity demanded. The relationship between the price of a good and the quantity
demanded is called the demand function.
2. the quantity supplied of a good depends on the price of the good. An increase in the price of a good
results in an increase in the quantity supplied. The relationship between the price of a good and the
quantity supplied is called the supply function.
3. the aggregate demand for consumption goods in an economy depends on the aggregate income in the
economy. As the aggregate income increases the aggregate demand for consumption goods increases.
The relationship between the aggregate income and the aggregate demand for consumption goods is
called the consumption function.
4. the level of investment in an economy depends on the rate of interest. As the rate of interest increases
the level of investment falls. The relationship between the rate of interest and the level of investment is
called the investment function.
Notice the following common points about the above statements:
Each statement asserts that a relationship exists between two variables.
With each statement the value of one variable is said to depend on the value of another variable. The
variable whose value depends on the other variable is called the dependent variable. The other variable
is then referred to as the independent variable. In the first example above, the quantity demanded is
the dependent variable and the price of the good is the independent variable. In the fourth example the
level of investment is the dependent variable and the rate of interest is the independent variable. In the
examples below Y is used to denote the dependent variable and X is used to denote the independent
variable.
Each statement asserts that when the independent variable increases, the dependent variable will change
in a specified direction.
If an increase in the independent variable leads to an increase in the dependent variable there is said to be
a positive relationship between the two variables. Example 2 states that there is a positive relationship
between the price of a good and the quantity supplied. Example 3 states that there is a positive relation-
ship between the aggregate income in an economy and the aggregate demand for consumption goods.
If an increase in the independent variable leads to a decrease in the dependent variable there is said to be
a negative relationship between the two variables. Example 1 states that there is a negative relationship
between the price of a good and the quantity demanded. Example 4 states that there is a negative rela-
tionship between the interest rate and the level of investment in an economy.
If there is either a positive relationship or a negative relationship between two variables there is said to
be a monotonic relationship between the two variables.
Unfortunately, in the social sciences the relationships between variables are rarely as simple as the above functions
suggest. For example, although the demand for a good does depend on the price of the good, it also depends on
many other factors such as the prices of substitute and complementary goods, income and taste. Changes in the
values of any of these other variables will also result in changes in demand. The nice smooth curves drawn in
economic textbooks are drawn under the assumption that the values of all these other variables do not change. This
assumption is called ceteris paribus. In practice the values of other variables do change between observations
and so the observed points do not lie on a smooth curve but are scattered around a curve. This can make it difficult
to decide whether or not there is a relationship between two variables.
In the sections below we will try to answer the following questions:
1. Is there a monotonic relationship between two variables? To answer this question we will look at scatter
diagrams and calculate covariances.
2. If there is a monotonic relationship between two variables, how strong is the relationship between the
variables? Spearman's rank correlation coefficient is used to measure the strength of the monotonic
relationship between two variables.
3. If there is a monotonic relationship between two variables, is the relationship also linear i.e. do the
observed points lie close to a straight line? The scatter diagram is used to decide if the relationship is
linear.
4. If there is a linear relationship between two variables, how strong is the linear relationship? The
correlation coefficient and the coefficient of determination are used to measure the strength of the linear
relationship between two variables.
5. If there is a linear relationship between two variables what is the equation of the linear relationship?
Least squares lines are used to estimate the equation of the relationship between two variables.
Quick Question 1
4.3 Scatter diagrams
When investigating the relationship between the values of two variables measured on an interval or ratio level of
measurement the first step should always be to plot the data with the dependent variable up the vertical axis
and the independent variable along the horizontal axis. This is called a scatterplot or a scatter diagram.
Example 4.1
The monthly incomes and food expenditures of a simple random sample of 5 households are shown in Table 4.1
below.
Table 4.1: Income and Food Expenditure
Household Income Food Expenditure
($000 per month) ($000 per month)
1 1 0.5
2 3 1.1
3 4 0.8
4 7 1.0
5 10 1.6
Source: Simple random sample of 5 households
What is the relationship between income and food expenditure?
Solution
Economics, and common sense, suggest that household food expenditure depends on household income. House-
hold food expenditure is the dependent variable and household income the independent variable. All scatter
diagrams should be drawn with the dependent variable up the vertical axis and the independent variable along the
horizontal axis. The data is plotted with food expenditure up the vertical axis and income along the horizontal
axis.
Figure 4.1: Household Monthly Income and Food Expenditure
[Scatter diagram: Income ($000 pm) on the horizontal axis, Food expenditure ($000 pm) on the vertical axis]
Source: Random sample of households
Usually a higher income is associated with a higher food expenditure. The relationship is not perfect as a higher
income does not always lead to a higher food expenditure. Household 3 has a higher income than household 2 but
a lower food expenditure. This is due to the influence of factors other than income on household food expenditure.
For example, household 2 could have more people to feed than household 3.
Look for any shape to the points in a scatter diagram. If the independent variable has a strong effect on the
dependent variable then there will be a clear shape to the points. If the independent variable has little or no effect
on the dependent variable then the points will be randomly scattered over the graph.
Remember that however strong the relationship between the variables appears to be, we can never conclude that
changes in the value of the independent variable cause changes in the value of the dependent variable. There are
always other possible explanations for the observed relationship:
It could be that changes in the dependent variable cause changes in the independent variable. The
analysis has mixed up the direction of causation between the variables.
It could be that changes in some third variable cause changes in both the dependent and independent
variables. This is called a spurious correlation between the variables. The most common third variable
is time. Figure 4.2 on the next page is a plot of the number of house break-ins in the ACT against the
number of UC accounting students. There appears to be a strong relationship between these two variables
but obviously it would be wrong to conclude that increases in the number of UC students cause increases
in the number of break-ins. The upward slope to the points in the scatter diagram resulted from both these
variables increasing over time. Any two variables that increase over time will give a scatter diagram
similar to this and so exhibit a spurious correlation. This complicates the identification of relationships
between economic variables.
Figure 4.2: House Break-ins and UC Accounting Students
[Scatter diagram: Number of UC Accounting Students on the horizontal axis, Number of Break-ins on the vertical axis]
Source: Bureau of Statistics, 1998
Remember that statistics can never show causation. However sophisticated the statistical techniques we use,
we can never show that changes in the value of one variable cause changes in the value of some other variable.
We can only show that the variables have changed together.
Statistics can never show causation. Statistics can never show causation. Statistics can never show causa-
tion. Statistics can never show causation. Statistics can never show causation. Got it?
4.4 Types of relationships between variables
In this unit we will seek to identify five different types of relationship between two variables:
1. no relationship between the variables. The independent variable has little or no effect on the dependent
variable.
2. a positive relationship. The higher the value of the independent variable the higher the value of the dependent
variable.
3. a negative relationship. The higher the value of the independent variable the lower the value of the dependent
variable.
4. a positive linear relationship. This is a special case of a positive relationship. The higher the value of the
independent variable the higher the value of the dependent variable and the points are scattered around a
straight line.
5. a negative linear relationship. This is a special case of a negative relationship. The higher the value of the
independent variable the lower the value of the dependent variable and the points are scattered around a
straight line.
These cases are illustrated below.
Case 1: No relationship between the two variables
Figure 4.3: No Relationship between the Two Variables
[Scatter diagram: X on the horizontal axis, Y on the vertical axis]
There is no shape to the observations. There is no observed relationship between the two variables. Again we
have to be careful here. It is possible that the independent variable does influence the dependent variable but that
this effect has been masked by the effect of changes in other variables. All we can conclude is that there is no
evidence of a relationship.
This type of scatter diagram could result from plotting:
the quantity demanded of beer against the price of a premium wine. Beer and premium wine are very
distant substitutes and so any relationship between these variables is unlikely to be revealed in a scatter
diagram.
the daily percentage change in the All Ords against the time you get up in the morning. Presumably there
is no relationship between these variables.
Case 2: A positive relationship
Figure 4.4: A Positive Relationship Between the Two Variables
[Scatter diagram: X on the horizontal axis, Y on the vertical axis]
The points slope up from left to right. As X increases Y usually increases.
When the points slope upwards there is said to be a positive relationship between the observed values of the
two variables. Notice that in this case there appears to be a curve to the observed points. Although there is a
monotonic relationship there is not a linear relationship.
This type of scatter diagram could result from plotting:
the quantity supplied of a good against the price of the good;
the quantity demanded of a good against the price of a substitute good;
the aggregate consumption in a country against the aggregate income;
a student's mark in a test against the number of hours she studied for the test.
Case 3: A negative relationship
Figure 4.5: A Negative Relationship Between the Two Variables
[Scatter diagram: X on the horizontal axis, Y on the vertical axis]
The points slope down from left to right. As X increases Y usually decreases.
When the points slope downwards there is said to be a negative relationship between the observed values of the
two variables. Again notice that the relationship between these two variables does not appear to be linear.
This type of scatter diagram could result from plotting:
the quantity demanded of a good against the price of the good;
the quantity demanded of a good against the price of a complementary good;
the level of investment in a country against the rate of interest.
Case 4: A positive linear relationship
Figure 4.6: A Positive Linear Relationship Between the Two Variables
[Scatter diagram: X on the horizontal axis, Y on the vertical axis]
The points are scattered around an upward sloping straight line. There is a positive linear relationship between the
observed values of the two variables. A positive linear relationship is a special case of a positive relationship: all
linear relationships are monotonic but not all monotonic relationships are linear.
Economists do not usually specify that relationships are linear but statisticians often do because it makes the
relationships easier to estimate! It may be easier to estimate but is it right? Linear relationships should not be
estimated for nonlinear data.
This type of scatter diagram could result from plotting:
the estimated quantity supplied of a good against the price of the good;
the estimated aggregate consumption in a country against the aggregate income.
Case 5: A negative linear relationship
Figure 4.7: A Negative Linear Relationship Between the Two Variables
[Scatter diagram: X on the horizontal axis, Y on the vertical axis]
The points are scattered around a downward sloping straight line. There is a negative linear relationship
between the observed values of the two variables. A negative linear relationship is a special case of a negative
relationship: all linear relationships are monotonic but not all monotonic relationships are linear.
This type of scatter diagram could result from plotting:
the estimated quantity demanded of a good against the price of the good;
the estimated level of investment in a country against the rate of interest.
Note
1. A monotonic relationship is a special type of relationship. Later in this subject you will learn how to test
for the existence of a monotonic relationship. If the test shows that there is no evidence of a monotonic
relationship between the two variables then there could be
(a) no relationship between the variables or
(b) a non-monotonic relationship. Figure 4.8 is a scatter diagram of two variables that are clearly related
but the relationship is not monotonic. As X increases Y at first decreases and then increases.
Figure 4.8: A Non-monotonic Relationship between the Two Variables
[Scatter diagram: X on the horizontal axis, Y on the vertical axis]
Non-monotonic scatter diagrams could result from plotting:
the average cost per unit of output against the number of units produced;
the profits of a monopolist against the price per unit charged.
2. A linear relationship is a special case of a monotonic relationship. Later in this course you will learn how
to test for a linear relationship between two variables. If the test shows that there is no evidence of a linear
relationship between the two variables then there could be
(a) no relationship between the variables or
(b) a non-linear monotonic relationship between the variables or
(c) a non-monotonic relationship between the variables.
It is an extremely common error for unskilled users of statistics to test for a linear relationship and, on
finding that there is no evidence of a linear relationship, conclude that there is no relationship between the
two variables. There could still be a nonlinear relationship!
Drawing a scatter diagram is an essential first step in the investigation of the relationship between two variables.
Always draw the scatter diagram. However, graphs can be interpreted in different ways by different people. After
looking at Figure 4.4 it was asserted that the relationship between the variables was positive but not linear. Do
you agree? You may feel that the relationship is linear or that there is no relationship between the variables. We
need numerical measures of the strength of the relationship between two variables.
In the next two sections we will look at numerical measures used to identify and measure the strength of
monotonic relationships.
Then in the remainder of this unit we will see how to identify, estimate and measure the strength of linear
relationships.
Quick Question 2
4.5 The covariance between two variables
A useful visual aid when examining scatter diagrams is to draw a vertical line through the mean of the X values
and a horizontal line through the mean of the Y values onto the scatter diagram.
Figure 4.9: A Scatterplot with the Mean Lines
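[Scatter diagram: X on the horizontal axis, Y on the vertical axis, with a vertical line through \(\bar{x}\) and a horizontal line through \(\bar{y}\)]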
In Figure 4.9 the blue line is the vertical line through the mean of the X values and the red line is the horizontal
line through the mean of the Y values.
All the points to the left of the blue line have \(x < \bar{x}\) and so for these points \((x - \bar{x}) < 0\).
All the points to the right of the blue line have \(x > \bar{x}\) and so for these points \((x - \bar{x}) > 0\).
All points below the red line have \(y < \bar{y}\) and so for these points \((y - \bar{y}) < 0\).
All points above the red line have \(y > \bar{y}\) and so for these points \((y - \bar{y}) > 0\).
The lines through the means divide the scatter diagram into four quadrants:
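Figure 4.10: The Value of \((x - \bar{x})(y - \bar{y})\)
[Scatter diagram: X on the horizontal axis, Y on the vertical axis; the lines through \(\bar{x}\) and \(\bar{y}\) divide the plot into the quadrants Q1 to Q4]
For all points in Q1, \((x - \bar{x}) > 0\) and \((y - \bar{y}) > 0\) \(\therefore (x - \bar{x})(y - \bar{y}) > 0\)
For all points in Q2, \((x - \bar{x}) < 0\) and \((y - \bar{y}) > 0\) \(\therefore (x - \bar{x})(y - \bar{y}) < 0\)
For all points in Q3, \((x - \bar{x}) < 0\) and \((y - \bar{y}) < 0\) \(\therefore (x - \bar{x})(y - \bar{y}) > 0\)
For all points in Q4, \((x - \bar{x}) > 0\) and \((y - \bar{y}) < 0\) \(\therefore (x - \bar{x})(y - \bar{y}) < 0\)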
The covariance between the variables X and Y is the mean of the observed values of \((x - \bar{x})(y - \bar{y})\) and it is used
to distinguish between positive and negative relationships.
Covariance of a population: The covariance between the variables X and Y in the population is denoted by
COV(X, Y) and is defined as:
\(COV(X, Y) = \dfrac{\sum (x - \mu_x)(y - \mu_y)}{N}\)
Covariance of a sample: The covariance between the variables X and Y in the sample is denoted by cov(X, Y)
and is defined as:
\(cov(X, Y) = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{n - 1}\)
The covariance can be calculated for data measured on at least an interval scale.
Notice that the denominator of cov(X, Y) is (n - 1) and not n. The sample covariance is used to estimate the
population covariance. It can be shown that using (n - 1) in the denominator of the sample covariance gives, on
average, a better estimate of the population covariance than using n in the denominator. (Remember that we did
this for variances too!)
The definitions given above are not computationally efficient. To calculate the covariance use the formulae:
\(COV(X, Y) = \dfrac{\sum xy}{N} - \mu_x \mu_y\)
\(cov(X, Y) = \dfrac{1}{n - 1}\left[\sum xy - \dfrac{\sum x \sum y}{n}\right]\)
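The definitional and computational formulae give the same answer, which can be checked numerically. The sketch below uses a small hypothetical data set invented for illustration (it is not from the course), and the function names are assumptions.

```python
def cov_definition(xs, ys, sample=True):
    """Covariance from the definition: mean product of deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1) if sample else total / n

def cov_shortcut(xs, ys, sample=True):
    """Covariance from the computational formulae (sums of x, y and xy)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    if sample:
        return (sum_xy - sum_x * sum_y / n) / (n - 1)
    return sum_xy / n - (sum_x / n) * (sum_y / n)

# Hypothetical observations, for illustration only.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 9]

print(cov_definition(xs, ys), cov_shortcut(xs, ys))                  # sample
print(cov_definition(xs, ys, False), cov_shortcut(xs, ys, False))    # population
```

The shortcut needs only the running totals of x, y and xy, which is why it is quicker by hand.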
The covariance for a positive relationship
Figure 4.11: The Covariance for a Positive Relationship
[Scatter diagram: X on the horizontal axis, Y on the vertical axis, with lines through \(\bar{x}\) and \(\bar{y}\)]
Where there is a positive relationship between X and Y in the sample most of the observations are in the first and
third quadrants. In the first and third quadrants \((x - \bar{x})(y - \bar{y}) > 0\) and so the covariance is the mean of mostly
positive numbers. The covariance is therefore positive.
Thus if the covariance between the observed values of two variables is positive this suggests that there may be a
positive relationship between the two variables.
The covariance for a negative relationship
Figure 4.12: The Covariance for a Negative Relationship
[Scatter diagram: X on the horizontal axis, Y on the vertical axis, with lines through \(\bar{x}\) and \(\bar{y}\)]
Where there is a negative relationship between X and Y in the sample most of the observations are in the second
and fourth quadrants. In the second and fourth quadrants \((x - \bar{x})(y - \bar{y}) < 0\) and so the covariance is the mean
of mostly negative numbers. The covariance is therefore negative.
Thus if the covariance between the observed values of two variables is negative this suggests that there may be a
negative relationship between the two variables.
The covariance for no relationship
Figure 4.13: The Covariance Where There is No Relationship
[Scatterplot of Y against X, divided into quadrants at $\bar{x}$ and $\bar{y}$]
Where there is no relationship between X and Y the observations are scattered over all four quadrants. Some of the observations have $(x - \bar{x})(y - \bar{y}) > 0$ and some have $(x - \bar{x})(y - \bar{y}) < 0$. In the sum of these values the positive and negative values will approximately cancel out, giving a sum of approximately 0, and so the covariance is close to 0.
Thus if the covariance between the observed values of two variables is close to 0 this suggests that there may not be a monotonic relationship between the two variables. (Though there could still be a non-monotonic relationship.)
Example 4.2
Calculate the covariance for the food and income data in Table 4.2.
Household   Income   Food Expenditure
1           1        0.5
2           3        1.1
3           4        0.8
4           7        1.0
5           10       1.6
Solution
When examining the relationship between two variables always plot the data first.
[Scatterplot of food expenditure against income ($000 pm), divided into quadrants at $\bar{x}$ and $\bar{y}$]
There are more observations in the first and third quadrants than in the second and fourth quadrants. The covariance may be positive.
Let X = the monthly income of the household ($000)
Y = the monthly food expenditure of the household ($000)
Table 4.3: Calculating the Covariance
Household   x    y     x - x̄   y - ȳ   (x - x̄)(y - ȳ)
1           1    0.5   -4      -0.5    +2.0
2           3    1.1   -2      +0.1    -0.2
3           4    0.8   -1      -0.2    +0.2
4           7    1.0   +2      +0.0    +0.0
5           10   1.6   +5      +0.6    +3.0
Total       25   5.0    0       0       5.0
Mean        5    1.0
$$\therefore \mathrm{cov}(X, Y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1} = \frac{5.0}{4} = +1.2500$$
The covariance between income and food expenditure is 1,250,000 dollars squared (1.25 in units of $000 squared). As the covariance is positive, there may be a positive relationship between income and food expenditure.
Or, much more efficiently:
Table 4.4: Calculating the Covariance
Household   x    y     xy
1           1    0.5   0.5
2           3    1.1   3.3
3           4    0.8   3.2
4           7    1.0   7.0
5           10   1.6   16.0
Total       25   5.0   30.0
$$\therefore \mathrm{cov}(X, Y) = \frac{1}{n - 1}\left[\sum xy - \frac{\sum x \sum y}{n}\right] = \frac{1}{4}\left[30.0 - \frac{25 \times 5.0}{5}\right] = +1.25$$
The covariance between income and food expenditure is 1,250,000 dollars squared.
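The two routes to the sample covariance can be checked against each other in a few lines of Python. This is only a sketch: the function names are illustrative and not part of these notes, and the data are the five households of Example 4.2. Both functions should agree with the value of 1.25 found above.

```python
def sample_covariance(xs, ys):
    """Definitional form: sum of products of deviations, divided by (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_covariance_shortcut(xs, ys):
    """Computational form: (sum of xy - (sum of x)(sum of y) / n) / (n - 1)."""
    n = len(xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (sum_xy - sum(xs) * sum(ys) / n) / (n - 1)


# Data from Example 4.2, both variables in $000 per month.
income = [1, 3, 4, 7, 10]
food = [0.5, 1.1, 0.8, 1.0, 1.6]

print(round(sample_covariance(income, food), 4))
print(round(sample_covariance_shortcut(income, food), 4))
```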
The covariance has no minimum or maximum possible values and so it does not give a clear indication of the strength of the monotonic relationship between the two variables. For example, in Example 4.2 does a covariance of 1,250,000 dollars squared show that there is a strong positive monotonic relationship between the two variables, or is this number close to zero so that it shows that there is no observed relationship?
On its own the covariance is a poor measure of the relationship between two variables: it indicates the direction of the relationship but not the strength.
Later in this unit you will learn how to combine the covariance with other measures to obtain the correlation coefficient, a very important measure of the strength of the linear relationship between two variables. In the next section you will learn of Spearman's rank correlation coefficient, which provides a better measure of the strength of the monotonic relationship between two variables.
Quick Question 3
4.6 Spearman's rank correlation coefficient
One measure of the strength of the monotonic relationship between two variables is Spearman's rank correlation coefficient, $r_S$. This can be calculated for data measured on at least an ordinal scale.
Spearman's rank correlation coefficient: This is calculated by the following 4 step procedure:
1. Rank the values of the X variable from smallest to largest. Denote the rank of x by R(x).
2. Rank the values of the Y variable from smallest to largest. Denote the rank of y by R(y).
3. For each observation calculate
d = the difference between the two ranks = R(x) - R(y).
4. Calculate $r_S$ by
$$r_S = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$
where n = the number of observations.
These calculations are illustrated in the example below. You will learn how to interpret the calculated coefficient later in this section.
Example 4.3
Calculate Spearman's rank correlation coefficient for the food and income data in Table 4.5 below.
Household   Income   Food
1           1        0.5
2           3        1.1
3           4        0.8
4           7        1.0
5           10       1.6
Solution
1. Rank the values of the X variable (income) from smallest to largest. Denote the rank of x by R(x).
x                  1    3    4    7    10
$\therefore$ R(x)  1    2    3    4    5
Write these ranks as an extra column in the above table.
2. Rank the values of the Y variable from smallest to largest. Denote the rank of y by R(y).
y                  0.5  1.1  0.8  1.0  1.6
$\therefore$ R(y)  1    4    2    3    5
Write these ranks as an extra column in the above table.
3. For each observation calculate d = R(x) - R(y).
Table 4.6: Calculating the Differences in the Ranks
Income   Food
x        y     R(x)   R(y)   d
1        0.5   1      1       0
3        1.1   2      4      -2
4        0.8   3      2      +1
7        1.0   4      3      +1
10       1.6   5      5       0
4. Calculate $r_S$ by
$$r_S = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$
$$\therefore \sum d^2 = 0^2 + (-2)^2 + 1^2 + 1^2 + 0^2 = 6$$
n = the number of points = 5
$$\therefore r_S = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 6}{5(5^2 - 1)} = +0.7$$
Spearman's rank correlation coefficient for the food and income data is +0.7.
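The four-step procedure translates directly into code. The sketch below is illustrative only: the function names are not part of these notes, and the ranking helper assumes there are no tied values, as is the case in the examples of this section.

```python
def ranks(values):
    """Rank each value from smallest (1) to largest (n); assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman(xs, ys):
    """r_S = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), with d = R(x) - R(y)."""
    n = len(xs)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Data from Example 4.3.
income = [1, 3, 4, 7, 10]
food = [0.5, 1.1, 0.8, 1.0, 1.6]

print(ranks(food))                       # ranks of the food expenditures
print(round(spearman(income, food), 4))  # compare with the hand calculation
```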
You can now calculate Spearman's rank correlation coefficient, but what is the interpretation of the calculated value?
It can be shown that the calculated value will always lie between -1 and +1:
$$-1 \le r_S \le +1$$
If there is a positive relationship between the variables X and Y the value of $r_S$ is positive. The stronger the relationship the closer $r_S$ is to +1. The weaker the relationship the closer $r_S$ is to 0.
If an increase in X always leads to an increase in Y then $r_S = +1$. This is called a perfect positive relationship.
If there is a negative relationship between the variables X and Y the value of $r_S$ is negative. The stronger the relationship the closer $r_S$ is to -1. The weaker the relationship the closer $r_S$ is to 0.
If an increase in X always leads to a decrease in Y then $r_S = -1$. This is called a perfect negative relationship.
We can use $r_S$ to categorise monotonic relationships. The following categories are admittedly arbitrary and are only adopted temporarily. Later on in this subject you will learn a more scientific approach to assessing the strength of monotonic relationships in samples.
Table 4.7: Using Spearman's Rank Correlation to Categorise Monotonic Relationships
Spearman's rank correlation     Strength of monotonic relationship
$r_S = +1$                      perfect positive relationship
$+0.7 \le r_S < +1.0$           strong positive relationship
$+0.4 \le r_S < +0.7$           medium strength positive relationship
$-0.4 < r_S < +0.4$             little or no monotonic relationship
$-0.7 < r_S \le -0.4$           medium strength negative relationship
$-1.0 < r_S \le -0.7$           strong negative relationship
$r_S = -1$                      perfect negative relationship
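Table 4.7 can be read as a simple lookup. The sketch below is one possible way to encode it; the function name `categorise` is illustrative and not part of these notes, and the cut-offs are the admittedly arbitrary ones from the table.

```python
def categorise(r_s):
    """Return the Table 4.7 category for a Spearman rank correlation r_s."""
    if not -1 <= r_s <= 1:
        raise ValueError("r_s must lie between -1 and +1")
    if r_s == 1:
        return "perfect positive relationship"
    if r_s >= 0.7:
        return "strong positive relationship"
    if r_s >= 0.4:
        return "medium strength positive relationship"
    if r_s > -0.4:
        return "little or no monotonic relationship"
    if r_s > -0.7:
        return "medium strength negative relationship"
    if r_s > -1:
        return "strong negative relationship"
    return "perfect negative relationship"


print(categorise(0.7))
print(categorise(-0.2))
```

Note that the boundaries are asymmetric in the same way as the table: +0.7 counts as strong, and -0.7 also counts as strong.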
Example 4.4
How strong is the relationship between household income and food expenditure for the data in Table 4.5?
Solution
From Example 4.3, $r_S = +0.7$. Using the above table there is a strong positive relationship between household income and household food expenditure.
Example 4.5
What is the relationship between the observed values of X and Y in Table 4.8?
Table 4.8: Observed Values of Two Variables
x y
3 10
5 16
7 20
9 22
11 23
Solution
First plot the data.
Figure 4.15: Scatterplot of Observed Values of Two Variables
Each time X increases Y increases: a perfect positive relationship.
x    y    R(x)   R(y)   d
3    10   1      1      0
5    16   2      2      0
7    20   3      3      0
9    22   4      4      0
11   23   5      5      0
$$\sum d^2 = 0^2 + 0^2 + 0^2 + 0^2 + 0^2 = 0$$
$$\therefore r_S = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 0}{5(5^2 - 1)} = +1$$
X and Y increase together and so for each observation have the same ranks. There is a perfect positive relationship
between X and Y .
Example 4.6
What is the relationship between X and Y in Table 4.10?
Table 4.10: Observed Values of Two Variables
x y
3 21
5 20
7 12
9 6
11 4
Solution
First plot the data.
Figure 4.16: Scatterplot of Observed Values of Two Variables
Each time X increases Y decreases: a perfect negative relationship.
x    y    R(x)   R(y)   d
3    21   1      5      -4
5    20   2      4      -2
7    12   3      3       0
9    6    4      2      +2
11   4    5      1      +4
$$\sum d^2 = (-4)^2 + (-2)^2 + 0^2 + 2^2 + 4^2 = 40$$
$$\therefore r_S = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 40}{5(5^2 - 1)} = -1$$
When X is small Y is large and so there is a large difference in the ranks. This makes $\sum d^2$ large and hence $r_S$ is negative. There is a perfect negative relationship between the observed values of X and Y.
Quick Question 4
4.7 Linear relationships
You now know how to measure the strength of monotonic relationships between two variables. We now turn to
linear relationships between two variables.
Note that economists do not usually specify that relationships are linear, only that they are monotonic. For
example, Keynes stated:
The fundamental psychological law . . . is that men are disposed, as a rule and on average, to
increase their consumption as their income increases, but not by as much as the increase in their
income.
(The General Theory of Employment, Interest and Money, 1936, p96)
Here Keynes asserts that there is a relationship between consumption and income. As income increases, so does consumption, but not by as much. Keynes is asserting that there is a positive relationship between income and consumption.
Statisticians usually estimate linear relationships. Linear equations are easier to specify and estimate than the more general monotonic relationships. The following example illustrates the specification of a linear relationship between two economic variables.
Example 4.7
An economy has a consumption function
c = 4.0 + 0.6i
where C is aggregate consumption expenditure in the year ($ billion) and I is aggregate income in the year ($
billion). Draw the graph of this consumption function.
Solution
Here consumption is the dependent variable and income is the independent variable. The level of consumption
for any value of I can be calculated from this equation and some values are shown in Table 4.12.
Table 4.12: A Consumption Function
Income ($billion) Consumption ($billion)
0 4.0
1 4.6
2 5.2
3 5.8
4 6.4
5 7.0
Notice that each time I increases by 1, C increases by 0.6.
If these values are graphed, the observed points lie exactly on a straight line; see Figure 4.17.
Figure 4.17: An Exact Consumption Function
[Graph of C against I showing the line c = 4.0 + 0.6i, with intercept 4.0 and a rise of 0.6 in C for each increase of 1 in I]
The graph of any equation of the form
$$y = \beta_0 + \beta_1 x$$
where $\beta_0$ and $\beta_1$ are constants, is a straight line.
$\beta_0$: is the intercept. It is the value of Y when X is zero.
$\beta_1$: is the slope of the line. It is the increase in the value of Y when the value of X increases by 1 unit.
As the graph of the equation is linear, this is referred to as a linear relationship.
If $\beta_1 > 0$ there is said to be a positive linear relationship between the two variables. When X increases by one unit, Y increases by $\beta_1$ units. The graph slopes upwards from left to right. When X increases, Y increases and so this is also a positive relationship. A positive linear relationship is a special case of a positive relationship.
If $\beta_1 < 0$ there is said to be a negative linear relationship between the two variables. When X increases by one unit, Y decreases by $|\beta_1|$ units. The graph slopes downwards from left to right. When X increases, Y decreases and so this is also a negative relationship. A negative linear relationship is a special case of a negative relationship.
In practice observed points never lie exactly on a straight line. In a linear economic model it is assumed that when all other variables are held fixed (ceteris paribus) there is a linear relationship between the dependent and the independent variables. When taking observations all these other variables cannot be held fixed and this causes the observed values to be scattered around the straight line.
We observe a sample of points:
Figure 4.18: The Observed Sample of Points
[Scatterplot of Y against X showing the observed points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$]
The deviations of the points from a straight line are due to changes in the values of other variables.
The questions to be answered are:
1. is there a linear relationship between the two variables?
2. if there is a linear relationship how strong is this linear relationship?
3. if there is a linear relationship what is the equation of the straight line around which the observed values are
scattered?
You will first learn how to decide whether there is a linear relationship between two variables. Then, in section 4.8, you will learn how to measure the strength of a linear relationship. In section 4.9 you will learn how to estimate the equation of a linear relationship.
To decide whether there is a linear relationship between two variables just examine their scatter diagram and see
if the relationship appears linear. This is obviously very subjective and different analysts will come to different
decisions.
Figure 4.19: A Scatterplot with a Linear Relationship
Here the points do appear (to me) to be scattered around an underlying linear relationship, but you may disagree. An estimate of the linear relationship has been drawn on the graph as a guide. As the points are scattered around a straight line it makes sense to measure the strength of the linear relationship and to estimate the equation of the underlying linear relationship for these data.
Figure 4.20: A Scatterplot with a Non-linear Relationship
With these observations the relationship appears nonlinear. There is no point in estimating the strength of the
linear relationship or estimating the underlying linear equation. The estimated linear relationship drawn on the
scatter diagram is inappropriate.
The calculations for measuring the strength of a linear relationship and estimating its equation are always carried
out using a computer package and you will learn how to do this on Excel. However the results are only sensible
if there is a linear relationship to be estimated. Always look at the scatter diagram before accepting the results
of a computer estimated equation. Unfortunately many users of statistics do not do this and so generate linear
equations for non-linear data. This can lead to ridiculous results!
Quick Question 5
4.8 The correlation coefficient
The correlation coefficient measures the strength of the linear relationship between two variables. It can be calculated for data measured on at least an interval scale.
The population correlation coefficient: The correlation between the variables X and Y in the population is denoted by $\rho$ and is calculated by:
$$\rho = \frac{\mathrm{COV}(X, Y)}{\sigma_x \sigma_y}$$
where COV(X, Y) = the covariance of X and Y in the population
$\sigma_x$ = the standard deviation of X in the population
$\sigma_y$ = the standard deviation of Y in the population
The sample correlation coefficient: The correlation between the variables X and Y in the sample is denoted by r and is calculated by:
$$r = \frac{\mathrm{cov}(X, Y)}{s_x s_y}$$
where cov(X, Y) = the covariance of X and Y in the sample
$s_x$ = the standard deviation of X in the sample
$s_y$ = the standard deviation of Y in the sample
The linear correlation coefficient has the same interpretation for linear relationships as Spearman's rank correlation coefficient has for monotonic relationships. We have $-1 \le r \le 1$ and r is used to measure the strength of the linear relationship.
Table 4.13: Using the Correlation Coefficient to Categorise Linear Relationships
Correlation coefficient      Strength of linear relationship
$r = +1$                     perfect positive linear relationship
$+0.7 \le r < +1.0$          strong positive linear relationship
$+0.4 \le r < +0.7$          medium strength positive linear relationship
$-0.4 < r < +0.4$            little or no linear relationship
$-0.7 < r \le -0.4$          medium strength negative linear relationship
$-1.0 < r \le -0.7$          strong negative linear relationship
$r = -1$                     perfect negative linear relationship
For $r = +1$ and $r = -1$ the observed points must lie exactly on a straight line.
Most statistical calculators can calculate a correlation coefficient. Look for an r key and read the calculator's manual.
Example 4.8
Estimate the strength of the linear relationship between household monthly income and monthly food expenditure
for the sample data in Table 4.14.
Household   Income   Food Expenditure
1           1        0.5
2           3        1.1
3           4        0.8
4           7        1.0
5           10       1.6
Solution
Always plot the data before calculating the correlation coefficient! If the scatter diagram shows that the relationship is non-linear do not calculate the correlation coefficient, as this measures the strength of the linear relationship between the two variables.
[Scatterplot of food expenditure against income ($000 pm)]
With so few observations it is difficult to judge whether a linear relationship is appropriate or not. Assume that a linear relationship can be used.
cov(X, Y) = 1.25000 from Example 4.2
$s_x$ = 3.53553 from a calculator
$s_y$ = 0.40620 from a calculator
$$\therefore r = \frac{1.25000}{3.53553 \times 0.40620} = +0.8704$$
In this small sample there is a strong positive linear relationship between household monthly food expenditure and household monthly income.
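The same calculation can be reproduced without a calculator. The sketch below uses Python's standard `statistics` module, whose `stdev` function uses the $(n - 1)$ denominator; the function name `sample_correlation` is illustrative and not part of these notes.

```python
import statistics


def sample_correlation(xs, ys):
    """r = cov(X, Y) / (s_x * s_y), with (n - 1) denominators throughout."""
    n = len(xs)
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (statistics.stdev(xs) * statistics.stdev(ys))


# Data from Example 4.8, both variables in $000 per month.
income = [1, 3, 4, 7, 10]
food = [0.5, 1.1, 0.8, 1.0, 1.6]

print(round(statistics.stdev(income), 5))  # s_x
print(round(statistics.stdev(food), 5))    # s_y
print(round(sample_correlation(income, food), 4))
```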
Example 4.9
What are the approximate values of the correlation coefficients for the two scatter diagrams below?
Figure 4.22: Estimating the Correlation Coefficient 1
Figure 4.23: Estimating the Correlation Coefficient 2
Solution
In Figure 4.22 the points slope downwards from left to right. The correlation coefficient is negative. The points are all close to the line and so there is a strong relationship. The correlation coefficient is in the range -0.7 to -1.0. (The correlation is in fact -0.9389, but there is no way for you to see this just from looking at the scatter diagram.)
In Figure 4.23 the points slope upwards from left to right. The correlation coefficient is positive. The points are widely scattered around the line and so this is not a strong relationship. The correlation coefficient is somewhere between +0.4 and +0.7. There is a medium strength linear correlation between these two variables. (The correlation is +0.5036.)
Quick Question 6
4.9 Estimating linear relationships
If the scatter diagram reveals that the observations are scattered around a straight line, the next step is to estimate the equation of the line around which the observations are scattered. The obvious line to use is the line that goes as close as possible to the observations. First though we must define what is meant by the closeness of a set of observations to a line.
The distance of an observation from a line is defined as the vertical distance between the observation and the line and is called the residual of the observation.
Residual: the vertical distance between an observation and a line.
This is illustrated in Figure 4.24 below.
Consider the ith observation with $X = x_i$ and $Y = y_i$ and the line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$.
Let $e_i$ = the residual of the ith observation.
The height of the observation is $y_i$. When $X = x_i$ the height of the line is $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$.
Figure 4.24: The Residuals in a Regression
[Scatterplot of Y against X showing the line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, the observation $(x_i, y_i)$, the point $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ on the line, and the residual $e_i$ between them]
$\therefore e_i$ = the vertical distance between the observation and the line
$$e_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)$$
If an observation is above the line $e_i$ is positive and if an observation is below the line $e_i$ is negative.
$e_i$ measures the distance of observation i from the line. However $\sum e_i$ cannot be used to measure the total distance of all the observations from the line because the positive distances of the observations above the line will cancel out with the negative distances of the observations below the line. Remember that you met the same problem when defining the variance and the same solution is used here: square the values!
Sum of squares due to error: The total distance of a set of observations from the line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ is called the sum of squares due to error, SSE, and is calculated by
$$\mathrm{SSE} = \sum e_i^2 = \sum [y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)]^2$$
Example 4.10
Calculate the distance of the observations in Table 4.15 from the line y = 13 - 2x.
Table 4.15: Observed Values of X and Y
Observation Number   x   y
1                    1   8
2                    3   7
3                    5   6
Solution
Table 4.16: Calculating the Distances from the Line
$x_i$   $y_i$   $\hat{y}_i = 13 - 2x_i$   $e_i = y_i - \hat{y}_i$
1       8       11                        -3
3       7       7                          0
5       6       3                         +3
$$\mathrm{SSE} = (-3)^2 + 0^2 + 3^2 = 18$$
The observations in Table 4.15 are 18 square units from the line y = 13 - 2x.
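The SSE calculation is easy to repeat for any candidate line. In the sketch below the function name `sse` is illustrative and not part of these notes. The second line tried, y = 8.5 - 0.5x, is a hypothetical alternative chosen only to show that a different line gives a different SSE for the same observations.

```python
def sse(xs, ys, intercept, slope):
    """Sum of squared vertical distances of the points from the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Observations from Table 4.15.
xs = [1, 3, 5]
ys = [8, 7, 6]

print(sse(xs, ys, 13, -2))      # the line y = 13 - 2x from Example 4.10
print(sse(xs, ys, 8.5, -0.5))   # a hypothetical alternative line
```

These three observations happen to lie exactly on y = 8.5 - 0.5x, so the second line has no residuals at all.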
Figure 4.25: The Calculated Residuals
[Plot of the three observations and the line y = 13 - 2x, showing the residuals $e_1 = -3$, $e_2 = 0$ and $e_3 = +3$]
For a given set of observations changing the line will change the residuals and hence change the value of SSE.
The line that has the minimum value for the sum of squares due to error is the line that is as close as possible to
the observations. It is called the least squares line. This is the line that is used to estimate the linear relationship
between two variables.
Least squares line: For a given set of observations the line with the smallest value of SSE is called the least squares line.
The least squares line estimated from a population is written as $y = \beta_0 + \beta_1 x$. The least squares line estimated from a sample is used to estimate the population line and is written as $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$.
Least squares lines are always estimated using a statistical package or a spreadsheet. In this unit we will use Excel for all our regressions.
Quick Question 7
Example 4.11
The monthly incomes and food expenditures of a simple random sample of 5 households are as shown below.
Household   Income   Food Expenditure
1           1        0.5
2           3        1.1
3           4        0.8
4           7        1.0
5           10       1.6
Use Excel to estimate the least squares regression line for these data.
(a) What is the equation of the least squares line?
(b) What is the sum of squares due to error?
(c) What is the correlation coefficient?
(d) Calculate the residual for each observation and hence verify the figure given by Excel for the sum of squares due to error.
Solution
The data were entered into Excel and some of the output is reproduced below:
Table 4.18: The Excel Output for the Food Expenditure Data
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.87038828
R Square             0.757575758
Adjusted R Square    0.676767677
Standard Error       0.230940108
Observations         5

ANOVA
             df   SS     MS
Regression   1    0.5    0.5
Residual     3    0.16   0.053333333
Total        4    0.66

             Coefficients   Standard Error   t Stat
Intercept    0.5            0.193218357      2.587745848
Income       0.1            0.032659863      3.061862178
(a) What is the equation of the least squares line?
From the Coefficients column in Table 4.18, the intercept is 0.5 and the coefficient of the independent variable, Income, is 0.1. The estimated equation is
$$\widehat{\text{Food}} = 0.5 + 0.1 \times \text{Income}$$
The scatter diagram and the estimated line are displayed in Figure 4.26 below.
Figure 4.26: The Estimated Regression Line for Food Expenditure
[Scatterplot of Food ($000 pm) against Income ($000 pm) with the estimated line drawn in]
(b) What is the sum of squares due to error?
The sum of squares due to error is the sum of the squares of the residuals. From the Residual row of the SS column in the output, SSE = 0.16.
(c) What is the correlation coefficient?
The entry in the Multiple R row of the table gives the magnitude of the correlation coefficient but not the sign. The sign of r is the same as the sign of the slope and so in this case is positive. The correlation coefficient is +0.8704 (after rounding). There is a strong positive linear relationship between food expenditure and income.
(d) Calculate the residual for each observation and hence verify the figure given by Excel for the sum of squares due to error.
Table 4.19: The Estimated Food Expenditures and Residuals
$x_i$   $y_i$   $\hat{y}_i = 0.5 + 0.1x_i$   $e_i = y_i - \hat{y}_i$   $e_i^2$
1       0.5     0.6                          -0.1                      0.01
3       1.1     0.8                          +0.3                      0.09
4       0.8     0.9                          -0.1                      0.01
7       1.0     1.2                          -0.2                      0.04
10      1.6     1.5                          +0.1                      0.01
Total                                         0.0                      0.16
The SSE is 0.16 which agrees with the Excel figure from (b).
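The Excel figures can also be cross-checked in code. The sketch below assumes the standard closed-form least squares estimates, slope = $\sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$ and intercept = $\bar{y}$ - slope $\times \bar{x}$; these formulae are not stated in these notes, which rely on Excel, and the function name `least_squares` is illustrative.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line for the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Data from Example 4.11, both variables in $000 per month.
income = [1, 3, 4, 7, 10]
food = [0.5, 1.1, 0.8, 1.0, 1.6]

b0, b1 = least_squares(income, food)
residuals = [y - (b0 + b1 * x) for x, y in zip(income, food)]
sse = sum(e ** 2 for e in residuals)

print(round(b0, 4), round(b1, 4))  # compare with Intercept and Income
print(round(sse, 4))               # compare with the Residual SS
```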
Quick Question 8
4.10 The coefficient of determination
In the initial formulation of the simple linear model we had:
$$y = \beta_0 + \beta_1 x$$
However this linear equation will not hold exactly for the observed values because of the effect of changes in
the values of other variables. Changes in the values of other variables will result in the observed values being
scattered around the straight line. The residual is the difference between the observed value of Y and the straight
line. This difference is due to the effect of variables other than X on the dependent variable.
$e_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)$
$\therefore\ y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + e_i$
The observed value $y_i$ is determined by
(a) the value of $x_i$ and
(b) the value of $e_i$.
The $e_i$ term measures the effect on Y of all variables other than X.
The variable X is separated out because, in economic theory, it is considered to be the most important variable
determining the value of Y . If the model is correct, a large part of the observed changes in Y should be explained
by the observed changes in the variable X. The coefficient of determination measures the proportion of the
observed changes in Y explained by the observed changes in X.
Figure 4.27: The Deviations from the Mean
[Figure: scatter diagram with axes X and Y showing the fitted line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, a horizontal line at the mean $\bar{y}$, and the points A, B, C and D at $X = x_i$; the residual $e_i$ is the segment AB.]
In Figure 4.27, the ith observation is the point A in the scatter diagram. In this scatter diagram a horizontal red
line has been drawn at the mean value of the observed Y. The observed value of Y when $X = x_i$ is the distance
AC above $\bar{y}$. From the fitted least squares line, you can see that if we had used the regression line to predict the
deviation of Y from the mean we would have predicted a deviation from the mean of BC when $X = x_i$. Of the
total deviation from the mean of AC, the deviation BC is explained by the line using the value $X = x_i$. The
deviation $AB = e_i$ is not explained by the line.
In Figure 4.27, we have:
AD = $y_i$ = observed value of Y
AC = $y_i - \bar{y}$ = observed deviation of Y from the mean
BD = $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ = explained value of Y
BC = $\hat{y}_i - \bar{y}$ = explained deviation of Y from the mean
Now AC = BC + AB
$\therefore\ y_i - \bar{y} = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)$
i.e. observed deviation = explained deviation + unexplained deviation
This equation divides the observed deviation of Y from the mean into two parts:
the deviation of Y from the mean that is explained, using the regression line, by the observed value of
X; and
the deviation of Y from the mean that is due to the effect of other variables on Y .
Example 4.12
The monthly incomes and food expenditures of a simple random sample of 5 households are shown in Table 4.20
below.
Household   x    y
1           1    0.5
2           3    1.1
3           4    0.8
4           7    1.0
5           10   1.6
In Example 4.11 the least squares regression equation was estimated to be
$\hat{y} = 0.5 + 0.1x$
For each observation calculate and interpret the observed deviation from the mean, the explained deviation from
the mean and the unexplained deviation from the mean.
Solution
The mean food expenditure is
$\bar{y} = \frac{0.5 + 1.1 + 0.8 + 1.0 + 1.6}{5} = 1.0$.
Table 4.21: Calculating the Observed, Explained and Unexplained Deviations from the Mean
Column 1   Column 2   Column 3                      Column 4             Column 5                Column 6
                                                    observed deviation   explained deviation     unexplained deviation
$x_i$      $y_i$      $\hat{y}_i = 0.5 + 0.1x_i$    $y_i - \bar{y}$      $\hat{y}_i - \bar{y}$   $y_i - \hat{y}_i$
1          0.5        0.6                           -0.5                 -0.4                    -0.1
3          1.1        0.8                           +0.1                 -0.2                    +0.3
4          0.8        0.9                           -0.2                 -0.1                    -0.1
7          1.0        1.2                            0.0                 +0.2                    -0.2
10         1.6        1.5                           +0.6                 +0.5                    +0.1
sum                                                  0.0                  0.0                     0.0
sum of squares                                       0.66                 0.50                    0.16
The calculations for these figures are explained below.
Note that for each observation we have:
$y_i - \bar{y} = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)$
i.e. observed deviation = explained deviation + unexplained deviation
Column  Explanation
1       The observed values of the independent variable, income.
        The income of household 1 is 1 ($000).
2       The observed value of the dependent variable, food expenditure.
        The food expenditure of household 1 is 0.5 ($000).
3       The predicted value of the dependent variable from the regression line.
        The income of household 1 is 1.0 and so the predicted food expenditure is
        $\hat{y}_1 = 0.5 + 0.1 \times 1 = 0.6$
        The predicted food expenditure of household 1 is 0.6 ($000).
4       The observed deviation from the mean: $y_i - \bar{y}$
        Household 1 food expenditure is 0.5 compared to the mean expenditure of 1.0.
        $y_1 - \bar{y} = 0.5 - 1.0 = -0.5$
        Household 1 spends 0.5 ($000) less on food than the average household.
5       The explained deviation from the mean: $\hat{y}_i - \bar{y}$
        Household 1 predicted food expenditure is 0.6 (from column 3) compared to a mean expenditure
        of 1.0.
        $\hat{y}_1 - \bar{y} = 0.6 - 1.0 = -0.4$
        The regression line predicts that household 1 expenditure is 0.4 ($000) less than the mean.
6       The unexplained deviation, also known as the residual: $y_i - \hat{y}_i$.
        This is the difference between the observed value of the dependent variable and the predicted
        value from the regression line. For household 1:
        $y_1 - \hat{y}_1 = 0.5 - 0.6 = -0.1$
        The observed value is 0.1 ($000) less than predicted by the regression line.
The following points hold for all least squares regressions:
1. For each observation:
Observed deviation = explained deviation + unexplained deviation.
2. The sum of the observed deviations is 0, i.e. $\sum (y_i - \bar{y}) = 0$.
3. The sum of the explained deviations is 0, i.e. $\sum (\hat{y}_i - \bar{y}) = 0$.
4. The sum of the unexplained deviations is 0, i.e. $\sum (y_i - \hat{y}_i) = 0$.
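As an informal check, the four points can be verified numerically for the household data of Example 4.12. The sketch below is illustrative only: the variable names are hypothetical, and a small tolerance is assumed because decimal values are not stored exactly in floating point.

```python
import math

# Household data from Example 4.12 (income x, food expenditure y, in $000).
x = [1, 3, 4, 7, 10]
y = [0.5, 1.1, 0.8, 1.0, 1.6]

y_bar = sum(y) / len(y)                    # mean food expenditure
y_hat = [0.5 + 0.1 * xi for xi in x]       # predictions from the fitted line

observed = [yi - y_bar for yi in y]                   # y_i - y-bar
explained = [fi - y_bar for fi in y_hat]              # y-hat_i - y-bar
unexplained = [yi - fi for yi, fi in zip(y, y_hat)]   # y_i - y-hat_i

# Point 1: observed = explained + unexplained for every observation.
identity_holds = all(
    math.isclose(o, e + u, abs_tol=1e-12)
    for o, e, u in zip(observed, explained, unexplained)
)

# Points 2 to 4: each set of deviations sums to zero.
sums_are_zero = all(
    math.isclose(sum(d), 0.0, abs_tol=1e-12)
    for d in (observed, explained, unexplained)
)

print(identity_holds, sums_are_zero)
```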
Squaring the equation
$y_i - \bar{y} = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)$
and summing over all observations gives (after a little tedious algebra):
$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$
Or SST = SSR + SSE
where SST = $\sum (y_i - \bar{y})^2$ = total observed variation around the mean in Y
and SSR = $\sum (\hat{y}_i - \bar{y})^2$ = variation in Y explained by the regression model
and SSE = $\sum (y_i - \hat{y}_i)^2$ = variation in Y not explained by the regression model
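For readers who want to see the algebra, it can be sketched as follows. The sketch assumes the line is fitted by least squares with an intercept, so that the residuals $e_i = y_i - \hat{y}_i$ satisfy $\sum e_i = 0$ and $\sum x_i e_i = 0$; these two conditions are the least squares normal equations, which this section does not state explicitly.

```latex
\begin{align*}
\sum (y_i - \bar{y})^2
  &= \sum \bigl[(\hat{y}_i - \bar{y}) + e_i\bigr]^2 \\
  &= \sum (\hat{y}_i - \bar{y})^2 + \sum e_i^2
     + 2 \sum (\hat{y}_i - \bar{y})\, e_i \\
\sum (\hat{y}_i - \bar{y})\, e_i
  &= \hat{\beta}_0 \sum e_i + \hat{\beta}_1 \sum x_i e_i - \bar{y} \sum e_i = 0
\end{align*}
```

The cross-product term is therefore zero, leaving the total sum of squares equal to the explained sum of squares plus the sum of squared residuals.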
The equation
SST = SSR + SSE
is fundamental to an understanding of regression. SST is the total observed variation in the dependent variable.
The objective of a regression analysis is to explain the observed variations in the dependent variable. The equation
partitions this total into two parts.
SSR : This is the part of the observed variation in the dependent variable that can be explained by the observed
changes in the independent variable. The larger SSR is, the more successful the model is.
SSE : This is the part of the observed variation in the dependent variable that cannot be explained by the observed
changes in the independent variable. This variation is due to changes in the values of variables other than
X. With a successful model this number should be small.
Dividing the equation
SST = SSR + SSE
by SST gives
$1 = \frac{SSR}{SST} + \frac{SSE}{SST}$
Let $R^2 = \frac{SSR}{SST}$
Then $1 - R^2 = \frac{SSE}{SST}$
Then $R^2$ = the proportion of the variation in Y explained by the variation in X
$1 - R^2$ = the proportion of the variation in Y not explained by the variation in X.
$R^2$ is called the coefficient of determination.
The coefficient of determination: The proportion of the observed variation in Y explained by the observed
variation in X.
$1 - R^2$ is called the coefficient of indetermination.
The coefficient of indetermination: The proportion of the observed variation in Y that cannot be explained by
the observed variation in X and is due to the effect of changes in variables other than X on Y.
Note:
(a) As $R^2$ is a proportion we have:
$0 \le R^2 \le 1$
(b) The larger $R^2$ is, the more of the variation in Y can be explained by the variation in X.
(c) $R^2 = 1$ if SSE = 0. But SSE = $\sum e_i^2$, the sum of the squares of the residuals, and so can only be 0 if all
the residuals are 0. The residual is the vertical distance from the observed value to the regression line and
so can only be zero if the regression line goes through the observed value. Thus $R^2 = 1$ if the regression
line goes through all the observations.
(d) It can be shown that the coefficient of determination is just the square of the coefficient of correlation, i.e.
$R^2 = r^2$.
Example 4.13
For the data in Example 4.12
(a) Calculate the total variation in food expenditure, the variation in food expenditure explained by the model
and the variation in food expenditure not explained by the model.
(b) Calculate and interpret the coefficient of determination.
Solution
(a) Total variation = SST = $(-0.5)^2 + 0.1^2 + (-0.2)^2 + 0.0^2 + 0.6^2 = 0.66$
Explained variation = SSR = $(-0.4)^2 + (-0.2)^2 + (-0.1)^2 + 0.2^2 + 0.5^2 = 0.50$
Unexplained variation = SSE = $(-0.1)^2 + 0.3^2 + (-0.1)^2 + (-0.2)^2 + 0.1^2 = 0.16$
(b) $R^2 = \frac{SSR}{SST} = \frac{0.50}{0.66} = 0.7576$
There is a total difference of 0.66 ($000)^2 between the food expenditures of the 5 households in the sample.
Of these total differences, 0.50 ($000)^2 can be explained by the different incomes of the 5 households. The
remaining 0.16 ($000)^2 of the differences in the food expenditures cannot be explained by the differences
in the incomes of the households and so must be due to other factors, such as the number of people in the
household. Thus 75.76% of the observed differences in household food expenditures can be explained by
observed differences in household incomes.
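The sums of squares in this example can be reproduced with a short Python sketch. It is offered only as an illustration under the same data and fitted line as Example 4.12; the variable names are not taken from any particular statistical package.

```python
import math

# Household data from Example 4.12 (income x, food expenditure y, in $000).
x = [1, 3, 4, 7, 10]
y = [0.5, 1.1, 0.8, 1.0, 1.6]

y_bar = sum(y) / len(y)
y_hat = [0.5 + 0.1 * xi for xi in x]   # fitted line from Example 4.11

sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)            # explained variation
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # unexplained variation

r_squared = ssr / sst                                   # coefficient of determination

print(round(sst, 2), round(ssr, 2), round(sse, 2), round(r_squared, 4))
print(math.isclose(sst, ssr + sse, abs_tol=1e-9))       # checks SST = SSR + SSE
```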
The methods used above are not efficient. Use a statistical package for estimating linear regressions. If you must
use hand calculations, use the method described in the next section.
Example 4.14
Use Excel to calculate the total variation in food expenditure, the variation in food expenditure explained by the
model and the variation in food expenditure not explained by the model for the data in Example 4.12. Calculate
the coefficient of determination for these data.
Solution
Part of the Excel output is reproduced below.
SUMMARY OUTPUT
R Square 0.757575758
Observations 5
ANOVA
df SS MS
Residual 3 0.16 0.053333333
Total 4 0.66
From the blue figures we have:
Total variation = SST = 0.66
Explained variation = SSR = 0.50
Unexplained variation = SSE = 0.16
From the green figures
Coefficient of determination = $R^2$ = 0.7576 (after rounding)
Example 4.15
In a regression between the salary and length of service of public servants it was found that the correlation
coefficient was +0.8. Interpret this figure.
Solution
As the correlation coefficient is positive, salary increases with length of service. The square of the correlation
coefficient is 0.64 and so 64% of the observed variation in salaries can be explained by differences in length of
service.
Example 4.16
In a regression between the volume of petrol sold and the price of petrol, the correlation coefficient was -0.3.
Interpret this figure.
Solution
As the correlation coefficient is negative, the volume of petrol sold falls as the price of petrol increases.
However, the relationship is not a strong one: the square of the correlation coefficient is 0.09, so only 9% of
the observed variation in petrol sales is explained by the observed variation in prices. Other factors in the e
term account for the remaining 91% of the changes in the sales of petrol.
Quick Question 9
4.11 Estimating linear relationships by hand
The general advice here is don't! If you absolutely must estimate a regression equation by hand use the following
formulae.
Let $\bar{x}$ = the mean of the observed values of the independent variable
$\bar{y}$ = the mean of the observed values of the dependent variable
$s_x$ = the standard deviation of the observed values of the independent variable.
$s_y$ = the standard deviation of the observed values of the dependent variable.
cov(X, Y) = the covariance of the observed values of the variables.
Then the estimated regression equation is $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ where
$\hat{\beta}_1 = \frac{\mathrm{cov}(X, Y)}{s_x^2}$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
To calculate the coefficients of determination and correlation use:
$SST = (n - 1)s_y^2$
$SSR = (n - 1)\hat{\beta}_1 \mathrm{cov}(X, Y)$
$R^2 = \frac{SSR}{SST}$
$r = \pm\sqrt{R^2}$ with r having the same sign as the slope $\hat{\beta}_1$
Example 4.17
The following data on food expenditure and income were collected:
Household   x    y
1           1    0.5
2           3    1.1
3           4    0.8
4           7    1.0
5           10   1.6
Estimate the least squares line:
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
and calculate the coefficients of determination and correlation.
Solution
Step 1 : Calculate the following sums from the data
$\sum x_i = 25$, $\sum x_i^2 = 175$, $\sum x_i y_i = 30$, $\sum y_i = 5$, $\sum y_i^2 = 5.66$, $n = 5$
Step 2 : Intermediate calculations
$s_x^2 = \frac{1}{n-1}\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right] = \frac{1}{4}\left[175 - \frac{25^2}{5}\right] = 12.5$
$s_y^2 = \frac{1}{n-1}\left[\sum y_i^2 - \frac{(\sum y_i)^2}{n}\right] = \frac{1}{4}\left[5.66 - \frac{5^2}{5}\right] = 0.165$
$\mathrm{cov}(X, Y) = \frac{1}{n-1}\left[\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}\right] = \frac{1}{4}\left[30 - \frac{25 \times 5}{5}\right] = 1.25$
Step 3 : Calculate the parameters of the equation
$\hat{\beta}_1 = \frac{\mathrm{cov}(X, Y)}{s_x^2} = \frac{1.25}{12.5} = 0.1$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{5}{5} - 0.1 \times \frac{25}{5} = 0.5$
Thus the estimated equation is
$\hat{y} = 0.5 + 0.1x$
Step 4 : Calculate the coefficient of determination and correlation
$SST = (n - 1)s_y^2 = 4 \times 0.165 = 0.66$
$SSR = (n - 1)\hat{\beta}_1 \mathrm{cov}(X, Y) = 4 \times 0.1 \times 1.25 = 0.50$
$SSE = SST - SSR = 0.66 - 0.50 = 0.16$
$R^2 = \frac{SSR}{SST} = \frac{0.50}{0.66} = 0.7576$
$r = +\sqrt{R^2} = +\sqrt{0.7576} = +0.8704$
From the coefficient of determination, 76% of the observed differences in food expenditures can be explained by
the observed differences in incomes. The other 24% is due to other factors.
Notice that the correlation coefficient is taken as the positive square root of the coefficient of determination
because the slope of the line is positive ($\hat{\beta}_1 = +0.1$).
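The four steps of the hand calculation translate directly into code. The following Python sketch mirrors the formulae of Section 4.11 using the data of Example 4.17; the variable names are hypothetical, and `math.copysign` is used here simply as one way of giving r the same sign as the slope.

```python
import math

# Data from Example 4.17 (income x, food expenditure y, in $000).
x = [1, 3, 4, 7, 10]
y = [0.5, 1.1, 0.8, 1.0, 1.6]
n = len(x)

# Step 1: the basic sums.
sum_x, sum_y = sum(x), sum(y)
sum_xx = sum(xi ** 2 for xi in x)
sum_yy = sum(yi ** 2 for yi in y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Step 2: sample variances and covariance.
var_x = (sum_xx - sum_x ** 2 / n) / (n - 1)
var_y = (sum_yy - sum_y ** 2 / n) / (n - 1)
cov_xy = (sum_xy - sum_x * sum_y / n) / (n - 1)

# Step 3: slope and intercept of the least squares line.
slope = cov_xy / var_x
intercept = sum_y / n - slope * sum_x / n

# Step 4: coefficients of determination and correlation.
sst = (n - 1) * var_y
ssr = (n - 1) * slope * cov_xy
r_squared = ssr / sst
r = math.copysign(math.sqrt(r_squared), slope)  # r takes the sign of the slope

print(round(slope, 4), round(intercept, 4))
print(round(r_squared, 4), round(r, 4))
```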
Quick Question 10
4.12 Unit summary
In this unit you have learned how to measure the strength of the relationship between two variables. There is no
general measure of the strength of the relationship. However:
When the relationship is monotonic use Spearman's rank correlation coefficient to measure the strength
of the relationship between two variables. Remember that if the number is close to zero this only shows
that there is not a monotonic relationship; there could still be a non-monotonic relationship between the
variables.
When the relationship is linear use the correlation coefficient to measure the strength of the relationship
between two variables. Remember that if the number is close to zero this only shows that there is not a
linear relationship; there could still be a non-linear relationship between the variables.
If the rank correlation coefficient is close to +1 or -1 this shows that there is a strong monotonic relationship
between the two variables. However it does not show that changes in the value of one variable have caused
changes in the value of the other variable, only that the two variables have changed together.
If the correlation coefficient is close to +1 or -1 this shows that there is a strong linear relationship between
the two variables. However it does not show that changes in the value of one variable have caused changes in the
value of the other variable, only that the two variables have changed together.
Read the learning objectives at the start of the unit. Have you achieved these objectives? If there are any objectives
about which you are unclear reread the appropriate sections before trying the tutorial exercises.
UNIT 5
PROBABILITY
5.0 Contents
5.1 Unit objectives
5.2 Introduction
5.3 Experiments, outcomes and sample spaces
5.4 Calculating probabilities for outcomes
5.4.1 The empirical approach
5.4.2 The classical approach
5.4.3 The subjective approach
5.5 Calculating the probability of events
5.6 Conditional probabilities and tree diagrams
5.6.1 Independent events
5.7 Venn diagrams and the rules of probability
5.7.1 The probability an event does not occur
5.7.2 The probability event A or event B occurs
5.7.3 The probability both event A and event B occur
5.7.4 The rules of probability
5.8 The reliability of the sample mean.
5.9 Summary
Print Workbook 5
5.1 Unit objectives
As you know, the fundamental problem of statistics is to measure the reliability of statements made about pop-
ulations based on evidence gathered from samples. This unit describes the probability theory that is used in the
measurement of reliability.
Define the terms outcomes, sample spaces and events;
Explain the three approaches used to assign probabilities to outcomes and events;
Calculate the probabilities of events from the probabilities of outcomes;
Calculate conditional probabilities;
Define the term independent events and determine whether or not two events are independent;
Use Venn diagrams and tree diagrams to calculate probabilities for combinations of events;
Use the 5 rules of probability and their 3 special cases to calculate probabilities for combinations of
events;
Measure the reliability of using the mean of a small sample to estimate the mean of a small population.
Practice is an essential part of the learning process in all parts of statistics but it is most important in probability.
You will not master the concepts unless you practice. Try and solve each example yourself before reading the
solution.
5.2 Introduction
In Unit 1 you learned the terms descriptive statistics and inferential statistics:
Descriptive statistics: Methods of collecting, summarising, and presenting data.
calculated.
Examples of the types of question to be answered in inferential statistics are:
If a random sample of incomes is taken, what is the probability that the mean income in the sample is
within $5,000 of the mean income in the population?
How big a sample do we need to take to be 95% certain that the mean income in the sample is within
$5,000 of the mean income in the population?
A job advertisement claims that the mean income of salesmen is more than $50,000 but a random sample
of 20 salesmen only had a mean income of $45,000. How certain can we be that the claim in the
advertisement is false?
In Units 1 to 4 we have focused on descriptive statistics. In Units 5 to 7 you will learn the probability concepts
that form the basis of inferential statistics. You must master these concepts as they are applied throughout the
study of inferential statistics (Units 8 to 12).
This unit takes a three-step approach to probability:
Probability is defined for each of the outcomes of an experiment.
These outcomes are combined into events and the probability of events calculated by summing the prob-
abilities of the outcomes in the event.
Events are combined into compound events and the probabilities of compound events calculated from
the probabilities of the component events by applying the rules of probability.
An example of the application of probability to the measurement of the reliability of sampling is given at the end
of the unit.
5.3 Experiments, outcomes and sample spaces
Probability theory is based on the idea of an experiment, the outcome of which cannot be predicted with certainty.
The first stage in the analysis of an experiment is to list the possible outcomes of the experiment and to estimate
the likelihood of each of the outcomes occurring.
Random experiment: A process or course of action that results in one of a number of possible outcomes. The
outcome that occurs cannot be predicted with certainty.
Outcome: A particular result of an experiment.
Sample space: The set of all the possible outcomes of an experiment. The outcomes in the set must be mutually
exclusive and collectively exhaustive.
Recall that in unit 1, you learned the conditions: mutually exclusive and collectively exhaustive. For outcomes,
these conditions require that the result of the experiment is always one, and only one, of the listed outcomes.
Some examples should make these definitions easier to understand.
Example 5.1
A student sits at the side of the road and records the colour of each car as it passes. What is the sample space?
Solution
Here the experiment is the recording of the colour of the car. The result of the experiment cannot be predicted
with certainty. One possible outcome is blue, another possible outcome is green. The sample space could be
(there are many possible sample spaces):
S = {white, yellow, blue, green, red, other}
Note that the outcome other has been included to make the outcomes collectively exhaustive. As each car passes,
it must be recorded as being only one of the set of colours listed in the sample space. The list of outcomes is then
mutually exclusive.
Example 5.2
A six-sided die is thrown and the number uppermost recorded. What is the sample space?
Solution
One possible outcome is that a 6 is uppermost, another is that a 1 is uppermost. The sample space is:
S = {1, 2, 3, 4, 5, 6}
Each time the die is thrown, one and only one of these outcomes is observed.
The experiment of greatest interest to statisticians is the selection of a sample. This is illustrated in the following
example. This example will be used throughout this subject to explain different concepts.
Example 5.3
There are four employees in an office: Arthur, Ben, Clare and Daniel. A sample of size two is taken with
replacement and with ordering from this population of 4 employees. What is the sample space?
Solution
Let us denote Arthur by a, Ben by b, Clare by c and Daniel by d. Then one possible outcome is ca with Clare
selected first and Arthur second. Another possible outcome is ac with Arthur selected first and Clare selected
second. The sample space is:
S = {aa, ab, ac, ad, ba, bb, bc, bd, ca, cb, cc, cd, da, db, dc, dd}
When the sample is taken, one and only one of these samples is observed. The listed outcomes are mutually
exclusive and collectively exhaustive.
The general notation used for outcomes and sample spaces is:
n = the number of outcomes in the sample space.
$e_i$ = outcome i for i = 1, 2, 3, ..., n.
S = the sample space.
$\therefore\ S = \{e_1, e_2, e_3, \ldots, e_n\}$
In Example 5.3
n = 16
$e_1$ = aa, $e_2$ = ab, $e_3$ = ac, etc.
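The sample space of Example 5.3 can be listed mechanically. The sketch below uses Python's standard `itertools.product` to generate every ordered sample of size two taken with replacement; the variable names are chosen for illustration only.

```python
from itertools import product

# The population of Example 5.3: Arthur, Ben, Clare and Daniel.
population = ["a", "b", "c", "d"]

# Sampling two employees with replacement and with ordering gives
# every ordered pair of population members.
sample_space = ["".join(pair) for pair in product(population, repeat=2)]

print(len(sample_space))   # number of outcomes n
print(sample_space[:3])    # the first three outcomes
```

Because ordering matters, the outcomes ca and ac both appear in the list as separate outcomes.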
Quick Question 1
Quick Question 2
5.4 Calculating probabilities for outcomes
In some of the experiments discussed above, it is possible that not all the listed outcomes have the same likelihood
of occurrence. In Example 5.1 a car is more likely to be white than blue. The likelihood of an outcome occurring
is called its probability.
Probability of an outcome: A number between 0 and 1 inclusive that measures the likelihood of the outcome
occurring. The more likely the outcome the closer is the number to 1. The less likely the outcome the closer is
the number to 0.
The probability of outcome $e_i$ is written as $P(e_i)$.
There are three approaches to assigning probabilities to outcomes:
the empirical approach
the classical approach
the subjective approach.
These approaches to probability are discussed in the next three subsections.
5.4.1 The empirical approach
With this approach the probability of each outcome is found by repeating the experiment a large number of times.
A run of the experiment is called a trial.
The empirical probability of an outcome: The probability of an outcome is the proportion of times the outcome
is observed in a very large number of trials.
Example 5.4
The colours of cars passing along College Street were recorded using the sample space:
S = {white, yellow, blue, green, red, other}
The colours of 5000 cars were recorded and the results are displayed in Table 5.1 below. Find the probability
of each car colour.
Solution
The experiment is to record the colour of a car passing along College Street. This experiment was carried out
5000 times and so there were 5000 trials of the experiment.
Table 5.1: Colours of Cars on College Street
Colour Frequency
white 1000
yellow 1500
blue 500
green 1000
red 500
other 500
Total 5000
Source: Physical count of cars in the week beginning 15 December 1999
P(white) = proportion of cars that are white = $\frac{1000}{5000}$ = 0.2
Repeating this for each colour gives the following probabilities.
Colour   Frequency   Probability
white    1000        1000/5000 = 0.20
yellow   1500        1500/5000 = 0.30
blue     500         500/5000 = 0.10
green    1000        1000/5000 = 0.20
red      500         500/5000 = 0.10
other    500         500/5000 = 0.10
Total    5000        1.00
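The empirical calculation is simply a relative frequency for each outcome. A minimal Python sketch, using the counts of Table 5.1 and illustrative variable names, is:

```python
# Observed frequencies from Table 5.1 (5000 trials).
frequencies = {
    "white": 1000,
    "yellow": 1500,
    "blue": 500,
    "green": 1000,
    "red": 500,
    "other": 500,
}

total = sum(frequencies.values())   # number of trials

# Empirical probability = proportion of trials in which the outcome occurred.
probabilities = {colour: count / total for colour, count in frequencies.items()}

for colour, p in probabilities.items():
    print(f"{colour:<7}{p:.2f}")
```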
5.4.2 The classical approach
With this approach each outcome in the sample space is asserted to have the same chance of occurring.
Classical probability sample space: A sample space in which each outcome has the same probability of occurring.
Classical probability of an outcome: For a sample space with n equally likely outcomes, the probability of each
outcome is $\frac{1}{n}$.
Example 5.5
A fair six-sided die is thrown and the number uppermost recorded:
S = {1, 2, 3, 4, 5, 6}
Find the probability of each outcome.
Solution
The word fair is used to indicate that each of the sides of the die has the same chance of being uppermost. As
there are 6 outcomes in the sample space each outcome has a probability of $\frac{1}{6}$.
$P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = \frac{1}{6}$
5.4.3 The subjective approach
With this approach to probability each outcome in the sample space is allocated a probability based on the beliefs
and experiences of the individual.
Subjective probability of an outcome: An estimated probability of an outcome based on the beliefs and experiences
of an individual.
Example 5.6
At the end of this subject you will be allocated one of the grades Fail (NX), Pass (P), Credit (CR), Distinction
(DI) or High Distinction (HD). How do you rate your chances of each grade?
Solution
S = {NX, P, CR, DI, HD}
and you may allocate probabilities (perhaps optimistically) as:
P(NX) = 0.0, P(P) = 0.2, P(CR) = 0.4, P(DI) = 0.3 and P(HD) = 0.1.
Each of the three approaches (empirical, classical and subjective) described above assigns a probability to each
outcome in the sample space.
The probability of outcome $e_i$ occurring is denoted by $P(e_i)$. Whatever the method used to allocate probabilities
to the outcomes, the probabilities must satisfy the following conditions.
Probabilities of outcomes
The probabilities of outcomes must satisfy:
(a) $0 \le P(e_i) \le 1$
(b) $\sum_{i=1}^{n} P(e_i) = 1$
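The two conditions are easy to check for any proposed allocation of probabilities. The helper below is a hypothetical illustration, not a standard library function; it assumes a small tolerance when testing that the probabilities sum to 1, because fractions such as 1/6 are not stored exactly.

```python
import math

def is_valid_assignment(probabilities, tol=1e-9):
    """Return True if every probability lies in [0, 1] and they sum to 1."""
    values = list(probabilities.values())
    in_range = all(0.0 <= p <= 1.0 for p in values)            # condition (a)
    sums_to_one = math.isclose(sum(values), 1.0, abs_tol=tol)  # condition (b)
    return in_range and sums_to_one

# Subjective probabilities from Example 5.6.
grades = {"NX": 0.0, "P": 0.2, "CR": 0.4, "DI": 0.3, "HD": 0.1}

# Classical probabilities for the fair die of Example 5.5.
die = {face: 1 / 6 for face in range(1, 7)}

print(is_valid_assignment(grades), is_valid_assignment(die))
```

An allocation such as {"a": 0.7, "b": 0.7} fails the check because the probabilities sum to more than 1.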
To recap the position so far:
The list of the mutually exclusive and collectively exhaustive outcomes of the experiment is called the
sample space.
Each outcome, $e_i$, has a probability $P(e_i)$.
The probabilities of the outcomes satisfy the conditions listed above.
Quick Question 3
Quick Question 4
5.5 Calculating the probabilities of events
In the previous section you learned to allocate probabilities to outcomes. Frequently, we are more interested in
knowing whether or not the outcome has some property than in the value of the outcome itself. For example, in
Example 5.5 it might be of interest to know whether the number is even or odd. For each property of interest,
some of the outcomes will have the property and some will not. The set of outcomes with a specified property is
called an event.
Event: A collection of one or more outcomes of an experiment.
Upper case letters A, B, C, D etc. are used to denote events. The set of outcomes not satisfying the property is
denoted by the same upper case letter with a bar: $\bar{A}$, $\bar{B}$, $\bar{C}$ etc.
Example 5.7
A die is thrown and the number uppermost recorded. List the outcomes in the following events.
(a) the number uppermost is even.
(b) the number uppermost is not even.
(c) the number uppermost is four or less.
Solution
The sample space is S = {1, 2, 3, 4, 5, 6}.
Let E = the event the number uppermost is even.
F = the event the number uppermost is four or less.
(a) The outcomes 2, 4 and 6 have the property of being even. The result of the experiment is even if any one of
these outcomes occurs. The other outcomes do not have this even property. Thus we have
E = {2, 4, 6}
(b) The outcomes 1, 3 and 5 have the property of being not even. The other outcomes do not have this property.
Thus we have
$\bar{E}$ = {1, 3, 5}
(c) F = {1, 2, 3, 4}
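Since an event is just a set of outcomes, the events of Example 5.7 can be sketched with Python sets. The names `E`, `not_E` and `F` follow the example; everything else is illustrative.

```python
# Sample space for one throw of a six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

# Event E: the number uppermost is even.
E = {outcome for outcome in sample_space if outcome % 2 == 0}

# The event "not E" contains every outcome in the sample space that is not in E.
not_E = sample_space - E

# Event F: the number uppermost is four or less.
F = {outcome for outcome in sample_space if outcome <= 4}

print(sorted(E), sorted(not_E), sorted(F))

# Outcomes belonging to both E and F: if one of these occurs, both events occur.
print(sorted(E & F))
```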
Any property of interest leads to a list of outcomes that have this property. The result of the experiment has the
property of interest if any of these listed outcomes occurs.
A useful way of visualising events is to use a Venn diagram. In a Venn diagram, the sample space is represented
by a rectangle and the outcomes by the points within the rectangle. An event is then some of the points within the
rectangle and so can be represented by a region within the rectangle.
Figure 5.1: Venn Diagram for An Event
[Figure: a rectangle representing the sample space, containing a region labelled A; the rest of the rectangle is labelled $\bar{A}$.]
If the result of the experiment is one of the outcomes in the region A then the event A occurs. If the outcome is
outside the region A then the event A does not occur.
Notice that when an experiment is run one and only one of the outcomes occurs. Outcomes are always mutually
exclusive. Events are not always mutually exclusive. In Example 5.7 if the outcome is a 2 or a 4 then both events
E and F occur.
The Venn diagram is particularly useful when there are two events, as shown in Figure 5.2.
Figure 5.2: Venn Diagram for Two Events
[Figure: Venn diagram showing two events as regions within the sample space rectangle.]
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
. .
.
. .
. .
. . .
. .
. . .
. .
. . .
. . .
. . .
. . .
. . .
. . .
. . .
. .
. . .
. .
. . .
. .
. .
.
. .
.
.
B
If the result of the experiment is one of the outcomes in the shaded area then either event A has occurred or event
B has occurred or both have occurred.
Note carefully that in future when we talk of event A or event B occurring we include the case of both event A
and event B occurring. In the above diagram the event A or B is the list of all the outcomes in the combined
shaded areas.
If the result of the experiment is an outcome that lies in both the region A and the region B then both event A and
event B have occurred. In Figure 5.3 if the outcome is in the shaded area then both events A and B occur.
Figure 5.3: Venn Diagram for Two Events
[Venn diagram: events A and B drawn as two overlapping regions, with only the overlap shaded.]
Example 5.8
A die is thrown and the number uppermost recorded. List the outcomes in the following events.
(a) the number uppermost is not 4 or less.
(b) the number uppermost is either even or 4 or less.
(c) the number uppermost is both even and 4 or less
Solution
Let E = the event the number uppermost is even = {2, 4, 6}
F = the event the number uppermost is four or less = {1, 2, 3, 4}
(a) $\bar{F} = \{5, 6\}$
If the number uppermost is 5 or 6 then the number uppermost is not four or less.
(b) E or F = {1, 2, 3, 4, 6}
If the number uppermost is 1 or 3 then the outcome is four or less but not even. If the number uppermost is
6 then the outcome is even but not four or less. If the number uppermost is 2 or 4 then the outcome is both
even and four or less. Remember that in probability "or" includes the outcomes that are in both events.
(c) E and F = {2, 4}
If the number uppermost is 2 or 4, then the number uppermost is both even and four or less.
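The three events in Example 5.8 can be checked with a few lines of Python. This is only an illustrative sketch and not part of the original notes: the names `S`, `E` and `F` mirror the example, and Python's built-in set operations are assumed to stand in for "not", "or" and "and".

```python
# Sample space for one throw of a die, and the two events from Example 5.8.
S = {1, 2, 3, 4, 5, 6}
E = {n for n in S if n % 2 == 0}   # the number uppermost is even
F = {n for n in S if n <= 4}       # the number uppermost is four or less

not_F = S - F      # (a) outcomes in S that are not in F
E_or_F = E | F     # (b) union: outcomes in E, in F, or in both
E_and_F = E & F    # (c) intersection: outcomes in both E and F

print(sorted(not_F), sorted(E_or_F), sorted(E_and_F))
```

Note that the union `E | F` lists the outcomes 2 and 4 only once, even though they belong to both events.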
In Section 5.3 you learned how to find the probabilities of outcomes. In this section you will learn how to find the
probabilities of events. Different methods can be used for the three different approaches to probability but there
is one method that can be used for all three approaches to probability. This method is simple.
To find the probability of any event:
1. list the outcomes in the event.
2. sum the probabilities of the outcomes in the event.
In the subsections below this method is explained for each of the three approaches to probability.
With the empirical (relative frequency) approach the probability of an outcome is the proportion of times the
outcome is observed when the experiment is repeated a very large number of times. The same definition can be
extended to events.
The empirical probability of an event: The probability of an event is the proportion of times one of the outcomes
in the event occurs in a large number of trials.
Example 5.9
The colours of 5000 cars passing along College Street were recorded and the results are displayed in Table 5.3
below. What is the probability the car is either blue or green?
Colour    Frequency    Probability
white     1000         1000/5000 = 0.20
yellow    1500         1500/5000 = 0.30
blue       500          500/5000 = 0.10
green     1000         1000/5000 = 0.20
red        500          500/5000 = 0.10
other      500          500/5000 = 0.10
Total     5000         1.00
Solution
The outcome blue occurs in 500 of the trials and the outcome green occurs in 1000 of the trials. One of the
outcomes in the event occurs in (500 + 1000) of the 5000 trials.
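$$P(\text{blue or green}) = \text{proportion of cars that are blue or green} = \frac{500 + 1000}{5000} = 0.30.$$
Notice that the same result could have been obtained by:
$$\begin{aligned}
P(\text{blue or green}) &= \frac{500 + 1000}{5000} \\
&= \frac{500}{5000} + \frac{1000}{5000} \\
&= P(\text{blue}) + P(\text{green}) \\
&= 0.10 + 0.20 \\
&= 0.30.
\end{aligned}$$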
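The two-step method (list the outcomes, then sum their probabilities) can be sketched in Python for the car colour data. This is an illustration only; the dictionary layout and the helper name `event_probability` are assumptions made for the sketch, not something taken from the notes.

```python
# Frequencies from Table 5.3 (colours of 5000 cars).
freq = {"white": 1000, "yellow": 1500, "blue": 500,
        "green": 1000, "red": 500, "other": 500}
total = sum(freq.values())

# Empirical probability of each outcome: its relative frequency.
prob = {colour: count / total for colour, count in freq.items()}

def event_probability(event, prob):
    """Sum the probabilities of the outcomes listed in the event."""
    return sum(prob[outcome] for outcome in event)

p_blue_or_green = event_probability({"blue", "green"}, prob)
print(round(p_blue_or_green, 2))
```

The result is rounded before printing because decimal fractions such as 0.1 are not stored exactly in floating point.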
The second approach used in Example 5.9 above can be applied to any relative frequency probability problem.
To find the probability of any event:
1. list the outcomes in the event.
2. sum the probabilities of the outcomes in the event.
This is the most generally applicable approach to finding the probabilities of events.
The classical probability of an event: The proportion of the outcomes in the sample space that are in the event.
For any event A we have:
$$P(A) = \frac{\text{number of outcomes in } A}{\text{number of outcomes in the sample space}}$$
Example 5.10
A die is thrown and the number uppermost recorded. Find the probabilities of the following events.
(a) the number uppermost is even.
(b) the number uppermost is not even.
(c) the number uppermost is four or less.
Solution
Let E = the event the number uppermost is even.
F = the event the number uppermost is four or less.
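(a) $E = \{2, 4, 6\}$
$$\therefore P(E) = \frac{\text{number of outcomes in } E}{\text{number of outcomes in } S} = \frac{3}{6}$$
(b) $\bar{E} = \{1, 3, 5\}$
$$\therefore P(\bar{E}) = \frac{\text{number of outcomes in } \bar{E}}{\text{number of outcomes in } S} = \frac{3}{6}$$
(c) $F = \{1, 2, 3, 4\}$
$$\therefore P(F) = \frac{\text{number of outcomes in } F}{\text{number of outcomes in } S} = \frac{4}{6}$$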
Notice that these questions could have been answered by applying the same method as used for the relative
frequency approach, i.e. by summing the probabilities of the outcomes in the event.
For the die example each outcome has a probability of $\frac{1}{6}$. Thus, for example,
$E = \{2, 4, 6\}$
$$\therefore P(E) = P(2) + P(4) + P(6) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{3}{6}$$
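Example 5.10 can be reproduced by counting outcomes. The sketch below is illustrative only: the function name `classical_probability` is an assumption, and it uses Python's `fractions.Fraction` so that the answers stay as exact fractions. Note that `Fraction` reduces to lowest terms, so 3/6 is shown as 1/2 and 4/6 as 2/3.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space for one throw of a die

def classical_probability(event, sample_space):
    """Number of outcomes in the event divided by the number in the sample space."""
    return Fraction(len(event & sample_space), len(sample_space))

E = {2, 4, 6}        # the number uppermost is even
not_E = S - E        # the number uppermost is not even
F = {1, 2, 3, 4}     # the number uppermost is four or less

p_E = classical_probability(E, S)
p_not_E = classical_probability(not_E, S)
p_F = classical_probability(F, S)
print(p_E, p_not_E, p_F)
```

This counting approach is only valid when every outcome in the sample space is equally likely, as it is for a fair die.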
For the subjective approach to probability, use the same method as with the objective approach.
The subjective probability of an event: The sum of the subjective probabilities of the outcomes in the event.
Example 5.11
A student has estimated the following probabilities for her grade in this subject.
P(NX) = 0.0, P(P) = 0.2, P(CR) = 0.4, P(DI) = 0.3, and P(HD) = 0.1.
What is the probability the student obtains a grade of credit or higher?
Solution
S = {NX, P, CR, DI, HD}
Let C = the event the student has a grade of credit or higher
Then C = {CR, DI, HD}
$$\therefore P(C) = P(\text{CR}) + P(\text{DI}) + P(\text{HD}) = 0.4 + 0.3 + 0.1 = 0.8$$
There is a probability of 0.8 that the student obtains a grade of credit or higher.
Example 5.12
A second student was asked about her chances of passing BS1. She believes that the possible outcomes and their
probabilities from studying BS1 this semester are:
Table 5.4: Probabilities of Outcomes From Studying BS1
Outcome    Description                          Probability
$e_1$      she works hard and passes            0.6
$e_2$      she works hard but fails             0.1
$e_3$      she does not work hard but passes    0.1
$e_4$      she does not work hard and fails     0.2
What is the probability that this student passes BS1?
Solution
$S = \{e_1, e_2, e_3, e_4\}$
Let A = the event the student passes BS1
Then $A = \{e_1, e_3\}$
$$\therefore P(A) = P(e_1) + P(e_3) = 0.6 + 0.1 = 0.7$$
There is a probability of 0.7 that the student passes the subject.
Let's have another recap of the situation so far.
1. The list of the mutually exclusive and collectively exhaustive outcomes of the experiment is called the sample
space.
2. Each outcome, $e_i$, has a probability $P(e_i)$. These probabilities satisfy the conditions listed below:
(a) $0 \le P(e_i) \le 1$
(b) $\sum_{i=1}^{n} P(e_i) = 1$
3. An event is a list of outcomes. To find the probability of any event, sum the probabilities of the outcomes in
the event.
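The recap can be turned into a short check. The sketch below is illustrative and uses the subjective probabilities of Table 5.4; the function names `is_valid_assignment` and `event_probability` are assumptions, and `math.isclose` is used because sums of decimal fractions are not exact in floating point.

```python
import math

# Subjective probabilities of the four outcomes in Table 5.4.
prob = {"e1": 0.6, "e2": 0.1, "e3": 0.1, "e4": 0.2}

def is_valid_assignment(prob):
    """Check conditions (a) and (b): each probability lies in [0, 1] and they sum to 1."""
    each_in_range = all(0 <= p <= 1 for p in prob.values())
    sums_to_one = math.isclose(sum(prob.values()), 1.0)
    return each_in_range and sums_to_one

def event_probability(event, prob):
    """Point 3: sum the probabilities of the outcomes in the event."""
    return sum(prob[e] for e in event)

A = {"e1", "e3"}   # the student passes BS1
print(is_valid_assignment(prob), round(event_probability(A, prob), 2))
```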
Quick Question 5
Quick Question 6
5.6 Conditional probabilities and tree diagrams
The occurrence of one event may affect the probability of some other event occurring. In Example 5.12 if the
event "works hard" occurs, the probability that the student passes the course increases. However, if the event
"does not work hard" occurs, the probability that the student passes falls. In general, as information about the
progress of an experiment comes to hand, so the probabilities of events may need to be updated.
Conditional probability: The probability that event B occurs given that event A has already occurred is called
the conditional probability of B given A and is denoted by P(B | A).
Example 5.13
All students at a Business College major in either economics or accounting. A sample of students were asked
their gender and major. The responses are given in Table 5.5 below.
Table 5.5: College Students by Gender and Major
Gender Accounting Economics Total
Male 30 10 40
Female 40 20 60
Total 70 30 100
(a) If a student is selected at random, what is the probability the student is majoring in accounting?
(b) If a student is selected at random and the selected student is male, what is the probability the student is
majoring in accounting?
(c) If a student is selected at random and the selected student is female, what is the probability the student is
majoring in accounting?
Solution
Consider the events
M = the event the student is male
F = the event the student is female.
A = the event the student is majoring in accounting.
E = the event the student is majoring in economics.
(a) The student is being selected from the whole class of 100 students and 70 of these 100 students are majoring
in accounting. Applying the classical approach to probability (see section 5.5.2) gives:
$$P(A) = \frac{\text{number of accounting major students}}{\text{total number of students}} = \frac{70}{100} = 0.70$$
(b) Now the additional information that the selected student is male has become available. The selected student
is one of the 40 males and 30 of these 40 male students are majoring in accounting.
Gender Accounting Economics Total
Male 30 10 40
Female 40 20 60
Total 70 30 100
$$P(A \mid M) = \frac{\text{number of male accounting students}}{\text{total number of male students}} = \frac{30}{40} = 0.75$$
Receiving the additional information that the student is male has increased the probability that the student
is majoring in accounting from 0.70 to 0.75.
(c) Now the additional information is that the selected student is female. The selected student is one of the 60
females and 40 of these 60 female students are studying accounting.
Gender Accounting Economics Total
Male 30 10 40
Female 40 20 60
Total 70 30 100
$$P(A \mid F) = \frac{\text{number of female accounting students}}{\text{total number of female students}} = \frac{40}{60} = 0.67$$
Receiving the additional information that the student is female has reduced the probability that the student
is majoring in accounting from 0.70 to 0.67.
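The three answers in Example 5.13 come from restricting attention to the relevant row of Table 5.5. The Python sketch below illustrates that idea; the nested dictionary and the function names are assumptions made for the illustration, and `Fraction` keeps the results exact.

```python
from fractions import Fraction

# Counts from Table 5.5, stored as counts[gender][major].
counts = {
    "Male":   {"Accounting": 30, "Economics": 10},
    "Female": {"Accounting": 40, "Economics": 20},
}
N = sum(sum(row.values()) for row in counts.values())   # 100 students

def p_major(major):
    """Unconditional probability: use the column total over all students."""
    return Fraction(sum(row[major] for row in counts.values()), N)

def p_major_given_gender(major, gender):
    """Conditional probability: use only the row for the given gender."""
    row = counts[gender]
    return Fraction(row[major], sum(row.values()))

print(p_major("Accounting"))                         # 7/10
print(p_major_given_gender("Accounting", "Male"))    # 3/4
print(p_major_given_gender("Accounting", "Female"))  # 2/3
```

The exact value 2/3 is what the example rounds to 0.67.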
With conditional probabilities there are two events, say event A and event B. If event A occurs, the probability
of event B occurring may need to be adjusted to take account of this changed situation. The probability of B
occurring if A has already occurred is called the conditional probability of B given A. These conditional
probabilities are of great importance and so it is useful to have a general formula for their calculation. This formula
is derived in the paragraph below by using the empirical approach to probability. You can skip this derivation if
you want and go straight to the formula headed "Conditional Probabilities of Events" below, but you must be able
to apply this important result.
Using the empirical approach to probability, the experiment is run a large number of times and each time the
experiment is run whether event A and/or event B occurs is recorded. If the experiment is run a total of N times
the recorded frequencies could be as shown in Table 5.8 below.
Table 5.8: Recording the Occurrences of Events A and B
             $B$         $\bar{B}$    Total
$A$          $n_{11}$    $n_{12}$     $r_1$
$\bar{A}$    $n_{21}$    $n_{22}$     $r_2$
Total        $c_1$       $c_2$        $N$
The experiment has been run N times. On $n_{11}$ of these N runs both event A and event B occurred. On $n_{12}$ of the
N runs event A occurred but event B did not occur, etc. Obviously $n_{11} + n_{12} + n_{21} + n_{22} = N$.
The event A occurred on $r_1$ (where $r_1 = n_{11} + n_{12}$) of the N runs of the experiment. Thus we have
$$P(A) = \frac{r_1}{N}$$
On $n_{11}$ of the runs both events A and B occurred.
$$P(A \text{ and } B) = \frac{n_{11}}{N}$$
If the information is received that on one run of the experiment event A occurred then this must be one of the $r_1$
runs on which A occurred. In these $r_1$ runs of the experiment event B occurred $n_{11}$ times.
$$\therefore P(B \mid A) = \frac{n_{11}}{r_1}$$
Dividing the numerator and denominator of this equation by N gives
$$P(B \mid A) = \frac{n_{11}/N}{r_1/N} = \frac{P(A \text{ and } B)}{P(A)}$$
This is the important formula for calculating conditional probabilities.
Conditional Probabilities of Events
The probability that event B occurs when event A has already occurred is denoted by P(B | A) and is found
by:
$$P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$$
A tree diagram is a useful tool for visualising and presenting conditional probabilities.
With the tree diagram, the two events are treated sequentially.
Stage 1: Either event A occurs or event A does not occur.
Stage 2: Either event B occurs or event B does not occur.
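Figure 5.4: The Tree Diagram For Two Events
Stage 1: O to $A$, with probability $P(A)$
    Stage 2: $A$ to $B$, with probability $P(B \mid A)$
    Stage 2: $A$ to $\bar{B}$, with probability $P(\bar{B} \mid A)$
Stage 1: O to $\bar{A}$, with probability $P(\bar{A})$
    Stage 2: $\bar{A}$ to $B$, with probability $P(B \mid \bar{A})$
    Stage 2: $\bar{A}$ to $\bar{B}$, with probability $P(\bar{B} \mid \bar{A})$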
The diagram is read from left to right. In stage 1 the two branches from O represent the possible results with
regard to event A. Along the upper of these two branches event A occurs and so the conditional probabilities in
the second stage extensions of this branch give the probabilities of B occurring and not occurring given that event
A has already occurred. Similarly along the lower of the two branches from O event A does not occur and the
extensions give the probabilities that B occurs and does not occur given that event A has not occurred.
The concept of conditional probability and the use of the tree diagram is illustrated in the examples below.
Example 5.14
A student is randomly selected from the students at the Business College in Example 5.13.
Let M = the event the selected student is male
F = the event the selected student is female
A = the event the selected student is studying accounting
E = the event the selected student is studying economics
Draw the tree diagram for this experiment.
Solution
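Gender Accounting Economics Total
Male 30 10 40
Female 40 20 60
Total 70 30 100
$$P(M) = \frac{40}{100} \qquad P(F) = \frac{60}{100}$$
For males
$$P(A \mid M) = \frac{P(A \text{ and } M)}{P(M)} = \frac{30/100}{40/100} = \frac{30}{40}$$
$$P(E \mid M) = \frac{P(E \text{ and } M)}{P(M)} = \frac{10/100}{40/100} = \frac{10}{40}$$
For females
$$P(A \mid F) = \frac{P(A \text{ and } F)}{P(F)} = \frac{40/100}{60/100} = \frac{40}{60}$$
$$P(E \mid F) = \frac{P(E \text{ and } F)}{P(F)} = \frac{20/100}{60/100} = \frac{20}{60}$$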
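Figure 5.5: The Tree Diagram For Selecting a Single Student
Gender: O to M, with probability 40/100
    Major: M to A, with probability 30/40
    Major: M to E, with probability 10/40
Gender: O to F, with probability 60/100
    Major: F to A, with probability 40/60
    Major: F to E, with probability 20/60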
Example 5.15
For Example 5.12, what is the probability that the student passes if she works hard? What is the probability that
the student passes if she does not work hard? Draw the tree diagram for this experiment.
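The outcomes and their probabilities are as follows:
Outcome    Description                          Probability
$e_1$      she works hard and passes            0.6
$e_2$      she works hard but fails             0.1
$e_3$      she does not work hard but passes    0.1
$e_4$      she does not work hard and fails     0.2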
Solution
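Let W = the event the student works hard
A = the event the student passes the course
Then $A = \{e_1, e_3\}$
$$\therefore P(A) = P(e_1) + P(e_3) = 0.7$$
$W = \{e_1, e_2\}$
$$\therefore P(W) = P(e_1) + P(e_2) = 0.7$$
$W \text{ and } A = \{e_1\}$
$$\therefore P(W \text{ and } A) = P(e_1) = 0.6$$
$$\therefore P(A \mid W) = \frac{P(W \text{ and } A)}{P(W)} = \frac{0.6}{0.7} = 0.86$$
If she works hard, the probability that she passes is 0.86.
What if she does not work hard? Here $\bar{W} = \{e_3, e_4\}$, so $P(\bar{W}) = 0.1 + 0.2 = 0.3$, and
$\bar{W} \text{ and } A = \{e_3\}$, so $P(\bar{W} \text{ and } A) = 0.1$.
$$P(A \mid \bar{W}) = \frac{P(\bar{W} \text{ and } A)}{P(\bar{W})} = \frac{0.1}{0.3} = 0.33$$
If she does not work hard, the probability that she passes is only 0.33.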
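Figure 5.6: The Tree Diagram For the Student Experiment
Works hard: O to $W$, with probability 0.7
    Passes: $W$ to $A$, with probability 0.86
    Passes: $W$ to $\bar{A}$, with probability 0.14
Works hard: O to $\bar{W}$, with probability 0.3
    Passes: $\bar{W}$ to $A$, with probability 0.33
    Passes: $\bar{W}$ to $\bar{A}$, with probability 0.67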
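The conditional probability formula can be applied directly to a table of outcome probabilities. The following Python sketch is an illustration using the Example 5.12 values; the helper names `p` and `conditional` are assumptions introduced here, and results are rounded to two decimal places as in the example.

```python
# Outcome probabilities from Table 5.4.
prob = {"e1": 0.6, "e2": 0.1, "e3": 0.1, "e4": 0.2}

def p(event):
    """Probability of an event: sum the probabilities of its outcomes."""
    return sum(prob[e] for e in event)

def conditional(b, a):
    """P(B | A) = P(A and B) / P(A); undefined when P(A) = 0."""
    p_a = p(a)
    if p_a == 0:
        raise ValueError("P(A) is zero, so P(B | A) is undefined")
    return p(a & b) / p_a

S = set(prob)
W = {"e1", "e2"}    # she works hard
A = {"e1", "e3"}    # she passes
not_W = S - W       # she does not work hard

print(round(conditional(A, W), 2), round(conditional(A, not_W), 2))
```

The guard against P(A) = 0 is a design choice for the sketch: the formula has no meaning when the conditioning event cannot occur.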
Example 5.16
A bank has 8 accounts that are overdrawn by more than $1,000,000. Of these 8 accounts, 3 are doubtful debts.
An auditor randomly selects two of the 8 accounts and records whether each of the selected accounts is a doubtful
debt.
Let A = the event the first account selected is a doubtful debt
B = the event the second account selected is a doubtful debt
Draw the tree diagram for this experiment.
Solution
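Figure 5.7: The Tree Diagram for the Account Selection Problem
Selection 1: O to $A$ (node X), with probability 3/8
    Selection 2: X to $B$, with probability 2/7
    Selection 2: X to $\bar{B}$, with probability 5/7
Selection 1: O to $\bar{A}$ (node Y), with probability 5/8
    Selection 2: Y to $B$, with probability 3/7
    Selection 2: Y to $\bar{B}$, with probability 4/7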
The probabilities are calculated in the following way:
At O: The first selection is made from the 8 accounts. Three of the 8 accounts are doubtful and 5 are
not.
$$\therefore P(A) = \frac{3}{8} \qquad P(\bar{A}) = \frac{5}{8}$$
At X: The first selection has been made and this selection is an account with a doubtful debt. This
leaves 7 accounts, of which 2 are doubtful and 5 are not.
$$\therefore P(B \mid A) = \frac{2}{7} \qquad P(\bar{B} \mid A) = \frac{5}{7}$$
At Y: The first selection has been made and this selection is an account that is not a doubtful debt.
This leaves 7 accounts, of which 3 are doubtful and 4 are not.
$$\therefore P(B \mid \bar{A}) = \frac{3}{7} \qquad P(\bar{B} \mid \bar{A}) = \frac{4}{7}$$
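The branch probabilities in Figure 5.7 can be confirmed by listing every equally likely ordered selection of two accounts and counting. This Python sketch is illustrative only: the account labels (`D1` to `D3` for doubtful debts, `G1` to `G5` for the rest) and the helper names are hypothetical, chosen just for the enumeration.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical labels: 3 doubtful accounts (D) and 5 others (G).
accounts = ["D1", "D2", "D3", "G1", "G2", "G3", "G4", "G5"]
doubtful = {"D1", "D2", "D3"}

# Every ordered selection of two different accounts (sampling without replacement).
pairs = list(permutations(accounts, 2))

def first_doubtful(pair):
    return pair[0] in doubtful

def first_not_doubtful(pair):
    return pair[0] not in doubtful

def second_doubtful(pair):
    return pair[1] in doubtful

def prob(condition, given=lambda pair: True):
    """Classical probability of `condition` among the pairs that satisfy `given`."""
    pool = [pair for pair in pairs if given(pair)]
    hits = [pair for pair in pool if condition(pair)]
    return Fraction(len(hits), len(pool))

print(prob(first_doubtful))                        # P(A)
print(prob(second_doubtful, first_doubtful))       # P(B | A)
print(prob(second_doubtful, first_not_doubtful))   # P(B | not A)
```

Restricting the pool to the pairs in which the first account is doubtful is the counting version of conditioning on event A.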
5.6.1 Independent events
P(B) is the probability that event B occurs before it is known whether or not event A has occurred.
P(B | A) is the probability of event B occurring after it becomes known that event A has occurred.
There are three cases:
1. P(B | A) < P(B): event A occurring has made event B less likely to occur than before.
2. P(B | A) > P(B): event A occurring has made event B more likely to occur than before.
3. P(B | A) = P(B): event A occurring has had no effect on the probability of event B occurring.
It can be shown that if P(B | A) = P(B) then P(A | B) = P(A) and so neither event has any effect on the
chances of the other event occurring. The events A and B are then said to be independent.
Independence (1): Events A and B are said to be independent if
P(B | A) = P(B) and P(A | B) = P(A).
The two conditions stated in the above definition are equivalent. If one of the conditions is satisfied, the other is
automatically satisfied. To demonstrate that two events are independent it is only necessary to show that one of
these two conditions is satisfied.
If events A and B are independent then
$$P(B) = P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$$
Multiplying both sides by P(A) gives:
$$P(A) \times P(B) = P(A \text{ and } B)$$
This gives a third way of showing that two events are independent and this result is often used as the definition of
independence.
Independence (2): Events A and B are said to be independent if
$$P(A \text{ and } B) = P(A) \times P(B)$$
The two definitions given for independence are equivalent. You should use the definition that is more convenient
for the problem at hand.
Example 5.17
For Example 5.12, are the two events "she works hard" and "she passes the unit" independent events? The outcomes
and their probabilities are as follows:
Outcome    Description                          Probability
$e_1$      she works hard and passes            0.6
$e_2$      she works hard but fails             0.1
$e_3$      she does not work hard but passes    0.1
$e_4$      she does not work hard and fails     0.2
Solution
Let W = the event she works hard
A = the event she passes the course
Then $P(A) = P(e_1) + P(e_3) = 0.7$
and $P(W) = P(e_1) + P(e_2) = 0.7$
and $P(W \text{ and } A) = P(e_1) = 0.6$
$$\therefore P(A \mid W) = \frac{P(W \text{ and } A)}{P(W)} = \frac{0.6}{0.7} = 0.86$$
$$\therefore P(A \mid W) > P(A)$$
If she works hard, the probability that she passes increases from 0.70 to 0.86. This is an example of case 2 above.
Event W occurring increases the probability of event A occurring. The two events are not independent.
Example 5.18
A community has two newspapers: the Times and the Herald. The Times is read by 50% of households and the
Herald by 40% of households. Only 20% of households read both papers. Are reading the Times and reading the
Herald independent events?
Solution
Let T = the event a selected household reads the Times
H = the event a selected household reads the Herald
Then $P(T) = 0.50$
$P(H) = 0.40$
$P(T \text{ and } H) = 0.20$
EITHER
$$P(T \mid H) = \frac{P(T \text{ and } H)}{P(H)} = \frac{0.20}{0.40} = 0.50$$
$$\therefore P(T \mid H) = P(T)$$
Events T and H are independent from definition (1).
OR
$$P(H \mid T) = \frac{P(T \text{ and } H)}{P(T)} = \frac{0.20}{0.50} = 0.40$$
$$\therefore P(H \mid T) = P(H)$$
OR
$$P(T) \times P(H) = 0.50 \times 0.40 = 0.20$$
$$\therefore P(T \text{ and } H) = P(T) \times P(H)$$
The event "reads the Times" is independent of the event "reads the Herald".
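Definition (2) gives a one-line test for independence. The sketch below is an illustration; the function name `are_independent` is an assumption, and `math.isclose` is used instead of `==` because probabilities entered as decimals are stored only approximately.

```python
import math

def are_independent(p_a, p_b, p_a_and_b):
    """Definition (2): A and B are independent if P(A and B) = P(A) x P(B)."""
    return math.isclose(p_a_and_b, p_a * p_b)

# Example 5.18: the Times (0.50), the Herald (0.40), both papers (0.20).
print(are_independent(0.50, 0.40, 0.20))   # True

# Example 5.17: works hard (0.7), passes (0.7), both (0.6).
print(are_independent(0.7, 0.7, 0.6))      # False
```

With real survey data the two sides would rarely match exactly, so this exact comparison is only suitable for textbook probabilities.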
Statistical experiments are often constructed in such a way as to make events independent. This simplifies the
calculation of probabilities as definition (2) of independence can be applied.
Example 5.19
A population contains the 4 people a, b, c, and d. A simple random sample of size 2 is taken with replacement.
What is the probability that a is the first person and b the second person selected?
Solution
Let $A_1$ = the event the first person selected is a
$B_2$ = the event the second person selected is b
Then $P(A_1) = P(B_2) = \frac{1}{4}$
The sample is selected with replacement, with the first person selected replaced in the population before the second
person is selected. Clearly the first selection has no effect on the second selection. The first and second selections
are independent.
$$\therefore P(A_1 \text{ and } B_2) = P(A_1) \times P(B_2) = \frac{1}{4} \times \frac{1}{4} = \frac{1}{16}$$
There is a probability of $\frac{1}{16}$ that a is the first person selected and b is the second person selected.
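The answer to Example 5.19 can be cross-checked by listing all 16 equally likely ordered samples. This is a minimal illustrative sketch; the helper `prob` is an assumption, and `itertools.product` is used because sampling with replacement allows the same person to be chosen twice.

```python
from fractions import Fraction
from itertools import product

population = ["a", "b", "c", "d"]

# All ordered samples of size 2 taken with replacement: 4 x 4 = 16 of them.
samples = list(product(population, repeat=2))

def prob(condition):
    """Classical probability: matching samples over all samples."""
    return Fraction(sum(1 for s in samples if condition(s)), len(samples))

p_a1 = prob(lambda s: s[0] == "a")        # first person selected is a
p_b2 = prob(lambda s: s[1] == "b")        # second person selected is b
p_both = prob(lambda s: s == ("a", "b"))  # a first and b second

print(p_a1, p_b2, p_both)   # 1/4 1/4 1/16
```

The direct count of 1/16 equals the product of the two separate probabilities, which is what independence of the two selections predicts.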
Quick Question 7
Quick Question 8
5.7 Venn diagrams and the rules of probability
You now know two of the rules for finding probabilities with one special case:
The Rules of Probability
General Rules
1. The probability of an event from the probabilities of the outcomes:
$$P(A) = \sum_{e_i \in A} P(e_i) \tag{5.1}$$
2. The probability that B occurs when A has already occurred:
$$P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)} \tag{5.2}$$
Special Cases
1. If all the outcomes in the outcome space are equally likely, rule 1 becomes:
$$P(A) = \frac{\text{number of outcomes in } A}{\text{number of outcomes in } S} \tag{5.3}$$
In this section we will look at the three main rules for finding the probabilities of combinations of events. The
questions asked are:
If A and B are events:
1. what is the probability that event A does not occur?
2. what is the probability that either event A or event B or both occur?
3. what is the probability that both event A and event B occur?
The answers to these three questions are given in the three subsections below. This section contains a large
number of examples. Try to solve these problems yourself before reading the solution. There is no substitute
for practice here!
5.7.1 The probability an event does not occur
$P(A)$ = the probability event A does occur
$P(\bar{A})$ = the probability event A does not occur
How can we calculate $P(\bar{A})$ from $P(A)$? The answer should be obvious, and we have used this result without
comment in previous sections!
General Rule
3. The probability that an event does not occur:
$$P(\bar{A}) = 1 - P(A)$$
The Venn diagram for this rule is shown in Figure 5.8.
Figure 5.8: Venn Diagram for an Event and Its Negation
[Venn diagram: the region for event A is marked inside the rectangle; the rest of the rectangle is $\bar{A}$.]
The points inside the rectangle but outside the marked region represent the points in $\bar{A}$.
Example 5.20
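A student was asked about her chances of passing BS1. She believes that the possible outcomes and their
probabilities from studying BS1 this semester are:
Table 5.10: Probabilities of Outcomes From Studying BS1
Outcome    Description                          Probability
$e_1$      she works hard and passes            0.6
$e_2$      she works hard but fails             0.1
$e_3$      she does not work hard but passes    0.1
$e_4$      she does not work hard and fails     0.2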
What is the probability that this student does not pass BS1?
Solution
Let A = the event the student passes BS1
Then $P(A) = P(e_1) + P(e_3) = 0.7$
$$\therefore P(\bar{A}) = 1 - P(A) = 0.3$$
The same result could be obtained directly by
$\bar{A} = \{e_2, e_4\}$
$$\therefore P(\bar{A}) = P(e_2) + P(e_4) = 0.3$$
The probability the student does not pass BS1 is 0.3.
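Both routes to the answer in Example 5.20 can be compared in Python. This is only an illustrative sketch using the Table 5.10 values; the variable names are assumptions, and `math.isclose` is used because the two floating-point results may differ in the last decimal place.

```python
import math

# Outcome probabilities from Table 5.10.
prob = {"e1": 0.6, "e2": 0.1, "e3": 0.1, "e4": 0.2}

A = {"e1", "e3"}        # the student passes BS1
not_A = set(prob) - A   # the student does not pass BS1

p_A = sum(prob[e] for e in A)
p_not_A_by_rule = 1 - p_A                          # rule 3: 1 - P(A)
p_not_A_direct = sum(prob[e] for e in not_A)       # sum over the outcomes in not-A

print(round(p_not_A_by_rule, 2), round(p_not_A_direct, 2))
print(math.isclose(p_not_A_by_rule, p_not_A_direct))
```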
5.7.2 The probability that event A or event B occurs
The example below illustrates what is meant by "event A or event B occurs" and suggests a method for finding the
probability.
Example 5.21
A sample of Business College students were asked their gender and whether they were majoring in economics or
accounting. The responses are given in Table 5.11 below.
Gender Accounting Economics Total
Male 30 10 40
Female 40 20 60
Total 70 30 100
(a) If a student is selected at random, what is the probability the student is majoring in accounting?
(b) If a student is selected at random what is the probability the student is male?
(c) If a student is selected at random what is the probability the student is either majoring in accounting or is
male?
Solution
Consider the events
Let M = the event the student is male
A = the event the student is majoring in accounting.
(a) Applying the special case of rule 1 gives:
$$P(A) = \frac{\text{number of accounting students}}{\text{total number of students}} = \frac{70}{100} = 0.7$$
The probability the selected student is studying accounting is 0.7.
(b) Applying the special case of rule 1 again gives:
$$P(M) = \frac{\text{number of male students}}{\text{total number of students}} = \frac{40}{100} = 0.4$$
The probability the selected student is male is 0.4.
(c) This illustrates what is meant by an "or" probability. The student is in this group if the student is male or
studying accounting or both. Note students satisfying both conditions are included. With probability the "or"
combination always includes "both".
The obvious approach is to use the special case of rule 1 yet again:
$$\begin{aligned}
P(A \text{ or } M) &= \frac{\text{number of students who are accounting or male or both}}{\text{total number of students}} \\
&= \frac{\text{number of accounting students} + \text{number of male students}}{\text{total number of students}} \\
&= \frac{70 + 40}{100} = 1.1
\end{aligned}$$
There is obviously an error here: probabilities must lie between 0 and 1! What has gone wrong? Look at
the table below.
Gender Accounting Economics Total
Male 30 10 40
Female 40 20 60
Total 70 30 100
In counting the number of students who are male or majoring in accounting as (70 + 40) = 110, the 30 male
accounting students have been counted twice: once in the 40 male students and again in the 70 accounting
students. They should only be counted once. This could be done by summing (30 + 10 + 40) = 80.
A second approach is to sum the accounting and male students and then subtract the number that have been
counted twice:
$$\begin{aligned}
\text{number accounting or male} &= (\text{number accounting}) + (\text{number male}) - (\text{number both male and accounting}) \\
&= 70 + 40 - 30 \\
&= 80
\end{aligned}$$
Thus we have:
$$\begin{aligned}
P(A \text{ or } M) &= \frac{\text{number of students who are accounting or male or both}}{\text{total number of students}} \\
&= \frac{\text{number accounting} + \text{number male} - \text{number who are both}}{\text{total number of students}} \\
&= \frac{70 + 40 - 30}{100} = 0.80
\end{aligned}$$
The probability the selected student is studying accounting or is male is 0.80.
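The double-counting problem in Example 5.21 can be seen by counting the students in two ways. The Python sketch below is an illustration only; the dictionary of cell counts and the helper `count` are assumptions made for the example.

```python
from fractions import Fraction

# Cell counts from the table: (gender, major) -> number of students.
counts = {("Male", "Accounting"): 30, ("Male", "Economics"): 10,
          ("Female", "Accounting"): 40, ("Female", "Economics"): 20}
N = sum(counts.values())

def count(condition):
    """Add up the cells of the table that satisfy the condition."""
    return sum(n for (gender, major), n in counts.items() if condition(gender, major))

n_acc = count(lambda g, m: m == "Accounting")
n_male = count(lambda g, m: g == "Male")
n_both = count(lambda g, m: g == "Male" and m == "Accounting")
n_either = count(lambda g, m: g == "Male" or m == "Accounting")   # each cell counted once

print(n_acc + n_male)             # 110: the naive sum counts 30 students twice
print(n_acc + n_male - n_both)    # 80: overlap subtracted
print(Fraction(n_either, N))      # 4/5, i.e. 0.80
```

Counting each cell of the table once and subtracting the overlap from the naive sum give the same 80 students.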
The problem of double counting can also be seen in the Venn diagram.
Figure 5.9: Venn Diagram for Two Events (General Case)
[Figure: two overlapping circles labelled A and B; the whole region covered by either circle is shaded.]
The shaded area is the event A or B. If the result of the experiment is one of the outcomes in the shaded area
then either A or B or both will occur. To find the probability of A or B, sum the probabilities of the outcomes in
the shaded area.
If we calculate the sum of these probabilities by P(A) + P(B) then the probabilities of the outcomes that are
in both A and B will be counted twice: once in P(A) and once in P(B). To include the probability of each
outcome only once, subtract the probabilities of the outcomes that have been counted twice from the total. Then
P(A or B) = P(A) + P(B) - P(A and B)
If the events have no outcomes in common there is no problem with double counting.
Figure 5.10: Venn Diagram for Two Events (Mutually Exclusive Case)
[Figure: two circles labelled A and B that do not overlap.]
In Figure 5.10 the two events have no outcomes in common and so events A and B cannot both occur together:
they are mutually exclusive. Here:
P(A or B) = P(A) + P(B)
The mutually exclusive case is a special case of the more general result given earlier.
General Rules
3. The probability that A or B occurs:
P(A or B) = P(A) + P(B) - P(A and B) (5.4)
Special Cases
3. If the events A and B are mutually exclusive, rule 3 becomes:
P(A or B) = P(A) + P(B) (5.5)
It is easier to handle problems where the events are mutually exclusive: the probabilities just have to be summed.
Where the events are not mutually exclusive, the problem can be reconstructed into four mutually exclusive
events (see Figure 5.11).
Figure 5.11: Venn Diagram for Two Non-mutually Exclusive Events
[Figure: two overlapping circles labelled A and B inside the sample space, dividing it into four areas:
\(A \text{ and } \bar{B}\) (inside A only), \(A \text{ and } B\) (the overlap), \(\bar{A} \text{ and } B\) (inside B only)
and \(\bar{A} \text{ and } \bar{B}\) (outside both circles).]
For example, consider the outcomes in the area marked \(A \text{ and } \bar{B}\). All these outcomes are in the set A and so
event A has occurred. These outcomes are not in the set B and so event B has not occurred. Therefore, for all the
outcomes in this area, we have \(A \text{ and } \bar{B}\). The other areas can be identified in the same way.
The four areas identified in Figure 5.11 are mutually exclusive and so their probabilities can be summed without
any double counting.
Example 5.22
The probability of a student enrolling in BS1 is 0.40. The probability that a student enrols in Economics is 0.20
and the probability a student enrols in both subjects is 0.15. What is the probability that a student enrols in either
BS1 or Economics?
Solution
Let B = the event a student enrols in BS1
E = the event a student enrols in Economics
We are told that
P(B) = 0.40
P(E) = 0.20
and P(B and E) = 0.15
Then P(B or E) = P(B) + P(E) - P(B and E)
= 0.40 + 0.20 - 0.15
= 0.45
The probability the student enrols in BS1 or Economics is 0.45.
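The calculation in Example 5.22 can be wrapped in a small helper. The sketch below is illustrative only: the function name prob_a_or_b and its argument names are assumptions made here, not part of the notes. It applies the general rule for A or B and rejects a joint probability that is larger than either individual probability, since that could not arise from a valid Venn diagram.

```python
def prob_a_or_b(p_a, p_b, p_a_and_b=0.0):
    """General addition rule: P(A or B) = P(A) + P(B) - P(A and B).

    Leaving p_a_and_b at 0.0 gives the mutually exclusive special case.
    """
    for p in (p_a, p_b, p_a_and_b):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    if p_a_and_b > min(p_a, p_b):
        raise ValueError("P(A and B) cannot exceed P(A) or P(B)")
    return p_a + p_b - p_a_and_b


# Example 5.22: P(B) = 0.40, P(E) = 0.20, P(B and E) = 0.15
p_bs1_or_econ = prob_a_or_b(0.40, 0.20, 0.15)
print(round(p_bs1_or_econ, 2))
```

Rounding is used only for display, because decimal probabilities such as 0.15 are not stored exactly as binary floating-point numbers.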
Example 5.23
The probability a student obtains an HD in BS1 is 0.05 and the probability of a DI in BS1 is 0.10. What is the
probability a student obtains an HD or a DI?
Solution
Let H = the event a student obtains an HD in BS1
D = the event a student obtains a DI in BS1
P(H) = 0.05
P(D) = 0.10
A student cannot have both an HD and a DI: the events are mutually exclusive.
Therefore P(H or D) = P(H) + P(D)
= 0.05 + 0.10
= 0.15
The probability a student obtains either an HD or a DI is 0.15.
Example 5.24
The probability that the driver of a car pulling into a petrol station buys petrol is 0.8 and the probability that the
driver buys oil is 0.3. The probability that the driver buys both petrol and oil is 0.2.
(a) What is the probability that the driver buys petrol or oil?
(b) What is the probability the driver buys petrol but not oil?
(c) What is the probability that the driver buys neither petrol nor oil?
Solution
Let G = the event the driver buys petrol
L = the event the driver buys oil
The information given in the question is:
P(G) = 0.8
P(L) = 0.3
P(G and L) = 0.2
(a) From the general rule we have:
P(G or L) = P(G) + P(L) - P(G and L)
= 0.8 + 0.3 - 0.2
= 0.9
The probability that the driver buys petrol or oil is 0.9.
This was a very straightforward question. For the more difficult questions (b) and (c), the Venn diagram is very
useful.
The Venn diagram for this problem is shown in Figure 5.12 below.
Figure 5.12: The Venn Diagram for the Petrol and Oil Problem
[Figure: two overlapping circles labelled G and L. The area inside G only is marked 0.6, the overlap is marked 0.2,
the area inside L only is marked 0.1 and the area outside both circles is marked 0.1.]
These probabilities were found in the following way.
From the given information we have
P(G and L) = 0.2
This probability can be marked in the Venn diagram.
From the Venn diagram, it can be seen that:
\(P(G) = P(G \text{ and } \bar{L}) + P(G \text{ and } L)\)
Therefore \(0.8 = P(G \text{ and } \bar{L}) + 0.2\)
Therefore \(P(G \text{ and } \bar{L}) = 0.6\)
Similarly,
\(P(\bar{G} \text{ and } L) = 0.1\)
Also \(1 = P(G \text{ and } \bar{L}) + P(G \text{ and } L) + P(\bar{G} \text{ and } L) + P(\bar{G} \text{ and } \bar{L})\)
Therefore \(1 = 0.6 + 0.2 + 0.1 + P(\bar{G} \text{ and } \bar{L})\)
Therefore \(P(\bar{G} \text{ and } \bar{L}) = 0.1\)
The probabilities of all the mutually exclusive areas have now been found. All the questions can now be answered
by summing the appropriate areas in the Venn diagram.
(a) P(G or L) = 0.6 + 0.2 + 0.1 = 0.9
The probability that the driver buys petrol or oil is 0.9.
(b) \(P(G \text{ and } \bar{L}) = 0.6\)
The probability that the driver buys petrol but not oil is 0.6.
(c) \(P(\bar{G} \text{ and } \bar{L}) = 0.1\)
The probability that the driver buys neither petrol nor oil is 0.1.
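The Venn diagram reasoning of Example 5.24 can be sketched in Python by splitting the sample space into its four mutually exclusive areas. The function name venn_regions and the dictionary keys are hypothetical labels chosen for this illustration. Exact fractions are used so that the results are not blurred by floating-point rounding.

```python
from fractions import Fraction


def venn_regions(p_g, p_l, p_both):
    """Split the sample space into four mutually exclusive areas."""
    g_only = p_g - p_both                      # G occurs, L does not
    l_only = p_l - p_both                      # L occurs, G does not
    neither = 1 - (g_only + p_both + l_only)   # the four areas sum to 1
    return {
        "G and not L": g_only,
        "G and L": p_both,
        "not G and L": l_only,
        "not G and not L": neither,
    }


# Example 5.24: P(G) = 0.8, P(L) = 0.3, P(G and L) = 0.2
regions = venn_regions(Fraction("0.8"), Fraction("0.3"), Fraction("0.2"))

p_petrol_or_oil = regions["G and not L"] + regions["G and L"] + regions["not G and L"]
p_petrol_not_oil = regions["G and not L"]
p_neither = regions["not G and not L"]

for name, p in regions.items():
    print(name, float(p))
```

Once the four areas are known, any question about G and L reduces to adding up the relevant areas, which is the point made in the text.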
5.7.3 Probability that both event A and event B occur
In the section on conditional probability you learned that the probability event B occurs when event A has already
occurred is denoted by P(B | A) and can be calculated from
P(B | A) = P(A and B) / P(A)
Multiplying both sides of this equation by P(A) gives
P(A and B) = P(A) × P(B | A)
If A and B are independent events then P(B | A) = P(B) and so the rule becomes
P(A and B) = P(A) × P(B)
These are our last two rules of probability.
General Rule
5. The probability that both A and B occur:
P(A and B) = P(A) × P(B | A) (5.6)
Special Case
5. If the events A and B are independent, rule 5 becomes:
P(A and B) = P(A) × P(B) (5.7)
This is called the product rule. Calculations using the product rule are most easily organised by using a tree
diagram.
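Figure 5.13: The Tree Diagram For Two Events
[Figure: a tree starting at the origin O. At Stage 1 it branches to \(A\) and \(\bar{A}\); at Stage 2 each of these
branches to \(B\) and \(\bar{B}\). The probability written at the end of each branch is:]
Branch \(A\), \(B\): \(P(A) \times P(B \mid A) = P(A \text{ and } B)\)
Branch \(A\), \(\bar{B}\): \(P(A) \times P(\bar{B} \mid A) = P(A \text{ and } \bar{B})\)
Branch \(\bar{A}\), \(B\): \(P(\bar{A}) \times P(B \mid \bar{A}) = P(\bar{A} \text{ and } B)\)
Branch \(\bar{A}\), \(\bar{B}\): \(P(\bar{A}) \times P(\bar{B} \mid \bar{A}) = P(\bar{A} \text{ and } \bar{B})\)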
Each of the four branches relates to a compound event.
The top branch is the event A and B. We have:
P(A and B) = P(A) × P(B | A)
Therefore, the probability of the top branch is found by multiplying the probabilities along the branch.
Similarly, each of the other branches corresponds to a compound event whose probability can be found by
multiplying the probabilities along the branch. These branch probabilities are written at the end of the branch.
Each of the four branches corresponds to one of the non-overlapping areas in the Venn diagram of Figure 5.11.
As these areas are for mutually exclusive events, the branch probabilities of different branches can be added.
The four branches together cover the sample space and so the sum of the probabilities of the branches is 1.
The rules for a tree diagram are:
Rules for a tree diagram
1. To find the branch probabilities, multiply the probabilities along the branch.
2. To find the probability of any list of branches, find the probability of each branch and then add the
probabilities of the listed branches.
3. The sum of the probabilities of the branches is 1.
The tree diagram can be extended to situations with more than two stages and/or more than two options at each
stage.
Example 5.25
An auditor randomly selects two of the 8 accounts. What is the probability that:
(a) neither of the two accounts is a doubtful debt?
(b) only one of the two accounts is a doubtful debt?
(c) both of the selected accounts are doubtful debts?
Solution
The probabilities for this experiment were calculated in Example 5.16. The tree diagram for this experiment is
shown in Figure 5.14.
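Figure 5.14: The Tree Diagram for the Account Selection Problem
[Figure: a tree starting at the origin O with two stages, Selection 1 and Selection 2. The probabilities on the
branches are:]
Selection 1: \(P(A) = \frac{3}{8}\), \(P(\bar{A}) = \frac{5}{8}\)
Selection 2 after \(A\): \(P(B \mid A) = \frac{2}{7}\), \(P(\bar{B} \mid A) = \frac{5}{7}\)
Selection 2 after \(\bar{A}\): \(P(B \mid \bar{A}) = \frac{3}{7}\), \(P(\bar{B} \mid \bar{A}) = \frac{4}{7}\)
Branch probabilities: \(\frac{3}{8} \times \frac{2}{7}\), \(\frac{3}{8} \times \frac{5}{7}\), \(\frac{5}{8} \times \frac{3}{7}\), \(\frac{5}{8} \times \frac{4}{7}\)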
Multiplying along the branches gives the branch probabilities.
(a) \(P(\text{neither are doubtful}) = P(\bar{A} \text{ and } \bar{B}) = \frac{5}{8} \times \frac{4}{7} = \frac{20}{56}\)
The probability that neither of the two selected accounts is a doubtful debt is 20/56.
(b) \(P(\text{only one is doubtful}) = P(A \text{ and } \bar{B}) + P(\bar{A} \text{ and } B) = \frac{3}{8} \times \frac{5}{7} + \frac{5}{8} \times \frac{3}{7} = \frac{30}{56}\)
The probability that only one of the two selected accounts is a doubtful debt is 30/56.
(c) \(P(\text{both are doubtful}) = P(A \text{ and } B) = \frac{3}{8} \times \frac{2}{7} = \frac{6}{56}\)
The probability that both of the two selected accounts are doubtful debts is 6/56.
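The tree diagram rules can be illustrated with a short Python sketch of the account selection problem. It assumes, as the branch probabilities in the tree suggest, that 3 of the 8 accounts are doubtful debts, that A is the event that the first selected account is doubtful and that B is the event that the second is doubtful. The variable names and the dictionary layout are illustrative choices only.

```python
from fractions import Fraction

# Assumed from the tree's first-stage probabilities: 3 doubtful debts among 8 accounts.
doubtful, total = 3, 8

p_a = Fraction(doubtful, total)                    # first account is doubtful
p_not_a = 1 - p_a                                  # first account is not doubtful
p_b_given_a = Fraction(doubtful - 1, total - 1)    # one doubtful account already removed
p_b_given_not_a = Fraction(doubtful, total - 1)    # all doubtful accounts still available

# Rule 1: multiply the probabilities along each branch.
branches = {
    ("A", "B"): p_a * p_b_given_a,
    ("A", "not B"): p_a * (1 - p_b_given_a),
    ("not A", "B"): p_not_a * p_b_given_not_a,
    ("not A", "not B"): p_not_a * (1 - p_b_given_not_a),
}

# Rule 2: add the probabilities of the listed branches.
p_neither = branches[("not A", "not B")]
p_only_one = branches[("A", "not B")] + branches[("not A", "B")]
p_both = branches[("A", "B")]

# Rule 3: the branch probabilities sum to 1.
assert sum(branches.values()) == 1

print(p_neither, p_only_one, p_both)
```

Fraction reduces each result to lowest terms, so the printed values are equivalent to, but look different from, the answers over 56 given in the example.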
Example 5.26
A manufacturer has three production plants A, B and C, manufacturing identical electronic units. 5% of the units
assembled in plant A are defective and 10% of the units assembled in plants B and C are defective. During an
eight hour shift, plant A produced 200 units, plant B produced 300 units and plant C produced 500 units. One
unit is selected at random from the 1000 units produced and found to be defective. What is the probability the
unit came from plant C?
Solution
Let D = the event the unit selected is defective
A = the event the unit selected was from plant A
B = the event the unit selected was from plant B
C = the event the unit selected was from plant C
The tree diagram for this problem is shown in Figure 5.15.
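Figure 5.15: The Tree Diagram For the Defective Parts
[Figure: a tree whose first stage branches to plants A, B and C and whose second stage branches to \(D\) (defective)
and \(\bar{D}\) (not defective). The probabilities on the branches are:]
Plant A: \(P(A) = \frac{200}{1000}\), \(P(D \mid A) = \frac{5}{100}\), \(P(\bar{D} \mid A) = \frac{95}{100}\)
Plant B: \(P(B) = \frac{300}{1000}\), \(P(D \mid B) = \frac{10}{100}\), \(P(\bar{D} \mid B) = \frac{90}{100}\)
Plant C: \(P(C) = \frac{500}{1000}\), \(P(D \mid C) = \frac{10}{100}\), \(P(\bar{D} \mid C) = \frac{90}{100}\)
Each branch probability is the product of the two probabilities along the branch, for example \(\frac{200}{1000} \times \frac{5}{100}\) for
the branch A, D.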
\(P(C \mid D) = \frac{P(C \text{ and } D)}{P(D)}\)
In Figure 5.15, there are three branches that give defective items. The sum of the
probabilities from these branches gives P(D).
Therefore
\(P(C \mid D) = \frac{\frac{500}{1000} \times \frac{10}{100}}{\frac{200}{1000} \times \frac{5}{100} + \frac{300}{1000} \times \frac{10}{100} + \frac{500}{1000} \times \frac{10}{100}} = \frac{5}{9}\)
The probability the defective item came from C is 5/9.
In a similar way, we have:
\(P(A \mid D) = \frac{\frac{200}{1000} \times \frac{5}{100}}{\frac{200}{1000} \times \frac{5}{100} + \frac{300}{1000} \times \frac{10}{100} + \frac{500}{1000} \times \frac{10}{100}} = \frac{1}{9}\)
The probability the defective item came from A is 1/9.
\(P(B \mid D) = \frac{\frac{300}{1000} \times \frac{10}{100}}{\frac{200}{1000} \times \frac{5}{100} + \frac{300}{1000} \times \frac{10}{100} + \frac{500}{1000} \times \frac{10}{100}} = \frac{3}{9}\)
The probability the defective item came from B is 3/9.
Notice that, as the defective unit must have originated from one of the plants, we have:
P(A | D) + P(B | D) + P(C | D) = 1
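The three conditional probabilities in Example 5.26 follow the same pattern, so they can be computed in one pass. The sketch below uses the production figures and defect rates stated in the example; the dictionary names units, defect_rate, joint and posterior are labels introduced here for illustration.

```python
from fractions import Fraction

# Figures stated in Example 5.26.
units = {"A": 200, "B": 300, "C": 500}
defect_rate = {"A": Fraction(5, 100), "B": Fraction(10, 100), "C": Fraction(10, 100)}

total_units = sum(units.values())

# Branch probabilities: P(plant and D) = P(plant) x P(D | plant).
joint = {
    plant: Fraction(units[plant], total_units) * defect_rate[plant]
    for plant in units
}

# P(D) is the sum of the three branches that end in a defective unit.
p_defective = sum(joint.values())

# P(plant | D) = P(plant and D) / P(D).
posterior = {plant: joint[plant] / p_defective for plant in units}

for plant, p in posterior.items():
    print(plant, p)
```

Because every defective unit comes from exactly one plant, the values in posterior must add to 1, which gives a simple check on the arithmetic.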
5.7.4 The Rules of Probability
General Rules
1. The probability of an event from the probabilities of the outcomes:
\(P(A) = \sum_{A} P(e_i)\) (5.8)
2. The probability that an event does not occur:
\(P(\bar{A}) = 1 - P(A)\) (5.9)
3. The probability that A or B occurs:
P(A or B) = P(A) + P(B) - P(A and B) (5.10)
4. The probability that B occurs when A has already occurred:
P(B | A) = P(A and B) / P(A) (5.11)
5. The probability that both A and B occur:
P(A and B) = P(A) × P(B | A) (5.12)
Special Cases
1. If all the outcomes in the outcome space are equally likely, rule 1 becomes:
P(A) = (number of outcomes in A) / (number of outcomes in S) (5.13)
3. If the events A and B are mutually exclusive, rule 3 becomes:
P(A or B) = P(A) + P(B) (5.14)
5. If the events A and B are independent, rule 5 becomes:
P(A and B) = P(A) × P(B) (5.15)
5.8 Reliability in sampling
In this section, the probability theory developed above is applied to an example of taking a small sample from a
small population. This example illustrates how probability theory can be used to measure the reliability of sample
estimates. Both the population and the sample have been made unrealistically small. In later units you will learn
how to measure the reliability of sample estimates from larger populations.
Example 5.27
There are four employees in an office: Arthur, Ben, Clare and Daniel. Arthur is paid $30,000 per annum, Ben and
Clare are paid $40,000 per annum and Daniel is paid $50,000 per annum. A sample of size two is taken with
replacement and with ordering from this population of 4 employees. The mean income in the sample is used to
estimate the mean income in the population. What is the probability that:
(a) the estimate of the mean income from the sample is exactly equal to the mean income in the population?
(b) the estimate of the mean income from the sample is no more than $5,000 from the mean income in the
population?
(c) the mean income in the sample is more than $10,000 from the mean income in the population?
Solution
Let us denote Arthur by a, Ben by b, Clare by c and Daniel by d. The incomes of these four are (in $ thousand):
a : 30
b, c : 40
d : 50
Let \(\mu\) = the mean income in the population ($000)
\(\bar{X}\) = the mean income in the sample ($000)
Then
\(\mu = \frac{30 + 40 + 40 + 50}{4} = 40\)
and so the population mean income is $40,000.
The sample space is:
S = {aa, ab, ac, ad, ba, bb, bc, bd, ca, cb, cc, cd, da, db, dc, dd}
With a simple random sample each sample has the same probability of being selected. Thus the classical approach
to probability can be used.
One possible sample is ca with Clare selected first and Arthur second. With this sample, the sample mean income
would be $(40,000 + 30,000)/2 = $35,000. The sample mean income can be calculated for each sample in the
sample space.
Table 5.13: Samples of Size Two from a Population of Four
Sample   Incomes ($000)   Sample Mean ($000)     Sample   Incomes ($000)   Sample Mean ($000)
aa       30, 30           30                     ca       40, 30           35
ab       30, 40           35                     cb       40, 40           40
ac       30, 40           35                     cc       40, 40           40
ad       30, 50           40                     cd       40, 50           45
ba       40, 30           35                     da       50, 30           40
bb       40, 40           40                     db       50, 40           45
bc       40, 40           40                     dc       50, 40           45
bd       40, 50           45                     dd       50, 50           50
Each of the above outcomes has a probability of 1/16.
(a) The sample mean is equal to the population mean if the sample mean is $40,000.
Let E = the event the sample mean equals the population mean.
Then E = {ad, bb, bc, cb, cc, da}
Therefore P(E) = 6/16
The probability that the sample mean equals the population mean is 6/16 = 0.375 and the probability that the
sample mean does not equal the population mean is 10/16.
(b) The mean income from the sample is no more than $5,000 from the population mean of $40,000 if the sample
mean is between $35,000 and $45,000 inclusive.
Let G = the event the sample mean is between $35,000 and $45,000 inclusive
\(\bar{G}\) = {aa, dd}
Therefore \(P(\bar{G})\) = 2/16
Therefore P(G) = 14/16
The probability that the sampling error in using the sample mean to estimate the population mean is $5,000
or less is 14/16 = 0.875.
(c) The mean income from the sample is more than $10,000 from the population mean of $40,000 if the sample
mean is less than $30,000 or more than $50,000.
Let H = the event the sample mean is less than $30,000 or more than $50,000.
H = { }
Therefore P(H) = 0
The probability that the sampling error in using the sample mean to estimate the population mean is more
than $10,000 is 0.
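The enumeration in Table 5.13 and the three answers can be reproduced with a short Python sketch. It lists all 16 ordered samples drawn with replacement, treats them as equally likely (the classical approach) and counts the samples that satisfy each condition. The helper names sample_mean and prob are illustrative and do not appear in the notes.

```python
from fractions import Fraction
from itertools import product

# Incomes in $000, as given in Example 5.27.
incomes = {"a": 30, "b": 40, "c": 40, "d": 50}
population_mean = Fraction(sum(incomes.values()), len(incomes))

# All ordered samples of size two, drawn with replacement: 4 x 4 = 16 samples.
samples = list(product(incomes, repeat=2))


def sample_mean(sample):
    """Mean income ($000) of the two selected employees."""
    first, second = sample
    return Fraction(incomes[first] + incomes[second], 2)


def prob(event):
    """Classical probability: favourable samples over all samples."""
    favourable = sum(1 for s in samples if event(s))
    return Fraction(favourable, len(samples))


p_exact = prob(lambda s: sample_mean(s) == population_mean)
p_within_5 = prob(lambda s: abs(sample_mean(s) - population_mean) <= 5)
p_beyond_10 = prob(lambda s: abs(sample_mean(s) - population_mean) > 10)

print(p_exact, p_within_5, p_beyond_10)
```

Changing repeat=2 to a larger value gives a rough feel for how the chance of landing within $5,000 of the population mean behaves as the sample size grows, although the later units develop this formally.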
Whether the results from a sample of size 2 are reliable enough depends upon how accurate the investigator wants
the estimate to be and how certain she wants to be that the sample estimate meets the accuracy target.
If the investigator wanted to obtain the exact value of the population mean, there is only a small probability
(0.375) that a sample of size two will give the required precision. On the other hand, if the investigator only
wanted to estimate the population mean income to within $5,000, there is a good chance (0.875) that the sample
of size 2 is large enough.
Later in this subject you will learn how to calculate the sample size required to estimate the population mean to a
required level of accuracy with a known certainty.
5.9 Summary
This unit introduced the concept of probability. A three step approach to probability was used:
1. the probabilities of all the outcomes in the sample space were found by using the relative frequency,
classical or subjective approaches to probability;
2. the probabilities of events were found by summing the probabilities of the outcomes in the event;
3. the probabilities of combinations of events were found by using the rules of probability.
Venn and tree diagrams are useful in calculating probabilities for combinations of events. These diagrams simplify
the application of the rules of probability. Use these diagrams where possible.
The major application of probability in statistics is the quantification of the reliability of sample estimates of
population values. In these applications the result of the sample is usually a numerical value. In the next two
units you will learn to apply probabilities to random experiments where the result of the experiment is a numerical
value.
Read the learning objectives at the start of the unit. Have you achieved these objectives? If there are any objectives
about which you are unclear reread the appropriate sections before trying the tutorial exercises.
UNIT 6
DISCRETE
PROBABILITY
DISTRIBUTIONS
6.0 Contents
6.1 Unit objectives
6.2 Introduction
6.3 Random variables
6.4 Probability distributions
6.5 Summarising probability distributions
6.6 Expected values
6.7 The binomial distribution
6.8 The Poisson distribution
6.9 Joint probability distributions
6.10 Linear functions of random variables
6.11 Summary
6.1 Unit objectives
The probability results developed in the previous unit apply to all experiments. In this unit we will focus on
experiments for which the recorded value is a number. Two experiments will be discussed in detail. In the first
of these experiments, the number of successes out of a number of trials is recorded. This gives the binomial
probability distribution. In the second experiment, the number of successes in an interval of time is recorded.
This gives the Poisson probability distribution. At the end of the unit we will look at experiments for which the
recorded value is two numbers.
After completing this unit you should be able to:
Define the terms random variable and discrete probability distribution;
Calculate probabilities from a given discrete probability distribution;
Calculate the mean and standard deviation of a discrete probability distribution;
Define the term expected value and calculate expected values for discrete probability distributions;
Define the binomial experiment and use the binomial formula and binomial tables to calculate binomial
probabilities;
Define the Poisson experiment and use the Poisson tables to calculate Poisson probabilities;
Define a joint probability distribution and calculate marginal probability distributions;
Calculate expected values from joint probability distributions.
6.2 Introduction
In the previous unit, we discussed probabilities for experiments in general. In this and the following unit, we will
focus on probabilities for experiments where each outcome of the experiment is a number.
The unit starts by distinguishing between discrete and continuous random variables. Discrete random variables
are discussed in this unit and continuous random variables are discussed in Unit 7.
The values and probabilities of a discrete random variable are listed in a probability distribution. Several examples
of probability distributions are included in this unit. You will learn how to summarise a probability distribution
by calculating its mean and variance.
In the following sections of the unit, two particular experiments that arise frequently in sampling are discussed.
An example of the first experiment is a sample survey in which a number of people are asked a question to which
the answer is yes or no. The number of yes responses is then counted. This is called a binomial experiment
and tables have been constructed giving the probability distributions for this form of experiment. You should
have access to a copy of the Comprehensive Statistical Tables Volume 1 published by the University of Canberra.
These contain binomial tables on pages 5 to 22. Similar binomial tables are included in the text. You will learn to
use these tables to calculate binomial probabilities.
An example of the second form of experiment is where the number of customers arriving at a supermarket checkout
in a specified interval of time is recorded. This is an example of a Poisson experiment and again tables have
been constructed giving the probabilities for this type of experiment. The Statistical Tables contain Poisson
probability tables on pages 23 to 32. Similar Poisson tables are included in the text. You will learn to use these tables to
calculate Poisson probabilities.
An example of another type of experiment is the recording of the percentage change in the price of two shares in a
week. Here the outcomes of the experiment are two numbers. One of the main interests in this type of experiment
is to discover if there is any relationship between the two recorded numbers. In studying this type of problem you
will learn how to extend the measurement of the strength of the relationship between two variables discussed in
Unit 4 to probability distributions.
6.3 Random variables
The previous probability results apply to all experiments. From this point onwards, we will focus on experiments
where a numerical value is assigned to each outcome of the experiment. A rule used to assign a numerical value
to each outcome is called a random variable.
Random variable: A function (rule) that assigns a numerical value to each outcome in a sample space.
Random variables are usually denoted by upper case letters from the end of the alphabet: X, Y, Z, etc.
(Some examples of random variables are given in the Exercises in the next section.)
There are two types of random variables: discrete random variables and continuous random variables. You have
learnt the definitions of discrete and continuous variables in Unit 2. The definitions given below apply these
definitions to random variables.
Discrete random variable: A random variable with values that can be listed.
Continuous random variable: A random variable that can assume all the values in an interval.
In this unit you will learn about discrete random variables. Continuous random variables form the subject matter
of Unit 7.
6.4 Probability distributions
The probabilities for a discrete random variable are given in a probability distribution.
Probability Distribution: A table or formula that lists all the possible values that a discrete random variable can
assume together with their associated probabilities.
Example 6.1
An auditor randomly selects two of the 8 accounts. What is the probability distribution of the number of doubtful
debts in the selected accounts?
Solution
X = the number of doubtful debts in the two selected accounts
The sample space for this experiment is:
\(S = \{\bar{A}\bar{B},\ \bar{A}B,\ A\bar{B},\ AB\}\)
The random variable X assigns a numerical value to each of these outcomes:
Outcome      \(\bar{A}\bar{B}\)   \(\bar{A}B\)   \(A\bar{B}\)   \(AB\)
Value of X   0      1      1      2
Notice that X assigns a numerical value to each outcome in the sample space. X is a random variable.
From Example 5.25, the probability distribution for X is as shown in Table 6.1.
Table 6.1: Probability Distribution of the Number of Doubtful Debts
Number   Probability
x        p(x)
0        20/56
1        30/56
2        6/56
Total    1
Example 6.2
A population contains 4 people: a, b, c and d.
a : earns $30,000 p.a.
b and c : earn $40,000 p.a.
d : earns $50,000 p.a.
For this population \(\mu\) = $40,000 and \(\sigma^2 = 50{,}000{,}000\) (dollars squared).
A simple random sample of size 2 is taken with replacement and with ordering.
The sample space is:
S = { aa, ab, ac, ad, ba, bb, bc, bd, ca, cb, cc, cd, da, db, dc, dd }
Let \(X_1\) = the income of the first person selected ($000)
\(X_2\) = the income of the second person selected ($000)
\(\bar{X}\) = the mean income in the sample ($000).
What is the probability distribution of the random variables \(X_1\), \(X_2\) and \(\bar{X}\)?
Solution
Sample   \(x_1\)   \(x_2\)   \(\bar{x}\)      Sample   \(x_1\)   \(x_2\)   \(\bar{x}\)
aa       30    30    30      ca       40    30    35
ab       30    40    35      cb       40    40    40
ac       30    40    35      cc       40    40    40
ad       30    50    40      cd       40    50    45
ba       40    30    35      da       50    30    40
bb       40    40    40      db       50    40    45
bc       40    40    40      dc       50    40    45
bd       40    50    45      dd       50    50    50
\(X_1\), \(X_2\) and \(\bar{X}\) are all random variables: they assign numerical values to each outcome in the sample space.
The sample is taken as a simple random sample and so each of the above outcomes has a probability of 1/16. We
can construct probability distributions for each of the three random variables. The probability distribution for \(\bar{X}\)
is given in Table 6.3.
Table 6.3: Probability Distribution for the Sample Mean
Sample Mean   Probability
\(\bar{x}\)           \(p(\bar{x})\)
30            1/16
35            4/16
40            6/16
45            4/16
50            1/16
Total         1
Notice the notation used in the above probability distributions:
the value of a random variable is denoted by the same letter as the random variable but in lower case. In
Example 6.1, the random variable is X and x is used for the values of the random variable. In Example
6.2, \(\bar{X}\) is the random variable and \(\bar{x}\) is used for the values of the random variable.
the probability that the random variable X has the value x is written as p(x).
In general, we write a probability distribution as shown in Table 6.4 below.
Table 6.4: The General Form of a Probability Distribution
Value     Probability
x         p(x)
\(x_1\)       \(p(x_1)\)
\(x_2\)       \(p(x_2)\)
...       ...
\(x_k\)       \(p(x_k)\)
Total     1
The values of a random variable are mutually exclusive and collectively exhaustive. The values of the random
variable can, therefore, be analysed as if they were the outcomes of the experiment. The conditions for the
probabilities of outcomes from Section 5.3 apply to the probabilities of probability distributions. The probability of
any set of values of a random variable can be calculated by summing the probabilities of the values (see Rule 1
from Section 5.4).
Conditions for probability distributions
The probabilities of a probability distribution must satisfy:
(a) \(0 \le p(x) \le 1\) for all x.
(b) \(\sum_{x} p(x) = 1\)
Finding the probability of a set of values
\(P(A) = \sum_{x \in A} p(x)\)
Example 6.3
In Example 6.2, if the sample mean is used to estimate the population mean, what is the probability that the
sampling error is $5,000 or less?
Solution
The population mean is $40,000 and so the sample mean is within $5,000 of the population mean if the sample
mean is between $35,000 and $45,000 inclusive.
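\[
\begin{aligned}
P(|\bar{X} - \mu| \le 5) &= P(35 \le \bar{X} \le 45) \\
&= P(\bar{X} = 35 \text{ or } 40 \text{ or } 45) \\
&= P(\bar{X} = 35) + P(\bar{X} = 40) + P(\bar{X} = 45) \\
&= p(35) + p(40) + p(45) \\
&= \tfrac{4}{16} + \tfrac{6}{16} + \tfrac{4}{16} = \tfrac{14}{16}
\end{aligned}
\]
The probability that the sampling error is $5,000 or less is 14/16.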
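The construction of Table 6.3 and the calculation in Example 6.3 can be checked by listing the sample space by machine. The following is a minimal sketch in Python, not part of the course materials; the variable names are illustrative only, and exact fractions are used so that no rounding occurs (note that Python prints fractions in lowest terms, so 6/16 appears as 3/8).

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Incomes in $000 for the four people in the population of Example 6.2.
incomes = {"a": 30, "b": 40, "c": 40, "d": 50}

# Ordered samples of size 2 taken with replacement: 4 x 4 = 16 outcomes.
samples = list(product(incomes, repeat=2))
prob_each = Fraction(1, len(samples))

# Accumulate the probability attached to each value of the sample mean.
mean_dist = defaultdict(Fraction)
for first, second in samples:
    sample_mean = Fraction(incomes[first] + incomes[second], 2)
    mean_dist[sample_mean] += prob_each

# Example 6.3: probability that the sample mean is within 5 ($000) of 40.
within_5 = sum(p for m, p in mean_dist.items() if abs(m - 40) <= 5)

for m in sorted(mean_dist):
    print(m, mean_dist[m])
print("P(sampling error <= 5) =", within_5)
```

The same loop gives the distributions of the first and second selected incomes if the sample mean is replaced by incomes[first] or incomes[second].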
Quick Question 1
Quick Question 2
Quick Question 3
It is sometimes more convenient to work with cumulative probability distributions.
Or-less-than cumulative probability distribution: A table or formula that gives all the possible values of a
discrete random variable X together with \(P(X \le x)\).
Or-more-than cumulative probability distribution: A table or formula that gives all the possible values of a
discrete random variable X together with \(P(X \ge x)\).
The cumulative probability distribution of a random variable is easily calculated from its probability distribution.
Example 6.4
Find the or-more-than cumulative probability distribution for the random variable X in Table 6.1.
Solution
\(P(X \ge 2) = P(X = 2) = \frac{6}{56}\)
\(P(X \ge 1) = P(X = 1) + P(X = 2) = \frac{30}{56} + \frac{6}{56} = \frac{36}{56}\)
\(P(X \ge 0) = P(X = 0) + P(X = 1) + P(X = 2) = \frac{20}{56} + \frac{36}{56} = 1\)
In Table 6.5 these cumulative probabilities are written out in tabular form.
Table 6.5: Or-more-than Cumulative Probability Distribution
Number   Or-more-than Probability
x        \(P(X \ge x)\)
0        1
1        36/56
2        6/56
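Both kinds of cumulative distribution are running totals of the probabilities, taken from opposite ends of the table. A short Python sketch, using the probabilities of Table 6.1 and illustrative variable names, is:

```python
from fractions import Fraction
from itertools import accumulate

# Probability distribution from Table 6.1.
values = [0, 1, 2]
probs = [Fraction(20, 56), Fraction(30, 56), Fraction(6, 56)]

# Or-less-than: running total from the smallest value upwards, P(X <= x).
or_less = list(accumulate(probs))

# Or-more-than: running total from the largest value downwards, P(X >= x).
or_more = list(accumulate(probs[::-1]))[::-1]

for x, lo, hi in zip(values, or_less, or_more):
    print(x, lo, hi)
```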
Quick Question 4
6.5 Summarising probability distributions
In Unit 3, you learnt how to summarise data by calculating a measure of the central value of the data and a
measure of the spread of the data. The mean and variance used to summarise data are also used to summarise
probability distributions.
The mean for a probability distribution:
\(\mu = \sum x\,p(x)\) (6.1)
The variance for a probability distribution:
\(\sigma^2 = \sum (x - \mu)^2 p(x)\) (6.2)
\(\sigma^2 = \sum x^2 p(x) - \mu^2\) (6.3)
The concept of the mean and the variance does not change. The mean is a measure of a central value for the
values of X and the variance is a measure of the spread in the values of X.
Equation (6.2) gives the definition for the variance of a probability distribution: the variance is the average of
the squared deviations from the mean. Equation (6.3) is the formula used to calculate the variance. Never use
equation (6.2) to calculate the variance by hand; it is an extremely inefficient method.
Example 6.5
What is the mean and variance of the probability distribution in Table 6.6 below?
Table 6.6: Probability Distribution of the Number of Doubtful Debts
Number   Probability
x        p(x)
0        20/56
1        30/56
2        6/56
Total    1
Solution
Table 6.7: Calculating the Mean and Variance
Value    Probability
x        p(x)      x p(x)    \(x^2\) p(x)
0        20/56     0         0
1        30/56     30/56     30/56
2        6/56      12/56     24/56
Total    1         42/56     54/56
\(\mu = \sum x\,p(x) = \frac{42}{56} = 0.75\)
\(\sigma^2 = \sum x^2 p(x) - \mu^2 = \frac{54}{56} - \left(\frac{42}{56}\right)^2 = 0.4018\)
The mean is 0.75 and the variance is 0.4018.
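Equations (6.1) to (6.3) translate directly into code. The sketch below (Python, with illustrative names) computes the mean of Table 6.6 and the variance by both the definition (6.2) and the shortcut (6.3), which should agree exactly when fractions are used.

```python
from fractions import Fraction

# Probability distribution from Table 6.6: value -> probability.
dist = {0: Fraction(20, 56), 1: Fraction(30, 56), 2: Fraction(6, 56)}

# Equation (6.1): the mean is the probability-weighted sum of the values.
mean = sum(x * p for x, p in dist.items())

# Equation (6.2): average squared deviation from the mean.
var_definition = sum((x - mean) ** 2 * p for x, p in dist.items())

# Equation (6.3): mean of the squares minus the square of the mean.
var_shortcut = sum(x ** 2 * p for x, p in dist.items()) - mean ** 2

print(float(mean), round(float(var_shortcut), 4))
```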
Quick Question 5
6.6 Expected values
The previous section included the expressions \(\sum x\,p(x)\), \(\sum x^2 p(x)\) and \(\sum (x - \mu)^2 p(x)\). These expressions can
be simplified by using the expected value notation.
Expected value of g(X): The expected value of the function g(X) is defined by
\(E[g(X)] = \sum g(x)\,p(x)\)
The expected value of a function of X is the mean value of the function. The expected value will always lie
between the lowest and highest values of the function.
In this notation, Equations (6.1) to (6.3) are written as:
\(\mu = \sum x\,p(x) = E[X]\) (6.4)
\(\sigma^2 = \sum (x - \mu)^2 p(x) = E[(X - \mu)^2]\) (6.5)
\(\sigma^2 = \sum x^2 p(x) - \mu^2 = E[X^2] - \mu^2\) (6.6)
Example 6.6
The auditor in Example 6.1 has estimated that if X of the two selected accounts are doubtful debts, then it will
take \((X^2 + 2X + 5)\) days to complete the audit. What is the expected number of days that it will take to complete
the audit?
Solution
Table 6.8: Calculating \(E[X^2 + 2X + 5]\)
Value    Probability
x        p(x)      \(x^2 + 2x + 5\)    \((x^2 + 2x + 5)\,p(x)\)
0        20/56     5              100/56
1        30/56     8              240/56
2        6/56      13             78/56
Total    1                        418/56
\(E[X^2 + 2X + 5] = \sum (x^2 + 2x + 5)\,p(x) = \frac{418}{56} \approx 7.5\)
From column 3 of Table 6.8, it could take between 5 and 13 days to complete the audit, depending on the number
of doubtful debts in the selected two accounts. On average it will take 7.5 days to complete the audit.
The following rules can be used to simplify the calculation of expected values.
Rules for expected values
1. For any constant a
\(E[a] = a\) (6.7)
2. For any function g(X) and any constant b
\(E[b\,g(X)] = b\,E[g(X)]\) (6.8)
3. For any two functions g(X) and f(X)
\(E[g(X) \pm f(X)] = E[g(X)] \pm E[f(X)]\) (6.9)
Rule 3 can be extended to any number of functions.
Stated in the abstract, these rules look quite complicated but they are really very obvious results.
The first rule states that the average value of a constant is the constant itself. On average, the value of 2 is
2, which seems reasonable!
The second rule states that when all the values of a function of a random variable are multiplied by a constant, the
expected (average) value of the function of the random variable is also multiplied by the same constant. When all
household incomes increase by 10% (b = 1.1), the average household income also increases by 10%.
The third rule states that the average of the sum/difference of two values is the sum/difference of their averages.
If the average income of a household (E[g(X)]) is $300 per week and the average consumption of a household
(E[f(X)]) is $280 per week, then the average saving of a household (\(E[g(X) - f(X)]\)) is $(300 - 280) = $20 per
week.
The formulae for the mean and standard deviation of a linear function of a random variable are important special
cases of the above results. These results can be rewritten as:
Mean and variance of linear functions of a random variable
\(E[aX + b] = aE[X] + b\) (6.10)
\(V[aX + b] = a^2 V[X]\) (6.11)
Note: The variance of the random variable Y can be denoted by either \(\sigma_Y^2\) or V[Y]. Use
whichever is the more convenient.
Example 6.7
In Example 6.5, it was calculated that:
\(E[X] = \frac{42}{56}\) and \(E[X^2] = \frac{54}{56}\)
Use these results to calculate \(E[X^2 + 2X + 5]\).
Solution
\[
\begin{aligned}
E[X^2 + 2X + 5] &= E[X^2] + E[2X] + E[5] && \text{(Rule 3)} \\
&= E[X^2] + E[2X] + 5 && \text{(Rule 1)} \\
&= E[X^2] + 2E[X] + 5 && \text{(Rule 2)} \\
&= \tfrac{54}{56} + 2 \times \tfrac{42}{56} + 5 \\
&= \tfrac{418}{56} \approx 7.5
\end{aligned}
\]
The expected value of \((X^2 + 2X + 5)\) is 7.5. This agrees with the result of Example 6.6.
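Examples 6.6 and 6.7 reach the same answer by two routes, and the agreement can be confirmed numerically. The following Python sketch is illustrative only; the helper name expect is an assumption introduced here, not notation from the notes.

```python
from fractions import Fraction

# Probability distribution of X from Table 6.1: value -> probability.
dist = {0: Fraction(20, 56), 1: Fraction(30, 56), 2: Fraction(6, 56)}

def expect(g, dist):
    """E[g(X)]: the sum of g(x) * p(x) over all values x."""
    return sum(g(x) * p for x, p in dist.items())

# Example 6.6: apply the definition directly to g(x) = x^2 + 2x + 5.
direct = expect(lambda x: x ** 2 + 2 * x + 5, dist)

# Example 6.7: use Rules 1 to 3, so only E[X^2] and E[X] are needed.
by_rules = expect(lambda x: x ** 2, dist) + 2 * expect(lambda x: x, dist) + 5

print(direct, by_rules, round(float(direct), 2))
```

The exact value is 418/56, which is about 7.46 days and rounds to the 7.5 quoted in the examples.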
Quick Question 6
Example 6.8
In Erehwon, the mean before-tax personal income is $10,000 and the standard deviation of personal incomes is
$4,000. The minimum wage in Erehwon is $2,000. The first $2,000 of a person's income is tax free but income
tax is levied at a rate of 30 cents in the dollar on all personal income above $2,000. What are the mean and standard
deviation of after-tax personal incomes?
Solution
Let I = before tax income of a randomly selected person ($000)
T = the tax paid by a randomly selected person ($000)
Y = the after-tax income of a randomly selected person ($000)
Then we have
\(E[I] = 10\)
\(V[I] = 4^2\)
Income tax is only levied on income above $2,000. A person with an income of I will only pay tax on \((I - 2)\)
thousand dollars of that income.
\(\therefore\ T = 0.3(I - 2)\)
\(\therefore\ Y = I - T = I - 0.3(I - 2) = 0.6 + 0.7I\)
\(\therefore\ E[Y] = E[0.6 + 0.7I] = 0.6 + 0.7E[I]\) (Using equation 6.10)
\(= 0.6 + 0.7 \times 10\)
\(= 7.6\)
\(V[Y] = V[0.6 + 0.7I] = 0.7^2\,V[I]\) (Using equation 6.11)
\(= 0.7^2 \times 4^2\)
\(\therefore\ \sigma_Y = 0.7 \times 4 = 2.8\)
The mean after-tax income is $7,600 and the standard deviation of after-tax incomes is $2,800.
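Equations (6.10) and (6.11) need only the mean and standard deviation of the original variable, not its full distribution. A small Python sketch of Example 6.8 follows; the function name linear_mean_sd is hypothetical and chosen for this illustration.

```python
def linear_mean_sd(a, b, mean_x, sd_x):
    """Mean and standard deviation of aX + b, from equations (6.10) and (6.11)."""
    mean_y = a * mean_x + b
    # V[aX + b] = a^2 V[X], so the standard deviation is scaled by |a|.
    sd_y = abs(a) * sd_x
    return mean_y, sd_y

# Example 6.8: Y = 0.6 + 0.7 I, with E[I] = 10 and standard deviation 4 ($000).
mean_y, sd_y = linear_mean_sd(a=0.7, b=0.6, mean_x=10, sd_x=4)
print(round(mean_y, 2), round(sd_y, 2))
```

Note that the constant b shifts the mean but has no effect on the spread.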
6.7 The binomial distribution
Some types of experiment arise in widely different areas of application. In the following two sections, you will
learn about two of these experiments: the binomial experiment and the Poisson experiment.
Consider the following:
Experiment 1
It has been reported that 40% of all new cars sold last month were Holdens. A simple random sample of four new
cars is selected and the number of Holdens in the sample recorded.
Let X = the number of Holdens in the sample.
Y = the number of other makes of car in the sample.
Experiment 2
It is believed that 30% of the accounts at a local bank are overdrawn. The auditor selects a simple random sample
of 20 accounts and records the number that are overdrawn.
Let X = the number of overdrawn accounts in the sample.
Y = the number of not overdrawn accounts in the sample.
Although these two experiments are dealing with different objects they have the same basic structure. They are
examples of the binomial experiment. In both cases an action, called a trial, is being repeated a fixed number
of times. With each trial an outcome either occurs (car is Holden, account is overdrawn) or does not occur. The
number of times the outcome occurs is counted.
The binomial experiment
1. The experiment consists of a fixed number n of trials.
2. Each trial results in one of two outcomes: success or failure.
3. The probability p of success remains constant from trial to trial.
4. Each trial of the experiment is independent of the other trials.
5. The number of successes X observed in the n trials is recorded.
The random variable X defined in the binomial experiment is said to have a binomial distribution.
The binomial distribution has two parameters:
n = the number of trials.
p = the probability of success in each trial.
If the random variable X has a binomial probability distribution with parameters n and p, we write
\(X \sim B(n, p)\)
where the symbol \(\sim\) is interpreted: is distributed as.
Let us now analyse the experiments given above.
Experiment 1
Here there are four trials. The outcome "car is a Holden" is called a success and the number of Holdens is counted.
For each trial, the probability that there is a success (i.e. the selected car is a Holden) is 0.4. Here we have
\(X \sim B(4, 0.4)\)
Notice that the decision to call the outcome "car is a Holden" a success is purely arbitrary. We could have called the
outcome "car is not a Holden" a success and counted the number of cars that are not Holdens. Then the probability
of a success would have been 0.6 and we would have obtained
\(Y \sim B(4, 0.6)\)
Experiment 2
Here there are 20 trials. The outcome "account is overdrawn" is called a success and the number of accounts
overdrawn is counted. For each trial, the probability that there is a success (i.e. the selected account is overdrawn)
is 0.3. Here we have
\(X \sim B(20, 0.3)\)
As before, the decision to call the outcome "account is overdrawn" a success is purely arbitrary. We could have
called the outcome "account is not overdrawn" a success and counted the number of accounts that are not overdrawn.
The probability of a success would have been 0.7 and we would have obtained
\(Y \sim B(20, 0.7)\)
The probabilities for \(X \sim B(n, p)\) are calculated using the formula:
\(p(x) = \frac{n!}{x!\,(n - x)!}\, p^x (1 - p)^{n - x}\) for x = 0, 1, 2, . . . , n (6.12)
Note that in evaluating the above expression:
(a) 0! is defined to be 1.
(b) \(k^0\) is 1 for any non-zero value of k.
Example 6.9
It has been reported that 40% of all new cars sold last month were Holdens. A simple random sample of four new
cars is selected. Let X be the number of Holdens in the sample. Calculate the probability distribution of X.
Solution
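You already know that \(X \sim B(4, 0.4)\). Using Equation 6.12, we find that
\(p(0) = \frac{4!}{0!\,4!} \times 0.4^0 \times 0.6^4 = 0.6^4 = 0.1296\)
\(p(1) = \frac{4!}{1!\,3!} \times 0.4^1 \times 0.6^3 = 4 \times 0.4^1 \times 0.6^3 = 0.3456\)
\(p(2) = \frac{4!}{2!\,2!} \times 0.4^2 \times 0.6^2 = 6 \times 0.4^2 \times 0.6^2 = 0.3456\)
\(p(3) = \frac{4!}{3!\,1!} \times 0.4^3 \times 0.6^1 = 4 \times 0.4^3 \times 0.6^1 = 0.1536\)
\(p(4) = \frac{4!}{4!\,0!} \times 0.4^4 \times 0.6^0 = 0.4^4 = 0.0256\)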
In Table 6.9, these probabilities are written in probability distribution form.
Table 6.9: The Binomial Probability Distribution for n = 4, p = 0.4
x        p(x)
0        0.1296
1        0.3456
2        0.3456
3        0.1536
4        0.0256
Total    1.0000
From Table 6.9, we can calculate the probability of any event by summing the required probabilities. For example,
we could construct the or-less-than cumulative binomial distribution:
\(P(X \le 0) = P(X = 0) = 0.1296\)
\(P(X \le 1) = P(X = 0) + P(X = 1) = 0.1296 + 0.3456 = 0.4752\)
Continuing in this way, we can calculate all the or-less-than cumulative probabilities. These are presented in
Table 6.10:
Table 6.10: Binomial Probabilities and Cumulative Probabilities for n = 4, p = 0.4
x        p(x)      \(P(X \le x)\)
0        0.1296    0.1296
1        0.3456    0.4752
2        0.3456    0.8208
3        0.1536    0.9744
4        0.0256    1.0000
Total    1.0000
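Equation (6.12) is straightforward to evaluate by machine. As a sketch, the Python below reproduces Tables 6.9 and 6.10; the function name binom_pmf is chosen here for illustration, and math.comb supplies the factor n!/(x!(n - x)!).

```python
from itertools import accumulate
from math import comb

def binom_pmf(x, n, p):
    """Equation (6.12): probability of exactly x successes in n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 4, 0.4
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

# Or-less-than cumulative probabilities P(X <= x) as running totals.
cdf = list(accumulate(pmf))

for x in range(n + 1):
    print(x, round(pmf[x], 4), round(cdf[x], 4))
```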
Quick Question 7
Fortunately, these long computations are rarely necessary. The probabilities for binomial distributions with n
from 2 to 25 (with some omissions) are printed on pages 5 to 12 of the UC Statistical Tables.
To use these tables:
1. First locate the n value in the first column of the table.
2. In the section of the table for this n value, locate the x value in the second column of the table. This
determines a row of the table.
3. Locate the column by using the listed p values along the top of the table. The value in the intersection of
the row and the column gives the required probability.
Example 6.10
Use the UC Statistical Tables to find P(X = 2) for the binomial distribution \(X \sim B(4, 0.4)\).
Solution
The relevant section of the UC Statistical Tables page 5 is reproduced below. The required value lies in the
section for n = 4, the row for x = 2 and the column for p = 0.40.
                 p
n    k    0.05     ...    1/3      0.35     0.40     0.45     0.50
1    0    0.9500   ...    0.6667   0.6500   0.6000   0.5500   0.5000
     1    0.0500   ...    0.3333   0.3500   0.4000   0.4500   0.5000
...
4    0    0.8145   ...    0.1975   0.1785   0.1296   0.0915   0.0625
     1    0.1715   ...    0.3951   0.3845   0.3456   0.2995   0.2500
     2    0.0135   ...    0.2963   0.3105   0.3456   0.3675   0.3750
     3    0.0005   ...    0.0988   0.1115   0.1536   0.2005   0.2500
     4    0.0000   ...    0.0123   0.0150   0.0256   0.0410   0.0625
\(\therefore\ P(X = 2) = 0.3456\)
On pages 13 to 22 of the UC Statistical Tables you will find or-less-than cumulative binomial tables.
You should practise using the UC Statistical Tables to find probabilities. In your tests
and final examination you will be given a copy of these Tables to use.
To calculate the probability that a random variable X satisfies a specified condition:
1. List all the possible values of the random variable.
2. Mark the values of X that meet the specified condition.
3. Sum the probabilities of the marked values of X.
The or-less-than cumulative probability tables can be used to reduce the labour involved in the third step. You
do not have to use the cumulative probability tables if you do not want to; you can always sum the individual
probabilities.
Quick Question 8
Example 6.11
A development bank accepts thirty per cent of loan requests from farmers. Ten loan requests from farmers are
randomly selected for consideration by the bank. Find the probability:
(a) exactly 4 are accepted.
(b) no more than 4 are accepted.
(c) more than 4 are accepted.
(d) at least 2 but fewer than 6 are accepted.
Solution
Let X = number of requests out of 10 accepted by the bank.
Then \(X \sim B(10, 0.30)\)
Pages 5 to 12 of the UC Statistical Tables give the individual probabilities p(x) and pages 13 to 22 give the
or-less-than cumulative probabilities \(P(X \le x)\). In each part below, the values of X that meet the condition are
marked with square brackets.
(a) P(X = 4):
x    0 1 2 3 [4] 5 6 7 8 9 10
P(X = 4) = 0.2001 (page 7)
The probability that exactly 4 are accepted is 0.2001.
(b) \(P(X \le 4)\):
x    [0 1 2 3 4] 5 6 7 8 9 10
EITHER:
\(P(X \le 4)\) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4)
= 0.0282 + 0.1211 + 0.2335 + 0.2668 + 0.2001 (page 7)
= 0.8497
OR:
\(P(X \le 4)\) = 0.8497 (page 15)
The probability that no more than four are accepted is 0.8497.
(c) P(X > 4):
x    0 1 2 3 4 [5 6 7 8 9 10]
EITHER:
P(X > 4) = P(X = 5) + P(X = 6) + ... + P(X = 9) + P(X = 10)
= 0.1029 + 0.0368 + 0.0090 + 0.0014 + 0.0001 + 0.0000 (page 7)
= 0.1502
OR:
P(X > 4) = \(1 - P(X \le 4)\)
= 1 - 0.8497 (page 15)
= 0.1503
(The difference between the two answers is due to rounding.)
The probability that more than 4 are accepted is 0.1503.
(d) \(P(2 \le X < 6)\):
x    0 1 [2 3 4 5] 6 7 8 9 10
EITHER:
\(P(2 \le X < 6)\) = P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5)
= 0.2335 + 0.2668 + 0.2001 + 0.1029 (page 7)
= 0.8033
OR:
\(P(2 \le X < 6) = P(X \le 5) - P(X \le 1)\)
= 0.9527 - 0.1493 (page 15)
= 0.8034
(The difference between the two answers is due to rounding.)
The probability that at least 2 but fewer than 6 are accepted is 0.8034.
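The three-step method (list the values, mark those that meet the condition, sum their probabilities) can be sketched in Python as follows. This is an illustrative check of Example 6.11 rather than a replacement for the tables; binom_pmf and prob are names introduced here.

```python
from math import comb

def binom_pmf(x, n, p):
    """Equation (6.12): probability of exactly x successes in n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.30

def prob(condition):
    """Sum p(x) over the values x = 0..n that satisfy the condition."""
    return sum(binom_pmf(x, n, p) for x in range(n + 1) if condition(x))

exactly_4 = prob(lambda x: x == 4)          # part (a)
at_most_4 = prob(lambda x: x <= 4)          # part (b)
more_than_4 = prob(lambda x: x > 4)         # part (c)
from_2_to_5 = prob(lambda x: 2 <= x < 6)    # part (d)

for value in (exactly_4, at_most_4, more_than_4, from_2_to_5):
    print(round(value, 4))
```

Because the probabilities are summed before rounding, the results can differ in the fourth decimal place from answers built up from rounded table entries.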
Quick Question 9
Binomial probabilities for p > 0.5
A glance at the values of p in the UC Statistical Tables reveals a problem: the p values only go up to 0.5. You
may now want to know how we calculate probabilities when p > 0.5.
The trick here is to work with the failures. Remember that with each trial there are two possible outcomes:
success and failure. As the probability of success and the probability of failure sum to 1, it is not possible for
them both to have a probability of more than 0.5.
If we let X = the number of successes in n trials
Y = the number of failures in n trials
Then \(X \sim B(n, p)\)
and \(Y \sim B(n, 1 - p)\)
Furthermore, as each trial must result in a success or a failure
\(x + y = n\)
\(\therefore\ y = n - x\)
When X = x, it follows that Y = n - x. They are the same event and so they have the same probability. For
example, if in 10 trials there are four successes, then at the same time, there must be six failures. The four successes
and the six failures out of the 10 trials are two ways of looking at the same outcome and so they have the same
probability.
Thus we have
\(P(X = x) = P(Y = n - x)\)
and the tabulated probabilities for Y can be used to find the probabilities for X.
Example 6.12
In filling in their income tax returns, 70% of professionals understate their income. A tax inspector randomly
selects 12 tax returns to investigate.
Calculate the probability that:
(a) 10 of these 12 professionals have understated their incomes.
(b) 8 or more of these 12 professionals have understated their incomes.
(c) 9 or fewer of these 12 professionals have understated their incomes.
(d) between 6 and 9, inclusive, of these 12 professionals have understated their incomes.
Solution
Let X = number of returns out of 12 where income is understated.
Y = number of returns out of 12 where income is not understated.
Then we have
\(X \sim B(12, 0.70)\)
\(Y \sim B(12, 0.30)\)
and x + y = 12
If X = 0 then Y = 12, if X = 1 then Y = 11, etc. The full conversion table from X to Y is
x    0  1  2  3 4 5 6 7 8 9 10 11 12
y    12 11 10 9 8 7 6 5 4 3 2  1  0
In each part below, the values that meet the condition are marked with square brackets.
(a) P(X = 10)
x    0  1  2  3 4 5 6 7 8 9 [10] 11 12
y    12 11 10 9 8 7 6 5 4 3 [2]  1  0
P(X = 10) = P(Y = 2)
= 0.1678 (page 8)
The probability that 10 of these professionals have understated their income is 0.1678.
(b) \(P(X \ge 8)\)
x    0  1  2  3 4 5 6 7 [8 9 10 11 12]
y    12 11 10 9 8 7 6 5 [4 3 2  1  0]
EITHER:
\(P(X \ge 8) = P(Y \le 4)\)
= 0.7237 (page 16)
OR:
\(P(X \ge 8)\) = P(Y = 4) + P(Y = 3) + ... + P(Y = 0)
= 0.2311 + 0.2397 + 0.1678 + 0.0712 + 0.0138 (page 8)
= 0.7236
The probability that 8 or more of these professionals have understated their income is 0.7237.
(c) \(P(X \le 9)\)
x    [0  1  2  3 4 5 6 7 8 9] 10 11 12
y    [12 11 10 9 8 7 6 5 4 3] 2  1  0
EITHER:
\(P(X \le 9) = P(Y \ge 3)\)
\(= 1 - P(Y \le 2)\)
= 1 - 0.2528 (page 16)
= 0.7472
OR:
\(P(X \le 9)\) = P(Y = 12) + P(Y = 11) + ... + P(Y = 3)
= 0.0000 + 0.0000 + 0.0002 + ... + 0.2397 (page 8)
= 0.7472
The probability that 9 or fewer of these professionals have understated their income is 0.7472.
(d) \(P(6 \le X \le 9)\)
x    0  1  2  3 4 5 [6 7 8 9] 10 11 12
y    12 11 10 9 8 7 [6 5 4 3] 2  1  0
EITHER:
\(P(6 \le X \le 9) = P(3 \le Y \le 6)\)
\(= P(Y \le 6) - P(Y \le 2)\)
= 0.9614 - 0.2528 (page 16)
= 0.7086
OR:
\(P(6 \le X \le 9) = P(3 \le Y \le 6)\)
= P(Y = 6) + P(Y = 5) + P(Y = 4) + P(Y = 3)
= 0.0792 + 0.1585 + 0.2311 + 0.2397 (page 8)
= 0.7085
The probability that between 6 and 9 inclusive of these professionals have understated their incomes is 0.7086.
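The identity P(X = x) = P(Y = n - x) that underlies this example can be verified numerically. The sketch below (Python, illustrative names) checks the identity for every x and then recomputes the four answers by working with the failures Y, as the tables require.

```python
from math import comb

def binom_pmf(x, n, p):
    """Equation (6.12): probability of exactly x successes in n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 12, 0.70       # X ~ B(12, 0.70) counts understated returns
q = 1 - p             # Y ~ B(12, 0.30) counts returns that are not understated

# Largest discrepancy between P(X = x) and P(Y = n - x) over all x.
max_gap = max(abs(binom_pmf(x, n, p) - binom_pmf(n - x, n, q))
              for x in range(n + 1))

part_a = binom_pmf(2, n, q)                                # P(X = 10) = P(Y = 2)
part_b = sum(binom_pmf(y, n, q) for y in range(0, 5))      # P(X >= 8) = P(Y <= 4)
part_c = sum(binom_pmf(y, n, q) for y in range(3, n + 1))  # P(X <= 9) = P(Y >= 3)
part_d = sum(binom_pmf(y, n, q) for y in range(3, 7))      # P(6 <= X <= 9)

print(max_gap < 1e-12, [round(v, 4) for v in (part_a, part_b, part_c, part_d)])
```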
Quick Question 10
The mean and the variance for the binomial can be calculated using formulae (6.1) and (6.3) but there is an easier
way.
The mean and variance of a binomial distribution
For \(X \sim B(n, p)\)
\(\mu = np\) (6.13)
\(\sigma^2 = np(1 - p)\) (6.14)
Example 6.13
In filling in their income tax returns 70% of professionals understate their income. A tax inspector randomly
selects 12 tax returns to investigate.
Let X = number of returns where income is understated.
Calculate the mean and the variance of X.
Solution
\(X \sim B(12, 0.70)\)
\(\therefore\ \mu = 12 \times 0.70 = 8.4\)
\(\sigma^2 = 12 \times 0.70 \times 0.30 = 2.52\)
The mean of X is 8.4 and the variance is 2.52.
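To see that formulae (6.13) and (6.14) agree with the general formulae (6.1) and (6.3), both routes can be computed for Example 6.13. This is a minimal Python sketch with illustrative names:

```python
from math import comb

def binom_pmf(x, n, p):
    """Equation (6.12): probability of exactly x successes in n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 12, 0.70
pmf = {x: binom_pmf(x, n, p) for x in range(n + 1)}

# General formulae (6.1) and (6.3) applied to the full distribution.
mean_general = sum(x * px for x, px in pmf.items())
var_general = sum(x ** 2 * px for x, px in pmf.items()) - mean_general ** 2

# Binomial shortcuts (6.13) and (6.14).
mean_shortcut = n * p
var_shortcut = n * p * (1 - p)

print(round(mean_general, 4), round(var_general, 4))
print(round(mean_shortcut, 4), round(var_shortcut, 4))
```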
Quick Question 11
6.8 The Poisson distribution
The second type of experiment that we will discuss in this unit is the Poisson experiment. Consider the following
experiments:
the number of times a machine breaks down in a day is counted;
the number of people killed on the roads in the ACT in a week is counted;
the number of strikes in a year is counted;
the number of customers arriving at a supermarket checkout in a 10 minute period is counted;
the number of noxious weeds in a hectare of land is counted;
the number of errors on a page of these Course Notes is counted.
These are examples of Poisson experiments (subject to some conditions). In a Poisson experiment, the number
of times an outcome occurs in an interval is counted. The interval is usually an interval of time but it can be a
distance or an area. In the specification below, success refers to the occurrence of the outcome of interest, length
refers to the size of an interval and sub-interval refers to a part of the interval.
In the first experiment listed above a success is the machine breaking down, the length of the interval is a day
and a sub-interval is a minute (or second) in the day.
The Poisson experiment
1. There is an interval of known fixed length.
2. There is an average of \(\lambda\) successes in the interval.
3. The number of successes in a sub-interval is independent of the number of successes in any other
non-overlapping sub-interval.
4. The probability of a success occurring in a sub-interval is proportional to the length of the sub-interval.
5. As a sub-interval becomes smaller and smaller so the probability of two or more successes occurring in the
sub-interval approaches zero.
6. The number of successes X observed in the interval of known fixed length is counted.
The random variable X defined in the Poisson experiment is said to have a Poisson distribution.
The Poisson distribution has one parameter:
\(\lambda\) = the mean number of successes in the interval of known fixed length
If the random variable X has a Poisson probability distribution with parameter \(\lambda\), we write
\(X \sim Po(\lambda)\)
It can be shown that if \(X \sim Po(\lambda)\) then
\(p(x) = \frac{e^{-\lambda} \lambda^x}{x!}\) for x = 0, 1, 2, 3, . . . (6.15)
Note that in evaluating the above expression:
i. 0! is defined to be 1.
ii. \(k^0\) is 1 for any non-zero value of k.
iii. The number e that appears in the formula is one of the fundamental constants of mathematics. Its value is
approximately 2.71828.
You can calculate the value of p(x) for different values of \(\lambda\) and x quite easily using your calculator.
The UC Statistical Tables on pages 23 to 27 give the value of p(x) for a large number of different values of \(\lambda\) and
x.
Or-less-than cumulative Poisson probabilities, \(P(X \le x)\), are on pages 28 to 32 of the Tables.
Example 6.14
The Maths Department photocopier has an average of four paper jams in an eight hour day.
(a) Use Equation 6.15 to calculate the probability that one or two paper jams will occur during a given day.
(b) Use the UC Statistical Tables to calculate the probability that one or two paper jams will occur during a
given day.
(c) Find the probability that at most two paper jams will occur during the second hour of the day.
Solution
Let X = the number of paper jams in a day.
(a) Here the known fixed interval of time for the Poisson is a day. The mean number of paper jams in a day is
four.
\(\therefore\ X \sim Po(4)\)
Using Equation 6.15:
\(p(1) = \frac{e^{-4}\,4^1}{1!} = 4e^{-4} = 0.0733\)
\(p(2) = \frac{e^{-4}\,4^2}{2!} = 8e^{-4} = 0.1465\)
\(\therefore\ p(1) + p(2) = 0.2198\)
There is a probability of 0.2198 that there are one or two paper jams in a day.
(b) On page 24 of the UC Statistical Tables, from the column \(\lambda = 4.0\) and rows x = 1 and x = 2
p(1) = 0.0733
p(2) = 0.1465
\(\therefore\ p(1) + p(2) = 0.2198\)
There is a probability of 0.2198 that there are one or two paper jams in a day.
(c) Let Y = the number of paper jams in the second hour.
Here the fixed interval of time for the Poisson is one hour. The mean number of paper jams per hour is
4/8 = 0.5.
\(\therefore\ Y \sim Po(0.5)\)
On page 23 of the UC Statistical Tables, from the column \(\lambda = 0.5\)
\(P(Y \le 2)\) = P(Y = 0) + P(Y = 1) + P(Y = 2)
= 0.6065 + 0.3033 + 0.0758
= 0.9856
Or from page 28 of the UC Statistical Tables, from the column \(\lambda = 0.5\)
\(P(Y \le 2)\) = 0.9856
The probability that there are at most 2 paper jams in the second hour is 0.9856.
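Equation (6.15) can also be evaluated directly, which is a useful check on table look-ups. The Python sketch below recomputes Example 6.14; the function name poisson_pmf is chosen here for illustration.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Equation (6.15): probability of exactly x successes when the mean is lam."""
    return exp(-lam) * lam ** x / factorial(x)

# Parts (a) and (b): one or two paper jams in a day, X ~ Po(4).
one_or_two = poisson_pmf(1, 4) + poisson_pmf(2, 4)

# Part (c): at most two jams in one hour; the mean is rescaled to 4/8 = 0.5.
lam_hour = 4 / 8
at_most_two = sum(poisson_pmf(y, lam_hour) for y in range(3))

print(round(one_or_two, 4), round(at_most_two, 4))
```

The key step in part (c) is rescaling the mean to match the length of the interval of interest before applying the formula.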
Quick Question 12
One of the main uses of the Poisson distribution is to calculate approximate binomial probabilities. Binomial
tables are available only for small values of n but the Poisson tables can be used to approximate binomial
probabilities for large n provided the value of p is small. The result here is as follows:
Poisson approximation to the binomial
If \(X \sim B(n, p)\) with
i. n large: n > 25
and ii. p small: \(np \le 5\)
then \(X \approx Po(np)\)
(Read the symbol \(\approx\) as is approximately distributed as).
Example 6.15
An insurance company knows that on average only 1% of policy holders make a major insurance claim in a year.
An investigator examines the claims made by 100 randomly selected policy-holders. What is the probability that
in the past year
(a) only one of these policy holders had made a major claim?
(b) two or more of these policy holders had made major claims?
Solution
Let X = number of major claims in the sample of 100.
Then \( X \sim B(100, 0.01) \)
Here i. \( n = 100 > 25 \)
and ii. \( np = 100 \times 0.01 = 1.0 \le 5 \)
and so the Poisson approximation to the binomial can be used.
\( \therefore X \mathrel{\dot\sim} \mathrm{Po}(100 \times 0.01) \), that is, \( X \mathrel{\dot\sim} \mathrm{Po}(1.0) \)
(a) \( P(X = 1) \doteq 0.3679 \) (page 23)
There is a probability of approximately 0.368 that only one of the policy holders made a major claim.
(b) \( P(X \ge 2) = 1 - P(X \le 1) \)
\( = 1 - [P(X = 0) + P(X = 1)] \)
\( \doteq 1 - [0.3679 + 0.3679] \) (page 23)
\( \therefore P(X \ge 2) \doteq 0.2642 \)
There is a probability of approximately 0.264 that two or more of the policy holders made a major claim.
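To see how close the approximation is in this example, the exact binomial probabilities can be computed alongside the Poisson ones. This is a rough sketch only; the helper names binomial_pmf and poisson_pmf are illustrative assumptions, not functions from the unit or the UC Statistical Tables.

```python
from math import comb, exp, factorial

n, p = 100, 0.01  # sample size and claim probability from Example 6.15

def binomial_pmf(x):
    """Exact P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, mean=n * p):
    """Approximate P(X = x) using X ~ Po(np)."""
    return exp(-mean) * mean**x / factorial(x)

exact_one = binomial_pmf(1)
approx_one = poisson_pmf(1)
exact_two_plus = 1 - binomial_pmf(0) - binomial_pmf(1)
approx_two_plus = 1 - poisson_pmf(0) - poisson_pmf(1)

print(exact_one, approx_one)
print(exact_two_plus, approx_two_plus)
```

For these values of n and p the exact and approximate answers differ by less than 0.005, which is why the Poisson tables are an acceptable substitute here.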
Quick Question 13
The mean and variance of a Poisson distribution could not be easier to calculate!
The mean and variance of a Poisson distribution
For \( X \sim \mathrm{Po}(\lambda) \)
\( \mu = \lambda \) (6.16)
\( \sigma^2 = \lambda \) (6.17)
Example 6.16
The School of Mathematics photocopier has an average of four paper jams in an eight hour day. What are the mean and the variance of the number of paper jams in a 5 day week?
Solution
Let X = the number of paper jams in a week. With an average of four paper jams per day, the mean for a 5 day week is \( 5 \times 4 = 20 \),
then \( X \sim \mathrm{Po}(20) \)
\( \therefore \mu = 20 \)
and \( \sigma^2 = 20 \)
There is an average of 20 paper jams per week, and the variance is 20 (paper jams per week)\(^2\).
6.9 Joint probability distributions
So far in this unit you have only looked at experiments for which each outcome is a single number. Many
experiments have outcomes of two or more numbers and the centre of interest is often the relationship between
the numbers. This is illustrated in the following example.
Example 6.17
An entrepreneur has purchased two adjoining stores. Store A sells groceries and store B sells furniture. The profits from these stores vary from month to month.
Let X = profit in a month from store A ($000)
Y = profit in a month from store B ($000)
T = the total profit from the two stores in a month ($000)
\( \therefore t = x + y \)
Over a period of 20 months the entrepreneur records the profit of each of the two stores. The recorded profits are shown in Table 6.11.
Table 6.11: The Monthly Profits of Two Stores ($000)
Month   A profit (X)   B profit (Y)      Month   A profit (X)   B profit (Y)
1       -1             0                 11      -1             0
2        2             1                 12       0             0
3        0             1                 13       2             1
4       -1             1                 14      -1             1
5       -1             0                 15       0             1
6        0             1                 16       2             1
7        2             1                 17       0             1
8        0             1                 18       0             1
9        0             0                 19      -1             0
10       0             1                 20       0             1
(a) What is the probability that store A makes a profit of $2000 and store B makes a profit of $1000 in the same month?
(b) What is the probability that store A makes a profit of $0 and store B makes a profit of $1000 in the same month?
(c) What is the probability that the total profit of the two stores is negative in a month?
Solution
(a) Scanning down the list we can see that in 4 of the 20 months (months 2, 7, 13 and 16) store A made a profit of $2000 and store B made a profit of $1000. Using the relative frequency approach to probability we have:
\( P[(X = 2) \text{ and } (Y = 1)] = \frac{4}{20} = 0.20 \)
(b) In 8 of the 20 months (months 3, 6, 8, 10, 15, 17, 18 and 20) store A made a profit of $0 and store B made a profit of $1000.
\( P[(X = 0) \text{ and } (Y = 1)] = \frac{8}{20} = 0.40 \)
(c) The total profit in each month is found by summing the profits of the two stores. In 4 of the 20 months (months 1, 5, 11 and 19) the total profit is negative.
\( P[X + Y < 0] = \frac{4}{20} = 0.20 \)
In both (a) and (b) in the above example we found the probability that in a month both store A and store B had the specified profits. In general, with two random variables X and Y, the probability that both X = x and Y = y is called the joint probability of x and y and is denoted by p(x, y).
p(x, y) = the joint probability that in an experiment both X = x and Y = y.
In the above example p(2, 1) = 0.20 and p(0, 1) = 0.40.
Joint probability distribution: The joint probability distribution for the two discrete random variables X and Y is a two-way table which lists all the possible values of X and Y and gives the probability of their joint occurrence:
\( p(x, y) = P(X = x, Y = y) \quad \forall (x, y) \)
(In the above definition the symbol \( \forall (x, y) \) is shorthand for "for all values of x and y".)
A joint probability distribution is a probability distribution and so has the same properties as every other probability distribution. To find the probability of any event, sum the probabilities of the outcomes in the event in exactly the same way as you did in section 6.4.
Conditions for joint probability distributions
The probabilities of a joint probability distribution must satisfy:
(a) \( 0 \le p(x, y) \le 1 \quad \forall (x, y) \)
(b) \( \sum_{x,y} p(x, y) = 1 \)
Finding the probability of a set of values
\( P(A) = \sum_{(x,y) \in A} p(x, y) \)
Example 6.18
Calculate the joint probability distribution of X and Y for the data in Table 6.11.
Solution
Scanning down Table 6.11 we see that the only values of X are -1, 0 and 2. The only values of Y in the table are 0 and 1. Form a two-way table with the X values along the top and the Y values down the side.
Table 6.12: Constructing a Joint Probability Distribution
         x = -1   x = 0   x = 2
y = 0
y = 1
Now work through the different combinations of x and y and find the probabilities using the relative frequency approach to probability.
i. p(-1, 0)
There are four months out of 20 with X = -1 and Y = 0.
\( \therefore P(X = -1, Y = 0) = \frac{4}{20} = 0.2 \)
ii. p(0, 0)
There are two months out of 20 with X = 0 and Y = 0.
\( \therefore P(X = 0, Y = 0) = \frac{2}{20} = 0.1 \)
Continuing in this way we obtain the following joint probability distribution.
Table 6.13: The Joint Probability Distribution of Profits
         x = -1   x = 0   x = 2
y = 0    0.2      0.1     0.0
y = 1    0.1      0.4     0.2
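The counting in this example can be automated. The sketch below assumes the monthly profits of Table 6.11 are typed in as two lists (store A losses entered as -1); the variable names are illustrative only.

```python
from collections import Counter

# Monthly profits ($000) for months 1 to 20, as read from Table 6.11.
a_profit = [-1, 2, 0, -1, -1, 0, 2, 0, 0, 0, -1, 0, 2, -1, 0, 2, 0, 0, -1, 0]
b_profit = [0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1]

# Count how often each (x, y) pair occurs, then convert to relative frequencies.
counts = Counter(zip(a_profit, b_profit))
n_months = len(a_profit)
joint = {xy: count / n_months for xy, count in counts.items()}

for (x, y), p in sorted(joint.items()):
    print(f"p({x}, {y}) = {p}")
```

Pairs that never occur, such as (2, 0), are simply absent from the dictionary; their joint probability is 0.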
Notice that the above joint probability distribution does satisfy the conditions listed earlier:
(a) \( 0 \le p(x, y) \le 1 \quad \forall (x, y) \)
All the probabilities lie between 0 and 1. Probabilities must always lie in this range!
(b) \( \sum_{x,y} p(x, y) = 1 \)
The sum of the probabilities must be 1.
0.2 + 0.1 + 0.0 + 0.1 + 0.4 + 0.2 = 1.0
The probabilities in the joint probability distribution are the probabilities of each outcome and, although they are written in a table rather than a single column, they can be manipulated in just the same way as the probabilities of a single random variable. In particular, to find the probability of any event, sum the probabilities of the included outcomes.
Example 6.19
For the joint probability distribution of Example 6.18, find the probability:
(a) that store A just breaks even.
(b) that store B makes a profit.
(c) that the total profit of the two stores is positive.
Solution
(a) P(X = 0)
The section of the joint probability distribution with X = 0 is the middle column of the table:
         x = -1   x = 0   x = 2
y = 0    0.2      0.1     0.0
y = 1    0.1      0.4     0.2
\( \therefore P(X = 0) = 0.1 + 0.4 = 0.5 \)
The probability that store A just breaks even is 0.5.
(b) P(Y > 0)
The section of the joint probability distribution with Y > 0 is the bottom row (y = 1) of the table.
\( \therefore P(Y > 0) = 0.1 + 0.4 + 0.2 = 0.7 \)
The probability that store B makes a profit is 0.7.
(c) P(X + Y > 0)
The total profit for each cell in the table is the sum of the X and Y values for the cell. For example, in the cell in the last row and column X = 2 and Y = 1, and so the total profit is X + Y = 3. In the table below the total profit has been calculated for each cell and entered in brackets.
         x = -1     x = 0     x = 2
y = 0    0.2 (-1)   0.1 (0)   0.0 (2)
y = 1    0.1 (0)    0.4 (1)   0.2 (3)
The cells with X + Y > 0 are (x, y) = (2, 0), (0, 1) and (2, 1).
\( \therefore P(X + Y > 0) = 0.0 + 0.4 + 0.2 = 0.6 \)
The probability that the total profit of the two stores is positive is 0.6.
Marginal probability distributions
The separate probability distributions of X and Y can be found from the joint probability distribution by summing the probabilities in the same way as described in the previous section. When the probability distribution of one variable is calculated from a joint probability distribution it is the marginal probability distribution of that variable. A marginal probability distribution is a probability distribution of the type described in section 6.4 earlier in this unit.
Example 6.20
For the data of Example 6.18 calculate the marginal probability distribution of the profit from store A.
Solution
From the top row of the table we can see that the possible values of X are -1, 0 and 2.
         x = -1   x = 0   x = 2
y = 0    0.2      0.1     0.0
y = 1    0.1      0.4     0.2
To find P(X = -1) sum the probabilities of the outcomes with X = -1. Similarly for P(X = 0) and P(X = 2).
Summing the probabilities gives:
P(X = -1) = 0.2 + 0.1 = 0.3 (summing the first column)
P(X = 0) = 0.1 + 0.4 = 0.5 (summing the second column)
P(X = 2) = 0.0 + 0.2 = 0.2 (summing the third column)
Writing these probabilities in a probability distribution gives:
Table 6.18: The Marginal Probability Distribution of X
x        p_X(x)
-1       0.3
0        0.5
2        0.2
Total    1.0
The marginal probability distribution of Y can be found in the same way, this time by summing along the rows of the table.
Note that to distinguish between the marginal probability distribution of X and the marginal probability distribution of Y we will use \( p_X(\cdot) \) for the marginal probability distribution of X and \( p_Y(\cdot) \) for the marginal probability distribution of Y.
It is often convenient to display the marginal probabilities as a border to the joint probability distribution as shown in Table 6.19.
Table 6.19: The Joint and Marginal Probability Distributions
         x = -1   x = 0   x = 2   p_Y(y)
y = 0    0.2      0.1     0.0     0.3
y = 1    0.1      0.4     0.2     0.7
p_X(x)   0.3      0.5     0.2     1.0
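Summing down the columns and along the rows is easy to express in code. The following sketch stores the joint distribution of Table 6.13 as a dictionary keyed by (x, y); that representation and the function name marginals are assumptions made for illustration.

```python
# Joint distribution of Table 6.13, keyed by (x, y).
joint = {(-1, 0): 0.2, (0, 0): 0.1, (2, 0): 0.0,
         (-1, 1): 0.1, (0, 1): 0.4, (2, 1): 0.2}

def marginals(joint):
    """Return (p_X, p_Y): column sums and row sums of the joint table."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0.0) + p  # add to the column total for this x
        p_y[y] = p_y.get(y, 0.0) + p  # add to the row total for this y
    return p_x, p_y

p_x, p_y = marginals(joint)
print({x: round(p, 2) for x, p in p_x.items()})
print({y: round(p, 2) for y, p in p_y.items()})
```

The rounding in the print statements only hides floating-point noise; the totals correspond to the border of Table 6.19.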
Independence
In unit 4 we examined the relationship between two variables by using scatterplots, rank correlation coefficients and correlation coefficients. The most important question asked was "is there any relationship between these two variables?". If there is no relationship between two variables they are said to be independent. We are now in a position to give a precise definition to the term independent for random variables.
Recall that events A and B are said to be independent if
\( P(A \text{ and } B) = P(A) \times P(B) \)
This result can be applied directly to random variables. If the random variable X is independent of the random variable Y, then the value of X will have no effect on the value of Y. Thus the events X = x and Y = y are independent events.
Independence for random variables: The random variables X and Y are said to be independent if:
\( P(X = x, Y = y) = P(X = x) \times P(Y = y) \quad \forall (x, y) \)
Notice that for the random variables X and Y to be independent the above relationship must hold for every value of x and y. If there is a single value of x and y for which it does not hold then the variables X and Y are not independent.
The most convenient way of checking for the independence of two random variables is to write the joint and marginal probability distributions in a single table as we did in Table 6.19:
P(X = x, Y = y) is given in the central joint probability part of the table.
P(X = x) is the marginal probability at the bottom of the same column.
P(Y = y) is the marginal probability at the end of the same row.
The random variables X and Y are independent if for every entry in the joint probability table:
\( P(X = x, Y = y) = P(X = x) \times P(Y = y) \)
joint probability = (column marginal probability) \( \times \) (row marginal probability).
         x = 0              ...   x                  ...   p_Y(y)
y = 0    P(X = 0, Y = 0)    ...   P(X = x, Y = 0)    ...   P(Y = 0)
...      ...                      ...                      ...
y        P(X = 0, Y = y)    ...   P(X = x, Y = y)    ...   P(Y = y)
...      ...                      ...                      ...
p_X(x)   P(X = 0)           ...   P(X = x)           ...   1.0
Example 6.21
Are the random variables X and Y in Table 6.21 independent?
         x = -1   x = 0   x = 2   p_Y(y)
y = 0    0.2      0.1     0.0     0.3
y = 1    0.1      0.4     0.2     0.7
p_X(x)   0.3      0.5     0.2     1.0
Solution
i. Does \( P(X = -1, Y = 0) = P(X = -1) \times P(Y = 0) \)?
\( P(X = -1, Y = 0) = 0.2 \)
\( P(X = -1) \times P(Y = 0) = 0.3 \times 0.3 = 0.09 \)
\( \therefore P(X = -1, Y = 0) \ne P(X = -1) \times P(Y = 0) \)
The random variables X and Y are not independent.
Example 6.22
Are the random variables X and Y in Table 6.22 independent?
         x = 2    x = 4    p_Y(y)
y = 1    0.32     0.08     0.40
y = 2    0.48     0.12     0.60
p_X(x)   0.80     0.20     1.0
Solution
i. Does \( P(X = 2, Y = 1) = P(X = 2) \times P(Y = 1) \)?
\( P(X = 2, Y = 1) = 0.32 \)
\( P(X = 2) \times P(Y = 1) = 0.80 \times 0.40 = 0.32 \)
\( \therefore P(X = 2, Y = 1) = P(X = 2) \times P(Y = 1) \)
ii. Does \( P(X = 4, Y = 1) = P(X = 4) \times P(Y = 1) \)?
\( P(X = 4, Y = 1) = 0.08 \)
\( P(X = 4) \times P(Y = 1) = 0.20 \times 0.40 = 0.08 \)
\( \therefore P(X = 4, Y = 1) = P(X = 4) \times P(Y = 1) \)
iii. Does \( P(X = 2, Y = 2) = P(X = 2) \times P(Y = 2) \)?
\( P(X = 2, Y = 2) = 0.48 \)
\( P(X = 2) \times P(Y = 2) = 0.80 \times 0.60 = 0.48 \)
\( \therefore P(X = 2, Y = 2) = P(X = 2) \times P(Y = 2) \)
iv. Does \( P(X = 4, Y = 2) = P(X = 4) \times P(Y = 2) \)?
\( P(X = 4, Y = 2) = 0.12 \)
\( P(X = 4) \times P(Y = 2) = 0.20 \times 0.60 = 0.12 \)
\( \therefore P(X = 4, Y = 2) = P(X = 4) \times P(Y = 2) \)
We have shown that
\( P(X = x, Y = y) = P(X = x) \times P(Y = y) \) for all (x, y).
X and Y are independent random variables. The value of X has no effect on the value of Y and the value of Y has no effect on the value of X.
Notice
to show that X and Y are not independent it is only necessary to find one value of X and Y for which
\( P(X = x, Y = y) \ne P(X = x) \times P(Y = y) \).
to show that X and Y are independent it is necessary to show that for every value of X and Y it is true that
\( P(X = x, Y = y) = P(X = x) \times P(Y = y) \).
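The cell-by-cell check used in Examples 6.21 and 6.22 can be sketched as a short function. The dictionary representation of the tables and the name is_independent are illustrative assumptions; a small tolerance is used because decimal probabilities are not stored exactly in floating point.

```python
from itertools import product
from math import isclose

def is_independent(joint):
    """True if every joint probability equals the product of its marginals."""
    xs = sorted({x for x, _ in joint})
    ys = sorted({y for _, y in joint})
    p_x = {x: sum(joint.get((x, y), 0.0) for y in ys) for x in xs}
    p_y = {y: sum(joint.get((x, y), 0.0) for x in xs) for y in ys}
    return all(
        isclose(joint.get((x, y), 0.0), p_x[x] * p_y[y], abs_tol=1e-9)
        for x, y in product(xs, ys)
    )

# Table 6.21 (the two stores) and Table 6.22, keyed by (x, y).
table_6_21 = {(-1, 0): 0.2, (0, 0): 0.1, (2, 0): 0.0,
              (-1, 1): 0.1, (0, 1): 0.4, (2, 1): 0.2}
table_6_22 = {(2, 1): 0.32, (4, 1): 0.08, (2, 2): 0.48, (4, 2): 0.12}

print(is_independent(table_6_21), is_independent(table_6_22))
```

Because all() stops at the first failing cell, the function mirrors the point made above: one counterexample is enough to rule out independence, while confirming it requires every cell to pass.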
Expected values for joint probability distributions
In section 6.6 you learned of the expected value of a function of a single random variable. Remember that the expected value is just the average value. Expected values are also useful for joint probability distributions.
Example 6.23
An entrepreneur has calculated the following joint probability distribution for the profits in a month in two stores.
         x = -1   x = 0   x = 2
y = 0    0.2      0.1     0.0
y = 1    0.1      0.4     0.2
where X = profit in a month from store A ($000)
and Y = profit in a month from store B ($000)
What is the interpretation of E[X], E[Y] and E[X + Y]?
Solution
E[X] = the mean profit per month from store A in thousand dollars.
E[Y] = the mean profit per month from store B in thousand dollars.
E[X + Y] = the mean profit per month from the two stores in thousand dollars.
The definition of expected values for joint probability distributions is a simple extension of the definition that you used before for probability distributions.
Expected values for joint probability distributions: If X and Y are two random variables with joint probability distribution p(x, y) and g(X, Y) is any function of X and Y then
\( E[g(X, Y)] = \sum_{x} \sum_{y} g(x, y)\,p(x, y) \)
is called the expected value of g(X, Y).
Although this definition looks complicated at first, it is very easy to apply.
To find the expected value of the function g(X, Y):
1. For each probability in the joint probability table find the value of the function i.e. calculate g(x, y).
2. Multiply the value of the function by the probability of the point i.e. calculate g(x, y)p(x, y).
3. Sum the results i.e. calculate \( \sum g(x, y)\,p(x, y) \).
This process is illustrated in the following example.
Example 6.24
         x = -1   x = 0   x = 2
y = 0    0.2      0.1     0.0
y = 1    0.1      0.4     0.2
where X = profit in a month from store A ($000)
and Y = profit in a month from store B ($000)
Calculate:
(a) E[X]
(b) E[Y]
(c) E[X + Y]
(d) E[XY]
Solution
(a) E[X]
The value of X for each probability in the joint probability table is shown in brackets in Table 6.25 below.
Table 6.25: Calculating E[X]
         x = -1     x = 0     x = 2
y = 0    0.2 (-1)   0.1 (0)   0.0 (2)
y = 1    0.1 (-1)   0.4 (0)   0.2 (2)
\( \therefore E[X] = (-1) \times 0.2 + 0 \times 0.1 + 2 \times 0.0 + (-1) \times 0.1 + 0 \times 0.4 + 2 \times 0.2 = 0.1 \)
Store A has a mean profit of $100 per month.
You now know 3 ways of calculating this figure!
1. From the raw data in Table 6.11. Calculate the mean profit for store A by using
\( \mu = \dfrac{\sum x}{N} = \dfrac{2}{20} = 0.1 \)
2. From the marginal probability distribution of X in Table 6.18
\( E[X] = \sum x\,p_X(x) = (-1) \times 0.3 + 0 \times 0.5 + 2 \times 0.2 = 0.1 \)
3. From the joint probability distribution:
\( E[X] = \sum x\,p(x, y) = 0.1 \)
(b) E[Y]
The value of Y for each probability in the joint probability table is shown in brackets in Table 6.26 below.
Table 6.26: Calculating E[Y]
         x = -1    x = 0     x = 2
y = 0    0.2 (0)   0.1 (0)   0.0 (0)
y = 1    0.1 (1)   0.4 (1)   0.2 (1)
\( \therefore E[Y] = 0 \times 0.2 + 0 \times 0.1 + 0 \times 0.0 + 1 \times 0.1 + 1 \times 0.4 + 1 \times 0.2 = 0.7 \)
Store B has a mean profit of $700 per month.
(c) E[X + Y]
The value of X + Y for each probability in the joint probability table is shown in brackets in Table 6.27 below.
Table 6.27: Calculating E[X + Y]
         x = -1     x = 0     x = 2
y = 0    0.2 (-1)   0.1 (0)   0.0 (2)
y = 1    0.1 (0)    0.4 (1)   0.2 (3)
(For example, for the probability of 0.2 in the last row and column of the table: in this column X = 2 and in this row Y = 1. Thus X + Y = 3.)
\( \therefore E[X + Y] = (-1) \times 0.2 + 0 \times 0.1 + 2 \times 0.0 + 0 \times 0.1 + 1 \times 0.4 + 3 \times 0.2 = 0.8 \)
The combined profit of the two stores has a mean of $800 per month.
(d) E[XY]
The value of XY for each probability in the joint probability table is shown in brackets in Table 6.28 below.
Table 6.28: Calculating E[XY]
         x = -1     x = 0     x = 2
y = 0    0.2 (0)    0.1 (0)   0.0 (0)
y = 1    0.1 (-1)   0.4 (0)   0.2 (2)
\( \therefore E[XY] = 0 \times 0.2 + 0 \times 0.1 + 0 \times 0.0 + (-1) \times 0.1 + 0 \times 0.4 + 2 \times 0.2 = 0.3 \)
The product of the profits of the two stores has a mean of 300,000 \( \$^2 \) per month.
(This may appear to be a meaningless calculation but we will see how to make use of this value later in the section.)
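The three-step recipe for E[g(X, Y)] translates directly into one line of code. This is a sketch only: the dictionary layout of the joint table and the name expected_value are assumptions introduced for illustration.

```python
# Joint distribution of the two stores' profits ($000), keyed by (x, y).
joint = {(-1, 0): 0.2, (0, 0): 0.1, (2, 0): 0.0,
         (-1, 1): 0.1, (0, 1): 0.4, (2, 1): 0.2}

def expected_value(g, joint):
    """E[g(X, Y)]: evaluate g in each cell, weight by p(x, y), and sum."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

e_x = expected_value(lambda x, y: x, joint)        # part (a)
e_y = expected_value(lambda x, y: y, joint)        # part (b)
e_sum = expected_value(lambda x, y: x + y, joint)  # part (c)
e_xy = expected_value(lambda x, y: x * y, joint)   # part (d)

print(round(e_x, 2), round(e_y, 2), round(e_sum, 2), round(e_xy, 2))
```

Passing the function g as an argument means the same routine covers every part of Example 6.24; only the lambda changes.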
Measuring the relationship between two random variables
One of the major interests with joint probability distributions is the relationship between the two variables.
If the random variable X has a high value, does this make it more or less likely that the variable Y has a high value? In terms of the example above: if store A has a profitable month, does this make it more or less likely that store B has a profitable month?
Similarly, if the random variable X has a low value, does this make it more or less likely that the variable Y has a low value? In terms of the example above: if store A has a low profit month, does this make it more or less likely that store B has a low profit month?
In unit 4 you saw how to use the covariance and the correlation coefficient to measure the relationship between two variables. The definition below restates the definition of the covariance in probability terms.
The covariance of X and Y:
\( \sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)] \)
where
\( \mu_X = E[X] \) = the mean of the random variable X
\( \mu_Y = E[Y] \) = the mean of the random variable Y
is called the covariance of X and Y.
The covariance has the same interpretation as in unit 4. It measures the direction of the relationship between the random variables X and Y but gives no indication of the strength of the relationship.
Example 6.25
Calculate the covariance between the profits in store A and the profits in store B for the joint probability distribution in Example 6.18.
Solution
In Example 6.24 we calculated:
\( \mu_X = E[X] = 0.1 \)
\( \mu_Y = E[Y] = 0.7 \)
The value of \( (x - \mu_X)(y - \mu_Y) \) for each probability in the joint probability table is shown in brackets in Table 6.29 below.
Table 6.29: Calculating \( E[(X - \mu_X)(Y - \mu_Y)] \)
         x = -1        x = 0         x = 2
y = 0    0.2 (+0.77)   0.1 (+0.07)   0.0 (-1.33)
y = 1    0.1 (-0.33)   0.4 (-0.03)   0.2 (+0.57)
(For example, for the probability of 0.2 in the last row and column of the table:
In this column X = 2, \( \therefore (x - \mu_X) = 2 - 0.1 = 1.9 \)
In this row Y = 1, \( \therefore (y - \mu_Y) = 1 - 0.7 = 0.3 \)
\( \therefore (x - \mu_X)(y - \mu_Y) = 1.9 \times 0.3 = 0.57 \))
\( \therefore E[(X - \mu_X)(Y - \mu_Y)] = (+0.77) \times 0.2 + (+0.07) \times 0.1 + (-1.33) \times 0.0 + (-0.33) \times 0.1 + (-0.03) \times 0.4 + (+0.57) \times 0.2 = 0.23 \)
The covariance between the profits in the two stores is +230,000 \( \$^2 \). As the covariance is positive, on average the profits of the two stores tend to increase together.
Points to note about the covariance are:
1. The calculation used in the previous example is a highly inefficient method of calculating the covariance. To calculate the covariance always use the computational formula:
\( \sigma_{XY} = E[XY] - E[X]\,E[Y] \)
2. It is the sign of the covariance that is important.
A positive covariance shows that as the value of the random variable X increases the average value of the Y variable increases.
A negative covariance shows that as the value of the random variable X increases the average value of the Y variable decreases.
3. If X and Y are independent the covariance of X and Y is 0. If the covariance of X and Y is not 0, the variables X and Y are not independent.
4. However, if the covariance of X and Y is 0 this does not show that the variables are independent. A covariance of 0 is a necessary but not a sufficient condition for independence. To show independence use the method of the previous section.
Example 6.26
Use the computational formula to calculate the covariance between the profits in store A and the profits in store B for the joint probability distribution in Example 6.18.
Solution
In Example 6.24 we calculated:
E[X] = 0.1
E[Y] = 0.7
E[XY] = 0.3
\( \therefore \sigma_{XY} = E[XY] - E[X]\,E[Y] \)
\( = 0.3 - 0.1 \times 0.7 = 0.23 \)
The covariance between the profits in the two stores is +230,000 \( \$^2 \). On average the profits of the two stores tend to increase together.
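That the definition and the computational formula give the same covariance can be confirmed numerically. The sketch below reuses the joint table of the two stores; the helper name expect is an illustrative assumption.

```python
# Joint distribution of the two stores' profits ($000), keyed by (x, y).
joint = {(-1, 0): 0.2, (0, 0): 0.1, (2, 0): 0.0,
         (-1, 1): 0.1, (0, 1): 0.4, (2, 1): 0.2}

def expect(g):
    """E[g(X, Y)] for the joint table above."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)

# Definition: E[(X - mu_X)(Y - mu_Y)].
cov_definition = expect(lambda x, y: (x - mu_x) * (y - mu_y))

# Computational formula: E[XY] - E[X]E[Y].
cov_shortcut = expect(lambda x, y: x * y) - mu_x * mu_y

print(round(cov_definition, 4), round(cov_shortcut, 4))
```

Both routes should print 0.23; the computational formula simply needs fewer intermediate values when working by hand.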
The correlation coefficient gives a measure of the strength of the linear relationship between two random variables. The definition and interpretation are exactly the same as given in Unit 4.
The correlation coefficient: The correlation between the random variables X and Y is denoted by \( \rho_{XY} \) and is calculated by:
\( \rho_{XY} = \dfrac{\sigma_{XY}}{\sigma_X \sigma_Y} \)
where
\( \sigma_{XY} \) = the covariance of the random variables X and Y
\( \sigma_X \) = the standard deviation of the random variable X
\( \sigma_Y \) = the standard deviation of the random variable Y
For all random variables we have \( -1 \le \rho_{XY} \le +1 \).
Table 6.30: Using the Correlation Coefficient to Categorise Linear Relationships
Correlation coefficient              Strength of linear relationship
\( \rho_{XY} = +1 \)                 perfect positive linear relationship
\( +0.7 \le \rho_{XY} < +1.0 \)      strong positive linear relationship
\( +0.4 \le \rho_{XY} < +0.7 \)      medium positive linear relationship
\( -0.4 < \rho_{XY} < +0.4 \)        little or no linear relationship
\( -0.7 < \rho_{XY} \le -0.4 \)      medium negative linear relationship
\( -1.0 < \rho_{XY} \le -0.7 \)      strong negative linear relationship
\( \rho_{XY} = -1 \)                 perfect negative linear relationship
Example 6.27
         x = -1   x = 0   x = 2
y = 0    0.2      0.1     0.0
y = 1    0.1      0.4     0.2
What is the correlation between the profits in the two stores?
Solution
The full calculations are laid out in a systematic way below.
First calculate the marginal probability distributions of X and Y by summing the probabilities down the columns and along the rows (see Example 6.20).
         x = -1   x = 0   x = 2   p_Y(y)
y = 0    0.2      0.1     0.0     0.3
y = 1    0.1      0.4     0.2     0.7
p_X(x)   0.3      0.5     0.2     1.0
Now use the marginal distributions to calculate the variances of X and Y.
\( E[X] = (-1) \times 0.3 + 0 \times 0.5 + 2 \times 0.2 = +0.1 \)
\( E[X^2] = (-1)^2 \times 0.3 + 0^2 \times 0.5 + 2^2 \times 0.2 = +1.1 \)
\( \therefore \sigma_X^2 = E[X^2] - (E[X])^2 = 1.1 - 0.1^2 = +1.09 \)
\( E[Y] = 0 \times 0.3 + 1 \times 0.7 = +0.7 \)
\( E[Y^2] = 0^2 \times 0.3 + 1^2 \times 0.7 = +0.7 \)
\( \therefore \sigma_Y^2 = E[Y^2] - (E[Y])^2 = 0.7 - 0.7^2 = +0.21 \)
Now use the joint probability distribution to calculate the covariance between the two random variables.
\( E[XY] = 0 \times 0.2 + 0 \times 0.1 + 0 \times 0.0 + (-1) \times 0.1 + 0 \times 0.4 + 2 \times 0.2 = 0.3 \)
\( \therefore \sigma_{XY} = E[XY] - E[X]\,E[Y] = 0.3 - 0.1 \times 0.7 = 0.23 \)
\( \therefore \rho_{XY} = \dfrac{\sigma_{XY}}{\sigma_X \sigma_Y} = \dfrac{0.23}{\sqrt{1.09} \times \sqrt{0.21}} = +0.48 \)
There is a medium strength positive linear relationship between the profits in the two stores.
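The whole calculation (means, variances, covariance, correlation) can be laid out in a few lines. This is a sketch under the same assumptions as before: the joint table is held in a dictionary and expect is an illustrative helper name.

```python
from math import sqrt

# Joint distribution of the two stores' profits ($000), keyed by (x, y).
joint = {(-1, 0): 0.2, (0, 0): 0.1, (2, 0): 0.0,
         (-1, 1): 0.1, (0, 1): 0.4, (2, 1): 0.2}

def expect(g):
    """E[g(X, Y)] for the joint table above."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mean_x, mean_y = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: x**2) - mean_x**2   # E[X^2] - (E[X])^2
var_y = expect(lambda x, y: y**2) - mean_y**2   # E[Y^2] - (E[Y])^2
cov_xy = expect(lambda x, y: x * y) - mean_x * mean_y

rho = cov_xy / sqrt(var_x * var_y)
print(round(var_x, 2), round(var_y, 2), round(rho, 2))
```

Working from the joint table rather than the marginals gives the same variances, since summing x squared times p(x, y) over y recovers the marginal calculation.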
6.10 Linear functions of random variables
In portfolio analysis you will learn how to invest funds so as to obtain the optimal return on the investment. You
will learn that the optimal policy is nearly always to spread the investment over several different assets. This
reduces the risk of the investment. In this section you will learn some of the basic results used in portfolio theory.
To simplify the problem we will restrict our analysis to a portfolio of two assets.
Let X = the rate of return from an investment in asset A (%)
Y = the rate of return from an investment in asset B (%)
R = the rate of return from the portfolio as a whole (%)
If an investor invests a proportion p of her funds in asset A and the remaining proportion (1 - p) of her funds in asset B, then the rate of return from the portfolio as a whole is:
\( r = px + (1 - p)y \)
Example 6.28
An entrepreneur invests his funds on the stock market. An investment in Company A pays a return of 8% on the investment and an investment in Company B pays a return of 12% on the investment. Calculate the investor's rate of return on his portfolio if he invests
(a) 100% of his funds in Company A.
(b) 75% of his funds in Company A and 25% in Company B.
(c) 50% of his funds in Company A and 50% in Company B.
(d) 25% of his funds in Company A and 75% in Company B.
(e) 100% of his funds in Company B.
Solution
Let X = the rate of return from an investment in asset A (%)
Y = the rate of return from an investment in asset B (%)
(a) Here all the money invested gives a return of 8%. The return from the portfolio as a whole is 8%.
(b) Now 75% of the money invested pays a return of 8% and the remaining 25% pays 12%.
\( r = px + (1 - p)y = 0.75 \times 8 + 0.25 \times 12 = 9 \)
This portfolio pays 9%.
(c) Now 50% of the money invested pays a return of 8% and the remaining 50% pays 12%.
\( r = px + (1 - p)y = 0.50 \times 8 + 0.50 \times 12 = 10 \)
This portfolio pays 10%.
(d) Now 25% of the money invested pays a return of 8% and the remaining 75% pays 12%.
\( r = px + (1 - p)y = 0.25 \times 8 + 0.75 \times 12 = 11 \)
This portfolio pays 11%.
(e) All the money invested pays 12% and so the portfolio as a whole pays 12%.
In the above example the highest rate of return is obtained by investing all the money in Company B. However, the rates of return on most investments are not known with certainty. There is a spread to the possible rates of return. The rate of return on an investment is specified by a probability distribution listing the possible percentage rates of return on the investment and their probabilities.
The wider the spread of the possible rates of return, the more uncertain is the rate of return from the investment. The spread of the possible rates of return on an investment is measured by the standard deviation of the possible rates of return. This leads to the following definition of the risk of an investment.
Risk of an investment: The standard deviation of the probability distribution of the possible rates of return on the investment.
Example 6.29
An investor believes that the rate of return from investing in Company A depends on the state of the economy over the next year. She has constructed the following probability distribution of the possible rates of return.
Table 6.33: Probability Distribution of the Rates of Return of an Investment in Company A
Condition of Economy   Probability p(x)   Rate of Return from Investing in A (%) x
Poor                   0.2                5
Fair                   0.6                8
Good                   0.2                11
What is the expected return and the risk from investing in Company A?
Solution
Let X = the rate of return from investing in Company A.
Then \( E[X] = \sum x\,p(x) = 5 \times 0.2 + 8 \times 0.6 + 11 \times 0.2 = 8.0 \)
\( E[X^2] = \sum x^2 p(x) = 5^2 \times 0.2 + 8^2 \times 0.6 + 11^2 \times 0.2 = 67.6 \)
\( \therefore \sigma_X^2 = E[X^2] - (E[X])^2 = 67.6 - 8.0^2 = 3.6 \)
\( \therefore \sigma_X = \sqrt{3.6} = 1.897 \)
The expected rate of return from the investment is 8% and the risk is 1.9%.
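The expected return and risk of a single investment follow the same pattern for any table of possible returns. A minimal sketch, with the distribution of Table 6.33 stored as a dictionary (an assumed representation, chosen for illustration):

```python
from math import sqrt

# Table 6.33: possible rates of return (%) and their probabilities.
returns = {5: 0.2, 8: 0.6, 11: 0.2}

expected = sum(x * p for x, p in returns.items())                  # E[X]
variance = sum(x**2 * p for x, p in returns.items()) - expected**2  # E[X^2] - (E[X])^2
risk = sqrt(variance)                                               # standard deviation

print(round(expected, 1), round(variance, 1), round(risk, 3))
```

The risk is in the same units as the return (percentage points), which is what allows the two to be plotted against each other.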
Investors compare stocks by using a Risk and Return Diagram. In this diagram the expected return of the stock is plotted up the vertical axis and the risk (standard deviation of the possible returns) is plotted along the horizontal axis.
Figure 6.1: The Risk and Return on 5 Stocks
[Figure: scatter plot of stocks A, B, C, D and E; horizontal axis: risk, vertical axis: expected return, both scaled from 0 to 40.]
Investors are assumed to
prefer high expected returns to low expected returns.
prefer low risk investments to high risk investments.
High returns are usually only attainable on high risk stocks and so investors have to balance the returns and the risk. In the above diagram no investor would invest in stock C or D because a higher return could be obtained at a lower risk by investing in stock E.
One way of reducing the risk is to invest in a number of assets and we now turn to the problem of calculating the expected rate of return and the risk of a portfolio of shares in two companies. The rate of return on the portfolio, R, is:
\( r = px + (1 - p)y \)
where X = the rate of return from an investment in Company A (%)
Y = the rate of return from an investment in Company B (%)
p = the proportion of the funds invested in Company A
and both X and Y are random variables with known probability distributions. The following general results can be used to find the expected rate of return and risk on a portfolio of investments.
The mean and variance of a linear combination of random variables
If X and Y are random variables and
\( w = ax + by + c \)
where a, b and c are constants then:
(a) \( E[W] = aE[X] + bE[Y] + c \)
(b) \( \sigma_W^2 = a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\,\sigma_{XY} \)
where
\( \sigma_X^2 \) = the variance of the random variable X
\( \sigma_Y^2 \) = the variance of the random variable Y
\( \sigma_{XY} \) = the covariance of the random variables X and Y
Recall that the correlation coefficient is defined as:
\( \rho_{XY} = \dfrac{\sigma_{XY}}{\sigma_X \sigma_Y} \)
\( \therefore \sigma_{XY} = \rho_{XY}\,\sigma_X \sigma_Y \)
Substituting for \( \sigma_{XY} \) in (b) gives a useful alternative form of (b).
The mean and variance of a linear combination of random variables
\( w = ax + by + c \)
(a) \( E[W] = aE[X] + bE[Y] + c \)
(b) \( \sigma_W^2 = a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\,\rho_{XY}\,\sigma_X \sigma_Y \)
where
\( \sigma_X^2 \) = the variance of the random variable X
\( \sigma_Y^2 \) = the variance of the random variable Y
\( \rho_{XY} \) = the correlation of the random variables X and Y
You will find that this is the form used in your finance lectures.
Example 6.30
An entrepreneur invests his funds on the stock market. He invests 20% of his funds in Company A and 80% of his funds in Company B.
An investment in A has an expected return of 8% and a risk of 5%.
An investment in B has an expected return of 12% and a risk of 10%.
Calculate the expected return and risk from this portfolio when
(a) the returns on the two investments are independently distributed.
(b) the returns on the two investments have a correlation of +0.8.
(c) the returns on the two investments have a correlation of -0.8.
Solution
Let X = the rate of return from an investment in Company A (%)
Y = the rate of return from an investment in Company B (%)
We have
\( E[X] = 8, \quad \sigma_X = 5 \)
\( E[Y] = 12, \quad \sigma_Y = 10 \)
\( r = 0.2x + 0.8y \)
(a) the returns on the two investments are independently distributed.
As the returns are independently distributed their covariance is 0 i.e. \( \sigma_{XY} = 0 \).
\( r = 0.2x + 0.8y \)
\( \therefore E[R] = 0.2E[X] + 0.8E[Y] \)
\( = 0.2 \times 8 + 0.8 \times 12 \)
\( = 11.2 \)
\( \sigma_R^2 = 0.2^2\sigma_X^2 + 0.8^2\sigma_Y^2 + 2 \times 0.2 \times 0.8 \times \sigma_{XY} \)
\( = 0.2^2 \times 5^2 + 0.8^2 \times 10^2 + 2 \times 0.2 \times 0.8 \times 0 \)
\( = 65 \)
\( \therefore \sigma_R = \sqrt{65} = 8.062 \)
The portfolio has an expected rate of return of 11.2% and a risk of 8.1%.
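(b) the returns on the two investments have a correlation of +0.8.
\( r = 0.2x + 0.8y \)
\( \therefore E[R] = 0.2E[X] + 0.8E[Y] \)
\( = 0.2 \times 8 + 0.8 \times 12 \)
\( = 11.2 \)
\( \sigma_R^2 = 0.2^2\sigma_X^2 + 0.8^2\sigma_Y^2 + 2 \times 0.2 \times 0.8 \times \rho_{XY}\,\sigma_X \sigma_Y \)
\( = 0.2^2 \times 5^2 + 0.8^2 \times 10^2 + 2 \times 0.2 \times 0.8 \times 0.8 \times 5 \times 10 \)
\( = 77.8 \)
\( \therefore \sigma_R = \sqrt{77.8} = 8.820 \)
The portfolio has an expected rate of return of 11.2% and a risk of 8.8%.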
(c) the returns on the two investments have a correlation of -0.8.
\( r = 0.2x + 0.8y \)
\( \therefore E[R] = 0.2E[X] + 0.8E[Y] \)
\( = 0.2 \times 8 + 0.8 \times 12 \)
\( = 11.2 \)
\( \sigma_R^2 = 0.2^2\sigma_X^2 + 0.8^2\sigma_Y^2 + 2 \times 0.2 \times 0.8 \times \rho_{XY}\,\sigma_X \sigma_Y \)
\( = 0.2^2 \times 5^2 + 0.8^2 \times 10^2 + 2 \times 0.2 \times 0.8 \times (-0.8) \times 5 \times 10 \)
\( = 52.2 \)
\( \therefore \sigma_R = \sqrt{52.2} = 7.225 \)
The portfolio has an expected rate of return of 11.2% and a risk of 7.2%.
The expected rate of return is unaffected by the correlation between the two rates of return. However, the risk can be reduced by combining investments with a negative correlation in the portfolio. Generally, high expected rates of return can only be obtained with a high risk factor. However, it may be possible, by combining two high return, high risk shares with a strong negative correlation, to form a portfolio with a high expected rate of return and a low risk factor.
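The effect of the correlation on portfolio risk can be explored with a small function built on result (b) in its correlation form. This is a sketch only; the function name portfolio and its parameter names are illustrative assumptions.

```python
from math import sqrt

def portfolio(p, mean_a, risk_a, mean_b, risk_b, correlation):
    """Expected return and risk of r = p*x + (1 - p)*y."""
    q = 1 - p
    expected = p * mean_a + q * mean_b
    variance = ((p * risk_a) ** 2 + (q * risk_b) ** 2
                + 2 * p * q * correlation * risk_a * risk_b)
    return expected, sqrt(variance)

# Example 6.30: 20% in A (8% return, 5% risk), 80% in B (12% return, 10% risk).
results = {rho: portfolio(0.2, 8, 5, 12, 10, rho) for rho in (0.0, 0.8, -0.8)}

for rho, (expected, risk) in results.items():
    print(rho, round(expected, 1), round(risk, 3))
```

Changing only the correlation argument leaves the expected return at 11.2% while the risk moves between roughly 7.2% and 8.8%, matching parts (a) to (c).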
6.11 Summary
This is the second of three units on probability. Probability is used to build a foundation for the measurement of
the reliability of sampling. In this unit discrete probability distributions wer89 e discussed. You learned to dene
a probability distribution and to calculate the mean and the variance of a probability distribution.
You also learned about the two most widely used discrete probability distributions the binomial distribution
and the Poisson distribution. You will nd in later units that the Binomial distribution is used to measure the
reliability of sample estimates of population proportions. One of the most important uses of the Poisson is to
calculate binomial probabilities when p is small.
You may find it difficult to decide when to use the binomial and when to use the Poisson. Both of these
distributions count how often an outcome occurs.
Use the binomial to calculate the probability of the number of times the outcome occurs out of a fixed
number of tries: the probability of 4 occurrences out of 6 tries; the probability of 3 occurrences in 12
tries etc.
Use the Poisson to calculate the probability of the number of times the outcome occurs in a fixed interval:
the probability of 4 occurrences in an hour; the probability of 6 occurrences in a day etc.
For both of these probability distributions, the probabilities can be calculated from the UC Statistical Tables.
Practice using these tables.
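The distinction between the two distributions can also be seen by computing the probabilities directly. The sketch below uses the standard binomial and Poisson probability formulas; the values p = 0.5 (success probability per try), a mean of 3 occurrences per hour, and the n = 100, p = 0.02 comparison are assumptions chosen only for illustration, not values from the notes.

```python
import math

def binomial_pmf(k, n, p):
    """P(exactly k occurrences out of a fixed number n of tries)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, mean):
    """P(exactly k occurrences in a fixed interval with the given mean)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

# Fixed number of tries: 4 occurrences out of 6 tries (assumed p = 0.5).
print(binomial_pmf(4, 6, 0.5))

# Fixed interval: 4 occurrences in an hour (assumed mean of 3 per hour).
print(poisson_pmf(4, 3))

# Poisson as an approximation to the binomial when p is small:
# n = 100 tries with p = 0.02 is compared with a Poisson with mean n*p = 2.
print(binomial_pmf(2, 100, 0.02), poisson_pmf(2, 100 * 0.02))
```

The last line prints two values that agree closely, which is why the Poisson is useful for binomial probabilities when p is small.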
The binomial and Poisson distributions are examples of discrete probability distributions. In the next unit, the final
probability unit, you will learn about the normal curve, the most important continuous probability distribution.
At the end of the unit you learned of joint probability distributions and their applications to portfolio theory.
UNIT 7
CONTINUOUS
PROBABILITY
DISTRIBUTIONS
7.0 Contents
7.1 Unit objectives
7.2 Introduction
7.3 Probability density functions
7.4 The normal distribution
7.5 Using the inverse normal tables
7.6 Linear functions of continuous random variables
7.7 The normal approximation to the binomial
7.8 Summary
7.1 Unit objectives
In this unit, you will learn to calculate probabilities for continuous random variables. The normal distribution,
which is the most important distribution in statistics, is discussed.
Define the term probability density function;
Use geometric methods to calculate probabilities for simple probability density functions;
Describe the normal distribution;
Use normal tables to calculate probabilities for the normal distribution;
Use inverse normal tables to calculate values of a normal random variable for specified probabilities;
Use normal tables to calculate probabilities for linear functions of normal variables;
Use normal tables to calculate approximate probabilities for binomial distributions.
7.2 Introduction
In Unit 6, we made a distinction between discrete and continuous random variables.
Discrete random variable: A random variable with values that can be listed.
Continuous random variable: A random variable that can assume all the values in an interval.
As you know, probabilities for a discrete random variable can be specified by listing the values of the random
variable and their probabilities in a table.
Continuous random variables are measuring variables such as the weight of a parcel, the rate of return on an
investment and the sugar yield from a hectare of sugarcane. The values of a continuous random variable are too
numerous to be listed. Consequently we use a different method to specify the probabilities. Probabilities for
a continuous random variable are specified as areas under graphs. These graphs are called probability density
functions. You will learn their properties in the following section. We also discuss how to use the UC Statistical
Tables to calculate the areas under the normal probability density function. In the last section, you will learn to
use normal probability density functions to calculate approximate probabilities for binomial distributions.
7.3 Probability density functions
The probability that a continuous random variable has a value between a and b is given by the area under a graph
between a and b. This graph is called the probability density function of the random variable. You will always
be told which graph to use for each continuous random variable.
Figure 7.1: A Probability Density Function
[Figure: the curve f(x) plotted against X; the area under the curve between a and b is shaded and labelled P(a < X < b).]
Probability density functions can be thought of as limiting forms of the relative frequency histograms that we
constructed in Unit 2.
For the areas under the curve f(x) to be representative of probabilities, the areas under the curve must be positive
and the total area must be 1. Thus there are two conditions that the function f(x) must satisfy to be a probability
density function.
Conditions for probability density functions
The probability density function f(x) must satisfy:
(a) f(x) ≥ 0 for all values of x.
(b) the total area under the graph of f(x) is 1.
Example 7.1
A number is chosen at random from the range 5 to 15. Let X be the number chosen. Then the probability density
function for the continuous random variable X is
\[
f(x) =
\begin{cases}
\dfrac{1}{10} & \text{for } 5 \le x \le 15 \\
0 & \text{elsewhere}
\end{cases}
\]
(a) Is the given function a probability density function?
(b) What is the probability that the number chosen is more than 8?
Solution
(a) The graph of the function is shown in Figure 7.2
Figure 7.2: A Linear Function
[Figure: graph of f(x) against X, with X marked at 5, 10, 15 and 20 and f(x) marked at 0 and 0.1; f(x) = 0.1 between X = 5 and X = 15, and the rectangle under it is shaded.]
Clearly, f(x) ≥ 0 for all values of x, and so the function satisfies the first condition for a probability density
function.
The total area under the graph of the function is the shaded area in Figure 7.2. As the area of a rectangle
is the base times the height, we have:
the total area under the graph = 10 × 0.1 = 1.0
Thus the function satisfies the second condition for a probability density function.
The given function is therefore a probability density function.
(b) P(X > 8) is given by the shaded area in Figure 7.3.
Figure 7.3: A Linear Probability Density Function
[Figure: graph of f(x) against X, with X marked at 5, 8, 10, 15 and 20 and f(x) marked at 0 and 0.1; the area under f(x) = 0.1 to the right of X = 8 is shaded and labelled P(X > 8).]
The area of a rectangle is the base times the height. Here the base runs from 8 to 15, so it is 15 − 8 = 7, and
the height is 0.1.
∴ P(X > 8) = 7 × 0.1 = 0.7
There is a probability of 0.7 that the number chosen is more than 8.
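The geometric method of Example 7.1 can be written as a short calculation. This is a sketch only: the function name `uniform_prob` and its arguments are hypothetical, and it covers only the rectangular density of this example, where every probability is a base times a height.

```python
def uniform_prob(lo, hi, a, b):
    """P(a < X < b) when X is chosen at random from the range lo to hi.

    The density is a rectangle of height 1/(hi - lo), so the probability
    is the area of the part of that rectangle lying between a and b.
    """
    height = 1.0 / (hi - lo)
    left = max(a, lo)          # clip the interval to the range of X
    right = min(b, hi)
    base = max(0.0, right - left)
    return base * height

print(uniform_prob(5, 15, 5, 15))   # total area under the density
print(uniform_prob(5, 15, 8, 15))   # P(X > 8)
print(uniform_prob(5, 15, 8, 8))    # P(X = 8): a line has no area
```

The first call checks condition (b) for a probability density function, and the second reproduces part (b) of the example.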
Representing probabilities by areas has important implications for the probabilities of continuous random
variables:
(a) P(X = a) = 0 for each value a of the random variable.
From Figure 7.1, P(X = a) is the area of the line over a. The area of a line is 0 and so P(X = a) = 0.
This is true for every value of X. Notice that probabilities are never given by the height of a probability
density function. Probabilities are always given by areas.
(b) P(a < X < b) = P(a ≤ X ≤ b).
Both these probabilities are given by the area under the graph between a and b.
Quick Question 1
7.4 The normal distribution
The normal distribution is the most important distribution in statistics.
The normal distribution is important for three reasons:
1. Many continuous variables such as the heights of adult males, the heights of adult females, the marks of
students in some exams, the weights of pigs of a specified age and the yields of sugarcane from a hectare
of land are normally distributed. The normal is so widespread in its applications that it is usually assumed
that a continuous random variable is normally distributed unless there is evidence to the contrary.
2. Many discrete probability distributions can be approximated by the normal distribution (see section 7.7).
3. It provides the basis for classical inferential statistics.
The normal distribution is not a single distribution but a whole family of distributions. A normal distribution can
be drawn with any mean and with any positive standard deviation. The shape of the normal distribution is shown
in Figure 7.4 below.
Figure 7.4: The Normal Probability Density Function
[Figure: bell-shaped normal curve plotted against X, centred at Mean = Median = Mode, with the standard deviation (SD = σ) marked.]
(The normal curves in these notes are drawn accurately with regard to their shape, but have been raised slightly
to make the graphs clearer.)
Properties of the normal distribution:
1. The normal distribution is symmetric about the mean. The mean, the median and the mode are all equal.
2. The total area under the curve is 1. The area above the mean and the area below the mean are each 0.5.
3. The curve is asymptotic to the x-axis. This means that the curve gets closer and closer to the x-axis but
never touches the x-axis.
4. The area below any value of X is determined by the z-score for that value. The z-score is given by:
\[
z = \frac{x - \mu}{\sigma}
\]
If the random variable X has a normal distribution with mean μ and variance σ², we write
X ~ N(μ, σ²)
Note that with this notation for the normal distribution, the second parameter given is the variance and not the
standard deviation.
The standard normal: The normal distribution with a mean of 0 and variance of 1 is called the standard normal
distribution.
The letter Z is used for a random variable with the standard normal distribution. Then Z ~ N(0, 1).
The UC Statistical Tables on pages 34 and 35 give the areas under the standard normal distribution to the left of
values of Z.
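The properties of the normal distribution listed above can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module in place of printed tables; its `cdf` method gives the area to the left of a value. Note that `NormalDist` takes the standard deviation as its second argument, whereas the N(μ, σ²) notation gives the variance. The helper `z_score` and the values passed to it are illustrative assumptions.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

Z = NormalDist(mu=0, sigma=1)   # the standard normal, Z ~ N(0, 1)

# Property 2: the area below the mean is 0.5.
print(Z.cdf(0))

# Property 1 (symmetry): the area below -z equals the area above +z.
print(Z.cdf(-1.3), 1 - Z.cdf(1.3))

# Property 4: the area below x depends only on the z-score of x.
x, mu, sigma = 110, 100, 8      # assumed values for illustration
print(NormalDist(mu, sigma).cdf(x), Z.cdf(z_score(x, mu, sigma)))
```

Each pair of printed values agrees, which is why a single table for the standard normal is enough for every normal distribution.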
Figure 7.5: The Standard Normal Probability Density Function
[Figure: standard normal curve (σ = 1) plotted against Z, with the axis marked from −3 to 3; the area to the left of a value of Z is shaded and labelled "UC Statistical Tables give this area".]
Other normal tables give different areas. The table in the text gives the area between the mean and the z-score.
In your tests and examination, you will use the UC Statistical Tables. Practice using these tables rather than the
table in the text.
From property 4 of the normal distribution, the standard normal tables can be used to calculate the probabilities
for any normal distribution. This is illustrated in the following example.
Example 7.2
A brand of car tyre has a mean life of 60,000 km and a standard deviation of 7,000 km. The lives of these tyres
are normally distributed. What is the probability that a randomly selected tyre:
(a) has a life of less than 75,000 km?
(b) has a life of more than 55,000 km?
(c) has a life of between 60,000 km and 70,000 km?
Solution
Let X = the life of a randomly selected tyre ('000 km)
Then X ~ N(60, 7²)
(a) What is the probability the selected tyre has a life of less than 75,000 km?
First sketch the curve and identify the required area. When sketching the curve, use (μ − 3σ) for the lower
limit and (μ + 3σ) for the upper limit of the axis. (The normal curve is defined for values of X from −∞
to +∞, but recall from the empirical rule in Unit 4 that nearly all observations lie within three standard
deviations of the mean.)
The required area is shown in Figure 7.6 below.
Figure 7.6: Lives of a Brand of Tyre
[Figure: normal curve for X (σ = 7) with the axis marked at 39, 60, 75 and 81; the area below X = 75 (Z = 2.14) is shaded and labelled P(X < 75).]
Now calculate the z-scores for the values of interest to two decimal places:
\[
X = 75: \quad Z = \frac{75 - 60}{7} = 2.14
\]
You can see from the graph that the area required is the area below Z = 2.14.
The area below Z = 2.14 can be found from page 35 of the UC Statistical Tables. The integer and first
decimal place of the z-score determine the row in the table and the second decimal place determines the
column.
From the UC Statistical Tables, page 35 (for Z = 2.14, read row 2.10 and column .04):

Z       .00      .01      .02      .03      .04     ...    .09
0.10   0.5398   0.5438   0.5478   0.5517   0.5557   ...   0.5754
0.20   0.5793   0.5832   0.5871   0.5910   0.5948   ...   0.6141
...
2.10   0.9821   0.9826   0.9830   0.9834   0.9838   ...   0.9857
...
P(X < 75) = P(Z < 2.14)
= 0.9838
The probability that a tyre has a life of less than 75,000 km is 0.9838.
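The table lookup in part (a) can be reproduced in a few lines. This is a sketch, assuming Python's standard `statistics.NormalDist` as a stand-in for the UC Statistical Tables; the variable names are illustrative. It rounds the z-score to two decimal places, as the tables require, and also shows the value obtained without rounding.

```python
from statistics import NormalDist

mean_life = 60   # mean tyre life ('000 km), from Example 7.2
sd_life = 7      # standard deviation ('000 km), from Example 7.2

# Step 1: z-score for X = 75, rounded to two decimal places as for the tables.
z = round((75 - mean_life) / sd_life, 2)

# Step 2: area below z under the standard normal curve (the table value).
p_table = NormalDist().cdf(z)

# For comparison: the same probability without rounding the z-score.
p_unrounded = NormalDist(mean_life, sd_life).cdf(75)

print(z)                      # 2.14
print(round(p_table, 4))      # 0.9838
print(p_unrounded)
```

The small difference between the last two values comes only from rounding the z-score to two decimal places before the lookup.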
(b) What is the probability the selected tyre has a life of more than 55,000 km?
Always sketch the curve and mark in the required area. The required area is shown in Figure 7.7.
[Figure 7.7: normal curve for X with the axis marked at 39, 55, 60, 75 and 81.]
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ............................
...........................
P(X > 55)
= 7
Z = 0.71
.............................................................................
. . . . . . . . . . . . . . . . . . . . . . . . . . .
The value of interest here is 55 (000 km).
\[ X = 55 \;\Rightarrow\; Z = \frac{55 - 60}{7} = -0.71 \]
The required area is the area to the right of \(Z = -0.71\).
Always remember that the tables give the area to the left of the z-score.
From property 2 of the normal distribution the total area under the curve is 1. The area to the right of any
point can be found by using:
area to left + area to right = total area under the curve = 1
\(\therefore\) area to right = 1 \(-\) area to the left
\[ \begin{aligned}
P(X > 55) &= 1 - P(X \le 55) \\
&= 1 - P(Z \le -0.71) \\
&= 1 - 0.2389 \quad \text{(use page 34 for negative z-scores)} \\
&= 0.7611
\end{aligned} \]
The probability that a tyre has a life of more than 55,000 km is 0.7611.
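The complement rule used above can be checked numerically. The following is a minimal Python sketch, assuming the tyre-life parameters of this example (mean 60, standard deviation 7, in thousands of km); the function name `prob_greater_than` and the `round_z` switch are illustrative and not part of the course material. Rounding z to two decimals mimics the printed tables; without rounding the answer differs slightly in the third decimal place.

```python
from statistics import NormalDist

MEAN_LIFE = 60.0  # mean tyre life in thousands of km (from the example)
SD_LIFE = 7.0     # standard deviation in thousands of km (from the example)


def prob_greater_than(x, mean=MEAN_LIFE, sd=SD_LIFE, round_z=True):
    """P(X > x) for X ~ N(mean, sd^2), using area to right = 1 - area to left."""
    z = (x - mean) / sd
    if round_z:
        z = round(z, 2)  # the printed tables list z to two decimals only
    return 1.0 - NormalDist().cdf(z)


p_table = prob_greater_than(55)                 # mimics the table lookup
p_exact = prob_greater_than(55, round_z=False)  # z left unrounded
print(round(p_table, 4), round(p_exact, 4))
```

The small gap between the two printed values comes only from rounding z, not from the method.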
(c) What is the probability the selected tyre has a life of between 55,000 km and 75,000 km?
Sketching the curve and marking in the required area leads to Figure 7.8.
[Figure 7.8: normal curve for tyre life X (000 km) with \(\sigma = 7\), axis marked at 39, 55, 60, 75 and 81; the shaded area between \(Z = -0.71\) and \(Z = 2.14\) is \(P(55 < X < 75)\)]
The values of interest here are 55 and 75 (000 km).
\[ X = 55 \;\Rightarrow\; Z = \frac{55 - 60}{7} = -0.71 \]
\[ X = 75 \;\Rightarrow\; Z = \frac{75 - 60}{7} = 2.14 \]
It can be seen from the graph that the area required is the area below \(Z = 2.14\) less the area below
\(Z = -0.71\).
\[ \begin{aligned}
P(55 < X < 75) &= P(X < 75) - P(X \le 55) \\
&= P(Z < 2.14) - P(Z \le -0.71) \\
&= 0.9838 - 0.2389 \\
&= 0.7449
\end{aligned} \]
The probability that a tyre has a life of between 55,000 and 75,000 km is 0.7449.
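The "area below the upper value less the area below the lower value" step can be sketched in Python as follows. This is an illustrative check only, assuming the same tyre-life distribution; the helper name `prob_between` is hypothetical. Because z is not rounded to two decimals here, the result agrees with the table answer only to about two decimal places.

```python
from statistics import NormalDist


def prob_between(lo, hi, mean, sd):
    """P(lo < X < hi) for X ~ N(mean, sd^2): area below hi minus area below lo."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(hi) - dist.cdf(lo)


# Tyre life in thousands of km, as in the example
p_between = prob_between(55, 75, mean=60, sd=7)
print(round(p_between, 4))
```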
Quick Question 2
7.5 Using the inverse normal tables
Now you know how to use the normal tables to calculate areas for a specified value of the random variable X. In
this section, you will learn to calculate the value of a random variable X from a specified value of the area.
The standard notation here is to use \(z_\alpha\).
\(z_\alpha\): \(z_\alpha\) is the z-score with an area of \(\alpha\) to the right (where \(0 \le \alpha \le 1\)).
Figure 7.9: The Definition of \(z_\alpha\)
[Figure: standard normal curve (\(\sigma = 1\)) on a Z axis running from \(-3\) to 3, with \(z_\alpha\) marked to the right of 0; the shaded area to the right of \(z_\alpha\) is \(P(Z > z_\alpha) = \alpha\)]
Notice:
(a) The area to the right of 0 is 0.50 and so \(z_{0.50} = 0\).
(b) For \(\alpha < 0.5\), \(z_\alpha\) is located to the right of the mean and hence is positive.
(c) For \(\alpha > 0.5\), \(z_\alpha\) is located to the left of the mean and hence is negative.
These z-scores can be found by using Table 4B on page 33 of the UC Statistical Tables.
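The three observations above can be verified with a short Python sketch. It assumes only the definition given in this section (area of alpha to the right, so area of 1 - alpha to the left); the function name `z_alpha` is chosen here for illustration and replaces the table lookup with the standard library's inverse normal function.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_alpha(alpha):
    """z-score with an area of alpha to its right (0 < alpha < 1)."""
    return STANDARD_NORMAL.inv_cdf(1.0 - alpha)


for alpha in (0.10, 0.50, 0.90):
    print(alpha, round(z_alpha(alpha), 4))
```

Note that the endpoints alpha = 0 and alpha = 1 are excluded in the sketch, since the corresponding z-scores are not finite.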
Example 7.3
Consider a brand of car tyre that has a mean life of 60,000 km and a standard deviation of 7,000 km. Assume that
the lives of these tyres are normally distributed. What is the minimum life of the longest-lasting 10% of the tyres?
Solution
Here we must find the value of x in Figure 7.10.
[Figure 7.10: normal curve for tyre life X (000 km) with \(\sigma = 7\), axis marked at 39, 60, x and 81; the shaded area to the right of x, at \(Z = 1.2816\), is \(P(X > x) = 0.10\)]
Table 4B on page 33 of the UC Statistical Tables gives the z-scores for specified areas to the left.
Table 7.1: Commonly used areas of the Normal Distribution
PROPORTION          Z     PROPORTION          Z
0.0001        -3.7191     0.5            0.0000
0.0005        -3.2905     0.6            0.2533
0.001         -3.0902     0.7            0.5244
0.0025        -2.8071     0.75           0.6745
0.005         -2.5758     0.8            0.8416
0.01          -2.3263     0.85           1.0364
0.025         -1.9600     0.9            1.2816
0.05          -1.6449     0.95           1.6449
0.1           -1.2816     0.975          1.9600
0.15          -1.0364     0.99           2.3263
...               ...     ...               ...
For the area to the right of x to be 0.10, the area to the left of x must be 0.90. From the above table, x has a
z-score of 1.2816, i.e. \(z_{0.10} = 1.2816\). We have
\[ \frac{x - 60}{7} = 1.2816 \]
\[ \therefore\; x = 7 \times 1.2816 + 60 = 68.97 \]
Only 10% of the tyres last for more than about 69,000 kilometres.
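The back-transformation from a z-score to a value of X can be sketched in Python. This is a minimal illustration, assuming the tyre-life parameters of Example 7.3; the function name `upper_tail_cutoff` is hypothetical, and the inverse normal function of the standard library stands in for Table 4B.

```python
from statistics import NormalDist


def upper_tail_cutoff(alpha, mean, sd):
    """Value x with P(X > x) = alpha for X ~ N(mean, sd^2)."""
    # An area of alpha to the right means an area of 1 - alpha to the left.
    z = NormalDist().inv_cdf(1.0 - alpha)
    return mean + sd * z  # rearranged from (x - mean) / sd = z


# Tyre life in thousands of km, as in Example 7.3
cutoff = upper_tail_cutoff(0.10, mean=60, sd=7)
print(round(cutoff, 2))
```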
Quick Question 3
7.6 Linear functions of a continuous random variable
The rules for expected values of random variables that you learned in sections 6.6 and 6.10 also apply to
continuous random variables. The main result from section 6.6 was:
Mean and variance of linear functions of a random variable
If X is a random variable and
\[ w = ax + b \]
where a and b are constants then:
(a) \(E[W] = aE[X] + b\)
(b) \(V[W] = a^2 V[X]\)
There is an important extra result for linear functions of a normal variable:
Linear functions of a normal variable
If X is a normally distributed random variable and
\[ w = ax + b \]
where a and b are constants then W is normally distributed.
The main result from section 6.10 was:
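Mean and variance of linear functions of random variables
If X and Y are random variables and
\[ w = ax + by + c \]
where a, b and c are constants then:
(a) \(E[W] = aE[X] + bE[Y] + c\)
(b) \(V[W] = a^2 V[X] + b^2 V[Y] + 2ab\,COV[X, Y]\)
\(\phantom{V[W]} = a^2 V[X] + b^2 V[Y] + 2ab\,\rho_{XY}\,\sigma_X \sigma_Y\)
where \(COV[X, Y]\) = the covariance of X and Y
\(\rho_{XY}\) = the correlation of X and Y
\(\sigma_X, \sigma_Y\) = the standard deviations of X and Y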
Again there is an important extra result for linear functions of normal variables that is widely used in portfolio
theory:
Linear functions of normal variables
If X and Y are normally distributed random variables and
\[ w = ax + by + c \]
where a, b and c are constants then W is normally distributed.
Normally distributed random variables are so common that many models in economics and accounting assume
that variables are normally distributed. This often leads to linear functions of a normal random variable.
Example 7.4
Con the Fruiterer finds that his daily sales of bananas are normally distributed with mean sales of 500 kg and a
standard deviation of 50 kg. He buys the bananas for $1.00 per kg and sells the bananas for $1.20 per kg. The
overheads applied to bananas are $80 per day. What is the probability that Con makes a net profit on the bananas?
Solution
Let X = the daily sales of bananas (kg)
and W = the daily net profit from the sales of bananas ($)
Then \(X \sim N(500, 50^2)\)
Con makes a gross profit of $0.20 (20 cents) on each kilogram of bananas sold. Thus if he sells x kilograms he
makes a gross profit of 0.2x. After applying overheads this gives a net profit of 0.2x \(-\) 80. Net profit is a linear
function of a normally distributed random variable and so net profit is normally distributed.
\[ \begin{aligned}
w &= 0.2x - 80 \\
\therefore\; E[W] &= 0.2E[X] - 80 = 0.2 \times 500 - 80 = 20 \\
V[W] &= 0.2^2 V[X] = 0.2^2 \times 50^2 = 10^2 \\
\therefore\; W &\sim N(20, 10^2)
\end{aligned} \]
Figure 7.11: Probability Distribution of Con's Profits
[Figure: normal curve for Profit ($) with \(\sigma = 10\), axis marked at \(-10\), 0, 20 and 50; the shaded area to the right of \(Z = -2.00\) is \(P(W > 0)\)]
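\[ W = 0 \;\Rightarrow\; Z = \frac{0 - 20}{10} = -2.00 \]
\[ \begin{aligned}
\therefore\; P(W > 0) &= 1 - P(W \le 0) \\
&= 1 - P(Z \le -2.00) \\
&= 1 - 0.0228 \\
&= 0.9772
\end{aligned} \]
There is a probability of 0.9772 that Con makes a net profit on the sales of bananas.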
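The whole calculation for Con's bananas can be reproduced in a few lines of Python. This is a sketch under the assumptions of Example 7.4 (sales normally distributed with mean 500 kg and standard deviation 50 kg, margin of $0.20 per kg, overheads of $80); the helper `linear_transform` is a hypothetical name that applies the rules for a linear function of one normal variable.

```python
from statistics import NormalDist


def linear_transform(dist, a, b):
    """Distribution of W = a*X + b when X is normal.

    E[W] = a*E[X] + b and V[W] = a^2 * V[X], so the standard
    deviation of W is |a| times the standard deviation of X.
    """
    return NormalDist(mu=a * dist.mean + b, sigma=abs(a) * dist.stdev)


sales = NormalDist(mu=500, sigma=50)             # daily sales in kg
profit = linear_transform(sales, a=0.2, b=-80)   # daily net profit in $
p_profit = 1.0 - profit.cdf(0)                   # P(W > 0)
print(round(profit.mean, 2), round(profit.stdev, 2), round(p_profit, 4))
```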
Quick Question 4
Example 7.5
An investor was considering buying shares in Companies A and B. Her Investment Analyst advised her that the
expected return on shares in Company A was 10% with a standard deviation of 5% while the expected return
on shares in Company B was 15% with a standard deviation of 10%. The returns on both shares are normally
distributed and there is an estimated correlation of \(-0.8\) between the returns from shares in Company A and Company
B. The investor decided to invest 20% of her funds in Company A and 80% of her funds in Company B.
What is the probability density function of the returns on this portfolio and what are the mean and variance of these
returns?
Solution
Let X = the return from investing in Company A (%)
Y = the return from investing in Company B (%)
W = the return from this diversified portfolio (%).
\[ X \sim N(10, 5^2), \quad Y \sim N(15, 10^2) \quad \text{and} \quad \rho_{XY} = -0.8 \]
Then 20% of her funds will have a return of x and 80% will have a return of y and so the overall portfolio return
is given by
\[ w = 0.2x + 0.8y \]
As W is a linear function of normal variables, W has a normal distribution. The possible returns from the portfolio
are normally distributed.
\[ \begin{aligned}
w &= 0.2x + 0.8y \\
\therefore\; E[W] &= 0.2E[X] + 0.8E[Y] = 0.2 \times 10 + 0.8 \times 15 = 14 \\
V[W] &= 0.2^2 V[X] + 0.8^2 V[Y] + 2 \times 0.2 \times 0.8\,\rho_{XY}\,\sigma_X \sigma_Y \\
&= 0.2^2 \times 5^2 + 0.8^2 \times 10^2 + 2 \times 0.2 \times 0.8 \times (-0.8) \times 5 \times 10 \\
&= 52.2 \\
\therefore\; W &\sim N(14, 52.2)
\end{aligned} \]
Using this probability density function the investor can calculate the probabilities of different returns on the
portfolio.
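As an illustration of how such probabilities might be calculated, the following Python sketch recomputes the portfolio mean and variance from the figures in Example 7.5 and then evaluates one probability. The function name `portfolio_moments` is hypothetical, and the question "what is the chance of a negative return?" is an assumption added here for demonstration; it is not asked in the example.

```python
from math import sqrt
from statistics import NormalDist


def portfolio_moments(a, b, mean_x, sd_x, mean_y, sd_y, rho_xy):
    """Mean and variance of W = a*X + b*Y for correlated X and Y."""
    mean_w = a * mean_x + b * mean_y
    var_w = (a ** 2) * sd_x ** 2 + (b ** 2) * sd_y ** 2 \
        + 2 * a * b * rho_xy * sd_x * sd_y
    return mean_w, var_w


# Weights and share statistics from Example 7.5 (returns in %)
mean_w, var_w = portfolio_moments(0.2, 0.8, 10, 5, 15, 10, rho_xy=-0.8)
returns = NormalDist(mu=mean_w, sigma=sqrt(var_w))

# Illustrative question only: probability that the portfolio return is negative
p_negative = returns.cdf(0)
print(round(mean_w, 2), round(var_w, 2), round(p_negative, 4))
```

The normal distribution object takes the standard deviation, so the variance of 52.2 is converted with a square root before use.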
The results on linear functions of normal variables can be extended to three or more random variables in an
obvious way.
Mean and variance of linear functions of random variables
If X, Y and U are random variables and
\[ w = ax + by + cu + d \]
where a, b, c and d are constants then:
(a) \(E[W] = aE[X] + bE[Y] + cE[U] + d\)
(b) \(V[W] = a^2 V[X] + b^2 V[Y] + c^2 V[U] + 2ab\,COV[X, Y] + 2ac\,COV[X, U] + 2bc\,COV[Y, U]\)
\(\phantom{V[W]} = a^2 V[X] + b^2 V[Y] + c^2 V[U] + 2ab\,\rho_{XY}\,\sigma_X \sigma_Y + 2ac\,\rho_{XU}\,\sigma_X \sigma_U + 2bc\,\rho_{YU}\,\sigma_Y \sigma_U\)
Linear functions of normal variables
If X, Y and U are normally distributed random variables and
\[ w = ax + by + cu + d \]
where a, b, c and d are constants then W is normally distributed.
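The extension to three or more variables follows one pattern, which can be sketched generically in Python. This is an illustrative helper, not part of the course material: the name `linear_combination_moments` and the use of a correlation matrix as input are assumptions made for the sketch. It is checked here against the two-share portfolio of Example 7.5.

```python
def linear_combination_moments(coeffs, means, sds, corr, const=0.0):
    """Mean and variance of W = sum(a_i * X_i) + const.

    corr[i][j] is the correlation between X_i and X_j (1.0 on the diagonal).
    """
    k = len(coeffs)
    mean_w = const + sum(a * m for a, m in zip(coeffs, means))
    # Double sum of a_i * a_j * rho_ij * sd_i * sd_j: the i == j terms give
    # a_i^2 * V[X_i]; each pair i != j appears twice, giving the 2ab... terms.
    var_w = sum(
        coeffs[i] * coeffs[j] * corr[i][j] * sds[i] * sds[j]
        for i in range(k)
        for j in range(k)
    )
    return mean_w, var_w


# Check against the two-share portfolio of Example 7.5
mean_w, var_w = linear_combination_moments(
    coeffs=[0.2, 0.8],
    means=[10, 15],
    sds=[5, 10],
    corr=[[1.0, -0.8], [-0.8, 1.0]],
)
print(round(mean_w, 4), round(var_w, 4))
```

The constant term shifts the mean but does not enter the variance, as in the boxed results.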
Quick Question 5
7.7 The normal approximation to the binomial
In this section, you will learn to use the normal distribution to calculate binomial probabilities. To emphasise the
distribution being used, the notation \(P_B\) is used to denote a binomial distribution probability and \(P_N\) is used to
denote a normal probability.
In Unit 6 you learnt to use the binomial tables. These tables cover most values of n from 2 to 25. If \(n > 25\) and
\(np \le 5\) or \(n(1 - p) \le 5\), the Poisson distribution can be used to calculate approximate binomial probabilities.
Now it is possible to fill in the gap in the binomial probabilities by using the normal distribution to compute
approximate binomial probabilities when \(n > 25\) and both \(np > 5\) and \(n(1 - p) > 5\).
The result here is as follows:
Normal approximation to the binomial
If \(X \sim B(n, p)\) with
i. n large: \(n > 25\)
and ii. p neither small nor close to 1: \(np > 5\) and \(n(1 - p) > 5\)
then \(X \mathrel{\dot\sim} N(np,\, np(1 - p))\)
and \(P_B(X = x) \doteq P_N(x - 0.5 < X < x + 0.5)\)
(Read the symbol \(\dot\sim\) as "is approximately distributed as" and the symbol \(\doteq\) as "is approximately equal to".)
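The quality of this approximation can be checked numerically. The Python sketch below compares an exact binomial probability with the normal area from x - 0.5 to x + 0.5. The values n = 40 and p = 0.5 are hypothetical, chosen only so that n > 25, np > 5 and n(1 - p) > 5 all hold; the function names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_pmf(x, n, p):
    """Exact binomial probability P_B(X = x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def normal_approx_pmf(x, n, p):
    """Normal approximation P_N(x - 0.5 < X < x + 0.5) with X ~ N(np, np(1-p))."""
    approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return approx.cdf(x + 0.5) - approx.cdf(x - 0.5)


n, p = 40, 0.5  # hypothetical values satisfying n > 25, np > 5, n(1-p) > 5
exact = binomial_pmf(20, n, p)
approx = normal_approx_pmf(20, n, p)
print(round(exact, 4), round(approx, 4))
```

The half-unit adjustment on each side is what allows a continuous curve to stand in for a probability at a single integer value.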
Example 7.6
If \(X \sim B(n, p)\) and the normal approximation was used to calculate probabilities for this binomial distribution,
what area under the normal curve would be used to calculate \(P_B(4 \le X \le 6)\)?
Solution
The areas under the normal curve used would be as shown in Figure 7.12.
Figure 7.12: Using the Normal to Find Binomial Probabilities
[Figure: normal curve for X with \(\sigma = \sqrt{np(1 - p)}\), axis marked at np, 1.5, 2.5, 3.5 and 4.5; the areas between successive marks are labelled \(P_B(X = 2)\), \(P_B(X = 3)\) and \(P_B(X = 4)\)]
...........................
Thus \( P_B(2 \le X \le 4) = P_N(1.5 < X < 4.5) \).
This adjustment to the limits is called the continuity correction.
The general result on the continuity correction is as follows:
Continuity correction:
When using the normal approximation to the binomial,
\[ P_B(a \le X \le b) = P_N(a - 0.5 < X < b + 0.5). \]
Example 7.7
On a particular day, 75% of the shares traded on the stock exchange fell in price. An investor has a portfolio of
40 randomly selected shares. What is the probability that:
(a) exactly 28 of the 40 shares fell in price;
(b) between 28 and 32 (inclusive) of the shares fell in price?
Solution
Let X = the number of shares that fell in price.
Then \( X \sim B(40, 0.75) \).
Here n > 25 and so one of the approximation methods has to be used to estimate the probabilities. The method
to use is determined by the values of \( np \) and \( n(1-p) \). If either of these values is less than or equal to 5, use the
Poisson approximation. If both these values are greater than 5, use the normal approximation to the binomial.
Here \( np = 40 \times 0.75 = 30 > 5 \)
and \( n(1-p) = 40 \times 0.25 = 10 > 5 \).
Therefore we should use the normal approximation to the binomial.
The normal distribution to use is:
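\[ X \overset{\cdot}{\sim} N(np,\; np(1-p)) \]
\[ \therefore X \overset{\cdot}{\sim} N(40 \times 0.75,\; 40 \times 0.75 \times 0.25) \]
\[ \therefore X \overset{\cdot}{\sim} N(30,\; 7.5) \]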
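(a) \( P_B(X = 28) = P_N(27.5 < X < 28.5) \)
\( = P_N(X < 28.5) - P_N(X \le 27.5) \)
\( = P_N\left(Z < \frac{28.5 - 30}{\sqrt{7.5}}\right) - P_N\left(Z \le \frac{27.5 - 30}{\sqrt{7.5}}\right) \)
\( = P_N(Z < -0.55) - P_N(Z \le -0.91) \)
\( = 0.2912 - 0.1814 \)
\( = 0.1098 \)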
The probability that 28 shares fell in price is 0.1098.
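(b) \( P_B(28 \le X \le 32) = P_N(27.5 < X < 32.5) \)
\( = P_N(X < 32.5) - P_N(X \le 27.5) \)
\( = P_N\left(Z < \frac{32.5 - 30}{\sqrt{7.5}}\right) - P_N\left(Z \le \frac{27.5 - 30}{\sqrt{7.5}}\right) \)
\( = P_N(Z < 0.91) - P_N(Z \le -0.91) \)
\( = 0.8186 - 0.1814 \)
\( = 0.6372 \)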
The probability that between 28 and 32 shares fell in price is 0.6372.
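The two answers in Example 7.7 can be checked against the exact binomial probabilities. The following is a minimal sketch using only the Python standard library; the function names `binomial_pmf` and `normal_approx` are illustrative and do not come from the course material. Because the code uses unrounded z-scores rather than table values, its normal approximations differ slightly from the table-based answers above.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """Exact binomial probability P_B(X = k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_approx(a, b, n, p):
    """Approximate P_B(a <= X <= b) by P_N(a - 0.5 < X < b + 0.5)."""
    dist = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return dist.cdf(b + 0.5) - dist.cdf(a - 0.5)


n, p = 40, 0.75

# Part (a): exactly 28 shares fell in price.
exact_a = binomial_pmf(28, n, p)
approx_a = normal_approx(28, 28, n, p)

# Part (b): between 28 and 32 shares (inclusive) fell in price.
exact_b = sum(binomial_pmf(k, n, p) for k in range(28, 33))
approx_b = normal_approx(28, 32, n, p)

print(f"(a) exact {exact_a:.4f}, normal approximation {approx_a:.4f}")
print(f"(b) exact {exact_b:.4f}, normal approximation {approx_b:.4f}")
```

In this example the approximation is closer for the wider range in part (b) than for the single value in part (a).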
Quick Question 6
You should now be able to calculate probabilities for any binomial distribution. The full method selection criteria
are displayed in the tree diagram below.
Figure 7.13: Calculating Binomial Probabilities
\( X \sim B(n, p) \)
If \( n \le 25 \): use the Binomial Tables (Tables p.5-21).
If \( n > 25 \) and either \( np \le 5 \) or \( n(1-p) \le 5 \): use the Poisson approximation, \( X \overset{\cdot}{\sim} P_0(np) \) or \( Y \overset{\cdot}{\sim} P_0(n(1-p)) \), with the Poisson Tables (Tables p.23-32).
If \( n > 25 \) and both \( np > 5 \) and \( n(1-p) > 5 \): use the normal approximation, \( X \overset{\cdot}{\sim} N(np, np(1-p)) \), with the Normal Tables (Tables p.33-35).
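The selection rule in Figure 7.13 can also be written as a short function. This is only a sketch of the decision rule as stated in this unit; the function name `choose_method` and the returned labels are illustrative.

```python
def choose_method(n, p):
    """Return the method suggested by Figure 7.13 for X ~ B(n, p)."""
    if n <= 25:
        return "binomial tables"
    # For n > 25 the choice depends on np and n(1 - p).
    if n * p <= 5 or n * (1 - p) <= 5:
        return "Poisson approximation"
    return "normal approximation"


# Example 7.7: n = 40, p = 0.75 gives np = 30 and n(1 - p) = 10.
print(choose_method(40, 0.75))
```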
Quick Question 7
7.8 Summary
This is the third and final unit on probability. This Unit describes probabilities for continuous random variables.
With continuous random variables probabilities are measured by the areas under a graph and you learned to use
simple geometry to calculate these probabilities.
In this Unit you also learned some of the properties of the normal probability density function, the most important
distribution in statistics. You found out how to use the UC Statistical Tables to calculate the probabilities
for a normally distributed random variable. There are normal tables in the back of the text but these are quite
different to the UC Statistical Tables. Practise with the UC Statistical Tables; these are the tables you will use
in your exams!
You also learned about the \( z_\alpha \) notation and practised using the inverse normal tables for finding z-scores. This
notation is used extensively in the next four units. Check that you understand this notation before reading the next
unit.
The normal curve was also used to fill a gap in the calculation of binomial probabilities. You should now be able
to calculate approximate probabilities for any binomial distribution.
In the next unit we return to the measurement of the reliability of sample estimates. You will find that the normal
curve plays a major role in this measurement.
Read the learning objectives at the start of this unit. Have you achieved these objectives? If there are any objectives
about which you are unclear, re-read the appropriate sections before trying the tutorial exercise.
UNIT 8
SAMPLING
DISTRIBUTIONS
8.0 Contents
8.1 Unit objectives
8.2 Introduction
8.3 Sampling distributions
8.4 The sampling distribution of the sample mean
8.5 Estimating the population mean: \( \sigma \) known
8.6 Estimating the population mean: \( \sigma \) unknown
8.7 The sample size for estimating a population mean
8.8 Estimating a population proportion
8.9 The sample size for estimating a population proportion
8.10 Summary
8.1 Unit objectives
This unit is the most important unit in this course. In this unit we discuss the concept of a sampling
distribution. Sampling distributions are central to all questions of estimation and inference. The concepts from
this unit will be used in the next three units to estimate population means and proportions and to test hypotheses.
When you have completed this unit, you should be able to:
Explain the concept of a sampling distribution;
Calculate the sampling distribution of the mean of a small sample from a small population;
Explain the importance of the central limit theorem;
Use the sampling distribution of the sample mean to calculate the probability that the sample mean is
close to the population mean;
Calculate a point estimate and a confidence interval estimate of the population mean;
Determine the sample size required to obtain a reliable estimate of the population mean;
Calculate a point estimate and a confidence interval estimate of a population proportion;
Use the sampling distribution of the sample proportion to calculate the probability that the sample
proportion is close to the population proportion;
Determine the sample size required to obtain a reliable estimate of a population proportion.
8.2 Introduction
This is the first of four units on inferential statistics. In Unit 1 you learnt the following definition of inferential
statistics:
calculated.
In Units 5, 6 and 7 you learnt to calculate probabilities for particular experiments. You will now learn how to use
probabilities to measure the reliability of using sample values to estimate population parameters.
In this unit we discuss in detail the experiment of selecting a simple random sample and using the mean
of the sample to estimate the mean of a population. The analysis of the probabilities from this experiment gives a
measure of the reliability of using the sample mean to estimate the population mean.
For simplicity, we start by using an example from the earlier units to illustrate the concept of the sampling
distribution of the sample mean. In the next section this concept is extended to large samples from large populations.
The sampling distribution of the sample mean shows that the sample mean may not be equal to the population
mean and in general any sample estimate of a population parameter may not equal the population value. The
question then arises: How large a sample do we need to obtain reliable estimates of population parameters? A
simple formula is given for determining the sample size required to obtain a reliable estimate of the population
mean.
We then turn to estimating population proportions. You will have read opinion polls in the newspapers. Have you
ever asked yourself: How reliable are these figures? In the last two sections of this unit you will learn how to
measure the reliability of opinion polls that are taken by a simple random sample from the population.
8.3 Sampling distributions
To explain the concept of a sampling distribution, we start by considering an example of taking a simple random
sample from a small population. (The results below have all been obtained in earlier examples and are gathered
together here for easy reference. If you find some of the concepts used later in the unit difficult to grasp, keep
referring back to this example.)
Example 8.1
There are four employees in an office: Arthur, Beth, Clare and Daniel. Arthur is paid $20,000 per annum, Beth
and Clare are paid $30,000 per annum and Daniel is paid $40,000 per annum. A simple random sample of size
two is taken with replacement and with ordering from this population of 4 employees and the mean income in the
sample is recorded.
(a) What is the probability distribution of the sample mean?
(b) What are the mean and variance of the probability distribution of the sample mean?
(c) What is the probability that the sample mean is within $5,000 of the population mean?
Solution
Let \( \mu \) = the mean income in the population ($000)
\( \sigma \) = the standard deviation of income in the population ($000)
\( \bar{X} \) = the mean income in the sample ($000)
In this population, \( \mu = 30 \) and \( \sigma^2 = 50 \).
(a) As before, let us denote Arthur by a, Beth by b, Clare by c and Daniel by d. Then one possible sample is
ca with Clare selected first and Arthur second. For this sample, the sample mean is \( \bar{X} = \frac{30+20}{2} = 25 \). The
sample space for this experiment is given in Table 8.1.
Table 8.1
Sample   Incomes ($000)   Sample mean ($000)      Sample   Incomes ($000)   Sample mean ($000)
aa       20  20           20                      ca       30  20           25
ab       20  30           25                      cb       30  30           30
ac       20  30           25                      cc       30  30           30
ad       20  40           30                      cd       30  40           35
ba       30  20           25                      da       40  20           30
bb       30  30           30                      db       40  30           35
bc       30  30           30                      dc       40  30           35
bd       30  40           35                      dd       40  40           40
With a simple random sample each of the above outcomes has a probability of 1/16. The probability
distribution of the sample mean is easily constructed from this sample space. For example, for four of the
16 samples (ab, ac, ba and ca) the sample mean is 25 and so \( P(\bar{X} = 25) = \frac{4}{16} \). You can calculate the
probabilities for the other values of the sample mean in the same way. The resulting probability distribution
of the sample mean is given in Table 8.2.
Table 8.2: The Probability Distribution of the Sample Mean
\( \bar{x} \)     \( p(\bar{x}) \)
20      1/16
25      4/16
30      6/16
35      4/16
40      1/16
Total   1
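(b) From Table 8.2 we can calculate that:
\[ E[\bar{X}] = 20 \times \tfrac{1}{16} + 25 \times \tfrac{4}{16} + 30 \times \tfrac{6}{16} + 35 \times \tfrac{4}{16} + 40 \times \tfrac{1}{16} = 30 \]
\[ E[\bar{X}^2] = 20^2 \times \tfrac{1}{16} + 25^2 \times \tfrac{4}{16} + 30^2 \times \tfrac{6}{16} + 35^2 \times \tfrac{4}{16} + 40^2 \times \tfrac{1}{16} = 925 \]
\[ \therefore V(\bar{X}) = 925 - 30^2 = 25 \]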
The mean of the probability distribution of the sample mean is 30 and the variance is 25.
Note:
the mean of \( \bar{X} \) is equal to the population mean: \( E[\bar{X}] = \mu \)
the variance of \( \bar{X} \) is the population variance divided by the sample size (50/2 = 25): \( V[\bar{X}] = \frac{\sigma^2}{n} \)
(c) The population mean is $30,000 and so the sample mean is within $5,000 of the population mean if the
sample mean lies in the range $25,000 to $35,000. From the probability distribution in Table 8.2 we calculate
that:
\[ P(25 \le \bar{X} \le 35) = \tfrac{4}{16} + \tfrac{6}{16} + \tfrac{4}{16} = \tfrac{14}{16} \]
The probability that the sample mean is within $5,000 of the population mean is 14/16.
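The listing method of Example 8.1 is easy to reproduce by enumeration. The sketch below, using exact fractions from the Python standard library, lists all 16 ordered samples taken with replacement and rebuilds the sampling distribution, its mean and variance, and the probability in part (c). The variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Incomes in $000 for Arthur, Beth, Clare and Daniel.
incomes = {"a": 20, "b": 30, "c": 30, "d": 40}

# All ordered samples of size 2 taken with replacement (16 in total).
samples = list(product(incomes, repeat=2))

# Count how many samples give each value of the sample mean.
counts = Counter(Fraction(incomes[i] + incomes[j], 2) for i, j in samples)
dist = {mean: Fraction(c, len(samples)) for mean, c in sorted(counts.items())}

expected = sum(m * pr for m, pr in dist.items())
variance = sum(m**2 * pr for m, pr in dist.items()) - expected**2
within_5 = sum(pr for m, pr in dist.items() if abs(m - 30) <= 5)

for mean, pr in dist.items():
    print(mean, pr)
print("E =", expected, " V =", variance, " P(within 5) =", within_5)
```

The fractions are printed in lowest terms, so 4/16 appears as 1/4 and 14/16 as 7/8.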
From the above example, we can see that when a simple random sample is taken from a population:
the sample mean will differ from sample to sample (see Table 8.1).
the sample mean will have a probability distribution (see Table 8.2).
the probability distribution of the sample mean will have a mean and a variance.
Sampling is never 100% reliable. In Example 8.1 the probability that the sample mean is exactly equal to the
population mean is only 6/16, i.e. 0.375. One method of measuring the reliability of a sample estimate is to nominate an
acceptable sampling error and then calculate the probability that this nominated error is not exceeded. The higher
the probability, the more reliable is the estimate. In Example 8.1 the probability that the sampling error does not
exceed $5,000 is shown to be 14/16.
The probability distribution of the sample mean in Table 8.2 is called the sampling distribution of the sample
mean. This is a special case of the following definition.
Sampling distribution: The probability distribution of a statistic is called the sampling distribution of the statistic.
The method of listing all the samples used in Example 8.1 to find the sampling distribution of the sample mean
can only be used for a small sample from a small population. In the next section, results are given for finding the
sampling distribution of the mean of a large sample from a large population.
Quick Question 1
8.4 The sampling distribution of the sample mean
In Example 8.1 we were able to answer three questions:
(a) What is the sampling distribution of the mean?
(b) What are the mean and variance of the sampling distribution of the mean?
(c) What is the probability that the sample mean is close to the population mean?
These questions were answered by listing all the possible samples and then constructing the sampling distribution
of the sample mean. This method can only be used where a small sample is taken from a small population;
otherwise the list of possible samples becomes too long. In this section, you will learn how to answer these
questions for samples taken from large populations.
Three important results on the sampling distribution of the sample mean are given below. These results
require that each element in the sample is chosen independently of the other elements in the sample. This
requirement is satisfied where:
The independence requirement
either (a) the sample is taken with replacement
or (b) the sample is small compared to the size of the population.
From this point on, we will always assume that the independence requirement is satisfied.
The sampling distribution of the sample mean
1. If a simple random sample of any size n is taken from any population with mean \( \mu \) and variance \( \sigma^2 \), then:
\[ E[\bar{X}] = \mu \quad \text{and} \quad V[\bar{X}] = \frac{\sigma^2}{n} \tag{8.1} \]
2. If a simple random sample of any size n is taken from a normal population with mean \( \mu \) and variance \( \sigma^2 \),
then:
\[ \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \tag{8.2} \]
3. The central limit theorem.
If a large simple random sample of size n (say \( n \ge 30 \)) is taken from any population with mean \( \mu \)
and variance \( \sigma^2 \), then:
\[ \bar{X} \overset{\cdot}{\sim} N\left(\mu, \frac{\sigma^2}{n}\right) \tag{8.3} \]
(Recall that \( \overset{\cdot}{\sim} \) means "is distributed approximately as".)
Comparing these three results, we note that:
1. The first result applies to samples of any size from any population (subject to the independence
requirement given above). This result was found to hold in Example 8.1. The standard deviation of a statistic is
called its standard error and so:
\[ SE[\bar{X}] = \frac{\sigma}{\sqrt{n}} \tag{8.4} \]
2. The second result is exact and applies to samples of any size from a normal population.
3. The third result is only an approximate result. It applies to large samples from any population.
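Results 1 and 3 can be illustrated by simulation. The sketch below is an assumption-laden illustration, not part of the course material: it draws repeated samples of size 30 from an exponential population with mean 1 and variance 1 (a hypothetical, clearly non-normal population chosen for this example), and compares the mean and variance of the simulated sample means with \( \mu \) and \( \sigma^2/n \).

```python
import random
from statistics import fmean, pvariance

random.seed(1)  # fixed seed so the run is repeatable

n, reps = 30, 20_000     # sample size and number of simulated samples
mu, sigma2 = 1.0, 1.0    # mean and variance of the exponential(1) population

# Each entry is the mean of one simple random sample of size n.
sample_means = [
    fmean([random.expovariate(1.0) for _ in range(n)]) for _ in range(reps)
]

mean_of_means = fmean(sample_means)
var_of_means = pvariance(sample_means)

print(f"mean of sample means     {mean_of_means:.4f} (theory {mu:.4f})")
print(f"variance of sample means {var_of_means:.4f} (theory {sigma2 / n:.4f})")
```

A histogram of `sample_means` would look close to a normal curve even though the population itself is strongly skewed, which is what the central limit theorem predicts for large n.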
In the following example, we will examine a problem that is similar to Example 8.1 except that the population is
not small. It is not possible to list all the samples but we can still deduce the sampling distribution of the sample
mean and calculate the probability that the sample mean is close to the population mean.
Example 8.2
A simple random sample of size 2 is taken from a normal population with mean 30 and variance 50.
(a) What is the sampling distribution of the sample mean?
(b) What is the probability that the sample mean is within 5 of the population mean?
Solution
Let \( \bar{X} \) = the mean of the sample.
(a) As the population is normal, Result 2 can be applied:
\[ \bar{X} \sim N\left(30, \frac{50}{2}\right) \]
\[ \therefore \bar{X} \sim N(30, 5^2) \]
The possible values of the sample mean are normally distributed with a mean of 30 and a standard deviation
of 5.
(b) The population mean is 30 and so the sample mean is within 5 of the population mean if the sample mean
lies between 25 and 35. The probability that the sample mean lies between 25 and 35 is the shaded area in
Figure 8.1.
Figure 8.1: The Sampling Distribution of the Sample Mean
[Figure: normal curve for \( \bar{X} \) with axis marks at 15, 25, 30, 35 and 45; the shaded area \( P(25 \le \bar{X} \le 35) \) runs from \( Z = -1.00 \) to \( Z = +1.00 \), with \( SE(\bar{X}) = 5 \).]
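\( \bar{X} = 25 \Rightarrow Z = \frac{25 - 30}{5} = -1.000 \)
\( \bar{X} = 35 \Rightarrow Z = \frac{35 - 30}{5} = +1.000 \)
\( \therefore P(25 \le \bar{X} \le 35) = P(-1.000 \le Z \le +1.000) \)
\( = 0.8413 - 0.1587 \) (UC Statistical Tables p.34 and 35)
\( = 0.6826 \)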
The probability that the sample mean is within 5 of the population mean is 0.6826.
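The calculation in Example 8.2 can be checked without tables. The following sketch uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative. The result differs from the table answer in the fourth decimal place because the table values are rounded.

```python
from math import sqrt
from statistics import NormalDist

mu, variance, n = 30, 50, 2

# Standard error of the sample mean, sigma / sqrt(n).
se = sqrt(variance / n)

# Sampling distribution of the sample mean (Result 2: normal population).
sampling_dist = NormalDist(mu=mu, sigma=se)

# P(25 <= sample mean <= 35), i.e. within 5 of the population mean.
prob = sampling_dist.cdf(35) - sampling_dist.cdf(25)

print(f"SE = {se}, P(25 <= mean <= 35) = {prob:.4f}")
```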
Quick Question 2
8.5 Estimating the population mean: \( \sigma \) known
Sample data is used to estimate the values of population parameters. Using a single number calculated from a
sample as an estimate of a population parameter is called a point estimate.
Point estimator: draws inferences about a population by estimating the value of an unknown population
parameter, using a single value or point based on a sample.
The rule used to calculate the single value is called an estimator. The value calculated from the rule is called an
estimate.
For most populations, the sample mean is the best available point estimator of the population mean.
Example 8.3
A simple random sample of 10 students had the following percentage marks in a test:
48 52 58 60 64 68 72 82 92 94
Estimate the mean mark of all students taking the test.
Solution
The rule is to use the sample mean to estimate the population mean. The estimate is
\[ \bar{X} = \frac{48+52+58+60+64+68+72+82+92+94}{10} = 69 \]
The estimated mean mark of all the students taking the test is 69%.
In the previous sections, we have seen that the sample mean has a probability distribution. The sample mean is
not expected to be equal to the population mean. All we can deduce from a point estimate of the population mean
is that the population mean is probably somewhere close to this estimate. But how probably and how close?
Confidence interval estimators are much more useful than point estimators because they quantify the reliability
of the estimates.
Confidence interval estimator: draws inferences about a population by estimating the value of an unknown
population parameter, using an interval that includes the population parameter with a stated probability based on
a sample.
A typical confidence interval estimate is:
the mean mark in the test is somewhere between 61% and 77% with probability 0.95.
Confidence interval estimates are also called confidence intervals and interval estimates and the probabilities
are often given as percentages. The above confidence interval estimate could be reported as:
the range 61% to 77% is a 95% confidence interval for the population mean.
Confidence intervals are usually given with 90%, 95% or 99% confidence levels.
Statisticians use the letter \( \alpha \) to represent the probability that the population parameter does NOT lie between the
upper and lower limits of the confidence interval estimator. The probability that the population parameter does
lie between the upper and lower limits is then \( 1 - \alpha \) and so the confidence level is \( 100(1 - \alpha)\% \).
The confidence interval estimator for the population mean
From Equations 8.2 and 8.3
\[ \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \]
and so the z-score for a sample mean is
\[ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]
This z-score has the standard normal distribution shown in Figure 8.2.
Figure 8.2: The Standard Normal Distribution
[Figure: standard normal curve on a Z axis marked \( -z_{\alpha/2} \), 0 and \( +z_{\alpha/2} \); the central area is \( 1 - \alpha \) and each tail area is \( \frac{\alpha}{2} \).]
(Recall from section 7.5 that \( z_{\alpha/2} \) is the value of Z with an area \( \frac{\alpha}{2} \) to the right.)
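From Figure 8.2 we can see that for the standard normal distribution:
\[ P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha \]
\[ \therefore P\left(-z_{\alpha/2} < \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha \]
A manipulation of the inequalities gives
\[ P\left(\bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \tag{8.5} \]
Equation 8.5 gives a range within which the population mean lies with probability \( (1 - \alpha) \). It is the \( 100(1 - \alpha)\% \)
confidence interval estimate for the population mean.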
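The manipulation of the inequalities that leads to Equation 8.5 can be written out step by step. The following is a sketch of the algebra, using only the symbols already defined in this section.

```latex
\begin{aligned}
-z_{\alpha/2} &< \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} < z_{\alpha/2} \\
-z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} &< \bar{x}-\mu < z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
  && \text{(multiply through by } \sigma/\sqrt{n} > 0\text{)} \\
-\bar{x}-z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} &< -\mu < -\bar{x}+z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
  && \text{(subtract } \bar{x}\text{)} \\
\bar{x}-z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} &< \mu < \bar{x}+z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
  && \text{(multiply by } -1\text{, which reverses the inequalities)}
\end{aligned}
```

Each step leaves the probability of the event unchanged, so the final range still has probability \( 1 - \alpha \).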
To find the value of \( z_{\alpha/2} \) for a confidence interval estimate, sketch the graph of the standard normal distribution
and mark in the confidence level as the central range in the standard normal distribution. Then use Table 4B on
page 33 of the UC Statistical Tables to determine \( z_{\alpha/2} \).
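As a cross-check on the tables, \( z_{\alpha/2} \) and the interval in Equation 8.5 can also be computed directly. The sketch below uses `statistics.NormalDist` from the Python standard library. The function names are illustrative, and the population standard deviation `sigma = 13` is a hypothetical value chosen only for this illustration; it is not given in the text. The marks are those of Example 8.3.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_half_alpha(confidence):
    """Value of Z with an area of alpha/2 to its right."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def mean_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Interval from Equation 8.5, assuming sigma is known."""
    margin = z_half_alpha(confidence) * sigma / sqrt(n)
    return xbar - margin, xbar + margin


marks = [48, 52, 58, 60, 64, 68, 72, 82, 92, 94]  # sample from Example 8.3
sigma = 13  # hypothetical known population standard deviation (assumption)

low, high = mean_confidence_interval(fmean(marks), sigma, len(marks), 0.95)
print(f"z for 95%: {z_half_alpha(0.95):.3f}")
print(f"95% confidence interval: {low:.1f} to {high:.1f}")
```

With this assumed value of sigma the interval is roughly 61 to 77, the same form as the typical confidence interval estimate quoted earlier in this section.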
Example 8.4
What is the value of \( z_{\alpha/2} \) for a 90% confidence interval estimate?
Solution
First sketch the graph with a central area of 0.90 marked in.
Figure 8.3: Constructing a 90% Confidence Interval Estimate for the Mean
[Figure: standard normal (Z) curve from -3 to 3 with a central area of 0.90 and an area of 0.05 in each tail. The upper boundary of the central area is labelled "This is the z-score to use".]
As the central area is 0.90, the sum of the areas in the two tails of the curve is $1 - 0.90 = 0.10$. From the symmetry
of the normal curve, half of the unshaded area is in the lower tail and half in the upper tail. The area in each tail
is $\frac{0.10}{2} = 0.05$.
The area below z is $1 - 0.05 = 0.95$. From Table 4B on page 33, $z_{0.05} = 1.6449$.
The 90% confidence interval estimate for the mean is therefore:
$$P\left(\bar{x} - 1.6449\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.6449\frac{\sigma}{\sqrt{n}}\right) = 0.90$$
For a 95% confidence interval estimate, the z-score is 1.9600 and for a 99% confidence interval estimate the
z-score is 2.5758.
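The table lookup can also be checked numerically. The following is a minimal sketch using Python's standard library in place of the UC Statistical Tables; the function name `z_critical` is an illustrative assumption, not part of the course material.

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Return z_{alpha/2} for a central area equal to `confidence`."""
    alpha = 1.0 - confidence
    # The tables give areas to the left, so look up 1 - alpha/2.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 4))
```

To four decimal places the results agree with the z-scores quoted above for 90%, 95% and 99% confidence.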
Example 8.5
A simple random sample of 50 units in Surfers that were rented out from 1989 to 1999 showed a mean rate of
return on the investment of 4.3% and a sample standard deviation of 1.0%. Calculate a point estimate and a 95%
confidence interval estimate for the mean rate of return on all units in Surfers.
Solution
Let $\mu$ = mean rate of return on all units rented out between 1989 and 1999.
A point estimate for the mean rate of return on all units is
$\hat{\mu} = \bar{x} = 4.3\%$.
The (point) estimate of the mean rate of return on all units in Surfers is 4.3%.
For a 95% confidence interval estimate, we have
$$P\left(\bar{x} - 1.9600\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.9600\frac{\sigma}{\sqrt{n}}\right) = 0.95$$
Here the population standard deviation is unknown and so we will use the sample standard deviation s = 1.0 as
an estimate of the population standard deviation.
$$P\left(4.3 - 1.9600\frac{1.0}{\sqrt{50}} < \mu < 4.3 + 1.9600\frac{1.0}{\sqrt{50}}\right) = 0.95$$
$$P(4.0 < \mu < 4.6) = 0.95$$
The 95% confidence interval estimate for the mean rate of return on all units in Surfers between 1989 and 1999
is 4.0% to 4.6%.
In other words, on the basis of this sample of only 50 units we can be 95% certain that the mean rate of return on
all units in Surfers is somewhere between 4.0% and 4.6%.
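The arithmetic of Example 8.5 can be reproduced with a short script. This is only a sketch: the helper name `mean_ci_z` is hypothetical, and, as in the example, the sample standard deviation stands in for the unknown population standard deviation.

```python
import math


def mean_ci_z(xbar: float, sd: float, n: int, z: float) -> tuple[float, float]:
    """Unrounded limits xbar -/+ z * sd / sqrt(n)."""
    half_width = z * sd / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Example 8.5: n = 50, sample mean 4.3%, s = 1.0%, z = 1.9600 for 95% confidence.
lower, upper = mean_ci_z(4.3, 1.0, 50, 1.9600)
print(lower, upper)
```

The unrounded limits lie just inside the interval 4.0% to 4.6% quoted in the example.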
When rounding the limits of confidence interval estimates, always round the lower limit downwards and the upper
limit upwards so that the rounded interval contains the unrounded interval. This makes the probability that the
$100(1-\alpha)\%$ confidence interval estimate contains the population parameter at least $(1-\alpha)$. For example, a
confidence interval estimate of 4.673 to 5.046 when rounded to one decimal place becomes a confidence interval
estimate of 4.6 to 5.1.
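The outward rounding rule can be sketched in code as follows. The function name `round_interval_outward` is an assumption made for illustration, and binary floating point may misplace a limit that sits exactly on a rounding boundary.

```python
import math


def round_interval_outward(lower: float, upper: float, decimals: int) -> tuple[float, float]:
    """Round the lower limit down and the upper limit up to `decimals` places."""
    factor = 10 ** decimals
    rounded_lower = math.floor(lower * factor) / factor  # never moves above `lower`
    rounded_upper = math.ceil(upper * factor) / factor   # never moves below `upper`
    return rounded_lower, rounded_upper


print(round_interval_outward(4.673, 5.046, 1))  # (4.6, 5.1)
```

Ordinary rounding would give 4.7 to 5.0 here, an interval that no longer contains the unrounded one.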
The main results of this section are summarised below.
Estimating the population mean: $\sigma$ known
Assumptions
1. A simple random sample of size n is taken from the population.
2. The sample values are selected independently.
3. Either:
(a) the population is normal or
(b) $n \geq 30$
Point estimate of $\mu$: only requires assumption 1
$\hat{\mu} = \bar{x}$
$100(1-\alpha)\%$ confidence interval estimate of $\mu$
$$P\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$
Quick Question 3
8.6 Estimating the population mean: $\sigma$ unknown
Example 8.5 highlighted a problem with the confidence interval estimator for the mean used in section 8.5: the
confidence interval estimate can only be calculated when the population standard deviation is known. In practice,
of course, the population standard deviation is (almost) never known. To calculate a confidence interval estimate
when the population standard deviation is not known requires two changes to the procedure outlined in section 8.5.
1. The sample standard deviation ($s$) is used as an estimate of the population standard deviation ($\sigma$).
2. The t distribution is used instead of the standard normal distribution.
The point estimator and confidence interval estimator for the mean when the population standard deviation is not
known are as shown overleaf.
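Estimating the population mean: $\sigma$ unknown
Assumptions
3. The population is normal.
Point estimate of $\mu$: only requires assumption 1
$\hat{\mu} = \bar{x}$
$100(1-\alpha)\%$ confidence interval estimate of $\mu$
$$P\left(\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}\right) = 1 - \alpha$$
where $t_{\alpha/2}$ is from the t-tables with $\nu = n - 1$.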
The confidence interval estimator requires that the population is normal. However, the confidence interval estimate
stated above is approximately valid for non-normal populations when:
1. the sample is reasonably large (say $n \geq 10$) and
2. the population is not extremely skewed.
The t distribution
The t distribution used to find the confidence interval for the mean has a single parameter $\nu$. We will use this
distribution frequently in the remainder of this course and you will always be told the value of $\nu$ to use. The t
distribution has a very similar shape to the standard normal distribution, and the larger the value of $\nu$ the closer
the t distribution is to the normal distribution.
Figure 8.4: Comparing the t and z Distributions
[Figure: three curves plotted from -3 to 3, labelled "t with $\nu = 2$", "t with $\nu = 30$" and "z distribution".]
As you can see, for $\nu \geq 30$ the t distribution is almost identical to the standard normal distribution. This leads the
text to suggest that for $\nu \geq 30$ the normal tables can be used to calculate t-values. We will not use this procedure
here. When the t distribution is used we will find the values from the t-tables.
There are t-tables on page 40 of the UC Statistical Tables. Notice that with these t-tables the $\nu$ value is listed
down the left hand side of the tables and the areas to the left are listed along the top of the table. The UC
Statistical Tables are consistent: the areas in the tables are always areas to the left! To use the t-tables you must
first calculate the area to the left of the required t-score.
Example 8.6
What is $t_{\alpha/2}$ for a 90% confidence interval estimate from a sample of size 6?
Solution
[Figure: t distribution curve from -3 to 3 with $\nu = 6 - 1 = 5$, a central area of 0.90 and an area of 0.05 in each tail. The area above the t-score is 0.05, and the upper boundary of the central area is labelled "This is the t-score to use".]
The central shaded area is 0.90 and so, as the total area under the curve is 1, the area in the two tails is
$(1 - 0.90) = 0.10$. The two tails are of equal size. Each tail has an area of $\frac{0.10}{2} = 0.05$. The area below
the required t-score is $1 - 0.05 = 0.95$ and so the t-score is in the column p = 0.95. The row is $\nu = n - 1 = 5$.
Table 8.3: Using the t-tables
Rows are $\nu$; columns are the area to the left, p.
 nu    p=0.75  0.80   0.85   0.90   0.95   0.975   0.99    0.995   0.999   0.9995
 1     1.000   1.376  1.963  3.078  6.314  12.706  31.821  63.657  318.31  636.62
 2     0.816   1.061  1.386  1.886  2.920   4.303   6.965   9.925  22.326  31.598
 3     0.765   0.978  1.250  1.638  2.353   3.182   4.541   5.841  10.213  12.924
 4     0.741   0.941  1.190  1.533  2.132   2.776   3.747   4.604   7.173   8.610
 5     0.727   0.920  1.156  1.476  2.015   2.571   3.365   4.032   5.893   6.869
 6     0.718   0.906  1.134  1.440  1.943   2.447   3.143   3.707   5.208   5.959
Column used: p = 0.95 (that is, $\alpha/2 = 0.05$). Row used: $\nu = 5$.
When $\nu = 5$, $t_{0.05} = 2.015$.
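The lookup procedure in Example 8.6 can be sketched as code. The dictionary below holds only the six rows printed in Table 8.3, and the names `P_COLUMNS`, `T_TABLE` and `t_critical` are illustrative assumptions rather than anything defined in the UC Statistical Tables.

```python
# Excerpt of the t-tables from Table 8.3.
# Keys are degrees of freedom; columns are areas to the left (p).
P_COLUMNS = (0.75, 0.80, 0.85, 0.90, 0.95, 0.975, 0.99, 0.995, 0.999, 0.9995)
T_TABLE = {
    1: (1.000, 1.376, 1.963, 3.078, 6.314, 12.706, 31.821, 63.657, 318.31, 636.62),
    2: (0.816, 1.061, 1.386, 1.886, 2.920, 4.303, 6.965, 9.925, 22.326, 31.598),
    3: (0.765, 0.978, 1.250, 1.638, 2.353, 3.182, 4.541, 5.841, 10.213, 12.924),
    4: (0.741, 0.941, 1.190, 1.533, 2.132, 2.776, 3.747, 4.604, 7.173, 8.610),
    5: (0.727, 0.920, 1.156, 1.476, 2.015, 2.571, 3.365, 4.032, 5.893, 6.869),
    6: (0.718, 0.906, 1.134, 1.440, 1.943, 2.447, 3.143, 3.707, 5.208, 5.959),
}


def t_critical(confidence: float, n: int) -> float:
    """Look up t_{alpha/2} for a sample of size n in the table excerpt."""
    alpha = 1.0 - confidence
    p = 1.0 - alpha / 2.0   # area to the left of the required t-score
    df = n - 1              # row of the table
    for column, area in enumerate(P_COLUMNS):
        if abs(area - p) < 1e-9:  # tolerate floating-point noise in p
            return T_TABLE[df][column]
    raise KeyError(f"no column for p = {p}")


print(t_critical(0.90, 6))  # 2.015
```

A sample size whose degrees of freedom fall outside the excerpt raises a `KeyError`; the full tables would be needed in that case.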
Example 8.7
A supermarket recorded the number of packets of chips it sold on eight randomly chosen days. The numbers sold
were:
86 88 90 82 78 84 78 92
Find a point estimate and a 95% confidence interval estimate for the mean number of packets of chips the supermarket
sells per day.
Solution
Let $\mu$ = the mean number of packets of chips sold in a day.
Entering these data into a calculator in SD mode and pressing the $n$, $\bar{x}$ and $\sigma_{n-1}$ keys gives
$n = 8$, $\bar{x} = 84.75$, $s = 5.2304$
The point estimate for the mean
Always use the sample mean as the point estimator for the population mean.
The point estimate of the mean number of packets of chips sold per day is 85.
The confidence interval estimate for the mean
The degrees of freedom are $\nu = n - 1 = 7$. For a 95% confidence interval estimate, the t distribution diagram is
as shown in Figure 8.6.
[Figure 8.6: t distribution curve from -3 to 3 with $\nu = 7$, a central area of 0.95 and an area of 0.025 in each tail. The area above the t-score is 0.025, and the upper boundary of the central area is labelled "This is the t-score to use".]
From the t-tables with p = 0.975 and $\nu = 7$, $t_{0.025} = 2.365$. The confidence interval estimate is:
$$P\left(84.75 - 2.365\frac{5.2304}{\sqrt{8}} < \mu < 84.75 + 2.365\frac{5.2304}{\sqrt{8}}\right) = 0.95$$
$$P(80.38 < \mu < 89.12) = 0.95$$
On the basis of the sample data we can conclude that the supermarket sells on average somewhere between 80
and 90 packets of chips per day with probability 0.95.
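As a check on the calculator work in Example 8.7, the same quantities can be computed with Python's standard library. This is a sketch only: the t-score 2.365 is typed in from the t-tables, and the variable names are illustrative.

```python
import math
import statistics

# Packets of chips sold on the eight sampled days (Example 8.7).
sales = [86, 88, 90, 82, 78, 84, 78, 92]

n = len(sales)
xbar = statistics.mean(sales)
s = statistics.stdev(sales)  # sample standard deviation, divisor n - 1

t = 2.365  # from the t-tables with p = 0.975 and 7 degrees of freedom
half_width = t * s / math.sqrt(n)
lower, upper = xbar - half_width, xbar + half_width

print(n, xbar, round(s, 4))
print(lower, upper)
```

`statistics.stdev` uses the n - 1 divisor and so gives the sample standard deviation s, whereas `statistics.pstdev` would divide by n.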
Quick Question 4
8.7 The sample size for estimating a population mean
We now turn to the question: How large a sample should we take to obtain a reliable estimate of the population
mean? There are two considerations here.
How accurate do we want the estimate to be?
With sampling we cannot expect that the sample mean will be exactly equal to the population mean and
so an acceptable margin of error has to be specified. Obviously, the smaller the acceptable margin of
error, the larger the sample will need to be.
How certain do we want to be that the estimate meets this accuracy requirement?
Sampling is not 100% reliable and cannot guarantee to be as accurate as required all the time. However,
it can meet the requirement almost all of the time e.g. 90%, 95% or 99% of the time. The greater the
degree of certainty required, the larger the sample will need to be.
Let $B$ = the maximum allowable error (also called the error bound)
$1 - \alpha$ = the acceptable probability that the error is less than $B$.
Then the requirement is that the sample mean should be within $B$ of the population mean with probability $1 - \alpha$.
This can be written as
$$P(\bar{x} - B < \mu < \bar{x} + B) = 1 - \alpha$$
Comparing this to the confidence interval for the mean in Equation 8.5 gives
$$B = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad\therefore\quad n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2$$
The sample size in this formula is the smallest sample size that will satisfy the accuracy requirements. Any larger
sample size will also satisfy the requirements.
In practice, the population standard deviation is not known and some estimate has to be used. This estimate can
be obtained from a small pilot study of the population or from previous research work in the area.
Calculating the required sample size for estimating the mean
$$n \geq \left(\frac{z_{\alpha/2}\,\hat{\sigma}}{B}\right)^2$$
where $B$ = the maximum allowable error
$1 - \alpha$ = the acceptable probability that the error is less than $B$
$\hat{\sigma}$ = an estimate of the population standard deviation.
Example 8.8
A researcher wants to estimate the mean number of hours per week that school age children spend watching
television. She wants to be 90% certain that her estimate is in error by no more than 2 hours. Previous research
estimated the standard deviation of the number of hours watched per week to be 8 hours.
Solution
Here $B = 2$ hours
$1 - \alpha = 0.90$
$\hat{\sigma} = 8$ hours
To find the required z-score, sketch the standard normal with the required confidence level as the central area.
Figure 8.7: Finding the z-score for the Sample Size Calculation
[Figure: standard normal (z) curve from -3 to 3 with a central area of 0.90 and an area of 0.05 in each tail. The upper boundary of the central area is labelled "This is the z-score to use".]
Here we use \(z_{0.05} = 1.6449\).
\[ n \ge \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2 = \left(\frac{1.6449 \times 8}{2}\right)^2 \]
\(\therefore\ n \ge 43.29\)
Now n is a sample size and so must be an integer. The figure estimated above is the minimum sample size. A
smaller sample size is not sufficient. Thus the sample size must always be rounded upwards. The minimum
sample size is 44.
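The rounding-up rule can be checked with a short Python sketch. The function name `sample_size_for_mean` is an illustrative assumption and not part of these notes; the inputs are the values used in the calculation above (z = 1.6449, a population standard deviation of 8 and a maximum allowable error of 2).

```python
import math


def sample_size_for_mean(z, sigma, B):
    """Smallest integer n satisfying n >= (z * sigma / B) ** 2."""
    # math.ceil always rounds upwards, as the text requires.
    return math.ceil((z * sigma / B) ** 2)


n = sample_size_for_mean(1.6449, 8, 2)
print(n)  # 44
```

Using `math.ceil` rather than `round` matters here: 43.29 would round to 43, which is below the minimum sample size.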
Quick Question 5
8.8 Estimating a population proportion
In this section, you will learn to use sampling to answer such questions as
what proportion of the population is unemployed?
what proportion of the population lives below the poverty line?
what percentage of voters intends to vote for the Government's candidate in the coming by-election?
what proportion of BS1 assignments are returned to the students within three weeks of submission?
what percentage of batteries fail within six months of the purchase date?
In each of the above questions, we are trying to estimate the proportion of the population that has a particular
characteristic.
Let p = the proportion in the population with the characteristic.
To estimate the population proportion, a simple random sample is taken from the population and the number in
the sample with the characteristic recorded.
Let n = the size of the sample
X = the number in the sample with the characteristic.
Then \(\hat{p} = \dfrac{x}{n}\) = the proportion in the sample with the characteristic.
The proportion in the sample with the characteristic is the obvious point estimator of the proportion in the
population with the characteristic. It can be shown that the sample proportion is the best point estimator of the
population proportion.
Interval estimates for the population proportion can be derived by using the normal approximation to the binomial.
You will remember that the normal approximation to the binomial can only be used when \(np > 5\) and \(nq > 5\),
where p is the probability of success in each trial and \(q = 1 - p\). With proportion sampling the value of p is the
population proportion with the characteristic, which is unknown, but \(\hat{p}\) gives an estimate of p. Then, using
\(\hat{p}\) as an estimate of p, the interval estimate for the population proportion is only valid when \(n\hat{p} > 5\)
and \(n\hat{q} > 5\), where \(\hat{q} = 1 - \hat{p}\).
The main results on estimating a population proportion are given below.
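Estimating a population proportion
Assumptions
3. \(n \ge 30\) and \(n\hat{p},\ n\hat{q} \ge 5\)
Point estimate of p : only requires assumption 1
Let X = the number in the sample with the characteristic.
\[ \hat{p} = \frac{x}{n} \]
\(100(1-\alpha)\%\) interval estimate of p
\[ P\left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}\right) = 1 - \alpha \]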
Example 8.9
A simple random sample of 500 youths aged between 15 and 20 revealed that 300 of them were unemployed.
Find a point and 95% interval estimate for the youth unemployment rate.
Solution
Let p = the proportion of youths in the population who are unemployed.
Then \(\hat{p} = \dfrac{300}{500} = 0.6\)
\(\hat{q} = 1 - \hat{p} = 0.4\)
The estimated youth unemployment rate is 60%.
Before constructing the confidence interval estimate, first check whether the conditions are met:
\(n\hat{p} = 500 \times 0.6 = 300 > 5\)
and \(n\hat{q} = 500 \times 0.4 = 200 > 5\)
As both \(n\hat{p}\) and \(n\hat{q}\) are greater than 5, the confidence interval given above can be used.
To construct a 95% interval estimate, we need the following values:
n = 500
\(\hat{p} = \dfrac{300}{500} = 0.6\)
Applying the formula for the 95% confidence interval for a proportion given above we have:
\[ P\left(0.6 - 1.9600\sqrt{\frac{0.6 \times 0.4}{500}} < p < 0.6 + 1.9600\sqrt{\frac{0.6 \times 0.4}{500}}\right) = 0.95 \]
\[ P(0.557 < p < 0.643) = 0.95 \]
On the basis of the sample data, the youth unemployment rate lies between 55.7% and 64.3% with probability 0.95.
Notice that although the question was framed in terms of percentages, all the calculations were carried out on
proportions. Always work with proportions and convert the results to percentages at the end of the calculations.
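The calculation in Example 8.9 can be reproduced with a minimal Python sketch. The function name `proportion_interval` is an assumption made for illustration; the check on \(n\hat{p}\) and \(n\hat{q}\) follows the condition stated in the text, and the z value is supplied by the user rather than looked up.

```python
import math


def proportion_interval(x, n, z):
    """Interval estimate p_hat +/- z * sqrt(p_hat * q_hat / n), worked in proportions."""
    p_hat = x / n
    q_hat = 1 - p_hat
    # The normal approximation is only valid when both counts exceed 5.
    if n * p_hat <= 5 or n * q_hat <= 5:
        raise ValueError("n * p_hat and n * q_hat must both be greater than 5")
    half_width = z * math.sqrt(p_hat * q_hat / n)
    return p_hat - half_width, p_hat + half_width


lower, upper = proportion_interval(300, 500, 1.96)
print(f"{lower:.3f} < p < {upper:.3f}")  # 0.557 < p < 0.643
```

The function works with proportions throughout; multiply the two limits by 100 at the end if percentages are wanted.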
Quick Question 6
8.9 The sample size for estimating a population proportion
In Example 8.9, a sample of size 500 was used to estimate the population proportion. This is in line with the many
opinion polls published in Australian newspapers which often use a sample of 500 to estimate public opinion. Is
this sample large enough to give reliable estimates?
The same considerations apply as for estimating a population mean:
1. how accurate do we want the estimate to be?
2. how certain do we want to be that the estimate meets this accuracy requirement?
Let B = the maximum allowable error. This must be expressed as a proportion and not as a
percentage.
\(1 - \alpha\) = the acceptable probability that the error is less than B.
p = the proportion in the population with the characteristic.
The required sample size is:
\[ n \ge pq\left(\frac{z_{\alpha/2}}{B}\right)^2 \]
This result cannot be used directly because it includes the unknown population parameter p. There are two
approaches to this problem.
1. Take a pilot sample and use \(\hat{p}\) from the pilot study as an estimate of p. The sample size is then estimated
by:
\[ n \ge \hat{p}\hat{q}\left(\frac{z_{\alpha/2}}{B}\right)^2 \]
2. Choose n so that it is large enough for all possible values of p. With this approach, the sample size is
estimated by:
\[ n \ge p^{*}q^{*}\left(\frac{z_{\alpha/2}}{B}\right)^2 \]
where \(p^{*}\) maximises the value of pq over the range of possible p, and \(q^{*} = 1 - p^{*}\).

We will use the second approach.
p is a proportion and so its value lies between 0 and 1. The expression \(pq = p(1 - p) = p - p^2\) is a simple
quadratic and its graph is sketched below.
Figure 8.8: Choosing the Sample Size to Estimate a Proportion
[Figure: graph of pq (vertical axis, 0.00 to 0.25) against p (horizontal axis, 0.0 to 1.0).]
The value of \(p^{*}\) is the value of p that gives the highest point on this graph over the possible range of p. If there is
no prior information on the value of p, this graph has its maximum value of 0.25 when p = 0.5.
Calculating the required sample size for estimating a proportion
\[ n \ge p^{*}q^{*}\left(\frac{z_{\alpha/2}}{B}\right)^2 \]
where B = the maximum allowable error
\(1 - \alpha\) = the acceptable probability that the error is less than B
and \(p^{*}\) = the value of p that maximises pq over the range of possible p, with \(q^{*} = 1 - p^{*}\).
When there is no prior knowledge of p, the required sample size is
\[ n \ge 0.25\left(\frac{z_{\alpha/2}}{B}\right)^2 \]
Example 8.10
You have been engaged to estimate the percentage of voters who feel that the Prime Minister is doing a good job.
How large a sample should you take to be at least 95% confident that the sample percentage is within 2% of the
population percentage:
(a) when there is no information about the population percentage?
(b) when it is known from a previous study that the population percentage is between 20% and 40%?
Solution
Let p = the proportion of voters who feel the Prime Minister is doing a good job.
(a) Here we have \(0 \le p \le 1\)
\(B = 0.02 \qquad z_{0.025} = 1.9600\)
\[ \therefore\ n \ge 0.25\left(\frac{1.9600}{0.02}\right)^2 = 2401 \]
The minimum sample size is 2401.
(b) Here \(0.2 \le p \le 0.4\)
Figure 8.9: Choosing the Sample Size to Estimate a Proportion
[Figure: graph of pq against p, with the range of possible p (\(0.2 \le p \le 0.4\)) marked on the horizontal axis and the maximum value of pq over that range marked on the curve.]
Over the range \(0.2 \le p \le 0.4\), the graph has its highest point where p = 0.4. Here we have
\(B = 0.02 \qquad z_{0.025} = 1.9600 \qquad p^{*} = 0.40\)
\[ \therefore\ n \ge 0.4 \times 0.6\left(\frac{1.96}{0.02}\right)^2 = 2304.96 \]
The minimum sample size is 2305.
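Both parts of Example 8.10 can be checked with the Python sketch below. The function name `sample_size_for_proportion` and its `p_low` / `p_high` parameters are illustrative assumptions, not notation from these notes. Inputs are passed as strings and held as exact fractions, because ordinary floating-point arithmetic can push a result such as 2401 marginally above the integer and cause it to be rounded up to 2402.

```python
import math
from fractions import Fraction


def sample_size_for_proportion(z, B, p_low="0", p_high="1"):
    """Smallest integer n with n >= p*(1 - p*) * (z / B)**2, where p* maximises pq."""
    z, B = Fraction(z), Fraction(B)
    p_low, p_high = Fraction(p_low), Fraction(p_high)
    # pq = p(1 - p) peaks at p = 0.5, so within a restricted range the
    # maximum is at the allowed value of p closest to 0.5.
    p_star = min(max(Fraction(1, 2), p_low), p_high)
    return math.ceil(p_star * (1 - p_star) * (z / B) ** 2)


print(sample_size_for_proportion("1.96", "0.02"))                # 2401
print(sample_size_for_proportion("1.96", "0.02", "0.2", "0.4"))  # 2305
```

Knowing that p lies between 0.2 and 0.4 saves only 96 observations here, because pq is fairly flat near its maximum at p = 0.5.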
Quick Question 7
8.10 Summary
This is the first of four units on inferential statistics. In this unit estimation was discussed. Two types of estimate
were described: point estimates and interval estimates.
We first considered point estimates. Point estimates give a single numerical value as the estimate of the population
value.
Even when the best possible point estimate has been calculated, its usefulness is limited because the value gives
no indication of the reliability of the estimate. Interval estimates, or confidence intervals, are much more useful
because they give a range within which the population value lies with stated probability. In this unit, you learned
to construct confidence intervals for means and proportions.
Finally in this unit, you learned to calculate how big a sample is needed to obtain reliable estimates of a population
mean and a population proportion.
In the next unit you will learn another side of inference: hypothesis testing.
UNIT 9
HYPOTHESIS TESTING
PRINCIPLES
9.0 Contents
9.1 Unit objectives
9.2 Introduction
9.3 Step 1: State the null and alternative hypotheses
9.4 Step 2: Specify the level of the test
9.5 Step 3: Select a test statistic
9.6 Step 4: Define the decision rule
9.7 Step 5: Calculate the value of the test statistic
9.8 Step 6: Make a decision and answer the question
9.9 Two examples of hypothesis testing
9.10 Calculating p-values
9.11 Summary
9.1 Unit objectives
A hypothesis is a statement about one or more populations. Hypothesis testing is the process of using sample data
to decide whether a hypothesis is true or false.
In this unit you will learn the principles of hypothesis testing. To illustrate these principles we have used the
simplest case of testing a hypothesis about a population mean when the population standard deviation is known.
When you have completed this unit, you should be able to:
Make a statement about a population in the form of a hypothesis;
Distinguish between a null hypothesis and an alternative hypothesis;
Explain the six step approach to hypothesis testing;
Distinguish between type I and type II errors in hypothesis testing;
Explain the terms rejection region and critical value;
Choose between one-tailed and two-tailed rejection regions;
Carry out a test on a population mean when the population standard deviation is known;
Calculate and interpret p-values.
9.2 Introduction
In the previous unit you learnt to use sample data to estimate the values of some population parameters. In this
and the following two units you will learn to use sample data to decide whether statements made about population
parameters are true or false. Some examples of the type of statements to be tested are:
the mean wage of an ACT resident in full time employment is $600 per week;
more than 20% of households are now living below the poverty line;
female workers are paid less on average than male workers doing the same job;
the Government's popularity has declined in the last three months;
the marginal propensity to import in Samoa is 0.6.
These statements are examples of hypotheses.
Hypothesis: A statement about one or more populations.
Each of the statements may or may not be true. Hypothesis testing is the statistical procedure for using sample
data to decide whether statements such as these are true or false.
In this subject we will focus on five types of hypothesis:
1. statements about a single population mean;
2. statements about the difference between two population means;
3. statements about a single population proportion;
4. statements about the difference between two population proportions;
5. statements about the slope of a regression line.
In this unit we discuss the principles of hypothesis testing and apply these principles to testing statements about
a single population mean. In Unit 10 we will apply the principles developed here to testing the first four types of
hypothesis listed above. In Unit 11 you will learn to test statements about the slope of a regression line.
Hypothesis testing is carried out through a six step procedure:
Step 1: State the null and alternative hypotheses;
Step 2: Specify the level of the test;
Step 3: Select a test statistic;
Step 4: Define the decision rule;
Step 5: Calculate the value of the test statistic;
Step 6: Make a decision and answer the question.
In the following sections we will look at each of these steps in turn.
You may find this unit a little complicated at times. Persevere! The final result is a six step procedure that is very
easy to apply. If you have problems with some sections skip on, and look at the applications in the following unit.
Then when you have a clear view of the six step procedure in practice, return to this unit and examine each step
in detail.
9.3 Step 1: State the null and alternative hypotheses
The first step in hypothesis testing is to write out the hypothesis and its negation (or opposite). These two
hypotheses are, by definition, mutually exclusive and collectively exhaustive. One of the hypotheses must therefore
be true and the other must be false.
As the two hypotheses are collectively exhaustive, one of the hypotheses includes the equals case. The hypothesis
that includes the equality is designated as the null hypothesis and the other hypothesis becomes the alternative
hypothesis. This step is illustrated in the following examples.
Example 9.1
A trade union official states that the mean wage of full time workers in the ACT is $600 per week. Formulate the
null and alternative hypotheses for testing this statement.
Solution
Let \(\mu\) = the mean weekly wage of a full time worker in the ACT ($).
The official's statement is : \(\mu = 600\)
The negation of the statement is : \(\mu \ne 600\)
Certainly one of these two hypotheses must be true, but which one? In this case the official's statement contains
the equality and so becomes the null hypothesis. The negation becomes the alternative hypothesis. The usual
notation for the null and alternative hypotheses is shown below.
\(H_0: \mu = 600\)
\(H_A: \mu \ne 600\)
\(H_0\) denotes the null hypothesis and \(H_A\) the alternative hypothesis.
Example 9.2
Residents on Belconnen Way complain that the average speed of cars on the road is greater than the speed limit
of 60kph. Formulate the null and alternative hypotheses for testing the validity of this complaint.
Solution
Let \(\mu\) = the mean speed of cars on Belconnen Way (kph).
The residents' complaint is : \(\mu > 60\)
The negation is : \(\mu \le 60\)
As before, one of these hypotheses is true and the other is false. The negation contains the equality and so
becomes the null hypothesis.
\(H_0: \mu \le 60\)
\(H_A: \mu > 60\)
In this example the original complaint has become the alternative hypothesis.
Example 9.3
A manufacturer of car batteries claims that less than 10% of her batteries fail in their first year of use. Formulate
the null and alternative hypotheses for testing this claim.
Solution
Let p = the proportion of the manufacturer's batteries that fail in their first year.
The manufacturer's claim is : \(p < 0.10\)
The negation is : \(p \ge 0.10\)
As before, one of these hypotheses is true and the other is false. The negation contains the equality and so
becomes the null hypothesis.
\(H_0: p \ge 0.10\)
\(H_A: p < 0.10\)
In this example the original claim has become the alternative hypothesis.
Example 9.4
The manufacturer of El-cheapo globes advertises that the average life of his globes is the same as the average life
of the more expensive Excelsior brand. Formulate the null and alternative hypotheses for testing the truthfulness
of this advertisement.
Solution
Let \(\mu_1\) = the mean life of an El-cheapo globe (hours).
\(\mu_2\) = the mean life of an Excelsior globe (hours).
The advertisement claims that : \(\mu_1 = \mu_2\)
The negation of the claim is : \(\mu_1 \ne \mu_2\)
The original claim contains the equality and so becomes the null.
\(H_0: \mu_1 = \mu_2\) or \(\mu_1 - \mu_2 = 0\)
\(H_A: \mu_1 \ne \mu_2\) or \(\mu_1 - \mu_2 \ne 0\)
Here the claim in the advertisement has become the null hypothesis.
Example 9.5
A manufacturer believes that absenteeism fell following the introduction of incentive payments to workers.
Formulate the null and alternative hypotheses for testing this belief.
Solution
Let \(p_1\) = the proportion of workers absent before the introduction of incentive payments.
\(p_2\) = the proportion of workers absent after the introduction of incentive payments.
The manufacturer believes that : \(p_1 > p_2\)
The negation is : \(p_1 \le p_2\)
The negation contains the equality and so this becomes the null.
\(H_0: p_1 \le p_2\) or \(p_1 - p_2 \le 0\)
\(H_A: p_1 > p_2\) or \(p_1 - p_2 > 0\)
The manufacturer's belief has become the alternative hypothesis.
In formulating hypotheses some words that occur regularly have to be translated into mathematical notation. You
may find the following short glossary useful.
Table 9.1: Some Common Words in Hypothesis Testing
Expression    Math Form         Expression      Math Form
Equals        \(=\)             Differs from    \(\ne\)
Less than     \(<\)             No less than    \(\ge\)
More than     \(>\)             No more than    \(\le\)
At least      \(\ge\)           Or less         \(\le\)
Increased     before < after    Fell            before > after
Quick Question 1
Quick Question 2
9.4 Step 2: Specify the level of the test
The null and alternative hypotheses are not treated equally in hypothesis testing. Hypothesis testing centres on
the null hypothesis and the objective of the test is to show that the null hypothesis is false. The question asked is
"Is there sufficient evidence against the null hypothesis in the sample data for us to confidently conclude that the
null hypothesis is false?"
If the answer to this question is "Yes", the null hypothesis is said to be rejected by the data. The null has been
shown to be false and so the alternative hypothesis has been shown to be true.
If the answer is "No", the null hypothesis is said to be not rejected by the data. This does not mean that the
null has been shown to be true, only that so far, there is not enough evidence in the data collected for the null to
be confidently declared to be false. The data may suggest that the null is false without providing strong enough
evidence for the null to be rejected. The null hypothesis is never accepted as being true. If the null is not
rejected, the most that can be said about the null is that it may be true.
The only two possible conclusions at the end of a hypothesis test are "reject the null" or "do not reject the null".
Example 9.6
In Example 9.1, the two possible conclusions are
1. Reject the null. Then \(H_0\) has been found to be false and so \(H_A\) is true. The conclusion here is that the
trade union official's statement is incorrect and the mean wage of full time workers in the ACT is not $600.
2. Do not reject the null. This does not imply that the mean wage is $600. It only shows that the evidence
accumulated so far is not strong enough to confidently conclude that the mean wage is not $600. If \(H_0\)
is not rejected, the test does not choose which hypothesis is true.
Hypothesis testing cannot show that the trade union official's statement is true. It can only show that the statement is
false or come to no conclusion!
There are two possibilities (\(H_0\) is true or \(H_0\) is false) and there are two possible conclusions (\(H_0\) is rejected or
\(H_0\) is not rejected). Unfortunately, as you already know from the examples on estimation, sampling is not 100%
reliable. Wrong conclusions can be reached.
Type I errors
If \(H_0\) is true, it should not be rejected. If \(H_0\) is true and it is rejected, an error has been made. This is called
a type I error and the probability of a type I error is denoted by the letter \(\alpha\).
Type I error: The rejection of a true null hypothesis
Type II errors
If \(H_0\) is false, it should be rejected. If \(H_0\) is false and it is not rejected, an error has been made. This is
called a type II error and the probability of a type II error is denoted by \(\beta\).
Type II error: The non-rejection of a false null hypothesis
The four possibilities are summarised in Table 9.2.
Table 9.2: Errors in Hypothesis Testing
State of the World     Decision
                       \(H_0\) is not rejected                   \(H_0\) is rejected
\(H_0\) is true        Correct decision                          Type I error (Probability = \(\alpha\))
\(H_0\) is false       Type II error (Probability = \(\beta\))   Correct decision
The probability of a type I error, \(\alpha\), can be made smaller by making the conditions for the rejection of \(H_0\) more
difficult to satisfy when \(H_0\) is true. Unfortunately, this makes it more difficult to reject \(H_0\) when \(H_0\) is false and
so leads to an increase in \(\beta\). For a given sample size and test procedure there is a trade-off between the two types
of errors. The smaller is \(\alpha\), the larger is \(\beta\).
There are often many different tests that can be used to test the null hypothesis. An efficient test is one which, for
the given sample size and stated level of \(\alpha\), has the lowest possible value of \(\beta\). With an efficient test, the values
of \(\alpha\) and \(\beta\) can only be simultaneously reduced by increasing the sample size.
The probability of rejecting \(H_0\) when it is true, \(\alpha\), is also called the level of the test.
The level of the test: The probability that \(H_0\) is rejected when it is true.
This probability is chosen by the researcher. There is no right value for the level of a test; it depends on the value
judgements of the person carrying out the test. The most common level of test to use is \(\alpha = 0.05\), called a 5%
test. Other common levels are \(\alpha = 0.10\) (10% test) and \(\alpha = 0.01\) (1% test). When a test is said to be carried
out at the 5% level, this means that the researcher is accepting a probability of 0.05 that the null hypothesis is
rejected when it is true. The smaller is the specified value of \(\alpha\), the larger is the value of \(\beta\).
Notice that hypothesis testing is not symmetric in its treatment of the two types of errors. The researcher
nominates the value of \(\alpha\) but the value of \(\beta\) is not explicitly considered. In this subject you will be told the value of
\(\alpha\) to use. Always remember that the smaller is the value of \(\alpha\), the larger is the implied value of \(\beta\).
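The trade-off between the two error probabilities can be illustrated numerically. The Python sketch below anticipates the two-tailed z test developed in the following sections and computes the probability of a type II error for three common test levels. The function name and all the numbers used (a hypothesised mean of 600, a true mean of 610, a population standard deviation of 50 and a sample of 100) are hypothetical values chosen for illustration only.

```python
from statistics import NormalDist


def type_two_error(alpha, mu0, mu_true, sigma, n):
    """Probability of not rejecting H0: mu = mu0 when the true mean is mu_true."""
    std_normal = NormalDist()
    # Critical value that leaves alpha / 2 in each tail.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # When the true mean is mu_true, the test statistic is normal with
    # this mean and standard deviation 1.
    shift = (mu_true - mu0) / (sigma / n ** 0.5)
    # H0 is not rejected when the statistic falls between -z_crit and +z_crit.
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)


for alpha in (0.10, 0.05, 0.01):
    beta = type_two_error(alpha, mu0=600, mu_true=610, sigma=50, n=100)
    print(f"alpha = {alpha:.2f}   beta = {beta:.3f}")
```

Running the loop shows the value of beta rising as alpha is made smaller, which is the trade-off described above. Note that beta can only be calculated once a particular true value of the mean is assumed.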
9.5 Step 3: Select a test statistic
For the rest of this unit we will focus on the problem of testing hypotheses about the mean of a population when
the population standard deviation is known. In Units 10 and 11 the principles developed in this unit will be
applied to testing other hypotheses.
Recall that the objective of hypothesis testing is to try and reject the null hypothesis on the basis of the evidence
available from sample data. In this section we will define a single number that measures the strength of the
evidence in the sample against the null hypothesis.
There are three possibilities for the null and alternative hypotheses.
CASE 1   \(H_0: \mu = \mu_0\)     \(H_A: \mu \ne \mu_0\)
CASE 2   \(H_0: \mu \le \mu_0\)   \(H_A: \mu > \mu_0\)
CASE 3   \(H_0: \mu \ge \mu_0\)   \(H_A: \mu < \mu_0\)
where \(\mu_0\) is the hypothesised population mean value.
Example 9.1 is an example of CASE 1 with \(\mu_0 = 600\). Example 9.2 is an example of CASE 2 with \(\mu_0 = 60\).
Let us first look at CASE 1. If the null hypothesis is true, then the sample has been taken from a population
with mean \(\mu_0\) and so we would expect the sample mean, \(\bar{x}\), to be close to \(\mu_0\). If the sample mean is very different
from \(\mu_0\), this supports the alternative hypothesis. In Example 9.1, if the null hypothesis is true, the mean income
of all workers is $600. We would then expect that a random sample of workers would have a mean income of
close to $600. However, if a random sample of full time workers has a mean income that is much smaller or
much larger than $600, this provides evidence against the null hypothesis. The greater the difference between the
sample mean and $600, the stronger the evidence against the null.
In CASE 2, the alternative hypothesis states that the population mean is more than \(\mu_0\). If the sample mean is much
larger than \(\mu_0\), this supports the alternative hypothesis that the population mean is larger than \(\mu_0\). In Example 9.2,
if a random sample of cars passing along Belconnen Way has an average speed that is much greater than 60kph,
this provides evidence in favour of the alternative hypothesis that the average speed of all cars is greater than
60kph. The greater the average speed in the sample, the stronger the evidence against the null.
In CASE 3, support for the alternative hypothesis comes if the sample mean is less than \(\mu_0\).
In each case the strength of the evidence against the null hypothesis and in favour of the alternative hypothesis
depends on the difference between the sample mean and \(\mu_0\). The greater the difference, the stronger is the
evidence against the null and in favour of the alternative hypothesis.
If the equality in the null hypothesis is true, the sample mean is the mean of a simple random sample taken from
a population with mean \(\mu_0\) and known variance \(\sigma^2\). We recall from Unit 8 that if the population is normal or the
sample size is large, then:
\[ \bar{X} \sim N\left(\mu_0, \frac{\sigma^2}{n}\right) \]
A measure of the difference between the sample mean and \(\mu_0\) is provided by the calculated z-score for the
sample mean:
\[ z_{obs} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \]
In CASE 1, there is evidence against the null if the sample mean is much larger than \(\mu_0\) or much smaller than \(\mu_0\).
If the sample mean is much larger than \(\mu_0\), the numerator of \(z_{obs}\) is a large positive number, and so \(z_{obs}\) is a large
positive number. Similarly, if the sample mean is much smaller than \(\mu_0\), the numerator of \(z_{obs}\) is a large negative
number, and so \(z_{obs}\) is a large negative number. There is evidence against the null if \(z_{obs}\) is a large positive number
or a large negative number. The larger the number (positive or negative), the stronger the evidence.
In CASE 2, there is evidence against the null if the sample mean is much larger than \(\mu_0\). The more positive the
value of \(z_{obs}\), the stronger the evidence against the null.
In CASE 3, there is evidence against the null if the sample mean is much less than \(\mu_0\). The more negative the
value of \(z_{obs}\), the stronger the evidence against the null.
The \(z_{obs}\) value is an example of a test statistic: it provides a measure of the strength of the evidence against the
null hypothesis.
A test statistic can be any number calculated by comparing the sample data to the value of the equality in the null
hypothesis.
Test statistic: A number calculated from the sample data and the equality in the null hypothesis.
The selected test statistic for the null hypotheses in the three cases listed at the start of the section, is:
\[ z_{obs} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \]
when the population standard deviation is known.
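The test statistic is simple to compute. The Python sketch below is for illustration only: the function name `z_observed` is an assumption, and the sample values (a sample mean of $612 from 100 workers, with a known population standard deviation of $50) are hypothetical figures attached to the hypothesised mean of $600 from Example 9.1.

```python
import math


def z_observed(x_bar, mu0, sigma, n):
    """Test statistic (x_bar - mu0) / (sigma / sqrt(n)) for a known sigma."""
    standard_error = sigma / math.sqrt(n)
    return (x_bar - mu0) / standard_error


z_obs = z_observed(x_bar=612, mu0=600, sigma=50, n=100)
print(round(z_obs, 2))  # 2.4
```

A sample mean below $600 would give a negative value, so the sign of the statistic shows the direction of the evidence and its size shows the strength.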
9.6 Step 4: Define the decision rule
You now know that the further the sample mean, \(\bar{x}\), is from the hypothesised population mean of \(\mu_0\), the stronger
the evidence is against the null hypothesis and in favour of the alternative hypothesis. In this section you will
learn to decide whether the sample mean is far enough away from \(\mu_0\) for the null hypothesis to be confidently
rejected.
CASE 1   \(H_0: \mu = \mu_0\)   \(H_A: \mu \ne \mu_0\)
If the null hypothesis is true, the probability distribution of the mean of a simple random sample taken from this
population is
\[ \bar{X} \sim N\left(\mu_0, \frac{\sigma^2}{n}\right) \qquad (9.1) \]
The null hypothesis that the population mean is \(\mu_0\) is to be rejected if the sample mean is substantially larger or
smaller than \(\mu_0\). Now the question arises: How large must the difference between the sample mean and \(\mu_0\) be for
the null hypothesis to be rejected? Suppose arbitrary lower (L) and upper (U) values are selected (where \(L < \mu_0\)
and \(U > \mu_0\)) and the following temporary decision rule adopted:
Reject \(H_0\) if : \(\bar{x} < L\) or \(\bar{x} > U\)
Do not reject \(H_0\) if : \(L \le \bar{x} \le U\)
Then, using equation 9.1 above, the possible values of the sample mean when the null hypothesis is true are as
shown in Figure 9.1.
Figure 9.1: The Distribution of \(\bar{X}\) when \(\mu = \mu_0\)
[Figure: the normal curve \(\bar{X} \sim N(\mu_0, \sigma^2/n)\) with L, \(\mu_0\) and U marked on the horizontal axis. The tails below L and above U are shaded and labelled "Reject \(H_0\)"; the region between L and U is labelled "Do Not Reject \(H_0\)".]
This graph gives the possible values for the mean of the sample when the null hypothesis is true. In this graph
we can see that when the null hypothesis is true, it is possible that the sample mean is less than L or greater than
U. The shaded area is the probability that the sample mean is less than L or greater than U when the null is true.
With the above temporary decision rule, this is the probability of rejecting \(H_0\) when it is true. The shaded area is
therefore the value of \(\alpha\) referred to in Step 2. The larger the area, the larger is \(\alpha\).
In Step 2 an acceptable probability of rejecting \(H_0\) when it is true was specified. The values of L and U must be
chosen so that the total shaded area is equal to the specified value of \(\alpha\). The values of L and U must be chosen as
shown in Figure 9.2.
Figure 9.2: Determining the Rejection Region for \(H_A: \mu \ne \mu_0\)
[Figure: the normal curve \(\bar{X} \sim N(\mu_0, \sigma^2/n)\) with L, \(\mu_0\) and U marked on the horizontal axis. The tail below L and the tail above U are each labelled "Reject \(H_0\)" and each has area \(\alpha/2\); the region between L and U is labelled "Do Not Reject \(H_0\)".]
The area to the right of U is \(\alpha/2\) and so U has a z-score of \(z_{\alpha/2}\). All the sample means in the rejection range above U have a z-score greater than \(z_{\alpha/2}\). Similarly, all the sample means below L have a z-score less than \(-z_{\alpha/2}\). The decision rule can now be written as
Reject \(H_0\) if: \(z_{obs} < -z_{\alpha/2}\) or \(z_{obs} > +z_{\alpha/2}\)
Do not reject \(H_0\) if: \(-z_{\alpha/2} \le z_{obs} \le +z_{\alpha/2}\)
With this decision rule the probability of a type I error is \(\alpha\). This is called a \(100\alpha\)% test.
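The claim that this rule gives a type I error probability of \(\alpha\) can be checked by simulation. The sketch below is illustrative only: it assumes a population with \(\mu_0 = 600\), a hypothetical \(\sigma = 50\) and samples of size 25, draws many samples with the null hypothesis true, and counts how often the two-tailed rule rejects. The function name and all parameter values are assumptions made for this illustration.

```python
import random
from statistics import NormalDist, fmean


def type_i_error_rate(mu0, sigma, n, alpha, trials=20000, seed=0):
    """Estimate how often the two-tailed z rule rejects H0 when H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    rejections = 0
    for _ in range(trials):
        # Draw a sample from N(mu0, sigma^2), i.e. with H0 true.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z_obs = (fmean(sample) - mu0) / (sigma / n ** 0.5)
        if z_obs < -z_crit or z_obs > z_crit:
            rejections += 1
    return rejections / trials


# Hypothetical values: sigma = 50 and n = 25 are not taken from the notes.
rate = type_i_error_rate(mu0=600, sigma=50, n=25, alpha=0.05)
print(f"estimated type I error rate: {rate:.3f}")
```

The estimate should be close to 0.05, and it moves with `alpha` if a different level is chosen.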
CASE 2 \(H_0: \mu \le \mu_0\)   \(H_A: \mu > \mu_0\)
Here, there is evidence in favour of the alternative hypothesis, and hence against the null hypothesis, if the sample mean is more than \(\mu_0\). The null hypothesis can be confidently rejected in favour of the alternative hypothesis if the sample mean is substantially larger than \(\mu_0\). An appropriate decision rule would now be
Reject \(H_0\) if: \(\bar{x} > U\)
Do not reject \(H_0\) if: \(\bar{x} \le U\)
Figure 9.3: Determining the Rejection Region for \(H_A: \mu > \mu_0\)
[Figure: sampling distribution of the sample mean, \(\bar{X} \sim N(\mu_0, \sigma^2/n)\), centred at \(\mu_0\), with U marked above the centre. The shaded upper tail above U has area \(\alpha\) and is labelled "Reject \(H_0\)". The region below U is labelled "Do Not Reject \(H_0\)".]
The value U has an area to the right of \(\alpha\) and so has a z-score of \(z_\alpha\). The decision rule is
Reject \(H_0\) if: \(z_{obs} > +z_\alpha\)
Do not reject \(H_0\) if: \(z_{obs} \le +z_\alpha\)
With this decision rule the probability of a type I error is \(\alpha\).
CASE 3 \(H_0: \mu \ge \mu_0\)   \(H_A: \mu < \mu_0\)
Here, there is evidence in favour of the alternative hypothesis, and hence against the null hypothesis, if the sample mean is substantially less than \(\mu_0\). An appropriate decision rule would now be
Reject \(H_0\) if: \(\bar{x} < L\)
Do not reject \(H_0\) if: \(\bar{x} \ge L\)
Figure 9.4: Determining the Rejection Region for \(H_A: \mu < \mu_0\)
[Figure: sampling distribution of the sample mean, \(\bar{X} \sim N(\mu_0, \sigma^2/n)\), centred at \(\mu_0\), with L marked below the centre. The shaded lower tail below L has area \(\alpha\) and is labelled "Reject \(H_0\)". The region above L is labelled "Do Not Reject \(H_0\)".]
In Figure 9.4 the value L has an area to the left of \(\alpha\) and so, by the symmetry of the normal curve, has a z-score of \(-z_\alpha\). The decision rule is
Reject \(H_0\) if: \(z_{obs} < -z_\alpha\)
Do not reject \(H_0\) if: \(z_{obs} \ge -z_\alpha\)
With this decision rule, the probability of a type I error is \(\alpha\).
In the arguments above, for each alternative hypothesis a rule was derived that decided whether the null hypothesis would be rejected. These are called decision rules.
Decision rule: A rule which decides whether or not to reject \(H_0\) based on the value of the test statistic. The decision rule must specify a decision for every possible value of the test statistic.
Each decision rule determines a rejection region.
Rejection region: The set of values of the test statistic for which the null hypothesis is rejected.
The dividing point between the rejection and non-rejection regions is called the critical value.
Critical value: The dividing point between the rejection and non-rejection regions.
At the end of this difficult section you have learned a very simple principle:
The rejection region is determined by the alternative hypothesis and the level of the test.
The rejection region for each of the three cases is shown below. In CASE 1 the rejection region is in both tails of the Z-distribution. This is referred to as a two-tailed test. CASES 2 and 3 are one-tailed tests.
CASE 1: \(H_A: \mu \neq \mu_0\)
Reject \(H_0\) if \(z_{obs} < -z_{\alpha/2}\) or \(z_{obs} > +z_{\alpha/2}\)
Figure 9.5: The Rejection Region for \(H_A: \mu \neq \mu_0\)
[Figure: standard normal (Z) curve centred at 0 with critical values \(-z_{\alpha/2}\) and \(+z_{\alpha/2}\). The two shaded tails, each of area \(\alpha/2\), are labelled "Reject \(H_0\)". The central region is labelled "Do Not Reject \(H_0\)".]
CASE 2: \(H_A: \mu > \mu_0\)
Reject \(H_0\) if \(z_{obs} > +z_\alpha\)
Figure 9.6: The Rejection Region for \(H_A: \mu > \mu_0\)
[Figure: standard normal (Z) curve centred at 0 with critical value \(+z_\alpha\). The shaded upper tail of area \(\alpha\) is labelled "Reject \(H_0\)". The region below \(+z_\alpha\) is labelled "Do Not Reject \(H_0\)".]
CASE 3: \(H_A: \mu < \mu_0\)
Reject \(H_0\) if \(z_{obs} < -z_\alpha\)
Figure 9.7: The Rejection Region for \(H_A: \mu < \mu_0\)
[Figure: standard normal (Z) curve centred at 0 with critical value \(-z_\alpha\). The shaded lower tail of area \(\alpha\) is labelled "Reject \(H_0\)". The region above \(-z_\alpha\) is labelled "Do Not Reject \(H_0\)".]
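The principle that the rejection region depends only on the alternative hypothesis and the level of the test can be written as a small function. This is a minimal sketch, not part of the original notes: the function names and the labels "two-sided", "greater" and "less" are assumptions chosen for the illustration, and the critical values come from Python's `statistics.NormalDist` rather than from printed tables.

```python
from statistics import NormalDist


def rejection_region(alternative, alpha):
    """Return (lower, upper) critical values on the Z scale.

    H0 is rejected if z_obs < lower or z_obs > upper.
    alternative: "two-sided" (CASE 1), "greater" (CASE 2) or "less" (CASE 3).
    """
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    if alternative == "two-sided":
        crit = z.inv_cdf(1 - alpha / 2)  # z_{alpha/2}
        return -crit, crit
    if alternative == "greater":
        return float("-inf"), z.inv_cdf(1 - alpha)  # only the upper tail rejects
    if alternative == "less":
        return -z.inv_cdf(1 - alpha), float("inf")  # only the lower tail rejects
    raise ValueError(f"unknown alternative: {alternative!r}")


def reject_h0(z_obs, alternative, alpha):
    """Apply the decision rule: True means reject H0."""
    lower, upper = rejection_region(alternative, alpha)
    return z_obs < lower or z_obs > upper


print(rejection_region("two-sided", 0.05))
print(reject_h0(-2.0, "less", 0.05), reject_h0(-2.0, "greater", 0.05))
```

Every possible value of the test statistic falls on one side or the other of the critical values, so the function always returns a decision, as the definition of a decision rule requires.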
Example 9.7
The null hypothesis in Example 9.1 is tested at the 5% level. What is the decision rule?
Solution
The null and alternative hypotheses are
\(H_0: \mu = 600\)
\(H_A: \mu \neq 600\), \(\alpha = 0.05\)
As the alternative hypothesis is of the \(\neq\) form, this is a CASE 1 two-tailed test and so the rejection region is as shown in the figure below.
Figure 9.8: The Rejection Region for \(H_A: \mu \neq 600\), \(\alpha = 0.05\)
[Figure: standard normal (Z) curve centred at 0 with critical values \(-z_{0.025}\) and \(+z_{0.025}\). The two shaded tails, each of area 0.025, are labelled "Reject \(H_0\)". The central region is labelled "Do Not Reject \(H_0\)".]
The area below \(z_{0.025}\) is (1.0000 - 0.0250) = 0.9750, and so from the inverse normal tables (UC Statistical Tables page 33) \(z_{0.025} = 1.9600\). The decision rule is:
Reject \(H_0\) at the 5% level if: \(z_{obs} < -1.9600\) or \(z_{obs} > +1.9600\)
Do not reject \(H_0\) at the 5% level if: \(-1.9600 \le z_{obs} \le +1.9600\)
Example 9.8
The null hypothesis in Example 9.2 is tested at the 10% level. What is the decision rule?
Solution
The null and alternative hypotheses are
\(H_0: \mu \le 60\)
\(H_A: \mu > 60\), \(\alpha = 0.10\)
As the alternative hypothesis is of the > form, this is a CASE 2 one-tailed test and so the rejection region is as shown in the figure below.
Figure 9.9: The Rejection Region for \(H_A: \mu > 60\), \(\alpha = 0.10\)
[Figure: standard normal (Z) curve centred at 0 with critical value \(z_{0.10}\). The shaded upper tail of area 0.10 is labelled "Reject \(H_0\)". The region below \(z_{0.10}\) is labelled "Do Not Reject \(H_0\)".]
The area below \(z_{0.10}\) is (1.0000 - 0.1000) = 0.9000, and so from the inverse normal tables (UC Statistical Tables page 33) \(z_{0.10} = 1.2816\). The decision rule is:
Reject \(H_0\) at the 10% level if: \(z_{obs} > +1.2816\)
Do not reject \(H_0\) at the 10% level if: \(z_{obs} \le +1.2816\)
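If the printed tables are not to hand, the two look-ups in Examples 9.7 and 9.8 can be reproduced in software. The sketch below assumes Python's standard `statistics.NormalDist` as a stand-in for the inverse normal tables; the helper name `z_upper` is made up for this illustration.

```python
from statistics import NormalDist


def z_upper(tail_area):
    """z-score with `tail_area` to its right under the standard normal curve."""
    # The tables are entered with the area BELOW the point, i.e. 1 - tail_area.
    return NormalDist().inv_cdf(1.0 - tail_area)


print(f"z_0.025 = {z_upper(0.025):.4f}")  # Example 9.7, prints 1.9600
print(f"z_0.10  = {z_upper(0.10):.4f}")   # Example 9.8, prints 1.2816
```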
Quick Question 3
9.7 Step 5: Calculate the value of the test statistic
The sample and hypothesised values are substituted into the formula for the test statistic selected in Step 3. This is the easiest step of the hypothesis test.
However, do not stop here. Hypothesis tests are carried out to answer questions about populations. The final step is to answer the question asked about the population.
9.8 Step 6: Make a decision and answer the question
The calculated value of the test statistic is compared to the rejection region determined in Step 4. There are only two possible conclusions:
If the calculated value of the test statistic lies in the rejection region, then the null hypothesis is rejected.
If the calculated value of the test statistic does not lie in the rejection region, then there is insufficient evidence to reject the null hypothesis.
This does not mean that the null hypothesis has been accepted. Always remember that the null hypothesis is never shown to be true.
Finally, the conclusions should be stated in terms of the original statement. The level of the test used should be included in the conclusions as an indicator of the strength of the evidence considered necessary for the rejection of the null hypothesis.
Refer to Example 9.1. The statement made is that the mean wage of ACT workers is $600 per week. The hypotheses are:
\(H_0: \mu = 600\)
\(H_A: \mu \neq 600\), \(\alpha = 0.05\)
The two possible results are:
1. \(H_0\) is rejected at the 5% level. The conclusion from the test is then:
The mean wage of domestic workers is not $600 per week at the 5% level.
2. \(H_0\) is not rejected at the 5% level. The conclusion from the test is then:
There is insufficient evidence in the sample to show, at the 5% level, that the mean wage of domestic workers is not $600 per week.
This concludes the six step procedure for testing hypotheses. You may have found some of the arguments above difficult to follow but the six step procedure is very straightforward to use.
9.9 Two examples of hypothesis testing
Example 9.9
In 2001 the prices of textbooks were found to be normally distributed with a mean of $60 and a standard deviation
of $10. A simple random sample of 20 textbooks in 2002 had prices of ($)
63 69 77 74 61 85 51 87 73 49
65 53 61 67 53 83 61 49 47 57
Assume that textbook prices are still normally distributed with a standard deviation of $10. Has the mean price
of textbooks changed between 2001 and 2002? Use a 5% test.
Solution
Let \(\mu\) = the mean price of textbooks in 2002 ($).
Steps 1 and 2: state the hypotheses and specify the level of the test
If the mean price of textbooks has changed, then \(\mu \neq 60\).
The statement is: \(\mu \neq 60\)
The negation is: \(\mu = 60\)
The hypothesis with the equality becomes the null.
\(H_0: \mu = 60\)
\(H_A: \mu \neq 60\), \(\alpha = 0.05\)
Step 3: select a test statistic
\(z_{obs} = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}\)
Step 4: define the decision rule
As \(H_A\) is a \(\neq\) hypothesis, this is a two-tailed test at the 5% level. The rejection region is the shaded region in Figure 9.10 below.
Figure 9.10: The Rejection Region for \(H_A: \mu \neq 60\), \(\alpha = 0.05\)
[Figure: standard normal (Z) curve centred at 0 with critical values \(-z_{0.025}\) and \(+z_{0.025}\). The two shaded tails each have area 0.025.]
To find the critical value from the inverse normal tables, note that the area below the upper critical value is (1.0000 - 0.0250) = 0.9750. Then from UC Statistical Tables page 33
\(z_{0.025} = 1.9600\)
The decision rule is:
Reject \(H_0\) at the 5% level if: \(z_{obs} < -1.9600\) or \(z_{obs} > +1.9600\)
Do not reject \(H_0\) at the 5% level if: \(-1.9600 \le z_{obs} \le +1.9600\)
Step 5: calculate the value of the test statistic
Entering the data values into a calculator in SD mode and using the \(n\) and \(\bar{x}\) keys gives
\(n = 20\), \(\bar{x} = 64.25\)
\(\therefore z_{obs} = \dfrac{64.25 - 60}{10 / \sqrt{20}} = 1.9007\) (using \(\sigma = 10\))
Step 6: make a decision and answer the question
The value 1.9007 is not in the rejection region. Do not reject \(H_0\) in favour of \(H_A\) at the 5% level.
There is insufficient evidence in the data to conclude, at the 5% level, that there has been a change in the mean price of textbooks.
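Steps 3 to 6 of Example 9.9 can also be carried out in a few lines of code instead of on a calculator. The sketch below is one possible way to do it, assuming Python's standard library in place of the calculator and the UC Statistical Tables; the variable names are chosen for the illustration.

```python
from math import sqrt
from statistics import NormalDist, fmean

# Sample of 20 textbook prices in 2002 ($), from Example 9.9.
prices = [63, 69, 77, 74, 61, 85, 51, 87, 73, 49,
          65, 53, 61, 67, 53, 83, 61, 49, 47, 57]
mu0, sigma, alpha = 60, 10, 0.05  # hypothesised mean, known sd, level of test

# Step 4: two-tailed rejection region, critical values -z_{0.025} and +z_{0.025}.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Step 5: value of the test statistic.
n = len(prices)
x_bar = fmean(prices)
z_obs = (x_bar - mu0) / (sigma / sqrt(n))

# Step 6: decision.
reject = z_obs < -z_crit or z_obs > z_crit

print(f"n = {n}, x_bar = {x_bar:.2f}")
print(f"z_obs = {z_obs:.4f}, critical values = +/-{z_crit:.4f}")
print("reject H0" if reject else "do not reject H0")
```

The output agrees with the hand calculation: the sample mean is 64.25, the test statistic is 1.9007, and the null hypothesis is not rejected at the 5% level.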
Example 9.10
In 2001 the prices of textbooks were found to be normally distributed with a mean of $60 and a standard deviation
of $10. A simple random sample of 20 textbooks in 2002 had prices of ($)
63 69 77 74 61 85 51 87 73 49
65 53 61 67 53 83 61 49 47 57
Assume that textbook prices are still normally distributed with a standard deviation of $10. Has the mean price
of textbooks increased between 2001 and 2002? Use a 5% test.
Solution
Let \(\mu\) = the mean price of textbooks in 2002 ($).
If the mean price of textbooks has increased, then \(\mu > 60\).
The statement is: \(\mu > 60\)
\(H_0: \mu \le 60\)
\(H_A: \mu > 60\), \(\alpha = 0.05\)
\(z_{obs} = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}\)
As \(H_A\) is a > hypothesis, this is a one-tailed upper tail test at the 5% level. The rejection region is the shaded region in Figure 9.11.
Figure 9.11: The Rejection Region for \(H_A: \mu > 60\), \(\alpha = 0.05\)
[Figure: standard normal (Z) curve centred at 0 with critical value \(z_{0.05}\). The shaded upper tail has area 0.05.]
From the inverse normal tables on page 33 of UC Statistical Tables, \(z_{0.05} = 1.6449\).
Reject \(H_0\) at the 5% level if: \(z_{obs} > +1.6449\)
Do not reject \(H_0\) at the 5% level if: \(z_{obs} \le +1.6449\)
\(z_{obs} = \dfrac{64.25 - 60}{10 / \sqrt{20}} = 1.9007\) (using \(\sigma = 10\))
The value of 1.9007 is in the rejection region. Reject \(H_0\) in favour of \(H_A\) at the 5% level.
We can conclude, at the 5% level, that there has been an increase in the mean price of textbooks.
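Examples 9.9 and 9.10 use the same data and the same level of test, yet reach different decisions because the alternative hypotheses differ. The short sketch below, written under the same assumptions as the examples (\(\sigma = 10\), \(n = 20\), \(\bar{x} = 64.25\)), places the one test statistic against both critical values; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu0, sigma, n, alpha = 64.25, 60, 10, 20, 0.05
z_obs = (x_bar - mu0) / (sigma / sqrt(n))

z_one_tail = NormalDist().inv_cdf(1 - alpha)      # z_0.05, used in Example 9.10
z_two_tail = NormalDist().inv_cdf(1 - alpha / 2)  # z_0.025, used in Example 9.9

reject_one_tailed = z_obs > z_one_tail        # H_A: mu > 60
reject_two_tailed = abs(z_obs) > z_two_tail   # H_A: mu != 60

print(f"z_obs = {z_obs:.4f}")
print(f"one-tailed critical value {z_one_tail:.4f}: reject = {reject_one_tailed}")
print(f"two-tailed critical value {z_two_tail:.4f}: reject = {reject_two_tailed}")
```

The test statistic 1.9007 lies between 1.6449 and 1.9600, which is why the upper tail test rejects the null hypothesis while the two-tailed test does not.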
Quick Question 4
9.10 Calculating p-values
There is no right level at which to carry out a test. Some researchers habitually use a 5% test while others always use a 1% test. A rational approach to selecting an appropriate value of \(\alpha\) would take at least three factors into account:
1. the cost of a type I error;
2. the cost of a type II error; and
3. the cost of the sample.
If making a type I error is extremely costly, then the level of \(\alpha\) should be made small so that this cost is unlikely to be incurred. For example, if falsely rejecting \(H_0\) would result in an unwarranted heavy capital expenditure, then a small value of \(\alpha\) should be used. On the other hand, if a type II error is the more costly, then \(\beta\) should be made small and this could involve making \(\alpha\) large. The values of \(\alpha\) and \(\beta\) can both be reduced by taking a larger sample. Where high costs are involved, large sample sizes are justified on a cost-effectiveness basis.
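The trade-off between \(\alpha\), \(\beta\) and the sample size can be illustrated numerically. The sketch below is not taken from the notes: it assumes a two-tailed z-test of \(H_0: \mu = 60\) with known \(\sigma = 10\), and a hypothetical true mean of 65 chosen purely for illustration. Under that assumption \(z_{obs}\) is normally distributed with standard deviation 1 and mean \((\mu_{true} - \mu_0)/(\sigma/\sqrt{n})\), so \(\beta\) is the probability that it falls between the two critical values.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(mu0, mu_true, sigma, n, alpha):
    """P(do not reject H0) for a two-tailed z-test when the true mean is mu_true."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Mean of z_obs when the population mean is really mu_true.
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    # beta = P(-z_crit <= z_obs <= +z_crit) with z_obs ~ N(shift, 1).
    return z.cdf(z_crit - shift) - z.cdf(-z_crit - shift)


# Hypothetical true mean of 65; alpha held at 0.05 while n grows.
for n in (10, 20, 40, 80):
    print(n, round(type_ii_error(60, 65, 10, n, 0.05), 3))
```

With \(\alpha\) held fixed, \(\beta\) falls as \(n\) grows. A large enough sample also leaves room to lower \(\alpha\) and \(\beta\) together, for example by comparing `type_ii_error(60, 65, 10, 80, 0.01)` with `type_ii_error(60, 65, 10, 20, 0.05)`.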
A problem arises where a researcher has used one level of test but a user of the results feels that a different level of test is required. Suppose that the researcher has carried out a test of the hypotheses
\(H_0: \mu = \mu_0\)
\(H_A: \mu \neq \mu_0\), \(\alpha = \alpha_0\)
at the \(100\alpha_0\)% level and in Step 5 the position is as shown in Figure 9.12 below:
Figure 9.12: Testing the Hypothesis \(H_0: \mu = \mu_0\) at the \(100\alpha_0\)% Level
[Figure: standard normal (Z) curve centred at 0 with critical values \(-z_{\alpha_0/2}\) and \(+z_{\alpha_0/2}\). The two shaded tails each have area \(\alpha_0/2\). The observed value \(z_{obs}\) is marked inside the rejection region.]
As \(z_{obs}\) is in the rejection region, the researcher rejects \(H_0\). However, if the user wants to employ a smaller level of \(\alpha\), say \(\alpha_1\), then the situation from the user's point of view could be as shown in Figure 9.13.
Figure 9.13: Testing the Hypothesis \(H_0: \mu = \mu_0\) at the \(100\alpha_1\)% Level
[Figure: standard normal (Z) curve centred at 0 with critical values \(-z_{\alpha_1/2}\) and \(+z_{\alpha_1/2}\). The two shaded tails each have area \(\alpha_1/2\). The observed value \(z_{obs}\) is marked outside the rejection region.]
From the user's point of view the null hypothesis is not rejected. What is required is a method of reporting the results of a hypothesis test so that different users using different levels of \(\alpha\) can decide whether, from their viewpoint, the null hypothesis should be rejected. This can be achieved by the researcher quoting the p-value for the test.
From Figures 9.12 and 9.13, you will note that there is definitely some value of \(\alpha\) between \(\alpha_0\) and \(\alpha_1\) at which the null hypothesis is just rejected. This position is illustrated in Figure 9.14. The value of \(\alpha\) for which the null hypothesis is just rejected is called the p-value for the test.
P-value for a test: The value of \(\alpha\) for which the null hypothesis is just rejected.
Figure 9.14: Finding the p-value for the hypothesis \(H_0: \mu = \mu_0\)
[Figure: standard normal curve with an area of p-value/2 shaded in each tail; \(z_{obs}\) lies on the boundary of the shaded area.]
This p-value is reported. If the value of \(\alpha\) is increased to be larger than the p-value in Figure 9.14, \(z_{obs}\) will lie in the rejection region and so \(H_0\) should be rejected. On the other hand, if \(\alpha\) is made smaller, then \(z_{obs}\) will not be in the rejection region and so \(H_0\) should not be rejected.
Any user of the test results can therefore apply the following decision rule:
If user's \(\alpha\) > p-value: reject \(H_0\)
If user's \(\alpha \le\) p-value: do not reject \(H_0\)
Example 9.11
In Example 9.9, the calculated value of the test statistic was 1.90. What is the p-value for this test?
(a) Is the null hypothesis rejected at the 10% level?
(b) Is the null hypothesis rejected at the 5% level?
Solution
The null and alternate hypotheses being tested are:
\(H_0: \mu = 60\)
\(H_A: \mu \neq 60\)
This is a two-tailed test and so to find the p-value, we need to find the shaded area in Figure 9.15.
[Figure 9.15: standard normal curve for \(H_A: \mu \neq 60\) with an area of p-value/2 shaded in each tail; the upper shaded tail begins at \(z_{obs} = 1.90\).]
From the standard normal tables with Z = 1.90, the area below \(z_{obs}\) is 0.9713.
∴ p-value = 2 × (1.0000 − 0.9713) = 0.0574
The p-value for the test is 0.0574.
(a) \(\alpha\) = 0.10, p-value = 0.0574
∴ \(\alpha\) > p-value
∴ Reject \(H_0\) at the 10% level
(b) \(\alpha\) = 0.05, p-value = 0.0574
∴ \(\alpha \le\) p-value
∴ Do not reject \(H_0\) at the 5% level
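The table lookup in Example 9.11 can also be checked numerically. The sketch below is illustrative only: the function names are our own, and it evaluates the standard normal distribution function with Python's `math.erfc` instead of the printed tables, so the final digit may differ slightly from a table-based answer.

```python
import math

def standard_normal_cdf(z):
    """Area under the standard normal curve below z."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def two_tailed_p_value(z_obs):
    """p-value for a two-tailed z-test: twice the area beyond |z_obs|."""
    return 2 * (1 - standard_normal_cdf(abs(z_obs)))

def decision(alpha, p_value):
    """Decision rule: reject H0 only when the user's alpha exceeds the p-value."""
    return "reject H0" if alpha > p_value else "do not reject H0"

p = two_tailed_p_value(1.90)   # test statistic from Example 9.11
print(round(p, 4))
for alpha in (0.10, 0.05):
    print(alpha, decision(alpha, p))
```

Each user supplies their own level to `decision`, which mirrors the idea that the researcher reports one p-value and readers apply the rule themselves.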
Example 9.12
In Example 9.10, the calculated value of the test statistic was 1.90. What is the p-value for this test?
(a) Is the null hypothesis rejected at the 10% level?
(b) Is the null hypothesis rejected at the 5% level?
Solution
The null and alternate hypotheses being tested are:
\(H_0: \mu \le 60\)
\(H_A: \mu > 60\)
This is a one-tailed test and so to find the p-value, we need to find the shaded area in Figure 9.16.
[Figure 9.16: standard normal curve for \(H_A: \mu > 60\) with the p-value shaded in the upper tail, to the right of \(z_{obs} = 1.90\).]
∴ p-value = (1.0000 − 0.9713) = 0.0287
(a) \(\alpha\) = 0.10, p-value = 0.0287
∴ \(\alpha\) > p-value
∴ Reject \(H_0\) at the 10% level
(b) \(\alpha\) = 0.05, p-value = 0.0287
∴ \(\alpha\) > p-value
∴ Reject \(H_0\) at the 5% level
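For the one-tailed case only the upper-tail area is needed. A minimal sketch, again using `math.erfc` in place of the tables (the function name is an assumption made for illustration):

```python
import math

def upper_tail_p_value(z_obs):
    """p-value for H_A: mu > mu_0, the area above z_obs under the standard normal curve."""
    return 0.5 * math.erfc(z_obs / math.sqrt(2))

p_one = upper_tail_p_value(1.90)   # test statistic from Example 9.12
print(round(p_one, 4))
print(p_one < 0.05)                # True means H0 is rejected at the 5% level, and so at 10% too
```

With the same test statistic, the one-tailed p-value is half the two-tailed value, which is why the 5% conclusion differs between Examples 9.11 and 9.12.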
Quick Question 5
9.11 Summary
In this unit you have learned the principles of hypothesis testing. Although you may have found some of these principles difficult to grasp, their application is very straightforward.
You learned that using samples to test hypotheses can lead to two types of errors:
1. Type I error: The rejection of a true null hypothesis.
2. Type II error: The non-rejection of a false null hypothesis.
The probability of a type I error is denoted by \(\alpha\) and the probability of a type II error is denoted by \(\beta\). The researcher chooses the value of \(\alpha\). The value of \(\beta\) is not specified in the testing procedure, but you must always remember that, for a given sample size, reducing \(\alpha\) will lead to an increase in \(\beta\).
The most common approach to hypothesis testing uses a six step procedure. These six steps are:
Step 1: State the null and alternative hypotheses;
Step 2: Specify the level of the test;
Step 3: Select a test statistic;
Step 4: Define the decision rule;
Step 5: Calculate the value of the test statistic;
Step 6: Make a decision and answer the question.
In this unit you learned to use the six step procedure to test hypotheses about the population mean when the population standard deviation is known. In the next unit this six step procedure will be used to test other forms of hypotheses.
UNIT 10
HYPOTHESIS TESTING
APPLIED
10.0 Contents
10.1 Unit objectives
10.2 Introduction
10.3 Test on a population mean: variance unknown
10.4 Testing differences between means: variances unknown but equal
10.5 Testing differences between means: variances unknown but unequal
10.6 Testing differences between means: paired data
10.7 Test on a population proportion
10.8 Testing differences between proportions
10.9 Summary
In the previous unit you learned the principles of hypothesis testing. In this unit these principles have been applied
to testing four kinds of hypotheses. For one of these hypotheses, on the difference between two population means,
there are three different tests that can be used and so a total of six tests are described in this unit. For each test
you are told the conditions required for the test, the test statistic and the rejection region. We have also included
an example illustrating the use of each test.
Test a hypothesis on a single population mean;
Use independent samples to test a hypothesis on the difference between two population means;
Use paired samples to test a hypothesis on the difference between two population means;
Test a hypothesis on a single population proportion;
Test a hypothesis on the difference between two population proportions.
10.2 Introduction
In the previous unit you learned the principles of hypothesis testing. In this unit you will learn how to apply those principles to six hypothesis tests.
The unit is in the form of a manual. For each test:
You are told the conditions required for the test. When all of these conditions are met, the test will reject the null hypothesis when it is true with probability \(\alpha\), where \(\alpha\) is the value nominated in Step 2 of the six step procedure. If the test is used when one or more of these conditions is not satisfied, then the probability of rejecting \(H_0\) when it is true will not equal the value of \(\alpha\) nominated in Step 2. A test is said to be robust with regard to one of its conditions if the true level of the test is close to the nominated level when the condition is violated. The test can then be used, with caution, where the condition is not satisfied.
Robust test: A test is said to be robust with regard to one of its conditions if the true level of the test is close to the nominated level when the condition is violated.
Where a condition not being satisfied can lead to the true level of the test being very different from the nominated level, the test is said to be fragile with regard to this condition. The test should not be used when this condition is violated.
Fragile test: A test is said to be fragile with regard to one of its conditions if the condition not being satisfied can lead to the true level of the test being very different from the nominated level of the test.
The six step procedure is presented. The procedure is described for the two-tailed test. If the alternative hypothesis is for a one-tailed test, the only adjustments are to the rejection region and the conclusions drawn, as described in the previous unit.
Comments are made. Here some other approaches to testing the stated hypotheses are described. You are not expected to know the details of these other procedures but you should be aware that there are alternatives to the standard t-tests and z-tests described in this unit. The other procedures are known as non-parametric tests. All t-tests have the formal requirement that the population is normally distributed, although the test is often robust with regard to this assumption. Some non-parametric tests are valid for any population distribution while others have weaker requirements than normality, such as symmetry or continuity. These tests are often tests on the median rather than the mean. Non-parametric tests are often very powerful and are generally under-utilised by economists.
A full example is given of the application of the test.
Each test is presented separately. You can read the tests in any order. This has resulted in some duplication, so don't be surprised to read the same statements in a number of the tests.
10.3 Test on a population mean: variance unknown
In this section you will learn to test hypotheses on the mean of a population. Examples of the type of hypothesis to be tested are:
the mean wage of an ACT worker is $600 per week;
the average speed of cars on Belconnen Way is more than 80 kph;
on average, firms reinvested 30% of their gross profits;
the toner cartridge in the printer needs to be replaced, on average, every 50,000 pages.
All of these statements are statements about the mean of a population. These hypotheses are of the same form as those considered in Unit 9, but the unrealistic assumption made in Unit 9, that the population standard deviation is known, is dropped.
Conditions for the test
(a) A simple random sample is taken from the population.
(b) The population is normal.
The test is robust with regard to the second condition (for reasonably sized samples, say n > 10) but fragile with regard to the first condition.
The six step procedure is presented below.
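The six step procedure for testing \(H_0: \mu = \mu_0\)
Step 1: \(H_0: \mu = \mu_0\)
\(H_A: \mu \neq \mu_0\)
Step 2: Specify the level of the test, \(\alpha\)
Step 3: The test statistic is
\[ t_{obs} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]
where \(\mu_0\) = hypothesised population mean
\(\bar{x}\) = the sample mean
\(s\) = the sample standard deviation
\(n\) = the sample size
Step 4: Reject \(H_0\) at the \(100\alpha\)% level of significance if
\(t_{obs} < -t_{\alpha/2}\) or \(t_{obs} > +t_{\alpha/2}\)
where \(t_{\alpha/2}\) is from the t-tables with \(\nu = n - 1\) degrees of freedom.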
Comments
1. Many textbooks advocate using the z-test described in the previous unit, with \(s\) used as an estimator of \(\sigma\), when the sample size is more than 30. The value from the z-tables is very slightly less than the equivalent t-table value for large sample sizes and so this practice makes it easier to reject \(H_0\). The true level of the test is then very slightly higher than the nominated level of the test. This runs against the standard practice in statistics, which is to be conservative and make the true level of the test no higher than the nominated level. There seems to be no real advantage in using the z-test rather than the more correct t-test when t-tables are so readily available. It is better to use the t-test whenever the standard deviation has to be estimated from the sample, irrespective of the sample size.
2. When there is no row in the t-tables for the required degrees of freedom, use the nearest tabulated degrees of freedom below the required degrees of freedom. (This, in line with the principle of conservatism, will make the true value of \(\alpha\) slightly less than the nominated value.)
3. The robustness of the test with regard to the normality of the population depends on the sample size and the degree of non-normality. If the population is close to bell-shaped, the test can be used with samples of 5 or more. If the population is very skewed or has heavier tails than the normal distribution, a larger sample size is required. In these cases consider using a non-parametric test such as the sign test or the signed rank test.
4. With the t-test it is not possible to obtain an exact p-value for the test from the t-tables in the UC Statistical Tables, only a range within which the p-value lies. However, there are now a number of web sites that can be used to calculate exact p-values.
Example 10.1
The ACT Government is considering the introduction of speed cameras along Belconnen Way. It is believed that
the average speed of cars travelling along Belconnen Way exceeds the speed limit of 80 kph. A simple random
sample of 10 cars had speeds as follows (kph):
96 78 87 74 84 77 83 86 110 80
Do these data provide sufficient evidence to support the ACT Government's belief at the 10% level of significance? What is the p-value for this test?
Solution
Let \(\mu\) = the mean speed of cars on Belconnen Way (kph)
The statement is: \(\mu > 80\)
\(H_0: \mu \le 80\)
\(H_A: \mu > 80\)
\(\alpha = 0.10\)
\[ t_{obs} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]
As \(H_A\) is a > hypothesis, this is a one-tailed upper tail test at the 10% level. From the t-tables with \(\nu = n - 1 = 9\), the rejection region is the shaded region in the diagram below.
[Figure: t distribution with \(\nu = 9\) for \(H_A: \mu > 80\), \(\alpha = 0.10\); the rejection region of area 0.10 lies to the right of 1.383.]
Reject \(H_0\) at the 10% level of significance if \(t_{obs} > 1.383\)
Do not reject \(H_0\) if \(t_{obs} \le 1.383\)
\(n = 10\), \(\bar{x} = 85.50\), \(s = 10.6066\)
∴
\[ t_{obs} = \frac{85.50 - 80}{10.6066 / \sqrt{10}} = 1.640 \]
[Figure: t distribution with \(\nu = 9\) for \(H_A: \mu > 80\), \(\alpha = 0.10\); \(t_{obs} = 1.640\) lies inside the rejection region to the right of 1.383.]
Reject \(H_0\) in favour of \(H_A\) at the 10% level.
There is sufficient evidence in the sample data to conclude at the 10% level that the mean speed of cars is greater than 80 kph.
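The arithmetic in Example 10.1 can be reproduced from the raw speeds. This is a minimal sketch rather than part of the unit's procedure: the variable names are our own, and the critical value 1.383 is simply copied from the t-tables rather than computed.

```python
import math
import statistics

speeds = [96, 78, 87, 74, 84, 77, 83, 86, 110, 80]   # kph, from Example 10.1
mu_0 = 80            # hypothesised mean (the speed limit)
t_critical = 1.383   # upper 10% point of t with 9 degrees of freedom, from the t-tables

n = len(speeds)
x_bar = statistics.mean(speeds)
s = statistics.stdev(speeds)                  # sample standard deviation (divisor n - 1)
t_obs = (x_bar - mu_0) / (s / math.sqrt(n))

print(n, x_bar, round(s, 4), round(t_obs, 3))
print("reject H0" if t_obs > t_critical else "do not reject H0")
```

Note that `statistics.stdev` uses the divisor n − 1, which matches the sample standard deviation quoted in the example; `statistics.pstdev` would give a different, smaller value.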
Finding the p-value
To find the p-value, look along the row of the t-tables for \(\nu = 9\) to find the two t-scores that lie on either side of \(t_{obs} = 1.640\).
Cumulative probability   0.75   0.80   0.85   0.90   0.95   0.975  0.99   0.995  0.999  0.9995
ν = 9                    0.703  0.883  1.100  1.383  1.833  2.262  2.821  3.250  4.297  4.781
Figure 10.3: Calculating the p-value for \(H_A: \mu > 80\)
[Figure: t distribution with \(\nu = 9\); the p-value is the shaded area to the right of \(t_{obs} = 1.640\), which lies between 1.383 (area to right = 0.10) and 1.833 (area to right = 0.05).]
The p-value is the shaded area and so 0.05 < p-value < 0.10. An exact p-value cannot be found from the UC Statistical Tables.
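Although the tables only bracket the p-value, a numerical value can be approximated by integrating the t density directly. The sketch below is one possible approach, not the method used in the unit: it applies Simpson's rule to the standard Student's t density, and the function names and the number of intervals are assumptions chosen for illustration.

```python
import math

def t_density(t, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail_p_value(t_obs, df, intervals=2000):
    """Area to the right of t_obs (for t_obs >= 0), via Simpson's rule on [0, t_obs]."""
    h = t_obs / intervals                      # intervals must be even for Simpson's rule
    total = t_density(0, df) + t_density(t_obs, df)
    for i in range(1, intervals):
        weight = 4 if i % 2 else 2
        total += weight * t_density(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t                   # the density is symmetric about zero

p = t_upper_tail_p_value(1.640, 9)             # Example 10.1: t_obs = 1.640, nu = 9
print(round(p, 4))
```

The result should fall between 0.05 and 0.10, in agreement with the range read from the tables. Statistical packages provide the same calculation ready-made.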
Quick Question 1
10.4 Testing differences between means: variances unknown but equal
In this section you will learn to test hypotheses on the difference between two population means. Examples of
the type of hypothesis to be tested are:
Men are paid more on average than women;
Following the introduction of new work practices, the average time taken by a worker to complete the
task fell by 2 minutes;
Between 1998 and 1999 the earnings of the average urban household rose by $50 per week;
Using the new fertiliser increases the average yield per acre.
There are three different tests on the difference between two population means.
1. The pooled t-test
This test assumes that:
The data were collected by two independent samples. With the independent samples method, a simple random sample is taken from the first population and the values recorded. Then a second simple random sample is taken from the second population and the values recorded. The sample taken from the second population is completely separate from the sample taken from the first population.
The two populations have the same variance.
2. The unequal variances t-test
Notice that this test is more general than the pooled t-test because it does not make any assumptions about the population variances: they can be equal or unequal. This test is slightly more complicated than the pooled t-test but is more widely applicable.
3. The paired t-test
The data were collected in pairs. With the paired sample method, the first observation on population 1 is related in some way to the first observation on population 2. Similarly, the second observation on population 1 is related to the second observation on population 2, and so on. The samples are not chosen separately and so are not independent.
Notice that this test does not assume that the two populations have the same variance.
In this section we will look at the pooled t-test and in the next two sections at the unequal variances and paired t-tests.
Example 10.2
Men are paid more on average than women.
Formulate the null and alternative hypotheses for testing this statement. Explain how you would collect the data for a pooled t-test.
Solution
Let \(\mu_1\) = the mean wage of men in full-time employment ($ per week)
\(\mu_2\) = the mean wage of women in full-time employment ($ per week)
The statement is: \(\mu_1 > \mu_2\)
The negation is: \(\mu_1 \le \mu_2\)
When formulating hypotheses it is convenient to have all the parameters on the same side of the inequalities and so the above should be rewritten as
The statement is: \(\mu_1 - \mu_2 > 0\)
The negation is: \(\mu_1 - \mu_2 \le 0\)
\(H_0: \mu_1 - \mu_2 \le 0\)
\(H_A: \mu_1 - \mu_2 > 0\)
To collect the data, first obtain a sample frame of all men in full-time employment and a sample frame of all women in full-time employment. Assign each man in the male sample frame a number and use a random number table to select the required sample size. Record the weekly wages of all the selected males. Repeat this process to select an independent simple random sample of women from the female sample frame. The random numbers used to select the women are, of course, different from the random numbers used to select the men. Record the wages of the selected women.
Notice that the two samples are selected separately. They are independent samples.
Example 10.3
Following the introduction of new work practices, the average time taken by a worker to complete the task fell by 2 minutes.
Formulate the null and alternative hypotheses for testing this statement. Explain how you would collect the data for a pooled t-test.
Solution
Let \(\mu_1\) = the mean time to complete the task with the old practices (minutes)
\(\mu_2\) = the mean time to complete the task with the new practices (minutes)
The statement is: \(\mu_2 = \mu_1 - 2\)
The negation is: \(\mu_2 \neq \mu_1 - 2\)
Taking all the parameters to the same side gives:
The statement is: \(\mu_1 - \mu_2 = 2\)
The negation is: \(\mu_1 - \mu_2 \neq 2\)
\(H_0: \mu_1 - \mu_2 = 2\)
\(H_A: \mu_1 - \mu_2 \neq 2\)
To collect the data, first obtain a sample frame of all the workers in the factory. Use a random number table to select a simple random sample of workers and record the time each of these workers takes to complete the task using the old work practices. Then select a second simple random sample of workers and record how long it takes each worker to complete the task using the new work practices.
Quick Question 2
Conditions for the test
(a) A simple random sample is taken from each of the two populations.
(b) The two samples are independent.
(c) The two populations are normal.
(d) The two populations have the same variance: \(\sigma_1^2 = \sigma_2^2 = \sigma^2\).
The test is robust with regard to conditions (c) and (d), but fragile with regard to conditions (a) and (b).
Notice that in this test it is assumed that the two population variances are equal. The test works well if the two population variances are unequal but of the same order of magnitude. The sample variances are used to determine whether the population variances are similar enough to use this test.
If \(s^2_{\text{larger}} / s^2_{\text{smaller}} \le 10\): population variances are similar and so use this test.
If \(s^2_{\text{larger}} / s^2_{\text{smaller}} > 10\): population variances are not similar and so use the unequal variances test in the next section.
where \(s^2_{\text{larger}}\) = the larger of the two sample variances
\(s^2_{\text{smaller}}\) = the smaller of the two sample variances
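The rule of thumb above is easy to express as a small helper. The sketch below is illustrative: the function names and the sample standard deviations passed to them are hypothetical, and only the threshold of 10 comes from the text.

```python
def variance_ratio(s1, s2):
    """Ratio of the larger sample variance to the smaller one (s1, s2 are standard deviations)."""
    larger, smaller = max(s1, s2) ** 2, min(s1, s2) ** 2
    return larger / smaller

def choose_test(s1, s2, threshold=10):
    """Rule of thumb: pooled test when the variance ratio is at most the threshold."""
    if variance_ratio(s1, s2) <= threshold:
        return "pooled t-test"
    return "unequal variances t-test"

# Hypothetical sample standard deviations, for illustration only
print(round(variance_ratio(4.0, 6.0), 2), choose_test(4.0, 6.0))
print(round(variance_ratio(2.0, 9.0), 2), choose_test(2.0, 9.0))
```

The helper takes standard deviations, since those are what a calculator usually reports, and squares them before forming the ratio.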
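The six step procedure for testing \(H_0: \mu_1 - \mu_2 = \delta_0\)
Step 1: \(H_0: \mu_1 - \mu_2 = \delta_0\)
\(H_A: \mu_1 - \mu_2 \neq \delta_0\)
Step 2: Specify the level of the test, \(\alpha\)
Only use this test if \(s^2_{\text{larger}} / s^2_{\text{smaller}} \le 10\)
Step 3: The test statistic is
\[ t_{obs} = \frac{(\bar{x}_1 - \bar{x}_2) - \delta_0}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \]
where \(\delta_0\) = hypothesised difference between the two population means
\(\bar{x}_1\) = mean of the sample from population 1
\(\bar{x}_2\) = mean of the sample from population 2
\(s_p\) = pooled standard deviation
\[ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \]
\(s_1\) = standard deviation of the sample from population 1
\(s_2\) = standard deviation of the sample from population 2
\(n_1\) = the size of the sample from population 1
\(n_2\) = the size of the sample from population 2
Step 4: Reject \(H_0\) at the \(100\alpha\)% level of significance if
\(t_{obs} < -t_{\alpha/2}\) or \(t_{obs} > +t_{\alpha/2}\)
where \(t_{\alpha/2}\) is from the t-tables with \(\nu = n_1 + n_2 - 2\) degrees of freedom.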
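The pooled test statistic can be computed from summary statistics alone. The following is a minimal sketch under stated assumptions: the function name, its argument names and the illustrative numbers are hypothetical, and `delta_0` stands for the hypothesised difference between the two population means.

```python
import math

def pooled_t(n1, mean1, s1, n2, mean2, s2, delta_0=0.0):
    """Return (t_obs, degrees of freedom) for the pooled t-test from summary statistics."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    standard_error = math.sqrt(pooled_variance) * math.sqrt(1 / n1 + 1 / n2)
    t_obs = ((mean1 - mean2) - delta_0) / standard_error
    return t_obs, df

# Hypothetical summary statistics, for illustration only
t_obs, df = pooled_t(n1=12, mean1=50.0, s1=8.0, n2=15, mean2=45.0, s2=10.0)
print(round(t_obs, 3), df)
```

The returned degrees of freedom identify the row of the t-tables to use when setting the rejection region.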
Comments
1. Again many textbooks advocate using a z-test, with the sample standard deviations used as estimators of the population standard deviations, when the sample sizes are more than 30. This gives the test statistic of
\[ z_{obs} = \frac{(\bar{x}_1 - \bar{x}_2) - \delta_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \]
(with \(\delta_0\) the hypothesised difference between the two population means). There seems to be no real advantage in doing this when t-tables are so readily available.
2. When there is no row in the t-tables for the required degrees of freedom, use the nearest tabulated degrees of freedom less than the true degrees of freedom. This will make the true value of \(\alpha\) slightly less than the reported value.
3. If the population distributions are very non-normal, there are a number of non-parametric tests, such as the Mann-Whitney U test, that can be used. These tests can be both more reliable and more powerful than the standard t-test for very non-normal distributions. Generally non-parametric tests are under-utilised by economists and accountants.
4. With the t-test it is not possible to obtain an exact p-value from the t-tables in the UC Statistical Tables, only a range within which the p-value lies. However, there are a number of web sites that can be used to calculate exact p-values.
Example 10.4
A supplier of sweets is evaluating a new, and more expensive, retail display. In 10 randomly selected outlets the
older, cheaper display was used. The number of units sold in a week at these 10 outlets were:
30 10 44 38 62 56 48 36 64 28
The supplier used the new display in 10 randomly selected retail outlets. The number of units sold in a week at
these 10 retail outlets were:
56 40 55 48 67 54 62 60 80 34
Does the new display result in an increase in sales? Use a 1% test.
Solution
Let \(\mu_1\) = the mean weekly sales at a retail outlet with the old display
\(\mu_2\) = the mean weekly sales at a retail outlet with the new display
If the new display leads to an increase in sales then \(\mu_1 < \mu_2\).
The statement is: \(\mu_1 < \mu_2\) or \(\mu_1 - \mu_2 < 0\)
The negation is: \(\mu_1 \ge \mu_2\) or \(\mu_1 - \mu_2 \ge 0\)
\(H_0: \mu_1 - \mu_2 \ge 0\)
\(H_A: \mu_1 - \mu_2 < 0\)
\(\alpha = 0.01\)
Entering the data into a calculator gives:
$n_1 = 10$, $\bar{x}_1 = 41.60$, $s_1 = 16.7809$
$n_2 = 10$, $\bar{x}_2 = 55.60$, $s_2 = 13.1504$
$$\therefore \frac{s^2_{\text{larger}}}{s^2_{\text{smaller}}} = \frac{16.7809^2}{13.1504^2} = 1.63 < 10$$
As the ratio of the variances is less than 10 the assumption that the two populations have the same
variance can be made. The test statistic is then:
$$t_{\text{obs}} = \frac{(\bar{x}_1 - \bar{x}_2) - \delta_0}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
As $H_A$ is a < hypothesis, this is a one-tailed lower tail test at the 1% level. From the t-tables with
$\nu = n_1 + n_2 - 2 = 18$ the rejection region is the shaded region in Figure 10.5.
[Figure 10.5: t distribution with $\nu = 18$ for $H_A: \mu_1 - \mu_2 < 0$, $\alpha = 0.01$. The rejection region is the shaded lower tail to the left of $-2.552$.]
Reject $H_0$ at the 1% level of significance if $t_{\text{obs}} < -2.552$
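$n_1 = 10$, $\bar{x}_1 = 41.60$, $s_1 = 16.7809$
$n_2 = 10$, $\bar{x}_2 = 55.60$, $s_2 = 13.1504$
$$\therefore s_p = \sqrt{\frac{9 \times 16.7809^2 + 9 \times 13.1504^2}{18}} = 15.0753$$
$$\therefore t_{\text{obs}} = \frac{(41.60 - 55.60) - 0}{15.0753\sqrt{\dfrac{1}{10} + \dfrac{1}{10}}} = -2.077$$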
The calculated value of $t_{\text{obs}}$ is not in the rejection region.
Do not reject $H_0$ in favour of $H_A$ at the 1% level.
There is insufficient evidence in the sample data to conclude at the 1% level that the introduction of
the new display leads to an increase in sales.
This conclusion illustrates the point, made repeatedly in the previous unit, that not rejecting $H_0$ should
not be interpreted as showing that $H_0$ is true. In the two samples, the mean sales with the new display
is 14 units per week higher than with the old display. This strongly suggests that the null hypothesis
of no increase in sales is false, but the evidence is not strong enough at the 1% level to show that the
display does improve sales. The option of trying to collect more data should be pursued. (However,
some care is needed here. The sometimes-used test procedure of continuing to gather data until the
null is rejected is not valid test methodology. With enough tries any hypothesis can be rejected!)
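The arithmetic in Example 10.4 can be checked with a short script. The following is a minimal sketch in Python using only the standard library; the function name `pooled_t` and the variable names are illustrative assumptions rather than part of the unit, and the critical value is copied from the t-tables, not computed.

```python
import math
import statistics

# Weekly sales from Example 10.4
old_display = [30, 10, 44, 38, 62, 56, 48, 36, 64, 28]
new_display = [56, 40, 55, 48, 67, 54, 62, 60, 80, 34]


def pooled_t(sample1, sample2, delta0=0.0):
    """Return (s_p, t_obs, df) for the pooled t-test on two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    var1 = statistics.variance(sample1)  # sample variance, divisor n - 1
    var2 = statistics.variance(sample2)
    df = n1 + n2 - 2
    # Pooled standard deviation: weighted average of the two sample variances
    s_p = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / df)
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    t_obs = (diff - delta0) / (s_p * math.sqrt(1 / n1 + 1 / n2))
    return s_p, t_obs, df


s_p, t_obs, df = pooled_t(old_display, new_display)
critical = -2.552  # lower-tail 1% point for 18 degrees of freedom, from the t-tables
reject = t_obs < critical
print(f"s_p = {s_p:.4f}, t_obs = {t_obs:.3f}, df = {df}, reject H0: {reject}")
```

Because the script works from unrounded standard deviations, the last decimal place of $s_p$ may differ slightly from the hand calculation.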
Quick Question 3
10.5 Testing differences between means: variances unknown and unequal
There are three different tests on the difference between two population means.
1. the pooled t-test
With the independent samples method, a simple random sample is taken from the first population and
the values recorded. Then a second simple random sample is taken from the second population and the
values recorded. The sample taken from the second population is completely separate from the sample
taken from the first population.
The two populations have the same variance.
2. the unequal variances t-test
Notice that this test is more general than the pooled t-test because it does not make any assumptions about
the population variances: they can be equal or unequal. This test is slightly more complicated than the
pooled t-test but is more widely applicable.
3. the paired t-test
The data was collected in pairs.
With the paired sample method, the first observation on population 1 is related in some way to the first
observation on population 2. Similarly the second observation on population 1 is related to the second
observation on population 2 etc. The samples are not chosen separately and so are not independent.
Notice that this test does not assume that the two populations have the same variance.
In this section we will look at the unequal variances t-test. In the previous section we looked at the pooled t-test
and in the next section we will look at the paired t-test.
(a) A simple random sample is taken from each of the two populations.
(b) The two samples are independent.
(c) The two populations are normal.
The test is robust with regard to condition (c), but fragile with regard to conditions (a) and (b).
Notice that in this test it is not assumed that the two population variances are equal. The variances may or may not
be equal. The population variances are never known and so the sample variances are used to determine whether
to use this test or the pooled t-test. Always remember that the pooled t-test does assume that the two population
variances are equal.
If $\dfrac{s^2_{\text{larger}}}{s^2_{\text{smaller}}} \le 10$: population variances are similar and so use the pooled t-test.
If $\dfrac{s^2_{\text{larger}}}{s^2_{\text{smaller}}} > 10$: population variances are not similar and so use the unequal variances t-test in this section.
where $s^2_{\text{larger}}$ = the larger of the two sample variances
$s^2_{\text{smaller}}$ = the smaller of the two sample variances
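The rule of thumb above is easy to automate. The sketch below is one possible way to apply it in Python; the function name `choose_two_sample_test` and the returned labels are hypothetical, and it assumes both samples have non-zero variance.

```python
import statistics


def choose_two_sample_test(sample1, sample2, threshold=10.0):
    """Apply the variance-ratio rule: ratio <= threshold suggests the pooled t-test."""
    var1 = statistics.variance(sample1)  # sample variances, divisor n - 1
    var2 = statistics.variance(sample2)
    ratio = max(var1, var2) / min(var1, var2)  # assumes neither variance is zero
    test = "pooled" if ratio <= threshold else "unequal variances"
    return ratio, test


# Data from Example 10.4
old_display = [30, 10, 44, 38, 62, 56, 48, 36, 64, 28]
new_display = [56, 40, 55, 48, 67, 54, 62, 60, 80, 34]

ratio, test = choose_two_sample_test(old_display, new_display)
print(f"variance ratio = {ratio:.2f}, suggested test: {test}")
```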
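$H_0: \mu_1 - \mu_2 = \delta_0$
$H_A: \mu_1 - \mu_2 \ne \delta_0$
Level of significance: $\alpha$
$$t_{\text{obs}} = \frac{(\bar{x}_1 - \bar{x}_2) - \delta_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
where $\delta_0$ = hypothesised difference between the population means
$\bar{x}_1$, $\bar{x}_2$ = the two sample means
$s_1$, $s_2$ = the two sample standard deviations
$n_1$, $n_2$ = the two sample sizes
Reject $H_0$ if $t_{\text{obs}} < -t_{\alpha/2}$ or $t_{\text{obs}} > +t_{\alpha/2}$
where $t_{\alpha/2}$ is from the t-tables with degrees of freedom of
$$\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$
In general the calculated value of $\nu$ will not be an integer. Round the calculated value downwards to
an integer value.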
Comments
1. Again many textbooks advocate using a z-test, with the sample standard deviations used as estimators of the
population standard deviations, when the sample sizes are more than 30. This gives the test statistic of
$$z_{\text{obs}} = \frac{(\bar{x}_1 - \bar{x}_2) - \delta_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
There seems to be no real advantage in doing this when t-tables are so readily available.
2. When there is no row in the t-tables for the required degrees of freedom, use the table degrees of freedom
less than the true degrees of freedom. This will make the true value of $\alpha$ slightly less than the reported value.
3. If the population distributions are very non-normal, there are a number of non-parametric tests that can be
used. These tests can be both more reliable and more powerful than the standard t-test for very non-normal
distributions. Generally non-parametric tests are under-utilised by economists and accountants.
4. With the t-test it is not possible to obtain an exact p-value from the t-tables in the UC Statistical Tables,
only a range within which the p-value lies. However there are now a number of web sites that can be used
to calculate exact p-values.
Example 10.5
A supplier of sweets is evaluating a new, and more expensive, retail display. In 10 randomly selected outlets the
older, cheaper display was used. The number of units sold in a week at these 10 outlets were:
30 10 44 38 62 56 48 36 64 28
The supplier used the new display in 10 randomly selected retail outlets. The number of units sold in a week at
these 10 retail outlets were:
5 7 55 130 126 200 5 8 10 10
Does the new display result in an increase in sales? Use a 1% test.
Solution
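Let $\mu_1$ = the mean weekly sales at a retail outlet with the old display
$\mu_2$ = the mean weekly sales at a retail outlet with the new display
If the new display leads to an increase in sales then $\mu_1 < \mu_2$.
The statement is: $\mu_1 < \mu_2$ or $\mu_1 - \mu_2 < 0$
The negation is: $\mu_1 \ge \mu_2$ or $\mu_1 - \mu_2 \ge 0$
$H_0: \mu_1 - \mu_2 \ge 0$
$H_A: \mu_1 - \mu_2 < 0$
Level of significance: $\alpha = 0.01$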
There is clearly a greater spread in the sales with the new display than with the old display. The
assumption of equal variances made in the previous example is not appropriate here. Entering the data into a
calculator gives:
$n_1 = 10$, $\bar{x}_1 = 41.60$, $s_1 = 16.7809$
$n_2 = 10$, $\bar{x}_2 = 55.60$, $s_2 = 70.9071$
$$\therefore \frac{s^2_{\text{larger}}}{s^2_{\text{smaller}}} = \frac{70.9071^2}{16.7809^2} = 17.85 > 10$$
As the ratio of the variances is greater than 10 the assumption that the two populations have the same
variance cannot be made. The pooled t-test should not be used. The unequal variances test statistic
is:
$$t_{\text{obs}} = \frac{(\bar{x}_1 - \bar{x}_2) - \delta_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
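As $H_A$ is a < hypothesis, this is a one-tailed lower tail test at the 1% level.
Here
$$\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} = \frac{\left(\dfrac{16.7809^2}{10} + \dfrac{70.9071^2}{10}\right)^2}{\dfrac{\left(16.7809^2/10\right)^2}{9} + \dfrac{\left(70.9071^2/10\right)^2}{9}} = 10.004$$
$\therefore \nu = 10$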
(Notice that the degrees of freedom have been rounded downwards.) From the t-tables with $\nu = 10$
the rejection region is the shaded region in Figure 10.5.
[Figure: t distribution with $\nu = 10$ for $H_A: \mu_1 - \mu_2 < 0$, $\alpha = 0.01$. The rejection region is the shaded lower tail to the left of $-2.764$.]
Reject $H_0$ if $t_{\text{obs}} < -2.764$
$n_1 = 10$, $\bar{x}_1 = 41.60$, $s_1 = 16.7809$
$n_2 = 10$, $\bar{x}_2 = 55.60$, $s_2 = 70.9071$
$$\therefore t_{\text{obs}} = \frac{(41.60 - 55.60) - 0}{\sqrt{\dfrac{16.7809^2}{10} + \dfrac{70.9071^2}{10}}} = -0.608$$
Do not reject $H_0$ in favour of $H_A$ at the 1% level.
There is insufficient evidence in the sample data to conclude at the 1% level that the introduction of
the new display leads to an increase in sales.
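The unequal variances calculation in Example 10.5, including the degrees of freedom formula and the downward rounding, can be sketched as follows. This is an illustrative Python script using only the standard library; the function name `unequal_variances_t` is an assumption, and the critical value is again taken from the t-tables.

```python
import math
import statistics

# Weekly sales from Example 10.5
old_display = [30, 10, 44, 38, 62, 56, 48, 36, 64, 28]
new_display = [5, 7, 55, 130, 126, 200, 5, 8, 10, 10]


def unequal_variances_t(sample1, sample2, delta0=0.0):
    """Return (t_obs, nu, df) where df is nu rounded downwards to an integer."""
    n1, n2 = len(sample1), len(sample2)
    a = statistics.variance(sample1) / n1  # s1^2 / n1
    b = statistics.variance(sample2) / n2  # s2^2 / n2
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    t_obs = (diff - delta0) / math.sqrt(a + b)
    # Degrees of freedom formula from the six step procedure
    nu = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t_obs, nu, math.floor(nu)


t_obs, nu, df = unequal_variances_t(old_display, new_display)
critical = -2.764  # lower-tail 1% point for 10 degrees of freedom, from the t-tables
reject = t_obs < critical
print(f"t_obs = {t_obs:.3f}, nu = {nu:.3f}, df = {df}, reject H0: {reject}")
```

The unrounded value of $\nu$ printed by the script may differ from the hand calculation in the third decimal place, but it rounds downwards to the same integer.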
Quick Question 4
10.6 Testing differences between means: paired data
There are two sampling methods that can be used to collect the data required to test these hypotheses: two
independent samples and matched pairs samples. In this section we will look at the matched pairs method. The
independent samples method is described in the previous two sections.
With the matched pairs method there is a population of experimental units on which observations can be made.
A simple random sample of experimental units is taken and the value of the experimental units in each of the two
populations recorded. A pair of values is recorded for each of the experimental units.
Example 10.6
Men are paid more on average than women.
for a matched pairs t-test.
Solution
Let $\mu_1$ = the mean wage of men in full-time employment (dollars per week)
$\mu_2$ = the mean wage of women in full-time employment (dollars per week)
The statement is: $\mu_1 > \mu_2$
The negation is: $\mu_1 \le \mu_2$
When formulating hypotheses it is convenient to have all the parameters on the same side of the inequalities and
so the above should be rewritten as
The statement is: $\mu_1 - \mu_2 > 0$
The negation is: $\mu_1 - \mu_2 \le 0$
$H_0: \mu_1 - \mu_2 \le 0$
$H_A: \mu_1 - \mu_2 > 0$
To collect the data, pair each man with a woman in the same occupation. These pairs of males and females are
the experimental units. Select a simple random sample of pairs and record the wages of the male and female in
the pair. Note that the observations come in pairs: a male's wage and a female's wage in the same occupation.
Example 10.7
Following the introduction of new work practices, the average time taken by a worker to complete the task fell by
2 minutes.
for a matched pairs t-test.
Solution
Let $\mu_1$ = the mean time to complete the task with the old practices (minutes)
$\mu_2$ = the mean time to complete the task with the new practices (minutes)
The statement is: $\mu_2 = \mu_1 - 2$
The negation is: $\mu_2 \ne \mu_1 - 2$
The hypothesis with the equality becomes the null. Taking all the parameters to the same side of the inequalities
gives:
$H_0: \mu_1 - \mu_2 = 2$
$H_A: \mu_1 - \mu_2 \ne 2$
To collect the data, first obtain a sample frame of all the workers in the factory; these are the experimental
units. Use a random number table to select a simple random sample of workers and record the time each of these
workers takes to complete the task under the old work practices and under the new work practices. From each
worker we obtain a pair of observations: the time to complete the task under the old work practices and the time
taken under the new work practices. In running this experiment, half of the selected workers should complete
the task using the old practices first followed by the new practices. The other half of the selected workers should
reverse the order and complete the task using the new work practices first and then using the old work practices.
Quick Question 5
(a) A simple random sample of experimental units is taken from the population of experimental units.
(b) Each experimental unit is observed twice: once in population 1 and once in population 2.
(c) The differences between the values of the experimental units in population 1 and in population 2 are normally
distributed.
The test is robust with regard to condition (c), but fragile with regard to conditions (a) and (b).
The six step procedure for this test is given on the following page.
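$H_0: \mu_1 - \mu_2 = \delta_0$
$H_A: \mu_1 - \mu_2 \ne \delta_0$
Level of significance: $\alpha$
$$t_{\text{obs}} = \frac{\bar{d} - \delta_0}{s_d/\sqrt{n}}$$
where $\delta_0$ = hypothesised mean population difference
$n$ = the number of sampled experimental units
$x_i$ = value of experimental unit $i$ in population 1
$y_i$ = value of experimental unit $i$ in population 2
$d_i = x_i - y_i$ for $i = 1, 2, \ldots, n$
$\bar{d}$ = mean of the $d_i$ values
$s_d$ = standard deviation of the $d_i$ values
Reject $H_0$ if $t_{\text{obs}} < -t_{\alpha/2}$ or $t_{\text{obs}} > +t_{\alpha/2}$
where $t_{\alpha/2}$ is from the t-tables with $\nu = n - 1$ degrees of freedom.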
Comments
1. When there is no row in the t-tables for the required degrees of freedom use the table degrees of freedom
less than the true degrees of freedom. This, in line with the principle of conservatism, will make the true
value of $\alpha$ slightly less than the reported value.
2. The test works well when the population of differences is non-normal provided the sample of experimental
units is of a reasonable size, say n > 10.
3. If the population of differences is very non-normal, there are a number of non-parametric tests that can be
used. These tests can be both more reliable and more powerful than the standard t-test. Generally
non-parametric tests are under-utilised by economists.
4. The test is valid even if the observations are not in pairs but are from two independent samples and are then
randomly paired. However, with independent samples the tests described in the previous two sections are
more efficient.
5. With the t-test it is not possible to obtain an exact p-value from the t-tables in the UC Statistical Tables,
only a range within which the p-value lies. However there are now a number of web sites that can be used
to calculate exact p-values.
6. The matched pairs experiment is an example of the technique of blocking that is widely used in experiment
design to increase the efficiency of a hypothesis test. The observed values in an experiment are affected by
many factors and the differences in these factors can mask the effect of the factor of interest. If the other
factors can be held fixed, the effect of the factor of interest can be more clearly identified. In Example
10.7 one of the factors affecting the time taken is the skill level of the worker selected. When comparing the
times taken by the same worker to complete the task under the old and new work practices, the skill level
is unchanged and so the effect of the change in work practices can be seen more clearly. Use blocking
wherever possible to reduce the effect of differences in other factors. Where blocking reduces the effect of
other factors on the observations the matched pairs method is more efficient than the independent samples
method.
7. The use of the matched pairs method can change the hypothesis being tested. Consider Examples 10.2 and
10.6. With the independent samples method, the hypothesis being tested is that men are paid more on average
than women. Men could be paid more on average than women either by being paid more than women for
doing the same work or by there being a higher proportion of men in the more highly paid occupations. With
the matched pairs experiment, the hypothesis being tested is that men are paid more than women for doing the
same work. The appropriate experiment design depends on the objectives of the researcher.
Example 10.8
A supplier of sweets is evaluating a new, and more expensive, retail display. He randomly selects 10 outlets. In 5
of these outlets he uses the old retail display in Week 1 and the new retail display in Week 2. In the other 5 outlets
he uses the new retail display in Week 1 and the old retail display in Week 2. The number of units sold in a week
at these 10 outlets is shown in Table 10.1.
Table 10.1: Sales of Sweets in 10 Retail Outlets
Retail Outlet   Number of Units Sold
                Old Retail Display   New Retail Display
 1              30                   56
 2              10                   40
 3              44                   55
 4              38                   48
 5              62                   67
 6              56                   54
 7              48                   62
 8              36                   60
 9              64                   80
10              28                   34
(a) Does using the new display result in an increase in sales? Use a 1% test.
(b) What is the p-value for this test?
Solution
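(a) Steps 1 and 2: state the hypotheses and specify the level of the test
Let $\mu_1$ = the mean weekly sales at a retail outlet with the old display
$\mu_2$ = the mean weekly sales at a retail outlet with the new display
If the new display leads to an increase in sales, then $\mu_1 < \mu_2$.
The statement is: $\mu_1 < \mu_2$ or $\mu_1 - \mu_2 < 0$
The negation is: $\mu_1 \ge \mu_2$ or $\mu_1 - \mu_2 \ge 0$
$H_0: \mu_1 - \mu_2 \ge 0$
$H_A: \mu_1 - \mu_2 < 0$
Level of significance: $\alpha = 0.01$
$$t_{\text{obs}} = \frac{\bar{d} - \delta_0}{s_d/\sqrt{n}}$$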
As $H_A$ is a < hypothesis, this is a one-tailed lower tail test at the 1% level. From the t-tables with
$\nu = n - 1 = 9$, the rejection region is the shaded region in Figure 10.6.
[Figure 10.6: t distribution with $\nu = 9$ for $H_A: \mu_1 - \mu_2 < 0$, $\alpha = 0.01$. The rejection region is the shaded lower tail to the left of $-2.821$.]
Reject $H_0$ if $t_{\text{obs}} < -2.821$
Do not reject $H_0$ if $t_{\text{obs}} \ge -2.821$
First calculate the differences between the sample values.
Table 10.2: Calculating the Differences
Outlet   Number of Units Sold                      Difference
         Old Retail Display   New Retail Display
i        x_i                  y_i                  d_i = x_i - y_i
 1       30                   56                   -26
 2       10                   40                   -30
 3       44                   55                   -11
 4       38                   48                   -10
 5       62                   67                    -5
 6       56                   54                    +2
 7       48                   62                   -14
 8       36                   60                   -24
 9       64                   80                   -16
10       28                   34                    -6
Entering the $d_i$ values into a calculator gives:
$n = 10$, $\bar{d} = -14$, $s_d = 10.1653$
$$\therefore t_{\text{obs}} = \frac{-14.0 - 0}{10.1653/\sqrt{10}} = -4.355$$
Reject $H_0$ in favour of $H_A$ at the 1% level.
There is sufficient evidence in the sample data to conclude at the 1% level that the mean sales are
higher with the new display than with the old display.
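The matched pairs calculation in part (a) can be reproduced with a few lines of Python. This is a sketch using only the standard library; the variable names are illustrative and the critical value is copied from the t-tables.

```python
import math
import statistics

# Table 10.1: sales at the same 10 outlets under each display
old_display = [30, 10, 44, 38, 62, 56, 48, 36, 64, 28]
new_display = [56, 40, 55, 48, 67, 54, 62, 60, 80, 34]

# d_i = x_i - y_i for each outlet (old minus new)
differences = [x - y for x, y in zip(old_display, new_display)]

n = len(differences)
d_bar = statistics.mean(differences)
s_d = statistics.stdev(differences)  # sample standard deviation, divisor n - 1
delta0 = 0
t_obs = (d_bar - delta0) / (s_d / math.sqrt(n))

critical = -2.821  # lower-tail 1% point for 9 degrees of freedom, from the t-tables
reject = t_obs < critical
print(f"d_bar = {d_bar}, s_d = {s_d:.4f}, t_obs = {t_obs:.3f}, reject H0: {reject}")
```

The sales figures are the same numbers that were treated as two independent samples in Example 10.4, where $t_{\text{obs}} = -2.077$ did not lead to rejection of $H_0$. Pairing by outlet removes the outlet-to-outlet variation, which is the gain from blocking described in the comments above.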
(b) Finding the p-value
To find the p-value, look along the row of the t-tables for $\nu = 9$ to find two t-scores that lie
on either side of $|t_{\text{obs}}| = 4.355$.
Area to left   0.75    0.80    0.85    0.90    0.95    0.975   0.99    0.995   0.999   0.9995
$\nu = 9$      0.703   0.883   1.100   1.383   1.833   2.262   2.821   3.250   4.297   4.781
[Figure 10.7: t distribution with $\nu = 9$ for $H_A: \mu_1 - \mu_2 < 0$. The value $t_{\text{obs}} = -4.355$ lies between $-4.781$ (area to left = 0.0005) and $-4.297$ (area to left = 0.0010). The p-value is the shaded area to the left of $t_{\text{obs}}$.]
The p-value is the shaded area in Figure 10.7.
$\therefore 0.0005 <$ p-value $< 0.0010$
An exact p-value cannot be calculated from the t-table in the UC Statistical Tables.
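Where software is available, the tail area can be computed directly rather than bracketed from the table. The sketch below is one possible approach, not the method used in this unit: it integrates the t density numerically with Simpson's rule, using only the Python standard library. The function names `t_pdf` and `t_tail_area` are hypothetical.

```python
import math


def t_pdf(t, nu):
    """Density of the t distribution with nu degrees of freedom."""
    coeff = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return coeff * (1 + t * t / nu) ** (-(nu + 1) / 2)


def t_tail_area(t_obs, nu, intervals=2000):
    """One-tailed area beyond |t_obs|; intervals must be even for Simpson's rule."""
    upper = abs(t_obs)
    h = upper / intervals
    total = t_pdf(0.0, nu) + t_pdf(upper, nu)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    central_area = total * h / 3  # area between 0 and |t_obs|
    # The density is symmetric about 0, so each half has area 0.5
    return 0.5 - central_area


p_value = t_tail_area(-4.355, 9)
print(f"one-tailed p-value = {p_value:.5f}")
```

The result should fall inside the range 0.0005 to 0.0010 obtained from the table.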
Quick Question 6
10.7 Test on a population proportion
In this section you will learn to test hypotheses on a population proportion. Examples of the type of hypothesis
to be tested are:
more than 20% of households are now living below the poverty line;
less than 10% of Brand X batteries fail in the first year of use;
Sarah will win the next election on the first ballot;
20% of all the garments produced in the factory have defects.
Example 10.9
The Department of Education claims that 90% of the population is literate.
Formulate the null and alternative hypothesis for testing this claim.
Solution
Let $p$ = the proportion of the population that is literate
The claim is: $p = 0.90$
The negation is: $p \ne 0.90$
$H_0: p = 0.90$
$H_A: p \ne 0.90$
Example 10.10
A University lecturer claims that most assignments are returned to the students within 14 days.
Formulate the null and alternative hypotheses for testing this claim.
Solution
Let $p$ = the proportion of assignments returned within 14 days
The claim is: $p > 0.50$
The negation is: $p \le 0.50$
$H_0: p \le 0.50$
$H_A: p > 0.50$
The above hypotheses all relate to the proportion of the population that has a particular characteristic (lives below
the poverty line, fails in the first year of use, votes for Sarah etc.). Each element in the population has only two
possible values:
Yes: the element has the characteristic or
No: the element does not have the characteristic.
An element in the population cannot have half the characteristic or have the characteristic twice! The statement
being tested is a hypothesis about the proportion of the population that has the characteristic.
Quick Question 7
(a) A simple random sample of size $n$ is taken from the population;
(b) $n \ge 30$, $np_0 > 5$ and $n(1 - p_0) > 5$, where $p_0$ is the hypothesised population proportion.
The test is robust with regard to the second condition for samples of a reasonable size (say n > 20), provided
$np_0 > 2$ and $n(1 - p_0) > 2$. The test is fragile with regard to the first condition.
The six step procedure for this test is given on the following page.
$H_0: p = p_0$
$H_A: p \ne p_0$
Level of significance: $\alpha$
$$z_{\text{obs}} = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$$
where $p_0$ = the hypothesised population proportion with the characteristic
$\hat{p}$ = the proportion in the sample with the characteristic
$n$ = the sample size.
Reject $H_0$ if $z_{\text{obs}} < -z_{\alpha/2}$ or $z_{\text{obs}} > +z_{\alpha/2}$
where $z_{\alpha/2}$ is from the inverse normal tables.
Comments
1. This test is based on the binomial distribution. Each member of the population either has the characteristic
or does not have the characteristic. Under the equality in the null, the number in the sample with the
characteristic has a binomial distribution with number of trials $n$ and probability of success of $p_0$ (where $p_0$ is the
proportion in the population with the characteristic). The second condition for the test is the condition for
using the normal approximation to this binomial. Use this test only if the binomial is an appropriate model
for the number of successes in the sample.
2. If the sample is small, do not use a t-test. Work directly with the binomial tables.
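When the sample is too small for the normal approximation, the binomial tail probability can also be computed directly instead of being read from tables. The following is a minimal sketch; the function name `binomial_upper_tail` and the small-sample figures in the usage lines (15 flights, 4 of them late) are hypothetical and are not taken from this unit.

```python
import math


def binomial_upper_tail(x, n, p0):
    """P(X >= x) when X has a binomial distribution with n trials and success probability p0."""
    return sum(
        math.comb(n, k) * p0 ** k * (1 - p0) ** (n - k)
        for k in range(x, n + 1)
    )


# Hypothetical small sample: 4 late flights out of 15, testing H0: p <= 0.10
p_value = binomial_upper_tail(4, 15, 0.10)
print(f"exact upper-tail p-value = {p_value:.4f}")
```

For an upper tail alternative such as $H_A: p > p_0$, this tail probability plays the role of the p-value and is compared with the chosen level $\alpha$.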
Example 10.11
The management of the Fly-By-Night Airline claims that no more than 10% of all its flights take off behind
schedule. A sample of 70 flights shows that 10 of them are late taking off. Can we conclude that the claim is
false? Use a 5% test. What is the p-value?
Solution
Let $p$ = the proportion of flights that take off behind schedule.
The claim is: $p \le 0.10$
The negation is: $p > 0.10$
$H_0: p \le 0.10$
$H_A: p > 0.10$
Level of significance: $\alpha = 0.05$
$$z_{\text{obs}} = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$$
As $H_A$ is a > hypothesis, this is a one-tailed upper tail test at the 5% level. From the inverse normal
tables $z_{0.05} = 1.6449$ and so the rejection region is the shaded region in Figure 10.8.
[Figure 10.8: standard normal distribution for $H_A: p > 0.10$, $\alpha = 0.05$. The rejection region is the shaded upper tail to the right of 1.6449.]
Reject $H_0$ if $z_{\text{obs}} > 1.6449$
$n = 70$, $x = 10$
$$\therefore \hat{p} = \frac{10}{70} = 0.1429$$
$$\therefore z_{\text{obs}} = \frac{0.1429 - 0.10}{\sqrt{\dfrac{0.10(1 - 0.10)}{70}}} = 1.1964$$
[Figure: standard normal distribution for $H_A: p > 0.10$, $\alpha = 0.05$, showing $z_{\text{obs}} = 1.1964$ to the left of the rejection region, which is the shaded upper tail to the right of 1.6449.]
Do not reject $H_0$ in favour of $H_A$ at the 5% level.
There is insufficient evidence in the sample data to conclude, at the 5% level, that more than 10% of
flights take off behind schedule.
Finding the pvalue
This is a one-tailed test and so the pvalue is one-tailed. The pvalue is the shaded area in Figure
10.10.
A
: p > 0.10
Z
0
............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
..................................................................................................
. . . . . . . . . . . . . . . . . . . . . . . . . . .
z
obs
= 1.1964
............................................................................................................................... . . . . . . . . . . . . . . . . . . . . . . . . . .
...........................
pvalue
From the normal tables the area below $Z = 1.20$ is 0.8849.
$\therefore$ p-value $= 1.0000 - 0.8849 = 0.1151$
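The table lookup above can be checked numerically. The following is a minimal sketch, assuming only Python's standard library; the function name `upper_tail_p_value` is an illustrative choice and not part of these notes. It evaluates the upper-tail area both at the rounded value 1.20 used with the tables and at the unrounded $z_{obs} = 1.1964$.

```python
import math

def upper_tail_p_value(z_obs: float) -> float:
    """Area above z_obs under the standard normal curve (one-tailed p-value)."""
    # P(Z > z) = 0.5 * erfc(z / sqrt(2)) for a standard normal Z
    return 0.5 * math.erfc(z_obs / math.sqrt(2))

# z rounded to two decimals, as required by the printed normal tables
p_table = upper_tail_p_value(1.20)
# unrounded observed test statistic from the example
p_exact = upper_tail_p_value(1.1964)

print(f"p-value at z = 1.20:   {p_table:.4f}")
print(f"p-value at z = 1.1964: {p_exact:.4f}")
```

The two values differ only in the fourth decimal place, so rounding $z_{obs}$ for the tables does not change the conclusion here.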
Quick Question 8
10.8 Testing differences between proportions
In this section you will learn to test hypotheses on the difference between two population proportions. Examples
of the type of hypothesis to be tested are:
the Prime Minister's approval rating declined between February and June;
the introduction of new quality control procedures led to a reduction in the percentage of garments
produced that were faulty;
the unemployment rate is higher among management graduates than among mathematics graduates;
the literacy rates in Tonga and Samoa are equal.
Example 10.12
There was no change in the proportion of the population living below the poverty line between 1987 and 1997.
Formulate the null and alternative hypotheses for testing this statement.
Solution
Let $p_1$ = proportion of the population living below the poverty line in 1987.
$p_2$ = proportion of the population living below the poverty line in 1997.
The statement is: $p_1 = p_2$
The negation is: $p_1 \neq p_2$
Taking all the parameters to the same side of the inequalities gives:
The statement is: $p_1 - p_2 = 0$
The negation is: $p_1 - p_2 \neq 0$
$H_0: p_1 - p_2 = 0$
$H_A: p_1 - p_2 \neq 0$
Example 10.13
A larger proportion of women than men passed the examination.
Formulate the null and alternative hypotheses for testing this statement.
Solution
Let $p_1$ = the proportion of women who passed the examination
$p_2$ = the proportion of men who passed the examination
The statement is: $p_1 > p_2$ or $p_1 - p_2 > 0$
The negation is: $p_1 \le p_2$ or $p_1 - p_2 \le 0$
$H_0: p_1 - p_2 \le 0$
$H_A: p_1 - p_2 > 0$
All the above hypotheses compare the proportions in two populations that have a particular characteristic (lives
below the poverty line, is faulty, is unemployed etc.) Each element in each population has only two possible
values:
Yes: the element has the characteristic or
No: the element does not have the characteristic.
The statement being tested is a hypothesis about the difference between the proportions in the two populations
that have the characteristic.
Quick Question 9
(a) A simple random sample is taken from each population;
(b) The two samples are independent;
(c) For each sample: $n \ge 30$, $np_0 > 5$, and $n(1 - p_0) > 5$, where $p_0$ is the common population proportion under the equality in $H_0$.
The test is robust with regard to condition (c) for samples of a reasonable size (say $n > 20$) provided $np_0 > 2$, and $n(1 - p_0) > 2$ for each sample. The test is fragile with regard to conditions (a) and (b).
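Condition (c) can be checked mechanically once the sample counts are known. The sketch below is illustrative only: the function name `normal_approx_ok` is hypothetical, and it substitutes the combined sample proportion for the unknown $p_0$, which is an assumption of this sketch. The counts used are those of a pair of samples with 60 of 200 and 120 of 300 having the characteristic.

```python
def normal_approx_ok(x1, n1, x2, n2, min_n=30, min_count=5):
    """Check condition (c) for both samples, using the combined
    sample proportion as a stand-in for the unknown p_0."""
    p_c = (x1 + x2) / (n1 + n2)  # combined proportion from both samples
    ok = all(
        n >= min_n and n * p_c > min_count and n * (1 - p_c) > min_count
        for n in (n1, n2)
    )
    return p_c, ok

p_c, ok = normal_approx_ok(60, 200, 120, 300)
print(f"combined proportion = {p_c:.2f}, condition (c) satisfied: {ok}")
```

Passing `min_n=20, min_count=2` gives the looser robustness check described above.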
The six step procedure is given overleaf.
Test of $H_0: p_1 - p_2 = 0$
$H_0: p_1 - p_2 = 0$
$H_A: p_1 - p_2 \neq 0$ (significance level $\alpha$)
\[ z_{obs} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c (1 - \hat{p}_c) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \]
where $x_1$ = number in the sample from population 1 with the characteristic
$n_1$ = size of the sample from population 1
$\hat{p}_1 = \frac{x_1}{n_1}$ = proportion in the sample from population 1 with the characteristic
$x_2$ = number in the sample from population 2 with the characteristic
$n_2$ = size of the sample from population 2
$\hat{p}_2 = \frac{x_2}{n_2}$ = proportion in the sample from population 2 with the characteristic
$\hat{p}_c = \frac{x_1 + x_2}{n_1 + n_2}$ = proportion in the two samples combined with the characteristic
Reject $H_0$ if $z_{obs} < -z_{\alpha/2}$ or $z_{obs} > +z_{\alpha/2}$, where $z_{\alpha/2}$ is from the inverse normal tables.
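The procedure above can be sketched as a short function. This is a minimal illustration using Python's standard library, not a definitive implementation: the function name `two_proportion_z_test` is hypothetical, `statistics.NormalDist` stands in for the inverse normal tables, and the counts in the usage lines are made-up numbers chosen only so that the arithmetic is easy to follow.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2, alpha=0.05):
    """Two-tailed test of H0: p1 - p2 = 0 using the pooled proportion."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_c = (x1 + x2) / (n1 + n2)              # combined sample proportion
    std_error = sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
    z_obs = (p1_hat - p2_hat) / std_error
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # replaces the table lookup
    reject = z_obs < -z_crit or z_obs > z_crit
    return z_obs, z_crit, reject

# Hypothetical counts (not from the notes): 45 of 150 versus 30 of 150
z_obs, z_crit, reject = two_proportion_z_test(45, 150, 30, 150)
print(f"z_obs = {z_obs:.4f}, critical value = {z_crit:.4f}, reject H0: {reject}")
```

With these illustrative counts the sample proportions are 0.30 and 0.20 and the combined proportion is 0.25, so the standard error is 0.05.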
Comments
1. This test is based on the binomial distribution. Each member of each population either has the characteristic or does not have the characteristic. The number in the sample with the characteristic has a binomial distribution with a number of trials $n$ and a probability of success of $p_0$.
Under the null hypothesis, the numbers in the two samples have independent binomial distributions with the same unknown value of $p_0$. In the test, the two samples are combined to give a joint estimate $\hat{p}_c$ of the common population proportion $p_0$.
2. The third condition for the test is the condition for using the normal approximation to these two binomials under the null hypothesis. This condition cannot be tested directly as the population proportion is unknown. However, the value of $\hat{p}_c$ can be used as an estimate of $p_0$ for the purposes of checking this condition.
Use this test only if the binomial is an appropriate model for the number of successes in the two samples.
3. The above test is a test for the equality of the two population proportions. There is a similar (but less used) test for testing for a non-zero difference between the two population proportions. The hypotheses here are
$H_0: p_1 - p_2 = \delta_0$
$H_A: p_1 - p_2 \neq \delta_0$ (significance level $\alpha$)
where $\delta_0$ is a hypothesised difference between the two population proportions. The test statistic then becomes
\[ z_{obs} = \frac{(\hat{p}_1 - \hat{p}_2) - \delta_0}{\sqrt{\frac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}}} \]
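The difference from the equality test is that the two sample proportions are no longer pooled in the standard error. A minimal sketch follows; the function name and the hypothesised difference of $-0.05$ are assumptions made purely for illustration and do not come from these notes.

```python
from math import sqrt

def z_for_hypothesised_difference(x1, n1, x2, n2, diff_0):
    """Test statistic for H0: p1 - p2 = diff_0 (unpooled standard error)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Each sample contributes its own variance term; nothing is pooled
    std_error = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return ((p1_hat - p2_hat) - diff_0) / std_error

# Illustrative inputs: 60 of 200 and 120 of 300, hypothesised difference -0.05
z_obs = z_for_hypothesised_difference(60, 200, 120, 300, diff_0=-0.05)
print(f"z_obs = {z_obs:.4f}")
```

The resulting value is compared with $\pm z_{\alpha/2}$ in the same way as for the equality test.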
Example 10.14
In a simple random sample of 200 children in 1990, it was found that 60 of these children were suffering from
malnutrition. A simple random sample of 300 children in 1995 included 120 children suffering from malnutrition.
Has there been a change in the proportion of children suffering from malnutrition between 1990 and 1995? Use a 5% test. What is the p-value for this test? Would the null be rejected at the 1% level?
Solution
Let $p_1$ = the proportion of children who were malnourished in 1990
$p_2$ = the proportion of children who were malnourished in 1995
If there is no difference between 1990 and 1995, then $p_1 = p_2$.
When formulating hypotheses, it is convenient to have all the parameters on the same side of the inequalities and so the hypotheses to be tested are:
$H_0: p_1 - p_2 = 0$
$H_A: p_1 - p_2 \neq 0$ : $\alpha = 0.05$
\[ z_{obs} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c (1 - \hat{p}_c) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \]
As $H_A$ is a $\neq$ hypothesis, this is a two-tailed test at the 5% level. From the inverse normal tables, the rejection region is the shaded region in Figure 10.11.
[Figure 10.11: standard normal curve for $H_A: p_1 - p_2 \neq 0$, $\alpha = 0.05$; the rejection regions are the two shaded tails, each of area $\alpha/2 = 0.025$, below $Z = -1.9600$ and above $Z = +1.9600$.]
Reject $H_0$ if $z_{obs} < -1.9600$ or $z_{obs} > +1.9600$
Do not reject $H_0$ if $-1.9600 \le z_{obs} \le +1.9600$
$n_1 = 200$, $x_1 = 60$ $\therefore \hat{p}_1 = \frac{60}{200} = 0.30$
$n_2 = 300$, $x_2 = 120$ $\therefore \hat{p}_2 = \frac{120}{300} = 0.40$
$\therefore \hat{p}_c = \frac{60 + 120}{200 + 300} = 0.36$
\[ \therefore z_{obs} = \frac{0.3 - 0.4}{\sqrt{0.36 (1 - 0.36) \left( \frac{1}{200} + \frac{1}{300} \right)}} = -2.2822 \]
Reject $H_0$ in favour of $H_A$ at the 5% level.
There is sufficient evidence in the sample data to conclude at the 5% level that there has been a change in the proportion of children suffering from malnutrition between 1990 and 1995.
Finding the pvalue
This is a two-tailed test and so the pvalue is the area shown in Figure 10.12.
[Figure 10.12: standard normal curve for $H_A: p_1 - p_2 \neq 0$; the p-value is split between two shaded tails, each of area p-value/2, with the lower tail lying below $z_{obs} = -2.2822$.]
From the normal tables the area below a z-score of $-2.28$ is 0.0113. Thus each of the two tails has an area of 0.0113.
$\therefore$ p-value $= 2 \times 0.0113 = 0.0226$
The p-value is 0.0226.
For a 1% test $\alpha = 0.01$. As the p-value $> \alpha$, the null hypothesis is not rejected at the 1% level.
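The arithmetic of Example 10.14 can be reproduced in a few lines. This is a sketch using Python's standard library in place of the printed tables; the variable names are illustrative. Because the code uses the unrounded $z_{obs}$ rather than $-2.28$, its p-value comes out slightly below the table-based 0.0226.

```python
from math import sqrt
from statistics import NormalDist

# Sample data from Example 10.14
n1, x1 = 200, 60     # 1990 sample
n2, x2 = 300, 120    # 1995 sample

p1_hat = x1 / n1
p2_hat = x2 / n2
p_c = (x1 + x2) / (n1 + n2)          # combined sample proportion

z_obs = (p1_hat - p2_hat) / sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))

std_normal = NormalDist()            # mean 0, standard deviation 1
z_crit = std_normal.inv_cdf(1 - 0.05 / 2)     # two-tailed 5% critical value
p_value = 2 * std_normal.cdf(-abs(z_obs))     # two-tailed p-value

print(f"z_obs = {z_obs:.4f}, critical value = {z_crit:.4f}")
print(f"p-value = {p_value:.4f}")
print("reject at 5%:", p_value < 0.05, "| reject at 1%:", p_value < 0.01)
```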
Quick Question 10
10.9 Summary
In this unit you learned six hypothesis tests. Many more tests are used in accounting, economics and finance, but they can all be carried out using the six step procedure. In the next unit you will learn the most important test in economics: the test on the slope of a regression line.
UNIT 11
SIMPLE LINEAR
REGRESSION
11.0 Contents
11.2 Introduction
11.3 An example of a linear model
11.4 Estimating the variance of the errors
11.5 The assumptions of the model
11.6 Examining the residuals
11.7 The sampling distribution of $\hat{\beta}_1$
11.8 Testing the slope of the regression line
11.9 Forecasting the value of the dependent variable
11.10 A test on Spearman's rank correlation coefficient
11.11 Unit summary
In Unit 4, you learned to measure the strength of monotonic and linear relationships and to estimate simple linear
regressions. These values are estimated from sample data and, as you know, the reliability of sample estimates
has to be investigated. This is the subject of this unit. Before studying this unit re-read Unit 4.
After you have completed this unit, you should be able to:
Define the simple linear regression model;
Interpret the graph of the residuals plotted against the predicted values;
Calculate the standard error of the estimate;
Test hypotheses on the slope of the regression line;
Use the estimated regression line to make point and interval estimates of the mean value of the dependent
variable for a given value of the independent variable;
Use the estimated regression line to make point and interval forecasts of the value of the dependent
variable for a given value of the independent variable;
Test for a monotonic relationship between two variables.
11.2 Introduction
In Unit 4 you learned how to use least squares to estimate the equation of the linear relationship between two
variables. The linear equation between the variables X and Y estimated from the sample is written as:
\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \]
The main interest here is in the linear relationship between the two variables in the population. The linear relationship in the population is written as:
\[ y = \beta_0 + \beta_1 x \]
As the estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ have been found from sample data, the fundamental problem of statistics arises: How reliable are the sample values $\hat{\beta}_0$ and $\hat{\beta}_1$ as estimates of the population values $\beta_0$ and $\beta_1$?
Before the reliability of the sample estimates can be assessed, the population values that are being estimated must be specified. In this unit you will learn what is meant by the population relationship between the economic variables $X$ and $Y$. We then look at the question of data collection. As you know, with all sample estimates, the reliability of the estimates is very dependent on the way in which the sample data is collected. The assumptions about data collection made in the linear model are described in detail.
In the later sections of this unit, you will learn to test hypotheses on the slope, $\beta_1$, of the population line. The hypothesis test on $\beta_1$ is the most widely used test in economic statistics. It is used to test the validity of economic theories. It is essential that you understand this test.
One of the most important applications of regression analysis is forecasting. Governments use regression analysis to make forecasts of the level of investment, the level of consumption and the level of imports in the economy. Businesses use regression to forecast sales. You will learn to make point and interval forecasts of economic variables.
The correlation coefficient, $r$, measures the strength of the linear relationship between two variables in the sample, but the interest is in the strength of the linear relationship between two variables in the population. In this unit you will learn to use sample data to test whether there is a linear relationship between two variables in the population.
In Unit 4 you learned to calculate Spearman's rank correlation coefficient, $r_S$, as a measure of the strength of the monotonic relationship between two variables. The value of $r_S$ is always calculated from sample data but the interest is in the strength of the monotonic relationship between the variables in the population. In this unit you will learn how to use sample data to test whether there is a monotonic relationship between two variables in the population.
11.3 An example of a linear model
In Unit 4 you learned to use least squares to estimate the equation of the linear relationship between two variables.
In practice these equations are always estimated from sample data and the reliability of using the sample line as
an estimator of the population line has to be considered. Before this reliability can be measured we must first
specify what we mean by the population line. The population line used in linear regression is described in this
section. In later sections the reliability of sample estimates of the population line is examined.
In this section a simple food expenditure model is used to illustrate the linear model. At the end of the section the
assumptions of the linear model are stated.
First consider the relationship between a household's monthly expenditure on food and a household's monthly income. Income is one of the most important factors determining food expenditure but other factors are also important:
the number of people in the household;
the age of the people in the household;
the other financial commitments of the household; etc.
If we only know the income of a household we cannot expect to exactly predict the household's food expenditure.
The linear equation for the relationship between food expenditure and income in the population is:
\[ y = \beta_0 + \beta_1 x + \varepsilon \]
where
$Y$ = the monthly food expenditure of the household ($000).
$X$ = the monthly income of the household ($000).
$\varepsilon$ = the error term, representing the effect of factors other than income on food expenditure ($000).
How is this equation to be interpreted?
We can divide the population of households into groups according to their monthly income. Suppose that all the households in the first group have income $x_1$, all the households in the second group have income $x_2$, etc. Consider the households in the group with $X = x$. All these households have exactly the same income, $x$.
Although all the households in this group have the same income, they will not all have the same food expenditure: some households will spend more on food than others. The food expenditures of the households could be displayed in a histogram, and in the limit as a smooth curve.
Figure 11.1 shows the distribution of food expenditure for all the households in the population with income $x$.
Figure 11.1: Distribution of Food Expenditure For Households with Income x
[Figure: density curve of $Y$ for the group with $X = x$, centred at $E[Y \mid X = x]$ with $SD = \sqrt{V[Y \mid X = x]}$; households A and B are marked on the horizontal axis.]
The food expenditure of all the households with a monthly income of x is measured along the horizontal axis.
Different households have different food expenditures. Household A spends more on food than household B and
both spend more than the average for their income level. Remember that all the households in this group have
exactly the same monthly income but their food expenditures differ due to the other factors that influence food
expenditures. The average food expenditure of all the households with income of x is denoted by E[Y | X = x].
The variance of the food expenditures of all households with an income of x is denoted by V [Y | X = x].
A distribution, similar to Figure 11.1 could be drawn for each income. The simple linear regression model makes
the following assumptions about these distributions.
1. Linearity: For each value of X, the mean value of Y lies on a straight line.
If the average food expenditure is calculated for each income group, then with this assumption, these average
food expenditures will lie exactly on a straight line. This line is given by:
\[ E[Y \mid X = x] = \beta_0 + \beta_1 x \]
The graph of this relationship is shown in Figure 11.2.
Figure 11.2: Average Food Expenditure For Each Income Group
[Figure: Food Expenditure ($Y$) plotted against Income ($X$); the group means $E[Y \mid X = x_1]$, $E[Y \mid X = x_2]$ and $E[Y \mid X = x]$ lie on the line $y = \beta_0 + \beta_1 x$.]
2. Homogeneous variance: For each value of X, the values of Y have the same variance.
For this example, the assumption is that the variance of the food expenditures is the same for each income
level. The two assumptions listed above give the mean and standard deviation of Y for each value of X. A
third assumption completes the specification of the distribution of Y for each value of X.
3. Normality: For each value of X, the values of Y are normally distributed.
The population regression line gives the average value of Y for each value of X. This is the population line
that we seek to estimate when we use least squares.
Combining these three assumptions, the food expenditures of all the households with an income of $x$ are assumed to be normally distributed with a mean of $\beta_0 + \beta_1 x$ and a variance of $\sigma^2$. This can be written as
\[ (Y \mid X = x) \sim N(\beta_0 + \beta_1 x, \sigma^2) \quad \text{for all } x \]
Example 11.1
The relationship between household monthly food expenditure and monthly income in the population is
\[ (Y \mid X = x) \sim N(0.4 + 0.2x, \; 0.3^2) \]
where $Y$ = household food expenditure per month ($000)
$X$ = household income per month ($000)
What are the food expenditures of households with an income of $4,000 per month?
Solution
We have
\[ E[Y \mid X = 4] = 0.4 + 0.2 \times 4 = 1.2 \]
Households with an income of $4,000 per month spend an average of $1,200 per month on food.
Some households spend more than $1,200 per month and some less than $1,200. There is a spread of food expenditures. The spread is measured by the standard deviation.
\[ \sigma_{(Y \mid X = 4)} = \sqrt{V[Y \mid X = 4]} = 0.3 \]
The standard deviation of the food expenditures for households with an income of $4,000 per month is $300.
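Because the model specifies a full normal distribution for each income group, it can be used for more than the mean and standard deviation. The sketch below assumes Python's standard library; the helper name `food_distribution` and the $1,500 threshold are illustrative choices that do not appear in the example.

```python
from statistics import NormalDist

# Population parameters from Example 11.1 (all amounts in $000)
BETA_0, BETA_1, SIGMA = 0.4, 0.2, 0.3

def food_distribution(income: float) -> NormalDist:
    """Distribution of food expenditure for households with the given income."""
    return NormalDist(mu=BETA_0 + BETA_1 * income, sigma=SIGMA)

dist = food_distribution(4)
print(f"mean = {dist.mean:.1f}, standard deviation = {dist.stdev:.1f}")

# Illustrative extra step: share of these households spending over $1,500
share_above = 1 - dist.cdf(1.5)
print(f"P(Y > 1.5 | X = 4) = {share_above:.4f}")
```

A threshold of 1.5 is exactly one standard deviation above the mean of 1.2, so the share is the familiar upper-tail area beyond $Z = 1$.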
Now look at the $\varepsilon$ term in the population equation
\[ y = \beta_0 + \beta_1 x + \varepsilon \]
\[ \therefore \varepsilon = y - (\beta_0 + \beta_1 x) = y - E[Y \mid X = x] \]
This term is called the error. The error is the deviation of the observed value of Y from the population regression
line. Suppose that the observed values of X and Y were as shown in Figure 11.3 below.
Figure 11.3: The Errors and the Population Regression Line
[Figure: observed points $(x_1, y_1)$, $(x_2, y_2)$ and $(x, y)$ plotted around the population line $y = \beta_0 + \beta_1 x$; the vertical distance of each point from the line is its error, with $\varepsilon_1$ and $\varepsilon_2$ marked.]
The $\varepsilon$ term measures the deviation of a value of $Y$ from the population regression line. If a value of $Y$ is above average for the given value of $X$, it is above the regression line and the error is positive (see $X = x_2$). If a value of $Y$ is below average for the given value of $X$, the error is negative (see $X = x_1$).
Example 11.2
The relationship between household monthly food expenditure and monthly income in the population is
\[ (Y \mid X = x) \sim N(0.4 + 0.2x, \; 0.3^2) \]
where $Y$ = household food expenditure per month ($000)
$X$ = household income per month ($000)
Find the value of the error term for:
(a) a household with a monthly income of $4,000 and a monthly food expenditure of $1,300.
(b) a household with a monthly income of $10,000 and a monthly food expenditure of $2,000.
Solution
(a) The average monthly food expenditure of all households with a monthly income of $4,000 is
\[ E[Y \mid X = 4] = 0.4 + 0.2 \times 4 = 1.2 \]
On average, households with an income of $4,000 per month spend $1,200 per month on food. A household with an income of $4,000 that spends $1,300 on food has:
\[ \varepsilon = 1.3 - 1.2 = 0.1 \]
This household spends $100 per month on food more than the average household with income $4,000 per month.
(b) The average food expenditure of all the households in the population with a monthly income of $10,000 is
\[ E[Y \mid X = 10] = 0.4 + 0.2 \times 10 = 2.4 \]
On average, households with an income of $10,000 per month spend $2,400 per month on food. A household with an income of $10,000 per month that spends $2,000 on food has:
\[ \varepsilon = 2.0 - 2.4 = -0.4 \]
This household spends $400 per month on food less than the average household with income $10,000 per month.
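The two calculations in Example 11.2 follow the same pattern, which can be written as a single helper. This is a minimal sketch; the function name `error_term` is an illustrative choice, and the default parameter values are the population values stated in the example.

```python
def error_term(income, food, beta_0=0.4, beta_1=0.2):
    """Deviation of observed food expenditure from the population line ($000)."""
    expected_food = beta_0 + beta_1 * income   # E[Y | X = income]
    return food - expected_food

eps_a = error_term(income=4, food=1.3)    # part (a)
eps_b = error_term(income=10, food=2.0)   # part (b)

print(f"(a) error = {eps_a:+.1f}  (above the population line)")
print(f"(b) error = {eps_b:+.1f}  (below the population line)")
```

The sign of the result shows on which side of the population regression line the household lies.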
The relationship between $Y$ and $\varepsilon$ for a given value of $X$ is shown in Figure 11.4 below. The $\varepsilon$ represents the deviation of $Y$ from its mean value.
Figure 11.4: The Distributions of $Y$ and $\varepsilon$ for $X = x$
[Figure: two density curves for the group with $X = x$. The distribution of $Y$ is centred at $E[Y \mid X = x]$ with $SD = \sigma$; the distribution of the error $\varepsilon$ is centred at 0 with $SD = \sigma$.]
Notice that the $\varepsilon$ has a mean of 0 and the same spread as the values of $Y$. Thus
\[ E[\varepsilon \mid X = x] = 0 \]
\[ V[\varepsilon \mid X = x] = \sigma^2 \]
As $(Y \mid X = x) \sim N(\beta_0 + \beta_1 x, \sigma^2)$
it follows that $(\varepsilon \mid X = x) \sim N(0, \sigma^2)$
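The step from the distribution of $Y$ to the distribution of $\varepsilon$ can be spelled out as follows. Within the group $X = x$ the quantity $\beta_0 + \beta_1 x$ is a constant, so subtracting it shifts the mean but leaves the variance unchanged:

```latex
\begin{align*}
E[\varepsilon \mid X = x]
  &= E[\,Y - (\beta_0 + \beta_1 x) \mid X = x\,]
   = E[Y \mid X = x] - (\beta_0 + \beta_1 x)
   = 0 \\
V[\varepsilon \mid X = x]
  &= V[\,Y - (\beta_0 + \beta_1 x) \mid X = x\,]
   = V[Y \mid X = x]
   = \sigma^2
\end{align*}
```

Subtracting a constant from a normally distributed variable gives another normally distributed variable, which is why $\varepsilon$ is normal whenever $Y$ is.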
The linear model with dependent variable $Y$ and independent variable $X$ is defined below:
The Linear Model
\[ y = \beta_0 + \beta_1 x + \varepsilon \]
with $E[Y \mid X = x] = \beta_0 + \beta_1 x$
$V[Y \mid X = x] = \sigma^2$
$(Y \mid X = x)$ is normally distributed
Combining these statements we have
\[ (Y \mid X = x) \sim N(\beta_0 + \beta_1 x, \sigma^2) \]
The linear model now has three parameters to be estimated:
$\beta_0$: the average value of $Y$ when $X = 0$.
$\beta_1$: the increase in the average value of $Y$ when $X$ increases by 1 unit.
$\sigma^2$: the variance of $Y$ for each value of $X$.
Quick Question 1
11.4 Estimating the variance of the errors
The model is
\[ y = \beta_0 + \beta_1 x + \varepsilon \]
with $E[Y \mid X = x] = \beta_0 + \beta_1 x$
$V[Y \mid X = x] = \sigma^2$
$(Y \mid X = x)$ is normally distributed
The model has three parameters $\beta_0$, $\beta_1$ and $\sigma^2$. In Unit 4 you learned to use least squares to estimate $\beta_0$ and $\beta_1$ from sample data. Now you will learn to estimate $\sigma^2$ from sample data.
As you know, $\sigma^2$ is the variance of $Y$ for each value of $X$. It is also the variance of the errors. The errors are unknown as they are the deviations from the unknown population regression line. We only know the equation of the sample regression line. The deviations of the observations from the sample regression line are called the residuals and the residuals are used as estimates of the errors. The variance of the residuals, $s_\varepsilon^2$, is then used as an estimator of the variance of the errors $\sigma^2$.
\[ s_\varepsilon^2 = \frac{\sum (e_i - \bar{e})^2}{n - 2} \]
Notice that the variance of the residuals has a denominator of $(n - 2)$ and not the more usual $(n - 1)$. This change is necessary to make the variance of the residuals an unbiased estimator of the variance of the errors.
With least squares, the sum of the residuals is always zero, and so the mean of the residuals is also zero. The above formula can, therefore, be simplified to
\[ s_\varepsilon^2 = \frac{\sum e_i^2}{n - 2} = \frac{SSE}{n - 2} \]
where SSE is the sum of squares due to error defined in Section 4.10.
$s_\varepsilon$ is called the standard error of estimate.
The standard error of estimate can be calculated by hand but a better method is to use the Excel regression output.
The estimated variance of the errors, $s_\varepsilon^2$, is calculated in the Residuals row of the ANOVA table. The value of $s_\varepsilon^2$ is in the MS column of this row.
The standard error of estimate, $s_\varepsilon$, is given in the Regression Statistics.
Example 11.3
The monthly incomes and food expenditures of a simple random sample of 5 households are as shown in Table 11.1 below.
Table 11.1
Household    Income ($000)    Food expenditure ($000)
1            1                0.5
2            3                1.1
3            4                0.8
4            7                1.0
5            10               1.6
The least squares regression line for these data is
\[ \widehat{\text{Food}} = 0.5 + 0.1 \, \text{Income} \]
Estimate the standard deviation of the food expenditures of all households with an income of $3,000 per month.
Solution
In the linear model the standard deviation of $Y$ is the same for each value of $X$ and is calculated by:
\[ s_\varepsilon = \sqrt{\frac{\sum e_i^2}{n - 2}} \]
Table 11.2: Calculating the Standard Error of Estimate
$x$     $y$     $\hat{y} = 0.5 + 0.1x$    $e = y - \hat{y}$    $e^2$
1       0.5     0.6                       -0.1                 0.01
3       1.1     0.8                       +0.3                 0.09
4       0.8     0.9                       -0.1                 0.01
7       1.0     1.2                       -0.2                 0.04
10      1.6     1.5                       +0.1                 0.01
sum                                        0.0                 0.16
\[ \therefore s_\varepsilon^2 = \frac{\sum e_i^2}{n - 2} = \frac{0.16}{5 - 2} = 0.05\dot{3} \]
\[ \therefore s_\varepsilon = 0.2309 \]
The estimated standard deviation of the food expenditures of all households with an income of $3,000 per month is $231.
A better method is to use the Excel regression output below:
R Square        0.757575758
Observations    5
ANOVA
              df    SS      MS             F        Significance F
Regression    1     0.5     0.5            9.375    0.054912524
Residual      3     0.16    0.053333333
Total         4     0.66
From the Residual row of the ANOVA table (coloured blue) we have:
\[ s_\varepsilon^2 = \frac{SSE}{n - 2} = \frac{0.16}{3} = 0.053333333 \]
\[ \therefore s_\varepsilon = \sqrt{0.053333333} = 0.2309 \]
Or from the Standard Error row of the Regression Statistics table (coloured red):
\[ s_\varepsilon = SEE = 0.230940108 \]
The estimated standard deviation of the food expenditures of all households with an income of $3,000 per month is $231.
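The hand calculation and the ANOVA figures in Example 11.3 can both be reproduced from the five data points. The following is a sketch in plain Python rather than Excel; the variable names are illustrative, and the F statistic is computed on the assumption that the regression has 1 degree of freedom, as in the ANOVA table above.

```python
from math import sqrt

# Data from Table 11.1 (both variables in $000)
income = [1, 3, 4, 7, 10]
food = [0.5, 1.1, 0.8, 1.0, 1.6]
n = len(income)

x_bar = sum(income) / n
y_bar = sum(food) / n

# Least squares slope and intercept
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, food))
s_xx = sum((x - x_bar) ** 2 for x in income)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Residuals and the sums of squares shown in the ANOVA table
residuals = [y - (b0 + b1 * x) for x, y in zip(income, food)]
sse = sum(e ** 2 for e in residuals)          # Residual SS
sst = sum((y - y_bar) ** 2 for y in food)     # Total SS
ssr = sst - sse                               # Regression SS

mse = sse / (n - 2)        # estimated variance of the errors
see = sqrt(mse)            # standard error of estimate
r_square = ssr / sst
f_stat = ssr / mse         # regression df = 1, so MS regression = SSR

print(f"line: food = {b0:.1f} + {b1:.1f} * income")
print(f"SSE = {sse:.2f}, MS residual = {mse:.6f}, SEE = {see:.4f}")
print(f"R Square = {r_square:.6f}, F = {f_stat:.3f}")
```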
Quick Question 2
11.5 The assumptions of the model
An important determinant of the reliability of the estimates of $\beta_0$, $\beta_1$ and $\sigma$ is the way that the sample points are selected. (Compare the importance of random sampling in the estimation and testing of the population mean.)
In the simple linear regression model, the following procedure is assumed to have been used to select the data
points:
1. The population is divided into groups so that each observation in the group has the same value of X.
2. Some of these groups are chosen. Any method, random or non-random, can be used to choose these groups.
3. From each chosen group one or more individuals is randomly selected and the value of Y is recorded.
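The three steps above can be sketched as a small simulation. This is an illustration only: the function name `sample_households` is hypothetical, the population parameters are borrowed from Example 11.1 as a stand-in for an unknown population, and drawing from a normal distribution stands in for randomly selecting a household from an income group.

```python
import random

def sample_households(chosen_incomes, per_group,
                      beta_0=0.4, beta_1=0.2, sigma=0.3, seed=0):
    """Simulate the assumed data-collection procedure.

    chosen_incomes: the income groups picked in step 2 (by any method)
    per_group:      how many households are randomly drawn from each group
    """
    rng = random.Random(seed)   # fixed seed so the sketch is reproducible
    data = []
    for x in chosen_incomes:
        for _ in range(per_group):
            # Step 3: a random household from the group with X = x
            y = rng.gauss(beta_0 + beta_1 * x, sigma)
            data.append((x, y))
    return data

points = sample_households([1, 3, 4, 7, 10], per_group=2)

# Errors are distances from the population line, not from a fitted line
errors = [y - (0.4 + 0.2 * x) for x, y in points]
for (x, y), eps in zip(points, errors):
    print(f"x = {x:>2}, y = {y:.3f}, error = {eps:+.3f}")
```

Only the Y values are random here; the X values are fixed by the choice of groups, which is the point of the procedure.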
The data could then look like the points shown in Figure 11.5. (Remember that the errors are the distances of the
points from the population line.)
Figure 11.5: The Data Points For the Simple Linear Regression Model
[Figure: observed points at $x_1, x_2, \ldots, x_i, \ldots, x_n$ scattered around the population line $y = \beta_0 + \beta_1 x$, with the errors $\varepsilon_1$ and $\varepsilon_i$ marked as the vertical distances of $y_1$ and $y_i$ from the line.]
From the way that the values of Y are selected, each Y value is selected independently of the other Y values. As
the Y values are selected independently, the values of the errors \(\epsilon\) are also independent.
Gathering together the various assumptions that have been stated above gives the full simple linear regression
model.
The Simple Linear Regression Model
\[ y = \beta_0 + \beta_1 x + \epsilon \]
where
1. \(E[Y \mid X = x] = \beta_0 + \beta_1 x\) for all x
2. \(V[Y \mid X = x] = \sigma^2\) for all x
3. Y values are independently distributed
4. Y is normally distributed for all x
11.6 Examining the residuals
The relationship between the values of Y and the errors was discussed in section 11.3. The specification of the
linear regression model given in section 11.5 can be rewritten in terms of the errors, as:
The Simple Linear Regression Model
\[ y = \beta_0 + \beta_1 x + \epsilon \]
where
1. \(E[\epsilon \mid X = x] = 0\) for all x
2. \(V[\epsilon \mid X = x] = \sigma^2\) for all x
3. \(\epsilon\) values are independently distributed
4. \(\epsilon\) is normally distributed for all x
Before accepting the results of an estimated regression or carrying out tests on the parameters of the model the
appropriateness of the above model should be investigated.
The errors are the deviations of the observed values from the population line. In Figure 11.6, for the observed
points \(I_1\) and \(I_2\), the errors are \(\epsilon_1\) and \(\epsilon_2\).
Figure 11.6: The Errors and the Population Regression Line
[Figure: axes X and Y with the population line \(y = \beta_0 + \beta_1 x\); the observed points \(I_1 = (x_1, y_1)\) and \(I_2 = (x_2, y_2)\) are shown together with \(E[Y \mid X = x_1]\) and \(E[Y \mid X = x_2]\) on the line; the errors \(\epsilon_1\) and \(\epsilon_2\) are the vertical distances between each point and the line.]
As you know, the position of the population line is not known and so the errors can never be calculated. In Unit
4 you learned to calculate the least squares line and the residuals. The errors are the deviations of the data points
from the population regression line. The residuals are the deviations of the data points from the sample regression
line. The residuals should have approximately the same properties as the errors.
The following checks should always be carried out before calculating confidence intervals or testing hypotheses
in the simple linear regression model:
1. Plot the data points and examine the plot to see if there appears to be a linear relationship between the variables.
Figure 11.7: A Linear Model Data Plot
[Figure: data points on axes X and Y scattered closely about the fitted line \(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\).]
Figure 11.8: A Non-linear Model Data Plot
[Figure: data points on axes X and Y following a curved pattern, with the fitted line \(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\) drawn through them.]
In Figure 11.7 the data points show a linear relationship between the two variables. This data is suitable for
further analysis as a linear model. In Figure 11.8 the data is clearly not linear and so a linear regression should
not be estimated. The model should be reformulated as a non-linear model.
2. Plot the residuals against the predicted values.
From the assumptions of the linear regression model:
From Assumption 3, the residuals should be (approximately) independent of each other. Thus
there should be no pattern of any kind in the plot of the residuals against the predicted values.
From Assumption 2, the residuals should have (approximately) the same variance for each value of X.
The spread of the residuals should not change as X changes. Thus the spread of the residuals should not
change as the predicted value changes.
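Figure 11.9: Randomly Scattered Residuals
[Figure: residuals \(e\) plotted against predicted values \(\hat{y}\), scattered with no pattern about the zero line.]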
The residuals in Figure 11.9 seem randomly distributed. This supports the use of a linear regression model.
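Figure 11.10: Residuals Showing Autocorrelation
[Figure: residuals \(e\) plotted against predicted values \(\hat{y}\), forming a U shape about the zero line.]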
In Figure 11.10 there is a clear U shape to the residuals. If the errors are independently distributed, the points
should be randomly scattered around this diagram with no clear shape to the distribution. In Figure 11.10 the
errors are clearly not independent. When the errors are not independent they are said to exhibit autocorrelation
and this violates Assumption 3.
Autocorrelation of the residuals may be due to the mistaken estimation of a non-linear relationship. For example,
if a linear regression was estimated for the data illustrated in Figure 11.8, the residuals would show a
clear pattern. Another common reason for observing autocorrelation is the presence of an important variable
affecting the dependent variable that has not been accounted for in the model.
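Figure 11.11: Residuals Showing Heteroskedasticity
[Figure: residuals \(e\) plotted against predicted values \(\hat{y}\); the spread of the residuals about the zero line widens as \(\hat{y}\) increases.]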
In Figure 11.11 there is a clear pattern to the residuals. As the predicted value \(\hat{y}\) increases so does the variance
of the residuals. This suggests that the variance of the errors is not constant, which violates Assumption 2
of the simple linear regression model. When the variance of the errors is constant, the errors are said to be
homoskedastic. When the variance of the errors is not constant, the errors are said to be heteroskedastic.
Heteroskedastic errors can arise quite naturally in economics. Consider the example of Section 11.3. For
households with a low income, most of this income has to be spent on food and so the spread in food expenditure
for low income households will be small. As household income increases, there is more choice available
to the households. Some households will choose to spend more on food while others may save more or spend
their income on other goods. Thus for high incomes there will be a wide spread in food expenditure and hence
a large variance for the error terms.
Where the errors are found to be heteroskedastic or to exhibit autocorrelation, the tests and confidence intervals
you will learn later in this unit are not valid. The model should be reformulated before carrying out any tests.
The examination of the plot of the residuals against the predicted values is an essential part of any linear
model analysis.
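The idea behind the spread check in step 2 can be illustrated with simulated data. The following is only a sketch under invented assumptions: the population line, the 400 X values and the rule that the error standard deviation is proportional to x are all hypothetical and are not taken from this unit. It fits the least squares line and compares the standard deviation of the residuals for the lower and upper halves of the predicted values, a numerical stand-in for looking at the plot.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population line; the error SD is assumed to grow with x
BETA0, BETA1 = 0.5, 0.1
x = [0.025 * i for i in range(1, 401)]  # 0.025, 0.050, ..., 10.0
y = [BETA0 + BETA1 * xi + random.gauss(0, 0.1 * xi) for xi in x]

# Least squares line
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = ss_xy / ss_x
b0 = y_bar - b1 * x_bar

predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# Order the residuals by predicted value and compare the spread in each half
ordered = [e for _, e in sorted(zip(predicted, residuals))]
half = n // 2
low_sd = statistics.stdev(ordered[:half])
high_sd = statistics.stdev(ordered[half:])
print(f"SD of residuals, low predicted values:  {low_sd:.3f}")
print(f"SD of residuals, high predicted values: {high_sd:.3f}")
```

A markedly larger spread for the high predicted values is the numerical counterpart of the fan shape in Figure 11.11. The residual plot itself remains the check described in this section.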
Example 11.4
The monthly incomes (x) and food expenditures (y) of a simple random sample of 5 households are shown below.
Household    x     y
1            1     0.5
2            3     1.1
3            4     0.8
4            7     1.0
5            10    1.6
The least squares regression line for these data is:
\[ \hat{y} = 0.5 + 0.1x \]
Plot the residuals against the predicted values.
Table 11.4: Calculating the Predicted Values and the Residuals
x     y     \(\hat{y} = 0.5 + 0.1x\)    \(e = y - \hat{y}\)
1     0.5   0.6                         -0.1
3     1.1   0.8                         +0.3
4     0.8   0.9                         -0.1
7     1.0   1.2                         -0.2
10    1.6   1.5                         +0.1
Figure 11.12: Plot of Residuals against Predicted Values
[Figure: the five residuals \(e\) (vertical axis, from -0.4 to 0.4) plotted against the predicted values \(\hat{y}\) (horizontal axis, from 0.2 to 1.6).]
There are too few points here to draw conclusions on the validity of the assumptions of the linear model.
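The calculations in Table 11.4 can be checked with a few lines of code. This is a minimal sketch using only the five data points of Example 11.4; the function name `least_squares` is a label chosen here for illustration.

```python
# Data from Example 11.4
x = [1, 3, 4, 7, 10]
y = [0.5, 1.1, 0.8, 1.0, 1.6]


def least_squares(x, y):
    """Return the intercept and slope of the least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(x, y)
predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# Print the columns of Table 11.4: x, y, predicted value, residual
for xi, yi, pi, ei in zip(x, y, predicted, residuals):
    print(f"{xi:>3} {yi:>5.1f} {pi:>5.1f} {ei:>+5.1f}")
```

The residuals of a least squares line with an intercept always sum to zero (up to rounding), which is a quick check on hand calculations.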
Quick Question 3 Quick Question 4
11.7 The sampling distribution of \(\hat{\beta}_1\)
As you know, the simple linear regression model is:
\[ y = \beta_0 + \beta_1 x + \epsilon \quad \text{where } \epsilon \sim NID(0, \sigma^2) \]
(NID is shorthand for Normally and Independently Distributed).
To estimate the simple linear regression model, the population is divided into groups with all the members of each
group having the same value of X. Some of these groups are chosen and then one or more members are randomly
selected from each group and the values of X and Y recorded. The observed values of X and Y are then used to
estimate the sample regression line:
\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \]
If this experiment is repeated a large number of times with the same groups chosen, then each time the experiment
is run, the same X values will be recorded but the Y values will change. From each run of the experiment,
different \(\hat{\beta}_0\) and \(\hat{\beta}_1\) values will be calculated, i.e. different sample estimates of \(\beta_0\) and \(\beta_1\) will be obtained. If the
experiment is repeated a large number of times, the values of \(\hat{\beta}_1\) obtained from the different regressions could
be graphed in a histogram, which in the limit would become a smooth curve. This is the sampling distribution
of \(\hat{\beta}_1\). If all the assumptions of the simple linear regression model hold, then it can be shown that the sampling
distribution of \(\hat{\beta}_1\) is:
\[ \hat{\beta}_1 \sim N\left(\beta_1, \frac{\sigma^2}{SS_x}\right) \]
\[ \text{where } SS_x = \sum x^2 - \frac{\left(\sum x\right)^2}{n} = (n - 1)s_x^2 \]
(\(s_x^2\) is the variance of the sample values of X)
Figure 11.13: Distribution of the Slopes of the Sample Regression Lines
[Figure: normal curve for \(\hat{\beta}_1\), centred on \(\beta_1\), with \(SD = \sigma / \sqrt{SS_x}\).]
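The repeated experiment described above can be imitated by simulation. The sketch below assumes hypothetical population values (\(\beta_0 = 0.5\), \(\beta_1 = 0.1\), \(\sigma = 0.25\)) and reuses the X values of Example 11.4; none of these population values come from the unit. It keeps the X values fixed, draws new Y values on each run, and compares the mean and standard deviation of the simulated slopes with \(\beta_1\) and \(\sigma/\sqrt{SS_x}\).

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population values, chosen only for illustration
BETA0, BETA1, SIGMA = 0.5, 0.1, 0.25
x = [1, 3, 4, 7, 10]  # the same X values are recorded on every run
n = len(x)
x_bar = sum(x) / n
ss_x = sum((xi - x_bar) ** 2 for xi in x)


def slope_from_one_run():
    """Draw new Y values for the fixed X values and return the fitted slope."""
    y = [BETA0 + BETA1 * xi + random.gauss(0, SIGMA) for xi in x]
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return ss_xy / ss_x


slopes = [slope_from_one_run() for _ in range(20000)]
mean_slope = statistics.mean(slopes)
sd_slope = statistics.stdev(slopes)
theoretical_sd = SIGMA / ss_x ** 0.5

print(f"mean of simulated slopes: {mean_slope:.4f}  (beta_1 = {BETA1})")
print(f"SD of simulated slopes:   {sd_slope:.4f}  (sigma / sqrt(SS_x) = {theoretical_sd:.4f})")
```

A histogram of `slopes` would approximate the curve in Figure 11.13.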
Notice that \(E[\hat{\beta}_1] = \beta_1\). The sample statistic \(\hat{\beta}_1\) is an unbiased estimator of \(\beta_1\). The standard error of \(\hat{\beta}_1\), denoted
by \(SE(\hat{\beta}_1)\), is the estimated standard deviation of \(\hat{\beta}_1\). Using the standard deviation of the residuals as an estimate
of the standard deviation of the errors gives:
\[ SE(\hat{\beta}_1) = \frac{s}{\sqrt{SS_x}} \]
The standard error of \(\hat{\beta}_1\) gives a measure of the reliability of using \(\hat{\beta}_1\) as an estimate of \(\beta_1\). If the standard error of
\(\hat{\beta}_1\) is small then the spread in the estimated values of \(\hat{\beta}_1\) around \(\beta_1\) from different regressions is small. The value
of \(\hat{\beta}_1\) is nearly always close to \(\beta_1\). There is a high probability that \(\hat{\beta}_1\) is close to \(\beta_1\) and so \(\hat{\beta}_1\) is a reliable estimate
of \(\beta_1\). If the standard error of \(\hat{\beta}_1\) is large then \(\hat{\beta}_1\) is often very different from \(\beta_1\) and so does not give a reliable
estimate of \(\beta_1\).
Similarly the values of \(\hat{\beta}_0\) have a sampling distribution. The estimated standard deviation of the sampling
distribution of \(\hat{\beta}_0\) is called the standard error of \(\hat{\beta}_0\). We will not cover the hand calculations of the standard error of
\(\hat{\beta}_0\) in this unit.
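One convention for presenting the estimates from a simple linear regression is to write them in the form:
\[ \hat{y} = \underset{(SE(\hat{\beta}_0))}{\hat{\beta}_0} + \underset{(SE(\hat{\beta}_1))}{\hat{\beta}_1}\,x \qquad R^2 = \ldots \qquad SEE = \ldots \]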
Unfortunately, this convention is not universally followed. Some economists put t-scores and not the standard
errors in parentheses below the estimates of \(\hat{\beta}_0\) and \(\hat{\beta}_1\). Read the published results of regressions carefully!
In Excel the values of the standard errors of \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are given in the column next to the values of \(\hat{\beta}_0\) and \(\hat{\beta}_1\).
Example 11.5
The monthly incomes and food expenditures of a simple random sample of 5 households were recorded and used
to estimate the food expenditure regression equation
\[ \text{Food} = \beta_0 + \beta_1\,\text{Income} + \epsilon \]
where Food = monthly food expenditure ($)
Income = monthly income ($)
The Excel regression output is shown on the following page. Write out the estimated equation in the standard
form.
SUMMARY OUTPUT
R Square        0.757575758
Observations    5

ANOVA
             df    SS      MS
Residual     3     0.16    0.053333333
Total        4     0.66

             Coefficients    Standard Error    t Stat
Intercept    0.5             0.193218357       2.587745848
Income       0.1             0.032659863       3.061862178
Solution
In the output above the coefficient of determination and standard error of the estimate are coloured
green, the values of \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are coloured red and their standard errors are blue. The estimated equation is:
\[ \widehat{\text{Food}} = \underset{(0.19)}{0.5} + \underset{(0.03)}{0.1}\,\text{Income} \qquad R^2 = 0.7576 \qquad SEE = 0.2309 \]
As the standard error of \(\hat{\beta}_1\) is small compared to its value, the value of \(\hat{\beta}_1\) is a reliable estimate of \(\beta_1\).
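The figures in the regression output can be reproduced from the five data points used in Example 11.4, which give the same fitted line. This is a sketch only: the formula used for \(SE(\hat{\beta}_0)\), \(s\sqrt{1/n + \bar{x}^2/SS_x}\), is the standard textbook expression and is an addition here, since this unit does not cover its hand calculation.

```python
from math import sqrt

# Data from Example 11.4 (the same sample gives the output of Example 11.5)
x = [1, 3, 4, 7, 10]
y = [0.5, 1.1, 0.8, 1.0, 1.6]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_total = sum((yi - y_bar) ** 2 for yi in y)

b1 = ss_xy / ss_x
b0 = y_bar - b1 * x_bar

# Residual sum of squares and the standard error of the estimate (SEE)
ss_residual = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
see = sqrt(ss_residual / (n - 2))

r_squared = 1 - ss_residual / ss_total
se_b1 = see / sqrt(ss_x)
se_b0 = see * sqrt(1 / n + x_bar ** 2 / ss_x)  # standard formula, not derived in this unit

print(f"R Square = {r_squared:.4f}, SEE = {see:.4f}")
print(f"Intercept {b0:.1f} (SE {se_b0:.4f}), slope {b1:.1f} (SE {se_b1:.4f})")
```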
Quick Question 5
11.8 Testing the slope of a regression line
In this section you will learn to test hypotheses on the slope of a population regression line. Examples of the type
of hypothesis to be tested are:
the marginal cost is $1.00 per unit;
the marginal propensity to consume food is less than 0.2;
Australia's marginal propensity to import is 0.3;
when the rate of interest increases, the level of investment falls.
Example 11.6
The marginal cost of production is $1.00. Formulate the null and alternative hypotheses for testing this statement.
Solution
If the cost function is linear, then:
\[ c = \beta_0 + \beta_1 x + \epsilon \]
where C = the total cost incurred ($)
X = the number of units produced
When x increases by 1 unit, c increases by \(\beta_1\) and so \(\beta_1\) is the marginal cost.
The statement is: \(\beta_1 = 1.00\)
The negation is: \(\beta_1 \neq 1.00\)
\(H_0: \beta_1 = 1.00\)
\(H_A: \beta_1 \neq 1.00\)
Example 11.7
The marginal propensity to consume food is less than 0.20. Formulate the null and alternative hypotheses for
testing this statement.
Solution
A linear food consumption function is:
\[ \text{Food} = \beta_0 + \beta_1\,\text{Income} + \epsilon \]
where Food = the monthly food expenditure ($)
Income = the monthly income ($)
When income increases by 1 unit, food consumption increases by \(\beta_1\) and so \(\beta_1\) is the marginal propensity to
consume food.
The statement is: \(\beta_1 < 0.20\)
The negation is: \(\beta_1 \ge 0.20\)
\(H_0: \beta_1 \ge 0.20\)
\(H_A: \beta_1 < 0.20\)
Quick Question 6
1. \(E[Y \mid X = x] = \beta_0 + \beta_1 x\) for all x
2. \(V[Y \mid X = x] = \sigma^2\) for all x
3. Y values are independently distributed
4. Y is normally distributed for all x
The test is robust with regard to condition 4 for reasonable size samples (say \(n \ge 10\)) but fragile with regard to
the other conditions.
Notice that the test is fragile with regard to the assumptions of homoskedasticity and independence of the
observations. The test can be very unreliable when there is extreme heteroskedasticity and/or autocorrelation of
the errors. This makes it extremely important that the residual plot discussed in section 11.6 is examined before
carrying out the test.
The test is carried out with the usual six step procedure.
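The six step procedure for the t-test of \(H_0: \beta_1 = \beta_1^0\)
\(H_0: \beta_1 = \beta_1^0\)
\(H_A: \beta_1 \neq \beta_1^0\), with the level of the test \(\alpha\) specified
\[ t_{obs} = \frac{\hat{\beta}_1 - \beta_1^0}{SE(\hat{\beta}_1)} \]
where \(\beta_1^0\) = hypothesised population slope
\(\hat{\beta}_1\) = the slope of the least squares regression line
\(SE(\hat{\beta}_1)\) = the estimated standard deviation of \(\hat{\beta}_1\).
Reject \(H_0\) if \(t_{obs} < -t_{\alpha/2}\) or \(t_{obs} > +t_{\alpha/2}\)
where \(t_{\alpha/2}\) is taken from the t-tables with \(\nu = n - 2\) degrees of freedom.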
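The test statistic and the decision rule can be sketched as a small function. This is an illustration only: the function name `slope_t_test` and its arguments are hypothetical, the critical value is assumed to be read from t-tables by the user, and the figures in the usage lines are the slope and standard error from the Excel output of Example 11.5 with the tabulated values \(t_{0.05} = 2.353\) and \(t_{0.005} = 5.841\) for 3 degrees of freedom.

```python
def slope_t_test(b1_hat, se_b1, beta1_null, t_crit, tail="two"):
    """Return (t_obs, reject) for a t-test on the slope.

    t_crit is the positive critical value read from t-tables with n - 2
    degrees of freedom; tail is "two", "lower" or "upper".
    """
    t_obs = (b1_hat - beta1_null) / se_b1
    if tail == "two":
        reject = abs(t_obs) > t_crit
    elif tail == "lower":
        reject = t_obs < -t_crit
    elif tail == "upper":
        reject = t_obs > t_crit
    else:
        raise ValueError("tail must be 'two', 'lower' or 'upper'")
    return t_obs, reject


# Lower-tailed test of H_A: beta_1 < 0.20 at the 5% level (3 degrees of freedom)
t_lower, reject_lower = slope_t_test(0.1, 0.032659863, 0.20, 2.353, tail="lower")
print(f"t_obs = {t_lower:.3f}, reject H0: {reject_lower}")

# Two-tailed test of H_A: beta_1 != 0 at the 1% level (3 degrees of freedom)
t_two, reject_two = slope_t_test(0.1, 0.032659863, 0.0, 5.841, tail="two")
print(f"t_obs = {t_two:.3f}, reject H0: {reject_two}")
```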
Comments
As you know, the population regression equation is:
\[ y = \beta_0 + \beta_1 x + \epsilon \]
There are three important special cases of the above t test on \(\beta_1\).
1. Is there a linear relationship between the variables X and Y ?
Here the test is:
\(H_0: \beta_1 = 0\)
\(H_A: \beta_1 \neq 0\), with the level of the test \(\alpha\) specified
(a) If \(H_0\) is rejected, then \(\beta_1 \neq 0\) and so when X changes, the value of Y changes. There is a linear relationship
between the variables.
(b) If \(H_0\) is not rejected, then it is possible that \(\beta_1 = 0\). If \(\beta_1 = 0\) then
\[ y = \beta_0 + 0 \times x + \epsilon = \beta_0 + \epsilon \]
and so when X changes there is no effect on the value of Y. Thus when \(H_0\) is not rejected, there may be
no linear relationship between the two variables.
2. Is there a positive linear relationship between the variables X and Y ?
Here the test is:
\(H_0: \beta_1 \le 0\)
\(H_A: \beta_1 > 0\), with the level of the test \(\alpha\) specified
(a) If \(H_0\) is rejected, then \(\beta_1 > 0\) and so when X increases, the value of Y increases. There is a positive linear
relationship between the variables.
(b) If \(H_0\) is not rejected, then it is possible that there is not a positive linear relationship: there could be no
linear relationship (\(\beta_1 = 0\)) or a negative relationship (\(\beta_1 < 0\)).
3. Is there a negative linear relationship between the variables X and Y ?
Here the test is:
\(H_0: \beta_1 \ge 0\)
\(H_A: \beta_1 < 0\), with the level of the test \(\alpha\) specified
(a) If \(H_0\) is rejected, then \(\beta_1 < 0\) and so when X increases, the value of Y decreases. There is a negative
linear relationship between the variables.
(b) If \(H_0\) is not rejected, then it is possible that there is not a negative linear relationship: there could be no
linear relationship (\(\beta_1 = 0\)) or a positive linear relationship (\(\beta_1 > 0\)).
NOTE VERY, VERY CAREFULLY
If \(H_0\) is rejected in the above tests, then the conclusion is that there is a linear relationship between the economic
variables X and Y. This does not show that changes in X have caused changes in Y. It could be:
1. Changes in Y are causing changes in X.
2. Changes in some other variable are causing both X and Y to change. The most common example of this arises
with variables that change over time. If any two variables that grow over time are used as the dependent and
independent variables in a regression, then they will appear to be related. Each year both variables increase,
and so when one variable increases, the other variable also increases.
For example, over the last ten years the number of UC accounting students and the number of crimes committed
in the Canberra region have both increased. A regression
\[ y = \beta_0 + \beta_1 x + \epsilon \]
where Y is the number of crimes and X is the number of UC accounting students shows a strong positive
relationship between the two variables. The hypothesis \(\beta_1 \le 0\) is rejected in favour of \(\beta_1 > 0\). This does not
show that the increase in the number of UC accounting students has caused the increase in crime! It does not
mean that the crime problem can be solved by reducing the number of accounting students! All the test has
shown is that the variables have increased together; there is no causal relationship between these variables.
When a relationship between two variables is due to the influence of some other variable, there is said to be a
spurious correlation between the two variables.
The results of tests in regression need to be interpreted with care. They can only show that the variables have
changed together. These tests cannot show causation.
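The point about variables that grow over time can be illustrated with invented data. In the sketch below the two series (labelled `students` and `crimes` after the example above) are entirely hypothetical: each is generated independently as a time trend plus random noise, so neither can cause the other, yet the regression of one on the other shows a strong positive relationship.

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

years = list(range(10))
# Two hypothetical series, generated independently; each simply grows over time
students = [200 + 15 * t + random.gauss(0, 5) for t in years]
crimes = [1000 + 40 * t + random.gauss(0, 10) for t in years]

# Regress crimes (Y) on students (X)
n = len(years)
x_bar = sum(students) / n
y_bar = sum(crimes) / n
ss_x = sum((xi - x_bar) ** 2 for xi in students)
ss_y = sum((yi - y_bar) ** 2 for yi in crimes)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(students, crimes))

b1 = ss_xy / ss_x
r_squared = ss_xy ** 2 / (ss_x * ss_y)
print(f"slope = {b1:.2f}, R squared = {r_squared:.3f}")
```

The high value of R squared here reflects only the shared time trend, which is the "other variable" driving both series.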
The t test is a test for a linear relationship. Where the relationship between two variables is believed to be
non-linear, Spearman's Rank Correlation Test could be more appropriate.
Example 11.8
The monthly incomes and food expenditures of a simple random sample of 5 households were recorded and used
to estimate the food expenditure regression equation
\[ \text{Food} = \beta_0 + \beta_1\,\text{Income} + \epsilon \]
where Food = monthly food expenditure ($)
Income = monthly income ($)
The Excel regression output is shown on the following page.
(a) Is the marginal propensity to consume food out of income less than 0.20? Use a 5% test.
(b) Is there a linear relationship between income and food expenditure? Use a 1% test.
SUMMARY OUTPUT
R Square        0.757575758
Observations    5

ANOVA
             df    SS      MS
Residual     3     0.16    0.053333333
Total        4     0.66

             Coefficients    Standard Error    t Stat         P-value
Intercept    0.5             0.193218357       2.587745848    0.081233
Income       0.1             0.032659863       3.061862178    0.054913
Solution
A linear food consumption function is:
\[ \text{Food} = \beta_0 + \beta_1\,\text{Income} + \epsilon \]
where Food = the monthly food expenditure ($)
Income = the monthly income ($)
When income increases by 1 unit, food consumption increases by \(\beta_1\) and so \(\beta_1\) is the marginal propensity to
consume food out of income.
(a) Steps 1 and 2 : state the hypotheses and specify the level of the test
The statement is: \(\beta_1 < 0.20\)
The negation is: \(\beta_1 \ge 0.20\)
\(H_0: \beta_1 \ge 0.20\)
\(H_A: \beta_1 < 0.20\), \(\alpha = 0.05\)
\[ t_{obs} = \frac{\hat{\beta}_1 - \beta_1^0}{SE(\hat{\beta}_1)} \]
As \(H_A\) is a < hypothesis, this is a lower-tailed test at the 5% level. From the t-tables with \(\nu = n - 2 = 3\)
the rejection region is the shaded region in Figure 11.14.
Figure 11.14: The Rejection Region for t-test of \(H_A: \beta_1 < 0.20\), \(\alpha = 0.05\)
[Figure: t distribution with \(\nu = 3\), centred on 0; the rejection region is the shaded lower tail of area \(\alpha = 0.05\), to the left of \(t_{0.95} = -2.353\).]
Reject \(H_0\) if \(t_{obs} < -2.353\)
From the Excel output:
\(\hat{\beta}_1 = 0.1\)
\(SE(\hat{\beta}_1) = 0.032659863\)
\[ \therefore \; t_{obs} = \frac{0.10 - 0.20}{0.032659863} = -3.062 \]
\(t_{obs}\) is in the rejection region. Reject \(H_0\) in favour of \(H_A\) at the 5% level.
We can conclude at the 5% level that the marginal propensity to consume food out of income is less
than 0.20.
(b) Steps 1 and 2 : state the hypotheses and specify the level of the test
There is a linear relationship if: \(\beta_1 \neq 0\)
The negation is: \(\beta_1 = 0\)
\(H_0: \beta_1 = 0\)
\(H_A: \beta_1 \neq 0\), \(\alpha = 0.01\)
\[ t_{obs} = \frac{\hat{\beta}_1 - \beta_1^0}{SE(\hat{\beta}_1)} \]
As \(H_A\) is a \(\neq\) hypothesis, this is a two-tailed test at the 1% level. From the t-tables with \(\nu = n - 2 = 3\)
the rejection region is the shaded region in Figure 11.15.
Figure 11.15: The Rejection Region for t-test of \(H_A: \beta_1 \neq 0\), \(\alpha = 0.01\)
[Figure: t distribution with \(\nu = 3\), centred on 0; the rejection region is the two shaded tails, each of area \(\alpha/2 = 0.005\), to the left of \(-5.841\) and to the right of \(t_{0.005} = +5.841\).]
Reject \(H_0\) if \(t_{obs} < -5.841\) or \(t_{obs} > +5.841\)
From the Excel output:
\(\hat{\beta}_1 = 0.1\)
\(SE(\hat{\beta}_1) = 0.032659863\)
\[ \therefore \; t_{obs} = \frac{0.10 - 0}{0.032659863} = 3.062 \]
\(t_{obs}\) is not in the rejection region. Do not reject \(H_0\) at the 1% level.
There is insufficient evidence to conclude at the 1% level that there is a linear relationship between
income and food expenditure.
In the above test it was not necessary to calculate \(t_{obs}\) by hand. It is given in the Excel output in the column next
to \(SE(\hat{\beta}_1)\). This has been coloured red in the above Excel output. There is no need to consult t-tables either
as the next column of the output (coloured green) gives the p-value for this test. To test
\(H_0: \beta_1 = 0\)
\(H_A: \beta_1 \neq 0\)
for any regression just compare the p-value from the Excel output with the required value of \(\alpha\).
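The p-value in the Excel output can be checked directly in this example because the t distribution with exactly 3 degrees of freedom has a simple closed-form cumulative distribution function, \(F(t) = \tfrac{1}{2} + \tfrac{1}{\pi}\left[\tfrac{u}{1 + u^2} + \arctan u\right]\) with \(u = t/\sqrt{3}\). This formula is a standard result and is not part of this unit; the sketch below applies only to \(\nu = 3\), and the function name `t3_cdf` is a label chosen here.

```python
from math import atan, pi, sqrt


def t3_cdf(t):
    """CDF of Student's t distribution with exactly 3 degrees of freedom."""
    u = t / sqrt(3)
    return 0.5 + (u / (1 + u * u) + atan(u)) / pi


# Slope and standard error from the Excel output of Example 11.8
t_obs = (0.1 - 0) / 0.032659863

# Two-tailed p-value for H0: beta_1 = 0 against HA: beta_1 != 0
p_value = 2 * (1 - t3_cdf(abs(t_obs)))
print(f"t_obs = {t_obs:.3f}, two-tailed p-value = {p_value:.4f}")

# Decision at the 1% level: reject H0 only if the p-value is below alpha
alpha = 0.01
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

The result agrees with the P-value column of the output for Income, and since it exceeds 0.01 the decision matches part (b).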
Quick Question 7
11.9 Forecasting the value of the dependent variable
One of the main reasons for estimating a regression is to predict the value of Y when X is known. However,
remember that when X is fixed, Y has a distribution of possible values. For the simple linear regression model,
the possible values of Y for a given value of X are as shown in Figure 11.16.
Figure 11.16: The Distribution of Y For a Fixed Value of X
[Figure: normal curve for Y, centred on \(E[Y \mid X = x]\), with \(SD = \sigma\).]
There are two estimates that can be made from the regression line:
1. An estimate of the average value of Y when X = x i.e. an estimate of E[Y | X = x].
2. An estimate of the value of Y that will be observed when X = x. This value is more difficult to estimate as
the observed value of Y could be above or below the average value depending on the (unknown) value of \(\epsilon\).
1. Estimating the average value of Y when X = x
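Point estimate
\[ \widehat{E[Y \mid X = x]} = \hat{\beta}_0 + \hat{\beta}_1 x \]
\(100(1 - \alpha)\%\) confidence interval
\[ P\left( (\hat{\beta}_0 + \hat{\beta}_1 x) - t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{(n - 1)s_x^2}} < E[Y \mid X = x] < (\hat{\beta}_0 + \hat{\beta}_1 x) + t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{(n - 1)s_x^2}} \right) = 1 - \alpha \]
where \(t_{\alpha/2}\) is taken from the t-tables with \(\nu = n - 2\) degrees of freedom.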
2. Predicting Y when X = x
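Point prediction
\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \]
\(100(1 - \alpha)\%\) prediction interval
\[ P\left( (\hat{\beta}_0 + \hat{\beta}_1 x) - t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{(n - 1)s_x^2}} < (Y \mid X = x) < (\hat{\beta}_0 + \hat{\beta}_1 x) + t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{(n - 1)s_x^2}} \right) = 1 - \alpha \]
where \(t_{\alpha/2}\) is taken from the t-tables with \(\nu = n - 2\) degrees of freedom.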
Example 11.9
A simple random sample of 5 households was taken and each household's monthly income and monthly food
expenditure recorded. These data are displayed in Table 11.7 below.
Household    Income ($'000)    Food ($'000)
1            1                 0.5
2            3                 1.1
3            4                 0.8
4            7                 1.0
5            10                1.6
The Excel regression output for these data is given on the following page.
(a) A household has a monthly income of $7000. Give a point and 95% interval estimate for the monthly food
expenditure of this household.
(b) Give a point and 95% interval estimate for the average food expenditure of all households with an income of
$7000 per month.
SUMMARY OUTPUT
R Square        0.757575758
Observations    5

ANOVA
             df    SS      MS
Residual     3     0.16    0.053333333
Total        4     0.66

             Coefficients    Standard Error    t Stat         P-value
Intercept    0.5             0.193218357       2.587745848    0.081233
Income       0.1             0.032659863       3.061862178    0.054913
Solution
From the Excel output the estimated equation is:
\[ \widehat{\text{Food}} = \underset{(0.19)}{0.5} + \underset{(0.03)}{0.1}\,\text{Income} \qquad R^2 = 0.7576 \qquad SEE = 0.2309 \]
Entering the income data into a calculator in SD mode gives:
\(n = 5\), \(\bar{x} = 5\), \(s_x^2 = 12.5\)
From the t-tables with \(\nu = 3\) degrees of freedom, \(t_{0.025} = 3.182\).
(a) A point and 95% interval estimate for Y when X = 7.
The estimated equation is
\[ \widehat{\text{Food}} = 0.5 + 0.1\,\text{Income} \]
The point estimate for Food when Income = 7 is
\[ \widehat{\text{Food}} = 0.5 + 0.1 \times 7 = 1.2 \]
The estimated monthly food expenditure of a household with monthly income of $7000 is $1200.
The 95% prediction interval for Income = 7 is
\[ P\left( 1.2 - 3.182 \times 0.2309 \sqrt{1 + \frac{1}{5} + \frac{(7 - 5)^2}{4 \times 12.5}} < \text{Food} < 1.2 + 3.182 \times 0.2309 \sqrt{1 + \frac{1}{5} + \frac{(7 - 5)^2}{4 \times 12.5}} \right) = 0.95 \]
\[ P(1.2 - 0.83 < \text{Food} < 1.2 + 0.83) = 0.95 \]
\[ P(0.3 < \text{Food} < 2.1) = 0.95 \]
The monthly food expenditure of a household with a monthly income of $7000 will lie between $300 and $2100
with probability 0.95.
(b) A point and 95% interval estimate for E[Y | X = 7]
The estimated equation is
Food = 0.5 + 0.1Income

The point estimate for the mean food expenditure when Income = 7 is
E[

Food | Income = 7] = 0.5 + 0.1 7 = 1.2
The estimated mean monthly food expenditure of all households with monthly income of $7000 is $1200.
The 95% confidence interval for E[Food | Income = 7] is
\[ P\left( 1.2 - 3.182 \times 0.2309 \sqrt{\frac{1}{5} + \frac{(7-5)^2}{4 \times 12.5}} < E[Food \mid Income = 7] < 1.2 + 3.182 \times 0.2309 \sqrt{\frac{1}{5} + \frac{(7-5)^2}{4 \times 12.5}} \right) = 0.95 \]
\[ P(1.2 - 0.39 < E[Food \mid Income = 7] < 1.2 + 0.39) = 0.95 \]
\[ P(0.8 < E[Food \mid Income = 7] < 1.6) = 0.95 \]
The average food expenditure of all households with an income of $7000 per month lies between $800 and $1600
with probability 0.95.
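The arithmetic in parts (a) and (b) can be checked with a short script. This is a minimal sketch that reuses the figures quoted in the example (the coefficients, SEE, the calculator results and the tabulated t value); the variable names are illustrative only. It prints the limits to two decimal places, so they differ slightly from the one-decimal limits quoted in the text.

```python
import math

# Values taken from the worked example above.
b0, b1 = 0.5, 0.1            # estimated intercept and slope
see = 0.2309                 # standard error of estimate from the Excel output
n, x_bar, s2_x = 5, 5.0, 12.5
t_crit = 3.182               # t(0.025) with 3 degrees of freedom, from the t tables
x0 = 7.0                     # income of $7000, measured in $000

y_hat = b0 + b1 * x0         # point estimate, the same for parts (a) and (b)

# Term shared by both intervals: 1/n + (x0 - x_bar)^2 / ((n - 1) * s_x^2)
core = 1.0 / n + (x0 - x_bar) ** 2 / ((n - 1) * s2_x)

half_pred = t_crit * see * math.sqrt(1.0 + core)  # (a) a single household
half_mean = t_crit * see * math.sqrt(core)        # (b) the mean of all households

print(f"point estimate: {y_hat:.2f}")
print(f"95% prediction interval: {y_hat - half_pred:.2f} to {y_hat + half_pred:.2f}")
print(f"95% confidence interval: {y_hat - half_mean:.2f} to {y_hat + half_mean:.2f}")
```

The only difference between the two half-widths is the extra 1 under the square root, which is why the interval for a single household is always wider than the interval for the mean.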
The assumption of linearity used to make predictions may be reasonable over small ranges of X but a poor
approximation over larger ranges. For example, consider the economist's idea of a cost function.
Figure 11.17: The Total Cost Curve Used in Economics
[Figure: total cost curve with Cost on the vertical axis and Output on the horizontal axis; output levels A and B are marked on the horizontal axis.]
The cost function can be reasonably approximated by a linear function for output in the range (A,B). However if
we use this linear approximation to predict costs for outputs less than A or more than B we will obtain very poor
estimates.
Thus, in general, do not try to use regression to predict Y or estimate E[Y | X] for values of X that are far
outside the range of the observations used to estimate the regression.
Quick Question 8
11.10 A test on Spearman's rank correlation coefficient
In this section you will learn to test hypotheses on the existence of a monotonic relationship between two
variables. Examples of the type of hypothesis to be tested are:
high prices are associated with low quantities demanded;
high prices are associated with high quantities supplied;
high levels of education are associated with high wages;
high interest rates are associated with high exchange rates for the Australian dollar.
All of these statements assert that there is a monotonic relationship between two variables. The first statement
asserts that there is a negative monotonic relationship between the price of a good and the quantity demanded, i.e.
that the demand curve slopes downwards from left to right. The third statement asserts that there is a positive
monotonic relationship between a person's level of education and the wages received, i.e. the higher the level of
education the higher the wages received.
You will probably be tempted to rewrite the third statement as:
an increase in the years of education leads to an increase in wages;
and, indeed, we may informally interpret our results in this way but always remember that statistics cannot show
that changes in one variable cause changes in a second variable. Statistics cannot show causation.
If there is a positive monotonic relationship between two variables in the population then the population rank
correlation coefficient is positive, i.e. \( \rho_S > 0 \). If there is a negative monotonic relationship between two variables
in the population then the population rank correlation coefficient is negative, i.e. \( \rho_S < 0 \). If \( \rho_S = 0 \) then there is
not a monotonic relationship between the two variables.
1. A simple random sample is taken from a population of objects or individuals and the value of two variables
X and Y is recorded for each selected object or individual.
2. The variables X and Y are measured on at least an ordinal scale.
3. The variables X and Y are continuous.
The test is robust with regard to the third condition but fragile with regard to the other conditions.
The six step procedure is presented on the following page.
\( H_0: \rho_S = 0 \)
\( H_1: \rho_S \neq 0 \)
Significance level: \( \alpha \)
Test statistic:
\[ r_S = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} \]
where d = R(x) - R(y)
R(x) = the rank of the value of X in the observation
R(y) = the rank of the value of Y in the observation
n = the sample size
Reject \( H_0 \) if \( r_S < -r_{\alpha/2} \) or \( r_S > +r_{\alpha/2} \)
where \( r_{\alpha/2} \) is from the Spearman's rank correlation tables. These tables are not in the UC Statistical
Tables but are reproduced on the following page.
Table 11.9: Spearman's Rank Correlation Table
p
n 0.90 0.95 0.975 0.99 0.995 0.999
4 0.8000 0.8000 -
5 0.7000 0.8000 0.9000
6 0.6000 0.7714 0.8286 0.8857 0.9429
7 0.5357 0.6786 0.7450 0.8571 0.8829 0.9643
8 0.5000 0.6190 0.7143 0.8095 0.8571 0.9286
9 0.4667 0.5833 0.6833 0.7667 0.8167 0.9000
10 0.4424 0.5515 0.6364 0.7333 0.7818 0.8667
11 0.4182 0.5273 0.6091 0.7000 0.7545 0.8364
12 0.3986 0.4965 0.5804 0.6713 0.7273 0.8182
14 0.3626 0.4593 0.5341 0.6220 0.6747 0.7670
16 0.3382 0.4265 0.5000 0.5824 0.6324 0.7265
18 0.3148 0.3994 0.4716 0.5480 0.5975 0.6904
20 0.2977 0.3789 0.4451 0.5203 0.5684 0.6586
25 0.2646 0.3362 0.3977 0.4654 0.5100 0.5962
30 0.2400 0.3059 0.3620 0.4251 0.4665 0.5479
Comments
1. Condition 3 requires that both the X and Y variables are continuous. As you saw in Unit 7, with a continuous
random variable the probability that a variable has exactly some specified value is 0. It follows that with
a continuous random variable the probability that the same value is observed twice in a finite sample is 0.
Thus this condition ensures that there will be no tied observations. However, the test is robust with regard
to this condition and so is often used where one or both of the variables are discrete. Then tied values may
occur. Where two values are equal, each observation should be assigned the mean of the tied rank positions.
2. The value of the test statistic is a discrete variable and so the values listed in the Spearman's Rank Correlation
Table may be observed. The values in the table are in the rejection region and so if these values are observed
the null hypothesis is rejected.
3. If there are more than 30 observations use
\[ z_{obs} = r_S \sqrt{n - 1} \]
as the test statistic and then carry out a one or two tailed z test as appropriate for the alternative hypothesis.
4. There are theoretical objections to the existence of \( \rho_S \) in an infinite population where either X or Y is a
continuous random variable. If a parameter does not exist its value cannot be tested! In this subject we will
not be concerned with such niceties.
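The large-sample statistic in Comment 3 can be sketched as follows. The sample values used here (r_S = 0.30 from n = 50 observations, upper tailed test) are hypothetical and chosen only for illustration; the function name is also an assumption.

```python
import math
from statistics import NormalDist

def spearman_z(r_s: float, n: int) -> float:
    """Large-sample test statistic z_obs = r_S * sqrt(n - 1), for n > 30."""
    return r_s * math.sqrt(n - 1)

# Hypothetical illustration: r_S = 0.30 observed in a sample of n = 50.
z_obs = spearman_z(0.30, 50)

# Area to the right of z_obs under the standard normal curve (upper tailed test).
p_upper = 1.0 - NormalDist().cdf(z_obs)

print(f"z_obs = {z_obs:.2f}, upper tail area = {p_upper:.4f}")
```

For a two tailed test the tail area would be doubled before comparing it with the chosen significance level.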
Example 11.10
The monthly incomes and food expenditures of a simple random sample of 5 households are as shown in Table
11.10 below.
Household  Income ($000)  Food expenditure ($000)
1  1  0.5
2  3  1.1
3  4  0.8
4  7  1.0
5  10  1.6
Do the wealthy spend more on food than the poor? Use a 10% test.
Solution
Let \( \rho_S \) = the rank correlation between income and food expenditure in the population of all households.
If high food expenditures are associated with high incomes then there is a positive monotonic
relationship between food expenditure and income.
The statement is: \( \rho_S > 0 \)
The negation is: \( \rho_S \le 0 \)
\( H_0: \rho_S \le 0 \)
\( H_A: \rho_S > 0 \)
\( \alpha = 0.10 \)
\[ r_S = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} \]
As \( H_A \) is a > hypothesis, this is an upper tailed test at the 10% level. The critical value has an area of
0.10 to the right and so has an area of 0.90 to the left. (Remember that all our tables are structured in
terms of areas to the left.)
There are n = 5 observations.
From the Spearman's rank correlation table, row n = 5 and column 0.90, the critical value is 0.7000.
Reject \( H_0 \) if \( r_S \ge 0.7000 \)
Do not reject \( H_0 \) if \( r_S < 0.7000 \)
In Example 4.3 you saw that \( r_S = 0.7000 \).
The calculated value of \( r_S \) is (just) in the rejection region. Reject \( H_0 \) in favour of \( H_A \) at the 10% level.
Conclude that higher food expenditures are associated with higher incomes.
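The calculation of the test statistic in this example can be sketched in Python. This is an illustrative sketch, not part of the unit's Excel workflow; the helper names are assumptions. Tied values are given the mean of the tied rank positions, as described in the comments on the test, and the critical value 0.7000 is the one read from the table.

```python
def average_ranks(values):
    """Rank from smallest (1) to largest; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of observations tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1          # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rs(x, y):
    """r_S = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), with d = R(x) - R(y)."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

income = [1, 3, 4, 7, 10]            # monthly income, $000
food = [0.5, 1.1, 0.8, 1.0, 1.6]     # monthly food expenditure, $000

r_s = spearman_rs(income, food)
critical = 0.7000                    # table value for n = 5, column 0.90

# Table values are in the rejection region, so equality counts as a rejection.
# The small tolerance guards against floating point error on the boundary.
decision = "reject H0" if r_s >= critical - 1e-12 else "do not reject H0"
print(f"r_S = {r_s:.4f}: {decision}")
```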
Quick Question 9
11.11 Unit summary
Economic theory makes a number of assertions about the relationships between economic variables. Some
examples are:
when the price of a good increases, the quantity demanded falls;
when GDP increases, imports increase;
when the interest rate falls, the level of investment will increase.
In Unit 4 you learned to use sample data to estimate the relationships between economic variables. In this unit
you have learned to test whether observed economic data supports the assertions of economic theory. The test in
section 11.8 is the most widely used test of economic theory. Unfortunately this test is also the most misused test
in statistics. It is often applied in situations where the necessary conditions for the use of the test are violated.
Always graph the data and plot the residuals against the predicted values before using this test. Also, always
remember that the test can only show that two variables have changed together; it cannot show that changes in
one variable have caused changes in the other variable.
In this unit you have also learned to use the simple linear regression model to forecast the value of the dependent
variable for given values of the independent variable.
UNIT 12
TIME SERIES,
FORECASTING
AND INDEX NUMBERS
12.0 Contents
12.2 Time series
12.3 The additive and multiplicative time series models
12.4 Estimating the trend
12.5 Estimating the seasonal effects in the additive model
12.6 Forecasts and residuals with the additive model
12.7 Estimating the seasonal effects in the multiplicative model
12.8 Forecasts and residuals with the multiplicative model
12.9 Index numbers
12.10 Price indices
12.11 Constructing a Laspeyres price index
12.12 Constructing a Paasche price index
12.13 Comparing the Laspeyres and Paasche Price Indices
12.14 The Australian consumer price index
12.15 Deflating a time series
12.16 Summary
Print Workbook 12
This unit is in two parts. Sections 2 to 8 describe methods used to analyse and forecast the values of observations
taken over time. Forecasts of the future values of variables are used in all branches of economic activity.
Companies forecast sales in order to determine appropriate production levels. The government forecasts unemployment
and uses these forecasts in formulating its macroeconomic policies. In this unit some of the basic methods of
forecasting are described.
Sections 9 to 15 are more descriptive. In these sections you will learn how to use index numbers to summarise
time series price and income data. After you have completed this unit, you should be able to:
Define the components of a time series model;
Describe the multiplicative and additive models;
Estimate linear and quadratic trends;
Estimate seasonal effects in the additive and multiplicative models;
Use the additive and multiplicative models to forecast the values of economic variables;
Calculate an index series for an economic time series;
Calculate a Laspeyres and Paasche price index and explain their advantages and disadvantages;
Describe how the Australian Consumer Price Index is calculated;
Explain what is meant by a deflated time series and know how to calculate a deflated time series.
12.2 Time series
Time series: A sequence of observations made at equally spaced points of time.
Some examples of time series are:
The daily closing price of BHP on the Australian stock exchange.
The monthly figure for the percentage rate of unemployment in Australia.
The quarterly GNP of Australia.
The annual profits of the Commonwealth Bank.
The basic assumption of time series analysis is that patterns of change that have been observed in the past will
continue into the future. The patterns in the past were caused by underlying economic and social forces and
if these forces continue to operate the observed patterns should continue in the future. Forecasts are made by
identifying these past patterns and extrapolating them into the future.
If there is a change in the underlying economic and social conditions then the forecasts generated will be
inaccurate. One of the problems with a time series analysis is that no attempt is made to identify the causes of the
observed patterns and so it is difficult to decide if the underlying forces have changed.
Traditionally four components of a time series are identified:
1. a long term trend (\( T_t \));
2. a cyclical pattern around the trend (\( C_t \));
3. a seasonal component (\( S_t \)) and
4. a random component (\( R_t \)).
In a time series analysis the first three of these components are estimated and then extrapolated into the future
to give forecasts of the future values of the variable. By definition the future values of the random component
cannot be forecast.
The cyclical component referred to above is often of irregular length and difficult to estimate so in this unit we
will only consider the trend component, the seasonal component and the random component of a time series. The
first step in any time series analysis is to graph the data and examine the graph for trend and seasonal components.
Example 12.1
The sales of beer in a large club are given in Table 12.1 below.
Table 12.1: Sales of Beer 1996-2000 ($000)
Year Quarter 1 Quarter 2 Quarter 3 Quarter 4
1996 115 114 111 154
1997 158 135 146 181
1998 170 154 163 196
1999 193 165 165 201
2000 197 165 161 195
Source: Club Sales Reports 1996-2000
Examine these data to see if there is a trend and/or a seasonal component.
Solution
Figure 12.1: Sales of Beer 1996-2000 ($000)
Source: Club Annual Sales Reports 1996-2000
[Figure: quarterly beer sales plotted against Year (1996 to 2000), with Sales ($000) on the vertical axis from 50 to 250 and a trend line drawn through the observations.]
With these data there is a clear upward trend (shown as a red line in the diagram) with a possible downturn at the
end of the period.
There is a strong seasonal component with sales always being above the trend line in the first and fourth quarters
of the year and below the trend line in the second and third quarters of the year.
Quick Question 1
12.3 The additive and multiplicative time series models
A time series is a combination of
1. a long term trend (\( T_t \));
2. a seasonal component (\( S_t \)) and
3. a random component (\( R_t \)).
There are two models of how these effects interact to give the observed time series: the additive model and the
multiplicative model.
The additive model
In the additive model the time series is the sum of the three components:
\[ Y_t = T_t + S_t + R_t \qquad t = 1, 2, \ldots, T \]
If the sum of the seasonal effect and the random effect is positive the observed value will be above the
trend line.
If the sum of the seasonal effect and the random effect is zero the observed value will lie on the trend
line.
If the sum of the seasonal effect and the random effect is negative the observed value will be below the
trend line.
Over the year the seasonal effects should cancel out and so
\[ \sum_{Year} S_t = 0 \]
If the seasonal effects are constant over time the seasonal effects cause fluctuations around the trend line of the
same magnitude each year, irrespective of the size of the trend value.
Figure 12.2: The Additive Time Series Model
[Figure: \( Y_t \) plotted against t, showing the series fluctuating around the trend line.]
The multiplicative model
In the multiplicative model the time series is the product of the three components:
\[ Y_t = T_t \times S_t \times R_t \qquad t = 1, 2, \ldots, T \]
If the product of the seasonal effect and the random effect is greater than 1 the observed value will be
above the trend line.
If the product of the seasonal effect and the random effect is 1 the observed value will lie on the trend
line.
If the product of the seasonal effect and the random effect is less than 1 the observed value will be below
the trend line.
Over the year the seasonal effects should cancel out and so
\[ \prod_{Year} S_t = 1 \]
If the seasonal effects are constant over time then each seasonal effect is a constant proportion of the trend value.
The seasonal fluctuations increase in magnitude as the trend value increases.
Figure 12.3: The Multiplicative Time Series Model
[Figure: \( Y_t \) plotted against t, showing the series fluctuating around the trend line.]
To decide which model to use, plot the data. Always plot the data before analysing a time series.
1. If the size of the seasonal fluctuations is constant for all values of the trend line use the additive model.
2. If the size of the seasonal fluctuations increases as the trend line rises use the multiplicative model.
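The difference between the two models can be seen in a small numerical sketch. All of the numbers below are hypothetical and chosen only for illustration, and the random component is left out so that the seasonal pattern is easy to see.

```python
# Hypothetical values: the trend in an early year and in a later year.
trend_values = [100.0, 200.0]

# Hypothetical quarterly seasonal effects, held constant over time.
additive_effects = [20.0, -10.0, -25.0, 15.0]        # added to the trend
multiplicative_effects = [1.25, 0.80, 0.80, 1.25]    # multiplied by the trend

def seasonal_swing(values):
    """Size of the seasonal fluctuation: highest quarter minus lowest quarter."""
    return max(values) - min(values)

additive_swings = []
multiplicative_swings = []
for trend in trend_values:
    additive_year = [trend + s for s in additive_effects]
    multiplicative_year = [trend * s for s in multiplicative_effects]
    additive_swings.append(seasonal_swing(additive_year))
    multiplicative_swings.append(seasonal_swing(multiplicative_year))

print("additive swings:      ", additive_swings)
print("multiplicative swings:", multiplicative_swings)
```

In the additive case the swing is the same at both trend levels, while in the multiplicative case it doubles when the trend doubles. This is the pattern to look for in a plot of the data.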
Quick Question 2
12.4 Estimating the trend
The trend line is a smooth curve drawn through the observations. Many different shapes can be used for the trend
line but in this unit we will only consider two: a linear trend line and a quadratic trend line.
A linear trend
\[ T_t = \beta_0 + \beta_1 t \]
This trend line is used when the time series fluctuates around a straight line.
Figure 12.4: A Linear Trend Line
[Figure: \( Y_t \) plotted against t, with a straight trend line drawn through the observations.]
You already know how to use Excel to estimate linear regressions! To estimate a linear trend, first number the
observations 1, 2, 3, . . . , T. Now use the regression instruction in Excel with the observed values as the dependent
variable and the observation numbers as the independent variable.
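The same procedure can be sketched outside Excel. The following Python sketch numbers the observations 1 to T and applies the usual least squares formulas for the slope and intercept; it uses the beer sales from Table 12.1 as the observed values, and the variable names are illustrative only.

```python
# Quarterly beer sales ($000) from Table 12.1, 1996 Q1 to 2000 Q4.
sales = [115, 114, 111, 154,
         158, 135, 146, 181,
         170, 154, 163, 196,
         193, 165, 165, 201,
         197, 165, 161, 195]

# Number the observations 1, 2, ..., T.
t = list(range(1, len(sales) + 1))
n = len(sales)

t_bar = sum(t) / n
y_bar = sum(sales) / n

# Least squares estimates for the linear trend T_t = b0 + b1 * t.
s_ty = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, sales))
s_tt = sum((ti - t_bar) ** 2 for ti in t)
b1 = s_ty / s_tt
b0 = y_bar - b1 * t_bar

linear_trend = [b0 + b1 * ti for ti in t]
print(f"estimated linear trend: T_t = {b0:.2f} + {b1:.2f} t")
```

The list `linear_trend` holds the trend figure for each quarter and can be plotted against the observed values to judge whether a straight line is adequate.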
A quadratic trend line
\[ T_t = \beta_0 + \beta_1 t + \beta_2 t^2 \]
This trend line is used when the time series fluctuates around a curve.
Figure 12.5: A Quadratic Trend Line
[Figure: \( Y_t \) plotted against t, with the observations following a curve.]
Here it would clearly be inappropriate to draw a straight line trend line through these points as there is a curve to
the observations. Never fit a straight line to curved data! Use a quadratic trend line here. The procedure for using
Excel to estimate a quadratic trend line is illustrated in the following example.
Example 12.2
1996 115 114 111 154
1997 158 135 146 181
1998 170 154 163 196
1999 193 165 165 201
2000 197 165 161 195
(a) Fit a trend line to these data.
(b) What is the estimated trend figure for quarters 1, 2, 3 and 4 of 2001?
(a) The data are graphed below with a straight line trend line drawn in. There appears to be a slight curve to the
data and so a quadratic trend line should be used.
Figure 12.6: Sales of Beer 1996-2000 ($000)
[Figure: quarterly beer sales plotted against Year (1996 to 2000), with Sales ($000) on the vertical axis from 50 to 250 and a straight trend line drawn in.]
To estimate a quadratic trend line enter the data into Excel as shown below (you do not need to colour the
cells):
Table 12.3: Excel Data for Estimating a Quadratic Trend
Quarter Y t tsq
1996.1 115 1 1
1996.2 114 2 4
1996.3 111 3 9
1996.4 154 4 16
...  ...  ...  ...
2000.1 197 17 289
2000.2 165 18 324
2000.3 161 19 361
2000.4 195 20 400
1. Start the regression package used in Unit 4. The Regression dialogue box appears.
2. For the Input Y Range box select the Y column (the yellow coloured cells above).
3. For the Input X Range select the t and tsq columns (the green coloured cells above).
(For a linear trend only select the t column.)
4. Remember to click the Labels box and then click the OK box.
Part of the Excel output is reproduced below:
Table 12.4: The Estimated Quadratic Trend Line
Coefficients Standard Error t Stat
Intercept 100.5868421 12.20562785 8.24102155
t 10.13421053 2.676883298 3.785824557
tsq -0.313909774 0.12381837 -2.535243949
The estimated trend line is
\[ \hat{T}_t = 100.59 + 10.13t - 0.314t^2 \]
For example in the first quarter of 1996, t = 1
\[ \therefore \hat{T}_1 = 100.59 + 10.13 \times 1 - 0.314 \times 1^2 = 110 \]
On the trend line, sales of beer in the first quarter of 1996 are $110,000. The other trend line figures can be
calculated in the same way.
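For readers who want to check the Excel coefficients, the quadratic trend can also be estimated by solving the least squares normal equations directly. The sketch below is one way to do this in plain Python; it is not the method used by Excel's regression tool, and the helper name `solve_linear_system` is an assumption made for illustration.

```python
def solve_linear_system(a, b):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

# Quarterly beer sales ($000), 1996 Q1 to 2000 Q4, as in the Y column above.
sales = [115, 114, 111, 154, 158, 135, 146, 181, 170, 154,
         163, 196, 193, 165, 165, 201, 197, 165, 161, 195]

# Each row holds the regressors 1, t and tsq for one quarter.
rows = [[1.0, float(t), float(t * t)] for t in range(1, len(sales) + 1)]

# Normal equations (X'X) b = X'y for the model T_t = b0 + b1 * t + b2 * t^2.
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * y for r, y in zip(rows, sales)) for i in range(3)]

b0, b1, b2 = solve_linear_system(xtx, xty)
print(f"Intercept {b0:.4f}, t {b1:.4f}, tsq {b2:.4f}")
```

The three estimates can be compared with the Coefficients column of Table 12.4.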
Figure 12.7: The Trend in the Sales of Beer 1996-2000 ($000)
[Figure: quarterly beer sales plotted against Year (1996 to 2000), with Sales ($000) on the vertical axis from 50 to 250 and the estimated quadratic trend line drawn in.]
(b) Continuing the numbering of the observations, quarter 1 of 2001 has t = 21, quarter 2 has t = 22 etc.
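2001.1: \[ \hat{T}_{21} = 100.59 + 10.13(21) - 0.314(21)^2 \approx 175 \]
2001.2: \[ \hat{T}_{22} = 100.59 + 10.13(22) - 0.314(22)^2 \approx 171 \]
2001.3: \[ \hat{T}_{23} = 100.59 + 10.13(23) - 0.314(23)^2 \approx 167 \]
2001.4: \[ \hat{T}_{24} = 100.59 + 10.13(24) - 0.314(24)^2 \approx 163 \]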
These trend figures are poor estimates of beer sales. In quarters 1 and 4 of each year the observed values are
always above the trend figure and in quarters 2 and 3 the observed values are always below the trend figure.
Trend figures underestimate sales in quarters 1 and 4 and overestimate sales in quarters 2 and 3.
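The trend figures quoted above can be checked with a few lines of Python. This is only a minimal sketch, assuming the rounded coefficients of the estimated trend line; the function name `trend` is illustrative and does not come from the notes.

```python
def trend(t):
    """Estimated quadratic trend (in $000), using the rounded coefficients."""
    return 100.59 + 10.13 * t - 0.314 * t ** 2

# Quarter 1 of 1996 (t = 1) and the four quarters of 2001 (t = 21, ..., 24)
for t in [1, 21, 22, 23, 24]:
    print(f"t = {t:2d}: trend figure = {trend(t):.2f}")
```

Because the coefficients are rounded, the unrounded results differ slightly from figures generated on Excel.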
Quick Question 3 Quick Question 4
12.5 Estimating the seasonal effects in the additive model
In the additive model
\[ Y_t = T_t + S_t + R_t, \quad t = 1, 2, \ldots, T \]
\[ \therefore \quad Y_t - T_t = S_t + R_t, \quad t = 1, 2, \ldots, T \]
The difference between the observed time series value ($Y_t$) and the estimated trend line ($T_t$) is the sum of the
seasonal effect ($S_t$) and a random effect ($R_t$). Random effects are sometimes positive and sometimes negative. Over
time random effects cancel out and so the average random effect is zero. This suggests the following procedure
for estimating the seasonal effects.
1. Estimate the trend line.
2. For each observation calculate the difference between the observed value and the trend line.
3. Estimate each seasonal effect by calculating the mean of all the differences for that season.
In the third step of this process it is assumed that in the averaging process the random effects will cancel out,
leaving an estimated value for the seasonal effect.
This procedure is illustrated in the following example.
Example 12.3
Year Quarter 1 Quarter 2 Quarter 3 Quarter 4
1996 115 114 111 154
1997 158 135 146 181
1998 170 154 163 196
1999 193 165 165 201
2000 197 165 161 195
The estimated trend line for these data is
\[ \hat{T}_t = 100.59 + 10.13t - 0.314t^2 \quad \text{with } t = 1 \text{ in 1996.1} \]
Calculate the seasonal effect for each of the four quarters.
Solution
In Table 12.6 overleaf the trend figures were generated by substituting t = 1, 2, . . . , 20 in the above trend line
equation. (The easiest way to calculate these figures is to click the Residuals box when estimating the trend line
on Excel. Excel then calculates the trend figures ($\hat{T}_t$) and the residuals ($Y_t - \hat{T}_t$) for each observation. Rounding
errors will cause hand calculated figures to differ slightly from those given below.)
Steps 1 and 2: Estimate the trend line and calculate the difference between the observed value and the trend line.
Table 12.6: Calculating the Sum of the Seasonal and Random Effects
Quarter t $Y_t$ $\hat{T}_t$ $Y_t - \hat{T}_t = \hat{R}_t + \hat{S}_t$
1996.1 1 115 110.41 4.59
1996.2 2 114 119.60 -5.60
1996.3 3 111 128.16 -17.16
1996.4 4 154 136.10 17.90
1997.1 5 158 143.41 14.59
1997.2 6 135 150.09 -15.09
1997.3 7 146 156.14 -10.14
1997.4 8 181 161.57 19.43
1998.1 9 170 166.37 3.63
1998.2 10 154 170.54 -16.54
1998.3 11 163 174.08 -11.08
1998.4 12 196 176.99 19.01
1999.1 13 193 179.28 13.72
1999.2 14 165 180.94 -15.94
1999.3 15 165 181.97 -16.97
1999.4 16 201 182.37 18.63
2000.1 17 197 182.15 14.85
2000.2 18 165 181.30 -16.30
2000.3 19 161 179.82 -18.82
2000.4 20 195 177.71 17.29
Step 3 : Estimate each seasonal effect by calculating the mean of all the differences for each quarter
First look at the effect of quarter 1. In the years between 1996 and 2000 the observed sales were the following
amounts above the estimated trend line.
Year 1996 1997 1998 1999 2000
$\hat{S}_t + \hat{R}_t$ 4.59 14.59 3.63 13.72 14.85
In each of the five years the sales in quarter 1 were above the trend line. On average the effect of quarter 1 was to
raise the sales above the trend line by:
\[ \text{Mean} = \frac{4.59 + 14.59 + 3.63 + 13.72 + 14.85}{5} = 10.28 \]
The estimated quarter 1 seasonal effect is $\hat{Q}_1 = +10.28$.
The seasonal effects for the other three quarters are calculated in the same way.
Year 1996 1997 1998 1999 2000 Mean
Quarter 1 4.59 14.59 3.63 13.72 14.85 10.28
Quarter 2 -5.60 -15.09 -16.54 -15.94 -16.30 -13.89
Quarter 3 -17.16 -10.14 -11.08 -16.97 -18.82 -14.83
Quarter 4 17.90 19.43 19.01 18.63 17.29 18.45
Thus in quarter 1 of each year sales are predicted to be 10.28 above the trend line, in quarter 2 of each year sales
are predicted to be 13.89 below the trend line etc.
Notice that over the year the seasonal effects on sales cancel out (apart from rounding) as:
\[ 10.28 + (-13.89) + (-14.83) + 18.45 \approx 0 \]
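The three-step procedure of this section can be sketched in Python as follows. This is an illustrative sketch rather than the method used to produce the tables: it assumes the unrounded coefficients from the Excel output in Table 12.4, and the names `sales`, `trend`, `diffs` and `effects` are invented for the example.

```python
# Quarterly beer sales ($000), 1996.1 to 2000.4, from Example 12.3
sales = [115, 114, 111, 154,
         158, 135, 146, 181,
         170, 154, 163, 196,
         193, 165, 165, 201,
         197, 165, 161, 195]

def trend(t):
    """Step 1: quadratic trend, using the unrounded coefficients of Table 12.4."""
    return 100.5868421 + 10.13421053 * t - 0.313909774 * t ** 2

# Step 2: difference between each observed value and the trend line (t = 1, ..., 20)
diffs = [y - trend(t) for t, y in enumerate(sales, start=1)]

# Step 3: mean of the differences for each quarter (every fourth observation)
effects = [sum(diffs[q::4]) / len(diffs[q::4]) for q in range(4)]

for q, s in enumerate(effects, start=1):
    print(f"Quarter {q}: {s:+.2f}")
print(f"Sum of the seasonal effects: {sum(effects):.2f}")
```

The slice `diffs[q::4]` picks out the same quarter in each year, which is what allows the random effects to average out within a season.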
In reading published statistics you will often see deseasonalised or seasonally adjusted figures displayed.
Seasonally adjusted figures: In the additive model the calculated values of $Y_t - \hat{S}_t$ are called the seasonally
adjusted (or deseasonalised) figures. They are estimates of the sum of the trend and random components of a time
series.
12.6 Forecasts and residuals with the additive model
In the additive model:
\[ Y_t = T_t + S_t + R_t, \quad t = 1, 2, \ldots, T \]
In section 12.4 you learned how to estimate the trend line $T_t$.
In section 12.5 you learned how to estimate the seasonal effects $S_t$.
The random effects $R_t$ are random and so by definition cannot be predicted.
The forecast value for any period t is therefore
\[ \hat{Y}_t = \hat{T}_t + \hat{S}_t \]
Example 12.4
The trend line for the quarterly sales of beer (in $000) at a large club is:
\[ \hat{T}_t = 100.59 + 10.13t - 0.314t^2 \quad \text{with } t = 1 \text{ in 1996.1} \]
The estimated quarterly effects are ($000)
Quarter 1 Quarter 2 Quarter 3 Quarter 4
+10.28 -13.89 -14.83 +18.45
Forecast the beer sales for each quarter of 2001.
Solution
The forecast sales are:
Table 12.7: Forecasting the Sales of Beer
Period t Trend Seasonal Effect Forecast Sales
2001.1 21 174.97 +10.28 185.25
2001.2 22 171.61 -13.89 157.72
2001.3 23 167.62 -14.83 152.79
2001.4 24 163.00 +18.45 181.45
Table 12.8: The Forecasted Beer Sales in 2001 ($)
Quarter 1 185,000
Quarter 2 158,000
Quarter 3 153,000
Quarter 4 181,000
(The above figures were generated on Excel. Due to rounding errors in the given trend line, hand calculated figures
will differ slightly from those given above.)
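The forecasts in Table 12.7 can be reproduced with a short Python sketch. It assumes the unrounded trend coefficients from Table 12.4 (which is why it agrees with the Excel figures rather than with hand calculations); the names `trend`, `seasonal` and `forecasts` are illustrative.

```python
def trend(t):
    """Quadratic trend ($000), using the unrounded coefficients of Table 12.4."""
    return 100.5868421 + 10.13421053 * t - 0.313909774 * t ** 2

# Estimated additive seasonal effects ($000) for quarters 1 to 4
seasonal = {1: 10.28, 2: -13.89, 3: -14.83, 4: 18.45}

forecasts = {}
for t in range(21, 25):            # t = 21, ..., 24 covers the four quarters of 2001
    quarter = (t - 1) % 4 + 1      # t = 21 is quarter 1, t = 22 is quarter 2, ...
    forecasts[f"2001.{quarter}"] = trend(t) + seasonal[quarter]

for period, value in forecasts.items():
    print(f"{period}: forecast sales = {value:.2f} ($000)")
```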
Two checks should be carried out before accepting forecasts generated in this way:
(a) Use the estimated trend line and seasonal effects to forecast the past values of the time series. See how
closely the forecasts agree with the observed values. If this procedure cannot accurately forecast past
values its forecasts for future values will not be reliable.
(b) For each observed value estimate the random term. We have
\[ Y_t = T_t + S_t + R_t \quad \therefore \quad R_t = Y_t - (T_t + S_t) \]
and the estimated trend line and seasonal effects can be used to estimate the random effects.
\[ \hat{R}_t = Y_t - (\hat{T}_t + \hat{S}_t) \quad \therefore \quad \hat{R}_t = Y_t - \hat{Y}_t \]
These random effects should be random. Plot the estimated random effects against t and examine for
randomness. If the residuals are not random, then the trend line used may be incorrect or the seasonal
pattern may be changing over time. Again this will make any forecasts unreliable.
Example 12.5
Table 12.9: Sales of Beer 1996-2000
Year Quarter 1 Quarter 2 Quarter 3 Quarter 4
1996 115 114 111 154
1997 158 135 146 181
1998 170 154 163 196
1999 193 165 165 201
2000 197 165 161 195
The estimated trend line is
\[ \hat{T}_t = 100.59 + 10.13t - 0.314t^2 \quad \text{with } t = 1 \text{ in 1996.1} \]
and the estimated seasonal effects are:
Quarter 1 2 3 4
Seasonal Effect +10.28 -13.89 -14.83 +18.45
(a) Use the estimated trend line and seasonal effects to forecast the beer sales for 1996 to 2000. Compare the
forecast values to the observed values.
(b) Estimate the random effects for each quarter from 1996 to 2000 and plot these against time.
Solution
(a)
Table 12.10: Calculating the Forecast Values for 1996 to 2000
Quarter t $Y_t$ $\hat{T}_t$ $\hat{S}_t$ $\hat{Y}_t = \hat{T}_t + \hat{S}_t$
1996.1 1 115 110.41 10.28 120.69
1996.2 2 114 119.60 -13.89 105.71
1996.3 3 111 128.16 -14.83 113.33
1996.4 4 154 136.10 18.45 154.55
1997.1 5 158 143.41 10.28 153.69
1997.2 6 135 150.09 -13.89 136.20
1997.3 7 146 156.14 -14.83 141.31
1997.4 8 181 161.57 18.45 180.02
1998.1 9 170 166.37 10.28 176.65
1998.2 10 154 170.54 -13.89 156.65
1998.3 11 163 174.08 -14.83 159.25
1998.4 12 196 176.99 18.45 195.44
1999.1 13 193 179.28 10.28 189.56
1999.2 14 165 180.94 -13.89 167.05
1999.3 15 165 181.97 -14.83 167.14
1999.4 16 201 182.37 18.45 200.82
2000.1 17 197 182.15 10.28 192.43
2000.2 18 165 181.30 -13.89 167.41
2000.3 19 161 179.82 -14.83 164.99
2000.4 20 195 177.71 18.45 196.16
Figure 12.8: Comparing the Observed and Forecast Sales Values
[Figure: observed sales and forecast sales ($Y_t$) plotted against year, 1996 to 2000.]
There is a close agreement between the observed and the forecast values for this time series.
(b)
Quarter t $Y_t$ $\hat{T}_t$ $\hat{S}_t$ $\hat{Y}_t = \hat{T}_t + \hat{S}_t$ $\hat{R}_t = Y_t - \hat{Y}_t$
1996.1 1 115 110.41 10.28 120.69 -5.69
1996.2 2 114 119.60 -13.89 105.71 8.29
1996.3 3 111 128.16 -14.83 113.33 -2.33
1996.4 4 154 136.10 18.45 154.55 -0.55
1997.1 5 158 143.41 10.28 153.69 4.31
1997.2 6 135 150.09 -13.89 136.20 -1.20
1997.3 7 146 156.14 -14.83 141.31 4.69
1997.4 8 181 161.57 18.45 180.02 0.98
1998.1 9 170 166.37 10.28 176.65 -6.65
1998.2 10 154 170.54 -13.89 156.65 -2.65
1998.3 11 163 174.08 -14.83 159.25 3.75
1998.4 12 196 176.99 18.45 195.44 0.56
1999.1 13 193 179.28 10.28 189.56 3.44
1999.2 14 165 180.94 -13.89 167.05 -2.05
1999.3 15 165 181.97 -14.83 167.14 -2.14
1999.4 16 201 182.37 18.45 200.82 0.18
2000.1 17 197 182.15 10.28 192.43 4.57
2000.2 18 165 181.30 -13.89 167.41 -2.41
2000.3 19 161 179.82 -14.83 164.99 -3.99
2000.4 20 195 177.71 18.45 196.16 -1.16
Figure 12.9: Plot of Residuals Against Time
[Figure: residuals (vertical axis from -10 to 10) plotted against t.]
The residuals appear to be randomly distributed. The forecasts from this model are reliable provided there
are no structural changes. The model makes no allowance for such factors as a change in the tax on beer
forcing the price up, a competing club opening close by, a change in attitudes to drinking etc. Any of these
changes could result in the forecasts being wide of the mark.
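Both checks in Example 12.5 can be sketched in Python. This is a rough illustration only: it assumes the unrounded trend coefficients from Table 12.4, it prints the residuals instead of plotting them, and the sign count at the end is an informal aid added for this sketch, not a test described in these notes.

```python
# Quarterly beer sales ($000), 1996.1 to 2000.4, from Table 12.9
sales = [115, 114, 111, 154, 158, 135, 146, 181, 170, 154,
         163, 196, 193, 165, 165, 201, 197, 165, 161, 195]
seasonal = [10.28, -13.89, -14.83, 18.45]   # quarters 1 to 4

def trend(t):
    """Quadratic trend ($000), using the unrounded coefficients of Table 12.4."""
    return 100.5868421 + 10.13421053 * t - 0.313909774 * t ** 2

# Check (a): forecast the past values as trend plus seasonal effect
fitted = [trend(t) + seasonal[(t - 1) % 4] for t in range(1, len(sales) + 1)]

# Check (b): estimated random effect = observed value minus forecast value
residuals = [y - f for y, f in zip(sales, fitted)]

for t, (y, f, r) in enumerate(zip(sales, fitted, residuals), start=1):
    print(f"t = {t:2d}  observed = {y}  forecast = {f:7.2f}  residual = {r:+6.2f}")

# Informal summary: a random-looking series has a mix of signs and no long runs
positive = sum(1 for r in residuals if r > 0)
print(f"{positive} positive and {len(residuals) - positive} negative residuals")
```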
Quick Question 5
12.7 Estimating the seasonal effects in the multiplicative model
The trend is estimated in the multiplicative model in exactly the same way as in the additive model. The additive
and multiplicative models differ in the methods used to estimate seasonal effects and in the making of forecasts.
Example 12.6
Due to increased demand for rental properties in recent years, the number of unlet units on the books of an estate
agent has declined. The number of unlet units at the end of each quarter for the last four years is shown below.
Would you use an additive or a multiplicative model to analyse this time series?
Table 12.12: Number of Unlet Rental Properties
Year Quarter 1 Quarter 2 Quarter 3 Quarter 4
1997 76 111 90 82
1998 62 88 84 57
1999 42 83 64 48
2000 43 64 46 39
Source: Estate Agent Records 1997–2000
Solution
The data are graphed below with a quadratic trend line drawn in.
Figure 12.10: Number of Unlet Units at the End of the Quarter
[Figure: number unlet (vertical axis from 20 to 120) plotted against year, 1997 to 2000, with the quadratic trend line drawn in.]
As the trend line declines the amplitude of the seasonal fluctuations also declines. This suggests using a
multiplicative model.
In the multiplicative model
\[ Y_t = T_t \times S_t \times R_t, \quad t = 1, 2, \ldots, T \]
Multiplying a number by 1 leaves the number unchanged. Thus with a multiplicative model if a seasonal or
random component has a value of 1 it has no effect. If a component has a value of greater than 1 it leads to an
increase in the observed value and if it has a value of less than 1 it leads to a decrease in the observed value.
Dividing both sides of the above equation by the trend value gives:
\[ \frac{Y_t}{T_t} = S_t \times R_t, \quad t = 1, 2, \ldots, T \]
The ratio of the observed time series value and the trend line is the product of the seasonal effect and a random
effect. Multiplicative random effects are sometimes greater than 1 and sometimes less than 1. Over time, random
effects cancel out and so the average random effect is 1. This suggests the following procedure for estimating the
seasonal effects.
1. Estimate the trend line.
2. For each observation calculate the ratio of the observed value and the estimated trend value.
3. Estimate each seasonal effect by calculating the geometric mean of all the ratios for that season.
In the third step of this process it is assumed that in calculating the geometric mean the random effects will cancel
out, leaving an estimated value for the seasonal effect.
This procedure is illustrated in the following example.
Example 12.7
The number of unlet units on the books of an estate agent at the end of each quarter for the last four years is
shown below.
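Year Quarter 1 Quarter 2 Quarter 3 Quarter 4
1997 76 111 90 82
1998 62 88 84 57
1999 42 83 64 48
2000 43 64 46 39
The estimated trend line is
\[ \hat{T}_t = 96.39 - 3.52t + 0.011t^2 \quad \text{with } t = 1 \text{ in 1997.1} \]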
Calculate the seasonal effect for each of the four quarters.
Solution
In the table overleaf the trend figures were generated by substituting t = 1, 2, . . . , 16 in the above trend line
equation. Excel was used for these calculations and so the figures may show minor discrepancies when compared
to hand calculations.
Steps 1 and 2: Estimate the trend line and calculate the ratio of the observed value to the estimated trend value.
Table 12.14: Calculating the Product of the Seasonal and Random Effects
Quarter t $Y_t$ $\hat{T}_t$ $Y_t / \hat{T}_t = \hat{R}_t \times \hat{S}_t$
1997.1 1 76 92.8811 0.8183
1997.2 2 111 89.3890 1.2418
1997.3 3 90 85.9182 1.0475
1997.4 4 82 82.4687 0.9943
1998.1 5 62 79.0407 0.7844
1998.2 6 88 75.6339 1.1635
1998.3 7 84 72.2486 1.1627
1998.4 8 57 68.8846 0.8275
1999.1 9 42 65.5419 0.6408
1999.2 10 83 62.2206 1.3340
1999.3 11 64 58.9207 1.0862
1999.4 12 48 55.6421 0.8627
2000.1 13 43 52.3849 0.8208
2000.2 14 64 49.1491 1.3022
2000.3 15 46 45.9346 1.0014
2000.4 16 39 42.7414 0.9125
Step 3
First look at the effect of quarter 1. In the years between 1997 and 2000 the observed numbers of unlet units were
the following proportions of the trend line.
Year 1997 1998 1999 2000
$\hat{S}_t \times \hat{R}_t$ 0.8183 0.7844 0.6408 0.8208
Each of these numbers is less than 1 and so in each of the four years the number of unlet units in quarter 1 was
below the trend line. On average the effect of quarter 1 was to lower the number below the trend line by:
\[ \text{GM} = \sqrt[4]{0.8183 \times 0.7844 \times 0.6408 \times 0.8208} = 0.7623 \]
The observed figure in quarter 1 is on average only 76.23% of the trend figure. The estimated quarter 1 seasonal
effect is $\hat{Q}_1 = 0.7623$.
Notice that as the model is multiplicative, a multiplicative average (the geometric mean) is used to average the
seasonal effects.
The seasonal effects for the other three quarters are calculated in the same way.
Table 12.15: Calculating the Multiplicative Seasonal Effects
Year 1997 1998 1999 2000 Geometric Mean
Quarter 1 0.8183 0.7844 0.6408 0.8208 0.7623
Quarter 2 1.2418 1.1635 1.3340 1.3022 1.2587
Quarter 3 1.0475 1.1627 1.0862 1.0014 1.0728
Quarter 4 0.9943 0.8275 0.8627 0.9125 0.8971
Thus in quarter 1 vacancies are predicted to be 76.23% of the trend figure, in quarter 2 vacancies are
predicted to be 125.87% of the trend figure etc.
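The multiplicative procedure can be sketched in Python in the same way as the additive one. This sketch assumes the Excel trend figures from Table 12.14 are taken as given (the rounded trend equation would give slightly different ratios) and uses `statistics.geometric_mean`, which is available from Python 3.8; the variable names are illustrative.

```python
from statistics import geometric_mean

# Observed number of unlet units, 1997.1 to 2000.4 (Table 12.14)
observed = [76, 111, 90, 82, 62, 88, 84, 57,
            42, 83, 64, 48, 43, 64, 46, 39]

# Excel trend figures for t = 1, ..., 16 (Table 12.14)
trend = [92.8811, 89.3890, 85.9182, 82.4687, 79.0407, 75.6339, 72.2486, 68.8846,
         65.5419, 62.2206, 58.9207, 55.6421, 52.3849, 49.1491, 45.9346, 42.7414]

# Step 2: ratio of each observed value to the estimated trend value
ratios = [y / t for y, t in zip(observed, trend)]

# Step 3: geometric mean of the ratios for each quarter (every fourth observation)
effects = [geometric_mean(ratios[q::4]) for q in range(4)]

for q, s in enumerate(effects, start=1):
    print(f"Quarter {q}: seasonal effect = {s:.4f} ({100 * s:.2f}% of trend)")
```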
Seasonally adjusted figures can be calculated from the multiplicative model.
Seasonally adjusted figures: In the multiplicative model the calculated values of $Y_t / \hat{S}_t$ are called the seasonally
adjusted (or deseasonalised) figures. They are estimates of the product of the trend and random components of a
time series.
12.8 Forecasts and residuals with the multiplicative model
In the multiplicative model:
\[ Y_t = T_t \times S_t \times R_t, \quad t = 1, 2, \ldots, T \]
In section 12.4 you learned how to estimate the trend line $T_t$.
In section 12.7 you learned how to estimate the seasonal effects $S_t$.
The random effects $R_t$ are random and so by definition cannot be predicted.
The forecast value for any period t is therefore
\[ \hat{Y}_t = \hat{T}_t \times \hat{S}_t \]
Example 12.8
The estimated trend line for the number of unlet units on the books of an estate agent at the end of each quarter is
\[ \hat{T}_t = 96.39 - 3.52t + 0.011t^2 \quad \text{with } t = 1 \text{ in 1997.1} \]
The estimated quarterly effects are
Quarter 1 Quarter 2 Quarter 3 Quarter 4
0.7623 1.2587 1.0728 0.8971
Forecast the number of unlet units for each quarter of 2001.
Solution
The forecasts are:
Table 12.16: Forecasting the Number of Unlet Units
Period t Trend Seasonal Effect Forecast Number
2001.1 17 39.5696 0.7623 30.16
2001.2 18 36.4192 1.2587 45.84
2001.3 19 33.2901 1.0728 35.71
2001.4 20 30.1824 0.8971 27.08
Table 12.17: The Forecasted Number of Unlet Units in 2001
Quarter 1 30
Quarter 2 46
Quarter 3 36
Quarter 4 27
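As a quick check of Tables 12.16 and 12.17, the multiplicative forecast is just the trend figure times the seasonal effect. The sketch below assumes the Excel trend figures for t = 17, ..., 20 from Table 12.16; the names `trend_2001`, `seasonal` and `forecasts` are illustrative.

```python
# Excel trend figures for the four quarters of 2001 (t = 17, ..., 20), Table 12.16
trend_2001 = {"2001.1": 39.5696, "2001.2": 36.4192,
              "2001.3": 33.2901, "2001.4": 30.1824}

# Estimated multiplicative seasonal effects for quarters 1 to 4
seasonal = [0.7623, 1.2587, 1.0728, 0.8971]

# Forecast = trend figure multiplied by the seasonal effect for that quarter
forecasts = {period: t * s
             for (period, t), s in zip(trend_2001.items(), seasonal)}

for period, value in forecasts.items():
    print(f"{period}: forecast = {value:.2f}, rounded to {round(value)} units")
```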
The same two checks as for the additive model should be carried out before accepting forecasts generated in this
way:
(a) Use the estimated trend line and seasonal effects to forecast the observed values of the time series. See
how closely the forecasts agree with the observed values. If this procedure cannot accurately forecast past
values its forecasts for future values will not be reliable.
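(b) For each observed value estimate the random term. We have
\[ Y_t = T_t \times S_t \times R_t \quad \therefore \quad R_t = \frac{Y_t}{T_t \times S_t} \]
and the estimated trend line and seasonal effects can be used to estimate the random effects.
\[ \hat{R}_t = \frac{Y_t}{\hat{T}_t \times \hat{S}_t} \quad \therefore \quad \hat{R}_t = \frac{Y_t}{\hat{Y}_t} \]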
These random effects should be random. Plot the estimated random effects against t and examine for
randomness. If the residuals are not random, then the trend line used may be incorrect or the seasonal
pattern may be changing over time. Again this will make any forecasts unreliable.
Example 12.9
The number of unlet units on the books of an estate agent at the end of each quarter for the last four years is
shown below.
Year Quarter 1 Quarter 2 Quarter 3 Quarter 4
1997 76 111 90 82
1998 62 88 84 57
1999 42 83 64 48
2000 43 64 46 39
The estimated trend line is
\[ \hat{T}_t = 96.39 - 3.52t + 0.011t^2 \quad \text{with } t = 1 \text{ in 1997.1} \]
and the estimated multiplicative seasonal effects are:
Quarter 1 Quarter 2 Quarter 3 Quarter 4
0.7623 1.2587 1.0728 0.8971
(a) Use the estimated trend line and seasonal effects to forecast the number of unlet units for 1997–2000.
Compare the forecast values to the observed values.
(b) Estimate the random effects for each quarter from 1997 to 2000 and plot these against time.
(a)
Quarter t $Y_t$ $\hat{T}_t$ $\hat{S}_t$ $\hat{Y}_t = \hat{T}_t \times \hat{S}_t$
1997.1 1 76 92.8811 0.7623 70.80
1997.2 2 111 89.3890 1.2587 112.51
1997.3 3 90 85.9182 1.0728 92.17
1997.4 4 82 82.4687 0.8971 73.98
1998.1 5 62 79.0407 0.7623 60.25
1998.2 6 88 75.6339 1.2587 95.20
1998.3 7 84 72.2486 1.0728 77.51
1998.4 8 57 68.8846 0.8971 61.80
1999.1 9 42 65.5419 0.7623 49.96
1999.2 10 83 62.2206 1.2587 78.32
1999.3 11 64 58.9207 1.0728 63.21
1999.4 12 48 55.6421 0.8971 49.92
2000.1 13 43 52.3849 0.7623 39.93
2000.2 14 64 49.1491 1.2587 61.86
2000.3 15 46 45.9346 1.0728 49.28
2000.4 16 39 42.7414 0.8971 38.34
Figure 12.11: Number of Unlet Units at the End of the Quarter
[Figure: observed number and predicted number of unlet units (vertical axis from 20 to 120) plotted against year, 1997 to 2000.]
There is a close agreement between the observed and the forecast values for this time series.
(b)
Table 12.20: Calculating the Residuals for 1997 to 2000
Quarter t $Y_t$ $\hat{T}_t$ $\hat{S}_t$ $\hat{Y}_t = \hat{T}_t \times \hat{S}_t$ $\hat{R}_t = Y_t / \hat{Y}_t$
1997.1 1 76 92.8811 0.7623 70.80 1.0734
1997.2 2 111 89.3890 1.2587 112.51 0.9865
1997.3 3 90 85.9182 1.0728 92.17 0.9764
1997.4 4 82 82.4687 0.8971 73.98 1.1084
1998.1 5 62 79.0407 0.7623 60.25 1.0290
1998.2 6 88 75.6339 1.2587 95.20 0.9244
1998.3 7 84 72.2486 1.0728 77.51 1.0838
1998.4 8 57 68.8846 0.8971 61.80 0.9224
1999.1 9 42 65.5419 0.7623 49.96 0.8406
1999.2 10 83 62.2206 1.2587 78.32 1.0598
1999.3 11 64 58.9207 1.0728 63.21 1.0125
1999.4 12 48 55.6421 0.8971 49.92 0.9616
2000.1 13 43 52.3849 0.7623 39.93 1.0768
2000.2 14 64 49.1491 1.2587 61.86 1.0345
2000.3 15 46 45.9346 1.0728 49.28 0.9335
2000.4 16 39 42.7414 0.8971 38.34 1.0171
Figure 12.12: Plot of Residuals Against Time
[Figure: residuals (vertical axis from 0.8 to 1.2) plotted against t.]
The residuals appear to be randomly distributed. The forecasts from this model are reliable provided there are
no structural changes. The model makes no allowance for such factors as a change in the interest rate or laws
on negative gearing affecting the supply of rental properties. Any of these changes could result in the forecasts
being wide of the mark.
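The residual calculation of Table 12.20 can be sketched in Python. As before this is only an illustration: it takes the Excel trend figures as given, and the variable names are invented for the example. In the multiplicative model a residual of 1 means the forecast equals the observed value.

```python
# Observed number of unlet units and Excel trend figures, t = 1, ..., 16
observed = [76, 111, 90, 82, 62, 88, 84, 57,
            42, 83, 64, 48, 43, 64, 46, 39]
trend = [92.8811, 89.3890, 85.9182, 82.4687, 79.0407, 75.6339, 72.2486, 68.8846,
         65.5419, 62.2206, 58.9207, 55.6421, 52.3849, 49.1491, 45.9346, 42.7414]
seasonal = [0.7623, 1.2587, 1.0728, 0.8971]   # quarters 1 to 4

# Check (a): forecast of each past value = trend figure x seasonal effect
fitted = [t * seasonal[i % 4] for i, t in enumerate(trend)]

# Check (b): estimated random effect = observed value / forecast value
residuals = [y / f for y, f in zip(observed, fitted)]

for i, (y, f, r) in enumerate(zip(observed, fitted, residuals), start=1):
    print(f"t = {i:2d}  observed = {y:3d}  forecast = {f:6.2f}  residual = {r:.4f}")
```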
Quick Question 6
12.9 Index numbers
With many economic time series the main focus of interest is on the percentage changes in the value of the
variable over time. For example, consider the average weekly earnings of an employee in the period 1990–1999.
Table 12.21: The Average Weekly Earnings of an Australian Employee
Year Average Weekly Earnings ($)
1990 471.1
1991 485.6
1992 505.8
1993 518.7
1994 532.8
1995 549.0
1996 565.5
1997 578.9
1998 597.5
1999 612.7
Source: Average Weekly Earnings, ABS Cat. No. 6302.0
What is the percentage increase in average weekly earnings between 1990 and 1991? Between 1990 and 1999?
The percentage increase between a base period and a current period is calculated by:
\[ \%\ \text{increase} = \frac{\text{current period value} - \text{base period value}}{\text{base period value}} \times 100 \]
Example 12.10
From the data in Table 12.21 what was the percentage increase in average weekly earnings from
(a) 1990 to 1991?
(b) 1990 to 1999?
Solution
(a)
\[ \%\ \text{increase} = \frac{\text{earnings in 1991} - \text{earnings in 1990}}{\text{earnings in 1990}} \times 100 = \frac{485.6 - 471.1}{471.1} \times 100 = 3.078 \]
Between 1990 and 1991 average weekly earnings increased by 3.1%.
(b)
\[ \%\ \text{increase} = \frac{\text{earnings in 1999} - \text{earnings in 1990}}{\text{earnings in 1990}} \times 100 = \frac{612.7 - 471.1}{471.1} \times 100 = 30.057 \]
Between 1990 and 1999 average weekly earnings increased by 30.1%.
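The arithmetic in Example 12.10 can be checked with a short Python sketch. The function name `pct_increase` is an illustrative choice, not something defined in this unit; the earnings figures are taken from Table 12.21.

```python
def pct_increase(base_value, current_value):
    """Percentage increase from the base period value to the current period value."""
    return (current_value - base_value) / base_value * 100


# Average weekly earnings ($) for selected years from Table 12.21.
earnings = {1990: 471.1, 1991: 485.6, 1999: 612.7}

increase_1991 = pct_increase(earnings[1990], earnings[1991])
increase_1999 = pct_increase(earnings[1990], earnings[1999])

print(round(increase_1991, 3))  # 3.078
print(round(increase_1999, 3))  # 30.057
```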
Notice that if all the numbers in a time series are multiplied by the same constant, then in the formula:
\[ \%\ \text{increase} = \frac{\text{current period value} - \text{base period value}}{\text{base period value}} \times 100 \]
the numerator and denominator would both be multiplied by the constant and so the percentage increases would
be unaffected.
Multiplying or dividing all the values in a time series by the same constant is called scaling the series. Scaling a
series does not affect the percentage changes in the series. It is often convenient to scale a series to have a value
of 100 in some period. Such a scaled series is called an index.
Index: A time series that has been scaled so that the value in one period, called the base period, is 100.
Example 12.11
Using the data in Table 12.22, construct an index of weekly earnings with base year 1990.
Table 12.22: The Average Weekly Earnings of an Australian Employee
Year Average Weekly Earnings ($)
1990 471.1
1991 485.6
1992 505.8
1993 518.7
1994 532.8
1995 549.0
1996 565.5
1997 578.9
1998 597.5
1999 612.7
Solution
The time series in Table 12.22 is to be scaled to have value 100 in 1990. Recall that scaling means multiplying
or dividing all the values in a time series by the same constant. First divide all the values in the time series by the
base year value. The base year value is now 1. Now multiply all the values in the time series by 100. The base
year value is now 100.
Table 12.23: Calculating an Index for the Average Earnings of an Australian Employee
Year Average Weekly Earnings ($) Scaling the Time Series
1990 471.1 471.1 × 100 / 471.1 = 100.0
1991 485.6 485.6 × 100 / 471.1 = 103.1
1992 505.8 505.8 × 100 / 471.1 = 107.4
1993 518.7 518.7 × 100 / 471.1 = 110.1
1994 532.8 532.8 × 100 / 471.1 = 113.1
1995 549.0 549.0 × 100 / 471.1 = 116.5
1996 565.5 565.5 × 100 / 471.1 = 120.0
1997 578.9 578.9 × 100 / 471.1 = 122.9
1998 597.5 597.5 × 100 / 471.1 = 126.8
1999 612.7 612.7 × 100 / 471.1 = 130.1
Table 12.24: An Index for the Average Earnings of an Australian Employee
Year Index of Average Weekly Earnings (1990 = 100)
1990 100.0
1991 103.1
1992 107.4
1993 110.1
1994 113.1
1995 116.5
1996 120.0
1997 122.9
1998 126.8
1999 130.1
An index series shows the percentage changes from the base period. In the above table you can see that between
1990 and 1991 earnings increased by 3.1%, between 1990 and 1992 earnings increased by 7.4% and between
1990 and 1999 earnings increased by 30.1%.
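As a rough sketch of the scaling step in Example 12.11, the following Python code divides every value by the base year value and multiplies by 100. The helper name `make_index` is hypothetical; the data are from Table 12.22.

```python
def make_index(series, base_period):
    """Scale a time series so that its value in base_period is 100."""
    base_value = series[base_period]
    return {period: value / base_value * 100 for period, value in series.items()}


# Average weekly earnings ($) from Table 12.22.
earnings = {
    1990: 471.1, 1991: 485.6, 1992: 505.8, 1993: 518.7, 1994: 532.8,
    1995: 549.0, 1996: 565.5, 1997: 578.9, 1998: 597.5, 1999: 612.7,
}

earnings_index = make_index(earnings, base_period=1990)
for year, value in earnings_index.items():
    print(year, round(value, 1))
```

Because scaling leaves percentage changes unaffected, the increase from 1990 to any later year can be read off as the index value minus 100.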
To calculate the changes between two non-base period figures, the percentage increase formula must still be used.
Quick Question 7
12.10 Price indices
The most widely quoted and used indices are price indices. These are used to show the percentage change in
prices over time. You will learn how to calculate simple and compound price indices.
Simple price index: An index of the price of a single good
Simple price indices can be calculated in the same way as the earnings index described in the previous section.
Example 12.12
Over the last four years the prices of three alcoholic drinks served at a local club were as shown below.
Table 12.25: Prices of Alcoholic Drinks ($)
Year Beer (Schooner) Wine (Bottle) Whisky (Tot)
1997 2.00 10.00 4.00
1998 2.10 9.90 4.60
1999 2.20 9.70 5.00
2000 2.50 9.50 6.00
Construct a price index series with base year 1997 for each of these drinks.
Solution
Each of these series has to be scaled to have a value of 100 in 1997.
1. the beer series should be divided by 2.00 and then multiplied by 100
2. the wine series should be divided by 10.00 and then multiplied by 100
3. the whisky price series should be divided by 4.00 and then multiplied by 100.
Table 12.26: Index Series for Alcoholic Drinks
Year Beer Wine Whisky
1997 100.0 100.0 100.0
1998 105.0 99.0 115.0
1999 110.0 97.0 125.0
2000 125.0 95.0 150.0
Over this four year period the price of a beer has risen by 25%, the price of a bottle of wine has fallen by 5% and
the price of a tot of whisky has risen by 50%.
Compound price index: An index of the prices of several goods.
In the remainder of this unit you will learn how to combine index series of the prices of a number of goods into
a single compound price index series. The most important example of this type of index is the consumer price
index. This index is used to measure the rate of inflation in Australia and often impacts on government monetary
and fiscal policy.
In the above example some prices have gone up and some have gone down. What has happened to the average
price of an alcoholic drink over time? The obvious method is to average the three index series constructed above.
The index for 1998 would then be:
\[ I_{98} = \frac{105.0 + 99.0 + 115.0}{3} = 106.3 \]
This gives the following index for the price of alcoholic drinks.
Table 12.27: Index Series for Alcoholic Drinks
Year Index
1997 100.0
1998 106.3
1999 110.7
2000 123.3
With this index the price of alcoholic drinks has risen by 23.3% between 1997 and 2000.
However, consider someone who regularly drinks wine and only has the very occasional beer and whisky. For this
person the fall in the price of wine is much more important than the increases in the other two prices: for this
person the cost of the average drink she buys has fallen. Similarly for a person who only buys whisky the cost of
the average drink has risen by 50%. The index in Table 12.27 only gives the average change in the price of the
drinks bought by someone who buys equal numbers of each drink.
In the averaging of index numbers method described above changes in the prices of each of the three drinks are
counted as being equally important but the changes are not equally important. To the wine drinker the change
in the price of wine is more important than the change in the price of beer or whisky. To the whisky drinker the
change in the price of whisky is all that matters. There is no general average price change for a group of goods.
The average price change depends upon how much of each good is purchased!
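The unweighted averaging method criticised above can be sketched in Python as follows. The function names are illustrative assumptions; the prices are those of Table 12.25. The code builds a simple price index for each drink and then averages the three indices with equal weight, which is exactly why it only suits someone who buys equal numbers of each drink.

```python
# Prices ($) from Table 12.25, keyed by year and then by drink.
prices = {
    1997: {"beer": 2.00, "wine": 10.00, "whisky": 4.00},
    1998: {"beer": 2.10, "wine": 9.90, "whisky": 4.60},
    1999: {"beer": 2.20, "wine": 9.70, "whisky": 5.00},
    2000: {"beer": 2.50, "wine": 9.50, "whisky": 6.00},
}


def simple_index(prices, good, base_year):
    """Simple price index for one good, scaled to 100 in base_year."""
    base_price = prices[base_year][good]
    return {year: p[good] / base_price * 100 for year, p in prices.items()}


def unweighted_average_index(prices, base_year):
    """Average the simple indices of all goods, giving each equal weight."""
    goods = list(prices[base_year])
    per_good = {g: simple_index(prices, g, base_year) for g in goods}
    return {
        year: sum(per_good[g][year] for g in goods) / len(goods)
        for year in prices
    }


average_index = unweighted_average_index(prices, base_year=1997)
for year, value in average_index.items():
    print(year, round(value, 1))
```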
The method used to construct average price indices is to estimate typical quantities of each good purchased and
determine the cost of buying this typical bundle of goods each period. The index of this purchase cost time series
is used as the combined price index.
Notice that in using this method the same quantities must be purchased each period. Any change in the cost of
purchasing this bundle is then due to changes in prices and not changes in the quantities purchased.
Example 12.13
Over the last four years the prices of three alcoholic drinks served at a local club were as shown below.
Table 12.28: Prices of Alcoholic Drinks ($)
Year Beer (Schooner) Wine (Bottle) Whisky (Tot)
1997 2.00 10.00 4.00
1998 2.10 9.90 4.60
1999 2.20 9.70 5.00
2000 2.50 9.50 6.00
Joe Soak buys 30 schooners of beer, one bottle of wine and two whiskies a week. Construct a price index with
base year 1997 for the drinks purchased by Joe.
Solution
Table 12.29: Cost of Joe Soak's Drinks ($)
Year Purchase Cost ($)
1997 2.00 × 30 + 10.00 × 1 + 4.00 × 2 = 78.00
1998 2.10 × 30 + 9.90 × 1 + 4.60 × 2 = 82.10
1999 2.20 × 30 + 9.70 × 1 + 5.00 × 2 = 85.70
2000 2.50 × 30 + 9.50 × 1 + 6.00 × 2 = 96.50
Notice that the same quantities are purchased each year. The cost changes because of changes in the price.
Dividing each term in this series by 78.00 and multiplying by 100 gives the following index series:
Table 12.30: Index Series for the Cost of Joe Soak's Drinks
Year Index
1997 100.0
1998 105.3
1999 109.9
2000 123.7
For Joe the price of a drink has risen by 23.7% between 1997 and 2000.
The method used to compute a compound price index for several goods is:
1. Determine the quantity of each good purchased.
2. Calculate the cost of buying this bundle of goods in the periods to be compared.
3. Scale the time series of these costs to give an index series.
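The three steps above can be sketched in Python using Joe Soak's fixed weekly bundle from Example 12.13. The names `basket_cost` and `compound_index` are hypothetical helpers introduced only for this illustration.

```python
# Prices ($) from Table 12.28.
prices = {
    1997: {"beer": 2.00, "wine": 10.00, "whisky": 4.00},
    1998: {"beer": 2.10, "wine": 9.90, "whisky": 4.60},
    1999: {"beer": 2.20, "wine": 9.70, "whisky": 5.00},
    2000: {"beer": 2.50, "wine": 9.50, "whisky": 6.00},
}

# Step 1: the quantity of each good purchased (the same every period).
basket = {"beer": 30, "wine": 1, "whisky": 2}


def basket_cost(year_prices, quantities):
    """Step 2: cost of buying the fixed bundle at one period's prices."""
    return sum(year_prices[good] * qty for good, qty in quantities.items())


def compound_index(prices, quantities, base_year):
    """Step 3: scale the cost series so the base year equals 100."""
    costs = {year: basket_cost(p, quantities) for year, p in prices.items()}
    return {year: cost / costs[base_year] * 100 for year, cost in costs.items()}


joe_index = compound_index(prices, basket, base_year=1997)
for year, value in joe_index.items():
    print(year, round(basket_cost(prices[year], basket), 2), round(value, 1))
```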
There are two main approaches to determining the quantities used in the first step of constructing an index: the
Laspeyres index and the Paasche index. These are discussed in the next two sections.
Quick Question 8
12.11 Constructing a Laspeyres price index
The first step of constructing a compound price index is to determine the quantities of each good purchased. Over
time the quantities purchased change and so the question arises as to which quantities to use.
Consider Joe Soak again. As the price of wine falls so Joe decides to drink more wine and less beer.
Table 12.31: Prices ($) and Quantities of Alcoholic Drinks Consumed per Week
Year Beer Wine Whisky
Price ($) Quantity Price ($) Quantity Price($) Quantity
1997 2.00 30 10.00 1 4.00 2
1998 2.10 20 9.90 3 4.60 2
1999 2.20 15 9.70 8 5.00 1
2000 2.50 10 9.50 12 6.00 0
Construct a price index series with base year 1997 for Joe Soak's drinks.
To construct the index we must calculate the cost of buying the same quantities of each good in 1997, 1998, 1999
and 2000. But which quantities should we use?
With a Laspeyres index the base year quantities are used.
Example 12.14
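The prices of alcoholic drinks and the quantities of each purchased by Joe Soak each week are as shown in Table
12.32.
Table 12.32: Prices ($) and Quantities of Alcoholic Drinks Consumed per Week
Year Beer Wine Whisky
Price ($) Quantity Price ($) Quantity Price ($) Quantity
1997 2.00 30 10.00 1 4.00 2
1998 2.10 20 9.90 3 4.60 2
1999 2.20 15 9.70 8 5.00 1
2000 2.50 10 9.50 12 6.00 0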
Construct a Laspeyres price index series with base year 1997 for Joe Soak's drinks.
Solution
The base year is 1997. In 1997 Joe purchased 30 beers, 1 bottle of wine and 2 tots of whisky. The cost of
purchasing these quantities in each year is:
Table 12.33: Cost of Joe Soak's Drinks ($)
Year Purchase Cost ($)
1997 2.00 × 30 + 10.00 × 1 + 4.00 × 2 = 78.00
1998 2.10 × 30 + 9.90 × 1 + 4.60 × 2 = 82.10
1999 2.20 × 30 + 9.70 × 1 + 5.00 × 2 = 85.70
2000 2.50 × 30 + 9.50 × 1 + 6.00 × 2 = 96.50
Scaling the cost time series to have value 100 in the base year gives the following index.
Table 12.34: Laspeyres Price Index for Alcohol
Year Index
1997 100.0
1998 105.3
1999 109.9
2000 123.7
Notice that with the Laspeyres index we are measuring the cost of the quantities of goods that used to be purchased
in the base period. The index takes no account of changes in purchasing patterns that occur over the time span of
the index. In the later periods the index is based on an outdated purchasing pattern.
The Laspeyres index goes out of date over time. To overcome this problem the index is often restarted at a later
date using more recent consumption patterns. This results in a break in the series, where the new series starts.
The Paasche index overcomes this problem.
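A minimal Python sketch of the Laspeyres calculation in Example 12.14 follows. The function name `laspeyres_index` is an assumption made for illustration. Quantities for every year are passed in, but only the base year quantities are ever used, which is what lets the index drift out of date.

```python
# Prices ($) and weekly quantities from Table 12.32.
prices = {
    1997: {"beer": 2.00, "wine": 10.00, "whisky": 4.00},
    1998: {"beer": 2.10, "wine": 9.90, "whisky": 4.60},
    1999: {"beer": 2.20, "wine": 9.70, "whisky": 5.00},
    2000: {"beer": 2.50, "wine": 9.50, "whisky": 6.00},
}
quantities = {
    1997: {"beer": 30, "wine": 1, "whisky": 2},
    1998: {"beer": 20, "wine": 3, "whisky": 2},
    1999: {"beer": 15, "wine": 8, "whisky": 1},
    2000: {"beer": 10, "wine": 12, "whisky": 0},
}


def laspeyres_index(prices, quantities, base_year):
    """Price index that values the base year bundle at each year's prices."""
    base_bundle = quantities[base_year]  # later years' quantities are ignored

    def cost(year):
        return sum(prices[year][good] * qty for good, qty in base_bundle.items())

    base_cost = cost(base_year)
    return {year: cost(year) / base_cost * 100 for year in prices}


laspeyres = laspeyres_index(prices, quantities, base_year=1997)
for year, value in laspeyres.items():
    print(year, round(value, 1))
```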
Quick Question 9
12.12 Constructing a Paasche price index
Compound price index numbers compare the prices in each period to the prices in the base period. The Laspeyres
price index uses the quantities purchased in the base period for this comparison. The Paasche price index uses the
quantities purchased in the period for which the index is being calculated for this comparison. The procedure is
illustrated in the following example.
Example 12.15
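The prices of alcoholic drinks and the quantities of each purchased by Joe Soak each week are as shown in Table
12.36.
Table 12.36: Prices ($) and Quantities of Alcoholic Drinks Consumed per Week
Year Beer Wine Whisky
Price ($) Quantity Price ($) Quantity Price ($) Quantity
1997 2.00 30 10.00 1 4.00 2
1998 2.10 20 9.90 3 4.60 1
1999 2.20 15 9.70 8 5.00 1
2000 2.50 10 9.50 12 6.00 0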
Construct a Paasche price index series with base year 1997 for Joe Soak's drinks.
Solution
With the Paasche price index each period's index is constructed separately.
The index for 1998 with base year 1997
Here we are comparing prices in 1998 with the base year of 1997. With a Paasche price index this comparison
uses the quantities purchased in 1998.
In 1998 Joe purchased 20 beers, 3 bottles of wine and 1 whisky. The index for 1998 compares the cost of
purchasing these quantities in 1998 with the cost in the base year.
Table 12.37: Calculating the Cost of Purchasing the 1998 Bundle
Year Purchase Cost ($)
1997 2.00 × 20 + 10.00 × 3 + 4.00 × 1 = 74.00
1998 2.10 × 20 + 9.90 × 3 + 4.60 × 1 = 76.30
Now scale the above cost series to have value 100 in the base year.
Table 12.38: Index Series for the Cost of the 1998 Bundle
Year Index
1997 74.00 × 100.0 / 74.00 = 100.000
1998 76.30 × 100.0 / 74.00 = 103.108
The index for 1999 with base year 1997
In 1999 Joe purchased 15 beers, 8 bottles of wine and 1 whisky. The index for 1999 uses the cost of these
purchases in 1999 and in the base year.
Table 12.40: Calculating the Paasche Index for 1999
Year Purchase Cost ($) Index
1997 2.00 × 15 + 10.00 × 8 + 4.00 × 1 = 114.00 100.000
1999 2.20 × 15 + 9.70 × 8 + 5.00 × 1 = 115.60 101.404
The index for 2000 with base year 1997
In 2000 Joe purchased 10 beers, 12 bottles of wine and no whisky.
Table 12.42: Calculating the Paasche Index for 2000
Year Purchase Cost ($) Index
1997 2.00 × 10 + 10.00 × 12 + 4.00 × 0 = 140.00 100.000
2000 2.50 × 10 + 9.50 × 12 + 6.00 × 0 = 139.00 99.286
Gathering the calculated index numbers together (and rounding to one decimal place) gives the Paasche price
index shown below.
Table 12.43: A Paasche Price Index for Alcoholic Drinks
Year Index
1997 100.0
1998 103.1
1999 101.4
2000 99.3
Notice that the Paasche price index uses a different basket of goods for each period of the index. In the above
example the 1998 index was constructed using the quantities purchased in 1998 and the 1999 index was constructed
using the 1999 quantities. This means that the change in the value of the index from 103.1 to 101.4 is not caused
purely by price changes but is also due, in part, to changes in the quantities used. In general with a Paasche price
index comparisons between non-base year periods should be treated with some caution: the differences could be
due to either quantity changes or price changes or some combination of the two.
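The Paasche procedure of Example 12.15 can be sketched in Python as below. The function name `paasche_index` is hypothetical. For each year the code values that year's own bundle at current prices and at base year prices, then takes the ratio, so a different bundle is used for every period. The quantities are those of Table 12.36.

```python
# Prices ($) and weekly quantities from Table 12.36.
prices = {
    1997: {"beer": 2.00, "wine": 10.00, "whisky": 4.00},
    1998: {"beer": 2.10, "wine": 9.90, "whisky": 4.60},
    1999: {"beer": 2.20, "wine": 9.70, "whisky": 5.00},
    2000: {"beer": 2.50, "wine": 9.50, "whisky": 6.00},
}
quantities = {
    1997: {"beer": 30, "wine": 1, "whisky": 2},
    1998: {"beer": 20, "wine": 3, "whisky": 1},
    1999: {"beer": 15, "wine": 8, "whisky": 1},
    2000: {"beer": 10, "wine": 12, "whisky": 0},
}


def bundle_cost(year_prices, bundle):
    """Cost of a bundle of goods at one year's prices."""
    return sum(year_prices[good] * qty for good, qty in bundle.items())


def paasche_index(prices, quantities, base_year):
    """Price index that values each year's own bundle at current and base prices."""
    index = {}
    for year, bundle in quantities.items():
        current_cost = bundle_cost(prices[year], bundle)
        base_cost = bundle_cost(prices[base_year], bundle)
        index[year] = current_cost / base_cost * 100
    return index


paasche = paasche_index(prices, quantities, base_year=1997)
for year, value in paasche.items():
    print(year, round(value, 1))
```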
Quick Question 10
12.13 Comparing the Laspeyres and Paasche Price Indices
The Laspeyres and Paasche price indices can give very different results. In the Joe Soak example used above the
two indices are as shown in Table 12.44 below.
Table 12.44: A Laspeyres and a Paasche Price Index for Alcoholic Drinks
Year Laspeyres Price Index Paasche Price Index
1997 100.0 100.0
1998 105.3 103.1
1999 109.9 101.4
2000 123.7 99.3
The Laspeyres price index shows that prices have risen by 23.7% between 1997 and 2000 but the Paasche price
index shows that prices have fallen by 0.7%! Which is right?
There is no right answer. The Laspeyres index compares the costs of purchasing a bundle of goods which contains
a large quantity of beer and only a small quantity of wine. The price of beer has risen between 1997 and 2000 and
so the cost of this bundle has risen considerably. The Paasche index for 2000 compares the costs of a bundle that
contains a smaller quantity of beer and a much larger quantity of wine (10 beers and 12 bottles of wine). The rise
in the price of beer is offset by the fall in the price of wine.
There is no right bundle of goods to use and so there is no right answer.
When comparing Laspeyres and Paasche price indices the following points should be considered:
1. The Paasche price index uses the most up-to-date bundle of goods. The Laspeyres price index uses an
out-of-date bundle of goods which needs to be updated periodically leading to breaks in the series.
2. The Paasche price index uses a different bundle of goods for each period which makes comparisons
between non-base year periods less reliable.
3. The Laspeyres price index only needs quantities for the base period but the Paasche price index needs
quantities to be collected for every period other than the base period. Where gathering quantities is
costly this makes the Paasche price index a much more expensive index to construct than the Laspeyres
price index. It is for this reason that the Paasche price index is used much less than the Laspeyres price
index.
4. The Paasche price index usually gives a lower inflation figure than the Laspeyres price index. You can
see this in the Joe Soak example above. As the price of beer increases so Joe Soak substitutes wine for
beer. The Laspeyres price index uses Joe's consumption before the price increase and so includes a high
beer component. The Paasche price index uses Joe's consumption after the increase in the price of beer
and so gives beer a lower weight than the Laspeyres index. In general consumers switch away from
goods with large price increases into goods with small price increases. The Laspeyres index uses the
quantities consumed before the switch and so gives a greater weight to goods with large price increases
than does the Paasche price index. Thus the Laspeyres price index gives a higher inflation figure than
the Paasche price index.
12.14 The Australian consumer price index
The Australian Consumer Price Index (CPI) attempts to measure the changes in the prices of the goods and
services purchased by the typical Australian household living in a capital city. It is an extremely important index
as it is used in wage negotiations, in fixing the wages of lower paid workers by the Arbitration Committee, in
indexing the superannuation payments of Commonwealth Government employees etc.
The CPI is a quarterly index. It was first constructed in June 1960 but then calculated retrospectively back to the
September quarter of 1948. The ABS calculates a separate CPI for each of the eight capital cities of Australia.
These indices are then weighted and averaged to give the Australian CPI.
Each capital city CPI is a Laspeyres price index.
1. Determining the quantities for the goods
The first stage is to determine which goods to include and the appropriate quantities for each good. This is done
through a Household Expenditure Survey in each capital city. The population for this survey is the wage and
salary earner households in the capital city with a total household income of more than the minimum adult wage
and excluding high income earners (the top 10% of household incomes).
A multistage random sample of households is taken and each of the selected households is required to record
the quantities of all the goods and services purchased in a specified interval of time. Using this information the
ABS is able to construct a collection of goods and services (with quantities purchased) that account for a large
proportion of the expenditures of the average household in the capital city. The goods are closely specified (for
example, Colgate 50 g fluoride toothpaste) so that prices can be easily identified.
2. Recording the prices of the goods
Price data for the CPI are obtained by field officers carrying out personal interviews. Where possible, 10 or more
representative and reputable retailers are visited in each capital city. Most prices are only collected once a
quarter but some basic prices are collected monthly. In all around 12,000 prices are recorded for each capital city.
It is a very comprehensive price index!
3. Calculating the price index
The index is calculated in essentially the same way as the Laspeyres index you saw calculated in section 12.11.
With the extremely large number of goods and services included in the CPI, it is more convenient, and useful,
to calculate separate indices for different groups of goods (food, clothing, housing, etc) and then combine these
indices into a single index. This is just an administrative convenience and does not change the nature of the index.
Problems with the index
As you have learned above the main problem with a Laspeyres price index is that the quantities used can become
out of date. To overcome this problem the ABS revises the basket of goods used about every ten years. This
results in a number of short run price series rather than a single long run price series but the ABS does publish a
long run series formed by splicing together the short run series. The CPI and the rate of inflation for Australia for
the last 10 years are shown in Table 12.45.
Rate of inflation: the percentage increase in the CPI
Table 12.45: Australia's CPI and Rate of Inflation 1990 to 2000
End of Year Consumer Price Index Rate of Ination (%)
1990 100.0
1991 106.0 6.0
1992 107.6 1.5
1993 107.9 0.3
1994 110.0 1.9
1995 112.8 2.5
1996 118.5 5.1
1997 120.0 1.3
1998 121.9 0.1
1999 124.1 1.8
2000 131.3 5.8
Source: Australian Bureau of Statistics, Cat. No. 6450.0
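Since the rate of inflation is defined as the percentage increase in the CPI, it can be computed from consecutive index values. The sketch below applies that definition to the first few rows of Table 12.45; the function name `inflation_rates` is an illustrative assumption.

```python
def inflation_rates(cpi):
    """Year-on-year percentage increase in the CPI, keyed by the later year."""
    years = sorted(cpi)
    return {
        current: (cpi[current] - cpi[previous]) / cpi[previous] * 100
        for previous, current in zip(years, years[1:])
    }


# End-of-year CPI values for 1990 to 1993 from Table 12.45.
cpi = {1990: 100.0, 1991: 106.0, 1992: 107.6, 1993: 107.9}

rates = inflation_rates(cpi)
for year, rate in rates.items():
    print(year, round(rate, 1))
```

No rate is produced for 1990 because the table gives no CPI value for the preceding year.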
For more details on the CPI read A Guide to the Consumer Price Index: 14th Series (ABS Cat No. 6440.0).
This can be downloaded from the ABS Web site.
12.15 Deflating a time series
In the last ten years Australian average weekly earnings have increased but so have prices. If earnings rise by
more than prices then workers can buy more goods with their incomes than before and so become better off. If
earnings rise by less than prices then workers become worse off. Have Australian workers become better off over
the last 10 years?
Table 12.46: Average Weekly Earnings and the CPI in Australia
Year Weekly Earnings ($) Consumer Price Index (1989-90 = 100)
1990 471.1 100.0
1991 485.6 106.0
1992 505.8 107.6
1993 518.7 107.9
1994 532.8 110.0
1995 549.0 112.8
1996 565.5 118.5
1997 578.9 120.0
1998 597.5 121.9
1999 612.7 124.1
Source: Australian Bureau of Statistics
One method of seeing whether earnings are increasing more quickly than prices is to calculate the fraction:
\[ \frac{\text{Average weekly earnings}}{\text{Consumer price index}} \times 100 \]
If average weekly earnings rise faster than the consumer price index then this fraction will get larger over time. If
average weekly earnings rise slower than the consumer price index then this fraction will get smaller over time.
A number of terms are used to describe the result of dividing a time series by a price index. A series that has been
divided by a price index in this way is said to be at constant year B prices, where B is the base year of the price
index used. The series is said to have been deflated by the price index. With earnings the original series is often
referred to as nominal income and the deflated series as real income.
The method is illustrated in the following example.
Example 12.16
Use the information in Table 12.47 below to construct an index of the earnings of Australian employees at constant
1989-1990 prices.
Year Average Weekly Earnings ($) Consumer Price Index (1989-90 = 100)
1990 471.1 100.0
1991 485.6 106.0
1992 505.8 107.6
1993 518.7 107.9
1994 532.8 110.0
1995 549.0 112.8
1996 565.5 118.5
1997 578.9 120.0
1998 597.5 121.9
1999 612.7 124.1
First calculate a time series of average weekly earnings at constant 1989-1990 prices by
\[ \text{Earnings at constant 1989-1990 prices} = \frac{\text{Average weekly earnings}}{\text{Consumer price index with base year 1989-1990}} \times 100 \]
Year Average Weekly Earnings ($) The CPI Earnings at 1989-1990 prices
1990 471.1 100.0 471.1 / 100.0 × 100 = 471.1
1991 485.6 106.0 485.6 / 106.0 × 100 = 458.1
1992 505.8 107.6 505.8 / 107.6 × 100 = 470.1
1993 518.7 107.9 518.7 / 107.9 × 100 = 480.7
1994 532.8 110.0 532.8 / 110.0 × 100 = 484.4
1995 549.0 112.8 549.0 / 112.8 × 100 = 486.7
1996 565.5 118.5 565.5 / 118.5 × 100 = 477.2
1997 578.9 120.0 578.9 / 120.0 × 100 = 482.4
1998 597.5 121.9 597.5 / 121.9 × 100 = 490.2
1999 612.7 124.1 612.7 / 124.1 × 100 = 493.7
Now scale the earnings at constant 1989-1990 prices to form an index.
Table 12.49: Average Weekly Earnings at Constant Prices
Year Average Weekly Earnings at 1989-1990 prices ($) Index of Average Weekly Earnings at 1989-1990 prices
1990 471.1 100.0
1991 458.1 97.2
1992 470.1 99.8
1993 480.7 102.0
1994 484.4 102.8
1995 486.7 103.3
1996 477.2 101.3
1997 482.4 102.4
1998 490.2 104.1
1999 493.7 104.8
Over the ten year period average weekly earnings at constant prices have only risen by 4.8%.
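The two steps of Example 12.16, deflating by the CPI and then scaling to an index, can be sketched in Python. The function names `deflate` and `make_index` are hypothetical, and only a few rows of Table 12.46 are used. The code carries unrounded intermediate values, so for other years the last digit may occasionally differ from a table built from rounded figures.

```python
def deflate(nominal, price_index):
    """Divide a nominal series by a price index (base = 100) to get real values."""
    return {year: nominal[year] / price_index[year] * 100 for year in nominal}


def make_index(series, base_period):
    """Scale a series so that its value in base_period is 100."""
    base_value = series[base_period]
    return {period: value / base_value * 100 for period, value in series.items()}


# Selected rows of Table 12.46: nominal weekly earnings ($) and the CPI.
earnings = {1990: 471.1, 1991: 485.6, 1995: 549.0, 1999: 612.7}
cpi = {1990: 100.0, 1991: 106.0, 1995: 112.8, 1999: 124.1}

real_earnings = deflate(earnings, cpi)
real_index = make_index(real_earnings, base_period=1990)

for year in earnings:
    print(year, round(real_earnings[year], 1), round(real_index[year], 1))
```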
Quick Question 11
12.16 Summary
In this unit you learned how to forecast the future values of a time series. To make forecasts the time series has to
be decomposed into a trend effect, a seasonal effect and a random effect. You learned how to estimate the trend
component using a linear and a quadratic trend line. You also learned how to estimate the seasonal effects using
both an additive and a multiplicative model. The residuals were examined to check on the appropriateness of the
estimated models.
In the second half of the unit index numbers were discussed. The methods used to construct index numbers were
described and applied to wages and price time series. The Australian Consumer Price Index was described and
you learned how to use a price index series to deflate a time series.
