
6/02/2014

Chapter 10
Sampling distributions

Introduction
In real life, calculating the parameters
of populations is prohibitive because
populations are very large.
Rather than investigating the whole
population, we take a sample, calculate
a statistic related to the parameter of
interest and make an inference.
The sampling distribution of the statistic
is the tool that tells us how close the
statistic is to the parameter.

Sampling Distributions
A sampling distribution is created by, as the
name suggests, sampling.
The method we will employ to derive the
sampling distribution uses the rules of
probability and the laws of expected value
and variance.
For example, consider the roll of one die and then of two dice.

10.1 Sampling Distribution of the Sample Mean
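Sampling distribution of a single die
A fair die is thrown infinitely many times,
with the random variable X = number of spots
showing on any throw.
The probability distribution of X is:

x      1    2    3    4    5    6
P(x)   1/6  1/6  1/6  1/6  1/6  1/6

and the mean and variance are calculated as:
\[ \mu = \sum x P(x) = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5 \]
\[ \sigma^2 = \sum (x - \mu)^2 P(x) = 35/12 \approx 2.92 \]
\[ \sigma = \sqrt{2.92} \approx 1.71 \]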
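The mean and variance of a single fair die can be checked with a few lines of Python. This is only a sketch; the variable names are illustrative, and exact fractions are used so that no rounding enters the calculation.

```python
from fractions import Fraction

# Probability distribution of one fair die: each face has probability 1/6.
faces = range(1, 7)
p = Fraction(1, 6)

# Mean: sum of x * P(x). Variance: sum of (x - mean)^2 * P(x).
mean = sum(x * p for x in faces)
variance = sum((x - mean) ** 2 * p for x in faces)
std_dev = float(variance) ** 0.5

print(mean, variance, round(std_dev, 2))  # 7/2 35/12 1.71
```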

Sampling Distribution of Two Dice
A sampling distribution of the sample mean is
created by looking at all samples of size n = 2
(i.e. two dice) and their means.
While there are 36 possible samples of size 2,
there are only 11 values for \(\bar{x}\), and some
(e.g. \(\bar{x} = 3.5\)) occur more frequently than
others (e.g. \(\bar{x} = 1\)).
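The 36 samples and their means can be enumerated directly. The sketch below assumes two fair dice thrown independently; the names `samples` and `dist` are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All ordered samples of size 2 from a fair die (36 equally likely outcomes).
samples = list(product(range(1, 7), repeat=2))

# Count how often each sample mean occurs, then convert counts to probabilities.
counts = Counter(Fraction(a + b, 2) for a, b in samples)
dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

# Mean and variance of the sampling distribution of the sample mean.
mean_xbar = sum(m * p for m, p in dist.items())
var_xbar = sum((m - mean_xbar) ** 2 * p for m, p in dist.items())

for m, p in dist.items():
    print(float(m), p)
```

The enumeration gives 11 distinct means, with 3.5 the most likely value and 1 and 6 the least likely.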

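The sampling distribution of \(\bar{x}\) is shown below:

\(\bar{x}\)      1.0   1.5   2.0   2.5   3.0   3.5   4.0   4.5   5.0   5.5   6.0
P(\(\bar{x}\))   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

Compare the distribution of X
with the sampling distribution of \(\bar{x}\).
As well, note that:
\[ \mu_{\bar{x}} = \mu_x = 3.5 \quad\text{and}\quad \sigma^2_{\bar{x}} = \sigma^2_x / 2 \approx 1.46 \]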

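Generalise
We can generalise the mean and variance of the
sampling distribution for two dice:
\[ \mu_{\bar{x}} = \mu_x \quad\text{and}\quad \sigma^2_{\bar{x}} = \sigma^2_x / 2 \]
to n dice:
\[ \mu_{\bar{x}} = \mu_x \quad\text{and}\quad \sigma^2_{\bar{x}} = \sigma^2_x / n \]
The standard deviation of
the sampling distribution of
the sample mean is called
the standard error:
\[ \sigma_{\bar{x}} = \sigma_x / \sqrt{n} \]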
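One way to see where the n in the denominator comes from is to apply the laws of expected value and variance to the sample mean. The derivation below is a sketch that assumes the n observations are independent, each with mean \(\mu\) and variance \(\sigma^2\); the subscripted \(X_i\) is notation introduced here for the individual observations.

```latex
\begin{aligned}
E(\bar{x}) &= E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
            = \frac{n\mu}{n} = \mu \\
V(\bar{x}) &= V\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n^2}\sum_{i=1}^{n} V(X_i)
            = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n} \\
\sigma_{\bar{x}} &= \sqrt{V(\bar{x})} = \frac{\sigma}{\sqrt{n}}
\end{aligned}
```

The second line relies on independence: without it the variance of the sum would also contain covariance terms.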

Notice that \(\sigma^2_{\bar{x}}\) is smaller
than \(\sigma^2_x\). The larger the sample
size, the smaller \(\sigma^2_{\bar{x}}\). Therefore,
\(\bar{x}\) tends to fall closer to \(\mu\) as the
sample size increases.

The variance of the sample mean is smaller
than the variance of the population.
Population: 1, 2, 3
Let us take samples of two observations.
The possible sample means are 1.5, 2 and 2.5.
Compare the variability of the population
to the variability of the sample mean.
[Figure: the population values 1, 2, 3 shown beside repeated samples of size two, whose means are 1.5, 2 and 2.5]

Also,
Expected value of the population = (1 + 2 + 3)/3 = 2
Expected value of the sample mean = (1.5 + 2 + 2.5)/3 = 2
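The comparison can be reproduced numerically. The sketch below assumes the samples of two observations are drawn without replacement from the population {1, 2, 3}, which is what the listed means 1.5, 2 and 2.5 suggest; the helper `pop_variance` is a name introduced here.

```python
from fractions import Fraction
from itertools import combinations

def pop_variance(values):
    """Population variance: average squared deviation from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

population = [Fraction(v) for v in (1, 2, 3)]

# Every sample of two distinct observations and its mean: 1.5, 2, 2.5.
sample_means = [Fraction(a + b, 2) for a, b in combinations((1, 2, 3), 2)]

pop_mean = sum(population) / len(population)
means_mean = sum(sample_means) / len(sample_means)

print(pop_mean, means_mean)                                   # both equal 2
print(pop_variance(population), pop_variance(sample_means))   # 2/3 versus 1/6
```

Both expected values equal 2, while the sample means vary much less than the population values do.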

Central Limit Theorem


If a random sample is drawn from a normal
population, then the sampling distribution of
the sample mean is normally distributed for
all values of n (sample size).
If a random sample is drawn from any
population, the sampling distribution of the
sample mean is approximately normal for a
sufficiently large sample size.



The larger the sample size, the more closely
the sampling distribution of \(\bar{x}\)
will resemble a normal distribution.
In most practical situations, a sample size of
30 may be sufficiently large to allow us to use
the normal distribution as an approximation
for the sampling distribution of \(\bar{x}\).
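A small simulation can illustrate the theorem. The sketch below assumes an exponential population with mean 1 (a strongly skewed distribution chosen purely for illustration, not taken from the slides) and compares the skewness of the sample means for n = 2 and n = 30. The function names and the seed are arbitrary.

```python
import random
from statistics import fmean, pstdev

def sample_means(n, reps, rng):
    """Draw `reps` samples of size n from an exponential(1) population
    and return the mean of each sample."""
    return [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

def skewness(values):
    """Average cubed standardised deviation; 0 for a symmetric distribution."""
    m, s = fmean(values), pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])

rng = random.Random(0)
summary = {}
for n in (2, 30):
    means = sample_means(n, reps=5000, rng=rng)
    summary[n] = (fmean(means), pstdev(means), skewness(means))
    print(n, [round(v, 3) for v in summary[n]])
```

With the larger sample size the means should centre on the population mean, have a standard deviation near 1/sqrt(30), and show much less skew than for n = 2.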

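Sampling Distribution of the Sample Mean
1. \(\mu_{\bar{x}} = \mu_x\)
2. \(\sigma^2_{\bar{x}} = \sigma^2_x / n\) and \(\sigma_{\bar{x}} = \sigma_x / \sqrt{n}\)
3. If X is normal, \(\bar{x}\) is normal. If X is non-normal, \(\bar{x}\) is approximately normal for a sufficiently large sample size.

We can standardise the sampling distribution of
the sample mean
as
\[ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]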

The summaries above assume that the population
is infinitely large. However, if the population is finite
the standard error is
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} \]
where N is the population size and
\(\sqrt{(N - n)/(N - 1)}\)
is the finite population correction factor.


If the population size is large relative to the
sample size the finite population correction
factor is close to 1 and can be ignored.
We will treat any population that is at least 20
times larger than the sample size as large.
In practice, most applications involve
populations that qualify as large.
As a consequence the finite population
correction factor is usually omitted.
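To see how little the correction matters for a large population, the sketch below computes the standard error with and without the finite population correction factor. The function name `standard_error` and the numbers (sigma = 100, n = 25, N = 500, i.e. a population 20 times the sample size) are illustrative assumptions.

```python
import math

def standard_error(sigma, n, population_size=None):
    """Standard error of the sample mean.

    If population_size (N) is given, apply the finite population
    correction factor sqrt((N - n) / (N - 1)).
    """
    se = sigma / math.sqrt(n)
    if population_size is None:
        return se
    return se * math.sqrt((population_size - n) / (population_size - 1))

infinite_se = standard_error(100, 25)
finite_se = standard_error(100, 25, population_size=500)
factor = finite_se / infinite_se

print(infinite_se, round(finite_se, 2), round(factor, 4))
```

At N = 20n the factor is already within about 2.5% of 1, which is why it is usually dropped.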

Example 10.1
The weight of each 32 g chocolate bar is
normally distributed with a mean of 32.2 g
and a standard deviation of 0.3 g.
Find the probability that, if a customer buys
one chocolate bar, that bar will weigh more
than 32 g.

Solution
The random variable X is the weight of a
chocolate bar.
\[ P(X > 32) = P\left(Z > \frac{32 - 32.2}{0.3}\right) = P(Z > -0.67) = 0.7486 \]
[Figure: normal curve centred at \(\mu = 32.2\); the area to the right of x = 32 is 0.7486]

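Find the probability that, if a customer buys
a pack of 4 bars, the mean weight of the 4
bars will be more than 32 g.

Solution
The random variable here is the mean
weight per chocolate bar, \(\bar{x}\). We want
\[ P(\bar{x} > 32) = P\left(Z > \frac{32 - 32.2}{0.3/\sqrt{4}}\right) = P(Z > -1.33) = 0.9082 \]
[Figure: the narrower sampling distribution of \(\bar{x}\), centred at \(\mu = 32.2\), gives an area of 0.9082 to the right of 32, compared with 0.7486 for a single bar]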
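Both probabilities in Example 10.1 can be checked with Python's standard library. This is a sketch using `statistics.NormalDist`; because it does not round z to two decimals as a printed table does, the results differ slightly from 0.7486 and 0.9082.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 32.2, 0.3   # population mean and standard deviation (grams)

# One bar: X is normal with standard deviation sigma.
p_single = 1 - NormalDist(mu, sigma).cdf(32)

# Pack of 4: the sample mean is normal with standard error sigma / sqrt(4).
p_pack = 1 - NormalDist(mu, sigma / sqrt(4)).cdf(32)

print(round(p_single, 4), round(p_pack, 4))
```

The mean of four bars is more likely to exceed 32 g than a single bar because its distribution is more tightly concentrated around 32.2 g.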


Example
The average weekly income of graduates
one year after graduation is $600. Suppose
the distribution of weekly income has a
standard deviation of $100.
1. What is the probability that 25 randomly selected graduates have an average weekly
income of less than $550?
2. If a random sample of 25 graduates actually
had an average weekly income of $550, what
would you conclude about the validity of the
claim that the average weekly income is
$600?

Solution
Let X be the weekly income of graduates one
year after graduation.
1.
\[ P(\bar{x} < 550) = P\left(Z < \frac{550 - 600}{100/\sqrt{25}}\right) = P(Z < -2.5) = 0.0062 \]
2. With \(\mu = 600\), the probability of having a
sample mean below 550 is very low (0.0062).
So the claim that the average weekly
income is $600 is probably unjustified.
It would be more reasonable to
assume that \(\mu\) is smaller than $600,
because then a sample mean of $550
becomes more probable.
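The probability in part 1 can be reproduced as follows. This is a sketch using `statistics.NormalDist`, and it assumes (as the solution does) that the mean of 25 incomes is at least approximately normal.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 600, 100, 25

# Sampling distribution of the mean: normal with standard error sigma / sqrt(n).
standard_error = sigma / sqrt(n)
p_below_550 = NormalDist(mu, standard_error).cdf(550)

print(standard_error, round(p_below_550, 4))  # 20.0 0.0062
```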

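Using Sampling Distributions for Inference
To make inferences about population parameters
we use sampling distributions.
The symmetry of the normal distribution along with
the sampling distribution of the mean lead to:
\[ P(-z_{.025} < Z < z_{.025}) = P(-1.96 < Z < 1.96) = .95 \]
\[ P\left(-1.96 < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = .95 \]
\[ P\left(\mu - 1.96\frac{\sigma}{\sqrt{n}} < \bar{x} < \mu + 1.96\frac{\sigma}{\sqrt{n}}\right) = .95 \]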

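[Figure: the standard normal distribution Z with an area of 0.025 in each tail beyond -1.96 and 1.96, and the corresponding normal distribution of \(\bar{x}\) with an area of 0.025 in each tail beyond \(\mu \pm 1.96\,\sigma/\sqrt{n}\)]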

Conclusion
There is a 95% chance that the sample
mean falls within the interval [560.8,
639.2] if the population mean is 600.
Since the sample mean was 550, the
population mean is probably not 600.
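The interval in the conclusion can be recomputed from the graduate income figures (mean 600, standard deviation 100, n = 25). This sketch obtains the 1.96 cut-off from `NormalDist.inv_cdf` rather than hard-coding it.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 600, 100, 25

z = NormalDist().inv_cdf(0.975)        # about 1.96: leaves 0.025 in each tail
margin = z * sigma / sqrt(n)
lower, upper = mu - margin, mu + margin

observed_mean = 550
inside = lower <= observed_mean <= upper

print(round(lower, 1), round(upper, 1), inside)  # 560.8 639.2 False
```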

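In general
\[ P\left(\mu - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \bar{x} < \mu + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \]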

Creating the Sampling Distribution by Computer Simulation
By producing data sets of random
numbers that come from given
distributions, we can verify probabilistic
and statistical characteristics.
We simulate a dice-tossing experiment
(creating the distribution of the
average).
Effects of an increasing sample size on
the distribution of the mean are shown.

Simulation of Dice Tossing
[Figure: histograms of the simulated sample means for n = 2, n = 5 and n = 10]
n = 2:   Mean = 3.486   Stand. dev. = 1.215
n = 5:   Mean = 3.495   Stand. dev. = 0.749
n = 10:  Mean = 3.494   Stand. dev. = 0.544
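A dice-tossing simulation of this kind can also be sketched in Python instead of a spreadsheet. The number of repetitions (10,000) and the seed are arbitrary choices made here, so the simulated values will not match the figures above exactly, but the means should sit near 3.5 and the standard deviations near 1.71/sqrt(n).

```python
import random
from statistics import fmean, pstdev

def simulate_means(n, reps=10_000, seed=1):
    """Throw n fair dice `reps` times and return the mean of each throw."""
    rng = random.Random(seed)
    return [fmean([rng.randint(1, 6) for _ in range(n)]) for _ in range(reps)]

# Simulated sampling distributions for the three sample sizes on the slide.
results = {n: simulate_means(n) for n in (2, 5, 10)}

for n, means in results.items():
    print(f"n = {n}: mean = {fmean(means):.3f}, stand. dev. = {pstdev(means):.3f}")
```

As n grows, the simulated standard deviation shrinks while the mean stays put, which is the pattern the histograms show.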

Excel: Creating a simulated distribution of the mean
Type in the variable values and the probability (type 1/6).
Create samples of size two.
Calculate the means.
Type the bin.
Create a histogram for the distribution of the mean.

10.2 Sampling Distribution of the Sample Proportion
The parameter of interest for nominal data
is the proportion of times a particular
outcome (success) occurs.
To estimate the population proportion p we
use the sample proportion \(\hat{p}\).
The number of successes X in n trials is binomial,
so the sampling distribution of \(\hat{p} = X/n\) follows
from the binomial distribution.
We prefer to use the normal approximation to
the binomial distribution to make
inferences about \(\hat{p}\).

Normal Approximation to Binomial
Binomial distribution with n = 20 and p = .5 with a normal
approximation superimposed (\(\mu = 10\) and \(\sigma = 2.24\)).
Where did these values come from?
From Section 7.6 we saw that:
\[ E(X) = np \quad\text{and}\quad V(X) = np(1 - p) \]
Hence:
\[ \mu = 20(.5) = 10 \]
and
\[ \sigma = \sqrt{20(.5)(1 - .5)} = \sqrt{5} \approx 2.24 \]

Normal approximation to the binomial
works best when
the number of experiments (sample size) is
large, and
the probability of success p is close to 0.5.

For the approximation to provide good results:
\( np \ge 5 \); \( n(1 - p) \ge 5 \)

To calculate P(X = 10) using
the normal distribution, we
can find the area under the
normal curve between 9.5
and 10.5:
\[ P(X = 10) \approx P(9.5 < Y < 10.5) \]
where Y is a normal random variable approximating
the binomial random variable X.
In fact, P(X = 10) = .176 while P(9.5 < Y < 10.5) = .1742,
so the approximation is quite good.

Example
Approximate the binomial probability
P(X = 10) when n = 20 and p = 0.5.
The parameters of the normal
distribution used to approximate the
binomial are:
\( \mu = np \); \( \sigma^2 = np(1 - p) \)

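Let us build a normal distribution to
approximate the binomial P(X = 10).
\( \mu = np = 20(0.5) = 10 \); \( \sigma^2 = np(1 - p) = 20(0.5)(1 - 0.5) = 5 \); \( \sigma = \sqrt{5} = 2.24 \)
The exact probability is P(X = 10) = 0.176.
The approximation is
\[ P(X_{\text{Binomial}} = 10) \approx P(9.5 < Y_{\text{Normal}} < 10.5) = P(-0.22 < Z < 0.22) = 0.1742 \]
[Figure: normal curve with the area between 9.5 and 10.5 shaded around 10]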
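The exact and approximate probabilities can be compared in a few lines. This sketch uses `math.comb` for the binomial probability and `statistics.NormalDist` for the approximation; since z is not rounded to 0.22, the normal area comes out slightly above the table-based .1742.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.5

# Exact binomial probability P(X = 10).
exact = comb(n, 10) * p ** 10 * (1 - p) ** (n - 10)

# Normal approximation with mu = np and sigma = sqrt(np(1 - p)),
# using the continuity-corrected interval 9.5 to 10.5.
approx_dist = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
approx = approx_dist.cdf(10.5) - approx_dist.cdf(9.5)

print(round(exact, 4), round(approx, 4))
```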

More Normal Approximation Exercises
\( P(X \le 8) \approx P(Y < 8.5) \)
\( P(X \ge 14) \approx P(Y > 13.5) \)
For large n the effect of the
continuity correction factor is
very small and will be omitted.
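The two tail approximations can be checked against the exact binomial sums. The sketch below assumes the same n = 20 and p = 0.5 as the earlier example (the exercises do not restate them) and also shows what happens at this small n if the continuity correction is left out.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.5
Y = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

def binom_pmf(k):
    """Exact binomial probability P(X = k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# P(X <= 8): exact, continuity-corrected, and uncorrected.
exact_le_8 = sum(binom_pmf(k) for k in range(0, 9))
approx_le_8 = Y.cdf(8.5)
uncorrected_le_8 = Y.cdf(8)

# P(X >= 14): exact and continuity-corrected.
exact_ge_14 = sum(binom_pmf(k) for k in range(14, n + 1))
approx_ge_14 = 1 - Y.cdf(13.5)

print(round(exact_le_8, 4), round(approx_le_8, 4), round(uncorrected_le_8, 4))
print(round(exact_ge_14, 4), round(approx_ge_14, 4))
```

For n = 20 the corrected value is much closer to the exact one than the uncorrected value, so omitting the correction is only reasonable when n is large.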

Approximate Sampling Distribution of a Sample Proportion
From the laws of expected value and variance, it
can be shown that \( E(\hat{p}) = p \) and \( V(\hat{p}) = p(1 - p)/n \).
If both \( np \ge 5 \) and \( n(1 - p) \ge 5 \), then
\[ Z = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}} \]
is approximately standard normally distributed.

Example
The Laurier company's brand has a market
share of 30%. In a survey, 1 000
consumers were asked which brand they
prefer. What is the probability that more
than 32% of the respondents say they
prefer the Laurier brand?

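Solution
The number of respondents who prefer Laurier
is binomial with n = 1000 and p = 0.30.
Also, np = 1000(0.3) = 300 > 5 and
n(1 - p) = 1000(1 - 0.3) = 700 > 5.
Therefore, \(\hat{p}\) is approximately normal with mean p = 0.30
and standard error
\[ \sqrt{p(1 - p)/n} = \sqrt{(0.3)(0.7)/1000} = 0.0145 \]
Hence
\[ P(\hat{p} > 0.32) = P\left(Z > \frac{0.32 - 0.30}{0.0145}\right) = P(Z > 1.38) = 0.0838 \]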
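The Laurier calculation can be verified numerically. This is a sketch with `statistics.NormalDist`; variable names are illustrative, and no continuity correction is applied, in line with the large sample size.

```python
from math import sqrt
from statistics import NormalDist

n, p = 1000, 0.30

# Check the conditions for the normal approximation.
conditions_ok = n * p >= 5 and n * (1 - p) >= 5

# Standard error of the sample proportion and the required tail probability.
standard_error = sqrt(p * (1 - p) / n)
p_more_than_32 = 1 - NormalDist(p, standard_error).cdf(0.32)

print(conditions_ok, round(standard_error, 4), round(p_more_than_32, 4))
```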

Sampling Distribution of the Difference between Two Means
The difference between two means can
become a parameter of interest when the
comparison between two populations is
studied.
To make an inference about \(\mu_1 - \mu_2\) we
observe the distribution of \(\bar{x}_1 - \bar{x}_2\).

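Applying the laws of expected value
and variance we have:
\[ E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2 \]
\[ V(\bar{x}_1 - \bar{x}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \]
The distribution of \(\bar{x}_1 - \bar{x}_2\) is normal
with mean \(\mu_1 - \mu_2\) and standard
deviation of \(\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}\)
if
the two samples are independent, and
the original populations are normally
distributed.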

If the original populations are not
normally distributed but the sample
sizes are 30 or more, the distribution
of \(\bar{x}_1 - \bar{x}_2\) is approximately normal.

Example
The mean salaries of MBA graduates from
two universities (1 and 2), after 5 years,
are $62 000 (stand. dev. = $14 500) and
$60 000 (stand. dev. = $18 300). What is
the probability that a sample mean of
University-1 graduates will exceed the
sample mean of University-2 graduates?
(n1 = 50; n2 = 60)
Solution
As the sample sizes are more than 30, the
distribution of \(\bar{x}_1 - \bar{x}_2\) is approximately
normal.

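\[ \mu_1 - \mu_2 = 62\,000 - 60\,000 = 2\,000 \]
\[ \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{14\,500^2}{50} + \frac{18\,300^2}{60}} \approx 3\,128 \]
\[ P(\bar{x}_1 - \bar{x}_2 > 0) = P\left(Z > \frac{0 - 2\,000}{3\,128}\right) = P(Z > -0.64) = 0.7389 \]
There is about a 74% chance that the sample mean
salary of University 1 graduates will exceed that of
University 2 graduates.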
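The two-sample calculation can be checked in the same way. This sketch assumes independent samples and uses the figures from the example; the unrounded z value gives a probability very slightly below the table-based 0.7389.

```python
from math import sqrt
from statistics import NormalDist

mu1, sigma1, n1 = 62_000, 14_500, 50   # University 1
mu2, sigma2, n2 = 60_000, 18_300, 60   # University 2

# Mean and standard deviation of the difference between the sample means.
mean_diff = mu1 - mu2
se_diff = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)

# Probability that sample mean 1 exceeds sample mean 2, i.e. difference > 0.
p_exceeds = 1 - NormalDist(mean_diff, se_diff).cdf(0)

print(mean_diff, round(se_diff, 1), round(p_exceeds, 4))
```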

From Here to Inference


In Chapters 7 and 8 we introduced
probability distributions, which allowed us
to make probability statements about
values of the random variable.
A prerequisite of this calculation is
knowledge of the distribution and the
relevant parameters.



In Example 7.9, we needed to know that
the probability that Pat Statsdud guesses
the correct answer is 20% (p = .2) and
that the number of correct answers
(successes) in 10 questions (trials) is a
binomial random variable.
We then could compute the probability of
any number of successes.



In Example 8.3, we needed to know that
the amount of time to assemble a
computer is normally distributed with a
mean of 60 minutes and a standard
deviation of 8 minutes.
These three bits of information allowed us
to calculate the probability of various
values of the random variable.



The figure below symbolically represents the use of
probability distributions.
Simply put, knowledge of the population and its
parameter(s) allows us to use the probability
distribution to make probability statements about
individual members of the population.

Probability distribution ----------> Individual


In this chapter we developed the sampling
distribution, wherein knowledge of the parameter(s)
and some information about the distribution allow
us to make probability statements about a sample
statistic.

Sampling distribution -----> Statistic


Statistics works by reversing the direction of
the flow of knowledge in the previous figure.
The next figure displays the character of
statistical inference.
Starting in Chapter 11, we will assume that
most population parameters are unknown. The
statistics practitioner will sample from the
population and compute the required statistic.
The sampling distribution of that statistic will
enable us to draw inferences about the
parameter.

Statistic ------> Parameter