PRBE002 Slides Session 05

6/02/2014
Chapter 10
Sampling distributions
Introduction
In real life, calculating the parameters
of populations is prohibitive because
populations are very large.
Rather than investigating the whole
population, we take a sample, calculate
a statistic related to the parameter of
interest and make an inference.
The sampling distribution of the statistic
is the tool that tells us how close the
statistic is to the parameter.
Sampling Distributions
A sampling distribution is created by, as the
name suggests, sampling.
The method we will employ to derive the
sampling distribution uses the rules of
probability and the laws of expected value
and variance.
For example, consider the roll of one die
and of two dice.
10.1 Sampling Distribution of the Sample Mean
Sampling distribution of a single die
A fair die is thrown infinitely many times,
with the random variable X = Number of spots
showing on any throw.
The probability distribution of X is:
x:     1    2    3    4    5    6
P(x):  1/6  1/6  1/6  1/6  1/6  1/6
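and the mean and variance are calculated as:
\[ \mu = \sum x P(x) = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5 \]
\[ \sigma^2 = \sum (x - \mu)^2 P(x) = 35/12 \approx 2.92 \]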
Sampling Distribution of Two Dice
A sampling distribution of the sample mean is
created by looking at all samples of size n = 2
(i.e. two dice) and their means.
While there are 36 possible samples of size 2,
there are only 11 values for \(\bar{x}\), and some
(e.g. \(\bar{x} = 3.5\)) occur more frequently than
others (e.g. \(\bar{x} = 1\)).
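The sampling distribution of \(\bar{x}\) is shown below:
\(\bar{x}\):     1     1.5   2     2.5   3     3.5   4     4.5   5     5.5   6
P(\(\bar{x}\)):  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
Compare the distribution of X
with the sampling distribution of \(\bar{x}\).
As well, note that:
\[ \mu_{\bar{x}} = \mu = 3.5 \]
\[ \sigma^2_{\bar{x}} = \sigma^2/2 = 35/24 \approx 1.46 \]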
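As a rough check on the dice figures, the sketch below enumerates all 36 samples of size 2 in Python with exact fractions and rebuilds the distribution of the sample mean. The variable names are illustrative and are not part of the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 6)  # each face of a fair die is equally likely

# Population (single die): mean and variance of X.
mu = sum(x * p for x in faces)
var = sum((x - mu) ** 2 * p for x in faces)

# All 36 equally likely samples of size 2 and their sample means.
means = [Fraction(a + b, 2) for a, b in product(faces, repeat=2)]
counts = Counter(means)
dist = {m: Fraction(c, len(means)) for m, c in sorted(counts.items())}

# Mean and variance of the sampling distribution of the sample mean.
mu_xbar = sum(m * pr for m, pr in dist.items())
var_xbar = sum((m - mu_xbar) ** 2 * pr for m, pr in dist.items())

for m, pr in dist.items():
    print(f"xbar = {float(m):.1f}  P(xbar) = {pr}")
print("mu =", mu, " sigma^2 =", var)
print("mu_xbar =", mu_xbar, " sigma^2_xbar =", var_xbar)
```

The probabilities print in reduced form (for example 1/18 rather than 2/36), and the last line shows that the variance of the sample mean is half the population variance.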
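Generalise
We can generalise the mean and variance of the
sampling distribution for two dice:
\[ \mu_{\bar{x}} = \mu, \quad \sigma^2_{\bar{x}} = \sigma^2/2 \]
to n dice:
\[ \mu_{\bar{x}} = \mu, \quad \sigma^2_{\bar{x}} = \sigma^2/n \]
The standard deviation of
the sampling distribution of
the sample mean is called
the standard error:
\[ \sigma_{\bar{x}} = \sigma/\sqrt{n} \]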
Notice that \(\sigma^2_{\bar{x}}\) is smaller
than \(\sigma^2\). The larger the sample
size, the smaller \(\sigma^2_{\bar{x}}\). Therefore,
\(\bar{x}\) tends to fall closer to \(\mu\) as the
sample size increases.
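To see how quickly the standard error shrinks, the following minimal sketch evaluates \(\sigma/\sqrt{n}\) for a fair die at several sample sizes. The function name and the chosen sample sizes are illustrative assumptions.

```python
import math

# Standard deviation of a single fair die: sqrt(35/12), about 1.708.
sigma = math.sqrt(35 / 12)


def standard_error(sigma, n):
    """Standard deviation of the sample mean for a sample of size n."""
    return sigma / math.sqrt(n)


for n in (1, 2, 5, 10, 30, 100):
    print(f"n = {n:3d}  standard error = {standard_error(sigma, n):.3f}")
```

Quadrupling the sample size halves the standard error, because n enters through its square root.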
The variance of the sample mean is smaller
than the variance of the population.
[Figure: a population with the values 1, 2 and 3, and the
means (1.5, 2 and 2.5) of samples of two observations.]
Let us take samples of two observations.
Compare the variability of the population
to the variability of the sample mean.
Also,
Expected value of the population = (1 + 2 + 3)/3 = 2
Expected value of the sample mean = (1.5 + 2 + 2.5)/3 = 2
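The comparison can be reproduced with a short Python sketch. It assumes the samples are drawn without replacement, which is inferred from the three sample means (1.5, 2 and 2.5) on the slide; the variable names are illustrative.

```python
from itertools import combinations
from statistics import fmean, pvariance

population = [1, 2, 3]

# Every sample of two distinct observations (assumed: without replacement).
samples = list(combinations(population, 2))
sample_means = [fmean(s) for s in samples]

print("samples:", samples)
print("sample means:", sample_means)
print("expected value of the population:", fmean(population))
print("expected value of the sample mean:", fmean(sample_means))
print("variance of the population:", pvariance(population))
print("variance of the sample mean:", pvariance(sample_means))
```

Both expected values equal 2, while the sample means are much less spread out than the population values.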
Central Limit Theorem

If a random sample is drawn from a normal
population, then the sampling distribution of
the sample mean is normally distributed for
all values of n (sample size).
If a random sample is drawn from any
population, the sampling distribution of the
sample mean is approximately normal for a
sufficiently large sample size.
The larger the sample size, the more closely
the sampling distribution of \(\bar{x}\)
will resemble a normal distribution.
In most practical situations, a sample size of
30 may be sufficiently large to allow us to use
the normal distribution as an approximation
for the sampling distribution of \(\bar{x}\).
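Sampling Distribution of the Sample Mean
1. \(\mu_{\bar{x}} = \mu\)
2. \(\sigma^2_{\bar{x}} = \sigma^2/n\) and \(\sigma_{\bar{x}} = \sigma/\sqrt{n}\)
3. If X is normal, \(\bar{x}\) is normal. If X is not normal,
\(\bar{x}\) is approximately normal for a sufficiently large sample size.

We can standardise the sampling distribution of
the sample mean as
\[ Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} \]
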
The summaries above assume that the population
is infinitely large. However, if the population is finite,
the standard error is
\[ \sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}} \]
where N is the population size and
\(\sqrt{(N - n)/(N - 1)}\) is the finite population correction factor.

If the population size is large relative to the
sample size the finite population correction
factor is close to 1 and can be ignored.
We will treat any population that is at least 20
times larger than the sample size as large.
In practice, most applications involve
populations that qualify as large.
As a consequence the finite population
correction factor is usually omitted.
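The effect of the finite population correction factor can be sketched in Python as follows. The values sigma = 100 and n = 25 and the population sizes are hypothetical, chosen only to show how the factor approaches 1; the function names are not from the slides.

```python
import math


def fpc(n, N):
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))


def standard_error(sigma, n, N=None):
    """Standard error of the mean; applies the correction when N is given."""
    se = sigma / math.sqrt(n)
    return se if N is None else se * fpc(n, N)


sigma, n = 100, 25  # hypothetical values for illustration
print(f"infinite population: {standard_error(sigma, n):.3f}")
for N in (50, 500, 5000, 500_000):
    print(f"N = {N:7d}  factor = {fpc(n, N):.4f}  "
          f"standard error = {standard_error(sigma, n, N):.3f}")
```

With N = 500, which is exactly 20 times the sample size, the factor is already about 0.976, so ignoring it changes the standard error by under 3%.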
Example 10.1
The weight of each 32g chocolate bar is
normally distributed with a mean of 32.2 g
and a standard deviation of 0.3 g.
Find the probability that, if a customer buys
one chocolate bar, that bar will weigh more
than 32 g.
Solution
The random variable X is the weight of a
chocolate bar.
\[ P(X > 32) = P\left(Z > \dfrac{32 - 32.2}{0.3}\right) = P(Z > -0.67) = 0.7486 \]
Find the probability that, if a customer buys
a pack of 4 bars, the mean weight of the 4
bars will be more than 32 g.
Solution
The random variable here is the mean
weight per chocolate bar, \(\bar{x}\). We want
\[ P(\bar{x} > 32) = P\left(Z > \dfrac{32 - 32.2}{0.3/\sqrt{4}}\right) = P(Z > -1.33) = 0.9082 \]
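Both chocolate bar probabilities can be checked with Python's standard library, as a sketch. `statistics.NormalDist` is used here instead of a printed z table, so the results differ slightly from the slide values, which round z to two decimals. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 32.2, 0.3  # population mean and standard deviation (grams)

# One bar: X is normal with mean mu and standard deviation sigma.
p_one_bar = 1 - NormalDist(mu, sigma).cdf(32)

# Mean of 4 bars: same mean, standard error sigma / sqrt(n).
n = 4
p_mean_of_four = 1 - NormalDist(mu, sigma / sqrt(n)).cdf(32)

print(f"P(X > 32)    = {p_one_bar:.4f}")
print(f"P(xbar > 32) = {p_mean_of_four:.4f}")
```

The mean of four bars is more likely to exceed 32 g than a single bar, because the sample mean varies less around 32.2 g.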
Example
The average weekly income of graduates
one year after graduation is $600. Suppose
the distribution of weekly income has a
standard deviation of $100.
1. What is the probability that 25 randomly-selected graduates have an average weekly
income of less than $550?
2. If a random sample of 25 graduates actually
had an average weekly income of $550, what
would you conclude about the validity of the
claim that the average weekly income is
$600?
Solution
Let X be the weekly income of graduates one
year after graduation.
1. \[ P(\bar{x} < 550) = P\left(Z < \dfrac{550 - 600}{100/\sqrt{25}}\right) = P(Z < -2.5) = 0.0062 \]
2. With \(\mu = 600\), the probability of having a
sample mean below 550 is very low (0.0062).
So the claim that the average weekly
income is $600 is probably unjustified.
It would be more reasonable to
assume that \(\mu\) is smaller than $600,
because then a sample mean of $550
becomes more probable.
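The graduate income probability can be verified with a few lines of Python. This is a sketch using `statistics.NormalDist`; the variable names are illustrative, and the normality of the sample mean for n = 25 is taken as given, as in the example.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 600, 100, 25

se = sigma / sqrt(n)                     # standard error of the mean
z = (550 - mu) / se                      # standardised value of 550
p_below_550 = NormalDist(mu, se).cdf(550)

print(f"standard error = {se}, z = {z}")
print(f"P(xbar < 550) = {p_below_550:.4f}")
```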
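Using Sampling Distributions for Inference
To make inferences about population parameters
we use sampling distributions.
The symmetry of the normal distribution along with
the sampling distribution of the mean lead to:
\[ P(-z_{.025} \le Z \le z_{.025}) = P\left(-1.96 \le \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95 \]
\[ P\left(\mu - 1.96\dfrac{\sigma}{\sqrt{n}} \le \bar{x} \le \mu + 1.96\dfrac{\sigma}{\sqrt{n}}\right) = 0.95 \]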
[Figure: the standard normal distribution Z with an area of 0.025
in each tail beyond -1.96 and 1.96, and the corresponding
normal distribution of \(\bar{x}\).]
Conclusion
There is a 95% chance that the sample
mean falls within the interval [560.8,
639.2] if the population mean is 600.
Since the sample mean was 550, the
population mean is probably not 600.
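In general
\[ P\left(\mu - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \bar{x} \le \mu + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \]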
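The interval for the sample mean can be computed for any probability level with a small helper, sketched below. The function name `central_interval` and its `alpha` parameter are assumptions introduced for illustration; the critical value comes from `NormalDist().inv_cdf` rather than a table.

```python
from math import sqrt
from statistics import NormalDist


def central_interval(mu, sigma, n, alpha=0.05):
    """Interval that contains the sample mean with probability 1 - alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    half_width = z * sigma / sqrt(n)
    return mu - half_width, mu + half_width


# Graduate income example: mu = 600, sigma = 100, n = 25.
low, high = central_interval(600, 100, 25)
print(f"95% interval for the sample mean: [{low:.1f}, {high:.1f}]")
print("Is a sample mean of 550 inside the interval?", low <= 550 <= high)
```

An observed sample mean of 550 falls outside this interval, which is the basis for doubting that the population mean is 600.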
Creating the Sampling Distribution by Computer Simulation
By producing data sets of random
numbers that come from given
distributions, we can verify probabilistic
and statistical characteristics.
We simulate a dice-tossing experiment
(creating the distribution of the
average).
Effects of an increasing sample size on
the distribution of the mean are shown.
Simulation of Dice Tossing
n = 2:   Mean = 3.486   Stand. dev. = 1.215
n = 5:   Mean = 3.495   Stand. dev. = 0.749
n = 10:  Mean = 3.494   Stand. dev. = 0.544
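A dice-tossing simulation of this kind can be sketched in Python as an alternative to a spreadsheet. The number of trials and the seed are arbitrary assumptions, so the output will be close to, but not identical with, the figures reported on the slide.

```python
import random
from statistics import fmean, pstdev


def simulate_sample_means(n, trials=10_000, seed=1):
    """Toss n fair dice `trials` times and return the mean of each toss."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [fmean([rng.randint(1, 6) for _ in range(n)])
            for _ in range(trials)]


results = {n: simulate_sample_means(n) for n in (2, 5, 10)}
for n, means in results.items():
    print(f"n = {n:2d}  mean = {fmean(means):.3f}  "
          f"stand. dev. = {pstdev(means):.3f}")
```

The simulated mean stays near 3.5 for every n, while the standard deviation falls roughly in line with \(\sigma/\sqrt{n}\).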
Excel: creating a simulated distribution of the mean
1. Type in the variable values and the probability (type 1/6).
2. Create samples of size two.
3. Calculate the means.
4. Type in the bins.
5. Create a histogram for the distribution of the mean.
10.2 Sampling Distribution of the Sample Proportion
The parameter of interest for nominal data
is the proportion of times a particular
outcome (success) occurs.
To estimate the population proportion p we
use the sample proportion \(\hat{p}\).
Because \(\hat{p} = X/n\), where the number of successes X
is binomial, the sampling distribution of \(\hat{p}\)
follows from the binomial distribution.
We prefer to use the normal approximation to
the binomial distribution to make
inferences about \(\hat{p}\).
Normal Approximation to Binomial
Binomial distribution with n = 20 and p = 0.5 with a normal
approximation superimposed (\(\mu = 10\) and \(\sigma = 2.24\)).
Where did these values come from?
From Section 7.6 we saw that:
\[ \mu = np, \quad \sigma^2 = np(1 - p) \]
Hence:
\[ \mu = 20(0.5) = 10 \]
and
\[ \sigma = \sqrt{20(0.5)(1 - 0.5)} = \sqrt{5} = 2.24 \]

Normal approximation to the binomial
works best when
the number of experiments (sample size) is
large, and
the probability of success p is close to 0.5.
For the approximation to provide good results:
\[ np \ge 5; \quad n(1 - p) \ge 5 \]

To calculate P(X = 10) using
the normal distribution, we
can find the area under the
normal curve between 9.5
and 10.5:
\[ P(X = 10) \approx P(9.5 < Y < 10.5) \]
where Y is a normal random variable approximating
the binomial random variable X.

In fact, P(X = 10) = 0.176 while
P(9.5 < Y < 10.5) = 0.1742, so
the approximation is quite good.
Example
Approximate the binomial probability
P(X = 10) when n = 20 and p = 0.5.
The parameters of the normal
distribution used to approximate the
binomial are:
\[ \mu = np; \quad \sigma^2 = np(1 - p) \]
Let us build a normal distribution to
approximate the binomial P(X = 10).
\[ \mu = np = 20(0.5) = 10; \quad \sigma^2 = np(1 - p) = 20(0.5)(1 - 0.5) = 5; \quad \sigma = \sqrt{5} = 2.24 \]
The exact probability is P(X = 10) = 0.176.
The approximation is
\[ P(X_{\text{Binomial}} = 10) \approx P(9.5 < Y_{\text{Normal}} < 10.5) \]
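The exact binomial probability and its normal approximation can be compared directly, as in the sketch below. The variable names are illustrative. Because the code does not round z to two decimals, the approximation comes out slightly different from the table-based 0.1742.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.5

# Exact binomial probability P(X = 10).
exact = comb(n, 10) * p**10 * (1 - p) ** (n - 10)

# Approximating normal variable Y with mu = np and sigma = sqrt(np(1 - p)).
Y = NormalDist(n * p, sqrt(n * p * (1 - p)))

# Continuity correction: area under the normal curve from 9.5 to 10.5.
approx = Y.cdf(10.5) - Y.cdf(9.5)

print(f"exact  P(X = 10)          = {exact:.4f}")
print(f"approx P(9.5 < Y < 10.5)  = {approx:.4f}")
```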
More Normal Approximation Exercises
\[ P(X \le 8) \approx P(Y < 8.5) \]
\[ P(X \ge 14) \approx P(Y > 13.5) \]
For large n the effect of the
continuity correction factor is
very small and will be omitted.
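The same check works for cumulative probabilities. The sketch below assumes these exercises use the same binomial distribution as the earlier example (n = 20, p = 0.5), which the slide does not state explicitly; the helper `binom_pmf` is a hypothetical name.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.5  # assumed: same binomial as the earlier example


def binom_pmf(k):
    """Exact binomial probability P(X = k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


Y = NormalDist(n * p, sqrt(n * p * (1 - p)))

exact_le_8 = sum(binom_pmf(k) for k in range(0, 9))
approx_le_8 = Y.cdf(8.5)             # continuity correction: 8 -> 8.5

exact_ge_14 = sum(binom_pmf(k) for k in range(14, n + 1))
approx_ge_14 = 1 - Y.cdf(13.5)       # continuity correction: 14 -> 13.5

print(f"P(X <= 8):  exact {exact_le_8:.4f}, approx {approx_le_8:.4f}")
print(f"P(X >= 14): exact {exact_ge_14:.4f}, approx {approx_ge_14:.4f}")
```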
Approximate Sampling Distribution of a Sample Proportion
From the laws of expected value and variance, it
can be shown that \(E(\hat{p}) = p\) and \(V(\hat{p}) = p(1 - p)/n\).
If both \(np \ge 5\) and \(n(1 - p) \ge 5\), then
\[ Z = \dfrac{\hat{p} - p}{\sqrt{p(1 - p)/n}} \]
is approximately standard normally distributed.

Example
The Laurier company's brand has a market
share of 30%. In a survey, 1 000
consumers were asked which brand they
prefer. What is the probability that more
than 32% of the respondents say they
prefer the Laurier brand?
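Solution
The number of respondents who prefer Laurier
is binomial with n = 1000 and p = 0.30.
Also, np = 1000(0.3) = 300 > 5 and
n(1 - p) = 1000(1 - 0.3) = 700 > 5.
Therefore, \(\hat{p}\) is approximately normal with mean p = 0.30
and standard error
\[ \sqrt{p(1 - p)/n} = \sqrt{(0.3)(0.7)/1000} = 0.0145 \]
Hence
\[ P(\hat{p} > 0.32) = P\left(Z > \dfrac{0.32 - 0.30}{0.0145}\right) = P(Z > 1.38) = 0.0838 \]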
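The Laurier calculation can be reproduced as a sketch in Python, including the check of the two conditions for the normal approximation. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 1000, 0.30

# Conditions for the normal approximation to be reasonable.
conditions_ok = n * p >= 5 and n * (1 - p) >= 5

# Standard error of the sample proportion.
se = sqrt(p * (1 - p) / n)

# P(p_hat > 0.32) under the approximating normal distribution.
prob = 1 - NormalDist(p, se).cdf(0.32)

print("conditions satisfied:", conditions_ok)
print(f"standard error  = {se:.4f}")
print(f"P(p_hat > 0.32) = {prob:.4f}")
```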
Sampling Distribution of the Difference between Two Means, \(\bar{x}_1 - \bar{x}_2\)
The difference between two means can
become a parameter of interest when the
comparison between two populations is
studied.
To make an inference about \(\mu_1 - \mu_2\) we
observe the distribution of \(\bar{x}_1 - \bar{x}_2\).
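Applying the laws of expected value
and variance we have:
\[ E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2 \]
\[ V(\bar{x}_1 - \bar{x}_2) = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2} \]
The distribution of \(\bar{x}_1 - \bar{x}_2\) is normal
with mean \(\mu_1 - \mu_2\) and standard
deviation of
\[ \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} \]
if
the two samples are independent and
the original populations are normally
distributed.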
If the original populations are not
normally distributed but the sample
sizes are 30 or more, the distribution
of \(\bar{x}_1 - \bar{x}_2\) is approximately normal.
Example
The mean salaries of MBA graduates from
two universities (1 and 2), after 5 years,
are $62 000 (stand. dev. = $14 500) and
$60 000 (stand. dev. = $18 300). What is
the probability that a sample mean of
University-1 graduates will exceed the
sample mean of University-2 graduates?
(n1 = 50; n2 = 60)
Solution
As the sample sizes are more than 30, the
distribution of \(\bar{x}_1 - \bar{x}_2\) is approximately
normal.
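\(\mu_1 - \mu_2\) = 62 000 - 60 000 = $2 000
\[ \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} = \sqrt{\dfrac{14500^2}{50} + \dfrac{18300^2}{60}} \approx 3128 \]
\[ P(\bar{x}_1 - \bar{x}_2 > 0) = P\left(Z > \dfrac{0 - 2000}{3128}\right) = P(Z > -0.64) = 0.7389 \]
There is about a 74% chance that the sample mean
salary of University 1 graduates will exceed that of
University 2 graduates.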
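The two-university example can be checked with the following sketch. It relies on the normal approximation for the difference of the sample means stated in the example; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu1, sigma1, n1 = 62_000, 14_500, 50  # University 1
mu2, sigma2, n2 = 60_000, 18_300, 60  # University 2

# Mean and standard error of the difference between the sample means.
mean_diff = mu1 - mu2
se_diff = sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# P(xbar1 - xbar2 > 0): sample mean 1 exceeds sample mean 2.
prob = 1 - NormalDist(mean_diff, se_diff).cdf(0)

print(f"mean of difference    = {mean_diff}")
print(f"standard error        = {se_diff:.1f}")
print(f"P(xbar1 - xbar2 > 0)  = {prob:.4f}")
```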
From Here to Inference

In Chapters 7 and 8 we introduced
probability distributions, which allowed us
to make probability statements about
values of the random variable.
A prerequisite of this calculation is
knowledge of the distribution and the
relevant parameters.

In Example 7.9, we needed to know that
the probability that Pat Statsdud guesses
the correct answer is 20% (p = .2) and
that the number of correct answers
(successes) in 10 questions (trials) is a
binomial random variable.
We then could compute the probability of
any number of successes.

In Example 8.3, we needed to know that
the amount of time to assemble a
computer is normally distributed with a
mean of 60 minutes and a standard
deviation of 8 minutes.
These three bits of information allowed us
to calculate the probability of various
values of the random variable.

The figure below symbolically represents the use of
probability distributions.
Simply put, knowledge of the population and its
parameter(s) allows us to use the probability
distribution to make probability statements about
individual members of the population.
[Figure: Population and parameter(s) -> Probability Distribution -> Individual]

In this chapter we developed the sampling
distribution, wherein knowledge of the parameter(s)
and some information about the distribution allow
us to make probability statements about a sample
statistic.
[Figure: Population and parameter(s) -> Sampling Distribution -> Statistic]

Statistics works by reversing the direction of
the flow of knowledge in the previous figure.
The next figure displays the character of
statistical inference.
Starting in Chapter 11, we will assume that
most population parameters are unknown. The
statistics practitioner will sample from the
population and compute the required statistic.
The sampling distribution of that statistic will
enable us to draw inferences about the
parameter.
[Figure: Statistic -> Sampling Distribution -> Parameter]
