
Lecture 1: Sampling Distributions

Introduction
• In real life, calculating the parameters of a population is usually prohibitive because populations are very large.
• Rather than investigating the whole population, we take a sample, calculate a statistic related to the parameter of interest, and make an inference.
• The sampling distribution of the statistic is the tool that tells us how close the statistic is to the parameter.

Sampling Distribution of the Mean
• An example
– A die is thrown infinitely many times. Let X represent the number of spots showing on any throw.
– The probability distribution of X is

x      1    2    3    4    5    6
p(x)   1/6  1/6  1/6  1/6  1/6  1/6

\[ E(X) = 1(1/6) + 2(1/6) + 3(1/6) + \dots + 6(1/6) = 3.5 \]
\[ V(X) = (1 - 3.5)^2(1/6) + (2 - 3.5)^2(1/6) + \dots + (6 - 3.5)^2(1/6) = 2.92 \]
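The expected value and variance of a single throw can be checked with a minimal Python sketch. It assumes a fair die; the names `faces`, `p`, `mean_x` and `var_x` are illustrative and do not come from the lecture. Exact fractions are used so that rounding only happens when printing.

```python
from fractions import Fraction

# Fair six-sided die: each face 1..6 has probability 1/6.
faces = range(1, 7)
p = Fraction(1, 6)

# E(X) = sum of x * p(x)
mean_x = sum(x * p for x in faces)

# V(X) = sum of (x - E(X))^2 * p(x)
var_x = sum((x - mean_x) ** 2 * p for x in faces)

print(float(mean_x))           # 3.5
print(round(float(var_x), 2))  # 2.92
```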
• Suppose we want to estimate \(\mu\) from \(\bar{x}\), the mean of a sample of size n = 2.
• What is the distribution that \(\bar{x}\) can follow?

 #  Sample  Mean      #  Sample  Mean      #  Sample  Mean
 1   1,1    1.0      13   3,1    2.0      25   5,1    3.0
 2   1,2    1.5      14   3,2    2.5      26   5,2    3.5
 3   1,3    2.0      15   3,3    3.0      27   5,3    4.0
 4   1,4    2.5      16   3,4    3.5      28   5,4    4.5
 5   1,5    3.0      17   3,5    4.0      29   5,5    5.0
 6   1,6    3.5      18   3,6    4.5      30   5,6    5.5
 7   2,1    1.5      19   4,1    2.5      31   6,1    3.5
 8   2,2    2.0      20   4,2    3.0      32   6,2    4.0
 9   2,3    2.5      21   4,3    3.5      33   6,3    4.5
10   2,4    3.0      22   4,4    4.0      34   6,4    5.0
11   2,5    3.5      23   4,5    4.5      35   6,5    5.5
12   2,6    4.0      24   4,6    5.0      36   6,6    6.0

Note: \(\mu_{\bar{x}} = \mu_x\) and \(\sigma^2_{\bar{x}} = \sigma^2_x / 2\) (the sample size is n = 2).

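\[ E(\bar{x}) = 1.0(1/36) + 1.5(2/36) + \dots = 3.5 \]
\[ V(\bar{x}) = (1.0 - 3.5)^2(1/36) + (1.5 - 3.5)^2(2/36) + \dots = 1.46 \]

[Figure: sampling distribution of \(\bar{x}\) for n = 2; horizontal axis \(\bar{x}\) from 1.0 to 6.0 in steps of 0.5, vertical axis probability from 1/36 to 6/36]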
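The sampling distribution of the mean for n = 2 can be reproduced by enumerating all 36 equally likely ordered samples. The following is a sketch under that assumption; the variable names (`samples`, `counts`, `dist`, `mean_xbar`, `var_xbar`) are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered samples of size n = 2 from a fair die.
samples = list(product(range(1, 7), repeat=2))

# Count how many samples produce each value of the sample mean.
counts = Counter(Fraction(a + b, 2) for a, b in samples)

# Probability of each sample mean, in increasing order of the mean.
dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

mean_xbar = sum(m * p for m, p in dist.items())
var_xbar = sum((m - mean_xbar) ** 2 * p for m, p in dist.items())

for m in dist:
    print(f"{float(m):.1f}  {counts[m]}/36")

print(float(mean_xbar))           # 3.5
print(round(float(var_xbar), 2))  # 1.46
```

The variance of the sample mean comes out as exactly half the variance of a single throw (35/24 versus 35/12).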
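n = 5:   \(\mu_{\bar{x}} = 3.5\),  \(\sigma^2_{\bar{x}} = .5833\)  \((= \sigma^2_x / 5)\)
n = 10:  \(\mu_{\bar{x}} = 3.5\),  \(\sigma^2_{\bar{x}} = .2917\)  \((= \sigma^2_x / 10)\)
n = 25:  \(\mu_{\bar{x}} = 3.5\),  \(\sigma^2_{\bar{x}} = .1167\)  \((= \sigma^2_x / 25)\)

[Figure: sampling distributions of \(\bar{x}\) for n = 5, 10 and 25, each drawn on a horizontal axis from 1 to 6]

Notice that \(\sigma^2_{\bar{x}}\) is smaller than \(\sigma^2_x\). The larger the sample size, the smaller \(\sigma^2_{\bar{x}}\). Therefore, \(\bar{x}\) tends to fall closer to \(\mu\) as the sample size increases.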
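The three variances follow from dividing the variance of a single throw by the sample size. A short sketch, assuming the fair-die variance of 35/12 (about 2.92); the dictionary name `var_xbar` is an illustrative choice.

```python
from fractions import Fraction

# Variance of a single throw of a fair die: 35/12 (about 2.92).
var_x = Fraction(35, 12)

# Variance of the sample mean for each sample size: var_x / n.
var_xbar = {n: var_x / n for n in (5, 10, 25)}

for n, v in var_xbar.items():
    print(n, round(float(v), 4))
# 5 0.5833
# 10 0.2917
# 25 0.1167
```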
The Central Limit Theorem
– If a random sample is drawn from any population, the sampling distribution of the sample mean is approximately normal for a sufficiently large sample size.
– The larger the sample size, the more closely the sampling distribution of \(\bar{x}\) will resemble a normal distribution.
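A small simulation can illustrate the theorem. This is only a sketch: the die population, the sample size of 25, the 20,000 repetitions and the fixed seed are all assumptions chosen for illustration, and the helper `sample_mean` is hypothetical.

```python
import random
from statistics import fmean, pvariance

random.seed(0)  # fixed seed so the sketch is repeatable


def sample_mean(n):
    """Mean of n independent throws of a fair die."""
    return fmean(random.randint(1, 6) for _ in range(n))


n, trials = 25, 20_000
means = [sample_mean(n) for _ in range(trials)]

center = fmean(means)     # should be close to 3.5
spread = pvariance(means)  # should be close to 2.92 / 25
sd = spread ** 0.5

# Share of sample means within 1.96 standard deviations of the center;
# for an approximately normal distribution this is close to 0.95.
within = sum(abs(m - center) <= 1.96 * sd for m in means) / trials

print(round(center, 2), round(spread, 3), round(within, 3))
```

Although a single throw is uniformly distributed, the simulated sample means cluster around 3.5 and roughly 95% of them should fall within 1.96 standard deviations of it, as a normal distribution would predict.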
The Sampling Distribution of the Sample Mean
1. \(\mu_{\bar{x}} = \mu_x\)
2. \(\sigma^2_{\bar{x}} = \sigma^2_x / n\)
3. If x is normal, \(\bar{x}\) is normal. If x is nonnormal, \(\bar{x}\) is approximately normally distributed for a sufficiently large sample size.

• Example 8.1
– The amount of soda pop in each bottle is normally distributed with a mean of 32.2 ounces and a standard deviation of .3 ounces.
– Find the probability that a bottle bought by a customer will contain more than 32 ounces.
– Solution
  • The random variable X is the amount of soda in a bottle.

\[ P(x > 32) = P\left(\frac{x - \mu}{\sigma_x} > \frac{32 - 32.2}{.3}\right) = P(z > -.67) = 0.7486 \]

[Figure: normal curve with \(\mu = 32.2\); the area to the right of x = 32 is 0.7486]
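– Find the probability that a carton of four bottles will have a mean of more than 32 ounces of soda per bottle.
– Solution
  • The random variable here is the mean amount of soda per bottle.

\[ P(\bar{x} > 32) = P\left(\frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} > \frac{32 - 32.2}{.3/\sqrt{4}}\right) = P(z > -1.33) = 0.9082 \]

[Figure: distributions of x and \(\bar{x}\), both centered at 32.2; the area to the right of 32 is 0.7486 for x and 0.9082 for \(\bar{x}\)]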
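Both probabilities in Example 8.1 can be checked with Python's standard-library `statistics.NormalDist`. This is a sketch; the variable names are illustrative. Because the code does not round z to two decimals, the results differ slightly from the table-based values 0.7486 and 0.9082.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 32.2, 0.3

# Single bottle: X is normal with mean 32.2 and standard deviation 0.3.
p_single = 1 - NormalDist(mu, sigma).cdf(32)

# Mean of a carton of n = 4 bottles: the standard deviation of the
# sample mean is sigma / sqrt(n).
n = 4
p_carton = 1 - NormalDist(mu, sigma / sqrt(n)).cdf(32)

print(round(p_single, 4), round(p_carton, 4))
```

The carton probability is higher because the sample mean varies less than a single bottle does.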
• Example 8.2
– The average weekly income of graduates one year after graduation is $600.
– Suppose the distribution of weekly income has a standard deviation of $100. What is the probability that 25 randomly selected graduates have an average weekly income of less than $550?
– Solution

\[ P(\bar{x} < 550) = P\left(\frac{\bar{x} - \mu}{\sigma_{\bar{x}}} < \frac{550 - 600}{100/\sqrt{25}}\right) = P(z < -2.5) = 0.0062 \]
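The calculation in Example 8.2 can be sketched in Python as follows; the names `se`, `z` and `p_below` are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 600, 100, 25

se = sigma / sqrt(n)           # standard deviation of the sample mean: 20
z = (550 - mu) / se            # standardized value: -2.5
p_below = NormalDist().cdf(z)  # P(Z < -2.5)

print(z, round(p_below, 4))    # -2.5 0.0062
```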
– If a random sample of 25 graduates actually had an average weekly income of $550, what would you conclude about the validity of the claim that the average weekly income is $600?
– Solution
  • With \(\mu = 600\), the probability of observing a sample mean of $550 or less is very low (0.0062). The claim that the average weekly income is $600 is probably unjustified.
  • It is more reasonable to assume that \(\mu\) is smaller than $600, because then a sample mean of $550 becomes more probable.
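[Figure: standard normal distribution of Z with an area of .025 in each tail, below -1.96 and above 1.96; the corresponding normal distribution of \(\bar{x}\) with an area of .025 in each tail, below \(\mu - 1.96\,\sigma/\sqrt{n}\) and above \(\mu + 1.96\,\sigma/\sqrt{n}\)]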
In general

\[ P\left(\mu - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \bar{x} < \mu + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \]
Substituting \(\mu = 600\), \(\sigma = 100\), and n = 25 from Example 8.2:

\[ P\left(600 - 1.96\frac{100}{\sqrt{25}} < \bar{x} < 600 + 1.96\frac{100}{\sqrt{25}}\right) = .95 \]

which reduces to

\[ P(560.8 < \bar{x} < 639.2) = .95 \]

• Conclusion
– There is a 95% chance that the sample mean falls within the interval [560.8, 639.2] if the population mean is 600.
– Since the sample mean was 550, the population mean is probably not 600.
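The interval can be computed for any confidence level with a small helper. This is a sketch: the function name `sample_mean_interval` and its parameters are hypothetical, and `NormalDist().inv_cdf` supplies the z value (about 1.96 for 95%).

```python
from math import sqrt
from statistics import NormalDist


def sample_mean_interval(mu, sigma, n, alpha=0.05):
    """Central (1 - alpha) interval for the sample mean, given mu and sigma."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z value with alpha/2 in each tail
    half_width = z * sigma / sqrt(n)
    return mu - half_width, mu + half_width


low, high = sample_mean_interval(600, 100, 25)

print(round(low, 1), round(high, 1))  # 560.8 639.2
print(low <= 550 <= high)             # False
```

The observed sample mean of 550 lies outside the interval, which matches the conclusion above.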
Sampling Distribution of a Proportion
• The parameter of interest for qualitative data is the proportion of times a particular outcome (success) occurs.
• To estimate the population proportion p we use the sample proportion \(\hat{p}\).
• The sampling distribution of \(\hat{p}\) is binomial: \(\hat{p} = X/n\), where the number of successes X is binomially distributed.
• We prefer to use the normal approximation to the binomial distribution to make inferences about \(\hat{p}\).
Normal approximation to the Binomial
– The normal approximation to the binomial works best when
  • the number of experiments (sample size) is large, and
  • the probability of success, p, is close to 0.5.
– For the approximation to provide good results: np > 5 and n(1 - p) > 5.
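Let us build a normal distribution to approximate the binomial probability P(X = 10), with n = 20 and p = .5.

\(\mu = np = 20(.5) = 10\); \(\sigma^2 = np(1 - p) = 20(.5)(1 - .5) = 5\); \(\sigma = \sqrt{5} = 2.24\)

The exact probability is P(X = 10) = .176.

The approximation:

\[ P(X_{Binomial} = 10) \approx P(9.5 < Y_{Normal} < 10.5) = P\left(\frac{9.5 - 10}{2.24} < Z < \frac{10.5 - 10}{2.24}\right) = .1742 \]

[Figure: normal curve centered at 10 with the area between 9.5 and 10.5 shaded]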
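The exact binomial probability and its normal approximation can be compared with a short Python sketch; the names `exact`, `y` and `approx` are illustrative. The approximation computed here is about .177 rather than .1742, because the code does not round z to two decimals as a table lookup does.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.5

# Exact binomial probability P(X = 10).
exact = comb(n, 10) * p**10 * (1 - p) ** (n - 10)

# Normal approximation with the continuity correction: P(9.5 < Y < 10.5),
# where Y is normal with mean np and variance np(1 - p).
y = NormalDist(n * p, sqrt(n * p * (1 - p)))
approx = y.cdf(10.5) - y.cdf(9.5)

print(round(exact, 4), round(approx, 4))
```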
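• More normal approximation exercises

\(P(X \le 8) \approx P(Y < 8.5)\)
\(P(X \ge 14) \approx P(Y > 13.5)\)

[Figures: normal curves with the area to the left of 8.5 shaded and with the area to the right of 13.5 shaded]

For large n the effect of the continuity correction factor is very small and will be omitted.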
Approximate sampling distribution of a sample proportion \(\hat{p}\)

– From the laws of expected value and variance, it can be shown that \(E(\hat{p}) = p\) and \(V(\hat{p}) = p(1 - p)/n\).
– If both np > 5 and n(1 - p) > 5, then

\[ z = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}} \]

is approximately standard normally distributed.
• Example 8.3
– The Laurier company’s brand has a market share of 30%. In a survey, 1000 consumers were asked which brand they prefer.
– What is the probability that more than 32% of all the respondents say they prefer the Laurier brand?
– Solution
  • The number of respondents who prefer Laurier is binomial with n = 1000 and p = .30. Also, np = 1000(.3) = 300 > 5 and n(1 - p) = 1000(1 - .3) = 700 > 5.

\[ P(\hat{p} > .32) = P\left(\frac{\hat{p} - p}{\sqrt{p(1 - p)/n}} > \frac{.32 - .30}{.01449}\right) = P(z > 1.38) = .0838 \]
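Example 8.3 can be checked with the following sketch, which omits the continuity correction as the lecture does for large n. The names `rule_ok`, `se`, `z` and `p_more` are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.30, 1000

# Rule of thumb for the normal approximation: np > 5 and n(1 - p) > 5.
rule_ok = n * p > 5 and n * (1 - p) > 5

se = sqrt(p * (1 - p) / n)        # standard deviation of p-hat, about .01449
z = (0.32 - p) / se               # standardized value, about 1.38
p_more = 1 - NormalDist().cdf(z)  # P(p-hat > .32), roughly .084

print(rule_ok, round(se, 5), round(z, 2))
print(p_more)
```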
Sampling Distribution of the Difference Between Two Means
• The difference between two means can become a parameter of interest when the comparison between two populations is studied.
• To make an inference about \(\mu_1 - \mu_2\) we observe the distribution of \(\bar{x}_1 - \bar{x}_2\).
• Applying the laws of expected value and variance we have:

\[ E(\bar{x}_1 - \bar{x}_2) = E(\bar{x}_1) - E(\bar{x}_2) = \mu_1 - \mu_2 \]
\[ V(\bar{x}_1 - \bar{x}_2) = V(\bar{x}_1) + V(\bar{x}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \]

• The distribution of \(\bar{x}_1 - \bar{x}_2\) is normal with mean \(\mu_1 - \mu_2\) and standard deviation \(\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}\) if
– the two samples are independent, and
– the original populations are normally distributed.
• If the original populations are not normally distributed but the sample sizes are 30 or more, the distribution of \(\bar{x}_1 - \bar{x}_2\) is approximately normal.
• Example 8.4
– The mean starting salaries of MBA students from two universities (WLU and UWO) are $62,000 (stand. dev. = $14,500) and $60,000 (stand. dev. = $18,300).
– What is the probability that a sample mean of WLU students will exceed the sample mean of UWO students? (\(n_{WLU} = 50\); \(n_{UWO} = 60\))
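• Solution

\[ \mu_1 - \mu_2 = 62,000 - 60,000 = \$2,000 \]

\[ \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{14,500^2}{50} + \frac{18,300^2}{60}} = \$3,128 \]

\[ P(\bar{x}_1 - \bar{x}_2 > 0) = P\left(\frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} > \frac{0 - 2000}{3128}\right) = P(z > -.64) = .5 + .2389 = .7389 \]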
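Example 8.4 can be verified with a short Python sketch; the variable names are illustrative. Since z is not rounded to -.64 here, the probability comes out near .7387 instead of the table-based .7389.

```python
from math import sqrt
from statistics import NormalDist

mu1, sd1, n1 = 62_000, 14_500, 50  # WLU
mu2, sd2, n2 = 60_000, 18_300, 60  # UWO

# Mean and standard deviation of the difference between the sample means,
# assuming the two samples are independent.
diff_mean = mu1 - mu2
diff_sd = sqrt(sd1**2 / n1 + sd2**2 / n2)

# P(x1_bar - x2_bar > 0)
p_exceeds = 1 - NormalDist(diff_mean, diff_sd).cdf(0)

print(diff_mean, round(diff_sd), round(p_exceeds, 4))
```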