MAT 540 Statistical Concepts For Research

Chapter 7 VARIATION IN REPEATED SAMPLES: SAMPLING DISTRIBUTIONS
7.1 (a) Statistic (b) Statistic (c) Parameter (d) Parameter (e) Statistic
7.2 (a) The 41 movies receiving an award is a statistic. The population to which these belong is the collection of all movies in 2008.
(b) The 400 minority persons are a sample from the population of all minority persons living in Chicago. The number out of work, 41, among the 400 is a statistic.
(c) The 100 dog owners constitute a sample of the population of dog owners who applied for licenses in northern Wisconsin. The number who owned a Labrador retriever, 18, among the 100 is a statistic.
7.3 (a) Students working full time would be more likely to take an evening course than students who do not work, and so they would be over-represented in the sample. Also, those who work the evening shift are left out of the survey.
(b) A large majority of persons, at all income levels, spend more during the holiday season. The total amount spent and the types of purchases are often atypical. The survey would likely be misleading.
7.4 (a) The question is worded so as to favor the answer yes.
(b) The eighth grade girls did not presently have boys in their classroom, so they may have very different opinions than girls who were already in classrooms with boys. (This example is real: people tried to reach general conclusions from such a survey.)
7.5 (a) All possible samples $(x_1, x_2)$ and the corresponding values of $\bar{x} = (x_1 + x_2)/2$ are:

$(x_1, x_2)$   $\bar{x}$
(3,3)          3
(3,5)          4
(3,7)          5
(5,3)          4
(5,5)          5
(5,7)          6
(7,3)          5
(7,5)          6
(7,7)          7

(b) The 9 possible samples are equally likely, so each has a probability 1/9 of occurring. The sampling distribution of $\bar{X}$ is obtained by listing the distinct values of $\bar{X}$ along with the corresponding probabilities, as follows:

$\bar{x}$                  3     4     5     6     7     Total
Probability $f(\bar{x})$   1/9   2/9   3/9   2/9   1/9   1
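The enumeration in Exercise 7.5 can be reproduced mechanically. The following is a minimal sketch, assuming sampling with replacement from equally likely values; the function name `sampling_distribution_of_mean` is an illustrative choice, not something from the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sampling_distribution_of_mean(values, n):
    """Exact distribution of the sample mean for n draws with replacement,
    assuming every value in `values` is equally likely."""
    samples = list(product(values, repeat=n))
    weight = Fraction(1, len(samples))  # each ordered sample is equally likely
    dist = Counter()
    for sample in samples:
        dist[Fraction(sum(sample), n)] += weight
    return dict(sorted(dist.items()))


# Exercise 7.5: population values 3, 5, 7 and samples of size 2.
dist_7_5 = sampling_distribution_of_mean([3, 5, 7], 2)
for mean, prob in dist_7_5.items():
    print(mean, prob)
```

Exact fractions are used so that the probabilities can be compared directly with the table; note that 3/9 is shown in lowest terms as 1/3.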
7.6 (a) The table below lists the 9 possible samples $(x_1, x_2)$, along with the corresponding values of $\bar{X}$ and $S^2$. Since $n = 2$, the sample mean and sample variance for each sample are calculated using these formulae:

$\bar{x} = \frac{x_1 + x_2}{2}$,  $s^2 = \frac{1}{n-1} \sum (x_i - \bar{x})^2 = (x_1 - \bar{x})^2 + (x_2 - \bar{x})^2$

For example, for $(x_1, x_2) = (2, 4)$, we have:

$\bar{x} = \frac{2 + 4}{2} = 3$,  $s^2 = (2 - 3)^2 + (4 - 3)^2 = 2$

$(x_1, x_2)$   $\bar{x}$   $s^2$
(0,0)          0           0
(0,2)          1           2
(0,4)          2           8
(2,0)          1           2
(2,2)          2           0
(2,4)          3           2
(4,0)          2           8
(4,2)          3           2
(4,4)          4           0

(b) The 9 possible samples are equally likely, so each has a probability 1/9 of occurring. The sampling distributions of $\bar{X}$ and $S^2$ are obtained by listing the distinct values of $\bar{X}$ and $S^2$ along with the corresponding probabilities, as follows:

$\bar{x}$     0     1     2     3     4     Total
Probability   1/9   2/9   3/9   2/9   1/9   1

$s^2$         0     2     8     Total
Probability   3/9   4/9   2/9   1
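The sampling distribution of $S^2$ in Exercise 7.6 can be checked in the same way by enumerating the samples and applying the statistic to each one. This is a sketch under the same equally-likely, with-replacement assumption; `sampling_distribution` and `sample_variance` are hypothetical helper names.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sample_variance(sample):
    """Sample variance with divisor n - 1, computed exactly."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((x - mean) ** 2 for x in sample) / (n - 1)


def sampling_distribution(values, n, statistic):
    """Exact distribution of `statistic` over all equally likely ordered samples."""
    samples = list(product(values, repeat=n))
    weight = Fraction(1, len(samples))
    dist = Counter()
    for sample in samples:
        dist[statistic(sample)] += weight
    return dict(sorted(dist.items()))


# Exercise 7.6: population values 0, 2, 4 and samples of size 2.
dist_s2 = sampling_distribution([0, 2, 4], 2, sample_variance)
for s2, prob in dist_s2.items():
    print(s2, prob)
```

Passing the statistic as a function means the same enumeration also works for other statistics, such as the median or the range.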
7.7 Not a random sample. The photographer would show his better pictures when trying to get a contract for wedding pictures.
7.8 Upon first inspection, it seems that the sample will be representative. But a moment's reflection reveals that it will be biased toward times that are too large, on average. Consider a constant stream of customers. Those taking a long time to check out are more likely to be included in the sample than those with short checkout times. (Length-biased sampling is a somewhat common departure from random sampling.)
7.9 (a) X = 1 if 1 or 2 dots show
X = 2 if 3, 4, or 5 dots show
X = 4 if 6 dots show
(b) Answers will vary. We obtained the following data from our experiment: Roll 1 = 3 (so X = 2), Roll 2 = 4 (so X = 2), Roll 3 = 4 (so X = 2). The median X value is therefore 2.
(c) Answers will vary. The data we collected for the 75 rolls of the die, grouped into 25 samples of size 3, are tabulated below, along with the corresponding values of X and the median of X in each case.

Rolls     Values of X   Median
(3,4,4)   (2,2,2)       2
(2,1,6)   (1,1,4)       1
(6,1,1)   (4,1,1)       1
(3,3,4)   (2,2,2)       2
(6,1,6)   (4,1,4)       4
(5,3,2)   (2,2,1)       2
(2,2,6)   (1,1,4)       1
(3,1,3)   (2,1,2)       2
(3,4,1)   (2,2,1)       2
(6,2,2)   (4,1,1)       1
(3,5,5)   (2,2,2)       2
(3,6,4)   (2,4,2)       2
(4,4,3)   (2,2,2)       2
(2,2,2)   (1,1,1)       1
(5,3,2)   (2,2,1)       2
(1,1,2)   (1,1,1)       1
(6,4,6)   (4,2,4)       4
(5,5,2)   (2,2,1)       2
(1,1,6)   (1,1,4)       1
(3,3,5)   (2,2,2)       2
(4,1,2)   (2,1,1)       1
(5,1,1)   (2,1,1)       1
(3,2,3)   (2,1,2)       2
(3,1,5)   (2,1,2)       2
(2,4,4)   (1,2,2)       2

The relative frequency distribution is:

X       Frequency   Relative Frequency    Population Relative Frequency
1       29          29/75 = 0.38667       0.33333
2       36          36/75 = 0.48000       0.50000
4       10          10/75 = 0.13333       0.16667
Total   75          1.0                   1.0

The sample relative frequency distribution and the population relative frequency distribution should be close since the sample size of 75 is relatively large. In fact, as the sample size increases, the sample relative frequency distribution should better approximate the population relative frequency distribution.
(d) The sample median values are also listed in the table in part (c). The relative frequency distribution for the median values is as follows:

Median   Frequency   Relative Frequency
1        9           9/25 = 0.36
2        14          14/25 = 0.56
4        2           2/25 = 0.08
Total    25          1.0
This distribution does approximate the sampling distribution of the median, since the 25 medians are 25 independent observations of the median of a sample of size 3. As the number of samples increases, the relative frequency distribution of the medians should better approximate that sampling distribution.
7.10 (a) See the solution to Exercise 7.9(c).
(b) The following is the table from Exercise 7.9(c) with an additional column for Y.

Rolls     Values of X   Median   Y
(3,4,4)   (2,2,2)       2        0
(2,1,6)   (1,1,4)       1        1
(6,1,1)   (4,1,1)       1        1
(3,3,4)   (2,2,2)       2        0
(6,1,6)   (4,1,4)       4        1
(5,3,2)   (2,2,1)       2        1
(2,2,6)   (1,1,4)       1        1
(3,1,3)   (2,1,2)       2        1
(3,4,1)   (2,2,1)       2        1
(6,2,2)   (4,1,1)       1        1
(3,5,5)   (2,2,2)       2        0
(3,6,4)   (2,4,2)       2        0
(4,4,3)   (2,2,2)       2        0
(2,2,2)   (1,1,1)       1        1
(5,3,2)   (2,2,1)       2        1
(1,1,2)   (1,1,1)       1        1
(6,4,6)   (4,2,4)       4        0
(5,5,2)   (2,2,1)       2        1
(1,1,6)   (1,1,4)       1        1
(3,3,5)   (2,2,2)       2        0
(4,1,2)   (2,1,1)       1        1
(5,1,1)   (2,1,1)       1        1
(3,2,3)   (2,1,2)       2        1
(3,1,5)   (2,1,2)       2        1
(2,4,4)   (1,2,2)       2        1

The relative frequency distribution for Y is as follows:

Y       Frequency   Relative Frequency
0       7           7/25 = 0.28
1       18          18/25 = 0.72
Total   25          1.0

From our data, P[Y = 1] = 0.72.
7.11 We have $\mu = 83$ and $\sigma = 38$.
(a) For $n = 4$, $E(\bar{X}) = \mu = 83$ and $\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{38}{\sqrt{4}} = 19$.
(b) For $n = 25$, $E(\bar{X}) = \mu = 83$ and $\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{38}{\sqrt{25}} = 7.6$.
7.12 We have $\mu = 6.7$ and $\sigma = 0.47$.
(a) For $n = 9$, $E(\bar{X}) = \mu = 6.7$ and $\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{0.47}{\sqrt{9}} = 0.157$.
(b) For $n = 16$, $E(\bar{X}) = \mu = 6.7$ and $\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{0.47}{\sqrt{16}} = 0.1175$.
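7.13 The population standard deviation is $\sigma = 1.549$, so $\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{1.549}{\sqrt{n}}$.
(a) For $n = 25$, $\mathrm{sd}(\bar{X}) = \frac{1.549}{\sqrt{25}} = 0.3098$
(b) For $n = 100$, $\mathrm{sd}(\bar{X}) = \frac{1.549}{\sqrt{100}} = 0.1549$
(c) For $n = 400$, $\mathrm{sd}(\bar{X}) = \frac{1.549}{\sqrt{400}} = 0.07745$
(d) In general, $\frac{\sigma}{\sqrt{4n}} = \frac{1}{2}\left(\frac{\sigma}{\sqrt{n}}\right)$. So, quadrupling the sample size cuts $\mathrm{sd}(\bar{X})$ in half.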
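7.14 The population standard deviation is $\sigma = 38$, so $\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{38}{\sqrt{n}}$.
(a) For $n = 9$, $\mathrm{sd}(\bar{X}) = \frac{38}{\sqrt{9}} = 12.667$
(b) For $n = 36$, $\mathrm{sd}(\bar{X}) = \frac{38}{\sqrt{36}} = 6.333$
(c) For $n = 144$, $\mathrm{sd}(\bar{X}) = \frac{38}{\sqrt{144}} = 3.167$
(d) In general, $\frac{\sigma}{\sqrt{4n}} = \frac{1}{2}\left(\frac{\sigma}{\sqrt{n}}\right)$. So, quadrupling the sample size cuts $\mathrm{sd}(\bar{X})$ in half.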
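The calculations in Exercises 7.11 through 7.14 all apply the same formula, sd of the sample mean equals the population standard deviation divided by the square root of n. A small sketch of that formula follows; `sd_of_mean` is an illustrative name.

```python
import math


def sd_of_mean(sigma, n):
    """Standard deviation of the sample mean for a random sample of size n."""
    return sigma / math.sqrt(n)


# Exercise 7.14: sigma = 38; each step quadruples n and halves the result.
for n in (9, 36, 144):
    print(n, round(sd_of_mean(38, n), 3))
```

Because n enters only through its square root, halving the standard deviation always requires four times as many observations.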
7.15 We first calculate the mean and standard deviation of the population that corresponds to X taking the values 3, 5, and 7, each having the same probability of occurring (namely 1/3).

$x$     $f(x)$   $x f(x)$   $x^2 f(x)$
3       1/3      3/3        9/3
5       1/3      5/3        25/3
7       1/3      7/3        49/3
Total   1        15/3       83/3

Using the values in the table, we have the following:

$\mu = \sum x f(x) = \frac{15}{3} = 5$
$\sigma^2 = E(X^2) - \mu^2 = \sum x^2 f(x) - \mu^2 = \frac{83}{3} - 5^2 = \frac{8}{3}$, so that $\sigma = \sqrt{8/3}$

For $n = 2$, we know that the mean and standard deviation of the sampling distribution of $\bar{X}$ must be as follows:

$E(\bar{X}) = \mu = 5$
$\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{2}} = \frac{\sqrt{8/3}}{\sqrt{2}} = \sqrt{\frac{8}{6}} = \sqrt{\frac{4}{3}}$

We verify these by actually calculating the distribution of $\bar{X}$:

$\bar{x}$   $f(\bar{x})$   $\bar{x} f(\bar{x})$   $\bar{x}^2 f(\bar{x})$
3           1/9            3/9                    9/9
4           2/9            8/9                    32/9
5           3/9            15/9                   75/9
6           2/9            12/9                   72/9
7           1/9            7/9                    49/9
Total       1              45/9                   237/9

Using the values in the table, we have the following (which do indeed confirm the above assertion):

$E(\bar{X}) = \sum \bar{x} f(\bar{x}) = \frac{45}{9} = 5$
$\mathrm{Var}(\bar{X}) = E(\bar{X}^2) - (E(\bar{X}))^2 = \sum \bar{x}^2 f(\bar{x}) - (E(\bar{X}))^2 = \frac{237}{9} - 5^2 = \frac{12}{9} = \frac{4}{3}$, so that $\mathrm{sd}(\bar{X}) = \sqrt{\frac{4}{3}}$
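7.16 We first calculate the mean and standard deviation of the population that corresponds to X taking the values 0, 2, and 4, each having the same probability of occurring (namely 1/3).

$x$     $f(x)$   $x f(x)$   $x^2 f(x)$
0       1/3      0          0
2       1/3      2/3        4/3
4       1/3      4/3        16/3
Total   1        6/3        20/3

$\mu = \sum x f(x) = \frac{6}{3} = 2$
$\sigma^2 = E(X^2) - \mu^2 = \sum x^2 f(x) - \mu^2 = \frac{20}{3} - 2^2 = \frac{8}{3}$, so that $\sigma = \sqrt{8/3}$

For $n = 2$, we know that the mean and standard deviation of the sampling distribution of $\bar{X}$ must be as follows:

$E(\bar{X}) = \mu = 2$
$\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{2}} = \frac{\sqrt{8/3}}{\sqrt{2}} = \sqrt{\frac{8}{6}} = \sqrt{\frac{4}{3}}$

We verify these by actually calculating the distribution of $\bar{X}$:

$\bar{x}$   $f(\bar{x})$   $\bar{x} f(\bar{x})$   $\bar{x}^2 f(\bar{x})$
0           1/9            0                      0
1           2/9            2/9                    2/9
2           3/9            6/9                    12/9
3           2/9            6/9                    18/9
4           1/9            4/9                    16/9
Total       1              18/9                   48/9

Using the values in the table, we have the following (which do indeed confirm the above assertion):

$E(\bar{X}) = \sum \bar{x} f(\bar{x}) = \frac{18}{9} = 2$
$\mathrm{Var}(\bar{X}) = E(\bar{X}^2) - (E(\bar{X}))^2 = \sum \bar{x}^2 f(\bar{x}) - (E(\bar{X}))^2 = \frac{48}{9} - 2^2 = \frac{12}{9} = \frac{4}{3}$, so that $\mathrm{sd}(\bar{X}) = \sqrt{\frac{4}{3}}$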
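The verification in Exercise 7.16 amounts to computing the mean and variance of two tabulated distributions and comparing them. A minimal sketch with exact fractions is given below; `mean_and_variance` is a hypothetical helper, and the two dictionaries simply restate the tables for the population and for the sample mean with n = 2.

```python
from fractions import Fraction as F


def mean_and_variance(dist):
    """Return (mean, variance) of a discrete distribution {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    second_moment = sum(x * x * p for x, p in dist.items())
    return mean, second_moment - mean ** 2


# Population and sampling distribution of the mean (n = 2) from Exercise 7.16.
population = {0: F(1, 3), 2: F(1, 3), 4: F(1, 3)}
xbar_dist = {0: F(1, 9), 1: F(2, 9), 2: F(3, 9), 3: F(2, 9), 4: F(1, 9)}

mu, sigma2 = mean_and_variance(population)
mean_xbar, var_xbar = mean_and_variance(xbar_dist)
print(mu, sigma2)           # population mean and variance
print(mean_xbar, var_xbar)  # mean and variance of the sample mean
```

The variance of the sample mean comes out as the population variance divided by n = 2, which is the relationship the solution asserts.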
7.17 (a) All possible samples $(x_1, x_2)$ and the corresponding $\bar{x}$ values and probabilities are tabulated below. By independence, $P(x_1, x_2) = P(x_1) P(x_2)$, where the values of P(0), P(1), and P(2) are given in the distribution of X. So, for instance, P(0, 0) = P(0) P(0) = (0.3)(0.3) = 0.09.

$(x_1, x_2)$   Probability of $(x_1, x_2)$   $\bar{x} = (x_1 + x_2)/2$
(0,0)          0.09                          0
(0,1)          0.12                          0.5
(0,2)          0.09                          1
(1,0)          0.12                          0.5
(1,1)          0.16                          1
(1,2)          0.12                          1.5
(2,0)          0.09                          1
(2,1)          0.12                          1.5
(2,2)          0.09                          2

The sampling distribution of $\bar{X}$ is obtained by listing the distinct values of $\bar{X}$ along with the corresponding probabilities, as follows:

$\bar{x}$   Probability $f(\bar{x})$
0           0.09
0.5         0.24
1           0.34
1.5         0.24
2           0.09
Total       1

(b) $E(\bar{X}) = \mu = \sum x f(x) = 0(0.3) + 1(0.4) + 2(0.3) = 1.0$. This is true for any sample size n.
(c) For $n = 36$, $E(\bar{X}) = 1.0$ (as mentioned in part (b)). Also, $\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{36}}$, where $\sigma$ is the population standard deviation, which we calculate below.

$x$     $f(x)$   $x f(x)$   $x^2 f(x)$
0       0.3      0          0
1       0.4      0.4        0.4
2       0.3      0.6        1.2
Total   1        1.0        1.6

$\sigma^2 = E(X^2) - \mu^2 = \sum x^2 f(x) - \mu^2 = 1.6 - 1^2 = 0.6$, so that $\sigma = \sqrt{0.6} = 0.7746$

Thus, $\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{36}} = \frac{0.7746}{\sqrt{36}} = 0.1291$.
7.18 (a) $E(\bar{X}) = \mu = 7.2$
(b) $\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{1.3}{\sqrt{4}} = 0.65$
(c) Since the population distribution is normal, the sample mean $\bar{X}$ has a normal distribution with mean 7.2 and standard deviation 0.65.
7.19 (a) $E(\bar{X}) = \mu = 115$
(b) $\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{22}{\sqrt{6}} = 8.981$
(c) Since the population distribution is normal, the sample mean $\bar{X}$ has a normal distribution with mean 115 and standard deviation 8.981.
7.20 (a) All possible samples $(x_1, x_2)$ and the corresponding $\bar{x}$ values and probabilities are tabulated below. By independence, $P(x_1, x_2) = P(x_1) P(x_2)$, where the values of P(0), P(2), and P(4) are given in the distribution of X. So, for instance, P(0, 0) = P(0) P(0) = (0.7)(0.7) = 0.49.

$(x_1, x_2)$   Probability of $(x_1, x_2)$   $\bar{x} = (x_1 + x_2)/2$
(0,0)          0.49                          0
(0,2)          0.07                          1
(0,4)          0.14                          2
(2,0)          0.07                          1
(2,2)          0.01                          2
(2,4)          0.02                          3
(4,0)          0.14                          2
(4,2)          0.02                          3
(4,4)          0.04                          4

The sampling distribution of $\bar{X}$ is obtained by listing the distinct values of $\bar{X}$ along with the corresponding probabilities, as follows:

$\bar{x}$                  0      1      2      3      4      Total
Probability $f(\bar{x})$   0.49   0.14   0.29   0.04   0.04   1

(b) $E(\bar{X}) = \mu = \sum x f(x) = 0(0.7) + 2(0.1) + 4(0.2) = 1.0$. This is true for any sample size n.
(c) For $n = 25$, $E(\bar{X}) = 1.0$ (as mentioned in part (b)). Also, $\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{25}}$, where $\sigma$ is the population standard deviation, which we calculate below.

$x$     $f(x)$   $x f(x)$   $x^2 f(x)$
0       0.7      0          0
2       0.1      0.2        0.4
4       0.2      0.8        3.2
Total   1        1.0        3.6

$\sigma^2 = E(X^2) - \mu^2 = \sum x^2 f(x) - \mu^2 = 3.6 - 1^2 = 2.6$, so that $\sigma = \sqrt{2.6} = 1.612$

Thus, $\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{25}} = \frac{1.612}{\sqrt{25}} = 0.3224$.
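When the population values are not equally likely, as in Exercise 7.20, each ordered sample is weighted by the product of its probabilities. The sketch below illustrates this under the independence assumption stated in the solution; `distribution_of_mean` is an illustrative name, and the probabilities are entered as exact fractions to avoid rounding noise.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def distribution_of_mean(population, n):
    """Exact distribution of the mean of n independent draws.

    `population` maps each value to its probability.
    """
    dist = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for sample in product(population.items(), repeat=n):
        prob = Fraction(1)
        total = 0
        for value, p in sample:
            total += value
            prob *= p  # independence: multiply the individual probabilities
        dist[Fraction(total, n)] += prob
    return dict(sorted(dist.items()))


# Exercise 7.20: P(0) = 0.7, P(2) = 0.1, P(4) = 0.2, samples of size 2.
population_7_20 = {0: Fraction("0.7"), 2: Fraction("0.1"), 4: Fraction("0.2")}
dist_7_20 = distribution_of_mean(population_7_20, 2)
for mean, prob in dist_7_20.items():
    print(mean, float(prob))
```

The same function applied to the distribution in Exercise 7.17 should give the table shown there.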
7.21 Denote X = weight of a package. We are given that X is normal with mean 32.4 and standard deviation 0.4.
(a) We convert to the standard normal to obtain
$P[X < 32] = P\left[\frac{X - 32.4}{0.4} < \frac{32 - 32.4}{0.4}\right] = P[Z < -1] = 0.1587$.
Hence, about 16% of the packages weigh less than the labeled amount.
(b) Let $X_1$ and $X_2$ denote the weights of two randomly chosen packages. Observe that:
$E(\bar{X}) = 32.4$
$\mathrm{sd}(\bar{X}) = \frac{0.4}{\sqrt{2}} = 0.2828$
Hence, $\bar{X} = \frac{X_1 + X_2}{2}$ is normal with mean 32.4 and standard deviation 0.2828.
(c) Again, we convert to the standard normal (using part (b)) to obtain
$P[\bar{X} < 32] = P\left[\frac{\bar{X} - 32.4}{0.2828} < \frac{32 - 32.4}{0.2828}\right] = P[Z < -1.414] = 0.0786$.
Hence, there is about an 8% chance that the average weight of two packages will be less than the labeled amount of 32 ounces.
7.22 Denote X = amount of sports drink in a bottle. We are given that X is normal with mean 101.5 and standard deviation 1.6.
(a) We convert to the standard normal to obtain
$P[X < 100] = P\left[\frac{X - 101.5}{1.6} < \frac{100 - 101.5}{1.6}\right] = P[Z < -0.9375] = 0.1742$.
Hence, about 17.4% of the bottles are underfilled.
(b) Let $X_1, \ldots, X_4$ denote the amounts of drink in four randomly chosen bottles. Observe that:
$E(\bar{X}) = 101.5$
$\mathrm{sd}(\bar{X}) = \frac{1.6}{\sqrt{4}} = 0.8$
Hence, $\bar{X} = \frac{X_1 + X_2 + X_3 + X_4}{4}$ is normal with mean 101.5 and standard deviation 0.8.
(c) Again, we convert to the standard normal (using part (b)) to obtain
$P[\bar{X} < 100] = P\left[\frac{\bar{X} - 101.5}{0.8} < \frac{100 - 101.5}{0.8}\right] = P[Z < -1.875] = 0.0304$.
Hence, there is about a 3% chance that the average content of four bottles will be less than the labeled amount of 100 ounces.
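The table lookups in Exercise 7.21 can be cross-checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative, and small differences from the printed values are expected because the solutions round z before using the normal table.

```python
from math import sqrt
from statistics import NormalDist

# Exercise 7.21: package weight is normal with mean 32.4 and sd 0.4.
single_package = NormalDist(mu=32.4, sigma=0.4)
p_single = single_package.cdf(32)  # P[X < 32]

# The mean of two packages is normal with the same mean and sd 0.4 / sqrt(2).
mean_of_two = NormalDist(mu=32.4, sigma=0.4 / sqrt(2))
p_mean = mean_of_two.cdf(32)       # P[Xbar < 32]

print(round(p_single, 4), round(p_mean, 4))
```

Averaging two packages roughly halves the chance of falling below the labeled weight, which matches the drop from about 16% to about 8% in the solution.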
7.23 (a) We have $E(\bar{X}) = \mu = 41,000$ and $\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{5000}{\sqrt{100}} = 500$. Since $n = 100$ is large, the central limit theorem ensures that the distribution of $\bar{X}$ is approximately normal with mean and standard deviation as calculated above.
(b) The standardized variable is $Z = \frac{\bar{X} - 41,000}{500}$. As such, we have
$P[\bar{X} > 41,500] = P\left[Z > \frac{41,500 - 41,000}{500}\right] = P[Z > 1] = 0.1587$.
7.24 (a) We have $E(\bar{X}) = \mu = 2.0$ and $\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{2.0}{\sqrt{100}} = 0.2$. Since $n = 100$ is large, the central limit theorem ensures that the distribution of $\bar{X}$ is approximately normal with mean and standard deviation as calculated above.
(b) The standardized variable is $Z = \frac{\bar{X} - 2}{0.2}$. As such, we have
$P[\bar{X} > 2.3] = P\left[Z > \frac{2.3 - 2}{0.2}\right] = P[Z > 1.5] = 0.0668$.
7.25 The population of fry has mean $\mu = 3.4$ and standard deviation $\sigma = 0.8$, so that
$E(\bar{X}) = \mu = 3.4$ and $\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{0.8}{\sqrt{36}} = 0.1333$
and the standardized variable is $Z = \frac{\bar{X} - 3.4}{0.1333}$.
(a) $P[\bar{X} < 3.2] = P\left[Z < \frac{3.2 - 3.4}{0.1333}\right] = P[Z < -1.5] = 0.0668$
(b) Those caught in the net may be slower, less active fish, or even the less healthy ones. Consequently, they may tend to be on the smaller side of the distribution.
7.26 The height of an individual male follows a normal distribution with mean $\mu = 70$ and standard deviation $\sigma = 2.8$, so that
$E(\bar{X}) = \mu = 70$ and $\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{2.8}{\sqrt{5}} = 1.252$
and the standardized variable is $Z = \frac{\bar{X} - 70}{1.252}$. As such, we have
$P[\bar{X} > 72.0] = P\left[Z > \frac{72.0 - 70}{1.252}\right] = P[Z > 1.60] = 0.0548$.
7.27 We have $E(\bar{X}) = \mu = 34.5$ and $\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{1.3}{\sqrt{6}} = 0.5307$, and the standardized variable is $Z = \frac{\bar{X} - 34.5}{0.5307}$. As such, we have
$P[34.1 < \bar{X} < 35.2] = P\left[\frac{34.1 - 34.5}{0.5307} < Z < \frac{35.2 - 34.5}{0.5307}\right]$
$= P[-0.7537 < Z < 1.319] = P[Z < 1.319] - P[Z < -0.7537] = 0.9064 - 0.2255 = 0.6809$
7.28 For a sample of size $n = 100$, we have $E(\bar{X}) = \mu = 0.05$ and $\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{0.015}{\sqrt{100}} = 0.0015$, and the standardized variable is $Z = \frac{\bar{X} - 0.05}{0.0015}$.
Then, a package will weigh between 4.8 and 5.3 ounces if the mean weight $\bar{X}$ of the 100 almonds lies between $\frac{4.8}{100} = 0.048$ and $\frac{5.3}{100} = 0.053$ ounces. Observe that
$P[0.048 < \bar{X} < 0.053] = P\left[\frac{0.048 - 0.05}{0.0015} < Z < \frac{0.053 - 0.05}{0.0015}\right]$
$= P[-1.333 < Z < 2.000] = P[Z < 2.000] - P[Z < -1.333] = 0.9772 - 0.0912 = 0.8860$.
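Interval probabilities for the sample mean, as in Exercises 7.27 and 7.28, follow one pattern: build the normal distribution of the mean, then subtract two cumulative probabilities. A minimal sketch, assuming the normal model used in the solutions; `prob_mean_between` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_between(mu, sigma, n, low, high):
    """P[low < Xbar < high] when Xbar is (approximately) normal
    with mean mu and standard deviation sigma / sqrt(n)."""
    xbar = NormalDist(mu=mu, sigma=sigma / sqrt(n))
    return xbar.cdf(high) - xbar.cdf(low)


p_7_27 = prob_mean_between(34.5, 1.3, 6, 34.1, 35.2)
p_7_28 = prob_mean_between(0.05, 0.015, 100, 0.048, 0.053)
print(round(p_7_27, 4), round(p_7_28, 4))
```

Results may differ from the printed answers in the third or fourth decimal place, since the solutions round the z-values before consulting the table.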
7.29 (a) By column, the medians are as follows:
6 7 4 6 4
4 4 4 4 6
4 6 4 7 4
9 4 5 4 2
6 4 7 6 5
6 1 6 7 9
6 6 5 1 5
4 4 7 6 6
2 2 5 6 2
4 2 5 5 4
7 2 4 4 5
5 4 5 4 4
3 7 2 4 4
8 2 4 3 2
5 3 5 6 5
3 5 8 4 5
6 4 4 4 2
4 4 6 8 7
1 5 6 3 5
7 3 4 3 6
(b) & (c) [Histograms omitted.] The mean has smaller variance.

Frequency chart:

x           1   2    3   4    5    6    7   8   9
frequency   3   10   7   31   17   18   9   3   2
7.30 (a) We use the following tabulated values to calculate the mean and standard deviation:

$x$     $f(x)$   $x f(x)$   $x^2 f(x)$
3       0.5      1.5        4.5
4       0.3      1.2        4.8
5       0.2      1.0        5.0
Total   1        3.7        14.3

Consequently, $E(X) = 3.7$ and $\mathrm{sd}(X) = \sqrt{14.3 - (3.7)^2} = \sqrt{0.61} = 0.781$.
(b) Let $X_1$ be the time for the letter to arrive and $X_2$ the time for the return receipt to arrive. The total time is the sum $X_1 + X_2$. In order to determine the probability distribution of the sum, we make use of the fact that $X_1$ and $X_2$ are independent. For instance, for the outcome (3,4), the total is 3 + 4 = 7 and, by independence, the probability of the outcome (3,4) is the product (0.5)(0.3) = 0.15. All possible samples $(x_1, x_2)$, along with the corresponding totals and probabilities, are tabulated below:

$(x_1, x_2)$   Probability of $(x_1, x_2)$   $x_1 + x_2$
(3,3)          0.25                          6
(3,4)          0.15                          7
(3,5)          0.10                          8
(4,3)          0.15                          7
(4,4)          0.09                          8
(4,5)          0.06                          9
(5,3)          0.10                          8
(5,4)          0.06                          9
(5,5)          0.04                          10

The sampling distribution of $X_1 + X_2$ is obtained by listing the distinct values of $X_1 + X_2$ along with the corresponding probabilities, as follows:

$x_1 + x_2$   6      7      8      9      10     Total
Probability   0.25   0.30   0.29   0.12   0.04   1

(c) Let Y denote the number of letters that take 5 days to reach City B. Note that Y has a binomial distribution with $n = 100$ and $p = 0.2$. We use the normal approximation with mean $100(0.2) = 20$ and standard deviation $\sqrt{100(0.2)(0.8)} = 4$. As such, observe that
$P[Y > 25] = P\left[Z > \frac{25 - 20}{4}\right] = P[Z > 1.25] = 0.1056$.
7.31 (a) and (b) We use the following tabulated values to calculate the mean and standard deviation:

$x$     $f(x)$   $x f(x)$   $x^2 f(x)$
0       0.4      0          0
1       0.3      0.3        0.3
2       0.1      0.2        0.4
3       0.2      0.6        1.8
Total   1        1.1        2.5

Consequently, $E(X) = 1.1$ and $\mathrm{sd}(X) = \sqrt{2.5 - (1.1)^2} = \sqrt{1.29} = 1.136$.
(c) Let $X_1$ be the number of complaints on the first day and $X_2$ the number of complaints on the second day. The total number of complaints is the sum $X_1 + X_2$. In order to determine the probability distribution of the sum, we make use of the fact that $X_1$ and $X_2$ are independent. For instance, for the outcome (1,2), the total is 1 + 2 = 3 and, by independence, the probability of the outcome (1,2) is the product (0.3)(0.1) = 0.03. All possible samples $(x_1, x_2)$, along with the corresponding totals and probabilities, are tabulated below:

$(x_1, x_2)$   Probability of $(x_1, x_2)$   $x_1 + x_2$
(0,0)          0.16                          0
(0,1)          0.12                          1
(0,2)          0.04                          2
(0,3)          0.08                          3
(1,0)          0.12                          1
(1,1)          0.09                          2
(1,2)          0.03                          3
(1,3)          0.06                          4
(2,0)          0.04                          2
(2,1)          0.03                          3
(2,2)          0.01                          4
(2,3)          0.02                          5
(3,0)          0.08                          3
(3,1)          0.06                          4
(3,2)          0.02                          5
(3,3)          0.04                          6

The sampling distribution of $X_1 + X_2$ is obtained by listing the distinct values of $X_1 + X_2$ along with the corresponding probabilities, as follows:

$x_1 + x_2$   0      1      2      3      4      5      6      Total
Probability   0.16   0.24   0.17   0.22   0.13   0.04   0.04   1

(d) Note that if the number of complaints is more than 125, the sample mean will be greater than $\frac{125}{90}$. Since the sample size $n = 90$ is large, we approximate the distribution of $\bar{X}$ by a normal distribution with mean 1.1 and standard deviation $\frac{\sigma}{\sqrt{n}} = \frac{1.136}{\sqrt{90}} = 0.1197$. Indeed, we have
$P\left[\bar{X} > \frac{125}{90}\right] = P\left[Z > \frac{\frac{125}{90} - 1.1}{0.1197}\right] = P[Z > 2.413] = 0.0079$.
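Part (d) of Exercise 7.31 can equivalently be worked on the scale of the total rather than the mean: the total over n days is approximately normal with mean n times the daily mean and standard deviation equal to the daily standard deviation times the square root of n. The sketch below illustrates that equivalence; like the solution, it applies no continuity correction, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Daily complaint distribution from Exercise 7.31.
pmf = {0: 0.4, 1: 0.3, 2: 0.1, 3: 0.2}
mu = sum(x * p for x, p in pmf.items())                       # 1.1
sigma = sqrt(sum(x * x * p for x, p in pmf.items()) - mu**2)  # sqrt(1.29)

n_days = 90
# Normal approximation for the 90-day total (central limit theorem).
total = NormalDist(mu=n_days * mu, sigma=sqrt(n_days) * sigma)
p_more_than_125 = 1 - total.cdf(125)
print(round(p_more_than_125, 4))
```

Standardizing the total gives the same z-value as standardizing the mean, because both numerator and denominator are scaled by n.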
7.32 According to the model, the three monthly differences $D_1$, $D_2$, and $D_3$ are independent, and each difference has population mean 0 and population variance $\sigma^2 = (0.0128)^2 = 0.0001638$. According to Appendix A3.3, the expected value of the sum of two random variables is the sum of the two expectations. Applying this first to $D_1 + D_2$ and then to $(D_1 + D_2) + D_3 = D_1 + D_2 + D_3$, we observe that
$E(D_1 + D_2 + D_3) = E(D_1) + E(D_2) + E(D_3) = 0 + 0 + 0 = 0$.
(Note that this is also equal to three times the population mean.) Next, according to Appendix A3.3, the variance of the sum of two independent random variables is the sum of the two variances. Applying this first to $D_1 + D_2$ and then to $(D_1 + D_2) + D_3 = D_1 + D_2 + D_3$, we observe that
$\mathrm{Var}(D_1 + D_2 + D_3) = \sigma^2 + \sigma^2 + \sigma^2 = 3(0.0128)^2 = 0.0004915$.
(Note that this is also equal to three times the population variance.) The standard deviation is then given by
$\mathrm{sd}(D_1 + D_2 + D_3) = \sqrt{\sigma^2 + \sigma^2 + \sigma^2} = \sqrt{3}\,\sigma = \sqrt{3}(0.0128) = 0.02217$.
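The rule used in Exercise 7.32, that variances (not standard deviations) add for independent variables, can be written as a one-line helper. This is a sketch only; `sd_of_independent_sum` is an illustrative name.

```python
from math import isclose, sqrt


def sd_of_independent_sum(sds):
    """Standard deviation of a sum of independent random variables:
    add the variances, then take the square root."""
    return sqrt(sum(s ** 2 for s in sds))


# Exercise 7.32: three independent monthly differences, each with sd 0.0128.
sd_three_months = sd_of_independent_sum([0.0128] * 3)
print(round(sd_three_months, 5))
```

For k independent terms with a common standard deviation, the result grows like the square root of k rather than k itself.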
7.33 (a) [Histogram omitted.]
(b) There is more variability in the differences two months apart than in the one-month differences. A computer calculation gives the standard deviation 0.0197 for two-month differences versus 0.0128 for the one-month differences.
7.34 (a) The table below lists the 16 possible samples $(x_1, x_2)$, along with the corresponding values of $\bar{X}$. Since $n = 2$, the sample mean for each sample is calculated using the formula $\bar{x} = \frac{x_1 + x_2}{2}$.

$(x_1, x_2)$   $\bar{x}$      $(x_1, x_2)$   $\bar{x}$
(0,0)          0              (4,0)          2
(0,2)          1              (4,2)          3
(0,4)          2              (4,4)          4
(0,6)          3              (4,6)          5
(2,0)          1              (6,0)          3
(2,2)          2              (6,2)          4
(2,4)          3              (6,4)          5
(2,6)          4              (6,6)          6

(b) The 16 possible samples are equally likely, so each has a probability 1/16 of occurring. The sampling distribution of $\bar{X}$ is obtained by listing the distinct values of $\bar{X}$ along with the corresponding probabilities, as follows:

$\bar{x}$     0      1      2      3      4      5      6      Total
Probability   1/16   2/16   3/16   4/16   3/16   2/16   1/16   1

(c) Since each of the four values of X is equally likely, each has probability 1/4 of occurring. The probability distribution is tabulated below, along with other calculations needed to compute the population mean and standard deviation.

$x$     $f(x)$   $x f(x)$   $x^2 f(x)$
0       1/4      0          0
2       1/4      2/4        4/4
4       1/4      4/4        16/4
6       1/4      6/4        36/4
Total   1        12/4       56/4

Using the values in the table, we have the following:
$\mu = \sum x f(x) = \frac{12}{4} = 3$
$\sigma^2 = E(X^2) - \mu^2 = \sum x^2 f(x) - \mu^2 = \frac{56}{4} - 3^2 = 5$, so that $\sigma = \sqrt{5}$

(d) For $n = 2$, we know that the mean and standard deviation of the sampling distribution of $\bar{X}$ must be as follows:
$E(\bar{X}) = \mu = 3$
$\mathrm{sd}(\bar{X}) = \frac{\sqrt{5}}{\sqrt{2}} = \sqrt{\frac{5}{2}}$

$\bar{x}$   $f(\bar{x})$   $\bar{x} f(\bar{x})$   $\bar{x}^2 f(\bar{x})$
0           1/16           0                      0
1           2/16           2/16                   2/16
2           3/16           6/16                   12/16
3           4/16           12/16                  36/16
4           3/16           12/16                  48/16
5           2/16           10/16                  50/16
6           1/16           6/16                   36/16
Total       1              48/16                  184/16

Using the values in the table, we have the following (which do indeed confirm the above assertion):
$E(\bar{X}) = \sum \bar{x} f(\bar{x}) = \frac{48}{16} = 3$
$\mathrm{Var}(\bar{X}) = E(\bar{X}^2) - (E(\bar{X}))^2 = \sum \bar{x}^2 f(\bar{x}) - (E(\bar{X}))^2 = \frac{184}{16} - 3^2 = \frac{5}{2}$
$\mathrm{sd}(\bar{X}) = \sqrt{\frac{5}{2}}$
7.35 (a) By column, we record here the value of R for each possible sample listed in Exercise 7.34(a).

0 2 4 6
2 0 2 4
4 2 0 2
6 4 2 0

(b) For the sampling distribution of R, we list the distinct values along with the corresponding probabilities:

Value of R    0      2      4      6      Total
Probability   4/16   6/16   4/16   2/16   1
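7.36 The standard deviation of $\bar{X}$ is $\frac{\sigma}{\sqrt{n}}$, where $\sigma$ is the population standard deviation.
(a) In order to have $\frac{\sigma}{\sqrt{n}} = \frac{\sigma}{4}$, we require that $\sqrt{n} = 4$, or $n = 16$.
(b) In order to have $\frac{\sigma}{\sqrt{n}} = \frac{\sigma}{7}$, we require that $\sqrt{n} = 7$, or $n = 49$.
(c) In order to have $\frac{\sigma}{\sqrt{n}} = (0.12)\sigma$, we require that $\sqrt{n} = \frac{1}{0.12}$, so that $n = \left(\frac{1}{0.12}\right)^2 = 69.44$. Since the sample size must be an integer value, we would use $n = 70$ to be conservative.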
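The sample-size reasoning in Exercise 7.36 can be expressed as a short calculation: to shrink the standard deviation of the mean to a given fraction of the population standard deviation, n must be at least one over that fraction squared, rounded up. A sketch using exact fractions follows; `required_sample_size` is a hypothetical name.

```python
import math
from fractions import Fraction


def required_sample_size(ratio):
    """Smallest integer n with sigma / sqrt(n) <= ratio * sigma,
    i.e. n >= (1 / ratio) ** 2."""
    return math.ceil(1 / Fraction(ratio) ** 2)


print(required_sample_size(Fraction(1, 4)))   # part (a)
print(required_sample_size(Fraction(1, 7)))   # part (b)
print(required_sample_size("0.12"))           # part (c)
```

Rounding up rather than to the nearest integer is what makes the choice conservative: the resulting standard deviation is never larger than the target.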
7.37 (a) All possible samples $(x_1, x_2)$ and the corresponding $\bar{x}$ values and probabilities are tabulated below. By independence, $P(x_1, x_2) = P(x_1) P(x_2)$, where the values of P(1), P(2), and P(3) are given in the distribution of X. So, for instance, P(1,1) = P(1) P(1) = (0.2)(0.2) = 0.04.

$(x_1, x_2)$   Probability of $(x_1, x_2)$   $\bar{x} = (x_1 + x_2)/2$
(1,1)          0.04                          1
(1,2)          0.12                          1.5
(1,3)          0.04                          2
(2,1)          0.12                          1.5
(2,2)          0.36                          2
(2,3)          0.12                          2.5
(3,1)          0.04                          2
(3,2)          0.12                          2.5
(3,3)          0.04                          3

The sampling distribution of $\bar{X}$ is:

$\bar{x}$                  1      1.5    2      2.5    3      Total
Probability $f(\bar{x})$   0.04   0.24   0.44   0.24   0.04   1

(b) $E(\bar{X}) = \mu = \sum x f(x) = 1(0.2) + 2(0.6) + 3(0.2) = 2.0$. This is true for any sample size n.
(c) For $n = 81$, $E(\bar{X}) = 2.0$ (as mentioned in part (b)). Also, $\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{81}}$, where $\sigma$ is the population standard deviation, which we calculate below.

$x$     $f(x)$   $x f(x)$   $x^2 f(x)$
1       0.2      0.2        0.2
2       0.6      1.2        2.4
3       0.2      0.6        1.8
Total   1        2.0        4.4

$\sigma^2 = E(X^2) - \mu^2 = \sum x^2 f(x) - \mu^2 = 4.4 - 2^2 = 0.4$, so that $\sigma = \sqrt{0.4} = 0.6325$

Thus, $\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{81}} = \frac{0.6325}{\sqrt{81}} = 0.0703$.
7.38 (a) All possible samples $(x_1, x_2)$ and the corresponding $\bar{x}$ values and probabilities are tabulated below. By independence, $P(x_1, x_2) = P(x_1) P(x_2)$, where the values of P(1), P(3), and P(5) are given in the distribution of X. So, for instance, P(1,1) = P(1) P(1) = (0.6)(0.6) = 0.36.

$(x_1, x_2)$   Probability of $(x_1, x_2)$   $\bar{x} = (x_1 + x_2)/2$
(1,1)          0.36                          1
(1,3)          0.18                          2
(1,5)          0.06                          3
(3,1)          0.18                          2
(3,3)          0.09                          3
(3,5)          0.03                          4
(5,1)          0.06                          3
(5,3)          0.03                          4
(5,5)          0.01                          5

The sampling distribution of $\bar{X}$ is:

$\bar{x}$                  1      2      3      4      5      Total
Probability $f(\bar{x})$   0.36   0.36   0.21   0.06   0.01   1

(b) $E(\bar{X}) = \mu = \sum x f(x) = 1(0.6) + 3(0.3) + 5(0.1) = 2.0$. This is true for any sample size n.
(c) For $n = 25$, $E(\bar{X}) = 2.0$ (as mentioned in part (b)). Also, $\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{25}}$, where $\sigma$ is the population standard deviation, which we calculate below.

$x$     $f(x)$   $x f(x)$   $x^2 f(x)$
1       0.6      0.6        0.6
3       0.3      0.9        2.7
5       0.1      0.5        2.5
Total   1        2.0        5.8

$\sigma^2 = E(X^2) - \mu^2 = \sum x^2 f(x) - \mu^2 = 5.8 - 2^2 = 1.8$, so that $\sigma = \sqrt{1.8} = 1.3416$

Thus, $\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{25}} = \frac{1.3416}{\sqrt{25}} = 0.2683$.
7.39 We have $\mu = 32.4$, $\sigma = 0.4$, and $n = 9$.
(a) $E(\bar{X}) = \mu = 32.4$, $\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{0.4}{\sqrt{9}} = 0.1333$
(b) Since the population distribution is normal, the sample mean $\bar{X}$ has a normal distribution with mean and standard deviation given in part (a).
(c) The standardized variable is $Z = \frac{\bar{X} - 32.4}{0.1333}$. As such, we have
$P[32.3 < \bar{X} < 32.6] = P\left[\frac{32.3 - 32.4}{0.1333} < Z < \frac{32.6 - 32.4}{0.1333}\right] = P[-0.750 < Z < 1.500]$
$= P[Z < 1.500] - P[Z < -0.750] = 0.9332 - 0.2266 = 0.7066$.
7.40 We have $\mu = 0.32$ and $\sigma = 0.08$.
(a) Denote X = weight of one pear selected at random. The distribution of X is the same as the population distribution. The standardized normal variable is $Z = \frac{X - \mu}{\sigma} = \frac{X - 0.32}{0.08}$. As such, observe that
$P[0.28 < X < 0.34] = P\left[\frac{0.28 - 0.32}{0.08} < Z < \frac{0.34 - 0.32}{0.08}\right] = P[-0.5 < Z < 0.25]$
$= P[Z < 0.25] - P[Z < -0.5] = 0.5987 - 0.3085 = 0.2902$.
(b) The distribution of $\bar{X}$ is normal with mean $\mu = 0.32$ and standard deviation $\mathrm{sd} = \frac{\sigma}{\sqrt{n}} = \frac{0.08}{\sqrt{4}} = 0.04$. So, $Z = \frac{\bar{X} - 0.32}{0.04}$ is N(0,1). As such, we have
$P[0.28 < \bar{X} < 0.34] = P\left[\frac{0.28 - 0.32}{0.04} < Z < \frac{0.34 - 0.32}{0.04}\right] = P[-1 < Z < 0.5]$
$= P[Z < 0.5] - P[Z < -1] = 0.6915 - 0.1587 = 0.5328$.
7.41 We have $\mu = 12.1$, $\sigma = 3.2$, and $n = 9$.
(a) Since the population is normal, the distribution of $\bar{X}$ is normal with mean $\mu = 12.1$ and $\mathrm{sd} = \frac{\sigma}{\sqrt{n}} = \frac{3.2}{\sqrt{9}} = \frac{3.2}{3}$.
(b) The standardized normal variable is $Z = \frac{\bar{X} - 12.1}{3.2/3}$. As such, observe that
$P[\bar{X} < 10] = P\left[Z < \frac{10 - 12.1}{3.2/3}\right] = P[Z < -1.97] = 0.0244$.
(c) The pebble size X is normally distributed with mean 12.1 and standard deviation 3.2, so $Z = \frac{X - 12.1}{3.2}$ is N(0,1). As such, observe that
$P[X < 10] = P\left[Z < \frac{10 - 12.1}{3.2}\right] = P[Z < -0.656] = 0.256$.
So, about 26% of the pebbles are of size smaller than 10.
7.42 We have $\mu = 37$, $\sigma = 7$, and $n = 150$.
(a) We have $E(\bar{X}) = \mu = 37$ and $\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{7}{\sqrt{150}} = 0.5715$. Since $n = 150$ is large, the central limit theorem ensures that the distribution of $\bar{X}$ is approximately normal with mean and standard deviation as calculated above.
(b) The standardized variable is $Z = \frac{\bar{X} - 37}{0.5715}$. As such, we have
$P[36 < \bar{X} < 38] = P\left[\frac{36 - 37}{0.5715} < Z < \frac{38 - 37}{0.5715}\right] = P[-1.75 < Z < 1.75]$
$= P[Z < 1.75] - P[Z < -1.75] = 0.9599 - 0.0401 = 0.9198$.
(c) $P[\bar{X} > 38.5] = P\left[Z > \frac{38.5 - 37}{0.5715}\right] = P[Z > 2.625] = 1 - 0.9957 = 0.0043$.
7.43 We have $\mu = 1.9$, $\sigma = 1.2$, and $n = 36$.
(a) $E(\bar{X}) = \mu = 1.9$ and $\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{1.2}{\sqrt{36}} = 0.2$.
(b) Since $n = 36$ is large, the central limit theorem ensures that the distribution of $\bar{X}$ is approximately normal with mean and standard deviation as calculated in part (a).
7.44 $Z = \frac{\bar{X} - 1.9}{0.2}$ is nearly a standard normal variable.
(a) $P[\bar{X} > 2.2] = P\left[Z > \frac{2.2 - 1.9}{0.2}\right] = P[Z > 1.5] = 1 - P[Z < 1.5] = 1 - 0.9332 = 0.0668$
(b) $P[1.65 < \bar{X} < 2.25] = P\left[\frac{1.65 - 1.9}{0.2} < Z < \frac{2.25 - 1.9}{0.2}\right] = P[-1.25 < Z < 1.75]$
$= P[Z < 1.75] - P[Z < -1.25] = 0.9599 - 0.1056 = 0.8543$
7.45 (a) We need to first compute $\mu$ and $\sigma$:

$x$     $f(x)$   $x f(x)$   $x^2 f(x)$
1       0.02     0.02       0.02
2       0.02     0.04       0.08
3       0.04     0.12       0.36
4       0.12     0.48       1.92
5       0.80     4.0        20
Total            4.66       22.38

Thus,
$\mu = \sum x f(x) = 4.66$
$\sigma^2 = \sum x^2 f(x) - \mu^2 = 22.38 - (4.66)^2 = 0.6644$
$\sigma = \sqrt{0.6644} = 0.815$

Hence, $E(\bar{X}) = \mu = 4.66$ and $\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{1400}} = \frac{0.815}{\sqrt{1400}} = 0.0218$.
(b) Since n is large, $\bar{X}$ has approximately a normal distribution with mean and s.d. given in part (a).
(c)
7.46 Since $n = 49$ is large, the distribution of $\bar{X}$ is approximately normal with mean $\mu$ (the population mean), and $\mathrm{sd}(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{14}{\sqrt{49}} = 2$. Hence, $Z = \frac{\bar{X} - \mu}{2}$ is approximately standard normal.
(a) $P[-2 < \bar{X} - \mu < 2] = P\left[\frac{-2}{2} < Z < \frac{2}{2}\right] = P[-1 < Z < 1]$
$= P[Z < 1] - P[Z < -1] = 0.8413 - 0.1587 = 0.6826$
(b) Since $P[-1.645 < Z < 1.645] = 0.90$, the number k must be $1.645\,\mathrm{sd}(\bar{X})$. Hence, $k = 1.645(2) = 3.29$.
(c) $P[|\bar{X} - \mu| > 4] = P\left[|Z| > \frac{4}{2}\right] = P[|Z| > 2] = 2P[Z < -2] = 2(0.0228) = 0.0456$.
7.47 (a) & (b) We calculate the population mean and standard deviation using the tabulated values below:

$x$     $f(x)$   $x f(x)$   $x^2 f(x)$
0       0.5      0          0
1       0.3      0.3        0.3
2       0.2      0.4        0.8
Total   1        0.7        1.1

$\mu = \sum x f(x) = 0.7$
$\sigma^2 = E(X^2) - \mu^2 = \sum x^2 f(x) - \mu^2 = 1.1 - (0.7)^2 = 0.61$, so that $\sigma = \sqrt{0.61} = 0.781$

(c) Let $X_1$ be the number sold the next day and $X_2$ the number sold the day after next. The total number sold is the sum $X_1 + X_2$. In order to determine the probability distribution of the sum, we make use of the fact that $X_1$ and $X_2$ are independent. For instance, for the outcome (2,1), the total is 2 + 1 = 3 and, by independence, the probability of the outcome (2,1) is the product (0.2)(0.3) = 0.06. All possible samples $(x_1, x_2)$, along with the corresponding totals and probabilities, are tabulated below:

$(x_1, x_2)$   Probability of $(x_1, x_2)$   $x_1 + x_2$
(0,0)          0.25                          0
(0,1)          0.15                          1
(0,2)          0.10                          2
(1,0)          0.15                          1
(1,1)          0.09                          2
(1,2)          0.06                          3
(2,0)          0.10                          2
(2,1)          0.06                          3
(2,2)          0.04                          4

The sampling distribution of $X_1 + X_2$ is obtained by listing the distinct values of $X_1 + X_2$ along with the corresponding probabilities, as follows:

$x_1 + x_2$   0      1      2      3      4      Total
Probability   0.25   0.30   0.29   0.12   0.04   1

(d) The event that at least 53 kayaks are sold is the same event that $\bar{X} > \frac{53}{64}$. Since the sample size $n = 64$ is large, we approximate the distribution of $\bar{X}$ by a normal distribution with mean 0.7 and standard deviation $\frac{\sigma}{\sqrt{n}} = \frac{0.781}{\sqrt{64}} = 0.09763$. Indeed, we have
$P\left[\bar{X} > \frac{53}{64}\right] = P\left[Z > \frac{\frac{53}{64} - 0.7}{0.09763}\right] = P[Z > 1.312] = 0.0948$.
(e) Since $P[Z > 1.645] = 0.0500$, we must have the z-value 1.645. If k is the required number, then
$Z = \frac{\frac{k}{64} - 0.7}{0.09763} = 1.645$,
so that $k = 64(0.7 + 1.645(0.09763)) = 55.1$. Hence, 56 kayaks must be ordered.
7.48 (a) The distribution is normal, so the proportion is equivalent to the probability
$P[X < 8] = P\left[\frac{X - 8.2}{0.1} < \frac{8 - 8.2}{0.1}\right] = P[Z < -2] = 0.0228$
(b) Since $P[Z > 1.645] = 0.0500$, we have
$Z = \frac{w - 8.2}{0.1} = 1.645$ or $w = 8.2 + (0.1)(1.645) = 8.36$ ounces.
(c) Since the population is normal, $\bar{X}$ has a normal distribution with mean 8.2 and standard deviation $\frac{\sigma}{\sqrt{n}} = \frac{0.1}{\sqrt{2}} = 0.07071$. Indeed, we have
$P[\bar{X} < 8.3] = P\left[Z < \frac{8.3 - 8.2}{0.07071}\right] = P[Z < 1.414] = 0.9213$.
(d) Each individual package has probability $P[X < 8.3] = P\left[Z < \frac{8.3 - 8.2}{0.1}\right] = P[Z < 1] = 0.8413$ of weighing less than 8.3 ounces. Let Y be the number out of 5 that weigh less than 8.3 ounces. Then, Y has a binomial distribution with $n = 5$ and $p = 0.8413$. We are interested in the event Y = 4 or 5. By the formula for the binomial distribution, we see that the probability of this event is:
$(0.8413)^5 + 5(0.8413)^4(0.1587) = 0.81897$.
7.49 (a) Let X be the amount in a single bottle. The distribution of X is normal, so that the desired probability is
$P[X < 299] = P\left[\frac{X - 302}{2} < \frac{299 - 302}{2}\right] = P[Z < -1.5] = 0.0668$.
(b) Since $P[Z > 1.645] = 0.0500$, we have
$Z = \frac{v - 302}{2} = 1.645$ or $v = 302 + 2(1.645) = 305.29$ ml.
(c) Since the population is normal, $\bar{X}$ has a normal distribution with mean 302 and standard deviation $\frac{\sigma}{\sqrt{n}} = \frac{2}{\sqrt{2}} = \sqrt{2}$. Indeed, we have
$P[\bar{X} < 299] = P\left[Z < \frac{299 - 302}{\sqrt{2}}\right] = P[Z < -2.121] = 0.017$.
(d) From part (a), each package has probability 0.0668 of containing less than 299 ml. Let Y be the number out of 2 packages that contain less than 299 ml. Then, Y has a binomial distribution with $n = 2$ and $p = 0.0668$. We are interested in the event Y = 1. By the formula for the binomial distribution, we see that the probability of this event is:
$2(0.0668)(1 - 0.0668) = 0.1247$.
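The binomial steps in part (d) of Exercises 7.48 and 7.49 can be checked with the binomial probability formula. The sketch below assumes the single-package probabilities 0.8413 (a package under 8.3 ounces) and 0.0668 (a bottle under 299 ml); `binomial_pmf` is an illustrative helper name.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P[Y = k] for Y binomial with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Exercise 7.48(d): at least 4 of 5 packages weigh less than 8.3 ounces.
p_7_48d = binomial_pmf(4, 5, 0.8413) + binomial_pmf(5, 5, 0.8413)

# Exercise 7.49(d): exactly 1 of 2 bottles contains less than 299 ml.
p_7_49d = binomial_pmf(1, 2, 0.0668)

print(round(p_7_48d, 5), round(p_7_49d, 4))
```

In both cases the success probability is the one for a single item, not the probability computed for the sample mean in part (c).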