
STA 6166, Section 8489, Fall 2007 Homework #6 Due 27 November 2007

RAMIN SHAMSHIRI
UFID#: 9021-3353

Ramin Shamshiri

STA6166, HW#6, Nov.27.2007

Page 1

Chapter 6- Concept Questions: Indicate True or False. If False, specify what change will make the statement true.

1- If for two samples the conclusions from an ANOVA and t-test disagree, you should trust the t-test.
Answer: False. In ANOVA we consider each group as a population. In the t-test the assumption is that the difference of the two samples is normally distributed, whereas in ANOVA each population should be normally distributed, so the ANOVA is the stronger procedure. (This answer was also checked with a software package.)

2- A set of sample means is more likely to result in rejection of the hypothesis of equal population means if the variability within the populations is smaller.
Answer: True. Based on the ratio $F = s_B^2 / s_W^2$ (between-group variance over within-group variance), the smaller the within-group variance, the larger the F value and the smaller the P-value, so the hypothesis of equal population means is more likely to be rejected.

3- If the treatments in a CRD consist of numeric levels of input to a process, the LSD multiple comparison procedure is the most appropriate test.
Answer: False

4- If every observation is multiplied by 2, then the value of the F statistic in an ANOVA is multiplied by 4.
Answer: False. Based on the ratio $F = s_B^2 / s_W^2$, if every observation is multiplied by 2, each variance is multiplied by 4, but the F ratio remains the same.

5- To use the F statistic to test the equality of two variances, the sample sizes must be equal.
Answer: False. The assumptions of the F statistic are equality of population variances, independence of the samples and normally distributed populations. Thus, the sample sizes do not need to be equal.

6- The logarithmic transformation is used when the variance is proportional to the Mean.
Answer: True. If the variance is proportional to the mean, we use the logarithm of the $y_{ij}$.

7- With the usual ANOVA assumptions, the ratio of two Mean squares whose expected values are the same has an F distribution.
Answer:

8- One purpose of randomization is to remove experimental error from the estimates.
Answer:

9- To apply the F test in ANOVA, the sample size for each factor level (population) must be the same.
Answer: False. The sample sizes of the treatments do not need to be the same.


10- To apply the F test for ANOVA, the sample standard deviations for all factor levels must be the same.
Answer: False. It is the variances of the populations that must be equal.

11- To apply the F test for ANOVA, the population standard deviations for all factor levels must be the same.
Answer: True. The variances of the populations must be equal.

12- An ANOVA table for a one-way experiment gives the following:
Answer true or false for the following six arguments:

The null hypothesis is that all four means are equal.
Answer: False. There are three groups (the between-groups degrees of freedom are 2).

The calculated value of F is 1.125.
Answer: False, because $F = \frac{810/2}{720/8} = \frac{405}{90} = 4.5$

The critical value for F for 5% significance is 6.60.
Answer: False. The critical value is 4.46 (from the 0.05 table with dfN = 2 and dfD = 8).

The null hypothesis can be rejected at 5% significance.
Answer: True, because the critical F value at 5% is 4.46, which is less than the F value from the test (4.5). This means that the P-value from the test is less than the 5% significance level, thus we reject H0.

The null hypothesis cannot be rejected at 1% significance.
Answer: True, because the critical F value at 1% is 8.65, which is greater than the F value from the test.

There are 10 observations in the experiment.
Answer: False. There are 11 observations in this experiment (the total degrees of freedom are 2 + 8 = 10).

13- A statistically significant F in an ANOVA indicates that you have identified which levels of factors are different from the others.
Answer: False. The F value only tells whether the factor levels differ or not; it does not tell which ones are different from the others.
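The answer to question 4 can be checked numerically. The following is a minimal sketch, assuming equal group sizes (so that the between-group mean square is n times the variance of the group means and the within-group mean square is the average of the group variances). The data values and the helper name `f_equal_n` are hypothetical and chosen only for illustration.

```python
from statistics import mean, variance

def f_equal_n(groups):
    """F ratio for a one-way ANOVA when every group has the same size n."""
    n = len(groups[0])
    ms_between = n * variance([mean(g) for g in groups])  # between-group mean square
    ms_within = mean([variance(g) for g in groups])       # pooled within-group mean square
    return ms_between / ms_within

# Hypothetical observations for three groups of three.
groups = [[3.0, 5.0, 4.0], [6.0, 8.0, 7.0], [9.0, 12.0, 10.0]]
doubled = [[2 * y for y in g] for g in groups]

f_original = f_equal_n(groups)
f_doubled = f_equal_n(doubled)

# Each variance is multiplied by 4, so the ratio is unchanged.
print(variance(groups[0]), variance(doubled[0]))
print(f_original, f_doubled)
```

Both mean squares pick up the same factor of 4, which is why the two printed F values agree.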


Chapter 6- Exercises

4- A manufacturer of concrete bridge supports is interested in determining the effect of varying the sand content of concrete on the strength of the supports. Five supports are made for each of five different amounts of sand in the concrete mix and each support is tested for compression resistance. The results are shown in the table below:

Compression Resistance (10,000 psi)
Percent Sand   Support 1   Support 2   Support 3   Support 4   Support 5
A = 15              7           7          10          15           9
B = 20             17          12          11          18          19
C = 25             14          18          18          19          19
D = 30             20          24          22          19          23
E = 35              7          10          11          15          11

a- Perform the analysis to determine whether there is an effect due to changing the sand content.
Answer: The factor is the sand content and the treatments are the five sand contents, named A, B, C, D and E. To determine whether there is a significant difference among the means of the sand contents, we use ANOVA, which requires checking the assumptions stated below.
1- The populations from which the samples were obtained must be normally or approximately normally distributed.
2- The samples must be independent.
3- The variances of the populations must be equal.

Checking the Normality assumption:
H0: Population A (Sand=15%) has the specified theoretical (normal) distribution
H1: the distribution is not the theoretical distribution
Moments
N                  5             Sum Weights        5
Mean               9.6           Sum Observations   48
Std Deviation      3.28633535    Variance           10.8
Skewness           1.43410896    Kurtosis           2.0936214
Uncorrected SS     504           Corrected SS       43.2
Coeff Variation    34.2326598    Std Error Mean     1.46969385

Tests for Normality
Test                  Statistic          p Value
Shapiro-Wilk          W      0.844815    Pr < W      0.1787
Kolmogorov-Smirnov    D      0.251562    Pr > D      >0.1500
Cramer-von Mises      W-Sq   0.067783    Pr > W-Sq   0.2467
Anderson-Darling      A-Sq   0.420715    Pr > A-Sq   0.1932

H0: Population B (Sand=20%) has the specified theoretical (normal) distribution
H1: the distribution is not the theoretical distribution

Moments
N                  5             Sum Weights        5
Mean               15.4          Sum Observations   77
Std Deviation      3.64691651    Variance           13.3
Skewness           -0.4824345    Kurtosis           -2.8509243
Uncorrected SS     1239          Corrected SS       53.2
Coeff Variation    23.681276     Std Error Mean     1.63095064

Tests for Normality
Test                  Statistic          p Value
Shapiro-Wilk          W      0.860318    Pr < W      0.2294
Kolmogorov-Smirnov    D      0.26957     Pr > D      >0.1500
Cramer-von Mises      W-Sq   0.068758    Pr > W-Sq   0.2404
Anderson-Darling      A-Sq   0.398516    Pr > A-Sq   0.2235

H0: Population C (Sand=25%) has the specified theoretical (normal) distribution
H1: the distribution is not the theoretical distribution

Moments
N                  5             Sum Weights        5
Mean               17.6          Sum Observations   88
Std Deviation      2.07364414    Variance           4.3
Skewness           -1.9177563    Kurtosis           3.87777177
Uncorrected SS     1566          Corrected SS       17.2
Coeff Variation    11.782069     Std Error Mean     0.92736185

Tests for Normality
Test                  Statistic          p Value
Shapiro-Wilk          W      0.738725    Pr < W      0.0233
Kolmogorov-Smirnov    D      0.37648     Pr > D      0.0201
Cramer-von Mises      W-Sq   0.127365    Pr > W-Sq   0.0340
Anderson-Darling      A-Sq   0.686044    Pr > A-Sq   0.0296

H0: Population D (Sand=30%) has the specified theoretical (normal) distribution
H1: the distribution is not the theoretical distribution

Moments
N                  5             Sum Weights        5
Mean               21.6          Sum Observations   108
Std Deviation      2.07364414    Variance           4.3
Skewness           -0.2355139    Kurtosis           -1.9632234
Uncorrected SS     2350          Corrected SS       17.2
Coeff Variation    9.60020433    Std Error Mean     0.92736185

Tests for Normality
Test                  Statistic          p Value
Shapiro-Wilk          W      0.952351    Pr < W      0.7540
Kolmogorov-Smirnov    D      0.179821    Pr > D      >0.1500
Cramer-von Mises      W-Sq   0.031987    Pr > W-Sq   >0.2500
Anderson-Darling      A-Sq   0.206799    Pr > A-Sq   >0.2500

H0: Population E (Sand=35%) has the specified theoretical (normal) distribution
H1: the distribution is not the theoretical distribution

Moments
N                  5             Sum Weights        5
Mean               10.8          Sum Observations   54
Std Deviation      2.86356421    Variance           8.2
Skewness           0.33218026    Kurtosis           1.66864961
Uncorrected SS     616           Corrected SS       32.8
Coeff Variation    26.5144835    Std Error Mean     1.28062485

Tests for Normality
Test                  Statistic          p Value
Shapiro-Wilk          W      0.941971    Pr < W      0.6799
Kolmogorov-Smirnov    D      0.272159    Pr > D      >0.1500
Cramer-von Mises      W-Sq   0.056065    Pr > W-Sq   >0.2500
Anderson-Darling      A-Sq   0.303417    Pr > A-Sq   >0.2500

Conclusion of checking the Normality Assumption: Since the Shapiro-Wilk test is designed specifically for testing normality, we use it as the reference for summarizing the results.

Population            Shapiro-Wilk test P-value   Decision
Sand Content = 15%    0.1787 > 0.05               Do not reject H0
Sand Content = 20%    0.2294 > 0.05               Do not reject H0
Sand Content = 25%    0.0233 < 0.05               Reject H0
Sand Content = 30%    0.7540 > 0.05               Do not reject H0
Sand Content = 35%    0.6799 > 0.05               Do not reject H0

Based on the tests for normality, and considering the histogram and QQ-plot for each of the five levels of sand content, we conclude that all of the data come from normal or approximately normal populations, except for the 3rd population (Sand Level=25%).

[Figures: histogram and QQ-plot for each sand level: 15%, 20%, 25%, 30% and 35%]

Checking the assumption of equality of variance
Testing the equality of variance with Levene's test:

Levene's Test for Homogeneity of Y Variance
ANOVA of Absolute Deviations from Group Means
Source       DF   Sum of Squares   Mean Square   F Value   Pr > F
treatment     4           8.8320        2.2080      0.95   0.4573
Error        20          46.6080        2.3304

Covariance Parameter Estimates
Cov Parm    Group          Estimate
Residual    treatment A     10.8000
Residual    treatment B     13.3000
Residual    treatment C      4.3000
Residual    treatment D      4.3000
Residual    treatment E      8.2000
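The Levene table above is a one-way ANOVA applied to the absolute deviations of each observation from its group mean. The following is a minimal sketch of that calculation in plain Python, assuming the data layout read from the exercise table (the group sums agree with the moments output). The variable names are illustrative, and the p-value is not computed here because it needs an F distribution routine.

```python
from statistics import mean

# Compression resistance (10,000 psi) by sand level, supports 1 to 5.
data = {
    "A": [7, 7, 10, 15, 9],
    "B": [17, 12, 11, 18, 19],
    "C": [14, 18, 18, 19, 19],
    "D": [20, 24, 22, 19, 23],
    "E": [7, 10, 11, 15, 11],
}

# Step 1: absolute deviations from each group mean.
abs_dev = [[abs(y - mean(g)) for y in g] for g in data.values()]

# Step 2: ordinary one-way ANOVA on those deviations.
n_total = sum(len(d) for d in abs_dev)
grand = sum(sum(d) for d in abs_dev) / n_total
ss_between = sum(len(d) * (mean(d) - grand) ** 2 for d in abs_dev)
ss_within = sum((v - mean(d)) ** 2 for d in abs_dev for v in d)
df_between = len(abs_dev) - 1
df_within = n_total - len(abs_dev)
f_value = (ss_between / df_between) / (ss_within / df_within)

print(round(ss_between, 4), round(ss_within, 4), round(f_value, 2))
```

The sums of squares and the F value can be compared directly with the treatment and Error rows of the Levene output.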


Conclusion of checking the equality of variance assumption: We do not reject the hypothesis of equality of variance, since the P-value from Levene's test (0.4573) is greater than the common significance level (0.05). Thus we conclude that the assumption of homogeneity of variance is met. This conclusion is also supported by the estimated variances listed above.

Now that we have checked our assumptions, we can continue with the analysis to determine whether there is an effect due to changing the sand content. Our hypotheses are as below:
H0: $\mu_A = \mu_B = \mu_C = \mu_D = \mu_E$
H1: At least one mean is different from the others

Since we have more than two means, we cannot use the t-test (for the reasons mentioned in the text) and we should use ANOVA for this comparison. If there is no difference in the means, the between-group variance estimate will be approximately equal to the within-group variance estimate, the F-test value will be approximately equal to 1, and the null hypothesis will not be rejected. If the means differ significantly, the between-group variance will be much larger than the within-group variance, so the F-test value will be significantly greater than 1 and the null hypothesis will be rejected.

Using both the SAS and Excel software packages, the ANOVA outputs are as below:

From SAS:
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               4      486.4000000   121.6000000     14.87   <.0001
Error              20      163.6000000     8.1800000
Corrected Total    24      650.0000000

From Excel:
SUMMARY
Groups   Count   Sum   Average   Variance
Row 1        5    48       9.6       10.8
Row 2        5    77      15.4       13.3
Row 3        5    88      17.6        4.3
Row 4        5   108      21.6        4.3
Row 5        5    54      10.8        8.2

ANOVA
Source of Variation   SS      df   MS      F          P-value    F crit
Between Groups        486.4    4   121.6   14.86553   8.65E-06   2.866081
Within Groups         163.6   20   8.18
Total                 650     24
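The ANOVA tables above can be reproduced by hand from the raw data. The following is a minimal sketch in plain Python, assuming the data layout read from the exercise table; the variable names are illustrative and no statistics package is used, so the P-value and critical value are taken from the software output rather than recomputed.

```python
from statistics import mean

# Compression resistance (10,000 psi) by sand level, supports 1 to 5.
data = {
    "A": [7, 7, 10, 15, 9],      # 15% sand
    "B": [17, 12, 11, 18, 19],   # 20% sand
    "C": [14, 18, 18, 19, 19],   # 25% sand
    "D": [20, 24, 22, 19, 23],   # 30% sand
    "E": [7, 10, 11, 15, 11],    # 35% sand
}

groups = list(data.values())
n_total = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / n_total

# Between-group and within-group sums of squares.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)

df_between = len(groups) - 1
df_within = n_total - len(groups)
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_value = ms_between / ms_within

print(f"SS between = {ss_between:.1f}, SS within = {ss_within:.1f}")
print(f"MS between = {ms_between:.2f}, MS within = {ms_within:.2f}")
print(f"F = {f_value:.5f}")
```

The printed sums of squares and mean squares correspond to the Model and Error rows of the SAS table.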


Conclusion: The F value from the ANOVA is 14.87, which is much larger than the critical value of F, 2.87. Equivalently, the P-value from the ANOVA is much smaller than the 0.05 significance level. We therefore reject the null hypothesis of equal means and conclude that at least one of the five sand-content means differs from the others.


b. Redo the analyses as a linear regression of compression resistance on sand content. Check the assumptions and if met, test whether the slope is not equal to 0.
Answer: From the ANOVA we have already concluded that the means of the five sand levels are not all equal; that is, there is sufficient evidence that at least one sand-level mean differs from the others. Before analyzing the means of the sand levels further, we first need to check the model assumptions. The ANOVA can also be represented as the linear model for several populations,

$y_{ij} = \mu_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, t$, $j = 1, 2, \ldots, n$

$y_{ij}$: the jth observed sample value from the ith population
$\mu_i$: mean of the ith population
$\varepsilon_{ij}$: difference or deviation of the jth observed value from its respective population mean.

With the following assumptions:
1. The $\varepsilon_{ij}$ are normally distributed random variables with mean 0 and variance $\sigma^2$.
2. The $\varepsilon_{ij}$ are independent in the probability sense; that is, the behavior of one $\varepsilon_{ij}$ is not affected by the value of any other.
Variable: resid

Tests for Normality
Test                  Statistic          p Value
Shapiro-Wilk          W      0.967737    Pr < W      0.5884
Kolmogorov-Smirnov    D      0.13053     Pr > D      >0.1500
Cramer-von Mises      W-Sq   0.055396    Pr > W-Sq   >0.2500
Anderson-Darling      A-Sq   0.326296    Pr > A-Sq   >0.2500

Quantile     Estimate
100% Max     5.4
99%          5.4
95%          4.2
90%          3.6
75% Q3       1.4

Extreme Observations
----Lowest----     ----Highest---
Value     Obs      Value     Obs
 -4.4       8        2.4      17
 -3.8      21        2.6       9
 -3.6      11        3.6      10
 -3.4       7        4.2      24
 -2.6      19        5.4       4

Stem   Leaf       #
   5   4          1
   4   2          1
   3   6          1
   2   46         2
   1   4446       4
   0   224444     6
  -0   86         2
  -1   6          1
  -2   666        3
  -3   864        3
  -4   4          1

[Figures: boxplot and normal probability plot of the residuals]
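The residuals summarized above are the estimated deviations of each observation from its own group mean. The following is a minimal sketch of how they can be obtained, assuming the data layout read from the exercise table and assuming that observations are numbered A1 to A5, then B1 to B5, and so on (which agrees with the extreme observations listed in the output).

```python
# Compression resistance (10,000 psi) by sand level, supports 1 to 5.
data = {
    "A": [7, 7, 10, 15, 9],
    "B": [17, 12, 11, 18, 19],
    "C": [14, 18, 18, 19, 19],
    "D": [20, 24, 22, 19, 23],
    "E": [7, 10, 11, 15, 11],
}

# Residual = observation minus the mean of its own sand level.
residuals = []
for level, values in data.items():
    group_mean = sum(values) / len(values)
    residuals.extend(y - group_mean for y in values)

largest = max(residuals)
smallest = min(residuals)
print(f"largest residual  {largest:.1f} at observation {residuals.index(largest) + 1}")
print(f"smallest residual {smallest:.1f} at observation {residuals.index(smallest) + 1}")
print(f"sum of residuals  {sum(residuals):.1e}")
```

By construction the residuals sum to zero (up to floating-point rounding), which is why only their spread and shape are examined in the normality checks.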

[Figure: STRENGTH versus SAND with the fitted line STRENGTH = 10.7 + 0.172 SAND (N = 25, R-sq = 0.0569, Adj R-sq = 0.0159, RMSE = 5.1627)]

[Figure: residuals versus predicted values for the same regression fit]

Testing whether the slope is equal to zero:
H0: $\beta_1 = 0$
H1: $\beta_1 \neq 0$

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             10.70000          3.79376      2.82     0.0097
SAND         1              0.17200          0.14602      1.18     0.2509

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1         36.98000      36.98000      1.39   0.2509
Error              23        613.02000      26.65304
Corrected Total    24        650.00000

Root MSE           5.16266     R-Square    0.0569
Dependent Mean    15.00000     Adj R-Sq    0.0159
Coeff Var         34.41772

Conclusion: The t value is 1.18, which leads to a p-value of 0.2509, larger than the 0.05 significance level. The F value is 1.39 and leads to the same p-value, which shows that there is not enough evidence to reject the null hypothesis that the slope is equal to zero. In other words, based on this test, we do not reject H0.

A measure of the amount of explained variability is R-squared, which is equal to 0.0569. This value is very close to zero and shows that there is almost no linear relationship between Y and X.

From the residual plot, histogram and QQ-plot of the $\varepsilon_{ij}$, we can conclude that the errors are normally distributed random variables. Since the p-values from all of the normality tests are greater than the common significance level (0.05), there is not sufficient evidence to reject the null hypothesis of normality; thus we conclude that the data come from normally distributed populations.
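The regression estimates quoted in this conclusion can be reproduced with the usual least-squares formulas. The following is a minimal sketch in plain Python, assuming the data layout read from the exercise table; the variable names are illustrative, and the p-value is not computed because it needs a t distribution routine.

```python
from math import sqrt

sand_levels = [15, 20, 25, 30, 35]
resistance = [
    [7, 7, 10, 15, 9],
    [17, 12, 11, 18, 19],
    [14, 18, 18, 19, 19],
    [20, 24, 22, 19, 23],
    [7, 10, 11, 15, 11],
]

# One (x, y) pair per support.
xs = [x for x, group in zip(sand_levels, resistance) for _ in group]
ys = [y for group in resistance for y in group]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
syy = sum((y - y_bar) ** 2 for y in ys)

slope = sxy / sxx
intercept = y_bar - slope * x_bar
ss_model = slope * sxy            # regression sum of squares
ss_error = syy - ss_model         # residual sum of squares
r_squared = ss_model / syy
mse = ss_error / (n - 2)
se_slope = sqrt(mse / sxx)
t_value = slope / se_slope

print(f"fit: y = {intercept:.2f} + {slope:.3f} x")
print(f"R-squared = {r_squared:.4f}, t for slope = {t_value:.2f}")
```

The low R-squared arises because the group means rise from 15% to 30% sand and then fall at 35%, a pattern that a single straight line cannot follow.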


Comparing the means of the sand levels, we have $\binom{5}{2} = 10$ pairwise hypotheses:
H0: $\mu_A = \mu_B$   H1: $\mu_A \neq \mu_B$
H0: $\mu_A = \mu_C$   H1: $\mu_A \neq \mu_C$
H0: $\mu_A = \mu_D$   H1: $\mu_A \neq \mu_D$
H0: $\mu_A = \mu_E$   H1: $\mu_A \neq \mu_E$
H0: $\mu_B = \mu_C$   H1: $\mu_B \neq \mu_C$
H0: $\mu_B = \mu_D$   H1: $\mu_B \neq \mu_D$
H0: $\mu_B = \mu_E$   H1: $\mu_B \neq \mu_E$
H0: $\mu_C = \mu_D$   H1: $\mu_C \neq \mu_D$
H0: $\mu_C = \mu_E$   H1: $\mu_C \neq \mu_E$
H0: $\mu_D = \mu_E$   H1: $\mu_D \neq \mu_E$

The GLM Procedure
Least Squares Means
Adjustment for Multiple Comparisons: Tukey
Sand_Level   resistance LSMEAN   LSMEAN Number
A                    9.6000000               1
B                   15.4000000               2
C                   17.6000000               3
D                   21.6000000               4
E                   10.8000000               5

Protected Fisher's LSD approach:

Least Squares Means for effect Sand_Level
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: resistance
i/j        1         2         3         4         5
1          -    0.0044    0.0003    <.0001    0.5146
2     0.0044         -    0.2381    0.0027    0.0194
3     0.0003    0.2381         -    0.0388    0.0012
4     <.0001    0.0027    0.0388         -    <.0001
5     0.5146    0.0194    0.0012    <.0001         -
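The LSD decisions can be cross-checked without p-values by comparing each difference in means with the least significant difference, $LSD = t_{\alpha/2, df} \sqrt{2 \cdot MSE / n}$. The following is a minimal sketch using the LSMEANs above and MSE = 8.18 with 20 error degrees of freedom from the ANOVA table. The critical value 2.086 is the tabulated $t_{0.025, 20}$, entered by hand as an assumption rather than computed.

```python
from itertools import combinations
from math import sqrt

means = {"A": 9.6, "B": 15.4, "C": 17.6, "D": 21.6, "E": 10.8}  # LSMEANs
mse = 8.18       # error mean square from the one-way ANOVA
n = 5            # supports per sand level
t_crit = 2.086   # assumed table value of t(0.025, 20)

# Least significant difference for any pair of equally sized groups.
lsd = t_crit * sqrt(2 * mse / n)

significant, not_significant = [], []
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    (significant if diff > lsd else not_significant).append((a, b))

print(f"LSD = {lsd:.3f}")
print("significant pairs:    ", significant)
print("not significant pairs:", not_significant)
```

The procedure is called protected because these pairwise comparisons are made only after the overall F test has rejected the hypothesis of equal means. The pairs flagged as not significant are the ones whose p-values in the matrix above exceed 0.05.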


Tukey approach:

Least Squares Means for effect Sand_Level
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: resistance
i/j        1         2         3         4         5
1          -    0.0320    0.0022    <.0001    0.9620
2     0.0320         -    0.7423    0.0200    0.1204
3     0.0022    0.7423         -    0.2159    0.0096
4     <.0001    0.0200    0.2159         -    <.0001
5     0.9620    0.1204    0.0096    <.0001         -

Decision summary based on Tukey approach

Pair      P-value            Decision
A vs B    0.0320 < 0.05      Reject H0: $\mu_A = \mu_B$
A vs C    0.0022 < 0.05      Reject H0: $\mu_A = \mu_C$
A vs D    <.0001 < 0.05      Reject H0: $\mu_A = \mu_D$
A vs E    0.9620 > 0.05      Do not reject H0: $\mu_A = \mu_E$
B vs C    0.7423 > 0.05      Do not reject H0: $\mu_B = \mu_C$
B vs D    0.0200 < 0.05      Reject H0: $\mu_B = \mu_D$
B vs E    0.1204 > 0.05      Do not reject H0: $\mu_B = \mu_E$
C vs D    0.2159 > 0.05      Do not reject H0: $\mu_C = \mu_D$
C vs E    0.0096 < 0.05      Reject H0: $\mu_C = \mu_E$
D vs E    <.0001 < 0.05      Reject H0: $\mu_D = \mu_E$

Conclusion of Mean comparison: From the Least Squares Means, it is clear that the means of Sand Level A and Sand Level E are very close together (Mean A=9.6 and Mean E=10.8), which agrees with not rejecting H0: $\mu_A = \mu_E$. Sand Level B and Sand Level C also have close means (15.4 and 17.6). This is confirmed by the Tukey comparison of means, where the decision is not to reject H0: $\mu_B = \mu_C$. It should be noted that a different conclusion may be reached for some pairs with the Protected Fisher's LSD approach.

From the fit test, we see that R-square ($R^2$), which is a measure of the amount of explained variability, is equal to 0.748. This implies that the model explains approximately 75% of the observed variability in Y (compression resistance).

Fit Test: ($R^2$), Grand Mean is 15.00
R-Square    Coeff Var    Root MSE    Y Mean
0.748308    19.06713     2.860070    15.00000
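The fit statistics in this table follow directly from the one-way ANOVA sums of squares (Model 486.4, Error 163.6 with 20 degrees of freedom) and the grand mean of 15. The following is a minimal sketch of that arithmetic; the variable names are illustrative.

```python
from math import sqrt

ss_model = 486.4     # between-group (Model) sum of squares
ss_error = 163.6     # within-group (Error) sum of squares
df_error = 20
grand_mean = 15.0

ss_total = ss_model + ss_error
r_squared = ss_model / ss_total           # share of variability explained
root_mse = sqrt(ss_error / df_error)      # estimate of the error standard deviation
coeff_var = 100 * root_mse / grand_mean   # Root MSE as a percentage of the mean

print(f"R-Square  = {r_squared:.6f}")
print(f"Root MSE  = {root_mse:.6f}")
print(f"Coeff Var = {coeff_var:.5f}")
```

An R-Square of about 0.748 corresponds to roughly 75% of the variability in resistance being explained by the sand level.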


12- For laboratory studies of an organism, it is important to provide a medium in which the organism flourishes. The data for this exercise, shown in the table below, are from a completely randomized design with four samples for each of seven media. The response is the diameter of the colonies of fungus.

Medium   Fungus colony Diameters
WA       4.1   4.4   4.5   4.0
RDA      6.8   7.2   7.1   6.9
PDA      7.9   7.6   7.8   7.6
CMA      6.2   6.0   6.5   6.4
TWA      5.0   5.4   5.1   5.2
PCA      6.2   6.2   6.1   6.0
NA       6.8   6.6   7.0   6.8

a. Perform an analysis of variance to determine whether there are different growth rates among the media.

Dependent Variable: Y
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               6      32.75857143    5.45976190    168.61   <.0001
Error              21       0.68000000    0.03238095
Corrected Total    27      33.43857143

R-Square    Coeff Var    Root MSE    Y Mean
0.979664    2.905720     0.179947    6.192857

ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Between Groups        32.75857    6   5.459762   168.6103   1.19E-16   2.572712
Within Groups         0.68       21   0.032381
Total                 33.43857   27

Conclusion: The very large F value leads to a very small P-value, so the null hypothesis is rejected at any conventional significance level. Therefore, we conclude that there are different growth rates among the media.


b. Is this exercise appropriate for preplanned or post hoc comparison? Perform the appropriate method and make recommendations.
Answer: This exercise is appropriate for post-hoc comparison, since the problem is to find a medium in which the organism flourishes. This means that we first need to test whether the media are significantly different. Then, based on the result, we need a multiple comparison technique. These techniques are of two general types:
1- Pre-planned comparisons (generated prior to the experiment being conducted). Pre-planned comparisons should be performed whenever possible because they have more power and because a post-hoc comparison may not provide useful results.
2- Post-hoc comparisons (which use the result of the analysis to formulate the hypotheses).

As mentioned, we use post-hoc comparison, in which specific hypotheses are based on observed differences among the estimated factor level means. That is, the hypotheses are based on the sample data. Most post-hoc comparison procedures are restricted to testing contrasts that compare pairs of means, H0: $\mu_i = \mu_j$ for all $i \neq j$.

Here we can use the Tukey test, after the analysis of variance has been completed, to make pairwise comparisons between the groups, which have the same sample size:

$q = \frac{\bar{X}_i - \bar{X}_j}{\sqrt{s_W^2 / n}}$

where $\bar{X}_i$ and $\bar{X}_j$ are the means of the samples being compared, n is the size of the samples and $s_W^2$ is the within-group variance. When the absolute value of q is greater than the critical value for the Tukey test, there is a significant difference between the two means being compared.
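The Tukey statistic described above, taken here as $q = |\bar{X}_i - \bar{X}_j| / \sqrt{s_W^2 / n}$, can be evaluated for every pair of media. The following is a minimal sketch using the media means reported by SAS and the within-group variance 0.03238095 (the Error mean square) with n = 4. The critical value `q_crit = 4.6` is only an approximate table value for 7 groups and 21 error degrees of freedom at the 5% level; it is an assumption for illustration and should be replaced by the exact tabulated value.

```python
from itertools import combinations
from math import sqrt

# Mean colony diameter for each medium (LSMEANs from the SAS output).
means = {
    "CMA": 6.275, "NA": 6.800, "PCA": 6.125, "PDA": 7.725,
    "RDA": 7.000, "TWA": 5.175, "WA": 4.250,
}
s2_within = 0.03238095   # Error mean square from the ANOVA table
n = 4                    # samples per medium
q_crit = 4.6             # ASSUMED approximate critical value, not from the source

std_err = sqrt(s2_within / n)

q_values = {}
not_rejected = []
for a, b in combinations(means, 2):
    q = abs(means[a] - means[b]) / std_err
    q_values[(a, b)] = q
    if q <= q_crit:
        not_rejected.append((a, b))

for pair, q in sorted(q_values.items(), key=lambda item: item[1]):
    print(pair, round(q, 2))
print("pairs not significantly different:", not_rejected)
```

Under these assumptions only two pairs stay below the critical value, and the remaining q statistics are all above 5.8, so the decisions are not sensitive to the exact critical value used.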
The GLM Procedure
Least Squares Means
Adjustment for Multiple Comparisons: Tukey
MEDIUM   DIAM LSMEAN   LSMEAN Number
CMA       6.27500000               1
NA        6.80000000               2
PCA       6.12500000               3
PDA       7.72500000               4
RDA       7.00000000               5
TWA       5.17500000               6
WA        4.25000000               7

Least Squares Means for effect MEDIUM
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: DIAM
i/j        1         2         3         4         5         6         7
1          -    0.0074    0.8944    <.0001    0.0002    <.0001    <.0001
2     0.0074         -    0.0005    <.0001    0.7004    <.0001    <.0001
3     0.8944    0.0005         -    <.0001    <.0001    <.0001    <.0001
4     <.0001    <.0001    <.0001         -    0.0002    <.0001    <.0001
5     0.0002    0.7004    <.0001    0.0002         -    <.0001    <.0001
6     <.0001    <.0001    <.0001    <.0001    <.0001         -    <.0001
7     <.0001    <.0001    <.0001    <.0001    <.0001    <.0001         -

Conclusion for H0: $\mu_i = \mu_j$ for all $i \neq j$
Decision summary based on Tukey approach
i/j   1               2               3               4        5               6        7
1     -               Reject          Do not Reject   Reject   Reject          Reject   Reject
2     Reject          -               Reject          Reject   Do not Reject   Reject   Reject
3     Do not Reject   Reject          -               Reject   Reject          Reject   Reject
4     Reject          Reject          Reject          -        Reject          Reject   Reject
5     Reject          Do not Reject   Reject          Reject   -               Reject   Reject
6     Reject          Reject          Reject          Reject   Reject          -        Reject
7     Reject          Reject          Reject          Reject   Reject          Reject   -
