
TABLE OF CONTENTS

1.0 INTRODUCTION
1.1 OBJECTIVES
1.2 DATA DESCRIPTION
2.0 LITERATURE REVIEWS
3.0 FINDINGS AND ANALYSIS
4.0 CONCLUSIONS
5.0 REFERENCES

1.0 INTRODUCTION

"It is not enough to know that a sample could have come from a normal population; we must be clear that it is at the same time improbable that it has come from a population differing so much from the normal as to invalidate the use of the normal theory tests in further handling of the material."
E. S. Pearson, 1930 (quoted on page 1 of Tests of Normality, Henry C. Thode, Jr., 2002)

Many of the statistical methods that we will apply require the assumption that a variable or variables are normally distributed. The multivariate normality assumption is more complex than the univariate normality assumption.

With multivariate statistics, the assumption is that the combination of variables follows a multivariate normal distribution. Two further properties of the multivariate normal distribution are that every subset of the components has a (multivariate) normal distribution and that every linear combination of the components is normally distributed. Because direct tests for multivariate normality are less widely available in software (Looney, 1995), we generally test each variable individually and treat the variables as multivariate normal if they are individually normal. This is not necessarily the case: normality of each variable is necessary, but not sufficient, for multivariate normality.
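A standard counterexample (not part of the original study) shows why individual normality is not enough. Let X be standard normal and let Y = W·X, where W is +1 or -1 with equal probability, independently of X. By symmetry Y is also standard normal, yet X + Y equals exactly 0 about half of the time, so this linear combination is not normal and (X, Y) cannot be bivariate normal. The minimal sketch below uses only the Python standard library; the function name `marginally_normal_pair` and the seed are illustrative choices.

```python
import random

def marginally_normal_pair(n, seed=0):
    """Draw n pairs (X, Y) with X ~ N(0, 1) and Y = W * X, where W is +1 or -1
    with equal probability, independent of X. Each of X and Y is N(0, 1),
    but the pair is not bivariate normal."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ws = [rng.choice((-1, 1)) for _ in range(n)]
    ys = [w * x for w, x in zip(ws, xs)]
    return xs, ys

xs, ys = marginally_normal_pair(10_000)

# Under bivariate normality X + Y would itself be normal, and a normal variable
# (even a degenerate one) cannot put about half its probability on 0 and
# spread the rest over other values.
share_zero = sum(1 for x, y in zip(xs, ys) if x + y == 0.0) / len(xs)
print(f"share of X + Y exactly equal to 0: {share_zero:.3f}")
```

Checking each variable separately would pass both X and Y here, which is exactly the gap between univariate and multivariate normality described above.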

There are several ways to test for multivariate normality and some are easy to implement. There are both graphical and statistical methods for evaluating normality.

Graphical methods include the histogram and the normality plot.

a) The normality plot

Normal test plots (also called normal probability plots or normal quantile plots) are a graphical technique used to investigate whether data follow the normal "bell curve". When the ordered data are plotted against the quantiles of a theoretical normal distribution, they should form an approximately straight line; departures from the straight line indicate departures from normality. However, no plot will be exactly linear, because data are subject to randomness in their collection.
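The construction of a normal probability plot can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS procedure used later: the plotting positions use Blom's formula (i - 3/8)/(n + 1/4), which is one common choice among several, and the helper names `normal_plot_points` and `reference_line` are hypothetical. For data that really are normal, the fitted reference line has an intercept near the mean and a slope near the standard deviation.

```python
from statistics import NormalDist, fmean

def normal_plot_points(data):
    """Pairs (theoretical normal quantile, ordered observation) for a normal
    probability plot, using Blom plotting positions (i - 3/8) / (n + 1/4)."""
    n = len(data)
    z = NormalDist()  # standard normal N(0, 1)
    return [(z.inv_cdf((i - 0.375) / (n + 0.25)), x)
            for i, x in enumerate(sorted(data), start=1)]

def reference_line(points):
    """Least-squares line: observed = intercept + slope * quantile."""
    qs = [q for q, _ in points]
    xs = [x for _, x in points]
    q_bar, x_bar = fmean(qs), fmean(xs)
    sxy = sum((q - q_bar) * (x - x_bar) for q, x in points)
    sqq = sum((q - q_bar) ** 2 for q in qs)
    slope = sxy / sqq
    return x_bar - slope * q_bar, slope

# Usage: 30 values placed exactly at N(50, 10^2) quantiles lie on a straight line.
n = 30
z = NormalDist()
data = [50 + 10 * z.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
intercept, slope = reference_line(normal_plot_points(data))
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")  # intercept = 50.00, slope = 10.00
```

With real data the points scatter around the line; a systematic curve (rather than random wiggle) is what signals non-normality.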

b) Histogram

The purpose of a histogram (Chambers) is to graphically summarize the distribution of a univariate data set. The result is a rough approximation of the frequency distribution of the data, and its shape conveys important information such as the probability distribution of the data. In cases in which the distribution is known, a histogram that does not fit the distribution may provide clues about a process or measurement problem.
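A histogram can be compared with a fitted normal curve by counting observations in equal-width bins and setting them beside the counts a normal distribution would predict. The sketch below assumes the normal distribution is fitted with the sample mean and standard deviation and that the data are not all equal; `histogram_vs_normal` is a hypothetical helper name, and the sample is illustrative, not the LA data.

```python
from statistics import NormalDist, fmean, stdev

def histogram_vs_normal(data, bins):
    """Observed counts in equal-width bins, and the counts expected in the same
    bins under a normal distribution fitted with the sample mean and SD."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins  # assumes the data are not all equal
    counts = [0] * bins
    for x in data:
        k = min(int((x - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[k] += 1
    fitted = NormalDist(fmean(data), stdev(data))
    expected = [len(data) * (fitted.cdf(lo + (k + 1) * width) - fitted.cdf(lo + k * width))
                for k in range(bins)]
    return counts, expected

# Usage: a small illustrative sample.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5]
counts, expected = histogram_vs_normal(sample, bins=5)
for k, (c, e) in enumerate(zip(counts, expected)):
    print(f"bin {k + 1}: observed {'#' * c:<4} expected {e:.2f}")
```

The expected counts add up to slightly less than the sample size, because the fitted normal curve places some probability outside the observed range.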

However, with graphical methods it is difficult to tell whether an apparent non-normality is real or only the result of sampling error, especially when the sample size is small or moderate. Statistical tests can therefore be used to assess the normality of each variable. The Kolmogorov-Smirnov test and the Shapiro-Wilk test are examples of statistical tests commonly used by researchers.

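a) Kolmogorov-Smirnov test

The Kolmogorov-Smirnov test is a goodness-of-fit test for any fully specified continuous distribution. Its statistic D is the largest absolute difference between the empirical (sample) cumulative distribution function and the hypothesized cumulative distribution function. The test relies on the fact that the empirical cumulative distribution function converges to the true one as the sample grows, and that the scaled statistic √n·D has a known limiting (Kolmogorov) distribution when the null hypothesis is true.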
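The Kolmogorov-Smirnov statistic can be computed directly from the sorted sample. This sketch assumes the normal distribution's mean and standard deviation are specified in advance; when they are estimated from the same sample, the usual Kolmogorov-Smirnov critical values are too conservative, which is why a Lilliefors correction is often applied in that case. Only the statistic D is computed here, not its p-value, and `ks_statistic` is a hypothetical name.

```python
from statistics import NormalDist

def ks_statistic(data, dist):
    """One-sample Kolmogorov-Smirnov statistic D = max |F_n(x) - F(x)| for a
    fully specified continuous distribution `dist` (anything with a .cdf)."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Usage: test a small illustrative sample against N(0, 1).
sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.7]
print(f"D = {ks_statistic(sample, NormalDist(0, 1)):.3f}")
```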

b) Shapiro-Wilk test

The Shapiro-Wilk test compares a set of measurements against the normal distribution. It may be used before parametric tests to check that the data are consistent with a normal distribution.

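Unlike the Kolmogorov-Smirnov test, which is a general goodness-of-fit test for any continuous distribution, the Shapiro-Wilk test is designed specifically for normality. In a power comparison of the Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests, Razali and Wah (2011) found Shapiro-Wilk to be the most powerful of the four, making it the best of them at detecting departures from normality.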
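The exact Shapiro-Wilk coefficients require a special approximation algorithm, so in practice the test is run with a package such as SPSS or a library routine (for example `scipy.stats.shapiro` in SciPy, which returns the W statistic and a p-value). As a self-contained stand-in, the sketch below computes the closely related Shapiro-Francia statistic W', the squared correlation between the ordered sample and approximate expected normal order statistics (Blom scores). It is a swapped-in simplification, not the Shapiro-Wilk statistic itself, and no p-value is computed; `shapiro_francia` is a hypothetical name.

```python
import math
from statistics import NormalDist, fmean

def shapiro_francia(data):
    """Shapiro-Francia W': squared correlation between the ordered sample and
    approximate expected normal order statistics (Blom scores).
    Values near 1 are consistent with normality."""
    n = len(data)
    xs = sorted(data)
    z = NormalDist()
    m = [z.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    x_bar, m_bar = fmean(xs), fmean(m)
    sxm = sum((x - x_bar) * (mi - m_bar) for x, mi in zip(xs, m))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    smm = sum((mi - m_bar) ** 2 for mi in m)
    return sxm ** 2 / (sxx * smm)

# Usage: exact normal quantiles versus a right-skewed (exponential) pattern.
n = 50
normal_like = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
skewed = [-math.log(1 - (i - 0.5) / n) for i in range(1, n + 1)]
print(f"W' normal-like: {shapiro_francia(normal_like):.4f}")
print(f"W' skewed:      {shapiro_francia(skewed):.4f}")
```

Like W, the statistic is close to 1 for normal data and falls as the ordered sample bends away from the normal scores, as the skewed example shows.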

1.1 OBJECTIVES

The objective of this case study is to check the normality of each selected variable, as a first step in assessing multivariate normality. To make the assessment fair, I will apply both the graphical and the statistical tests to each selected variable.

1.2 DATA DESCRIPTION

The LA Times map of the neighborhoods of Los Angeles contains data (U.S. Census 2000) on the 110 LA neighborhoods, including measures of education, income and population demographics. The variables include:

LA_Neighborhoods (LA_Nbhd)

Highest_Incomes (Income): Median household income reports the amount of money earned by the household that falls exactly in the middle of the pack.

Highest-Scoring_Public_Schools (Schools): The median API score reports the 2008 test results posted by the school that falls exactly in the middle of the pack. California's Academic Performance Index (API) combines several tests into a single number between 200 and 1000 for each school. The tests that make up the API and their weighting are listed on the California Department of Education website.

Most_Diverse_Population (Diversity): The diversity index measures the probability that any two residents, chosen at random, would be of different ethnicities. If all residents are of the same ethnic group, it is zero. If half are from one group and half from another, it is 50%. This figure is used to place areas into 10 groups of even size and then rank those groups. Areas with a diversity rank of 1 are the least ethnically diverse. Those with a diversity rank of 10 are the most ethnically diverse.
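The diversity index and the ten-group ranking can be sketched as follows. The text does not give the LA Times formula, so the sketch assumes the common form 1 - Σ p_k², where p_k is the share of group k (this treats the two residents as drawn with replacement). That form reproduces both cases stated above: 0 when everyone is in one group and 0.5 (50%) for an even two-way split. The function names are hypothetical, and ties in the ranking are broken by input order.

```python
def diversity_index(shares):
    """Probability that two residents drawn at random (with replacement) belong
    to different groups: 1 - sum of squared group proportions.
    `shares` may be proportions or head counts; they are normalized first."""
    total = sum(shares)
    return 1.0 - sum((s / total) ** 2 for s in shares)

def decile_ranks(values):
    """Split areas into 10 groups of (nearly) even size by value and rank the
    groups 1 (lowest) to 10 (highest). Ties are broken by input order."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for position, i in enumerate(order):
        ranks[i] = position * 10 // n + 1
    return ranks

print(diversity_index([1.0]))       # 0.0: everyone in one group
print(diversity_index([0.5, 0.5]))  # 0.5: half and half
```

With 110 neighborhoods, `decile_ranks` places exactly 11 neighborhoods in each of the 10 groups.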

Oldest_Population (Age): Median age reports the age of the person who falls exactly in the middle of the pack.

Most_Homeowners (Homes): The percentage of owners measures the portion of households that are owner occupied. A ranking of the percentage of households that rent is also available.

Most_Veterans (Vets): The percentage of veterans measures the portion of the adult population that once served in the armed forces. A ranking of active members of the armed forces is also available.

Asian (Asian): The percentage of the population whose ethnicity is Asian.

Black (Black): The percentage of the population whose ethnicity is black.

Latino (Latino): The percentage of the population whose ethnicity is Latino.

White (White): The percentage of the population whose ethnicity is white.

However, the test of the multivariate normality assumption includes only some of the response variables. I chose Income, Schools, Diversity, Homes, Population and Area.

2.0 LITERATURE REVIEWS

Multivariate Normal Distribution

Von Eye and Bogat (2004) recommend performing tests of multivariate normality before applying parametric multivariate procedures, in order to avoid inefficient and biased parameter estimates. In many social science applications, robustness cannot be assumed because sample sizes tend to be small.

Farrell, Matias and Naczk (2006) note that multivariate inferential techniques are very sensitive, especially to departures from the multivariate normal distribution. Even though a large number of tests have been proposed for detecting departures from multivariate normality, the assumption frequently goes untested.

Terzi (n.d.) wrote that many researchers do not test for multivariate normality, perhaps because some multivariate techniques are very robust against deviations from it. However, it is important to verify the assumption in order to understand the severity of any violation. We should also remember that some multivariate procedures are highly sensitive to non-normal distributions.

Looney (1995) lists a number of reasons for the reluctance to test for multivariate normality, including the lack of awareness of the existence of the tests, the limited availability of software, and the lack of information regarding their size and power.

Los Angeles City Neighborhoods Data

Schnittker (2004) indicates that education reduces the strength and curvature of the income-health relationship. In addition, those with more education have better health at all levels of income. Accordingly, fewer income-based health disparities exist among the well educated than among the less well educated.

In most metropolitan regions throughout the globe, urbanized land area is increasing to accommodate increasing population size (Marshall, 2007).

3.0 FINDINGS AND ANALYSIS

a) Statistical Tests

The data were analyzed using SPSS 17.0. All the

Variable      Kolmogorov-Smirnov             Shapiro-Wilk
              Statistic   df    Sig.         Statistic   df    Sig.
Income        .131        110   .000         .839        110   .000
School        .291        110   .000         .615        110   .000
Diversity     .108        110   .003         .935        110   .000
Homes         .174        110   .000         .844        110   .000
Population    .107        110   .004         .913        110   .000
Area          .207        110   .000         .785        110   .000

Table 1: Tests of Normality

Using α = 0.05 and the Kolmogorov-Smirnov and Shapiro-Wilk tests of normality, we reject the null hypothesis of normality for Income, School, Diversity, Homes, Population and Area. This is because the significance (p) value for every variable, under both tests, is less than 0.05 (the largest is .004), so according to these tests none of the six variables is normally distributed.
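The decision rule for Table 1 can be made explicit. For each test the null hypothesis is that the variable is normally distributed, and it is rejected when the significance (p) value is below α = 0.05. The sketch transcribes the Sig. values from Table 1 (a printed .000 means p < .0005 after rounding) and applies that rule; the dictionary layout and the `decide` helper are illustrative.

```python
ALPHA = 0.05

# Sig. (p) values transcribed from Table 1.
table1_sig = {
    "Income":     {"Kolmogorov-Smirnov": 0.000, "Shapiro-Wilk": 0.000},
    "School":     {"Kolmogorov-Smirnov": 0.000, "Shapiro-Wilk": 0.000},
    "Diversity":  {"Kolmogorov-Smirnov": 0.003, "Shapiro-Wilk": 0.000},
    "Homes":      {"Kolmogorov-Smirnov": 0.000, "Shapiro-Wilk": 0.000},
    "Population": {"Kolmogorov-Smirnov": 0.004, "Shapiro-Wilk": 0.000},
    "Area":       {"Kolmogorov-Smirnov": 0.000, "Shapiro-Wilk": 0.000},
}

def decide(p, alpha=ALPHA):
    """H0: the variable is normally distributed. Reject H0 when p < alpha."""
    return "reject normality" if p < alpha else "no evidence against normality"

for variable, tests in table1_sig.items():
    for test, p in tests.items():
        print(f"{variable:<10} {test:<18} p = {p:.3f} -> {decide(p)}")
```

A small p-value is evidence against normality; only a p-value at or above α leaves the normality assumption unchallenged.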

b) Graphical Method

We compare the results of the statistical tests with those of the graphical methods to check the normality assessment for the selected response variables.

i) Normal Probability Plot - Income

Figure 1

The general impression from the display is that the distribution of Income is approximately normal. The normal probability plot shows that the points lie approximately on a straight line, although there are some outliers. On this visual evidence, the normality assumption for Income appears reasonable.

ii) Histogram - Homes

Figure 2: The histogram of Homes

The histogram tends to show a bell-shaped curve, so the distribution appears approximately normal, although there are some outliers.

4.0 CONCLUSIONS

In conclusion, the two kinds of evidence for the Los Angeles neighborhood data do not agree. The Kolmogorov-Smirnov and Shapiro-Wilk tests reject normality at α = 0.05 for all six variables (every p-value is .004 or smaller), whereas the normal probability plot of Income and the histogram of Homes suggest that these distributions are approximately normal apart from some outliers. Because normality of each individual variable is a necessary condition for multivariate normality, the test results indicate that the multivariate normality assumption is not supported for this set of variables. The objective of this case study, to check each selected variable with both graphical and statistical methods, has been achieved.

5.0 REFERENCES

Alexander Shapiro (2008), Asymptotic normality of test statistics under alternative hypothesis, Journal of Multivariate Analysis, Volume 100 (2009), pp. 936-945.

Alexander Von Eye, G. Anne Bogat (2004), Testing the assumption of multivariate normality, Psychology Science, Volume 46 (2004), pp. 243-258.

Christopher J. Mecklin, Daniel J. Mundfrom (2004), An Appraisal and Bibliography of Tests for Multivariate Normality.

Gabor J. Szekely, Maria L. Rizzo (2002), A new test for multivariate normality, Journal of Multivariate Analysis, Volume 93 (2005), pp. 58-80.

J. Schnittker (2004), Education and the Changing Shape of the Income Gradient in Health, Journal of Health and Social Behavior, Volume 45, No. 3 (September 2004), pp. 286-305.

Julian D. Marshall (2007), Urban Land Area and Population Growth: A New Scaling Relationship for Metropolitan Expansion, Urban Studies, Volume 44, No. 10 (September 2007), pp. 1889-1904.

Nornadiah Mohd Razali, Yap Bee Wah (2011), Power comparison of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling Tests, Journal of Statistical Modelling and Analytics, Volume 2, No. 1 (2011), pp. 21-33.

Mathew Richardson, Tom Smith (1993), A Test of Multivariate Normality in Stock Returns, Journal of Business, Volume 66, Issue 2 (April 1993).

Stephen W. Looney (1995), How to Use Tests for Univariate Normality to Assess Multivariate Normality, The American Statistician, Volume 49, No. 1 (February 1995), pp. 64-70.

Ted Gaten (2000), Normal distributions, www.le.ac.uk/bl/gat/virtualfc/Stats/normal.htm
