
Central Limit Theorem (Convergence of the sample means distribution to the normal distribution)

Let X1, X2, ..., Xn be a random sample drawn from any distribution with a finite mean μ and variance σ². As n → ∞, the distribution of

    (X̄ − μ) / (σ/√n)

converges to the distribution N(0, 1). In other words,

    (X̄ − μ) / (σ/√n) → N(0, 1).
Note 1: Remember that we proved that E(X̄) = μ and Var(X̄) = σ²/n. What is (X̄ − μ)/(σ/√n)? It means we are taking the random variable X̄, subtracting its mean, and dividing by its standard deviation. It's a z-score!

Note 2: "converge" here means convergence in distribution:

    lim_{n→∞} P( (X̄ − μ)/(σ/√n) ≤ z ) = Φ(z)   for all z.

Don't worry about this if you don't understand (it's beyond the scope of 15.075).

Note 3: The CLT is really useful because it characterizes large samples from any distribution. As long as you have a lot of independent samples (from any distribution), then the distribution of the sample mean is approximately normal.

Let's demonstrate the CLT. Pick n large. Draw n observations from U[0, 1] (or whatever distribution you like). Repeat 1000 times.

    t           x1     x2     x3    ...   xn     x̄ = (1/n) Σᵢ xᵢ
    t = 1       .21    .76    .57   ...   .84    (.21 + .76 + ...)/n
    ...
    t = 1000

Then, histogram the values in the rightmost column and it looks normal.

CLTdemo (MATLAB):

    % CLTdemo
    n = 30;                   % sample size; try n = 2, 3, 30
    myrand = rand(n, 500);    % 500 samples of n draws from U[0, 1]
    mymeans = mean(myrand);   % column means = 500 sample means
    hist(mymeans)
[Figure: histograms of the 500 sample means for n = 2, n = 3, and n = 30. As n increases, the histogram looks increasingly normal.]
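The MATLAB demo above can also be sketched in Python using only the standard library (the seed is an arbitrary choice; n and the repetition count mirror the MATLAB script):

```python
import random
import statistics

# n draws from U[0, 1], repeated 500 times, one sample mean per repetition.
random.seed(0)
n, reps = 30, 500

means = [statistics.mean([random.random() for _ in range(n)])
         for _ in range(reps)]

# For U[0, 1]: mean 1/2 and variance 1/12, so by the CLT the sample mean
# is approximately N(1/2, 1/(12n)).
print(statistics.mean(means))   # close to 0.5
print(statistics.stdev(means))  # close to sqrt(1/(12*30)) ~ 0.053
```

Plotting a histogram of `means` reproduces the rightmost panel of the figure.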

Sampling Distribution of the Sample Variance - Chi-Square Distribution

From the central limit theorem (CLT), we know that the distribution of the sample mean is approximately normal. What about the sample variance? Unfortunately there is no CLT analog for the variance... But there is an important special case, which is when X1, X2, ..., Xn are from a normal distribution. (Recall that the CLT applies to arbitrary distributions.) If this is true, the distribution of the sample variance is related to the Chi-Square (χ²) distribution.
Let Z1, Z2, ..., Zν be N(0, 1) r.v.'s and let X = Z1² + Z2² + ... + Zν². Then the pdf of X can be shown to be:

    f(x) = 1 / (2^(ν/2) Γ(ν/2)) · x^(ν/2 − 1) e^(−x/2)   for x ≥ 0.

This is the χ² distribution with ν degrees of freedom (ν "adjustable quantities"). (Note: the χ² distribution is a special case of the Gamma distribution with parameters λ = 1/2 and r = ν/2.)

Fact proved in book: If X1, X2, ..., Xn are iid N(μ, σ²) r.v.'s, then

    (n − 1) S² / σ² ~ χ²_{n−1}.

That is, the sample variance times the constant (n − 1)/σ² has a χ²_{n−1} distribution.

Technical Note: we lost a degree of freedom when we used the sample mean rather than the true mean. In other words, fixing n − 1 quantities completely determines s², since:

    s² := (1/(n − 1)) Σᵢ (xᵢ − x̄)².
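The constraint behind this lost degree of freedom is that the residuals xᵢ − x̄ always sum to zero, so the last residual is determined by the other n − 1. A Python sketch (the five sample values are arbitrary):

```python
import random
import statistics

# Five hypothetical observations; any numbers work.
random.seed(1)
x = [random.gauss(0, 1) for _ in range(5)]

xbar = statistics.mean(x)
resid = [xi - xbar for xi in x]

# The residuals sum to zero, so the last one is fixed by the first n - 1:
print(sum(resid))  # zero, up to floating-point error

# Hence s^2 is built from only n - 1 "free" quantities:
s2 = sum(r ** 2 for r in resid) / (len(x) - 1)
print(s2, statistics.variance(x))  # the same number
```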

Let's simulate a χ²_{n−1} distribution for n = 3. Draw 3 samples from N(0, 1). Repeat 1000 times.

    t           z1      z2      z3     Σᵢ zᵢ²
    t = 1      −0.3    −1.1     0.2    1.34
    ...
    t = 1000

Then, histogram the values in the rightmost column.
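The same simulation can be sketched in Python with the standard library (the seed is an arbitrary choice):

```python
import random
import statistics

# Simulate the table above: the sum of 3 squared N(0, 1) draws, 1000 times.
random.seed(2)
nu = 3
chisq = [sum(random.gauss(0, 1) ** 2 for _ in range(nu))
         for _ in range(1000)]

# A chi-square r.v. with nu degrees of freedom has mean nu and variance 2*nu.
print(statistics.mean(chisq))      # close to 3
print(statistics.variance(chisq))  # close to 6
```

Histogramming `chisq` shows the right-skewed chi-square shape.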

For the chi-square distribution, it turns out that the mean and variance are:

    E(χ²_ν) = ν   and   Var(χ²_ν) = 2ν.

We can use this to get the mean and variance of S²:

    E(S²) = E( σ²/(n − 1) · χ²_{n−1} ) = σ²/(n − 1) · (n − 1) = σ²,

    Var(S²) = Var( σ²/(n − 1) · χ²_{n−1} ) = σ⁴/(n − 1)² · Var(χ²_{n−1}) = σ⁴/(n − 1)² · 2(n − 1) = 2σ⁴/(n − 1).

So S² estimates σ² well when n is large, since Var(S²) is small when n is large.

Remember, the χ² distribution characterizes normal r.v.'s with known variance. You need to know σ! Look below, you can't get the distribution for S² unless you know σ:

    X1, X2, ..., Xn ~ N(μ, σ²)  ⟹  (n − 1) S² / σ²  ~  χ²_{n−1}.
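Both formulas, E(S²) = σ² and Var(S²) = 2σ⁴/(n − 1), can be checked by simulation. A Python sketch (n = 10, σ = 2, and the replication count are arbitrary choices):

```python
import random
import statistics

# Draw many samples of size n from N(0, sigma^2) and compute S^2 for each.
random.seed(3)
n, sigma, reps = 10, 2.0, 20000

s2 = [statistics.variance([random.gauss(0, sigma) for _ in range(n)])
      for _ in range(reps)]

print(statistics.mean(s2))      # close to sigma^2 = 4
print(statistics.variance(s2))  # close to 2*sigma^4/(n-1) ~ 3.56
```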

Student's t-Distribution

Let X1, X2, ..., Xn ~ N(μ, σ²). William Sealy Gosset, aka "Student" (1876-1937), was looking at the distribution of:

    T = (X̄ − μ) / (S/√n)

Contrast T with (X̄ − μ)/(σ/√n), which we know is N(0, 1). So why was Student looking at this? Because he had a small sample, he didn't know the variance of the distribution and couldn't estimate it well, and he wanted to determine how far x̄ was from μ. We are in the case of:

- N(μ, σ²) r.v.'s (comparing X̄ to μ)
- unknown variance σ²
- small sample size (otherwise we can estimate σ² very well by s²).

Rewrite T:

    T = (X̄ − μ)/(S/√n) = [ (X̄ − μ)/(σ/√n) ] / √(S²/σ²) = Z / √(S²/σ²).

The numerator Z is N(0, 1), and the denominator is sort of the square root of a chi-square, because remember S²(n − 1)/σ² ~ χ²_{n−1}. Note that when n is large, S²/σ² ≈ 1, so the t-distribution ≈ N(0, 1). Student showed that the pdf of T is:

    f(t) = Γ((ν + 1)/2) / ( √(νπ) Γ(ν/2) ) · (1 + t²/ν)^(−(ν+1)/2),   −∞ < t < ∞.
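A quick simulation shows both facts at once: T has heavier tails than N(0, 1) for small n, and approaches N(0, 1) as n grows. A Python sketch (the sample sizes, replication count, and the |t| > 2 threshold are arbitrary choices):

```python
import math
import random
import statistics

# Simulate T = (xbar - mu) / (s / sqrt(n)) for normal samples (mu=0, sigma=1).
random.seed(4)

def t_stat(n, mu=0.0, sigma=1.0):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    return (statistics.mean(x) - mu) / (statistics.stdev(x) / math.sqrt(n))

tails = {}
for n in (3, 30):
    ts = [t_stat(n) for _ in range(5000)]
    # Fraction of |T| > 2; for N(0, 1) this would be about 0.046.
    tails[n] = sum(abs(t) > 2 for t in ts) / len(ts)

print(tails)  # the n = 3 tail fraction is much larger than the n = 30 one
```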

Snedecor's F-Distribution

The F-distribution is usually used for comparing variances from two separate sources. Consider 2 independent random samples X1, X2, ..., X_{n1} ~ N(μ1, σ1²) and Y1, Y2, ..., Y_{n2} ~ N(μ2, σ2²). Define S1² and S2² as the sample variances. Recall:
    S1² (n1 − 1) / σ1² ~ χ²_{n1−1}   and   S2² (n2 − 1) / σ2² ~ χ²_{n2−1}.

The F-distribution considers the ratio:

    (S1²/σ1²) / (S2²/σ2²)  ~  [ χ²_{n1−1}/(n1 − 1) ] / [ χ²_{n2−1}/(n2 − 1) ].

When σ1² = σ2², the left hand side reduces to S1²/S2². We want to know the distribution of this!

Speaking more generally, let U ~ χ²_{ν1} and V ~ χ²_{ν2} be independent. Then

    W = (U/ν1) / (V/ν2)

has an F-distribution: W ~ F_{ν1,ν2}. The pdf of W is:

    f(w) = [ Γ((ν1 + ν2)/2) / ( Γ(ν1/2) Γ(ν2/2) ) ] · (ν1/ν2)^(ν1/2) · w^(ν1/2 − 1) · (1 + (ν1/ν2) w)^(−(ν1+ν2)/2)   for w ≥ 0.
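The ratio construction can be simulated directly. A Python sketch that builds W from independent chi-square draws and checks the known mean of the F-distribution, E(W) = ν2/(ν2 − 2) for ν2 > 2 (the degrees of freedom and replication count are arbitrary choices):

```python
import random

# One draw from chi^2_nu: the sum of nu squared N(0, 1) draws.
random.seed(5)

def chi2(nu):
    return sum(random.gauss(0, 1) ** 2 for _ in range(nu))

nu1, nu2 = 5, 10
w = [(chi2(nu1) / nu1) / (chi2(nu2) / nu2) for _ in range(20000)]
mean_w = sum(w) / len(w)

print(mean_w)  # close to nu2/(nu2 - 2) = 1.25
```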

There are tables in the appendix of the book which solve:

    P(χ²_ν > χ²_{ν,α}) = α
    P(T_ν > t_{ν,α}) = α
    P(F_{ν1,ν2} > f_{ν1,ν2,α}) = α

Note that because F is a ratio,

    F_{ν1,ν2} = 1 / F_{ν2,ν1},

which you might need to use in order to look up F-scores in a table in the book. Actually, you will need to know:

    f_{ν1,ν2,1−α} = 1 / f_{ν2,ν1,α}.

MIT OpenCourseWare http://ocw.mit.edu

15.075J / ESD.07J Statistical Thinking and Data Analysis


Fall 2011

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
