
15-359: Probability and Computing

Recitation 9 November 2, 2007


The Central Limit Theorem
The normal distribution occurs in many places where one might not at first expect it; today we will see why. When one has a large number of independent, identically distributed random variables, their suitably normalized sum tends towards a normal distribution as the number of variables grows. This notion is formalized in the following theorem.
Central Limit Theorem. Let $X_1, X_2, \ldots$ be a sequence of i.i.d. r.v.s with mean $\mu$ and variance $\sigma^2$, and define
$$Z_n = \frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}}.$$
Then the c.d.f. of $Z_n$ converges to the normal c.d.f., i.e., for all $z$,
$$\lim_{n\to\infty} \Pr[Z_n \le z] = \Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-x^2/2}\, dx.$$
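To make the statement concrete, here is a quick simulation sketch (pure Python, standard library only; the helper names are ours, not part of the notes) using Uniform(0,1) summands, which have $\mu = 1/2$ and $\sigma^2 = 1/12$. The empirical fraction of standardized sums falling below 1 should be close to $\Phi(1) \approx 0.84$:

```python
import math
import random

random.seed(0)

def standardized_sum(n, mu=0.5, sigma=math.sqrt(1.0 / 12.0)):
    """Z_n = (X_1 + ... + X_n - n*mu) / (sigma * sqrt(n)) for Uniform(0,1) summands."""
    s = sum(random.random() for _ in range(n))
    return (s - n * mu) / (sigma * math.sqrt(n))

def phi(z):
    """Standard normal c.d.f., via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Empirical Pr[Z_n <= 1] should be close to Phi(1) for moderately large n.
n, trials = 100, 20000
empirical = sum(1 for _ in range(trials) if standardized_sum(n) <= 1.0) / trials
print(empirical, phi(1.0))
```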
We'll give a simplified proof that assumes the existence of a moment generating function. If $X$ is a r.v., then the moment generating function of $X$ is the function defined by
$$M_X(t) = E[e^{tX}].$$
The moment generating function is so called because it contains all the information about the moments of a r.v.:
$$E[X^n] = M_X^{(n)}(0) = \left.\frac{d^n M_X(t)}{dt^n}\right|_{t=0}.$$
To see this, note that
$$\begin{aligned}
M_X(t) = E[e^{tX}] &= \int_{-\infty}^{\infty} e^{tx} f_X(x)\, dx \\
&= \int_{-\infty}^{\infty} \left(1 + tx + \frac{t^2 x^2}{2!} + \cdots\right) f_X(x)\, dx \\
&= 1 + t\,E[X] + \frac{t^2}{2!}\,E[X^2] + \cdots
\end{aligned}$$
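As a sanity check on this moment-generating property (our own illustration, not from the notes), take an Exponential(1) r.v., whose m.g.f. is the standard closed form $M(t) = 1/(1-t)$ for $t < 1$. Numerical derivatives of $M$ at 0 should recover $E[X] = 1$ and $E[X^2] = 2$:

```python
def mgf_exp(t):
    """M.g.f. of an Exponential(1) r.v.: E[e^{tX}] = 1/(1-t), valid for t < 1."""
    return 1.0 / (1.0 - t)

h = 1e-4
# First derivative at 0 (central difference) should be E[X] = 1.
m1 = (mgf_exp(h) - mgf_exp(-h)) / (2.0 * h)
# Second derivative at 0 should be E[X^2] = 2.
m2 = (mgf_exp(h) - 2.0 * mgf_exp(0.0) + mgf_exp(-h)) / h**2
print(m1, m2)
```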
To prove the central limit theorem we will use the following two facts.

Fact 1. Let $a_n \to a$ as $n \to \infty$. Then
$$\lim_{n\to\infty} \left(1 + \frac{a_n}{n}\right)^n = e^a.$$
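Fact 1 is easy to see numerically (a quick sketch of ours, with the varying sequence $a_n = a + 1/n \to a$ chosen arbitrarily for illustration):

```python
import math

def compound(a_n, n):
    """Evaluate (1 + a_n/n)^n."""
    return (1.0 + a_n / n) ** n

# Take a_n = a + 1/n -> a = 2; the limit should be e^2.
a = 2.0
for n in (10, 1000, 100000):
    print(n, compound(a + 1.0 / n, n))
print(math.exp(a))  # limiting value
```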
Fact 2. Let $X, Y$ be r.v.s. If both $M_X(t)$ and $M_Y(t)$ exist on an open interval containing 0, and $M_X(t) = M_Y(t)$ on that interval, then $X$ and $Y$ are identically distributed.
We've used Fact 1 before. Proving Fact 2 requires Fourier analysis. Intuitively, though, what it means is that if two r.v.s have the same moment generating function, and it exists near 0, then the two r.v.s are actually the same. That is, the m.g.f. completely characterizes the distribution in this case.
The last thing to do before we prove the theorem is to find the m.g.f. of the standard normal. If $Z$ is a standard normal, then, completing the square in the exponent,
$$\begin{aligned}
M_Z(t) = E[e^{tZ}] &= \int_{-\infty}^{\infty} e^{tx} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, dx \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2 + tx}\, dx \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(x-t)^2/2 + t^2/2}\, dx \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-u^2/2 + t^2/2}\, du \qquad (u = x - t) \\
&= e^{t^2/2} \cdot \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-u^2/2}\, du \\
&= e^{t^2/2}.
\end{aligned}$$
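This closed form is easy to verify by numerical integration (a sketch of ours; the truncation to $[-12, 12]$ is safe because the integrand is negligible beyond that range for small $t$):

```python
import math

def mgf_std_normal(t, lo=-12.0, hi=12.0, steps=100000):
    """Trapezoid-rule approximation of E[e^{tZ}] = integral of e^{tx} * phi(x) dx."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * dx
        w = 0.5 if i in (0, steps) else 1.0  # trapezoid endpoint weights
        total += w * math.exp(t * x - x * x / 2.0)
    return total * dx / math.sqrt(2.0 * math.pi)

t = 1.0
print(mgf_std_normal(t), math.exp(t * t / 2.0))  # should agree closely
```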
Proof of Central Limit Theorem. By Fact 2, to prove that the c.d.f. of $Z_n$ converges to the normal c.d.f., it suffices to show the m.g.f. of $Z_n$ converges to the normal m.g.f. We'll also assume that $\mu = 0$ and $\sigma = 1$; we can eliminate these assumptions later. First,
$$\begin{aligned}
M_{Z_n}(t) = E[\exp(t Z_n)] &= E\left[\exp\left(t \cdot \frac{X_1 + \cdots + X_n}{\sqrt{n}}\right)\right] \\
&= E\left[\exp\left(\frac{t}{\sqrt{n}} X_1\right) \cdots \exp\left(\frac{t}{\sqrt{n}} X_n\right)\right] \\
&= E\left[\exp\left(\frac{t}{\sqrt{n}} X_1\right)\right] \cdots E\left[\exp\left(\frac{t}{\sqrt{n}} X_n\right)\right] \qquad \text{(by independence)} \\
&= M_X\left(\frac{t}{\sqrt{n}}\right)^n \qquad \text{(since the $X_i$ are identically distributed).}
\end{aligned}$$
Now, by Taylor's theorem,¹
$$M_X(t) = M_X(0) + M_X'(0)\, t + \frac{M_X''(0)}{2} t^2 + o(t^2) = 1 + \frac{t^2}{2} + o(t^2),$$
since $M_X(0) = 1$ and, as shown above, $M_X'(0) = E[X] = 0$ and $M_X''(0) = E[X^2] = 1$ by our assumptions.
Thus,
$$M_{Z_n}(t) = M_X\left(\frac{t}{\sqrt{n}}\right)^n = \left(1 + \frac{t^2}{2n} + o\left(\frac{t^2}{n}\right)\right)^n = \left(1 + \frac{t^2/2 + n\, o(t^2/n)}{n}\right)^n.$$
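For a concrete instance of this convergence (our own illustration, not from the notes), take $X$ uniform on $\{-1, +1\}$, which has mean 0, variance 1, and m.g.f. $M_X(t) = \cosh(t)$; then $M_X(t/\sqrt{n})^n$ should approach $e^{t^2/2}$:

```python
import math

def mgf_zn(t, n):
    """M_{Z_n}(t) = M_X(t/sqrt(n))^n for X uniform on {-1, +1}, where M_X(t) = cosh(t)."""
    return math.cosh(t / math.sqrt(n)) ** n

t = 1.5
for n in (10, 1000, 100000):
    print(n, mgf_zn(t, n))
print(math.exp(t * t / 2.0))  # the limiting value e^{t^2/2}
```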
Now, $o(t^2/n)/(t^2/n) \to 0$ as $t^2/n \to 0$, so for fixed $t$, $n\, o(t^2/n) \to 0$ as $n \to \infty$. Hence $t^2/2 + n\, o(t^2/n) \to t^2/2$ as $n \to \infty$, and so, for each $t$, by Fact 1,
$$\lim_{n\to\infty} M_{Z_n}(t) = \lim_{n\to\infty} \left(1 + \frac{t^2/2 + n\, o(t^2/n)}{n}\right)^n = e^{t^2/2} = M_Z(t).$$
¹In case you haven't seen this notation before, $f = o(g)$ as $x \to a$ means $f(x)/g(x) \to 0$ as $x \to a$. You can think of this as saying $f$ goes to zero much faster than $g$ near $a$.
This completes the proof. Now we eliminate the assumptions on $\mu$ and $\sigma$. Suppose that $Y_1, Y_2, \ldots$ are i.i.d. r.v.s with mean $\mu$ and variance $\sigma^2$. Let $X_i = (Y_i - \mu)/\sigma$. Then each $X_i$ has mean 0 and variance 1, so by the above proof, the distribution of
$$Z_n = \frac{X_1 + \cdots + X_n}{\sqrt{n}}$$
approaches that of a standard normal. But
$$Z_n = \frac{X_1 + \cdots + X_n}{\sqrt{n}} = \frac{(Y_1 - \mu)/\sigma + \cdots + (Y_n - \mu)/\sigma}{\sqrt{n}} = \frac{Y_1 + \cdots + Y_n - n\mu}{\sigma\sqrt{n}}.$$
But this last expression is precisely what we wanted to show approaches the normal distribution.
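The general case can be checked by simulation as well (a sketch of ours, standard library only) using Exponential(1) summands, which have $\mu = \sigma = 1$, so the standardization is nontrivial:

```python
import math
import random

random.seed(1)

def z_n(n, mu=1.0, sigma=1.0):
    """(Y_1 + ... + Y_n - n*mu) / (sigma * sqrt(n)) for Exponential(1) summands."""
    s = sum(random.expovariate(1.0) for _ in range(n))
    return (s - n * mu) / (sigma * math.sqrt(n))

def phi(z):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Empirical Pr[Z_n <= 1] should be close to Phi(1) ~ 0.8413.
n, trials = 200, 10000
emp = sum(1 for _ in range(trials) if z_n(n) <= 1.0) / trials
print(emp, phi(1.0))
```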
It turns out the hypotheses of the CLT can be weakened slightly so as to be applicable in more cases. One problem with the standard statement of the CLT is that it gives no information about how big $n$ needs to be in order to get a good approximation. This can vary greatly depending on the distribution: if $X_i$ is already normal, then $n = 1$ suffices, but as $X_i$ deviates from the normal distribution, larger $n$ is required before the approximation is good. If these details are important, one can use the following theorem instead.
Berry-Esseen CLT. Let $X_1, X_2, \ldots, X_n$ be independent r.v.s satisfying $E[X_i] = 0$ for all $i$, $\sum_{i=1}^n E[X_i^2] = \sigma^2$, and $\sum_{i=1}^n E[|X_i|^3] = \rho^3$. Let $S = (X_1 + \cdots + X_n)/\sigma$, and let $F$ denote the c.d.f. of $S$. Then, for all $x \in \mathbb{R}$,
$$|F(x) - \Phi(x)| \le C \rho^3 / \sigma^3,$$
where $C$ is a universal constant.
In fact, [Shiganov 86] has shown that one can take $C = 0.7915$; since this is less than 1, you can even just lazily drop the $C$ from the right-hand side.
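The bound can be checked exactly for fair $\pm 1$ coin flips (our own illustration): here $\sigma^2 = n$ and $\rho^3 = n$, so the bound is $C/\sqrt{n}$, and the true worst-case gap between $F$ and $\Phi$ is computable from binomial probabilities:

```python
import math

def phi(x):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def worst_gap_and_bound(n, C=0.7915):
    """For i.i.d. X_i uniform on {-1,+1}: sigma^2 = n and rho^3 = n, so the
    Berry-Esseen bound is C*rho^3/sigma^3 = C/sqrt(n). Compare it with the
    exact sup_x |F(x) - Phi(x)| for S = (X_1 + ... + X_n)/sqrt(n)."""
    # The sum of n fair +/-1 coins takes value n - 2k with probability C(n,k)/2^n.
    probs = {n - 2 * k: math.comb(n, k) / 2.0 ** n for k in range(n + 1)}
    worst, cdf = 0.0, 0.0
    for v in sorted(probs):
        x = v / math.sqrt(n)
        worst = max(worst, abs(cdf - phi(x)))  # just below the jump at v
        cdf += probs[v]
        worst = max(worst, abs(cdf - phi(x)))  # at the jump
    return worst, C * n / n ** 1.5

for n in (4, 16, 64):
    gap, bound = worst_gap_and_bound(n)
    print(n, gap, bound)  # gap stays below the bound
```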