
CS 498/698 Cheat Sheet: Useful Facts from Probability Theory

Shai Ben-David and Dávid Pál

February 7, 2006


Useful analytical inequality: For any $\varepsilon \in [0, 1]$ and any $m \ge 0$ it holds that
\[ (1 - \varepsilon)^m \le e^{-\varepsilon m}. \]
This inequality is typically used when bounding the probability of $m$ independent outcomes, each with probability $1 - \varepsilon$.

Proof. First we show that for any real number $x$ it holds that
\[ 1 - x \le e^{-x}. \tag{1} \]
If we draw the function $y = 1 - x$ from the left side of the inequality and the function $y = e^{-x}$ from the right side, we get something as in Figure 1. These two functions have a common value at the point $x = 0$, and the derivatives of both functions at $x = 0$ are also the same. Since $e^{-x}$ is convex and $1 - x$ is linear, we have proved inequality (1). For $x \in [0, 1]$ both sides of (1) are non-negative. Hence, we may take the $m$-th power ($m \ge 0$) of both sides and obtain $(1 - x)^m \le e^{-xm}$. Replacing $x$ by $\varepsilon$, we get the desired inequality.
[Figure 1: The functions $y = 1 - x$ and $y = e^{-x}$.]
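One can check the inequality numerically. The following Python sketch (the grid of test values for $\varepsilon$ and $m$ is an arbitrary choice) verifies $(1 - \varepsilon)^m \le e^{-\varepsilon m}$:

import math

# Verify (1 - eps)^m <= exp(-eps * m) on an arbitrary grid of test values.
for eps in [0.0, 0.1, 0.5, 0.9, 1.0]:
    for m in [0, 1, 5, 100]:
        lhs = (1 - eps) ** m
        rhs = math.exp(-eps * m)
        assert lhs <= rhs + 1e-12, (eps, m)
print("(1 - eps)^m <= exp(-eps * m) held for all tested pairs")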

Union bound: If $A$ and $B$ are two probability events, then $\Pr(A \cup B) \le \Pr(A) + \Pr(B)$. More generally, if $A_1, A_2, \ldots, A_n$ are probability events, then
\[ \Pr(A_1 \cup A_2 \cup \cdots \cup A_n) \le \Pr(A_1) + \Pr(A_2) + \cdots + \Pr(A_n). \]

Example: We demonstrate a simple, yet very typical, use of both the union bound and the useful analytical inequality. Suppose we throw 100 fair six-sided dice simultaneously. We show that with probability very, very close to 1, all of the numbers 1, 2, 3, 4, 5, 6 will appear on at least one of the dice. Let $A_1, A_2, \ldots, A_6$ be the events that 1, 2, \ldots, 6, respectively, appear on at least one of the dice. Let $\bar A_1, \bar A_2, \ldots, \bar A_6$ be the complementary events that 1, 2, \ldots, 6, respectively, do not appear on any of the dice. We have
\[ \Pr(\bar A_1) = \Pr(\bar A_2) = \cdots = \Pr(\bar A_6) = \left(1 - \frac{1}{6}\right)^{100} \le e^{-100/6}. \]

What is the event $\bar A_1 \cup \bar A_2 \cup \cdots \cup \bar A_6$? It is the event that at least one of the numbers 1, 2, 3, 4, 5, 6 does not appear on any of the dice. Using the union bound, we show that this event has small probability:
\[ \Pr(\bar A_1 \cup \cdots \cup \bar A_6) \le \Pr(\bar A_1) + \cdots + \Pr(\bar A_6) \le 6\, e^{-100/6} \approx 0.00000034666491116514. \]
Hence, the probability of the complementary event (i.e., that all numbers show up) is at least 0.999999.
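The bound can also be compared against a direct simulation. In the Python sketch below (the trial count is an arbitrary choice), the event is so rare that the simulation will almost surely report an empirical frequency of zero, well under the union bound:

import math
import random

random.seed(0)
trials = 100_000
misses = 0  # trials in which some face never appeared among the 100 dice
for _ in range(trials):
    faces = {random.randint(1, 6) for _ in range(100)}
    if len(faces) < 6:
        misses += 1

union_bound = 6 * math.exp(-100 / 6)
print(f"empirical: {misses / trials}, union bound: {union_bound:.2e}")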

Expected value: The expected value (average, mean) of a real random variable $X$ is defined, in the discrete case, as
\[ E(X) = \sum_{i} x_i \Pr(X = x_i), \]
when the values that $X$ can have are in a countable set $\{x_i : i \in \mathbb{N}\}$. In the continuous case the sum is replaced by an integral,
\[ E(X) = \int x\, p(x)\, dx, \]
where $p(x)$ is the probability density at point $x$.

Example: What is the average number thrown on a die? It is
\[ E(X) = 1 \cdot \tfrac{1}{6} + 2 \cdot \tfrac{1}{6} + 3 \cdot \tfrac{1}{6} + 4 \cdot \tfrac{1}{6} + 5 \cdot \tfrac{1}{6} + 6 \cdot \tfrac{1}{6} = (1 + 2 + 3 + 4 + 5 + 6)/6 = 21/6 = 7/2 = 3.5. \]
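The same computation as a short Python sketch:

# Expected value of a fair six-sided die: sum of value times probability.
values = [1, 2, 3, 4, 5, 6]
expectation = sum(v * (1 / 6) for v in values)
print(expectation)  # 3.5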

A few useful tricks for expectations:

Linearity of expectation: $E(X + Y) = E(X) + E(Y)$, and for any $c \in \mathbb{R}$, $E(cX) = c\,E(X)$. More to come\ldots

Markov's inequality: Let $X$ be a non-negative real random variable with expected value (i.e., mean) $\mu$. Then
\[ \Pr(X > t) \le \frac{\mu}{t} \quad \text{for any } t > 0, \]
or equivalently,
\[ \Pr(X > a\mu) \le \frac{1}{a} \quad \text{for any } a > 0. \]
In more condensed form, we usually write: if $X \ge 0$, then
\[ \Pr(X > t) \le \frac{E(X)}{t} \quad \text{for any } t > 0, \]
or equivalently,
\[ \Pr(X > a\,E(X)) \le \frac{1}{a} \quad \text{for any } a > 0. \]

Example: The average height of a human is 2 meters. What is the probability of meeting a human with height 10 meters or more? Well, by Markov's inequality (assuming that the height of a human is non-negative), this probability is at most 20%.

Chebyshev's inequality: Let $X$ be a random variable with expected value (i.e., mean) $\mu = E(X)$ and variance $\sigma^2$. Then
\[ \Pr(|X - \mu| \ge t) \le \frac{\sigma^2}{t^2} \quad \text{for any } t > 0, \]
or equivalently,
\[ \Pr(|X - \mu| \ge a\sigma) \le \frac{1}{a^2} \quad \text{for any } a > 0. \]
This inequality states that it is improbable that a random variable will deviate much from its expected value.
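Both tail inequalities can be checked empirically. The Python sketch below uses an exponential random variable with mean 1 (the distribution and the thresholds are arbitrary choices):

import random

random.seed(0)
samples = [random.expovariate(1.0) for _ in range(100_000)]
n = len(samples)
mean = sum(samples) / n
var = sum((x - mean) ** 2 for x in samples) / n

# Markov: Pr(X > t) <= E(X) / t for non-negative X.
for t in [2.0, 4.0]:
    tail = sum(x > t for x in samples) / n
    print(f"Pr(X > {t}) ~ {tail:.4f} <= Markov bound {mean / t:.4f}")

# Chebyshev: Pr(|X - mean| >= t) <= var / t^2.
for t in [2.0, 4.0]:
    dev = sum(abs(x - mean) >= t for x in samples) / n
    print(f"Pr(|X - mean| >= {t}) ~ {dev:.4f} <= Chebyshev bound {var / t ** 2:.4f}")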

Chernoff's bound: Let $X_1, X_2, \ldots, X_m$ be binary independent random variables with $E(X_i) = p$ for all $i = 1, 2, \ldots, m$. (By binary we mean that $X_i$ takes only the two possible values 0 and 1. Then $E(X_i) = p$ simply means that $\Pr[X_i = 1] = p$ and $\Pr[X_i = 0] = 1 - p$.) Let
\[ \bar X = \frac{1}{m} \sum_{i=1}^{m} X_i \]
be the empirical average. Then for any $\varepsilon > 0$,
\[ \Pr(\bar X < E(\bar X) - \varepsilon) \le e^{-2\varepsilon^2 m}, \]
\[ \Pr(\bar X > E(\bar X) + \varepsilon) \le e^{-2\varepsilon^2 m}, \]
\[ \Pr(|\bar X - E(\bar X)| > \varepsilon) \le 2e^{-2\varepsilon^2 m}, \]
or equivalently,
\[ \Pr(\bar X < p - \varepsilon) \le e^{-2\varepsilon^2 m}, \]
\[ \Pr(\bar X > p + \varepsilon) \le e^{-2\varepsilon^2 m}, \]
\[ \Pr(|\bar X - p| > \varepsilon) \le 2e^{-2\varepsilon^2 m}. \]

Note that the second bound follows from the first bound by taking $Y_i = 1 - X_i$. The third bound follows from the preceding two by the union bound.

Example: Let us demonstrate a use of the Chernoff bound. Suppose we throw 1000 fair coins simultaneously. What is the probability that the number of heads will lie in the interval $[400, 600]$? Let $X_i$ be the binary random variable that is 1 iff the $i$-th coin falls heads, and zero otherwise. Then $X_1 + X_2 + \cdots + X_{1000}$ is the random variable that counts the number of heads. Using Chernoff's bound we have
\begin{align*}
\Pr[X_1 + \cdots + X_{1000} \in [400, 600]] &= 1 - \Pr[|X_1 + \cdots + X_{1000} - 500| > 100] \\
&= 1 - \Pr\left[\left|\frac{X_1 + \cdots + X_{1000}}{1000} - \frac{500}{1000}\right| > \frac{100}{1000}\right] \\
&= 1 - \Pr\left[\left|\bar X - \tfrac{1}{2}\right| > 0.1\right] \\
&\ge 1 - 2e^{-2(0.1)^2 \cdot 1000} = 1 - 2e^{-20} \ge 0.9999,
\end{align*}
where $\bar X = \frac{1}{1000}\sum_{i=1}^{1000} X_i$ is the empirical average. Hence the number of heads will be in the interval $[400, 600]$ with probability at least 99.99%.
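A simulation shows how safe the bound is here; in the Python sketch below (the trial count is an arbitrary choice), essentially every trial lands in $[400, 600]$:

import math
import random

random.seed(0)
trials = 1_000
hits = 0
for _ in range(trials):
    heads = sum(random.randint(0, 1) for _ in range(1000))
    if 400 <= heads <= 600:
        hits += 1

chernoff = 1 - 2 * math.exp(-2 * 0.1 ** 2 * 1000)  # 1 - 2e^{-20}
print(f"empirical: {hits / trials}, Chernoff lower bound: {chernoff:.10f}")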

Chernoff's bound is implied by the much more general Hoeffding's inequality.

Hoeffding's inequality: Let $X_1, X_2, \ldots, X_m$ be independent random variables such that $a_i \le X_i \le b_i$. (Note that the $X_i$ do not have to be identically distributed.) Let
\[ S = \sum_{i=1}^{m} X_i \]

be the empirical sum. Then for any $\varepsilon > 0$,
\[ \Pr(S < E(S) - m\varepsilon) \le e^{-2m^2\varepsilon^2 / \sum_{i=1}^m (b_i - a_i)^2}, \]
\[ \Pr(S > E(S) + m\varepsilon) \le e^{-2m^2\varepsilon^2 / \sum_{i=1}^m (b_i - a_i)^2}, \]
\[ \Pr(|S - E(S)| > m\varepsilon) \le 2e^{-2m^2\varepsilon^2 / \sum_{i=1}^m (b_i - a_i)^2}. \]

As before, the second inequality follows from the first one by applying it to $Y_i = b_i - X_i$. Similarly as before, the third bound follows from the preceding two by the union bound.
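A numerical illustration of Hoeffding's inequality, as a Python sketch with arbitrarily chosen ranges $[a_i, b_i]$ and uniformly distributed $X_i$:

import math
import random

random.seed(0)
bounds = [(0.0, 1.0), (-1.0, 2.0), (0.0, 3.0)] * 10  # 30 variables, arbitrary ranges
m = len(bounds)
mean_S = sum((a + b) / 2 for a, b in bounds)  # E(S) for uniform X_i on [a_i, b_i]
denom = sum((b - a) ** 2 for a, b in bounds)

eps = 0.5
hoeffding = 2 * math.exp(-2 * (m * eps) ** 2 / denom)

trials = 100_000
hits = 0
for _ in range(trials):
    S = sum(random.uniform(a, b) for a, b in bounds)
    if abs(S - mean_S) > m * eps:
        hits += 1
print(f"Pr(|S - E(S)| > m*eps) ~ {hits / trials:.5f} <= {hoeffding:.4f}")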
