
Probability Distribution

Random Variable

A random variable (r.v.) takes (real) values. Each value of the r.v. is associated with a corresponding probability.
Example 1: We toss a coin. If the r.v. X takes the value 1 for heads and 2 for tails,
then X can have only two values: 1 and 2.
We know the corresponding probabilities:

X    P(X = x)
1    1/2
2    1/2

Example 2: We toss two coins together.
Let the r.v. Y be 'the number of heads.'
In this case, the values of Y are 0, 1, or 2, and we also know each probability:

Y    P(Y = y)
0    1/4
1    2/4
2    1/4

Knowing the exact probability is very useful in decision making. For example, if you know the exact probability of rain tomorrow, then you can decide whether or not to take an umbrella with you.
But in many cases, we cannot calculate probabilities right away.
Probability theory tells us how to calculate probabilities in many different situations.
Learning a probability distribution means learning how to calculate probabilities for a given situation.
Generally speaking, there are two situations in calculating probabilities: the discrete case and the continuous case.

(1) Discrete case: The values of a r.v. are not continuous. Examples are X = 1, 2 or Y = 1, 2, 3.
There are no other values between 1 and 2, or between 2 and 3.
(2) Continuous case: The values of a r.v. are continuous. Examples are 5 < X < 10, or $-\infty < Y < \infty$.

Statistics theory gives you a direct probability function, called 'the probability mass function (PMF)', in case (1), while in case (2) it gives an indirect function, called 'the probability density function (PDF)', that is used to calculate the probability.
Example of a PMF:

$P(X = x) = {}_{10}C_x \, 0.25^x \, 0.75^{10 - x}$

(Don't worry about what ${}_{10}C_x$ means. EXCEL will calculate it for you.)

So, the probability of X being 3 is:

$P(X = 3) = {}_{10}C_3 \, 0.25^3 \, 0.75^{10 - 3} = 0.250282288$

Example of a PDF:

$f(t) = 0.4 e^{-0.4t}, \quad t \ge 0$ ... (1)

Notice that here we use 'f( )', not 'P( )', for the function.
P stands for 'probability function'. So f(t) is not a probability function yet.

In the continuous r.v. case, there are infinitely many values that the r.v. can take. For example,
here t is any positive number. It can be 0.24567, or 7.999999, or a very big number, as long as it is positive.
The probability that a continuous r.v. equals one exact number out of infinitely many possible numbers is always zero.
That is, P(t = 7.99999) = P(t = 3.0) = P(t = 0.245657) = 0. So how can we calculate a probability?
We consider an interval, not an exact number, for the continuous r.v. For example,
what is the probability of t being between 0.2 and 0.5, i.e. P(0.2 < t < 0.5), if the function (1) is the PDF?
When a PDF f(x) is given, $P(a < X < b) = \int_a^b f(x)\,dx$.

So, if the PDF is given as (1), $P(0.2 < t < 0.5) = \int_{0.2}^{0.5} 0.4 e^{-0.4t}\,dt = 0.10438559$.

(Again, don't worry about how to calculate this. EXCEL will do it.)

In sum, when the question is to calculate a probability for a discrete r.v. X,
statistics theory gives you the direct probability function P(x). But if it is a continuous r.v., then it gives you the
PDF f(x), with which you can calculate the probability by integration:
$P(a < X < b) = \int_a^b f(x)\,dx$
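As a cross-check outside EXCEL, here is a minimal Python sketch (Python is our choice; the notes themselves only use EXCEL). It computes the PMF value P(X = 3) directly, while the PDF (1) only yields a probability after integration, which we approximate with a simple midpoint rule. The helper names binom_pmf, expon_pdf and integrate are hypothetical, chosen for illustration.

```python
import math


def binom_pmf(x, n, p):
    # PMF: gives the probability directly, nCx * p^x * (1 - p)^(n - x)
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def expon_pdf(t, lam=0.4):
    # PDF (1): f(t) = 0.4 * e^(-0.4 t) for t >= 0; a density, not a probability
    return lam * math.exp(-lam * t) if t >= 0 else 0.0


def integrate(f, a, b, steps=10_000):
    # Midpoint-rule approximation of the integral of f from a to b
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


p3 = binom_pmf(3, 10, 0.25)                           # P(X = 3)
p_interval = integrate(expon_pdf, 0.2, 0.5)           # P(0.2 < t < 0.5)
exact = math.exp(-0.4 * 0.2) - math.exp(-0.4 * 0.5)   # closed form of the same integral

print(round(p3, 9), round(p_interval, 8), round(exact, 8))
# prints 0.250282288 0.10438559 0.10438559
```

The midpoint rule is just one simple way to integrate numerically; for this smooth density it agrees with the closed form to many decimal places.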

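There are well-known PMFs and PDFs. We will use them to calculate probabilities.

Discrete random variables: distribution, PMF, and the EXCEL function for P(X = x)

(1) Binomial distribution
$P(X = x) = {}_nC_x \, p^x (1 - p)^{n - x}, \quad x = 0, 1, 2, \ldots, n$
EXCEL: =BINOM.DIST(x, n, p, 0)

(2) Poisson distribution
$P(X = x) = \dfrac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, 3, \ldots$
EXCEL: =POISSON.DIST(x, λ, 0)

Continuous random variables: distribution, PDF, and the EXCEL function for the probability

(3) Uniform distribution
$f(x) = \dfrac{1}{b - a}$ for $a < x < b$, and 0 elsewhere
$P(X < c) = \dfrac{c - a}{b - a}, \quad a < c < b$

(4) Exponential distribution
$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0$
P(0 < X < a) = EXPON.DIST(a, λ, 1)

(5) Normal distribution
$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} e^{-(x - \mu)^2 / (2\sigma^2)}, \quad -\infty < x < \infty$; μ: mean, σ: standard deviation
P(-∞ < X < a) = NORM.DIST(a, μ, σ, 1)

(6) t-distribution
$f(x) = \dfrac{\Gamma\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\,\Gamma\left(\frac{v}{2}\right)} \left(1 + \dfrac{x^2}{v}\right)^{-\frac{v+1}{2}}, \quad -\infty < x < \infty$; Γ( ) is the Gamma function, v is the degrees of freedom (df)
P(-∞ < X < a) = T.DIST(a, df, 1); P(|X| > a) = T.DIST.2T(a, df)

(7) Chi-square (χ²) distribution
$f(x) = \dfrac{1}{2^{v/2}\,\Gamma\left(\frac{v}{2}\right)} x^{v/2 - 1} e^{-x/2}, \quad 0 < x < \infty$; Γ( ) is the Gamma function, v is the degrees of freedom (df)
P(0 < X < a) = CHISQ.DIST(a, df, 1)

(8) F distribution
$f(x) = \dfrac{\Gamma\left(\frac{v_1 + v_2}{2}\right)}{\Gamma\left(\frac{v_1}{2}\right)\Gamma\left(\frac{v_2}{2}\right)} \left(\dfrac{v_1}{v_2}\right)^{v_1/2} x^{v_1/2 - 1} \left(1 + \dfrac{v_1}{v_2}x\right)^{-\frac{v_1 + v_2}{2}}, \quad 0 < x < \infty$; Γ( ) is the Gamma function, v1, v2 are the two degrees of freedom (df)
P(0 < X < a) = F.DIST(a, df1, df2, 1)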

Remember, however complicated the PMF or the PDF looks, EXCEL can calculate the probability with a simple function.
Here are some examples.

Let's not think about what the binomial distribution, the Poisson distribution, or any other distribution is for now. We just want to
calculate the probability under a given specific distribution.
1. Discrete Random Variable Case
(1) Suppose a random variable X follows the binomial distribution with n = 10 and p = 0.25.
Then what is the probability of X being 2, i.e. P(X = 2)?
EXCEL answer:
P(X = 2) = BINOM.DIST(2, 10, 0.25, 0) = 0.28156757

What is the probability of X being less than or equal to 2, i.e. P(X ≤ 2)?
This means P(X = 0) + P(X = 1) + P(X = 2).
EXCEL answer:
P(X ≤ 2) = BINOM.DIST(2, 10, 0.25, 1) = 0.5255928

The full PMF of X ~ B(10, 0.25):

X     P(X)
0     0.056314
1     0.187712
2     0.281568
3     0.250282
4     0.145998
5     0.058399
6     0.016222
7     0.00309
8     0.000386
9     2.86E-005
10    9.54E-007

In the discrete r.v. case, the 0 in the EXCEL function asks for the height of the PMF, which is the probability itself, and
the 1 asks for the cumulative probability.
(2) A random variable X follows the Poisson distribution with λ = 2.
Then what is the probability of X being 4, i.e. P(X = 4)?
EXCEL answer:
P(X = 4) = POISSON.DIST(4, 2, 0) = 0.09022352

What is the probability of X being greater than or equal to 2, i.e. P(X ≥ 2)?
P(X ≥ 2) = 1 - P(X < 2) = 1 - P(X = 0) - P(X = 1).
EXCEL answer:
P(X ≥ 2) = 1 - POISSON.DIST(1, 2, 1) = 0.59399415

Remember, the r.v. X under the Poisson distribution can take the values 0, 1, 2, and so on.
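The EXCEL flag (0 or 1) can be mimicked in a short Python sketch; this is our own illustration, not an EXCEL feature. The cumulative probability is simply the running sum of the PMF heights. binom_pmf and poisson_pmf are hypothetical helper names.

```python
import math


def binom_pmf(x, n, p):
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    return lam**x * math.exp(-lam) / math.factorial(x)


# Flag 0 in EXCEL: the height of the PMF at one point
p_eq_2 = binom_pmf(2, 10, 0.25)                          # BINOM.DIST(2, 10, 0.25, 0)
p_eq_4 = poisson_pmf(4, 2)                               # POISSON.DIST(4, 2, 0)

# Flag 1 in EXCEL: the sum of the PMF heights from 0 up to x
p_le_2 = sum(binom_pmf(x, 10, 0.25) for x in range(3))   # BINOM.DIST(2, 10, 0.25, 1)
p_ge_2 = 1 - sum(poisson_pmf(x, 2) for x in range(2))    # 1 - POISSON.DIST(1, 2, 1)

print(round(p_eq_2, 8), round(p_le_2, 7), round(p_eq_4, 8), round(p_ge_2, 8))
# prints 0.28156757 0.5255928 0.09022352 0.59399415
```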


2. Continuous Random Variable Case (we calculate the probability over an interval)
(3) A random variable X follows the exponential distribution with λ = 10.
Then what is the probability of X being less than 0.5?
EXCEL answer:
P(X < 0.5) = EXPON.DIST(0.5, 10, 1) = 0.99326205

What is the probability of X being greater than 0.5, i.e. 1 - P(X < 0.5)?
EXCEL answer:
P(X > 0.5) = 1 - P(X < 0.5) = 1 - EXPON.DIST(0.5, 10, 1) = 0.006737947

Note: In the continuous case P(X = 0.5) = 0, so P(X < 0.5) = P(X ≤ 0.5).

(4) A random variable X follows the normal distribution with μ = 165, σ = 10.
Then what is the probability of X being less than 163?
EXCEL answer:
P(X < 163) = NORM.DIST(163, 165, 10, 1) = 0.42074029

(5) A random variable X follows the normal distribution with μ = 165, σ = 10.
Then what is the probability of X being between 163 and 167, i.e. P(163 < X < 167)?

This needs two steps: P(163 < X < 167) = P(X < 167) - P(X < 163). So we need P(X < 167) and P(X < 163).
EXCEL answer:
P(X < 167) = NORM.DIST(167, 165, 10, 1) = 0.57925971
P(X < 163) = NORM.DIST(163, 165, 10, 1) = 0.42074029

So,
P(163 < X < 167) = P(X < 167) - P(X < 163) = 0.15851942
Or, we can combine the two steps into one:
=NORM.DIST(167, 165, 10, 1) - NORM.DIST(163, 165, 10, 1)
(6) A random variable t follows the t distribution with 50 degrees of freedom (d.f.).
Then what is the probability of t being between -0.3 and +0.3, i.e. P(-0.3 < t < +0.3)?

P(-0.3 < t < +0.3) = P(t < +0.3) - P(t < -0.3) = T.DIST(0.3, 50, 1) - T.DIST(-0.3, 50, 1) = 0.23457912

(7) A random variable F follows the F distribution with df1 = 10 and df2 = 5.
Then what is the probability of F being greater than 3.0, i.e. P(F > 3.0)?
P(F > 3.0) = 1 - P(F < 3.0) = 1 - F.DIST(3, 10, 5, 1) = 0.11848355
Inverse function of Distribution

So far we have calculated probabilities. EXCEL also has functions that give the location corresponding
to a given probability.
For example, if X ~ N(170, 20²), then P(X < 160) = NORM.DIST(160, 170, 20, 1) = 0.309.
This means that the lowest 30.9% of the distribution lies below 160 cm. So the location corresponding to the
probability 0.309 is 160 cm.
P(X < 160) = 0.309. Inversely, we ask: where is 'a' such that P(X < a) = 0.309?
We calculate this with the inverse function of the normal distribution:
=NORM.INV(0.309, 170, 20) = 160.026263

Distribution              Inverse function (α is a probability)
Normal distribution       =NORM.INV(α, μ, σ)
t-distribution            =T.INV(α, df)
Chi-square distribution   =CHISQ.INV(α, df)
F distribution            =F.INV(α, df1, df2)
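A rough Python counterpart of NORM.DIST and NORM.INV is the standard library's statistics.NormalDist (Python 3.8+), whose cdf and inv_cdf methods undo each other. This is only a sketch for the normal case; the t, chi-square and F inverses would need an extra library such as SciPy, which we do not assume here.

```python
from statistics import NormalDist

X = NormalDist(mu=170, sigma=20)     # X ~ N(170, 20^2)

p = X.cdf(160)                       # like NORM.DIST(160, 170, 20, 1), about 0.3085
a = X.inv_cdf(0.309)                 # like NORM.INV(0.309, 170, 20), about 160.026
back = X.cdf(a)                      # going forward again recovers the probability 0.309

print(round(p, 4), round(a, 6), round(back, 6))
```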

So far, we have calculated probabilities strictly according to statistics theory.
Now let's calculate probabilities in more realistic situations.

(1) Binomial Distribution

A foolish student chooses answers at random. Each question has only one correct answer out of four possible answers.
If he chooses answers in this way for 10 different questions, what is the probability that he gets 3 correct answers?

What is the probability that he gets 5 or more correct answers?

We cannot calculate these right away; it looks like a difficult problem.
But statistics theory says that X, the number of correct answers he will pick under
this condition, follows the binomial distribution with n = 10 and p = 0.25. If we accept this theory, then
it is easy to calculate the probability.
We express the binomial distribution as:
X ~ B(n, p): n is the number of trials, p is the probability of the event happening
(in our case, picking the right answer) in a single trial.
In this example, X ~ B(10, 0.25).
So, P(X = 3) = BINOM.DIST(3, 10, 0.25, 0) = 0.250282

Also, P(X ≥ 5) = P(X = 5) + P(X = 6) + ... + P(X = 10) (if he picks all the correct answers, X = 10).
We can use the EXCEL function as =BINOM.DIST(5, 10, 0.25, 0) + BINOM.DIST(6, 10, 0.25, 0) + ... + BINOM.DIST(10, 10, 0.25, 0).
But EXCEL has a more convenient way:
=BINOM.DIST(4, 10, 0.25, 1) calculates the cumulative probability P(X ≤ 4) = P(X = 0) + P(X = 1) + ... + P(X = 4).
So, P(X ≥ 5) = 1 - P(X ≤ 4)
= 1 - BINOM.DIST(4, 10, 0.25, 1) = 0.07812691
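The same two answers can be reproduced in a few lines of Python, as a sketch under the stated model X ~ B(10, 0.25); binom_pmf is a hypothetical helper. Summing P(X = 5) through P(X = 10) directly and using the complement 1 - P(X ≤ 4) should agree.

```python
import math


def binom_pmf(x, n, p):
    # nCx * p^x * (1 - p)^(n - x)
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.25   # 10 questions, 1 correct answer out of 4 choices

p_three = binom_pmf(3, n, p)                                      # P(X = 3)
p_at_least_5 = sum(binom_pmf(x, n, p) for x in range(5, n + 1))   # P(X = 5) + ... + P(X = 10)
p_complement = 1 - sum(binom_pmf(x, n, p) for x in range(5))      # 1 - P(X <= 4)

print(round(p_three, 6), round(p_at_least_5, 8), round(p_complement, 8))
# prints 0.250282 0.07812691 0.07812691
```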

(2) Poisson Distribution

I receive, on average, 10 e-mails every day. What is the probability of receiving no e-mail at all today?

Again, the theory says that if something happens λ times on average under a given condition,
the number of times it happens under the same condition follows the Poisson distribution with λ.
Simply, we express:
X ~ P(λ), $P(X = x) = \dfrac{\lambda^x e^{-\lambda}}{x!}$, x = 0, 1, 2, ...,
where λ is the average number of occurrences under the given condition.
(Here P stands for the POISSON distribution, named after Siméon Denis Poisson, French, 1781~1840.)
The given condition can be a time period or a certain area of space.

In the above e-mail case, P(X = 0) = POISSON.DIST(0, 10, 0) = 4.5400E-005

What is the probability of receiving 20 or more e-mails? Using the cumulative function of EXCEL,
P(X ≥ 20) = 1 - P(X ≤ 19) = 1 - POISSON.DIST(19, 10, 1) = 0.003454
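A minimal Python sketch of the e-mail example, assuming X ~ P(10) as in the text; poisson_pmf is a hypothetical helper, and the cumulative sum plays the role of POISSON.DIST(19, 10, 1).

```python
import math


def poisson_pmf(x, lam):
    # lam^x * e^(-lam) / x!
    return lam**x * math.exp(-lam) / math.factorial(x)


lam = 10                                                # average number of e-mails per day

p_none = poisson_pmf(0, lam)                            # P(X = 0) = e^(-10)
p_le_19 = sum(poisson_pmf(x, lam) for x in range(20))   # P(X <= 19)
p_20_or_more = 1 - p_le_19                              # P(X >= 20)

print(f"{p_none:.4E}", round(p_20_or_more, 6))
```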

Examples of applications of the Poisson distribution are (from http://www.aabri.com/SA12Manuscripts/SA12083.pdf):

The number of bankruptcies that are filed in a month. The number of arrivals at a car wash in one hour (Anderson et al., 2012, p. 236).
The number of network failures per day (Levine, 2010, p. 197).
The number of file server virus infections at a data center during a 24-hour period. The number of Airbus 330 aircraft engine shutdowns per 100,000 flight hours. The number of asthma patient arrivals in a given hour at a walk-in clinic (Doane, Seward, 2010, p. 232).
The number of hungry persons entering a McDonald's restaurant. The number of work-related accidents over a given production time. The number of births, deaths, marriages, divorces, suicides, and homicides over a given period of time (Weiers, 2008, p. 187).
The number of customers who call to complain about a service problem per month (Donnelly, Jr., 2012, p. 215).
The number of visitors to a Web site per minute (Sharpie, De Veaux, Velleman, 2010, p. 654).
The number of calls to a consumer hot line in a 5-minute period (Pelosi, Sandifer, 2003, p. D1).
The number of telephone calls per minute in a small business. The number of arrivals at a turnpike tollbooth per minute between 3 A.M. and 4 A.M. in January on the Kansas Turnpike (Black, 2012, p. 161).
The number of soldiers of the Prussian army killed accidentally by horse kick per year.
The number of fleas on the body of a dog.
The number of repairs needed in 10 miles of highway.
The number of defects in a 50-yard roll of fabric.

Now we turn to the probability distributions of continuous random variables.


(3) Uniform Distribution

A bus comes once every hour. If I go to the bus stop at a random time and wait for the bus, what is the probability of my
waiting time being less than 10 minutes?
X, the waiting time, will be between 0 and 60 minutes, and it will have the uniform distribution.
So, P(X < 10) = 10 × (1/60) = 1/6 ≈ 0.167
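For the bus example, the uniform CDF is a one-line formula. The Python sketch below also runs a quick simulation with random.uniform as a sanity check; the seed, the number of trials, and the helper name uniform_cdf are arbitrary choices for illustration.

```python
import random


def uniform_cdf(c, a, b):
    # P(X < c) for X ~ U(a, b): (c - a) / (b - a), clipped to [0, 1]
    if c <= a:
        return 0.0
    if c >= b:
        return 1.0
    return (c - a) / (b - a)


exact = uniform_cdf(10, 0, 60)                   # 10 * (1/60) = 1/6

rng = random.Random(1)                           # fixed seed so the run is repeatable
trials = 100_000
waits = (rng.uniform(0, 60) for _ in range(trials))
estimate = sum(w < 10 for w in waits) / trials   # share of simulated waits under 10 minutes

print(round(exact, 4), round(estimate, 2))
```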

(4) Exponential Distribution

I receive a telephone call once every hour on average. I have just received a call. What is the
probability of receiving another call within 20 minutes?
According to the theory, the r.v. X (the time until the next occurrence, measured from the moment the last one happened)
follows an exponential distribution:
X ~ EXP(λ), where λ is the average number of occurrences in a given unit of time.

If we measure time in hours, then X ~ EXP(1), but if we measure it in minutes, X ~ EXP(1/60),
because happening once an hour is happening 1/60 times per minute.

So, in hours, the probability of a wait shorter than 20/60 hour is: P(X < 20/60) = EXPON.DIST(20/60, 1, 1) = 0.283468689
In minutes, the probability of a wait shorter than 20 minutes is:
P(X < 20) = EXPON.DIST(20, 1/60, 1) = 0.283468689
The answers are the same.
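The unit change can be checked with a short Python sketch of the exponential CDF, 1 - e^(-λx), which is what EXPON.DIST(x, λ, 1) returns; expon_cdf is a hypothetical helper name.

```python
import math


def expon_cdf(x, lam):
    # P(X < x) = 1 - e^(-lam * x) for x > 0, like EXPON.DIST(x, lam, 1)
    return 1 - math.exp(-lam * x) if x > 0 else 0.0


p_hours = expon_cdf(20 / 60, 1)       # time in hours: 1 call per hour
p_minutes = expon_cdf(20, 1 / 60)     # time in minutes: 1/60 call per minute

print(round(p_hours, 9), round(p_minutes, 9))
# prints 0.283468689 0.283468689
```

Only the product λx enters the formula, which is why rescaling the time unit and the rate together leaves the probability unchanged.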

(5) Normal Distribution

Suppose that the heights of high school students follow the normal distribution with mean 170 cm and standard deviation 20 cm:
X ~ N(170, 20²)

This indicates that X has a normal distribution with mean 170 and variance 400 (= (standard deviation)²), or standard deviation 20.

What is the probability of a randomly picked student's height being shorter than 165 cm?
P(X < 165) = NORM.DIST(165, 170, 20, 1) = 0.40129367
What is the probability of his height being between 168 cm and 172 cm?
P(168 < X < 172) = P(X < 172) - P(X < 168) = NORM.DIST(172, 170, 20, 1) - NORM.DIST(168, 170, 20, 1) = 0.07965567
What is the lower 10% height, i.e. the value a with P(X < a) = 0.10?
=NORM.INV(0.10, 170, 20) = 144.3689687
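The three EXCEL answers above can be reproduced with Python's statistics.NormalDist as a sketch (standard library, Python 3.8+): cdf stands in for NORM.DIST(..., 1) and inv_cdf for NORM.INV.

```python
from statistics import NormalDist

X = NormalDist(mu=170, sigma=20)            # heights, X ~ N(170, 20^2)

p_below_165 = X.cdf(165)                    # P(X < 165)
p_168_to_172 = X.cdf(172) - X.cdf(168)      # P(168 < X < 172)
lower_10 = X.inv_cdf(0.10)                  # a with P(X < a) = 0.10

print(round(p_below_165, 8), round(p_168_to_172, 8), round(lower_10, 4))
```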

Statistics theory tells us that a new r.v. obtained by a linear transformation of a normally distributed r.v. also
follows a normal distribution. That is,

if X ~ N(μ, σ²) and Y = aX + b (a, b are constants), then Y ~ N(aμ + b, a²σ²).

So, if we multiply each student's height by 2 and add 5 cm, the new values Y will be distributed as
Y ~ N(170 × 2 + 5, 4 × 20²), i.e.
Y ~ N(345, 40²)

In the transformation X ~ N(μ, σ²) → Y ~ N(aμ + b, a²σ²),
if a = 1/σ and b = -μ/σ, then Y = (X - μ)/σ, and this Y is distributed as
Y ~ N(0, 1).

When a normal distribution has mean zero and standard deviation 1 (so the variance is also 1, because the variance is
the square of the s.d.), we call it the 'standard normal distribution'.
This is a very important transformation of the normal distribution: any normally distributed random variable can be
transformed into the standard normal distribution if we subtract the mean from the original values and divide the result
by the standard deviation.

So if we subtract 170 cm from every student's height and divide by 20, the new values will have the
standard normal distribution.

Let's suppose one student's height is 180. Then (180 - 170)/20 = 0.5.
If another student's height is 168, then (168 - 170)/20 = -0.1.
If we do this calculation for all the students, then the collection of new values, represented by Y, will have
the standard normal distribution. We call this procedure, "subtracting the mean and dividing by the s.d.",
standardization.
The standardized normal distribution is usually expressed as:
Z ~ N(0, 1)

Traditionally, standardization was used to calculate probabilities under the normal distribution.
For example, P(X < 165) is the same as P(Z < -0.25) because
$P(X < 165) = P\left(\dfrac{X - \mu}{\sigma} < \dfrac{165 - 170}{20}\right) = P(Z < -0.25)$

When there was no EXCEL, every statistics textbook had a probability table for Z in the appendix, and the specific calculation
was done by reading probability values off the Z table. But now, with EXCEL, we don't necessarily have to standardize.
In EXCEL, P(X < 165) can be calculated directly by =NORM.DIST(165, 170, 20, 1), which is the same as =NORM.DIST(-0.25, 0, 1, 1).
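A small Python sketch of the two facts above, using statistics.NormalDist, which supports adding and multiplying by constants: the linear transform Y = 2X + 5 gives N(345, 40²), and standardizing 165 cm to z = -0.25 gives the same probability as working on the original scale.

```python
from statistics import NormalDist

mu, sigma = 170, 20
X = NormalDist(mu, sigma)
Z = NormalDist(0, 1)                                 # standard normal

# Linear transformation: Y = 2X + 5 is again normal, N(2*170 + 5, (2*20)^2)
Y = X * 2 + 5
print(Y.mean, Y.stdev)                               # 345.0 40.0

# Standardization: z = (x - mu) / sigma
z_values = [(h - mu) / sigma for h in (180, 168)]    # [0.5, -0.1]
z = (165 - mu) / sigma                               # -0.25

# P(X < 165) on the original scale equals P(Z < -0.25) on the standard scale
print(round(X.cdf(165), 8), round(Z.cdf(z), 8))
```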
Sampling Distribution
Let's suppose we select samples from a normally distributed population, X ~ N(μ, σ²).

And also suppose that there are 300,000 Chinese male students in all, and their distribution of heights (X) is:
X ~ N(170, 20²)
(the theoretical normal distribution assumes an infinite number of students, but let 300,000 be enough).

If we select 25 students randomly out of 300,000 and calculate their mean (the sample mean),
then the sample mean will be different whenever we select another 25 students.
If we pick 25-student samples as many times as we want, these different sample means will have a distribution.
We call this the distribution of sample means, or, for short, the sampling distribution.
Let $\bar{X}$ be the sample mean. The theory says that the sample mean has the following distribution:
$\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$, where n is the number of students (observations) in the sample.

Please notice three things: First, the sample mean's distribution is also a normal distribution.
Second, the sample means have mean μ, the same as X (the population mean).
Third, the variance of the sample means is 1/n times the population variance (σ²).

This is very interesting. So in the above example, 25-student sample means will have:
$\bar{X} \sim N(170, (20/5)^2)$
That is, the normal distribution with mean 170 cm and variance 16 cm² (standard deviation 4 cm).

Once we know this, we can calculate probabilities about the sample mean.

What is the probability of the 25-student sample mean being less than 168?
$P(\bar{X} < 168)$ = NORM.DIST(168, 170, 4, 1) = 0.30853754
If we standardize the distribution,
$P(\bar{X} < 168) = P(Z < (168 - 170)/4)$ = NORM.DIST(-0.5, 0, 1, 1) = 0.30853754
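The claim that sample means follow N(μ, σ²/n) can be checked by simulation. The Python sketch below draws many 25-student samples from N(170, 20²) (the seed and the 10,000 repetitions are arbitrary choices) and compares the spread of the sample means with σ/√n = 4, and the simulated share below 168 with NORM.DIST(168, 170, 4, 1).

```python
import random
from statistics import NormalDist, fmean, stdev

rng = random.Random(42)          # fixed seed for a repeatable run
mu, sigma, n = 170, 20, 25
repeats = 10_000

# Each entry is the mean of one random sample of n heights
sample_means = [fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(repeats)]

mean_of_means = fmean(sample_means)   # should be close to mu = 170
sd_of_means = stdev(sample_means)     # should be close to sigma / sqrt(n) = 4

theory = NormalDist(mu, sigma / n**0.5)                      # N(170, 4^2)
p_theory = theory.cdf(168)                                   # like NORM.DIST(168, 170, 4, 1)
p_simulated = sum(m < 168 for m in sample_means) / repeats

print(round(mean_of_means, 2), round(sd_of_means, 2), round(p_theory, 4), round(p_simulated, 4))
```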

Suppose there is a group of 25 students, and we don't know whether they are Chinese students or not. Their average height
was only 160 cm.
1) What is the probability of this, assuming that they are Chinese?
If these 25 students are all Chinese, their heights belong to X ~ N(170, 20²).
So, the 25 students' sample mean $\bar{X} \sim N(170, (20/5)^2)$.

Since our sample mean was only 160 cm, let's calculate the probability of the sample mean being less than or equal to 160 cm.
Then $P(\bar{X} \le 160)$ = NORM.DIST(160, 170, 4, 1) = 0.00620967

The probability is very small. From this, can you say for sure that they are Chinese students?
If you say 'YES', why?
If you say 'NO', why? And what is your possible mistake in saying 'NO'?
(6) t - Distribution
From above, we know that the sample mean is distributed as:
$\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$

If we standardize this, that is, subtract its mean and divide by its standard deviation,
$Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$

As long as we know the population mean μ and the population standard deviation σ, we have no
problem calculating probabilities about $\bar{X}$. For example, as shown above,
$P(\bar{X} \le 160)$ = NORM.DIST(160, 170, 4, 1) = 0.00620967

This is the same as, if we standardize it,
$P(\bar{X} \le 160) = P\left(Z < \dfrac{160 - 170}{20/5}\right) = P(Z < -2.5)$ = NORM.DIST(-2.5, 0, 1, 1) = 0.00620967

But usually, we do not know the population s.d. σ. If we do not know σ, we cannot calculate the probability this way.
Then our best option is to estimate σ and use the estimated value in place of the unknown σ.

The usual estimate of σ is the sample standard deviation s, that is, the standard deviation calculated from the sample.

Then, from the standard form, we can think of
$\dfrac{\bar{X} - \mu}{s/\sqrt{n}}$

But as soon as we replace σ by s, this no longer follows the standard normal distribution.

Theory says that this new r.v. follows the t-distribution with n - 1 degrees of freedom (d.f.).
Suppose our calculation of s from the 25-student sample gave 15 cm.
This time, not knowing σ but replacing it by s,
$t = \dfrac{\bar{X} - \mu}{s/\sqrt{n}} \sim t(df = n - 1)$

So, $P(\bar{X} \le 160) = P\left(t < \dfrac{160 - 170}{15/5}\right) = P(t < -3.333)$
= T.DIST(-3.33333333, 24, 1) = 0.001388
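Python's standard library has no t-distribution, so here is a self-contained sketch that integrates the t PDF listed earlier with Simpson's rule (our choice of numerical method; scipy.stats.t would be the usual shortcut if SciPy is available). The helper names t_pdf, simpson and t_cdf are hypothetical. It reproduces both the sample-mean example and example (6) above.

```python
import math


def t_pdf(x, df):
    # Student t density; the constant is computed with log-gamma to avoid overflow
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def simpson(f, a, b, steps=2000):
    # Composite Simpson's rule with an even number of steps
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def t_cdf(x, df):
    # The density is symmetric about 0, so P(T < x) = 0.5 + (integral from 0 to x)
    return 0.5 + simpson(lambda u: t_pdf(u, df), 0.0, x)


x_bar, mu, s, n = 160, 170, 15, 25
t_stat = (x_bar - mu) / (s / math.sqrt(n))      # -3.333...
p_sample = t_cdf(t_stat, n - 1)                 # like T.DIST(-3.33333333, 24, 1)

p_middle = t_cdf(0.3, 50) - t_cdf(-0.3, 50)     # like T.DIST(0.3, 50, 1) - T.DIST(-0.3, 50, 1)

print(round(t_stat, 4), round(p_sample, 6), round(p_middle, 6))
```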


Please do the exercise in the worksheet EXERCISE.
(7) Chi-square (χ²) Distribution

The sum of the squares of k independent standard normal random variables has the chi-square distribution with d.f. = k:
$\chi^2(df = k) = Z_1^2 + Z_2^2 + \cdots + Z_k^2$,
where $Z_1, Z_2, \ldots, Z_k$ are all independent, and each has the standard normal distribution.

If the sample $(x_1, x_2, \ldots, x_n)$ is drawn randomly from the normal distribution X ~ N(μ, σ²),
then
$\dfrac{(n - 1)s^2}{\sigma^2}$, where s² is the sample variance,
has the chi-square distribution with df = n - 1. For the proof of this theorem, please refer to the 'CHI' sheet.

If we know the samples are from a normal distribution, we can calculate probabilities about the population variance σ².
Let X be the height of students, with X ~ N(μ, σ²). Let's also assume that we do not know the value of σ².

Someone says that σ² is 300. To see if this is correct, we sampled 25 students and calculated the sample mean ($\bar{x}$) and the sample variance (s²):

$\bar{x}$ = 162.4
s² = 208

So the sample variance seems smaller than the population variance the person claims.
Under the assumption that the population variance σ² is 300,
what is the probability of drawing 25 students whose sample variance is smaller than 208?

Since
$\dfrac{(n - 1)s^2}{\sigma^2} = (25 - 1) \times \dfrac{208}{300} = 16.64$
follows the chi-square distribution with df = 24,
P(χ² < 16.64) = CHISQ.DIST(16.64, 24, 1) = 0.13639555
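For an even number of degrees of freedom, such as df = 24 here, the chi-square CDF has a closed form that needs only the standard library: P(χ² < x) = 1 - Σ_{j < df/2} e^(-x/2) (x/2)^j / j!. The Python sketch below uses that identity to reproduce CHISQ.DIST(16.64, 24, 1); it deliberately refuses odd df, and chisq_cdf_even_df is a hypothetical helper name.

```python
import math


def chisq_cdf_even_df(x, df):
    # Closed form valid only for even df:
    # P(chi2 < x) = 1 - sum_{j=0}^{df/2 - 1} e^(-x/2) (x/2)^j / j!
    if df % 2 != 0:
        raise ValueError("this shortcut needs an even number of degrees of freedom")
    half = x / 2
    return 1 - sum(math.exp(-half) * half**j / math.factorial(j) for j in range(df // 2))


n, s2, sigma2 = 25, 208, 300
stat = (n - 1) * s2 / sigma2              # (25 - 1) * 208 / 300 = 16.64
p = chisq_cdf_even_df(stat, n - 1)        # like CHISQ.DIST(16.64, 24, 1)

print(round(stat, 2), round(p, 6))
```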
(8) F Distribution
If there are two independent random variables, each with a chi-square distribution:
x1 ~ χ²(df = k1), x2 ~ χ²(df = k2), independent of each other, then the ratio
$\dfrac{x_1/k_1}{x_2/k_2} \sim F(k_1, k_2)$. That is, this ratio follows the F distribution with df1 = k1, df2 = k2.

The F distribution is useful for testing whether the variances of two independent groups are equal. Suppose that heights
were measured in two groups of students. Also assume that heights are normally distributed:
Y1 ~ N(μ1, σ1²)
Y2 ~ N(μ2, σ2²)
We calculated basic statistics from the two samples:

                                    Group 1   Group 2
N of students (n1, n2)              25        30
Sample mean ($\bar{y}_1$, $\bar{y}_2$)      168       171
Sample variance (s1², s2²)          360       300

Since $\dfrac{(n_1 - 1)s_1^2}{\sigma_1^2}$ and $\dfrac{(n_2 - 1)s_2^2}{\sigma_2^2}$ each follow a chi-square distribution (with df n1 - 1 and n2 - 1), and under the assumption that σ1² and σ2² are the same, the F ratio becomes simply
$F = \dfrac{s_1^2}{s_2^2} = \dfrac{360}{300} = 1.2$

Under σ1² = σ2², what is the probability of an F ratio smaller than the observed 1.2?
P(F < 1.2, df1 = 24, df2 = 29) = F.DIST(1.2, 24, 29, 1) = 0.68300897
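As with the t-distribution, the standard library has no F distribution, so this Python sketch integrates the F PDF listed earlier with Simpson's rule. It assumes df1 > 2, so the density is 0 at x = 0 (true for both cases here); f_pdf, simpson and f_cdf are hypothetical helper names. It also reproduces example (7) from the EXCEL section.

```python
import math


def f_pdf(x, d1, d2):
    # F density, computed on the log scale for numerical stability
    if x <= 0:
        return 0.0
    log_c = (math.lgamma((d1 + d2) / 2) - math.lgamma(d1 / 2) - math.lgamma(d2 / 2)
             + (d1 / 2) * math.log(d1 / d2))
    return math.exp(log_c + (d1 / 2 - 1) * math.log(x)
                    - ((d1 + d2) / 2) * math.log1p(d1 * x / d2))


def simpson(f, a, b, steps=2000):
    # Composite Simpson's rule with an even number of steps
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def f_cdf(x, d1, d2):
    # P(F < x); relies on f_pdf(0) = 0, which holds when d1 > 2
    return simpson(lambda u: f_pdf(u, d1, d2), 0.0, x)


ratio = 360 / 300                          # s1^2 / s2^2 = 1.2
p_below = f_cdf(ratio, 25 - 1, 30 - 1)     # like F.DIST(1.2, 24, 29, 1)
p_above_3 = 1 - f_cdf(3.0, 10, 5)          # like 1 - F.DIST(3, 10, 5, 1)

print(round(ratio, 2), round(p_below, 6), round(p_above_3, 6))
```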

Random Numbers
Often we need numbers that follow a certain probability distribution to do some experiments.
Suppose we want 20 random numbers that follow N(170, 20²).
EXCEL can do it:

Data - Data Analysis - Random Number Generation

Then give the number of variables you want (1) and the number of random numbers (20),
select the normal distribution, give the mean (170) and the standard deviation (20),
and give the cell address where you want the numbers.
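Outside EXCEL, the same kind of random numbers can be generated with Python's standard library. In this sketch the seed values are arbitrary (they only make the run repeatable), and NormalDist.samples and random.Random.random stand in for the Normal and Uniform options of the EXCEL dialog.

```python
import random
from statistics import NormalDist

# 20 random heights from N(170, 20^2)
heights = NormalDist(170, 20).samples(20, seed=2024)

# 20 random numbers from the uniform distribution between 0 and 1
rng = random.Random(2024)
uniforms = [rng.random() for _ in range(20)]

print([round(h, 1) for h in heights])
print([round(u, 3) for u in uniforms])
```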
Expectation Value of Random Variable
The expectation value of X, E(X), is the mean of the random variable X.
The expectation value of (X - E(X))², E(X - E(X))², is the variance of the random variable X.
E(X) and E(X - E(X))² are defined as follows.
(1) When the random variable is discrete:
$\text{Mean}(X) = E(X) = \sum_x x\,P(x)$
$\text{VAR}(X) = E(X - E(X))^2 = \sum_x (x - E(X))^2 P(x)$
ex 1)

Random variable X has the following probability distribution:

X      P(X)    X * P(X)   (X - E(X))²   (X - E(X))² * P(X)
0      0.25    0          1             0.25
1      0.50    0.5        0             0
2      0.25    0.5        1             0.25
sum =  1       1                        0.5

E(X) = 0 * 0.25 + 1 * 0.5 + 2 * 0.25 = 1.0

Var(X) = (0 - 1.0)² * 0.25 + (1 - 1.0)² * 0.5 + (2 - 1.0)² * 0.25 = 0.5
ex 2)

Let X be the value of the outcome of tossing a die.

X      P(X)           X * P(X)   (X - E(X))²   (X - E(X))² * P(X)
1      1/6 = 0.167    0.167      6.25          1.04
2      0.167          0.333      2.25          0.38
3      0.167          0.500      0.25          0.04
4      0.167          0.667      0.25          0.04
5      0.167          0.833      2.25          0.38
6      0.167          1.000      6.25          1.04
Sum =  1              3.500                    2.92

E(X) = 1 * (1/6) + 2 * (1/6) + ... + 6 * (1/6) = 3.5

Var(X) = (1 - 3.5)² * (1/6) + (2 - 3.5)² * (1/6) + ... + (6 - 3.5)² * (1/6) = 2.92
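The table calculations of ex 1) and ex 2) can be written as a short Python sketch using exact fractions (fractions.Fraction), so the dice variance comes out as 35/12 rather than the rounded 2.92. mean_and_variance is a hypothetical helper that takes a dictionary {x: P(X = x)}.

```python
from fractions import Fraction


def mean_and_variance(dist):
    # dist maps each value x to P(X = x); the probabilities must add up to 1
    if sum(dist.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    mean = sum(x * p for x, p in dist.items())                  # E(X) = sum of x * P(x)
    var = sum((x - mean) ** 2 * p for x, p in dist.items())     # E(X - E(X))^2
    return mean, var


ex1 = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
die = {x: Fraction(1, 6) for x in range(1, 7)}

m1, v1 = mean_and_variance(ex1)
m2, v2 = mean_and_variance(die)
print(m1, v1)                     # 1 1/2
print(m2, v2, float(v2))          # 7/2 35/12 2.9166666666666665
```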
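ex 3) X ~ P(λ): the random variable follows the Poisson probability distribution with
the average number of occurrences λ.

So, $P(X = x) = \dfrac{\lambda^x e^{-\lambda}}{x!}$, x = 0, 1, 2, ...

$E(X) = \sum_{x=0}^{\infty} x \dfrac{\lambda^x e^{-\lambda}}{x!} = \lambda \sum_{x=1}^{\infty} \dfrac{\lambda^{x-1} e^{-\lambda}}{(x-1)!} = \lambda \sum_{y=0}^{\infty} \dfrac{\lambda^{y} e^{-\lambda}}{y!} = \lambda$,

where y = x - 1 and y ~ P(λ).

Because $\sum_{y=0}^{\infty} P(y) = 1$, E(X) = λ.

VAR(X) = E(X - E(X))² = E(X²) - (E(X))² = E(X(X - 1)) + E(X) - (E(X))²
= E(X(X - 1)) + λ - λ²,
and we can prove that E(X(X - 1)) = λ². So VAR(X) becomes λ.

See question 12 of exercise 2.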
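The results E(X) = λ, E(X(X - 1)) = λ² and VAR(X) = λ can be checked numerically with a Python sketch that sums the Poisson PMF over x = 0, 1, ... up to a cut-off (the cut-off of 100 terms is our assumption; the remaining tail is negligible for moderate λ). poisson_moments is a hypothetical helper.

```python
import math


def poisson_moments(lam, terms=100):
    # Truncated sums over x = 0, 1, ..., terms - 1 of x*P(x), x^2*P(x) and x(x-1)*P(x)
    p = math.exp(-lam)                # P(X = 0)
    mean = second = factorial_moment = 0.0
    for x in range(terms):
        mean += x * p
        second += x * x * p
        factorial_moment += x * (x - 1) * p
        p *= lam / (x + 1)            # P(X = x + 1) from P(X = x)
    variance = second - mean**2       # E(X^2) - [E(X)]^2
    return mean, factorial_moment, variance


for lam in (1, 2.5, 7):
    print(lam, [round(v, 6) for v in poisson_moments(lam)])
```

Updating P(X = x + 1) from P(X = x) avoids computing large powers and factorials directly.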


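(2) When the random variable is continuous:
$\text{Mean}(X) = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$
$\text{VAR}(X) = E(X - E(X))^2 = \int_{-\infty}^{\infty} (x - E(X))^2 f(x)\,dx$
ex)

X ~ U(a, b): X has the uniform distribution between a and b.

Then f(x) = 1/(b - a) for a < x < b
= 0 otherwise

$E(X) = \int_a^b \dfrac{x}{b - a}\,dx = \dfrac{b^2 - a^2}{2(b - a)} = \dfrac{a + b}{2}$

$\text{VAR}(X) = \int_a^b \left(x - \dfrac{a + b}{2}\right)^2 \dfrac{1}{b - a}\,dx$ ... (1)

Let $y = x - \dfrac{a + b}{2}$.

Then dy = dx, and y runs from $-\dfrac{b - a}{2}$ to $\dfrac{b - a}{2}$.

So, (1) becomes
$\text{VAR}(X) = \dfrac{1}{b - a}\int_{-(b - a)/2}^{(b - a)/2} y^2\,dy = \dfrac{1}{b - a}\cdot\dfrac{2}{3}\left(\dfrac{b - a}{2}\right)^3 = \dfrac{(b - a)^2}{12}$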

Rules in E(X) and VAR(X)

(1) E(a) = a, where a is a constant
(2) E(aX + b) = a E(X) + b
(3) E(aX + bY) = a E(X) + b E(Y)
(4) VAR(X + b) = VAR(X - b) = VAR(X)
(5) VAR(aX + b) = a² VAR(X)
(6) VAR(aX + bY) = a² VAR(X) + b² VAR(Y) + 2ab COV(X, Y), COV: covariance
(7) VAR(aX + bY) = a² VAR(X) + b² VAR(Y) when X and Y are independent
(8) VAR(X) = E(X - E(X))² = E(X²) - [E(X)]²
Proof of (8):

E(X - E(X))² = E[(X - E(X)) * (X - E(X))] = E[(X - E(X)) * X] - E[(X - E(X)) * E(X)]
= E[X² - X * E(X)] - E(X) * E[X - E(X)] = E(X²) - E(X) * E(X) - E(X) * 0
= E(X²) - [E(X)]²

Note: E(X) is a constant, so E(X - E(X)) = E(X) - E(E(X)) = E(X) - E(X) = 0.
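Rules (2), (5) and (8) can be verified exactly on the dice distribution from ex 2) with a small Python sketch; the constants a = 3, b = -4 are arbitrary, and E is a hypothetical helper computing E(g(X)) for a discrete distribution.

```python
from fractions import Fraction

die = {x: Fraction(1, 6) for x in range(1, 7)}


def E(g, dist):
    # Expectation of g(X) for a discrete distribution {x: P(X = x)}
    return sum(g(x) * p for x, p in dist.items())


mean = E(lambda x: x, die)                                # E(X) = 7/2
var_def = E(lambda x: (x - mean) ** 2, die)               # E(X - E(X))^2
var_short = E(lambda x: x * x, die) - mean**2             # rule (8): E(X^2) - [E(X)]^2

a, b = 3, -4
mean_lin = E(lambda x: a * x + b, die)                    # rule (2): a E(X) + b
var_lin = E(lambda x: (a * x + b - mean_lin) ** 2, die)   # rule (5): a^2 VAR(X)

print(var_def, var_short)                                     # 35/12 35/12
print(mean_lin == a * mean + b, var_lin == a**2 * var_def)    # True True
```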


(8) is a relation similar to:

Exercise 1
1. Let X be the number of heads in tossing three coins at the same time.
(1) Show the probability distribution of X.
X      P(X)

(2) Show the probability distribution of Y, where Y = X².
Y      P(Y)

2. There are two coins, labeled A and B. Coins A and B are tossed together.
Let X be 0 if coin A shows heads, 1 otherwise.
Let Y be 1 if coin B shows heads, 2 otherwise.
Show the probability distribution of Z, where Z = X + Y.
Z      P(Z)

3. Find P(X), the probability mass function (PMF) of X.

X      P(X)
0      p
1      1 - p

You have to give the exact function of X.

P(X) =
4. There are 7 red chalks and 3 white chalks.
How many different combinations of colors can you have?

5. There are only two outcomes for each trial: 'success' and 'failure'.
We try three times.
(1) Display all the possible outcomes. (S means Success; F means Failure.)
1   (S, S, S)
2
3
4
5
6
.
.
?
(2) Suppose the probability of 'Success' in each try is 0.2, and
X is the number of successes. Show the probability distribution of X.
X      P(X)

1. The binomial distribution looks more and more like a normal distribution as the number of
trials (= n) increases. Let's do an experiment.

We set p = 0.4. Fill the cells with probabilities. Make a chart of the three probability distributions.

n = 5              n = 10             n = 20
X    P(X)          X    P(X)          X    P(X)
0    0.07776       0                  0
1                  1                  1
...                ...                ...
5                  10                 20
Sum =              Sum =              Sum =

* Each sum must be 1.
3. The Poisson probability distribution approaches a normal distribution
as λ (the average number of occurrences) increases.
Calculate P(X) for X = 0, 1, 2, ..., 15 with λ = 1, λ = 5, and λ = 7. Then draw a chart showing
the three distributions.

X     P(X), λ = 1    P(X), λ = 5    P(X), λ = 7
0
1
2
...
15
4. Let X ~ B(10, 0.25), which says that the random variable X follows the BINOMIAL distribution, where
the total number of trials is 10 and each trial has probability 0.25 of success.
Calculate the probabilities in two ways, and check whether the two ways give the same values:
using the EXCEL function =BINOM.DIST( ), and using the formula for the binomial distribution,
$P(X = x) = {}_nC_x \, p^x (1 - p)^{n - x}$ (${}_nC_x$ in EXCEL is =COMBIN(n, x)).

                      Using =BINOM.DIST( )    Using the formula
(1) P(X = 2)
(2) P(X >= 5)
(3) P(3 <= X <= 7)
(4) P(X < 4)

5. In the binomial distribution, when p is small and n is big, the calculation of probability is difficult without EXCEL.
For example, suppose that deaths from cancer are 10 persons in 1,000. We want to calculate the probability
that 5 persons among 200 would die from cancer. In this case p = 0.01, n = 200, and x = 5. So P(X = 5) is:
$P(X = 5) = {}_{200}C_5 \, (0.01)^5 (0.99)^{195}$ = 0.035723357

To calculate this, we need to calculate 0.99^195.

Of course with EXCEL, this is as simple as =BINOM.DIST(5, 200, 0.01, 0).
The French mathematician Poisson calculated this in an approximate way.

Since p = 0.01, the average number of deaths among 200 persons is 2. So, if we interpret the problem
as a Poisson distribution, it can be calculated as P(X = 5) from the Poisson distribution with λ = 2:
P(X = 5) = POISSON.DIST(5, 2, 0) = 0.036089409. Close enough!
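The two numbers can be reproduced in a short Python sketch, with math.comb in place of COMBIN; plain floating point handles 0.99^195 without trouble. The variable names are our own.

```python
import math

n, p, x = 200, 0.01, 5

binomial = math.comb(n, x) * p**x * (1 - p) ** (n - x)    # like BINOM.DIST(5, 200, 0.01, 0)

lam = n * p                                                # average deaths among 200 persons = 2
poisson = lam**x * math.exp(-lam) / math.factorial(x)      # like POISSON.DIST(5, 2, 0)

print(round(binomial, 9), round(poisson, 9), round(abs(binomial - poisson), 9))
```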

Let's experiment with how close the two values are: one from the binomial, the other from the Poisson distribution. We set p = 0.01, X = 5, and we increase n.

n      Binomial        Poisson         (absolute) Difference
10     0.0000000240    0.0000000754    0.0000000514
20
30
40
50
60
70
80
90
100
110
120
130
140
150    0.0137762623    0.0141199554    0.0003436931

Check whether the two values get closer to each other as n increases.
6. It is known that the average number of deaths by cancer is 50 out of 1000 persons.
Calculate the probability of at least 50 deaths among 10000 persons.
(1) Using the BINOMIAL distribution
(2) Using the POISSON distribution
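7. Weights of students follow the normal distribution with μ = 45 kg, s.d. = 5. Calculate:
(1) P(X < 45)
(2) P(40 < X < 50)
(3) P(|X - 45| > 3) (| | means absolute value)
(4) Where is the lower 5%?
(5) Where is the upper 10%?

Use an EXCEL inverse function for the normal distribution:
=NORM.INV(0.05, 45, 5)
It will give you the value 'a' where P(X < a) = 0.05.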

8. Let X ~ N(μ, σ²) (the random variable X follows the normal distribution with mean μ and variance σ²).
And let Y = aX + b (a and b are constants).
(1) What is E(Y)?
(2) What is VAR(Y)?
(3) When a = 1/σ and b = -μ/σ, that is, Y = (X - μ)/σ,
what is E(Y)?
what is VAR(Y)?

9. The random variable t has the t-distribution with d.f. = 10. Calculate:
(1) P(t < 0.25)                 0.5961758971
(2) P(-0.1 < t < +0.1)          0.0776792814
(3) P(|t - 0.5| > 1.5)
(4) Where is the lower 5%?      -1.8124611228
(5) Where is the upper 5%?      1.8124611228
10. A sample of 25 students' heights is given.

Student   Height (cm)
1         163.7
2         183.1
3         160.8
4         166.3
5         180.3
6         163.9
7         155.3
8         161.1
9         138.5
10        164.5
11        155.7
12        163.6
13        173.7
14        179.9
15        184.8
16        165.6
17        152.8
18        175.2
19        170.1
20        169.3
21        165.5
22        123.6
23        142.2
24        145.7
25        156.0

(1) Show some basic descriptive statistics.
(2) Make a HISTOGRAM.
(3) Someone says that this sample came from a population that has
the normal distribution with μ = 170, σ = 20, i.e. X ~ N(170, 20²).
Can you believe that?
(4) Suppose σ was not known.
What is the s.d. of these 25 students?
What is the t value for calculating the probability about the sample mean?
Can you say that this sample came from the population described in (3)?

11. Generate 20 random numbers that follow a uniform distribution between 0 and 1.
To check whether they are uniformly distributed, make a histogram with the following bins:
0.0 ~ 0.1, 0.1 ~ 0.2, 0.2 ~ 0.3, 0.3 ~ 0.4, 0.4 ~ 0.5,
0.5 ~ 0.6, 0.6 ~ 0.7, 0.7 ~ 0.8, 0.8 ~ 0.9, 0.9 ~ 1.0

The expected count for each interval is 2. Do you have a similar result?
If you are not satisfied with your result, generate 200 random numbers and do the same thing.
This time the expected count is 20 for each interval. Check your result.

Generating 20 random numbers from the uniform distribution between 0 and 1:
Data - Data Analysis - Random Number Generation; number of variables: 1
number of random numbers: 20
Distribution: Uniform, between 0 and 1
12. Calculate E(X) and VAR(X), where X follows the uniform distribution
on the interval between a and b, so f(x) = 1/(b - a) for a < x < b, and 0 elsewhere.
E(X) =
VAR(X) =

13. Calculate E(X(X - 1)) where X ~ P(λ).

Hint: $E(X(X - 1)) = \sum_{x=0}^{\infty} x(x - 1) \dfrac{\lambda^x e^{-\lambda}}{x!}$

Answer: λ²
