
Statistical Formula Sheet

Formulas for the mean:

$$\bar{X} = \frac{\sum X}{n} \qquad \mu = \frac{\sum X}{N} \qquad \bar{X} = \frac{\sum f \cdot X}{n}$$

Ranged table median formula:

$$\text{Median} = L + \frac{\frac{n}{2} - CF}{f} \cdot IL$$

L is the beginning of the interval for the range that holds the median value. n is the total number of
observations in the table (the sum of the frequencies). CF is the cumulative frequency of the ranges prior
to the range of the median value. f is the frequency of the range that holds the median value. IL is
the interval/range length (IL = range high - range low + 1).
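The ranged table median formula can be sketched in Python as follows. This is only an illustration: the function name `grouped_median` and the example table are made up, and the table is assumed to use integer range bounds so that IL = range high - range low + 1 applies.

```python
def grouped_median(intervals):
    """Median of a ranged frequency table.

    intervals: list of (low, high, frequency) tuples, sorted by low.
    Interval length follows the sheet: IL = high - low + 1.
    """
    n = sum(f for _, _, f in intervals)
    cumulative = 0  # CF: total frequency of the ranges before the current one
    for low, high, f in intervals:
        if f > 0 and cumulative + f >= n / 2:
            interval_length = high - low + 1
            return low + (n / 2 - cumulative) / f * interval_length
        cumulative += f
    raise ValueError("table has no observations")


# Hypothetical table: ranges 10-19, 20-29, 30-39 with frequencies 5, 10, 5.
table = [(10, 19, 5), (20, 29, 10), (30, 39, 5)]
print(grouped_median(table))
```

Here n = 20, the median falls in the 20-29 range, CF = 5, f = 10, and IL = 10, so the result is 20 + (10 - 5)/10 * 10 = 25.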

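Formulas for the population standard deviation and variance:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2$$

$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2}$$
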
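Formulas for the sample standard deviation and variance:

$$s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1} = \frac{\sum X^2 - n\bar{X}^2}{n - 1}$$

$$s = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}} = \sqrt{\frac{\sum X^2 - n\bar{X}^2}{n - 1}}$$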
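A quick way to convince yourself that the two sample variance forms agree is to compute both. The sketch below uses an illustrative data list (the same six values as Group 1 on the ANOVA help sheet) and compares the results with Python's standard `statistics` module.

```python
import math
import statistics

data = [18, 14, 15, 16, 17, 12]  # illustrative sample
n = len(data)
sum_x = sum(data)
sum_x2 = sum(x * x for x in data)
mean = sum_x / n

# First form: (sum of X^2 - (sum of X)^2 / n) / (n - 1)
var_a = (sum_x2 - sum_x ** 2 / n) / (n - 1)
# Second form: (sum of X^2 - n * mean^2) / (n - 1)
var_b = (sum_x2 - n * mean ** 2) / (n - 1)

s = math.sqrt(var_a)  # sample standard deviation

print(var_a, var_b, statistics.variance(data))
print(s, statistics.stdev(data))
```

All three variance values should match to within floating-point rounding.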

Formulas for the sample standard deviation and variance from a range table:

$$s = \sqrt{\frac{\sum fX^2 - \frac{(\sum fX)^2}{n}}{n - 1}}$$

The skewness formula:

$$Sk = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$$
Factorial: $n! = n(n-1)!$ where $0! = 1$, i.e. $5! = 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 \cdot 0! = 120$

Permutation formula: $_nP_r = \frac{n!}{(n - r)!}$     Combination formula: $_nC_r = \frac{n!}{r!(n - r)!}$
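The factorial, permutation, and combination formulas can be checked with Python's standard `math` module. The values n = 5 and r = 3 are arbitrary; `math.perm` and `math.comb` need Python 3.8 or later.

```python
import math

n, r = 5, 3  # arbitrary example values

# Written out from the factorial definitions on the sheet
permutations = math.factorial(n) // math.factorial(n - r)
combinations = math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

# The standard library offers the same counts directly
print(permutations, math.perm(n, r))  # 60 60
print(combinations, math.comb(n, r))  # 10 10
```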

Binomial formula: $P(r) = {}_nC_r \cdot p^r \cdot q^{n-r}$     Poisson formula: $P(X) = \frac{\mu^X}{X! \cdot e^{\mu}}$
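As a rough sketch, the binomial and Poisson formulas translate into Python like this. The function names `binomial_pmf` and `poisson_pmf` are made up for the example and the inputs are arbitrary.

```python
import math


def binomial_pmf(r, n, p):
    """P(r) = nCr * p^r * q^(n-r), with q = 1 - p."""
    q = 1 - p
    return math.comb(n, r) * p ** r * q ** (n - r)


def poisson_pmf(x, mu):
    """P(X) = mu^X / (X! * e^mu)."""
    return mu ** x / (math.factorial(x) * math.exp(mu))


# Probability of exactly 2 successes in 4 trials when p = 0.5
print(binomial_pmf(2, 4, 0.5))  # 0.375
# Probability of 0 events when the mean number of events is 2
print(poisson_pmf(0, 2))
```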

Formulas for the probability standard deviation and variance: $\mu = \sum [X \cdot P(X)]$     $\sigma^2 = \sum [(X - \mu)^2 \cdot P(X)]$

Standard probability formulas: $p + q = 1$     $\mu = np$     $\sigma^2 = n \cdot p \cdot (1 - p)$
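The mean and variance of a probability distribution, and the binomial shortcuts np and np(1 - p), can be compared numerically. This sketch assumes a hypothetical binomial experiment with n = 4 trials and p = 0.5.

```python
import math

n, p = 4, 0.5  # hypothetical binomial experiment

# P(X) for X = 0..n from the binomial formula
distribution = {
    x: math.comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)
}

# mu = sum of X * P(X); variance = sum of (X - mu)^2 * P(X)
mu = sum(x * px for x, px in distribution.items())
variance = sum((x - mu) ** 2 * px for x, px in distribution.items())

print(mu, n * p)                  # both are the mean
print(variance, n * p * (1 - p))  # both are the variance
```

The general formulas and the binomial shortcuts should give the same numbers, which is why the shortcuts are safe to use for binomial problems.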

Z-score formulas: $Z = \frac{X - \mu}{\sigma}$     $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$     $s_{\bar{X}} = \frac{s}{\sqrt{n}}$

Http://www.zen.home.att.net
Troy E. O'Brien Page 1
Confidence interval formulas: $\bar{X} \pm Z \frac{s}{\sqrt{n}}$     $p \pm Z \sqrt{\frac{p(1 - p)}{n}}$

Confidence interval formulas for small samples (n-1 degrees of freedom): $\bar{X} \pm t \frac{s}{\sqrt{n}}$     $p \pm t \sqrt{\frac{p(1 - p)}{n}}$
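The confidence interval formulas have the same shape whether the critical score is a Z or a t, so one helper per formula is enough. In this sketch the function names and the numbers are made up; the Z critical value comes from Python's `statistics.NormalDist`, while a t critical value would still have to be read from a t table and passed in.

```python
import math
from statistics import NormalDist


def mean_interval(x_bar, s, n, critical):
    """x_bar +/- critical * s / sqrt(n); critical is a Z or t score."""
    margin = critical * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


def proportion_interval(p, n, critical):
    """p +/- critical * sqrt(p(1 - p) / n)."""
    margin = critical * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Z for a 95% level of confidence: 2.5% of the area in each tail
z_95 = NormalDist().inv_cdf(0.975)

print(mean_interval(50, 10, 100, z_95))
print(proportion_interval(0.4, 200, z_95))
```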

Formulas for determining the size of a sample: $n = \left(\frac{z \cdot s}{E}\right)^2$     $n = p \cdot (1 - p) \cdot \left(\frac{Z}{E}\right)^2$
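A small sketch of the two sample size formulas follows. Two things here are assumptions rather than part of the sheet: E is taken to be the largest error you are willing to accept, and the result is rounded up to the next whole observation, which is the usual convention. The function names and inputs are illustrative.

```python
import math


def sample_size_for_mean(z, s, error):
    """n = (z * s / E)^2, rounded up (rounding is an assumed convention)."""
    return math.ceil((z * s / error) ** 2)


def sample_size_for_proportion(z, p, error):
    """n = p * (1 - p) * (Z / E)^2, rounded up."""
    return math.ceil(p * (1 - p) * (z / error) ** 2)


print(sample_size_for_mean(1.96, 15, 3))             # 97
print(sample_size_for_proportion(1.96, 0.5, 0.05))   # 385
```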

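Z-score formulas for test of hypotheses:

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \qquad Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \qquad Z = \frac{\bar{p} - p}{\sqrt{\frac{p(1 - p)}{n}}}$$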
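Comparison of two probability samples:

$$\bar{p}_c = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2} \qquad Z = \frac{\bar{p}_1 - \bar{p}_2}{\sqrt{\bar{p}_c(1 - \bar{p}_c)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$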
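The comparison of two probability samples takes two steps: combine the samples into one proportion, then compute the score. A minimal sketch, with a made-up function name and made-up survey numbers:

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Z-score for the difference between two sample proportions."""
    # Combined proportion, weighted by the sample sizes
    p_c = (n1 * p1 + n2 * p2) / (n1 + n2)
    standard_error = math.sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# Hypothetical samples: 60% of 100 versus 50% of 100
z = two_proportion_z(0.6, 100, 0.5, 100)
print(round(z, 4))
```

With these numbers the combined proportion is 0.55 and the Z-score is about 1.42, which you would then compare with the critical Z-score for your level of significance.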

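t-score formulas for test of hypotheses (n - 1 degrees of freedom)

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}} \qquad t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \qquad t = \frac{\bar{p} - p}{\sqrt{\frac{p(1 - p)}{n}}}$$

Comparison of two probability samples:

$$\bar{p}_c = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2} \qquad t = \frac{\bar{p}_1 - \bar{p}_2}{\sqrt{\bar{p}_c(1 - \bar{p}_c)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

Pooled variance:

$$s_p^2 = \frac{(n_1 - 1)(s_1^2) + (n_2 - 1)(s_2^2)}{n_1 + n_2 - 2}$$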

Set theory formulas:
P(A or B) = P(A) + P(B)   (when A and B are mutually exclusive)
P(A or B) = P(A) + P(B) - P(A and B)
P(A) = 1 - P(~A)
P(A and B) = P(A)P(B)   (when A and B are independent)
P(A and B) = P(A)P(B|A)

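Correlations (y = a + bx):

$$b = \frac{\sum xy - \bar{x} \sum y}{\sum x^2 - n\bar{x}^2} \qquad a = \bar{y} - b\bar{x} \qquad r = \frac{\sum xy - \bar{x} \sum y}{\sqrt{(\sum x^2 - n\bar{x}^2)(\sum y^2 - n\bar{y}^2)}}$$

$$s_e = \sqrt{\frac{\sum y^2 - a \sum y - b \sum xy}{n - 2}}$$ --- note: n - 2 degrees of freedom

$$s_b = \frac{s_e}{\sqrt{\sum x^2 - n\bar{x}^2}}$$

These are used on the slope of the correlation: $b \pm t \cdot s_b$ and $t = \frac{b - \text{hypothetical value}}{s_b}$

$t = \frac{b}{s_b}$ - used to test if a correlation exists: see *

For a given x value: $s_{a+bx} = s_e \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum x^2 - n\bar{x}^2}}$ and $(a + bx) \pm t \cdot s_{a+bx}$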
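The correlation formulas are easier to follow once you see them run on a few points. The sketch below applies them to a small made-up data set; the variable names, including `sxx` for the shared denominator, are chosen for the example only.

```python
import math

# Hypothetical paired data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)
sum_y2 = sum(yi * yi for yi in y)

sxx = sum_x2 - n * x_bar ** 2                  # sum of x^2 minus n * x_bar^2
b = (sum_xy - x_bar * sum(y)) / sxx            # slope
a = y_bar - b * x_bar                          # intercept
r = (sum_xy - x_bar * sum(y)) / math.sqrt(sxx * (sum_y2 - n * y_bar ** 2))

s_e = math.sqrt((sum_y2 - a * sum(y) - b * sum_xy) / (n - 2))
s_b = s_e / math.sqrt(sxx)
t = b / s_b                                    # tests whether a correlation exists

x0 = 4                                         # a given x value
s_at_x0 = s_e * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)

print(a, b, r)
print(s_e, s_b, t, s_at_x0)
```

For this data the line is roughly y = 2.2 + 0.6x with r near 0.77. The t value would be compared with a table value at n - 2 = 3 degrees of freedom.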

Test of dependence/independence:

$$t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}} = \sqrt{\frac{r^2 (n - 2)}{1 - r^2}}$$ --- see *

(The second form gives the size of t; t takes the same sign as r.)

* Ho: No Correlation or No Dependence
If t > t crit then positive dependence/correlation
If t < -t crit then negative dependence/correlation
If t > t crit or t < -t crit then dependent/correlated
Note - n - 2 degrees of freedom
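The test of dependence can be sketched as a short function plus the decision rule marked with *. The function name and the sample values are made up; the critical score 3.182 is the two-tail t table value for a .05 level of significance and 3 degrees of freedom.

```python
import math


def correlation_t(r, n):
    """t = r / sqrt((1 - r^2) / (n - 2)), with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))


r, n = 0.7746, 5      # hypothetical correlation from 5 paired observations
t = correlation_t(r, n)
t_crit = 3.182        # from a t table: two tails, .05 significance, 3 d.f.

if t > t_crit:
    result = "positive dependence/correlation"
elif t < -t_crit:
    result = "negative dependence/correlation"
else:
    result = "no evidence against Ho"

print(round(t, 4), result)
```

Here t is about 2.12, which sits between -3.182 and 3.182, so Ho (no correlation) is not rejected at this level of significance.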

The F score is used when comparing the variance of two samples. This score is also used with the ANOVA
test on multiple groups to see if the variance is the same for all groups.

$F = \frac{s_1^2}{s_2^2}$ with n1 - 1 degrees of freedom in the numerator and n2 - 1 in the denominator

If F < 1 then use the following to calculate F:

$F = \frac{s_2^2}{s_1^2}$ with n2 - 1 degrees of freedom in the numerator and n1 - 1 in the denominator
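Putting the larger variance on top is all the "if F < 1" rule does, and the degrees of freedom swap along with it. A minimal sketch, with a made-up function name and made-up sample values:

```python
def f_ratio(s1_sq, n1, s2_sq, n2):
    """Return (F, numerator d.f., denominator d.f.) with F >= 1."""
    if s1_sq >= s2_sq:
        return s1_sq / s2_sq, n1 - 1, n2 - 1
    # F would be below 1, so flip the ratio and the degrees of freedom
    return s2_sq / s1_sq, n2 - 1, n1 - 1


# Hypothetical samples: variance 4.0 from 10 values, variance 9.0 from 16 values
print(f_ratio(4.0, 10, 9.0, 16))  # (2.25, 15, 9)
```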

Notes: Treat an entire population as ∞ degrees of freedom.

Fcrit and F are both defined as being greater than or equal to 1.
You might have to make F equal to 1/F.
Fcrit is the value you read from the F table.

Ho: the variance of the two groups is the same


Ha: the variance of the two groups is different

General Note: Crit, or critical, scores are the scores that are read from tables.

Important Notes:
t-scores use n - 1 degrees of freedom unless two samples are involved, in which case use n1 + n2 - 2.

Probabilities are always in the range 0 ≤ p ≤ 1. Negative probabilities or probabilities greater than 1 have no meaning.
If a range formula gives you a negative or greater-than-one probability, then stop and represent it as zero (for a
negative) or one (for anything greater than one).

N or n is the size of a population or sample.

p, in probability terms, is the probability of success and is calculated as the number of ways to succeed divided by
the total number of ways.

q, in probability terms, is the probability of failure and is calculated as 1 - p.

Important to Know Definitions:
Mean - Arithmetic average of numbers.
Mode - The number that comes up the most often.
Median - The number in the middle when all the numbers are arranged from low to high. If between
two numbers, take the average of the numbers.
Expected Value - The mean value or the average value that is the expected outcome.
Standard Deviation - Measurement of the spread of a group of numbers.
Variance - The measurement of the squared distance of each number in a sample from the mean of
a sample. It is also the standard deviation squared.
Sample - A portion of a population taken to represent a population.
Population - All data that represents a group. For example, Davenport has a school population for which it has
complete data. The Secretary of State has data on the population of
valid Michigan drivers.
Statistic - A single measurement or calculated value from a group of numbers.
Statistics - Measurements or calculated values used to interpret a problem, sample, or population.
Combination - Selecting a group when order does not matter. IE ABC is the same as CAB.
Permutation - Selecting a group when order matters. IE ABC is not the same as BAC.
Tree Diagram - A diagram used to show multiple events by using a hierarchy just like the diagram used for
tree factoring.
Venn Diagram - A diagram made of circles to show the relationships of sets and subsets. Helps
organize data to eliminate repetition.
Survey - Taking a random sample of the population to measure some property of the population.
Bias - Using statistics to prove your point without regard to the actual statistics. This is also caused by
poor wording of questions or leading people into giving an answer. Meaningless statistics are
created by bias. Do not be biased with your statistics.
Fun - Something statistics can be if you come to class well prepared.
Dumb Question - No such thing.
Recursion - See recursion.
Puns - Type of joke often told by your instructor and usually followed by groans and/or moans.
Wordprocessor - What you should use to type up your paper.
Binomial - Event with only two possible outcomes, success and failure. When made up of several sub
binomial events, the binomial formula may be used.
Midpoint - The value calculated in a range table by adding the high and low of each interval and
dividing that result by two. It is also used as the x value for calculating the mean, mode,
and standard deviation of a range table.
Discrete Numbers - These are numbers that fall on a number line and include data like salaries, ages,
distance, etc.
Non-Discrete Numbers - These are categorical numbers that are turned into percentages. They
include data like gender, ethnic background, yes/no survey questions, etc.
Histogram - A bar chart used to show relationships between a dependent and independent variable.
Central Limit Theorem - For almost all populations, the sampling distribution of X̄ is approximately normal
when the simple random sample size is sufficiently large.

Logic And Sets Handout
p    ~p
T    F
F    T

p    q    p ∧ q    p ∨ q    p → q    p ↔ q
T    T      T        T        T        T
T    F      F        T        F        F
F    T      F        T        T        F
F    F      F        F        T        T

De Morgan’s Laws for statements p and q:

~(p ∨ q) ≡ ~p ∧ ~q          ~(p ∧ q) ≡ ~p ∨ ~q

Set Theory - De Morgan’s Laws: (A ∪ B)′ = A′ ∩ B′          (A ∩ B)′ = A′ ∪ B′


A’ = U - A

Hints:
[Table with columns "Make" and "Into"; its entries were diagrams.]
Where:
A={1,2,4,5} B={2,3,4,6}
C={4,5,6,7} U={1,2,3,4,5,6,7,8}
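De Morgan's Laws can be checked directly with Python sets and booleans. The sets below are the A, B, C, and U given above; the helper name `complement` is made up for the example.

```python
from itertools import product

U = {1, 2, 3, 4, 5, 6, 7, 8}
A = {1, 2, 4, 5}
B = {2, 3, 4, 6}
C = {4, 5, 6, 7}


def complement(s):
    """A' = U - A"""
    return U - s


# Set form: | is union, & is intersection
print(complement(A | B) == complement(A) & complement(B))  # True
print(complement(A & B) == complement(A) | complement(B))  # True
print(sorted(complement(A | B)))                           # [7, 8]

# Statement form: check every row of the truth table
logic_holds = all(
    (not (p or q)) == ((not p) and (not q))
    and (not (p and q)) == ((not p) or (not q))
    for p, q in product([True, False], repeat=2)
)
print(logic_holds)  # True
```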

        Variable
Sample       Population    Meaning

X̄            µ             Mean or arithmetic average
n            N             Number of observations
s            σ             Standard deviation
p̄            p             Probability mean
Z-score      Z-score       Used with large groups. Rule of thumb is that n ≥ 30 or N ≥ 30.
                           Can use ∞ as the degrees of freedom on the t-score table.
t-score      t-score       Used with small groups. Rule of thumb is that n < 30 or N < 30.
                           Degrees of freedom = n - 1.
s²           σ²            Variance
f or F(x)    f or F(x)     Frequency - used with grouped data to show how many
                           occurrences in a group (x).
P(x)         P(x)          Probability of event x occurring. This notation is also used
                           to represent the Poisson distribution function.

Subscripts are used with variables/symbols to show which belong to which sample. X1, s1, and n1 are all
from the same sample.

Important constants: $i = \sqrt{-1}$, e = 2.718281828... and π = 3.141592654...

To the right is the foundation of how you should set up every test of hypothesis. This corresponds to steps 2
through 4 on the next page. You should get into the habit of drawing it every time you start doing a test of
hypothesis.

Also remember that "YES" is a dirty word when doing tests of hypothesis. You do not want to make absolute
statements because your level of significance is how often you may be wrong. In addition, less-than and
more-than are the key words for a one tail test. Different or not the same is your clue for a two tail test.

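The Bell Curve

[Figure: a bell curve captioned "Statistics uses the area under this curve". The area is split into four
regions, left to right: A (left tail, Ha), B (Ho), C (Ho), and D (right tail, Ha). The horizontal axis runs
from negative Z-scores or t-scores on the left, through 0 at the center, to positive Z-scores or t-scores
on the right.]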

Z-scores and t-scores measure the distance from the center of the curve to the point on the line
where the desired amount of area has been accumulated. In the above example, there are four divisions
under the curve. Please note that A + B + C + D = 1 and A + B = .5 and C + D = .5. In a one tailed test,
either A or D will be equal to the level of significance. In a two tailed test, A and D will both be equal to half
of the level of significance. Remember: Level of Confidence + Level of Significance = 1
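The link between the tail areas and the critical Z-score can be sketched with Python's `statistics.NormalDist`, which stands in here for the Z table. The function name `critical_z` is made up for the example.

```python
from statistics import NormalDist


def critical_z(significance, tails):
    """Positive Z-score that leaves the tail area (A or D) beyond it."""
    # One tail: the tail holds the whole level of significance.
    # Two tails: each tail holds half of it.
    area_in_one_tail = significance / tails
    return NormalDist().inv_cdf(1 - area_in_one_tail)


print(round(critical_z(0.05, 1), 3))  # one tailed test at .05
print(round(critical_z(0.05, 2), 3))  # two tailed test at .05
```

For a .05 level of significance this gives about 1.645 for one tail and about 1.96 for two tails. By symmetry, the left tail uses the same number with a minus sign.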

In testing of hypothesis, Ho is always equality and Ha is either <, >, or ≠. To check for ≤, you use > in Ha. To
check for ≥, you use < in Ha.

The steps to follow when working a test of hypothesis


1) Determine and list the variables from the problem. The chart on the previous page will help in doing
it.

2) Determine Ho and Ha. Knowing the variables is needed for this step.

3) Determine the level of significance.

4) Draw a bell curve and show the tail(s). Also determine the Z-score(s) or t-score(s) for the reject
criteria.

5) Use the proper formula to determine the Z-score or t-score for the test. This can easily be determined
from step 1 where you listed out the variables of the problem.

6) Show the result on the graph you drew. This will let you know whether to accept Ho or Ha.

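$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \qquad Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \qquad Z = \frac{\bar{p} - p}{\sqrt{\frac{p(1 - p)}{n}}} \qquad Z = \frac{\bar{p}_1 - \bar{p}_2}{\sqrt{\bar{p}_c(1 - \bar{p}_c)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

Where $\bar{p}_c = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2}$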
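The six steps can be walked through in code for the simplest case, a one sample Z-score test. Every number below is hypothetical, and `statistics.NormalDist` stands in for the Z table.

```python
import math
from statistics import NormalDist

# Step 1: list the variables (hypothetical values)
x_bar, mu, sigma, n = 52.0, 50.0, 8.0, 64

# Step 2: Ho: the mean is 50. Ha: the mean is different from 50 (two tail test)
# Step 3: level of significance
significance = 0.05

# Step 4: reject criteria, with half of the significance in each tail
z_crit = NormalDist().inv_cdf(1 - significance / 2)

# Step 5: Z-score for the test
z = (x_bar - mu) / (sigma / math.sqrt(n))

# Step 6: see which region of the bell curve the score lands in
in_ha_region = z > z_crit or z < -z_crit
print(round(z, 3), round(z_crit, 3), in_ha_region)
```

The test score is 2.0 and the critical score is about 1.96, so the score lands in the right tail. In keeping with the advice above, the conclusion is worded as "Ho is rejected at the .05 level of significance" rather than as an absolute statement.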

ANOVA Help Sheet

Notes                        Group 1     Group 2     Group 3     Group 4     Grand
                             x    x²     x    x²     x    x²     x    x²     Totals
                             18   324    17   289    14   196    15   225
                             14   196    13   169    15   225    16   256
                             15   225    17   289    15   225    15   225
                             16   256    17   289    16   256    16   256
                             17   289    11   121    19   361    17   289
                             12   144    14   196    12   144    11   121
                                         19   361    17   289    10   100
                                         21   441    16   256    22   484
N is the total of group n’s                          13   169    10   100
k is number of groups                                            18   324    k = 4
Column total - T             92          129         137         150         508
n for each group             6           8           9           10          33
sum of x² for each group     1434        2155        2121        2380        8090

sum of squares treatment: $sst = \sum \left(\frac{T_c^2}{n_c}\right) - \frac{(\sum x)^2}{N}$

sum of squares error: $sse = \sum (x^2) - \sum \left(\frac{T_c^2}{n_c}\right)$

$$F = \frac{sst / (k - 1)}{sse / (N - k)}$$

For the F score from the table: k - 1 degrees of freedom in the numerator and N - k degrees of freedom in the denominator.
From above:

$$sst = \left(\frac{92^2}{6} + \frac{129^2}{8} + \frac{137^2}{9} + \frac{150^2}{10}\right) - \frac{508^2}{33} = 7826.236111111 - 7820.121212121 = 6.1148989899$$

$$sse = 8090 - \left(\frac{92^2}{6} + \frac{129^2}{8} + \frac{137^2}{9} + \frac{150^2}{10}\right) = 8090 - 7826.236111111 = 263.763888889$$

$$F = \frac{6.1148989899 / (4 - 1)}{263.763888889 / (33 - 4)} = \frac{2.0382996633}{9.095306513414} = 0.2241045598951$$

Since F < 1, we will use 1/F, or 4.462202823842, with N - k degrees of freedom in the numerator and
k - 1 degrees of freedom in the denominator for the F score from the table.

Ho: the groups are the same     Ha: at least one group is different {two tail test}
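The ANOVA arithmetic above can be reproduced with a few lines of Python. The group data is copied from the help sheet table; the variable names are chosen for the example.

```python
groups = [
    [18, 14, 15, 16, 17, 12],
    [17, 13, 17, 17, 11, 14, 19, 21],
    [14, 15, 15, 16, 19, 12, 17, 16, 13],
    [15, 16, 15, 16, 17, 11, 10, 22, 10, 18],
]

k = len(groups)                              # number of groups
N = sum(len(g) for g in groups)              # total of group n's
grand_total = sum(sum(g) for g in groups)    # sum of every x
sum_of_squares = sum(x * x for g in groups for x in g)

# Sum over the groups of (column total squared / group n)
between = sum(sum(g) ** 2 / len(g) for g in groups)

sst = between - grand_total ** 2 / N
sse = sum_of_squares - between
F = (sst / (k - 1)) / (sse / (N - k))

print(N, grand_total, sum_of_squares)
print(sst, sse, F, 1 / F)
```

The printed values should agree with the hand calculation: sst near 6.1149, sse near 263.7639, F near 0.2241, and 1/F near 4.4622.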

Chi aka χ² - Help sheet

$$\chi^2 = \sum \left[\frac{(f_o - f_e)^2}{f_e}\right]$$

For a single column table:

degrees of freedom = number of groups - 1

$$f_e = \frac{n}{\text{number of groups}}$$

example

Fruit      fo     fe     fo - fe    (fo - fe)²    (fo - fe)² / fe
Peaches    23     30       -7          49           1.6333333
Apples     31     30        1           1           0.033333
Mangos     25     30       -5          25           0.8333333
Grapes     29     30       -1           1           0.033333
Lemon      37     30        7          49           1.6333333
Lime       35     30        5          25           0.8333333
Totals     180    180       0         150           5

Last total is Chi
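The single column example can be reproduced in a few lines of Python. The observed counts come from the fruit table; the variable names are chosen for the example.

```python
observed = {
    "Peaches": 23, "Apples": 31, "Mangos": 25,
    "Grapes": 29, "Lemon": 37, "Lime": 35,
}

n = sum(observed.values())
groups = len(observed)
fe = n / groups                      # same expected count for every group

# Sum of (fo - fe)^2 / fe over all groups
chi = sum((fo - fe) ** 2 / fe for fo in observed.values())
degrees_of_freedom = groups - 1

print(fe, chi, degrees_of_freedom)
```

This should give fe = 30, Chi = 5 (up to floating-point rounding), and 5 degrees of freedom, matching the Totals row.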

On a table (see the example table that follows):

$f_e = \frac{(\text{row total})(\text{column total})}{\text{table total}}$     This is done for each cell in the table.

degrees of freedom = (rows - 1)(columns - 1)

for the following example table:

fo         Lansing     Grand Rapids   Holland     South Bends   Alma        Totals
Math       45          35             33          32            25          170
Reading    40          33             42          35            30          180
English    45          52             45          43            45          230
Totals     130         120            120         110           100         580

we get the expected value for each cell by multiplying the corresponding column and
row totals, then dividing by the grand total (all values come from the table above):

fe         Lansing     Grand Rapids   Holland     South Bends   Alma        Totals
Math       38.10345    35.17241       35.17241    32.24138      29.31034    170
Reading    40.34483    37.24138       37.24138    34.13793      31.03448    180
English    51.55172    47.58621       47.58621    43.62069      39.65517    230
Totals     130         120            120         110           100         580

finally we take the difference of the observed and expected values, square it, and divide
by the expected value for each cell - not forgetting to get the column/row totals:

(fo - fe)² / fe   Lansing     Grand Rapids   Holland     South Bends   Alma        Totals
Math              1.248245    0.001          0.134179    0.0018        0.633874    2.01895
Reading           0.0029      0.483046       0.608046    0.02177       0.03448     1.150291
English           0.832661    0.409395       0.140555    0.0088        0.72039     2.111832
Totals            2.083852    0.893286       0.882779    0.03241       1.388747    5.281073

So, Chi is 5.281073 - the total of the last table
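The three tables above boil down to one loop over the cells. This sketch recomputes the example in Python; the observed counts come from the fo table and the variable names are chosen for the example.

```python
observed = [
    [45, 35, 33, 32, 25],   # Math
    [40, 33, 42, 35, 30],   # Reading
    [45, 52, 45, 43, 45],   # English
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
table_total = sum(row_totals)

chi = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        # Expected value: (row total)(column total) / table total
        fe = row_totals[i] * column_totals[j] / table_total
        chi += (fo - fe) ** 2 / fe

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi, 4), degrees_of_freedom)
```

The result should be close to 5.2811 with (3 - 1)(5 - 1) = 8 degrees of freedom.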

Sign Test
Hypothesis:
Ho: p = .5
Ha: p ≠ .5

Choose a level of significance and put half into each tail


Choose definition of success (+’s or -’s)
Go through paired data and mark with +, - , or no change
n is the count of +’s and -’s (no changes are ignored)

Small samples (n of 20 or less) use the binomial distribution:

Let a = level of significance.

Count inward from both ends of the nth level of the binomial distribution (the nth row of Pascal's
triangle), stopping at the last combination from the outer ends which is less than or equal to
$\frac{a}{2} \cdot 2^n$.

Example for n = 6 and L.S. = .10

.05 × 64 = 3.2, and from the ends the 6th level starts 1 6 . . . so your cut off will be between 0 and 1
successes/failures at one end and between 5 and 6 successes/failures at the other end. This means the
test result will be Ha only at 0 or 6 successes/failures.
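The counting rule for small samples can be sketched as below. One assumption is made loudly: "less than or equal to" is read as applying to the running total of combinations counted in from each end, which agrees with the n = 6 example. The function name is made up.

```python
import math


def sign_test_ha_counts(n, significance):
    """Success counts (0..n) that land in the two Ha tails."""
    limit = (significance / 2) * 2 ** n
    level = [math.comb(n, r) for r in range(n + 1)]  # nth level: 1, n, ...

    lower_tail = []
    running_total = 0
    for r, ways in enumerate(level):
        running_total += ways
        if running_total > limit:   # assumed reading: cumulative count
            break
        lower_tail.append(r)

    # The level is symmetric, so mirror the lower tail to get the upper tail
    upper_tail = [n - r for r in lower_tail]
    return sorted(set(lower_tail + upper_tail))


print(sign_test_ha_counts(6, 0.10))  # [0, 6]
```

For n = 6 and a level of significance of .10, the limit is 3.2, the level starts 1, 6, ..., and only 0 or 6 successes fall in the Ha tails.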

Large samples (n over 20) use the bell curve:

read z critical from the table and compare to:

if the number of pluses or minuses is greater than n/2 use $z = \frac{(X - .5) - .5n}{.5\sqrt{n}}$

if the number of pluses or minuses is less than n/2 use $z = \frac{(X + .5) - .5n}{.5\sqrt{n}}$
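The two large sample formulas differ only in which way X is nudged by .5 toward n/2. A minimal sketch, with a made-up function name; returning 0 when X is exactly n/2 is an assumption, since the sheet does not cover that case.

```python
import math


def sign_test_z(x, n):
    """z for the sign test when n is over 20; x is the count of successes."""
    if x > n / 2:
        adjusted = x - 0.5
    elif x < n / 2:
        adjusted = x + 0.5
    else:
        return 0.0  # assumption: exactly n/2 successes gives z = 0
    return (adjusted - 0.5 * n) / (0.5 * math.sqrt(n))


# Hypothetical result: 20 pluses out of 25 changes
print(sign_test_z(20, 25))
```

With 20 pluses out of 25, z = (19.5 - 12.5) / 2.5 = 2.8, which you would compare with the z critical score for half the level of significance in each tail.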
