
QUANTITATIVE SKILLS AND MANAGERIAL STATISTICS

Report on:
DEGREES OF FREEDOM

[Cartoon caption] Married Man: "There is only one subject, and my degree of freedom is zero. I should increase my 'sample size.'"
Degrees of freedom are the number of values in a distribution that are free to vary for any particular statistic. In layman's terms, degrees of freedom measure how much independent information a sample carries, and accounting for them correctly keeps statistical results from being biased. That freedom to vary is why the concept is called "degrees of freedom".

This article is an attempt to understand the concept of degrees of freedom that occurs
throughout statistics and is used extensively in statistical inference.
The term “degrees of freedom” was introduced by Sir Ronald Fisher in 1922 without mentioning its purpose. Various definitions have been given for “degrees of freedom”:
1. It is the number of independent parameters that are needed to specify the
configuration of a system.
2. It refers to the number of independent variables involved in a statistic.
3. It is a number which in some way represents the size of the sample/samples used
in a statistical test. (In some cases it is the sample size, and sometimes it has to be
calculated depending on the kind of test).
4. It is a parameter that appears in some probability distributions used in statistical
inference, particularly the t-distribution, chi-squared distribution and F-
distribution.
5. It is a positive integer (though fractional values can occur in some
approximations) normally equal to the number of independent observations
in a sample, minus the number of population parameters to be estimated from the sample.
Though all the above definitions attempt to explain the concept of degrees
of freedom, they appear vague and inconsistent with one another. A clearer
and more concise definition is:

“Degrees of freedom means the number of independent units of information in a sample relevant to the estimation of a parameter or the calculation of a statistic.”

This definition can be explained simply using a contingency table. For example, if we take a 2 x 2 contingency table with its marginal totals provided, we have only one degree of freedom: we are free to choose the value of only one of the four cells, and the values of the remaining three cells are then fixed by the marginal totals and that one independent value.
          Col 1   Col 2   Total
Row 1     X       Y1      24
Row 2     Y2      Y3      20
Total     15      29      44

In the above table X is the only independent value, and therefore we have only one degree of freedom. Once we set its value, say X = 5, all the other values become known: Y1 = 19, Y2 = 10 and Y3 = 10.
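To make the dependence concrete, here is a minimal sketch in plain Python, using the marginal totals from the table above, that fills in the remaining cells once the single free cell is chosen:

```python
# 2 x 2 contingency table with the marginal totals from the text
row_totals = [24, 20]
col_totals = [15, 29]      # grand total: 44

x = 5                      # the one cell we are free to choose
y1 = row_totals[0] - x     # 24 - 5  = 19
y2 = col_totals[0] - x     # 15 - 5  = 10
y3 = row_totals[1] - y2    # 20 - 10 = 10
print(y1, y2, y3)          # 19 10 10 -> only one degree of freedom
```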
Suppose instead that in the above example we are provided only with the grand total; then we have three degrees of freedom.

          Col 1   Col 2   Total
Row 1     X1      X2
Row 2     X3      Y
Total                     44

We can now set three values independently: if, say, X1 = 10, X2 = 20 and X3 = 5, then Y must equal 9 for the cells to add up to the grand total of 44.
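The same check for the grand-total-only case, a small sketch using the numbers from the text:

```python
grand_total = 44
x1, x2, x3 = 10, 20, 5               # three cells chosen freely
y = grand_total - (x1 + x2 + x3)     # the fourth cell is forced
print(y)                             # 9 -> three degrees of freedom
```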

Thus, at a basic level, we can understand degrees of freedom as the number of pieces of information that can be varied freely without violating any restriction.

In statistics, the degrees of freedom are likewise the number of independent items of information in a data set; that is, the total number of items in the data set less the number of items that can be derived from the other data.
Thus X1, X2 and X3 are all independent variables whose values we can set at will, so they carry n = 3 degrees of freedom. But if the average of X1, X2 and X3 is given as 2, the degrees of freedom are reduced to n - 1 = 3 - 1 = 2, because one of the three variables now depends on the other two: if X1 = 2 and X2 = 3, then X3 must equal 1 for the average to be two.
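A one-line check of this example, with the values taken from the text:

```python
mean = 2
x1, x2 = 2, 3              # two values chosen freely
x3 = 3 * mean - x1 - x2    # the third is forced by the fixed average
print(x3)                  # 1 -> n - 1 = 2 degrees of freedom
```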
Similarly, in the contingency table example above, when the 4 marginal totals are provided, only 3 of them are independent pieces of information, since the 4th marginal total follows from the other three. The degrees of freedom can therefore be calculated as
N (number of required cell values) - 3 (independent pieces of information known) = 4 - 3 = 1
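More generally, the standard rule for an r x c contingency table with all marginal totals fixed is (r - 1)(c - 1) degrees of freedom; for the 2 x 2 table this gives (2 - 1)(2 - 1) = 1, agreeing with the count above.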

In the second contingency table, where only the grand total was provided, we calculate the degrees of freedom as
N - 1 = 4 - 1 = 3

The degrees of freedom are thus the maximum number of quantities whose values are free to vary before the remaining quantities are determined. Examples of degrees of freedom occur in everyday life. For instance, suppose a student has to attend 4 classes (English, Math, Science and History) within a specific period. The student's schedule has 3 degrees of freedom: the student can freely choose the time slots of 3 of the classes, but once those 3 are fixed, the slot of the fourth class is determined by default.
Furthermore, if the student is given the restriction that the English class must be attended first, the degrees of freedom decrease by one to 2, since that class can no longer be chosen freely and cannot vary.

There are many other situations where we can talk about "degrees of freedom". For
instance, we may be studying the relationship between age, sex, and college GPA in a
population. In this situation, we have three "degrees of freedom" for each person.
[Figure: F distribution with 3 and 36 degrees of freedom. The test statistic F = s1²/s2² falls within the zone of acceptance, below the 5% critical value of 2.87, so "equal variance" is accepted rather than the hypothesis being rejected.]
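As a quick check of the 2.87 shown in the figure, here is a small sketch (assuming SciPy is available) that computes the upper 5% critical value of an F distribution with 3 and 36 degrees of freedom:

```python
from scipy.stats import f

dfn, dfd = 3, 36                    # numerator and denominator degrees of freedom
critical = f.ppf(0.95, dfn, dfd)    # upper 5% point of F(3, 36)
print(round(critical, 2))           # 2.87, matching the figure
```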

The variance of a random variable, probability distribution, or sample is one measure of statistical dispersion, averaging the squared distance of its possible values from the
expected value (mean). Whereas the mean is a way to describe the location of a
distribution, the variance is a way to capture its scale or degree of being spread out. The
unit of variance is the square of the unit of the original variable. The positive square root
of the variance, called the standard deviation, has the same units as the original variable
and can be easier to interpret for this reason.
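In symbols, for a random variable X with mean μ, the variance is Var(X) = E[(X − μ)²], and the standard deviation is σ = √Var(X).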
If we wish to calculate the sample variance, it is first necessary to compute the sample mean.
In many statistics textbooks the sample variance is defined by dividing by n - 1 instead of n, yet few provide an intuitive justification for dividing the sum of squared deviations by n - 1 rather than n.
Some texts and practitioners do use n, but the reason for using n - 1 is to make the estimate unbiased: we want the expected value of the estimate to equal the true population variance, and this requires dividing by n - 1.
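Written out, the unbiased sample variance is s² = ∑(Xi − X̄)² / (n − 1), where X̄ is the sample mean.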
The idea behind estimators is to approximate the population variance (and mean), which in an actual statistical problem you do not know. So you construct functions, called estimators, that estimate the population values from the sample statistics. The closer an estimator gets to the population value, the better it is. In the large-n limit, the properties of the sample approach the properties of the underlying distribution exactly.
When you divide by (n - 1), the expectation of the sample variance so defined exactly equals the population variance. That is,

E[s²] = E[ ∑(Xi − X̄)² / (n − 1) ] = σ²
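This can also be checked empirically. Below is a minimal simulation sketch (assuming NumPy is available; the sample size n = 8 and the true variance 4.0 are illustrative choices, not values from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 8, 200_000
sigma2 = 4.0                              # true population variance (illustrative)

samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
var_n  = samples.var(axis=1, ddof=0)      # divide by n   (biased)
var_n1 = samples.var(axis=1, ddof=1)      # divide by n-1 (unbiased)

print(var_n.mean())   # close to sigma2 * (n - 1) / n = 3.5 (biased low)
print(var_n1.mean())  # close to sigma2 = 4.0 (unbiased)
```

The n-divided version systematically underestimates the true variance by the factor (n - 1)/n, while the n - 1 version hits it on average.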

Now, to justify why there are n - 1 degrees of freedom when calculating a sample variance, consider an example. Suppose there is a sample of 8 observations and nothing else is known about them, so every value is free: any observation could be replaced at will. However, to compute the sample variance of the 8 outcomes we must first compute the sample mean:

X̄ = ∑Xi / n

Now suppose X̄ = 7.5, so that

x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 = nX̄ = 60

It is then no longer true that all the observations are free to be replaced, because the sum of the 8 observations must be sixty. Seven of the values in the sample are free to vary, but once those seven are decided and fixed, the 8th value is determined by default.
Therefore there are 7 degrees of freedom, i.e. 8 - 1: seven values in the sample can vary and the 8th is then forced.
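The same point as a tiny sketch (the seven free values here are arbitrary):

```python
total = 60                              # fixed by the known mean: n * 7.5 = 60
free_values = [5, 9, 7, 8, 6, 10, 11]   # 7 values chosen at will
x8 = total - sum(free_values)           # the 8th observation is forced
print(x8)                               # 4 -> 8 - 1 = 7 degrees of freedom
```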
