1. Meaning and purpose
   Objectives of measuring dispersion
   Properties of an ideal measure of dispersion
2. Definition
3. Absolute and relative measures of dispersion
4. Range
   4.1. Properties of range
   4.2. Uses of range
   4.3. Limitations
5. Quartile deviation
   5.1. Merits of quartile deviation
   5.2. Limitations
6. Mean deviation
   6.2. Properties of mean deviation
   6.3. Limitations
   6.4. Uses
   6.5. Mean squared deviation
7. Variance
   7.1. Properties of variance
   7.2. Combined variance
8. Standard deviation
   8.1. Characteristics of standard deviation
   8.2. Difference between mean deviation and standard deviation
   8.3. Mathematical properties of standard deviation
9. Coefficient of variation
   9.1. Definition
   9.2. Properties
10. Relation between measures of dispersion
11. Conclusion
While measures of central tendency are used to estimate "normal" values of a dataset, measures of dispersion describe the spread of the data, or its variation around a central value. Two distinct samples may have the same mean or median but completely different levels of variability, or vice versa. A proper description of a set of data should include both of these characteristics. There are various methods for measuring the dispersion of a dataset, each with its own advantages and disadvantages. Together, averages and measures of dispersion help in describing a frequency distribution.
2. Definition
A central value does not provide information about the scattering of values in a set of data. Other statistical measures are needed to express the scatter of items in numerical terms; these are known as measures of variability or dispersion. Before discussing the various measures of dispersion, it is worth pointing out that no hard and fast rules exist to guide the choice of a particular measure. A measure of dispersion is good if it possesses all the properties expected of a good measure of central tendency.
4. Range
Range is the simplest measure of dispersion. It is defined as the difference between the largest and the smallest value in a data set. In a grouped frequency distribution, range is computed either by subtracting the lower limit of the smallest class from the upper limit of the largest class, or by subtracting the mid-value of the smallest class from the mid-value of the largest class. The formula for range is
$\text{Range} = L - S$
where L = largest item and S = smallest item.
On the one hand, range is a simple measure of dispersion that is easy to understand and compute. On the other hand, it may be the most misleading of all measures of dispersion, because it ignores every value in the distribution other than the two extremes. Range is an absolute measure of dispersion. Its relative measure is called the coefficient of range, given by
$\text{Coefficient of range} = \dfrac{L - S}{L + S}$
4.3. Limitations:
Range is not based on each and every item of the distribution. It is subject to fluctuations of considerable magnitude from sample to sample.
Range cannot tell us anything about the character of the distribution within the two extreme observations.
Problem:
Q. The following are the prices of shares of Amish Co. Ltd. from Monday to Saturday:

Day          Monday  Tuesday  Wednesday  Thursday  Friday  Saturday
Price (Rs.)  200     210      208        160       220     250

Calculate the range and its coefficient.
Solution:
Here, L = 250 and S = 160.
$\text{Range} = L - S = 250 - 160 = \text{Rs. } 90$
$\text{Coefficient of range} = \dfrac{L - S}{L + S} = \dfrac{250 - 160}{250 + 160} = \dfrac{90}{410} = 0.22$
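The share-price calculation can be checked with a short Python sketch. The function names `value_range` and `coefficient_of_range` are illustrative choices, not part of the original notes, and the coefficient is taken as (L - S)/(L + S), which is what the worked figures 90/410 = 0.22 imply.

```python
def value_range(values):
    """Return the range L - S, where L is the largest and S the smallest item."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Return the relative measure (L - S) / (L + S)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


# Share prices (Rs.) from Monday to Saturday, as given in the problem.
prices = [200, 210, 208, 160, 220, 250]

print("Range:", value_range(prices))
print("Coefficient of range:", round(coefficient_of_range(prices), 2))
```

Only the two extreme prices (250 and 160) enter either result, which illustrates why the range ignores everything that happens between them.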
5. Quartile deviation
Quartile deviation is another measure of variability. The difference Q3 - Q1 is the interquartile range. Half of the interquartile range is the quartile deviation, which is also called the semi-interquartile range and is denoted by Q.D. When the values represent some sort of ranking, the semi-interquartile range provides a measure of dispersion within that distribution. The formula for quartile deviation is
$Q.D. = \dfrac{Q_3 - Q_1}{2}$
The larger the computed value of Q.D., the greater the variability of the series. Compared to the range, quartile deviation is a more reliable measure of dispersion. Since 50% of the cases lie between the first and the third quartiles, quartile deviation measures the dispersion of the middle 50% of the data in a series. It is not the full difference between the scale values of Q3 and Q1 but one half of the distance between them. Quartile deviation is an absolute measure of dispersion. If it is divided by the average of the two quartiles, a relative measure is obtained, called the coefficient of quartile deviation:
$\text{Coefficient of Q.D.} = \dfrac{Q_3 - Q_1}{Q_3 + Q_1}$
5.2. Limitations:
It is not capable of further algebraic treatment. It is susceptible to sampling fluctuations. The measure does not take into account the individual values occurring between Q3 and Q1, so it gives no real idea of the variation even among the middle 50% of values; it is informative only if the values are uniformly distributed between Q3 and Q1. Strictly speaking, it is not a measure of dispersion, as it does not show the scatter of values around a central value; it is a measure of the partitioning of the distribution. Hence, it is not commonly used for statistical inference.
Problem:
Q. Compute quartile deviation and its coefficient from the following data:
Marks            10  20  30  40  50  60
No. of students   4   7  15   8   7   2
Solution: Calculation of Q.D. and coefficient of Q.D.

Marks  Frequency  Cumulative frequency
10      4          4
20      7         11
30     15         26
40      8         34
50      7         41
60      2         43

$Q_1 = \text{size of } \left(\dfrac{N + 1}{4}\right)\text{th item}$
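The quartile calculation for this frequency table can be sketched in Python. The sketch assumes the positional convention for a discrete series, with Q1 located at the ((N + 1)/4)th item and Q3 at the (3(N + 1)/4)th item, each read off the cumulative frequency column; the helper name `item_at` is hypothetical, and the marks column is taken as 10 to 60 as in the solution table.

```python
from bisect import bisect_left
from itertools import accumulate


def item_at(values, cum_freqs, position):
    """Value of the item at a 1-based position in a discrete frequency series.

    The item is the first value whose cumulative frequency reaches the
    position (a fractional position is resolved to the next whole item).
    """
    return values[bisect_left(cum_freqs, position)]


marks = [10, 20, 30, 40, 50, 60]
students = [4, 7, 15, 8, 7, 2]

cum_freqs = list(accumulate(students))   # running totals of the frequencies
n = cum_freqs[-1]                        # total frequency N

q1 = item_at(marks, cum_freqs, (n + 1) / 4)
q3 = item_at(marks, cum_freqs, 3 * (n + 1) / 4)

quartile_deviation = (q3 - q1) / 2
coefficient_qd = (q3 - q1) / (q3 + q1)

print("Q1 =", q1, "Q3 =", q3)
print("Q.D. =", quartile_deviation)
print("Coefficient of Q.D. =", round(coefficient_qd, 2))
```

Under this convention the 11th item falls in the class with marks 20 and the 33rd item in the class with marks 40, so the quartile deviation is half of 40 - 20.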
6. Mean deviation
Range, the simplest of all measures of dispersion, depends only on the two extreme items of the distribution, and the quartile deviation covers only the middle 50% of the data once the series is arranged in ascending or descending order. The measures of dispersion discussed so far are therefore unsatisfactory, in the sense that they lack most of the requirements of a good measure. Mean deviation is a better measure of dispersion than range and quartile deviation: the latter two are not average deviations, as they are not based on all observations, whereas the mean deviation shows the variation of the items from an average.
Definition
Mean deviation of a series is defined as the arithmetic average of the absolute (positive) deviations of the various items from a measure of central tendency (mean, median or mode). Consider a set of n observations x1, x2, x3, ..., xn. Then the mean deviation, denoted by M.D., is given by the formula
$M.D. = \dfrac{1}{n} \sum |x - A|$
where A is the central value, i.e. the mean, median or mode. For data given in the form of a frequency distribution the formula becomes
$M.D. = \dfrac{1}{N} \sum f\,|x - A|$
where $N = \sum f$ is the total frequency. For grouped data, the midpoint of each class interval is treated as x and the same formula is used.
Mean deviations taken from the mean, median and mode are respectively called the mean deviation from mean, mean deviation from median and mean deviation from mode. In general, however, the deviations are taken only from the mean or the median. Of these two, the median is considered the better choice, because the sum of the absolute deviations from the median is less than the sum of the absolute deviations from the mean. Since the sum of deviations from the median is the least (when signs are ignored), it is advantageous to find the mean deviation from the median. In general practice, though, the mean deviation from the mean is used. The main drawback of the mean deviation is that algebraic signs are ignored while taking the deviations of the items. For example, the mean deviation from the median (Md) is
$M.D. = \dfrac{\sum |X - Md|}{n}$
6.3. Limitations
The greatest drawback of this method is that algebraic signs are ignored while taking the deviations of the item. This method may not give very accurate results. It is not capable of further algebraic treatment. It is rarely used in sociological studies.
Because of these limitations, its use is limited and it is overshadowed as a measure of variation by the superior standard deviation.
6.4. Uses
It is especially effective in reports presented to the general public or to groups not familiar with statistical methods. This measure is useful for small samples with no elaborate analysis required. Incidentally, it may be mentioned that the National Bureau of Economic Research has found in its work on forecasting business cycles, that the average deviation is the most practical measure of dispersion to use for this purpose.
Problem
Q. Calculate mean deviation from median and its coefficient from the following data:

X  10  11  12  13  14
f   3  12  18  12   3

Solution
Calculation of mean deviation and its coefficient

X    f    |D| = |X - Md|   f|D|   c.f.
10    3   2                 6      3
11   12   1                12     15
12   18   0                 0     33
13   12   1                12     45
14    3   2                 6     48
     N = 48               Σf|D| = 36

$\text{Median} = \text{size of } \left(\dfrac{N + 1}{2}\right)\text{th item} = \text{size of } 24.5\text{th item}$
Since the 24.5th item falls in the class whose cumulative frequency is 33, the median is 12.
$M.D. = \dfrac{\sum f\,|D|}{N} = \dfrac{36}{48} = 0.75$
$\text{Coefficient of M.D.} = \dfrac{M.D.}{\text{Median}} = \dfrac{0.75}{12} = 0.0625$
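The same calculation can be reproduced with a small Python sketch. It assumes the frequencies 3, 12, 18, 12, 3 used in the solution (N = 48), and it locates the median as the ((N + 1)/2)th item, averaging the two neighbouring items when that position is fractional; the function names are illustrative only.

```python
import math
from bisect import bisect_left
from itertools import accumulate


def median_of_frequency_series(values, freqs):
    """Median of a discrete frequency series via the ((N + 1) / 2)th item."""
    cum_freqs = list(accumulate(freqs))
    position = (cum_freqs[-1] + 1) / 2
    # For a fractional position, average the items just below and just above.
    lower = values[bisect_left(cum_freqs, math.floor(position))]
    upper = values[bisect_left(cum_freqs, math.ceil(position))]
    return (lower + upper) / 2


def mean_deviation(values, freqs, centre):
    """(1 / N) * sum of f * |x - centre|."""
    total = sum(freqs)
    return sum(f * abs(x - centre) for x, f in zip(values, freqs)) / total


x_values = [10, 11, 12, 13, 14]
frequencies = [3, 12, 18, 12, 3]

median = median_of_frequency_series(x_values, frequencies)
md = mean_deviation(x_values, frequencies, median)
coefficient_md = md / median

print("Median =", median)
print("M.D. =", md)
print("Coefficient of M.D. =", coefficient_md)
```

Passing the mean instead of the median as `centre` gives the mean deviation from the mean with no other change.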
7. Variance
The minimum value of the mean squared deviation, which is attained when the deviations are taken from the arithmetic mean, is called the variance. It is given by the formula
$\sigma^2 = \dfrac{1}{n} \sum (x - \bar{x})^2$
So, variance is the arithmetic mean of the squares of the deviations taken from the arithmetic mean (A.M.). A sample variance differs from the population variance, as the computational formula makes clear. A sample variance is given by the formula
$s^2 = \dfrac{1}{n - 1} \sum (x - \bar{x})^2$
The process of subtracting the mean from each data value to obtain the deviations results in the loss of one piece of information from the original n numbers. Therefore, the sum of the squared deviations is divided by one fewer than the number of terms added up when computing the sample variance.
Remarks: If the problem does not say whether the data are a sample or a population and the variance is required, the population variance is implied and should be calculated accordingly.
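The difference between the two divisors can be seen in a brief Python sketch. The data are the six share prices from the range problem, reused here purely as sample input; the hand-written functions follow the two formulas above and are compared against `statistics.pvariance` and `statistics.variance` from the Python standard library.

```python
import math
import statistics


def population_variance(data):
    """Mean of squared deviations from the arithmetic mean (divisor n)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)


def sample_variance(data):
    """Sum of squared deviations from the mean divided by n - 1."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / (len(data) - 1)


prices = [200, 210, 208, 160, 220, 250]

pop_var = population_variance(prices)
samp_var = sample_variance(prices)

print("Population variance:", round(pop_var, 2))
print("Sample variance:", round(samp_var, 2))

# The standard library implements the same two definitions.
print(math.isclose(pop_var, statistics.pvariance(prices)))
print(math.isclose(samp_var, statistics.variance(prices)))
```

The sum of squared deviations is the same in both cases (4280 for these prices); only the divisor changes, so the sample variance is always the larger of the two.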
3. The variance gives more weight to extreme values than to those near the mean, because the deviations are squared.
4. Variance is not affected by a shift in origin, which changes only the mean, but a change of scale does affect it. Thus the variance of temperature would alter if the temperature is measured in Centigrade rather than in Fahrenheit, whereas the variance of the percentage of plants affected by disease would be the same as that of the percentage of plants free from infection. Likewise, if we subtract a fixed number from every observation and compute the variance of the deviations, the variance so computed will be the same as that of the original data. But if we divide every observation by some fixed number and compute the variance, the variance of the original data equals the product of the square of that number and the variance so computed.
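The effect of a change of origin and of scale can be verified numerically. The sketch below uses a small made-up data set and arbitrary constants (`shift = 10`, `divisor = 3`), all of which are assumptions chosen only for illustration.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values, not from the notes
shift = 10                        # arbitrary change of origin
divisor = 3                       # arbitrary change of scale

original = statistics.pvariance(data)
after_shift = statistics.pvariance([x - shift for x in data])
after_division = statistics.pvariance([x / divisor for x in data])
# Percentage affected vs. percentage free (100 - x) is a shift plus a sign flip.
complement = statistics.pvariance([100 - x for x in data])

print("Original variance:", original)
print("After subtracting a constant:", after_shift)
print("After dividing by a constant:", after_division)
print("Recovered original:", divisor ** 2 * after_division)
```

Subtracting a constant, or replacing each value x by 100 - x, leaves the variance unchanged, while dividing by a constant c shrinks it by a factor of c squared.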
8. Standard deviation
A measure of dispersion in which the drawbacks of variance are overcome is the standard deviation. Standard deviation, denoted by S.D., is defined as the positive square root of the variance. The formula for the population standard deviation is
$S.D. = \sqrt{\sigma^2} = \sigma$
For a sample we have the following formula
$S.D. = \sqrt{s^2} = s$
If nothing is mentioned about the standard deviation, we should understand it as the population standard deviation and use the formula accordingly. In general, however, we will be dealing with the sample standard deviation. This is the best accepted and most widely used of all variability measures. It is sensitive to all of the data, because the deviations of all data values from the mean enter equally into the computation. Standard deviation is used in computing different statistical quantities such as regression coefficients and correlation coefficients, and also in testing the reliability of certain statistical measures.
Note: literally, S.D. expresses the average amount of variation on either side of the mean. Independently, S.D. is defined as the positive square root of the arithmetic mean of the squared deviations taken from the arithmetic mean, i.e.
$\sigma = \sqrt{\dfrac{1}{n} \sum (x - \bar{x})^2}$
For computational purposes we use the following simplified form of the formula
$\sigma = \sqrt{\dfrac{\sum x^2}{n} - \left(\dfrac{\sum x}{n}\right)^2}$
In the case of a frequency distribution the formula is modified as
$\sigma = \sqrt{\dfrac{\sum f x^2}{N} - \left(\dfrac{\sum f x}{N}\right)^2}$
where $N = \sum f$.
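That the three forms agree can be checked with a short Python sketch. The data set and its frequency-table version are made up for illustration, and the function names are hypothetical.

```python
import math


def sd_definition(data):
    """Square root of the mean squared deviation from the arithmetic mean."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / n)


def sd_computational(data):
    """Square root of (sum of x^2 / n) minus (sum of x / n)^2."""
    n = len(data)
    return math.sqrt(sum(x * x for x in data) / n - (sum(data) / n) ** 2)


def sd_frequency(values, freqs):
    """Frequency-distribution form with N equal to the sum of f."""
    total = sum(freqs)
    mean_of_squares = sum(f * x * x for x, f in zip(values, freqs)) / total
    mean = sum(f * x for x, f in zip(values, freqs)) / total
    return math.sqrt(mean_of_squares - mean ** 2)


raw = [2, 4, 4, 4, 5, 5, 7, 9]     # illustrative observations
values = [2, 4, 5, 7, 9]           # the same observations as distinct values
freqs = [1, 3, 2, 1, 1]            # and how often each value occurs

print(sd_definition(raw))
print(sd_computational(raw))
print(sd_frequency(values, freqs))
```

The computational form avoids computing each deviation separately, which is convenient by hand, although with floating-point numbers of very different magnitudes the definitional form is usually the numerically safer choice.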
Standard deviation is an absolute measure of dispersion. Its relative measure is called the coefficient of standard deviation, defined as
$\text{Coefficient of S.D.} = \dfrac{\sigma}{\bar{x}}$
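Variety A
x - x̄       1   2   0   1   0   1   -5
(x - x̄)²    1   4   0   1   0   1   25    Total = 32

Variety B
y           10  10  10   9  10   7   7    Total = 63
y - ȳ        1   1   1   0   1  -2  -2
(y - ȳ)²     1   1   1   0   1   4   4    Total = 12

$\bar{X} = \dfrac{\sum X_i}{n} = \dfrac{63}{7} = 9$
$\bar{Y} = \dfrac{\sum Y_i}{n} = \dfrac{63}{7} = 9$
$\text{S.D. of A variety} = \sqrt{\dfrac{\sum (X_i - \bar{X})^2}{n - 1}} = \sqrt{\dfrac{32}{7 - 1}} = \sqrt{\dfrac{32}{6}} = \sqrt{5.33} \approx 2.31$
$\text{S.D. of B variety} = \sqrt{\dfrac{\sum (Y_i - \bar{Y})^2}{n - 1}} = \sqrt{\dfrac{12}{7 - 1}} = \sqrt{\dfrac{12}{6}} = \sqrt{2} \approx 1.41$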
Interpretation: The S.D. of variety B is less than the S.D. of variety A. Since their means are equal, variety B is more consistent (homogeneous) than variety A, so we can safely prefer variety B.
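The two sample standard deviations can be recomputed in Python as a check. The sketch works from what survives in the worked table: the deviations from the mean for variety A and the raw yields for variety B. It divides by n - 1 and takes the square root, so the results are the roots of 32/6 and 12/6.

```python
import math
import statistics

# Variety A: deviations from the mean as listed in the worked table.
deviations_a = [1, 2, 0, 1, 0, 1, -5]
# Variety B: raw yields as listed in the worked table.
yields_b = [10, 10, 10, 9, 10, 7, 7]

sum_sq_a = sum(d * d for d in deviations_a)
sd_a = math.sqrt(sum_sq_a / (len(deviations_a) - 1))

# statistics.stdev uses the n - 1 divisor (sample standard deviation).
sd_b = statistics.stdev(yields_b)

print("Sum of squared deviations, A:", sum_sq_a)
print("S.D. of A:", round(sd_a, 2))
print("S.D. of B:", round(sd_b, 2))
print("B more consistent than A:", sd_b < sd_a)
```

Because both varieties have the same mean, comparing the standard deviations directly is enough here; when the means differ, a relative measure is needed instead.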
1. The standard deviation is independent of change of origin but not of scale.
2. The sum of the squared deviations of the items in a distribution from their arithmetic mean is a minimum, i.e.
$\sum (x - \bar{x})^2 \le \sum (x - A)^2$ for any value A.
3. Standard deviation of n natural numbers: The standard deviation of the first n natural numbers can be obtained by the following formula
$\sigma = \sqrt{\dfrac{1}{12}\,(n^2 - 1)}$
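Thus the standard deviation of the natural numbers from 1 to 6 will be
$\sigma = \sqrt{\dfrac{6^2 - 1}{12}} = \sqrt{\dfrac{35}{12}} \approx 1.71$
4. Combined standard deviation: If two groups of $n_1$ and $n_2$ observations have means $\bar{X}_1$ and $\bar{X}_2$ and standard deviations $\sigma_1$ and $\sigma_2$, the combined standard deviation can be obtained by the following formula
$\sigma_{12} = \sqrt{\dfrac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}}$
where $d_1 = \bar{X}_1 - \bar{X}_{12}$, $d_2 = \bar{X}_2 - \bar{X}_{12}$ and $\bar{X}_{12}$ is the combined mean.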
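Properties 3 and 4 can be checked numerically. The sketch below assumes the usual two-group form of the combined standard deviation, in which each group contributes its own variance plus the squared distance of its mean from the combined mean; the two small groups are made-up data and the function names are illustrative.

```python
import math
import statistics


def sd_first_n_naturals(n):
    """Population S.D. of 1, 2, ..., n via sqrt((n^2 - 1) / 12)."""
    return math.sqrt((n * n - 1) / 12)


def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Population S.D. of two groups pooled together (assumed standard form)."""
    combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1 = mean1 - combined_mean
    d2 = mean2 - combined_mean
    pooled = n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)
    return math.sqrt(pooled / (n1 + n2))


# Property 3: compare the closed form with a direct calculation for n = 6.
direct = statistics.pstdev(list(range(1, 7)))
print(round(sd_first_n_naturals(6), 2), round(direct, 2))

# Property 4: pool two illustrative groups and compare with the direct S.D.
group_1 = [1, 2, 3]
group_2 = [4, 5, 6, 7]
pooled_sd = combined_sd(
    len(group_1), statistics.mean(group_1), statistics.pstdev(group_1),
    len(group_2), statistics.mean(group_2), statistics.pstdev(group_2),
)
print(round(pooled_sd, 4), round(statistics.pstdev(group_1 + group_2), 4))
```

Both comparisons use the population form (divisor n), which is the form the closed-form expression for natural numbers assumes.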
9. Coefficient of variation
All the measures of variation above have units. If series differ in their units of measurement, their variability cannot be compared by any measure of dispersion given so far. The size of a measure of dispersion also depends on the size of the values. Hence, in situations where either the two series have different units of measurement or their means differ sufficiently in size, the coefficient of variation should be used as the measure of dispersion. It is a relative measure of dispersion and therefore unitless, and it takes into account the size of the means of the two series. Using the coefficient of variation (C.V.), two or more sets of data can be compared more meaningfully for their variability. A series with a smaller coefficient of variation is considered more consistent or stable.
9.1. Definition
The coefficient of variation of a series of variate values is the ratio of the standard deviation to the arithmetic mean, multiplied by one hundred. If $\sigma$ is the standard deviation and $\bar{x}$ is the mean of the set of values, the coefficient of variation is
$C.V. = \dfrac{\sigma}{\bar{x}} \times 100$
This measure of dispersion has been given by Professor Karl Pearson.
9.2. Properties
1. It is one of the most widely used measures of dispersion because of its virtues.
2. In field experiments, a low C.V. indicates greater reliability of the experimental findings.
3. The C.V. lets us comment on the variability of a distribution: the smaller the C.V., the more uniform, consistent or stable the distribution; the larger the C.V., the more variable or scattered the distribution.
Q. Compare the yields of two varieties (variety A: mean 60 kg and standard deviation 10 kg; variety B: mean 50 kg and standard deviation 9 kg). Which one is subject to more variation?
Solution:
In case of A variety: Mean ($\bar{x}$) = 60.0 kg, S.D. (s) = 10.0 kg
$C.V. = \dfrac{S.D.}{\text{Mean}} \times 100 = \dfrac{10}{60} \times 100 = 16.67\%$
In case of B variety: Mean ($\bar{x}$) = 50.0 kg, S.D. (s) = 9.0 kg
$C.V. = \dfrac{S.D.}{\text{Mean}} \times 100 = \dfrac{9}{50} \times 100 = 18\%$
Interpretation: The C.V. of variety A is lower than that of variety B. So variety A is better on the basis of its yield distribution, because it is more consistent (homogeneous) than variety B. Variety B has more variation than variety A.
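The comparison can be reproduced with a minimal Python sketch using only the means and standard deviations given in the question; the function name `coefficient_of_variation` is an illustrative choice.

```python
import math


def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the arithmetic mean."""
    return sd / mean * 100


cv_a = coefficient_of_variation(sd=10.0, mean=60.0)   # variety A
cv_b = coefficient_of_variation(sd=9.0, mean=50.0)    # variety B

print("C.V. of A:", round(cv_a, 2), "%")
print("C.V. of B:", round(cv_b, 2), "%")
print("More variable variety:", "B" if cv_b > cv_a else "A")
```

Variety A has the larger standard deviation in absolute terms (10 kg against 9 kg) yet the smaller C.V., which is why a relative measure is preferred when the means differ.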
11. Conclusion
Range takes only the maximum and minimum values into account, not all the values. Hence it is a very unstable and unreliable indicator of the amount of deviation, and it is affected by extreme values. The quartile deviation is more stable than the range, as it depends on two intermediate values. It is not affected by extreme values, since these are already excluded. However, quartile deviation also fails to take all the values into account. The mean deviation is a measure of dispersion based on all items in a distribution: it is the arithmetic mean of the deviations of a series computed from any measure of central tendency (the mean, median or mode), with all deviations taken as positive, i.e. signs are ignored. S.D. is the best accepted and most widely used of all variability measures.