
MEASURES OF CENTRAL TENDENCY

It is very easy to describe qualitative variables.

For example, we can say that there are 23 men and 60 women regularly employed in the operation theatre. Or, we can say that 90% of the hospital staff are Hindu. Such a description of gender and religion is complete in itself.

It is not as easy to describe quantitative variables. For example, we cannot simply say that the ages of the patients in a ward are 32, 39, 19, 63, 28, 44... years. Listing every individual value may be possible when the sample size is small, but the list becomes inconvenient to manage and impossible to understand when the sample size is large. To describe a quantitative variable conveniently, therefore, we require a single value that is representative of the entire set of values of that variable in the sample. This representative value is known as a measure of central tendency. A measure of central tendency thus summarizes the values of the variable in the sample in a form that is both easy to describe and easy to understand. There are three commonly used measures of central tendency: the mean, the median, and the mode.

MEAN

In mathematics, there are different kinds of means: the arithmetic mean, the geometric mean, the harmonic mean, etc. In statistics, however, it is almost always the arithmetic mean that is meant. The arithmetic mean is calculated as the sum of all the values divided by the total number of values; it is what is commonly called the "average". Mathematically, the mean is expressed as Σx/n, where the Greek letter Σ (capital sigma) stands for "the sum of all values". Thus, if x1 refers to the first value, x2 to the second value, and so on up to xn, where n is the total number of values (the sample size),

$$\text{Mean} = \bar{x} = \frac{\sum x}{n} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$
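The definition of the mean (the sum of all values divided by the number of values) translates directly into code. Below is a minimal Python sketch, assuming the data are held in a plain list; the function name `arithmetic_mean` is hypothetical, and the six ages are only the first values listed in the ward example above, used purely for illustration.

```python
def arithmetic_mean(values):
    """Return the sum of all values divided by the number of values."""
    if not values:
        # The mean of an empty sample is undefined (n = 0).
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


# Illustrative data: the six ward ages quoted in the text.
ages = [32, 39, 19, 63, 28, 44]
print(arithmetic_mean(ages))  # 37.5  (225 / 6)
```

Python's standard library also provides `statistics.mean`, which gives the same result for this list.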

MEDIAN

Consider a group of 5 persons whose ages are 10, 12, 10, 11, and 73 years. The mean age of this sample is 116/5; that is, 23.2 years. This value fails as a measure of central tendency because it is not close to any of the 5 ages! The median is a more useful value in such situations. The median is the middle term when the terms are arranged in ascending (or descending) order of magnitude. In this example, arranging the terms in ascending order gives us: 10, 10, 11, 12, 73.
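To see the contrast numerically, the short Python sketch below (built-ins only; the variable names are illustrative) computes the mean of the five ages and then picks out the middle term of the sorted list.

```python
ages = [10, 12, 10, 11, 73]

# Mean: sum of values divided by number of values.
mean_age = sum(ages) / len(ages)

# Median for an odd-sized sample: the middle term of the ordered list.
ordered = sorted(ages)
middle = ordered[len(ordered) // 2]  # 0-based index 2, i.e. the third term

print(mean_age)  # 23.2
print(ordered)   # [10, 10, 11, 12, 73]
print(middle)    # 11
```

The single outlier (73) drags the mean far from the other four values, while the median stays among them.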

The median is 11 because two terms (10 and 10) are lower and two terms (12 and 73) are higher in value. The median is clearly more representative than the mean for this sample because it is close to 4 of the 5 values in the data set.

When the sample size (n) is an odd number, the median is the ((n+1)/2)th term. In our example, the sample size was 5, so the median is the ((5+1)/2)th term; that is, the third term when the terms are arranged in order of magnitude. When the sample size is an even number, the median is the average of the two middle terms: the (n/2)th term and the term immediately after it. Thus, if the sample size is 10, the median is the average of the (10/2)th term and the next term; that is, the average of the fifth and sixth terms.

MODE

The mode is the value that occurs with the greatest frequency. In the data set 14, 15, 18, 12, 14, 12, 14, 14, 14, 15, 14, 13, 14, 14, 16, 15, 16, 14, the mode is 14 because it occurs more often (9 times out of 18) than any of the other values. A data set may have more than one mode if two or more values share the same highest frequency.

When to use mean, median, and mode

The mean is the most commonly used measure of central tendency. It is easy to calculate and understand. It is also used in a large number of statistical procedures, such as the comparison of groups. The median is preferred when there are several very large (or very small) values in the data set that "pull" the mean away from the centre. These extreme values are known as outliers (outlying values); a distribution in which such outliers lie predominantly on one side is said to be skewed. [Note: Can we exclude outliers and then calculate the mean? Not if the outliers are valid values. Yes, if the outliers are invalid values.
E.g., if a student obtains outlying marks because he was ill during the exam, or because he cheated during the exam, his marks can validly be disregarded when computing the mean; but if he does well or poorly because he is naturally bright or dull, his marks should not be omitted.] [Sometimes, if the distribution is bimodal because there are valid subsamples, the mean can be calculated separately for each subsample. For example, in a mixed sample, body weight may need to be averaged separately for Europeans and Asians, or for men and women.]

The mode is used when the commonest or most typical value is desired. For example, if you are buying operation theatre gowns, you would prefer to buy the commonest size. As an extreme example of a preference for the mode over the mean, consider the amputees' story at the end of this article.
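The median rules for odd and even sample sizes, and the possibility of more than one mode, can be sketched in Python as follows. This is an illustrative sketch: the function names `median` and `modes` are hypothetical, term positions in the comments are 1-based as in the text, and the example data are the ones used above (plus a small made-up list to show two modes).

```python
from collections import Counter


def median(values):
    """Median: the ((n+1)/2)th term for odd n; for even n, the average of
    the (n/2)th term and the term immediately after it (1-based terms)."""
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]  # convert 1-based term to 0-based index
    lower = ordered[n // 2 - 1]           # the (n/2)th term
    upper = ordered[n // 2]               # the term immediately after it
    return (lower + upper) / 2


def modes(values):
    """Return every value that occurs with the highest frequency, sorted."""
    if not values:
        raise ValueError("mode requires at least one value")
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


print(median([10, 12, 10, 11, 73]))  # 11
print(median(list(range(1, 11))))    # 5.5 (average of the 5th and 6th terms)

data = [14, 15, 18, 12, 14, 12, 14, 14, 14, 15, 14, 13, 14, 14, 16, 15, 16, 14]
print(modes(data))                   # [14]
print(modes([1, 1, 2, 2, 3]))        # [1, 2] (two modes)
```

Returning a list from `modes` is a deliberate choice so that multimodal data are reported rather than silently reduced to one value. Python's standard library offers comparable functions, `statistics.median` and `statistics.multimode`.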
