
ELEMENTS OF STATISTICS

Que. 5) Short notes:

a) Measures of dispersion:

The word dispersion is used to denote the degree of heterogeneity in the data.
It is an important characteristic indicating the extent to which observations
vary amongst themselves. The dispersion of a given set of observations is zero
when all of them are equal (for example, 5, 5, 5, 5). The wider the discrepancy
from one observation to another, the larger the dispersion. A measure of
dispersion is designed to state numerically the extent to which individual
observations vary on the average.

There are several measures of dispersion:

i. Range

Of all measures of dispersion, range is the simplest. It is defined as the
difference between the largest and the smallest observations. Because of
central tendency, the observations in a small sample are more likely to lie
near the mode than far from it; less likely, extreme values tend to be included
only when the sample is large. In other words, the range tends to increase as
the sample size increases.
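A minimal Python sketch of the range, assuming the observations are held in a plain list. The second part is an assumed illustration (standard normal draws, fixed seed, nested samples) of the claim that range tends to grow with sample size; the name `data_range` is hypothetical. Because each larger sample here contains the smaller one, its range can never shrink; for independent samples the growth holds only on average.

```python
import random


def data_range(observations):
    """Range: the largest observation minus the smallest observation."""
    if not observations:
        raise ValueError("range is undefined for an empty data set")
    return max(observations) - min(observations)


print(data_range([12, 7, 3, 25, 9]))  # 25 - 3 = 22

# Assumed illustration: successively larger samples taken from one stream of
# standard normal draws. Each larger sample contains the smaller one, so its
# range can only stay the same or grow.
rng = random.Random(0)
draws = [rng.gauss(0, 1) for _ in range(5000)]
for size in (5, 50, 500, 5000):
    print(size, round(data_range(draws[:size]), 2))
```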

ii. Inter-quartile Range

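Range as a measure of dispersion does not reflect a frequency distribution well,
as it depends on the two extreme values. Even one very large or very small
observation, away from the general pattern of the other observations in the
data set, makes the range very large. The inter-quartile range avoids this by
using the first and third quartiles, $Q_1$ and $Q_3$ (the 25th and 75th
percentiles, $P_{25}$ and $P_{75}$), instead of the extremes:

$$\text{Inter-quartile Range} = Q_3 - Q_1 = P_{75} - P_{25}$$

The inter-quartile range is thus the range of the middle 50% of the
observations.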
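A short Python sketch contrasting range and inter-quartile range when one observation is replaced by an extreme value. It assumes quartiles computed with the standard library's `statistics.quantiles(..., n=4, method="inclusive")`; other quartile conventions can give slightly different values for small data sets. The function name `interquartile_range` and the data are hypothetical.

```python
import statistics


def interquartile_range(observations):
    """IQR = Q3 - Q1, the spread of the middle 50% of the observations.

    Quartiles use statistics.quantiles with method="inclusive" (linear
    interpolation between order statistics); other conventions may differ.
    """
    q1, _median, q3 = statistics.quantiles(observations, n=4, method="inclusive")
    return q3 - q1


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 100]  # last value replaced by an extreme one

for label, xs in (("original", data), ("with outlier", with_outlier)):
    print(label, "range =", max(xs) - min(xs), "IQR =", interquartile_range(xs))
# original range = 8 IQR = 4.0
# with outlier range = 99 IQR = 4.0
```

The single extreme value inflates the range from 8 to 99, while the inter-quartile range is unchanged.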

iii. Mean Deviation

While range depends on the two extreme observations, the inter-quartile range
depends on the two extreme observations among the middle 50 percent of the
observations. Thus one can only state what percentage of observations lies
between the minimum and $P_{25}$, and between $P_{75}$ and the maximum (25% in
each case). Neither range nor inter-quartile range depends upon all the
observations in the sample, so computing them says nothing about how the
observations are distributed within the group. Mean deviation overcomes this
by using every observation: it is the average of the absolute deviations of
the observations from a central value, usually the mean (sometimes the median).
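A minimal Python sketch of mean deviation, taken here as the average absolute deviation from the arithmetic mean (using the median instead is an equally common convention). The two hypothetical data sets share the same range, 7, yet differ in how the interior observations are spread, which range cannot detect but mean deviation can. The function name `mean_deviation` is an assumption for illustration.

```python
def mean_deviation(observations):
    """Average absolute deviation of the observations from their arithmetic mean."""
    n = len(observations)
    if n == 0:
        raise ValueError("mean deviation is undefined for an empty data set")
    mean = sum(observations) / n
    return sum(abs(x - mean) for x in observations) / n


a = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5; absolute deviations sum to 12
b = [2, 4, 5, 5, 5, 5, 5, 9]  # mean 5; absolute deviations sum to 8

for label, xs in (("a", a), ("b", b)):
    print(label, "range =", max(xs) - min(xs), "mean deviation =", mean_deviation(xs))
# a range = 7 mean deviation = 1.5
# b range = 7 mean deviation = 1.0
```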

iv. Variance and Standard Deviation:

The most frequently used measures of dispersion are variance and standard
deviation. Variance is so commonly used that it is sometimes simply called
dispersion. Variance is a measure which suitably combines individual deviations
from the mean, treating each observation with equal weight as in mean
deviation. For variance, however, the measure of individual deviation is taken
as the squared difference from the mean. Based on variance, an equally or more
popular measure of dispersion, expressed in the same unit as the observations,
is the standard deviation, abbreviated as s.d. Standard deviation is defined as
the positive square root of variance.
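A minimal Python sketch of variance as the mean of squared deviations from the mean, and standard deviation as its positive square root. It assumes the population form (divisor n); some texts divide by n - 1 for a sample, which is the convention of the standard library's `statistics.variance`. The function names are hypothetical, and the last line cross-checks against `statistics.pvariance` and `statistics.pstdev`.

```python
import math
import statistics


def variance(observations):
    """Population variance: mean of squared deviations from the mean (divisor n)."""
    n = len(observations)
    if n == 0:
        raise ValueError("variance is undefined for an empty data set")
    mean = sum(observations) / n
    return sum((x - mean) ** 2 for x in observations) / n


def standard_deviation(observations):
    """Positive square root of the variance, in the same unit as the observations."""
    return math.sqrt(variance(observations))


xs = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5; squared deviations sum to 32
print(variance(xs), standard_deviation(xs))  # 4.0 2.0

# Cross-check against the standard library's population versions.
print(statistics.pvariance(xs), statistics.pstdev(xs))
```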

b) Skewness:

In probability theory and statistics, skewness is a measure of the asymmetry
of the probability distribution of a real-valued random variable. The skewness
value can be positive or negative, or even undefined. Qualitatively, a
negative skew indicates that the tail on the left side of the probability density
function is longer than the right side and the bulk of the values (including the
median) lie to the right of the mean. A positive skew indicates that the tail on
the right side is longer than the left side and the bulk of the values lie to the
left of the mean. A zero value indicates that the values are relatively evenly
distributed on both sides of the mean, typically but not necessarily implying a
symmetric distribution.

Consider the distribution on the figure. The bars on the right side of the
distribution taper differently than the bars on the left side. These tapering
sides are called tails, and they provide a visual means for determining which
of the two kinds of skewness a distribution has:

1. Negative skew: The left tail is longer; the mass of the distribution is
concentrated on the right of the figure. It has relatively few low values. The
distribution is said to be left-skewed or "skewed to the left"[1]. Example
(observations): 1, 1000, 1001, 1002, 1003.

2. Positive skew: The right tail is longer; the mass of the distribution is
concentrated on the left of the figure. It has relatively few high values. The
distribution is said to be right-skewed or "skewed to the right"[1]. Example
(observations): 1, 2, 3, 4, 100.

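The skewness of a random variable X is the third standardized moment,
denoted γ1 and defined as:

$$\gamma_1 = \operatorname{E}\left[\left(\frac{X-\mu}{\sigma}\right)^{3}\right] = \frac{\mu_3}{\sigma^{3}} = \frac{\operatorname{E}\left[(X-\mu)^{3}\right]}{\left(\operatorname{E}\left[(X-\mu)^{2}\right]\right)^{3/2}} = \frac{\kappa_3}{\kappa_2^{3/2}}$$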

where μ3 is the third moment about the mean μ, σ is the standard deviation,
and E is the expectation operator. The last equality expresses skewness in
terms of the ratio of the third cumulant κ3 and the 1.5th power of the
second cumulant κ2. This is analogous to the definition of kurtosis as the
fourth cumulant normalized by the square of the second cumulant.
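A minimal Python sketch of the moment coefficient of skewness for a finite set of observations, assuming the plug-in estimate m3 / m2^(3/2) with population (divisor n) central moments in place of μ3 and σ; no small-sample bias correction is applied. The function name `skewness` is hypothetical. Applied to the two example data sets above, it reproduces the signs described in the text.

```python
def skewness(observations):
    """Moment coefficient of skewness, g1 = m3 / m2 ** 1.5.

    m2 and m3 are the second and third central moments with divisor n,
    i.e. the sample analogue of mu_3 / sigma ** 3 (no bias correction).
    """
    n = len(observations)
    if n == 0:
        raise ValueError("skewness is undefined for an empty data set")
    mean = sum(observations) / n
    m2 = sum((x - mean) ** 2 for x in observations) / n
    m3 = sum((x - mean) ** 3 for x in observations) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all observations are equal")
    return m3 / m2 ** 1.5


for label, xs in (
    ("negative skew example", [1, 1000, 1001, 1002, 1003]),
    ("positive skew example", [1, 2, 3, 4, 100]),
    ("symmetric", [1, 2, 3, 4, 5]),
):
    print(label, round(skewness(xs), 3))
```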

Que. 7) Definition:

Classification is a process of arranging data into different classes according
to their resemblances and affinities. The arrangement of a huge mass of
heterogeneous data into homogeneous groups facilitates comparison and
analysis of the data. Classification prepares the ground for the proper
presentation of statistical facts.
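A minimal Python sketch of classification, assuming a handful of hypothetical census-style records (the data are invented for illustration). It shows both qualitative classification (grouping by an attribute such as sex or marital status) and quantitative classification (placing ages into class intervals of an assumed width of 10 years), producing counts that are ready for tabulation. The helper `age_class` is hypothetical.

```python
from collections import Counter


def age_class(age, width=10):
    """Label of the class interval containing an integer age, e.g. 34 -> '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


# Hypothetical records for illustration only: (sex, marital status, age in years).
records = [
    ("F", "married", 34),
    ("M", "single", 22),
    ("F", "single", 19),
    ("M", "married", 47),
    ("F", "married", 41),
    ("M", "single", 28),
]

# Qualitative classification: group records by a single attribute.
by_sex = Counter(sex for sex, _status, _age in records)
by_status = Counter(status for _sex, status, _age in records)

# Quantitative classification: count records falling in each age class.
by_age_class = Counter(age_class(age) for _sex, _status, age in records)

print(dict(by_sex))                    # {'F': 3, 'M': 3}
print(dict(by_status))                 # {'married': 3, 'single': 3}
print(sorted(by_age_class.items()))    # [('10-19', 1), ('20-29', 2), ('30-39', 1), ('40-49', 2)]
```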

Objectives:
1. To condense the mass of data: Statistical data collected during the
course of an investigation are so varied that it is not possible to
appreciate, even after a careful study, the real significance of the figures,
unless they are properly classified into small groups or classes. For example,
the huge and fragmented data collected during a population census have
to be classified according to sex, marital status, education, occupation,
etc., to ascertain the structure and nature of the population.

2. To enable grasping of data: An unorganized mass of data cannot be grasped
properly. In the definition of statistics (as data), it was indicated that
data have to be an organized mass, arranged and classified according to a
predetermined mode of classification.

The figures are easily arranged in a few classes or categories so that like
items go together. The data become comprehensible when sorted into
homogeneous groups according to their respective affinities and cognate
characteristics.

3. To prepare the data for tabulation: Only classified data can be presented
in tabular form. Classification thus provides a basis for tabulation and
further statistical processing.

4. To study relationships: Relationships between variables can be established
only after the various characteristics of the data are known, which is
possible only through classification and tabulation.

5. To facilitate comparison: Classification enables comparison between
variables.
