
BASIC STATISTICS 1:

DESCRIPTIVE
STATISTICS

SUMMARY MEASURES
2

Course Outline
3

Course Outline

• Summary Measures
o Measures of Central Tendency
o Measures of Dispersion
o Measures of Location
o Measures of Skewness
o Measures of Kurtosis
o Rates, Ratios, Proportions,
Percentage, Percent Change
4

Learning
Objectives
5

Learning Objectives

• To describe data using summary measures;
• To use MS Excel in computing for the
different summary measures; and,
• To interpret correctly MS Excel
summary measures output.
6

Summary
Measures
7

Summary Measures
A summary measure is a single numeric figure that
describes a particular feature of the collection of
observations.

If the summary measure is computed using population data, then it is referred to as a parameter.

If the summary measure is computed using sample data, then it is referred to as a statistic.
8

Summary Measures

• Measures of Central Tendency
• Measures of Location
• Measures of Variability
• Measures of Skewness
• Measures of Kurtosis
• Proportions, Rates, Ratios, Percent Change
9

Measures of
Central Tendency
10

Measures of Central Tendency


A measure of central tendency is a summary
measure that can be used to represent all the other
values in the collection.
Notes:
• Some people refer to this measure as the
“average”.
• This measure tells us where the “center” of the
distribution lies.
11

Measures of Central Tendency


• Arithmetic Mean
• Median
• Mode

Using a measure of central tendency facilitates the comparison of two or more data sets.
12

Measures of Central Tendency

The arithmetic mean is the sum of all observed values divided by the total number of observations.
13

Measures of Central Tendency


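Let the population data be $\{x_1, x_2, \ldots, x_N\}$.
The population mean for a population with N elements is:

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$

Let the sample data be $\{x_1, x_2, \ldots, x_n\}$.
The sample mean for a sample with n elements is:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$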
14

Measures of Central Tendency


15

Measures of Central Tendency

Examples:

1. Five foresters reported the number of illegal loggers they have apprehended as follows:
1, 2, 5, 5, 7
The mean number of illegal loggers
apprehended is 4.
16

Measures of Central Tendency

Examples:

2. The changes in fixed assets of 10 companies (in percent) are as follows:
+1.2, -1.5, +3.4, +2.1, -2.7, +4.1, -3.3, +3.8, +1.9, -3.6
The mean percent change in fixed assets of the companies is

$$\bar{x} = \frac{1.2 - 1.5 + 3.4 + 2.1 - 2.7 + 4.1 - 3.3 + 3.8 + 1.9 - 3.6}{10} = \frac{5.4}{10} = 0.54$$
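The two examples above can be checked with a short Python sketch. It assumes Python's standard `statistics` module rather than MS Excel, and the variable names are illustrative only.

```python
from statistics import mean

# Example 1: illegal loggers apprehended by five foresters
loggers = [1, 2, 5, 5, 7]

# Example 2: percent change in fixed assets of 10 companies
changes = [1.2, -1.5, 3.4, 2.1, -2.7, 4.1, -3.3, 3.8, 1.9, -3.6]

# The mean is the sum of the values divided by the number of values
mean_loggers = mean(loggers)
mean_change = round(mean(changes), 2)  # rounded to two decimal places

print(mean_loggers, mean_change)
```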
17

Measures of Central Tendency


The Mean as a “Center” of Mass

• What would happen if the first measurement had been 7 instead of 8?
• What would happen this time if the last measurement
had been 1000 instead of 10?
18

Measures of Central Tendency


Effect of an Outlier on the Mean
Definition: Outliers are observations that are
markedly different from the rest of the data
items.
• Since the mean is the “center of mass” then
its value is gravely affected by outliers.
• An outlier will pull the value of the arithmetic
mean in its direction and away from the
location of majority of the observations.
19

Measures of Central Tendency

The mean is affected by extremes.
20

Measures of Central Tendency


Effect of an Outlier on the Mean
• With the presence of outliers, the mean
might not be a suitable measure of central
tendency because it may not be a good
representative of the observations in the
collection.
21

Measures of Central Tendency


Effect of an Outlier on the Mean
Example: Monthly salary of 5 employees in a
certain division

P15,000 P16,000 P17,000 P15,000 P50,000

Find the mean.
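A minimal Python sketch of this exercise, assuming the standard `statistics` module. The second computation, which drops the P50,000 salary, is a hypothetical comparison added only to show how far one outlier pulls the mean.

```python
from statistics import mean

# Monthly salaries (in pesos) of the 5 employees
salaries = [15_000, 16_000, 17_000, 15_000, 50_000]

# Mean of all five salaries
mean_all = mean(salaries)

# Hypothetical comparison: the mean of the four salaries other than P50,000
mean_without_outlier = mean([s for s in salaries if s != 50_000])

print(mean_all, mean_without_outlier)
```

The mean of all five salaries is higher than four of the five actual salaries, which is why it is a poor representative here.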


22

Measures of Central Tendency


Characteristics of the Mean
• The mean is the “center of mass”.
• It uses all the observed values in the
calculation.
• It may or may not be an actual observed
value in the data set.
• Its value is gravely affected by outliers.
23

Measures of Central Tendency


Characteristics of the Mean
• The mean of a finite collection always exists
and is unique.
• Data values should be measured using at
least an interval scale for it to be
interpretable.
24

Measures of Central Tendency


Characteristics of the Mean
Weighted mean assigns weights (or measures
of relative importance) to the observations to
be averaged.
25

Measures of Central Tendency


Example:
Congress is giving scholarship grants to employees
taking graduate studies. Courses in graduate studies
earn credits of 1, 2, 3, 4, or 5 units. You can get a partial
scholarship for the next semester if you get a weighted
average of 1.5 to 1.75, and a full scholarship if your
average is better than 1.5, which means an average of
1.0 to 1.49. What kind of scholarship will the 2
employees get, given their grades for the previous
semester?
26

Measures of Central Tendency


Weighted Mean
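In the usual notation, and assuming observations $x_1, \ldots, x_n$ with weights $w_1, \ldots, w_n$ (symbols chosen here for illustration), the weighted mean can be written as:

$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

In the scholarship example, the grades would play the role of the $x_i$ and the course units the role of the $w_i$.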
27

Measures of Central Tendency


Solution
28

Measures of Central Tendency


Solution

Employee A will get a partial scholarship while employee B will get a full scholarship.
29

Measures of Central Tendency


Median

• divides the sorted observations into two equal parts
30

Measures of Central Tendency


How to Determine the Median:
31

Measures of Central Tendency


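• Use the following formula to get the median, where $x_{(i)}$ denotes the i-th observation in the sorted data:

Case 1: n is odd (the median is the value in the middle of the sorted observations)

$$Md = x_{\left(\frac{n+1}{2}\right)}$$

Case 2: n is even (the median is the average of the 2 middle observations)

$$Md = \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2}$$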
32

Measures of Central Tendency


Examples:

1. The following are the total receipts of 7 mining companies (in million pesos)
1.2, 4.5, 6.5, 7.2, 10.4, 12.5, 50.6
The median is 7.2.
At least fifty percent of the seven mining
companies have total receipts less than or
equal to 7.2 million pesos.
33

Measures of Central Tendency


Examples:

2. The following are the number of years of operation of 8 mining companies:
8, 10, 10, 11, 16, 17, 17, 18
The median is (11+16) / 2 = 13.5
At least half of the eight mining companies
have number of years of operation less than or
equal to 13.5 years.
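The odd and even cases can be sketched in Python as follows. The function name `median` is illustrative (the standard library offers `statistics.median` as well); it is written out here only to make the two cases visible.

```python
def median(data):
    """Return the middle value (odd n) or the average of the two middle values (even n)."""
    xs = sorted(data)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the single middle observation
        return xs[mid]
    # Even n: average of the two middle observations
    return (xs[mid - 1] + xs[mid]) / 2

receipts = [1.2, 4.5, 6.5, 7.2, 10.4, 12.5, 50.6]   # Example 1, n = 7
years = [8, 10, 10, 11, 16, 17, 17, 18]             # Example 2, n = 8

median_receipts = median(receipts)
median_years = median(years)
print(median_receipts, median_years)
```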
34

Measures of Central Tendency


Characteristics of the Median

• The median is also a measure of location. It indicates the relative position of an observation in the distribution.
• If the observation is smaller than the median
then it belongs in the lower half of the
distribution; while if the observation is larger
than the median then it belongs in the upper
half of the distribution.
35

Measures of Central Tendency


Characteristics of the Median

• The median is affected by the position of each observation in the sorted data but not by the value of the observation. Consequently, outliers do not affect the median.
• It is interpretable even if the level of
measurement is as low as ordinal.
36

Measures of Central Tendency

The median is not affected by extremes.


37

Measures of Central Tendency


Mode

• The mode is the most common observation in a data set.
• It is determined by counting the frequency of each value and finding the value with the highest frequency of occurrence.
38

Measures of Central Tendency


Examples: Find the mode

1. The following are waistlines (in inches) of 12 males:
25, 26, 29, 30, 30, 29, 30, 30, 30, 31, 34, 36
2. Given the number of children of 20 male respondents:
2,5,5,2,2,5,1,3,5,4,2,5,5,2,2,5,5,2,2,1
3. Given the following data:
1,2,3,3,2,1,2,3,1,4,4,5,5,1,2,3,4,5,4,5
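A minimal Python sketch for the three exercises above. The helper `modes` is hypothetical, and it assumes one common convention: when every distinct value occurs equally often, the data set is treated as having no mode.

```python
from collections import Counter

def modes(data):
    """Return the sorted list of modes, or an empty list if no mode exists."""
    counts = Counter(data)
    highest = max(counts.values())
    # Assumed convention: all values equally frequent means there is no mode
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == highest)

waistlines = [25, 26, 29, 30, 30, 29, 30, 30, 30, 31, 34, 36]
children = [2, 5, 5, 2, 2, 5, 1, 3, 5, 4, 2, 5, 5, 2, 2, 5, 5, 2, 2, 1]
values = [1, 2, 3, 3, 2, 1, 2, 3, 1, 4, 4, 5, 5, 1, 2, 3, 4, 5, 4, 5]

modes_1 = modes(waistlines)  # one mode
modes_2 = modes(children)    # two modes (bimodal)
modes_3 = modes(values)      # every value occurs four times: no mode
print(modes_1, modes_2, modes_3)
```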
39

Measures of Central Tendency


Characteristics of the mode

• It does not always exist; and if it does, it may not be unique.
• It is not recommended if there are only a
few observations.
• It is not affected by outliers.
• The mode can be used even if the level of
measurement is as low as nominal.
40

Measures of Central Tendency


41

Measures
of Location
42

Measures of Location

A measure of location indicates the relative position of an observation in the distribution.
• Percentiles
• Quartiles
• Deciles
43

Measures of Location
Percentiles
• Percentiles divide the sorted observations into
100 equal parts.
• There are 99 percentiles. We denote and read
the individual percentiles as follows:
P1 is read as the first percentile.
P2 is read as the second percentile.
:
P99 is read as the ninety-ninth percentile
44

Measures of Location
Percentiles
• The kth percentile, Pk is a value such that at
least k% of the ordered data are less than or
equal to it and at least (100-k)% are greater
than or equal to it, where k = 1, 2, 3, …, 99.
45

Measures of Location
Example of Percentiles:
• The 80th percentile of a distribution is a value such that at least 80 percent of the ordered observations are less than or equal to its value and at least 20 percent of the ordered observations are greater than or equal to its value.
• If P80 = 75: At least 80% of the ordered observations are less than or equal to 75, and at least 20% of the ordered observations are greater than or equal to 75.
46

Measures of Location
Example of Percentiles:
• So any observation that is smaller than P80
value belongs in the lower 80% of the
distribution while any observation greater than
P80 value belongs in the upper 20% of the
distribution.
47

Measures of Location: Examples of Percentile


1. The annual per capita poverty threshold in pesos
of the different regions of the Philippines are as
follows: 15,693, 13,066, 12,685, 11,128, 13,760,
13,657, 11,995, 11,372, 11,313, 9,656, 9,518,
9,116, 10,503, 10,264, 10,466, 10,896, 12,192.
Find the 75th percentile.
The 75th percentile is 12,685. This implies that any
region with annual per capita poverty threshold that
is lower than PhP12,685 belongs in the lower 75% of
the distribution.
48

Measures of Location: Examples of Percentile


2. The following are the number of telephone lines
of 16 regions for the year 2004: 2,799,079,
94,079, 190,335, 42,860, 410,841, 1,049,413,
125,157, 427,497, 470,299, 151,652, 35,945,
147,513, 295,334, 82,616, 117,116, 33,315.
Find the 50th percentile.

The 50th percentile is 149,582.5, the average of the 8th and 9th sorted values (147,513 and 151,652). Thus, any region with number of telephone lines lower than 149,582.5 belongs in the lower 50% of the distribution.
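Both percentile examples can be reproduced with the sketch below. The positioning rule is an assumption, chosen because it satisfies the "at least k%" definition and matches the two worked answers: if nk/100 is not a whole number, round it up and take that sorted observation; if it is a whole number, average that observation and the next one. Spreadsheet and library functions may interpolate differently. The function name `percentile` is illustrative.

```python
def percentile(data, k):
    """k-th percentile (k = 1..99) under the assumed 'at least k%' positioning rule."""
    xs = sorted(data)
    n = len(xs)
    q, r = divmod(n * k, 100)   # n*k/100 = q + r/100
    if r == 0:
        # Whole-number position: average the q-th and (q+1)-th sorted values
        return (xs[q - 1] + xs[q]) / 2
    # Otherwise round the position up: the (q+1)-th sorted value
    return xs[q]

poverty = [15693, 13066, 12685, 11128, 13760, 13657, 11995, 11372, 11313,
           9656, 9518, 9116, 10503, 10264, 10466, 10896, 12192]
phones = [2799079, 94079, 190335, 42860, 410841, 1049413, 125157, 427497,
          470299, 151652, 35945, 147513, 295334, 82616, 117116, 33315]

p75_poverty = percentile(poverty, 75)   # 17 * 0.75 = 12.75, so the 13th value
p50_phones = percentile(phones, 50)     # 16 * 0.50 = 8, so average of 8th and 9th
print(p75_poverty, p50_phones)
```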
49

Measures of Location
Quartiles
• Quartiles divide the ordered observations
into 4 equal parts.
• 1st Quartile = 25th Percentile
• 2nd Quartile= 50th Percentile
• 3rd Quartile = 75th Percentile
50

Measures of Location
Quartiles
• The third quartile, denoted by Q3, divides the bottom
75% of the ordered observations from the top 25%.
Thus it is equal to P75.
• The second quartile, denoted by Q2, divides the bottom
50% of the ordered observations from the top 50%.
Thus it is equal to P50 and the median.
• The first quartile, denoted by Q1, divides the bottom
25% of the ordered observations from the top 75%.
Thus it is equal to P25.
51

Measures of Location
Deciles
• Divide the ordered observations into 10
equal parts.
• Each part contains 10 percent of the
observations.
• There are nine deciles and these are D1,
D2, D3, . . . , D9.
52

Measures of Location: Example of Deciles


53

Measures
of Dispersion
54

Summary Measures
Measures of Dispersion
• indicate the extent to which observations
in the data differ from the average value.
55

Why Measures of Dispersion Are Important

The mean is not enough to describe the data.


56

Two Measures of Dispersion


Measures of Dispersion
1. Measures of Absolute Dispersion
• carries the unit of measure of the observations
• can be used to compare data sets with the same
means and the same units of measurement
2. Measures of Relative Dispersion
• unitless so it can be used to compare the
dispersion of two or more data sets with
different means or different units of
measurement.
57

Measures of Dispersion

Measures of Absolute Dispersion
• Range
• Standard deviation

Measures of Relative Dispersion
• Coefficient of Variation
• Standard Score
58

Measures of Absolute Dispersion


Measures of Dispersion

Range = maximum – minimum

It is the difference between the maximum and minimum values of a data set.
59

Measures of Absolute Dispersion


Properties of the Range

• It does not take into account middle observations.
• It is affected by outliers
• It tends to be smaller for smaller samples
than for larger samples.
60

Measures of Absolute Dispersion


Variance

• Describes how far the observations are from the mean.
• Its unit is the square of the unit of measure
of the observations
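In the usual textbook notation (the symbols are assumed here: $\mu$ and $\bar{x}$ for the population and sample means, $N$ and $n$ for the population and sample sizes), the population and sample variances can be written as:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2 \qquad\qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

The sample version divides by $n-1$ rather than $n$, which is also what common spreadsheet sample-variance functions do.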
61

Measures of Absolute Dispersion


62

Measures of Absolute Dispersion

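• The standard deviation is the positive square root of the variance. Its unit is the same as the unit of measurement of the observations.
• Population standard deviation, where $\mu$ is the population mean:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$

• Sample standard deviation, where $\bar{x}$ is the sample mean:

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$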


63

Measures of Absolute Dispersion


64

Measures of Absolute Dispersion


65

Measures of Absolute Dispersion


Team A: Heights in Inches of 5 Marathon Players
66

Standard Deviation Example


Team B: Heights in Inches of 5 Marathon Players
67

Comparing Standard Deviations


68

Standard Deviation

Remarks:
1. If there is a large amount of variation in the data
set, then on the average, the data values will be far
from the mean. Hence, the standard deviation will
be large.
2. If there is only a small amount of variation in the
data set, then on the average, the data values will
be close to the mean. Hence, the standard
deviation will be small.
69

Measures of Absolute Dispersion

Characteristics of the Standard Deviation:


1. It is affected by the value of every
observation. It may be distorted by outliers.
2. It cannot be negative
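As a small illustration of the absolute measures, the sketch below computes the range, variance, and standard deviation with Python's standard `statistics` module. It reuses the foresters' data (1, 2, 5, 5, 7) from the mean example, which is an illustrative choice, and it assumes the sample versions divide by n - 1.

```python
from statistics import pstdev, pvariance, stdev, variance

loggers = [1, 2, 5, 5, 7]   # mean is 4; squared deviations sum to 24

data_range = max(loggers) - min(loggers)   # maximum - minimum

pop_variance = pvariance(loggers)      # 24 / 5, treating the data as a population
sample_variance = variance(loggers)    # 24 / 4, treating the data as a sample

pop_sd = pstdev(loggers)       # square root of the population variance
sample_sd = stdev(loggers)     # square root of the sample variance

print(data_range, pop_variance, sample_variance, pop_sd, sample_sd)
```

Both standard deviations are in the same unit as the data (number of loggers), while the variances are in squared units.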
70

Measures of Relative Dispersion

Coefficient of Variation
• is unitless and is used to compare the
scatter of one distribution with the scatter of
another distribution
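In its usual form, the coefficient of variation is the standard deviation expressed as a percentage of the mean. Assuming $\sigma$, $\mu$ for population values and $s$, $\bar{x}$ for sample values:

$$CV = \frac{\sigma}{\mu}\times 100\% \qquad\qquad CV = \frac{s}{\bar{x}}\times 100\%$$

Because the standard deviation and the mean carry the same unit, the unit cancels in the ratio.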
71

Measures of Relative Dispersion


72

Measures of Relative Dispersion

Coefficient of Variation (CV)


Example: Suppose you want to buy a stock and you have the option to select one
out of two stocks. Stock 1 is currently priced at P2000 per share and Stock 2 is
priced at P550 per share. In buying stocks, we reduce the risk by selecting a
stock whose price is stable. On the other hand, we could take a chance on a stock
that shows greater variation in price, hoping the prices go up rather than down.
Let’s say a sample of prices of Stock 1 and Stock 2 was collected at the close of
trading for the past months and the following statistics were obtained:
73

Measures of Relative Dispersion

Coefficient of Variation (CV)


To determine which of the two stocks has a more variable
price, we compute for the coefficient of variation.

Stock 1 price is more variable than stock 2 price. As a matter of fact, stock 1 price is almost twice as variable as stock 2 price.
74

Measures of Relative Dispersion

Standard Score (z-score)


• Helps determine the relative position of an observed
value in the collection where the observed value
came from.
• A positive z score measures the number of standard
deviations an observation is above the mean
• A negative z score gives the number of standard
deviations an observation is below the mean.
75

Measures of Relative Dispersion

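Population z-score, where $\mu$ and $\sigma$ are the population mean and standard deviation:

$$z = \frac{x - \mu}{\sigma}$$

Sample z-score, where $\bar{x}$ and $s$ are the sample mean and standard deviation:

$$z = \frac{x - \bar{x}}{s}$$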
76

Measures of Relative Dispersion

Example: Standard Score (z-score)


The mean score of participants in the training workshop in
Descriptive Statistics is 70% and the standard deviation is 10%;
whereas in Inferential Statistics, the mean score is 80% and
the standard deviation is 20%.

a) Roland got a score of 75% in Descriptive Statistics and a
score of 90% in Inferential Statistics. In which subject
did Roland perform better if we consider the score of
the other participants in the two training courses?
77

Measures of Relative Dispersion

Example: z-score

If we consider the scores of the other participants in the two
training courses, Roland’s score in Descriptive Statistics is
just as good as his score in Inferential Statistics. Based on
the z scores, Roland’s scores in both training courses are
0.5 standard deviations above their respective mean scores.
78

Measures of Relative Dispersion

Continuation of Example: z-score


b) Betty got a grade of 70% in both Descriptive and
Inferential Statistics. In which training course did
Betty perform better if we consider the scores of
the other participants in the two training courses?
c) Mario got a score of 100% in Descriptive Statistics.
Compute for the z score and interpret.
79

Measures of Relative Dispersion

Remarks: z-score
• We can also use the standard score in identifying
possible outliers in our data set.
• By rule of thumb, if the absolute value of the
standard score is at least 3 then that observation is
marked as a possible outlier.
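The z-score example and the rule of thumb above can be sketched in Python. The function and variable names are illustrative; the means and standard deviations are the ones given in the training-workshop example.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# (mean, standard deviation) of scores in each training course, in percent
DESCRIPTIVE = (70, 10)
INFERENTIAL = (80, 20)

# a) Roland: 75% in Descriptive, 90% in Inferential
roland = (z_score(75, *DESCRIPTIVE), z_score(90, *INFERENTIAL))

# b) Betty: 70% in both courses
betty = (z_score(70, *DESCRIPTIVE), z_score(70, *INFERENTIAL))

# c) Mario: 100% in Descriptive
mario = z_score(100, *DESCRIPTIVE)

# Rule of thumb: |z| of at least 3 marks a possible outlier
mario_possible_outlier = abs(mario) >= 3

print(roland, betty, mario, mario_possible_outlier)
```

Betty's score sits exactly at the mean in Descriptive Statistics but half a standard deviation below the mean in Inferential Statistics, so she did relatively better in Descriptive Statistics. Mario's score is three standard deviations above the mean, which flags it as a possible outlier.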
80

Measures
of Skewness
81

Summary Measures: SHAPES of the Data Distribution


82

Symmetric Distribution

• The graph of the frequency distribution or relative frequency
distribution is symmetric if it can be folded along the vertical axis so
that the left hand side is the mirror image of the right hand side.

• If the distribution has one mode and is symmetric, the mean, the
median, and the mode are equal.
83

Other Symmetric Distributions


84

Skewed Distribution

• If the two sides do not coincide, the distribution is said to be asymmetric.

• A distribution that is asymmetric with respect to a vertical axis is said to be skewed.
85

Two Types of Skewness

• Positively Skewed or Skewed to the Right

• Negatively Skewed or Skewed to the Left


86

Positively Skewed or Skewed to the Right Distribution

• The longer upper tail indicates that there are observations in
the data whose values are much larger than the others;
consequently, these observations pull the mean to the right.
• The mean will then be larger than the median. The median
will be larger than the mode.
87

Example of a Positively Skewed Distribution


88

Negatively Skewed Or Skewed to the Left Distribution

• The longer lower tail indicates that there are observations in
the data whose values are much smaller than the others;
consequently, these observations pull the mean to the left.
• The mean will then be smaller than the median. The median
will be smaller than the mode.
89

Measures of Skewness

Pearson’s First and Second Coefficient of Skewness
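In their usual textbook form, and assuming $\bar{x}$ for the mean, $Mo$ for the mode, $Md$ for the median, and $s$ for the standard deviation, the two coefficients are:

$$Sk_1 = \frac{\bar{x} - Mo}{s} \qquad\qquad Sk_2 = \frac{3(\bar{x} - Md)}{s}$$

Both are positive when the mean lies to the right of the mode or median and negative when it lies to the left, which agrees with the descriptions of positively and negatively skewed distributions.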


90

Measures of Skewness

Interpretation:
91

Measures
of Kurtosis
92

Measure of Kurtosis

• It measures the extent of peakedness or flatness of the distribution

• It is denoted by K.
93

Measure of Kurtosis
94

Box-and-Whisker
Plot or Boxplot
95

Box-and-Whisker Plot

It is used to display the following features of the data:


• location
• spread
• symmetry
• extremes
• outliers
The boxplot is a simple graphical method used to
display the five-number summary (minimum, 1st quartile, median, 3rd quartile, and maximum).
96

Box-and-Whisker Plot

STEPS in making the Boxplot (Box-and-Whisker Plot)


1. Construct a rectangle with one end at the 1st
Quartile and the other end at the 3rd Quartile.
2. Draw a line within the rectangle at the value of the
median.
3. Compute for the interquartile range (IQR): IQR = 3rd Quartile - 1st Quartile
97

Box-and-Whisker Plot

STEPS in making the Boxplot (Box-and-Whisker Plot)


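4. Compute for the lower fence and upper fence, using the conventional multiplier of 1.5:

$$\text{Lower fence} = Q_1 - 1.5 \times IQR \qquad\qquad \text{Upper fence} = Q_3 + 1.5 \times IQR$$

These are your outlier cutoffs.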


5. Excluding outliers, identify the two data values that are
closest to the lower fence and upper fence, respectively.
Draw a line, starting from these values up to each side of the
rectangle (whiskers).
98

Box-and-Whisker Plot

STEPS in making the Boxplot (Box-and-Whisker Plot)


6. Plot outliers at their corresponding values using
an x mark or any symbol.
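The numbers needed for steps 1 to 6 can be computed with the sketch below, shown on the mining-receipts data from the median example. Two things are assumptions: the quartiles are located with the same "round the position up, or average two values" percentile rule sketched earlier for percentiles, and the fences use the conventional 1.5 x IQR multiplier. The helper names are hypothetical, and drawing the plot itself is left out.

```python
def percentile(data, k):
    """k-th percentile under the assumed 'at least k%' positioning rule."""
    xs = sorted(data)
    q, r = divmod(len(xs) * k, 100)
    return (xs[q - 1] + xs[q]) / 2 if r == 0 else xs[q]

def boxplot_stats(data):
    """Quartiles, IQR, fences, whisker ends, and outliers for a boxplot."""
    xs = sorted(data)
    q1, med, q3 = (percentile(xs, k) for k in (25, 50, 75))
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr   # assumed conventional multiplier
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations that are not outliers
    inside = [x for x in xs if lower_fence <= x <= upper_fence]
    outliers = [x for x in xs if x < lower_fence or x > upper_fence]
    return {
        "q1": q1, "median": med, "q3": q3, "iqr": iqr,
        "fences": (lower_fence, upper_fence),
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }

receipts = [1.2, 4.5, 6.5, 7.2, 10.4, 12.5, 50.6]   # in million pesos
stats = boxplot_stats(receipts)
print(stats)
```

For this data set, 50.6 falls above the upper fence, so it would be plotted with its own mark while the upper whisker stops at 12.5.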
99

Box-and-Whisker Plot

Boxplot displays the location, spread, symmetry, extremes, and outliers.
This can be done by examining the following components
of the boxplot:
1. The line inside the rectangle shows the location of the
median, our measure of central tendency;
2. The sides of the rectangle, which are plotted either at
the fourths or the quartiles, indicate where the middle
50% of observations lie;
100

Box-and-Whisker Plot

3. The length of the rectangle represents the magnitude of the
inter-quartile range, the measure of dispersion;
4. The relative position of the line inside the rectangle to its
sides gives an idea on the degree and direction of
symmetry because this shows the respective distances of
the median to the lower and upper fourths.
101

Box-and-Whisker Plot

• A line that is in the middle of the rectangle
indicates that the distribution is symmetric
• A line that is closer to the 1st quartile indicates
that the distribution is skewed to the right
• A line that is closer to the 3rd quartile indicates
that the distribution is skewed to the left
102

Box-and-Whisker Plot

5. If there are no outliers, then the ends of the
whiskers indicate the respective values of both
extremes. But, if there are outliers then the
farthest outlier is the extreme; and,
6. The outliers are clearly identified by the
distinctive mark (x) used to plot them.
103

Box-and-Whisker Plot
104

Proportions,
Ratios, Rates,
Percent Change
105

Proportions

The proportion among elements in the collection belonging in a given category is defined as:

The number of elements belonging in the category divided by the total number of elements in the collection.
106

Proportions

Examples

Proportion of males in the population

Proportion of males in the sample


107

Proportions

Examples
SDG indicator 5.5.1:
Proportion of seats held by women in
(a)national parliaments and
(b)local governments
108

Proportions
Note
The proportion is actually a special case of the
arithmetic mean.
Illustration: Let P= population proportion of males
Where:
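One common way to write this, assuming an indicator variable $X_i$ for each of the $N$ elements of the population (notation chosen here for illustration):

$$X_i = \begin{cases} 1 & \text{if the } i\text{-th element is male} \\ 0 & \text{otherwise} \end{cases} \qquad\qquad P = \frac{1}{N}\sum_{i=1}^{N} X_i$$

The sum counts the males, so dividing by $N$ gives the proportion, and the same expression is the arithmetic mean of the $X_i$.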
109

Proportions
Example:
Number of clientele assisted by the Public Attorney’s
Office by Quasi-Judicial Cases Handled in the
Philippines: 2013 and 2014
110

Proportions
Proportion of Prosecution Cases Handled:
2013: 40,752/69,779 = 0.58
2014: 37,659/60,136 = 0.63

Note:
The sum of the proportions in the different
categories of the variable is 1.00.
111

Percentages

• Percent means “per hundred”, “by the hundred”, or “out of a hundred”.

• A proportion can be converted to a percentage by multiplying it by 100.
112

Percentages
Example:
Proportion of Prosecution Cases Handled in 2013:
40,752/69,779 = 0.58
Percentage of Prosecution Cases Handled in 2013:
.58 x 100 = 58%
This indicates that for every 100 quasi-judicial
cases handled in 2013, there are 58 that are
prosecution cases.
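The proportion-to-percentage conversion can be checked with a few lines of Python. The counts are the prosecution and total quasi-judicial case figures quoted in the proportion example for 2013 and 2014; the dictionary names are illustrative.

```python
# Prosecution cases and total quasi-judicial cases handled, by year
prosecution = {2013: 40_752, 2014: 37_659}
total = {2013: 69_779, 2014: 60_136}

# Proportion = count in the category / total count, rounded to 2 decimals
proportion = {year: round(prosecution[year] / total[year], 2) for year in prosecution}

# Percentage = proportion x 100, rounded to the nearest whole percent
percentage = {year: round(100 * prosecution[year] / total[year]) for year in prosecution}

print(proportion, percentage)
```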
113

Ratios
• The ratio of a number x to another number y
expresses the size of one measure x with respect to
the size of another measure y.
• It is written as x:y and is read as “x is to y”.
• When the measure x is divided by the measure y, the
relationship that x bears to y is then expressed as a
ratio to one.
• The measure y in the denominator is called the base.
114

Ratios
Example:

Pupil-Teacher Ratio = Total pupils/total teachers
= 33,681/991 ≈ 34

There are about 34 pupils to 1 teacher.


115

Ratios
Example:
116

Ratios
Averaging Ratios:
Arithmetic Mean vs. Weighted Mean

There are many ways of averaging ratios, each one assigning a different set of weights to the ratios.
Example:

No. of correct answers


Total no. of items
Percentage
117

Ratios
Arithmetic Mean
= (96% + 87.5% + 10%)/3 = 64.5%.
All of the 3 exams are given the same weights.

Weighted Mean
= [(50)(96%) + (80)(87.5%) + (100)(10%)]/(50+80+100)
= 55.65%.
Here, the ratio with the largest base is given the
greatest importance.
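The two averages can be reproduced with the sketch below, which takes the three exam percentages and their numbers of items (50, 80, and 100) from the computation above. The variable names are illustrative.

```python
items = [50, 80, 100]          # total number of items in each exam (the bases)
percent = [96.0, 87.5, 10.0]   # percentage of correct answers in each exam

# Arithmetic mean: every exam gets the same weight
arithmetic_mean = sum(percent) / len(percent)

# Weighted mean: each percentage is weighted by its number of items
weighted_mean = sum(n * p for n, p in zip(items, percent)) / sum(items)

print(arithmetic_mean, round(weighted_mean, 2))
```

The weighted mean is lower because the exam with the most items is also the one with the lowest percentage.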
118

Ratios
Ratio of Sums or Ratio of Means
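If the collection of measurements consists of
ratios, $x_1/y_1, x_2/y_2, \ldots, x_n/y_n$, then the average of these
ratios wherein those ratios with larger bases are
given heavier weights is the ratio of sums, R,

$$R = \frac{\sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} y_i} = \frac{\bar{x}}{\bar{y}}$$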
119


THANK YOU!
Any questions?

You can find us at:


psa.launion@yahoo.com
