Summarizing and Describing Numerical Data
Lectures 3+4+5 Topics
Measures of Central Tendency: Mean, Median, Mode
Measures of Variation: The Range, Variance and Standard Deviation
Shape: Symmetric, Skewed, Skewness, Kurtosis
Summary Measures
Central Tendency: Mean, Median, Mode
Variation: Range, Variance, Standard Deviation, Coefficient of Variation
Measures of Central Tendency
Central Tendency: Mean, Median, Mode
Mean:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
The Mean (Arithmetic Mean, Average)
It is the arithmetic average of the data values:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
Sample mean x̄ = (sum of the observations) / (number of observations)
The most common measure of central tendency
Affected by extreme values (outliers)
[Figure: two dot plots; left, axis from 0 to 10, Mean = 5; right, axis from 0 to 14 with an extreme value, Mean = 6]
The Arithmetic Mean
This is the most popular and useful measure of central location.
Sample mean (n = sample size):
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Population mean (N = population size):
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
Example 1
The reported times spent on the Internet by 10 adults are 0, 7, 12, 5, 33, 14, 8, 0, 9, 22 hours. Find the mean time spent on the Internet.
$$\bar{x} = \frac{\sum_{i=1}^{10} x_i}{10} = \frac{x_1 + x_2 + \cdots + x_{10}}{10} = \frac{0 + 7 + \cdots + 22}{10} = \frac{110}{10} = 11.0 \text{ hours}$$
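As a quick check of Example 1, here is a minimal Python sketch of the sample-mean formula. The function name `mean` is our own choice for illustration; Python's standard library also offers `statistics.mean`, which gives the same result for these data.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by their number."""
    if not values:
        raise ValueError("mean requires at least one observation")
    return sum(values) / len(values)


# Example 1: hours spent on the Internet by 10 adults
internet_hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
print(mean(internet_hours))  # 11.0
```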
Example 2
Suppose the telephone bills represent the population of measurements (N = 200). The population mean is
$$\mu = \frac{\sum_{i=1}^{200} x_i}{200} = \frac{x_1 + x_2 + \cdots + x_{200}}{200} = \frac{42.19 + 38.45 + \cdots + 45.77}{200} = 43.59$$
The Arithmetic Mean: Weighted Mean for Data Grouped by Categories or Variants
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$
When many of the measurements have the same value, the measurements can be summarized in a frequency table. Suppose the number of children in a sample of 16 families was recorded as follows:

Number of children   0   1   2   3
Number of families   3   4   7   2   (16 families)

$$\bar{x} = \frac{\sum_{i=1}^{4} f_i x_i}{16} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + f_4 x_4}{16} = \frac{3(0) + 4(1) + 7(2) + 2(3)}{16} = \frac{24}{16} = 1.5$$
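The frequency-table calculation can be sketched the same way. `weighted_mean` is a hypothetical helper written for this illustration: it multiplies each distinct value x_i by its frequency f_i and divides by the total frequency, reproducing the 1.5 children per family above.

```python
def weighted_mean(values, freqs):
    """Mean of data given as distinct values x_i with frequencies f_i."""
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    total = sum(freqs)
    if total == 0:
        raise ValueError("frequencies must not all be zero")
    return sum(f * x for x, f in zip(values, freqs)) / total


children = [0, 1, 2, 3]   # x_i: number of children
families = [3, 4, 7, 2]   # f_i: number of families
print(weighted_mean(children, families))  # 1.5
```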
The Median
Important measure of central tendency.
In an ordered array, the median is the middle number:
If n is odd, the median is the middle number.
If n is even, the median is the average of the 2 middle numbers.
Not affected by extreme values.
[Figure: two dot plots, on axes from 0 to 10 and from 0 to 14; Median = 5 in both]
The median of a set of observations is the value that falls in the middle when the observations are arranged in order of magnitude (ranked increasingly).
Example 4.3
Find the median of the time spent on the Internet for the adults of Example 1.
Even number of observations (n = 10): 0, 0, 5, 7, 8, 9, 12, 14, 22, 33
Median = (8 + 9) / 2 = 8.5
Suppose only 9 adults were sampled (exclude, say, the longest time, 33).
Odd number of observations (n = 9): 0, 0, 5, 7, 8, 9, 12, 14, 22
Median = 8
Comment: excluding the extreme value 33 changes the median only from 8.5 to 8.
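The odd/even rule above translates directly into code. This is a minimal sketch (the name `median` is ours; `statistics.median` in the standard library applies the same rule), checked against both versions of the Internet-time data.

```python
def median(values):
    """Middle value of the ordered data (average of the two middle values when n is even)."""
    if not values:
        raise ValueError("median requires at least one observation")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                                  # odd n: the single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2    # even n: mean of the two middle values


internet_hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]    # Example 1, n = 10
nine_adults = [h for h in internet_hours if h != 33]   # longest time excluded, n = 9

print(median(internet_hours))  # 8.5
print(median(nine_adults))     # 8
```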
The Mode
A measure of central tendency.
Value that occurs most often.
Not affected by extreme values.
There may not be a mode.
There may be several modes.
Used for either numerical or categorical data.
[Figure: a dot plot on an axis from 0 to 14 with Mode = 9, and a dot plot on an axis from 0 to 6 with No Mode]
The mode of a set of observations is the value of the variable that occurs most frequently. A set of data may have one mode (or modal class), or two or more modes.
The modal class: for large data sets the modal class is much more relevant than a single-value mode.
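Because a data set may have no mode or several modes, a sketch that returns a list fits the definition better than one returning a single value. The helper `modes` below is hypothetical and uses `collections.Counter` from the standard library; the rule "no mode when no value repeats" is an assumed convention, not one stated on the slide. It works for numerical and categorical data alike.

```python
from collections import Counter


def modes(values):
    """All values with the highest frequency; [] when every value occurs only once."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1 and len(counts) > 1:   # assumed convention: no value repeats -> no mode
        return []
    return [v for v, c in counts.items() if c == top]  # may contain several modes


print(modes([0, 7, 12, 5, 33, 14, 8, 0, 9, 22]))        # [0]
print(modes(["red", "blue", "red", "green", "blue"]))   # ['red', 'blue']
print(modes([1, 2, 3, 4, 5, 6]))                        # []
```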
Approximating Descriptive Measures for Grouped Data by Classes
Approximating descriptive measures for grouped data may be needed in two cases:
when approximate values are sufficient for the purpose;
when only secondary grouped data are available.
$$\bar{x} \approx \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$
where x_i = class midpoint and f_i = class frequency.

Example 3
Approximate the mean of the telephone call durations problem as represented by the frequency distribution:

Class i   Class limits   Frequency f_i   Midpoint x_i   f_i x_i
1         2-5            3               3.5            10.5
2         5-8            6               6.5            39.0
3         8-11           8               9.5            76.0
.         .              .               .              .
6         17-20          2               18.5           37.0
Total                    n = 30                         312.0

n = sample size = 30 = f_1 + f_2 + ... + f_6
$$\bar{x} \approx \frac{\sum_{i=1}^{6} f_i x_i}{n} = \frac{312.0}{30} = 10.4$$
Real value: x̄ = 10.26
[Figure: histogram of call durations with class upper limits 5, 8, 11, 14, 17, 20, More]
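The midpoint approximation can be sketched in Python as below. `class_midpoint` and `grouped_mean` are hypothetical helpers. Classes 4 and 5 of Example 3 are elided in the source, so the sketch only re-checks the f_i x_i values of the rows that are shown and the final division using the reported totals; the two-class table at the end is made up purely to exercise `grouped_mean`.

```python
def class_midpoint(lower, upper):
    """Midpoint x_i of a class with limits lower-upper."""
    return (lower + upper) / 2


def grouped_mean(classes):
    """Approximate mean from (lower, upper, frequency) triples: sum(f_i * x_i) / sum(f_i)."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("total frequency must be positive")
    return sum(f * class_midpoint(lo, hi) for lo, hi, f in classes) / n


# Rows of the Example 3 table that are shown (classes 4 and 5 are elided in the source)
shown_rows = [(2, 5, 3), (5, 8, 6), (8, 11, 8), (17, 20, 2)]
print([f * class_midpoint(lo, hi) for lo, hi, f in shown_rows])  # [10.5, 39.0, 76.0, 37.0]

# Using the column totals reported in the table: sum(f_i * x_i) = 312.0, n = 30
print(312.0 / 30)  # 10.4

# Hypothetical two-class table, only to exercise grouped_mean
print(grouped_mean([(0, 10, 1), (10, 20, 3)]))  # 12.5
```

The approximation (10.4) differs from the real value (10.26) because every observation is replaced by the midpoint of its class.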
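Median and Mode for Grouped Data
Median:
$$Me = x_0 + K \cdot \frac{\frac{n+1}{2} - \sum_{i=1}^{Me-1} n_i}{n_{Me}}$$
where x_0 is the lower limit of the median class, K is the class width, n_i is the frequency of class i (summed over the classes before the median class) and n_Me is the frequency of the median class.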
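Mode:
$$Mo = x_0 + K \cdot \frac{n_{Mo} - n_{Mo-1}}{(n_{Mo} - n_{Mo-1}) + (n_{Mo} - n_{Mo+1})}$$
where x_0 is the lower limit of the modal class (the class with the highest frequency), K is the class width, and n_Mo, n_Mo-1 and n_Mo+1 are the frequencies of the modal class and of the classes just before and just after it.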
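The grouped-mode interpolation can be sketched the same way. The function name and data are hypothetical; the sketch assumes equal class widths, picks the first class if several share the highest frequency, and treats the (non-existent) neighbours of the first and last classes as having frequency 0.

```python
def grouped_mode(lower_limits, width, freqs):
    """Mo = x0 + K * (n_Mo - n_Mo-1) / ((n_Mo - n_Mo-1) + (n_Mo - n_Mo+1))."""
    i = max(range(len(freqs)), key=lambda j: freqs[j])   # modal class (first one if tied)
    n_mo = freqs[i]
    n_prev = freqs[i - 1] if i > 0 else 0                # assumed 0 before the first class
    n_next = freqs[i + 1] if i + 1 < len(freqs) else 0   # assumed 0 after the last class
    d1 = n_mo - n_prev
    d2 = n_mo - n_next
    if d1 + d2 == 0:
        raise ValueError("cannot interpolate: modal class equals both neighbours")
    return lower_limits[i] + width * d1 / (d1 + d2)


# Hypothetical data (not from the lecture): classes 0-10, 10-20, 20-30, 30-40
lower = [0, 10, 20, 30]
freqs = [2, 6, 8, 4]
print(grouped_mode(lower, 10, freqs))  # about 23.33
```

The modal class is 20-30 (frequency 8), so Mo = 20 + 10 · (8 - 6) / ((8 - 6) + (8 - 4)) = 20 + 10 · 2/6.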
Relationship among Mean, Median, and Mode
If a distribution is symmetrical, the mean, median and mode coincide.
If a distribution is non-symmetrical, skewed to the left or to the right, the three measures differ.
[Figure: a positively skewed distribution (skewed to the right), with Mode < Median < Mean; a negatively skewed distribution (skewed to the left), with Mean < Median < Mode]
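The Example 1 data have a long right tail (the value 33), so they can be used to check the right-skewed ordering. This sketch uses `statistics.mean`, `statistics.median` and `statistics.mode` from Python's standard library; note that the ordering is a rule of thumb for skewed distributions, not a guarantee for every data set.

```python
import statistics

# Example 1 data: the value 33 gives a long right tail
internet_hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

mean_h = statistics.mean(internet_hours)      # 11
median_h = statistics.median(internet_hours)  # 8.5
mode_h = statistics.mode(internet_hours)      # 0 (the only repeated value)

print(mode_h, median_h, mean_h)
print(mode_h < median_h < mean_h)   # True: the right-skewed pattern
```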
Summary Measures
Central Tendency: Mean, Median, Mode
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Variation: Range, Variance, Standard Deviation, Coefficient of Variation
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Measures of Variation
Range
Variance: Population Variance, Sample Variance
Standard Deviation: Population Standard Deviation, Sample Standard Deviation
Coefficient of Variation:
$$CV = \frac{S}{\bar{X}} \cdot 100\%$$
The Range
Measure of variation.
Difference between the largest and smallest observations:
$$\text{Absolute Range} = x_{\text{largest}} - x_{\text{smallest}}$$
$$\text{Relative Range} = \frac{x_{\text{largest}} - x_{\text{smallest}}}{\text{mean}}$$
Ignores how the data are distributed:
[Figure: two dot plots on an axis from 7 to 12 with differently spread points; Range = 12 - 7 = 5 in both]
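Both range measures are one-liners; the sketch below (hypothetical helper names) applies them to the Example 1 data and then to two made-up sets that share a range of 5 but are spread very differently, which is exactly the weakness noted above.

```python
def absolute_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)


def relative_range(values):
    """Absolute range divided by the mean."""
    m = sum(values) / len(values)
    if m == 0:
        raise ValueError("relative range is undefined when the mean is 0")
    return absolute_range(values) / m


internet_hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]   # Example 1, mean 11.0
print(absolute_range(internet_hours))   # 33
print(relative_range(internet_hours))   # 3.0

# Hypothetical sets: same range, very different spread
evenly_spread = [7, 8, 9, 10, 11, 12]
bunched_at_top = [7, 12, 12, 12, 12, 12]
print(absolute_range(evenly_spread), absolute_range(bunched_at_top))  # 5 5
```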
Deviation
Individual deviation from the mean = x_i - x̄
Overall deviation = 0, because the positive and negative deviations cancel:
$$\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = n\bar{x} - n\bar{x} = 0$$
Therefore, variation is measured by summing the squared deviations
$$\sum_{i=1}^{n} (x_i - \bar{x})^2$$
or the absolute values of the deviations
$$\sum_{i=1}^{n} |x_i - \bar{x}|$$
Variance
Important measure of variation.
Shows variation about the mean.
Computed as the average of the squared deviations from the mean (dividing by n - 1 instead of n for a sample).
For the population:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
For the sample:
$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
For the population, use N in the denominator.
For the sample, use n - 1 in the denominator.
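The deviation and variance definitions can be checked together on the Example 1 data. The helpers below are hypothetical; the results are cross-checked against `statistics.pvariance` (divides by N) and `statistics.variance` (divides by n - 1) from Python's standard library.

```python
import statistics


def deviations(values):
    """Individual deviations x_i - mean."""
    m = sum(values) / len(values)
    return [x - m for x in values]


def population_variance(values):
    """sigma^2: sum of squared deviations divided by N."""
    d = deviations(values)
    return sum(e * e for e in d) / len(values)


def sample_variance(values):
    """s^2: sum of squared deviations divided by n - 1."""
    if len(values) < 2:
        raise ValueError("sample variance needs at least two observations")
    d = deviations(values)
    return sum(e * e for e in d) / (len(values) - 1)


internet_hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]   # Example 1, mean 11.0
d = deviations(internet_hours)
print(sum(d))                   # 0.0   overall deviation
print(sum(e * e for e in d))    # 922.0 sum of squared deviations
print(sum(abs(e) for e in d))   # 74.0  sum of absolute deviations
print(population_variance(internet_hours))  # 92.2
print(sample_variance(internet_hours))      # about 102.44

# Cross-check against the standard library
print(statistics.pvariance(internet_hours), statistics.variance(internet_hours))
```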
Standard Deviation
Most important measure of variation.
Shows variation about the mean:
For the population:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
For the sample:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
For the population, use N in the denominator.
For the sample, use n - 1 in the denominator.
Sample Standard Deviation
$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
Data (X_i): 10 12 14 15 17 18 18 24
n = 8, Mean = 16
$$s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + (15-16)^2 + (17-16)^2 + (18-16)^2 + (18-16)^2 + (24-16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} = 4.3095$$
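Comparing Standard Deviations
Data (X_i): 10 12 14 15 17 18 18 24
N = n = 8, Mean = 16
Data considered as a sample:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{130}{7}} = 4.3095$$
Data considered as a population:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} = \sqrt{\frac{130}{8}} = 4.0311$$
The standard deviation is larger when the data are considered a sample, because the denominator n - 1 is smaller than N.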
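Both standard deviations for this data set can be verified with a few lines of Python. The helper `sum_sq_dev` is hypothetical; `statistics.stdev` (n - 1) and `statistics.pstdev` (N) from the standard library serve as an independent check.

```python
import math
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]


def sum_sq_dev(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)


ss = sum_sq_dev(data)                   # 130.0
s = math.sqrt(ss / (len(data) - 1))     # sample: divide by n - 1
sigma = math.sqrt(ss / len(data))       # population: divide by N
print(round(s, 4), round(sigma, 4))     # 4.3095 4.0311

print(statistics.stdev(data), statistics.pstdev(data))  # same values, unrounded
```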
Comparing Standard Deviations
[Figure: dot plots of three age data sets on an axis from 11 to 21, all with Mean = 15.5: Data A, s = 3.338; Data B, s = 0.9258; Data C, s = 4.57]
Coefficient of Variation
Measure of relative variation.
Always a percentage (or coefficient).
Shows variation relative to the mean.
Used to compare 2 or more groups.
Formula (for a sample):
$$CV = \frac{S}{\bar{X}} \cdot 100\%$$
Comparing Coefficients of Variation
Stock A: average price last year = $50, standard deviation (sd) = $5
Stock B: average price last year = $100, standard deviation (sd) = $5
$$CV = \frac{S}{\bar{X}} \cdot 100\%$$
Coefficient of variation:
Stock A: CV = 5/50 x 100% = 10%
Stock B: CV = 5/100 x 100% = 5%
Both stocks have the same standard deviation, but relative to its average price, stock B varies half as much as stock A. Both average prices are representative.
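The stock comparison reduces to one division per group. In this sketch the helper name `coefficient_of_variation` is hypothetical; it takes the standard deviation and mean already reported on the slide and returns the CV in percent.

```python
def coefficient_of_variation(sd, mean):
    """CV = S / mean * 100, expressed in percent."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return 100 * sd / mean


cv_a = coefficient_of_variation(5, 50)    # Stock A
cv_b = coefficient_of_variation(5, 100)   # Stock B
print(cv_a, cv_b)  # 10.0 5.0
```

Multiplying by 100 before dividing keeps these particular results exact in floating point.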
Shape
Describes how data are distributed between the smallest and largest values.
Measures of shape: symmetric or skewed.
[Figure: three distributions. Left-skewed (negatively skewed): Mean < Median < Mode. Symmetric: Mean = Median = Mode. Right-skewed (positively skewed): Mode < Median < Mean]
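This section does not give a formula for the coefficient of skewness, so the sketch below uses one common choice as an assumption: Pearson's second skewness coefficient, 3(mean - median)/s, which is positive for right-skewed and negative for left-skewed data. The function name is hypothetical; it relies on `statistics.mean`, `statistics.median` and `statistics.stdev` from Python's standard library.

```python
import statistics


def pearson_skewness(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / s."""
    s = statistics.stdev(values)  # sample standard deviation (n - 1)
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return 3 * (statistics.mean(values) - statistics.median(values)) / s


internet_hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]   # Example 1
print(round(pearson_skewness(internet_hours), 2))     # 0.74 (positive: right-skewed)
print(pearson_skewness([1, 2, 3, 4, 5]))              # 0.0 (symmetric)
```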
Box plot: graphical presentation of the CTM (central tendency measures) summary
Discussed measures of central tendency: mean, median, mode.
Addressed measures of variation: the range, variance, standard deviation, coefficient of variation.
Determined the shape of distributions: symmetric or skewed; coefficient of skewness.
[Figure: Mean = Median = Mode (symmetric); Mean < Median < Mode (left-skewed); Mode < Median < Mean (right-skewed)]