
1/27/2015

Lecture 2: Methods for Describing Data through Numerical Measures

Outline
Measures of central tendency and dispersion
Characteristics, uses, advantages, and disadvantages of each measure of location and dispersion
Chebyshev's theorem and the Empirical Rule as they relate to a set of observations
Quartiles, deciles, and percentiles
Box plots
Coefficient of skewness and coefficient of variation
Scatter diagram
Contingency table
North South University

Slide 1 of 76

School of Business

Numerical Ways of Describing Data


Parameter and Statistic

A parameter is a measurable characteristic of a population.
A statistic is a measurable characteristic of a sample.

Measures of location
Measures of dispersion


Measures of Location and Dispersion

Measures for population data
Measures for sample data
Measures for ungrouped data
Measures for grouped data

Measures of Location

Mean (arithmetic, weighted, geometric)
Median
Mode


Arithmetic Mean

The arithmetic mean is the most widely used measure of location and shows the central value of the data. It is calculated by summing the values and dividing by the number of values.

Population Mean

For ungrouped data, the population mean is the sum of all the population values divided by the total number of population values:

$$\mu = \frac{\sum X}{N}$$

where
$\mu$ is the population mean,
$N$ is the total number of observations,
$X$ is a particular value, and
$\sum$ indicates the operation of adding.

Example 1

The Kiers family owns four cars. The following is the current mileage on each of the four cars: 56,000, 42,000, 23,000, and 73,000. Find the mean mileage for the cars.

$$\mu = \frac{\sum X}{N} = \frac{56{,}000 + \dots + 73{,}000}{4} = 48{,}500$$

Sample Mean

For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values:

$$\bar{X} = \frac{\sum X}{n}$$

where n is the total number of values in the sample.

Example 2

A sample of five executives received the following bonuses last year ($000):

14.0, 15.0, 17.0, 16.0, 15.0

$$\bar{X} = \frac{\sum X}{n} = \frac{14.0 + \dots + 15.0}{5} = \frac{77}{5} = 15.4$$

Properties of the Arithmetic Mean

Every set of interval-level and ratio-level data has a mean.
All the values are included in computing the mean.
A set of data has a unique mean.
The mean is affected by unusually large or small data values.
The arithmetic mean is the only measure of location where the sum of the deviations of each value from the mean is zero.
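As a rough check of the mean calculations and of the zero-sum property of deviations, here is a minimal Python sketch. The variable names are illustrative and not part of the slides; the data are the car mileages from Example 1 and the executive bonuses from Example 2.

```python
from statistics import mean

# Example 1: mileage of the four Kiers family cars (treated as a population)
mileage = [56_000, 42_000, 23_000, 73_000]
population_mean = sum(mileage) / len(mileage)

# Example 2: bonuses of the five sampled executives, in $000
bonuses = [14.0, 15.0, 17.0, 16.0, 15.0]
sample_mean = mean(bonuses)

# Property: deviations from the mean sum to zero (up to floating-point rounding)
deviation_sum = sum(x - sample_mean for x in bonuses)

print(population_mean)
print(sample_mean)
print(abs(deviation_sum) < 1e-9)
```

The same division by the number of values is used in both cases; only the notation (N versus n) distinguishes the population mean from the sample mean.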

Example 3

Consider the set of values: 3, 8, and 4. The mean is 5. Illustrating the fifth property:

$$\sum (X - \bar{X}) = (3 - 5) + (8 - 5) + (4 - 5) = 0$$

Weighted Mean

The weighted mean of a set of numbers $X_1, X_2, \dots, X_n$, with corresponding weights $w_1, w_2, \dots, w_n$, is computed from the following formula:

$$\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \dots + w_n X_n}{w_1 + w_2 + \dots + w_n}$$
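The weighted mean formula can be sketched in a few lines of Python. The function name `weighted_mean` is hypothetical, and the data are the prices and drink counts that appear in the Example 4 computation (5, 15, 15, and 15 drinks at $0.50, $0.75, $0.90, and $1.15).

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Drink prices in dollars and the number of drinks sold at each price
prices = [0.50, 0.75, 0.90, 1.15]
counts = [5, 15, 15, 15]

result = weighted_mean(prices, counts)
print(round(result, 2))
```

Here the weights are the counts of drinks, so the result is the average price per drink rather than the average of the four listed prices.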

Example 4

During a one-hour period on a hot Saturday afternoon, cabana boy Chris served fifty drinks. He sold five drinks for $0.50, fifteen for $0.75, fifteen for $0.90, and fifteen for $1.15. Compute the weighted mean of the price of the drinks.

$$\bar{X}_w = \frac{5(\$0.50) + 15(\$0.75) + 15(\$0.90) + 15(\$1.15)}{5 + 15 + 15 + 15} = \frac{\$44.50}{50} = \$0.89$$

The Median

The median is the midpoint of the values after they have been ordered from the smallest to the largest. There are as many values above the median as below it in the data array.

The median is found at the (n+1)/2 ranked observation. For an even number of values, this position falls between the two middle numbers, and the median is their arithmetic average.

Example 5

The ages for a sample of five college students are: 21, 25, 19, 20, 22. Arranging the data in ascending order gives: 19, 20, 21, 22, 25. Thus the median is 21.

The Median (cont'd)

The heights of four basketball players, in inches, are: 76, 73, 80, 75. Arranging the data in ascending order gives: 73, 75, 76, 80. The median is found at the (n+1)/2 = (4+1)/2 = 2.5th data point. Thus the median is 75.5.

Properties of the Median

There is a unique median for each data set.
It is not affected by extremely large or small values and is therefore a valuable measure of location when such values occur.
It can be computed for ratio-level, interval-level, and ordinal-level data.

The Mode

The mode is another measure of location and represents the value of the observation that appears most frequently.
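The median and mode can be checked with Python's standard `statistics` module. This is only a sketch: the variable names are illustrative, `multimode` requires Python 3.8 or later, and the data are the ages and heights from the median examples and the exam scores from Example 6.

```python
from statistics import median, multimode

ages = [21, 25, 19, 20, 22]        # five college students (odd n)
heights = [76, 73, 80, 75]         # four basketball players (even n)
scores = [81, 93, 84, 75, 68, 87, 81, 75, 81, 87]  # ten exam scores

median_age = median(ages)          # middle value after sorting
median_height = median(heights)    # average of the two middle values
modes = multimode(scores)          # list of the most frequent value(s)

print(median_age, median_height, modes)
```

`multimode` returns a list, so a bimodal or trimodal data set would simply yield two or three entries.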

Example 6

The exam scores for ten students are: 81, 93, 84, 75, 68, 87, 81, 75, 81, 87. Because the score of 81 occurs the most often, it is the mode.

Data can have more than one mode. If it has two modes, it is referred to as bimodal; three modes, trimodal; and so on.

The Relative Positions of the Mean, Median, and Mode

Symmetric distribution: a distribution having the same shape on either side of the center.
Skewed distribution: one whose shapes on either side of the center differ; a nonsymmetrical distribution. It can be positively or negatively skewed, or bimodal.

The Relative Positions of the Mean, Median, and Mode: Symmetric Distribution

Zero skewness: Mean = Median = Mode

The Relative Positions of the Mean, Median, and Mode: Right Skewed Distribution

Positively skewed: the mean and median are to the right of the mode.

Mean > Median > Mode

The Relative Positions of the Mean, Median, and Mode: Left Skewed Distribution

Negatively skewed: the mean and median are to the left of the mode.

Mean < Median < Mode

Geometric Mean

The geometric mean (GM) of a set of n positive numbers is defined as the nth root of the product of the n numbers. The formula is:

$$GM = \sqrt[n]{(X_1)(X_2)(X_3)\cdots(X_n)}$$

The geometric mean is used to average percents, indexes, and relatives.

Example 7

The interest rates on three bonds were 5, 21, and 4 percent.
The arithmetic mean is (5 + 21 + 4)/3 = 10.0.
The geometric mean is

$$GM = \sqrt[3]{(5)(21)(4)} = 7.49$$

The GM gives a more conservative profit figure because it is not heavily weighted by the rate of 21 percent.

Example 8

The return on investment earned by Atkins Construction Company for four successive years was: 30%, 20%, -40%, and 200%. What is the geometric mean rate of return on investment?

$$GM = \sqrt[n]{X_1 X_2 \cdots X_n} = \sqrt[4]{(1.3)(1.2)(0.6)(3.0)} = 1.294$$

The average rate of return is 29.4%.

Geometric Mean (cont'd)

Another use of the geometric mean is to determine the percent increase in sales, production, or other business or economic series from one time period to another.

$$GM = \sqrt[n]{\frac{\text{Value at end of period}}{\text{Value at beginning of period}}} - 1$$

[Figure: Growth in Sales 1999-2004; sales in millions ($) by year]

Example 9

The total number of females enrolled in American colleges increased from 755,000 in 1992 to 835,000 in 2000.

$$GM = \sqrt[8]{\frac{835{,}000}{755{,}000}} - 1 = 0.0127$$

The value 0.0127 indicates that the average annual growth over the 8-year period was 1.27%.
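The geometric mean calculations in Examples 7 to 9 can be reproduced with a short Python sketch. The function names are hypothetical, `math.prod` assumes Python 3.8 or later, and the growth factors 1.3, 1.2, 0.6, and 3.0 are the ones shown in the Example 8 computation.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive numbers."""
    return math.prod(values) ** (1 / len(values))


def average_growth_rate(start, end, periods):
    """Average per-period rate of increase: (end / start) ** (1 / periods) - 1."""
    return (end / start) ** (1 / periods) - 1


bond_gm = geometric_mean([5, 21, 4])                  # interest rates, in percent
atkins_gm = geometric_mean([1.3, 1.2, 0.6, 3.0])      # yearly growth factors
enrollment_growth = average_growth_rate(755_000, 835_000, 8)

print(round(bond_gm, 2), round(atkins_gm, 3), round(enrollment_growth, 4))
```

Returns are converted to growth factors (1 plus the rate) before averaging because the geometric mean is defined only for positive numbers.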

Measures of Dispersion

Dispersion refers to the spread or variability in the data.

Measures of dispersion include the following: range, mean deviation, variance, and standard deviation.

Range = Largest value - Smallest value

Example 10

The following represents the current year's Return on Equity of the 25 companies in an investor's portfolio.

-8.1   -5.1   -3.1   -1.4    1.2
 3.2    4.1    4.6    4.8    5.7
 5.9    6.3    7.9    7.9    8.0
 8.1    9.2    9.5    9.7   10.3
12.3   13.3   14.0   15.0   22.1

Highest value: 22.1   Lowest value: -8.1

Range = Highest value - Lowest value
= 22.1 - (-8.1)
= 30.2

Mean Absolute Deviation (MAD)

MAD: the arithmetic mean of the absolute values of the deviations from the arithmetic mean.

$$MAD = \frac{\sum |X - \bar{X}|}{n}$$

The main features:
All values are used in the calculation.
It is not unduly influenced by large or small values.
The absolute values are difficult to manipulate.

Example 11

The weights of a sample of crates containing books for the bookstore (in pounds) are: 103, 97, 101, 106, 103. Find the mean deviation.

$\bar{X} = 102$

The mean deviation is:

$$MD = \frac{\sum |X - \bar{X}|}{n} = \frac{|103 - 102| + \dots + |103 - 102|}{5} = \frac{1 + 5 + 1 + 4 + 1}{5} = 2.4$$
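The range and the mean absolute deviation are both simple to compute directly. The sketch below uses hypothetical function names; the data are the 25 Return on Equity figures from Example 10 and the crate weights from Example 11.

```python
def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def mean_absolute_deviation(values):
    """Mean of the absolute deviations from the arithmetic mean."""
    centre = sum(values) / len(values)
    return sum(abs(x - centre) for x in values) / len(values)


roe = [-8.1, -5.1, -3.1, -1.4, 1.2, 3.2, 4.1, 4.6, 4.8, 5.7,
       5.9, 6.3, 7.9, 7.9, 8.0, 8.1, 9.2, 9.5, 9.7, 10.3,
       12.3, 13.3, 14.0, 15.0, 22.1]
crate_weights = [103, 97, 101, 106, 103]

roe_range = value_range(roe)
crate_mad = mean_absolute_deviation(crate_weights)

print(round(roe_range, 1), crate_mad)
```

The range depends only on the two extreme observations, whereas the mean absolute deviation uses every value in the sample.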

Variance and Standard Deviation

Variance: the arithmetic mean of the squared deviations from the mean.

Standard deviation: the square root of the variance.

Population Variance

The major characteristics:
Not unduly influenced by extreme values, in contrast to the range.
The units are awkward: the square of the original units.
All values are used in the calculation.

Variance and Standard Deviation

Population variance formula:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

$X$ is the value of an observation in the population
$\mu$ is the arithmetic mean of the population
$N$ is the number of observations in the population

Population standard deviation formula:

$$\sigma = \sqrt{\sigma^2}$$

Example 10 (revisited)

In Example 10, the mean is $\mu = 6.62$, and the variance and standard deviation are:

$$\sigma^2 = \frac{(-8.1 - 6.62)^2 + (-5.1 - 6.62)^2 + \dots + (22.1 - 6.62)^2}{25} = 42.227$$

$$\sigma = \sqrt{42.227} = 6.498$$

Sample Variance and Standard Deviation

Sample variance ($s^2$):

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

Sample standard deviation ($s$):

$$s = \sqrt{s^2}$$

Example 12

The hourly wages earned by a sample of five students are: $7, $5, $11, $8, $6. Find the sample variance and standard deviation.

$$\bar{X} = \frac{\sum X}{n} = \frac{37}{5} = 7.40$$

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{(7 - 7.4)^2 + \dots + (6 - 7.4)^2}{5 - 1} = \frac{21.2}{5 - 1} = 5.30$$

$$s = \sqrt{s^2} = \sqrt{5.30} = 2.30$$
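The difference between the population formulas (divide by N) and the sample formulas (divide by n - 1) maps directly onto functions in Python's `statistics` module. The following is a minimal sketch with illustrative variable names, using the Return on Equity data from Example 10 as a population and the hourly wages from Example 12 as a sample.

```python
from statistics import pstdev, pvariance, stdev, variance

# Example 10: Return on Equity of 25 companies, treated as a population
roe = [-8.1, -5.1, -3.1, -1.4, 1.2, 3.2, 4.1, 4.6, 4.8, 5.7,
       5.9, 6.3, 7.9, 7.9, 8.0, 8.1, 9.2, 9.5, 9.7, 10.3,
       12.3, 13.3, 14.0, 15.0, 22.1]
pop_var = pvariance(roe)   # divides by N
pop_sd = pstdev(roe)

# Example 12: hourly wages of five sampled students
wages = [7, 5, 11, 8, 6]
samp_var = variance(wages)  # divides by n - 1
samp_sd = stdev(wages)

print(round(pop_var, 3), round(pop_sd, 3))
print(round(samp_var, 2), round(samp_sd, 2))
```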

Chebyshev's Theorem

Chebyshev's theorem: For any set of observations (sample or population), the proportion of the values that lie within k standard deviations of the mean is at least:

$$1 - \frac{1}{k^2}$$

where k is any constant greater than 1.

Chebyshev's Theorem (cont'd)

The arithmetic mean biweekly amount contributed by the Dupree Paint employees to the company's profit-sharing plan was $51.54, and the standard deviation was $7.51. At least what percent of the contributions lie within plus 3.5 standard deviations and minus 3.5 standard deviations of the mean?

$$1 - \frac{1}{k^2} = 1 - \frac{1}{(3.5)^2} = 1 - \frac{1}{12.25} = 0.92$$

About 92%.
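Chebyshev's bound is a one-line formula, sketched below in Python. The function name `chebyshev_lower_bound` is hypothetical; the interval endpoints are an added illustration computed from the mean and standard deviation given in the Dupree Paint example.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


k = 3.5
bound = chebyshev_lower_bound(k)

# Interval covered by the bound for the profit-sharing contributions
mean_contribution, sd_contribution = 51.54, 7.51
interval = (mean_contribution - k * sd_contribution,
            mean_contribution + k * sd_contribution)

print(round(bound, 2), [round(v, 2) for v in interval])
```

Because the theorem makes no assumption about the shape of the distribution, the bound is a guaranteed minimum and is usually conservative.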

Interpretation and Uses of the Standard Deviation

Empirical Rule: For any symmetrical, bell-shaped distribution:
About 68% of the observations will lie within one standard deviation (±1s) of the mean.
About 95% of the observations will lie within two standard deviations (±2s) of the mean.
Virtually all (99.7%) of the observations will be within three standard deviations (±3s) of the mean.

[Figure: bell-shaped curve showing the relationship between $\sigma$ and $\mu$, with the 68%, 95%, and 99.7% regions marked]

The Mean of Grouped Data

The mean of a sample of data organized in a frequency distribution is computed by the following formula:

$$\bar{X} = \frac{\sum Mf}{n}$$

Example 13

A sample of ten movie theaters in a large metropolitan area tallied the total number of movies showing last week. Compute the mean number of movies showing.

Movies showing   Frequency f   Class midpoint M   (f)(M)
1 up to 3        1             2                  2
3 up to 5        2             4                  8
5 up to 7        3             6                  18
7 up to 9        1             8                  8
9 up to 11       3             10                 30
Total            10                               66

$$\bar{X} = \frac{\sum Mf}{n} = \frac{66}{10} = 6.6$$

The Median of Grouped Data

The median of a sample of data organized in a frequency distribution is computed by:

$$\text{Median} = L + \frac{\frac{n}{2} - CF}{f}(i)$$

where L is the lower limit of the median class, CF is the cumulative frequency preceding the median class, f is the frequency of the median class, and i is the median class interval.

Finding the Median Class

Construct a cumulative frequency distribution.
Decide the class that contains the median. The median class is the first class whose cumulative frequency is at least n/2.

Example 13 (revisited)

Movies showing   Frequency   Cumulative frequency
1 up to 3        1           1
3 up to 5        2           3
5 up to 7        3           6
7 up to 9        1           7
9 up to 11       3           10

Example 13 (cont'd)

From the table, L = 5, n = 10, f = 3, i = 2, CF = 3.

$$\text{Median} = L + \frac{\frac{n}{2} - CF}{f}(i) = 5 + \frac{\frac{10}{2} - 3}{3}(2) = 6.33$$

The Mode of Grouped Data

The mode for grouped data is approximated by the midpoint of the class with the largest class frequency.

Movies showing   Frequency f   Class midpoint M
1 up to 3        1             2
3 up to 5        2             4
5 up to 7        3             6
7 up to 9        1             8
9 up to 11       3             10

The modes in Example 13 are 6 and 10, so the distribution is bimodal.

The Standard Deviation of Grouped Data

The standard deviation of a sample of data organized in a frequency distribution is computed by the following formula:

$$s = \sqrt{\frac{\sum f(M - \bar{X})^2}{n - 1}}$$
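The grouped-data formulas for the mean, median, and standard deviation can be sketched together in Python using the movie theater distribution from Example 13. Variable names are illustrative. The class frequencies 1, 2, 3, 1, 3 are an assumption: they are inferred from the totals shown in the example (n = 10, sum of fM = 66, CF = 3 and f = 3 for the median class) rather than read directly from the table.

```python
import math

# (lower limit, upper limit, frequency); frequencies are inferred, see lead-in
classes = [(1, 3, 1), (3, 5, 2), (5, 7, 3), (7, 9, 1), (9, 11, 3)]

n = sum(f for _, _, f in classes)
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

# Grouped mean: sum of (midpoint * frequency) divided by n
grouped_mean = sum(m * f for m, f in zip(midpoints, freqs)) / n

# Grouped median: find the first class whose cumulative frequency reaches n/2
cumulative = 0  # cumulative frequency preceding the current class (CF)
grouped_median = None
for lo, hi, f in classes:
    if cumulative + f >= n / 2:
        grouped_median = lo + (n / 2 - cumulative) / f * (hi - lo)
        break
    cumulative += f

# Grouped sample standard deviation: sqrt(sum f (M - mean)^2 / (n - 1))
squared = sum(f * (m - grouped_mean) ** 2 for m, f in zip(midpoints, freqs))
grouped_sd = math.sqrt(squared / (n - 1))

print(grouped_mean, round(grouped_median, 2), round(grouped_sd, 4))
```

All three results are approximations, since every observation in a class is treated as if it sat at the class midpoint or was spread evenly across the class.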

Example 13 (revisited)

A sample of ten movie theaters in a large metropolitan area tallied the total number of movies showing last week. Compute the standard deviation of movies showing.

Movies showing   Frequency f   Class midpoint M   (M - X̄)   f(M - X̄)^2
1 up to 3        1             2                  -4.6       21.16
3 up to 5        2             4                  -2.6       13.52
5 up to 7        3             6                  -0.6       1.08
7 up to 9        1             8                  1.4        1.96
9 up to 11       3             10                 3.4        34.68
Total            10                                          72.40

$$s = \sqrt{\frac{\sum f(M - \bar{X})^2}{n - 1}} = \sqrt{\frac{72.40}{10 - 1}} = 2.8363$$

Other Measures of Dispersion

Quartiles divide a set of observations into four equal parts.
Deciles divide a set of observations into 10 equal parts.
Percentiles divide a set of observations into 100 equal parts.

Quartiles

Locate the median (50th percentile), the first quartile (25th percentile), and the third quartile (75th percentile).

Location of a Percentile

$$L_p = (n + 1)\frac{P}{100}$$

where P is the desired percentile.

Example 14

Stock prices on twelve consecutive days for a major publicly traded company:

86, 79, 92, 84, 69, 88, 91, 83, 96, 78, 82, 85.

[Figure: line chart of the stock price (50 to 100) over days 1 to 12]

Example 14 (cont'd)

Using the twelve stock prices, we can find the median, 25th, and 75th percentiles as follows:

Quartile 3: $L_{75} = (12 + 1)\frac{75}{100} = 9.75$th observation
Median: $L_{50} = (12 + 1)\frac{50}{100} = 6.50$th observation
Quartile 1: $L_{25} = (12 + 1)\frac{25}{100} = 3.25$th observation

Example 14 (cont'd)

To locate the values, the first step is to organize the data in increasing order:

Rank   Price
12     96
11     92
10     91
9      88
8      86
7      85
6      84
5      83
4      82
3      79
2      78
1      69

75th percentile: price at the 9.75th observation = 88 + 0.75(91 - 88) = 90.25
50th percentile (median): price at the 6.50th observation = 84 + 0.5(85 - 84) = 84.50
25th percentile: price at the 3.25th observation = 79 + 0.25(82 - 79) = 79.75

Interquartile Range

The interquartile range is the distance between the third quartile Q3 and the first quartile Q1. This distance will include the middle 50 percent of the observations.

Interquartile range = Q3 - Q1
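The percentile location rule and the interpolation between neighbouring ranks can be sketched as a small Python function. The name `percentile` and the handling of locations that fall outside ranks 1 to n are assumptions made for this illustration; the data are the twelve stock prices from Example 14.

```python
def percentile(values, p):
    """Value at location L_p = (n + 1) * p / 100, interpolating between ranks."""
    ordered = sorted(values)
    n = len(ordered)
    location = (n + 1) * p / 100
    rank = int(location)            # 1-based rank of the lower neighbour
    fraction = location - rank      # how far to move toward the upper neighbour
    if rank < 1:                    # assumption: clamp to the smallest value
        return ordered[0]
    if rank >= n:                   # assumption: clamp to the largest value
        return ordered[-1]
    lower, upper = ordered[rank - 1], ordered[rank]
    return lower + fraction * (upper - lower)


prices = [86, 79, 92, 84, 69, 88, 91, 83, 96, 78, 82, 85]
q1 = percentile(prices, 25)
med = percentile(prices, 50)
q3 = percentile(prices, 75)
iqr = q3 - q1

print(q1, med, q3, iqr)
```

Other conventions for locating percentiles exist, so library functions may return slightly different quartiles for the same data.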

Example 15

For a set of observations the third quartile is 24 and the first quartile is 10. What is the interquartile range?

The interquartile range is 24 - 10 = 14. Fifty percent of the observations will occur between 10 and 24.

Box Plots

A box plot is a graphical display, based on quartiles, that helps to picture a set of data.

Five pieces of data are needed to construct a box plot: the minimum value, the first quartile, the median, the third quartile, and the maximum value.

Example 16

Based on a sample of 20 deliveries, Buddy's Pizza determined the following information. The minimum delivery time was 13 minutes and the maximum 30 minutes. The first quartile was 15 minutes, the median 18 minutes, and the third quartile 22 minutes. Develop a box plot for the delivery times.

Example 16 (cont'd)

[Figure: box plot of the delivery times on an axis from 12 to 32 minutes, marking Min, Q1, Median, Q3, and Max]

Coefficient of Variation

Relative dispersion: the coefficient of variation is the ratio of the standard deviation to the arithmetic mean, expressed as a percentage:

$$CV = \frac{s}{\bar{X}}(100\%)$$

Skewness

Skewness is the measurement of the lack of symmetry of the distribution.

The coefficient of skewness can range from -3.00 up to 3.00 when using the following formula:

$$sk = \frac{3(\bar{X} - \text{Median})}{s}$$

A value of 0 indicates a symmetric distribution.

Some software packages use a different formula which results in a wider range for the coefficient.

Example 14 revisited

Using the twelve stock prices, we find the mean to be 84.42, the standard deviation 7.18, and the median 84.5.

86 79 92 84 69 88
91 83 96 78 82 85

Coefficient of variation:

$$CV = \frac{s}{\bar{X}}(100\%) = 8.5\%$$

Coefficient of skewness:

$$sk = \frac{3(\bar{X} - \text{Median})}{s} = -0.035$$
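The coefficient of variation and the coefficient of skewness for the twelve stock prices can be reproduced with the standard `statistics` module. This is a minimal sketch with illustrative variable names; it uses the sample standard deviation and unrounded intermediate values.

```python
from statistics import mean, median, stdev

prices = [86, 79, 92, 84, 69, 88, 91, 83, 96, 78, 82, 85]

x_bar = mean(prices)
s = stdev(prices)                       # sample standard deviation (n - 1)

cv = s / x_bar * 100                    # coefficient of variation, in percent
sk = 3 * (x_bar - median(prices)) / s   # coefficient of skewness, 3(mean - median)/s

print(round(x_bar, 2), round(s, 2), round(cv, 1), round(sk, 3))
```

Substituting the rounded figures 84.42 and 7.18 into the skewness formula gives about -0.033, so the reported -0.035 reflects the unrounded mean and standard deviation.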

Relationship Between Two Variables

Univariate data (single variable)
Bivariate data (two variables)
Scatter diagram
Contingency table

Scatter Diagram

Scatter diagram: a technique used to show the relationship between variables.
Variables must be at least interval scaled.
The relationship can be positive (direct) or negative (inverse).

Example 14 revisited

The twelve days of stock prices and the overall market index on each day are given as follows:

Index (000s)   Price
8.0            96
7.5            92
7.5            91
7.3            88
7.2            86
7.2            85
7.1            84
7.1            83
7.0            82
6.2            79
6.2            78
5.1            69

[Figure: scatter diagram titled "Relationship between Market Index and Stock Price", with Price (50 to 100) plotted against Index]

Contingency Table

A contingency table is used to classify observations according to two identifiable characteristics.
A contingency table is a cross tabulation that simultaneously summarizes two variables of interest.
Contingency tables are used when one or both variables are nominally scaled.

Example 17

Weight Loss

45 adults, all 60 pounds overweight, are randomly assigned to three weight loss programs. Twenty weeks into the program, a researcher gathers data on weight loss and divides the loss into three categories: less than 20 pounds, 20 up to 40 pounds, 40 or more pounds. Here are the results.

Example 17 (cont'd)

Weight loss plan   Less than 20 pounds   20 up to 40 pounds   40 pounds or more

Plan 1

Plan 2

12

Plan 3

12

Compare the weight loss under the three plans.



Practice Problems


Assignment-2

Problem 11 (Page 62)


Problem 55 (Page 84)

(Problem 13)

Problem 21 (Page 68)

(Problem 59 (Page 88))

Problems 11, 13 (Page 108)

(Problem 25 (Page 69))

Problem 27 (Page 70)

(Problems 11, 13 (Page 110))

Problem 15 (Page 111)

(Problem 31 (Page 71))

Problem 42 (Page 76)

(Problem 15 (Page 113))

Problem 20 (Page 113)

(Problem 46 (Page 79))

Problem 47 (Page 79)

Problem 25 (Page 117)

(Problem 51 (Page 82))

(Problem 21 (Page 118))

Problem 49 (Page 81)


(Problem 53 (Page 84))