Central Tendency and Variability

Descriptive Statistics
The goal of descriptive statistics is to summarize a collection of data in a clear and understandable way.
What is the pattern of scores over the range of possible values? Where, on the scale of possible scores, is a point that best represents the set of scores? Do the scores cluster about their central point or do they spread out around it?
Central Tendency
Measure of Central Tendency:
A single summary score that best describes the central location of an entire distribution of scores.
The typical score. The center of the distribution.
One distribution can have multiple locations where scores cluster.

Must decide which measure is best for a given situation.
Central Tendency
Measures of Central Tendency:
Mean
The sum of all scores divided by the number of scores.
Median
The value that divides the distribution in half when observations are ordered.
Mode
The most frequent score.
Central Tendency Example: Mode

52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480, 643, 693, 732, 749, 750, 791, 891
Mode: the most frequent observation.
Mode(s) for hotel rates: 264, 317, 384
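The mode lookup above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `modes` is an assumption made for the example, and the data are the hotel rates as listed.

```python
from collections import Counter

# Hotel rates exactly as listed on the slide.
rates = [52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283,
         303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480,
         643, 693, 732, 749, 750, 791, 891]

def modes(scores):
    """Return every score tied for the highest frequency, in ascending order."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(score for score, count in counts.items() if count == top)

hotel_modes = modes(rates)
print(hotel_modes)
```

Returning a list rather than a single value reflects the point that one distribution can have several locations where scores cluster.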
Pros and Cons of the Mode

Pros
Good for nominal data. Good when there are two typical scores. Easiest to compute and understand. The score comes from the data set.
Cons
Ignores most of the information in a distribution. Small samples may not have a mode.
Central Tendency Example: Median

52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480, 643, 693, 732, 749, 750, 791, 891
The median is the middle value when observations are ordered.
To find the middle, count in (N+1)/2 scores when observations are ordered lowest to highest.
Median hotel rate:

(35+1)/2 = 18, so the median is the 18th ordered score: 317
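The counting rule for an odd number of scores can be sketched as follows. This is a hedged illustration that assumes the listed hotel rates; the variable names are chosen for the example only.

```python
# Hotel rates exactly as listed on the slide.
rates = [52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283,
         303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480,
         643, 693, 732, 749, 750, 791, 891]

ordered = sorted(rates)           # the rule only works on ordered scores
n = len(ordered)
position = (n + 1) // 2           # 1-based position of the middle score (odd n)
median_rate = ordered[position - 1]  # convert to a 0-based list index

print(position, median_rate)
```

Sorting first matters here because the list as given is not in strictly ascending order.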
Finding the median with an even number of scores.

2, 2, 3, 5, 6, 7, 7, 7, 8, 9
With an even number of scores, the median is the average of the middle two observations when observations are ordered.
Find the average of the N/2 and the (N+2)/2 score.
N/2 = 5th score, (N+2)/2 = 6th score
Add middle two observations and divide by two.

(6+7)/2 = 6.5
Median is 6.5
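For an even number of scores, the N/2 and (N+2)/2 rule might be written like this. The helper name `median_even` is hypothetical; Python's standard `statistics.median` is used only as a cross-check.

```python
from statistics import median

scores = [2, 2, 3, 5, 6, 7, 7, 7, 8, 9]

def median_even(values):
    """Average the N/2-th and (N+2)/2-th ordered scores (even N only)."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 != 0:
        raise ValueError("this rule applies only to an even number of scores")
    lower = ordered[n // 2 - 1]          # the N/2-th score (1-based)
    upper = ordered[(n + 2) // 2 - 1]    # the (N+2)/2-th score (1-based)
    return (lower + upper) / 2

result = median_even(scores)
print(result, median(scores))
```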
Pros and Cons of Median

Pros
Not influenced by extreme scores or skewed distributions. Good with ordinal data. Easier to compute than the mean.
Cons
May not exist in the data. Doesn't take actual values into account.
Mean
Is the balance point of a distribution. The sum of negative deviations from the mean exactly equals the sum of positive deviations from the mean.
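The balance-point property can be checked numerically. The ten scores below are an illustrative set chosen for this sketch (they are not tied to the hotel data), and the variable names are assumptions.

```python
# Illustrative scores; any data set would show the same property.
scores = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]

mean = sum(scores) / len(scores)              # sum of scores / number of scores
deviations = [x - mean for x in scores]

below = sum(d for d in deviations if d < 0)   # total of negative deviations
above = sum(d for d in deviations if d > 0)   # total of positive deviations

print(mean, below, above)
```

The negative and positive totals have the same size and opposite signs, which is why the deviations as a whole sum to zero.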
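Mean
Population
\[ \mu = \frac{\sum X}{N} \]
\( \mu \) (mu): the population mean
\( \sum X \) (sigma): the sum of X; add up all scores
\( N \): the total number of scores in a population
Sample
\[ \bar{X} = \frac{\sum X}{n} \]
\( \bar{X} \) (X bar): the sample mean
\( \sum X \) (sigma): the sum of X; add up all scores
\( n \): the total number of scores in a sample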
Central Tendency Example: Mean

52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480, 643, 693, 732, 749, 750, 791, 891
Mean hotel rate:
\[ \bar{X} = \frac{\sum X}{n} = \frac{13005}{35} \approx 371.6 \]
Mean hotel rate: $371.60
Pros and Cons of the Mean

Pros
Mathematical center of a distribution. Just as far from scores above it as it is from scores below it. Good for interval and ratio data. Does not ignore any information. Inferential statistics is based on mathematical properties of the mean.
Cons
Influenced by extreme scores and skewed distributions. May not exist in the data.
The effect of skew on average.

In a normal distribution, the mean, median, and mode are the same. In a skewed distribution, the mean is pulled toward the tail.
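A small numeric sketch of the pull toward the tail: the scores and the added extreme value of 90 are hypothetical, chosen only to make the effect visible.

```python
from statistics import mean, median

scores = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]   # illustrative, roughly symmetric
skewed = scores + [90]                     # one hypothetical extreme high score

mean_shift = mean(skewed) - mean(scores)       # how far the mean moves
median_shift = median(skewed) - median(scores) # how far the median moves

print(mean_shift, median_shift)
```

In this sketch the single extreme score moves the mean by several points, while the median moves by only half a point.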
Which average?
Each measure contains a different kind of information.
For example, all three measures are useful for summarizing the distribution of American household incomes.
In 1998, the income common to the greatest number of households was $25,000. Half the households earned less than $38,885. The mean income was $50,600.
Reporting only one measure of central tendency might be misleading and perhaps reflect a bias.
Which average?
Wal-Mart's average wage is around $10 an hour, nearly double the federal minimum wage. The truth is that our wages are competitive with comparable retailers in each of the more than 3,500 communities we serve, with one exception: a handful of urban markets with unionized grocery workers. Few people realize that about 74 percent of Wal-Mart hourly store associates work full-time, compared to 20 to 40 percent at comparable retailers. This means Wal-Mart spends more broadly on health benefits than do most big retailers, whose part-timers are not offered health insurance. You may not be aware that we are one of the few retail firms that offer health benefits to part-timers. Premiums begin at less than $40 a month for an individual and less than $155 per month for a family.
Descriptive Statistics
The goal of descriptive statistics is to summarize a collection of data in a clear and understandable way.
What is the pattern of scores over the range of possible values? Where, on the scale of possible scores, is a point that best represents the set of scores? Do the scores cluster about their central point or do they spread out around it?
Measures of Variability
A single summary figure that describes the spread of observations within a distribution.
Measures of Variability
Range
Difference between the smallest and largest observations.
Interquartile Range
Range of the middle half of scores.
Variance
Mean of all squared deviations from the mean.
Standard Deviation
Rough measure of the average amount by which observations deviate from the mean. The square root of the variance.
Variability Example: Range

Las Vegas Hotel Rates
52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480, 643, 693, 732, 749, 750, 791, 891
Range: 891 - 52 = 839
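The range calculation is a one-liner in Python. A minimal sketch, assuming the hotel rates as listed:

```python
# Hotel rates exactly as listed on the slide.
rates = [52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283,
         303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480,
         643, 693, 732, 749, 750, 791, 891]

smallest, largest = min(rates), max(rates)
rate_range = largest - smallest   # depends on just these two scores

print(smallest, largest, rate_range)
```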
Pros and Cons of the Range

Pros
Very easy to compute. Scores exist in the data set.
Cons
Value depends only on two scores. Very sensitive to outliers. Influenced by sample size (the larger the sample, the larger the range tends to be).
Variability Example: Interquartile Range

Las Vegas Hotel Rates
52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480, 643, 693, 732, 749, 750, 791, 891
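Interquartile Range:
(35+1)/4 = 9, so count in 9 scores from each end of the ordered data: the 9th score is 257 and the 27th score is 472.
472 - 257 = 215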
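The counting rule used for the interquartile range can be sketched as below. The function name `iqr_by_counting` is hypothetical, and the sketch assumes (N+1) divides evenly by 4, as it does for these 35 rates; library quantile routines use other interpolation rules and may give different quartiles.

```python
# Hotel rates exactly as listed on the slide.
rates = [52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283,
         303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480,
         643, 693, 732, 749, 750, 791, 891]

def iqr_by_counting(scores):
    """Count in (N+1)/4 scores from each end of the ordered data."""
    ordered = sorted(scores)
    n = len(ordered)
    k = (n + 1) // 4            # assumes (n + 1) is a multiple of 4
    lower = ordered[k - 1]      # k-th score from the bottom
    upper = ordered[n - k]      # k-th score from the top
    return lower, upper, upper - lower

lower, upper, iqr = iqr_by_counting(rates)
print(lower, upper, iqr)
```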
Pros and Cons of the Interquartile Range

Pros
Fairly easy to compute. Scores exist in the data set. Eliminates influence of extreme scores.
Cons
Discards much of the data.
Variance
The average amount that a score deviates from the typical score.
Score - Mean = Difference Score
Average of Difference Scores = 0
In order to make this number not 0, square the difference scores (no negatives to cancel out the positives).
Variance: Definitional Formula

Population
\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \]
Sample
\[ S^2 = \frac{\sum (X - \bar{X})^2}{n} \]
Variance
Use the definitional formula to calculate the variance.
\[ S^2 = \frac{\sum (X - \bar{X})^2}{n} \]
\[ S^2 = \frac{(3-6)^2 + (4-6)^2 + (4-6)^2 + (4-6)^2 + (6-6)^2 + (7-6)^2 + (7-6)^2 + (8-6)^2 + (8-6)^2 + (9-6)^2}{10} \]
\[ S^2 = \frac{40}{10} = 4.0 \]
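The definitional formula translates directly into code. This sketch divides by n, matching the slide's worked example; `statistics.pvariance`, which also divides by the number of scores, is included only as a cross-check.

```python
from statistics import pvariance

scores = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]

n = len(scores)
mean = sum(scores) / n
squared_deviations = [(x - mean) ** 2 for x in scores]

# Mean of the squared deviations from the mean (divides by n, as on the slide).
variance = sum(squared_deviations) / n

print(sum(squared_deviations), variance, pvariance(scores))
```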
Variance: Computational Formula

Population
\[ \sigma^2 = \frac{N \sum X^2 - (\sum X)^2}{N^2} \]
Sample
\[ S^2 = \frac{n \sum X^2 - (\sum X)^2}{n^2} \]
Variance
Use the computational formula to calculate the variance.
\[ S^2 = \frac{n \sum X^2 - (\sum X)^2}{n^2} \]
\[ S^2 = \frac{10(400) - (60)^2}{10^2} = \frac{4000 - 3600}{100} = 4.0 \]

X       X^2
3       9
4       16
4       16
4       16
6       36
7       49
7       49
8       64
8       64
9       81
Sum: 60    Sum: 400
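The computational formula needs only two running totals, the sum of X and the sum of X squared. A minimal sketch on the same ten scores, with variable names chosen for illustration:

```python
scores = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]

n = len(scores)
sum_x = sum(scores)                     # sum of X
sum_x2 = sum(x * x for x in scores)     # sum of X squared

# S^2 = (n * sum(X^2) - (sum(X))^2) / n^2
variance = (n * sum_x2 - sum_x ** 2) / n ** 2

print(sum_x, sum_x2, variance)
```

Because it avoids computing each deviation, this form is convenient for hand calculation from a two-column table of X and X squared.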
Variability Example: Variance

\[ S^2 = \frac{n \sum X^2 - (\sum X)^2}{n^2} \]
\[ S^2 = \frac{35(6686202) - (13386)^2}{35^2} = \frac{234017070 - 179184996}{1225} = 44760.88 \]

X       X^2
472     222784
303     91809
280     78400
282     79524
417     173889
400     160000
254     64516
205     42025
384     147456
264     69696
317     100489
76      5776
643     413449
480     230400
136     18496
250     62500
100     10000
732     535824
317     100489
264     69696
384     147456
750     562500
402     161604
422     178084
373     139129
325     105625
313     97969
749     561001
791     625681
196     38416
891     793881
283     80089
52      2704
186     34596
693     480249
Sum: 13386    Sum: 6686202
Pros and Cons of Variance

Pros
Takes all data into account. Lends itself to computation of other stable measures (and is a prerequisite for many of them).
Cons
Hard to interpret. Can be influenced by extreme scores.
Standard Deviation
To undo the squaring of difference scores, take the square root of the variance. Return to original units rather than squared units.
Standard Deviation
Rough measure of the average amount by which observations deviate on either side of the mean. The square root of the variance.
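Population
\[ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}} \]
\[ \sigma = \sqrt{\frac{N \sum X^2 - (\sum X)^2}{N^2}} \]
Sample
\[ S = \sqrt{S^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} \]
\[ S = \sqrt{\frac{n \sum X^2 - (\sum X)^2}{n^2}} \]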
Variability Example: Standard Deviation
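Definitional formula:
\[ S = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} \]
\[ S = \sqrt{\frac{(3-6)^2 + (4-6)^2 + (4-6)^2 + (4-6)^2 + (6-6)^2 + (7-6)^2 + (7-6)^2 + (8-6)^2 + (8-6)^2 + (9-6)^2}{10}} \]
\[ S = \sqrt{\frac{40}{10}} = 2.0 \]
Computational formula:
\[ S = \sqrt{\frac{n \sum X^2 - (\sum X)^2}{n^2}} \]
\[ S = \sqrt{\frac{10(400) - (60)^2}{10^2}} = \sqrt{\frac{4000 - 3600}{100}} = \sqrt{4.0} = 2.0 \]
Mean: 6 Standard Deviation: 2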
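Taking the square root of the variance can be sketched as follows, again dividing by n as in the worked example. `statistics.pstdev` is the standard-library function that uses the same divisor and serves here as a cross-check.

```python
import math
from statistics import pstdev

scores = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]

n = len(scores)
mean = sum(scores) / n
variance = sum((x - mean) ** 2 for x in scores) / n

# The square root returns the measure to the original units of the scores.
std_dev = math.sqrt(variance)

print(mean, std_dev, pstdev(scores))
```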
Variability Example: Standard Deviation

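[Figure: histogram of hotel rates. Horizontal axis: Rates (0-99, 100-199, 200-299, 300-399, 400-499, 500-599, 600-699, 700-799, 800-899). Vertical axis: Frequency (0 to 9).]
Mean: $371.60
\[ S = \sqrt{\frac{35(6686202) - (13386)^2}{35^2}} = \sqrt{\frac{234017070 - 179184996}{1225}} \]
\[ S = \sqrt{44760.88} \approx 211.57 \]
Standard Deviation: $211.57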
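The hotel-rate arithmetic can be reproduced from the summary totals alone. This sketch takes n = 35 and the two sums stated in the variance table (sum of X = 13386, sum of X squared = 6686202) as given inputs rather than recomputing them from a list.

```python
import math

# Totals as stated in the slide's variance table.
n = 35
sum_x = 13386
sum_x2 = 6686202

numerator = n * sum_x2 - sum_x ** 2      # 234017070 - 179184996
variance = numerator / n ** 2            # divide by n squared (1225)
std_dev = math.sqrt(variance)            # back to dollars

print(round(variance, 2), round(std_dev, 2))
```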
Pros and Cons of Standard Deviation

Pros
Lends itself to computation of other stable measures (and is a prerequisite for many of them). Roughly the average of deviations around the mean. For roughly normal distributions, the majority of the data (about 68%) falls within one standard deviation above or below the mean.
Cons
Influenced by extreme scores.
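The idea that most scores lie within one standard deviation of the mean can be checked on a small data set. The ten scores below are illustrative, and the proportion found applies to this set only.

```python
import math

scores = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]   # illustrative scores

n = len(scores)
mean = sum(scores) / n
std_dev = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)

low, high = mean - std_dev, mean + std_dev       # one SD either side of the mean
within = [x for x in scores if low <= x <= high]

print(low, high, len(within), n)
```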
Mean and Standard Deviation

Using the mean and standard deviation together:
Is an efficient way to describe a distribution with just two numbers. Allows a direct comparison between distributions that are on different scales.
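One common way to make such a comparison, not named in the slides, is to express each score as a number of standard deviations from its own mean (a standardized score). The two scales and all numbers below are hypothetical, chosen only to illustrate the idea.

```python
def standardized(score, mean, std_dev):
    """Distance of a score from its mean, in standard deviation units."""
    return (score - mean) / std_dev

# Hypothetical scale A: mean 70, standard deviation 10.
z_a = standardized(85, mean=70, std_dev=10)

# Hypothetical scale B: mean 500, standard deviation 100.
z_b = standardized(600, mean=500, std_dev=100)

print(z_a, z_b)
```

Although 600 is the larger raw number, the score of 85 sits further above its own mean once both are expressed in standard deviation units.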
