Descriptive Statistics
The goal of descriptive statistics is to summarize a collection of data in a clear and understandable way.
What is the pattern of scores over the range of possible values? Where, on the scale of possible scores, is a point that best represents the set of scores? Do the scores cluster about their central point or do they spread out around it?
Central Tendency
Measure of Central Tendency:
A single summary score that best describes the central location of an entire distribution of scores.
The typical score. The center of the distribution.
Measures of Central Tendency:
Mean
The sum of all scores divided by the number of scores.
Median
The value that divides the distribution in half when observations are ordered.
Mode
The most frequent score.
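As a minimal sketch, the three measures can be computed with Python's standard `statistics` module. The ten scores are borrowed from the variance worked example in this document; the variable names are illustrative, not part of the original slides.

```python
import statistics

# Ten scores borrowed from the document's variance worked example.
scores = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]

# Mean: the sum of all scores divided by the number of scores.
mean = statistics.mean(scores)

# Median: the value that divides the ordered scores in half.
# With an even number of scores it is the midpoint of the two middle scores.
median = statistics.median(scores)

# Mode: the most frequent score.
mode = statistics.mode(scores)

print("mean:", mean, "median:", median, "mode:", mode)
```

For these scores the mean is 6, the median is 6.5, and the mode is 4. Note that the median, 6.5, is not itself one of the scores.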
Cons of the mode
Ignores most of the information in a distribution. Small samples may not have a mode.
Median is 6.5
Cons of the median
May not exist in the data. Doesn't take actual values into account.
Mean
Is the balance point of a distribution. The sum of negative deviations from the mean exactly equals the sum of positive deviations from the mean.
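The balance-point property can be checked numerically. The sketch below assumes the ten scores from the document's variance worked example and simply adds up the deviations on each side of the mean.

```python
# Ten scores borrowed from the document's variance worked example.
scores = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]

mean = sum(scores) / len(scores)  # 60 / 10

# Deviation of each score from the mean.
deviations = [x - mean for x in scores]

# Totals of the deviations below and above the mean.
negative_total = sum(d for d in deviations if d < 0)
positive_total = sum(d for d in deviations if d > 0)

print("negative deviations:", negative_total)
print("positive deviations:", positive_total)
print("all deviations:", sum(deviations))
```

Here the negative deviations total -9 and the positive deviations total +9, so the deviations sum to 0. This is also why the variance squares the deviations before averaging them.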
Mean
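Population
\[ \mu = \frac{\sum X}{N} \]
\(\mu\) (mu) is the population mean. \(N\) is the total number of scores in a population. \(\sum\) (sigma) means "the sum of": \(\sum X\) says to add up all scores.
Sample
\[ \bar{X} = \frac{\sum X}{n} \]
\(\bar{X}\) (X bar) is the sample mean. \(n\) is the total number of scores in a sample.
\[ \bar{X} = \frac{\sum X}{n} = \frac{13005}{35} \approx 371.6 \]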
Mean hotel rate: $371.60
Cons of the mean
Influenced by extreme scores and skewed distributions. May not exist in the data.
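To illustrate the sensitivity to extreme scores, the sketch below takes the ten scores from the document's variance worked example and replaces the largest one with a made-up extreme value (90). The extreme value is purely hypothetical and does not come from the slides.

```python
import statistics

# Ten scores borrowed from the document's variance worked example.
scores = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]

# Hypothetical: swap the largest score (9) for an extreme score (90).
with_outlier = scores[:-1] + [90]

mean_before = statistics.mean(scores)
mean_after = statistics.mean(with_outlier)

median_before = statistics.median(scores)
median_after = statistics.median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

One extreme score pulls the mean from 6 to 14.1, while the median stays at 6.5.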
Which average?
Each measure contains a different kind of information.
For example, all three measures are useful for summarizing the distribution of American household incomes.
In 1998, the income common to the greatest number of households was $25,000. Half the households earned less than $38,885. The mean income was $50,600.
Reporting only one measure of central tendency might be misleading and perhaps reflect a bias.
Which average?
Wal-Mart's average wage is around $10 an hour, nearly double the federal minimum wage. The truth is that our wages are competitive with comparable retailers in each of the more than 3,500 communities we serve, with one exception: a handful of urban markets with unionized grocery workers. Few people realize that about 74 percent of Wal-Mart hourly store associates work full-time, compared to 20 to 40 percent at comparable retailers. This means Wal-Mart spends more broadly on health benefits than do most big retailers, whose part-timers are not offered health insurance. You may not be aware that we are one of the few retail firms that offer health benefits to part-timers. Premiums begin at less than $40 a month for an individual and less than $155 per month for a family.
Measures of Variability
A single summary figure that describes the spread of observations within a distribution.
Range
Difference between the smallest and largest observations.
Interquartile Range
Range of the middle half of scores.
Variance
Mean of all squared deviations from the mean.
Standard Deviation
Rough measure of the average amount by which observations deviate from the mean. The square root of the variance.
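As a rough sketch, all four measures can be computed with Python's standard `statistics` module (Python 3.8 or later for `quantiles`). The ten scores are the ones used in the document's variance worked example. The quartile method chosen here is an assumption: the "exclusive" method places the quartiles at positions (n + 1)/4 and 3(n + 1)/4 in the ordered scores, and other conventions can give slightly different values.

```python
import statistics

# Ten scores borrowed from the document's variance worked example.
scores = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]

# Range: difference between the largest and smallest observations.
value_range = max(scores) - min(scores)

# Interquartile range: spread of the middle half of the scores.
# Assumption: quartiles located with the "exclusive" (n + 1) position rule.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="exclusive")
iqr = q3 - q1

# Variance: mean of the squared deviations from the mean.
# pvariance divides by the number of scores, as this document's formulas do
# (statistics.variance would divide by n - 1 instead).
variance = statistics.pvariance(scores)

# Standard deviation: square root of the variance.
std_dev = statistics.pstdev(scores)

print("range:", value_range)
print("IQR:", iqr)
print("variance:", variance)
print("standard deviation:", std_dev)
```

For these scores the range is 6, the interquartile range is 4, the variance is 4, and the standard deviation is 2.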
Cons of the range
Value depends only on two scores. Very sensitive to outliers. Influenced by sample size (the larger the sample, the larger the range tends to be).
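Interquartile Range:
With 35 ordered scores, the quartile position is (35 + 1)/4 = 9, so the quartiles are the 9th score from each end.
Interquartile range: 472 - 257 = 215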
Cons of the interquartile range
Discards much of the data.
Variance
The average amount that a score deviates from the typical score.
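Score - Mean = Difference Score.
Average of Difference Scores = 0.
In order to make this number not 0, square the difference scores (no negatives to cancel out the positives).
Population (\(\sigma^2\), sigma squared):
\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \]
Sample:
\[ S^2 = \frac{\sum (X - \bar{X})^2}{n} \]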
Variance
Use the definitional formula to calculate the variance.
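Definitional formula:
\[ S^2 = \frac{\sum (X - \bar{X})^2}{n} \]
\[ S^2 = \frac{(3-6)^2 + (4-6)^2 + (4-6)^2 + (4-6)^2 + (6-6)^2 + (7-6)^2 + (7-6)^2 + (8-6)^2 + (8-6)^2 + (9-6)^2}{10} = \frac{40}{10} = 4.0 \]
Computational formula:
\[ S^2 = \frac{n \sum X^2 - (\sum X)^2}{n^2} \]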
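The definitional and computational formulas can be checked against each other in a few lines of Python. This is a minimal sketch using the ten scores from the worked example. The function names are illustrative, and both functions divide by n, as the formulas in this document do.

```python
def variance_definitional(scores):
    """Mean of the squared deviations from the mean (divides by n)."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / n


def variance_computational(scores):
    """(n * sum of X^2 - (sum of X)^2) / n^2, using only running totals."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return (n * sum_x2 - sum_x ** 2) / n ** 2


# Ten scores from the worked example: sum of X = 60, sum of X^2 = 400.
scores = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]

v_def = variance_definitional(scores)
v_comp = variance_computational(scores)

print("definitional: ", v_def)
print("computational:", v_comp)
```

Both functions return 4.0 for these scores. The computational form needs only n, the sum of the scores, and the sum of the squared scores, which is convenient when working from a table of X and X squared.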
Variance
Use the computational formula to calculate the variance.
\[ S^2 = \frac{n \sum X^2 - (\sum X)^2}{n^2} \]

X      X^2
472    222784
303    91809
280    78400
282    79524
417    173889
400    160000
254    64516
205    42025
384    147456
264    69696
317    100489
76     5776
643    413449
480    230400
136    18496
250    62500
100    10000
732    535824
317    100489
264    69696
384    147456
750    562500
402    161604
422    178084
373    139129
325    105625
313    97969
749    561001
791    625681
196    38416
891    793881
283    80089
52     2704
186    34596
693    480249
Sum: 13386    Sum: 6686202
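The table's totals can be fed straight into the computational formula. The sketch below re-enters the 35 values exactly as they appear in the table, recomputes both sums, and applies the formula with n in the denominator. The variable names are illustrative.

```python
import math

# The 35 values from the X column of the table, in table order.
hotel_rates = [
    472, 303, 280, 282, 417, 400, 254, 205, 384, 264,
    317, 76, 643, 480, 136, 250, 100, 732, 317, 264,
    384, 750, 402, 422, 373, 325, 313, 749, 791, 196,
    891, 283, 52, 186, 693,
]

n = len(hotel_rates)
sum_x = sum(hotel_rates)                   # sum of X
sum_x2 = sum(x * x for x in hotel_rates)   # sum of X^2

# Computational formula: (n * sum of X^2 - (sum of X)^2) / n^2
variance = (n * sum_x2 - sum_x ** 2) / n ** 2

# Square root of the variance returns to the original units.
std_dev = math.sqrt(variance)

print("n:", n)
print("sum of X:", sum_x)
print("sum of X^2:", sum_x2)
print("variance:", round(variance, 2))
print("standard deviation:", round(std_dev, 2))
```

The recomputed sums agree with the table (13386 and 6686202), and the variance rounds to 44760.88, whose square root rounds to 211.57.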
Cons of the variance
Hard to interpret. Can be influenced by extreme scores.
Standard Deviation
To undo the squaring of difference scores, take the square root of the variance. This returns the measure to the original units rather than squared units.
Standard Deviation
Rough measure of the average amount by which observations deviate on either side of the mean. The square root of the variance.
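Population
\[ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\frac{N \sum X^2 - (\sum X)^2}{N^2}} \]
Sample
\[ S = \sqrt{S^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{n \sum X^2 - (\sum X)^2}{n^2}} \]
Definitional formula:
\[ S = \sqrt{\frac{(3-6)^2 + (4-6)^2 + (4-6)^2 + (4-6)^2 + (6-6)^2 + (7-6)^2 + (7-6)^2 + (8-6)^2 + (8-6)^2 + (9-6)^2}{10}} = \sqrt{\frac{40}{10}} = \sqrt{4.0} = 2.0 \]
Computational formula:
\[ S = \sqrt{\frac{n \sum X^2 - (\sum X)^2}{n^2}} = \sqrt{\frac{10(400) - (60)^2}{10^2}} = \sqrt{4.0} = 2.0 \]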
[Figure: histogram titled "hotel rates"; x-axis "Rates" in bins 0-99, 100-199, 200-299, 300-399, 400-499, 500-599, 600-699, 700-799, 800-899; y-axis from 0 to 4]
Mean: $371.60
Standard Deviation: \( S = \sqrt{44760.88} = \$211.57 \)
Cons of the standard deviation
Influenced by extreme scores.