
Chapter 3 - Outline

Univariate Population Parameters and Sample Statistics


A. Summation notation
X = variable
Xi = the score for variable X for a particular individual or object i
i = serves to identify one individual or object from another
Σ = the sum of (capital sigma)
i = 1 is the lower limit or beginning of the summation
Take the following set of scores or ages:
X1 = 7, X2 = 11, X3 = 18, X4 = 20, X5 = 24
Here i = 1, 2, 3, 4, 5 or i = 1, ..., 5
$\sum_{i=1}^{5} X_i$ = 7 + 11 + 18 + 20 + 24 = 80
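The sum in this example can be checked with a short Python sketch. The variable names `ages` and `total` are illustrative choices, not part of the outline.

```python
# Scores X1..X5 from the example above.
ages = [7, 11, 18, 20, 24]

# Sum of X_i for i = 1, ..., 5 (Python indexes from 0, so X1 is ages[0]).
total = sum(ages)
print(total)  # 80
```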
The most frequently used summation notation for sample data is $\sum_{i=1}^{n} X_i$, where n is the number of scores in the sample.
B. Measures of central tendency
mode, median, and mean.
1. The mode
The mode is defined as that value in a distribution of scores that occurs most
frequently.
Bimodal distributions can occur.
Some people report both modes, while others average the two modes if they are adjacent.
If they are not adjacent, do not average them.
The mode is determined in the same way whether you are talking about a population parameter
or a sample statistic.
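As a minimal sketch, the mode (or modes) can be found with `multimode` from Python's standard `statistics` module. Both score lists below are hypothetical and were made up for illustration.

```python
from statistics import multimode

# Hypothetical scores: one clear mode (11), then a bimodal set (11 and 12).
unimodal = [7, 11, 11, 18, 20]
bimodal = [7, 11, 11, 12, 12, 20]

print(multimode(unimodal))  # [11]
print(multimode(bimodal))   # [11, 12]
```

In the bimodal case the two modes are adjacent, so by the convention noted above some people would report their average, 11.5, instead of both values.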
General characteristics:
a. simple to obtain (+)
b. does not always have a unique value (-)
c. not a function of all of the scores (-)
d. difficult to deal with mathematically (-)
e. can be used with any type of measurement scale
B. Measures of central tendency: Median
2. Median
The median is defined as that score which divides the distribution of scores into
two equal parts.
In other words, 50% of the scores fall above the median and 50% of the
scores fall below the median.
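Odd number of untied scores (the median is the middle score):
1 3 7 11 21, median = 7
Even number of untied scores (the median is the average of the two middle scores):
1 3 5 11 21 32, median = (5 + 11) / 2 = 8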
Tied scores:

Determine 50th percentile or Q2
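The two untied cases can be verified with `median` from Python's standard `statistics` module. This is only a sketch: it does not cover the tied-scores case, which the outline handles through the 50th percentile.

```python
from statistics import median

odd_scores = [1, 3, 7, 11, 21]
even_scores = [1, 3, 5, 11, 21, 32]

# Odd count: the middle score. Even count: average of the two middle scores.
print(median(odd_scores))   # 7
print(median(even_scores))  # 8.0
```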


General characteristics:
a. not influenced by extreme scores (outliers) (+)
b. not a function of all of the scores (-)
c. difficult to deal with mathematically (-)
d. always has a unique value (+)
e. can be used with any type of measurement scale except nominal
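B. Measures of central tendency: Mean
3. Mean
Population mean μ (mu): $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
Sample mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$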
General characteristics:
a. function of every score (+)
b. influenced by extreme scores (-)
c. always has a unique value (+)
d. easy to deal with mathematically; most stable of the measures of central tendency
(+)
e. only appropriate for interval and ratio measurement scales (What about Likert?)
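Characteristic b can be illustrated with a small sketch that compares the mean and the median before and after one score is replaced by an extreme value. The outlier 240 is a hypothetical value chosen for illustration; the other scores come from the summation example.

```python
from statistics import mean, median

ages = [7, 11, 18, 20, 24]        # scores from the summation example
with_outlier = ages[:-1] + [240]  # hypothetical extreme score replaces 24

# The mean shifts a great deal; the median does not move.
print(mean(ages), median(ages))                  # 16 18
print(mean(with_outlier), median(with_outlier))  # 59.2 18
```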
Summary of Measures of Central Tendency
a. mode is the only appropriate measure for nominal data.
b. median and mode are both appropriate for ordinal data.
c. all three measures are appropriate for interval and ratio data.
C. Measures of dispersion
(If the mean is 50, what do we know about the distribution of scores?)
Another method for summarizing a set of scores is to construct an index or value that can
be used to describe the amount of variability among the scores.
That is, do the scores tend to fall fairly close to the central tendency measure or are the
scores fairly well spread out?
These indices are known as the measures of dispersion (or variability).
Here we consider the most popular such measures:
range,
H spread,
variance, and
standard deviation.
C. Measures of dispersion:
Range
1. Range (two different definitions)
Exclusive range: The difference between the largest and smallest scores in a
collection of scores.
ER = Xmax - Xmin
= 20 - 9 = 11
Inclusive range: The difference between the upper real limit of the interval
containing the largest score and the lower real limit of the interval containing the
smallest score in the collection of scores.
IR = URL of Xmax - LRL of Xmin
= 20.5 - 8.5 = 12
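Both definitions of the range can be sketched in a few lines of Python. The function names and the `unit` parameter are assumptions made for this illustration, and the score list is hypothetical: only its minimum (9) and maximum (20) are taken from the example above. The sketch assumes scores are measured in whole units, so each real limit lies half a unit beyond the score.

```python
def exclusive_range(scores):
    """Largest score minus smallest score."""
    return max(scores) - min(scores)

def inclusive_range(scores, unit=1):
    """Upper real limit of the largest score minus lower real limit of the smallest."""
    # Real limits extend half a measurement unit beyond each score.
    upper_real_limit = max(scores) + unit / 2
    lower_real_limit = min(scores) - unit / 2
    return upper_real_limit - lower_real_limit

scores = [9, 12, 15, 17, 20]  # hypothetical; only min 9 and max 20 match the text
print(exclusive_range(scores))  # 11
print(inclusive_range(scores))  # 12.0
```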

General characteristics:
a. simple to obtain (+)
b. influenced by extreme scores (-)
c. only a function of two of the scores (-)
d. unstable from sample to sample (-)
e. can be used with any type of measurement scale
C. Measures of dispersion:
H spread
2. H spread
H = Q3 - Q1
H = 18.0833 - 13.1250 = 4.9583
H spread relies on the difference between the third and first quartiles, Q3 - Q1.
H is short for hinge, developed by Tukey.
This is also known as the interquartile range.
H measures the range of the middle 50% of the distribution.
The larger the value, the greater is the spread in the middle of the distribution.
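A minimal sketch of the H spread, using `quantiles` from Python's standard `statistics` module. Quartile values depend on the interpolation method, so this sketch will not necessarily reproduce the Q1 and Q3 values quoted above; the function name `h_spread` and the choice of the `"inclusive"` method are assumptions for illustration.

```python
from statistics import quantiles

def h_spread(scores):
    """Q3 - Q1, the range of the middle 50% of the scores."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, _q2, q3 = quantiles(scores, n=4, method="inclusive")
    return q3 - q1

scores = [1, 3, 5, 11, 21, 32]  # the even-count example from the median section
print(h_spread(scores))  # 15.0
```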
General characteristics:
a. unaffected by extreme scores (+)
b. not a function of all of the scores (-)
c. not very stable from sample to sample (-)
d. can be used with any type of measurement scale except nominal
C. Measures of dispersion:
Deviation scores
3. Deviation scores
A deviation score is the difference between a particular raw score and the mean of the
collection of scores (i.e., population or sample).
The sum of the deviation scores always equals 0.
Thus, any measure based on simple deviation scores is useless as an index of dispersion,
because the sum is zero regardless of the spread of scores.
For this reason, the sum of the deviation scores is rarely used in statistics.
Notation
Deviation
$d_i = X_i - \mu$
Sum of deviation scores
$\sum_{i=1}^{N} d_i = \sum_{i=1}^{N} (X_i - \mu) = 0$
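The zero-sum property can be seen directly in a short sketch. It reuses the five ages from the summation example and treats them as a population; the variable names are illustrative.

```python
ages = [7, 11, 18, 20, 24]
mu = sum(ages) / len(ages)  # 16.0

# Each deviation is the raw score minus the mean.
deviations = [x - mu for x in ages]
print(deviations)       # [-9.0, -5.0, 2.0, 4.0, 8.0]
print(sum(deviations))  # 0.0
```

The negative deviations (-14 in total) cancel the positive ones (+14), which is why squared deviations are used for the variance instead.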
Table 3.3
Credit card data
C. Measures of dispersion:
Population variance

4. Population variance
measure of the area of a distribution
Definitional formula: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
The definitional formula shows how the variance is defined conceptually.
Conceptually, the variance is a measure of the area of a distribution.
That is, the more spread out the scores, the more area or space the distribution takes up and the
larger is the variance.
C. Measures of dispersion:
Population standard deviation
5. Population standard deviation
The positive square root of the population variance: $\sigma = \sqrt{\sigma^2}$
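As a worked sketch, the population variance is the sum of squared deviations from the mean divided by N, and the population standard deviation is its positive square root. The example treats the five ages from the summation example as a complete population, which is an assumption made only for illustration.

```python
from math import sqrt

ages = [7, 11, 18, 20, 24]  # treated here as a complete population
N = len(ages)
mu = sum(ages) / N

# Squared deviations: 81 + 25 + 4 + 16 + 64 = 190, then divide by N = 5.
variance = sum((x - mu) ** 2 for x in ages) / N
std_dev = sqrt(variance)

print(variance)           # 38.0
print(round(std_dev, 4))  # 6.1644
```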
C. Measures of dispersion
General characteristics of variance and standard deviation:
a. are a function of every score (+)
b. affected by extreme scores (-)
c. quite useful for deriving other statistics (+)
d. can be used with interval and ratio measurement scales
Comparison of central tendency measures with dispersion measures
a. mode and range share certain characteristics
b. median shares certain characteristics with H spread
c. mean shares many characteristics with the variance and standard deviation
Why can't we just convert everything to sample statistics?
The sample mean may not be the same as the population mean. In most samples, the sample
mean will be somewhat different from the population mean.
You can't use the population mean anyway because it is unknown. Because the two means differ,
the deviations will be affected. The sample variance that would be obtained would be a biased
estimate of the population variance. The sample variance obtained would be systematically too
small.
In order to get an unbiased sample estimate of the population variance, a slight change in the
computational formula has to be made.
C. Measures of dispersion:
Sample variance
6. Sample variance
Definitional formula: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
If n is used in the denominator, then the estimate of the population variance would
be biased (would be too small).
So we have to make an adjustment in the denominator to obtain an unbiased
estimate of the population variance.
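The bias can be demonstrated exactly on a tiny example. The sketch below treats the five ages from the summation example as a population, lists every ordered sample of size 2 drawn with replacement, and averages the sample variance computed with each denominator. The sampling scheme, sample size, and function name are assumptions chosen to keep the illustration small; exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

population = [7, 11, 18, 20, 24]
N = len(population)
mu = Fraction(sum(population), N)
pop_variance = sum((x - mu) ** 2 for x in population) / N

def sample_variance(sample, denominator):
    """Sum of squared deviations from the sample mean over the given denominator."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample) / denominator

# Every ordered sample of size n = 2, drawn with replacement (25 samples).
n = 2
samples = list(product(population, repeat=n))
avg_with_n = sum(sample_variance(s, n) for s in samples) / len(samples)
avg_with_n_minus_1 = sum(sample_variance(s, n - 1) for s in samples) / len(samples)

print(pop_variance, avg_with_n, avg_with_n_minus_1)  # 38 19 38
```

Averaged over all samples, dividing by n gives 19, well below the population variance of 38, while dividing by n - 1 recovers 38 exactly.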
7. Sample standard deviation
The positive square root of the sample variance: $s = \sqrt{s^2}$
Summary of measures of dispersion
1. The range is the only appropriate measure for ordinal data. (Likert?)
2. The H spread, variance, and standard deviation can be used with interval or ratio
measurement scales.

3. There are no measures of dispersion appropriate for nominal data.


The Normal Distribution
Typical normal distribution
A. The normal distribution:
2. Characteristics
a. standard curve:
symmetric around the mean
unimodal
bell-shaped
mean = median = mode
B. The normal distribution:
2. Characteristics
f. constant relationship to the standard deviation (see Figure 4.1 again)
The Empirical Rule
About 68% of distribution within 1 s.d. of mean
About 95% of distribution within 2 s.d. of mean
About 99.7% of distribution within 3 s.d. of mean
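The percentages in the Empirical Rule can be checked with `NormalDist` from Python's standard `statistics` module. This is a sketch on the standard normal curve (mean 0, standard deviation 1); the function name `share_within` is an illustrative choice.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

def share_within(k):
    """Proportion of the distribution within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```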
Figure 4.1: The normal distribution
