
Introduction to Biostatistics Using SPSS

SUMMARIZING DATA
Descriptive Analysis

Chee Fong Tyng


School of Sustainable Agriculture
Universiti Malaysia Sabah
ftchee@ums.edu.my
Descriptive
• Measures of central tendency
– Mean
– Median
– Mode
• Measures of dispersion
– Range
– Interquartile range
– Variance and standard deviation
– Coefficient of variation



Measures of Central Tendency

• Summarize the entire data set with a single
value (measurement)
• Measures of central tendency include
– Median
– Trimmed Mean
– Mean
– Mode



Mean
• The sum of the measurements divided by the total
number of measurements, better known as the
average: $\bar{x} = \frac{\sum x}{n}$
• There is only 1 mean.
• Works well if the data are reasonably symmetric and
unimodal (“bell-shaped”)
• Not a good measure when extreme data values
(“outliers”) are present or the data are skewed
• Value is influenced by extreme measurements
• Applicable to quantitative data only.
[Figure: two number-line plots; left panel (axis 0 to 10): Mean = 5; right panel (axis 0 to 14): Mean = 6]
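The sensitivity of the mean to extreme values can be illustrated with a short Python sketch. The data values below are hypothetical and are not taken from the slides.

```python
def arithmetic_mean(values):
    """Sum of the measurements divided by the number of measurements."""
    return sum(values) / len(values)

# Hypothetical data: a symmetric set, and the same set with the
# largest value replaced by an outlier.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]

print(arithmetic_mean(symmetric))      # 5.0
print(arithmetic_mean(with_outlier))   # 14.0
```

A single extreme value moves the mean from 5 to 14, even though eight of the nine measurements are unchanged.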

© 2002 Prentice-Hall, Inc.


Trimmed Mean
• The mean is calculated in the usual way, except that
a specified percentage of the most extreme
observations is not included in the calculation
• The mean is influenced by extreme values (Outliers)
• To reduce the effect of outliers which distort the mean
value, a variation of the mean is introduced.
• Trimmed mean drops the highest and lowest extreme
values and averages the rest.
– For example, the 5% trimmed mean would ignore the
smallest 5% and the largest 5% of the measurements
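A minimal Python sketch of a trimmed mean follows. The function name `trimmed_mean`, the `percent` parameter, and the data are hypothetical. The sketch assumes that the number of values dropped from each end is rounded down to a whole number; statistical packages, SPSS included, may handle fractional counts differently.

```python
def trimmed_mean(values, percent=5):
    """Mean after dropping `percent`% of the values from each end."""
    if not 0 <= percent < 50:
        raise ValueError("percent must be at least 0 and below 50")
    ordered = sorted(values)
    n = len(ordered)
    k = n * percent // 100          # assumption: round the count down
    kept = ordered[k:n - k]         # drop the k smallest and k largest
    return sum(kept) / len(kept)

# Hypothetical data: the values 1..19 plus one outlier (n = 20).
data = list(range(1, 20)) + [200]

print(sum(data) / len(data))     # ordinary mean: 19.5
print(trimmed_mean(data, 5))     # 5% trimmed mean: 10.5
```

With n = 20, a 5% trim drops one value from each end, which removes the outlier and brings the result back toward the bulk of the data.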



Median
• The middle value when the measurements are arranged
from lowest to highest.
• 50% of the measurements lie above it and 50% lie below
it.
• Often used to measure the midpoint of a large set of
measurements.
– median = 50th percentile = second quartile (Q2)
• There is only 1 median
• Not influenced by extreme measurements.
• Applicable to quantitative data only.



[Figure: three number-line plots; axis 0 to 10: Median = 5; axis 0 to 14: Median = 5; axis 0 to 14: Median = 6]

• If the number of observations (n) is odd, the median is the middle value,
or the [(n+1)/2]th observation.
• If n is even, the median is usually calculated as the average of the two
middlemost values, that is, the average of the (n/2)th observation and
the [(n/2) + 1]th observation.
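The odd/even rule for the median can be sketched in Python as below. The function name and the data are hypothetical; note that Python lists are indexed from 0, so the kth observation sits at index k - 1.

```python
def median(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the [(n+1)/2]th observation is at 0-based index n // 2
        return ordered[mid]
    # n even: average of the (n/2)th and [(n/2) + 1]th observations
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([3, 1, 4, 1, 5]))       # ordered: 1 1 3 4 5    -> 3
print(median([3, 1, 4, 1, 5, 9]))    # ordered: 1 1 3 4 5 9  -> 3.5
print(median([1, 2, 3, 4, 500]))     # the outlier does not move the median -> 3
```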


Mode
• The measurement that occurs most often (with the
highest frequency)
• Commonly used as a measure of popularity.
• There can be more than 1 mode.
– Unimodal, bimodal or multimodal
• Not influenced by extreme measurements.
• Applicable to both qualitative and quantitative data.



[Figure: three number-line plots; axis 0 to 14: Mode = 9; axis 0 to 6: No Mode; axis 0 to 14: Mode = 5, 12]
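Finding the mode (or modes) amounts to counting frequencies, as in the Python sketch below. The function name and the data are hypothetical, and the sketch adopts one common convention as an assumption: if every value occurs exactly once, the data set is reported as having no mode.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means 'no mode'."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []   # assumed convention: all values unique -> no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([3, 5, 5, 9, 12, 12]))    # bimodal -> [5, 12]
print(modes([0, 1, 2, 3, 4, 5, 6]))   # no mode -> []
print(modes(["A", "O", "A", "B"]))    # qualitative data -> ['A']
```

The last call shows that the mode, unlike the mean, also works for qualitative data such as category labels.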
Measures of Variability
• It is not sufficient to describe a data set using only
measures of central tendency
• We also need to determine how dispersed (spread out) the data
are.
• Measures of variability (spread) include
– Range
– Percentile / Quartile
– Deviation / Standard Deviation (sisihan piawai)
– Variance
– Coefficient of variation



Range
• Measure of variation
• The difference between the largest and the
smallest measurement / observations of the
set.
• It is easy to compute but very sensitive to
outliers.
• Does not give much information about the
pattern of variability
• Ignores the way in which data are distributed
$\text{Range} = X_{\text{Largest}} - X_{\text{Smallest}}$

[Figure: two dot plots, each on an axis from 7 to 12; in both, Range = 12 - 7 = 5]
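As a small illustration of why the range says little about the pattern of variability, the Python sketch below computes it for two hypothetical data sets (not taken from the slides) that share the same smallest and largest values.

```python
def value_range(values):
    """Largest measurement minus smallest measurement."""
    return max(values) - min(values)

# Hypothetical data: same extremes, very different spread in between.
evenly_spread = [7, 8, 9, 10, 11, 12]
clustered = [7, 7, 7, 7, 7, 12]

print(value_range(evenly_spread))   # 5
print(value_range(clustered))       # 5
```

Both sets have a range of 5, although the second is concentrated at a single value apart from one large observation.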

Percentile / Quartile
• The pth percentile of a set of n measurements
arranged in order of magnitude is that value
that has at most p% of the measurements
below it and at most ( 100 – p ) % above it.
• Example: 60th percentile has 60% of the data
below it and 40% above it.
• Percentiles of interest are the 25th, 50th and 75th
percentiles, often called the lower quartile,
median, and upper quartile.
• Interquartile range: the difference between the
upper and lower quartiles

Quartiles
• Split ordered data into 4 quarters

25% | Q1 | 25% | Q2 | 25% | Q3 | 25%

• Position of the i-th quartile: $\text{position of } Q_i = \frac{i(n+1)}{4}$

Data in ordered array: 11 12 13 16 16 17 18 21 22

Position of $Q_1 = \frac{1(9+1)}{4} = 2.5$, so $Q_1 = \frac{12+13}{2} = 12.5$

• $Q_1$ and $Q_3$ are measures of noncentral location
• $Q_2$ = median, a measure of central tendency
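The quartile positions and the interquartile range for the ordered array 11 12 13 16 16 17 18 21 22 can be reproduced with the Python sketch below. The function name is hypothetical. The slide only shows a position ending in .5 (averaging the two neighbouring values); for other fractional positions the sketch assumes linear interpolation, which is one of several conventions in use and may not match a given textbook or SPSS exactly.

```python
def quartile(values, i):
    """i-th quartile using the position rule i(n + 1)/4 (1-based position)."""
    ordered = sorted(values)
    n = len(ordered)
    pos = i * (n + 1) / 4
    lower = int(pos)        # whole part of the position
    frac = pos - lower      # fractional part of the position
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    # assumption: interpolate linearly between the two neighbouring values
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]

q1 = quartile(data, 1)   # position 2.5 -> (12 + 13) / 2 = 12.5
q2 = quartile(data, 2)   # position 5   -> 16.0 (the median)
q3 = quartile(data, 3)   # position 7.5 -> (18 + 21) / 2 = 19.5
print(q1, q2, q3)        # 12.5 16.0 19.5
print(q3 - q1)           # interquartile range: 7.0
```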
Skewness
• The relationship between the mode, median, mean and trimmed
mean reflects the skewness of the data.
• Skewness measures how asymmetrically the data are
distributed.
• Zero skewness
– symmetrical (Mode = Median = Mean)
• Positive skewness
– skewed to the right (Mode < Median < Mean)
• Negative skewness
– skewed to the left (Mode > Median > Mean)
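The sign of the skewness can be checked numerically. The slides do not give a formula, so the Python sketch below assumes the simple moment coefficient of skewness, $m_3 / m_2^{3/2}$, computed from population moments; SPSS reports a sample-adjusted version, so its values will differ somewhat. The data are hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness m3 / m2**1.5 (an assumed, unadjusted formula)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    if m2 == 0:
        return 0.0   # all values identical: no asymmetry
    return m3 / m2 ** 1.5

right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]     # hypothetical: long right tail
left_skewed = [-x for x in right_skewed]     # mirror image: long left tail
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_skewed) > 0)   # True
print(skewness(left_skewed) < 0)    # True
print(skewness(symmetric))          # 0.0
```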



Distribution Shape and
Box-and-Whisker Plot
[Figure: box-and-whisker plots for left-skewed, symmetric, and right-skewed distributions, each marked with Q1, Q2 and Q3]

Skewness
[Figure: three histograms.
Positively (right) skewed: Mode < Median < Mean
Negatively (left) skewed: Mean < Median < Mode
Symmetrical: Mode = Median = Mean]
Variance and Standard Deviation
• The variance of a set of n measurements x1, x2, … , xn with
mean $\bar{x}$ is the sum of the squared deviations divided by n – 1:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$

• The standard deviation of a set of measurements is
defined to be the positive square root of the variance: $s = \sqrt{s^2}$
• Both measure how spread out the data are around the mean.
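The two usual ways of computing the sample variance (from the squared deviations, and from the sums of x and x squared) give the same result, as the Python sketch below illustrates. The function names and the data are hypothetical.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_variance_shortcut(values):
    """Computational form: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(values)
    total = sum(values)
    total_sq = sum(x * x for x in values)
    return (total_sq - total ** 2 / n) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean = 5

s2 = sample_variance(data)
print(s2)                               # 32/7, about 4.571
print(sample_variance_shortcut(data))   # same value, up to rounding
print(math.sqrt(s2))                    # standard deviation, about 2.138
```

The shortcut form avoids a separate pass to compute the mean, but it can lose precision when the values are large relative to their spread.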



Comparing Standard Deviations

[Figure: three dot plots on a common axis from 11 to 21]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57


Coefficient of Variation
• Measures the variability of the values in a
population relative to the magnitude of the
population mean.
• $CV = \frac{\text{Standard Deviation}}{|\text{Mean}|}$
• The CV is a unit-free number, so it is useful when
comparing the variation of different sets of data.



Comparing Coefficient of Variation
• Stock A:
– Average price last year = $50
– Standard deviation = $5
• Stock B:
– Average price last year = $100
– Standard deviation = $5
• Coefficient of variation:
– Stock A: $CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \left(\frac{\$5}{\$50}\right) \cdot 100\% = 10\%$
– Stock B: $CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \left(\frac{\$5}{\$100}\right) \cdot 100\% = 5\%$
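The stock comparison can be reproduced with a few lines of Python. The function name is hypothetical; the means and standard deviations are the ones given for Stock A and Stock B.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation relative to |mean|, expressed as a percentage."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return 100 * std_dev / abs(mean)

print(coefficient_of_variation(5, 50))    # Stock A: 10.0
print(coefficient_of_variation(5, 100))   # Stock B: 5.0
```

Both stocks have the same standard deviation ($5), but relative to its average price Stock A is twice as variable as Stock B.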
• How/where to get descriptive statistics in SPSS?
– Analyze -> Reports or Analyze -> Descriptive Statistics

• Analyze -> Reports -> Case Summaries

THANK YOU
