
UNIVERSITY OF DAR ES SALAAM-MBEYA COLLEGE
OF HEALTH AND ALLIED SCIENCES (UDSM-MCHAS)
BIOSTATISTICS
BY
C. MBOTWA
Introduction
• This module will equip students with knowledge,
skills and ability to plan, conduct, analyse and
interpret biomedical, clinical or community health
research and communicate the results of statistical
analyses for decision-making purposes.
• It also introduces students to the main sources of
demographic information and vital statistics for
monitoring and assessing the health situation and
trends
1.1 Use of Biostatistics
Self-directed learning
• Discuss the use of biostatistics in:-
1. Medicine
2. Physiology and Anatomy
3. Pharmacology
4. Public health
5. Nutrition
6. Health planning, monitoring and evaluation
7. Dental science
1.2 Variables and Measurement Levels
Group discussion questions
1. Classify each variable as qualitative or
quantitative. State the scale of measurement
applicable for each.
i. Height
ii. Temperature
iii. Blood type
iv. Volume
v. Stages of the disease
2. Put a tick (✓) where applicable to show what
each scale of measurement provides:-
Provides                                         Nominal  Ordinal  Interval  Ratio
Frequency distribution
Mode, Median
The order of the values is known
Can quantify the difference between each value
Can add or subtract values
Can multiply and divide values
Has true zero
Descriptive Statistics
• Descriptive statistics is the branch of statistics
that describes and summarizes the data you have
collected. It includes:-
i. Frequencies and frequency distribution
ii. Measures of central tendency
iii. Measures of dispersion
iv. Measures of distribution of the data
Frequencies
• Summarizing categorical variables is
straightforward, the main task being to count
the number of observations in each category.
These counts are called frequencies.
• They are often also presented as relative
frequencies; that is as proportions or
percentages of the total number of individuals
• For example, Table 1.1 summarizes the
method of delivery recorded for 600 births in a
hospital.
• The variable of interest here is method of
delivery, a categorical variable with three
levels (Normal, Forceps, and Caesarean
section).
• Frequencies and relative frequencies are
commonly illustrated by a bar chart (also
known as a bar diagram) or by a pie chart.
Table 1.1 Method of delivery of 600 babies born in a hospital.
Method of delivery   No. of births   Percentage
Normal               478             79.7
Forceps              65              10.8
Caesarean            57              9.5
Total                600             100.0
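The percentages in the delivery table can be reproduced from the counts alone. The following is a minimal sketch, not part of the original slides; the dictionary name `births` is only an illustrative choice.

```python
# Counts taken from the method-of-delivery table (600 births in total).
births = {"Normal": 478, "Forceps": 65, "Caesarean": 57}

total = sum(births.values())

# Relative frequency of each category, expressed as a percentage of the total.
percentages = {method: 100 * count / total for method, count in births.items()}

for method, pct in percentages.items():
    print(f"{method:<10} {births[method]:>4} {pct:5.1f}")
print(f"{'Total':<10} {total:>4} {sum(percentages.values()):5.1f}")
```

Each percentage is simply the category count divided by the total number of births, multiplied by 100.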


Frequency distributions
• A frequency distribution is a table showing the
number of observations at different values or
within certain ranges.
• For a discrete variable the frequencies may be
tabulated either for each value of the variable
or for groups of values.
• With continuous variables, groups have to be
formed
Guidelines for preparing a frequency distribution table
• Choose between 5 and 20 classes
• Choose classes that will accommodate all the
data
• Choose classes that are mutually exclusive
• If possible make all classes of equal length
• Consider the data in Table 1.2 for hemoglobin,
which has been measured in g/100 ml. Construct
a frequency distribution table.
Table 1.2 Hemoglobin levels in g/100 ml for 70 women.
10.2 13.7 10.4 14.9 11.5 12.0 11.0
13.3 12.9 12.1 9.4 13.2 10.8 11.7
10.6 10.5 13.7 11.8 14.1 10.3 13.6
12.1 12.9 11.4 12.9 10.6 11.4 11.9
9.3 13.5 14.6 11.2 11.7 10.9 10.4
12.0 12.9 11.1 8.8 10.2 11.6 12.5
13.4 12.1 10.9 11.3 14.7 10.8 13.3
11.9 11.4 12.5 13.0 11.6 13.1 9.7
15.1 10.7 12.9 13.4 12.3 12.9 11.0
11.1 13.5 10.9 13.1 11.8 12.2 13.5
Table 1.3. Frequency distribution table
Hemoglobin (g/100 ml)   No. of Women   Percentage
8.0-8.9                 1              1.4
9.0-9.9                 3              4.3
10.0-10.9               14             20.0
11.0-11.9               19             27.1
12.0-12.9               14             20.0
13.0-13.9               13             18.6
14.0-14.9               5              7.1
15.0-15.9               1              1.4
Total                   70             100
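The tallying step behind a frequency distribution table can be sketched in a few lines. This is an illustrative sketch only: it uses just the first row of the hemoglobin data for brevity, assumes classes of width 1 (8.0-8.9, 9.0-9.9, and so on), and the names `values` and `counts` are not from the original slides.

```python
import math
from collections import Counter

# First row of the hemoglobin data (g/100 ml); a small subset chosen for brevity.
values = [10.2, 13.7, 10.4, 14.9, 11.5, 12.0, 11.0]

# With classes of width 1, each value falls in the class whose lower limit is
# the value rounded down, so the classes are mutually exclusive.
counts = Counter(math.floor(v) for v in values)

n = len(values)
for lower in sorted(counts):
    freq = counts[lower]
    print(f"{lower}.0-{lower}.9  {freq}  {100 * freq / n:.1f}")
```

Rounding down works here because every class starts at a whole number; other class widths would need a different rule for assigning values to classes.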
• NB: Frequency distributions are usually
illustrated by histograms. Either the
frequencies or the percentages may be used;
the shape of the histogram will be the same.

Question: Construct a histogram from the frequency
distribution above (Table 1.3).
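As a rough starting point for the question, a text histogram can be printed directly from the class frequencies. This is a sketch under simple assumptions (one asterisk per woman, equal class widths); a plotting library would normally be used for a proper figure.

```python
# Class intervals and frequencies as printed in Table 1.3.
table = [
    ("8.0-8.9", 1), ("9.0-9.9", 3), ("10.0-10.9", 14), ("11.0-11.9", 19),
    ("12.0-12.9", 14), ("13.0-13.9", 13), ("14.0-14.9", 5), ("15.0-15.9", 1),
]

# One asterisk per woman: the bar length is proportional to the class frequency.
bars = [f"{interval:>9} | {'*' * freq} {freq}" for interval, freq in table]
print("\n".join(bars))
```

Replacing the frequencies with percentages would rescale every bar by the same factor, which is why the shape of the histogram stays the same.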
Terminologies Associated with a Frequency Distribution
1. Class Interval
• A group of values such as 10.0-10.9 in Table 1.3
is called a class interval. The smaller value, 10.0
in this case, is called the Lower Class Limit and
the larger value, 10.9 in this case, is called the
Upper Class Limit.
• Where either the lower class limit of the first
class interval or the upper class limit of the last
class interval is not indicated, we say we have an
Open Class Interval.
2. Class Boundaries
• These are the dividing lines between
successive class intervals.
• A boundary is obtained by adding the upper
class limit of one class interval to the lower
class limit of the succeeding class interval and
dividing by 2.
• For example, the class boundaries of class
interval 10.0-10.9 are 9.95 and 10.95, since
(9.9 + 10.0)/2 = 9.95 and (10.9 + 11.0)/2 = 10.95.
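The boundary rule can be checked with a short sketch. The function name `class_boundaries` and the list `limits` are illustrative assumptions, and the function only handles a class that has a neighbouring class on each side.

```python
def class_boundaries(limits, index):
    """Return the (lower, upper) boundaries of the class at position `index`.

    `limits` is a list of (lower limit, upper limit) pairs in ascending order.
    The class must have a neighbour on both sides (0 < index < len(limits) - 1).
    """
    prev_upper = limits[index - 1][1]   # upper limit of the preceding class
    next_lower = limits[index + 1][0]   # lower limit of the succeeding class
    lower, upper = limits[index]
    # Each boundary is the midpoint between two adjacent class limits;
    # rounding removes floating-point noise in the last decimal places.
    return round((prev_upper + lower) / 2, 2), round((upper + next_lower) / 2, 2)


limits = [(9.0, 9.9), (10.0, 10.9), (11.0, 11.9)]
print(class_boundaries(limits, 1))  # (9.95, 10.95)
```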
3. Class Mark
• This is the mid-point of the class interval.
• It is obtained by adding the lower class limit
and the upper class limit and dividing by 2.
• For example, the class mark for class interval
10.0-10.9 in our case is (10.0 + 10.9)/2 = 10.45.
• We usually denote the class mark by X.
4. Class Interval Size (or Length or Width)
• This is the difference between the upper class
boundary and the lower class boundary of a
given class interval.
• In our example, the class interval size is
10.95 - 9.95 = 1.
Measures of Central Tendency
• Measures of central tendency are statistical
measures which describe the position of a
distribution.
• In the univariate context, the mean, median
and mode are the most commonly used
measures of central tendency.
• Measures of central tendency can be computed
for ungrouped and grouped data
Ungrouped Data
1. Arithmetic Mean
• The arithmetic mean is a mathematical average
and is the most popular measure of central
tendency.
• It is frequently referred to as the ‘mean’ or
‘average’. It is obtained by dividing the sum of
the values of all observations in a series by the
number of items constituting the series.
• The arithmetic mean for ungrouped data is
computed as:-
$$\bar{X} = \frac{\sum X}{n}$$
• Where X is the value of each item and n is the
number of items.
2. Median
• The median is the value that divides the given
distribution into two equal parts.
• To calculate the median, first arrange the
observations in ascending or descending order.
• When there is an odd number of observations, n,
the median is the value in position $\frac{n+1}{2}$.
• If there is an even number of observations,
there are two middle items. In this case, the
average of the two ‘middle’ ones is taken as
the median.
NB: The most important step when computing
the median is to arrange your observations in
either ascending order or descending order.
3. Mode
• The mode is the value (or values) that occurs
most often in the distribution, that is, the value
with the highest frequency.
• Theoretically, there can be no mode or there
can be more than one mode.
• A distribution with one mode is said to be
unimodal.
• A distribution with more than one mode is
called multimodal.
Example 1.1: Compute the mean, median and mode
for the following plasma volumes (litres) of
eight healthy adult males:
2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12

Example 1.2: Calculate the mean, median and
mode of the following BMI values measured for
7 adult females:
18.6, 20.5, 25.8, 18.2, 22.9, 20.5, 24.0
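One way to check hand calculations for the two examples is Python's standard `statistics` module. This is a sketch for verification only, not the method prescribed by the slides; the variable names are illustrative.

```python
import statistics

# Data from Example 1.1 (plasma volumes, litres) and Example 1.2 (BMI).
plasma = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]
bmi = [18.6, 20.5, 25.8, 18.2, 22.9, 20.5, 24.0]

for name, data in [("Plasma volume", plasma), ("BMI", bmi)]:
    mean = statistics.mean(data)        # sum of the values / number of values
    median = statistics.median(data)    # sorts the data; averages the two middle values if n is even
    modes = statistics.multimode(data)  # every value that shares the highest frequency
    # If every value occurs exactly once, multimode returns all of them,
    # which corresponds to the "no mode" case.
    mode_text = "no mode" if len(modes) == len(data) else modes
    print(name, round(mean, 4), round(median, 4), mode_text)
```

The plasma volumes are all distinct, so that data set has no mode, while the BMI data has a single mode because 20.5 occurs twice.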
Grouped Data
1. Arithmetic Mean
For grouped data, the arithmetic mean is
defined as:-
$$\bar{X} = \frac{\sum fX}{\sum f}$$
Where X refers to the class mark and f refers to
the corresponding frequency.
2. Median
For grouped data, the median is given as:-
$$M_d = L + \left(\frac{\frac{N}{2} - F}{f}\right) C$$
Where:
$M_d$ is the median
$L$ is the lower class boundary of the median
class interval, that is the class interval
containing the median value
$N$ is the total of the frequencies
$F$ is the cumulative frequency of all the class
intervals below the median class interval
$f$ is the frequency of the median class
interval
$C$ is the class interval size of the median class
interval
NB: The median class interval is that interval at
which the cumulative frequencies exceed N/2
for the first time.

3. Mode
For grouped data, the mode is computed using the
following formula:
$$M_o = L + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) C$$
Where:
$M_o$ is the modal value
$L$ is the lower class boundary of the modal
class, that is the class interval containing
the mode
$\Delta_1$ is the difference between the frequency of
the modal class interval and the frequency of the
class interval immediately below (preceding) the
modal class interval
$\Delta_2$ is the difference between the frequency of
the modal class interval and the frequency of the
class interval immediately above (following) the
modal class interval
$C$ is the class interval size of the modal class
interval
NB: The modal class interval is that interval
which has the highest frequency.
• Example 1.3: The following data represent the
age distribution of a sample of 100
people covered by health insurance
(private or government).

Age     Number
25-34   23
35-44   29
45-54   28
55-64   20

Compute the mean, median and mode.
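The grouped-data calculations for Example 1.3 can be sketched as follows. This is an illustrative check rather than the slides' own working: the classes are taken as 25-34, 35-44, 45-54 and 55-64 (each of width 10), and all variable names (`classes`, `gap`, `d1`, `d2` and so on) are assumptions made for the sketch.

```python
# Example 1.3 classes as (lower limit, upper limit, frequency).
# The last class is taken to be 55-64 so that every class has width 10.
classes = [(25, 34, 23), (35, 44, 29), (45, 54, 28), (55, 64, 20)]
gap = 1  # distance between one upper limit and the next lower limit

freqs = [f for _, _, f in classes]
N = sum(freqs)
marks = [(lo + hi) / 2 for lo, hi, _ in classes]       # class marks X
lower_bounds = [lo - gap / 2 for lo, _, _ in classes]  # lower class boundaries
C = (classes[0][1] + gap / 2) - lower_bounds[0]        # class interval size

# Mean: sum of f * X divided by the sum of f.
mean = sum(f * x for f, x in zip(freqs, marks)) / N

# Median class: first class whose cumulative frequency exceeds N / 2.
cumulative = 0
for i, f in enumerate(freqs):
    if cumulative + f > N / 2:
        median = lower_bounds[i] + (N / 2 - cumulative) / f * C
        break
    cumulative += f

# Modal class: highest frequency (the first one if there is a tie).
# A missing neighbour at either end is treated as frequency 0.
m = freqs.index(max(freqs))
d1 = freqs[m] - (freqs[m - 1] if m > 0 else 0)
d2 = freqs[m] - (freqs[m + 1] if m < len(freqs) - 1 else 0)
mode = lower_bounds[m] + d1 / (d1 + d2) * C

print(round(mean, 2), round(median, 2), round(mode, 2))  # 44.0 43.81 43.07
```

Here the median class and the modal class are both 35-44, with lower class boundary 34.5.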


Example 1.4: Consider the frequency
distribution of serum cholesterol in
86 stroke patients:-

Interval   Frequency
3.0-3.9    3
4.0-4.9    14
5.0-5.9    21
6.0-6.9    20
7.0-7.9    21
8.0-8.9    5
9.0-9.9    2

Compute the mean, median and mode.


QUANTILES OR FRACTILES
• We now consider dividing a frequency
distribution into a specified number of equal
parts.
• The general term for the dividing values is
quantiles or fractiles.
• Let’s focus on the Median, Quartiles, Deciles
and Percentiles.
• The Median divides the distribution into two
equal parts
• The Quartiles divide the distribution into four
equal parts. There are three quartiles (1st, 2nd
and 3rd)
• The Deciles divide the distribution into ten
equal parts
• The Percentiles divide the distribution into one
hundred equal parts.
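The relationship between these quantiles can be illustrated with the BMI values listed in Example 1.2. This sketch uses `statistics.quantiles` from the Python standard library; its default "exclusive" method is only one of several conventions for locating quantiles, so hand calculations with a different rule may give slightly different values.

```python
import statistics

# BMI values listed in Example 1.2.
bmi = [18.6, 20.5, 25.8, 18.2, 22.9, 20.5, 24.0]

# statistics.quantiles returns the n - 1 cut points that split the data
# into n parts of equal probability.
quartiles = statistics.quantiles(bmi, n=4)      # 3 cut points
deciles = statistics.quantiles(bmi, n=10)       # 9 cut points
percentiles = statistics.quantiles(bmi, n=100)  # 99 cut points

print([round(q, 2) for q in quartiles])  # [18.6, 20.5, 24.0]
# The 2nd quartile, 5th decile and 50th percentile all coincide with the median.
print(round(deciles[4], 2), round(percentiles[49], 2), statistics.median(bmi))
```

In the same way, the 25th and 75th percentiles coincide with the 1st and 3rd quartiles.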
MEASURES OF DISPERSION
• There are several measures of
dispersion/variability. But we will consider
only three common measures, which are:-
1. Range
2. Variance
3. Standard Deviation
1. Range
• This is the simplest measure of dispersion or
variation.
• It is defined as the difference between the
highest value and the lowest value.
• For a frequency distribution, the range is
given by the difference between the upper
class limit of the highest class interval and the
lower class limit of the lowest class interval
2. Variance (S²)
• The variance is the average squared distance of
individual observations from the mean.
• High variance means that most scores are far away from the
mean. Low variance indicates that most scores cluster tightly
about the mean.
• Variance for ungrouped data is given by:-
$$S^2 = \frac{\sum (X - \bar{X})^2}{n}$$
However, the formula given above gives a
biased estimate of the population variance.
An unbiased estimate of the population variance
is given by:-
$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
Variance for grouped data is given by:
$$S^2 = \frac{\sum f (X - \bar{X})^2}{\sum f - 1}$$
Where X is the class mark and f is the
corresponding frequency.

3. Standard Deviation (S)
• This is the statistical measure which shows
how individual scores vary from the mean.
• The standard deviation is the square root of
the variance, $S = \sqrt{S^2}$; thus, it is expressed
in the original units of measurement.
Coefficient of Variation
The coefficient of variation (C.V) is the ratio of
the standard deviation to the arithmetic mean of
the distribution.
It is defined as:
$$C.V = \frac{S}{\bar{X}}$$
NB: If you are comparing distributions, the
distribution with the smallest C.V is said to be
less variable than the other distributions.
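The measures of dispersion for ungrouped data can be checked with the standard `statistics` module. The sketch below uses the plasma volumes from Example 1.1; the variable names are illustrative, and the standard deviation is taken here from the unbiased (n - 1) variance.

```python
import math
import statistics

# Plasma volumes (litres) from Example 1.1.
plasma = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]

data_range = max(plasma) - min(plasma)      # highest value minus lowest value
biased_var = statistics.pvariance(plasma)   # sum of squared deviations / n
unbiased_var = statistics.variance(plasma)  # sum of squared deviations / (n - 1)
sd = math.sqrt(unbiased_var)                # same units as the data (litres)
cv = sd / statistics.mean(plasma)           # unit-free ratio of SD to mean

print(round(data_range, 2), round(biased_var, 4), round(unbiased_var, 4))
print(round(sd, 4), round(cv, 4))
```

Dividing by n - 1 rather than n always gives the larger of the two variance values, and the gap shrinks as the sample size grows.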
Exercise
Using the data in Examples 1.1-1.4, compute the:
i. Range, variance and standard deviation.
ii. 1st quartile, 3rd quartile, 5th decile, 9th
decile, 25th percentile and 75th percentile.
END
