
Cohort Study

Subodh S Gupta
Dr. Sushila Nayar School of Public Health MGIMS, Sewagram

Type of study                          Alternate name                    Unit of study

Observational studies
  Descriptive studies
  Analytical studies
    Ecological                         Correlational                     Populations
    Cross-sectional                    Prevalence                        Individuals
    Case-control                       Case-reference                    Individuals
    Cohort                             Follow-up/ Longitudinal           Individuals

Experimental/ intervention studies
  Randomized controlled trial          Clinical trial                    Patients
  Field trial                                                            Healthy persons
  Community trial                      Community intervention studies    Communities

Origin of word cohort


The word cohort has its origin in the Latin cohors
cohors (Latin word) = refers to warriors and gives the notion of a group of persons proceeding together in time
Group of persons with a common statistical characteristic, e.g. age, birth date

Definition & Synonyms


Definition

The cohort study is an observational epidemiological study which, after the manner of an experiment, attempts to study the relationship between a purported cause (exposure) and the subsequent risk of developing disease.
Synonyms

Follow-up study
Longitudinal study
Prospective study
Incidence study

The cohort design


Groups are exposure based: the group or groups of persons to be studied are defined in terms of characteristics manifest prior to the appearance of the disease under investigation
The study is conceptually longitudinal: the study groups so defined are observed over a period of time to determine the frequency of disease among them
A definite beginning and end

The cohort design


Efficient for examining:
  Common outcomes
When there is good evidence of an association between exposure and disease
When exposure is rare but incidence of disease is higher among exposed
When follow-up is easy and the cohort is stable
When ample funds are available

The cohort design


Many different outcomes for same exposure
The dynamic nature of many risk factors and their relations in time to disease occurrence can be captured here (cannot be done in a cross-sectional study and only with difficulty in a case-control study)
Associations (not cause and effect)
Estimate incidence within risk factor groups
Cannot estimate prevalence of risk factor

Case control study


[Figure: case-control design. Cases (people with disease) and controls (people without disease) are drawn from the population; each group is classified as exposed or not exposed. Axis: time, with the direction of enquiry marked.]

Cohort study
[Figure: cohort design. People without the outcome are drawn from the population and classified as exposed or not exposed; each group ends as diseased or not diseased. Axis: time, with the direction of enquiry marked.]

Types of cohort study


Historical/ Retrospective/ Non-concurrent
Prospective/ Concurrent

The distinction between retrospective and prospective cohort studies is important, not because of any conceptual difference or differences in interpretability of findings, but because of relevance to some practical issues, mostly the ability to control confounding.

Point in time when enquiry begins?

[Figure: the cohort design diagram (population; exposed / not exposed; diseased / not diseased; time axis; direction of enquiry), repeated for each of the three cases below.]

Both exposures and outcomes measured prospectively
Exposures measured retrospectively and outcomes prospectively
Both exposures and outcomes measured retrospectively

Advantages
Direct estimate of risk and rate of disease occurrence over time
An efficient means of studying rare exposures
Assess multiple outcomes of a single exposure
Establish temporal relationship between exposure and outcome
Exposure definitely precedes the outcome
Avoids recall bias, survival bias
Does not require strict random assignment of subjects
Can be done with original data or secondary data

Best observational design to establish association

Disadvantages
Very large sample sizes, especially for rare outcomes
Expensive and time-consuming
Attrition problem (loss to follow-up)
Differences in the quality of measurement of exposure or disease between the cohorts may introduce misclassification (information bias)
Cannot infer causal relation
Very specific finding
Complexity of data analysis
Ethical issues
Study effects

Alternate designs and concerns


Two separate cohorts: exposed and unexposed subjects
Omission of non-factor group
Use of external comparison
Use of mortality rather than morbidity as outcome
Event notification arises from routine statistics, rather than special observations
Comparison of several groups
Competing causes of death

Cohort Study: Steps

Steps in conducting cohort study


1. Identification of study population and initial steps
2. Measurement of exposure
3. Selection of study and comparison cohorts
4. Follow-up (for outcome measurement)
5. Data analysis

Types of cohorts
Closed or fixed cohorts:
  Fixed group of persons followed from a certain point in time until a defined endpoint
  Starting point: exposure-defining event
  Endpoint: occurrence of the disease, loss to follow-up, death
  The exposure is an event which occurs only once
Open or dynamic cohorts:
  Subjects may enter or leave the study at any time
  Exposure status may change over time

Cohorts
General population cohorts: population groups offering special resources for follow-up or data linkage are chosen, and the individuals are subsequently allocated according to their exposure status
Special exposure cohorts: samples chosen on the basis of a particular exposure
Exposures may be a particular event, a permanent state or a reversible state

General population cohorts (groups offering special resources)


Groups with readily available health records
Certain professional categories
Obstetric populations
Volunteer groups
Geographically identified cohorts
Record linkage

Special exposure cohorts (groups offering special resources)


Exposed to certain factor or event
Occupational groups
Based on qualitative characteristics

Population-based Cohort Studies


Advantages
  Estimation of distributions and prevalence rates of relevant variables
  Risk factor distributions
  Ideal setting in which to carry out unbiased evaluation of relations

Selection of comparison group


Internal comparison
  Only one cohort identified
  Later on, classified into study and comparison cohort based on exposure
External comparison
  More than one cohort identified
  e.g. cohort of radiologists compared with ophthalmologists
Comparison with general population rates
  If no comparison group is available we can compare the rates of the study cohort with the general population
  e.g. cancer rate of uranium miners with cancer rate in the general population

Ideal Cohort
Stable cohort
Cooperative cohort
Committed cohort
Well informed cohort

Exposure measurement
Exposures: exogenous and/ or endogenous
Reference period
Frequency of follow-up
Challenge of prospective data collection
  Changes in instrument over time
  Use of repeated measures
  Data collection costs

Sources of information
Records
Cohort members: self-administered questionnaires, interviews, telephone interviews, mailed questionnaires
Medical examination & biomarkers: clinic examinations & lab tests
Measures of the environment: level of air pollution, quality of drinking water, airborne radiation
Multiple methods

Follow-up: Types of outcomes


Discrete events
  Single events
    Mortality
    First occurrence of a disease or health-related outcome
  Multiple occurrences
    Disease outcome
    Transition between states of health/ disease
    Transitions between functional states
Level of a marker

Exercise 1
An investigator wants to discover whether or not being overweight in adolescence increases the risk of cardiovascular mortality in adulthood.
a) Assuming historical records are available, would a prospective or retrospective study be more practical?
b) Who would comprise the investigator's cohort under study?
c) Who would comprise the investigator's exposed and unexposed groups in this cohort?

Group Exercise
Design a Cohort Study
Outline the steps required for this study
Special efforts you may need to make for follow-up of the study subjects
What care you will need to take to reduce measurement bias
Calculate the sample size

Challenges in conducting Cohort Study

Challenge 1: Multiple dimensions of time in cohort study

Age
Calendar period

[Figure: timeline from start of study to end of study, showing Exposure 1, Exposure 2, ... Exposure i and Covariate 1, Covariate 2, ... Covariate i.]

Challenge 2: Retaining cohort study members


Loss to follow-up
  Dropouts
  Cannot be traced
More concern: those who cannot be traced; they may have moved because they have developed the disease

Effect of Nonresponse
Nonresponse: a major problem
A differential nonresponse will distort the true relationship between exposure and outcome

Nonresponse: random or selective?


Exposure data: find out if nonrespondents are different from the respondents
Intensive efforts within the study design
Follow-up of the nonrespondents as well as respondents

Challenge 3: Large Modern Cohort Studies


Huge requirements of resources and manpower
Management of huge database
Follow-up
Exposure information
Data quality?
Collection of biologic samples?

Challenge 4: Long term follow-up


Operational problems
Cumulative risk getting closer to one

Cohort Study Analysis

(Relation between exposure and outcome)


Standard 2 X 2 table

                      DISEASE STATUS
EXPOSURE STATUS    Present    Absent    Total
Present            a          b         a+b
Absent             c          d         c+d
Total              a+c        b+d       N

Two types of measures for rate


Cumulative incidence = Proportion of study subjects getting the outcome during the study period
Incidence rate = New cases/ Person-time under observation

1. Cumulative incidence rate:


Number of new cases of disease occurring over a specified period of time in a population at risk.

EXAMPLE
A surveillance system for Hospital acquired infection among the postoperative patients in a month.

[Figure: example chart. Values shown: 9, 6, 14, 14, 24, 19, 14, 4, 5, 19, 21, 6; axis ticks at 10, 15, 20, 25, 30.]

2. Incidence density:
Number of new cases of disease occurring over a specified period of time in a population at risk throughout the interval.

Incidence density requires us to add up the period of time each individual was present in the population and was at risk of becoming a new case of disease. Incidence density characteristically uses person-years at risk as the denominator. (The time period can be person-months, days, or even hours, depending on the disease process being studied.)

USES OF INCIDENCE DENSITY AND CUMULATIVE INCIDENCE

Incidence density gives the best estimate of the true risk of acquiring disease at any moment in time. Cumulative incidence gives the best estimate of how many people will eventually get the disease in an enumerated population.
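
The two measures can be contrasted with a small calculation. The sketch below is only an illustration: the follow-up records are hypothetical values invented for the example (they are not from the slides), and the function names are ours.

```python
def cumulative_incidence(new_cases, at_risk_at_start):
    """Proportion of the at-risk group that develops the outcome in the period."""
    return new_cases / at_risk_at_start


def incidence_density(new_cases, person_time):
    """New cases per unit of person-time at risk."""
    return new_cases / person_time


# Hypothetical follow-up records (illustration only, not from the slides):
# (years at risk, developed disease?)
follow_up = [(5.0, False), (2.5, True), (4.0, False), (1.5, True), (5.0, False)]

cases = sum(1 for _, diseased in follow_up if diseased)
person_years = sum(years for years, _ in follow_up)

ci = cumulative_incidence(cases, len(follow_up))
density = incidence_density(cases, person_years)

print(f"Cumulative incidence over the study period: {ci:.2f}")
print(f"Incidence density: {density:.3f} per person-year")
```

Subjects who become cases stop contributing person-time at the moment of diagnosis, which is why the two denominators differ.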

(Relation between exposure and outcome)


Standard 2 X 2 table

                       Peripheral Vascular Disease
Cigarette Smoking    Present    Absent    Total
Present              15         1712      1727
Absent               41         3188      3229
Total                56         4900      4956

(Relation between exposure and outcome)


l X 2 table

                          Disease status
Cholesterol quintiles   Present   Absent   Total
1st                     15        798      813
2nd                     20        794      814
3rd                     26        791      817
4th                     41        785      826
5th                     48        777      825
Total                   150       3945     4095
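
As a rough sketch of how an l X 2 table can be read, the snippet below computes the risk in each cholesterol quintile from the counts above and compares each quintile with the lowest one. Taking the 1st quintile as the reference group is our assumption for illustration; the slide does not name a reference.

```python
# (diseased, total) per cholesterol quintile, taken from the table above
quintiles = {
    "1st": (15, 813),
    "2nd": (20, 814),
    "3rd": (26, 817),
    "4th": (41, 826),
    "5th": (48, 825),
}

# Risk (cumulative incidence) in each quintile
risks = {q: diseased / total for q, (diseased, total) in quintiles.items()}

# Relative risk of each quintile against the 1st (assumed reference group)
baseline = risks["1st"]
relative_risks = {q: risk / baseline for q, risk in risks.items()}

for q in quintiles:
    print(f"{q}: risk = {risks[q]:.4f}, RR vs 1st = {relative_risks[q]:.2f}")
```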

Comparing risks in different groups


Relative risk OR Risk ratio (RR)
Attributable risk OR Risk difference (AR)
Attributable risk percent (AR%)
Population attributable risk (PAR)
Population attributable risk percent (PAR%)
Odds Ratio (OR)

Relative risk OR Risk ratio


Ratio of the risk among exposed to the risk among unexposed [Risk (Exp) / Risk (Unexp)]
Risk of disease among exposed = a/(a+b)
Risk of disease among unexposed = c/(c+d)
RR = [a/(a+b)] / [c/(c+d)]
For null hypothesis, Risk ratio will equal one
SE =
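
A minimal sketch of the calculation, using the cigarette smoking and peripheral vascular disease table shown earlier (a = 15, b = 1712, c = 41, d = 3188). The SE expression on the slide did not survive extraction, so the interval here uses the commonly taught log-based standard error, SE(ln RR) = sqrt(1/a - 1/(a+b) + 1/c - 1/(c+d)); that choice is our assumption, not necessarily the formula the slide intended.

```python
import math


def relative_risk(a, b, c, d, z=1.96):
    """Risk ratio with an approximate 95% CI from a 2 x 2 cohort table.

    a, b: diseased / not diseased among the exposed
    c, d: diseased / not diseased among the unexposed
    """
    risk_exp = a / (a + b)
    risk_unexp = c / (c + d)
    rr = risk_exp / risk_unexp
    # Assumed (standard log-method) standard error of ln(RR)
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


rr, lower, upper = relative_risk(15, 1712, 41, 3188)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With these counts the interval includes one, so the table on its own does not rule out the null hypothesis.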

Risk difference vs. Relative risk


[Figure: bar chart of absolute risk (lung cancer deaths per 100,000 adult males per year): smokers 191, non-smokers 8.7. The ratio of the two absolute risks is the relative risk, 22.]

Attributable risk OR Risk difference


(Absolute differences in risks or rates)
Also known as attributable risk
Risk (Exp) - Risk (Unexp)
Risk of disease among exposed = a/(a+b)
Risk of disease among unexposed = c/(c+d)
Risk difference = [a/(a+b)] - [c/(c+d)]
For null hypothesis, Risk difference will equal zero
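
A small sketch of the risk difference, applied to the lung cancer death rates quoted in these slides (191 per 100,000 per year among smokers, 8.7 among non-smokers) and to the smoking and peripheral vascular disease 2 X 2 table. The function name is ours.

```python
def risk_difference(risk_exposed, risk_unexposed):
    """Absolute difference in risk (or rate) between exposed and unexposed."""
    return risk_exposed - risk_unexposed


# Lung cancer deaths per 100,000 adult males per year (from the slides)
rd_lung = risk_difference(191, 8.7)
print(f"Risk difference: {rd_lung:.1f} deaths per 100,000 per year")

# Same measure from 2 x 2 counts: a/(a+b) - c/(c+d)
a, b, c, d = 15, 1712, 41, 3188
rd_pvd = risk_difference(a / (a + b), c / (c + d))
print(f"Risk difference (smoking and PVD table): {rd_pvd:.4f}")
```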

Risk difference vs. Relative risk


[Figure: the same bar chart (lung cancer deaths per 100,000 adult males per year: smokers 191, non-smokers 8.7), marking the risk difference between the absolute risks of the exposed and the unexposed.]

Attributable risk percent among exposed


Among exposed, what percent of the total risk for disease is due to the exposure

AR% (Exposed) = [Risk (Exp) - Risk (Unexp)]/ Risk (Exp) X 100
              = (RR - 1)/ RR X 100
              = (OR - 1)/ OR X 100 (if risk is small)
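
The two forms of the attributable risk percent can be checked against each other. This sketch uses the lung cancer death rates from these slides (191 and 8.7 per 100,000 per year); the function names are ours.

```python
def ar_percent_from_risks(risk_exposed, risk_unexposed):
    """AR% = [Risk (Exp) - Risk (Unexp)] / Risk (Exp) X 100."""
    return (risk_exposed - risk_unexposed) / risk_exposed * 100


def ar_percent_from_rr(rr):
    """AR% = (RR - 1) / RR X 100."""
    return (rr - 1) / rr * 100


risk_smokers, risk_nonsmokers = 191, 8.7
rr = risk_smokers / risk_nonsmokers

ar_pct = ar_percent_from_risks(risk_smokers, risk_nonsmokers)
print(f"RR = {rr:.1f}")
print(f"AR% among smokers = {ar_pct:.1f}%")
print(f"AR% via RR        = {ar_percent_from_rr(rr):.1f}%")
```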

Attributable Risk Percent


[Figure: bar chart of absolute risk (lung cancer deaths per 100,000 adult males per year): smokers 191, non-smokers 8.7; relative risk 22. The smokers' bar is divided into % risk due to exposure and % risk due to background.]

Attributable Risk Percent


[Figure: the same bar chart (smokers 191, non-smokers 8.7). The non-smokers' bar is labelled p0; the smokers' bar is labelled p0RR and is split into p0 (background) and p0(RR-1) (due to exposure).]

Attributable risk percent = (RR - 1)/ RR X 100

Population attributable risk


In the general population, how much of the total risk for disease is due to the risk factor
PAR = Risk (Total) - Risk (Unexp)
Risk (Total) = [Proportion population Exp X Risk (Exp)] + [Proportion population Unexp X Risk (Unexp)]

Population attributable risk percent


Among the general population, what percent of the total risk for disease is due to the risk factor

PAR% = [Risk (Total) - Risk (Unexp)]/ Risk (Total) X 100
     = [Pe (RR - 1)]/ [1 + Pe (RR - 1)] X 100
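
A sketch showing that the two expressions for PAR% agree. The risks are the lung cancer death rates from these slides; the prevalence of exposure, Pe = 0.30, is a hypothetical value chosen only for illustration, since the slides do not give one.

```python
def par_percent_from_risks(p_exposed, risk_exp, risk_unexp):
    """PAR% = [Risk (Total) - Risk (Unexp)] / Risk (Total) X 100."""
    risk_total = p_exposed * risk_exp + (1 - p_exposed) * risk_unexp
    return (risk_total - risk_unexp) / risk_total * 100


def par_percent_from_rr(p_exposed, rr):
    """PAR% = Pe (RR - 1) / [1 + Pe (RR - 1)] X 100."""
    excess = p_exposed * (rr - 1)
    return excess / (1 + excess) * 100


risk_smokers, risk_nonsmokers = 191, 8.7   # per 100,000 per year (from the slides)
p_exposed = 0.30                           # hypothetical smoking prevalence (assumption)

rr = risk_smokers / risk_nonsmokers
par_a = par_percent_from_risks(p_exposed, risk_smokers, risk_nonsmokers)
par_b = par_percent_from_rr(p_exposed, rr)
print(f"PAR% from risks: {par_a:.1f}%")
print(f"PAR% from RR:    {par_b:.1f}%")
```

Unlike AR%, this measure depends on how common the exposure is: the same RR gives a smaller PAR% when Pe is lower.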

Population attributable risk percent


[Figure: chart of absolute risk of lung cancer death per 100,000 adult males per year, with the population divided into smokers (Pe) and nonsmokers (1-Pe); regions labelled RR, Pe(RR-1) and (RR-1)(1-Pe).]

Population attributable risk percent = [Pe (RR - 1)]/ [1 + Pe (RR - 1)] X 100

Risk Reduction
Risk (T/t) = a/(a+b)
Risk (Exp) = c/(c+d)
RR = Risk (T/t)/ Risk (Exp)
ARR = Risk (Exp) - Risk (T/t)
RRR = [Risk (Exp) - Risk (T/t)] / Risk (Exp) = 1 - Risk (T/t)/ Risk (Exp) = 1 - RR
NNT = 1/ARR = 1/[Risk (Exp) X RRR]
NNH
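
The risk reduction measures can be sketched as follows. The counts are hypothetical (10 events among 100 treated, 20 among 100 in the comparison group) and are used only to show how the quantities relate; variable names follow the slide's T/t and Exp labels loosely.

```python
# Hypothetical counts (illustration only, not from the slides)
a, b = 10, 90   # treated group: events, non-events
c, d = 20, 80   # comparison group: events, non-events

risk_treated = a / (a + b)
risk_comparison = c / (c + d)

rr = risk_treated / risk_comparison        # relative risk
arr = risk_comparison - risk_treated       # absolute risk reduction
rrr = arr / risk_comparison                # relative risk reduction = 1 - RR
nnt = 1 / arr                              # number needed to treat

print(f"RR = {rr:.2f}, ARR = {arr:.2f}, RRR = {rrr:.2f}, NNT = {nnt:.0f}")
```

The NNT line also shows why the grouping in 1/[Risk (Exp) X RRR] matters: dividing by the product of the comparison-group risk and the RRR gives the same value as 1/ARR.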

Analytical considerations
Concurrent follow-up
Varying follow-up dates
Moving baseline dates
Withdrawals
Competing causes of death

Analytical considerations
Concurrent follow-up
  Simple risk-based analyses
  Survival analysis
Varying follow-up dates
  Simple risk analysis for all events up to, but not exceeding, the minimum elapsed time
  Survival analysis
Moving baseline dates
  Ignore and measure elapsed time since recruitment
  Survival analysis
Withdrawals
Competing causes of failure

Advanced methods
Standardization
Stratification
Life Tables
Multivariate analysis and Cox regression

Exercise 2
A cohort study to explore the relationship between visual impairment and the risk of injuries from falls among the elderly. A total of 400 visually impaired (VI) persons >70 yrs are compared against 400 controls without VI. Over a 5-year follow-up period, 80 VI persons and 20 non-VI persons have injuries from falls.
a) Construct a 2x2 table from the information above
b) Calculate the following with their CI:
   Cumulative incidence for exposed and unexposed
   Relative risk
   Attributable risk & Attributable risk percent


Exercise 3
A retrospective cohort study to explore the relationship between perimenopausal exogenous estrogen use and the risk of coronary heart disease (CHD). A total of 5000 exposed and 5000 unexposed women are enrolled and followed for 15 years for the development of myocardial infarction (MI). A total of 200 estrogen users and 300 nonusers had MIs.

Exercise 3 (Contd.)
a) The risk (CI) of an MI among estrogen users
b) The risk (CI) of an MI among nonusers of estrogen
c) The relative risk (CIR) for MI
d) Based on the results of this study, is estrogen use a causative or protective factor for MI?

Exercise 4
Shaper et al. (1988)
A random sample of 7729 middle-aged British men
Each man was asked, at baseline, about his alcohol consumption
Over the next 7.5 years, death certificates were collected for any subject who died

Alcohol consumption group (units/wk)   Subjects   Deaths
None                                   466        41
Occasional (<1)                        1845       142
Light (1-15)                           2544       143
Moderate (16-42)                       2042       116
Heavy (>42)                            832        62

Exercise 4 (Contd.)
a) Calculate the risk and the relative risk for each alcohol consumption group.
b) Why might the conclusion based on the above table be misleading? Given adequate funding, describe how?

Exercise 5

In a cohort study of 34387 menopausal women in Iowa, intakes of certain vitamins were assessed in 1986. In the period up to the end of 1992, 879 of these women were newly diagnosed with breast cancer. The table below shows data for two vitamins, classified according to ranked categories of intake.
                  Vitamin C              Vitamin E
Intake category   Events   PY            Events   PY
1 (low)           507      124,373       570      143,117
2                 217      57,268        129      33,950
3                 76       19,357        71       19,536
4                 55       17,013        28       6,942
5 (high)          24       7,711         81       22,176

(PY = person-years)

Exercise 5 (Contd.)
a) For each vitamin, calculate the relative rates (with 95% confidence intervals) taking the low-consumption group as the base. Do your results suggest any beneficial (or otherwise) effect of additional vitamin C or E intake?
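
As a hedged sketch of the kind of calculation the exercise asks for, the snippet below computes one relative rate (vitamin C category 2 against category 1) from events and person-years. The confidence interval uses the usual log-based approximation, SE(ln rate ratio) = sqrt(1/events + 1/base events); that formula is our assumption, since the slides do not state which method to use. The remaining categories are left to the reader.

```python
import math


def rate_ratio(events, person_years, base_events, base_person_years, z=1.96):
    """Relative rate against a base group, with an approximate 95% CI."""
    ratio = (events / person_years) / (base_events / base_person_years)
    # Assumed log-based standard error for a ratio of two person-time rates
    se_log = math.sqrt(1 / events + 1 / base_events)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Vitamin C: category 2 versus category 1 (low), counts from the table above
ratio, lower, upper = rate_ratio(217, 57268, 507, 124373)
print(f"Relative rate = {ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```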

Types of bias
Selection bias
Follow-up bias
Information bias
Confounding bias
Post hoc bias

Selection bias
Group studied does not reflect the same distribution of factors (such as age, sex, SES, behavior etc.) as occurs in the general population
  Effect of volunteering
  Whole spectrum of independent variables not represented in the study group
  Presence of incipient disease
  Distribution of covariates
  Survival cohorts: cohorts ascertained long after exposure

Follow-up bias
Also known as Migration Bias
In nearly all large studies some members of the original cohort drop out of the study
If drop-outs occur randomly, such that characteristics of lost subjects in one group are on average similar to those who remain in the group, no bias is introduced
But ordinarily the characteristics of the lost subjects are not the same

Example of lost to follow-up


Original cohort

EXPOSURE (irradiation)   Diseased   Total
+                        50         10000
-                        100        20000
Total                    150        30000

RR = (50/10000) / (100/20000) = 1

After loss to follow-up

EXPOSURE (irradiation)   Diseased   Total
+                        30         4000
-                        30         8000
Total                    60         12000

RR = (30/4000) / (30/8000) = 2
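
The arithmetic of the irradiation example can be retraced in a few lines. The sketch assumes, as the figures suggest, that the first set of numbers is the full cohort and the second is what remains after loss to follow-up.

```python
def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Risk among exposed divided by risk among unexposed."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)


rr_full = risk_ratio(50, 10000, 100, 20000)      # full cohort
rr_followed = risk_ratio(30, 4000, 30, 8000)     # subjects still under follow-up

# Share of subjects and of cases retained in each exposure group
retained_subjects = (4000 / 10000, 8000 / 20000)
retained_cases = (30 / 50, 30 / 100)

print(f"RR in full cohort:       {rr_full:.1f}")
print(f"RR after loss:           {rr_followed:.1f}")
print(f"Subjects retained (+/-): {retained_subjects}")
print(f"Cases retained (+/-):    {retained_cases}")
```

Both exposure groups keep the same share of subjects, but the exposed group keeps a larger share of its cases, so the loss is related to both exposure and outcome and the observed RR moves away from the true value.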

Example. healthy worker effect


Question: association between formaldehyde exposure and eye irritation
Subjects: factory workers exposed to formaldehyde
Bias: those who suffer most from eye irritation are likely to leave the job at their own request or on medical advice
Result: remaining workers are less affected; the association is diluted

Measurement / (Mis) classification


Exposure misclassification occurs when exposed subjects are incorrectly classified as unexposed, or vice versa
Disease misclassification occurs when diseased subjects are incorrectly classified as non-diseased, or vice versa

Misclassification bias: due to measurement errors


Systematic bias
Measurement errors
  Non-differential: observed relative risk biased towards the null hypothesis
  Differential: this can lead to study results which cannot be interpreted, because the observed relative risk may be biased towards the null, away from the null, or cross over the null value compared with the true relative risk
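
The pull towards the null under non-differential error can be illustrated with expected values. In this sketch the true risks, sensitivity and specificity are hypothetical numbers chosen for illustration; the same imperfect disease classification is applied to both exposure groups.

```python
def observed_risk(true_risk, sensitivity, specificity):
    """Expected proportion classified as diseased under imperfect classification."""
    true_positives = true_risk * sensitivity
    false_positives = (1 - true_risk) * (1 - specificity)
    return true_positives + false_positives


# Hypothetical values (illustration only, not from the slides)
true_risk_exposed, true_risk_unexposed = 0.20, 0.10
sensitivity, specificity = 0.80, 0.90   # identical in both groups: non-differential

true_rr = true_risk_exposed / true_risk_unexposed
observed_rr = (
    observed_risk(true_risk_exposed, sensitivity, specificity)
    / observed_risk(true_risk_unexposed, sensitivity, specificity)
)

print(f"True RR:     {true_rr:.2f}")
print(f"Observed RR: {observed_rr:.2f}")
```

If sensitivity or specificity differed between the exposed and unexposed groups (differential error), the observed RR could move in either direction.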

Sources of measurement errors


Selection/ design of the instrument to measure the exposure
Omissions in the protocol for use of the instrument
Poor execution of the study protocol
Inherent subject characteristics
Drift in accuracy of exposure measures over time
Data processing and creation of exposure variables

Reassignment to exposure category


Changes in dichotomous exposure, if not taken into consideration, will tend to make the strength of an observed association lower than that which actually existed
  Latency is likely to be short
  Exposure accumulates over time during the study
  Very accurate results desirable
Reassignment may not be possible
  Closed cohort as a rule
  Latency is very long
  Duration of follow-up is very long
Separate examination of outcome in those who changed exposure status during the study

Confounding bias
Other factors which are associated with both outcome and exposure variables do not have the same distribution in the exposed and unexposed group

Examples confounding

[Figure: confounding triangle. COFFEE DRINKING and HEART DISEASE are both linked to SMOKING: coffee drinkers are more likely to smoke, and smoking increases the risk of heart disease.]

Resolving Confounding Bias


Standardization
Stratification
Multivariate adjustment
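
Stratification can be sketched with the coffee, smoking and heart disease example. All counts below are hypothetical and were chosen so that coffee has no effect within either smoking stratum. The pooled estimate uses the Mantel-Haenszel risk ratio, which is one common stratified estimator and our choice for illustration; the slides do not name a specific method.

```python
# Hypothetical counts (illustration only): (cases, persons) by coffee status
# within each smoking stratum.
strata = {
    "smokers":     {"coffee": (160, 800), "no_coffee": (40, 200)},
    "non_smokers": {"coffee": (10, 200),  "no_coffee": (40, 800)},
}


def crude_rr(strata):
    """Risk ratio ignoring the stratifying variable."""
    cases_e = sum(s["coffee"][0] for s in strata.values())
    n_e = sum(s["coffee"][1] for s in strata.values())
    cases_u = sum(s["no_coffee"][0] for s in strata.values())
    n_u = sum(s["no_coffee"][1] for s in strata.values())
    return (cases_e / n_e) / (cases_u / n_u)


def mantel_haenszel_rr(strata):
    """Mantel-Haenszel pooled risk ratio across strata."""
    numerator = denominator = 0.0
    for s in strata.values():
        a, n_exposed = s["coffee"]
        c, n_unexposed = s["no_coffee"]
        total = n_exposed + n_unexposed
        numerator += a * n_unexposed / total
        denominator += c * n_exposed / total
    return numerator / denominator


print(f"Crude RR:           {crude_rr(strata):.3f}")
print(f"Smoking-adjusted RR: {mantel_haenszel_rr(strata):.3f}")
```

In these made-up data the crude association comes entirely from the unequal distribution of smokers between coffee drinkers and non-drinkers, and it disappears once the comparison is made within smoking strata.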

Post hoc bias


Use of data from a cohort study to make observations that were not part of the original study intent.

Thank you

Internal & External validity

7/13/2007

PRD-91
