
Hong Kong Journal of Emergency Medicine

Basics of research - part 4B: Sample size, data collection, and bias
CV Pollack and EA Panacek
Correspondence to:
Charles V. Pollack, Director
Maricopa Medical Center, Department of Emergency
Medicine, Phoenix, Arizona, USA
Edward A. Panacek
UC-Davis Medical Center, Division of Emergency Medicine
Part 4A of this article was published in the last issue of this
Journal.
Measurement collection methods
Once the design and subject selection procedures
are determined, the researcher must next consider
how the variables of interest will be measured. Many
different methods of data collection are available
depending upon the research question and resources
of the investigator. Data collection methods vary
in the degree of structure, quantifiability, researcher
obtrusiveness, and objectivity. Highly structured
methods are preferable when a specific,
non-exploratory research question is being asked. For
example, structured methods would work well for
the question "is heparin or normal saline a better
agent to maintain patency of a heparin lock?" In
contrast, less structured methods may be appropriate
for the question "what is the experience of being
transported by ambulance for acute abdominal
pain?"
Some variables are inherently more quantifiable
than others. Blood pressure and other vital signs
are easily quantifiable. However, level of stress or
skill in intubation is less readily quantifiable.
Not every variable needs to be measured
quantitatively, but reproducibility and reliability are
usually higher when the measure can be quantified.
Obtrusiveness of the research protocol can impact
the quality of the data obtained. An individual
under scrutiny by a researcher may alter his
usual behaviour, either for better or for worse. If
observation during a resuscitation is used as a
research method it may be difficult for the observer
to remain unobtrusive because of the small space
involved. The observer should make every attempt
not to interfere with the normal process of events.
In addition, participant bias is reduced if the
participants are blinded to the purpose of the observer.
Finally, measurement techniques can vary in degree
of objectivity. Objectivity is the degree to which
two individuals can provide the same measure on a
specific variable. Two people determining end tidal
CO2 as a measure of intubation success would be
more objective than two people determining success
by visual inspection alone. Degree of objectivity is
increased when the measurement technique relies
more on standard procedure than on subjective
opinion. Objectivity also is increased when the
observer is not involved in provision of patient care
or other research activity being measured.
Biophysiologic measures, self report, and
observation are three common methods used to
collect data for an investigation and vary in their
degree of structure, quantifiability, researcher
obtrusiveness, and objectivity (Tables 1 & 2). In
order to identify the measurement collection
methods best for the project, the investigator should
first list the variables of interest in the study that are
included within the hypotheses or research questions.
Once the methods of data collection are identified
the researcher should become aware of the limitations
of the particular method of data collection chosen
and implement procedures to limit the difficulties
whenever possible. There are generally two ways to
accomplish this. One is to have the protocol and data
collection sheets reviewed before the study by as many
people as possible. The other approach is to "pilot
test" the data collection method before the full study
using old charts or a few actual patients.
Hong Kong j. emerg. med. Vol. 7(1) Jan 2000
Biophysiologic measures are increasingly common
with healthcare research. This trend is due partially
to the increased technological nature of healthcare.
Biophysiological measures include but are not
limited to blood pressure, weight, and heart rate.
Standards for the measure of each of these variables
are available, increasing the objectivity of the
measures and the ability to reproduce results from
moment to moment or researcher to researcher.
A primary disadvantage of biophysiological
measurements can be overly high reliance on their
validity and reliability. The presence of a
quantifiable number may give a false sense of
accuracy. If a temperature gauge reads 98.64 degrees
it may or may not actually be accurate to 0.1 degree.
Researchers should establish, rather than blindly
accept, the degree of accuracy present in their
physiological measures. Another limitation results
from increasing complexity of biophysiological
devices. Such devices can provide inaccurate data
unless they are correctly used and with increased
complexity it may be more difficult to detect
equipment malfunction.
Table 1. Definition of terms
Biophysiologic measures: measures of biological function obtained
    through use of technology, such as electrocardiogram or
    haemodynamic monitoring
Self report: the variables of interest are measured by asking the
    subject to report on their perception of the value for the variable
Psychological scale: usually a number of self report items combined in
    a questionnaire designed to evaluate the subject on a particular
    psychological trait such as self esteem
Observation: the activity of interest is observed, described, and
    possibly recorded via audio or video tape
Validity: how well the tool measures what it is supposed to measure
Face validity: the instrument looks like it is measuring what it
    should be measuring
Criterion related validity: the results from the tool of interest are
    compared to those of another criterion that relates to the
    variable to be measured
Concurrent validity: criterion-related validity in which the measures
    are obtained at the same time
Predictive validity: criterion-related measurement using one
    instrument to predict the value from another instrument at a
    future point in time
Content validity: concerned with whether the questions asked, or
    observations made, actually address all of the variable of interest
Construct validity: the researcher is not as concerned with the values
    obtained by the instrument but with the abstract match between the
    true value and the obtained value
Reliability: the degree of consistency with which an instrument
    measures the variable it is designed to measure
Stability: determination of the degree of change in a measure across
    time; determination of stability is only appropriate when the
    value for the variable of interest is expected to remain the same
    over the time period examined
Inter-rater reliability: the degree to which two or more evaluators
    agree upon the measurement obtained
Internal consistency: the degree to which items on a questionnaire or
    psychological scale are consistent with each other
Table 2. Data collection methods
                   Quantifiability  Objectivity  Structure  Obtrusiveness
Biophysiological   XXX              XXX          XXX        XX
Self report        XX               XX           XXX        XXX
Observation        X                X            Varies     Varies
Number of Xs symbolises the degree to which the characteristic is met.
"Self report" data also is common within the
healthcare environment. Self-report data is easy to
obtain and with some approaches can be given at
least the appearance of quantifiability. Self-report
data can be in the form of diaries, interviews, or
completion of a list of written or verbal questions.
Self-report can be used to measure attitudes,
psychological tendencies, and behaviours. In some
studies, self-report is the only way to measure the
variable of interest, especially when the variables
are subjective. For example, attitudes towards
specific policies may not be amenable to
observation but the subject may be willing to express
their views in a written or verbal format. Self-report
is not as constrained as other methods. An
individual may be able to recall feelings or
experiences from a previous point in time when
observation or biophysiological measurements were
not possible. As an example, this approach can be
used to measure amounts of "pain".
Surveys or mailed questionnaires are common forms
of self-report because of the ease of development
and ease of analysis. The usual format is to pose a
question and leave a space for the subject's response.
The more specific the answer requested the easier
data analysis, but the more stilted the responses
might be.
Another common approach to collecting self-report
data is a psychological scale. Researchers have
developed specific questionnaires to measure
variables such as work satisfaction, self-esteem, and
quality of life. The advantage of this approach is
that usually the validity and reliability of the
instrument have been previously established, a
method of data analysis is predetermined, and time
consuming instrument construction is avoided. The
disadvantages include concerns that the tool does
not precisely measure your variable of interest and
that the originator might charge for use of the
instrument.
Observation is the final approach to data collection
to be discussed. In observation the activity of interest
is observed, described, and possibly recorded via
audio or videotape. The investigator then analyses
the episode for the variables of interest. For example,
studies examining administration of
cardiopulmonary resuscitation may collect data
such as observed depth of compression or adequacy
of chest rise during ventilation. Although more
intrusive methods could be used to provide
quantitative data such as measured depth of
compression or tidal volume, observation and
subjective appraisal may be used in order to
minimise intrusiveness of the data collection.
Observation methods have the advantage of being
usable in many settings, of maintaining some of the
context of the situation, of providing a way to
re-examine the situation after it occurred, and of
allowing for interpretation by the researcher.
Observations that are recorded can be analysed by
more than one individual in an attempt to decrease
the subjective nature of data analysis. Observation,
however, has several disadvantages. Bias in recording
and evaluation of the observations is a possibility,
even with a legitimate effort to increase objectivity.
The presence of an observer or a recording device
may make the subject more aware of their actions,
causing an alteration in their behaviour.
Validity of measurements
In designing a research study the investigator
attempts to use the best tool for measuring the
variable of interest. Unfortunately, the true score
of the variable is never absolutely known. An
obtained score is always altered to a certain degree
by "error in measurement." The error in
measurement can have multiple causes, including
limits on the validity and reliability of the instrument.
Validity is the "degree to which an instrument
measures what it is supposed to be measuring."
Biophysiological measures have "relatively" high
validity because the measurement technique may
be based on the definition and upon sound scientific
principles. For example, blood pressure is the
pressure in the cardiovascular system. The
measurement of pressure is a relatively
straightforward process. In contrast, development
of a valid tool to measure pain is much more
difficult. Not everyone agrees on a definition of pain
so developing a tool to address a nebulous and highly
subjective entity is much more difficult. The
researcher might question if the tool measures pain
or if it really measures something else such as
anxiety.
Validity is very difficult to ensure because absolute
knowledge cannot be obtained. Researchers use
several "round about" methods to try to
demonstrate that an instrument is valid and
measuring what it says it measures. Of all of the
measures of validity, face validity may be the easiest
to establish. Face validity only means that the
instrument looks like it is measuring what it should
be measuring [4] and is an intuitive and subjective
judgement. At a very minimum a tool must have
face validity, but as this is the weakest test of validity,
other approaches also should be used to establish
validity.
Criterion-related validity uses the process of
comparing the tool of interest to another criterion
that relates to the variable of interest. A critique of
this approach is that if there is another tool that
can be used as the "gold standard," why not use it
instead. Use of the "gold standard" may be suitable
in most cases, but sometimes the better instrument
may not be appropriate in the research environment.
For example, to establish the criterion-related
validity of pulse oximetry as a measure of blood
oxygenation, the values obtained from pulse
oximetry might be compared to the values obtained
from an arterial blood gas. During transport blood
gases are not available, so the less invasive pulse
oximetry might be the only way to obtain SaO2
data for a study. In the above example, the blood
gas and the pulse oximetry measure would be
obtained at the same time to establish
criterion-related validity.
The above method is considered establishment of
concurrent validity as the two measures were done
at the same time. Another form of criterion-related
validity is predictive validity. Here the measure of
interest is obtained and then at a future time another
criterion is measured. The philosophy behind this
approach is that if X leads to Y with a certain
frequency, and one measures X, then one should
be able to measure Y to verify the validity of X. For
example, if the revised trauma score measures
severity of injury and should predict mortality then
a proven correlation between trauma score and
patient mortality would be evidence of predictive
validity for the revised trauma score.
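As an illustration of the concurrent-validity comparison described for
pulse oximetry, the sketch below compares paired readings taken at the
same time. It is only one possible way to summarise such data: the
readings are invented for illustration, and the choice of a Pearson
correlation plus a mean difference (bias), as well as the function names,
are assumptions rather than a method stated in this article.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_bias(test, criterion):
    """Average of (test - criterion); positive means the test reads high."""
    return mean(t - c for t, c in zip(test, criterion))


# Hypothetical paired saturation readings (%), invented for illustration.
pulse_ox_sat = [97, 94, 91, 88, 96, 85]    # tool of interest
blood_gas_sat = [98, 95, 90, 86, 97, 84]   # criterion ("gold standard")

r = pearson_r(pulse_ox_sat, blood_gas_sat)
bias = mean_bias(pulse_ox_sat, blood_gas_sat)
print(f"correlation = {r:.3f}, mean bias = {bias:.2f} percentage points")
```

A high correlation together with a small bias would support concurrent
validity in this sense. A high correlation alone would not, because a
tool can track the criterion closely while reading consistently high or
low.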
Content validity deals with whether the questions
asked, or observations made, actually address all of
the variable of interest. Content validity relates
more to self-report data and observations than to
biophysiological measures. However, content
validity also would be relevant when looking at
composite biophysiological measures that are
combined to make more complex assessments. For
example, content validity of the revised trauma score
would be established by determining if the
individual components of the revised trauma score
covered all of the items necessary to describe the
severity of the trauma.
Unfortunately, content validity cannot be directly
measured in most cases, as is possible with
criterion-related validity. Establishment of content validity
relies most commonly on the opinion of experts.
For educational assessment tools, comparison of the
tool against the list of objectives or course outline
might be an approach to the establishment of
content validity. In this way, content validity is
similar to face validity. The difference is that face
validity often involves the same people both as the
subjects and as the experts. Secondly, content
validity is more concerned with the question of
whether everything is covered and nothing left out.
As a result, content validity uses more specific and
objective criteria.
Construct validity is perhaps the most difficult
type of validity to understand and to measure.
Establishment of construct validity is a more
abstract process as the researcher is not as concerned
with the values obtained by the instrument but with
the abstract match between the true value and the
obtained value. Further discussion is beyond the
scope of this series. (For more information see Polit
and Hungler, 1995.)
Instrument reliability
In contrast to validity, reliability is the degree of
consistency with which an instrument measures the
variable it is designed to measure [4]. Fortunately,
establishing instrument reliability is easier than
establishing validity. It is important to note that an
unreliable instrument cannot be valid. If the
instrument does not measure something the same
way twice, at least one of those times the instrument
cannot be measuring what it is supposed to measure.
In contrast, an instrument can be very reliable and
yet not have validity. For example, if one takes a
blood pressure multiple times and each time it is
the same, that is a reliable measure. But if one is
determining "level of stress," measuring blood
pressure by itself is not a valid measure of stress,
despite its obvious reliability. As with validity there
are several types of reliability: stability across time,
inter-rater reliability, internal consistency, and
equivalence. Stability across time is measured using
the test-retest approach. A measurement is taken at
one point in time and then repeated at a later time.
This approach to measuring
reliability is only appropriate when the variable
being measured can be considered stable across the
chosen period of time. For example, the height of
an adult can be expected to remain the same for
relatively long periods of time. To measure the
stability of a ruler as a measure of height, one height
measurement could be taken today and another in
a month. If instead the measure could be expected
to change more frequently, such as weight, placing
the two measurements at a one-month interval could
not be expected to provide test-retest reliability.
Instead, having the individual step off the scale, wait
a minute or two and then step back on the scale
would be a more appropriate evaluation of stability
because weight does not fluctuate over a 1-2 minute
period of time. When a researcher wishes to examine
test-retest reliability careful consideration must be
made of the length of time over which stability can
reasonably be expected.
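The step-off, step-on weighing example can be summarised with a very
small calculation. The sketch below is an assumption about how one might
do this, not a procedure from the article: the weights are invented, the
function name is hypothetical, and the summary reported (mean difference
and largest absolute difference between the two readings) is just one
simple choice. A correlation coefficient between the first and second
readings is another common summary.

```python
from statistics import mean


def retest_differences(first, second):
    """Summarise how far repeated readings drift from the first readings."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty, equal-length sequences")
    diffs = [b - a for a, b in zip(first, second)]
    return {
        "mean_difference": mean(diffs),
        "largest_absolute_difference": max(abs(d) for d in diffs),
    }


# Hypothetical weights in kg for four subjects, invented for illustration.
weights_first = [70.2, 82.5, 64.0, 91.3]    # first time on the scale
weights_second = [70.3, 82.4, 64.0, 91.5]   # a minute or two later

summary = retest_differences(weights_first, weights_second)
print(summary)
```

Differences close to zero over an interval in which the true value cannot
have changed suggest a stable instrument. Large differences point to
measurement error rather than to real change in the subject.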
Inter-rater reliability is the degree to which two
or more evaluators agree upon the measurement
obtained. For example, to test inter-rater reliability
of a blood pressure measurement a double
stethoscope would be used to determine if both
researchers would agree on a single blood pressure
value. This method is most important in assessing
methods that have a greater degree of subjectivity
(e.g., patient mental status). Researchers using
observational methods should examine inter-rater
reliability before collecting study data to assure that
everyone is looking for the same behaviours or
measurements.
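For categorical judgements such as mental status, agreement between two
raters is often summarised with Cohen's kappa, which corrects the
observed agreement for the agreement expected by chance. Kappa is not
named in this article; it is used here as a commonly applied statistic,
and the ratings and function name below are invented for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each classify the same subjects."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty, equal-length rating lists")
    n = len(rater_a)
    # Proportion of subjects on which the raters gave the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1 - expected)


# Hypothetical mental status ratings for ten patients.
rater_a = ["alert", "alert", "confused", "alert", "confused",
           "alert", "confused", "alert", "alert", "confused"]
rater_b = ["alert", "alert", "confused", "confused", "confused",
           "alert", "alert", "alert", "alert", "confused"]

kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

In this invented data the raters agree on 8 of 10 patients, but part of
that agreement would be expected by chance, so kappa is lower than the
raw 80% agreement.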
Internal consistency is a little more complex and
is the degree to which items on a questionnaire or
psychological scale are consistent with each other.
Questionnaires that are consistent have items that
are directed at measuring the same thing. For
example, a scale to measure self esteem would have
a number of questions all directed at measuring a
component of self esteem. Achieving a questionnaire
with internal consistency is a balancing act. The
goal is to arrive at a questionnaire that is consistent
without being redundant. Long questionnaires may
not be completed and so the goal is to ask as few
questions as possible that provide a valid measure
of the variable of interest.
Two main techniques are used to measure internal
consistency: split-half reliability and Cronbach's
coefficient alpha. A discussion of the two methods
is beyond the scope of this series [4].
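For readers who want to see the arithmetic, the following is a minimal
sketch of Cronbach's coefficient alpha using its standard formula,
alpha = k/(k-1) * (1 - sum of item variances / variance of total scores),
where k is the number of items. The questionnaire responses and the
function name are invented for illustration; reference 4 remains the
place to look for a full treatment.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha; `responses` holds one list of item scores per subject."""
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer the same 2+ items")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("alpha is undefined when all total scores are equal")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers (1-5 scale) from five subjects on a 3-item scale.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
    [3, 3, 3],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Values near 1 indicate that the items move together. As the text notes,
a very high value on a long questionnaire may also signal redundant
items.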
A final form of reliability is parallel forms. Parallel
forms is an examination of two instruments used to
measure the same variable. For example, one may
not want all students to have the exact same test if
they are sitting very close to each other when taking
the exam. One also may want to repeat the exam at
a very short interval without subjects
remembering questions from the first time. To assure
that the instruments are reliable the researcher needs
to have one group of subjects complete both forms
at the same sitting. A correlation between the two
forms is then done to determine the degree of
reliability.
Conclusion
There are many factors that impact the quality of a
research study. Not all points must be addressed
with a given design. In many cases an eye to
common sense will help the researcher to identify
potential sources of bias in the research design. Not
all sources of bias can be eliminated, but an attempt
should be made to eliminate or reduce bias where
possible.
Submitting the research proposal to others is a
helpful method of determining sources of bias.
Comparison of the research protocol to published
reports of other similar studies also may be helpful.
The methods section of a research report should
discuss the steps taken by the researchers to minimise
bias. Similar approaches then can be used in the
proposed study.
This article in the series is meant to discuss the many
issues associated with "fleshing out" a research
protocol. The subject can become very complex
because of the broad spectrum of clinical research.
It is impossible to go into each area in great detail,
but a number of reference textbooks are available
for those who wish to learn more on this subject. It
is important to appreciate the
planning phase of a study, which can take longer
and be more difficult than the study itself.
As discussed in the first parts of this series, a research
proposal should be based upon sound scientific
principles. However, the quality of the science and
the ethics of a study are two different issues. The
next article in the series will discuss the ethics of
research and methods for assurance that the rights
of human subjects are protected within the research
design.
References
1. Cohen J. Statistical Power Analysis for the Behavioral
Sciences. Hillsdale, NJ: Lawrence Erlbaum Associates,
1988.
2. Heberlein TA, Baumgartner R. Factors affecting
response rates to mailed questionnaires: A
quantitative analysis of the published literature. Amer
Soc Rev 1978;43:447-62.
3. Cohen J, Cohen P. Applied Multiple Regression/
Correlation Analysis for the Behavioral Sciences.
Hillsdale, NJ: Lawrence Erlbaum Associates, 1983.
4. Polit DF, Hungler BP. Nursing Research: Principles
and Methods, 5th edition. Philadelphia, PA: JB
Lippincott Company, 1995.
Recommended Texts
1. Polit DF, Hungler BP. Nursing Research: Principles
and Methods, 5th edition. Philadelphia, PA: JB
Lippincott Company, 1995.
2. Bailey DM. Research for the Health Professional: A
Practical Guide. 1991. F.A. Davis, Co., Philadelphia.
3. Hulley SB, Cummings SR. Designing Clinical
Research. Williams and Wilkins. Baltimore, MD.
1988.
4. Okolo EN. Health Research Design and
Methodology. 1990. CRC Press, Inc., Boca Raton,
FL.
