Test Validity

A. Definition
Validity is arguably the most important criterion for the quality of a test.
The term validity refers to whether or not the test measures what it claims to measure. On a
test with high validity, the items will be closely linked to the test's intended focus. For many
certification and licensure tests this means that the items will be highly related to a
specific job or occupation. If a test has poor validity, then it does not measure the job-
related content and competencies it ought to, and in that case there is no
justification for using the test results for their intended purpose. There are several ways
to estimate the validity of a test, including content validity, concurrent validity, and
predictive validity. The face validity of a test is sometimes also mentioned.
Ruch (1924: 13) puts it this way: "By validity is meant the degree to which a test or
examination measures what it purports to measure." Validity might also be expressed
more simply as the worthwhileness of an examination. For an examination to possess
validity, it is necessary that the materials actually included be of prime importance, that
the questions sample widely among the essentials over which complete mastery can
reasonably be expected on the part of the pupils, and that proof can be brought forward
that the test elements (questions) can be defended by arguments based on more than
mere personal opinion. Although we often speak of a given test's validity, this is
misleading, because validity is not simply a function of the content and procedures of
the test itself; rather, validity is concerned with identifying the factors that produce the
reliable variance in test scores.
How We Apply Test Validity
It is useful to think of validity as truthfulness: have we made the correct inference from
the test score? For an inference from a test to be valid, or truthful, the test must first be
reliable. If we cannot even get a bathroom scale to give us a consistent weight measure,
we certainly cannot expect to make accurate inferences about weight from it. Note,
however, that a measure might be consistent (reliable) but not accurate (valid). A scale
may record weights as two pounds too heavy each time. In other words, reliability is a
necessary but not sufficient condition for validity. (Neither validity nor reliability is an
either/or dichotomy; there are degrees of each.)
In discussing validity, it is useful to think of two general types of inferences: (1) making
inferences about performance other than that measured; and (2) making inferences
about a property (behavioral domain) of the person measured. The first is a statistical
inference; the second is a measurement inference (Guion, 1983). When a score is used
to infer other performance, we are, in a sense, predicting performance. Knowing the
degree to which the prediction or inference is accurate depends on criterion-related
validity evidence. In test validation we are not examining the validity of the test content,
or even of the test scores themselves, but rather the validity of the way we interpret or
use the information gathered through the testing procedure. Generalizability theory
provides a framework for identifying and simultaneously examining several sources of
error in test scores. Messick (1989), for example, describes validity as "an integrated
evaluative judgment of the degree to which empirical evidence and theoretical rationales
support the adequacy and appropriateness of inferences and actions based on test scores."
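To make the scale example concrete, here is a minimal Python sketch; the readings
and true weight below are invented for illustration only:

    # A reliable-but-not-valid measure: the bathroom scale from the text.
    # Readings and true weight are assumptions invented for illustration.
    readings = [152.1, 152.0, 151.9, 152.0, 152.1]  # consistent output (lb)
    true_weight = 150.0

    mean_reading = sum(readings) / len(readings)
    spread = max(readings) - min(readings)

    print(f"spread = {spread:.1f} lb (small, so the scale is reliable)")
    print(f"bias   = {mean_reading - true_weight:+.1f} lb (consistently heavy, so not valid)")
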

B. Types of Validity
a. Concurrent Validity
1. What is Concurrent Validity?
Another important method for investigating the validity of a test is
concurrent validity. Concurrent validity is a statistical method using correlation,
rather than a logical method. Examinees who are known to be either masters or
non-masters on the content measured by the test are identified, and the test is
administered to them under realistic exam conditions. Information on concurrent
criterion relatedness is undoubtedly the most commonly used in language testing.
Such information typically takes one of two forms:
(1) examining differences in test performance among groups of
individuals at different levels of language ability, or
(2) examining correlations among various measures of a given ability.
If we can identify groups of individuals that are at different levels on the
ability in which we are interested, we can investigate the degree to
which a test of this ability accurately discriminates between these
groups of individuals. Typical groups that have been used for such
comparisons are native speakers and non-native speakers of the
language (for example, Chihara et al. 1977; Alderson 1980; Bachman
1985).
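To illustrate the first approach, the following sketch assumes hypothetical score
samples for native and non-native speakers (all data invented) and checks how
sharply the test separates the two groups:

    import numpy as np
    from scipy import stats

    # Hypothetical scores for two groups known to differ in language
    # ability; the test should discriminate between them (data invented).
    native = np.array([88, 92, 85, 90, 94, 87, 91])
    non_native = np.array([70, 75, 68, 72, 77, 65, 74])

    # A large t and small p suggest the test separates the groups well.
    t, p = stats.ttest_ind(native, non_native)
    print(f"t = {t:.2f}, p = {p:.4f}")
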
2. How do we do Concurrent Validity?
In concurrent validation, the predictor and criterion data are collected at or
about the same time. This kind of validation is appropriate for tests designed to
assess a person's current criterion status, and it is well suited to diagnostic
screening tests.
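A minimal sketch of concurrent validation, assuming test scores and an
independent master (1) / non-master (0) judgment gathered at about the same
time (all data invented for illustration):

    import numpy as np
    from scipy import stats

    # Hypothetical data: scores on the test and a concurrent master (1) /
    # non-master (0) classification from an independent criterion.
    scores = np.array([85, 90, 78, 92, 60, 55, 88, 62, 70, 95])
    master = np.array([1, 1, 0, 1, 0, 0, 1, 0, 0, 1])

    # Point-biserial correlation serves as the concurrent validity
    # coefficient in this sketch.
    r, p = stats.pointbiserialr(master, scores)
    print(f"concurrent validity coefficient r = {r:.2f} (p = {p:.4f})")
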
b. Construct Validity
1. What is Construct Validity?

Construct validity is indeed the unifying
concept that integrates criterion and content considerations into a common
framework for testing rational hypotheses about theoretically relevant
relationships (Messick 1980: 1015). In discussing the different aspects of
validation so far, I have repeatedly referred to construct validity, and it is high
time I attempted to explain what this is. Construct validity concerns the extent to
which performance on tests is consistent with predictions that we make on the
basis of a theory of abilities, or constructs. Historically, the notion of construct
validity grew out of efforts in the early 1950s by the American Psychological
Association to prepare a code of professional ethics, part of which would address
the adequacy of psychological tests (Cronbach 1988). The seminal article that
described the need for construct validation and provided the conceptual
framework for its investigation is that of Cronbach and Meehl (1955). In the thirty
years since, construct validity has come to be recognized by the measurement
profession as central to the appropriate interpretation of test scores, and provides
the basis for the view of validity as a unitary concept. Construct validation can
thus be seen as a special case of verifying, or falsifying, a scientific theory, and
just as a theory can never be proven, the validity of any given test use or
interpretation is always subject to falsification.

2. How do we apply Construct Validity?

To see how construct validity is applied, consider an example: if I generate
descriptions of fluency and anxiety, I may hypothesize that, as anxiety increases, fluency will
decrease, and vice versa. If this hypothesis is tested and can be supported, we
have the very primitive beginnings of a theory of speaking that relates how we
perform to emotional states. To put this another way, concepts become constructs
when they are so defined that they can become operational: we can measure
them in a test of some kind by linking the term to something observable (whether
this is ticking a box or performing some communicative action), and we can
establish the place of a construct in a theory that relates one construct to another
(Kerlinger and Lee, 2000: 40), as in the case of fluency and anxiety above. For
example, if one were to develop an instrument to measure intelligence that
does indeed measure IQ, then this test is construct valid. Construct validity is very
much an ongoing process as one refines a theory, if necessary, in order to make
predictions about test scores in various settings and situations.
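As a sketch of how such a hypothesis could be checked, assuming operational
measures of anxiety and fluency for the same examinees (all scores invented):

    import numpy as np
    from scipy import stats

    # Hypothetical operational measures: self-reported anxiety and rated
    # fluency for the same examinees (values invented for illustration).
    anxiety = np.array([2, 5, 1, 4, 3, 6, 2, 5])
    fluency = np.array([8, 4, 9, 5, 6, 3, 7, 4])

    # The theory predicts a negative correlation; finding one supports
    # (but never proves) the construct interpretation of the scores.
    r, p = stats.pearsonr(anxiety, fluency)
    print(f"r = {r:.2f}, p = {p:.4f}")  # expect r < 0 if the hypothesis holds
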

3. Why do we choose Construct Validity?


c. Predictive Validity
1. Definition

Another statistical approach to validity is predictive validity. This
approach is similar to concurrent validity, in that it measures the relationship
between examinees' performances on the test and their actual status as masters or
non-masters. However, with predictive validity, it is the relationship of test
scores to an examinee's future performance as a master or non-master that is
estimated.

2. How does Predictive Validity work?

Predictive validity considers the question, "How well does the test predict
examinees' future status as masters or non-masters?" For this type of validity, the
correlation that is computed is between the examinees' classifications as master or
non-master based on the test and their later performance, perhaps on the job. This
type of validity is especially useful for test purposes such as selection or
admissions.

In order to examine the predictive utility of test scores in cases such as
these, we would need to collect data demonstrating a relationship between scores
on the test and job or course performance. In this case our primary concern is the
accuracy with which our test scores predict the criterion behaviors in which we
are interested, and our procedures will thus focus on the problems of prediction.
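A minimal sketch of this procedure, assuming master/non-master classifications
at test time and job-performance ratings gathered later (all data invented):

    import numpy as np
    from scipy import stats

    # Hypothetical data: master (1) / non-master (0) classification at test
    # time, and a job-performance rating collected months later (invented).
    classified_master = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 0])
    later_rating = np.array([4.5, 4.0, 2.5, 4.8, 3.0, 2.0, 3.9, 2.8, 4.2, 3.1])

    # Correlation between the test classification and the future
    # criterion serves as the predictive validity coefficient here.
    r, p = stats.pointbiserialr(classified_master, later_rating)
    print(f"predictive validity coefficient r = {r:.2f} (p = {p:.4f})")
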

3. Why do we choose Predictive Validity?


d. Criterion Validity
1. Definition
Criterion-related validity is a concern for tests that are designed to predict
someone's status on an external criterion measure. A test has criterion-related
validity if it is useful for predicting a person's behavior in a specified situation.
Criterion-related validity can be also assessed when one is interested in
determining the relationship of scores on a test to a specific criterion.
Another kind of information we may gather in the validation process is
that which demonstrates a relationship between test scores and some criterion
which we believe is also an indicator of the ability tested. This criterion may be
level of ability as defined by group membership, individuals' performance on
another test of the ability in question, or their relative success in performing some
task that involves this ability. In some cases this criterion behavior may be
concurrent with, or occur nearly simultaneously with, the administration of the
test, while in other cases it may be some future behavior that we want to predict.
2. How do we apply Criterion Validity?
For example, scores on an admissions test for graduate school
should be related to relevant criteria such as grade point average or completion of
the program. Conversely, an instrument that measured your hat size would most
assuredly demonstrate very poor criterion-related validity with respect to success
in graduate school.
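As a sketch of this check, assuming hypothetical admissions-test scores and later
graduate GPAs for the same students (invented data), the criterion-related
validity coefficient is simply the correlation between the two:

    import numpy as np

    # Hypothetical data: admissions-test scores and later graduate GPA
    # for the same students (all values invented for illustration).
    admit_scores = np.array([310, 325, 298, 330, 305, 290, 318, 322])
    grad_gpa = np.array([3.4, 3.8, 3.0, 3.9, 3.3, 2.9, 3.6, 3.7])

    # Correlation between test scores and the external criterion (GPA).
    r = np.corrcoef(admit_scores, grad_gpa)[0, 1]
    print(f"criterion-related validity coefficient r = {r:.2f}")
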
e. Determining a Test's Validity
1. What is Determining a Test's Validity?
2. How does Determining a Test's Validity work?
3. Advantage of Determining a Test's Validity
