A. Definition
Validity is arguably the most important criterion for the quality of a test. The term validity refers to whether or not the test measures what it claims to measure. On a test with high validity, the items will be closely linked to the test's intended focus. For many certification and licensure tests this means that the items will be highly related to a specific job or occupation. If a test has poor validity, it does not measure the job-related content and competencies it ought to, and in that case there is no justification for using the test results for their intended purpose. There are several ways to estimate the validity of a test, including content validity, concurrent validity, and predictive validity. The face validity of a test is sometimes also mentioned.
Ruch (1924: 13) writes: "By validity is meant the degree to which a test or examination measures what it purports to measure." Validity might also be expressed more simply as the worthwhileness of an examination. For an examination to possess validity, it is necessary that the materials actually included be of prime importance, that the questions sample widely among the essentials over which complete mastery can reasonably be expected of the pupils, and that proof can be brought forward that the test elements (questions) can be defended by arguments based on more than mere personal opinion.
Although we often speak of a given test's validity, this is misleading, because validity is not simply a function of the content and procedures of the test itself. Rather, validity is concerned with identifying the factors that produce the reliable variance in test scores.
How We Apply Test Validity
It is useful to think of validity as truthfulness: have we made the correct inference from the test score? For an inference from a test to be valid, or truthful, the test must first be reliable. If we cannot even get a bathroom scale to give us a consistent weight measurement, we certainly cannot expect to make accurate inferences about weight from it. Note, however, that a measure might be consistent (reliable) but not accurate (valid): a scale may record weights as two pounds too heavy every time. In other words, reliability is a necessary but not sufficient condition for validity. (Neither validity nor reliability is an either/or dichotomy; there are degrees of each.)
In discussing validity, it is useful to think of two general types of inferences: (1) making inferences about performance other than that measured, and (2) making inferences about a property (behavioral domain) of the person measured. The first is a statistical inference; the second is a measurement inference (Guion, 1983). When a score is used to infer other performance, we are, in a sense, predicting performance. Knowing the degree to which the prediction or inference is accurate depends on criterion-related validity evidence.
In test validation we are not examining the validity of the test content, or even of the test scores themselves, but rather the validity of the way we interpret or use the information gathered through the testing procedure. Generalizability theory provides a framework for identifying and simultaneously examining several sources of error in test scores. Messick (1989), for example, describes validity as an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores.
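The bathroom-scale analogy can be sketched numerically. In this hypothetical simulation (all numbers invented), a scale reads a true 150-pound weight with a constant two-pound bias and very little random noise: the readings are highly consistent (reliable) yet systematically wrong (not valid).

```python
import random
import statistics

random.seed(0)

TRUE_WEIGHT = 150.0  # hypothetical true weight in pounds
BIAS = 2.0           # the scale reads two pounds too heavy every time
NOISE_SD = 0.1       # tiny random error, so readings are very consistent

# Ten repeated weighings on the biased scale.
readings = [TRUE_WEIGHT + BIAS + random.gauss(0, NOISE_SD) for _ in range(10)]

spread = statistics.stdev(readings)               # small -> reliable
error = statistics.mean(readings) - TRUE_WEIGHT   # about +2 -> not valid

print(f"spread of readings: {spread:.2f} lb")
print(f"systematic error:   {error:+.2f} lb")
```

The small spread shows reliability; the persistent two-pound error shows that consistency alone does not make the measurements valid.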
B. Types of Validity
a. Concurrent Validity
1. What is Concurrent Validity?
Another important method for investigating the validity of a test is concurrent validity. Concurrent validity is a statistical method, using correlation, rather than a logical one. Examinees who are known to be either masters or non-masters of the content measured by the test are identified, and the test is administered to them under realistic exam conditions. Information on concurrent criterion-relatedness is undoubtedly the most commonly used in language testing. Such information typically takes one of two forms:
(1) Examining differences in test performance among groups of
individuals at different levels of language ability, or
(2) Examining correlations among various measures of a given ability. If
we can identify groups of individuals that are at different levels on the
ability in which we are interested, we can investigate the degree to
which a test of this ability accurately discriminates between these
groups of individuals. Typical groups that have been used for such
comparisons are native speakers and non-native speakers of the
language (for example, Chihara et al. 1977; Alderson 1980; Bachman
1985).
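The first approach, discriminating between known groups, can be sketched as below. The scores and group labels are invented for illustration; the idea is simply that a test of the ability should clearly separate a group assumed to be high on it (e.g., native speakers) from a group assumed to be lower (e.g., non-native learners).

```python
import statistics

# Hypothetical scores on a 100-point language test (invented data).
native = [88, 92, 85, 90, 94, 87]      # group assumed high on the ability
non_native = [61, 70, 58, 66, 73, 64]  # group assumed lower on the ability

# A valid test should show a clear gap between the two group means.
mean_gap = statistics.mean(native) - statistics.mean(non_native)

# A rough standardized effect: gap divided by the overall spread.
overall_sd = statistics.stdev(native + non_native)
effect = mean_gap / overall_sd

print(f"mean gap: {mean_gap:.1f} points (standardized: {effect:.2f})")
```

A large, positive gap is evidence that the test discriminates between the groups; a negligible gap would cast doubt on its validity as a measure of the ability.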
2. How do we do concurrent validity?
In concurrent validation, the predictor and criterion data are collected at or about the same time. This kind of validation is appropriate for tests designed to assess a person's current criterion status, such as diagnostic screening tests.
b. Predictive Validity
1. What is Predictive Validity?
Predictive validity considers the question, "How well does the test predict examinees' future status as masters or non-masters?" For this type of validity, the correlation that is computed is between the examinees' classification as master or non-master based on the test and their later performance, perhaps on the job. This type of validity is especially useful for test purposes such as selection or admissions.