1385-4046/98/1201-043$12.00
Swets & Zeitlinger
1Department
ABSTRACT
The Hopkins Verbal Learning Test (HVLT) is a brief verbal learning and memory test with six alternate
forms. The HVLT is ideal in situations calling for repeated neuropsychological examinations, but it lacks
a delayed recall trial, which is essential for the assessment of abnormal forgetting. We present a revised
version of the HVLT which includes a delayed recall trial, and therefore delays the yes/no recognition trial.
The equivalence of test forms was examined in two separate studies using between-groups and within-subjects
research designs. In both studies, the six forms of the revised HVLT (HVLT-R) were found to be
equivalent with respect to the recall trials, but there were some modest differences in recognition.
Recommendations for the use of the HVLT-R in serial neuropsychological examinations are provided, as well as
normative data tables from a sample of 541 subjects, spanning ages 17 to 88 years.
Data collection at the SUNY Buffalo site was supported, in part, by a test development grant from Psychological Assessment Resources, Inc. Data collection at the Johns Hopkins University was supported in part by a
memory research grant from the DeVilbiss Fund and NIA grant #1R01AG11859-01A Aging Brain Imaging and
Cognition. The authors gratefully acknowledge the assistance of Melissa Dobraski and Barnett Shpritz
in data collection.
Administration and scoring instructions, and Hopkins Verbal Learning Test-Revised test forms, can be obtained
from Dr. Benedict at cost.
Address correspondence to: Ralph H. B. Benedict, SUNY Buffalo School of Medicine, Department of Neurology,
Buffalo General Hospital, 100 High Street (D-6), Buffalo, NY 14203, USA.
Accepted for publication: June 11, 1997.
1987). Most memory tests are also highly susceptible to the effects of task-specific practice
because patients are asked to learn the same material repeatedly. McCaffrey and colleagues
(McCaffrey, Ortega, Orsillo, Nelles, & Haase,
1992) reported a practice effect of one standard
deviation (SD) in magnitude on the Visual Reproduction subtest from the Wechsler Memory
Scale (Wechsler, 1945) when it was re-administered after 1 week. In contrast, our group found
that when a similar test with alternate forms was
administered to normal subjects using the same
test-retest interval, the change was on the order
of 0.2 SD (Benedict, Schretlen, Groninger,
Dobraski, & Shpritz, 1996). These findings
highlight the importance of using different,
equivalent forms of the same test when repeated
assessments of memory are necessary.
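The practice effects quoted above are expressed in standard deviation units. A minimal sketch of that arithmetic follows, assuming the effect is the mean retest gain divided by the baseline SD; the cited studies may have standardized differently, and the function name and scores are hypothetical.

```python
from statistics import mean, stdev


def practice_effect_sd(baseline, retest):
    """Mean retest gain expressed in baseline standard-deviation units.

    Assumption: the gain is standardized by the sample SD of the baseline
    scores. The studies cited in the text may have used another denominator.
    """
    if len(baseline) != len(retest) or len(baseline) < 2:
        raise ValueError("need paired scores from at least two subjects")
    return (mean(retest) - mean(baseline)) / stdev(baseline)


# Hypothetical paired scores, not data from any study cited in the text.
baseline = [6, 8, 7, 9, 10]
retest = [7, 8, 8, 9, 11]
effect = practice_effect_sd(baseline, retest)
print(f"practice effect = {effect:.2f} SD")
```

On this scale a value near 1.0 would correspond to the same-form retest effect described above, and a value near 0.2 to the alternate-form result.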
Many investigators have developed multiple-form verbal memory tests. Parker, Eaton,
Whipple, Heseltine, and Bridge (1995) recently
introduced the University of Southern California
Repeatable Episodic Memory Test (USC-REMT), a word-list learning task which includes
only semantically unrelated words in order to
maximize the demand for subjective organization during encoding and retrieval. The USC-REMT has seven alternate forms which were
administered to 50 highly educated, middle-aged
men, 36 of whom tested positive for HIV-1. Preliminary reliability data are encouraging, but the
USC-REMT is limited by the lack of delayed
recall and recognition trials. Shapiro and Harrison (1990) reported the equivalence of four
forms of the Rey Auditory Verbal Learning Test
(RAVLT; Rey, 1964) in within-subjects testing
of 17 neurology inpatients and 25 college students. The weaknesses of this study were a
highly variable intertest interval (range = 2 to 13
days) and a small sample size. Geffen,
Butterworth, and Geffen (1994) examined the
equivalence of the original version of the
RAVLT and an alternate form in 51 normal subjects. The authors included delayed recall and
recognition trials. The sample was more representative of the general population and the test-retest interval was more carefully standardized.
Analyses of variance revealed no significant
effect of test form. There was a difference of 1.1
METHODS
Subjects
The participants were recruited from three sources:
(1) the State University of New York (SUNY) at
Buffalo and surrounding metropolitan area (n =
HVLT-R
logical tests. The selection of test form was random for each subject in the SUNY Buffalo sample, and as a result, this sample included roughly
equal numbers of subjects per test form. The 18
college students from UMBC completed all six
HVLT-R forms and were assigned to a test form
sequence according to a Latin squares research
design. Each UMBC student returned to the laboratory for five follow-up assessments at weekly
intervals. On each occasion, the subjects completed a new HVLT-R form as well as a test of
nonverbal learning and brief problem-solving tests
that were used to distract them during the 20-25
min delayed recall interval. The JHU subjects
were examined with either form 2 or form 6 of
the HVLT-R, in accordance with a research protocol. Assignment of subjects to one of these test
forms was random.
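The Latin square assignment described above for the 18 UMBC students can be sketched as follows. This is an illustration only: the text does not say which square was used, so a simple cyclic square is assumed, and the subject labels are hypothetical.

```python
def cyclic_latin_square(n):
    """Return an n x n cyclic Latin square of form numbers 1..n.

    Each form appears exactly once in every row (a subject's sequence)
    and once in every column (a session position).
    """
    return [[(row + col) % n + 1 for col in range(n)] for row in range(n)]


def assign_sequences(subject_ids, n_forms=6):
    """Cycle subjects through the rows of the square in order of enrollment."""
    square = cyclic_latin_square(n_forms)
    return {sid: square[i % n_forms] for i, sid in enumerate(subject_ids)}


# Hypothetical labels for 18 subjects: three subjects per form sequence.
subjects = [f"S{k:02d}" for k in range(1, 19)]
schedule = assign_sequences(subjects)
for sid in subjects[:3]:
    print(sid, schedule[sid])
```

With 18 subjects and six sequences, every form is given three times at each of the six weekly sessions, so form and session order are not confounded.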
Forty elderly subjects from the SUNY Buffalo
sample returned to the same laboratory to complete
a different form of the HVLT-R. On each occasion,
these subjects completed a brief battery of other
neuropsychological tests covering the domains of
language, visual-spatial, and executive function.
The selection of HVLT-R form was random for
each examination, provided that the same form was
not repeated with the same subject. The mean age
of this sample was 68.8 years (SD = 5.8, range 56-82)
and the average level of education was 13.9
years (SD = 2.7, range 8-20). The test-retest interval ranged from 14 to 134 days, with a mean of
46.6 (SD = 30.1).
Data Analysis
Although all of the HVLT-R measures were limited by a restricted range to some degree, trial 1,
trial 2, trial 3, learning, total recall, trial 4, and
response bias conformed roughly to a normal distribution of scores and parametric statistics were
employed for these measures. Statistical analyses
of the remaining measures employed
nonparametric tests as the distributions for these
measures deviated clearly from normal. For example, 217 (40%) cases achieved a percent retained
score of 100. Extreme kurtosis was particularly
salient on the recognition task, where 419 (77%)
of subjects made 12 of 12 correct target word detections, and 361 (67%) subjects made no false-positive errors. Finally, given the high statistical
power of our large sample and the multiple comparisons, we set alpha at .01 to avoid interpreting
very small effects.
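For readers who want to compute the derived measures named above from raw trial scores, a minimal sketch follows. The formulas are assumptions based on the usual HVLT-R scoring conventions and are not defined in this section: total recall is the sum of trials 1 to 3, learning is the higher of trials 2 and 3 minus trial 1, percent retained is trial 4 relative to the higher of trials 2 and 3, and the discrimination index is true-positives minus false-positives. Response bias is omitted because its formula is not given here.

```python
def hvlt_r_scores(trial1, trial2, trial3, trial4, true_pos, false_pos):
    """Derive summary measures from raw HVLT-R scores.

    All formulas are assumed scoring conventions, not definitions
    taken from this section of the text.
    """
    best_later_trial = max(trial2, trial3)
    if best_later_trial > 0:
        percent_retained = 100.0 * trial4 / best_later_trial
    else:
        percent_retained = None  # undefined when nothing was learned
    return {
        "total_recall": trial1 + trial2 + trial3,
        "learning": best_later_trial - trial1,
        "percent_retained": percent_retained,
        "discrimination_index": true_pos - false_pos,
    }


# Hypothetical protocol: 7, 10 and 11 words on the learning trials,
# 10 words at delay, 12 hits and 1 false-positive on recognition.
scores = hvlt_r_scores(7, 10, 11, 10, true_pos=12, false_pos=1)
print(scores)
```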
RESULTS
Between-Group Analysis of Inter-Form
Equivalence
HVLT-R test forms were administered randomly
to the 432 SUNY Buffalo subjects, resulting in
comparable sample sizes per form: form 1 = 92,
form 2 = 70, form 3 = 60, form 4 = 67, form 5 =
62, form 6 = 81. Age, education, Barona IQ, and
NAART IQ values did not differ across form
group as indicated by one-way ANOVA (Age
F(5,426) = 1.5; Education F(5,426) = 0.6;
Barona IQ F(5,426) = 2.2; NAART IQ F(5,246)
= 1.4). Neither the Caucasian to African American/Other ratio nor the male to female ratio varied significantly across form, as demonstrated by
chi-square analysis (Sex χ² = 4.0; Race/Ethnicity χ² = 0.6).
Analyses of inter-form equivalence employed
one-way ANOVA for the normally distributed
measures, and the nonparametric Kruskal-Wallis
statistic for the remaining measures. As can be
seen in Table 1, the forms are equivalent with
respect to the free-recall scores, percent retained, and recognition true-positives. Large and
significant effects were found, however, for recognition false-positives, discrimination index,
and response bias. All three findings can be attributed to marked differences in the number of
false-positives produced by the HVLT-R forms.
As shown in Figure 1, there are essentially two
clusters of HVLT-R forms, with forms 1, 2, and
4 resulting in a higher number of false-positives
than forms 3, 5, and 6. Scheffé and Kruskal-Wallis comparisons revealed no significant differences among the forms within each cluster.
For response bias, Scheffé tests revealed significant differences between form 2 and forms 3, 5,
and 6, and a significant difference between
forms 4 and 6. For both false-positives and discrimination index, Kruskal-Wallis comparisons
were significant (all p values < .001) for each
possible pairing of test form between the clusters.
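The Kruskal-Wallis comparison used for the non-normal measures can be illustrated in plain Python. This is a sketch of the textbook H statistic with the standard tie correction, not the authors' analysis code. The false-positive counts are hypothetical, and the critical value assumes a chi-square reference distribution with 5 degrees of freedom (six forms) at the alpha of .01 stated in the Data Analysis section.

```python
from collections import Counter

# Chi-square critical value for df = 5 at alpha = .01 (six forms compared).
CRITICAL_CHI2_DF5_ALPHA_01 = 15.086


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the standard correction for ties.

    Raises ZeroDivisionError if every pooled value is identical.
    """
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    # Assign each distinct value the average of the 1-based ranks it spans.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    tie_sizes = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    return h / correction


# Hypothetical false-positive counts for six small groups, one per form.
fp_by_form = [
    [1, 2, 0, 1], [2, 1, 1, 3], [0, 0, 1, 0],
    [1, 2, 2, 0], [0, 0, 0, 1], [0, 1, 0, 0],
]
h_stat = kruskal_wallis_h(fp_by_form)
print(round(h_stat, 2), h_stat > CRITICAL_CHI2_DF5_ALPHA_01)
```

Under these assumptions, statistics such as 50.1 and 35.2 exceed the critical value, whereas 7.6 and 7.8 do not.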
Table 1.
Measure                 Lowest Mean         Highest Mean          (SD)     F or K-W   p
Trial 1                 7.6 (form 5)        8.1 (form 4)          (1.7)    0.5        0.81
Trial 2                 9.7 (form 3)        10.2 (forms 1,4,5)    (1.6)    0.9        0.46
Trial 3                 10.6 (form 3)       10.9 (form 5)         (1.3)    0.6        0.73
Total Recall            28.1 (form 3)       28.9 (forms 1,6)      (4.0)    0.5        0.78
Learning                2.8 (form 3)        3.4 (form 5)          (1.5)    1.3        0.26
Trial 4                 10.0 (form 2)       10.5 (form 5)         (1.8)    0.5        0.78
Percent Retained        0.91 (form 2)       0.96 (form 3)         (0.12)   7.6        0.18
Recog True-Positives    11.7 (forms 1,4)    11.9 (form 3)         (0.6)    7.8        0.17
Recog False-Positives   0.2 (forms 3,5)     0.8 (form 2)          (0.8)    50.1       < .0001
Recog Discrimination    11.0 (form 4)       11.7 (forms 3,5)      (1.1)    35.2       < .0001
Recog Response Bias     0.48 (form 6)       0.59 (form 2)         (0.15)   6.9        < .0001
Note. Recog = Recognition; SD = mean standard deviation for all test forms; K-W = Kruskal-Wallis Chi-Square
statistic.
Within-Subjects Analysis of Inter-Form Equivalence
Comparison of scores across the six test forms,
among the 18 students who completed each
form, was accomplished using repeated measures ANOVA and nonparametric tests as required. Figure 2 presents the average number of
words recalled for each form across the four recall trials, collapsed across the session administered. The figure clearly demonstrates that the
free-recall scores were similar across form, as
was found in the between-groups analysis. A 6
(form) × 4 (trial) ANOVA, with repeated mea-
Fig. 1.
Frequency distribution of the percentage of subjects giving 0, 1, 2, or more than 2 false-positive responses on each form of the HVLT-R. Nonparametric statistical analyses revealed that forms 1, 2, and
4 are similar, as are forms 3, 5, and 6, consistent with visual inspection of the frequency distribution.
Fig. 2.
Number of words recalled over the three learning and delayed recall trials of the HVLT-R. Subjects
were 18 college undergraduate students who completed all six forms at successive one-week intervals.
Normative Data
The normative data sample included subjects
from the reliability studies above, and the JHU
sample. As expected, there was a modest yet
significant relationship between younger age
and better HVLT-R performance. The Pearson r
Measure                 Test 1 M (SD)   Test 2 M (SD)   r       T or Z
Trial 1                 7.6 (2.0)       8.2 (2.0)       0.55b   2.0
Trial 2                 10.0 (1.6)      10.1 (1.6)      0.67b   0.5
Trial 3                 10.7 (1.5)      11.1 (1.6)      0.78b   2.3
Learning                3.3 (1.7)       2.9 (1.5)       0.41a   1.1
Total Recall            28.2 (4.4)      29.3 (4.7)      0.74b   2.1
Trial 4                 9.9 (2.0)       10.5 (2.0)      0.66b   2.2
Percent Retained        0.91 (0.13)     0.95 (0.13)     0.39    1.4
True-Positives          11.8 (0.4)      11.8 (0.5)      0.46a   0.1
False-Positives         0.5 (0.9)       0.3 (0.7)       0.25    1.0
Discrimination Index    11.3 (1.0)      11.5 (1.0)      0.40    1.6
Response Bias           0.54 (0.17)     0.51 (0.15)     0.05    0.7
DISCUSSION
In response to the growing demand for brief,
repeatable tests of memory, we report on a revised version of the Hopkins Verbal Learning
Test which now includes a 20-25 min delayed
recall trial, a measure of forgetting, and a delayed recognition trial. Our results indicate that
the HVLT-R has acceptable reliability, and that
the test forms are equivalent with respect to
learning and delayed recall. There are modest
interform differences on the delayed recognition
task, and we recommend that this factor be taken
into account in the interpretation of HVLT-R
data. Using the same recognition task immediately after trial 3, Brandt (1991) also found that
form 3 results in better target discrimination
than forms 1 and 4. The findings were attributed
to differences in the number of false-positive
errors. When viewed together, existing research
with the HVLT (or HVLT-R) indicates that form
3 is less likely to produce false-positive recognition errors than forms 1, 2, and 4. Although the
modest degree of difference is likely to have
little practical significance, we recommend that
when the HVLT-R is used in repeated examinations, forms 1, 2, and 4 or forms 3, 5, and 6
be used together, where possible. Analyses of
the recall trials data indicate that all six forms
are interchangeable.
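The recommendation above, to keep serial examinations within forms 1, 2, and 4 or within forms 3, 5, and 6, can be expressed as a small form-selection helper. This is a hypothetical sketch: the function name and the random choice among unused forms are illustrative and are not part of the published procedure.

```python
import random

# The two clusters of forms with similar false-positive rates.
FORM_CLUSTERS = ({1, 2, 4}, {3, 5, 6})


def next_form(forms_used, rng=random):
    """Pick an unused form from the cluster of the first form administered."""
    if not forms_used:
        raise ValueError("at least one form must already have been given")
    first = forms_used[0]
    cluster = next((c for c in FORM_CLUSTERS if first in c), None)
    if cluster is None:
        raise ValueError(f"unknown form number: {first}")
    remaining = sorted(cluster - set(forms_used))
    if not remaining:
        raise ValueError("all forms in this cluster have been used")
    return rng.choice(remaining)


# A patient first tested with form 2, then form 1, gets form 4 next.
print(next_form([2, 1]))
```

Only the recognition measures depend on staying within a cluster. For the recall trials, the analyses reported above indicate that any of the six forms may follow any other.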
11.81
10.61
11.81
10.49
(0.6)
(1.1)
(1.4)
(0.16)
10 12
01
9 12
0.17 0.75
9 12
05
7 12
0.17 0.92
4 12
7 12
7 12
08
19 36
6 12
58 120
19
16
59
19
10.17
10.17
19.17
15.17
17.17
10.17
20.17
17.17
67.17
17
14.17
17.17
18.17
11.25
11.25
10.25
10.25
10.25
13.25
18.25
10.25
15.25
18.25
19.25
10.25
22.25
18.25
73.25
12.50
10.50
12.50
10.50
11.50
12.50
19.50
10.50
16.50
19.50
10.50
12.50
25.50
19.50
83.50
16
10.50
12.50
11.50
10.50
10.50
17.50
19.50
10.50
12.50
27.50
10.50
90.50
25
19.50
12.50
12.50
14.50
32.50
12.50
75
110.50 10.50
110.50
112.50
110.50 10.70
118.50
110.50
111.50
113.50
130.50
111.50
100.50
50
Percentile Ranks
Note. Scores corresponding to each percentile rank were rounded to the nearest whole number for all measures except response bias.
11.71
10.71
11.01
10.56
18.11 (1.7)
10.31 (1.4)
11.01 (1.2)
13.11 (1.4)
29.41 (3.7)
10.61 (1.6)
95.11 (11.0)
17 30
8 18
(4.6)
(2.1)
24.21
13.81
Range
(SD)
Age (years)
Education (years)
Table 3. HVLT-R Normative Data for 46 Male and 56 Female Young Adults.
10.50
10.75
14.50
33.50
19.50
84
10.50
10.85
15.50
35.50
11.50
95
10.75
10.92
16.50
36.50
12.50
98
99
RALPH H.B. BENEDICT ET AL.
7.8
9.9
10.9
3.2
28.8
10.3
93
11.8
0.7
11.2
.59
11.8
0.2
11.6
.49
(0.4)
(0.9)
(1.1)
(0.16)
9 12
02
9 12
0.13 0.83
10 12
05
5 12
0.25 0.90
3 12
4 12
7 12
08
17 36
4 12
50 113
31 54
10 20
(6.5)
(1.9)
(1.7)
(1.5)
(1.2)
(1.5)
(3.8)
(1.7)
(11.2)
Range
(SD)
14.50
16.50
18.50
10.50
20.50
16.50
63.50
15.50
17.50
19.50
11.50
22.50
17.50
70.50
16.50
18.50
10.50
12.50
25.50
19.50
82.50
16
19.50 10.50
12.50
19.50
10.13 10.17
11.50
11.50
10.50
10.25
118.50
110.50
111.50
113.50
129.50
111.50
100.50
50
19.50
11.50
12.50
14.50
32.50
12.50
75
95
15.50 16.50
33.50 34.50
10.50 11.50
12.50
84
16.50
35.50
11.50
98
12.50
11.50 110.50
11.50 111.50 12.50
10.50 110.50 10.75 10.75 10.83 10.88
17.50
19.50
10.50
12.50
26.50
10.50
89.50
25
11.50 12.50
10.50
11.50 11.50 112.50
10.50 10.50 110.50 10.50 10.50 10.75 10.81
19.50
15.50
57.50
17.50
Percentile Ranks
Note. Scores corresponding to each percentile rank were rounded to the nearest whole number for all measures except response bias.
42.1
13.8
Age (years)
Education (years)
Table 4. HVLT-R Normative Data for 79 Male and 156 Female Middle-Aged Adults.
10.83
10.90
17.50
36.50
12.50
99
3 12
4 12
6 12
1 6
15 36
5 12
56 120
7.4 (1.9)
9.7 (1.7)
10.6 (1.4)
3.3 (1.5)
27.5 (4.3)
9.8 (1.8)
(12.9)
91
8 12
04
7 12
0.10 0.90
9 12
04
7 12
0.17 0.87
(0.9)
(0.9)
(1.4)
(0.18)
(0.6)
(0.8)
(1.1)
(0.16)
11.5
0.7
10.8
0.56
11.7
0.4
11.3
0.52
61.9
13.8
55 69
6 20
Range
(4.3)
(2.6)
(SD)
10.17
10.10
15.50
15.50
57.50
13.50
14.50
16.50
15.50
18.50
19.50
12.50
23.50
18.50
78.50
11.50
12.50
19.50
10.50
11.50
11.50
10.50
10.48
19.50
13.50
18.50
10.20
11.50
13.50
19.50
10.25
18.50
14.50
17.50
10.13
19.50
14.50
17.50
10.19
16
15.50
16.50
18.50
11.50
20.50
16.50
63.50
14.50
15.50
17.50
10.50
16.50
16.50
60.50
95
98
12.50
11.50 10.50
11.50 12.50
10.50 10.50 110.50 10.75 10.83 10.88
15.50 16.50
32.50 34.50 35.50
12.50
12.50
10.50
11.50 112.50
10.50 110.75 10.75 10.86 10.89
84
11.50
11.50
10.50
10.50
118.50
111.50
112.50
115.50
131.50
111.50
100.50
75
17.50
10.50
11.50
13.50
28.50
10.50
92.50
50
16.50
19.50
10.50
12.50
25.50
19.50
83.50
25
Percentile Ranks
Note. Scores corresponding to each percentile rank were rounded to the nearest whole number for all measures except response bias.
Age (years)
Education (years)
Recall Measures All Forms, (n = 129)
Trial 1
Trial 2
Trial 3
Learning
Total Recall
Trial 4
Percent Retained
Recognition Measures Forms 1,2,4, (n = 68)
True-Positives
False-Positives
Discrimination Index
Response Bias
Recognition Measures Forms 3,5,6, (n = 61)
True-Positives
False-Positives
Discrimination Index
Response Bias
Table 5. HVLT-R Normative Data for 50 Male and 79 Female Older-Aged Adults.
36
12
99
6.7
8.8
9.7
3.2
25.2
8.7
86
11.3
0.7
10.6
0.51
11.4
0.7
10.7
0.50
(0.9)
(0.9)
(1.5)
(0.18)
10 12
05
5 12
0.17 0.83
9 12
04
6 12
0.13 .90
3 12
4 12
5 12
07
14 35
0 12
0 120
70 88
5 20
(4.5)
(2.9)
(2.0)
(2.1)
(2.0)
(1.7)
(5.5)
(2.8)
(20.7)
Range
SD
0
0
3
4
15.50
16.50
10.13
14.50
11.50
19.50
14.50
15.50
15.50
10.50
14.50
16.50
10.17
19.50
13.50
18.50
10.20
14.50
15.50
16.50
11.50
16.50
14.50
46.50
11.50
12.50
10.50
10.25
10.50
11.50
19.50
10.27
15.50
16.50
18.50
12.50
20.50
16.50
70.50
16
11.50
11.50
10.50
10.25
11.50
11.50
10.50
10.44
15.50
17.50
18.50
12.50
21.50
17.50
80.50
25
118.50
111.50
111.50
114.50
129.50
111.50
100.50
75
19.50
11.50
12.50
15.50
31.50
12.50
84
17.50
34.50
10.50
12.50
95
12.50
10.50
11.50 112.50
10.50 110.71 10.75 10.83
12.50
11.50 110.50
11.50 112.50
10.50 110.66 10.75
16.50
19.50
10.50
13.50
25.50
19.50
89.50
50
Percentile Ranks
Note. Scores corresponding to each percentile rank were rounded to the nearest whole number for all measures except percent retained and response bias.
75.2
13.4
Age (years)
Education (years)
Table 6. HVLT-R Normative Data for 25 Male and 50 Female Elderly Adults.
35
11
98
12
99
The results of the test-retest reliability analysis should be viewed as preliminary, due to the
wide range of test-retest intervals (14 to 134
days) and its restriction to a geriatric sample.
Despite the questionable interform reliability of
the recognition task, the HVLT-R still holds
several advantages over existing verbal learning
tests which provide a more comprehensive evaluation of memory (e.g., Delis et al., 1987). The
HVLT-R has six alternate forms which are
equivalent with respect to learning and recall,
and two sets of three forms which can be used
interchangeably for the assessment of delayed
recognition. The test is also easy to administer
and is tolerated well by elderly and demented
patients. These factors contribute to a cost-effective and less strenuous examination of learning and memory. Despite these advantages, the
HVLT-R, like its predecessor the HVLT, may
lack sufficient difficulty to detect deficits in
young, mildly impaired patients. As is apparent
in Tables 3-6, the test also suffers from a limited
range of scores, particularly on the recognition
task.
Research on the validity of the HVLT-R is
ongoing. In a recent article describing the psychometric qualities of the Brief Visuospatial
Memory Test-Revised (BVMT-R; Benedict et
al., 1996), the HVLT-R was included in a construct validity experiment. The HVLT-R was
administered to 126 patients, aged 55 and over,
diagnosed with vascular or mixed dementia
(22%), dementia of the Alzheimer type (21%),
schizophrenia (16%), mood disorder (19%), and
smaller numbers of patients with dementia due to
other etiologies. The rest of the test battery included the Controlled Oral Word Association
Test (Benton & Hamsher, 1983) with letter (S,P)
and category (animals, supermarket items) cues,
a 30-item short form of the Boston Naming Test
(Kaplan, Goodglass, & Weintraub, 1983), the Developmental Test of Visual-Motor Integration
(Beery & Buktenica, 1982), and the Trail Making Test (Reitan, 1958). In the principal components analysis with varimax rotation, the HVLT-R recall and discrimination index scores loaded
on a separate verbal learning and memory factor, and the response bias measure loaded on a
separate factor along with a response bias mea-
REFERENCES
Barona, A., Reynolds, C.R., & Chastain, R. (1984). A
demographically based index of pre-morbid intelligence for the WAIS-R. Journal of Consulting and
Clinical Psychology, 52, 885-887.
Beery, K. E., & Buktenica, N. A. (1982). Revised Administration, Scoring, and Teaching manual for the
Developmental Test of Visual-Motor Integration.
Cleveland, OH: Modern Curriculum Press.
Benedict, R.H.B., Schretlen, D., Groninger, L.,
Dobraski, M., & Shpritz, B. (1996). Revision of
the Brief Visuospatial Memory Test: Studies of
normal performance, reliability, and validity. Psychological Assessment, 8, 145-153.
Benton, A. L., & Hamsher, K. (1983). Multilingual
Aphasia Examination. Iowa City, IA: AJA Associates.
Blair, J. R., & Spreen, O. (1989). Predicting premorbid IQ: A revision of the National Adult Reading Test. The Clinical Neuropsychologist, 3, 129-136.
Brandt, J. (1991). The Hopkins Verbal Learning Test:
Development of a new memory test with six equivalent forms. The Clinical Neuropsychologist, 5,
125-142.
Delis, D. C., Kramer, J. H., Kaplan, E., & Ober, B. A.
(1987). California Verbal Learning Test: Adult
Version. San Antonio, TX: The Psychological Corporation.
Geffen, G. M., Butterworth, P., & Geffen, L. B.
(1994). Test-retest reliability of a new form of the
auditory verbal learning test (AVLT). Archives of
Clinical Neuropsychology, 9, 303-316.
Kaplan, E. F., Goodglass, H., & Weintraub, S. (1983).
The Boston Naming Test (2nd ed.). Philadelphia,
PA: Lea & Febiger.
McCaffrey, R. J., Ortega, W. H., Orsillo, S. M.,
Nelles, W. B., & Haase, R. F. (1992). Practice effects in repeated
neuropsychological assessments. The Clinical Neuropsychologist, 6, 32-42.
Medicare Part B of New York (1995, August). The
Medicare News Brief 95-12. Medicare Part B:
Crompond, NY.
Parker, E. S., Eaton, E. M., Whipple, S. C., Heseltine,
P. N. R., & Bridge, T. P. (1995). University of
Southern California Repeatable Episodic Memory
Test. Journal of Clinical and Experimental Neuropsychology, 17, 926-936.
Reitan, R. M. (1958). Validity of the Trail Making Test
as an indicator of organic brain damage. Perceptual and Motor Skills, 8, 271-276.