
Psychological Review

Vol. 65, No. 4, 1958

STIMULUS AND RESPONSE GENERALIZATION:


DEDUCTION OF THE GENERALIZATION
GRADIENT FROM A TRACE MODEL 1
ROGER N. SHEPARD
Harvard University2

1 This work was begun on a National Academy of Sciences-National Research Council Postdoctoral Associateship at the Naval Research Laboratory, was completed under Contract AF 33(038)-14343 between Harvard University and the Operational Applications Laboratory, Air Force Cambridge Research Center, Air Research and Development Command, and appears as Rep. No. AFCRC-TN-57-62, ASTIA Document No. AD 146753. Reproduction for any purpose of the United States government is permitted. The author is indebted to H. Glaser for a number of useful suggestions and to G. A. Miller and S. S. Stevens for their improvements in the manuscript.

2 Now at the Bell Telephone Laboratories, Murray Hill, N. J.

It is now generally acknowledged that (a) a response conditioned to one stimulus tends also to occur to other stimuli, and (b) the magnitude of this response-tendency (for any particular one of those stimuli) is governed by the dissimilarity between that stimulus and the stimulus to which the response was originally conditioned. Indeed, this principle of stimulus generalization is of such fundamental importance that any quantitative theory of behavior that fails to deal with it explicitly can only be regarded as incomplete. It is not surprising, therefore, that a number of investigations have been specifically concerned with the form of the function relating generalized response-tendency to interstimulus dissimilarity. What may seem surprising, though, is that the conclusions of these various studies, far from converging on a unique function, have diverged to the several different functions illustrated in Fig. 1. Beyond those who doubt the existence of a quantitatively invariant "gradient of generalization" in the first place (5, 16, 21), there are those, like Schlosberg and Solomon, who consider this gradient to be linear, as in A (8, 14, 22, 23); those, like Spence, who consider it to be convex upward, as in B (27, 28); and those, like Hull, who consider it to be concave upward, as in C (1, 11, 12, 19, 20). In addition, if the "discriminal dispersion" of the psychophysicists can be interpreted as a gradient of generalization, there is support for a bell-shaped function which is first convex and then concave upward as shown in D (9, pp. 317-319).

[Figure 1: panels A through D; abscissa: stimulus dissimilarity.]
FIG. 1. Several proposals for the form of the gradient of stimulus generalization. The abscissa in each case is some measure of the dissimilarity between two stimuli, and the ordinate is some measure of the tendency of a response, conditioned to one of the two stimuli, to follow the presentation of the other.

In addition to stimulus generalization, an analogous phenomenon of response generalization is sometimes supposed to operate so that (a) a stimulus to which one response has been conditioned tends also to evoke other responses, and (b) the magnitude of this tendency (for any particular one of those responses) is governed by the dissimilarity between that response and the response originally conditioned. Although this principle of response generalization is also of considerable theoretical importance, even less progress has been made toward the quantitative determination of the shape of the gradient in this case than in the case of stimulus generalization.

Actually, that a unique gradient of generalization has failed to emerge both in studies of stimulus generalization and in studies of response generalization no longer seems so surprising when one examines these studies in detail. For although the obtained gradient must depend crucially upon the choice of the independent variable (i.e., dissimilarity), the various investigators have never been able to agree on any one measure of dissimilarity as appropriate. Furthermore, since tests for generalization have been conducted during various stages of discrimination learning, as well as after various periods of extinction, one must consider the possibility that the form of the gradient may change radically under different conditions of reinforcement (13, 31).

This paper will attempt to show that, by approaching the generalization problem from a somewhat different direction, considerable evidence can be adduced for the proposition that the gradients of stimulus and response generalization both conform to an exponential decay function (Curve C of Fig. 1). Further evidence will be presented to indicate that the form of the gradient does indeed depend upon the schedule of reinforcement and, more particularly, that it changes from the exponential to a bell-shaped function (Curve D) as nonreinforced trials are introduced with greater and greater frequency.

THE EXPONENTIAL GRADIENT IN PAIRED-ASSOCIATE LEARNING

In view of the difficulties attending any effort to establish one measure of dissimilarity as a standard, the following strategy was recently proposed: Stimuli (or responses) were conceptualized as points in a "psychological space" in such a way that the distance between any pair of these points represented the psychological dissimilarity between the corresponding pair of stimuli (or responses). By taking account of a set of metric axioms which any measure of distance should satisfy, it was shown that hypotheses about the shape of the gradient of generalization could be tested without resorting to an independent measure of dissimilarity (25). In a series of experiments on stimulus and response generalization during paired-associate learning, substantial support was obtained for the hypothesis that generalization is an exponential decay function of psychological distance (26). Since alternative hypotheses were not tested, however, it seems desirable to present data in a form that will clearly rule out the Functions A, B, and D of Fig. 1.

Now in a paired-associate experiment there are $N$ stimuli, $S_1, S_2, \ldots, S_N$, to which are assigned $N$ responses, $R_1, R_2, \ldots, R_N$. On each trial one of the $N$ stimuli is presented to the subject who must in turn produce one of the $N$ responses. Various procedures of differential reinforcement can be used to communicate to the subject the prevailing assignment of the responses to the stimuli. Sometimes the subject is simply informed after each trial whether his response was or was not the correct one (i.e., the one assigned to the stimulus presented on that trial); at other times more elaborate methods of "correction" are used (26). In any case the so-called assignment is simply a rule which the experimenter follows in delivering the reinforcements, and any arbitrary one-to-one assignment can be established in this way.

Consider then a paired-associate experiment in which the responses are highly distinctive and so lead to negligible amounts of generalization. If there are $N$ pairs consisting of one stimulus and its assigned response, the data from such an experiment can be represented by an $N \times N$ matrix giving, for every $S_i$ and $S_k$, the conditional probability $P_{ikS}$ with which $S_i$ leads to the response assigned to $S_k$ (25). A very basic measure of stimulus generalization between $S_i$ and $S_k$ is then provided by either of the probabilities $P_{ikS}$ or $P_{kiS}$. Comparison between different experiments is facilitated, however, by adjusting these measures so that the generalization between any stimulus and itself is always unity. This can be done by replacing the absolute probabilities $P_{ikS}$ and $P_{kiS}$ with the ratios $P_{ikS}/P_{iiS}$ and $P_{kiS}/P_{kkS}$. Furthermore, these two ratios can be averaged together to furnish a single, more stable estimate of the generalization between $S_i$ and $S_k$. Since there are theoretical reasons for preferring the geometric to the arithmetic mean (25), the measure of stimulus generalization might be defined by the formula

$$\sqrt{\frac{P_{ikS}\,P_{kiS}}{P_{iiS}\,P_{kkS}}}$$

Owing to the gradual manner in which differential reinforcement takes effect, however, there is an initial phase during which the subject responds more or less randomly. This means that the function given in the above formula necessarily levels off at some positive asymptote for large interstimulus distances. Again, in order to compare data from different experiments, this asymptote must be brought down to zero for each experiment. This is accomplished by estimating a parameter $C_S$ from each set of data (25), and by defining the generalization between $S_i$ and $S_k$ to be

$$G_{ikS} = (1 + C_S)\sqrt{\frac{P_{ikS}\,P_{kiS}}{P_{iiS}\,P_{kkS}}} - C_S \qquad [1]$$

Likewise, in an experiment with highly discriminable stimuli, the response generalization between $R_i$ and $R_k$ will be given by

$$G_{ikR} = (1 + C_R)\sqrt{\frac{P_{ikR}\,P_{kiR}}{P_{iiR}\,P_{kkR}}} - C_R \qquad [2]$$

where $P_{ikR}$ is the conditional probability of $R_k$, given the stimulus assigned to $R_i$ (25).

Equations 1 and 2 then specify the dependent variables for paired-associate experiments, i.e., stimulus and response generalization. With regard to the independent variables, i.e., interstimulus and interresponse distances, it is clear that physical measures are not directly applicable. Thus the psychological distance between two tones at a fixed difference in intensity changes as both tones are increased in intensity. But this does not mean that there is no relation between physical and psychological distance. On the contrary, at least the following two statements can be made: (a) two tones which are brought arbitrarily close together in terms of physical measures also approach each other psychologically; (b) as two tones separated by a fixed difference in intensity are increased in intensity, the psychological distance (although it does not remain fixed) nevertheless changes in a gradual and continuous manner. These considerations form a basis for the assumption that the locus of a set of stimuli in psychological space can always be obtained from their locus in physical space by a transformation that is (a) continuous and (b) differentiable (25).

Although the stimuli and responses chosen for experiments on paired-associate learning have typically been words or nonsense syllables, nonverbal materials can be used just as well. Indeed, for the purposes of establishing a measure of the psychological distance between stimuli, it is especially convenient to choose stimuli (such as tones differing only in intensity) that can be varied along a single physical dimension. In general, of course, stimuli that are evenly spaced along a single physical dimension will be neither evenly spaced nor confined to a straight line in psychological space. Nevertheless, it follows from the assumptions of continuity and differentiability that, if stimuli are evenly spaced over a sufficiently small range of a single physical dimension, then they will be spaced approximately evenly along an approximately straight line in psychological space.

Suppose, then, that the $N$ stimuli of a paired-associate experiment satisfy this special condition which, for present purposes, will be called the linearity condition. Such stimuli can be designated as $S_1, S_2, \ldots, S_N$ in such a way that the subscripts correspond to the ordinal positions of the $N$ stimuli along the common (approximately straight) line. The average generalization for all pairs of stimuli just $D$ steps apart along this line will then be given by

$$G(D) = \frac{1}{N - D}\sum G_{ik} \quad (\text{with } i - k = D) \qquad [3]$$

where, as indicated, the summation is carried out over the $N - D$ stimulus pairs separated by just $D$ steps. Similarly, if the responses meet the linearity condition, this formula can be used as a measure of the average generalization for all pairs of responses separated by $D$ steps. Thus, $G_{ik}$ in Formula 3 can stand for either $G_{ikS}$ or $G_{ikR}$.

Now the average generalization $G(D)$ and the separation $D$ have properties that particularly recommend them for investigations of the relation between generalization and distance. First of all, $D$ is quite insensitive to the physical measure used to space the stimuli evenly along the chosen dimension. Thus, if the stimuli are squares differing only in size, it does not much matter whether these squares are spaced in accordance with constant increments of area or length of side. It is a consequence of the linearity condition that these two variables ($a^2$ and $a$) will be almost linearly related within the small range of variation permitted along the size dimension. In fact, any variable that is equivalent to these variables except for a continuous, differentiable transformation could presumably be used. Thus the stimuli could just as well be regularly spaced in terms of measures based upon psychophysical procedures such as the summation of jnd's, category judgment, magnitude estimation, etc. (29). All of these measures are continuous and differentiable functions of the usual physical variables.

This indifference of $D$ to the variable chosen for evenly spacing the stimuli is further enhanced by computing $G(D)$ as an average for all pairs of stimuli separated by $D$ steps. Thus, even if there is some residual systematic contraction or expansion of psychological distance, as one proceeds from the first to the last pair of stimuli at a given separation $D$, this systematic effect will be largely canceled out by averaging both ends of the stimulus range together.
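The two measures defined above, the generalization of Equation 1 and the average generalization of Formula 3, can be sketched in a few lines of Python. This is only an illustration: the function names and the 4 x 4 matrix of conditional probabilities are invented here and are not data from any of the cited experiments, and the asymptote parameter is simply passed in (default zero) rather than estimated.

```python
import math

def generalization_matrix(P, C=0.0):
    """Equation 1: G_ik = (1 + C) * sqrt(P_ik * P_ki / (P_ii * P_kk)) - C.

    P[i][k] is the conditional probability that stimulus i leads to the
    response assigned to stimulus k. C is the asymptote parameter; how it
    is estimated is not covered here, so it is supplied by the caller.
    """
    n = len(P)
    return [
        [(1.0 + C) * math.sqrt(P[i][k] * P[k][i] / (P[i][i] * P[k][k])) - C
         for k in range(n)]
        for i in range(n)
    ]

def average_generalization(G, D):
    """Formula 3: mean of G_ik over the N - D pairs exactly D steps apart."""
    n = len(G)
    return sum(G[i][i + D] for i in range(n - D)) / (n - D)

# Hypothetical confusion matrix for four evenly spaced stimuli
# (each row sums to 1). The values are invented for illustration.
P = [
    [0.70, 0.20, 0.07, 0.03],
    [0.18, 0.60, 0.17, 0.05],
    [0.06, 0.19, 0.58, 0.17],
    [0.03, 0.08, 0.21, 0.68],
]

G = generalization_matrix(P)
gradient = [average_generalization(G, D) for D in range(len(P))]
```

Because the geometric mean is symmetric in i and k, summing over the pairs (i, i + D) covers every pair separated by D steps exactly once.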
In Fig. 2 average generalization $G(D)$ is plotted against $D$ for several experiments on the learning of paired associates. In each experiment several different random assignments of the responses to the stimuli were employed. Since the average psychological spacing of the stimuli and responses varies somewhat from experiment to experiment, the $D$-values for each experiment are multiplied by a constant factor to make them comparable with the $D$-values for the other experiments. The conclusion is clear: Although the empirical points fall closely along an exponential decay function (the fitted curve), they deviate markedly from the alternative Functions A, B, and D of Fig. 1.

[Figure 2: ordinate: average generalization G(D); abscissa: distance D. Surviving legend entry: circles varying in size (McGuire).]
FIG. 2. An exponential decay function fitted to data from several experiments involving stimulus or response generalization during paired-associate learning. The center of each plotted symbol (triangle, square, circle) has average generalization $G(D)$ as ordinate and distance $D$ (multiplied by a constant factor) as abscissa. The experiments were conducted by Attneave (1), McGuire (18, 26), and Shepard (26). In the experiment with circles of variable color, the stimuli were not confined to a single physical dimension. Since the linearity condition was not met in this case, the $D$-values were taken from a multidimensional scaling solution obtained by Torgerson for these same colors (30, Ch. 11). Since the stimuli were contained in a relatively small region of psychological space, it seems safe to assume that Torgerson's judgmental method of triads yields satisfactory estimates of psychological distance in the present sense. In the experiment on response generalization, it had been found that the two end-responses did not conform to the linearity requirement (26). The plotted data were therefore based upon the generalization between the seven intermediate responses only. Finally, Attneave's experiments deviated from the design presupposed by the present analysis in the following respects. (a) The stimuli were quite widely spaced and so may not have met the linearity requirement. (b) The number of trials was much smaller than in the other experiments. This makes the steady-state condition assumed in the ensuing theoretical discussion seem a little unlikely. (c) Since Attneave reported only the sums $P_{ikS} + P_{kiS}$ but not the individual probabilities, it was necessary to substitute the approximate formula $(P_{ikS} + P_{kiS})/(P_{iiS} + P_{kkS})$ for the geometric mean used in Equation 1. However, in a previous investigation this approximation was very close and, indeed, possessed greater statistical stability than the geometric mean (24). (d) There were not sufficient data to make the estimation of the constants $C_S$ feasible. For the purposes of plotting the data, therefore, $C_S$ was assumed to be zero. That the agreement between the various sets of data is good despite the deviations from optimum conditions in certain instances suggests that the data may not be particularly sensitive to such deviation.

DEDUCTION OF THE EXPONENTIAL GRADIENT FROM A TRACE MODEL

The principal purpose of this paper is to propose a model to account for the empirically determined exponential form of the gradient of generalization in terms of a hypothetical trace process. The aim of such a proposal is twofold: First, it is desired to remove the apparent arbitrariness of the exponential gradient by showing that it follows from certain elementary assumptions of a more intuitively compelling character. Second, it is hoped that such a model will provide for prediction to other experiments in which the usual paired-associate conditions no longer prevail.

The model is suggested by recognition experiments of the following kind. A subject is shown a certain square and, after a delay of $t$ units of time, shown a second square which may or may not be the same as the first. In this situation the probability of responding "same" is distributed, with respect to the difference in size between the first and second squares, according to some bell-shaped density function. Since the time error is usually small, the mode falls near the zero-difference point. The variance of the distribution, however, increases appreciably with the delay $t$ imposed between the first and second exposure (2, 3, 10, 17).

The unidimensional stimuli, $S_1, S_2, \ldots, S_N$, in an experiment on paired-associate learning, will be designated by small circles arranged in a vertical row along the left, as in A of Fig. 3. Corresponding to these, there are conceived the internal representations (perceptions), $S_1^*, S_2^*, \ldots, S_N^*$, which will be designated by the small circles on the right. Whenever a stimulus $S_i$ is presented, it leads to some one of the internal representations $S_k^*$, with probability $P_{ik}$. If the responses are so distinctive as never to be confused, reinforcement will ensue only if a stimulus is followed by its corresponding representation (25).

[Figure 3: panels A through F; stimuli S on the left, perceptual representations S* on the right.]
FIG. 3. The diffusion and deconditioning of the stimulus trace. The arrows connecting a single stimulus S to the alternative perceptual representations S* in A, B, and C illustrate how the trace elements are assumed to spread out with time. In D, E, and F the process is represented as continuous.

Suppose, then, that when $S_i$ leads to $S_i^*$, a large number of trace elements are conditioned (by reinforcement) from $S_i$ to $S_i^*$. This bundle of trace elements or "stimulus trace" is designated by the arrow in A. Immediately following the removal of the external stimulus $S_i$, the trace elements are subject to haphazard perturbations so that some of the conditioned trace elements wander to adjacent stimulus representations as shown in B. (One might imagine here the action on the synapses of the random molecular processes associated with metabolism.) Later still, some of these elements will wander even further from $S_i^*$, as shown in C.

Now for any two stimuli differing along a single physical dimension, another stimulus can be found which is situated intermediately between these. Thus, as indicated in D, a continuum may be conceived as underlying the corresponding internal representations. This continuum then is the psychological space of the stimuli.
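The qualitative picture of Fig. 3 can be made concrete with a small Monte Carlo sketch. It anticipates assumptions that the paper only formalizes in the sections that follow (a constant deconditioning rate `U` and a diffusion rate `V`), and it adds choices of its own that are purely illustrative: the function name `simulate_trace`, the time step `dt`, the parameter values, and the use of equal-sized steps to the left or right as the (otherwise unspecified) perturbation.

```python
import math
import random

def simulate_trace(n, U, V, t, dt=0.01, seed=0):
    """Return the positions of the elements still conditioned after time t.

    Every element starts at x = 0. In each step of length dt it is
    deconditioned with probability U * dt; otherwise it is displaced by
    +sqrt(V * dt) or -sqrt(V * dt) with equal probability, so that a single
    step has zero mean and variance V * dt. The step distribution is an
    illustrative choice, not one prescribed by the model.
    """
    rng = random.Random(seed)
    step = math.sqrt(V * dt)
    n_steps = int(round(t / dt))
    survivors = []
    for _ in range(n):
        x = 0.0
        alive = True
        for _ in range(n_steps):
            if rng.random() < U * dt:
                alive = False      # deconditioned elements stay deconditioned
                break
            x += step if rng.random() < 0.5 else -step
        if alive:
            survivors.append(x)
    return survivors

# Hypothetical parameter values, chosen only to keep the run short.
positions = simulate_trace(n=5000, U=0.5, V=1.0, t=2.0)
fraction_conditioned = len(positions) / 5000
spread = sum(x * x for x in positions) / len(positions)
```

Under these assumptions the fraction still conditioned should fall off roughly as exp(-U t), and the mean squared displacement of the surviving elements should grow roughly as V t, which is the behavior the shaded, widening distributions of Fig. 3 depict.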

After removal of the stimulus, the distribution of trace elements progressively spreads out in this space, as illustrated in D, E, and F.

In addition to this spread or diffusion of the trace, it is assumed that the trace elements are also subject to spontaneous deconditioning (again, presumably owing to random processes at the molecular level). Thus, as the elapsed time increases, the number of elements still conditioned decays to zero. In Fig. 3, the shaded areas represent the fraction of the original trace elements which are still conditioned at each time.

The Deconditioning of the Trace

The probability that a given conditioned element will suffer deconditioning during a small interval $\Delta t$ will be denoted by $U(\Delta t)$. The simplest rule that can be assumed to govern this probability is as follows:

Assumption I. $U(\Delta t)$ is a constant, independent of the time the element has remained conditioned and independent of the distance (in psychological space) to which the element has drifted in that time.

The probability that an element, still conditioned at time $t$, remains conditioned until $t + \Delta t$ is, of course, $1 - U(\Delta t)$. Therefore, the probability that a given trace element remains conditioned during the first $m$ intervals of length $\Delta t$, but then becomes deconditioned during the immediately succeeding $\Delta t$-interval, is, by Assumption I,

$$[1 - U(\Delta t)]^{m}\,U(\Delta t) \qquad [4]$$

Now $U(\Delta t)$ is necessarily proportional to the interval chosen for $\Delta t$. It is convenient, therefore, to define a deconditioning parameter,

$$U = \lim_{\Delta t \to 0} \frac{U(\Delta t)}{\Delta t} \qquad [5]$$

which does not depend upon the choice of an arbitrary interval $\Delta t$. Then, if the $t$-axis is translated so that conditioning takes place at $t = 0$, $m$ can be increased in such a way that, as $\Delta t \to 0$, $m \cdot \Delta t \to t$.

Although the probability that a given trace element is deconditioned at precisely time $t$ is zero, the probability per unit time (the "probability density") or rate of deconditioning for an individual trace element is, at the particular instant $t$,

$$\lim_{\Delta t \to 0} [1 - U(\Delta t)]^{m}\,U(\Delta t)/\Delta t$$

But by Equation 5 as $\Delta t \to 0$, $U(\Delta t) \to U \cdot \Delta t \to Ut/m$. Therefore, the rate of deconditioning at time $t$ is, for single trace elements,

$$\lim_{m \to \infty} [1 - Ut/m]^{m}\,U = U \exp(-Ut) \qquad [6]$$

where, for convenience in what follows, $\exp(-Ut)$ is used in place of $e^{-Ut}$.

With regard to the deconditioned trace elements, the following rule is the simplest that can be assumed:

Assumption II. In the absence of further reinforcements, a deconditioned element remains deconditioned.

From this assumption and Equation 6 it follows that the fraction of the originally conditioned elements remaining conditioned at time $t$ is

$$U(t) = 1 - \int_0^t U \exp(-U\tau)\,d\tau = \exp(-Ut) \qquad [7]$$

The Diffusion of the Trace

Equation 7 completes the quantitative formulation of the deconditioning of the trace. In order to provide a similar formulation for the diffusion of the trace, it is necessary to consider the motions of the individual trace elements in psychological space. The exposition is simplified by continuing to suppose that the stimuli are evenly spaced along a restricted range of a single physical dimension. It is then possible to introduce one coordinate $x_i$ for each stimulus representation $S_i^*$ giving its position along a one-dimensional psychological space. The psychological distance between any two stimuli, $S_i$ and $S_k$, is then

$$D_{ik} = |x_i - x_k| \qquad [8]$$

Now the expression $V_{ik}(\Delta x, \Delta t)$ will denote the probability that, if at time $t$ a trace element is situated at $x_i$, by time $t + \Delta t$ it will have moved into the one-dimensional region bounded by $x_k$ and $x_k + \Delta x$. As before, the arbitrary interval $\Delta x$ can be eliminated by defining a new quantity

$$V_{ik}(\Delta t) = \lim_{\Delta x \to 0} \frac{V_{ik}(\Delta x, \Delta t)}{\Delta x} \qquad [9]$$

$V_{ik}(\Delta t)$ is the probability density of displacements from $x_i$ to $x_k$ during the brief interval $\Delta t$. The simplest rule that can reasonably be assumed to govern this quantity is as follows:

Assumption III. $V_{ik}(\Delta t)$ is an invariant function of the psychological distance between $S_i$ and $S_k$. It does not depend upon the time $t$ or upon the absolute position of the pair, $S_i$ and $S_k$, in psychological space.

If the $x$-axis is translated so that $x_i = 0$, it follows from Assumption III that, for a given interval $\Delta t$, there exists a fixed function $f_{\Delta t}$, such that

$$V_{ik}(\Delta t) = f_{\Delta t}(x_k) \qquad [10]$$

However, it will not be necessary to make any particular assumption about the form of the function $f_{\Delta t}$. It is only necessary to insure that, during a short period of time, the probability of a very large displacement is negligible. This can be stated with greater precision as follows:

Assumption IV. The function $f_{\Delta t}$ and, thus, the probability density of displacements of a trace element from $x = 0$ is distributed over the $x$-axis with finite variance.

From Assumptions III and IV it follows that the variance of the distribution of displacements during an interval $\Delta t$ is the finite constant

$$V(\Delta t) = \int_{-\infty}^{+\infty} x^2 f_{\Delta t}(x)\,dx \qquad [11]$$

Since $V(\Delta t)$ must depend upon the length of the interval $\Delta t$, it is useful to define a diffusion parameter,

$$V = \lim_{\Delta t \to 0} \frac{V(\Delta t)}{\Delta t} \qquad [12]$$

which does not require the stipulation of an interval $\Delta t$. Just as $U$ governs the rate of deconditioning of the trace, then, $V$ controls the rate of spread or diffusion of the trace in psychological space.

The next question to be answered is this: Given the diffusion parameter $V$, what form will the distribution of trace elements take after an appreciable delay $t$? It is possible to show that the assumptions which have been made are sufficient conditions for the desired distribution to tend toward a limiting form which is independent of the form of $f_{\Delta t}$ (6, 15). Specifically, if a trace comprising a large number ($n$) of elements is conditioned from $S_i$ to $x_i$ at $t = 0$, the density of these elements at $x_k$, for some later time $t$, conforms with the Gaussian function

$$n \cdot V_{ik}(t) = n \cdot (2\pi V t)^{-1/2} \exp[-(D_{ik})^2 / 2Vt] \qquad [13]$$

The beauty of this result is its complete independence from the underlying mechanism symbolized by $f_{\Delta t}$. Thus, even though one imagines, for example, that thermal agitation of the molecular substrate is responsible for deconditioning and diffusion, the biophysical details of this process need not be specified. For, according to the present formulation, these details are irrelevant to the question of the gross behavior of the trace system.

The Trace Process in Paired-Associate Learning

Now in the course of learning paired associates, each stimulus $S_i$ will have been presented on many occasions. Furthermore, on a number of these occasions, reinforcement of the response assigned to $S_i$ will have conditioned a bundle of $n$ trace elements to $S_i^*$. Thus at some given time $t_0$, the density of conditioned elements resulting from the immediately preceding reinforcement will be distributed in psychological space as shown for $t_{-1}$ in Fig. 4. Likewise, the densities remaining from earlier reinforcements will be distributed as shown for $t_{-2}$, $t_{-3}$, and so on.

The total distribution of conditioned elements emanating from $S_i$ at $t_0$ can be found by summing the Gaussian distributions resulting from all previous reinforcements of the response assigned to $S_i$. It is possible to derive an analytic approximation to this composite distribution ($\int_t$ in Fig. 4) by supposing that, after an initial phase of rapid learning, reinforcements occur at relatively frequent and regular intervals. If the early trials are disregarded, then, a roughly steady-state process can be considered. The summation of the Gaussian curves of various ages can then be approximated by an integration of these curves over $t$.

Now, with respect to $S_i$, the density of trace elements at $x_k$ resulting from a reinforcement $t$ time units ago is given by $n \cdot V_{ik}(t)$. However, only the fraction $U(t)$ of these is still conditioned. Therefore the density of elements at $x_k$ which remain conditioned after a delay $t$ from reinforcement is $n \cdot U(t) \cdot V_{ik}(t)$. Clearly, then, the total density of conditioned trace elements resulting from all previous reinforcements of the response assigned to $S_i$ is distributed in approximate accord with

$$\int_0^\infty n \cdot U(t) \cdot V_{ik}(t)\,dt = \int_0^\infty n \cdot \exp(-Ut)\,(2\pi V t)^{-1/2} \exp[-(D_{ik})^2 / 2Vt]\,dt \qquad [14]$$

Fortunately, the integration can be effected (4, p. 144) and, indeed, yields

$$n(2UV)^{-1/2} \exp[-D_{ik}\sqrt{2U/V}] \qquad [15]$$
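The step from Equation 14 to Equation 15 can be checked numerically. The sketch below is not part of the original derivation: the function names, the midpoint rule, the substitution t = s**2 (used only to remove the singularity of the integrand at t = 0), and the parameter values in the usage lines are all choices made here for illustration.

```python
import math

def trace_density_numeric(D, n, U, V, steps=20000):
    """Midpoint-rule estimate of the integral in Equation 14.

    With t = s**2 the integrand becomes
    2n * (2*pi*V)**-0.5 * exp(-U*s**2 - D**2 / (2*V*s**2)),
    which is smooth on s > 0.
    """
    s_max = math.sqrt(40.0 / U)   # beyond this, exp(-U*t) is below 1e-17
    h = s_max / steps
    const = 2.0 * n / math.sqrt(2.0 * math.pi * V)
    total = 0.0
    for j in range(steps):
        s = (j + 0.5) * h
        total += math.exp(-U * s * s - D * D / (2.0 * V * s * s))
    return const * total * h

def trace_density_closed(D, n, U, V):
    """Equation 15: n * (2UV)**-0.5 * exp(-D * sqrt(2U/V))."""
    return n / math.sqrt(2.0 * U * V) * math.exp(-D * math.sqrt(2.0 * U / V))

# Hypothetical parameter values.
numeric = trace_density_numeric(D=1.0, n=100, U=0.5, V=1.0)
closed = trace_density_closed(D=1.0, n=100, U=0.5, V=1.0)
```

The ratio of the closed form at distance D to its value at D = 0 is exp(-D * sqrt(2U/V)), so the exponential dependence on distance is already visible at this stage of the argument.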
For intermediate stages of paired-associate learning, then, this function can be taken as a measure of the strength of connection of the stimulus $S_i$ to the point $x_k$ on the perceptual continuum. It is not a probability density, however, since (if $n > 1$ or $U < 1$) the integration of this function over the $x$-axis yields a value greater than unity (7, p. 133).

FIG. 4. A series of Gaussian distributions with increasing variances, and the composite distribution arising from an integration of these over time.

Multiplication of Equation 15 by $U/n$ converts this function to a probability density,

$$P_{ik} = (U/2V)^{1/2} \exp[-D_{ik}\sqrt{2U/V}] \qquad [16]$$

for then

$$\int_{-\infty}^{+\infty} P_{ik}\,dx_k = 1 \qquad [17]$$

Now Equation 16 furnishes an estimate of the conditional probability (per unit $x$) of the particular perception $x_k$, given the stimulus $S_i$. In order to secure the probability of taking $S_i$ to be $S_k$ (through stimulus generalization), $S_k^*$ can be reinterpreted as a finite region partitioned off from psychological space in the neighborhood of $x_k$. In this way the entire one-dimensional space can be divided into $N$ mutually exclusive and exhaustive segments so that each segment corresponds to one of the external stimuli. A given trace element is then said to be conditioned to $S_i^*$ at time $t$ if and only if it falls in the region containing $x_i$ at that time.

$P_{ik}$, reinterpreted in this way, can be taken as an approximation to the conditional probability that the external stimulus $S_i$ will lead to the internal representation $S_k^*$. If the early trials are ignored (since these trials do not sufficiently approach a steady state), the constant $C_S$ in Equation 1 can be disregarded (25). Setting $C_S = 0$, and substituting the right-hand member of Equation 16 for $P_{ikS}$ in Equation 1, then, the generalization between $S_i$ and $S_k$ assumes the remarkably simple form

$$G_{ik} = \exp(-D_{ik}\sqrt{2U/V}) \qquad [18]$$

Letting $K$ stand for the constant $\sqrt{2U/V}$ and averaging $G_{ik}$ over all pairs of stimuli separated by a fixed distance $D$, Equation 3 now takes the form

$$G(D) = e^{-KD} \qquad [19]$$

But if $K$ is identified with the constant distance multiplier calculated for each experiment, this is precisely the exponential function fitted to the empirical data in Fig. 2.

Further Aspects of the Trace Process

The role of deconditioning. It might have seemed unnecessary to include the deconditioning assumptions (I and II) along with the diffusion assumptions (III and IV), since the process of diffusion alone would account for the spread of the trace, and hence for stimulus generalization. However, from Equation 18 it is clear that, without deconditioning, generalization would be so extensive as to prohibit the learning of paired associates. For if $U = 0$, $G_{ik} = 1$ for all $i$ and $k$. In this case the gradient is perfectly flat, and discrimination between stimuli is impossible. This is a consequence of carrying the integration of Equation 14 out to infinite $t$. Alternatively, one could integrate only out to some finite value $t = T$. However, in terms of the model, this is equivalent to assuming that the independent elements simultaneously suffer deconditioning at the same instant $T$ time units from conditioning. Rather than postulate coincidences of this kind, it seems more plausible to assume the gradual kind of fading away implied by Assumptions I and II. This fading away (or forgetting) then serves the adaptive function of weighting old, diffuse traces less heavily than new, accurate traces.

Of course, the integration of Equation 14 out to infinite $t$ is not strictly justified, since the learning experiment itself proceeds for only a finite period. The error introduced by the infinite integration is small, however, if the deconditioning parameter $U$ is sufficiently greater than zero. For

then the hypothetical traces persist- be greatest for stimuli or responses at


ing from reinforced trials that might the end of a linear array (26).
be imagined as preceding the actual Multidimensional generalization.
beginning of the experiment would be For expository reasons the stimuli
almost totally deconditioned during were supposed to vary along a re-
all but the early trials of the learning stricted range of a single physical di-
experiment. mension. Actually, by making use
Asymmetric generalization. It has of the theory of random motions in
sometimes been suggested that there Euclidean spaces of more than one
exist asymmetries in which a general- dimension (6), it can be shown that
ization going in one direction, e.g., the trace model leads to the same ex-
Si > Sk*, is more probable than a gen- ponential gradient in either of the
eralization going in the reverse direc- two following multidimensional cases:
tion, Sh Si*. At first glance, such (a) the psychological space is Euclid-
a possibility appears to violate As- ean ; (b) the stimuli are confined to a
sumption III, according to which the small region of psychological space.
probability of a displacement for a (The second case follows from the
given trace element depends only first. For, by the hypothesized rela-
upon the length (and not upon the tion between physical and psychologi-
direction) of that displacement. cal space, a sufficiently small region
However, an account of such asym- of even a non-Euclidean space will be
metries which is consistent with As- approximately Euclidean.) The
sumption III is suggested by the treatment of generalization over large
analysis proposed by Bush and Mos- distances in non-Euclidean spaces,
teller. According to their model, the however, awaits the development of a
psychological dissimilarity going from general theory of random motions in
Si to Sk is equal to the dissimilarity spaces of positive and negative curva-
going from Sk to Si only if the "set tures.
of stimulus elements" comprising St Response generalization. The pre-
has the same measure as the set of ceding discussion has been formulated
stimulus elements comprising S/, (5). solely in terms of stimulus generaliza-
In terms of the trace model, then, it tion. However, the same trace model
can be supposed that the area of the can also be applied in the case of
region of psychological space des- response generalization. Suppose, for
ignated as Si* may be greater or example, that the responses are closely
smaller than the area of the region spaced along a single physical dimen-
designated as S**. Since the prob- sion, whereas the stimuli are com-
ability that a given trace element pletely discriminable. When one re-
(wandering at random) will occupy a sponse Ri is intended, some other
response Rk may actually be made.
certain region during an interval A is
Using a notation analogous to that
proportional to the area of that adopted for stimulus generalization,
region, Pi!es does not necessarily equal it can be said that Rf leads to RS, with
Pkis. Indeed, the weight Wts, pro- probability PikR. If then Rf* leads
posed earlier by Shepard (25), is to Ri, the ensuing reinforcement
presumably a measure of the area of conditions a large number of trace
the region corresponding to Si. This elements from Ri* to a point Xi in the
interpretation suggests why the em- psychological space of the responses.
pirically obtained weights tended to These conditioned trace elements are
STIMULUS AND RESPONSE GENERALIZATION 253

then subject to the rules already set


forth in Assumptions I through IV.
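
The step from the trace assumptions to the exponential gradient of Equations 18 and 19 can be checked numerically. The sketch below is not the author's derivation. It assumes that Equation 14 (not reproduced in this section) amounts to integrating, over the age t of a trace, a normal density whose standard deviation grows as V times the square root of t, weighted by the fraction exp(-Ut) of elements that have so far escaped deconditioning. The function names, the substitution t = s², and the midpoint rule are choices made here purely for illustration. Under these assumptions the integrated density, divided by its value at zero displacement, should agree with exp(-K|x|) for K = sqrt(2U)/V.

```python
import math


def integrated_trace_density(x, U, V, steps=50_000):
    """Approximate the integral over trace age t, from 0 to infinity, of
    exp(-U*t) times a normal density in x with standard deviation V*sqrt(t).

    ASSUMPTION: this integrand is our reading of Equation 14, which is not
    shown in this section. Substituting t = s**2 removes the t**-0.5
    singularity at t = 0, so a plain midpoint rule on s is adequate.
    """
    s_max = 8.0 / math.sqrt(U)  # exp(-U * s**2) is below 1e-27 beyond this
    h = s_max / steps
    total = 0.0
    for j in range(steps):
        s = (j + 0.5) * h  # midpoints, so s is never exactly zero
        total += math.exp(-U * s * s - x * x / (2.0 * V * V * s * s))
    return 2.0 / (V * math.sqrt(2.0 * math.pi)) * total * h


def trace_model_gradient(x, U, V):
    """Generalization at displacement x relative to zero displacement."""
    return integrated_trace_density(x, U, V) / integrated_trace_density(0.0, U, V)


def exponential_gradient(x, U, V):
    """Closed form of Equation 19 with K = sqrt(2U)/V."""
    K = math.sqrt(2.0 * U) / V
    return math.exp(-K * abs(x))


if __name__ == "__main__":
    U, V = 0.5, 1.0  # illustrative values only; they give K = 1
    for x in (0.5, 1.0, 2.0, 3.0):
        print(x, trace_model_gradient(x, U, V), exponential_gradient(x, U, V))
```

Letting U approach zero in this sketch drives K toward zero and flattens the gradient, which is the degenerate case described in the text.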

THE FORM OF THE GRADIENT AND THE SCHEDULE OF REINFORCEMENT

In the last section the apparent arbitrariness of the specific function fitted
to the paired-associate data of Fig. 2 was reduced by showing that this
function is a consequence of elementary assumptions which do not involve the
specification of any particular function. The purpose of this section is to
determine whether these same assumptions have any consequences for different
types of experiments in which the sequence of reinforced and nonreinforced
trials is manipulated, as in the study by Humphreys (13).

Certainly, from Fig. 4 it is clear that the sharp peak of the combined gradient
at x = 0 is contributed solely by the distributions of trace elements resulting
from the most recent reinforcements (like the distribution L_i). Therefore, if
feedback as to the correctness of each response is terminated, the composite
gradient should become rounded (convex upward) in the vicinity of x = 0 and,
under continued extinction, gradually flatten out. Likewise, if reinforcement
is delivered only for every nth correct response, the composite gradient should
assume an intermediate form which tends more toward either an exponential or a
bell-shaped curve as n is made smaller or larger. Considerations of this kind
may account for the finding of Humphreys that, whereas with 100 per cent
reinforcement the generalization gradient was concave upward, with 50 per cent
reinforcement it was initially convex upward (13).

It is also feasible to analyze data from paired-associate experiments with 100
per cent reinforcement for this effect. Since the stimuli are presented in
random order, the delay between successive occurrences of a given stimulus or
response varies over a considerable range. Thus separate gradients can be
plotted depending on the number of trials intervening between a given
stimulus-response sequence and the most recent feedback concerning the
correctness of that particular sequence. The most extensive data currently
available in a form suitable for this analysis come from an experiment on
response generalization in paired-associate learning (26). Those data are
therefore reanalyzed and plotted in Fig. 5. As predicted from the model, the
gradient of generalization systematically changes from concave to convex upward
in the vicinity of the correct response as the number of trials intervening
between a response and the last differential reinforcement of that response is
increased.

FIG. 5. The gradient of generalization as a function of the number of trials
intervening between a response and the last preceding feedback as to the
correctness of that response. The numbers of intervening trials are grouped
together as follows: I. 0 trials; II. 1-3 trials; III. 4-8 trials; IV. 9-15
trials; and V. 16 or more trials.

These results may help to explain why the so-called "discriminal dispersion"
observed in absolute-judgment and identification experiments seems to conform
to a Gaussian or normal density function (9, pp. 317-319). Since differential
reinforcements are not usually provided in these experiments, the discriminal
dispersion is presumably maintained by those haphazard reinforcements of
everyday existence which antedate the beginning of the experiment proper. Under
these circumstances the gradient must be relatively mesokurtic, as illustrated
in V of Fig. 5. This gradient resembles a Gaussian function, and is quite
unlike the comparatively leptokurtic gradients (such as I) which have been
shaped by recent differential reinforcement.

MICROMECHANICAL AND MACROMECHANICAL MODELS FOR GENERALIZATION

The present trace model might be termed a micromechanical model because it is
based upon assumptions about the fine-grain or "microscopic" mechanics of the
underlying trace process. This model is to be distinguished from an earlier
macromechanical model derived from assumptions pertaining to large-scale or
"macroscopic" aspects of generalization (25). However, to the extent that the
two models are compatible, they can be integrated so that the macromechanical
assumptions of the earlier model will appear as deductions from the more
primitive micromechanical assumptions of the present model. The aim of such an
integration would be to broaden the scope of the earlier model. For, whereas
the macromechanical model applies only to paired-associate experiments with
continual differential reinforcement, the micromechanical model has
implications for a considerably wider range of experiments.

The three basic assumptions from which the macromechanical model was originally
derived are as follows. (a) Stimulus and response generalizations take place
independently of each other. (b) The probability of a stimulus generalization
is an exponential decay function of the psychological distance between the
stimuli. (c) The probability of a response generalization is an exponential
decay function of the psychological distance between the responses. Now to say
that the scope of the earlier model will be increased by the adjunction of the
micromechanical assumptions is to say that the macromechanical assumptions (a,
b, and c) will retain their original form only under certain limiting
conditions (e.g., under continual reinforcement). When these conditions are
modified (as when the reinforcements are delivered only intermittently),
Propositions b and c will have to assume different forms in accordance with the
conclusions of the last section.

Even with continual reinforcement, there is one case of paired-associate
learning for which the macromechanical assumptions would have a different form
if deduced from the micromechanical assumptions. Specifically, if the
interstimulus distances and the interresponse distances are both quite small,
the trace model does not lead directly to the exponential gradient. Because
occasional sequences of the form $S_i \to S_k^* \to R_k^* \to R_i$ will be
reinforced, trace elements will be conditioned from $S_i$ to $X_k$ (rather than
only to $X_i$). Events of this kind will somewhat alter the form of the
gradient. (In a previous experiment with generalization, both between the
stimuli and between the responses, no significant departure from prediction on
the assumption of an exponential gradient was observed [26]. However, the
theoretically expected deviations would be quite small and may have been
obscured by the rather large variability of the data from that experiment.)

Detailed derivations from the micromechanical assumptions are quite complicated
in the case of simultaneous stimulus and response generalization, owing to the
circumstance that, in this case, the form of the gradient depends upon the
particular stimulus-response assignment enforced. Part of this complication is
connected with the absence in both models of any account of the decrease in
generalization which necessarily accompanies learning. An entirely satisfactory
treatment of these problems will probably require an even more basic
integration of these models with the already extensively developed models for
learning per se.

SUMMARY

The problem of the relation between generalization and dissimilarity (i.e., the
problem of the shape of the "gradient of generalization") is reexamined in the
light of recent theoretical and empirical developments. With regard to
experimental arrangements in which reinforcements are delivered in accordance
with a one-to-one assignment of the responses to the stimuli (as in
paired-associate learning), the following conclusions are drawn:

1. Measures of generalization can be defined in terms of the conditional
probabilities with which the various stimuli lead to the various responses.

2. Thus defined, stimulus generalization and response generalization are both
invariant functions of interstimulus and interresponse dissimilarities,
respectively, provided that two conditions are met. First, dissimilarity is
reinterpreted to mean a "psychological distance" which (a) is equivalent to
"physical distance" except for a continuous, differentiable transformation, and
(b) satisfies the metric axioms. Second, a given schedule of reinforcement is
maintained.

3. Under conditions of frequent and regular reinforcement (as in the typical
paired-associate experiment), the gradient of generalization is closely
approximated by an exponential decay function (concave upward).

4. Under conditions of infrequent or intermittent reinforcement, this gradient
departs from the exponential function in that it is convex upward in the
vicinity of the reinforced stimulus or response.

5. The empirically observed gradients of generalization can be deduced from a
mathematical model based upon four elementary assumptions concerning the
temporal decay of stimulus and response traces.

REFERENCES

1. ATTNEAVE, F. Dimensions of similarity. Amer. J. Psychol., 1950, 63, 516-556.
2. BACHEM, A. Time factors in relative and absolute pitch determination. J.
   acoust. Soc. Amer., 1954, 26, 751-753.
3. BALDWIN, J. M., & SHAW, W. J. Memory for square-size. Psychol. Rev., 1895,
   2, 236-239.
4. BIERENS DE HAAN, D. Nouvelles tables d'intégrales définies. New York: G. E.
   Stechert, 1939.
5. BUSH, R. R., & MOSTELLER, F. A model for stimulus generalization and
   discrimination. Psychol. Rev., 1951, 58, 413-423.
6. CHANDRASEKHAR, S. Stochastic problems in physics and astronomy. In N. Wax
   (Ed.), Noise and stochastic processes. New York: Dover, 1954. Pp. 3-89.

7. FELLER, W. An introduction to probability theory and its applications. New
   York: Wiley, 1950.
8. GRANDINE, LOIS, & HARLOW, H. F. Generalization of the characteristics of a
   single learned stimulus by monkeys. J. comp. physiol. Psychol., 1948, 41,
   327-338.
9. GUILFORD, J. P. Psychometric methods. (2nd ed.) New York: McGraw-Hill, 1954.
10. HARRIS, J. D. The decline of pitch discrimination with time. J. exp.
    Psychol., 1952, 43, 96-99.
11. HOVLAND, C. I. The generalization of conditioned responses: I. The sensory
    generalization of conditioned responses with varying frequencies of tone.
    J. gen. Psychol., 1937, 17, 125-148.
12. HULL, C. L. Principles of behavior. New York: Appleton-Century, 1943.
13. HUMPHREYS, L. G. Generalization as a function of method of reinforcements.
    J. exp. Psychol., 1939, 25, 361-372.
14. JOHNSON, D. M. Generalization of a scale of values by the averaging of
    practice effects. J. exp. Psychol., 1944, 34, 425-436.
15. KAC, M. Random walk and the theory of Brownian motion. In N. Wax (Ed.),
    Noise and stochastic processes. New York: Dover, 1954. Pp. 369-391.
16. LASHLEY, K. S., & WADE, MARJORIE. The Pavlovian theory of generalization.
    Psychol. Rev., 1946, 53, 72-87.
17. LEYZOREK, M. Two-point discrimination in visual space as a function of the
    temporal interval between the stimuli. J. exp. Psychol., 1951, 41, 364-375.
18. McGUIRE, W. J. A multi-process model for paired-associates learning.
    Unpublished doctoral dissertation, Yale Univer., 1954.
19. MARGOLIUS, G. Stimulus generalization of an instrumental response as a
    function of the number of reinforced trials. J. exp. Psychol., 1955, 49,
    105-111.
20. PLOTKIN, L. Stimulus generalization in Morse code learning. Arch. Psychol.,
    1943, 40, No. 287.
21. RAZRAN, G. Stimulus generalization of conditioned responses. Psychol.
    Bull., 1949, 46, 337-365.
22. ROTHKOPF, E. Z. A measure of stimulus similarity and errors in some
    paired-associate learning tasks. J. exp. Psychol., 1957, 53, 94-101.
23. SCHLOSBERG, H., & SOLOMON, R. L. Latency of response in a choice
    discrimination. J. exp. Psychol., 1943, 33, 22-39.
24. SHEPARD, R. N. Stimulus and response generalization during
    paired-associates learning. Unpublished doctoral dissertation, Yale
    Univer., 1955.
25. SHEPARD, R. N. Stimulus and response generalization: A stochastic model
    relating generalization to distance in psychological space. Psychometrika,
    1957, 22, 325-345.
26. SHEPARD, R. N. Stimulus and response generalization: Tests of a model
    relating generalization to distance in psychological space. J. exp.
    Psychol., 1958, 55, 509-523.
27. SPENCE, K. W. The differential response in animals to stimuli varying
    within a single dimension. Psychol. Rev., 1937, 44, 430-444.
28. SPENCE, K. W. A reply to Dr. Razran on the transposition of response in
    discrimination experiments. Psychol. Rev., 1939, 46, 88-91.
29. STEVENS, S. S. On the psychophysical law. Psychol. Rev., 1957, 64, 153-181.
30. TORGERSON, W. S. Theory and methods of scaling. New York: Wiley, 1958.
31. WICKENS, D. D., SCHRODER, H. M., & SNIDE, J. D. Primary stimulus
    generalization of the GSR under two conditions. J. exp. Psychol., 1954, 47,
    52-56.

(Received December 17, 1957)
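
Conclusions 3 and 4 of the Summary can be illustrated with a small numerical sketch. It is not taken from the paper. It assumes, as one reading of the trace model, that the composite gradient is a sum of normal trace distributions whose standard deviation grows as V times the square root of their age t and whose weight decays as exp(-Ut), and that withholding feedback for a period `delay` simply removes every trace younger than `delay`. The function names and parameter values are hypothetical. Under these assumptions the gradient should be concave upward near the correct item when `delay` is zero and convex upward when `delay` is appreciable.

```python
import math


def composite_gradient(x, U, V, delay=0.0, steps=50_000):
    """Relative density of surviving trace elements at displacement x when
    the most recent reinforcement occurred `delay` time units ago.

    ASSUMPTION: traces of every age t >= delay contribute a normal density
    with standard deviation V*sqrt(t), weighted by exp(-U*t). Constant
    factors cancel in the ratio returned below.
    """
    def integral(xx):
        # Substitute t = s**2 and apply the midpoint rule on s.
        s_lo = math.sqrt(delay)
        s_hi = s_lo + 8.0 / math.sqrt(U)  # the tail beyond s_hi is negligible
        h = (s_hi - s_lo) / steps
        total = 0.0
        for j in range(steps):
            s = s_lo + (j + 0.5) * h
            total += math.exp(-U * s * s - xx * xx / (2.0 * V * V * s * s))
        return total * h

    return integral(x) / integral(0.0)


def second_difference(f, x, dx=0.1):
    """Positive where f is concave upward at x, negative where convex upward."""
    return f(x - dx) - 2.0 * f(x) + f(x + dx)


if __name__ == "__main__":
    U, V = 0.5, 1.0  # illustrative values only
    for delay in (0.0, 1.0, 4.0):
        curve = lambda x, d=delay: composite_gradient(x, U, V, d)
        print(delay, second_difference(curve, 0.2))
```

The sign of the second difference near the peak changes from positive to negative as `delay` grows, which is the qualitative pattern reported for groups I through V of Fig. 5.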
