Previous books

Syntax
Academic Press 1976

Syntactic Nuts: Hard Cases in Syntax
Foundations of Syntax, I
Oxford University Press 1999

Parasitic Gaps
co-edited with Paul M. Postal
MIT Press 2001

Dynamical Grammar
with Andrzej Nowak
Foundations of Syntax, II
Oxford University Press 2003

Simpler Syntax
with Ray Jackendoff
Oxford University Press 2005

Basics of Language for Language Learners
with Elizabeth Hume
Ohio State University Press 2010
Explaining Syntax
Representations, Structures, and Computation
Peter W. Culicover
Great Clarendon Street, Oxford, OX2 6DP,
United Kingdom
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide. Oxford is a registered trade mark of
Oxford University Press in the UK and in certain other countries
© Peter W. Culicover 2013
The moral rights of the author have been asserted
First Edition published in 2013
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by licence, or under terms agreed with the appropriate reprographics
rights organization. Enquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above
You must not circulate this work in any other form
and you must impose this same condition on any acquirer
Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America
British Library Cataloguing in Publication Data
Data available
ISBN 978-0-19-966023-0
Printed and bound by
CPI Group (UK) Ltd, Croydon, CR0 4YY
Contents
Preface xi
1 Prologue. The Simpler Syntax Hypothesis (2006) 1
1.1 Introduction 1
1.2 Two views on the relation between syntax and semantics 2
1.3 Mainstream syntactic structures compared with Simpler Syntax 3
1.4 Application to Bare Argument Ellipsis 5
1.5 Some other cases where Fregean compositionality does not hold 7
1.5.1 Metonymy 7
1.5.2 Sound + motion construction 7
1.5.3 Beneficiary dative construction 7
1.6 Choosing between the two approaches 8
1.7 Rules of grammar are stored pieces of structure 9
1.8 Conclusion 11
Part I. Representations
2 OM-sentences: on the derivation of sentences with systematically
unspecifiable interpretations (1972) 15
2.1 Introduction 16
2.2 On OM-sentences 16
2.2.1 The readings of OM-sentences 17
2.2.2 A possible source for and-OM-sentences 18
2.2.3 The conjunction 19
2.2.4 Or-OM-sentences 22
2.3 What can a consequential OM-sentence mean? 25
2.4 Some proposals for derivation 28
2.4.1 Can there be deletions? 28
2.4.2 Do consequential OM-sentences have if ’s in deep structure? 31
2.4.3 How do you derive an OM-sentence? 36
2.4.4 Comparing approaches 41
2.4.5 Sequence of tenses 42
2.4.6 The consequences for phrase structure 45
2.5 The incongruence reading of and-OM-sentences 46
2.6 Rhetorical OM-sentences and the incongruence reading 49
2.7 Summary 52
5.4 Arguments of Koster and May (1981) for syntactic PRO 150
5.4.1 Wh-infinitives 151
5.4.2 Redundancy of base rules 152
5.4.3 Pseudo-clefts 153
5.4.4 Extraposition 153
5.4.5 Coordination 154
5.4.6 Construal 154
5.5 Comparison with the Projection Principle 155
5.5.1 The categorial component and the lexicon 156
5.5.2 Raising to subject 159
5.5.3 NP-trace 160
5.5.4 Acquisition 161
5.6 Conclusion 162
6 Negative curiosities (1982) 163
6.1 Introduction 164
6.2 Tags: the polarity facts 165
6.2.1 Types of tag 166
6.2.2 Syntactic analysis of tags 168
6.2.3 Determinants of tag polarity 171
6.2.4 Deriving the ambiguity 175
6.2.5 Tags and surface structure scope 177
6.3 Any 179
6.4 More curiosities 184
6.5 Conclusion 188
References 358
Index 375
Preface
The articles collected here are all concerned in one way or another with a
question that has engaged me ever since I began my study of natural language
syntax: why does syntax have the properties that it has? In order to even
attempt to imbue this question with empirical content, it is essential to
determine what “syntax” is, and what its properties are. When I began the
study of syntax as a graduate student in the 1960s, I thought I understood this,
more or less, but as time has progressed, what seemed obvious or at least not
to be disputed has become much less clear to me, and much more unstable.
Some of the results of my attempts to reconstruct what “syntax” is, and what
its properties are, at least for myself (and with my collaborators), are repre-
sented in this book.
This book considers various aspects of what the proper domain of syntax is
(“Representations”), how to properly characterize the syntax of a language
(“Structures”), and reasons why some syntactic possibilities might be more
likely to be encountered than others (“Computation”). Hence the title—
Explaining Syntax: Representations, Structures, and Computation.
Collecting a representative set of articles such as this allows for some
unique opportunities. One can look back and see how far one has come in
some respects, one can look back and see how little one has changed in other
respects, and one can correct errors, omissions, and various infelicities. And,
not insignificantly, one can renew one’s acquaintance with one’s earlier avatars,
a process occasionally accompanied by recognition, amazement, or shock. It is
very gratifying to be able to do all these things here.
In looking back, I find the seeds of my most recent work, Syntactic Nuts,
Simpler Syntax (with Ray Jackendoff) and Grammar and Complexity (forth-
coming), in some of the pieces that I worked on as much as forty years ago.
For example, in “OM-sentences: on the derivation of sentences with system-
atically unspecifiable interpretations” (reprinted here as Chapter 2), I was
concerned with the fact that distributional patterns found in certain con-
structions that could be attributed to invisible syntactic structure need not
be attributed to such structure if we take into account the fact that these
constructions have interpretations that can be held responsible for the pat-
terns. By taking this position I was swimming against the mainstream of the
time, which for the most part has accepted without question the rule of
thumb that if two sentences show the same distributional pattern, they have
the same syntactic structure (visible or not). After forty years, I find that I am
still swimming against the mainstream (in this regard, at least—see the
treatment of ellipsis in Simpler Syntax and more recently in Culicover and
Jackendoff, 2012), although perhaps with more company than forty years ago.
On the other hand, much has changed. Perhaps the most important change
concerns the status of linguistic unacceptability. Ray Jackendoff and
I suggested in “A reconsideration of Dative Movements” (reprinted here as
Chapter 11) that certain instances of unacceptability might be due to the way
in which interpretations of sentences are computed, and not to the grammar
per se. We wrote “The distinction between the rules of the grammar and how
the rules are used by the speaker or hearer to create or interpret sentences is
still scrupulously maintained. All that is changed is that it is no longer so
obvious what sentences are to be generated by the rules: we cannot rely
entirely on intuition to determine whether an unacceptable sentence is gram-
matical or not (using ‘grammatical’ in the technical sense ‘generated by the
grammar’).” This is a perspective that I take up and elaborate on at some
length in Grammar and Complexity.
Another theme that has occupied me for much of the past forty years has
been the proper treatment of ‘constructions’ in grammar. I explored this issue
in “On the coherence of syntactic descriptions”, where I tried to capture the
naturalness of a grammar containing a set of distinct constructions that make
use of similar or identical structures. When this paper was published in 1973, it
was still commonplace to think of grammars as consisting of constructions.
Formal syntacticians were just beginning to contemplate the idea that con-
structions are epiphenomenal reflexes of more abstract parameter settings.
This latter view had its roots in the analysis of the passive construction in
Chomsky’s “Remarks on nominalization” (Chomsky, 1972) and came to occupy
a central position in mainstream work over the next twenty years or so. But as
many of the papers included here show, I have always taken seriously the idea
that constructions are properly part of grammars, not epiphenomenal. In
Grammar and Complexity I come back to the role of constructions in defining
the formal complexity of a grammar and in accounting for language change.
In order to provide a more general overview of these various themes and to
link the pieces reproduced here to more recent developments in the field,
I include a brief article entitled “The Simpler Syntax Hypothesis”, by Ray
Jackendoff and myself as Chapter 1. For those chapters that originally lacked
abstracts I have written brief summaries that highlight their main goals,
results, and shortcomings, and link them to later work. I have taken the
opportunity in editing the articles to correct a few youthful indiscretions
and overstatements, to fix errors in trees and references (adding references
that should have been cited but were not), to omit some discussion that is
particularly irrelevant to contemporary concerns, and to interject a few comments
Prologue
The Simpler Syntax Hypothesis
(2006)*
What roles do syntax and semantics have in the grammar of a language? What
are the consequences of these roles for syntactic structure, and why does
it matter? We sketch the Simpler Syntax Hypothesis, which holds that much
of the explanatory role attributed to syntax in contemporary linguistics
is properly the responsibility of semantics. This rebalancing permits broader
coverage of empirical linguistic phenomena and promises a tighter integra-
tion of linguistic theory into the cognitive scientific enterprise. We suggest
that the general perspective of the Simpler Syntax Hypothesis is well suited
to approaching language processing and language evolution, and to computa-
tional applications that draw upon linguistic insights.
1.1 Introduction
What roles do syntax and semantics have in the grammar of a language, and
what are the consequences of these roles for syntactic structure? These
questions have been central to the theory of grammar for close to 50 years.
We believe that inquiry has been dominated by one particular answer to these
questions, and that the implications have been less than salutary both for
linguistics and for the relation between linguistics and the rest of cognitive
science. We sketch here an alternative approach, Simpler Syntax (SS), which
offers improvements on both fronts, and contrast it with the approach of
mainstream generative grammar (Chomsky 1965; 1981a; 1995). Our approach,
developed in three much more extensive works (Culicover 1999; Jackendoff
2002; Culicover and Jackendoff 2005), draws on insights from various
* [This chapter appeared originally in Trends in Cognitive Sciences 10: 413–18 (2006). It is
reprinted here by permission of Elsevier.]
interpretation that assigns Ozzie this extra role. Thus, semantics can have
more elaborate structure than the syntax that expresses it.
Let us make more precise our notion of syntactic complexity. For Simpler
Syntax, the complexity of syntactic structure involves the extent to which
constituents contain subconstituents, and the extent to which there is invisible
structure. Thus, the structure of A in (3a) is simpler than in (3b) or (3c),
where e is an invisible element. SS will choose (3b) or (3c) only if there is
empirical motivation for the more complex structure.
(3) a. [A B C D]
b. [A B [a C D]]
c. [A B [a C D e]]
SSH allows the possibility of abstract elements in language when there
is empirical motivation for their syntactic (and psychological) reality. In
particular, it acknowledges the considerable linguistic and psycholinguistic
evidence for ‘traces’—the gaps that occur in languages such as English when
constituents appear in non-canonical position (Featherston 2001):
(4) What do you think you’re looking at ___ ?
Theories like that, I have a really hard time believing in ___.
Despite the considerable reduction of complexity under Simpler Syntax,
syntactic structure does not disappear altogether (hence the term ‘simpler
syntax’ rather than ‘simple’ or ‘no syntax’). It is not a matter of semantics that
English verbs go after the subject but Japanese verbs go at the end of the
clause—nor that English and French tensed clauses require an overt subject
but Spanish and Italian tensed clauses do not; that English has double object
constructions (give Bill the ball) but Italian, French, and Spanish do not;
that English has do-support (Did you see that?) but Italian, French, German,
and Russian do not; that Italian, French, and Spanish have object clitics
(French: Je t’aime) before the verb but English does not. It is not a matter
of semantics that some languages use case morphology or verbal agreement,
or both, to individuate arguments. That is, there remains a substantial body
of phenomena that require an account in terms of syntactic structure.
[Tree diagrams for Figure 1.1(a) and (b) not reproduced.]
Figure 1.1. (a) A mainstream analysis of Joe has put those raw potatoes in the pot.
Elements in brackets are unpronounced copies of elements elsewhere in the tree.
(b) Simpler Syntax analysis of Joe has put those raw potatoes in the pot.
textbook for beginning graduate students (Adger 2003). The literature offers
many other variants of comparable complexity.
Figure 1.1(a) is representative of the most recent version of mainstream
theory, the Minimalist Program (Chomsky 1995; Lasnik 2002). Such a
structure typically incorporates many elements that do not correspond to
perceived form (e.g. v, n, and multiple copies of Joe, have, put, and potatoes),
as well as many constituents that are motivated largely on theoretical
grounds. Classical constituency tests, such as the ability to displace as a
unit, provide motivation only for major constituent divisions such as TP,
DP, and PP.
1.5.1 Metonymy
An individual can be identified by reference to an associated characteristic, as
when a waitperson says to a colleague,
(13) The ham sandwich over there wants more coffee.
The intended meaning is ‘the person who ordered/is eating a ham sandwich’. FC
requires the syntax to contain the italicized material at some hidden syntactic
level. Another example is (14), in which the interpretation of Chomsky is
clearly ‘a/the book by Chomsky’.
(14) Chomsky is next to Plato up there on the top shelf.
Simpler Syntax says that the italicized parts of the interpretation are supplied
by semantic/pragmatic principles, and the syntax has no role.
1.8 Conclusion
The choice between mainstream syntax and Simpler Syntax is important at
three levels.
First, Simpler Syntax affords broader empirical coverage of grammatical
phenomena.
Second, Simpler Syntax enables a stronger link between linguistic theory
and experimental and computational accounts of language processing.
Changing the balance between syntax and semantics along the lines
proposed by Simpler Syntax might contribute to resolving longstanding
disputes about their relative roles in language processing (Brown and
Hagoort 1999).
Third, Simpler Syntax claims that the foundation of natural language
semantics is combinatorial thought, a capacity shared with other
primates. It thus offers a vision of the place of language in human
cognition that we, at least, find attractive.
PART I
Representations
2
OM-sentences
On the derivation of sentences with systematically
unspecifiable interpretations
(1972)*
Remarks on Chapter 2
This chapter explores the form and interpretation of ‘OM-sentences’ such as
One more can of beer and I’m leaving. I originally observed in a short squib
(Culicover 1970) that, strikingly, the connectivity between the ‘one more’ phrase
and the conjoined clause is the same as that found in full sentences. Following
the standard mode of argumentation in syntax launched in the 1960s (and still
actively employed to this day), we might then conclude that we get the same
patterns in both cases because the ‘one more’ phrase is the elliptical form of a
full sentence. I argue that this conclusion is wrong; rather, OM-sentences are
instances of a particular construction whose interpretation is constrained by the
form, but not fully specified by the form. It follows that the connectivity must
be mediated by the semantics and pragmatics. Essentially the same arguments
are made in my later work with Jackendoff on related phenomena, e.g. pseudo-
imperatives such as Don’t move or I’ll shoot and Bare Argument Ellipsis (see
Culicover and Jackendoff 2005, and Chapter 1).
The force of this argument goes directly to the question of whether there is
invisible syntactic structure in elliptical constructions. The standard view in
mainstream generative grammar, represented most prominently in current
work by Merchant (2001), is that there is. But the evidence brought forth
in this article and elsewhere (see Chapter 1 and references there) is that the
invisible-structure position can be maintained only if we admit only the most
manageable subset of data in our inquiry. The full range of phenomena
suggests that the interpretation of elliptical constructions cannot in general
2.1 Introduction
This paper deals with the treatment in a transformational[a] grammar of
sentences like the following:
(1) One more can of beer and I’m leaving.
It will be shown in subsequent discussion that such sentences admit of three
‘interpretations’, which are very closely related to more commonly encoun-
tered constructions, including conditionals, but that nevertheless there are
aspects of the interpretation of such sentences which are systematically un-
specifiable. I will argue that these sentences should not be derived from more
complex underlying structures, but that they are in fact underlain by struc-
tures characterizable by phrase structure rule (2).
(2) S ➝ NP CONJ S
To complete the analysis, I will show how rules of semantic interpretation may
be devised which capture the similarities between sentences like (1) and other
constructions in a very natural way.
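The flat structure licensed by rule (2) can be sketched with a toy context-free grammar and parser. This is purely an illustrative sketch, not part of the original analysis: the grammar, the pre-chunked phrase tokens, and all names below are invented for the example.

```python
# Toy CFG embodying rule (2), S -> NP CONJ S, alongside an ordinary
# clausal expansion S -> NP VP. Terminals are pre-chunked phrases
# (all lexical entries invented for this illustration).
GRAMMAR = {
    "S": [["NP", "CONJ", "S"], ["NP", "VP"]],
    "NP": [["one more can of beer"], ["I"]],
    "VP": [["am leaving"]],
    "CONJ": [["and"]],
}

def parse(symbol, tokens):
    """Return (tree, remaining_tokens) pairs for each way `symbol`
    can be expanded to cover a prefix of `tokens`."""
    if symbol not in GRAMMAR:  # terminal: must match the next chunk
        return [(symbol, tokens[1:])] if tokens and tokens[0] == symbol else []
    results = []
    for expansion in GRAMMAR[symbol]:
        # Extend partial parses child by child across the expansion.
        partials = [([], tokens)]
        for child in expansion:
            partials = [(kids + [t], rest2)
                        for kids, rest in partials
                        for t, rest2 in parse(child, rest)]
        results += [((symbol, tuple(kids)), rest) for kids, rest in partials]
    return results

# Pre-chunked input corresponding to example (1).
chunks = ["one more can of beer", "and", "I", "am leaving"]
full_parses = [tree for tree, rest in parse("S", chunks) if not rest]
```

Running this yields exactly one complete parse, whose root expands directly as NP CONJ S, with no clausal material around the initial NP.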
2.2 On OM-sentences
I will refer to sentences like (1) as ‘OM-sentences’. One of the more noticeable
properties of (1) is that it has an unusual surface structure, which is given
schematically in (3).
(3) NP and S
In general an OM-sentence is a sentence of the form in (3), with possible
variation in the nature of the conjunction. I will also distinguish between
different OM-sentences by the conjunction that they contain, e.g. ‘and-OM-
sentence’, ‘or-OM-sentence’, etc. The NP and the S in (3) will be referred to by
their category labels.
[a] Contemporary MGG terminology has dispensed with the classical term ‘transformational’ in favor of the more generic ‘derivational’.
The best possible reading for the acceptable cases in (5) is the incongruence
reading. A considerably less acceptable reading is the sequential reading,
which is nevertheless possible if a sufficiently plausible context can be created,
as in (6) and (7).
(6) OK, we will discuss the Queen of England, and then I’m leaving.
(7) OK, I’ll watch (what you call) the best movie of the year, and then I’m
leaving.
It will be noted that the readings for an or-OM-sentence are not the same as
those for an and-OM-sentence such as the ones just discussed. In fact, it
would appear to be the case that there is only one possible reading for an
or-OM-sentence, which in the case of (8) is represented by (9).
[1] In §2.4 I discuss ways in which this phenomenon may be further delimited. A solution to this problem is not crucial, however, to the present discussion.
but
(12) a. *One more can of beer and I had left.
b. *If you had drunk one more can of beer I had left.
(13) a. *One more can of beer and I will have been leaving.
b. *If you had drunk one more can of beer I will have been leaving.
The acceptable pairs of sentences correspond not only in their acceptability
judgments, but also in their interpretation. For example, (11a) is interpretable
only as a counterfactual: we know that whatever the event is which involves
the NP one more can of beer, it did not take place. (10a), like (10b), is
ambiguous. The latter can be paraphrased by either of the following two
sentences.
(14) a. Whenever you drink one more can of beer I leave.
b. If you drink one more can of beer (than you have already) I will
leave.
The same information can be deduced from (10a): whatever the event involv-
ing the NP is, either (a) I always leave when it happens, or (b) I’m going to
leave if it happens now.
While these observations might seem to be more than abundantly obvious,
it is quite important, I think, to establish clearly how strict the correlation
between conditionals and consequentials is. While it appears to be unavoid-
able that and-OM-sentences and if-then conditionals should be derived from
the same source, considering evidence such as the preceding, nevertheless I do
not believe that the precise nature of the relationship between them is as clear
as it might seem on the surface. I will show in the course of this paper that it is
inappropriate to analyze this relationship in transformational terms.
[2] It might be argued that the deep structures of sentences with or are significantly different from those with and. If this were true then it would not be possible to appeal to similarity of structure up to the nature of the conjunction. I see no evidence to suggest, however, that sentences with and and with or are not all derived from deep structures displaying coordinate structure.
[b] I make much the same argument for not deriving idiosyncratic constructions (‘syntactic nuts’) from abstract syntactic structures in Culicover (1999) and Culicover (2013).
(18) Two things happened which were not necessarily related: John came in
and Bill jumped out of the window.
Perhaps a better example of the juxtapositional reading, where there is no
likely confusion between it and the other two, is the following.
(19) Last year it rained one foot and it snowed three feet.
The three readings of (15) may be summarized by (20).
(20) John came in and {then / therefore / also} Bill jumped out the window.
I expect that there will be no doubt that (15) may have these readings. What
is more interesting is that two of these three readings correspond to readings
which we established for the and-OM-sentences, while the third is closely
related to one of them. Compare (4) and (20), for example.
Another case for which the same three readings which are illustrated in (20)
are possible is the following.
(21) Sit down in that chair and I’ll bake you a dumpling.
The consequential reading of this sentence is paraphrased by (22).
(22) If you sit down in that chair I’ll bake you a dumpling.
The sequential reading does not involve any causal relationship between the
request and the activity.
(23) Sit down in that chair, and (then (while you are sitting)) I’ll bake you a
dumpling.
The juxtapositional reading is difficult to get for this sentence: it is most
closely given by reversing the order of the conjuncts in (21).
(24) I’ll bake you a dumpling, and sit down in that chair.
In general it sounds strange to conjoin an imperative with a declarative,
particularly if there is no particular connection between the two, aside from
their being uttered in the same sentence. However, examples are of varying
acceptability depending on the context in which they are or may be used. E.g.,
(25) Albert is coming for dinner, and don’t forget to send out the laundry.
Therefore it is possible to say that the conjunction and in principle has
three readings.[3]
[3] It may also be possible to find cases of constituent conjunction which have the three readings referred to. For example,
between the two events. The word ‘link’ here is used in a rather abstract way,
meaning a temporal relationship, a cause–effect relationship, or the relation-
ship expressed by the incongruence reading, which we might refer to as a
‘mental’ relationship.
2.2.4 Or-OM-sentences
We remarked in §2.2.1 that or-OM-sentences could have only one reading. If
we consider or at the level at which we have been considering and, this fact
becomes surprising, since there are a number of logically possible interpret-
ations for sentences with or. The question is whether the set of meanings of a
sentence of the form S or S is coextensive with the set of logical equivalences of
the sentence. Consider the following example.
(27) John will close the window or Bill will freeze.
The point which I would like to make here[4] is that the meaning of the
sentence is more than the logical structure of the sentence. A simple demon-
stration of this is the result of reversing the order of the clauses in (27). The
truth values remain the same, but the meanings change decidedly.
(28) Bill will freeze or John will close the window.
Another logical equivalent is (29)—
(29) If John closes the window Bill won’t freeze and if John doesn’t close the
window, Bill will freeze.
—and so is (30),
(30) If Bill freezes then John won’t close the window and if Bill doesn’t freeze
then John will close the window.
What is going on, evidently, is that the logical properties of implication are
not the same as the properties of conditionals as they are conventionally used.
It is correct to say, I think, that the meaning of or is more than its truth table
would suggest: there is some sense of relatedness between the events described
by the clauses. Furthermore, this relationship is such that the meaning of the
sentence changes when the order of clauses is reversed.
With this in mind it is easy to see why sentences like (31) and (32) mean
what they do.
(31) Stay home or Bill will leave.
[4] This is certainly not the first time that this point has been made.
[5] Notice that it is not clear how one would go about determining which if-then should be chosen to underlie these sentences, since certainly a number of logical relationships may be said to apply between the clauses. From (i) we may infer (ii) or (iii), for example.
(i) Give me a beer or I’ll call a cop.
(ii) If you give me a beer I won’t call a cop.
(iii) If you don’t give me a beer I’ll call a cop.
[6] I have stated this assumption in the most general way possible, in order not to prejudice the discussion by creating particular analyses at this point.
A similar relationship can be seen to hold between (37) and (36) at some level
of representation.
(37) Drink any more beer and I’m leaving.
It is immaterial for this discussion at present whether or not (35) and (37) are
derived from the same deep structure as (36). Whatever the level is at which
we wish to account for the presence of any, we are assuming that these three
sentences are identical at the level with respect to the rule in question.
If we consider now (31) and (33) we see that (33) cannot be a representation
for (31) at any level, since if it were we would expect to find the same behavior
as we do in the case of (35)–(37). We would expect that any would be
acceptable in an or-OM-sentence if (33) was a representation of (31), because
at the level of (33) there is no formal difference between it and, say, (36). In
particular, we would expect to relate (38) and (39).
(38) If you don’t drink any more beer I’m leaving.
(39) *Any more beer or I’m leaving.
On the basis of this we must conclude that (39) does not contain if or any
element which corresponds to it at the level at which the acceptability of any is
determined.
It would seem to be the case, in fact, that at this level the or-OM-sentence
shares more of the characteristics of imperatives, and not conditionals. For
example, we can insert please into an or-OM-sentence or a sentence like (37),
but not into an and-OM-sentence, or an if-then conditional.
(40) One more beer, please, {or / *and} I’m leaving.
(41) Give me one more beer, please, {or / *and} I’m leaving.[7]
(42) *If you (don’t) give me one more beer, please, then I’m leaving.
Another interesting point is that while a conditional and an and-OM-
sentence may have truth value, an or-OM-sentence cannot. Hence it seems
[7] Further evidence that sentences like (41) with or are underlyingly imperatives is that they can take tags, while the sentences with and cannot.
(i) Give me some more beer, will you, {or / *and} I’m leaving.
Sentence (i) with and is acceptable if it is assigned the juxtapositional reading, but not the consequential. Of interest in this regard is whether (ii) is acceptable.
(ii) Some more beer, will you, or I’m leaving.
I myself find (ii) to be understandable, but marginal in grammaticality. It is quite sobering to contemplate what the consequences for the grammar of English would be if (ii) were to be judged grammatical; however, this factor has played no role in my judgment.
(44) If one more can of beer {hits me / explodes / rolls in front of me / hits you / hits anyone / comes out of the darkness / ...} I’m leaving.
[8] A simple example of this, which was pointed out to me by W. C. Watt, is illustrated by the following sentence.
(i) John was kicked in the head.
(i) does not specify who or what kicked John in the head. In order for John to have been kicked in the head, a deep subject of kick must exist; we say that it is indeterminate. However, although we do not know what or who did the kicking, we do know that it cannot be something which lacks the capacity to kick. Hence the indeterminacy of the representation of the deep subject of kick is restricted by the context.
[9] But a sentence like (i)
(i) One more warm can of beer and I’m leaving.
is acceptable, since it suggests some event occurring which involves the warm can of beer.
[10] The question marks before examples (49) and (50) indicate the infelicity of these sentences as paraphrases of (1). The question marks before examples (52)–(54) below indicate their infelicity as paraphrases of (51).
(49) ?If John tells me that Mary wants me to buy her one more can of beer,
I’m leaving.
(50) ?If the label of one more can of beer comes off, I’m leaving.
Intuitively it seems that in these examples there is no particular connection
between the can or the beer and my leaving; what is more important is John
telling me in the first case, and the label coming off in the second. Such an
intuition is much stronger when one more is not mentioned in the NP at all.
(51) Two beers and I’m leaving.
(52) ?If John tells me again that Mary wants me to pay for her two beers then
I’m leaving.
(53) ?If John begins to tell that old story about how he was so drunk that he
couldn’t even drink two beers I’m leaving.
(54) ?If a man comes in carrying two beers I’m leaving.
As a first approximation, then, we might say that the NP is understood to
be either the subject or the object of the sentences which may be used to
paraphrase the event involving the NP, which we represent as ‘E(NP)’. How-
ever, we can see immediately that this is at best a weak substitute for the
notion ‘intrinsic connection’ between the NP and E(NP). We can devise
examples in which the NP is the surface subject, and those in which it is
the deep subject, and in neither case can we conclude that such examples are
paraphrases of the corresponding OM-sentence. This indicates that what is
going on is independent of either deep or surface grammatical relations.
In the following examples, for instance, it is clear that the failure of the
(b)-sentences to be paraphrases of the (a)-sentences cannot be attributed to
the grammatical role of the NP without incorrectly denying the existence of
the paraphrase relationship in countless other cases.
(55) a. One more aging film star and I will stop reading the newspapers.
b. If one more aging film star is claimed by the gossip columnists to
have been reported by the Hollywood crowd to be dating a young
starlet I will stop reading the newspapers.
(56) a. One more beer company and I will stop watching TV.
b. If one more beer company announces that its product is the best in
America I will stop watching TV.
Presumably the relationship between the understood role of the NP and the
acceptability of the sentence is not a grammatical one, but a semantic one.
Unfortunately it is not at all clear at this point how this relationship should or
28 explaining syntax
c Arguments along essentially the same lines are made against deletion analyses of Bare
Argument Ellipsis and sluicing in Culicover and Jackendoff (2005; 2012).
11 Naturally if we can show that there are such empirical objections to TDEL, this will serve
to justify further the condition of recoverability of deletions.
d Here I was arguing implicitly against the Generative Semantics program, which sought to
encode all aspects of meaning syntactically (in deep structure), and then derive the surface
structure through transformations.
12 This argument will assume that the auxiliaries of the antecedent and the consequent are fully specified in deep structure. If they are specified in terms of a sequence of tenses rule then other difficulties arise, which will be considered in §2.4.5.
e Jackendoff and I (e.g. in Culicover and Jackendoff 2012) argue that this is one of the reasons
that deletion accounts of elliptical constructions (as in Merchant 2001) appear at first glance to
be plausible.
That is, given a situation in which the unspecified material is understood from
the context, the less fully specified sentence will serve to convey the same infor-
mation as the more fully specified sentence.
Notice, incidentally, that this suggests why numerical quantifiers, and
especially one more and another, are so natural in OM-sentences. The use of
one more presupposes a content which is completely known to the speaker
and the hearer. That is, whatever E(NP) is, it has happened before, everyone is
aware of it, and it might happen again. This is brought out clearly by
sentences in which one more does not appear, which are mere statements of
fact, and not threats.
(60) a. If you bring {the/more} beer, I’ll bring the wine.
     b. {*The/*More} beer and I’ll bring the wine.
When the is used there is the implication that no beer has yet been brought.
When more is used there is the implication that some beer has already been
brought, but the consequence I’ll bring the wine, if it does not suggest a threat,
does not involve the implication that the consequence will follow from the
bringing of the beer.
It becomes clear now that the role of the determiner of the NP is precisely
to single out the next occurrence of the event as the deciding factor in the
cause-and-effect relationship, by contrasting it with all the other previous
events of a similar nature, which by implication are characterized by the fact
that they did not cause the consequence to take place.
I think it fair to say that such a criterion as whether the consequence
follows from the antecedent in the way that I have described it here has no
business as a constraint on a transformation, which would be necessary if we
wanted to maintain the derivation of these sentences by TDEL.
13 I do not mean to imply here that all obligatory transformations are ad hoc. I am merely
claiming that in the absence of clear, syntactic motivation any such transformation is ad hoc. The
question which arises here, therefore, is whether there is any syntactic evidence for deriving
OM-sentences from a deep structure such as (61). [NOTE: Subsequent developments in
monostratal approaches to syntax such as HPSG and LFG, as well as Simpler Syntax, have
gone all the way with this argument and rule out not only obligatory transformations, but all
transformations, based largely on the absence of clear, syntactic motivation. Crucially, system-
atic synonymy such as is found in the passive is not a sufficient criterion for assigning the same
underlying structure to different constructions—see Culicover and Jackendoff (2005: chs 1–3).]
Because of such cases it appears reasonable to argue that the syntactic rules
capture generalizations about the set of possible well-formed surface struc-
tures of language. From this it follows that any principle that requires us to
represent the interpretation explicitly at the deep structure level will result in
the loss of certain generalizations about what the set of well-formed surface
structures consists of. Hence we may conclude that while consequential OM-
sentences may be derived from an underlying if-then, they need not in
principle be so derived.
A third point we must consider, then, is whether the derivation of conse-
quential OM-sentences from underlying if-then can be carried out in a
manner which does not do violence to our previously accepted notions of
what may constitute a possible derivation. It can be shown, in fact, that a
transformation could only derive an OM-sentence from a deep structure like
(61) if that deep structure met certain semantic conditions. To constrain a
transformation just in case the deep structure has a certain interpretation
would be a rather unprecedented step for us to take, particularly if less radical
alternatives are available. Let us consider what well-formedness conditions
seem to be necessary.
In previous discussion I have spoken of “the event described by S.” This
choice of words was motivated by a reluctance to introduce complications in
terminology which could only be resolved at a much later point. I will now
show that it is not strictly the case that the S must describe an event. Let us
return first of all to sentence (1), repeated below for convenience.
(1) One more can of beer and I’m leaving.
In (68) below I summarize the potential readings of this construction,
according to the observation made previously.
(68) a. If . . . NP . . . , then S.
     b. After . . . NP . . . , then S.
     c. . . . NP . . . , {and/but} (surprisingly) S!
The question that we will concern ourselves with now is whether the S can
describe a state in any of these interpretations. It is clear that we can have a
sentence which describes a state that comes about as the result of an event,
such as the state of knowing something.
(69) One more can of beer and Bill will know the truth about you.14
14 One might say in this context that know may be used as a metaphor for learn. However one
wishes to interpret (69), it is no accident that the relationship expressed here exists between
these two verbs, and not learn and some other verb not related.
15 For an introduction to the notions of ‘felicity’, ‘infelicity’, and ‘felicity conditions’, see
Austin (1962).
Since (78a) is ungrammatical, this would suggest that (75) should function as
a well-formedness condition on deep structures. Such a suggestion is difficult
to accept, however, since deep structures are formal syntactic objects and (75)
is stated in semantic terms. Notice, more importantly, that it is not sufficient
to restate (75) in terms of syntactic structure, since there is no formulation
which would rule out the deep structure corresponding to (78c), but not
the deep structures underlying the well-formed OM-sentences which we
have been considering. We may conclude, therefore, that consequential
OM-sentences are not derived from deep structures containing if and then.
(81) [S [S NP [VP V NP]] and [S I’m leaving]]
(82) [S [S NP VP] and [S I’m leaving]]
(83) [S [S [S NP VP] V [S NP [VP V NP]]] and [S I’m leaving]]
16 See e.g. Jackendoff (1972: ch. 3).
(86) [S NP and S]
17 The notation ‘IMP(erative)’ refers to the form of the left-hand clause, and not to its
interpretation. For an extensive discussion of this duality, see Culicover (1971: ch. 1).
18 Generally the S and S structure does not have a conditional interpretation, but a cause-
and-effect interpretation. The former is a special case of the latter.
19 In representing the interpretive process I use a curved arrow to represent an interpretive
rule, a straight double arrow to represent a transformational mapping, and a single straight
arrow to represent a rule which operates only in the semantic component.
Given that the sentence has the conditional interpretation, represented by the
ANTE-CONS pair, we may apply the sequence of tense rule, which we may
plausibly consider to be an interpretive rule defined on ANTE-CONS inter-
pretations. If would have appears in the consequent, it signifies that CONS is
future irrealis, and it requires that past perfect irrealis appear in the
ANTE. What this means in terms of the interpretation is that the ANTE
must be unrealized and completely in the past with respect to the temporal
frame of reference defined by the consequent.
(90) a. would have → future irrealis
     b. CONS: future irrealis → ANTE: past perfect20 irrealis
     c. had → past perfect irrealis
Rules (90a) and (90c) informally represent the semantic interpretation of
the auxiliaries would have and had, respectively. Rule (90b) represents the
sequence of tenses relation defined on a conditional whose CONS is future
irrealis, i.e. on a counterfactual conditional. It is this rule which is of most
concern to us.
Let us consider the interpretive process as it applies to a sentence like (91).
(91) *If you go I would have gone.
By rule (89) we get (92).
(92) CONS: ‘I would have gone’
ANTE: ‘you go’
20 I use the term ‘past perfect’ rather casually here. It is intended to represent the notion
‘completely in the past’. I do not wish to suggest that had is the only auxiliary which may appear
in the antecedent of a counterfactual conditional. (i) shows that this is simply false.
(i) If John could have fixed the faucet we wouldn’t have been flooded out of the house last night.
ANTE: future irrealis: ‘you go’
Figure 2.1 Schematic comparison of the transformational analysis (an interpretation I related by R1 to a deep structure D1, whose surface structures S1 and S2 are related by the transformation T) with the interpretative analysis (two structures, D1–S1 and D2–S2, each related to I by its own rule, R1 and R2)
ad hoc, deriving OM-sentences from if-then structures only for the purpose
of relating them semantically. When one takes into account the further
objections to such a rule which were discussed in §2.4.2, the case for it is
very weak indeed.
On the other hand, we must also ask whether there is independent motiv-
ation for postulating a deep structure D, in this case that characterized by (85)
and (86). Clearly a conjoined structure in English is generally well-motivated
for all kinds of sentences, and it should be noted that the conjunction and
found in OM-sentences is not a lexical item that is characteristic of OM-
sentences, but rather one of the set of coordinating conjunctions in the
language. It can be plausibly argued that the phrase structure rule in (85)
can be generalized to (97), furthermore, on the evidence that and is not the
only conjunction which can function in this position.
(97) S ➝ NP CONJ S
and
(98) Twelve cases of beer or I’m leaving.
but
Notice also that the form of the OM-sentence is basically like that of a typical
conjoined structure; i.e. the conjunction is located between the conjoined
constituents. This is precisely what we would expect if the OM-sentences were
a sub-class of the class of conjoined structures, and not something quite
unique.
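The claim that OM-sentences fall out of an ordinary phrase structure rule rather than a construction-specific one can be illustrated with a toy grammar. The rule encoding and sample lexicon below are my own hypothetical stand-ins, not the article's formalism:

```python
# Toy sketch of the rule S -> NP CONJ S, with CONJ ranging over the
# language's coordinating conjunctions. "S2" and the lexicon entries are
# hypothetical placeholders for full phrases.
GRAMMAR = {
    "S": [["NP", "CONJ", "S2"]],
    "CONJ": [["and"], ["or"], ["but"]],
}

def expand(symbol, lexicon):
    """Expand a nonterminal top-down; terminals pass through unchanged."""
    if symbol in lexicon:            # preterminal standing in for a phrase
        return [lexicon[symbol]]
    if symbol not in GRAMMAR:        # terminal (e.g. a conjunction)
        return [symbol]
    out = []
    for sym in GRAMMAR[symbol][0]:   # first production, for illustration
        out.extend(expand(sym, lexicon))
    return out

lexicon = {"NP": "one more can of beer", "S2": "I'm leaving"}
print(" ".join(expand("S", lexicon)))   # -> one more can of beer and I'm leaving
```

Nothing in the rule itself singles out and: swapping in the or or but productions yields the variants noted in (97)–(98) with no change to the grammar, which is the sense in which OM-sentences are just a sub-class of conjoined structures.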
Figure 2.2 Schematic comparison of the transformational analysis (D–D′ related by T to D–S, with rules Rd and Rd′ yielding the interpretations Id–Id′) with the interpretative analysis (D–S with rules Rd and Rs, and a semantic mapping M such that Id–[F Is] = Id–Id′)
(101) if . . . Past ({M/be}) . . . , then . . . Past ({M/be}) . . .
f The analysis of the English verbal sequence assumed here is that of Chomsky (1957). The
particular formalization of the morphology is not critical, and the argument would go through
equally well if the analysis was updated to a more contemporary one expressed in terms of
selection by each auxiliary verb of a VP with a particular morphological feature.
(102) if . . . Past (M) have +en (be+ing) . . . , then . . . Past will have+en
(be+ing) . . .
Such a transformation would capture the generalization that in a conditional
the antecedent and the consequent must be of the same tense, e.g.,
(103) If you buy a rubber duck then
      a. I’ll wear a clown suit.
      b. I’m wearing a clown suit.
      c. *I would wear a clown suit.
      d. *I was wearing a clown suit.
One of the problems in defining T is that (103c) is unacceptable only as a strict
conditional. It can be seen that it is perfectly acceptable if interpreted as an
epistemic conditional. In order to maintain the generalization of sequence of
tenses in conditionals, we would be required to hypothesize a different deep
structure for the epistemic interpretation.
We must assume for the sake of argument, therefore, that we have a means
of distinguishing the two deep structures. It will turn out that it is not
sufficient for the consequent to be only present tense; it must also be future
time. If we consider (103b) we find that while it is ambiguous, it only has the
conditional interpretation if it is understood as being future time, while it has
the epistemic interpretation if it is understood as being present time. Fur-
thermore, the examples in (104), which do not display sequence of tenses, are
also present tense, yet cannot be conditionals.
(104) a. *If you leave early then I have left before you.
b. If you are Napoleon then I am the King of France.
c. If you have blonde hair then I must have dreamt you were a brunette.
We are still assuming that the deep structure for a conditional is identifiable as
such. What happens when we encounter a deep structure which can be an if-
then, but not a conditional, by virtue of the fact that present tense in this
structure is not interpretable as future time? We must identify the representation
of this structure as semantically inconsistent, because the consequent of a
conditional must be future time, yet cannot be future time in the case in
question. However, a semantic rule that will perform this identification would
eliminate the need for a sequence of tenses transformation, since the rule would
also identify as non-conditionals those cases in which the consequent is past
tense, and for the very same reason, i.e. past tense is not future time. Conse-
quently it becomes clear that a sequence of tenses transformation is a spurious
generalization, the true generalization being a sequence of time semantic rule.21
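The sequence-of-time semantic rule proposed here can be sketched as a simple check. The tense-to-time table below is my own assumed encoding (not the article's) of the ambiguity of present tense described above:

```python
# Toy sketch of a "sequence of time" semantic rule: a strict conditional
# requires a consequent that can be construed as future time. The table of
# possible time construals per tense is an assumption for illustration.
TIME_VALUES = {
    "present": {"present", "future"},   # present tense is ambiguous for time
    "past": {"past"},
    "perfect": {"past"},                # 'completely in the past'
}

def is_strict_conditional(consequent_tense):
    """True iff the consequent can be understood as future time."""
    return "future" in TIME_VALUES.get(consequent_tense, set())
```

On this encoding, a present-tense consequent qualifies only on its future-time construal, while past-tense and perfect consequents are excluded for the very same reason, which is the point of the argument: one semantic rule covers both cases that a sequence-of-tenses transformation would have to stipulate separately.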
21 An alternative approach would be to assume not only that the deep structures of epistemic
conditionals and strict conditionals are different, but also that the present/future time
(105) S1
NP1 and S2
NP2 and S3
NP3 and S4
NP4 and S5
ambiguity of present tense is relatable to a deep structure difference. Thus we might assume that
Time, and not Tense, is present in deep structure, as Present, Past, or Future.
Such an assumption would enable us to maintain sequence of tenses as a syntactic general-
ization capturable by a transformation, but it would prevent us from capturing the syntactic
generalizations which are expressed in the standard expansion of the Aux given in (99). In view
of this, and in light of the discussion above, it can be seen that failure to account for sequence of
tenses in the form of a semantic rule results in the failure to capture generalizations at one place
or another in the grammar.
22 Cf. Chomsky and Halle (1968), Chomsky (1971), and Bresnan (1971).
(115) a. *2 3 1
b. *1 3 2
c. *3 1 2
The only stress patterns which are acceptable for the incongruence reading
here involve higher than normal stress levels. To each of these stress patterns
corresponds a different paraphrase involving different presuppositions. Com-
pare (114a), (114b), and (114c), with (116a), (116b), and (116c) respectively.
(116) a. There are two thousand cases of beer here, and of all things, instead
of staying I’m going home.
b. Everybody is going to be here with the thousand cases of beer
except me, who is going home.
c. Instead of going to where the beer is I’m going home.
In the first reading, what is being emphasized is that I am doing something
which I normally would not do if there were a thousand cases of beer around,
namely go home. In the second reading, what is being emphasized is that of
all the people who might normally go home when there are a thousand cases
of beer around, I am not one of them. In the third reading, what is being
(117) a. John brought two thousand cases of beer and I’m going home.
2+E 3 1+E
b. There are two thousand cases of beer at the party to be divided
up among all the people there, and I’m going home.
2+E 3 1
c. There are two thousand cases of beer at John’s house,
and I’m going home.
2 3 1+E
The incompatibility may exist at a number of levels: behavior contrary to
desirable behavior, behavior contrary to required behavior, behavior contrary
to normal behavior, etc. In any case, the incompatibility arises as a result of
some deviation from expected behavior in a certain context. An example will
illustrate the range of variation possible.
(118) One thousand cases of beer, and John is going home.
If John’s job is to load beer, then his behavior is contrary to what is required of
him. If he likes to drink beer, then his behavior is contrary to what would be
desirable for him to do. If it is considered a normal human trait to drink beer
when it is available, then his behavior is contrary to what would be normal for
him. If the event described by the S involves a natural phenomenon, then the
abnormality is more strongly felt, e.g.,
(119) Three days of sunshine and this flower hasn’t bloomed.
It turns out that the incongruence reading is also possible if there is no emphatic
stress, but if it is explicitly stated that there is something strange about E(S).
(120) A thousand cases of beer, and John’s going home, strangely enough.
The strangely enough, without emphatic stress in the sentence, refers to the
entire activity, and not simply to who is going, or where John is going.
A sentence which contains still, which carries with it the notion of exceptional
behavior, is also acceptable even with normal stress.
23 In the subsequent discussion I assume knowledge of the standard analysis of questions and
negation found e.g. in Klima (1964) or Culicover (1971).
Figure 2.3 Schematic comparison of the transformational analysis (neg related by T to WH, with rules Rwh and R1 yielding the interpretations ‘WH’ and ‘neg’) with the interpretative analysis (WH and neg each interpreted directly, by Rwh, R2, and R1)
Other things being equal, I would be inclined to say that the interpretative
analysis is to be preferred over the transformational. There is no evidence that
T is independently motivated, and the grammar must be able to generate
questions in any case.
There are two kinds of evidence which would suggest that the transform-
ational analysis was to be preferred. First of all, if we found a case in which a
structure containing formal negation but lacking a negative interpretation had
to be related transformationally to a structure having the form of a question,
this would constitute an independent motivation for T. Second of all, if we
found a structure which was fundamentally a question, and had the interpret-
ation of negation, but which displayed distributional characteristics possessed
by negation and not by questions, then this would argue for the derivation of
rhetorical questions from negation by applying T after the application of the
well-motivated distributional rules pertaining to negation.24
Let us, therefore, consider these two hypothetical cases. It turns out that a
sentence in which negation is present, but which lacks a negative connotation,
already has the form of a question, e.g.
(126) Why don’t you sit down next to me.
(127) Aren’t you the guy who wants to marry my daughter?
(128) Haven’t we had fun!
Since all these examples require the presence of an element which causes
inversion, it is not clear what would happen if negation in each case was
replaced by another element which also causes inversion, i.e. WH. This
24 Cf. Klima (1964) and Jackendoff (1969).
(131) John does {not/*whether/*if} like peanut butter.
(132) John is doing {nothing/*what} very interesting.
The first kind of evidence discussed constitutes an argument against a trans-
formation T. The second kind of evidence, while consistent with a transform-
ation, is predicted by the interpretative approach, which insists that a
rhetorical question can be nothing more than a question with a special
interpretation. Thus the existence of rhetorical OM-sentences argues in
favor of the interpretative approach which we have established in general
for the derivation of OM-sentences.
25 Even if (130) was grammatical it would not have the interpretation predicted by the
transformation T, as a consideration of (129) will show. (130) would have to be derived
from (i)—
(i) neg someone has learned something
—which still leaves open the question of why the surface structures (129b) and (129c) do not
have the rhetorical interpretation when derived from (129a).
2.7 Summary
In the course of this discussion arguments have been made in favor of a
number of claims:
(a) The interpretation of an and-OM-sentence is systematically
unspecifiable.
(b) Hence we would be incorrect in deriving the surface NP from an
underlying S.
(c) No syntactic generalizations would be captured by deriving conse-
quential OM-sentences from underlying if-then conditionals. Further-
more, doing so would require a semantic condition on the well-
formedness of certain deep syntactic structures.
(d) Hence we require the phrase structure rule S ➝ NP CONJ S.
(e) Properly stated rules of semantic interpretation can adequately cap-
ture the similarity between consequential and-OM-sentences and if-
then conditionals.
These conclusions cast strong doubt, of course, on the validity of any linguis-
tic theory, such as generative semantics, which in effect requires that para-
phrases be identical in deep structure to the extent that they are identical in
interpretation.
Points (c) and (d) above in particular show, assuming that they are valid,
that there exist constructions which possess different deep structures but
which share a significant portion of their interpretations. I suspect that such
a situation will prove to be quite common, and furthermore that it may even
turn out to be the most natural state of affairs with respect to the relationship
between the totality of syntactic structures of a language and the set of
possible interpretations.26
26 The analysis in this article thus constitutes one of the earliest published entries in the brief
for construction grammar, later developed explicitly by Fillmore, Kay, Goldberg, and others.
3
Remarks on Chapter 3
My intent in writing this article was to capture the fact that languages
show a significant degree of constructional coherence that cannot be reduced
to meaning. In part this was an argument for the autonomy of syntax, an issue
that was hotly debated in generative grammar in the 1960s and early 1970s. In
part it is an early argument for constructional inheritance (which I called
‘coherence’), which came of age some 20 to 30 years later with the emergence
of Construction Grammar (Fillmore et al. 1988; Goldberg 1995; Kay and
Fillmore 1999; Fillmore 1999; Kay 2002a; Sag 1997, among others).
The article focuses on English tags, such as those found in tag questions.
I argue that tags are characteristic of English, and that they are found in a set
of formally similar but distinct constructions. These constructions cannot be
reduced to a single general construction because each special case has its own
particular function and meaning. I used evidence from rule ordering (which
was a prominent device at the time) to argue that there can be no uniform
syntactic derivation of all of the different tags. I argue that a grammar in
which the same structure is used in distinct constructions is more ‘natural’
than a grammar in which the constructions use unrelated structures.
I propose a measure to capture this naturalness even when the various
constructions cannot be collapsed into a single construction.
* [This chapter appeared originally in Journal of Linguistics 9: 35–51 (1973). It is reprinted here
by permission of Cambridge University Press.]
(6) I
A
II
B
III
Let us consider, therefore, what these rules A and B might be, and how the set
of rules in question is ordered.
a At the time that this article was written, rule ordering was considered to be an empirically
significant component of grammatical descriptions. On the view current at the time, languages
could differ by having the same rules with different orderings, which would yield different patterns
of grammaticality. Crucially, two rules could be collapsed into a single rule only if they were
ordered adjacent to one another. While the argument in this article demonstrates that the rules for
English tags are not ordered adjacently and so cannot be collapsed, a contemporary interpretation
of the result is that they are distinct constructions that cannot be characterized in uniform terms.
the coherence of syntactic descriptions 55
3.2 Orderings
It can be argued, first of all, that the well-known rule of neg-placement (Klima
1964; Jackendoff 1969), which we will abbreviate here as NEGP, must be
ordered after TQF in order for certain generalizations to be captured. Of
particular importance is the fact that at most one neg can appear in a given tag
question:
(7) Harry fell first, didn’t he?
(8) Harry didn’t fall first, did he?
(9) Harry fell first, did he?
(10) *Harry didn’t fall first, didn’t he?
It is possible to capture this generalization quite elegantly by ordering TQF
before NEGP.
To see this, observe that the essential function of TQF is to duplicate the
underlying aux to the right of the sentence. The function of NEGP is to move
neg into the aux of a sentence. If we assume that the sentence has at most one
neg in deep structure, and if we accept the preceding characterization of TQF,
then the ordering of TQF before NEGP accounts automatically for the
distribution of neg in tag questions. Since TQF in effect creates a new aux, it
follows that no special statement is necessary to represent the fact that neg
may appear in either aux, but not in both. This fact is not captured in a
particularly revealing way by Klima (1964), who has the ordering
NEGP
TQF,
with insertion of neg into the tag only if the aux of the underlying structure
does not also contain neg.
Of course, it is necessary that we constrain NEGP in such a way that it does
not insert the neg of one S into another S, as could conceivably occur in the
case of conjoined sentences in which neg was attached to the leftmost S. Since
there is no comparable syntactic generalization to be captured by permitting
neg to be inserted anywhere in the entire conjoined structure, we must restrict
the scope of NEGP to a single S. Whatever the convention to do this might be,
we must then give the tagged sentence a structure such that NEGP will not be
constrained from inserting neg into the tag. What this means, basically, is that
the structure of a tagged sentence is not that of a conjoined sentence, which is
not a particularly surprising claim.
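The ordering argument can be made concrete with a small sketch. The data structures and rule implementations below are my own toy encoding, not the article's formalism: because TQF copies the aux before neg has been placed, a derivation with a single deep-structure neg can surface it in the main aux or in the tag aux, but never in both.

```python
# Toy model of the TQF-before-NEGP ordering. A "sentence" is a dict with an
# aux (list of morphemes) and an optional deep-structure neg flag; the names
# and representations are assumptions for illustration only.

def tqf(sentence):
    """Tag Question Formation: duplicate the underlying aux to the right."""
    s = dict(sentence)
    s["tag_aux"] = list(s["aux"])      # neg has not yet been placed
    return s

def negp(sentence, target="main"):
    """Neg-placement: move the single deep-structure neg into one aux."""
    s = dict(sentence)
    if s.pop("deep_neg", False):
        key = "aux" if target == "main" else "tag_aux"
        s[key] = s[key] + ["neg"]
    return s

def derive(deep, target="main"):
    """Apply TQF before NEGP, the ordering argued for in the text."""
    return negp(tqf(deep), target)

def neg_count(s):
    return s["aux"].count("neg") + s["tag_aux"].count("neg")
```

Since NEGP has only one neg to place and TQF has already created the second aux, every derivation yields zero or one surface neg, matching (7)–(9), and the doubly negated (10) is underivable without any special statement.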
(b) NP AUX(+neg) X NP AUX(+neg)
On the basis of the preceding discussion it seems correct to order TQF before
NEGP:
TQF
NEGP.
For the reader who has doubts about the validity of an analysis such as the
above, particularly because of the fact that (a) and (b) are from the same deep
structure, yet differ in meaning, I have a few (hopefully) soothing words.
While it is true that most transformational grammarians have operated
during the past seven years or so with the criterion that sameness of meaning
should be represented by sameness of deep structure, Jackendoff (1972)
pointed out that this criterion was nothing more than a version of the
Katz–Postal Hypothesis (cf. Katz and Postal 1964). The continuing validity
of a hypothesis depends crucially on its applicability to a wide variety of cases.
In the situation under discussion, we have a putative syntactic generalization
which cannot be captured if the sameness of meaning criterion is applied
rigorously. Therefore we can only conclude either (a) that the putative
syntactic generalization is a spurious one, or (b) that there is at least one
exception to the criterion. Since I cannot accept the first conclusion, I am
forced to accept the second. Jackendoff (1972) demonstrates that the class of
exceptions is quite numerous, so that the hypothesis can be seriously ques-
tioned on general grounds, and not merely on the basis of isolated incidents
such as the one discussed here.
A consequence of this view is that I do not assume the existence of deep
structure morphemes whose only function is a semantic one. For example, in
the analysis below in §3.3 I state the inversion transformation in terms of WH
only, and do not include Q (as do Katz and Postal 1964), since the existence of
Q is not required in order for us to capture the relevant syntactic
generalizations.
3.3 Neg-contraction
The other rule which we must bring into this analysis is neg-contraction
(NC). It is important first of all to determine what the precise statement of
this rule is. In order to do this we must also consider the rule of Inversion,
which I state below:
(12) Inversion:
WH NP TENSE(+[+v]) X
1 2 3 4 ⇒ 1 3 2 4
where [+v] = Modal, have or be
The application of this rule, which is also a well-known one, will result in
surface strings like (13) and (14):
(13) What did you give to Hermann?
(14) Did you give a hammer to Mildred?
It turns out that it is possible to capture an interesting generalization by a
judicious ordering of Inversion and NC, provided that NC is stated correctly.
When NC applies before Inversion, the contracted negation is inverted along
with [+v]. E.g.:
(15) Didn’t you like the concert?
(16) Did you not like the concert?
(17) *Did not you like the concert?
(18) *Did you n’t like the concert?
From (17) it can also be seen that if NC does not apply, then Inversion cannot
move neg to the front of the sentence. It appears logical, therefore, that for the
simplest expression of Inversion we should state NC so that it attaches n’t to
something that will later invert.
One candidate for this would be [+v]. However, even in sentences with no
[+v], NC applies to neg and then Inversion inverts the resulting structure.
Note that the rule ordering must be:
NC
Inversion.
It follows that NC cannot attach n’t to do (from do-Support) if there is no
[+v], because do-Support must follow Inversion. This is shown by the
following example, where negation is absent:
(19) Do you like Crunchy Fazola?
Hence NC must attach n’t to TENSE, which is always present. The transform-
ational mapping NC will look something like (20):
(20) {Past/Pres} ⇒ {Past/Pres} + n’t
It can be seen that a consequence of this is that the output of Affix-hopping
(Chomsky 1957: §5.3) in case a [+v] is present will be [+v] + {Past/Pres} + n’t, which
is the correct ordering of morphemes in surface structure. By ‘correct’, I mean
that, given that can + Past ⇒ could, no extra statement is required to predict
the surface form of can + Past + n’t. The surface form of can + n’t + Past,
however, is not nearly as predictable. If n’t was attached to [+v], giving, for
example,
MODAL
can n’t
then Affix-hopping would have as output the string
Past
*can + n’t + .
Pres
This is interesting in view of the independent argument against attaching n’t
to [+v] mentioned above.
If n’t was not attached to either TENSE or to [+v], then we would be
obliged to mention the optional presence of n’t in the structural condition of
Inversion, i.e.
. . . NP TENSE (+[+v]) (+n’t) . . . ,
which misses the generalization concerning contraction and inversion.
It can now be shown that NC precedes ITF. ITF is used to generate sentences
like (5), repeated below for convenience.
(5) Leave me alone, will you.
To determine the ordering relationship between NC and ITF we must con-
sider the paradigm illustrated by (21)–(24).
(21) Leave me alone, won’t you.
(22) *Leave me alone, will you not.
(23) *Don’t leave me alone, will you.
(24) *Don’t leave me alone, won’t you.
The crucial sequence of constituents involved in the derivation of imperative
tags is underlying NP-AUX. As can be seen, neg may be found in the tag only
when it has contracted. Compare (21) and (22) with the tag questions (25):
(25) Harry swam breast stroke, {didn’t he? / did he not?}
This means that the rule which forms the imperative tag will create a tag with
neg in it if contraction has applied, and will not apply at all if neg is present
but has not contracted. Hence, ITF must follow NC.
TQF
NEGP
NC
ITF
Given that the AUX of the emphatic tag is identical to the first element of
the underlying AUX, it was reasonable to suppose that ETF uses the sequence
NP-TENSE (+[+v]) in forming tags. If ETF preceded NEGP, then the latter
rule would have two possible locations in the sentence available for the
placement of the neg. While this would be a correct formulation in the case
of TQF, sentence (26) shows that ETF must follow NEGP.
On the other hand, if NC preceded ETF, then we would expect contracted
neg to appear in the tag, since n’t would be attached to TENSE, and a copy of
TENSE appears in the tag. Sentence (26) again shows that neg does not appear
in the tag: therefore ETF must precede NC. This establishes our final rule
ordering.
TQF
NEGP
ETF
NC
ITF
3.8 Similarity
(a) The claim that English is more natural than *English arises from the
linguist’s intuition. If the linguist’s intuition is wrong, he will never get
anywhere in accounting for linguistic phenomena, but will bog down
in linguistics’ version of the epicycle. There is no a priori notion of
‘natural’ on which we can base our arguments.
(b) It is not clear that the three rules would collapse even if the ordering
arguments were shown to be invalid. Observe that TQF puts a pro-
noun in the tag, ITF puts a personal pronoun you in the tag, and ETF
leaves a pronoun behind in the sentence. Examples (33)–(35) illustrate
these points.
(33) John’s not very bright, is {he / *John}? (From TQF.)
(34) (Someone) pick up the phone, will {you / *he}? (From ITF.)
(35) {He / *John}’s really something, is John. (From ETF.)
Thus it is not conclusively the case that collapsing would capture the desired
generalization even if the rule ordering could be made appropriate.
The only other possible assumption that would uphold the conclusion in
§3.7(b) is that our notion of transformation, i.e. our notion of linguistically
significant generalization, is not a useful one. If our theory utilizes a notion of
generalization such that application of that notion of generalization prevents
us from capturing other generalizations which we feel are significant ones,
then our notion of generalization may be faulty in the first place. At this point
we can do no more by way of discussion than to repeat the statement made in
§3.8(a).
(c) It will be noted that a Kisseberth-type constraint is a negative con-
straint; that is, it tells us what kinds of structures we can never have.
Since tags are a type of structure which we may have, and a type of
structure which is ‘natural’ to find in a particular location in the
sentence, a Kisseberth-type constraint would not appear to have
much usefulness in the situation under discussion.
(d) It seems more likely that an Emonds-type constraint would be applicable here. Assume that English had a phrase-structure rule like (36)—
It is important to point out here that even if the set of possible root
transformations was severely constrained, we could not predict within the
present framework that it is more natural for a language to select fewer of the
possible root structures.
An analogue to the notion of structure-preserving, non-root transform-
ations, however, must consist of a principled method by which we can constrain
the class of non-BC-structures possible for a given language by appealing to a set
of rules. The feature of tags which was judged to be notable was the fact that the
ultimate location of the tags in surface structure was the same, regardless of
their source. Hence the feature of the rules in question which is of interest is
their basic similarity to one another, and not the form of their output per se.
What we really require, in fact, is a means of expressing the notion that the
output of two transformations which are not consecutively ordered is similar
with respect to the relative position in surface structure of certain well-defined
sub-phrase-markers which are crucial to the statement of the two transform-
ations. In a sense, then, structure-preservingness is a special case of the
maximization of similarity, rather than similarity being a kind of structure-
preservingness.b
b Here I am anticipating a point that I have since elaborated elsewhere (e.g. in Culicover and
Nowak 2002 and Culicover 2013): what have been formulated as grammatical constraints in
earlier work may actually be reflections of markedness. Constructions that obey these con-
straints are more highly valued than those that do not, but the latter are nevertheless theoretic-
ally possible and may occur in natural languages under certain circumstances.
3.10 Definitions
Definition 1: (a) TC is the set of transformations which only change the value
of a feature for a single constituent of the tree. Given con-
stituent A and feature F, TC(A, F) does one of the following:
(i) [A +F] ⇒ [A –F]
(ii) [A –F] ⇒ [A +F]
(iii) [A +F] ⇒ [A [A –F]]
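The cases of Definition 1 can be sketched as operations on category–feature pairs. The encoding below, a constituent as a (category, features) tuple, is an assumption for illustration, not the chapter's notation.

```python
# Sketch of the feature-changing transformations TC(A, F) of Definition 1:
# TC flips the value of one feature F on one constituent A, leaving
# everything else untouched. The tuple encoding is an illustrative choice.

def tc(node, feature):
    cat, feats = node
    flipped = dict(feats)                    # copy, so the input is unchanged
    flipped[feature] = not flipped[feature]  # +F → –F or –F → +F
    return (cat, flipped)

a = ("A", {"F": True})       # [A +F]
print(tc(a, "F"))            # → ('A', {'F': False}), i.e. [A –F]
```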
c The idea of measuring similarity in terms of primitive operations such as deletion and
insertion is due originally to Levenshtein (1966). I was unaware of Levenshtein distance until
hearing a colloquium in the 1990s by John Nerbonne on applying it to the analysis of Dutch
dialects (reported on in Nerbonne et al. 1999 and other work). Jirka Hana and I apply Levensh-
tein distance (or ‘edit distance’) to measuring the complexity of morphological analyses in
Ch. 13 of this book.
the coherence of syntactic descriptions 67
d However, it is more or less the set of primitive operations assumed by Chomsky in his
earliest work (1955) and most recently in the Minimalist Program (1995).
e A constraint against sequences of operations that take one back to where one started was
subsequently proposed by Pullum (1976), under the rubric of the Duke of York Gambit, and has
been followed up in phonological and morphological theory by e.g. McCarthy (2003).
A constraint ruling out Duke of York derivations is required for the learnability proof of Wexler
and Culicover (1980).
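The constraint described in this note, barring derivations that return to an earlier representation, can be stated as a simple check. The sketch below encodes a derivation as a list of representations; the encoding is assumed for illustration.

```python
# Detect a Duke-of-York derivation: some representation recurs after at
# least one intervening, distinct step (A → B → A). An immediately
# repeated (vacuous) step does not count by itself.

def duke_of_york(derivation):
    first = {}
    for i, form in enumerate(derivation):
        if form in first and i > first[form] + 1:
            return True        # form recurred after intervening steps
        first.setdefault(form, i)
    return False

print(duke_of_york(["A", "B", "A"]))   # → True
print(duke_of_york(["A", "B", "C"]))   # → False
```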
3.11 Coherence
Consider now the following transformations:
T1: A B C X ⇒ A B C X A B C
T2: A B C X ⇒ X A B C
T3: D B C X ⇒ D B C X D B C
By Definition 3 we have T1 ~4 T2, T2 ~3 T3, T1 ~1 T3. (All three values are not
symmetric, e.g. T2 ~3 T3, etc., but T3 ~5 T2 and T3 ~2 T1.) Now let us consider
*T1, *T2, and *T3.
*T1: A B C X ⇒ A B C A B C X
*T2: A B C X ⇒ X A B C
*T3: D B C X ⇒ D B D B C C X
By Definition 3 we have *T1 ~4 *T2, *T2 ~6 *T3, *T1 ~4 *T3.
It is possible to see how the fact that one set of transformations is more
‘coherent’ than another is reflected in their respective similarity values. In all
likelihood a coherence-measure could be devised which would, on the basis of
the aggregate differences in similarity values in different sets of transform-
ations, reflect these differences as a single value. The utility of such a device is
open to question, however.
Consideration of a few examples will show why it might be desirable to
treat the smallest possible similarity measure between two transformations as
being significant. The following should serve to illustrate:
T4: A B X ⇒ B X A
T5: C B X ⇒ B X C
*T4: A B X ⇒ B X A
*T5: C B X ⇒ B C X
Observe first that the movement, in T4, of A to the right of X can be simulated
by inserting a copy of A to the right of X and then deleting the first
A. Similarly, the similarity between T4 and T5 can be measured by inserting
C to the right of A in the output of T4 then deleting A. Thus in this sense T5 is
2-similar to T4.
However, the same procedure will assign the value 2 to the similarity
between *T4 and *T5. Simply insert C to the right of B in the output of *T4,
and then delete A. It is clear, however, that T4 and T5 are more similar than *T4
and *T5. By introducing TR, the elementary replacement transformation, we
can relate T4 and T5 by the single application of TR, so that T4 ~1 T5. However,
*T4 ~2 *T5 at best, no matter how the similarity value is determined.
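The effect of admitting TR can be checked with a small computation. The sketch below treats the outputs of the transformations as strings and compares two edit distances, one with only insertion and deletion, one that also allows replacement; the string encoding is an illustrative stand-in for the chapter's tree-based measure.

```python
# Edit distance between transformation outputs: T4/T5 give "BXA" vs "BXC",
# *T4/*T5 give "BXA" vs "BCX". With insertion and deletion only, both pairs
# are 2 apart; allowing replacement (TR) separates them, 1 vs 2.

def edit_distance(a, b, allow_replace):
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            best = min(d[i - 1][j] + 1, d[i][j - 1] + 1)  # deletion / insertion
            if allow_replace:
                best = min(best, d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
            elif a[i - 1] == b[j - 1]:
                best = min(best, d[i - 1][j - 1])
            d[i][j] = best
    return d[len(a)][len(b)]

print(edit_distance("BXA", "BXC", allow_replace=False))  # → 2
print(edit_distance("BXA", "BCX", allow_replace=False))  # → 2 (indistinguishable)
print(edit_distance("BXA", "BXC", allow_replace=True))   # → 1, T4 ~1 T5
print(edit_distance("BXA", "BCX", allow_replace=True))   # → 2, *T4 ~2 *T5
```

This is essentially Levenshtein distance (cf. fn. c), with the insertion-and-deletion-only variant corresponding to the original set of elementary operations.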
Notice that it will be necessary to constrain TR in some ways, and to take into
account the surface structures being considered, and not simply the surface
strings. For example, if we replaced each element in the structural change of
*ITF by the corresponding element in the structural change of *ETF, we
would be able to say that the two transformations are 7-similar. However, in
establishing the similarity measure it was not our intent that it be applied
blindly, but rather that it reflect in some way the true extent to which
transformations are performing similar functions. It is necessary therefore
that we interpret the elementary transformations {TU} as having the charac-
teristic common to all transformations of mapping trees into one another,
and that we interpret the similarity measure as strictly speaking being defined
on the output trees, and not the input strings, of transformations.
general and in grammar formulation. I will give one example of each kind of
application.
Evaluation: Linguists have not infrequently expressed dissatisfaction with the
idea of symbol-counting as an appropriate or adequate evaluation device. As
an example, consider the case of T4–T5 in §3.11 compared with **T4–**T5
below:
**T4: A B X ⇒ B A X
**T5: C A X ⇒ B A X
The pairs may be collapsed as follows:
TØ: {A/C} B X ⇒ B X {A/C}
**TØ: {A/C} {B/A} X ⇒ B A X
In each case the uncollapsed set of rules contains twelve symbols, while the
collapsed set contains eight. The greater similarity between T4 and T5 is not
captured by this notation. We see, however, that T4 is 1-similar to T5, while
**T4 is 2-similar to **T5.
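The symbol counts can be verified mechanically. In the sketch below each rule is a list of terms (an illustrative encoding, not the chapter's notation), and a brace abbreviation such as {A/C} counts as two symbols.

```python
# Symbol-counting as an evaluation measure. Both rule pairs collapse from
# twelve symbols to eight, so the count cannot register that T4/T5 are
# more alike than **T4/**T5. Encodings are illustrative assumptions.

T4  = ["A", "B", "X", "B", "X", "A"]         # A B X ⇒ B X A
T5  = ["C", "B", "X", "B", "X", "C"]         # C B X ⇒ B X C
SS4 = ["A", "B", "X", "B", "A", "X"]         # **T4: A B X ⇒ B A X
SS5 = ["C", "A", "X", "B", "A", "X"]         # **T5: C A X ⇒ B A X

collapsed    = ["{A/C}", "B", "X", "B", "X", "{A/C}"]   # TØ
ss_collapsed = ["{A/C}", "{B/A}", "X", "B", "A", "X"]   # **TØ

def symbols(*rules):
    # each plain term is one symbol; a brace {A/C} abbreviates two
    return sum(1 + term.count("/") for rule in rules for term in rule)

print(symbols(T4, T5), symbols(collapsed))         # → 12 8
print(symbols(SS4, SS5), symbols(ss_collapsed))    # → 12 8
```

Both pairs show the same twelve-to-eight saving, while the 1-similar versus 2-similar distinction discussed in the text is invisible to the count.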
It seems to me that while collapsibility ought to play a role in the evaluation
of grammars, it is certainly not a sufficient simplicity criterion. If, however, it
turned out that there was evidence against the collapsing conventions which
we are fond of using, the notion of relative similarity or coherence could still
be used to capture certain kinds of generalizations which were implicit in our
use of the conventions.
Formulation: Some linguists have often made use of the argument that if
A displays many of the same characteristics as B, then A should be analyzed as
a B, either by using the notation B[±A] or by postulating the structure [B A] (see
e.g. Lakoff 1971; Ross 1969a). The main argument given in favor of this step is
that the appearance of the term {A/B} in more than one rule would constitute
the loss of a linguistically significant generalization.
It can be seen, however, that a grammar which has n occurrences of {A/B} is
more coherent than the same grammar with n–1 occurrences of {A/B} and one
occurrence of {A/C}, and is therefore more highly valued if we include coher-
ence in our evaluation metric.
4
Remarks on Chapter 4
Sentence stress is not always a sufficient condition for interpretation as focus.
An insightful analysis of the appropriate generalizations can be accommo-
dated under a ‘modular’ approach to grammatical theory. Certain observa-
tions concerning the stress properties of wh-questions are shown to be
consistent with the assumptions of trace theory as developed in e.g. Chomsky
and Lasnik (1977), where the relationship between focus and stress is mediated
by S-structure. The notion of focus has no consistent pragmatic characteriza-
tion; it is, rather, a grammatical notion. The interpretation of this grammat-
ical notion in particular discourse contexts is provided by rules of Discourse
Grammar using the predicate ‘c-construable’, which is here defined.
Our goal in this article was to account for the correspondence between the
location of the focal stress and the focus interpretation. We believed that this
could be explained by assigning a focus feature F (which we borrowed from
the work of Jackendoff and Selkirk) to a node or nodes in the syntactic structure,
and then mapping this structure into prosody, on the one hand, and a focus
interpretation, on the other. This early study was refined and elaborated by
Rochemont (1986; 1998).
4.1 Introduction
In this paper we address the issue of how to relate focus and stress in English
sentences, particularly within the framework of the (Revised) Extended Stand-
ard Theory. Specifically, we will show that, with some refinement, the
* [This chapter appeared originally in Language 59: 123–65 (1983). It is reprinted here by
permission of the Linguistic Society of America. The order of the authors is strictly alphabetical.
We would like to thank Dwight Bolinger, Larry Hyman, Will Leben, and an anonymous
Language referee for helpful comments. The research reported on here was supported in part
by grants from the National Science Foundation (BNS-7827044) and the Sloan Foundation.]
1 We assume this for expository convenience. We take no principled position on the question
of where lexical insertion operates in the organization of grammar (cf. Otero 1972).
stress and focus in english 73
[Figure 4.1: flowchart of the organization of the grammar, linking the Base, Movement Transformations, S-structure, Deletions, Construal, Filters, Interpretive Rules, Stylistic Rules (e.g. Quantifier Raising, WH-Interpretation, Focus), Surface Structure, Principles of Anaphora, Accent Placement Rules, Logical Form & Conditions, Rules of Discourse Grammar, Prosodic Structure, Phonology, and Phonetic Representations]
2 Bolinger (1961) uses the terms ‘accent’ and ‘stress’ differently than we do here: for him
‘accent’ designates phrasal stress, while ‘stress’ designates lexical stress. See below for discussion.
3 We make a fundamental assumption here that the semantic effects of stress, in particular
nuclear stress, can be identified and studied independently of the contribution to meaning of
intonation contours. This point will become clearer in §4.3, where we discuss the interpretation
of focus. But for the present, consider the following example:
(a) Bill likes only Mary.
This sentence can be pronounced with different intonation contours. For our purposes,
consider only the following—a simple declarative contour and a typical interrogative contour:
(b) Bill likes only Mary.
(c) Bill likes only Mary?
In either case, the location of the nuclear stress can be identified with the focused constituent,
completely independently of the meaning contribution of the intonation contour. That is, Mary
is focused in both cases. It might be possible to defend and maintain the view that varying
intonational possibilities for a single sentence, with a particular focus specification, can be
defined by altering pitch ranges across the pre-head, head, and tail, while keeping constant the
pitch range over the nucleus; this possibility was suggested to us by H. Borer. However, we will
not explore it further here. Returning to the interpretation of intonation contours, we suspect
that particular contours define conventional implicatures; in the case of (b) and (c), the
different implicatures may be responsible for the fact that a reading which expresses surprise
at the focus is forced in (c), but not in (b).
4 It should be noted that the approach of Liberman and Prince encounters problems in
nominal compounds where accent placement has potential interpretive effects. To illustrate,
consider an example drawn from Ladd: he notes that, with a nominal compound like faculty
meeting, the relational theory predicts that the strong accent will be placed consistently on the
left constituent. But the strong accent appears just as naturally on the right constituent in
particular, non-linguistically determined contexts. Consider, for instance, the following dialog:
A: Has the faculty voted on that issue yet?
B: No, they will be discussing it at the faculty meeting tomorrow.
It is possible that our approach can be extended to such cases; but we will not pursue this
suggestion here.
5 Dogil (1979) proposes an analysis somewhat similar to ours, in that he also develops an
interpretive model incorporating a relational theory of stress. He assumes, however, that
determination of focus is based on P-structures rather than S-structures—a proposal which
clearly must be abandoned if one assumes a non-isomorphic mapping between S-structure and
P-structure. That is, if the derivation of P-structures alters the syntactic constituent structures, a
constituent-related definition of focus will be impossible. Dogil’s approach is more ambitious
than ours, however, since his proposals are extended to accommodate instances of lexically
contrastive stress—a topic which we have ignored. Some modification of his proposals might
yield a system consistent with our view.
6 David Gil (p.c.) has suggested to us the possibility of generating P-structures independently
(rather than deriving them from syntactic structures), and of defining an algorithm to pair up
syntactic and prosodic structures. We reject this alternative for two reasons. First, in Gil’s
system, the pairing mechanism is sensitive to semantic and pragmatic considerations—an
analysis that is inconsistent with our Autonomous Systems view. Second, a major motive for
Gil’s proposal is his contention that, as a universal property, languages have rightmost strong
accent placement. Thus defining accent-placement rules for languages with radically distinct
syntactic structures, in order to derive essentially similar prosodic structures for these languages,
would give rise to unnecessary theoretical complications. Clearly, however, languages do differ
in prosodic structure. Furthermore, should it turn out that such prosodic differences are
paralleled by syntactic differences, our approach would be even more strongly preferred. Barring
this consequence, an explicit analysis of the type Gil suggests would, in our view, amount to
nothing more than a notational variant of our analysis, with the notable disadvantage that it
does not conform to the Autonomous Systems view.
relational theory of Liberman (1979) and Liberman and Prince (1977): i.e. each
non-terminal node in a prosodic tree is binary-branching, dominating one s
node and one w node. Thus each constituent is assigned prominence relative
to its sister constituent. In order to accommodate instances of multiple
primary stress, as in example (1) below, Liberman’s relational theory must
be modified so as to allow prosodic nodes to dominate two s sisters, i.e.
constituents which are perceived to have equal relative prominence:7
(1) John told bill about susan, and sam about george.
On the plausible assumption that not all syntactic structures are binary-
branching, then either syntactic and prosodic structures must be non-
isomorphic, or else we must abandon the strict definition of ‘prosodic tree’
which characterizes the approach outlined above. We adopt the former alter-
native, for the following reason. Under the assumption that syntactic and
prosodic structures are isomorphic, we would expect Figure 4.2(a), which has
binary branching, to have a P-structure like Figure 4.2(b). (‘R’ is the
node used to define the root of the tree, following Liberman and Prince.)
[Figure 4.2: (a) syntactic tree for the PP from towns in Germany; (b) a strictly binary-branching P-structure for it, rooted in R, with w/s labels on sister nodes]
Figure 4.2(b) implies that each of the first two lexical items in the phrase
has relatively greater prominence than the item following—which strikes us as
false. The P-structure in Figure 4.3, however, more accurately represents the
perceived relative stress levels.
[Figure 4.3: P-structure [R [w [w from] [s towns]] [s [w in] [s Germany]]]]
7 We adopt the convention of using small capitals to signal primary (nuclear) sentence stress.
We do not share the view of Schmerling (1976) and Bing (1979) that, in sentences like our
example (1), the rightmost stressed element is perceived to have relatively greater prominence
than the other stressed elements in the sentence. In our view, all the nuclear stressed elements
have equal relative prominence. As far as we know, there is no conclusive empirical evidence
bearing on this issue.
8 Given the general unavailability of empirical studies on phonological phrasing, we acknow-
ledge the unreliability of intuitions in subtle cases. However, in the cases we discuss, the relevant
intuitions are apparently well-defined and consistent across speakers.
9 It has been pointed out to us by a referee that the metrical grid construction of Liberman
and Prince is designed to accommodate many of our observations concerning prominence and
constituency. However, it is our contention that, in being extended to sentential domains, the
metrical grid approach makes certain false predictions. For instance, this approach predicts that
alternating ‘upbeats’ can optionally appear, so long as no violation of the Relative Prominence
Projection Rule (RPPR) occurs. Consider in this connection a sentence like (i).
(i) John may have been dating mary.
The grid construction method predicts that have can optionally be stressed without violation
of the RPPR—which is clearly false; i.e. have cannot optionally be stressed in (i), unless it bears
nuclear stress, and hence is focused.
A similar case is provided by the following example:
[Figure 4.4: (a) syntactic structure of a sentence with subject my brother and predicate is dating …; (b) the same structure after the Head Rule, with place-holder x nodes]
11 We understand the term ‘head’ here to refer to the lexical head of a phrase, not a phrasal
head (cf. Jackendoff 1977). In addition, we define a recursive node X as one which dominates
another node Y, such that X and Y are identical in terms of syntactic-feature make-up and
number of primes.
[Figure 4.5: (a) syntactic structure of the VP send a book about Nixon to Mary; (b) the same structure after the Head Rule, with place-holder x nodes]
[Figure 4.6: (a) syntactic structure of an NP of the form det N PP (the president with …); (b) the same structure after the Head Rule]
[Figure 4.7: the Head Rule combining P1 with NP2 rather than with NP1, as in from towns in …]
Figures 4.4–4.7 illustrate the application of the Head Rule. We use ‘x’ here
simply as a place-holder for new nodes that are added in the course of the
derivation.12
Note that, in Figure 4.6(b), P is not combined with det by the Head Rule.
The node NP2, in this case, is in fact the highest right-adjacent, c-commanded,
non-recursive node with respect to P. In this respect, Figure 4.6 contrasts with
Figure 4.5. In Figure 4.5, the NP object of V has essentially the structure of
Figure 4.6(b) after the application of the Head Rule. Subsequent application
of the Head Rule in the domain of the VP in Figure 4.5 forces V to combine with
[x det N], rather than with NP or N′, since both NP and N′ are recursive in
this case. Similarly, in Figure 4.7, P1 combines with NP2 rather than with NP1.
Figure 4.8 illustrates the application of the Sisters Rule.13
[Figure 4.8: the Sisters Rule applying in a coordinate NP, combining [NP CONJ NP] into [NP x] (… the woman)]
12 Following Jackendoff (1977), we take V to be the head of S.
13 Examples like the following suggest either that the notion ‘c-command’ in (2a) is not
sufficiently strong, or that the Sisters Rule must apply before the Head Rule:
[NP [NP [N′ [N John]]] and [NP [N′ [N Mary]]]]
We will not pursue this matter further.
Our proposal differs from that of Selkirk (1978: 20), who suggests an
alternative algorithm for defining the mapping from syntactic structures to
P-structures:
(4) a. An item which is the specifier of a syntactic phrase joins with the head
of that phrase.
b. An item belonging to a ‘non-lexical’ category (cf. Chomsky 1965),
such as det, p(rep), comp, verbaux, or conj, joins with its sister
constituent.
The results contrast consistently with those of rule (2), above. Consider, for
example, the following:
(5) On a side street—in the Soho district—of London.
The application of (4) yields a binary-branching structure something
like Figure 4.9. But rule (2) will yield the binary-branching structure of
Figure 4.10.
Note that Figure 4.10 is consistent with our assumption that P-structure is
instrumental in the determination of phonological phrases, in the sense in
which we have defined them. That is, Figure 4.10 (but not Figure 4.9) yields an
appropriate constituent-structure characterization of the major options for
pauses in example (5).
Selkirk marshals empirical arguments in favor of having the structures
given by rule (4) define the domain of application of particular phonological
rules; these too can be applied essentially without revision to our proposal
(but cf. fn. 9, above). Given the broader range of applicability of rule (2), we
therefore prefer it over rule (4).
[Figure 4.9: binary-branching structure assigned to (5) by rule (4)]
[Figure 4.10: binary-branching structure assigned to (5) by rule (2), with x constituents for on a side street, in the Soho district, and of London]
14 Neutral Accent Placement subsumes the effect of Liberman and Prince’s rule (8a).
15 Only at the level of P-structure do the assignments of s and w have any interpretive
value. Thus we do not consider our analysis to violate the strict definition of prosodic
trees given by Liberman and Prince, whereby a node in a prosodic tree can only be
interpreted in relation to its sister constituent. At the level of P-structure, where
this interpretive principle must be seen to hold, all prosodic nodes are defined in this
relation.
16 We suspect that this derivation gives evidence for an additional rule of types (2)–(3).
Specifically, it appears that comp should combine with the subject NP in the embedded
sentence of Figure 4.13(b). Such a rule might also be formulated so as to generalize to cases
like the following: This analysis is adopted in the discussion of wh-questions in }4.2.4. For
discussion of an additional class of cases such as these, see Selkirk (1972).
[Figure 4.11: (a) syntactic structure and (b) restructured, s-annotated structure for John saw Mary in the park, with the corresponding w/s prosodic tree]
4.2.3 Stress
As indicated in }4.2.2, the primary stress of a sentence is identified with the
designated terminal element. Traditionally, stress levels have been indicated
for English by the assignment of numerical values, in which ‘1’ generally
marks the position of primary stress, ‘2’ that of secondary stress, ‘3’ that of
tertiary stress, and so on. It has been a common criticism of the analysis of
English stress proposed by Chomsky and Halle that it results in implausibly
fine distinctions in perceived stress levels, by allowing for the assignment of
numerical values in excess of ‘4’ or ‘5’ (cf. Bierwisch 1968). The proposals of
both Liberman and Prince and of Liberman are intended to some extent to
overcome this criticism. However, it also applies to the algorithm which they
propose (Liberman and Prince 1977: (25a)) to relate accent placement to
perceived relative levels of stress:17
17 Liberman and Prince in fact take the position that assignment of stress levels is of little
linguistic interest. As the following discussion shows, we can make stress level a more interesting
notion by defining it in terms of domains of maximal stress prominence, where ‘domain’ has a
structural characterization.
[Figure 4.12: (a) syntactic structure and (b) restructured, s-annotated structure for John sent a book about Nixon to Mary, with the corresponding w/s prosodic tree]
(11) If a terminal node t is labeled w, its stress number is equal to the number
of nodes that dominate it, plus one. If a terminal node t is labeled s,
its stress number is equal to the number of nodes that dominate the
lowest w dominating t, plus one.
This will assign the stress levels in the following example to the P-structure of
Figure 4.13(d):
[Figure 4.13: (a)–(d) syntactic structure, restructured form, and w/s prosodic structures for Bill believes that John is marrying Sue]
(12) Bill(2) believes(3) that(3) John(5) is(7) marrying(6) Sue(1)
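Rule (11) can be implemented directly over w/s-labeled trees. The tree used below is a reconstruction of the Figure 4.3 P-structure for from towns in Germany (an assumption, since the printed figure does not survive extraction); the stress numbers then follow mechanically from (11).

```python
# Rule (11): a w terminal's stress number is the number of dominating nodes
# plus one; an s terminal's is the number of nodes dominating its lowest
# dominating w, plus one (taken here as 1 when no w dominates it, i.e. for
# the designated terminal element). Nodes are (label, children-or-word).

def stress_numbers(tree):
    result = {}

    def walk(node, ancestors):
        label, rest = node
        if isinstance(rest, str):                 # a terminal node
            if label == "w":
                result[rest] = len(ancestors) + 1
            else:                                 # label == "s"
                lowest_w = next((i for i in range(len(ancestors) - 1, -1, -1)
                                 if ancestors[i][0] == "w"), None)
                result[rest] = 1 if lowest_w is None else lowest_w + 1
        else:
            for child in rest:
                walk(child, ancestors + [node])

    walk(tree, [])
    return result

# Reconstructed P-structure of Figure 4.3 (an assumption for illustration):
fig_4_3 = ("R", [("w", [("w", "from"), ("s", "towns")]),
                 ("s", [("w", "in"), ("s", "Germany")])])
print(stress_numbers(fig_4_3))
```

On this tree the rule yields from = 3, towns = 2, in = 3, Germany = 1, with Germany, reached by an all-s path, as the designated terminal element bearing primary stress.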
4.2.4 Wh-constructions
We have now defined in broad outline a system of rules associating instances of
primary stress with focus. Focus Assignment, and the resulting interpretation,
will operate on S-structure to which SA has applied (see §4.3). The rules involved
in the mapping from surface structure to P-structure preserve the s assignments,
[Figure 4.14: w/s prosodic tree for John saw Mary in the park]
18 It is intuitively clear that limiting the application of (13) to domains defined by S is similar
in effect to a constraint such as the Binary Principle (BP) of Wexler and Culicover (1980). That
principle will not apply to (13) directly, however, because BP refers to the B-cyclic nodes NP and
S0 . These node labels do not correspond to the P-cyclic nodes in P-structure. In fact, in many
cases no node in P-structure corresponds to a syntactic NP or S0 . Nevertheless, it would be worth
investigating whether a constraint like the BP could be independently motivated from consider-
ations of learnability in the domain of prosodic structure.
[Figure 4.15: w/s prosodic tree for John sent a book about Nixon to Mary]
[Figure 4.16: w/s prosodic tree for Bill believes that John is marrying Sue]
and guarantee that these constituents will contain the primary stress of the
sentence. Since we assume SA to be optional, we will relate the results of its
application to multiple-focus interpretations as well (§4.4). We assume that, in
the interpretation (perhaps at LF), all sentences must specify a focus—though not
necessarily only as a result of applying SA. We thus predict that sentences will exist
in which the location of primary stress given by the rules of accent placement will
not coincide with the constituent that functions as focus, if the focus constituent
is determined by a rule other than that which is sensitive to the application of SA.
This prediction is borne out. Thus a sentence like (18), with primary stress
on buy, can be used just as easily in a context like (16), in which case it cannot
be said that buy is focus, as it could in a context like (17), in which case it
can be said that buy is focus:19
19 Bolinger has pointed out (p.c.) that our analysis predicts that (i) and (ii), below, should be
possible in the same contexts, without a difference of interpretation vis-à-vis focus:
(17) A: Bill took me downtown to all the big department stores today.
(18) B: Oh yeah? What(2) did you buy(1)?
It has been repeatedly suggested (cf. Gunter 1966; Horvath 1979; Rochem-
ont 1978) that wh-words function naturally as focus constituents of construc-
tions in which they appear. In the context defined by (16), the location of
primary stress in (18) does not coincide with the focused constituent. It appears,
then, that the wh-focus does not always coincide with a primary stress (this fact
is also noted by Gunter). It should be emphasized that we do not view these
cases as ‘preferred’ to or ‘more normal’ than other occurrences of stress. In fact,
what we will suggest is that such instances are derived by the same set of accent
placement rules that we have just proposed. Note that, in (18), stress is
rightmost in the sentence, in the position predicted by Neutral Accent Place-
ment (cf. rule (9)). This is a natural consequence of our analysis, since we claim
that SA has not applied in the derivation of (18). Consideration of additional
wh-questions bears this out:
2 1
(19) a. Who was talking to Bill?
2 1
b. Which girl did John meet in Rome?
2 1
c. Who decided to leave early?
2 1
(20) a. Who is Bill sleeping with?
2 1
b. What kind of creature do you think we’re up against?
2 1
c. Which seat was she sitting in?
2 1
d. What will you talk about?
2 1
e. What are you looking at?
20
To see that it is stress on a wh-specifier (and not a full wh-phrase) that yields this
possibility, consider these sentences:
1 2
(i) How many soldiers did you meet?
1 2
(ii) How many soldiers did you meet?
2 1
(iii) How many soldiers did you meet?
Clearly, soldiers is focused in (i) and not in (iii). Correspondingly, (i) and (iii) cannot be used
interchangeably in any context without affecting the focus properties of the utterance in
question. But (ii) and (iii) can be used interchangeably:
(iv) A: I’m so excited! Tom took me down to Buckingham Palace today and I got to meet all
those soldiers.
B: Oh, really?!
1 2
How many soldiers did you meet?
2 1
How many soldiers did you meet?
1 2
How many soldiers did you meet?
2 1
(21) a. Who is Bill sleeping with?
2 1
b. What kind of creature do you think we’re up against?
2 1
c. Which seat was she sitting in?
2 1
d. What will you talk about?
2 1
e. What are you looking at?
21
In addition, given the rules for interpreting focus, an s on a null element does not receive
an interpretation. This is because such an element, being bound by some other constituent,
cannot at the same time support an interpretation as new or contrastive information (cf. §3).
Figure 4.17 [prosodic (w/s) trees: (a) what did you buy e; (b) what will you talk about e]
P-structures like those in Figure 4.17 can be derived in two ways: by Weak Default (rule 8), or
by assigning s in S-structure to the trace of wh. Given that a wh-phrase is by definition a focus,
the first option will yield an acceptable derivation even though there is no stress focus. The
second option is independently necessary, so that we may derive the correct prosodic structure
when the trace of wh is not on a rightmost branch; e.g. What did you do yesterday?
If we allow the structure [VP V [NP e:s]],
then the question arises as to whether both the wh and the VP that contains its trace can be focus.
We cannot discover any plausible interpretation of such a FA, and will therefore provisionally
adopt the convention that a trace within a focus constituent cannot be bound from outside. Such a
convention may extend naturally to rule out the following cases:
(a) Extraction from the fronted wh-phrase or topic: *Whoi did you wonder [whose picture of
ti]j John stole tj?
(b) Extraction from the focus of a cleft: *Whoi was it [a picture of ti]j that John stole tj?
(c) Extraction from the focus of a pseudo-cleft: *Whoi was what John stole [a picture of ti]?
For a discussion of extraction from focus in pseudo-clefts, see Culicover (1977), where a
somewhat different approach to the one suggested here is adopted. Delahunty (1981) suggests
that such cases may be handled by a constraint which blocks extraction from antecedents.
Figure 4.18 [prosodic (w/s) tree: what did you buy e, with s on buy]
Figure 4.19 [prosodic (w/s) tree: what will you talk about e]
There is some reason to believe that the examples in (21) may have a
different S-structure from those in (20) (assuming our SA rule); if so, our
statement of the Switch Rule would be too broad. Crucially, the examples in
(21) cannot be used when the mutual beliefs on which they bear have not been
asserted; e.g. (21d) would not be appropriate if you had not said that you were
going to talk. Similarly, we could walk up to someone and say (20e), but not
(21e). It might be appropriate to derive (20a–e) from S-structures in which s
falls on the verb, in which case the verb would be focus. In (21a–e), s would
appear on the trace of wh, and the Switch Rule would move it onto the
preposition. Viewing the examples in (20) along the lines just suggested does
not appear to wreak havoc on the view of focus interpretation sketched below;
however, we will leave the question of the correct analysis of these examples
undecided for the present. For completeness, we indicate the form of the
alternative Switch Rule here.
(24) Switch Rule (Alternative): [sl w [s e]] ) [sl s [w e]], where [s e] is a
designated terminal element.
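As a toy illustration (the dictionary encoding of w/s trees is our own, not part of the original analysis), the effect of the Switch Rule can be sketched as a tree rewrite:

```python
# A minimal sketch of the Switch Rule: each node is a dict with a label
# 'w' or 's' and either 'children' or a terminal 'word' ('e' marks an
# empty element / trace).

def switch(node):
    """[s w [s e]] => [s s [w e]]: where a strong node dominates a weak
    sister and a strong empty element, swap the two labels so primary
    stress lands on the overt word (e.g. a stranded preposition)."""
    kids = node.get("children", [])
    if (node["label"] == "s" and len(kids) == 2
            and kids[0]["label"] == "w"
            and kids[1]["label"] == "s"
            and kids[1].get("word") == "e"):
        kids[0]["label"], kids[1]["label"] = "s", "w"
    for kid in kids:
        switch(kid)
    return node

# [s [w about] [s e]], as in 'What will you talk about e?'
tree = {"label": "s", "children": [{"label": "w", "word": "about"},
                                   {"label": "s", "word": "e"}]}
switch(tree)  # 'about' is now s, the trace w
```

On this encoding the designated-terminal-element condition is checked directly on the trace node; a fuller implementation would also verify non-branching.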
The analysis using (23) makes accurate predictions in a number of unexpected
cases:
2 1
(25) a. Which number did you look up?
2 1
b. Who did John send Mary to?
Here nuclear stress in S can fall only on the final word, if the interpretation
with which we are concerned is to be preserved. Specifically, penultimate
stress will not preserve this interpretation:
2 1
(26) a. Which number did you look up?
2 1
b. Who did John send Mary to?
The P-structure which our rules assign to the surface structure of (25b), in
which no s has been assigned by SA, is shown in Figure 4.20.
In this P-structure, rule (23) can analyze only a single non-P-cyclic node;
thus it is correctly predicted that stress must fall on the preposition in this
derivation, unlike Figure 4.17(b) above. Example (25a) is slightly more com-
plex, since we assume this sentence to be associated with two well-formed
surface structures on the intended interpretation, given in Figure 4.21.
From Figure 4.21(a,b), our rules determine the structures of Figure 4.22
(a,b), respectively. Rule (23) does not apply in Figure 4.22(a), since e is not
immediately dominated by s. But it does apply to Figure 4.22(b), yielding
Figure 4.23.
In the P-structures of Figures 4.22(a) and 4.23, the final preposition is
a designated terminal element; thus it is predicted that this word receives
primary stress in the sentence, regardless of where e appears in the verb
phrase.
Figure 4.20 [P-structure (w/s) tree: who did John send Mary to e]
Figure 4.21 [two surface structures for (25a), each S′ over COMP and S: (a) look e up; (b) look up e]
Figure 4.22 [prosodic (w/s) trees derived from Figure 4.21(a) and (b): which number did you look (e) up / look up (e)]
Figure 4.23 [prosodic (w/s) tree: result of applying rule (23) to Figure 4.22(b)]
Figure 4.24 [prosodic (w/s) tree: to talk about e]
4.2.5 Cliticization
The Head Rule and the Sisters Rule, which map from S-structure into P-structure,
do not account for all the possible stress patterns and phrasings of English. In
addition (cf. also fn. 16 above), a class of cliticizations must be stipulated.
Consider the following, which is parallel to (21) and Figure 4.17:
(27) What are you going to talk about?
The problem that we face with going to concerns the proper attachment of to.
If to is an aux in [s pro to talk about e], the Head Rule will make it a sister of
talk in P-structure, as in Figure 4.24.
The Switch Rule (23) will put s only on about, because to talk is branching,
and so (28a) will never be derived:
2 1
(28) a. What are you going to talk about?
2 1
b. What are you going to talk about?
Suppose, however, that to is a surface-structure daughter of the VP, as in
Figure 4.25(a). Applying the Head Rule to V and PP will then yield Figure 4.25(b).
If the Switch Rule puts s on about, the resulting P-structure will be
Figure 4.26. This structure predicts a higher stress level on to than on talk,
because they are both w, and talk is further from the closest P-cyclic node.22
To resolve this problem, we cliticize to to going after the Head and the
Sisters Rules have applied, by a rule that we will call Cliticization. (Selkirk 1972
proposes cliticization rules for English which incorporate certain clitics into
Figure 4.25 [(a) surface VP: to, V talk, PP about e; (b) output of the Head Rule: to, with x dominating V talk and PP about e]
22
This result does not obtain if talk about e is in a different S than what are you going to. In
such a case, the lowest P-cyclic node above talk would be different than the lowest P-cyclic node
above to. Such a structure does not seem plausible to us.
Figure 4.26 [prosodic (w/s) tree: to talk about e, with s on about]
Figure 4.27 [S′ [COMP what] [S [AUX are] [NP you] [VP [V going to] [VP [V talk] [PP [P about] [NP e]]]]]]
the preceding word, as does our rule.) In order for our prosodic rules to work
correctly, the clitic must in fact be made a daughter of the terminal node
dominating the item to which the clitic is attached, as we will see below. The
result of applying Cliticization to (27) is given in Figure 4.27.23
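The attachment step can be sketched roughly as follows, on a flat string of terminals rather than the text's trees (a simplification of ours; the function and its arguments are illustrative only):

```python
# A rough sketch of Cliticization: the clitic 'to' is folded into the
# node of its host 'going', so that later prosodic rules treat going+to
# as a single word-level unit (the environment for deriving 'gonna').

def cliticize(terminals, clitic="to", host="going"):
    out = []
    for word in terminals:
        if word == clitic and out and out[-1] == host:
            out[-1] = [host, clitic]  # clitic becomes a daughter of the host's node
        else:
            out.append(word)
    return out

# cliticize(['what', 'are', 'you', 'going', 'to', 'talk', 'about'])
# -> ['what', 'are', 'you', ['going', 'to'], 'talk', 'about']
```

Frozen nodes such as the PP of *going to the store* would block this step; that condition is not modeled here.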
Note that the output of Cliticization here makes perfect sense in defining
the context in which gonna is derived from going to. Suppose that to in the
infinitive is not in aux in surface structure, but is as we have illustrated it
in Figures 4.25(a) and 4.27. The Head Rule does not apply to this to. In
the VP going to the store, however, to is the head of PP, and is mapped into
a P-structure phrasal unit, as in Figure 4.28. Because of the ungrammaticality
of *I’m gonna the store, it is plausible to assume that nodes like PP or x in
Figure 4.28 are frozen with respect to later rules like Cliticization.24
23
The non-branching VP may be pruned here, or may be pruned in the P-structure by the
Pruning Convention. We see no reason not to assume that the latter convention does the job.
24
The similarity of this assumption to the Freezing Principle of Wexler and Culicover (1980)
is obvious. However, we do not know whether there is a principled generalization over the two
(see fn. 18 above).
Figure 4.28 [VP: V going, PP containing x, with P to and NP [DET the] [N store]]
25
Bolinger has pointed out to us that this statement must be qualified. A pronoun that is
initial in a conjoined structure may be unstressed, yet uncliticized: As for John, I don’t mind
sending him or his brother to Mary.
The two patterns are derivable by the Switch Rule, but only if give it is a
phonological unit. A full NP direct object allows stress only on to, if the
intended interpretation is to be preserved:
(31) A: I gave the book to someone.
B: Who did you give the book to?
B′: Who did you give the book to?
As we see, give it behaves prosodically not like give the book, but like talk—as
in (22), where the Switch Rule was motivated.
26
That is, there is a mapping from S-structure to P-structure; but it must be mediated by
other components of the grammar, e.g. stylistic and deletion transformations (cf. Chomsky and
Lasnik 1977).
In its essentials, the system for assignment of focus is very simple. Given an
S-structure to which SA has applied, any node with s assigned to it defines a
focus constituent.27 In assigning focus, we map the syntactic representation
into another level of representation in which the focus constituent is explicitly
identified: we call this level ‘F-structure’.28 In the interpretation of focus, this
latter level of representation is related to properties of the discourse and to the
context in general.29
In this section we will sketch the details of the mapping from S-structure to
F-structure. In §4.3.1, we suggest a formal notation for the representation of
focus. In §4.3.2, we discuss the cases of apparent ambiguity of focus first noted
in Chomsky (1971).
27
We will later introduce two more stress markings to designate marked intonations: ‘s!’
represents an emphatic intonation contour, while ‘s?’ represents an echo intonation contour.
Our claim is that the intonation contour functions orthogonally to focus assignment (cf. fn. 3
above). We speculate that the focal phrases, as defined by assignment of s, are instrumental in
determining the domains to which intonation contours are assigned in P-structure.
28
We leave open the question of whether F-structure is to be identified with LF, since some of
the F-structures which FA derives violate the Empty Category Principle (ECP) of Chomsky
(1981a), Jaeggli (1982), and Kayne (1981a). One way to overcome this difficulty would be to allow
FA to apply on LFs, yielding a level of representation distinct from LF and not subject to the
ECP, taken as a condition on LF. For a number of reasons, however, we do not adopt this
alternative to the grammatical model of Figure 4.1—in particular, because this would not solve a
similar problem for the ECP which arises in connection with the interpretation of quantified
expressions in general. It is our view that these two problems may comprise a unified difficulty
for the ECP.
29
It should be apparent that the S-structure abstract marker functions to mediate between
the prosodic and the focus structures. In this respect it is similar to the deep-structure markers
Q and I of Katz and Postal (1964), which were intended to mediate between surface structure
and semantic interpretation. In each case, the marker ensures that the correlation is maintained
between the two levels of representation that are defined independently in terms of the abstract
syntactic representation, be it deep structure or S-structure.
of the traditional predicate calculus. In the latter, the verb would be a relation
of two variables, and (34) would be replaced by the following:
(35) [λx(loves(Mary, x))](John)
Because the focus constituent may be a VP, an aux (or at least a modal), or
any of various expressions whose syntactic category is Xi for arbitrary i,
expression (35) would prevent us from expressing all the possible focus
constituents: e.g. it does not contain the node VP, or any counterpart to it.
Our F-structure, then, will be essentially the S-structure of English, with
variables introduced as required. For convenience, we will use lower-case
node labels for the variables, so that (34) will appear as follows:
(36) [np (Mary loves np)] (John)
The similarity between this representation and that provided by trace theory
for cases of wh-Fronting is striking—though in the absence of a detailed
theory of LF and its relationship to focus, nothing significant can be attrib-
uted to minor notational details (but cf. (28)). An equivalent way of express-
ing (36) might be (37), where the similarity to trace theory is even closer:30
(37) Johni [Mary loves npi]
We will settle for now on this last representation.
Faced only with simple examples like (34), we could formulate the Focus
Assignment (FA) rule as a ‘movement’ transformation: one which indexes some
strong constituent in the sentence, extracts it, and leaves behind a coindexed
variable of the appropriate syntactic type. We are assuming that, in S-structure,
any node may be marked with s. Let us also assume that every node in the tree
has a unique index (including terminals). For a node X, the index of X is i(X),
and Type(X) is the syntactic type of X. We then state the following rule:
(38) Focus Assignment: Let X be an arbitrary node, and let α be the highest
node in the tree. If X is s, then FA(α) is the result of appending X to α
and replacing X with a dummy whose index is i(X) and whose syntactic
type is Type(X), i.e. [α Xi [α . . . ti . . . ]].
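Rule (38) can be mimicked with a toy tree transformation (the encoding and helper names are our assumptions, not the text's formalism):

```python
# A sketch of Focus Assignment (38): find an s-marked node X, append it
# to the root, and replace X in situ with a coindexed dummy (trace) of
# the same syntactic type, yielding [alpha X_i [alpha ... t_i ...]].
import copy

def focus_assignment(alpha):
    """Return the F-structure for the first s-marked node, or None if no
    node is marked s (FA inapplicable).  Mutates alpha in place."""
    def extract(node):
        for j, child in enumerate(node.get("children", [])):
            if child.get("stress") == "s":
                focus = copy.deepcopy(child)
                # the dummy keeps the index i(X) and the type Type(X)
                node["children"][j] = {"type": child["type"],
                                       "index": child["index"],
                                       "dummy": True}
                return focus
            found = extract(child)
            if found is not None:
                return found
        return None

    focus = extract(alpha)
    if focus is None:
        return None
    return {"type": alpha["type"], "children": [focus, alpha]}

# 'Mary loves JOHN', with s on the object NP:
s = {"type": "S", "index": 0, "children": [
    {"type": "NP", "index": 1, "word": "Mary"},
    {"type": "VP", "index": 2, "children": [
        {"type": "V", "index": 3, "word": "loves"},
        {"type": "NP", "index": 4, "word": "John", "stress": "s"}]}]}
f = focus_assignment(s)
# f is the analogue of John_i [Mary loves np_i], cf. (37)
```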
30
On the standard interpretation, the º representation expresses certain logical properties,
while (37)—as well as what is referred to by linguists as LF (cf. Chomsky 1981a)—has no self-
evident interpretation. Thus the latter must be assigned a logical interpretation, or else be
translated into a standard representation, e.g. the lambda calculus. We will assume for present
purposes that such an interpreted translation for our formulas is available.
Figure 4.29 [S′ over COMP and S1, with NP2 (old man) and VP3]
31
Example (39e) is given for completeness: it is not obvious how FA should apply if the
candidate focus constituent is the entire S′. Apparently, the FA will apply to S in the
configuration [S′ comp S], giving an F-structure Si [S′ comp si]. But if comp is filled, the
constituent in comp will be focus; hence this F-structure will be impossible, by the conven-
tion of fn. 21. If comp is Ø, it seems to us that FA is, properly speaking, inapplicable—in that
its intended function is to extract a focus constituent from its surrounding structure; and
such a structure is absent. In fact, it is not clear that a Ø-comp is syntactically expressed in
S-structure.
32
We are leaving unformalized the definition of ‘≠’ in expressions like Foc1 ≠ Foc2. To make
this definition precise, we would have to develop an account of lexical contrast, so that we
may specify when two lexical items or phrases are in fact not the same. Clearly this definition
has, in part, to do with the phonology, a matter that we choose to avoid here (see Williams 1981
for some relevant considerations).
What is common to all these cases is that the proposition in question, which
is in contrast with what S says, has been introduced into the discourse directly
by assertion, or obliquely by attribution or by a question—or quite indirectly,
by virtue of being construable (given certain beliefs) from a proposition that
has already been introduced. Therefore, it appears that our definition of
contrastive focus is appropriate to a more restricted case: that of direct disput-
ing of a belief which S thinks that H holds. A more general characterization of
the appropriate conditions for the interpretation of focus as contrastive would
be the following. We will say that F(t/Foc) is construable from the context,
or c-construable, if it has been asserted or mentioned (i.e. introduced
into the discourse), or if it can be inferred from what has been asserted or
mentioned, or if it is inferable from the mutual beliefs of S and H.
(46) generalized contrastive focus: In Foc1(F(t)), the element Foc1 is a
generalized contrastive focus iff F(t/Foc2) (Foc2 ≠ Foc1) is c-construable.33
Pragmatically, such a definition has the following consequences. If a sen-
tence is uttered which is to have a focus interpretation, then H must be able to
find a proposition in the discourse, or in mutual beliefs, for the purposes of
contrast (or else the focus must have some other interpretation that we have
not yet discussed). Given that H seeks a contrastive interpretation, and given
that no such proposition has been asserted, it must be either that S believes
that what was uttered allows the construction of such a proposition as
relevant to the discourse, or that the proposition in question is believed by
S to be generally believed. Given appropriate mutual beliefs, an utterance like
the following can have a contrastive interpretation without anything actually
having been said previously in the discourse:
(47) We can’t go to Hawaii this weekend.
The vast range of acceptable contexts for generalized contrastive focus demon-
strates rather clearly the undesirability of trying to generate the stress patterns of
sentences based on the contexts in which they are appropriate.
Chomsky (1976) gives an intriguing discussion of stress-related focus in
English. He suggests that, if we view a stress-focused phrase as binding a
variable at LF in its S-structure position, we can explain why focused NPs
behave like quantified expressions with respect to the determination of possible
33
A slight revision and extension of this definition will allow us to handle cases of multiple
focus. The trick is to indicate for such cases that the structures being contrasted are the same—
in that (a) they contain variables with parallel functions, and (b) the extracted foci can be
matched up with each other as n-tuples corresponding to the variables in the F-structures. The
examples that such an account should handle are well known, e.g. john hit susan and then she
hit him. We will not attempt to work out the details here.
34
An unfortunate consequence of this characterization of generalized presentational focus is
that it allows (i), below, in contexts where the sentence is used to initiate a discourse, but
excludes cases like (ii) and (iii) in similar contexts:
(i) The construction crew is dynamiting.
(ii) A strange thought just occurred to me.
(iii) A man appeared.
Here (i) has a reading (i.e. can occur in a context) in which both the subject and predicate are
presentationally focused. In (ii) and (iii), the predicate cannot be presentationally focused, since
it is not stressed. (The predicates here contain examples of ‘natural’ verbs of appearance; see
Guéron 1980 and Rochemont 1978 for discussion.) However, in a context where e.g. (ii) is being
used to initiate a discourse, the predicate of appearance meets our conditions for interpretation
as presentational focus. In our terms, then, it should be stressed; but as (iv) and (v) indicate, it
need not be, since either of these sentences could be used to initiate a discourse:
(iv) A strange thought just occurred to me.
(v) A strange thought just occurred to me.
Examples like (iv) might lead us to the conclusion that true verbs of appearance like occur or
appear need not be focused in order to be introduced. In other words, in (iv), used to initiate a
discourse, it must be c-construable that something has just occurred to the speaker, in order for
the predicate not to be focused. Given our definition of ‘c-construable’, this proposition must be
inferable from the mutual beliefs of speaker and hearer. Let us say that the mutual beliefs of
speaker and hearer include a set of principles of discourse, along the lines of Grice’s conversa-
tional maxims (Grice 1975). We might then assume that the c-construable proposition associ-
ated with (iv) in the context with which we are concerned falls under some version of Grice’s
Cooperative Principle. In other words, propositions with natural verbs of appearance are
c-construable as a function both of their usefulness in initiating discourse and of their inten-
sions. In contrast, examples like (ii–v) might be taken to indicate a deeper distinction between
NPs and predicates as focus, as suggested by both Bing and Ladd. We will not decide this
issue here.
Note that the wh-quantifier has this function only when it has wide scope (i.e.
over the matrix S). Consider a question with an LF equivalent, in relevant
respects, to wh-xi(F(ti)). Let us assume that wh-xi is identified as a focus in all
LFs of this type. A wh-focus may be seen as always satisfying the conditions for
interpretation as a presentational focus. Strictly speaking, what this means is
that the referent of the wh-phrase must not have been previously introduced
into the discourse. However, wh-phrases are not referring expressions: in
essence, a wh-phrase functions to bind an empty position in an F-structure
which the speaker intends the hearer to fill with a response of the appropriate
semantic type. In this respect, the wh-operator in F-structures is similar to the
º-operator: it serves temporarily to bind an otherwise free variable.
Consider in this regard the following examples, based on the work of Ladd.
(59) A: John speaks many languages.
B1: How many languages does he speak?
B2: How many languages does he speak?
B3: *How many languages does he speak?
(60) A: John is a great linguist.
B1: How many languages does he speak?
B2: How many languages does he speak?
B3: How many languages does he speak?
In (59), that John speaks n languages is c-construable on the basis of
A. Assuming an analysis of B1 in terms of unmarked accent placement, how
many designates a presentational focus in the sense that both B1 and B2 are
requesting that A specify a value for n in the c-construable proposition
mentioned above, on the assumption that no such value is c-construable.
B1 and B2 presumably have the following F-structure:
(61) wh-numberi (John speaks i-many languages)
B3 is inappropriate in the context of (59A), since languages is not interpretable
as a presentational, contrastive, or informational focus—which renders the
associated F-structure uninterpretable.
In (60), given the beliefs of the hearer, one of two relevant propositions
is c-construable on the basis of A: that John speaks many languages, in which
case B1 or B2 is appropriate; or simply that John speaks languages, in which case
B3 is appropriate.
Given our informal characterization of discourse structure and our defin-
itions for presentational and contrastive focus, it is a fairly straightforward
matter to explain the inappropriateness of contrastive focus in contexts where
certain beliefs do not hold. For example, if someone said John insulted Mary,
and no question had been asked about what John did to Mary—and if, in
addition, Mary had not been introduced into the discourse—then the inter-
pretation of focus as contrastive would yield a contradiction between this
aspect of the structure of the discourse and the fact that, in order to constitute
contrastive focus, S must believe that F(v/Foc) for some Foc ≠ insult. That is,
S must believe that John V-ed Mary had been introduced into the discourse,
which is inconsistent with the assumption that Mary had not been introduced
into the discourse.
Let us now consider the echo intonation, which we represent as ‘?’. In Foc
(F(t)), echo intonation indicates that F(t/Foc) is c-construable, and that there
is something surprising or noteworthy about F(t/Foc) that particularly has
to do with Foc. In this respect, echo focus is like emphatic focus: the
main difference seems to be that the former, but not the latter, requires that
F(t/Foc) be c-construable. So only emphatic focus can be used in presenting
an exciting piece of information as the beginning of a conversation:
(62) a. Guess what! My mother is coming to visit.
b. *Guess what! My mother? is coming to visit.
It appears that a sentence with echo intonation can have precisely the F-structure
of the preceding sentence, being neither presentational nor contrastive (cf.
fn. 27):
(63) H: Your mother is coming to visit.
S: My MOTHER? is coming to visit. That’s {wonderful! / impossible!}
Finally, let us consider instances of so-called normal stress. Such a stress
pattern may come about in two ways. First, there may be no assigned s in
S-structure. Second, an s assigned in S-structure may fall on a rightmost
branch. In either case, the result will be a stress peak on the rightmost terminal
of the surface string.
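The rightmost-peak outcome just described can be sketched as follows (toy encoding ours): labeling every rightmost sister s and the others w, in the spirit of Neutral Accent Placement, puts the stress peak on the rightmost terminal.

```python
# A small sketch of the 'normal stress' case: with no marked s (or s on
# a rightmost branch), neutral labeling yields a rightmost stress peak.

def neutral_labels(node):
    """Label each node's last daughter s and its other daughters w."""
    kids = node.get("children", [])
    for i, kid in enumerate(kids):
        kid["label"] = "s" if i == len(kids) - 1 else "w"
        neutral_labels(kid)
    return node

def stress_peak(node):
    """Primary stress: follow the s-labeled (here: rightmost) daughter
    down to a terminal."""
    while node.get("children"):
        node = node["children"][-1]
    return node["word"]

sent = {"children": [
    {"word": "what"},
    {"children": [{"word": "did"}, {"word": "you"}, {"word": "buy"}]}]}
neutral_labels(sent)
peak = stress_peak(sent)  # 'buy'
```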
In the first case, there is no marked focus; and it is possible that there is no
constructional focus either. Can there be a sentence with no focus? Such a
sentence would neither add new information to the discourse nor dispute any
aspect of the discourse. Nor would it be a repetition of a prior sentence, since
such a repetition would also repeat the preceding F-structure. We therefore
rule out by convention the possibility that a sentence has no focus at all; a
derivation without an F-structure is ill-formed.
In the second case, there is a possibility that the node identified as the focus is
the highest, root S. Can an entire sentence be a focus when it is not embedded?
In fn. 31 above we suggested that FA will not assign such an F-structure; thus we
will have a derivation without an F-structure if the node S is chosen as α in an
application of FA.
35
Our rules for interpretation of focus do not mention topic, theme, or presuppositions.
As regards the last, we agree with Sperber and Wilson (1979) that focus structures are not used to
define presuppositions. Concerning the first two notions, our rules indicate that, if some phrase
meets the conditions for interpretation as focus of a particular type, then it must be specified as
a focus in F-structure, and vice versa. As demonstrated in Chafe (1976) and Reinhart (1981a), a
topic need not be ‘old information’; it can also function as a focus in an appropriate discourse
context. Reinhart argues persuasively that the notion ‘topic’ is unrelated to focus or old infor-
mation; i.e. the topic of a sentence is not everything that is unfocused. Rather, the topic of a
sentence is what that sentence is ‘about’, independently of whether or not that constituent
happens also to be a focus.
a
See also Rochemont and Culicover (1990).
but are not more highly marked. Although a number of ‘normal stress’
proponents have attempted to respond to these criticisms (e.g. Bresnan
1972; Jackendoff 1972; Ladd 1980), none has in our view proved particularly
successful.
Our present analysis is closest in theory to Jackendoff (1972) and Williams
(1981). It differs from them in explicitly associating surface-structure represen-
tations with prosodic structures (autonomy of stress) and in not attempting to
determine presuppositions on the basis of particular choices of focus which
define the stressed constituents of a phrase. In our view, specific FAs determine
not presuppositions, but contextual conditions under which the associated
sentences would be deemed appropriate (autonomy of focus). In certain
respects, our analysis thus also bears a superficial resemblance to that of
Sperber and Wilson (1979) and Williams (1981), in maintaining the Autono-
mous Systems view of Hale et al. (1977).
Let us consider the specifics of certain proposals. Bresnan (1971) adopts the
view that normal stress is to be distinguished from emphatic or contrastive
stress, and that the location of normal stress is predictable by rule from the
syntactic structure, specifically by the Nuclear Stress Rule of Chomsky and
Halle. Her proposal differs from that of Chomsky and Halle in requiring the
NSR to apply to underlying, rather than surface, representations. In their
critiques of her article, both Berman and Szamosi (1972) and Lakoff disagree
with Bresnan’s characterization of ‘normal’ stress; they suggest that, on
consideration of a broader class of normally stressed sentences, her analysis
is faulty. Bresnan (1972) responds to these criticisms by introducing a distinc-
tion between focus-related normal stress and other instances of normal stress;
she suggests that focus-related stress is specified by the operation of a rule
quite distinct from the NSR.
Bolinger (1972) rejects Bresnan’s analysis on completely different grounds,
arguing that sentence stress (defined as ‘accent’ in Bolinger 1961) is a function
of semantic or emotional highlighting.36 Accent goes on the ‘point of infor-
mation focus’ (cf. Bolinger 1958); i.e. stress on any lexical unit in a sentence
serves merely to highlight that item as an indication of the speaker’s intent in
communication. Given that no structurally independent characterization of
‘normal’ sentence stress can be given, Bolinger continues, it is entailed that no
systematic structural description of sentence accent placement is possible.
36
We agree with Bolinger that there is a notion of ‘normal’ stress defined at the word level.
Because of well-known cases like I said information, not deformation, we can see that, to define
lexical contrast, we must have access to some characterization of the normal stress pattern of a
word, as well as its segmental make-up and/or syllable structure.
Consistent with this is the argument that the notion of contrastive accent is
an illusion (cf. Bolinger 1961).
Schmerling—and, to some extent, Ladd—agrees with Bolinger that accent
is not structurally predictable; however, both argue that Bolinger’s semantic
characterization is inaccurate. Bolinger’s writings make several allusions to
the notion of semantically neutral accent placement, one which does not
define presuppositions—i.e. a context-free intonation. Both Schmerling and
Ladd present convincing arguments that all semantically determined accent
placements induce contexts, and hence that no independently motivated
characterization of ‘semantically neutral’ accent placement can be given.
Both also find elusive the proposition that accent and ‘point of information
focus’ should be identified with the unit in the sentence with the ‘greatest
relative semantic weight’.
Schmerling (1976) offers an alternative analysis which recognizes two
distinct sentence types: ‘news’ sentences (e.g. JOHN died) and topic-comment
sentences (e.g. John DIED). Her claim is that each type is identifiable in terms
of discourse function, and that distinct principles of accent placement apply
for each. (Thus we disagree with Schmerling with respect to the autonomy of
stress and of focus.) In news sentences, predicates receive lower stress than
their arguments, regardless of relative linear arrangement (p. 82). In topic-
comment structures, topic and comment receive equal stress at some level of
representation (p. 94); an independent principle (p. 86) then determines the
heaviest relative stress as that which is rightmost.37
Ladd takes issue with Schmerling’s analysis. Her arguments that all accent
placements induce contexts, he notes, show only that no notion of seman-
tically neutral accent placement is definable. In the spirit of Chomsky (1971)
and Jackendoff (1972), he suggests a well-defined notion of syntactically
neutral accent placement (our rule (9)), namely one which ambiguously
identifies a number of focus constituents in a sentence. For example, in a
sentence like (67), ambiguity exists in the scope of focus within the NP that
contains the accent (cf. discussion in §4.2 above):
(67) Was he warned to look out for an ex-convict in a red shirt?
The advantage of this approach is that no implication is made that all
sentences potentially exhibit neutral accent placement; neutral accent is
37 Schmerling also argues against the view of Chomsky and Halle that all rules which relate to
pronunciation constitute an interpretive component of grammar. Her argument is based on her
analysis of stress assignment, in terms of phonological principles, as sensitive to discourse
considerations. Since these principles do not depend in any direct way on syntactic structure,
she takes the analysis to argue strongly against the Chomsky–Halle version of the Autonomous
Systems view. Note that, under our analysis, Schmerling’s argument is without force.
118 explaining syntax
structure. However, she further suggests that, given the theoretical vagueness
of the precise relationship of syntactic and metrical structures, one might
appeal to the phenomenon of Default Accent to predict metrical structure.
The circularity of her proposal is evident.
On neither Ladd’s nor Bing’s account, then, is the notion of Default
Accent rigorously defined. In our view, any approach that attempts to define
a notion of relative accentability will fail. Default Accent, if it exists, must
be structurally definable. It is our claim, however, that the need for such
a notion is obviated under a complete analysis of structurally defined accent
placement.
Accent placement is thus seen as a formal matter with consistent and
predictable interpretive results. We have presented an analysis that, with certain
well-defined exceptions, formally characterizes the association of primary
stress and focus in English sentences.
5
Remarks on Chapter 5
This chapter is concerned with the problem of finding empirical evidence to
support the hypothesis of empty NPs such as PRO in control constructions.
We argued that no such evidence could be found, and that the entire motiv-
ation for PRO was theory-internal and driven by the desire to assign uniform
syntactic representations to constructions that share semantic properties.
(This methodology of Uniformity has been employed widely in the develop-
ment of contemporary generative grammar, as discussed at length in Culi-
cover and Jackendoff 2005: chs 2 and 3.) The approach to control argued for in
this chapter bears a closer resemblance to the treatment of control in HPSG,
LFG, and Simpler Syntax. On this view, control is not a binding relationship
between NPs in the syntax, but a matter of interpretation that is constrained
partly by syntactic structure and partly by the particular lexical items.
The analysis diverged from standard approaches in proposing a semantic
account of control in terms of ‘R-structure’. This is a level of representation
that incorporates information about the referents of syntactic arguments and
their thematic relations. R-structure proves to be a restricted variant of
Jackendoff’s Conceptual Structure; hence our account here overlaps in
important respects with semantic accounts of control such as Dowty (1985), Sag and
Pollard (1991), Culicover and Jackendoff (2001; 2005; 2006), and Jackendoff
and Culicover (2003).
* [This chapter appeared originally in Language 62: 120–53 (1986). It is reprinted here by
permission of the Linguistic Society of America. This work was funded in part by grants from
the National Science Foundation and the Sloan Foundation. We gratefully acknowledge their
support. We would like to thank Joe Emonds, Ann Farmer, Eloise Jelinek, Chisato Kitagawa,
Fritz Newmeyer, Richard Oehrle, and Geoffrey Pullum for their very helpful comments. The
authors’ names appear in alphabetical order.]
control, pro, & the projection principle 121
5.1 Introduction
This paper presents a theory of control (predication), in terms of thematic
relations, which makes no use of the element PRO in the syntax. Important
consequences of the theory are that the θ-criterion must be relativized to
particular local domains, and the Projection Principle cannot be maintained.
A number of syntactic arguments against PRO are summarized, and the
arguments of Koster and May (1982) in favor of PRO are addressed. It is
concluded that, given a thematic relation-based account of predication, the
Projection Principle in its current form is not a useful postulate in the theory
of grammar.
In previous work (Culicover and Wilkins 1984, henceforth LLT), we have
assumed that infinitives in general are not derived from S′ complements. This
means that we question the existence of the abstract empty NP, usually
referred to as PRO, which is assumed in much current work to be the syntactic
subject of embedded infinitival complements. The issue of the existence of
PRO as a syntactic element is of great importance, given its central role in the
theory of Government and Binding (Chomsky 1981a and much other work).
PRO is necessary to avoid violations of the Projection Principle (hereafter
PrP), which states: “Representations at each syntactic level (i.e. L[ogical]
F[orm] and D- and S-structure) are projected from the lexicon, in that they
observe the subcategorization properties of lexical items” (Chomsky 1981a:
29). In particular, this means that a verb that requires a propositional comple-
ment in LF would require a sentential complement at D- and S-structure.
Where such a verb apparently occurs in the syntax with a bare infinitive, the
PrP requires that the infinitive be analyzed as a full S. This untensed S, with no
overt subject, would have PRO as its subject—at least in English (and similar
languages) where the subject is not optional in the expansion of S. The PrP is
stated formally as follows (Chomsky 1981a: 38):
(i) If β is an immediate constituent of γ in [γ . . . α . . . β . . . ] or
[γ . . . β . . . α . . . ] at Li, and γ = ᾱ, then α θ-marks β in γ.
(ii) If α selects β in γ as a lexical property, then α selects β in γ at Li.
(iii) If α selects β in γ at Li, then α selects β in γ at Lj.
In our theory we do not assume the PrP; specifically, we take issue with
statements (ii) and (iii). This means that not all thematic information—the
θ-marking of (i)—has an overt syntactic representation in terms of distinct
categories at each syntactic level. In other words, subcategorization require-
ments can be satisfied without necessarily presupposing that the logical/
semantic requirements of a verb have a one-to-one correspondence with the
syntactic categories in syntactic structure. Because of the theory of coindexing
which we present here, a given NP may bear a thematic role with respect to
more than a single verbal (or relational) element. Then, because the PrP is not
assumed, there is no reason why the mapping from syntactic to semantic
structure cannot introduce arguments, or rather representations of argu-
ments, as under conditions of predication.
The advantage of our approach over one which includes the PrP is that the
non-syntactic nature of PRO is immediately explained. That is, the apparent
inconsequentiality of PRO for many syntactic phenomena ceases to require
explanation. PRO is a logical element, not a syntactic one.
Koster and May (1982) claim to demonstrate conclusively the theoretical
advantages of assuming that infinitival complements contain a syntactic
PRO subject, and are sentential. Our opposing argument begins, in §5.2, by
presenting our theory of predication, which makes the no-PRO theory inter-
esting. In §5.3, we discuss some arguments against the syntactic element PRO.
In §5.4, we summarize our response to Koster and May’s arguments. Our
general conclusion (§5.5) is that little syntactic evidence exists, if any, in
support of PRO, and that therefore the PrP—including (ii) and (iii)—is not
supported as a useful postulate of the theory of grammar.
It is important to point out from the beginning that our theory of predica-
tion does not, in itself, constitute an argument against PRO or the PrP.
A theory could conceivably adopt our thematic conditions on coindexing
(presented in §5.2) without abandoning the syntactic PRO subject of untensed
clauses. Our case against syntactic PRO can only be evaluated by combining
our theory of predication with our syntactic arguments (§§5.3, 5.4).
1 We do not include here a point-by-point comparison of our theory either with Williams
(1980) or with any of the literature it has generated, because such a comparison would obscure
the larger issue which we mean to address. It will be clear to the reader that our approach owes
much to Williams’ insights about the relation between predicates and antecedents, and also that
both theories owe much to the earlier work by Jackendoff (e.g. 1972). After the completion of
this article, the unpublished dissertation of Rothstein (1983) was brought to our attention. Our
analysis would undoubtedly have benefitted from a consideration of that work.
2 The definition and treatment of predicates here is a revision of LLT, ch. 2. We mean this new
definition of predicate to be universal, but this does not necessarily mean that all languages will
have predicates of all categories. For instance, some languages do not have infinitives (e.g.
Modern Greek); there we would predict that ‘control’ would be accomplished differently,
perhaps in terms of the binding of an empty NP pro (distinguishing pro from PRO). We
would expect the conditions on binding to differ from those on coindexing, but to be sensitive
to some instantiation of the general Locality Condition of LLT.
In our definition of predicate, we leave open the correct treatment of predicate nominals, as
in Mary {is/became} a doctor. If, for independent reasons, a doctor must be classified as a direct
object, then the definition would have to be revised appropriately—e.g. to allow a direct object
to be a predicate just in case it is not assigned a θ-role. However, it may be that a doctor is not,
strictly speaking, a direct object.
3 The definition also excludes infinitival VPs and other predicational elements inside NP, e.g.
[NP Bill’s promise to go]. These must of course be accounted for, but we exclude them from
discussion here (see fn. 16).
a The theory suggested here anticipates Simpler Syntax, where the ‘deep grammatical rela-
tions’ are linked to conceptual structure arguments, on the one hand, and to syntactic configur-
ations on the other. Constructions such as passive and raising to subject are derived by mapping
the DGR to another DGR (in the spirit of Relational Grammar). So, for example, in the English
passive the deep Object is mapped to the Subject, and realized as the sister of VP. For discussion,
see Culicover and Jackendoff (2005: ch. 6).
4 The rule below is a generalization of the PS rules given in LLT, ch. 2. It will of course be necessary
to impose ordering restrictions on the complements of V—perhaps by an adjacency requirement on
thematic role assignment, or in terms of abstract case assignment (as in Stowell 1981).
The notation (XP)* is intended to designate a sequence of maximal projections, perhaps of
different categories.
b For an important proposal about the fine structure of thematic relations, and the condi-
tions under which a particular relation is assigned to a particular syntactic argument, see Dowty
(1991). We were unaware of Dowty’s work when our paper went to press.
c R-structure is a notational variant of (parts of) Jackendoff’s Conceptual Structure (Jackendoff
1990; 2002), if we take thematic roles to be strictly defined over CS representations.
5 The theory of predication presented here is a reformulation of LLT, ch. 2. It owes much to
the important treatment of control in terms of thematic relations by Jackendoff (1972). We differ
in certain respects from Jackendoff, but the basic insight of accounting for the predication
phenomena in terms of thematic relations is his.
Figure 5.1 [phrase markers for (a) John ate the meat raw, with the AP inside V1, and (b) John ate the meat nude, with the AP attached at V2]
Both raw and nude are predicates here because they are maximal APs
dominated by VP, and they bear no grammatical relation to the verb. In
Figure 5.1(a), the meat and raw are coindexed because they are bijacent,
6 Source might more accurately be called location or experiencer for some verbs; see
Gruber (1965), Jackendoff (1972), Nishigauchi (1984). It is clear that a deeper account of why
goals are excluded as antecedents would be desirable. However, a detailed discussion of
thematic roles would take us far beyond the scope of this paper.
7 The definition of ‘bijacent’ is based on the insight provided in the discussion of
‘c-subjacent’ in Williams (1980: 204, fn. 1).
and the meat is the theme of ate. The meat is assigned theme by raw; this
means that, in R-structure, the meat represents the theme of the domain
raw. In general, predicates assign theme to their antecedents. John is not a
possible antecedent because the predicate is not bijacent to it. In Figure 5.1(b),
however, because the predicate is in V2, it is bijacent to the subject, but not to
the object. Here John is the antecedent and theme of nude, because the
predicate is bijacent to it and it is the source of ate (or its location; see
fn. 5 above).
The examples in (6) are among those which we use in LLT to show that
the appropriate level for coindexing (in English) is before Dative Movement
at D-structure. Because we assume that there are no rules of NP movement,8
these same examples illustrate the importance of thematic conditions (5a):
(6) a. John made Billi a good friendi.
b. Johni made a good friendi for Bill.
c. Johni made Bill a good friendi.
The relevant phrase-markers for (6) are given in Figure 5.2.
Figure 5.2 [phrase markers for (6a–c)]
8 In LLT, ch. 3, we consider the alternative of base-generating passives, but do not adopt such
an analysis, for reasons dealing with the theory of predication. Our revised theory of predication
here resolves the inconsistencies in the base-passive theory pointed out in LLT.
Figure 5.3 [phrase markers (a–e) for the examples in (7), showing the two attachment sites for PP and AP]
Examples (7a–e) show the two different senses of the verb load. As indi-
cated in Figure 5.3(b), load in (7b) is structurally (and semantically, of course)
similar to the verb put: the PP is inside V1. The other sense of load is indicated
in Figure 5.3(a), where the PP is a daughter of V2. Given the phrase-markers in
Figures 5.3(a–c) and the bijacency requirement on Coindex, the grammatical-
ity judgments indicated in (7) are readily explained.9
The importance of the thematic conditions on Coindex is again illustrated
below:
(8) a. Johni sent the book off {nervous/happy/a total wreck}i.
b. *Johni received the book {nervous/happy/a total wreck}i.
c. John got the presidenti angryi.
d. *Johni got the present angryi. (got = received)
(9) a. The bedi was slept in {unmade/with dirty sheets}i.
b. *Billi was talked about {nude/angry/in the living room}i.
9 Note that, in each case in Figure 5.3 where PP and AP are at the same height, the order could
be changed (in accord with the base rule in (2)) without affecting the grammaticality judgments.
There seems to be stylistic re-ordering within both V2 and V1 in English. Note also that, in a
structure like Figure 5.3(e), the subject would be a possible antecedent for the predicate because
the predicate is bijacent to it; e.g. Johni loaded the wagon with hay [full (from a big meal)]i.
While the Bijacency Condition appears to capture a significant generalization, it is peculiar
in being a syntactic condition on a relation that holds at R-structure. Because R-structure is a
level of semantic representation, a strict version of the Autonomy Thesis (cf. Hale et al. 1977)
should disallow it. Because all other aspects of predication are expressed strictly in terms of
R-structure, we would expect that, ultimately, the Bijacency Condition could be also. Aside
from the issue of the strict autonomy of levels, there are independent motivations for a
reformulation of the bijacency requirement. Although we do not now have a precise reformu-
lation of the Bijacency Condition, we suspect that it may be nothing more than a byproduct
of the way in which syntactic structures are compositionally translated into semantic
representations.
(10) a. Johni was found {angry/nude/in the forest}i.
b. *Johni was looked for {angry/nude/in the forest}i.
In none of these examples does the predicate bear a thematic role (because in
no case does it bear a DGR, or get assigned a role by an idiosyncratic verb). In
(8a), the antecedent of the predicate is the source of the verb send. In (8c), the
antecedent is the theme of get. The ungrammaticality of (8b) and (8d) results
from the fact that, in both cases, the only possible antecedent—the subject—is
neither theme nor source, but rather goal: this is a violation of the thematic
condition on Coindex. In neither (8b) nor (8d) can the antecedent be the
theme, book or present, because of the obvious conflict in semantic features. Of
course, (8d) is grammatical when we take got to have an active sense (similar to
bought), since on this sense the antecedent is a source.10
We again see a violation of the thematic conditions in (9b) and (10b). In
these cases, the passive subject and only possible antecedent for the predicate
is the goal, neither a source nor theme. Talk about and look for are not
typical examples of transitive verbs, in that their objects do not ‘undergo’ the
action of the verb. These are special source-goal constructions which lack
themes (see Gruber 1965 for relevant discussion). This distinction between
theme and goal assigned to an object is relevant in the semantics of many
verbs. It correctly characterizes the difference between the role assignments in
such pairs as look at vs. look for, and watch vs. watch for or seek (Gruber 1967
specifically takes into account the prepositions that occur with various
verbs).11 The second predicate of each pair has an object which is not affected
10 It is important to distinguish predicates with referential NP antecedents from adverbs
which have scope over some clausal domain, e.g. John received the book {nervously/with good
humor}. In these cases, John is not the antecedent of the adverb—i.e., John is not the theme of
nervously or with good humor; rather, John, along with the VP, falls within the scope of the
adverb. Our definition of predicate, including ‘Xmax dominated by Vn’, is meant to exclude
these adverbs. Evidence that this exclusion is well-motivated comes from grammatical sentences
like It rained furiously vs. *It rained furious: here the adverb, but not the predicate, is
grammatical because there is no referring antecedent that can bear the theme role (i.e. it here
has no referent).
11 It might be that a generalization is missed about the distribution of theme if some
sentences have only source-goal. In that case, we would have to distinguish the usual
theme relation from that in examples like (9b) and (10b). There might be a different role
assigned to ‘themes’ which are not directly affected by their assigning verb. We predict that, as
more work is done on thematic relations, the set of different roles will continue to grow larger;
by the action of the verb. Therefore we get grammatical sentences of the forms
(11a,b), but not the corresponding negation and passive:
(11) a. We looked for Mary but didn’t see her.
b. We looked for midgets.
(12) a. *We {looked for/sought} Mary nude, (but we didn’t see her).
b. *At least one tall man was {looked for/sought (out)} angry.
In (12a,b), just as in (9b) and (10b), the goal is ruled out as the antecedent of
a predicate. In the grammatical examples (9a) and (10a), the passive subject
which is the antecedent is in each case the theme of the verb.
e.g. it also seems to us necessary to distinguish recipient from goal, and location from
source. We leave this topic for future investigation.
d
The treatment of control in this section and the next is a semantic account of control. It
anticipates much of the typology and analysis of Simpler Syntax (see Culicover and Jackendoff
2001; 2006; Jackendoff and Culicover 2003). The Coindex mechanism used to express the
control relation is stated over R-structures, which is the counterpart to CS in the later approach.
In spirit the current treatment of control is very close to that of HPSG (Sag and Pollard 1991),
which was a major influence on the Simpler Syntax analysis. The main difference is that the
current approach attempted to unify control and secondary predication. Since secondary
predication is not sensitive to the lexical head, while control is, it appears likely that this
unification is in the end not feasible.
12 This idea, of course, is not new; it has been argued e.g. by Brame (1975), Hasegawa (1981),
and Bresnan (1982b). Because these accounts do not base the control theory on thematic roles or
directly address the PrP, comparison with our theory would fall outside the scope of this article.
It is important to reiterate at this point, before a detailed discussion of VPs, that some bear
DGRs—and therefore thematic roles—while others do not. VPs without thematic roles would
include Bill saw John [VP waiting for a bus] or I took a taxi (in order) [VP to get there on time].
These VPs are subject to the thematic condition (a.i) of Coindex and to the bijacency require-
ment; they are not included in the following discussion.
(13) a. John {permitted/allowed} Billi [VP to go]i.
b. Johni {expected/wanted/tried} [VP to go]i.
c. John {wanted/expected} Billi [VP to go]i.
d. John {believed/hoped for} Billi [VP to be the winner]i.
If Bill is coindexed with to go, then the proposition Billi [to go]i is the theme,
and John is the experiencer; this yields a well-formed R-structure.
Example (13d) presents two more cases where the same type of derivation is
relevant. The subject John is the source. We assume that the matrix verbs,
which both take propositional themes, assign no roles to their NP object or
to their VP complement. Coindexing is free; however, Bill must be the
antecedent of the infinitival VP. Bill bears no role with respect to the matrix
verb; thus it must receive a role from the infinitive, in order to avoid a
violation at R-structure. The proposition formed by the coindexing, Billi
(to) be the winneri, is then theme of the main verb.
For neither (13c) nor (13d) are the thematic conditions on R-structure
relevant. The matrix verb here assigns theme to the proposition that includes
the predicate; therefore, by definition, the predicate does not lack a role, and
thematic condition (5a.i) does not apply. However, the predicate is not in
itself the theme (or goal), and therefore neither condition (ii) nor (iii) is
relevant.
In contrast, simple AP predicates—such as raw in John ate the meat raw—
will have no thematic role in R-structure; therefore thematic condition (5a.i)
applies.
What is relevant in those cases when either the predicate or the antecedent
lacks a thematic role is the bijacency requirement. Where the R-structure
indicates that one of these elements has no role, then the predicate must be
bijacent to its NP antecedent in the associated syntactic structure. Because
each R-structure is associated with a D-structure (= NP-structure), both the
thematic and syntactic information is available for the well-formedness
conditions.
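The interplay just described—a predicate or antecedent lacking a thematic role in R-structure triggering the bijacency requirement in the associated syntactic structure—can be rendered as a toy sketch. The following is an illustrative encoding of ours, not part of the original analysis: it represents an R-structure as a list of (argument, roles, domain) triples in the style of (14) below, treats bijacency as a precomputed boolean, and sets aside the thematic conditions on the choice of antecedent.

```python
# Toy sketch (not from the original article): R-structure as a list of
# (argument, roles, domain) triples, e.g. ("Bill", {"theme"}, "permit").

def roles_of(r_structure, element):
    """Collect every thematic role `element` bears, across all domains."""
    return {role for arg, roles, dom in r_structure if arg == element
            for role in roles}

def coindexing_ok(r_structure, predicate, antecedent, bijacent):
    """Hypothetical well-formedness check: if either the predicate or its
    antecedent lacks a thematic role in R-structure, the predicate must be
    bijacent to the antecedent in the associated syntactic structure."""
    if not roles_of(r_structure, predicate) or not roles_of(r_structure, antecedent):
        return bijacent
    return True

# R-structure for 'John ate the meat raw' before coindexing: the AP 'raw'
# bears no role, so coindexing it with 'the meat' requires bijacency.
r = [("John", {"source"}, "eat"), ("the meat", {"theme"}, "eat")]
print(coindexing_ok(r, "raw", "the meat", bijacent=True))   # True
print(coindexing_ok(r, "raw", "the meat", bijacent=False))  # False
```

The boolean `bijacent` parameter stands in for inspecting the D-structure configuration directly, since both the thematic and the syntactic information are available to the well-formedness conditions.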
In each of the cases of coindexing exemplified in (13), the infinitival VP
assigns the role theme to its antecedent by the thematic-role assignment
algorithm. That other roles may be assigned by coindexing is illustrated in
(14), where we give the R-structures for each sentence (for readability, the
domain in each case is identified by the verb that defines it):
(14) a. John permitted Billi [to kick the dog]i.
{<John, {source}, permit>,
<Bill, {theme}, permit>,
<[to kick the dog], {goal}, permit>,
<Bill, {source, agent}, kick>,
<the dog, {theme, patient}, kick>}
b. John wanted Billi [to kick the dog] i .
{<John, {source}, want>,
<Billi [to kick the dog]i, {theme}, want>,
13 There are of course many other conditions which could rule out a coindexing like that in
(15b), such as some version of the Specified Subject Condition (e.g. Chomsky 1973), or some
version of the Variable Interpretation Convention (Wilkins 1977; 1980). What is of interest to us
for the moment is simply that there is some requirement of ‘locality’ which is relevant for the
coindexing of infinitives.
14 For a treatment of control that would give similar results, although developed within a
different framework, see Farmer (1984).
15 Even where there is a by-phrase, the passive with promise is ungrammatical: *Bill was
promised to leave by John. It is possible that the object of by is in a different thematic domain in
R-structure at the point at which Coindex applies. In other words, by might not directly assign
to its complement any thematic role governed by the verb. The role of the complement of by
would be themeby, and the interpretation of this argument as having the same roles as the
subject of the active verb would be determined at a later stage. This account would be in the
spirit of the Thematic Hierarchy Condition of Jackendoff (1972).
16 Of course we are not the only researchers in generative grammar who assume the existence
of a non-transformational passive; others include Freidin (1975), Bresnan (1978; 1982c), Wasow
(1980), Brame (1978), Koster (1978a), Bach (1980), and Keenan (1980). Gazdar (1981) and all
work in Generalized Phrase Structure Grammar also presuppose the PS generation of the
passive.
roles. This would be relevant both for grammatical subjects and for antecedents
determined by coindexing. NP movement need not apply in an embedded
domain before coindexing takes place. In both cases in (17), Mary is assigned
theme, patient by the infinitival VP, just as it is in Mary was examined.
The same explanation is relevant where the subject of the matrix is the
antecedent of an infinitival passive VP:
(18) John wanted to be arrested.
Because to be arrested is the theme of want, the antecedent is the source
John. By the coindexing, John is assigned theme, patient of to be arrested.
These roles are assigned because of the passive morphology that occurs on
arrest (cf. John wanted to arrest Bill, where John is assigned agent of arrest).
17 Another case of an infinitival VP in NP would be John sent Mary [NP a book [VP to read]]. We
believe that here a separate coindexing for infinitival modifiers applies in NP; as Nishigauchi
(1984) shows, the goal of the main verb is often the antecedent of the infinitive. An analysis like
Nishigauchi’s is readily incorporated into our theory—except, of course, that we assume no PRO.
18 At least one set of examples is not correctly accounted for by our theory. These involve
verbs of ‘saying’:
(i) John said to Mary [to arrive on time].
essentially the same facts as one like that of Williams,19 which assumes PRO
(also see Manzini 1983 for relevant discussion).20
In Spanish, it does not seem possible to account for the control of infini-
tives in strictly configurational (syntactic) terms. As in English, there seem to
be thematic well-formedness conditions. There is, however, an interesting
difference between the two languages with respect to control.
In English, some NP in the sentence is, in general, the controller of the
infinitival VP. The arb interpretation arises only under the restricted circum-
stances pointed out above—namely, where the VP has no index because it is
not locally coindexed with a referring NP, or where the infinitive is a subject:
(23) a. Maryi {wants/expects/asks} [to leave]i.
b. Mary {wants/expects/asks} youi [to leave]i.
(24) a. Mary {sees/permits/makes} youi [(to) leave]i.
b. *Mary {sees/permits/makes} [(to) leave]ARB.
By contrast, in Spanish, the interpretation of an embedded infinitival VP
often involves a controlling NP that is not overtly indicated in the syntax:
(25) a. Ana tei {recetó/permitió/vio} saliri.
Ann 2.SG {prescribed/permitted/saw} to.leave
b. *Anai {recetó/permitió/vio} saliri.
c. Ana {recetó/permitió/vio} salirARB.
But not all Spanish verbs allow this arb interpretation of the complement:
(26) a. *Ana tei {quiso/esperó/decidió} saliri.
Ann 2.SG {wanted/expected/decided} to.leave
‘Ann {wanted/expected/*decided} you to leave.’
b. Anai {quiso/esperó/decidió} salirARBi.
Ann {wanted/expected/decided} to.leave
‘Ann {wanted/expected/decided} to leave.’
c. *Ana {quiso/esperó/decidió} salirARB.
Ann {wanted/expected/decided} to.leave
‘Ann {wanted/expected/decided} to leave.’ (ungrammatical in English on the ARB interpretation)
(28) {Permitió/Recomendó/Escuchó} lamentar.
{permitted/recommended/listened to} 3.SG. to.lament
*‘S/he {permitted/recommended/listened to} (to) lament.’
21 It is important to note that we distinguish an unspecified NP in R-structure (as we are
referring to it here) from an NP in syntactic structure which dominates no lexical material (i.e.
[e]), and which must be lexically filled or bound in a well-formed derivation. The unspecified
NP is present in R-structure, but not in the syntactic levels.
22 One might postulate a PRO in object or clitic position, which then controls the embedded subject PRO; this would mean substantially revising the conditions on the distribution of well-formed PRO. Jaeggli (1982) does propose an object PRO for Spanish, but it has very different properties from the one which would be necessary here. Importantly, a PRO in object or clitic position would be an element with no syntactic properties, exactly as we demonstrate for subject PRO.
We prefer an account of control that uses R-structure to one based on lexical information (as
suggested by Bresnan 1982b), specifically because we can thus account for the difference between
English and Spanish that we have pointed out. It would seem ad hoc, in a lexical account of
control, to express the fact that English—but not Spanish—requires the controlling antecedent
to have a syntactic representation. This does not seem to be a fact best captured in terms of the
control relations of individual lexical entries: the general pattern of both English and Spanish
would nowhere be expressed.
Bresnan’s (1982b) account is similar to ours in that the control possibilities are determined
not by structural relations but in terms of the ‘function’ of the antecedent NP (though Bresnan’s
‘function’ is not directly equivalent to our thematic relations). A thorough comparison of the
two theories of control would be of interest in future research.
23 Arguments which appear in this section in abbreviated form, because of space limitations, are discussed in detail in LLT, ch. 2.
e In fact, in subsequent work, the notion of ‘governing category’ was ultimately abandoned and it was proposed that PRO has a special abstract Case, or no case; for a review of the issues, see Landau (2006).
(36) a. I {expect/want/would like/believe/find} Mary to be rich and Bill *[{expects/wants/would like/believes/finds}] Sam to be poor.
     b. John {expects/wants/would like} to eat the beans, and Mary [{expects/wants/would like}] to eat the potatoes.
The examples in (36a) are ungrammatical in our theory because what follows
the gap is the sequence NP VP. In (36b), what follows the gap is just VP. If, in
both cases, the single constituent [S NP VP] were involved (whether in
syntactic structure or in the phonological representation), then we would
have no account of the grammaticality difference.
5.3.3 Pseudo-clefts
S′ is, in general, well-formed as the focus of a pseudo-cleft:
(37) a. What John expects is that he will be elected President.
b. What John prefers is for Mary to be elected President.
If, in (38a,b), the focus constituent is an S′, we have no account of the
grammaticality distinction:
(38) a. What John expects is to be elected President.
b. *What John expects is Mary to be elected President.f
Assuming no PRO, (38a) has a VP as focus. Example (38b) is ungrammatical
because the focus of the pseudo-cleft is a sequence of two constituents: NP
VP.24 If (38a) contained PRO, it should be excluded for the same reason.
f Examples such as these strike me as much better now than when we wrote this article. It is conceivable that they are well-formed, and derived by omitting the complementizer for from sentences such as What John expects is for Mary to be elected President.
24 In GB theory, (38b) is ungrammatical because Mary would not be assigned case. While this accounts for the facts, we interpret this kind of explanation as evidence that only case-marked NPs—i.e. NPs other than PRO (or the trace of NP)—have any syntactic reality.
(40) *John expects PRO, who deserves it, to win the prize.
In a theory with no PRO, (40) is ungrammatical because there is no antecedent for the appositive.25
5.3.5 Conjunction
If PRO is an NP, we would expect it to conjoin with other NPs. However, it
does not, as shown in (41b,c).26
(41) a. I expect to go to Italy, and I expect John to go to Italy.
b. *I expect PRO and John to go to Italy.
c. *I expect John and PRO to go to Italy.
25 Case theory would be hard-pressed to account for the grammaticality difference between (39a) and (40), given that expect generally has optional S-bar deletion, and therefore permits both John expects PRO to win and John expects Bill to win. Presumably the application of S′ deletion should not be sensitive to the appositive in its context.
26 Case theory can account for the grammaticality facts here, as in (38b): expect either would or would not have S′ deletion—meaning that either both PRO and John would be assigned case, or neither would be. Again (see fn. 23 above), case theory highlights the fact that case-marked NP has a clearly syntactic character, whereas PRO does not.
27 The definition of ‘antecedent’ includes both grammatical ‘subject’ and the antecedent designated by coindexing. This is discussed in LLT, ch. 4, in terms of the ‘antecedent-internal e condition’ of Delahunty (1981).
The facts in (43) and (44) would be difficult to explain if, in both cases, the
embedded clause were an S0 with a PRO subject. If there were a PRO subject,
we would expect Stylistic Inversion not to apply at all—given that pronominal
forms cannot themselves invert, and that they block other NPs from moving
into the VP over them:
(45) a. He sat on the stool.
b. *On the stool sat he.
(47) a. The man in the funny hat expects {him/himself} to sit on the stool.
     b. *On the stool expects {him/himself} to sit the man in the funny hat.
The grammaticality of (44b) not only argues against the pronominal
element PRO but, given the well-known constraint against the lowering of
constituents, also argues against the claim that the infinitive is an S′ (or S). In
our theory, the ungrammaticality of (47b), as compared with (44b), results
from the fact that the NP moved into the infinitive is not its antecedent.
We have presented here six different constructions which indicate the
disadvantages of assuming that PRO is a syntactic element (more are pre-
sented in LLT, ch. 2). The arguments associated with these constructions
would not be of so much interest if the theory containing PRO were the
only one to explain the control facts. We claim, however, that at least one
other explanatory theory, namely ours, does not use PRO. In the next section,
in discussion of Koster and May (1981), we address certain theoretical argu-
ments adduced in favor of PRO.
5.4.1 Wh-infinitives
K&M argue that, since wh-infinitives as exemplified in (48) must have COMP,
they must be S′s (and therefore would have PRO subjects):
(48) a. I wonder what to do.
b. a topic on which to work
However, COMP might be introduced under two types of nodes. If this means
an unwanted complication of the base, then we would also have to question
the analysis of NP and AP by Selkirk (1977) as both containing DET, on the
basis of examples like John knows {this/that} man and John is {this/that} tall. But in fact, what Selkirk’s observations point to is a generalization: if NP and AP are
both analyzed as [ + N], DET can be generalized as the specifier of [ + N]
phrases.
A similar generalization can be made for COMP. If it is supposed that
infinitival VP (or VP′) is the maximal projection of V, and that S′ is the
maximal projection of Modal [later Infl] (as suggested by McA’Nulty 1980,
Klein 1981, and Chomsky 1981a)—and if it is assumed further that Modal and
V share the feature [+V] (cf. Chomsky 1972)—then we have the generalization
that COMP is the specifier of [+V] phrases.28
K&M also argue that the introduction of COMP in both VP′ and S′ is undesirable, given that VP′ is not a bounding node with respect to subjacency. They note (p. 135) that the presence of COMP in VP′ cannot block configurations like the following (their (79)):
(49) *What2 does Mary wonder [VP′ to whom3 [to give e2 e3]]?
But there are other constraints which will block (49), including the Variable
Interpretation Convention of Wilkins (1977; 1980) and the Locality Condition
of LLT.
Finally, in this respect, note that an analysis of examples like (48) and (49)
need not turn on just the issue of VP′ vs. S′. Our theory (in LLT) in fact says
that these wh-phrases should be analyzed as NPs with [+wh] specifiers that
permit wh-Movement. This analysis is based on an adaptation of the ‘deverbalizing’ rules of Jackendoff (1977), where NP can be rewritten as [SPEC V″].
28 Under this account, a VP which occurs with tense would be Vmax-1. We do not in fact claim that infinitival VP has a COMP (except when the VP is an infinitival NP). We include this discussion simply to address the logic of K&M’s argument.
152 explaining syntax
The SPEC which is a sister of any V″ is then analyzed as COMP (as opposed to
DET), and permits wh-Movement.29
So far as we can tell, these are the only two arguments that K&M bring to
bear against the notion of COMP in VP′ which are relevant in light of our
proposed theory of VP coindexing.
29 Further evidence of this type of analysis of infinitival NPs can be found in Spanish. There is good evidence for the two following nominal structures, even where no wh-term is involved:
(a) [N″ [COMP el] [V″ [V″ [V′ hablar [PP [P con] [NP ellos]]]] ADV]] ‘the speaking with them constantly’
(b) [N″ [DET su] [N′ AP [N despertar] N″]] ‘his painful awakening of the town’
The structure like that in (a) is the one which permits wh-Movement. This analysis of infinitival
NPs is presented in Wilkins (1986). (See also Plann 1981 for a discussion of these infinitives.)
5.4.3 Pseudo-clefts
K&M argue that the grammar can be simplified if there is no VP′, because then it need only be stated that S′ can be focus of a pseudo-cleft. They fail to
note sentences like the following, which suggest that VP can function as a
focus if it is not tensed:
(52) a. What he did was feed the ducks.
b. What he wanted to do was feed the ducks.
Tensed VP cannot be a focus, but that fact has little to do with whether there is a VP′ constituent, since VP′ would not contain tense.
5.4.4 Extraposition
K&M argue for a simplification by pointing out that both S and so-called VP′
extrapose. They do not note that AP and PP can also extrapose, as shown in
(56)–(57):
(53) a. A book which we didn’t like appeared.
b. A book appeared which we didn’t like.
(54) a. A book on which to work appeared.
b. A book appeared on which to work.
(55) a. A problem to work on is on the table.
b. A problem is on the table to work on.
(56) a. A book bound in leather was on the table.
b. A book was on the table bound in leather.
(57) a. A book about armadillos has just appeared.
b. A book has just appeared about armadillos.
30 While these points about the base are of interest, K&M have glossed over some additional complexities that are important to consider. These are discussed in LLT, ch. 2.
The true generalization is not ‘Extrapose S′ from NP’, as K&M would conclude, but, rather, simply ‘Extrapose from NP.’
5.4.5 Coordination
According to K&M, infinitival complements conjoin with sentential comple-
ments, and therefore should be considered to be of the same category. They
give the following examples (p. 133):
(58) a. To write a novel and for the world to give it critical acclaim is John’s
dream.
b. John expected to write a novel but that it would be a critical disaster.
The same logic would lead to the conclusion that the complements are all PPs
or NPs, because for-to complements can be conjoined with PP, and that-
complements can be conjoined with NPs:
(59) a. John hopes for Mary to leave and for a miracle.
b. I believe your answer, and that you believe what you are saying.
c. That you were here last night, and John’s reaction when you told
him, surprised no one.
The argument from conjunction used in (58) to show that VP′ is the same category as S′ would lead to the conclusion that, in (59), S′ is NP or PP. Either
it is the case that conjunction does not provide a test for syntactic category, or
else there must be no problem with saying that all the conjoined constituents
are NPs. But presumably K&M cannot adopt this view (see Koster 1978b).
5.4.6 Construal
The strongest arguments for subjects in superficially subjectless clauses deal
with anaphora, coreference, and rules of construal in general. K&M point out
several facts that can be explained if these clauses contain a PRO subject. Two
important points must be made about this part of their discussion.
First, K&M’s approach is sufficiently problematic to warrant exploration of
the relevant constructions within alternative theories; e.g. such exploration
would seem necessary for Q-Float and for the correct construal of all. While
certain things can be adequately accounted for by a movement analysis of Q,
illustrated in (60), a number of problems remain. These can be exemplified
by (61).
(60) a. All the men tried to leave.
b. The men all tried to leave.
c. The men tried [PRO all to leave].
d. The men tried [to all leave].
31 In LLT, ch. 2, we show that there are also certain problems with PRO in the account of reflexives, especially reflexives inside NP.
32 Wilkins (1985) shows that our level of R-structure is in fact relevant for bound coreference, reflexivization, and related phenomena.
which cannot assume the PrP. It follows from the PrP that all verbs which have
a logical subject also have syntactic subjects:
. . . θ-theory requires that clauses with certain verb phrases (e.g. persuade John to leave
but not be raining or be a good reason for his refusal) must have subjects at the level of
LF-representation. By the projection principle, these clauses must have subjects at D-
structure and S-structure, either PRO, or the trace of an NP, or some phonetically-
realized NP. (Chomsky 1981a: 40)
33 Wasow presents a very similar thematic analysis in a lexical framework.
34 Believe behaves differently in other languages, e.g. Spanish and French. The equivalent of (68) is grammatical in Spanish, but strings of the form NP V NP VP (*Juan cree a Maria ser inteligente) are ungrammatical. The proper treatment of believe-type constructions in French and Spanish requires a full analysis of control in clitic languages; such a treatment is beyond the scope of this chapter.
In our theory, an example like (69b) is ungrammatical because try occurs with
too many arguments.35 Example (69c) is ungrammatical because the goal is a
proposition, in violation of the lexical specifications for the verb.
To summarize, both systems provide for a reduction of the categorial
component of the grammar. Both theories involve a certain cost. To sustain
this reduction, the (inviolable) PrP framework must include a system of
exceptional case-marking, with (sometimes obligatory) S′ deletion. The no-
PRO theory must permit fairly detailed lexical entries. The important point is
that the PrP is not supported just because it allows for a reduced categorial
component. For both theories, the base rules are needed only for the unpre-
dictable distribution of certain categories, e.g. prepositions.
35 Note that (69b) would be well-formed without the embedded VP, as in We tried Bill, just so long as exactly two roles are assigned. The situation with try is somewhat more complicated than we suggest in the text, because it is necessary to rule out *We tried Bill to leave, even when Bill and to leave are coindexed. In such a case, Bill would actually have a thematic role—the one which to leave assigns to its antecedent. We propose that the explanation in this case is that such a configuration of θ-roles and coindexing, where the antecedent lacks its own θ-role, forces a propositional interpretation, which is of course ruled out for certain verbs.
36 The question of whether PRO exists is logically independent of whether there is NP movement; however, we believe we have shown (LLT, ch. 3) that a no-PRO theory with NP-movement runs into unneeded complications. The only interesting theories in this respect have both NP-movement and PRO, or neither. In light of what we take to be syntactic evidence against PRO, we adopt the second alternative.
5.5.3 NP-trace
Third, the PrP is important in distinguishing NP-trace and PRO. Given
an antecedent-[e] pair, the PrP gives a principled account of when the [e]
is the trace of ‘Move α’ for NP, and when [e] must be PRO. If the antecedent is in a θ-position, and the [e] is the trace of a movement, then a violation of the θ-criterion will occur at S-structure, because the antecedent will have more than a single θ-role. This is illustrated in (73a)
below. Example (73b) shows that the D-structure would represent a
violation of the PrP, because it is not a projection of the lexical properties
of the matrix verb:
(73) a. Johni hoped ti to leave.
b. [e] hoped John to leave.
In (73a), John would have two roles at S-structure, one directly from the verb
hope and the other via the trace. In the D-structure, the PrP is directly violated
because there are too few arguments to satisfy the lexical requirements of the
verb hope. (Presumably nothing can be assigned to an empty node [e] where
no lexical insertion has taken place.)
Alternatively, if the antecedent is in a non-θ-position and the [e] is a PRO,
there will again be violations:
(74) John was seen PRO.
John in this case has no role whatever. The PrP therefore leads to a principled
distinction between PRO and the trace of NP.
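The decision procedure that the PrP induces for an antecedent-[e] pair can be sketched in a few lines. This is a schematic illustration only, with a function name and boolean encoding of my own; it simply restates the reasoning in (73)–(74):

```python
# Schematic sketch of the PrP-based PRO/NP-trace distinction:
# if the antecedent sits in a theta-position, taking [e] as a trace
# would give the antecedent two theta-roles (a theta-criterion
# violation), so [e] must be PRO; if the antecedent is in a
# non-theta-position, PRO would leave it with no role at all,
# so [e] must be NP-trace.

def classify_empty_category(antecedent_in_theta_position: bool) -> str:
    if antecedent_in_theta_position:
        return "PRO"       # e.g. (73a): John hoped [e] to leave
    return "NP-trace"      # PRO here would yield (74): *John was seen PRO

print(classify_empty_category(True))   # PRO
print(classify_empty_category(False))  # NP-trace
```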
In our theory, the issue does not arise because there is no PRO, nor is there
a trace of NP-movement. Even if there were a rule of NP-movement, for
reasons of learnability (see LLT, ch. 5)—and because there is no phonological
evidence for NP-trace (see Culicover and Rochemont 1983)—movement to an
argument position could not leave a trace. Movement to COMP or FOCUS,
however, must leave a trace. Here again, in our theory, the PrP is not
necessary.
5.5.4 Acquisition
Finally, and perhaps most importantly, the PrP implies that acquisition can be
based on the learning of lexical items. This issue is not addressed directly in
the literature on GB theory; however, Chomsky states (1981a: 31):
The grammar of a particular language can be regarded as simply a specification of
values of parameters of U[niversal] G[rammar], nothing more. Since the projection
principle has the consequence of substantially reducing the specification of the
categorial component for a particular grammar, it has corresponding implications
for the theory of language acquisition. Someone learning English must somehow
discover the subcategorization features of persuade, one aspect of learning its mean-
ing. Given this knowledge, basic properties of the syntactic structures in which
persuade appears are determined by the projection principle and need not be learned
independently. Similarly, a person who knows the word persuade ([and] hence knows
its lexical properties, specifically, its subcategorization features) can at once assign an
appropriate LF-representation and S- and D-structure when the word is heard in an
utterance, or in producing the word, and will recognize the sentence to be deviant if
other properties of the utterance conflict with this assignment. Hence languages
satisfying the projection principle in their basic design have obvious advantages
with respect to acquisition and use.
far as to outline how a formal proof would proceed—we also assume that
learning is based on the acquisition of information about lexical entries. The
learnability problem is cast in terms of the learning of the correct assignment
of thematic roles and grammatical relations to the correct NPs. Although we
do not assume the PrP for the learnability model, we believe that the plausi-
bility of degree-0 learning, as we conceive it, is demonstrated. We therefore
feel confident in claiming that the PrP also fails to give a unique characteriza-
tion of a theory of learning based on lexical information.
5.6 Conclusion
We have shown here that an interesting theory of control exists which makes
no use of PRO in the syntax. To do this, we have had to give detailed
consideration to two of the basic principles of the GB theory: the Projection
Principle and the θ-criterion. We think we have demonstrated that neither
principle is necessarily supported by the syntactic data, given an alternative
theory based on the Locality Condition of LLT and on the level of R-structure
with the particular properties which we postulate for it. In addition, since our
theory has been developed with the specific goal of determining a plausible
learning theory—and since we take learnability requirements to be crucial in
the explanation of the structure of linguistic theory—we believe our work to
be particularly relevant at the explanatory level.
6
Negative curiosities
(1982)*
Remarks on Chapter 6
I was motivated to write this article by the idea being entertained in the late
1970s that ‘stylistic’ rules such as extraposition have no effect on the logical
form of a sentence, although they do have consequences for superficial
constituent ordering. This seemed to me at first sight to be wrong, because
of counterexamples resulting from extraposition of a negative over any, e.g.
*Pictures were on any of the tables of none of the men, which was my original
concern in sketching out this paper. Over time the paper morphed into an
extended investigation into a number of oddities involving negation in
English, including tag questions. I have omitted from the current version an
Appendix that now strikes me as superfluous to the main argument.
The main argument of the paper is that the logical properties of sentences
are determined by the most superficial syntactic representation, that is, the one
that corresponds directly to linear order. While the argument developed here is
essentially about the facts of interpretation, there are significant theoretical
implications. In particular, if the logical form of a sentence depends strictly on
the superficial structure, then the motivation for deriving extraposition
through movement is significantly weakened. This is a welcome result, since
a movement analysis of extraposition does not fit naturally with the treatment
of leftward movement constructions, such as wh-questions and topicalization.
Michael Rochemont and I pursued the issue of a rightward movement
analysis of extraposition in other work, including Culicover and Rochemont
(1990) and a 1997 paper, reprinted here as Chapter 7. We concluded ultimately
that extraposition is not movement, as originally suggested by the negative
curiosities discussed here, but a special case of predication, along the lines
discussed in Chapter 5.
* [This paper originally appeared as Peter W. Culicover, Negative Curiosities, Indiana Uni-
versity Linguistics Club, 1982. (Revision of Social Science Research Report 81, UC Irvine.) It is
reprinted by permission of the Indiana University Linguistics Club.]
6.1 Introduction
There has been considerable effort expended in linguistics in recent years on
the investigation of the properties of unbounded movement rules, such as wh-
fronting and NP movement.1 This work has led to the development of the
trace theory of movement rules, in which restrictions on the output possibil-
ities of such unbounded rules are handled not by conditions on the rules
themselves but by constraints on the derived syntactic relationship between
the moved constituent and its trace, corresponding to the underlying position
of the moved constituent. The intriguing possibility has emerged that these
constraints may in fact be constraints on the logical forms corresponding to
the derived structures, and are not strictly syntactic constraints.2
My concern in this paper will be primarily with rules that are, from all
indications, not unbounded movement rules: tag formation, negative inver-
sion and Stylistic Inversion. To a considerable extent my interest here is a
descriptive (or perhaps observational) and not a theoretical one, because
there are certain facts that have been ignored in traditional treatments of
these rules, and which should, I believe, be taken into consideration in any
future account. However, the phenomena that I will discuss do have theoret-
ical implications, and while I will not pursue them in great detail here, I will
suggest some likely directions in which the evidence points. Specifically, some logical properties of the sentences to be considered appear to be determined by the linear order of constituents after all transformations have applied.
To put these points into perspective, let us recall that Chomsky and Lasnik
(1977) propose that the logical form (LF) of a sentence is determined not by
the actual surface structure of the sentence, but by the intermediate structure
that results from the application of rules of ‘Core Grammar,’ such as wh-
Fronting and NP movement, cited above. Other rules are viewed as stylistic,
and do not, in the Chomsky and Lasnik proposal, bear on aspects of logical
form. Rochemont (1978; 1979; 1980) has developed a particular version of this
proposal, setting forth a characterization of the form and function of stylistic
rules.
Since the term “logical form” is a vague one, we could choose to speak
rather of a putative level of representation LF that has certain specific and
perhaps yet to be discovered properties. It is entirely plausible that limiting LF
to, say, representation of the binding relationships between quantifiers and
1 See Chomsky and Lasnik (1977) and references cited there. It is by no means universally accepted that NP Movement is an unbounded rule, nor even that it is a transformation. For an alternative view, see Bresnan (1978).
2 See Chomsky (1980) for a recent formulation of some constraints on logical form.
negative curiosities 165
main clause is the part of the sentence to the left of the comma, and the tag is
the part to the right of the comma.
(1) a. John drank the tea, didn’t he?
b. John didn’t drink the tea, did he?
The polarity of the main clause does not depend simply on whether there is
negation in the AUX position, but on a complex set of conditions. These
conditions appear to be surface structure conditions, in part, having to do
with the surface position of constituents containing negatives. The polarity of
the tag serves in turn as a diagnostic for what the scope of negation is in the
main clause.
(3) a. John drank the tea, didn’t he?
a I discuss the different types of tags and their meanings in Culicover (1973), reprinted as Ch. 3 of this book.
b When I wrote this paper I did not have the benefit of the subsequently developed ToBI framework for annotating intonation (Beckman et al. 2005). As far as I know, the intonation of English tags has not yet been given a precise description within the ToBI framework.
It appears in fact that the intonation rise on the tag does not quite bring the
pitch up to the level at the comma.
The disputational tag has a relatively flat intonation. The pitch of the tag in
this case is determined by the pitch at the end of the main clause. If the pitch
is rising (in an expression of shocked disbelief), the pitch on the tag remains
at that level, as in (4).
(4) You plan to marry my daughter, do you?
However, if the pitch falls on the last part of the main clause, as in contem-
plation of a recently expressed proposition, the pitch on the disputation tag is
low but flat.
(5) You plan to marry my daughter, do you?
If the main clause is negative, the disputational tag is still positive.
(6) You don’t plan to marry my daughter, do you?
(7) You don’t plan to marry my daughter, do you?
It is impossible to have a disputational tag that is negative, attached to either a
positive or a negative main clause.
(8) a. *You plan to marry my daughter, don’t you?
The assertival tag is similar to the interrogative tag, but differs from it in
intonation and in nuance. The intonation of the assertival tag is a falling one:
the pitch on the AUX starts higher than the pitch at the end of the main
clause, and falls back down to this level (approximately).
(10) You plan to marry my daughter, don’t you?
The interpretation of this tag differs from that of the interrogative in that this
one expects confirmation from the listener, and does not simply seek con-
firmation. Arguably, the assertival tag is a variant of the interrogative tag
involving a switch of accent in the tag, which leads to a different intonation
contour and a slightly different interpretation. It is certainly true that the two
types display the same polarity facts.
(11) You don’t plan to marry my daughter, do you?
It should be noted that the same intonation as we find in the assertival tag
shows up in a case where there is no polarity difference.
(12) You plan to marry my daughter, do you.
It is likely, however, that this intonation contour is a consequence of putting
stress on the AUX, and does not signal a crucially different type of tag from
the flat, disputational tag that we discussed above.
c The use of transformational rules to generate tag questions is a device rooted in the earliest period of generative grammar. A more modern treatment would not employ such devices, but would still be faced with the problem of describing what a possible tag question is, and what it means. Given the idiosyncrasies that the English tags display, a more contemporary approach would take a constructional perspective, as in e.g. Culicover (1999) and Kay (2002b).
and that the force of the tag is determined in derived structure as well.
However, we do not have an argument here that the scope of negation is
determined after all transformations, since in this analysis placement of not
apparently determines its ultimate scope.
What I will argue, however, is that the polarity of the tag cannot be
characterized simply by a rule of not-Placement, but depends on the scope
of negation expressed as a logical property. It is of some interest to note,
therefore, that there are problems with the purely syntactic analysis of tags
from the point of view of the syntax itself.
It appears to be a mistake to generate the two kinds of tags, interrogative
and disputational, by the same syntactic rules. Because of sentences like the
following we will have to extend the tag formation transformation to include
auxiliaries that follow AUX.
(15) a. John would have left, wouldn’t he have?
b. Mary should be here by now, shouldn’t she be?
c. Clancy hasn’t been trying very hard, has he been?
Ignoring here the precise form that such a rule would take, observe that these
extended tags cannot be used disputationally. With incredulous, rising inton-
ation, all of the following are quite unacceptable.
(16) a. *John would have left, would he have?
b. *Mary should be here by now, should she be?
c. *Clancy has been trying very hard, has he been?
These facts suggest that the syntactic generalization captured by the tag
formation rule ordered before not-Placement is a spurious one. Notice that
there is a way to avoid the problem just noted: remove the parentheses in Tag
Formation from not. The correctness of this revised analysis then rests on
whether the analysis captures all of the relevant data (which it does not), and
whether not-Placement itself is a well-motivated transformation. We can
avoid this latter question here, since we can show that the revised analysis is
not descriptively adequate, even if it is preferable to the analysis of (14).
Finally, analysis (iii) (in Culicover 1976) tries to explain the appearance of
negation in the main clause or the tag as a consequence of whether negation
has underlyingly sentential or verb phrase scope. The tag formation rule is
stated as follows:
For this rule to apply correctly, we must impose a special interpretation on the
meaning of the parenthesized not in the structural description: If not is
present between AUX and VP, it is moved into the tag. However, if there is
no not between AUX and VP, only the AUX is copied into the tag. That is, the
fourth term of the structural description in this case is satisfied by Ø, which is
disjunctive with not. Hence Ø (in effect nothing) is copied over into the tag.
In order to get negation in the main clause, it must be generated in some
position in addition to the immediate post-AUX position as a daughter of
S. To make this analysis work, we must generate not as a daughter of VP.
The claim, then, is that not that appears in the tag is underlying S negation,
while not that appears in the main clause is underlying VP negation. If there is
some semantic correlate to the syntactic position of negation, we would
expect that negated main clauses with positive tags would have a more
restricted range of interpretation than identical declaratives with negation,
since only in the case of the latter could the negative be attached to S or to
VP. As far as I know there is no data to suggest that this is the case. In fact, the
only data that purports to illustrate a difference between sentential and VP
negation does not provide the relevant distinction.
(18) John doesn’t lie because he is honest.
On the traditional analysis, where negation takes wide (S) scope we get the
entailment that John doesn’t lie for the reason that he is honest, but he lies for
some other reason. In fact, he may not be honest. Where negation takes
narrow (VP) scope, we get the entailment that John doesn’t lie, and the reason
is that he is honest. (There are, of course, alternative analyses in which the
relevant variable is the position of the because clause, and not negation.) The
ambiguity shows up when we introduce a tag, however.
(19) John doesn’t lie because he is honest, does he?
The ambiguity of (19) is predicted in an analysis in which the difference
between VP and S scope is not tied to a syntactic difference in the position
of not, and is not predicted where the scope of not is syntactically
characterized.
(20) a. No one drank the tea, {did he? / didn't he?}
b. Pictures of none of the women were hanging in the gallery, {were they? / weren't they?}
c. Nobody's pictures of Bill are on sale, {are they? / aren't they?}
The property that these examples share with the more traditional examples in
which a negative in the main clause selects a positive tag is that the main
clauses of these may also be paraphrased by it is not the case that, indicating
that both classes of example have wide (S) scope negation.3
(21) a. It is not the case that anyone drank the tea.
b. It is not the case that pictures of any of the women were hanging in
the gallery.
c. It is not the case that anybody’s pictures of Bill are on sale.
In order to incorporate such examples into a syntactic analysis, we would
have to add another rule of negative placement that moves not into constitu-
ents like the subject NPs in (20). In fact, Klima’s (1964) analysis of negation
contains, in addition to not-Placement, a subsequent transformation that
attracts not to a preceding indefinite NP, and another rule that incorporates
not with an indefinite to yield, ultimately, no, none, nobody, etc. In current
theory neither of these latter two transformations can be formulated. The rule
that incorporates not demands significant respelling in violation of the Strict
Lexical Hypothesis; see Jackendoff (1972: esp. ch. 9) for arguments against
Klima’s analysis.
The rule that attracts not to an indefinite must also look indefinitely far into
the subject NP to determine that an indefinite in fact is present, and the
incorporation rule must actually lower not into the NP. That there is no
principled bound to this lowering can be seen from examples such as the
following, constructed along the lines of (20b).
(22) a. Photographs of pictures of none of the women were hanging in the gallery, {were they? / weren't they?}
b. Negatives of photographs of pictures of none of the women were found in the darkroom, {were they? / weren't they?}
3. See Jackendoff (1972) for discussion of wide scope negation and its paraphrases.
negative curiosities 173
As expected, with a positive tag the negation in the main clause has wide
scope, and the following paraphrases are appropriate.
(23) a. It is not the case that photographs of pictures of any of the women
were found in the darkroom.
b. It is not the case that negatives of photographs of pictures of any of
the women were found in the darkroom.
A rule permitting the unbounded lowering violates two constraints accepted
in much of current syntactic theory: lowering is not permitted, and trans-
formations cannot apply over an unbounded domain.4
Granting that wide scope negation determines that the tag will be positive,
what determines that negation will have wide scope? From examples that we
have already encountered we may conclude tentatively at least that AUX
negation and negation in a subject NP will yield wide scope. Before continu-
ing with this line of inquiry, however, we should rule out the logical possibility
that the selection of the positive tag is determined by a small set of syntactic
conditions, and not by a single semantic property of the main clause. In
particular, we should rule out the possibility that it is sufficient simply for
there to be a negative in the subject NP in order for there to be a positive tag.
The following examples demonstrate that the condition is not syntactic.
(24) a. A man with no hair was on the bus, {*was he? / wasn't he?}
b. Requests for no noise are treated with disdain, {*are they? / aren't they?}
c. Movies with no children are popular with adults, {*are they? / aren't they?}
Confirming our intuition is the fact that the following are not paraphrases of
the main clauses in (24).
(25) a. It is not the case that a man with any hair was on the bus.
b. It is not the case that requests for any noise are treated with disdain.
c. It is not the case that movies with any children are popular with
adults.
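The contrast can be made concrete with a small sketch (Python; the scope judgments and the word list are hand-coded from examples (20), (22), and (24), and are purely illustrative, not a serious grammar of English negation). A syntactic test that merely looks for a negative word inside the subject NP wrongly licenses positive tags for the (24) cases as well, whereas a semantic wide-scope feature sorts the data correctly.

```python
# Toy comparison of a syntactic vs. a semantic condition on positive tags.
# Judgments and scope annotations are hand-coded from (20), (22), (24).

NEGATIVES = {"no", "none", "nobody"}

def syntactic_test(example):
    """Purely syntactic condition: some negative word occurs
    anywhere inside the subject NP."""
    return any(w in NEGATIVES for w in example["subject_np"].split())

def semantic_test(example):
    """Semantic condition: the negation takes wide (S) scope,
    i.e. the clause paraphrases as 'it is not the case that ...'."""
    return example["wide_scope_negation"]

examples = [
    # (20a): 'No one drank the tea, did he?'
    {"subject_np": "no one",
     "wide_scope_negation": True, "positive_tag_ok": True},
    # (22a): deeply embedded negative, still wide scope
    {"subject_np": "photographs of pictures of none of the women",
     "wide_scope_negation": True, "positive_tag_ok": True},
    # (24a): 'A man with no hair was on the bus, *was he?'
    {"subject_np": "a man with no hair",
     "wide_scope_negation": False, "positive_tag_ok": False},
]

semantic_errors = [e for e in examples
                   if semantic_test(e) != e["positive_tag_ok"]]
syntactic_errors = [e for e in examples
                    if syntactic_test(e) != e["positive_tag_ok"]]
```

The syntactic test misclassifies the (24a)-type case; only the semantic condition matches all three judgments, which is the point of (24)–(25).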
We thus illustrate the well-known fact about negation that it can take con-
stituent (here, NP) as well as sentential scope. The point here is that there is
4. For the constraint against lowering, see Chomsky (1965), and for a different formulation,
Wexler and Culicover (1980). Boundedness follows from a variety of independently proposed
constraints, including the Subjacency Condition of Chomsky (1973), the Binary Principle of
Wexler and Culicover (1980), Culicover and Wexler (1977), and perhaps the Subject Condition of
Chomsky (1973), at least for the examples in (22).
174 explaining syntax
refer either to the particular object by describing it, or to the type of object in
mind. When a physical relationship is involved that is described by a transitive
verb, it entails that there is some physical object corresponding to the direct
object, and hence the type interpretation of the indefinite NP will always be
paired with an existential interpretation.
The possibility of assigning wide scope to NPs in general is discussed by
Dresher (1977), who proposes the following rule:
(32) NP-Scope Interpretation
Any configuration [S . . . NP. . . ] can be interpreted either as
i. [S . . . NP. . . ] or as
ii. [S NP [xn [S . . . hen . . . ]]]
Dresher notes the clear applicability of this rule to cases in which the NP is
indefinite (pp. 372–3). Since negative NPs are indefinites, we will be able to use
this rule to get wide scope negation for cases like John is predicting the election
of no candidate (29a). Let us consider what the domain of (32) is.
As stated, (32) is extremely general. While he does not pursue the matter in
detail, Dresher does note that it is applicable at least to simple S’s, and to
complement S’s, as in (33).5
(33) Mary thinks that John is looking for a lawyer.
Dresher shows that this sentence, by the appropriate application of (32), is
predicted to have three readings, and all three appear to hold in fact. Example
(31a) shows that (32) applies to an NP within another NP. Given this, we can
use (32) to account for the wide scope of negation in all of the examples that
we have thus far considered, provided that we assume in addition that wide
scope of negation in fact is formally equivalent to the result of applying (32) to
a negative indefinite. For example, applying (32) to (31a) does not automatic-
ally give the desired result.
(34) a. John is predicting the election of no candidate.
b. no candidate [xn [John is predicting the election of himn]].
For purposes of this discussion we will simply stipulate that a logical form
such as (34b) with an indefinite taking wide scope is equivalent to a formal
logical expression in which the indefinite is interpreted as an existential, and
that furthermore if the indefinite is negative, it is interpreted as a negative
existential, as in (35).
5. Dresher's example (59).
(35) there does not exist [any candidate [xn [John is predicting the election of
himn]]].
Such a stipulation is not a solution, but is made simply in lieu of having
worked out a complete and precise analysis, one which may well involve some
substantial reformulation of (32).
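The stipulated chain of steps, (31a) → (34b) → (35), can be mocked up as a pair of string rewrites (a sketch only; the `_n` suffix stands in for the subscripted index, and the function names are my own, not Dresher's):

```python
def np_scope(sentence, np, pronoun="him"):
    """Dresher-style rule (32ii) as a string rewrite:
    [S ... NP ...]  ->  NP [x_n [S ... pronoun_n ...]]"""
    assert np in sentence
    body = sentence.replace(np, pronoun + "_n", 1)
    return f"{np} [x_n [{body}]]"

def as_negated_existential(lf):
    """The stipulated inference from (34b) to (35): a wide-scope
    negative indefinite is read as a negated existential."""
    if lf.startswith("no "):
        return "there does not exist [any " + lf[len("no "):] + "]"
    return lf

lf = np_scope("John is predicting the election of no candidate",
              "no candidate")
# 'no candidate [x_n [John is predicting the election of him_n]]'
print(as_negated_existential(lf))
# -> there does not exist [any candidate [x_n [John is predicting the election of him_n]]]
```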
As predicted by (32), it should be possible to get wide scope of negation
when the negative indefinite is within a sentential complement. In fact, it is
possible, but it is necessary to assign heavy stress to the indefinite NP in order
for the interpretation to come through clearly.6 The following examples
illustrate.
(36) a. Karen believed that no one drank the tea, did she?
b. Carl claimed that he wanted none of the books, did he?
c. Sam predicted that no candidate would be elected, did he?
The positive tag is acceptable just in case we can read the negative constituent
as being an existential that takes scope over the entire sentence, not just the
that clause. This reading is closely related to the so-called not-Transportation
or not-Hopping reading given by the following paraphrases.
(37) a. Karen didn’t believe that anyone drank the tea, did she?
b. John didn’t claim that he wanted any of the books, did he?
c. Sam didn’t predict that any candidate would be elected, did he?
We can in fact generate the same entailments for the examples in (37) and
those in (30) by applying (32) to the latter and then applying the inference
exemplified in (34).
6. There are examples in which heavy stress facilitates the wide scope interpretation for
non-negative indefinites, as Dresher notes (1977: 370).
would have to be stated in terms of the surface structure output of SAI, since
only the application of SAI would provide the crucial condition for wide
scope.
To avoid relating surface structures directly to logical form we could also
seek an analysis in which the syntactic structure contains a trigger both for the
relevant transformations and the wide scope interpretation. Suppose for
example, that after fronting of with no job we have one of the two following
intermediate structures.
(41) a. With no job NEG John would be happy.
b. With no job John would be happy.
NEG in (41) would trigger SAI and would assign wide scope negation to with
no job.
While the broad outlines of an analysis of this sort may be easy enough to
talk about, the details are neither trivial nor self-evident. Most significantly,
recall that we had found it possible to make use of Dresher’s general rule of
NP Scope Interpretation (32) to explain the possibility of having wide scope
negation given a negative indefinite. An analysis involving NEG divides the
responsibility for assignment of wide scope negation between two rules, one of
which is (32), and the other of which applies just in case a negative has been
fronted and the sentence contains NEG.
A proponent of the analysis involving NEG would naturally seek to gener-
alize NEG to all instances in which a negative constituent has wide scope,
whether or not it is fronted. Such a generalization still leaves us with Dresher’s
(32) for non-negative NPs, so that we will still have two rules for assigning
wide scope. On the whole it does not appear that anything is to be gained by
attributing wide scope negation to an abstract marker NEG except that we
could then avoid the conclusion that wide scope of negation is determined at
surface structure. Properly formulated, an analysis involving NEG would
allow the scope of negation to be determined in deep structure or at some
early stage in the derivation.
6.3 Any
We turn now to examples involving rules other than SAI that also suggest that
surface structure is the determining level of the scope of negation, and hence
of logical form. In order to maintain the claim that logical form is, in contrast,
determined after the rules of ‘Core Grammar’, it would appear to be necessary
to extend the definition of Core Grammar so broadly that it would lose all of
its theoretical interest.
(48) John gave to the libraries in the city none of the books that he found,
did he?
In order to get the wide scope interpretation in (48) it is necessary to stress
none.d
Another observation about these sentences is the following: in general, the
meaning of a sentence after Heavy NP Shift has applied is identical to its
interpretation before Heavy NP Shift, suggesting on a classical model that the
interpretation be assigned before the rule applies. To interpret the sentence
after the rule applied would require that we reconstruct the original position
of the moved NP and move it back ‘in the semantics’. However we cannot
completely interpret the sentence before Heavy NP Shift if the scope of
negation is part of the interpretation of the sentence, since Heavy NP Shift
affects logical form. This appears to put us in somewhat of a quandary. So,
in deriving Heavy NP Shift we have to do the following. (i) We must
specify what surface position in the VP the direct object will have; (ii) we
must specify that the direct object functions as such; (iii) we must specify
the interaction between negatives and indefinites in terms of (i), not (ii).
These observations are consistent with the position that at least in part the
interpretation of the sentence depends strictly on surface structure after a
stylistic rule.7
A second construction that is thought of as stylistic (see Rochemont (1978))
but that affects logical form is Stylistic Inversion. In Culicover (1977) I suggest
that the derivation of this construction has two parts. One part fronts the
d. This interaction between the position of negation and the position of the heavy NP is
consistent with the view that Heavy NP Shift is not movement, but an alternative ordering
within VP.
7. If the example in (ii) below is acceptable, there may be a stylistic rule that does not affect
logical form. Consider the rule of VP topicalization:
(i) a. They said that John wouldn’t give the paintings to Mary, and he didn’t give the paintings
to Mary.
b. He said that John wouldn’t give the paintings to Mary, and give the paintings to Mary he
didn’t.
If the VP contains any in the scope of Aux negation, we get the following:
(ii) They said that John wouldn’t give any of the paintings to Mary and give any of the paintings
to Mary he didn’t.
If (ii) is good, it means that the scope of negation over any is determined before VP topicaliza-
tion. However, if (ii) is bad, and it probably is, VP topicalization must precede the assignment of
wide scope to not in the main clause.
sister of an intransitive verb and leaves behind a dummy, and the other part
moves the subject into the position of the dummy.e (49) illustrates.
(49) a. John walked into the room. ⇒
b. Into the room John walked Δ ⇒
c. Into the room walked John.
Each of these two rules can change the relative order of an indefinite and
negation, and this clearly has consequences for the interpretation.
(50) a. *Any of the men didn’t walk into the room.
b. Into the room didn’t walk any of the men.
(51) a. None of the men walked into the room.
b. Into the room walked none of the men.
(52) a. None of the men walked into any of the rooms.
b. *Into any of the rooms walked none of the men.
(53) a. The men didn’t walk into any of the rooms.
b. *Into any of the rooms didn’t walk the men.
(54) a. *Any of the men walked into none of the rooms.
b. Into none of the rooms walked any of the men.
Another rule, also stylistic, is extraposition of PP or PPEXT (Rochemont
1978). This rule also affects logical form, as shown below.
(55) a. Pictures of the women were hanging on the wall.
b. Pictures were hanging on the wall of the women.
(56) a. Pictures of none of the women were hanging on the wall.
b. Pictures were hanging on the wall of none of the women.
(57) a. Pictures of none of the women were hanging on any of the walls.
b. *Pictures were hanging on any of the walls of none of the women.
(58) a. *Pictures of any of the women were hanging on none of the walls.
b. Pictures were hanging on none of the walls of any of the women.
e. This derivation is somewhat different from the one that I proposed subsequently with
Levine in Culicover and Levine (2001), reprinted in this book as Ch. 9. The Culicover–Levine
analysis proposes that there are in fact two constructions. The details of the configuration of PP
and logical subject turn out not to be relevant, however, as long as the PP c-commands the
logical subject (so that any is licensed); the argument made here is that what matters is the linear
order of the constituents.
(59) a. *Pictures of any of the women weren’t hanging on any of the walls.
b. Pictures weren’t hanging on any of the walls of any of the women.
Here, as elsewhere, we find ourselves in a somewhat puzzling situation. On
the one hand, we wish to represent the fact that the broken up constituent in
fact is interpreted as a constituent, and we might do this by mapping the
constituent into some semantic representation before PPEXT, for example. If
the co-occurrence of negation and indefinites with respect to one another is
an interpretive phenomenon, which it is in part, it might reasonably be
expected to be stated at this level of representation. But it cannot be, because
PPEXT can reorder the negatives and the indefinites.
What is particularly surprising in the case of these last examples is that the
negative in the extraposed PP is sufficient to yield a sentential negative
interpretation for the entire constituent from which it was extraposed, but
this negative interpretation does not govern the any that follows. That is, in
(56b) we get a perfectly reasonable interpretation that no pictures of any of
the women were hanging on the wall. (We also get the odd but not totally
implausible interpretation that pictures depicting womenlessness were hang-
ing on the wall.)
However, as (57b) shows, this interpretation is still not sufficient to allow
any to appear. What this suggests is that the reading no pictures of any of the
women is an entailment of (57), and that there in fact is no level of representa-
tion where (56a) and (56b) or any of the other pairs are represented identi-
cally. How the rules for entailment are properly to be stated is a problem for
future study, one with interesting implications for accounts of strict surface
structure interpretation by a comprehension device.f
I conclude with a related matter, but one that does not involve any. It turns
out that there are parentheticals that must co-occur with sentential negation.
One such is I don’t think. Below it is compared with I think.
(60) a. John isn’t here, I (don’t) think.
b. John is here, I (*don’t) think.
It is well known that parentheticals may appear internal to sentences, and this
is illustrated by (61).
(61) John, I think, is here.
I don’t think can also appear internally. However, it turns out that in order for
I don’t think to be acceptable, it is not sufficient that the sentence contain
f. How this interpretation would work is the concern of Ch. 7.
sentential negation that takes scope over the parenthetical. Rather, in addition
to this, the negative element must precede I don’t think in surface structure.
(62) a. John doesn’t believe that Mary is here, I don’t think.
b. John doesn’t believe, I don’t think, that Mary is here.
c. John doesn’t, I don’t think, believe that Mary is here.
d. *John, I don’t think, doesn’t believe that Mary is here.
The problem in (62d) is not the surface structure position of the parenthetical
(before AUX) per se, since a negative subject also allows the parenthetical.
(63) a. No one believes that Mary is here, I don’t think.
b. No one believes, I don’t think, that Mary is here.
c. No one, I don’t think, believes that Mary is here.
The requirement that the negation be sentential in scope is shown by pairs like
the following.
(64) a. *In not many years, Christmas will fall on a Tuesday, I don’t think.
b. In not many years will Christmas fall on a Tuesday, I don’t think.
And note the following as well.
(65) a. *In not many years, I don’t think, Christmas will fall on a Tuesday.
b. In not many years, I don’t think, will Christmas fall on a Tuesday.
That the parentheticals have anything directly to do with determining logical
form is unlikely. Nevertheless, the examples show that aspects of the scope of
negation cannot be determined at intermediate levels of the derivation, but at
surface structure. For example, if there is in fact a (stylistic) rule that moves
constituents around parentheticals [or parentheticals around constituents],
this rule must precede assignment of scope of negation so that it can be
determined in surface structure whether the internal parenthetical is to the
right of a negative that has wide scope. Specification of the scope of negation
before reordering of the parenthetical would yield the ungrammatical
examples of (60)–(64). Thus we may hypothesize that assignment of scope
of negation follows the reordering of parentheticals.
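The two-part condition on internal I don't think can be stated as a toy check over annotated examples (Python; the scope annotations and word positions are hand-coded from (62)–(65), since nothing in this sketch computes scope):

```python
def parenthetical_ok(neg_position, paren_position, sentential_scope):
    """Acceptability of an internal 'I don't think' (from (62)-(65)):
    (i) the clause must contain sentential-scope negation, and
    (ii) the negative must precede the parenthetical in surface order."""
    return (sentential_scope
            and neg_position is not None
            and neg_position < paren_position)

cases = [
    # (62c) John doesn't, I don't think, believe that Mary is here.
    {"neg": 1, "paren": 2, "sentential": True,  "ok": True},
    # (62d) *John, I don't think, doesn't believe that Mary is here.
    {"neg": 4, "paren": 1, "sentential": True,  "ok": False},
    # (65a) *In not many years, I don't think, Christmas will fall ... (no SAI)
    {"neg": 1, "paren": 4, "sentential": False, "ok": False},
    # (65b) In not many years, I don't think, will Christmas fall ... (SAI)
    {"neg": 1, "paren": 4, "sentential": True,  "ok": True},
]

mismatches = [c for c in cases
              if parenthetical_ok(c["neg"], c["paren"],
                                  c["sentential"]) != c["ok"]]
```

Both conditions are needed: dropping either one misclassifies (62d) or (65a).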
determined. It has been known for some time8 that a well-formed relative
clause need not have a relative pronoun, that or ∅ in COMP position. The
following examples show in fact that the rule fronting constituents in Stylistic
Inversion (cf. (49b)) may front a constituent that is in no obvious sense a
wh-phrase. The crucial sentence is (66c).
(66) a. ?This is the church which very expensive paintings are hanging on
the walls of. (wh-Fronting)
b. This is the church on the walls of which are hanging very expensive
paintings. (Stylistic Inversion)
c. This is the church hanging on the walls of which are very expensive
paintings. (Stylistic Inversion)
The fronted constituent in (66c) is a VP, presumably.9 Note that if the
condition requiring the relative clause to have a wh-phrase or a phrase
containing a wh-phrase in COMP is in fact a condition of logical form, this
condition cannot be applied until after the application of the rules deriving
Stylistic Inversion. It is possible, though, that only the inversion of the subject
is a stylistic rule, a possibility that will be discussed in somewhat more detail
below.
Let us turn now to an argument that wh-Fronting must apply after Stylistic
Inversion. If this argument is correct, it would follow that wh-Fronting could
not be a rule of Core Grammar, rendering the latter of little theoretical
interest. However, we will see that it may be possible to distinguish two
rules of wh-Fronting, along the lines suggested by Koster (1978a), thus
avoiding this conclusion.
In the following examples, Stylistic Inversion appears to have applied in
the lower S before wh-Fronting has moved the clause containing wh to the
higher S.
(67) This is the wall on which Mary claims were hanging twelve ghastly
pictures of Nixon.
(68) On which of these walls does Mary suspect were hanging the ghastly
pictures of Nixon?
The fronted constituent need not be a PP.
8. See Emonds (1976).
9. It would be natural to try to explain the fact that Stylistic Inversion applies to hanging on the
wall by reanalyzing it as something other than a VP, or by motivating a feature decomposition of
VP to allow generalization with other constituents that also trigger the rule. For some speculation, see Culicover (1982).
(69) This is the wall, hanging on which Mary claims were twelve ghastly
pictures of Nixon.
(70) Hanging on which of these walls does Mary suspect were the ghastly
pictures of Nixon?
Since Stylistic Inversion occurs in the lower S, but the wh-phrase ends up in
the higher S, we must conclude that the wh-phrase is fronted in the lower
clause first by Stylistic Inversion, and then moved into the higher clause by
wh-Fronting.
(71) This is the wall [COMP Mary claims [COMP twelve ghastly pictures of
Nixon were hanging on which]] ⇒
This is the wall [COMP Mary claims [on which were hanging twelve
ghastly pictures of Nixon]] ⇒
This is the wall [on which Mary claims [Ø were hanging twelve ghastly
pictures of Nixon]]
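The ordering argument embodied in (71) can be rendered as a string-manipulation sketch (Python; the clause pieces and rule bodies are drastic simplifications of the text's rules, not serious implementations): wh-Fronting can only pick up the wh-phrase after Stylistic Inversion has made it clause-initial.

```python
def stylistic_inversion(clause):
    """(49)-style inversion as a string rewrite: 'SUBJ V PP' -> 'PP V SUBJ'."""
    return f"{clause['pp']} {clause['verb']} {clause['subject']}"

def wh_fronting(lower, wh_phrase, matrix="Mary claims"):
    """Non-core wh-Fronting (Koster 1978a): move the clause-initial
    wh-phrase of the lower S into the higher S, leaving the rest behind."""
    if not lower.startswith(wh_phrase):
        raise ValueError("wh-phrase is not clause-initial; "
                         "Stylistic Inversion must apply first")
    rest = lower[len(wh_phrase):].strip()
    return f"This is the wall [{wh_phrase} {matrix} [{rest}]]"

lower = {"subject": "twelve ghastly pictures of Nixon",
         "verb": "were hanging", "pp": "on which"}

step1 = stylistic_inversion(lower)
# 'on which were hanging twelve ghastly pictures of Nixon'
step2 = wh_fronting(step1, "on which")
# 'This is the wall [on which Mary claims [were hanging twelve ghastly pictures of Nixon]]'
```

Applying wh_fronting to the uninverted clause raises the error, mirroring the conclusion that Stylistic Inversion must feed wh-Fronting here.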
Koster (1978a) suggests that the rule of wh-Fronting that moves wh-phrases
out of complements is not a rule of Core Grammar, while wh-Fronting in
simple S’s is. Thus the examples in (67)–(70) simply show that the first rule
must follow Stylistic Inversion, but the second, core rule does not.
It might be supposed that this is an undesirable result, because it requires
that we break the one maximally general transformation of wh-Fronting into
two rules. However, Koster (1978a) also proposes that there is no rule of
wh-Fronting at all. Rather, what is part of Core Grammar is the coindexing of
an initial wh-phrase with its trace, while what is not part of Core Grammar is
a configuration in which an initial wh-phrase may bind a trace in a lower
clause. We need not concern ourselves with the technical details here.
Adopting this analysis of wh-Fronting requires us to reanalyze Stylistic
Inversion along the following lines: the topicalized constituent is generated
in initial position in the base; this constituent binds a trace; inversion of the
subject depends on the condition that the verb phrase contain a trace of an
intransitive V in the following configuration.10
10. The verb must be intransitive, because in general the subject of an S cannot be moved into
direct object position when the direct object has been fronted. It is necessary to specify in the
statement of the rule that the only daughters of VP are V and the trace, so that the subject does
not move into the position occupied by the trace of the object of a prepositional phrase.
(i) a. *Whoi did ei see Billj?
b. *Which tablei did ej sit [PP on Maryj]?
11. Such examples were discussed by Akmajian in a paper presented to the LSA in 1974 at San
Diego.
g. Marginal though these examples are, Culicover and Levine (2001) end up making somewhat more of them than is proposed here.
12. Suppose we grant that there is no topic position in infinitives. Then examples like the
following argue for a trace-filling analysis of Stylistic Inversion.
(i) In this room John expects to be sitting an enormous elephant.
Since there is no position in the infinitive into which to move the adverb, it is impossible to
trigger Stylistic Inversion on the lower cycle. When the adverb is moved on the higher cycle, the
subject NP is not an enormous elephant, but John. However, we do not get
(ii) *In this room expects an enormous elephant to be sitting John.
If the directional adverb is generated in initial position in the base, the trace of the adverb is
already in underlying structure, and Stylistic Inversion may apply cyclically. This argument is
vitiated if it turns out that sentences like (i) are ungrammatical (my judgments are unclear), or
if there is a COMP position in the infinitive through which a directional adverb may move.
6.5 Conclusion
It appears that the scope of negation must be determined in surface structure
if SAI is not syntactically triggered, that tags cannot be generated by a
transformation, and that Stylistic Inversion must precede certain instances
of wh-Fronting. More generally, it seems to be the case that logical form
cannot be completely determined before surface structure, although in certain
constructions earlier levels of structure may contain sufficient information for
the assignment of logical forms. These results cast some doubt on the notion
that there is a level of logical form defined as the output of the transformations wh-Fronting and NP Movement.
PART II
Structures
7
Remarks on Chapter 7
Michael Rochemont and I wrote this paper for a conference on rightward
movement at Tilburg University. While we believed that extraposition could
be handled by interpretive rules, we were interested in seeing if we could find
conclusive arguments for or against treating extraposition as movement. The
antisymmetry perspective of Kayne (1994) ruled out a rightward movement
account of extraposition and Heavy NP Shift, and required such apparent
rightward movements to be a remnant of massive leftward movement. In the
course of the research we realized that the antisymmetry approach also allows
for an analysis of these constructions in terms of massive rightward move-
ment, given alternative assumptions about branching direction. Crucially, we
found that there was no empirical evidence to decide among the various
alternatives, and in the interest of keeping the syntax as simple as possible, we
concluded that the interpretive position was the preferred one.
7.1 Introduction
In this paper we will be concerned with the properties of rightward positioned
adjuncts in English that are in some sense dependent for their interpretation on
a position elsewhere in the sentence, e.g. relative and result clause extraposition
* [This chapter appeared originally in Dorothee Beerman, David LeBlanc, and Henk van
Riemsdijk (eds), Rightward Movement. Amsterdam: Benjamins (1997). It is reprinted here by
permission of John Benjamins. For their comments we would like to thank Bob Levine, Louise
McNally, and the members of audiences at the University of Groningen, Tilburg University,
Université du Québec à Montréal, and the University of British Columbia. Michael Rochemont’s
work on this project was supported by grant no. 410-92-1379 from the Social Sciences and
Humanities Research Council of Canada.]
(5) [CP [CP WH [C′ C [IP [IP NP [I′ I [VP [VP V NP] OX]]] SX]]] WhX]
Constituency tests such as VP ellipsis, VP topicalization, and pseudo-cleft give
results that are consistent with this structure (see R&C), but they are consist-
ent with plausible alternatives, so we will not discuss them here.
The varying potential for coreference under Condition C of the Binding
Theory is also compatible with the same differences in adjunction positions.1
Example (6) shows that the subject c-commands an object-extraposed rela-
tive, and the examples in (7) show that an indirect object c-commands an
object relative only in its non-extraposed position.2 (It is not possible to
1. We do not consider parallel facts from bound variable interpretations of pronouns, though
the results are for the most part equivalent to the Condition C effects observed here. The
interpretation of variable binding examples is somewhat more complicated than the Condition
C facts, owing to the possibility that the former is constrained by LF c-command relations, as
suggested by the literature on weak crossover (see Culicover 1993a and Lasnik and Stowell 1991
for some recent perspectives).
2. As pointed out to us by Bob Levine, our account of (7b) presupposes that there cannot be
any ‘vacuous’ extraposition, in which the relative clause is adjacent to the head noun but
adjoined to the VP. Levine also notes that there may be some question as to the ungrammatic-
ality of (7b), in view of the relatively greater acceptability of examples such as the following.
(i) I offered heri many gifts from Central Asia that Maryi didn’t like.
In these examples, it appears that the PP internal to NP is sufficient to permit coreference. If this
is the case, then it is not clear that a similar effect is not at work in (7b). Hence it is possible that
vacuous extraposition may exist. Note that this possibility cannot be ruled out on the account
of C&R.
deriving dependent right adjuncts 195
An alternative hypothesis is that a dative pronominal does not c-command to the right in
VP. This possibility would appear to be falsified by examples such as the following.
(ii) a. *I told heri that Maryi would win.
b. *I offered heri Maryi’s favorite food.
c. *I gave heri some flattering pictures of Maryi.
The contrast between the examples in (ii) and (i) recalls the contrast between arguments and
adjuncts noted by Lebeaux (1988) in connection with anti-reconstruction effects, as in (iii).
(iii) a. Which gifts from Central Asia that Maryi didn’t like did shei try to sell to someone else?
b. ?Which of Maryi’s favorite foods did shei prefer?
c. *Which pictures of Maryi did shei like best?
Lebeaux’s observation is that pronominal subjects appear to produce condition C effects with
R-expressions in fronted arguments but not adjuncts. The facts in (i) and (ii) suggest that dative
pronouns produce condition C effects in R-expressions to the right of them that are in argument
position, but not those that are in adjuncts. A related point is made in fn. 3 below.
Example (10) shows that a matrix subject does not c-command a relative
extraposed from wh in its own COMP, even if it does c-command the trace of
the wh. (Compare (9c).)3
(10) Which man did hei say came into the room that Johni didn’t like?
Finally, (11) shows that it is the surface and not the LF position of the
antecedent that is relevant to the positioning of the extraposed relative.
(11) a. *Who told heri that Sam was taking a student to the dance [CP that
the teacheri liked]?
b. *Who told heri that Sam was taking [which student] to the dance [CP
that the teacheri liked]?
(C&R)
To conclude, the height of attachment of an extraposed relative is a function
of the surface position of its antecedent. That is, given (5), an extraposed
relative is adjoined to the minimal maximal projection containing its surface
antecedent.
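The configurational reasoning above can be made concrete with a small sketch (Python, my own illustration; the toy tree and its labels are simplified assumptions, not part of the original analysis). It implements the standard definition of c-command assumed throughout: A c-commands B iff neither dominates the other and the first branching node dominating A also dominates B.

```python
# Illustrative sketch (not from the text): c-command over a toy
# constituent tree. Definition assumed: A c-commands B iff neither
# dominates the other and the first branching node dominating A
# also dominates B.

class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

def dominates(a, b):
    """True iff a properly dominates b."""
    n = b.parent
    while n is not None:
        if n is a:
            return True
        n = n.parent
    return False

def c_commands(a, b):
    if dominates(a, b) or dominates(b, a):
        return False
    n = a.parent
    while n is not None and len(n.children) < 2:
        n = n.parent          # skip non-branching nodes
    return n is not None and dominates(n, b)

# Toy structure: subject in IP; extraposed relative adjoined to VP.
rel   = Node("CP_rel")
obj   = Node("NP_obj")
vp_in = Node("VP", [Node("V"), obj])
vp    = Node("VP", [vp_in, rel])      # relative adjoined to VP
subj  = Node("NP_subj")
ip    = Node("IP", [subj, vp])

print(c_commands(subj, rel))  # True: subject c-commands the relative
print(c_commands(obj, rel))   # False: object does not
```

On this toy structure the subject c-commands a VP-adjoined extraposed relative while the object inside VP does not, which is the configuration the Condition C facts are taken to diagnose.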
3 Bob Levine has pointed out to us that the absence of a Condition C violation in (10) appears
to parallel the anti-reconstruction facts discussed by Lebeaux (1988) (see also fn. 2 above).
(i) a. which man that Johni didn’t like did hei say came into the room
b. *whose claim that Johni was a spy did hei refuse to acknowledge
(ii) a. which man did hei say came into the room that Johni didn’t like (= (10))
b. *whose claim did hei refuse to acknowledge that Johni was a spy
If the adjuncthood of the relative clause is responsible for the absence of a Condition C violation
in (i.a), and not its adjunction site, then our argument is somewhat weakened. On the other
hand, it is possible that in (ii.b) the extraposed complement is adjoined above the subject, but
because it is an argument it undergoes reconstruction, which feeds Condition C. In this case, the
higher adjunction of the complement would not be sufficient to allow it to avoid Condition C,
while the higher adjunction of the relative clause would be.
Even a matrix object (13) or matrix subject (14) can fail to c-command a result
clause extraposed from within the embedded complement.
(13) a. *I told heri that the concert was attended by many people last year
that made Maryi nervous.
b. I told heri that the concert was attended by so many people last year
that I made Maryi nervous.
(G&M)
(14) a. *Shei told me that the concert was attended by many people last year
that made Maryi nervous.
b. Shei thought that the concert was attended by so many people last
year that Maryi decided not to go this year.
Following G&M, we propose that so is the LF antecedent of the result clause.
That so has potentially different scope interpretations at LF is shown by (15),
whose two readings may be informally represented as (15a,b).
(15) Mary believes that Harryi is so crazy that hei acted irrationally. (G&M)
a. Mary believes that so [Harry is crazy][that he acted irrationally]
b. so [Mary believes that Harry is crazy][that he acted irrationally]
The two readings of (15) may be paraphrased as follows: (a) Mary has the
belief that Harry is so crazy that he acted irrationally, or (b) the extent to
which Mary believes that Harry is crazy is such that he acted irrationally. Let
us suppose that the result clause is adjoined to the clause over which so takes
scope at LF. This gives the correct results for an example like (16), where the
only reading compatible with Condition C places the result clause outside the
c-command domain of the matrix subject and correspondingly forces only
the wide scope reading for so; unlike (15), (16) is unambiguous.
(16) Shej believes that Harryi was so crazy that Maryj left himi.
With Guéron and May, we propose that so undergoes LF raising to achieve its
proper scope. Unlike Guéron and May, however, we suppose so to move at LF
as an adjunct. We therefore correctly predict that it will display LF island
effects with sentential subjects (17), wh-islands (18), complex NPs (19), and
adjunct islands (20b, 21).
(17) a. [[That so many people ate cheesecake] that we had to order more]
surprised us.
b. *[That so many people ate cheesecake] surprised us that we had to
order more.
(R&C)
(18) Mary wondered whoi was so crazy that hei acted irrationally.
(19) a. Shei claimed that so many people left that Maryi must have been
lying.
b. *Shei made the claim that so many people left that Maryi must have
been lying.
(20) a. Shei tried to do so many pushups that Maryi hurt herself.
b. *Shei bent to do so many pushups that Maryi hurt herself.
(21) Shei hurried out after eating so much food that Maryi must have been
sick.
In all of these cases the coreference requires that the result clause be outside of
the clause that contains the so, because it has to be higher than the pronom-
inal. If so is prevented from moving, ungrammaticality or unambiguity
results. We conclude that the height of attachment of an extraposed result
clause is a function of the LF position of its so antecedent—the result clause is
adjoined at the surface to the clause to which so is adjoined at LF.
On the basis of our discussion of result and relative clause extrapositions,
we can state the following generalization: for both relative and result
clause extraposition, it is the antecedent that determines the
height of attachment of the extraposed phrase. In the case of relatives
it is the surface position of the antecedent, and in the case of result clauses it is
the LF position.4 This means that the extraposed clause can be no higher in
the tree than its antecedent, and it must be at least as high as its antecedent.
The precise interpretation of ‘high’ depends on independent assumptions
4 Since the bulk of our evidence for this generalization relies on Condition C effects, it might
be thought that the generalization is undermined by the observation that Condition C is
essentially an LF effect. The relative and result clauses might in fact be relatively ‘low’ in the
structure at the surface, and achieve positions satisfying the generalization only at LF under
movement. Our argument that this cannot be so is that extraposed clauses can be seen to appear
outside the clauses they ‘originate’ in even at the surface and quite apart from c-command
effects. In (i), the extraposed relative appears outside the temporal adverb even though the latter
is readily construed with the matrix verb. (That is, (i) can have the same meaning as (ii).) (See
R&C p. 37 for a similar example.)
(i) Mary expected her flight to be so late yesterday that she neglected to set her alarm.
(ii) Yesterday, Mary expected her flight to be so late that she neglected to set her alarm.
Similarly, (iii) can have the same meaning as (iv).
(iii) Shei thought that the concert would be attended by so many people last year that Maryi
decided not to go.
(iv) Last year, shei thought that the concert would be attended by so many people that Maryi
decided not to go.
We assume that since at the surface temporal adjuncts cannot escape from the clause they
originate in, they are similarly bounded at LF.
about what the structures actually are. Given classical assumptions, we sup-
pose that the extraposed clause must be adjoined to the lowest maximal
projection that contains the antecedent; given other assumptions, which we
will discuss, the generalization would be implemented somewhat differently,
consistent with the differences in attachment that we have noted.
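Under the classical assumptions just mentioned, the generalization can be stated procedurally: attach the extraposed clause to the lowest maximal projection properly containing the (surface or LF) position of the antecedent. A hedged sketch (Python, my own illustration; the flat toy structure and the label-ends-in-'P' test for maximal projections are expository assumptions only):

```python
# Illustrative sketch (assumptions mine): attach an extraposed clause
# to the lowest maximal projection properly containing its antecedent.
# Maximal projections are identified here, purely for exposition, by
# labels ending in 'P'.

class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

def attachment_site(antecedent):
    """Lowest phrase-level node properly containing the antecedent."""
    n = antecedent.parent
    while n is not None and not n.label.endswith("P"):
        n = n.parent
    return n

# A flattened analogue of structure (5): wh in CP, subject in IP,
# object in VP.
obj  = Node("NP")
vp   = Node("VP", [Node("V"), obj])
subj = Node("NP")
ip   = Node("IP", [subj, Node("I"), vp])
wh   = Node("WH")
cp   = Node("CP", [wh, Node("C"), ip])

print(attachment_site(obj).label)   # VP: extraposition from object
print(attachment_site(subj).label)  # IP: extraposition from subject
print(attachment_site(wh).label)    # CP: extraposition from wh
```

The three outputs mirror the adjunction sites OX, SX, and WhX of structure (5): to VP, IP, and CP respectively.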
5 These observations motivate Baltin’s (1981) Generalized Subjacency.
7.5.1 Stranding
Consider first a stranding analysis of relative-clause extraposition, on which
extraposed relatives are stranded by leftward movement of the antecedent, on a
par with Sportiche’s (1988) analysis of Q-Float in French. This analysis fails the
first requirement, in that it assigns a structure on the order of (23), where the
indirect object c-commands the relative clause whether it is ‘extraposed’ or not.
(23) [tree diagram omitted; legible labels: NPIO, NPDO]
a That is, a PP that is an argument in virtue of being selected by a lexical head.
(26) [tree diagram omitted; legible labels: NPS, NPDO, EX]
(27) [tree diagram omitted; legible labels: Spec, X, RX, IP]
(28) [tree diagram omitted; legible labels: Spec, X, WhX, IP]
(29) [tree diagram omitted; legible labels: Spec, X, OX, IO, DO]
We must assume that some principle like the Complement Principle guaran-
tees the proper interpretation of the result/relative clause, and that the
structures in (27)–(29) appear at the appropriate level of clausal embedding.
One virtue of this analysis is that it readily captures the relative order of
relative clauses and other extraposed constituents. It also satisfies our three
requirements. Since the relevant arguments will always be contained in a
projection that excludes the extraposed constituent (the boxed constituent
in each structure), they will always fail to c-command the extraposed con-
stituent. In effect, leftward movement is producing the mirror image of the
underlying order without disturbing the crucial c-command relations. We say
‘crucial’ because certainly the structure in this case is different from the
adjunction structure that we assumed in the classical approach. But it is
possible to define a type of c-command such that the specifier containing
the extraposed clause c-commands the constituent containing the antecedent.
Of the three alternatives that we have considered, this last is the only one
that seems viable given the evidence that we have discussed. We emphasize
that while this is a leftward movement analysis, as opposed to base generation,
it too requires a version of the CP. This analysis remains incomplete, of
course, without (i) some account of why the boxed phrase must move,
(ii) independent motivation for the structures assumed, and (iii) an explan-
ation of what licenses the required movements, e.g. movement of IP across RX
into a higher Spec in (27).
b Our original judgment of this example was ‘*’, which I now believe is too strong. For
discussion of fully acceptable or almost fully acceptable extraction from subject NPs, see
Kluender (2004).
6 There are those who do not share our judgments about this example. To us, the difference
in grammaticality illustrated here is very sharp.
7 PTI cannot in principle license a parasitic gap because the HNP is a subject.
c. (*)Did there walk into the room a man with long blond hair?
d. *This is the room that there walked into t a man with long blond
hair.
In R&C we argue that HNPS does not freeze the entire VP, because of
examples like the following.
(38) a. For whom did Bill purchase t last week an all expense paid ticket to
Europe?
b. I don’t remember for which of his sisters Bill bought in Europe t a
fourteenth century gold ring.
c. This is the woman from whom Bill purchased t last week a brand
new convertible with red trim.
But as Bresnan (1994) observes, we did not consider the possibility that the
extracted phrase is moved from a position following the HNP. Therefore, let
us provisionally accept the proposal originally made by Wexler and Culicover
(1980) that HNPS freezes the VP.8 Given this, the important point is that the
freezing effect in PTI is different from that in HNPS, since in PTI the entire
clause is frozen, while in HNPS only the VP is frozen, as extraction of the
subject and SAI show in (39).
(39) a. Which of these people purchased from you last week an all expense
paid ticket to Europe?
b. Did Bill buy for his mother anything she really liked?
Note that in comparison, extraposition of relative clauses from PP is possible
(cf. (24)).
R&C argue that these four properties follow directly from a rightward
adjunction account. There are two additional properties of a somewhat
different character that also suggest that HNPS and PTI involve movement.
First, HNPS out of a PP is impossible (Ross 1967).
(40) a. *I found the article in t yesterday [the magazine that was lying on the
coffee table].
b. *John talked to t at the party [several people who had blond hair].
(Rochemont 1992)
8 Bob Levine (p.c.) points out that Johnson (1985) argues against Bresnan’s point using
examples such as the following.
(i) Robin is a person [at whom]i I consider tj excessively angry ti [a whole gang of maniacal
Tolstoy scholars]j.
Here, the PP must originate to the left of the shifted NP, yet the VP does not appear to be frozen.
for the fact that parasitic gaps are licensed in HNPS. And fourth, it does
not capture the full range of freezing effects in HNPS and PTI (see (36)–
(39) above).
(48) [tree diagram omitted; legible labels: Spec, NPs, X, V, NPo, PP]
(49) [tree diagram omitted; legible labels: Spec, NPo, V, to, PP]
(50) [tree diagram omitted; legible labels: Spec, X, there, V, NPs]
(51) [tree diagram omitted; legible labels: Spec, NPs, X, there, V, ts]
English, leftward movement from PP is not blocked. It also fails to block long
extraction of the HNP. These are exactly the properties that on a rightward
movement account are attributed to the Rightward Movement Constraint.
Seen from this perspective, the rightward movement account and the MHS
account have the same weakness: they must both provide for some means of
phrase bounding that is thus far not independently motivated by any property
of leftward movement. The equivalent of the Rightward Movement Con-
straint on the MHS analysis must be a principle whose effect is to guarantee
that the requisite functional structures to which the HNP and its containing
phrase move are immediately above the containing phrase. Thus the cost of
properly characterizing bounding appears to be equivalent in both accounts.
There do not appear to be any empirical differences between the two, at least
none that are tied to configuration. Our comparison of the leftward move-
ment and rightward movement accounts shows that it is possible to repro-
duce on the leftward movement account the essential properties of the
structures that would result from rightward movement. In principle, it
appears that the two are notational variants of one another, mutatis mutandis,
and there can be no empirical basis for choosing between them. Questions
that remain open on the leftward movement account concern independent
motivation of the required functional structure and the triggering and licens-
ing conditions on the movements.
For example, in the structures that we proposed on the MHS analysis of
HNPS, there is an open question as to whether and how the trace of the HNP
is properly bound (see (48)), since the HNP does not c-command its trace.
A parallel question arises in the licensing of parasitic gaps in HNPS, where the
HNP fails to c-command the parasitic gap. In this account, one possibility
would be to appeal to reconstruction to legitimize the relevant configurations.
Alternatively, we might suppose that neither proper binding nor the licensing
of parasitic gaps makes reference to c-command. One can conceive of an
equivalent notion to which these licensing conditions could make reference,
e.g. the HNP will be in some type of sister relation to the constituent
containing the trace or the parasitic gap. The sort of sister relation that
might qualify is one in which the two sisters are dominated by all of the
same lexical, but not functional, projections (Chomsky 1986: 13).
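The suggested sister relation can be sketched as follows (Python, my own illustration; the inventory of lexical labels and the toy HNPS structure are assumptions): two nodes count as sisters in the extended sense when exactly the same lexical projections dominate both, with functional projections ignored (cf. Chomsky 1986: 13).

```python
# Illustrative sketch (label inventory and toy structure are my
# assumptions): two nodes are 'extended sisters' iff exactly the same
# lexical projections dominate both; functional projections are ignored.

class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

LEXICAL = {"VP", "NP", "AP", "PP"}   # lexical projections only

def lexical_cover(node):
    """Set of lexical projections properly dominating node."""
    cover = set()
    n = node.parent
    while n is not None:
        if n.label in LEXICAL:
            cover.add(id(n))
        n = n.parent
    return cover

def extended_sisters(a, b):
    return lexical_cover(a) == lexical_cover(b)

# The shifted HNP sits in the Spec of a functional XP; the VP holding
# its trace is the complement. Only functional structure intervenes.
trace = Node("t")
vp    = Node("VP", [Node("V"), trace])
hnp   = Node("NP_HNP")
xp    = Node("XP", [hnp, Node("X"), vp])
ip    = Node("IP", [Node("NP"), xp])

print(extended_sisters(hnp, vp))     # True: same lexical cover
print(extended_sisters(hnp, trace))  # False: trace is inside VP
```

On this sketch the HNP and the VP containing its trace qualify, since only functional projections separate them, while the trace itself does not, since VP, a lexical projection, dominates it but not the HNP.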
7.7 Conclusion
Let us review. First, the language-internal facts from English, at least, do not
bear on the question of whether there is rightward and leftward movement, or
just leftward movement. In fact, there is no empirical reason why there cannot
be strict leftward branching, with rightward movement deriving all of the
ordering and relative height facts, essentially the converse of the MHS
analysis.
Second, the facts do bear on the question of what form such an analysis
must take. For example, an account invoking leftward movement must be of
the High Specifier type for both extraposition and heavy noun phrases. In
particular, neither the Stranding analysis of extraposition nor the Predicate
raising analysis of HNPS gives rise to an empirically adequate account, unless
of course they involve movement to a high specifier as part of the derivation.
Third, the choice between successful leftward and rightward movement/
adjunction alternatives must hinge on their relative explanatory potentials.
For instance, we have argued that both types of account require separate
stipulations with the effects of the Complement Principle and the Rightward
Movement Constraint. If these stipulations can be derived from other con-
siderations on one or the other view, that view gains an advantage over the
other, to the extent that the derivation has no comparable equivalent on the
other view. (At present we can see no way of eliminating these stipulations on
either view.) Whatever the outcome of future exploration of these and related
questions, it remains clear that the question whether rightward movement
exists or not, at least for these constructions of English, is not an empirical
one.
8
Remarks on Chapter 8
I wrote a first version of this article for the Going Romance conference at the
University of Utrecht. I had been away from syntactic research for a few years
due to a flirtation with academic administration, but an ongoing reading
group that I had organized with Shigeru Miyagawa had helped me stay some-
what aware of what was going on. I was interested in what was happening with
the ‘exploded Infl’ proposed by Pollock (1989), and thought I would try to
apply the same type of analysis to the left periphery of English. I proposed that
English has an invisible functional Pol(arity) head between C and Infl. I did
not publish this paper in a journal because I was suspicious of the account
of the amelioration of the that-t effect when there is an adverb in [Spec,Pol]
between that and t, the ‘Adverb Effect’. Ultimately I argued against ECP
accounts of the that-t effect on the basis of the Adverb Effect—see
Chapter 9 below.
Much of the later work in the subsequent ‘cartographic’ framework
addresses some of the problems with the approach explored here and gener-
alizes it to languages other than English (see Cinque 2002; 2006; Belletti 2004;
Rizzi 1997; 2004; Cinque and Rizzi 2008).
* [This chapter appeared originally in Denis Delfitto, Martin Evergert, Arnold Evers, and Frits
Stuurman (eds), Going Romance and Beyond, OTS Working Papers, University of Utrecht (1992).
Portions of this material were presented to audiences at the University of Arizona, the Rijksuniversi-
teit van Utrecht, and ESCOL. For helpful discussion, criticism, and specific suggestions regarding the
analyses proposed in this paper I would like to thank Andy Barss, Arnold Evers, Hans den Besten, Alec
Marantz, Shigeru Miyagawa, J. J. Nakayama, David Pesetsky, Tom Roeper, Bonnie Schwartz, Frits
Stuurman, Laurie Zaring, and especially Marc Authier, Peter Coopmans, Heizo Nakajima, Michael
Rochemont, and Ayumi Ueyama. Naturally I am responsible for any errors.]
topicalization, inversion, and complementizers 213
8.1 Introduction
I argue in this paper that there are two complementizer-type positions in
English, as illustrated in (1). ‘Pol(P)’ abbreviates ‘Polarity (Phrase).’1
(1) [CP Spec C [PolP Spec Pol [IP . . . ]]]
The various arguments that I give are directed towards demonstrating that
there are generalizations that can be best explained if we assume the existence
of both C and Pol, with their associated maximal projections and specifiers.2
I will suggest that C ranges at least over that, Q, and [e], while Pol may be
at least neg, wh, and so.3 There is also evidence that Pol may be Focus.
Movement into [Spec,PolP] is licensed through Spec-head agreement, as is
movement into [Spec,CP]. Such licensing depends crucially on the ability of the
particular head to participate in an agreement relationship with Spec
(Chomsky 1986; Rizzi 1990; 1996). Movements into [Spec,PolP] yield Subject
AUX Inversion (SAI) because of the need for Pol when it is a bound morpheme
to adjoin to an overt element. I assume that ‘topic’ topicalization, where the
topic does not carry primary stress (Gundel 1974), is adjunction to a maximal
projection (e.g. CP, PolP or IP), and is not substitution for a Spec (Lasnik and
Saito 1992; Rochemont 1989). However, I suggest that ‘focus’ topicalization
(Gundel 1974) may in fact be substitution for [Spec,PolP] when Pol is Focus.
These points are developed in the following way. §8.2 demonstrates that
topicalization and Negative Inversion involve very different landing sites for
the fronted constituent. Topicalization creates a ‘topic island’ while Negative
1 I adapt the category Pol from Johnson (1989), who makes different use of it than is proposed
here. For Johnson, Pol is the category of the ‘adverbs’ so, too, and not. My proposal resembles several
others that have appeared recently, as well. Laka (1990) proposes a head for English, Spanish, and
Basque that resembles Pol in many respects; I will suggest a variety of additional evidence for her
general proposal as well as several modifications. Ueyama (1991) has argued for a similar head in
Japanese, while Koizumi (1991) proposes a somewhat different M(P) for ‘modal’ adverbs in
Japanese; the two proposals are not entirely compatible, however. Haegeman (1991) argues exten-
sively for a neg(P) external to IP in West Flemish, which appears to have many of the properties of
Pol when Pol takes on the value neg in my analysis. Authier (1991) suggests that CP can iterate in
English, yielding superficially similar structures to those that I investigate in this paper.
2 The view that there are two adjunction sites to the left of the subject is not entirely novel; see
e.g. Grosu (1975) and Reinhart (1981b). Reinhart in particular is concerned with the fact that it is
possible to extract from Hebrew clauses that appear to have filled COMP (relative clauses and wh-
questions) in violation of Subjacency. Rather than take S to be a bounding node, she suggests that
there are two escape hatches in Hebrew (and in Italian) but only one in English. The framework
within which their arguments are couched is sufficiently different from the current one that it is
not entirely clear how their evidence can be brought to bear on the current proposal.
3 Another value of C which I will not discuss at length here is Rel(ative). See fn. 26 below.
Laka (1990) shows, following Klima (1964), that there is a phonologically empty morpheme that
denotes affirmation and is in complementary distribution with neg.
Inversion does not. The conclusion is that the first is adjunction, while the
second is substitution into a specifier position to the right of the comple-
mentizer, i.e. [Spec,PolP].
§8.3 presents a range of new evidence to support the analysis. (i) The presence
of Pol in addition to C allows certain subject that-t extractions not to violate the
ECP. (ii) The existence of C and Pol allows us to explain why inversion occurs
in embedded sentences with fronted negation and so, but not with fronted wh.
(iii) The analysis extends naturally to an account of Sluicing (Ross 1969b).
(iv) The availability of two complementizer positions, each of which has a
Spec, allows us to explain some subtle differences between why and how come,
on the assumption that they are both generated outside of IP (see Rizzi 1990).
In order to account for the licensing of subject wh and subject neg/so, it is
necessary to assume that in English PolP may be a complement of Infl as well
as of C. §8.4 pursues some implications of this analysis and extends it to the
account of focus constructions in Hungarian, English, and other languages.
§8.4 also examines briefly the implications of the Pol analysis for the verb
second phenomena of the Germanic languages.
For the purposes of this paper I will adopt aspects of the theoretical
perspective of Rizzi (1990) as modified by Cinque (1990), as well as that of
Lasnik and Saito (1992). The points that are most relevant to the investigation
here are the following.
Head government: The formal licensing portion of the ECP is reducible
to a requirement of proper head government.4
Spec-head agreement: A filled Spec is licensed by Spec-head agreement
(Rizzi 1990; 1991).
Empty C agrees: In English, that is inert with respect to agreement, while
empty C can agree with Spec. Thus, movement of a subject through
[Spec,CP] is licensed when C is empty, because C is coindexed with the
[Spec,CP] through Spec-head agreement, hence with the trace in [Spec,
IP] (Lasnik and Saito 1992; Rizzi 1990; Rochemont and Culicover 1990).
Topic islands: Adjunction to a maximal projection creates a barrier to
extraction (Lasnik and Saito 1992; Rochemont 1989). Following Cinque
(1990), a single barrier to movement bars extraction; hence topicaliza-
tion through adjunction creates a ‘topic island’.
X0 adjunction: Movement of a head X0 is always structure-preserving,
i.e. it is either adjunction to another X0 or substitution for empty X0
(Chomsky 1986; Baker 1988).
4 The term “proper head government” is taken from Rizzi (1990). Lasnik and Saito argue
that lexical government and antecedent government are distinct notions, but that only an X0 can
be a proper governor. For many cases, the two approaches converge, although the phenomena
are grouped somewhat differently.
5 See Diesing (1990) for discussion of V-second in Yiddish.
6 As discussed in §8.4, there are two types of topicalization, with different intonations. It is
marginally more acceptable to extract from the ‘focus’ topicalization structure, which I suggest
may not be an adjunction structure but a substitution for a Spec.
7 Specifically, Cinque proposes the following definitions of barrier.
(113) Definition of barrier for government
Every maximal projection that fails to be directly selected by a category nondistinct from
[+V] is a barrier for government.
(114) Definition of barrier for binding
Every maximal projection that fails to be (directly or indirectly) selected in the canonical
direction by a category nondistinct from [+V] is a barrier for binding.
8 I thank Shigeru Miyagawa for suggesting this formulation to me.
9 If we wish to allow IP to be an inherent barrier, then an alternative account is possible.
Lasnik and Saito (1992) and Rochemont (1989) propose that adjunction to IP creates a ‘topic
island’ with respect to subsequent extraction from IP. The new IP node constitutes an extra
barrier. A Subjacency violation follows when something is extracted across the original IP, which
is a barrier, and the barrier created by adjunction of the topic. (i) illustrates.
(i) I asked [CP what [[IP [to Mary] [[IP Bill gave t t]]]]]
(The double brackets indicate the two barriers that what must cross.) Thus, the examples in (5)
are ruled out for two reasons. First, extraction of the wh over the two barriers is a Subjacency
violation; second, movement of Infl over the two barriers is a Subjacency violation.
It is also possible that the topic islands are a reflex of Relativized Minimality (Rizzi 1990). On the
face of it, both adjunction of the topic to IP and substitution of wh into [Spec,CP] are
A′-movements, and thus should yield a Relativized Minimality violation in combination. I leave the
question open for now; for some additional considerations, see the discussion in fn. 14 below.
At no time, etc. are fronted expressions that are preceded by C and are
followed by an inverted I(nfl). If they are topics, they are adjoined to
IP. Then in these inversion examples, Infl must also adjoin to IP, in violation
of the requirement that movement of a head be a substitution or an adjunc-
tion to another head.10 On the other hand, if the fronted expression is
adjoined to CP, then that cannot be C.
Extraction from clauses in which Negative Inversion has applied cannot be
easily accommodated within this framework, regardless of which structure we
choose. If Negative Inversion is assumed to pattern like a wh-question,
extraction from a Negative Inversion clause should be blocked by the same
mechanism that blocks extraction from wh-islands in English. On the other
hand, if Negative Inversion is assumed to pattern like topicalization, extrac-
tion should be blocked by the same mechanism that blocks extraction from
topic islands. In either case, extraction should be unacceptable, but it is not.
The relevant data is given in (9)–(14).11
10 This assumption is not universally accepted. It is not made in e.g. Rochemont and
Culicover (1990), and it does not appear to be made by Lasnik and Saito (1992). It may well
be possible to replace the requirement that X0 movement and even XP movement be structure-
preserving by a requirement that adjunctions be properly licensed, along lines suggested by
Fukui and Speas (1986), Hoekstra (1991), and Culicover (1993b).
11 There appears to be a ‘focus’ topicalization construction in English that differs from the
‘topic’ topicalization construction intonationally, and in not creating a topic island. The starred
examples in (9), (11), and (13) are much improved under the ‘focus’ topicalization reading. See
§8.4 for discussion.
12 I assume here that a negative constituent in [Spec,CP] should count as an A′ minimality
domain for a wh in a higher [Spec,CP] that c-commands it. However, as I note below, it turns
out that Relativized Minimality does not hold for wh/negative interactions. Even if Relativized
Minimality does not apply, the force of the evidence is still that there is a maximal projection
different from CP involved in the derivation of negative inversion.
13 It has been proposed that that may take a CP complement (Rizzi and Roberts 1989; Authier
1991); Chomsky (1977) adopts a similar approach in an earlier framework. Such a structure must
be severely constrained so that illicit sequences are not generated: *that that ( . . . ), *who that, *at
no time who, *at no time that, etc. Taking PolP to be the complement of C imposes these
restrictions directly, in terms of the range of C and of Pol. In some sense, of course, the two
options are notational variants of one another.
adjunction of the negative constituent would create a topic island and block
the movement of XP into [Spec,CP]. As the following sentences show, relative
clauses allow Negative Inversion.14
(15) These are the books which {?*with great difficulty Lee can carry / *to Robin Lee will give / *on the table Lee will put}.
(16) These are the books which {only with great difficulty can Lee carry / only to Robin will Lee give / only on this table will Lee put}.
Again, the evidence suggests that there is an additional landing site for the
negative constituent that is distinct from [Spec,CP].
14 The fact that it is possible to extract from a Negative Inversion sentence undermines the Relativized Minimality account of topic islands (see fn. 9 above). Negative Inversion involves substitution for Spec, and hence is an A′-movement. If topicalization and wh-Movement are also A′-movements, they should be blocked by the movement of a negative constituent into Spec, but they are not. One inference to draw is that movement of a negative into Spec is a different type of movement from topicalization and wh-Movement, so that Relativized Minimality does not apply. But then it is equally or more plausible on formal grounds that topicalization and wh-Movement are also different types of movement from one another.
(17) a. Robin met the man {whoi / Opi that} for all intents and purposes ti was the mayor of the city.
b. This is the tree {whichi / Opi that} just yesterday I had tried to dig up ti with my shovel.
c. I asked whati in your opinion Robin gave ti to Lee.
d. Lee forgot which dishesi under normal circumstances you would put
ti on the table.
In each of these cases there is extraction of a wh-phrase over an adjunct, yet no
topic island violation of the sort seen in examples such as (11) and (15) above.
Why this should be the case is an independently complex matter that I cannot
go into here; in any case, the empirical evidence shows that not all adjuncts
create topic islands.
Assume now that if there is no Pol and nothing that must be adjoined to
PolP, PolP is not present. If PolP is not present and if there is an adjunct that
does not create a topic island, a constituent α can move over the IP-adjunct into [Spec,CP], as in (18).15
(18) [CP [Spec αi] C [IP XP [IP . . . ti . . . ]]]
Suppose next that XP is adjoined to PolP, again not producing a topic island
in this case. A constituent α can move into [Spec,PolP] and then into [Spec,CP] over a PolP-adjunct, if there is no topic island, as shown in (19).
(19) [CP [Spec αi] C [PolP XP [PolP [Spec ti′] Pol [IP . . . ti . . . ]]]]
Thus, in cases where adjunction does not create a topic island, there will be
two possible structures for extraction over the topic, namely (18) and (19).
Suppose now that αi is the subject of IP. Furthermore, let C be that, which cannot undergo Spec-head agreement (Rizzi 1990). I continue to assume that there is an XP adjunct in each case that does not create a topic island.
(20) a. . . . [CP [Spec αi] that [IP XP [IP ti . . . ]]]
b. . . . [CP [Spec αi] that [PolP XP [PolP [Spec ti′] Poli [IP ti . . . ]]]]
15 I am assuming for completeness that adjunction to IP of a non-topic island adjunct is a possibility. But nothing hangs on this assumption. Suppose that we could independently demonstrate that the non-topic island adjuncts are not moved, but generated in adjunct position in D-structure. Then things would actually be simpler if we were to assume that there are no D-structure IP adjuncts. We could continue to suppose that Move α can adjoin either to IP or to PolP. All of these conclusions are consistent with the analysis later of why, which I argue is generated in D-structure in [Spec,PolP].
16 As Peter Coopmans has pointed out to me, a question now arises as to the status of ti′ in (20b). This trace is not lexically governed or antecedent-governed under the definition of Rochemont and Culicover (1990) or head-governed under the definition of Rizzi (1990). The most natural approach to take here is to say that the correct structure when the that-t effect is suspended is not in fact (20b), but (i).
(i) αi . . . [CP [Spec ] that [PolP XP [PolP [Spec ] Poli [IP ti . . . ]]]]
Either (i) is a long extraction of the sort discussed by Cinque (1990), or the non-argument trace can be freely deleted in LF (Lasnik and Saito 1984). What is essential is that the empty Pol is licensed by the adjoined XP and in turn licenses the empty subject position, which is not possible when XP is adjoined to IP, as in (ii), or when there is no adjunct, as in (iii).
(ii) αi . . . [CP [Spec ] that [IP XP [IP ti . . . ]]]
(iii) αi . . . [CP [Spec ] that [IP ti . . . ]]
On the long extraction approach, the mechanism by which an empty Pol (or C) head governs
the subject cannot involve Spec-head agreement, since there is nothing in [Spec,PolP].
(22) a. *Robin met the man {whoi / Opi that} Leslie said that ti was the mayor of the city.
b. *This is the tree Opi that I said that ti had resisted my shovel.
c. *I asked whati Leslie said that ti had made Robin give a book to Lee.
d. *Lee forgot which dishesi Leslie had said that ti should be put on the table.
(23) a. Robin met the man {whoi / Opi that} Leslie said [ei] ti was the mayor of the city.
b. This is the tree Opi that I said [ei] ti had resisted my shovel.
c. I asked whati Leslie said [ei] ti had made Robin give a book to Lee.
d. Lee forgot which dishesi Leslie had said [ei] ti should be put on the table.
In order to capture the difference between (21) and (22), we must make the
natural assumption that when Pol and [Spec,PolP] are entirely empty and
nothing adjoins to PolP, PolP is pruned from the structure. Otherwise, if we
were to allow empty [Spec,PolP] and a PolP with nothing adjoined to it, we
would expect never to get the that-t effect. Crucially, we cannot take the non-topic island adjuncts to be in [Spec,PolP], because we would then lack the
formal mechanism for linking Pol with the subject in trace position through
Spec-head agreement with a trace in [Spec,PolP]. (But see fn. 16 above for
some indication that presence of the empty Pol itself, and not the contents of
[Spec,PolP], is what is relevant here.)
We predict that the counterpart of the that-t effect will be suspended in case
the complementizer is other than that. It is impossible to test this prediction
in the case of infinitives, because Pol only selects tensed IP complements (see
fn. 22 below). But suppose that the complementizer is Q, to be discussed in
greater detail in §8.3.2 below. There appears to be a suspension of the ‘Q-t’
effect as well.
(24) a. *Who did Lee wonder whether t had left
b. ?Who did Lee wonder whether Leslie had seen t
c. ?Who did Lee wonder whether just yesterday t had left
d. *Why did Lee wonder [whether Leslie had left t]
e. *Why did Lee wonder [whether just yesterday Leslie had left t]
Assume the analysis of Cinque (1990). Example (24a) is an ECP violation,
since the subject is not head-governed. Long movement of the subject does
not save this sentence. (24b) involves long extraction from a weak island.
There is no ECP violation, since the direct object is properly head-governed.
Example (24c) should be judged as acceptable as (24b), since presumably the
empty Pol properly head-governs the subject in this case. While the judgment
is somewhat subtle, the acceptability of this example appears to be closer to
that of (24b) than to that of (24a) and (24d,e), which are ECP violations.
Examples with other adjuncts confirm this general tendency.
(25) a. ?the person who Lee wondered whether *(for all intents and circumstances) t was already the Democratic candidate
b. ?the pasta that Lee forgot whether *(in your opinion) t should be served for dinner
c. ?What did Lee wonder whether *(under more normal circumstances) t would have been served for dinner
17 Of course it is possible to front a negative constituent without inversion, as shown by Klima (1964). I am focusing here on those cases in which the negative has sentential scope. For discussion of the interpretive difference between Negative Inversion and ordinary topicalization, see Klima (1964), as well as Liberman (1974) and Rochemont (1978).
18 We may take a similar approach to so-Inversion, illustrated in (i).
(i) So many people did John insult that he did not dare return home.
We would therefore predict that extraction from a so-Inversion context will be grammatical, by
analogy with extraction from a Negative Inversion context. The judgments are marginal at best,
however, for reasons that are not clear to me.
(ii) a. Mary says that she will sell this book to so many people that she will become rich.
b. ?This is the book that Mary says that to so many people will she sell that she will become
rich.
(iii) a. Mary says that she will put the books on so many tables that the floor will collapse.
b. ?These are the books that Mary says that on so many tables will she put that the floor
will collapse.
(iv) a. Mary says that she will read the book with so much attention that she won’t hear the
phone ring.
b. ?This is the book that Mary says that with so much attention will she read that she won’t
hear the phone ring.
[The marginal sentences are instances of crossing dependency, which could be responsible for
the judgment.]
19 Laka (1990: 40) proposes that Infl must move to neg as a consequence of the following Tense c-command condition, based on a suggestion by Pollock (1989): “negation must be c-commanded by Infl at S-structure.” More generally, in S-structure Tense must dominate all other inflectional elements, including neg. If I am correct that English has both a complementizer Q and a Pol wh, then the fact that Infl does not raise to Q might constitute a problem for such an approach.
20 A not dissimilar account is given by Rizzi (1996). Rizzi suggests that in wh-questions I is marked [+wh]. This I moves to C in order to license Spec-head agreement with a wh in Spec. The two approaches are technically very similar. One difference appears to be that by incorporating Pol into I in the form of a feature, we would lose the ability of empty Pol to license a subject trace, as discussed in §8.3.1.
21 In fact, it may be that in some languages, Q is realized overtly as that (or whatever corresponds to that). For example, Bavarian (Bayer 1984) may have the sequence wh-daß.
(i) I woass ned [wanni (dass) [da Xavea ti kummt]]
I know not when that the Xaver t comes
(ii) Es is no ned g’wiess [weai (dass) [ti kummt]]
it is yet not sure who that t comes
(iii) dea Hund [deai (wo) [ti gestern d’Katz bissn hot]]
the dog which that t yesterday the cat bitten has
(iv) de Frau [deai (wo) [da Xavea ti a Bussl g’gem hot]]
the woman to-who that the Xaver t a kiss given has
Similar examples for relative clauses (but not questions) are cited for English by Grimshaw (1975) (see also Bresnan 1976; Chomsky and Lasnik 1977), where Rel is realized as that.
22 It is possible to have wh-infinitives in English, but not neg-infinitives or so-infinitives.
(i) a. I was wondering whether (or not) I should leave.
b. I was wondering what I should do.
c. I was wondering how many times I should call.
d. I expected that not once would I see John.
e. I expected that so many people would I meet that I wouldn’t be able to count them all.
(ii) a. I was wondering whether (or not) to leave.
b. I was wondering what to do.
c. I was wondering how many times to call.
d. *I expected not once to have seen John.
e. *I expected so many people to meet that I wouldn’t be able to count them all.
The current account crucially provides both [Spec,CP], the landing site for fronted wh, and
[Spec,PolP], the landing site for fronted neg and so. The evidence of these examples is that Pol
selects for Tense. Note in this regard that Negative Inversion cannot apply in subjunctives and in
imperatives.
(iii) a. It is important that you never talk to them.
b. *It is important that never (do) you talk to them.
(iv) a. You talk to no one.
b. *To no one do you talk.
(v) a. No one talks to anyone.
b. *To no one does anyone talk.
These facts follow if subjunctives and imperatives lack Tense but have Agr, as suggested by
Beukema and Coopmans (1989).
23 The somewhat greater acceptability in embedded questions of only-phrases than NegPs raises the possibility that there are different functional categories for the two.
24 As noted by Hooper and Thompson (1973), the restriction on the distribution of wh-inversion is not a syntactic one, since it can be found in subordinate clauses that have a ‘root’ function.
25 This agreement is referred to by Rizzi (1996) as the “wh Criterion” for wh-questions (following May 1985) and the “Neg Criterion” for the negative cases. One aspect of these criteria is that the Spec position must be filled. How this requirement is to be satisfied in the case of yes-no questions is a problem that I touch on below. Rizzi does not address it in his analysis.
There are thus the following derived structures for wh-questions and Negative
Inversion.
(31) a. [PolP [Spec WhPi] wh [IP . . . ti . . . ]]
b. [PolP [Spec NegPi] neg [IP . . . ti . . . ]]
In embeddings, [Pol wh] cannot appear. Therefore, there is no movement of
Infl in an embedded question. The WhP must move into [Spec,CP] in order
to undergo Spec-head agreement with the complementizer Q. But neg can
appear as Pol in an embedded sentence, and so there is embedded Negative
Inversion.
(32) a. . . . [CP [Spec WhPi] Q [IP . . . ti . . . ]]
b. . . . [CP [Spec ] C [PolP [Spec NegPi] neg [IP . . . ti . . . ]]]
Assume, as before, that PolP is optional.26
At this point it might be objected that the theory of interrogative syntax is
rendered unaesthetic by the assignment of interrogative properties to both C,
in the form of Q, and to Pol, in the form of wh. In fact, one might counter
this objection by saying that such a distribution is the norm. To support this
position, I note the analysis of negative complements of Laka (1990). Laka
shows that negative verbs such as deny, regret, and forget do not have the
feature neg, which explains why they do not govern Negative Polarity Items
(NPI) in object position, in contrast with not.
26 I do not discuss relative clauses at length in the text. My analysis suggests that the head of a
relative clause is the complementizer Rel, which must undergo Spec-head agreement with a
suitable constituent in [Spec,CP]. I predict that Negative Inversion and so-Inversion will be
possible in relative clauses, and they are.
(i) This is the man {that / who} only once did I talk to.
(ii) This is the man {that / who} so many times did I talk to that I was arrested.
Interestingly, Negative Inversion may apply when the constituent in [Spec,CP] is negative as well
as relative.
(iii) These are the people, none of whom had I ever seen.
The grammaticality of this sentence suggests the following derivation.
(iv) people, [CP [Spec [none of whom]i] Rel [PolP [Spec ti′] neg+Inflj [IP . . . tj . . . ti]]]
The NegP none of whom first moves into [Spec,PolP], where it triggers inversion. Presumably it
or its trace satisfies Spec-head agreement with neg. Then it moves into [Spec,CP], where it
satisfies Spec-head agreement with Rel.
(33) a. *I {denied / regretted / forgot} anything interesting.
b. I didn’t {say / claim / remember} anything interesting.
However, NPIs appear in complements of these verbs.
(34) I {denied / regretted / forgot} that anything interesting happened.
So Laka concludes, correctly I believe, that the complements of these negative
verbs contain the complementizer thatNEG, which governs the NPIs. In this
regard the negative verbs are entirely parallel to interrogative verbs, such as
wonder, ask, etc. in English, which select the complementizer Q.27 Thus, given
the existence of the negative polarity marker neg and the negative complementizer thatNEG, the existence of a parallel pair consisting of an interrogative
polarity marker wh and an interrogative complementizer is not surprising.
8.3.3 Whether
Let us turn to yes-no questions. The traditional analysis of yes-no questions in
generative grammar starts with the assumption that these are wh-questions in
27 Laka’s discussion is extensive, and I have given here only a brief motivation for the
analysis. Perhaps the strongest evidence in favor of her analysis is that, while normally NPIs
cannot be moved to the left of their governor, clauses containing NPIs can be so moved if they
contain the negative complementizer. Consider the following examples.
(i) a. Robin didn’t say anything interesting.
b. *Anything interesting, Robin didn’t say t.
(ii) a. Robin didn’t say that anything interesting happened.
b. *That anything interesting happened, Robin didn’t say t.
(iii) a. Robin denied that anything interesting happened.
b. That anything interesting happened, Robin denied t.
(Laka does not cite these cases, but does cite examples involving subject complements that make
the same point.) Along similar lines, note that the NPI must be c-commanded by the element
that governs it. Such a relationship does not hold in a pseudo-cleft, nor does ‘reconstruction’
feed the constraint that licenses NPIs. But within a selected negative clause in focus position of a
pseudo-cleft, an NPI is fine.
(iv) a. Robin didn’t do anything interesting.
b. Robin denied that anything interesting happened.
(v) a. *What Robin didn’t do was [anything interesting].
b. What Robin denied was [that anything interesting happened].
The force of this evidence, along with Laka’s, appears to show clearly the existence of thatNEG.
disguise, in that they contain a covert wh element that triggers inversion (Katz
and Postal 1964; Klima 1964). This element is whether.
The traditional approach to the direct yes-no question also assumes that
whether is deleted in S-structure. Such an analysis does not explain why this
deletion is obligatory, or why it is impossible in embedded wh-questions.
(35) a. (*whether) did you call Robin
b. I wonder *(whether) Lee called Robin
We could therefore modify the traditional analysis as follows. The absence of
whether in the S-structure of direct yes-no questions suggests that whether is
never in [Spec,PolP]. Rather, whether is a CP-adjunct, and thus will move into
[Spec,CP] to satisfy Spec-head agreement with the complementizer Q.
(36) . . . [CP [Spec whetheri] Q [PolP [Spec ] Pol [IP NP I VP]] ti ]
The treatment of whether as a CP-adjunct is consistent with Klima’s (1964)
analysis, in which whether has the underlying form wh-either. Either, for its
part, is plausibly analyzed as a CP-adjunct, the affective variant of too, as in
Robin didn’t leave, either; Robin left, too, etc.
On this view of whether as a CP-adjunct, inversion in a direct yes-no
question cannot be the reflex of movement of whether to [Spec,PolP] and
then deletion of whether. Inversion must arise from the adjunction of Infl to
wh when [Spec,PolP] is empty. The derivation of a direct yes-no question is
then as follows.28
(37) [PolP [Spec ] wh [IP NP Infl VP]] ⇒
[PolP [Spec ] wh+Infl [IP NP t VP]]
If Pol is neg or so, we will get inversion after whether, as illustrated in the
following examples, repeated from (30).
(38) a. ?Lee wonders whether at no time at all would Robin volunteer.
b. Lee wonders whether only then would Robin volunteer.
c. ?Lee wonders whether so many people did Robin insult that he does
not dare return home.
d. Lee will finally tell us whether or not to so many people did Robin
give his phone number to that we can expect phone calls all week.
28 This derivation obviously requires that empty [Spec,PolP] agrees with wh for the purposes
of Spec-head agreement, which appears to conflict with Rizzi’s (1996) wh-Criterion, which
requires that [Spec,CP] be overtly filled. In order to maintain this criterion, we would have to
assume the existence of an abstract operator (e.g. WH+SO) that is a PolP-adjunct. I can find no
independent syntactic evidence to support the existence of such an operator.
29 For a different view of if and whether, see Stuurman (1991).
30 There may be a phonologically empty variant of if that occurs in subjunctive inversion.
(i) a. If John had left, I wouldn’t have called.
b. Had John left, I wouldn’t have called.
Let us call this element if. Like overt if, this empty if is a C. I presume that, like neg and so, it must be bound even though it is phonologically empty. Thus we get inversion, as in (ii).
(ii) [CP [Spec ] if+hadi [IP John ti left]]
31 We do not get *I forget [whether Q [IP e ]], for reasons that are probably tied to the fact that Sluicing is a focusing construction, and whether cannot be in focus. For a general approach to the syntax of focus, see Rochemont (1986) and Rochemont and Culicover (1990).
(42) . . . but I forget [CP {who / what / where / when / how / why / which NP / how AP / etc.} Q [IP e ]]
In this construction, [IP e ] is interpreted in such a way that in LF it contains
a variable that is bound by the fronted WhP. For example, (43) is interpreted
as (44).a
(43) Robin saw someone, but I forget who
(44) ∃x (Robin saw x), but I forget who:x ⇒ ∃x (Robin saw x), but I forget who:x (Robin saw x)32
Crucially, there is no counterpart to the Sluicing construction for topicalization, fronted NegP or fronted SoP, as illustrated in (45).
(45) a. Robin saw someone, and I believe that Fred, *(Robin saw t)
b. Lee said that Robin saw someone, but I believe that not a single
person *(did he see).
c. Lee asked whether Robin saw everyone, and I said that so many
people *(did he see that . . . ).
The ungrammaticality of these examples supports the view that embedded
questions are structurally different from topicalization, Negative Inversion,
and so-Inversion in ways that I have already discussed. The ungrammaticality
of (45b) and (45c) follows directly from our analysis, since without the
possibility of inversion in the embedded clause, the morpheme neg or so
cannot be bound.
(46) . . . and NP V [CP [Spec ] that [PolP NegP/SoP {NEG / SO} [IP e ]]]
Necessarily, {NEG / SO} cannot cross over the filled [Spec,PolP] and adjoin to that. This is a plausible assumption to make for such a cliticization operation.
a For a more recent account of the interpretation of Sluicing that does not assume an empty IP, see Culicover and Jackendoff (2005; 2012).
32 This analysis of Sluicing entails that the island constraints cannot be conditions on the
LF representations since, as Ross pointed out in his original paper, there are well-formed
instances of Sluicing that violate the Complex NP Constraint, for example.
(i) John met a man who was wearing some kind of hat, but I don’t know what kind of hat
[*John met a man who was wearing t].
The ungrammaticality of (45a), on the other hand, may stem from the fact
that the empty IP is not formally licensed by that, owing either to the presence
of the topic, the inability of that to be a head governor in general, or both.
I leave the question open here.
Consider next (41b) and (41c). Here, unlike in the case of Sluicing, the
empty IP may be treated as a prosentential that does not contain a variable
that is bound from outside IP. I represent this IP as +pro, without claiming
that it necessarily has the properties attributed to +pro in the Binding theory.
(47) a. . . . [CP Spec that [PolP Spec neg [IP +pro ]]]
b. . . . [CP Spec that [PolP Spec so [IP +pro ]]]
Unlike in the topicalization case of (45a), the empty IP here is properly head-governed by {NEG / SO}. But because neg and so are morphemes that must be bound, these are ill-formed S-structures as given here. Suppose that neg and so adjoin to that over an empty Spec.33
(48) . . . [CP that+negi [PolP Spec ti [IP +pro ]]]
. . . [CP that+soi [PolP Spec ti [IP +pro ]]]
33 Alternatively, we may assume that cliticization of neg and so to that does not yield a well-formed PF representation, but that cliticization to the empty complementizer [e] does. This alternative is made attractive by the observation that in general not and so may only occur with that-Deletion verbs.
(i) a. I {believe / hope / expect / imagine / persuade him} {(that) S / so / not}.
b. Lee {*whispered / ?regretted / *ordered / *established} {(that) S / so / not}.
The generalization is not perfect, however, in that there are some verbs that allow that-Deletion but not {so / not}.
(ii) I {know / understand / remember} {(that) S / ?*so / ?*not}
34 I leave open in this paper the proper treatment of English not in auxiliary and other uses.
For some very interesting discussion, see Laka (1990), who takes not and n’t to be surface
realizations of neg. Alternatively, we might pursue the hypothesis that not is [Spec,PolP] when
Pol is neg, while n’t is neg. [For some additional discussion, see Ch. 6 above.]
35 Why we cannot say *I think that yes and *I think that no in English is an independent question. For some discussion, see Laka (1990).
36 An alternative is that why is generated in [Spec,PolP]. But how come must be a PolP-adjunct, as I show immediately below, so taking why to be a PolP-adjunct allows us to treat why and how come as essentially the same.
allow inversion, it cannot co-occur with the hell/in the world, and it cannot
occur with ever, in contrast with why and the other interrogatives.37
(50) a. {why / *how come} did Robin say that
b. {why the hell / *how come in the world} did Robin say thatb
c. {*why / how come} Robin said that
37 As Pesetsky (1987) shows, the hell/in the world is compatible only with the sentence-initial interrogative, i.e. the one that takes widest scope.
(i) a. who the hell hit Mary
b. who hit who
c. who the hell hit who
d. *who hit who the hell
e. *who the hell hit who the hell
b But ?how the hell come seems to be marginally possible.
38 A similar but distinct pattern holds for infinitival questions, e.g.
(i) a. . . . Robin didn’t know {??where to / when to / *what to / *who to / *how many to}.
b. . . . Robin didn’t know {?why not to / ?where not to / when not to / *what not to / *who not to / ?how many not to}.
I do not find the judgments stable, however, and therefore I will forgo attempting to account for them here.
but not
(57) a. *how {so / not}
b. *where {so / not}
c. *when {so / not}
d. *what {so / not}
e. *who {so / not}
Thus,
(58) He said he wanted to leave, but he didn’t say {?why / *how / *where / *when} so.
*when
(59) a. *He said that he did something for a strange reason, but he didn’t say
what so.
b. *He said that he wanted to see someone for some reason, but he
didn’t say who so.
Some speakers do not accept why so at all. But there is another elliptical
construction in which why so and how so appear to be quite acceptable, while
the other interrogatives are not.39
(60) A: Robin will not leave on time.
B: i. Why so?
ii. How so?
In this case, how so has more or less the interpretation of why so. Note that we cannot have *how not, which suggests that this use of how is idiosyncratic.
On the analysis of Sluicing in §8.3.5, the interrogative is in [Spec,CP], as
in (61).
(61) . . . [CP [Spec what] Q [IP e ]]
Crucially, what must bind a trace in the LF representation of the empty
IP, which is thus not a prosentential. But suppose that why originates as a
39 Thanks to Marc Authier for suggesting this argument to me.
PolP-adjunct, and Pol is {NEG / SO}. While why binds a trace, the trace is not contained within the minimal IP, which may therefore be +pro. Hence, why not/so has the underlying structure in (62).
(62) [CP whyi Q [PolP Spec {NEG / SO} [IP ti [IP +pro ]]]]
40 I leave open here the precise details of how the ellipsis is to be formally captured. For a range of views, see Sag (1976), Wasow (1979), and Williams (1977).
On the other hand, the other wh-words are moved into [Spec,PolP] by Move α. Consequently, if the IP is reconstructed as in (64), there will be no trace in
the reconstructed IP for the moved wh to bind, as in *what did Robin do and
how, shown in (65). The reconstructed IP is shown in strikeout.
(65) [PolP what [IP did Robin do t]] and [PolP how [IP Robin do what]]
The unavailability of a trace in the reconstructed IP for the moved wh explains
the ungrammaticality of the sentences in (63) that lack why or how come.41
By assuming that why and how come originate outside of IP we can also
account for the fact that only these interrogatives allow internal topicaliza-
tion. We have already seen that topicalization blocks extraction of a wh from
IP, because of the topic island created by adjunction. I repeat the examples
of (6).
(6) a. *I asked what, to Lee, Robin gave.
b. *Lee forgot which dishes, on the table, you are going to put.
c. *Robin knows where, the birdseed, you are going to put.
However, why and how come are generated outside of IP. Topicalization can
apply freely below them, adjoining to IP. The following examples demonstrate
that the prediction is correct.42
(66) a. I asked {why / how come}, to Lee, Robin gave the book
b. Lee forgot {why / how come}, on the table, you are going to put the dishes
c. Robin knows {why / how come}, the birdseed, you are going to put in the bird feeder
41 Along related lines, the following examples show that it is possible to have ellipsis in a relative clause when the relative proform is why or how come, but not when it is another relative proform, that, or empty complementizer.
(i) a. John would not tell me the reason why (not).
b. John would not tell me the reason how come (*not).
c. *John would not tell me the way how (not).
d. *John would not tell me the time when (not).
e. *John would not tell me the place where (not).
f. *John would not tell me the thing which (not).
g. *John would not tell me the person who (not).
42 Sentences such as these are problematic for Lasnik and Saito (1992).
43 Of course, we will still have to rule out the ungrammatical examples. The obvious
approach would be to extend the ECP for subject traces to cases in which Pol is not empty, e.g.
(i) [PolP [Spec whoi] wh+didj [IP ti tj leave]]
(ii) [PolP [Spec no onei] neg+didj [IP ti tj leave]]
The sequence Infl–Pol allows Pol to raise to Infl in order to be bound without
yielding the S-structure inversion pattern, as in (71).
(71) [IP whoi [Infl Past do] [PolP Spec wh [VP . . . ]]] ⇒
[IP whoi [Infl Past do]+wh [PolP Spec t [VP . . . ]]] ⇒
[IP whoi [Infl Past]+wh [PolP Spec t [VP . . . ]]]
After this raising, Infl is a composite head that can license the wh in subject
position through Spec-head agreement. Similarly for neg and so.44
For this derivation to work as intended, do must be deleted before V even
across Pol. A question then arises as to why not blocks the deletion of do, given
that not is an instance of the head neg (cf. Laka 1990).
This derivation also entails that when the subject is questioned, the interrogative remains in situ in S-structure, in contrast with questions where a non-subject is interrogative. Finally, empty [Spec,PolP] inside of IP does not block
the deletion of do, nor does it appear to be a landing site, for English at least.
I will not deal in detail with the first point, which appears to be merely a
technical matter.45 On the second point, there appears to be no strong
evidence that the interrogative is anywhere other than in subject position in
S-structure. The fact that the subject functions as the focus of the sentence
follows from the fact that it is a WhP in the scope of a wh functional head. As
shown by multiple wh-questions, a WhP need not move into Spec to be
interpreted as a focus.
(72) What did you give to whom?
The claim that a negative subject is in situ (as in no one left) is far less controversial, although the pattern appears to be exactly identical to that of the interrogative. In the negative case we would say that Pol is neg; similarly for so.46
In each case, ti is not properly governed, since it is not coindexed with wh. I speculate that when
Pol is wh, neg, or so, agreement with what is in [Spec,PolP] does not entail coindexing. But
when Pol is [e], agreement can only be accomplished through coindexing.
44 An alternative is that wh, neg, so, etc. may appear as features on I as well as functional
categories external to IP. This dual status of Pol is problematic, however, and should lead us to
eliminate one of the two possibilities. Because of space limitations I will not pursue this question
further here.
45 The obvious route to pursue is that not is [Spec,PolP], and the head is neg. Then do will be
deleted unless there is a filled Spec between it and V.
46 An examination of Spanish is instructive in this regard. In Spanish, a negative sentence has
an overt sentence-initial no unless there is a fronted negative constituent.
(i) a. no lo tengo
neg it I-have
b. Juan no lo tiene
John neg it has
topicalization, inversion, and complementizers 241
c And in the analysis in Ch. 9 of this book. For additional arguments that postverbal subjects
in focus constructions are in situ in VP, see Culicover and Winkler (2008).
(82) a. Into the room walked {no one / none of the women / few of the women}.
b. *{No one / None of the women / Few of the women} {into the room walked / did into the room walk}.
c. *{No one / None of the women / Few of the women} did I say that into the room walked.
47 There are alternatives, of course. It might be supposed e.g. that the inverted subject
position is a focus position, which requires that whatever occupies that position move to
Spec,CP in LF. While there is evidence for this position being a focus (see Rochemont and
Culicover 1990), this focus position crucially does not yield Weak Crossover, unlike S-structure
movement or true LF movement of a focus (see Chomsky 1977).
(i) a. *Whoi did hisi mother scold ti
b. *Hisi mother scolded johni
c. Onto hisi face fell johni
d. Onto hisi face fell which boyi
(87) a. Many people here drive General Motors cars, but no one [drives] a
Pontiac.
b. Many people here drive General Motors cars, but who {?[drives] /
does [drive]} a Pontiac.
(88) a. Many people here would drive a General Motors car, but no one
would [drive] a Pontiac.
b. Many people here would drive a General Motors car, but who would
[drive] a Pontiac.
(96) Robin opened the door {then / *when}.
One solution rests on the fact that wh is a clitic. If [Spec,PolP] is filled, then
wh cannot cliticize to I. If, in addition, V cannot adjoin to Pol, then wh will
not cliticize to anything, and sentences such as (94) will not be generated.
We would expect, in any event, that in some languages at least PolP could
have a phrasal [Spec,PolP] and a wh head internal to IP. In fact, Horvath
(1985) shows that the landing site for interrogative wh in main clauses in
48 Thanks to Peter Coopmans for raising this question for me.
There are also SVO languages with focus to the right of V (e.g. Swahili,
M. Rochemont, p.c.). In such a language, the focus constituent can be
moved into [Spec,PolP], and subsequent raising and adjunction of the
heads will move the verb to the left of the focus, as illustrated in (99).
(99) [IP NP [Infl+[Pol+Vi]]j [PolP Spec tj [VP ti . . . ]]]
For Arabic, Ouhalla (1994) has shown that there are two negative operators,
one external to TnsP (maa) and one internal to TnsP (laa). There are two
interrogative markers, ʔa and hal. Only the external interrogative is consistent
with disjunctive questions.
(100) a. ʔa Zaynab-a yu-hibbu Zayd-un ʔam Laylaa
Q Zaynab-acc 3ms-loves Zayd-nom or Laylaa
‘Is it Zaynab that Zayd loves or Laylaa?’
b. *hal Zaynab-a yu-hibbu Zayd-un ʔam Laylaa
49 Horvath views the focus position as governed by V. However, she raises the possibility
(1985: 146, n. 35) that an analysis similar to ours might be entertained, suggesting that the focus
position might be governed by Infl.
Suppose that Pol can be Focus. This value of Pol is distinct from wh
(interrogation), neg (negation), and so (emphasis). Since Focus is empty, it
can agree with its Spec, just as empty C can (Rizzi 1990). By assumption it is
phonologically inert and does not trigger inversion. We would predict that
certain instances of movement that appear to be topicalization are actually
movements to [Spec,PolP] of Focus. On the assumption that a topic can
adjoin to IP, we then predict the existence of two different structures for
essentially the same sequence in S-structure.
(105) [PolP [Spec XPi] Focus [IP . . . ti . . . ]]
[PolP Spec Pol [IP XPi [IP . . . ti . . . ]]]
Consider how these structures differ from one another and what empirical
predictions are made. First, there might be two intonations corresponding to
the two structures, where one intonation corresponds to a focus
interpretation and the other does not. Second, when XP is moved into [Spec,PolP] it
should be possible to extract over it, just as it is possible to extract over a
fronted negative constituent.
Concerning the prosodic difference, it has been noted in the literature that
there are two distinct topicalization intonation contours, ‘topic’ and ‘focus’
(Gundel 1974). The topic intonation is the typical ‘comma intonation’, where
the topic and the rest of the sentence constitute separate intonation groups.
(106) a. To Robin, I gave a book.
b. On the table, Lee put the books.
c. Last year, we were living in St. Louis.
d. In those days, we drove a nice car.
e. Robin, I really dislike.
The focus intonation is characterized by a primary stress in the topic and no
break between the topic and the rest of the sentence. It is possible for there to
be an additional primary stress elsewhere in the sentence as well.
(107) a. To robin I gave a book.
b. On the table Lee put the books.
c. last year we were living in St. Louis.
d. In those days we drove a nice car.
e. robin I really dislike.
(108) a. To robin I gave a book.
b. On the table Lee put the books.
c. last year we were living in St. Louis.
d. In those days we drove a nice car.
e. robin I really dislike.
The claim that the stressed elements in these sentences are foci is supported
by the fact that they can be used to answer corresponding questions
(Gundel 1974: ch. 5), To whom did you give a book?, etc.; To whom did you
give what?, etc.
Consider next extraction. PolP is not a barrier, since it is c-selected by C (in
the sense of Cinque 1990). Where the topic is in [Spec,PolP], then, we expect
that extraction from IP over PolP into a higher Spec should be possible.
Moreover, this extraction, if it is possible, should correlate with the focus
intonation difference.
The examples in (109)–(112) test this prediction. The first group of
sentences illustrates extraction over an IP-adjoined topic. In the (a) examples the
wh-phrase moves over the topic into the closest [Spec,CP]. In the (b)
examples the wh-phrase moves to a higher [Spec,CP]. In the (c) examples
the wh-phrase moves over the topic into the closest [Spec,CP] and Infl must
also move to the left of the topic in order to move into Pol. Hence Infl as well
as wh crosses both IP nodes in the (c) examples.
(109) a. *This is the book which, to Robin, I gave.
b. *Which book did Lee say that, to Robin, she gave?
c. *Which book did, to Robin, Lee give?
(110) a. *I picked up the books which, on the table, Lee had put.
b. *Which books did Lee say that, on the table, she had put?
c. *Which books did, on the table, Lee put?
(111) a. *This is the town in which, last year, we were living.
b. *In which town did Lee say that, last year, we were living?
c. *In which town were, last year, you living?
(112) a. *This is the car which, in those days, we drove.
b. *Which car did Lee say that, in those days, we drove?
c. *Which car did, in those days, you drive?
As we can see in these examples, with the comma intonation extraction over
the topic is uniformly ungrammatical.
Next, consider extraction over a focus. In the (a) examples we have
movement to an embedded [Spec,CP] without inversion, while in the (b)
examples we have movement to a higher [Spec,CP]. In the (c) examples, Pol
must be wh in order that the wh-question be well-formed. Hence Pol cannot
be Focus. The topic must be adjoined to IP, which creates a topic island. Thus
(114) a. I picked up the books which on the table Lee had put.
b. {Which book / What} did Lee say that on the table she had put?
b. {In which town / Where} did Lee say that last year we were living?
50 The sentences in (113)–(116) are somewhat reminiscent of Baltin’s (1982) well-known He’s a
man to whom liberty we could never grant.
51 Stylistic Inversion also has a smooth intonation, suggesting that it is a case of ‘focus’
topicalization.
In English, inversion must be triggered by wh, neg, or so. While Pol may
be Focus, Focus does not trigger inversion in English. German and the
Scandinavian languages differ from English in that inversion is found for
the most part with any fronted constituent. The languages in this latter group
can themselves be differentiated according to whether or not V2 in complements
is in complementary distribution with the presence of an overt
complementizer. For instance, the sequence [CP C–XP–V–NP– . . . ] is not possible in
German but it is in Faroese. (The Faroese examples are from Vikner 1991.)
(124) Ge. a. *Ich glaube, daß gestern habe ich es auf den Tisch gestellt.
b. Ich glaube, daß gestern ich es auf den Tisch gestellt habe.
c. Ich glaube, gestern habe ich es auf den Tisch gestellt.
d. *Ich glaube, gestern ich es auf den Tisch gestellt habe.
(125) Fa. a. Tróndur segði, at í gjár vóru dreingirnir als ikki ósamdir.
Trondur said, that yesterday were boys-the at-all not disagreed
b. *Tróndur segði, at í gjár dreingirnir vóru als ikki ósamdir.
c. *Tróndur segði, at í gjár dreingirnir als ikki vóru ósamdir.
Example (126) shows that in German there must be inversion when there is a
‘topicalized’ constituent. It is standard in the analysis of German and the
other Germanic languages to hold that the surface order Subject–Verb–XP is
derived by V2, where the subject occupies the [Spec,CP] position. Hence in
German the tensed verb in the complement must follow a clause-initial
subject.
(126) a. Ich glaube, daß Johann Maria gesehen hat.
I believe that Johann Maria seen has
b. *Ich glaube, daß Johann hat Maria gesehen.
c. *Ich glaube, Johann Maria gesehen hat.
d. Ich glaube, Johann hat Maria gesehen.
But in Faroese, the tensed verb need not move into second position.
(127) a. Tróndur segði, at dreingirnir vóru als ikki ósamdir.
Trondur said, that boys-the were at-all not disagreed
b. Tróndur segði, at dreingirnir als ikki vóru ósamdir.
Trondur said, that boys-the at-all not were disagreed
Suppose that we express these differences in terms of Pol. In English, Pol
ranges over wh, neg, so, Focus, and [e], while in the other Germanic
languages it ranges over topics generally. In all the languages but English, empty
Pol is a bound morpheme that must be bound to a lexical head, in particular,
V. In German, PolP is in complementary distribution with CP, while in
English and the other Germanic languages it can be a complement of
C. Hence in German, Pol is obligatory when there is no C; in the other
languages it is optional.52 (128) summarizes.
(128)                  English                   German     Faroese
Range of Pol           wh, neg, so, Foc, [e]     Topic      Topic
Empty Pol              Free                      Bound      Bound
Distribution of PolP   Optional                  C or Pol   Optional
                       complement of C                      complement of C
52 Since Pol selects only tensed IP, it follows that there are no wh-infinitives in German.
8.5 Summary
In this paper I have given evidence that there are two complementizer-type
positions in English, each of which is the head of a maximal projection. The
two heads, C and Pol, permit the explanation of a range of phenomena that
do not appear to be amenable to a one-complementizer analysis. For example,
the fact that there is no that-t violation when that is immediately followed by
one of a certain class of adjuncts is accounted for if empty Pol undergoes
agreement with the subject trace. The occurrence of SAI in embedded
Negative Inversion and so-Inversion sentences but not in embedded wh-questions
has a natural account if we distinguish pure complementizers such as that and
Q from polarity operators such as wh, neg, and so. The assumption that Pol
selects only tensed S’s but not infinitivals allows us to explain the fact that
there are only wh infinitivals, not negative or so infinitivals. The C/Pol analysis
also allows us to capture some facts about the behavior of why and how come
as well as some subtle differences between them. By assuming some relatively
minimal differences in the range of Pol and in the distribution of PolP with
respect to C, it appears that we may be able to account for some of the
differences among the Germanic languages regarding V-second phenomena.
Finally, I have proposed that PolP can appear not only as a complement of
C, but as a complement of I. When it is IP-internal, [Spec,PolP] can function
as the location of pre-V focus, as in Hungarian. Allowing Pol to be Focus
allows us to capture the difference between comma intonation and focus
intonation topicalization in English, and predicts correctly that certain
instances of topicalization will not create topic islands. In languages like
Arabic, external and internal neg and wh are overtly distinguished, which
supports the general picture developed for English.
9
Remarks on Chapter 9
This article is concerned with the fact noted in Chapter 8 that an adverb
(and other initial material) that intervenes between that and the trace of an
A′-extraction significantly ameliorates the that-t effect (*what do you think
that t happened? vs what do you think that just t happened?). I was unaware at
the time (i.e. had forgotten) that the data had been originally observed
by Bresnan (1977). The significance of the Adverb Effect is that it undermines
the ECP account—a grammatical constraint formulated in terms of
antecedent and/or head government—since the intervening adverb does not on
the face of it significantly alter the syntactic configuration. It is of course
possible to make ad hoc assumptions about what the structure is
when the adverb is present that will change the government relations in
the intended direction, but the phenomenon calls out for an alternative
perspective.
The Adverb Effect and its evil twin, the that-t effect, are among the
more interesting puzzles unearthed in the contemporary exploration of
English syntax. At this point I am convinced that the correct account is
not a strictly syntactic one, but rather one that appeals to the computation
of the correspondence between syntactic structure and interpretation. Robert
Levine and I offer some speculation in Chapter 10 along these lines, but a
genuine explanation has yet to be provided.
* [A condensed version of this chapter first appeared in Linguistic Inquiry 24: 557–61 as
Culicover (1993). I am very grateful to Chris Barker, Peter Coopmans, Michael Rochemont,
Philip Miller, Mineharu Nakayama, Bob Levine, Carl Pollard, and an anonymous Linguistic
Inquiry reviewer for helpful comments and criticisms on various aspects of this research. This
article was inspired in part by the reviewer pointing out examples like (8) in the text.]
the adverb effect 257
[CP Spec [C′ C [PolP Spec [Pol′ Pol(arity) IP]]]]
The familiar contrast that illustrates the that-t effect is given in (2) and (3).
(2) a. I expected (that) you would win the race.
b. Which race did you expect (that) I would win?
1 The same effect occurs with PPs topicalized out of VP, but it is more difficult to control for
the effects of crossing dependency and topic islands. The following examples appear to me to be
fairly acceptable, with focal stress on the topic.
(i) a. Robin met the man whoi Leslie said that [to kim]j ti had given the money tj.
b. I asked whoi you had claimed that [on this table]j ti had put the books tj.
(4) a. Robin met the man {whoi / Opi that} Leslie said that for all intents
and purposes ti was the mayor of the city.
b. This is the tree Opi that I said that just yesterday ti had resisted my
shovel.
c. I asked whati Leslie said that in her opinion ti had made Robin give
a book to Lee.
d. Lee forgot which dishesi Leslie had said that under normal
circumstances ti should be put on the table.
Let’s call this the Adverb Effect.2
First I will examine the Adverb Effect and consider what it suggests about
ECP accounts of the that-t effect. Then I will explore extensions of the Adverb
Effect and show that it has some interesting implications for the analysis of
parasitic gaps.
The (questionable) argument for the empty functional category Pol(arity)
that I alluded to above goes as follows. Suppose we assume that a subject trace
is licensed by an empty complementizer, but not by an overt lexical comple-
mentizer. There have been a number of proposals in the literature for deriving
this result. Let us assume for concreteness the proposal of Rizzi (1990), in
which one possible instantiation of the empty complementizer is Agr, which
agrees with the trace in [Spec,CP] by general Spec-head agreement and, by
transitivity, with the subject trace as well, as shown in (5).3
(5) [CP t′i Agri [IP ti . . . ]]
2 Note that the sentential adverbials in (4) in general do not give rise to topic islands (see (iii)
and (iv)), which have been discussed by Lasnik and Saito (1992) and Rochemont (1989).
(i) a. This is the tree {whichi / Opi that} just yesterday I had tried to dig up ti with my shovel.
b. I asked whati in your opinion Robin gave ti to Lee.
c. Lee forgot which dishesi under normal circumstances you would put ti on the table.
(ii) a. I think that, to Lee, Robin gave a book.
b. Lee said that, on the table, she is going to put the yellow dishes.
c. Robin says that, the birdseed, he is going to put in the shed.
(iii) a. *Whati didk, [to Lee]j, Robin tk give ti tj?
b. *[Which dishes]i arek, [on the table]j, you tk going to put ti tj?
c. *Wherei arek, [the birdseed]j, you tk going to put tj ti?
(iv) a. I asked whati, [to Lee]j, Robin gave ti tj.
b. *Lee forgot [which dishes]i, [on the table]j, you are going to put ti tj.
c. *Robin knows wherei, [the birdseed]j, you are going to put tj ti.
It is not clear whether this is related to the Adverb Effect.
3 See Rochemont and Culicover (1990) for a similar account.
a This stipulation subsequently evolved into a general principle of Optimality Theoretic
syntax; effectively, structure is not present unless it is needed to host an overt constituent. See
Grimshaw (1997).
(8) Leslie is the person who I said that only then would run for President.
This example appears to be comparable in grammaticality to one that con-
tains a non-negative adverbial.
(9) Leslie is the person who I said that at that time would run for President.
Fronted only then typically causes Negative Inversion. Suppose therefore that
the structure of (8) is as in (10).
(10) . . . whoi [I said [CP that [PolP [only then] [Pol wouldj][IP ti tj run for
President]]]]
The main problem is that it is not clear how ti is properly governed.
Wouldj cannot head-govern ti, since the two are not coindexed. Similar
configurations involving interrogatives are ill-formed, as Rizzi (1990) notes.
(11) a. *whoi didj [IP ti tj sleep] (from Koopman 1983)
b. *[isj [IP ti tj intelligent]] [every man in the room]i
So we don’t really want to re-index wouldj and tj with i in (10).
It might be thought that perhaps the negative adverbial in this case does
not actually trigger Negative Inversion. Note, however, that the negative
adverbial takes sentential scope, since it licenses polarity items.
(12) a. Leslie is the person who I said that at no time would run for any
public office.
b. Robin met the man who Leslie said that only then had seen anything
moving.
c. It is Leslie who I believe that only for one moment had given a damn
about the budget.
Topicalized negative phrases, i.e. those that don’t trigger Negative Inversion,
cannot license polarity items.
(13) a. At no time would Leslie run for any public office.
b. *At no time(,) Leslie would run for any public office.
(14) a. Only then did Leslie see anything moving.
b. *Only then(,) Leslie saw anything moving.
(15) a. Not once had Leslie given a damn about the budget.
b. *Not once(,) Leslie had given a damn about the budget.
4 If there is inversion in (8), we might expect that in the absence of a modal, the sequence
Tense-[NP t]-V- . . . would trigger do-support. Then (i.a) should be grammatical and (i.b) should
be ungrammatical.
(i) a. ??Leslie is the person who I said that only in that election did run for any public office.
b. Leslie is the person who I said that only in that election ran for any public office.
I speculate that the oddness of the first example is due to the fact that the sequence did V with
unstressed did is marginal in PF, regardless of the presence of the empty category. The second
example, while grammatical, has an analysis in which the adverb only in that election appears
between Infl and VP.
5 This negative conclusion is not an argument against the existence of Pol. I am suggesting
that the Adverb Effect simply does not constitute evidence for the existence of Pol.
6 A similar account is proposed by Kayne (1981a).
7 There is no question here of some dialect variation involving the status of the
complementizer that, as suggested by Sobin (1987), since speakers such as myself who have the that-t
effect also accept sentences in which it is suspended.
satisfy ECP. If the ECP must hold for the subject trace, either the ECP doesn’t
involve head government, or the subject trace is head governed.
What are the potential consequences? If head government is not part of
ECP, then we have to worry anew about argument/adjunct differences in
extraction, no small task. If head government is a part of ECP, and if the
subject is head-governed (e.g. by Infl or by C), there are then questions of
what the head governor is and how to account for the Negative Inversion
cases discussed above (see (8)). With each alternative, we are faced with a
different set of complicated consequences that are worth pursuing, but lack of
space prevents me from pursuing them here.
Whether the Chomsky–Lasnik type of filter is the correct account awaits
additional research, as does the question of how an empty subject is licensed.8
In the space remaining, I want to consider a broader range of cases in which
the that-t effect appears, showing that the Adverb Effect applies to
complementizers other than that and to certain parasitic gaps as well as true gaps.
8 See Pesetsky (1979) for arguments against the filter analysis of the that-t effect. In Culicover
(1992b) I explore the hypothesis that the filter is actually due to a prosodic constraint (at PF) on
the distribution of stress peaks in the neighborhood of wh-trace.
At worst there is still a weak wh-island violation, due to the extraction over
whether/if, but it is no worse than extraction from object position over
whether/if.
(19) This is a person whoi you might well wonder {whether / if} under some
circumstances you would dislike ti.
Very much the same judgments hold for the movement of an empty
operator, which we see in the cleft construction.
(20) a. *It is this person Opi that you might well wonder {whether / if} ti
dislikes you.
b. It is this person Opi that you might well wonder {whether / if} for
all intents and purposes ti dislikes you.
c. It is this person Opi that you might well wonder {whether / if} you
should pay attention to ti.
Consider next the Stylistic Inversion construction, illustrated in (21).
(21) On the table was put the book with the answers.
If the ‘subject’ gap (that is, the gap to the left of the verb) results from the
movement of the PP we get the same pattern as we get with the movement of a
subject NP.
(22) a. *[On which table]i were you wondering {whether / if} ti had been
put the books that you had bought?
b. [On which table]i were you wondering {whether / if} under certain
circumstances ti might have been put the books that you had
bought.b
And similarly for the cleft construction, where the empty operator is linked to
the PP in focus position.
(23) a. *It was on this table Opi that I was wondering {whether / if} ti had
been sitting [the book with the answers].
b. It was on this table that I was wondering Opi {whether / if} at some
time or another ti had been sitting [the book with the answers].
Like and as if occur in more restricted contexts, but display the same
behavior. Extraction of a non-subject is possible, extraction of a subject is
b My original judgment had (22b) as grammatical and (22a) ungrammatical. At this point it
seems to me that the adverb ameliorates (22b) in comparison with (22a), although it is still quite
marginal.
ungrammatical, and the Adverb Effect applies. Note the contrast between (c)
and (d) in the following examples.
(24) a. It seems like you lost your notebook.
b. This is the notebooki Opi that it seems like you lost ti.
c. *This is the person Opi that it seems like ti lost the notebook.
d. This is the person Opi that it seems like just a few minutes ago ti
lost the notebook.
(25) a. It seems as if you lost your notebook.
b. This is the notebooki Opi that it seems as if you lost ti.
c. *This is the person Opi that it seems as if ti lost the notebook.
d. This is the person Opi that it seems as if just a few minutes ago ti
lost the notebook.
The data thus confirm that not only does the that-t effect generalize to the
full set of complementizers (whatever its ultimate source), but the Adverb
Effect does as well.
9 In GPSG and related approaches, parasitic gaps are treated as similar to multiple extraction
from a coordinate structure. See Gazdar et al. (1985).
(28) *Whati did you buy ti after stating clearly that pgi could easily be made
at home?
(29) *This is the very person whoi you should ask ti whether pgi might be
consulting you in the future.
And, as in the extraction cases, a sentential adverb seems to improve matters.
(30) ?Whati did you buy ti after stating clearly that with very little difficulty
pgi could be made at home?
(31) ?This is the very person whoi you should ask ti whether under some
circumstances pgi might be consulting you in the future.
A more deeply embedded parasitic gap behaves in the same way.
(32) a. Whati did you buy ti after stating clearly that it was obvious that you
could make pgi yourself at home?
b. *Whati did you buy ti after stating clearly that it was obvious that pgi
could easily be made at home?
c. ?Whati did you buy ti after stating clearly that it was obvious that
with very little difficulty pgi could be made at home?
(33) a. This is the very person whoi you should tell ti whether you think
that you will consult pgi in the future.
b. *This is the very person whoi you should tell ti whether you think
that pgi should consult you in the future.
c. ?This is the very person whoi you should tell ti whether you think
that under some circumstances pgi should consult you in the future.
We may take these examples as showing that these parasitic gaps, like
some true gaps, are generated by ‘movement’ of an empty operator
(Chomsky 1986).10
Now let us turn to some cases where the Adverb Effect does not occur.
An empty subject that results from extraction cannot be adjacent to a subor-
dinating conjunction.
(34) *I met a person whoi I went and bought some jewelry just before ti
disappeared without a trace.
10 Notice that the possibility of nominative parasitic gaps calls into question the view that
there is a ‘case compatibility’ condition on the complex chain containing a parasitic gap and its
antecedent. It also undermines the account of Frampton (1990), in which the parasitic gap must
be ‘lexically identified’. Subjects, on Frampton’s analysis, are not lexically identified.
There is both a CED violation and a classical ECP violation here, because of
the extraction of a subject. The presence of an adverb does not appear to
reduce the ungrammaticality of the subject extraction case even slightly.
(35) *I met a person whoi I went and bought some jewelry just before for all
intents and purposes ti disappeared without a trace.
When there is no extraction site in the adjunct, but a parasitic gap, there
is presumably no CED violation. But a subject gap is worse than a non-subject
gap and, as before, a sentential adverb does not significantly improve
grammaticality.
(36) a. Whati did you pay for ti just before the store tried to repossess pgi?
b. *Whati did you pay for ti just before pgi was repossessed by the
store?
c. *Whati did you pay for ti just before for all intents and purposes pgi
was repossessed by the store?
These violations in CED configurations fall together with other
Subjacency-type violations in their resistance to the Adverb Effect. In (37) we see
that extraction from subject position of a relative clause is not improved by
the presence of the adverb.
(37) a. *This is the mani that the theoremj that ti proved tj contains a
serious error.
b. *This is the mani that the theoremj that for all intents and purposes
ti proved tj contains a serious error.
A similar result holds when the gap in the relative clause is a parasitic gap. (38)
shows the grammaticality of parasitic gaps in this construction, while (39)
shows the ungrammaticality of subject parasitic gaps in relative clauses.
(38) Beer is the only beverage whichi everyonej that tj likes pgi praises ti.
(39) *Beer is the only beverage whichi everyonej that pgi makes tj drunk
praises ti.
And (40) shows that a sentential adverb does not improve grammaticality.
(40) *Beer is the only beverage whichi everyonej that under any circum-
stances pgi makes tj drunk praises ti.
Robert D. Levine (p.c.) has pointed out that in these relative clauses there
is crossing dependency regardless of whether there is an adverb. This is
definitely a factor. I noted above that the Adverb Effect holds in an embedded
wh-question headed by whether, regardless of whether there is extraction
(cf. (18)) or a parasitic gap (cf. (33)). But in wh-islands in which something
has been fronted, the crossing dependency has a clear effect, which appears to
overwhelm the Adverb Effect (as shown in the c examples).
(41) a. ??whoi did you ask ti [whoj tj likes pgi]
b. *whoi did you ask ti [whoj pgi likes tj]
c. *whoi did you ask ti [whoj for a very good reason pgi likes tj]
(42) a. ??whati did you find out [whoj tj said ti]
b. *whoi did you find out [whatj ti said tj]
c. *whoi did you find out [whatj for a very good reason ti said tj]
Because the complementizer position that contains wh or a null operator in
the embedded S is adjacent to the subject position, there is no way to
dissociate the effect of crossing dependency from the effect of simply having
a subject trace adjacent to an overt complementizer.
Similar results hold for complex NPs (appositives):
(43) a. Beer is the only beverage whichi the fact that everyone likes pgi fails
to make ti more expensive.
b. *Beer is the only beverage whichi the fact that pgi makes people sick
fails to make ti less expensive.
c. *Beer is the only beverage whichi the fact that for all intents and
purposes pgi makes people sick fails to make ti less expensive.
—and for sentential subjects:
(44) a. Ed is the only politician whoi that everyone dislikes pgi appears to
bother ti.11
b. *Ed is the only politician whoi that pgi really dislikes people appears
to bother ti.
c. *Ed is the only politician whoi that for all intents and purposes pgi
really dislikes people appears to bother ti.
That is, a subject parasitic gap that is maximal in a Subjacency island is as
ungrammatical as a trace in the same position.
11 The acceptability of this sentence is enhanced by putting a brief pause after who and heavy
stress on dislikes and bother.
9.4 Summary
There is a general constraint against the sequence C-t, where C is an overt
complementizer or subordinating conjunction and not a relative/comparative
marker. The Adverb Effect somehow improves the grammaticality of an
empty subject by interposing material between the complementizer and the
subject. There are two types of response to the Adverb Effect. First, the Adverb
Effect applies to empty subjects (true gaps or parasitic gaps) in domains from
which extraction is in principle possible. These are the subjects of that-
complements and the subjects of whether-complements. Second, the Adverb
Effect is neutralized when the empty subject is maximal in a domain from
which extraction is in principle impossible, such as CED configurations,
relative clauses, appositive clauses, and sentential subjects.
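The descriptive generalization can be stated schematically. The following sketch is our own illustration, not part of the original analysis; the function name and position labels are invented. It models a clause as a flat sequence of labeled positions and flags the prohibited C-t sequence, with the Adverb Effect falling out as interposed material that breaks the adjacency:

```python
# Illustrative sketch (invented names): the C-t constraint as a linear
# filter over labeled positions. An overt complementizer immediately
# followed by an empty subject is flagged; interposed material (the
# Adverb Effect) breaks the offending adjacency.

def violates_c_t(positions):
    """`positions` is a list of (label, word) pairs; empty subjects are
    represented as ('t', None) or ('pg', None)."""
    for (l_label, l_word), (r_label, _) in zip(positions, positions[1:]):
        if l_label == 'C' and l_word is not None and r_label in ('t', 'pg'):
            return True
    return False

# *'... that t left' -- the overt complementizer immediately precedes the trace.
bad = [('C', 'that'), ('t', None), ('V', 'left')]
# '... that under no circumstances t would leave' -- the sentential adverb
# interposes, so the sequence C-t never arises.
good = [('C', 'that'), ('Adv', 'under no circumstances'), ('t', None), ('V', 'would leave')]

assert violates_c_t(bad)
assert not violates_c_t(good)
```

The sketch deliberately encodes only the linear generalization stated above; it does not capture the neutralization of the Adverb Effect inside extraction islands, which the summary treats as a separate fact.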
The paradox implicit in these observations is the following. On the one
hand it appears that extraction of subjects and parasitic gap licensing
of subjects are subject to the same barriers, even if only the former involves
movement across the extraction barrier. On the other hand, non-subject
parasitic gaps, and parasitic gap subjects of sentential complements, are
licensed in configurations where extraction is impossible.12 So, it appears
that what blocks extraction of subjects blocks parasitic gap subjects, but what
blocks extraction of non-subjects and subjects of sentential complements
does not block comparable parasitic gaps. The paradox lies in the fact
that we are presumably dealing with the same mechanisms of extraction in
all cases, the same mechanism for licensing parasitic gaps in all cases, and the
same characterization of barriers in all cases. Something has to give here.
I leave the problem for future investigation.
In conclusion, returning to the observations that launched this paper,
I have shown that the presence of sentential adverbs suspends the that-t effect,
and more generally, the C-t effect. This result calls into question classical ECP
accounts of this effect, in which that more or less directly blocks proper
government of the empty subject. The evidence suggests that the that-t effect
should be thoroughly reconsidered and the data re-evaluated, and with it the
portion of the theory that incorporates the ECP. The interaction between the
Adverb Effect and parasitic gaps suggests that the Adverb Effect may have
some additional diagnostic properties that will be useful in understanding the
nature of parasitic gaps, extraction, and barriers.
12 These generalizations hold particularly clearly if we exclude wh-islands from consideration
because of the crossing dependency effect noted earlier, and assume that extraction from a
wh-island is in principle possible (and ruled out for other reasons, e.g. Minimality).
10
Remarks on Chapter 10
Our goal in this paper was to understand the syntactic structure of Stylistic
Inversion (Into the room walked Sandy). We argue that the phenomenon
described and discussed in the literature as Locative or Stylistic Inversion
in English is actually a conflation of two quite different constructions: on the
one hand, light inversion (LI), in which the postverbal NP element can be
phonologically and structurally extremely simple, possibly consisting of a
single name, and on the other hand heavy inversion (HI), where the post-
verbal element is heavy in the sense of heavy NP shift. We present evidence
that the preverbal PP in LI patterns with subjects but the PP in HI is a
syntactic topic, using a variety of tests which distinguish A-positions from
A′-positions. Other significant differences between HI and LI, such as the
classes of verbs which support these two constructions, respectively, and the
differential behavior of HI and LI with respect to adverbial placement,
provide support for interpreting HI as a case of heavy NP shift applying to
subject constituents.
* [This chapter appeared originally in Natural Language and Linguistic Theory 19: 283–310
(2001). It is reprinted here by permission of Springer. An earlier version was presented at the
Colloque de Syntaxe et Sémantique, University of Paris VII, October 1995. We thank the
participants at that conference for their comments, as well as various other audiences elsewhere
which have provided us with helpful feedback, including the University of Girona. In addition,
we wish to express our appreciation for the care and effort evident in the responses to our paper
of several anonymous referees for NLLT.]
10.1 Introduction
Levin and Rappaport Hovav (1995) have recently argued against the view that
Stylistic Inversion is a diagnostic for unaccusativity.1 Rather, they suggest,
Stylistic Inversion occurs with a wide range of verbs, including unaccusatives,
passives, and—crucially—unergatives. We demonstrate in the following
discussion that the argument of Levin and Rappaport Hovav does not go
through, because they, along with all other students of Stylistic Inversion, fail
to observe that there are actually two Stylistic Inversion constructions in
English. One construction, which we call light inversion (LI), is restricted to
unaccusatives; the other, which we call heavy inversion (HI), is not (we explain
this terminology shortly). In general, it has been evidence of HI that has been
used to argue that Stylistic Inversion is not restricted to unaccusatives.
We begin by adducing evidence in §10.2 that in LI the fronted PP is a subject,
i.e. occupies the Spec position associated with IP. In §10.3 we elaborate our
claim that there are two Stylistic Inversion constructions, presenting a wide
range of evidence that Stylistic Inversion with ‘light’ subjects is possible only
when the verb is unaccusative; when the verb is unergative or even transitive,
Stylistic Inversion is possible, but only with a ‘heavy’ subject. The notion of
‘heavy’ here corresponds exactly to the one that is relevant to heavy NP shift
(see Arnold et al. 2000 for detailed discussion of the factors which heaviness
comprises). We assume that in the case of light inversion (LI), the subject is in
situ in VP, while in the case of heavy inversion (HI), the subject appears in
[Spec, IP] at some point in the derivation and subsequently postposes to the
right of VP. For concreteness we assume the following derivations.
(1) LI: [IP e I [VP V NPsubj PP …]] ⇒ [IP PP I [VP V NPsubj t …]]
(2) HI: [IP e I [VP NPsubj V PP …]] ⇒ [IP NPsubj I [VP tsubj V PP …]] ⇒
[IP t′subj I [VP tsubj V PP …] NPsubj] ⇒ [IP PP [IP t′subj I [VP tsubj V tPP …]
NPsubj]]
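The derivations in (1) and (2) can be rendered step by step as rearrangements of a flat string of constituents. The following is an illustrative sketch only; the function names and list representation are ours, invented for exposition, and deliberately ignore internal bracketing:

```python
# Illustrative sketch of derivations (1)-(2): each step is a flat list of
# constituent labels, with traces marked 't_...'. Invented names throughout.

def derive_li(v, subj, pp):
    """LI (1): the PP fronts to [Spec, IP]; the subject stays in situ in VP."""
    return [
        ['e', 'I', v, subj, pp],      # [IP e I [VP V NPsubj PP]]
        [pp, 'I', v, subj, 't_PP'],   # [IP PP I [VP V NPsubj t]]
    ]

def derive_hi(v, subj, pp):
    """HI (2): the subject raises to [Spec, IP], postposes rightward past VP,
    and the PP then topicalizes."""
    return [
        ['e', 'I', subj, v, pp],                          # [IP e I [VP NPsubj V PP]]
        [subj, 'I', 't_subj', v, pp],                     # subject to [Spec, IP]
        ["t'_subj", 'I', 't_subj', v, pp, subj],          # subject postposed past VP
        [pp, "t'_subj", 'I', 't_subj', v, 't_PP', subj],  # PP topicalized
    ]

li = derive_li('walked', 'Robin', 'into the room')
hi = derive_hi('slept fitfully', 'THE STUDENTS IN THE CLASS', 'in the room')

# In LI the subject never leaves VP; in HI it ends up clause-final.
assert li[-1][3] == 'Robin'
assert hi[-1][-1] == 'THE STUDENTS IN THE CLASS'
```

The difference the paper argues for is visible in the final steps: in LI the postverbal subject remains VP-internal, while in HI it ends up outside VP, having passed through [Spec, IP].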
We stress at the outset that the main focus of this paper is that there are two
constructions. Space considerations prohibit us from exploring in satisfactory
depth all of the technical questions bearing on the specific details. We do
assume, following proposals of Coopmans (1989) and Hoekstra and Mulder
(1990) among others, that the subject NP in (1) is selected as a sister of the
unaccusative verb. Either it or the PP moves into the higher specifier position,
which we assume to be [Spec, IP]. The apparent optionality of such move-
ment is an obvious problem from the perspective of a theory of movement
1 Throughout we use the term ‘Stylistic Inversion’. Another term commonly found in the
literature is ‘Locative Inversion’.
triggered by the need to discharge features (e.g. Chomsky 1995), but we will
not pursue this aspect of the analysis here.
More controversially, we assume that the sentence-final subject in (2) is
necessarily in [Spec, IP] at some point in the derivation, and that it ends up in
final position through movement. If this NP moves to the right, as we assume
in (2), then this clearly raises important questions in the light of the proposal
of Kayne (1994) that there are no rightward movements. For recent commen-
tary on this as well as other aspects of Kayne’s proposal, see the papers in
Beermann et al. (1997). It is conceivable that the proper derivation of HI does
not involve movement of the subject to the right, but rather movement of
everything else to the left. We will not be able to develop and evaluate here an
analysis along these lines.a
An additional complication is that movement of the NP to the right
leaves a trace that must be licensed. It is generally claimed that in the
configuration that t Infl . . . , the trace of the subject is not licensed (see
e.g. Rizzi 1997, and Chapter 9 above). The question then arises as to why
the subject trace would be licensed in the configuration PP t Infl . . . , as in
(2). Our hypothesis is that the licensing of the subject trace is not a strictly
grammatical phenomenon, but rather a processing effect.b Again, to develop
such an idea in satisfactory depth would take us far afield and away from the
primary focus of the paper.
In the following section, we briefly touch on the claim that LI occurs only
when the verb is unaccusative. The facts turn out not to be entirely simple, but
the generalization can be sustained more or less in this form. We support this
claim by providing a number of syntactic contexts in which LI is impossible,
but where HI yields a structure which creates the illusion of an ordinary
stylistically inverted form. §10.4 summarizes our conclusions and notes sev-
eral important issues which our conclusions raise, but which we have not been
able to address within the confines of this paper.
10.2 PP is a subject
Frequently cited evidence that the PP in Stylistic Inversion is a subject is the
following. First, long extraction of the PP produces a that-t effect, as first
noted in Bresnan (1977); see also Culicover (1993a) and Chapter 9 above. This
generalization extends to other complementizers (e.g. whether-t, extraction
from gerundives) that show a Comp-t effect (Pesetsky 1982). We illustrate the
relevant data in examples (3)–(7):
a But see Ch. 6 above, which argues against a leftward movement analysis.
b This conclusion is compatible with the arguments in Ch. 9 above regarding the Adverb Effect.
that-t:
(3) a. Into the room Terry claims (*that) t walked a bunch of gorillas.
b. Into which room does Terry claim (*that) t walked that bunch of
gorillas?
(4) That bunch of gorillas, Terry claims (*that) t walked into the room.
whether-t:
(5) a. ?Into this room, Terry wonders whether a bunch of gorillas had
walked t.
b. *Into this room, Terry wonders whether t had walked a bunch of
gorillas.
gerundive:
(6) a. Terry imagined a bunch of gorillas walking into the room.
b. Into the room Terry imagined a bunch of gorillas walking.
c. Into the room Terry imagined walking [a bunch of gorillas].
d. Into which room did Terry imagine a bunch of gorillas walking?
e. Into which room did Terry imagine walking [a bunch of gorillas].
f. [How many gorillas] did Terry imagine walking into the room?
(7) a. Terry thought about a bunch of gorillas walking into the room.
b. ?Into the room Terry thought about a bunch of gorillas walking.
c. *Into the room Terry thought about walking [a bunch of gorillas].
d. ?Into which room did Terry think about a bunch of gorillas walking?
e. *Into which room did Terry think about walking [a bunch of
gorillas]?
f. *[How many gorillas] did Terry think about walking into the room?
But this argument is far from conclusive, because it crucially assumes that it is
the fronted PP and not the postverbal subject which is responsible for the
trace in subject position. We argue later that the postverbal subjects in such
examples are exclusively heavy, in precisely the sense that distinguishes
constituents eligible to undergo heavy NP shift from those which are not,
and hence must be moved to their surface position from [Spec, IP]. What the
starred examples then show is that that-t is indeed ill-formed, but not that the
extracted PP is linked to the subject trace.
Second, the fronted PP in Stylistic Inversion appears to undergo Raising,
suggesting that it is a subject.
(8) a. A picture of Robin seemed to be hanging on the wall.
b. On the wall seemed to be hanging a picture of Robin.
2 In order to minimize notational complexity, we replace strings of traces with ellipses where
appropriate.
appears that the PP in the case of LI cannot undergo raising, in spite of the
fact that it is in [Spec, IP].3
Consider the following contrasts.
(10) a. Into the room appeared to be walking slowly a very large caterpillar.
b. Into the room walked Robin slowly.4
c. *Into the room appeared to be walking Robin slowly.
(11) a. Slowly into the room walked Robin boldly.
b. *Slowly into the room appeared to walk Robin boldly.
(12) a. Into the room singing walked Robin slowly.
b. *Into the room singing appeared to walk Robin slowly.
The presence of the adverb after the subject forces the LI analysis. We see that
in this case, a simple PP, or a more complex XP (a V-less VP in the RC
analysis), cannot undergo raising to a higher subject position.
Yet while these two arguments ultimately fail to support the treatment of
PPs in Stylistic Inversion as subjects, there is a significant set of data, reflecting
systematic differences between A- and A′-positions, which confirms the subject
status of the preverbal PPs in (light) Stylistic Inversion, and which is not
consistent with PP moving directly into a topic position in these cases, viz. the
fact that true Stylistic Inversion, which we refer to as light inversion (LI), does
not produce Weak Crossover (WCO) effects (just like Raising, and in contrast
with wh-Movement).5 The basic contrast we appeal to here is shown in (13):
3 It is not entirely clear why the PP in LI does not undergo raising. If the PP can raise from VP
to [Spec, IP] in the first place, then we might expect that it would satisfy the conditions for
raising from a lower [Spec, IP] into a higher [Spec, IP]. Thus it appears that the answer to the
question must be a semantic one. If, for example, seem and appear predicate of [Spec, IP], then
only a referential PP could be in this position as the surface subject of seem and appear. Contrast
the following:
(i) a. Under the table is a good place to put the beer.
b. Under the table rolled the beer slowly.
(ii) a. Under the table seems to be a good place to put the beer.
b. *Under the table seemed to roll the beer slowly.
4 Note that on our current analysis, slowly into the room must be a constituent. This contrasts
with the view taken by R&C, which is that slowly into the room is the remnant of a VP from
which the V has raised. This analysis is ruled out on the present account, due to the presence of
the subject NP within VP.
5 Prior claims for the subject status of the PP have been made by Bresnan
(1994) and, in somewhat more complex form, in Stowell (1981), where the PP moves through
subject position en route to a final topic position. We stress that, as indicated, we do not take all
the evidence cited in such sources as genuine support for the analysis of PPs as subjects, though
we agree on the conclusion.
6 Note that whether the fronted wh-element is fronted on its own or is pied-piped, the effect is
the same—as we would expect, given, on the one hand, index percolation at S-structure, and
reconstruction of the preposition back to its D-structure location at LF on the other.
7 An anonymous reader judges examples (16a) and (16b) to be indistinguishable in
grammaticality. We suspect that the relative acceptability of (16b) is due to a reading of every as each,
which does not produce a Weak Crossover violation. Compare (16b) with (i).
(i) In each dogi’s cage itsi most attractive and expensive collar was sitting on a hook.
Replacing every by no should sharpen the judgment for those speakers for whom the difference
between (16a) and (16b) is minimal.
(ii) a. In no dogi’s cage hung itsi collar.
b. *In no dogi’s cage was hanging on a hook itsi most attractive and expensive collar.
c. *In no dogi’s cage itsi most attractive and expensive collar was hanging on a hook.
the VP, parallel to the raised subject who in (13a), while the quantified NP
within the topicalized PP in (14a), like the wh-moved NP in (13b,c), cannot
bind the corresponding pronoun. The difference in the status of the inversion
and topicalization examples shown here follows immediately on
the assumption that in (16a) the PP is in an A-position and the subject is in VP,
while in (16b) the PP is topicalized and the subject is linked to [Spec, IP].
A particularly clear demonstration of the contrast we find between the two
kinds of case emerges from the fact that the postverbal quantifier no dog
produces a WCO violation when it binds the pronoun in the PP in A-position
in ??*In itsi cage sat no dogi, just as a quantifier in a direct object produces a
WCO violation when the pronoun is in an NP subject, as in *Itsi master
criticized no dogi. Again, the Stylistic Inversion cases pattern in a fashion
parallel to examples with a quantified subject uncontroversially in [Spec,
IP]. Example (16b), on the other hand, falls together with the case in which
the PP is topicalized and the subject is in [Spec, IP], as in (13d), or e.g. *To
every instructori hisi students gave a teaching award. In this case, as we have
already noted, the PP behaves as though it is reconstructed into the postverbal
position. Compare the examples in (15), which show the same pattern.
The WCO data we have adduced thus points strongly to the conclusion that
the fronted PP is a syntactic surface subject (i.e. is in [Spec, IP]) or, at the very
least, is in a superior A-position with respect to binding, Weak Crossover, and
so on. This hypothesis is consistent with Bresnan’s (1994) proposal that PP is
assigned the SUBJ function under an LFG treatment.
(20) [ToBI annotation of the HI intonation contour: H* H* H*; L– !H– L–]
8 We are grateful to Mary Beckman for her help in notating the HI intonation.
9 As Mary Beckman has pointed out to us (p.c.), this phrasing correlates rather nicely with
what we claim to be the constituent structure of such examples (see fn. 2 above).
When the verb is unergative, the light NP cannot appear postverbally at all,
while the heavy NP can appear after the VP, but not before it, as we see in (21).
(21) a. *In the room slept Robin.
b. *In the room slept Robin fitfully.
c. *In the room slept fitfully Robin.
d. Remember Robin? Well, in the room slept fitfully . . . robin!
e. In the room slept fitfully the students in the class who had heard
about the social psych experiment that we were about to perpetrate.
f. In the room slept the students in the class who had heard about the
social psych experiment that we were about to perpetrate (very)
fitfully.
Here the crucial contrast is between the (c) example and the (d,e) examples.
Such contrasts follow immediately if a sentence such as (21e) is derived by
movement of the heavy NP subject to the right, as suggested by R&C. If this
approach is correct, we would expect the heavy NP to appear exclusively
external to the VP, since it would then be moving across the entire VP from
[Spec, IP], perhaps adjoined to IP as shown in (22), and the contrast between
(21e) and (21f) indeed shows that this subject must be in a position adjoined
outside the VP slept fitfully, just as the scenario we have outlined requires.
(22) [IP [IP [Spec ti] [I′ Infl [VP [Spec ti] [V′ [V′ [V slept]] [Adjct fitfully]]]]] [NPi robin]]
(bracketed rendering of the original tree diagram: the heavy NP robin is right-adjoined to IP)
But then the pattern seen in connection with unaccusative verbs, for example
(18b), where the subject is light and cannot appear in the adjoined position
occupied by the heavy NP in (22), must have a different derivation, one in
which the subject is licensed in a VP-internal position.10 Such a position is
available only to the subject of verbs like walk, given the difference with sleep
that is illustrated here.11,12
Presentational there constructions are standardly taken to illustrate the
existence of a class of unaccusative verbs in English, as in e.g. Coopmans
(1989). But there phenomena also provide independent motivation for
movement of the subject to the right, and for the observation that the
locative PP need not move to the left. Thus, R&C argue that movement of a
heavy NP subject to the right produces presentational there insertion (PTI),
as in (23):
(23) a. There slept fitfully in the next room a group of the students in the
class who had heard about the social psych experiment that we were
about to perpetrate.
b. In the next room there slept fitfully a group of the students in the
class who had heard about the social psych experiment that we were
about to perpetrate.
10 It should be pointed out that when the verb is unaccusative and the subject is heavy, there
is really no way to tell whether the subject is in situ in VP as in the LI construction, or whether it
has moved to the right from [Spec, IP] as in the HI construction. Such a sentence will display all
of the properties of both constructions (since, on our account the conditions for each of the two
homophonous structures will be satisfied) and will therefore have no diagnostic utility vis-à-vis
the proposed analysis.
11 We conjecture that there is a correlation between this structure, in which the unaccusative
subject originates as the direct object of the verb, and the interpretation of ‘movement along a
path’ that is typical of the unaccusative construction. Note that this correlation is construc-
tional, not lexical, given that such an interpretation can be associated with any verb that can be
plausibly used to denote a property of movement along a path:
(i) Into the room stormed/stumbled/wobbled/blustered/skidded Fred.
12 As pointed out by two reviewers, the derivation that we propose for HI raises the question
of how it is that topicalization of PP and movement to the right of the heavy NP can interact. If
the heavy NP moves first, we might expect the resulting structure to be ‘frozen’ (cf. Wexler and
Culicover 1980), blocking subsequent topicalization. But if topicalization applies first, then we
might expect there to be a topic-island effect, blocking subsequent movement to the right of the
heavy subject NP.
As pointed out by Johnson (1985), the evidence that heavy NP shift blocks subsequent
extraction is not conclusive. In the following example, the PP must extract from a VP to
which heavy NP shift has applied.
(i) the refrigerator [into which]i I put tj ti after I got home [all of the beer that I had bought]j
Moreover, compare the following examples:
13 The reader may find these data somewhat surprising, in that on our analysis the well-formed
inversion examples are analyzed as instances of PP topicalization. Yet it is well known
that topicalization within nonfinite clauses is typically extremely degraded. But this is far less
true in the case of gerundives than infinitives. Compare e.g.
(i) that solution Robin having already explored t and rejected t, she decided to see if she could
mate in six moves with just the rook and the two pawns.
(ii) *I really want *that solution Robin to explore t thoroughly.
It thus appears that gerundive clauses are rather more tolerant of topicalization than infini-
tive clauses; in fact, this is essentially what we would predict if the Case-assignment proper-
ties of gerundives are as analyzed in Reuland (1983), where the subject of gerunds is governed
by the verbal affix and thus an internal source of Case is available to such subjects, as opposed
to infinitive clauses, whose overt subjects must in all cases be externally governed in order
to receive Case. We grant however that they are probably not up to the standard of normal
finite clause complementation and might therefore strike some readers as less than fully
natural.
(28) a. I decided to let no one into the room; in fact, *into the room
I prevented t from walking Robin.
b. Into the room I even prevented t from walking . . . robin!
c. Into the room I even prevented t from walking a group of the
students in the class who had heard about the social psych experi-
ment that we were about to perpetrate. [HI intonation]
d. I decided to allow no one to do anything in this church; in fact, from
this pulpit I even prevented t from preaching a close associate of the
great Cotton Mather. [HI intonation]
(iii) Heavy NP shift corresponds to control of PRO by an ‘invisible’ subject
coindexed with the postverbal heavy NP, as in (29d,e).
(29) a. Robin expected PRO to walk into the room.
b. Into the room Robin expected PRO to walk.
c. *Into the room t expected PRO to walk Robin.
d. Into the room t expected PRO to walk . . . robin! [HI intonation]
e. We had set up the protocols perfectly to ‘trick’ the students, so that
into the room t fully expected PRO to walk a group of the students in
the class who had heard about the social psych experiment that we
were about to perpetrate. [HI intonation]
f. Preaching from this pulpit is a great achievement and people come
from near and far hoping to do it. In fact, from this pulpit t expected
PRO to preach a number of close associates of the great Cotton
Mather himself. [HI intonation]
(30) a. Robin avoided PRO walking into the room.
b. Into the room Robin avoided PRO walking.14
14 The following example is ill-formed on normal intonation:
(i) Remember Robin and her fear of windows? *Well, predictably, into the room t avoided
PRO walking Robin.
But note that the following examples appear to be well-formed with the appropriate prosody:
(ii) They said that not everyone would recklessly walk into the room, and, predictably, into the
room t avoided PRO walking . . . robin! [HI intonation]
(iii) We had set up the protocols perfectly to ‘trick’ the students. But for some reason, into the
room t avoided PRO walking a group of the students in the class who had heard about the
social psych experiment that we were about to perpetrate. [HI intonation]
(iv) Preaching from this pulpit was known by many to be terribly unlucky; in fact, from this
pulpit t, studiously avoided PRO preaching any sane associate of Cotton Mather/even the
least superstitious of Cotton Mather’s associates. [HI intonation]
These and the previous examples raise the obvious question of how the ECP is to be satisfied
with respect to the trace in subject position. The question is actually more complicated, in view
of the problems noted in Culicover (1993b) and Ch. 9 above in accounting for the that-trace
effect in terms of the ECP. For an interesting approach to these problems, see Rizzi (1997); full
discussion of the possible sources of the that-trace effect and their interaction with the structures
we are positing for heavy inversion would take us well beyond the scope of the present paper,
and we leave investigation of this issue for future work.
15 It is not clear to us how the PP gets into topic position in (31b), in view of the
ungrammaticality of (i).
(i) *We saw into the room an angry horde of Tolstoy scholars run.
We leave this question as an unsolved problem. It is possible that the phenomenon seen here is
related to that of French exceptional case marking, where the subject of an infinitival cannot
appear in situ but can be extracted if it is an interrogative or a clitic pronoun (Kayne 1981b).
(34) a. Everyone seemed very hungry today. For example, into the cafeteria
have both gone the two students that I was telling you about. [HI
intonation]
b. From this pulpit have both preached Cotton Mather’s two closest
and most trusted associates. [HI intonation]
By contrast, when the subject is light, as in (35), it cannot be the antecedent of
the floated quantifier, as (36) and (37) illustrate.
(35) a. Both the students have gone into the cafeteria.
b. The students have both gone into the cafeteria.
(36) a. Q: Who went into the cafeteria? A: Into the cafeteria have gone both
(of the) students, I think.
b. Q: Who went into the cafeteria? *A: Into the cafeteria have both
gone the students, I think.
(37) a. Into the mists of history are quickly disappearing both my heroes.
b. *Into the mists of history are both quickly disappearing my heroes.
The evidence thus suggests, once again, that the heavy subject is moving to the
right from [Spec, IP], while the light subject is in situ in VP.
There are several other differences between LI and HI that do not involve
the subject NP directly:
(vi) HI but not LI allows long extraction of the XP from a tensed
complement.
(38) a. *Into the room I claim/believe walked Robin.
b. *Into the room I claim/believe/expect t will walk Robin.
c. *From this pulpit I claim/believe/expect t will preach Robin.
(39) a. Into the room I claim/believe/expect ti will walk . . . robini! [HI
intonation]
b. From this pulpit I claim/believe/expect t will preach (eloquently) . . .
robin! [HI intonation]
(40) a. Into the room I claim/believe/expect ti will walk [a ravenous horde
of angry Tolstoy scholars]i. [HI intonation]
b. From this pulpit I claim/believe/expect ti will preach (incoherently)
[a series of ravenous Tolstoy scholars]i. [HI intonation]
The key point here is the contrast between (38) on the one hand and (39) and
(40) on the other, pointedly demonstrating the difference in extraction
16 Note that if the light subject is in situ, the awkwardness of extracting from it must be due to
the fact that it is the logical and not the syntactic subject of the sentence. This observation recalls
the proposal of Culicover and Wilkins (1984) that extraction from the antecedent of a predicate
diminishes acceptability, regardless of the syntactic configuration in which the antecedent
appears. This specific effect need not be, and apparently is not, universal, given that extraction
from postverbal unaccusative subjects is fine in other languages such as German and Italian. But
the language-specific nature of such restrictions is unsurprising and well attested elsewhere;
thus, in English, gaps within subjects are only sanctioned as part of parasitic gap constructions
(modulo a limited class of examples noted in Ross 1967), while in Icelandic such gaps may occur
freely even without coindexed gaps elsewhere in the clause, as noted in Sells (1984).
(44) *Whoi did you say that you saw ti yesterday [offensive friends of tj]i
(vii) HI but not LI (marginally) allows where.
We begin with the general observation that while a relative PP produces
Stylistic Inversion, both relative and interrogative where block inversion, as
illustrated in (45)–(52).
(45) a. the place to which Robin went
b. the place where Robin went
(46) a. the place to which went Robin
b. *the place where went Robin
(47) a. the city in which all my relatives live
b. the city in which live all my relatives
(48) a. the city where all my relatives live
b. *the city where live all my relatives
Similarly for interrogative PP vs. where:
(49) a. To which place did Robin go?
b. Where did Robin go?
(50) a. To which place went Robin?
b. *Where went Robin?
(51) a. In which city do all your relatives live?
b. Where do all your relatives live?
(52) a. In which city live all your relatives?
b. *Where live all your relatives?
These facts appear at first sight to be totally mysterious. Notice, however, that
the ungrammatical examples are greatly improved by introduction of an
adverb, an apparent instance of the Adverb Effect (Chapter 9).
(53) a. This is the city where for the most part live all my relatives.
b. This is the city where for most of the year live all my relatives.
c. ?Leslie asked me where, at that point, had gone the thieves who had
taken my money.
d. ?(Leslie was wondering) where for most of the year live all of your
most favorite relatives.
Significantly, however, there is no improvement unless the postposed subject
is relatively heavy.
stylistic inversion in english 287
(54) a. *This is the city where for the most part lives Robin.
b. *This is the city where for most of the year lives Robin.
c. *(Leslie asked me) where at that point went Robin.
d. *(Leslie was wondering) where for most of the year live your kids.
The efficacy of the Adverb Effect when there is HI, but not when there is LI,
once again strongly suggests that there are two different structures for the two
constructions. More precisely, it appears that the landing site for where in HI
is the complementizer position or [Spec, CP], producing a C-t effect that is
ameliorated by the Adverb Effect. But apparently there is no landing site for
where in LI. If, as we have suggested, LI involves movement of a PP into [Spec,
IP], we can explain the absence of a landing site by positing that where is not a
PP in the required sense, but an NP, since NPs—for reasons that of course
need to be explained—fail to participate in LI.17,18
To sum up, we have observed several distinct syntactic phenomena that
support the claim that there are two SI constructions, HI and LI, notably:
– that it is possible to postpose only a constituent corresponding to a
heavy subject in the cases of various kinds of nonfinite complements
(see (i)), gerundives (see (ii)), configurations of control (see (iii)),
and complements of a perception verb (see (iv));
– that only the heavy antecedent of a floated quantifier can postpose
(see (v));
– that only the heavy subject of an embedded complement can postpose
when a constituent of a that complement has been fronted to
the matrix (see (vi));
17 Note e.g. that where can be a tough subject, in spite of the fact that PPs are typically ruled
out as subjects of tough predicates: ??*In which room would be easiest to hold the exam?, but
Where would be easiest to hold the exam?
18 We can only consider briefly here the restriction that allows only PPs in [Spec, IP] to
trigger locative inversion. Suppose that NP movement paralleled PP movement to create
inversion structures. Consider the following contrast:
(i) a. Robin ran into the room.
b. Into the room ran Robin.
c. [e] Infl [[ran Robin] into the room]
(ii) a. Robin ran the race.
b. *The race ran Robin. [on the same reading as (ii.a)]
c. [e] Infl [Robin [ran the race]]
The (a) examples show the consequence of moving the subject into [Spec, IP]. The (b)-examples
show what happens when we move the non-subject out of VP into [Spec, IP]. The approximate
underlying structures are given as the (c)-examples, where (i.c) follows (1) in the text.
We assume that a D-structure subject in [Spec, VP] will be assigned an agentive θ-role.
19 One additional piece of evidence that the postverbal NP is in situ comes from superiority
effects: LI, unlike wh-Movement, does not produce strong superiority violations, as shown in
(i)–(iv):
(i) a. Who did what?
b. *What did who do?
(ii) a. Who came out of which room?
b. *Out of which room did who come?
c. (?)Out of which room came who?
(iii) a. Who did you claim t did what?
b. *What did you claim who did t?
(iv) a. Who did you claim came out of which room?
b. *Out of which room did you claim who came?
c. Out of which room did you claim came who?
(Cf. ?Which man saw who?)
10.4 Conclusion
If the arguments presented in the preceding discussion are on the right track,
it is necessary to reassess the data standardly cited by syntacticians offering
accounts of English Stylistic Inversion, so that such accounts can be
expected to correctly predict the well-formedness status of inversion just in
case the subject can be light. In support of this claim, we have presented
evidence from Weak Crossover phenomena that preposed PPs in (light)
inversion are genuine subjects, rather than topicalized constituents, and
then provided several strands of evidence, involving intervention effects,
infinitival and gerundive complements and associated control phenomena,
perception verb complementation, quantifier float, and a variety of other
phenomena, which clearly sort the cases that permit postverbal heavy
NPs from those which allow light NPs. The data which have in the past been
used to argue that fronted PPs are subjects which can undergo raising are a
further case in point, since as we showed earlier these examples are only well-
formed when the postverbal NP is heavy. The simplest account of these
effects, we believe, is to recognize the possibility that subjects as well as objects
can heavy shift. Such a conclusion in turn raises several important theoretical
questions.
What licenses the trace in subject position when HI heavy-shifts the
subject?
The general impossibility of heavy-shifting subjects of finite clauses would
lead one to conclude that the resulting subject traces are not properly
governed, giving rise to an ECP effect. But as we have noted earlier, reducing
the that-t effect to the ECP is not entirely straightforward (see note 15).
Why is HI as well as LI incompatible with an overt object?
In the case of LI, it seems reasonable to take this property as a reflection of the
restriction of LI to unaccusative verbs, which of course do not take a direct
Again we see that the PP in Stylistic Inversion displays subject rather than topic or wh-moved
properties: (ii.c) is essentially comparable in acceptability to (ii.a), while (ii.b), containing a wh-
moved PP, displays the strong unacceptability of a classic superiority effect violation. An
anonymous reader writes that some speakers find it difficult to perceive the intended difference
between (iv.b) and (iv.c), although to our ears it is quite sharp. Let us replace who by how many
people:
(v) a. *Out of which room did you claim how many people came?
b. Out of which room did you claim came how many people?
In our judgment this move strengthens the superiority effect in (iv.b) to the point that the
sentence is virtually uninterpretable, but leaves (iv.c) unchanged.
object in addition to their surface subject; but why should the same restriction
carry over to HI, whose derivational history should make it irrelevant whether
or not an object is present? On the contrary, it is standardly assumed that no
examples of Stylistic Inversion are possible with direct objects:
(56) a. A bunch of teenagers in funny hats had put some gum into the gas
tank of our motorcycle.
b. *Into the gas tank of our motorcycle had put some gum a bunch of
teenagers in funny hats.
We believe that any full discussion of this point must take into account the
fact that, although awkward, there are examples of HI containing direct
objects which we believe to be grammatical:20
(57) a. In the backyard were quietly sunning themselves k a group of the
largest iguanas that had ever been seen in Ohio.
b. The economist predicted that at that precise moment k would turn
the corner k the economics of half a dozen South American
nations.21
c. In the laboratory were dying their various horrible deaths the more
than ten thousand fruit flies that Dr. Zapp had collected in his
garden over the summer.
d. Outside in the still upright hangar were heaving deep sighs of relief
the few remaining pilots who had not been chosen to fly in the worst
hurricane since hurricanes had names.
Our analysis predicts that such examples should exist; what remains at issue is
the distinction between cases such as (57) on the one hand vs. (56b) on the
other. We note that the direct objects in the examples in (57) are not
referential. This fact suggests that what allows such cases is that the verb
phrases are thematically intransitive, i.e. no θ-role is assigned to the direct
object. Sun oneself means ‘to sun’, turn the corner in this case is an idiom that
means ‘improve’, die a horrible death means die horribly, and heave a sigh
20 As above, we indicate with the notation k a major prosodic juncture. Such junctures
appear in what we take to be acceptable utterances of these examples.
21 Unquestionably, turn the corner is at least semi-idiomatic. Nonetheless, the fact that this
idiomaticity is preserved under passivization (e.g. The corner was finally turned on July 10, when
the Ostrogoth economy finally emerged from its deep recession) indicates that the corner is indeed
an internal syntactic argument of the verb, which can therefore hardly be regarded as exhibiting
intransitive, much less unaccusative argument structure here. Similar observations hold for
(57d), e.g. After the crisis brows were mopped, deep sighs of relief were heaved, and then everyone
got back to work.
Computation
11
A reconsideration of Dative Movements
(1972)*
Ray Jackendoff and Peter W. Culicover
Remarks on Chapter 11
The first section of this article is a transformational account of dative alternations
(V–NP2–NP1) with to and for (V–NP1–to/for–NP2). We provided an account of
the fact that the double object construction that is related to to has different
syntactic and semantic properties than the double object construction that is
related to for. The facts discussed in this section in fact suggest, from a contem-
porary perspective, that these alternations are lexically governed constructions,
in the sense of Goldberg (1995).
The remainder of the article is concerned with the fact that A′ constructions
where the gap is the indirect object are less than fully acceptable. This article
was one of the first in the literature to suggest that data that had been
previously thought of as being the responsibility of syntactic rules actually
reflect aspects of the computation of the meaning of a sentence based on its
form. We hypothesized that identification of the gap corresponding to an A′
filler is triggered only when the syntactic context requires it. Hence in the
sentence *Who did you give a book the processor does not posit a gap between
give and a book, because a book satisfies a requirement of the verb give. The
processor expects a gap after the preposition to, but when the end of the
sentence is reached, there is no to. Hence there is no gap for the filler, which
we propose results in a processing error and the judgment of unacceptability.
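The parsing strategy described here can be made concrete with a toy sketch. The following is our illustration only, not the authors' implementation; in particular, the RECIPIENT_NPS set is a hypothetical stand-in for whatever information lets the processor take a postverbal NP as a recipient (Mary) rather than a theme (a book), and only give-questions of the forms discussed are handled.

```python
# Toy sketch of the gap-detection strategy: a gap for a fronted wh-filler
# is posited only where the syntactic context requires one.
RECIPIENT_NPS = {"mary"}   # hypothetical stand-in for recipient-hood

def judge(sentence):
    """'acceptable' iff the strategy finds a gap for the fronted filler."""
    words = sentence.lower().strip("?*").split()
    after_verb = words[words.index("give") + 1:]
    # A gap is posited directly after the verb (nothing or 'to NP' follows)
    # or after a sentence-final stranded 'to'.
    if not after_verb or after_verb[0] == "to" or after_verb[-1] == "to":
        return "acceptable"
    # Otherwise one overt NP follows the verb; it satisfies a requirement
    # of 'give', so no gap is posited before it.
    if " ".join(after_verb) in RECIPIENT_NPS:
        # Double-object frame: 'give Mary __' still needs a theme, so a
        # gap is recognized at the end of the sentence.
        return "acceptable"
    # 'give a book' is complete; the processor waits for 'to', which never
    # comes, so there is no gap for the filler: a processing error.
    return "unacceptable"
```

On this toy model, Whom did you give a book to? and What did you give Mary? find their gaps, while the counterpart without to does not.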
11.1 Introduction
Two well-known transformational relationships are the shifts of indirect
objects with to and for.
(1) Bill gave a book to Mary.
(2) Bill gave Mary a book.
(3) Bill bought a book for Mary.
(4) Bill bought Mary a book.
To explain differences between the two processes, standard analyses of the
dative, for example Fillmore (1965), generally postulate two similar Dative
Movement rules, one of which applies to to-indirect objects and the other to
for-indirect objects. In this paper we will show that this analysis can be improved
somewhat within the framework of traditional transformational rules. How-
ever, not all difficulties can be eliminated in this way. In an effort to further
improve the solution, we will show that, on independent grounds, constraints
imposed by the hearer’s perceptual strategy for interpreting sentences play a part
in the unacceptability of certain constructions. These constraints will then be
used to account for the remaining anomalies in the dative shift paradigms.
1 This is shown by examples like the following (pointed out by Klima):
(i) a. This table has been eaten at by many famous people.
b. *This table has been eaten food at by many famous people.
c. Food has been eaten at this table by many famous people.
(ii) a. This violin was once played by Heifetz.
b. *This violin has been played the Kreutzer Sonata on by Heifetz.
c. The Kreutzer Sonata has been played on this violin by Heifetz.
Only when there is no direct object intervening between the prepositional phrase and the verb
can the object of the preposition undergo the passive. Cf. also examples (71)–(78).
a Sentences such as (9) are often said to be ungrammatical, although preferable when the direct
object is pronominal, e.g. A book was given her by John. If (9) is ruled out, then the discussion to
follow can be considerably simplified. In fact, since passive applies to the two complement
structures of the VP, we may take give Mary a book to be an instance of the dative construction,
alternating with give a book to Mary. The corresponding passive construction maps the first
postverbal argument to the syntactic Subject, yielding (14) and (15).
Thus with either ordering of to-dative and Passive we are unable to generate
the full range of sentences. To avoid this difficulty we might resort to a
solution like Fillmore’s, involving an extension of the environment of Passive.
Fillmore constructs the rules in such a way that the sequence V + to-NP can be
considered a verb for the sake of the Passive transformation. In this way the
NP a book can be considered to be next to the verb in (13), so that it can be
moved into subject position by Passive, forming (9). A more satisfying
solution will be proposed later on.
Now assume that Passive has been altered in a suitable way so that we can
get the full range of dative and passive sentences (9) and (12)–(15). Now let us
question the direct and indirect objects in these sentences, using the rule of
wh-Movement.
(16) What did John give to Mary?
(17) Whom did John give a book to? (from (12))
(18) What did John give Mary?
(19) *Whom did John give a book? (from (13))
(20) What was given to Mary by John?
(21) Whom was a book given to by John? (from (14))
(22) What was Mary given by John?
(23) Who was given a book by John? (from (15))
(24) What was given Mary by John?
(25) *Whom was a book given by John? (from (9))
How can we prevent the perfectly general rule of wh-preposing from
producing the questionable sentences (19) and (25)? Fillmore utilizes the rather
artificial device of prohibiting the transformation of wh-Attachment from
applying to NPs that are to-indirect objects positioned next to the verb. As
Kuroda (1968) points out, however, wh-Attachment is not a transformation in
post-Aspects generative theory; rather, the base generates the wh-marker with
the noun phrase it is associated with at the surface. If this is the case,
Fillmore’s solution can no longer be stated.
Furthermore, and more important, Kuroda shows that the ungrammaticality
of (19) and (25) is not due to their being questions, since the non-preposed
versions (26) and (27) are as acceptable as any other wh-question in which the
wh is not preposed.
(26) John gave whom a book?
(27) A book was given whom by John?
b (35) is the counterpart of (9).
2 Passives of the form (36) seem to vary in acceptability, depending on a number of factors, some
of which we can make explicit. There are two semantically distinct for-datives, only one of which
undergoes passive. The first type is exemplified in (32)–(36), where it can be said that as a result of the
event Mary has a new wardrobe. The other type is exemplified by John played a tune for Mary, which
undergoes the dative shift but not the passive analogous to (36): ?*Mary was played a tune by John.
This event does not have as one of its results that Mary has a tune. There also seem to be some factors
of length involved: Mary was bought a book by John seems somewhat less acceptable than (36).
the two rules. The difference between the to-dative and the for-dative lies in
the fact that the passive of the direct object with the indirect object prepos-
ition deleted is grammatical for to-dative (9), but ungrammatical for for-
dative (35). It is the form (9) which requires an alteration to the passive
transformation in Fillmore’s solution.
As in the case of to-dative, questioning all combinations of the for-dative
paradigm produces some questionable sentences.
(37) What did John buy for Mary?
(38) Whom did John buy a book for? (from (32))
(39) What did John buy Mary?
(40) *Whom did John buy a book? (from (33))
(41) What was bought for Mary by John?
(42) Whom was a book bought for by John? (from (34))
(43) ?What was Mary bought by John?c
(44) Who was bought a book by John? (from (36))
Of course, the questions formed from the ungrammatical (35) are ungram-
matical too.
(45) *What was bought Mary by John?
(46) *Whom was a book bought by John?
Still, the problem of accounting for the ungrammaticality of (19), (25), (40),
and (43) remains.
Thus far we have improved on Fillmore’s solution to the problems arising
from the interaction between the dative shifts and Passive. It still remains to
give some account of the restrictions on wh-Movement. One could retain
Kuroda’s solution, in which an ad hoc restriction is placed on preposing
transformations operating on certain dative constructions, and still have a
grammar superior to Fillmore’s with respect to the dative paradigms.d
c As far as I know the unacceptability of this sentence has not been discussed in the
subsequent literature, and remains a puzzle, in view of the acceptability of the corresponding
What was Mary given by John?
d I have omitted here an analysis that attempts to conflate the derivation of the dative
constructions with other cases that involve PP–PP complements. The analysis assumes reorder-
ing of the PP complements and lexically governed deletion of the preposition in the first PP. In
contemporary constructional terms it is far more straightforward to specify that particular verbs
select NP–PP or PP–PP complements, where the PPs are headed by specific prepositions.
3 The proposals of this section are similar to those of Bever (1970) and Klima (1970), but were
arrived at independently.
e In contemporary terms the task would be better characterized in terms of assigning
thematic roles to the arguments.
4 This restriction is most extensively discussed in Keyser (1967). Other properties of the rule
are discussed in Ross (1967) and Akmajian (1970).
What is the exact form of the restriction on Extraposition from NP? From
the examples so far, the condition seems to be that the relative clause cannot
cross over another NP. This condition in itself is rather strange. But in fact the
condition must be more complicated than that. Consider the following cases,
which vary from plausible to very bad.
(53) ?The man went to Philadelphia [who loves Mary].
(54) ?*The man kicked the snail [who loves Mary].
[relative clause on man]
(55) ?*The man hit John [who loves Mary].
(56) ?John hit the man in the stomach [who loves Mary].
(57) *The man hit John in the stomach [who loves Mary].
The generalization seems to be that acceptability is inversely correlated with
the plausibility of generating the final relative clause with another, nearer
NP. This is certainly a very strange condition to put on a transformation,
prohibiting it just in case it would produce an ambiguity. It runs counter to all
the usual notions of how structural ambiguities are developed by the grammar.
In terms of a theory of perceptual strategy, this restriction makes a certain
amount of sense. Consider the interpretation of (48). At the stage at which a
man came in has been heard, it is known that the next word to follow will not
be related to in in any way. Who signals the beginning of a relative clause, since
we are not currently in the middle of an NP, and an appropriate NP must be
found for it to apply to. The only eligible one in the sentence is the subject, so
the correct interpretation results. In (52), however, boy is not necessarily the
end of its NP; in particular, a relative clause is the possible continuation of the
NP. Therefore, who occasions no surprise: it is automatically put with boy, and
is given no chance to associate with girl.
Now consider the intermediate cases (53)–(57). In (53), the proper noun
Philadelphia leaves open the possibility of an appositive relative following it,
and so the relative pronoun who to some extent confirms this possibility. On the
other hand, who is an inappropriate relative pronoun for Philadelphia, and the
lack of a pause means that the relative clause cannot be an appositive, so after a
moment’s confusion the interpreter looks for another source for the relative. In
(54) and (55) the plausibility of the relative going with the final NP is higher
than in (53): (54) is only a violation of gender, and (55) only lacks a pause for the
relative to be grammatically associated with John as an appositive. Therefore the
tendency to interpret the relative as semigrammatically associated with the final
NP is stronger, and so attaching the relative to the subject is less plausible.
f For an analysis in which the extraposed clause is interpreted in its surface position, see Ch. 6
above and Culicover and Rochemont (1990).
5 This example was pointed out to us by John Limber.
(59) creates difficulties in both of these respects at once. Many people try
to interpret it as they would (62), with a bandage as part of the subordinate
clause. This is because bit a bandage is an actually occurring sequence in a
single clause: hence the gap into which the relative pronoun may fit is not
immediately apparent. But if a bandage is part of the VP, there is no place in
the relative clause for the relative pronoun, since bite can only take a single
object. Furthermore, if a bandage is part of the relative clause, the main
verb give will not have been provided with its full range of complements.
Thus the logical decision to put a bandage in the relative clause results in
confusion.
(60) is an example of the opposite problem of interpretation: the end of the
relative clause is guessed to be sooner than it actually is. The critical part of
the sentence is the sequence believed was, which does not occur unless an NP
has been moved away, and which therefore signals that a transformation has
applied. But apparently the first hypothesis is that the sentence will be of the
same form as (63), with was as the verb of the main sentence. Thus the real
main verb, died, comes as a surprise.
In (61) the problem is again that of finding an appropriate place for the
relative pronoun, which has been fronted from the position after want. Since
want to leave is a permissible string in a non-relativized sentence, I, rather
than the relative pronoun, is interpreted as the subject of leave. The gap for
the relative pronoun to fit into is assumed to be further to the right, as in (64).
Then, when no gap occurs, the usual confusion results. Note that (65)
presents no such problem, since believe to be sick does not occur unless an
NP has been moved away from after believe.
To see more clearly that an appeal to perceptual strategy is useful here,
consider how the distinction in acceptability between (60) and (64) would
have to be captured if it were a restriction on transformations. The wh-
preposing rule would have to be prohibited from operating in a very particu-
lar situation—when it is trying to prepose an NP from the position circled in
(66), just in case the preceding verb permits complement subject deletion to
take place (the difference between want and believe).
(66) [S NP [VP V NP VP]] (the circled position is the NP inside VP)
Like the transformational constraint needed for extraposition from NP, this
restriction seems highly unlikely. A solution employing perceptual strategy
seems to give a much more motivated account of the restriction.g
g Recent computational work attributes certain judgments of unacceptability to ‘surprisal’,
i.e. the predictability of the continuation of a (parsed) string of words. See e.g. Hale (2003) and
Levy (2008).
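The notion of surprisal appealed to in footnote g has a standard definition in the works cited, which we can state here: the surprisal of the i-th word of a sentence is its negative log-probability given the words already parsed,

    surprisal(w_i) = −log P(w_i | w_1 … w_{i−1}),

so a continuation that the parsed prefix makes unpredictable carries high surprisal, and high surprisal correlates with processing difficulty of just the kind discussed in this section.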
h A more contemporary characterization of what happens here is that the gap is posited, but
is immediately suppressed by the presence of the following NP that can serve as the direct object.
preposing would precede Passive, rather than the other way around, as we are
proposing.
In (22), then, Mary will be recognized as coming from a position directly
after the verb. This yields the string (intermediate in the process of interpret-
ation) give Mary. Since give requires two objects before by occurs, the gap is
recognized to be after Mary as soon as by is perceived. In (25), undoing the
passive gives the string give a book. As in (19), this is a possible string, so it is
expected that to will follow. When instead by follows, a gap is recognized, but
it is not the expected gap, and hence the sentence is judged unacceptable.
This leaves the case of (43), which we frankly find to be a mystery. Note the
slight unnaturalness of the passive Mary was bought a book, which may be due
to the passivization of an optional for-object (to-objects are obligatory),
leaving no trace of the characteristic preposition.6 Who was bought a book
seems similar in acceptability. (43) is somewhat worse, perhaps because at the
stage at which only what was Mary has been perceived, the most suggestive
hypothesis about the structure is a continuation along the lines of what was
Mary doing. This may interact with the slight unnaturalness of the actual
declarative form to produce some confusion.
None of these problems concerning questions arise with other verbs that
permute objects, where both objects are PPs.
(67) Who did you speak to about the movie?
(68) What did you speak to Harry about?
(69) Who did you speak about the movie to?
(70) What did you speak about to Harry?
(71) Who was the movie spoken about to?
(72) What was Harry spoken to about?
(73) Who did he credit with the discovery?
(74) What did he credit Bill with?
(75) Who did he credit the discovery to?
(76) What did he credit to Bill?
(77) Who was the discovery credited to?
(78) What was Bill credited with?
6 Cf. also fn. 2 above in this connection.
In the first set of examples there is always a bare preposition signaling the gap.
In the second, there is either a bare preposition or a string V+P, which also
signals a gap, since in declarative form the verb is always followed by an
NP. Thus these cases differ from the to- and for-dative in that their indirect
objects leave noticeable gaps when they are fronted from postverbal position.
This is not the case with true to- and for-dative indirect objects.
This approach explains nicely Kuroda’s observation that the restriction has
to do with fronting, not with the process of questioning. In echo questions,
where the wh-phrase is not moved from its position, it is obvious that no
problem will arise in finding where it came from. Likewise, it explains why
corresponding sentences are bad in the topicalized and cleft constructions
(28)–(31). Again the difficulty lies in finding the gap in the VP from which
the preposed element was removed, and the same problem of being unable to
correctly detect the gap arises in case the indirect object has been fronted or
deleted from postverbal position. An explanation in terms of perceptual
strategy thus accounts for the fact that three independent rules have identical
strange restrictions.
The fact that our approach to these problems appeals to performance
should not be interpreted as sweeping the problem under the rug. A general
solution within the bounds of statement and ordering of transformations
seems out of the question; if we wish to preserve the generality of the
transformations, we must appeal elsewhere. The fledgling theory of percep-
tual strategy we have presented seems to be in general agreement with the
models proposed in Fodor and Garrett (1967) and Bever (1970), developed
from the results of experimental work.
Nor should the fact that certain sentences appear to be rejected on grounds
of performance be interpreted as an indication that the competence/perform-
ance distinction ought to be abandoned. The distinction between the rules of
the grammar and how the rules are used by the speaker or hearer to create or
interpret sentences is still scrupulously maintained. All that is changed is that
it is no longer so obvious what sentences are to be generated by the rules: we
cannot rely entirely on intuition to determine whether an unacceptable
sentence is grammatical or not (using ‘grammatical’ in the technical sense,
‘generated by the grammar’). Though this makes the linguistic theory of
Aspects (Chomsky 1965) more difficult to apply in practice, it does not by
any means make it conceptually unsound. Rather, the appeal to performance
made here is precisely parallel to the case of center-embedded sentences
discussed in Aspects, chapter 1, section 2, which is used to illuminate and
sharpen the competence/performance distinction.
12
Markedness, antisymmetry, and complexity of constructions
(2003)*
Peter W. Culicover and Andrzej Nowak
Remarks on Chapter 12
Our concern in this chapter is with the interactions between language change,
language acquisition, markedness, and computational complexity of map-
pings between grammatical representations. We demonstrate through a com-
putational simulation of language change that markedness can produce ‘gaps’
in the distribution of combinations of linguistic features. Certain combin-
ations will not occur, simply because there are competing combinations that
are computationally less complex. We argue that one contributor to marked-
ness in this sense is the degree of the transparency of the mapping between
superficial syntactic structure and conceptual structure. We develop a rough
measure of complexity that takes into account the extent to which the syn-
tactic structure involves stretching and twisting of the relations that hold in
conceptual structure, and we show how it gives the right results in a number
of specific cases.
This work was followed up in Culicover and Nowak (2003) and more
recently Culicover (2013). It elaborates on the view that much of the explan-
ation of what constitutes the syntax of a language, and syntax in general, derives
from the properties of the computation of the form–meaning correspondence,
viewed in terms of the reduction or avoidance of complexity.
* [This chapter appeared originally in Pierre Pica and Johan Rooryck (eds), Linguistic
Variation Yearbook, Vol. 2 (2002). It is reprinted here by permission of John Benjamins.]
12.1 Introduction
One of the strongest arguments for the thesis that the human mind possesses
a Universal Grammar (UG) with specific grammatical properties is that
languages do not appear to have arbitrary and uncorrelated properties.
What we find, rather, is that the properties of languages cluster, and that
there are asymmetries among the logical possibilities. For example, VSO
languages are always prepositional, and SOV languages are usually postpos-
itional (Greenberg 1963: 78–9). There are languages that express wh-questions
using leftward movement to a peripheral position in the clause, and there are
languages that express wh-questions without overt movement. But there do
not appear to be languages that express wh-questions using rightward move-
ment to a peripheral position in the clause.
It is natural, given observations such as these, to posit that they are direct
reflections of UG, which the language learner draws upon in choosing or
constructing grammars. However, there are two other possibilities that have
to be ruled out before such a conclusion can be drawn. First, the clustering of
properties and the absence of certain logical possibilities may be due to social
forces. In such a case we would not expect to find the same asymmetries in
different parts of the world where languages are not genetically related or in
contact. Second, these asymmetries may be due to the interaction between the
grammatical or processing complexity of certain constructions and social
forces. On this view, all of the logical possibilities are linguistic possibilities,
but those that are more complex tend to lose out over time to their less
complex competitors as linguistic knowledge is transmitted from generation
to generation in a network of social interactions.
The intention of this paper is to explore and make somewhat more precise
these scenarios. We make the background assumption that language change
occurs in part as the consequence of different learners being exposed to
different evidence regarding the precise grammar of the language that they
are to learn. Following the original insight of Chomsky (1965), we assume that
learners choose the most economical grammar consistent with their experience,
and even overlook counterevidence to the most economical solution unless the
counterevidence is particularly robust. It is reasonable to understand economy
in terms of the complexity of the grammatical representation that is to be
learned (although there are many other ideas around). To the extent that
learners reduce complexity we will then expect language change to reflect this
preference in the relative ubiquity of certain grammatical devices compared
with others, and even in the appearance of universals (Briscoe 2000).
We will begin by illustrating the ways in which language change gives rise to
correlations of properties; it will be demonstrated that some combinations are
markedness and antisymmetry 311
1 This notion of construction is related to that of Construction Grammar (see e.g. Goldberg 1995), in that we assume, with Jackendoff (1990), that grammatical knowledge consists of syntax–semantics correspondences.
2 Latané (1996), Nowak et al. (1990). Nettle (1999) independently hit upon the idea of using the Latané–Nowak approach to Social Impact theory in a computational simulation of language change.
12.2.2 Gaps
12.2.2.1 How gaps arise
We suppose for the sake of the simulation that the class of possible grammars
of natural languages can be characterized entirely in terms of values of
features.3 A prevalent view in current linguistic theory is that most if not all
of the most theoretically interesting aspects of language variation, language
change, and language acquisition can be accounted for in terms of a small set
of binary features, called ‘parameters’. For our purposes, however, it is suffi-
cient to assume that whatever the features are, however many there are, and
whatever values they have, learners are influenced to adopt the values of their
community through social interaction.
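The kind of simulation described below can be sketched in a few lines of Python. This is emphatically not the Latané–Nowak Social Impact model itself, only a toy stand-in under simplifying assumptions: speakers sit on a grid (here 10 × 10 rather than 50 × 50), and each adopts, for each feature, the majority value among its immediate neighbors.

```python
import random

random.seed(1)
SIZE = 10     # toy grid of SIZE x SIZE speakers (the text uses 50 x 50)
N_FEAT = 3    # three two-valued features define eight languages

# Random feature values: the 'Tower of Babel' starting state.
grid = [[[random.choice([1, -1]) for _ in range(N_FEAT)]
         for _ in range(SIZE)] for _ in range(SIZE)]

def step(grid):
    """One round of social influence: each speaker adopts, for each
    feature, the majority value among its neighbors (ties keep the
    current value). A crude stand-in for weighted social impact."""
    new = [[cell[:] for cell in row] for row in grid]
    for i in range(SIZE):
        for j in range(SIZE):
            for f in range(N_FEAT):
                total = sum(grid[i + di][j + dj][f]
                            for di in (-1, 0, 1) for dj in (-1, 0, 1)
                            if (di, dj) != (0, 0)
                            and 0 <= i + di < SIZE and 0 <= j + dj < SIZE)
                if total != 0:
                    new[i][j][f] = 1 if total > 0 else -1
    return new

for _ in range(20):
    grid = step(grid)

# The languages still spoken after 20 rounds of interaction.
languages = {tuple(cell) for row in grid for cell in row}
```

On typical runs, repeated interaction leaves fewer than the eight possible languages, though which languages survive varies from run to run.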
Our simulation supposes that there are three two-valued features, which
define eight distinct languages.
(1) +F1,+F2,+F3
+F1,+F2,–F3
+F1,–F2,+F3
+F1,–F2,–F3
–F1,+F2,+F3
–F1,+F2,–F3
–F1,–F2,+F3
–F1,–F2,–F3
Gaps occur when certain feature combinations are not attested. Our simula-
tion shows that gaps may arise over the course of time, as the values of two of
the features become strongly correlated. To take a simple example, if the
geographical distribution of [–F2] becomes sufficiently restricted, it may
fail to overlap with [+F1]. That is, [+F1] and [+F2] become highly correlated.
In such a case, some of the languages, namely those with [+F1,–F2], will cease
to exist. Such a situation may occur simply as a consequence of the social
structure, and in itself tells us nothing interesting about the relationship
between [+F1] and [–F2].
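The notion of a gap can be made concrete with a short Python sketch (an illustration of the combinatorics only, not part of the original simulation):

```python
from itertools import product

# The eight possible languages of (1): all combinations of three
# two-valued features.
languages = set(product(['+F1', '-F1'], ['+F2', '-F2'], ['+F3', '-F3']))

# If [+F1] comes to co-occur only with [+F2], the two [+F1, -F2]
# languages disappear: that is the gap.
surviving = {lang for lang in languages
             if not ('+F1' in lang and '-F2' in lang)}
```

Here `len(languages)` is 8 and `len(surviving)` is 6; the two [+F1, –F2] languages constitute the gap.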
For the simulation, we may assume that at the outset of the simulation all
possible combinations of features are possible (the ‘Tower of Babel’ state). The
reasoning is that if certain combinations fail to exist after some period of
time, this fact must be due to social factors, since there are no initial gaps. If
3 In fact this must be true in a trivial sense; see Culicover (1999) for discussion.
we allowed for initial gaps, i.e. innate implicational universals, then the
appearance down the line of gaps would not provide any evidence about
the effect of social interaction on the distribution and clustering of linguistic
properties.
Figure 12.1 shows the random distribution of feature values for three
features in a population of 2500 (= 50 × 50). The upper left-hand image
shows the distinct languages as differences on the gray scale. The other images
show the distribution of + and – values for the three features FIRST,
SECOND, and THIRD.
The population of each of the eight languages is shown in the histogram in
Figure 12.2. As can be seen, the languages are distributed more or less evenly
over the entire population, as would be expected from a randomized assign-
ment of feature values.
We have omitted intermediate steps in the simulation due to limited space.
After 69 steps the distribution of languages and features is as in Figure 12.3. The
histogram in Figure 12.4 shows the population levels of the eight languages at
this point. The loss of languages illustrated in this particular instance of the
simulation is not unique. It is a consequence of the particular assumptions
[Figures 12.2 and 12.4: histograms of the populations of the eight languages]
[Figure 12.5: histogram, over 100 runs, of the number of languages remaining at step 200; axes 'Number of runs' vs. 'Number of languages at step 200']
In 50 of the 100 runs of the simulation there were eight languages after 200
steps. But in 32 runs there were 7 languages, in 10 runs there were 6 languages,
and so on. So while the precise number of languages that will remain after a
certain number of steps is not predictable, it is clear that gaps in the set of
languages can and will arise over the course of time as a consequence of the
interaction in the network. The chart in Figure 12.6 shows that over a longer
time span the number of languages for the same simulation tends to decline.
[Figure 12.6: histogram of the number of runs by number of languages remaining, over a longer time span]
results will look like those we have already seen. However, on every run of the
simulation model the results will be more or less the same, in that there will be
gaps or imminent gaps in [+F1, –F2] languages. It is known that simulations
that assume bias in general show a clustering towards the same stable state;4
the strength of the bias determines the predictability of the outcome.
This behavior of the simulation model suggests that it might be productive
to look at the content of particular feature combinations in order to deter-
mine what it is about them that yields more or less complexity. A number of
candidates for complexity should be considered.
Optimality theory (OT) as applied to syntax posits that particular struc-
tures are produced by rules that violate various constraints. Given a particular
formulation that captures a general tendency or a universal, it would be
natural to ask what it is about the particular constraints that yields the
observed ranking, since OT itself is not a theory of where the rankings
come from. On the other hand, OT allows for different rankings of the same
constraints, which suggests a priori that it might not shed much light on the
question of whether there is an independent universal metric that ranks
particular structures with respect to complexity.
Chomsky’s Minimalist Program (1995) proposes a measure of economy
that ranks derivations. The metric is formulated in terms of formal operations
and does not directly address the superficial properties of the languages
produced. From the perspective of the learner it is the superficial properties
4 This is demonstrated in simulations by Nowak et al. (1990). Kirby (1994) notes the role of bias in change, while Briscoe (2000) has constructed computational simulations of the evolution of language in which biases play a major role in determining the ultimate outcomes.
that are most salient (or at least, they seem so to us, putting ourselves in the position of the
learner). One cannot rule out the possibility that there is a relationship
between derivational economy and superficial properties of the strings to be
processed by the learner, but nothing along these lines springs to mind. See
Jackendoff (1997) for discussion of the fact that derivation itself is far from
being a necessary component of a descriptively adequate account of human
language, as well as a vast amount of research in non-derivational theories,
especially HPSG.5
Parsing theory may offer some insight into what goes into the complexity
of a particular string, in terms of the extent to which the structure corre-
sponding to the string is transparently determined by the string.
Learnability theory has also been concerned with complexity, not so much
the complexity of individual examples as the complexity of a system of
examples with respect to the grammar that accounts for their properties.
5 The exchange in Natural Language and Linguistic Theory regarding the MP (beginning with Lappin et al. 2000) does not offer any particularly good motivation for derivational economy, in our view, but below we suggest an incompatible alternative view of derivational complexity that might be more satisfying.
In (5a) the movement string is ill-formed with respect to the more highly
ranked constraint, Stay, while the non-movement string is well-formed with
respect to this constraint. The reverse situation holds in (5b). Thus we have
grammars for two languages, of which one requires movement and the other
disallows it. The only difference between the two grammars in this case is the
relative ordering of the constraints. This is the device for representing lan-
guage variation in OT.
An account of this type raises two fundamental questions. First, what
determines the set of possible constraints? Second, if some orderings of
constraints are preferred to others, why is this the case? Beyond this there
are difficult questions of computability and learnability (Tesar 1995).
In OT the set of possible constraints is determined by Universal Grammar.
This much is not controversial, since any theory of grammar must provide some
account of what the possibilities are that languages may choose among.6 The
critical question has to do with the rankings. In some cases there appears to be a
natural ordering of the constraints, but there is nothing in the theory per se that
rules out any particular orderings. If we find that there is a preferred ordering,
this ordering of the constraints is an account, or an embodiment, of the
markedness relations, in some sense. But of course, in addition to representing
markedness, we would like to be able to explain where it comes from.
Bresnan (2000) characterizes markedness in syntax in terms of the corre-
spondence between representations, in particular, c-structure and f-structure:
“there is not a perfect correspondence between the categorial (c-structure)
head and the functional (f-structure) head.” We believe that the notion of
correspondence in general is the right one for the purpose of characterizing
optimality; let us go back to the most primitive correspondence, however, that
between sound and meaning, in order to find an explanation for markedness
6 Matters become somewhat more complex if we attempt to derive some of the constraints from functional considerations, rather than simply assume that they are all part of UG. For discussion, see Newmeyer (2002) and Aissen and Bresnan (2002).
relations. If, as we suggest in the next section, markedness in the end corres-
ponds to the complexity of mapping between strings and conceptual struc-
tures, an OT account, to the extent that it correctly captures the markedness
relations, is parasitic on the underlying correspondence that is ultimately
responsible for complexity.
7 Deep structure was renamed D-structure in subsequent syntactic theory.
8 Brown and Hanlon (1970); Fodor et al. (1974).
9 For more on mismatches, see Culicover and Jackendoff (1995; 1997; 1999), among many others.
The interpretation of this example is ‘a man who wants to buy your car called’,
but the relative clause and the head that it modifies are not adjacent in the
string. Hence there is a mismatch between the hierarchical structure and the
string, illustrated in (7).10
(7) [tree diagram: the structure of 'a man who wants to buy your car called', with the mapping lines from the discontinuous subject to the string crossing the line to 'called']
The crossing of mapping lines and the breaking up of the structure of the
subject illustrates the mismatch. (The crossing has nothing to do with linear
ordering in the structure, but with the way we display the hierarchical
organization and how it maps into the string.)
Intuitively, discontinuity of the sort illustrated in (7) does not contribute
significantly to processing complexity. If this intuition is correct, it would
suggest that discontinuity in itself is not problematic. Rather, complexity
arises when there are factors that interfere with the resolution of the
discontinuity.11 In the case of extraposition, on the assumption that extra-
position is not inherently complex, this may well be because it is treated as a
special case of binding, along the lines suggested by Culicover and Rochemont
(1990). The core idea, in this case, is that processing of the linear order of
10 There are several familiar mechanisms for representing discontinuity in natural language, including movement and passing features of some gap within the larger string, so that the entire string inherits the ability to license the 'moved' constituent. The formal devices for capturing this type of relationship are not at issue here. The main point is that the mismatch introduces a level of complexity into the mapping, both from the perspective of computing it for a given string, and from the perspective of determining its precise characteristics on the basis of pairs consisting of string and corresponding CS.
11 It is often suggested that extraposition and other rightward movements improve processing by reducing center-embedding. See Hawkins (1994) and Wasow (1997).
words produces a structure of the form in (8) at the point at which the
extraposed constituent is encountered.
(8) [tree diagram: the structure built for 'a man called' at the point at which the extraposed relative clause is encountered]
Processing of the relative clause creates a predicate that must be applied to the
representation of an object in CS; in this case the only available antecedent is
the CS representation of a man. Mapping (8) into (7) depends on the extent
to which this antecedent is computationally accessible.a It is this accessibility
that we believe underlies the complexity of the mapping between strings and
CS, both for learners and for adult language processors, especially in the case
of discontinuity but in other cases as well.12
This takes us close to a familiar idea in the domain of human sentence
processing. Constituents that have been processed and interpreted are in
general accessible to subsequent operations that require retrieval of their
meanings (Bransford and Franks 1971); at the same time, the actual form of
these constituents is difficult to retrieve as sentence processing continues.13
One of the key ideas in this work is that local relations are easier to compute
than more distant relations, which require memory for the elements that
occur earlier. Memory may degrade with time or it may be overloaded by the
need to perform multiple tasks; or it may be disordered by the need to
perform multiple similar tasks. All of these are logically possible, and empir-
ical evidence exists to suggest that they are in fact realistic problems for a
language processor. Again, we suggest that the language learner faces similar
problems. The bottom line, other things being equal, is that distance in the
string between elements that are functionally related to one another in the
interpretation of the string contributes to complexity of mapping that string
into CS.
a The discussion of extraposition in Ch. 11 notes some factors that may render a particular NP less accessible as an antecedent.
12 Hence we follow the lead of Berwick (1987), who saw the connection very clearly.
13 There are many additional complexities, of course. See Kluender (1998) for a discussion of some of these.
14 Of course, we could suppose that CS includes a representation for discourse structure as well as a representation for argument structure, but this would not simplify the mapping problem, since we would then be dealing with a more complex CS with more possibilities.
15 One minor concern with the explanatory force of this argument is that we might have expected that human memory would have evolved so as to overcome the problems offered by non-uniform branching. Of course there are many reasons why this would not have happened, and it is probably impossible to settle the issue. Shifting the burden of explanation to language acquisition rather than language processing sidesteps this problem, since we probably do not want to attribute to early learners the adult's capacity to store and process long strings of linguistic material. See §12.3.2, and fn. b below.
16 The mapping was formulated in terms of strings and base phrase markers, but the general problem is the same as the one that we are considering here.
17 This is not to say that grammatical errors per se are irrelevant, but simply that they are not the whole story. On the current perspective, a grammatical error would occur if a particular string is hypothesized to correspond to the wrong conceptual structure representation. We assume that such errors are always detectable on the basis of subsequent information in the form of <string,CS> pairs, but leave open the possibility that a particular formulation of the correspondences might give rise to pathological cases that would have to be addressed.
Kayne’s (1994) Antisymmetry theory, where all branching is binary and to the
right, such that all phrases are of the form given in (9).18
(9) [XP Spec [X′ X YP]]
Kayne assumes that there is a strict correlation between asymmetric c-
command and linear order, called the Linear Correspondence Axiom (LCA),
such that if α c-commands β and β does not c-command α, then α precedes β.
If there is no movement, and if the branching structure in (9) is taken to be
the CS, then the mapping between strings and corresponding CS representa-
tions will be straightforward, in fact. All of the mappings will conform to the
LCA. Moreover, the mapping will be maximally simple, in that in order to
construct the mapping it is sufficient to scan the string from left to right,
establishing a correspondence between each element in the string and each
constituent of the CS.
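That a uniformly right-branching structure can be recovered in a single left-to-right scan can be illustrated with a minimal sketch (ours, under the simplifying assumption that each element of the string pairs with the structure of the remainder):

```python
def right_branching(words):
    """Build a uniformly right-branching structure [w0 [w1 [... wn]]]
    in one left-to-right pass: each element is paired with the
    structure built from the rest of the string. No backtracking or
    reordering is ever required."""
    if len(words) == 1:
        return words[0]
    return (words[0], right_branching(words[1:]))
```

For example, `right_branching(['the', 'man', 'left'])` yields `('the', ('man', 'left'))`.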
18 In principle all branching could be to the left in Kayne's approach, but Kayne introduces an additional stipulation that rules out leftward branching.
b Memory limitations play a central role in many accounts of processing complexity, e.g. Hawkins's work cited here and Hofmeister (2011). For arguments that memory limitations do not correspond directly to acceptability judgments, see Sprouse et al. (2012). A plausible interpretation of the role of memory is that it is a biasing factor, which leads speakers to prefer certain constructions over others, which leads to higher frequency for the preferred constructions, which ultimately produces 'surprisal' in the case of dispreferred constructions. Surprisal in turn corresponds to lower acceptability. For some discussion, see Culicover (2013).
(10) [A [B D E] [C F [G H I]]]
string = defhi
Let us say that the Image of D is d, and so on for the other terminals in the CS
representation. We simplify dramatically here, because it is plausible that a
single CS can be expressed in a number of different ways. We can also define
an inverse relation and since there is more information in the tree than in the
string, the inverse image defines a set containing one or more CS
representations.
(11) Image(D) = d
Image⁻¹(d) = <D, . . . >
Hence the correspondences are many-to-many.
It is possible that the image of a higher level node in the tree is not
decomposable into the image of its constituents, which would be typical of
an idiom (e.g. Image⁻¹(kick the bucket) = <DIE, . . . >). It is also possible that a
single element in a string corresponds to a complex CS representation, as
argued e.g. by Jackendoff (1990). And it is possible that there is a particular
aspect of CS that corresponds to a class of strings that satisfy a certain
structural description, as has been argued for the dative construction
among others (see Goldberg 1995; Jackendoff 1997). We leave these more
complex possibilities aside here.
We can measure the distance between constituents of the CS representation
in terms of the height of the common ancestor. For sisters we will say that the
CDistance, i.e. the distance in the CS representation, is 0, which is the number
of ancestors that they do not have in common. So for (10) we have (12).
326 explaining syntax
(12) CDistance(H,I) = 0
The CDistance between a node and the daughter of its sister is 1, as in the case
of (F,H) and (F,I). In general, the CDistance between two nodes is the number
of dominating nodes that the path between them passes through. A node is
not a dominating node if the path through it links sisters; otherwise it is.
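The definition of CDistance can be stated as a short program. The encoding of tree (10) below is our own; the counting rule is the one just given: intermediate nodes on the path count unless the path through them links two sisters.

```python
# Tree (10): A dominates B and C; B dominates D and E; C dominates F
# and G; G dominates H and I.  (Hypothetical encoding for illustration.)
children = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['F', 'G'], 'G': ['H', 'I']}
parent = {c: p for p, cs in children.items() for c in cs}

def path(x, y):
    """Nodes on the tree path from x to y, inclusive."""
    anc_x = [x]
    while anc_x[-1] in parent:
        anc_x.append(parent[anc_x[-1]])
    anc_y = [y]
    while anc_y[-1] in parent:
        anc_y.append(parent[anc_y[-1]])
    common = next(n for n in anc_x if n in anc_y)  # lowest common ancestor
    up = anc_x[:anc_x.index(common)]
    down = anc_y[:anc_y.index(common)]
    return up + [common] + list(reversed(down))

def cdistance(x, y):
    """Number of dominating nodes on the path between x and y. A node
    does not count as dominating if the path through it links two
    sisters, i.e. both of its path-neighbors are its own children."""
    p = path(x, y)
    return sum(1 for i in range(1, len(p) - 1)
               if not (p[i - 1] in children.get(p[i], [])
                       and p[i + 1] in children.get(p[i], [])))
```

As in the text, `cdistance('H', 'I')` is 0, matching (12), and `cdistance('F', 'H')` is 1, the distance between a node and the daughter of its sister.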
Given this notion of CDistance, we can relate the distance between sub-
strings to linear relations between the corresponding parts of the CS repre-
sentation. The general idea is the following. For a given distance between two
elements (words, phrases, etc.) in the string, we posit that greater distance in
CS requires greater processing, and hence produces greater complexity, other
things being equal.
Consider the string defhi. Sisterhood at CS, i.e. CDistance = 0, corresponds
to adjacency in the string. If CDistance(α,β) = 0, and Image(α) precedes
Image(β), then the right edge of Image(α) is adjacent to the left edge of
Image(β). This is the case, for example, for α = B and β = C.
We use this property to measure the amount of deformation (or ‘twisting’)
of a CS representation with respect to its corresponding string. In the case of
adjacency there is no deformation. We may measure deformation in terms of
the distance in the string between the right edge of Image(α) and the left edge
of Image(β), which in this case is 0. But we must be careful to correlate these
distances appropriately. So, for example, the distance between B and G is 1.
Image(B) = de and Image(G) = hi. The distance between the right edge of de
and the left edge of hi is one element, namely f, but this is simply because f is a
terminal.
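The bookkeeping in this paragraph can be sketched as follows (our encoding, not from the chapter: the string is held as a list of constituent images, each image counting as one unit):

```python
def pdist(units, x_img, y_img):
    """Number of units between Image(x) and Image(y) in the string,
    where each constituent image counts as a single unit."""
    return abs(units.index(x_img) - units.index(y_img)) - 1

# Tree (10): Image(B) = de, Image(F) = f, Image(G) = hi.
d1 = pdist(['de', 'f', 'hi'], 'de', 'hi')    # 1: only f intervenes
# Replacing f by jk, as in (13), leaves the distance unchanged,
# because jk is treated as a single unit.
d2 = pdist(['de', 'jk', 'hi'], 'de', 'hi')   # still 1
```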
Suppose we replace F corresponding to f in the string in (10) with [F J K],
corresponding to jk in the string.
(13) [A [B D E] [C [F J K] [G H I]]]
string = dejkhi
Now there are two elements in Image(F). But the distance between de and hi
is still 1, if we treat Image(B) = de and Image(F) = jk as single units. They can
be so treated because they correspond to constituents of CS. Let us call this
(15) a. [ H1 [ H2 XP ]]    b. [ H1 [ XP H2 ]]
(18) a. [[ H2 XP ] H1 ]    b. [[ XP H2 ] H1 ]
For (18a),
(19) CDistance(H2,XP) = 0  PDistance(H2,XP) = 0
CDistance(H1,H2) = 1  PDistance(H1,H2) = 1
CDistance(H1,XP) = 1  PDistance(H1,XP) = 0
and for (18b),
(20) CDistance(H2,XP) = 0  PDistance(H2,XP) = 0
CDistance(H1,H2) = 1  PDistance(H1,H2) = 0
CDistance(H1,XP) = 1  PDistance(H1,XP) = 1
Again, the greater PDistance between heads that are adjacent in the structure
occurs when the branching is not uniform, as in (18a).
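The comparison in (19)–(20) can be checked mechanically. The definition assumed below (PDistance as the number of elements strictly intervening in the string) is our reconstruction, not a formulation given in the chapter:

```python
def pdistance(string, x, y):
    """Number of elements strictly between x and y in the string."""
    i, j = string.index(x), string.index(y)
    return abs(i - j) - 1

s18a = ['H2', 'XP', 'H1']   # linearization of (18a): [[H2 XP] H1]
s18b = ['XP', 'H2', 'H1']   # linearization of (18b): [[XP H2] H1]
```

Here `pdistance(s18a, 'H1', 'H2')` is 1 while `pdistance(s18b, 'H1', 'H2')` is 0, reproducing the values in (19) and (20): the structurally adjacent heads are further apart in the string exactly when the branching is non-uniform.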
The total deformation of a tree of course grows as the number of heads
grows, and the extent to which they do not line up grows. So, if we take the
pattern in (18a) and replicate it, the total PDistance between adjacent heads
will equal the number of alternating pairs of heads, while the total CDistance
between adjacent heads will remain 0. We might surmise that a single head in
an initial position with all other heads to the right might not be that costly in
terms of complexity, and might optimize something else in the grammar.
The computational cost would be minimized if the head in question was the
highest, since an internal ‘outlier’ would produce a cost with respect to the
head immediately above it and the one immediately below it.
On this view, complexity of processing is correlated with memory load, and
uniformity of branching reduces memory load. In this sense, the antisymme-
try approach of Kayne (1994) is correct in placing a high value on uniformity
of the direction of branching structure, but is too strong in that it does not
allow for non-uniform branching at all. For our purposes, it is enough to say
that uniformity is computationally less complex, other things being equal.
The reduction of complexity, coupled with a theory of language change that
reflects the computational biases of learners as discussed in §12.2, will produce a
(21) [CP Spec [C′ C[+WH] [IP . . . XP[+WH] . . . ]]]
19 An absolute requirement along these lines is too strong, given that there are cases where an operator binds a variable to its left, such as If he_i wants to, each man_i can vote (Greg Carlson, p.c.). We hypothesize that the correct account is one that assigns a strong preference to the case in which the operator precedes what it binds, presumably for processing reasons.
(22) a./b. [tree diagrams, flattened in extraction: two structures containing the moved constituents Bi and Fj and their traces ti and tj; in (a) the trace tj is contained within a larger moved constituent, in (b) it is not]
PDistance(Fj,tj) = 2
CDistance(Fj,tj) = 2
b. PDistance(Bi,ti) = 5
CDistance(Bi,ti) = 5
When the trace is contained in a moved constituent, the complexity would be
better represented by constructing a measure that takes this fact explicitly into
account. One possibility is to multiply the CDistance from Bi to its trace by
the CDistance from Fj to its trace in (22a), which yields 8 compared with 5 in
(22b). Such a measure, while arbitrary, reflects the degree of deformation of
the tree.
To sum up, there are essentially three ways to map a CS into a string. One is
to align the constituents of the CS with the string without crossing constituents
of the parse string. The second is to stretch a CS constituent so that the
corresponding substring occupies a position where it is not adjacent to its CS
sisters. The third is to twist the lines so that the correspondences between
strings and constituents of CS cross. Our intention is that the relative complexity
assigned by this measure reflects the relative complexity in terms of memory
requirements, and that we do not have to formulate an explicit theory of
memory for sentence processing in order to be able to capture the basic
outlines of comparative complexity.
Note that there are several complexities that we have not factored into our
account here. A string of words may map into a CS representation so that
there are fewer primitives in the CS representation than there are words in the
string; this is a characterization of idiomaticity. Or there may be more
primitives in the CS representation than in the string; this is a characterization
of a ‘construction’ in the sense of Construction Grammar. In both cases there
is the opportunity for a mismatch in the CDistance and PDistance, since the
two are equal when there is a uniform linearization of a branching structure,
with a one-to-one correspondence between elements of the string and elem-
ents of the CS representation. To the extent that this additional complexity
presents a burden for the learner, we might expect some effect on learning.
But there is no twisting and so the burden, if it exists, is relatively light.
12.5 Summary
We have suggested that at its core the antisymmetry theory reflects the relative
computational simplicity of mapping strings into structures assuming uni-
form branching. The branching really has to do with the relative linear order
in the string between related heads and their identifiability, a measure that can
be correlated with memory but that can be abstractly formulated for string/
Morphological complexity
outside of universal grammar
(1998)*
Remarks on Chapter 13
This chapter is about morphosyntax, in particular the use of linear order in
inflected words to express correspondences between form and meaning. In this
case, we focus on the identification of inflectional morphology and the corres-
pondence between morphological structure and syntactic function. We explore
the possibility that different orderings among the root and inflection in an
inflected form may yield differences in the complexity of the form–meaning
correspondence. We assume that complexity differences result in turn in prefer-
ences for some orderings over others, as seen in typological distribution, along
lines similar to those discussed in Chapter 12. Specifically, we argue that the
identification of inflectional morphology expressed as suffixation is computation-
ally less complex than prefixation, which in turn is computationally less complex
than infixation. These preferences account for the greater frequency of suffixation
over prefixation, and the greater frequency of prefixation over infixation.
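The asymmetry claimed here can be illustrated with a toy segmentation sketch (hypothetical forms of our own, not examples from the chapter): with suffixation the forms of a lemma share their left edge, so a learner scanning left to right identifies the stem before reaching the inflection; with prefixation nothing is shared at the left edge.

```python
from os.path import commonprefix

def shared_stem_left(forms):
    """Longest prefix shared by a set of inflected forms: the stem a
    left-to-right learner can identify before the inflection varies."""
    return commonprefix(forms)

suffixing = ['talak', 'talam', 'talan']   # hypothetical stem tala + -k/-m/-n
prefixing = ['ktala', 'mtala', 'ntala']   # the same affixes prefixed
```

Here `shared_stem_left(suffixing)` returns `'tala'`, while `shared_stem_left(prefixing)` returns the empty string: with prefixes the stem emerges only after the inflection has been processed.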
13.1 Background
We address here one aspect of the question of why human language is the way
it is. It has been observed (Sapir 1921; Greenberg 1957; Hawkins and Gilligan
1988) that inflectional morphology tends overwhelmingly to be suffixation,
rather than prefixation, infixation, reduplication, or other logical possibilities
* [This chapter originally appeared as Jirka Hana and Peter W. Culicover, ‘Morphological com-
plexity outside of Universal Grammar’, OSUWPL 58, Spring 2008, pp. 84–108. We thank Chris Brew,
Beth Hume, Brian Joseph, John Nerbonne, and three anonymous reviewers from the journal Cognitive
Science for valuable feedback on various versions. We also thank Mary Beckman and Shari Speer.]
morphological complexity 335
that are quite rare if they exist at all. For this study, we assume that the
statistical distribution of possibilities is a consequence of how language is
represented or processed in the mind. That is, we rule out the possibility that
the distributions that we find are the result of contact, genetic relatedness, or
historical accidents (e.g. annihilation of speakers of languages with certain
characteristics), although such possibilities are of course conceivable and in
principle might provide a better explanation of the facts than the one that we
assume here.
The two possibilities that we focus on concern whether the preference for
suffixation is a property of the human capacity for language per se, or whether
it is the consequence of general human cognitive capacities. Following
common practice in linguistic theory, let us suppose that there is a part of
the human mind/brain, called the Language Faculty, that is specialized for
language (see e.g. Chomsky 1973). The specific content of the Language
Faculty is called Universal Grammar. We take it to be an open question
whether there is such a faculty and what its specific properties are; we do
not simply stipulate that it must exist or that it must have certain properties,
nor do we deny its existence and assert that the human capacity for language
can be accounted for entirely in terms that do not appeal to any cognitive
specialization. The goal of our research here is simply to investigate whether it
is possible to account for a particular property of human language in terms
that do not require that this property in some way follows from the architec-
ture of the Language Faculty.
1 The word 'paradigm' is used in two related but different meanings: (1) all the forms of a given lemma; (2) in the original meaning, referring to a distinguished member of an inflectional class, or more abstractly to a pattern in which the forms of words belonging to the same inflectional class are formed. We reserve the term 'paradigm' only for the former meaning, and use the phrase 'paradigm pattern' for the latter.
2 Throughout, we mark relevant morpheme boundaries by '·', e.g. book·s.
head-initial languages use not only prefixes, as expected, but also suffixes.
Moreover, there are many languages that use exclusively suffixes and not
prefixes (e.g. Basque, Finnish), but there are very few that use only prefixes
and no suffixes (e.g. Thai, but in derivation, not in inflection).
There have been several attempts to explain the suffix–prefix asymmetry,
using processing arguments, historical arguments, and combinations of both.
3. For example, even though in free word-order languages like Russian or Czech it is not possible to predict case endings in general, they can be predicted in many specific cases because of agreement within the noun phrase, subject–verb agreement, semantics, etc.
original SOV word order; he uses Hawkins and Gilligan’s argument about
efficient processing to conclude that prefixes are less likely than suffixes
because free morphemes are less likely to fuse in pre-stem positions.
Although the work above correctly explains the suffix–prefix asymmetry, it has several disadvantages: (1) it relies on several processing assumptions that are not completely independent of the problem being explained; (2) there are many other asymmetries in the distribution of potential morphological systems that it leaves unaddressed; (3) as stated above, it addresses only verbal morphology. In the rest of the paper, we develop an alternative measure that we believe addresses all of these issues.
about preference per se, but rather formal properties of the systems under
consideration. On this approach, if a system of Type I is measurably more complex than a system of Type II, we would predict that Type I systems would be less commonly found than Type II systems.
13.2.1 Complexity
We see basically two types of measure as the most plausible accounts of relative morphological complexity: learning and real-time processing. Simplifying somewhat, inflectional morphology involves adding a morpheme to
another form, the stem. From the perspective of learning, it may be more
difficult to sort out the stem from the inflectional morpheme if the latter is
prefixed than if it is suffixed. The other possibility is a processing one: once all
of the forms have been learned, it is more difficult to recognize forms and
distinguish them from one another when the morphological system works a
particular way, e.g. uses inflectional prefixes.
We do not rule out the possibility of a processing explanation in principle,
although we do not believe that the proposals that have been advanced (see
§13.1.2) are particularly compelling or comprehensive. The types of measure that we explore here (see §13.4) are of the learning type.
4. The study explores the perception of nonsense words containing nasal–obstruent clusters. Words containing clusters that are rare in English (e.g. /np/) were more likely to be rated as possible words when the context allowed placing a morpheme boundary in the middle of the cluster: e.g. zan·plirshdom was rated better than zanp·lirshdom.
5. The term ‘lemma’ is used with several different meanings. In our usage, every set of forms belonging to the same inflectional paradigm is assigned a lemma, a particular form chosen by convention (e.g. nominative singular for nouns, infinitive for verbs) to represent that set. The terms ‘citation form’ and ‘canonical form’ are used with the same meaning. For example, the forms break, breaks, broke, broken, breaking have the same lemma, break. Note that in this usage, only forms related by inflection share the same lemma; thus e.g. the noun songs and the verb sings do not have the same lemma.
assume that forms belonging to the same lexeme are likely to have similar orthography and contextual properties, and that the distribution of forms will be similar for all lexemes. In addition, they combine these similarity measures with an iteratively trained probabilistic grammar generating the word forms. Similarly, Baroni et al. (2002) successfully use orthographic and semantic similarity.
Formal similarity. The usual tool for discovering the similarity of strings is the Levenshtein edit distance (Levenshtein 1966). Its advantage is that it is extremely simple and applicable to concatenative as well as non-concatenative morphology. Some authors (e.g. Baroni et al. 2002) use the standard edit distance, where all editing operations (insert, delete, substitute) have a cost of 1. Yarowsky and Wicentowski (2000) use a more elaborate approach: their edit operations have different costs for different segments, and the costs are iteratively re-estimated; initial values can be based either on phonetic similarity or on a related language.
Semantic similarity. In most applications, semantics cannot be accessed directly and therefore must be derived from other accessible properties of words. For example, Jacquemin (1997) exploits the fact that semantically similar words occur in similar contexts.
Distributional properties. The method of Yarowsky and Wicentowski (2000) acquires the morphology of English irregular verbs by comparing the distributions of their forms with regular verbs, assuming they are distributed equally.6 They also note that forms of the same lemma have similar selectional preferences. For example, related verbs tend to occur with similar subjects and objects. The selectional preferences are usually even more similar across different forms of the same lemma than across synonyms. For this case, they manually specify regular expressions that (roughly) capture patterns of possible selectional frames.
6. Obviously, this approach would have to be significantly modified for classes other than verbs and/or for highly inflective languages. Consider e.g. Czech nouns. Not all nouns have the same distribution of forms; e.g. many numeral constructions require the counted object to be in the genitive, so currency names are more likely to occur in the genitive than, say, proper names. Proper nouns occur in the vocative far more often than inanimate objects, words denoting uncountable substances (e.g. sugar) occur much more often in the singular than in the plural, etc. Therefore, we would have to assume that there is not just a single distribution of forms shared by all the noun lemmas, but several distributions; the forms of currency names, proper names, and uncountable substances would probably belong to different distributions. The algorithm in Yarowsky and Wicentowski (2000) is given candidates for verbal paradigms and discards those whose forms do not fit the required uniform distribution. An algorithm for discovering Czech noun paradigms could use the same technique, but (i) there would not be just one distribution but several, and (ii) the algorithm would need to discover what those distributions are.
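As a sketch of the formal-similarity machinery above: a weighted Levenshtein dynamic program in which the substitution cost depends on the segment pair. The cost function below is a hypothetical stand-in of our own; Yarowsky and Wicentowski’s actual costs are iteratively re-estimated from data.

```python
def weighted_edit_distance(a, b, sub_cost):
    # Levenshtein DP in which the substitution cost depends on the
    # segment pair; INSERT and DELETE keep a fixed cost of 1.
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = float(i)
    for j in range(1, n + 1):
        d[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else sub_cost(a[i - 1], b[j - 1])
            d[i][j] = min(d[i - 1][j] + 1,        # DELETE
                          d[i][j - 1] + 1,        # INSERT
                          d[i - 1][j - 1] + sub)  # MATCH / SUBSTITUTE
    return d[m][n]

VOWELS = set("aeiou")

def phonetic_sub_cost(x, y):
    # hypothetical cost table: substitutions within the same segment
    # class (vowel for vowel, consonant for consonant) are cheaper
    return 0.5 if (x in VOWELS) == (y in VOWELS) else 1.0
```

With these costs, sing/song come out closer (0.5) than sing/sting (1.0), so forms related by vowel alternation cluster together more readily than accidentally similar words.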
7. A more realistic model would allow iterative repetition of these stages. Even after establishing a basic morphological competence, new forms that are opaque to it are still learned as suppletives. The output of Stage 3 can be used to improve the clustering in Stage 2.
8. Of course, it is possible to imagine languages where Stage 2 is easy and Stage 3 is very hard. For instance, in a language where plural is formed by some complex change of the last vowel, Stage 2 is quite simple (words that differ only in that vowel go into the same paradigm), while Stage 3 (discovering the rule that governs the vowel change) is hard.
measure in such a way that it reflects more intuitively the physical and cognitive reality of morphology acquisition. Some of the modifications are similar to edit-distance variants proposed by others, while others are, we believe, original.
especially the fact that language occurs in time, and that human computational resources are limited.
Model 1 uses an incremental algorithm to compute the similarity distance of two strings. Unlike Model 0, Model 1 calculates only one edit-operation sequence: at each position, it selects a single edit operation. The most preferred operation is MATCH. If MATCH is not possible, another operation (SUBSTITUTE, DELETE, or INSERT) is selected randomly.9 The edit distance computed by this algorithm is greater than or equal to the edit distance computed by the Model 0 algorithm (Figure 13.1). It cannot be smaller, because Model 0 computes the optimal distance. It can be larger, because a randomly selected operation need not be optimal.
9. A more realistic model could (1) adjust the preference in the operation selection by experience, (2) employ a limited look-ahead window. For the sake of simplicity, we ignore these options.
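A minimal sketch of the two computations (our own construction, under the simplifying assumptions just stated): Model 0 as the standard Levenshtein dynamic program, Model 1 as the greedy single-pass walk, so that Model 1’s cost can never undercut Model 0’s.

```python
import random

def model0(a, b):
    # Model 0: optimal Levenshtein distance via dynamic programming
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # DELETE
                          d[i][j - 1] + 1,        # INSERT
                          d[i - 1][j - 1] + cost) # MATCH / SUBSTITUTE
    return d[m][n]

def model1(a, b, rng):
    # Model 1: a single left-to-right pass; MATCH when possible, otherwise
    # a randomly selected SUBSTITUTE / DELETE / INSERT, each of cost 1
    i = j = cost = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i, j = i + 1, j + 1
        else:
            cost += 1
            op = rng.choice(("SUB", "DEL", "INS"))
            if op == "SUB":
                i, j = i + 1, j + 1
            elif op == "DEL":
                i += 1
            else:
                j += 1
    # whatever remains of either string must be deleted or inserted
    return cost + (len(a) - i) + (len(b) - j)
```

Since every Model 1 walk is some valid edit sequence, `model1(a, b, rng) >= model0(a, b)` holds on every run, with equality only when the random choices happen to be optimal.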
10. If S is a set of stems and A a set of affixes, then LP = A·S and LS = S·A. If s ∈ S and a ∈ A, then wp = a·s and ws = s·a. The symbol · denotes both language concatenation and string concatenation.
11. Note that this is not the general case: e.g. for words of different length there is no diagonal at all (cf. Figure 13.3 C or D).
[Figure 13.3. Alignment matrices in M1: A. a prefixing language (ve·kuti vs. ba·kuti); B. a suffixing language (kuti·ve vs. kuti·ba); C, D. zero prefixes (ve·kuti vs. kuti; kuti·ve vs. kuti).]
MMMMII.12 The cost is equal to 2 and since there are no other possibilities,
the average cost of matching those two words is trivially optimal. The optimal
sequence for the words kuti and ve·kuti of the prefixing language LP0 (IIMMMM) also costs 2. However, there are many other, non-optimal sequences; the worst ones contain 6 INSERTs and 4 DELETEs and have a cost of 10.13
13.4.4.2 Evaluation
We randomly generate pairs of languages in various ways. The members of the
pair are identical except for the position of the affix. There is no homonymy
in the languages. For each such pair we calculated the following ratio:
(10) sufPref = PSI(LP) / PSI(LS)
12. Note that DELETE or INSERT operations cannot be applied if MATCH is possible.
13. In a model using a look-ahead window, the prefixing language would still be more complex, but the difference would be smaller.
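As an illustration of (10): PSI is defined outside this excerpt, so the sketch below simply takes PSI(L) to be the average Model 1 cost of matching the two forms of each paradigm, with the greedy edit walk re-implemented inline; the toy CV stems and the affixes ve/ba are our own.

```python
import random

def model1_cost(a, b, rng):
    # greedy Model 1 walk: MATCH where possible, otherwise a randomly
    # chosen SUBSTITUTE / DELETE / INSERT, each of cost 1
    i = j = cost = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i, j = i + 1, j + 1
        else:
            cost += 1
            op = rng.choice(("SUB", "DEL", "INS"))
            if op == "SUB":
                i, j = i + 1, j + 1
            elif op == "DEL":
                i += 1
            else:
                j += 1
    return cost + (len(a) - i) + (len(b) - j)

def psi(form_pairs, rng, trials=30):
    # assumed reading of PSI: mean Model 1 cost over the form pairs
    # of a language, averaged over repeated random walks
    costs = [model1_cost(a, b, rng)
             for a, b in form_pairs for _ in range(trials)]
    return sum(costs) / len(costs)

rng = random.Random(1)
stems = ["".join(rng.choice("ptkmns") + rng.choice("aeiou") for _ in range(3))
         for _ in range(40)]
prefix_pairs = [("ve" + s, "ba" + s) for s in stems]  # LP: w = a·s
suffix_pairs = [(s + "ve", s + "ba") for s in stems]  # LS: w = s·a
suf_pref = psi(prefix_pairs, rng) / psi(suffix_pairs, rng)
```

On this toy pair of languages the ratio comes out well above 1: the prefixing member of the pair is measurably harder to cluster.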
languages, but less complex than prefixing languages. The reason is simple: the uncertainty is introduced later than in the case of a prefix, and therefore the string whose matching can be influenced by a non-optimal operation selection may be shorter.
This prediction contradicts the fact that infixes are much rarer than prefixes
(§13.1.2). Note, however, that the prediction concerns simplicity of clustering
word forms into paradigms. According to the model, it is easier to cluster
forms of an infixing language into paradigms than those of a prefixing
language. It may well be the case that infixing languages are more complex
from another point of view, that of identification of morphemes: other things
being equal, a discontinuous stem is probably harder to identify than a
continuous one.
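This clustering prediction can be checked with the same kind of simulation (our own toy construction, with the greedy walk re-implemented inline): the average cost of matching paradigm mates should order suffixing < infixing < prefixing.

```python
import random

def model1_cost(a, b, rng):
    # greedy Model 1 walk: MATCH where possible, otherwise a random
    # SUBSTITUTE / DELETE / INSERT, each of cost 1
    i = j = cost = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i, j = i + 1, j + 1
        else:
            cost += 1
            op = rng.choice(("SUB", "DEL", "INS"))
            if op == "SUB":
                i, j = i + 1, j + 1
            elif op == "DEL":
                i += 1
            else:
                j += 1
    return cost + (len(a) - i) + (len(b) - j)

def avg_cost(form_pairs, rng, trials=40):
    costs = [model1_cost(a, b, rng)
             for a, b in form_pairs for _ in range(trials)]
    return sum(costs) / len(costs)

rng = random.Random(3)
stems = ["".join(rng.choice("ptkmns") + rng.choice("aeiou") for _ in range(3))
         for _ in range(40)]                                  # CVCVCV stems
prefixing = [("ve" + s, "ba" + s) for s in stems]
infixing = [(s[:4] + "ve" + s[4:], s[:4] + "ba" + s[4:]) for s in stems]
suffixing = [(s + "ve", s + "ba") for s in stems]
c_pre = avg_cost(prefixing, rng)
c_inf = avg_cost(infixing, rng)
c_suf = avg_cost(suffixing, rng)
```

With the infix placed late in the stem, the region in which a non-optimal random choice can cause misalignment is shorter than in the prefixing case but longer than in the suffixing case, and the average costs order accordingly.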
Metathesis. The model prefers metathesis occurring later in a string for the same reasons as it prefers suffixes over prefixes. This prediction is in accord with the data (see §13.B.2). However, the model also considers metathesis (of two adjacent segments) to have the same cost as an affix consisting of two segments, and to be even cheaper than an affix with more segments. This definitely does not reflect reality. In §13.4.5.2, we suggest how to rectify this.
Mirror image. Like Model 0, this model considers mirror-image morphology extremely complicated to acquire.
Templatic morphology. As we note in Appendix §13.B.1, templatic morphology does not have to be harder to acquire than morphology using continuous affixes. Following Fowler (1983), it can be claimed that the consonants of the root and the vowels of the inflection are perceptually in different ‘dimensions’ (consonants are modulated on the basic vowel contour of syllables) and are therefore clearly separable.
14. It is probable that learners extract similar probabilities on other levels as well: segments, feet, etc.
operations, (2) the size of the stack at each step, (3) the cost of possible backtracking. Each of these adds to the memory load and/or slows processing.
Matching morpheme boundaries increases the probability that the two
words are being matched the ‘right’ way (i.e. that the match is not accidental).
This means that it is more likely that the choices of edit operations made in
the past were correct, and therefore backtracking is less likely to occur. In such a case, Model 2 flushes the stack. Similarly, the stack can be flushed if a certain number of matches occurs in a row, though a morpheme boundary contributes more to the certainty of the right analysis. In general, we introduce the notion of an anchor: a sequence of matches of a certain weight, at which point the stack is flushed. This can be further enhanced by assigning different weights to
matching of different segments (consonants are less volatile than vowels).
Morpheme boundaries would then have higher weight than any segment.
Moreover, more probable boundaries would have higher weights than less
probable ones.
Thus in general, a regular language with more predictable morpheme
boundaries needs a smaller stack for clustering words according to their
formal similarity.
Suffix vs. prefix. It is evident that Model 2 also considers prefixing languages more complex than suffixing languages, for two reasons. First, the early uncertainty of a prefixing language leads to more deviations from the minimal sequence of edit operations, in the same way as in Model 1. Second, the stack is filled early and the information must be kept there for a longer time; the memory load is therefore higher.
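A deliberately simplified, deterministic sketch of this memory-load argument (our own construction, not the formal model itself): every mismatch is handled by SUBSTITUTE and pushes one choice point; a run of two consecutive matches counts as an anchor and flushes the stack; the load is the stack size summed over all steps.

```python
def model2_load(a, b, anchor_run=2):
    # mismatch: SUBSTITUTE, pushing a choice point for later backtracking;
    # an anchor (anchor_run consecutive matches) flushes the stack;
    # memory load = stack size summed over all steps
    # (words assumed equal length for simplicity: zip truncates)
    stack = run = load = 0
    for x, y in zip(a, b):
        if x == y:
            run += 1
            if run >= anchor_run:
                stack = 0        # anchor reached: earlier choices confirmed
        else:
            stack += 1
            run = 0
        load += stack
    return load
```

For the paradigm mates of a toy prefixing language the choice points are pushed first and carried across the stem until an anchor is reached (`model2_load("vekuti", "bakuti")` is 5), while in the suffixing counterpart the stack only fills at the very end (`model2_load("kutive", "kutiba")` is 3).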
Infixes. Our intuitions tell us that Model 2, unlike Model 1, would
consider an infixing language more complex than a prefixing language. The
reason is that predicting morpheme boundaries using statistics is harder in an
infixing language than in the corresponding prefixing language. However, we have not worked out the formal details of this.
13.5 Conclusion
We showed that it is possible to model the prevalence of various morpho-
logical systems in terms of their acquisition complexity. Our complexity
measure is based on the Levenshtein edit distance modified to reflect external
constraints—human memory limitations and the fact that language occurs in
time. Such a measure produces some interesting predictions; for example, it correctly predicts the suffix–prefix asymmetry and shows mirror-image morphology to be virtually impossible.
both forms and the corresponding stem–inflection pairs. This is similar (with
enough simplification) to our idealization of a child being exposed to both
forms and their meanings.
Many of the results are in accord with the preferences attested in real languages (see §13.1.2): it was easier to identify roots in a suffixing language than in a prefixing one, the templates were relatively easy, and infixes were relatively hard.15 In a similar experiment, Gasser and Lee (1991) showed that
the model does not learn linguistically implausible languages—pig Latin or a
mirror image language (see (5)). The model was unable to learn any form of
syllable reduplication. A model enhanced with modules for syllable processing was able to learn a very simple form of reduplication: reduplicating the onset
or rime of a single syllable. It is necessary to stress that the problem addressed
by Gasser was much simpler than real acquisition: (1) at most two inflectional
categories were used, each with only two values, (2) each form belonged only
to one paradigm, (3) there were no irregularities, and (4) only the relevant
forms with their functions were presented (no context, no noise).
15. The accuracy of root identification was best in the case of suffixes, templates, and umlaut (c.75%); in the case of prefixes, infixes, and deletion it was lower (c.50%); all were above the chance baseline (c.3%). The accuracy of inflection identification showed a different pattern: the best were prefix and circumfix (95+%); slightly harder were deletion, template, and suffix (90+%); and the hardest were umlaut and infix (c.75%); all were above the chance baseline (50%).
13.B.2 Metathesis
In morphological metathesis, the relative order of two segments encodes a
morphological distinction. For example, in Rotuman (Austronesian family,
related to Fijian), words distinguish two forms, called the complete and
incomplete phase16 by Churchward (1940), and in many cases these are
distinguished by metathesis (examples due to Hoeksema and Janda 1988):17
16. According to Hoeksema and Janda (1988), the complete phase indicates definiteness or emphasis for nouns and perfective aspect or emphasis for verbs and adjectives, while the incomplete phase marks words as indefinite/imperfective and nonemphatic.
17. In many cases, subtraction (rako vs. rak ‘to imitate’), subtraction with umlaut (hoti vs. höt ‘to embark’), or identity (rī vs. rī ‘house’) is used instead. See McCarthy (2000) for more discussion.
They found only one language (Fur) with a fully productive root-initial metathesis involving a wide variety of sounds. Apparent cases of non-adjacent metathesis can usually be analyzed as two separate metatheses, each motivated by an independent phonological constraint.
Processing metathesis. Mielke and Hume (2001) suggest that the reasons
for the relative infrequency of metathesis are related to word recognition—
metathesis impedes word recognition more than other frequent processes,
like assimilation. Word recognition (see §13.3.1) can also explain why metathesis is even rarer (or perhaps nonexistent) word/root-initially or with non-adjacent segments: (i) lexical access is generally achieved on the basis of the initial part of the word, and (ii) phonological changes involving non-adjacent segments are generally more disruptive to word recognition.
References
Adger, David (2003). Core Syntax: A Minimalist Approach. Oxford: Oxford University
Press.
Aissen, Judith, and Joan Bresnan (2002). Optimality and functionality: objections and
refutations. Natural Language and Linguistic Theory 20: 81–95.
Akmajian, Adrian (1970). On deriving cleft sentences from pseudocleft sentences.
Linguistic Inquiry 1: 149–68.
Akmajian, Adrian, and Frank Heny (1975). An Introduction to Transformational
Generative Grammar. Cambridge, Mass.: MIT Press.
Anderson, Stephen R. (1977). Comments on the paper by Wasow. In Peter W. Culicover,
Thomas Wasow, and Adrian Akmajian (eds), Formal Syntax, 361–77. New York:
Academic Press.
Arnold, Jennifer, Thomas Wasow, Anthony Losongco, and Ryan Ginstrom (2000).
Heaviness vs. newness: the effects of structural complexity and discourse status on
constituent ordering. Language 76: 28–55.
Austin, J. L. (1962). How to Do Things with Words. New York: Oxford University Press.
Authier, Jean-Marc (1991). Iterated CPs and embedded topicalization. Linguistic
Inquiry 23: 329–36.
Bach, Emmon (1980). In defense of passive. Linguistics and Philosophy 3: 297–341.
Baker, Mark (1988). Incorporation: A Theory of Grammatical Function Changing.
Chicago: University of Chicago Press.
Baltin, Mark (1978). Towards a theory of movement rules. Dissertation, MIT.
Baltin, Mark (1981). Strict bounding: the logical problem of language acquisition. In
Carl Lee Baker and John J. McCarthy (eds), The Logical Problem of Language
Acquisition, 247–95. Cambridge, Mass.: MIT Press.
Baltin, Mark (1982). A landing site theory of movement rules. Linguistic Inquiry 13: 1–38.
Baroni, Marco (2000). Distributional cues in morpheme discovery: a computational
model and empirical evidence. Dissertation, UCLA.
Baroni, Marco, Johannes Matiasek, and Harald Trost (2002). Unsupervised discovery
of morphologically related words based on orthographic and semantic similarity. In
Proceedings of the ACL-02 Workshop on Morphological and Phonological Learning,
vol. 6: 48–57.
Bayer, Josef (1984). Towards an explanation of certain that-t phenomena: the COMP
node in Bavarian. In W. de Geest and Y. Putseys (eds), Sentential Complementation,
23–32. Dordrecht: Foris.
Becker, Thomas (2000). Metathesis. In Geert Booij, Christian Lehmann, and Joachim
Mugdan (eds), Morphology: A Handbook on Inflection and Word Formation, 576–81.
Berlin: Mouton de Gruyter.
Beckman, Mary E., Julia Hirschberg, and Stephanie Shattuck-Hufnagel (2005). The
original ToBI system and the evolution of the ToBI framework. In Sun-Ah Jun (ed.),
Prosodic Typology: The Phonology of Intonation and Phrasing, 9–54. Cambridge:
Cambridge University Press.
Beerman, Dorothee, David LeBlanc, and Henk van Riemsdijk (eds) (1997). Rightward
Movement. Amsterdam: Benjamins.
Belletti, Adriana (2004). Structures and Beyond: The Cartography of Syntactic Structures. New York: Oxford University Press.
Berman, Arlene, and Michael Szamosi (1972). Observations on sentential stress.
Language 48: 304–25.
Berwick, Robert C. (1987). Parsability and learnability. In Brian MacWhinney (ed.),
Mechanisms of Language Acquisition, 345–65. Hillsdale, NJ: Erlbaum.
Beukema, Frits, and Peter Coopmans (1989). A government-binding perspective on
the imperative in English. Journal of Linguistics 25: 417–36.
Bever, Thomas G. (1970). The cognitive basis for linguistic structures. In John
R. Hayes (ed.), Cognition and the Development of Language, 279–362. New York:
Wiley.
Bever, Thomas G., and Brian McElree (1988). Empty categories access their antecedents during comprehension. Linguistic Inquiry 19: 35–43.
Bever, Thomas G., and David J. Townsend (2001). Sentence Comprehension: The
Integration of Habits and Rules. Cambridge, Mass.: MIT Press.
Bierwisch, Manfred (1968). Two critical problems of accent rules. Journal of Linguistics
4: 173–8.
Bing, Janet (1979). Aspects of English prosody. Dissertation, University of Massachusetts, Amherst.
Bolinger, Dwight (1958). Stress and information. American Speech 33: 3–20.
Bolinger, Dwight (1961). Contrastive accent and contrastive stress. Language 37: 83–96.
Bolinger, Dwight (1972). Accent is predictable (if you’re a mind-reader). Language 48:
633–44.
Borer, Hagit (1989). Anaphoric AGR. In Osvaldo Jaeggli and Kenneth Safir (eds), The
Null Subject Parameter, 69–110. Dordrecht: Kluwer.
Brame, Michael (1975). On the abstractness of syntactic structure: the VP controversy.
Linguistic Analysis 1: 191–203.
Brame, Michael (1978). Base Generated Syntax. Seattle, Wa.: Noit Amrofer.
Bransford, John D., and Jeffery J. Franks (1971). The abstraction of linguistic ideas. Cognitive Psychology 2: 331–50.
Bresnan, Joan (1971). Sentence stress and syntactic transformations. Language 47:
257–81.
Bresnan, Joan (1972). Stress and syntax: a reply. Language 48: 326–42.
Bresnan, Joan (1976). Evidence for a theory of unbounded transformations. Linguistic
Analysis 2: 353–93.
Bresnan, Joan (1977). Variables in the theory of transformations. In Peter W. Culicover, Thomas Wasow, and Adrian Akmajian (eds), Formal Syntax, 157–96. New York: Academic Press.
Culicover, Peter W., and Michael S. Rochemont (1990). Extraposition and the complement principle. Linguistic Inquiry 21: 23–48.
Culicover, Peter W., and Kenneth Wexler (1977). Some syntactic consequences of a
theory of language learnability. In Peter W. Culicover, Thomas Wasow, and Adrian
Akmajian (eds), Formal Syntax, 7–60. New York: Academic Press.
Culicover, Peter W., and Wendy Wilkins (1984). Locality in Linguistic Theory. New
York: Academic Press.
Culicover, Peter W., and Susanne Winkler (2008). English focus inversion constructions. Journal of Linguistics 44: 625–58.
Cutler, Anne, John A. Hawkins, and Gary Gilligan (1985). The suffixing preference:
a processing explanation. Linguistics 23: 723–58.
Daneš, František (1967). Order of elements and sentence intonation. In Morris Halle,
Horace Lunt, Hugh MacLean, and Cornelis van Schooneveld (eds), For Roman
Jakobson, 499–512. The Hague: Mouton.
Delahunty, Gerald P. (1981). Topics in the syntax and semantics of English cleft
sentences. Dissertation, University of California, Irvine.
den Dikken, Marcel (2005). Comparative correlatives comparatively. Linguistic
Inquiry 36: 497–533.
Diesing, Molly (1990). Verb movement and the subject position in Yiddish. Natural
Language and Linguistic Theory 8: 41–79.
Dogil, Gregory (1979). Autosegmental Account of Phonological Emphasis. Carbondale,
Ill.: Linguistic Research.
Downing, Bruce T. (1970). Syntactic structure and phonological phrasing in English.
Dissertation, University of Texas, Austin.
Dowty, David (1985). On recent analyses of the semantics of control. Linguistics and
Philosophy 8: 291–331.
Dowty, David (1991). Thematic proto-roles and argument selection. Language 67:
547–619.
Dresher, Elan (1977). Logical representations and linguistic theory. Linguistic Inquiry
8: 351–78.
E. Kiss, Katalin (ed.) (1992). Discourse Configurationality. Oxford: Oxford University
Press.
Emonds, Joseph (1970). Root and Structure Preserving Transformations. Bloomington:
Indiana University Linguistics Club.
Emonds, Joseph (1976). A Transformational Approach to English Syntax. New York:
Academic Press.
Farmer, Ann K. (1984). Modularity in Syntax. Cambridge, Mass.: MIT Press.
Featherston, Sam (2001). Empty Categories in Sentence Processing. Amsterdam:
Benjamins.
Fillmore, Charles J. (1965). Indirect Object Constructions and the Ordering of Trans-
formations. The Hague: Mouton.
Fillmore, Charles J. (1999). Inversion and constructional inheritance. In Gert Webelhuth, Jean-Pierre Koenig, and Andreas Kathol (eds), Lexical and Constructional Aspects of Linguistic Explanation, 113–28. Stanford, Calif.: CSLI.
Fillmore, Charles J., Paul Kay, and Mary Catherine O’Connor (1988). Regularity and
idiomaticity in grammatical constructions: the case of let alone. Language 64: 501–39.
Fodor, Janet D. (1978). Parsing strategies and constraints on transformations. Linguistic Inquiry 9: 427–73.
Fodor, Jerry A., and Merrill Garrett (1967). Some syntactic determinants of sentential
complexity. Attention, Perception, and Psychophysics 2: 289–96.
Fodor, Jerry A., Thomas Bever, and Merrill Garrett (1974). The Psychology of Language:
An Introduction to Psycholinguistics and Generative Grammar. New York: McGraw-
Hill.
Fowler, Carol A. (1983). Converging sources of evidence on spoken and perceived
rhythms of speech: cyclic production of vowels in monosyllabic stress feet. Journal
of Experimental Psychology: General 112: 386.
Frampton, John (1990). Parasitic gaps and the theory of wh-chains. Linguistic Inquiry
21: 49–77.
Freidin, Robert (1975). The analysis of passives. Language 51: 384–405.
Friedmann, Naama, and Louis P. Shapiro (2003). Agrammatic comprehension of OSV
and OVS sentences in Hebrew. Journal of Speech, Language and Hearing Research
46: 288–97.
Fukui, Naoki, and Margaret Speas (1986). Specifiers and projection. MIT Working
Papers in Linguistics 8: 128–72.
Gallistel, C. Randall (1990). The Organization of Learning. Cambridge, Mass.: MIT Press.
Gasser, Michael (1994). Acquiring receptive morphology: a connectionist model.
Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics,
279–86.
Gasser, Michael, and Chan-Do Lee (1991). A short-term memory architecture for the learning of morphophonemic rules. In R. P. Lippmann, J. E. Moody, and D. S. Touretzky (eds), Advances in Neural Information Processing Systems 3, 605–11. San Mateo, Calif.: Morgan Kaufmann.
Gazdar, Gerald (1981). Unbounded dependencies and coordinate structure. Linguistic
Inquiry 12: 155–84.
Gazdar, Gerald, Ewan Klein, Geoffrey Pullum, and Ivan A. Sag (1985). Generalized
Phrase Structure Grammar. Cambridge, Mass.: Harvard University Press.
Givón, Talmy (1979). On Understanding Grammar. New York: Academic Press.
Goldberg, Adele E. (1995). Constructions: A Construction Grammar Approach to Argument Structure. Chicago: University of Chicago Press.
Goldberg, Adele E. (2006). Constructions at Work: Constructionist Approaches in
Context. Oxford: Oxford University Press.
Goldberg, Adele E., and Ray Jackendoff (2004). The English resultative as a family of
constructions. Language 80: 532–67.
Greenberg, Joseph H. (1957). Essays in Linguistics. Chicago: University of Chicago
Press.
Greenberg, Joseph H. (1963). Some universals of grammar with particular reference to the order of meaningful elements. In Universals of Language, 73–113. Cambridge, Mass.: MIT Press.
Grice, H. P. (1975). Logic and conversation. In Peter Cole and Jerry L. Morgan (eds),
Speech Acts, 41–58. New York: Academic Press.
Grimshaw, Jane (1975). Relativization by deletion in Chaucerian Middle English. In
Jane Grimshaw (ed.), Papers in the History and Structure of English 1. Amherst,
Mass.: University of Massachusetts.
Grimshaw, Jane (1979). Complement selection and the lexicon. Linguistic Inquiry 10:
279–326.
Grimshaw, Jane (1997). Projections, heads and optimality. Linguistic Inquiry 28: 373–422.
Grodzinsky, Yosef (2000). The neurology of syntax: language use without Broca’s area.
Behavioral and Brain Sciences 23: 1–71.
Grosu, Alexander (1975). The position of fronted wh phrases. Linguistic Inquiry
6: 588–99.
Gruber, Jeffrey S. (1965). Studies in lexical relations. Dissertation, MIT.
Gruber, Jeffrey S. (1967). Disjunctive ordering among lexical insertion rules. MS, MIT.
Guéron, Jacqueline (1980). On the syntax and semantics of PP extraposition. Linguistic Inquiry 11: 637–78.
Guéron, Jacqueline, and Robert May (1984). Extraposition and logical form. Linguistic Inquiry 15: 1–31.
Gundel, Janet (1974). The role of topic and comment in linguistic theory. Dissertation,
University of Texas at Austin.
Gunter, Richard (1966). On the placement of accent in dialogue: a feature of context
grammar. Journal of Linguistics 2: 159–79.
Haegeman, Liliane (1991). Negative concord, negative heads. In Denis Delfitto, Martin
Everaert, Arnold Evers, and Frits Stuurman (eds), Going Romance and Beyond: Fifth
Symposium on Comparative Grammar. Utrecht: OTS.
Haider, Hubert (1986). V-second in German. In Hubert Haider and Martin Prinzhorn
(eds), Verb Second Phenomena in Germanic Languages, 49–75. Dordrecht: Foris.
Hale, John T. (2003). The information conveyed by words in sentences. Journal of
Psycholinguistic Research 32: 101–23.
Hale, Kenneth, LaVerne Jeanne, and Paul Platero (1977). Three cases of overgeneration. In Peter W. Culicover, Thomas Wasow, and Adrian Akmajian (eds), Formal Syntax, 379–416. New York: Academic Press.
Hall, Christopher J. (1988). Integrating diachronic and processing principles in
explaining the suffixing preference. In John A. Hawkins (ed.), Explaining Language
Universals, 321–49. Oxford: Blackwell.
Harris, Zellig S. (1955). From phoneme to morpheme. Language 31: 190–222.
Hasegawa, Nobuko (1981). The VP complement and control phenomena: beyond
trace theory. Linguistic Analysis 7: 85–120.
Hauser, Marc D. (2000). Wild Minds: What Animals Really Think. New York: Holt.
Hawkins, John A. (1994). A Performance Theory of Order and Constituency. Cambridge:
Cambridge University Press.
Hawkins, John A., and Gary Gilligan (1988). Prefixing and suffixing universals in
relation to basic word order. Lingua 74: 219–59.
366 references
Hay, Jennifer, Janet Pierrehumbert, and Mary E. Beckman (2003). Speech perception,
well-formedness and the statistics of the lexicon. In John Local, Richard Ogden,
and Rosalyn Temple (eds), Papers in Laboratory Phonology, vol. 6, 58–74. Cambridge: Cambridge University Press.
Hoeksema, Jack, and Richard D. Janda (1988). Implications of process morphology for
categorial grammar. In Richard T. Oehrle, Emmon Bach, and Deirdre Wheeler
(eds), Categorial Grammars and Natural Language Structures, 199–247. New York:
Academic Press.
Hoekstra, Eric (1991). Licensing conditions on phrase structure. Dissertation, University of Groningen.
Hoekstra, Teun, and René Mulder (1990). Unergatives as copula verbs: location and
existential predication. Linguistic Review 7: 1–79.
Hofmeister, Philip (2011). Representational complexity and memory retrieval in
language comprehension. Language and Cognitive Processes 26: 376–405.
Hooper, Joan, and Sandra A. Thompson (1973). On the applicability of root trans-
formations. Linguistic Inquiry 4: 465–97.
Horvath, Julia (1979). Core grammar and a stylistic rule in Hungarian syntax. NELS
9: 237–55.
Horvath, Julia (1985). Focus in the Theory of Grammar and the Syntax of Hungarian.
Dordrecht: Foris.
Jackendoff, Ray (1969). An interpretive theory of negation. Foundations of Language
5: 218–41.
Jackendoff, Ray (1972). Semantic Interpretation in Generative Grammar. Cambridge,
Mass.: MIT Press.
Jackendoff, Ray (1977). X-Bar Syntax: A Study of Phrase Structure. Cambridge, Mass.:
MIT Press.
Jackendoff, Ray (1990). Semantic Structures. Cambridge, Mass.: MIT Press.
Jackendoff, Ray (1997). The Architecture of the Language Faculty. Cambridge, Mass.:
MIT Press.
Jackendoff, Ray (2002). Foundations of Language. Oxford: Oxford University Press.
Jackendoff, Ray, and Peter W. Culicover (1972). A reconsideration of dative move-
ment. Foundations of Language 6: 197–219.
Jackendoff, Ray, and Peter W. Culicover (2003). The semantic basis of control.
Language 79: 517–56.
Jacobson, Pauline (1992). Antecedent contained deletion in a variable-free semantics.
In Chris Barker and David Dowty (eds), Proceedings of the Second Conference on
Semantics and Linguistic Theory, 193–213. Columbus: Department of Linguistics,
Ohio State University.
Jacquemin, Christian (1997). Guessing morphology from terms and corpora. Proceed-
ings of the 20th Annual International Conference on Research and Development in
Information Retrieval, 156–67.
Jaeggli, Osvaldo (1980). Remarks on to contraction. Linguistic Inquiry 11: 239–46.
Jaeggli, Osvaldo (1982). Topics in Romance Syntax. Dordrecht: Foris.
Janda, Richard D. (2011). Why morphological metathesis rules are rare: on the
possibility of historical explanation in linguistics. In Proceedings of the Annual
Meeting of the Berkeley Linguistics Society, 87–103.
Jespersen, Otto (1949). A Modern English Grammar on Historical Principles, 7: Syntax.
London: Allen & Unwin.
Johnson, E. K., and P. W. Jusczyk (2001). Word segmentation by 8-month-olds: when
speech cues count more than statistics. Journal of Memory and Language 44: 548–67.
Johnson, Kyle (1985). A case for movement. Dissertation, MIT.
Johnson, Kyle (1989). Clausal architecture and structural case. MS, University of
Wisconsin-Madison.
Kathol, Andreas, and Robert D. Levine (1992). Inversion as a linearization effect. In
Amy Schaefer (ed.), Proceedings of NELS 23, 207–21. Amherst, Mass.: GLSA.
Katz, Jerrold J., and Paul M. Postal (1964). Toward an Integrated Theory of Linguistic
Descriptions. Cambridge, Mass.: MIT Press.
Kay, Paul (2002a). An informal sketch of a formal architecture for construction
grammar. Grammars 5: 1–19.
Kay, Paul (2002b). English subjectless tagged sentences. Language 78: 453–81.
Kay, Paul, and Charles J. Fillmore (1999). Grammatical constructions and linguistic
generalizations: the What’s X doing Y? construction. Language 75: 1–33.
Kayne, Richard S. (1981a). ECP extensions. Linguistic Inquiry 12: 93–133.
Kayne, Richard S. (1981b). On certain differences between French and English. Lin-
guistic Inquiry 12: 349–71.
Kayne, Richard S. (1994). The Antisymmetry of Syntax. Cambridge, Mass.: MIT Press.
Keenan, Edward (1980). Passive is phrasal (not sentential or lexical). In Teun Hoek-
stra, Harry van der Hulst, and Michael Moortgat (eds), Lexical Grammar, 181–213.
Dordrecht: Foris.
Kehler, Andrew (2000). Coherence and the resolution of ellipsis. Linguistics and
Philosophy 23: 533–75.
Keyser, Samuel J. (1967). Machine recognition of transformational grammars of English.
DTIC document: http://oai.dtic.mil/oai/oai?verb=getRecord&metadataPrefix
=html&identifier=AD0653993.
Kirby, Simon (1994). Adaptive explanations for language universals. Sprachtypologie
und Universalienforschung 47: 186–210.
Kisseberth, Charles W. (1970). On the functional unity of phonological rules. Linguis-
tic Inquiry 1: 291–306.
Klavans, Judith L., and Philip Resnik (1996). The Balancing Act: Combining Symbolic
and Statistical Approaches to Language. Cambridge, Mass.: MIT Press.
Klein, Sharon M. (1981). Syntactic theory and the developing grammar. Dissertation,
UCLA.
Klima, Edward (1964). Negation in English. In Jerry Fodor and Jerrold J. Katz (eds),
The Structure of Language, 246–323. Englewood Cliffs, NJ: Prentice-Hall.
Klima, Edward (1970). Regulatory devices against functional ambiguity. MS, MIT.
Kluender, Robert (1998). On the distinction between strong and weak islands: a
processing perspective. In Peter W. Culicover and Louise McNally (eds), The Limits
of Syntax, 241–79. New York: Academic Press.
Kluender, Robert (2004). Are subject islands subject to a processing account? In
Benjamin Schmeiser, Vineeta Chand, Ann Kelleher, and Angelo Rodriguez (eds),
Proceedings of WCCFL 23, 101–25. Somerville, Mass.: Cascadilla Press.
Koizumi, Masatoshi (1991). Syntax of adjuncts and the phrase structure of Japanese.
Dissertation, Ohio State University.
Koopman, Hilda (1983). ECP effects in main clauses. Linguistic Inquiry 14: 346–50.
Koster, Jan (1978a). Locality Principles in Syntax. Dordrecht: Foris.
Koster, Jan (1978b). Why subject sentences don’t exist. In Samuel J. Keyser (ed.),
Recent Transformational Studies in European Languages, 53–64. Cambridge, Mass.:
MIT Press.
Koster, Jan, and Robert May (1981). On the constituency of infinitives. Language 58:
116–43.
Kuroda, S.-Y. (1968). Review of Fillmore (1965). Language 44: 374–78.
Ladd, Robert (1980). The Structure of Intonational Meaning. Bloomington: Indiana
University Press.
Laka, Itziar (1990). Negation in syntax: on the nature of functional categories and
projections. Dissertation, MIT.
Lakoff, George (1969). On derivational constraints. In Robert I. Binnick, Alice Davi-
son, Georgia Green, and Jerry L. Morgan (eds), Papers from the Fifth Regional
Meeting of the Chicago Linguistic Society, 117–39. Chicago: CLS.
Lakoff, George (1970). Linguistics and natural logic. Synthese 22: 151–271.
Lakoff, George (1971). On the Nature of Syntactic Irregularity. New York: Holt, Rine-
hart & Winston.
Lakoff, George (1972). The global nature of the Nuclear Stress Rule. Language 48:
285–303.
Lakoff, Robin (1969). A syntactic argument for negative transportation. In Robert
I. Binnick, Alice Davison, Georgia Green, and Jerry L. Morgan (eds), Papers from
the Fifth Regional Meeting of the Chicago Linguistic Society, 140–47. Chicago: CLS.
Landau, Idan (2006). Severing the distribution of PRO from case. Syntax 9: 153–70.
Lappin, Shalom (1996). The interpretation of ellipsis. In Shalom Lappin (ed.), Hand-
book of Contemporary Semantic Theory, 145–75. Oxford: Blackwell.
Lappin, Shalom, Robert D. Levine, and David Johnson (2000). The structure of
unscientific revolutions. Natural Language and Linguistic Theory 18: 665–71.
Larson, Richard (1988). On the double object construction. Linguistic Inquiry 19: 335–91.
Larson, Richard (1990). Double objects revisited: reply to Jackendoff. Linguistic
Inquiry 21: 589–632.
Lasnik, Howard (2001). When can you save a structure by destroying it? NELS
31: 301–20.
Lasnik, Howard (2002). The minimalist program in syntax. Trends in Cognitive
Sciences 6: 432–37.
Lasnik, Howard, and Mamoru Saito (1984). On the nature of proper government.
Linguistic Inquiry 15: 235–89.
Lasnik, Howard, and Mamoru Saito (1992). Move Alpha. Cambridge, Mass.: MIT
Press.
Lasnik, Howard, and Tim Stowell (1991). Weakest crossover. Linguistic Inquiry 22:
687–720.
Latané, Bibb (1996). The emergence of clustering and correlation from social inter-
actions. In R. Hegselmann and H. O. Peitgen (eds), Modelle sozialer Dynamiken:
Ordnung, Chaos und Komplexität, 79–104. Vienna: Hölder-Pichler-Tempsky.
Lebeaux, David (1988). Language acquisition and the form of the grammar. Disserta-
tion, University of Massachusetts, Amherst.
Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and
reversals. Soviet Physics Doklady 10: 707–10.
Levin, Beth, and Malka Rappaport Hovav (1995). Unaccusativity: At the Syntax–
Lexical Semantics Interface. Cambridge, Mass.: MIT Press.
Levine, Robert D. (1989). On focus inversion: syntactic valence and the role of a subcat
list. Linguistics 27: 1013–55.
Levy, Roger (2008). Expectation-based syntactic comprehension. Cognition 106: 1126–77.
Liberman, Mark (1974). On conditioning the rule of Subj–Aux inversion. In Ellen
Kaisse and Jorge Hankamer (eds), Papers from the Fifth Annual Meeting of NELS,
77–91. Amherst, Mass.
Liberman, Mark (1979). The Intonational System of English. New York: Garland.
Liberman, Mark, and Alan Prince (1977). On stress and linguistic rhythm. Linguistic
Inquiry 8: 249–336.
Manning, Christopher D., and Hinrich Schütze (1999). Foundations of Statistical
Natural Language Processing. Cambridge, Mass.: MIT Press.
Manzini, M. Rita (1983). Restructuring and reanalysis. Dissertation, MIT.
Marslen-Wilson, W. D. (1993). Issues of process and representation in lexical access. In
G. T. M. Altmann and R. Shillcock (eds), Cognitive Models of Speech Processing: The
Second Sperlonga Meeting, 187–210. Mahwah, NJ: Erlbaum.
Marslen-Wilson, W. D., and L. K. Tyler (1980). The temporal structure of spoken
language understanding. Cognition 8: 1–71.
Marslen-Wilson, W. D., and A. Welsh (1978). Processing interactions and lexical access
during word recognition in continuous speech. Cognitive Psychology 10: 29–63.
May, Robert (1985). Logical Form. Cambridge, Mass.: MIT Press.
McA’Nulty, Judith (1980). Binding without case. In John Jensen (ed.), Proceedings of
NELS 10, 315–28. Ottawa: Cahiers linguistiques d’Ottawa, University of Ottawa.
McCarthy, John J. (2000). The prosody of phase in Rotuman. Natural Language and
Linguistic Theory 18: 147–97.
McCarthy, John J. (2003). Sympathy, cumulativity, and the Duke-of-York gambit. In
Caroline Féry and Ruben van de Vijver (eds), The Optimal Syllable, 23–76. Cam-
bridge: Cambridge University Press.
Merchant, Jason (2001). The Syntax of Silence. Oxford: Oxford University Press.
Mielke, Jeff, and Elizabeth Hume (2001). Consequences of word recognition for
metathesis. In Elizabeth Hume, Norval Smith, and Jeroen van de Weijer (eds),
Surface Syllable Structure and Segment Sequencing, 135–58. Leiden: HIL.
Nerbonne, John, Wilbert Heeringa, and Peter Kleiweg (1999). Edit distance and dialect
proximity. In David Sankoff and Joseph Kruskal (eds), Time Warps, String Edits and
Macromolecules: The Theory and Practice of Sequence Comparison, v–xv. Stanford,
Calif.: CSLI.
Nettle, Daniel (1999). Using social impact theory to simulate language change. Lingua
108: 95–117.
Newman, Stanley (1946). On the stress system of English. Word 2: 171–87.
Newmeyer, Frederick J. (1998). On the supposed ‘counterfunctionality’ of universal
grammar: some evolutionary implications. In James R. Hurford, Michael Studdert-
Kennedy, and Chris Knight (eds), Approaches to the Evolution of Language, 305–19.
Cambridge: Cambridge University Press.
Newmeyer, Frederick J. (2001). Agent-assignment, tree-pruning, and Broca’s aphasia.
Behavioral and Brain Sciences 23: 44–5.
Newmeyer, Frederick J. (2002). Optimality and functionality: a critique of functionally-
based optimality-theoretic syntax. Natural Language and Linguistic Theory 21: 43–80.
Nishigauchi, Taisuke (1984). Control and the thematic domain. Language 60: 21–50.
Nooteboom, S. G. (1981). Lexical retrieval from fragments of spoken words: begin-
nings vs. endings. Journal of Phonetics 9: 401–24.
Nowak, Andrzej, Jacek Szamrej, and Bibb Latané (1990). From private attitude to
public opinion: a dynamic theory of social impact. Psychological Review 97: 362–76.
Otero, Carlos (1972). Acceptable ungrammatical sentences in Spanish. Linguistic
Inquiry 3: 233–42.
Ouhalla, Jamal (1994). Verb movement and word order in Arabic. In David Lightfoot
and Norbert Hornstein (eds), Verb Movement, 41–72. Cambridge: Cambridge
University Press.
Partee, Barbara, Alice ter Meulen, and Robert E. Wall (1990). Mathematical Methods in
Linguistics. Dordrecht: Kluwer.
Perlmutter, David M. (1983). Studies in Relational Grammar. Chicago: University of
Chicago Press.
Pesetsky, David (1979). Russian morphology and lexical theory. Dissertation, MIT.
Pesetsky, David (1982). Complementizer-trace phenomena and the nominative island
condition. Linguistic Review 1: 297–343.
Pesetsky, David (1987). Wh-in-situ: movement and unselective binding. In Eric
J. Reuland and Alice G. B. ter Meulen (eds), The Representation of (In)Definiteness,
98–129. Cambridge, Mass.: MIT Press.
Peters, Anne M., and Lise Menn (1993). False starts and filler syllables: ways to learn
grammatical morphemes. Language 69: 742–77.
Piñango, Maria Mercedes (1999). Real-time processing implications of aspectual
coercion at the syntax–semantics interface. Journal of Psycholinguistic Research 28:
395–414.
Wasow, Thomas (1980). Major and minor rules in lexical grammar. In Teun Hoekstra,
Harry van der Hulst, and Michael Moortgat (eds), Lexical Grammar, 285–312.
Dordrecht: Foris.
Wasow, Thomas (1997). Remarks on grammatical weight. Language Variation and
Change 9: 81–106.
Wexler, Kenneth, and Peter W. Culicover (1980). Formal Principles of Language
Acquisition. Cambridge, Mass.: MIT Press.
Wilkins, Wendy (1977). The variable interpretation condition. Dissertation, UCLA.
Wilkins, Wendy (1980). Adjacency and variables in syntactic transformations. Linguis-
tic Inquiry 11: 709–58.
Wilkins, Wendy (1985). On the linguistic function of thematic relations. Paper pre-
sented at Symposium on Thematic Relations, Seattle.
Wilkins, Wendy (1986). El sintagma nominal de infinitivo. Revista argentina de lingüística 2: 209–29.
Wilkins, Wendy (2005). Anatomy matters. Linguistic Review 22: 271–88.
Williams, Edwin (1977). Discourse and logical form. Linguistic Inquiry 8: 101–40.
Williams, Edwin (1980). Predication. Linguistic Inquiry 11: 203–38.
Williams, Edwin (1981). Remarks on stress and anaphora. Journal of Linguistic
Research 1: 1–16.
Yarowsky, David, and Richard Wicentowski (2000). Minimally supervised morpho-
logical analysis by multimodal alignment. In Proceedings of the 38th Annual Meeting
on Association for Computational Linguistics, 207–16.
Index
accent 118, 119; see also accent placement
accent placement 73, 74, 75, 78, 83–5, 92, 93, 104, 112, 114, 116, 117
Adger, D. 4
Adverb Effect 212, 256–68
Aissen, J. 318
Akmajian, A. 168, 187, 301
Anderson, S. 125, 126, 135
Arabic 246, 255
Arnold, J. 270
Austin, J. 34
Authier, M. 212, 213, 218, 236
Autonomous Systems view 72, 75, 83, 100, 115, 116, 117
autonomy 53, 72, 115, 116, 117, 118, 132

Bach, E. 138
Baker, M. 214, 215
Baltin, M. 155, 199, 201, 215, 250
Bare Argument Ellipsis 5–7, 15, 28
Barker, C. 256
Baroni, M. 341, 342
Barss, A. 212
Basque 337
Bavarian 223
Bayer, J. 223
Becker, T. 356
Beckman, M. 166, 277, 334
Beerman, D. 191, 271
Belletti, A. 212
Berman, A. 115, 116
Berwick, R. 321
Beukema, F. 223
Bever, T. 8, 301, 308
bias 311, 315, 316, 333, 338
Bierwisch, M. 85, 115
bijacent 128, 129, 136, 140
Binding Theory 194, 232
  Condition C 194–8
Bing, J. 76, 83, 111, 115, 118
Bolinger, D. 71, 74, 83, 89, 99, 115–17
Borer, H. 74, 230
Brame, M. 138
Bransford, J. 321
Bresnan, J. 2, 46, 115, 116, 130, 134, 138, 146, 164, 205, 225, 256, 271, 274, 318, 319
Brew, C. 334
Briscoe, E. 310, 316
Brown, R. 8, 11, 319

c-construable 71, 107, 109–14
CED 200, 264, 266, 268
Chafe, W. 114
Chierchia, G. 127
Chomsky, N. 1, 4, 28, 29, 30, 46, 58, 67, 71–3, 77, 78, 82, 85, 92, 100–2, 107, 115–17, 121, 122, 125, 137, 146, 151, 156–9, 161, 164, 173, 178, 199, 210, 211, 214, 215, 218, 239, 243, 244, 251, 261, 262, 264–7, 308, 310, 316, 331, 335
Churchward, C. 356
Cinque, G. 212, 214, 216, 221, 222, 249
cliticization 97–100
coherence 53, 68, 70
coindexing 102, 121–30, 132–42, 145, 146, 150, 152, 155, 157–60, 186, 188, 214, 221, 259, 260, 282, 283
Cole, R. 340
COMP 124, 151, 152, 160–1, 178, 185
Complement Principle 199, 203, 211
complexity
  computational/processing 309–11, 32
compositionality 2, 7, 10
conditionals (and OM-sentences) 16, 18, 22–4, 30, 32, 41, 43, 44, 52
Connine, C. 340
376 index
Haegeman, L. 213
Hagoort, P. 8, 11
Haider, H. 252
Hale, J. 305
Hale, K. 72, 83, 100, 116, 132
Halle, M. 46, 77, 85, 115–17
Harnish, M. 15
Harris, Z. 352
Hauser, M. 10
Hawkins, J. 311, 320, 322, 327, 329, 334, 336–8
Hay, J. 341, 352
Head Rule 78, 79, 81, 83, 84, 97, 98
heavy inversion (HI) 269–71, 277, 279, 280–9
Heavy NP Shift (HNPS) 180, 181, 191, 203–11, 269–72, 277, 280–5, 288
Hebrew 213
Heny, F. 168
Hoeksema, J. 356
Hoekstra, E. 217
Hoekstra, T. 217
Hofmeister, P. 324
Hooper, J. 240
Horvath, J. 90, 245, 246
Hume, E. 334, 356, 357
Hungarian 14, 246, 252, 255
Hyman, L. 71

Icelandic 285
idiom 9, 290, 325, 332
imperative 20, 23–5, 54, 59
incongruence interpretation (OM-sentences) 17, 21, 22, 30, 38, 46–9
island
  topic 213, 214, 216–22, 238, 249, 255, 257, 258, 279, 280
  LF 197
  wh- 197, 217, 263, 264, 267
Italian 3, 213, 285, 336

Jackendoff, R. 1, 5, 6, 7, 9, 10, 15, 28, 30, 32, 37, 50, 55, 56, 71, 79, 81, 116, 117, 120, 122, 123, 125–8, 134, 137, 138, 141, 151, 160, 172, 231, 280, 311, 317, 319, 323, 325
Jacobson, P. 5
Jaeggli, O. 73, 92, 101, 146
Jakimik, J. 340
Janda, R. 356
Jelinek, E. 120
Johnson, E. 352
Johnson, K. 205, 213, 239, 244, 279
Joseph, B. 334
Jusczyk, P. 252
juxtapositional interpretation (OM-sentences) 19, 20, 21, 24

Kathol, A. 276
Katz, J. 56, 101, 229, 245
Kay, P. 52, 53, 169
Kayne, R. 101, 191, 200, 261, 271, 283, 324, 328, 329, 333
Keenan, E. 138
Kehler, A. 5
Kirby, S. 316
Kisseberth, C. 60, 61, 63
Kitagawa, C. 120
Klavans, J. 10
Klein, S. 123, 151
Klima, E. 49, 50, 54, 55, 168, 213, 223, 229, 297, 301
Kluender, R. 204, 321
Korean 354
Koster, J. 121, 122, 138, 140, 150–4, 185–8, 151
Kruskal, J. 345
Kuroda, S.-Y. 298

Ladd, R. 74, 75, 83, 111, 112, 115–18
Laka, I. 213, 224, 227, 228, 233, 239, 240, 241, 244, 247
Lakoff, G. 61, 70, 115, 116
Lakoff, R. 64
Landau, I. 147
Lappin, S. 5, 317
Larson, R. 204, 206, 207