
Explaining Syntax

Previous books

Syntax
Academic Press 1976
Syntactic Nuts: Hard Cases in Syntax
Foundations of Syntax, I
Oxford University Press 1999
Parasitic Gaps
co-edited with Paul M. Postal
MIT Press 2001
with Andrzej Nowak
Dynamical Grammar
Foundations of Syntax, II
Oxford University Press 2003
with Ray Jackendoff
Simpler Syntax
Oxford University Press 2005
with Elizabeth Hume
Basics of Language for Language Learners
Ohio State University Press 2010
Explaining Syntax
Representations, Structures,
and Computation

Peter W. Culicover

Great Clarendon Street, Oxford, OX2 6DP,
United Kingdom
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide. Oxford is a registered trade mark of
Oxford University Press in the UK and in certain other countries
© Peter W. Culicover 2013
The moral rights of the author have been asserted
First Edition published in 2013
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by licence, or under terms agreed with the appropriate reprographics
rights organization. Enquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above
You must not circulate this work in any other form
and you must impose this same condition on any acquirer
Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America
British Library Cataloguing in Publication Data
Data available
ISBN 978–0–19–966023–0
Printed and bound by
CPI Group (UK) Ltd, Croydon, CR0 4YY
Contents

Preface xi
1 Prologue. The Simpler Syntax Hypothesis (2006) 1
1.1 Introduction 1
1.2 Two views on the relation between syntax and semantics 2
1.3 Mainstream syntactic structures compared with Simpler Syntax 3
1.4 Application to Bare Argument Ellipsis 5
1.5 Some other cases where Fregean compositionality does not hold 7
1.5.1 Metonymy 7
1.5.2 Sound + motion construction 7
1.5.3 Beneficiary dative construction 7
1.6 Choosing between the two approaches 8
1.7 Rules of grammar are stored pieces of structure 9
1.8 Conclusion 11

Part I. Representations
2 OM-sentences: on the derivation of sentences with systematically
unspecifiable interpretations (1972) 15
2.1 Introduction 16
2.2 On OM-sentences 16
2.2.1 The readings of OM-sentences 17
2.2.2 A possible source for and-OM-sentences 18
2.2.3 The conjunction 19
2.2.4 Or-OM-sentences 22
2.3 What can a consequential OM-sentence mean? 25
2.4 Some proposals for derivation 28
2.4.1 Can there be deletions? 28
2.4.2 Do consequential OM-sentences have if's in deep structure? 31
2.4.3 How do you derive an OM-sentence? 36
2.4.4 Comparing approaches 41
2.4.5 Sequence of tenses 42
2.4.6 The consequences for phrase structure 45
2.5 The incongruence reading of and-OM-sentences 46
2.6 Rhetorical OM-sentences and the incongruence reading 49
2.7 Summary 52

3 On the coherence of syntactic descriptions (1973) 53


3.1 Rules for tags 53
3.2 Orderings 55
3.3 Neg-contraction 57
3.4 More orderings 58
3.5 Emphatic tags 59
3.6 Some implications 60
3.7 The impossibility of collapsing tag rules 61
3.8 Similarity 63
3.9 Capturing similarity 65
3.10 Definitions 66
3.11 Coherence 68
3.12 Towards a general notion of similarity 69
4 Stress and focus in English (1983) 71
4.1 Introduction 71
4.2 Prosodic structure 74
4.2.1 The mapping 78
4.2.2 Accent placement 83
4.2.3 Stress 85
4.2.4 Wh-constructions 88
4.2.5 Cliticization 97
4.3 Assignment of focus 100
4.3.1 The formal representation of focus 101
4.3.2 Some applications of focus assignment 104
4.4 The interpretation of focus 105
4.5 Summary and review 114
5 Control, PRO, and the Projection Principle (1992) 120
5.1 Introduction 121
5.2 A theory of predication 122
5.2.1 Phrase structure and lexicon 124
5.2.2 A coindexing rule 127
5.2.3 VP predicates and control 134
5.2.4 Non-obligatory control and secondary predication 139
5.2.5 Control in Spanish 141
5.3 Arguments against syntactic PRO 146
5.3.1 Gapping (I) 147
5.3.2 Gapping (II) 147
5.3.3 Pseudo-clefts 148
5.3.4 Appositive relatives 148
5.3.5 Conjunction 149
5.3.6 Stylistic Inversion 149

5.4 Arguments of Koster and May (1981) for syntactic PRO 150
5.4.1 Wh-infinitives 151
5.4.2 Redundancy of base rules 152
5.4.3 Pseudo-clefts 153
5.4.4 Extraposition 153
5.4.5 Coordination 154
5.4.6 Construal 154
5.5 Comparison with the Projection Principle 155
5.5.1 The categorial component and the lexicon 156
5.5.2 Raising to subject 159
5.5.3 NP-trace 160
5.5.4 Acquisition 161
5.6 Conclusion 162
6 Negative curiosities (1982) 163
6.1 Introduction 164
6.2 Tags: the polarity facts 165
6.2.1 Types of tag 166
6.2.2 Syntactic analysis of tags 168
6.2.3 Determinants of tag polarity 171
6.2.4 Deriving the ambiguity 175
6.2.5 Tags and surface structure scope 177
6.3 Any 179
6.4 More curiosities 184
6.5 Conclusion 188

Part II. Structures


7 Deriving dependent right adjuncts in English (1997) 191
7.1 Introduction 191
7.2 Properties of extraposition constructions 192
7.2.1 Relative clause extraposition 192
7.2.2 Result clause extraposition 196
7.3 The Complement Principle 199
7.4 Extraposition is not rightward movement 199
7.5 Leftward movement 200
7.5.1 Stranding 200
7.5.2 Low adjunct 201
7.5.3 High specifier 202
7.6 HNPS and PTI 203
7.6.1 Properties 203
7.6.2 Leftward movement and rightmost heavy noun phrases 206
7.6.3 Phrase bounding 209
7.7 Conclusion 210

8 Topicalization, inversion, and complementizers in English (1992) 212


8.1 Introduction 213
8.2 Two landing sites 215
8.3 Additional evidence 219
8.3.1 Suspension of that-t ECP effects 219
8.3.2 Subject Aux Inversion (SAI) 223
8.3.3 Whether 228
8.3.4 Elliptical constructions 230
8.3.5 Why and how come 233
8.4 Extension to focus 239
8.4.1 Licensing subjects 239
8.4.2 Implications of internal PolP 244
8.4.3 Pol as focus in English 247
8.4.4 Comparative Germanic 252
8.5 Summary 255
9 The Adverb Effect: evidence against ECP accounts of
the that-t effect (1992) 256
9.1 The Adverb Effect 257
9.2 Other complementizers 262
9.3 Parasitic gaps 264
9.4 Summary 268
10 Stylistic Inversion in English: a reconsideration (2001) 269
10.1 Introduction 270
10.2 PP is a subject 271
10.3 Light and heavy inversion 276
10.4 Conclusion 289

Part III. Computation


11 A reconsideration of Dative Movements (1972) 295
11.1 Introduction 296
11.2 The syntax of indirect objects 296
11.3 Perceptual strategy constraints on acceptability 301
11.4 Application of perceptual strategy to dative movements 305
12 Markedness, antisymmetry, and complexity of constructions (2003) 309
12.1 Introduction 310
12.2 Change and clustering 311
12.2.1 The simulation model 311
12.2.2 Gaps 312

12.3 Markedness and computational complexity 317


12.3.1 OT 317
12.3.2 The basis for markedness 319
12.4 The computation of complexity 324
12.4.1 Distance 324
12.4.2 Stretching and twisting 329
12.5 Summary 332
13 Morphological complexity outside of universal grammar (1998) 334
13.1 Background 334
13.1.1 Types of inflectional morphology 335
13.1.2 A classical example: prefix–suffix asymmetry 336
13.2 Our approach 338
13.2.1 Complexity 339
13.2.2 Acquisition complexity: the dynamical component 339
13.3 Relevant studies in acquisition and processing 340
13.3.1 Lexical processing 340
13.3.2 External cues for morphology acquisition 340
13.3.3 Computational acquisition of paradigms 341
13.4 The complexity model 343
13.4.1 Semantic similarity 343
13.4.2 Similarity of forms 344
13.4.3 Model 0: standard Levenshtein distance 344
13.4.4 Model 1: matching strings in time 346
13.4.5 Possible further extensions 351
13.5 Conclusion 354
13.A Morphology acquisition by neural networks 354
13.B Templatic morphology, metathesis 355
13.B.1 Templatic morphology 355
13.B.2 Metathesis 356

References 358
Index 375
Preface

The articles collected here are all concerned in one way or another with a
question that has engaged me ever since I began my study of natural language
syntax: why does syntax have the properties that it has? In order to even
attempt to imbue this question with empirical content, it is essential to
determine what “syntax” is, and what its properties are. When I began the
study of syntax as a graduate student in the 1960s, I thought I understood this,
more or less, but as time has progressed, what seemed obvious or at least not
to be disputed has become much less clear to me, and much more unstable.
Some of the results of my attempts to reconstruct what “syntax” is, and what
its properties are, at least for myself (and with my collaborators), are repre-
sented in this book.
This book considers various aspects of what the proper domain of syntax is
(“Representations”), how to properly characterize the syntax of a language
(“Structures”), and reasons why some syntactic possibilities might be more
likely to be encountered than others (“Computation”). Hence the title—
Explaining Syntax: Representations, Structures, and Computation.
Collecting a representative set of articles such as this allows for some
unique opportunities. One can look back and see how far one has come in
some respects, one can look back and see how little one has changed in other
respects, and one can correct errors, omissions, and various infelicities. And,
not insignificantly, one can renew one’s acquaintance with one’s earlier avatars,
a process occasionally accompanied by recognition, amazement, or shock. It is
very gratifying to be able to do all these things here.
In looking back, I find the seeds of my most recent work, Syntactic Nuts,
Simpler Syntax (with Ray Jackendoff) and Grammar and Complexity (forth-
coming), in some of the pieces that I worked on as much as forty years ago.
For example, in “OM-sentences: on the derivation of sentences with system-
atically unspecifiable interpretations” (reprinted here as Chapter 2), I was
concerned with the fact that distributional patterns found in certain con-
structions that could be attributed to invisible syntactic structure need not
be attributed to such structure if we take into account the fact that these
constructions have interpretations that can be held responsible for the pat-
terns. By taking this position I was swimming against the mainstream of the
time, which for the most part accepted without question the rule of
thumb that if two sentences show the same distributional pattern, they have
the same syntactic structure (visible or not). After forty years, I find that I am
still swimming against the mainstream (in this regard, at least—see the
treatment of ellipsis in Simpler Syntax and more recently in Culicover and
Jackendoff, 2012), although perhaps with more company than forty years ago.
On the other hand, much has changed. Perhaps the most important change
concerns the status of linguistic unacceptability. Ray Jackendoff and
I suggested in “A reconsideration of Dative Movements” (reprinted here as
Chapter 11) that certain instances of unacceptability might be due to the way
in which interpretations of sentences are computed, and not to the grammar
per se. We wrote “The distinction between the rules of the grammar and how
the rules are used by the speaker or hearer to create or interpret sentences is
still scrupulously maintained. All that is changed is that it is no longer so
obvious what sentences are to be generated by the rules: we cannot rely
entirely on intuition to determine whether an unacceptable sentence is gram-
matical or not (using ‘grammatical’ in the technical sense ‘generated by the
grammar’).” This is a perspective that I take up and elaborate on at some
length in Grammar and Complexity.
Another theme that has occupied me for much of the past forty years has
been the proper treatment of ‘constructions’ in grammar. I explored this issue
in “On the coherence of syntactic descriptions”, where I tried to capture the
naturalness of a grammar containing a set of distinct constructions that make
use of similar or identical structures. When this paper was published in 1973, it
was still commonplace to think of grammars as consisting of constructions.
Formal syntacticians were just beginning to contemplate the idea that con-
structions are epiphenomenal reflexes of more abstract parameter settings.
This latter view had its roots in the analysis of the passive construction in
Chomsky’s “Remarks on nominalization” (Chomsky, 1972) and came to occupy
a central position in mainstream work over the next twenty years or so. But as
many of the papers included here show, I have always taken seriously the idea
that constructions are properly part of grammars, not epiphenomenal. In
Grammar and Complexity I come back to the role of constructions in defining
the formal complexity of a grammar and in accounting for language change.
In order to provide a more general overview of these various themes and to
link the pieces reproduced here to more recent developments in the field,
I include a brief article entitled “The Simpler Syntax Hypothesis”, by Ray
Jackendoff and myself as Chapter 1. For those chapters that originally lacked
abstracts I have written brief summaries that highlight their main goals,
results, and shortcomings, and link them to later work. I have taken the
opportunity in editing the articles to correct a few youthful indiscretions
and overstatements, to fix errors in trees and references, to add those that
should have been cited but were not, to omit some discussion that is particu-
larly irrelevant to contemporary concerns, and to interject a few comments
where it seems to me that some additional clarification or cross-referencing
is necessary or an observation is pertinent. These comments for the most
part take the form of lettered footnotes, which I have tried to keep to a
minimum in order to maintain the flow of the narrative; there are a few
minor comments in square brackets where a footnote would be overkill.
I have introduced or revised section headings and numbers, and made a
number of other minor alterations in order to achieve a more uniform format
for the chapters.
Yet another welcome opportunity afforded by putting together this collec-
tion is that I am able to fully acknowledge my gratitude to my collaborators
Jirka Hana, Ray Jackendoff, Bob Levine, Andrzej Nowak, Michael Rochemont
and Wendy Wilkins. I have been blessed by being in a position to work with a
number of wonderful scholars, and to accomplish with them results that
I could never have imagined achieving on my own. I am so pleased that
they have given me permission to reproduce our joint work here. While in
science it is certainly true that the destination is of critical importance, the
journey has been most extraordinary.
Each article contains an acknowledgment of the original publisher. I am
also grateful to two reviewers of this collection for Oxford University Press for
their useful feedback and suggestions, many of which I have followed up on.
1

Prologue
The Simpler Syntax Hypothesis
(2006)*

Peter W. Culicover and Ray Jackendoff

What roles do syntax and semantics have in the grammar of a language? What
are the consequences of these roles for syntactic structure, and why does
it matter? We sketch the Simpler Syntax Hypothesis, which holds that much
of the explanatory role attributed to syntax in contemporary linguistics
is properly the responsibility of semantics. This rebalancing permits broader
coverage of empirical linguistic phenomena and promises a tighter integra-
tion of linguistic theory into the cognitive scientific enterprise. We suggest
that the general perspective of the Simpler Syntax Hypothesis is well suited
to approaching language processing and language evolution, and to computa-
tional applications that draw upon linguistic insights.

1.1 Introduction
What roles do syntax and semantics have in the grammar of a language, and
what are the consequences of these roles for syntactic structure? These
questions have been central to the theory of grammar for close to 50 years.
We believe that inquiry has been dominated by one particular answer to these
questions, and that the implications have been less than salutary both for
linguistics and for the relation between linguistics and the rest of cognitive
science. We sketch here an alternative approach, Simpler Syntax (SS), which
offers improvements on both fronts and contrast it with the approach of
mainstream generative grammar (Chomsky 1965; 1981a; 1995). Our approach,
developed in three much more extensive works (Culicover 1999; Jackendoff
2002; Culicover and Jackendoff 2005), draws on insights from various

alternative theories of generative syntax (Perlmutter 1983; Pollard and Sag
1994; Van Valin and LaPolla 1997; Bresnan 2001; Goldberg 2006).

* [This chapter appeared originally in Trends in Cognitive Sciences 10: 413–18
(2006). It is reprinted here by permission of Elsevier.]

1.2 Two views on the relation between syntax and semantics
A central idealization behind mainstream generative grammar, shared by
much of formal logic and other approaches to language, is classical Fregean
compositionality (FC):
FC: “The meaning of a compound expression is a function of the meaning of its
parts and of the syntactic rules by which they are combined.” (Partee et al. 1990)
Although many linguistic phenomena are known to be problematic for
this view, it is fair to say that a strong form of FC is generally taken to be a
desideratum of syntactic theory construction.
FC appears to be violated, for example, in circumstances where certain
aspects of sentence meaning do not seem to be represented in the words or
syntactic structure of the sentence. In sentence (1), one understands Ozzie to
be not only the ‘tryer’ but also the ‘drinker’, even though the noun phrase
Ozzie is not overtly an argument of the verb drink.
(1) Ozzie tried not to drink.
The masterstroke behind mainstream generative grammar was to propose
that the missing piece of meaning is supplied by an element in a covert level of
syntactic structure (‘deep structure’ in early work, later ‘Logical Form’).
Sentence (1) has the covert form (2), in which the verb drink actually does
have a subject—PRO, an unpronounced pronoun whose antecedent is Ozzie.
(2) Ozzie tried [PRO not to drink].
Such an approach is effective—and appealing—for relatively straightforward
situations such as (1). However, we show that carrying this strategy through
systematically leads to unwelcome consequences.
Alternatives to FC are:
Autonomous Semantics/AS: Phrase and sentence meanings are composed from
the meanings of the words plus independent principles for constructing
meanings, only some of which correlate with syntactic structure.
Simpler Syntax Hypothesis/SSH: Syntactic structure is only as complex as it
needs to be to establish interpretation.
Under SSH, sentence (1) needs no hidden syntactic structure. The fact that
Ozzie is understood as the ‘drinker’ results from a principle of semantic
interpretation that assigns Ozzie this extra role. Thus, semantics can have
more elaborate structure than the syntax that expresses it.
Let us make more precise our notion of syntactic complexity. For Simpler
Syntax, the complexity of syntactic structure involves the extent to which
constituents contain subconstituents, and the extent to which there is invis-
ible structure. Thus, the structure of A in (3a) is simpler than in (3b) or (3c),
where ∅ is an invisible element. SS will choose (3b) or (3c) only if there is
empirical motivation for the more complex structure.
(3) a. [A B C D]
b. [A B [a C D]]
c. [A B [a ∅ C D]]
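The two factors just named (depth of subconstituents and presence of invisible structure) can be made concrete with a toy score over labeled bracketings like those in (3). The following sketch is purely illustrative and is not a metric proposed in the text; it writes the invisible element as ∅.

```python
# Toy sketch (not a metric from the text): score bracketed structures like
# those in (3) by the two factors Simpler Syntax treats as complexity:
# depth of embedded subconstituents and number of invisible elements.

def complexity(tree: str) -> tuple[int, int]:
    """Return (max embedding depth, count of invisible elements '∅')
    for a labeled bracketing such as '[A B [a ∅ C D]]'."""
    depth = max_depth = invisible = 0
    for tok in tree.replace("[", " [ ").replace("]", " ] ").split():
        if tok == "[":
            depth += 1
            max_depth = max(max_depth, depth)
        elif tok == "]":
            depth -= 1
        elif tok == "∅":
            invisible += 1
    return max_depth, invisible

# (3a) is simpler than (3b), which is simpler than (3c):
print(complexity("[A B C D]"))        # (1, 0)
print(complexity("[A B [a C D]]"))    # (2, 0)
print(complexity("[A B [a ∅ C D]]"))  # (2, 1)
```

On such a score, (3a) < (3b) < (3c), matching the informal ranking in the text.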
SSH allows the possibility of abstract elements in language when there
is empirical motivation for their syntactic (and psychological) reality. In
particular, it acknowledges the considerable linguistic and psycholinguistic
evidence for ‘traces’—the gaps that occur in languages such as English when
constituents appear in non-canonical position (Featherston 2001):
(4) What do you think you’re looking at ___ ?
Theories like that, I have a really hard time believing in ___.
Despite the considerable reduction of complexity under Simpler Syntax,
syntactic structure does not disappear altogether (hence the term ‘simpler
syntax’ rather than ‘simple’ or ‘no syntax’). It is not a matter of semantics that
English verbs go after the subject but Japanese verbs go at the end of the
clause—nor that English and French tensed clauses require an overt subject
but Spanish and Italian tensed clauses do not; that English has double object
constructions (give Bill the ball) but Italian, French, and Spanish do not;
that English has do-support (Did you see that?) but Italian, French, German,
and Russian do not; that Italian, French, and Spanish have object clitics
(French: Je t’aime) before the verb but English does not. It is not a matter
of semantics that some languages use case morphology or verbal agreement,
or both, to individuate arguments. That is, there remains a substantial body
of phenomena that require an account in terms of syntactic structure.

1.3 Mainstream syntactic structures compared with Simpler Syntax
The choice between the two approaches to (1) does not seem especially
consequential. However, following FC to its logical end turns out to have
radical consequences for the syntactic analysis of even the simplest sentences.
For example, Figure 1.1(a) shows the structure of the sentence Joe has put those
raw potatoes in the pot, based on the treatment in a contemporary mainstream
textbook for beginning graduate students (Adger 2003). The literature offers
many other variants of comparable complexity.

Figure 1.1. (a) A mainstream analysis of Joe has put those raw potatoes in the pot.
Elements in brackets are unpronounced copies of elements elsewhere in the tree.
(b) Simpler Syntax analysis of Joe has put those raw potatoes in the pot.
[Tree diagrams omitted.]
Figure 1.1(a) is representative of the most recent version of mainstream
theory, the Minimalist Program (Chomsky 1995; Lasnik 2002). Such a
structure typically incorporates many elements that do not correspond to
perceived form (e.g. v, n, and multiple copies of Joe, have, put, and potatoes),
as well as many constituents that are motivated largely on theoretical
grounds. Classical constituency tests, such as the ability to displace as a
unit, provide motivation only for major constituent divisions such as TP,
DP, and PP.

By contrast, in SS this sentence has the structure in Figure 1.1(b), which
contains only the classical constituent divisions and which has no hidden
elements or inaudible copies.

1.4 Application to Bare Argument Ellipsis


Differences between mainstream theory and SS emerge also in many
other cases. One compelling phenomenon is Bare Argument Ellipsis (BAE),
illustrated in B’s reply to A in example (5). (We sketch here only the highlights
of the detailed argument in Culicover and Jackendoff 2005.)
(5) A: Ozzie says that Harriet’s been drinking.
B: Yeah, scotch.
B’s reply conveys the same meaning as sentence (6), thus going beyond the
meanings of Yeah and scotch.
(6) B: Yeah, Harriet’s been drinking scotch.
If all aspects of understanding must be explicit in syntactic structure, it is
necessary to posit (i) a complete syntactic structure for B’s reply along the
lines of (6), and (ii) a syntactic or phonological process that deletes everything
but the words yeah and scotch. This deletion has to be based on syntactic
identity with the antecedent of the ellipsis—that is, the relevant portions of
A’s preceding statement.
In SS, such full syntactic structure and deletions are unnecessary.
The syntactic structure of B’s reply is just the string of two words, and its
interpretation is determined by grafting the meanings of the two words
onto an appropriate place in the meaning of A’s statement, without any
necessary syntactic support (Jacobson 1992; Lappin 1996; Stainton 1998;
Kehler 2000).
At this point, the FC and SS accounts diverge. The relation between the
elliptical utterance and its antecedent depends not on syntactic identity, but
rather on delicate factors in the semantics of the antecedent. For instance,
there is no syntactic difference among A’s utterances in (5) and (7), but the
interpretation of the antecedent is clearly different.
(7) a. A: Ozzie fantasizes that Harriet’s been drinking.
B: Yeah, scotch. [‘Ozzie fantasizes that Harriet’s been drinking scotch’, not
‘Harriet’s been drinking scotch’]
b. A: Ozzie doubts that Harriet’s been drinking.
B: Yeah, scotch. [no plausible interpretation]

An approach to ellipsis that depends only on syntactic structure cannot
capture these differences.
Moreover, in many examples of ellipsis, the putative hidden syntactic forms
either are ungrammatical (8i and 9i) or diverge wildly from the form of the
antecedent (8ii and 9ii).
(8) A: John met a guy who speaks a very unusual language.
B: Which language?
i. *Which language did John meet a guy who speaks?
ii. Which language does the guy who John met speak? (Ross 1969b;
Lasnik 2001; Merchant 2001)
(9) A: Would you like a drink?
B: Yeah, how about scotch.
i. *Yeah, how about would you like scotch.
ii. Yeah, how about you giving me scotch.
The antecedent can even extend over more than one sentence, so the ellipsis
cannot possibly be derived from a hidden syntactic clause.
(10) It seems we stood and talked like this before. We looked at each other in
the same way then. But I can’t remember where or when. (Rodgers and
Hart 1937)
This is not to say that ellipsis is a purely semantic phenomenon. It is also
constrained by the syntax and lexicon of the language, as seen in (11) and (12).
(11) A: Ozzie is flirting again.
B: With who(m)?
B′: *Who(m)?
(12) A: What are you looking for?
B: Those. [pointing to a pair of scissors]
The ellipsis in (11) must include with because flirt, in the antecedent, requires
it; this is often taken to be evidence for deletion of a syntactic copy of the
antecedent (Merchant 2001). However, the ellipsis in (12) must be plural, not
because of something in the antecedent but because the unmentioned word
scissors is plural. SSH proposes a mechanism that accounts for these cases
together (Culicover and Jackendoff 2005).
Examples (8)–(10) and (12) show that in general BAE cannot be accounted for
by deletion of syntactic structure that is identical to the antecedent. Thus, there
appears to be no reason to invoke such an account for cases such as (5) and (11)
either. Although the meanings of the words certainly contribute to the inter-
pretation of the sentence, they are combined by semantic principles that go
beyond a simple mapping determined by syntactic structure—a richer
compositionality than FC.

1.5 Some other cases where Fregean compositionality does not hold
BAE is by no means unique. We illustrate several other cases, drawn from
Culicover and Jackendoff (2005). In the following cases, as in BAE, substantive
aspects of the meaning of a phrase or sentence cannot be identified with the
meaning of any individual word or constituent.

1.5.1 Metonymy
An individual can be identified by reference to an associated characteristic, as
when a waitperson says to a colleague,
(13) The ham sandwich over there wants more coffee.
The intended meaning is ‘the person who ordered/is eating a ham sandwich’. FC
requires the syntax to contain the italicized material at some hidden syntactic
level. Another example is (14), in which the interpretation of Chomsky is
clearly ‘a/the book by Chomsky’.
(14) Chomsky is next to Plato up there on the top shelf.
Simpler Syntax says that the italicized parts of the interpretation are supplied
by semantic/pragmatic principles, and the syntax has no role.

1.5.2 Sound + motion construction


(15) The trolley rattled around the corner.
The meaning of (15) is roughly ‘The trolley went around the corner, rattling’.
Rattle is a verb of sound emission, not a verb that expresses motion. Hence,
no word in the sentence can serve as source for the understood sense of the
trolley’s motion. FC requires a hidden verb go in the syntax; SS says this sense
is supplied by a conventionalized principle of interpretation in English that is
specific to the combination of sound emission verbs with path expressions
such as around the corner (Levin and Rappaport Hovav 1995; Goldberg and
Jackendoff 2004).

1.5.3 Beneficiary dative construction


In a double object construction such as build Mary a house (paraphrasing
build a house for Mary), the indirect object (Mary) is understood as coming
into possession of the direct object (a house). The possession component of
meaning does not reside in the meaning of build, Mary, or house, but in the
construction itself. FC requires an explicit but hidden representation of
possession in syntactic structure; SS supplies this sense as a piece of meaning
associated with the double object construction as a whole (Goldberg 1995).
These cases are a small sample of the many well-studied phenomena in
which FC requires hidden elements in syntactic structure, motivated only by
the need for syntax to express full meaning explicitly.
We thus face a choice between two approaches: one in which semantics and
syntax are closely matched but syntactic structure is elaborate and abstract,
and one in which syntactic structure is relatively simple and concrete but
there is considerable mismatch between semantics and syntax. How does one
decide between the two?

1.6 Choosing between the two approaches


We have seen that SSH offers a more general account of empirical linguistic
phenomena such as BAE. Therefore, it should be preferred on grounds
internal to linguistics. However, there are also two reasons why Simpler
Syntax is preferable within the broader cognitive scientific enterprise.
The first reason is that SS enables closer ties between linguistic theory and
experimental research on language processing. Virtually all research on lan-
guage perception and production from the earliest days (Fodor et al. 1974) to
contemporary work (Brown and Hagoort 1999) presumes syntactic structures
along the lines of Figure 1.1(b). We know of no psycholinguistic research that
strongly supports the invisible copies, the empty heads, and the elaborated
branching structure of structures such as Figure 1.1(a) (but see Bever and
McElree 1988; Bever and Townsend 2001; Friedmann and Shapiro 2003;
Grodzinsky 2000 for experimental evidence for invisible copies in certain
constructions). Tests of processing or memory load involving reaction time,
eye movements, and event-related potentials appear to be sensitive to relative
complexity in structures of the SS sort. We know of no convincing predictions
based on structures such as Figure 1.1(a) that bear on processing complexity.
Mainstream generative grammar has tended to distance itself from pro-
cessing considerations by appealing to the theoretical distinction between
competence—the ‘knowledge of language’—and performance—how know-
ledge is put to use in processing. According to this stance, psycholinguistics
need not bear directly on the adequacy of syntactic analyses. In SS, by
contrast, rules of grammar are taken to be pieces of structure stored in
memory, which can be assembled online into larger structures. In the next
section we sketch some of the motivation behind this construal of
grammatical rules. Thus, Simpler Syntax suggests a transparent relation
between knowledge of language and use of this knowledge, one that has
begun to have a role in experimental studies of online processing and of
aphasia (Piñango 1999; 2000).

1.7 Rules of grammar are stored pieces of structure


Like every other theory of language, Simpler Syntax treats words as stored
associations of pieces of phonological, syntactic, and semantic structure.
However, unlike approaches that assume FC, where only individual words
contribute to the construction of a meaning, SS enables storage of more
complex structures with associated meanings. For instance, an idiom such
as kick the bucket can be stored as an entire verb phrase, associated in memory
with its idiosyncratic meaning, ‘die’. All languages contain thousands of such
complex stored units. Among the idioms are some with idiosyncratic syntac-
tic structure as well as idiosyncratic meaning, for example (16) (Culicover
1999):
(16) Far be it from NP to VP. Far be it from me to disagree with you.
PP with NP! Off with his head! Into the house with you!
How about X? How about a scotch? How about we talk?
NP and S. One more beer and I’m leaving. [Culicover 1970]
The more S. The more I read, the less I understand. [Culicover and
Jackendoff 2005; den Dikken 2005]
These reside in the lexicon as associations of meanings with non-canonical
syntactic structure. Other idioms, including the sound + motion construction
(§1.5.2) and the beneficiary dative (§1.5.3), attach idiosyncratic meaning to a
standard syntactic structure, but do not involve particular words.
Once pieces of syntactic structure can be stored in the lexicon associated
with meanings, it is a simple step to store pieces of syntactic structure that
have no inherent meaning beyond Fregean composition, such as (17).

(17)        VP

          V     NP

This piece of structure is equivalent to a traditional phrase structure rule
VP ➝ V NP. Thus, it is possible to think of the lexicon as containing all the
rules that permit syntactic combinatoriality. These are put to use directly in
processing, as pieces available for constructing trees.
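The conception of the lexicon sketched here — word entries, meaningful stored idioms like *kick the bucket*, and meaningless treelets like (17) stored side by side — can be given a toy illustration. The following Python sketch is purely illustrative (the class, names, and representations are my own simplifications, not an implementation of Simpler Syntax):

```python
# A minimal sketch (not from the text) of a lexicon that stores pieces of
# structure: words, an idiom with an idiosyncratic meaning, and a meaningless
# treelet that plays the role of the phrase structure rule VP -> V NP.

from dataclasses import dataclass

@dataclass
class Entry:
    """A lexical entry: a piece of syntactic structure plus, optionally,
    phonology and meaning."""
    syntax: tuple          # e.g. ("VP", "V", "NP") is the treelet VP -> V NP
    phonology: str = ""    # empty for a pure structural treelet
    meaning: str = ""      # empty when interpretation is purely compositional

lexicon = [
    Entry(syntax=("V",), phonology="devour", meaning="DEVOUR"),
    Entry(syntax=("NP",), phonology="the cake", meaning="CAKE"),
    # an idiom stored as an entire VP with a non-compositional meaning
    Entry(syntax=("VP", "V:kick", "NP:the bucket"),
          phonology="kick the bucket", meaning="DIE"),
    # a treelet with no inherent meaning: the analogue of rule (17)
    Entry(syntax=("VP", "V", "NP")),
]

def combine(treelet, verb, obj):
    """Assemble a VP online from the stored treelet and two word entries."""
    assert treelet.syntax == ("VP", "V", "NP") and not treelet.meaning
    return Entry(syntax=("VP", verb.syntax[0], obj.syntax[0]),
                 phonology=f"{verb.phonology} {obj.phonology}",
                 meaning=f"{verb.meaning}({obj.meaning})")  # Fregean composition

vp = combine(lexicon[3], lexicon[0], lexicon[1])
print(vp.phonology)  # devour the cake
print(vp.meaning)    # DEVOUR(CAKE)
```

The point of the sketch is only the continuity: the idiom and the treelet are the same kind of object, differing in whether phonology and meaning are filled in.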
Simpler Syntax shares this continuity between idiosyncratic words
and general rules with several related frameworks, most notably Head-Driven
Phrase Structure Grammar (Pollard and Sag 1994) and Construction Grammar
(Goldberg 2006).
Along related lines, a major objective of computational linguistics is to
assign meanings to strings of words on the basis of some syntactic analysis;
many approaches (e.g. Klavans and Resnik 1996; Manning and Schütze 1999)
combine symbolic and statistical methods to identify the syntactic structure
associated with a string. The syntactic theory most widely used in computa-
tional linguistics is Head-Driven Phrase Structure Grammar (Pollard and Sag
1994), one of the frameworks that adopt some version of SSH. Again, we think
that the reason for this choice is that SSH is sufficient for establishing
interpretation, and more elaborate structure is unnecessary.
There is a second, deeper reason why SSH should be of interest to cognitive
science as a whole. Recall that mainstream generative grammar is based on the
assumption of Fregean compositionality. FC implies that sentence meaning
has no combinatorial structure that is not derived from the syntactic structure
that expresses it.
Now, intuitively, the meaning of a sentence is the thought that the sentence
expresses. Thus, Fregean compositionality suggests that without language
there is no combinatorial thought—a position reminiscent of Descartes.
Such a conclusion flies in the face of overwhelming evidence from compara-
tive ethology that the behavior of many animals must be governed by com-
binatorial computation. Such computation is arguably involved, for instance,
in comprehending complex visual fields, planning of action, and understand-
ing social environments, capacities present in primates as well as many other
species (Gallistel 1990; Hauser 2000). Given its focus on syntax, mainstream
generative grammar has not taken the apparent conflict between these two
conclusions as a central concern.
Simpler Syntax, by contrast, regards linguistic meaning as largely coexten-
sive with thought; it is the product of an autonomous combinatorial capacity,
independent of and richer than syntax. This allows the possibility that
thought is highly structured in our non-linguistic relatives—they just cannot
express it. Combinatorial thought could well have served as a crucial pre-
adaptation for the evolution of combinatorial expression, i.e. human lan-
guage (Jackendoff 2002; Newmeyer 1998; Wilkins 2005).
Some components of meaning, particularly argument structure, are
encoded fairly systematically in syntax. Others, such as modality, aspect,
quantifier scope, and discourse status, receive relatively inconsistent syntactic
encoding within and across languages. On this view, language is an imperfect
but still powerful means of communicating thought.

1.8 Conclusion
The choice between mainstream syntax and Simpler Syntax is important at
three levels.
• First, Simpler Syntax affords broader empirical coverage of grammatical
phenomena.
• Second, Simpler Syntax enables a stronger link between linguistic theory
and experimental and computational accounts of language processing.
Changing the balance between syntax and semantics along the lines
proposed by Simpler Syntax might contribute to resolving longstanding
disputes about their relative roles in language processing (Brown and
Hagoort 1999).
• Third, Simpler Syntax claims that the foundation of natural language
semantics is combinatorial thought, a capacity shared with other
primates. It thus offers a vision of the place of language in human
cognition that we, at least, find attractive.
PART I

Representations
2

OM-sentences
On the derivation of sentences with systematically
unspecifiable interpretations
(1972)*

Remarks on Chapter 2
This chapter explores the form and interpretation of ‘OM-sentences’ such as
One more can of beer and I’m leaving. I originally observed in a short squib
(Culicover 1970) that, strikingly, the connectivity between the ‘one more’ phrase
and the conjoined clause is the same as that found in full sentences. Following
the standard mode of argumentation in syntax launched in the 1960s (and still
actively employed to this day), we might then conclude that we get the same
patterns in both cases because the ‘one more’ phrase is the elliptical form of a
full sentence. I argue that this conclusion is wrong; rather, OM-sentences are
instances of a particular construction whose interpretation is constrained by the
form, but not fully specified by the form. It follows that the connectivity must
be mediated by the semantics and pragmatics. Essentially the same arguments
are made in my later work with Jackendoff on related phenomena, e.g. pseudo-
imperatives such as Don’t move or I’ll shoot and Bare Argument Ellipsis (see
Culicover and Jackendoff 2005, and Chapter 1).
The force of this argument goes directly to the question of whether there is
invisible syntactic structure in elliptical constructions. The standard view in
mainstream generative grammar, represented most prominently in current
work by Merchant (2001), is that there is. But the evidence brought forth
in this article and elsewhere (see Chapter 1 and references there) is that the
invisible-structure position can be maintained only if we admit only the most
manageable subset of data in our inquiry. The full range of phenomena
suggests that the interpretation of elliptical constructions cannot in general
simply be read off of invisible structure under conditions of syntactic identity
with an antecedent. Rather, it is computed by rules of interpretation and
inference, operating over the interpretation of fragments in relation to
antecedent syntactic structure and discourse structure.

* [This chapter appeared originally in Foundations of Language 8: 199–236 (1972). It is
reprinted here by permission of the copyright holder, John Benjamins. I dedicate this chapter
to the memory of Mike Harnish.]

2.1 Introduction
This paper deals with the treatment in a transformationala grammar of
sentences like the following:
(1) One more can of beer and I’m leaving.
It will be shown in subsequent discussion that such sentences admit of three
‘interpretations’, which are very closely related to more commonly encoun-
tered constructions, including conditionals, but that nevertheless there are
aspects of the interpretation of such sentences which are systematically un-
specifiable. I will argue that these sentences should not be derived from more
complex underlying structures, but that they are in fact underlain by struc-
tures characterizable by phrase structure rule (2).
(2) S ➝ NP CONJ S
To complete the analysis, I will show how rules of semantic interpretation may
be devised which capture the similarities between sentences like (1) and other
constructions in a very natural way.
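The division of labor just described — the bare phrase structure rule (2) paired with interpretive rules that constrain but do not fully specify the reading — can be given a toy illustration. The following sketch is purely illustrative (the function names and the string-based "readings" are my own, not part of the analysis; the readings themselves anticipate §2.2.1):

```python
# Sketch of rule (2), S -> NP CONJ S, paired with interpretive rules. The
# event involving the NP is deliberately left indeterminate, as the analysis
# requires; only the conjunction constrains the available readings.

def om_parse(sentence):
    """Split an OM-sentence into (NP, CONJ, S) per rule (2)."""
    for conj in (" and ", " or "):
        if conj in sentence:
            np, s = sentence.split(conj, 1)
            return np, conj.strip(), s
    raise ValueError("not an OM-sentence")

def om_readings(np, conj, s):
    """Return the readings licensed by the conjunction."""
    event = f"some event involving '{np}'"   # left unspecified
    if conj == "and":
        return [f"if {event} occurs, then '{s}' (consequential)",
                f"after {event} occurs, '{s}' (sequential)",
                f"in spite of '{np}', '{s}' (incongruence)"]
    else:  # or: the single, imperative-like reading
        return [f"unless {event} occurs, '{s}' (consequential only)"]

np, conj, s = om_parse("One more can of beer and I'm leaving.")
print(len(om_readings(np, conj, s)))  # 3 readings licensed by 'and'
```

Nothing in the sketch derives the NP from a fuller clause; the indeterminacy of the event is built into the interpretive rule rather than resolved by deleted structure.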

2.2 On OM-sentences
I will refer to sentences like (1) as ‘OM-sentences’. One of the more noticeable
properties of (1) is that it has an unusual surface structure, which is given
schematically in (3).
(3) NP and S
In general an OM-sentence is a sentence of the form in (3), with possible
variation in the nature of the conjunction. I will also distinguish between
different OM-sentences by the conjunction that they contain, e.g. ‘and-OM-
sentence’, ‘or-OM-sentence’, etc. The NP and the S in (3) will be referred to by
their category labels.

a. Contemporary MGG terminology has dispensed with the classical term ‘transformational’
in favor of the more generic ‘derivational’.

2.2.1 The readings of OM-sentences


An OM-sentence, such as (1), may have three different kinds of interpretation.
(4) a. If you drink one more can of beer I’m leaving.
b. After I drink one more can of beer I’m leaving.
c. In spite of the fact that there is one more can of beer here, I’m leaving.
Let us refer to the reading in (4a) as the ‘consequential’ reading, the reading in
(4b) as the ‘sequential’ reading, and the reading expressed by (4c) as the
‘incongruence’ reading. The significance of the first two terms should be clear;
the third is so called because of the sense in which the sentence describes an
unusual or unexpected event or state of affairs.1
It turns out that one’s ability to ‘get’ a particular reading for a given
sentence depends to a considerable extent on the contents of the NP and of
the S. In general, the sequential reading is easiest to get, since it is compara-
tively simple to construct a context in which the event described by the S can
chronologically follow an event involving the NP. It is somewhat more
difficult to construct a context if the further requirement is placed on the
activity described by the S that it somehow follow from the event involving
the NP.
Consider, for example, the following.

(5) The {*can of beer / Queen of England / best movie of the year /
    *day before yesterday} and I’m leaving.

The best possible reading for the acceptable cases in (5) is the incongruence
reading. A considerably less acceptable reading is the sequential reading,
which is nevertheless possible if a sufficiently plausible context can be created,
as in (6) and (7).
(6) OK, we will discuss the Queen of England, and then I’m leaving.
(7) OK, I’ll watch (what you call) the best movie of the year, and then I’m
leaving.
It will be noted that the readings for an or-OM-sentence are not the same as
those for an and-OM-sentence such as the ones just discussed. In fact, it
would appear to be the case that there is only one possible reading for an
or-OM-sentence, which in the case of (8) is represented by (9).

1. In §2.4 I discuss ways in which this phenomenon may be further delimited. A solution to
this problem is not crucial, however, to the present discussion.

(8) A thousand cans of beer or I’m leaving.


(9) If you don’t give me a thousand cans of beer I’m leaving.

2.2.2 A possible source for and-OM-sentences


It should come as no surprise that judgments concerning and-OM-sentences
with the consequential interpretation correspond precisely to judgments
about if-then sentences with the same range of auxiliaries. For example,
(10) a. One more can of beer and I leave.
     b. If you drink one more can of beer I leave.

(11) a. One more can of beer and I would have left.
     b. If you drank one more can of beer I would have left.

but

(12) a. *One more can of beer and I had left.
     b. *If you had drunk one more can of beer I had left.

(13) a. *One more can of beer and I will have been leaving.
     b. *If you had drunk one more can of beer I will have been leaving.
The acceptable pairs of sentences correspond not only in their acceptability
judgments, but also in their interpretation. For example, (11a) is interpretable
only as a counterfactual: we know that whatever the event is which involves
the NP one more can of beer, it did not take place. (10a), like (10b), is
ambiguous. The latter can be paraphrased by either of the following two
sentences.
(14) a. Whenever you drink one more can of beer I leave.
b. If you drink one more can of beer (than you have already) I will
leave.
The same information can be deduced from (10a): whatever the event involv-
ing the NP is, either (a) I always leave when it happens, or (b) I’m going to
leave if it happens now.
While these observations might seem to be more than abundantly obvious,
it is quite important, I think, to establish clearly how strict the correlation
between conditionals and consequentials is. While it appears to be unavoid-
able that and-OM-sentences and if-then conditionals should be derived from
the same source, considering evidence such as the preceding, nevertheless I do
not believe that the precise nature of the relationship between them is as clear
as it might seem on the surface. I will show in the course of this paper that it is
inappropriate to analyze this relationship in transformational terms.

2.2.3 The conjunction


The evidence of the preceding sections indicated that the conjunction and
may be associated with at least three interpretations, while the conjunction or
may be associated with only one. We might go so far as to suggest that the
interpretation of these sentences is centered around the conjunction, either
through interpretive rules or transformations which map certain structures
into and and or. The reason for this is that if the conjunction and were not
involved in determining the possible readings of and-OM-sentences, it would
be surprising that sentences with or did not also display the same range of
variation in their interpretation, since both are coordinating. If the conjunc-
tion did not determine the meaning, or if the underlying structure did not
determine the conjunction, then it would not make any difference what the
conjunction was, assuming that the deep structures were otherwise the same.2
Furthermore, it can be shown on independent grounds that and may
occur with this range of readings, while or may not. I think that a quite
plausible argument can be made for considering and itself to be the source of
the three readings, and not some deeper structure, although no doubt an
analysis which postulates a deeper structure than the one I propose can be
made to work reasonably well, as far as a mere description of the data goes.b
What I would like to show now is that at the level of sentential coordination
the conjunction and may participate in the assignment of one of at least three
readings. I will call these readings ‘consequential’, ‘sequential’, and ‘juxtapos-
itional’, to express a partial similarity with previously discussed interpret-
ations with respect to OM-sentences.
(15) John came in and Bill jumped out the window.
The consequential reading of (15) may be given as a paraphrase in (16).
(16) Bill jumped out of the window because John came in.
The sequential reading is illustrated in (17).
(17) John came in and then Bill jumped out the window.
The juxtapositional reading may be paraphrased by (18).

2. It might be argued that the deep structures of sentences with or are significantly different
from those with and. If this were true then it would not be possible to appeal to similarity of
structure up to the nature of the conjunction. I see no evidence to suggest, however, that
sentences with and and with or are not all derived from deep structures displaying coordinate
structure.
b. I make much the same argument for not deriving idiosyncratic constructions (‘syntactic
nuts’) from abstract syntactic structures in Culicover (1999) and Culicover (2013).

(18) Two things happened which were not necessarily related: John came in
and Bill jumped out of the window.
Perhaps a better example of the juxtapositional reading, where there is no
likely confusion between it and the other two, is the following.
(19) Last year it rained one foot and it snowed three feet.
The three readings of (15) may be summarized by (20).
(20) John came in and {therefore / then / also} Bill jumped out the window.
I expect that there will be no doubt that (15) may have these readings. What
is more interesting is that two of these three readings correspond to readings
which we established for the and-OM-sentences, while the third is closely
related to one of them. Compare (4) and (20), for example.
Another case for which the same three readings which are illustrated in (20)
are possible is the following.
(21) Sit down in that chair and I’ll bake you a dumpling.
The consequential reading of this sentence is paraphrased by (22).
(22) If you sit down in that chair I’ll bake you a dumpling.
The sequential reading does not involve any causal relationship between the
request and the activity.
(23) Sit down in that chair, and (then (while you are sitting)) I’ll bake you a
dumpling.
The juxtapositional reading is difficult to get for this sentence: it is most
closely given by reversing the order of the conjuncts in (21).
(24) I’ll bake you a dumpling, and sit down in that chair.
In general it sounds strange to conjoin an imperative with a declarative,
particularly if there is no particular connection between the two, aside from
their being uttered in the same sentence. However, examples are of varying
acceptability depending on the context in which they are or may be used. E.g.,
(25) Albert is coming for dinner, and don’t forget to send out the laundry.
Therefore it is possible to say that the conjunction and in principle has
three readings.3

3. It may also be possible to find cases of constituent conjunction which have the three
readings referred to. For example,

The readings which we have been discussing seem to be due to a systematic
ambiguity of the conjunction and. Furthermore, the consequential reading
appears to be a special case of the sequential reading, occurring when a causal
relationship between the two events is possible. In the absence of evidence to
the contrary it is always possible to interpret the second event as following the
first event in time; given the appropriate context it may also be concluded that
the second follows from the first. Which readings are possible in given cases
depends, of course, on the context established by the clauses themselves.
On the basis of these general observations concerning the interpretations of
and when it conjoins sentences describing events, we can account for two of the
three readings of the and-OM-sentences. Assuming that the NP represents some
event involving it, then if the S involves an event explicitly, the entire sentence
may have either the sequential reading or the consequential reading. The
relationship between the juxtapositional reading of the full conjoined sentences
and the incongruence reading of the OM-sentence is not quite as clear, however.
Note that the incongruence reading is possible with the full conjoined
sentences also. In order for this to be the case the right-hand conjunct must
have an exaggerated stress contour.
(26) John has two cases of beer, and I’m going home.
From this we could conclude that there is a fourth reading for the conjunction
and. However, we must observe that it is through the presence of the emphatic
stress contour that the second clause is linked with the first in (26). Otherwise
there is no necessary connection between the two at all, and the juxtapos-
itional reading is possible. So we may conclude that the juxtapositional
reading has two variants: (a) pure juxtaposition, where there is no connection
between the two clauses aside from their being uttered in the same sentence,
and (b) linked juxtaposition, or incongruence, where abnormal stress is
present, and as a consequence some notion of exceptionality is associated
with the fact of juxtaposition itself.
From all this we may say that there are at least three constraints on the
interpretation of the and-OM-sentences: (a) the NP represents an event
involving the NP, (b) the S describes an event, and (c) there is some link

(i) John burned the match and the building.


Under one reading the burning of the building is a consequence of the burning of the match.
Under another reading the burning of the building follows the burning of the match, but is not
directly related to it. Under the third reading both events have taken place, but no claim is made
as to their relative occurrence in time.
It is an open question whether (i) should be considered to be derived by conjunction
reduction from sentential coordination, or whether these readings can be directly associated
with constituent coordination.

between the two events. The word ‘link’ here is used in a rather abstract way,
meaning a temporal relationship, a cause–effect relationship, or the relation-
ship expressed by the incongruence reading, which we might refer to as a
‘mental’ relationship.

2.2.4 Or-OM-sentences
We remarked in §2.2.1 that or-OM-sentences could have only one reading. If
we consider or at the level at which we have been considering and, this fact
becomes surprising, since there are a number of logically possible interpret-
ations for sentences with or. The question is whether the set of meanings of a
sentence of the form S or S is coextensive with the set of logical equivalences of
the sentence. Consider the following example.
(27) John will close the window or Bill will freeze.
The point which I would like to make here4 is that the meaning of the
sentence is more than the logical structure of the sentence. A simple demon-
stration of this is the result of reversing the order of the clauses in (27). The
truth values remain the same, but the meanings change decidedly.
(28) Bill will freeze or John will close the window.
Another logical equivalent is (29)—
(29) If John closes the window Bill won’t freeze and if John doesn’t close the
window, Bill will freeze.
—and so is (30),
(30) If Bill freezes then John won’t close the window and if Bill doesn’t freeze
then John will close the window.
What is going on, evidently, is that the logical properties of implication are
not the same as the properties of conditionals as they are conventionally used.
It is correct to say, I think, that the meaning of or is more than its truth table
would suggest: there is some sense of relatedness between the events described
by the clauses. Furthermore, this relationship is such that the meaning of the
sentence changes when the order of clauses is reversed.
With this in mind it is easy to see why sentences like (31) and (32) mean
what they do.
(31) Stay home or Bill will leave.

4. This is certainly not the first time that this point has been made.

(32) One more can of beer or I’m going home.


Since these sentences also have the interpretation that the clause to the right is
somehow dependent on the clause or NP to the left, it is natural to attribute
this to the fact that in general this is a property of clauses conjoined by or. The
alternative, that these sentences are derived from an underlying if-then, is
difficult to justify, owing to the fact that an if-then fails to represent the
imperative nature of what is found to the left of the or. While logically the
clauses are reversible, this characteristic of the left conjunct results in a
different interpretation. If we paraphrase the above sentences by an if-then
construction we get something like the following.5
(33) If you don’t stay home Bill will leave.
(34) If you don’t give me one more beer I’m going home.
It will be recalled that the essential problem with OM-sentences is that while
the and-OM-sentences had certain characteristics of conditionals, the or-OM-
sentences did not. This was found to be surprising in view of the fact that the
conditional interpretation, of which (34) is a sample, appears to provide a
reasonable paraphrase for both types of sentence.
Now, however, if we reinterpret (34) as being not a paraphrase, but a logical
inference from an or-OM-sentence, then we will have a reasonable means of
accounting for this data.
Let us now make the following assumption: the analysis of and-OM-
sentences is such that at some level of their representation the rules which
permit the occurrence of any in conditionals will also permit the occurrence
of any in and-OM-sentences.6 That is, the acceptability of (35) below is
directly related to the acceptability of (36), just as the interpretation of (35)
is related to the interpretation of (36),
(35) Any more beer and I’m leaving.
(36) If you drink any more beer I’m leaving.

5. Notice that it is not clear how one would go about determining which if-then should be
chosen to underlie these sentences, since certainly a number of logical relationships may be said
to apply between the clauses. From (i) we may infer (ii) or (iii), for example.
(i) Give me a beer or I’ll call a cop.
(ii) If you give me a beer I won’t call a cop.
(iii) If you don’t give me a beer I’ll call a cop.
6. I have stated this assumption in the most general way possible, in order not to prejudice the
discussion by creating particular analyses at this point.

A similar relationship can be seen to hold between (37) and (36) at some level
of representation.
(37) Drink any more beer and I’m leaving.
It is immaterial for this discussion at present whether or not (35) and (37) are
derived from the same deep structure as (36). Whatever the level is at which
we wish to account for the presence of any, we are assuming that these three
sentences are identical at that level with respect to the rule in question.
If we consider now (31) and (33) we see that (33) cannot be a representation
for (31) at any level, since if it were we would expect to find the same behavior
as we do in the case of (35)–(37). We would expect that any would be
acceptable in an or-OM-sentence if (33) was a representation of (31), because
at the level of (33) there is no formal difference between it and, say, (36). In
particular, we would expect to relate (38) and (39).
(38) If you don’t drink any more beer I’m leaving.
(39) *Any more beer or I’m leaving.
On the basis of this we must conclude that (39) does not contain if or any
element which corresponds to it at the level at which the acceptability of any is
determined.
It would seem to be the case, in fact, that at this level the or-OM-sentence
shares more of the characteristics of imperatives than of conditionals. For
example, we can insert please into an or-OM-sentence or a sentence like (37),
but not into an and-OM-sentence, or an if-then conditional.
(40) One more beer, please, {or / *and} I’m leaving.

(41) Give me one more beer, please, {or / *and} I’m leaving.7
(42) *If you (don’t) give me one more beer, please, then I’m leaving.
Another interesting point is that while a conditional and an and-OM-
sentence may have truth value, an or-OM-sentence cannot. Hence it seems

7. Further evidence that sentences like (41) with or are underlying imperatives is that they can
take tags, while the sentences with and cannot.

(i) Give me some more beer, will you, {or / *and} I’m leaving.
Sentence (i) with and is acceptable if it is assigned the juxtapositional reading, but not the
consequential. Of interest in this regard is whether (ii) is acceptable.
(ii) Some more beer, will you, or I’m leaving.
I myself find (ii) to be understandable, but marginal in grammaticality. It is quite sobering to
contemplate what the consequences for the grammar of English would be if (ii) were to be
judged grammatical; however, this factor has played no role in my judgment.
unlikely that a conditional could even be an adequate paraphrase for an
or-OM-sentence, let alone underlie it.
Let us summarize what we have determined to this point. We have dem-
onstrated that the tripartite interpretation of and-OM-sentences can be
correlated with a more general tripartite interpretation of conjoined struc-
tures linked by and; hence we have concluded that the conjunction and may
be interpreted in one of three ways when it conjoins sentences expressing
events. We also demonstrated that one of these readings, the consequential
reading, possesses some of the properties of conditional if-then sentences.
Upon examining or-OM-sentences we discovered that there was only one
interpretation of sentences linked by or, and that these sentences bore several
properties of imperatives. Again it was shown that the properties of or could
be found in sentences which were more elaborate in structure than the OM-
sentence. In a nutshell, it is no accident that the OM-sentences have the
interpretations they do. What may be more surprising is that they have any
interpretations at all, as we shall see.

2.3 What can a consequential OM-sentence mean?


Let us consider now in a preliminary fashion what the range of paraphrases of
a consequential OM-sentence is. Concerning this question in Culicover
(1970), I said “Given any situation . . . , this situation can be used as a potential
condition under which the proposition [represented by the S] will be true.” As
an example I gave sentence (1), and a number of possible paraphrases.
(1) One more can of beer and I’m leaving.

(43) If {you give me / I get hit by / I see / I hear about / you buy /
     John crushes / anybody drinks / . . .} one more can of beer, I’m leaving.

(44) If one more can of beer {hits me / explodes / rolls in front of me /
     hits you / hits anyone / comes out of the darkness / . . .} I’m leaving.

In Culicover (1970) I referred to sentences like (1) as “potentially infinitely
ambiguous”. I think now that a far better description would be ‘indeterminate’
or ‘vague’. A sentence is ambiguous if it has more than one representation
at the level of semantic interpretation; it is indeterminate if certain aspects of
its interpretation are unspecified. Our problem, therefore, is not to specify
completely what the semantic representations of these sentences may be, as we
would do in the case of ambiguity, but to delineate the range of indeterminacy
of the representations.8
While we do not know what the particular event involving the NP in (1) is,
we do know (a) that it has not occurred, (b) that a number of events involving
similar NPs have occurred, and (c) that the consequence of this event will be
that the S will take place.
There are a number of other things which we can determine from (1). First
of all, whatever the event involving the NP is, it involves only this NP, and no
others. So, for example, we would not infer (45) from (1).
(45) If you drink a scotch and soda and one more can of beer I’m leaving.
Second of all, it must be an event which involves the NP, and not a state. Thus
none of the following examples would be a paraphrase of (1).
(46) If one more can of beer is warm I’m leaving.9
(47) If Mary wants one more can of beer I’m leaving.
(48) If your sketch resembles one more can of beer I’m leaving.
Third of all, the event must involve the NP intrinsically. This is a very difficult
notion to capture, but we may approach it through examples like the
following.10

8
A simple example of this, which was pointed out to me by W. C. Watt, is illustrated by the
following sentence.
(i) John was kicked in the head.
(i) does not specify who or what kicked John in the head. In order for John to have been kicked
in the head, a deep subject of kick must exist; we say that it is indeterminate. However, although
we do not know what or who did the kicking, we do know that it cannot be something which
lacks the capacity to kick. Hence the indeterminacy of the representation of the deep subject of
kick is restricted by the context.
9
But a sentence like (i)
(i) One more warm can of beer and I’m leaving.
is acceptable, since it suggests some event occurring which involves the warm can of beer.
10
The question marks before examples (49) and (50) indicate the infelicity of these sentences
as paraphrases of (1). The question marks before examples (52)–(54) below indicate their
infelicity as paraphrases of (51).
om-sentences 27

(49) ?If John tells me that Mary wants me to buy her one more can of beer,
I’m leaving.
(50) ?If the label of one more can of beer comes off, I’m leaving.
Intuitively it seems that in these examples there is no particular connection
between the can or the beer and my leaving; what is more important is John
telling me in the first case, and the label coming off in the second. Such an
intuition is much stronger when one more is not mentioned in the NP at all.
(51) Two beers and I’m leaving.
(52) ?If John tells me again that Mary wants me to pay for her two beers then
I’m leaving.
(53) ?If John begins to tell that old story about how he was so drunk that he
couldn’t even drink two beers I’m leaving.
(54) ?If a man comes in carrying two beers I’m leaving.
As a first approximation, then, we might say that the NP is understood to
be either the subject or the object of the sentences which may be used to
paraphrase the event involving the NP, which we represent as ‘E(NP)’. How-
ever, we can see immediately that this is at best a weak substitute for the
notion ‘intrinsic connection’ between the NP and E(NP). We can devise
examples in which the NP is the surface subject, and those in which it is
the deep subject, and in neither case can we conclude that such examples are
paraphrases of the corresponding OM-sentence. This indicates that what is
going on is independent of either deep or surface grammatical relations.
In the following examples, for instance, it is clear that the failure of the
(b)-sentences to be paraphrases of the (a)-sentences cannot be attributed to
the grammatical role of the NP without incorrectly denying the existence of
the paraphrase relationship in countless other cases.
(55) a. One more aging film star and I will stop reading the newspapers.
b. If one more aging film star is claimed by the gossip columnists to
have been reported by the Hollywood crowd to be dating a young
starlet I will stop reading the newspapers.
(56) a. One more beer company and I will stop watching TV.
b. If one more beer company announces that its product is the best in
America I will stop watching TV.
Presumably the relationship between the understood role of the NP and the
acceptability of the sentence is not a grammatical one, but a semantic one.
Unfortunately it is not at all clear at this point how this relationship should or
could be characterized. It is conceivable that some progress might be made by
considering the thematic relations of the NPs in the permissible paraphrases,
but at present my speculations along these lines are highly tentative and
cannot be given serious consideration here.

2.4 Some proposals for derivation


There are a number of alternative proposals which one might put forth to
account for the data in the preceding sections. Two basic issues must be
considered here: (a) Whether or not the deep structure of and-OM-sentences
contains if-then, and (b) Whether or not the attenuated surface structure of
OM-sentences is to be accounted for by one or more deletion transform-
ations. I will argue that the deep structure of and-OM-sentences does not
contain if-then, but and, and that there are no deletion transformations in
operation in the derivations of such sentences.c A question which is subsidiary
to (b) is whether, assuming that no deletion transformations are motivated,
the deep structure contains dummy nodes, and I argue that it does not. The
first two questions are discussed in §2.4.1 and §2.4.2, and the third in §2.4.3. In
§2.4.3–§2.4.5 I discuss the procedure by which we can capture the correct
generalizations about OM-sentences. In §2.4.6 I consider briefly the question
of whether our analysis has any unfortunate consequences for the base
component.

2.4.1 Can there be deletions?


The hypothesis that OM-sentences are derived from fully specified deep
structures can be made concrete in the form of the following skeletal trans-
formation, in which certain details are left unspecified.
(57) TDEL:
[S X NP Y] . . . S Z
1 2 3 4 5 ) 2 ... 4 5
What TDEL says is that in order to derive OM-sentences we must delete
everything from the antecedent sentence except one noun phrase. An objec-
tion to TDEL on metatheoretical grounds is that the deletion takes place
without any condition of identity between the deleted material and other
material in the sentence being met. Thus we see that deletion in TDEL is non-
recoverable, in the sense of Chomsky (1964; 1965).
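The sense in which TDEL fails recoverability can be sketched computationally: with no identity condition, many distinct antecedent clauses collapse onto one surface string, so the deletion cannot be undone. A toy Python sketch (the function and the sample antecedents, drawn from (43), are illustrative only):

```python
def t_del(antecedent: str, np: str, consequent: str) -> str:
    """Schematic TDEL (57): delete everything in the antecedent clause
    except the NP, with no identity condition on the deleted material."""
    assert np in antecedent          # the NP must occur in the antecedent
    return f"{np} and {consequent}"

np = "one more can of beer"
sources = [f"if you give me {np}",   # hypothetical underlying antecedents,
           f"if I see {np}",         # as in (43)
           f"if anybody drinks {np}"]
surfaces = {t_del(s, np, "I'm leaving") for s in sources}

# Every source yields the same surface form, so the mapping has no inverse:
# the deleted material is not recoverable from the output.
assert surfaces == {"one more can of beer and I'm leaving"}
```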

c
Arguments along essentially the same lines are made against deletion analyses of Bare
Argument Ellipsis and sluicing in Culicover and Jackendoff (2005; 2012).
It will be recalled that Chomsky’s condition of recoverability was motivated
to prevent the following kind of situation from arising: a deep structure
D undergoes semantic interpretation such that an interpretation I is assigned
to it. A sequence of transformations {T} applies to derive surface structure S,
which is well-formed. If {T} contains a transformation which does not meet
the recoverability condition, then there is at least one deep structure D0 with
interpretation I0 such that S will be derivable from D0 by {T} also. Assuming that
S does not represent an ambiguous sentence, permitting a transformation which
does not meet the recoverability condition will incorrectly relate I0 and S. In
theory an infinite set of interpretations {T} will be related to S for this reason.
Notice now that this last result would seem to be exactly what we wish to
achieve for and-OM-sentences. One way of representing the fact that a certain
type of sentence has an infinite number of possible consistent interpretations
is to derive it from an infinite number of deep structures. In Chomsky (1965)
this method of representing the indeterminacy of elliptical sentences is
rejected on the grounds that structural ambiguity is intuitively quite different
from indeterminacy. Because of the fact that the indeterminacy of OM-
sentences is demonstrably far more complex than that of the common type
of elliptical sentence, such as (i) in note 8, one might conceivably wish to
argue that the established criterion of recoverability of deletion may not hold
in the case of the former. While I agree that the intuitive difference is quite
clear, and that the recoverability condition is amply motivated, I think that we
should attempt to investigate whether there are objections to TDEL on other
than metatheoretical grounds.11
On the basis of our brief discussion in §2.3 we can conclude that while it
would be necessary to constrain the variables X and Y in TDEL, there is no
simple way that this can be done, assuming that it can be done at all. First of
all, neither X nor Y may contain a conjunction. The NP must be in the main
sentence, which means that X cannot contain a complementizer like that, nor
can it contain a relative marker. But as indicated by (55) and (56), even if the
NP is a surface subject it is possible that TDEL cannot apply, although as we
have noted the reason for this is not clear. Since the OM-sentence involves
some notion of semantic intrinsic connection between the antecedent and the
consequent, it should not come as a surprise that it would be impossible
to specify this condition in syntactic terms.d

11
Naturally if we can show that there are such empirical objections to TDEL, this will serve
to justify further the condition of recoverability of deletions.
d
Here I was arguing implicitly against the Generative Semantics program, which sought to
encode all aspects of meaning syntactically (in deep structure), and then derive the surface
structure through transformations.
A more pointed argument against TDEL comes from a consideration of the
data presented in §2.3. There it was observed that not all if-then sentences
could be paraphrases of consequential OM-sentences. Quite independent of
the question of whether if-then is present in the underlying structure of such
sentences, however, is the question of how we would go about constraining
TDEL just in case the tense relationship between the antecedent and the
consequent was not one of those for which we can determine a conditional
interpretation.12 I will not attempt to work through the tedious demonstra-
tion of how ad hoc such a constraint would be. To get an intuitive idea of what
it would entail, consider the kind of formal constraint that would have to be
placed on a transformation which could not apply if two of its variables
contained NPs which referred to automobile racing.
Since the sequence of tense problem does not exist for the sequential and
incongruence OM-sentences, the above is clearly not an argument against
using TDEL in these cases. I think that one can be quite comfortable with
Chomsky’s observation that indeterminacy is quite a different thing from
ambiguity and should not be handled by the same formal mechanisms. In general
it is quite difficult to argue against deletion transformations in cases where the
surface structure is paraphraseable by and is a sub-tree of the proposed deep
structure.e The argument must come from consideration of the power of the
transformational component and the adequacy of the analysis in describing
the similarities and differences between phenomena.
Thus one might argue plausibly that a consequential OM-sentence is no
more synonymous with a conditional than a sentence like (58a) is synonym-
ous with (58b), or (59a) with (59b).
(58) a. John was kicked.
b. John was kicked by a goat.
(59) a. John was eating.
b. John was eating potatoes.
The more fully specified b-sentences entail the less fully specified a-sentences,
but in no sense are the pairs above synonymous. It is more difficult to talk
about entailment in the case of consequential OM-sentences and condition-
als, but it seems that a similar more-or-less-fully-specified relationship exists.

12
This argument will assume that the auxiliaries of the antecedent and the consequent are
fully specified in deep structure. If they are specified in terms of a sequence of tenses rule then
other difficulties arise, which will be considered in §2.4.5.
e
Jackendoff and I (e.g. in Culicover and Jackendoff 2012) argue that this is one of the reasons
that deletion accounts of elliptical constructions (as in Merchant 2001) appear at first glance to
be plausible.

That is, given a situation in which the unspecified material is understood from
the context, the less fully specified sentence will serve to convey the same infor-
mation as the more fully specified sentence.
Notice, incidentally, that this suggests why numerical quantifiers, and
especially one more and another, are so natural in OM-sentences. The use of
one more presupposes a content which is completely known to the speaker
and the hearer. That is, whatever E(NP) is, it has happened before, everyone is
aware of it, and it might happen again. This is brought out clearly by
sentences in which one more does not appear, which are mere statements of
fact, and not threats.
(60) a. If you bring {the / more} beer, I’ll bring the wine.
     b. {*The / *More} beer and I’ll bring the wine.
When the is used there is the implication that no beer has yet been brought.
When more is used there is the implication that some beer has already been
brought, but the consequence I’ll bring the wine, if it does not suggest a threat,
does not involve the implication that the consequence will follow from the
bringing of the beer.
It becomes clear now that the role of the determiner of the NP is precisely
to single out the next occurrence of the event as the deciding factor in the
cause-and-effect relationship, by contrasting it with all the other previous
events of a similar nature, which by implication are characterized by the fact
that they did not cause the consequence to take place.
I think it fair to say that such a criterion as whether the consequence
follows from the antecedent in the way that I have described it here has no
business as a constraint on a transformation, which would be necessary if we
wanted to maintain the derivation of these sentences by TDEL.

2.4.2 Do consequential OM-sentences have if’s in deep structure?


Having concluded that there is not a fully specified sentence underlying the
NP in an OM-sentence, we can reduce the question of whether if and then
underlie consequential OM-sentences to the question of whether the deep
structure of such sentences is (61).
(61) If NP then S
First of all, we must take note of the fact that positing such a deep structure
would be ad hoc, since there is no surface structure which is different from the
surface structure of an OM-sentence yet which must be derived from the deep
structure in (61). This case may be contrasted with any of the numerous cases
for which there is syntactic evidence that a transformation is required to relate
two or more sets of distinct surface structures. In the case of the passive, for
example, it is possible to derive a well-formed surface structure from the
structure underlying passive sentences whether or not we apply the passive
transformation. In contrast, it is absolutely necessary to derive an OM-sentence
from (61) since failure to do so would result in an ill-formed surface structure.13
Second of all, we must consider whether it is necessarily true that two
constructions that are as similar in interpretation as consequential OM-
sentences and if-then conditionals must be substantially identical in deep
structure. Sentences (62)–(65) show that there are at least four constructions
which bear this semantic similarity.
(62) If you take one more step you’ll go over the edge.
(63) One more step and you’ll go over the edge.
(64) Take one more step and you’ll go over the edge.
(65) Anyone who takes one more step will go over the edge.
By the same argument that an OM-sentence, such as (63), must have if and
then in deep structure, we would be led to conclude that the pseudo-impera-
tive (64) and the relative clause (65) are also derived from deep structures
containing if and then. While this is certainly a mechanically workable
analysis, it is syntactically unmotivated, since the phrase structure rules and
transformations used in deriving the structure of a pseudo-imperative and of a
relative clause are independently needed in the grammar of English. That is to
say, we need rules for deriving these structures when they are not interpreted
as conditionals; therefore, there is no particular reason for deriving the same
structures by a completely different set of rules when they are subject to a
different interpretation. In (66) and (67) I give examples of these structures
that do not have a conditional interpretation.
(66) Wash the dishes, and I want you to take out the garbage too.
(67) The man who visited yesterday was none other than the Pope.

13
I do not mean to imply here that all obligatory transformations are ad hoc. I am merely
claiming that in the absence of clear, syntactic motivation any such transformation is ad hoc. The
question which arises here, therefore, is whether there is any syntactic evidence for deriving
OM-sentences from a deep structure such as (61). [NOTE: Subsequent developments in
monostratal approaches to syntax such as HPSG and LFG, as well as Simpler Syntax, have
gone all the way with this argument and rule out not only obligatory transformations, but all
transformations, based largely on the absence of clear, syntactic motivation. Crucially, system-
atic synonymy such as is found in the passive is not a sufficient criterion for assigning the same
underlying structure to different constructions—see Culicover and Jackendoff (2005: chs 1–3).]

Because of such cases it appears reasonable to argue that the syntactic rules
capture generalizations about the set of possible well-formed surface struc-
tures of language. From this it follows that any principle that requires us to
represent the interpretation explicitly at the deep structure level will result in
the loss of certain generalizations about what the set of well-formed surface
structures consists of. Hence we may conclude that while consequential OM-
sentences may be derived from an underlying if-then, they need not in
principle be so derived.
A third point we must consider, then, is whether the derivation of conse-
quential OM-sentences from underlying if-then can be carried out in a
manner which does not do violence to our previously accepted notions of
what may constitute a possible derivation. It can be shown, in fact, that a
transformation could only derive an OM-sentence from a deep structure like
(61) if that deep structure met certain semantic conditions. To constrain a
transformation just in case the deep structure has a certain interpretation
would be a rather unprecedented step for us to take, particularly if less radical
alternatives are available. Let us consider what well-formedness conditions
seem to be necessary.
In previous discussion I have spoken of “the event described by S.” This
choice of words was motivated by a reluctance to introduce complications in
terminology which could only be resolved at a much later point. I will now
show that it is not strictly the case that the S must describe an event. Let us
return first of all to sentence (1), repeated below for convenience.
(1) One more can of beer and I’m leaving.
In (68) below I summarize the potential readings of this construction,
according to the observation made previously.
(68) a. If . . . NP. . . , then S.
     b. After . . . NP. . . , then S.
     c. . . . NP. . . , {and / but} (surprisingly) S!
The question that we will concern ourselves with now is whether the S can
describe a state in any of these interpretations. It is clear that we can have a
sentence which describes a state that comes about as the result of an event,
such as the state of knowing something.
(69) One more can of beer and Bill will know the truth about you.14

14
One might say in this context that know may be used as a metaphor for learn. However one
wishes to interpret (69), it is no accident that the relationship expressed here exists between
these two verbs, and not learn and some other verb not related.
However, it is not immediately obvious that the (a)-interpretation and the
(b)-interpretation are distinguishable in all cases. (69) may be paraphrased by
(70), omitting the (c)-interpretation.
(70) a. After you drink one more can of beer Bill will know the truth about you.
b. If you drink one more can of beer Bill will know the truth about you.
What appears to be crucial in determining whether or not there will be an
independent, distinct (a)-interpretation is not whether or not there is a state
involved, but whether or not anyone has control over the state or event
described by the S. In the case of statives like know the subject of the S has
no control of the situation: he cannot deliberately ‘refuse to know’ or ‘decide
to know’. As a consequence it is unlikely that the state of knowing will occur
after the drinking of the can of beer, but not because of it.
Similarly, if the S describes an event involving any non-agent subjects, then
the same falling together of the (a)- and (b)-interpretations is likely to arise.
(71) One more can of beer and this table will collapse.
(72) a. After you put one more can of beer on this table it will collapse (?but
not because of it).
b. If you put one more can of beer on this table it will collapse.
There is a difference between states and agentless events, however, which is
observable in case there is no plausible real-world relationship between the NP
and the S. In such a case the sequential interpretation applies to the agentless
event, but not to the state, and the consequential interpretation applies to either.
(73) One more can of beer and Old Faithful will {erupt / be late}.
(74) a. After you drink one more can of beer Old Faithful will {erupt / be late}.
     b. If you drink one more can of beer Old Faithful will {?erupt / ?be late}.
From this we may conclude a number of things about the felicity of the
consequential and sequential interpretations of and-OM-sentences. We may
profitably talk about these interpretations in terms of ‘felicity conditions’, that
is, conditions of the real world which must be met in order for the sentences
to be felicitous, or alternatively, conclusions about the real world which may
be drawn on the assumption that the sentences are felicitous.15 Certain felicity

15
For an introduction to the notions of ‘felicity’, ‘infelicity’, and ‘felicity conditions’, see
Austin (1962).
conditions cannot be met by particular sentences, and in these cases certain
interpretations are inadmissible.
In order for the consequential interpretation to be acceptable the following
condition must be met:
(75) The event or the state described by S (henceforth ‘E(S)’) must be able
to follow from the event or the state described by the NP (henceforth
‘E(NP)’).
In order for there to be an independent acceptable sequential interpretation
the condition in (76) must hold.
(76) E(S) must be able to follow E(NP) in time without following from it.
It is impossible for a consequential interpretation to exist where a sequential
interpretation cannot, because generally an event cannot follow from another
event without following it as well. There may be mediating circumstances,
however, which make it possible for us to understand a conditional sentence
in which the consequence does not follow from the antecedent although it
follows it.
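Conditions (75) and (76) can be stated as predicates over event descriptions, with the real-world knowledge supplied from outside the grammar. The sketch below is an illustration, not the author's formalism; the world-knowledge predicates are stipulated for the beer/table and Old Faithful cases:

```python
def consequential_ok(e_np, e_s, can_follow_from):
    """(75): E(S) must be able to follow FROM E(NP)."""
    return can_follow_from(e_np, e_s)

def sequential_ok(e_np, e_s, can_follow_in_time):
    """(76): E(S) must be able to follow E(NP) in time
    without following from it."""
    return can_follow_in_time(e_np, e_s)

# Stipulated world knowledge: putting a can on the table can cause the
# collapse, as in (71)-(72); drinking beer cannot cause Old Faithful to
# erupt, though the eruption can follow the drinking in time.
def can_follow_from(e1, e2):
    return (e1, e2) == ("put one more can of beer on the table",
                        "the table collapses")

def can_follow_in_time(e1, e2):
    return True
```

On this stipulation the consequential reading of (71) satisfies (75), while the Old Faithful case fails it, which is one way to model the ‘?’ judgments on the conditional paraphrases in (74b).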
In order to determine whether a given if-then construction could be
transformed into an OM-sentence, we would first have to determine whether
or not Condition (75) can be met. We cannot simply say that every deep
structure of the form in (61) will automatically satisfy this condition,
since there are if-then constructions which do not have corresponding OM-
sentences, because they lack conditional interpretations. For example,
(77) a. If Mary wants one more can of beer, then rob a bar.
b. If Mary drank one more can of beer, why didn’t Bill tell his mother?
c. If Mary drank one more can of beer, then Bill is a poor judge of
character.
(78) a. *One more can of beer and rob a bar.
b. *One more can of beer, and why didn’t Bill tell his mother?
c. *One more can of beer and Bill is a poor judge of character.
It can be seen from these examples that there are if-then sentences which have
no OM-sentence counterparts. It is clear that the reason for this is that in both
(77) and (78), there is no E(S) such that E(S) follows from E(NP), as Condi-
tion (75) requires. According to the analysis which we are now considering,
(78a) would have to be derived from (79).
(79) If one more can of beer then rob a bar.

Since (78a) is ungrammatical, this would suggest that (75) should function as
a well-formedness condition on deep structures. Such a suggestion is difficult
to accept, however, since deep structures are formal syntactic objects and (75)
is stated in semantic terms. Notice, more importantly, that it is not sufficient
to restate (75) in terms of syntactic structure, since there is no formulation
which would rule out the deep structure corresponding to (78c), but not
the deep structures underlying the well-formed OM-sentences which we
have been considering. We may conclude, therefore, that consequential
OM-sentences are not derived from deep structures containing if and then.

2.4.3 How do you derive an OM-sentence?


We arrive finally at the question of how we should derive OM-sentences, now
that all of the ‘reasonable’ possibilities have been eliminated. All that is left to
us are several unreasonable possibilities. By elimination the deep structure of
an OM-sentence must be (80).
(80) NP and S
There are still two possibilities open to us here, nevertheless. First of all, (80)
might be exactly the deep structure. Alternatively, the deep structure might be
(80) plus a number of dummies hovering around the NP, so that the actual
deep structure of one more can of beer and I’m leaving would be something like
the following.

(81) [S [S [NP Δ] [VP [V Δ] [NP one more can of beer]]] and [S I’m leaving]]


Of course, (82) could also be a deep structure, as could (83).

(82) [S [S [NP one more can of beer] [VP Δ]] and [S I’m leaving]]


om-sentences 37

(83) [S [S [NP [S [NP Δ] [VP Δ]]] [VP [V Δ] [S [NP Δ] [VP [V Δ]
     [NP one more can of beer]]]]] and [S I’m leaving]]


In general, the surface structure of an and-OM-sentence could be generated
by an infinite number of deep structures, all of which contained one more can
of beer, and, I’m leaving, and a multitude of dummies. The question, then, is
what do we do with all these dummies?
According to accepted theory a sentence is unacceptable if it contains a
dummy which lacks an interpretation.16 So, for example, if the complement
subject after believe is a dummy, then there is no rule for interpreting it, and
the sentence is judged unacceptable.
(84) *Bill believed Δ to be a fink.
For the sake of argument, then, given a structure like (81) or (82) we could say
that there are rules for interpreting the dummies, but there would be no such
rules for interpreting all the dummies in (83), since as we know (cf. §2.3) the
NP cannot be understood as being a constituent of a complement sentence.
This brings us back to the problem which we have encountered elsewhere,
which in this case may be phrased as follows: is the dummy to be interpreted
as having a meaning synonymous with a constituent which dominates real
lexical items?
When we were considering the possibility of a deletion transformation, we
arrived at the conclusion that there was no meaning attached to the antece-
dent of an OM-sentence besides the meaning of the NP. There is, therefore, no
particular reason to assume that we can interpret the dummies and assign
meanings to them, by the same token. Just as we would not permit a
transformation which could delete lexical material from a sentence without
identity conditions being met, we also would not want a rule of interpretation
which would assign interpretations to dummies in the absence of identity.

16
See e.g. Jackendoff (1972: ch. 3).
The analogy of the recoverability condition on deletions is the ‘source’
condition on interpretations.
The final alternative is that the deep structure of and-OM-sentences is (80)
with no further adornments. This, however, presents its own problems. First
of all, how do we represent what we know about the interpretation of an and-
OM-sentence? Second, what would the consequences for the rest of the
grammar be of having a phrase structure rule which generated (80) as a
deep structure?
(85) S ➝ NP and S

(86) [S [NP one more can of beer] and [S I’m leaving]]


Let us begin to answer these questions by considering sentence (11a), repeated
below for convenience.
(11) a. One more can of beer and I would have left.
We assume that (11a) is uttered with normal intonation, so that it may be
distinguished from the incongruence variant. We know that this sentence has
the interpretation of a conditional. }2.3 suggests that this interpretation is not
only a property of if-then under certain circumstances, but that it is also a
property of and, since in general we find this interpretation associated with
the forms NP and S, IMP(erative) and S,17 and S and S, with some minor
variations.18 We may hypothesize, therefore, that there are rules of interpret-
ation which, given one of these structures, an if-then structure, or a relative
clause, may assign various constituents of the structures to the semantic
categories ANTE(cedent) and CONS(equent) in the representation.19

(87) X and S               ANTE: X
                           CONS: S

17
The notation ‘IMP(erative)’ refers to the form of the left-hand clause, and not to its
interpretation. For an extensive discussion of this duality, see Culicover (1971: ch. 1).
18
Generally the S and S structure does not have a conditional interpretation, but a cause-
and-effect interpretation. The former is a special case of the latter.
19
In representing the interpretive process I use a curved arrow to represent an interpretive
rule, a straight double arrow to represent a transformational mapping, and a single straight
arrow to represent a rule which operates only in the semantic component.

(88) [S1 [NP NP S2] VP]    ANTE: S2
                           CONS: S1 – S2
(89) If S1 then S2         ANTE: S1
                           CONS: S2

Given that the sentence has the conditional interpretation, represented by the
ANTE-CONS pair, we may apply the sequence of tense rule, which we may
plausibly consider to be an interpretive rule defined on ANTE-CONS inter-
pretations. If would have appears in the consequent, it signifies that CONS is
future irrealis, and it requires that past perfect irrealis appear in the
ANTE. What this means in terms of the interpretation is that the ANTE
must be unrealized and completely in the past with respect to the temporal
frame of reference defined by the consequent.

(90) a. would have → future irrealis
     b. CONS: future irrealis → ANTE: past perfect20 irrealis
     c. had → past perfect irrealis
Rules (90a) and (90c) informally represent the semantic interpretation of
the auxiliaries would have and had, respectively. Rule (90b) represents the
sequence of tenses relation defined on a conditional whose CONS is future
irrealis, i.e. on a counterfactual conditional. It is this rule which is of most
concern to us.
Let us consider the interpretive process as it applies to a sentence like (91).
(91) *If you go I would have gone.
By rule (89) we get (92).
(92) CONS: ‘I would have gone’
ANTE: ‘you go’

20
I use the term ‘past perfect’ rather casually here. It is intended to represent the notion
‘completely in the past’. I do not wish to suggest that had is the only auxiliary which may appear
in the antecedent of a counterfactual conditional. (i) shows that this is simply false.
(i) If John could have fixed the faucet we wouldn’t have been flooded out of the house last night.

By (90a) and (90c) we get (93).


(93) CONS (future irrealis): ‘I would have gone’
     ANTE (future irrealis): ‘you go’

By (90b) we get (94).


(94) ANTE (past perfect irrealis): ‘you go’
In comparing the entries for ANTE in (93) and (94) we discover that there
is a contradiction. By such a method as the preceding we can represent the
unacceptability of (91).
We can use a rule like (90b) in the following way: it assigns certain semantic
characteristics to a portion of the representation. If this portion of the repre-
sentation has been assigned the same semantic characteristics by other rules,
then the representation is consistent. If other rules have assigned different
characteristics, as in (91)–(94), then the representation is inconsistent. This is
a perfectly straightforward way of viewing the operation and interaction of
semantic interpretation rules. This operation applies in exactly the same way as
selectional restrictions apply, by supplying semantic markers when they are
absent and marking anomalies when semantic inconsistencies are present.
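The mechanism just described — rules supplying semantic features and marking anomalies when assignments clash — can be sketched computationally. The following fragment is my own illustration, not part of the original analysis; the feature names (`time`, `realization`) and the encoding of rules (90a)–(90b) are assumptions made for the sketch.

```python
# Illustrative sketch only: the feature names and the encodings of
# rules (90a)-(90b) below are assumptions, not the original formalism.

def assign(rep, part, features):
    """Assign semantic features to one part (ANTE or CONS) of the
    representation; report conflicts with features already assigned."""
    conflicts = []
    slot = rep.setdefault(part, {})
    for name, value in features.items():
        if name in slot and slot[name] != value:
            conflicts.append((part, name, slot[name], value))
        else:
            slot[name] = value
    return conflicts

rep = {}
# (90a): 'would have' in the consequent -> future irrealis
assign(rep, "CONS", {"time": "future", "realization": "irrealis"})
# Present-tense 'you go' in the antecedent, interpreted as in (93)
assign(rep, "ANTE", {"time": "future", "realization": "irrealis"})
# (90b): a future-irrealis CONS demands a past-perfect-irrealis ANTE
clash = assign(rep, "ANTE", {"time": "past perfect", "realization": "irrealis"})
assert clash == [("ANTE", "time", "future", "past perfect")]  # the contradiction in (91)
```

When no earlier rule has assigned a conflicting feature — as with an antecedent lacking an explicit auxiliary — the same call simply supplies the features, which is the behavior discussed in the next paragraph.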
Notice, however, that if there are no other rules but the rule in question, in
this case (90b), then the rule in question will assign a perfectly consistent and
well-formed interpretation to the part of the sentence to which it applies. If
(90b) were to apply to an antecedent which lacked explicit time, it would
nevertheless assign the characteristics ‘past perfect irrealis’ to the representa-
tion of this antecedent. This is in fact what happens in the case of and-OM-
sentences. Going back to (11a), we have (95).
(95) CONS: 'I would have left' (by (89))
     ANTE: 'One more can of beer'

(96) CONS [future, irrealis] → ANTE [past perfect, irrealis] (by (90a) and (90b))
My main purpose in giving the preceding demonstration is to show that it
is perfectly feasible to construct an interpretive mechanism which will capture
the clear generalizations about if-then conditionals and consequential OM-sentences. There are certain aspects of such an interpretive mechanism and an
alternative transformational one that should be compared, and we will now
proceed to a discussion of these.

2.4.4 Comparing approaches


An interpretive mechanism assigns the ANTE-CONS semantic structure to
the OM-sentence and the if-then sentence independently. A transformational
approach would derive one from the other, or both from a third source. It is
worth considering for a moment whether either approach is a priori more
highly valued in regard to capturing the desired generalizations.
Let us assume that two surface structures S1 and S2 have the interpretation
I. Let D1 and D2 represent deep structures, T a transformation, and R1 and R2
rules of interpretation. Assume also that D1 and D2 may also function as
surface structures, that is S1 may be derived from D1 by the application of only
obligatory transformations, and similarly for S2.
We may represent the transformational and the interpretive approaches as
in Figure 2.1.

Transformational:
    I <--R1-- D1 (=> S1);  D1 --T--> S2

Interpretative:
    I <--R1-- D1 (=> S1)
    I <--R2-- D2 (=> S2)

Figure 2.1

The area of comparison centers on T, D2, and R2. A priori there is no reason to prefer T over D2/R2. The question is an empirical one: are T, D2, or R2
independently motivated? If the answer is that none are, then there is no
way of choosing between the two approaches. However, if one has independ-
ent justification, then the more widely motivated solution is to be favored.
The question in terms of the particular data being considered here is
whether there is independent motivation for a transformation which deletes
if-then and inserts and where then was located. Quite aside from the problems
created by such a transformation, it is clear that there is no independent
motivation for it. It appears that such a transformation would be completely
ad hoc, deriving OM-sentences from if-then structures only for the purpose
of relating them semantically. When one takes into account the further
objections to such a rule which were discussed in }2.4.2, the case for it is
very weak indeed.
On the other hand, we must also ask whether there is independent motiv-
ation for postulating a deep structure D, in this case that characterized by (85)
and (86). Clearly a conjoined structure in English is generally well-motivated
for all kinds of sentences, and it should be noted that the conjunction and
found in OM-sentences is not a lexical item that is characteristic of OM-
sentences, but rather one of the set of coordinating conjunctions in the
language. It can be plausibly argued that the phrase structure rule in (85)
can be generalized to (97), furthermore, on the evidence that and is not the
only conjunction which can function in this position.
(97) S ➝ NP CONJ S

(98) Twelve cases of beer {and / or / but} I'm leaving.

Notice also that the form of the OM-sentence is basically like that of a typical
conjoined structure; i.e. the conjunction is located between the conjoined
constituents. This is precisely what we would expect if the OM-sentences were
a sub-class of the class of conjoined structures, and not something quite
unique.
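Rule (97) is straightforward to simulate. The sketch below is illustrative only — the function name and the string encoding are my own, not the paper's formalism. It shows how repeated application of S ➝ NP CONJ S, with an ordinary clause as the terminating S, yields OM-sentences with any of the coordinating conjunctions, the conjunction falling between the conjoined constituents as in a typical conjoined structure.

```python
# A minimal sketch (my own encoding) of the phrase structure rule (97),
# S -> NP CONJ S, with a plain clause as the terminating case.

CONJ = ["and", "or", "but"]  # the coordinating conjunctions in (97)/(98)

def expand_S(nps, clause, conj="and"):
    """Apply S -> NP CONJ S once per NP, bottoming out in a clause."""
    assert conj in CONJ
    s = clause
    for np in reversed(nps):
        s = f"{np} {conj} {s}"
    return s

print(expand_S(["One more can of beer"], "I'll leave"))
# -> One more can of beer and I'll leave
print(expand_S(["Twelve cases of beer"], "I'm leaving", conj="or"))
# -> Twelve cases of beer or I'm leaving
```

Passing several NPs produces the stacked structures that become relevant in §2.4.6 below.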

2.4.5 Sequence of tenses


There is a second method by which we can choose between the competing
analyses outlined in }2.4.4. It will be recognized that rule (90b), which assigns
the proper time and realization feature to the antecedent in case it lacks an
auxiliary, is a special case of the rule which we generally refer to as ‘sequence of
tenses’. This rule is commonly understood to specify the form of the auxiliary in
certain complex structures. The most striking example is the counterfactual
conditional, which we are considering here, but there are others.
The interpretive approach to this problem assumes that sequence of tenses
is a ‘semantic’ phenomenon, in that it assigns features to the representation of
a structure on the basis of other features assigned by interpretive rules
applying to other structures. The transformationalist approach would say,
simply, that sequence of tenses is a transformation. Once again this is an
empirical issue and should be decided on empirical grounds.
Let us schematize the two approaches once again. Consider two deep
structures analyzable as D–D′ and D–S, interpretative rules Rd, Rd′, and Rs
which map D into Id, D′ into Id′ = [F Is], and S into Is respectively, a semantic
rule M which assigns a semantic feature [F] to Is, and a transformation
T which maps D′ into S.

Transformational:
    D–D′ --T--> D–S
    D–D′ --Rd, Rd′--> Id–Id′

Interpretative:
    D–S --Rd, Rs--> Id–Is --M--> Id–[F Is] = Id–Id′

Figure 2.2
It can be clearly seen in Figure 2.2 that there is no formal difference in the
complexity of the two descriptions; the only question is whether the phenom-
enon described involves syntactic or semantic features. It can be shown that
sequence of tenses, at least as it is observed in the cases which we are
considering here, is a purely semantic phenomenon.
Consider for a moment what the transformation T in the transformational
approach would look like. There are two possibilities, depending on what we
assumed the deep structure of the AUX was. If we assumed that the deep
structure was determined by (99), then T would have the function of specifying at least the tense of D′, and possibly the remaining contents of the AUX
as well.f
(99) AUX ➝ TENSE (M) (have+en) (be+ing)
T would have the function of specifying the permissible sequences of AUX's in
conditionals, e.g.

(100) if . . . Present ({M / be}) . . . , then . . . Present ({M / be}) . . .

(101) if . . . Past ({M / be}) . . . , then . . . Past ({M / be}) . . .

f The analysis of the English verbal sequence assumed here is that of Chomsky (1957). The
particular formalization of the morphology is not critical, and the argument would go through
equally well if the analysis was updated to a more contemporary one expressed in terms of
selection by each auxiliary verb of a VP with a particular morphological feature.

(102) if . . . Past (M) have +en (be+ing) . . . , then . . . Past will have+en
(be+ing) . . .
Such a transformation would capture the generalization that in a conditional
the antecedent and the consequent must be of the same tense, e.g.,
(103) If you buy a rubber duck then
      a. I'll wear a clown suit.
      b. I'm wearing a clown suit.
      c. *I would wear a clown suit.
      d. *I was wearing a clown suit.
One of the problems in defining T is that (103c) is unacceptable only as a strict
conditional. It can be seen that it is perfectly acceptable if interpreted as an
epistemic conditional. In order to maintain the generalization of sequence of
tenses in conditionals, it would be required to hypothesize a different deep
structure for the epistemic interpretation.
We must assume for the sake of argument, therefore, that we have a means
of distinguishing the two deep structures. It will turn out that it is not
sufficient for the consequent to be only present tense; it must also be future
time. If we consider (103b) we find that while it is ambiguous, it only has the
conditional interpretation if it is understood as being future time, while it has
the epistemic interpretation if it is understood as being present time. Fur-
thermore, the examples in (104), which do not display sequence of tenses, are
also present tense, yet cannot be conditionals.
(104) a. *If you leave early then I have left before you.
b. If you are Napoleon then I am the King of France.
c. If you have blonde hair then I must have dreamt you were a brunette.
We are still assuming that the deep structure for a conditional is identifiable as
such. What happens when we encounter a deep structure which can be an if-
then, but not a conditional, by virtue of the fact that present tense in this
structure is not interpretable as future time? We must identify the representation
of this structure as semantically inconsistent, because the consequent of a
conditional must be future time, yet cannot be future time in the case in
question. However, a semantic rule that will perform this identification would
eliminate the need for a sequence of tenses transformation, since the rule would
also identify as non-conditionals those cases in which the consequent is past
tense, and for the very same reason, i.e. past tense is not future time. Conse-
quently it becomes clear that a sequence of tenses transformation is a spurious
generalization, the true generalization being a sequence of time semantic rule.21

21 An alternative approach would be to assume not only that the deep structures of epistemic
conditionals and strict conditionals are different, but also that the present/future time
ambiguity of present tense is relatable to a deep structure difference. Thus we might assume that
Time, and not Tense, is present in deep structure, as Present, Past, or Future.
Such an assumption would enable us to maintain sequence of tenses as a syntactic generalization capturable by a transformation, but it would prevent us from capturing the syntactic
generalizations which are expressed in the standard expansion of the Aux given in (99). In view
of this, and in light of the discussion above, it can be seen that failure to account for sequence of
tenses in the form of a semantic rule results in the failure to capture generalizations at one place
or another in the grammar.

2.4.6 The consequences for phrase structure

One further question which I would like to consider is what the consequences
of having a phrase structure rule like (97) are. In Culicover (1970) I suggested
that a rule like (97) would make false predictions about the interpretation of a
certain class of sentences. To review the argument briefly, I pointed out that
due to the recursiveness of the S, there would be no way to avoid generating
structures like (105).

(105)           S1
        NP1 and    S2
               NP2 and    S3
                      NP3 and    S4
                             NP4 and S5

The sentence corresponding to such a structure would be (106).

(106) One more can of beer and one more whiskey sour and one more glass
      of wine and one more daiquiri and you'll be just as drunk as you were
      at that party last Christmas.

The question which must be answered concerning a sentence like (106) is
whether it possesses the interpretation of nested conditionals, as (105) would
suggest it should. That is, can (106) be paraphrased as follows?

(107) If you drink one more can of beer then if you drink one more whiskey
      sour then if you drink one more glass of wine then if you drink one
      more daiquiri then you'll be just as drunk as you were at that party last
      Christmas.

In Culicover (1970) I concluded that the recursiveness of rule (97) presented a
severe problem, because (107) is not a paraphrase of (106). It appears now,
however, that this is not as great a problem as it originally seemed to be. The
felicity condition (75) requires that E(S) follow from or be perceived as
following from E(NP). In order not to identify S2 in (105) as a CONS(equent)
of the ANTE(cedent) NP1, we need only note that strictly speaking a conditional does not constitute an event or state in the same sense that a clause like
S5 does. To the extent that one might wish to argue that (106) has the sense of
a nested conditional, then to that extent one must redefine the notion 'event'
so that it is satisfied by a conditional. This kind of variation in the applicability of a felicity condition would not be surprising if it occurred.

2.5 The incongruence reading of and-OM-sentences


We observed in §2.4 that the incongruence reading was intimately related to
an abnormal stress contour on the S. In the following description I will mark
the stress levels with numbers such that 1-stress will be the highest normal
stress. 'E' will be used to denote higher than normal stress, and will be used in
conjunction with the numerals to indicate relative stress levels. For example,
I will consider 2+E to be greater stress than 1-stress, and so on.
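The ordering implicit in this notation can be made explicit. The following encoding is my own illustration, not part of the original description: a level is a numeral optionally boosted by '+E', any emphatic level outranks any plain level, and among levels of the same kind a lower numeral is stronger (1-stress being the highest normal stress).

```python
# Sketch (my own encoding, not from the text) of the stress notation:
# emphatic ('+E') beats plain, and a lower numeral beats a higher one.

def stress_key(level):
    """Return a sortable key for a stress label such as '1' or '2+E'."""
    if level.endswith("+E"):
        return (1, -int(level[:-2]))
    return (0, -int(level))

assert stress_key("2+E") > stress_key("1")    # 2+E is greater than 1-stress
assert stress_key("1+E") > stress_key("2+E")  # among emphatic levels, 1+E is strongest
assert stress_key("1") > stress_key("3")      # among plain levels, 1 is strongest
```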
Let us consider briefly some of the grosser characteristics of normal
contrastive and emphatic stress. The normal stress rules22 will assign a 2 3 1
pattern to an SVO sentence.

(108) John is eating an apple.
      2       3          1
If we wish to contrast elements in the sentence which have normal stress, it is
not necessary to alter the normal stress as long as the notion of contrast is
conveyed by the similarity of structure and the partial lexical identity of the
clauses.

(109) John is eating an apple, and Mary is eating a pear.
      2       3          1         2       3        1

It is permissible, however, to add emphatic stress to the contrasted
constituents.

(110) John is eating an apple, and Mary is eating a pear.
      2       3          1         2+E     3        1+E
      2+E     3          1+E

To contrast elements in the sentence which are normally unstressed we assign
high stress to them, and lower the surrounding stress levels accordingly. It is
difficult to say at present whether the high stress so assigned is abnormally
placed 1-stress, or emphatic stress, i.e. higher than 1-stress. Probably both are
possible.

22 Cf. Chomsky and Halle (1968), Chomsky (1971), and Bresnan (1971).

(111) John ate an apple, but Mary didn't.
      2        3    1         2      1
      1        3    2         1      2

(112) John cut the apple, but he peeled the grape.
      3        1     2            2+E        1
There is a very definite sense, in any case, in which emphatic stress and
contrastive stress may be distinguished. It is likely that out of context they
cannot be compared with respect to their relative stress levels. However,
contrastive stress does not convey the notion that something is out of the
ordinary, while emphatic stress does. To see this, let us consider the stress
pattern in incongruence OM-sentences.
(113) Two thousand cases of beer, and I’m going home.

(114) a. 2+E  3  1+E
      b. 2+E  3  1
      c. 2    3  1+E

(115) a. *2  3  1
      b. *1  3  2
      c. *3  1  2
The only stress patterns which are acceptable for the incongruence reading
here involve higher than normal stress levels. To each of these stress patterns
corresponds a different paraphrase involving different presuppositions. Com-
pare (114a), (114b), and (114c), with (116a), (116b), and (116c) respectively.
(116) a. There are two thousand cases of beer here, and of all things, instead
of staying I’m going home.
b. Everybody is going to be here with the thousand cases of beer
except me, who is going home.
c. Instead of going to where the beer is I’m going home.
In the first reading, what is being emphasized is that I am doing something
which I normally would not do if there were a thousand cases of beer around,
namely go home. In the second reading, what is being emphasized is that of
all the people who might normally go home when there are a thousand cases
of beer around, I am not one of them. In the third reading, what is being
emphasized is that I am going to a place where I would normally not go if
there were a thousand cases of beer elsewhere.
There are most likely numerous other inflections which will also satisfy the
context of (113). In every case, however, there must be the presupposition that
the event described by the S is somehow incompatible with the event involv-
ing the NP. This particular presupposition is not generally associated with
simple contrastive stress or the normal stress contour; hence the examples in
(115) are unacceptable in this context, lacking as they do emphatic stress.
The presupposition mentioned here is not, of course, restricted to
OM-sentences only.

(117) a. John brought two thousand cases of beer and I'm going home.
                                                     2+E   3    1+E
      b. There are two thousand cases of beer at the party to be divided
         up among all the people there, and I'm going home.
                                            2+E   3    1
      c. There are two thousand cases of beer at John's house,
         and I'm going home.
             2     3    1+E
The incompatibility may exist at a number of levels: behavior contrary to
desirable behavior, behavior contrary to required behavior, behavior contrary
to normal behavior, etc. In any case, the incompatibility arises as a result of
some variation from expected behavior in a certain context. An example will
illustrate the range of variation possible.
(118) One thousand cases of beer, and John is going home.
If John’s job is to load beer, then his behavior is contrary to what is required of
him. If he likes to drink beer, then his behavior is contrary to what would be
desirable for him to do. If it is considered a normal human trait to drink beer
when it is available, then his behavior is contrary to what would be normal for
him. If the event described by the S involves a natural phenomenon, then the
abnormality is more strongly felt, e.g.,
(119) Three days of sunshine and this flower hasn’t bloomed.
It turns out that the incongruence reading is also possible if there is no emphatic
stress, but if it is explicitly stated that there is something strange about E(S).
(120) A thousand cases of beer, and John’s going home, strangely enough.
The strangely enough, without emphatic stress in the sentence, refers to the
entire activity, and not simply to who is going, or where John is going.
A sentence which contains still, which carries with it the notion of exceptional
behavior, is also acceptable even with normal stress.
(121) A thousand cases of beer, and John is still going home.
In the light of this we may say that a felicity condition on the incongruence
reading, then, is the following:
(122) E(S) must be understood to be incompatible with what is presupposed
to be normal behavior in the context of E(NP).

2.6 Rhetorical OM-sentences and the incongruence reading


I would like to conclude this paper with some comments on a type of
construction which possesses some of the more noticeable characteristics of
OM-sentences. I will call this construction a ‘rhetorical OM-sentence’, for
reasons that will be obvious upon examining a typical example.
(123) Twenty-five centuries of language teaching and what have we learned?
While I would like very much to be able to consider this construction in
depth, such a course would lead us far afield. My intention in introducing
(123) here is to provide some further evidence for the correctness of the
interpretive analysis for OM-sentences outlined in }2.4.
Note first that (123) may be paraphrased by either of the following sentences.
(124) Twenty-five centuries of language teaching and we have learned nothing.
(125) Twenty-five centuries of language teaching and we haven’t learned
anything.
The fact that a question may appear in an OM-sentence would tend to argue
in favor of the phrase structure rule (97), since the range of structures which
may be generated by this rule is shown by these examples to be even wider
than had originally been thought. Such an argument would be vitiated if it
could be demonstrated that (124) underlies rhetorical OM-sentences.
This latter proposal is quite reminiscent of the transformational proposal
for deriving OM-sentences from if-then structures. In this case we are faced
with the choice of having a transformation which transforms a negative
sentence into a question, or of having an interpretative rule which assigns a
negative interpretation to an underlying question. As can be seen from
Figure 2.3, the latter analysis makes use of the fact that questions must be
generated by the grammar anyway, while the former derives questions from
two different sources.23

23 In the subsequent discussion I assume knowledge of the standard analysis of questions and
negation found e.g. in Klima (1964) or Culicover (1971).

Transformational:
    WH  --Rwh-->  'WH'
    neg --R1-->   'neg',  with T: neg ==> WH (question form at the surface)

Interpretative:
    WH  --Rwh-->  'WH'
    WH  --R2-->   'neg'
    neg --R1-->   'neg'

Figure 2.3

Other things being equal, I would be inclined to say that the interpretative
analysis is to be preferred over the transformational. There is no evidence that
T is independently motivated, and the grammar must be able to generate
questions in any case.
There are two kinds of evidence which would suggest that the transform-
ational analysis was to be preferred. First of all, if we found a case in which a
structure containing formal negation but lacking a negative interpretation had
to be related transformationally to a structure having the form of a question,
this would constitute an independent motivation for T. Second of all, if we
found a structure which was fundamentally a question, and had the interpret-
ation of negation, but which displayed distributional characteristics possessed
by negation and not by questions, then this would argue for the derivation of
rhetorical questions from negation by applying T after the application of the
well-motivated distributional rules pertaining to negation.24
Let us, therefore, consider these two hypothetical cases. It turns out that a
sentence in which negation is present, but which lacks a negative connotation,
already has the form of a question, e.g.
(126) Why don’t you sit down next to me.
(127) Aren’t you the guy who wants to marry my daughter?
(128) Haven’t we had fun!
Since all these examples require the presence of an element which causes
inversion, it is not clear what would happen if negation in each case was
replaced by another element which also causes inversion, i.e. WH. This

24 Cf. Klima (1964) and Jackendoff (1969).

question can be answered on the basis of other considerations, however, if
we note what would happen if a sentence contained both WH and neg in
deep structure and T applied to it. At best we would derive (129b) and (129c)
from (129a).
(129) a. WH neg someone do something.
b. Who didn’t do anything?
c. Who did what? (by application of T)
Even if this analysis could be made to work without loss of generalizations in
the grammar as a whole, the result would not be a rhetorical question,
although we would predict that it should be.25
(130) *Twenty-five centuries of language teaching and who has learned
what?
Thus not only is there no evidence for deriving questions from negation lacking
a negative connotation, but there is evidence for not deriving questions from
certain cases of negation which possesses a negative connotation.

(131) John does {not / *whether / *if} like peanut butter.

(132) John is doing {nothing / *what} very interesting.
The first kind of evidence discussed constitutes an argument against a trans-
formation T. The second kind of evidence, while consistent with a transform-
ation, is predicted by the interpretative approach, which insists that a
rhetorical question can be nothing more than a question with a special
interpretation. Thus the existence of rhetorical OM-sentences argues in
favor of the interpretative approach which we have established in general
for the derivation of OM-sentences.

25 Even if (130) were grammatical it would not have the interpretation predicted by the
transformation T, as a consideration of (129) will show. (130) would have to be derived
from (i)—
(i) neg someone has learned something
—which still leaves open the question of why the surface structures (129b) and (129c) do not
have the rhetorical interpretation when derived from (129a).

2.7 Summary
In the course of this discussion arguments have been made in favor of a
number of claims:
(a) The interpretation of an and-OM-sentence is systematically
unspecifiable.
(b) Hence we would be incorrect in deriving the surface NP from an
underlying S.
(c) No syntactic generalizations would be captured by deriving conse-
quential OM-sentences from underlying if-then conditionals. Further-
more, doing so would require a semantic condition on the well-
formedness of certain deep syntactic structures.
(d) Hence we require the phrase structure rule S ➝ NP CONJ S.
(e) Properly stated rules of semantic interpretation can adequately cap-
ture the similarity between consequential and-OM-sentences and if-
then conditionals.
These conclusions cast strong doubt, of course, on the validity of any linguis-
tic theory, such as generative semantics, which in effect requires that para-
phrases be identical in deep structure to the extent that they are identical in
interpretation.
Points (c) and (d) above in particular show, assuming that they are valid,
that there exist constructions which possess different deep structures but
which share a significant portion of their interpretations. I suspect that such
a situation will prove to be quite common, and furthermore that it may even
turn out to be the most natural state of affairs with respect to the relationship
between the totality of syntactic structures of a language and the set of
possible interpretations.26

26 The analysis in this article thus constitutes one of the earliest published entries in the brief
for construction grammar, later developed explicitly by Fillmore, Kay, Goldberg, and others.
3

On the coherence of syntactic descriptions (1973)*

Remarks on Chapter 3
My intent in writing this article was to capture the fact that languages
show a significant degree of constructional coherence that cannot be reduced
to meaning. In part this was an argument for the autonomy of syntax, an issue
that was hotly debated in generative grammar in the 1960s and early 1970s. In
part it is an early argument for constructional inheritance (which I called
'coherence'), which came of age some 20 to 30 years later with the emergence
of Construction Grammar (Fillmore et al. 1988; Goldberg 1995; Kay and
Fillmore 1999; Fillmore 1999; Kay 2002a; Sag 1997, among others).
The article focuses on English tags, such as those found in tag questions.
I argue that tags are characteristic of English, and that they are found in a set
of formally similar but distinct constructions. These constructions cannot be
reduced to a single general construction because each special case has its own
particular function and meaning. I used evidence from rule ordering (which
was a prominent device at the time) to argue that there can be no uniform
syntactic derivation of all of the different tags. I argue that a grammar in
which the same structure is used in distinct constructions is more ‘natural’
than a grammar in which the constructions use unrelated structures.
I propose a measure to capture this naturalness even when the various
constructions cannot be collapsed into a single construction.

3.1 Rules for tags


There is, in some dialects of English, a construction which might well be
called an emphatic tag. Consider the following examples:

* [This chapter appeared originally in Journal of Linguistics 9: 35–51 (1973). It is reprinted here
by permission of Cambridge University Press.]

(1) He’s quite a ballplayer, is John.


(2) She’ll have a fantastic wedding, will Jill.
(3) It’s dangerous, is the pill.
And so on. Let us assume for the sake of discussion that the rule which forms
such tags applies to an underlying structure similar to the structure under-
lying yes-no questions, with the difference that the former contains an
emphatic morpheme such as EMPH (cf. Klima 1964: 257), while the latter
contains a question morpheme. Let us call this rule Emphatic Tag Formation,
or ETF. It is not necessary to give a precise formal statement of this rule at this
point, although it is possible (cf. §3.11).
Let us also consider the rules of Tag Question Formation (TQF) and
Imperative Tag Formation (ITF). The first applies in sentences such as (4),
the second in sentences such as (5):
(4) John doesn’t want to be President, does he?
(5) Leave me alone, will you.
I will be particularly concerned here with investigating the ordering relation-
ship between the three rules mentioned above. One might be inclined to
suspect at first glance that all three rules are in fact the same rule; that is, that
the three rules are ordered consecutively in a grammar of English and can
most likely be collapsed into a single rule. I will demonstrate that this cannot
in fact be the case.a To do this one must show that given the three rules already
mentioned (call them I, II, and III), it is possible to find two other rules
(A and B) such that the rules are ordered as in (6):

(6) I
    A
    II
    B
    III
Let us consider, therefore, what these rules A and B might be, and how the set
of rules in question is ordered.

a At the time that this article was written, rule ordering was considered to be an empirically
significant component of grammatical descriptions. On the view current at the time, languages
could differ by having the same rules with different orderings, which would yield different patterns
of grammaticality. Crucially, two rules could be collapsed into a single rule only if they were
ordered adjacent to one another. While the argument in this article demonstrates that the rules for
English tags are not ordered adjacently and so cannot be collapsed, a contemporary interpretation
of the result is that they are distinct constructions that cannot be characterized in uniform terms.

3.2 Orderings
It can be argued, first of all, that the well-known rule of neg-placement (Klima
1964; Jackendoff 1969), which we will abbreviate here as NEGP, must be
ordered after TQF in order for certain generalizations to be captured. Of
particular importance is the fact that at most one neg can appear in a given tag
question:
(7) Harry fell first, didn’t he?
(8) Harry didn’t fall first, did he?
(9) Harry fell first, did he?
(10) *Harry didn’t fall first, didn’t he?
It is possible to capture this generalization quite elegantly by ordering TQF
before NEGP.
To see this, observe that the essential function of TQF is to duplicate the
underlying aux to the right of the sentence. The function of NEGP is to move
neg into the aux of a sentence. If we assume that the sentence has at most one
neg in deep structure, and if we accept the preceding characterization of TQF,
then the ordering of TQF before NEGP accounts automatically for the
distribution of neg in tag questions. Since TQF in effect creates a new aux, it
follows that no special statement is necessary to represent the fact that neg
may appear in another aux, but not both. This fact is not captured in a
particularly revealing way by Klima (1964), who has the ordering

    NEGP
    TQF,
with insertion of neg into the tag only if the aux of the underlying structure
does not also contain neg.
Of course, it is necessary that we constrain NEGP in such a way that it does
not insert the neg of one S into another S, as could conceivably occur in the
case of conjoined sentences in which neg was attached to the leftmost S. Since
there is no comparable syntactic generalization to be captured by permitting
neg to be inserted anywhere in the entire conjoined structure, we must restrict
the scope of NEGP to a single S. Whatever the convention to do this might be,
we must then give the tagged sentence a structure such that NEGP will not be
constrained from inserting neg into the tag. What this means, basically, is that
the structure of a tagged sentence is not that of a conjoined sentence, which is
not a particularly surprising claim.
The derivations may be schematically represented by the following. Details
irrelevant to this discussion are omitted. Note that (a) and (b) are optional
variants.

(11) neg NP AUX X =TQF=>
     neg NP AUX X NP AUX =NEGP=>
     (a) NP AUX[neg] X NP AUX
     (b) NP AUX X NP AUX[neg]
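The effect of this ordering can be mimicked in a toy rewriting sketch (an illustration only: the list encoding and function names are my own, not part of the analysis). TQF copies NP AUX to the end; NEGP then attaches the single deep-structure neg to one aux or the other, giving variants (a) and (b).

```python
# Toy sketch of derivation (11): symbols are plain strings; this encoding is
# an illustrative assumption, not part of the original analysis.

def tqf(s):
    """TQF: duplicate the underlying NP AUX to the right of the sentence."""
    i = s.index('NP')
    return s + s[i:i + 2]  # append a copy of NP AUX

def negp(s, target):
    """NEGP: attach the single deep-structure neg to one aux.

    target = 0 gives variant (a), target = 1 gives variant (b).
    """
    s = [tok for tok in s if tok != 'neg']  # remove the pre-sentence neg
    aux = [i for i, tok in enumerate(s) if tok == 'AUX'][target]
    return s[:aux + 1] + ['neg'] + s[aux + 1:]

deep = ['neg', 'NP', 'AUX', 'X']
tagged = tqf(deep)           # ['neg', 'NP', 'AUX', 'X', 'NP', 'AUX']
variant_a = negp(tagged, 0)  # ['NP', 'AUX', 'neg', 'X', 'NP', 'AUX']
variant_b = negp(tagged, 1)  # ['NP', 'AUX', 'X', 'NP', 'AUX', 'neg']
```

Since the deep structure supplies at most one neg, whichever aux NEGP targets, no derivation puts neg in both the sentence and the tag, so (10) is underivable.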
On the basis of the preceding discussion it seems correct to order TQF before
NEGP:

TQF
NEGP.
For the reader who has doubts about the validity of an analysis such as the
above, particularly because of the fact that (a) and (b) are from the same deep
structure, yet differ in meaning, I have a few (hopefully) soothing words.
While it is true that most transformational grammarians have operated
during the past seven years or so with the criterion that sameness of meaning
should be represented by sameness of deep structure, Jackendoff (1972)
pointed out that this criterion was nothing more than a version of the
Katz–Postal Hypothesis (cf. Katz and Postal 1964). The continuing validity
of a hypothesis depends crucially on its applicability to a wide variety of cases.
In the situation under discussion, we have a putative syntactic generalization
which cannot be captured if the sameness of meaning criterion is applied
rigorously. Therefore we can only conclude either (a) that the putative
syntactic generalization is a spurious one, or (b) that there is at least one
exception to the criterion. Since I cannot accept the first conclusion, I am
forced to accept the second. Jackendoff (1972) demonstrates that the class of
exceptions is quite numerous, so that the hypothesis can be seriously ques-
tioned on general grounds, and not merely on the basis of isolated incidents
such as the one discussed here.
A consequence of this view is that I do not assume the existence of deep
structure morphemes whose only function is a semantic one. For example, in
the analysis below in §3.3 I state the inversion transformation in terms of WH
only, and do not include Q (as do Katz and Postal 1964), since the existence of
Q is not required in order for us to capture the relevant syntactic
generalizations.
3.3 Neg-contraction
The other rule which we must bring into this analysis is neg-contraction
(NC). It is important first of all to determine what the precise statement of
this rule is. In order to do this we must also consider the rule of Inversion,
which I state below:
(12) Inversion:
WH NP TENSE(+[+v]) X
1 2 3 4 => 1 3 2 4
where [+v] = Modal, have or be

The application of this rule, which is also a well-known one, will result in
surface strings like (13) and (14):
(13) What did you give to Hermann?
(14) Did you give a hammer to Mildred?
It turns out that it is possible to capture an interesting generalization by a
judicious ordering of Inversion and NC, provided that NC is stated correctly.
When NC applies before Inversion, the contracted negation is inverted along
with [+v]. E.g.:
(15) Didn’t you like the concert?
(16) Did you not like the concert?
(17) *Did not you like the concert?
(18) *Did you n’t like the concert?
From (17) it can also be seen that if NC does not apply, then Inversion cannot
move neg to the front of the sentence. It appears logical, therefore, that for the
simplest expression of Inversion we should state NC so that it attaches n’t to
something that will later invert.
One candidate for this would be [+v]. However, even in sentences with no
[+v], NC applies to neg and then Inversion inverts the resulting structure.
Note that the rule ordering must be:

NC
Inversion.
It follows that NC cannot attach n’t to do (from do-Support) if there is no
[+v], because do-Support must follow Inversion. This is shown by the
following example, where negation is absent:
(19) Do you like Crunchy Fazola?
Hence NC must attach n’t to TENSE, which is always present. The transform-
ational mapping NC will look something like (20):

(20) X1 – AUX[TENSE[{Past/Pres}] X2] – not – X3 =>
     X1 – AUX[TENSE[{Past/Pres} + n't] X2] – X3
It can be seen that a consequence of this is that the output of Affix-hopping
(Chomsky 1957: §5.3) in case a [+v] is present will be [+v] + {Past/Pres} + n’t, which
is the correct ordering of morphemes in surface structure. By ‘correct’, I mean
that, given that can + Past ⇒ could, no extra statement is required to predict
the surface form of can + Past + n’t. The surface form of can + n’t + Past,
however, is not nearly as predictable. If n’t was attached to [+v], giving, for
example,
[MODAL can + n’t]
then Affix-hopping would have as output the string
*can + n’t + {Past/Pres}.
This is interesting in view of the independent argument against attaching n’t
to [+v] mentioned above.
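The point about predictability can be made concrete with a toy spell-out sketch (the lexicon table and function here are hypothetical illustrations, not the chapter's rules): if n't attaches outside the V + TENSE unit, the independently needed spell-out can + Past ⇒ could already determines the contracted surface form.

```python
# Toy illustration: a hypothetical spell-out table for V + TENSE units.
SPELL = {('can', 'Past'): 'could', ('can', 'Pres'): 'can'}

def surface(verb, tense, contracted):
    """Spell out V + TENSE first, then attach n't outside that unit."""
    form = SPELL[(verb, tense)]
    return form + "n't" if contracted else form

print(surface('can', 'Past', False))  # could
print(surface('can', 'Past', True))   # couldn't
```

By contrast, an order can + n't + Past would need a special statement, since no independent spell-out rule determines its surface form.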
If n’t was not attached to either TENSE or to [+v], then we would be
obliged to mention the optional presence of n’t in the structural condition of
Inversion, i.e.
. . . NP TENSE (+[+v]) (+n’t) . . . ,
which misses the generalization concerning contraction and inversion.

3.4 More orderings


We may now inquire as to the ordering of these five rules. It is obvious that
NC follows NEGP:
TQF
NEGP
NC
It can now be shown that NC precedes ITF. ITF is used to generate sentences
like (5), repeated below for convenience.
(5) Leave me alone, will you.
To determine the ordering relationship between NC and ITF we must con-
sider the paradigm illustrated by (21)–(24).
(21) Leave me alone, won’t you.
(22) *Leave me alone, will you not.
(23) *Don’t leave me alone, will you.
(24) *Don’t leave me alone, won’t you.
The crucial sequence of constituents involved in the derivation of imperative
tags is underlying NP-AUX. As can be seen, neg may be found in the tag only
when it has contracted. Compare (21) and (22) with the tag questions (25):
(25) Harry swam breast stroke, {didn’t he? / did he not?}
This means that the rule which forms the imperative tag will create a tag with
neg in it if contraction has applied, and will not apply at all if neg is present
but has not contracted. Hence, ITF must follow NC.

TQF
NEGP
NC
ITF

3.5 Emphatic tags


Let us return finally to the emphatic tags. The examples (26)–(29) show that
while neg may not appear in an emphatic tag, it may appear elsewhere in a
sentence which possesses an emphatic tag:
(26) *He’s quite a ballplayer, isn’t John.
(27) *She’s not going to have a good time, isn’t Sue.
(28) He’s not getting any younger, is George.
(29) She won’t like that very much, will Mary.
Given that the AUX of the emphatic tag is identical to the first element of
the underlying AUX, it was reasonable to suppose that ETF uses the sequence
NP-TENSE (+[+v]) in forming tags. If ETF preceded NEGP, then the latter
rule would have two possible locations in the sentence available for the
placement of the neg. While this would be a correct formulation in the case
of TQF, sentence (26) shows that ETF must follow NEGP.
On the other hand, if NC preceded ETF, then we would expect contracted
neg to appear in the tag, since n’t would be attached to TENSE, and a copy of
TENSE appears in the tag. Sentence (26) again shows that neg does not appear
in the tag: therefore ETF must precede NC. This establishes our final rule
ordering.
TQF
NEGP
ETF
NC
ITF

3.6 Some implications


The reason that it is profitable to pursue such a low-level phenomenon as the
formation of English tags with such dedication to detail is that there are
problems raised here which are not subject to resolution by methods such as
those employed by Kisseberth (1970) or by Emonds (1970), which would seem
to be the most likely candidates. Both Kisseberth and Emonds hypothesize the
introduction into the theory of certain constraints which would have the
effect of saying that certain structures of the language are in a sense charac-
teristic of the language. On the face of it, this appears to be basically what we
are trying to do here.
Our intent is to come up with some way of expressing the notion that it is
not surprising that English has three rules which create tags of roughly the
same structure in the same relative position in the sentence, and that it would
be more surprising if English had three tag formation rules which each
created a tag in a different relative position in the sentence. What we are
saying, in essence, is that the formation of such tags is somehow characteristic
of English. Since both Kisseberth and Emonds address themselves to precisely
the problem of characterizing the notion characteristic, it is reasonable to
ascertain whether either of their proposals is applicable to the problem at
hand.
Kisseberth, in his paper on Yawelmani phonology and its implications for
the theory of phonology, claims that “rules may be alike in having a common
effect rather than in operating upon the same class of segments, or performing
the same structural change, etc.” After giving evidence for this claim, he
concludes that as a first approximation we might establish a kind of deriv-
ational constraint (a notion introduced by Lakoff 1969; 1971) which has as its
function the ruling out of any derivation which will lead to the creation of a
forbidden phonological sequence. He also discusses a variation on this theme.
Emonds, on the other hand, attempts to account for the fundamental
persistence of syntactic structures throughout certain well-defined subse-
quences of derivational sequences. I summarize his hypothesis as it relates
to our problem and return to the main discussion in §3.7.
Emonds hypothesizes that there are two kinds of transformations in a
grammar: structure-preserving and root. A transformation is structure-
preserving if the input and the output of the transformation are substan-
tially characterizable by the same phrase-structure rules. The fundamental
claim made by Emonds is that all transformations which may apply to
embedded sentences as well as the topmost sentence in a tree are structure-
preserving. If a transformation is not structure-preserving, then it is a root
transformation, and may apply to the topmost S-node of the tree only.
Since the first transformations which apply in a derivation apply to the
lowest part of the phrase-structure tree, it follows that there will be an initial
subsequence of trees which constitutes the derivation that is characterized by
the fact that all of the transformations which apply to produce this subse-
quence are structure-preserving. Since the input to the earliest transformation
in this subsequence is characterizable by the base phrase-structure rules of the
language, this means that all the trees in the subsequence in question are also
characterizable by this same set of base phrase-structure rules. What Emonds
is saying, in effect, is that to a considerable extent the surface structures of a
language are the deep structures of the language. This is not the case only
when a surface structure is derived by a sequence of structure-preserving rules
followed by at least one root transformation.

3.7 The impossibility of collapsing tag rules


Kisseberth’s constraints, if I interpret them correctly, are concerned with
establishing that certain structures may not occur in any derivation, no matter
what the actual rules contributing to this derivation might be. Emonds, with
his structure-preserving transformations, is stating constraints to the effect
that certain structures must occur (over a certain subsequence of the deriv-
ation, which subsequence is presumed to be independently well-defined). It
should be clear that in formulating this discussion in these terms I have


simplified considerably. I believe, however, that I have captured the essence
of the difference between the two approaches.
Let us now compare the two kinds of constraints and the kinds of general-
izations that they seek to capture with the data that we have presented here.
Given accepted notions of what constitutes a significant generalization in
linguistics, we might expect that our linguistic theory would provide us with
the means for expressing the persistence of a syntactic device such as the
formation of tags in English as somehow relatively probable or ‘natural’. By this
I mean the following: we would much more expect to find a language like English than
a language (call it *English) which differed from English only in that
(i) *TQF forms tags in sentence-initial position
(ii) *ITF forms tags in sentence-final position, and
(iii) *ETF forms tags between the subject and the predicate.
*English would contain sentences like those in (30)–(32):
(30) Is he, John isn’t very bright. (From *TQF.)
(31) Leave me alone, will you. (From *ITF.)
(32) He, can John, can really cook. (From *ETF.)
We are unable to distinguish between the two languages on the basis of the
relative complexity of the grammars, determined by any counting procedure
whatsoever. The rules for both languages are formally identical, with the
exception that English tags all show up in the same place, while *English
tags all show up in different places. Clearly English might be found to be
simpler than *English if the rules of English were consecutively ordered. For if
this was the case, we might be able to see our way to collapsing the three
English tag formation rules into a single tag formation rule. Even the possi-
bility of doing this is ruled out for *English.
From the discussion in §3.1–§3.5, however, it can be seen that the three tag
formation rules cannot be collapsed into a single tag formation rule, because
of the fact that they are not consecutively ordered. In fact, there is no
consecutively ordered pair. Therefore a different mode of attack on the
problem is required. There are a number of logical possibilities which imme-
diately present themselves.
(a) It is wrong to claim that English is more natural than *English.
(b) While it is correct to claim that English is more natural than *English,
present linguistic theory must be able to express this within its present
capabilities: therefore, the rule ordering arguments and/or statements
of the transformations in §3.1–§3.5 are wrong to the extent that they
prevent collapsing of the three tag formation rules.
(c) We need a Kisseberth-type derivational constraint.


(d) We need an Emonds-type derivational constraint.
(e) We need something else.

3.8 Similarity
(a) The claim that English is more natural than *English arises from the
linguist’s intuition. If the linguist’s intuition is wrong, he will never get
anywhere in accounting for linguistic phenomena, but will bog down
in linguistics’ version of the epicycle. There is no a priori notion of
‘natural’ on which we can base our arguments.
(b) It is not clear that the three rules would collapse even if the ordering
arguments were shown to be invalid. Observe that TQF puts a pro-
noun in the tag, ITF puts a personal pronoun you in the tag, and ETF
leaves a pronoun behind in the sentence. Examples (33)–(35) illustrate
these points.
he
(33) John’s not very bright, is ? (From TQF.)
*John
you
(34) (Someone) pick up the phone, will ? (From ITF.)
*he
He
(35) ’s really something, is John. (From ETF.)
*John
Thus it is not conclusively the case that collapsing would capture the desired
generalization even if the rule ordering could be made appropriate.
The only other possible assumption that would uphold the conclusion in
§3.7(b) is that our notion of transformation, i.e. our notion of linguistically
significant generalization, is not a useful one. If our theory utilizes a notion of
generalization such that application of that notion of generalization prevents
us from capturing other generalizations which we feel are significant ones,
then our notion of generalization may be faulty in the first place. At this point
we can do no more by way of discussion than to repeat the statement made in
§3.8(a).
(c) It will be noted that a Kisseberth-type constraint is a negative con-
straint; that is, it tells us what kinds of structures we can never have.
Since tags are a type of structure which we may have, and a type of
structure which is ‘natural’ to find in a particular location in the
sentence, a Kisseberth-type constraint would not appear to have
much usefulness in the situation under discussion.
(d) It seems more likely that an Emonds-type constraint would be applic-
able here. Assume that English had a phrase-structure rule like (36)—
(36) SENT ➝ S TAG

—then if the tag formation rules could be shown to be structure-preserving,
we could then account for the fact that tags always show up at the end of
the sentence in this way. However, as Emonds (1970) shows, TQF is a root-
transformation and is not structure-preserving. (But cf. R. Lakoff 1969.)
Even if this were not the case, however, it is difficult to see how we could
maintain the integrity of the notion ‘structure-preserving’ if tag formation
was postulated to be a structure-preserving transformation. In assigning this
characteristic to a transformation, we are attempting to express formally our
intuition that the structure that is the output of such a transformation is a
‘characteristic’ structure of the language. It becomes clear from considering
tags that while the notion of structure-preserving may capture the notion of
characteristic structure, it fails to do so just in case there is a characteristic
structure which must be generated by a set of transformations which contains
at least one root transformation. To put it succinctly, what is needed is some
characterization of ‘characteristic root structure’.
We are faced, therefore, with the option of accepting the status quo, as in
§3.7(a), or of attempting to construct a framework in which a root transform-
ation may give rise to a characteristic structure which is explicitly represented
as characteristic within that framework. It thus becomes necessary, having
rejected the three logically possible sources for a solution to the problem, to
devise a brand-new alternative. It turns out, however, that the notion of
structure preservation functions as a spiritual source for a solution, even if
it cannot be used directly to solve the problem.
The reader familiar with Emonds’ work will agree, perhaps, that what
is needed here is not simply the analogue, for root transformations, of
structure-preservingness. To make the reasons behind this observation expli-
cit, let us consider the conceptual basis of this notion.
What we are saying when we call a set of transformations structure-preserv-
ing is that there is a subset of the characteristic structures of the language
which can be characterized by the rules for the base phrase-markers of the
language. Let us call this subset of structure ‘base-characteristic-structures’, or
BC-structures. Root transformations apply to these BC-structures to give
another set of structures whose members are non-BC-structures. Since these
latter structures are not related in any systematic way to the base rules as the
BC-structures are, we are admitting in principle an unlimited variety of possible
non-BC-structures. If we discover a regularity among non-BC structures there
is no way, within any of the present formulations of linguistic theory, of
avoiding the claim that this regularity is accidental.
It is important to point out here that even if the set of possible root
transformations was severely constrained, we could not predict within the
present framework that it is more natural for a language to select fewer of the
possible root structures.
An analogue to the notion of structure-preserving (non-root) transformations,
however, must consist of a principled method by which we can constrain
the class of non-BC-structures possible for a given language by appealing to a set
of rules. The feature of tags which was judged to be notable was the fact that the
ultimate location of the tags in surface structure was the same, regardless of
their source. Hence the feature of the rules in question which is of interest is
their basic similarity to one another, and not the form of their output per se.
What we really require, in fact, is a means of expressing the notion that the
output of two transformations which are not consecutively ordered is similar
with respect to the relative position in surface structure of certain well-defined
sub-phrase-markers which are crucial to the statement of the two transform-
ations. In a sense, then, structure-preservingness is a special case of the
maximization of similarity, rather than similarity being a kind of structure-
preservingness.b

3.9 Capturing similarity


We may now motivate our notion of similarity as follows: given a pair of
transformations, we wish to say that they are intuitively similar to one
another if they could be collapsed by the accepted collapsing conventions
provided that they were consecutively ordered. The more similar the two
transformations are, the more they can be collapsed into a single rule without
the use of brackets. Consider now the following pair of rules:
Rule 1: A B ⇒ B A
Rule 2: C B ⇒ B C
We may think of the similarity of the two rules as being a function of the
number of changes that we would have to make in a structure which is the
input to one rule such that it becomes the output of the other. For example,
given the ‘rule’ RI—
RI: A ⇒ C

b. Here I am anticipating a point that I have since elaborated elsewhere (e.g. in Culicover and
Nowak 2002 and Culicover 2013): what have been formulated as grammatical constraints in
earlier work may actually be reflections of markedness. Constructions that obey these con-
straints are more highly valued than those that do not, but the latter are nevertheless theoretic-
ally possible and may occur in natural languages under certain circumstances.
—we could derive B C from two different sources:


Derivation 1: C B =Rule 2=> B C
Derivation 2: A B =Rule 1=> B A =RI=> B C
If we call RI a ‘unit of dissimilarity,’ then we can say that Rule 1 and Rule 2 are
one unit of dissimilarity from one another.c
Let us attempt to define this notion of similarity precisely. The following
discussion is based on the tag formation rules, and the reader should be aware
that our formulation, which is preliminary, may be somewhat idiosyncratic
for this reason. In Definition 1, below, I have listed what I take to be a minimal
set of elementary transformations {TU}. In Definition 2 a notation is estab-
lished to permit us to refer to a phrase-marker which is the output of a
sequence of transformations, denoted by TS, which have applied to an initial
phrase marker P. In Definition 3 we have a notation which permits us to
represent the similarity of two transformations in terms of the relative
number of elementary transformations which would have to be applied to a
selected input P in order to produce the output of the two transformations,
T(P) and T′(P). (An alternative formulation is possible in terms of T(P)
and T′(P′).)

3.10 Definitions
Definition 1: (a) TC is the set of transformations which only change the value
of a feature for a single constituent of the tree. Given con-
stituent A and feature F, TC(A, F) does one of the following:

(i) A[+F] ⇒ A[–F]
(ii) A[–F] ⇒ A[+F]
(iii) A ⇒ A[+F] or A[–F]

c. The idea of measuring similarity in terms of primitive operations such as deletion and
insertion is due originally to Levenshtein (1966). I was unaware of Levenshtein distance until
hearing a colloquium in the 1990s by John Nerbonne on applying it to the analysis of Dutch
dialects (reported on in Nerbonne et al. 1999 and other work). Jirka Hana and I apply Levensh-
tein distance (or ‘edit distance’) to measuring the complexity of morphological analyses in
Ch. 13 of this book.
(b) TI is the transformation which inserts a single constituent
into the tree.
(c) TD is the transformation which deletes a single constituent
from the tree.
(d) TR is the transformation which replaces one constituent by
another.
(e) {TU} is the set consisting of TC, TI, TD, and TR. TU is any
member of {TU}.
(Note: the set {TU} is defined in order to provide a means for
discussing similarity between transformations. There is no
claim that {TU} is in any sense a minimal set for the purpose
of specifying transformational grammars.d)
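The set {TU} can be sketched minimally as follows (an illustrative encoding of my own: a phrase marker is flattened to a list of (label, features) pairs; this is not part of the original definitions).

```python
# Sketch of the elementary transformations of Definition 1. A 'tree' is
# flattened here to a list of (label, features) pairs; the encoding is an
# illustrative assumption, not part of the chapter's formalism.

def t_c(tree, i, feature):
    """TC: change the value of one feature on constituent i."""
    label, feats = tree[i]
    feats = {**feats, feature: not feats.get(feature, False)}
    return tree[:i] + [(label, feats)] + tree[i + 1:]

def t_i(tree, i, constituent):
    """TI: insert a single constituent into the tree."""
    return tree[:i] + [constituent] + tree[i:]

def t_d(tree, i):
    """TD: delete a single constituent from the tree."""
    return tree[:i] + tree[i + 1:]

def t_r(tree, i, constituent):
    """TR: replace one constituent by another."""
    return tree[:i] + [constituent] + tree[i + 1:]
```

Each member of {TU} performs exactly one change, which is what makes counting their applications (as in Definition 3 below) a sensible measure of similarity.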
Definition 2: Given a phrase marker P and a set of transformations TS, then
the result of applying k transformations all of which are
members of TS is referred to as ‘TS^k(P)’.
Definition 3: Let ~n be read as ‘n-similar’. If two transformations are
Ø-similar, they are identical. We define n-similar as follows:
T′ ~n T, if T′(P) = TU^k(T(TU^m(P))), where k + m = n.

What Definition 3 says is that the similarity of two transformations can be
measured according to how many elementary transformations we would
have to apply along with one transformation in order to have the same effect
as simply applying the other transformation alone. We must specify that
the shortest path from T to T′ is to be considered the determinant of n, for
otherwise the definition is meaningless. This is the case because it is clearly
possible to apply k insertions and k deletions to any structure and end up with
the same structure. This is formalized as Caveat.e
Caveat: Given k, k′, m, m′, T, T′, n, n′, then if
T′(S) = TU^k(T(TU^m(S))) such that k + m = n, and if
T′(S) = TU^k′(T(TU^m′(S))) such that k′ + m′ = n′, and if
n′ ≠ n, then for q = min(n′, n), T′ ~q T.

d. However, it is more or less the set of primitive operations assumed by Chomsky in his
earliest work (1955) and most recently in the Minimalist Program (1995).
e. A constraint against sequences of operations that take one back to where one started was
subsequently proposed by Pullum (1976), under the rubric of the Duke of York Gambit, and has
been followed up in phonological and morphological theory by e.g. McCarthy (2003).
A constraint ruling out Duke of York derivations is required for the learnability proof of Wexler
and Culicover (1980).
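Footnote c credits the underlying idea to Levenshtein distance, which can be sketched at the token level as below. This is illustrative only: Definition 3 is defined on trees and permits elementary operations both before and after a transformation applies, so this plain string version need not reproduce the similarity values computed in §3.11.

```python
# Minimal token-level edit distance (Levenshtein 1966): the fewest single-token
# insertions, deletions, and replacements -- string analogues of TI, TD, and
# TR -- needed to turn one sequence into another.

def edit_distance(a, b):
    """Dynamic-programming Levenshtein distance over token sequences."""
    prev = list(range(len(b) + 1))  # distances from a[:0] to each b[:j]
    for i, x in enumerate(a, 1):
        cur = [i]                   # distance from a[:i] to b[:0]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # replace x by y (or match)
        prev = cur
    return prev[-1]

# The outputs of Rule 1 (A B => B A) and Rule 2 (C B => B C) of section 3.9
# differ by a single replacement -- 'one unit of dissimilarity'.
print(edit_distance('B A'.split(), 'B C'.split()))  # 1
```

Taking the minimum over all operation sequences is what the Caveat above enforces: without it, k extra insertions cancelled by k deletions would inflate n arbitrarily.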
3.11 Coherence
Consider now the following transformations:
T1: A B C X ⇒ A B C X A B C
T2: A B C X ⇒ X A B C
T3: D B C X ⇒ D B C X D B C
By Definition 3 we have T1 ~4 T2, T2 ~3 T3, T1 ~1 T3. (All three values are not
symmetric, e.g. T2 ~3 T3, etc., but T3 ~5 T2 and T3 ~2 T1.) Now let us consider *T1, *T2,
and *T3.
*T1: A B C X ⇒ A B C A B C X
*T2: A B C X ⇒ X A B C
*T3: D B C X ⇒ D B D B C C X
By Definition 3 we have *T1 ~4 *T2, *T2 ~6 *T3, *T1 ~4 *T3.

It is possible to see how the fact that one set of transformations is more
‘coherent’ than another is reflected in their respective similarity values. In all
likelihood a coherence-measure could be devised which would, on the basis of
the aggregate differences in similarity values in different sets of transform-
ations, reflect these differences as a single value. The utility of such a device is
open to question, however.
Consideration of a few examples will show why it might be desirable to
treat the smallest possible similarity measure between two transformations as
being significant. The following should serve to illustrate:
T4: A B X ⇒ B X A
T5: C B X ⇒ B X C
*T4: A B X ⇒ B X A
*T5: C B X ⇒ B C X
Observe first that the movement, in T4, of A to the right of X can be simulated
by inserting a copy of A to the right of X and then deleting the first
A. Similarly, the similarity between T4 and T5 can be measured by inserting
C to the right of A in the output of T4 then deleting A. Thus in this sense T5 is
2-similar to T4.
However, the same procedure will assign the value 2 to the similarity
between *T4 and *T5. Simply insert C to the right of B in the output of *T4,
and then delete A. It is clear, however, that T4 and T5 are more similar than *T4
and *T5. By introducing TR, the elementary replacement transformation, we
can relate T4 and T5 by the single application of TR, so that T4 ~1 T5. However,
*T4 ~2 *T5 at best, no matter how the similarity value is determined.
As an exercise, now, I will compare the two sets of transformations consisting
of TQF, ITF, and ETF on the one hand and *TQF, *ITF, and *ETF on the
other (cf. §3.7). It will be seen that this case is similar to the hypothetical case
just discussed involving T1–T3, *T1–*T3, with this one being somewhat more
complex.
TQF: WH NP AUX X ⇒ NP AUX X WH NP[+PRO] AUX

ITF: WH NP AUX X ⇒ NP SUBJ(unctive) X WH you AUX

ETF: EMPH NP TENSE(+[+v]) X ⇒
     NP[+PRO] EMPH TENSE(+[+v]) X TENSE(+[+v]) NP[–PRO]

TQF ~2 ITF, TQF ~8 ETF, ITF ~8 ETF

*TQF: WH NP AUX X ⇒ WH NP[+PRO] AUX NP AUX X

*ITF: WH NP AUX X ⇒ NP SUBJ X WH you AUX

*ETF: EMPH NP TENSE(+[+v]) X ⇒
     NP[+PRO] EMPH TENSE(+[+v]) NP[–PRO] TENSE(+[+v]) X

*TQF ~9 *ITF, *TQF ~14 *ETF, *ITF ~14 *ETF

Notice that it will be necessary to constrain TR in some ways, and to take into
account the surface structures being considered, and not simply the surface
strings. For example, if we replaced each element in the structural change of
*ITF by the corresponding element in the structural change of *ETF, we
would be able to say that the two transformations are 7-similar. However, in
establishing the similarity measure it was not our intent that it be applied
blindly, but rather that it reflect in some way the true extent to which
transformations are performing similar functions. It is necessary therefore
that we interpret the elementary transformations {TU} as having the charac-
teristic common to all transformations of mapping trees into one another,
and that we interpret the similarity measure as strictly speaking being defined
on the output trees, and not the input strings, of transformations.

3.12 Towards a general notion of similarity


I would like to end this paper with a few speculations on the generality of a
notion of similarity, however it might be defined in practice. I find it conceiv-
able that this device might be applicable both in grammar evaluation in
general and in grammar formulation. I will give one example of each kind of
application.
Evaluation: Linguists have not infrequently expressed dissatisfaction with the
idea of symbol-counting as an appropriate or adequate evaluation device. As
an example, consider the case of T4 – T5 in §3.11 compared with **T4 – **T5
below:
**T4: A B X ⇒ B A X
**T5: C A X ⇒ B A X
The pairs may be collapsed as follows:

TØ: {A/C} B X ⇒ B X {A/C}

**TØ: {A B / C A} X ⇒ B A X
In each case the uncollapsed set of rules contains twelve symbols, while the
collapsed set contains eight. The greater similarity between T4 and T5 is not
captured by this notation. We see however, that T4 is 1-similar to T5, while
**T4 is 2-similar to **T5.
It seems to me that while collapsibility ought to play a role in the evaluation
of grammars, it is certainly not a sufficient simplicity criterion. If, however, it
turned out that there was evidence against the collapsing conventions which
we are fond of using, the notion of relative similarity or coherence could still
be used to capture certain kinds of generalizations which were implicit in our
use of the conventions.
Formulation: Some linguists have often made use of the argument that if
A displays many of the same characteristics as B, then A should be analyzed as
a B, either by using the notation [B, ±A] or by postulating the structure [B A] (see
e.g. Lakoff 1971; Ross 1969a). The main argument given in favor of this step is
that the appearance of the term {A/B} in more than one rule would constitute
the loss of a linguistically significant generalization.
It can be seen, however, that a grammar which has n occurrences of {A/B} is
more coherent than the same grammar with n–1 occurrences of {A/B} and one
occurrence of {A/C}, and is therefore more highly valued if we include coherence
in our evaluation metric.
4

Stress and focus in English


(1983)*
Peter W. Culicover and Michael S. Rochemont

Remarks on Chapter 4
Sentence stress is not always a sufficient condition for interpretation as focus.
An insightful analysis of the appropriate generalizations can be accommo-
dated under a ‘modular’ approach to grammatical theory. Certain observa-
tions concerning the stress properties of wh-questions are shown to be
consistent with the assumptions of trace theory as developed in e.g. Chomsky
and Lasnik (1977), where the relationship between focus and stress is mediated
by S-structure. The notion of focus has no consistent pragmatic characteriza-
tion; it is, rather, a grammatical notion. The interpretation of this grammat-
ical notion in particular discourse contexts is provided by rules of Discourse
Grammar using the predicate ‘c-construable’, which is here defined.
Our goal in this article was to account for the correspondence between the
location of the focal stress and the focus interpretation. We believed that this
could be explained by assigning a focus feature F (which we borrowed from
the work of Jackendoff and Selkirk) to a node or nodes in the syntactic structure,
and then mapping this structure into prosody, on the one hand, and a focus
interpretation, on the other. This early study was refined and elaborated by
Rochemont (1986; 1998).

4.1 Introduction
In this paper we address the issue of how to relate focus and stress in English
sentences, particularly within the framework of the (Revised) Extended Stand-
ard Theory. Specifically, we will show that, with some refinement, the

* [This chapter appeared originally in Language 59: 123–65 (1983). It is reprinted here by
permission of the Linguistic Society of America. The order of the authors is strictly alphabetical.
We would like to thank Dwight Bolinger, Larry Hyman, Will Leben, and an anonymous
Language referee for helpful comments. The research reported on here was supported in part
by grants from the National Science Foundation (BNS-7827044) and the Sloan Foundation.]

grammatical model of Chomsky and Lasnik (1977) accommodates an insightful
analysis of the relationship between focus and stress, while preserving the
Autonomous Systems view of Hale et al. (1977).
By invoking the Autonomous Systems view, we wish to suggest that the
current work is directed toward the development of a comprehensive gram-
matical theory of the following sort. The grammar is composed of a set
of independent components, and its overall functioning is constrained by
mappings between representations provided by the individual components.
It is a crucial feature of the Autonomous Systems view, and of our analysis,
that the maximal degree of generality of the grammar of a language can
be achieved by maximizing the generality of each of the independent com-
ponents. Specifically, it should not be necessary to refer to the primitives and
relations characterized by one component in specifying the generalizations
(i.e. the rules) of another.
In terms of accounting for the distribution of stress and focus in English,
the Autonomous Systems view leads us to a characteristic sort of analysis.
Generalizations about stress, which is a phonetic phenomenon, are the proper
domain of (a part of) the phonological component. By contrast, identifica-
tion of focus is accounted for within the domain of the syntactic component,
whereby a given constituent is represented as ‘in focus’. The interpretation of
focus is a pragmatic phenomenon, and has to do with contextual beliefs.
Crucially, in such an account the rules for assigning stress cannot directly
take into account which constituent is in focus; the identification of the
constituent in focus cannot be stated in terms of either the prosodic pattern,
or the contextual beliefs that are implicated in the interpretation of focus; and
the assignment of stress cannot be a function of the contextual beliefs. This
is not to say that stress, focus, and context are unrelated, but rather that the
generalizations concerning each are independently specified. Through such
autonomy, the various related phenomena will become better understood.
The organization of grammar which we adopt is, then, given as Figure 4.1.
The syntactic component consists of a set of base rules, rules of lexical
insertion,1 and movement transformations, assumed generally to be of the
type ‘Move α’. The output of this component is a level of representation
referred to as S-structure. S-structures give input to the rules which ultimately
associate semantic representations with sentences (the ‘right’ side of the
grammar) as well as to the rules which ultimately derive phonetic representa-
tions (the ‘left’ side of the grammar).
S-structure is thus seen as the interface between two autonomous sets of rule
systems, each of which interprets syntactic representations; thus it ultimately

1 We assume this for expository convenience. We take no principled position on the question
of where lexical insertion operates in the organization of grammar (cf. Otero 1972).

Base
→ Movement Transformations
→ S-structure

Left side: Deletions, Filters, Stylistic Rules → Surface Structure → Accent
Placement Rules → Prosodic Structure → Phonology → Phonetic Representations

Right side: Construal, Interpretive Rules (e.g. Quantifier Raising,
WH-Interpretation, Focus), Principles of Anaphora → Logical Form &
Conditions → Rules of Discourse Grammar

Figure 4.1

yields a characterization of the association between sound and meaning in


language. S-structure is crucially to be distinguished in this framework from
surface structure—which results from the operation of deletion transform-
ations, filters, and stylistic rules on S-structure. This usage of the term ‘surface
structure’ parallels its traditional usage except that, with Chomsky and Lasnik,
we take surface structure to be enriched by the presence of certain phonologically
null elements which are visible to the rules of phonology. This set of elements is
properly a subset of those which are encoded in S-structure; specifically, the set
includes only those empty elements (in effect, traces) which appear in case-
marked positions in S-structure (cf. Jaeggli 1980). Thus, of the phonologically
null elements present in S-structure (i.e. PRO, case-marked traces, and non-
case-marked traces), only case-marked traces survive in surface structure.
In §4.2, we will be concerned with the derivation of Prosodic Structure
from surface structure by the rules of accent placement. This is a non-trivial
matter, given that (as we will argue) prosodic and surface structures are non-
isomorphic. We suggest that the rules of accent placement are also sensitive to
the presence of case-marked traces—as we might expect, given the organiza-
tion of grammar in Figure 4.1.
Turning to the right side of the grammar, within the interpretive compon-
ent deriving Logical Form (LF), we will be concerned primarily with the

assignment of focus on S-structures (cf. §4.3). This paper deals exclusively
with stress-related focus, though we suggest an analysis of non-primary
stressed wh-phrases as foci. However, we do not consider the problem
of specifying focus that is not stress-related, e.g. as defined by the scope of
lexical items such as only or even, or by particular syntactic configurations
(constructional focus, in the sense of Rochemont 1980).
We take the component of rules of Discourse Grammar to contain rules
which relate aspects of sentence grammar to discourse and context. An example
is the Delta sub-f Interpretation of Williams (1977). In §4.4, we suggest a
number of additional rules of Discourse Grammar whose function is to define
contextual conditions for the interpretation of focus as presentational, con-
trastive, emphatic, etc.
In §4.5 we summarize our results, and compare our proposals with a
number of others that have appeared in the linguistic literature.

4.2 Prosodic structure


In this section we examine the derivation of prosodic structures from syntac-
tic structures. We distinguish the notions ‘accent’ and ‘stress’.2 Adapting the
terminology of Ladd (1980), we use the term ‘accent placement’ to refer to the
assignment of s (strong) and w (weak) to nodes in a prosodic tree. We reserve
the term ‘stress’ to refer to perceived levels of relative prominence.3

2 Bolinger (1961) uses the terms ‘accent’ and ‘stress’ differently than we do here: for him
‘accent’ designates phrasal stress, while ‘stress’ designates lexical stress. See below for discussion.
3 We make a fundamental assumption here that the semantic effects of stress, in particular
nuclear stress, can be identified and studied independently of the contribution to meaning of
intonation contours. This point will become clearer in §4.3, where we discuss the interpretation
of focus. But for the present, consider the following example:
(a) Bill likes only Mary.
This sentence can be pronounced with different intonation contours. For our purposes,
consider only the following—a simple declarative contour and a typical interrogative contour:
(b) Bill likes only MARY.
(c) Bill likes only MARY?
In either case, the location of the nuclear stress can be identified with the focused constituent,
completely independently of the meaning contribution of the intonation contour. That is, Mary
is focused in both cases. It might be possible to defend and maintain the view that varying
intonational possibilities for a single sentence, with a particular focus specification, can be
defined by altering pitch ranges across the pre-head, head, and tail, while keeping constant the
pitch range over the nucleus; this possibility was suggested to us by H. Borer. However, we will
not explore it further here. Returning to the interpretation of intonation contours, we suspect
that particular contours define conventional implicatures; in the case of (b) and (c), the
different implicatures may be responsible for the fact that a reading which expresses surprise
at the focus is forced in (c), but not in (b).

We further distinguish ‘phrasal’ and ‘lexical’ stress. Lexical stress refers to
stress within categories of type Xⁿ, n = 0; phrasal stress refers to stress in
categories of type Xⁿ, n ≥ 1. We will not deal here with the rules of accent
placement involved in the determination of lexical stress; that topic is dis-
cussed in detail by Liberman and Prince (1977).4
Our goal is to determine rules of accent placement to account for the
full range of phrasal stress in English sentences.5 These rules determine,
on the basis of surface-structure phrase markers, a level of representation
which we will refer to as Prosodic Structure (P-structure).6 P-structure
serves as input to the phonological rule component, following Selkirk (1978).
Representations at P-structure are prosodic trees of the type defined in the

4 It should be noted that the approach of Liberman and Prince encounters problems in
nominal compounds where accent placement has potential interpretive effects. To illustrate,
consider an example drawn from Ladd: he notes that, with a nominal compound like faculty
meeting, the relational theory predicts that the strong accent will be placed consistently on the
left constituent. But the strong accent appears just as naturally on the right constituent in
particular, non-linguistically determined contexts. Consider, for instance, the following dialog:
A: Has the faculty voted on that issue yet?
B: No, they will be discussing it at the faculty meeting tomorrow.
It is possible that our approach can be extended to such cases; but we will not pursue this
suggestion here.
5 Dogil (1979) proposes an analysis somewhat similar to ours, in that he also develops an
interpretive model incorporating a relational theory of stress. He assumes, however, that
determination of focus is based on P-structures rather than S-structures—a proposal which
clearly must be abandoned if one assumes a non-isomorphic mapping between S-structure and
P-structure. That is, if the derivation of P-structures alters the syntactic constituent structures, a
constituent-related definition of focus will be impossible. Dogil’s approach is more ambitious
than ours, however, since his proposals are extended to accommodate instances of lexically
contrastive stress—a topic which we have ignored. Some modification of his proposals might
yield a system consistent with our view.
6 David Gil (p.c.) has suggested to us the possibility of generating P-structures independently
(rather than deriving them from syntactic structures), and of defining an algorithm to pair up
syntactic and prosodic structures. We reject this alternative for two reasons. First, in Gil’s
system, the pairing mechanism is sensitive to semantic and pragmatic considerations—an
analysis that is inconsistent with our Autonomous Systems view. Second, a major motive for
Gil’s proposal is his contention that, as a universal property, languages have rightmost strong
accent placement. Thus defining accent-placement rules for languages with radically distinct
syntactic structures, in order to derive essentially similar prosodic structures for these languages,
would give rise to unnecessary theoretical complications. Clearly, however, languages do differ
in prosodic structure. Furthermore, should it turn out that such prosodic differences are
paralleled by syntactic differences, our approach would be even more strongly preferred. Barring
this consequence, an explicit analysis of the type Gil suggests would, in our view, amount to
nothing more than a notational variant of our analysis, with the notable disadvantage that it
does not conform to the Autonomous Systems view.

relational theory of Liberman (1979) and Liberman and Prince (1977): i.e. each
non-terminal node in a prosodic tree is binary-branching, dominating one s
node and one w node. Thus each constituent is assigned prominence relative
to its sister constituent. In order to accommodate instances of multiple
primary stress, as in example (1) below, Liberman’s relational theory must
be modified so as to allow prosodic nodes to dominate two s sisters, i.e.
constituents which are perceived to have equal relative prominence:7
(1) John told BILL about SUSAN, and SAM about GEORGE.
On the plausible assumption that not all syntactic structures are binary-
branching, then either syntactic and prosodic structures must be non-
isomorphic, or else we must abandon the strict definition of ‘prosodic tree’
which characterizes the approach outlined above. We adopt the former alter-
native, for the following reason. Under the assumption that syntactic and
prosodic structures are isomorphic, we would expect Figure 4.2(a), which has
binary branching, to have a P-structure like Figure 4.2(b). (‘R’ is the
node used to define the root of the tree, following Liberman and Prince.)
(a) [PP [P from] [NP [N towns] [PP [P in] [NP Germany]]]]
(b) [R [w from] [s [w towns] [s [w in] [s Germany]]]]

Figure 4.2
Figure 4.2(b) implies that each of the first two lexical items in the phrase
has relatively greater prominence than the item following—which strikes us as
false. The P-structure in Figure 4.3, however, more accurately represents the
perceived relative stress levels.
[R [w [w from] [s towns]] [s [w in] [s Germany]]]

Figure 4.3

7 We adopt the convention of using small capitals to signal primary (nuclear) sentence stress.
We do not share the view of Schmerling (1976) and Bing (1979) that, in sentences like our

Assuming that Figure 4.3 is an empirically more adequate representation
for Figure 4.2(a) than is Figure 4.2(b), it is at once evident that syntactic and
prosodic structures are not even ‘weakly isomorphic’, as they would be if
all instances of binary branching in a syntactic structure were preserved in
P-structure. The mapping between these levels of representation in Selkirk
(1978) may in fact be weakly isomorphic in this sense, if all her ‘non-lexical’
categories are binary sisters.
Extending the suggestions of Ross (1967) and Downing (1970), we adopt
the view that P-structure crucially determines the boundaries of what we will
refer to as a ‘phonological phrase’, i.e. a sequence of words which can optionally
be set off from its environment by pauses. For instance, consider again the
phrase from towns in Germany: our contention is that the only natural position
for an optional pause is between towns and in.8 As can be seen by comparing
the structures of Figures 4.2(b) and 4.3, only Figure 4.3 shows any obvious
relation between the constituency of the P-structure and determination of the
relevant phonological phrases. It is precisely this function of P-structure that
we take to subsume part of the motivation for ‘readjustment rules’ first
mentioned in Chomsky and Halle (1968: 10, 371–2). In the SPE system,
readjustment rules play two rather dissimilar roles. Their operation in the
modification of syntactic structure to accord with phonological phrases is
what is most relevant to us here. Chomsky and Halle offer examples reminis-
cent of our Figure 4.3, and it is precisely this function of the readjustment
rules that we take to be subsumed by the translation procedure offered below
for relating syntactic and prosodic structures.9

example (1), the rightmost stressed element is perceived to have relatively greater prominence
than the other stressed elements in the sentence. In our view, all the nuclear stressed elements
have equal relative prominence. As far as we know, there is no conclusive empirical evidence
bearing on this issue.
8 Given the general unavailability of empirical studies on phonological phrasing, we
acknowledge the unreliability of intuitions in subtle cases. However, in the cases we discuss, the relevant
intuitions are apparently well-defined and consistent across speakers.
9 It has been pointed out to us by a referee that the metrical grid construction of Liberman
and Prince is designed to accommodate many of our observations concerning prominence and
constituency. However, it is our contention that, in being extended to sentential domains, the
metrical grid approach makes certain false predictions. For instance, this approach predicts that
alternating ‘upbeats’ can optionally appear, so long as no violation of the Relative Prominence
Projection Rule (RPPR) occurs. Consider in this connection a sentence like (i).
(i) John may have been dating MARY.
The grid construction method predicts that have can optionally be stressed without violation
of the RPPR—which is clearly false; i.e. have cannot optionally be stressed in (i), unless it bears
nuclear stress, and hence is focused.
A similar case is provided by the following example:

In §4.2.1, we elaborate an algorithm for unambiguously deriving prosodic
structures from syntactic structures. We assume that this mapping applies to
surface structures after the application of all core, deletion, and stylistic
transformations (cf. Chomsky and Lasnik 1977). In §4.2.2, we define a set of
rules for Accent Placement (assignment of s and w), and will specify in broad
terms how we intend to relate the interpretation of focus to accent placement.
In §4.2.3, we define an algorithm for determining stress levels on the basis of
P-structures. At this point we will have elaborated in broad outline a system of
rules relating stress and focus in English sentences. In }4.2.4, we apply this
system to the analysis of questions; and we show that, with some slight
modification in line with our assumptions, the system can be extended to
account for certain previously unexplained properties of focus interpretation
and stress in wh-questions. Finally, in §4.2.5, we outline an approach to
cliticization which overcomes certain difficulties encountered by the mapping
algorithm defined in §4.2.1.

4.2.1 The mapping


For present purposes, we specify two rules to accomplish the mapping from
surface-structure trees to binary-branching trees. We adopt the convention that
these rules apply bottom-up in syntactic trees, concurrently deriving the corres-
ponding binary-branching structures. Note that the trees so derived are not all
strictly binary-branching: the output will contain instances of non-terminal,
non-branching nodes. We will later introduce a pruning convention that will
guarantee strictly binary-branching trees at P-structure.10
We state first the Head Rule. For this purpose, we define the notion
‘combines’ as follows: for α, β adjacent in a tree, ‘α combines with β’ means
that α is Chomsky-adjoined to β, preserving order, and the resulting branching
node is not labeled.

(ii) The old man left.


(Consider this as a response to the question Who did what?) As with (i), the grid method
apparently makes a false prediction that an upbeat should be possible on the. But (ii) raises a
more interesting problem: in the metrical grid constructed for it, the RPPR is violated unless
a pause appears between man and left. It strikes us, however, that such a pause is only
optional, not mandated. It is precisely these types of pauses that our approach is designed to
capture. In fact, our claim is that the appearance of such pauses is tied not to the accent
properties of the utterance, but to the P-structure. Thus a similar pause is also optionally
available in (iii).
(iii) The old man left.
10 We adopt the Pruning Convention to preserve the strict binary branching of prosodic
trees, in line with the Liberman–Prince framework.

(2) The Head Rule:
a. The head of a phrase combines with the highest right adjacent
c-commanded non-recursive node.
b. A specifier combines with its head.11
Even after the application of rule (2), instances of multiply-branching
nodes may remain. For such cases, we assume the Sisters Rule, by which
daughters of multiply branching nodes that are not analyzable by rule (2) are
grouped into pairs, beginning with the two rightmost daughters.
(3) The Sisters Rule: If α and β are sisters and neither is the output of the
Head Rule, combine α with β.
We will assume, for completeness, that rule (3) applies recursively to its
own output. We are not concerned here with structures in which the Sisters
Rule must crucially apply more than once, so this assumption of recursiveness
may ultimately be superfluous.
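The pairing procedure of the Sisters Rule can be sketched in code. This is only a toy model of rule (3): trees are nested lists, ‘x’ marks the unlabeled nodes added by combination, and the Head Rule itself, which requires category and c-command information, is not modeled:

```python
def sisters_rule(daughters):
    """Group the daughters of a multiply branching node into pairs,
    starting with the two rightmost daughters (rule (3)), applying
    recursively to its own output until at most two daughters remain."""
    while len(daughters) > 2:
        # combine the two rightmost daughters under an unlabeled node
        right_pair = ["x", daughters[-2], daughters[-1]]
        daughters = daughters[:-2] + [right_pair]
    return daughters

# A flat three-daughter NP: [NP [NP the man] [CONJ and] [NP the woman]]
flat = [["NP", "the man"], ["CONJ", "and"], ["NP", "the woman"]]
print(sisters_rule(flat))
# [['NP', 'the man'], ['x', ['CONJ', 'and'], ['NP', 'the woman']]]
```

As in the conjunction example discussed below, *and* combines with *the woman* before the result combines with *the man*.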

(a) The surface structure of Mary is dating my brother. (b) The result of
applying the Head Rule: is combines with dating, and my combines with
brother; ‘x’ marks the new unlabeled nodes.

Figure 4.4

11 We understand the term ‘head’ here to refer to the lexical head of a phrase, not a phrasal
head (cf. Jackendoff 1977). In addition, we define a recursive node X as one which dominates
another node Y, such that X and Y are identical in terms of syntactic-feature make-up and
number of primes.
(a) The surface structure of the VP send a book about Nixon to Mary.
(b) The result of applying the Head Rule: send combines with [x a book],
about with Nixon, and to with Mary.

Figure 4.5

(a) The surface structure of NP1 our talk with the president. (b) The result
of applying the Head Rule: our combines with talk, with combines with NP2,
and the combines with president.

Figure 4.6

(a) The surface structure of PP1 from towns in Germany, with P1, NP1, NP2,
PP2, P2, and NP3 labeled (cf. Figure 4.2). (b) The result of applying the Head
Rule: P1 from combines with NP2, and P2 in combines with NP3 (cf. Figure 4.3).

Figure 4.7

Figures 4.4–4.7 illustrate the application of the Head Rule. We use ‘x’ here
simply as a place-holder for new nodes that are added in the course of the
derivation.12
Note that, in Figure 4.6(b), P is not combined with det by the Head Rule.
The node NP2, in this case, is in fact the highest right-adjacent, c-commanded,
non-recursive node with respect to P. In this respect, Figure 4.6 contrasts with
Figure 4.5. In Figure 4.5, the NP object of V has essentially the structure of
Figure 4.6(b) after the application of the Head Rule. Subsequent application
of the Head Rule in the domain of the VP in Figure 4.5 forces V to combine with
[x det N], rather than with NP or N′, since both NP and N′ are recursive in
this case. Similarly, in Figure 4.7, P1 combines with NP2 rather than with NP1.
Figure 4.8 illustrates the application of the Sisters Rule.13

(a) The surface structure of the man and the woman. (b) The result of
applying the rules: the combines with man and with woman, and the Sisters
Rule combines CONJ and with the NP the woman.

Figure 4.8

12 Following Jackendoff (1977), we take V to be the head of S.
13 Examples like the following suggest either that the notion ‘c-command’ in (2a) is not
sufficiently strong, or that the Sisters Rule must apply before the Head Rule:
[NP [NP [N′ [N John]]] and [NP [N′ [N Mary]]]]
We will not pursue this matter further.

Our proposal differs from that of Selkirk (1978: 20), who suggests an
alternative algorithm for defining the mapping from syntactic structures to
P-structures:
(4) a. An item which is the specifier of a syntactic phrase joins with the head
of that phrase.
b. An item belonging to a ‘non-lexical’ category (cf. Chomsky 1965),
such as det, p(rep), comp, verbaux, or conj, joins with its sister
constituent.
The results contrast consistently with those of rule (2), above. Consider, for
example, the following:
(5) On a side street—in the Soho district—of London.
The application of (4) yields a binary-branching structure something
like Figure 4.9. But rule (2) will yield the binary-branching structure of
Figure 4.10.
Note that Figure 4.10 is consistent with our assumption that P-structure is
instrumental in the determination of phonological phrases, in the sense in
which we have defined them. That is, Figure 4.10 (but not Figure 4.9) yields an
appropriate constituent-structure characterization of the major options for
pauses in example (5).
Selkirk marshals empirical arguments in favor of having the structures
given by rule (4) define the domain of application of particular phonological
rules; these too can be applied essentially without revision to our proposal
(but cf. fn. 9, above). Given the broader range of applicability of rule (2), we
therefore prefer it over rule (4).

The binary-branching structure given by rule (4):
[on [[a sidestreet] [in [[the Soho district] [of London]]]]]

Figure 4.9

The binary-branching structure given by rule (2):
[[on a sidestreet] [[in the Soho district] [of London]]]

Figure 4.10

4.2.2 Accent placement


In assuming the Autonomous Systems view of Hale et al., we adopt the
position that accent placement is a formal matter, not sensitive to semantic
or pragmatic considerations—in contrast to the views expressed by Bolinger
(1972), Schmerling (1976), Ladd (1980), and Bing (1979). We do, however,
agree that certain aspects of accent placement have interpretive effect. To
incorporate this observation into our framework, we introduce an optional
rule of Strong Assignment (SA), which we assume to apply as the last rule in
the derivation of S-structures.
(6) strong assignment: Any node X → X:s.
We use the notation ‘X:s’ to signal that, in the translation into prosodic
structure, any node so marked must be translated as s in the corresponding
prosodic tree. In §4.3, on Focus Assignment, we will present rules for the
determination of focus which are sensitive to strong accents that have
been assigned by SA. As noted, assignments of s will be preserved by the
rules deriving P-structures. Following Liberman and Prince (1977), we can
identify the location of primary stress in a given sentence as the desig-
nated terminal element in the corresponding P-structure. Thus, to be
certain that an s placed by SA receives primary stress, we must guarantee
that it is ultimately associated with the designated terminal element.
To accomplish this, we propose a rule of Strong Propagation, which applies
to the trees that result from application of the Head Rule (2) and the
Sisters Rule (3).
(7) strong propagation: If a node Ni dominates a node Nj:s, then Ni is Ni:s.

To determine the relative values (s or w) of the remaining nodes in
structures derived by application of rules (2) and (3), we require an additional
set of rules, given below (the term ‘unspecified’ is taken to designate nodes
which have not yet been assigned a prosodic value).
(8) Weak Default: An unspecified sister of s is w.
(9) Neutral Accent Placement:14 [α β γ] → [α β s], where β, γ are
unspecified.
(10) If α is the root of the tree, α is R.
Thus, assuming SA to apply in the derivation of S-structures, the mapping
from surface structure to P-structure follows several distinct steps. First, the
Head Rule and the Sisters Rule apply to surface structures to determine
structures which are, at most, binary-branching. Then Strong Propagation
and rules (8)–(10) apply. To the output of these rules we apply a pruning
convention that deletes any remaining non-branching, non-terminal nodes.
The trees so derived are P-structures.15
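The joint effect of Strong Propagation, Weak Default, and Neutral Accent Placement on a binarized tree can be sketched as follows. The tree encoding and the top-down formulation are our own simplification, not the authors’ statement of the rules; a terminal carrying Strong Assignment is written ‘word:s’:

```python
def has_s(tree):
    """Strong Propagation (7): a node counts as s if it is, or
    dominates, a node marked :s."""
    if isinstance(tree, str):
        return tree.endswith(":s")
    left, right = tree
    return has_s(left) or has_s(right)

def assign(tree):
    """Label each pair of sisters s/w: a propagated s wins and its
    sister is w by Weak Default (8); if neither sister is specified,
    the right one is s by Neutral Accent Placement (9)."""
    if isinstance(tree, str):
        return tree.split(":")[0]
    left, right = tree
    ls, rs = has_s(left), has_s(right)
    if not ls and not rs:
        lv, rv = "w", "s"            # rule (9)
    else:
        lv = "s" if ls else "w"      # rules (7)-(8)
        rv = "s" if rs else "w"
    return [(lv, assign(left)), (rv, assign(right))]

# Neutral accents for 'from towns in Germany' (cf. Figure 4.3):
print(assign((("from", "towns"), ("in", "Germany"))))
# [('w', [('w', 'from'), ('s', 'towns')]),
#  ('s', [('w', 'in'), ('s', 'Germany')])]

# With Strong Assignment on 'towns', propagation makes its branch strong:
print(assign((("from", "towns:s"), ("in", "Germany"))))
# [('s', [('w', 'from'), ('s', 'towns')]),
#  ('w', [('w', 'in'), ('s', 'Germany')])]
```

Note that the sketch permits two s sisters when both branches contain assigned accents, as the modified relational theory requires for multiple primary stress.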
To illustrate the mapping, we offer Figures 4.11–4.13. In each case, (a) is the
surface structure; (b) is the result of applying the Head Rule and the Sisters
Rule; (c) is the result of Strong Propagation; and (d) is the P-structure that
results from applying rules (8)–(10) and the Pruning Convention. Since the
syntactic category information is presumably of no value in P-structure or
in the phonological component, we adopt the convention of substituting at
P-structure the s, w values of the category nodes for the nodes themselves.
Figure 4.11 illustrates John saw Mary in the park; Figure 4.12 illustrates John
sent a book about Nixon to Mary; and Figure 4.13 illustrates Bill believes that
John is marrying Sue.16

14 Neutral Accent Placement subsumes the effect of Liberman and Prince’s rule (8a).
15 Only at the level of P-structure do the assignments of s and w have any interpretive
value. Thus we do not consider our analysis to violate the strict definition of prosodic
trees given by Liberman and Prince, whereby a node in a prosodic tree can only be
interpreted in relation to its sister constituent. At the level of P-structure, where
this interpretive principle must be seen to hold, all prosodic nodes are defined in this
relation.
16 We suspect that this derivation gives evidence for an additional rule of types (2)–(3).
Specifically, it appears that comp should combine with the subject NP in the embedded
sentence of Figure 4.13(b). Such a rule might also be formulated so as to generalize to cases
like the following: This analysis is adopted in the discussion of wh-questions in }4.2.4. For
discussion of an additional class of cases such as these, see Selkirk (1972).
stress and focus in english 85

(a) The surface structure of John saw Mary in the park, with N Mary marked
:s by Strong Assignment. (b) The result of the Head Rule and the Sisters Rule.
(c) The result of Strong Propagation. (d) The resulting P-structure:
[R [w John] [s [s [w saw] [s Mary]] [w [w in] [s [w the] [s park]]]]]

Figure 4.11

4.2.3 Stress
As indicated in }4.2.2, the primary stress of a sentence is identified with the
designated terminal element. Traditionally, stress levels have been indicated
for English by the assignment of numerical values, in which ‘1’ generally
marks the position of primary stress, ‘2’ that of secondary stress, ‘3’ that of
tertiary stress, and so on. It has been a common criticism of the analysis of
English stress proposed by Chomsky and Halle that it results in implausibly
fine distinctions in perceived stress levels, by allowing for the assignment of
numerical values in excess of ‘4’ or ‘5’ (cf. Bierwisch 1968). The proposals of
both Liberman and Prince and of Liberman are intended to some extent to
overcome this criticism. However, it also applies to the algorithm which they
propose (Liberman and Prince 1977: (25a)) to relate accent placement to
perceived relative levels of stress:17

17 Liberman and Prince in fact take the position that assignment of stress levels is of little
linguistic interest. As the following discussion shows, we can make stress level a more interesting
notion by defining it in terms of domains of maximal stress prominence, where ‘domain’ has a
structural characterization.

Figure 4.12 [Tree diagrams for John sent a book about Nixon to Mary: (a)–(b) syntactic structures, (c) the S-structure with s-assignments, (d) the prosodic structure R]

(11) If a terminal node t is labeled w, its stress number is equal to the number
of nodes that dominate it, plus one. If a terminal node t is labeled s,
its stress number is equal to the number of nodes that dominate the
lowest w dominating t, plus one.
This will assign the stress levels in the following example to the P-structure of
Figure 4.13(d):

Figure 4.13 [Tree diagrams for Bill believes that John is marrying Sue: (a)–(b) syntactic structures, (c) the S-structure with s-assignments, (d) the prosodic structure R]

2 3 3 5 7 6 1
(12) Bill believes that John is marrying Sue
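Rule (11) is mechanical enough to state as a short procedure over metrical trees. The following sketch is ours, not the authors': the nested-pair encoding is an assumption, and the example values are computed for the P-structure of Figure 4.11(d) rather than for (12).

```python
# A sketch of rule (11), applied to a metrical tree encoded as nested pairs:
# a terminal is (label, word), a non-terminal is (label, [daughters]), and
# the root carries the label None. The encoding is ours, not the authors'.

# The P-structure of Figure 4.11(d): John saw Mary in the park.
FIG_4_11D = (None, [('w', 'John'),
                    ('s', [('s', [('w', 'saw'), ('s', 'Mary')]),
                           ('w', [('w', 'in'),
                                  ('s', [('w', 'the'), ('s', 'park')])])])])

def stress_levels(tree):
    """Rule (11): a w-terminal gets 1 + the number of nodes dominating it;
    an s-terminal gets 1 + the number of nodes dominating the lowest w
    that dominates it (1 if no w dominates it: the designated terminal)."""
    out = {}

    def walk(node, ancestor_labels):
        label, body = node
        if isinstance(body, str):                     # terminal
            if label == 'w':
                out[body] = len(ancestor_labels) + 1
            else:
                # an ancestor's index in the path = number of nodes above it
                w_depths = [d for d, a in enumerate(ancestor_labels) if a == 'w']
                out[body] = (w_depths[-1] + 1) if w_depths else 1
            return
        for daughter in body:
            walk(daughter, ancestor_labels + [label])

    walk(tree, [])
    return out

print(stress_levels(FIG_4_11D))
# → {'John': 2, 'saw': 4, 'Mary': 1, 'in': 4, 'the': 5, 'park': 3}
```

Even this six-word sentence receives five distinct levels, which illustrates the proliferation of stress distinctions that the replacement rule below is designed to curb.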

We propose to replace (11) with this:


(13) If a terminal node t is labeled w, its stress number is equal to the number
of nodes dominating t to the nearest P-cyclic node, plus one. If a terminal
node t is labeled s, its stress number is equal to the number of nodes
that dominate the lowest w dominating t to the nearest P-cyclic node,
plus one. In the domain defined by R, if a 1-stress is dominated by
w, add one.

We define the notion ‘P-cyclic node’ as follows:18


(14) A P-cyclic node is
a. any node in a P-structure that translates a syntactic S; or
b. any node in a P-structure that
(i) is not dominated (i.e., it is R), or
(ii) immediately dominates two branching nodes.
To illustrate, we apply (13) to each of the P-structures in Figures 4.11–4.13. For
convenience, these are reproduced as Figures 4.14–4.16, respectively, with the
P-cyclic nodes circled.
Rule (13) then yields the following stress levels:
(15) 2 3 1 3 4 2
a. John saw Mary in the park.
2 3 4 2 3 1 3 2
b. John sent a book about Nixon to Mary.
2 3 4 2 4 3 1
c. Bill believes that John is marrying Sue.
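Rules (13) and (14) are likewise computable. The sketch below is our own illustration: the Node encoding is an assumption, the from_S flag marks a node that translates a syntactic S (clause (14a)), the two trees are our reconstructions of Figures 4.14 and 4.16, and the final 'add one' clause of (13) is omitted since it plays no role in these two examples.

```python
# A sketch (ours) of rules (13)-(14). Terminals carry a word; every
# non-root node carries an 's' or 'w' label; from_S marks a node that
# translates a syntactic S, for clause (14a).

class Node:
    def __init__(self, label=None, children=(), word=None, from_S=False):
        self.label, self.word, self.from_S = label, word, from_S
        self.children = list(children)
        self.parent = None
        for c in self.children:
            c.parent = self

    def branching(self):
        return len(self.children) > 1

def p_cyclic(n):
    """Rule (14): n translates a syntactic S, is the undominated root R,
    or immediately dominates two branching nodes."""
    return (n.from_S or n.parent is None
            or sum(1 for c in n.children if c.branching()) >= 2)

def count_to_cyclic(n):
    """Nodes dominating n, up to and including the nearest P-cyclic node."""
    count, a = 0, n.parent
    while a is not None:
        count += 1
        if p_cyclic(a):
            break
        a = a.parent
    return count

def stress(t):
    """Rule (13) for a terminal t (omitting the final 'add one' clause)."""
    if t.label == 'w':
        return count_to_cyclic(t) + 1
    a = t.parent                   # t is s: find the lowest dominating w
    while a is not None and a.label != 'w':
        a = a.parent
    return 1 if a is None else count_to_cyclic(a) + 1

def stress_pattern(root):
    out, stack = {}, [root]
    while stack:
        n = stack.pop()
        if n.word is not None:
            out[n.word] = stress(n)
        stack.extend(n.children)
    return out

def T(label, word):
    return Node(label, word=word)

# Figure 4.14: John saw Mary in the park
fig_4_14 = Node(children=[
    T('w', 'John'),
    Node('s', [Node('s', [T('w', 'saw'), T('s', 'Mary')]),
               Node('w', [T('w', 'in'),
                          Node('s', [T('w', 'the'), T('s', 'park')])])])])

# Figure 4.16: Bill believes that John is marrying Sue; the node
# translating the embedded S carries from_S.
fig_4_16 = Node(children=[
    T('w', 'Bill'),
    Node('s', [T('w', 'believes'),
               Node('s', [T('w', 'that'),
                          Node('s', [T('w', 'John'),
                                     Node('s', [Node('w', [T('w', 'is'),
                                                           T('s', 'marrying')]),
                                                T('s', 'Sue')])],
                               from_S=True)])])])

print(stress_pattern(fig_4_14))  # values match (15a): John 2, saw 3, Mary 1, in 3, the 4, park 2
print(stress_pattern(fig_4_16))  # values match (15c): Bill 2, believes 3, that 4, John 2, is 4, marrying 3, Sue 1
```

Note that the sketch reproduces the compression of levels relative to rule (11): counting restarts at each P-cyclic node, so no terminal here exceeds level 4.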

4.2.4 Wh-constructions
We have now defined in broad outline a system of rules associating instances of
primary stress with focus. Focus Assignment, and the resulting interpretation,
will operate on S-structure to which SA has applied (see §4.3). The rules involved
in the mapping from surface structure to P-structure preserve the s assignments,

Figure 4.14 [The P-structure of Figure 4.11(d) for John saw Mary in the park, with the P-cyclic nodes circled]

18 It is intuitively clear that limiting the application of (13) to domains defined by S is similar in effect to a constraint such as the Binary Principle (BP) of Wexler and Culicover (1980). That principle will not apply to (13) directly, however, because BP refers to the B-cyclic nodes NP and S′. These node labels do not correspond to the P-cyclic nodes in P-structure. In fact, in many cases no node in P-structure corresponds to a syntactic NP or S′. Nevertheless, it would be worth investigating whether a constraint like the BP could be independently motivated from considerations of learnability in the domain of prosodic structure.

Figure 4.15 [The P-structure of Figure 4.12(d) for John sent a book about Nixon to Mary, with the P-cyclic nodes circled]

Figure 4.16 [The P-structure of Figure 4.13(d) for Bill believes that John is marrying Sue, with the P-cyclic nodes circled]

and guarantee that these constituents will contain the primary stress of the
sentence. Since we assume SA to be optional, we will relate the results of its
application to multiple-focus interpretations as well (§4.4). We assume that, in
the interpretation (perhaps at LF), all sentences must specify a focus—though not
necessarily only as a result of applying SA. We thus predict that sentences will exist
in which the location of primary stress given by the rules of accent placement will
not coincide with the constituent that functions as focus, if the focus constituent
is determined by a rule other than that which is sensitive to the application of SA.
This prediction is borne out. Thus a sentence like (18), with primary stress
on buy, can be used just as easily in a context like (16), in which case it cannot
be said that buy is focus, as it could in a context like (17), in which case it
can be said that buy is focus:19

19 Bolinger has pointed out (p.c.) that our analysis predicts that (i) and (ii), below, should be
possible in the same contexts, without a difference of interpretation vis-à-vis focus:

(16) A: I finally went out and bought something today.

(17) A: Bill took me downtown to all the big department stores today.
2 1
(18) B: Oh yeah? What did you buy?
It has been repeatedly suggested (cf. Gunter 1966; Horvath 1979; Rochem-
ont 1978) that wh-words function naturally as focus constituents of construc-
tions in which they appear. In the context defined by (16), the location of
primary stress in (18) does not coincide with the focused constituent. It appears,
then, that the wh-focus does not always coincide with a primary stress (this fact
is also noted by Gunter). It should be emphasized that we do not view these
cases as ‘preferred’ to or ‘more normal’ than other occurrences of stress. In fact,
what we will suggest is that such instances are derived by the same set of accent
placement rules that we have just proposed. Note that, in (18), stress is
rightmost in the sentence, in the position predicted by Neutral Accent Place-
ment (cf. rule (9)). This is a natural consequence of our analysis, since we claim
that SA has not applied in the derivation of (18). Consideration of additional
wh-questions bears this out:
2 1
(19) a. Who was talking to Bill?
2 1
b. Which girl did John meet in Rome?
2 1
c. Who decided to leave early?

All these sentences allow an interpretation in which only the wh-word is focused, even though it does not bear primary stress. Note also that no other placement of primary stress, except on the wh-specifier itself, allows such an interpretation.20

(i) What did you buy?
(ii) What did you buy?
We agree that, in the context defined by (16), above, (i) but not (ii) carries an implicature that the speaker is disputing the truth of the proposition ∃x(You bought x). We believe this implicature to be tied not to the focal properties of the utterance—which are the same as (ii) on the relevant interpretation—but to the peculiar properties of stressed wh-words in English. This conventional implicature is reminiscent of one typically associated with stressed wh-words in echo questions. On this view, echo questions have the same focus structures as their corresponding information questions, but differ in terms of the conventional implicatures they carry. Consider also the following:
(iii) What did you buy?
This, unlike (i) and (ii), is not consistent with the context of (16). Under our analysis, (iii) must be treated as an instance of multiple foci, since the formalism allows no alternative. (That is, Neutral Accent Placement cannot be responsible for the nuclear accent on buy.) Thus (iii) is predicted to occur naturally in a context like the following:
(iv) John walks in carrying a huge gift-wrapped box, and Mary exclaims: ‘what did you buy?’
The sentences in (20) present apparent counterexamples to this analysis. In
each of them, the accent falls on the penultimate word. As in (18) and (19), our
analysis predicts that, on the intended interpretation—i.e. where only the wh-
word is focused, and the word that receives primary stress is not—stress
should fall on the final word, as in (21):
2 1
(20) a. Who is Bill sleeping with?

2 1
b. What kind of creature do you think we’re up against?

2 1
c. Which seat was she sitting in?

2 1
d. What will you talk about?

2 1
e. What are you looking at?

20 To see that it is stress on a wh-specifier (and not a full wh-phrase) that yields this
possibility, consider these sentences:
1 2
(i) How many soldiers did you meet?
1 2
(ii) How many soldiers did you meet?
2 1
(iii) How many soldiers did you meet?

Clearly, soldiers is focused in (i) and not in (iii). Correspondingly, (i) and (iii) cannot be used
interchangeably in any context without affecting the focus properties of the utterance in
question. But (ii) and (iii) can be used interchangeably:
(iv) A: I’m so excited! Tom took me down to Buckingham Palace today and I got to meet all
those soldiers.

B: Oh, really?!
1 2
How many soldiers did you meet?
2 1
How many soldiers did you meet?
1 2
How many soldiers did you meet?

2 1
(21) a. Who is Bill sleeping with?

2 1
b. What kind of creature do you think we’re up against?

2 1
c. Which seat was she sitting in?

2 1
d. What will you talk about?

2 1
e. What are you looking at?

Note, however, that (21a–e) allow interpretations exactly equivalent to those of (20a–e) in certain contexts; the wh-word is focused, though not primary-stressed, and the stressed word is not focused. For example, (20d) and (21d)
can be used interchangeably in the context below:
(22) A: I’ve just been asked to talk about something at a conference next month.
B: Oh, really?!
2 1
(20d) What will you talk about?
2 1
(21d) What will you talk about?
Neutral Accent Placement predicts the accent placements of (21d) but not (20d).
Our theoretical framework assumes that wh-Movement leaves a trace in its
sentence-internal position that is subsequently visible to filters, and to the
phonological rule of to-contraction, in a way that other phonologically null
elements, e.g. PRO, are not (cf. Chomsky and Lasnik 1977; Jaeggli 1980). In
line with these proposals, let us assume that the relevant traces of wh- in the
surface structures of (18)–(21) are present in the associated P-structures.
Thus (18) is taken to have the P-structure of Figure 4.17(a), and (21d) the P-structure of Figure 4.17(b) (cf. fn. 16 above).
We make the natural assumption that P-structures like these are to be
excluded: clearly the primary stress of the associated sentences cannot fall
on a phonologically null element.21 As a mechanism to exclude such cases, we
adopt an obligatory rule:

21 In addition, given the rules for interpreting focus, an s on a null element does not receive an interpretation. This is because such an element, being bound by some other constituent, cannot at the same time support an interpretation as new or contrastive information (cf. §3).

Figure 4.17 [(a) the P-structure of (18) What did you buy?; (b) the P-structure of (21d) What will you talk about?; e is the trace of what]

(23) Switch Rule: [s1 w [s . . . e]] ⇒ [s1 s [w . . . e]], where [s . . . e] is a designated terminal element, and s1 is not a P-cyclic node and does not dominate a P-cyclic node.
This applies to Figure 4.17(a) to yield Figure 4.18. Note, however, that rule (23)
applies ambiguously to Figure 4.17(b), which has two relevant non-P-cyclic nodes that dominate e. These are the boxed nodes indicated in Figure 4.19.
Rule (23), together with Neutral Accent Placement, predicts all the accent
placements in (19)–(21) on the intended interpretations.
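The conditions on rule (23)—that the target node be non-P-cyclic and dominate no P-cyclic node—can be checked mechanically. The sketch below is our own: the nested-pair tree encoding and the names are assumptions, 'e' stands for the trace, and only clause (14b) of the P-cyclic definition is implemented, since neither example contains an embedded S.

```python
import copy

# Trees as nested pairs (our encoding): a terminal is (label, word), a
# non-terminal is (label, [daughters]); the root R has the label None.

FIG_17A = (None, [('w', 'what'),                 # What did you buy?
                  ('s', [('w', [('w', 'did'), ('s', 'you')]),
                         ('s', [('w', 'buy'), ('s', 'e')])])])

FIG_17B = (None, [('w', 'what'),                 # What will you talk about?
                  ('s', [('w', [('w', 'will'), ('s', 'you')]),
                         ('s', [('w', 'talk'),
                                ('s', [('w', 'about'), ('s', 'e')])])])])

def branching(node):
    return isinstance(node[1], list) and len(node[1]) > 1

def p_cyclic(node):
    # clause (14b-ii) only; the root is handled separately in switch_sites
    return isinstance(node[1], list) and sum(map(branching, node[1])) >= 2

def dte(node):
    """Designated terminal element: follow the s daughters down."""
    label, body = node
    while isinstance(body, list):
        label, body = next(d for d in body if d[0] == 's')
    return body

def switch_sites(tree):
    """Paths (daughter indices from the root) at which rule (23) may apply."""
    sites = []

    def walk(node, path, is_root):
        label, body = node
        if not isinstance(body, list):
            return False            # does this subtree contain a P-cyclic node?
        cyclic_here = is_root or p_cyclic(node)
        cyclic_below = False
        for i, daughter in enumerate(body):
            cyclic_below |= walk(daughter, path + [i], False)
        if ([d[0] for d in body] == ['w', 's'] and dte(body[1]) == 'e'
                and not cyclic_here and not cyclic_below):
            sites.append(path)
        return cyclic_here or cyclic_below

    walk(tree, [], True)
    return sites

def apply_switch(tree, path):
    """[s1 w [s ... e]] => [s1 s [w ... e]]: swap the daughters' labels."""
    new = copy.deepcopy(tree)
    node = new
    for i in path:
        node = node[1][i]
    (_, w_body), (_, s_body) = node[1]
    node[1][0], node[1][1] = ('s', w_body), ('w', s_body)
    return new

print(switch_sites(FIG_17A))          # → [[1, 1]]
print(sorted(switch_sites(FIG_17B)))  # → [[1, 1], [1, 1, 1]]
```

Applying apply_switch at the single site of FIG_17A yields the P-structure of Figure 4.18, with buy strong and e weak; the two sites found in FIG_17B correspond to the boxed nodes of Figure 4.19, and choosing between them places the accent on either talk or about.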

P-structures like those in Figure 4.17 can be derived in two ways: by Weak Default (rule 8), or
by assigning s in S-structure to the trace of wh. Given that a wh-phrase is by definition a focus,
the first option will yield an acceptable derivation even though there is no stress focus. The
second option is independently necessary, so that we may derive the correct prosodic structure
when the trace of wh is not on a rightmost branch; e.g. What did you do yesterday?
If we allow the structure [VP V [NP e:s]],
then the question arises as to whether both the wh and the VP that contains its trace can be focus.
We cannot discover any plausible interpretation of such a FA, and will therefore provisionally
adopt the convention that a trace within a focus constituent cannot be bound from outside. Such a
convention may extend naturally to rule out the following cases:
(a) Extraction from the fronted wh-phrase or topic: *Whoi did you wonder [whose picture of
ti]j John stole tj?
(b) Extraction from the focus of a cleft: *Whoi was it [a picture of ti]j that John stole tj?
(c) Extraction from the focus of a pseudo-cleft: *Whoi was what John stole [a picture of ti]?
For a discussion of extraction from focus in pseudo-clefts, see Culicover (1977), where a
somewhat different approach to the one suggested here is adopted. Delahunty (1981) suggests
that such cases may be handled by a constraint which blocks extraction from antecedents.

Figure 4.18 [The result of applying the Switch Rule to Figure 4.17(a): buy is now s and the trace e is w]

Figure 4.19 [Figure 4.17(b), with the two relevant non-P-cyclic nodes that dominate e boxed]

There is some reason to believe that the examples in (21) may have a
different S-structure from those in (20) (assuming our SA rule); if so, our
statement of the Switch Rule would be too broad. Crucially, the examples in
(21) cannot be used when the mutual beliefs on which they bear have not been
asserted; e.g. (21d) would not be appropriate if you had not said that you were
going to talk. Similarly, we could walk up to someone and say (20e), but not
(21e). It might be appropriate to derive (20a–e) from S-structures in which s
falls on the verb, in which case the verb would be focus. In (21a–e), s would
appear on the trace of wh, and the Switch Rule would move it onto the
preposition. Viewing the examples in (20) along the lines just suggested does
not appear to wreak havoc on the view of focus interpretation sketched below;
however, we will leave the question of the correct analysis of these examples
undecided for the present. For completeness, we indicate the form of the
alternative Switch Rule here.
(24) Switch Rule (Alternative): [s1 w [s e]] ⇒ [s1 s [w e]], where [s e] is a designated terminal element.
The analysis using (23) makes accurate predictions in a number of unexpected
cases:

2 1
(25) a. Which number did you look up?
2 1
b. Who did John send Mary to?
Here nuclear stress in S can fall only on the final word, if the interpretation
with which we are concerned is to be preserved. Specifically, penultimate
stress will not preserve this interpretation:
2 1
(26) a. Which number did you look up?
2 1
b. Who did John send Mary to?
The P-structure which our rules assign to the surface structure of (25b), in
which no s has been assigned by SA, is shown in Figure 4.20.
In this P-structure, rule (23) can analyze only a single non-P-cyclic node;
thus it is correctly predicted that stress must fall on the preposition in this
derivation, unlike Figure 4.17(b) above. Example (25a) is slightly more com-
plex, since we assume this sentence to be associated with two well-formed
surface structures on the intended interpretation, given in Figure 4.21.
From Figure 4.21(a,b), our rules determine the structures of Figure 4.22
(a,b), respectively. Rule (23) does not apply in Figure 4.22(a), since e is not
immediately dominated by s. But it does apply to Figure 4.22(b), yielding
Figure 4.23.
In the P-structures of Figures 4.22(a) and 4.23, the final preposition is
a designated terminal element; thus it is predicted that this word receives
primary stress in the sentence, regardless of where e appears in the verb
phrase.

Figure 4.20 [The P-structure of (25b) Who did John send Mary to?, in which no s has been assigned by SA; e is the trace of who]

Figure 4.21 [The two surface structures of (25a) Which number did you look up?: in (a) the trace e precedes the particle up; in (b) it follows]

Figure 4.22 [The P-structures (a) and (b) derived from the surface structures of Figure 4.21(a, b), respectively]

Figure 4.23 [The result of applying rule (23) to Figure 4.22(b); the particle up is now the designated terminal element]

Figure 4.24 [The P-structure of to talk about e, with to a sister of talk]

4.2.5 Cliticization
The Head Rule and the Sisters Rule, which map from S-structure into P-structure,
do not account for all the possible stress patterns and phrasings of English. In
addition (cf. also fn. 16 above), a class of cliticizations must be stipulated.
Consider the following, which is parallel to (21) and Figure 4.17:
(27) What are you going to talk about?
The problem that we face with going to concerns the proper attachment of to.
If to is an aux in [s pro to talk about e], the Head Rule will make it a sister of
talk in P-structure, as in Figure 4.24.
The Switch Rule (23) will put s only on about, because to talk is branching,
and so (28a) will never be derived:

2 1
(28) a. What are you going to talk about?
2 1
b. What are you going to talk about?
Suppose, however, that to is a surface-structure daughter of the VP, as in Figure 4.25(a). Applying the Head Rule to V and PP will then yield Figure 4.25(b).
If the Switch Rule puts s on about, the resulting P-structure will be
Figure 4.26. This structure predicts a higher stress level on to than on talk,
because they are both w, and talk is further from the closest P-cyclic node.22
To resolve this problem, we cliticize to to going after the Head and the
Sisters Rules have applied, by a rule that we will call Cliticization. (Selkirk 1972
proposes cliticization rules for English which incorporate certain clitics into

Figure 4.25 [(a) the VP with to a daughter of VP, above talk about e; (b) the result of applying the Head Rule to V and PP]

22 This result does not obtain if talk about e is in a different S than what are you going to. In
such a case, the lowest P-cyclic node above talk would be different than the lowest P-cyclic node
above to. Such a structure does not seem plausible to us.

Figure 4.26 [The P-structure that results if the Switch Rule puts s on about: to and talk are both w, with talk further from the closest P-cyclic node]

Figure 4.27 [The surface structure of (27) after Cliticization: to is a daughter of the terminal node going, so that going to forms a unit under V]

the preceding word, as does our rule.) In order for our prosodic rules to work
correctly, the clitic must in fact be made a daughter of the terminal node
dominating the item to which the clitic is attached, as we will see below. The
result of applying Cliticization to (27) is given in Figure 4.27.23
Note that the output of Cliticization here makes perfect sense in defining
the context in which gonna is derived from going to. Suppose that to in the
infinitive is not in aux in surface structure, but is as we have illustrated it
in Figures 4.25(a) and 4.27. The Head Rule does not apply to this to. In
the VP going to the store, however, to is the head of PP, and is mapped into
a P-structure phrasal unit, as in Figure 4.28. Because of the ungrammaticality
of *I’m gonna the store, it is plausible to assume that nodes like PP or x in
Figure 4.28 are frozen with respect to later rules like Cliticization.24

23 The non-branching VP may be pruned here, or may be pruned in the P-structure by the
Pruning Convention. We see no reason not to assume that the latter convention does the job.
24 The similarity of this assumption to the Freezing Principle of Wexler and Culicover (1980)
is obvious. However, we do not know whether there is a principled generalization over the two
(see fn. 18 above).

Figure 4.28 [The VP going to the store: to is the head of PP and forms a P-structure phrasal unit x with the following NP]

Selkirk (1972: §3.2) has an extensive treatment of clitics in which pronouns are incorporated into a single word with a verb or a preposition, and we will not
repeat her analysis here. A point worth noting is that, although an unstressed
pronoun must be cliticized, an uncliticized pronoun must be stressed, and will
hence be focused:25
(29) a. I sent him to Mary.
b. I sent him to Mary.
c. *I sent Mary him.
d. I sent Mary him.
The pronoun in (29a) may have an anaphoric reading, while that in (29b) and
(29d) may not. Example (29c) is ruled out as a consequence of the fact that
Cliticization cannot apply to the unstressed pronoun. Given the appropriate
mechanism for designating (29c) as ill-formed (say, the filter *#[+Clitic]#),
the interpretive possibilities in (29) will be explained by a theory of focus
interpretation, which we take up in §4.4.
To conclude this discussion of clitics, let us reconsider Selkirk’s proposal
that clitics form a phonological unit with the elements to which they
are attached. Example (30) can be stressed in two ways, with stress either on
give or on to:
(30) A: I gave the book to someone.
B: Who did you give it to?
B′: Who did you give it to?

25 Bolinger has pointed out to us that this statement must be qualified. A pronoun that is
initial in a conjoined structure may be unstressed, yet uncliticized: As for John, I don’t mind
sending him or his brother to Mary.

The two patterns are derivable by the Switch Rule, but only if give it is a
phonological unit. A full NP direct object allows stress only on to, if the
intended interpretation is to be preserved:
(31) A: I gave the book to someone.
B: Who did you give the book to?
B′: Who did you give the book to?
As we see, give it behaves prosodically not like give the book, but like talk—as
in (22), where the Switch Rule was motivated.

4.3 Assignment of focus


It is well known that a given sentence with a given stress pattern may have a
variety of interpretations with respect to the assignment of focus. This was
perhaps first noted in the generative grammar literature by Chomsky (1971),
who pointed out that a sentence like (32) may answer a variety of questions, as
illustrated in (33). The bracketing in (33) indicates that part of the sentence
which answers the corresponding question:
(32) [John [gave the ice cream to [the [old [man]a ]b ]c ]d ]e
(33) a. Did John give the ice cream to the old woman?
b. Did John give the ice cream to the boy?
c. Who did John give the ice cream to?
d. What did John do?
e. What happened?
It appears to us that the most natural way to relate the location of stress in
the sentence to the assignment and interpretation of focus is essentially to
map, on the one hand, from S-structure into the representation of focus—and
thence into the interpretation of the focus—and, on the other hand, from
S-structure to Prosodic Structure.26 As will be discussed in greater detail in
§4.5, any framework in which the assignment of stress is determined by
properties of the discourse and the interpretation becomes burdened with
complexity. Moreover, such a framework rather seriously violates the Autono-
mous Systems view of Hale et al. (1977), which we are assuming here as a
methodological guide.

26 That is, there is a mapping from S-structure to P-structure; but it must be mediated by
other components of the grammar, e.g. stylistic and deletion transformations (cf. Chomsky and
Lasnik 1977).

In its essentials, the system for assignment of focus is very simple. Given an
S-structure to which SA has applied, any node with s assigned to it defines a
focus constituent.27 In assigning focus, we map the syntactic representation
into another level of representation in which the focus constituent is explicitly
identified: we call this level ‘F-structure’.28 In the interpretation of focus, this
latter level of representation is related to properties of the discourse and to the
context in general.29
In this section we will sketch the details of the mapping from S-structure to
F-structure. In §4.3.1, we suggest a formal notation for the representation of focus. In §4.3.2, we discuss the cases of apparent ambiguity of focus first noted
in Chomsky (1971).

4.3.1 The formal representation of focus


Our representation of focus must have the property that the focus constituent is
formally isolated from the rest of the structure in which it appears in S-structure.
If we map S-structures into lambda-notation to represent the assignment of
focus, we obtain expressions like the following:
(34) [λx(Mary loves x)] (John)
Here John is the focus, and the original sentence might have been Mary loves
John, with heavy stress on John.
There is good reason, however, to configure focus structure much like the
surface structure of English, rather than reduce it to something along the lines

27 We will later introduce two more stress markings to designate marked intonations: ‘s!’
represents an emphatic intonation contour, while ‘s?’ represents an echo intonation contour.
Our claim is that the intonation contour functions orthogonally to focus assignment (cf. fn. 3
above). We speculate that the focal phrases, as defined by assignment of s, are instrumental in
determining the domains to which intonation contours are assigned in P-structure.
28 We leave open the question of whether F-structure is to be identified with LF, since some of
the F-structures which FA derives violate the Empty Category Principle (ECP) of Chomsky
(1981a), Jaeggli (1982), and Kayne (1981a). One way to overcome this difficulty would be to allow
FA to apply on LFs, yielding a level of representation distinct from LF and not subject to the
ECP, taken as a condition on LF. For a number of reasons, however, we do not adopt this
alternative to the grammatical model of Figure 4.1—in particular, because this would not solve a
similar problem for the ECP which arises in connection with the interpretation of quantified
expressions in general. It is our view that these two problems may comprise a unified difficulty
for the ECP.
29 It should be apparent that the S-structure abstract marker functions to mediate between
the prosodic and the focus structures. In this respect it is similar to the deep-structure markers
Q and I of Katz and Postal (1964), which were intended to mediate between surface structure
and semantic interpretation. In each case, the marker ensures that the correlation is maintained
between the two levels of representation that are defined independently in terms of the abstract
syntactic representation, be it deep structure or S-structure.

of the traditional predicate calculus. In the latter, the verb would be a relation
of two variables, and (34) would be replaced by the following:
(35) [λx(loves(Mary, x))] (John)
Because the focus constituent may be a VP, an aux (or at least a modal), or
any of various expressions whose syntactic category is Xi for arbitrary i,
expression (35) would prevent us from expressing all the possible focus
constituents: e.g. it does not contain the node VP, or any counterpart to it.
Our F-structure, then, will be essentially the S-structure of English, with
variables introduced as required. For convenience, we will use lower-case
node labels for the variables, so that (34) will appear as follows:
(36) [np (Mary loves np)] (John)
The similarity between this representation and that provided by trace theory
for cases of wh-Fronting is striking—though in the absence of a detailed
theory of LF and its relationship to focus, nothing significant can be attrib-
uted to minor notational details (but cf. (28)). An equivalent way of express-
ing (36) might be (37), where the similarity to trace theory is even closer:30
(37) Johni [Mary loves npi]
We will settle for now on this last representation.
Faced only with simple examples like (34), we could formulate the Focus
Assignment (FA) rule as a ‘movement’ transformation: one which indexes some
strong constituent in the sentence, extracts it, and leaves behind a coindexed
variable of the appropriate syntactic type. We are assuming that, in S-structure,
any node may be marked with s. Let us also assume that every node in the tree
has a unique index (including terminals). For a node X, the index of X is i(X),
and Type(X) is the syntactic type of X. We then state the following rule:
(38) Focus Assignment: Let X be an arbitrary node, and let α be the highest node in the tree. If X is s, then FA(α) is the result of appending X to α and replacing X with a dummy whose index is i(X) and whose syntactic type is Type(X), i.e. [α Xi [α . . . ti . . . ]].

30 On the standard interpretation, the λ representation expresses certain logical properties, while (37)—as well as what is referred to by linguists as LF (cf. Chomsky 1981a)—has no self-evident interpretation. Thus the latter must be assigned a logical interpretation, or else be
translated into a standard representation, e.g. the lambda calculus. We will assume for present
purposes that such an interpreted translation for our formulas is available.

As indicated above, we refer to the structure that results from FA as ‘F[ocus] structure’. We may informally identify the focus constituent as binding a variable
in F-structure. (Of course, there may be more than one such constituent.)
Let us see how FA applies to (32), which has the structure of Figure 4.29
prior to the application of SA.
Applying FA to Figure 4.29 yields the F-structures in (39) (corresponding to
the sentences in (33)), assuming prior assignment of s (by SA) to the appro-
priate nodes in Figure 4.29:31
(39) a. [man]14 [John gave the ice cream to the old n14]
b. [old man]12 [John gave the ice cream to the n′12]
c. [the old man]10 [John gave the ice cream to np10]
d. [gave the ice cream to the old man]3 [John vp3]
e. [John gave the ice cream to the old man]1 [s1]
For each sentence, then, we can define a set of F-structures, with membership ≥ 1. We will refer to this set as the ‘focal range’ of the utterance, along the lines of Sperber and Wilson (1979).

Figure 4.29 [The indexed structure of (32): [S′ COMP [S1 [NP2 John] [VP3 [V4 gave] [NP5 [DET6 the] [N7 ice cream]] [PP8 [P9 to] [NP10 [DET11 the] [N′12 [ADJ13 old] [N14 man]]]]]]]]
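Focus Assignment as stated in (38) can be sketched as a short procedure over the indexed structure of Figure 4.29. The encoding and the string rendering below are our own assumptions; the dummy is rendered as a lower-case category label with the index of the extracted node.

```python
# A sketch (ours) of Focus Assignment (38), run over the indexed structure
# of Figure 4.29. A node is (category, index, body), where body is either
# the terminal's word string or a list of daughters.

S_STRUCTURE = \
    ('S', 1, [('NP', 2, 'John'),
              ('VP', 3, [('V', 4, 'gave'),
                         ('NP', 5, [('DET', 6, 'the'), ('N', 7, 'ice cream')]),
                         ('PP', 8, [('P', 9, 'to'),
                                    ('NP', 10, [('DET', 11, 'the'),
                                                ("N'", 12, [('ADJ', 13, 'old'),
                                                            ('N', 14, 'man')])])])])])

def words(node):
    cat, i, body = node
    return body if isinstance(body, str) else ' '.join(words(d) for d in body)

def focus_assignment(root, focus_index):
    """Rule (38): append the s-marked node X to the root and replace it in
    situ with a dummy of the same index and syntactic type, i.e.
    [a Xi [a ... ti ...]]."""

    def replace(node):
        cat, i, body = node
        if i == focus_index:               # the dummy left behind in situ
            return f'{cat.lower()}{i}', node
        if isinstance(body, str):
            return body, None
        parts, focus = [], None
        for daughter in body:
            text, found = replace(daughter)
            parts.append(text)
            focus = focus or found
        return ' '.join(parts), focus

    residue, focus = replace(root)
    return f'[{words(focus)}]{focus_index} [{residue}]'

print(focus_assignment(S_STRUCTURE, 10))
# → [the old man]10 [John gave the ice cream to np10]
print(focus_assignment(S_STRUCTURE, 3))
# → [gave the ice cream to the old man]3 [John vp3]
```

The two outputs correspond to the F-structures (39c) and (39d); extracting the node with index 14 yields (39a). Here s-assignment is simulated simply by naming the index of the s-marked node.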

31 Example (39e) is given for completeness: it is not obvious how FA should apply if the candidate focus constituent is the entire S′. Apparently, the FA will apply to S in the configuration [S′ comp S], giving an F-structure Si [S′ comp si]. But if comp is filled, the constituent in comp will be focus; hence this F-structure will be impossible, by the convention of fn. 21. If comp is Ø, it seems to us that FA is, properly speaking, inapplicable—in that its intended function is to extract a focus constituent from its surrounding structure; and such a structure is absent. In fact, it is not clear that a Ø-comp is syntactically expressed in S-structure.

4.3.2 Some applications of focus assignment


In order to investigate the assignment of focus in real examples, we will
require access to the interpretation of focus—since the latter depends cru-
cially on the former, and we have only intuitions about the latter. Here it will
only be necessary to observe that, in some instances of focus, an indefinite NP
can be used to introduce a new individual into the discourse (so-called
‘presentational focus’); but sometimes a focus constituent cannot have this
function. Consider the following examples:
(40) a. John gave [a picture] to Mary.
b. John gave [a picture of Susan] to Mary.
c. John gave [a picture of Susan] to Mary.
(41) a. John brought [a book] into the room.
b. John brought [a book about linguistics] into the room.
c. John brought [a book about linguistics] into the room.
(42) a. John put [a glass] on the table.
b. John put [a glass of water] on the table.
c. John put [a glass of water] on the table.
Our intuition is that the (a) and (b) examples can have a presentational focus
interpretation for the bracketed phrases, but the (c) examples may not.
In order for NP to have a presentational interpretation, it must be the
focus. The difference between the (c) and (b) examples is that, in the latter,
the stress peak falls on the right branch of NP. Apparent ambiguity of focus
occurs when SA assigns s to a non-terminal right-branching node. In a right-
branching structure, such assignment of s yields a P-structure with a chain of
s’s along a right branch, because of Neutral Accent Placement (rule (9)).
However, an s on a right branch yields a chain of s’s above it, because of
Strong Propagation (rule (7))—regardless of the position on the chain that
the original s is given by Strong Assignment. Therefore, assignment of s by
SA to different nodes on a right-branching path will yield different focal
structures, but the same P-structure. In the (c) examples, by contrast, s is
assigned to a left branch; therefore no ambiguity of focus results.
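The interaction described in this paragraph can be sketched computationally. The following toy model is ours, not the authors'; the class, the function, and the encodings of Neutral Accent Placement (rule (9)) and Strong Propagation (rule (7)) are hypothetical simplifications:

```python
# A toy model (ours, not the authors'): why Strong Assignment (SA) of s to
# different nodes on a right-branching path yields one P-structure but
# several focal structures.

class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)   # the last child counts as the right branch

def stress_peak(sa_target):
    # Simplified rule (9): once SA marks a node s, the s-chain runs down its
    # rightmost path to a terminal, which surfaces as the stress peak.
    # (Rule (7) builds the matching chain of s's *above* a right-branch s;
    # it never relocates the peak, so it is omitted here.)
    node = sa_target
    while node.children:
        node = node.children[-1]
    return node.label

# [NP a picture [PP of Susan]] -- a right-branching NP, as in (40)
susan = Node('Susan')
pp = Node('PP', [Node('of'), susan])
picture = Node('picture')
np = Node('NP', [Node('a'), picture, pp])

# SA to NP, to PP, or to Susan itself: the same peak, hence one P-structure
# covering several candidate foci -- the apparent ambiguity of focus.
assert stress_peak(np) == stress_peak(pp) == stress_peak(susan) == 'Susan'

# SA to a left branch (picture) puts the peak elsewhere, so no ambiguity:
# the (c) pattern of (40)-(42).
assert stress_peak(picture) == 'picture'
```

The chained assertion mirrors the point in the text: only the position of the peak is recoverable from the P-structure, not which node SA originally marked.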
Thus far we have not discussed the domain of FA. It appears that all extracted
foci are attached at the same level of structure, so that we have a natural
representation for multiple foci. Thus we have required that α be the root
node (or a root node, in the case of conjoined roots). We have no evidence to
suggest that stress focus may have a domain other than the entire S.
stress and focus in english 105

4.4 The interpretation of focus


It is often thought that several different kinds of focus (or stress) exist: presen-
tational, contrastive, emphatic, etc. As we have already suggested, ‘contrastive
stress’ does not in fact designate a unique stress pattern, but must be understood
(in our framework at least) as referring to focus used contrastively. Along
related lines, an expression like ‘contrastive focus’ for us does not designate
a distinct representation in S-structure or F-structure, but again refers simply
to focus used contrastively.
In our approach, then, focus is a formal property of sentences: its character-
ization is independent of either its particular interpretation in some context, or
the conditions under which it may have a specific interpretation. We will
therefore assign to F-structures the contextual conditions that must be satisfied
if the sentences to which those structures correspond are to be used appropriately.
We may distinguish at least three types of focus: contrastive, informational,
and presentational. Stress focus may have a range of interpretations, and the
actual interpretation of a given focus in a particular context will be deter-
mined only by the contextual conditions. More formally, we understand Foc
(F(t)) to be an F-structure in which Foc is the focus constituent, t and Foc are
co-indexed, and F(t) is an S-structure with t replacing the focus constituent.
Looking first at contrastive focus, it appears that, if an expression P1 with
focus is intended to be interpreted contrastively, it must be the case that the
speaker believes that the hearer believes both that some other expression P2 is
true, and that P1 is not true, where the difference between P1 and P2 lies in the
precise characterization of the sub-expression to which the focus constituent
corresponds. It is important to emphasize here that a sentence does not
contain contrastive focus except in virtue of the speaker’s intention to express
a certain belief B1, given his belief that the hearer has a related belief B2. Thus
exactly the same sentence, with the same focus, could have other uses in other
contexts. Contrastive stress, we claim, does not exist as such: what exists is the
interpretation of focus as contrastive in an appropriate context.
We can tentatively formalize our characterization of contrastive focus in
the following way. We let F(t/A) designate the expression that results from
substituting A for t in F(t):
(43) contrastive focus: In Foc1(F(t)), the element Foc1 is a contrastive
focus iff S believes that H believes that not F(t/Foc1), and that H believes
that F(t/Foc2), for some Foc2 ≠ Foc1.32

32
We are leaving unformalized the definition of ‘≠’ in expressions like Foc1 ≠ Foc2. To make
this definition precise, we would have to develop an account of lexical contrast, so that we
may specify when two lexical items or phrases are in fact not the same. Clearly this definition
We provide the following examples, in which what is contrasted is successively
N, V, ADJ, NP, VP, PP, and S:
(44) a. John bought a green SNAKE. (not rake)
b. John BOUGHT a green snake. (not sought)
c. John bought a GREEN snake. (not blue)
d. John bought a green SNAKE. (not a blue rake)
e. John bought a green SNAKE. (not mowed the lawn)
f. Mary put the snake in the REFRIGERATOR. (not under the stove)
g. I think that John bought a green SNAKE. (not that nothing happened)
Our definition of contrastive focus corresponds to a special case of contrast—
in fact, to what the literature often calls ‘contrastive stress’. We will see that
focus can be used ‘contrastively’ in contexts where the condition stipulated
in our definition does not hold. It is a simple matter, in general, to isolate
particular types of focus and relate them to certain contextual conditions, as in
our definition above; but it should be clear that doing so does not in itself
confer any special theoretical status to these special cases.
Along these lines, we note that, in certain ‘standard’ uses of focus, the
condition in the characterization of contrastive focus, for example, is satisfied
in a particular way. One standard use of contrastive focus is that in which
H asserts F(t/Foc2), Foc2 ≠ Foc1; this provides S with a very direct basis for
believing that H believes F(t/Foc2). The use of contrastive focus in such a
context might be termed ‘disputational’.
Returning to the examples in (44), we note that each is an appropriate
response to a yes-no question in which the constituent in parentheses (rake,
sought, etc.) is substituted for the focus in the full sentence. So (44b) would
be an appropriate answer to Did John seek a green snake? It would also be
appropriate as a response to the assertion Fred said that John sought a green
snake. In these two cases, H has not asserted that John sought a green snake.
In the first case, the belief is expressed and questioned; in the second, it is
attributed to Fred. Moreover, adapting a well-known example, we can show
that the belief that is relevant for a contrastive focus interpretation need not
even have been expressed explicitly:
(45) H: Did Mary call John a Republican?
S: No, she PRAISED him.
Here the contrast is appropriate just in case S has the belief that calling
someone a Republican is not identical to praising someone.

has, in part, to do with the phonology, a matter that we choose to avoid here (see Williams 1981
for some relevant considerations).

What is common to all these cases is that the proposition in question, which
is in contrast with what S says, has been introduced into the discourse directly
by assertion, or obliquely by attribution or by a question—or quite indirectly,
by virtue of being construable (given certain beliefs) from a proposition that
has already been introduced. Therefore, it appears that our definition of
contrastive focus is appropriate to a more restricted case: that of direct disput-
ing of a belief which S thinks that H holds. A more general characterization of
the appropriate conditions for the interpretation of focus as contrastive would
be the following. We will say that F(t/Foc) is construable from the context,
or c-construable, if it has been asserted or mentioned (i.e. introduced
into the discourse), or if it can be inferred from what has been asserted or
mentioned, or if it is inferable from the mutual beliefs of S and H.
(46) generalized contrastive focus: In Foc1(F(t)), the element Foc1 is a
generalized contrastive focus iff F(t/Foc2) (Foc2 ≠ Foc1) is c-construable.33
Pragmatically, such a definition has the following consequences. If a sen-
tence is uttered which is to have a focus interpretation, then H must be able to
find a proposition in the discourse, or in mutual beliefs, for the purposes of
contrast (or else the focus must have some other interpretation that we have
not yet discussed). Given that H seeks a contrastive interpretation, and given
that no such proposition has been asserted, it must be either that S believes
that what was uttered allows the construction of such a proposition as
relevant to the discourse, or that the proposition in question is believed by
S to be generally believed. Given appropriate mutual beliefs, an utterance like
the following can have a contrastive interpretation without anything actually
having been said previously in the discourse:
(47) We can’t go to Hawaii this weekend.
The vast range of acceptable contexts for generalized contrastive focus demon-
strates rather clearly the undesirability of trying to generate the stress patterns of
sentences based on the contexts in which they are appropriate.
Chomsky (1976) gives an intriguing discussion of stress-related focus in
English. He suggests that, if we view a stress-focused phrase as binding a
variable at LF in its S-structure position, we can explain why focused NPs
behave like quantified expressions with respect to the determination of possible

33
A slight revision and extension of this definition will allow us to handle cases of multiple
focus. The trick is to indicate for such cases that the structures being contrasted are the same—
in that (a) they contain variables with parallel functions, and (b) the extracted foci can be
matched up with each other as n-tuples corresponding to the variables in the F-structures. The
examples that such an account should handle are well known, e.g. JOHN hit SUSAN and then
SHE hit HIM. We will not attempt to work out the details here.
coreference of pronouns. Specifically, a pronoun may have a quantified expression
as antecedent if the quantified expression does not bind a variable that
appears to the right of the pronoun in LF. Thus the difference between (48a,b)
is taken to parallel that between (49a,b):
(48) a. *The woman hei loved betrayed someonei.
b. Someonei was betrayed by the woman hei loved.
(49) a. *The woman hei loved betrayed JOHNi.
b. The woman hei loved betrayed Johni.
Rochemont (1978) argues that (49a) is in fact acceptable in particular
contexts, e.g.
(50) S: Sally and the woman John loves are leaving the country today.
H: I thought the woman he loves had betrayed Sally.
S: No. The woman hei loves betrayed JOHNi.
We are now in a position to accommodate this fact. On our analysis, (49a) can
be characterized as acceptable only on a contrastive interpretation of the
focused NP, since John could not define a presentational focus, for example,
if the antecedent of he had been previously established in the discourse as
John. A contrastive focus, however, need not be new to the discourse, given the
contextual conditions for contrastive focus interpretation outlined above. It
need only be the case that, in (50), S believes that H believes that not F(t/Foc1)
(Foc1 = John), and further that H believes that F(t/Foc2) (Foc2 = Sally), where
Foc2 ≠ Foc1. Thus, strictly speaking, in the F-structure of (49a), the conditions
on anaphoric interpretation of pronouns are not relaxed to allow he to appear
in LF as an instance of the variable bound by the focused NP John. Rather, he
is understood as coreferential to John only if John has been determined as
its antecedent in the preceding discourse, as in (50).
Along the same lines as our first definition of contrastive focus (as used
disputationally), focus that is used to provide information can be
defined in the following way.
(51) informational focus: In Foc1(F(t)), the element Foc1 is an informa-
tional focus iff S believes that H wants S to specify Foc2 such that
F(t/Foc2).
The interpretation of focus as informational is based purely on contextual
considerations, and does not depend on the form of the particular sentence.
All the examples in (44) may be answers to questions like these:
(52) a. What green thing did John buy?
b. What did John do about a green snake?
c. What kind of snake did John buy?
d. What did John buy?
e. Where did Mary put the snake?
f. What do you think?
Note that our revised definition of generalized contrastive focus includes
informational focus—since, in answering a question, one is providing a sen-
tence with the F-structure Foc1(F(t)), where F(t/Foc2) is c-construable for some
Foc2. This explains why the expression of focus is the same regardless of whether
the sentence is used to dispute a previously introduced proposition or to
answer a question.
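This subsumption claim can be made concrete in a small sketch. The sketch is ours; the tuple encoding of F(t), the function names, and the use of 'something' to stand in for the question's Foc2 are hypothetical simplifications:

```python
# A toy sketch (ours): why generalized contrastive focus subsumes
# informational focus. Asking a wh-question makes F(t/Foc2) c-construable
# for some Foc2, so the answer's focus also satisfies (46).

def substitute(template, filler):
    """F(t/A): the expression with A substituted for the trace t."""
    return tuple(filler if w == 't' else w for w in template)

# (52d) "What did John buy?" makes the open proposition c-construable:
F = ('John', 'bought', 't')
c_construable = {substitute(F, 'something')}

def generalized_contrastive(F, foc1, alternatives):
    # (46): F(t/Foc2) is c-construable for some Foc2 distinct from Foc1
    return any(substitute(F, a) in c_construable
               for a in alternatives if a != foc1)

# The answer (44d), with focus Foc1 = 'a green snake', satisfies (46):
# Foc2 = 'something' differs from Foc1 and F(t/Foc2) is c-construable.
assert generalized_contrastive(F, 'a green snake', ['a green snake', 'something'])
```

On this encoding, disputing an assertion and answering a question supply the c-construable proposition in different ways, but the condition checked is the same.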
Next we turn to presentational focus, exemplified by the following:
(53) a. A strange MAN walked into the room.
b. A new book by CHOMSKY has just appeared.
c. I ran into your old BOYFRIEND yesterday.
In such cases, the stress peak is in a path with a left branch; therefore the NP
containing the stress peak is the maximal constituent that can be a focus. But
the examples in (53) need not be contrastive: no related propositions need to
have been introduced into the discourse, and they need not be informational
in the sense defined above. We assume that the structure of discourse involves
the introduction of individuals, and the predication of their properties. Some
individuals may be believed to be known to S and H; they can be referred to
either with a proper name or by an NP expression with a definite determiner
(e.g. your old boyfriend). To introduce a new individual who is not mutually
believed to be known to both participants, the indefinite article is required.
(In some dialects, it is also possible to use presentational this; cf. Prince 1981.)
After an individual has been introduced, the same individual is referred to with
the definite article, and with a suitably restrictive description—depending
again on mutual beliefs. Presentational focus is, then, the use of focus to
introduce an individual into the discourse.
(54) presentational focus: In Foc(F1(t)), the element Foc is a presenta-
tional focus iff F2(t/Foc) has not been introduced into the discourse
for all F2.
Presentational focus can always occur when an indefinite NP is focus, if the
F-structure Foc(F(t)) is consistent with existing beliefs. If all individuals
mentioned in F have already been introduced into the discourse, and the
particular relation between them has been established, then an individual
can be introduced. Thus (55) is appropriate as presentational focus if we are
discussing all the people who attacked Bill and Fred:
(55) A strange MAN attacked Bill and Fred.


Such examples point up the intuition that, in presentational focus, it is
impossible to introduce a new individual and to predicate something sub-
stantive about that individual in the same sentence—unless what is being
predicated is, in some sense, ‘not unexpected’. A predicate is sufficiently
‘expected’ if it has already been predicated of other individuals in the dis-
course (but cf. fn. 34 below). We can provide a theoretical account of this
rather ill-defined intuition of expectedness by extending presentational focus
to non-NPs.
There is a parallelism between introducing a new individual and pred-
icating a property of some individual that has already been introduced
into the discourse. In both cases, something has been added to the infor-
mation expressed by the entire discourse. As the following examples show,
it would be a mistake to restrict presentational focus to NPs—in none of
these examples does the capitalized focus express a contrast or answer a
question:
(56) a. When John came into the room, he saw Mary sitting in the corner.
He spoke to her for a few minutes and then went into the KITCHEN.
b. Mary bought a calculator for John. She gave it to him last week,
when he was in town for a MEETING.
c. Washington is a very interesting city. When you visit Washington,
you should try to get to the NATIONAL GALLERY.
It appears from such examples that a focus is presentational if F2(t/Foc) is not
c-construable, for all F2.34 Thus, in (56a), it is not c-construable that John did

34
An unfortunate consequence of this characterization of generalized presentational focus is
that it allows (i), below, in contexts where the sentence is used to initiate a discourse, but
excludes cases like (ii) and (iii) in similar contexts:
(i) The construction CREW is DYNAMITING.
(ii) A strange THOUGHT just occurred to me.
(iii) A MAN appeared.
Here (i) has a reading (i.e. can occur in a context) in which both the subject and predicate are
presentationally focused. In (ii) and (iii), the predicate cannot be presentationally focused, since
it is not stressed. (The predicates here contain examples of ‘natural’ verbs of appearance; see
Guéron 1980 and Rochemont 1978 for discussion.) However, in a context where e.g. (ii) is being
used to initiate a discourse, the predicate of appearance meets our conditions for interpretation
as presentational focus. In our terms, then, it should be stressed; but as (iv) and (v) indicate, it
need not be, since either of these sentences could be used to initiate a discourse:
(iv) A strange THOUGHT just occurred to me.
(v) A strange THOUGHT just OCCURRED to me.
Examples like (iv) might lead us to the conclusion that true verbs of appearance like occur or
nothing with respect to Mary. Furthermore, if some individual involved in
F1(t) is not in the context at all, then F2(t/Foc) is not c-construable, for all F2.
The same sorts of consideration that led us to define generalized contrastive
focus can be applied to our discussion of presentational focus. All that is
necessary for presentational focus is that, for all F, the occurrence of F(t/Foc)
should not be c-construable. The relation involved here is not one of logical
truth, but of sufficient information. If it is part of the discourse that X and
Y bear some particular relationship to one another (whether or not they
actually do), then assertion, mention, or even suggestion of another relation-
ship between X and Y is appropriately presentational.
(57) generalized presentational focus: In Foc(F1(t)), the element Foc
is a generalized presentational focus iff it is not the case that F2(t/Foc) is
c-construable, for all F2.
It is possible for a given F-structure to satisfy more than one set of conditions
for interpretation as focus, e.g.
(58) A: Bill was talking to Mary.
B: No, JOHN was talking to Mary.
Assuming that John has not until this point been introduced into the dis-
course, the F-structure associated with B’s response allows John to function as
both a presentational and a contrastive focus in the context. However, as
shown by the discussion following example (50) (cf. also fn. 33 above), not all
instances of contrastive focus need also be presentational. Given that general-
ized presentational focus expresses the intuitive characterization of focus as
‘new information’, examples like (50) indicate that it would be a mistake to
attempt to give a general characterization of focus as ‘new information’, since
some instances of contrastive focus need not be ‘new’ in this sense.
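The dual status of a focus like the one in (58) can be checked mechanically under the same kind of toy formalization as before (ours; the names are hypothetical, c-construability is reduced to set membership, and (57) is restricted to a single frame F for simplicity):

```python
# A toy check (ours): one focus satisfying both the generalized contrastive
# condition (46) and the generalized presentational condition (57), as in (58).

def substitute(template, filler):
    """F(t/A): the expression with A substituted for the trace t."""
    return tuple(filler if w == 't' else w for w in template)

def contrastive(F, foc1, alternatives, discourse):
    # (46): some F(t/Foc2), Foc2 != Foc1, is c-construable
    return any(substitute(F, a) in discourse for a in alternatives if a != foc1)

def presentational(F, foc, discourse):
    # (57), simplified to a single frame F: no proposition obtained by
    # substituting this focus for t is c-construable
    return substitute(F, foc) not in discourse

F = ('t', 'was', 'talking', 'to', 'Mary')
discourse = {substitute(F, 'Bill')}            # A's assertion in (58)
people = ['Bill', 'John']

# B's focus JOHN is contrastive (the proposition about Bill is c-construable)
# and, since John is new to the discourse, presentational as well:
assert contrastive(F, 'John', people, discourse)
assert presentational(F, 'John', discourse)
# Bill, already introduced, could not be a presentational focus:
assert not presentational(F, 'Bill', discourse)
```

The last assertion reflects the asymmetry noted in the text: every such dual reading requires the focus to be new, which is why not all contrastive foci qualify.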
Let us now pause briefly to address the issue of why a wh-specifier seems
naturally to function as a focus, whether stressed or not (cf. §4.2 and fn. 20).

appear need not be focused in order to be introduced. In other words, in (iv), used to initiate a
discourse, it must be c-construable that something has just occurred to the speaker, in order for
the predicate not to be focused. Given our definition of ‘c-construable’, this proposition must be
inferable from the mutual beliefs of speaker and hearer. Let us say that the mutual beliefs of
speaker and hearer include a set of principles of discourse, along the lines of Grice’s conversa-
tional maxims (Grice 1975). We might then assume that the c-construable proposition associ-
ated with (iv) in the context with which we are concerned falls under some version of Grice’s
Cooperative Principle. In other words, propositions with natural verbs of appearance are
c-construable as a function both of their usefulness in initiating discourse and of their inten-
sions. In contrast, examples like (ii–v) might be taken to indicate a deeper distinction between
NPs and predicates as focus, as suggested by both Bing and Ladd. We will not decide this
issue here.

Note that the wh-quantifier has this function only when it has wide scope (i.e.
over the matrix S). Consider a question with an LF equivalent, in relevant
respects, to wh-xi(F(ti)). Let us assume that wh-xi is identified as a focus in all
LFs of this type. A wh-focus may be seen as always satisfying the conditions for
interpretation as a presentational focus. Strictly speaking, what this means is
that the referent of the wh-phrase must not have been previously introduced
into the discourse. However, wh-phrases are not referring expressions: in
essence, a wh-phrase functions to bind an empty position in an F-structure
which the speaker intends the hearer to fill with a response of the appropriate
semantic type. In this respect, the wh-operator in F-structures is similar to the
λ-operator: it serves temporarily to bind an otherwise free variable.
Consider in this regard the following examples, based on the work of Ladd.
(59) A: John speaks many languages.
B1: How many languages does he SPEAK?
B2: How MANY languages does he speak?
B3: *How many LANGUAGES does he speak?
(60) A: John is a great linguist.
B1: How many languages does he SPEAK?
B2: How MANY languages does he speak?
B3: How many LANGUAGES does he speak?
In (59), that John speaks n languages is c-construable on the basis of
A. Assuming an analysis of B1 in terms of unmarked accent placement, how
many designates a presentational focus in the sense that both B1 and B2 are
requesting that A specify a value for n in the c-construable proposition
mentioned above, on the assumption that no such value is c-construable.
B1 and B2 presumably have the following F-structure:
(61) wh-numberi (John speaks i-many languages)
B3 is inappropriate in the context of (59A), since languages is not interpretable
as a presentational, contrastive, or informational focus—which renders the
associated F-structure uninterpretable.
In (60), given the beliefs of the hearer, one of two relevant propositions
is c-construable on the basis of A: that John speaks many languages, in which
case B1 or B2 is appropriate; or simply that John speaks languages, in which case
B3 is appropriate.
Given our informal characterization of discourse structure and our defin-
itions for presentational and contrastive focus, it is a fairly straightforward
matter to explain the inappropriateness of contrastive focus in contexts where
certain beliefs do not hold. For example, if someone said John INSULTED Mary,
and no question had been asked about what John did to Mary—and if, in
addition, Mary had not been introduced into the discourse—then the inter-
pretation of focus as contrastive would yield a contradiction between this
aspect of the structure of the discourse and the fact that, in order to constitute
contrastive focus, S must believe that F(t/Foc) for some Foc ≠ insult. That is,
S must believe that John V-ed Mary had been introduced into the discourse,
which is inconsistent with the assumption that Mary had not been introduced
into the discourse.
Let us now consider the echo intonation, which we represent as ‘?’. In Foc
(F(t)), echo intonation indicates that F(t/Foc) is c-construable, and that there
is something surprising or noteworthy about F(t/Foc) that particularly has
to do with Foc. In this respect, echo focus is like emphatic focus: the
main difference seems to be that the former, but not the latter, requires that
F(t/Foc) be c-construable. So only emphatic focus can be used in presenting
an exciting piece of information as the beginning of a conversation:
(62) a. Guess what! My MOTHER is coming to visit.
b. *Guess what! My MOTHER? is coming to visit.
It appears that a sentence with echo intonation can have precisely the F-structure
of the preceding sentence, being neither presentational nor contrastive (cf.
fn. 27):
(63) H: Your MOTHER is coming to visit.
S: My MOTHER? is coming to visit. That’s {wonderful!/impossible!}
Finally, let us consider instances of so-called normal stress. Such a stress
pattern may come about in two ways. First, there may be no assigned s in
S-structure. Second, an s assigned in S-structure may fall on a rightmost
branch. In either case, the result will be a stress peak on the rightmost terminal
of the surface string.
In the first case, there is no marked focus; and it is possible that there is no
constructional focus either. Can there be a sentence with no focus? Such a
sentence would neither add new information to the discourse nor dispute any
aspect of the discourse. Nor would it be a repetition of a prior sentence, since
such a repetition would also repeat the preceding F-structure. We therefore
rule out by convention the possibility that a sentence has no focus at all; a
derivation without an F-structure is ill-formed.
In the second case, there is a possibility that the node identified as the focus is
the highest, root S. Can an entire sentence be a focus when it is not embedded?
In fn. 31 above we suggested that FA will not assign such an F-structure; thus we
will have a derivation without an F-structure if the node S is chosen as α in an
application of FA.
On this matter we are in disagreement with the usual understanding that a
sentence is a focus when it answers the question What happened? For all
examples that we know, (i) the VP is focus in such cases, or (ii) the subject
is focus, or (iii) the subject and VP are both focus, and are both given stress
peaks. Consider the following examples:
(64) (A and B hear a loud noise, like an explosion.)
A: What happened?
B: My STEREO exploded.
Here B expresses a presentational focus on stereo—in view of the fact that it is
reasonable to assume, in this context, that Something exploded is mutually
believed. The mutual belief may be a bit weaker, e.g. that something made a
loud noise, consistent with our definition of c-construable:
(65) A: What happened?
B: My stereo just SHORT-CIRCUITED.
Here it is mutually believed that the stereo exists, and the VP just
short-circuited is the presentational focus. If B needs to introduce the stereo
and say that it short-circuited, then he must say My STEREO just SHORT-CIRCUITED,
which is an instance of dual presentational focus. Another example of dual
presentational focus would be:
(66) The PRESIDENT’s just been ASSASSINATED!

4.5 Summary and review


We now compare the main features of the framework developed in this paper
with other proposals that have appeared in the literature. To summarize, we
have proposed the following:35
(a) Neutral accent: Neutral as well as marked accent exists, and all
accent placement is structurally determined.
(b) No contrastive stress: Contrastive stress per se does not exist.

35
Our rules for interpretation of focus do not mention topic, theme, or presuppositions.
As regards the last, we agree with Sperber and Wilson (1979) that focus structures are not used to
define presuppositions. Concerning the first two notions, our rules indicate that, if some phrase
meets the conditions for interpretation as focus of a particular type, then it must be specified as
a focus in F-structure, and vice versa. As demonstrated in Chafe (1976) and Reinhart (1981a), a
topic need not be ‘old information’; it can also function as a focus in an appropriate discourse
context. Reinhart argues persuasively that the notion ‘topic’ is unrelated to focus or old infor-
mation; i.e. the topic of a sentence is not everything that is unfocused. Rather, the topic of a
sentence is what that sentence is ‘about’, independently of whether or not that constituent
happens also to be a focus.
(c) Autonomy of stress: The phonological organization of stress and
the assignment and interpretation of focus are autonomous systems.
(d) Autonomy of focus: The interpretation of focus is determined by
context, not by structure.
(e) Types of focus: At least three means of expressing focus exist in
English. (We ignore here lexical items like only which seem generally
to bind focus.) We characterize these as constructional focus, stress
focus, and wh-focus. The first is discussed in Rochemont (1978; 1980);a
the latter two have been the subject of this paper. We have further
argued that wh- and stress are neither necessary nor sufficient charac-
teristics of focus.
(f) The non-unity of focus: Given the preceding two proposals, we
are committed to the position that it is impossible to correlate all
instances of focus with stress, on the one hand, or with a necessary and
sufficient set of interpretive properties, on the other hand. Focus is
represented as a unified phenomenon only at the level of F-structure.
Let us consider now some key points in the various proposals in the
literature. The topic of English phrasal stress has received periodic attention
in the linguistic literature for a number of years. It is possible to characterize
two distinct positions which have emerged in this long-standing debate. We
will here define these positions in broad overview, acknowledging that our
characterization ignores certain crucial (though secondary) views expressed
by particular authors. The proponents of one rather widely held view main-
tain that some notion of ‘normal’ or ‘neutral’ phrasal stress is systematically
definable on the basis of structure; it may in general be identified with
instances of rightmost stress in a phrase (cf. Bierwisch 1968; Bing 1979;
Bresnan 1971; 1972; Chomsky 1971; Chomsky and Halle 1968; Chomsky et al.
1956; Ladd 1980; Lakoff 1972; Liberman and Prince 1977; Newman 1946;
Stockwell 1960; 1972; Trager and Smith 1951). In contrast to this view,
several authors have expressed varying degrees of skepticism regarding
the issue of whether any notion of ‘normal’ stress is empirically defensible
(cf. Berman and Szamosi 1972; Bolinger 1958; 1961; 1972; Daneš 1967; Schmerling
1976). The primary argument advanced by proponents of this second
position involves the central role played by the notion ‘contrastive stress’
in the characterization of ‘normal stress’, and the failure of proponents of
the ‘normal stress’ position to characterize ‘contrastive stress’ explicitly.
On this view, it is common to maintain that ‘contrastive stress’ simply defines
classes of contexts which are distinct from those defined by ‘normal stress’,

a
See also Rochemont and Culicover (1990).

but are not more highly marked. Although a number of ‘normal stress’
proponents have attempted to respond to these criticisms (e.g. Bresnan
1972; Jackendoff 1972; Ladd 1980), none has in our view proved particularly
successful.
Our present analysis is closest in theory to Jackendoff (1972) and Williams
(1981). It differs from them in explicitly associating surface-structure represen-
tations with prosodic structures (autonomy of stress) and in not attempting to
determine presuppositions on the basis of particular choices of focus which
define the stressed constituents of a phrase. In our view, specific FAs determine
not presuppositions, but contextual conditions under which the associated
sentences would be deemed appropriate (autonomy of focus). In certain
respects, our analysis thus also bears a superficial resemblance to that of
Sperber and Wilson (1979) and Williams (1981), in maintaining the Autono-
mous Systems view of Hale et al. (1977).
Let us consider the specifics of certain proposals. Bresnan (1971) adopts the
view that normal stress is to be distinguished from emphatic or contrastive
stress, and that the location of normal stress is predictable by rule from the
syntactic structure, specifically by the Nuclear Stress Rule of Chomsky and
Halle. Her proposal differs from that of Chomsky and Halle in requiring the
NSR to apply to underlying, rather than surface, representations. In their
critiques of her article, both Berman and Szamosi (1972) and Lakoff (1972) disagree
with Bresnan’s characterization of ‘normal’ stress; they suggest that, on
consideration of a broader class of normally stressed sentences, her analysis
is faulty. Bresnan (1972) responds to these criticisms by introducing a distinc-
tion between focus-related normal stress and other instances of normal stress;
she suggests that focus-related stress is specified by the operation of a rule
quite distinct from the NSR.
Bolinger (1972) rejects Bresnan’s analysis on completely different grounds,
arguing that sentence stress (defined as ‘accent’ in Bolinger 1961) is a function
of semantic or emotional highlighting.36 Accent goes on the ‘point of infor-
mation focus’ (cf. Bolinger 1958); i.e. stress on any lexical unit in a sentence
serves merely to highlight that item as an indication of the speaker’s intent in
communication. Given that no structurally independent characterization of
‘normal’ sentence stress can be given, Bolinger continues, it is entailed that no
systematic structural description of sentence accent placement is possible.

36. We agree with Bolinger that there is a notion of ‘normal’ stress defined at the word level.
Because of well-known cases like I said information, not deformation, we can see that, to define
lexical contrast, we must have access to some characterization of the normal stress pattern of a
word, as well as its segmental make-up and/or syllable structure.
stress and focus in english 117

Consistent with this is the argument that the notion of contrastive accent is
an illusion (cf. Bolinger 1961).
Schmerling—and, to some extent, Ladd—agrees with Bolinger that accent
is not structurally predictable; however, both argue that Bolinger’s semantic
characterization is inaccurate. Bolinger’s writings make several allusions to
the notion of semantically neutral accent placement, one which does not
define presuppositions—i.e. a context-free intonation. Both Schmerling and
Ladd present convincing arguments that all semantically determined accent
placements induce contexts, and hence that no independently motivated
characterization of ‘semantically neutral’ accent placement can be given.
Both also find elusive the proposition that accent and ‘point of information
focus’ should be identified with the unit in the sentence with the ‘greatest
relative semantic weight’.
Schmerling (1976) offers an alternative analysis which recognizes two
distinct sentence types: ‘news’ sentences (e.g. JOHN died) and topic-comment
sentences (e.g. John DIED). Her claim is that each type is identifiable in terms
of discourse function, and that distinct principles of accent placement apply
for each. (Thus we disagree with Schmerling with respect to the autonomy of
stress and of focus.) In news sentences, predicates receive lower stress than
their arguments, regardless of relative linear arrangement (p. 82). In topic-
comment structures, topic and comment receive equal stress at some level of
representation (p. 94); an independent principle (p. 86) then determines the
heaviest relative stress as that which is rightmost.37
Ladd takes issue with Schmerling’s analysis. Her arguments that all accent
placements induce contexts, he notes, show only that no notion of seman-
tically neutral accent placement is definable. In the spirit of Chomsky (1971)
and Jackendoff (1972), he suggests a well-defined notion of syntactically
neutral accent placement (our rule (9)), namely one which ambiguously
identifies a number of focus constituents in a sentence. For example, in a
sentence like (67), ambiguity exists in the scope of focus within the NP that
contains the accent (cf. discussion in §4.2 above):
(67) Was he warned to look out for an ex-convict in a red shirt?
The advantage of this approach is that no implication is made that all
sentences potentially exhibit neutral accent placement; neutral accent is

37. Schmerling also argues against the view of Chomsky and Halle that all rules which relate to
pronunciation constitute an interpretive component of grammar. Her argument is based on her
analysis of stress assignment, in terms of phonological principles, as sensitive to discourse
considerations. Since these principles do not depend in any direct way on syntactic structure,
she takes the analysis to argue strongly against the Chomsky–Halle version of the Autonomous
Systems view. Note that, under our analysis, Schmerling’s argument is without force.
possible only where there is a potential syntactic ambiguity in the scope of
focus. Ladd summarizes his position with the statement that “accent goes on
the point of information focus, unless the focus is unmarked, in which case
the accent goes in a location determined by the syntax” (1980: 114). So, while
Ladd shares our view of neutral accent, he does not assume autonomy
of stress.
When a possibility of focal ambiguity exists, as in example (67), the focus is
said by Ladd to be ‘broad’; the most radical instance is one in which the scope
of focus includes the entire sentence. A ‘narrow’ focus, by contrast, arises in
sentences in which no focal ambiguity is possible; for Ladd, contrastive focus
is the most extreme case of this. Thus he recognizes both contrastive and
neutral accent: these are simply opposite extremes on a continuum which
defines range of focus syntactically from broad to narrow. This appears to be
quite close to our ‘no contrastive stress’ proposal.
However, the quotation above does not accurately reflect Ladd’s ultimate
position. He suggests a further qualification on the nature of the relationship
between accent and focus, required in his view by a principle of discourse
that says: “De-accent something in order to signal its relation to the context”
(1980: 142). In such cases, accent falls by default on some other constituent to
the immediate left or right of the de-accented phrase; hence the appellation
‘Default Accent’. This is a valuable insight, if it can be shown to have a
predictable nature. Ladd suggests that two principles suffice to determine
whether the Default Accent falls to the right or left of a de-accented constitu-
ent. In nominal compounds, Default Accent switches the order of two sister s
and w nodes in the prosodic structure of the word. Within larger constituents,
however, Ladd relies on a hierarchy of ‘accentability’ to determine the pos-
ition of the Default Accent. In his characterization, “content words are more
accentable than function words . . . and nouns are more accentable than other
content words” (p. 125). Thus Ladd’s explanation for the ultimate location of
the Default Accent makes no firm predictions, except that nouns are the most
accentable words.
Bing’s analysis is subject to a somewhat similar objection. In noting
the preferred status which Ladd invokes for NPs, she proposes (1979: 179) a
principle of Noun Phrase Prominence: “A node in metrical structure which
corresponds to a node in syntactic structure which is a noun phrase cannot be
dominated by any node labeled WEAK except when the node has been
destressed because of reference to previous discourse.” Bing’s proposal, then,
is that all NPs are accented unless they already bear a relation to the context of
the utterance—in contradiction to our notions of autonomy of stress and
of focus. On her analysis, verbs and other categories are stressed only by
default. Bing suggests that this is predictable on the basis of the metrical
structure. However, she further suggests that, given the theoretical vagueness
of the precise relationship of syntactic and metrical structures, one might
appeal to the phenomenon of Default Accent to predict metrical structure.
The circularity of her proposal is evident.
On neither Ladd’s nor Bing’s account, then, is the notion of Default
Accent rigorously defined. In our view, any approach that attempts to define
a notion of relative accentability will fail. Default Accent, if it exists, must
be structurally definable. It is our claim, however, that the need for such
a notion is obviated under a complete analysis of structurally defined accent
placement.
Accent placement is thus seen as a formal matter with consistent and
predictable interpretive results. We have presented an analysis that, with certain
well-defined exceptions, formally characterizes the association of primary
stress and focus in English sentences.
5

Control, PRO, and the Projection Principle (1986)*

Peter W. Culicover and Wendy Wilkins

Remarks on Chapter 5
This chapter is concerned with the problem of finding empirical evidence to
support the hypothesis of empty NPs such as PRO in control constructions.
We argued that no such evidence could be found, and that the entire motiv-
ation for PRO was theory-internal and driven by the desire to assign uniform
syntactic representations to constructions that share semantic properties.
(This methodology of Uniformity has been employed widely in the develop-
ment of contemporary generative grammar, as discussed at length in Culi-
cover and Jackendoff 2005: chs 2 and 3.) The approach to control argued for in
this chapter bears a closer resemblance to the treatment of control in HPSG,
LFG, and Simpler Syntax. On this view, control is not a binding relationship
between NPs in the syntax, but a matter of interpretation that is constrained
partly by syntactic structure and partly by the particular lexical items.
The analysis diverged from standard approaches in proposing a semantic
account of control in terms of ‘R-structure’. This is a level of representation
that incorporates information about the referents of syntactic arguments and
their thematic relations. R-structure proves to be a restricted variant of
Jackendoff ’s Conceptual Structure; hence our account here overlaps in import-
ant respects semantic accounts of control such as Dowty (1985), Sag and
Pollard (1991), Culicover and Jackendoff (2001; 2005; 2006), and Jackendoff
and Culicover (2003).

* [This chapter appeared originally in Language 62: 120–53 (1986). It is reprinted here by
permission of the Linguistic Society of America. This work was funded in part by grants from
the National Science Foundation and the Sloan Foundation. We gratefully acknowledge their
support. We would like to thank Joe Emonds, Ann Farmer, Eloise Jelinek, Chisato Kitagawa,
Fritz Newmeyer, Richard Oehrle, and Geoffrey Pullum for their very helpful comments. The
authors’ names appear in alphabetical order.]

5.1 Introduction
This paper presents a theory of control (predication), in terms of thematic
relations, which makes no use of the element PRO in the syntax. Important
consequences of the theory are that the θ-criterion must be relativized to
particular local domains, and the Projection Principle cannot be maintained.
A number of syntactic arguments against PRO are summarized, and the
arguments of Koster and May (1982) in favor of PRO are addressed. It is
concluded that, given a thematic relation-based account of predication, the
Projection Principle in its current form is not a useful postulate in the theory
of grammar.
In previous work (Culicover and Wilkins 1984, henceforth LLT), we have
assumed that infinitives in general are not derived from S′ complements. This
means that we question the existence of the abstract empty NP, usually
referred to as PRO, which is assumed in much current work to be the syntactic
subject of embedded infinitival complements. The issue of the existence of
PRO as a syntactic element is of great importance, given its central role in the
theory of Government and Binding (Chomsky 1981a and much other work).
PRO is necessary to avoid violations of the Projection Principle (hereafter
PrP), which states: “Representations at each syntactic level (i.e. L[ogical]
F[orm] and D- and S-structure) are projected from the lexicon, in that they
observe the subcategorization properties of lexical items” (Chomsky 1981a:
29). In particular, this means that a verb that requires a propositional comple-
ment in LF would require a sentential complement at D- and S-structure.
Where such a verb apparently occurs in the syntax with a bare infinitive, the
PrP requires that the infinitive be analyzed as a full S. This untensed S, with no
overt subject, would have PRO as its subject—at least in English (and similar
languages) where the subject is not optional in the expansion of S. The PrP is
stated formally as follows (Chomsky 1981a: 38):
(i) If β is an immediate constituent of γ in [γ … α … β …] or [γ …
β … α …] at Li, and γ = ᾱ, then α θ-marks β in γ.
(ii) If α selects β in γ as a lexical property, then α selects β in γ at Li.
(iii) If α selects β in γ at Li, then α selects β in γ at Lj.
In our theory we do not assume the PrP; specifically, we take issue with
statements (ii) and (iii). This means that not all thematic information—the
θ-marking of (i)—has an overt syntactic representation in terms of distinct
categories at each syntactic level. In other words, subcategorization require-
ments can be satisfied without necessarily presupposing that the logical/
semantic requirements of a verb have a one-to-one correspondence with the
syntactic categories in syntactic structure. Because of the theory of coindexing
which we present here, a given NP may bear a thematic role with respect to
more than a single verbal (or relational) element. Then, because the PrP is not
assumed, there is no reason why the mapping from syntactic to semantic
structure cannot introduce arguments, or rather representations of argu-
ments, as under conditions of predication.
The advantage of our approach over one which includes the PrP is that the
non-syntactic nature of PRO is immediately explained. That is, the apparent
inconsequentiality of PRO for many syntactic phenomena ceases to require
explanation. PRO is a logical element, not a syntactic one.
Koster and May (1982) claim to demonstrate conclusively the theoretical
advantages of assuming that infinitival complements contain a syntactic
PRO subject, and are sentential. Our opposing argument begins, in §5.2, by
presenting our theory of predication, which makes the no-PRO theory inter-
esting. In §5.3, we discuss some arguments against the syntactic element PRO.
In §5.4, we summarize our response to Koster and May’s arguments. Our
general conclusion (§5.5) is that little syntactic evidence exists, if any, in
support of PRO, and that therefore the PrP—including (ii) and (iii)—is not
supported as a useful postulate of the theory of grammar.
It is important to point out from the beginning that our theory of predica-
tion does not, in itself, constitute an argument against PRO or the PrP.
A theory could conceivably adopt our thematic conditions on coindexing
(presented in }5.2) without abandoning the syntactic PRO subject of untensed
clauses. Our case against syntactic PRO can only be evaluated by combining
our theory of predication with our syntactic arguments (§§5.3, 5.4).

5.2 A theory of predication


The present theory of predication follows Williams (1980) in assuming that an
antecedent is assigned to every predicate by coindexing.1 A predicate is either
any infinitival VP, or some phrasal category in the VP that does not bear a
grammatical relation to the verb. It is defined formally as follows:

1. We do not include here a point-by-point comparison of our theory either with Williams
(1980) or with any of the literature it has generated, because such a comparison would obscure
the larger issue which we mean to address. It will be clear to the reader that our approach owes
much to Williams’ insights about the relation between predicates and antecedents, and also that
both theories owe much to the earlier work by Jackendoff (e.g. 1972). After the completion of
this article, the unpublished dissertation of Rothstein (1983) was brought to our attention. Our
analysis would undoubtedly have benefitted from a consideration of that work.
(1) A predicate is any non-propositional major category Xmax, immediately
dominated by Vn, which (a) bears no grammatical relation to the verb,
or (b) is an infinitival VP.2
Predicates are non-propositional in the sense that they are not expressions
with complete argument structure (e.g. they are not S’s). A predicate acquires
propositional content by virtue of the coindexing with an antecedent. “Dom-
inated by Vn” is mentioned to exclude sisters to VP (such as adverbial
phrases) from the coindexing.3 The definition will also exclude main VPs
unless S is a projection of V. Because we analyze S in English as a projection of
Infl (following McA’Nulty 1980, Klein 1981, and Chomsky 1981a), there is no
coindexing as such of a main VP with an antecedent; rather, the antecedent of
the main VP is the grammatical subject.
The definition of predicate refers crucially to the term grammatical
relation. Deep grammatical relations (DGRs) are primitives of the theory.a
Presumably every language will have a syntactic mechanism for uniquely
characterizing the subject and objects of any given verb. The realization of
the grammatical relations might involve some designated morphology, or it
might depend strictly on configuration (as in English). Because the non-
idiosyncratic assignment of thematic relations is based on the DGRs, it will
usually be the case that non-VP predicates will not bear a thematic role.
Infinitives may have DGRs, and bear thematic roles.

2. The definition and treatment of predicates here is a revision of LLT, ch. 2. We mean this new
definition of predicate to be universal, but this does not necessarily mean that all languages will
have predicates of all categories. For instance, some languages do not have infinitives (e.g.
Modern Greek); there we would predict that ‘control’ would be accomplished differently,
perhaps in terms of the binding of an empty NP pro (distinguishing pro from PRO). We
would expect the conditions on binding to differ from those on coindexing, but to be sensitive
to some instantiation of the general Locality Condition of LLT.
In our definition of predicate, we leave open the correct treatment of predicate nominals, as
in Mary is/became a doctor. If, for independent reasons, a doctor must be classified as a direct
object, then the definition would have to be revised appropriately—e.g. to allow a direct object
to be a predicate just in case it is not assigned a θ-role. However, it may be that a doctor is not,
strictly speaking, a direct object.
3. The definition also excludes infinitival VPs and other predicational elements inside NP, e.g.
[NP Bill’s promise to go]. These must of course be accounted for, but we exclude them from
discussion here (see fn. 16).
a. The theory suggested here anticipates Simpler Syntax, where the ‘deep grammatical rela-
tions’ are linked to conceptual structure arguments, on the one hand, and to syntactic configur-
ations on the other. Constructions such as passive and raising to subject are derived by mapping
the DGR to another DGR (in the spirit of Relational Grammar). So, for example, in the English
passive the deep Object is mapped to the Subject, and realized as the sister of VP. For discussion,
see Culicover and Jackendoff (2005: ch. 6).

5.2.1 Phrase structure and lexicon


Before presenting the coindexing rule, we will discuss what our theory of
predication presupposes about the base (at least in English) and about the
type of information in the lexicon. First, we postulate a smaller verb phrase
within the main VP. We will call this smaller verb phrase V1, to distinguish it
from the maximal VP, which we call V2. V1 contains the obligatorily strictly
subcategorized constituents, as well as those that are directly assigned the-
matic roles by the verb. We use the term ‘subcategorization’ loosely to refer to
the argument selection of a verb or other relational element, without presup-
posing strict subcategorization in the lexicon (we return to this issue in §5.5).
This means that the English base rules include the following:4
(2) Vn → Vn−1 (XP)*
Our second proposal about the base is the main theoretical construct at
issue here. We postulate syntactic VP complements; not all surface infinitives
are derived from full S′-complements. (For the current theory, COMP is not
an optional constituent in the expansion of S′; where there is no COMP, there
is no sentential constituent.) The consequence of our assumption that VP
complements exist in the base will be that there is no PRO in the syntax.
With respect to the lexicon, we assume that the lexical entry for a verb
consists primarily of the specification of its thematic structure. We distinguish
two classes of thematic roles: extensional and intensional. The former is
related to the human perceptual system, and to the categorization of objects
as physical entities by virtue of their perceived properties. We assume that the
extensional roles include e.g. Gruber’s (1965) source, goal, and theme.
The intensional class of thematic roles relates to objects with respect to their
status as participants in actions. Roles such as agent, patient, instrument,
and benefactee cannot be assigned to objects just by virtue of the perception
of their physical properties. These particular roles are assigned on the basis of
a theory of human action. It is important to point out here that we are not
claiming that explanatory theories of perception and action yet exist. We are
saying, however, that when the relevant theories are worked out, we will see
that the assignment of particular thematic relations is to be determined by
these models of non-linguistic cognitive systems. In other words, we expect the
set of possible thematic roles to be defined by universal constraints on

4. The rule below is a generalization of the PS rules given in LLT, ch. 2. It will of course be necessary
to impose ordering restrictions on the complements of V—perhaps by an adjacency requirement on
thematic role assignment, or in terms of abstract case assignment (as in Stowell 1981).
The notation (XP)* is intended to designate a sequence of maximal projections, perhaps of
different categories.
perception and action theories, rather than by constraints on the system of
grammar. This issue is of particular importance in our discussion of learnability
(LLT, ch. 5).
Given that the lexical entries for verbs specify their thematic structure, a
large part of the categorial component of the syntax (i.e. the PS rules) can be
derived, rather than overtly specified—just as suggested in Chomsky (1981a)
(cf. }5.5 below). Importantly, however, we do not adopt the version of the
θ-criterion which states: “Each argument bears one and only one θ-role, and
each θ-role is assigned to one and only one argument” (Chomsky 1981a: 36).
In our view of thematic roles, comprising two systems, it is possible for the
argument of a verb to be assigned more than a single role, since no logical
disjunction exists between the extensional and the intensional relations. We
follow Jackendoff (1972) in allowing arguments to bear more than a single
role; but we restrict the role assignment, along the lines suggested by
Chomsky, by disallowing more than a single role within either the extensional
or the intensional system. An argument may be assigned at most one role
from each system by a given verbal element.
We stress “by a given verbal element” because another important difference
exists here between our system and the particular version cited for the
θ-criterion (and its related PrP). Within a sentence, an argument may bear
more than a single extensional or intensional role, so long as each is assigned
by a different verb (or other predicative element). Interestingly, this possibil-
ity would seem to be allowed by the formulation of the θ-criterion in
Chomsky (1981a: 335), which does not exclude multiple role assignment to a
given argument position. Our requirement that, in such cases, each role must
be assigned by a different predicative element provides the basis for the
correct application of the coindexing rules, which we discuss shortly.
The thematic roles in the general case are assigned algorithmically. The rule
for role assignment is as follows, incorporating Anderson’s (1977) Theme Rule:
(3) (i) a. Assign lexically idiosyncratic roles; or
b. Assign A to the object if there is one. Otherwise, assign A to the
subject (antecedent). Assign E to the subject (antecedent) if
nothing has been assigned to it.
(ii) Realize A as theme.
(iii) Realize A as patient and E as agent, or A as patient and E as
instrument, or E as goal, or . . . , depending on the governing
verb or preposition.
As is clear from the algorithm, the thematic roles are derived by a mapping
which involves DGRs and the lexical entries of verbs and prepositions. Clause
(i.a) is required for cases in which the assignment of θ-roles is not directly
determined by the constituent structure, e.g. where the complement of a verb
like expect is a proposition. Much of what is expressed in (iii) is not predict-
able from the syntax, and must be stated explicitly in the lexicon. Possibly we
could dispense with these idiosyncrasies in favor of clause (i.a). The use of
A and E, inspired by the terms ‘absolutive’ and ‘ergative’, allows us to
generalize the statement of the distribution of theme, and easily to distin-
guish the theme role from the role borne by the subject of a transitive verb.b
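The algorithm in (3), together with the A/E discussion above, can be rendered as a small procedural sketch. This is purely our illustration: the function and parameter names are invented, and realize_A/realize_E stand in for the lexicon-dependent clause (iii), whose content depends on the governing verb or preposition.

```python
# Hypothetical encoding of the role-assignment algorithm in (3).
# realize_A and realize_E supply the verb-specific realizations of
# clause (iii); clause (i.a), idiosyncratic lexical assignment, is omitted.
def assign_roles(has_object, realize_A=(), realize_E=()):
    # (i.b): assign A to the object if there is one, otherwise to the
    # subject; assign E to the subject if nothing has been assigned to it.
    labels = ({'object': {'A'}, 'subject': {'E'}} if has_object
              else {'subject': {'A'}})
    out = {}
    for gr, ls in labels.items():
        roles = set()
        for l in ls:
            if l == 'A':
                roles |= {'theme'} | set(realize_A)  # (ii), plus (iii)
            else:
                roles |= set(realize_E)              # (iii)
        out[gr] = roles
    return out

# John fell: the single argument is assigned A, realized as theme.
fell = assign_roles(has_object=False)
# John hit Mary: A realized as patient (alongside theme), E as agent.
hit = assign_roles(has_object=True,
                   realize_A={'patient'}, realize_E={'agent'})
```

On this sketch, fell yields a theme subject, and hit yields a {theme, patient} object and an agent subject, matching the R-structures given for John fell and John hit Mary below.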
In our discussion throughout this paper, we are essentially assuming
Anderson’s treatment of theme. This means that, for motion verbs, the
theme is the thing that moves; for location verbs, it is the thing whose
location is defined; for transitive verbs in general, it is the thing that under-
goes the described action. Deviations from the theme generalization would
have to be overtly expressed in the lexical entry of a verb. As will become clear
below, we differ from Anderson in the assignment of thematic roles to verbal
complements. The roles source and goal are essentially as presented in
Jackendoff (1972).
The rule for role assignment is relevant in predicational structures. The
coindexing procedures for predicates also involve thematic role assignment.
Verbs, adjectives, or other phrases involved in predicational contexts would
follow the algorithm in assigning relevant roles to their antecedents. For
instance, AP predicates and intransitive VPs would, in the unmarked case,
assign A (theme) to their antecedents, whereas transitive VPs with objects
would assign E to theirs.
In our theory, the well-formedness conditions on thematic relations are not
strictly syntactic: they are relevant to a level of representation which we call
R-structure, and which is read off D-structure. The R-structure is a set of triples
(i, T, k) where, for each triple, i is the index of an NP, T is the set of thematic
roles {t1, t2, . . . } assigned to i, and k is the index of the domain on which T is
defined. For convenience, let us take the lexical item with a subscript to be the
index of the domain. The sentence John fell would then have the R-structure
(John, {theme}, fallk); the sentence John hit Mary would have the R-structure

(John, {agent}, hitj), (Mary, {theme, patient}, hitj).c

Each NP in a sentence represents a set of individuals in R-structure. While
R-structure is not a strictly syntactic level, neither is it simply a mental
representation of a configuration of objects in the physical world: it is a

b. For an important proposal about the fine structure of thematic relations, and the conditions under which a particular relation is assigned to a particular syntactic argument, see Dowty (1991). We were unaware of Dowty’s work when our paper went to press.
c. R-structure is a notational variant of (parts of) Jackendoff ’s Conceptual Structure (Jackendoff 1990; 2002), if we take thematic roles to be strictly defined over CS representations.
representation of part of the linguistic description of some expression. Consequently, R-structures are constrained in ways that are not necessarily strictly
conceptual, but rather are linguistic. The constraints on the distribution of
thematic roles, from which a requirement such as Disjoint Reference is derived,
are just such linguistic constraints on R-structure. The particular constraints
assumed by us are referred to as Completeness and Distributedness:
(4) a. The R-structure of a sentence and each individual element of it must
be complete. Every required role must be assigned; each role must be
assigned to a set of individuals; and each set of individuals must have
a role.
b. The R-structure associated with a sentence must be distributed.
A thematic role relative to a particular act or state cannot be assigned
to more than one set of individuals; and more than one thematic role
of the same type cannot be assigned to the same individual or set of
individuals. (LLT, 108)
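Because an R-structure is just a set of triples (i, T, k), a condition like Distributedness (4b) can be stated mechanically. The sketch below is our own illustration (the names are invented, and Completeness is not modeled), under the assumption that triples sharing a domain index k describe the same act or state:

```python
# An R-structure is modeled as a list of triples (i, T, k): i the index of
# an NP, T the set of thematic roles assigned to i, and k the index of the
# domain on which T is defined.
def is_distributed(r_structure):
    """Sketch of Distributedness (4b): within a domain k, a given role may
    not be assigned to more than one set of individuals, nor assigned more
    than once to the same set of individuals."""
    seen = set()  # pairs (domain k, role t) already assigned
    for i, roles, k in r_structure:
        for t in roles:
            if (k, t) in seen:
                return False
            seen.add((k, t))
    return True

# Well-formed: the R-structure given above for 'John hit Mary'.
ok = [(1, {'agent'}, 'hit_j'), (2, {'theme', 'patient'}, 'hit_j')]
# Ill-formed: agent of hit_j assigned to two sets of individuals.
bad = [(1, {'agent'}, 'hit_j'), (2, {'agent'}, 'hit_j')]
```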
The advantage we see to representing (and constraining) thematic role
assignment in terms of R-structure, rather than of a syntactic level, is that
certain conditions hold on the distribution of roles and representations of
individuals even when a given set of individuals is not overtly represented by
an NP in syntactic structure. This point can be most readily exemplified by a
discussion of control in Spanish, to which we turn at the end of this section.
Before presenting the coindexing rule which we assume for English, it is
important to mention an additional fact about lexical entries. Included in the
specification of thematic structure (which must be correctly and completely
represented at the R-structure) is the indication of whether a particular role is
assigned to an object, a proposition, or a property (cf. Grimshaw 1979;
Chierchia 1985). The importance of this point will become clear below.

5.2.2 A coindexing rule


To state the coindexing rule, we introduce the following notation. Let R(NP)
be the representation in R-structure of an NP—and similarly, R(X) in general,
for any constituent X. Recall that thematic roles are not assigned to syntactic
constituents, but to their representations in R-structure. Our Coindex Rule
for English is as follows:5

5. The theory of predication presented here is a reformulation of LLT, ch. 2. It owes much to
the important treatment of control in terms of thematic relations by Jackendoff (1972). We differ
in certain respects from Jackendoff, but the basic insight of accounting for the predication
phenomena in terms of thematic relations is his.
(5) Coindex R(NP) and R(X) where X is a predicate.
a. Thematic conditions on R-structure:
(i) If R(X) bears no thematic role, then R(NP) must be a theme or
a source.
(ii) If R(X) is a goal, then R(NP) must be a theme.
(iii) If R(X) is a theme, then R(NP) must be a source.6
b. Locality conditions:
(i) If R(NP) and R(X) both bear thematic roles, they must do so
within the same domain (i.e. with respect to the same role-
assigning element) at R-structure.
(ii) If R(NP) or R(X) bears no thematic role, then X must be
bijacent to NP in syntactic structure.
c. definition: X is bijacent to NP iff:
(i) X is a sister to NP, or
(ii) X is immediately dominated by a sister of NP.7
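The bijacency definition (5c) is purely configurational, so it can be checked over a simple bracketed tree. The encoding below is our own hypothetical illustration (node labels invented); it implements clauses (i) and (ii) directly, over the two configurations discussed next:

```python
# A node is a pair (label, children); leaves have empty child lists.
def bijacent(tree, np, x):
    """Sketch of definition (5c): X is bijacent to NP iff (i) X is a
    sister to NP, or (ii) X is immediately dominated by a sister of NP."""
    def walk(node):
        yield node
        for child in node[1]:
            yield from walk(child)
    for _, kids in walk(tree):
        labels = [lab for lab, _ in kids]
        if np in labels:
            sisters = [(lab, ch) for lab, ch in kids if lab != np]
            if any(lab == x for lab, _ in sisters):      # clause (i)
                return True
            for _, grand in sisters:                     # clause (ii)
                if any(lab == x for lab, _ in grand):
                    return True
    return False

# Figure 5.1(a): raw inside V1, sister to the object NP.
fig_a = ('S', [('NP:John', []),
               ('V2', [('V1', [('V:ate', []),
                               ('NP:the-meat', []),
                               ('AP:raw', [])])])])
# Figure 5.1(b): nude in V2, sister to V1.
fig_b = ('S', [('NP:John', []),
               ('V2', [('V1', [('V:ate', []),
                               ('NP:the-meat', [])]),
                       ('AP:nude', [])])])
```

On this encoding, raw in (a) is bijacent to the object but not the subject, while nude in (b) is bijacent to the subject but not the object, as the discussion of Figure 5.1 states.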
The functioning of Coindex is illustrated in Figure 5.1. (Most examples are
taken from LLT, where many are borrowed directly from Williams.) In the
discussion of examples, we will speak informally of thematic roles as assigned
to syntactic constituents (i.e. NPs) rather than referring each time to ‘the
representation in R-structure’ of an NP.
(a) [S [NP John] [V2 [V1 [V ate] [NP the meat] [AP raw]]]]
(b) [S [NP John] [V2 [V1 [V ate] [NP the meat]] [AP nude]]]

Figure 5.1
Both raw and nude are predicates here because they are maximal APs
dominated by VP, and they bear no grammatical relation to the verb. In
Figure 5.1(a), the meat and raw are coindexed because they are bijacent,

6 source might more accurately be called location or experiencer for some verbs; see Gruber (1965), Jackendoff (1972), Nishigauchi (1984). It is clear that a deeper account of why goals are excluded as antecedents would be desirable. However, a detailed discussion of thematic roles would take us far beyond the scope of this paper.
7 The definition of ‘bijacent’ is based on the insight provided in the discussion of ‘c-subjacent’ in Williams (1980: 204, fn. 1).
control, pro, & the projection principle 129

and the meat is the theme of ate. The meat is assigned theme by raw; this
means that, in R-structure, the meat represents the theme of the domain
raw. In general, predicates assign theme to their antecedents. John is not a
possible antecedent because the predicate is not bijacent to it. In Figure 5.1(b),
however, because the predicate is in V2, it is bijacent to the subject, but not to
the object. Here John is the antecedent and theme of nude, because the
predicate is bijacent to it and it is the source of ate (or its location; see fn. 6 above).
The examples in (6) are among those which we use in LLT to show that
the appropriate level for coindexing (in English) is before Dative Movement
at D-structure. Because we assume that there are no rules of NP movement,8
these same examples illustrate the importance of thematic conditions (5a):
(6) a. John made Billi a good friendi.
b. Johni made a good friendi for Bill.
c. Johni made Bill a good friendi.
The relevant phrase-markers for (6) are given in Figure 5.2.
(a) [S [NP John] [V2 [V1 [V made] [NP Bill] [NP a good friend]]]]
(b) [S [NP John] [V2 [V1 [V made]] [NP a good friend] [PP for Bill]]]
(c) [S [NP John] [V2 [V1 [V made] [NP Bill]] [NP a good friend]]]

Figure 5.2

The phrase-markers of Figures 5.2(a–c) are all structures in which Coindex assigns an antecedent to a good friend. In all three cases, a good friend is
coindexed with a bijacent NP: Bill in Figure 5.2(a), John in Figure 5.2(b), and
John in Figure 5.2(c). In all three cases, the coindexed NP is assigned the
theme role with respect to the predicate. In none of these examples is the first
object the direct object of make; i.e. these constructions must be distinguished
from one like John made a good pie for Bill by different assignment of

8 In LLT, ch. 3, we consider the alternative of base-generating passives, but do not adopt such an analysis, for reasons dealing with the theory of predication. Our revised theory of predication here resolves the inconsistencies in the base-passive theory pointed out in LLT.

grammatical and thematic relations. We assume that in Figure 5.2(a), as indicated, both NPs are in V1, and that Bill is the theme and patient of make.
In Figure 5.2(b)—since the NP must not be assigned patient, as a direct
object would be—we assume that it is outside the V1. In Figure 5.2(c), Bill
must be assigned goal (rather than theme, patient) because the ‘good
friend’ is made ‘for Bill’. We assume that the relevant thematic requirements
are indicated in the lexical entry of the verb make. Also indicated is the
thematic role assigned to a good friend. In Figure 5.2(a), a good friend is the
goal; in Figures 5.2(b,c), it is the theme. In Figure 5.2(a), where the predicate
is the goal, the theme Bill is the antecedent—by thematic condition (5a.ii).
In the other two cases, where the predicate is the theme, the antecedent is the
source—by condition (5a.iii) (see Bresnan 1978; 1982a; Wasow 1980 for
relevant discussion).
So far as we are aware, there are no syntactic tests to demonstrate that the
examples in (6a) and (6c) are in fact structurally different as indicated in
Figures 5.2(a) and 5.2(c); however, given the maximally general PS rules in (2),
these phrase-markers will be generated. The derivation in which they are
involved will result in grammaticality only when all the thematic relations are
correctly assigned—i.e. where the lexical requirements of make are satisfied,
and where the predicate has a well-formed antecedent. The examples in (7)
illustrate the importance of the bijacency requirement on syntactic structure
(the asterisks mark the ungrammaticality of the indicated coindexing):
(7) a. John loaded the wagoni fulli with hay.
b. John loaded the hayi into the wagon greeni.
c. *John loaded the wagon with the hayi greeni.
d. *John loaded the hay into the wagoni fulli.
e. *John loaded the wagoni with the hay fulli.
These examples correspond to the phrase-markers in Figure 5.3.

(a) [S [NP John] [V2 [V1 [V loaded] [NP the wagon] [AP full]] [PP with hay]]]
(b) [S [NP John] [V2 [V1 [V loaded] [NP the hay] [PP into the wagon] [AP green]]]]
(c) [S [NP John] [V2 [V1 [V loaded] [NP the wagon]] [PP with the hay] [AP green]]]
(d) [S [NP John] [V2 [V1 [V loaded] [NP the hay] [PP into the wagon] [AP full]]]]
(e) [S [NP John] [V2 [V1 [V loaded] [NP the wagon]] [PP with the hay] [AP full]]]

Figure 5.3

Examples (7a–e) show the two different senses of the verb load. As indicated in Figure 5.3(b), load in (7b) is structurally (and semantically, of course)
similar to the verb put: the PP is inside V1. The other sense of load is indicated
in Figure 5.3(a), where the PP is a daughter of V2. Given the phrase-markers in
Figures 5.3(a–c) and the bijacency requirement on Coindex, the grammatical-
ity judgments indicated in (7) are readily explained.9
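The bijacency computation itself is mechanical. The following sketch is our own encoding (not the authors'): phrase-markers are nested tuples of the form (label, child, ...), and the two clauses of definition (5c) are checked directly:

```python
# Our own toy encoding (not the authors') of phrase-markers as nested
# tuples (label, child, ...), with the bijacency definition in (5c).

def sisters(tree, node):
    """Find `node` in `tree` (by structural equality) and return its sisters."""
    _label, *children = tree
    if node in children:
        return [c for c in children if c != node]
    for c in children:
        if isinstance(c, tuple):
            found = sisters(c, node)
            if found is not None:
                return found
    return None

def bijacent(tree, x, np):
    sibs = sisters(tree, np) or []
    if x in sibs:                                   # (5c.i): X a sister of NP
        return True
    return any(isinstance(s, tuple) and x in s[1:]  # (5c.ii): X immediately
               for s in sibs)                       # dominated by a sister

wagon = ("NP", "the", "wagon")
full = ("AP", "full")
# Figure 5.3(a): full is a sister of the wagon inside V1, so (7a) is in
tree_a = ("S", ("NP", "John"),
          ("V2", ("V1", ("V", "loaded"), wagon, full),
           ("PP", "with", "hay")))
print(bijacent(tree_a, full, wagon))   # True
# Figure 5.3(e): full is a daughter of V2, the wagon is inside V1, so (7e) is out
tree_e = ("S", ("NP", "John"),
          ("V2", ("V1", ("V", "loaded"), wagon),
           ("PP", "with", "the", "hay"), full))
print(bijacent(tree_e, full, wagon))   # False
```

The equality-based search suffices here because no subtree occurs twice; a fuller implementation would track node identity.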
The importance of the thematic conditions on Coindex is again illustrated below:
(8) a. Johni sent the book off {nervous / happy / a total wreck}i.
    b. *Johni received the book {nervous / happy / a total wreck}i.
    c. John got the presidenti angryi.
    d. *Johni got the present angryi. (got = received)
(9) a. The bedi was slept in {unmade / with dirty sheets}i.
    b. *Billi was talked about {nude / angry / in the living room}i.

9 Note that, in each case in Figure 5.3 where PP and AP are at the same height, the order could be changed (in accord with the base rule in (2)) without affecting the grammaticality judgments. There seems to be stylistic re-ordering within both V2 and V1 in English. Note also that, in a structure like Figure 5.3(e), the subject would be a possible antecedent for the predicate because the predicate is bijacent to it; e.g. Johni loaded the wagon with hay [full (from a big meal)]i.
While the Bijacency Condition appears to capture a significant generalization, it is peculiar in being a syntactic condition on a relation that holds at R-structure. Because R-structure is a level of semantic representation, a strict version of the Autonomy Thesis (cf. Hale et al. 1977) should disallow it. Because all other aspects of predication are expressed strictly in terms of R-structure, we would expect that, ultimately, the Bijacency Condition could be also. Aside from the issue of the strict autonomy of levels, there are independent motivations for a reformulation of the bijacency requirement. Although we do not now have a precise reformulation of the Bijacency Condition, we suspect that it may be nothing more than a byproduct of the way in which syntactic structures are compositionally translated into semantic representations.

(10) a. Johni was found {angry / nude / in the forest}i.
     b. *Johni was looked for {angry / nude / in the forest}i.

In none of these examples does the predicate bear a thematic role (because in
no case does it bear a DGR, or get assigned a role by an idiosyncratic verb). In
(8a), the antecedent of the predicate is the source of the verb send. In (8c), the
antecedent is the theme of get. The ungrammaticality of (8b) and (8d) results
from the fact that, in both cases, the only possible antecedent—the subject—is
neither theme nor source, but rather goal: this is a violation of the thematic
condition on Coindex. In neither (8b) nor (8d) can the antecedent be the
theme, book or present, because of the obvious conflict in semantic features. Of
course, (8d) is grammatical when we take got to have an active sense (similar to
bought), since on this sense the antecedent is a source.10
We again see a violation of the thematic conditions in (9b) and (10b). In
these cases, the passive subject and only possible antecedent for the predicate
is the goal, neither a source nor theme. Talk about and look for are not
typical examples of transitive verbs, in that their objects do not ‘undergo’ the
action of the verb. These are special source-goal constructions which lack
themes (see Gruber 1965 for relevant discussion). This distinction between
theme and goal assigned to an object is relevant in the semantics of many
verbs. It correctly characterizes the difference between the role assignments in
such pairs as look at vs. look for, and watch vs. watch for or seek (Gruber 1967
specifically takes into account the prepositions that occur with various
verbs).11 The second predicate of each pair has an object which is not affected

10 It is important to distinguish predicates with referential NP antecedents from adverbs which have scope over some clausal domain, e.g. John received the book {nervously / with good humor}. In these cases, John is not the antecedent of the adverb—i.e., John is not the theme of nervously or with good humor; rather, John, along with the VP, falls within the scope of the adverb. Our definition of predicate, including ‘Xmax dominated by Vn’, is meant to exclude these adverbs. Evidence that this exclusion is well-motivated comes from grammatical sentences like It rained furiously vs. *It rained furious: here the adverb, but not the predicate, is grammatical because there is no referring antecedent that can bear the theme role (i.e. it here has no referent).
11 It might be that a generalization is missed about the distribution of theme if some sentences have only source-goal. In that case, we would have to distinguish the usual theme relation from that in examples like (9b) and (10b). There might be a different role assigned to ‘themes’ which are not directly affected by their assigning verb. We predict that, as more work is done on thematic relations, the set of different roles will continue to grow larger;

by the action of the verb. Therefore we get grammatical sentences of the forms
(11a,b), but not the corresponding negation and passive:
(11) a. We looked for Mary but didn’t see her.
     b. We {watched (out) for / sought (out)} at least one tall man, but all we saw were midgets.
(12) a. *We {looked for / sought} Mary nude, (but we didn’t see her).
     b. *At least one tall man was {looked for / sought (out)} angry.
In (12a,b), just as in (9b) and (10b), the goal is ruled out as the antecedent of
a predicate. In the grammatical examples (9a) and (10a), the passive subject
which is the antecedent is in each case the theme of the verb.

5.2.3 VP predicates and control d


We turn next to a discussion of Coindex where the predicate is a VP. Because
the current theory contains no PRO, subjectless infinitival complements are
base VPs:12

e.g. it also seems to us necessary to distinguish recipient from goal, and location from
source. We leave this topic for future investigation.
d The treatment of control in this section and the next is a semantic account of control. It anticipates much of the typology and analysis of Simpler Syntax (see Culicover and Jackendoff 2001; 2006; Jackendoff and Culicover 2003). The Coindex mechanism used to express the control relation is stated over R-structures, which is the counterpart to CS in the later approach. In spirit the current treatment of control is very close to that of HPSG (Sag and Pollard 1991), which was a major influence on the Simpler Syntax analysis. The main difference is that the current approach attempted to unify control and secondary predication. Since secondary predication is not sensitive to the lexical head, while control is, it appears likely that this unification is in the end not feasible.
12 This idea, of course, is not new; it has been argued e.g. by Brame (1975), Hasegawa (1981), and Bresnan (1982b). Because these accounts do not base the control theory on thematic roles or directly address the PrP, comparison with our theory would fall outside the scope of this article. It is important to reiterate at this point, before a detailed discussion of VPs, that some bear DGRs—and therefore thematic roles—while others do not. VPs without thematic roles would include Bill saw John [VP waiting for a bus] or I took a taxi (in order) [VP to get there on time]. These VPs are subject to the thematic condition (a.i) of Coindex and to the bijacency requirement; they are not included in the following discussion.

(13) a. John {permitted / allowed} Billi [VP to go]i.
     b. Johni {expected / wanted / tried} [VP to go]i.
     c. John {wanted / expected} Billi [VP to go]i.
     d. John {believed / hoped for} Billi [VP to be the winner]i.

Example (13a) illustrates thematic condition (5a.ii). The VP to go is the goal of the main verb; therefore Bill, the theme, is the antecedent. (In this
discussion of verbs with infinitival complements, we again use Anderson’s
definition of theme.) The sentences of (13b) exemplify thematic condition
(5a.iii): the infinitival VP is the theme (it is what is ‘wanted’, ‘expected’, or
‘tried’), and therefore the antecedent is the source.
To explain (13a–d) fully, we assume that believe and hope for differ from
want and expect in terms of their lexical entries. Believe-type verbs require
a propositional theme (or an NP object with propositional content, e.g.
I believed the answer; see Grimshaw 1979). Expect-type verbs have less
restricted lexical structure, and the theme may be a proposition (I expected
that Bill would leave), an object (I expected a present), or an action (I expected
to leave). Often, as in (13a,b), the VP complement itself has a thematic role—
or, more accurately, translates into a representation in R-structure where it is
assigned a role. In such cases, either thematic condition (5a.ii) or (5a.iii) will
be relevant. In other cases, the VP in itself has no thematic role, but is part of a
coindexing relationship which is assigned a thematic role in R-structure. In
other words, the coindexed elements form a proposition, and this complete
proposition bears a role in R-structure; these cases fall outside the scope of the
thematic conditions. Examples (13c,d) are relevant here.
Example (13c) shows sentences where want and expect have a propositional
argument. The verbs here assign two roles: experiencer (or source) and
propositional theme (meaning a theme which denotes a proposition). In
order for the infinitival VP to be translated as a proposition in R-structure, it
must have an antecedent, and it must be assigned a θ-role by the verb. In (13c)
there are two possible antecedents, John and Bill. If John were coindexed with
to go, it would be the experiencer and also be included in the theme. This
would yield an ill-formed R-structure, since Bill would have no role at all. But

if Bill is coindexed with to go, then the proposition Billi [to go]i is the theme,
and John is the experiencer; this yields a well-formed R-structure.
Example (13d) presents two more cases where the same type of derivation is
relevant. The subject John is the source. We assume that the matrix verbs,
which both take propositional themes, assign no roles to their NP object or
to their VP complement. Coindexing is free; however, Bill must be the
antecedent of the infinitival VP. Bill bears no role with respect to the matrix
verb; thus it must receive a role from the infinitive, in order to avoid a
violation at R-structure. The proposition formed by the coindexing, Billi
(to) be the winneri, is then theme of the main verb.
For neither (13c) nor (13d) are the thematic conditions on R-structure
relevant. The matrix verb here assigns theme to the proposition that includes
the predicate; therefore, by definition, the predicate does not lack a role, and
thematic condition (5a.i) does not apply. However, the predicate is not in
itself the theme (or goal), and therefore neither condition (ii) nor (iii) is
relevant.
In contrast, simple AP predicates—such as raw in John ate the meat raw—
will have no thematic role in R-structure; therefore thematic condition (5a.i)
applies.
What is relevant in those cases when either the predicate or the antecedent
lacks a thematic role is the bijacency requirement. Where the R-structure
indicates that one of these elements has no role, then the predicate must be
bijacent to its NP antecedent in the associated syntactic structure. Because
each R-structure is associated with a D-structure (= NP-structure), both the
thematic and syntactic information is available for the well-formedness
conditions.
In each of the cases of coindexing exemplified in (13), the infinitival VP
assigns the role theme to its antecedent by the thematic-role assignment
algorithm. That other roles may be assigned by coindexing is illustrated in
(14), where we give the R-structures for each sentence (for readability, the
domain in each case is identified by the verb that defines it):
(14) a. John permitted Billi [to kick the dog]i.
        {<John, {source}, permit>,
         <Bill, {theme}, permit>,
         <[to kick the dog], {goal}, permit>,
         <Bill, {source, agent}, kick>,
         <the dog, {theme, patient}, kick>}
     b. John wanted Billi [to kick the dog]i.
        {<John, {source}, want>,
         <Billi [to kick the dog]i, {theme}, want>,
         <Bill, {source, agent}, kick>,
         <the dog, {theme, patient}, kick>}
Locality condition (5b.i) on R-structure is necessary to assure that, when there
is an antecedent for an infinitival VP, it will occur in the same S:
(15) a. Bill believed that Johni wanted [to go]i.
b. *Billi believed that John wanted [to go]i.
c. Johni believed Billj [to have been permitted [to leave]j]j. (*. . . [to leave]i]i.)
d. *Billi believes that it is easy [to fly]i.
In (15a,b), the VP to go is a theme; therefore the antecedent must be a source.
Both Bill and John are sources of their respective verbs. Only John, the closest
source, may be coindexed with the infinitive.13 The result is that both John
and to go bear roles with respect to the same verb, i.e. in the domain of want:
John is the experiencer of want, and to go is its theme. In (15c) again the
locality condition is respected. To leave is the goal in the domain of permit.
By coindexing (as in (13d)), Bill is the theme of permit (Bill is coindexed with
the larger infinitive to have been permitted to leave). Bill is also the antecedent
of to leave, resulting in both the antecedent and the predicate bearing roles
with respect to permit.14 In (15d) the NP in the matrix is not the antecedent of
the embedded VP. The closer NP, it, is the antecedent and results in ‘non-
obligatory control’, to which we will return shortly.
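The role that locality condition (5b.i) plays in (15) can be sketched with a toy encoding of R-structures as sets of triples, as in (14). The encoding, the bracketed argument labels, and the function name below are ours, purely for illustration:

```python
# Toy R-structure (our own encoding) for (15a,b) "Bill believed that
# John wanted to go", as triples <argument, roles, assigning element>.

r15 = {
    ("Bill", ("source",), "believe"),
    ("[that John wanted to go]", ("theme",), "believe"),
    ("John", ("source",), "want"),
    ("[to go]", ("theme",), "want"),
}

def same_domain(rstructure, a, b):
    """Locality condition (5b.i): a and b must bear their roles with
    respect to the same role-assigning element."""
    def domains(x):
        return {assigner for arg, _roles, assigner in rstructure
                if arg == x}
    return bool(domains(a) & domains(b))

# (15a): John and [to go] both bear roles in the domain of 'want':
print(same_domain(r15, "John", "[to go]"))   # True
# (15b): Bill's role is in the domain of 'believe', so coindexing is out:
print(same_domain(r15, "Bill", "[to go]"))   # False
```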
This brings us to the well-known difference between promise and persuade.
(16) a. Johni promised Bill [to leave]i.
b. *Bill was promised to leave.
c. *Mary believed that Bill was promised to leave.
d. John persuaded Billi [to leave]i.
e. Billi was persuaded [to leave]i.
In (16a), to leave is the theme of promise; and by thematic condition (5a.iii),
the antecedent must be the source John. In this case, Bill is the goal (this
thematic analysis is from Jackendoff 1972: 216). The ungrammaticality of (16b)
results from the fact that there is no source; the passive subject would be the

13 There are of course many other conditions which could rule out a coindexing like that in (15b), such as some version of the Specified Subject Condition (e.g. Chomsky 1973), or some version of the Variable Interpretation Convention (Wilkins 1977; 1980). What is of interest to us for the moment is simply that there is some requirement of ‘locality’ which is relevant for the coindexing of infinitives.
14 For a treatment of control that would give similar results, although developed within a different framework, see Farmer (1984).

goal just as in (16a).15 Example (16c) is ungrammatical because, even though there is a source (Mary is the source of believe), the locality condition prevents it from serving as antecedent of to leave.
For persuade, as (16e) shows, there is no problem with the passive: just as
for the active, to leave is the goal; Bill is the theme; and Bill is coindexed as
the antecedent.
This account of promise/persuade presents interesting motivation for
stating the locality condition (5b.i) in terms of R-structure. If the condition
is strictly syntactic—as in Rosenbaum’s (1967) Principle of Minimal Distance,
or in our LLT Locality Condition—then there are only two possibilities for
promise: either it is simply exceptional, or there must be a rule of Dative
Movement in English. In other words, John promised Bill to leave must be
exceptional, in that the closest NP does not control the infinitive; or such a
sentence must be derived from underlying John promised to leave to Bill. We
think it preferable to state the well-formedness condition on R-structure—
where, as we have shown in the discussion of the examples in (16), both
promise and persuade are explained unexceptionally. Thus we take our
account so far to be evidence in favor of expressing control in terms of
R-structure; and our discussion of control in Spanish, below, will make the
case for R-structure much stronger.
Our proposed theory of VP complementation and control raises the issue
of the interaction of coindexing and passivization of the infinitival VP. These
sentences illustrate the relevant cases:
(17) a. I expected Mary to be examined (by the doctor).
b. I persuaded Mary to be examined (by the doctor).
Here the infinitival VP is coindexed with Mary by the relevant conditions of
Coindex. Because we assume that the passive construction is base-generated,
the account is very straightforward:16 it is simply stated that the antecedent of
a verb with designated passive morphology is assigned the appropriate thematic

15 Even where there is a by-phrase, the passive with promise is ungrammatical: *Bill was promised to leave by John. It is possible that the object of by is in a different thematic domain in R-structure at the point at which Coindex applies. In other words, by might not directly assign to its complement any thematic role governed by the verb. The role of the complement of by would be theme-by, and the interpretation of this argument as having the same roles as the subject of the active verb would be determined at a later stage. This account would be in the spirit of the Thematic Hierarchy Condition of Jackendoff (1972).
16 Of course we are not the only researchers in generative grammar who assume the existence of a non-transformational passive; others include Freidin (1975), Bresnan (1978; 1982c), Wasow (1980), Brame (1978), Koster (1978a), Bach (1980), and Keenan (1980). Gazdar (1981) and all work in Generalized Phrase Structure Grammar also presuppose the PS generation of the passive.

roles. This would be relevant both for grammatical subjects and for antecedents
determined by coindexing. NP movement need not apply in an embedded
domain before coindexing takes place. In both cases in (17), Mary is assigned
theme, patient by the infinitival VP, just as it is in Mary was examined.
The same explanation is relevant where the subject of the matrix is the
antecedent of an infinitival passive VP:
(18) John wanted to be arrested.
Because to be arrested is the theme of want, the antecedent is the source
John. By the coindexing, John is assigned theme, patient of to be arrested.
These roles are assigned because of the passive morphology that occurs on
arrest (cf. John wanted to arrest Bill, where John is assigned agent of arrest).

5.2.4 Non-obligatory control and secondary predication


Another important type of control is that which has been called ‘non-obligatory’
(cf. Williams). In our theory, such control results from two different types of
cases. The first is illustrated here:
(19) a. It is important to arrive on time. (cf. It is important for John to
arrive on time.)
b. It is a pain to visit John.
In both these examples, the closest NP to the infinitival VP is an NP with no
identifiable referent, i.e. with reference to an unspecified set of individuals or
objects: it in (19a), and a pain (or maybe it) in (19b). We say that this NP (or rather
its representation in R-structure)—which we call ‘arbitrary’ or arb because it
lacks a referential index—is the antecedent of the infinitive. The infinitive, then,
like its antecedent, lacks an index. In these cases there is non-obligatory control.
The R-structures of such sentences indicate a triple which includes a representation of an arbitrary set of individuals, e.g. <arb, {theme}, arrive>.
We capture this fact about non-obligatory control in an addition to the
Coindex rule:
(20) (i) = Rule (5).
(ii) Assign arb to R(X) if it lacks an index.
That the locality condition is relevant for predicates with an arbitrary antece-
dent is shown in examples like this:
(21) a. Maryi said to the childrenj that it is important [to tell the truth]*i,*j
b. *Billi believes that it is a drag [to tell the truth]i.

In these cases, there is no referring antecedent in the same domain; in other words, there is no referring argument which is assigned a thematic
role by the same verbal element that assigns a role to the predicate. What
then happens is that the predicate is not coindexed with any argument in
R-structure; it has no index; and it thus receives an arb interpretation. The
R-structure for the examples in (21) therefore includes the triple <arb, {agent}, tell>. In these cases the bijacency requirement is (vacuously) relevant also. The
non-referential NP has no thematic role; therefore the predicate must be
bijacent to it. Since the NP has no index, however, the rule cannot ‘coindex’.
Another type of example where a VP complement does not have a unique
controller is illustrated below:
(22) a. To die is no fun.
     b. To leave would be {my / a} pleasure.
     c. What to do is a mystery (to John).
Where the infinitival VP is in subject position, again there is no antecedent
in the relevant domain. The predicate is not coindexed with any argument in
R-structure, and it receives an arb interpretation. However, these VPs are
arguments themselves, and they are antecedents that are coindexed with
constituents in the following VP. In other words, to die is the antecedent of no fun; to leave is the antecedent of {my / a} pleasure. Here, in R-structure, the infinitive is the i of the triple, rather than the k.
In (22c), the D-structure antecedent of a mystery (to John) is [NP [spec, +wh] [VP to do what]]. Even where there is evidence of wh-Movement in an infinitival, there is no necessary reason to assume that the constituent is an S′ with a PRO subject. We will return to this question in our discussion of Koster and May (1981).17
Thus far in this section, we have presented a theory of predication which
makes no use of the element PRO, yet adequately characterizes the ‘control’
facts for infinitival complements in English.18 Our theory accounts for

17 Another case of an infinitival VP in NP would be John sent Mary [NP a book [VP to read]]. We believe that here a separate coindexing for infinitival modifiers applies in NP; as Nishigauchi (1984) shows, the goal of the main verb is often the antecedent of the infinitive. An analysis like Nishigauchi’s is readily incorporated into our theory—except, of course, that we assume no PRO.
18 At least one set of examples is not correctly accounted for by our theory. These involve verbs of ‘saying’:
(i) John said to Mary [to arrive on time].
(ii) John {asked / told} Mary [to arrive on time].

essentially the same facts as one like that of Williams,19 which assumes PRO
(also see Manzini 1983 for relevant discussion).20

5.2.5 Control in Spanish


We consider now the issue of the control of infinitival complements in
Spanish. We have been assuming that the control facts are expressed at
R-structure. We have assumed also that the thematic conditions on Coindex
are to be understood as well-formedness conditions on R-structures. We have
not demonstrated, however, that a level of R-structure is necessary in the
theory. In our brief consideration of Spanish, we will see that a theory which
includes a level of R-structure seems particularly well-suited to an interesting
explanation of the facts.

(iii) John said [to arrive on time].


Here, if the infinitive VP is the theme, then according to Coindex, the source should be the
antecedent. In (i) and (ii), Mary, which would seem to be the goal, is the controller; in (iii),
there is non-obligatory control. These VPs seem to work like the infinitival modifiers of
Nishigauchi (1984). [For a more recent account of control that handles these cases, see Culicover
and Jackendoff (2005: ch. 12). This account, like the one proposed in the current chapter, offers a
semantic account of control that is constrained by the properties of lexical items.]
19 An advantage of our theory of control over that of Williams is that we need no rules of ‘arb rewriting’: the arb interpretation follows directly from our account of coindexing. Another important difference between the two theories is the extent to which thematic information is utilized; we have no strictly syntactic contexts for predication.
Again with respect to arb interpretation, it must be pointed out that an AP predicate can be interpreted as arb only when it is inside some infinitival VP that is arb:
(i) To swim nude would be fun.
In general, a predicate with no θ-role cannot be arb:
(ii) *It is important nudeARB.
A predicate inside the infinitive in (i) is grammatical as arb, even though it is bijacent to no NP in syntactic structure.
This suggests, as mentioned in fn. 9, that the correct treatment of these predicates, as with infinitival VPs, is in terms of R-structure. Effectively, the antecedent of nude in (i) is the antecedent of the predicate that immediately contains it. We surmise that there is a chaining of predicates in R-structure. It remains, of course, to work out a formal account of this chaining.
20 Manzini points to sentences like (?)John was promised to be allowed to leave as counterexamples for current approaches to control phenomena. It is not clear to us that such an example is grammatical; however, it would be if promise has some use in which its thematic roles are the same as e.g. permit:
(i) Bill permitted John [to go].
    (source) (theme) (goal)
(ii) (?)Bill promised John [to be allowed to go].
    (source) (theme) (goal)
In (ii), the antecedent is the theme John; and just as with permit, the passive example would be well-formed. This use of promise is obviously very restricted; it permits only to be allowed, to be able, or the like in the embedded VP.

In Spanish, it does not seem possible to account for the control of infini-
tives in strictly configurational (syntactic) terms. As in English, there seem to
be thematic well-formedness conditions. There is, however, an interesting
difference between the two languages with respect to control.
In English, some NP in the sentence is, in general, the controller of the
infinitival VP. The arb interpretation arises only under the restricted circum-
stances pointed out above—namely, where the VP has no index because it is
not locally coindexed with a referring NP, or where the infinitive is a subject:
(23) a. Maryi {wants / expects / asks} [to leave]i.
     b. Mary {wants / expects / asks} youi [to leave]i.
(24) a. Mary {sees / permits / makes} youi [(to) leave]i.
     b. *Mary {sees / permits / makes} [(to) leave]ARB.
By contrast, in Spanish, the interpretation of an embedded infinitival VP
often involves a controlling NP that is not overtly indicated in the syntax:
(25) a. Ana tei {recetó / permitió / vio} saliri.
        Ann 2.SG {prescribed / permitted / saw} to.leave
        ‘Ann {prescribed that you / permitted you to / saw you} leave.’
     b. *Anai {recetó / permitió / vio} saliri.
     c. Ana {recetó / permitió / vio} salirARB.
control, pro, & the projection principle 143

But not all Spanish verbs allow this arb interpretation of the complement:

(26) a. *Ana tei {quiso/esperó/decidió} saliri.
     Ann 2.SG {wanted/expected/decided} to.leave
     ‘Ann {wanted/expected/*decided} you to leave.’
b. Anai {quiso/esperó/decidió} saliri.
     Ann {wanted/expected/decided} to.leave
     ‘Ann {wanted/expected/decided} to leave.’
c. *Ana {quiso/esperó/decidió} salirARB.
     Ann {wanted/expected/decided} to.leave
     ‘Ann {wanted/expected/decided} to leave.’ (ungrammatical in English on the ARB interpretation)

Before we can characterize this difference between English and Spanish formally, it is necessary to point out a further characteristic of the above examples. Compare the following:

(27) a. Fue triste lamentar.
     was.3.sg sad to.lament
     ‘It was sad to lament.’
b. (El) lamentar fue triste.
the to.lament was.3.sg sad
‘(The) lamenting was sad.’

(28) {Permitió/Recomendó/Escuchó} lamentar.
     {permitted/recommended/listened.to}.3.SG to.lament
     *‘S/he {permitted/recommended/listened to} (to) lament.’

In (27), the interpretation of the infinitive lamentar is truly ‘arbitrary’ in the sense that its antecedent is any (set of) individual(s). This is the type of
arbitrary interpretation found in English in It is important to study. The
interpretation of the infinitive in (28) is rather different. Here the antecedent
of lamentar is not expressed in the syntactic structure, but neither is it really
arbitrary: it must be interpreted as the (unexpressed) object of the matrix
verb. The interpretation is as in English S/he listened to someone lament. This
type of sentence contrasts with one like (29).
(29) Escuchó el lamentar.
listened.to.3.sg the to.lament
‘S/he listened to the lamenting.’
Examples like (28) have an ‘understood’ object of the matrix verb, and this
object is the antecedent for the infinitive. For verbs like escuchar, the object is
the controller of the infinitive, whether or not it is overtly expressed in the
syntax. We would say, therefore, that (27) and (29) illustrate arbitrary control,
whereas (28) presents a different case.
We propose to account for these Spanish control facts in terms of
R-structure, rather than syntactic structure. It appears that Spanish permits
an object in R-structure to be a controller, even though it is not present in
syntactic structure. The R-structure in example (28) indicates that the relevant
thematic role is assigned to the representation of individual(s) which is
unspecified in the D-structure. Of course, we must also assume that, at some level (probably discourse), there is a relevant specification of the NP that is left unspecified at the sentence-syntax level.21
This difference between Spanish and English can be characterized in our
notion of R-structure, where thematic roles are represented and where impli-
cit (but syntactically unexpressed) objects can be expressed. The thematic
conditions on Coindex are the same for the two languages; the difference
concerns the syntactically obligatory nature of subjects and antecedents in
English as compared with Spanish. For both languages, the thematic condi-
tions are well-formedness requirements for R-structures; but English, unlike
Spanish, is sensitive to whether there is an overt syntactic representation of an
antecedent NP. English requires both syntactic subjects and antecedents,
whereas Spanish allows subjects (as in so-called ‘pro-drop’ sentences) as
well as antecedents of some predicates to be syntactically null.
Returning to the examples, in (25a) the theme te is overtly expressed, and
effectively controls the interpretation of salir. In (25c) and (28), the theme,
which is not expressed in syntactic structure, is nevertheless the controller.
This fact is accurately represented at R-structure. The examples in (27) are
handled just like the parallel cases in English.
The Coindex rule (20) is the same for Spanish as for English. For
both languages, the thematic well-formedness conditions are relevant at
R-structure. The result of the application of Coindex to (28) is given in (30).
(30) {<yo, {agent, source}, permitiri>,
<xi, {theme, patient}, permitiri>,
<lamentarj, {goal}, permitiri>,
<xi,{agent, source}, lamentarj>}
Because lamentar is the goal of the verb permitir, the antecedent must be a
theme. As indicated in the R-structure, the theme in this case is the unspeci-
fied xi. This xi is then identified at R-structure as the antecedent of lamentar.
The locality condition is respected in that both the predicate and the antece-
dent bear roles with respect to the domain permitir.
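As a concrete illustration, the resolution procedure implicit in (30) can be sketched in a few lines of code. This is our own toy model, not part of the formal theory: the R-structure is encoded as ⟨argument, roles, domain⟩ triples, and the thematic conditions on Coindex are reduced to just the two cases discussed here (a goal complement takes the theme of its domain as antecedent; a theme complement takes the source).

```python
# Toy model of antecedent resolution at R-structure (illustrative only).
# Each entry encodes <argument, thematic roles, domain predicate>,
# following the R-structure in (30) for 'Permitio lamentar'.
r_structure = [
    ("yo",       {"agent", "source"},  "permitir"),
    ("x",        {"theme", "patient"}, "permitir"),  # unspecified object
    ("lamentar", {"goal"},             "permitir"),
    ("x",        {"agent", "source"},  "lamentar"),
]

def antecedent(predicate, rs):
    """Resolve the antecedent of an infinitival predicate.

    Locality: the predicate and its antecedent must both bear roles
    in the same domain.  Thematic conditions (simplified): a goal
    complement takes the theme as antecedent; a theme complement
    takes the source (condition (5a.iii), the prometer/promise case).
    """
    for arg, roles, domain in rs:
        if arg != predicate:
            continue
        wanted = "theme" if "goal" in roles else "source"
        for other, other_roles, other_domain in rs:
            if other != predicate and other_domain == domain and wanted in other_roles:
                return other
    return None  # no local antecedent: arb interpretation

print(antecedent("lamentar", r_structure))  # -> x (the unspecified theme)
```

On this sketch, the unspecified theme x is returned as the antecedent of lamentar, exactly as in the text; for (31), where the infinitive is itself a theme, the same procedure would select the source (the subject indicated by the verb morphology).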
Thematic condition (5a.iii) is relevant for examples like (26), where the
subject is the source (or experiencer) and the infinitive is the theme. It is
also relevant for the usually recalcitrant case involving prometer ‘promise’, just
as in English:

21 It is important to note that we distinguish an unspecified NP in R-structure (as we are
referring to it here) from an NP in syntactic structure which dominates no lexical material (i.e.
[e]), and which must be lexically filled or bound in a well-formed derivation. The unspecified
NP is present in R-structure, but not in the syntactic levels.

(31) Le prometí salir (a Maria).
     to-her promised.1.sg to.go to Maria
     ‘I promised (Maria) to leave.’
Here the infinitive is the theme; le ‘to her’ is the goal; and the subject
(indicated by the verb morphology) is the source and therefore the antece-
dent of salir.
More remains to be said about predication and control in Spanish; we
include this brief presentation only to motivate our use of the level of
R-structure. The Spanish control facts are best explained at a grammatical
level where the extensions of arguments, but not syntactic constituents, are
represented. This is not a strictly syntactic level, such as D-, S-, or surface-
structure.22 The difference we have pointed out between Spanish and English
would not be expressed in terms of the Coindex rule. The control difference is
actually a reflex of the fact that the two languages differ in the treatment of
pronominal subjects (and antecedents); this is to be explained by a theory of
the ‘pro-drop’ facts (see Chomsky 1981a).
We move next to a number of syntactic arguments against PRO, based on a
variety of English constructions.

5.3 Arguments against syntactic PRO


Here we briefly demonstrate some of the syntactic disadvantages of postulat-
ing PRO.23 Our arguments demonstrate the essentially non-syntactic nature
of this element, and thus give evidence for the no-PRO theory. Certainly, for

22 One might postulate a PRO in object or clitic position, which then controls the embedded
subject PRO; this would mean substantially revising the conditions on the distribution of well-
formed PRO. Jaeggli (1982) does propose an object PRO for Spanish, but it has very different
properties from the one which would be necessary here. Importantly, a PRO in object or clitic
position would be an element with no syntactic properties, exactly as we demonstrate for subject
PRO.
We prefer an account of control that uses R-structure to one based on lexical information (as
suggested by Bresnan 1982b), specifically because we can thus account for the difference between
English and Spanish that we have pointed out. It would seem ad hoc, in a lexical account of
control, to express the fact that English—but not Spanish—requires the controlling antecedent
to have a syntactic representation. This does not seem to be a fact best captured in terms of the
control relations of individual lexical entries: the general pattern of both English and Spanish
would nowhere be expressed.
Bresnan’s (1982b) account is similar to ours in that the control possibilities are determined
not by structural relations but in terms of the ‘function’ of the antecedent NP (though Bresnan’s
‘function’ is not directly equivalent to our thematic relations). A thorough comparison of the
two theories of control would be of interest in future research.
23 Arguments which appear in this section in abbreviated form, because of space limitations, are discussed in detail in LLT, ch. 2.

each argument, there may be some way in G[overnment/]B[inding] theory to account for the facts. In certain cases, an apparently syntactic phenomenon
could be accounted for in the phonological component (where PRO is
invisible), or by stipulating that PRO is to be distinguished from all other
NPs (e.g. by being assigned a θ-role independent of case, or by not having a
governing category).e We would take each such explanation as evidence that
PRO, as distinct from lexical NPs or the trace of wh, is devoid of syntactic
characteristics.

5.3.1 Gapping (I)


The rule of Gapping cannot apply where the gapped sequence contains an
overt NP antecedent, as in (32), or a subject NP, as in (33). Neither can the gap
contain the trace of wh-Movement, as in (34) (the square brackets indicate the
gapped material):
(32) Arthur expects Mary to go dancing, and Archie *[expects Mary], to go
to the movies.
(33) John said that the kids like elephants, and Mary *[said that the kids
like], camels.
(34) Who did John say t ate the cake, and who did Mary *[say t ate], the pie?

Gapping is not sensitive, however, to a PRO internal to the gapped sequence:


(35) a. John tried PRO to leave, and Mary [tried PRO], to stay.
b. Susan will manage PRO to fix the faucet, and John {[will manage PRO], to fix / [will manage PRO to fix],} the sink.
c. John expected PRO to try PRO to leave, and Mary {[expected PRO to try PRO], / [expected PRO to try], PRO} to stay.
These facts follow immediately if there is no syntactic element PRO.

5.3.2 Gapping (II)


Assuming that Gapping may apply where only one constituent follows the gap
(Stillings 1975), there is a ready account of the grammaticality difference
between (36a) and (36b):

e In fact, in subsequent work, the notion of ‘governing category’ was ultimately abandoned and it was proposed that PRO has a special abstract Case, or no case; for a review of the issues, see Landau (2006).

(36) a. I {expect/want/would like/believe/find} Mary to be rich and Bill *[{expects/wants/would like/believes/finds}] Sam to be poor.
b. John {expects/wants/would like} to eat the beans, and Mary [{expects/wants/would like}] to eat the potatoes.

The examples in (36a) are ungrammatical in our theory because what follows
the gap is the sequence NP VP. In (36b), what follows the gap is just VP. If, in
both cases, the single constituent [S NP VP] were involved (whether in
syntactic structure or in the phonological representation), then we would
have no account of the grammaticality difference.

5.3.3 Pseudo-clefts
S′ is, in general, well-formed as the focus of a pseudo-cleft:
(37) a. What John expects is that he will be elected President.
b. What John prefers is for Mary to be elected President.
If, in (38a,b), the focus constituent is an S′, we have no account of the
grammaticality distinction:
(38) a. What John expects is to be elected President.
b. *What John expects is Mary to be elected President.f
Assuming no PRO, (38a) has a VP as focus. Example (38b) is ungrammatical
because the focus of the pseudo-cleft is a sequence of two constituents: NP
VP.24 If (38a) contained PRO, it should be excluded for the same reason.

5.3.4 Appositive relatives


In general, full NPs and pronominal forms allow appositive relatives:
(39) a. John expects Bill, who deserves it, to win the prize.
b. John expects himself, who deserves it, to win the prize.
However, PRO cannot occur with an appositive:

f Examples such as these strike me as much better now than when we wrote this article. It is conceivable that they are well-formed, and derived by omitting the complementizer for from sentences such as What John expects is for Mary to be elected President.
24 In GB theory, (38b) is ungrammatical because Mary would not be assigned case. While this accounts for the facts, we interpret this kind of explanation as evidence that only case-marked NPs—i.e. NPs other than PRO (or the trace of NP)—have any syntactic reality.

(40) *John expects PRO, who deserves it, to win the prize.
In a theory with no PRO, (40) is ungrammatical because there is no antece-
dent for the appositive.25

5.3.5 Conjunction
If PRO is an NP, we would expect it to conjoin with other NPs. However, it
does not, as shown in (41b,c).26
(41) a. I expect to go to Italy, and I expect John to go to Italy.
b. *I expect PRO and John to go to Italy.
c. *I expect John and PRO to go to Italy.

5.3.6 Stylistic Inversion


The rule of Stylistic Inversion, illustrated in (42) and (43), moves an NP into a
VP where a constituent has been extracted from that VP.
(42) a. The man in the funny hat sat on the stool.
b. On the stool sat the man in the funny hat.
(43) a. John expects the man in the funny hat to sit on the stool.
b. ?On the stool John expects to sit the man in the funny hat.
Where there is no overt ‘subject’ of the infinitive, the subject of the matrix
moves into the VP:
(44) a. The man in the funny hat expects to sit on the stool.
b. On the stool expects to sit the man in the funny hat.
Under our analysis, the NP which is involved in Stylistic Inversion is always
the antecedent of the involved VP. This NP will be either the syntactic subject
(e.g. of a matrix S, as in (42)) or the antecedent of an embedded infinitival VP
(as in (43) and (44)).27

25 Case theory would be hard-pressed to account for the grammaticality difference between (39a) and (40), given that expect generally has optional S′ deletion, and therefore permits both John expects PRO to win and John expects Bill to win. Presumably the application of S′ deletion should not be sensitive to the appositive in its context.
26 Case theory can account for the grammaticality facts here, as in (38b): expect either would or would not have S′ deletion—meaning that either both PRO and John would be assigned case, or neither would be. Again (see fn. 23 above), case theory highlights the fact that case-marked NP has a clearly syntactic character, whereas PRO does not.
27 The definition of ‘antecedent’ includes both grammatical ‘subject’ and the antecedent designated by coindexing. This is discussed in LLT, ch. 4, in terms of the ‘antecedent-internal e condition’ of Delahunty (1981).

The facts in (43) and (44) would be difficult to explain if, in both cases, the
embedded clause were an S′ with a PRO subject. If there were a PRO subject,
we would expect Stylistic Inversion not to apply at all—given that pronominal
forms cannot themselves invert, and that they block other NPs from moving
into the VP over them:
(45) a. He sat on the stool.
b. *On the stool sat he.

(46) a. He expects to sit on the stool.


b. *On the stool expects to sit he.

(47) a. The man in the funny hat expects {him/himself} to sit on the stool.
b. *On the stool expects {him/himself} to sit the man in the funny hat.
The grammaticality of (44b) not only argues against the pronominal
element PRO but, given the well-known constraint against the lowering of
constituents, also argues against the claim that the infinitive is an S′ (or S). In
our theory, the ungrammaticality of (47b), as compared with (44b), results
from the fact that the NP moved into the infinitive is not its antecedent.
We have presented here six different constructions which indicate the
disadvantages of assuming that PRO is a syntactic element (more are pre-
sented in LLT, ch. 2). The arguments associated with these constructions
would not be of so much interest if the theory containing PRO were the
only one to explain the control facts. We claim, however, that at least one
other explanatory theory, namely ours, does not use PRO. In the next section,
in discussion of Koster and May (1981), we address certain theoretical argu-
ments adduced in favor of PRO.

5.4 Arguments of Koster and May (1981) for syntactic PRO


We here discuss the most salient arguments of Koster and May (1981, hence-
forth K&M) for the existence of PRO as a syntactic element (see LLT, ch. 2, for
a detailed discussion of K&M). Essentially, K&M give two types of arguments.
First, they claim that bare infinitival complements cause a complication of the
base component. We note that this need not be the case. Second, they give a
set of grammatical arguments to support the assumption that PRO exists.
These arguments can each be accounted for in a theory that marks the
relationship between an infinitive and its subject through coindexing for
predication, without the control of PRO.

5.4.1 Wh-infinitives
K&M argue that, since wh-infinitives as exemplified in (48) must have COMP,
they must be S′s (and therefore would have PRO subjects):
(48) a. I wonder what to do.
b. a topic on which to work
However, COMP might be introduced under two types of nodes. If this means
an unwanted complication of the base, then we would also have to question
the analysis of NP and AP by Selkirk (1977) as both containing DET, on the
basis of examples like John knows {this/that} man and John is {this/that} tall. But in fact, what Selkirk’s observations point to is a generalization: if NP and AP are both analyzed as [+N], DET can be generalized as the specifier of [+N] phrases.
A similar generalization can be made for COMP. If it is supposed that
infinitival VP (or VP′) is the maximal projection of V, and that S′ is the maximal projection of Modal [later Infl] (as suggested by McA’Nulty 1980,
Klein 1981, and Chomsky 1981a)—and if it is assumed further that Modal and
V share the feature [+V] (cf. Chomsky 1972)—then we have the generalization
that COMP is the specifier of [+V] phrases.28
K&M also argue that the introduction of COMP in both VP′ and S′ is undesirable, given that VP′ is not a bounding node with respect to subjacency. They note (p. 135) that the presence of COMP in VP′ cannot block configurations like the following (their (79)):
(49) *What2 does Mary wonder [VP′ to whom3 [to give e2 e3]]?
But there are other constraints which will block (49), including the Variable
Interpretation Convention of Wilkins (1977; 1980) and the Locality Condition
of LLT.
Finally, in this respect, note that an analysis of examples like (48) and (49)
need not turn on just the issue of VP′ vs. S′. Our theory (in LLT) in fact says that these wh-phrases should be analyzed as NPs with [+wh] specifiers that permit wh-Movement. This analysis is based on an adaptation of the ‘deverbalizing’ rules of Jackendoff (1977), where NP can be rewritten as [SPEC V′′].

28 Under this account, a VP which occurs with tense would be Vmax−1. We do not in fact claim that infinitival VP has a COMP (except when the VP is an infinitival NP). We include this discussion simply to address the logic of K&M’s argument.

The SPEC which is a sister of any V′′ is then analyzed as COMP (as opposed to
DET), and permits wh-Movement.29
So far as we can tell, these are the only two arguments that K&M bring to
bear against the notion of COMP in VP′ which are relevant in light of our
proposed theory of VP coindexing.

5.4.2 Redundancy of base rules


A different source of potential redundancy in K&M’s VP′ analysis is the fact that both S′ and VP′ must be introduced in the expansion of VP, NP, AP, and PP. However, because both S and VP′ are projections of [+V], the correct (and very general) base rule would be (50).
(50) Xmax → X [+V]max
Relatives and sentential subjects can also be expressed in terms of [+V], as in (51), where [+V]max refers to VP′ and S′.
(51) a. NP → NP [+V]max
b. NP → [+V]max

29 Further evidence of this type of analysis of infinitival NPs can be found in Spanish. There is good evidence for the two following nominal structures, even where no wh-term is involved:
(a) [N′′ [COMP el] [V′′ [V′ [V′ hablar] [PP con ellos]] [ADV constantemente]]] ‘the speaking with them constantly’
(b) [N′′ [DET su] [N′ [AP doloroso] [N despertar] [N′′ (de) el pueblo]]] ‘his painful awakening of the town’
The structure like that in (a) is the one which permits wh-Movement. This analysis of infinitival NPs is presented in Wilkins (1986). (See also Plann 1981 for a discussion of these infinitives.)

By taking into account the feature specifications of major categories, we see that including VP′ in the base leads to no complication of the base rules.
Thus far we have seen that K&M’s principal arguments against the VP′ analysis are based on considerations of phrase structure.30 They also present a number of arguments not directly against VP′, but rather in favor of the S′ analysis. These, according to them, present serious problems for the VP′ analysis.

5.4.3 Pseudo-clefts
K&M argue that the grammar can be simplified if there is no VP′, because then it need only be stated that S′ can be the focus of a pseudo-cleft. They fail to
note sentences like the following, which suggest that VP can function as a
focus if it is not tensed:
(52) a. What he did was feed the ducks.
b. What he wanted to do was feed the ducks.
Tensed VP cannot be a focus, but that fact has little to do with whether there is a VP′ constituent, since VP′ would not contain tense.

5.4.4 Extraposition
K&M argue for a simplification by pointing out that both S′ and so-called VP′
extrapose. They do not note that AP and PP can also extrapose, as shown in
(56)–(57):
(53) a. A book which we didn’t like appeared.
b. A book appeared which we didn’t like.
(54) a. A book on which to work appeared.
b. A book appeared on which to work.
(55) a. A problem to work on is on the table.
b. A problem is on the table to work on.
(56) a. A book bound in leather was on the table.
b. A book was on the table bound in leather.
(57) a. A book about armadillos has just appeared.
b. A book has just appeared about armadillos.

30 While these points about the base are of interest, K&M have glossed over some additional complexities that are important to consider. These are discussed in LLT, ch. 2.

The true generalization is not ‘Extrapose S′ from NP’, as K&M would conclude, but, rather, simply ‘Extrapose from NP’.

5.4.5 Coordination
According to K&M, infinitival complements conjoin with sentential comple-
ments, and therefore should be considered to be of the same category. They
give the following examples (p. 133):
(58) a. To write a novel and for the world to give it critical acclaim is John’s
dream.
b. John expected to write a novel but that it would be a critical disaster.
The same logic would lead to the conclusion that the complements are all PPs
or NPs, because for-to complements can be conjoined with PP, and that-
complements can be conjoined with NPs:
(59) a. John hopes for Mary to leave and for a miracle.
b. I believe your answer, and that you believe what you are saying.
c. That you were here last night, and John’s reaction when you told
him, surprised no one.
The argument from conjunction used in (58) to show that VP′ is the same category as S′ would lead to the conclusion that, in (59), S′ is NP or PP. Either
it is the case that conjunction does not provide a test for syntactic category, or
else there must be no problem with saying that all the conjoined constituents
are NPs. But presumably K&M cannot adopt this view (see Koster 1978b).

5.4.6 Construal
The strongest arguments for subjects in superficially subjectless clauses deal
with anaphora, coreference, and rules of construal in general. K&M point out
several facts that can be explained if these clauses contain a PRO subject. Two
important points must be made about this part of their discussion.
First, K&M’s approach is sufficiently problematic to warrant exploration of
the relevant constructions within alternative theories; e.g. such exploration
would seem necessary for Q-Float and for the correct construal of all. While
certain things can be adequately accounted for by a movement analysis of Q,
illustrated in (60), a number of problems remain. These can be exemplified
by (61).
(60) a. All the men tried to leave.
b. The men all tried to leave.
c. The men tried [PRO all to leave].
d. The men tried [to all leave].

(61) a. The men all tried to fit in the car.
b. The men tried to all fit in the car.
Examples like (61) were noted by Baltin (1982). A meaning difference exists
between (61a) and (61b) which would seem to militate against a Q-Float
analysis of the placement and construal of all, because presumably the
Q-Float rule should not alter the meaning of the sentence. In (61a), the
men—either individually or as a group—could be trying to fit into the car;
in (61b), they are trying to fit in the car all together. A predicational (coin-
dexing) analysis of (61) predicts a meaning difference: in (61a), all is part of
the VP all try to VP, whereas in (61b), all is part of the embedded VP all fit. In
both cases, all is correctly construed with the men by virtue of the coindexing.
Next consider the following:

(62) a. John, Fred, and Mary {have all left / expect to all leave}.
b. *All John, Fred, and Mary {have left / expect to leave}.
The ungrammaticality of (62b) seems to require a predicational analysis of
(62a); there is no well-formed source for a movement account of all. (Inter-
estingly, Ruwet (1982) shows that the French rule of R-Tous, which in many
respects corresponds to English Q-Float, also must apply in constructions
where it cannot have a well-formed source.)
Second, even if PRO unproblematically explains the relevant aspects of
anaphora and coreference,31 this does not affect our claim that PRO is not a
syntactic element. A theory can perfectly well use PRO, or its equivalent,
in LF—or at some other level relevant for semantic interpretation, such as
our R-structure—without incorporating it into strictly syntactic levels. The
distribution of anaphoric elements, the possibilities for coreference, and
construal in general are exactly the type of phenomena that should be
accounted for at a level that is not strictly syntactic.32

5.5 Comparison with the Projection Principle


A theory of grammar which makes no provision for a phonetically null
subject of embedded infinitival complements in syntactic structure is one

31 In LLT, ch. 2, we show that there are also certain problems with PRO in the account of reflexives, especially reflexives inside NP.
32 Wilkins (1985) shows that our level of R-structure is in fact relevant for bound coreference, reflexivization, and related phenomena.
156 explaining syntax

which cannot assume the PrP. It follows from the PrP that all verbs which have
a logical subject also have syntactic subjects:
. . . θ-theory requires that clauses with certain verb phrases (e.g. persuade John to leave
but not be raining or be a good reason for his refusal) must have subjects at the level of
LF-representation. By the projection principle, these clauses must have subjects at D-
structure and S-structure, either PRO, or the trace of an NP, or some phonetically-
realized NP. (Chomsky 1981a: 40)

To do away with PRO convincingly, it is necessary to consider carefully the role of the PrP in the theory of grammar, independently of the characterization of the distribution of PRO. In other words, a theory with no PRO must
not only cover all the aspects in which PRO is a useful device, but must also
have some mechanism capable of doing the work done by the PrP in the PRO
theory. While it is difficult really to separate out the effects of a single
principle within such a cohesive theory as GB, it seems possible to distinguish
four types of work done by the PrP.

5.5.1 The categorial component and the lexicon


First, the PrP makes possible a radical reduction in the categorial component.
Reference to independent properties of base structures is eliminated in favor
of the specification of properties of lexical items—which, presumably, need to
be included in the lexicon in any case. The theory of grammatical relations is
derivable from the well-formed base structures which themselves are deter-
mined by the requirements of lexical items at LF.
In our theory, because we assume that lexical entries are specifications of
thematic structure, and because we have a comprehensive theory of predica-
tion, we also have a greatly reduced categorial component. To compare the
two theories, consider these examples:
(63) a. We persuaded Bill to leave.
b. We expected (Bill) to leave.
c. We believed Bill to be the winner.
Beginning with the verbs persuade and expect, we can assume (along
with Chomsky) that they differ in their lexical properties and their LF-
representations—in that persuade takes an NP object and a clausal comple-
ment, while expect does not have both together. These facts are correctly
captured in both theories, as illustrated below:
(64) a. We persuaded [NP Bill] [S PRO to leave].
b. We expected [S′ {PRO/Bill} to leave].

(65) a. We persuadedj [NP Billi] [VP to leavek]i.
     Bill = [patientj, themej]
     Bill = themek
     [to leave] = goalj
b. Wei expectedj [VP to leavek]i.
     we = themek
     [to leave] = themej33
As can be seen in (64) (assuming the appropriate theory for the control and
interpretation of PRO), the PrP requires that the LF be directly represented in
the syntactic structure. That is, persuade has exactly one NP object and one
full clausal complement; expect has exactly one clausal complement or one NP
object. Given the early version of the θ-criterion (Chomsky 1981a: 36), each
NP (lexically realized or PRO) would bear exactly one thematic relation.
The examples in (65) show the result of coindexing for predication in the no-PRO theory (as discussed in §5.2, examples (13)–(16)). The syntactic structure
for persuade includes an NP object and a VP complement. The lexical require-
ments are satisfied because the definition of ‘proposition’ in the theory includes
both full clauses and antecedent/predicate pairs. Persuade requires an object
which is a (set of) individual(s) along with a proposition. After coindexing,
[Bill]i [to leave]i in (65a) corresponds to a proposition, while [Bill] remains the
object. For expect, [Bill]i [to leave]i is the proposition (after coindexing);
alternatively, theme is assigned just to the VP in (65b). In the examples of
(65), we include partial specifications of thematic role assignment (discussed in
detail in LLT). Because we do not assume the PrP, neither do we maintain the
θ-criterion. Our well-formedness condition on the assignment of thematic
roles is stated in terms of local domains (indicated in (65) by subscripts on
the verbs). No NP may bear more than a single role within the same class,
extensional or intensional, within the same domain (see the discussion of the
principles of Completeness and Distributedness in LLT, ch. 3). In (65a), Bill
occurs as the theme of both persuade and leave. Since these two verbs define
different domains, the result is well-formed. (An ill-formed case would be, for
example, if some NP were assigned both theme and source of the same
predicate.)
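This well-formedness condition can also be stated as a small executable sketch. The partition of roles into classes below is our own guess, made only for illustration: the text tells us that theme and patient can co-occur on one NP within a domain (Bill in (65a)) while theme and source cannot; the labels ‘extensional’ and ‘intensional’ are borrowed from the discussion, but the exact membership of each class is an assumption.

```python
# Illustrative check of the condition: no NP may bear more than one
# role of the same class within the same domain.  The class
# assignments below are assumptions made for the sake of the example.
ROLE_CLASS = {
    "theme":   "extensional",
    "source":  "extensional",
    "goal":    "extensional",
    "agent":   "intensional",
    "patient": "intensional",
}

def well_formed(assignments):
    """assignments: iterable of (np, role, domain) triples."""
    seen = set()
    for np, role, domain in assignments:
        key = (np, ROLE_CLASS[role], domain)
        if key in seen:
            return False  # two same-class roles in one domain
        seen.add(key)
    return True

# (65a): Bill is patient and theme of persuade (different classes)
# and theme of leave (a different domain) -- well-formed.
ok = well_formed([
    ("Bill", "patient", "persuade"),
    ("Bill", "theme",   "persuade"),
    ("Bill", "theme",   "leave"),
])

# Ill-formed: an NP assigned both theme and source of one predicate.
bad = well_formed([
    ("x", "theme",  "pred"),
    ("x", "source", "pred"),
])

print(ok, bad)  # -> True False
```

On these assumptions, Bill’s double role assignment in (65a) passes the check because the two roles either differ in class or differ in domain, while the theme/source clash within a single domain is rejected, matching the ill-formed case mentioned in the text.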
In our (no-PRO) theory, because a verb like expect says simply that it may
have a propositional complement, we predict the grammaticality of all three
of these cases:

33 Wasow presents a very similar thematic analysis in a lexical framework.

(66) a. Wei expected [to leave]i.
b. We expected Billi [to leave]i.
c. We expected [that everyone would leave].
For the PRO theory, in order for the PrP to be inviolable, there must be a
system of exceptional case-marking to account for the grammaticality of both
alternatives in (67) (see Chomsky 1981a: 97–9):

(67) We expected [{PRO / Bill} to leave].
This is because PRO and lexical NP are elsewhere in complementary distri-
bution. If expect here assigns case, then PRO should be ill-formed, because
PRO cannot have case. But if expect does not assign case, then (67) with Bill
should be ungrammatical because a lexical NP must have case.
This brings us back to the example in (63c) with believe. Believe is gram-
matical only with a lexical NP as ‘subject’ of the infinitive:
(68) *We believed to be the winners.
This difference between expect and believe is captured in terms of optional vs.
obligatory S′ deletion to get the right results for case-marking. For the PRO
theory, persuade and try (We tried [PRO to leave]; *We tried [Bill to leave]) are
the paradigmatic cases, and expect and believe require exceptional treatment.
In our theory, however, believe, persuade, and expect are all regular in
English, given the correct lexical information: believe must have a propos-
itional complement, expect may have a propositional complement, and
persuade has both an object and a proposition as its complements.34 Try
also can be readily handled if its lexical entry simply states that its theme is
non-propositional. This would mean that theme is assigned to [to leave] in
(69a). While to leave and we are coindexed, this coindexing pair does not
count as propositional, because it is not treated as a unit with respect to the
thematic properties of the verb try:
(69) a. Wei triedj [to leave]i.
we = experiencerj
[to leave] = themej
b. *We tried Bill to leave.
c. *We tried that Bill should leave.

34 Believe behaves differently in other languages, e.g. Spanish and French. The equivalent of
(68) is grammatical in Spanish, but strings of the form NP V NP VP (*Juan cree a Maria ser
inteligente) are ungrammatical. The proper treatment of believe-type constructions in French
and Spanish requires a full analysis of control in clitic languages; such a treatment is beyond the
scope of this chapter.
control, pro, & the projection principle 159

In our theory, an example like (69b) is ungrammatical because try occurs with
too many arguments.35 Example (69c) is ungrammatical because the goal is a
proposition, in violation of the lexical specifications for the verb.
To summarize, both systems provide for a reduction of the categorial
component of the grammar. Both theories involve a certain cost. To sustain
this reduction, the (inviolable) PrP framework must include a system of
exceptional case-marking, with (sometimes obligatory) S′ deletion. The no-
PRO theory must permit fairly detailed lexical entries. The important point is
that the PrP is not supported just because it allows for a reduced categorial
component. For both theories, the base rules are needed only for the unpre-
dictable distribution of certain categories, e.g. prepositions.

5.5.2 Raising to subject


Second, the PrP is important for what would otherwise be an indeterminacy
in the theory. The PrP insures against two analyses for the seems construction.
As discussed by Chomsky (1981a: 187), the interpretive analysis (as opposed to
movement) must be ruled out. In other words, the following should not be a
possible D-structure:
(70) John seems [PRO to have been there]
Chomsky says: “In fact, the interpretive option is ruled out by the projection
principle, since John appears in a non-θ-position in D-structure.”
In our no-PRO theory, the subject-raised construction must be base-
generated, since there is no NP-movement.36 In (71), as indicated, John is
the antecedent of the infinitival VP. Seem specifies in its lexical entry that it
takes a propositional complement. Additionally, it specifies that it assigns no
role to its subject—hence the grammaticality of (72) with it as subject (see
discussion of it and there in LLT, ch. 3). Because John in (71) bears no role with
respect to the main verb, the locality condition (b.ii) of Coindex is relevant.
None of the thematic conditions are invoked; and after coindexing, the

35 Note that (69b) would be well-formed without the embedded VP, as in We tried Bill, just so
long as exactly two roles are assigned. The situation with try is somewhat more complicated than
we suggest in the text, because it is necessary to rule out *We tried Bill to leave, even when Bill
and to leave are coindexed. In such a case, Bill would actually have a thematic role—the one
which to leave assigns to its antecedent. We propose that the explanation in this case is that such
a configuration of θ-roles and coindexing, where the antecedent lacks its own θ-role, forces a
propositional interpretation, which is of course ruled out for certain verbs.
36 The question of whether PRO exists is logically independent of whether there is NP
movement; however, we believe we have shown (LLT, ch. 3) that a no-PRO theory with NP-
movement runs into unneeded complications. The only interesting theories in this respect have
both NP-movement and PRO, or neither. In light of what we take to be syntactic evidence
against PRO, we adopt the second alternative.

thematic role is assigned to the proposition. The proposition formed by the
coindexing is the theme of seems:
(71) Johni seems (to me) [to have been there]i.
(72) It seems (that) John has been there.
Both the PRO and the no-PRO theories avoid an indeterminacy in the
analysis. For one, the seems construction must be movement; for the other,
it is base-generated (because there is no NP-movement). The relevant point is
that, in the no-PRO theory, the PrP is shown to be unnecessary.

5.5.3 NP-trace
Third, the PrP is important in distinguishing NP-trace and PRO. Given
an antecedent-[e] pair, the PrP gives a principled account of when the [e]
is the trace of ‘Move α’ for NP, and when [e] must be PRO. If the
antecedent is in a θ-position, and the [e] is the trace of a movement,
then a violation of the θ-criterion will occur at S-structure, because the
antecedent will have more than a single θ-role. This is illustrated in (73a)
below. Example (73b) shows that the D-structure would represent a
violation of the PrP, because it is not a projection of the lexical properties
of the matrix verb:
(73) a. Johni hoped ti to leave.
b. [e] hoped John to leave.
In (73a), John would have two roles at S-structure, one directly from the verb
hope and the other via the trace. In the D-structure, the PrP is directly violated
because there are too few arguments to satisfy the lexical requirements of the
verb hope. (Presumably nothing can be assigned to an empty node [e] where
no lexical insertion has taken place.)
Alternatively, if the antecedent is in a non-θ-position and the [e] is a PRO,
there will again be violations:
(74) John was seen PRO.
John in this case has no role whatever. The PrP therefore leads to a principled
distinction between PRO and the trace of NP.
In our theory, the issue does not arise because there is no PRO, nor is there
a trace of NP-movement. Even if there were a rule of NP-movement, for
reasons of learnability (see LLT, ch. 5)—and because there is no phonological
evidence for NP-trace (see Culicover and Rochemont 1983)—movement to an
argument position could not leave a trace. Movement to COMP or FOCUS,

however, must leave a trace. Here again, in our theory, the PrP is not
necessary.

5.5.4 Acquisition
Finally, and perhaps most importantly, the PrP implies that acquisition can be
based on the learning of lexical items. This issue is not addressed directly in
the literature on GB theory; however, Chomsky states (1981a: 31):
The grammar of a particular language can be regarded as simply a specification of
values of parameters of U[niversal] G[rammar], nothing more. Since the projection
principle has the consequence of substantially reducing the specification of the
categorial component for a particular grammar, it has corresponding implications
for the theory of language acquisition. Someone learning English must somehow
discover the subcategorization features of persuade, one aspect of learning its mean-
ing. Given this knowledge, basic properties of the syntactic structures in which
persuade appears are determined by the projection principle and need not be learned
independently. Similarly, a person who knows the word persuade ([and] hence knows
its lexical properties, specifically, its subcategorization features) can at once assign an
appropriate LF-representation and S- and D-structure when the word is heard in an
utterance, or in producing the word, and will recognize the sentence to be deviant if
other properties of the utterance conflict with this assignment. Hence languages
satisfying the projection principle in their basic design have obvious advantages
with respect to acquisition and use.

Chomsky later says (p. 343):


The θ-criterion and the projection principle impose narrow constraints on the form of
grammar and lead to a wide variety of consequences. At the LF-level, the θ-criterion is
virtually a definition of well-formedness, uncontroversial in its essentials, though the
nature of the syntax of LF, hence the precise way in which the θ-criterion applies at this
level, is an important if difficult empirical issue. The projection principle, in contrast,
is not at all obviously correct. It is violated by most existing descriptive work, and it
has some important consequences in several domains: internal to grammar, it serves
to reduce the base rules to a very small number of parameters and to limit severely the
variety of S-structures, and it enters into many specific arguments, as we have seen;
beyond, it poses the problems of processing and acquisition in a decidedly different
light, delimiting fairly narrowly the ways in which these problems should be pursued.
It is, therefore, a principle that should be considered with caution; if correct, it is
important.

In LLT, we have directly addressed the issue of language learnability; in


chapter 5 of that book, we discuss in detail the issue of degree-0 learnability,
given our general Locality Condition. In that discussion—in which we go so

far as to outline how a formal proof would proceed—we also assume that
learning is based on the acquisition of information about lexical entries. The
learnability problem is cast in terms of the learning of the correct assignment
of thematic roles and grammatical relations to the correct NPs. Although we
do not assume the PrP for the learnability model, we believe that the plausi-
bility of degree-0 learning, as we conceive it, is demonstrated. We therefore
feel confident in claiming that the PrP also fails to give a unique characteriza-
tion of a theory of learning based on lexical information.

5.6 Conclusion
We have shown here that an interesting theory of control exists which makes
no use of PRO in the syntax. To do this, we have had to give detailed
consideration to two of the basic principles of the GB theory: the Projection
Principle and the θ-criterion. We think we have demonstrated that neither
principle is necessarily supported by the syntactic data, given an alternative
theory based on the Locality Condition of LLT and on the level of R-structure
with the particular properties which we postulate for it. In addition, since our
theory has been developed with the specific goal of determining a plausible
learning theory—and since we take learnability requirements to be crucial in
the explanation of the structure of linguistic theory—we believe our work to
be particularly relevant at the explanatory level.
6
Negative curiosities (1982)*

Remarks on Chapter 6
I was motivated to write this article by the idea being entertained in the late
1970s that ‘stylistic’ rules such as extraposition have no effect on the logical
form of a sentence, although they do have consequences for superficial
constituent ordering. This seemed to me at first sight to be wrong, because
of counterexamples resulting from extraposition of a negative over any, e.g.
*Pictures were on any of the tables of none of the men, which was my original
concern in sketching out this paper. Over time the paper morphed into an
extended investigation into a number of oddities involving negation in
English, including tag questions. I have omitted from the current version an
Appendix that now strikes me as superfluous to the main argument.
The main argument of the paper is that the logical properties of sentences
are determined by the most superficial syntactic representation, that is, the one
that corresponds directly to linear order. While the argument developed here is
essentially about the facts of interpretation, there are significant theoretical
implications. In particular, if the logical form of a sentence depends strictly on
the superficial structure, then the motivation for deriving extraposition
through movement is significantly weakened. This is a welcome result, since
a movement analysis of extraposition does not fit naturally with the treatment
of leftward movement constructions, such as wh-questions and topicalization.
Michael Rochemont and I pursued the issue of a rightward movement
analysis of extraposition in other work, including Culicover and Rochemont
(1990) and a 1997 paper, reprinted here as Chapter 7. We concluded ultimately
that extraposition is not movement, as originally suggested by the negative
curiosities discussed here, but a special case of predication, along the lines
discussed in Chapter 5.

* [This paper originally appeared as Peter W. Culicover, Negative Curiosities, Indiana Uni-
versity Linguistics Club, 1982. (Revision of Social Science Research Report 81, UC Irvine.) It is
reprinted by permission of the Indiana University Linguistics Club.]

6.1 Introduction
There has been considerable effort expended in linguistics in recent years on
the investigation of the properties of unbounded movement rules, such as wh-
fronting and NP movement.1 This work has led to the development of the
trace theory of movement rules, in which restrictions on the output possibil-
ities of such unbounded rules are handled not by conditions on the rules
themselves but by constraints on the derived syntactic relationship between
the moved constituent and its trace, corresponding to the underlying position
of the moved constituent. The intriguing possibility has emerged that these
constraints may in fact be constraints on the logical forms corresponding to
the derived structures, and are not strictly syntactic constraints.2
My concern in this paper will be primarily with rules that are, from all
indications, not unbounded movement rules: tag formation, negative inver-
sion and Stylistic Inversion. To a considerable extent my interest here is a
descriptive (or perhaps observational) and not a theoretical one, because
there are certain facts that have been ignored in traditional treatments of
these rules, and which should, I believe, be taken into consideration in any
future account. However, the phenomena that I will discuss do have theoret-
ical implications, and while I will not pursue them in great detail here, I will
suggest some likely directions in which the evidence points. Specifically, some
logical properties of the sentences to be considered appear to be determined
by the linear order of constituents after all transformations have applied.
To put these points into perspective, let us recall that Chomsky and Lasnik
(1977) propose that the logical form (LF) of a sentence is determined not by
the actual surface structure of the sentence, but by the intermediate structure
that results from the application of rules of ‘Core Grammar,’ such as wh-
Fronting and NP movement, cited above. Other rules are viewed as stylistic,
and do not, in the Chomsky and Lasnik proposal, bear on aspects of logical
form. Rochemont (1978; 1979; 1980) has developed a particular version of this
proposal, setting forth a characterization of the form and function of stylistic
rules.
Since the term “logical form” is a vague one, we could choose to speak
rather of a putative level of representation LF that has certain specific and
perhaps yet to be discovered properties. It is entirely plausible that limiting LF
to, say, representation of the binding relationships between quantifiers and

1 See Chomsky and Lasnik (1977) and references cited there. It is by no means universally
accepted that NP Movement is an unbounded rule, nor even that it is a transformation. For an
alternative view, see Bresnan (1978).
2 See Chomsky (1980) for a recent formulation of some constraints on logical form.
negative curiosities 165

variables will turn out to be a viable and productive research strategy. It is
also reasonable in principle to identify such a level of representation with
also reasonable in principle to identify such a level of representation with
a syntactic level, e.g. the output of Core Grammar. Without taking any
position on the ultimate usefulness of this assumption, I will adopt the view
here that logical form (or at least a level of logical form) is that representation
of the sentence that specifies the scope relationships between negation and
quantifiers.
This paper runs the following course. I will first discuss interrogative tags,
and isolate those tags that display polarity phenomena from other sorts of
tags. This subclass of tags, which appears to be a natural class, does not admit
of a purely transformational characterization. Rather, it appears that the
syntactic characterization of this class of tags depends in part on semantic
factors, in particular, on the scope of negation. [In this respect, my analysis
here supplants my analysis in Chapter 3 in the light of additional empirical
considerations.]
In discussing the scope of negation I will bring out certain facts, some of
them well-known, that suggest that the scope of negation is determined by the
application of stylistic transformations. If this is correct, the conclusion
follows that logical form is determined in part by these transformations.
Furthermore, since the form of the tags depends in part on the scope of
negation, the tags cannot be transformationally derived, but must be base-
generated and interpreted in surface structures. This is, of course, a non-
traditional solution.
I will conclude with discussion of Stylistic Inversion, a transformation
whose output bears on the interpretation of wh with the variable that it
binds. Assuming that the facts admit of no alternative analysis, it seems to
follow that this particular level of interpretation cannot be determined after
the application of just the rules of Core Grammar. Such a conclusion casts
doubt on the empirical viability of the stipulated level LF discussed earlier.
However, it should be stressed that these remarks are in no way conclusive,
and that potentially workable alternatives to the analysis that will be suggested
abound.

6.2 Tags: the polarity facts


The first phenomenon to be examined is that of the polarity of the tag in
an interrogative tag question. Because there are numerous side issues that
must be identified and tracked down, the main point should be summarized
beforehand. Briefly, it can be demonstrated that the polarity of the interroga-
tive tag depends on the polarity of the main clause of the sentence, where the

main clause is the part of the sentence to the left of the comma, and the tag is
the part to the right of the comma.
(1) a. John drank the tea, didn’t he?
b. John didn’t drink the tea, did he?
The polarity of the main clause does not depend simply on whether there is
negation in the AUX position, but on a complex set of conditions. These
conditions appear to be surface structure conditions, in part, having to do
with the surface position of constituents containing negatives. The polarity of
the tag serves in turn as a diagnostic for what the scope of negation is in the
main clause.

6.2.1 Types of tag


Let us first establish the fact that the interrogative tag must disagree in
polarity with the main clause. That this is the case might not seem obvious
at first, because of the fact that there are grammatical tag questions of the
following sort.
(2) John drank the tea, did he?
However, a close examination of the data suggests that there are in fact three
sorts of tags that can be appended to main clauses (at least): interrogative,
disputational, and assertival. While similar in syntactic structure, the three
tags can be distinguished by their intonations, as well as their meanings.a
The interrogative tag is distinguished by a rising intonation on the tag; the
interpretation roughly is that the speaker suspects that the proposition
expressed by the main clause is true, and he is seeking confirmation of this.
In the following examples, the intonation falls to the comma as it would in a
normal declarative sentence, dips down to the AUX of the tag, and rises up to
the pronoun.b

(3) a. John drank the tea, didn’t he?
    b. John didn’t drink the tea, did he?

a I discuss the different types of tags and their meanings in Culicover (1973), reprinted as
Ch. 3 of this book.
b When I wrote this paper I did not have the benefit of the subsequently developed ToBI
framework for annotating intonation (Beckman et al. 2005). As far as I know, the intonation of
English tags has not yet been given a precise description within the ToBI framework.

It appears in fact that the intonation rise on the tag does not quite bring the
pitch up to the level at the comma.
The disputational tag has a relatively flat intonation. The pitch of the tag in
this case is determined by the pitch at the end of the main clause. If the pitch
is rising (in an expression of shocked disbelief), the pitch on the tag remains
at that level, as in (4).

(4) You plan to marry my daughter, do you?
However, if the pitch falls on the last part of the main clause, as in contem-
plation of a recently expressed proposition, the pitch on the disputation tag is
low but flat.

(5) You plan to marry my daughter, do you?
If the main clause is negative, the disputational tag is still positive.

(6) You don’t plan to marry my daughter, do you?

(7) You don’t plan to marry my daughter, do you?
It is impossible to have a disputational tag that is negative, attached to either a
positive or a negative main clause.

(8) a. *You plan to marry my daughter, don’t you?
    b. *You plan to marry my daughter, don’t you?


(9) *You don’t plan to marry my daughter, don’t you? (with any intonation)
These facts suggest quite clearly that the disputational tag is a different type of
tag than the interrogative tag: it must be positive, it has different intonation,
and conveys a different meaning. It seems reasonable to exclude such tags
from the discussion of the syntax of interrogative tags, even though doing so
changes the precise character of the syntactic generalization that might
otherwise be captured, as we will see.

The assertival tag is similar to the interrogative tag, but differs from it in
intonation and in nuance. The intonation of the assertival tag is a falling one:
the pitch on the AUX starts higher than the pitch at the end of the main
clause, and falls back down to this level (approximately).

(10) You plan to marry my daughter, don’t you?
The interpretation of this tag differs from that of the interrogative in that this
one expects confirmation from the listener, and does not simply seek con-
firmation. Arguably, the assertival tag is a variant of the interrogative tag
involving a switch of accent in the tag, which leads to a different intonation
contour and a slightly different interpretation. It is certainly true that the two
types display the same polarity facts.

(11) You don’t plan to marry my daughter, do you?
It should be noted that the same intonation as we find in the assertival tag
shows up in a case where there is no polarity difference.

(12) You plan to marry my daughter, do you.
It is likely, however, that this intonation contour is a consequence of putting
stress on the AUX, and does not signal a crucially different type of tag from
the flat, disputational tag that we discussed above.

6.2.2 Syntactic analysis of tags


Having established that the polarity facts hold for interrogative and assertival
tags, and that disputational tags should be discussed separately, we must
examine the formal devices necessary for characterizing this generalization
correctly. Because the traditional transformational analyses are more or less
familiar, I will not go into them in great detail. There are three basic distin-
guishable proposals. (i) That of Klima (1964) introduces the negative into the
tag when the underlying main clause is positive, and forms a positive tag
when the underlying main clause is negative. (ii) Culicover (1971) and Akmajian
and Heny (1975) generate the negative outside of the main clause, form the
tag, and then locate the negative either in the main clause or the tag. (iii)
Culicover (1976) attempts to explain the appearance of the negative in terms
of the structural description of the tag formation transformation.

The formal description of analysis (i) is given by the following rules:c

(13) a. NP AUX not X
        1   2   3  4 ⇒ 1 2 3 4, whether 2 [1, +PRO]
     b. NP AUX X
        1   2  3 ⇒ 1 2 3, whether 2 not [1, +PRO]
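Purely as an illustration of what analysis (i) claims (the toy clause representation and the function below are my own, not anything proposed in the literature under discussion), the polarity-reversing effect of the two rules in (13) can be sketched as:

```python
# Hypothetical sketch of analysis (i): a negative main clause yields
# a positive tag (rule 13a), a positive main clause a negative tag
# (rule 13b). Do-support and affix placement are ignored.

def klima_tag(subject_pro, aux, negative, clause):
    """clause: dict with 'np' and 'vp'; returns a toy tag question."""
    if negative:
        # (13a): NP AUX not X => NP AUX not X, AUX pronoun?
        main = f"{clause['np']} {aux} not {clause['vp']}"
        tag = f"{aux} {subject_pro}"
    else:
        # (13b): NP AUX X => NP AUX X, AUX+n't pronoun?
        main = f"{clause['np']} {aux} {clause['vp']}"
        tag = f"{aux}n't {subject_pro}"
    return f"{main}, {tag}?"
```

For example, klima_tag('he', 'did', False, {'np': 'John', 'vp': 'drink the tea'}) yields "John did drink the tea, didn't he?", and the negative counterpart yields a positive tag, mirroring the two rules of (13).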
Regardless of whether there is some way of stating the rules in order to
collapse them notationally, the crucial property of this analysis is that it
requires two distinct transformations in order to characterize the polarity
facts. Moreover, one of these transformations inserts a designated item in a
not particularly general way. Such an analysis thus suggests, counterintuitively
it seems to me, that a language with only positive tags on negative sentences
(and no tags on positive sentences) would be more natural than English.
Analysis (ii) capitalizes on the fact that there can be a positive tag on a
positive main clause, along the lines of (4), (5), and (12). The analysis involves
the following two transformations.

(14) a. Tag Formation: (not) (whether) NP AUX X
        1 2 3 4 5 ⇒ 1 Ø 3 4 5, 2 4 [3, +PRO]
     b. not-Placement: not X AUX Y
        1 2 3 4 ⇒ Ø 2 3+1 4
The main feature of this analysis is that it fails to capture syntactically the
difference between the interrogative tags and the disputational tags. Notice
that the presence of not is not required for the generation of the tag. Thus the
same transformational rule derives both sorts of tags. The rule does not in
itself specify what the intonation will be, or what the interpretation of the
tagged sentences will be. Since the intonation and the placement of not
determine in part the interpretation of the tag question, it follows that there
will be certain aspects of interpretation that cannot be determined except in
derived structure. In particular, we would have a prima facie argument, given
this analysis, that the scope of negation is determined in derived structure,

c The use of transformational rules to generate tag questions is a device rooted in the earliest
period of generative grammar. A more modern treatment would not employ such devices, but
would still be faced with the problem of describing what a possible tag question is, and what it
means. Given the idiosyncrasies that the English tags display, a more contemporary approach
would take a constructional perspective, as in e.g. Culicover (1999) and Kay (2002b).

and that the force of the tag is determined in derived structure as well.
However, we do not have an argument here that the scope of negation is
determined after all transformations, since in this analysis placement of not
apparently determines its ultimate scope.
What I will argue however, is that the polarity of the tag cannot be
characterized simply by a rule of not-Placement, but depends on the scope
of negation expressed as a logical property. It is of some interest to note,
therefore, that there are problems with the purely syntactic analysis of tags
from the point of view of the syntax itself.
It appears to be a mistake to generate the two kinds of tags, interrogative
and disputational, by the same syntactic rules. Because of sentences like the
following we will have to extend the tag formation transformation to include
auxiliaries that follow AUX.
(15) a. John would have left, wouldn’t he have?
b. Mary should be here by now, shouldn’t she be?
c. Clancy hasn’t been trying very hard, has he been?
Ignoring here the precise form that such a rule would take, observe that these
extended tags cannot be used disputationally. With incredulous, rising inton-
ation, all of the following are quite unacceptable.
(16) a. *John would have left, would he have?
b. *Mary should be here by now, should she be?
c. *Clancy has been trying very hard, has he been?
These facts suggest that the syntactic generalization captured by the tag
formation rule ordered before not-Placement is a spurious one. Notice that
there is a way to avoid the problem just noted: remove the parentheses in Tag
Formation from not. The correctness of this revised analysis then rests on
whether the analysis captures all of the relevant data (which it does not), and
whether not-Placement itself is a well-motivated transformation. We can
avoid this latter question here, since we can show that the revised analysis is
not descriptively adequate, even if it is preferable to the analysis of (14).
Finally, analysis (iii) (in Culicover 1976) tries to explain the appearance of
negation in the main clause or the tag as a consequence of whether negation
has underlyingly sentential or verb phrase scope. The tag formation rule is
stated as follows:

(17) Tag Formation: whether NP AUX (not) VP
     1 2 3 4 5 ⇒ Ø 2 3 Ø 5, 1 3 4 [2, +PRO]

For this rule to apply correctly, we must impose a special interpretation on the
meaning of the parenthesized not in the structural description: If not is
present between AUX and VP, it is moved into the tag. However, if there is
no not between AUX and VP, only the AUX is copied into the tag. That is, the
fourth term of the structural description in this case is satisfied by Ø, which is
disjunctive with not. Hence Ø (in effect nothing) is copied over into the tag.
In order to get negation in the main clause, it must be generated in some
position in addition to the immediate post-AUX position as a daughter of
S. To make this analysis work, we must generate not as a daughter of VP.
The claim, then, is that not that appears in the tag is underlying S negation,
while not that appears in the main clause is underlying VP negation. If there is
some semantic correlate to the syntactic position of negation, we would
expect that negated main clauses with positive tags would have a more
restricted range of interpretation than identical declaratives with negation,
since only in the case of the latter could the negative be attached to S or to
VP. As far as I know there is no data to suggest that this is the case. In fact, the
only data that pretends to illustrate a difference between sentential and VP
negation does not provide the relevant distinction.
(18) John doesn’t lie because he is honest.
On the traditional analysis, where negation takes wide (S) scope we get the
entailment that John doesn’t lie for the reason that he is honest, but he lies for
some other reason. In fact, he may not be honest. Where negation takes
narrow (VP) scope, we get the entailment that John doesn’t lie, and the reason
is that he is honest. (There are, of course, alternative analyses in which the
relevant variable is the position of the because clause, and not negation.) The
ambiguity shows up when we introduce a tag, however.
(19) John doesn’t lie because he is honest, does he?
The ambiguity of (19) is predicted in an analysis in which the difference
between VP and S scope is not tied to a syntactic difference in the position
of not, and is not predicted where the scope of not is syntactically
characterized.

6.2.3 Determinants of tag polarity


Having summarized the competing syntactic analyses of tag polarity, it is
relevant to note that there are other determinants of whether the tag will be
positive or negative that do not involve the presence of not in AUX position or
the movement of not into this position. Such examples as the following show
that the negative element responsible for a positive interrogative tag may
appear elsewhere in the main clause.
172 explaining syntax

(20) a. No one drank the tea, {did he? / didn’t he?}
b. Pictures of none of the women were hanging in the gallery, {were they? / weren’t they?}
c. Nobody’s pictures of Bill are on sale, {are they? / aren’t they?}
The property that these examples share with the more traditional examples in
which a negative in the main clause selects a positive tag is that the main
clauses of these may also be paraphrased by it is not the case that, indicating
that both classes of example have wide (S) scope negation.3
(21) a. It is not the case that anyone drank the tea.
b. It is not the case that pictures of any of the women were hanging in
the gallery.
c. It is not the case that anybody’s pictures of Bill are on sale.
In order to incorporate such examples into a syntactic analysis, we would
have to add another rule of negative placement that moves not into constitu-
ents like the subject NPs in (20). In fact, Klima’s (1964) analysis of negation
contains, in addition to not-Placement, a subsequent transformation that
attracts not to a preceding indefinite NP, and another rule that incorporates
not with an indefinite to yield, ultimately, no, none, nobody, etc. In current
theory neither of these latter two transformations can be formulated. The rule
that incorporates not demands significant respelling in violation of the Strict
Lexical Hypothesis; see Jackendoff (1972: esp. ch. 9) for arguments against
Klima’s analysis.
The rule that attracts not to an indefinite must also look indefinitely far into
the subject NP to determine that an indefinite in fact is present, and the
incorporation rule must actually lower not into the NP. That there is no
principled bound to this lowering can be seen from examples such as the
following, constructed along the lines of (20b).
(22) a. Photographs of pictures of none of the women were hanging in the gallery, {were they? / weren’t they?}
b. Negatives of photographs of pictures of none of the women were found in the darkroom, {were they? / weren’t they?}

3 See Jackendoff (1972) for discussion of wide scope negation and its paraphrases.
negative curiosities 173

As expected, with a positive tag the negation in the main clause has wide
scope, and the following paraphrases are appropriate.
(23) a. It is not the case that photographs of pictures of any of the women
were found in the darkroom.
b. It is not the case that negatives of photographs of pictures of any of
the women were found in the darkroom.
A rule permitting the unbounded lowering violates two constraints accepted
in much of current syntactic theory: lowering is not permitted, and trans-
formations cannot apply over an unbounded domain.4
Granting that wide scope negation determines that the tag will be positive,
what determines that negation will have wide scope? From examples that we
have already encountered we may conclude tentatively at least that AUX
negation and negation in a subject NP will yield wide scope. Before continu-
ing with this line of inquiry, however, we should rule out the logical possibility
that the selection of the positive tag is determined by a small set of syntactic
conditions, and not by a single semantic property of the main clause. In
particular, we should rule out the possibility that it is sufficient simply for
there to be a negative in the subject NP in order for there to be a positive tag.
The following examples demonstrate that the condition is not syntactic.
(24) a. A man with no hair was on the bus, {*was / wasn’t} he?
b. Requests for no noise are treated with disdain, {*are / aren’t} they?
c. Movies with no children are popular with adults, {*are / aren’t} they?
Confirming our intuition is the fact that the following are not paraphrases of
the main clauses in (24).
(25) a. It is not the case that a man with any hair was on the bus.
b. It is not the case that requests for any noise are treated with disdain.
c. It is not the case that movies with any children are popular with
adults.
We thus illustrate the well-known fact about negation that it can take con-
stituent (here, NP) as well as sentential scope. The point here is that there is

4 For the constraint against lowering, see Chomsky (1965), and for a different formulation, Wexler and Culicover (1980). Boundedness follows from a variety of independently proposed constraints, including the Subjacency Condition of Chomsky (1973), the Binary Principle of Wexler and Culicover (1980), Culicover and Wexler (1977), and perhaps the Subject Condition of Chomsky (1973), at least for the examples in (22).

no [single] syntactic correlate to sentential scope that we could use to formulate the polarity of the tag in purely syntactic terms.
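The conclusion can be put as a toy predicate (an illustrative sketch only: the boolean argument stands in for the semantic property of wide-scope negation, which is of course not computable from the surface string):

```python
def tag_polarity(main_clause_has_wide_scope_negation: bool) -> str:
    """Reversal-tag polarity as a purely semantic condition: a positive
    interrogative tag is selected iff the main clause is interpreted with
    wide-scope (S) negation, i.e. iff it admits an 'it is not the case
    that ...' paraphrase. No reference is made to the syntactic position
    of not."""
    return "positive" if main_clause_has_wide_scope_negation else "negative"

# (20a) "No one drank the tea" ~ "It is not the case that anyone drank
# the tea" -> wide-scope negation -> positive tag ("..., did he?")
print(tag_polarity(True))    # positive
# (24a) "A man with no hair was on the bus" has only NP-scope negation
# (no such paraphrase) -> negative tag ("..., wasn't he?")
print(tag_polarity(False))   # negative
```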
Let us consider now the problem of interpreting a negative as having
sentential scope. Informally, a negative constituent will yield sentential
scope for a given S if the negative is ‘accessible’ from the S. Accessibility is
related to various constraints in the literature dealing with movement and
logical form. For example, a negative in an embedded that clause or one in a
relative clause apparently cannot permit a positive tag.
(26) a. *Karen believed that no one drank the tea, did she?
b. *The man who no one likes was here, was he?
(The stars refer here, and elsewhere, to interrogative tags, unless otherwise
noted.) It is well known that relative clauses block extraction of a wh-phrase,
for example, and there are various constraints that seek to explain this fact.
What distinguishes negative accessibility from wh-Fronting is that the latter
can apply to subjects (and other constituents) of some that clauses.
(27) a. Whoᵢ does Karen believe (*that) _ᵢ drank the tea?
b. *Whoⱼ was the man whoᵢ _ⱼ likes _ᵢ here?
Further differences arise in considering NPs like the subjects of (20b) and
(20c). As is well known, it is generally unacceptable to extract a constituent of
a subject, and it is never possible to extract a possessive from an NP.
(28) a. *Whoᵢ were [pictures of _ᵢ] hanging in the gallery?
b. *Whoseᵢ are [_ᵢ pictures of Bill] on sale?
It is at least true that a negative within a simple S may be construed as having
sentential scope. Some relevant data has already been presented in the form of
(21). What is problematic is whether or not a negative in non-subject position
can function as a sentential scope negative. Consider the following examples.
(29) a. John is predicting the election of no candidate.
b. Mary hopes to find none of the applications.
c. Fred is looking for none of the unicorns.
In each example we have an ambiguity. In (29a), it could be that there is no
candidate whose election John is predicting. Or it could be that John is
predicting that no candidate will be elected. In (29b), it could be that there
is no application that Mary hopes to find. Or it could be that Mary hopes that
she will find none of the applications. In (29c), it could be that there is no
unicorn that Fred is looking for. Or, somewhat contradictorily, Fred could be
engaged in an active search for the nonexistent unicorns, and will be surprised

and disappointed if he finds one. This latter interpretation is rather hard to visualize, of course, since why would he be looking and what would he actually be looking for?
Our intuitions are that sentential negation is possible with non-subjects,
then. These intuitions are substantiated by the fact that the examples in (29)
may appear with either positive or negative interrogative tags.
(30) a. John is predicting the election of no candidate, {is he? / isn’t he?}
b. Mary hopes to find none of the applications, {does she? / doesn’t she?}
c. Fred is looking for none of the unicorns, {is he? / isn’t he?}
While not all examples of this type allow the ambiguity in question, the
examples that do are clear enough to suggest at least tentatively that the
ambiguity is a systematic one, but one that may be overridden by other factors.

6.2.4 Deriving the ambiguity


Let us now consider why this ambiguity should exist in the first place. Is it
simply an accidental fact that certain negative constituents may take wide
scope over the sentence? In fact, it appears that this phenomenon in part is a
special case of a more general one. Observe that in general there is an
ambiguity in the interpretation of indefinite noun phrases.
(31) a. John is predicting the election of a candidate.
b. Mary hopes to find one of the applications.
c. Fred is looking for a unicorn.
As is well known, in intensional contexts, an indefinite may receive an
existential interpretation with wide scope, or it may simply act as a descriptive
element. Thus, in (31a), either there is a particular candidate whose election
John is predicting, or John is predicting that a candidate will be elected,
without having anyone particular in mind. In the latter case we are character-
izing his prediction and not claiming that he was referring to anyone.
In (31b), similarly, either there is a particular application that Mary hopes
to find, or Mary hopes that she will find at least one (and perhaps at most
one). In (31c), either there is a particular unicorn that Fred is seeking, or he is
simply on a unicorn hunt.
In the case of verbs describing physical relationships between objects it is
hard to get just the non-existential reading on an indefinite. However, when
mental states are involved, there may be a particular object in the mind of the
speaker, or there may not be. It is appropriate to use the common noun to

refer either to the particular object by describing it, or to the type of object in
mind. When a physical relationship is involved that is described by a transitive
verb, it entails that there is some physical object corresponding to the direct
object, and hence the type interpretation of the indefinite NP will always be
paired with an existential interpretation.
The possibility of assigning wide scope to NPs in general is discussed by
Dresher (1977), who proposes the following rule:
(32) NP-Scope Interpretation
Any configuration [S . . . NP . . . ] can be interpreted either as
i. [S . . . NP . . . ] or as
ii. [S NP [xₙ [S . . . heₙ . . . ]]]
Dresher notes the clear applicability of this rule to cases in which the NP is
indefinite (pp. 372–3). Since negative NPs are indefinites, we will be able to use
this rule to get wide scope negation for cases like John is predicting the election
of no candidate (29a). Let us consider what the domain of (32) is.
As stated, (32) is extremely general. While he does not pursue the matter in
detail, Dresher does note that it is applicable at least to simple S’s, and to
complement S’s, as in (33).5
(33) Mary thinks that John is looking for a lawyer.
Dresher shows that this sentence, by the appropriate application of (32), is
predicted to have three readings, and all three appear to hold in fact. Example
(31a) shows that (32) applies to an NP within another NP. Given this, we can
use (32) to account for the wide scope of negation in all of the examples that
we have thus far considered, provided that we assume in addition that wide
scope of negation in fact is formally equivalent to the result of applying (32) to
a negative indefinite. For example, applying (32) to (29a) does not automatically give the desired result.
(34) a. John is predicting the election of no candidate.
b. no candidate [xₙ [John is predicting the election of himₙ]].
For purposes of this discussion we will simply stipulate that a logical form
such as (34b) with an indefinite taking wide scope is equivalent to a formal
logical expression in which the indefinite is interpreted as an existential, and
that furthermore if the indefinite is negative, it is interpreted as a negative
existential, as in (35).

5 Dresher’s example (59).

(35) there does not exist [any candidate [xₙ [John is predicting the election of himₙ]]].
Such a stipulation is not a solution, but is made simply in lieu of having
worked out a complete and precise analysis, one which may well involve some
substantial reformulation of (32).
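In lieu of that full analysis, the two-step inference just stipulated — give the negative indefinite scope over S by (32), then read the result as a negated existential — can be sketched as toy string manipulation (purely illustrative; the bracket notation and the ASCII indices follow (34)–(35)):

```python
def np_scope(sentence: str, np: str, index: str = "n") -> str:
    """Apply rule (32ii): give the NP scope over the S, binding a
    pronoun in its original position (crudely rendered as 'him_n')."""
    body = sentence.replace(np, f"him_{index}")
    return f"{np} [x_{index} [{body}]]"

def negative_existential(lf: str) -> str:
    """Reinterpret a wide-scope negative indefinite 'no N' as a negated
    existential 'there does not exist [any N ...]', as in (35)."""
    np, rest = lf.split(" [", 1)
    assert np.startswith("no ")  # only defined for negative indefinites
    return f"there does not exist [any {np[3:]} [{rest}]"

lf = np_scope("John is predicting the election of no candidate",
              "no candidate")
print(lf)
# no candidate [x_n [John is predicting the election of him_n]]
print(negative_existential(lf))
# there does not exist [any candidate [x_n [John is predicting the
# election of him_n]]]
```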
As predicted by (32), it should be possible to get wide scope of negation
when the negative indefinite is within a sentential complement. In fact, it is
possible, but it is necessary to assign heavy stress to the indefinite NP in order
for the interpretation to come through clearly.6 The following examples
illustrate.
(36) a. Karen believed that no one drank the tea, did she?
b. Carl claimed that he wanted none of the books, did he?
c. Sam predicted that no candidate would be elected, did he?
The positive tag is acceptable just in case we can read the negative constituent
as being an existential that takes scope over the entire sentence, not just the
that clause. This reading is closely related to the so-called not-Transportation
or not-Hopping reading given by the following paraphrases.
(37) a. Karen didn’t believe that anyone drank the tea, did she?
b. John didn’t claim that he wanted any of the books, did he?
c. Sam didn’t predict that any candidate would be elected, did he?
We can in fact generate the same entailments for the examples in (37) and
those in (36) by applying (32) to the latter and then applying the inference
exemplified in (34).

6.2.5 Tags and surface structure scope


We have seen that a positive interrogative tag can only co-occur with a main
clause over which there is wide scope negation. We may reasonably imagine
that the tag is generated freely, with or without negation, and that the
appearance of a positive tag with the appropriate intonation specifies a
condition that the main clause must satisfy at the level of logical form. It is
not clear, however, that the condition required by the positive tag must be
stated in terms of logical form; it is logically possible, for example, that some
deep structure syntactic configuration might be sufficient to determine
whether or not the main clause will have wide scope negation. If so, sentences
with the wrong tags could be filtered out by the grammar at this level of

6 There are examples in which heavy stress facilitates the wide scope interpretation for non-negative indefinites, as Dresher notes (1977: 370).

representation. There appears to be no natural syntactic analysis of this sort available.
On the other hand, it is also logically possible that while there are several
disparate syntactic constructions that allow the assignment of wide scope of
negation, the rules for assigning wide scope all apply at some early stage in the
derivation, perhaps in deep structure, perhaps at the end of some natural class
of transformations (such as those of Core Grammar), or at some other
identifiable level of syntactic representation. In fact, it appears from the
evidence that the scope of negation, and hence the condition on the main
clause for the selection of the positive tag, cannot be determined until the
application of the transformation that fronts a negative constituent (perhaps
topicalization) and subject AUX inversion (SAI) unless there is some syntactic
condition for SAI that is met by sentences with fronted negatives but not by
sentences with fronted topics. The correlation between wide scope and SAI is
well known, but the connection with tags has not, I believe, been previously
noted. Consider the following examples.
(38) a. With no job would John be happy, would(*n’t) he?
b. With no job John would be happy, would*(n’t) he?
(39) a. The election of no candidate did John predict, did(*n’t) he?
b. The election of no candidate, John predicted, did*(n’t) he?
(40) a. In not many years will Christmas fall on Tuesday, {will / *won’t} it?
b. In not many years, Christmas will fall on a Tuesday, {*will / won’t} it?
Given that the positive tag is a diagnostic for wide scope negation, it follows
that the logical form, which involves the scope of negation, presumably,
cannot be determined until after the application of these rules. If we failed
to map surface structures into logical form we would be in effect claiming the
grammaticality of all of the starred a-sentences in (38)–(40).
One way of avoiding the conclusion that surface structure determines
logical form is to show that there is a syntactic difference between cases in
which a fronted constituent triggers SAI, and those in which a fronted
constituent does not. Such a syntactic difference is proposed by Rochemont
(1978); another would be the difference between topic position (TOP) and
COMP position discussed by Chomsky (1977). Thus far the evidence to
support such a difference is not compelling, but it is at least clear why such
evidence is important. If such a syntactic difference can be maintained, the
rule of interpretation for wide scope of interpretation can be stated in terms
of the configuration that triggers SAI. Otherwise, such a rule of interpretation

would have to be stated in terms of the surface structure output of SAI, since
only the application of SAI would provide the crucial condition for wide
scope.
To avoid relating surface structures directly to logical form we could also
seek an analysis in which the syntactic structure contains a trigger both for the
relevant transformations and the wide scope interpretation. Suppose for
example, that after fronting of with no job we have one of the two following
intermediate structures.
(41) a. With no job NEG John would be happy.
b. With no job John would be happy.
NEG in (41) would trigger SAI and would assign wide scope negation to with
no job.
While the broad outlines of an analysis of this sort may be easy enough to
talk about, the details are neither trivial nor self-evident. Most significantly,
recall that we had found it possible to make use of Dresher’s general rule of
NP Scope Interpretation (32) to explain the possibility of having wide scope
negation given a negative indefinite. An analysis involving NEG divides the
responsibility for assignment of wide scope negation between two rules, one of
which is (32), and the other of which applies just in case a negative has been
fronted and the sentence contains NEG.
A proponent of the analysis involving NEG would naturally seek to gener-
alize NEG to all instances in which a negative constituent has wide scope,
whether or not it is fronted. Such a generalization still leaves us with Dresher’s
(32) for non-negative NPs, so that we will still have two rules for assigning
wide scope. On the whole it does not appear that anything is to be gained by
attributing wide scope negation to an abstract marker NEG except that we
could then avoid the conclusion that wide scope of negation is determined at
surface structure. Properly formulated, an analysis involving NEG would
allow the scope of negation to be determined in deep structure or at some
early stage in the derivation.

6.3 Any
We turn now to examples involving rules other than SAI that also suggest that
surface structure is the determining level of the scope of negation, and hence
of logical form. In order to maintain the claim that logical form is, in contrast,
determined after the rules of ‘Core Grammar’, it would appear to be necessary
to extend the definition of Core Grammar so broadly that it would lose all of
its theoretical interest.

It is well known that any can be interpreted as an existential when it is within the scope of a negative. So, the interpretation of (42) will be roughly
(43), following the pattern of (35).
(42) I don’t have any money.
(43) not (∃x)(money(x)) [xₙ [I have heₙ]]
The existential interpretation of any, it can be shown, depends on surface
structure. Any transformation that reorders the negative and the any yields
a structure for which this interpretation is not valid. This is because in order
for any to be within the scope of negation, it must be both preceded and
c-commanded by the negative. Some simple examples in which one or both of
these conditions are violated are given below in (44).
(44) a. *I didn’t leave, and John has any money.
b. *Any of the men didn’t see John.
c. *I gave anyone nothing.
In (44a), the negative precedes but does not c-command any. In (44b), the
negative neither precedes nor c-commands any. In (44c), the negative argu-
ably c-commands any, but does not precede it.
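The precede-and-c-command condition can be sketched over toy constituency trees (an illustration, not the chapter’s formalism: the bracketings, the simplified c-command definition, and one-word stand-ins like 'of-the-men' are my assumptions):

```python
from typing import List, Union

Tree = Union[str, list]  # a leaf word, or [label, child1, child2, ...]

def leaves(t: Tree) -> List[str]:
    """Left-to-right terminal string of a toy constituency tree."""
    return [t] if isinstance(t, str) else [w for c in t[1:] for w in leaves(c)]

def c_commands(tree: Tree, a: str, b: str) -> bool:
    """Leaf a c-commands leaf b iff the node immediately dominating a
    also dominates b (toy definition; words assumed unique in the tree)."""
    if isinstance(tree, str):
        return False
    if any(isinstance(k, str) and k == a for k in tree[1:]):
        return b in leaves(tree)
    return any(c_commands(k, a, b) for k in tree[1:])

def any_licensed(tree: Tree, neg: str, any_: str) -> bool:
    """any must be both preceded and c-commanded by the negative."""
    words = leaves(tree)
    return words.index(neg) < words.index(any_) and c_commands(tree, neg, any_)

# (42) "I don't have any money": negation precedes and c-commands any
t42 = ["S", "I", ["VP", "dont", ["VP", "have", ["NP", "any", "money"]]]]
# (44a): negation precedes but does not c-command any
t44a = ["S", ["S", "I", ["VP", "didnt", "leave"]], "and",
        ["S", "John", ["VP", "has", ["NP", "any", "money"]]]]
# (44b): negation neither precedes nor c-commands any
t44b = ["S", ["NP", "any", "of-the-men"], ["VP", "didnt", "see", "John"]]
# (44c): negation arguably c-commands any but does not precede it
t44c = ["S", "I", ["VP", "gave", "anyone", "nothing"]]

print(any_licensed(t42, "dont", "any"))         # True  (grammatical)
print(any_licensed(t44a, "didnt", "any"))       # False (*44a)
print(any_licensed(t44b, "didnt", "any"))       # False (*44b)
print(any_licensed(t44c, "nothing", "anyone"))  # False (*44c)
```

On these toy structures the checker reproduces the judgments: only (42) satisfies both conditions, and each of (44a–c) fails for exactly the reason stated in the text.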
The first rule that we will consider that affects any is Heavy NP Shift. The
rule is illustrated in (45), and its effect on the scope of negation is illustrated in
(46) and (47).
(45) a. John gave [the books that he found] to the library on the next block.
b. John gave to the library on the next block [the books that he found].
(46) a. John gave [none of the books that he found] to any of the libraries in
the city.
b. *John gave to any of the libraries in the city [none of the books that
he found].
(47) a. *John gave [any of the books that he found] to none of the libraries
in the city.
b. John gave to none of the libraries in the city [any of the books he had
found].
What is interesting is that not only does the example with any turn out to be
ungrammatical when the negative is moved to the right of any, but moving
any to the right of the negative allows for a successful interpretation of any.
Example (48) shows that the negative constituent still has wide scope after
Heavy NP Shift has applied, so that the problem is not simply that there is no
wide scope in (46b).

(48) John gave to the libraries in the city none of the books that he found,
did he?
In order to get the wide scope interpretation in (48) it is necessary to stress
none.d
Another observation about these sentences is the following: in general, the
meaning of a sentence after Heavy NP Shift has applied is identical to its
interpretation before Heavy NP Shift, suggesting on a classical model that the
interpretation be assigned before the rule applies. To interpret the sentence
after the rule applied would require that we reconstruct the original position
of the moved NP and move it back ‘in the semantics’. However we cannot
completely interpret the sentence before Heavy NP Shift if the scope of
negation is part of the interpretation of the sentence, since Heavy NP Shift
affects logical form. This appears to put us in somewhat of a quandary. So,
in deriving Heavy NP Shift we have to do the following. (i) We must
specify what surface position in the VP the direct object will have; (ii) we
must specify that the direct object functions as such; (iii) we must specify
the interaction between negatives and indefinites in terms of (i), not (ii).
These observations are consistent with the position that at least in part the
interpretation of the sentence depends strictly on surface structure after a
stylistic rule.7
A second construction that is thought of as stylistic (see Rochemont (1978))
but that affects logical form is Stylistic Inversion. In Culicover (1977) I suggest
that the derivation of this construction has two parts. One part fronts the

d This interaction between the position of negation and the position of the heavy NP is consistent with the view that Heavy NP Shift is not movement, but an alternative ordering within VP.
7 If the example in (ii) below is acceptable, there may be a stylistic rule that does not affect logical form. Consider the rule of VP topicalization:
(i) a. They said that John wouldn’t give the paintings to Mary, and he didn’t give the paintings to Mary.
b. He said that John wouldn’t give the paintings to Mary, and give the paintings to Mary he didn’t.
If the VP contains any in the scope of Aux negation, we get the following:
(ii) They said that John wouldn’t give any of the paintings to Mary and give any of the paintings to Mary he didn’t.
If (ii) is good, it means that the scope of negation over any is determined before VP topicalization. However, if (ii) is bad, and it probably is, VP topicalization must precede the assignment of wide scope to not in the main clause.

sister of an intransitive verb and leaves behind a dummy, and the other part
moves the subject into the position of the dummy.e (49) illustrates.
(49) a. John walked into the room. ⇒
b. Into the room John walked Δ. ⇒
c. Into the room walked John.
Each of these two rules can change the relative order of an indefinite and
negation, and this clearly has consequences for the interpretation.
(50) a. *Any of the men didn’t walk into the room.
b. Into the room didn’t walk any of the men.
(51) a. None of the men walked into the room.
b. Into the room walked none of the men.
(52) a. None of the men walked into any of the rooms.
b. *Into any of the rooms walked none of the men.
(53) a. The men didn’t walk into any of the rooms.
b. *Into any of the rooms didn’t walk the men.
(54) a. *Any of the men walked into none of the rooms.
b. Into none of the rooms walked any of the men.
Another rule, also stylistic, is extraposition of PP or PPEXT (Rochemont
1978). This rule also affects logical form, as shown below.
(55) a. Pictures of the women were hanging on the wall.
b. Pictures were hanging on the wall of the women.
(56) a. Pictures of none of the women were hanging on the wall.
b. Pictures were hanging on the wall of none of the women.
(57) a. Pictures of none of the women were hanging on any of the walls.
b. *Pictures were hanging on any of the walls of none of the women.
(58) a. *Pictures of any of the women were hanging on none of the walls.
b. Pictures were hanging on none of the walls of any of the women.

e This derivation is somewhat different from the one that I proposed subsequently with Levine in Culicover and Levine (2001), reprinted in this book as Ch. 9. The Culicover–Levine analysis proposes that there are in fact two constructions. The details of the configuration of PP and logical subject turn out not to be relevant, however, as long as the PP c-commands the logical subject (so that any is licensed); the argument made here is that what matters is the linear order of the constituents.

(59) a. *Pictures of any of the women weren’t hanging on any of the walls.
b. Pictures weren’t hanging on any of the walls of any of the women.
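Note e’s observation — that in these reordering constructions what matters is simply whether a negative linearly precedes any at surface structure, c-command being satisfied throughout — can be sketched as a one-pass check over the surface string (toy code; the word lists are illustrative assumptions):

```python
def any_ok(surface: str) -> bool:
    """Toy surface filter: 'any' is acceptable only if some overt
    negative precedes it in the surface string. C-command is assumed to
    hold in all of the inversion/extraposition examples at issue."""
    negatives = {"no", "not", "none", "nothing", "nobody"}
    seen_negative = False
    for word in surface.lower().replace(",", "").split():
        if word in negatives or word.endswith("n't"):
            seen_negative = True
        elif word in {"any", "anyone", "anything"} and not seen_negative:
            return False
    return True

print(any_ok("Into none of the rooms walked any of the men"))   # True  (54b)
print(any_ok("Into any of the rooms walked none of the men"))   # False (*52b)
print(any_ok("Pictures were hanging on none of the walls "
             "of any of the women"))                            # True  (58b)
print(any_ok("Pictures were hanging on any of the walls "
             "of none of the women"))                           # False (*57b)
```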
Here, as elsewhere, we find ourselves in a somewhat puzzling situation. On
the one hand, we wish to represent the fact that the broken up constituent in
fact is interpreted as a constituent, and we might do this by mapping the
constituent into some semantic representation before PPEXT, for example. If
the co-occurrence of negation and indefinites with respect to one another is
an interpretive phenomenon, which it is in part, it might reasonably be
expected to be stated at this level of representation. But it cannot be, because
PPEXT can reorder the negatives and the indefinites.
What is particularly surprising in the case of these last examples is that the
negative in the extraposed PP is sufficient to yield a sentential negative
interpretation for the entire constituent from which it was extraposed, but
this negative interpretation does not govern the any that follows. That is, in
(56b) we get a perfectly reasonable interpretation that no pictures of any of
the women were hanging on the wall. (We also get the odd but not totally
implausible interpretation that pictures depicting womenlessness were hang-
ing on the wall.)
However, as (57b) shows, this interpretation is still not sufficient to allow
any to appear. What this suggests is that the reading no pictures of any of the
women is an entailment of (57), and that there in fact is no level of representa-
tion where (56a) and (56b) or any of the other pairs are represented identi-
cally. How the rules for entailment are properly to be stated is a problem for
future study, and has interesting implications for accounts of strict surface
structure interpretation by a comprehension device.f
I conclude with a related matter, but one that does not involve any. It turns
out that there are parentheticals that must co-occur with sentential negation.
One such is I don’t think. Below it is compared with I think.
(60) a. John isn’t here, I (don’t) think.
b. John is here, I (*don’t) think.
It is well known that parentheticals may appear internal to sentences, and this
is illustrated by (61).
(61) John, I think, is here.
I don’t think can also appear internally. However, it turns out that in order for
I don’t think to be acceptable, it is not sufficient that the sentence contain

f How this interpretation would work is the concern of Ch. 7.

sentential negation that takes scope over the parenthetical. Rather, in addition
to this, the negative element must precede I don’t think in surface structure.
(62) a. John doesn’t believe that Mary is here, I don’t think.
b. John doesn’t believe, I don’t think, that Mary is here.
c. John doesn’t, I don’t think, believe that Mary is here.
d. *John, I don’t think, doesn’t believe that Mary is here.
The problem in (62d) is not the surface structure position of the parenthetical
(before AUX) per se, since a negative subject also allows the parenthetical.
(63) a. No one believes that Mary is here, I don’t think.
b. No one believes, I don’t think, that Mary is here.
c. No one, I don’t think, believes that Mary is here.
The requirement that the negation be sentential in scope is shown by pairs like
the following.
(64) a. *In not many years, Christmas will fall on a Tuesday, I don’t think.
b. In not many years will Christmas fall on a Tuesday, I don’t think.
And note the following as well.
(65) a. *In not many years, I don’t think, Christmas will fall on a Tuesday.
b. In not many years, I don’t think, will Christmas fall on a Tuesday.
That the parentheticals have anything directly to do with determining logical
form is unlikely. Nevertheless, the examples show that aspects of the scope of
negation cannot be determined at intermediate levels of the derivation, but at
surface structure. For example, if there is in fact a (stylistic) rule that moves
constituents around parentheticals [or parentheticals around constituents],
this rule must precede assignment of scope of negation so that it can be
determined in surface structure whether the internal parenthetical is to the
right of a negative that has wide scope. Specification of the scope of negation
before reordering of the parenthetical would yield the ungrammatical
examples of (60)–(64). Thus we may hypothesize that assignment of scope
of negation follows the reordering of parentheticals.

6.4 More curiosities


Let us consider now several problems that are more or less related to those
already discussed, and in particular bear on the matter of where logical form is

determined. It has been known for some time8 that a well-formed relative
clause need not have a relative pronoun, that or ∅ in COMP position. The
following examples show in fact that the rule fronting constituents in Stylistic
Inversion (cf. (49b)) may front a constituent that is in no obvious sense a
wh-phrase. The crucial sentence is (66c).
(66) a. ?This is the church which very expensive paintings are hanging on
the walls of. (wh-Fronting)
b. This is the church on the walls of which are hanging very expensive
paintings. (Stylistic Inversion)
c. This is the church hanging on the walls of which are very expensive
paintings. (Stylistic Inversion)
The fronted constituent in (66c) is a VP, presumably.9 Note that if the
condition requiring the relative clause to have a wh-phrase or a phrase
containing a wh-phrase in COMP is in fact a condition of logical form, this
condition cannot be applied until after the application of the rules deriving
Stylistic Inversion. It is possible, though, that only the inversion of the subject
is a stylistic rule, a possibility that will be discussed in somewhat more detail
below.
Let us turn now to an argument that wh-Fronting must apply after Stylistic
Inversion. If this argument is correct, it would follow that wh-Fronting could
not be a rule of Core Grammar, rendering the latter of little theoretical
interest. However, we will see that it may be possible to distinguish two
rules of wh-Fronting, along the lines suggested by Koster (1978a), thus
avoiding this conclusion.
In the following examples, Stylistic Inversion appears to have applied in
the lower S before wh-Fronting has moved the clause containing wh to the
higher S.
(67) This is the wall on which Mary claims were hanging twelve ghastly
pictures of Nixon.
(68) On which of these walls does Mary suspect were hanging the ghastly
pictures of Nixon?
The fronted constituent need not be a PP.

8. See Emonds (1976).
9. It would be natural to try to explain the fact that Stylistic Inversion applies to hanging on the
wall by reanalyzing it as something other than a VP, or by motivating a feature decomposition of
VP to allow generalization with other constituents that also trigger the rule. For some
speculation, see Culicover (1982).
(69) This is the wall, hanging on which Mary claims were twelve ghastly
pictures of Nixon.
(70) Hanging on which of these walls does Mary suspect were the ghastly
pictures of Nixon?
Since Stylistic Inversion occurs in the lower S, but the wh-phrase ends up in
the higher S, we must conclude that the wh-phrase is fronted in the lower
clause first by Stylistic Inversion, and then moved into the higher clause by
wh-Fronting.
(71) This is the wall [COMP Mary claims [COMP twelve ghastly pictures of
Nixon were hanging on which]] ⇒
This is the wall [COMP Mary claims [on which were hanging twelve
ghastly pictures of Nixon]] ⇒
This is the wall [on which Mary claims [Ø were hanging twelve ghastly
pictures of Nixon]]
Koster (1978a) suggests that the rule of wh-Fronting that moves wh-phrases
out of complements is not a rule of Core Grammar, while wh-Fronting in
simple S’s is. Thus the examples in (67)–(70) simply show that the first rule
must follow Stylistic Inversion, but the second, core rule does not.
It might be supposed that this is an undesirable result, because it requires
that we break the one maximally general transformation of wh-Fronting into
two rules. However, Koster (1978a) also proposes that there is no rule of
wh-Fronting at all. Rather, what is part of Core Grammar is the coindexing of
an initial wh-phrase with its trace, while what is not part of Core Grammar is
a configuration in which an initial wh-phrase may bind a trace in a lower
clause. We need not concern ourselves with the technical details here.
Adopting this analysis of wh-Fronting requires us to reanalyze Stylistic
Inversion along the following lines: the topicalized constituent is generated
in initial position in the base; this constituent binds a trace; inversion of the
subject depends on the condition that the verb phrase contain a trace of an
intransitive V in the following configuration.10

10. The verb must be intransitive, because in general the subject of an S cannot be moved into
direct object position when the direct object has been fronted. It is necessary to specify in the
statement of the rule that the only daughters of VP are V and the trace, so that the subject does
not move into the position occupied by the trace of the object of a prepositional phrase.
(i) a. *Whoi did ei see Billj?
b. *Which tablei did ej sit [PP on Maryj]?
(72) [Hanging on which of these walls]i [ . . . twelve pictures of Nixon [VP were ei]]
Notice that this analysis commits us to generating hanging on which of these
walls in initial position in the base. Furthermore, this phrase must count as a
wh-phrase in an analysis such as Koster’s, and must also trigger SAI. Whether
these are acceptable consequences for such a framework is unclear.
However, there are additional examples that suggest that the fronted
constituents in sentences involving Stylistic Inversion cannot be treated the
same way by the syntax as normal topics and fronted wh. The following
appear to be quite acceptable.11
(73) a. In the room seemed to be a friend of Bill’s.
b. Onto the table tried to climb an enormous elephant.
c. To our next party promised to come all of our friends from Missouri.
Somewhat more marginal but still acceptable are the following.
(74) a. I expect on this table to be a keg of beer and on that table to be a
pound of Greek olives.
b. Mary believes in the next room to be an enormous elephant.
If the latter examples are to be generated, they present a problem, because in
general infinitives cannot have constituents in topic position, as the examples
in (75) indicate.
(75) a. *I expect onto this table a keg of beer to fall and on that table a
pound of Greek olives to be sitting.
b. *Mary believes in the next room an enormous elephant to be sleeping.
However, given the rather marginal nature of these examples perhaps we
should not make too much of them.g,12

11. Such examples were discussed by Akmajian in a paper presented to the LSA in 1974 at San
Diego.
g. Marginal though these examples are, Culicover and Levine (2001) end up making somewhat
more of them than is proposed here.
12. Suppose we grant that there is no topic position in infinitives. Then examples like the
following argue for a trace-filling analysis of Stylistic Inversion.
(i) In this room John expects to be sitting an enormous elephant.
Since there is no position in the infinitive into which to move the adverb, it is impossible to
trigger Stylistic Inversion on the lower cycle. When the adverb is moved on the higher cycle, the
subject NP is not an enormous elephant, but John. However, we do not get
(ii) *In this room expects an enormous elephant to be sitting John.
The examples in (73) present a different but related problem. On an
analysis in which the subject of seems is the underlying subject of the
infinitive, (73a) will be derived in the following way.
(76) In the roomi Δ seemed [S a friend of Bill’s to be ei] ⇒
In the roomi Δ seemed [S to be a friend of Bill’s]
The problem is that the dummy subject of seems is never filled in this analysis.
We cannot avoid this problem by adopting Koster’s (1978a) proposal that
the subject of seemed binds trace in the lower clause, as in (77).
(77) In the roomi a friend of Bill’sj seemed [S ej to be ei]
Movement of a friend of Bill’sj into ei would derive the correct string, although
we might question the propriety of such a lowering, and we might wonder
what the statement of the rule might be. It is not clear why ej would not be
moved into ei on the lower S before coindexing on the higher S, leaving in the
room free. On the other hand, in a framework like Koster’s the presence of ej
ought to block movement of a friend of Bill’s into the lower S. It is entirely
possible, of course, that a stylistic rule does not interact either with coindex-
ing (a rule of Core Grammar) or with the constraints on rules of Core
Grammar, and such a possibility would have to be explored fully if we were
attempting to develop an account of Stylistic Inversion within a framework of
the sort suggested by Koster.

6.5 Conclusion
It appears that the scope of negation must be determined in surface structure
if SAI is not syntactically triggered, that tags cannot be generated by a
transformation, and that Stylistic Inversion must precede certain instances
of wh-Fronting. More generally, it seems to be the case that logical form
cannot be completely determined before surface structure, although in certain
constructions earlier levels of structure may contain sufficient information for
the assignment of logical forms. These results cast some doubt on the notion
that there is a level of logical form defined as the output of the transform-
ations wh-Fronting and NP Movement.

12. (cont.) If the directional adverb is generated in initial position in the base, the trace of the adverb is
already in underlying structure, and Stylistic Inversion may apply cyclically. This argument is
vitiated if it turns out that sentences like (i) are ungrammatical (my judgments are unclear), or
if there is a COMP position in the infinitive through which a directional adverb may move.
PART II

Structures
7
Deriving dependent right adjuncts in English
(1997)*
Michael S. Rochemont and Peter W. Culicover

Remarks on Chapter 7
Michael Rochemont and I wrote this paper for a conference on rightward
movement at Tilburg University. While we believed that extraposition could
be handled by interpretive rules, we were interested in seeing if we could find
conclusive arguments for or against treating extraposition as movement. The
antisymmetry perspective of Kayne (1994) ruled out a rightward movement
account of extraposition and Heavy NP Shift, and required such apparent
rightward movements to be a remnant of massive leftward movement. In the
course of the research we realized that the antisymmetry approach also allows
for an analysis of these constructions in terms of massive rightward move-
ment, given alternative assumptions about branching direction. Crucially, we
found that there was no empirical evidence to decide among the various
alternatives, and in the interest of keeping the syntax as simple as possible, we
concluded that the interpretive position was the preferred one.

7.1 Introduction
In this paper we will be concerned with the properties of rightward positioned
adjuncts in English that are in some sense dependent for their interpretation on
a position elsewhere in the sentence, e.g. relative and result clause extraposition

* [This chapter appeared originally in Dorothee Beerman, David LeBlanc, and Henk van
Riemsdijk (eds), Rightward Movement. Amsterdam: Benjamins (1997). It is reprinted here by
permission of John Benjamins. For their comments we would like to thank Bob Levine, Louise
McNally, and the members of audiences at the University of Groningen, Tilburg University,
Université du Québec à Montréal, and the University of British Columbia. Michael Rochemont’s
work on this project was supported by grant no. 410-92-1379 from the Social Sciences and
Humanities Research Council of Canada.]
and rightmost positioned (argument) heavy noun phrases. These constructions
seem to be the strongest cases in English for rightward movement. We have
argued in previous work that this is not the correct account of extraposition
constructions. On the basis of contrasts between these constructions and
rightmost heavy NP constructions, we have argued that only the latter are
derived by rightward movement (see Culicover and Rochemont 1990; Rochem-
ont and Culicover 1990; Rochemont 1992). Our goal here is to re-examine the
evidence presented in favor of these conclusions in light of the possibility that
syntactic theory permits no construction to be derived by rightward movement.
It will be seen that the facts about extraposition can be fully accommodated on
a leftward movement account in which the extraposed constituent achieves its
rightmost position through the leftward movement of other elements in the
sentence. We will show that it is also possible to provide a leftward
movement analysis of the rightmost heavy NPs that is fully compatible with
the data that we consider. In both cases we will argue that successful leftward
movement accounts must have certain characteristics that hold also of success-
ful accounts that are compatible with rightward movement or adjunction.
Given that the two sets of constructions (the various extrapositions and the
rightward positioned heavy noun phrases) display differing characteristic
properties, we will examine the two classes separately. In each case, we
proceed by uncovering some central empirical generalizations that must be
captured under any account and show how they are captured on our own
rightward movement/adjunction analyses. Armed with these descriptive cri-
teria, we then explore a variety of leftward movement alternatives to test their
empirical adequacy in light of the generalizations.

7.2 Properties of extraposition constructions


7.2.1 Relative clause extraposition
The fundamental issue is where the extraposed clauses are adjoined. The
evidence that bears on the site of attachment of an extraposed clause is: (i)
whether it can be construed with a given antecedent, (ii) constituency, (iii) c-command,
(iv) relative order (assuming this to correlate in some fashion with height of
attachment). The evidence that we have developed in earlier work suggests the
following generalization: the interpretation and acceptability of an extraposed
relative clause is determined by the S-structure position of its antecedent
(Culicover and Rochemont 1990, henceforth C&R). What this means, modulo
a particular analysis, is that a relative clause related to an object (OX) is
attached closer to its antecedent than is a relative clause related to a subject
(SX). A relative clause related to a subject is attached closer to its antecedent
than is a relative clause related to an antecedent in COMP (WhX).
For clarity of presentation we will illustrate using classical assumptions
regarding phrase structure and linear order. Note that we are abstracting from
questions of movement. We are looking just at the site of attachment of the
relevant phrase at the surface. We will also suppose for the sake of illustration
that X-bar theory permits structures with rightward adjunction, regardless of
how that is achieved.
Here is the data. The first type of evidence involves simply relative linear
order, which in traditional phrase structure terms has often been taken to
correspond to relative height of attachment. The examples in (1) show that in
a sentence with both an object and a subject-extraposed relative, the phrase
extraposed from object must precede that extraposed from subject. That is,
the object-extraposed relative is attached closer to the object than is the
subject-extraposed relative.
(1) a. A man entered the room last night that I had just finished painting
who had blond hair.
b. *A man entered the room last night who had blond hair that I had
just finished painting.
(Rochemont and Culicover 1990 (R&C))
A relative extraposed from a wh-phrase in COMP (WhX) must follow a
subject (2) or an object (3)/(4) extraposed relative. Note that what is relevant
is the surface position of the antecedent, as shown by the examples in (3) and
(4), where the object wh-phrase moves only at LF, and this LF movement is
irrelevant to the construal of OX.
(2) a. ?(?)Which room did a man enter last night who had blond hair that
you had just finished painting?
b. *Which room did a man enter last night that you had just finished
painting who had blond hair?
(3) a. ?Which man entered which room last night that you had just finished
painting who had blond hair?
b. *Which man entered which room last night who had blond hair that
you had just finished painting?
(4) a. Which article did you find on a table yesterday that was in the living
room that you claimed was written by your best friend?
b. *Which article did you find on a table yesterday that you claimed was
written by your best friend that was in the living room?
These facts from relative linear ordering of extraposed relatives are compatible
with a classical structure as in (5).
(5) [CP [CP WH [C′ C [IP [IP NP [I′ I [VP [VP V NP] OX]]] SX]]] WhX]
Constituency tests such as VP ellipsis, VP topicalization, and pseudo-cleft give
results that are consistent with this structure (see R&C), but they are consist-
ent with plausible alternatives, so we will not discuss them here.
The varying potential for coreference under Condition C of the Binding
Theory is also compatible with the same differences in adjunction positions.1
Example (6) shows that the subject c-commands an object-extraposed rela-
tive, and the examples in (7) show that an indirect object c-commands an
object relative only in its non-extraposed position.2 (It is not possible to

1. We do not consider parallel facts from bound variable interpretations of pronouns, though
the results are for the most part equivalent to the Condition C effects observed here. The
interpretation of variable binding examples is somewhat more complicated than the Condition
C facts, owing to the possibility that the former is constrained by LF c-command relations, as
suggested by the literature on weak crossover (see Culicover 1993a and Lasnik and Stowell 1991
for some recent perspectives).
2. As pointed out to us by Bob Levine, our account of (7b) presupposes that there cannot be
any ‘vacuous’ extraposition, in which the relative clause is adjacent to the head noun but
adjoined to the VP. Levine also notes that there may be some question as to the
ungrammaticality of (7b), in view of the relatively greater acceptability of examples such as the following.
(i) I offered heri many gifts from Central Asia that Maryi didn’t like.
In these examples, it appears that the PP internal to NP is sufficient to permit coreference. If this
is the case, then it is not clear that a similar effect is not in effect in (7b). Hence it is possible that
vacuous extraposition may exist. Note that this possibility cannot be ruled out on the account
of C&R.
construct a relevant example to test whether the subject c-commands SX,
because the subject itself would have to be pronominal.)
(6) *Shei invited many people to the party that Maryi didn’t know.
(7) a. I sent heri many gifts last year that Maryi didn’t like.
b. *I sent heri many gifts that Maryi didn’t like last year.
(C&R)
The examples in (8) show that the subject does not c-command a relative
extraposed from a wh-phrase in its COMP.
(8) a. *Shei [VP[VP invited several people to the party] [CP that Maryi didn’t
like]].
b. How many people did [IP shei invite to the party] [CP that Maryi
didn’t like]?
(based on C&R)
The examples in (9) show that a matrix subject c-commands an embedded
extraposed relative, whether from object, subject, or wh-phrase in COMP.
(9) a. *[Shei said [that I sent heri many gifts last year]][that Maryi didn’t like]
b. *[Shei wondered [how many people [IP shei invited to the party]]][CP
that Maryi didn’t like]
c. *[Hei said [that a man came into the room]][that Johni didn’t like]
(based on C&R)

2. (cont.) An alternative hypothesis is that a dative pronominal does not c-command to the right in
VP. This possibility would appear to be falsified by examples such as the following.
(ii) a. *I told heri that Maryi would win.
b. *I offered heri Maryi’s favorite food.
c. *I gave heri some flattering pictures of Maryi.
The contrast between the examples in (ii) and (i) recalls the contrast between arguments and
adjuncts noted by Lebeaux (1988) in connection with anti-reconstruction effects, as in (iii).
(iii) a. Which gifts from Central Asia that Maryi didn’t like did shei try to sell to someone else?
b. ?Which of Maryi’s favorite foods did shei prefer?
c. *Which pictures of Maryi did shei like best?
Lebeaux’s observation is that pronominal subjects appear to produce condition C effects with
R-expressions in fronted arguments but not adjuncts. The facts in (i) and (ii) suggest that dative
pronouns produce condition C effects in R-expressions to the right of them that are in argument
position, but not those that are in adjuncts. A related point is made in fn. 3 below.
Example (10) shows that a matrix subject does not c-command a relative
extraposed from wh in its own COMP, even if it does c-command the trace of
the wh. (Compare (9c).)3
(10) Which man did hei say came into the room that Johni didn’t like?
Finally, (11) shows that it is the surface and not the LF position of the
antecedent that is relevant to the positioning of the extraposed relative.
(11) a. *Who told heri that Sam was taking a student to the dance [CP that
the teacheri liked]?
b. *Who told heri that Sam was taking [which student] to the dance [CP
that the teacheri liked]?
(C&R)
To conclude, the height of attachment of an extraposed relative is a function
of the surface position of its antecedent. That is, given (5), an extraposed
relative is adjoined to the minimal maximal projection containing its surface
antecedent.

7.2.2 Result clause extraposition


Continuing to make the same assumptions about phrase structure, we can
show from the coreference data that result clauses also have their bounded-
ness determined by the position of their antecedent. In this case, however, the
antecedent is so in its LF position. The contrast in examples (12) shows the
difference in height of attachment for comparable extraposed relative and
result clauses; a subject fails to c-command an object extraposed result clause.
(12) a. *Shei met few people at the party who Maryi upset.
b. Shei met so few people at the party that Maryi was upset.
(based on Guéron and May 1984 (G&M))

3. Bob Levine has pointed out to us that the absence of a Condition C violation in (10) appears
to parallel the anti-reconstruction facts discussed by Lebeaux (1988) (see also fn. 2 above).
(i) a. which man that Johni didn’t like did hei say came into the room
b. *whose claim that Johni was a spy did hei refuse to acknowledge
(ii) a. which man did hei say came into the room that Johni didn’t like (= (10))
b. *whose claim did hei refuse to acknowledge that Johni was a spy
If the adjuncthood of the relative clause is responsible for the absence of a Condition C violation
in (i.a), and not its adjunction site, then our argument is somewhat weakened. On the other
hand, it is possible that in (ii.b) the extraposed complement is adjoined above the subject, but
because it is an argument it undergoes reconstruction, which feeds Condition C. In this case, the
higher adjunction of the complement would not be sufficient to allow it to avoid Condition C,
while the higher adjunction of the relative clause would be.
deriving dependent right adjuncts 197

Even a matrix object (13) or matrix subject (14) can fail to c-command a result
clause extraposed from within the embedded complement.
(13) a. *I told heri that the concert was attended by many people last year
that made Maryi nervous.
b. I told heri that the concert was attended by so many people last year
that I made Maryi nervous.
(G&M)
(14) a. *Shei told me that the concert was attended by many people last year
that made Maryi nervous.
b. Shei thought that the concert was attended by so many people last
year that Maryi decided not to go this year.
Following G&M, we propose that so is the LF antecedent of the result clause.
That so has potentially different scope interpretations at LF is shown by (15),
whose two readings may be informally represented as (15a,b).
(15) Mary believes that Harryi is so crazy that hei acted irrationally. (G&M)
a. Mary believes that so [Harry is crazy][that he acted irrationally]
b. so [Mary believes that Harry is crazy][that he acted irrationally]
The two readings of (15) may be paraphrased as follows: (a) Mary has the
belief that Harry is so crazy that he acted irrationally, or (b) the extent to
which Mary believes that Harry is crazy is such that he acted irrationally. Let
us suppose that the result clause is adjoined to the clause over which so takes
scope at LF. This gives the correct results for an example like (16), where the
only reading compatible with Condition C places the result clause outside the
c-command domain of the matrix subject and correspondingly forces only
the wide scope reading for so; unlike (15), (16) is unambiguous.
(16) Shej believes that Harryi was so crazy that Maryj left himi.
With Guéron and May, we propose that so undergoes LF raising to achieve its
proper scope. Unlike Guéron and May, however, we suppose so to move at LF
as an adjunct. We therefore correctly predict that it will display LF island
effects with sentential subjects (17), wh-islands (18), complex NPs (19), and
adjunct islands (20b, 21).
(17) a. [[That so many people ate cheesecake] that we had to order more]
surprised us.
b. *[That so many people ate cheesecake] surprised us that we had to
order more.
(R&C)
(18) Mary wondered whoi was so crazy that hei acted irrationally.
(19) a. Shei claimed that so many people left that Maryi must have been
lying.
b. *Shei made the claim that so many people left that Maryi must have
been lying.
(20) a. Shei tried to do so many pushups that Maryi hurt herself.
b. *Shei bent to do so many pushups that Maryi hurt herself.
(21) Shei hurried out after eating so much food that Maryi must have been
sick.
In all of these cases the coreference requires that the result clause be outside of
the clause that contains the so, because it has to be higher than the pronom-
inal. If so is prevented from moving, ungrammaticality or unambiguity
results. We conclude that the height of attachment of an extraposed result
clause is a function of the LF position of its so antecedent—the result clause is
adjoined at the surface to the clause to which so is adjoined at LF.
On the basis of our discussion of result and relative clause extrapositions,
we can state the following generalization: for both relative and result
clause extraposition, it is the antecedent that determines the
height of attachment of the extraposed phrase. In the case of relatives
it is the surface position of the antecedent, and in the case of result clauses it is
the LF position.4 This means that the extraposed clause can be no higher in
the tree than its antecedent, and it must be at least as high as its antecedent.
The precise interpretation of ‘high’ depends on independent assumptions

4. Since the bulk of our evidence for this generalization relies on Condition C effects, it might
be thought that the generalization is undermined by the observation that Condition C is
essentially an LF effect. The relative and result clauses might in fact be relatively ‘low’ in the
structure at the surface, and achieve positions satisfying the generalization only at LF under
movement. Our argument that this cannot be so is that extraposed clauses can be seen to appear
outside the clauses they ‘originate’ in even at the surface and quite apart from c-command
effects. In (i), the extraposed relative appears outside the temporal adverb even though the latter
is readily construed with the matrix verb. (That is, (i) can have the same meaning as (ii).) (See
R&C p. 37 for a similar example.)
(i) Mary expected her flight to be so late yesterday that she neglected to set her alarm.
(ii) Yesterday, Mary expected her flight to be so late that she neglected to set her alarm.
Similarly, (iii) can have the same meaning as (iv).
(iii) Shei thought that the concert would be attended by so many people last year that Maryi
decided not to go.
(iv) Last year, shei thought that the concert would be attended by so many people that Maryi
decided not to go.
We assume that since at the surface temporal adjuncts cannot escape from the clause they
originate in, they are similarly bounded at LF.
about what the structures actually are. Given classical assumptions, we sup-
pose that the extraposed clause must be adjoined to the lowest maximal
projection that contains the antecedent; given other assumptions, which we
will discuss, the generalization would be implemented somewhat differently,
consistent with the differences in attachment that we have noted.

7.3 The Complement Principle


Let us now consider the question of what regulates the height of attachment
of extraposition. Assume a movement analysis. That the extraposed constitu-
ent must be adjoined at least as high as the antecedent follows directly from
proper binding. That the extraposed constituent can be adjoined no higher
than the maximal projection that contains the antecedent does not follow
from any independent constraints on movement. Subjacency allows in
principle for unbounded movement, and is therefore too weak. Ross’s Right
Roof Constraint is also too weak, in that it does not guarantee that a clause
extraposed from an object will adjoin no higher than to VP (Baltin 1981b). It is
also too strong, in that it prevents result clauses from being adjoined high
enough, in cases where the so antecedent escapes from its clause at LF (cf.
(16)).5 Given these difficulties, G&M propose, adapting Guéron (1980), that
the height of attachment of an extraposed phrase is regulated by a principle
that requires a local relation between the extraposed phrase and its S-structure
or LF antecedent. This principle is referred to by C&R as the Complement
Principle (CP). For present purposes, the precise formulation of the Comple-
ment Principle is not relevant. Suffice it to say that the Complement Principle
must have roughly the consequence in (22).
(22) Complement Principle: An extraposed phrase must be adjoined to the
minimal maximal projection that contains its antecedent.

7.4 Extraposition is not rightward movement


Once we have a principle such as the CP that guarantees the bounding effect
for extraposed constituents, the question then arises as to what purpose is
served by a movement analysis of extraposition. Note that under classical
assumptions, an adjunct can be freely generated to the right, subject only to
the condition that it be given a proper interpretation at LF (PFI, Chomsky
1986). This condition is satisfied by the CP, and so it relates the bounding
effects for extraposition to the need for full interpretation.

5. These observations motivate Baltin’s (1981b) Generalized Subjacency.
The argument against movement is reinforced by the observation that a
movement analysis is incompatible with well-established restrictions on
movement. In particular, extraposition from subject violates Subjacency/
CED. Result clause extraposition can violate the Right Roof Constraint, and
result clause extraposition is sometimes unbounded, while relative clause
extraposition never is.
Given that there is no need for a rightward movement analysis in order to
capture the bounding properties and the interpretation of extraposed clauses
(independently accomplished by the CP), C&R argue from Occam’s Razor
that a base-generation analysis of extraposition constructions is to be
preferred.

7.5 Leftward movement


While the account of C&R does not invoke rightward movement in extra-
position, it does require that extraposed phrases be base generated as right-
adjoined adjuncts. Let us suppose, with Kayne (1994), that there can be
neither rightward movement nor right adjunction. Can the generalizations
we have listed be captured on an account invoking only leftward movement?
In addressing this question, we will bear in mind three central empirical
consequences that a successful analysis must have:
(i) an object-extraposed relative is not c-commanded by an indirect object
(e.g. (7a));
(ii) a subject doesn’t c-command a relative extraposed from wh in its
COMP (e.g. (8b));
(iii) the subject of a clause over which so takes scope does not c-command
the extraposed result clause associated with so (e.g. (12b)).

7.5.1 Stranding
Consider first a stranding analysis of relative-clause extraposition, on which
extraposed relatives are stranded by leftward movement of the antecedent, on a
par with Sportiche’s (1988) analysis of Q-Float in French. This analysis fails the
first requirement, in that it assigns a structure on the order of (23), where the
indirect object c-commands the relative clause whether it is ‘extraposed’ or not.

(23) [tree: … NPIO … [NPDO [tDO EX] …] …]


deriving dependent right adjuncts 201

Hence a pronominal IO will always c-command a relative clause in the DO,
whether it is stranded or not.
Requirement (ii) poses a similar problem, since the extraposed relative, if
stranded in an A-position, will certainly be c-commanded by the subject.
Regarding requirement (iii), there has to our knowledge been no proposal to
derive extraposed result clauses under stranding. One could imagine such an
analysis, where the result clause is stranded under leftward movement of so to
the specifier position of the phrase in which it surfaces. But this analysis
would place the result clause below all the preceding phrases, and so it would
always be improperly c-commanded, e.g. by a subject.
There is a fourth argument against the stranding analysis. Consider that it
is possible (see (24)) to extrapose a relative clause from the noun phrase
complement to an L-marked PPa (see Baltin 1978). But this would require
analyzing the P and antecedent of the relative as a constituent to the exclusion
of the relative, incorrectly predicting the possibility of examples such as (25).
(24) a. I found the article in a magazine yesterday that was on the coffee
table.
b. John talked to several people at the party who have blond hair.
(25) a. *In which magazine did you see it which was on the table?
b. *I noticed the mistake in a picture suddenly that had been taken of
Ronald Reagan.
(Example (25a) is taken from Baltin (1978: 82).)
While there may be other problems with the stranding analysis (e.g. how to
capture the relative ordering of the extraposed relative and other VP constitu-
ents), given these failings, we conclude that it is not plausible.
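Since the requirements above all turn on c-command, the configuration at issue can be checked mechanically. The following sketch is my illustration, not part of the original argument: the `Node` class and labels are invented for the example. It implements the standard first-branching-node definition of c-command and applies it to a toy tree shaped like the stranding structure in (23), confirming that the indirect object c-commands the 'extraposed' relative.

```python
# Illustrative sketch (not from the text): A c-commands B iff neither
# dominates the other and the first branching node dominating A also
# dominates B.

class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for c in self.children:
            c.parent = self

    def dominates(self, other):
        # Reflexive domination over the tree.
        if other is self:
            return True
        return any(c.dominates(other) for c in self.children)

def c_commands(a, b):
    if a.dominates(b) or b.dominates(a):
        return False
    # Walk up to the first branching ancestor of a.
    anc = a.parent
    while anc is not None and len(anc.children) < 2:
        anc = anc.parent
    return anc is not None and anc.dominates(b)

# A VP dominating the IO and a DO that contains the trace and the
# 'extraposed' relative EX, as in the stranding analysis (23).
t_do = Node("t_DO")
ex = Node("EX")
np_do = Node("NP_DO", [Node("DO'", [t_do, ex])])
np_io = Node("NP_IO")
vp = Node("VP", [np_io, np_do])

print(c_commands(np_io, ex))  # True: the IO c-commands the relative clause
print(c_commands(ex, np_io))  # False
```

On these assumptions the check returns True for the IO and the relative clause regardless of whether EX is in its base position or 'extraposed' within the DO, which is exactly why the stranding analysis fails requirement (i).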

7.5.2 Low adjunct


On the second alternative, an extraposed constituent originates as a low,
relatively rightmost adjunct in a Larsonian-type cascade structure. We call
this the Low Adjunct Analysis (diagrammed in (26)). This analysis can readily
generate both relative and result clause extraposition. However, it faces the
same difficulties as the stranding analysis. Every argument that precedes the
extraposed phrase must c-command it, in violation of requirements (i), (ii),
and (iii).

a. That is, a PP that is an argument in virtue of being selected by a lexical head.

(26) [tree: … NPS … [NPDO EX]]

7.5.3 High specifier


A third possibility for leftward movement is that an extraposed phrase
originates in a specifier position higher than a specifier position that is the
ultimate landing site of its antecedent. We call this the High Specifier Analysis.
The phrase containing the antecedent then raises, either to a still higher
specifier position or perhaps to adjoin to the specifier position containing
the extraposed phrase. What is crucial in either alternative is that the extra-
posed clause at some point in the derivation is higher and to the left of its
antecedent, and a phrase containing the antecedent moves to the left of the
extraposed clause. Example (27) illustrates for the result clause case, (28) for
the case of a relative extraposed from a wh-phrase in COMP, and (29) for an
object extraposed relative in a double-object construction.

(27) [tree: result clause RX in a high Spec, above [IP … [NP … so …] …]]

(28) [tree: relative clause WhX in a high Spec, above [IP … WhP …]]



(29) [tree: relative clause OX in a high Spec, above … IO … DO …]

We must assume that some principle like the Complement Principle guaran-
tees the proper interpretation of the result/relative clause, and that the
structures in (27)–(29) appear at the appropriate level of clausal embedding.
One virtue of this analysis is that it readily captures the relative order of
relative clauses and other extraposed constituents. It also satisfies our three
requirements. Since the relevant arguments will always be contained in a
projection that excludes the extraposed constituent (the boxed constituent
in each structure), they will always fail to c-command the extraposed con-
stituent. In effect, leftward movement is producing the mirror image of the
underlying order without disturbing the crucial c-command relations. We say
‘crucial’ because certainly the structure in this case is different from the
adjunction structure that we assumed in the classical approach. But it is
possible to define a type of c-command such that the specifier containing
the extraposed clause c-commands the constituent containing the antecedent.
Of the three alternatives that we have considered, this last is the only one
that seems viable given the evidence that we have discussed. We emphasize
that while this is a leftward movement analysis, as opposed to base generation,
it too requires a version of the CP. This analysis remains incomplete, of
course, without (i) some account of why the boxed phrase must move,
(ii) independent motivation for the structures assumed, and (iii) an explan-
ation of what licenses the required movements, e.g. movement of IP across RX
into a higher Spec in (27).

7.6 HNPS and PTI


7.6.1 Properties
We cite here six properties of Heavy NP Shift (HNPS) and Presentational
there Insertion (PTI) that are consistent with the heavy NP (HNP) moving to
a right-adjoined A′ position. First, the HNP is an adjunct, as shown by the fact
that nothing can be extracted from it, either in PTI or in HNPS.
(30) a. *Which famous actor did there appear in the newspaper a picture of?
b. ?Which famous actor did a picture of appear in the newspaper?b
(31) a. John noticed a picture of his mother on the wall.
b. John noticed on the wall a picture of his mother.
c. Who did John notice a picture of on the wall?
d. *Who did John notice on the wall a picture of?
(32) a. Who did John sell Mary a picture of?
b. *Who did John sell to Mary a picture of?6
(Wexler and Culicover 1980; Rochemont and Culicover 1991)
Second, an NP in indirect object position cannot undergo HNPS, just as a
wh-phrase in this position cannot undergo wh-Movement (Larson 1988: 354).
This suggests that HNPS, like wh-Movement, is A′-movement. A-movement
of the dative NP is possible, of course.
(33) a. Bill gave John t yesterday the book that he was looking for.
b. What did Bill give John t yesterday?
c. *Bill gave t the book yesterday anyone who wanted it.
d. *Who did Bill give t the book yesterday?
(34) Bill was given the book.
Third, in HNPS, the HNP licenses a parasitic gap, which suggests that it is in
an A′ position.7
(35) I filed t without reading pg [all of the reports that you gave me]
Fourth, HNPS and PTI appear to ‘freeze’ the constituent from which the HNP
is ‘shifted’, as shown by the following.
(36) a. Who did John give the picture that was hanging on the wall to t?
b. *Who did John give to t the picture that was hanging on the wall?
(37) a. *Which room did there enter t a man with long blond hair?
b. *I don’t remember which room there walked into t a man with long
blond hair.

c. (*)Did there walk into the room a man with long blond hair?
d. *This is the room that there walked into t a man with long blond
hair.

b. Our original judgment of this example was '*', which I now believe is too strong. For
discussion of fully acceptable or almost fully acceptable extraction from subject NPs, see
Kluender (2004).
6. There are those who do not share our judgments about this example. To us, the difference
in grammaticality illustrated here is very sharp.
7. PTI cannot in principle license a parasitic gap because the HNP is a subject.

In R&C we argue that HNPS does not freeze the entire VP, because of
examples like the following.
(38) a. For whom did Bill purchase t last week an all expense paid ticket to
Europe?
b. I don’t remember for which of his sisters Bill bought in Europe t a
fourteenth century gold ring.
c. This is the woman from whom Bill purchased t last week a brand
new convertible with red trim.
But as Bresnan (1994) observes, we did not consider the possibility that the
extracted phrase is moved from a position following the HNP. Therefore, let
us provisionally accept the proposal originally made by Wexler and Culicover
(1980) that HNPS freezes the VP.8 Given this, the important point is that the
freezing effect in PTI is different from that in HNPS, since in PTI the entire
clause is frozen, while in HNPS only the VP is frozen, as extraction of the
subject and SAI show in (39).
(39) a. Which of these people purchased from you last week an all expense
paid ticket to Europe?
b. Did Bill buy for his mother anything she really liked?
Note that in comparison, extraposition of relative clauses from PP is possible
(cf. (24)).
R&C argue that these four properties follow directly from a rightward
adjunction account. There are two additional properties of a somewhat
different character that also suggest that HNPS and PTI involve movement.
First, HNPS out of a PP is impossible (Ross 1967).
(40) a. *I found the article in t yesterday [the magazine that was lying on the
coffee table].
b. *John talked to t at the party [several people who had blond hair].
(Rochemont 1992)

8. Bob Levine (p.c.) points out that Johnson (1985) argues against Bresnan's point using
examples such as the following.
(i) Robin is a person [at whom]i I consider tj excessively angry ti [a whole gang of maniacal
Tolstoy scholars]j.
Here, the PP must originate to the left of the shifted NP, yet the VP does not appear to be frozen.

And second, HNPS and PTI are clause-bounded.


(41) a. It was believed by everyone that Mary bought t for her mother [an
ornate gold ring]
b. ?It was believed that Mary bought t for her mother [an ornate gold
ring] by everyone
c. *It was believed that Mary bought t for her mother by everyone [an
ornate gold ring]
(42) a. It was believed by everyone that there walked into the room [a man
with long blond hair]
b. ?It was believed that there walked into the room [a man with long
blond hair] by everyone
c. *It was believed that there walked into the room by everyone [a man
with long blond hair]
(Rochemont 1992)
R&C account for the boundedness illustrated by these properties with a
version of the Rightward Movement Constraint. Unlike Ross’s (1967) Right
Roof Constraint, which accounts only for clause-boundedness, our constraint
requires that rightward movement be phrase-bounded.

7.6.2 Leftward movement and rightmost heavy noun phrases


7.6.2.1 Predicate raising
Let us consider how these properties could be accounted for on a leftward
movement account. On the first alternative, which we will call Predicate
raising (PR), the heavy NP remains in situ in a specifier position, and the
predicate consisting of the verb and other VP constituents moves into a
higher empty V position (Larson 1988; 1990). There is a natural extension of
this analysis to PTI (in unpublished work by Larson).
(43) a. Sam [V stored] all the things he valued tV in a vault
b. Sam [V stored in a vault] all the things he valued tV
The difference between HNPS and PTI is that in the former case, the subject
NP moves to a specifier position to the left of the verb and the HNP remains
in situ, while in PTI the subject NP itself is the HNP that remains in situ.
(44) there [V entered the room] a man with a funny hat tV
The HNP in this analysis is in its canonical argument position. It cannot
therefore be an adjunct, since extraction from this position is generally
possible (cf. (31) and (32)). Thus PR does not account for the first property
noted above. The analysis does account for the impossibility of HNPS of an
indirect object in the double object construction on Larson's (1988) analysis;
on this analysis, the constituent containing the verb and the direct object
contains the trace of the indirect object, and is hence thematically saturated.
The structure is given in (45).
(45) [VP [V′ [V e] [VP Maryj [V′ [V′ send tj] a book]]]]
As a consequence, under Larson's assumptions, V′ cannot be reanalyzed as a
V for the purposes of PR. But while this analysis successfully accommodates
(33c), it appears to provide no means of deriving (46) (equivalent to (33a)),
with HNPS of the direct object.
(46) I sent Mary ti at Christmas [a book that I had bought]i
On Larson’s analysis, there is no V0-constituent that contains just send Mary
that can undergo PR, stranding the direct object (see (45)).
Under the classical analysis of parasitic gaps, it would appear that the third
property would not be correctly characterized by such an account. So a
leftward movement account would have to either reanalyze the cases of
parasitic gaps (Larson 1988) or show that they are not true parasitic gaps
(along the lines of Postal 1994).
Consider now the freezing effects. The PR analysis, which creates a complex
predicate from the material that precedes the HNP at S-structure, predicts
some but not all of these effects. It correctly predicts that the VP will be frozen
in HNPS (Larson 1988). However, it predicts that only the VP will be frozen in
PTI, which is not the case. In fact, if a PP is in ‘rightward scrambled’ VP-final
position, it too resists extraction.
(47) a. Who did you buy a picture of Einstein for t last week?
b. *Who did you buy last week for t a picture of Einstein?
c. *Who did you buy last week a picture of Einstein for t?
On an analysis in which the ‘shifted’ constituents are in situ regardless of
whether they are in VP-final or VP-internal position, it is not clear how to
capture the differences in extraction possibilities.
Finally, a virtue of this analysis is that it captures the fact that HNPS out of
a PP is impossible. A predicate can be formed from a verb and its L-marked
PP; there is no predicate that consists solely of the verb and the preposition of
that PP (Larson 1988).
So in summary, there are four problems with this version of a leftward
movement analysis. First, it does not capture the adjunct status of the
shifted NP. (In fact, it does not capture the adjunct status of a shifted PP
either.) Second, it does not explain the fact that HNPS cannot apply to an
indirect object but can apply to a direct object. Third, it does not account
for the fact that parasitic gaps are licensed in HNPS. And fourth, it does
not capture the full range of freezing effects in HNPS and PTI (see (36)–
(39) above).

7.6.2.2 Movement to High Specifier


There is a conceivable leftward movement account that might overcome all of
the difficulties with the PR account. The basic problem with the PR account is
that it cannot represent the ‘shifted’ phrase as an adjunct. Let us suppose,
therefore, that the ‘shifted phrase’ moves leftward to a higher A′ specifier
position, and that the phrase that it raises out of subsequently moves leftward
to a still higher specifier position. Again, a variant of this analysis is one in
which the latter constituent adjoins to the specifier containing the HNP.

(48) [tree: NPS in the highest Spec; the HNP NPO in a lower Spec, above [V′ V PP]]

(49) [tree: the HNP NPO in a Spec, above [V′ V [PP to …]]]

(50) [tree: PTI; there … V … with the HNP NPS in a lower position]

(51) [tree: PTI after raising; the HNP NPS in a high Spec, above there … V … tS]

By treating HNPS as essentially an A′-movement, this analysis directly
captures the failure of extraction from the HNP, the possibility for parasitic gaps,
the extractability of a direct object but not an indirect object, and the freezing
of the constituent from which the HNP has been extracted, since after it
undergoes leftward movement it, too, is an adjunct.

7.6.3 Phrase bounding


The Movement to High Specifier (MHS) account faces some difficulties not
encountered on the PR analysis. It fails to block HNPS from a PP, since in
English, leftward movement from PP is not blocked. It also fails to block long
extraction of the HNP. These are exactly the properties that on a rightward
movement account are attributed to the Rightward Movement Constraint.
Seen from this perspective, the rightward movement account and the MHS
account have the same weakness: they must both provide for some means of
phrase bounding that is thus far not independently motivated by any property
of leftward movement. The equivalent of the Rightward Movement Con-
straint on the MHS analysis must be a principle whose effect is to guarantee
that the requisite functional structures to which the HNP and its containing
phrase move are immediately above the containing phrase. Thus the cost of
properly characterizing bounding appears to be equivalent in both accounts.
There do not appear to be any empirical differences between the two, at least
none that are tied to configuration. Our comparison of the leftward move-
ment and rightward movement accounts shows that it is possible to repro-
duce on the leftward movement account the essential properties of the
structures that would result from rightward movement. In principle, it
appears that the two are notational variants of one another, mutatis mutandis,
and there can be no empirical basis for choosing between them. Questions
that remain open on the leftward movement account concern independent
motivation of the required functional structure and the triggering and licens-
ing conditions on the movements.
For example, in the structures that we proposed on the MHS analysis of
HNPS, there is an open question as to whether and how the trace of the HNP
is properly bound (see (48)), since the HNP does not c-command its trace.
A parallel question arises in the licensing of parasitic gaps in HNPS, where the
HNP fails to c-command the parasitic gap. In this account, one possibility
would be to appeal to reconstruction to legitimize the relevant configurations.
Alternatively, we might suppose that neither proper binding nor the licensing
of parasitic gaps makes reference to c-command. One can conceive of an
equivalent notion to which these licensing conditions could make reference,
e.g. the HNP will be in some type of sister relation to the constituent
containing the trace or the parasitic gap. The sort of sister relation that
might qualify is one in which the two sisters are dominated by all of the
same lexical, but not functional, projections (Chomsky 1986: 13).

7.7 Conclusion
Let us review. First, the language-internal facts from English, at least, do not
bear on the question of whether there is rightward and leftward movement, or
just leftward movement. In fact, there is no empirical reason why there cannot
be strict leftward branching, with rightward movement deriving all of the
ordering and relative height facts, essentially the converse of the MHS
analysis.
Second, the facts do bear on the question of what form such an analysis
must take. For example, an account invoking leftward movement must be of
the High Specifier type for both extraposition and heavy noun phrases. In
particular, neither the Stranding analysis of extraposition nor the Predicate
raising analysis of HNPS gives rise to an empirically adequate account, unless
of course they involve movement to a high specifier as part of the derivation.
Third, the choice between successful leftward and rightward movement/
adjunction alternatives must hinge on their relative explanatory potentials.
For instance, we have argued that both types of account require separate
stipulations with the effects of the Complement Principle and the Rightward
Movement Constraint. If these stipulations can be derived from other con-
siderations on one or the other view, that view gains an advantage over the
other, to the extent that the derivation has no comparable equivalent on the
other view. (At present we can see no way of eliminating these stipulations on
either view.) Whatever the outcome of future exploration of these and related
questions, it remains clear that the question whether rightward movement
exists or not, at least for these constructions of English, is not an empirical
one.
8
Topicalization, inversion, and complementizers in English (1992)*

Remarks on Chapter 8
I wrote a first version of this article for the Going Romance conference at the
University of Utrecht. I had been away from syntactic research for a few years
due to a flirtation with academic administration, but an ongoing reading
group that I had organized with Shigeru Miyagawa had helped me stay some-
what aware of what was going on. I was interested in what was happening with
the ‘exploded Infl’ proposed by Pollock (1989), and thought I would try to
apply the same type of analysis to the left periphery of English. I proposed that
English has an invisible functional Pol(arity) head between C and Infl. I did
not publish this paper in a journal because I was suspicious of the account
of the amelioration of the that-t effect when there is an adverb in [Spec,Pol]
between that and t, the ‘Adverb Effect’. Ultimately I argued against ECP
accounts of the that-t effect on the basis of the Adverb Effect—see
Chapter 9 below.
Much of the later work in the subsequent ‘cartographic’ framework
addresses some of the problems with the approach explored here and gener-
alizes it to languages other than English (see Cinque 2002; 2006; Belletti 2004;
Rizzi 1997; 2004; Cinque and Rizzi 2008).

* [This chapter appeared originally in Denis Delfitto, Martin Evergert, Arnold Evers, and Frits
Stuurman (eds), Going Romance and Beyond, OTS Working Papers, University of Utrecht (1992).
Portions of this material were presented to audiences at the University of Arizona, the Rijksuniversi-
teit van Utrecht, and ESCOL. For helpful discussion, criticism, and specific suggestions regarding the
analyses proposed in this paper I would like to thank Andy Barss, Arnold Evers, Hans den Besten, Alec
Marantz, Shigeru Miyagawa, J. J. Nakayama, David Pesetsky, Tom Roeper, Bonnie Schwartz, Frits
Stuurman, Laurie Zaring, and especially Marc Authier, Peter Coopmans, Heizo Nakajima, Michael
Rochemont, and Ayumi Ueyama. Naturally I am responsible for any errors.]

8.1 Introduction
I argue in this paper that there are two complementizer-type positions in
English, as illustrated in (1). ‘Pol(P)’ abbreviates ‘Polarity (Phrase).’1
(1) [CP Spec C [PolP Spec Pol [IP . . . ]]]
The various arguments that I give are directed towards demonstrating that
there are generalizations that can be best explained if we assume the existence
of both C and Pol, with their associated maximal projections and specifiers.2
I will suggest that C ranges at least over that, Q, and [e], while Pol may be
at least neg, wh, and so.3 There is also evidence that Pol may be Focus.
Movement into [Spec,PolP] is licensed through Spec-head agreement, as is
movement into [Spec,CP]. Such licensing depends crucially on the ability of the
particular head to participate in an agreement relationship with Spec
(Chomsky 1986; Rizzi 1990; 1996). Movements into [Spec,PolP] yield Subject
AUX Inversion (SAI) because of the need for Pol when it is a bound morpheme
to adjoin to an overt element. I assume that ‘topic’ topicalization, where the
topic does not carry primary stress (Gundel 1974), is adjunction to a maximal
projection (e.g. CP, PolP or IP), and is not substitution for a Spec (Lasnik and
Saito 1992; Rochemont 1989). However, I suggest that ‘focus’ topicalization
(Gundel 1974) may in fact be substitution for [Spec,PolP] when Pol is Focus.
These points are developed in the following way. §8.2 demonstrates that
topicalization and Negative Inversion involve very different landing sites for
the fronted constituent. Topicalization creates a ‘topic island’ while Negative
Inversion does not. The conclusion is that the first is adjunction, while the
second is substitution into a specifier position to the right of the
complementizer, i.e. [Spec,PolP].

1. I adapt the category Pol from Johnson (1989), who makes different use of it than is proposed
here. For Johnson, Pol is the category of the ‘adverbs’ so, too, and not. My proposal resembles several
others that have appeared recently, as well. Laka (1990) proposes a head Σ for English, Spanish, and
Basque that resembles Pol in many respects; I will suggest a variety of additional evidence for her
general proposal as well as several modifications. Ueyama (1991) has argued for a similar head in
Japanese, while Koizumi (1991) proposes a somewhat different M(P) for ‘modal’ adverbs in
Japanese; the two proposals are not entirely compatible, however. Haegeman (1991) argues
extensively for a neg(P) external to IP in West Flemish, which appears to have many of the properties
of Pol when Pol takes on the value neg in my analysis. Authier (1991) suggests that CP can iterate in
English, yielding superficially similar structures to those that I investigate in this paper.
2. The view that there are two adjunction sites to the left of the subject is not entirely novel; see
e.g. Grosu (1975) and Reinhart (1981b). Reinhart in particular is concerned with the fact that it is
possible to extract from Hebrew clauses that appear to have filled COMP (relative clauses and
wh-questions) in violation of Subjacency. Rather than take S to be a bounding node, she suggests that
there are two escape hatches in Hebrew (and in Italian) but only one in English. The framework
within which their arguments are couched is sufficiently different from the current one that it is
not entirely clear how their evidence can be brought to bear on the current proposal.
3. Another value of C which I will not discuss at length here is Rel(ative). See fn. 26 below.
Laka (1990) shows, following Klima (1964), that there is a phonologically empty morpheme that
denotes affirmation and is in complementary distribution with neg.

§8.3 produces a range of new evidence to support the analysis. (i) The presence
of Pol in addition to C allows certain subject that-t extractions not to violate the
ECP. (ii) The existence of C and Pol allows us to explain why inversion occurs
in embedded sentences with fronted negation and so, but not with fronted wh.
(iii) The analysis extends naturally to an account of Sluicing (Ross 1969b).
(iv) The availability of two complementizer positions, each of which has a
Spec, allows us to explain some subtle differences between why and how come,
on the assumption that they are both generated outside of IP (see Rizzi 1990).
In order to account for the licensing of subject wh and subject neg/so, it is
necessary to assume that in English PolP may be a complement of Infl as well
as of C. §8.4 pursues some implications of this analysis and extends it to the
account of focus constructions in Hungarian, English, and other languages.
§8.4 also examines briefly the implications of the Pol analysis for the verb
second phenomena of the Germanic languages.
For the purposes of this paper I will adopt aspects of the theoretical
perspective of Rizzi (1990) as modified by Cinque (1990), as well as that of
Lasnik and Saito (1992). The points that are most relevant to the investigation
here are the following.
Head government: The formal licensing portion of the ECP is reducible
to a requirement of proper head government.4
Spec-head agreement: A filled Spec is licensed by Spec-head agreement
(Rizzi 1990; 1991).
Empty C agrees: In English, that is inert with respect to agreement, while
empty C can agree with Spec. Thus, movement of a subject through
[Spec,CP] is licensed when C is empty, because C is coindexed with the
[Spec,CP] through Spec-head agreement, hence with the trace in [Spec,
IP] (Lasnik and Saito 1992; Rizzi 1990; Rochemont and Culicover 1990).
Topic islands: Adjunction to a maximal projection creates a barrier to
extraction (Lasnik and Saito 1992; Rochemont 1989). Following Cinque
(1990), a single barrier to movement bars extraction; hence topicaliza-
tion through adjunction creates a ‘topic island’.
X0 adjunction: Movement of a head X0 is always structure-preserving,
i.e. it is either adjunction to another X0 or substitution for empty X0
(Chomsky 1986; Baker 1988).

4. The term “proper head government” is taken from Rizzi (1990). Lasnik and Saito argue
that lexical government and antecedent government are distinct notions, but that only an X0 can
be a proper governor. For many cases, the two approaches converge, although the phenomena
are grouped somewhat differently.

It will simplify the discussion considerably to assume that the subject in
English originates as [Spec,IP], and that SAI involves movement of Infl to
the left. One alternative, that the subject originates in VP and the subject and
AUX remain in situ in S-structure, raises difficult questions of Case assign-
ment and licensing of specifiers that would take us far afield.5 I will also leave
open the complicated question of whether the functional category Pol may in
fact be a variant of some other functional category, such as AgrS or some type of
aspectual head.

8.2 Two landing sites


Here I show that on the standard view of the English complementizer
structure, Negative Inversion cannot be fully accommodated. Given the
structure (2),
(2) [CP Spec C IP]
the position of a fronted negative must either be that of a fronted wh, or of a
topic. There is evidence that it is neither.
The standard GB view of English wh-questions is that the wh moves into
[Spec,CP], and Infl adjoins to C. On this view, both movements are structure-
preserving (Baker 1988; Chomsky 1986).
(3) [CP [Spec whati] Q+willj [IP Robin tj [VP say ti]]]
Baltin (1982) and Lasnik and Saito (1992) argue that topicalization is (non-
structure-preserving) adjunction to IP. This is plausible, since the topic
appears to the right of the complementizer that.
(4) a. I think that, to Lee, Robin gave a book.
b. Lee said that, on the table, she is going to put the yellow dishes.
c. Robin says that, the birdseed, he is going to put in the shed.
Multiple leftward movement in a single clause yields the ungrammatical cases
in (5) and (6).6
(5) a. *What did, to Lee, Robin give?
b. *Which dishes are, on the table, you going to put?
c. *Where are, the birdseed, you going to put?

(6) a. *I asked what, to Lee, Robin gave.
b. *Lee forgot which dishes, on the table, you are going to put.
c. *Robin knows where, the birdseed, you are going to put.

5. See Diesing (1990) for discussion of V-second in Yiddish.
6. As discussed in §8.4, there are two types of topicalization, with different intonations. It is
marginally more acceptable to extract from the ‘focus’ topicalization structure, which I suggest
may not be an adjunction structure but a substitution for a Spec.
To rule out these examples, let us follow Cinque (1990) in saying that a maximal
projection that is not c(ategory)-selected is a barrier to extraction.7 In the case
of adjunction to IP, the newly created IP satisfies the c-selection requirement of
C, but the original IP does not.8 Hence the original IP is a barrier sufficient to
block subsequent extraction, and a ‘topic island’ arises (Lasnik and Saito 1992;
Rochemont 1989).9 The double bracket denotes a barrier.
(7) NP forget [CP Spec C [IP [on the table] [[IP you are going to put which dishes]]]]
Now consider Negative Inversion. There are two possible structures for
Negative Inversion on the standard approach. Consider (8).
(8) a. Lee said that at no time would she agree to visit Robin.
b. It is apparent that only on Fridays will the traffic be too heavy to get
there in time.
c. The fact that on not a single hat was there a propeller indicates how far
the beanie has fallen in modern times.

7. Specifically, Cinque proposes the following definitions of barrier.
(113) Definition of barrier for government
Every maximal projection that fails to be directly selected by a category nondistinct from
[+V] is a barrier for government.
(114) Definition of barrier for binding
Every maximal projection that fails to be (directly or indirectly) selected in the canonical
direction by a category nondistinct from [+V] is a barrier for binding.
8 I thank Shigeru Miyagawa for suggesting this formulation to me.
9 If we wish to allow IP to be an inherent barrier, then an alternative account is possible.
Lasnik and Saito (1992) and Rochemont (1989) propose that adjunction to IP creates a ‘topic
island’ with respect to subsequent extraction from IP. The new IP node constitutes an extra
barrier. A Subjacency violation follows when something is extracted across the original IP, which
is a barrier, and the barrier created by adjunction of the topic. (i) illustrates.
(i) I asked [CP what [[IP [to Mary] [[IP Bill gave t t]]]]]
(The double brackets indicate the two barriers that what must cross.) Thus, the examples in (5)
are ruled out for two reasons. First, extraction of the wh over the two barriers is a Subjacency
violation; second, movement of Infl over the two barriers is a Subjacency violation.
It is also possible that the topic islands are a reflex of Relativized Minimality (Rizzi 1990). On the
face of it, both adjunction of the topic to IP and substitution of wh into [Spec,CP] are
A′-movements, and thus should yield a Relativized Minimality violation in combination. I leave the
question open for now; for some additional considerations, see the discussion in fn. 14 below.
topicalization, inversion, and complementizers 217

At no time, etc. are fronted expressions that are preceded by C and are
followed by an inverted I(nfl). If they are topics, they are adjoined to
IP. Then in these inversion examples, Infl must also adjoin to IP, in violation
of the requirement that movement of a head be a substitution or an adjunc-
tion to another head.10 On the other hand, if the fronted expression is
adjoined to CP, then that cannot be C.
Extraction from clauses in which Negative Inversion has applied cannot be
easily accommodated within this framework, regardless of which structure we
choose. If Negative Inversion is assumed to pattern like a wh-question,
extraction from a Negative Inversion clause should be blocked by the same
mechanism that blocks extraction from wh-islands in English. On the other
hand, if Negative Inversion is assumed to pattern like topicalization, extrac-
tion should be blocked by the same mechanism that blocks extraction from
topic islands. In either case, extraction should be unacceptable, but it is not.
The relevant data is given in (9)–(14).11

(9) These are the books which Lee says that
    ?*with great difficulty, she can carry ___.
    *to Robin, she will give ___.
    *on the table, she will put ___.

(10) These are the books which Lee says that
    only with great difficulty can she carry ___.
    only to Robin will she give ___.
    only on the table will she put ___.
(11) Which books did Lee say that
    ?*with great difficulty, she can carry ___?
    *to Robin, she will give ___?
    *on the table, she will put ___?

10 This assumption is not universally accepted. It is not made in e.g. Rochemont and
Culicover (1990), and it does not appear to be made by Lasnik and Saito (1992). It may well
be possible to replace the requirement that X0 movement and even XP movement be structure-
preserving by a requirement that adjunctions be properly licensed, along lines suggested by
Fukui and Speas (1986), Hoekstra (1991), and Culicover (1993b).
11 There appears to be a ‘focus’ topicalization construction in English that differs from the
‘topic’ topicalization construction intonationally, and in not creating a topic island. The starred
examples in (9), (11), and (13) are much improved under the ‘focus’ topicalization reading. See
§8.4 for discussion.

(12) Which books did Lee say that
    only with great difficulty can she carry ___?
    only to Robin will she give ___?
    only on the table will she put ___?
(13) On which table did Lee say that
    *with great difficulty, she can put the books ___?
    *for Robin, she can put the books ___?
    *these books, she can put ___?
(14) On which table did Lee say that
    only with great difficulty would she put the books ___?
    only for Robin would she put the books ___?
    only these books would she put ___?

The contrast between topicalization and Negative Inversion sentences with
respect to extraction shows that the fronted negative does not create a topic
island. Thus, if that is C there must be a substitution site for the negative.
On the other hand, suppose for the sake of argument that that were not a
C in the sentences in (8). If the fronted negative constituent occupied [Spec,
CP], it would be impossible to extract from embedded Negative Inversion
sentences, by analogy with embedded wh-questions. Because the position is
occupied, extraction cannot be successive cyclic, but must move out of the
lower S in one step. This is a Relativized Minimality violation (Rizzi 1990).12
On the view that that is a complementizer, these sentences show that C can
take as its complement a maximal projection that is distinct from IP. This
maximal projection contains a Spec and a head, just like CP. Call this new
projection PolP. The head of PolP may be neg, which agrees with a negative in
[Spec,PolP] under Spec-head agreement.13
Along similar lines, suppose we analyze a relative clause as being of the
form [CP[Spec XPi] Rel [IP . . . ti . . . ]]. Negative Inversion should be impossible,
because there is no landing site for the negative constituent, or because the

12 I assume here that a negative constituent in [Spec,CP] should count as an A′ minimality
domain for a wh in a higher [Spec,CP] that c-commands it. However, as I note below, it turns
out that Relativized Minimality does not hold for wh/negative interactions. Even if Relativized
Minimality does not apply, the force of the evidence is still that there is a maximal projection
different from CP involved in the derivation of negative inversion.
13 It has been proposed that that may take a CP complement (Rizzi and Roberts 1989; Authier
1991); Chomsky (1977) adopts a similar approach in an earlier framework. Such a structure must
be severely constrained so that illicit sequences are not generated: *that that ( . . . ), *who that, *at
no time who, *at no time that, etc. Taking PolP to be the complement of C imposes these
restrictions directly, in terms of the range of C and of Pol. In some sense, of course, the two
options are notational variants of one another.

adjunction of the negative constituent would create a topic island and block
the movement of XP into [Spec,CP]. As the following sentences show, relative
clauses allow Negative Inversion.14
(15) These are the books which
    ?*with great difficulty Lee can carry ___.
    *to Robin Lee will give ___.
    *on the table Lee will put ___.
(16) These are the books which
    only with great difficulty can Lee carry ___.
    only to Robin will Lee give ___.
    only on this table will Lee put ___.

Again, the evidence suggests that there is an additional landing site for the
negative constituent that is distinct from [Spec,CP].

8.3 Additional evidence


In this section I consider additional evidence to support the conclusion that
there is a PolP. In each case, I show that the assumption that there are two
heads or two Specs allows for the explanation of what would otherwise be
puzzling phenomena.

8.3.1 Suspension of that-t ECP effects


Here I show that the presence of empty Pol licenses extraction of a subject in
English even when C is that. Thus it is possible to explain cases in which the
expected that-t effect due to ECP is suspended.
The presence of both PolP and IP predicts that it should be possible to
adjoin a topic to either. Consider the implications for extraction from IP. First
of all, PolP is not a barrier to extraction; if it were, it would block extraction
after Negative Inversion. Thus it is possible to extract from IP over PolP to
[Spec,CP] when [Spec,PolP] is filled.
There are certain adjunctions in English that do not appear to give rise to
topic islands. The examples in (17) show that adjuncts such as for all intents
and purposes, yesterday, in NP’s opinion, and under normal circumstances have
this property.

14 The fact that it is possible to extract from a Negative Inversion sentence undermines the
Relativized Minimality account of topic islands (see fn. 9 above). Negative Inversion involves
substitution for Spec, and hence is an A′-movement. If topicalization and wh-Movement are
also A′-movements, they should be blocked by the movement of a negative constituent into
Spec, but they are not. One inference to draw is that movement of a negative into Spec is a
different type of movement from topicalization and wh-Movement, so that Relativized Minim-
ality does not apply. But then it is equally or more plausible on formal grounds that topicaliza-
tion and wh-Movement are also different types of movement from one another.

(17) a. Robin met the man {whoi / Opi that} for all intents and purposes ti was the
        mayor of the city.
     b. This is the tree {whichi / Opi that} just yesterday I had tried to dig up ti with
        my shovel.
     c. I asked whati in your opinion Robin gave ti to Lee.
     d. Lee forgot which dishesi under normal circumstances you would put
        ti on the table.
In each of these cases there is extraction of a wh-phrase over an adjunct, yet no
topic island violation of the sort seen in examples such as (11) and (15) above.
Why this should be the case is an independently complex matter that I cannot
go into here; in any case, the empirical evidence shows that not all adjuncts
create topic islands.
Assume now that if there is no Pol and nothing that must be adjoined to
PolP, PolP is not present. If PolP is not present and if there is an adjunct that
does not create a topic island, a constituent α can move over the IP-adjunct
into [Spec,CP], as in (18).15
(18) [CP [Spec αi] C [IP XP [IP . . . ti . . . ]]]
Suppose next that XP is adjoined to PolP, again not producing a topic island
in this case. A constituent α can move into [Spec,PolP] and then into [Spec,
CP] over a PolP-adjunct, if there is no topic island, as shown in (19).
(19) [CP [Spec αi] C [PolP XP [PolP [Spec ti′] Pol [IP . . . ti . . . ]]]]
Thus, in cases where adjunction does not create a topic island, there will be
two possible structures for extraction over the topic, namely (18) and (19).
Suppose now that αi is the subject of IP. Furthermore, let C be that, which
cannot undergo Spec-head agreement (Rizzi 1990). I continue to assume that
there is an XP adjunct in each case that does not create a topic island.
(20) a. . . . [CP [Spec αi] that [IP XP [IP ti . . . ]]]
     b. . . . [CP [Spec αi] that [PolP XP [PolP [Spec ti′] Poli [IP ti . . . ]]]]

15 I am assuming for completeness that adjunction to IP of a non-topic island adjunct is a
possibility. But nothing hangs on this assumption. Suppose that we could independently
demonstrate that the non-topic island adjuncts are not moved, but generated in adjunct
position in D-structure. Then things would actually be simpler if we were to assume that
there are no D-structure IP adjuncts. We could continue to suppose that Move α can adjoin
either to IP or to PolP. All of these conclusions are consistent with the analysis later of why,
which I argue is generated in D-structure in [Spec,PolP].

In (20a) there is no PolP. This is a typical that-t violation; that is not
coindexed with ti and thus does not head-govern it. Thus C does not properly
head-govern ti, and there is an ECP violation at ti.
Consider now (20b). Pol can undergo Spec-head agreement with the trace
ti′ in [Spec,PolP]. With Spec-head agreement, Pol receives the index i of αi;
hence Pol is coindexed with the subject trace ti. Thus, Pol properly head-
governs ti, and there is no ECP violation.
I therefore predict that there may be certain instances in which adjunction
to the right of that appears to suspend the ECP by suspending the that-t effect.
It has been seen that some adjuncts do not create topic islands. When such
adjuncts are present, we in fact do not get that-t violations.16 The relevant
examples are given in (21).
(21) a. Robin met the man {whoi / Opi that} Leslie said that for all intents and
        purposes ti was the mayor of the city.
     b. This is the tree Opi that I said that just yesterday ti had resisted my
        shovel.
     c. I asked whati Leslie said that in her opinion ti had made Robin give a
        book to Lee.
     d. Lee forgot which dishesi Leslie had said that under normal circumstances
        ti should be put on the table.
The examples in (21) show that without the topic island, the presence of Pol
licenses extraction of the subject. The that-t effect does not occur here, as
noted, because that does not occupy the position of the potential head-
governor for the subject trace. Thus, (21) contrasts sharply with (22), and
falls together with (23) in grammaticality.

16 As Peter Coopmans has pointed out to me, a question now arises as to the status of ti′ in
(20b). This trace is not lexically governed or antecedent-governed under the definition of
Rochemont and Culicover (1990) or head-governed under the definition of Rizzi (1990). The
most natural approach to take here is to say that the correct structure when the that-t effect is
suspended is not in fact (20b), but (i).
(i) αi . . . [CP [Spec ] that [PolP XP [PolP [Spec ] Poli [IP ti . . . ]]]]
Either (i) is a long extraction of the sort discussed by Cinque (1990), or the non-argument trace
can be freely deleted in LF (Lasnik and Saito 1984). What is essential is that the empty Pol is
licensed by the adjoined XP and in turn licenses the empty subject position, which is not
possible when XP is adjoined to IP, as in (ii), or when there is no adjunct, as in (iii).
(ii) αi . . . [CP [Spec ] that [IP XP [IP ti . . . ]]]
(iii) αi . . . [CP [Spec ] that [IP ti . . . ]]
On the long extraction approach, the mechanism by which an empty Pol (or C) head governs
the subject cannot involve Spec-head agreement, since there is nothing in [Spec,PolP].

(22) a. *Robin met the man {whoi / Opi that} Leslie said that ti was the mayor of
        the city.
     b. *This is the tree Opi that I said that ti had resisted my shovel.
     c. *I asked whati Leslie said that ti had made Robin give a book to Lee.
     d. *Lee forgot which dishesi Leslie had said that ti should be put on the
        table.
(23) a. Robin met the man {whoi / Opi that} Leslie said [ei] ti was the mayor of the
        city.
     b. This is the tree Opi that I said [ei] ti had resisted my shovel.
     c. I asked whati Leslie said [ei] ti had made Robin give a book to Lee.
     d. Lee forgot which dishesi Leslie had said [ei] ti should be put on the
        table.
In order to capture the difference between (21) and (22), we must make the
natural assumption that when Pol and [Spec,PolP] are entirely empty and
nothing adjoins to PolP, PolP is pruned from the structure. Otherwise, if we
were to allow empty [Spec,PolP] and a PolP with nothing adjoined to it, we
would expect to never get the that-t effect. Crucially, we cannot take the non-
topic island adjuncts to be in [Spec,PolP], because we would then lack the
formal mechanism for linking Pol with the subject in trace position through
Spec-head agreement with a trace in [Spec,PolP]. (But see fn. 16 above for
some indication that presence of the empty Pol itself, and not the contents of
[Spec,PolP], is what is relevant here.)
We predict that the counterpart of the that-t effect will be suspended in case
the complementizer is other than that. It is impossible to test this prediction
in the case of infinitives, because Pol only selects tensed IP complements (see
fn. 22 below). But suppose that the complementizer is Q, to be discussed in
greater detail in §8.3.2 below. There appears to be a suspension of the ‘Q-t’
effect as well.
(24) a. *Who did Lee wonder whether t had left
b. ?Who did Lee wonder whether Leslie had seen t
c. ?Who did Lee wonder whether just yesterday t had left
d. *Why did Lee wonder [whether Leslie had left t]
e. *Why did Lee wonder [whether just yesterday Leslie had left t]
Assume the analysis of Cinque (1990). Example (24a) is an ECP violation,
since the subject is not head-governed. Long movement of the subject does
not save this sentence. (24b) involves long extraction from a weak island.
There is no ECP violation, since the direct object is properly head-governed.
Example (24c) should be judged as acceptable as (24b), since presumably the

empty Pol properly head-governs the subject in this case. While the judgment
is somewhat subtle, the acceptability of this example appears to be closer to
that of (24b) than to that of (24a) and (24d,e), which are ECP violations.
Examples with other adjuncts confirm this general tendency.
(25) a. ?the person who Lee wondered whether *(for all intents and purposes)
        t was already the Democratic candidate
     b. ?the pasta that Lee forgot whether *(in your opinion) t should be
        served for dinner
     c. ?What did Lee wonder whether *(under more normal circumstances)
        t would have been served for dinner

8.3.2 Subject Aux Inversion (SAI)


Here I consider why inversion occurs when a negative constituent is moved
into [Spec,PolP].17 Let us assume that the negative constituent and Pol agree
in the feature neg, an instance of Spec-head agreement. For clarity I will use
neg or [Pol neg] to refer to the negative Pol, and NegP to refer to the
corresponding phrase that moves into [Spec,PolP] (and similarly for wh/
WhP and so/SoP).
It is plausible that inversion occurs in Negative Inversion as a direct
consequence of the movement of NegP into [Spec,PolP]. Modifying and
generalizing a suggestion of Pesetsky (1987) for interrogatives, suppose that
neg is a morpheme that must cliticize to another head.18 In the configuration

17 Of course it is possible to front a negative constituent without inversion, as shown by
Klima (1964). I am focusing here on those cases in which the negative has sentential scope. For
discussion of the interpretive difference between Negative Inversion and ordinary topicaliza-
tion, see Klima (1964), as well as Liberman (1974) and Rochemont (1978).
18 We may take a similar approach to so-Inversion, illustrated in (i).
(i) So many people did John insult that he did not dare return home.
We would therefore predict that extraction from a so-Inversion context will be grammatical, by
analogy with extraction from a Negative Inversion context. The judgments are marginal at best,
however, for reasons that are not clear to me.
(ii) a. Mary says that she will sell this book to so many people that she will become rich.
b. ?This is the book that Mary says that to so many people will she sell that she will become
rich.
(iii) a. Mary says that she will put the books on so many tables that the floor will collapse.
b. ?These are the books that Mary says that on so many tables will she put that the floor
will collapse.
(iv) a. Mary says that she will read the book with so much attention that she won’t hear the
phone ring.
b. ?This is the book that Mary says that with so much attention will she read that she won’t
hear the phone ring.

(26) [PolP [Spec NegP] [Pol neg] [IP . . . Infl . . . ]]


there is no such head adjacent to [Pol neg]. Therefore, the head of IP must
raise and adjoin to Pol.19
(27) [PolP [Spec NegP] [Pol neg]+Infli [IP . . . ti . . . ]]
This raising of Infl to Pol constitutes SAI.20
While this general picture appears plausible, consideration of the specifics
raises numerous questions. Most prominently, why does inversion apply in
direct questions but not in embedded questions? It cannot be the case that
SAI per se is a ‘root’ transformation, as originally suggested by Emonds (1970;
1976), because Negative Inversion and so-Inversion can be embedded. Com-
pare the following examples.
(28) a. What did Robin see?
b. I wonder what Robin saw.
c. *I wonder what did Robin see.
d. I said that not once had Robin raised his hand.
e. I said that so many people did Robin insult that he did not dare
return home.
From the simple fact that inversion occurs in a direct question it follows that
[Pol wh] can occur in main clauses. The derivation is the following.
(29) [PolP [Spec WhP] [Pol wh]+Infli [IP . . . ti . . . ]]
It is clear that wh must also move to initial position in an embedded question.
Thus the (apparently) maximal head in the embedded question requires Spec-
head agreement with the fronted wh. Since inversion does not occur, the head
in question cannot be [Pol wh], if we hold to our assumption that wh is a
morpheme that triggers inversion. Since the interrogative character of an

[The marginal sentences are instances of crossing dependency, which could be responsible for
the judgment.]
19 Laka (1990: 40) proposes that Infl must move to neg as a consequence of the following
Tense c-command condition, based on a suggestion by Pollock (1989): “negation must be
c-commanded by Infl at S-structure.” More generally, in S-structure Tense must dominate all
other inflectional elements, including neg. If I am correct that English has both a complemen-
tizer Q and a Pol wh, then the fact that Infl does not raise to Q might constitute a problem for
such an approach.
20 A not dissimilar account is given by Rizzi (1996). Rizzi suggests that in wh-questions I is
marked [+wh]. This I moves to C in order to license Spec-head agreement with a wh in Spec.
The two approaches are technically very similar. One difference appears to be that by incorpor-
ating Pol into I in the form of a feature, we would lose the ability of empty Pol to license a subject
trace, as discussed in §8.3.1.
embedded complement can be selected by the matrix verb (Grimshaw 1979),
the head that licenses wh-Movement in embedded questions cannot be Pol;
the verb can only select the complementizer. Hence the head in question must
be an interrogative complementizer distinct from [Pol wh], and which in fact
excludes [Pol wh]. I will call this complementizer Q. Q, like that, appears in
embedded contexts only.21 That is, I assume that in general complementizers
per se do not appear in main clauses.
The key point here is that the analysis that assumes the existence of both
C and Pol is in a position to account for the fact that inversion does not occur
in embedded questions. The complementizer Q, as befits a complementizer,
occurs in embedded questions. [Pol wh] occurs only in main clauses and
triggers inversion in direct questions because the wh morpheme is a clitic.22

21 In fact, it may be that in some languages, Q is realized overtly as that (or whatever
corresponds to that). For example, Bavarian (Bayer 1984) may have the sequence wh-daß.
(i) I woass ned [wanni (dass) [da Xavea ti kummt]]
I know not when that the Xaver t comes
(ii) Es is no ned g’wiess [weai (dass) [ti kummt]]
it is yet not sure who that t comes
(iii) dea Hund [deai (wo) [ti gestern d’Katz bissn hot]]
the dog which that t yesterday the cat bitten has
(iv) de Frau [deai (wo) [da Xavea ti a Bussl g’gem hot]]
the woman to-who that the Xaver t a kiss given has
Similar examples for relative clauses (but not questions) are cited for English by Grimshaw
(1975) (see also Bresnan 1976; Chomsky and Lasnik 1977), where Rel is realized as that.
22 It is possible to have wh-infinitives in English, but not neg-infinitives or so-infinitives.
(i) a. I was wondering whether (or not) I should leave.
b. I was wondering what I should do.
c. I was wondering how many times I should call.
d. I expected that not once would I see John.
e. I expected that so many people would I meet that I wouldn’t be able to count them all.
(ii) a. I was wondering whether (or not) to leave.
b. I was wondering what to do.
c. I was wondering how many times to call.
d. *I expected not once to have seen John.
e. *I expected so many people to meet that I wouldn’t be able to count them all.
The current account crucially provides both [Spec,CP], the landing site for fronted wh, and
[Spec,PolP], the landing site for fronted neg and so. The evidence of these examples is that Pol
selects for Tense. Note in this regard that Negative Inversion cannot apply in subjunctives and in
imperatives.
(iii) a. It is important that you never talk to them.
b. *It is important that never (do) you talk to them.
(iv) a. You talk to no one.
b. *To no one do you talk.
(v) a. No one talks to anyone.
b. *To no one does anyone talk.
These facts follow if subjunctives and imperatives lack Tense but have Agr, as suggested by
Beukema and Coopmans (1989).
In order for this analysis to go through, it is necessary to demonstrate that
the sequence C–Pol in embedded clauses is in general possible; the sequence
C–wh is excluded in embedded sentences, presumably on principled grounds,
but sequences of the form C–neg and C–so exist. In fact we have already seen
instances of that–neg and that–so. The other combinations exist, but are of
varied acceptability.
(30) a. ?Lee wonders whether at no time at all would Robin volunteer.
b. Lee wonders whether only then would Robin volunteer.
c. ?Lee wonders whether so many people did Robin insult that he does
not dare return home.
d. Lee will finally tell us whether or not to so many people did Robin
give his phone number to that we can expect phone calls all week.
e. ??Lee wonders exactly when in no way at all could Robin solve the
puzzle.
f. ?Lee told us where on very few occasions would Robin ever agree to
eat dinner.
g. Lee wonders why in no way would Robin volunteer.
h. Lee wonders why only then would Robin volunteer.
i. Lee wonders how come at not many times would Robin eat dinner.
The well-formedness of some of these examples, and the variability of judgments,
suggests that the sequence Q–{neg / so} is in principle possible, as
predicted.23
The sequence Q–wh is ruled out in embedded questions. I will presume
that there are pragmatic reasons for this. That is, there is nothing syntactically
wrong with embedded wh, but its function as an operator that expresses a
direct question requires that it appear only in roots.24 By the same token,
direct imperatives cannot be embedded: *Robin said that (don’t) (you) sit
down, *the person that (don’t) (you) invite to the party. It is unlikely that there
is a natural characterization of this restriction in purely syntactic terms.
In main clauses, Move α moves a constituent into [Spec,PolP], where it
agrees with Pol.25 In order to bind the morpheme Pol, Infl adjoins to Pol.

23 The somewhat greater acceptability in embedded questions of only-phrases than NegPs
raises the possibility that there are different functional categories for the two.
24 As noted by Hooper and Thompson (1973), the restriction on the distribution of wh-
inversion is not a syntactic one, since it can be found in subordinate clauses that have a ‘root’
function.
25 This agreement is referred to by Rizzi (1996) as the “wh Criterion” for wh-questions
(following May 1985) and the “Neg Criterion” for the negative cases. One aspect of these criteria
is that the Spec position must be filled. How this requirement is to be satisfied in the case of yes-
no questions is a problem that I touch on below. Rizzi does not address it in his analysis.

There are thus the following derived structures for wh-questions and Negative
Inversion.
(31) a. [PolP [Spec WhPi] wh [IP . . . ti . . . ]]
b. [PolP [Spec NegPi] neg [IP . . . ti . . . ]]
In embeddings, [Pol wh] cannot appear. Therefore, there is no movement of
Infl in an embedded question. The WhP must move into [Spec,CP] in order
to undergo Spec-head agreement with the complementizer Q. But neg can
appear as Pol in an embedded sentence, and so there is embedded Negative
Inversion.
(32) a. . . . [CP [Spec WhPi] Q [IP . . . ti . . . ]]
b. . . . [CP [Spec ] C [PolP [Spec NegPi] neg [IP . . . ti . . . ]]]
Assume, as before, that PolP is optional.26
At this point it might be objected that the theory of interrogative syntax is
rendered unaesthetic by the assignment of interrogative properties to both C,
in the form of Q, and to Pol, in the form of wh. In fact, one might counter
this objection by saying that such a distribution is the norm. To support this
position, I note the analysis of negative complements of Laka (1990). Laka
shows that negative verbs such as deny, regret, and forget do not have the
feature neg, which explains why they do not govern Negative Polarity Items
(NPI) in object position, in contrast with not.

26 I do not discuss relative clauses at length in the text. My analysis suggests that the head of a
relative clause is the complementizer Rel, which must undergo Spec-head agreement with a
suitable constituent in [Spec,CP]. I predict that Negative Inversion and so-Inversion will be
possible in relative clauses, and they are.

(i) This is the man {that / who} only once did I talk to.
(ii) This is the man {that / who} so many times did I talk to that I was arrested.
Interestingly, Negative Inversion may apply when the constituent in [Spec,CP] is negative as well
as relative.
(iii) These are the people, none of whom had I ever seen.
The grammaticality of this sentence suggests the following derivation.
(iv) people, [CP [Spec [none of whom]i] Rel [PolP [Spec ti′] [Pol neg]+Inflj [IP . . . tj . . . ti]]]
The NegP none of whom first moves into [Spec,PolP], where it triggers inversion. Presumably it
or its trace satisfies Spec-head agreement with neg. Then it moves into [Spec,CP], where it
satisfies Spec-head agreement with Rel.

(33) a. *I {denied / regretted / forgot} anything interesting.
     b. I didn’t {say / claim / remember} anything interesting.
However, NPIs appear in complements of these verbs.
(34) I {denied / regretted / forgot} that anything interesting happened.
So Laka concludes, correctly I believe, that the complements of these negative
verbs contain the complementizer thatNEG, which governs the NPIs. In this
regard the negative verbs are entirely parallel to interrogative verbs, such as
wonder, ask, etc. in English, which select the complementizer Q.27 Thus, given
the existence of the negative polarity marker neg and the negative comple-
mentizer thatNEG, the existence of a parallel pair consisting of an interrogative
polarity marker wh and an interrogative complementizer is not surprising.

8.3.3 Whether
Let us turn to yes-no questions. The traditional analysis of yes-no questions in
generative grammar starts with the assumption that these are wh-questions in

27 Laka’s discussion is extensive, and I have given here only a brief motivation for the
analysis. Perhaps the strongest evidence in favor of her analysis is that, while normally NPIs
cannot be moved to the left of their governor, clauses containing NPIs can be so moved if they
contain the negative complementizer. Consider the following examples.
(i) a. Robin didn’t say anything interesting.
b. *Anything interesting, Robin didn’t say t.
(ii) a. Robin didn’t say that anything interesting happened.
b. *That anything interesting happened, Robin didn’t say t.
(iii) a. Robin denied that anything interesting happened.
b. That anything interesting happened, Robin denied t.
(Laka does not cite these cases, but does cite examples involving subject complements that make
the same point.) Along similar lines, note that the NPI must be c-commanded by the element
that governs it. Such a relationship does not hold in a pseudo-cleft, nor does ‘reconstruction’
feed the constraint that licenses NPIs. But within a selected negative clause in focus position of a
pseudo-cleft, an NPI is fine.
(iv) a. Robin didn’t do anything interesting.
b. Robin denied that anything interesting happened.
(v) a. *What Robin didn’t do was [anything interesting].
b. What Robin denied was [that anything interesting happened].
The force of this evidence, along with Laka’s, appears to show clearly the existence of thatNEG.

disguise, in that they contain a covert wh element that triggers inversion (Katz
and Postal 1964; Klima 1964). This element is whether.
The traditional approach to the direct yes-no question also assumes that
whether is deleted in S-structure. Such an analysis does not explain why this
deletion is obligatory, or why it is impossible in embedded wh-questions.
(35) a. (*whether) did you call Robin
b. I wonder *(whether) Lee called Robin
We could therefore modify the traditional analysis as follows. The absence of
whether in the S-structure of direct yes-no questions suggests that whether is
never in [Spec,PolP]. Rather, whether is a CP-adjunct, and thus will move into
[Spec,CP] to satisfy Spec-head agreement with the complementizer Q.
(36) . . . [CP [Spec whetheri] Q [PolP [Spec ] Pol [IP NP I VP]] ti ]
The treatment of whether as a CP-adjunct is consistent with Klima’s (1964)
analysis, in which whether has the underlying form wh-either. Either, for its
part, is plausibly analyzed as a CP-adjunct, the affective variant of too, as in
Robin didn’t leave, either; Robin left, too, etc.
On this view of whether as a CP-adjunct, inversion in a direct yes-no
question cannot be the reflex of movement of whether to [Spec,PolP] and
then deletion of whether. Inversion must arise from the adjunction of Infl to
wh when [Spec,PolP] is empty. The derivation of a direct yes-no question is
then as follows.28
(37) [PolP [Spec ] wh [IP NP Infl VP]] ⇒
[PolP [Spec ] wh+Infl [IP NP t VP]]
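The head-movement step in (37) lends itself to a toy computational rendering. The following Python sketch is my own illustrative encoding (nested lists for constituents), not part of the analysis itself: Infl adjoins to the Pol head wh only when [Spec,PolP] is empty, leaving a trace in IP.

```python
# Illustrative sketch only: trees are nested lists [label, child, ...].
# invert() mimics the derivation in (37): when [Spec,PolP] is empty,
# Infl raises out of IP and adjoins to the Pol head (here 'wh'),
# leaving a trace 't' in its base position.

def invert(polp):
    label, spec, pol, ip = polp
    if spec != []:                      # inversion only with an empty Spec
        return polp
    ip_label, subj, infl, vp = ip
    return [label, spec, pol + "+" + infl, [ip_label, subj, "t", vp]]

before = ["PolP", [], "wh", ["IP", "NP", "Infl", "VP"]]
print(invert(before))
# ['PolP', [], 'wh+Infl', ['IP', 'NP', 't', 'VP']]
```

A filled Spec blocks the operation, matching the text's claim that inversion arises only when [Spec,PolP] is empty.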

If Pol is neg or so, we will get inversion after whether, as illustrated in the
following examples, repeated from (30).
(38) a. ?Lee wonders whether at no time at all would Robin volunteer.
b. Lee wonders whether only then would Robin volunteer.
c. ?Lee wonders whether so many people did Robin insult that he does
not dare return home.
d. Lee will finally tell us whether or not to so many people did Robin
give his phone number to that we can expect phone calls all week.

28 This derivation obviously requires that empty [Spec,PolP] agrees with wh for the purposes
of Spec-head agreement, which appears to conflict with Rizzi’s (1996) wh-Criterion, which
requires that [Spec,CP] be overtly filled. In order to maintain this criterion, we would have to
assume the existence of an abstract operator (e.g. WH+SO) that is a PolP-adjunct. I can find no
independent syntactic evidence to support the existence of such an operator.
230 explaining syntax

Following Borer (1989), we have an interesting account of the difference
between whether and if.29 Borer suggests that whether is in Spec, while if is
C. In terms of our analysis, since if is an overt complementizer it cannot
participate in Spec-head agreement; hence we do not get:
(39) *{where/who/why} if
As a complementizer, if is in complementary distribution with that and Q. If is
thus in some sense an interrogative or irrealis variant of that; like that, it cannot
take an infinitival complement. Thus we have the following distribution.
(40) I was wondering {whether/*if} to leave now.

Finally, neither if nor whether appears in main questions, since if is C and
whether is a CP adjunct, while a main clause is maximally a PolP.30

8.3.4 Elliptical constructions


Using the C–Pol analysis, we can express in a more or less natural way the
differential behavior of elliptical embedded sentences depending on the form
of the C–Pol sequence. The cases that we are concerned with are those in
which IP is empty.
(41) a. [ . . . [CP Spec Q [IP e ]] . . . ]
b. [ . . . [CP Spec that [PolP Spec neg [IP e ]]] . . . ]
c. [ . . . [CP Spec that [PolP Spec so [IP e ]]] . . . ]
If [Spec,CP] is filled with a WhP, as in (42), then we get the familiar Sluicing
construction (Ross 1969b).31

29 For a different view of if and whether, see Stuurman (1991).
30 There may be a phonologically empty variant of if that occurs in subjunctive inversion.
(i) a. If John had left, I wouldn’t have called.
b. Had John left, I wouldn’t have called.
Let us call this element if. Like if, if is a C. I presume that, like neg and so, it must be bound
even though it is phonologically empty. Thus we get inversion, as in (ii).
(ii) [CP [Spec ] if+hadi [IP John ti left]]
31 We do not get *I forget [whether Q [IP e ]], for reasons that are probably tied to the fact that
Sluicing is a focusing construction, and whether cannot be in focus. For a general approach to
the syntax of focus, see Rochemont (1986) and Rochemont and Culicover (1990).

(42) . . . but I forget [CP {who/what/where/when/how/why/which NP/how AP/etc.} Q [IP e ]]
In this construction, [IP e ] is interpreted in such a way that in LF it contains
a variable that is bound by the fronted WhP. For example, (43) is interpreted
as (44).a
(43) Robin saw someone, but I forget who
(44) ∃x (Robin saw x), but I forget who:x ⇒
∃x (Robin saw x), but I forget who:x (Robin saw x)32
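The mapping from (43) to (44) can be mimicked with a toy string transformation. This sketch is mine, not part of the analysis: it simply copies the antecedent clause and replaces the indefinite with a variable bound by the sluiced wh-phrase.

```python
# Toy sketch (my own encoding, not the author's formalism): resolve a
# sluice by copying the antecedent IP and replacing the indefinite with
# a variable bound by the wh-phrase, as in the step from (43) to (44).

def interpret_sluice(antecedent, indefinite, wh):
    body = antecedent.replace(indefinite, "x")
    return "{}:x ({})".format(wh, body)

print(interpret_sluice("Robin saw someone", "someone", "who"))
# who:x (Robin saw x)
```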
Crucially, there is no counterpart to the Sluicing construction for
topicalization, fronted NegP or fronted SoP, as illustrated in (45).
(45) a. Robin saw someone, and I believe that Fred, *(Robin saw t)
b. Lee said that Robin saw someone, but I believe that not a single
person *(did he see).
c. Lee asked whether Robin saw everyone, and I said that so many
people *(did he see that . . . ).
The ungrammaticality of these examples supports the view that embedded
questions are structurally different from topicalization, Negative Inversion,
and so-Inversion in ways that I have already discussed. The ungrammaticality
of (45b) and (45c) follows directly from our analysis, since without the
possibility of inversion in the embedded clause, the morpheme neg or so
cannot be bound.
(46) . . . and NP V [CP [Spec ] that [PolP NegP/SoP NEG/SO [IP e ]]]

Necessarily, NEG/SO cannot cross over the filled [Spec,PolP] and adjoin to that.
This is a plausible assumption to make for such a cliticization operation.

a For a more recent account of the interpretation of Sluicing that does not assume an empty
IP, see Culicover and Jackendoff (2005; 2012).
32 This analysis of Sluicing entails that the island constraints cannot be conditions on the
LF representations since, as Ross pointed out in his original paper, there are well-formed
instances of Sluicing that violate the Complex NP Constraint, for example.
(i) John met a man who was wearing some kind of hat, but I don’t know what kind of hat
[*John met a man who was wearing t].

The ungrammaticality of (45a), on the other hand, may stem from the fact
that the empty IP is not formally licensed by that, owing either to the presence
of the topic, the inability of that to be a head governor in general, or both.
I leave the question open here.
Consider next (41b) and (41c). Here, unlike in the case of Sluicing, the
empty IP may be treated as a prosentential that does not contain a variable
that is bound from outside IP. I represent this IP as +pro, without claiming
that it necessarily has the properties attributed to +pro in the Binding theory.
(47) a. . . . [CP Spec that [PolP Spec neg [IP +pro ]]]
b. . . . [CP Spec that [PolP Spec so [IP +pro ]]]
Unlike in the topicalization case of (45a), the empty IP here is properly head
governed by NEG/SO. But because neg and so are morphemes that must be
bound, these are ill-formed S-structures as given here. Suppose that neg and
so adjoin to that over an empty Spec.33
(48) . . . [CP that+negi [POLP Spec ti [IP +pro ]]]
. . . [CP that+soi [POLP Spec ti [IP +pro ]]]

33 Alternatively, we may assume that cliticization of neg and so to that does not yield a
well-formed PF representation, but that cliticization to the empty complementizer [e] does. This
alternative is made attractive by the observation that in general not and so may only occur with
that-Deletion verbs.

(i) a. I {believe/hope/expect/imagine/persuade him} {(that) S / so / not}.
b. Lee {*whispered/?regretted/*ordered/*established} {(that) S / so / not}.

The generalization is not perfect, however, in that there are some verbs that allow that-Deletion
but not so/not.
(ii) I {know/understand/remember} {(that) S / ?*{so/not}}

On this approach, the realization of that+neg in PF is not, and the realization
of that+so is so, that is, a realization of C+Pol.34 The English structure for
these expressions thus parallels the S-structure of comparable expressions in
French, as illustrated in (49).
(49) a. Je crois que oui.
I think that yes
‘I think so.’
b. Je crois que non.
I think that no
‘I think not.’
c. *Je crois oui.
*Je crois non.
The difference is that in French, oui and non are free and therefore do not
need to adjoin to C.35

8.3.5 Why and how come


Consider now the distribution of why and how come. It, too, relies crucially on
the existence of both CP and PolP. It is generally accepted that why is
structurally different from other wh proforms. For example, Rizzi (1990)
suggests that why, unlike the other wh’s, can be generated in [Spec,CP]
without undergoing Move α. I adapt Rizzi’s general approach here.
In the current analysis, the complex behavior of why appears to be best
captured if we assume that it is a PolP adjunct that moves into [Spec,PolP]
when Pol is wh and into [Spec,CP] when C is Q.36 The semantically related
how come is not a WhP, and therefore cannot undergo Move α into [Spec,
PolP] in direct questions. I will show that it is in fact a PolP-adjunct, and
moves into [Spec,CP] when C is Q.
Let us first establish the basic difference between why and how come. As the
following examples show, how come is not a true wh-interrogative: it does not

34 I leave open in this paper the proper treatment of English not in auxiliary and other uses.
For some very interesting discussion, see Laka (1990), who takes not and n’t to be surface
realizations of neg. Alternatively, we might pursue the hypothesis that not is [Spec,PolP] when
Pol is neg, while n’t is neg. [For some additional discussion, see Ch. 6 above.]
35 Why we cannot say *I think that yes and *I think that no in English is an independent
question. For some discussion, see Laka (1990).
36 An alternative is that why is generated in [Spec,PolP]. But how come must be a PolP-
adjunct, as I show immediately below, so taking why to be a PolP-adjunct allows us to treat why
and how come as essentially the same.

allow inversion, it cannot co-occur with the hell/in the world, and it cannot
occur with ever, in contrast with why and the other interrogatives.37
(50) a. {why/*how come} did Robin say that
b. {why the hell/*how come in the world} did Robin say thatb
c. {*why/how come} Robin said that

(51) a. ?whyever would you do that?
b. *how come ever you would do that?
c. *however come you would do that?
d. whenever he leaves, tell me
e. whatever did he say?
(52) a. why would you ever do that?
b. *how come you ever would do that?
c. *however come would you do that?
d. when did he ever say those things?
e. what did he ever say to you?
If how come is not a WhP, it can never appear in [Spec,PolP], because it cannot
agree with wh. On the other hand, how come must be interrogative in some
sense, because it can appear in [Spec,CP] when C is Q, as (53) shows.
(53) I wonder how come Robin said that.
It follows that how come must either be generated in [Spec,CP] in D-structure,
or it must be an adjunct that may move into [Spec,CP]. In view of the fact that
how come may also appear in main clauses, which lack C and [Spec,CP],
I conclude that how come is a PolP-adjunct.
In contrast, why is a WhP. It must move into [Spec,PolP] (so that it will
trigger inversion) and it must raise to [Spec,CP] when

37 As Pesetsky (1987) shows, the hell/in the world is compatible only with the sentence-initial
interrogative, i.e. the one that takes widest scope.
(i) a. who the hell hit Mary
b. who hit who
c. who the hell hit who
d. *who hit who the hell
e. *who the hell hit who the hell
b But ?how the hell come seems to be marginally possible.

C is Q, as in wh-questions in general (to satisfy Spec-head agreement with Q).
(54) illustrates this derivation.
(54) I wonder [CP [Spec ] Q [PolP [Spec ] wh why [IP . . . ]]] ⇒
I wonder [CP [Spec ] Q [PolP [Spec why] wh t [IP . . . ]]] ⇒
I wonder [CP [Spec why] Q [PolP [Spec t′] wh t [IP . . . ]]] ⇒
I wonder [CP [Spec why] Q+wh [PolP [Spec t′] t [IP . . . ]]]
The following evidence suggests that this is the correct analysis.38
(55) a. Robin told me not to fix the sink, but he didn’t tell me {why/*how/*when/*where} not.
b. Robin told me that I shouldn’t fix the sink, but he didn’t tell me {why/*how/*when/*where} not.
c. Robin told me not to look at someone/something, but he didn’t tell me {*what/*who} not.
We have seen that so and not are realizations of prosententials. It is possible to
have
(56) a. why so
b. why not

38 A similar but distinct pattern holds for infinitival questions, e.g.
(i) a. . . . Robin didn’t know {??where to / when to / *what to / *who to / *how many to}.
b. . . . Robin didn’t know {?why not to / ?where not to / when not to / *what not to / *who not to / ?how many not to}.

I do not find the judgments stable, however, and therefore I will forgo attempting to account for
them here.

but not
(57) a. *how {so/not}
b. *where {so/not}
c. *when {so/not}
d. *what {so/not}
e. *who {so/not}

Thus,
(58) He said he wanted to leave, but he didn’t say {?why/*how/*where/*when} so.
(59) a. *He said that he did something for a strange reason, but he didn’t say
what so.
b. *He said that he wanted to see someone for some reason, but he
didn’t say who so.
Some speakers do not accept why so at all. But there is another elliptical
construction in which why so and how so appear to be quite acceptable, while
the other interrogatives are not.39
(60) A: Robin will not leave on time.
B: i. Why so?
ii. How so?
In this case, how so has more or less the interpretation of why so. Note that we
cannot have *how not, which suggests that this use of how is idiosyncratic.
On the analysis of Sluicing in §8.3.4, the interrogative is in [Spec,CP], as
in (61).
(61) . . . [CP [Spec what] Q [IP e ]]
Crucially, what must bind a trace in the LF representation of the empty
IP, which is thus not a prosentential. But suppose that why originates as a

39 Thanks to Marc Authier for suggesting this argument to me.
PolP-adjunct, and Pol is NEG/SO. While why binds a trace, the trace is not
contained within the minimal IP, which may therefore be +pro. Hence, why
not/so has the underlying structure in (62).
(62) [CP whyi Q [PolP Spec NEG/SO [IP ti [IP +pro ]]]]
As in the analysis of think {not/so}, NEG/SO+[+pro] will adjoin to the
complementizer, in this case Q, yielding {not/so}.
The claim that how come is a PolP-adjunct that moves into [Spec,CP] and
that why is a PolP-adjunct that moves into [Spec,PolP] is also supported by
the following facts.
(63) a. What did Robin do, and {why/how come/??how/?when/*where}?
b. When did Robin go and {why/how come/?how/*where}?
c. Robin told me what to do, and {why/how come/?how/?when/*where}.
d. (Tell me) who left, and {why/how come/?*how/*when/*where}.
The sentence what did Robin do, and why? in (63a) means ‘what did Robin do,
and why did he do that’ or ‘what did Robin do, and why did he do what he did.’
The ellipsis in (63) must therefore include the LF representation of the wh in
[Spec,PolP] as well as the trace that it binds; in effect, it must include the LF
representation of the IP after reconstruction, as shown in (64) for (63a).40
(64) [PolP what [IP did Robin do t]] and [PolP whyi [IP ti [IP Robin do what]]]

40 I leave open here the precise details of how the ellipsis is to be formally captured. For a
range of views, see Sag (1976), Wasow (1979), and Williams (1977).

On the other hand, the other wh-words are moved into [Spec,PolP] by Move
α. Consequently, if the IP is reconstructed as in (64), there will be no trace in
the reconstructed IP for the moved wh to bind, as in *what did Robin do and
how, shown in (65). The reconstructed IP is shown in strikeout.
(65) [PolP what [IP did Robin do t]] and [PolP how [IP Robin do what]]
The unavailability of a trace in the reconstructed IP for the moved wh explains
the ungrammaticality of the sentences in (63) that lack why or how come.41
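The contrast between (64) and (65) amounts to a simple binding check, which can be sketched in toy terms. The encoding below is mine, not the author's formalism: an IP-external adjunct like why imposes no trace requirement on the reconstructed IP, whereas a wh moved out of IP must find a trace there to bind.

```python
# Sketch (illustrative encoding): after copying the antecedent IP into the
# ellipsis site, a moved wh must find a trace 't' inside that copy to bind;
# IP-external adjuncts (why, how come) need none.

def ellipsis_licensed(wh, ip_external, reconstructed_ip):
    if ip_external:
        return True                     # why / how come adjoin outside IP
    return "t" in reconstructed_ip      # moved wh must bind an IP-internal trace

# (64): 'what did Robin do, and why' -- copied IP is [Robin do what]
print(ellipsis_licensed("why", True, ["Robin", "do", "what"]))   # True
# (65): '*what did Robin do, and how' -- no trace in the copied IP
print(ellipsis_licensed("how", False, ["Robin", "do", "what"]))  # False
```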
By assuming that why and how come originate outside of IP we can also
account for the fact that only these interrogatives allow internal
topicalization. We have already seen that topicalization blocks extraction of a wh from
IP, because of the topic island created by adjunction. I repeat the examples
of (6).
(6) a. *I asked what, to Lee, Robin gave.
b. *Lee forgot which dishes, on the table, you are going to put.
c. *Robin knows where, the birdseed, you are going to put.
However, why and how come are generated outside of IP. Topicalization can
apply freely below them, adjoining to IP. The following examples demonstrate
that the prediction is correct.42
(66) a. I asked {why/how come}, to Lee, Robin gave the book
b. Lee forgot {why/how come}, on the table, you are going to put the dishes
c. Robin knows {why/how come}, the birdseed, you are going to put in the bird feeder

41 Along related lines, the following examples show that it is possible to have ellipsis in a
relative clause when the relative proform is why or how come, but not when it is another relative
proform, that or empty complementizer.
(i) a. John would not tell me the reason why (not).
b. John would not tell me the reason how come (*not).
c. *John would not tell me the way how (not).
d. *John would not tell me the time when (not).
e. *John would not tell me the place where (not).
f. *John would not tell me the thing which (not).
g. *John would not tell me the person who (not).
42 Sentences such as these are problematic for Lasnik and Saito (1992).

8.4 Extension to focus


In the preceding sections I presented a variety of evidence to support the view
that there are two complementizer-type positions in English, each of which is
the head of a maximal projection. In many respects this analysis is in the spirit
of the approach taken by Pollock (1989), Chomsky (1989), and Johnson
(1989), and is quite close in certain details to that of Laka (1990). There are
apparent differences: the heads that I propose are outside IP, while Pollock,
Chomsky, and Johnson are concerned with heads within IP that form part of
the inflectional system. Laka suggests that in English there is a head Σ that
ranges over neg and Aff(irmative), and appears internal to IP. In what follows
I will show that the various approaches fall together to a considerable extent.

8.4.1 Licensing subjects


There is a significant problem with the analysis that I have proposed that
suggests that the outside-IP/inside-IP distinction just drawn is not a strict
one. As noted by Rizzi (1990), an analysis that proposes that wh in Spec
triggers inversion must take into account the fact that inversion does not
occur with subject wh-phrases.
(67) [PolP [Spec ] wh [IP whoi [Past do] leave]] ⇒
[PolP [Spec whoi] wh [IP ti [Past do] leave]] ⇒
[PolP [Spec whoi] wh+[Past do]j [IP ti tj leave]]
(68) a. who left
b. *who did leave
The ungrammaticality of *who did leave with unstressed did shows that
inversion does not apply in these cases. But in the current analysis, it is
necessary to adjoin Infl = [Past do] to wh, so that wh can be bound.
A similar problem arises in the case of negation and so; we get
(69) a. no one left
b. *no one did leave
(70) a. so many people left
b. *so many people did leave
In the spirit of the analysis proposed in this paper, the obvious move to make
here is to assume that PolP may be a complement of Infl as well as of C.43

43 Of course, we will still have to rule out the ungrammatical examples. The obvious
approach would be to extend the ECP for subject traces to cases in which Pol is not empty, e.g.
(i) [PolP [Spec whoi] wh+didj [IP ti tj leave]]
(ii) [PolP [Spec no onei] neg+didj [IP ti tj leave]]

The sequence Infl–Pol allows Pol to raise to Infl in order to be bound without
yielding the S-structure inversion pattern, as in (71).
(71) [IP whoi [Infl Past do] [PolP Spec wh [VP . . . ]]] ⇒
[IP whoi [Infl Past do]+wh [PolP Spec t [VP . . . ]]] ⇒
[IP whoi [Infl Past]+wh [PolP Spec t [VP . . . ]]]
After this raising, Infl is a composite head that can license the wh in subject
position through Spec-head agreement. Similarly for neg and so.44
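The derivation in (71) can also be given a toy computational rendering. The nested-list encoding below is my own illustration, not the author's formalism: with PolP as the complement of Infl, the Pol head raises and adjoins to Infl, leaving a trace, so no surface inversion results and the subject stays in place.

```python
# Sketch of (71): the Pol head ('wh', 'neg', or 'so') raises out of PolP
# and adjoins to Infl, leaving a trace 't'; the subject is licensed by the
# composite head via Spec-head agreement, with no inversion.

def raise_pol_to_infl(ip):
    label, subj, infl, polp = ip
    p_label, spec, pol, vp = polp
    return [label, subj, infl + "+" + pol, [p_label, spec, "t", vp]]

before = ["IP", "who", "Infl", ["PolP", [], "wh", "VP"]]
print(raise_pol_to_infl(before))
# ['IP', 'who', 'Infl+wh', ['PolP', [], 't', 'VP']]
```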
For this derivation to work as intended, do must be deleted before V even
across Pol. A question then arises as to why not blocks the deletion of do, given
that not is an instance of the head neg (cf. Laka 1990).
This derivation also entails that when the subject is questioned, the
interrogative remains in situ in S-structure, in contrast with questions where a
non-subject is interrogative. Finally, empty [Spec,PolP] inside of IP does not block
the deletion of do, nor does it appear to be a landing site, for English at least.
I will not deal in detail with the first point, which appears to be merely a
technical matter.45 On the second point, there appears to be no strong
evidence that the interrogative is anywhere other than in subject position in
S-structure. The fact that the subject functions as the focus of the sentence
follows from the fact that it is a WhP in the scope of a wh functional head. As
shown by multiple wh-questions, a WhP need not move into Spec to be
interpreted as a focus.
(72) What did you give to whom?
The claim that a negative subject is in situ (as in no one left) is far less
controversial, although the pattern appears to be exactly identical to that of the
interrogative. In the negative case we would say that Pol is neg; similarly for so.46

In each case, ti is not properly governed, since it is not coindexed with wh. I speculate that when
Pol is wh, neg, or so, agreement with what is in [Spec,PolP] does not entail coindexing. But
when Pol is [e], agreement can only be accomplished through coindexing.
44 An alternative is that wh, neg, so, etc. may appear as features on I as well as functional
categories external to IP. This dual status of Pol is problematic, however, and should lead us to
eliminate one of the two possibilities. Because of space limitations I will not pursue this question
further here.
45 The obvious route to pursue is that not is [Spec,PolP], and the head is neg. Then do will be
deleted unless there is a filled Spec between it and V.
46 An examination of Spanish is instructive in this regard. In Spanish, a negative sentence has
an overt sentence-initial no unless there is a fronted negative constituent.
(i) a. no lo tengo
neg it I-have
b. Juan no lo tiene
John neg it has

(73) a. [IP no onei [Infl Past]+neg [PolP Spec t [VP . . . ]]]
b. [IP so many peoplei [Infl Past]+so [PolP Spec t [VP . . . ]]] (that . . . )
In fact, to the extent that there is evidence that bears on this question, it
suggests that the wh, the negative, and the so subject are in situ. As the
following examples show, there is a lack of parallelism between subject and
non-subject cases, suggesting that only the non-subject WhP and NegP move
into [Spec,PolP].
(74) a. Who will Robin see and [who] will Lee talk to?
b. Who will Robin see and [who will] Lee talk to?
c. *Who will Robin see and [who will] talk to Lee?
d. *Who will Robin see and [who] will talk to Lee?
e. *Who will talk to Lee and [who will] Robin see?
f. *Who will talk to Lee and [who] will Robin see?
(75) a. Leslie told me who Robin will see and [who] Lee will talk to.
b. Leslie told me who Robin will see and Lee [will] talk to.
c. *Leslie told me who Robin will see and [will] talk to Lee.
d. *?Leslie told me who will talk to Lee and Robin will see.
(76) a. No one will Robin see and will Lee talk to.
b. No one will Robin see and [will] Lee talk to.
c. *No one will Robin see and [no one will] talk to Lee.
d. *No one will Robin see and will talk to Lee.
e. *No one will talk to Lee and [no one will] Robin see.
f. *No one will talk to Lee and will Robin see.

(ii) no dice nada
neg he-says nothing
(iii) no hay nunca ninguna carta de nadie
neg there-is never no letter from nobody
(iv) a. no está nadie en casa
neg is no one at home
b. nadie (*no) está en casa
no one neg is at home
(v) a. no habla inglés ninguno de ellos
neg speak English none of them
b. ninguno de ellos (*no) habla inglés
none of them neg speaks English
We may capture this distribution of facts by supposing that neg appears either external to IP or
internal to IP. In either case, it is realized as no unless there is a negative specifier with which it
can agree. When it is external to IP it licenses a negative in [Spec,PolP]; when it is internal to IP,
it licenses a negative subject. For a full treatment along related lines, see Laka (1990).

It is of course possible to assume that the subjects move in each of these
examples, leaving a trace, and that parallelism requires that the trace be a
subject in both conjuncts or a non-subject in both conjuncts. But the
assumption that there is no movement of the subject explains the lack of
parallelism directly, with no additional stipulation on the traces.
Contraction processes also appear to treat the subject WhP or NegP as
though it were in situ. Will in Pol does not contract, but will in Infl does
contract when the subject is pronominal, or when it is who.
(77) a. {I/you/she/we/they} will leave
b. {I/you/she/we/they}’ll leave
(78) a. Lee will leave
b. *Lee’ll leave
(79) a. Who will leave
b. Who’ll leave?
(80) a. who will Lee visit
b. *who’ll Lee visit
c. who[ʌ]ll Lee visit
These examples thus support the view that in S-structure, subject who is in
situ.
Next, consider Stylistic Inversion. On the analysis of Rochemont and
Culicover (1990), the underlying subject is in situ in S-structure, as shown in (81).
(81) [IP [VP ti into the room]j [IP [Past+walki]k [IP a man tk tj]]]
In more standard analyses (e.g. Safir 1985; Stowell 1981), the underlying
subject ends up in VP.c In either case, if we believe that a negative must be
moved into [Spec,CP] in order to get sentential scope, we are surprised to find
that a negative subject does not appear in [Spec,CP] when there is Stylistic
Inversion, but in the subject position.

c And in the analysis in Ch. 9 of this book. For additional arguments that postverbal subjects
in focus constructions are in situ in VP, see Culicover and Winkler (2008).

(82) a. Into the room walked {no one/none of the women/few of the women}.
b. *{No one/None of the women/Few of the women} {into the room walked/did into the room walk}.
c. *{No one/None of the women/Few of the women} did I say that into the room walked.

By the same token, in a simple sentence a wh-subject can appear in the
inverted position and yet receive a more or less normal interrogative
interpretation (as distinct from an echo interpretation).
(83) Into the room walked who?
We can account for this behavior if we suppose that the negative and wh-
subjects are associated with the appropriate polarity marker while remaining
in situ.47
Finally consider Gapping. Gapping in English typically occurs in a right
conjunct when the verbal sequence is identical in both conjuncts.
(84) a. Robin will eat peanuts and Lee [will eat] pistachios.
b. Lee was living in New York, and Robin [was living] in London.
With respect to Gapping, WhP and NegP subjects act like subjects in situ.
(85) a. Who will eat peanuts and who [will eat] pistachios?
b. Who was living in New York, and who [was living] in London?
(86) a. No man will eat peanuts and no woman [will eat] pistachios.
b. No one was living in New York, and no one [was living] in London.

47 There are alternatives, of course. It might be supposed e.g. that the inverted subject
position is a focus position, which requires that whatever occupies that position move to
Spec,CP in LF. While there is evidence for this position being a focus (see Rochemont and
Culicover 1990), this focus position crucially does not yield Weak Crossover, unlike S-structure
movement or true LF movement of a focus (see Chomsky 1977).
(i) a. *Whoi did hisi mother scold ti
b. *Hisi mother scolded JOHNi
c. Onto hisi face fell JOHNi
d. Onto hisi face fell which boyi

(87) a. Many people here drive General Motors cars, but no one [drives] a
Pontiac.
b. Many people here drive General Motors cars, but who ?[drives]/
does [drive] a Pontiac.
(88) a. Many people here would drive a General Motors car, but no one
would [drive] a Pontiac.
b. Many people here would drive a General Motors car, but who would
[drive] a Pontiac.

8.4.2 Implications of internal PolP


Finally, let us consider the third point. It appears that PolP in English lacks
[Spec,PolP] when it is internal to IP, or that [Spec,PolP] cannot be filled in
this position. The analyses of Chomsky (1989), Johnson (1989), Pollock (1989),
and Laka (1990) appear to assume in general such a ‘defective’ character for
the projections of functional heads within IP in English. Hence the absence of
[Spec,PolP] in English does not appear to be exceptional.
One reason might be that English has a restriction that prohibits multi-
word phrases internal to the verbal sequence, so that otherwise identical
phrases contrast sharply, as in (89).
(89) a. Robin would never do that.
b. ??Robin would not ever do that.
c. *Robin would not once do that.
d. Robin wouldn’t ever do that.
(90) a. Robin will immediately leave.
b. *Robin will at once leave.
It is plausible, therefore, that English has [Spec,PolP] internal to IP, but it can
only be filled by simple adverbials, such as not, so, too. Then for sentence
negation we may take the negative head (i.e. head of NegP in the treatments of
Chomsky 1989, Johnson 1989, and Pollock 1989) to be neg, and for sentence
so, so.
(91) a. [IP NP I [PolP [Spec not] neg [VP V. . . ]]]
b. [IP NP I [PolP [Spec so] so [VP V. . . ] ]]
In such cases NEG/SO is cliticized to V raised into Pol. This treatment also allows
us a uniform account of the some-any phenomenon: any is licensed when it is
to the right of and c-commanded by a negative head. Cf.

(92) a. Robin didn’t neg like anyone.
b. No one neg liked anyone.
c. Robin denied[+neg] liking anyone.
d. Robin gave neg nothing to anyone. [with Spec-head agreement possibly satisfied in LF]
e. *Robin saw no one in any room. [where no one has narrow scope]
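The licensing statement just given — any is licensed when it is to the right of, and c-commanded by, a negative head — can be checked mechanically. The following is a minimal sketch under my own list encoding of trees, not the author's formalism.

```python
# Minimal sketch (my encoding): a head licenses an NPI iff the head is a
# daughter of some constituent and the NPI occurs inside a later sister,
# i.e. the head both precedes and c-commands the NPI.

def leaves(t):
    return [t] if isinstance(t, str) else [x for c in t[1:] for x in leaves(c)]

def licenses(tree, head, npi):
    if isinstance(tree, str):
        return False
    kids = tree[1:]
    for i, k in enumerate(kids):
        if k == head and any(npi in leaves(s) for s in kids[i + 1:]):
            return True
        if licenses(k, head, npi):
            return True
    return False

# (92a): Robin did-neg like anyone -- neg precedes and c-commands 'anyone'
s = ["IP", "Robin", ["I'", "did", ["PolP", "neg", ["VP", "like", "anyone"]]]]
print(licenses(s, "neg", "anyone"))  # True
```

An NPI to the left of the negative head (outside its c-command domain) correctly fails the check.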
The data superficially suggest, too, that wh cannot head an internal PolP in
English when there is an empty Spec. We may avoid this stipulation by
assuming that interrogative intonation in yes-no questions without inversion
corresponds to just this configuration (cf. Katz and Postal 1964).
(93) You’re going out again? ↑
But what of the case of filled internal [Spec,PolP] when Pol=wh?48 The
following wh-questions are clearly ungrammatical, regardless of the relative
order of Infl and wh.
(94) *Robin {who WH will talk to / what WH wants to do / why WH will leave early / how WH fixed the car / etc.}
(95) *Robin {will who WH talk to / does what WH want to do / will why WH leave early / did how WH fix the car / etc.}
We cannot rule all of these out as violations of the constraint that the internal
[Spec,PolP] in English may not contain a complex phrase-level constituent.
There is no apparent difference in syntactic complexity between when and
then, but only then can appear internally.

(96) Robin {then/*when} opened the door.
One solution rests on the fact that wh is a clitic. If [Spec,PolP] is filled, then
wh cannot cliticize to I. If, in addition, V cannot adjoin to Pol, then wh will
not cliticize to anything, and sentences such as (94) will not be generated.
We would expect, in any event, that in some languages at least PolP could
have a phrasal [Spec,PolP] and a wh head internal to IP. In fact, Horvath
(1985) shows that the landing site for interrogative wh in main clauses in Hungarian is in pre-V position. In contrast, the landing site for relative wh is in [Spec,CP], as illustrated below.
48 Thanks to Peter Coopmans for raising this question for me.
(97) a. Mari miti tett az asztalra ti
Mary what-acc put the table-onto ti
‘What did Mary put on the table?’
b. Az edények amiketi Mari az asztalra tett ti
the dishes which-pl.acc Mary the table put ti
‘the dishes which Mary put on the table’
The difference between these two constructions is that the interrogative
contains a PolP whose head is wh. Hence Move α moves WhP to [Spec,
PolP].49 But in the relative clause, the relative marker is the head of CP and the
relativized phrase moves into [Spec,CP].
Horvath also shows that the pre-V position is in general a focus position in
Hungarian.
(98) Mari az asztalra tette az edényeket
Mary the table-onto put the dishes-acc.
‘Mary put the dishes on the table.’
There are also SVO languages with focus to the right of V (e.g. Swahili,
M. Rochemont, p.c.). In such a language, the focus constituent can be
moved into [Spec,PolP], and subsequent raising and adjunction of the
heads will move the verb to the left of the focus, as illustrated in (99).
(99) [IP NP [Infl+[Pol+Vi]]j [PolP Spec tj [VP ti . . . ]]]
For Arabic, Ouhalla (1994) has shown that there are two negative operators,
one external to TnsP (maa) and one internal to TnsP (laa). There are two
interrogative markers, ʔa and hal. Only the external interrogative is consistent
with disjunctive questions.
(100) a. ʔa Zaynab-a yu-hibbu Zayd-un ʔam Laylaa
Q Zaynab-acc 3ms-loves Zayd-nom or Laylaa
‘Is it Zaynab that Zayd loves or Laylaa?’
b. *hal Zaynab-a yu-hibbu Zayd-un ʔam Laylaa
49 Horvath views the focus position as governed by V. However, she raises the possibility
(1985: 146, n. 35) that an analysis similar to ours might be entertained, suggesting that the focus
position might be governed by Infl.
The distribution is thus the same as in English, where a disjunctive question is compatible only with SAI (triggered by the external wh) and not with
intonation (triggered by internal wh).
(101) a. Is it Leslie Lee loves, or Robin?
b. *Leslie loves Lee(↑,) or Robin↑?
Given that external negation and interrogation are overtly distinguished in at
least some languages, it is reasonable to suppose that both positions are
utilized even in those languages, such as English, where they are not
distinguished.

8.4.3 Pol as focus in English
The preceding discussion raises the possibility that Pol expresses not only wh
and neg, but more generally focus. Consider Spanish in this regard. Laka
(1990), following Contreras (1976), shows that in Spanish the ‘emphatic’ word
order OVS is derived by Move α of the object into pre-IP position. She
demonstrates that this NP is a focus.
(102) a. Pedro viene mañana
Peter arrives tomorrow
b. mañana viene Pedro
tomorrow arrives Peter
We can account for this correlation of focus interpretation with the emphatic
word order by supposing, with Laka, that there is a focus position in Spanish
into which a focus can be moved. In particular, in our terms we may say that
Pol can be Focus in Spanish in pre-IP position, and constituents may be
focused by moving them into [Spec,PolP].
Returning to English, suppose that Pol may designate focus in this language
as well. Intuitively it is correct to say that in English [Spec,PolP] is a focus
position, given the interpretation of wh-questions, Negative Inversion, and
so-Inversion. For example, a fronted negative or so phrase can serve as the
answer to a wh-question, and in fact must be focus (Rochemont 1978).
(103) Did you see anyone?
a. No, not a single person did I see.
b. Yes, so many people did I see that I was amazed.
(104) a. Q: Who visits Robin on very few occasions?
b. A: *On very few occasions does Leslie visit Robin.
(For discussion of this type of test for focus, see Rochemont 1986.)
Suppose that Pol can be Focus. This value of Pol is distinct from wh
(interrogation), neg (negation), and so (emphasis). Since Focus is empty, it
can agree with its Spec, just as empty C can (Rizzi 1990). By assumption it is
phonologically inert and does not trigger inversion. We would predict that
certain instances of movement that appear to be topicalization are actually
movements to [Spec,PolP] of Focus. On the assumption that a topic can
adjoin to IP, we then predict the existence of two different structures for
essentially the same sequence in S-structure.
(105) [PolP [Spec XPi] Focus [IP . . . ti . . . ]]
[PolP Spec Pol [IP XPi [IP . . . ti . . . ]]]
Consider how these structures differ from one another and what empirical
predictions are made. First, there might be two intonations corresponding to
the two structures, where one intonation corresponds to a focus interpretation and the other does not. Second, when XP is moved into [Spec,PolP] it
should be possible to extract over it, just as it is possible to extract over a
fronted negative constituent.
Concerning the prosodic difference, it has been noted in the literature that
there are two distinct topicalization intonation contours, ‘topic’ and ‘focus’
(Gundel 1974). The topic intonation is the typical ‘comma intonation’, where
the topic and the rest of the sentence constitute separate intonation groups.
(106) a. To Robin, I gave a book.
b. On the table, Lee put the books.
c. Last year, we were living in St. Louis.
d. In those days, we drove a nice car.
e. Robin, I really dislike.
The focus intonation is characterized by a primary stress in the topic and no
break between the topic and the rest of the sentence. It is possible for there to
be an additional primary stress elsewhere in the sentence as well.
(107) a. To robin I gave a book.
b. On the table Lee put the books.
c. last year we were living in St. Louis.
d. In those days we drove a nice car.
e. robin I really dislike.
(108) a. To robin I gave a book.
b. On the table Lee put the books.
c. last year we were living in St. Louis.
d. In those days we drove a nice car.
e. robin I really dislike.
The claim that the stressed elements in these sentences are foci is supported
by the fact that they can be used to answer corresponding questions
(Gundel 1974: ch. 5), To whom did you give a book?, etc.; To whom did you
give what?, etc.
Consider next extraction. PolP is not a barrier, since it is c-selected by C (in
the sense of Cinque 1990). Where the topic is in [Spec,PolP], then, we expect
that extraction from IP over PolP into a higher Spec should be possible.
Moreover, this extraction, if it is possible, should correlate with the focus
intonation difference.
The examples in (109)–(112) test this prediction. The first group of sentences illustrates extraction over an IP-adjoined topic. In the (a) examples the
wh-phrase moves over the topic into the closest [Spec,CP]. In the (b)
examples the wh-phrase moves to a higher [Spec,CP]. In the (c) examples
the wh-phrase moves over the topic into the closest [Spec,CP] and Infl must
also move to the left of the topic in order to move into Pol. Hence Infl as well
as wh crosses both IP nodes in the (c) examples.
(109) a. *This is the book which, to Robin, I gave.
b. *Which book did Lee say that, to Robin, she gave?
c. *Which book did, to Robin, Lee give?
(110) a. *I picked up the books which, on the table, Lee had put.
b. *Which books did Lee say that, on the table, she had put?
c. *Which books did, on the table, Lee put?
(111) a. *This is the town in which, last year, we were living.
b. *In which town did Lee say that, last year, we were living?
c. *In which town were, last year, you living?
(112) a. *This is the car which, in those days, we drove.
b. *Which car did Lee say that, in those days, we drove?
c. *Which car did, in those days, you drive?
As we can see in these examples, with the comma intonation extraction over
the topic is uniformly ungrammatical.
Next, consider extraction over a focus. In the (a) examples we have
movement to an embedded [Spec,CP] without inversion, while in the (b)
examples we have movement to a higher [Spec,CP]. In the (c) examples, Pol
must be wh in order that the wh-question be well-formed. Hence Pol cannot
be Focus. The topic must be adjoined to IP, which creates a topic island. Thus
we predict that simple wh-questions with a focus in [Spec,PolP] are impossible.50
(113) a. This is the book which to robin I gave.
b. {Which book / What} did Lee say that to robin she gave?
c. *{Which book / What} to robin did Lee give?
(114) a. I picked up the books which on the table Lee had put.
b. {Which book / What} did Lee say that on the table she had put?
c. *{Which book / What} on the table did Lee put?
(115) a. This is the town {in which / where} last year we were living.
b. {In which town / Where} did Lee say that last year we were living?
c. *{In which town / Where} last year were you living?
(116) a. This is the car which in those days we drove.
b. {Which car / What} did Lee say that in those days we drove?
c. *{Which car / What} in those days did you drive?
The judgments here are subtle. Nevertheless, there appears to be a clear improvement in the (a) and (b) examples when the comma intonation is
eliminated, supporting the predicted distinction. We also expect to have
multiple topicalization just in case the inner topic is a focus. The cases of
multiple topicalization in the literature appear to have this property.51
50 The sentences in (113)–(116) are somewhat reminiscent of Baltin’s (1982) well-known He’s a man to whom liberty we could never grant.
51 Stylistic Inversion also has a smooth intonation, suggesting that it is a case of ‘focus’ topicalization.
(117) a. This book to robin I gave.
b. Last year in St. louis we were living.
c. In those days a nice car we drove.
(118) a. *This book, to Robin, I gave.
b. *Last year, in St. Louis, we were living.
c. *In those days, a nice car, we drove.
(119) a. This book Lee says that to robin I gave.
b. Last year Lee says that in St. louis we were living.
c. In those days Lee says that a nice car we drove.
d. In those days Lee says that a nice car we drove and an old car we
avoided like the plague.
(120) a. *This book Lee says that, to Robin, I gave.
b. *Last year Lee says that, in St. Louis, we were living.
c. *In those days Lee says that, a nice car, we drove.
Finally, it has been noted in the literature that topicalization does not show
Weak Crossover effects, since the topicalized constituent is not an operator
that binds one or more variables. In contrast, we would expect that focus
topicalization would produce Weak Crossover effects, since a focus is inter-
preted as an operator (Chomsky 1977). The following judgments, while
delicate, appear to support the analysis.
(i) a. Into the room (*,) walked Mary.
b. Susan said that (*,) into the room (*,) walked Mary.
The view that the landing site for the fronted expression is [Spec,PolP] is supported by the fact
that extraction from the inverted subject is marginally possible.
(ii) ?This is the person who/that Susan said that onto the floor had fallen an expensive picture of.
This derivation is also consistent with the VP topicalization analysis for Stylistic Inversion
proposed by Rochemont and Culicover (1990), in which the V is moved out of VP into Infl, the
VP is then fronted, and then Infl+V is moved to the right of the topicalized phrase.
(iii) [VP ti into the room]j [Infl+walki]k Mary tk tj
On the Rochemont and Culicover analysis these movements are non-structure-preserving,
while on the current analysis they can be formulated as structure-preserving substitutions.
However, the topicalized VP is not interpreted as focus (Rochemont and Culicover 1990).
Rather, the focus is the subject, which is a puzzle for the current analysis.
I am grateful to Heizo Nakajima for pointing out to me that treating Stylistic Inversion as
focus topicalization correctly predicts that it will be possible to extract over the fronted
constituent, as in (iv).
(iv) a. John said that in the park, under the tree stood a man who had an appointment with
Mary.
b. In which park did John say that under the tree stood a man who had an appointment
with Mary?
(121) a. Robini, hisi mother really appreciates.
b. To Robini, hisi mother gave lots of presents.
(122) a. *robini hisi mother really appreciates.
b. *To robini hisi mother gave lots of presents.
I therefore conclude that in English, as in Hungarian and other languages, Pol
may be Focus.

8.4.4 Comparative Germanic
I conclude with some observations about the implications of the English Pol
analysis for the description of the other Germanic languages. Owing to the
complexity of the subject matter and the already considerable length of this
paper, what I have to say here will be for the most part programmatic.
It is apparent that inversion in English should be formally related to the
verb-second phenomena of the Germanic languages. It is now a standard
analysis that V-second in Germanic arises from the movement of a tensed
verb into C. In general this movement occurs when [Spec,CP] is filled, as in
the German (123) (from Haider 1986).
(123) Gestern habe ich es auf den Tisch gestellt.
Yesterday have I it on the table put
In English, inversion must be triggered by wh, neg, or so. While Pol may
be Focus, Focus does not trigger inversion in English. German and the
Scandinavian languages differ from English in that inversion is found for
the most part with any fronted constituent. This latter group of languages can
themselves be differentiated according to whether or not V2 in complements
is in complementary distribution with the presence of an overt complementizer. For instance, the sequence [CP C–XP–V–NP– . . . ] is not possible in
German but it is in Faroese. (The Faroese examples are from Vikner 1991.)
(124) Ge. a. *Ich glaube, daß gestern habe ich es auf den Tisch gestellt.
b. Ich glaube, daß gestern ich es auf den Tisch gestellt habe.
c. Ich glaube, gestern habe ich es auf den Tisch gestellt.
d. *Ich glaube, gestern ich es auf den Tisch gestellt habe.
(125) Fa. a. Tróndur segði, at í gjár vóru dreingirnir als ikki ósamdir.
Trondur said, that yesterday were boys-the at-all not disagreed
b. *Tróndur segði, at í gjár dreingirnir vóru als ikki ósamdir.
c. *Tróndur segði, at í gjár dreingirnir als ikki vóru ósamdir.
Example (126) shows that in German there must be inversion when there is a
‘topicalized’ constituent. It is standard in the analysis of German and the
other Germanic languages to hold that the surface order Subject–Verb–XP is
derived by V2, where the subject occupies the [Spec,CP] position. Hence in
German the tensed verb in the complement must follow a clause-initial
subject.
(126) a. Ich glaube, daß Johann Maria gesehen hat.
I believe that Johann Maria seen has
b. *Ich glaube, daß Johann hat Maria gesehen.
c. *Ich glaube, Johann Maria gesehen hat.
d. Ich glaube, Johann hat Maria gesehen.
But in Faroese, the tensed verb need not move into second position.
(127) a. Tróndur segði, at dreingirnir vóru als ikki ósamdir.
Trondur said, that boys-the were at-all not disagreed
b. Tróndur segði, at dreingirnir als ikki vóru ósamdir.
Trondur said, that boys-the at-all not were disagreed
Suppose that we express these differences in terms of Pol. In English, Pol
ranges over wh, neg, so, Focus, and [e], while in the other Germanic languages it is restricted to Topic. In all the languages but English, empty
Pol is a bound morpheme that must be bound to a lexical head, in particular,
V. In German, PolP is in complementary distribution with CP, while in
English and the other Germanic languages it can be a complement of
C. Hence in German, Pol is obligatory when there is no C; in the other
languages it is optional.52 (128) summarizes.
(128)                       English                     German      Faroese
     Range of Pol           wh, neg, so, Foc, [e]       Topic       Topic
     Empty Pol              Free                        Bound       Bound
     Distribution of PolP   Optional complement of C    C or Pol    Optional complement of C
52 Since Pol selects only tensed IP, it follows that there are no wh-infinitives in German.
The obligatory character of Pol in German is brought out by the distribution of expletives. Von Fintel (1990) argues that certain instances of expletive es are
actually realizations of obligatory [Spec,CP]. Adapting his analysis slightly, let
us say that in German, Pol must license a specifier. The presence of Pol is
signaled by a topicalized constituent and V2. When there is no topic, and only
when there is no topic, we will expect es. This expectation is precisely what is
shown by the following examples.
(129) a. *(Es) wurde gestern getanzt.
it became yesterday danced
‘Yesterday there was dancing.’
b. *(Es) sind drei Reiter in die Stadt gekommen.
it are three horsemen into the city came
‘Three horsemen came into the city.’
c. *(Es) hat jemand ein Haus gekauft.
it has someone a house bought
‘Someone has bought a house.’
(130) Hans sagte, dass
Hans said that
a. (*es) getanzt wurde.
b. (*es) drei Reiter in die Stadt gekommen sind.
c. (*es) ein Mann ein Haus gekauft hat.
it a man a house bought has
‘. . . a man bought a house’
(131) a. Wo wurde (*es) getanzt?
where became it danced
‘Where was there dancing?’
b. Woher sind (*es) drei Reiter gekommen?
where-from are it three horsemen come
‘From where did three horsemen come?’
c. Wann hat (*es) ein Mann ein Haus gekauft?
When has it a man a house bought
‘When did a man buy a house?’
In (129) [Spec,PolP] is not filled, hence es must appear. In (130) there is a complementizer, hence there is no Pol. Thus es cannot appear. In (131), [Spec,
PolP] is occupied, and again es does not appear.
8.5 Summary
In this paper I have given evidence that there are two complementizer-type
positions in English, each of which is the head of a maximal projection. The
two heads, C and Pol, permit the explanation of a range of phenomena that
do not appear to be amenable to a one-complementizer analysis. For example,
the fact that there is no that-t violation when that is immediately followed by
one of a certain class of adjuncts is accounted for if empty Pol undergoes
agreement with the subject trace. The occurrence of SAI in embedded Negative Inversion and so-Inversion sentences but not in embedded wh-questions
has a natural account if we distinguish pure complementizers such as that and
Q from polarity operators such as wh, neg, and so. The assumption that Pol
selects only tensed S’s but not infinitivals allows us to explain the fact that
there are only wh infinitivals, not negative or so infinitivals. The C/Pol analysis
also allows us to capture some facts about the behavior of why and how come
as well as some subtle differences between them. By assuming some relatively
minimal differences in the range of Pol and in the distribution of PolP with
respect to C, it appears that we may be able to account for some of the
differences among the Germanic languages regarding V-second phenomena.
Finally, I have proposed that PolP can appear not only as a complement of
C, but as a complement of I. When it is IP-internal, [Spec,PolP] can function
as the location of pre-V focus, as in Hungarian. Allowing Pol to be Focus
allows us to capture the difference between comma intonation and focus
intonation topicalization in English, and predicts correctly that certain
instances of topicalization will not create topic islands. In languages like
Arabic, external and internal neg and wh are overtly distinguished, which
supports the general picture developed for English.
9
The Adverb Effect
Evidence against ECP accounts of the that-t effect
(1992)*
Remarks on Chapter 9
This article is concerned with the fact noted in Chapter 8 that an adverb
(and other initial material) that intervenes between that and the trace of an
A0 extraction significantly ameliorates the that-t effect (*what do you think
that t happened? what do you think that just t happened?). I was unaware at
the time (i.e. had forgotten) that the data had been originally observed
by Bresnan (1977). The significance of the Adverb Effect is that it undermines
the ECP account—a grammatical constraint formulated in terms of antecedent and/or head government—since the intervening adverb does not on
the face of it significantly alter the syntactic configuration. It is of course
possible to make ad hoc assumptions about what the structure is
when the adverb is present that will change the government relations in
the intended direction, but the phenomenon calls out for an alternative
perspective.
The Adverb Effect and its evil twin, the that-t effect, are among the
more interesting puzzles unearthed in the contemporary exploration of
English syntax. At this point I am convinced that the correct account is
not a strictly syntactic one, but rather one that appeals to the computation
of the correspondence between syntactic structure and interpretation. Robert
Levine and I offer some speculation in Chapter 10 along these lines, but a
genuine explanation has yet to be provided.

* [A condensed version of this chapter first appeared in Linguistic Inquiry 24: 557–61 as
Culicover (1993). I am very grateful to Chris Barker, Peter Coopmans, Michael Rochemont,
Philip Miller, Mineharu Nakayama, Bob Levine, Carl Pollard, and an anonymous Linguistic
Inquiry reviewer for helpful comments and criticisms on various aspects of this research. This
article was inspired in part by the reviewer pointing out examples like (8) in the text.]
9.1 The Adverb Effect
I have argued in Chapter 8 that suspension of the that-t effect provides
evidence for the existence of an empty category Pol(arity) that is distinct
from C and external to IP, as in (1). Subsequent investigation, reported here,
suggests that this argument does not go through. In fact, the evidence calls
into question the class of solutions to the that-t effect that crucially make use
of ECP, particularly in regard to the role of the complementizer in permitting
the trace of the subject to be properly governed.
(1) [CP Spec [C′ C [PolP Spec [Pol′ Pol(arity) IP]]]]
The familiar contrast that illustrates the that-t effect is given in (2) and (3).
(2) a. I expected (that) you would win the race.
b. Which race did you expect (that) I would win?
(3) a. Whoi did you expect ti would win the race?
b. *Whoi did you expect that ti would win the race?
The examples in (4) show that the effect is suspended if there is a sentential
adverbial between that and IP.1
1 The same effect occurs with PPs topicalized out of VP, but it is more difficult to control for
the effects of crossing dependency and topic islands. The following examples appear to me to be
fairly acceptable, with focal stress on the topic.
(i) a. Robin met the man whoi Leslie said that [to kim]j ti had given the money tj.
b. I asked whoi you had claimed that [on this table]j ti had put the books tj.
(4) a. Robin met the man {whoi / Opi that} Leslie said that for all intents and purposes ti was the mayor of the city.
b. This is the tree Opi that I said that just yesterday ti had resisted my
shovel.
c. I asked whati Leslie said that in her opinion ti had made Robin give
a book to Lee.
d. Lee forgot which dishesi Leslie had said that under normal circumstances ti should be put on the table.
Let’s call this the Adverb Effect.2
First I will examine the Adverb Effect and consider what it suggests about
ECP accounts of the that-t effect. Then I will explore extensions of the Adverb
Effect and show that it has some interesting implications for the analysis of
parasitic gaps.
The (questionable) argument for the empty functional category Pol(arity)
that I alluded to above goes as follows. Suppose we assume that a subject trace
is licensed by an empty complementizer, but not by an overt lexical complementizer. There have been a number of proposals in the literature for deriving
this result. Let us assume for concreteness the proposal of Rizzi (1990), in
which one possible instantiation of the empty complementizer is Agr, which
agrees with the trace in [Spec,CP] by general Spec-head agreement and, by
transitivity, with the subject trace as well, as shown in (5).3
(5) [CP ti′ Agri [IP ti . . . ]]
2 Note that the sentential adverbials in (4) in general do not give rise to topic islands (see (iii)
and (iv)), which have been discussed by Lasnik and Saito (1992) and Rochemont (1989).
(i) a. This is the tree {whichi / Opi that} just yesterday I had tried to dig up ti with my shovel.
b. I asked whati in your opinion Robin gave ti to Lee.
c. Lee forgot which dishesi under normal circumstances you would put ti on the table.
(ii) a. I think that, to Lee, Robin gave a book.
b. Lee said that, on the table, she is going to put the yellow dishes.
c. Robin says that, the birdseed, he is going to put in the shed.
(iii) a. *Whati didk, [to Lee]j, Robin tk give ti tj?
b. *[Which dishes]i arek, [on the table]j, you tk going to put ti tj?
c. *Wherei arek, [the birdseed]j, you tk going to put tj ti?
(iv) a. I asked whati, [to Lee]j, Robin gave ti tj.
b. *Lee forgot [which dishes]i, [on the table]j, you are going to put ti tj.
c. *Robin knows wherei, [the birdseed]j, you are going to put tj ti.
It is not clear whether this is related to the Adverb Effect.
3 See Rochemont and Culicover (1990) for a similar account.
Rizzi stipulates that C, which is normally ‘inert for government’, becomes a head governor for the subject trace by virtue of this coindexing. Hence the
subject trace does not violate the ECP.
The suspension of the that-t effect in (4) may be taken to be evidence that
between the sentential adverbial and the subject trace there is an empty
category Pol(arity) that is distinct from the complementizer that functions
as the head governor of the subject trace. This is what I argued in Chapter 8.
However, such an analysis turns out to be not entirely unproblematic in terms
of its theoretical consequences. There are unresolved questions about the
status of the adverbial and the status of intermediate traces, which I will
summarize. First, the adverbial is either the specifier of this empty Pol, or it is
adjoined above PolP. The two options are schematized in (6).
(6) a. . . . [CP [Spec NPi] that [PolP SAdv [[Pol e] [IP ti . . . ]]]]
b. . . . [CP [Spec NPi] that [PolP SAdv [PolP [Spec ti′ ] [[Pol ei] [IP ti . . . ]]]]]
The first option is unsatisfactory since there is no apparent agreement
relationship between SAdv and [Pol e]. If (6a) is the structure, it would follow
that if any phrase whatsoever or if no phrase at all appeared in this
position, empty Pol would license the empty subject. We would thus falsely
predict that there are no that-t violations. We would in fact have to require
that empty Pol can appear only if there is an overt Spec, which is an ad
hoc stipulation.a Furthermore, the licensing of the subject trace by the empty
Pol would have to depend strictly on the fact that Pol is empty, since there is
no way to derive the agreement between Pol and the subject trace on this
account, by using Spec-head agreement. In this structure there is nothing in the
specifier of Pol that agrees both with Pol and with the subject trace.
The second option maintains agreement between the empty head and the
subject trace. But it suffers from the problem that now the trace ti′ in [Spec,PolP] is not properly governed. If we accept the view of Lasnik and Saito
(1984; 1992) that intermediate traces may delete in LF and that ECP applies at
LF, this offending trace does not yield an ECP violation. But then neither does
the offending trace ti′ in (7), which lacks an adjoined adverbial.
(7) . . . [CP [Spec NPi] that [PolP [Spec ti′ ] [[Pol ei] [IP ti . . . ]]]]
Again, we would falsely predict that there is never a that-t violation.
While technical solutions to these problems may well exist, there is an
additional problem that suggests that the general approach is on the wrong
track, regardless of its technical feasibility. Consider the following sentence.
a This stipulation subsequently evolved into a general principle of Optimality Theoretic syntax; effectively, structure is not present unless it is needed to host an overt constituent. See Grimshaw (1997).
(8) Leslie is the person who I said that only then would run for President.
This example appears to be comparable in grammaticality to one that contains a non-negative adverbial.
(9) Leslie is the person who I said that at that time would run for President.
Fronted only then typically causes Negative Inversion. Suppose therefore that
the structure of (8) is as in (10).
(10) . . . whoi [I said [CP that [PolP [only then] [Pol wouldj][IP ti tj run for
President]]]]
The main problem is that it is not clear how it is that ti is properly governed.
Wouldj cannot head-govern ti, since the two are not coindexed. Similar
configurations involving interrogatives are ill-formed, as Rizzi (1990) notes.
(11) a. *whoi didj [IP ti tj sleep] (from Koopman 1983)
b. *[isj [IP ti tj intelligent]] [every man in the room]i
So we don’t really want to re-index wouldj and tj with i in (10).
It might be thought that perhaps the negative adverbial in this case does
not actually trigger Negative Inversion. Note, however, that the negative
adverbial takes sentential scope, since it licenses polarity items.
(12) a. Leslie is the person who I said that at no time would run for any
public office.
b. Robin met the man who Leslie said that only then had seen anything
moving.
c. It is Leslie who I believe that only for one moment had given a damn
about the budget.
Topicalized negative phrases, i.e. those that don’t trigger Negative Inversion,
cannot license polarity items.
(13) a. At no time would Leslie run for any public office.
b. *At no time(,) Leslie would run for any public office.
(14) a. Only then did Leslie see anything moving.
b. *Only then(,) Leslie saw anything moving.
(15) a. Not once had Leslie given a damn about the budget.
b. *Not once(,) Leslie had given a damn about the budget.
So it appears that there really is inversion in (8).4
The grammaticality of (8) thus suggests that the suspension of the that-t
effect when there is a sentential adverbial between that and the subject trace is
not attributable to the presence of an empty functional category adjacent to
the subject.5 This in turn suggests that the that-t effect has nothing to do
with whether or not a subject trace is licensed by a empty complementizer. In
general, ECP approaches to the that-t effect depend on that somehow not
allowing proper government of the subject trace. For example, on Rizzi’s
(1990) account, as we have seen, that does not agree with Spec and hence
the subject trace is not properly head governed. The presence of SAdv would
appear to be irrelevant. In Lasnik and Saito (1984), the presence of both that
and a trace in COMP prevents the trace from c-commanding and thereby
antecedent governing the subject trace.6 Again, an intervening SAdv appears
to be irrelevant to the relationship between the supposed proper governor and
the empty subject.
Sentences of the sort that we have seen, that allow a subject trace to coexist
with that, cast doubt on the correctness of all such accounts. If that blocks
proper government of an empty category in the cases without a SAdv, then it
should do so when there is an SAdv.7 More precisely, regardless of whether the
presence of that blocks antecedent government or head government, it is not
clear how the intervening SAdv could prevent that from blocking antecedent
government or head government.
Thus, the data show that the original Chomsky and Lasnik (1977) proposal
for a that-t filter is empirically more adequate than standard ECP accounts.
The filter does rule out *that-t but not that-SAdv-t. If a filter, or some
mechanism that makes it appear that there is a filter, is responsible for the
ungrammaticality of *that-t, then a subject trace can nevertheless always

4 If there is inversion in (8), we might expect that in the absence of a modal, the sequence
Tense-[NP t]-V-... would trigger do-support. Then (i.a) should be grammatical and (i.b) should
be ungrammatical.
(i) a. ??Leslie is the person who I said that only in that election did run for any public office.
b. Leslie is the person who I said that only in that election ran for any public office.
I speculate that the oddness of the first example is due to the fact that the sequence did V with
unstressed did is marginal in PF, regardless of the presence of the empty category. The second
example, while grammatical, has an analysis in which the adverb only in that election appears
between Infl and VP.
5 This negative conclusion is not an argument against the existence of Pol. I am suggesting
that the Adverb Effect simply does not constitute evidence for the existence of Pol.
6 A similar account is proposed by Kayne (1981a).
7 There is no question here of some dialect variation involving the status of the complementizer that, as suggested by Sobin (1987), since speakers such as myself who have the that-t
effect also accept sentences in which it is suspended.
262 explaining syntax

satisfy the ECP. If the ECP must hold for the subject trace, either the ECP doesn’t
involve head government, or the subject trace is head governed.
What are the potential consequences? If head government is not part of
ECP, then we have to worry anew about argument/adjunct differences in
extraction, no small task. If head government is a part of ECP, and if the
subject is head-governed (e.g. by Infl or by C), there are then questions of
what the head governor is and how to account for the Negative Inversion
cases discussed above (see (8)). With each alternative, we are faced with a
different set of complicated consequences that are worth pursuing, but lack of
space prevents me from pursuing them here.
Whether the Chomsky–Lasnik type of filter is the correct account awaits
additional research, as does the question of how an empty subject is licensed.8
In the space remaining, I want to consider a broader range of cases in which
the that-t effect appears, showing that the Adverb Effect applies to complementizers other than that and to certain parasitic gaps as well as true gaps.

9.2 Other complementizers


There are other elements besides that which introduce a sentential complement, including for, whether, if, like, and as if. It is well known that the that-t
effect holds for complementizers in general. Let us consider whether all of the
complementizers show the Adverb Effect as well.
For does not show the Adverb Effect, presumably because it must be
adjacent to the NP in order to assign Case, as shown by the ungrammaticality
of *We were hoping for under all circumstances you to stay.
(16) a. We were hoping for you to stay.
b. *Whoi were you hoping for ti to stay?
c. *Whoi were you hoping for under any circumstances ti to stay?
An empty subject produced by extraction cannot be adjacent to whether or if.
(17) *This is a person whoi you might well wonder {whether/if} ti would
dislike you.
But a sentential adverbial improves acceptability.
(18) This is a person whoi you might well wonder {whether/if} under some
circumstances ti would dislike you.

8 See Pesetsky (1979) for arguments against the filter analysis of the that-t effect. In Culicover
(1992b) I explore the hypothesis that the filter is actually due to a prosodic constraint (at PF) on
the distribution of stress peaks in the neighborhood of wh-trace.

At worst there is still a weak wh-island violation, due to the extraction over
whether/if, but it is no worse than extraction from object position over
whether/if.
(19) This is a person whoi you might well wonder {whether/if} under some
circumstances you would dislike ti.
Very much the same judgments hold for the movement of an empty operator, which we see in the cleft construction.
(20) a. *It is this person Opi that you might well wonder {whether/if} ti
dislikes you.
b. It is this person Opi that you might well wonder {whether/if} for all
intents and purposes ti dislikes you.
c. It is this person Opi that you might well wonder {whether/if} you
should pay attention to ti.
Consider next the Stylistic Inversion construction, illustrated in (21).
(21) On the table was put the book with the answers.
If the ‘subject’ gap (that is, the gap to the left of the verb) results from the
movement of the PP we get the same pattern as we get with the movement of a
subject NP.
(22) a. *[On which table]i were you wondering {whether/if} ti had been put
the books that you had bought?
b. [On which table]i were you wondering {whether/if} under certain
circumstances ti might have been put the books that you had
bought?b
And similarly for the cleft construction, where the empty operator is linked to
the PP in focus position.
(23) a. *It was on this table Opi that I was wondering {whether/if} ti had
been sitting [the book with the answers].
b. It was on this table that I was wondering Opi {whether/if} at some
time or another ti had been sitting [the book with the answers].
Like and as if occur in more restricted contexts, but display the same
behavior. Extraction of a non-subject is possible, extraction of a subject is

b My original judgment had (22b) as grammatical and (22a) ungrammatical. At this point it
seems to me that the adverb ameliorates (22b) in comparison with (22a), although it is still quite
marginal.

ungrammatical, and the Adverb Effect applies. Note the contrast between (c)
and (d) in the following examples.
(24) a. It seems like you lost your notebook.
b. This is the notebooki Opi that it seems like you lost ti.
c. *This is the person Opi that it seems like ti lost the notebook.
d. This is the person Opi that it seems like just a few minutes ago ti
lost the notebook.
(25) a. It seems as if you lost your notebook.
b. This is the notebooki Opi that it seems as if you lost ti.
c. *This is the person Opi that it seems as if ti lost the notebook.
d. This is the person Opi that it seems as if just a few minutes ago ti
lost the notebook.
The data thus confirm that not only does the that-t effect generalize to the
full set of complementizers (whatever its ultimate source), but the Adverb
Effect does as well.

9.3 Parasitic gaps


Another kind of gap occurs in the parasitic gap construction, illustrated
in (26) and (27).
(26) Whati did you buy ti after stating clearly that you would make pgi
yourself?
(27) This is the very person whoi you should tell ti whether you might
consult pgi in the future.
Because there is no extraction from the constituent that contains the parasitic
gap, there is no CED violation in (26) and no wh-island violation in (27). The
pattern is well known. In the recent GB literature, the parasitic gap in general
is licensed by an empty operator in the clause (Chomsky 1986) or by direct
linking to the external operator (Frampton 1990).9 It turns out that some
subject parasitic gaps show the Adverb Effect, while others do not. Those that
do show the effect are not immediately dominated by an extraction barrier,
while those that do not are, as we will see.
Note first that the parasitic gap is normally ungrammatical in subject
position.

9 In GPSG and related approaches, parasitic gaps are treated as similar to multiple extraction
from a coordinate structure. See Gazdar et al. (1985).

(28) *Whati did you buy ti after stating clearly that pgi could easily be made
at home?
(29) *This is the very person whoi you should ask ti whether pgi might be
consulting you in the future.
And, as in the extraction cases, a sentential adverb seems to improve matters.
(30) ?Whati did you buy ti after stating clearly that with very little difficulty
pgi could be made at home?
(31) ?This is the very person whoi you should ask ti whether under some
circumstances pgi might be consulting you in the future.
A more deeply embedded parasitic gap behaves in the same way.
(32) a. Whati did you buy ti after stating clearly that it was obvious that you
could make pgi yourself at home?
b. *Whati did you buy ti after stating clearly that it was obvious that pgi
could easily be made at home?
c. ?Whati did you buy ti after stating clearly that it was obvious that
with very little difficulty pgi could be made at home?
(33) a. This is the very person whoi you should tell ti whether you think
that you will consult pgi in the future.
b. *This is the very person whoi you should tell ti whether you think
that pgi should consult you in the future.
c. ?This is the very person whoi you should tell ti whether you think
that under some circumstances pgi should consult you in the future.
We may take these examples as showing that these parasitic gaps, like
some true gaps, are generated by ‘movement’ of a null operator
(Chomsky 1986).10
Now let us turn to some cases where the Adverb Effect does not occur.
An empty subject that results from extraction cannot be adjacent to a subordinating conjunction.
(34) *I met a person whoi I went and bought some jewelry just before ti
disappeared without a trace.

10 Notice that the possibility of nominative parasitic gaps calls into question the view that
there is a ‘case compatibility’ condition on the complex chain containing a parasitic gap and its
antecedent. It also undermines the account of Frampton (1990), in which the parasitic gap must
be ‘lexically identified’. Subjects, on Frampton’s analysis, are not lexically identified.

There is both a CED violation and a classical ECP violation here, because of
the extraction of a subject. The presence of an adverb does not appear to
reduce the ungrammaticality of the subject extraction case even slightly.
(35) *I met a person whoi I went and bought some jewelry just before for all
intents and purposes ti disappeared without a trace.
When there is no extraction site in the adjunct, but a parasitic gap, there
is presumably no CED violation. But a subject gap is worse than a non-subject
gap and, as before, a sentential adverb does not significantly improve
grammaticality.
(36) a. Whati did you pay for ti just before the store tried to repossess pgi?
b. *Whati did you pay for ti just before pgi was repossessed by the
store?
c. *Whati did you pay for ti just before for all intents and purposes pgi
was repossessed by the store?
These violations in CED configurations fall together with other Subjacency-type violations in their resistance to the Adverb Effect. In (37) we see
that extraction from subject position of a relative clause is not improved by
the presence of the adverb.
(37) a. *This is the mani that the theoremj that ti proved tj contains a
serious error.
b. *This is the mani that the theoremj that for all intents and purposes
ti proved tj contains a serious error.
A similar result holds when the gap in the relative clause is a parasitic gap. (38)
shows the grammaticality of parasitic gaps in this construction, while (39)
shows the ungrammaticality of subject parasitic gaps in relative clauses.
(38) Beer is the only beverage whichi everyonej that tj likes pgi praises ti.
(39) *Beer is the only beverage whichi everyonej that pgi makes tj drunk
praises ti.
And (40) shows that a sentential adverb does not improve grammaticality.
(40) *Beer is the only beverage whichi everyonej that under any circumstances pgi makes tj drunk praises ti.
Robert D. Levine (p.c.) has pointed out that in these relative clauses there
is crossing dependency regardless of whether there is an adverb. This is

definitely a factor. I noted above that the Adverb Effect holds in an embedded
wh-question headed by whether, regardless of whether there is extraction
(cf. (18)) or a parasitic gap (cf. (33)). But in wh-islands in which something
has been fronted, the crossing dependency has a clear effect, which appears to
overwhelm the Adverb Effect (as shown in the c examples).
(41) a. ??whoi did you ask ti [whoj tj likes pgi]
b. *whoi did you ask ti [whoj pgi likes tj]
c. *whoi did you ask ti [whoj for a very good reason pgi likes tj]
(42) a. ??whati did you find out [whoj tj said ti]
b. *whoi did you find out [whatj ti said tj]
c. *whoi did you find out [whatj for a very good reason ti said tj]
Because the complementizer position that contains wh or a null operator in
the embedded S is adjacent to the subject position, there is no way to
dissociate the effect of crossing dependency from the effect of simply having
a subject trace adjacent to an overt complementizer.
Similar results hold for complex NPs (appositives):
(43) a. Beer is the only beverage whichi the fact that everyone likes pgi fails
to make ti more expensive.
b. *Beer is the only beverage whichi the fact that pgi makes people sick
fails to make ti less expensive.
c. *Beer is the only beverage whichi the fact that for all intents and
purposes pgi makes people sick fails to make ti less expensive,
—and for sentential subjects.
(44) a. Ed is the only politician whoi that everyone dislikes pgi appears to
bother ti.11
b. *Ed is the only politician whoi that pgi really dislikes people appears
to bother ti.
c. *Ed is the only politician whoi that for all intents and purposes pgi
really dislikes people appears to bother ti.
That is, a subject parasitic gap that is maximal in a Subjacency island is as
ungrammatical as a trace in the same position.

11 The acceptability of this sentence is enhanced by putting a brief pause after who and heavy
stress on dislikes and bother.

9.4 Summary
There is a general constraint against the sequence C-t, where C is an overt
complementizer or subordinating conjunction and not a relative/comparative
marker. The Adverb Effect somehow improves the grammaticality of an
empty subject by interposing material between the complementizer and the
subject. There are two types of response to the Adverb Effect. First, the Adverb
Effect applies to empty subjects (true gaps or parasitic gaps) in domains from
which extraction is in principle possible. These are the subjects of that-
complements and the subjects of whether-complements. Second, the Adverb
Effect is neutralized when the empty subject is maximal in a domain from
which extraction is in principle impossible, such as CED configurations,
relative clauses, appositive clauses, and sentential subjects.
The paradox implicit in these observations is the following. On the one
hand it appears that extraction of subjects and parasitic gap licensing
of subjects are subject to the same barriers, even if only the former involves
movement across the extraction barrier. On the other hand, non-subject
parasitic gaps, and parasitic gap subjects of sentential complements, are
licensed in configurations where extraction is impossible.12 So, it appears
that what blocks extraction of subjects blocks parasitic gap subjects, but what
blocks extraction of non-subjects and subjects of sentential complements
does not block comparable parasitic gaps. The paradox lies in the fact
that we are presumably dealing with the same mechanisms of extraction in
all cases, the same mechanism for licensing parasitic gaps in all cases, and the
same characterization of barriers in all cases. Something has to give here.
I leave the problem for future investigation.
In conclusion, returning to the observations that launched this paper,
I have shown that the presence of sentential adverbs suspends the that-t effect,
and more generally, the C-t effect. This result calls into question classical ECP
accounts of this effect, in which that more or less directly blocks proper
government of the empty subject. The evidence suggests that the that-t effect
should be thoroughly reconsidered and the data re-evaluated, and with it the
portion of the theory that incorporates the ECP. The interaction between the
Adverb Effect and parasitic gaps suggests that the Adverb Effect may have
some additional diagnostic properties that will be useful in understanding the
nature of parasitic gaps, extraction, and barriers.

12 These generalizations hold particularly clearly if we exclude wh-islands from consideration because of the crossing dependency effect noted earlier, and assume that extraction from a wh-island is in principle possible (and ruled out for other reasons, e.g. Minimality).
10

Stylistic Inversion in English


A reconsideration
(2001)*

Peter W. Culicover and Robert D. Levine

Remarks on Chapter 10
Our goal in this paper was to understand the syntactic structure of Stylistic
Inversion (Into the room walked Sandy). We argue that the phenomenon
described and discussed in the literature as Locative or Stylistic Inversion
in English is actually a conflation of two quite different constructions: on the
one hand, light inversion (LI), in which the postverbal NP element can be
phonologically and structurally extremely simple, possibly consisting of a
single name, and on the other hand heavy inversion (HI), where the post-
verbal element is heavy in the sense of heavy NP shift. We present evidence
that the preverbal PP in LI patterns with subjects but the PP in HI is a
syntactic topic, using a variety of tests which distinguish A-positions from
A′-positions. Other significant differences between HI and LI, such as the
classes of verbs which support these two constructions, respectively, and the
differential behavior of HI and LI with respect to adverbial placement,
provide support for interpreting HI as a case of heavy NP shift applying to
subject constituents.

* [This chapter appeared originally in Natural Language and Linguistic Theory 19: 283–310
(2001). It is reprinted here by permission of Springer. An earlier version was presented at the
Colloque de Syntaxe et Sémantique, University of Paris VII, October 1995. We thank the
participants at that conference for their comments, as well as various other audiences elsewhere
which have provided us with helpful feedback, including the University of Girona. In addition,
we wish to express our appreciation for the care and effort evident in the responses to our paper
of several anonymous referees for NLLT.]

10.1 Introduction
Levin and Rappaport Hovav (1995) have recently argued against the view that
Stylistic Inversion is a diagnostic for unaccusativity.1 Rather, they suggest,
Stylistic Inversion occurs with a wide range of verbs, including unaccusatives,
passives, and—crucially—unergatives. We demonstrate in the following
discussion that the argument of Levin and Rappaport Hovav does not go
through, because they, along with all other students of Stylistic Inversion, fail
to observe that there are actually two Stylistic Inversion constructions in
English. One construction, which we call light inversion (LI), is restricted to
unaccusatives; the other, which we call heavy inversion (HI), is not (we explain
this terminology shortly). In general, it has been evidence of HI that has been
used to argue that Stylistic Inversion is not restricted to unaccusatives.
We begin by adducing evidence in §10.2 that in LI the fronted PP is a subject,
i.e. occupies the Spec position associated with IP. In §10.3 we elaborate our
claim that there are two Stylistic Inversion constructions, presenting a wide
range of evidence that Stylistic Inversion with ‘light’ subjects is possible only
when the verb is unaccusative; when the verb is unergative or even transitive,
Stylistic Inversion is possible, but only with a ‘heavy’ subject. The notion of
‘heavy’ here corresponds exactly to the one that is relevant to heavy NP shift
(see Arnold et al. 2000 for detailed discussion of the factors which heaviness
comprises). We assume that in the case of light inversion (LI), the subject is in
situ in VP, while in the case of heavy inversion (HI), the subject appears in
[Spec, IP] at some point in the derivation and subsequently postposes to the
right of VP. For concreteness we assume the following derivations.
(1) LI: [IP e I [VP V NPsubj PP ...]] ⇒ [IP PP I [VP V NPsubj t ...]]
(2) HI: [IP e I [VP NPsubj V PP ...]] ⇒ [IP NPsubj I [VP tsubj V PP ...]] ⇒
[IP t′subj I [VP tsubj V PP ...] NPsubj] ⇒ [IP PP [IP t′subj I [VP tsubj V tPP ...]
NPsubj]]
We stress at the outset that the main focus of this paper is that there are two
constructions. Space considerations prohibit us from exploring in satisfactory
depth all of the technical questions bearing on the specific details. We do
assume, following proposals of Coopmans (1989) and Hoekstra and Mulder
(1990) among others, that the subject NP in (1) is selected as a sister of the
unaccusative verb. Either it or the PP moves into the higher specifier position,
which we assume to be [Spec, IP]. The apparent optionality of such movement is an obvious problem from the perspective of a theory of movement

1 Throughout we use the term ‘Stylistic Inversion’. Another term commonly found in the
literature is ‘Locative Inversion’.
stylistic inversion in english 271

triggered by the need to discharge features (e.g. Chomsky 1995), but we will
not pursue this aspect of the analysis here.
More controversially, we assume that the sentence-final subject in (2) is
necessarily in [Spec, IP] at some point in the derivation, and that it ends up in
final position through movement. If this NP moves to the right, as we assume
in (2), then this clearly raises important questions in the light of the proposal
of Kayne (1994) that there are no rightward movements. For recent commentary on this as well as other aspects of Kayne’s proposal, see the papers in
Beerman et al. (1997). It is conceivable that the proper derivation of HI does
not involve movement of the subject to the right, but rather movement of
everything else to the left. We will not be able to develop and evaluate here an
analysis along these lines.a
An additional complication is that movement of the NP to the right
leaves a trace that must be licensed. It is generally claimed that in the
configuration that t Infl ..., the trace of the subject is not licensed (see
e.g. Rizzi 1997, and Chapter 9 above). The question then arises as to why
the subject trace would be licensed in the configuration PP t Infl ..., as in
(2). Our hypothesis is that the licensing of the subject trace is not a strictly
grammatical phenomenon, but rather a processing effect.b Again, to develop
such an idea in satisfactory depth would take us far afield and away from the
primary focus of the paper.
In the following section, we briefly touch on the claim that LI occurs only
when the verb is unaccusative. The facts turn out not to be entirely simple, but
the generalization can be sustained more or less in this form. We support this
claim by providing a number of syntactic contexts in which LI is impossible,
but where HI yields a structure which creates the illusion of an ordinary
stylistically inverted form. }10.4 summarizes our conclusions and notes sev-
eral important issues which our conclusions raise, but which we have not been
able to address within the confines of this paper.

10.2 PP is a subject
Frequently cited evidence that the PP in Stylistic Inversion is a subject is the
following. First, long extraction of the PP produces a that-t effect, as first
noted in Bresnan (1977); see also Culicover (1993a) and Chapter 9 above. This
generalization extends to other complementizers (e.g. whether-t, extraction
from gerundives) that show a Comp-t effect (Pesetsky 1982). We illustrate the
relevant data in examples (3)–(7):

a But see Ch. 6 above, which argues against a leftward movement analysis.
b This conclusion is compatible with the arguments in Ch. 9 above regarding the Adverb Effect.

• that-t:

(3) a. Into the room Terry claims (*that) t walked a bunch of gorillas.
b. Into which room does Terry claim (*that) t walked that bunch of
gorillas?
(4) That bunch of gorillas, Terry claims (*that) t walked into the room.
• whether-t:
(5) a. ?Into this room, Terry wonders whether a bunch of gorillas had
walked t.
b. *Into this room, Terry wonders whether t had walked a bunch of
gorillas.
• gerundive:
(6) a. Terry imagined a bunch of gorillas walking into the room.
b. Into the room Terry imagined a bunch of gorillas walking.
c. Into the room Terry imagined walking [a bunch of gorillas].
d. Into which room did Terry imagine a bunch of gorillas walking?
e. Into which room did Terry imagine walking [a bunch of gorillas]?
f. [How many gorillas] did Terry imagine walking into the room?
(7) a. Terry thought about a bunch of gorillas walking into the room.
b. ?Into the room Terry thought about a bunch of gorillas walking.
c. *Into the room Terry thought about walking [a bunch of gorillas].
d. ?Into which room did Terry think about a bunch of gorillas walking?
e. *Into which room did Terry think about walking [a bunch of
gorillas]?
f. *[How many gorillas] did Terry think about walking into the room?
But this argument is far from conclusive, because it crucially assumes that it is
the fronted PP and not the postverbal subject which is responsible for the
trace in subject position. We argue later that the postverbal subjects in such
examples are exclusively heavy, in precisely the sense that distinguishes
constituents eligible to undergo heavy NP shift from those which are not,
and hence must be moved to their surface position from [Spec, IP]. What the
starred examples then show is that that-t is indeed ill-formed, but not that the
extracted PP is linked to the subject trace.
Second, the fronted PP in Stylistic Inversion appears to undergo Raising,
suggesting that it is a subject.
(8) a. A picture of Robin seemed to be hanging on the wall.
b. On the wall seemed to be hanging a picture of Robin.

These sentences can be derived on the approach taken in Culicover and
Rochemont 1990 (R&C), in which the adverbial is topicalized and the combination of modal and main verb or tensed verb is moved into second
position. However, as Levine (1989), Culicover (1992a) (Chapter 8 of this
book), and Coopmans (1992) point out, it is necessary in such cases to extend
R&C’s I/V-raising to a less principled restructuring operation, as in (9). In this
derivation, a succession of lexical heads must raise into higher Infl nodes to
form a complex I/V category which can then undergo a subsequent raising to
the highest Infl node as a single unit, finally undergoing moving to C to give
rise to the distinctive inversion. On this analysis, a category [I/V seemed to be
hanging] must be formed, join seem in a higher I/V category, and at last raise
to the matrix Comp to yield [I/V/C seemed to be hanging], taken to be a single
complex head:2
(9) [IP e [Infl seemed] [IP [a picture of Robin] [Infl to] [VP be hanging on the
wall]]] →
[IP e [Infl seemed] [IP [a picture of Robin] [I/V to bei [VP ti hanging on the
wall]]]] →
[IP e [Infl seemed] [IP [a picture of Robin] [I/V to bei hangingj] [VP ti tj on
the wall]]] →
[IP [a picture of Robin]k [Infl seemed] [IP tk [I/V to be hanging] [VP ... on
the wall]]] →
[IP [a picture of Robin]k [I/V seemed [to be hanging]l] [IP tk tl [VP ... on
the wall]]] →
[IP [VP ... on the wall]m [IP [a picture of Robin]k [I/V seemed to be
hanging] [IP tk tl tm]]] →
[IP [VP ... on the wall]m [I/V/C seemed to be hanging]h [IP [a picture of
Robin]k th [IP ...]]]
The R&C analysis thus becomes distinctly implausible once more complex
structures are involved in the inversion process.
Regardless of the particular details, it turns out that raising cannot be taken
as evidence that the PP is a subject if, as we argue later, the postverbal subject
can arrive in this position by rightward movement from [Spec, IP] in some
cases of Stylistic Inversion. The correct analysis, we claim, is that the post-
verbal subject is what undergoes raising, prior to its movement to the right.
The extracted PP is then in a topic position in these examples. In fact it

2 In order to minimize notational complexity, we replace strings of traces with ellipses where
appropriate.

appears that the PP in the case of LI cannot undergo raising, in spite of the
fact that it is in [Spec, IP].3
Consider the following contrasts.
(10) a. Into the room appeared to be walking slowly a very large caterpillar.
b. Into the room walked Robin slowly.4
c. *Into the room appeared to be walking Robin slowly.
(11) a. Slowly into the room walked Robin boldly.
b. *Slowly into the room appeared to walk Robin boldly.
(12) a. Into the room singing walked Robin slowly.
b. *Into the room singing appeared to walk Robin slowly.
The presence of the adverb after the subject forces the LI analysis. We see that
in this case, a simple PP, or a more complex XP (a V-less VP in the R&C
analysis), cannot undergo raising to a higher subject position.
Yet while these two arguments ultimately fail to support the treatment of
PPs in Stylistic Inversion as subjects, there is a significant set of data, reflecting
systematic differences between A- and A′-positions, which confirms the subject
status of the preverbal PPs in (light) Stylistic Inversion, and which is not
consistent with PP moving directly into a topic position in these cases, viz. the
fact that true Stylistic Inversion, which we refer to as light inversion (LI), does
not produce Weak Crossover (WCO) effects (just like Raising, and in contrast
with wh-Movement).5 The basic contrast we appeal to here is shown in (13):

3 It is not entirely clear why the PP in LI does not undergo raising. If the PP can raise from VP
to [Spec, IP] in the first place, then we might expect that it would satisfy the conditions for
raising from a lower [Spec, IP] into a higher [Spec, IP]. Thus it appears that the answer to the
question must be a semantic one. If, for example, seem and appear predicate of [Spec, IP], then
only a referential PP could be in this position as the surface subject of seem and appear. Contrast
the following:
(i) a. Under the table is a good place to put the beer.
b. Under the table rolled the beer slowly.
(ii) a. Under the table seems to be a good place to put the beer.
b. *Under the table seemed to roll the beer slowly.
4 Note that on our current analysis, slowly into the room must be a constituent. This contrasts
with the view taken by R&C, which is that slowly into the room is the remnant of a VP from
which the V has raised. This analysis is ruled out on the present account, due to the presence of
the subject NP within VP.
5 Prior claims for the subject status of the PP have been made by Bresnan
(1994) and, in somewhat more complex form, in Stowell (1981), where the PP moves through
subject position en route to a final topic position. We stress that, as indicated, we do not take all
the evidence cited in such sources as genuine support for the analysis of PPs as subjects, though
we agree on the conclusion.

(13) a. Whoi appears to hisi mother [ti to be a genius]?


b. ?Whoi is hisi mother grilling ti obsessively? [WCO].
c. ??Whoi does hisi mother think [ti is a genius]? [WCO].
d. ?To whomi did hisi students give ti a teaching award?
While the last three examples here are not altogether impossible, they are far
less well-formed than the raising example in (13a), which is impeccable.6
A strikingly parallel contrast is evident between PPs in Stylistic Inversion on
the one hand and straightforward topicalization on the other, where the PP
contains a quantified NP that is to be interpreted as binding a pronoun in the
postverb NP:
(14) a. *Into every dogi’s cage itsi owner peered. [topicalization, WCO]
b. Into every dogi’s cage peered itsi owner. [Stylistic Inversion, no
WCO]
(15) a. *Itsi owner stood next to none of the winning dogsi. [WCO at LF]
b. Next to none of the winning dogsi stood itsi owner. [Stylistic Inver-
sion, no WCO]
c. *Next to none of the winning dogsi itsi owner stood. [topicalization,
WCO]
d. *?Next to none of the winning dogsi did itsi owner stand. [Negative
inversion, WCO]
(16) a. In every dogi’s cage hung itsi collar.
b. *In every dogi’s cage hung on a hook itsi most attractive and
expensive collar.7
The relevant point is highlighted in (14): the quantified NP within the PP in
the Stylistic Inversion example (14b) can bind the possessive pronoun within

6 Note that whether the fronted wh-element is fronted on its own or is pied-piped, the effect is
the same—as we would expect, given, on the one hand, index percolation at S-structure, and
reconstruction of the preposition back to its D-structure location at LF on the other.
7 An anonymous reader judges examples (16a) and (16b) to be indistinguishable in grammaticality.
We suspect that the relative acceptability of (16b) is due to a reading of every as each,
which does not produce a Weak Crossover violation. Compare (16b) with (i).
(i) In each dogi’s cage itsi most attractive and expensive collar was sitting on a hook.
Replacing every by no should sharpen the judgment for those speakers for whom the difference
between (16a) and (16b) is minimal.
(ii) a. In no dogi’s cage hung itsi collar.
b. *In no dogi’s cage was hanging on a hook itsi most attractive and expensive collar.
c. *In no dogi’s cage itsi most attractive and expensive collar was hanging on a hook.
the VP, parallel to the raised subject who in (13a), while the quantified NP
within the topicalized PP in (14a), like the wh-moved NP in (13b,c), cannot
bind the corresponding pronoun. The difference in status between the inversion
and topicalization examples shown here follows immediately on
the assumption that in (16a) the PP is in an A-position and the subject is in VP,
while in (16b) the PP is topicalized and the subject is linked to [Spec, IP].
A particularly clear demonstration of the contrast we find between the two
kinds of case emerges from the fact that the postverbal quantifier no dog
produces a WCO violation when it binds the pronoun in the PP in A-position
in ??*In itsi cage sat no dogi, just as a quantifier in a direct object produces a
WCO violation when the pronoun is in an NP subject, as in *Itsi master
criticized no dogi. Again, the Stylistic Inversion cases pattern in a fashion
parallel to examples with a quantified subject uncontroversially in [Spec,
IP]. Example (16b), on the other hand, falls together with the case in which
the PP is topicalized and the subject is in [Spec, IP], as in (13d), or e.g. *To
every instructori hisi students gave a teaching award. In this case, as we have
already noted, the PP behaves as though it is reconstructed into the postverbal
position. Compare the examples in (15), which show the same pattern.
The WCO data we have adduced thus points strongly to the conclusion that
the fronted PP is a syntactic surface subject (i.e. is in [Spec, IP]) or, at the very
least, is in a superior A-position with respect to binding, Weak Crossover, and
so on. This hypothesis is consistent with Bresnan’s (1994) proposal that PP is
assigned the SUBJ function under an LFG treatment.

10.3 Light and heavy inversion


We turn now to our claim that there are two types of Stylistic Inversion. To
launch the discussion, we repeat an example cited by Levin and Rappaport
Hovav that is intended to demonstrate that Stylistic Inversion occurs with
unergatives.
(17) In the enclosure, among the chicks, hopped the most recent children of
Nepomuk and Snow White. [M. Benary, Rowan Farm 287; L&RH’s
(78): 257]
It will be noted that the subject in this example is relatively complex. When we
replace it with a less complex simple NP, the sentence becomes a good deal less
natural; it is considerably improved if the NP is made prosodically more
prominent.
The difference between the heavy and light subjects is made still sharper
when we introduce more material into the VP. As noted in Kathol and Levine
(1992), a simple subject NP cannot appear at all after a VP-adverb, but a
focused or heavy NP can, when the verb is unaccusative; thus, compare the ill-
formed (18c) with (18d) and (18e):
(18) a. Into the room walked Robin.
b. Into the room walked Robin carefully.
c. *Into the room walked carefully Robin.
d. Remember Robin? Well, into the room walked carefully . . . robin!
e. Into the room walked carefully the students in the class who had
heard about the social psych experiment that we were about to
perpetrate.
The pattern is comparable to that of heavy NP shift in VP, as illustrated
in (19):
(19) a. Carefully I addressed Robin.
b. I addressed Robin carefully.
c. *I addressed carefully Robin.
d. I addressed carefully . . . robin!
e. I addressed carefully the students in the class who had heard about
the social psych experiment that we were about to perpetrate.
f. I addressed the students in the class who had heard about the social
psych experiment that we were about to perpetrate (extremely)
carefully.
It is important to note that the intended judgments are difficult if not
impossible to make unless the sentences are spoken with the proper inton-
ation. Embedding the examples in context may help to produce the inton-
ation, but to reinforce the point, we illustrate the intonational phrasing of
some of the crucial examples in (19). Specifically, in the HI example (18d),
there are three intonational phrases, one for into the room, one for walked
carefully, and one for Robin.8

(20)  H∗              H∗                 H∗
      L–              !H–                L–
      Into the room   walked carefully   ROBIN!


This ‘HI-intonation’ is the intonation of example (18d) and all of the
examples of HI in this paper.9 What is crucial here is the segmentation of
the intonational pattern into three phrases; the precise contour of each phrase
may vary to some extent.

8 We are grateful to Mary Beckman for her help in notating the HI intonation.
9 As Mary Beckman has pointed out to us (p.c.), this phrasing correlates rather nicely with
what we claim to be the constituent structure of such examples (see fn. 2 above).
When the verb is unergative, the light NP cannot appear postverbally at all,
while the heavy NP can appear after the VP, but not before it, as we see in (21).
(21) a. *In the room slept Robin.
b. *In the room slept Robin fitfully.
c. *In the room slept fitfully Robin.
d. Remember Robin? Well, in the room slept fitfully . . . robin!
e. In the room slept fitfully the students in the class who had heard
about the social psych experiment that we were about to perpetrate.
f. In the room slept the students in the class who had heard about the
social psych experiment that we were about to perpetrate (very)
fitfully.
Here the crucial contrast is between the (c) example and the (d,e) examples.
Such contrasts follow immediately if a sentence such as (21e) is derived by
movement of the heavy NP subject to the right, as suggested by R&C. If this
approach is correct, we would expect the heavy NP to appear exclusively
external to the VP, since it would then be moving across the entire VP from
[Spec, IP], perhaps adjoined to IP as shown in (22), and the contrast between
(21e) and (21f) indeed shows that this subject must be in a position adjoined
outside the VP slept fitfully, just as the scenario we have outlined requires.

(22) [IP [IP [Spec ti] [I′ Infl [VP [Spec ti]
         [V′ [V′ [V slept]] [Adjct fitfully]]]]]
         [NPi ROBIN]]
But then the pattern seen in connection with unaccusative verbs, for example
(18b), where the subject is light and cannot appear in the adjoined position
occupied by the heavy NP in (22), must have a different derivation, one in
which the subject is licensed in a VP-internal position.10 Such a position is
available only to the subject of verbs like walk, given the difference with sleep
that is illustrated here.11,12
Presentational there constructions are standardly taken to illustrate the
existence of a class of unaccusative verbs in English, as in e.g. Coopmans
(1989). But there phenomena also provide independent motivation for
movement of the subject to the right, and for the observation that the
locative PP need not move to the left. Thus, R&C argue that movement of a
heavy NP subject to the right produces presentational there insertion (PTI),
as in (23):
(23) a. There slept fitfully in the next room a group of the students in the
class who had heard about the social psych experiment that we were
about to perpetrate.
b. In the next room there slept fitfully a group of the students in the
class who had heard about the social psych experiment that we were
about to perpetrate.

10 It should be pointed out that when the verb is unaccusative and the subject is heavy, there
is really no way to tell whether the subject is in situ in VP as in the LI construction, or whether it
has moved to the right from [Spec, IP] as in the HI construction. Such a sentence will display all
of the properties of both constructions (since, on our account the conditions for each of the two
homophonous structures will be satisfied) and will therefore have no diagnostic utility vis-à-vis
the proposed analysis.
11 We conjecture that there is a correlation between this structure, in which the unaccusative
subject originates as the direct object of the verb, and the interpretation of ‘movement along a
path’ that is typical of the unaccusative construction. Note that this correlation is constructional,
not lexical, given that such an interpretation can be associated with any verb that can be
plausibly used to denote a property of movement along a path:
(i) Into the room stumbled/wobbled/stormed/blustered/skidded Fred.
12 As pointed out by two reviewers, the derivation that we propose for HI raises the question
of how it is that topicalization of PP and movement to the right of the heavy NP can interact. If
the heavy NP moves first, we might expect the resulting structure to be ‘frozen’ (cf. Wexler and
Culicover 1980), blocking subsequent topicalization. But if topicalization applies first, then we
might expect there to be a topic-island effect, blocking subsequent movement to the right of the
heavy subject NP.
As pointed out by Johnson (1985), the evidence that heavy NP shift blocks subsequent
extraction is not conclusive. In the following example, the PP must extract from a VP to
which heavy NP shift has applied.
Our proposal, however, is that the appearance of there in subject position is
not the only way to license such a rightward displacement of the subject;
rather, the empty subject position must be licensed, either by filling it with
there or by preposing the PP. If this proposal is correct,
then the fact that HI appears to be a type of Stylistic Inversion is in part an
illusion. We claim that what seems to be Stylistic Inversion, via heavy NP shift
of a subject, exists in numerous contexts where light inversion is impossible—
a state of affairs making it a priori very unlikely that a single mechanism
subsumes both HI and LI. Consider the following:
(i) Heavy NP shift derives the illusion of Stylistic Inversion in infinitival
complements (as in (24b,c)):
(24) a. I expected Robin to walk into the room.
b. *I expected t to walk Robin into the room/*I expected t to walk into
the room Robin.
c. I expected t to walk into the room . . . robin! [HI intonation]
d. I expected t to walk into the room a group of the students in the class
who had heard about the social psych experiment that we were
about to perpetrate. [HI intonation]
e. I expect t to preach from this pulpit a close associate of the great
Cotton Mather. [HI intonation]
(25) a. *Into the room I expected t to walk Robin.
b. Into the room I expected t to walk . . . robin! [HI intonation]
c. I didn’t expect robin to walk into the room; rather, into the room k
I expected t to walk k a group of the students in the class who had

(i) the refrigerator [into which]i I put tj ti after I got home [all of the beer that I had bought]j
Moreover, compare the following examples:

(ii) a. Whoi did you give all of your money to ti?
b. *Whoi did you give tj to ti [all of your money]j ?
c. the person whoi I mentioned tj to ti over the phone [the decision to close the factory]j
Example (iic) suggests that the problem with (iib) is not strictly speaking a matter of a
grammatical constraint that blocks extraction. Rather, it appears to have to do with the
identification of the trace of the wh-phrase, which is facilitated when there is material interven-
ing between it and the postposed heavy NP. For related ideas, see Fodor (1978) and Jackendoff
and Culicover (1972), reprinted here as Chapter 11.
Regarding the possibility that there is a topic-island effect, again we suggest that in some cases
extraction across a topic presents problems for language processing, particularly when the topic
and the extracted constituent are of the same syntactic category. Extraction of a subject to the
right when there is a PP topic does not present comparable difficulties. In fact, we note that the
Adverb Effect of Culicover (1993b) (see Ch. 9 above) constitutes evidence that a PP topic actually
ameliorates problems caused by extraction of a subject to the left.
heard about the social psych experiment that we were about to
perpetrate. [HI intonation, where k indicates a marked juncture]
d. Q: Who did you expect to preach from this pulpit? A: From this
pulpit I expected t to preach k a close associate of the great Cotton
Mather. [HI intonation]
(ii) Heavy NP shift derives the illusion of Stylistic Inversion in gerundives, as
in (26)–(28).13
(26) a. I was speculating about who would walk into the room. First,
I imagined Robin walking into the room.
b. I imagined into the room t walking *( . . . ) Robin.
c. I was speculating about who would walk into the room, and
I imagined into the room t walking a group of the students in the
class who had heard about the social psych experiment that we were
about to perpetrate. [HI intonation]
d. I was having a fantasy about what had happened in this church, and
I imagined from this pulpit t preaching a close associate of the great
Cotton Mather. [HI intonation]
(27) a. I decided to let no one into the room; in fact, *I prevented t from
walking into the room Robin.
b. I prevented t from walking into the room . . . robin! [HI intonation]
c. I prevented t from walking into the room a group of the students in
the class who had heard about the social psych experiment that we
were about to perpetrate. [HI intonation]

13 The reader may find these data somewhat surprising, in that on our analysis the well-formed
inversion examples are analyzed as instances of PP topicalization. Yet it is well known
that topicalization within nonfinite clauses is typically extremely degraded. But this is far less
true in the case of gerundives than infinitives. Compare e.g.
(i) that solution Robin having already explored t and rejected t, she decided to see if she could
mate in six moves with just the rook and the two pawns.
(ii) *I really want *that solution Robin to explore t thoroughly.
It thus appears that gerundive clauses are rather more tolerant of topicalization than infini-
tive clauses; in fact, this is essentially what we would predict if the Case-assignment proper-
ties of gerundives are as analyzed in Reuland (1983), where the subject of gerunds is governed
by the verbal affix and thus an internal source of Case is available to such subjects, as opposed
to infinitive clauses, whose overt subjects must in all cases be externally governed in order
to receive Case. We grant however that they are probably not up to the standard of normal
finite clause complementation and might therefore strike some readers as less than fully
natural.
(28) a. I decided to let no one into the room; in fact, *into the room
I prevented t from walking Robin.
b. Into the room I even prevented t from walking . . . robin!
c. Into the room I even prevented t from walking a group of the
students in the class who had heard about the social psych experi-
ment that we were about to perpetrate. [HI intonation]
d. I decided to allow no one to do anything in this church; in fact, from
this pulpit I even prevented t from preaching a close associate of the
great Cotton Mather. [HI intonation]
(iii) Heavy NP shift corresponds to control of PRO by an ‘invisible’ subject
coindexed with the postverbal heavy NP, as in (29d,e).
(29) a. Robin expected PRO to walk into the room.
b. Into the room Robin expected PRO to walk.
c. *Into the room t expected PRO to walk Robin.
d. Into the room t expected PRO to walk . . . robin! [HI intonation]
e. We had set up the protocols perfectly to ‘trick’ the students, so that
into the room t fully expected PRO to walk a group of the students in
the class who had heard about the social psych experiment that we
were about to perpetrate. [HI intonation]
f. Preaching from this pulpit is a great achievement and people come
from near and far hoping to do it. In fact, from this pulpit t expected
PRO to preach a number of close associates of the great Cotton
Mather himself. [HI intonation]
(30) a. Robin avoided PRO walking into the room.
b. Into the room Robin avoided PRO walking.14

14 The following example is ill-formed on normal intonation:
(i) Remember Robin and her fear of windows? *Well, predictably, into the room t avoided
PRO walking Robin.
But note that the following examples appear to be well-formed with the appropriate prosody:
(ii) They said that not everyone would recklessly walk into the room, and, predictably, into the
room t avoided PRO walking . . . robin! [HI intonation]
(iii) We had set up the protocols perfectly to ‘trick’ the students. But for some reason, into the
room t avoided PRO walking a group of the students in the class who had heard about the
social psych experiment that we were about to perpetrate. [HI intonation]
(iv) Preaching from this pulpit was known by many to be terribly unlucky; in fact, from this
pulpit t, studiously avoided PRO preaching any sane associate of Cotton Mather/even the
least superstitious of Cotton Mather’s associates. [HI intonation]
Clearly, it is extremely unlikely that the PP is interpretable as the controller of
PRO in these cases; the simplest assumption is that PRO is somehow con-
trolled by the heavy postverbal NP. But since PRO subjects in complements of
expect, for example, are obligatorily controlled by the subject of expect, it
follows that there is a subject of expect in the examples in (29) coindexed with
the heavy NP, but invisible—exactly what follows from our HI analysis.
(iv) Heavy NP shift derives the illusion that Stylistic Inversion occurs in the
complement of a perception verb, as in (31) and (32).15
(31) a. *We saw into this room run Robin.
b. It was terrible to be in the hotel during the Tolstoy convention; we
actually saw k into this room run k a ravenous horde of angry
Tolstoy scholars. [HI intonation]
c. We heard from this pulpit preach a close associate of Cotton Mather.
[HI intonation]
(32) a. *Into this room we saw run Robin.
b. Into this room k we saw run k a ravenous horde of angry Tolstoy
scholars. [HI intonation]
c. From this pulpit we heard preach a close associate of Cotton Mather.
[HI intonation]
Cf.
(33) We saw go totally ballistic that ravenous horde of angry Tolstoy
scholars. [HI intonation]
(v) Heavy NP shift derives the illusion that the postposed subject of Stylistic
Inversion can be the antecedent of a floated quantifier in the AUX, as for
example in (34):

These and the previous examples raise the obvious question of how the ECP is to be satisfied
with respect to the trace in subject position. The question is actually more complicated, in view
of the problems noted in Culicover (1993b) and Ch. 9 above in accounting for the that-trace
effect in terms of the ECP. For an interesting approach to these problems, see Rizzi (1997); full
discussion of the possible sources of the that-trace effect and their interaction with the structures
we are positing for heavy inversion would take us well beyond the scope of the present paper,
and we leave investigation of this issue for future work.
15 It is not clear to us how the PP gets into topic position in (31b), in view of the
ungrammaticality of (i).
(i) *We saw into the room an angry horde of Tolstoy scholars run.
We leave this question as an unsolved problem. It is possible that the phenomenon seen here is
related to that of French exceptional case marking, where the subject of an infinitival cannot
appear in situ but can be extracted if it is an interrogative or a clitic pronoun (Kayne 1981b).
(34) a. Everyone seemed very hungry today. For example, into the cafeteria
have both gone the two students that I was telling you about. [HI
intonation]
b. From this pulpit have both preached Cotton Mather’s two closest
and most trusted associates. [HI intonation]
By contrast, when the subject is light, as in (35), it cannot be the antecedent of
the floated quantifier, as (36) and (37) illustrate.
(35) a. Both the students have gone into the cafeteria.
b. The students have both gone into the cafeteria.
(36) a. Q: Who went into the cafeteria? A: Into the cafeteria have gone both
(of the) students, I think.
b. Q: Who went into the cafeteria? *A: Into the cafeteria have both
gone the students, I think.
(37) a. Into the mists of history are quickly disappearing both my heroes.
b. *Into the mists of history are both quickly disappearing my heroes.
The evidence thus suggests, once again, that the heavy subject is moving to the
right from [Spec, IP], while the light subject is in situ in VP.
There are several other differences between LI and HI that do not involve
the subject NP directly:
(vi) HI but not LI allows long extraction of the XP from a tensed
complement.
(38) a. *Into the room I claim/believe walked Robin.
b. *Into the room I claim/believe/expect t will walk Robin.
c. *From this pulpit I claim/believe/expect t will preach Robin.
(39) a. Into the room I claim/believe/expect ti will walk . . . robini! [HI
intonation]
b. From this pulpit I claim/believe/expect t will preach (eloquently) . . .
robin! [HI intonation]
(40) a. Into the room I claim/believe/expect ti will walk [a ravenous horde
of angry Tolstoy scholars]i. [HI intonation]
b. From this pulpit I claim/believe/expect ti will preach (incoherently)
[a series of ravenous Tolstoy scholars]i. [HI intonation]
The key point here is the contrast between (38) on the one hand and (39) and
(40) on the other, pointedly demonstrating the difference in extraction
possibilities that hinges on the lightness or heaviness of the postverbal
NP. Moreover, simple heavy NP shift of the subject of the tensed
S unaccompanied by topicalization of the PP is ungrammatical:
(41) a. *I claim/believe/expect [t will walk into the room this minute a horde of
        angry Tolstoy scholars].
     b. From this pulpit I claim/believe/expect ti will preach . . . robini!
(42) a. I claim/believe/expect [there will walk into the room this minute a horde
        of angry Tolstoy scholars].
     b. I claim/believe/expect [there will preach from this pulpit all week a series
        of increasingly angry Tolstoy scholars].
(vii) Extraction from a subject in the LI (immediate postverbal) position is
better than extraction from a subject in the HI (VP-final) position.16
(43) a. ?Who did you say that into the room walked offensive friends of t
waving rude signs? [HI intonation]
b. *Who did you say that into the room walked waving rude signs
offensive friends of t? [HI intonation]
c. *Who did you say that from this pulpit preached waving rude signs
offensive friends of t?
This difference is consistent with the view that the light subject is in situ in VP,
while the heavy subject is in an adjoined position. It is equally ungrammatical
to extract from a shifted heavy direct object, for example (Wexler and Culi-
cover 1980).

16 Note that if the light subject is in situ, the awkwardness of extracting from it must be due to
the fact that it is the logical and not the syntactic subject of the sentence. This observation recalls
the proposal of Culicover and Wilkins (1984) that extraction from the antecedent of a predicate
diminishes acceptability, regardless of the syntactic configuration in which the antecedent
appears. This specific effect need not, and apparently is not, universal, given that extraction
from postverbal unaccusative subjects is fine in other languages such as German and Italian. But
the language-specific nature of such restrictions is unsurprising and well attested elsewhere;
thus, in English, gaps within subjects are only sanctioned as part of parasitic gap constructions
(modulo a limited class of examples noted in Ross 1967), while in Icelandic such gaps may occur
freely even without coindexed gaps elsewhere in the clause, as noted in Sells (1984).
(44) *Whoi did you say that you saw tj yesterday [offensive friends of ti]j
(viii) HI but not LI (marginally) allows where.
We begin with the general observation that while a relative PP produces
Stylistic Inversion, both relative and interrogative where block inversion, as
illustrated in (45)–(52).
(45) a. the place to which Robin went
b. the place where Robin went
(46) a. the place to which went Robin
b. *the place where went Robin
(47) a. the city in which all my relatives live
b. the city in which live all my relatives
(48) a. the city where all my relatives live
b. *the city where live all my relatives
Similarly for interrogative PP vs. where:
(49) a. To which place did Robin go?
b. Where did Robin go?
(50) a. To which place went Robin?
b. *Where went Robin?
(51) a. In which city do all your relatives live?
b. Where do all your relatives live?
(52) a. In which city live all your relatives?
b. *Where live all your relatives?
These facts appear at first sight to be totally mysterious. Notice, however, that
the ungrammatical examples are greatly improved by introduction of an
adverb, an apparent instance of the Adverb Effect (Chapter 9).
(53) a. This is the city where for the most part live all my relatives.
b. This is the city where for most of the year live all my relatives.
c. ?Leslie asked me where, at that point, had gone the thieves who had
taken my money.
d. ?(Leslie was wondering) where for most of the year live all of your
most favorite relatives.
Significantly, however, there is no improvement unless the postposed subject
is relatively heavy.
(54) a. *This is the city where for the most part lives Robin.
b. *This is the city where for most of the year lives Robin.
c. *(Leslie asked me) where at that point went Robin.
d. *(Leslie was wondering) where for most of the year live your kids.
The efficacy of the Adverb Effect when there is HI, but not when there is LI,
once again strongly suggests that there are two different structures for the two
constructions. More precisely, it appears that the landing site for where in HI
is the complementizer position or [Spec, CP], producing a C-t effect that is
ameliorated by the Adverb Effect. But apparently there is no landing site for
where in LI. If, as we have suggested, LI involves movement of a PP into [Spec,
IP], we can explain the absence of a landing site by positing that where is not a
PP in the required sense, but an NP, since NPs—for reasons that of course
need to be explained—fail to participate in LI.17,18
To sum up, we have observed several distinct syntactic phenomena that
support the claim that there are two SI constructions, HI and LI, notably:
• that it is possible to postpose only a constituent corresponding to a
heavy subject in the cases of various kinds of nonfinite complements
(see (i)), gerundives (see (ii)), configurations of control (see (iii)),
and complements of a perception verb (see (iv));
• that only the heavy antecedent of a floated quantifier can postpose
(see (v));
• that only the heavy subject of an embedded complement can postpose
when a constituent of a that complement has been fronted to
the matrix (see (vi));

17 Note e.g. that where can be a tough subject, in spite of the fact that PPs are typically ruled
out as subjects of tough predicates: ??*In which room would be easiest to hold the exam?, but
Where would be easiest to hold the exam?
18 We can only consider briefly here the restriction that allows only PPs in [Spec, IP] to
trigger locative inversion. Suppose that NP movement paralleled PP movement to create
inversion structures. Consider the following contrast:
(i) a. Robin ran into the room.
b. Into the room ran Robin.
c. [e] Infl [[ran Robin] into the room]
(ii) a. Robin ran the race.
b. *The race ran Robin. [on the same reading as (ii.a)]
c. [e] Infl [Robin [ran the race]]
The (a) examples show the consequence of moving the subject into [Spec, IP]. The (b)-examples
show what happens when we move the non-subject out of VP into [Spec, IP]. The approximate
underlying structures are given as the (c)-examples, where (i.c) follows (1) in the text.
We assume that a D-structure subject in [Spec, VP] will be assigned an agentive θ-role.
• that it is less acceptable to extract from a postposed heavy subject than
from the subject of LI (see (vii)).
Furthermore, as we have already pointed out in §10.1, only heavy NPs can
appear in what have standardly been taken to be instances where a PP
inversion subject has been raised. In each case, we have strong evidence
from the possible appearance of adverbial material intervening between the
postposed subject and the verb that these subjects are in adjoined positions
outside the VP, just where heavy-shifted objects appear in HNPS. Thus, all of
these differences follow from the view that HI is derived by a generalization of
heavy NP shift to subjects, while in LI the subject is in situ in VP. It is crucial to
note that the role of heaviness here is not simply that of preventing light
postposed subjects from appearing to the right of adverbial or other material,
for were this the case, it would be possible to interpret what we are calling
heavy inversion simply as the occurrence of heavy NPs in the LI structure,
followed by heavy shift of the postverbal NP. Such an interpretation of the
LI/HI distinction is however precluded by the fact that there can be no light
NP inversion at all in the nonfinite cases noted, which would be inexplicable
under the assumption that the heavy NPs which do appear in these construc-
tions originated in postverbal position, as we are claiming for the light NPs.
That is, on the assumption that both HI and LI correspond to the structure
(55) [IP PPi Infl [VP V NPsubj ti . . . ]]
there would be no way to block the possibility of the PP raising to root subject
position in the LI as well as the HI cases, in spite of the fact that, as noted
above, such apparent ‘raising’ cases are restricted to HI, and similarly for the
various other examples we have given of inversion possibilities allowed only
for HI constructions.19

19 One additional piece of evidence that the postverbal NP is in situ comes from superiority
effects: LI, unlike wh-Movement, does not produce strong superiority violations, as shown in
(i)–(iv):
(i) a. Who did what?
b. *What did who do?
(ii) a. Who came out of which room?
b. *Out of which room did who come?
c. (?)Out of which room came who?
(iii) a. Who did you claim t did what?
b. *What did you claim who did t?
(iv) a. Who did you claim came out of which room?
b. *Out of which room did you claim who came?
c. Out of which room did you claim came who?
(Cf. ?Which man saw who?)
Again we see that the PP in Stylistic Inversion displays subject rather than topic or wh-moved
properties: (iic) is essentially comparable in acceptability to (iia), while (iib), containing a wh-
moved PP, displays the strong unacceptability of a classic superiority effect violation. An
anonymous reader writes that some speakers find it difficult to perceive the intended difference
between (ivb) and (ivc), although to our ears it is quite sharp. Let us replace who by how many
people:
(v) a. *Out of which room did you claim how many people came?
b. Out of which room did you claim came how many people?
In our judgment this move strengthens the superiority effect in (ivb) to the point that the
sentence is virtually uninterpretable, but leaves (ivc) unchanged.
stylistic inversion in english 289

10.4 Conclusion
If the arguments presented in the preceding discussion are on the right track,
it is necessary to reassess the data standardly cited by syntacticians offering
accounts of English Stylistic Inversion, so that such accounts should be
expected to correctly predict the well-formedness status of inversion just in
case the subject can be light. In support of this claim, we have presented
evidence from Weak Crossover phenomena that preposed PPs in (light)
inversion are genuine subjects, rather than topicalized constituents, and
then provided several strands of evidence, involving intervention effects,
infinitival and gerundive complements and associated control phenomena,
perception verb complementation, quantifier float, and a variety of other
phenomena, which clearly sort the contexts that allow postverbal heavy
NPs from those which allow light NPs. The data which have in the past been
used to argue that fronted PPs are subjects which can undergo raising are a
further case in point, since as we showed earlier these examples are only well-
formed when the postverbal NP is heavy. The simplest account of these
effects, we believe, is to recognize the possibility that subjects as well as objects
can heavy shift. Such a conclusion in turn raises several important theoretical
questions.
• What licenses the trace in subject position when HI heavy-shifts the
subject?
The general impossibility of heavy-shifting subjects of finite clauses would
lead one to conclude that the resulting subject traces are not properly
governed, giving rise to an ECP effect. But as we have noted earlier, reducing
the that-t effect to the ECP is not entirely straightforward (see note 15).
• Why is HI as well as LI incompatible with an overt object?
In the case of LI, it seems reasonable to take this property as a reflection of the
restriction of LI to unaccusative verbs, which of course do not take a direct

object in addition to their surface subject; but why should the same restriction
carry over to HI, whose derivational history should make it irrelevant whether
or not an object is present? On the contrary, it is standardly assumed that no
examples of Stylistic Inversion are possible with direct objects:
(56) a. A bunch of teenagers in funny hats had put some gum into the gas
tank of our motorcycle.
b. *Into the gas tank of our motorcycle had put some gum a bunch of
teenagers in funny hats.
We believe that any full discussion of this point must take into account the
fact that, although awkward, there are examples of HI containing direct
objects which we believe to be grammatical:20
(57) a. In the backyard were quietly sunning themselves ‖ a group of the
largest iguanas that had ever been seen in Ohio.
b. The economist predicted that at that precise moment ‖ would turn
the corner ‖ the economics of half a dozen South American
nations.21
c. In the laboratory were dying their various horrible deaths the more
than ten thousand fruit flies that Dr. Zapp had collected in his
garden over the summer.
d. Outside in the still upright hangar were heaving deep sighs of relief
the few remaining pilots who had not been chosen to fly in the worst
hurricane since hurricanes had names.
Our analysis predicts that such examples should exist; what remains at issue is
the distinction between cases such as (57) on the one hand vs. (56b) on the
other. We note that the direct objects in the examples in (57) are not
referential. This fact suggests that what allows such cases is that the verb
phrases are thematically intransitive, i.e. no θ-role is assigned to the direct
object. Sun oneself means ‘to sun’, turn the corner in this case is an idiom that
means ‘improve’, die a horrible death means ‘die horribly’, and heave a sigh
means ‘sigh deeply’. Precisely why the absence of a θ-role corresponding to
the object allows inversion to occur is a question for future research.

20 As above, we indicate with the notation ‖ a major prosodic juncture. Such junctures
appear in what we take to be acceptable utterances of these examples.
21 Unquestionably, turn the corner is at least semi-idiomatic. Nonetheless, the fact that this
idiomaticity is preserved under passivization (e.g. The corner was finally turned on July 10, when
the Ostrogoth economy finally emerged from its deep recession) indicates that the corner is indeed
an internal syntactic argument of the verb, which can therefore hardly be regarded as exhibiting
intransitive, much less unaccusative argument structure here. Similar observations hold for
(57d), e.g. After the crisis brows were mopped, deep sighs of relief were heaved, and then everyone
got back to work.
• Finally, why does true Stylistic Inversion—that is, LI—seem, beyond its
pragmatically presentational impact, to be restricted to verbs which can
be interpreted as expressing either motion to a point or maintenance of
a particular physical orientation at some location?
It is well beyond the scope of the present paper to provide detailed discussion
of these issues. In view of the evidence presented above, however, we believe
that there is good reason to reassess much of the literature devoted to
inversion constructions, and to treat Stylistic Inversion proper as a far more
restricted phenomenon than it has previously been considered.
PART III

Computation
11
A reconsideration of Dative Movements
(1972)*
Ray Jackendoff and Peter W. Culicover

Remarks on Chapter 11
The first section of this article is a transformational account of dative alternations
with to and for: V–NP1–to/for–NP2 alternating with V–NP2–NP1. We provided an account of
the fact that the double object construction that is related to to has different
syntactic and semantic properties than the double object construction that is
related to for. The facts discussed in this section in fact suggest, from a contem-
porary perspective, that these alternations are lexically governed constructions,
in the sense of Goldberg (1995).
The remainder of the article is concerned with the fact that A′ constructions
where the gap is the indirect object are less than fully acceptable. This article
was one of the first in the literature to suggest that data that had been
previously thought of as being the responsibility of syntactic rules actually
reflect aspects of the computation of the meaning of a sentence based on its
form. We hypothesized that identification of the gap corresponding to an A0
filler is triggered only when the syntactic context requires it. Hence in the
sentence *Who did you give a book the processor does not posit a gap between
give and a book, because a book satisfies a requirement of the verb give. The
processor expects a gap after the preposition to, but when the end of the
sentence is reached, there is no to. Hence there is no gap for the filler, which
we propose results in a processing error and the judgment of unacceptability.

* [This chapter appeared originally in Foundations of Language 7: 397–412 (1972). It is
reprinted here by permission of John Benjamins.]

11.1 Introduction
Two well-known transformational relationships are the shifts of indirect
objects with to and for.
(1) Bill gave a book to Mary.
(2) Bill gave Mary a book.
(3) Bill bought a book for Mary.
(4) Bill bought Mary a book.
To explain differences between the two processes, standard analyses of the
dative, for example Fillmore (1965), generally postulate two similar Dative
Movement rules, one of which applies to to-indirect objects and the other to
for-indirect objects. In this paper we will show that this analysis can be improved
somewhat within the framework of traditional transformational rules. How-
ever, not all difficulties can be eliminated in this way. In an effort to further
improve the solution, we will show that, on independent grounds, constraints
imposed by the hearer’s perceptual strategy for interpreting sentences play a part
in the unacceptability of certain constructions. These constraints will then be
used to account for the remaining anomalies in the dative shift paradigms.

11.2 The syntax of indirect objects


Let us first try to arrive at the most general transformational solution for the
indirect object shifts. For the purposes of exposition, we will assume that the
underlying order of objects is direct–indirect, and that the Dative Movement
rules permute the objects and delete the preposition of the indirect object. The
alternative—that the opposite order holds in deep structure, and the preposition
is inserted, or not deleted, just in case the permutation of objects takes place—is
also essentially compatible with the arguments to be presented here. However,
we will give some evidence that to and for are present in the deep structure and
sometimes deleted (not inserted) by the Dative Movement transformations.
Given these assumptions, the customary statement of to-dative is (5).
(5) X - V - NP - to - NP - Y
1 2 3 4 5 6
⇒ 1 – 2 – 5 – 3 – Ø – Ø – 6
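Read procedurally, (5) factors the string into six terms, permutes terms 3 and 5, and deletes term 4. The following toy sketch applies that structural change; it is my own illustration (the tuple encoding is not the authors' notation), intended only to make the rule's mechanics concrete:

```python
# Toy rendering of rule (5): X - V - NP - to - NP - Y  =>  1 - 2 - 5 - 3 - Ø - Ø - 6.
# The six terms arrive pre-factored, as a structural description assumes;
# the tuple encoding here is an illustrative device, not the authors' notation.

def to_dative(terms):
    """Permute the two objects and delete 'to' (the two nulled terms)."""
    if len(terms) != 6:
        return None
    x, v, direct, prep, indirect, y = terms
    if prep != "to":
        return None  # structural description not met: the rule does not apply
    return (x, v, indirect, direct, y)

# 'Bill gave a book to Mary' -> 'Bill gave Mary a book'
shifted = to_dative(("Bill", "gave", "a book", "to", "Mary", ""))
```

The rule-ordering questions discussed next then amount to asking whether Passive applies to the input or to the output of such a structural change.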
Now let us consider the ordering between Passive and to-dative. If Passive
precedes to-dative it is possible to derive the following sentences.
(6) John gave a book to Mary. (no rules apply)
(7) A book was given to Mary by John. (Passive only)

(8) John gave Mary a book. (to-dative only)


We observe that with this ordering we cannot derive (9)—
(9) A book was given Mary by John.
—unless to-dative can apply over a null environment, i.e. unless we restate to-
dative as (10). Notice that (10) contains an optional item in its structural
description, so that it would be possible for the indirect object (item 5) to
permute, as it were, around nothing.
(10) X - V - (NP) - to - NP - Y
1 2 (3) 4 5 6
⇒ 1 – 2 – 5 – (3) – Ø – Ø – 6
But even granting that to-dative could apply in a null environment, we still
could not derive (11), below, with the ordering Passive > Dative.
(11) Mary was given a book by John.
This follows from two facts: (1) that Passive can only front the NP next to the
verb1 and (2) that the NP a book must be next to the verb at the time Passive
applies, and the NP Mary cannot.
If we assume the ordering to be to-dative > Passive, we find that we are then
able to derive the following sentences.
(12) John gave a book to Mary. (no rules apply)
(13) John gave Mary a book. (to-dative only applies)
(14) A book was given to Mary by John. (Passive only applies)
(15) Mary was given a book by John. (to-dative, then Passive)
We observe that with this ordering we cannot derive (9) either. The to can
delete only if Mary is moved next to the verb by to-dative; then a book cannot
passivize, since it no longer immediately follows the verb.a

1 This is shown by examples like the following (pointed out by Klima):
(i) a. This table has been eaten at by many famous people.
b. *This table has been eaten food at by many famous people.
c. Food has been eaten at this table by many famous people.
(ii) a. This violin was once played by Heifetz.
b. *This violin has been played the Kreutzer Sonata on by Heifetz.
c. The Kreutzer Sonata has been played on this violin by Heifetz.
Only when there is no direct object intervening between the prepositional phrase and the verb
can the object of the preposition undergo the passive. Cf. also examples (71)–(78).
a Sentences such as (9) are often said to be ungrammatical, although preferable when the direct
object is pronominal, e.g. A book was given her by John. If (9) is ruled out, then the discussion to
follow can be considerably simplified. In fact, since passive applies to the two complement
structures of the VP, we may take give Mary a book to be an instance of the dative construction,
alternating with give a book to Mary. The corresponding passive construction maps the first
postverbal argument to the syntactic Subject, yielding (14) and (15).

Thus with either ordering of to-dative and Passive we are unable to generate
the full range of sentences. To avoid this difficulty we might resort to a
solution like Fillmore’s, involving an extension of the environment of Passive.
Fillmore constructs the rules in such a way that the sequence V + to-NP can be
considered a verb for the sake of the Passive transformation. In this way the
NP a book can be considered to be next to the verb in (13), so that it can be
moved into subject position by Passive, forming (9). A more satisfying
solution will be proposed later on.
Now assume that Passive has been altered in a suitable way so that we can
get the full range of dative and passive sentences (9) and (12)–(15). Now let us
question the direct and indirect objects in these sentences, using the rule of
wh-Movement.
(16) What did John give to Mary?
(17) Whom did John give a book to? (from (12))
(18) What did John give Mary?
(19) *Whom did John give a book? (from (13))
(20) What was given to Mary by John?
(21) Whom was a book given to by John? (from (14))
(22) What was Mary given by John?
(23) Who was given a book by John? (from (15))
(24) What was given Mary by John?
(25) *Whom was a book given by John? (from (9))
How can we prevent the perfectly general rule of wh-preposing from produ-
cing the questionable sentences (19) and (25)? Fillmore utilizes the rather
artificial device of prohibiting the transformation of wh-Attachment from
applying to NPs that are to-indirect objects positioned next to the verb. As
Kuroda (1968) points out, however, wh-Attachment is not a transformation in
post-Aspects generative theory; rather, the base generates the wh-marker with
the noun phrase it is associated with at the surface. If this is the case,
Fillmore’s solution can no longer be stated.
Furthermore, and more important, Kuroda shows that the ungrammatic-
ality of (19) and (25) is not due to their being questions, since the non-preposed

versions (26) and (27) are as acceptable as any other wh-question in which the
wh is not preposed.
(26) John gave whom a book?
(27) A book was given whom by John?

Rather, the ungrammaticality seems to be due to the preposing operation
itself. Other preposing transformations, such as topicalization and clefting,
produce similar contrasts.
(28) Only to me would he give an umbrella.
(29) *Only me would he give an umbrella.
(30) It is to me that he gave an umbrella.
(31) *It is me that he gave an umbrella.
(= Kuroda’s (25)–(28))
Thus we see that the restriction on preposing is independent of the particular
rule under consideration. This general restriction on preposing is still
unexplained.
Turning to for-dative, we find a similar situation, though not quite as bad.
If for-dative follows Passive, the following sentences are derivable.
(32) John bought a new wardrobe for Mary.
(33) A new wardrobe was bought for Mary by John.
(34) John bought Mary a new wardrobe.
This ordering correctly predicts that (35) is ungrammatical.
(35) *A new wardrobe was bought Mary by John.b
However, it does not enable us to derive the grammatical sentence (36).
(36) Mary was bought a new wardrobe by John.2
By ordering for-dative before Passive we are able to generate the full paradigm
(32)–(34), (36), since (36) will then result from the successive application of

b (35) is the counterpart of (9).
2 Passives of the form (36) seem to vary in acceptability, depending on a number of factors, some
of which we can make explicit. There are two semantically distinct for-datives, only one of which
undergoes passive. The first type is exemplified in (32)–(36), where it can be said that as a result of the
event Mary has a new wardrobe. The other type is exemplified by John played a tune for Mary, which
undergoes the dative shift but not the passive analogous to (36): ?*Mary was played a tune by John.
This event does not have as one of its results that Mary has a tune. There also seem to be some factors
of length involved: Mary was bought a book by John seems somewhat less acceptable than (36).

the two rules. The difference between the to-dative and the for-dative lies in
the fact that the passive of the direct object with the indirect object prepos-
ition deleted is grammatical for to-dative (9), but ungrammatical for for-
dative (35). It is the form (9) which requires an alteration to the passive
transformation in Fillmore’s solution.
As in the case of to-dative, questioning all combinations of the for-dative
paradigm produces some questionable sentences.
(37) What did John buy for Mary?
(38) Whom did John buy a book for? (from (32))
(39) What did John buy Mary?
(40) *Whom did John buy a book? (from (33))
(41) What was bought for Mary by John?
(42) Whom was a book bought for by John? (from (34))
(43) ?What was Mary bought by John?c
(44) Who was bought a book by John? (from (36))
Of course, the questions formed from the ungrammatical (35) are ungram-
matical too.
(45) *What was bought Mary by John?
(46) *Whom was a book bought by John?
Still, the problem of accounting for the ungrammaticality of (19), (25), (40),
and (43) remains.
Thus far we have improved on Fillmore’s solution to the problems arising
from the interaction between the dative shifts and Passive. It still remains to
give some account of the restrictions on wh-Movement. One could retain
Kuroda’s solution, in which an ad hoc restriction is placed on preposing
transformations operating on certain dative constructions, and still have a
grammar superior to Fillmore’s with respect to the dative paradigms.d

c As far as I know the unacceptability of this sentence has not been discussed in the
subsequent literature, and remains a puzzle, in view of the acceptability of the corresponding
What was Mary given by John?
d I have omitted here an analysis that attempts to conflate the derivation of the dative
constructions with other cases that involve PP–PP complements. The analysis assumes reorder-
ing of the PP complements and lexically governed deletion of the preposition in the first PP. In
contemporary constructional terms it is far more straightforward to specify that particular verbs
select NP–PP or PP–PP complements, where the PPs are headed by specific prepositions.

Unfortunately, any attempt to restore generality to wh-Movement in terms of
rule-ordering arguments and statements of rules fails. For example, we cannot
order wh-Movement before Preposition Deletion, since it must follow Passive in
order to generate Who was John hit by? Therefore we will present a solution
which may seem rather bizarre, one based on a theory of perceptual strategy.

11.3 Perceptual strategy constraints on acceptability3


Let us suppose the strategy for interpreting a sentence involves making
hypotheses about the deep structure of the sentence on the basis of the
amount of the sentence heard up to a given point. The essential task is to
find out which constituents have been moved out of their deep structure
position and in what deep structure position they originated.e The method is
to notice concatenations that could not occur in the possible deep structures
predicted by the base rules and by the subcategorization and selectional
restrictions of verbs in the sentence.
A rather clear case of perceptual strategy influencing interpretation is the
unusual restriction on the rule Extraposition from NP.4 This rule optionally
moves a relative clause to the end of a sentence to form, for example, (48)
from (47) and (50) from (49).
(47) A man [who was from Philadelphia] came in.
(48) A man came in [who was from Philadelphia].
(49) He let the cats [which were meowing] out.
(50) He let the cats out [which were meowing].
Assuming that this rule can operate freely, moving relative clauses from the
subject to the end of a sentence, we could expect (51) to optionally become (52).
(51) The girl [who is bold] likes the boy.
(52) The girl likes the boy [who is bold].
We could thus incorrectly predict that (52) is ambiguous, having a reading
where who is bold applies to the girl as well as the obvious one where it applies
to the boy. We conclude that Extraposition from NP could not have taken
place in (52).

3 The proposals of this section are similar to those of Bever (1970) and Klima (1970), but were
arrived at independently.
e In contemporary terms the task would be better characterized in terms of assigning
thematic roles to the arguments.
4 This restriction is most extensively discussed in Keyser (1967). Other properties of the rule
are discussed in Ross (1967) and Akmajian (1970).

What is the exact form of the restriction on Extraposition from NP? From
the examples so far, the condition seems to be that the relative clause cannot
cross over another NP. This condition in itself is rather strange. But in fact the
condition must be more complicated than that. Consider the following cases,
which vary from plausible to very bad.
(53) ?The man went to Philadelphia [who loves Mary].
(54) ?*The man kicked the snail [who loves Mary].
[relative clause on man]
(55) ?*The man hit John [who loves Mary].
(56) ?John hit the man in the stomach [who loves Mary].
(57) *The man hit John in the stomach [who loves Mary].
The generalization seems to be that acceptability is inversely correlated with
the plausibility of generating the final relative clause with another, nearer
NP. This is certainly a very strange condition to put on a transformation,
prohibiting it just in case it would produce an ambiguity. It runs counter to all
the usual notions of how structural ambiguities are developed by the grammar.
In terms of a theory of perceptual strategy, this restriction makes a certain
amount of sense. Consider the interpretation of (48). At the stage at which a
man came in has been heard, it is known that the next word to follow will not
be related to in in any way. Who signals the beginning of a relative clause, since
we are not currently in the middle of an NP, and an appropriate NP must be
found for it to apply to. The only eligible one in the sentence is the subject, so
the correct interpretation results. In (52), however, boy is not necessarily the
end of its NP; in particular, a relative clause is the possible continuation of the
NP. Therefore, who occasions no surprise: it is automatically put with boy, and
is given no chance to associate with girl.
Now consider the intermediate cases (53)–(57). In (53), the proper noun
Philadelphia leaves open the possibility of an appositive relative following it,
and so the relative pronoun who to some extent confirms this possibility. On the
other hand, who is an inappropriate relative pronoun for Philadelphia, and the
lack of a pause means that the relative clause cannot be an appositive, so after a
moment’s confusion the interpreter looks for another source for the relative. In
(54) and (55) the plausibility of the relative going with the final NP is higher
than in (53): (54) is only a violation of gender, and (55) only lacks a pause for the
relative to be grammatically associated with John as an appositive. Therefore the
tendency to interpret the relative as semigrammatically associated with the final
NP is stronger, and so attaching the relative to the subject is less plausible.

In (56) and (57), the semantic plausibility of a relative clause modifying
stomach is very low (try to think of a good one), and the gender disagreement
is readily apparent. Therefore another source for the relative is considered.
Apparently the next nearest NP is tried first, since the difference in accept-
ability between (56) and (57) can then be explained by the fact that appositive
relatives do not extrapose; cf. (58).
(58) *John came in, who had been sick a long time.
It is true that this argument involves a great deal of hand-waving, but the
intent is clear: the constraint on extraposition from NP, which is very awk-
ward to state in terms of conditions on application of transformations,
becomes much clearer in terms of the difficulty of correctly interpreting the
resulting strings. By permitting problems in string interpretation as possible
sources of ungrammaticality, we can eliminate this otherwise unexplained
constraint. However, we must leave open for the present the question of how
to incorporate this innovation into the theory of grammar.f
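The graded pattern in (48)–(57) suggests a nearest-first search for a host NP, with the subject reached only when nearer NPs are ruled out. A minimal sketch of that idea follows; it is my own schematization under simplifying assumptions (the tiny feature lexicon is invented, and the account's appeal to pauses and appositive readings is omitted):

```python
# Sketch (my own schematization) of the nearest-first attachment preference
# described for (48)-(57): a clause-final 'who ...' relative tries the
# rightmost NP first and falls back leftward only when that NP is an
# implausible host. The feature lexicon below is an illustrative assumption.

HUMAN = {"man", "girl", "boy", "John"}

def attach_who_relative(nps):
    """nps: the NPs of the clause, left to right; return the host chosen
    for a clause-final 'who ...' relative, or None if none qualifies."""
    for np in reversed(nps):   # the nearest NP is tried first
        if np in HUMAN:        # 'who' needs a human host
            return np
    return None
```

On this sketch (52) never yields the subject reading, since the nearest NP boy wins outright, while in (53)–(54) the mismatch with Philadelphia or snail forces the fallback to the subject, with the degraded acceptability described above.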
For further applications of this theory of perceptual strategy, consider the
following three sentences, all of which sound somewhat curious.
(59) I gave the man that a dog bit a bandage.
(60) The professor that the student believed was arrested by the police died.
(61) Have you seen the man who I want to leave the room in a hurry?5
In actual speech these examples can be straightened out by the judicious use
of pauses at sentence boundaries. Still, they sound a bit stranger than, for
example, (62)–(65).
(62) I told the man that a dog bit a bandage.
(63) The professor that the student liked was arrested by the police.
(64) Have you seen the man who I want to hit?
(65) Have you seen the man who I believed to be sick?
The difficulties in sentences (59)–(61) have to do with the interpretation of the
relative clauses. According to our theory of perceptual strategy, correctly
interpreting a relative clause poses two problems: finding the deep structure
position of the preposed wh-word, and the finding the end of the relative
clause and resumption of the main sentence.

f For an analysis in which the extraposed clause is interpreted in its surface position, see Ch. 6
above and Culicover and Rochemont (1990).
5 This example was pointed out to us by John Limber.

(59) creates difficulties in both of these respects at once. Many people try
to interpret it as they would (62), with a bandage as part of the subordinate
clause. This is because bit a bandage is an actually occurring sequence in a
single clause: hence the gap into which the relative pronoun may fit is not
immediately apparent. But if a bandage is part of the VP, there is no place in
the relative clause for the relative pronoun, since bite can only take a single
object. Furthermore, if a bandage is part of the relative clause, the main
verb give will not have been provided with its full range of complements.
Thus the logical decision to put a bandage in the relative clause results in
confusion.
(60) is an example of the opposite problem of interpretation: the end of the
relative clause is guessed to be sooner than it actually is. The critical part of
the sentence is the sequence believed was, which does not occur unless an NP
has been moved away, and which therefore signals that a transformation has
applied. But apparently the first hypothesis is that the sentence will be of the
same form as (63), with was as the verb of the main sentence. Thus the real
main verb, died, comes as a surprise.
In (61) the problem is again that of finding an appropriate place for the
relative pronoun, which has been fronted from the position after want. Since
want to leave is a permissible string in a non-relativized sentence, I, rather
than the relative pronoun, is interpreted as the subject of leave. The gap for
the relative pronoun to fit into is assumed to be further to the right, as in (64).
Then, when no gap occurs, the usual confusion results. Note that (65)
presents no such problem, since believe to be sick does not occur unless an
NP has been moved away from after believe.
To see more clearly that an appeal to perceptual strategy is useful here,
consider how the distinction in acceptability between (60) and (64) would
have to be captured if it were a restriction on transformations. The wh-
preposing rule would have to be prohibited from operating in a very particu-
lar situation—when it is trying to prepose an NP from the position circled in
(66), i.e. the NP between V and VP, just in case the preceding verb permits
complement subject deletion to take place (the difference between want and
believe).
(66) [S NP [VP V NP VP]]

Like the transformational constraint needed for extraposition from NP, this
restriction seems highly unlikely. A solution employing perceptual strategy
seems to give a much more motivated account of the restriction.g

11.4 Application of perceptual strategy to dative


movements
The restriction on wh-preposing out of dative constructions is almost as odd
as that needed to prevent (61). Furthermore (although this is not a particu-
larly strong argument), the offending sentences have the same ring of strange-
ness about them as (61). This suggests using perceptual strategy constraints to
account for anomalies in the dative paradigms.
Under this hypothesis, let us work through the operation of finding the
deep structures of the various questions associated with the dative construc-
tions, which we repeat here for convenience.
(16) What did John give to Mary?
(17) Whom did John give a book to?
(18) What did John give Mary?
(19) *Whom did John give a book?
(20) What was given to Mary by John?
(21) Whom was a book given to by John?
(22) What was Mary given by John?
(23) Who was given a book by John?
(24) What was given Mary by John?
(25) *Whom was a book given by John?
(37) What did John buy for Mary?
(38) Whom did John buy a book for?
(39) What did John buy Mary?
(40) *Whom did John buy a book?
(41) What was bought for Mary by John?

g Recent computational work attributes certain judgments of unacceptability to ‘surprisal’,
i.e. the predictability of the continuation of a (parsed) string of words. See e.g. Hale (2003) and
Levy (2008).

(42) Whom was a book bought for by John?


(43) ?What was Mary bought by John?
(44) Who was bought a book by John?
In each sentence, the presence of the wh-word signals that the interpreter of
the sentence must look for a gap into which the wh-word can fit. In (16), to
immediately follows give, which can never happen in a declarative sentence.
One can thus conclude that what must have been fronted from between
these two words. In (18), give Mary is a permissible sequence in a declarative
sentence, so what need not have come from between them. In fact, if it had,
the impossible string *give something Mary would have to be the correspond-
ing declarative VP form. However, nothing follows Mary, and the verb give
requires two objects. Give Mary something is a possible declarative VP form,
so one can conclude that what has been fronted from the end of the sentence.
In (17), give a book is a possible string in a declarative VP, and the bare
preposition at the end shows that whom must have come from the end of the
sentence. In (19), again give a book is a possible string, and so no gap is noticed
at the stage where Whom did John give a book has been perceived.h At this
stage, the listener’s hypothesis is that whom has been fronted from the end;
hence the preposition to is expected to follow book, as in (17). Imagine the
hearer’s consternation when the expected to does not arrive. The sentence is
therefore judged unacceptable, since it is expected to be (17) and then fails to
conform to that expectation. A similar analysis can be constructed for the
parallel cases (37)–(40).
In (20), the gap is noticed as in (16). In (21), the sequence to by signals the
gap. In (23), the who comes from surface subject position, as can be seen from
the fact that there is no NP intervening between the auxiliary and the main
verb. In (24), what is seen to be the subject for the same reason. Similar
analyses can be performed for (41)–(44). This leaves (22), (25), and the
strangeness of (43) to be accounted for.
Let us suppose that one strategy used in reconstructing underlying
structures is that NP positions that one can be sure of (such as the postverbal
position of the surface object of the passive) are established first, then
NPs which have been moved away from arbitrary positions (such as NPs
fronted by wh-Fronting) are fitted into remaining gaps. Notice that this
hypothesis is not consistent with the hypothesis that one reconstructs deep
structures by doing the transformations in reverse: on that view, in
reconstruction wh-preposing would precede Passive, rather than the other way
around, as we are proposing.

h A more contemporary characterization of what happens here is that the gap is posited, but is immediately suppressed by the presence of the following NP that can serve as the direct object.

a reconsideration of dative movements 307
In (22), then, Mary will be recognized as coming from a position directly
after the verb. This yields the string (intermediate in the process of interpret-
ation) give Mary. Since give requires two objects before by occurs, the gap is
recognized to be after Mary as soon as by is perceived. In (25), undoing the
passive gives the string give a book. As in (19), this is a possible string, so it is
expected that to will follow. When instead by follows, a gap is recognized, but
it is not the expected gap, and hence the sentence is judged unacceptable.
This leaves the case of (43), which we frankly find to be a mystery. Note the
slight unnaturalness of the passive Mary was bought a book, which may be due
to the passivization of an optional for-object (to-objects are obligatory),
leaving no trace of the characteristic preposition.6 Who was bought a book
seems similar in acceptability. (43) is somewhat worse, perhaps because at the
stage at which only what was Mary has been perceived, the most suggestive
hypothesis about the structure is a continuation along the lines of what was
Mary doing. This may interact with the slight unnaturalness of the actual
declarative form to produce some confusion.
None of these problems concerning questions arise with other verbs that
permute objects, where both objects are PPs.
(67) Who did you speak to about the movie?
(68) What did you speak to Harry about?
(69) Who did you speak about the movie to?
(70) What did you speak about to Harry?
(71) Who was the movie spoken about to?
(72) What was Harry spoken to about?
(73) Who did he credit with the discovery?
(74) What did he credit Bill with?
(75) Who did he credit the discovery to?
(76) What did he credit to Bill?
(77) Who was the discovery credited to?
(78) What was Bill credited with?

6 Cf. also fn. 2 above in this connection.

In the first set of examples there is always a bare preposition signaling the gap.
In the second, there is either a bare preposition or a string V+P, which also
signals a gap, since in declarative form the verb is always followed by an
NP. Thus these cases differ from the to- and for-dative in that their indirect
objects leave noticeable gaps when they are fronted from postverbal position.
This is not the case with true to- and for-dative indirect objects.
This approach explains nicely Kuroda’s observation that the restriction has
to do with fronting, not with the process of questioning. In echo questions,
where the wh-phrase is not moved from its position, it is obvious that no
problem will arise in finding where it came from. Likewise, it explains why
corresponding sentences are bad in the topicalized and cleft constructions
(28)–(31). Again the difficulty lies in finding the gap in the VP from which
the preposed element was removed, and the same problem of being unable to
correctly detect the gap arises in case the indirect object has been fronted or
deleted from postverbal position. An explanation in terms of perceptual
strategy thus accounts for the fact that three independent rules have identical
strange restrictions.
The fact that our approach to these problems appeals to performance
should not be interpreted as sweeping the problem under the rug. A general
solution within the bounds of statement and ordering of transformations
seems out of the question; if we wish to preserve the generality of the
transformations, we must appeal elsewhere. The fledgling theory of percep-
tual strategy we have presented seems to be in general agreement with the
models proposed in Fodor and Garrett (1967) and Bever (1970), developed
from the results of experimental work.
Nor should the fact that certain sentences appear to be rejected on grounds
of performance be interpreted as an indication that the competence/perform-
ance distinction ought to be abandoned. The distinction between the rules of
the grammar and how the rules are used by the speaker or hearer to create or
interpret sentences is still scrupulously maintained. All that is changed is that
it is no longer so obvious what sentences are to be generated by the rules: we
cannot rely entirely on intuition to determine whether an unacceptable
sentence is grammatical or not (using ‘grammatical’ in the technical sense,
‘generated by the grammar’). Though this makes the linguistic theory of
Aspects (Chomsky 1965) more difficult to apply in practice, it does not by
any means make it conceptually unsound. Rather, the appeal to performance
made here is precisely parallel to the case of center-embedded sentences
discussed in Aspects, chapter 1, section 2, which is used to illuminate and
sharpen the competence/performance distinction.
12

Markedness, antisymmetry, and complexity of constructions (2003)*
Peter W. Culicover and Andrzej Nowak

Remarks on Chapter 12
Our concern in this chapter is with the interactions between language change,
language acquisition, markedness, and computational complexity of map-
pings between grammatical representations. We demonstrate through a com-
putational simulation of language change that markedness can produce ‘gaps’
in the distribution of combinations of linguistic features. Certain combin-
ations will not occur, simply because there are competing combinations that
are computationally less complex. We argue that one contributor to marked-
ness in this sense is the degree of the transparency of the mapping between
superficial syntactic structure and conceptual structure. We develop a rough
measure of complexity that takes into account the extent to which the syn-
tactic structure involves stretching and twisting of the relations that hold in
conceptual structure, and we show how it gives the right results in a number
of specific cases.
This work was followed up in Culicover and Nowak (2003) and more
recently Culicover (2013). It elaborates on the view that much of the explan-
ation of what constitutes the syntax of a language, and syntax in general, derives
from the properties of the computation of the form–meaning correspondence,
viewed in terms of the reduction or avoidance of complexity.

* [This chapter appeared originally in Pierre Pica and Johan Rooryck (eds), Linguistic Variation Yearbook, Vol. 2 (2002). It is reprinted here by permission of John Benjamins.]

12.1 Introduction
One of the strongest arguments for the thesis that the human mind possesses
a Universal Grammar (UG) with specific grammatical properties is that
languages do not appear to have arbitrary and uncorrelated properties.
What we find, rather, is that the properties of languages cluster, and that
there are asymmetries among the logical possibilities. For example, VSO
languages are always prepositional, and SOV languages are usually postpos-
itional (Greenberg 1963: 78–9). There are languages that express wh-questions
using leftward movement to a peripheral position in the clause, and there are
languages that express wh-questions without overt movement. But there do
not appear to be languages that express wh-questions using rightward move-
ment to a peripheral position in the clause.
It is natural, given observations such as these, to posit that they are direct
reflections of UG, which the language learner draws upon in choosing or
constructing grammars. However, there are two other possibilities that have
to be ruled out before such a conclusion can be drawn. First, the clustering of
properties and the absence of certain logical possibilities may be due to social
forces. In such a case we would not expect to find the same asymmetries in
different parts of the world where languages are not genetically related or in
contact. Second, these asymmetries may be due to the interaction between the
grammatical or processing complexity of certain constructions and social
forces. On this view, all of the logical possibilities are linguistic possibilities,
but those that are more complex tend to lose out over time to their less
complex competitors as linguistic knowledge is transmitted from generation
to generation in a network of social interactions.
The intention of this paper is to explore and make somewhat more precise
these scenarios. We make the background assumption that language change
occurs in part as the consequence of different learners being exposed to
different evidence regarding the precise grammar of the language that they
are to learn. Following the original insight of Chomsky (1965), we assume that
learners choose the most economical grammar consistent with their experience,
and even overlook counterevidence to the most economical solution unless the
counterevidence is particularly robust. It is reasonable to understand economy
in terms of the complexity of the grammatical representation that is to be
learned (although there are many other ideas around). To the extent that
learners reduce complexity we will then expect language change to reflect this
preference in the relative ubiquity of certain grammatical devices compared
with others, and even in the appearance of universals (Briscoe 2000).
We will begin by illustrating the ways in which language change gives rise to
correlations of properties; it will be demonstrated that some combinations are
markedness and antisymmetry 311

excluded purely as a consequence of social factors that have nothing to do
with their linguistic content. We then note that if there is a bias in favor of
some combination of properties, this results in a uniform pattern that cannot
be explained in purely social terms.
This observation takes us to a consideration of the factors that determine
complexity in this context. We suggest, following up on an idea in Culicover
(1999) based on Hawkins (1994), that the complexity in this case is that of the
mapping between strings of words and conceptual structure (in the sense of
Jackendoff 1990). In a fairly transparent sense such mappings define con-
structions, and the relative generality of a construction is determined by its
grammatical complexity.1

12.2 Change and clustering


Imagine a society of speakers of a language, some of them competent speakers
and some of them learners. Each speaker interacts with each of the other
speakers with some frequency, in part as a function of the distances between
them. (Distance may be physical and/or social.) As a consequence of drift,
noise in the information channels, conscious innovation, and contact with
other languages, there will be linguistic diversity in this society. Some learners
may have considerable experience with diversity, others may have very little.
Over the course of generations, learners interact with speakers whose lan-
guage is determined by interactions with similar speakers, so that there is a
consistency of grammar that may distinguish the social group from another,
more distant group.

12.2.1 The simulation model


In order to test the general properties of the interaction between language
learning and language change, we developed a simulation model of social
interaction based on the theory of social impact due to Latané and computa-
tional simulations based on this theory developed at the Center for Complex
Systems at the Institute for Social Studies of the University of Warsaw by
Andrzej Nowak and his colleagues.2 Our intuition was that the transmission
and clustering of linguistic properties through social contact should display

1 This notion of construction is related to that of Construction Grammar (see e.g. Goldberg 1995), in that we assume, with Jackendoff (1990), that grammatical knowledge consists of syntax–semantics correspondences.
2 Latané (1996), Nowak et al. (1990). Nettle (1999) independently hit upon the idea of using the Latané–Nowak approach to Social Impact theory in a computational simulation of language change.

the essential properties of the transmission and clustering of any cognitive features.

12.2.2 Gaps
12.2.2.1 How gaps arise
We suppose for the sake of the simulation that the class of possible grammars
of natural languages can be characterized entirely in terms of values of
features.3 A prevalent view in current linguistic theory is that most if not all
of the most theoretically interesting aspects of language variation, language
change, and language acquisition can be accounted for in terms of a small set
of binary features, called ‘parameters’. For our purposes, however, it is suffi-
cient to assume that whatever the features are, however many there are, and
whatever values they have, learners are influenced to adopt the values of their
community through social interaction.
Our simulation supposes that there are three two-valued features, which
define eight distinct languages.
(1) +F1,+F2,+F3
+F1,+F2,−F3
+F1,−F2,+F3
+F1,−F2,−F3
−F1,+F2,+F3
−F1,+F2,−F3
−F1,−F2,+F3
−F1,−F2,−F3
Gaps occur when certain feature combinations are not attested. Our simula-
tion shows that gaps may arise over the course of time, as the values of two of
the features become strongly correlated. To take a simple example, if the
geographical distribution of [−F2] becomes sufficiently restricted, it may
fail to overlap with [+F1]. That is, [+F1] and [+F2] become highly correlated.
In such a case, some of the languages, namely those with [+F1,−F2], will cease
to exist. Such a situation may occur simply as a consequence of the social
structure, and in itself tells us nothing interesting about the relationship
between [+F1] and [−F2].
For the simulation, we may assume that at the outset of the simulation all
possible combinations of features are possible (the ‘Tower of Babel’ state). The
reasoning is that if certain combinations fail to exist after some period of
time, this fact must be due to social factors, since there are no initial gaps. If

3 In fact this must be true in a trivial sense; see Culicover (1999) for discussion.

we allowed for initial gaps, i.e. innate implicational universals, then the
appearance down the line of gaps would not provide any evidence about
the effect of social interaction on the distribution and clustering of linguistic
properties.
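The dynamics just described can be sketched in a few lines of Python. This is a deliberately simplified stand-in for the Latané–Nowak social impact model, not the simulation actually used here: agents sit on a toroidal grid, and at each step every agent adopts, for each feature, the value held by the majority of its eight immediate neighbours. The grid size and the majority-adoption rule are our illustrative assumptions.

```python
import random

SIZE = 20          # the simulation in the text uses a 50 x 50 grid
N_FEATURES = 3     # three binary features define eight 'languages'

def make_grid():
    # 'Tower of Babel' start: every agent gets random feature values
    return [[tuple(random.choice((+1, -1)) for _ in range(N_FEATURES))
             for _ in range(SIZE)] for _ in range(SIZE)]

def neighbours(grid, i, j):
    # the eight surrounding cells, wrapping at the edges
    return [grid[(i + di) % SIZE][(j + dj) % SIZE]
            for di in (-1, 0, 1) for dj in (-1, 0, 1)
            if (di, dj) != (0, 0)]

def step(grid):
    # each agent adopts, feature by feature, its neighbours' majority value
    return [[tuple(+1 if sum(n[f] for n in neighbours(grid, i, j)) >= 0 else -1
                   for f in range(N_FEATURES))
             for j in range(SIZE)] for i in range(SIZE)]

def languages(grid):
    return {cell for row in grid for cell in row}

grid = make_grid()
print(len(languages(grid)))   # almost surely 8 at the random start
for _ in range(200):
    grid = step(grid)
print(len(languages(grid)))   # often fewer: gaps have arisen
```

Repeating such runs gives the kind of distribution reported below: the exact number of surviving languages varies from run to run, but losses accumulate as clusters of like-featured agents expand.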
Figure 12.1 shows the random distribution of feature values for three
features in a population of 2500 (= 50 × 50). The upper left-hand image
shows the distinct languages as differences on the gray scale. The other images
show the distribution of + and − values for the three features FIRSTs,
SECONDs, and THIRDs.
The population of each of the eight languages is shown in the histogram in
Figure 12.2. As can be seen, the languages are distributed more or less evenly
over the entire population, as would be expected from a randomized assign-
ment of feature values.
We have omitted intermediate steps in the simulation due to limited space.
After 150 steps the distribution of languages and features is as in Figure 12.3. The
histogram in Figure 12.4 shows the population levels of the eight languages at
this point. The loss of languages illustrated in this particular instance of the
simulation is not unique. It is a consequence of the particular assumptions

[Figure 12.1 here: four grid maps (Map of languages, Map of FIRSTs, Map of SECONDs, Map of THIRDs)]
Figure 12.1. Initial random distribution of feature values



[Figure 12.2 here: histogram of the eight languages]
Figure 12.2. Population of the eight languages

[Figure 12.3 here: four grid maps (Map of languages, Map of FIRSTs, Map of SECONDs, Map of THIRDs)]
Figure 12.3. Distribution of languages and features after 150 steps

made in the simulation about how individuals interact in the network.
Running the same simulation under the same interaction parameters yields
a different pattern of features and languages each time, but the results are
essentially the same. We repeated this simulation 100 times. The chart in
Figure 12.5 shows the number of times a given number of languages remained
in the simulation after 200 steps.

[Figure 12.4 here: histogram of the eight languages after 150 steps]
Figure 12.4. Population of languages after 150 steps

[Figure 12.5 here: bar chart; x-axis ‘Number of languages at step 200’, y-axis ‘Number of runs’]
Figure 12.5. Loss of languages in repeated simulation

In 50 of the 100 runs of the simulation there were eight languages after 200
steps. But in 32 runs there were 7 languages, in 10 runs there were 6 languages,
and so on. So while the precise number of languages that will remain after a
certain number of steps is not predictable, it is clear that gaps in the set of
languages can and will arise over the course of time as a consequence of the
interaction in the network. The chart in Figure 12.6 shows that over a longer
time span the number of languages for the same simulation tends to decline.

12.2.2.2 Gaps and bias


Let us now introduce bias into our simulation. Suppose that a particular
combination of features, say [+F1, –F2], is less preferred than the other three
combinations of these two features. On any run of the simulation model the

[Figure 12.6 here: bar chart; x-axis ‘Number of languages’, y-axis ‘Number of runs’, at step 1,000]
Figure 12.6. Distribution of languages after 1,000 steps

results will look like those we have already seen. However, on every run of the
simulation model the results will be more or less the same, in that there will be
gaps or imminent gaps in [+F1, –F2] languages. It is known that simulations
that assume bias in general show a clustering towards the same stable state;4
the strength of the bias determines the predictability of the outcome.
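The effect of bias can be illustrated with a toy replicator sketch (our construction, not the model used in the simulation reported above): learners in a well-mixed population copy a randomly sampled speaker, but accept the dispreferred combination [+F1, –F2] only with reduced probability.

```python
import random

BIAS = 0.9   # illustrative: relative adoption rate of the dispreferred combination
COMBOS = ["+F1+F2", "+F1-F2", "-F1+F2", "-F1-F2"]

pop = {c: 250 for c in COMBOS}   # 1,000 speakers, evenly distributed
for _ in range(2000):
    weights = [pop[c] for c in COMBOS]
    # a learner samples a model speaker in proportion to frequency...
    model = random.choices(COMBOS, weights=weights)[0]
    # ...but rejects the dispreferred combination part of the time
    if model == "+F1-F2" and random.random() > BIAS:
        continue
    # the learner's grammar replaces that of a random existing speaker
    old = random.choices(COMBOS, weights=weights)[0]
    pop[old] -= 1
    pop[model] += 1

print(pop)   # '+F1-F2' tends to be the smallest group, drifting toward a gap
```

However a single run comes out in detail, the biased combination loses speakers in expectation on every step, which is why repeated biased runs cluster around the same outcome while unbiased runs do not.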
This behavior of the simulation model suggests that it might be productive
to look at the content of particular feature combinations in order to deter-
mine what it is about them that yields more or less complexity. A number of
candidates for complexity should be considered.
Optimality theory (OT) as applied to syntax posits that particular struc-
tures are produced by rules that violate various constraints. Given a particular
formulation that captures a general tendency or a universal, it would be
natural to ask what it is about the particular constraints that yields the
observed ranking, since OT theory itself is not a theory of where the rankings
come from. On the other hand, OT allows for different rankings of the same
constraints, which suggests a priori that it might not shed much light on the
question of whether there is an independent universal metric that ranks
particular structures with respect to complexity.
Chomsky’s Minimalist Program (1995) proposes a measure of economy
that ranks derivations. The metric is formulated in terms of formal operations
and does not directly address the superficial properties of the languages
produced. From the perspective of the learner it is the superficial properties

4 This is demonstrated in simulations by Nowak et al. (1990). Kirby (1994) notes the role of bias in change, while Briscoe (2000) has constructed computational simulations of the evolution of language in which biases play a major role in determining the ultimate outcomes.

that are most salient (or at least, for us, putting ourselves in the position of the
learner). One cannot rule out the possibility that there is a relationship
between derivational economy and superficial properties of the strings to be
processed by the learner, but nothing along these lines springs to mind. See
Jackendoff (1997) for discussion of the fact that derivation itself is far from
being a necessary component of a descriptively adequate account of human
language, as well as a vast amount of research in non-derivational theories,
especially HPSG.5
Parsing theory may offer some insight into what goes into the complexity
of a particular string, in terms of the extent to which the structure corre-
sponding to the string is transparently determined by the string.
Learnability theory has also been concerned with complexity, not so much
the complexity of individual examples as the complexity of a system of
examples with respect to the grammar that accounts for their properties.

12.3 Markedness and computational complexity


12.3.1 OT
OT posits that knowledge of language can be expressed in terms of the ordering
of constraints. The well-formed expressions of a language are those that opti-
mally satisfy the constraints. In principle there may be more than one way in
which an expression can satisfy the constraints; the ranking of the constraints
relative to one another determines which of these is optimal.
Let us take a familiar artificial example. Suppose that there is one constraint
to the effect that some category α must appear in clause-initial position, call it
‘Move’, and another constraint that says that categories do not appear in other
than their canonical position, call it ‘Stay’. We may have two rankings of these
two constraints:
(2) Stay > Move
(3) Move > Stay
Consider a string of the form in (4).
(4) αi [ . . . ti . . . ]
This string is optimal with respect to (3), but not with respect to (2). The
tableaux in (5) illustrate.

5 The exchange in Natural Language and Linguistic Theory regarding the MP (beginning with Lappin et al. 2000) does not offer any particularly good motivation for derivational economy, in our view, but below we suggest an incompatible alternative view of derivational complexity that might be more satisfying.

(5) a. String:                 Stay    Move
       αi [ . . . ti . . . ]    *!
       [ . . . αi . . . ]               *

    b. String:                 Move    Stay
       αi [ . . . ti . . . ]            *
       [ . . . αi . . . ]       *!

In (5a) the movement string is ill-formed with respect to the more highly
ranked constraint, Stay, while the non-movement string is well-formed with
respect to this constraint. The reverse situation holds in (5b). Thus we have
grammars for two languages, of which one requires movement and the other
disallows it. The only difference between the two grammars in this case is the
relative ordering of the constraints. This is the device for representing lan-
guage variation in OT.
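The evaluation logic behind such tableaux is simply lexicographic comparison of violation profiles, which can be made explicit in a short sketch (the constraint names come from the toy example above; the candidate encodings are ours):

```python
# Violation counts for the two candidate strings of (4)/(5)
CANDIDATES = {
    "alpha_i [ ... t_i ... ]": {"Stay": 1, "Move": 0},   # movement
    "[ ... alpha_i ... ]":     {"Stay": 0, "Move": 1},   # in situ
}

def optimal(ranking):
    # compare violation vectors in ranking order; fewest violations on the
    # highest-ranked constraint wins, with ties passed down the ranking
    return min(CANDIDATES,
               key=lambda c: [CANDIDATES[c].get(k, 0) for k in ranking])

print(optimal(["Stay", "Move"]))   # in-situ candidate wins, as in (5a)
print(optimal(["Move", "Stay"]))   # movement candidate wins, as in (5b)
```

Reranking the same two constraints flips the winner; nothing in the mechanism itself says which ranking is preferred, which is exactly the question about markedness raised below.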
An account of this type raises two fundamental questions. First, what
determines the set of possible constraints? Second, if some orderings of
constraints are preferred to others, why is this the case? Beyond this there
are difficult questions of computability and learnability (Tesar 1995).
In OT the set of possible constraints is determined by Universal Grammar.
This much is not controversial, since any theory of grammar must provide some
account of what the possibilities are that languages may choose among.6 The
critical question has to do with the rankings. In some cases there appears to be a
natural ordering of the constraints, but there is nothing in the theory per se that
rules out any particular orderings. If we find that there is a preferred ordering,
this ordering of the constraints is an accounting of or an embodiment of the
markedness relations, in some sense. But of course, in addition to representing
markedness, we would like to be able to explain where it comes from.
Bresnan (2000) characterizes markedness in syntax in terms of the corre-
spondence between representations, in particular, c-structure and f-structure:
“there is not a perfect correspondence between the categorial (c-structure)
head and the functional (f-structure) head.” We believe that the notion of
correspondence in general is the right one for the purpose of characterizing
optimality; let us go back to the most primitive correspondence, however, that
between sound and meaning, in order to find an explanation for markedness

6 Matters become somewhat more complex if we attempt to derive some of the constraints from functional considerations, rather than simply assume that they are all part of UG. For discussion, see Newmeyer (2002) and Aissen and Bresnan (2002).

relations. If, as we suggest in the next section, markedness in the end corres-
ponds to the complexity of mapping between strings and conceptual struc-
tures, an OT account, to the extent that it correctly captures the markedness
relations, is parasitic on the underlying correspondence that is ultimately
responsible for complexity.

12.3.2 The basis for markedness


12.3.2.1 The Derivational Theory of Complexity
We take it as given that the job of the grammar that the learner constructs or
acquires is to map strings of words into conceptual structures and vice versa.
This mapping is not one-to-one. A word or string of words may correspond
simultaneously to several disjoint parts of the conceptual structure (CS), and
one part of the CS may correspond to several disjoint substrings. The hier-
archical structure of CS does not correspond in a straightforward way to the
ordering of the string. In the early days of generative grammar, transformations
of phrase markers representing or corresponding to aspects of meaning,
especially argument structure, were a device for capturing some of these
mismatches. Given some canonical deep structure representation, the com-
plexity of the mapping could be measured roughly by the number of oper-
ations required to get the string from the deep structure.7 This was called the
“derivational theory of complexity” (DTC),8 and was thoroughly repudiated
by the end of the 1970s. Bresnan (2000) argues against an updated version as it
appears in the OT syntax of Grimshaw (1997), formulated in terms of move-
ments of heads to functional categories and of phrases to Spec.
The problem with the DTC was that it calculated complexity on the basis of
the number of transformational operations, and many of these operations
were simply formal housekeeping devices required by the transformational
theory of the time, such as Affix Hopping. While the number of such house-
keeping devices might differ from sentence to sentence, there was no evidence
that they contributed at all to relative processing complexity. But the DTC
contains a core of insight. The important transformational operations that
contribute to complexity are those that deform the canonical deep structure
so that contiguous portions of the string do not correspond to contiguous
portions of the deep structure. These correspondences constitute mismatches
that the language learner and the language processor have to figure out.9

7 Deep structure was renamed D-structure in subsequent syntactic theory.
8 Brown and Hanlon (1970); Fodor et al. (1974).
9 For more on mismatches, see Culicover and Jackendoff (1995; 1997; 1999), among many others.

To take a simple example, consider extraposition of relative clauses.

(6) A man called who wants to buy your car.

The interpretation of this example is ‘a man who wants to buy your car called’,
but the relative clause and the head that it modifies are not adjacent in the
string. Hence there is a mismatch between the hierarchical structure and the
string, illustrated in (7).10

(7) [Diagram: the structure [ [ a man [ who wants to buy your car ] ] called ] mapped onto the string ‘a man called who wants to buy your car’, with mapping lines from the head and the relative clause crossing]

The crossing of mapping lines and the breaking up of the structure of the
subject illustrate the mismatch. (The crossing has nothing to do with linear
ordering in the structure, but with the way we display the hierarchical
organization and how it maps into the string.)
Intuitively, discontinuity of the sort illustrated in (7) does not contribute
significantly to processing complexity. If this intuition is correct, it would
suggest that discontinuity in itself is not problematic. Rather, complexity
arises when there are factors that interfere with the resolution of the
discontinuity.11 In the case of extraposition, on the assumption that extra-
position is not inherently complex, this may well be because it is treated as a
special case of binding, along the lines suggested by Culicover and Rochemont
(1990). The core idea, in this case, is that processing of the linear order of

10 There are several familiar mechanisms for representing discontinuity in natural language, including movement and passing features of some gap within the larger string, so that the entire string inherits the ability to license the ‘moved’ constituent. The formal devices for capturing this type of relationship are not at issue here. The main point is that the mismatch introduces a level of complexity into the mapping, both from the perspective of computing it for a given string, and from the perspective of determining its precise characteristics on the basis of pairs consisting of string and corresponding CS.
11 It is often suggested that extraposition and other rightward movements improve processing by reducing center-embedding. See Hawkins (1994) and Wasow (1997).

words produces a structure of the form in (8) at the point at which the
extraposed constituent is encountered.

(8) [Diagram: the partial structure [ [ a man ] called ] built over the string ‘a man called who wants to buy your car’ before the relative clause is attached]

Processing of the relative clause creates a predicate that must be applied to the
representation of an object in CS; in this case the only available antecedent is
the CS representation of a man. Mapping (8) into (7) depends on the extent
to which this antecedent is computationally accessible.a It is this accessibility
that we believe underlies the complexity of the mapping between strings and
CS, both for learners and for adult language processors, especially in the case
of discontinuity but in other cases as well.12
This takes us close to a familiar idea in the domain of human sentence
processing. Constituents that have been processed and interpreted are in
general accessible to subsequent operations that require retrieval of their
meanings (Bransford and Franks 1971); at the same time, the actual form of
these constituents is difficult to retrieve as sentence processing continues.13
One of the key ideas in this work is that local relations are easier to compute
than more distant relations, which require memory for the elements that
occur earlier. Memory may degrade with time or it may be overloaded by the
need to perform multiple tasks; or it may be disordered by the need to
perform multiple similar tasks. All of these are logically possible, and empir-
ical evidence exists to suggest that they are in fact realistic problems for a
language processor. Again, we suggest that the language learner faces similar
problems. The bottom line, other things being equal, is that distance in the
string between elements that are functionally related to one another in the
interpretation of the string contributes to complexity of mapping that string
into CS.

a
The discussion of extraposition in Ch. 11 identifies some factors that may render a
particular NP less accessible as an antecedent.
12
Hence we follow the lead of Berwick (1987), who saw the connection very clearly.
13
There are many additional complexities, of course. See Kluender (1998) for a discussion of
some of these.
322 explaining syntax

A further contributor to complexity of the mapping is that CS is not the
only complex hierarchical structure that is mapped onto the string. There is
also discourse structure, which we take here to be the representation of topic
and focus. To some extent, which varies from language to language, these
aspects of the discourse structure are expressed in terms of word order. In
English, for example, a topic may be identified through extraction to sen-
tence-initial position (Prince 1987). Focus in certain languages is marked by
extraction to a left peripheral position (as argued in a number of papers in
E. Kiss 1992). The possibility that such relations are marked in a given
language introduces an additional component of complexity to the mapping
between the string and its interpretation.14
A measure of complexity that intuitively falls under this idea of complexity
concerns the extent to which the order of words in a sentence corresponds
uniformly to its branching structure. Hawkins (1994) has argued for the view
that “words and constituents occur in the orders they do so that syntactic
groupings and their immediate constituents can be recognized (and pro-
duced) as rapidly and efficiently as possible in language performance”
(p.57). Hawkins shows that different constituent orders require different-
sized spans of a string and corresponding phrase structure in order to
determine what the immediate constituents are. The differences “appear to
correspond to differences in processing load, therefore, involving the size of
working memory and the number of computations performed simultan-
eously on elements within this working memory” (60).15 The contribution
of distance is not restricted to overt movement. In the case of so-called ‘LF’
movements, where an operator has scope over a region of a sentence, there is a
measurable distance between the operator and the boundaries of what it takes
scope over.
The direction that these observations point to is that one key to complexity,
in the sense of language acquisition at least, and its impact on language
change, is not formal syntactic complexity in the sense of the derivation of
the phrase marker. Rather, it is the complexity of the syntactic construction as

14
Of course, we could suppose that CS includes a representation for discourse structure as
well as a representation for argument structure, but this would not simplify the mapping
problem, since we would then be dealing with a more complex CS with more possibilities.
15
One minor concern with the explanatory force of this argument is that we might have
expected that human memory would have evolved so as to overcome the problems offered by
non-uniform branching. Of course there are many reasons why this would not have happened,
and it is probably impossible to settle the issue. Shifting the burden of explanation to language
acquisition rather than language processing sidesteps this problem, since we probably do not
want to attribute to early learners the adult’s capacity to store and process long strings of
linguistic material. See §12.3.2, and fn. b below.

a way of conveying the corresponding conceptual structure. The construction
may be sui generis, as is suggested by the example of Culicover and Jackendoff
(1999) of the more X the more Y, or it may be the product of the interaction of
a set of structural devices, such as fronting, scrambling, head movement,
and so on.

12.3.2.2 Learnability theory


These two types of complexity, derivational complexity and processing com-
plexity, take us to learnability. The basic problem of the complexity of the
mapping between string and CS was addressed formally in Wexler and Culi-
cover (1980).16 There the sole criterion was the learnability of a class of
transformational grammars defined over a fixed base. A class of grammars
is not learnable in a particular sense if it is possible for a learner to construct a
grammar in which there is an error that can never be corrected by subsequent
experience, in principle. Errors that can be corrected on the basis of experi-
ence are called ‘detectable’ errors; the proof of learnability involves demon-
strating that there are no undetectable grammatical errors, given certain
assumptions about the possible grammatical operations and derivations
that may be hypothesized by the learner.
The identifiability of errors is an appropriate consideration in an account
of learning that posits random construction or random selection of rules. In
such a theory, the correctness of a particular hypothesis is determined by
whether it produces errors. If we shift our perspective to a constructive
account, then we shift our emphasis from the identification of grammatical
errors to the relative complexity of the mapping.17 If a mapping is relatively
opaque then the ability of the learner to compute the mapping is severely
limited. On this perspective, the most transparent mapping is one in which
the string contains unambiguous, independent, and complete evidence about
what the corresponding CS representation is.
We have already illustrated a mapping that involves a certain amount of
complexity, in (7). Let us compare this with the type of situation envisaged in

16
The mapping was formulated in terms of strings and base phrase markers, but the general
problem is the same as the one that we are considering here.
17
This is not to say that grammatical errors per se are irrelevant, but simply that they are not
the whole story. On the current perspective, a grammatical error would occur if a particular
string is hypothesized to correspond to the wrong conceptual structure representation. We
assume that such errors are always detectable on the basis of subsequent information in the form
of <string,CS> pairs, but leave open the possibility that a particular formulation of the
correspondences might give rise to pathological cases that would have to be addressed.

Kayne’s (1994) Antisymmetry theory, where all branching is binary and to the
right, such that all phrases are of the form given in (9).18

(9) XP

Spec X′

X YP
Kayne assumes that there is a strict correlation between asymmetric c-command
and linear order called the Linear Correspondence Axiom (LCA),
such that if α c-commands β and β does not c-command α, then α precedes β.
If there is no movement, and if the branching structure in (9) is taken to be
the CS, then the mapping between strings and corresponding CS representa-
tions will be straightforward, in fact. All of the mappings will conform to the
LCA. Moreover, the mapping will be maximally simple, in that in order to
construct the mapping it is sufficient to scan the string from left to right,
establishing a correspondence between each element in the string and each
constituent of the CS.

12.4 The computation of complexity


12.4.1 Distance
We have argued to this point that the distance between functionally related
parts of a string is the crucial component of complexity, because of memory
limitations.b Here we formulate a rough measure of this distance. The essen-
tial idea is that in the simple case the string is an image of the CS representa-
tion, to a first approximation, and relative distance in the two domains should
be relatively consistent. When it isn’t, there is ‘twisting’ of the structure so that
it can map into the string. The greater the twisting, the greater the complexity.
Let us begin with a CS representation. For convenience, we will assume that
the CS representation is a structure in which the terminals correspond to
the individual words and functional heads of a string; in essence, it is like a

18
In principle all branching could be to the left in Kayne’s approach, but Kayne introduces
an additional stipulation that rules out leftward branching.
b
Memory limitations play a central role in many accounts of processing complexity, e.g.
Hawkins’s work cited here and Hofmeister (2011). For arguments that memory limitations do
not correspond directly to acceptability judgments, see Sprouse et al. (2012). A plausible inter-
pretation of the role of memory is that it is a biasing factor, which leads speakers to prefer
certain constructions over others, which leads to higher frequency for the preferred construc-
tions, which ultimately produces ‘surprisal’ in the case of dispreferred constructions. Surprisal
in turn corresponds to lower acceptability. For some discussion, see Culicover (2013).

D-structure in the classical sense. Using such a structure instead of a true CS
along the lines of Jackendoff (1990) allows for substantial simplification. It
allows us to develop a foundation for the intuition that uniform branching is
optimal, which in turn allows us to view the objectives of Kayne’s antisym-
metry theory in terms of markedness in contrast to rigid constraints on
structure.
In the representations that follow we take the capital letters to correspond
to the types in the CS hierarchy; the terminals are basic concepts.

(10) A

B C

D E F G

H I
string = defhi

Let us say that the Image of D is d, and so on for the other terminals in the CS
representation. We simplify dramatically here, because it is plausible that a
single CS can be expressed in a number of different ways. We can also define
an inverse relation and since there is more information in the tree than in the
string, the inverse image defines a set containing one or more CS
representations.
(11) Image(D) = d
Image⁻¹(d) = <D, . . . >
Hence the correspondences are many-to-many.
It is possible that the image of a higher level node in the tree is not
decomposable into the image of its constituents, which would be typical of
an idiom (e.g. Image⁻¹(kick the bucket) = <DIE, . . . >). It is also possible that a
single element in a string corresponds to a complex CS representation, as
argued e.g. by Jackendoff (1990). And it is possible that there is a particular
aspect of CS that corresponds to a class of strings that satisfy a certain
structural description, as has been argued for the dative construction
among others (see Goldberg 1995; Jackendoff 1997). We leave these more
complex possibilities aside here.
We can measure the distance between constituents of the CS representation
in terms of the height of the common ancestor. For sisters we will say that the
CDistance, i.e. the distance in the CS representation, is 0, which is the number
of ancestors that they do not have in common. So for (10) we have (12).

(12) CDistance(H,I) = 0
The CDistance between a node and the daughter of its sister is 1, as in the case
of (F,H) and (F,I). In general, the CDistance between two nodes is the number
of dominating nodes that the path between them passes through. A node is
not a dominating node if the path through it links sisters; otherwise it is.
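To make the measure concrete, here is a minimal Python sketch (our illustration, not part of the original text; the dictionary encoding of tree (10) is ours). It uses the equivalent characterization noted above for sisters: CDistance(x, y) is the number of ancestors that the two nodes do not share.

```python
# Illustrative sketch (ours, not the authors'): CDistance computed as
# the number of proper ancestors the two nodes do not share, which is
# equivalent to counting dominating nodes on the path between them.

# Tree (10): A -> B C; B -> D E; C -> F G; G -> H I
TREE = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['F', 'G'], 'G': ['H', 'I']}

def ancestors(node, tree):
    """Set of proper ancestors of `node`."""
    parents = {c: p for p, kids in tree.items() for c in kids}
    out = set()
    while node in parents:
        node = parents[node]
        out.add(node)
    return out

def cdistance(x, y, tree):
    # symmetric difference: ancestors of one that are not ancestors of the other
    return len(ancestors(x, tree) ^ ancestors(y, tree))

print(cdistance('H', 'I', TREE))  # sisters: 0, as in (12)
print(cdistance('F', 'H', TREE))  # node and daughter of its sister: 1
print(cdistance('D', 'I', TREE))  # 3
```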
Given this notion of CDistance, we can relate the distance between sub-
strings to linear relations between the corresponding parts of the CS repre-
sentation. The general idea is the following. For a given distance between two
elements (words, phrases, etc.) in the string, we posit that greater distance in
CS requires greater processing, and hence produces greater complexity, other
things being equal.
Consider the string defhi. Sisterhood at CS, i.e. CDistance = 0, corresponds
to adjacency in the string. If CDistance(α,β) = 0, and Image(α) precedes
Image(β), then the right edge of Image(α) is adjacent to the left edge of
Image(β). This is the case, for example, for α = B and β = C.
We use this property to measure the amount of deformation (or ‘twisting’)
of a CS representation with respect to its corresponding string. In the case of
adjacency there is no deformation. We may measure deformation in terms of
the distance in the string between the right edge of Image(α) and the left edge
of Image(β), which in this case is 0. But we must be careful to correlate these
distances appropriately. So, for example, the distance between B and G is 1.
Image(B) = de and Image(G) = hi. The distance between the right edge of de
and the left edge of hi is one element, namely f, but this is simply because f is a
terminal.
Suppose we replace F corresponding to f in the string in (10) with [F J K],
corresponding to jk in the string.

(13) A

B C

D E F G

J K H I
string = dejkhi

Now there are two elements in Image(F). But the distance between de and hi
is still 1, if we treat Image(B) = de and Image(F) = jk as single units. They can
be so treated because they correspond to constituents of CS. Let us call this
distance between substrings that correspond to constituents the Parse Distance, or PDistance.
(14) Given a string s, containing initial substring a and final substring b such
that Image(α) = a and Image(β) = b for CS constituents α and β,
PDistance(a,b) is the minimal number of strings x1, . . . , xn such that
s = a + x1 + . . . + xn + b
If a and b are adjacent then PDistance = 0. In (10), PDistance(Image(B),
Image(G)) = 1. PDistance(e,i) = 2, and PDistance(d,i) = 3.
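Definition (14) can likewise be rendered as a short sketch (ours, with the tree (10) and string defhi hard-coded): the gap between the two substrings is covered greedily by the longest available constituent images. Greedy matching suffices in this toy case because constituent images nest.

```python
# Sketch (ours) of definition (14): PDistance counts the minimal number
# of pieces covering the gap between two substrings, where each piece
# must be the image of a CS constituent.  Duplicate letters would need
# real positions rather than str.index, but defhi is unambiguous.

TREE = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['F', 'G'], 'G': ['H', 'I']}
STRING = 'defhi'   # Image(D) = d, Image(E) = e, etc.

def constituent_images(tree):
    """The set of terminal yields (images) of all constituents."""
    def leaves(n):
        return ''.join(leaves(k) for k in tree[n]) if n in tree else n.lower()
    nodes = set(tree) | {k for kids in tree.values() for k in kids}
    return {leaves(n) for n in nodes}

def pdistance(a, b, tree, string):
    gap = string[string.index(a) + len(a):string.index(b)]
    images, pieces, i = constituent_images(tree), 0, 0
    while i < len(gap):
        # take the longest constituent image starting at this position
        i += max(l for l in range(1, len(gap) - i + 1) if gap[i:i + l] in images)
        pieces += 1
    return pieces

print(pdistance('de', 'hi', TREE, STRING))  # 1: only f = Image(F) intervenes
print(pdistance('e', 'i', TREE, STRING))    # 2: f and h
print(pdistance('d', 'i', TREE, STRING))    # 3: e, f, and h
```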
Consider now the most basic relation, that of head–complement. Haw-
kins’ intuition that heads are optimally adjacent to the heads of their
complements correlates in a natural way with the relative distance measures.
For simplicity of exposition, let us identify Image(x) and x. We can then
encode both CS and the string in a traditional ordered phrase marker, as
shown in (15).

(15) a. [• H1 [• H2 XP]]    b. [• H1 [• XP H2]]

We observe that in (15a),


(16) CDistance(H2,XP) = 0 PDistance(H2,XP) = 0
CDistance(H1,H2) = 1 PDistance(H1,H2) = 0
CDistance(H1,XP) = 1 PDistance(H1,XP) = 1
and in (15b),
(17) CDistance(H2,XP) = 0 PDistance(H2,XP) = 0
CDistance(H1,H2) = 1 PDistance(H1,H2) = 1
CDistance(H1,XP) = 1 PDistance(H1,XP) = 0
We have highlighted with boldface where the difference between the two cases
lies. A twisting of the hierarchical structure is reflected by an increase or
decrease in PDistance and constant CDistance. Such a relation occurs when a
head and the heads of its complement are separated in the string; this requires
that the head that occurs first be held in memory along with the lower
material until the lower head comes along. The more complex structure is
the one for which the PDistance between two heads is greater, while the
CDistance is the same.

To see whether this is an accidental property of the particular configuration,
let us see what happens when we have a uniform left-branching structure.

(18) a. [• [• H2 XP] H1]    b. [• [• XP H2] H1]

For (18a),
(19) CDistance(H2,XP) = 0 PDistance(H2,XP) = 0
CDistance(H1,H2) =1 PDistance(H1,H2) =1
CDistance(H1,XP) = 1 PDistance(H1,XP) = 0
and for (18b),
(20) CDistance(H2,XP) = 0 PDistance(H2,XP) = 0
CDistance(H1,H2) =1 PDistance(H1,H2) =0
CDistance(H1,XP) = 1 PDistance(H1,XP) = 1
Again, the greater PDistance between heads that are adjacent in the structure
occurs when the branching is not uniform, as in (18a).
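The tables in (16)–(20) can be recomputed mechanically. The following sketch is ours (the node labels R and S for the unlabeled nodes are invented): CDistance depends only on the tree shape, PDistance only on the word order, and in these small configurations every intervener is a single constituent image.

```python
# A mechanical check (our sketch) of tables (16)-(17) and (19)-(20):
# the CDistances are identical across all four configurations; only the
# PDistance between H1 and H2 flips, and it is 1 exactly when the
# branching is non-uniform.

CONFIGS = {
    '15a': ({'R': ['H1', 'S'], 'S': ['H2', 'XP']}, ['H1', 'H2', 'XP']),
    '15b': ({'R': ['H1', 'S'], 'S': ['XP', 'H2']}, ['H1', 'XP', 'H2']),
    '18a': ({'R': ['S', 'H1'], 'S': ['H2', 'XP']}, ['H2', 'XP', 'H1']),
    '18b': ({'R': ['S', 'H1'], 'S': ['XP', 'H2']}, ['XP', 'H2', 'H1']),
}

def cdistance(x, y, tree):
    parents = {c: p for p, kids in tree.items() for c in kids}
    def anc(n):
        out = set()
        while n in parents:
            n = parents[n]
            out.add(n)
        return out
    return len(anc(x) ^ anc(y))          # ancestors not shared

def pdistance(x, y, order):
    i, j = sorted((order.index(x), order.index(y)))
    return j - i - 1                     # each intervener here is one unit

for name, (tree, order) in CONFIGS.items():
    rows = [(x, y, cdistance(x, y, tree), pdistance(x, y, order))
            for x, y in [('H2', 'XP'), ('H1', 'H2'), ('H1', 'XP')]]
    print(name, rows)
```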
The total deformation of a tree of course grows as the number of heads
grows, and the extent to which they do not line up grows. So, if we take the
pattern in (18a) and replicate it, the total PDistance between adjacent heads
will equal the number of alternating pairs of heads, while the total CDistance
between adjacent heads will remain 0. We might surmise that a single head in
an initial position with all other heads to the right might not be that costly in
terms of complexity, and might optimize something else in the grammar.
The computational cost would be minimized if the head in question was the
highest, since an internal ‘outlier’ would produce a cost with respect to the
head immediately above it and the one immediately below it.
On this view, complexity of processing is correlated with memory load, and
uniformity of branching reduces memory load. In this sense, the antisymme-
try approach of Kayne (1994) is correct in placing a high value on uniformity
of the direction of branching structure, but is too strong in that it does not
allow for non-uniform branching at all. For our purposes, it is enough to say
that uniformity is computationally less complex, other things being equal.
The reduction of complexity, coupled with a theory of language change that
reflects the computational biases of learners as discussed in §12.2, will produce a
situation in which uniformity of branching is a very strong tendency without
being an absolute universal—a result that appears to be correct (again, see
Hawkins 1994).

12.4.2 Stretching and twisting


The measure of complexity in terms of distance is a crude one, but it is worth
seeing whether it extends naturally to other phenomena. We have already
discussed extraposition, and have argued that it is not inherently complex as
long as the antecedent of the extraposed predicate is accessible. It is well known
that extraposition is more difficult to process when there is an intervening
potential antecedent (Ross 1967), a relation that is easily formulated in terms
of relative PDistance.
Another phenomenon of some interest is that wh-movement and related
constructions have been observed to be strictly leftward, not rightward.
Kayne derives this result by postulating uniform rightward branch-
ing, so that the possible landing sites will always be to the left. Left-branching
languages typically lack such leftward movements, which Kayne explains by
deriving the left-branching structure from leftward movements that block
other leftward movements. For example, movement of IP to SpecCP puts
C in final position, and blocks subsequent movements to SpecCP.

(21) CP

Spec C⬘

C IP
[+WH]

XP
[+WH]

As we have already seen, a mirror image of a structure preserves all of the
distance relations, so that it will not be possible to derive the absence of
rightward movement from distance considerations alone. It is not implausible
that operators that bind variables need to be processed before the variables
that they bind, so that the variables may be identified as such.19 Such

19
An absolute requirement along these lines is too strong, given that there are cases where an
operator binds a variable to its left, such as If hei wants to, each mani can vote (Greg Carlson,
p.c.). We hypothesize that the correct account is one that assigns a strong preference to the case
in which the operator precedes what it binds, presumably for processing reasons.

functional considerations entail that movement of operators will be to positions where they precede the variables that they bind, not to the right.
This does not tell us, however, why there is no leftward movement for
purposes of marking scope in most if not all strictly head-final languages. One
possible answer is that in head-final languages, the only possible movement
for the operator would be to the head that defines its scope (typically the
inflected verb, or something adjoined to the verb, such as a complementizer
or a particle). In a head-final language this verbal head is on a right branch, of
course. So the operator would have to move to the right, which is ruled out on
the sorts of functional grounds we have just discussed. Note that there are
head-final languages in which covert and overt markers are licensed to the
right. In Korean, for example, the relative clause ends in a relative marker,
although, strikingly, there is no overt movement of a relative pronoun.
Let us consider, finally, the cost of extracting from a moved constituent.
(22) illustrates.
(22) a. A

Bi C

D E

Fj G

H J L M

K ti N tj

b. A

Bi C

D E

L M

N F

H J

K ti

Intuitions about complexity suggest that extraction from an extracted
constituent is more problematic than extraction from an unmoved constitu-
ent. The first empirical evidence pointing this out is due to Postal (1972), who
used it as an argument against successive cyclic movement in the Conditions
framework of Chomsky (1973).
(23) a. Leslie believes that [a picture of Terry]i, you would never find ti in a
shop like that.
b. *Terry is the person whoj Leslie believes that [a picture of tj]i, you
would never find ti in a shop like that.
Examples of the following sort are cited by Wexler and Culicover (1980) as
evidence for the Freezing Principle, which blocks extraction from a moved
constituent.
(24) a. Whoi did you tell Mary [a story about ti]?
b. *Whoi did you tell tj to Mary [a story about ti]j?
The Freezing Principle was motivated by considerations of learnability.
At the same time, we may take the view that extractions such as these are
grammatical but marginal. This more closely fits our current perspective,
which is that extreme deformation produces complexity but not necessarily
complete ungrammaticality. Examples such as (24b) are judged by some
speakers to be grammatical, and examples such as the following are not
completely impossible.
(25) ?Terry is the person [of whom]j Leslie pointed out that [such pictures
tj]i you would never find ti in a shop like that.
The intuition that we wish to develop about extraction, then, is that a simple
movement to an accessible position is in effect a ‘stretching’ of the CS
representation onto a particular linear order. Constituents that are close in
CS are more distant syntactically, but the topological relations are not signifi-
cantly distorted—the PDistance between a moved constituent and its trace is
correlated with the CDistance. Presumably there is some falling off when
these distances become large, but the intervening material is not problematic.
However, when we extract from an extracted constituent, there is a ‘twisting’
of the structure in order to map it into the string. Attachment of Bi in (22a) is
actually closer in PDistance and CDistance to its trace (shown in (26a)) than
it is in (22b) (shown in (26b)) yet the complexity of this attachment is greater.
(26) a. PDistance(Bi,ti) = 3
CDistance(Bi,ti) = 4
PDistance(Fj,tj) = 2
CDistance(Fj,tj) = 2
b. PDistance(Bi,ti) = 5
CDistance(Bi,ti) = 5
When the trace is contained in a moved constituent, the complexity would be
better represented by constructing a measure that takes this fact explicitly into
account. One possibility is to multiply the CDistance from Bi to its trace by
the CDistance from Fj to its trace in (22a), which yields 8 compared with 5 in
(22b). Such a measure, while arbitrary, reflects the degree of deformation of
the tree.
To sum up, there are essentially three ways to map a CS into a string. One is
to align the constituents of the CS with the string without crossing constitu-
ents of the parse string. The second is to stretch a CS constituent to position
the corresponding string in a position where it is not adjacent to its CS sisters.
The third is to twist the lines so that the correspondences between strings and
constituents of CS cross. Our intention is that the relative complexity
accorded to this measure reflects the relative complexity in terms of memory
requirements, and that we do not have to formulate an explicit theory of
memory for sentence processing in order to be able to capture the basic
outlines of comparative complexity.
Note that there are several complexities that we have not factored into our
account here. A string of words may map into a CS representation so that
there are fewer primitives in the CS representation than there are words in the
string; this is a characterization of idiomaticity. Or there may be more
primitives in the CS representation than in the string; this is a characterization
of a ‘construction’ in the sense of Construction Grammar. In both cases there
is the opportunity for a mismatch in the CDistance and PDistance, since the
two are equal when there is a uniform linearization of a branching structure,
with a one-to-one correspondence between elements of the string and elem-
ents of the CS representation. To the extent that this additional complexity
presents a burden for the learner, we might expect some effect on learning.
But there is no twisting and so the burden, if it exists, is relatively light.

12.5 Summary
We have suggested that at its core the antisymmetry theory reflects the relative
computational simplicity of mapping strings into structures assuming uni-
form branching. The branching really has to do with the relative linear order
in the string between related heads and their identifiability, a measure that can
be correlated with memory but that can be abstractly formulated for string/
structure mappings. A computational bias for certain constructions will
produce a clustering of certain structural features in languages, given a
plausible theory of language change that ties up with a theory of language
acquisition. Hence we expect to find, and in fact do find, that languages
tend towards uniform branching. At the same time, greater complexity does
not entail nonexistence, and deviations from the optimal are possible
and attested, yielding variation among languages. Taking the perspective of
markedness allows us to accommodate these deviations without taking the
radical step advocated by Kayne (1994), that of allowing only uniform right-
ward binary branching, and accounting for all apparent counterexamples in
derivational terms.
13

Morphological complexity outside of Universal Grammar (2008)*

Jirka Hana and Peter W. Culicover

Remarks on Chapter 13
This chapter is about morphosyntax, in particular the use of linear order in
inflected words to express correspondences between form and meaning. In this
case, we focus on the identification of inflectional morphology and the corres-
pondence between morphological structure and syntactic function. We explore
the possibility that different orderings among the root and inflection in an
inflected form may yield differences in the complexity of the form–meaning
correspondence. We assume that complexity differences result in turn in prefer-
ences for some orderings over others, as seen in typological distribution, along
lines similar to those discussed in Chapter 12. Specifically, we argue that the
identification of inflectional morphology expressed as suffixation is computation-
ally less complex than prefixation, which in turn is computationally less complex
than infixation. These preferences account for the greater frequency of suffixation
over prefixation, and the greater frequency of prefixation over infixation.

13.1 Background
We address here one aspect of the question of why human language is the way
it is. It has been observed (Sapir 1921; Greenberg 1957; Hawkins and Gilligan
1988) that inflectional morphology tends overwhelmingly to be suffixation,
rather than prefixation, infixation, reduplication, or other logical possibilities

* [This chapter originally appeared as Jirka Hana and Peter W. Culicover, ‘Morphological com-
plexity outside of Universal Grammar’, OSUWPL 58, Spring 2008, pp. 84–108. We thank Chris Brew,
Beth Hume, Brian Joseph, John Nerbonne, and three anonymous reviewers from the journal Cognitive
Science for valuable feedback on various versions. We also thank Mary Beckman and Shari Speer.]
morphological complexity 335

that are quite rare if they exist at all. For this study, we assume that the
statistical distribution of possibilities is a consequence of how language is
represented or processed in the mind. That is, we rule out the possibility that
the distributions that we find are the result of contact, genetic relatedness, or
historical accidents (e.g. annihilation of speakers of languages with certain
characteristics), although such possibilities are of course conceivable and in
principle might provide a better explanation of the facts than the one that we
assume here.
The two possibilities that we focus on concern whether the preference for
suffixation is a property of the human capacity for language per se, or whether
it is the consequence of general human cognitive capacities. Following
common practice in linguistic theory, let us suppose that there is a part of
the human mind/brain, called the Language Faculty, that is specialized for
language (see e.g. Chomsky 1973). The specific content of the Language
Faculty is called Universal Grammar. We take it to be an open question
whether there is such a faculty and what its specific properties are; we do
not simply stipulate that it must exist or that it must have certain properties,
nor do we deny its existence and assert that the human capacity for language
can be accounted for entirely in terms that do not appeal to any cognitive
specialization. The goal of our research here is simply to investigate whether it
is possible to account for a particular property of human language in terms
that do not require that this property in some way follows from the architec-
ture of the Language Faculty.

13.1.1 Types of inflectional morphology


Inflectional morphology is the phenomenon whereby the grammatical prop-
erties of a word (or phrase) are expressed by realizing the word in a particular
form taken from a set of possible forms. The set of possible forms of a word is
called its paradigm.1 A simple example is presented by the English nominal
paradigms distinguishing singular and plural. The general rule is that the
singular member of the paradigm has nothing added to it—it is simply the
stem—while the plural member has some variant of s added to the end of
the stem.2

1
The word ‘paradigm’ is used in two related but different meanings: (1) all the forms of a
given lemma; (2) in the original meaning, referring to a distinguished member of an inflectional
class, or more abstractly to a pattern in which the forms of words belonging to the same
inflectional class are formed. We reserve the term ‘paradigm’ only for the former meaning, and
use the phrase ‘paradigm pattern’ for the latter.
2
Throughout, we mark relevant morpheme boundaries by ‘·’, e.g. book·s.

(1) Singular: book patch tag
Plural: book·s patch·es tag·s
Other, more complex instances of inflectional morphology involve mor-
phological case in languages such as Finnish and Russian, and tense, aspect,
modality, etc. in verb systems, as in Italian and Navajo. For a survey of the
various inflectional systems and their functions, see Spencer and Zwicky
(1998).
It is possible to imagine other ways of marking plural. Imagine a language
just like English, but one in which the plural morpheme precedes the stem.
(2) Singular: book patch tag
Plural: s·book s·patch s·tag
Or imagine a language in which the plural is formed by reduplicating the
entire stem:
(3) Singular: book patch tag
Plural: book·book patch·patch tag·tag
—or a language in which the plural is formed by reduplicating the initial
consonant of the stem and following it with a dummy vowel to maintain
syllabic well-formedness.
(4) Singular: book patch tag
Plural: be·book pe·patch te·tag
Many other possibilities come to mind, of which some are attested in
languages of the world and others are not. A favorite example of something
imaginable that does not occur is that of pronouncing the word backwards.
The pattern would be something like
(5) Singular: book patch tag
Plural: koob tchap gat
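The paradigms in (1)–(5) are all trivial to state as string operations. A throwaway sketch (ours; function names invented) underlines that the typological asymmetry cannot be a matter of formal statability:

```python
# Throwaway sketch (ours) of the plural rules in (1)-(5) as string
# operations.  (1) is simplified: it ignores the -es allomorphy of
# patch·es; and (5) reverses letters, whereas the text's koob/tchap/gat
# reverses sounds.

def suffix(stem):         # (1) English-like: add -s at the end
    return stem + 's'

def prefix(stem):         # (2) imagined: plural morpheme precedes the stem
    return 's' + stem

def reduplicate(stem):    # (3) imagined: copy the entire stem
    return stem + stem

def partial_redup(stem):  # (4) imagined: initial consonant + dummy vowel e
    return stem[0] + 'e' + stem

def backwards(stem):      # (5) unattested: pronounce the word backwards
    return stem[::-1]

for rule in (suffix, prefix, reduplicate, partial_redup, backwards):
    print(rule.__name__, [rule(w) for w in ('book', 'patch', 'tag')])
```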
13.1.2 A classical example: prefix–suffix asymmetry
Greenberg (1957) finds that across languages, suffixing is more frequent than
prefixing and far more frequent than infixing. This tendency was first sug-
gested by Sapir (1921). It is important that the asymmetry holds not only when
simply counting languages, which is always problematic, but also in diverse
statistical measures. For example, Hawkins and Gilligan (1988) suggest a
number of universals capturing the correlation between affix position in
morphology and head position in syntax. The correlation is significantly
skewed towards preference for suffixes. For example, postpositional and
head-final languages use suffixes and no prefixes, while prepositional and
head-initial languages use not only prefixes, as expected, but also suffixes.
Moreover, there are many languages that use exclusively suffixes and not
prefixes (e.g. Basque, Finnish), but there are very few that use only prefixes
and no suffixes (e.g. Thai, but in derivation, not in inflection).
There have been several attempts to explain the suffix–prefix asymmetry,
using processing arguments, historical arguments, and combinations of both.

13.1.2.1 Processing explanation


Cutler et al. (1985) and Hawkins and Gilligan (1988) offer an explanation
based on lexical processing. Their reasoning runs as follows: lexical processing
is assumed to precede syntactic processing, and affixes usually convey
syntactic information; thus listeners process stems before
affixes. Hence a suffixing language, unlike a prefixing language, allows listen-
ers to process morphemes in the same order as they are heard. The preference
is a reflection of the word-recognition process.
In addition, since affixes form a closed class that is much smaller than the
open class of roots, the amount of information communicated in the same
time is on average higher for roots than for affixes. Therefore, in a suffixing
language, the hearer can narrow down the candidates for the current word
earlier than in a prefixing language. Moreover, often (but not always) the
inflectional categories can be inferred from context.3

13.1.2.2 Historical explanation


Givón (1979) argues that the reason for suffix preference is historical. He
claims that (1) bound morphemes originate mainly from free morphemes and
(2) originally all languages were SOV (with auxiliaries following the verb).
Therefore verbal affixes are mostly suffixes since they were originally auxiliar-
ies following the verb. However, assumption (2) of the argument is not widely
accepted (see e.g. Hawkins and Gilligan (1988: 310) for an opposing view).
Moreover, it leaves open the case of nonverbal affixes.

13.1.2.3 Processing and historical explanation


Hall (1988) tries to integrate the historical explanation offered by Givón (1979)
(}13.1.2.2) and the processing explanation by Hawkins and Gilligan (1988)
(}13.1.2.1). He adopts Givón’s claim that affixes originate mainly from free
morphemes, but he does not need the questionable assumption about
original SOV word order; he uses Hawkins and Gilligan’s argument about
efficient processing to conclude that prefixes are less likely than suffixes
because free morphemes are less likely to fuse in pre-stem positions.

3 For example, even though in free word-order languages like Russian or Czech it is not
possible to predict case endings in general, they can be predicted in many specific cases because
of agreement within the noun phrase, subject–verb agreement, semantics, etc.
Although the work above does explain the suffix–prefix asymmetry, it has
several shortcomings: (1) it relies on several processing assumptions that are
not entirely independent of the problem being explained; (2) it does not address the
many other asymmetries in the distribution of potential morphological systems;
(3) as stated above, it addresses only verbal morphology. In the rest of the paper,
we develop an alternative measure that we believe addresses all of these issues.

13.2 Our approach


As noted, the question of why some possibilities are more frequent than
others and why some do not exist has two types of answer, one narrowly
linguistic and one more general. The linguistic answer is that the Language
Faculty is structured in such a way as to allow some possibilities and not
others, and the preferences themselves are a property of Universal Grammar.
This is in fact the standard view in mainstream Generative Grammar, where
the fact that rules of grammar are constrained in particular ways is taken to
reflect the architecture of the Language Faculty; the constraints are part of
Universal Grammar (Chomsky 1973; Wexler and Culicover 1980) and prevent
learners from formulating certain invalid hypotheses about the grammars
that they are trying to acquire.
The alternative, which we are exploring in our work, is that the possibilities
and their relative frequencies are a consequence of relative computational
complexity for the learner of the language. On this view, morphological
systems that are inherently more complex are not impossible, but less pre-
ferred. Relatively lower preference produces a bias against a particular
hypothesis in the face of preferred competing hypotheses. This bias yields a
distribution in which the preferred option is more widely adopted, other
things being equal. See Culicover and Nowak (2002, reprinted here as
Chapter 12) for a model of such a state of affairs.
If we simply observe the relative frequencies of the various possibilities we
will not be able to confirm the view that we have just outlined, because it relies
on a notion of relative complexity that remains undefined. We run the risk of
circularity if we try to argue that the more complex is less preferred, and that
we know what is more complex by seeing what is less preferred, however
relative preference is measured. Therefore, the problem that we focus on here
is that of developing a measure of complexity that will correctly predict the
clear cases of relative preference, but that will also be independent of the
phenomenon. Such a measure should not take into account observations
about preference per se, but rather formal properties of the systems under
consideration. On this approach, if a system of Type I is measurably more
complex than a system of Type II, we would predict that Type I systems would
be less commonly found than Type II systems.

13.2.1 Complexity
We see basically two types of measure as the most plausible accounts of
relative morphological complexity: learning and real-time processing. Sim-
plifying somewhat, inflectional morphology involves adding a morpheme to
another form, the stem. From the perspective of learning, it may be more
difficult to sort out the stem from the inflectional morpheme if the latter is
prefixed than if it is suffixed. The other possibility is a processing one: once all
of the forms have been learned, it is more difficult to recognize forms and
distinguish them from one another when the morphological system works a
particular way, e.g. uses inflectional prefixes.
We do not rule out the possibility of a processing explanation in principle,
although we do not believe that the proposals that have been advanced (see
§13.1.2) are particularly compelling or comprehensive. The types of measure
that we explore here (see §13.4) are of the learning type.

13.2.2 Acquisition complexity: the dynamical component


We assume that the key determinant of complexity is the transparency or
opacity of the morphological system to the learner. If we look at a collection
of data without consideration of the task of acquisition, but just consider the
overall transparency of the data, there is no apparent distinction between
suffixation, prefixation, or a number of other morphological devices that can
be imagined. However, language is inherently temporal, in the sense that
expressions are encountered and processed in time. At the beginning of an
unknown word, it is generally hard for a naïve learner to predict the entire
form of the word. Given this, our question about relative complexity may be
formulated somewhat more precisely as follows: Assuming the sequential
processing of words, how do different formal morphological devices contrib-
ute to the complexity of acquiring the language?
The intuition of many researchers is that it is the temporal structure of
language that produces the observed preference for suffixation. We adopt this
insight and make it precise. In particular, we compute for all words in a
lexicon their relative similarity to one another as determined by a sequential
algorithm. Words that are identical except for a single difference are closer to
one another if the difference falls towards the end of the words than if it comes
at the beginning, a reflection of the higher processing cost to the learner of
keeping early differences in memory versus the lower processing cost of
simply checking that early identities are not problematic. We describe the
algorithm in detail in }13.4 and justify some of the particular choices that we
make in formulating it.
An important consequence of the complexity measure is that it correctly
yields the desired result, i.e. that inflectional suffixation is less costly to a
system than is inflectional prefixation. Given this measure, we are then able to
apply it to cases for which it was not originally devised, e.g. infixation, various
types of reduplication, and templatic morphology.

13.3 Relevant studies in acquisition and processing


In this section, we review several relevant studies.

13.3.1 Lexical processing


A large body of psycholinguistic literature suggests that lexical access is
generally achieved on the basis of the initial part of the word:
• The beginning is the most effective cue for recall or recognition of a
word, cf. Nooteboom (1981) (Dutch).
• Word-final distortions often go undetected, cf. Marslen-Wilson and
Welsh (1978); Cole (1973); Cole and Jakimik (1978; 1980).
• Speakers usually avoid word-initial distortion, cf. Cooper and Paccia-
Cooper (1980).
An example of a model based on these facts is the cohort model of Marslen-
Wilson and Tyler (1980). It assumes that when an acoustic signal is heard, all
words consistent with it are activated; as more input is being heard, fewer
words stay activated, until only one remains activated. This model also allows
easy incorporation of constraints and preferences imposed by other levels of
grammar or real-world knowledge.
Similarly, as Connine et al. (1993) and Marslen-Wilson (1993) show,
changes involving non-adjacent segments are generally more disruptive to
word recognition than changes involving adjacent segments.

13.3.2 External cues for morphology acquisition


Language contains many cues on different levels that a speaker can exploit
when processing or acquiring morphology. None of these cues is 100%
reliable. It is questionable whether they are available to their full extent during
the developmental stage when morphology is acquired.

PHONOTACTICS. It is often the case that a certain segment combination is


impossible (or rare) within a morpheme but does occur across the morpheme
boundary. Saffran et al. (1996) showed that hearers are sensitive to phonotac-
tic transition probabilities across word boundaries. The results in Hay et al.
(2003) suggest that this sensitivity extends to morpheme boundaries. Their
study found that clusters infrequent in a given language tend to be perceived
as being separated by a morpheme boundary.4
SYNTACTIC CUES. In some cases, it is possible to partially or completely predict
inflectional characteristics of a word based on its syntactic context. In English,
for example, knowing what the subject is makes it possible to know whether or
not the present tense main verb will have the 3rd person singular form.
SEMANTIC CUES. Inflectionally related words (i) share certain semantic prop-
erties (e.g. both walk and walked refer to the same action), (ii) occur in similar
contexts (eat and ate occur with the same type of object, while eat and drink
occur with different types of object). Similarly, words belonging to the same
morphological category often share certain semantic features (e.g. referring to
multiple entities). Note, however, that the opposite implication does not hold:
two words sharing some semantic properties, and occurring in similar con-
texts, need not be inflectionally related (cf. walk and run).
DISTRIBUTIONAL CUES. According to Baroni (2000) distributional cues are one
of the most important cues in morphology acquisition. Morphemes are
syntagmatically independent units—if a substring of a word is a morpheme,
then it should occur in other words. A learner should look for substrings
which occur in a high number of different words (that can be exhaustively
parsed into morphemes). He also claims that distributional cues play a
primary role in the earliest stages of morpheme discovery. Distributional
properties suggest that certain strings are morphemes, making it easier to
notice the systematic semantic patterns occurring with certain of those words.
Longer words are more likely to be morphologically complex.
13.3.3 Computational acquisition of paradigms
Several algorithms exploit the fact that forms of the same lemma5 are likely to
be similar in multiple ways. For example, Yarowsky and Wicentowski (2000)
assume that forms belonging to the same lexeme are likely to have similar
orthography and contextual properties, and that the distribution of forms will
be similar for all lexemes. In addition they combine these similarity measures
with an iteratively trained probabilistic grammar generating the word forms.
Similarly, Baroni et al. (2002) successfully use orthographic and semantic
similarity.
Formal similarity. The usual tool for discovering the similarity of strings
is the Levenshtein edit distance (Levenshtein 1966). Its advantage is that
it is extremely simple and is applicable to concatenative as well as non-
concatenative morphology. Some authors (Baroni et al. 2002) use the stand-
ard edit distance, where all editing operations (insert, delete, substitute) have a
cost of 1. Yarowsky and Wicentowski (2000) use a more elaborate approach:
their edit operations have different costs for different segments, and the costs
are iteratively re-estimated; initial values can be based either on phonetic
similarity or on a related language.
Semantic similarity. In most applications, semantics cannot be
accessed directly and therefore must be derived from other accessible proper-
ties of words. For example, Jacquemin (1997) exploits the fact that semantic-
ally similar words occur in similar contexts.
Distributional properties. The method of Yarowsky and Wicentowski
(2000) acquires the morphology of English irregular verbs by comparing the
distributions of their forms with those of regular verbs, assuming that they are
distributed equally.6 They also note that forms of the same lemma have similar
selectional preferences. For example, related verbs tend to occur with similar
subjects and objects. The selectional preferences are usually even more similar
across different forms of the same lemma than across synonyms. For this case,
they manually specify regular expressions that (roughly) capture patterns of
possible selectional frames.

4 The study explores the perception of nonsense words containing nasal–obstruent clusters.
Words containing clusters rare in English (e.g. /np/) were rated as potential words more likely
when the context allowed placing a morpheme boundary in the middle of the cluster,
e.g. zan·plirshdom was rated better than zanp·lirshdom.
5 The term ‘lemma’ is used with several different meanings. In our usage, every set of forms
belonging to the same inflectional paradigm is assigned a lemma, a particular form chosen by
convention (e.g. nominative singular for nouns, infinitive for verbs) to represent that set. The
terms ‘citation form’ and ‘canonical form’ are used with the same meaning. For example, the forms
break, breaks, broke, broken, breaking have the same lemma, break. Note that in this usage, only
forms related by inflection share the same lemma; thus e.g. the noun songs and the verb sings do
not have the same lemma.
6 Obviously, this approach would have to be significantly modified for classes other than
verbs and/or for highly inflected languages. Consider e.g. Czech nouns. Not all nouns have
the same distribution of forms; e.g. many numeral constructions require the counted object to
be in the genitive. Therefore, currency names are more likely to occur in the genitive than, say,
proper names. Proper nouns occur in the vocative far more often than inanimate objects, words
denoting uncountable substances (e.g. sugar) occur much more often in the singular than in the
plural, etc. Therefore, we would have to assume that there is not just a single distribution of forms
shared by all the noun lemmas, but several distributions. The forms of currency names, proper
names, and uncountable substances would probably belong to different distributions.
The algorithm in Yarowsky and Wicentowski (2000) is given candidates for verbal paradigms
and it discards those whose forms do not fit the required uniform distribution. An algorithm
for discovering Czech noun paradigms could use the same technique, but (i) there would not
be just one distribution but several, and (ii) the algorithm would need to discover what those
distributions are.

13.4 The complexity model


We turn now to our approach to the issue. For the comparison of acquisition
complexity of different morphological systems, we assume that morphology
acquisition has three consecutive stages:7
(i) Forms are learned as suppletives.
(ii) Paradigms (i.e. groups of forms sharing the same lemma) are dis-
covered and forms are grouped into paradigms.
(iii) Regularities in paradigms are discovered and morphemes are identi-
fied (if there are any).
The first stage is uninteresting for our purpose; the complexity of morpho-
logical acquisition is determined by the complexity of the second and third
stages. To simplify the task, we focus on the second stage. This means that we
estimate the complexity of morphology acquisition in terms of the complex-
ity of clustering words into paradigms: the easier it is to cluster words into
paradigms, the easier (we assume) it will be to acquire their morphology.8
We assume that this clustering is performed on the basis of the semantic
and formal similarity of words; words that are formally and semantically
similar are put into the same paradigm and words that are different are put
into distinct paradigms. For now, we employ several simplifications: we
ignore most irregularities, we assume that there is no homonymy and no
synonymy of morphemes, and we also disregard phonological alternations.
Obviously, a higher incidence of any of these makes the acquisition task
harder.

13.4.1 Semantic similarity


Our model simplifies the acquisition task further by assuming that the
semantics is available for every word. We believe that this is not an unreason-
able assumption since infants are exposed to language in context. If they have
limited access to context, their language development is very different, as
Peters and Menn (1993) show in their comparison of morphological acquisi-
tion in a normal and a visually impaired child. Moreover, as computational
studies show, words can be clustered into semantic classes using their distri-
butional properties (Yarowsky and Wicentowski 2000).

7 A more realistic model would allow iterative repetition of these stages. Even after establish-
ing a basic morphological competence, new forms that are opaque for it are still learned as
suppletives. The output of Stage 3 can be used to improve the clustering in Stage 2.
8 Of course, it is possible to imagine languages where Stage 2 is easy and Stage 3 is very hard.
For instance, in a language where the plural is formed by some complex change of the last vowel,
Stage 2 is quite simple (words that differ only in that vowel go into the same paradigm), while
Stage 3 (discovering the rule that governs the vowel change) is hard.

13.4.2 Similarity of forms


As noted earlier, we assume that ease of morphological acquisition correlates
with ease of clustering forms into paradigms using their formal similarity as a
cue. We propose a measure called the paradigm similarity index (PSI) to
quantify the ease of such clustering. A low PSI means that (in general) words
belonging to the same paradigm are similar to each other, while they are
different from other words. The lower the index, the easier it is to correctly
cluster the forms into paradigms.
If L denotes the set of words (types, not tokens) of a language and prdgm(w)
is the set of words belonging to the same paradigm as the word w, then we can
define PSI as:
(6) PSI(L) = avg {ipd(w) / epd(w) | w ∈ L}
where epd is the average distance between a word and all other words:
(7) epd(w) = avg {ed(w, u) | u ∈ L}
and ipd is the average distance between a word and all words of the same
paradigm:
(8) ipd(w) = avg {ed(w, u) | u ∈ prdgm(w)}
Finally, ed is a function measuring the similarity of two words (similarity of
their forms, i.e. sounds, not of their content). In the subsequent models, we
use various variants of the Levenshtein distance (LD), proposed by Levenshtein
(1966), as the ed function.
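To make the definitions concrete, here is a minimal sketch (our own, not from the original) of (6)–(8) in Python, using the standard Levenshtein distance as ed. We exclude a word's distance to itself from both averages, an assumption the definitions leave open. On toy prefixing and suffixing languages it reproduces the Model 0 result discussed below: the two come out equally complex.

```python
from statistics import mean

def ed(s1, s2):
    # standard Levenshtein distance, by dynamic programming
    prev = list(range(len(s2) + 1))
    for i, a in enumerate(s1, 1):
        cur = [i]
        for j, b in enumerate(s2, 1):
            cur.append(min(prev[j - 1] + (a != b),   # MATCH / SUBSTITUTE
                           prev[j] + 1,              # DELETE
                           cur[j - 1] + 1))          # INSERT
        prev = cur
    return prev[-1]

def psi(paradigms):
    # paradigms: a list of paradigms, each a list of word forms
    lexicon = [w for p in paradigms for w in p]
    def epd(w):          # (7): average distance to all other words
        return mean(ed(w, u) for u in lexicon if u != w)
    def ipd(w, p):       # (8): average distance within w's paradigm
        return mean(ed(w, u) for u in p if u != w)
    return mean(ipd(w, p) / epd(w)                   # (6)
                for p in paradigms for w in p)

suffixing = [["kuti·ve", "kuti·ba"], ["norebu·ve", "norebu·ba"]]
prefixing = [["ve·kuti", "ba·kuti"], ["ve·norebu", "ba·norebu"]]
print(psi(prefixing), psi(suffixing))    # identical under Model 0
```

Reading (6) as a ratio keeps PSI positive and makes the ratio in (10) below directly interpretable; a difference ipd − epd would order languages the same way here.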

13.4.3 Model 0: standard Levenshtein distance


The Levenshtein distance defines the distance between two sequences s1 and s2
as the minimal number of edit operations (substitution, insertion, or dele-
tion) necessary to modify s1 into s2. For an extensive discussion of the original
measure and a number of modifications and applications, see Sankoff and
Kruskal (1983).
The algorithm of the Model 0 variant of the ed function is given in
Figure 13.1. The pseudocode is very similar to functional programming lan-
guages like Haskell or ML.

ed :: String, String -> Integer
  | [], []      = 0
  | u, []       = length u                          // DELETE u
  | [], v       = length v                          // INSERT v
  | u:us, v:vs  = min [                             // the minimum of
      (if u == v then 0 else 1) + ed (us, vs),      // MATCH / SUBST
      1 + ed (us, v:vs),                            // DELETE u
      1 + ed (u:us, vs) ]                           // INSERT v

Figure 13.1. Edit Distance Algorithm of Model 0 (Levenshtein)

The function ed accepts two strings and returns a natural number—the edit
distance of those strings. The function header is followed by several templates,
introduced by ‘|’, selecting the proper code depending on the content of the
arguments. The edit distance of
• two empty strings is 0;
• a string from an empty string is equal to the length of that string—the
number of DELETEs or INSERTs necessary to turn one into the other;
• two non-empty strings is equal to the cost of the cheapest of the
following three possibilities:
  – the cost of MATCH or SUBSTITUTE on the current characters plus
the edit distance between the remaining characters;
  – the cost of DELETEing the first character of the first string (u), i.e. 1,
plus the edit distance between the remaining characters (us) and the
second string (v:vs);
  – the cost of INSERTing the first character of the second string (v) at
the beginning of the first string, i.e. 1, plus the edit distance between
the first string (u:us) and the remaining characters of the second
string (vs).
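The recursion in Figure 13.1 can be transcribed almost line for line into Python (our illustration; without memoization it is exponential, unlike a dynamic-programming formulation):

```python
def ed(s1, s2):
    # Model 0: standard Levenshtein distance, following Figure 13.1
    if not s1 and not s2:
        return 0
    if not s2:
        return len(s1)                                  # DELETE the rest of s1
    if not s1:
        return len(s2)                                  # INSERT the rest of s2
    return min((s1[0] != s2[0]) + ed(s1[1:], s2[1:]),   # MATCH / SUBST
               1 + ed(s1[1:], s2),                      # DELETE s1[0]
               1 + ed(s1, s2[1:]))                      # INSERT s2[0]

print(ed("ve·kuti", "ba·kuti"), ed("kuti·ve", "kuti·ba"))   # 2 2
```

As the printed result shows, the optimal cost is blind to where in the word the affix sits.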
The standard Levenshtein distance is a simple and elegant measure that is very
useful in many areas of sequence processing. However, for morphology and
especially acquisition, it is an extremely rough approximation. It does not
reflect many constraints of the physical and cognitive context the acquisition
occurs in. For example, the fact that some mutations are more common than
others is not taken into account.
What is most crucial, however, is that the standard LD does not reflect the
fact that words are perceived and produced in time. The distance is defined as
the minimum cost over all possible string modifications. This may be desir-
able for many applications, and it is even computable by an efficient dynamic
programming algorithm (cf. Sankoff and Kruskal 1983). However, the limita-
tions of human memory make such a computational model highly unrealis-
tic. In the subsequent models, we modify the standard Levenshtein distance
measure in such a way that it reflects more intuitively the physical and
cognitive reality of morphology acquisition. Some of the modifications are
similar to edit distance variants proposed by others, while some we believe are
original.

13.4.3.1 Suffix vs. prefix


Unsurprisingly, our Model 0 (based on the standard Levenshtein distance)
treats suffixing and prefixing languages as equally complex. Consider the two
‘languages’ in Table 13.1, or more formally in (9), differing only in the position
of the affix.
(9) L = {kuti, norebu, . . . }, A = {ve, ba}, LP = A·L, LS = L·A.
For both languages, the cheapest way to modify any singular form to the
corresponding plural form is to apply two substitution operations on the two
segments of the affix. Therefore, the edit cost is 2 in both cases, as Table 13.2
shows. The same is true in the opposite direction (Plural ! Singular).
Therefore the complexity index is the same for both languages. Similarly,
the result for languages with different length of affixes (ve·kuti vs. uba·kuti) or
languages where one of the forms is a bare stem (kuti vs. ba·kuti) would be the
same for both affix types—see Table 13.3. Of course, this is not the result we
are seeking.

13.4.3.2 Mirror image


Obviously, the model (but also the standard Levenshtein distance) predicts
that reversal as a hypothetical morphological operation is extremely compli-
cated to acquire—it is unable to find any formal similarity between two forms
related by reversal.

13.4.4 Model 1: matching strings in time


In this and subsequent models, we modify the standard edit distance to better
reflect the linguistic and psychological reality of morphological acquisition –

Table 13.1. Sample prefixing and suffixing languages


              Prefixing language (LP)     Suffixing language (LS)
Singular      ve·kuti                     kuti·ve
Plural        ba·kuti                     kuti·ba
Singular      ve·norebu                   norebu·ve
Plural        ba·norebu                   norebu·ba
  ...                                       ...

Table 13.2. Comparing prefixed and suffixed words in Model 0


Prefixing language (LP)           Suffixing language (LS)
operation            cost         operation            cost
v → b  substitute    1            k → k  match         0
e → a  substitute    1            u → u  match         0
k → k  match         0            t → t  match         0
u → u  match         0            i → i  match         0
t → t  match         0            v → b  substitute    1
i → i  match         0            e → a  substitute    1

Total cost           2            Total cost           2

Table 13.3. Comparing prefixed and suffixed words in Model 0


Prefixing language (L′P)          Suffixing language (L′S)
operation            cost         operation            cost
  → u  insert        1            k → k  match         0
v → b  substitute    1            u → u  match         0
e → a  substitute    1            t → t  match         0
k → k  match         0            i → i  match         0
u → u  match         0            v → u  substitute    1
t → t  match         0            e → b  substitute    1
i → i  match         0              → a  insert        1

Total cost           3            Total cost           3

especially the fact that language occurs in time, and that human computa-
tional resources are limited.
Model 1 uses an incremental algorithm to compute similarity distance of
two strings. Unlike Model 0, Model 1 calculates only one edit operation
sequence. At each position, it selects a single edit operation. The most
preferred operation is MATCH. If MATCH is not possible, another operation
(SUBSTITUTE, DELETE, or INSERT) is selected randomly.9 The edit dis-
tance computed by this algorithm is larger or equal to the edit distance
computed by Model 0 algorithm (Figure 13.1). It cannot be smaller, because
Model 0 computes the optimal distance. It can be larger, because the oper-
ation selected randomly does not have to be optimal.

9 A more realistic model could (1) adjust the preference in the operation selection by
experience, (2) employ a limited look-ahead window. For the sake of simplicity, we ignore
these options.

ed :: String, String -> Integer
  | [], []          = 0
  | u, []           = length u           // DELETE u
  | [], v           = length v           // INSERT v
  | u : us, u : vs  = ed (us, vs)        // MATCH
  | u : us, v : vs  = 1 + random [       // one of:
      ed (us, vs),                       // SUBSTITUTE
      ed (us, v : vs),                   // DELETE
      ed (u : us, vs) ]                  // INSERT

Figure 13.2. Edit Distance Algorithm of Model 1

The algorithm for computing such an edit distance is spelled out in
Figure 13.2. The code for the first three cases (two empty strings, or a non-
empty string paired with an empty string) is the same as in the Model 0
algorithm. The algorithms differ in the last two cases, covering non-empty
strings: MATCH is performed if possible; a random operation is selected otherwise.
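A runnable sketch of the Model 1 procedure (ours; the random choice is uniform, with no learned preferences and no look-ahead, per footnote 9):

```python
import random

def ed1(s1, s2):
    # Model 1: single left-to-right pass; MATCH whenever possible,
    # otherwise pick SUBSTITUTE / DELETE / INSERT at random.
    if not s2:
        return len(s1)                       # DELETE the rest of s1
    if not s1:
        return len(s2)                       # INSERT the rest of s2
    if s1[0] == s2[0]:
        return ed1(s1[1:], s2[1:])           # MATCH (always preferred)
    return 1 + random.choice([
        lambda: ed1(s1[1:], s2[1:]),         # SUBSTITUTE
        lambda: ed1(s1[1:], s2),             # DELETE
        lambda: ed1(s1, s2[1:]),             # INSERT
    ])()

def avg_cost(u, v, runs=2000):
    # averaging over runs approximates the expected (non-optimal) cost
    return sum(ed1(u, v) for _ in range(runs)) / runs

# The average cost, unlike the best cost, separates the two affix orders:
print(avg_cost("ve·kuti", "ba·kuti"), avg_cost("kuti·ve", "kuti·ba"))
```

Because a mismatch at the start of a prefixed word can derail the matching of the whole stem, the prefix pair's average cost comes out higher than the suffix pair's, even though both have the same best cost of 2.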

13.4.4.1 Prefixes vs. suffixes


Other things being equal, Model 1 considers it easier to acquire paradigms of a
language with suffixes than of a language with prefixes. Intuitively, the reason
for the higher complexity of prefixation is as follows: When a non-optimal
operation is selected, it negatively influences the matching of the rest of the
string. In a prefixing language, the forms of the same lemma differ at the
beginning and therefore a non-optimal operation can be selected earlier than
in a suffixing language. Thus the substring whose matching is negatively
influenced is longer.
Let LP be a prefixing language, LS the analogous suffixing language, wp ∈ LP,
and ws the analogous word ∈ LS.10 Obviously, it is more probable that
ipd(wp) ≥ ipd(ws) than not. Asymptotically, for infinite languages, epd(wp) =
epd(ws). Therefore, for such languages PSI(LP) > PSI(LS). We cannot assume
infinite languages, but we assume that the languages are large enough to avoid
pathological anomalies.
Consider Figure 13.3. It shows all the possible sequences of edit operations
for two forms of a lemma from both prefixing (A) and suffixing (B) languages
LP and LS. The best sequences are on the diagonals.11 The best sequences

10 If S is a set of stems and A a set of affixes, then LP = A·S and LS = S·A. If s ∈ S and
a ∈ A, then wp = a·s and ws = s·a. The symbol · denotes both language concatenation and
string concatenation.
11 Note that this is not the general case; e.g., for words of different length there is no diagonal
at all—cf. Figure 13.3 C or D.

[Figure: lattices of all possible edit-operation sequences in Model 1, comparing
A. ve·kuti with ba·kuti (a prefixing language); B. kuti·ve with kuti·ba (a
suffixing language); C. ve·kuti with kuti (zero prefix); D. kuti·ve with kuti
(zero suffix). Legend: Match, Substitute, Delete, Insert.]

Figure 13.3. Comparing words in Model 1

(SSMMMM, or 2 SUBSTITUTEs followed by 4 MATCHes, for LP, and
MMMMSS for LS) are of course the same as those calculated by the standard
Levenshtein distance, and their costs are the same for both languages.
However, the paradigm similarity index PSI is not defined in terms of the
best match, but in terms of the average cost of all possible sequences of edit
operations—see (6). The average costs are different; they are much higher for
LP than for LS. For LS, the cost is dependent only on the cost of matching the
two suffixes. The stems are always matched by the optimal sequence of
MATCH operations. Therefore a deviation from the optimal sequence can
occur only in the suffix. In LP, however, the uncertainty occurs at the begin-
ning of the word and a deviation from the optimal sequence there introduces
uncertainty later that causes further deviations from the optimal sequence of
operations. The worst sequences for LS contain 4 MATCHes, 2 DELETEs, and
2 INSERTs; the cost is 4. The worst sequences for LP contain 6 DELETEs and
6 INSERTs; the cost is 12.
In the case of languages using zero affixes, the difference is even more apparent,
as C and D in Figure 13.3 show. Model 1 allows only one sequence of edit
operations for the words kuti and kuti·ve of the suffixing language L′S—
MMMMII.12 The cost is equal to 2, and since there are no other possibilities,
the average cost of matching those two words is trivially optimal. The optimal
sequence for the words kuti and ve·kuti of the prefixing language L′P (IIMMMM)
also costs 2. However, there are many other non-optimal sequences. The worst
ones contain 6 INSERTs and 4 DELETEs and have a cost of 10.13

13.4.4.2 Evaluation
We randomly generated pairs of languages in various ways. The members of each pair are identical except for the position of the affix. There is no homonymy in the languages. For each such pair we calculated the following ratio:
(10) sufPref = PSI(LP) / PSI(LS)

If sufPref > 1, Model 1 considers the suffixing language LS easier to acquire than the prefixing language LP.
We generated 100 such pairs of languages with the parameters summarized
in Table 13.4, calculating statistics for sufPref. The alphabet can be thought of
as a set of segments, syllables, or other units. Before discarding homonyms, all
distributions are uniform. As can be seen from Table 13.5, Model 1 indeed considers the generated suffixing languages much simpler than the prefixing ones.
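A toy version of the experiment can be sketched as follows (a simplification, not the authors' implementation: the stems kuti/selp and affixes mo/fa are invented, and PSI is reduced to the mean Model 1 average matching cost over word pairs sharing a stem, rather than the full definition in (6)):

```python
from functools import lru_cache

def avg_cost(a, b):
    """Average cost over all Model 1 edit sequences for a vs. b (MATCH is
    obligatory and free when segments agree; SUBSTITUTE/DELETE/INSERT cost 1)."""
    @lru_cache(maxsize=None)
    def f(i, j):                        # (number of sequences, summed cost)
        if i == len(a):
            return (1, len(b) - j)      # only INSERTs remain
        if j == len(b):
            return (1, len(a) - i)      # only DELETEs remain
        if a[i] == b[j]:
            return f(i + 1, j + 1)      # obligatory MATCH
        n = c = 0
        for n2, c2 in (f(i + 1, j + 1), f(i + 1, j), f(i, j + 1)):
            n, c = n + n2, c + c2 + n2  # each continuation gains cost 1
        return (n, c)
    n, c = f(0, 0)
    return c / n

def psi(paradigms):
    """Toy PSI: mean average matching cost over word pairs sharing a stem."""
    costs = [avg_cost(w1, w2)
             for forms in paradigms
             for k, w1 in enumerate(forms)
             for w2 in forms[k + 1:]]
    return sum(costs) / len(costs)

stems, affixes = ["kuti", "selp"], ["mo", "fa"]     # invented toy vocabulary
lp = [[af + st for af in affixes] for st in stems]  # prefixing language
ls = [[st + af for af in affixes] for st in stems]  # suffixing language
psi_lp, psi_ls = psi(lp), psi(ls)
suf_pref = psi_lp / psi_ls
print(suf_pref > 1)   # True: the suffixing language is easier to cluster
```

Even on this tiny deterministic vocabulary the ratio comes out well above 1, in the direction reported in Table 13.5.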

12 Note that DELETE or INSERT operations cannot be applied if MATCH is possible.
13 In a model using a look-ahead window, the prefixing language would still be more complex, but the difference would be smaller.

Table 13.4. Experiment: parameters

Number of languages 100
Alphabet size 25
Number of stems in a language 50
Shortest stem 1
Longest stem 6
Number of affixes in a language 3
Shortest affix 0
Longest affix 3

Table 13.5. Experiment: results

mean 1.29
standard deviation 0.17
Q1 1.16
median 1.27
Q3 1.33

13.4.4.3 Other processes

Infixes. Model 1 makes an interesting prediction about the complexity of infixes. It considers infixing languages to be more complex than suffixing languages, but less complex than prefixing languages. The reason is simple: the uncertainty is introduced later than in the case of a prefix; therefore the string whose matching can be influenced by a non-optimal choice of operation may be shorter.
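The predicted ranking can be checked on invented forms, computing the Model 1 average matching cost with the affix pair ve/po in each of the three positions (a sketch, not the authors' implementation):

```python
from functools import lru_cache

def avg_match_cost(a, b):
    """Average cost over all Model 1 edit sequences (MATCH obligatory and
    free when segments agree; SUBSTITUTE/DELETE/INSERT cost 1 each)."""
    @lru_cache(maxsize=None)
    def f(i, j):                        # (number of sequences, summed cost)
        if i == len(a):
            return (1, len(b) - j)      # only INSERTs remain
        if j == len(b):
            return (1, len(a) - i)      # only DELETEs remain
        if a[i] == b[j]:
            return f(i + 1, j + 1)      # obligatory MATCH
        n = c = 0
        for n2, c2 in (f(i + 1, j + 1), f(i + 1, j), f(i, j + 1)):
            n, c = n + n2, c + c2 + n2  # each continuation gains cost 1
        return (n, c)
    n, c = f(0, 0)
    return c / n

avg_suf = avg_match_cost("kutive", "kutipo")   # suffix: kuti·ve
avg_inf = avg_match_cost("kuveti", "kupoti")   # infix: ku·ve·ti
avg_pre = avg_match_cost("vekuti", "pokuti")   # prefix: ve·kuti
print(avg_suf < avg_inf < avg_pre)   # True: infixing falls in between
```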
This prediction contradicts the fact that infixes are much rarer than prefixes (§13.1.2). Note, however, that the prediction concerns simplicity of clustering
word forms into paradigms. According to the model, it is easier to cluster
forms of an infixing language into paradigms than those of a prefixing
language. It may well be the case that infixing languages are more complex
from another point of view, that of identification of morphemes: other things
being equal, a discontinuous stem is probably harder to identify than a
continuous one.
Metathesis. The model prefers metathesis occurring later in a string for the same reasons as it prefers suffixes over prefixes. This prediction is in accord with the data (see §13.B.2). However, the model also considers metathesis (of two adjacent segments) to have the same cost as an affix consisting of two segments, and to be even cheaper than an affix with more segments. This definitely does not reflect reality. In §13.4.5.2, we suggest how to rectify this.
Mirror image. Like Model 0, this model considers mirror-image morphology extremely complicated to acquire.
Templatic morphology. As we note in Appendix §13.B.1, templatic morphology does not have to be harder to acquire than morphology using continuous affixes. Following Fowler (1983), it can be claimed that the consonants of the root and the vowels of the inflection are perceptually in different ‘dimensions’—consonants are modulated on the basic vowel contour of syllables—and are therefore clearly separable.

13.4.5 Possible further extensions


13.4.5.1 Model 2: morpheme boundaries and backtracking
In this section we suggest extending Model 1 with the notion of a probabilistic morpheme boundary, to capture the fact that, other things being equal, exceptions and a high number of paradigm patterns make a language harder to acquire. This is just a proposal; we leave a proper evaluation for future research.
Intuitively, a morphological system with a small number of paradigmatic
patterns should be easier to acquire than a system with a large number of
paradigms (or a lot of irregularities). However, the measure in previous
models is strictly local. The cost depends only on the matched pair of
words, not on global distributional properties. This means that words related
by a rare pattern can have the same score as words related by a frequent
pattern. For example, Model 1 considers foot [fut] / feet [fit] to be as similar as dog [dag] / dogs [dagz], or even more similar than bench [bentʃ] /
benches [bentʃɪz]. Thus a language with one paradigmatic pattern is assigned
the same complexity as a language where every lemma has its own paradigm
(assuming the languages are otherwise equal, i.e. they are of the same mor-
phological type and morphemes have the same length).
Model 2 partially addresses this drawback by enhancing Model 1 with
probabilistic morpheme boundaries and backtracking. Probabilistic mor-
pheme boundaries are dependent on global distributional properties, namely
syllable predictability. Which syllable will follow is less predictable across
morphemes than morpheme-internally. This was first observed by Harris
(1955), and is usually exploited in computational linguistics in unsupervised
acquisition of concatenative morphology. Several studies (Johnson and
Jusczyk 2001; Saffran et al. 1996) show that the degree of syllable predictability is one of the cues used in word segmentation. Since acquisition of word
segmentation occurs before morphology acquisition, it is reasonable to
assume that this strategy is available in the case of morphological acquisition
as well. Hay et al. (2003) suggest that this is in fact the case. They found that
clusters that are infrequent in a given language tend to be perceived as being
separated by a morpheme boundary. The transitional probabilities for various syllables14 are more distinct in a language with few regular paradigms.
Thus in such a language morpheme boundaries are easier to determine than
in a highly irregular language.
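The boundary cue can be illustrated with bigram transitional probabilities over a toy corpus (invented forms; a real learner would work over syllables rather than single segments):

```python
from collections import Counter

# Invented corpus: stems kuti-, mali-; suffixes -ve, -po
corpus = ["kutive", "kutipo", "malive", "malipo"]

bigrams = Counter()
contexts = Counter()
for w in corpus:
    for a, b in zip(w, w[1:]):
        bigrams[a + b] += 1
        contexts[a] += 1        # counts a only in non-final position

def tp(a, b):
    """Transitional probability P(b | a) over the toy corpus."""
    return bigrams[a + b] / contexts[a]

print(tp("t", "i"))   # 1.0 — morpheme-internal transition, fully predictable
print(tp("i", "v"))   # 0.5 — probability dips across the morpheme boundary
```

The dip at the stem–suffix seam is exactly the Harris-style cue the text describes; in a language with many irregular paradigms the dips are shallower and harder to exploit.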
In Model 2, the similarity distance between two words is computed using
a stack and backtracking. Each time there is a choice of operation (i.e. whenever the MATCH operation cannot be applied), a choice point is remembered on the stack. This means that Model 2 can correct apparent mistakes in matching that Model 1 could not. The new total
similarity distance between two words is a function of (1) the usual cost of edit

14 It is probable that learners extract similar probabilities on other levels as well—segments, feet, etc.
operations, (2) the size of the stack at each step, and (3) the cost of any backtracking. Each of these adds to the memory load and/or slows down processing.
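A minimal sketch of this machinery (our simplification, not the authors' formalization): a depth-first search over edit scripts with an explicit stack of choice points, where only a minimum-cost script succeeds, so any costlier branch is a dead end that forces a backtrack. Peak stack size stands in for memory load, backtrack count for reanalysis.

```python
def levenshtein(a, b):
    """Standard edit distance (MATCH 0; SUBSTITUTE/DELETE/INSERT 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j - 1] + (x != y),  # MATCH / SUBSTITUTE
                           prev[j] + 1,             # DELETE
                           cur[j - 1] + 1))         # INSERT
        prev = cur
    return prev[-1]

def align_stats(a, b):
    """Return (cost, max_stack, backtracks) for a DFS that only accepts
    an optimal edit script; MATCH is obligatory when segments agree."""
    budget = levenshtein(a, b)
    stack = [(0, 0, 0)]                       # choice points: (i, j, cost)
    max_stack = backtracks = 0
    while stack:
        max_stack = max(max_stack, len(stack))
        i, j, cost = stack.pop()
        if cost > budget:                     # over budget: backtrack
            backtracks += 1
            continue
        if i == len(a) and j == len(b):
            return cost, max_stack, backtracks
        if i < len(a) and j < len(b) and a[i] == b[j]:
            stack.append((i + 1, j + 1, cost))      # obligatory MATCH
            continue
        if i < len(a) and j < len(b):
            stack.append((i + 1, j + 1, cost + 1))  # SUBSTITUTE
        if j < len(b):
            stack.append((i, j + 1, cost + 1))      # INSERT
        if i < len(a):
            stack.append((i + 1, j, cost + 1))      # DELETE

cost_p, stack_p, bt_p = align_stats("vekuti", "pokuti")  # prefixing pair
cost_s, stack_s, bt_s = align_stats("kutive", "kutipo")  # suffixing pair
print(cost_p, cost_s)   # both optimal scripts cost 2
print(bt_p > bt_s)      # True: the prefixing pair needs more backtracking
```

The early uncertainty of the prefixing pair shows up directly as extra choice points and extra backtracks, the two ingredients of Model 2's total distance.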
Matching morpheme boundaries increases the probability that the two
words are being matched the ‘right’ way (i.e. that the match is not accidental).
This means that it is more likely that the choices of edit operations made in
the past were correct, and therefore backtracking is less likely to occur. In such a case, Model 2 flushes the stack. Similarly, the stack can be flushed if a certain number of matches occurs in a row, but a morpheme boundary contributes more to the certainty of the right analysis. In general, we introduce the notion of an anchor, i.e. a sequence of matches of sufficient total weight at which the stack is flushed. This can be further enhanced by assigning different weights to
matching of different segments (consonants are less volatile than vowels).
Morpheme boundaries would then have higher weight than any segment.
Moreover, more probable boundaries would have higher weights than less
probable ones.
Thus in general, a regular language with more predictable morpheme
boundaries needs a smaller stack for clustering words according to their
formal similarity.
Suffix vs. prefix. It is evident that Model 2 also considers prefixing
languages more complex than suffixing languages for two reasons. First, the
early uncertainty of a prefixing language leads to more deviations from the
minimal sequence of edit operations in the same way as in Model 1. Second,
the stack is filled early and the information must be kept there for a longer time; the memory load is therefore higher.
Infixes. Our intuitions tell us that Model 2, unlike Model 1, would
consider an infixing language more complex than a prefixing language. The
reason is that predicting morpheme boundaries using statistics is harder in an
infixing language than in the corresponding prefixing language. However, we have not worked out the formal details of this.

13.4.5.2 Other possibilities


Variable atomic distances. A still more realistic model would need to take into consideration the fact that certain sounds are more likely to be substituted for one another than other sounds. The model would reflect this by using different SUBSTITUTE costs for different sound pairs. For example, substituting [p] for [b], which differ only in voicing, would be cheaper than substituting [p] for [i], which differ in practically all features.
This would reflect (i) language-independent sound similarities related to
perception or production (e.g. substituting a vowel by a vowel would be
cheaper than replacing it by a consonant), and (ii) sound similarities specific to a particular language and gradually acquired by the learner (e.g. [s] and [ʃ] are allophones in Korean but not in Czech, and are therefore often substituted for one another in Korean). Iterative acquisition of such similarities was successfully used by Yarowsky and Wicentowski (2000) (see §13.3.3).
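One way to implement graded SUBSTITUTE costs is to count differing features over feature vectors (a sketch; the toy vectors below are our assumptions, not a serious phonological analysis):

```python
# Toy feature vectors: [consonantal, voiced, labial, high]
FEATURES = {
    "p": (1, 0, 1, 0),
    "b": (1, 1, 1, 0),
    "i": (0, 1, 0, 1),
}

def sub_cost(x, y):
    """SUBSTITUTE cost graded by feature overlap: the more features two
    sounds share, the cheaper it is to substitute one for the other."""
    fx, fy = FEATURES[x], FEATURES[y]
    return sum(a != b for a, b in zip(fx, fy)) / len(fx)

print(sub_cost("p", "b"))   # 0.25 — differ only in voicing
print(sub_cost("p", "i"))   # 1.0  — differ in all of the toy features
```

Language-specific similarities (such as the Korean [s]/[ʃ] case) could then be modelled by letting the learner iteratively re-estimate these costs from data.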
More realistic insert. The model could also employ more realistic
INSERT operations, one referring to a lexicon of acquired items and one
referring to the word to be matched. The former INSERT would allow the
insertion of units recognized as morphemes in the previous iterations
of the second (paradigm discovery) and third stages (pattern discovery) of
the acquisition process. This INSERT is much cheaper than the normal
INSERT. A model containing such an INSERT would consider metathesis
much more complex than, for example, concatenative morphology. The latter
INSERT would work like a copy operation—it would allow inserting material
occurring at another place in the word. This INSERT would make reduplica-
tion very simple.
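The copy-style INSERT can be sketched as a weighted Levenshtein distance in which inserting a segment that already occurs earlier in the target word is cheap (the cost values, and the segment-based notion of copying, are our assumptions for illustration):

```python
def dist_with_copy(a, b, copy_cost=0.1):
    """Weighted edit distance from a to b: inserting b[j] is cheap when the
    same segment occurs earlier in b, making reduplicated material
    inexpensive; DELETE and SUBSTITUTE keep their usual cost of 1."""
    n, m = len(a), len(b)
    ins = [copy_cost if b[j] in b[:j] else 1.0 for j in range(m)]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + 1.0                     # DELETE
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins[j - 1]              # (copy-)INSERT
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else 1.0  # MATCH / SUBSTITUTE
            d[i][j] = min(d[i - 1][j - 1] + sub,
                          d[i - 1][j] + 1.0,
                          d[i][j - 1] + ins[j - 1])
    return d[n][m]

print(dist_with_copy("kuti", "kukuti"))   # 0.2 — syllable reduplication is cheap
print(dist_with_copy("kuti", "vekuti"))   # 2.0 — a genuinely new prefix is not
```

The lexicon-referring INSERT could be added in the same way, by discounting insertions of strings already recognized as morphemes in earlier iterations.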

13.5 Conclusion
We showed that it is possible to model the prevalence of various morpho-
logical systems in terms of their acquisition complexity. Our complexity
measure is based on the Levenshtein edit distance modified to reflect external
constraints—human memory limitations and the fact that language occurs in
time. Such a measure produces some interesting predictions; for example, it
correctly predicts the prefix–suffix asymmetry and shows mirror-image morphology to be virtually impossible.

13.A Morphology acquisition by neural networks


Most of the research on using neural or connectionist networks for morphological acquisition is devoted to finding models that are able to learn both
rules and exceptions (cf. Rumelhart and McClelland 1986; Plunkett and
Marchman 1991; Prasada and Pinker 1993, etc.). Since we are interested in
comparing morphological systems in terms of their typological properties,
this research is not directly relevant.
However, there is also research comparing the acquisition of different
morphological types. Gasser (1994) shows that a simple modular recurrent
connectionist model is able to acquire various inflectional processes, and
that different processes have a different level of acquisition complexity. His
model takes phones (one at a time) as input and outputs the corresponding
stems and inflections. During the training process, the model is exposed to
both forms and the corresponding stem–inflection pairs. This is similar (with
enough simplification) to our idealization of a child being exposed to both
forms and their meanings.
Many of the results are in accord with the preferences attested in real
languages (see §13.1.2): it was easier to identify roots in a suffixing language
than in a prefixing one, the templates were relatively easy, and infixes were
relatively hard.15 In a similar experiment Gasser and Lee (1991) showed that
the model does not learn linguistically implausible languages—pig Latin or a
mirror image language (see (5)). The model was unable to learn any form of
syllable reduplication. A model enhanced with modules for syllable processing was able to learn a very simple form of reduplication—reduplicating the onset
or rime of a single syllable. It is necessary to stress that the problem addressed
by Gasser was much simpler than real acquisition: (1) at most two inflectional
categories were used, each with only two values, (2) each form belonged only
to one paradigm, (3) there were no irregularities, and (4) only the relevant
forms with their functions were presented (no context, no noise).

13.B Templatic morphology, metathesis


13.B.1 Templatic morphology
In templatic morphology, both the roots and affixes are discontinuous. Only
Semitic languages belong to this category. Semitic roots are discontinuous
consonantal sequences formed by three or four consonants (l-m-d ‘learn’).
To form a word the root must be interleaved with a (mostly) vocalic pattern as
in the Hebrew examples in (11).
(11) lomed ‘learn.masc’ shotek ‘be-quiet.pres.masc’
lamad ‘learnt.masc.sg.3rd’ shatak ‘was-quiet.masc.sg.3rd’
limed ‘taught.masc.sg.3rd’ shitek ‘made-sb-to-be-quiet.masc.sg.3rd’
lumad ‘was-taught.masc.sg.3rd’ shutak ‘was-made-to-be-quiet.masc.sg.3rd’

Phonological alternations are possible—e.g. stops alternating with fricatives ([b]/[v]). Semitic morphology is not exclusively templatic: some processes are
also concatenative.
Processing template morphology. From the processing point of view,
template morphology may seem complicated. However, if we assume that
consonants of the root and vowels of the inflection are perceptually in

15 The accuracy of root identification was best in the case of suffixes, templates, and umlaut (c.75%); in the case of prefixes, infixes, and deletion it was lower (c.50%); all above the chance baseline (c.3%). The accuracy of inflection identification showed a different pattern: the best were prefix and circumfix (95+%), slightly harder were deletion, template, and suffix (90+%), and the hardest were umlaut and infix (c.75%); all above the chance baseline (50%).
different ‘dimensions’ and therefore clearly separable, it would not be more complicated than morphology using continuous affixes or suprasegmentals.
Fowler (1983) convincingly argues on phonetic grounds for such an assump-
tion—consonants are modulated on the basic vowel contour of syllables.
Ravid’s (2003) study also suggests that template morphology is not more
difficult to acquire than a concatenative one. She finds that in the case of forms
alternatively produced by template and concatenative processes, children
tend to acquire the template option first. She also claims that young Israeli
children rely on triconsonantal roots as the least marked option when
forming certain verbs. Three-year-old children are able to extract the root
from a word—they are able to interpret novel root-based nouns.

13.B.2 Metathesis
In morphological metathesis, the relative order of two segments encodes a
morphological distinction. For example, in Rotuman (Austronesian family,
related to Fijian), words distinguish two forms, called the complete and
incomplete phase16 by Churchward (1940), and in many cases these are
distinguished by metathesis (examples due to Hoeksema and Janda 1988):17

(12) Complete phase Incomplete phase
aírε aiέr ‘fish’
púrε puέr ‘rule, decide’
tíkɔ tiɔ́k ‘flesh’
sέma sεám ‘left-handed’ (Rotuman)
Although phonological metathesis is not rare, it is far less common than other
processes like assimilation. As a morphological marker (i.e. not induced by
phonotactics as a consequence of other changes) it is extremely rare—found
in some Oceanic (including the above-mentioned Rotuman) and North
American Pacific Northwest languages (e.g. Sierra Miwok, Mutsun) (Becker
2000). According to Janda (2011), it is probable that in such cases of metathesis some other means originally marked the morphological category; metathesis was only a consequence of phonotactic constraints, and only later became the primary marker.
Mielke and Hume (2001) examined 54 languages involving metathesis and
found that it is very rare word/root-initially or with non-adjacent segments.

16 According to Hoeksema and Janda (1988), the complete phase indicates definiteness or emphasis for nouns and perfective aspect or emphasis for verbs and adjectives, while the incomplete phase marks words as indefinite/imperfective and nonemphatic.
17 In many cases, subtraction (rako vs. rak ‘to imitate’), subtraction with umlaut (hoti vs. höt ‘to embark’), or identity (rī vs. rī ‘house’) is used instead. See McCarthy (2000) for more discussion.
They found only one language (Fur) with a fully productive root-initial
metathesis involving a wide variety of sounds. Apparent cases of non-adjacent
metathesis can usually be analyzed as two separate metatheses, each motivated by an independent phonological constraint.
Processing metathesis. Mielke and Hume (2001) suggest that the reasons
for the relative infrequency of metathesis are related to word recognition—
metathesis impedes word recognition more than other frequent processes,
like assimilation. Word recognition (see §13.3.1) can also explain the fact that it is even rarer (or perhaps nonexistent) word/root-initially or with non-adjacent segments: (i) lexical access is generally achieved on the basis of the initial part of the word, and (ii) phonological changes involving non-adjacent segments are generally more disruptive to word recognition.
References

Adger, David (2003). Core Syntax: A Minimalist Approach. Oxford: Oxford University
Press.
Aissen, Judith, and Joan Bresnan (2002). Optimality and functionality: objections and
refutations. Natural Language and Linguistic Theory 20: 81–95.
Akmajian, Adrian (1970). On deriving cleft sentences from pseudocleft sentences.
Linguistic Inquiry 1: 149–68.
Akmajian, Adrian, and Frank Heny (1975). An Introduction to Transformational
Generative Grammar. Cambridge, Mass.: MIT Press.
Anderson, Stephen R. (1977). Comments on the paper by Wasow. In Peter W. Culicover,
Thomas Wasow, and Adrian Akmajian (eds), Formal Syntax, 361–77. New York:
Academic Press.
Arnold, Jennifer, Thomas Wasow, Anthony Losongco, and Ryan Ginstrom (2000).
Heaviness vs. newness: the effects of structural complexity and discourse status on
constituent ordering. Language 76: 28–55.
Austin, J. L. (1962). How to Do Things with Words. New York: Oxford University Press.
Authier, Jean-Marc (1991). Iterated CPs and embedded topicalization. Linguistic
Inquiry 23: 329–36.
Bach, Emmon (1980). In defense of passive. Linguistics and Philosophy 3: 297–341.
Baker, Mark (1988). Incorporation: A Theory of Grammatical Function Changing.
Chicago: University of Chicago Press.
Baltin, Mark (1978). Towards a theory of movement rules. Dissertation, MIT.
Baltin, Mark (1981). Strict bounding: the logical problem of language acquisition. In
Carl Lee Baker and John J. McCarthy (eds), The Logical Problem of Language
Acquisition, 247–95. Cambridge, Mass.: MIT Press.
Baltin, Mark (1982). A landing site theory of movement rules. Linguistic Inquiry 13: 1–38.
Baroni, Marco (2000). Distributional cues in morpheme discovery: a computational
model and empirical evidence. Dissertation, UCLA.
Baroni, Marco, Johannes Matiasek, and Harald Trost (2002). Unsupervised discovery
of morphologically related words based on orthographic and semantic similarity. In
Proceedings of the ACL-02 Workshop on Morphological and Phonological Learning,
vol. 6: 48–57.
Bayer, Josef (1984). Towards an explanation of certain that-t phenomena: the COMP
node in Bavarian. In W. de Geest and Y. Putseys (eds), Sentential Complementation,
23–32. Dordrecht: Foris.
Becker, Thomas (2000). Metathesis. In Geert Booij, Christian Lehmann, and Joachim
Mugdan (eds), Morphology: A Handbook on Inflection and Word Formation, 576–81.
Berlin: Mouton de Gruyter.
Beckman, Mary E., Julia Hirschberg, and Stephanie Shattuck-Hufnagel (2005). The
original ToBI system and the evolution of the ToBI framework. In Sun-Ah Jun (ed.),
Prosodic Typology: The Phonology of Intonation and Phrasing, 9–54. Cambridge:
Cambridge University Press.
Beerman, Dorothee, David LeBlanc, and Henk van Riemsdijk (eds) (1997). Rightward
Movement. Amsterdam: Benjamins.
Belletti, Adriana (2004). Structures and Beyond: The Cartography of Syntactic Struc-
tures. New York: Oxford University Press.
Berman, Arlene, and Michael Szamosi (1972). Observations on sentential stress.
Language 48: 304–25.
Berwick, Robert C. (1987). Parsability and learnability. In Brian MacWhinney (ed.),
Mechanisms of Language Acquisition, 345–65. Hillsdale, NJ: Erlbaum.
Beukema, Frits, and Peter Coopmans (1989). A government-binding perspective on
the imperative in English. Journal of Linguistics 25: 417–36.
Bever, Thomas G. (1970). The cognitive basis for linguistic structures. In John
R. Hayes (ed.), Cognition and the Development of Language, 279–362. New York:
Wiley.
Bever, Thomas G., and Brian McElree (1988). Empty categories access their antecedents during comprehension. Linguistic Inquiry 19: 35–43.
Bever, Thomas G., and David J. Townsend (2001). Sentence Comprehension: The
Integration of Habits and Rules. Cambridge, Mass.: MIT Press.
Bierwisch, Manfred (1968). Two critical problems of accent rules. Journal of Linguistics
4: 173–8.
Bing, Janet (1979). Aspects of English prosody. Dissertation, University of Massachusetts, Amherst.
Bolinger, Dwight (1958). Stress and information. American Speech 33: 3–20.
Bolinger, Dwight (1961). Contrastive accent and contrastive stress. Language 37: 83–96.
Bolinger, Dwight (1972). Accent is predictable (if you’re a mind-reader). Language 48:
633–44.
Borer, Hagit (1989). Anaphoric AGR. In Osvaldo Jaeggli and Kenneth Safir (eds), The
Null Subject Parameter, 69–110. Dordrecht: Kluwer.
Brame, Michael (1975). On the abstractness of syntactic structure: the VP controversy.
Linguistic Analysis 1: 191–203.
Brame, Michael (1978). Base Generated Syntax. Seattle, Wa.: Noit Amrofer.
Bransford, John D., and Jeffery J. Franks (1971). The abstraction of linguistic ideas. Cognitive Psychology 2: 331–50.
Bresnan, Joan (1971). Sentence stress and syntactic transformations. Language 47:
257–81.
Bresnan, Joan (1972). Stress and syntax: a reply. Language 48: 326–42.
Bresnan, Joan (1976). Evidence for a theory of unbounded transformations. Linguistic
Analysis 2: 353–93.
Bresnan, Joan (1977). Variables in the theory of transformations. In Peter W. Culicover, Thomas Wasow, and Adrian Akmajian (eds), Formal Syntax, 157–96. New York: Academic Press.
Bresnan, Joan (1978). A realistic model of transformational grammar. In Morris Halle, Joan W. Bresnan, and George Miller (eds), Linguistic Theory and Psychological Reality, 1–59. Cambridge, Mass.: MIT Press.
Bresnan, Joan (1982a). The Mental Representation of Grammatical Relations. Cam-
bridge, Mass.: MIT Press.
Bresnan, Joan (1982b). Control and complementation. In Bresnan (1982a: 282–390).
Bresnan, Joan (1982c). The passive in grammatical theory. In Bresnan (1982a: 3–86).
Bresnan, Joan (1994). Locative inversion and the architecture of universal grammar.
Language 70: 72–131.
Bresnan, Joan (2000). Optimal syntax. In Joost Dekkers, Frank van der Leeuw, and
Jeroen van de Weijer (eds), Optimality Theory: Phonology, Syntax and Acquisition,
335–85. Oxford: Oxford University Press.
Bresnan, Joan (2001). Lexical-Functional Syntax. Oxford: Wiley-Blackwell.
Briscoe, Edward (2000). Grammatical acquisition: inductive bias and coevolution of
language and the language acquisition device. Language 76: 245–96.
Brown, C. M., and P. Hagoort (1999). The Neurocognition of Language. New York:
Oxford University Press.
Brown, Roger, and Camille Hanlon (1970). Derivational complexity and the order of
acquisition in child speech. In John R. Hayes (ed.), Cognition and the Development
of Language, 155–207. New York: Wiley.
Chafe, Wallace L. (1976). Givenness, contrastiveness, definiteness, subjects, topics, and
point of view. In Charles N. Li (ed.), Subject and Topic, 22–55. New York: Academic Press.
Chierchia, Gennaro (1985). Formal semantics and the grammar of predication. Lin-
guistic Inquiry 16: 417–43.
Chomsky, Noam (1955). The Logical Structure of Linguistic Theory. New York: Plenum.
Chomsky, Noam (1957). Syntactic Structures. The Hague: Mouton.
Chomsky, Noam (1964). Current Issues in Linguistic Theory. The Hague: Mouton.
Chomsky, Noam (1965). Aspects of the Theory of Syntax. Cambridge, Mass.: MIT Press.
Chomsky, Noam (1971). Deep structure, surface structure and semantic interpretation. In Danny Steinberg and Leon Jacobovits (eds), Semantics, 183–216. Cambridge: Cambridge University Press.
Chomsky, Noam (1972). Remarks on nominalization. In Roderick A. Jacobs and
Peter S. Rosenbaum (eds), Readings in English Transformational Grammar, 184–221.
London: Ginn.
Chomsky, Noam (1973). Conditions on transformations. In Stephen Anderson and
Paul Kiparsky (eds), Festschrift for Morris Halle, 232–86. New York: Holt, Rinehart
& Winston.
Chomsky, Noam (1976). Conditions on rules of grammar. Linguistic Analysis 2: 303–51.
Chomsky, Noam (1977). On wh movement. In Peter W. Culicover, Thomas Wasow,
and Adrian Akmajian (eds), Formal Syntax, 71–132. New York: Academic Press.
Chomsky, Noam (1980). On binding. Linguistic Inquiry 11: 1–46.
Chomsky, Noam (1981a). Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, Noam (1981b). Markedness and core grammar. In Adriana Belletti, Luciana
Brandi, and Luigi Rizzi (eds), Theory of Markedness in Generative Grammar, 123–46.
Pisa: Scuola Normale Superiore.
Chomsky, Noam (1986). Barriers. Cambridge, Mass.: MIT Press.


Chomsky, Noam (1989). Some notes on economy of derivation and representation.
MIT Working Papers in Linguistics 10.
Chomsky, Noam (1995). The Minimalist Program. Cambridge, Mass.: MIT Press.
Chomsky, Noam, and Morris Halle (1968). The Sound Pattern of English. New York:
Harper & Row.
Chomsky, Noam, Morris Halle, and Fred Lukoff (1956). On accent and juncture in English. In Morris Halle, Horace Lunt, Hugh MacLean, and Cornelis van Schooneveld (eds), For Roman Jakobson, 65–80. The Hague: Mouton.
Chomsky, Noam, and Howard Lasnik (1977). Filters and control. Linguistic Inquiry 8:
425–504.
Churchward, C. Maxwell (1940). Rotuman Grammar and Dictionary. Sydney: Methodist Church of Australasia.
Cinque, Guglielmo (1990). Types of A-Bar Dependencies. Cambridge, Mass.: MIT Press.
Cinque, Guglielmo (2002). Functional Structure in DP and IP: The Cartography of
Syntactic Structures. New York: Oxford University Press.
Cinque, Guglielmo (2006). Restructuring and Functional Heads: The Cartography of
Syntactic Structures. New York: Oxford University Press.
Cinque, Guglielmo and Luigi Rizzi (2008). The cartography of syntactic structures.
Studies in Linguistics 2: 42–58.
Cole, R. A. (1973). Listening for mispronunciations: a measure of what we hear during
speech. Attention, Perception, and Psychophysics 13: 153–6.
Cole, R. A., and J. Jakimik (1978). Understanding speech: how words are heard. In G. Underwood (ed.), Strategies of Information Processing, 67–116. London: Academic Press.
Cole, R. A., and J. Jakimik (1980). How are syllables used to recognize words? Journal
of the Acoustical Society of America 67: 965.
Collins, Chris (1991). Why and how come. MIT Working Papers in Linguistics 15.
Connine, C. M., D. G. Blasko, and D. Titone (1993). Do the beginnings of spoken
words have a special status in auditory word recognition? Journal of Memory and
Language 32: 193–210.
Contreras, Helas (1976). A Theory of Word Order with Special Reference to Spanish.
Amsterdam: North-Holland.
Cooper, W. E., and J. Paccia-Cooper (1980). Syntax and Speech. Cambridge, Mass.:
Harvard University Press.
Coopmans, Peter (1989). Where stylistic and syntactic processes meet: inversion in
English. Language 65: 728–51.
Coopmans, Peter (1992). Review of Rochemont and Culicover (1990), English Focus
Constructions and the Theory of Grammar. Language 68: 206–10.
Culicover, Peter W. (1970). One more can of beer. Linguistic Inquiry 1: 366–9.
Culicover, Peter W. (1971). Syntactic and semantic investigations. Dissertation, MIT.
Culicover, Peter W. (1973). On the coherence of syntactic descriptions. Journal of
Linguistics 9: 35–51.
Culicover, Peter W. (1976). Syntax. New York: Academic Press.
Culicover, Peter W. (1977). Some observations concerning pseudo-clefts. Linguistic Analysis 3: 347–75.
Culicover, Peter W. (1982). Though-Attraction. Bloomington, Ind.: Indiana University Linguistics Club.
Culicover, Peter W. (1991). Polarity, inversion and focus in English. In ESCOL ’91: Proceedings of the Eighth Eastern States Conference on Linguistics, 46–68. Columbus, Ohio.
Culicover, Peter W. (1992a). Topicalization, inversion, and complementizers in English. In Denis Delfitto, Martin Everaert, Arnold Evers, and Frits Stuurman (eds), Going Romance and Beyond: Fifth Symposium on Comparative Grammar, 1–43. Utrecht: OTS.
Culicover, Peter W. (1992b). Focus and grammar. Proceedings of CONSOLE 1, OTS
Working Papers, University of Utrecht, Utrecht: OTS.
Culicover, Peter W. (1993a). Degrees of freedom. Proceedings of the Annual Child
Language Research Forum 25: 30–37.
Culicover, Peter W. (1993b). Evidence against ECP accounts of the that-t effect.
Linguistic Inquiry 24: 557–61.
Culicover, Peter W. (1999). Syntactic Nuts: Hard Cases in Syntax. Oxford: Oxford
University Press.
Culicover, Peter W. (2013). Grammar and Complexity: Language at the Intersection of
Competence and Performance. Oxford: Oxford University Press.
Culicover, Peter W., and Ray Jackendoff (1995). Something else for the binding theory.
Linguistic Inquiry 26: 249–75.
Culicover, Peter W., and Ray Jackendoff (1997). Syntactic coordination despite seman-
tic subordination. Linguistic Inquiry 28: 195–217.
Culicover, Peter W., and Ray Jackendoff (1999). The view from the periphery: the
English comparative correlative. Linguistic Inquiry 30: 543–71.
Culicover, Peter W., and Ray Jackendoff (2001). Control is not movement. Linguistic Inquiry 32: 493–512.
Culicover, Peter W., and Ray Jackendoff (2005). Simpler Syntax. Oxford: Oxford
University Press.
Culicover, Peter W., and Ray Jackendoff (2006). Turn control over to the semantics.
Syntax 9: 131–52.
Culicover, Peter W., and Ray Jackendoff (2012). A domain-general cognitive relation
and how language expresses it. Language 88: 305–40.
Culicover, Peter W., and Robert D. Levine (2001). Stylistic inversion and the that-t effect
in English: a reconsideration. Natural Language and Linguistic Theory 19: 283–310.
Culicover, Peter W., and Andrzej Nowak (2002). Learnability, markedness, and the
complexity of constructions. In Pierre Pica and Johan Rooryk (eds), Language
Variation Yearbook, vol. 2, 5–30. Amsterdam: Benjamins.
Culicover, Peter W., and Andrzej Nowak (2003). Dynamical Grammar. Oxford:
Oxford University Press.
Culicover, Peter W., and Michael S. Rochemont (1983). Stress and focus in English.
Language 59: 123–65.
Culicover, Peter W., and Michael S. Rochemont (1990). Extraposition and the complement principle. Linguistic Inquiry 21: 23–48.
Culicover, Peter W., and Kenneth Wexler (1977). Some syntactic consequences of a
theory of language learnability. In Peter W. Culicover, Thomas Wasow, and Adrian
Akmajian (eds), Formal Syntax, 7–60. New York: Academic Press.
Culicover, Peter W., and Wendy Wilkins (1984). Locality in Linguistic Theory. New
York: Academic Press.
Culicover, Peter W., and Susanne Winkler (2008). English focus inversion constructions. Journal of Linguistics 44: 625–58.
Cutler, Anne, John A. Hawkins, and Gary Gilligan (1985). The suffixing preference:
a processing explanation. Linguistics 23: 723–58.
Daneš, František (1967). Order of elements and sentence intonation. In Morris Halle,
Horace Lunt, Hugh MacLean, and Cornelis van Schooneveld (eds), For Roman
Jakobson, 499–512. The Hague: Mouton.
Delahunty, Gerald P. (1981). Topics in the syntax and semantics of English cleft
sentences. Dissertation, University of California, Irvine.
den Dikken, Marcel (2005). Comparative correlatives comparatively. Linguistic
Inquiry 36: 497–533.
Diesing, Molly (1990). Verb movement and the subject position in Yiddish. Natural
Language and Linguistic Theory 8: 41–79.
Dogil, Gregory (1979). Autosegmental Account of Phonological Emphasis. Carbondale,
Ill.: Linguistic Research.
Downing, Bruce T. (1970). Syntactic structure and phonological phrasing in English.
Dissertation, University of Texas, Austin.
Dowty, David (1985). On recent analyses of the semantics of control. Linguistics and
Philosophy 8: 291–331.
Dowty, David (1991). Thematic proto-roles and argument selection. Language 67:
547–619.
Dresher, Elan (1977). Logical representations and linguistic theory. Linguistic Inquiry
8: 351–78.
E. Kiss, Katalin (ed.) (1992). Discourse Configurationality. Oxford: Oxford University
Press.
Emonds, Joseph (1970). Root and Structure-Preserving Transformations. Bloomington:
Indiana University Linguistics Club.
Emonds, Joseph (1976). A Transformational Approach to English Syntax. New York:
Academic Press.
Farmer, Ann K. (1984). Modularity in Syntax. Cambridge, Mass.: MIT Press.
Featherston, Sam (2001). Empty Categories in Sentence Processing. Amsterdam:
Benjamins.
Fillmore, Charles J. (1965). Indirect Object Constructions and the Ordering of Trans-
formations. The Hague: Mouton.
Fillmore, Charles J. (1999). Inversion and constructional inheritance. In Gert Webel-
huth, Jean-Pierre Koenig, and Andreas Kathol (eds), Lexical and Constructional
Aspects of Linguistic Explanation, 113–28. Stanford, Calif.: CSLI.
Fillmore, Charles J., Paul Kay, and Mary Catherine O’Connor (1988). Regularity and
idiomaticity in grammatical constructions: the case of let alone. Language 64: 501–39.
Fodor, Janet D. (1978). Parsing strategies and constraints on transformations. Linguis-
tic Inquiry 9: 427–73.
Fodor, Jerry A., and Merrill Garrett (1967). Some syntactic determinants of sentential
complexity. Perception and Psychophysics 2: 289–96.
Fodor, Jerry A., Thomas Bever, and Merrill Garrett (1974). The Psychology of Language:
An Introduction to Psycholinguistics and Generative Grammar. New York: McGraw-
Hill.
Fowler, Carol A. (1983). Converging sources of evidence on spoken and perceived
rhythms of speech: cyclic production of vowels in monosyllabic stress feet. Journal
of Experimental Psychology: General 112: 386.
Frampton, John (1990). Parasitic gaps and the theory of wh-chains. Linguistic Inquiry
21: 49–77.
Freidin, Robert (1975). The analysis of passives. Language 51: 384–405.
Friedmann, Naama, and Louis P. Shapiro (2003). Agrammatic comprehension of OSV
and OVS sentences in Hebrew. Journal of Speech, Language and Hearing Research
46: 288–97.
Fukui, Naoki, and Margaret Speas (1986). Specifiers and projection. MIT Working
Papers in Linguistics 8: 128–72.
Gallistel, C. Randall (1990). The Organization of Learning. Cambridge, Mass.: MIT Press.
Gasser, Michael (1994). Acquiring receptive morphology: a connectionist model.
Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics,
279–86.
Gasser, Michael, and Chan-Do Lee (1991). A short-term memory architecture for
the learning of morphophonemic rules. In R. P. Lippmann, J. E. Moody, and
D. S. Touretzkey (eds), Advances in Neural Information Processing Systems 3,
605–11. San Mateo, Calif: Morgan Kaufmann.
Gazdar, Gerald (1981). Unbounded dependencies and coordinate structure. Linguistic
Inquiry 12: 155–84.
Gazdar, Gerald, Ewan Klein, Geoffrey Pullum, and Ivan A. Sag (1985). Generalized
Phrase Structure Grammar. Cambridge, Mass.: Harvard University Press.
Givón, Talmy (1979). On Understanding Grammar. New York: Academic Press.
Goldberg, Adele E. (1995). Constructions: A Construction Grammar Approach to Argu-
ment Structure. Chicago: University of Chicago Press.
Goldberg, Adele E. (2006). Constructions at Work: Constructionist Approaches in
Context. Oxford: Oxford University Press.
Goldberg, Adele E., and Ray Jackendoff (2004). The English resultative as a family of
constructions. Language 80: 532–67.
Greenberg, Joseph H. (1957). Essays in Linguistics. Chicago: University of Chicago
Press.
Greenberg, Joseph H. (1963). Some universals of grammar with particular reference to
the order of meaningful elements. In Universals of Language, 73–113. Cambridge,
Mass.: MIT Press.
Grice, H. P. (1975). Logic and conversation. In Peter Cole and Jerry L. Morgan (eds),
Speech Acts, 41–58. New York: Academic Press.
Grimshaw, Jane (1975). Relativization by deletion in Chaucerian Middle English. In
Jane Grimshaw (ed.), Papers in the History and Structure of English 1. Amherst,
Mass.: University of Massachusetts.
Grimshaw, Jane (1979). Complement selection and the lexicon. Linguistic Inquiry 10:
279–326.
Grimshaw, Jane (1997). Projections, heads and optimality. Linguistic Inquiry 28: 373–422.
Grodzinsky, Yosef (2000). The neurology of syntax: language use without Broca’s area.
Behavioral and Brain Sciences 23: 1–71.
Grosu, Alexander (1975). The position of fronted wh phrases. Linguistic Inquiry
6: 588–99.
Gruber, Jeffrey S. (1965). Studies in lexical relations. Dissertation, MIT.
Gruber, Jeffrey S. (1967). Disjunctive ordering among lexical insertion rules. MS, MIT.
Guéron, Jacqueline (1980). On the syntax and semantics of PP extraposition. Linguis-
tic Inquiry 11: 637–78.
Guéron, Jacqueline, and Robert May (1984). Extraposition and logical form. Linguistic
Inquiry 15: 1–31.
Gundel, Janet (1974). The role of topic and comment in linguistic theory. Dissertation,
University of Texas at Austin.
Gunter, Richard (1966). On the placement of accent in dialogue: a feature of context
grammar. Journal of Linguistics 2: 159–79.
Haegeman, Liliane (1991). Negative concord, negative heads. In Denis Delfitto, Martin
Everaert, Arnold Evers, and Frits Stuurman (eds), Going Romance and Beyond: Fifth
Symposium on Comparative Grammar. Utrecht: OTS.
Haider, Hubert (1986). V-second in German. In Hubert Haider and Martin Prinzhorn
(eds), Verb Second Phenomena in Germanic Languages, 49–75. Dordrecht: Foris.
Hale, John T. (2003). The information conveyed by words in sentences. Journal of
Psycholinguistic Research 32: 101–23.
Hale, Kenneth, LaVerne Jeanne, and Paul Platero (1977). Three cases of overgenera-
tion. In Peter W. Culicover, Thomas Wasow, and Adrian Akmajian (eds), Formal
Syntax, 379–416. New York: Academic Press.
Hall, Christopher J. (1988). Integrating diachronic and processing principles in
explaining the suffixing preference. In John A. Hawkins (ed.), Explaining Language
Universals, 321–49. Oxford: Blackwell.
Harris, Zellig S. (1955). From phoneme to morpheme. Language 31: 190–222.
Hasegawa, Nobuko (1981). The VP complement and control phenomena: beyond
trace theory. Linguistic Analysis 7: 85–120.
Hauser, Marc D. (2000). Wild Minds: What Animals Really Think. New York: Holt.
Hawkins, John A. (1994). A Performance Theory of Order and Constituency. Cambridge:
Cambridge University Press.
Hawkins, John A., and Gary Gilligan (1988). Prefixing and suffixing universals in
relation to basic word order. Lingua 74: 219–59.
Hay, Jennifer, Janet Pierrehumbert, and Mary E. Beckman (2003). Speech perception,
well-formedness and the statistics of the lexicon. In John Local, Richard Ogden,
and Rosalyn Temple (eds), Papers in Laboratory Phonology, vol. 6, 58–74. Cam-
bridge: Cambridge University Press.
Hoeksema, Jack, and Richard D. Janda (1988). Implications of process morphology for
categorial grammar. In Richard T. Oehrle, Emmon Bach, and Dierdre Wheeler
(eds), Categorial Grammars and Natural Language Structures, 199–247. New York:
Academic Press.
Hoekstra, Eric (1991). Licensing conditions on phrase structure. Dissertation, Univer-
sity of Groningen.
Hoekstra, Teun, and René Mulder (1990). Unergatives as copula verbs: location and
existential predication. Linguistic Review 7: 1–79.
Hofmeister, Philip (2011). Representational complexity and memory retrieval in
language comprehension. Language and Cognitive Processes 26: 376–405.
Hooper, Joan, and Sandra A. Thompson (1973). On the applicability of root trans-
formations. Linguistic Inquiry 4: 465–97.
Horvath, Julia (1979). Core grammar and a stylistic rule in Hungarian syntax. NELS
9: 237–55.
Horvath, Julia (1985). Focus in the Theory of Grammar and the Syntax of Hungarian.
Dordrecht: Foris.
Jackendoff, Ray (1969). An interpretive theory of negation. Foundations of Language
5: 218–41.
Jackendoff, Ray (1972). Semantic Interpretation in Generative Grammar. Cambridge,
Mass.: MIT Press.
Jackendoff, Ray (1977). X-Bar Syntax: A Study of Phrase Structure. Cambridge, Mass.:
MIT Press.
Jackendoff, Ray (1990). Semantic Structures. Cambridge, Mass.: MIT Press.
Jackendoff, Ray (1997). The Architecture of the Language Faculty. Cambridge, Mass.:
MIT Press.
Jackendoff, Ray (2002). Foundations of Language. Oxford: Oxford University Press.
Jackendoff, Ray, and Peter W. Culicover (1972). A reconsideration of dative move-
ment. Foundations of Language 6: 197–219.
Jackendoff, Ray, and Peter W. Culicover (2003). The semantic basis of control.
Language 79: 517–56.
Jacobson, Pauline (1992). Antecedent contained deletion in a variable-free semantics.
In Chris Barker and David Dowty (eds), Proceedings of the Second Conference on
Semantics and Linguistic Theory, 193–213. Columbus: Department of Linguistics,
Ohio State University.
Jacquemin, Christian (1997). Guessing morphology from terms and corpora. Proceed-
ings of the 20th Annual International Conference on Research and Development in
Information Retrieval, 156–67.
Jaeggli, Osvaldo (1980). Remarks on to contraction. Linguistic Inquiry 11: 239–46.
Jaeggli, Osvaldo (1982). Topics in Romance Syntax. Dordrecht: Foris.
Janda, Richard D. (2011). Why morphological metathesis rules are rare: on the
possibility of historical explanation in linguistics. In Proceedings of the Annual
Meeting of the Berkeley Linguistics Society, 87–103.
Jespersen, Otto (1949). A Modern English Grammar on Historical Principles, 7: Syntax.
London: Allen & Unwin.
Johnson, E. K., and P. W. Jusczyk (2001). Word segmentation by 8-month-olds: when
speech cues count more than statistics. Journal of Memory and Language 44: 548–67.
Johnson, Kyle (1985). A case for movement. Dissertation, MIT.
Johnson, Kyle (1989). Clausal architecture and structural case. MS, University of
Wisconsin-Madison.
Kathol, Andreas, and Robert D. Levine (1992). Inversion as a linearization effect. In
Amy Schaefer (ed.), Proceedings of NELS 23, 207–21. Amherst, Mass.: GLSA.
Katz, Jerrold J., and Paul M. Postal (1964). Toward an Integrated Theory of Linguistic
Descriptions. Cambridge, Mass.: MIT Press.
Kay, Paul (2002a). An informal sketch of a formal architecture for construction
grammar. Grammars 5: 1–19.
Kay, Paul (2002b). English subjectless tagged sentences. Language 78: 453–81.
Kay, Paul, and Charles J. Fillmore (1999). Grammatical constructions and linguistic
generalizations: the what’s x doing y? construction. Language 75: 1–33.
Kayne, Richard S. (1981a). ECP extensions. Linguistic Inquiry 12: 93–133.
Kayne, Richard S. (1981b). On certain differences between French and English. Lin-
guistic Inquiry 12: 349–71.
Kayne, Richard S. (1994). The Antisymmetry of Syntax. Cambridge, Mass.: MIT Press.
Keenan, Edward (1980). Passive is phrasal (not sentential or lexical). In Teun Hoek-
stra, Harry van der Hulst, and Michael Moortgat (eds), Lexical Grammar, 181–213.
Dordrecht: Foris.
Kehler, Andrew (2000). Coherence and the resolution of ellipsis. Linguistics and
Philosophy 23: 533–75.
Keyser, Samuel J. (1967). Machine recognition of transformational grammars of English.
DTIC document: http://oai.dtic.mil/oai/oai?verb=getRecord&metadataPrefix
=html&identifier=AD0653993.
Kirby, Simon (1994). Adaptive explanations for language universals. Sprachtypologie
und Universalienforschung 47: 186–210.
Kisseberth, Charles W. (1970). On the functional unity of phonological rules. Linguis-
tic Inquiry 1: 291–306.
Klavans, Judith L., and Philip Resnik (1996). The Balancing Act: Combining Symbolic
and Statistical Approaches to Language. Cambridge, Mass.: MIT Press.
Klein, Sharon M. (1981). Syntactic theory and the developing grammar. Dissertation,
UCLA.
Klima, Edward (1964). Negation in English. In Jerry Fodor and Jerrold J. Katz (eds),
The Structure of Language, 246–323. Englewood Cliffs, NJ: Prentice-Hall.
Klima, Edward (1970). Regulatory devices against functional ambiguity. MS, MIT.
Kluender, Robert (1998). On the distinction between strong and weak islands: a
processing perspective. In Peter W. Culicover and Louise McNally (eds), The Limits
of Syntax, 241–79. New York: Academic Press.
Kluender, Robert (2004). Are subject islands subject to a processing account? In
Benjamin Schmeiser, Vineeta Chand, Ann Kelleher, and Angelo Rodriguez (eds),
Proceedings of WCCFL 23, 101–25. Somerville, Mass.: Cascadilla Press.
Koizumi, Masatoshi (1991). Syntax of adjuncts and the phrase structure of Japanese.
Dissertation, Ohio State University.
Koopman, Hilda (1983). ECP effects in main clauses. Linguistic Inquiry 14: 346–50.
Koster, Jan (1978a). Locality Principles in Syntax. Dordrecht: Foris.
Koster, Jan (1978b). Why subject sentences don’t exist. In Samuel J. Keyser (ed.),
Recent Transformational Studies in European Languages, 53–64. Cambridge, Mass.:
MIT Press.
Koster, Jan, and Robert May (1981). On the constituency of infinitives. Language 58:
116–43.
Kuroda, S.-Y. (1968). Review of Fillmore (1965). Language 44: 374–78.
Ladd, Robert (1980). The Structure of Intonational Meaning. Bloomington: Indiana
University Press.
Laka, Itziar (1990). Negation in syntax: on the nature of functional categories and
projections. Dissertation, MIT.
Lakoff, George (1969). On derivational constraints. In Robert I. Binnick, Alice Davi-
son, Georgia Green, and Jerry L. Morgan (eds), Papers from the Fifth Regional
Meeting of the Chicago Linguistic Society, 117–39. Chicago: CLS.
Lakoff, George (1970). Linguistics and natural logic. Synthese 22: 151–271.
Lakoff, George (1971). On the Nature of Syntactic Irregularity. New York: Holt, Rine-
hart & Winston.
Lakoff, George (1972). The global nature of the Nuclear Stress Rule. Language 48:
285–303.
Lakoff, Robin (1969). A syntactic argument for negative transportation. In Robert
I. Binnick, Alice Davison, Georgia Green, and Jerry L. Morgan (eds), Papers from
the Fifth Regional Meeting of the Chicago Linguistic Society, 140–47. Chicago: CLS.
Landau, Idan (2006). Severing the distribution of PRO from case. Syntax 9: 153–70.
Lappin, Shalom (1996). The interpretation of ellipsis. In Shalom Lappin (ed.), Hand-
book of Contemporary Semantic Theory, 145–75. Oxford: Blackwell.
Lappin, Shalom, Robert D. Levine, and David Johnson (2000). The structure of
unscientific revolutions. Natural Language and Linguistic Theory 18: 665–71.
Larson, Richard (1988). On the double object construction. Linguistic Inquiry 19: 335–91.
Larson, Richard (1990). Double objects revisited: reply to Jackendoff. Linguistic
Inquiry 21: 589–632.
Lasnik, Howard (2001). When can you save a structure by destroying it? NELS
31: 301–20.
Lasnik, Howard (2002). The minimalist program in syntax. Trends in Cognitive
Sciences 6: 432–37.
Lasnik, Howard, and Mamoru Saito (1984). On the nature of proper government.
Linguistic Inquiry 15: 235–89.
Lasnik, Howard, and Mamoru Saito (1992). Move Alpha. Cambridge, Mass.: MIT
Press.
Lasnik, Howard, and Tim Stowell (1991). Weakest crossover. Linguistic Inquiry 22:
687–720.
Latané, Bibb (1996). The emergence of clustering and correlation from social inter-
actions. In R. Hegselmann and H. O. Peitgen (eds), Modelle socialer Dynamiken:
Ordnung, Chaos und Komplexität, 79–104. Vienna: Holder-Pichler-Tempsky.
Lebeaux, David (1988). Language acquisition and the form of the grammar. Disserta-
tion, University of Massachusetts, Amherst.
Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and
reversals. Soviet Physics Doklady 10: 707–10.
Levin, Beth, and Malka Rappaport Hovav (1995). Unaccusativity: At the Syntax–
Lexical Semantics Interface. Cambridge, Mass.: MIT Press.
Levine, Robert D. (1989). On focus inversion: syntactic valence and the role of a subcat
list. Linguistics 27: 1013–55.
Levy, Roger (2008). Expectation-based syntactic comprehension. Cognition 106: 1126–77.
Liberman, Mark (1974). On conditioning the rule of Subj–Aux inversion. In Ellen
Kaisse and Jorgen Hankamer (eds), Papers from the Fifth Annual Meeting of NELS,
77–91. Amherst, Mass.
Liberman, Mark (1979). The Intonational System of English. New York: Garland.
Liberman, Mark, and Alan Prince (1977). On stress and linguistic rhythm. Linguistic
Inquiry 8: 249–336.
Manning, Christopher D., and Hinrich Schütze (1999). Foundations of Statistical
Natural Language Processing. Cambridge, Mass.: MIT Press.
Manzini, M. Rita (1983). Restructuring and reanalysis. Dissertation, MIT.
Marslen-Wilson, W. D. (1993). Issues of process and representation in lexical access. In
G. T. M. Altmann and R. Shillcock (eds), Cognitive Models of Speech Processing: The
Second Sperlonga Meeting, 187–210. Mahwah, NJ: Erlbaum.
Marslen-Wilson, W. D., and L. K. Tyler (1980). The temporal structure of spoken
language understanding. Cognition 8: 1–71.
Marslen-Wilson, W. D., and A. Welsh (1978). Processing interactions and lexical access
during word recognition in continuous speech. Cognitive Psychology 10: 29–63.
May, Robert (1985). Logical Form. Cambridge, Mass.: MIT Press.
McA’Nulty, Judith (1980). Binding without case. In John Jensen (ed.), Proceedings of
NELS 10, 315–28. Ottawa: Cahiers linguistiques d’Ottawa, University of Ottawa.
McCarthy, John J. (2000). The prosody of phase in Rotuman. Natural Language and
Linguistic Theory 18: 147-97.
McCarthy, John J. (2003). Sympathy, cumulativity, and the Duke-of-York gambit. In
Caroline Féry and Ruben van de Vijver (eds), The Optimal Syllable, 23–76. Cam-
bridge: Cambridge University Press.
Merchant, Jason (2001). The Syntax of Silence. Oxford: Oxford University Press.
Mielke, Jeff, and Elizabeth Hume (2001). Consequences of word recognition for
metathesis. In Elizabeth Hume, Norval Smith, and Jeroen van de Weijer (eds),
Surface Syllable Structure and Segment Sequencing, 135–58. Leiden: HIL.
Nerbonne, John, Wilbert Heeringa, and Peter Kleiweg (1999). Edit distance and dialect
proximity. In David Sankoff and Joseph Kruskal (eds), Time Warps, String Edits and
Macromolecules: The Theory and Practice of Sequence Comparison, v–xv. Stanford,
Calif.: CSLI.
Nettle, Daniel (1999). Using social impact theory to simulate language change. Lingua
108: 95–117.
Newman, Stanley (1946). On the stress system of English. Word 2: 171–87.
Newmeyer, Frederick J. (1998). On the supposed ‘counterfunctionality’ of universal
grammar: some evolutionary implications. In James R. Hurford, Michael Studdert-
Kennedy, and Chris Knight (eds), Approaches to the Evolution of Language, 305–19.
Cambridge: Cambridge University Press.
Newmeyer, Frederick J. (2001). Agent-assignment, tree-pruning, and Broca’s aphasia.
Behavioral and Brain Sciences 23: 44–5.
Newmeyer, Frederick J. (2002). Optimality and functionality: a critique of functionally-
based optimality-theoretic syntax. Natural Language and Linguistic Theory 21: 43–80.
Nishigauchi, Taisuke (1984). Control and the thematic domain. Language 60: 21–50.
Nooteboom, S. G. (1981). Lexical retrieval from fragments of spoken words: begin-
nings vs. endings. Journal of Phonetics 9: 401–24.
Nowak, Andrzej, Jacek Szamrej, and Bibb Latané (1990). From private attitude to
public opinion: a dynamic theory of social impact. Psychological Review 97: 362–76.
Otero, Carlos (1972). Acceptable ungrammatical sentences in Spanish. Linguistic
Inquiry 3: 233–42.
Ouhalla, Jamal (1994). Verb movement and word order in Arabic. In David Lightfoot
and Norbert Hornstein (eds), Verb Movement, 41–72. Cambridge: Cambridge
University Press.
Partee, Barbara, Alice ter Meulen, and Robert E. Wall (1990). Mathematical Methods in
Linguistics. Dordrecht: Kluwer.
Perlmutter, David M. (1983). Studies in Relational Grammar. Chicago: University of
Chicago Press.
Pesetsky, David (1979). Russian morphology and lexical theory. Dissertation, MIT.
Pesetsky, David (1982). Complementizer-trace phenomena and the nominative island
condition. Linguistic Review 1: 297–343.
Pesetsky, David (1987). Wh-in-situ: movement and unselective binding. In Eric
J. Reuland and Alice G. B. ter Meulen (eds), The Representation of (In)Definiteness,
98–129. Cambridge, Mass.: MIT Press.
Peters, Anne M., and Lise Menn (1993). False starts and filler syllables: ways to learn
grammatical morphemes. Language 69: 742–77.
Piñango, Maria Mercedes (1999). Real-time processing implications of aspectual
coercion at the syntax–semantics interface. Journal of Psycholinguistic Research 28:
395–414.
Piñango, Maria Mercedes (2000). Canonicity in Broca’s sentence comprehension: the
case of psychological verbs. In Y. Grodzinsky (ed.), Language and the Brain, 327–50.
New York: Academic Press.
Plann, Susan (1981). The two el + infinitive constructions in Spanish. Linguistic
Analysis 7: 203–40.
Plunkett, K., and V. Marchman (1991). U-shaped learning and frequency effects in
a multi-layered perception: implications for child language acquisition. Cognition
38: 43–102.
Pollard, Carl, and Ivan A. Sag (1994). Head-Driven Phrase Structure Grammar.
Chicago: University of Chicago Press.
Pollock, Jean-Yves (1989). Verb movement, universal grammar and the structure of
IP. Linguistic Inquiry 20: 365–424.
Postal, Paul M. (1972). On some rules that are not successive cyclic. Linguistic Inquiry
3: 211–22.
Postal, Paul M. (1994). Parasitic and pseudo-parasitic gaps. Linguistic Inquiry 25: 63–117.
Prasada, S., and S. Pinker (1993). Generalisation of regular and irregular morpho-
logical patterns. Language and Cognitive Processes 8: 1–56.
Prince, Ellen F. (1981). On the inferencing of indefinite this NPs. In Aravind K. Joshi,
Bonnie Lynn Webber, and Ivan A. Sag (eds), Discourse Structure and Discourse
Setting, 231–50. Cambridge: Cambridge University Press.
Prince, Ellen F. (1987). Topicalization and left-dislocation: a functional analysis.
Annals of the New York Academy of Sciences 433: Discourses in Reading and Linguis-
tics, 213–25.
Pullum, Geoffrey (1976). The Duke of York Gambit. Journal of Linguistics 12: 83–102.
Ravid, D. (2003). A developmental perspective on root perception in Hebrew and
Palestinian Arabic. Language Acquisition and Language Disorders 28: 293–320.
Reinhart, Tanya (1981a). Pragmatics and linguistics: an analysis of sentence topics.
Philosophica 27: 53–94.
Reinhart, Tanya (1981b). A second Comp position. In Adriana Belletti, Luciana
Brandi, and Luigi Rizzi (eds), Theory of Markedness in Generative Grammar,
517–57. Pisa: Scuola Normale Superiore.
Reuland, Eric (1983). Governing -ing. Linguistic Inquiry 14: 101–36.
Rizzi, Luigi (1990). Relativized Minimality. Cambridge, Mass.: MIT Press.
Rizzi, Luigi (1996). Residual verb second and the wh-criterion. In Adriana Belletti and
Luigi Rizzi (eds), Parameters and Functional Heads, 63–90. Oxford: Oxford Univer-
sity Press.
Rizzi, Luigi (1997). The fine structure of the left periphery. In Liliane Haegeman (ed.),
Handbook of Generative Syntax, 281–338. Dordrecht: Kluwer Academic.
Rizzi, Luigi (2004). The Structure of CP and IP: The Cartography of Syntactic Structures.
New York: Oxford University Press.
Rizzi, Luigi, and Ian Roberts (1989). Complex inversion in French. Probus 1: 1–30.
Rochemont, Michael S. (1978). A theory of stylistic rules in English. Dissertation,
University of Massachusetts, Amherst.
Rochemont, Michael S. (1979). Remarks on the stylistic component in generative
grammar. In Elisabet Engdahl and Mark Stein (eds), Papers Presented to Emmon
Bach by his Students, 147–64. Amherst, Mass.: University of Massachusetts.
Rochemont, Michael S. (1980). Stylistic transformations. MS, UCLA.
Rochemont, Michael S. (1986). Focus in Generative Grammar. Amsterdam: Benjamins.
Rochemont, Michael S. (1989). Topic islands and the subjacency parameter. Canadian
Journal of Linguistics 34: 145–70.
Rochemont, Michael S. (1992). Bounding rightward A-bar dependencies. In Helen
Goodluck and Michael S. Rochemont (eds), Island Constraints: Theory, Acquisition
and Processing, 1–33. Dordrecht: Kluwer Academic.
Rochemont, Michael S. (1998). Phonological focus and structural focus. In Peter
W. Culicover and Louise McNally (eds), The Limits of Syntax, 337–63. New York:
Academic Press.
Rochemont, Michael S., and Peter W. Culicover (1990). English Focus Constructions
and the Theory of Grammar. Cambridge: Cambridge University Press.
Rochemont, Michael S., and Peter W. Culicover (1991). In defense of rightward
movement. Toronto Working Papers in Linguistics.
Rodgers, Richard, and Lorenz Hart (1937). Where or When. Alfred Music Publishers.
Ross, John R. (1967). Constraints on variables in syntax. Dissertation, MIT.
Ross, John R. (1969a). Adjectives as noun phrases. In David A. Reibel and Sanford
Schane (eds), Modern Studies in English, 352–60. New York: Prentice-Hall.
Ross, John R. (1969b). Guess who. In Robert I. Binnick, Alice Davison, Georgia
M. Green, and Jerry L. Morgan (eds), Proceedings of the Fifth Annual Meeting of
the Chicago Linguistics Society, 252–86. Chicago: CLS.
Rothstein, Susan D. (1983). The syntactic forms of predication. Dissertation, MIT.
Rumelhart, David E., and James L. McClelland (1986). On learning the past tense of
English verbs. In David E. Rumelhart and James L. McClelland (eds), Psychological
and Biological Models, 216–71. Cambridge, Mass.: MIT Press.
Ruwet, Nicolas (1982). Grammaire des insultes et autres études. Paris: Seuil.
Saffran, Jennifer R., Richard N. Aslin, and Elissa L. Newport (1996). Statistical learning
by 8-month-old infants. Science 274: 1926–28.
Safir, Kenneth (1985). Syntactic Chains. Cambridge: Cambridge University Press.
Sag, Ivan A. (1976). Deletion and logical form. Dissertation, MIT.
Sag, Ivan A. (1997). English relative clause constructions. Journal of Linguistics 33: 431–84.
Sag, Ivan A., and Carl Pollard (1991). An integrated theory of complement control.
Language 67: 63–113.
Sankoff, David, and Joseph B. Kruskal (1983). Time Warps, String Edits, and Macro-
molecules: The Theory and Practice of Sequence Comparison. Stanford, Calif.: CSLI.
Sapir, Edward (1921). Language: An Introduction to the Study of Speech. New York:
Harcourt, Brace.
Schmerling, Susan (1976). Aspects of English Sentence Stress. Austin: University of
Texas Press.
Selkirk, Elizabeth O. (1972). The phrase phonology of English and French. Disserta-
tion, MIT.
Selkirk, Elizabeth O. (1977). Some remarks on noun phrase structure. In Peter
W. Culicover, Thomas Wasow, and Adrian Akmajian (eds), Formal Syntax,
285–316. New York: Academic Press.
Selkirk, Elizabeth O. (1978). On Prosodic Structure and its Relation to Syntactic
Structure. Bloomington, Ind.: Indiana University Linguistics Club.
Sells, Peter (1984). Syntax and semantics of resumptive pronouns. Dissertation,
University of Massachusetts, Amherst.
Sobin, Nicholas (1987). The variable status of comp-trace phenomena. Natural Lan-
guage and Linguistic Theory 5: 33–60.
Spencer, Andrew, and Arnold Zwicky (eds) (1998). The Handbook of Morphology.
Oxford: Blackwell.
Sperber, Dan, and Deirdre Wilson (1979). Ordered entailments: an alternative to
presuppositional theories. In Choon-Kyu Oh and David A. Dinneen (eds), Presup-
position. Syntax and Semantics 11: 299–323. New York: Academic Press.
Sportiche, Dominique (1988). A theory of floating quantifiers and its corollaries for
constituent structure. Linguistic Inquiry 19: 425–49.
Sprouse, Jon, Matt Wagers, and Colin Phillips (2012). A test of the relation between
working memory capacity and syntactic island effects. Language 88: 82–123.
Stainton, Robert J. (1998). Quantifier phrases, meaningfulness ‘in isolation’, and
ellipsis. Linguistics and Philosophy 21: 311–40.
Stillings, Justine (1975). The formulation of gapping in English as evidence for variable
types in syntactic transformations. Linguistic Analysis 1: 247–73.
Stockwell, Robert P. (1960). The place of intonation in a generative grammar of
English. Language 36: 360–67.
Stockwell, Robert P. (1972). The role of intonation: reconsiderations and other
considerations. In Dwight L. Bolinger (ed.), Readings on Intonation, 87–109. Har-
mondsworth: Penguin.
Stowell, Timothy (1981). Origins of phrase structure. Dissertation, MIT.
Stuurman, Frits (1991). If and whether: questions and conditions. Lingua 83: 1–41.
Tesar, Bruce (1995). Computational optimality theory. Dissertation, University of
Colorado, Boulder.
Trager, George L., and Henry Lee Smith (1951). An Outline of English Structure.
Norman, Okla.: Battenburg Press.
Ueyama, Ayumi (1991). Scrambling and the focus interpretation. Paper presented to
the Workshop on Japanese Syntax and Universal Grammar, Rochester, NY.
Van Valin, Jr., Robert D., and Randy J. LaPolla (1997). Syntax: Structure, Meaning and
Function. Cambridge: Cambridge University Press.
Vikner, Sten (1991). Finite verb movement in Scandinavian embedded clauses. Disser-
tation, University of Maryland, College Park.
von Fintel, Kai-Uwe (1990). Licensing of clausal specifiers in German. In D. Meyer, S.
Tomioka and L. Zidani-Eroglu (eds), Proceedings of the First Meeting of the Formal
Linguistic Society of Midamerica. Madison, Wisc.: University of Madison Press.
Wasow, Thomas (1979). Anaphora in Generative Grammar. Ghent: E. Story-Scientia.
Wasow, Thomas (1980). Major and minor rules in lexical grammar. In Teun Hoekstra,
Harry van der Hulst, and Michael Moortgat (eds), Lexical Grammar, 285–312.
Dordrecht: Foris.
Wasow, Thomas (1997). Remarks on grammatical weight. Language Variation and
Change 9: 81–106.
Wexler, Kenneth, and Peter W. Culicover (1980). Formal Principles of Language
Acquisition. Cambridge, Mass.: MIT Press.
Wilkins, Wendy (1977). The variable interpretation condition. Doctoral dissertation,
UCLA.
Wilkins, Wendy (1980). Adjacency and variables in syntactic transformations. Linguis-
tic Inquiry 11: 709–58.
Wilkins, Wendy (1985). On the linguistic function of thematic relations. Paper pre-
sented at Symposium on Thematic Relations, Seattle.
Wilkins, Wendy (1986). El sintagma nominal de infinitivo. Revista argentina de
lingüística 2: 209–29.
Wilkins, Wendy (2005). Anatomy matters. Linguistic Review 22: 271–88.
Williams, Edwin (1977). Discourse and logical form. Linguistic Inquiry 8: 101–40.
Williams, Edwin (1980). Predication. Linguistic Inquiry 11: 203–38.
Williams, Edwin (1981). Remarks on stress and anaphora. Journal of Linguistic
Research 1: 1–16.
Yarowsky, David, and Richard Wicentowski (2000). Minimally supervised morpho-
logical analysis by multimodal alignment. In Proceedings of the 38th Annual Meeting
on Association for Computational Linguistics, 207–16.
Index

accent 118, 119; see also accent placement
accent placement 73, 74, 75, 78, 83–5, 92, 93, 104, 112, 114, 116, 117
Adger, D. 4
Adverb Effect 212, 256–68
Aissen, J. 318
Akmajian, A. 168, 187, 301
Anderson, S. 125, 126, 135
Arabic 246, 255
Arnold, J. 270
Austin, J. 34
Authier, M. 212, 213, 218, 236
Autonomous Systems view 72, 75, 83, 100, 115, 116, 117
autonomy 53, 72, 115, 116, 117, 118, 132
Bach, E. 138
Baker, M. 214, 215
Baltin, M. 155, 199, 201, 215, 250
Bare Argument Ellipsis 5–7, 15, 28
Barker, C. 256
Baroni, M. 341, 342
Barss, A. 212
Basque 337
Bavarian 223
Bayer, J. 223
Becker, T. 356
Beckman, M. 166, 277, 334
Beerman, D. 191, 271
Belletti, A. 212
Berman, A. 115, 116
Berwick, R. 321
Beukema, F. 223
Bever, T. 8, 301, 308
bias 311, 315, 316, 333, 338
Bierwisch, M. 85, 115
bijacent 128, 129, 136, 140
Binding Theory 194, 232
  Condition C 194–8
Bing, J. 76, 83, 111, 115, 118
Bolinger, D. 71, 74, 83, 89, 99, 115–17
Borer, H. 74, 230
Brame, M. 138
Bransford, J. 321
Bresnan, J. 2, 46, 115, 116, 130, 134, 138, 146, 164, 205, 225, 256, 271, 274, 318, 319
Brew, C. 334
Briscoe, E. 310, 316
Brown, R. 8, 11, 319
c-construable 71, 107, 109–14
CED 200, 264, 266, 268
Chafe, W. 114
Chierchia, G. 127
Chomsky, N. 1, 4, 28, 29, 30, 46, 58, 67, 71–3, 77, 78, 82, 85, 92, 100–2, 107, 115–17, 121, 122, 125, 137, 146, 151, 156–9, 161, 164, 173, 178, 199, 210, 211, 214, 215, 218, 239, 243, 244, 251, 261, 262, 264–7, 308, 310, 316, 331, 335
Churchward, C. 356
Cinque, G. 212, 214, 216, 221, 222, 249
cliticization 97–100
coherence 53, 68, 70
coindexing 102, 121–30, 132–42, 145, 146, 150, 152, 155, 157–60, 186, 188, 214, 221, 259, 260, 282, 283
Cole, R. 340
COMP 124, 151, 152, 160–1, 178, 185
Complement Principle 199, 203, 211
complexity
  computational/processing 309–11, 32
compositionality 2, 7, 10
conditionals (and OM-sentences) 16, 18, 22–4, 30, 32, 41, 43, 44, 52
Connine, C. 340
consequential interpretation (OM-sentences) 17–21, 25, 30–6, 41, 52
Contreras, H. 247
control 120, 121, 123, 134–46, 157–9
  arbitrary 139, 144
Cooper, W. 340
Coopmans, P. 212, 221, 225, 245, 256, 270, 273, 279
Cutler, A. 337
Czech 337, 342, 354
D-structure 126, 129, 136, 140, 145, 159, 160, 161, 234, 325
Daneš, F. 115
dative construction 7–9, 129, 138, 204, 295–301
Delahunty, G. 93, 149
Delfitto, D. 212
den Besten, H. 212
den Dikken, M. 9
Diesing, M. 215
Dogil, G. 75
Downing, B. 77
Dowty, D. 120, 126
Dresher, E. 176, 177
Dutch 66, 340
E. Kiss, K. 322
ECP 101, 212, 214, 219, 221–3, 256–9, 261–2, 266, 268, 283, 289
edit distance 66; see also Levenshtein
ellipsis 5–6, 15, 28–30, 194, 230, 236–8
Emonds, J. 60, 61, 63, 64, 120, 185, 224
Everaert, M. 212
Evers, A. 212
extraposition 153, 163, 182, 192–200, 205, 211, 301, 302, 303, 305, 320, 329
Farmer, A. 120, 137
Faroese 253
Featherston, S. 3
Fillmore, C. 52, 53, 296, 298
Finnish 336, 337
focus 71–119, 148, 153, 161, 213, 214, 215, 217, 230, 239–44, 246, 247–53, 263, 322
  contrastive 74, 105–9, 111–13, 118
  informational 105, 108, 109, 112
  presentational 74, 104, 105, 108–12, 114
focus assignment 100–4
Fodor, J. A. 8, 308, 319
Fodor, J. D. 280
Fowler, C. 351
Frampton, J. 264, 265
Franks, J. 321
freezing 98, 203, 204, 207–9, 331
Frege, G. 2, 7, 9, 10
Freidin, R. 138
French 3, 155, 158, 200, 233, 283
Friedmann, N. 8
Fukui, N. 217
Fur 357
Gallistel, R. 10
Gapping 147, 243
Gasser, M. 354, 355
Gazdar, G. 138, 264
German 3, 252–4, 285
Gil, D. 75
Gilligan, G. 335, 336, 337
Givón, T. 337
Goldberg, A. 2, 7, 8, 10, 52, 53, 295, 311, 325
grammatical relations 27, 122, 123, 156
Greek 123
Greenberg, J. 310, 334, 336
Grice, H. 111
Grimshaw, J. 127, 135, 225, 259, 319
Grodzinsky, Y. 8
Grosu, A. 213
Gruber, J. 124, 128, 133
Guéron, J. 110, 196, 197, 199
Gundel, J. 213, 248, 249
Gunter, R. 90
Haegeman, L. 213
Hagoort, P. 8, 11
Haider, H. 252
Hale, J. 305
Hale, K. 72, 83, 100, 116, 132
Halle, M. 46, 77, 85, 115–17
Harnish, M. 15
Harris, Z. 352
Hauser, M. 10
Hawkins, J. 311, 320, 322, 327, 329, 334, 336–8
Hay, J. 341, 352
Head Rule 78, 79, 81, 83, 84, 97, 98
heavy inversion (HI) 269–71, 277, 279, 280–9
Heavy NP Shift (HNPS) 180, 181, 191, 203–11, 269–72, 277, 280–5, 288
Hebrew 213
Heny, F. 168
Hoeksema, J. 356
Hoekstra, E. 217
Hoekstra, T. 217
Hofmeister, P. 324
Hooper, J. 240
Horvath, J. 90, 245, 246
Hume, E. 334, 356, 357
Hungarian 14, 246, 252, 255
Hyman, L. 71
Icelandic 285
idiom 9, 290, 325, 332
imperative 20, 23–5, 54, 59
incongruence interpretation (OM-sentences) 17, 21, 22, 30, 38, 46–9
island
  topic 213, 214, 216–22, 238, 249, 255, 257, 258, 279, 280
  LF 197
  wh- 197, 217, 263, 264, 267
Italian 3, 213, 285, 336
Jackendoff, R. 1, 5, 6, 7, 9, 10, 15, 28, 30, 32, 37, 50, 55, 56, 71, 79, 81, 116, 117, 120, 122, 123, 125–8, 134, 137, 138, 141, 151, 160, 172, 231, 280, 311, 317, 319, 323, 325
Jacobson, P. 5
Jaeggli, O. 73, 92, 101, 146
Jakamik, J. 340
Janda, R. 356
Jelinek, E. 120
Johnson, E. 352
Johnson, K. 205, 213, 239, 244, 279
Joseph, B. 334
Jusczyk, P. 252
juxtapositional interpretation (OM-sentences) 19, 20, 21, 24
Kathol, A. 276
Katz, J. 56, 101, 229, 245
Kay, P. 52, 53, 169
Kayne, R. 101, 191, 200, 261, 271, 283, 324, 328, 329, 333
Keenan, E. 138
Kehler, A. 5
Kirby, S. 316
Kisseberth, C. 60, 61, 63
Kitagawa, C. 120
Klavans, J. 10
Klein, S. 123, 151
Klima, E. 49, 50, 54, 55, 168, 213, 223, 229, 297, 301
Kluender, R. 204, 321
Korean 354
Koster, J. 121, 122, 138, 140, 150–4, 185–8, 151
Kruskal, J. 345
Kuroda, S.-Y. 298
Ladd, R. 74, 75, 83, 111, 112, 115–18
Laka, I. 213, 224, 227, 228, 233, 239, 240, 241, 244, 247
Lakoff, G. 61, 70, 115, 116
Lakoff, R. 64
Landau, I. 147
Lappin, S. 5, 317
Larson, R. 204, 206, 207
Lasnik, H. 4, 6, 71–3, 78, 92, 100, 164, 194, 213–17, 221, 225, 238, 258, 259, 261, 262
Latané, B. 311
learnability 67, 88, 160–2, 317, 318, 323, 331
Leben, W. 71
LeBlanc, D. 191
Lee, C.-D. 355
Levenshtein, V. 66, 342, 344–6, 349, 354
Levin, B. 7, 270, 276
Levine, R. 182, 187, 191, 194, 196, 205, 256, 266, 273, 276
Levy, R. 305
Liberman, M. 75–8, 83–5, 115, 223
light inversion (LI) 269–71, 274, 279, 280, 284–9
Limber, J. 203
locality 123, 128, 137, 138, 145, 151, 159, 162
logical form (LF) 2, 73, 89, 101, 102, 107, 108, 112, 121, 155–7, 161, 163–5, 174, 176–9, 181, 182, 184, 185, 188, 193, 194, 196–9, 221, 231, 236, 237, 243, 259, 322
Manzini, M. 141
Marantz, A. 212
Marchman, V. 354
Marslen-Wilson, W. 340
May, R. 121, 122, 140, 150–4, 196, 197, 226
McA’Nulty, J. 123, 151
McCarthy, J. 67
McClelland, J. 354
McElree, B. 8
McNally, L. 191
Merchant, J. 6, 15, 30
metonymy 7
Mielke, J. 356, 357
Miller, P. 256
Minimalist Program 4, 316
Miyagawa, S. 212
Mutsun 356
Nakajima, H. 212
Nakayama, M. (J.J.) 212, 256
Navajo 336
negation 50, 51, 57, 134, 163, 165, 166, 169–73, 175–84, 214, 224, 239, 247, 248
Negative Inversion 164, 215–20, 223–7, 231, 247, 255, 260, 275
Nerbonne, J. 334
Nettle, D. 311
Newman, S. 115
Newmeyer, F. 10, 120, 318
Nishigauchi, T. 128, 140, 141
Nooteboom, S. 340
Nowak, A. 65, 309, 311, 316, 338
Oehrle, R. 120
Optimality Theory (OT) 259, 316–19
Otero, C. 72
Ouhalla, J. 246
Paccia-Cooper, J. 340
paradigm 335, 343–4, 348–9, 351–2, 354
parasitic gap 204, 207–10, 264–8, 285
Partee, B. 3
passive 32, 133–4, 137–9, 141, 270, 296–301, 306–7
perceptual strategy 96, 301–8
Perlmutter, D. 2
Pesetsky, D. 212, 223, 234, 262, 271
Pica, P. 309
Pinker, S. 354
Plann, S. 152
Plunkett, K. 354
polarity phrase (PolP) 213, 214, 218–27, 229–34, 239, 244–50, 253, 257, 259
Polla, R. 2
Pollard, C. 2, 10, 120, 134, 256
Pollock, J. Y. 212, 224, 239, 244
Postal, P. 56, 101, 207, 229, 245, 331
Prasada, S. 354
predication 121, 122, 124, 126, 139, 140, 150, 155–7
Presentational there Insertion (PTI) 203–8, 279
Prince, A. 75–8, 83–5, 115
Prince, E. 109, 322
PRO 121, 122, 124, 146–62
pro-drop 145–6
Projection Principle 121, 122, 125, 156–62
Prosodic (P-) structure 75–8, 82–4, 88, 97, 98, 104
pseudo-cleft 93, 148, 153, 194, 228
pseudo-imperative 29, 32
Pullum, G. 67, 120
R-structure 125–9, 135–8, 141, 144–6, 155
Raising (to subject) 159–60, 272–5, 289
Rappaport Hovav, M. 7, 270, 276
Ravid, D. 356
Reinhart, T. 114, 213
relative clause 32, 38, 148–9, 152, 174, 185, 192–6, 200–3, 205, 218–19, 246, 266, 286, 301–4, 320–21, 330
Relativized Minimality 216, 218, 219
Resnik, P. 10
result clause 191, 196–202
Reuland, E. 281
rhetorical OM-sentences 49–51
Rizzi, L. 212–14, 216, 218, 220, 221, 224, 226, 233, 239, 248, 258–60, 271, 283
Roberts, I. 218
Rochemont, M. 71, 74, 90, 108, 110, 115, 160, 163, 164, 178, 181, 182, 191–3, 204–6, 212–14, 216, 217, 221, 223, 230, 242, 243, 246, 247, 251, 256, 258, 273, 303, 320
Rodgers, R. and Hart, L. 6
Roeper, T. 212
Rooryk, J. 309
Rosenbaum, P. 138
Ross, J. 6, 70, 77, 205, 214, 230, 231, 285, 301, 329
Rothstein, S. 122
Rotuman 356
Rumelhart, D. 354
Russian 3, 336, 337
S-structure 71–3, 75, 88, 93, 100–2, 105, 113, 121, 160, 192, 199, 207, 215, 229, 233, 240, 242, 248
Saffran, J. 341, 352
Safir, K. 242
Sag, I. 2, 10, 53, 120, 134, 237
Saito, M. 213–17, 221, 238, 258, 259, 261
Sankoff, D. 345
Sapir, E. 336
Schapiro, L. 8
Schmerling, S. 76, 83, 115, 117
Schwartz, B. 212
scope
  of negation 55, 165–6, 169–82, 184, 188, 242, 245, 260
  quantifier 10, 112, 165, 197, 200, 322
Selkirk, E. 71, 75, 77, 82, 84, 97, 99, 151
Sells, P. 285
Semitic 355
sequential interpretation (OM-sentences) 17, 19–21, 30, 34, 35
Sierra Miwok 356
Simpler Syntax 1, 3
Simpler Syntax Hypothesis 2
Sisters Rule 79, 81, 83, 84, 97
Sluicing 28, 214, 230–2, 236
Smith, G. 115
Sobin, N. 261
sound + motion construction 7
Spanish 3, 141–6, 247
Speas, M. 217
Spec-head agreement 213, 214, 218, 220–4, 227, 229, 230, 235, 240, 245, 258, 259
Speer, S. 334
Spencer, A. 336
Sperber, D. 103, 114, 116
Sportiche, D. 200
Sprouse, J. 324
Stainton, R. 5
Stillings, J. 147
Stockwell, R. 115
stress 71, 72, 74, 75, 83, 85–93, 97, 99–101, 104–8, 111, 113–19
  contrastive 47, 48, 105, 106, 114–16, 118
  emphatic 21, 46–8
Strong Assignment (SA) 83, 84, 88–90, 94, 95, 101, 103, 104
structure-preserving 61, 64, 65, 214, 251
Stuurman, F. 212
Stylistic Inversion 149–50, 164, 185–8, 242, 263, 269–76, 280–6, 289, 290
Subjacency 151, 173, 199, 200, 213, 216, 266, 267
subject AUX inversion (SAI) 178, 179, 187, 188, 205, 213, 215, 223–8, 247, 255
superiority 288, 289
Swahili 246
Switch Rule 93, 94, 97, 100
syntactic complexity 3, 245, 322
Szamosi, M. 115, 116
tag
  assertival 166, 168
  disputational 166–70
  emphatic 53, 54, 59, 60
  imperative 54, 59
  interrogative 54, 55, 59, 163, 165, 166, 169–78
tag question; see tag, interrogative
Tesar, B. 318
that-t effect 212, 214, 219, 221, 222, 255–9, 261, 262, 264, 268, 271, 272, 283, 289
thematic relations; see thematic roles
thematic roles 28, 121, 123–8, 130, 145, 157, 162
Thompson, S. 240
topicalization 163, 178, 194, 213–16, 218, 231, 238, 248, 250–1, 255, 275, 276, 285, 299
Townsend, D. 8
Trager, G. 115
Tyler, L. 340
Ueyama, A. 212, 213
van Riemsdijk, H. 191
Van Valin, R. 2
von Fintel, K. 254
VP Ellipsis 194
Wasow, T. 130, 138, 157, 237, 320
Weak Crossover (WCO) 194, 243, 251, 274–6, 289
Welsh, A. 340
West Flemish 213
Wexler, K. 88, 98, 173, 204, 279, 285, 323, 331, 338
wh-Fronting 102, 164, 174, 185, 186, 188, 306
wh-question 88, 90–3, 102, 111–12, 115, 140, 147, 151–2, 215, 217, 218, 227, 228–9, 235, 240, 245, 247, 249–50, 267, 299, 310
Wicentowski, R. 341, 342, 344, 354
Wilkins, W. 10, 121, 137, 155, 285
Williams, E. 74, 106, 116, 122, 128, 139, 141, 237
Wilson, D. 103, 114, 116
Winkler, S. 242
Yarowsky, D. 341, 342, 344, 354
Yiddish 215
Zaring, L. 212
Zwicky, A. 336
