
Modeling the Noun Phrase versus Sentence Coordination Ambiguity in Dutch: Evidence from Surprisal Theory

Harm Brouwer, Hartmut Fitz, John C. J. Hoeks
University of Groningen, Groningen, the Netherlands
harm.brouwer@rug.nl, h.fitz@rug.nl, j.c.j.hoeks@rug.nl

Abstract

This paper investigates whether surprisal theory can account for differential processing difficulty in the NP-/S-coordination ambiguity in Dutch. Surprisal is estimated using a Probabilistic Context-Free Grammar (PCFG), which is induced from an automatically annotated corpus. We find that our lexicalized surprisal model can account for the reading time data from a classic experiment on this ambiguity by Frazier (1987). We argue that syntactic and lexical probabilities, as specified in a PCFG, are sufficient to account for what is commonly referred to as an NP-coordination preference.

1 Introduction

Language comprehension is incremental in that meaning is continuously assigned to utterances as they are encountered word-by-word (Altmann and Kamide, 1999). Not all words, however, are equally easy to process. A word's processing difficulty is affected by, for instance, its frequency or its effect on the syntactic and semantic interpretation of a sentence. A recent theory of sentence processing, surprisal theory (Hale, 2001; Levy, 2008), combines several of these aspects into one single concept, namely the surprisal of a word. A word's surprisal is proportional to its expectancy, i.e., the extent to which that word is expected (or predicted). The processing difficulty a word causes during comprehension is argued to be related linearly to its surprisal; the higher the surprisal value of a word, the more difficult it is to process.

In this paper we investigate whether surprisal theory can account for the processing difficulty involved in sentences containing the noun phrase (NP) versus sentence (S) coordination ambiguity. The sentences in (1), from a self-paced reading experiment by Frazier (1987), exemplify this ambiguity:

(1) a. Piet kuste Marie en / haar zusje / ook
       Piet kissed Marie and / her sister / too
       [1,222 ms; NP-coordination]

    b. Piet kuste Marie en / haar zusje / lachte
       Piet kissed Marie and / her sister / laughed
       [1,596 ms; S-coordination]

Both sentences are temporarily ambiguous in the boldface region. Sentence (1-a) is disambiguated as an NP-coordination by the sentence-final adverb ook. Sentence (1-b), on the other hand, is disambiguated as an S-coordination by the sentence-final verb lachte. Frazier found that the verb lachte in sentence (1-b) takes longer to process (1,596 ms) than the adverb ook (1,222 ms) in (1-a).

Frazier (1987) explained these findings by assuming that the human language processor adheres to the so-called minimal attachment principle. According to this principle, the sentence processor projects the simplest syntactic structure which is compatible with the material read at any point in time. NP-coordination is syntactically simpler than S-coordination in that it requires fewer phrasal nodes to be projected. Hence, the processor is biased towards NP- over S-coordination. Processing costs are incurred when this initial preference has to be revised in the disambiguating region, as in sentence (1-b), resulting in longer reading times. Hoeks et al. (2006) have shown that the NP-coordination preference can be reduced, but not entirely eliminated, when poor thematic fit between the verb and a potential object makes an NP-coordination less likely (e.g., Jasper sanded the board and the carpenter laughed). We argue here that this residual preference for NP-coordination can be explained in terms of syntactic and lexical expectation within the framework of surprisal theory. In contrast to the minimal attachment principle, surprisal theory does not pos-

Proceedings of the 2010 Workshop on Cognitive Modeling and Computational Linguistics, ACL 2010, pages 72–80, Uppsala, Sweden, 15 July 2010. © 2010 Association for Computational Linguistics
tulate specific kinds of syntactic representations or rely on a metric of syntactic complexity to predict processing behavior.

This paper is organized as follows. In section 2 below, we briefly sketch basic surprisal theory. Then we describe how we induced a grammar from a large annotated Dutch corpus and how surprisal was estimated from this grammar (section 3). In section 4, we describe Frazier's experiment on the NP-/S-coordination ambiguity in more detail, and present our surprisal-based simulations of this data. We conclude with a discussion of our results in section 5.

2 Surprisal Theory

As was mentioned in the introduction, language processing is highly incremental, and proceeds on a more or less word-by-word basis. This suggests that a person's difficulty with processing a sentence can be modeled on a word level, as proposed by Attneave (1959). Furthermore, it has recently been suggested that one of the characteristics of the comprehension system that makes it so fast is its ability to anticipate what a speaker will say next. In other words, the language comprehension system works predictively (Otten et al., 2007; van Berkum et al., 2005). Surprisal theory is a model of differential processing difficulty which accommodates both these properties of the comprehension system, incremental processing and word prediction (Hale, 2001; Levy, 2008). In this theory, the processing difficulty of a sentence is a function of word processing difficulty. A word's difficulty is inversely proportional to its expectancy, i.e., the extent to which the word was expected or predicted in the context in which it occurred. The lower a word's expectancy, the more difficult it is to process. A word's surprisal is linearly related to its difficulty. Consequently, words with lower conditional probabilities (expectancy) lead to higher surprisal than words with higher conditional probabilities.

Surprisal theory is, to some extent, independent of the language model that generates conditional word probabilities. Different models can be used to estimate these probabilities. For all such models, however, a clear distinction can be made between lexicalized and unlexicalized surprisal. In lexicalized surprisal, the input to the language model is a sequence of words (i.e., a sentence). In unlexicalized surprisal, the input is a sequence of word categories (i.e., part-of-speech tags). While previous studies have used unlexicalized surprisal to predict reading times, evidence for lexicalized surprisal is rather sparse. Smith and Levy (2008) investigated the relation between lexicalized surprisal and reading time data for naturalistic texts. Using a trigram language model, they showed that there was a linear relationship between the two measures. Demberg and Keller (2008) examined whether this relation extended beyond transitional probabilities and found no significant effects. This state of affairs is somewhat unfortunate for surprisal theory, since input to the human language processor consists of sequences of words, not part-of-speech tags. In our study we therefore used lexicalized surprisal to investigate whether it can account for reading time data from the NP-/S-coordination ambiguity in Dutch. Lexicalized surprisal furthermore allows us to study how syntactic expectations might be modulated or even reversed by lexical expectations in temporarily ambiguous sentences.

2.1 Probabilistic Context Free Grammars

Both Hale (2001) and Levy (2008) used a Probabilistic Context Free Grammar (PCFG) as a language model in their implementations of surprisal theory. A PCFG consists of a set of rewrite rules which are assigned some probability (Charniak, 1993):

    S  → NP, VP    1.0
    NP → Det, N    0.5
    NP → NP, VP    0.5
    ...            ...

In this toy grammar, for instance, a noun phrase placeholder can be rewritten to a determiner followed by a noun symbol with probability 0.5. From such a PCFG, the probability of a sentence can be estimated as the product of the probabilities of all the rules used to derive the sentence. If a sentence has multiple derivations, its probability is the sum of the probabilities for each derivation. For our purpose, we also needed to obtain the probability of partial sentences, called prefix probabilities. The prefix probability P(w1 ... wi) of a partial sentence w1 ... wi is the sum of the probabilities of all sentences generated by the PCFG which share the initial segment w1 ... wi. Hale (2001) pointed out that the ratio of the prefix probabilities P(w1 ... wi) and P(w1 ... wi−1) equals precisely the conditional probability of word wi.
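The sentence-probability computation just described can be sketched as follows; the toy grammar and the derivation are the illustrative ones from the text, and the helper names are our own, not part of the model described in this paper.

```python
from math import prod

# The toy PCFG from above, as a map from rule to probability.
toy_pcfg = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.5,
    ("NP", ("NP", "VP")): 0.5,
}

def derivation_probability(rules):
    """Probability of one derivation: the product of its rule probabilities."""
    return prod(toy_pcfg[rule] for rule in rules)

def sentence_probability(derivations):
    """If a sentence has several derivations, sum their probabilities."""
    return sum(derivation_probability(d) for d in derivations)

# One derivation using S -> NP VP and then NP -> Det N:
d1 = [("S", ("NP", "VP")), ("NP", ("Det", "N"))]
print(derivation_probability(d1))  # 1.0 * 0.5 = 0.5
```

A prefix probability would be obtained analogously, by summing sentence probabilities over all completions sharing the initial word string.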

Given a PCFG, the difficulty of word wi can therefore be defined as:

    difficulty(wi) ∝ −log2 [ P(w1 ... wi) / P(w1 ... wi−1) ]

Surprisal theory requires a probabilistic language model that generates some form of word expectancy. The theory itself, however, is largely neutral with respect to which model is employed. Models other than PCFGs can be used to estimate surprisal. Nederhof et al. (1998), for instance, show that prefix probabilities, and therefore surprisal, can be estimated from Tree Adjoining Grammars. This approach was taken in Demberg and Keller (2009). Other approaches have used trigram models (Smith and Levy, 2008), Simple Recurrent Networks of the Elman type (Frank, 2009), Markov models and Echo-state Networks (Frank and Bod, 2010). This illustrates that surprisal theory is not committed to specific claims about the structural representations that language takes in the human mind. It rather functions as a "causal bottleneck" between the representations of a language model, and expectation-based comprehension difficulty (Levy, 2008). In other words, comprehension difficulty does not critically depend on the structural representations postulated by the language model which is harnessed to generate word expectancy.

The use of PCFGs raises some important questions on parallelism in language processing. A prefix probability can be interpreted as a probability distribution over all analyses compatible with a partial sentence. Since partial sentences can sometimes be completed in an indefinite number of ways, it seems both practically and psychologically implausible to implement this distribution as an enumeration over complete structures. Instead, prefix probabilities should be estimated as a by-product of incremental processing, as in Stolcke's (1995) parser (see section 3.2). This approach, however, still leaves open how many analyses are considered in parallel; does the human sentence processor employ full or limited parallelism? Jurafsky (1996) showed that full parallelism becomes more and more unmanageable when the amount of information used for disambiguation increases. Levy, on the other hand, argued that studies of probabilistic parsing reveal that typically a small number of analyses are assigned the majority of probability mass (Roark, 2001). Thus, even when assuming full parallelism, only a small number of 'relevant' analyses would be considered in parallel.

3 Grammar and Parser

3.1 Grammar Induction

In our simulations, we used a PCFG to model the phrase structure of natural language. To induce such a grammar, an annotated corpus was required. We used Alpino (van Noord, 2006), a robust and wide-coverage dependency parser for Dutch, to automatically generate such a corpus, annotated with phrase structure, for 204,000 sentences, which were randomly extracted from Dutch newspapers. These analyses were then used to induce a PCFG consisting of 650 grammar rules, 89 non-terminals, and 208,133 terminals (lexical items).[1] Moreover, 29 of the 89 non-terminals could result in epsilon productions.

[1] A PCFG can be induced by estimating the relative frequency of each CFG rule A → α: P(A → α) = count(A → α) / Σβ count(A → β).

The Alpino parser constructed the phrase structure analyses automatically. Despite Alpino's high accuracy, some analyses might not be entirely correct. Nonetheless, the overall quality of Alpino's analyses is sufficient for corpus studies, and since surprisal theory relies largely on corpus features, we believe the small number of (partially) incorrect analyses should not affect the surprisal estimates computed from our PCFG.

3.2 Earley-Stolcke Parser

To compute prefix probabilities in our model we implemented Stolcke's (1995) probabilistic modification of Earley's (1970) parsing algorithm. An Earley-Stolcke parser is a breadth-first parser. At each point in processing, the parser maintains a collection of states that reflect all possible analyses of a partial sentence thus far. A state is a record that keeps track of:

(a) the position up to which a sentence has been processed,

(b) the grammar rule that is applied,

(c) a "dot position" indicating which part of the rule has been processed thus far, and

(d) the leftmost edge of the partial string generated by the rule.
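The relative-frequency estimation in footnote 1 can be sketched as follows; the observed rule occurrences are invented for illustration (our actual grammar was induced from the Alpino analyses).

```python
from collections import Counter

def induce_pcfg(rule_occurrences):
    """Estimate P(A -> alpha) = count(A -> alpha) / sum over beta of count(A -> beta)."""
    counts = Counter(rule_occurrences)
    lhs_totals = Counter()
    for (lhs, _), c in counts.items():
        lhs_totals[lhs] += c
    return {(lhs, rhs): c / lhs_totals[lhs] for (lhs, rhs), c in counts.items()}

# Hypothetical rule occurrences read off an annotated corpus:
observed = [
    ("NP", ("Det", "N")),
    ("NP", ("Det", "N")),
    ("NP", ("NP", "Conj", "NP")),
    ("S", ("NP", "VP")),
]
probs = induce_pcfg(observed)
print(probs[("NP", ("Det", "N"))])  # 2 of the 3 NP expansions -> 0.666...
```

By construction, the probabilities of all rules expanding the same left-hand side sum to 1, which is the property exploited in the rule-by-rule analysis of section 4.3.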

The collection of states is constantly expanded by three operations. First, upcoming structural and lexical material is predicted. For all predictions, new states are added with the "dot" placed on the leftmost side of the rule. Then it is determined whether there is a state that predicts the next word in the input sentence. If this is the case, a new state is added with the "dot" placed right to the predicted word. A third operation looks for states with the "dot" rightmost to a grammar rule, and then tries to find states which have the completed state as their leftmost edge. If such states are found, the "dot" in these states is moved to the right of this edge. This step is repeated until no more new states are added. These three operations are cyclically performed until the entire sentence is processed. Our grammar contained 29 non-terminals that could result in epsilon productions. Due to the way epsilon productions are handled within the Earley-Stolcke parser (i.e., by means of "spontaneous dot shifting"), having a large number of epsilon productions leads to a large number of predicted and completed edges. As a consequence, pursuing all possible analyses may become computationally infeasible. To overcome this problem, we modified the Earley-Stolcke parser with a beam λ. In prediction and completion, only the λ states with the highest probabilities are added.[2] This constrains the number of states generated by the parser and enforces limited parallelism.

[2] A similar approach was used in Roark (2001) and Frank (2009).

4 NP-/S-coordination ambiguities

4.1 Frazier's experiment

Our aim was to determine to what extent lexicalized surprisal theory can account for reading time data for the NP-/S-coordination ambiguity in Dutch. This type of ambiguity was investigated by Frazier (1987) using a self-paced reading experiment. The sentences in (2) are part of Frazier's materials. Sentences (2-a) and (2-b) exemplify an NP-/S-coordination ambiguity. The sentences are identical and temporarily ambiguous up to the NP haar zusje (her sister). In (2-a) this NP is followed by the adverb ook, and therefore disambiguated to be part of an NP-coordination; Marie and haar zusje are conjoined. In (2-b), on the other hand, the same NP is followed by the verb lachte, and therefore disambiguated as the subject of a conjoined sentence; Piet kuste Marie and haar zusje lachte are conjoined.

(2) a. Piet kuste Marie en haar zusje ook
       Pete kissed Marie and her sister too
       (Ambiguous; NP-coordination)

    b. Piet kuste Marie en haar zusje lachte
       Pete kissed Marie and her sister laughed
       (Ambiguous; S-coordination)

    c. Annie zag haar zusje ook
       Annie saw her sister too
       (Unambiguous; NP-control)

    d. Annie zag dat haar zusje lachte
       Annie saw that her sister laughed
       (Unambiguous; S-control)

Sentences (2-c) and (2-d) functioned as unambiguous controls. These sentences are identical up to the verb zag. In (2-c), the verb is followed by the single NP haar zusje, and subsequently the adverb ook. The adverb eliminates the possibility of an NP-coordination. In (2-d), on the other hand, the same verb is followed by the complementizer dat, indicating that the clause her sister laughed is a subordinate clause (the complementizer is obligatory in Dutch).

Frazier constructed twelve sets consisting of four such sentences each. The 48 sentences were divided into three frames. The first frame included all the material up to the critical NP haar zusje in (2). The second frame contained only the critical NP itself, and the third frame contained all the material that followed this NP.

Forty native Dutch speakers participated in the experiment. Reading times for the final frames were collected using a self-paced reading task. Figure 1 depicts the mean reading times for each of the four conditions.

Frazier found a significant interaction between Type of Coordination (NP- versus S-coordination) and Ambiguity (ambiguous versus control), indicating that the effect of disambiguation was larger for S-coordinations (ambiguous: 1596 ms; control: 1141 ms) than for NP-coordinations (ambiguous: 1222 ms; control: 1082 ms).

4.2 Simulations

We simulated Frazier's experiment in our model. Since one set of sentences contained a word that was not covered by our lexicon (set 11; "Lorraine"), we used only eleven of the twelve sets of test items from her study. The remaining 44 sentences were successfully analyzed. In our first

simulation we fixed a beam of λ = 16. Figure 2 depicts surprisal values in the sentence-final frame as estimated by our model. When final frames contained multiple words, we averaged the surprisal values for these words. As Figure 2 shows, our model successfully replicated the effects reported in Frazier (1987): in both types of coordinations there was a difference in mean surprisal between the ambiguous sentences and the controls, but in the S-coordinations this effect was larger than in the sentences with NP-coordination.

Figure 1: Reading time data for the NP-/S-coordination ambiguity (Frazier, 1987).

Figure 2: Mean surprisal values for the final frame in the model (λ = 16).

Statistical analyses confirmed our findings. An ANOVA on surprisal values per item revealed an interaction between Type of Coordination (NP- vs. S-coordination) and Ambiguity (ambiguous vs. control), which was marginally significant (p = 0.06), most probably due to the small number of items (i.e., 11) available for this statistical test (recall that the test in the original experiment was based on 40 participants). Follow-up analyses revealed that the difference between S-coordination and S-control was significant (p < 0.05), whereas the difference between NP-coordination and NP-control was not (p = 0.527).

To test the robustness of these findings, we repeated the simulation with different beam sizes (λs) by iteratively halving the beam, starting with λ = 32. Figure 3 shows the differences in mean surprisal between NP-coordination and S-coordination, and NP-control and S-control. With the beam set to four (λ = 4), we did not obtain full analyses for all test items. Consequently, two sets of items had to be disregarded (sets 8 and 9). For the remaining items, however, we obtained an NP-coordination preference for all beam sizes. The largest difference occurred for λ = 16. When the beam was set to λ ≤ 8, the difference stabilized. Taking everything into account, the model with λ = 16 led to the best overall match with Frazier's reading time data.

Figure 3: Differences between NP versus S surprisal for different beam sizes (λs).

As for the interaction, Figure 4 depicts the differences in mean surprisal between NP-coordination and NP-control, and S-coordination and S-control. These results indicate that we robustly replicated the interaction between coordination type and ambiguity. For all beam sizes, S-coordination benefited more from disambiguation than NP-coordination, i.e., the difference in means between S-coordination and S-control was larger

than the difference in means between NP-coordination and NP-control.

Figure 4: Differences in coordination versus control surprisal for different beam sizes (λs).

In our simulations, we found that surprisal theory can account for reading time data from a classic experiment on the NP-/S-coordination ambiguity in Dutch reported by Frazier (1987). This suggests that the interplay between syntactic and lexical expectancy might be sufficient to explain an NP-coordination preference in human subjects. In the remainder of this section, we analyze our results and explain how this preference arises in the model.

4.3 Model Analysis

To determine what caused the NP-preference in our model, we inspected surprisal differences item-by-item. Whether the NP-coordination preference was syntactic or lexical in nature should be reflected in the grammar. If it was syntactic, NP-coordination would have a higher probability than S-coordination according to our PCFG. If, on the other hand, it was lexical, NP- and S-coordination should be equally probable syntactically. Another possibility, however, is that syntactic and lexical probabilities interacted. If this was the case, we should expect NP-coordinations to lead to lower surprisal values on average only, but not necessarily on every item. Figure 5 shows the estimated surprisal values per sentence-final frame for the ambiguous condition and Figure 6 for the unambiguous condition. Figure 5 indicates that although NP-coordination led to lower surprisal overall (see Figure 2), this was not the case for all tested items. A similar pattern was found for the NP-control versus S-control items in Figure 6. S-controls led to lower surprisal overall, but not for all items.

Figure 5: Surprisal per sentence for final frames in the ambiguous condition.

Figure 6: Surprisal per sentence for final frames in the unambiguous condition.

Manual inspection of the grammar revealed a bias towards NP-coordination. A total of 115 PCFG rules concerned coordination (≈ 18% of the entire grammar). As these rules expanded the same grammatical category, their probabilities summed to 1. A rule-by-rule inspection showed that approximately 48% of the probability mass was assigned to rules that dealt with NP-coordinations, 22% to rules that dealt with S-coordinations, and the remaining 30% to rules that dealt with coordination in other structures. In other

words, there was a clear preference for NP-coordination in the grammar. Despite this bias, for some tested items the S-coordination received lower surprisal than the NP-coordination (Figure 5). First, individual NP-coordination rules might have lower probability than individual S-coordination rules, so the overall preference for NP-coordination in the grammar does not have to be reflected in every test item. Secondly, syntactic probabilities could be modified by lexical probabilities. Suppose for a pair of test items that NP-coordination was syntactically preferred over S-coordination. If the sentence was disambiguated as an NP-coordination by a highly improbable lexical item, and disambiguated as an S-coordination by a highly probable lexical item, surprisal for the NP-coordination might turn out higher than surprisal for the S-coordination. In this way, lexical factors could override the NP-coordination bias in the grammar, leading to a preference for S-coordination in some items.

To summarize, the PCFG displayed an overall NP-coordination preference when surprisal was averaged over the test sentences, and this result is consistent with the findings of Frazier (1987). The NP-coordination preference, however, was not invariably reflected on an item-by-item basis. Some S-coordinations showed lower surprisal than the corresponding NP-coordinations. This reversal of processing difficulty can be explained in terms of differences in individual rules, and in terms of interactions between syntactic and lexical probabilities. This suggests that specific lexical expectations might have a much stronger effect on disambiguation preferences than supposed by the minimal attachment principle. Unfortunately, Frazier (1987) only reported mean reading times for the two coordination types.[3] It would be interesting to compare the predictions from our surprisal model with human data item-by-item in order to validate the magnitude of lexical effects we found in the model.

[3] Thus it was not possible to determine the strength of the correlation between reading times in Frazier's study and surprisal in our model.

5 Discussion

In this paper we have shown that a model of lexicalized surprisal, based on an automatically induced PCFG, can account for the NP-/S-ambiguity reading time data of Frazier (1987). We found these results to be robust for a critical model parameter (beam size), which suggests that syntactic processing in human comprehension might be based on limited parallelism only. Surprisal theory models processing difficulty on a word level. A word's difficulty is related to the expectations the language processor forms, given the structural and lexical material that precedes it. The model showed a clear preference for NP-coordination, which suggests that structural and lexical expectations as estimated from a corpus might be sufficient to explain the NP-coordination bias in human sentence processing.

Our account of this bias differs considerably from the original account proposed by Frazier (the minimal attachment principle) in a number of ways. Frazier's explanation is based on a metric of syntactic complexity which in turn depends on quite specific syntactic representations of a language's phrase structure. Surprisal theory, on the other hand, is largely neutral with respect to the form syntactic representations take in the human mind.[4] Moreover, differential processing in surprisal-based models does not require the specification of a notion of syntactic complexity. Both these aspects make surprisal theory a parsimonious explanatory framework. The minimal attachment principle postulates that the bias towards NP-coordination is an initial processing primitive. In contrast, the bias in our simulations is a function of the model's input history and linguistic experience from which the grammar is induced. It is further modulated by the immediate context from which upcoming words are predicted during processing. Consequently, the model's preference for one structural type can vary across sentence tokens and even be reversed on occasion. We argued that our grammar showed an overall preference for NP-coordination, but this preference was not necessarily reflected on each and every rule that dealt with coordinations. Some S-coordination rules could have higher probability than NP-coordination rules. In addition, syntactic expectations were modified by lexical expectations. Thus, even when NP-coordination was structurally favored over S-coordination, highly unexpected lexical material could lead to more processing difficulty for NP-coordination than for S-coordination.

[4] This is not to say, of course, that the choice of language model to estimate surprisal is completely irrelevant; different models will yield different degrees of fit, see Frank and Bod (2010).

Surprisal theory allows us to build a formally precise computational model of reading time data which generates testable, quantitative predictions about the differential processing of individual test items. These predictions (Figure 5) indicate that mean reading times for a set of NP-/S-coordination sentences may not be adequate to tap the origin of differential processing difficulty.

Our results are consistent with the findings of Hoeks et al. (2002), who also found evidence for an NP-coordination preference in a self-paced reading experiment as well as in an eye-tracking experiment. They suggested that NP-coordination might be easier to process because it has a simpler topic structure than S-coordination. The former only has one topic, whereas the latter has two. Hoeks et al. (2002) argue that having more than one topic is unexpected. Sentences with more than one topic will therefore cause more processing difficulty. This preference for simple topic-structure that was evident in language comprehension may also be present in language production, and hence in language corpora. Thus, the NP-coordination preference that was present in our training corpus may very well have had a pragmatic origin related to topic-structure. The outcome of our surprisal model is also compatible with the results of Hoeks et al. (2006), who found that thematic information can strongly reduce but not completely eliminate the NP-coordination preference. Surprisal theory is explicitly built on the assumption that multiple sources of information can interact in parallel at any point in time during sentence processing. Accordingly, we suggest here that the residual preference for NP-coordination found in the study of Hoeks et al. (2006) might be explained in terms of syntactic and lexical expectation. And finally, our approach is consistent with a large body of evidence indicating that language comprehension is incremental and makes use of expectation-driven word prediction (Pickering and Garrod, 2007). It remains to be tested whether our model can explain behavioral data from the processing of ambiguities other than the Dutch NP- versus S-coordination case.

References

G. Altmann and Y. Kamide. 1999. Incremental interpretation at verbs: Restricting the domain of subsequent reference. Cognition, 73:247–264.

F. Attneave. 1959. Applications of Information Theory to Psychology: A summary of basic concepts, methods, and results. Holt, Rinehart and Winston.

E. Charniak. 1993. Statistical Language Learning. MIT Press.

V. Demberg and F. Keller. 2008. Data from eye-tracking corpora as evidence for theories of syntactic processing complexity. Cognition, 109:193–210.

V. Demberg and F. Keller. 2009. A computational model of prediction in human parsing: Unifying locality and surprisal effects. In Proceedings of the 31st Annual Conference of the Cognitive Science Society, Amsterdam, the Netherlands.

J. Earley. 1970. An efficient context-free parsing algorithm. Communications of the ACM, 6:451–455.

S. Frank and R. Bod. 2010. The irrelevance of hierarchical structure to human sentence processing. Unpublished manuscript.

S. Frank. 2009. Surprisal-based comparison between a symbolic and a connectionist model of sentence processing. In Proceedings of the 31st Annual Conference of the Cognitive Science Society, pages 1139–1144, Amsterdam, the Netherlands.

L. Frazier. 1987. Syntactic processing: Evidence from Dutch. Natural Language and Linguistic Theory, 5:519–559.

J. Hale. 2001. A probabilistic Earley parser as a psycholinguistic model. In Proceedings of the 2nd Conference of the North American Chapter of the Association for Computational Linguistics, volume 2, pages 159–166.

J. Hoeks, W. Vonk, and H. Schriefers. 2002. Processing coordinated structures in context: The effect of topic-structure on ambiguity resolution. Journal of Memory and Language, 46:99–119.

J. Hoeks, P. Hendriks, W. Vonk, C. Brown, and P. Hagoort. 2006. Processing the noun phrase versus sentence coordination ambiguity: Thematic information does not completely eliminate processing difficulty. The Quarterly Journal of Experimental Psychology, 59:1581–1599.

D. Jurafsky. 1996. A probabilistic model of lexical and syntactic access and disambiguation. Cognitive Science, 20:137–147.

R. Levy. 2008. Expectation-based syntactic comprehension. Cognition, 106:1126–1177.

M. Nederhof, A. Sarkar, and G. Satta. 1998. Prefix probabilities from stochastic tree adjoining grammar. In Proceedings of COLING-ACL '98, pages 953–959, Montreal.

M. Otten, M. Nieuwland, and J. van Berkum. 2007. Great expectations: Specific lexical anticipation influences the processing of spoken language. BMC Neuroscience.
M. Pickering and S. Garrod. 2007. Do people use language production to make predictions during comprehension? Trends in Cognitive Sciences, 11:105–110.

B. Roark. 2001. Probabilistic top-down parsing and language modeling. Computational Linguistics, 27:249–276.

N. Smith and R. Levy. 2008. Optimal processing times in reading: A formal model and empirical investigation. In Proceedings of the 30th Annual Conference of the Cognitive Science Society, pages 595–600, Austin, TX.

A. Stolcke. 1995. An efficient probabilistic context-free parsing algorithm that computes prefix probabilities. Computational Linguistics, 21:165–201.

J. van Berkum, C. Brown, P. Zwitserlood, V. Kooijman, and P. Hagoort. 2005. Anticipating upcoming words in discourse: Evidence from ERPs and reading times. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31:443–467.

G. van Noord. 2006. At last parsing is now operational. In Verbum Ex Machina. Actes de la 13e conférence sur le traitement automatique des langues naturelles, pages 20–42. Presses universitaires de Louvain.
