
CHOMSKY HIERARCHY

Within the field of computer science, specifically in the area of formal languages, the Chomsky hierarchy (occasionally referred to as the Chomsky–Schützenberger hierarchy) is a containment hierarchy of classes of formal grammars.

This hierarchy of grammars was described by Noam Chomsky in 1956. It is also named after Marcel-Paul Schützenberger, who played a crucial role in the development of the theory of formal languages.

The People behind the Chomsky Hierarchy

A. Noam Chomsky
- Born December 7, 1928
- Currently Professor Emeritus of linguistics at MIT
- Created the theory of generative grammar
- Sparked the cognitive revolution in psychology
- From 1945, studied philosophy and linguistics at the University of Pennsylvania
- PhD in linguistics from the University of Pennsylvania in 1955
- 1956, appointed full Professor at MIT, Department of Linguistics and Philosophy
- 1966, Ferrari P. Ward Chair; 1976, Institute Professor

Contributions:
Linguistics
Transformational grammars
Generative grammar
Language acquisition
Computer Science
Chomsky hierarchy
Chomsky Normal Form
Context-free grammars
Psychology
Cognitive Revolution (1959)
Universal grammar

B. Marcel-Paul Schützenberger
- Born 1920, died 1996
- Mathematician, Doctor of Medicine
- Professor of the Faculty of Sciences, University of Paris
- Member of the Academy of Sciences
- First trained as a physician, doctorate in medicine in 1948
- PhD in mathematics in 1953
- Professor at the University of Poitiers, 1957-1963
- Director of research at the CNRS, 1963-1964
- Professor in the Faculty of Sciences at the University of Paris, 1964-1996

Contributions:
Formal languages with Noam Chomsky
Chomsky–Schützenberger hierarchy
Chomsky–Schützenberger theorem
Automata with Samuel Eilenberg
Biology and Darwinism
Mathematical critique of neo-Darwinism (1966)

Phrase Structure Rules


Structure within the NP

1 Definitions
(1) a tree for ‘the brown fox sings’

• Linguistic trees have nodes. The nodes in (1) are A, B, C, D, E, F, and G.
• There are two kinds of nodes: internal nodes and terminal nodes. The internal nodes in (1) are A, B, and E. The terminal nodes are C, D, F, and G. Terminal nodes are so called because they are not expanded into anything further. The tree ends there.
• Terminal nodes are also called leaf nodes. The leaves of (1) are really the words that constitute the sentence ‘the brown fox sings’, i.e. ‘the’, ‘brown’, ‘fox’, and ‘sings’.

(2) a. A set of nodes form a constituent iff they are exhaustively dominated by a common node.
b. X is a constituent of Y iff X is dominated by Y.
c. X is an immediate constituent of Y iff X is immediately dominated by Y.

Notions such as subject, object, prepositional object etc. can be defined structurally. So a subject is the NP immediately dominated by S, and an object is an NP immediately dominated by VP, etc.

(3) a. If a node X immediately dominates a node Y, then X is the mother of Y, and Y is the daughter of X.
b. A set of nodes are sisters if they are all immediately dominated by the same (mother) node.

We can now define a host of relationships on trees: grandmother, granddaughter, descendant, ancestor, etc.

Another important relationship that is defined in purely structural terms is c-command.

(4) A c-commands B if and only if A does not dominate B and the node that immediately dominates A dominates B.

c-command is used in the formulation of Condition C, a principle
used to determine what a pronoun may not refer to.

• CONDITION C
(5) A pronoun cannot refer to a proper name it c-commands.
Note that Condition C is a negative condition. It never tells you what a
particular pronoun must refer to. It only tells you what it cannot refer
to.
In general, if a pronoun cannot refer to a proper name (despite
agreeing in gender and number), you can conclude that the pronoun
c-commands the proper name.
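Because c-command is defined purely structurally, it can be computed mechanically from a tree. The following is a minimal sketch in Python; the tuple encoding and the function names (subtrees, dominates, mother, c_commands) are our own, and the sample tree assumes that (1) has the shape A over B and G, B over C and E, and E over D and F, which matches the lists of internal and terminal nodes given above.

# A node is (label, *children); a child is another node or a word (str).
def subtrees(tree):
    """Yield the tree and every node below it."""
    yield tree
    for child in tree[1:]:
        if isinstance(child, tuple):
            yield from subtrees(child)

def dominates(x, y):
    """True iff node y appears properly below node x."""
    return any(sub is y
               for child in x[1:] if isinstance(child, tuple)
               for sub in subtrees(child))

def mother(tree, x):
    """Return the node that immediately dominates x, if any."""
    for node in subtrees(tree):
        if any(child is x for child in node[1:]):
            return node
    return None

def c_commands(tree, a, b):
    """Definition (4), taken literally: A c-commands B iff A does not
    dominate B and the node immediately dominating A dominates B."""
    m = mother(tree, a)
    return m is not None and not dominates(a, b) and dominates(m, b)

# The (assumed) tree for 'the brown fox sings' in (1).
C = ('C', 'the'); D = ('D', 'brown'); F = ('F', 'fox'); G = ('G', 'sings')
E = ('E', D, F); B = ('B', C, E); A = ('A', B, G)
print(c_commands(A, B, G))   # True: B's mother A dominates G
print(c_commands(A, C, E))   # True: C and E are sisters
print(c_commands(A, B, C))   # False: B dominates C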

• The NO CROSSING BRANCHES CONSTRAINT
(6) If one node X precedes another node Y, then all descendants of X must also precede Y and all descendants of Y.

2 How to grow trees

Where do the trees that we use to analyze linguistic structure come from?

In a way, they are just representations of facts that exist out in the world: the facts that we can discover using constituency tests. So one way to make trees is by doing empirical work: taking a sentence, applying various constituency tests to the words in the sentence, and then drawing a tree based on the results of our tests.

This empirical method is ultimately the only correct way to deduce ‘tree structure’. However, in most cases, we can simplify things considerably by using Phrase Structure Rules.

Phrase Structure Rules are rules of the sort

X → Y Z

This rule says ‘take the node X and expand it into the nodes Y and Z’. Alternately, going from right to left (or from below), it says ‘if you have a Y and a Z next to each other, you can combine them to make an X’.

Phrase structure rules can be categorial, i.e. rules that expand categories into other categories, or they can be lexical, i.e. rules that expand category labels into words (lexical items).

• A grammar can then be thought of as a set of phrase structure rules
(categorial rules plus lexical rules).
The categorial rules can be thought of as (part of) the syntax and the
lexical rules as (part of) the lexicon.

2.1 Some Phrase Structure Rules for English

(7) Categorial Rules
a. S → NP Modal VP
b. VP → V AP PP
c. AP → ADVP A
d. ADVP → ADV
e. PP → P NP
f. NP → D N

(8) Lexical Rules
a. N → girl
b. N → boy
c. ADV → incredibly
d. A → conceited
e. V → seem
f. Modal → must
g. P → to
h. D → that
i. D → this

Some sentences these rules will generate:


(9) a. This boy must seem incredibly conceited to that girl.
b. This boy must seem incredibly conceited to this girl.
c. This boy must seem incredibly conceited to that boy.
d. This boy must seem incredibly conceited to this boy.
e. This girl must seem incredibly conceited to that girl.
f. This girl must seem incredibly conceited to this girl.
g. This girl must seem incredibly conceited to that boy.
h. This girl must seem incredibly conceited to this boy.

How many more sentences will these rules generate?
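One way to answer is simply to enumerate. The sketch below (the dictionary encoding and the function name expand are our own choices) exhaustively expands the rules in (7) and (8); it prints 16, i.e. the eight sentences in (9) plus the eight in which the subject determiner is ‘that’.

from itertools import product

# Grammar (7)/(8): each category maps to a list of right-hand sides;
# strings that are not keys of the dictionary are words.
rules = {
    'S':     [['NP', 'Modal', 'VP']],
    'VP':    [['V', 'AP', 'PP']],
    'AP':    [['ADVP', 'A']],
    'ADVP':  [['ADV']],
    'PP':    [['P', 'NP']],
    'NP':    [['D', 'N']],
    'N':     [['girl'], ['boy']],
    'ADV':   [['incredibly']],
    'A':     [['conceited']],
    'V':     [['seem']],
    'Modal': [['must']],
    'P':     [['to']],
    'D':     [['that'], ['this']],
}

def expand(symbol):
    """Return every string of words derivable from symbol."""
    if symbol not in rules:                 # a word: nothing to expand
        return [[symbol]]
    results = []
    for rhs in rules[symbol]:
        # One result per combination of expansions of the RHS symbols.
        for combo in product(*(expand(s) for s in rhs)):
            results.append([w for part in combo for w in part])
    return results

sentences = expand('S')
print(len(sentences))          # 16 = 2 x 2 x 2 x 2 determiner/noun choices
print(' '.join(sentences[0]))  # 'that girl must seem incredibly conceited to that girl'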


• Optional constituents
How do we handle cases like:
(10) This boy must seem incredibly stupid.

2.2 Introducing infinity
We know that human languages can contain sentences of arbitrary
length. Consider (11) which stands for an infinite number of
sentences.

(11) He believes that he believes that he believes that he believes that … he ate pizza.

So if all of human language is to be generated by a set of phrase structure rules, the relevant set of phrase structure rules should generate an infinite number of sentences.

How can that be done?


Let us try to analyze (11), starting with a more manageable (12).

(12) He believes that he ate pizza.

We start with the following categorial rules:
(13) a. S → NP VP
b. VP → V S'
c. S' → COMP S
d. VP → V NP

We need the following lexical rules:
(14) a. NP → he
b. NP → pizza
c. V → ate
d. V → believes
e. COMP → that

Now we can generate (12). This is shown in (15).

But is (12) all that the rules in (13) and (14) will generate?
How many sentences will (13) and (14) generate?

2.2.1 Overgeneration
The rules in (13) and (14) will also generate sentences (see the
structure below) like:
(16) *He ate that he believes pizza.

How can we constrain phrase structure rules so that such overgeneration does not take place?
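Both the unbounded recursion and the overgeneration can be seen concretely by enumerating what (13) and (14) derive up to a bound. This is a sketch with our own encoding and names (generate, concat, and 'SBAR' standing for S'); since the grammar is recursive, the language is infinite, so only a depth-bounded sample can be listed.

# Grammar (13)/(14); 'SBAR' stands for S'.
rules = {
    'S':    [['NP', 'VP']],
    'VP':   [['V', 'SBAR'], ['V', 'NP']],
    'SBAR': [['COMP', 'S']],
    'NP':   [['he'], ['pizza']],
    'V':    [['ate'], ['believes']],
    'COMP': [['that']],
}

def generate(symbol, depth):
    """Yield the word strings derivable from symbol within the depth bound."""
    if symbol not in rules:        # a word
        yield [symbol]
        return
    if depth == 0:                 # out of budget: abandon this branch
        return
    for rhs in rules[symbol]:
        yield from concat(rhs, depth - 1)

def concat(symbols, depth):
    """Yield every concatenation of expansions of the given symbols."""
    if not symbols:
        yield []
        return
    for head in generate(symbols[0], depth):
        for tail in concat(symbols[1:], depth):
            yield head + tail

for s in sorted(set(' '.join(w) for w in generate('S', 6))):
    print(s)
# The output contains 'he believes that he ate pizza' (sentence (12)),
# but also 'he ate that he believes pizza' (the overgenerated (16)).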

3 Noun Phrases

So far, we have seen two kinds of categories:

word-level categories such as N, V, A, P etc. (somewhat imprecisely, words) and

phrase-level categories such as NP, VP, AP, PP etc. (somewhat imprecisely, sequences of words which can ‘stand on their own’).

We will now investigate whether these two kinds of categories are all we need, or whether we need a third category which lies in between words and full phrases.

Consider the following NP:

(17) the king of England

We feel quite confident saying that ‘the king of England’ is an NP. What else can we say about its structure?

There seems to be a lot of evidence that ‘of England’ is a PP. It can be co-ordinated, and shared in shared constituent co-ordination. It can also function as a sentence fragment and be preposed.

(18) a. the king [PP of England] and [PP of the empire]. (coordination)
b. He is the king, and she is the queen, [PP of England]. (shared constituent coordination)
c. A: Was he the king of Livonia?
B: No, [PP of England]. (sentence fragment)
d. [PP Of which country] was he the king? (preposing)

At this point we have two options: a flat tree (19), in which ‘the’, ‘king’, and ‘of England’ are all sisters under NP, or a tree (20) in which ‘king of England’ first forms a constituent of its own (labeled ?? for now).

There is evidence from constituency tests that the sequence of words ‘king of England’ forms a constituent.

• ‘king of England’ can undergo co-ordination with another similar sequence.

(21) Vivian dared defy the [king of England] and [ruler of the Empire]?

• ‘king of England’ can serve as the shared constituent in shared constituent co-ordination.

(22) Edward was the last, and some people say the best, [king of England].

• There is a proform that replaces sequences like ‘king of England’.

(23) The present [king of England] is more popular than the last one.

So ‘king of England’ forms a constituent that excludes ‘the’. Thus we have evidence for the tree in (20).

This evidence doesn’t actually rule out the tree in (19). It is not easy
to rule out (19) on the basis of the discussion so far. However, an
assumption that natural language structures only involve binary
branching could be used to block structures like (19).

3.1 What kind of constituent is ‘king of England’?

In other words, what is the name of the node labeled ?? in (20)?
Let us assume that it is an NP.
We find that this assumption is problematic in many ways.
• ‘king of England’ does not have the distribution of ‘normal’/‘full’ noun phrases. Normal NPs can occur in subject position, in object position, and as a prepositional object. ‘king of England’ cannot appear in any of these positions.

(24) a. subject:

i. [The king of England] invaded several countries.


ii. * [King of England] invaded several countries.
b. object:
i. I saw [the king of England] on the T yesterday.
ii. * I saw [king of England] on the T yesterday.
c. prepositional object:
i. I didn’t give any money to [the king of England].
ii. * I didn’t give any money to [king of England].

• Consider the tree for ‘the king of England’ under the assumption that ‘king of England’ is also an NP.

From this tree, we can read off the phrase structure rules involved in building it. They are shown in (26).

(26) a. Categorial Rules:
i. NP → D NP
ii. NP → N PP
iii. PP → P NP
b. Lexical Rules:
i. D → the
ii. N → king
iii. P → of
iv. NP → England

Note in particular the categorial rule (26a.i). It has the unusual property that it expands a node label into itself. Such rules are called recursive, and this phenomenon is called recursion.

So we can go from NP to [D NP] to [D D NP] to [D D D NP] and so on. In principle, using the rules in (26), we can generate NPs like those in (27).

(27) a. * the the king of England


b. * the the the king of England
c. * the … the the the king of England

Now, it is very clear that none of the NPs in (27) are good noun
phrases in English. From this we can conclude that the categorial rule
(26a.i), which is the source of the recursion, cannot be correct.

So:
• ‘king of England’ cannot be an NP,
• and yet ‘king of England’ is a constituent of some sort.
Let us call nominal constituents that are bigger than words but still not full phrases N' (read ‘N-bar’).

Our tree now becomes:

NPs are sometimes called N-double-bars or N''. Plain Ns are sometimes called N0.

4 Complements and Adjuncts

Consider the phrase-structure rules responsible for generating (28):
(29) a. NP → D N'
b. N' → N PP
c. PP → P NP

We see that D combines with an N' to its right and forms an NP. Similarly, P combines with an NP to its right and forms a PP.

Likewise, (29) says that an N combines with any PP that follows it (i.e. any postnominal PP) and forms an N'.
But is this really the case? Do the PPs in (30a, b) have the same relation to the N?

(30) a. a student [of Physics]
b. a student [with long hair]

It seems not. Consider the following pattern:

(31) a. i. He is [a student of Physics].
ii. = He is [studying Physics].
b. i. He is [a student with long hair].
ii. ≠ He is [studying long hair].

PPs like ‘of Physics’ are called complements, while PPs like ‘with long
hair’ are called adjuncts.
Corresponding to this difference in terminology, a structural
difference is also proposed. This is shown in (32).

In terms of phrase structure rules this is:

(33) a. N'' → D N' (Determiner Rule)
b. N' → N' PP (Adjunct Rule)
c. N' → N PP (Complement Rule)

The rules in (33) make a prediction: if an NP contains both a complement PP and an adjunct PP, the complement PP should precede the adjunct PP. This prediction turns out to be true.

(34) a. the student [of Physics] [with long hair]
b. * the student [with long hair] [of Physics]

4.1 Optional Constituents of the Noun Phrase

Do all NPs have to contain a determiner, a noun, a complement PP, and an adjunct PP?
Well, they have to contain an N, otherwise they wouldn’t be NPs. What about the others?
Consider the rules in (33). If you wanted to make an NP, would it be necessary to apply the Adjunct Rule?

You could take an N and a complement PP and make an N'. Then you could combine the N' with an adjunct PP to make another N'. You could, but you don’t have to: you can just combine your adjunct-less N' with a D on its left to make an NP.

So, NPs don’t have to contain adjuncts. In other words, the adjunct
rule is an optional rule.

Still, the rules in (33) insist that every NP must have a determiner (the
Determiner rule) and a complement PP (the Complement rule). This
is, however, just false.

(35) a. the student
b. the student with long hair

(35a) is an NP without a complement PP; (35b) shows that an NP without a complement PP can still take an adjunct PP. How can we modify our phrase structure rules to handle these cases?
For this purpose, we will introduce new terminology: (A) means that A is optional. So we can now change our complement rule from (36a) to (36b).

(36) a. N' → N PP (Old Complement Rule)
b. N' → N (PP) (New Complement Rule)
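The (A) notation is purely an abbreviation, and it can be expanded mechanically into plain rules. A minimal sketch (the pair encoding and the name expand_optional are our own):

from itertools import product

def expand_optional(rhs):
    """Expand a right-hand side whose optional symbols are flagged,
    e.g. (PP) encoded as ('PP', True), into the plain rules it abbreviates."""
    choices = [([[sym], []] if optional else [[sym]])
               for sym, optional in rhs]
    return [[s for part in combo for s in part] for combo in product(*choices)]

# N' -> N (PP) abbreviates N' -> N PP and N' -> N:
print(expand_optional([('N', False), ('PP', True)]))
# [['N', 'PP'], ['N']]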

We also find optionality of determiners; cf. (37a-e).

(37) a. cheese from Greece
b. students
c. students with long hair
d. students of physics
e. students of physics with long hair

However, this optionality is lexically determined, i.e. it only works for certain nouns: noncount nouns and plural count nouns, but not singular count nouns.

(38) a. * Student likes pizza


b. * Student with long hair likes pizza

For such nouns, we can modify the determiner rule in the way we modified the complement rule in (36).
(39) a. NP → D N' (Old Determiner Rule)
b. NP → (D) N' (New Determiner Rule)

However, we have to think of a way to block the NPs in (38) from being generated.

4.2 Non-branching Phrases

Consider the new complement rule:

(40) N' → N (PP) (New Complement Rule)

This rule is really equivalent to the following two rules:

(41) a. N' → N PP
b. N' → N

(41a) is nothing new. (41b) is definitely new. It tells us something unexpected. According to (41b), student in ‘the student’ is both an N' as well as an N, while student in ‘the student of Physics’ is only an N, not an N'. Similarly, student in ‘the student with long hair’ should be both an N' as well as an N. We can check to see if these predictions are true.

The test we will use is substitution by the N' pro-form one.

(42) a. The [student] with long hair is dating the one with short hair.
b. This [student] works harder than that one.
c. * The [student] of chemistry was older than the one of
Physics.

What can co-ordination tell us here?

4.3 A bit more on N'

Both (43a, b) are responsible for the creation of N' nodes.
(43) a. N' → N' PP (Adjunct Rule)
b. N' → N PP (Complement Rule)

How can we be sure that the node created by the complement rule isn’t N-bar1 and the node created by the adjunct rule N-bar2?
Again by constituency tests: we know that only like categories can be co-ordinated, and we find that N' nodes created by the two different rules can be co-ordinated.

(44) the [ [students of Chemistry with long hair] and [professors of Physics] ]

In addition, the pro-N' one can refer to N' nodes created by either rule.

(45) a. Which [student of Physics]? The one with long hair?


b. Which [student of Physics with long hair]? That one?

Hence we can conclude that the ‘output’ of both the rules is indeed
one kind of node, which we call N’.

The Hierarchy

Turing Machines and Type 0 Languages

The classes of languages that are accepted by finite-state automata on the one hand and pushdown automata on the other hand were shown earlier to be the classes of Type 3 and Type 2 languages, respectively. The following two theorems show that the class of languages accepted by Turing machines is the class of Type 0 languages.

Theorem 4.6.1 Each Type 0 language is a recursively enumerable language.

Proof Consider any Type 0 grammar G = <N, Σ, P, S>. From G construct a two-auxiliary-work-tape Turing machine MG that on a given input x nondeterministically generates some string w in L(G), and then accepts x if and only if x = w.

The Turing machine MG generates the string w by tracing a derivation in G of w from S. MG starts by placing the sentential form S on the first auxiliary work tape. Then MG repeatedly replaces the sentential form stored on the first auxiliary work tape with the one that succeeds it in the derivation. The second auxiliary work tape is used as an intermediate memory while deriving the successor of each of the sentential forms.

The successor of a sentential form is obtained by nondeterministically searching for a substring α such that α → β is a production rule in G, and then replacing that occurrence of α with β.

MG uses a subcomponent M1 to copy the prefix that precedes α onto the second auxiliary work tape.

MG uses a subcomponent M2 to read α from the first auxiliary work tape and replace it with β on the second.

MG uses a subcomponent M3 to copy the suffix that succeeds α onto the second auxiliary work tape.

MG uses a subcomponent M4 to copy the sentential form created on the second auxiliary work tape back onto the first. In addition, MG uses M4 to determine whether the new sentential form is a string w of terminal symbols, that is, a string in L(G). If it is, control passes to a subcomponent M5. Otherwise, control passes to M1.

MG uses the subcomponent M5 to determine whether the input string x is equal to the string w stored on the first auxiliary work tape.
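The derivation-tracing idea can also be rendered as a deterministic, bounded breadth-first search over sentential forms. The following is a minimal sketch, not the construction itself: the function derives, the rule encoding, and the toy grammar are our own. Because membership for Type 0 grammars is undecidable (Corollary 4.6.1 below), an unbounded search can only semi-decide membership; the step bound makes the demonstration terminate, and the length-based pruning is sound only for noncontracting rules (it would not be for an erasing rule such as Sb → ε in Example 4.6.1 below).

from collections import deque

def derives(rules, start, target, max_steps=10000):
    """Bounded BFS over sentential forms: does start derive target?
    rules is a list of (lhs, rhs) string pairs."""
    seen = {start}
    queue = deque([start])
    steps = 0
    while queue and steps < max_steps:
        form = queue.popleft()
        steps += 1
        for lhs, rhs in rules:
            i = form.find(lhs)
            while i != -1:                        # every occurrence of lhs
                new = form[:i] + rhs + form[i + len(lhs):]
                if new == target:
                    return True
                # Prune forms longer than the target: sound only when no
                # rule shrinks a sentential form (noncontracting rules).
                if new not in seen and len(new) <= len(target):
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return False

# A toy noncontracting grammar (ours): S -> aSb | ab generates a^n b^n.
toy = [('S', 'aSb'), ('S', 'ab')]
print(derives(toy, 'S', 'aaabbb'))   # True
print(derives(toy, 'S', 'aab'))      # False (search space is exhausted)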

Example 4.6.1 Consider the grammar G that has the production rules S → aSbS and Sb → ε.

The language L(G) is accepted by the Turing machine MG, whose
transition diagram is given in Figure 4.6.1.

Figure 4.6.1 A Turing machine MG for simulating the grammar G that has the production rules S → aSbS and Sb → ε.

The components M1, M2, and M3 scan from left to right the sentential
form stored on the first auxiliary work tape. As the components scan
the tape they erase its content.

The component M2 of MG uses two different sequences of transition rules for the first and second production rules, S → aSbS and Sb → ε. The sequence of transition rules that corresponds to S → aSbS removes S from the first auxiliary work tape and stores aSbS on the second. The sequence of transition rules that corresponds to Sb → ε removes Sb from the first auxiliary work tape and stores nothing on the second.

The component M4 scans from right to left the sentential form in the
second auxiliary work tape, erasing the content of the tape during the
scanning. M4 starts scanning the sentential form in its first state,
determining that the sentential form is a string of terminal symbols if
it reaches the blank symbol B while in the first state. In such a case,
M4 transfers the control to M5. M4 determines that the sentential form
is not a string of terminal symbols if it reaches a nonterminal symbol.
In this case, M4 switches from its first to its second state.

Theorem 4.6.2 Each recursively enumerable language is a Type 0 language.

Proof The proof consists of constructing from a given Turing machine M a grammar G that can simulate the computations of M. The constructed grammar G consists of three groups of production rules.

The purpose of the first group is to determine the following three items.

a. An initial configuration of M on some input.
b. Some segment for each auxiliary work tape of M. Each segment must include the location under the head of the corresponding tape.
c. Some sequence of transition rules of M. The sequence of transition rules must start at the initial state, end at an accepting state, and be compatible in the transitions that it allows between the states.

The group of production rules can specify any initial configuration of M, any segment of an auxiliary work tape that satisfies the above conditions, and any sequence of transition rules that satisfies the above conditions.

The purpose of the second group of production rules is to simulate a computation of M. The simulation must start at the configuration determined by the first group. In addition, the simulation must be in accordance with the sequence of transition rules, and within the segments of the auxiliary work tapes, determined by the first group.

The purpose of the third group of production rules is to extract the input whenever an accepting computation has been simulated, and to leave nonterminal symbols in the sentential form in the other cases. Consequently, the grammar can generate a given string if and only if the Turing machine M has an accepting computation on the string.

Consider any Turing machine M = <Q, Σ, Γ, δ, q0, B, F>. With no loss of generality it can be assumed that M is a two-auxiliary-work-tape Turing machine, that no transition rule originates at an accepting state, and that N = { ā | a is in Γ } ∪ { [q] | q is in Q } ∪ {¢, $, ↑0, ↑1, ↑2, #, S, A, C, D, E, F, K} is a multiset whose symbols are all distinct. Here ↑0, ↑1, and ↑2 are symbols that mark the positions of the input head and of the two auxiliary work-tape heads.

From M construct a grammar G = <N, Σ, P, S> that generates L(M), by tracing in its derivations the configurations that M goes through in its accepting computations. The production rules in P are of the following form.

a. Production rules for generating any sentential form that has the following pattern.

Each such sentential form corresponds to an initial configuration (¢q0a1 ... an$, q0, q0) of M, and to a sequence of transition rules i1 ... it. The transition rules define a sequence of compatible states that starts at the initial state and ends at an accepting state. ↑0 represents the input head, ↑1 represents the head of the first auxiliary work tape, and ↑2 represents the head of the second auxiliary work tape. The string B ... B ↑1 B ... B corresponds to a segment of the first auxiliary work tape, and the string B ... B ↑2 B ... B to a segment of the second.

A string in the language is derivable from the sentential form if and only if the following three conditions hold.

a. The string is equal to a1 ... an.
b. M accepts a1 ... an in a computation that uses the sequence of transition rules i1 ... it.
c. B ... B ↑i B ... B corresponds to a segment of the ith auxiliary work tape that is sufficiently large for the considered computation of M, 1 ≤ i ≤ 2. The position of ↑i in the segment indicates the initial location of the corresponding auxiliary work-tape head in the segment.

The production rules are of the following form.

The production rules for the nonterminal symbols S and A can generate a string of the form ¢↑0a1 ... an$C for each possible input a1 ... an of M. The production rules for the nonterminal symbols C and D can generate a string of the form B ... B ↑1 B ... B#E for each possible segment B ... B ↑1 B ... B of the first auxiliary work tape that contains the corresponding head location. The production rules for E and F can generate a string of the form B ... B ↑2 B ... B#[q0] for each possible segment B ... B ↑2 B ... B of the second tape that contains the corresponding head location. The production rules for the nonterminal symbols that correspond to the states of M can generate any sequence i1 ... it of transition rules of M that starts at the initial state, ends at an accepting state, and is compatible in the transitions between the states.

b. Production rules for deriving, from a sentential form which corresponds to a configuration (uqv$, u1qv1, u2qv2), a sentential form which corresponds to a configuration (ûq̂v̂$, û1q̂v̂1, û2q̂v̂2), where the two are configurations of M such that the second is reachable from the first by a move that uses the transition rule ij.

For each transition rule τ of M (writing τ for the nonterminal symbol that represents the rule), the set of production rules has
1. A production rule of the form Xτ → τX for each nonhead symbol X, including each X in {¢, $, #}; these rules carry τ from right to left.
2. A production rule, for each symbol a in Σ ∪ {¢, $}, that satisfies the following condition: τ is a transition rule that scans the symbol a on the input tape without moving the input head.
3. A production rule, for each symbol a in Σ ∪ {¢, $}, that satisfies the following condition: τ is a transition rule that scans the symbol a on the input tape while moving the input head one position to the right.
4. A production rule, for each pair of symbols a and b in Σ ∪ {¢, $}, that satisfies the following condition: τ is a transition rule that scans the symbol b on the input tape while moving the input head one position to the left.
5. A production rule, for each 1 ≤ i ≤ 2 and each pair of symbols X and Y in Γ, that satisfies the following condition: τ is a transition rule that replaces X with Y on the ith auxiliary work tape without changing the head position.
6. A production rule, for each 1 ≤ i ≤ 2 and each pair of symbols X and Y in Γ, that satisfies the following condition: τ is a transition rule that replaces X with Y on the ith auxiliary work tape while moving the corresponding head one position to the right.
7. A production rule, for each 1 ≤ i ≤ 2 and each triplet of symbols X, Y, and Z in Γ, that satisfies the following condition: τ is a transition rule that replaces the symbol Y with Z on the ith auxiliary work tape while moving the corresponding head one position to the left.

The purpose of the production rules in (1) is to transport τ from right to left over the nonhead symbols, across a representation of a configuration of M. τ gets across the head symbols ↑0, ↑1, and ↑2 by using the production rules in (2) through (7). As τ gets across the head symbols, the production rules in (2) through (7) "simulate" the changes in the tapes of M, and in the corresponding head positions, that are caused by the transition rule τ.

c. Production rules for extracting, from a sentential form which corresponds to an accepting configuration of M, the input that M accepts. The production rules are as follows.

Example 4.6.2 Let M be the Turing machine whose transition diagram is given in Figure 4.5.6(a). L(M) is generated by the grammar G that consists of the following production rules.

A. Production rules that find a sentential form that corresponds to the initial configuration of M, according to (a) in the proof of Theorem 4.6.2.

B. "Transporting" production rules that correspond to (b.1) in the


proof of Theorem 4.6.2, 1 i 5.

C. "Simulating" production rules that correspond to (b.2-b.4) in
the proof of Theorem 4.6.2.

D. "Simulating" production rules that correspond to (b.5-b.7) in


the proof of Theorem 4.6.2.

E. "Extracting" production rules that correspond to (c) in the proof


of Theorem 4.6.2.

The string abc has a leftmost derivation of the following form in G.

Theorem 4.6.2, together with Theorem 4.5.3, implies the following
result.

Corollary 4.6.1 The membership problem is undecidable for Type 0 grammars or, equivalently, for { (G, x) | G is a Type 0 grammar, and x is in L(G) }.

A context-sensitive grammar is a Type 1 grammar in which each production rule has the form α1Aα2 → α1βα2 for some nonterminal symbol A. Intuitively, a production rule of the form α1Aα2 → α1βα2 indicates that A can be replaced by β only if it is within the left context of α1 and the right context of α2. A language is said to be a context-sensitive language if it can be generated by a context-sensitive grammar.

A language is context-sensitive if and only if it is a Type 1 language (Exercise 4.6.4), and if and only if it is accepted by a linear bounded automaton (Exercise 4.6.5). By definition and Theorem 3.3.1, each context-free language is also context-sensitive, but the converse is false, because the non-context-free language { a^i b^i c^i | i ≥ 0 } is context-sensitive. It can also be shown that each context-sensitive language is recursive (Exercise 1.4.4), and that the recursive language LLBA_reject = { x | x = xi and Mi does not have accepting computations on input xi in which at most |xi| locations are visited in each auxiliary work tape } is not context-sensitive (Exercise 4.5.6).
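The context-sensitivity of { a^i b^i c^i } can be made concrete. The following standard noncontracting (Type 1) grammar, a textbook example that is not taken from this text, generates the strings with i ≥ 1:

S → aSBC | aBC
CB → HB, HB → HC, HC → BC (three steps that reorder CB to BC)
aB → ab
bB → bb
bC → bc
cC → cc

For example: S ⇒ aSBC ⇒ aaBCBC ⇒ aaBHBC ⇒ aaBHCC ⇒ aaBBCC ⇒ aabBCC ⇒ aabbCC ⇒ aabbcC ⇒ aabbcc. To include the empty string (i = 0), add a new start symbol S' with S' → S and S' → ε; Type 1 grammars permit this as long as S' appears on no right-hand side.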

Figure 4.6.2 gives the hierarchy of some classes of languages. All the inclusions in the hierarchy are proper.

Figure 4.6.2 Hierarchy of some classes of languages. Each of the indicated languages belongs to the corresponding class but not to the class just below it in the hierarchy.

