Here is an example context-free grammar which describes all two-letter strings containing the letters α and β.

S → AA
A → α
A → β

If we start with the nonterminal symbol S, then we can use the rule S → AA to turn S into AA. We can then apply one of the two later rules. For example, if we apply A → β to the first A we get βA. If we then apply A → α to the second A we get βα. Since both α and β are terminal symbols, and in context-free grammars terminal symbols never appear on the left-hand side of a production rule, there are no more rules that can be applied. The same process, applying the last two rules in the other possible combinations, yields all the strings of this simple grammar: αα, αβ, βα and ββ.

Languages generated by context-free grammars are known as context-free languages (CFL). Different context-free grammars can generate the same context-free language. It is important to distinguish properties of the language (intrinsic properties) from properties of a particular grammar (extrinsic properties). The language equality question (do two given context-free grammars generate the same language?) is undecidable.

1 Background

Since the time of Pāṇini, at least, linguists have described the grammars of languages in terms of their block structure, and described how sentences are recursively built up from smaller phrases, and eventually individual words or word elements. An essential property of these block structures is that logical units never overlap. For example, the sentence:

John, whose blue car was in the garage, walked to the grocery store.

can be logically parenthesized as follows:

(John, ((whose blue car) (was (in the garage))), (walked (to (the grocery store)))).

A context-free grammar provides a simple and mathematically precise mechanism for describing the methods by which phrases in some natural language are built from smaller blocks, capturing the "block structure" of sentences in a natural way. Its simplicity makes the formalism amenable to rigorous mathematical study. Important features of natural language syntax such as agreement and reference are not part of the context-free grammar, but the basic recursive structure of sentences, the way in …
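2 Formal definitions

A context-free grammar G is defined by the 4-tuple

$$G = (V, \Sigma, R, S)$$

where

1. V is a finite set; each element $v \in V$ is called a nonterminal character or a variable. Each variable represents a different type of phrase or clause in the sentence. Variables are also sometimes called syntactic categories. Each variable defines a sub-language of the language defined by G.
2. Σ is a finite set of terminals, disjoint from V. The set of terminals is the alphabet of the language defined by the grammar G.
3. R is a finite relation from V to $(V \cup \Sigma)^*$, where the asterisk represents the Kleene star operation. The members of R are called the (rewrite) rules or productions of the grammar. (R is also commonly symbolized by a P.)
4. S is the start variable (or start symbol), used to represent the whole sentence. It must be an element of V.

For strings $u, v \in (V \cup \Sigma)^*$, write $u \Rightarrow v$ if v is obtained from u by replacing one occurrence of a nonterminal A with w for some rule $(A, w) \in R$. Then $u \Rightarrow^* v$ holds if v can be obtained from u in zero or more such steps, and $u \Rightarrow^+ v$ holds if it can be obtained in one or more steps. In other words, $(\Rightarrow^*)$ and $(\Rightarrow^+)$ are the reflexive transitive closure (allowing a word to yield itself) and the transitive closure (requiring at least one step) of $(\Rightarrow)$, respectively.

2.4 Context-free language

The language of a grammar G = (V, Σ, R, S) is the set of all terminal strings derivable from its start symbol:

$$L(G) = \{ w \in \Sigma^* : S \Rightarrow^* w \}$$

A language L is said to be a context-free language (CFL) if there exists a CFG G such that $L = L(G)$.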
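The definition of L(G) can be made concrete with a small enumerator. The sketch below is only an illustration under stated assumptions: it assumes a grammar without ε-productions (so a sentential form never gets shorter, and any form longer than the bound can be discarded), and the names `Grammar` and `language_up_to` are introduced here for the example. It expands only the leftmost nonterminal, which loses nothing because every string in L(G) has a leftmost derivation, and it is run on the two-letter grammar S → AA, A → α, A → β.

```python
from collections import deque

# A grammar as the 4-tuple (V, Sigma, R, S). R maps each nonterminal to its
# right-hand sides, each written as a tuple of symbols.
Grammar = tuple[set[str], set[str], dict[str, list[tuple[str, ...]]], str]


def language_up_to(g: Grammar, max_len: int) -> set[str]:
    """Return every w in L(G) with len(w) <= max_len.

    Assumes G has no epsilon-productions, so sentential forms never shrink
    and forms longer than max_len can be pruned safely.
    """
    variables, _terminals, rules, start = g
    found: set[str] = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Locate the leftmost nonterminal; None means the form is all terminals.
        idx = next((i for i, sym in enumerate(form) if sym in variables), None)
        if idx is None:
            found.add("".join(form))  # S =>* w with w in Sigma*, so w is in L(G)
            continue
        for rhs in rules.get(form[idx], []):
            new_form = form[:idx] + rhs + form[idx + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return found


# The two-letter grammar: S -> AA, A -> α, A -> β.
two_letters: Grammar = (
    {"S", "A"},
    {"α", "β"},
    {"S": [("A", "A")], "A": [("α",), ("β",)]},
    "S",
)
print(sorted(language_up_to(two_letters, 2)))
```

Running it prints ['αα', 'αβ', 'βα', 'ββ'], the four two-letter strings over α and β. The `seen` set keeps unit-rule cycles from looping forever; grammars with ε-productions would need a different pruning rule.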
2.5 Proper CFGs

A context-free grammar is said to be proper if it has

- no unreachable symbols: $\forall N \in V : \exists \alpha, \beta \in (V \cup \Sigma)^* : S \Rightarrow^* \alpha N \beta$
- no unproductive symbols: $\forall N \in V : \exists w \in \Sigma^* : N \Rightarrow^* w$
- no ε-productions: $\neg \exists N \in V : (N, \varepsilon) \in R$
- no cycles: $\neg \exists N \in V : N \Rightarrow^+ N$

Every context-free grammar can be effectively transformed into a weakly equivalent one without unreachable symbols,[8] a weakly equivalent one without unproductive symbols,[9] and a weakly equivalent one without cycles.[10] Every context-free grammar not producing ε can be effectively transformed into a weakly equivalent one without ε-productions;[11] altogether, every such grammar can be effectively transformed into a weakly equivalent proper CFG.

2.6 Example

The grammar G = ({S}, {a, b}, P, S), with productions

S → aSa,
S → bSb,
S → ε,

is context-free. It is not proper since it includes an ε-production. A typical derivation in this grammar is

S ⇒ aSa ⇒ aaSaa ⇒ aabSbaa ⇒ aabbaa.

This makes it clear that $L(G) = \{ w w^R : w \in \{a, b\}^* \}$, where $w^R$ denotes the reversal of w. The language is context-free; however, it can be proved that it is not regular.

3 Examples

3.1 Well-formed parentheses

The canonical example of a context-free grammar is parenthesis matching, which is representative of the general case. There are two terminal symbols "(" and ")" and one nonterminal symbol S. The production rules are

S → SS
S → (S)
S → ()

The first rule allows the S symbol to multiply; the second rule allows the S symbol to become enclosed by matching parentheses; and the third rule terminates the recursion.

3.2 Well-formed nested parentheses and square brackets

The grammar

S → SS
S → ()
S → (S)
S → []
S → [S]

with terminal symbols [ ] ( ) and nonterminal S describes two kinds of brackets that must nest properly inside one another. The following sequence can be derived in that grammar:

([ [ [ ()() [ ][ ] ] ]([ ]) ])

However, there is no context-free grammar for generating all sequences of two different types of parentheses, each separately balanced disregarding the other, but where the two types need not nest inside one another, for example:

[(])

3.3 A regular grammar

Every regular grammar is context-free, but not every context-free grammar is regular. The following context-free grammar, however, is also regular:

S → a
S → aS
S → bS

The terminals here are a and b, while the only nonterminal is S. The language described is all nonempty strings of a's and b's that end in a.

This grammar is regular: no rule has more than one nonterminal in its right-hand side, and each of these nonterminals is at the same end of the right-hand side.

Every regular grammar corresponds directly to a nondeterministic finite automaton, so we know that this is a regular language.

Using pipe symbols, the grammar above can be described more tersely as follows:

S → a | aS | bS
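The correspondence between regular grammars and nondeterministic finite automata can be sketched directly: each nonterminal becomes a state, and each rule becomes a transition. The code below assumes right-regular rules whose right-hand side is a single terminal optionally followed by a single nonterminal, each written as one character; the extra accepting state `F` and the functions `grammar_to_nfa` and `accepts` are hypothetical names chosen for illustration.

```python
# Right-regular rules for S -> a | aS | bS: each right-hand side is one terminal,
# optionally followed by one nonterminal (single characters, by assumption).
RULES = {"S": ["a", "aS", "bS"]}
START = "S"
FINAL = "F"  # hypothetical extra accepting state, not a grammar symbol


def grammar_to_nfa(rules: dict[str, list[str]]) -> dict[tuple[str, str], set[str]]:
    """N -> xM becomes the transition N --x--> M; N -> x becomes N --x--> FINAL."""
    delta: dict[tuple[str, str], set[str]] = {}
    for lhs, rhss in rules.items():
        for rhs in rhss:
            terminal, target = rhs[0], rhs[1:] or FINAL
            delta.setdefault((lhs, terminal), set()).add(target)
    return delta


def accepts(delta: dict[tuple[str, str], set[str]], word: str) -> bool:
    """Simulate the NFA by tracking the set of all states reachable so far."""
    current = {START}
    for ch in word:
        current = {t for q in current for t in delta.get((q, ch), set())}
    return FINAL in current


nfa = grammar_to_nfa(RULES)
for w in ["a", "ba", "abba", "ab", "b", ""]:
    print(repr(w), accepts(nfa, w))
```

For this grammar the strings a, ba and abba are accepted, while the empty string, ab and b are rejected, matching the description of the language as the nonempty strings of a's and b's that end in a.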
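3.4 Matching pairs

3.5 Algebraic expressions

The following grammar, with start symbol S, describes syntactically correct infix algebraic expressions in the variables x, y and z:

1. S → x
2. S → y
3. S → z
4. S → S + S
5. S → S - S
6. S → S * S
7. S → S / S
8. S → ( S )

This grammar can, for example, generate the string

(x + y) * x - z * y / (x + x)

as follows:

S (the start symbol)
⇒ S - S (by rule 5)
⇒ S * S - S (by rule 6, applied to the leftmost S)
⇒ S * S - S / S (by rule 7, applied to the last S)
⇒ (S) * S - S / S (by rule 8, applied to the leftmost S)
⇒ (S) * S - S / (S) (by rule 8, applied to the last S)
⇒ (S + S) * S - S / (S) (by rule 4)
⇒ (S + S) * S - S * S / (S) (by rule 6)
⇒ (S + S) * S - S * S / (S + S) (by rule 4)
⇒ (x + S) * S - S * S / (S + S) (by rule 1)
⇒ (x + y) * S - S * S / (S + S) (by rule 2)
⇒ (x + y) * x - S * S / (S + S) (by rule 1)
⇒ (x + y) * x - S * y / (S + S) (by rule 2)
⇒ (x + y) * x - S * y / (x + S) (by rule 1)
⇒ (x + y) * x - z * y / (x + S) (by rule 3)
⇒ (x + y) * x - z * y / (x + x) (by rule 1)

Can a parse tree other than the one defined by this derivation still produce the same terminal string, (x + y) * x - z * y / (x + x) in this case? For this particular grammar, yes. Grammars with this property are called ambiguous. For example, x + y * z can be produced with two different parse trees, one grouping it as (x + y) * z and the other as x + (y * z).

An alternative, unambiguous grammar for the same language is, for example:

T → x
T → y
T → z
S → S * T
S → S / T
S → S + T
S → S - T
T → ( S )
S → T

(once again picking S as the start symbol). This alternative grammar produces x + y * z with a parse tree that implicitly assumes the association (x + y) * z, which is not according to standard operator precedence. More elaborate unambiguous context-free grammars can be constructed that produce parse trees obeying all desired operator precedence and associativity rules.

3.6 Further examples

3.6.1 Example 1

S → U | V
U → TaU | TaT | UaT
V → TbV | TbT | VbT
T → aTbT | bTaT | ε

Here the nonterminal T generates all strings with equal numbers of a's and b's, U all strings with more a's than b's, and V all strings with more b's than a's, so S generates exactly the strings over {a, b} with unequal numbers of a's and b's.

3.6.2 Example 2

3.7 Derivations and syntax trees

A derivation of a string for a grammar is a sequence of rule applications that transforms the start symbol into the string. It is fully determined by giving, for each step, the rule applied and the occurrence of its left-hand side to which it is applied. For clarity, the intermediate string is usually given as well.

For instance, with the grammar

(1) S → S + S
(2) S → 1
(3) S → a

the string 1 + 1 + a can be derived with the derivation:

S
⇒ S + S (rule 1 on the first S)
⇒ S + S + S (rule 1 on the second S)
⇒ S + 1 + S (rule 2 on the second S)
⇒ S + 1 + a (rule 3 on the third S)
⇒ 1 + 1 + a (rule 2 on the first S)

Often a strategy is followed that deterministically chooses the next nonterminal to rewrite: in a leftmost derivation it is always the leftmost nonterminal, and in a rightmost derivation it is always the rightmost one. Given such a strategy, a derivation is completely determined by the sequence of rules applied. For instance, the leftmost derivation

S
⇒ S + S (rule 1)
⇒ 1 + S (rule 2)
⇒ 1 + S + S (rule 1)
⇒ 1 + 1 + S (rule 2)
⇒ 1 + 1 + a (rule 3)

can be summarized as rule 1, rule 2, rule 1, rule 2, rule 3.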
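Because a derivation strategy fixes which nonterminal is rewritten next, a list of rule numbers is enough to reconstruct the whole derivation. The following sketch replays such a list for the grammar (1) S → S + S, (2) S → 1, (3) S → a; the function `replay` and its `rightmost` flag are illustrative names, not part of any library.

```python
# Rules (1) S -> S + S, (2) S -> 1, (3) S -> a; sentential forms are lists of symbols.
RULES = {1: ("S", ["S", "+", "S"]), 2: ("S", ["1"]), 3: ("S", ["a"])}
NONTERMINALS = {"S"}


def replay(rule_numbers: list[int], rightmost: bool = False) -> list[str]:
    """Apply the rules in order, each to the leftmost (or rightmost) nonterminal,
    and return the sequence of sentential forms, starting with S."""
    form = ["S"]
    steps = [" ".join(form)]
    for number in rule_numbers:
        lhs, rhs = RULES[number]
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        if not positions:
            raise ValueError("no nonterminal left to rewrite")
        idx = positions[-1] if rightmost else positions[0]
        if form[idx] != lhs:
            raise ValueError(f"rule {number} does not apply to {form[idx]}")
        form = form[:idx] + rhs + form[idx + 1:]
        steps.append(" ".join(form))
    return steps


for line in replay([1, 2, 1, 2, 3]):
    print(line)
```

Here replay([1, 2, 1, 2, 3]) reproduces the leftmost derivation of 1 + 1 + a step by step, and replay([1, 3, 1, 2, 2], rightmost=True) reproduces a rightmost derivation of the same string.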
In most parsers, the transformation of the input is defined by giving a piece of code for every grammar rule that is executed whenever the rule is applied. It is therefore important to know whether a parser determines a leftmost or a rightmost derivation, because this determines the order in which the pieces of code will be executed. See LL parsers and LR parsers for examples.

A derivation also imposes, in some sense, a hierarchical structure on the string that is derived. For example, if the string 1 + 1 + a is derived according to the leftmost derivation:

S ⇒ S + S (1)
⇒ 1 + S (2)
⇒ 1 + S + S (1)
⇒ 1 + 1 + S (2)
⇒ 1 + 1 + a (3)

then the second S, not the first, is expanded into S + S, so the string is structured as 1 + (1 + a). If instead the rightmost derivation is used:

S ⇒ S + S (1)
⇒ S + a (3)
⇒ S + S + a (1)
⇒ S + 1 + a (2)
⇒ 1 + 1 + a (2)

the string is structured as (1 + 1) + a, and this defines the following parse tree:

           S
          /|\
         / | \
        /  |  \
       S  '+'  S
      /|\      |
     / | \     |
    S '+' S   'a'
    |     |
   '1'   '1'

If, for certain strings in the language of the grammar, there is more than one parse tree, then the grammar is said to be an ambiguous grammar. The grammar with rules (1) to (3) is an example: the leftmost and rightmost derivations of 1 + 1 + a above define two different parse trees. Such grammars are usually hard to parse because the parser cannot always decide which grammar rule it has to apply. Usually, ambiguity is a feature of the grammar, not the language, and an unambiguous grammar can be found that generates the same context-free language. However, there are certain languages that can only be generated by ambiguous grammars; such languages are called inherently ambiguous languages.

4 Normal forms

Every context-free grammar that does not generate the empty string can be transformed into one in which there is no ε-production (that is, a rule that has the empty string as a product). If a grammar does generate the empty string, it will be necessary to include the rule S → ε, but there need be no other ε-rule. Every context-free grammar with no ε-production has an equivalent grammar in Chomsky normal form or Greibach normal form. "Equivalent" here means that the two grammars generate the same language.

The especially simple form of production rules in Chomsky normal form grammars (every rule has the form A → BC or A → a) has both theoretical and practical implications. For instance, given a context-free grammar, one can use the Chomsky normal form to construct a polynomial-time algorithm that decides whether a given string is in the language represented by that grammar or not (the CYK algorithm).

5 Closure properties

7 Undecidable problems

Some questions that are undecidable for wider classes of grammars become decidable for context-free grammars; e.g. the emptiness problem (whether the grammar generates any terminal strings at all) is undecidable for context-sensitive grammars, but decidable for context-free grammars.

However, many problems are undecidable even for context-free grammars. Examples are:

7.1 Universality

Given a CFG, does it generate the language of all strings over the alphabet of terminal symbols used in its rules?[18][19]

A reduction can be demonstrated to this problem from the well-known undecidable problem of determining whether a Turing machine accepts a particular input (the halting problem). The reduction uses the concept of a computation history, a string describing an entire computation of a Turing machine. A CFG can be constructed that generates all strings that are not accepting computation histories for a particular Turing machine on a particular input, and thus it generates all strings if and only if the machine does not accept that input.

7.2 Language equality

Given two CFGs, do they generate the same language?

The undecidability of this problem is a direct consequence of the previous one: it is impossible even to decide whether a CFG is equivalent to the trivial CFG defining the language of all strings.

7.3 Language inclusion

Given two CFGs, can the first one generate all strings that the second one can generate?[19][20]

If this problem were decidable, then language equality could be decided too: two CFGs G1 and G2 generate the same language if L(G1) is a subset of L(G2) and L(G2) is a subset of L(G1).

7.4

7.5 Grammar ambiguity

Given a CFG, is it ambiguous?

7.6 Language disjointness

Given two CFGs, is there any string derivable from both grammars?

If this problem were decidable, the undecidable Post correspondence problem could be decided too: given strings α1, …, αN, β1, …, βN over some alphabet {a1, …, ak}, let grammar G1 consist of the rule

$$S \to \alpha_1 S \beta_1^{\mathrm{rev}} \mid \dots \mid \alpha_N S \beta_N^{\mathrm{rev}} \mid \alpha_1 b \beta_1^{\mathrm{rev}} \mid \dots \mid \alpha_N b \beta_N^{\mathrm{rev}}$$

where $\beta_i^{\mathrm{rev}}$ denotes the reversed string $\beta_i$ and b does not occur among the $a_i$; and let grammar G2 consist of the rule

$$T \to a_1 T a_1 \mid \dots \mid a_k T a_k \mid b$$

G2 generates exactly the strings $w b w^{\mathrm{rev}}$ with $w \in \{a_1, \dots, a_k\}^*$, while G1 generates the strings $\alpha_{i_1} \cdots \alpha_{i_m} b \beta_{i_m}^{\mathrm{rev}} \cdots \beta_{i_1}^{\mathrm{rev}}$ with $m \ge 1$. Such a string has the form $w b w^{\mathrm{rev}}$ exactly when $\alpha_{i_1} \cdots \alpha_{i_m} = \beta_{i_1} \cdots \beta_{i_m}$, so L(G1) and L(G2) share a string if and only if the Post problem given by α1, …, αN, β1, …, βN has a solution.

8 Extensions

An obvious way to extend the context-free grammar formalism is to allow nonterminals to have arguments, the values of which are passed along within the rules. This allows natural language features such as agreement and reference, and programming language analogs such as the correct use and definition of identifiers, to be expressed in a natural way. For example, we can now easily express that in English sentences the subject and verb must agree in number. In computer science, examples of this approach include affix grammars, attribute grammars, indexed grammars, and Van Wijngaarden two-level grammars. Similar extensions exist in linguistics.
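A minimal sketch of nonterminals with arguments, using subject-verb agreement: each nonterminal below is a Python generator parameterized by a number value n, and the rule S → NP(n) VP(n) passes the same n to both children. This only illustrates argument passing; it is not an implementation of affix, attribute, indexed, or two-level grammars, and the lexicon and function names are hypothetical.

```python
from itertools import product

# Hypothetical toy lexicon: every word is listed under the number it carries.
NUMBERS = ("sg", "pl")
NOUNS = {"sg": ["dog", "cat"], "pl": ["dogs", "cats"]}
VERBS = {"sg": ["runs", "sleeps"], "pl": ["run", "sleep"]}


def noun_phrase(n: str):
    # NP(n) -> "the" N(n)
    for noun in NOUNS[n]:
        yield f"the {noun}"


def verb_phrase(n: str):
    # VP(n) -> V(n)
    yield from VERBS[n]


def sentences():
    # S -> NP(n) VP(n): the same argument n is passed to both children,
    # so a singular subject is only ever combined with a singular verb.
    for n in NUMBERS:
        for subject, verb in product(noun_phrase(n), verb_phrase(n)):
            yield f"{subject} {verb}"


for s in sentences():
    print(s)
```

Because n ranges over only two values here, the same effect could be achieved in a plain context-free grammar by duplicating nonterminals (NP_sg, NP_pl, and so on); the argument notation mainly avoids that duplication.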
9 Subclasses

There are a number of important subclasses of the context-free grammars:

- LR(k) grammars (also known as deterministic context-free grammars) allow parsing (string recognition) with deterministic pushdown automata (PDA), but they can only describe deterministic context-free languages.
- Simple LR and Look-Ahead LR grammars are subclasses that allow further simplification of parsing. SLR and LALR are recognized using the same PDA as LR, but with simpler tables, in most cases.
- LL(k) and LL(*) grammars allow parsing by direct construction of a leftmost derivation as described above, and describe even fewer languages.
- Simple grammars are a subclass of the LL(1) grammars mostly interesting for their theoretical property that language equality of simple grammars is decidable, while language inclusion is not.
- Bracketed grammars have the property that the terminal symbols are divided into left and right bracket pairs that always match up in rules.
- Linear grammars have no rules with more than one nonterminal in the right-hand side.
- Regular grammars are a subclass of the linear grammars and describe the regular languages, i.e. they correspond to finite automata and regular expressions.

LR parsing extends LL parsing to support a larger range of grammars; in turn, generalized LR (GLR) parsing extends LR parsing to support arbitrary context-free grammars. On LL grammars and LR grammars, GLR parsing essentially performs LL parsing and LR parsing, respectively, while on nondeterministic grammars it is as efficient as can be expected. Although GLR parsing was developed in the 1980s, many new language definitions and parser generators continue to be based on LL, LALR or LR parsing up to the present day.

10 Linguistic applications

11 See also

- Parsing expression grammar
- Stochastic context-free grammar
- Algorithms for context-free grammar generation
- Pumping lemma for context-free languages

11.1 Parsing algorithms

- CYK algorithm
- Earley algorithm
- GLR parser
- LL parser

12 Notes

[1] Stephen Scheinberg, "Note on the Boolean Properties of Context Free Languages", Information and Control, 3, 372–375 (1960).
[2] John E. Hopcroft, Rajeev Motwani, Jeffrey D. Ullman, Introduction to Automata Theory, Languages, and Computation, Addison Wesley, 2001, p. 191.
[3] Hopcroft & Ullman (1979), p. 106.
13 References

- Hopcroft, John E.; Ullman, Jeffrey D. (1979), Introduction to Automata Theory, Languages, and Computation, Addison-Wesley. Chapter 4: Context-Free Grammars, pp. 77–106; Chapter 6: Properties of Context-Free Languages, pp. 125–137.
- Sipser, Michael (1997), Introduction to the Theory of Computation, PWS Publishing, ISBN 0-534-94728-X. Chapter 2: Context-Free Grammars, pp. 91–122; Section 4.1.2: Decidable problems concerning context-free languages, pp. 156–159; Section 5.1.1: Reductions via computation histories, pp. 176–183.
- Berstel, J.; Boasson, L. (1990), "Context-Free Languages", in van Leeuwen, Jan (ed.), Handbook of Theoretical Computer Science, Vol. B, Elsevier, pp. 59–102.