
This first volume of a comprehensive,

self-contained, and up-to-date treat-


ment of compiler theory emphasizes
parsing and its theoretical framework.
Chapters are devoted to mathematical
preliminaries, an overview of compil-
ing, elements of language theory,
theory of translation, general parsing
methods, one-pass no-backtrack pars-
ing, and limited backtrack parsing
algorithms.

Also included are valuable appendices


on the syntax for an extensible base
language, for SNOBOL4 statements,
for PL360 and for a PAL syntax
directed translation scheme.

Among the features:


• Emphasizes principles that have
broad applicability rather than de-
tails specific to a given language or
machine.
• Provides complete coverage of all
important parsing algorithms, in-
cluding the LL, LR, Precedence
and Earley's Methods.
• Presents many examples to illus-
trate concepts and applications.
Exercises at the end of each section
(continued on back flap)
THE THEORY OF
PARSING, TRANSLATION,
AND COMPILING
Prentice-Hall
Series in Automatic Computation
George Forsythe, editor

AHO AND ULLMAN, The Theory of Parsing, Translation, and Compiling,


Volume I: Parsing; Volume II: Compiling
ANDREE, Computer Programming: Techniques, Analysis, and Mathematics
ANSELONE, Collectively Compact Operator Approximation Theory
and Applications to Integral Equations
ARBIB, Theories of Abstract Automata
BATES AND DOUGLAS, Programming Language/One, 2nd ed.
BLUMENTHAL, Management Information Systems
BOBROW AND SCHWARTZ, Computers and the Policy-Making Community
BOWLES, editor, Computers in Humanistic Research
BRENT, Algorithms for Minimization without Derivatives
CESCHINO AND KUNTZMAN, Numerical Solution of Initial Value Problems
CRESS, et al., FORTRAN IV with WATFOR and WATFIV
DANIEL, The Approximate Minimization of Functionals
DESMONDE, A Conversational Graphic Data Processing System
DESMONDE, Computers and Their Uses, 2nd ed.
DESMONDE, Real-Time Data Processing Systems
DRUMMOND, Evaluation and Measurement Techniques for Digital Computer Systems
EVANS, et al., Simulation Using Digital Computers
FIKE, Computer Evaluation of Mathematical Functions
FIKE, PL/1 for Scientific Programmers
FORSYTHE AND MOLER, Computer Solution of Linear Algebraic Systems
GAUTHIER AND PONTO, Designing Systems Programs
GEAR, Numerical Initial Value Problems in Ordinary Differential Equations
GOLDEN, FORTRAN IV Programming and Computing
GOLDEN AND LEICHUS, IBM/360 Programming and Computing
GORDON, System Simulation
GREENSPAN, Lectures on the Numerical Solution of Linear, Singular, and
Nonlinear Differential Equations
GRUENBERGER, editor, Computers and Communications
GRUENBERGER, editor, Critical Factors in Data Management
GRUENBERGER, editor, Expanding Use of Computers in the 70's
GRUENBERGER, editor, Fourth Generation Computers
HARTMANIS AND STEARNS, Algebraic Structure Theory of Sequential Machines
HULL, Introduction to Computing
JACOBY, et al., Iterative Methods for Nonlinear Optimization Problems
JOHNSON, System Structure in Data, Programs and Computers
KANTER, The Computer and the Executive
KIVIAT, et al., The SIMSCRIPT II Programming Language
LORIN, Parallelism in Hardware and Software: Real and Apparent Concurrency
LOUDEN AND LEDIN, Programming the IBM 1130, 2nd ed.
MARTIN, Design of Real-Time Computer Systems
MARTIN, Future Developments in Telecommunications
MARTIN, Man-Computer Dialogue
MARTIN, Programming Real-Time Computing Systems
MARTIN, Systems Analysis for Data Transmission
MARTIN, Telecommunications and the Computer
MARTIN, Teleprocessing Network Organization
MARTIN AND NORMAN, The Computerized Society
MATHISON AND WALKER, Computers and Telecommunications: Issues in Public Policy
MCKEEMAN, et al., A Compiler Generator
MEYERS, Time-Sharing Computation in the Social Sciences
MINSKY, Computation: Finite and Infinite Machines
MOORE, Interval Analysis
PLANE AND MCMILLAN, Discrete Optimization: Integer Programming and
Network Analysis for Management Decisions
PRITSKER AND KIVIAT, Simulation with GASP II:
a FORTRAN-Based Simulation Language
PYLYSHYN, editor, Perspectives on the Computer Revolution
RICH, Internal Sorting Methods: Illustrated with PL/1 Program
RUSTIN, editor, Computer Networks
RUSTIN, editor, Debugging Techniques in Large Systems
RUSTIN, editor, Formal Semantics of Programming Languages
SACKMAN AND CITRENBAUM, editors, On-line Planning:
Towards Creative Problem-Solving
SALTON, editor, The SMART Retrieval System: Experiments
in Automatic Document Processing
SAMMET, Programming Languages: History and Fundamentals
SCHULTZ, Digital Processing: A System Orientation
SCHULTZ, Finite Element Analysis
SCHWARZ, et al., Numerical Analysis of Symmetric Matrices
SHERMAN, Techniques in Computer Programming
SIMON AND SIKLOSSY, Representation and Meaning: Experiments
with Information Processing Systems
SNYDER, Chebyshev Methods in Numerical Approximation
STERLING AND POLLACK, Introduction to Statistical Data Processing
STOUTMEYER, PL/1 Programming for Engineering and Science
STROUD, Approximate Calculation of Multiple Integrals
STROUD AND SECREST, Gaussian Quadrature Formulas
TAVISS, editor, The Computer Impact
TRAUB, Iterative Methods for the Solution of Equations
UHR, Pattern Recognition, Learning, and Thought
VAN TASSEL, Computer Security Management
VARGA, Matrix Iterative Analysis
VAZSONYI, Problem Solving by Digital Computers with PL/1 Programming
WAITE, Implementing Software for Non-Numeric Application
WILKINSON, Rounding Errors in Algebraic Processes
ZIEGLER, Time-Sharing Data Processing Systems
THE THEORY OF

PARSING, TRANSLATION,
AND COMPILING

VOLUME I: PARSING

ALFRED V. AHO

Bell Telephone Laboratories, Inc.


Murray Hill, N.J.

JEFFREY D. ULLMAN

Department of Electrical Engineering


Princeton University

PRENTICE-HALL, INC.

ENGLEWOOD CLIFFS, N.J.


© 1972 by Bell Telephone Laboratories,
Incorporated, and J. D. Ullman

All rights reserved. No part of this book


may be reproduced in any form or by any means
without permission in writing from the publisher.

1098

ISBN: 0-13-914556-7
Library of Congress Catalog Card No. 72-1073

Printed in the United States of America

PRENTICE-HALL INTERNATIONAL, INC., London


PRENTICE-HALL OF AUSTRALIA, PTY. LTD., Sydney
PRENTICE-HALL OF CANADA, LTD., Toronto
PRENTICE-HALL OF INDIA PRIVATE LIMITED, New Delhi
PRENTICE-HALL OF JAPAN, INC., Tokyo
For Adrienne and Holly
PREFACE

This book is intended for a one or two semester course in compiling


theory at the senior or graduate level. It is a theoretically oriented treatment
of a practical subject. Our motivation for making it so is threefold.
(1) In an area as rapidly changing as Computer Science, sound pedagogy
demands that courses emphasize ideas, rather than implementation details.
It is our hope that the algorithms and concepts presented in this book will
survive the next generation of computers and programming languages, and
that at least some of them will be applicable to fields other than compiler
writing.
(2) Compiler writing has progressed to the point where many portions of
a compiler can be isolated and subjected to design optimization. It is im-
portant that appropriate mathematical tools be available to the person
attempting this optimization.
(3) Some of the most useful and most efficient compiler algorithms,
e.g. LR(k) parsing, require a good deal of mathematical background for full
understanding. We expect, therefore, that a good theoretical background will
become essential for the compiler designer.
While we have not omitted difficult theorems that are relevant to com-
piling, we have tried to make the book as readable as possible. Numer-
ous examples are given, each based on a small grammar, rather than on
the large grammars encountered in practice. It is hoped that these examples
are sufficient to illustrate the basic ideas, even in cases where the theoretical
developments are difficult to follow in isolation.
Use of the Book
The notes from which this book derives were used in courses at Princeton
University and Stevens Institute of Technology at both the senior and
graduate levels. Both one and two semester courses have been taught from this
book. In a one semester course, the course in compilers was preceded by a

course covering finite automata and context-free languages. It was therefore


unnecessary to cover Chapters 0, 2 and 8. Most of the remaining chapters were
covered in detail.
In a two semester sequence, most of Volume I was covered in the first
semester and most of Volume II, except for Chapter 8, in the second. In the
two semester course more attention was devoted to proofs and proof tech-
niques than in the one semester course.
Some sections of the book are clearly more important than others, and
we would like to give the reader some brief comments regarding our estimates
of the relative importance of various parts of Volume I. As a general com-
ment, it is probably wise to skip most of the proofs. We include proofs of all
main results because we believe them to be necessary for maximum under-
standing of the subject. However, we suspect that many courses in compiling
do not get this deeply into many topics, and reasonable understanding can
be obtained with only a smattering of proofs.
Chapters 0 (mathematical background) and 1 (overview of compiling)
are almost all essential material, except possibly for Section 1.3, which covers
applications of parsing other than to compilers.
We believe that every concept and theorem introduced in Chapter 2
(language theory) finds use somewhere in the remaining nine chapters.
However, some of the material can be skipped in a course on compilers.
A good candidate for omission is the rather difficult material on regular
expression equations in Section 2.2.1. One is then forced to omit some of the
material on right linear grammars in Section 2.2.2 (although the equivalence
between these and finite automata can be obtained in other ways) and the
material on Rosenkrantz's method of achieving Greibach normal form in
Section 2.4.5.
The concepts of Chapter 3 (translation) are quite essential to the rest of
the book. However, Section 3.2.3, on the hierarchy of syntax,directed transla-
tions, is rather difficult and can be omitted.
We believe that Section 4.1 on backtracking methods of parsing is less vital
than the tabular methods of Section 4.2.
Most of Chapter 5 (single-pass parsing) is essential. We suggest that LL
grammars (Section 5.1), LR grammars (Section 5.2), precedence grammars
(Sections 5.3.2 and 5.3.4) and operator precedence grammars (Section 5.4.3)
receive maximum priority. Other sections could be omitted if necessary.
Chapter 6 (backtracking algorithms) is less essential than most of
Chapter 5 or Section 4.2. If given a choice, we would cover Section 6.1 rather
than 6.2.
Organization of the Book
The entire work The Theory of Parsing, Translation, and Compiling appears
in two volumes, Parsing (Chs. 0-6) and Compiling (Chs. 7-11). (The topics
covered in the second volume are parser optimization, theory of deterministic

parsing, translation, bookkeeping, and code optimization.) The two volumes


form an integrated work, with pages consecutively numbered, and with a
bibliography and index for both volumes appearing in Volume II.
Problems and bibliographical notes appear at the end of each section
(numbered i.j). Except for open problems and research problems, we have
used stars to indicate grades of difficulty. Singly starred problems require
one significant insight for their solution. Doubly starred exercises require
more than one such insight.
It is recommended that a course based on this book be accompanied by a
programming laboratory in which several compiler parts are designed and
implemented. At the end of certain sections of this book appear programming
exercises, which can be used as projects in such a programming laboratory.
Acknowledgements
Many people have carefully read various parts of this manuscript and
have helped us significantly in its preparation. Especially, we would like
to thank David Benson, John Bruno, Stephen Chen, Matthew Geller, James
Gimpel, Michael Harrison, Ned Horvath, Jean Ichbiah, Brian Kernighan,
Douglas McIlroy, Robert Martin, Robert Morris, Howard Siegel, Leah
Siegel, Harold Stone, and Thomas Szymanski, as well as referees Thomas
Cheatham, Michael Fischer, and William McKeeman. We have also received
important comments from many of the students who used these notes, among
them Alan Demers, Nahed El Djabri, Matthew Hecht, Peter Henderson,
Peter Maika, Thomas Peterson, Ravi Sethi, Kenneth Sills, and Steven Squires.
Our thanks are also due for the excellent typing of the manuscript done
by Hannah Kresse and Dorothy Luciani. In addition, we acknowledge the
support services provided by Bell Telephone Laboratories during the pre-
paration of the manuscript. The use of UNIX, an operating system for the
PDP-11 computer designed by Dennis Ritchie and Kenneth Thompson,
expedited the preparation of certain parts of this manuscript.

ALFRED V. AHO
JEFFREY D. ULLMAN
CONTENTS

PREFACE

0 MATHEMATICAL PRELIMINARIES

0.1 Concepts from Set Theory 1


0.1.1 Sets 1
0.1.2 Operations on Sets 3
0.1.3 Relations 5
0.1.4 Closures of Relations 7
0.1.5 Ordering Relations 9
0.1.6 Mappings 10
Exercises 11

0.2 Sets of Strings 15


0.2.1 Strings 15
0.2.2 Languages 16
0.2.3 Operations on Languages 17
Exercises 18

0.3 Concepts from Logic 19


0.3.1 Proofs 19
0.3.2 Proof by Induction 20
0.3.3 Logical Connectives 21
Exercises 22
Bibliographic Notes 25

0.4 Procedures and Algorithms 25


0.4.1 Procedures 25
0.4.2 Algorithms 27


0.4.3 Recursive Functions 28


0.4.4 Specification of Procedures 29
0.4.5 Problems 29
0.4.6 Post's Correspondence Problem 32
Exercises 33
Bibliographic Notes 36

0.5 Concepts from Graph Theory 37


0.5.1 Directed Graphs 37
0.5.2 Directed Acyclic Graphs 39
0.5.3 Trees 40
0.5.4 Ordered Graphs 41
0.5.5 Inductive Proofs Involving Dags 43
0.5.6 Linear Orders from Partial Orders 43
0.5.7 Representations for Trees 45
0.5.8 Paths Through a Graph 47
Exercises 50
Bibliographic Notes 52

1 AN INTRODUCTION TO COMPILING 53
1.1 Programming Languages 53
1.1.1 Specification of Programming Languages 53
1.1.2 Syntax and Semantics 55
Bibliographic Notes 57

1.2 An Overview of Compiling 58


1.2.1 The Portions of a Compiler 58
1.2.2 Lexical Analysis 59
1.2.3 Bookkeeping 62
1.2.4 Parsing 63
1.2.5 Code Generation 65
1.2.6 Code Optimization 70
1.2.7 Error Analysis and Recovery 72
1.2.8 Summary 73
Exercises 75
Bibliographic Notes 76
1.3 Other Applications of Parsing and Translating Algorithms 77
1.3.1 Natural Languages 78
1.3.2 Structural Description of Patterns 79
Bibliographic Notes 82

2 ELEMENTS OF LANGUAGE THEORY 83

2.1 Representations for Languages 83


2.1.1 Motivation 84

2.1.2 Grammars 84
2.1.3 Restricted Grammars 91
2.1.4 Recognizers 93
Exercises 96
Bibliographic Notes 102

2.2 Regular Sets, Their Generators, and Their Recognizers 103


2.2.1 Regular Sets and Regular Expressions 103
2.2.2 Regular Sets and Right-Linear Grammars 110
2.2.3 Finite Automata 112
2.2.4 Finite Automata and Regular Sets 118
2.2.5 Summary 120
Exercises 121
Bibliographic Notes 124

2.3 Properties of Regular Sets 124


2.3.1 Minimization of Finite Automata 124
2.3.2 The Pumping Lemma for Regular Sets 128
2.3.3 Closure Properties of Regular Sets 129
2.3.4 Decidable Questions About Regular Sets 130
Exercises 132
Bibliographic Notes 138

2.4 Context-free Languages 138


2.4.1 Derivation Trees 139
2.4.2 Transformations on Context-Free Grammars 143
2.4.3 Chomsky Normal Form 151
2.4.4 Greibach Normal Form 153
2.4.5 An Alternative Method of Achieving Greibach Normal Form 159
Exercises 163
Bibliographic Notes 166

2.5 Pushdown Automata 167


2.5.1 The Basic Definition 167
2.5.2 Variants of Pushdown Automata 172
2.5.3 Equivalence of PDA Languages and CFL's 176
2.5.4 Deterministic Pushdown Automata 184
Exercises 190
Bibliographic Notes 192

2.6 Properties of Context-Free Languages 192


2.6.1 Ogden's Lemma 192
2.6.2 Closure Properties of CFL's 196
2.6.3 Decidability Results 199
2.6.4 Properties of Deterministic CFL's 201
2.6.5 Ambiguity 202
Exercises 207
Bibliographic Notes 211

3 THEORY OF TRANSLATION 212

3.1 Formalisms for Translations 213


3.1.1 Translation and Semantics 213
3.1.2 Syntax-Directed Translation Schemata 215
3.1.3 Finite Transducers 223
3.1.4 Pushdown Transducers 227
Exercises 233
Bibliographic Notes 237

3.2 Properties of Syntax-Directed Translations 238


3.2.1 Characterizing Languages 238
3.2.2 Properties of Simple SDT's 243
3.2.3 A Hierarchy of SDT's 243
Exercises 250
Bibliographic Notes 251

3.3 Lexical Analysis 251


3.3.1 An Extended Language for Regular Expressions 252
3.3.2 Indirect Lexical Analysis 254
3.3.3 Direct Lexical Analysis 258
3.3.4 Software Simulation of Finite Transducers 260
Exercises 261
Bibliographic Notes 263

3.4 Parsing 263


3.4.1 Definition of Parsing 263
3.4.2 Top-Down Parsing 264
3.4.3 Bottom-Up Parsing 268
3.4.4 Comparison of Top-Down and Bottom-Up Parsing 271
3.4.5 Grammatical Covering 275
Exercises 277
Bibliographic Notes 280

4 GENERAL PARSING METHODS 281

4.1 Backtrack Parsing 282


4.1.1 Simulation of a PDT 282
4.1.2 Informal Top-Down Parsing 285
4.1.3 The Top-Down Parsing Algorithm 289
4.1.4 Time and Space Complexity of the Top-Down Parser 297
4.1.5 Bottom-Up Parsing 301
Exercises 307
Bibliographic Notes 313

4.2 Tabular Parsing Methods 314


4.2.1 The Cocke-Younger-Kasami Algorithm 314
4.2.2 The Parsing Method of Earley 320
Exercises 330
Bibliographic Notes 332

5 ONE-PASS NO BACKTRACK PARSING 333

5.1 LL(k) Grammars 334


5.1.1 Definition of LL(k) Grammar 334
5.1.2 Predictive Parsing Algorithms 338
5.1.3 Implications of the LL(k) Definition 342
5.1.4 Parsing LL(1) Grammars 345
5.1.5 Parsing LL(k) Grammars 348
5.1.6 Testing for the LL(k) Condition 356
Exercises 361
Bibliographic Notes 368

5.2 Deterministic Bottom-Up Parsing 368


5.2.1 Deterministic Shift-Reduce Parsing 368
5.2.2 LR(k) Grammars 371
5.2.3 Implications of the LR(k) Definition 380
5.2.4 Testing for the LR(k) Condition 391
5.2.5 Deterministic Right Parsers for LR(k) Grammars 392
5.2.6 Implementation of LL(k) and LR(k) Parsers 396
Exercises 396
Bibliographic Notes 399

5.3 Precedence Grammars 399


5.3.1 Formal Shift-Reduce Parsing Algorithms 400
5.3.2 Simple Precedence Grammars 403
5.3.3 Extended Precedence Grammars 410
5.3.4 Weak Precedence Grammars 415
Exercises 424
Bibliographic Notes 426

5.4 Other Classes of Shift-Reduce Parsable Grammars 426


5.4.1 Bounded-Right-Context Grammars 427
5.4.2 Mixed Strategy Precedence Grammars 435
5.4.3 Operator Precedence Grammars 439
5.4.4 Floyd-Evans Production Language 443
5.4.5 Chapter Summary 448
Exercises 450
Bibliographic Notes 455


6 LIMITED BACKTRACK PARSING ALGORITHMS 456

6.1 Limited Backtrack Top-Down Parsing 456


6.1.1 TDPL 457
6.1.2 TDPL and Deterministic Context-Free Languages 466
6.1.3 A Generalization of TDPL 469
6.1.4 Time Complexity of GTDPL Languages 473
6.1.5 Implementation of GTDPL Programs 476
Exercises 482
Bibliographic Notes 485

6.2 Limited Backtrack Bottom-Up Parsing 485


6.2.1 Noncanonical Parsing 485
6.2.2 Two-Stack Parsers 487
6.2.3 Colmerauer Precedence Relations 490
6.2.4 Test for Colmerauer Precedence 493
Exercises 499
Bibliographic Notes 500

APPENDIX 501

A.1 Syntax for an Extensible Base Language 501


A.2 Syntax of SNOBOL4 Statements 505
A.3 Syntax for PL360 507
A.4 A Syntax-Directed Translation Scheme for PAL 512

BIBLIOGRAPHY 519

INDEX TO LEMMAS, THEOREMS,
AND ALGORITHMS 531

INDEX TO VOLUME I 533
0 MATHEMATICAL PRELIMINARIES

To speak clearly and accurately we need a precise and well-defined lan-


guage. This chapter describes the language that we shall use to discuss pars-
ing, translation, and the other topics to be covered in this book. This language
is primarily elementary set theory with some rudimentary concepts from
graph theory and logic included. For readers having background in these
areas, Chapter 0 can be easily skimmed and treated as a reference for nota-
tion and definitions.

0.1. CONCEPTS FROM SET THEORY

This section will briefly review some of the most basic concepts from set
theory: relations, functions, orderings, and the usual operations on sets.

0.1.1. Sets

In what follows, we assume that there are certain objects, referred to as


atoms. The term atom will be a rudimentary concept, which is just another
way of saying that the term atom will be left undefined, and what we choose
to call an atom depends on our domain of discourse. Many times it is con-
venient to consider integers or letters of an alphabet to be atoms.
We also postulate an abstract notion of membership. If a is a member of
A, we write a ∈ A. The negation of this statement is written a ∉ A. We
assume that if a is an atom, then it has no member; i.e., x ∉ a for all x in
the domain of discourse.
We shall also use certain primitive objects, called sets, which are not
atoms. If A is a set, then its members or elements are those objects a (not

necessarily atoms) such that a ∈ A. Each member of a set is either an atom


or another set. We assume each member of a set appears exactly once in
that set. If A has a finite number of members, then A is a finite set, and we
often write A = {a₁, a₂, ..., aₙ}, if a₁, ..., aₙ are all the members of A and
aᵢ ≠ aⱼ for i ≠ j. Note that order is unimportant. We could also write
A = {aₙ, ..., a₁}, for example. We reserve the symbol ∅ for the empty set,
the set which has no members. Note that an atom also has no members,
but ∅ is not an atom, and no atom is ∅.
The statement #A = n means that set A has n members.

Example 0.1
Let the nonnegative integers be atoms. Then A = {1, {2, 3}, 4} is a set.
A's members are 1, {2, 3}, and 4. The member {2, 3} of A is also a set. Its
members are 2 and 3. However, the atoms 2 and 3 are not members of
A itself. We could equivalently have written A = {4, 1, {3, 2}}. Note that
#A = 3. □
A useful way of defining sets is by means of a predicate, a statement
involving one or more unknowns which has one of two values, true or false.
The set defined by a predicate consists of exactly those elements for which
the predicate is true. However, we must be careful what predicate we choose
to define a set, or we may attempt to define a set that could not possibly exist.

Example 0.2
The phenomenon alluded to above is known as Russell's paradox. Let
P(X) be the predicate "X is not a member of itself"; i.e., X ∉ X. Then we
might think that we could define the set Y of all X such that P(X) was true;
i.e., Y consists of exactly those sets that are not members of themselves.
Since most common sets seem not to be members of themselves, it is tempting
to suppose that set Y exists.
But if Y exists, we should be able to answer the question, "Is Y a member
of itself?" But this leads to an impossible situation. If Y ∈ Y, then P(Y) is
false, and Y is not a member of itself, by definition of Y. Hence, it is not
possible that Y ∈ Y. Conversely, suppose that Y ∉ Y. Then, by definition
of Y again, Y ∈ Y. We see that Y ∈ Y implies Y ∉ Y and that Y ∉ Y
implies Y ∈ Y. Since either Y ∈ Y or Y ∉ Y is true, both are true, a situa-
tion which we shall assume is impossible. One "way out" is to accept that
set Y does not exist. □

The normal way to avoid Russell's paradox is to define sets only by those
predicates P(X) of the form "X is in A and P₁(X)," where A is a known set
and P₁ is an arbitrary predicate. If the set A is understood, we shall just write
P₁(X) for "X is in A and P₁(X)."

If P(X) is a predicate, then we denote the set of objects X for which


P(X) is true by {X | P(X)}.

Example 0.3

Let P(X) be the predicate "X is a nonnegative even integer." That is,
P(X) is "X is in the set of integers and Pi(X)," where PI(X) is the predi-
cate " X is even." Then A----[X]P(X)} is the set which is often written
{0, 2, 4, . . . , 2 n , . . . ) . Colloquially, we can assume that the set of nonnega-
tive integers is understood, and write A -- IX I X is even). D
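The definitions above translate directly into executable form. The following small Python sketch (ours, not the text's) builds a set from a predicate of the recommended form "X is in A and P₁(X)", so that the defining predicate is always applied to a known set:

    # Predicate-defined sets, restricted to a known set A as recommended above.
    A = set(range(7))                      # a known finite universe {0, 1, ..., 6}
    evens = {x for x in A if x % 2 == 0}   # {X | X is in A and X is even}
    print(evens)                           # prints {0, 2, 4, 6}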

We have glossed over a great deal of development called axiomatic set


theory. The interested reader is referred to Halmos [1960] or Suppes [1960]
(see the Bibliography at the end of Volume 1) for a more complete treatment
of this subject.
DEFINITION

We say that set A is included in set B, written A ⊆ B, if every element of

A is also an element of B. Sometimes we say that B includes A, written
B ⊇ A, if A ⊆ B. In either case, A is said to be a subset of B, and B a superset
of A.
If B contains an element not contained in A and A ⊆ B, then we say
that A is properly included in B, written A ⊂ B (or B properly includes A,
written B ⊃ A). We can also say that A is a proper subset of B or that B is
a proper superset of A.
Two sets A and B are equal if and only if A ⊆ B and B ⊆ A.
A picture called a Venn diagram is often used to graphically describe set
membership and inclusion. Figure 0.1 shows a Venn diagram for the relation
A ⊆ B.

Fig. 0.1 Venn diagram of set inclusion: A ⊆ B.

0.1.2. Operations on Sets

There are several basic operations on sets which can be used to construct
new sets.

DEFINITION
Let A and B be sets. The union of A and B, written A ∪ B, is the set
containing all elements in A together with all elements in B. Formally,
A ∪ B = {x | x ∈ A or x ∈ B}.†
The intersection of A and B, written A ∩ B, is the set of all elements
that are in both A and B. Formally, A ∩ B = {x | x ∈ A and x ∈ B}.
The difference of A and B, written A - B, is the set of all elements in A
that are not in B. If A = U, the set of all elements under consideration (or
the universal set, as it is sometimes called), then U - B is often written B̄
and called the complement of B.
Note that we have referred to the universal set as the set of all objects
"under consideration." We must be careful to be sure that U exists. For
example, if we choose U to be "the set of all sets," then we would have
Russell's paradox again. Also, note that B̄ is not well defined unless we
assume that complementation with respect to some known universe is
implied.
In general, A - B = A ∩ B̄. Venn diagrams for these set operations are
shown in Fig. 0.2.

Fig. 0.2 Venn diagrams of set operations: (a) A ∪ B; (b) A ∩ B; (c) A - B.

If A ∩ B = ∅, then A and B are said to be disjoint.
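As an aside not in the original text, the operations just defined can be tried out with Python's built-in set type; in this sketch U stands for the universal set with respect to which complements are taken:

    U = set(range(10))
    A = {1, 2, 3, 4}
    B = {3, 4, 5}

    print(A | B)                  # union A ∪ B -> {1, 2, 3, 4, 5}
    print(A & B)                  # intersection A ∩ B -> {3, 4}
    print(A - B)                  # difference A - B -> {1, 2}
    print(U - B)                  # complement of B with respect to U
    print(A - B == A & (U - B))   # A - B = A ∩ B̄ -> True
    print(A.isdisjoint({7, 8}))   # A and {7, 8} are disjoint -> True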


DEFINITION
If I is some (indexing) set such that Aᵢ is a known set for each i in I, then
we write ∪_{i∈I} Aᵢ for {X | there exists i ∈ I such that X ∈ Aᵢ}. Since I may not
be finite, this definition is an extension of the union of two sets. If I is defined
by predicate P(i), we sometimes write ∪_{P(i)} Aᵢ for ∪_{i∈I} Aᵢ. For example, "∪_{i>2} Aᵢ"
means A₃ ∪ A₄ ∪ A₅ ∪ ...
†Note that we may not have a set guaranteed to include A ∪ B, so this use of predi-
cate definition appears questionable. In axiomatic set theory, the existence of A ∪ B is
taken to be an axiom.

DEFINITION
Let A be a set. The power set of A, written 𝒫(A) or sometimes 2^A, is the
set of all subsets of A. That is, 𝒫(A) = {B | B ⊆ A}.†

Example 0.4
Let A = {1, 2}. Then 𝒫(A) = {∅, {1}, {2}, {1, 2}}. As another example,
𝒫(∅) = {∅}. □
In general, if A is a finite set of m members, 𝒫(A) has 2ᵐ members. The
empty set is a member of 𝒫(A) for every A.
We have observed that the members of a set are considered to be un-
ordered. It is often convenient to have ordered pairs of objects available for
discourse. We thus make the following definition.
DEFINITION
Let a and b be objects. Then (a, b) denotes the ordered pair consisting of
a and b in that order. We say that (a, b) = (c, d) if and only if a = c and
b = d. In contrast, {a, b} = {b, a}.
Ordered pairs can be considered sets if we define (a, b) to be the set
{a, {a, b}}. It is left to the Exercises to show that {a, {a, b}} = {c, {c, d}} if and
only if a = c and b = d. Thus this definition is consistent with what we
regard to be the fundamental property of ordered pairs.
DEFINITION
The Cartesian product of sets A and B, denoted A x B, is
{(a, b) | a ∈ A and b ∈ B}.

Example 0.5
Let A = {1, 2} and B = {2, 3, 4}. Then
A × B = {(1, 2), (1, 3), (1, 4), (2, 2), (2, 3), (2, 4)}. □
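A short Python sketch (the helper name power_set is ours, not the text's) computes the power set of Example 0.4 and the Cartesian product of Example 0.5 with the standard library:

    from itertools import chain, combinations, product

    def power_set(s):
        """Return all subsets of s, each represented as a frozenset."""
        items = list(s)
        subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
        return {frozenset(c) for c in subsets}

    A = {1, 2}
    B = {2, 3, 4}
    print(power_set(A))                        # the four subsets {}, {1}, {2}, {1, 2}
    print(set(product(A, B)))                  # the six ordered pairs of Example 0.5
    print(len(power_set(B)) == 2 ** len(B))    # |P(B)| = 2^m -> True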
0.1.3. Relations

Many common mathematical concepts, such as membership, set inclu-

sion, and arithmetic "less than" (<), are referred to as relations. We shall
give a formal definition of the concept and see how common examples of
relations fit the formal definition.

†The existence of the power set of any set is an axiom of set theory. The other set-
defining axioms, in addition to the power set axiom and the union axiom previously
mentioned, are:
(1) If A is a set and P a predicate, then {X | P(X) and X ∈ A} is a set.
(2) If X is an atom or set, then {X} is a set.
(3) If A is a set, then {X | for some Y, we have X ∈ Y and Y ∈ A} is a set.

DEFINITION
Let A and B be sets. A relation from A to B is any subset of A × B. If
A = B, we say that the relation is on A. If R is a relation from A to B, we
write a R b whenever (a, b) is in R. We call A the domain of R, and B the
range of R.

Example 0.6
Let A be the set of integers. The relation < is {(a, b) | a is less than b}.
We thus write a < b exactly when we would expect to do so. □

DEFINITION
The relation {(b, a) | (a, b) ∈ R} is called the inverse of R and is often
denoted R⁻¹.
A relation is a very general concept. Often a relation may possess certain
properties to which special names have been given.
DEFINITION
Let A be a set and R a relation on A. We say that R is
(1) Reflexive if a R a for all a in A,
(2) Symmetric if "a R b" implies "b R a" for a, b in A, and
(3) Transitive if "a R b and b R c" implies "a R c" for a, b, c in A. The
elements a, b, and c need not be distinct.
Relations obeying these three properties occur frequently and have addi-
tional properties as a consequence. The term equivalence relation is used to
describe a relation which is reflexive, symmetric, and transitive.
An important property of equivalence relations is that an equivalence
relation R on a set A partitions A into disjoint subsets called equivalence
classes. For each element a in A we define [a], the equivalence class of a, to
be the set {b | a R b}.

Example 0.7
Consider the relation of congruence modulo N on the nonnegative inte-
gers. We say that a ≡ b mod N (read "a is congruent to b modulo N") if
there is an integer k such that a - b = kN. As a specific case let us take
N = 3. Then the set {0, 3, 6, ..., 3n, ...} forms an equivalence class, since
3n ≡ 3m mod 3 for all integer values of m and n. We shall use [0] to denote
this class. We could have used [3] or [6] or [3n], since any element of an
equivalence class can be used as a representative of that class.
The two other equivalence classes under the relation congruence modulo
3 are
[1] = {1, 4, 7, ..., 3n + 1, ...}
[2] = {2, 5, 8, ..., 3n + 2, ...}

The union of the three sets [0], [1], and [2] is the set of all nonnegative integers.
Thus we have partitioned the set of all nonnegative integers into the three
disjoint equivalence classes [0], [1], and [2] by means of the equivalence rela-
tion congruence modulo 3 (Fig. 0.3). □

Fig. 0.3 Equivalence classes for congruence modulo 3.

The index of an equivalence relation on a set A is the number of equiva-


lence classes into which A is partitioned.
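To make the partitioning concrete, here is a small Python sketch (our own helper, not from the book) that groups a finite slice of the nonnegative integers into equivalence classes under congruence modulo 3, as in Example 0.7, and reports the index of the relation on that slice:

    def equivalence_classes(elements, related):
        """Group elements into classes; assumes `related` is an equivalence relation."""
        classes = []
        for a in elements:
            for cls in classes:
                if related(a, next(iter(cls))):   # compare with any representative
                    cls.add(a)
                    break
            else:
                classes.append({a})
        return classes

    def congruent_mod_3(a, b):
        return (a - b) % 3 == 0

    classes = equivalence_classes(range(12), congruent_mod_3)
    print(classes)        # [{0, 3, 6, 9}, {1, 4, 7, 10}, {2, 5, 8, 11}]
    print(len(classes))   # the index of the relation on this slice: 3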
The following theorem about equivalence relations is left as an exercise.
THEOREM 0.1

Let R be an equivalence relation on A. Then for all a and b in A, either


[a] = [b] or [a] and [b] are disjoint.
Proof. Exercise. □
0.1.4. Closures of Relations

Given a relation R, we often need to find another relation R', which


includes R and has certain additional properties, e.g., transitivity. Moreover,
we would generally like R' to be as "small" as possible, that is, to be a subset
of every other relation including R which has the desired properties. Of
course, the "smallest" relation may not be unique if the additional properties
are strange. However, for the common properties mentioned in the previous
section, we can often find a unique superset of a given relation with these
as additional properties. Some specific cases follow.
DEFINITION

The k-fold product of a relation R (on A), denoted Rᵏ, is defined as follows:
(1) a R¹ b if and only if a R b;
(2) a Rⁱ b if and only if there exists c in A such that a R c and c Rⁱ⁻¹ b
for i > 1.
This is an example of a recursive definition, a method of definition we
shall use many times. To examine the recursive aspect of this definition,

suppose that a R⁴ b. Then by (2) there is a c₁ such that a R c₁ and c₁ R³ b.
Applying (2) again, there is a c₂ such that c₁ R c₂ and c₂ R² b. One more
application of (2) says that there is a c₃ such that c₂ R c₃ and c₃ R¹ b. Now
we can apply (1) and say that c₃ R b.
Thus, if a R⁴ b, then there exists a sequence of elements c₁, c₂, c₃ in A such
that a R c₁, c₁ R c₂, c₂ R c₃, and c₃ R b.
The transitive closure of a relation R on a set A will be denoted R⁺. We
define a R⁺ b if and only if a Rⁱ b for some i ≥ 1. We shall see that R⁺ is
the smallest transitive relation that includes R.
We could have alternatively defined R⁺ by saying that a R⁺ b if there
exists a sequence c₁, c₂, ..., cₙ of zero or more elements in A such that
a R c₁, c₁ R c₂, ..., cₙ₋₁ R cₙ, cₙ R b. If n = 0, a R b is meant.
The reflexive and transitive closure of a relation R on a set A is denoted
R* and is defined as follows:
(1) a R* a for all a in A;
(2) a R* b if a R⁺ b;
(3) Nothing is in R* unless its being there follows from (1) or (2).
If we define R⁰ by saying a R⁰ b if and only if a = b, then a R* b if and only
if a Rⁱ b for some i ≥ 0.
The only difference between R⁺ and R* is that a R* a is true for all a in
A but a R⁺ a may or may not be true. R* is the smallest reflexive and transi-
tive relation that includes R.
In Section 0.5.8 we shall examine methods of computing the reflexive
and transitive closure of a relation efficiently. We would like to prove here
that R⁺ and R* are the smallest supersets of R with the desired properties.
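The definitions can be checked mechanically on small examples. The Python sketch below (a naive method, not the efficient algorithm promised in Section 0.5.8) computes R⁺ by repeatedly adding every pair forced by transitivity until nothing changes, and obtains R* by adding the identity pairs:

    def transitive_closure(R):
        """R is a set of pairs; returns the smallest transitive relation containing R."""
        closure = set(R)
        while True:
            forced = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
            if forced <= closure:          # nothing new is forced by transitivity
                return closure
            closure |= forced

    def reflexive_transitive_closure(R, A):
        return transitive_closure(R) | {(a, a) for a in A}

    A = {0, 1, 2}
    R = {(0, 1), (1, 2)}                        # the relation of Exercise 0.1.11
    print(transitive_closure(R))                # {(0, 1), (1, 2), (0, 2)}
    print(reflexive_transitive_closure(R, A))   # adds (0, 0), (1, 1), (2, 2)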
THEOREM 0.2
If R⁺ and R* are, respectively, the transitive and reflexive-transitive
closures of R as defined above, then
(a) R⁺ is transitive; if R′ is any transitive relation such that R ⊆ R′,
then R⁺ ⊆ R′.
(b) R* is reflexive and transitive; if R′ is any reflexive and transitive
relation such that R ⊆ R′, then R* ⊆ R′.
Proof. We prove only (a); (b) is left as an exercise. First, to show that
R⁺ is transitive, we must show that if a R⁺ b and b R⁺ c, then a R⁺ c.
Since a R⁺ b, there exists a sequence of elements d₁, ..., dₙ such that
d₁ R d₂, ..., dₙ₋₁ R dₙ, where d₁ = a and dₙ = b. Since b R⁺ c, we can find
e₁, ..., eₘ such that e₁ R e₂, ..., eₘ₋₁ R eₘ, where e₁ = b = dₙ and eₘ = c.
Applying the definition of R⁺ m + n times, we conclude that a R⁺ c.
Now we shall show that R⁺ is the smallest transitive relation that includes
R. Let R′ be any transitive relation such that R ⊆ R′. We must show that
R⁺ ⊆ R′. Thus let (a, b) be in R⁺; i.e., a R⁺ b. Then there is a sequence

c₁, ..., cₙ such that a = c₁, b = cₙ, and cᵢ R cᵢ₊₁ for 1 ≤ i < n. Since
R ⊆ R′, we have cᵢ R′ cᵢ₊₁ for 1 ≤ i < n. Since R′ is transitive, repeated appli-
cation of the definition of transitivity yields c₁ R′ cₙ; i.e., a R′ b. Since (a, b)
is an arbitrary member of R⁺, we have shown that every member of R⁺ is
also a member of R′. Thus, R⁺ ⊆ R′, as desired. □

0.1.5. Ordering Relations

An important class of relations on sets are the ordering relations. In gen-


eral, an ordering on a set A is any transitive relation on A. In the study of
algorithms, a special type of ordering, called a partial order, is particularly
important.
DEFINITION

A partial order on a set A is a relation R on A such that


(1) R is transitive, and
(2) For all a in A, a R a is false. (That is, R is irreflexive.)
From properties (1) and (2) of a partial order it follows that if a R b, then
b R a is false. This is called the asymmetric property.
Example 0.8
An example of a partial order is proper inclusion of sets. For example,
let S = {e₁, ..., eₙ} be a set of n elements and let A = 𝒫(S). There are 2ⁿ
elements in A. Then define a R b if and only if a ⊋ b, for all a, b in A. R is
a partial order.
If S = {0, 1, 2}, then Fig. 0.4 graphically depicts this partial order. Set
S₁ properly includes set S₂ if and only if there is a path downward from
S₁ to S₂. □

In the literature the term partial order is sometimes used to denote what
we call a reflexive partial order.

Fig. 0.4 A partial order.



DEFINITION

A reflexive partial order on a set A is a relation R such that


(1) R is transitive,
(2) R is reflexive, and
(3) If a R b and b R a, then a = b. This property is called antisymmetry.
An example of a reflexive partial order would be (not necessarily proper)
inclusion of sets.
In Section 0.5 we shall show that every partial order can be graphically
displayed in terms of a structure called a directed acyclic graph.
An important special case of a partial order is linear order (sometimes
called a total order).
DEFINITION

A linear order R on a set A is a partial order such that if a and b are in


A, then either a R b, b R a, or a = b. If A is a finite set, then one conven-
ient representation for the linear order R is to display the elements of A
as a sequence a₁, a₂, ..., aₙ such that aᵢ R aⱼ if and only if i < j, where
A = {a₁, ..., aₙ}.
We can also define a reflexive linear order analogously. That is, R is
a reflexive linear order on A if R is a reflexive partial order such that for all
a and b in A, either a R b or b R a.
For example, the relation < (less than) on the nonnegative integers is
a linear order. The relation ≤ is a reflexive linear order.

0.1.6. Mappings

One important kind of relation that we shall be using is known as a


mapping.
DEFINITION
A mapping (also function or transformation) M from a set A to a set B is
a relation from A to B such that if (a, b) and (a, c) are in M, then b = c.
If (a, b) is in M, we shall often write M(a) = b. We say that M(a) is
defined if there exists b in B such that (a, b) is in M. If M(a) is defined for
all a in A, we shall say that M is total. If we wish to emphasize that M may
not be defined for all a in A, we shall say that M is a partial mapping (func-
tion) from A to B. In either case, we write M: A → B. We call A and B the
domain and range of M, respectively.
If M " A ~ B is a mapping having the property that for each b in B
there is at most one a in A such that M(a)~- b, then M is an injection (one-
to-one mapping) from A into B. If M is a total mapping such that for each
b in B there is exactly one a in A such that M(a) = b, then M is a bijection
(one-to-one correspondence) between A and B.
If M " A ~ B is an injection, then we can find the inverse mapping

M⁻¹: B → A such that M⁻¹(b) = a if and only if M(a) = b. If there exists
b in B for which there is no a in A such that M(a) = b, then M⁻¹ will be a
partial function.
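A finite mapping can be represented as a Python dictionary and the properties just defined tested directly. The helper names in this sketch are ours; the mapping M is the bijection used in Example 0.9 below:

    def is_injection(M):
        """No two arguments of M map to the same value."""
        return len(set(M.values())) == len(M)

    def is_bijection(M, B):
        """M (total on its domain by construction) hits each element of B exactly once."""
        return is_injection(M) and set(M.values()) == set(B)

    def inverse(M):
        """Inverse of an injection, as a mapping from the range back to the domain."""
        return {b: a for a, b in M.items()}

    M = {0: 'a', 1: 'b', 2: 'c'}
    print(is_injection(M), is_bijection(M, {'a', 'b', 'c'}))   # True True
    print(inverse(M))                                          # {'a': 0, 'b': 1, 'c': 2}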
The notion of a bijection is used to define the cardinality of a set, which,
informally speaking, denotes the number of elements the set contains.
DEFINITION
Two sets A and B are of equal cardinality if there is a bijection M from
A to B.

Example 0.9

{0, 1, 2} and {a, b, c} are of equal cardinality. To prove this, use, for
example, the bijection M = {(0, a), (1, b), (2, c)}. The set of integers is
equal in cardinality to the set of even integers, even though the latter is
a proper subset of the former. A bijection we can use to prove this would be
{(i, 2i) | i is an integer}. □

We can now define precisely what we mean by a finite and infinite set.†
DEFINITION
A set S is finite if it is equal in cardinality to the set {1, 2, ..., n} for some
integer n. A set is infinite if it is equal in cardinality to a proper subset of
itself. A set is countable if it is equal in cardinality to the set of positive
integers. It follows from Example 0.9 that every countable set is infinite.
An infinite set that is not countable is called uncountable.
Examples of countable sets are
(1) The set of all positive and negative integers,
(2) The set of even integers, and
(3) {(a, b) | a and b are integers}.
Examples of uncountable sets are
(1) The set of real numbers,
(2) The set of all mappings from the integers to the integers, and
(3) The set of all subsets of the positive integers.

EXERCISES

0.1.1. Write out the sets defined by the following predicates. Assume that
A = {0, 1, 2, 3, 4, 5, 6}.
(a) {X | X is in A and X is even}.

†We have used these terms previously, of course, assuming that their intuitive meaning
was clear. The formal definitions should be of some interest, however.

(b) {X | X is in A and X is a perfect square}.


(c) {X | X is in A and X ≥ X² + 1}.
0.1.2. Let A = {0, 1, 2} and B = {0, 3, 4}. Write out
(a) A ∪ B.
(b) A ∩ B.
(c) A - B.
(d) 𝒫(A).
(e) A × B.
0.1.3. Show that if A is a set with n elements, then 𝒫(A) has 2ⁿ elements.
0.1.4. Let A and B be sets and let U be some universal set with respect to
which complements are taken. Show that
(a) The complement of A ∪ B is Ā ∩ B̄.
(b) The complement of A ∩ B is Ā ∪ B̄.
These two identities are referred to as De Morgan's laws.
0.1.5. Show that there does not exist a set U such that for all sets A, A ∈ U.
Hint: Consider Russell's paradox.
0.1.6. Give an example of a relation on a set which is
(a) Reflexive but not symmetric or transitive.
(b) Symmetric but not reflexive or transitive.
(c) Transitive but not reflexive or symmetric.
In each case specify the set on which the relation is defined.
0.1.7. Give an example of a relation on a set which is
(a) Reflexive and symmetric but not transitive.
(b) Reflexive and transitive but not symmetric.
(c) Symmetric and transitive but not reflexive.
Warning: D o not be misled into believing that a relation which is sym-
metric and transitive must be reflexive (since a R b and b R a implies
aRa).
0.1.8. Show that the following relations are equivalence relations"
(a) {(a, a) | a ∈ A}.
(b) Congruence on the set of triangles.
0.1.9. Let R be an equivalence relation on a set A. Let a and b be in A.
Show that
(a) [a] = [b] if and only if a R b.
(b) [a] ∩ [b] = ∅ if and only if a R b is false.†
0.1.10. Let A be a finite set. What equivalence relations on A induce the largest
and smallest number of equivalence classes?
0.1.11. Let A = {0, 1, 2} and R = {(0, 1), (1, 2)}. Find R* and R⁺.
0.1.12. Prove Theorem 0.2(b).

tBy "a R b is false," we mean that (a, b) ~ R.



0.1.13. Let R be a relation on A. Show that there is a unique relation Rₑ such
that
(1) R ⊆ Rₑ,
(2) Rₑ is an equivalence relation on A, and
(3) if R′ is any equivalence relation on A such that R ⊆ R′, then Rₑ ⊆ R′.
Rₑ is called the least equivalence relation containing R.
DEFINITION
A well order on a set A is a reflexive partial order R on A such that
for each nonempty subset B ⊆ A there exists b in B such that b R a
for all a in B (i.e., each nonempty subset contains a smallest element).
0.1.14. Show that ≤ (less than or equal to) is a well order on the positive
integers.
DEFINITION
Let A be a set. Define
(1) A¹ = A, and
(2) Aⁱ = Aⁱ⁻¹ × A, for i > 1.
Let A⁺ denote ∪_{i≥1} Aⁱ.

0.1.15. Let R be a well order on A. Define R̂ on A⁺ by:
(a₁, ..., aₘ)† R̂ (b₁, ..., bₙ) if and only if either
(1) for some i ≤ m, aⱼ = bⱼ for 1 ≤ j < i, aᵢ ≠ bᵢ, and aᵢ R bᵢ, or
(2) m ≤ n, and aᵢ = bᵢ for all i, 1 ≤ i ≤ m.
Show that R̂ is a well order on A⁺. We call R̂ a lexicographic order
on A⁺. (The ordering of words in a dictionary is an example of a
lexicographic order.)
0.1.16. State whether each of the following are partial orders, reflexive partial
orders, linear orders, or reflexive linear orders"
(a) ~ on 6'(A).
(b) ~ on 6~(A).
(c) The relation R1 on the set H of h u m a n beings defined by a R1 b
if and only if a is the father of b.
(d) The relation Rz on H given by a R b if and only if a is an
ancestor of b.
(e) The relation R s on H defined by a R3 b if and only if a is older
than b.
0.1.17. Let R₁ and R₂ be relations. The composition of R₁ and R₂, denoted
R₁ ∘ R₂, is {(a, b) | for some c, a R₁ c and c R₂ b}. Show that if R₁ and
R₂ are mappings, then R₁ ∘ R₂ is a mapping. Under what conditions
will R₁ ∘ R₂ be a total mapping? An injection? A bijection?

†Strictly speaking, (a₁, ..., aₘ) means (((... (a₁, a₂), a₃), ...), aₘ), according to the
definition of Aᵐ.

0.1.18. Let A be a finite set and let B ⊆ A. Show that if M: A → B is a bijec-
tion, then A = B.
0.1.19. Let A and B have m and n elements, respectively. Show that there are
nᵐ total functions from A to B. How many (not necessarily total)
functions from A to B are there?
*0.1.20. Let A be an arbitrary (not necessarily finite) set. Show that the sets
𝒫(A) and {M | M is a total function from A to {0, 1}} are of equal car-
dinality.
0.1.21. Show that the set of all integers is equal in cardinality to
(a) The set of primes.
(b) The set of pairs of integers.
Hint: Define a linear order on the set of pairs of integers by
(i₁, j₁) R (i₂, j₂) if and only if i₁ + j₁ < i₂ + j₂, or i₁ + j₁ = i₂ + j₂
and i₁ < i₂.
0.1.22. Set A is "larger than" B if A and B are of different cardinality but B
is equal in cardinality to a subset of A. Show that the set of real numbers
between 0 and 1, exclusive, is larger than the set of integers. Hint:
Represent real numbers by unique decimal expansions. In contradic-
tion, suppose that the two sets in question were of equal cardinality.
Then we could find a sequence of real numbers r₁, r₂, ... which
included all real numbers r, 0 < r < 1. Can you find a real number r
between 0 and 1 which differs in the ith decimal place from rᵢ for all i?
"0.1.23. Let R be a linear order on a finite set A. Show that there exists a unique
element a ~ A such that a R b for all b ~ A --{a}. Such an element
a is called the least element. If A is infinite, does there always exist a
least element ?
"0.1.24. Show that [a, [a, b}] = {c, [c, d}} if and only if a = c and b = d.
0.1.25. Let R be a partial order on a set A. Show that if a R b, then b R a is
false.
"0.1.26. Use the power set and union axioms to help show that if A and B are
sets, then A × B is a set.
**0.1.27. Show that every set is either finite or infinite, but not both.
"0.1.28. Show that every countable set is infinite.
"0.1.29. Show that the following sets have the same cardinality:
(1) The set of real numbers between 0 and 1,
(2) The set of all real numbers,
(3) The set of all mappings from the integers to the integers, and
(4) The set of all subsets of the positive integers.
**0.1.30. Show that 𝒫(A) is always larger than A for any set A.
0.1.31. Show that if R is a partial order on a set A, then the relation R′ given
by R′ = R ∪ {(a, a) | a ∈ A} is a reflexive partial order on A.

0.1.32. Show that if R is a reflexive partial order on a set A, then the relation
R′ = R - {(a, a) | a ∈ A} is a partial order on A.

0.2. SETS OF STRINGS

In this book we shall be dealing primarily with sets whose elements are
strings of symbols. In this section we shall define a number of terms dealing
with strings.

0.2.1. Strings

First of all we need the concept of an alphabet. To us an alphabet will be


any set of symbols. We assume that the term symbol has a sufficiently clear
intuitive meaning that it needs no further explication.
An alphabet need not be finite or even countable, but for all practical
applications alphabets will be finite. Two examples of alphabets are the set
of 26 upper- and 26 lowercase Roman letters (the Roman alphabet) and the
set {0, 1}, which is often called the binary alphabet.
The terms letter and character will be used synonymously with symbol to
denote an element of an alphabet. If we put a sequence of symbols side by
side, we have a string of symbols. For example, 01011 is a string over the
binary alphabet {0, 1}. The terms sentence and word are often used as syno-
nyms for string.
There is one string which arises frequently and which has been given
a special denotation. This is the empty string and it will be denoted by the
symbol e. The empty string is that string which has no symbols.
CONVENTION

We shall ordinarily use capital Greek letters for alphabets. The letters
a, b, c, and d will represent symbols and the letters t, u, v, w, x, y, and z
generally represent strings. We shall represent a string of i a's by aⁱ. For
example, a¹ = a,† a² = aa, a³ = aaa, and so forth. Then, a⁰ is e, the empty
string.
DEFINITION

We formally define strings over an alphabet Σ in the following manner:

(1) e is a string over Σ.
(2) If x is a string over Σ and a is in Σ, then xa is a string over Σ.
(3) y is a string over Σ if and only if its being so follows from (1) and (2).
There are several operations on strings for which we shall have use later
on. If x and y are strings, then the string xy is called the concatenation of x

†We thus identify the symbol a and the string consisting of a alone.

and y. For example, if x = ab and y = cd, then xy = abcd. For all strings
x, xe = ex = x.

The reversal of a string x, denoted x^R, is the string x written in reverse

order; i.e., if x = a₁ ... aₙ, where each aᵢ is a symbol, then x^R = aₙ ... a₁.
Also, e^R = e.
Let x, y, and z be arbitrary strings over some alphabet Σ. We call x a
prefix of the string xy and y a suffix of xy. y is a substring of xyz. Both a
prefix and a suffix of a string x are substrings of x. For example, ba is both
a prefix and a substring of the string bac. Notice that the empty string is
a substring, a prefix, and a suffix of every string.
If x ≠ y and x is a prefix (suffix) of some string y, then x is called a proper
prefix (suffix) of y.
The length of a string is the number of symbols in the string. That is, if
x = a₁ ... aₙ, where each aᵢ is a symbol, then the length of x is n. We shall
denote the length of a string x by |x|. For example, |aab| = 3 and |e| = 0.
All strings which we shall encounter will be of finite length.
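In Python, where strings are a built-in type and e is simply the empty string '', the operations of this section look as follows (an illustrative sketch, not part of the text):

    x, y = 'ab', 'cd'
    print(x + y)                    # concatenation xy -> 'abcd'
    print('abc'[::-1])              # reversal of abc -> 'cba'
    print('bac'.startswith('ba'))   # ba is a prefix of bac -> True
    print('bac'.endswith('ac'))     # ac is a suffix of bac -> True
    print('ba' in 'bac')            # ba is a substring of bac -> True
    print(len('aab'), len(''))      # lengths: |aab| = 3, |e| = 0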

0.2.2. Languages

DEFINITION
A language over an alphabet Σ is a set of strings over Σ. This definition
surely encompasses almost everyone's notion of a language. FORTRAN,
ALGOL, PL/I, and even English are included in this definition.

Example 0.10

Let us consider some simple examples of languages over an alphabet Σ.

The empty set ∅ is a language. The set {e}, which contains only the empty
string, is a language. Notice that ∅ and {e} are two distinct languages. □

DEFINITION
We let Σ* denote the set containing all strings over Σ, including e. For
example, if Σ is the binary alphabet {0, 1}, then
Σ* = {e, 0, 1, 00, 01, 10, 11, 000, 001, ...}.
Every language over Σ is a subset of Σ*. The set of all strings over Σ but
excluding e will be denoted by Σ⁺.

Example 0.11

Let us consider the language L₁ containing all strings of zero or more

a's. We can denote L₁ by {aⁱ | i ≥ 0}. It should be clear that L₁ = {a}*. □

CONVENTION
When no confusion results we shall often denote a set consisting of a
single element by the element itself. Thus according to this convention a*
= {a}*.
DEFINITION
A language L such that no string in L is a proper prefix (suffix) of any
other string in L is said to have the prefix (suffix) property.
For example, a* does not have the prefix property, but {aⁱb | i ≥ 0} does.

0.2.3. Operations on Languages

We shall often be concerned with various operations applied to languages.


In this section, we shall consider some basic and fundamental operations on
languages.
Since a language is just a set, the operations of union, intersection, differ-
ence, and complementation apply to languages. The operation concatenation
can be applied to languages as well as strings.
DEFINITION
Let L₁ be a language over alphabet Σ₁ and L₂ a language over Σ₂. Then
L₁L₂, called the concatenation or product of L₁ and L₂, is the language
{xy | x ∈ L₁ and y ∈ L₂}.
There will be occasions when we wish to concatenate any arbitrary num-
ber of strings from a language. This notion is captured in the closure of
a language.
DEFINITION
The closure of L, denoted L*, is defined as follows:
(1) L⁰ = {e}.
(2) Lⁿ = LLⁿ⁻¹ for n ≥ 1.
(3) L* = ∪_{n≥0} Lⁿ.

The positive closure of L, denoted L⁺, is ∪_{n≥1} Lⁿ. Note that L⁺ = LL* = L*L
and that L* = L⁺ ∪ {e}.
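Since L* is infinite whenever L contains a nonempty string, a program can only enumerate it up to a bound. The following Python sketch (helper names ours) computes the concatenation of two languages and the union L⁰ ∪ L¹ ∪ ... ∪ Lⁿ:

    def concat(L1, L2):
        """Concatenation (product) of two languages, each given as a set of strings."""
        return {x + y for x in L1 for y in L2}

    def closure_up_to(L, n):
        """Union of L^0, L^1, ..., L^n, with the empty string e written as ''."""
        result, power = {''}, {''}
        for _ in range(n):
            power = concat(L, power)     # L^k = L L^(k-1)
            result |= power
        return result

    L = {'a', 'bb'}
    print(concat({'a'}, {'c', 'd'}))     # {'ac', 'ad'}
    print(sorted(closure_up_to(L, 2)))   # ['', 'a', 'aa', 'abb', 'bb', 'bba', 'bbbb']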
We shall also be interested in mappings on languages. A simple type of
mapping which occurs frequently when dealing with languages is homo-
morphism. We can define a homomorphism in the following way.
DEFINITION
Let Σ₁ and Σ₂ be alphabets. A homomorphism is a mapping h: Σ₁ → Σ₂*.
We extend the domain of the homomorphism h to Σ₁* by letting h(e) = e
and h(xa) = h(x)h(a) for all x in Σ₁*, a in Σ₁.

Applying a homomorphism to a language L, we get another language

h(L), which is the set of strings {h(w) | w ∈ L}.

Example 0.12
Suppose that we wish to change every instance of 0 in a string to a and
every 1 to bb. We can define a homomorphism h such that h(0) = a and
h(1) = bb. Then if L is the language {0ⁿ1ⁿ | n ≥ 1}, h(L) = {aⁿb²ⁿ | n ≥ 1}. □

Although homomorphisms on languages are not always one-to-one


mappings, it is often useful to talk about their inverses (as relations).
DEFINITION

If h" E~ --~ Ez* is a h o m o m o r p h i s m , then the relation h-1. E2* ---~ 6~(E~*),
defined below, is called an inverse homomorphism. If y is in E2*, then h-~(y)
is the set of strings over E1 which get mapped by h to y. That is, h-~(y) =
(xlh(x) = y}. I f L is a language over E2, then h-I(L)is the language over Ex
consisting of those strings which get mapped by h into a string in L. Formally,
h - ' ( L ) = U h-'(y) = (xlh(x) ~ L].
y EL

Example 0.13
Let h be a homomorphism such that h(0) = a and h(1) = a. It follows
that h⁻¹(a) = {0, 1} and h⁻¹(a*) = {0, 1}*.
As a second example, suppose that h is a homomorphism such that
h(0) = a and h(1) = e. Then h⁻¹(e) = 1* and h⁻¹(a) = 1*01*. Here 1*01*
denotes the language {1ⁱ01ʲ | i, j ≥ 0}, which is consistent with our definitions
and the convention which identifies a and {a}. □
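The homomorphisms of Examples 0.12 and 0.13 can be simulated directly: h is given on symbols and extended symbol by symbol, and an inverse image can be checked by filtering candidate strings. A minimal Python sketch (our names, not the book's):

    def apply_hom(h, w):
        """Extend a symbol mapping h to a string w: h(e) = e and h(xa) = h(x)h(a)."""
        return ''.join(h[a] for a in w)

    h = {'0': 'a', '1': 'bb'}
    print(apply_hom(h, '0011'))                   # 'aabbbb'
    L = {'0' * n + '1' * n for n in range(1, 4)}  # a finite slice of {0^n 1^n | n >= 1}
    print({apply_hom(h, w) for w in L})           # the corresponding slice of {a^n b^2n | n >= 1}

    g = {'0': 'a', '1': 'a'}                      # the first homomorphism of Example 0.13
    candidates = ['', '0', '1', '00', '01', '10', '11']
    print([x for x in candidates if apply_hom(g, x) == 'a'])   # inverse image of a: {0, 1}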

EXERCISES

0.2.1. Give all the (a) prefixes, (b) suffixes, and (c) substrings of the string abc.
0.2.2. Prove or disprove: L⁺ = L* - {e}.
0.2.3. Let h be the homomorphism defined by h(0) = a, h(1) = bb, and h(2) = e.
What is h(L), where L = {012}*?†
0.2.4. Let h be as in Exercise 0.2.3. What is h⁻¹({ab}*)?
*0.2.5. Prove or disprove the following:
(a) h⁻¹(h(L)) = L.
(b) h(h⁻¹(L)) = L.
0.2.6. Can L* or L⁺ ever be ∅? Under what circumstances are L* and L⁺
finite?

†Note that {012}* is not {0, 1, 2}*.



*0.2.7. Give well orders on the following languages:


(a) {a, b}*.
(b) a*b*c*.
(c) {w | w ∈ {a, b}* and the number of a's in w equals the number of b's}.
0.2.8. Which of the following languages have the prefix (suffix) property?
(a) ∅.
(b) {e}.
(c) {aⁿbⁿ | n ≥ 1}.
(d) L*, if L has the prefix property.
(e) {w | w ∈ {a, b}* and the number of a's in w equals the number of b's}.

0.3. CONCEPTS F R O M LOGIC

In this book we shall present a number of algorithms which are useful


in language-processing applications. For some functions several algorithms
are known, and it is desirable to present the algorithms in a common frame-
work in which they can be evaluated and compared.
Above all, it is most desirable to know that an algorithm performs the
function that it is supposed to perform. For this reason, we shall provide
proofs that the various algorithms that we shall present function as adver-
tised. In this section, we shall briefly comment on what is a proof and men-
tion some useful techniques of proof.

0.3.1. Proofs

A formal mathematical system can be characterized by the following


basic components"
(1) Basic symbols,
(2) Formation rules,
(3) Axioms, and
(4) Rules of inference.
The set of basic symbols would include the symbols for constants, operators,
and so forth. Statements can then be constructed from these basic symbols
according to a set of formation rules. Certain primitive statements can be
defined, and the validity of these statements can be accepted without justifi-
cation. These statements are known as the axioms of the system.
Then certain rules can be specified whereby valid statements can be used
to infer new valid statements. Such rules are called rules of inference.
The objective may be to prove that a certain statement is true in a certain
mathematical system. A proof of that statement is a sequence of statements
such that
(1) Each statement is either an axiom or can be created from one or more
of the previous statements by means of a rule of inference.

(2) The last statement in the sequence is the statement to be proved.


A statement for which we can find a proof is called a theorem of that
formal system. Obviously, every axiom of a formal system is a theorem.
In spirit at least, the proof of any mathematical theorem can be formu-
lated in these terms. However, going to a level of detail in which each state-
ment is either an axiom or follows from previous statements by rudimentary
rules of inference makes the proofs of all but the most elementary theorems
too long. The task of finding proofs of theorems in this fashion is in itself
laborious, even for computers.
Consequently, mathematicians invariably employ various shortcuts to
reduce the length of a proof. Statements which are previously proved theo-
rems can be inserted into proofs. Also, statements can be omitted when it is
(hopefully) clear what is being done. This technique is practiced virtually
everywhere, and this book is no exception.
It is known to be impossible to provide a universal method for proving
theorems. However, in the next sections we shall mention a few of the more
commonly used techniques.

0.3.2. Proof by Induction

Suppose that we wish to prove that a statement S(n) about an integer n
is true for all integers in a set N.
If N is finite, then one method of proof is to show that S(n) is true for
each value of n in N. This method of proof is sometimes called proof by
perfect induction or proof by exhaustion.
If N is an infinite subset of the integers, then we may use simple mathe-
matical induction. Let no be the smallest value in N. To show that S(n) is
true for all n in N, we may equivalently show that
(1) S(no) is true. (This is called the basis of the induction.)
(2) Assuming that S(m) is true for all m < n in N, show that S(n) is also
true. (This is the inductive step.)

Example 0.14
Suppose then that S(n) is the statement

    1 + 3 + 5 + ... + (2n - 1) = n²

That is, the sum of the first n odd integers is n². Suppose we wish to show
that S(n) is true for all positive integers. Thus N = {1, 2, 3, ...}.
Basis. For n = 1 we have 1 = 1².
Inductive Step. Assuming S(1), ..., S(n) are true [in particular, that S(n)
is true], we have

    1 + 3 + 5 + ... + (2n - 1) + (2(n + 1) - 1) = n² + 2n + 1
                                                = (n + 1)²

so that S(n + 1) must then also be true.


We thus conclude that S(n) is true for all positive integers.

The reader is referred to Section 0.5.5 for some methods of induction on
sets other than integers.

0.3.3. Logical Connectives

Often a statement (theorem) may read "P if and only if Q" or "P is
a necessary and sufficient condition for Q," where P and Q are themselves
statements. The terms if, only if, necessary, and sufficient have precise mean-
ings in logic.
A logical connective is a symbol that can be used to create a statement
out of simpler statements. For example, and, or, not, implies are logical
connectives, not being a unary connective and the others binary connectives.
If P and Q are statements, then P and Q, P or Q, not P, and P implies Q are
also statements.
The symbol ∧ is used to denote and, ∨ to denote or, ~ to denote not,
and → to denote implies.
There are well-defined rules governing the truth or falsehood of a state-
ment containing logical connectives. For example, the statement P and Q is
true only when both P is true and Q is also true. We can summarize the
properties of a logical connective by a table, called a truth table, which dis-
plays the value of a composite statement in terms of the values of its compo-
nents. Figure 0.5 shows the truth table for the logical connectives and, or,
not and implies.

P    Q      P ∧ Q    P ∨ Q    ~P    P → Q

F    F        F        F       T      T
F    T        F        T       T      T
T    F        F        T       F      F
T    T        T        T       F      T

Fig. 0.5 Truth tables for and, or, not, and implies.

From the table (Fig. 0.5) we see that P → Q is false only when P is true
and Q is false. It may seem a little odd that if P is false, then P implies Q

is always true, regardless of the value of Q. But in logic this is customary;


from falsehood as a hypothesis, anything follows.
We can now return to consideration of a statement of the form P if and
only if Q. This statement consists of two parts: P if Q and P only if Q. It is
more common to state P if Q as if Q then P, which is only another way of
saying Q implies P.
In fact the following five statements are equivalent:
(1) P implies Q.
(2) If P then Q.
(3) P only if Q.
(4) Q is a necessary condition for P.
(5) P is a sufficient condition for Q.
To show that the statement P if and only if Q is true we must show both
that Q implies P and that P implies Q. Thus, P if and only if Q is true exactly
when P and Q are either both true or both false.
There are several alternative methods of showing that the statement P
implies Q is always true. One method is to show that the statement not Q
implies not P† is always true. The reader should verify that not Q implies
not P has exactly the same truth table as P implies Q. The statement not Q
implies not P is called the contrapositive of P implies Q.
One important technique of proof is proof by contradiction, sometimes
called the indirect proof or reductio ad absurdum. Here, to show that P implies
Q is true, we show that not Q and P implies falsehood is true. That is to say,
we assume that Q is not true, and if assuming that P is true we are able to
obtain a statement known to be false, then P implies Q must be true.
The converse of the statement if P then Q is if Q then P. The statement
P if and only if Q is often written if P then Q and conversely. Note that a
statement and its converse do not have the same truth table.
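The truth tables of Fig. 0.5 are easy to generate mechanically. The following fragment is our own illustration (in Python; the helper names are ours): it tabulates the four connectives and checks the claim that not Q implies not P has the same truth table as P implies Q.

    from itertools import product

    def implies(p, q):
        # P -> Q is false only when P is true and Q is false.
        return (not p) or q

    def tf(value):
        return "T" if value else "F"

    # Columns: P, Q, P and Q, P or Q, not P, P implies Q (as in Fig. 0.5).
    for p, q in product([False, True], repeat=2):
        print(tf(p), tf(q), tf(p and q), tf(p or q), tf(not p), tf(implies(p, q)))

    # The contrapositive has the same truth table as the original implication.
    assert all(implies(p, q) == implies(not q, not p)
               for p, q in product([False, True], repeat=2))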

EXERCISES

DEFINITION
Propositional calculus is a good example of a mathematical system. Formally,
propositional calculus can be defined as a system S consisting of
(1) A set of primitive symbols,
(2) Rules for generating well-formed statements,
(3) A set of axioms, and
(4) Rules of inference.

tWe assume "not" takes precedence over "implies." Thus the proper phrasing of the
sentence is (not P) implies (not Q). In general, "not" takes precedence over "and," which
takes precedence over "or," which takes precedence over "implies."

(1) The primitive symbols of S are (, ), →, ~, and an infinite set of statement
letters a1, a2, a3, .... The symbol → can be thought of as implies and ~ as not.
(2) A well-formed statement is formed by one or more applications of the
following rules:
(a) A statement letter is a statement.
(b) If A and B are statements, then so are (~A) and (A → B).
(3) Let A, B, and C be statements. The axioms of S are

A1: (A → (B → A))
A2: ((A → (B → C)) → ((A → B) → (A → C)))

A3: ((~B → ~A) → ((~B → A) → B))

(4) The rule of inference is modus ponens, i.e., from the statements (A → B)
and A we can infer the statement B.
We shall leave out parentheses wherever possible. The statement a → a is a
theorem of S and has as proof the sequence of statements
(i) (a → ((a → a) → a)) → ((a → (a → a)) → (a → a)) from A2 with
A = a, B = (a → a), and C = a.
(ii) a → ((a → a) → a) from A1.
(iii) (a → (a → a)) → (a → a) by modus ponens from (i) and (ii).
(iv) a → (a → a) from A1.
(v) a → a by modus ponens from (iii) and (iv).
"0.3.1. Prove that ( ~ a --~ a) ~ a is a theorem of S.
0.3.2. A tautology is a statement that is true for all possible truth values of
the statement variables. Show that every theorem of S is a tautology.
Hint: Prove the theorem by induction on the n u m b e r of steps necessary
to obtain the theorem.
**0.3.3. Prove the converse of Exercise 0.3.2, i.e., that every tautology is a theorem.
Thus a simple m e t h o d to determine whether a statement of propositional
calculus is a theorem is to determine whether that statement is a tau-
tology.
0.3.4. Give the truth table for the statement if P then if Q then R.
DEFINITION
Boolean algebra can be interpreted as a system for manipulating
truth-valued variables using logical connectives informally interpreted
as and, or, and not. Formally, a Boolean algebra is a set B together with
operations · (and), + (or), and ¯ (not). The axioms of Boolean algebra
are the following: For all a, b, and c in B,
(1) a + (b + c) = (a + b) + c (associativity)
    a · (b · c) = (a · b) · c.
(2) a + b = b + a (commutativity)
    a · b = b · a.
(3) a · (b + c) = (a · b) + (a · c) (distributivity)
    a + (b · c) = (a + b) · (a + c).

In addition, there are two distinguished members of B, 0 and 1 (in
the most common Boolean algebra, these are the only members of B,
representing falsehood and truth, respectively), with the following laws:
(4) a + 0 = a
    a · 1 = a.
(5) a + ā = 1
    a · ā = 0.
The rule of inference is substitution of equals for equals.
*0.3.5. Show that the following statements are theorems in any Boolean algebra:
(a) 0̄ = 1.
(b) a + (b · ā) = a + b.
(c) (ā)¯ = a.
What are the informal interpretations of these theorems?
0.3.6. Let A be a set. Show that 𝒫(A) is a Boolean algebra if +, ·, and ¯ are
∪, ∩, and complementation with respect to the universe A.
**0.3.7. Let B be a Boolean algebra where #B = n. Show that n = 2^m for some
integer m.
0.3.8. Prove by induction that

    1 + 2 + ... + n = n(n + 1)/2

0.3.9. Prove by induction that

    (1 + 2 + ... + n)² = 1³ + 2³ + ... + n³

"0.3.10. What is wrong with the following?


THEOREM
All marbles have the same color.
Proof. Let A be any set of n marbles, n > 1. We shall "prove" by
induction on n that all marbles in A have the same color.
Basis. If n = 1, all marbles in A are clearly of the same color.
Inductive Step. Assume that if A is any set of n marbles, then all
marbles in A are the same color. Let A' be a set of n + 1 marbles, n > 1.
Remove one marble from A'. We are then left with a set A " of n marbles,
which, by the inductive hypothesis, has marbles all the same color.
Remove from A" a second marble and then add to A " the marble
originally removed. We again have a set of n marbles, which by the
inductive hypothesis has marbles the same color. Thus the two marbles
removed must have been the same color so that the set A' must contain
marbles all the same color. Thus, in any set of n marbles, all marbles
are the same color. D
"0.3.11. Let R be a well order on a set A and S(a) a statement about a in A.
Assume that if S(b) is true for all b ~ a such that b R a, then S(a) is true.
SEC. 0.4 PROCEDURES AND ALGORITHMS 25

Show that then S(a) is true for all a in A. Note that this is a generaliza-
tion of the principle of simple induction.
0.3.12. Show that there are only four unary logical connectives. Give their truth
tables.
0.3.13. Show that there are 16 binary logical connectives.
0.3.14. Two logical statements are equivalent if they have the same truth table.
Show that
(a) ~(P ∧ Q) is equivalent to ~P ∨ ~Q.
(b) ~(P ∨ Q) is equivalent to ~P ∧ ~Q.
0.3.15. Show that P → Q is equivalent to ~Q → ~P.
0.3.16. Show that P → Q is equivalent to (P ∧ ~Q) → false.
*0.3.17. A set of logical connectives is complete if for any logical statement we
can find an equivalent statement containing only those logical connec-
tives. Show that {∧, ~} and {∨, ~} are complete sets of logical con-
nectives.

BIBLIOGRAPHIC NOTES

Church [1956] and Mendelson [1968] give good treatments of mathematical


logic. Halmos [1963] gives a nice introduction to Boolean algebras.

0.4. PROCEDURES AND ALGORITHMS

The concept of algorithm is central to computing. The definition of
algorithm can be approached from a variety of directions. In this section we
shall discuss the term algorithm informally and hint at how a more formal
definition can be obtained.

0.4.1. Procedures

We shall begin with a slightly more general concept, that of a procedure.
Broadly speaking, a procedure consists of a finite set of instructions, each of
which can be mechanically executed in a fixed amount of time with a fixed
amount of effort. A procedure can have any number of inputs and outputs.
To be precise, we should define the terms instruction, input, and output.
However, we shall not go into the details of such a definition here since any
"reasonable" definition is adequate for our needs.
A good example of a procedure is a machine language computer program.
A program consists of a finite number of machine instructions, and each
instruction usually requires a fixed amount of computation. However, pro-
cedures in the form of computer programs may often be very difficult to
understand, so a more descriptive notation will be used in this book. The

following example is representative of the notation we shall use to describe
procedures and algorithms.

Example 0.15

Consider Euclid's algorithm to determine the greatest common divisor
of two positive integers p and q.
Procedure 1. Euclidean algorithm.
Input. p and q, positive integers.
Output. g, the greatest common divisor of p and q.
Method. Step 1: Let r be the remainder of p/q.
Step 2: If r = 0, set g = q and halt. Otherwise set p = q, then
q = r, and go to step 1. □
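Procedure 1 can be transcribed almost line for line into a programming language. The version below is our own sketch in Python (the book itself uses only the informal notation above); the function name gcd is ours.

    def gcd(p, q):
        # Procedure 1: step 1 computes the remainder of p/q; step 2 either
        # halts with g = q or replaces (p, q) by (q, r) and returns to step 1.
        while True:
            r = p % q            # step 1
            if r == 0:           # step 2
                return q
            p, q = q, r

    print(gcd(36, 24))   # prints 12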

Let us see if procedure 1 qualifies as a procedure under our definition.
Procedure 1 certainly consists of a finite set of instructions (each step is
considered as one instruction) and has input and output. However, can each
instruction be mechanically executed with a fixed amount of effort?
Strictly speaking, the answer to this question is no, because if p and q
are sufficiently large, the computation of the remainder of p/q may require
an amount of effort that is proportional in some way to the size of p and q.
However, we could replace step 1 by a sequence of steps whose net effect
is to compute the remainder of p/q, although the amount of effort in each
step would be fixed and independent of the size of p and q. (Thus the number
of times each step is executed is an increasing function of the size of p and q.)
These steps, for example, could implement the customary paper-and-pencil
method of doing integer division.
Thus we shall permit a step of a procedure to be a procedure in itself.
So under this liberalized notion of procedure, procedure 1 qualifies as a pro-
cedure.
In general, it is convenient to assume that integers are basic entities, and
we shall do so. Any integer can be stored in one memory cell, and any integer
arithmetic operation can be performed in one step. This is a fair assumption
only if the integers are less than 2^k, where k is the number of bits in a com-
puter word, as often happens in practice. However, the reader should bear
in mind the additional effort necessary to handle integers of arbitrary size
when the elementary steps handle only integers of bounded size.
We must now face perhaps the most important consideration--proving
that the procedure does what it is supposed to do. For each pair of integers
p and q, does procedure 1 in fact compute g to be the greatest common
divisor of p and q? The answer is yes, but we shall leave the proof of this
particular assertion to the Exercises.
We might note in passing, however, that one useful technique of proof

for showing that procedures work as intended is induction on the number
of steps taken.

0.4.2. Algorithms

We shall now place an all-important restriction on a procedure to obtain
what is known as an algorithm.
DEFINITION
A procedure halts on a given input if there is a finite number t such that
after executing t (not necessarily different) elementary instructions of the
procedure, either there is no instruction to be executed next or a "halt"
instruction was last executed. A procedure which halts on all inputs is called
an algorithm.

Example 0.16
Consider the procedure of Example 0.15. We observe that steps 1 and 2
must be executed alternately. After step 1, step 2 must be executed. After
step 2, step 1 may be executed, or there may be no next step; i.e., the proce-
dure halts. We can prove that for every input p and q, the procedure halts
after at most 2q steps,† and that thus the procedure is an algorithm. The
proof turns on observing that the value r computed in step 1 is less than
the value of q, and that, hence, successive values of q when step 1 is executed
form a monotonically decreasing sequence. Thus, by the qth time step 2 is
executed, r, which cannot be negative and is less than the current value of q,
must attain the value zero. When r = 0, the procedure halts. □

There are several reasons why a procedure may fail to halt on some inputs.
It is possible that a procedure can get into an infinite loop under certain
conditions. For example, if a procedure contained the instruction
Step 1: If x = 0, then go to step 1, else halt,
then for x = 0 the procedure would never halt. Variations on this situation
are countless.
Our interest will be almost exclusively in algorithms. We shall be inter-
ested not only in proving that algorithms are correct, but also in evaluating
algorithms. The two main criteria for evaluating how well algorithms perform
will be
(1) The number of elementary mechanical operations executed as a func-
tion of the size of the input (time complexity), and
(2) How large an auxiliary memory is required to hold intermediate

†In fact, 4 log₂ q is an upper bound on the number of steps executed for q > 1. We
leave this as an exercise.

results that arise during the execution, again as a function of the size of the
input (space complexity).

Example 0.17

In Example 0.16 we saw that the number of steps of procedure 1 (Example
0.15) that would be executed with input (p, q) is bounded above by 2q.
The amount of memory used is one cell for each of p, q, and r, assuming that
any integer can be stored in a single cell. If we assume that the amount of
memory needed to store an integer depends on the length of the binary
representation of that integer, the amount of memory needed is proportional
to log₂ n, where n is the maximum of inputs p and q. □

0.4.3. Recursive Functions

A procedure defines a mapping from the set of all allowable inputs to
a set of outputs. The mapping defined by a procedure is called a partial
recursive function or recursive function. If the procedure is an algorithm, then
the mapping is called a total recursive function.
A procedure can also be used to define a language. We could have a pro-
cedure to which we can present an arbitrary string x. After some computa-
tion, the procedure would output "yes" when string x is in the language.
If x is not in the language, then the procedure may halt and say "no" or the
procedure may never halt.
This procedure would then define a language L as the set of input strings
for which the procedure has output "yes." The behavior of the procedure
on a string not in the language is not acceptable from a practical point of
view. If the procedure has not halted after some length of time on an input
x, we would not know whether x was in the language but the procedure had
not finished computing or whether x was not in the language and the proce-
dure would never terminate.
If we had used an algorithm to define a language, then the algorithm
would halt on all inputs. Consequently, patience is justified with an algorithm
in that we know that if we wait long enough, the algorithm will eventually
halt and say either "yes" or "no."
A set which can be defined by a procedure is said to be recursively enu-
merable. A set which can be defined by an algorithm is called recursive.
If we use more precise definitions, then we can rigorously show that
there are sets which are not recursively enumerable. We can also show that
there are recursively enumerable sets which are not recursive.
We can state this in another way. There are mappings which cannot be
specified by any procedure. There are also mappings which can be specified
by a procedure but which cannot be specified by an algorithm.
We shall see that these concepts have great underlying significance for

a theory of programming. In Section 0.4.5 we shall give an example of
a procedure for which it can be shown that there is no equivalent algorithm.

0.4.4. Specification of Procedures

In the previous section we informally defined what we meant by procedure
and algorithm. It is possible to give a rigorous definition of these terms in
a variety of formalisms. In fact there are a large number of formal notations
for describing procedures. These notations include
(1) Turing machines [Turing, 1936-1937].
(2) Chomsky type 0 grammars [Chomsky, 1959a and 1963].
(3) Markov algorithms [Markov, 1951].
(4) Lambda calculus [Church, 1941].
(5) Post systems [Post, 1943].
(6) Tag systems [Post, 1965].
(7) Most programming languages [Sammet, 1969].
This list can be extended readily. The important point to be made here
is that it is possible to simulate a procedure written in one of these notations
by means of a procedure written in any other of these notations. In this
sense all these notations are equivalent.
Many years ago the logicians Church and Turing hypothesized that any
computational process which could be reasonably called a procedure could
be simulated by a Turing machine. This hypothesis is known as the Church-
Turing thesis and has been generally accepted. Thus the most general class
of sets that we would wish to deal with in a practical way would be included
in the class of recursively enumerable sets.
Most programming languages, at least in principle, have the capability
of specifying any procedure. In Chapter 11 (Volume 2) we shall see what
consequences this capability produces when we attempt to optimize pro-
grams.
We shall not discuss the details of these formalisms for procedures here,
although some of them appear in the exercises. Minsky [1967] gives a very
readable introduction to this topic.
In our book we shall use the rather informal notation for describing
procedures and algorithms that we have already seen.

0.4.5. Problems

We shall use the word problem in a rather specific way in this book.
DEFINITION

A problem (or question) is a statement (predicate) which is either true or
false, depending on the value of some number of unknowns of designated

type in the statement. A problem is usually presented as a question, and


we say the answer to the problem is "yes" if the statement is true and "no"
if the statement is false.

Example 0.18
An example of a problem is "x is less than y, for integers x and y." More
colloquially, we can express the statement in question form and delete men-
tion of the type of x and y: "Is x less than y ?"

DEFINITION
An instance of a problem is a set of allowable values for its unknowns.
For example, the instances of the problem of Example 0.18 are ordered
pairs of integers.
A mapping from the set of instances of a problem to {yes, no} is called
a solution to the problem. If this mapping can be specified by an algorithm,
then the problem is said to be (recursively) decidable or solvable. If no
algorithm exists to specify this mapping, then the problem is said to be
(recursively) undecidable or unsolvable.
One of the remarkable achievements of twentieth-century mathematics
was the discovery of problems that are undecidable. We shall see later that
undecidable problems seriously hamper the development of a broadly appli-
cable theory of computation.

Example 0.19
Let us discuss the particular problem "Is procedure P an algorithm ?"
Its analysis will go a long way toward exhibiting why some problems are
undecidable. First, we must assume that all procedures are specified in some
formal system such as those mentioned earlier in this section.
It appears that every formal specification language for procedures admits
only a countable number of procedures. While we cannot prove this in gen-
eral, we give one example, the formalism for representing absolute machine
language programs, and leave the other mentioned specifications for the
Exercises. Any absolute machine language program is a finite sequence of
0's and 1's (which we imagine are grouped 32, 36, 48, or some number to
a machine word).
Suppose that we have a string of 0's and 1's representing a machine lan-
guage program. We can assign an integer to this program by giving its posi-
tion in some well ordering of all strings of 0's and 1's. One such ordering
can be obtained by ordering the strings of 0's and 1's in terms of increasing
length and lexicographically ordering strings of equal length by treating each
string as a binary number. Since there are only a finite number of strings of
any length, every string in {0, 1}* is thus mapped to some integer. The first

few pairs in this bijection are

Integer String

1 e
2 0
3 1
4 00
5 01
6 10
7 11
8 000
9 001

In this fashion we see that for each machine language program we can find
a unique integer and that for each integer we can find a certain machine
language program.
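This ordering is easy to compute. The sketch below is ours (in Python), and it relies on an observation of our own that is easily checked against the table above: the nth string in the ordering is just n written in binary with its leading 1 removed. The function names are illustrative.

    def string_of(n):
        # The nth string of {0, 1}* in the ordering above: write n in binary
        # and drop the leading 1 (so 1 -> e, 2 -> 0, 3 -> 1, 4 -> 00, ...).
        return bin(n)[3:]

    def number_of(s):
        # Inverse map: prepend a 1 and read the result as a binary numeral.
        return int("1" + s, 2)

    print([string_of(n) for n in range(1, 10)])
    # ['', '0', '1', '00', '01', '10', '11', '000', '001']
    print(number_of("001"))   # prints 9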
It seems that no matter what formalism for specifying procedures is taken,
we shall always be able to find a one-to-one correspondence between pro-
cedures and integers. Thus it makes sense to talk about the ith procedure in
any given formalism for specifying procedures. Moreover, the correspon-
dence between procedures and integers is sufficiently simple that one can,
given an integer i, write out the ith procedure, or given a procedure, find its
corresponding number.
Let us suppose that there is a procedure Pj which is an algorithm and
takes as input a specification of a procedure in our formalism and returns
the answer "yes" if and only if its input is an algorithm. All known formalisms
for procedure specification have the property that procedures can be com-
bined in certain simple ways. In particular, given the hypothetical procedure
(algorithm) Pj, we could construct an algorithm Pk to work as follows:
ALGORITHM Pk

Input. Any procedure P which requires one input.
Output.
(1) "No" if (a) P is not an algorithm or (b) P is an algorithm and
P(P) = "yes."
(2) "Yes" otherwise.
The notation P(P) means that we are applying procedure P to its own
specification as input.
Method.
(1) If Pj(P) = "yes," then go to step (2). Otherwise output "no" and halt.
(2) If the input P is an algorithm and P takes a procedure specification

as input and gives "yes" or "no" as output, Pk applies P to itself (P) as input.
(We assume that procedure specifications are such that these questions about
input and output forms can be ascertained by inspection. The assumption
is true in known cases.)
(3) Pk gives output "no" or "yes" if P gives output "yes" or "no," respec-
tively.
We see that Pk is an algorithm, on the assumption that P~ is an algorithm.
Also Pk requires one input. But what does Pk do when its input is itself?
Presumably, Pj determines that Pk is an algorithm [i.e., Pj(Pk) = "yes"].
Pk then simulates itself on itself. But now Pk cannot give an output that is
consistent. If Pk determines that this simulation gives "yes" as output, Pk
gives "no" as output. But Pk just determined that it gave "yes" when applied
to itself. A similar paradox occurs if Pk finds that the simulation gives "no."
We must conclude that it is fallacious to assume that the algorithm Pj exists,
and thus the question "Is P an algorithm?" is not decidable for any of the
known procedure formalisms. □
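The structure of this argument can be made concrete with a short sketch (ours, in Python). The function is_algorithm below stands for the hypothetical procedure Pj; it is only assumed, never defined, which is exactly the point.

    def make_P_k(is_algorithm):
        # Builds the procedure P_k of the text from a hypothetical decider
        # is_algorithm(P), assumed to return True exactly when procedure P
        # halts on every input. No such total decider can actually be written.
        def P_k(P):
            if not is_algorithm(P):
                return "no"              # case (1a)
            answer = P(P)                # simulate P on its own text
            return "no" if answer == "yes" else "yes"
        return P_k

    # If is_algorithm existed, P_k would itself be an algorithm taking one
    # input, and the value of P_k(P_k) could be neither "yes" nor "no"
    # without contradiction -- which is why is_algorithm cannot exist.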

We should emphasize that a problem is decidable if and only if there is
an algorithm which will take as input an arbitrary instance of that problem
and give the answer yes or no. Given a specific instance of a problem, we
are often able to answer yes or no for that specific instance. This does not
necessarily make the problem decidable. We must be able to give a uniform
algorithm which will work for all instances of the problem before we can
say that the problem is decidable.
As an additional caveat, we should point out that the encoding of the
instances of a problem is vitally important. Normally a "standard" encoding
(one that can be mapped by means of an algorithm into a Turing machine
specification) is assumed. If nonstandard encodings are used, then problems
which are normally undecidable can become decidable. In such cases, how-
ever, there will be no algorithm to go from the standard encoding to the
nonstandard. (See Exercise 0.4.21.)

0.4.6. Post's Correspondence Problem

In this section we shall introduce one of the paradigm undecidable prob-
lems, called Post's correspondence problem. Later in the book we shall use
this problem to show that other problems are undecidable.
DEFINITION

An instance of Post's correspondence problem over alphabet Σ is a finite
set of pairs in Σ⁺ × Σ⁺ (i.e., a set of pairs of nonempty strings over Σ).
The problem is to determine if there exists a finite sequence of (not neces-
sarily distinct) pairs (x1, y1), (x2, y2), ..., (xm, ym) such that x1 x2 ... xm =

YlYz "'" Ym" We shall call such a sequence a viable sequence for this instance
of Post's correspondence problem. We shall often use xlx2...xm to represent
the viable sequence.

Example 0.20
Consider the following instance of Post's correspondence problem over
{a, b}:
{(abbb, b), (a, aab), (ba, b)}

The sequence (a, aab), (a, aab), (ba, b), (abbb, b) is viable, since
(a)(a)(ba)(abbb) = (aab)(aab)(b)(b).
The instance {(ab, aba), (aba, baa), (baa, aa)} of Post's correspondence
problem has no viable sequences, since any such sequence must begin with
the pair (ab, aba), and from that point on, the total number of a's in the
first components of the pairs in the sequence will always be less than the num-
ber of a's in the second components. □

There is a procedure which is in a sense a "solution" to Post's correspon-
dence problem. Namely, one can linearly order all possible sequences of
pairs of strings that can be constructed from a given instance of the problem.
One can then proceed to test each sequence to see if that sequence is viable.
On encountering the first viable sequence, the procedure would halt and
report yes. Otherwise the procedure would continue to operate forever.
However, there is no algorithm to solve Post's correspondence problem,
for one can show that if there were such an algorithm, then one could solve
the halting problem for Turing machines (Exercise 0.4.22)--but the halting
problem for Turing machines is undecidable (Exercise 0.4.14).
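The brute-force procedure just described can be written out directly. The sketch below is ours (in Python); it enumerates sequences of pairs by increasing length and halts on the first viable one, so it is a procedure but not an algorithm: on an instance with no viable sequence it would run forever unless an artificial bound is supplied.

    from itertools import product

    def find_viable(pairs, max_length=None):
        m = 1
        while max_length is None or m <= max_length:
            for choice in product(range(len(pairs)), repeat=m):
                x = "".join(pairs[i][0] for i in choice)
                y = "".join(pairs[i][1] for i in choice)
                if x == y:
                    return [pairs[i] for i in choice]   # a viable sequence
            m += 1
        return None    # gave up at the artificial bound

    instance = [("abbb", "b"), ("a", "aab"), ("ba", "b")]   # Example 0.20
    print(find_viable(instance))
    # [('a', 'aab'), ('a', 'aab'), ('ba', 'b'), ('abbb', 'b')]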

EXERCISES

0.4.1. A perfect number is an integer which is equal to the sum of all its
divisors (including 1 but excluding the number itself). For example,
6 = 1 + 2 + 3 and 28 = 1 + 2 + 4 + 7 + 14 are the first two per-
fect numbers. (The next three are 496, 8128, and 33550336.) Construct
a procedure which has input i and output the ith perfect number. (At
present it is not known whether there are a finite or infinite number of
perfect numbers.)
0.4.2. Prove that the Euclidean algorithm of Example 0.15 is correct.
0.4.3. Provide an algorithm to add two n-digit decimal numbers. How much
time and space does the algorithm require as a function of n ? (See
Winograd [1965] for a discussion of the time complexity of addition.)
0.4.4. Provide an algorithm to multiply two n-digit decimal numbers. How
much time and space does the algorithm require ? (See Winograd [1967]

and Cook and Aanderaa [1969] for a discussion of the time com-
plexity of multiplication.)
0.4.5. Give an algorithm to multiply two integer-valued n by n matrices.
Assume that integer arithmetic operations can be done in one step.
What is the speed of your algorithm? If it is proportional to n³ steps,
see Strassen [1969] for an asymptotically faster one.
0.4.6. Let L ⊆ {a, b}*. The characteristic function for L is a mapping
fL: Z → {0, 1}, where Z is the set of nonnegative integers, such that
fL(i) = 1 if the ith string in {a, b}* is in L and fL(i) = 0 otherwise.
Show that L is recursively enumerable if and only if fL is a partial
recursive function.
0.4.7. Show that L is a recursive set if and only if both L and its complement
are recursively enumerable.
0.4.8. Let P be a procedure which defines a recursively enumerable set
L ⊆ {a, b}*. From P construct a procedure P' which will generate all
and only all the elements of L. That is, the output of P' is to be an
infinite string of the form x1 # x2 # x3 # ..., where L = {x1, x2, ...}.
Hint: Construct P' to apply i steps of procedure P to the jth string in
{a, b}* for all (i, j), in a reasonable order.
DEFINITION
A Turing machine consists of a finite set of states (Q), tape symbols
(Γ), and a function δ (the next move function, i.e., program) that maps
a subset of Q × Γ to Q × Γ × {L, R}. A subset Σ ⊆ Γ is designated
as the set of input symbols and one symbol in Γ - Σ is designated the
blank. One state, q0, is designated the start state. The Turing machine
operates on a tape, one square of which is pointed to by a tape head.
All but a finite number of squares hold the blank at any time. A con-
figuration of a Turing machine is a pair (q, α↑β), where q is the state,
αβ is the nonblank portion of the tape, and ↑ is a special symbol,
indicating that the tape head is at the square immediately to its right.
(↑ does not occupy a square.)
The next configuration after configuration (q, α↑β) is determined
by letting A be the symbol scanned by the tape head (the leftmost
symbol of β, or the blank if β = e) and finding δ(q, A). Suppose that
δ(q, A) = (p, A', D), where p is a state, A' a tape symbol, and D = L
or R. Then the next configuration is (p, α'↑β'), where α'β' is formed
from α↑β by replacing the A to the right of ↑ by A' and then moving
the symbol ↑ in direction D (left if D = L, right if D = R). It may be
necessary to insert a blank at one end in order to move ↑.
The Turing machine can be thought of as a formalism for defining
procedures. Its input may be any finite length string w in Σ*. The
procedure is executed by starting with configuration (q0, ↑w) and repeat-
edly computing next configurations. If the Turing machine halts, i.e.,
it has reached a configuration for which no move is defined (recall that
δ may not be specified for all pairs in Q × Γ), then the output is the
nonblank portion of the Turing machine's tape.

*0.4.9. Exhibit a Turing machine that, given an input w in {0, 1}*, will write
YES on its tape if w is a palindrome (i.e., w = w^R) and write NO
otherwise, halting in either case.
*0.4.10. Assume that all Turing machines use a finite subset of some countable
set of symbols, a1, a2, ..., for their states and tape symbols. Show that
there is a one-to-one correspondence between the integers and Turing
machines.
**0.4.11. Show that there is no Turing machine which halts on all inputs (i.e.,
algorithm) and determines, given integer i written in binary on its tape,
whether the ith Turing machine halts. (See Exercise 0.4.10.)
*0.4.12. Let a1, a2, ... be a countable set of symbols. Show that the set of
finite-length strings over these symbols is countable.
*0.4.13. Informally describe a Turing machine which takes a pair of integers
i and j as input, and halts if and only if the ith Turing machine halts
when given the jth string (as in Exercise 0.4.12) as input. Such a Turing
machine is called universal.
**0.4.14. Show that there exists no Turing machine which always halts, takes
input (i, j) a pair of integers, and prints YES on its tape if Turing
machine i halts with input j and NO otherwise. Hint: Assume such
a Turing machine existed, and derive a contradiction as in Example
0.19. The existence of a universal Turing machine is useful in many
proofs.
**0.4.15. Show that there is no Turing machine (not necessarily an algorithm)
which determines whether an arbitrary Turing machine is an algorithm.
Note that this statement is stronger than Exercise 0.4.14, where we
essentially showed that no such Turing machine which always halts
exists.
"0.4.16. Show that it is undecidable whether a given Turing machine halts when
started with blank tape.
0.4.17. Show that the problem of determining whether a statement is a theorem
in propositional calculus is decidable. Hint: See Exercises 0.3.2 and
0.3.3.
0.4.18. Show that the problem of deciding whether a string is in a particular
recursive set is decidable.
0.4.19. Does Post's correspondence problem have a viable sequence in the
following instances ?
(a) (01,011), (10, 000), (00, 0).
(b) (1, 11), (11,101), (101,011), (011, 1011).
How do you reconcile being able to answer this exercise with the fact
that Post's correspondence problem is undecidable ?
0.4.20. Show that Post's correspondence problem with strings restricted to be
over the alphabet [a} is decidable. How do you reconcile this result
with the und~idability of Post's correspondence problem ?

"0.4.21. Let P1, Pz . . . . be an enumeration of procedures in some formalism.


Define a new enumeration P~ , P' z , . . . as follows
(1) Let P~t-1 be the ith of P1, Pz . . . . which is not an algorithm.
(2) Let Pat be the ith of Pi, Pz,. • • which is an algorithm.
Then there is a simple algorithm to determine, given j, whether P~,
is an algorithm--just see whether j is even or odd. Moreover, each of
P1, Pz . . . . is P~ for some j. How do you reconcile the existence of this
one-to-one correspondence between integers and procedures with the
claims of Example 0.19 ?
• *0.4.22. Show that Post's correspondence problem is undecidable. H i n t : Given
a Turing machine, construct an instance of Post's problem which has
a viable sequence if and only if the Turing machine halts when started
with blank tape.
• 0.4.23. Show that the Euclidean algorithm in Example 0.15 halts after at most
4 logz q steps when started with inputs p and q, where q > 1.
DEFINITION
A variant of Post's correspondence problem is the partial corre-
spondence problem over alphabet Σ. An instance of the partial corre-
spondence problem is to determine, given a finite set of pairs in Σ⁺ × Σ⁺,
whether there exists for each m > 0 a sequence of not necessarily dis-
tinct pairs (x1, y1), (x2, y2), ..., (xm, ym) such that the first m symbols
of the string x1 x2 ... xm coincide with the first m symbols of
y1 y2 ... ym.
**0.4.24. Prove that the partial correspondence problem is undecidable.

BIBLIOGRAPHIC NOTES

Davis [1965] is a good anthology of many early papers in the study of proce-
dures and algorithms. Turing's paper [Turing, 1936-1937] in which Turing machines
first appear makes particularly interesting reading if one bears in mind that the
paper was written before modern electronic computers were conceived.
The study of recursive and partial recursive functions is part of a now well-
developed branch of mathematics called recursive function theory. Rogers [1967],
Kleene [1952], and Davis [1958] are good references in this subject.
Post's correspondence problem first appeared in Post [1947]. The partial cor-
respondence problem preceding Exercise 0.4.24 is from Knuth [1965].
Computational complexity is the study of algorithms from the point of view
of measuring the number of primitive operations (time complexity) or the amount
of auxiliary storage (space complexity) required to compute a given function.
Borodin [1970] and Hartmanis and Hopcroft [1971] give readable surveys of
this topic, and Irland and Fischer [1970] have compiled a bibliography on this
subject.
Solutions to many of the *'d exercises in this section can be found in Minsky
[1967] and Hopcroft and Ullman [1969].

0.5. CONCEPTS FROM GRAPH THEORY

Graphs and trees provide convenient descriptions of many structures
that are useful in performing computations. In this section we shall examine
a number of concepts from graph theory which we shall use throughout
the remainder of our book.

0.5.1. Directed Graphs

Graphs can be directed or undirected and ordered or unordered. Our
primary interest will be in ordered and unordered directed graphs.
DEFINITION

An unordered directed graph G is a pair (A, R), where A is a set of elements
called nodes (or vertices) and R is a relation on A. Unless stated otherwise,
the term graph will mean directed graph.

Example 0.21
Let G = (A, R), where A = {1, 2, 3, 4} and R = {(1, 1), (1, 2), (2, 3),
(2, 4), (3, 4), (4, 1), (4, 3)}. We can draw a picture of the graph G by number-
ing four points 1, 2, 3, 4 and drawing an arrow from point a to point b if
(a, b) is in R. Figure 0.6 shows a picture of this directed graph. □

Fig. 0.6 Example of a directed graph.

A pair (a, b) in R is called an edge or arc of G. This edge is said to leave
node a and enter node b (or be incident upon node b). For example, (1, 2) is
an edge in the example graph. If (a, b) is an edge, we say that a is a prede-
cessor of b and b is a successor of a.
Loosely speaking, two graphs are the same if we can draw them to look
the same, independently of what names we give to the nodes. Formally, we
define equality of unordered directed graphs as follows.
DEFINITION

Let G1 = (A1, R1) and G2 = (A2, R2) be graphs. We say G1 and G2 are
equal (or the same) if there is a bijection f: A1 → A2 such that a R1 b if and

only if f(a) R2 f(b). That is, there is an edge between nodes a and b in G1 if
and only if there is an edge between their corresponding nodes in G2.
It is common to have certain information attached to the nodes and/or
edges of a graph. We call such information a labeling.
DEFINITION

Let (A, R) be a graph. A labeling of the graph is a pair of functions f
and g, where f, the node labeling, maps A to some set, and g, the edge labeling,
maps R to some (possibly distinct) set. Let G1 = (A1, R1) and G2 = (A2, R2)
be labeled graphs, with labelings (f1, g1) and (f2, g2), respectively. Then G1
and G2 are equal labeled graphs if there is a bijection h: A1 → A2 such that
(1) a R1 b if and only if h(a) R2 h(b) (i.e., G1 and G2 are equal as unlabeled
graphs).
(2) f1(a) = f2(h(a)) (i.e., corresponding nodes have the same labels).
(3) g1((a, b)) = g2((h(a), h(b))) (i.e., corresponding edges have the same
label.)
In many cases, only the nodes or only the edges are labeled. These situa-
tions correspond, respectively, to f or g having a single element for its
range. In these cases, condition (2) or (3), respectively, is trivially satisfied.

Example 0.22
Let G1 = ({a, b, c}, {(a, b), (b, c), (c, a)}) and G2 = ({0, 1, 2}, {(1, 0), (2, 1),
(0, 2)}). Let the labeling of G1 be defined by f1(a) = f1(b) = X, f1(c) = Y,
g1((a, b)) = g1((b, c)) = α, g1((c, a)) = β. Let the labeling of G2 be
f2(0) = f2(2) = X, f2(1) = Y, g2((0, 2)) = g2((2, 1)) = α, and g2((1, 0)) = β.
G1 and G2 are shown in Fig. 0.7.
G1 and G2 are equal. The correspondence is h(a) = 0, h(b) = 2, h(c) = 1.

Fig. 0.7 Equal labeled graphs: (a) G1; (b) G2.



DEFINITION

A sequence of nodes (a0, a1, ..., an), n ≥ 1, is a path of length n from
node a0 to node an if there is an edge which leaves node a_{i-1} and enters node
ai for 1 ≤ i ≤ n. For example, (1, 2, 4, 3) is a path in Fig. 0.6. If there is
a path from node a0 to node an, we say that an is accessible from a0.
A cycle (or circuit) is a path (a0, a1, ..., an) in which a0 = an. In Fig.
0.6, (1, 1) is a cycle of length 1.
A directed graph is strongly connected if there is a path from a to b for
every pair of distinct nodes a and b.
Finally we introduce the concept of the degree of a node. The in-degree
of a node a is the number of edges entering a and the out-degree of a is the
number of edges leaving a.
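A directed graph of this kind is conveniently stored as a table of successor lists, and accessibility is then a simple search. The fragment below is our own sketch (in Python); it encodes the graph of Fig. 0.6 and decides whether one node is accessible from another.

    from collections import deque

    successors = {1: [1, 2], 2: [3, 4], 3: [4], 4: [1, 3]}   # Fig. 0.6

    def accessible(a, b):
        # Is there a path (of length at least 1) from node a to node b?
        seen, queue = {a}, deque([a])
        while queue:
            node = queue.popleft()
            for nxt in successors[node]:
                if nxt == b:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return False

    print(accessible(2, 1))   # True, e.g. via the path (2, 4, 1)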

0.5.2. Directed Acyclic Graphs

DEFINITION

A dag (short for directed acyclic graph) is a directed graph that has no
cycles. Figure 0.8 shows an example of a dag.
A node having in-degree 0 will be called a base node. One having out-
degree 0 is called a leaf. In Fig. 0.8, nodes 1, 2, 3, and 4 are base nodes and
nodes 2, 4, 7, 8, and 9 are leaves.

Fig. 0.8 Example of a dag.

If (a, b) is an edge in a dag, a is called a direct ancestor of b, and b a direct
descendant of a.
If there is a path from node a to node b, then a is said to be an ancestor
of b and b a descendant of a. In Fig. 0.8, node 9 is a descendant of node 1 ;
node 1 is an ancestor of node 9.
Note that if R is a partial order on a set A, then (A, R) is a dag. More-
over, if we have a dag (A, R) and let R' be the relation "is a descendant of"
on A, then R' is a partial order on A.
40 MATHEMATICALPRELIMINARIES CHAP. 0

0.5.3. Trees

A tree is a special type of dag and has many important applications in
compiler theory.
DEFINITION

An (oriented) tree T is a directed graph G = (A, R) with a specified node
r in A called the root such that
(1) r has in-degree 0,
(2) All other nodes of T have in-degree 1, and
(3) Every node in A is accessible from r.
Figure 0.9 provides an example of a tree with six nodes. The root is

Fig. 0.9 Example of a tree.

numbered 1. We shall follow the convention of drawing trees with the root
on top and having all arcs directed downward. Adopting this convention we
can omit the arrowheads.
THEOREM 0.3
A tree T has the following properties:
(1) T is acyclic.
(2) For each node in a tree there is a unique path from the root to that
node.
Proof. Exercise. □
DEFINITION
A subtree of a tree T = (A, R) is any tree T' = (A', R') such that
(1) A' is nonempty and contained in A,
(2) R' = (A' × A') ∩ R, and
(3) No node of A - A' is a descendant of a node in A'.

For example, the subtree dominated by node 3 (consisting of nodes 3, 4, 5,
and 6) is a subtree of the tree in Fig. 0.9. We say that the root of a subtree
dominates the subtree.

0.5.4. Ordered Graphs

DEFINITION
An ordered directed graph is a pair (A, R) where A is a set of vertices as
before and R is a set of linearly ordered lists of edges such that each element
of R is of the form ((a, b1), (a, b2), ..., (a, bn)), where a is a distinct member
of A. This element would indicate that, for vertex a, there are n arcs leaving
a, the first entering vertex b1, the second entering vertex b2, and so forth.

Example 0.23
Figure 0.10 shows a picture of an ordered directed graph. The linear
ordering on the arcs leaving a vertex is indicated by numbering the arcs
leaving a vertex by 1, 2 , . . . , n, where n is the out-degree of that vertex.

Fig. 0.10 Ordered directed graph.

The formal specification for Fig. 0.10 is (A, R), where A = {a, b, c} and
R = {((a, c), (a, b), (a, b), (a, a)), ((b, c))}. □

Notice that Fig. 0.10 is not a directed graph according to our definition,
since there are two arcs leaving node a and entering node b. (Recall that
in a set there is only one instance of each element.)
As for unordered graphs, we define the notions of labeling and equality
of ordered graphs.

DEFINITION
A labeling of an ordered graph G = (A, R) is a pair of mappings f and
g such that
(1) f: A → S for some set S (f labels the nodes), and
(2) g maps R to sequences of symbols from some set T such that g maps
((a, b1), ..., (a, bn)) to a sequence of n symbols of T. (g labels the edges.)
Labeled graphs G1 = (A1, R1) and G2 = (A2, R2) with labelings (f1, g1)
and (f2, g2), respectively, are equal if there exists a bijection h: A1 → A2
such that
(1) R1 contains ((a, b1), ..., (a, bn)) if and only if R2 contains
((h(a), h(b1)), ..., (h(a), h(bn))),
(2) f1(a) = f2(h(a)) for all a in A1, and
(3) g1(((a, b1), ..., (a, bn))) = g2(((h(a), h(b1)), ..., (h(a), h(bn)))).
Informally, two labeled ordered graphs are equal if there is a one-to-one
correspondence between nodes that preserves the node and edge labels. If
the labeling functions all have a range with one element, then the graph is
essentially unlabeled, and only condition (1) needs to be shown. Similarly,
only the node labeling or only the edge labeling may map to a single element,
and condition (2) or (3) will become trivial.
For each ordered graph (A, R), there is an underlying unordered graph
(A, R') formed by allowing R' to be the set of (a, b) such that there is a list
((a, b1), ..., (a, bn)) in R, and b = bi for some i, 1 ≤ i ≤ n.
An ordered dag is an ordered graph whose underlying graph is a dag.
An ordered tree is an ordered graph (A, R) whose underlying graph is
a tree, and such that if ((a, b1), ..., (a, bn)) is in R, then bi ≠ bj if i ≠ j.
Unless otherwise stated, we shall assume that the direct descendants of
a node of an ordered dag or tree are always linearly ordered from left to
right in a diagram.
There is a great distinction between ordered graphs and unordered graphs
from the point of view of when two graphs are the same.
For example, the two trees T1 and T2 in Fig. 0.11 are equivalent if T1 and
T2 are unordered. But if T1 and T2 are ordered, then T1 and T2 are not the
same.

Fig. 0.11 Two trees: T1 and T2.

0.5.5. Inductive Proofs Involving Dags

Many theorems about dags, and especially trees, can be proved by induc-
tion, but it is often not clear on what to base the induction. Theorems which
yield to this kind of proof are often of the form that something is true for
all, or a certain subset of, the nodes of the tree. Thus we must prove some-
thing about nodes of the tree, and we need some parameter of nodes such
that the inductive step can be proved.
Two such parameters are the depth of a node, the minimum path length
(or in the case of a tree, the path length) from a base node (root in the case
of a tree) to the given node, and the height (or level) of a node, the maximum
path length from the node to a leaf.
Another approach to inductions on finite ordered trees is to order the
nodes in some way and perform the induction on the position of the node
in that sequence. Two common orderings are defined below.
DEFINITION

Let T be a finite ordered tree. A preorder of the nodes of T is obtained
by applying step 1 recursively, beginning at the root of T.
Step 1: Let this application of step 1 be to node a. If a is a leaf, list node
a and halt. If a is not a leaf, let its direct descendants be a1, a2, ..., an. Then
list a and subsequently apply step 1 to a1, a2, ..., an in that order.
A postorder of T is formed by changing the last sentence of step 1 to read
"Apply step 1 to a1, a2, ..., an in that order and then list a."

Example 0.24
Consider the ordered tree of Fig. 0.12. The preorder of the nodes is
123456789. The postorder is 342789651. □
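Both orderings are naturally computed by recursion on the direct descendants of a node. The sketch below is ours (in Python); it encodes the shape of the tree of Fig. 0.12 as we read it from the bracketed representation given in Section 0.5.7 (root 1 with direct descendants 2 and 5; node 2 with 3 and 4; node 5 with 6; node 6 with 7, 8, and 9).

    children = {1: [2, 5], 2: [3, 4], 3: [], 4: [],
                5: [6], 6: [7, 8, 9], 7: [], 8: [], 9: []}   # Fig. 0.12

    def preorder(a):
        # List a, then the nodes of its subtrees from left to right.
        result = [a]
        for child in children[a]:
            result += preorder(child)
        return result

    def postorder(a):
        # List the nodes of the subtrees from left to right, then a.
        result = []
        for child in children[a]:
            result += postorder(child)
        return result + [a]

    print(preorder(1))    # [1, 2, 3, 4, 5, 6, 7, 8, 9]
    print(postorder(1))   # [3, 4, 2, 7, 8, 9, 6, 5, 1]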

Sometimes it is possible to perform an induction on the place that a node
has in some order, such as pre- or postorder. Examples of these forms of
induction appear throughout the book.

0.5.6. Linear Orders from Partial Orders

If we have a partial order R on a set A, often we wish to find a linear
order which is a superset of the partial order. This problem of embedding
a partial order in a linear order is called topological sorting.
Intuitively, topological sorting corresponds to taking a dag, which is in
effect a partial order, and squeezing the dag into a single column of nodes

Fig. 0.12 Ordered tree.

such that all edges point downward. The linear order is given by the position
of nodes in the column.
For example, under this type of transformation the dag of Fig. 0.8 could
look as shown in Fig. 0.13.

Fig. 0.13 Linear order from the dag of Fig. 0.8.

Formally we say that R' is a linear order that embeds a partial order R on
a set A if R' is a linear order and R ⊆ R', i.e., a R b implies that a R' b for
all a, b in A. Given a partial order R, there are many linear orders that embed
R (Exercise 0.5.5). The following algorithm finds one such linear order.

ALGORITHM 0.1

Topological sort.
Input. A partial order R on a finite set A.
Output. A linear order R' on A such that R ⊆ R'.
Method. Since A is a finite set, we can represent the linear order R' on
A as a list a1, a2, ..., an such that ai R' aj if i < j, and A = {a1, ..., an}.
The following steps construct this sequence of elements:
(1) Let i = 1, A1 = A, and R1 = R.
(2) If Ai is empty, halt, and a1, a2, ..., a_{i-1} is the desired linear order.
Otherwise, let ai be an element in Ai such that a Ri ai is false for all a ∈ Ai.
(3) Let A_{i+1} be Ai - {ai} and R_{i+1} be Ri ∩ (A_{i+1} × A_{i+1}). Then let i
be i + 1 and repeat step (2). □
If we represent a partial order as a dag, then Algorithm 0.1 has a particu-
larly simple interpretation. At each step (Ai, Ri) is a dag and ai is a base
node of (Ai, Ri). The dag (A_{i+1}, R_{i+1}) is formed from (Ai, Ri) by deleting
node ai and all edges leaving ai.

Example 0.25
Let A = {a, b, c, d} and R = {(a, b), (a, c), (b, d), (c, d)}. Since a is the
only node in A such that a' R a is false for all a' ∈ A, we must choose a1 = a.
Then A2 = {b, c, d} and R2 = {(b, d), (c, d)}; we now choose either b or
c for a2. Let us choose a2 = b. Then A3 = {c, d} and R3 = {(c, d)}. Continu-
ing, we find a3 = c and a4 = d.
The complete linear order R' is {(a, b), (b, c), (c, d), (a, c), (b, d), (a, d)}.
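Algorithm 0.1 can be programmed directly from its statement. The sketch below is ours (in Python); it assumes, as in Example 0.25, that R is given without reflexive pairs, and it simply repeats steps (2) and (3) until A is exhausted.

    def topological_sort(A, R):
        A, R = set(A), set(R)
        order = []
        while A:
            # Step (2): pick an a_i such that a R a_i is false for all a in A.
            a_i = next(a for a in A if not any((b, a) in R for b in A))
            order.append(a_i)
            # Step (3): pass to A_{i+1} and R_{i+1}.
            A.remove(a_i)
            R = {(b, c) for (b, c) in R if b != a_i and c != a_i}
        return order

    A = {"a", "b", "c", "d"}
    R = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}     # Example 0.25
    print(topological_sort(A, R))   # ['a', 'b', 'c', 'd'] or ['a', 'c', 'b', 'd']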

THEOREM 0 . 4

Algorithm 0.1 produces a linear order R' which embeds the given partial
order R.
Proof. A simple inductive exercise.

0.5.7. Representations for Trees

A tree is a two-dimensional structure, but in many situations it is con-
venient to use only one-dimensional data structures. Consequently we are
interested in having one-dimensional representations for trees which have
all the information contained in the two-dimensional picture. What we mean
by this is that the two-dimensional picture can be recovered from the one-
dimensional representation.
Obviously one one-dimensional representation of a tree T = (A, R)
would be the sets A and R themselves.

But there are also other representations. For example, we can use nested
brackets to indicate the nodes at each depth of a tree. Recall that the depth
of a node in a tree is the length of the path from the root to that node. For
example, in Fig. 0.9, node 1 is at depth 0, node 3 is at depth 1, and node 6
is at depth 2. The depth of a tree is the length of the longest path. The tree of
Fig. 0.9 has depth 2.
Using brackets to indicate depth, the tree of Fig. 0.9 could be represented
as 1(2, 3(4, 5, 6)). We shall call this the left-bracketed representation, since
a subtree is represented by the expression appearing inside a balanced pair
of parentheses and the node which is the root of that subtree appears imme-
diately to the left of the left parenthesis.
DEFINITION
In general the left-bracketed representation of a tree T can be obtained by
applying the following recursive rules to T. The string lrep(T) denotes the
left-bracketed representation of tree T.
(1) If T has a root numbered a with subtrees T1, T2, ..., Tk in order,
then lrep(T) = a(lrep(T1), lrep(T2), ..., lrep(Tk)).
(2) If T has a root numbered a with no direct descendants, then
lrep(T) = a.
If we delete the parentheses from a left-bracketed representation of a tree,
we are left with a preorder of the nodes.
We can also obtain a right-bracketed representation for a tree T, rrep(T),
as follows:
(1) If T has a root numbered a with subtrees T1, T2, ..., Tk, then rrep(T)
= (rrep(T1), rrep(T2), ..., rrep(Tk))a.
(2) If T has a root numbered a with no direct descendants, then
rrep(T) = a.
Thus rrep(T) for the tree of Fig. 0.12 would be ((3, 4)2, ((7, 8, 9)6)5)1.
In this representation, the direct ancestor is immediately to the right of
the first right parenthesis enclosing that node. Also, note that if we delete
the parentheses we are left with a postorder of the nodes.
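Both representations are one-line recursions over the direct descendants of a node. The sketch below is ours (in Python), using the tree of Fig. 0.9 stored as a table of descendant lists; the function names lrep and rrep mirror the definitions above.

    children = {1: [2, 3], 2: [], 3: [4, 5, 6], 4: [], 5: [], 6: []}   # Fig. 0.9

    def lrep(a):
        # Root first, then the bracketed list of its subtrees.
        if not children[a]:
            return str(a)
        return str(a) + "(" + ", ".join(lrep(c) for c in children[a]) + ")"

    def rrep(a):
        # Bracketed list of subtrees first, then the root.
        if not children[a]:
            return str(a)
        return "(" + ", ".join(rrep(c) for c in children[a]) + ")" + str(a)

    print(lrep(1))   # 1(2, 3(4, 5, 6))
    print(rrep(1))   # (2, (4, 5, 6)3)1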
Another representation of a tree is to list the direct ancestor of nodes
1, 2, ..., n of a tree in that order. The root would be recognized by letting
its ancestor be 0.

Example 0.26
The tree shown in Fig. 0.14 would be represented by 0122441777. Here
0 in position 1 indicates that node 1 has "node 0" as its direct ancestor (i.e.,
node 1 is the root). The 1 in position 7 indicates that node 7 has direct ances-
tor 1.

Fig. 0.14 A tree.
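A small sketch of this ancestor-list representation (the names are ours): from the string of direct ancestors one can recover the children of each node, and hence the whole tree.

def children_from_ancestors(ancestors):
    """ancestors[i] is the direct ancestor of node i+1 (0 marks the root).

    Returns (root, children) where children[a] lists the direct descendants
    of node a in order.  For Example 0.26 pass [0, 1, 2, 2, 4, 4, 1, 7, 7, 7].
    """
    n = len(ancestors)
    children = {node: [] for node in range(1, n + 1)}
    root = None
    for node, anc in enumerate(ancestors, start=1):
        if anc == 0:
            root = node
        else:
            children[anc].append(node)
    return root, children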

0.5.8. Paths Through a Graph

In this section we shall outline a computationally efficient method of computing the transitive closure of a relation R on a set A. If we view the
relation as an unordered graph (A, R), then the transitive closure of R is
equivalent to the set of pairs of nodes (a, b) such that there is a path from
node a to node b.
Another possible interpretation is to view the relation (or the unordered
graph) as a (square) Boolean matrix (that is, a matrix of 0's and 1's) called
an adjacency matrix, in which the entry in row i, column j is 1 if and only if
the element corresponding to row i is R-related to the element corresponding
to column j. Figure 0.15 shows the Boolean matrix M corresponding to the

[Fig. 0.15 Boolean matrix for Fig. 0.6: the 4 × 4 adjacency matrix M, with a 1 in row i, column j exactly when the node corresponding to row i is R-related to the node corresponding to column j in the graph of Fig. 0.6.]

graph of Fig. 0.6. If M is a Boolean matrix, then M+ = M^1 ∪ M^2 ∪ M^3 ∪ ··· (where M^n represents M Boolean multiplied† by itself n times) represents the transitive closure of the relation represented by M. Thus the algorithm could also be used as a method of computing M+.
For Fig. 0.15, M+ would be a matrix of all 1's.

†That is, use the usual formula for matrix multiplication with the Boolean operations • and + for multiplication and addition, respectively.
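To make the matrix formulation concrete, here is a small Python sketch (our own, not from the text) that computes M+ by repeated Boolean multiplication; for an n × n matrix it suffices to take the union of M^1 through M^n.

def boolean_closure(M):
    """Transitive closure M+ of the relation given by Boolean matrix M.

    M is a list of n rows of 0's and 1's; M+ is the union of the Boolean
    powers M^1, M^2, ..., M^n.
    """
    n = len(M)

    def bool_mult(X, Y):
        # usual matrix product, with "and" for • and "or" for +
        return [[int(any(X[i][k] and Y[k][j] for k in range(n)))
                 for j in range(n)] for i in range(n)]

    power = [row[:] for row in M]       # M^1
    closure = [row[:] for row in M]
    for _ in range(n - 1):              # accumulate M^2, ..., M^n
        power = bool_mult(power, M)
        closure = [[closure[i][j] | power[i][j] for j in range(n)]
                   for i in range(n)]
    return closure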
Actually we shall give a slightly more general algorithm here. We assume
that we have an unordered directed graph in which there is a nonnegative
cost c_ij associated with an edge from node i to node j. (If there is no edge
from node i to node j, c_ij is infinite.) The algorithm will compute the minimum cost of a path between any pair of nodes. The case in which we wish
to compute only the transitive closure of a relation R over {a_1, ..., a_n} is
expressed by letting c_ij = 0 if a_i R a_j and c_ij = ∞ otherwise.
ALGORITHM 0.2
Minimum cost of paths through a graph.
Input. A graph with n nodes numbered 1, 2, ..., n and a cost function
c_ij for 1 ≤ i, j ≤ n with c_ij ≥ 0 for all i and j.
Output. An n × n matrix M = [m_ij], with m_ij the lowest cost of any
path from node i to node j, for all i and j.
Method.
(1) Set m_ij = c_ij for all i and j such that 1 ≤ i, j ≤ n.
(2) Set k = 1.
(3) For all i and j, if m_ij > m_ik + m_kj, set m_ij to m_ik + m_kj.
(4) If k < n, increase k by 1 and go to step (3). If k = n, halt. □
The heart of Algorithm 0.2 is step (3), in which we deduce whether the
current cost of going from node i to node j can be made smaller by first
going from node i to node k and then from node k to node j.
Since step (3) is executed once for all possible values of i, j, and k, Algorithm 0.2 is of time complexity n^3.
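A direct Python transcription of Algorithm 0.2 follows (a sketch under the convention that a missing edge has infinite cost; the names are ours).

import math

def min_cost_paths(cost):
    """Algorithm 0.2: cost[i][j] >= 0 is the cost of edge (i, j),
    or math.inf if there is no such edge.  Nodes are 0, 1, ..., n-1 here
    rather than 1, ..., n.  Returns the matrix of minimum path costs."""
    n = len(cost)
    m = [row[:] for row in cost]                    # step (1)
    for k in range(n):                              # steps (2) and (4)
        for i in range(n):                          # step (3)
            for j in range(n):
                if m[i][j] > m[i][k] + m[k][j]:
                    m[i][j] = m[i][k] + m[k][j]
    return m

# Transitive closure of R over {a_1, ..., a_n}: let cost[i][j] = 0 if
# a_i R a_j and math.inf otherwise; then a_i R+ a_j exactly when the
# resulting m[i][j] is finite.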
It is not immediately clear that Algorithm 0.2 does produce the minimum
cost of any path from node i to j. Thus we should prove that Algorithm 0.2
does what it claims.
THEOREM 0.5

When Algorithm 0.2 terminates, m_ij is the smallest value expressible as
c_{v_1 v_2} + ··· + c_{v_{m-1} v_m} such that v_1 = i and v_m = j. (This sum is the cost of
the path v_1, v_2, ..., v_m from node i to node j.)

Proof. To prove the theorem we shall prove the following statement by
induction on l, the value of k in step (3) of the algorithm.

Statement (0.5.1). After step (3) is executed with k = l, m_ij has the smallest value expressible as a sum of the form c_{v_1 v_2} + ··· + c_{v_{m-1} v_m}, where v_1 = i,
v_m = j, and none of v_2, ..., v_{m-1} is greater than l.

We shall call this minimum value the correct value of m_ij with k = l.
This value is the cost of a cheapest path from node i to node j which does
not pass through a node whose number is higher than l.

Basis. Let us consider the initial condition, which we can represent by
letting l = 0. [If you like, we can think of step (1) as step (3) with k = 0.]
When l = 0, m = 2, so m_ij = c_ij, which is the correct initial value.

Inductive Step. Assume that statement (0.5.1) is true for all l < l_0. Let us
consider the value of m_ij after step (3) has been executed with k = l_0.
Suppose that the minimum sum c_{v_1 v_2} + ··· + c_{v_{m-1} v_m} for m_ij with k = l_0
is such that no v_p, 2 ≤ p ≤ m − 1, is equal to l_0. From the inductive hypothesis c_{v_1 v_2} + ··· + c_{v_{m-1} v_m} is the correct value of m_ij with k = l_0 − 1, so
c_{v_1 v_2} + ··· + c_{v_{m-1} v_m} is also the correct value of m_ij with k = l_0.
Now suppose that the minimum sum s_ij = c_{v_1 v_2} + ··· + c_{v_{m-1} v_m} for m_ij
with k = l_0 is such that v_p = l_0 for some 2 ≤ p ≤ m − 1. That is, s_ij is the
cost of the path v_1, v_2, ..., v_m. We can assume that there is no node v_q on
this path, q ≠ p, such that v_q is l_0. Otherwise the path v_1, v_2, ..., v_m contains
a cycle, and we can delete at least one term from the sum c_{v_1 v_2} + ··· + c_{v_{m-1} v_m}
without increasing the value of the sum s_ij. Thus we can always find a sum
for s_ij in which v_p = l_0 for only one value of p, 2 ≤ p ≤ m − 1.
Let us assume that 2 < p < m − 1. The cases p = 2 and p = m − 1
are left to the reader. Let us consider the sums s_{iv_p} = c_{v_1 v_2} + ··· + c_{v_{p-1} v_p}
and s_{v_p j} = c_{v_p v_{p+1}} + ··· + c_{v_{m-1} v_m} (the costs of the paths from node i to node
v_p and from node v_p to node j in the sum s_ij). From the inductive hypothesis
we can assume that s_{iv_p} is the correct value for m_{iv_p} with k = l_0 − 1 and
that s_{v_p j} is the correct value for m_{v_p j} with k = l_0 − 1. Thus when step (3)
is executed with k = l_0, m_ij is correctly given the value m_{iv_p} + m_{v_p j}.
We have thus shown that statement (0.5.1) is true for all l. When l = n,
statement (0.5.1) states that at the end of Algorithm 0.2, m_ij has the lowest
possible value. □

A common special case of finding minimum cost paths through a graph
occurs when we want to determine the set of nodes which are accessible from
a given node. Equivalently, given a relation R on set A, with a in A, we want
to find the set of b in A such that a R+ b, where R+ is the transitive closure of
R. For this purpose we can use the following algorithm of quadratic time
complexity.

ALGORITHM 0.3

Finding the set of nodes accessible from a given node of a directed graph.
Input. Graph (A, R), with A a finite set and a in A.
Output. The set of nodes b in A such that a R* b.
Method. We form a list L and update it repeatedly. We shall also mark
members of A during the course of the algorithm. Initially, all members of
A are unmarked. The nodes marked will be those accessible from a.
(1) Set L = a and mark a.
(2) If L is empty, halt. Otherwise, let b be the first element on list L.
Delete b from L.
(3) For all c in A such that b R c and c is unmarked, add c to the bottom
of list L, mark c, and go to step (2). □
We leave a proof that Algorithm 0.3 works correctly to the Exercises.
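The list L behaves as a queue of marked-but-unprocessed nodes; a Python sketch (with our own names, and the relation given as a set of pairs) might read:

from collections import deque

def accessible(A, R, a):
    """Algorithm 0.3: the set of nodes reachable from a in the graph (A, R)."""
    marked = {a}                  # step (1): mark a ...
    L = deque([a])                # ... and put it on the list L
    while L:                      # step (2): halt when L is empty
        b = L.popleft()           # take the first element of L and delete it
        for c in A:               # step (3): unmarked successors of b
            if (b, c) in R and c not in marked:
                marked.add(c)
                L.append(c)
    return marked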

EXERCISES

0.5.1. What is the maximum number of edges a dag with n nodes can have ?
0.5.2. Prove Theorem 0.3.
0.5.3. Give the pre- and postorders for the tree of Fig. 0.14. Give left- and
right-bracketed representations for the tree.
*0.5.4. (a) Design an algorithm that will map a left-bracketed representation
of a tree into a right-bracketed representation.
(b) Design an algorithm that will map a right-bracketed representation
of a tree into a left-bracketed representation.
0.5.5. How many linear orders embed the partial order of the dag of Fig. 0.8 ?
0.5.6. Complete the proof of Theorem 0.5.
0.5.7. Give upper bounds on the time and space necessary to implement
Algorithm 0.1. Assume that one memory cell is needed to store any
node name or integer, and that one elementary step is needed for each
of a reasonable set of primitive operations, including the arithmetic
operations and examination or alteration of a cell in an array indexed
by a known integer.
0.5.8. Let A = {a, b, c, d} and R = {(a, b), (b, c), (a, c), (b, d)}. Find a linear
order R' such that R ⊆ R'. How many such linear orders are there?
DEFINITION
An undirected graph G is a triple (A, E, f) where A is a set of nodes,
E is a set of edge names and f is a mapping from E to the set of unordered pairs of nodes. If f(e) = {a, b}, then we mean that edge e connects nodes a and b. A path in an undirected graph is a sequence of
nodes a_0, a_1, a_2, ..., a_n such that there is an edge connecting a_{i-1} and
a_i for 1 ≤ i ≤ n. An undirected graph is connected if there is a path
between every pair of distinct nodes.
DEFINITION
An undirected tree can be defined recursively as follows. An
undirected tree is a set of one or more nodes with one distinguished
node r called the root of the tree. The remaining nodes can be partitioned into zero or more sets T1, ..., Tk, each of which forms a tree.
The trees T1, ..., Tk are called the subtrees of the root, and an
undirected edge connects r with all of and only the subtree roots.
A spanning tree for a connected undirected graph G is a tree which
contains all nodes of G.
0.5.9. Provide an algorithm to construct a spanning tree for a connected
undirected graph.
0.5.10. Let (A, R) be an unordered graph such that A = {1, 2, 3, 4} and
R = {(1, 2), (2, 3), (4, 1), (4, 3)}. Find R+, the transitive closure of R.
Let the adjacency matrix for R be M. Compute M+ and show that
M+ is the adjacency matrix for (A, R+).
0.5.11. Show that Algorithm 0.2 takes time proportional to n^3 in basic steps
similar to those mentioned in Exercise 0.5.7.
0.5.12. Prove that Algorithm 0.3 marks node b if and only if a R+ b.
0.5.13. Show that Algorithm 0.3 takes time proportional to the maximum of
#A and #R.
0.5.14. The following are three unordered directed graphs. Which two are the
same?
G1 = ({a, b, c}, {(a, b), (b, c), (c, a)})
G2 = ({a, b, c}, {(b, a), (a, c), (b, c)})
G3 = ({a, b, c}, {(c, b), (c, a), (b, a)})

0.5.15. The following are three ordered directed graphs with only nodes labeled.
Which two are the same?

G1 = ({a, b, c}, {((a, b), (a, c)), ((b, a), (b, c)), ((c, b))})

with labeling l1(a) = X, l1(b) = Z, and l1(c) = Y.

G2 = ({a, b, c}, {((a, c)), ((b, c), (b, a)), ((c, b), (c, a))})

with labeling l2(a) = Y, l2(b) = X, and l2(c) = Z.

G3 = ({a, b, c}, {((a, c), (a, b)), ((b, c)), ((c, a), (c, b))})

with labeling l3(a) = Y, l3(b) = X, and l3(c) = Z.



0.5.16. Complete the proof of Theorem 0.4.


0.5.17. Provide an algorithm to determine whether an undirected graph is
connected.
"0.5.18. Provide an algorithm to determine whether two graphs are equal.
"0.5.19. Provide an efficient algorithm to determine whether two nodes of a
tree are on the same path. Hint: Consider preordering the nodes.
**0.5.20. Provide an efficient algorithm to determine the first common ancestor
of two nodes of a tree.

Programming Exercises
0.5.21. Write a program that will construct an adjacency matrix from a linked
list representation of a graph.
0.5.22. Write a program that will construct a linked list representation of a
graph from an adjacency matrix.
0.5.23. Write programs to implement Algorithms 0.1, 0.2, and 0.3.

BIBLIOGRAPHIC NOTES

Graphs are an ancient and honorable part of mathematics. Harary [1969],
Ore [1962], and Berge [1958] discuss the theory of graphs. Knuth [1968] is a good
source for techniques for manipulating graphs and trees inside computers.
Algorithm 0.2 is Warshall's algorithm as given in Floyd [1962a]. One interesting
result on computing the transitive closure of a relation is found in Munro [1971],
where it is shown that the transitive closure of a relation can be computed in the
time required to compute the product of two matrices over a Boolean ring. Thus,
using Strassen [1969], the time complexity of transitive closure is no greater than
n^2.81, not n^3, as Algorithm 0.2 takes.
1   AN INTRODUCTION TO COMPILING

This book considers the problems involved in mapping one representation of a procedure into another. The most common occurrence of this mapping
is during the compilation of a source program, written in a high-level pro-
gramming language, into object code for a particular digital computer.
We shall discuss algorithm-translating techniques which are applicable
to the design of compilers and other language-processing devices. To put
these techniques into perspective, in this chapter we shall summarize some
of the salient aspects of the compiling process and mention certain other
areas in which parsing or translation plays a major role.
As with the previous chapter, those who have a prior familiarity with the
material, in this case compilers, will find the discussion quite elementary.
These readers can skip this chapter or merely skim it for terminology.

1.1. PROGRAMMING LANGUAGES

In this section we shall briefly discuss the notion of a programming language. We shall then touch on the problems inherent in the specification of
a programming language and in the design of a translator for such a language.

1.1.1. Specification of Programming Languages

The basic machine language operations of a digital computer are invariably very primitive compared with the complex functions that occur in
mathematics, engineering, and other disciplines. Although any function that
can be specified as a procedure can be implemented as a sequence of exceed-
ingly simple machine language instructions, for most applications it is much


preferable to use a higher-level language whose primitive instructions approximate the type of operations that occur in the application. For exam-
ple, if matrix operations are being performed, it is more convenient to
write an instruction of the form

A = B * C

to represent the fact that A is a matrix obtained by multiplying matrices B and C together rather than a long sequence of machine language operations
whose intent is the same.
Programming languages can alleviate much of the drudgery of program-
ming in machine language, but they also introduce a number of new problems
of their own. Of course, since computers can still "understand" only machine
language, a program written in a high-level language must be ultimately
translated into machine language. The device performing this translation has
become known as a compiler.
Another problem concerned with programming languages is the specifi-
cation of the language itself. In a minimal specification of a programming
language we need to define
(1) The set of symbols which can be used in valid programs,
(2) The set of valid programs, and
(3) The "meaning" of each vali,4 ~ros:am.
Defining the permissible set of symbols is easy. One should bear in mind,
however, that in some languages such as SNOBOL or FORTRAN, the
beginning and/or end of a card has significance and thus should be considered
a "symbol." Blank is also considered a symbol in some cases. Defining the set
of programs which are "valid" is a much more difficult task. In many cases it is
very hard just to decide whether a given program should be considered valid.
In the specification of programming languages it has become customary
to define the class of permissible programs by a set of grammatical rules
which allow some programs of questionable validity to be constructed. For
example, many FORTRAN specifications permit a statement of the form

L GOTO L

within a "valid" F O R T R A N program. However, the specification of a super-


set of the truly valid programs is often much simpler than the specification
of all and only those programs which we would consider valid in the narrow-
est sense of the word.
The third and most difficult aspect of language specification is defining
the meaning of each valid program. Several approaches to this problem have
been taken. One method is to define a mapping which associates with each
valid program a sentence in a language whose meaning we understand. For
example, we could use functional calculus or lambda calculus as the "well-understood" language. Then we can define the meaning of a program in any
programming language in terms of an equivalent "program" in functional
calculus or lambda calculus. By equivalent program, we mean one which
defines the same function.
Another method of giving meaning to programs is to define an idealized
machine. The meaning of a program can then be specified in terms of its
effect on this machine started off in some predetermined initial configuration.
In this scheme the abstract machine becomes an interpreter for the language.
A third approach is to ignore deep questions of "meaning" altogether,
and this is the approach we shall take here. For us, the "meaning" of a
source program is simply the output of the compiler when applied to the
source program.
In this book we shall assume that we have the specification of a compiler
as a set of pairs (x, y), where x is a source language program and y is a target
language program into which x is to be translated. We shall assume that we
know what this set of pairs is beforehand, and that our main concern is
the construction of an efficient device which when given x as input will pro-
duce y as output. We shall refer to the set of pairs (x, y) as a translation.
If each x is a string over alphabet Σ and y is a string over Δ, then a translation
is merely a mapping from Σ* to Δ*.

1.1.2. Syntax and S e m a n t i c s

It is often more convenient in specifying and implementing translations to treat a translation as the composition of two simpler mappings. The first
of these relations, known as the syntactic mapping, associates with each
input (program in the source language) some structure which is the domain
for the second relation, the semantic mapping. It is not immediately apparent
that there should be any structure which will aid in the translation process,
but almost without exception, a labeled tree turns out to be a very useful
structure to place on the input. Without delving into the philosophy of why
this should be so, much of this book will be devoted to algorithms for the
efficient construction of the proper trees for input programs.
As a natural example of how tree structures are built on strings, every
English sentence can be broken down into syntactic categories which are
related by grammatical rules. For example, the sentence

"The pig is in the pen"

has a grammatical structure which is indicated by the tree of Fig. 1.1, whose
nodes are labeled by syntactic categories and whose leaves are labeled by
the terminal symbols, which, in this case, are English words.
[Fig. 1.1 Tree structure for English sentence: <sentence>(<noun phrase>(<adjective> the, <noun> pig), <verb phrase>(<verb> is, <phrase>(<preposition> in, <noun phrase>(<adjective> the, <noun> pen)))).]

Likewise, a program written in a programming language can be broken
down into syntactic components which are related by syntactic rules governing the language. For example, the string

a + b * c

may have a syntactic structure given by the tree of Fig. 1.2.† The term parsing
or syntactic analysis is given to the process of finding the syntactic structure
associated with an input sentence. The syntactic structure of a sentence is
useful in helping to understand the relationships among the various symbols
of a sentence.

[Fig. 1.2 Tree for the arithmetic expression a + b * c, with interior nodes labeled by the syntactic categories <expression>, <term>, and <factor> and with leaves a, +, b, *, and c.]

†The use of three syntactic categories, <expression>, <term>, and <factor>, rather than
just <expression>, is forced on us by our desire that the structure of an arithmetic expression be unique. The reader should bear this in mind, lest our subsequent examples of the
syntactic analysis of arithmetic expressions appear unnecessarily complicated.
The term syntax of a language will refer to a relation which associates
with each sentence of a language a syntactic structure. We can then define
a valid sentence of a language as a string of symbols which has the overall
syntactic structure of <sentence>. In the next chapter we shall discuss several
methods of rigorously defining the syntax of a language.
The second part of the translation is called the semantic mapping, in
which the structured input is mapped to an output, normally a machine
language program. The term semantics of a language will refer to a mapping
which associates with the syntactic structure of each input a string in some
language (possibly the same language) which we consider the "meaning"
of the original sentence. The specification of the semantics of a language is
a very difficult matter which has not yet been fully resolved, particularly for
natural languages, e.g., English.
Even the specification of the syntax and semantics of a programming
language is a nontrivial task. Although there are no universally applicable
methods, there are two concepts from language theory which can be used to
make up part of the description.
The first of these is the concept of a context-free grammar. Most of the
rules for describing syntactic structure can be formalized as a context-free
grammar. Moreover, a context-free grammar provides a description which
is sufficiently precise to be used as part of the specification of the compiler
itself. In Chapter 2 we shall present the relevant concepts from the theory of
context-free languages.
The second concept is the syntax-directed translation schema, which can
be used to specify mappings from one language to another. We shall study
syntax-directed translation schemata in some detail in Chapters 3 and 9.
In this book an attempt has been made to present those aspects of lan-
guage theory and other formal theories which bear on the design of pro-
gramming languages and their compilers. In some cases the impact of the
theory is only to provide a framework in which to talk about problems that
occur in compiling. In other cases the theory will provide uniform and prac-
ticable solutions to some of the design problems that occur in compiling.

BIBLIOGRAPHIC NOTES

High-level programming languages evolved in the early 1950's. At that time computers lacked floating-point arithmetic operations, so the first programming
languages were representations for floating-point arithmetic. The first major
programming language was FORTRAN, which was developed in the mid-1950's.

Several other algebraic languages were also developed at that time, but FORTRAN
emerged as the most widely used language. Since that time hundreds of high-level
programming languages have been developed. Sammet [1969] gives an account
of many of the languages in existence in the mid-1960's.
Much of the theory of programming languages and compilers has lagged
behind the practical development. A great stimulus to the theory of formal lan-
guages was the use of what is now known as Backus Naur Form (BNF) in the
syntactic definition of ALGOL 60 (Naur [1963]). This report, together with the
early work of Chomsky [1959a, 1963], stimulated the vigorous development of
the theory of formal languages during the 1960's. Much of this book presents
results from language theory which have relevance to the design and understand-
ing of language translators.
Most of the early work on language theory was concerned with the syntactic
definition of languages. The semantic definition of languages, a much more difficult
question, received less attention and even at the time of the writing of this book
was not a fully resolved matter. Two good anthologies on the formal specification
of semantics are Steel [1966] and Engeler [1971]. The IBM Vienna laboratory
definition of PL/I [Lucas and Walk, 1969] is one example of a totally formal
approach to the specification of a major programming language.
One of the more interesting developments in programming languages has been
the creation of extensible languages--languages whose syntax and semantics can
be changed within a program. One of the earliest and most commonly proposed
schemes for language extension is the macro definition. See McIlroy [1960], Leaven-
worth [1966], and Cheatham [1966], for example. Galler and Perlis [1967] have
suggested an extension scheme whereby new data types and new operators can be
introduced into ALGOL. Later developments in extensible languages are con-
tained in Christensen and Shaw [1969] and Wegbreit [1970]. ALGOL 68 is an
example of a major programming language with language extension facilities [Van
Wijngaarden, 1969].

1.2. AN OVERVIEW OF COMPILING

We shall discuss techniques and algorithms which are applicable to the design of compilers and other language-processing devices. To put these
algorithms into perspective, in this section we shall take a global picture of
the compiling process.

1.2.1. The Portions of a Compiler

Many compilers for many languages have certain processes in common.


We shall attempt to abstract the essence of some of these processes. In doing
so we shall attempt to remove from these processes as many machine-depen-
dent and operating-system-dependent considerations as possible. Although
implementation considerations are important (a bad implementation can
destroy a good algorithm), we feel that understanding the fundamental
nature of a problem is essential and will make the techniques for solution
of that problem applicable to other basically similar problems.
A source program in a programming language is nothing more than
a string of characters. A compiler ultimately converts this string of characters
into a string of bits, the object code. In this process, subprocesses with
the following names can often be identified:
(1) Lexical analysis.
(2) Bookkeeping, or symbol table operations.
(3) Parsing or syntax analysis.
(4) Code generation or translation to intermediate code (e.g. assembly
language).
(5) Code optimization.
(6) Object code generation (e.g. assembly).
In any given compiler, the order of the processes may be slightly different
from that shown, and several of the processes may be combined into a single
phase. Moreover, a compiler should not be shattered by any input it receives;
it must be capable of responding to any input string. For those input strings
which do not represent syntactically valid programs, appropriate diagnostic
messages must be given.
We shall describe the first five phases of compilation briefly. These phases
do not necessarily occur separately in an actual compiler. However, it is
often convenient to conceptually partition a compiler into these phases in
order to isolate the problems that are unique to that part of the compilation
process.

1.2.2. Lexical Analysis

The lexical analysis phase comes first. The input to the compiler and hence
the lexical analyzer is a string of symbols from an alphabet of characters.
In the reference version of PL/I for example, the terminal symbol alphabet
contains the 60 symbols

A B C ... Z $ @ #
0 1 2 ... 9 _
blank
= + - * / ( ) , . ; ' " & | ¬ > < ? %
In a program, certain combinations of symbols are often treated as a
single entity. Some typical examples of this would include the following:
(1) In languages such as PL/I a string of one or more blanks is normally
treated as a single blank.
(2) Certain languages have keywords such as BEGIN, END, GOTO,
DO, INTEGER, and so forth which are treated as single entities.

(3) Strings representing numerical constants are treated as single items.


(4) Identifiers used as names for variables, functions, procedures, labels,
and the like are another example of a single lexical unit in a programming
language.
It is the job of the lexical analyzer to group together certain terminal
characters into single syntactic entities, called tokens. What constitutes
a token is implied by the specification of the programming language. A
token is a string of terminal symbols, with which we associate a lexical struc-
ture consisting of a pair of the form (token type, data). The first component
is a syntactic category such as "constant" or "identifier," and the second
component is a pointer to data that have been accumulated about this par-
ticular token. For a given language the number of token types will be pre-
sumed finite. We shall call the pair (token type, data) a "token" also, when
there is no source of confusion.
Thus the lexical analyzer is a translator whose input is the string of
symbols representing the source program and whose output is a stream of
tokens. This output forms the input to the syntactic analyzer.

Example 1.1

Consider the following assignment statement from a FORTRAN-like
language:

COST = (PRICE + TAX) * 0.98

The lexical analysis phase would find COST, PRICE, and TAX to be tokens
of type identifier and 0.98 to be a token of type constant. The characters
=, (, +, ), and * are tokens by themselves. Let us assume that all constants
and identifiers are to be mapped into tokens of the type <id>. We assume
that the data component of a token is a pointer, an entry to a table contain-
ing the actual name of the identifier together with other data we have col-
lected about that particular identifier. The first component of a token is used
by the syntactic analyzer for parsing. The second component is used by the
code generation phase to produce appropriate machine code.
Thus the output of the lexical analyzer operating directly on our input
string would be the following sequence of tokens:

<id>1 = ( <id>2 + <id>3 ) * <id>4

Here we have indicated the data pointer of a token by means of a subscript.


The symbols =, (, +, ), and * are to be construed as tokens whose token type
is represented by themselves. They have no associated data, and hence we
indicate no data pointer for them. □
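As a rough illustration of this phase, the following Python sketch (our own simplification, not the book's) scans an assignment statement like the one above and emits (token type, data) pairs, entering identifiers and constants into a table so that the data component is a pointer (index) into that table.

import re

def lex(source):
    """A toy direct lexical analyzer for statements like COST = (PRICE + TAX) * 0.98."""
    table = []                                    # identifier/constant table
    tokens = []
    pattern = re.compile(r"\s*(?:(\d+\.?\d*)|([A-Za-z][A-Za-z0-9]*)|([=+*()]))")
    pos = 0
    while pos < len(source):
        m = pattern.match(source, pos)
        if not m:
            raise ValueError(f"no token at position {pos}")
        constant, identifier, operator = m.groups()
        if operator:
            tokens.append((operator, None))       # the symbol is its own token type
        else:
            name = constant or identifier
            if name not in table:
                table.append(name)
            kind = "constant" if constant else "id"
            tokens.append((kind, table.index(name) + 1))
        pos = m.end()
    return tokens, table

# lex("COST = (PRICE + TAX) * 0.98") yields the tokens
# [("id",1), ("=",None), ("(",None), ("id",2), ("+",None), ("id",3),
#  (")",None), ("*",None), ("constant",4)] and a four-entry table.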

Lexical analysis is easy if tokens of more than one character are isolated
by characters which are tokens themselves. In the example above, =, (, +,
and * cannot appear as part of an identifier, so COST, PRICE, and TAX
can be readily distinguished as tokens.
However, lexical analysis may not be so easy in general. For example,
consider the following valid FORTRAN statements:

(1) DO 10 I = 1.15
(2) DO 10 I = 1,15

In statement (1) DO10I is a variable† and 1.15 a constant. In statement (2)
DO is a keyword, 10 a constant, and I a variable; 1 and 15 are constants.
If a lexical analyzer were implemented as a coroutine [Gentleman, 1971;
McIlroy, 1968] and were to start at the beginning of one of these statements,
with a command such as "find the next token," it could not determine if that
token was DO or DO10I until it had reached the comma or decimal point.
Thus a lexical analyzer may need to look ahead of the token it is actually
interested in. A worse example occurs in PL/I, where keywords may also be
variables. Upon seeing an input string of the form

DECLARE (X1, X2, ..., Xn)
the lexical analyzer would have no way of telling whether DECLARE was
a function identifier and X1, X2, ..., Xn were its arguments or whether
DECLARE was a keyword causing the identifiers X1, X2, ..., Xn to have
the attribute (or attributes) immediately following the right parenthesis.
Here the distinction would have to be made on what follows the right paren-
thesis. But since n can be arbitrarily large,‡ the PL/I lexical analyzer might
have to look ahead an arbitrary distance. However, there is another approach
to lexical analysis that is less convenient but avoids the problem of arbitrary
lookahead.
We shall define two extreme approaches to lexical analysis. Most tech-
niques in use fall into one or the other of these categories and some are
a combination of the two:
(1) A lexical analyzer is said to operate directly if, given a string of input
text and a pointer into that text, the analyzer will determine the token imme-
diately to the right of the place pointed to and move the pointer to the right
of the portion of text forming the token.
(2) A lexical analyzer is said to operate indirectly if, given a string of
text, a pointer into that text, and a token type, it will determine if input

characters appearing immediately to the right of the pointer form a token
of that type. If so, the pointer is moved to the right of the portion of text
forming that token.

†Recall that in FORTRAN blanks are ignored.

‡The language specification does not impose an upper limit on n. However, a given
PL/I compiler will.

Example 1.2
Consider the FORTRAN text

DO 10 I = 1,15

with the pointer currently at the left end. An indirect lexical analyzer would
respond "yes" if asked for a token of type DO or a token of type (identifier).
In the former case, the pointer would be moved two symbols to the right,
and in the latter case, five symbols to the right.
A direct lexical analyzer would examine the text up to the comma and
conclude that the next token was of type DO. The pointer would then move
two symbols to the right, although many more symbols would be scanned in
the process. □
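The indirect interface can be sketched as a predicate that is asked for one token type at a time (again a toy formulation of our own; FORTRAN's blank conventions are not handled):

import re

def match_token(text, pos, token_type, patterns):
    """Indirect lexical analysis: does a token of token_type begin at pos?

    patterns maps each token type to a regular expression (our own toy
    convention).  Returns the new pointer position if the token is present,
    or None if it is not.
    """
    m = re.compile(patterns[token_type]).match(text, pos)
    return m.end() if m else None

# A parser may ask first whether a DO token is present and, failing that,
# whether an identifier is; the pointer moves only on a "yes" answer.
patterns = {"DO": r"DO\b", "identifier": r"[A-Z][A-Z0-9]*"}
print(match_token("DO 10 I = 1,15", 0, "DO", patterns))          # 2
print(match_token("DO 10 I = 1,15", 0, "identifier", patterns))  # 2 (blanks not skipped here)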

Generally, we shall describe parsing algorithms under the assumption that lexical analysis is direct. The backtrack or "nondeterministic" parsing
algorithms can be used with indirect lexical analysis. We shall include a dis-
cussion of this type of parsing in Chapters 4 and 6.

1.2.3. Bookkeeping

As tokens are uncovered in lexical analysis, information about certain tokens is collected and stored in one or more tables. What this information
is depends on the language. In the FORTRAN example we would want to
know that COST, PRICE, and TAX were floating-point variables and 0.98
a floating-point constant.
Assuming that COST, PRICE, and TAX have not been declared in
a type statement, this information about these variables can be gleaned from
the fact that COST, PRICE, and TAX begin with letters other than I, J, K,
L, M, or N.
As another example of collecting information about variables, consider a FORTRAN dimension statement of the form

DIMENSION A(10,20)

On encountering this statement, we would have to store information that A is an identifier which is the name of a two-dimensional array whose size is
10 by 20.
In complex languages such as PL/I, the number of facts which might be
stored about a given variable is quite large--on the order of a dozen or so.
Let us consider a somewhat simplified example of a table in which information about identifiers is stored. Such a table is often called a symbol table.
The table will list all identifiers together with the relevant information con-
cerning each identifier.
Suppose that we encounter the statement

COST = (PRICE + TAX) • 0.98

After this statement, the table might appear as follows:

Entry Identifier Information

1 COST Floating-point variable


2 PRICE Floating-point variable
3 TAX Floating-point variable
4 0.98 Floating-point constant

On encountering a future identifier in the input stream, this table would be consulted to see whether that identifier has already appeared. If it has,
then the data portion of the token for that identifier is made equal to the
entry number of the original occurrence of the variable with that name.
For example, if a succeeding statement in the FORTRAN program contained the variable COST, then the token for the second occurrence of COST
would be <id>1, the same as the token for the first occurrence of COST.
Thus such a table must simultaneously allow for
(1) The rapid addition of new identifiers and new items of information,
and
(2) The rapid retrieval of information for a given identifier.
The method of data storage usually used is the hash (or scatter) table,
which will be discussed in Chapter 10 (Volume II).
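A minimal sketch of such a bookkeeping table in Python follows; it leans on Python's built-in hashed dictionary rather than spelling out a hash table, and the names are our own.

class SymbolTable:
    """Identifiers and constants, each with an entry number and attributes."""

    def __init__(self):
        self.entries = []          # entry number - 1  ->  [name, info]
        self.index = {}            # name -> entry number (hashed lookup)

    def enter(self, name, info=None):
        """Return the entry number for name, adding a new entry if needed."""
        if name not in self.index:
            self.entries.append([name, info])
            self.index[name] = len(self.entries)
        elif info is not None:
            self.entries[self.index[name] - 1][1] = info
        return self.index[name]

table = SymbolTable()
table.enter("COST", "floating-point variable")     # entry 1
table.enter("PRICE", "floating-point variable")    # entry 2
table.enter("TAX", "floating-point variable")      # entry 3
table.enter("0.98", "floating-point constant")     # entry 4
assert table.enter("COST") == 1   # a later occurrence of COST gets entry 1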

1.2.4. Parsing

As we mentioned earlier, the output of the lexical analyzer is a string of tokens. This string of tokens forms the input to the syntactic analyzer, which
examines only the first components of the token (the token types). The
information about each token (second component) is used later in the com-
piling process to generate the machine code.
Parsing, or syntax analysis, as it is sometimes known, is a process in which
the string of tokens is examined to determine whether the string obeys certain
structural conventions explicit in the syntactic definition of the language.
It is also essential in the code generation process to know what the syn-
tactic structure of a given string is. For example, the syntactic structure of
the expression A + B * C must reflect the fact that B and C are first multiplied and that then the result is added to A. No other ordering of the opera-
tions will produce the desired calculation.
Parsing is one of the best-understood phases of compilation. From a set
of syntactic rules it is possible to automatically construct parsers which will
make sure that a source program obeys the syntactic structure defined by
these syntactic rules. In Chapters 4-7 we shall study several different parsing
techniques and algorithms for generating a parser from a given grammar.
The output from the parser is a tree which represents the syntactic struc-
ture inherent in the source program. In many ways this tree structure is closely
related to the parsing diagrams we used to make for English sentences in
elementary school.

Example 1.3
Suppose that the output of the lexical analyzer is the string of tokens

(1.2.1)    <id>1 = ( <id>2 + <id>3 ) * <id>4

This string conveys the information that the following three operations are
to be performed in exactly the following way:
(1) <id>3 is to be added to <id>2,
(2) The result of (1) is to be multiplied by <id>4, and
(3) The result of (2) is to be stored in the location reserved for <id>1.
This sequence of steps can be pictorially represented in terms of a labeled
tree, as shown in Fig. 1.3.

[Fig. 1.3 Tree structure: the root n3 has direct descendants <id>1, =, and n2; node n2 has direct descendants n1, *, and <id>4; node n1 has direct descendants <id>2, +, and <id>3.]

That is, the interior nodes of the tree represent
actions which must be taken. The direct descendants of each node either
represent values to which the action is to be applied (if the node is labeled by
an identifier or is an interior node) or help to determine what the action
should be. (In particular, the =, +, and * signs do this.) Note that the parentheses in (1.2.1) do not explicitly appear in the tree, although we might want
to show them as direct descendants of n1. The role of the parentheses is only
to influence the order of computation. If they did not appear in (1.2.1), then
the usual convention that multiplication "takes precedence" over addition
would apply, and the first step would be to multiply <id>3 and <id>4. □

1.2.5. Code Generation

The tree built by the parser is used to generate a translation of the input
program. This translation could be a program in machine language, but more
often it is in an intermediate language such as assembly language or "three
address code." (The latter is a sequence of simple statements; each involves
no more than three identifiers, e.g. A = B, A = B + C, or GOTO A.)
If the compiler is to do extensive code optimization, then code of the
three address type is preferred. Since three address code does not pin com-
putations to specific computer registers, it is easier to use registers to advan-
tage when optimizing. If little or no optimization is to be done, then assembly
or even machine code is preferred as an intermediate language. We shall give
a running example of a translation into an assembly type language to illus-
trate the salient points of the translation process.
For this discussion let us assume that we have a computer with one work-
ing register (the accumulator) and assembly language instructions of the form

Instruction        Effect

LOAD m             c(m) → accumulator
ADD† m             c(accumulator) + c(m) → accumulator
MPY m              c(accumulator) * c(m) → accumulator
STORE m            c(accumulator) → m
LOAD =m            m → accumulator
ADD =m             c(accumulator) + m → accumulator
MPY =m             c(accumulator) * m → accumulator

†Let us assume that ADD and MPY refer to floating-point operations.

Here the notation c(m) → accumulator, for example, means the contents of
memory location m are to be placed in the accumulator. The expression
=m denotes the numerical value m. With these comments, the effects of
the seven instructions should be obvious.
The output of the parser is a tree (or some representation of one) which
represents the syntactic structure inherent in the string of tokens coming out
of the lexical analyzer. From this tree, and the information stored in the
symbol table, it is possible to construct the object code. In practice, tree con-
struction and code generation are often carried out simultaneously, but
conceptually it is easier to think of these two processes as occurring serially.
There are several methods for specifying how the intermediate code is to be
constructed from the syntax tree. One method which is particularly elegant
and effective is the syntax-directed translation. Here we associate with each
node n a string C(n) of intermediate code. The code for node n is constructed
by concatenating the code strings associated with the descendants of n and
other fixed strings in a fixed order. Thus translation proceeds from the bot-
tom up (i.e., from the leaves toward the root). The fixed strings and fixed
order are determined by the algorithm used. More will be said about this
in Chapters 3 and 9.
An important problem which arises is how to select the code C(n) for
each node n such that C(n) at the root is the desired code for the entire
statement. In general, some interpretation must be placed on C(n) such that
the interpretation can be uniformly applied to all situations in which node
n can appear.
For arithmetic assignment statements, the desired interpretation is fairly
natural and. will be explained in the following paragraphs. In general, the
interpretation must be specified by the compiler designer if the method of
syntax-directed translation is to be used. This task may be easy or hard, and
in difficult cases, the detailed structure of the tree may have to be adjusted
to aid in the translation process.
For a specific example, we shall describe a syntax-directed translation of
simple arithmetic expressions. We notice that in Fig. 1.3, there are three
types of interior nodes, depending on whether their middle descendant is
labeled =, +, or *. These three types of nodes are shown in Fig. 1.4, where
each unlabeled descendant shown in the figure represents an arbitrary subtree (possibly a single node).

[Fig. 1.4 Types of interior nodes (a), (b), and (c): their middle descendants are labeled =, +, and *, respectively.]

We observe that
for any arithmetic assignment statement involving only arithmetic operators
+ and *, we can construct a tree with one node of type (a) (the root) and
other interior nodes of types (b) and (c) only.
The code associated with a node n will be subject to the following inter-
pretation"

(1) If n is a node of type (a), then C(n) will be code which computes the
value of the expression on the right and stores it in the location reserved for
the identifier labeling the left descendant.
(2) If n is a node of type (b) or (c), then C(n) is code which, when preceded
by the operation code LOAD, brings to the accumulator the value of the
subtree dominated by n.

Thus, in Fig. 1.3, when preceded by LOAD, C(n1) brings to the accumulator the value of <id>2 + <id>3, and C(n2) brings to the accumulator the
value of (<id>2 + <id>3) * <id>4. C(n3) is code which brings the latter value
to the accumulator and stores it in the location of <id>1.
We must consider how to build C(n) from the code for n's descendants.
In what follows, we assume that assembly language statements are to be
generated in one string, with a semicolon or a new line separating the state-
ments. Also, we assume that assigned to each node n of the tree is a level
number l(n), which denotes the maximum length of a path from that node to
a leaf. Thus, l(n) = 0 if n is a leaf, and if n has descendants n1, ..., nk,
then l(n) = max_{1≤i≤k} l(n_i) + 1. We can compute l(n) bottom up, at the same time
that C(n) is computed. The purpose of recording levels is to control the use
of temporary stores. We must never store two needed quantities in the same
temporary location simultaneously. Figure 1.5 shows the level numbers of
each node in the tree of Fig. 1.3.
[Fig. 1.5 Level numbers for the tree of Fig. 1.3: each leaf has level 0, node n1 has level 1, node n2 has level 2, and the root n3 has level 3.]

We shall now define a syntax-directed code generation algorithm to
compute C(n) for all nodes n of a tree consisting of leaves, a root of type
(a), and interior nodes of either type (b) or type (c).

ALGORITHM 1.1
Syntax-directed translation of simple assignment statements.
Input. A labeled ordered tree representing an assignment statement
involving arithmetic operations -+- and • only. We assume that the level of
each node has been computed.
Output. Assembly language code to perform the assignment.
Method. Do steps (i) and (2) for all nodes of level 0. Then do steps (3),
(4), (5) on all nodes of level 1, then level 2, and so forth, until all nodes have
been acted upon.
(1) Suppose that n is a leaf with label <id>j.
(i) Suppose that entry j in the identifier table is a variable. Then
C(n) is the name of that variable.
(ii) Suppose that entry j in the identifier table is a constant k. Then
C(n) is '=k'.†
(2) If n is a leaf with label =, *, or +, then C(n) is the empty string.
(In this algorithm, we do not need or wish to produce an output for leaves
labeled =, *, or +.)
(3) If n is a node of type (a) and its descendants are n1, n2, and n3, then
C(n) is 'LOAD' C(n3) '; STORE' C(n1).
(4) If n is a node of type (b) and its descendants are n1, n2, and n3, then
C(n) is C(n3) '; STORE $' l(n) '; LOAD' C(n1) '; ADD $' l(n).
This sequence of instructions uses a temporary location whose name is
the character $ followed by the level number of node n. It is straightforward
to see that when this sequence is preceded by LOAD, the value finally resid-
ing in the accumulator will be the sum of the values of the expressions domi-
nated by nl and n3.
We make two comments on the choice of temporary names. First, these
names are chosen to start with $ so that they cannot be confused with the
identifier names in FORTRAN. Second, because of the way l(n) is chosen,
we can claim that C(n) contains no reference to a temporary $i if i is greater
than l(n). Thus, in particular, C(n1) contains no reference to '$' l(n). We can
thus guarantee that the value stored into '$' l(n) will still be there when it is
added to the accumulator.
(5) If all is as in (4) but node n is of type (c), then C(n) is
C(n3) '; STORE $' l(n) '; LOAD' C(n1) '; MPY $' l(n).
This code has the desired effect, with the desired result appearing in the
accumulator. □

†For emphasis, we surround with quotes those strings which represent themselves,
rather than naming a string.

We leave a proof of the correctness of Algorithm 1.1 for the Exercises.
It proceeds recursively on the height (i.e., level) of a node.
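Using the nested-tuple encoding of the tree sketched after Example 1.3, a Python rendering of Algorithm 1.1 might look as follows (the function names and the form of the identifier table are our own; the table maps entry numbers to variable names or constants).

def level(node):
    """l(n): 0 for a leaf, one more than the maximum level of the descendants."""
    if not isinstance(node, tuple) or node[0] == "id":
        return 0
    return 1 + max(level(child) for child in node)

def code(node, table):
    """C(n) of Algorithm 1.1, as a single string of ';'-separated instructions."""
    if isinstance(node, tuple) and node[0] == "id":       # rule (1): a leaf <id>j
        entry = table[node[1]]
        return entry if not entry[0].isdigit() else "=" + entry
    left, op, right = node
    if op == "=":                                         # rule (3): type (a)
        return "LOAD " + code(right, table) + "; STORE " + code(left, table)
    temp = "$" + str(level(node))                         # rules (4) and (5)
    opcode = "ADD" if op == "+" else "MPY"
    return (code(right, table) + "; STORE " + temp + "; LOAD "
            + code(left, table) + "; " + opcode + " " + temp)

table = {1: "COST", 2: "PRICE", 3: "TAX", 4: "0.98"}
n1 = (("id", 2), "+", ("id", 3))
n2 = (n1, "*", ("id", 4))
n3 = (("id", 1), "=", n2)
print(code(n3, table).replace("; ", "\n"))   # the eight instructions of (1.2.2)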

Example 1.4
Let us apply Algorithm 1.1 to the tree of Fig. 1.3. The tree given in Fig.
1.6 has the code associated with each node explicitly shown on the tree.
The nodes labeled <id>1 through <id>4 are given the associated code COST,
PRICE, TAX, and =0.98, respectively.

[Fig. 1.6 Tree with generated code.]

We are now in a position to compute C(n1). Since l(n1) = 1, the formula of rule (4) gives

C(n1) = 'TAX; STORE $1; LOAD PRICE; ADD $1'

Thus, when preceded by LOAD, C(n1) produces the sum of PRICE and
TAX in the accumulator, although it does it in an awkward way. The code
optimization process can "iron out" some of this awkwardness, or the rules
by which the object code is constructed can be elaborated to take care of
some special cases.
Next we can evaluate C(n2) using rule (5), and get
C(n2) = '=0.98; STORE $2; LOAD' C(n1) '; MPY $2'
Here, C(n1) is the string mentioned in the previous paragraph, and $2
is used as temporary, since l(n2) = 2.
We evaluate C(n3) using rule (3) and get
C(n3) = 'LOAD' C(n2) '; STORE COST'
The list of assembly language instructions (with semicolons replaced by
new lines) which form the translation of our original "COST = ..." statement is

(1.2.2)    LOAD =0.98
           STORE $2
           LOAD TAX
           STORE $1
           LOAD PRICE
           ADD $1
           MPY $2
           STORE COST                                    □

1.2.6. Code Optimization

In many situations it is desirable to have compilers produce object programs that run efficiently. Code optimization is the term generally applied to
attempts to make object programs more "efficient," e.g., faster running or
more compact.
There is a great spectrum of possibilities for code optimization. At one
extreme is true algorithm optimization. Here a compiler might attempt to
obtain some idea of the functions that are defined by the procedure specified
by the source language program. If a function is recognized, then the com-
piler might substitute a more efficient procedure to compute a given function
and generate machine code for that procedure.
Unfortunately optimization of this nature is exceedingly difficult. It is
a sad fact that there is no algorithmic way to find the shortest or fastest-
running program equivalent to a given program. In fact it can be shown in
an abstract way that there exist algorithms which can be speeded up indefinitely. That is to say, there are some recursive functions for which any given
algorithm defining that function can be made to run arbitrarily faster for
large enough inputs.
Thus the term optimization is a complete misnomer---in practice we
must be content with code improvement. Various code improvement techniques can be employed at various phases of the compilation process.


In general, what we can do is perform a sequence of transformations on
a given program in hopes of transforming the program to a more efficient
one. Such transformations must, of course, preserve the effect of the program
on the outside world. These transformations can be applied at various times
during the compilation process. For example, we can manipulate the input
program itself, the structures produced in the syntax analysis phase, or the
code produced as output of the code generation phase. In Chapter 11, we
shall discuss code optimization in more detail.
In the remainder of this section we shall discuss some transformations
which can be applied to shorten the code (1.2.2):
(1) If we assume that + is a commutative operator, then we can replace
a sequence of instructions of the form LOAD α; ADD β by the sequence
LOAD β; ADD α, for any α and β. We require, however, that there be no
transfer to the statement ADD β from anywhere in the program.
(2) Likewise, if we assume that * is a commutative operator, we can
replace LOAD α; MPY β by LOAD β; MPY α.
(3) For any α, a sequence of statements of the form STORE α; LOAD α
can be deleted, provided either that α is not subsequently used or that α
is stored into before being used again. (We can more often delete the
LOAD α statement alone; to do so, it is required only that no transfers to
the statement LOAD α occur elsewhere in the program.)
(4) The sequence LOAD α; STORE β can be deleted if it is followed by
another LOAD, provided that there is no transfer to STORE β and that
subsequent mention of β is replaced by α until, but not including, such time
as another STORE β instruction appears.
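As a sketch of how transformations (3) and (4) might be applied mechanically, here is a small Python routine of our own; it treats only straight-line code with no transfers, and it does not check the full conditions on later uses stated above.

def improve(instrs):
    """One pass of simple peephole improvements over straight-line code.

    instrs is a list of strings such as 'LOAD =0.98'.  Only rules (3) and (4)
    of the text are sketched, and only for code containing no transfers.
    """
    out = []
    i = 0
    while i < len(instrs):
        op1, arg1 = instrs[i].split(None, 1)
        if i + 1 < len(instrs):
            op2, arg2 = instrs[i + 1].split(None, 1)
            # rule (3): STORE x; LOAD x  ->  deleted, if x is a dead temporary
            if op1 == "STORE" and op2 == "LOAD" and arg1 == arg2 \
               and arg1.startswith("$"):
                i += 2
                continue
            # rule (4): LOAD x; STORE t, followed by another LOAD  ->
            # delete the pair and use x in place of the temporary t afterward
            if op1 == "LOAD" and op2 == "STORE" and arg2.startswith("$") \
               and i + 2 < len(instrs) and instrs[i + 2].startswith("LOAD"):
                rest = [s.replace(arg2, arg1) for s in instrs[i + 2:]]
                return out + improve(rest)
        out.append(instrs[i])
        i += 1
    return out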

Example 1.5
These four transformations have been selected for their applicability to
(1.2.2). In general there would be a large set of transformations, and they
would be tried in various combinations. In (1.2.2), we notice that rule (1)
applies to LOAD PRICE; ADD $1, and we can, on speculation, temporarily replace these instructions by LOAD $1; ADD PRICE, obtaining
the code

(1.2.3)    LOAD =0.98
           STORE $2
           LOAD TAX
           STORE $1
           LOAD $1
           ADD PRICE
           MPY $2
           STORE COST

We now observe that in (1.2.3), the sequence STORE $1; LOAD $1 can be deleted by rule (3). Thus we obtain the code†

(1.2.4)    LOAD =0.98
           STORE $2
           LOAD TAX
           ADD PRICE
           MPY $2
           STORE COST

We can now apply rule (4) to the sequence LOAD =0.98; STORE $2.
These two instructions are deleted and $2 in the instruction MPY $2 is
replaced by MPY =0.98. The final code is

(1.2.5)    LOAD TAX
           ADD PRICE
           MPY =0.98
           STORE COST

The code of (1.2.5) is the shortest that can be obtained using our four
transformations and is the shortest under any set of reasonable transfor-
mations. □

1.2.7. Error Analysis and Recovery

We have so far assumed that the input to the compiler is a well-formed program and that each phase of compiling can be carried out in a way that
makes sense. In practice, this will not be the case in many compilations.
Programming is still much an art, and there is ample opportunity for various
kinds of bugs to creep into most programs. Even if we feel that we have
understood the problem for which we are writing a program, and even if
we have chosen the proper algorithm to solve the problem, we often cannot
be sure that the program we have written faithfully executes the algorithm
it should perform.
A compiler has an opportunity to detect errors in a program in at least
three of the phases of compilation--lexical analysis, syntactic analysis, and
code generation. When an error is encountered, it is a difficult job, bordering
on an application of "artificial intelligence," for the compiler to be able to
look at an arbitrary faulty program and tell what was probably meant. How-
ever, in certain cases, it is easy to make a good guess. For example, if the
source statement A = B , 2C is seen, there is a high likelihood that
A = B • 2 • C was meant.

†A similar simplification could be obtained using rule (4) directly. However, we are
trying to give some examples of how different types of transformations can be used.

In general, when the compiler comes to a point in the input stream where
it cannot continue producing a valid parse, some compilers attempt to make
a "minimal" change in the input in order for the parse to proceed. Some
possible changes are
(1) Alteration of a single character. For example, if the parser is given
"identifier" INTEJER by the lexical analyzer and it is not proper for an
identifier to appear at this point in the program, the parser may guess that
the keyword INTEGER was meant.
(2) Insertion of a single token. For example, the parser can replace 2C
by 2 * C. (2 + C would do as well, but in this case, we "know" that 2 * C
is more likely.)
(3) Deletion of a single token. For example, a comma is often incorrectly
inserted after the 10 in a FORTRAN statement such as DO 10 I = 1, 20.
(4) Simple permutation of tokens. For example, INTEGER I might be
written incorrectly as I INTEGER.
In many programming languages, statements are easily identified. If it
becomes hopeless to parse a particular (ill-formed) statement, even after
applying changes such as those above, it is often possible to ignore the state-
ment completely and continue parsing as though this ill-formed statement
did not appear.
In general, however, there is very little of a mathematical nature known
about error recovery algorithms and algorithms to generate "good" diag-
nostics. In Chapters 4 and 5, we shall discuss certain parsing algorithms, LL,
LR, and Earley's algorithm, which have the property that as soon as the
input stream is such that there is no possible following sequence which
could make a well-formed input, the algorithms announce this fact. This
property is useful in error recovery and analysis, but some parsing algorithms
discussed do not possess it.

1.2.8. Summary

Our conceptual model of a compiler is summarized in Fig. 1.7. The code optimization phase is shown occurring after the code generation phase, but
as we remarked earlier, various attempts at code optimization can be per-
formed throughout the compiler.
An error analysis and recovery procedure can be called from the lexical
analysis phase, syntactic analysis phase, or code generation phase, and if
the recovery is successful, control is returned to the phase from which the
error recovery procedure was called. Errors in which no token appears at
some point in the input stream are detected during lexical analysis. Errors
in which the input can be broken into tokens but no tree structure can be
placed on these tokens are detected during syntactic analysis. Finally, errors
in which the input has a syntactic structure, but no meaningful code can be

[Figure 1.7 appears here, showing the source program, the object program,
and boxes for bookkeeping and error analysis.]

Fig. 1.7 Model of a compiler.

generated from this structure, are detected during code generation. An exam-
ple of this situation would be a variable used without declaration. The parser
ignores the data component of tokens and so could not detect this error.
The symbol tables (bookkeeping) are produced in the lexical analysis
process and in some situations also during syntactic analysis when, say,
attributes and the identifiers to which they refer are connected in the tree
structure being formed. These tables are used in the code generation phase
and possibly in the assembly phase of compilation.
A final phase, which we refer to as assembly, is shown in Fig. 1.7. In this
phase the intermediate code is processed to produce the final machine lan-
guage representation of the object program. Some compilers may produce
machine language code directly as the result of code generation, so that the
assembly phase may not be explicitly present.
The model of a compiler we have portrayed in Fig. 1.7 is a first-order
approximation to a real compiler. For example, some compilers are designed
to operate using a very small amount of storage and as a consequence may

consist of a large number of phases which are called upon successively to
gradually change a source program into an object program.
Our goal is not to tabulate all possible ways in which compilers have
been built. Rather we are interested in studying the fundamental problems
that arise in the design of compilers and other language-processing devices.

EXERCISES

"1.2.1. Describe the syntax and semantics of a F O R T R A N assignment state-


ment.
"1.2.2. Can your favorite programming language be used to define any recur-
sively enumerable set ? Will a given compiler necessarily compile the
resulting program ?
1.2.3. Give an example of a F O R T R A N program which is syntactically well
formed but which does not define an algorithm.
*'1.2.4. What is the maximum lookahead needed for the direct lexical analysis
of F O R T R A N ? By lookahead is meant the number of symbols which
are scanned by the analyzer but do not form part of the token found.
*'1.2.5. What is the maximum lookahead needed for the direct lexical analysis
of ALGOL 60? You may assume that superfluous blanks and end of
card markers have been deleted.
1.2.6. Parse the statement X = A * B + C * D using a tree with interior nodes
of the forms shown in Fig. 1.4. Hint: Recall that, conventionally, mul-
tiplications are performed before additions in the absence of parentheses.
1.2.7. Parse the statement X = A * (B + C) * D, as in Exercise 1.2.6. Hint:
When several operands are multiplied together, we assume that order
of multiplication is unimportant.† Choose any order you like.
1.2.8. Use the rules of code generation developed in Section 1.2.5 to translate
the parse trees of Exercises 1.2.6 and 1.2.7 in a syntax-directed way.
"1.2.9. Does the transformation of LOAD 0c; STORE fl; LOAD ~,; STORE
into LOAD ~,; STORE t~; LOAD ~; STORE fl preserve the input-
output relation of programs? If not, what restrictions must be placed
on identifiers ~, fl, ~,, ~ ? We assume that no transfers into the interior
of the sequence occur.
1.2.10. Give some transformations on assembly code which preserve the input-
output relation of programs.
"1.2.11. Construct a syntax-directed translation for arithmetic assignment state-
ments involving + and • which will, in particular, map the parse of
Fig. 1.3 directly into the assembly code (1.2.5).

†Strictly speaking, order may be important due to overflow and/or rounding.



"1.2.12. Design a syntax-directed translation scheme which will generate object


code for expressions involving both real and integer arithmetic. Assume
that the type of each identifier is known, and that the result of operating
on a real and an integer is a real.
"1.2.13. Prove that Algorithm 1.1 operates correctly. You must first define when
an input assignment statement and output assembly code are equivalent.

Research Problem
There are many research areas and open problems concerned with compiling
and translation of algorithms. These will be mentioned in more appropriate chap-
ters. However, we mention one here, because this area will not be treated in any
detail in the book.
1.2.14. Develop techniques for proving compilers correct. Some work has been
done in this area and in the more general area of proving programs
and/or algorithms correct. (See the following Bibliographic Notes.)
However, it is clear that more work in the area is needed.
An entirely different approach to the problem of producing reliable
compilers is to develop theory applicable to their empirical testing.
That is, we assume we "know" our compiling algorithms to be correct.
We want to test whether a particular program implements them correctly.
In the first approach, above, one would attempt to prove the equivalence
of the written program and abstract compiling algorithm. The second
approach suggested is to devise a finite set of inputs to the compiler
such that if these are compiled correctly, one can say with reasonable
certainty (say a 99% confidence level) that the compiler program has
no bugs. Apparently, one would have to make some assumption about
the frequency and nature of programming errors in the compiler pro-
gram itself.

BIBLIOGRAPHIC NOTES

The development of compilers and compiling techniques paralleled that of
programming languages. The first FORTRAN compiler was designed to produce
efficient object code [Backus et al., 1957]. Numerous compilers have been written
since, and several new compiling techniques have emerged. The greatest strides
have occurred in lexical and syntactic analysis and in some understanding of code
generation techniques.
There are a large number of papers in the literature relating to compiler design.
We shall not attempt to mention all these sources here. Comprehensive surveys
of the history of compilers and compiler development can be found in Rosen
[1967], Feldman and Gries [1968], and Cocke and Schwartz [1970]. Several books
that describe compiler construction techniques are Randell and Russell [1964],
McKeeman et al., [1970], Cocke and Schwartz [1970], and Gries [1971]. Hopgood
[1969] gives a brief but readable survey of compiling techniques. An elementary
discussion of compilers is given in Lee [1967].

Several compilers have been written which emphasize comprehensive error
diagnostics, such as DITRAN [Moulton and Muller, 1967] and IITRAN [Dewar
et al., 1969]. Also, a few compilers have been written which attempt to correct each
error encountered and to execute the object program no matter how many errors
have been encountered. The philosophy here is to continue compilation and
execution in spite of errors, in an effort to uncover as many errors as possible.
Examples of such compilers are CORC [Conway and Maxwell, 1963, and Freeman,
1964], CUPL [Conway and Maxwell, 1968], and PL/C [Conway et al., 1970].
Spelling mistakes are a frequent source of errors in programs. Freeman [1964]
and Morgan [1970] describe some techniques they have found effective in cor-
recting spelling errors in programs.
A general survey of error recovery in compiling can be found in Elspas et al.
[1971].
Some work on providing the theoretical foundations for proving that com-
pilers work correctly is reported in McCarthy [1963], McCarthy and Painter [1967],
Painter [1970], and Floyd [1967a].
The implementation of a compiler is a task that involves a considerable amount
of effort. A large number of programming systems called compiler-compilers have
been developed in an attempt to make the implementation of compilers a less
onerous task. Brooker and Morris [1963], Cheatham [1965], Cheatham and
Standish [1970], Ingerman [1966], Irons [1963b], Feldman [1966], McClure [1965],
McKeeman et al. [1970], Reynolds [1965], Schorre [1964], and Warshall and
Shapiro [1964] are just a few of the many references on this subject. A compiler-
compiler can be simply viewed as a programming language in which a source
program is the description of a compiler for some language and the object program
is the compiler for that language.
As such, the source program for a compiler-compiler is merely a formalism for
describing a compiler. Consequently, the source program must contain explicitly
or implicitly, a description of the lexical analyzer, the syntactic analyzer, the code
generator, and the various other phases of the compiler to be constructed. The
compiler-compiler is an attempt at providing an environment in which these de-
scriptions can be easily written down.
Several compiler-compilers provide some variant of a syntax-directed transla-
tion scheme for the specification of a compiler, and some also provide an auto-
matic parsing mechanism. TMG [McClure, 1965] is a prime example of this type
of system. Other compiler-compilers, such as TGS [Cheatham, 1965] for example,
instead provide an elaborate high-level language in which to describe the various
algorithms that go into the making of a compiler. Feldman and Gries [1968] have
provided a comprehensive survey of compiler-compilers.

1.3. OTHER APPLICATIONS OF PARSING AND
TRANSLATING ALGORITHMS

In this section we shall mention two areas, other than compiling, in which
hierarchical structures such as those found in parsing and translating
algorithms can play a major role. These are the areas of natural language
translation and pattern recognition.

1.3.1. Natural Languages

It would seem that text in a natural language could be translated, either
to another natural language or to machine language (if the sentences
described a procedure), exactly as programming languages are translated.
Problems first appear in the parsing phase, however. Computer languages
are precisely defined (with occasional exceptions, of course), and the struc-
ture of statements can be easily discerned. The usual model of the structure
of statements is a tree, as described in Section 1.2.4.
Natural languages, first of all, are afflicted with both syntactic and seman-
tic ambiguities. To take English as the obvious example of a natural language,
the sentence "I have drawn butter" has at least two meanings, depending on
whether "drawn" is an adjective or part of the verb of the sentence. Thus
it is impossible always to produce a unique parse tree for an English sentence,
especially if the sentence is treated outside of the context in which it appears.
A more difficult problem concerning natural languages is that the words,
i.e., terminal symbols of the language, relate to other words in the sentence,
to words outside the sentence, and possibly to the general environment itself. Thus the
simple tree structure is not always sufficient to describe all the information
about English sentences that one would wish to have around when transla-
tion (the analog of code generation for programming languages) occurred.
For a commonly used example, the noun "pen" is really at least two
different nouns which we might refer to as "fountain pen" and "pig pen."
We might wish to translate English into some language in which "fountain
pen" and "pig pen" are distinct words. If we were given the sentence "This
pen leaks" to translate, it seems clear that "fountain pen" is correct. How-
ever, if the sentence were taken from the report "Preventing Nose Colds in
Hogs," we might want to reconsider our decision.
The point to be made is that the meaning and structure of an English
sentence can be determined only by examining its total environment: the
surrounding sentences, physical information (i.e., "Put the pen in the glass"
refers to "fountain pen" because a pig pen won't fit in a glass), and even the
nature of the speaker or writer (i.e., what does "This pen leaks" mean if
the speaker is a convict?).
To describe in more detail the information that can be gleaned from
natural language sentences, linguists use structure systems that are more
complicated than the tree structures sufficient for programming languages.
Many of these efforts fall under the heading of context-sensitive grammars
and transformational grammars. We shall not cover either theory in detail,
although context-sensitive grammars are defined in the next chapter, and
a rudimentary form of transformational grammar can be discussed as a gen-
eralized form of syntax-directed translation on trees. This notion will be
mentioned in Chapter 9. The bibliographic notes for this section include
some places to look for more information on natural language parsing.

1.3.2. Structural Description of Patterns

Certain important sets of patterns have natural descriptions that lend
themselves to a form of syntactic analysis. For example, Shaw [1970] analyzed
cloud chamber photographs by putting a tree structure on relevant lines and
curves appearing therein. We shall here describe a particularly appealing
way of defining sets of graphs, called "web grammars" [Pfaltz and Rosenfeld,
1969]. While a complete description of web grammars would require knowl-
edge of Section 2.1, we can give a simple example here to illustrate the essen-
tial ideas.

Example 1.6
Our example concerns graphs called "d-charts,"† which can be thought
of as the flow charts for a programming language whose programs are
defined by the following rules:
(1) A simple assignment statement is a program.
(2) If S1 and S2 are programs, then so is S1; S2.
(3) If S1 and S2 are programs and A is a predicate, then

    if A then S1 else S2 end

is a program.
(4) If S is a program and A a predicate, then

    while A do S end

is a program.
We can write flow charts for all such programs, where the nodes (blocks)
of the flow chart represent code either to test a predicate or perform a simple
assignment statement. All the d-charts can be constructed by beginning with
a single node, representing a program, and repeatedly replacing nodes repre-
senting programs by one of the three structures shown in Fig. 1.8. These
replacement rules correspond to rules (2), (3), and (4) above, respectively.
The rules for connecting these structures to the rest of the graph are
the following. Suppose that node n0 is replaced by the structure of Fig.
1.8(a), (b), or (c).
(1) Edges entering n0 now enter n1, n3, or n6, respectively.
(2) An edge from n0 to node n is replaced by an edge from n2 to n in
Fig. 1.8(a), by edges from both n4 and n5 to n in Fig. 1.8(b), and by an edge
from n6 to n in Fig. 1.8(c).
Nodes n3 and n6 represent predicate tests and may not be further replaced.
The other nodes represent programs and may be further replaced.
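The recursive rules (1)-(4), together with the replacement rules just given,
can be imitated directly in a program. The sketch below is not from the text;
the tuple tags and the function build are assumptions introduced only for
illustration (in Python). It constructs the nodes and edges of the flow-chart
fragment for a program, returning the entry node and the list of nodes whose
outgoing edges leave the fragment.

# A minimal sketch, not from the text, of d-chart construction.
# Programs are nested tuples:
#   ("assign", name)        rule (1)
#   ("seq", S1, S2)         rule (2)
#   ("if", A, S1, S2)       rule (3)
#   ("while", A, S)         rule (4)
def build(prog, nodes=None, edges=None):
    if nodes is None:
        nodes, edges = [], []
    def new(label):
        nodes.append(label)
        return len(nodes) - 1
    kind = prog[0]
    if kind == "assign":
        n = new(prog[1])
        return nodes, edges, n, [n]
    if kind == "seq":                       # structure of Fig. 1.8(a)
        _, _, e1, x1 = build(prog[1], nodes, edges)
        _, _, e2, x2 = build(prog[2], nodes, edges)
        edges.extend((x, e2) for x in x1)
        return nodes, edges, e1, x2
    if kind == "if":                        # structure of Fig. 1.8(b)
        p = new(prog[1])                    # predicate test node
        _, _, e1, x1 = build(prog[2], nodes, edges)
        _, _, e2, x2 = build(prog[3], nodes, edges)
        edges.extend([(p, e1), (p, e2)])
        return nodes, edges, p, x1 + x2
    if kind == "while":                     # structure of Fig. 1.8(c)
        p = new(prog[1])
        _, _, e1, x1 = build(prog[2], nodes, edges)
        edges.append((p, e1))
        edges.extend((x, p) for x in x1)    # loop back to the test
        return nodes, edges, p, [p]
    raise ValueError("unknown program form")

# Example: if B1 then S1 else while B2 do S2 end end
nodes, edges, entry, exits = build(("if", "B1", ("assign", "S1"),
                                    ("while", "B2", ("assign", "S2"))))
print(nodes)    # ['B1', 'S1', 'B2', 'S2']
print(edges)    # [(2, 3), (3, 2), (0, 1), (0, 2)]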

†The d honors E. Dijkstra.



[Figure 1.8 appears here. Its three panels (a), (b), and (c) show the node
structures, built from nodes n1 through n7, that replace a program node.]

Fig. 1.8 Structures representing subprograms in a d-chart.

Let us build the d-chart which corresponds, in a sense, to a program of
the form

    if B1 then
        while B2 do
            if B3 then S1 else S2 end end;
        S3
    else if B4 then
        S4;
        S5;
        while B5 do S6 end
    else S7 end end

The entire program is of the form if B1 then S8 else S9 end, where
S8 represents everything from the first while to S3 and S9 represents if
B4 ... S7 end. We can also show this analysis by replacing a single node
by the structure of Fig. 1.8(b).
Continuing the analysis, S8 is of the form S10; S3, where S10 is
while B2 ... S2 end end. Thus we can reflect this analysis by replacing node
n4 of Fig. 1.8(b) by Fig. 1.8(a). The result is shown in Fig. 1.9(a). Then,
we see S10 is of the form while B2 do S11 end, where S11 is if B3 ... S2 end.
We can thus replace the left direct descendant of the root in Fig. 1.9(a) by
the structure of Fig. 1.8(c). The result is shown in Fig. 1.9(b).
The result of analyzing the program in this way is shown in Fig. 1.10.
Here, we have taken the liberty of drawing contours around each node that
is replaced and of making sure that all subsequent replacements remain
inside the contour. Thus we can place a natural tree structure on the d-chart
by representing nodes of the d-chart as leaves, and contours by interior
nodes. Nodes have as direct ancestor the contour most closely including
what they represent. The tree structure is shown in Fig. 1.11. Nodes are
labeled by the node or contour they represent.

[Figure 1.9, panels (a) and (b), appears here.]

Fig. 1.9 Constructing a d-chart.

[Figure 1.10 appears here: the complete d-chart, with contours drawn
around the replaced nodes.]

Fig. 1.10 Complete d-chart.

In a sense, the example above is a fraud. The d-chart structure, as reflected
in Fig. 1.11, is essentially the same structure that the parsing phase of a com-
piler would place on the original program. Thus it appears that we are dis-
cussing the same kind of syntax analysis as in Section 1.2.3. However, it
should be borne in mind that this kind of structural analysis can be done
without reference to the program, looking only at the d-chart. Moreover,
while we used this example of a web grammar because of its relation to pro-
gramming languages, there are many purely graph-theoretic notions that

[Figure 1.11 appears here: a tree whose interior nodes represent the contours
of Fig. 1.10 and whose leaves are the predicate and statement nodes B1
through B5 and S1 through S7.]

Fig. 1.11 Tree describing d-chart structure.

can be defined using web grammars (suitably generalized from Example 1.6),
for example, the class of planar graphs or the class of binary trees.

BIBLIOGRAPHIC NOTES

Chomsky [1965] gives a good treatment of the difficulties in trying to find a
satisfactory grammatical model for English. Bobrow [1963] surveys efforts at using
English or, more accurately, some subset of English as a programming language.
Bar-Hillel [1964] surveys theoretical aspects of linguistics.
The notion of a web grammar is from Pfaltz and Rosenfeld [1969] and the
theory was extended in Montanari [1970] and Pavlidis [1972]. Some of the original
work in syntactic analysis of patterns is due to Shaw [1970]. A survey of results
in this area can be found in Miller and Shaw [1968].
2  ELEMENTS OF LANGUAGE THEORY

In this chapter we shall present those aspects of formal language theory
which are relevant to parsing and translation. Initially, we shall concentrate
on the syntactic aspects of language. As most of the syntax of modern pro-
gramming languages can be described by means of a context-free grammar,
we shall focus our attention on the theory of context-free languages.
We shall first study an important subclass of the context-free languages,
namely the regular sets. Concepts from the theory of regular sets have wide-
spread application and pervade much of the material of this book.
Another important class of languages is the deterministic context-free
languages. These are context-free languages that have grammars which are
easily parsed, and fortunately, or by intent, modern programming languages
can be viewed as deterministic context-free languages with good, although
not complete, precision.
These three classes of languages, the context-free, regular, and deter-
ministic context-free, will be defined and some of their principal properties
given. Since the theory of languages encompasses an enormous body of
material, and since not all of it is relevant to parsing and translation, some
important theorems of language theory are here proved in a very sketchy
way or relegated to the Exercises. We try to emphasize only those aspects of
language theory which are useful in the development of this book.
As in Chapters 0 and 1, we invite the reader who has been introduced to
the theory of languages to skip or skim this chapter.

2.1. REPRESENTATIONS FOR LANGUAGES

In this section, we shall discuss from a general point of view the two
principal methods of defining languages--the generator and the recognizer.


We shall discuss only the most common kind of generator, the Chomsky
grammar. We treat recognizers in somewhat greater generality, and in sub-
sequent sections we shall introduce some of the great variety of recognizers
that have been studied.

2.1.1. Motivation

Our definition of a language L is a set of finite-length strings over some
finite alphabet Σ. The first important question is how to represent L when
L is infinite. Certainly, if L consisted of a finite number of strings, then one
obvious way would be to list all the strings in L.
However, for many languages it is not possible (or perhaps not desirable)
to put an upper bound on the length of the longest string in that language.
Consequently, in many cases it is reasonable to consider languages which
contain arbitrarily many strings. Obviously, languages of this nature cannot
be specified by an exhaustive enumeration of the sentences of the language,
and some other representation must be sought. Invariably, we want our
specification of a language to be of finite size, although the language being
specified may not be finite.
There are several methods of specification which fulfill this requirement.
One method is to use a generative system, called a grammar. Each sentence
in the language can be constructed by well-defined methods, using the rules
(usually called productions) of the grammar. One advantage of defining
a language by means of a grammar is that the operations of parsing and
translation are often made simpler by the structure imparted to the sentences
of the language by the grammar. We shall treat grammars, particularly
the "context-free" grammars, in detail.
A second method for language specification is to use a procedure which
when presented with an arbitrary input string will halt and answer "yes"
after a finite amount of computation if that string is in the language. In the
most general case, we could allow the procedure to either halt and answer
"no" or to continue operating forever if the string under consideration
were not in the language. In practical situations, however, we must insist that
the procedure be an algorithm, so that it will halt for all inputs.
We shall use a somewhat stylized device to represent procedures for
defining languages. This device, called a recognizer, will be introduced in
Section 2.1.4.

2.1.2. Grammars

Grammars are probably the most important class of generators of lan-
guages. A grammar is a mathematical system for defining a language, as
well as a device for giving the sentences in the language a useful structure.
In this section we shall look at a class of grammars called Chomsky gram-
mars, or sometimes phrase structure grammars.

A grammar for a language L uses two finite disjoint sets of symbols.
These are the set of nonterminal symbols, which we shall often denote by
N,† and the set of terminal symbols, which we shall denote by Σ. The set of
terminal symbols is the alphabet over which the language is defined. Non-
terminal symbols are used in the generation of words in the language in
a way which will become clear later.
The heart of a grammar is a finite set P of formation rules, or productions
as we shall call them, which describe how the sentences of the language are
to be generated. A production is merely a pair of strings, or, more precisely,
an element of (N ∪ Σ)*N(N ∪ Σ)* × (N ∪ Σ)*. That is, the first compo-
nent is any string containing at least one nonterminal, and the second com-
ponent is any string.
For example, a pair (AB, CDE) might be a production. If it is determined
that some string α can be generated (or "derived") by the grammar, and
has AB, the left side of the production, as a substring, then we can form
a new string β by replacing one instance of the substring AB in α by CDE.
We then say that β is derived by the grammar. For example, if FGABH
can be derived, then FGCDEH can also be derived. The language defined
by the grammar is the set of strings which consist only of terminals and
which can be derived starting with one particular string consisting of one
designated symbol, usually denoted S.
CONVENTION
If (α, β) is a production, we use the descriptive shorthand α → β and
refer to the production as α → β rather than (α, β).
We now give a formal definition of grammar.
DEFINITION
A grammar is a 4-tuple G = (N, Σ, P, S) where
(1) N is a finite set of nonterminal symbols (sometimes called variables or
syntactic categories).
(2) Σ is a finite set of terminal symbols, disjoint from N.
(3) P is a finite subset of

    (N ∪ Σ)*N(N ∪ Σ)* × (N ∪ Σ)*

An element (α, β) in P will be written α → β and called a production.
(4) S is a distinguished symbol in N called the sentence (or start) symbol.

Example 2.1
An example of a grammar is G1 = ({A, S}, {0, 1}, P, S), where P consists
of

†According to our convention about alphabet names, this symbol is a capital
Greek nu, although the reader will probably want to call it "en," as is more customary
anyway.

    S → 0A1
    0A → 00A1
    A → e

The nonterminal symbols are A and S and the terminal symbols are 0 and
1. □

A grammar defines a language in a recursive manner. We define a special
kind of string called a sentential form of a grammar G = (N, Σ, P, S) recur-
sively as follows:
(1) S is a sentential form.
(2) If αβγ is a sentential form and β → δ is in P, then αδγ is also a sen-
tential form.
A sentential form of G containing no nonterminal symbols is called
a sentence generated by G.
The language generated by a grammar G, denoted L(G), is the set of sen-
tences generated by G.
We shall now introduce some terminology which we shall find useful.
Let G = (N, Σ, P, S) be a grammar. We can define a relation ⇒ (to be read
as directly derives) on (N ∪ Σ)* as follows: If αβγ is a string in (N ∪ Σ)*
and β → δ is a production in P, then αβγ ⇒ αδγ. (When the grammar must
be named, we write ⇒ with the subscript G.)
We shall use ⇒+ (to be read derives in a nontrivial way) to denote the
transitive closure of ⇒, and ⇒* (to be read derives) to denote the reflexive
and transitive closure of ⇒. When it is clear which grammar we are talking
about, we shall drop the subscript G from ⇒, ⇒+, and ⇒*.
We shall also use the notation ⇒^k to denote the k-fold product of the
relation ⇒. That is to say, α ⇒^k β if there is a sequence α0, α1, ..., αk of
k + 1 strings (not necessarily distinct) such that α = α0, αi-1 ⇒ αi for
1 ≤ i ≤ k, and αk = β. This sequence of strings is called a derivation of
length k of β from α in G. Thus, L(G) = {w | w is in Σ* and S ⇒* w}. Also
notice that α ⇒* β if and only if α ⇒^i β for some i ≥ 0, and α ⇒+ β if and
only if α ⇒^i β for some i ≥ 1.

Example 2.2
Let us consider grammar G1 of Example 2.1 and the following derivation:
S ⇒ 0A1 ⇒ 00A11 ⇒ 0011. That is, in the first step, S is replaced by 0A1
according to the production S → 0A1. At the second step, 0A is replaced
by 00A1, and at the third, A is replaced by e. We may say that S ⇒^3 0011,
S ⇒+ 0011, S ⇒* 0011, and that 0011 is in L(G1). It can be shown that

    L(G1) = {0^n 1^n | n ≥ 1}

and we leave this result for the Exercises.
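Since a production may be applied to any occurrence of its left side, the
relation ⇒ is easy to simulate mechanically. The following sketch, not from
the text (written in Python), enumerates by brute force the sentences of G1
derivable within a bounded number of steps; it is one way to gain confidence
in the claim that L(G1) = {0^n 1^n | n ≥ 1}.

# A minimal sketch, not from the text: brute-force derivations in G1.
P = [("S", "0A1"), ("0A", "00A1"), ("A", "")]    # A -> e written as ""

def step(alpha):
    """All strings beta with alpha => beta (one direct derivation)."""
    out = set()
    for lhs, rhs in P:
        i = alpha.find(lhs)
        while i != -1:
            out.add(alpha[:i] + rhs + alpha[i + len(lhs):])
            i = alpha.find(lhs, i + 1)
    return out

def sentences(max_steps):
    forms, found = {"S"}, set()
    for _ in range(max_steps):
        nxt = set()
        for a in forms:
            nxt |= step(a)
        forms = nxt
        found |= {a for a in forms if "S" not in a and "A" not in a}
    return sorted(found, key=len)

print(sentences(8))   # ['01', '0011', '000111', '00001111', ...]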

CONVENTION

A notational shorthand which is quite useful for representing a set of
productions is to use

    α → β1 | β2 | ... | βn

to denote the n productions

    α → β1
    α → β2
    ...
    α → βn

We shall also use the following conventions to represent various symbols
and strings concerned with a grammar:
(1) a, b, c, and d represent terminals, as do the digits 0, 1, ..., 9.
(2) A, B, C, D, and S represent nonterminals; S represents the start
symbol.
(3) U, V, ..., Z represent either nonterminals or terminals.
(4) α, β, ... represent strings of nonterminals and terminals.
(5) u, v, ..., z represent strings of terminals only.
Subscripts and superscripts do not change these conventions. When
a symbol obeys these conventions, we shall often omit mention of the con-
vention. We can thus specify a grammar by merely listing its productions if
all terminals and nonterminals obey conventions (1) and (2). Thus grammar
G1 can be specified simply as

    S → 0A1
    0A → 00A1
    A → e

No mention of the nonterminal or terminal sets or the start symbol is neces-
sary.
We now give further examples of grammars.

Example 2.3

Let G = ({<digit>}, {0, 1, ..., 9}, {<digit> → 0 | 1 | ... | 9}, <digit>). Here
<digit> is treated as a single nonterminal symbol. L(G) is clearly the set of
the ten decimal digits. Notice that L(G) is a finite set.

Example 2.4
Let G0 = ({E, T, F}, {a, +, *, (, )}, P, E), where P consists of the produc-
tions

    E → E + T | T
    T → T * F | F
    F → (E) | a

An example of a derivation in this grammar would be

    E ⇒ E + T
      ⇒ T + T
      ⇒ F + T
      ⇒ a + T
      ⇒ a + T * F
      ⇒ a + F * F
      ⇒ a + a * F
      ⇒ a + a * a

L(G0) is the set of arithmetic expressions that can be built up using the sym-
bols a, +, *, (, and ). □

The grammar in Example 2.4 will be used repeatedly in the book and is
always referred to as G0.

Example 2.5
Let G be defined by

    S → aSBC | abC
    CB → BC
    bB → bb
    bC → bc
    cC → cc

We have the following derivation in G:

    S ⇒ aSBC
      ⇒ aabCBC
      ⇒ aabBCC
      ⇒ aabbCC
      ⇒ aabbcC
      ⇒ aabbcc

The language generated by G is {a^n b^n c^n | n ≥ 1}. □


Example 2.6
Let G be the grammar with productions

    S → CD          Ab → bA
    C → aCA         Ba → aB
    C → bCB         Bb → bB
    AD → aD         C → e
    BD → bD         D → e
    Aa → aA

An example of a derivation in G is

    S ⇒ CD
      ⇒ aCAD
      ⇒ abCBAD
      ⇒ abBAD
      ⇒ abBaD
      ⇒ abaBD
      ⇒ ababD
      ⇒ abab

We shall show that L(G) = {ww | w ∈ {a, b}*}. That is, L(G) consists of
strings of a's and b's of even length such that the first half of each string is
the same as the second half.
Since L(G) is a set, the easiest way to show that L(G) = {ww | w ∈ {a, b}*}
is to show that {ww | w ∈ {a, b}*} ⊆ L(G) and that L(G) ⊆ {ww | w ∈ {a, b}*}.

To show that {ww | w ∈ {a, b}*} ⊆ L(G) we must show that every string
of the form ww can be derived from S. By a simple inductive proof we can
show that the following derivations are possible in G:
(1) S ⇒ CD.
(2) For n ≥ 0,

    C ⇒^n c1c2 ... cn C Xn Xn-1 ... X1
      ⇒ c1c2 ... cn Xn Xn-1 ... X1

where, for 1 ≤ i ≤ n, ci = a if and only if Xi = A, and ci = b if and only
if Xi = B.
(3) Xn ... X2X1D ⇒ Xn ... X2c1D
                 ⇒^(n-1) c1Xn ... X2D
                 ⇒ c1Xn ... X3c2D
                 ⇒^(n-2) c1c2Xn ... X3D
                   ...
                 ⇒ c1c2 ... cn-1XnD
                 ⇒ c1c2 ... cn-1cnD
                 ⇒ c1c2 ... cn-1cn

The details of such an inductive proof are straightforward and will be
omitted here.
In derivation (2), C derives a string of a's and b's followed by a mirror
image string of A's and B's. In derivation (3), the A's and B's migrate to
the right end of the string, where an A becomes a and a B becomes b on con-
tact with D, which acts as a right endmarker. The only way an A or B can
be replaced by a terminal is for it to move to the right end of the string.
In this fashion the string of A's and B's is reversed and thus matches the
string of a's and b's derived from C in derivation (2).
Combining derivations (1), (2), and (3) we have for n ≥ 0

    S ⇒+ c1c2 ... cn c1c2 ... cn

where ci ∈ {a, b} for 1 ≤ i ≤ n. Thus {ww | w ∈ {a, b}*} ⊆ L(G).


We would now like to show that L(G) ⊆ {ww | w ∈ {a, b}*}. To do this
we must show that S derives terminal strings only of the form ww. In general,
to show that a grammar generates only strings of a certain form is a much
more difficult matter than showing that it generates certain strings.
At this point it is convenient to define two homomorphisms g and h
such that

g(a) = a,   g(b) = b,   g(A) = g(B) = e

and

h(a) = h(b) = e,   h(A) = A,   h(B) = B

For this grammar G we can prove by induction on m ≥ 1 that if S ⇒^m α,
then α can be written as a string c1c2 ... cn U β V such that
(1) Each ci is either a or b;
(2) U is either C or e;
(3) β is a string in {a, b, A, B}* such that, for some i, g(β) = c1c2 ... ci,

    h(β) = Xn Xn-1 ... Xi+1,

and Xj is A or B as cj is a or b, i < j ≤ n; and
(4) V is either D or e.
The details of this induction will be omitted.
We now observe that the sentential forms of G which consist entirely of
terminal symbols are all of the form c1c2 ... cn c1c2 ... cn, where each
ci ∈ {a, b}.
Thus, L(G) ⊆ {ww | w ∈ {a, b}*}.
We can now conclude that L(G) = {ww | w ∈ {a, b}*}. □
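The two homomorphisms simply erase the nonterminals and the terminals,
respectively. The small sketch below is not from the text (the dictionaries
and the routine apply_hom are assumptions of the sketch, in Python); it can
be used to spot-check the claim above on particular sentential forms, for
instance abBAD, where the prefix is ab, U = e, β = BA, and V = D.

# A minimal sketch, not from the text, of the erasing homomorphisms g and h.
g = {"a": "a", "b": "b", "A": "", "B": ""}
h = {"a": "", "b": "", "A": "A", "B": "B"}

def apply_hom(hom, s):
    # a homomorphism on strings is determined by its values on single symbols
    return "".join(hom[c] for c in s)

beta = "BA"                      # from the sentential form abBAD
print(repr(apply_hom(g, beta)))  # '' : no c's have yet reached the right half
print(repr(apply_hom(h, beta)))  # 'BA' = X2X1, matching ab in reverse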
2.1.3. Restricted Grammars

Grammars can be classified according to the format of their productions.


Let G = (N, Σ, P, S) be a grammar.
DEFINITION

G is said to be
(1) Right-linear if each production in P is of the form A → xB or
A → x, where A and B are in N and x is in Σ*.
(2) Context-free if each production in P is of the form A → α, where
A is in N and α is in (N ∪ Σ)*.
(3) Context-sensitive if each production in P is of the form α → β, where
|α| ≤ |β|.
A grammar with no restrictions as above is called unrestricted.
The grammar of Example 2.3 is a right-linear grammar. Another example
of a right-linear grammar is the grammar with the productions

    S → 0S | 1S | e

This grammar generates the language {0, 1}*.
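These restrictions are purely syntactic conditions on the productions, so a
grammar can be classified mechanically. The sketch below is not from the
text; it assumes (for simplicity) that every symbol is a single character and
that each production is given as a pair of strings, with e written as the empty
string.

# A minimal sketch, not from the text: classifying a grammar by its productions.
def classify(productions, nonterminals):
    def right_linear(l, r):
        return (len(l) == 1 and l in nonterminals
                and sum(c in nonterminals for c in r) <= 1
                and (r == "" or r[-1] in nonterminals
                     or all(c not in nonterminals for c in r)))
    def context_free(l, r):
        return len(l) == 1 and l in nonterminals
    def context_sensitive(l, r):
        return 0 < len(l) <= len(r)
    if all(right_linear(l, r) for l, r in productions):
        return "right-linear"
    if all(context_free(l, r) for l, r in productions):
        return "context-free"
    if all(context_sensitive(l, r) for l, r in productions):
        return "context-sensitive"
    return "unrestricted"

print(classify([("S", "0S"), ("S", "1S"), ("S", "")], {"S"}))
# right-linear
print(classify([("S", "aSBC"), ("S", "abC"), ("CB", "BC"),
                ("bB", "bb"), ("bC", "bc"), ("cC", "cc")], {"S", "B", "C"}))
# context-sensitive (the grammar of Example 2.5)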



The grammar of Example 2.4 is an important example of a context-free
grammar. Notice that according to our definition, every right-linear grammar
is also a context-free grammar.
The grammar of Example 2.5 is clearly a context-sensitive grammar.
We should emphasize that the definition of context-sensitive grammar
does not permit a production of the form A ~ e, commonly known as an
e-production. Thus a context-free grammar having e-productions would not
be a context-sensitive grammar.
The reason for not permitting e-productions in context-sensitive gram-
mars is to ensure that the language generated by a context-sensitive grammar
is recursive. That is to say, we want to be able to give an algorithm which,
presented with an arbitrary context-sensitive grammar G and input string
w, will determine whether or not w is in L(G). (See Exercise 2.1.18.)
Even if we permitted just one e-production in a context-sensitive grammar
(without imposing additional conditions on the grammar), then the expanded
class of grammars would be capable of defining any recursively enumerable
set (see Exercise 2.1.20). The grammar in Example 2.6 is unrestricted. Note
that it is not right-linear, context-free, or context-sensitive.
CONVENTION

If a language L can be generated by a type x grammar, then L is said to
be a type x language, for all the "type x" 's that we have defined or shall
define.
Thus L(G) of Example 2.3 is a right-linear language, L(Go) in Example
2.4 is a context-free language, and L(G) of Example 2.5 is a paradigm con-
text-sensitive language. The language generated by the grammar in Example
2.6 is an unrestricted language, although {wwlw ~ (a, b}* and w ~ e] also
happens to be a context-sensitive language.
The four types of grammars and languages we have defined are often
referred to as the Chomsky hierarchy.
CONVENTION

We shall hereafter abbreviate context-free grammar and language by
CFG and CFL, respectively. Likewise, CSG and CSL stand for context-
sensitive grammar and context-sensitive language.
Every right-linear language is a CFL, and there are CFL's, such as
{0^n 1^n | n ≥ 1}, that are not right-linear. The CFL's which do not contain
the empty string likewise form a proper subset of the context-sensitive lan-
guages. These in turn are a proper subset of the recursive sets, which are
in turn a proper subset of the recursively enumerable sets. The (unrestric-
ted) grammars define exactly the recursively enumerable sets. These matters
are left for the Exercises.

Often, the context-sensitive languages are defined to be the languages
that we have defined plus all those languages L ∪ {e}, where L is a context-
sensitive language as defined here. In that case, we may call the CFL's
a proper subset of the CSL's.
We should emphasize the fact that although we may be given a certain
type of grammar, the language generated by that grammar might be generated
by a less powerful grammar. As a simple example, the context-free grammar

    S → AS | e
    A → 0 | 1

generates the language {0, 1}*, which, as we have seen, can also be generated
by a right-linear grammar.
We should also mention that there are a number of grammatical models
that have been recently introduced outside the Chomsky hierarchy. Some of
the motivation in introducing new grammatical models is to find a generative
device that can better represent all the syntax and/or semantics of pro-
gramming languages. Some of these models are introduced in the Exercises.

2.1.4. Recognizers

A second common method of providing a finite specification for a lan-
guage is to define a recognizer for the language. In essence a recognizer is
merely a highly stylized procedure for defining a set. A recognizer can be
pictured as shown in Fig. 2.1.
There are three parts to a recognizer: an input tape, a finite state control,
and an auxiliary memory.

[Figure 2.1 appears here: an input tape holding symbols a0 a1 a2 ... an,
read by an input head connected to the finite state control.]

Fig. 2.1 A recognizer.



The input tape can be considered to be divided into a linear sequence of
tape squares, each tape square containing exactly one input symbol from
a finite input alphabet. Both the leftmost and rightmost tape squares may be
occupied by unique endmarkers, or there may be a right endmarker and no
left endmarker, or there may be no endmarkers on either end of the input
tape.
There is an input head, which can read one input square at a given instant
of time. In a move by a recognizer, the input head can move one square to
the left, remain stationary, or move one square to the right. A recognizer
which can never move its input head left is called a one-way recognizer.
Normally, the input tape is assumed to be a read-only tape, meaning
that once the input tape is set no symbols can be changed. However, it is
possible to define recognizers which utilize a read-write input tape.
The memory of a recognizer can be any type of data store. We assume
that there is a finite memory alphabet and that the memory contains only
symbols from this finite memory alphabet in some data organization. We
also assume that at any instant of time we can finitely describe the contents
and structure of the memory, although as time goes on, the memory may
become arbitrarily large. An important example of an auxiliary memory is
the pushdown list, which can be abstractly represented as a string of memory
symbols, e.g., Z1Z2 ... Zn, where each Zi is assumed to be from some finite
memory alphabet Γ, and Z1 is assumed to be on top.
The behavior of the auxiliary memory for a class of recognizers is char-
acterized by two functions--a store function and a fetch function. It is
assumed that the fetch function is a mapping from the set of possible memory
configurations to a finite set of information symbols, which could be the same
as the memory alphabet.
For example, the only information that can be accessed from a pushdown
list is the topmost symbol. Thus a fetch function f for a pushdown list would
be a mapping from Γ+ to Γ such that f(Z1Z2 ... Zn) = Z1.
The store function is a mapping which describes how memory may be
altered. It maps memory and a control string to memory. If we assume that
a store operation for a pushdown list replaces the topmost symbol on the
pushdown list by a finite length string of memory symbols, then the store
function g could be represented as g: Γ+ × Γ* → Γ*, such that

    g(Z1Z2 ... Zn, Y1 ... Yk) = Y1 ... YkZ2 ... Zn.

If we replace the topmost symbol Z1 on a pushdown list by the empty
string, then the symbol Z2 becomes the topmost symbol and can then be
accessed by a fetch operation.
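As a small illustration (not from the text), the fetch and store functions for
a pushdown list might be realized as follows, keeping the pushdown list as a
Python list whose first element is the topmost symbol Z1.

# A minimal sketch, not from the text, of the pushdown fetch and store maps.
def fetch(memory):
    # f(Z1 Z2 ... Zn) = Z1 : only the topmost symbol can be accessed
    return memory[0]

def store(memory, ys):
    # g(Z1 Z2 ... Zn, Y1 ... Yk) = Y1 ... Yk Z2 ... Zn : the topmost symbol
    # is replaced by a (possibly empty) string of memory symbols
    return list(ys) + memory[1:]

pl = ["Z"]                   # initial pushdown list
pl = store(pl, ["X", "Z"])   # push X: replace Z by XZ
print(fetch(pl))             # X
pl = store(pl, [])           # pop: replace X by the empty string
print(fetch(pl))             # Z  (Z2 has become the topmost symbol)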
Generally speaking, it is the type of memory which determines the name
of a recognizer. For example, a recognizer having a pushdown list for a
memory would be called a pushdown recognizer (or more usually, pushdown
automaton).
The heart of a recognizer is the finite state control, which can be thought
of as a program which dictates the behavior of the recognizer. The control
can be represented as a finite set of states together with a mapping which
describes how the states change in accordance with the current input symbol
(i.e., the one under the input head) and the current information fetched
from the memory. The control also determines in which direction the input
head is to be shifted and what information is to be stored in the memory.
A recognizer operates by making a sequence of moves. At the start of
a move, the current input symbol is read, and the memory is probed by means
of the fetch function. The current input symbol and the information fetched
from the memory, together with the current state of the control, determine
what the move is to be. The move itself consists of
(1) Shifting the input head one square left, one square right, or keeping
the input head stationary;
(2) Storing information into the memory; and
(3) Changing the state of the control.
The behavior of a recognizer can be conveniently described in terms of
configurations of the recognizer. A configuration is a picture of the recognizer
describing
(1) The state of the finite control;
(2) The contents of the input tape, together with the location of the input
head; and
(3) The contents of the memory.
We should mention here that the finite control of a recognizer can be
deterministic or nondeterministic. If the control is nondeterministic, then in
each configuration there is a finite set of possible moves that the recognizer
can make.
The control is said to be deterministic if in each configuration there is
at most one possible move. Nondeterministic recognizers are a convenient
mathematical abstraction, but, unfortunately, they are often difficult to
simulate in practice. We shall give several examples and applications of
nondeterministic recognizers in the sections that follow.
The initial configuration of a recognizer is one in which the finite control
is in a specified initial state, the input head is scanning the leftmost symbol
on the input tape, and the memory has a specified initial content.
A final configuration is one in which the finite control is in one of a speci-
fied set of final states and the input head is scanning the right endmarker or,
if there is no right endmarker, has moved off the right end of the input tape.
Often, the memory must also satisfy certain conditions if the configuration
is to be considered final.
96 ELEMENTS OF LANGUAGE THEORY CHAP. 2

We say that a recognizer accepts an input string w if, starting from the
initial configuration with w on the input tape, the recognizer can make
a sequence of moves and end in a final configuration.
We should point out that a nondeterministic recognizer may be able to
make many different sequences of moves from an initial configuration. How-
ever, if at least one of these sequences ends in a final configuration, then
the initial input string will be accepted.
The language defined by a recognizer is the set of input strings it accepts.
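To make these notions concrete, here is a small sketch, not from the text, of
a one-way deterministic recognizer whose memory is a pushdown list; it
accepts exactly the strings 0^n 1^n with n ≥ 1, the language L(G1) of Example
2.1. The state names and stack symbols are assumptions of the sketch.

# A minimal sketch, not from the text, of a deterministic pushdown recognizer
# for {0^n 1^n | n >= 1}, phrased in terms of configurations and moves.
def accepts(w):
    state, stack = "q0", ["Z0"]            # initial configuration
    for a in w:
        top = stack[0]                     # fetch: the topmost symbol
        if state == "q0" and a == "0":
            stack.insert(0, "X")           # store: push an X for each 0
        elif state in ("q0", "q1") and a == "1" and top == "X":
            state = "q1"
            stack.pop(0)                   # store: pop an X for each 1
        else:
            return False                   # no move is possible: reject
    # final configuration: all input read and every 0 matched by a 1
    return state == "q1" and stack == ["Z0"]

print([w for w in ("01", "0011", "001", "10", "000111") if accepts(w)])
# ['01', '0011', '000111']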
For each class of grammars in the Chomsky hierarchy there is a natural
class of recognizers that defines the same class of languages. These recognizers
are finite automata, pushdown automata, linear bounded automata, and
Turing machines. Specifically, the following characterizations of the Chomsky
languages exist"
(I) A language L is right-linear if and only if L is defined by a (one-way
deterministic) finite automaton.
(2) A language L is context-free if and only if L is defined by a (one-way
nondeterministic) pushdown automaton,
(3) A language L is context-sensitive if and only if L is defined by a (two-
way nondeterministic) linear bounded automaton.
(4) A language L is recursively enumerable if and only if L is defined by
a Turing machine.
The precise definition of these recognizers will be found in the Exercises
and later sections. Finite automata and pushdown automata are important
in the theory of compiling and will be studied in some detail in this chapter.

EXERCISES

2.1.1. Construct right-linear grammars for
(a) Identifiers which can be of arbitrary length but must start with
a letter (as in ALGOL).
(b) Identifiers which can be one to six symbols in length and must
start with I, J, K, L, M, or N (as for FORTRAN integer variables).
(c) Real constants as in PL/I or FORTRAN, e.g., -10.8, 3.14159,
2., 6.625E-27.
(d) All strings of 0's and 1's having both an odd number of 0's
and an odd number of 1's.
2.1.2. Construct context-free grammars that generate
(a) All strings of 0's and 1's having equal numbers of 0's and 1's.
(b) {a1a2 ... an an ... a2a1 | ai ∈ {0, 1}, 1 ≤ i ≤ n}.
(c) Well-formed statements in propositional calculus.
(d) {0^i 1^j | i ≠ j and i, j ≥ 0}.
(e) All possible sequences of balanced parentheses.

"2.1.3. Describe the language generated by the productions S----~ b S S l a .


Observe that it is not always easy to describe what language a grammar
generates.
"2.1.4. Construct context-sensitive grammars that generate
(a) [an'In > 1).
(b) {ww I w ~ [a, b}+}.
(c) [ w l w ~ [a, b, c}+ and the number of a's in w equals the number
of b's which equals the number of c's].
(d) [ambnambnl m, n > 1}.
Hint: Think of the set of productions in a context-sensitive grammar
as a program. You can use special nonterminal symbols as a combina-
tion "input head" and terminal symbol.
"2.1.5. A "true" context-sensitive grammar G is a grammar (N, ~, P, S) in
which each production in P is of the form

where • and fl are in (N U E)*, ~' ~ (N u ZZ)+, and A ~ N. Such


a production can be interpreted to mean that A can be replaced by
only in the context 0~_fl. Show that every context-sensitive language
can be generated by a "true" context-sensitive grammar.
• "2.1.6. What class of languages can be generated by grammars with only
left context, that is, grammars in which each production is of the form
~zA ~ ~zfl, ~z in (N U X~)*, fl in (N U E)+ ?
2.1.7. Show that every context-free language can be generated by a grammar
G = (N, Σ, P, S) in which each production is of either the form
A → α, α in N*, or A → w, w in Σ*.
2.1.8. Show that every context-sensitive language can be generated by a gram-
mar G = (N, Σ, P, S) in which each production is either of the form
α → β, where α and β are in N+, or A → w, where A ∈ N and
w ∈ Σ+.
2.1.9. Prove that L(G) = {a^n b^n c^n | n ≥ 1}, where G is the grammar in Example
2.5.
*2.1.10. Can you describe the set of context-free grammars by means of a
context-free grammar?
*2.1.11. Show that every recursively enumerable set can be generated by a
grammar with at most two nonterminal symbols. Can you generate
every recursively enumerable set with a grammar having only one
nonterminal symbol?
2.1.12. Show that if G = (N, Σ, P, S) is a grammar such that #N = n, and
Σ does not contain any of the symbols A1, A2, ..., then there is an
equivalent grammar G' = (N', Σ, P', A1) such that

    N' = {A1, A2, ..., An}.



2.1.13. Prove that the grammar G1 of Example 2.1 generates {0^n 1^n | n ≥ 1}.
Hint: Observe that each sentential form has at most one nonterminal.
Thus productions can be applied in only one place in a string.
DEFINITION
In an unrestricted grammar G there are many ways of deriving a
given sentence that are essentially the same, differing only in the order
in which productions are applied. If G is context-free, then we can
represent these essentially similar derivations by means of a derivation
tree. However, if G is context-sensitive or unrestricted, we can define
equivalence classes of derivations in the following manner.
Let G = (N, Σ, P, S) be an unrestricted grammar. Let D be the
set of all derivations of the form S ⇒* w. That is, elements of D
are sequences of the form (α0, α1, ..., αn) such that α0 = S, αn ∈ Σ*,
and αi-1 ⇒ αi, 1 ≤ i ≤ n.
Define a relation R0 on D by (α0, α1, ..., αn) R0 (β0, β1, ..., βn)
if and only if there is some i between 1 and n - 1 such that
(1) αj = βj for all 1 ≤ j ≤ n such that j ≠ i.
(2) We can write αi-1 = γ1γ2γ3γ4γ5 and αi+1 = γ1δγ3εγ5 such that
γ2 → δ and γ4 → ε are in P, and either αi = γ1δγ3γ4γ5 and
βi = γ1γ2γ3εγ5, or conversely.
Let R be the least equivalence relation containing R0. Each equiva-
lence class of R represents the essentially similar derivations of a given
sentence.
**2.1.14. What is the maximum size of an equivalence class of R (as a function
of n and |αi|) if G is
(a) Right-linear.
(b) Context-free.
(c) Such that every production is of the form α → β and |α| < |β|.
*2.1.15. Let G be defined by

    S → A0B | B1A
    A → BB | 0
    B → AA | 1

What is the size of the equivalence class under R which contains the
derivation

    S ⇒ A0B ⇒ BB0B ⇒ 1B0B ⇒ 1AA0B ⇒ 10A0B
      ⇒ 1000B ⇒ 10001

DEFINITION
A grammar G is said to be unambiguous if each w in L(G) appears
as the last component of a derivation in one and only one equivalence
class under R, as defined above. For example,

    S → abC | aB
    B → bc
    bC → bc

is an ambiguous grammar since the sequences

    (S, abC, abc) and (S, aB, abc)

are in two distinct equivalence classes.


"2.1.16. Show that every right-linear language has an unambiguous right-linear
grammar.
"2.1.17. Let G = (N, ~, P, S) be a context-sensitive grammar, and let N u X~
n
have m members. Let w be a word in L(G). Show that S ==~ w, where
G
n < (m + 1)IwI.
2.1.18. Show that every context-sensitive language is recursive. Hint: Use the
result of Exercise 2.1.17 to construct an algorithm to determine if w is
in L(G) for arbitrary word w and context-sensitive grammar G.
2.1.19. Show that every CFL is recursive. Hint: Use Exercise 2.1.18, but be
careful about the empty word.
"2.1.20. Show that if G = (N, ~, P, S) is an unrestricted grammar, then there
is a context-sensitive grammar G ' = (N', X~ U {c}, P', S') such that

w is in L(G) if and only if wc~is in L(G') for some i > 0

Hint: Fill out every noncontext-sensitive production of G with c's.


Then add productions to allow the c's to be shifted to the right end
of any sentential form.
2.1.21. Show that if L = L(G) for any arbitrary grammar G, then there is
a context-sensitive language Li and a homomorphism h such that
L = h(L1).
2.1.22. Let {A1, A2 . . . . } be a countable set of nonterminal symbols, not
including the symbols 0 and 1. Show that every context-sensitive
language L ~ { 0 , 1 } * has a CSG G = (N, {0,1}, P, A1), where
N = {A1, A 2 , . . . ,A~}, for some i. We call such a context-sensitive
grammar normalized.
"2.1.23. Show that the set of normalized context-sensitive grammars as defined
above is countable.
"2.1.24. Show that there is a recursive set contained in {0, i}* which is not a
context-sensitive language. Hint: Order the normalized, context-
sensitive grammars so that one may talk about the ith grammar. Like-
wise, lexicographicaUy order {0, 1}* so that we may talk about the ith
string in {0, 1}*. Then define L = {w~lw~ is not in L(G3} and show
that L is recursive but not context-sensitive.

**2.1.25. Show that a language is defined by a grammar if and only if it is
recognized by a Turing machine. (A Turing machine is defined in the
Exercises of Section 0.4 on page 34.)
2.1.26. Define a nondeterministic recognizer whose memory is an initially blank
Turing machine tape, which is not permitted to grow longer than the
input. Show that a language is defined by such a recognizer if and only
if it is a CSL. This recognizer is called a linear bounded automaton
(LBA, for short).
DEFINITION

An indexed grammar is a 5-tuple G = (N, Σ, Δ, P, S), where N, Σ,
and Δ are finite sets of nonterminals, terminals, and intermediates,
respectively. S in N is the start symbol and P is a finite set of produc-
tions of the forms

    A → X1ψ1X2ψ2 ... Xnψn,    n ≥ 0
and
    Af → X1ψ1X2ψ2 ... Xnψn,    n ≥ 0

where A is in N, the X's are in N ∪ Σ, f is in Δ, and the ψ's are in Δ*,
such that if Xi is in Σ, then ψi = e. Let α and β be strings in (NΔ* ∪ Σ)*,
A ∈ N, θ ∈ Δ*, and let A → X1ψ1 ... Xnψn be in P. Then we write

    αAθβ ⇒ αX1ψ1θ1X2ψ2θ2 ... Xnψnθnβ

where θi = θ if Xi ∈ N and θi = e if Xi ∈ Σ. That is, the string of
intermediates following a nonterminal distributes over the nonter-
minals, but not over the terminals, which can never be followed
by an intermediate. If Af → X1ψ1 ... Xnψn is in P, then

    αAfθβ ⇒ αX1ψ1θ1 ... Xnψnθnβ,

as above. Such a step "consumes" the intermediate following A, but
otherwise is the same as the first type of step. Let ⇒* be the reflexive,
transitive closure of ⇒, and define

    L(G) = {w | w in Σ* and S ⇒* w}.
Example 2.7
Let G = ({S, T, A, B, C}, {a, b, c}, {f, g}, P, S), where P consists of

    S → Tg
    T → Tf
    T → ABC
    Af → aA
    Bf → bB
    Cf → cC
    Ag → a
    Bg → b
    Cg → c

Then L(G) = {a^n b^n c^n | n ≥ 1}. For example, aabbcc has the derivation

    S ⇒ Tg
      ⇒ Tfg
      ⇒ AfgBfgCfg
      ⇒ aAgBfgCfg
      ⇒ aaBfgCfg
      ⇒ aabBgCfg
      ⇒ aabbCfg
      ⇒ aabbcCg
      ⇒ aabbcc    □

"2.1.27. Give indexed grammars for the following languages:


(a) {wwl w ~ {a, b]*].
(b) {a"b"21n ~ 1).
*'2.1.28. Show that every indexed language iscontext-sensitive.
2.1.29. Show that every CFL is an indexed language.
2.1.30. Let us postulate a recognizer whose memory is a single integer (written
in binary if you will). Suppose that the memory control strings as
described in Section 2.1.4 are only X and Y. Which of the following
could be memory fetch functions for the above recognizer?
(a) f(i) = 0 if i is even, 1 if i is odd.
(b) f(i) = a if i is even, b if i is odd.
(c) f(i) = 0 if i is even and the input symbol under the input head
is a, 1 otherwise.
2.1.31. Which of the following could be memory store functions for the
recognizer in Exercise 2.1.30?
(a) g(i, X) = 0
    g(i, Y) = i + 1.
(b) g(i, X) = 0
    g(i, Y) = i + 1 if the previous store instruction was X,
              i + 2 if the previous store instruction was Y.

DEFINITION

A tag system consists of two finite alphabets N and Σ and a finite
set of rules of the form (α, β), where α and β are in (N ∪ Σ)*. If γ
is an arbitrary string in (N ∪ Σ)* and (α, β) is a rule, then we write
αγ ⊢ γβ. That is, the prefix α may be removed from the front of any
string provided β is then placed at the end of the string. Let ⊢* be
the reflexive, transitive closure of ⊢. For any string γ in (N ∪ Σ)*,
Lγ is {w | w is in Σ* and γ ⊢* w}.
**2.1.32. Show that Lγ is always defined by some grammar. Hint: Use Exercise
2.1.25 or see Minsky [1967].
**2.1.33. Show that for any grammar G, L(G) is defined by a tag system in the
manner described above. The hint of Exercise 2.1.32 again applies.

Open Problems
2.1.34. Is the complement of a context-sensitive language always context-
sensitive?
The recognizer of Exercise 2.1.26 is called a linear bounded auto-
maton (LBA). If we make it deterministic, we have a deterministic LBA
(DLBA).
2.1.35. Is every context-sensitive language recognized by a DLBA ?
2.1.36. Is every indexed language recognized by a DLBA ?
By Exercise 2.1.28, a positive answer to Exercise 2.1.35 implies a
positive answer to Exercise 2.1.36.

BIBLIOGRAPHIC NOTES

Formal language theory was greatly stimulated by the work of Chomsky in
the late 1950's [Chomsky, 1956, 1957, 1959a, 1959b]. Good references to early
work on generative systems are Chomsky [1963] and Bar-Hillel [1964].
The Chomsky hierarchy of grammars and languages has been extensively
studied. Many of the major results concerning the Chomsky hierarchy are given
in the Exercises. Most of these results are proved in detail in Hopcroft and Ullman
[1969] or Ginsburg [1966].
Since Chomsky introduced phrase structure grammars, many other models of
grammars have also appeared in the literature. Some of these models use special-
ized forms of productions. Indexed grammars [Aho, 1968], macro grammars
[Fischer, 1968], and scattered context grammars [Greibach and Hopcroft, 1969]
are examples of such grammars. Other grammatical models impose restrictions
on the order in which productions can be applied. Programmed grammars [Rosen-
krantz, 1968] are a prime example.
Recognizers for languages have also been extensively studied. Turing machines
were defined by A. Turing in 1936. Somewhat later, the concept of a finite state

machine appeared in McCulloch and Pitts [1943]. The study of recognizers was
stimulated by the work of Moore [1956] and Rabin and Scott [1959].
A significant amount of effort in language theory has been expended in deter-
mining the algebraic properties of classes of languages and in determining decida-
bility results for classes of grammars and recognizers. For each of the four classes
of grammars in the Chomsky hierarchy there is a class of recognizers which defines
precisely those languages generated by that class of grammars. These observations
have led to a study of abstract families of languages and recognizers in which
classes of languages are defined in terms of algebraic properties. Certain algebraic
properties in a class of languages are necessary and sufficient to guarantee the
existence of a class of recognizers for those languages. Work in this area was
pioneered by Ginsburg and Greibach [1969] and Hopcroft and Ullman [1967].
Book [1970] gives a good survey of language theory circa 1970.
Haines [1970] claims that the left context grammars in Exercise 2.1.6 generate
exactly the context-sensitive languages. Exercise 2.1.28 is from Aho [1968].

2.2. REGULAR SETS, THEIR GENERATORS, AND THEIR RECOGNIZERS

The regular sets are a class of languages central to much of language


theory. In this section we shall study several methods of specifying languages,
all of which define exactly the regular sets. These methods include regular
expressions, right-linear grammars, deterministic finite automata, and non-
deterministic finite automata.

2.2.1. Regular Sets and Regular Expressions

DEFINITION
Let Σ be a finite alphabet. We define a regular set over Σ recursively in
the following manner:
(1) ∅ (the empty set) is a regular set over Σ.
(2) {e} is a regular set over Σ.
(3) {a} is a regular set over Σ for all a in Σ.
(4) If P and Q are regular sets over Σ, then so are
    (a) P ∪ Q.
    (b) PQ.
    (c) P*.
(5) Nothing else is a regular set.
Thus a subset of Σ* is regular if and only if it is ∅, {e}, or {a} for some
a in Σ, or can be obtained from these by a finite number of applications of
the operations union, concatenation, and closure.
We shall define a convenient method for denoting regular sets over
a finite alphabet Σ.

DEFINITION

Regular expressions over Σ and the regular sets they denote are defined
recursively, as follows:
(1) ∅ is a regular expression denoting the regular set ∅.
(2) e is a regular expression denoting the regular set {e}.
(3) a in Σ is a regular expression denoting the regular set {a}.
(4) If p and q are regular expressions denoting the regular sets P and Q,
respectively, then
    (a) (p + q) is a regular expression denoting P ∪ Q.
    (b) (pq) is a regular expression denoting PQ.
    (c) (p)* is a regular expression denoting P*.
(5) Nothing else is a regular expression.
We shall use the shorthand notation p⁺ to denote the regular expression
pp*. Also, we shall remove redundant parentheses from regular expressions
whenever no ambiguity can arise. In this regard, we assume that * has the
highest precedence, then concatenation, and then +. Thus, 0 + 10* means
(0 + (1(0*))).

Example 2.8
Some examples of regular expressions are
(1) 01, denoting {01}.
(2) 0*, denoting {0}*.
(3) (0 + 1)*, denoting {0, 1}*.
(4) (0 + 1)*011, denoting the set of all strings of 0's and 1's ending in 011.
(5) (a + b)(a + b + 0 + 1)*, denoting the set of all strings in {0, 1, a, b}*
beginning with a or b.
(6) (00 + 11)*((01 + 10)(00 + 11)*(01 + 10)(00 + 11)*)*, denoting the
set of all strings of 0's and 1's containing both an even number of 0's and
an even number of 1's.   □

It should be quite clear that for each regular set we can find at least one
regular expression denoting that regular set. Also, for each regular expression
we can construct the regular set denoted by that regular expression. Unfor-
tunately, for each regular set there is an infinity of regular expressions denot-
ing that set.
We shall say two regular expressions are equal ( = ) if they denote the same
set.
Some basic algebraic properties of regular expressions are stated in the
following lemma.
LEMMA 2.1
Let α, β, and γ be regular expressions. Then
(1) α + β = β + α                  (2) ∅* = e
(3) α + (β + γ) = (α + β) + γ      (4) α(βγ) = (αβ)γ
(5) α(β + γ) = αβ + αγ             (6) (α + β)γ = αγ + βγ
(7) αe = eα = α                    (8) ∅α = α∅ = ∅
(9) α* = α + α*                    (10) (α*)* = α*
(11) α + α = α                     (12) α + ∅ = α
Proof. (1) Let α and β denote the sets L₁ and L₂, respectively. Then
α + β denotes L₁ ∪ L₂ and β + α denotes L₂ ∪ L₁. But L₁ ∪ L₂ = L₂ ∪ L₁
from the definition of union. Hence, α + β = β + α.
The remaining parts are left for the Exercises.

In what follows, we shall not distinguish between a regular expression


and the set it denotes unless confusion will arise. For example, under this
convention the symbol a will represent the set {a}.
When dealing with languages it is often convenient to use equations
whose indeterminates and coefficients represent sets. Here, we shall consider
sets of equations whose coefficients are regular expressions and shall call
such equations regular expression equations.
For example, consider the regular expression equation

(2.2.1) X = aX + b

where a and b are regular expressions. We can easily verify by direct substitu-
tion that X = a*b is a solution to Eq. (2.2.1). That is to say, when we sub-
stitute the set represented by a*b in both sides of Eq. (2.2.1), then each side
of the equation represents the same set.
We can also have sets of equations that define languages. For example,
consider the pair of equations

(2.2.2)    X = a₁X + a₂Y + a₃
           Y = b₁X + b₂Y + b₃

where each aᵢ and bᵢ is a regular expression. We shall show how we can


solve this pair of simultaneous equations to obtain the solutions

    X = (a₁ + a₂b₂*b₁)*(a₃ + a₂b₂*b₃)
    Y = (b₂ + b₁a₁*a₂)*(b₃ + b₁a₁*a₃)

However, we should first mention that not all regular expression equa-
tions have unique solutions. For example, if

(2.2.3)    X = αX + β

is a regular expression equation and α denotes a set which contains the empty
string, then X = α*(β + γ) is also a solution to (2.2.3) for all γ. (γ does not
even have to be regular. See Exercise 2.2.7.) Thus Eq. (2.2.3) has an infinity

of solutions. In situations of this nature we shall use the smallest solution,


which we call the minimal fixed point. The minimal fixed point for Eq.
(2.2.3) is X = α*β.
DEFINITION

A set of regular expression equations is said to be in standard form over
a set of indeterminates A = {X₁, X₂, ..., Xₙ} if for each Xᵢ in A there is
an equation of the form

    Xᵢ = αᵢ₀ + αᵢ₁X₁ + αᵢ₂X₂ + ⋯ + αᵢₙXₙ

with each αᵢⱼ a regular expression over some alphabet disjoint from A.
The α's are the coefficients. Note that if αᵢⱼ = ∅, a possible regular
expression, then effectively there is no term for Xⱼ in the equation for Xᵢ.
Also, if αᵢⱼ = e, then effectively the term for Xⱼ in the equation for Xᵢ is
just Xⱼ. That is, ∅ plays the role of coefficient 0, and e the role of coefficient
1 in ordinary linear equations.
ALGORITHM 2.1

Solving a set of regular expression equations in standard form.

Input. A set Q of regular expression equations in standard form over A,
whose coefficients are regular expressions over alphabet Σ. Let A be the set
{X₁, ..., Xₙ}.
Output. A set of solutions of the form Xᵢ = αᵢ, 1 ≤ i ≤ n, where αᵢ is
a regular expression over Σ.
Method. The method is reminiscent of solving linear equations using
Gaussian elimination.
Step 1: Let i = 1.
Step 2: If i = n, go to step 4. Otherwise, using the identities of Lemma
2.1, write the equation for Xᵢ as Xᵢ = αXᵢ + β, where α is a regular expres-
sion over Σ and β is a regular expression of the form β₀ + βᵢ₊₁Xᵢ₊₁ + ⋯ + βₙXₙ,
with each βⱼ a regular expression over Σ. We shall see that this will always
be possible. Then in the equations for Xᵢ₊₁, ..., Xₙ, we replace Xᵢ on the
right by the regular expression α*β.
Step 3: Increase i by 1 and return to step 2.
Step 4: After executing step 2 the equation for Xᵢ will have only symbols
in Σ and Xᵢ, ..., Xₙ on the right. In particular, the equation for Xₙ will
have only Xₙ and symbols in Σ on the right. At this point i = n, and we now
go to step 5.
Step 5: The equation for Xᵢ is of the form Xᵢ = αXᵢ + β, where α and
β are regular expressions over Σ. Emit the statement Xᵢ = α*β and substitute
α*β for Xᵢ in the remaining equations.
Step 6: If i = 1, end. Otherwise, decrease i by 1 and return to step 5. □

Example 2.9

Let A = {X₁, X₂, X₃}, and let the set of equations be

(2.2.4)    X₁ = 0X₂ + 1X₁ + e
(2.2.5)    X₂ = 0X₃ + 1X₂
(2.2.6)    X₃ = 0X₁ + 1X₃

From (2.2.4) we obtain X₁ = 1X₁ + (0X₂ + e). We then replace X₁ by
1*(0X₂ + e) in the remaining equations. Equation (2.2.6) becomes

    X₃ = 01*(0X₂ + e) + 1X₃,

which can be written, using Lemma 2.1, as

(2.2.7)    X₃ = 01*0X₂ + 1X₃ + 01*

If we now work on (2.2.5), which was not changed by the previous step,
we replace X₂ by 1*0X₃ in (2.2.7), and obtain

(2.2.8)    X₃ = (01*01*0 + 1)X₃ + 01*

We now reach step 5 of Algorithm 2.1. From Eq. (2.2.8) we obtain the
solution for X₃:

(2.2.9)    X₃ = (01*01*0 + 1)*01*

We substitute (2.2.9) in (2.2.5), to yield

(2.2.10)   X₂ = 0(01*01*0 + 1)*01* + 1X₂

Since X₃ does not appear in (2.2.4), that equation is not modified. We then
solve (2.2.10), obtaining

(2.2.11)   X₂ = 1*0(01*01*0 + 1)*01*

Substituting (2.2.11) into (2.2.4), we obtain

(2.2.12)   X₁ = 01*0(01*01*0 + 1)*01* + 1X₁ + e

The solution to (2.2.12) is

(2.2.13)   X₁ = 1*(01*0(01*01*0 + 1)*01* + e)

The output of Algorithm 2.1 is the set of Eqs. (2.2.9), (2.2.11), and (2.2.13).
□
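Algorithm 2.1 is mechanical enough to state in a few lines of code. The following Python sketch is not part of the original text; it mimics the elimination and back-substitution steps on coefficient matrices of regular-expression strings, with None standing for the coefficient ∅ and "e" for the empty-string expression. No simplification is attempted, so the printed expressions may be longer than (2.2.9), (2.2.11), and (2.2.13), but they denote the same sets.

def union(p, q):
    if p is None: return q
    if q is None: return p
    return "(" + p + " + " + q + ")"

def concat(p, q):
    if p is None or q is None: return None
    if p == "e": return q
    if q == "e": return p
    return p + q

def star(p):
    if p is None or p == "e": return "e"
    return "(" + p + ")*"

def solve(const, coef):
    """const[i] is alpha_i0 and coef[i][j] is alpha_ij in X_i = alpha_i0 + sum_j alpha_ij X_j."""
    n = len(const)
    const, coef = list(const), [row[:] for row in coef]
    for i in range(n):                       # forward elimination (steps 1-4)
        a = star(coef[i][i])                 # X_i = alpha X_i + beta  becomes  X_i = alpha* beta
        coef[i][i] = None
        const[i] = concat(a, const[i])
        coef[i] = [concat(a, c) for c in coef[i]]
        for k in range(i + 1, n):            # substitute alpha* beta for X_i in later equations
            c, coef[k][i] = coef[k][i], None
            if c is None:
                continue
            const[k] = union(const[k], concat(c, const[i]))
            coef[k] = [union(old, concat(c, new)) for old, new in zip(coef[k], coef[i])]
    sol = [None] * n
    for i in range(n - 1, -1, -1):           # back substitution (steps 5-6)
        r = const[i]
        for j in range(i + 1, n):
            r = union(r, concat(coef[i][j], sol[j]))
        sol[i] = r
    return sol

# The equations (2.2.4)-(2.2.6) of Example 2.9:
const = ["e", None, None]
coef = [["1", "0", None],
        [None, "1", "0"],
        ["0", None, "1"]]
for name, expr in zip(("X1", "X2", "X3"), solve(const, coef)):
    print(name, "=", expr)
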

We must show that the output of Algorithm 2.1 is truly a solution to


the equations, in the sense that when the solutions are substituted for the
indeterminates, the sets denoted by both sides of each equation are the same.
As we have pointed out, the solution to a set of standard form equations is
not always unique. However, when a set of equations does not have a unique
solution we shall see that Algorithm 2.1 yields the minimal fixed point.
DEFINITION

Let Q be a set of standard form equations over A with coefficients over
Σ. We say that a mapping f from A to languages in Σ* is a solution to Q if,
upon substituting f(X) for X in each equation, for all X in A, the equations
become set equalities. We say that f : A → 𝒫(Σ*) is a minimal fixed point of
Q if f is a solution and, if g is any other solution, f(X) ⊆ g(X) for all X in A.
The following two lemmas provide useful information about minimal
fixed points.
LEMMA 2.2
Every set Q of standard form equations over A has a unique minimal
fixed point.
Proof. Let f(X) = {w | for all solutions g to Q, w is in g(X)}, for all X
in A. It is straightforward to show that f is a solution and that f(X) ⊆ g(X)
for all solutions g. Thus, f is the unique minimal fixed point of Q. □
We shall now characterize the minimal fixed point of a set of equations.
LEMMA 2.3
Let Q be a set of standard form equations over A, where A = {X₁, ..., Xₙ}
and the equation for each Xᵢ is

    Xᵢ = αᵢ₀ + αᵢ₁X₁ + αᵢ₂X₂ + ⋯ + αᵢₙXₙ

Then the minimal fixed point of Q is f, where

    f(Xᵢ) = {w₁ ⋯ wₘ | for some sequence of integers j₁, ..., jₘ, m ≥ 1,
             wₘ is in α_{j_m 0}, and wₖ is in α_{j_k j_{k+1}}, 1 ≤ k < m, where j₁ = i}.

Proof. It is straightforward to show that the following set equations are
valid:

    f(Xᵢ) = αᵢ₀ ∪ αᵢ₁f(X₁) ∪ ⋯ ∪ αᵢₙf(Xₙ)

for all i. Thus, f is a solution.
To show that f is the minimal fixed point, suppose that g is a solution
and that for some i there exists a string w in f(Xᵢ) − g(Xᵢ). Since w is in
f(Xᵢ), we can write w = w₁ ⋯ wₘ, where for some sequence of integers
j₁, ..., jₘ, we have wₘ in α_{j_m 0}, wₖ in α_{j_k j_{k+1}} for 1 ≤ k < m, and j₁ = i.

Since g is a solution, we have g(Xⱼ) = αⱼ₀ ∪ αⱼ₁g(X₁) ∪ ⋯ ∪ αⱼₙg(Xₙ) for
all j. In particular, αⱼ₀ ⊆ g(Xⱼ) and αⱼₖg(Xₖ) ⊆ g(Xⱼ) for all j and k. Thus,
wₘ is in g(X_{j_m}), wₘ₋₁wₘ is in g(X_{j_{m-1}}), and so forth. Finally, w = w₁w₂ ⋯ wₘ
is in g(X_{j_1}) = g(Xᵢ). But then we have a contradiction, since we supposed
that w was not in g(Xᵢ). Thus we can conclude that f(Xᵢ) ⊆ g(Xᵢ) for all i.
It immediately follows that f is the minimal fixed point of Q. □
LEMMA 2.4
Let Q₁ and Q₂ be the sets of equations before and after a single applica-
tion of step 2 of Algorithm 2.1. Then Q₁ and Q₂ have the same minimal
fixed point.
Proof. Suppose that in step 2 the equation

    Aᵢ = αᵢ₀ + αᵢᵢAᵢ + α_{i,i+1}A_{i+1} + ⋯ + αᵢₙAₙ

is under consideration. (Note: The coefficient for Aₕ is ∅ for 1 ≤ h < i.)
In Q₁ and Q₂ the equations for Aₕ, h ≤ i, are the same.
Suppose that

(2.2.14)    Aⱼ = αⱼ₀ + Σₖ₌₁ⁿ αⱼₖAₖ

is the equation for Aⱼ, j > i, in Q₁. In Q₂ the equation for Aⱼ becomes

(2.2.15)    Aⱼ = β₀ + Σₖ₌ᵢ₊₁ⁿ βₖAₖ

where

    β₀ = αⱼ₀ + αⱼᵢαᵢᵢ*αᵢ₀
    βₖ = αⱼₖ + αⱼᵢαᵢᵢ*αᵢₖ    for i < k ≤ n

We can use Lemma 2.3 to express the minimal fixed points of Q₁ and
Q₂, which we shall denote by f₁ and f₂, respectively. From the form of Eq.
(2.2.15), every string in f₂(Aⱼ) is in f₁(Aⱼ). This follows from the fact that
any string w which is in the set denoted by αⱼᵢαᵢᵢ*αᵢₖ can be expressed as
w₁w₂ ⋯ wₘ, where w₁ is in αⱼᵢ, wₘ is in αᵢₖ, and w₂, ..., wₘ₋₁ are in αᵢᵢ.
Thus, w is the concatenation of a sequence of strings in the sets denoted by
coefficients of Q₁, for which the subscripts satisfy the condition of Lemma
2.3. A similar observation holds for strings in αⱼᵢαᵢᵢ*αᵢ₀. Thus it can be shown
that f₂(Aⱼ) ⊆ f₁(Aⱼ).
Conversely, suppose w is in f₁(Aⱼ). Then by Lemma 2.3 we may write
w = w₁ ⋯ wₘ, for some sequence of nonzero subscripts l₁, ..., lₘ such
that wₘ is in α_{l_m 0}, wₚ is in α_{l_p l_{p+1}}, 1 ≤ p < m, and l₁ = j. We can group the
wₚ's uniquely, so that we can write w = y₁ ⋯ yᵣ, where yₚ = wₛ ⋯ wₜ and
(1) If lₛ ≤ i, then t = s.
(2) If lₛ > i, then t is chosen so that l_{s+1}, ..., l_t are all equal to i and l_{t+1} ≠ i.
It follows that in either case yₚ lies in the set denoted by the coefficient of A_{l_{t+1}}
in the equation of Q₂ for A_{l_s}, and hence w is in f₂(Aⱼ). We conclude that
f₁(Aⱼ) = f₂(Aⱼ) for all j. □
LEMMA 2.5
Let Q₁ and Q₂ be the sets of equations before and after a single appli-
cation of step 5 in Algorithm 2.1. Then Q₁ and Q₂ have the same minimal
fixed point.
Proof. Exercise, similar to Lemma 2.4. □
THEOREM 2.1
Algorithm 2.1 correctly determines the minimal fixed point of a set of
standard form equations.
Proof. After step 5 has been applied for all i, the equations are all of
the form Aᵢ = αᵢ, where αᵢ is a regular expression over Σ. The minimal
fixed point of such a set is clearly f(Aᵢ) = αᵢ. □
2.2.2. Regular Sets and Right-Linear Grammars

We shall show that a language is defined by a right-linear grammar if


and only if it is a regular set. A few observations are needed to show that
every regular set has a right-linear grammar. Let Σ be a finite alphabet.
LEMMA 2.6
(i) ∅, (ii) {e}, and (iii) {a} for all a in Σ are right-linear languages.
Proof.
(i) G = ({S}, Σ, ∅, S) is a right-linear grammar such that L(G) = ∅.
(ii) G = ({S}, Σ, {S → e}, S) is a right-linear grammar for which
L(G) = {e}.
(iii) Gₐ = ({S}, Σ, {S → a}, S) is a right-linear grammar for which
L(Gₐ) = {a}. □
LEMMA 2.7
If L₁ and L₂ are right-linear languages, then so are (i) L₁ ∪ L₂, (ii) L₁L₂,
and (iii) L₁*.
Proof. Since L₁ and L₂ are right-linear, we can assume that there exist
right-linear grammars G₁ = (N₁, Σ, P₁, S₁) and G₂ = (N₂, Σ, P₂, S₂) such
that L(G₁) = L₁ and L(G₂) = L₂. We shall also assume that N₁ and N₂ are
disjoint. That is, we know we can rename nonterminals arbitrarily, so this
assumption is without loss of generality.
(i) Let G₃ be the right-linear grammar

    (N₁ ∪ N₂ ∪ {S₃}, Σ, P₁ ∪ P₂ ∪ {S₃ → S₁ | S₂}, S₃),


where S₃ is a new nonterminal symbol not in N₁ or N₂. It should be clear that
L(G₃) = L(G₁) ∪ L(G₂), because for each derivation S₃ ⇒⁺ w in G₃ there is either
a derivation S₁ ⇒⁺ w in G₁ or S₂ ⇒⁺ w in G₂, and conversely. Since G₃ is a right-linear
grammar, L(G₃) is a right-linear language.
(ii) Let G₄ be the right-linear grammar (N₁ ∪ N₂, Σ, P₄, S₁) in which
P₄ is defined as follows:
(1) If A → xB is in P₁, then A → xB is in P₄.
(2) If A → x is in P₁, then A → xS₂ is in P₄.
(3) All productions in P₂ are in P₄.
Note that if S₁ ⇒⁺ w in G₁, then S₁ ⇒⁺ wS₂ in G₄, and that if S₂ ⇒⁺ x in G₂,
then S₂ ⇒⁺ x in G₄. Thus, L(G₁)L(G₂) ⊆ L(G₄). Now suppose that S₁ ⇒⁺ w in G₄.
Since there are no productions of the form A → x in P₄ that "came out of" P₁,
we can write the derivation in the form S₁ ⇒⁺ xS₂ ⇒⁺ xy, where w = xy and all
productions used in the derivation S₁ ⇒⁺ xS₂ arose from rules (1) and (2) of the
construction of P₄. Thus we must have the derivations S₁ ⇒⁺ x in G₁ and S₂ ⇒⁺ y
in G₂. Hence, L(G₄) ⊆ L(G₁)L(G₂). It thus follows that L(G₄) = L(G₁)L(G₂).
(iii) Let G₅ = (N₁ ∪ {S₅}, Σ, P₅, S₅) such that S₅ is not in N₁ and P₅
is constructed as follows:
(1) If A → xB is in P₁, then A → xB is in P₅.
(2) If A → x is in P₁, then A → xS₅ and A → x are in P₅.
(3) S₅ → S₁ | e are in P₅.
A proof that S₅ ⇒⁺ x₁S₅ ⇒⁺ x₁x₂S₅ ⇒⁺ ⋯ ⇒⁺ x₁x₂ ⋯ xₙ₋₁S₅ ⇒⁺ x₁x₂ ⋯ xₙ₋₁xₙ
in G₅ if and only if S₁ ⇒⁺ x₁, S₁ ⇒⁺ x₂, ..., S₁ ⇒⁺ xₙ in G₁ is left for the
Exercises.
From the above, we have L(G₅) = (L(G₁))*. □

We can now equate the class of right-linear languages with the class of
regular sets.

THEOREM 2.2
A language is a regular set if and only if it is a right-linear language.
Proof.
Only if." This portion follows from Lemmas 2.6 and 2.7 and induction
on the number of applications of the definition of regular set necessary to
show a particular regular set to be one.
If: Let G = (N, Σ, P, S) be a right-linear grammar with N = {A₁, ..., Aₙ}.
We can construct a set of regular expression equations in standard form
with the nonterminals in N as indeterminates. The equation for Aᵢ is

    Aᵢ = αᵢ₀ + αᵢ₁A₁ + ⋯ + αᵢₙAₙ, where

(1) αᵢ₀ = w₁ + ⋯ + wₖ, where Aᵢ → w₁ | ⋯ | wₖ are all productions
with Aᵢ on the left and only terminals on the right. If k = 0, take αᵢ₀ to be ∅.
(2) αᵢⱼ, j > 0, is x₁ + ⋯ + xₘ, where Aᵢ → x₁Aⱼ | ⋯ | xₘAⱼ are all
productions with Aᵢ on the left and a right side ending in Aⱼ. Again, if m = 0,
then αᵢⱼ = ∅.
Using Lemma 2.3, it is straightforward to show that L(G) is f(S), where
f is the minimal fixed point of the constructed set of equations. This portion
of the proof is left for the Exercises. But f(S) is a language with a regular
expression, as constructed by Algorithm 2.1. Thus, L(G) is a regular set. □

Example 2.10

Let G be defined by the productions

    S → 0A | 1S | e
    A → 0B | 1A
    B → 0S | 1B

Then the set of equations generated is that of Example 2.9, with S, A,
and B, respectively, identified with X₁, X₂, and X₃. In fact, L(G) is the set of
strings whose number of 0's is divisible by 3. It is not hard to show that this
set is denoted by the regular expression of (2.2.13). □
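The construction in the "if" part of Theorem 2.2 is also easy to mechanize. The following Python sketch is not part of the original text; it builds the standard-form coefficients from a right-linear grammar whose productions are supplied as triples (A, x, B) for A → xB and (A, x, None) for A → x, using the same None/"e" conventions as the earlier sketch for Algorithm 2.1.

def grammar_to_equations(nonterminals, productions):
    index = {A: i for i, A in enumerate(nonterminals)}
    n = len(nonterminals)
    const = [None] * n                      # alpha_i0
    coef = [[None] * n for _ in range(n)]   # alpha_ij

    def add(p, q):                          # union of two coefficient expressions
        return q if p is None else p if q is None else p + " + " + q

    for A, x, B in productions:
        x = x if x != "" else "e"
        if B is None:
            const[index[A]] = add(const[index[A]], x)
        else:
            coef[index[A]][index[B]] = add(coef[index[A]][index[B]], x)
    return const, coef

# The grammar of Example 2.10, with S, A, B playing the roles of X1, X2, X3:
prods = [("S", "0", "A"), ("S", "1", "S"), ("S", "", None),
         ("A", "0", "B"), ("A", "1", "A"),
         ("B", "0", "S"), ("B", "1", "B")]
const, coef = grammar_to_equations(["S", "A", "B"], prods)
print(const)   # ['e', None, None]
print(coef)    # [['1', '0', None], [None, '1', '0'], ['0', None, '1']]

Feeding this output to the solve function sketched after Example 2.9 yields regular expressions for L(G), in the spirit of the proof of Theorem 2.2.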

2.2.3. Finite Automata

We have seen three ways of defining the class of regular sets:


(1) The class of regular sets is the least class of languages containing
∅, {e}, and {a} for all symbols a and closed under union, concatenation,
and *.
(2) The regular sets are those sets defined by regular expressions.
(3) The regular sets are the languages generated by right-linear grammars.

We shall now consider a fourth way, as the sets defined by finite automata.
A finite automaton is one of the simplest recognizers. Its "infinite" memory
is null. Ordinarily, the finite automaton consists only of an input tape and
a finite control. Here, we shall allow the finite control to be nondeterministic,
but restrict the input head to be one way. In fact, we require that the input
head shift right on every move.† The two-way finite automaton is considered
in the Exercises.

†Recall that, by definition, a one-way recognizer does not shift its input head left but
may keep it stationary during a move. Allowing a finite automaton to keep its input head
stationary does not permit the finite automaton to recognize any language not recognizable
by a conventional finite automaton.

We specify a finite automaton by defining its finite set of control states,


the allowable input symbols, the initial state, and the set of final states,
i.e., the states which indicate acceptance of the input. There is a state transi-
tion function which, given the "current" state and "current" input symbol,
gives all possible next states. It should be emphasized that the device is
nondeterministic in an automaton-theoretic sense. That is, the device goes
to all its next states, if you will, replicating itself in such a way that one
instance of itself exists for each of its possible next states. The device accepts
if any of its parallel existences reaches an accepting state.
The nondeterminism of the finite automaton should not be confused
with "randomness," in which the automaton could randomly choose a next
state according to fixed probabilities but had a single existence. Such an
automaton is called "probabilistic" and will not be studied here.
We now give a formal definition of nondeterministic finite automa-
ton.
DEFINITION

A nondeterministic finite automaton is a 5-tuple M = (Q, Σ, δ, q₀, F),
where
(1) Q is a finite set of states;
(2) Σ is a finite set of permissible input symbols;
(3) δ is a mapping from Q × Σ to 𝒫(Q) which dictates the behavior of
the finite state control; δ is sometimes called the state transition function;
(4) q₀ in Q is the initial state of the finite state control; and
(5) F ⊆ Q is the set of final states.
A finite automaton operates by making a sequence of moves. A move is
determined by the current state of the finite control and the input symbol
currently scanned by the input head. A move itself consists of the control
changing state and the input head shifting one square to the right.
To determine the future behavior of a finite automaton, all we need to
know are
(1) The current state of the finite control and
(2) The string of symbols on the input tape consisting of the symbol
under the input head followed by all symbols to the right of this symbol.
These two items of information provide an instantaneous description of
the finite automaton, which we shall call a configuration.
DEFINITION
If M = (Q, Σ, δ, q₀, F) is a finite automaton, then a pair (q, w) in
Q × Σ* is a configuration of M. A configuration of the form (q₀, w) is
called an initial configuration, and one of the form (q, e), where q is in F, is
called a final (or accepting) configuration.
A move by M is represented by a binary relation ⊢_M (or ⊢, where M is
understood) on configurations. If δ(q, a) contains q′, then (q, aw) ⊢_M (q′, w)
for all w in Σ*.
This says that if M is in state q and the input head is scanning the input
symbol a, then M may make a move in which it goes into state q' and shifts
the input head one square to the right. Since M is in general nondetermin-
istic, there may be states other than q' which it could also enter on one move.
We say that C ⊢⁰ C′ if and only if C = C′. We say that C₀ ⊢ᵏ Cₖ, k ≥ 1,
if and only if there exist configurations C₁, ..., Cₖ₋₁ such that Cᵢ ⊢ Cᵢ₊₁
for all i, 0 ≤ i < k. C ⊢⁺ C′ means that C ⊢ᵏ C′ for some k ≥ 1, and
C ⊢* C′ means that C ⊢ᵏ C′ for some k ≥ 0. Thus, ⊢⁺ and ⊢* are, respectively,
the transitive and reflexive-transitive closures of ⊢. We shall drop the sub-
script M if no ambiguity arises.
We shall say that an input string w is accepted by M if (q₀, w) ⊢* (q, e)
for some q in F. The language defined by M, denoted L(M), is the set of input
strings accepted by M, that is,

    {w | w ∈ Σ* and (q₀, w) ⊢* (q, e) for some q in F}.
We shall now give two examples of finite automata. The first is a simple
"deterministic" automaton; the second shows the use of nondeterminism.

Example 2.11
Let M = ({p, q, r}, {0, 1}, δ, p, {r}) be a finite automaton, where δ is
specified as follows:

                    Input
    δ            0       1

    State  p    {q}     {p}
           q    {r}     {p}
           r    {r}     {r}

M accepts all strings of 0's and 1's which have two consecutive 0's. That
is, state p is the initial state and can be interpreted as "Two consecutive 0's
have not yet appeared, and the previous symbol was not a 0." State q means
"Two consecutive 0's have not appeared, but the previous symbol was a 0."
State r means "Two consecutive 0's have appeared." Note that once state r
is entered, M remains in that state.
On input 01001, the only possible sequence of configurations, beginning
with the initial configuration (p, 01001), is

    (p, 01001) ⊢ (q, 1001)
               ⊢ (p, 001)
               ⊢ (q, 01)
               ⊢ (r, 1)
               ⊢ (r, e)

Thus, 01001 is in L(M). □

Example 2.12
Let us design a nondeterministic finite automaton to accept the set of
strings in {1, 2, 3} such that the last symbol in the input string also appears
previously in the string. That is, 121 is accepted, but 31312 is not. We shall
have a state q0, which represents the idea that no attempt has been made to
recognize anything. In this state the automaton (or that existence of it,
anyway) is "coasting in neutral." We shall have states q₁, q₂, and q₃, which
represent the idea that a "guess" has been made that the last symbol of the
string is the subscript of the state. We have one final state, qf. In addition
to remaining in q₀, the automaton can go to state qₐ if a is the next input.
If (an existence of) the automaton is in qₐ, it can go to qf if it sees another a.
The automaton goes no place from qf, since the question of acceptance must
be decided anew as each symbol on its input tape becomes the "last." We
specify M formally as

    M = ({q₀, q₁, q₂, q₃, qf}, {1, 2, 3}, δ, q₀, {qf})

where δ is given by the following table:

                        Input
    δ            1           2           3

    State  q₀   {q₀, q₁}    {q₀, q₂}    {q₀, q₃}
           q₁   {q₁, qf}    {q₁}        {q₁}
           q₂   {q₂}        {q₂, qf}    {q₂}
           q₃   {q₃}        {q₃}        {q₃, qf}
           qf    ∅           ∅           ∅

On input 12321, the proliferation of configurations is shown in the tree
in Fig. 2.2.

    [Fig. 2.2  Configurations of M on input 12321 (tree not reproduced).
    The root is (q₀, 12321); each branch follows one nondeterministic choice,
    and the branch that guesses 1 reaches the accepting configuration (qf, e).]
Since (q₀, 12321) ⊢* (qf, e), the string 12321 is in L(M). Note that certain
configurations are repeated in Fig. 2.2, and for this reason a directed acyclic
graph might be a more suitable representation for the configurations entered
by M. □
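The "parallel existences" picture of nondeterminism is easy to simulate: the set of existences after reading a prefix of the input is just the set of states reachable on that prefix. The following Python sketch is not part of the original text; it applies this idea to the transition function of Example 2.12, with δ encoded as a dictionary from (state, symbol) to a set of states.

def accepts(delta, start, finals, w):
    """Breadth-first simulation of a nondeterministic finite automaton."""
    current = {start}
    for a in w:
        current = {q for p in current for q in delta.get((p, a), set())}
    return bool(current & finals)

delta = {("q0", a): {"q0", "q" + a} for a in "123"}   # q0 may guess that a is the last symbol
for b in "123":
    for a in "123":
        # q_b remembers the guess b; seeing b again allows a move to qf
        delta[("q" + b, a)] = {"q" + b} | ({"qf"} if a == b else set())

print(accepts(delta, "q0", {"qf"}, "12321"))   # True, as in Fig. 2.2
print(accepts(delta, "q0", {"qf"}, "31312"))   # False: 2 does not occur earlier
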

It is often convenient to use a pictorial representation of finite automata.


DEFINITION

Let M = (Q, Σ, δ, q₀, F) be a nondeterministic finite automaton. The
transition graph for M is an unordered labeled graph in which the nodes of
the graph are labeled by the names of the states and there is an edge (p, q)
if there exists an a ∈ Σ such that δ(p, a) contains q. Further, we label the
edge (p, q) by the list of a such that q ∈ δ(p, a). Thus the transition graphs
for the automata of Examples 2.11 and 2.12 are shown in Fig. 2.3. We have


    [Fig. 2.3  Transition graphs for the automata of (a) Example 2.11 and
    (b) Example 2.12 (not reproduced).]

indicated the start state by pointing to it with an arrow labeled "start," and
final states have been circled.
We shall define a deterministic finite automaton as a special case of the
nondeterministic variety.
DEFINITION

Let M = (Q, Σ, δ, q₀, F) be a nondeterministic finite automaton. We say
that M is deterministic if δ(q, a) has no more than one member for any q in
Q and a in Σ. If δ(q, a) always has exactly one member, we say that M is


completely specified.
Thus the automaton of Example 2.11 is a completely specified determin-
istic finite automaton. We shall hereafter reserve the term finite automaton
for a completely specified deterministic finite automaton.
One of the most important results from the theory of finite automata is
that the classes of languages defined by nondeterministic finite automata
and completely specified deterministic finite automata are identical. We shall
prove the result now.
CONVENTION
Since we shall be dealing primarily with deterministic finite automata,
we shall write "δ(q, a) = p" instead of "δ(q, a) = {p}" when the automaton
with transition function δ is deterministic. If δ(q, a) = ∅, we shall often
say that δ(q, a) is "undefined."
THEOREM 2.3
If L = L(M) for some nondeterministic finite automaton M, then
L = L(M′) for some finite automaton M′.
Proof. Let M = (Q, Σ, δ, q₀, F). We construct M′ = (Q′, Σ, δ′, q₀′, F′)
as follows:
(1) Q′ = 𝒫(Q). Thus the states of M′ are sets of states of M.
(2) q₀′ = {q₀}.
(3) F′ consists of all subsets S of Q such that S ∩ F ≠ ∅.
(4) For all S ⊆ Q, δ′(S, a) = S′, where S′ = {p | δ(q, a) contains p for
some q in S}.
It is left for the Exercises to prove the following statement by induction
on i:

(2.2.16)    (S, w) ⊢ⁱ (S′, e) in M′ if and only if
            S′ = {p | (q, w) ⊢ⁱ (p, e) in M for some q in S}

As a special case of (2.2.16), ({q₀}, w) ⊢* (S′, e) in M′ for some S′ in F′ if and
only if (q₀, w) ⊢* (p, e) in M for some p in F. Thus, L(M′) = L(M). □

Example 2.13
Let us construct a finite automaton M′ = (Q′, {1, 2, 3}, δ′, {q₀}, F′) accept-
ing the language of M in Example 2.12. Since M has 5 states, it seems that
M′ has 32 states. However, not all of these are accessible from the initial
state. That is, we call a state p accessible if there is a w such that
(q₀, w) ⊢* (p, e), where q₀ is the initial state. Here, we shall construct only the
accessible states.
We begin by observing that {q₀} is accessible. δ′({q₀}, a) = {q₀, qₐ} for
a = 1, 2, and 3. Let us consider the state {q₀, q₁}. We have δ′({q₀, q₁}, 1) =
                                        Input
                                   1       2       3

    State  A = {q₀}                B       C       D
           B = {q₀, q₁}            E       F       G
           C = {q₀, q₂}            F       H       I
           D = {q₀, q₃}            G       I       J
           E = {q₀, q₁, qf}        E       F       G
           F = {q₀, q₁, q₂}        K       K       L
           G = {q₀, q₁, q₃}        M       L       M
           H = {q₀, q₂, qf}        F       H       I
           I = {q₀, q₂, q₃}        L       N       N
           J = {q₀, q₃, qf}        G       I       J
           K = {q₀, q₁, q₂, qf}    K       K       L
           L = {q₀, q₁, q₂, q₃}    P       P       P
           M = {q₀, q₁, q₃, qf}    M       L       M
           N = {q₀, q₂, q₃, qf}    L       N       N
           P = {q₀, q₁, q₂, q₃, qf}  P     P       P

           Fig. 2.4  Transition function of M′.

{q₀, q₁, qf}. Proceeding in this way, we find that a set of states of M is acces-
sible if and only if
(1) It contains q₀, and
(2) If it contains qf, then it also contains q₁, q₂, or q₃.
The complete set of accessible states, together with the δ′ function, is
given in Fig. 2.4.
The initial state of M′ is A, and the set of final states consists of E, H, J,
K, M, N, and P.
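The accessible-subset idea of Example 2.13 is a small worklist computation. The following Python sketch is not part of the original text; it performs the construction of Theorem 2.3 while generating only the subsets reachable from {q₀}, and it reuses the transition function of Example 2.12 from the earlier simulation sketch.

def subset_construction(delta, start, finals, alphabet):
    """Accessible-subset construction; delta maps (state, symbol) to a set of states."""
    start_set = frozenset([start])
    states, worklist, trans = {start_set}, [start_set], {}
    while worklist:
        S = worklist.pop()
        for a in alphabet:
            T = frozenset(q for p in S for q in delta.get((p, a), set()))
            trans[(S, a)] = T
            if T not in states:
                states.add(T)
                worklist.append(T)
    new_finals = {S for S in states if S & finals}
    return states, trans, start_set, new_finals

# The automaton of Example 2.12 once more:
delta = {("q0", a): {"q0", "q" + a} for a in "123"}
for b in "123":
    for a in "123":
        delta[("q" + b, a)] = {"q" + b} | ({"qf"} if a == b else set())

states, trans, start, finals = subset_construction(delta, "q0", {"qf"}, "123")
print(len(states), len(finals))   # 15 accessible states, 7 of them final, as in Fig. 2.4
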

2.2.4. Finite A u t o m a t a and Regular Sets

We shall show that a language is a regular set if and only if it is defined
by a finite automaton. The method is first to show that a finite automaton
language is defined by a right-linear grammar. Then we show that the finite
automaton languages include ∅, {e}, and {a} for all symbols a, and are closed
under union, concatenation, and *. Thus every regular set is a finite automa-
ton language. The following sequence of lemmas proves these assertions.
LEMMA 2.8

If L = L(M) for finite automaton M, then L = L(G) for some right-


linear grammar G.
Proof. Let M = (Q, Σ, δ, q₀, F) (M is deterministic, of course). We let
G = (Q, Σ, P, q₀), where P is defined as follows:
(1) If δ(q, a) = r, then P contains the production q → ar.
(2) If p is in F, then p → e is a production in P.
We can show that each step of a derivation in G mimics a move by M.
We shall prove by induction on i that

(2.2.17)    For q in Q, q ⇒ⁱ⁺¹ w if and only if (q, w) ⊢ⁱ (r, e) for some r in F.

Basis. For i = 0, clearly q ⇒ e if and only if (q, e) ⊢⁰ (q, e) for q in F.

Inductive Step. Assume that (2.2.17) is true for i, and let w = ax, where
|x| = i. Then q ⇒ⁱ⁺² w if and only if q ⇒ as ⇒ⁱ⁺¹ ax for some s ∈ Q. But q ⇒ as
if and only if δ(q, a) = s. From the inductive hypothesis, s ⇒ⁱ⁺¹ x if and only
if (s, x) ⊢ⁱ (r, e) for some r ∈ F. Therefore, q ⇒ⁱ⁺² w if and only if
(q, w) ⊢ⁱ⁺¹ (r, e) for some r ∈ F. Thus Eq. (2.2.17) is true for all i ≥ 0.
We now have q₀ ⇒⁺ w if and only if (q₀, w) ⊢* (r, e) for some r ∈ F.
Thus, L(G) = L(M). □
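The grammar of Lemma 2.8 can be written down directly from the transition table. The following Python sketch is not part of the original text; it lists the productions q → ar and q → e for a deterministic automaton given as a dictionary, and applies the construction to the automaton of Example 2.11.

def dfa_to_right_linear(delta, finals):
    """One production q -> a r per transition delta(q, a) = r, plus q -> e per final state q."""
    productions = [(q, a + r) for (q, a), r in delta.items()]
    productions += [(q, "e") for q in finals]
    return productions

delta = {("p", "0"): "q", ("p", "1"): "p",
         ("q", "0"): "r", ("q", "1"): "p",
         ("r", "0"): "r", ("r", "1"): "r"}
for lhs, rhs in dfa_to_right_linear(delta, {"r"}):
    print(lhs, "->", rhs)
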
LEMMA 2.9
Let Σ be a finite alphabet. (i) ∅, (ii) {e}, and (iii) {a} for a ∈ Σ are finite
automaton languages.
Proof.
(i) Any finite automaton with an empty set of final states accepts ∅.
(ii) Let M = ({q₀}, Σ, δ, q₀, {q₀}), where δ(q₀, a) is undefined for all a
in Σ. Then L(M) = {e}.
(iii) Let M = ({q₀, q₁}, Σ, δ, q₀, {q₁}), where δ(q₀, a) = q₁ and δ is un-
defined otherwise. Then L(M) = {a}. □
LEMMA 2.10
Let L₁ = L(M₁) and L₂ = L(M₂) for finite automata M₁ and M₂. Then
(i) L₁ ∪ L₂, (ii) L₁L₂, and (iii) L₁* are finite automaton languages.
Proof. Let M₁ = (Q₁, Σ, δ₁, q₁, F₁) and M₂ = (Q₂, Σ, δ₂, q₂, F₂). We
assume without loss of generality that Q₁ ∩ Q₂ = ∅, since states can be
renamed at will.
(i) Let M = (Q₁ ∪ Q₂ ∪ {q₀}, Σ, δ, q₀, F) be a nondeterministic finite
automaton, where
(1) q₀ is a new state,
(2) F = F₁ ∪ F₂ if e is not in L₁ or L₂, and F = F₁ ∪ F₂ ∪ {q₀} if e
is in L₁ or L₂, and
(3) (a) δ(q₀, a) = δ₁(q₁, a) ∪ δ₂(q₂, a) for all a in Σ,
    (b) δ(q, a) = δ₁(q, a) for all q in Q₁, a in Σ, and
    (c) δ(q, a) = δ₂(q, a) for all q in Q₂, a in Σ.
Thus, M guesses whether to simulate M₁ or M₂. Since M is nondeterministic,
it actually does both. It is straightforward to show by induction on i ≥ 1
that (q₀, w) ⊢ⁱ (q, e) in M if and only if q is in Q₁ and (q₁, w) ⊢ⁱ (q, e) in M₁, or q is in Q₂
and (q₂, w) ⊢ⁱ (q, e) in M₂. This result, together with the definition of F, yields
L(M) = L(M₁) ∪ L(M₂).
(ii) To construct a finite automaton M to recognize L₁L₂, let
M = (Q₁ ∪ Q₂, Σ, δ, q₁, F), where δ is defined by
(1) δ(q, a) = δ₁(q, a) for all q in Q₁ − F₁,
(2) δ(q, a) = δ₁(q, a) ∪ δ₂(q₂, a) for all q in F₁, and
(3) δ(q, a) = δ₂(q, a) for all q in Q₂.
Let

    F = F₂,        if q₂ ∉ F₂
        F₁ ∪ F₂,   if q₂ ∈ F₂

That is, M begins by simulating M₁. When M reaches a final state of M₁,
it may, nondeterministically, imagine that it is in the initial state of M₂
by rule (2). M will then simulate M₂. Let x be in L₁ and y in L₂. Then
(q₁, xy) ⊢* (q, y) for some q in F₁. If x = e, then q = q₁. If y ≠ e, then, using
one rule from (2) and zero or more from (3), (q, y) ⊢* (r, e) for some r ∈ F₂.
If y = e, then q is in F, since q₂ ∈ F₂. Thus, xy is in L(M). Suppose that w
is in L(M). Then (q₁, w) ⊢* (q, e) for some q ∈ F. There are two cases to
consider, depending on whether q ∈ F₂ or q ∈ F₁. Suppose that q ∈ F₂.
Then we can write w = xay for some a in Σ such that

    (q₁, xay) ⊢* (r, ay) ⊢ (s, y) ⊢* (q, e),

where r ∈ F₁, s ∈ Q₂, and δ₂(q₂, a) contains s. Then x ∈ L₁ and ay ∈ L₂.
Suppose that q ∈ F₁. Then q₂ ∈ F₂ and e is in L₂. Thus, w ∈ L₁. We con-
clude that L(M) = L₁L₂.
(iii) We construct M = (Q₁ ∪ {q′}, Σ, δ, q′, F₁ ∪ {q′}), where q′ is a new
state not in Q₁, to accept L₁* as follows. δ is defined by
(1) δ(q, a) = δ₁(q, a) if q is in Q₁ − F₁ and a ∈ Σ,
(2) δ(q, a) = δ₁(q, a) ∪ δ₁(q₁, a) if q is in F₁ and a ∈ Σ, and
(3) δ(q′, a) = δ₁(q₁, a) for all a in Σ.
Thus, whenever M enters a final state of M₁, it has the option of continuing
to simulate M₁ or to begin simulating M₁ anew from the initial state. A proof
that L(M) = L₁* is similar to the proof of part (ii). Note that since q′ is
a final state, e ∈ L(M). □
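Construction (i) of Lemma 2.10 translates almost line for line into code. The following Python sketch is not part of the original text; both arguments are given in the set-valued form used in the earlier simulation sketches (a dictionary from (state, symbol) to a set of states) with disjoint state sets, and "q_new" is an assumed fresh name for the new initial state.

def union_automaton(m1, m2):
    """Nondeterministic automaton for L(M1) union L(M2), per part (i) of Lemma 2.10."""
    (delta1, start1, finals1), (delta2, start2, finals2) = m1, m2
    q0 = "q_new"                                   # assumed not to clash with existing state names
    delta = dict(delta1)
    delta.update(delta2)
    # rule (3a): from the new start state, imitate either original start state
    for (p, a), targets in list(delta1.items()) + list(delta2.items()):
        if p == start1 or p == start2:
            delta[(q0, a)] = delta.get((q0, a), set()) | targets
    finals = set(finals1) | set(finals2)
    if start1 in finals1 or start2 in finals2:     # rule (2): e is in L1 or in L2
        finals.add(q0)
    return delta, q0, finals

# Combined with the accepts() sketch given after Example 2.12, the returned
# triple recognizes exactly L(M1) union L(M2).
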
THEOREM 2.4

A language is accepted by a finite automaton if and only if it is a regular


set.
Proof. Immediate from Theorem 2.2 and Lemmas 2.8, 2.9, and 2.10. □

2.2.5. Summary

The results of Section 2.2 can be summarized in the following theorem.



THEOREM 2.5
The following statements are equivalent:
(1) L is a regular set.
(2) L is a right-linear language.
(3) L is a finite automaton language.
(4) L is a nondeterministic finite automaton language.
(5) L is denoted by a regular expression. □

EXERCISES

2.2.1. Which of the following are regular sets? Give regular expressions for
    those which are.
    (a) The set of words with an equal number of 0's and 1's.
    (b) The set of words in {0, 1}* with an even number of 0's and an odd
        number of 1's.
    (c) The set of words in Σ* whose length is divisible by 3.
    (d) The set of words in {0, 1}* with no substring 101.
2.2.2. Show that the set of regular expressions over Σ is a CFL.
2.2.3. Show that if L is any regular set, then there is an infinity of regular
expressions denoting L.
2.2.4. Let L be a regular set. Prove directly from the definition of a regular
    set that Lᴿ is a regular set. Hint: Induction on the number of applica-
    tions of the definition of regular set used to show L to be regular.
2.2.5. Show the following identities for regular expressions α, β, and γ:
    (a) α(β + γ) = αβ + αγ.          (g) (α + β)γ = αγ + βγ.
    (b) α + (β + γ) = (α + β) + γ.   (h) ∅* = e.
    (c) α(βγ) = (αβ)γ.               (i) α* + α = α*.
    (d) αe = eα = α.                 (j) (α*)* = α*.
    (e) ∅α = α∅ = ∅.                 (k) (α + β)* = (α*β*)*.
    (f) α + ∅ = α.                   (l) α + α = α.
2.2.6. Solve the following set of regular expression equations:

    A₁ = (01* + 1)A₁ + A₂
    A₂ = 11 + 1A₁ + 00A₃
    A₃ = e + A₁ + A₂

2.2.7. Consider the single equation

(2.2.18)    X = αX + β

where α and β are regular expressions over Σ and X ∉ Σ. Show that

    (a) If e is not in α, then X = α*β is the unique solution to (2.2.18).
    (b) If e is in α, then α*β is the minimal fixed point of (2.2.18), but
        there are an infinity of solutions.
    (c) In either case, every solution to (2.2.18) is a set of the form
        α*(β ∪ L) for some (not necessarily regular) language L.
2.2.8. Solve the following general pair of standard form equations:

    X = α₁X + α₂Y + α₃
    Y = β₁X + β₂Y + β₃

2.2.9. Complete the proof of Lemma 2.4.


2.2.10. Prove L e m m a 2.5.
2.2.11. Find right-linear grammars for those sets in Exercise 2.2.1 which are
regular sets.
DEFINITION
A grammar G = (N, Σ, P, S) is left-linear if every production in P
is of the form A → Bw or A → w.
2.2.12. Show that a language is a regular set if and only if it has a left-linear
grammar. Hint: Use Exercise 2.2.4.

DEFINITION
A right-linear grammar G = (N, Σ, P, S) is called a regular gram-
mar when
(1) All productions with the possible exception of S → e are of the
form A → aB or A → a, where A and B are in N and a is in Σ.
(2) If S → e is in P, then S does not appear on the right of any
production.
2.2.13. Show that every regular set has a regular grammar. Hint: There are
several ways to do this. One way is to apply a sequence of transforma-
tions to a right-linear grammar G which will map G into an equivalent
regular grammar. Another way is to construct a regular grammar
directly from a finite automaton.
2.2.14. Construct a regular grammar for the regular set generated by the
right-linear grammar

    A → B | C
    B → 0B | 1B | 0 | 1
    C → 0D | 1C | e
    D → 0C | 1D

2.2.15. Provide an algorithm which, given a regular grammar G and a string w,


determines whether w is in L(G).
2.2.16. Prove line (2.2.16) in Theorem 2.3.
2.2.17. Complete the proof of L e m m a 2.7(iii).

DEFINITION
A production A → α of right-linear grammar G = (N, Σ, P, S) is
useless if there do not exist strings w and x in Σ* such that

    S ⇒* wA ⇒ wα ⇒* wx.

2.2.18. Give an algorithm to convert a right-linear grammar to an equivalent


one with no useless productions.
"2.2.19. Let G = (N, Z, P, S) be a right-linear grammar. Let N = [A1 . . . . . A,},
and define ~xtj = xl + x2 + . - - + xm, where At ~ x t At . . . . . At ~ xmA#
are all the productions of the form A t - - , y A # . Also define
OClo = x t + . . . + Xm, where At ~ xt . . . . . At ---~ xm are all produc-
tions of the form At ---~ y. Let Q be the set of standard form equations
At = ~zt0 + tgtlA1 + tZtzA2 + . . . + tZt~A,. Show that the minimal
fixed point of Q is L(G). H i n t : Use Lemma 2.3.
2.2.20. Show that L(G) of Example 2.10 is the set of strings in {0, 1}* whose
    number of 0's is divisible by 3.
2.2.21. Find deterministic and nondeterministic finite automata for those sets
of Exercise 2.2.1 which are regular.
2.2.22. Show that the finite automaton of Example 2.11 accepts the language
    (0 + 1)*00(0 + 1)*.
2.2.23. Prove that the finite automaton of Example 2.12 accepts the language
    {wa | a in {1, 2, 3} and w has an instance of a}.
2.2.24. Complete the proof of Lemma 2.10(iii).
*2.2.25. A two-way finite automaton is a (nondeterministic) finite control with
an input head that can move either left or right or remain stationary.
Show that a language is accepted by a two-way finite automaton if and
only if it is a regular set. Hint: Construct a deterministic one-way
    finite automaton which, after reading input w ≠ e, has in its finite
control a finite table which tells, for each state q of the two-way auto-
maton in what state, if any, it would move off the right end of w, when
started in state q at the rightmost symbol of w.
*2.2.26. Show that allowing a one-way finite automaton to keep its input head
stationary does not increase the class of languages defined by the device.
**2.2.27. For arbitrary n, show that there is a regular set which can be recognized
    by an n-state nondeterministic finite automaton but requires 2ⁿ states
in any deterministic finite automaton recognizing it.
2.2.28. Show that every language accepted by an n-state two-way finite auto-
    maton is accepted by a 2ⁿ⁽ⁿ⁺¹⁾-state finite automaton.
**2.2.29. How many different languages over {0, 1} are defined by two-state
    (a) Nondeterministic finite automata?
    (b) Deterministic finite automata?
    (c) Finite automata?

DEFINITION

A set S of integers forms an arithmetic progression if we can write
S = {c, c + p, c + 2p, ..., c + ip, ...}. For any language L, let
S(L) = {i | for some w in L, |w| = i}.
**2.2.30. Show that for every regular language L, S(L) is the union of a finite
    number of arithmetic progressions.

Open Problem
2.2.31. How close to the bound of Exercise 2.2.28 for converting n-state two-
way nondeterministic finite automata to k-state finite automata is it
actually possible to come ?

BIBLIOGRAPHIC NOTES

Regular expressions were defined by Kleene [1956]. McNaughton and Yamada


[1960] and Brzozowski [1962] cover regular expressions in more detail. Salomaa
[1966] describes two axiom systems for regular expressions. The equivalence of
regular languages and regular sets is given by Chomsky and Miller [1958]. The
equivalence of deterministic and nondeterministic finite automata is given by
Rabin and Scott [1959]. Exercise 2.2.25 is from there and from Shepherdson [1959].

2.3. PROPERTIES OF REGULAR SETS

In this section we shall derive a number of useful facts about finite


automata and regular sets. A particularly important result is that for every
regular set there is an essentially unique minimum state finite automaton
that defines that set.

2.3.1. Minimization of Finite A u t o m a t a

Given a finite automaton M, we can find the smallest finite automaton


equivalent to M by eliminating all inaccessible states in M and then merging
all redundant states in M. The redundant states are determined by partition-
ing the set of all accessible states into equivalence classes such that each
equivalence class contains indistinguishable states and is as large as possible.
We then choose one representative from each equivalence class as a state for
the reduced automaton. Thus we can reduce the size of M if M contains
inaccessible states or two or more indistinguishable states. We shall show
that this reduced machine is the smallest finite automaton that recognizes
the regular set defined by the original machine M.
DEFINITION

Let M = (Q, Σ, δ, q₀, F) be a finite automaton, and let q₁ and q₂ be dis-
tinct states. We say that x in Σ* distinguishes q₁ from q₂ if (q₁, x) ⊢* (q₃, e),
(q₂, x) ⊢* (q₄, e), and exactly one of q₃ and q₄ is in F. We say that q₁
and q₂ are k-indistinguishable, written q₁ ≡ᵏ q₂, if and only if there is no x,
with |x| ≤ k, which distinguishes q₁ from q₂. We say that two states q₁ and
q₂ are indistinguishable, written q₁ ≡ q₂, if and only if they are k-indistinguish-
able for all k ≥ 0.
A state q ∈ Q is said to be inaccessible if there is no input string x such
that (q₀, x) ⊢* (q, e).
M is said to be reduced if no state in Q is inaccessible and no two distinct
states of Q are indistinguishable.

Example 2.14
Consider the finite automaton M whose transition graph is shown in
Fig. 2.5.

    [Fig. 2.5  Transition graph of M (not reproduced).]

To reduce M we first notice that states F and G are inaccessible from


the start state A and thus can be removed. We shall see in the next algorithm
that the equivalence classes under ≡ are {A}, {B, D}, and {C, E}. Thus we
can represent these sets by the states p, q, and r, respectively, to obtain the
finite automaton of Fig. 2.6, which is the reduced automaton for M. □
LEMMA 2.11
Let M = (Q, Σ, δ, q₀, F) be a finite automaton with n states. States q₁
and q₂ are indistinguishable if and only if they are (n − 2)-indistinguishable.

    [Fig. 2.6  The reduced machine (not reproduced).]
Proof. The "only if" portion is trivial. The "if" portion is trivial if F
has 0 or n states. Therefore, assume the contrary.
We shall show that the following condition must hold on the k-indistin-
guishability relations:

    ≡ⁿ⁻² ⊆ ≡ⁿ⁻³ ⊆ ⋯ ⊆ ≡² ⊆ ≡¹ ⊆ ≡⁰

To see this, we observe that for q₁ and q₂ in Q,
(1) q₁ ≡⁰ q₂ if and only if both q₁ and q₂ are either in F or not in F.
(2) q₁ ≡ᵏ q₂ if and only if q₁ ≡ᵏ⁻¹ q₂ and, for all a in Σ, δ(q₁, a) ≡ᵏ⁻¹ δ(q₂, a).
The equivalence relation ≡⁰ is the coarsest and partitions Q into two equiva-
lence classes, F and Q − F. Then if ≡ᵏ⁺¹ ≠ ≡ᵏ, ≡ᵏ⁺¹ is a strict refinement of ≡ᵏ;
that is, ≡ᵏ⁺¹ contains at least one more equivalence class than ≡ᵏ. Since there
are at most n − 1 elements in either F or Q − F, we can have at most n − 2
successive refinements of ≡⁰. If for some k, ≡ᵏ⁺¹ = ≡ᵏ, then ≡ᵏ⁺¹ = ≡ᵏ⁺² = ⋯,
by (2). Thus, ≡ is the first relation ≡ᵏ such that ≡ᵏ = ≡ᵏ⁺¹. □
Lemma 2.11 has the interesting interpretation that if two states can be
distinguished, they can be distinguished by an input sequence of length less
than the number of states in the finite automaton. The following algorithm
gives the details of how to minimize the number of states in a finite automa-
ton.
ALGORITHM 2.2

Construction of the canonical finite automaton.

Input. A finite automaton M = (Q, Σ, δ, q₀, F).
Output. A reduced equivalent finite automaton M′.
Method.
Step 1: Use Algorithm 0.3 on the transition graph of M to find those
states which are inaccessible from q₀. Delete all inaccessible states.
Step 2: Construct the equivalence relations ≡⁰, ≡¹, ..., as outlined in
Lemma 2.11. Continue until ≡ᵏ⁺¹ = ≡ᵏ. Choose ≡ to be ≡ᵏ.
Step 3: Construct the finite automaton M′ = (Q′, Σ, δ′, q₀′, F′), where
(a) Q′ is the set of equivalence classes under ≡. Let [p] be the equivalence
class of state p under ≡.
(b) δ′([p], a) = [q] if δ(p, a) = q.
(c) q₀′ is [q₀].
(d) F′ = {[q] | q ∈ F}. □
It is straightforward to show that step 3(b) is consistent; i.e., whatever
member of [p] we choose, we get the same equivalence class for δ′([p], a).
A proof that L(M′) = L(M) is also straightforward and left for the Exercises.
We prove that no automaton with fewer states than M′ accepts L(M).

THEOREM 2.6
M ' of Algorithm 2.2 has the smallest number of states of any finite
automaton accepting L(M).
Proof. Suppose that M″ had fewer states than M′ and that L(M″) = L(M).
Each equivalence class under ≡ is nonempty, so each state of M′ is
accessible. Since M″ has fewer states than M′, there exist strings w and x such
that (q₀″, w) ⊢* (q, e) and (q₀″, x) ⊢* (q, e) in M″ for some state q, where q₀″ is the
initial state of M″, but w and x take M′ to different states. Hence, w and x
take M to different states, say p and r, which are distinguishable. That is,
there is some y such that exactly one of wy and xy is in L(M). But wy and xy
must take M″ to the same state, namely, that state s such that (q, y) ⊢* (s, e)
in M″. Thus it is not possible that exactly one of wy and xy is in L(M″), as
supposed. □

Example 2.15

Let us find a reduced finite automaton for the finite automaton M whose
transition graph is shown in Fig. 2.7. The equivalence classes for ≡ᵏ, k ≥ 0,
are as follows:

    For ≡⁰:  {A, F}, {B, C, D, E}
    For ≡¹:  {A, F}, {B, E}, {C, D}
    For ≡²:  {A, F}, {B, E}, {C, D}

    [Fig. 2.7  Transition graph of M (not reproduced).]



Since ≡² = ≡¹, we have ≡ = ≡¹. The reduced machine M′ is ({[A], [B], [C]},
{a, b}, δ′, [A], {[A]}), where δ′ is defined as

              a       b

    [A]      [A]     [B]
    [B]      [B]     [C]
    [C]      [C]     [A]

Here we have chosen [A] to represent the equivalence class {A, F}, [B] to
represent {B, E}, and [C] to represent {C, D}. □
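Steps 2 and 3 of Algorithm 2.2 amount to repeatedly refining the partition {F, Q − F}. The following Python sketch is not part of the original text; it computes the indistinguishability classes by the refinement rule of Lemma 2.11, assuming the transition dictionary is completely specified.

def indistinguishability_classes(states, alphabet, delta, finals):
    """Return a map from state to block number; two states are merged iff their numbers agree."""
    block = {q: int(q in finals) for q in states}     # the relation for k = 0
    while True:
        # q and r stay together iff they agree now and their successors agree now (Lemma 2.11, rule (2))
        signature = {q: (block[q], tuple(block[delta[(q, a)]] for a in alphabet))
                     for q in states}
        ids, new_block = {}, {}
        for q in states:
            new_block[q] = ids.setdefault(signature[q], len(ids))
        if len(set(new_block.values())) == len(set(block.values())):
            return new_block                          # no further refinement: this is the relation of step 2
        block = new_block

# For the automaton of Fig. 2.7, the blocks stabilize at {A, F}, {B, E}, and {C, D},
# exactly the classes found in Example 2.15.
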

2.3.2. The Pumping Lemma for Regular Sets

We shall now derive a characterization of regular sets that will be useful


in proving certain languages not to be regular. The next theorem is referred
to as a "pumping" lemma, because it says, in effect, that given any regular
set and any sufficiently long sentence in that set, we can find a nonempty
substring in that sentence which can be repeated as often as we like (i.e.,
"pumped") and the new strings so formed will all be in the same regular set.
It is often possible to thus derive a contradiction of the hypothesis that the
set is regular.
THEOREM 2.7
(The pumping lemma for regular sets.) Let L be a regular set. There exists
a constant p such that if a string w is in L and |w| ≥ p, then w can be written
as xyz, where 0 < |y| ≤ p and xyⁱz ∈ L for all i ≥ 0.
Proof. Let M = (Q, Σ, δ, q₀, F) be a finite automaton with n states such
that L(M) = L. Let p = n. If w ∈ L and |w| ≥ n, then consider the sequence
of configurations entered by M in accepting w. Since there are at least n + 1
configurations in the sequence, there must be two with the same state among
the first n + 1 configurations. Thus we have a sequence of moves such that

    (q₀, xyz) ⊢* (q₁, yz) ⊢ᵏ (q₁, z) ⊢* (q₂, e)

for some q₁ and 0 < k ≤ n. Thus, 0 < |y| ≤ n. But then

    (q₀, xyⁱz) ⊢* (q₁, yⁱz)
               ⊢ᵏ (q₁, yⁱ⁻¹z)
                ⋮
               ⊢ᵏ (q₁, yz)
               ⊢ᵏ (q₁, z)
               ⊢* (q₂, e)

must be a valid sequence of moves for all i ≥ 1. Since w = xyz is in L, xyⁱz
is in L for all i ≥ 1. The case i = 0 is handled similarly. □
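The decomposition used in this proof is effective: running the automaton on w and splitting at the first repeated state produces a pumpable piece. The following Python sketch is not part of the original text; it assumes a deterministic, completely specified transition dictionary.

def find_pump(delta, start, w):
    """Split w = xyz around the first state repetition, so that x y^i z is accepted for all i >= 0."""
    seen = {start: 0}            # state -> number of symbols read when first in that state
    q = start
    for i, a in enumerate(w):
        q = delta[(q, a)]
        if q in seen:            # the loop on state q reads y = w[seen[q] : i + 1]
            j = seen[q]
            return w[:j], w[j:i + 1], w[i + 1:]
        seen[q] = i + 1
    return None                  # w is shorter than the number of distinct states visited

# With the automaton of Example 2.11 and w = "1011001" (which is in L(M)),
# find_pump returns x = "", y = "1", z = "011001"; every string x y^i z, i >= 0,
# still contains the substring 00 and so remains in L(M).
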

Example 2.16

We shall use the pumping lemma to show that L = {0ⁿ1ⁿ | n ≥ 1} is not
a regular set. Suppose that L is regular. Then for a sufficiently large n, 0ⁿ1ⁿ
can be written as xyz such that y ≠ e and xyⁱz ∈ L for all i ≥ 0. If y ∈ 0⁺
or y ∈ 1⁺, then xz = xy⁰z ∉ L. If y ∈ 0⁺1⁺, then xyyz ∉ L. We have a con-
tradiction, so L cannot be regular. □

2.3.3. Closure Properties of Regular Sets

We say that a set A is closed under the n-ary operation θ if θ(a₁, a₂, ..., aₙ)
is in A whenever aᵢ is in A for 1 ≤ i ≤ n. For example, the set of integers is
closed under the binary operation addition.
In this section we shall examine certain operations under which the class
of regular sets is closed. We can then use these closure properties to help
determine whether certain languages are regular. We already know that if
L₁ and L₂ are regular sets, then L₁ ∪ L₂, L₁L₂, and L₁* are regular.
DEFINITION
A class of sets is a Boolean algebra of sets if it is closed under union,
intersection, and complementation.
THEOREM 2.8
The class of regular sets included in Σ* is a Boolean algebra of sets for
any alphabet Σ.
Proof. We shall show closure under complementation. We already have
closure under union, and closure under intersection follows from the set-
theoretic law A ∩ B = ¬(¬A ∪ ¬B) (Exercise 0.1.4). Let M = (Q, Δ, δ, q₀, F)
be any finite automaton with Δ ⊆ Σ. It is easy to show that every regular
set L ⊆ Σ* has such a finite automaton. Then the finite automaton
M′ = (Q, Δ, δ, q₀, Q − F) accepts Δ* − L(M). Note that the fact that M is
completely specified is needed here. Now the complement of L(M) with respect
to Σ* can be expressed as Σ* − L(M) = L(M′) ∪ Σ*(Σ − Δ)Σ*. Since
Σ*(Σ − Δ)Σ* is regular, the regularity of the complement follows from the
closure of regular sets under union. □
THEOREM 2.9
The class of regular sets is closed under reversal.
Proof. Let M = (Q, Σ, δ, q₀, F) be a finite automaton defining the regu-
lar set L. To define Lᴿ we "run M backward." That is, let M′ be the nondeter-
ministic finite automaton (Q ∪ {q₀′}, Σ, δ′, q₀′, F′), where F′ is {q₀} if e ∉ L
and F′ = {q₀, q₀′} if e ∈ L.
(1) δ′(q₀′, a) contains q if δ(q, a) ∈ F.
(2) For all q′ in Q and a in Σ, δ′(q′, a) contains q if δ(q, a) = q′.
It is easy to show that (q₀, w) ⊢* (q, e) in M, where q ∈ F, if and only if
(q₀′, wᴿ) ⊢* (q₀, e) in M′. Thus, L(M′) = (L(M))ᴿ = Lᴿ. □

The class of regular sets is closed under most common language-theoretic
operations. More of these closure properties are explored in the Exer-
cises.

2.3.4. Decidable Questions About Regular Sets

We have seen certain specifications for regular sets, such as regular


expressions and finite automata. There are certain natural questions concern-
ing these representations that come up. Three questions with which we shall
be concerned here are the following:
The membership problem" "Given a specification of known type and
a string w, is w in the language so specified ?"
The emptiness problem: "Given a specification of known type, does it
specify the empty set ?"
The equivalence problem: "Given two specifications of the same known
type, do they specify the same language ?"
The specifications for regular sets that we shall consider are
(1) Regular expressions,
(2) Right-linear grammars, and
(3) Finite automata.
We shall first give algorithms to decide the three problems when the specifi-
cation is a finite automaton.
ALGORITHM 2.3

Decision of the membership problem for finite automata.

Input. A finite automaton M = (Q, Σ, δ, q₀, F) and a word w in Σ*.
Output. "YES" if w ∈ L(M), "NO" otherwise.
Method. Let w = a₁a₂ ⋯ aₙ. Successively find the states q₁ = δ(q₀, a₁),
q₂ = δ(q₁, a₂), ..., qₙ = δ(qₙ₋₁, aₙ). If qₙ is in F, say "YES"; if not, say
"NO." □

The correctness of Algorithm 2.3 is too obvious to discuss. However, it is


worth discussing the time and space complexity of the algorithm. A natural
measure of these complexities is the number of steps and memory cells
needed to execute the algorithm on a random access computer in which each
memory cell can store an integer of arbitrary size. (Actually there is a bound
on the size of integers for real machines, but this bound is so large that we
would undoubtedly never come against it for finite automata which we might
reasonably consider. Thus the assumption of unbounded integers is a reason-


able mathematical simplification here.)
It is easy to see that the time taken is a linear function of the length of w.
However, it is not so clear whether or not the "size" of M affects the time
taken. We must assume that the actual specification for M is a string of
symbols chosen from some finite alphabet. Thus we might suppose that
states are named q₀, q₁, ..., qᵢ, ..., where the integer subscripts are binary
numbers. Likewise, the input symbols might be called a₁, a₂, .... Assuming
a normal kind of computer, one could take the pairs in the relation δ and
construct a two-dimensional array that in cell (i, j) gives δ(qᵢ, aⱼ). Thus the
total time of the algorithm would be an amount proportional to the length
of the specification of M to construct the table, plus an amount proportional
to |w| to execute the algorithm.
The space required is primarily the space required by the table, which is
seen to be proportional to the length of M's specification. (Recall that δ is
really a set of pairs, one for each pair of a state and input symbol.)
We shall now give algorithms to decide the emptiness and equivalence
problems when the method of specification is a finite automaton.

ALGORITHM 2 . 4

Decision of emptiness problem for finite automata.


Input. A finite automaton M = (Q, Σ, δ, q0, F).
Output. "YES" if L(M) ≠ ∅, "NO" otherwise.
Method. Compute the set of states accessible from q0. If this set contains
a final state, say "YES"; otherwise, say "NO." □
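As an illustration (with the same assumed dictionary representation of δ as in the earlier sketch), Algorithm 2.4 amounts to a graph search over the accessible states:

from collections import deque

def dfa_nonempty(delta, Sigma, q0, F):
    """Algorithm 2.4: L(M) is nonempty iff some accessible state is final."""
    seen = {q0}
    queue = deque([q0])
    while queue:                      # breadth-first search from q0
        q = queue.popleft()
        for a in Sigma:
            r = delta[(q, a)]
            if r not in seen:
                seen.add(r)
                queue.append(r)
    return any(q in F for q in seen)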
ALGORITHM 2.5
Decision of equivalence problem for finite automata.

Input. Two finite automata M1 = (Q1, Σ1, δ1, q1, F1) and M2 =
(Q2, Σ2, δ2, q2, F2) such that Q1 ∩ Q2 = ∅.
Output. "YES" if L(M1) = L(M2), "NO" otherwise.
Method. Construct the finite automaton

M = (Q1 ∪ Q2, Σ1 ∪ Σ2, δ1 ∪ δ2, q1, F1 ∪ F2).

Using Lemma 2.11 determine whether q1 ≡ q2. If so, say "YES"; otherwise,
say "NO." □
We point out that we could also use Algorithm 2.4 to solve the equiva-
lence problem, since L(M1) = L(M2) if and only if

((Σ* − L(M1)) ∩ L(M2)) ∪ (L(M1) ∩ (Σ* − L(M2))) = ∅.
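This remark can be realized directly by a product construction: search the reachable pairs of states and report inequivalence as soon as a pair is found in which exactly one component is final. The following Python sketch is ours; it assumes both automata are completely specified over the same alphabet Sigma and use the dictionary representation of the earlier sketches, and it tests equivalence this way rather than through Lemma 2.11.

def equivalent(delta1, q01, F1, delta2, q02, F2, Sigma):
    """True iff the two completely specified DFAs accept the same language."""
    seen = {(q01, q02)}
    stack = [(q01, q02)]
    while stack:
        p, q = stack.pop()
        if (p in F1) != (q in F2):    # some word reaches (p, q): it lies in the
            return False              # symmetric difference of the two languages
        for a in Sigma:
            r = (delta1[(p, a)], delta2[(q, a)])
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return True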



We now turn to the decidability of the membership, emptiness, and


equivalence problems for two other representations of regular sets--the
regular expressions and right-linear grammars. It is simple to show that for
these representations the three problems are also decidable. A regular expres-
sion can be converted, by an algorithm which is implicit in Lemmas 2.9 and
2.10, into a finite automaton. The appropriate one of Algorithms 2.3-2.5
can then be applied. A right-linear grammar can be converted into a regular
expression by Algorithm 2.1 and the algorithm implicit in Theorem 2.2.
Obviously, these algorithms are too indirect to be practical. Direct, fast-
working algorithms will be the subject of several of the Exercises.
We can summarize these results by the following theorem.
THEOREM 2.10
If the method of specification is finite automata, regular expressions, or
right-linear grammars, then the membership, emptiness, and equivalence
problems are decidable for regular sets. [Z
It should be emphasized that these three problems are not decidable for
every representation of regular sets. In particular, consider the following
example.

Example 2.17

We can enumerate the Turing machines. (See the Exercises in Section 0.4.)
Let M1, M2, . . . be such an enumeration. We can define the integers to be
a representation of the regular sets as follows:
(1) If Mi accepts a regular set, then let integer i represent that regular set.
(2) If Mi does not accept a regular set, then let integer i represent {e}.
Each integer thus represents a regular set, and each regular set is repre-
sented by at least one integer. It is known that for the representation of
Turing machines used here the emptiness problem is undecidable (Exercise
0.4.16). Suppose that it were decidable whether integer i represented ∅.
Then it is easy to see that Mi accepts ∅ if and only if i represents ∅. Thus
the emptiness problem is undecidable for regular sets when regular sets are
specified in this manner. □

EXERCISES

2.3.1. Given a finite automaton with n accessible states, what is the smallest
number of states the reduced machine can have ?

2.3.2. Find the minimum state finite automaton for the language specified by
the finite automaton M = ({A, B, C, D, E, F}, {0, 1}, δ, A, {E, F}), where
δ is given by

                 Input
                0     1
    State A     B     C
          B     E     F
          C     A     A
          D     F     E
          E     D     F
          F     D     E

2.3.3. Show that for all n there is an n-state finite automaton such that
n-2 n-3
~ ~.

2.3.4. Prove that L ( M ' ) = L ( M ) in Algorithm 2.2.

DEFINITION
We say that a relation R on 1~* is right-invariant if x R y implies
x z R y z for all x, y, z in ~*.
2.3.5. Show that L is a regular set if and only if L is the union of some of the
equivalence classes of a right-invariant equivalence relation R of finite
index. Hint: Only if." Let R be the relation x R y if and only if
(q0, x)1~ (p, e), (q0, Y) [~ (q, e), and p = q. (That is, x and y take a finite
automaton defining L to the same state.) Show that R is a right-invariant
equivalence relation of finite index. If: Construct a finite automaton
for L using the equivalence classes of R for states.
DEFINITION
We say that E is the coarsest right-invariant equivalence relation for
a language L ~ E* if x E y if and only if for all z ~ E* we find
x z ~ L exactly when y z ~ L.
The following exercise states that every right-invariant equivalence
relation defining a language is always contained in E.
2.3.6. Let L be the union of some of the equivalence classes of a right-invariant
equivalence relation R on ~*. Let E be the coarsest right-invariant
equivalence relation for L. Show that E ~ R.
*2.3.7. Show that the coarsest right invariant equivalence relation for a lan-
guage is of finite index if and only if that language is a regular set.
2.3.8. Let M = (Q, E, ~, q0, F) be a reduced finite automaton. Define the
relation E on ~* as follows: x E y if and only if (q0, x)I ~ (p, e),
(q0, Y)I~ (q, e), and p = q. Show that E is the coarsest right-invariant
equivalence relation for L ( M ) .

DEFINITION
An equivalence relation R on E* is a congruence relation if R is
both left- and right-invariant (i.e., if x R y, then wxz R wyz for all
w, x, y, z in E*).
2.3.9. Show that L is a regular set if and only if L is the union of some of the
equivalence classes of a congruence relation of finite index.
2.3.10. Show that if M1 and Mz are two reduced finite automata such that
L ( M i ) = L(M2), then the transition graphs of M1 and M2 are the same.
"2.3.11. Show that Algorithm 2.2 is of time complexity n 2. (That is, show that
there exists a finite automaton M with n states such that Algorithm 2.2
requires n 2 operations to find the reduced automaton for M.) What is
the expected time complexity of Algorithm 2.2 ?
It is possible to find an algorithm for minimizing the states in a
finite automaton which always runs in time no greater than n log n,
where n is the number of states in the finite automaton to be reduced.
The most time-consuming part of Algorithm 2.2 is the determination
of the equivalence classes under =_ in step 2 using the method suggested
in Lemma 2.11. However, we can use the following algorithm in step 2
to reduce the time complexity of Algorithm 2.2 to n log n.
This new algorithm refines partitions on the set of states in a
manner somewhat different from that suggested by Lemma 2.11.
Initially, the states are partitioned into final and nonfinal states.
Then, suppose that we have the partition consisting of the set of
blocks {π1, π2, . . . , π(k−1)}. A block πi in this partition and an input
symbol a are selected and used to refine the partition. Each block πj
such that δ(q, a) ∈ πi for some q in πj is split into two blocks πj' and
πj'' such that πj' = {q | q ∈ πj and δ(q, a) ∈ πi} and πj'' = πj − πj'.
Thus, in contrast with the method in Lemma 2.11, here blocks are
refined when the successor states on a given input have previously
been shown inequivalent.
ALGORITHM 2.6
Determining the equivalence classes of a finite automaton.
Input. A finite automaton M = (Q, Σ, δ, q0, F).
Output. The indistinguishability classes under ≡.
Method.
(1) Define δ⁻¹(q, a) = {p | δ(p, a) = q} for all q ∈ Q and a ∈ Σ.
For a ∈ Σ and a block πi, let π(i,a) = {q | q ∈ πi and δ⁻¹(q, a) ≠ ∅}.
(2) Let π1 = F and π2 = Q − F.
(3) For all a ∈ Σ, define the index set

    I(a) = {1}, if #π(1,a) ≤ #π(2,a)
    I(a) = {2}, otherwise

(4) Set k = 3.
(5) Select a ∈ Σ and i ∈ I(a). [If I(a) = ∅ for all a ∈ Σ, halt;
the output is the set {π1, π2, . . . , π(k−1)}.]
(6) Delete i from I(a).
(7) For all j < k such that there is a state q ∈ πj with δ(q, a) ∈ πi,
do steps 7(a)-7(d):
(a) Let πj' = {q | δ(q, a) ∈ πi and q ∈ πj}, and let πj'' = πj − πj'.
(b) Replace πj by πj' and let πk = πj''. Construct new π(j,a) and
π(k,a) for all a ∈ Σ.
(c) For all a ∈ Σ, modify I(a) as follows:

    I(a) = I(a) ∪ {j}, if j ∉ I(a) and 0 < #π(j,a) ≤ #π(k,a)
    I(a) = I(a) ∪ {k}, otherwise

(d) Set k = k + 1.
(8) Go to step (5).
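For comparison, the following Python sketch (ours) computes the same indistinguishability classes by straightforward repeated refinement. It omits the I(a) bookkeeping and the "split on the smaller half" rule that give Algorithm 2.6 its n log n bound, so it runs in time O(n²); Sigma is assumed to be an ordered list and delta a dictionary, as in the earlier sketches.

def equivalence_classes(Q, Sigma, delta, F):
    """Partition Q into classes of indistinguishable states (O(n^2) refinement)."""
    partition = [set(F), set(Q) - set(F)]
    partition = [block for block in partition if block]
    changed = True
    while changed:
        changed = False
        refined = []
        for block in partition:
            # Group the states of `block` by the blocks their successors fall into.
            groups = {}
            for q in block:
                key = tuple(next(i for i, b in enumerate(partition)
                                 if delta[(q, a)] in b)
                            for a in Sigma)
                groups.setdefault(key, set()).add(q)
            refined.extend(groups.values())
            if len(groups) > 1:
                changed = True
        partition = refined
    return partition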

2.3.12. Apply Algorithm 2.6 to the finite automata in Example 2.15 and
Exercise 2.3.2.
2.3.13. Prove that Algorithm 2.6 correctly determines the indistinguishability
classes of a finite automaton.
*'2.3.14. Show that Algorithm 2.6 can be implemented in time n log n.
2.3.15. Show that the following are not regular sets:
(a) {0^n 1 0^n | n ≥ 1}.
(b) {ww | w is in {0, 1}*}.
(c) L(G), where G is defined by the productions S → aSbS | e.
(d) {a"~ln >_ 1}.
(e) {a^p | p is a prime}.
(f) {w | w is in {0, 1}* and w has an equal number of 0's and 1's}.
2.3.16. Let f(m) be a monotonically increasing function such that for all n
there exists m such that f(m + 1) > f(m) + n. Show that {a^f(m) | m ≥ 1}
is not regular.

DEFINITION
Let L1 and L2 be languages. We define the following operations:
(1) L1/L2 = {w | for some x ∈ L2, wx is in L1}.
(2) INIT(L1) = {w | for some x, wx is in L1}.
(3) FIN(L1) = {w | for some x, xw is in L1}.
(4) SUB(L1) = {w | for some x and y, xwy is in L1}.
(5) MIN(L1) = {w | w ∈ L1 and for no proper prefix x of w is x ∈ L1}.
(6) MAX(L1) = {w | w ∈ L1 and for no x ≠ e is wx ∈ L1}.

Example 2.18
Let L1 = {0^n 1^n 0^m | n, m ≥ 1} and L2 = 1*0*. Then

L1/L2 = L1 ∪ {0^i 1^j | i ≥ 1, j ≤ i}.
L2/L1 = ∅.
INIT(L1) = L1 ∪ {0^i 1^j | i ≥ 1, j ≤ i} ∪ 0*.
FIN(L1) = {0^i 1^j 0^k | k ≥ 1, j ≥ 1, i ≤ j} ∪ 1+0+ ∪ 0*.
SUB(L1) = {0^i 1^j 0^k | i ≤ j} ∪ 1*0* ∪ 0*1*.
MIN(L1) = {0^n 1^n 0 | n ≥ 1}.
MAX(L1) = ∅.   □
"2,3.17, Let L1 and L2 be regular. Show that the following are regular:
(a) L1/L2.
(b) INIT(L~).
(c) FIN(L1).
(d) SUB(L1).
(e) MIN(L~).
(f) MAX(L1).
"2.3,18, Let L1 be a regular set and L2 an arbitrary language. Show that La]L2
is regular. Does there exist an algorithm to find a finite automaton for
L 1[L2 given one for L a ?
DEFINITION
The derivative D_x α of a regular expression α with respect to x ∈ Σ*
can be defined recursively as follows:
(1) D_e α = α.
(2) For a ∈ Σ,
(a) D_a ∅ = ∅.
(b) D_a e = ∅.
(c) D_a b = ∅ if a ≠ b, and D_a b = e if a = b.
(d) D_a(α + β) = D_a α + D_a β.
(e) D_a(αβ) = (D_a α)β if e ∉ α, and D_a(αβ) = (D_a α)β + D_a β if e ∈ α.
(f) D_a α* = (D_a α)α*.
(3) For a ∈ Σ and x ∈ Σ*,

D_ax α = D_x(D_a α)
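To make the recursion concrete, here is a small Python sketch (ours) of the definition, for regular expressions represented as nested tuples; the tuple encoding, the predicate nullable (which tests whether e is in the set denoted by an expression), and the function names are assumptions of the sketch.

# Expressions: ('empty',), ('eps',), ('sym', a), ('+', r, s), ('.', r, s), ('*', r).
def nullable(r):
    """True iff e is in the set denoted by r."""
    tag = r[0]
    if tag == 'eps': return True
    if tag in ('empty', 'sym'): return False
    if tag == '+': return nullable(r[1]) or nullable(r[2])
    if tag == '.': return nullable(r[1]) and nullable(r[2])
    return True                                   # tag == '*'

def D(a, r):
    """Derivative of r with respect to the single symbol a (rules (2a)-(2f))."""
    tag = r[0]
    if tag in ('empty', 'eps'): return ('empty',)
    if tag == 'sym': return ('eps',) if r[1] == a else ('empty',)
    if tag == '+': return ('+', D(a, r[1]), D(a, r[2]))
    if tag == '.':
        first = ('.', D(a, r[1]), r[2])
        return ('+', first, D(a, r[2])) if nullable(r[1]) else first
    return ('.', D(a, r[1]), r)                   # tag == '*': (D_a r) r*

def D_word(x, r):
    """D_x r, extended to a word x as in rule (3); x is in the language iff nullable."""
    for a in x:
        r = D(a, r)
    return r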

2.3.19. Show that if α = 10*1, then
(a) D_e α = 10*1.
(b) D_0 α = ∅.
(c) D_1 α = 0*1.
*2.3.20. Show that if ~ is a regular expression that denotes the regular set R,
then Dx0c denotes x \ R = (w[ xw ~ R}.

*'2.3.21. Let L be a regular set. Show that { x l x y ~ L for some y such that
Ixl = [y I} is regular.
A generalization of Exercise 2.3.21 is the following.
**2.3.22. Let L be a regular set and f ( x ) a polynomial in x with nonnegative
integer coefficients. Show that

[w[ wy ~ L for some y such that [y[ = f(I w 1)]


is regular.
*2.3.23. Let L be a regular set and h a homomorphism. Show that h(L) and
h-I(L) are regular sets.
2.3.24. Prove the correctness of Algorithms 2.4-2.5.
2.3.25. Discuss the time and space complexity of Algorithms 2.4 and 2.5.
2.3.26. Give a formal proof of Theorem 2.9. N o t e that it is not sufficient to
show simply that, say, for every regular expression there is a finite
automaton accepting the set denoted thereby. One must show that
there is an algorithm to construct the automaton from the regular
expression. See Example 2.17 in this connection.
*2.3.27. Give an efficient algorithm to minimize the number of states in an
incompletely specified deterministic finite automaton.
2.3.28. Give efficient algorithms to solve the membership, emptiness, and
equivalence problems for
(a) Regular expressions.
(b) Right-linear grammars.
(c) Nondeterministic finite automata.
**2.3.29. Show that the membership and equivalence problems are undecidable
for the representation of regular sets given in Example 2.17.
*2.3.30. Show that the question "Is L(M) infinite ?" is decidable for finite auto-
mata. Hint: Show that L(M) is infinite, for n-state finite automaton M,
if and only if L(M) contains a word w such that n _<lwl < 2n.
"2.3.31. Show that it is decidable, for finite automata M1 and M2, whether
L(M~) g_ L(Mz).

Open Problem
2.3.32. Find a fast algorithm (say, one which takes time n k for some constant
k on automata of n states) which gives a minimum state nondeter-
ministic finite automaton equivalent to a given one.

Programming Exercises
2.3.33. Write a program that takes as input a finite automaton, right-linear
grammar, or regular expression and produces as output an equivalent
finite automaton, right-linear grammar, or regular expression. For
example, this program can be used to construct a finite automaton from
a regular expression.
2.3.34. Construct a program that takes as input a specification of a finite
automaton M and produces as output a,reduced finite automaton that
is equivalent to M.
2.3.35. Write a program that will simulate a nondeterministic finite automaton.
2.3.36. Construct a program that determines whether two specifications of a
regular set are equivalent.

BIBLIOGRAPHIC NOTES

The minimization of finite automata was first studied by Huffman [1954] and
Moore [1956]. The closure properties of regular sets and decidability results for
finite automata are from Rabin and Scott [1959].
The Exercises contain some of the many results concerning finite automata
and regular sets. Algorithm 2.6 is from Hopcroft [1971]. Exercise 2.3.22 has been
proved by Kosaraju [1970]. The derivative of a regular expression was defined by
Brzozowski [1964].
There are many techniques to minimize incompletely specified finite automata
(Exercise 2.3.27). Ginsburg [1962] and Prather [1969] consider this problem.
Kameda and Weiner [1968] give a partial solution to Exercise 2.3.32.
The books by Gill [1962], Ginsburg [1962], Harrison [1965], Minsky [1967],
Booth [1967], Ginzburg [1968], Arbib [1969], and Salomaa [1969a] cover finite
automata in detail.
Thompson [1968] outlines a useful programming technique for constructing a
recognizer from a regular expression.

2.4. CONTEXT-FREE L A N G U A G E S

Of the four classes of grammars in the Chomsky hierarchy, the context-


free grammars are the most important in terms of application to program-
ming languages and compiling. A context-free grammar can be used to specify
most of the syntactic structure of a programming language. In addition,
a context-free grammar can be used as the basis of various schemes for
specifying translations.
During the compiling process itself, we can use the syntactic structure
imparted to an input program by a context-free grammar to help produce
the translation for the input. The syntactic structure of an input sentence
can be determined from the sequence of productions used to derive that
input string. Thus in a compiler the syntactic analyzer can be viewed as
a device which attempts to determine if there is a derivation of the input
string according to some context-free grammar. However, given a C F G G
and an input string w, it is a nontrivial task to determine whether w is in
L(G) and, if so, what is a derivation for w in G. We shall treat this question
in detail in Chapters 4-7.
In this section we shall build the foundation on which we shall base our
study of parsing. In particular, we shall define derivation trees and study
some transformations which can be applied to context-free grammars to
make their representation more convenient.

2.4.1. Derivation Trees

In a grammar it is possible to have several derivations that are equivalent,


in the sense that all derivations use the same productions at the same places,
but in a different order. The definition of when two derivations are equiva-
lent is a complex matter for unrestricted grammars (see the Exercises for
Section 2.2), but for context-free grammars we can define a convenient
graphical representative of an equivalence class of derivations called a deriva-
tion tree.
A derivation tree for a context-free grammar G = (N, Σ, P, S) is a labeled
ordered tree in which each node is labeled by a symbol from N ∪ Σ ∪ {e}.
If an interior node is labeled A and its direct descendants are labeled
X1, X2, . . . , Xn, then A → X1X2 · · · Xn is a production in P.
DEFINITION

A labeled ordered tree D is a derivation tree (or parse tree) for a context-
free grammar G(A) = (N, Σ, P, A) if
(1) The root of D is labeled A.
(2) If D1, . . . , Dk are the subtrees of the direct descendants of the root
and the root of Di is labeled Xi, then A → X1 · · · Xk is a production in P.
Di must be a derivation tree for G(Xi) = (N, Σ, P, Xi) if Xi is a nonterminal,
and Di is a single node labeled Xi if Xi is a terminal.
(3) Alternatively, if D1 is the only subtree of the root of D and the root
of D1 is labeled e, then A → e is a production in P.

Example 2.19
The trees in Fig. 2.8 are derivation trees for the grammar G = G(S)
defined by S → aSbS | bSaS | e. □

[Fig. 2.8 Derivation trees: four trees (a)-(d) for the grammar above.]

We note that there is a natural ordering on the nodes of an ordered tree.
That is, the direct descendants of a node are ordered "from the left" as defined

in Section 0.5.4. We extend the from-the-left ordering as follows. Suppose


that n is a node and n1, . . . , nk are its direct descendants. Then if i < j, ni
and all its descendants are to the left of nj and all its descendants. It is left
for the Exercises to show that this ordering is consistent. All that needs to
be shown is that given any two nodes of an ordered tree, they are either on
a path or one is to the left of the other.
DEFINITION

The frontier of a derivation tree is the string obtained by concatenating


the labels of the leaves (in order from the left). For example, the frontiers
of the derivation trees in Fig. 2.8 are (a) S, (b) e, (c) abab, and (d) abab.
We shall now show that a derivation tree is an adequate representation
for derivations by showing that for every derivation of a sentential form α
in a CFG G there is a derivation tree of G with frontier α, and conversely.
To do so we introduce a few more terms. Let D be a derivation tree for a
CFG G = (N, Σ, P, S).
DEFINITION

A cut of D is a set C of nodes of D such that


(1) No two nodes in C are on the same path in D, and
(2) No other node of D can be added to C without violating (1).

Example 2.20
The set of nodes consisting of only the root is a cut. Another cut is the
set of leaves. The set of circled nodes in Fig. 2.9 is a cut. E]

Fig. 2.9 Example of a cut.

DEFINITION

Let us define an interior frontier of D as the string obtained by concatenat-


ing (in order from the left) the labels of the nodes of a cut of D. For example,
abaSbS is an interior frontier of the derivation tree shown in Fig. 2.9.

LEMMA 2.12
Let S = α0, α1, . . . , αn be a derivation of αn from S in CFG G =
(N, Σ, P, S). Then there is a derivation tree D for G such that D has frontier
αn and interior frontiers α0, α1, . . . , α_{n−1} (among others).
Proof. We shall construct a sequence of derivation trees Di, 0 ≤ i ≤ n,
such that the frontier of Di is αi.
Let D0 be the derivation tree consisting of the single node labeled S.
Suppose that αi = βiAγi and this instance of A is rewritten to obtain
α_{i+1} = βiX1X2 · · · Xkγi. Then the derivation tree D_{i+1} is obtained from Di
by adding k direct descendants to the leaf labeled with this instance of A
(i.e., the node which contributes the |βi| + 1st symbol to the frontier of Di)
and labeling these direct descendants X1, X2, . . . , Xk, respectively. It should
be evident that the frontier of D_{i+1} is α_{i+1}. The construction of D_{i+1} from
Di is shown in Fig. 2.10.

s S

fli A Yi fit A rt

X1 X2...Xk

(a) Di (b) Di+ 1

Fig. 2.10 Alteration of Trees

Dn will then be the desired derivation tree D. □


We will now obtain the converse of Lemma 2.12. That is, for every deri-
vation tree for G there is at least one derivation in G.
LEMMA 2.13
Let D be a derivation tree for a CFG G = (N, Σ, P, S) with frontier α.
Then S ⇒* α.
Proof. Let C0, C1, C2, . . . , Cn be any sequence of cuts of D such that
(1) C0 contains only the root of D.
(2) C_{i+1} is obtained from Ci by replacing one interior node in Ci by its
direct descendants, for 0 ≤ i < n.
(3) Cn is the set of leaves of D.
Clearly at least one such sequence exists.
If αi is the interior frontier associated with Ci, then α0, α1, . . . , αn is
a derivation of αn from α0 in G. □

There are two derivations that can be constructed from a derivation tree
which will be of particular interest to us.
DEFINITION

In the proof of Lemma 2.13, if C_{i+1} is obtained from Ci by replacing the
leftmost nonleaf in Ci by its direct descendants, then the associated derivation
α0, α1, . . . , αn is called a leftmost derivation of αn from α0 in G. We define
a rightmost derivation analogously by replacing "leftmost" by "rightmost"
above. Notice that the leftmost (or rightmost) derivation associated with
a derivation tree is unique.
If S = α0, α1, . . . , αn = w is a leftmost derivation of the terminal string
w, then each αi, 0 ≤ i < n, is of the form xiAiβi with xi ∈ Σ*, Ai ∈ N,
and βi ∈ (N ∪ Σ)*. The leftmost nonterminal Ai is rewritten to obtain each
succeeding sentential form. The reverse situation holds for rightmost deriva-
tions.

Example 2.21
Let G0 be the CFG

E → E + T | T
T → T * F | F
F → (E) | a

[Fig. 2.11 Example of a tree: the derivation tree for the sentence a + a in G0.]

The derivation tree shown in Fig. 2.11 represents ten equivalent derivations

of the sentence a + a. The leftmost derivation is

E ⇒ E + T ⇒ T + T ⇒ F + T ⇒ a + T ⇒ a + F ⇒ a + a

and the rightmost derivation is

E ⇒ E + T ⇒ E + F ⇒ E + a ⇒ T + a ⇒ F + a ⇒ a + a   □

DEFINITION

If S = α0, α1, . . . , αn is a leftmost derivation in grammar G, then we
shall write S ⇒*_{G,lm} αn, or S ⇒*_{lm} αn if G is clear, to indicate the leftmost
derivation. We call αn a left sentential form. Likewise, if S = α0, α1, . . . , αn
is a rightmost derivation, we shall write S ⇒*_{rm} αn and call αn a right
sentential form. We use ⇒_{lm} and ⇒_{rm} to indicate single-step leftmost and
rightmost derivations.
We can combine Lemmas 2.12 and 2.13 into the following theorem.
THEOREM 2.11
Let G = (N, Σ, P, S) be a CFG. Then S ⇒*_G α if and only if there is
a derivation tree for G with frontier α.
Proof. Immediate from Lemmas 2.12 and 2.13. □
Notice that we have been careful not to say that given a derivation
S ⇒*_G α in a CFG G we can find a unique derivation tree for G with frontier α.
The reason for this is that there are context-free grammars which have several
distinct derivation trees with the same frontier. The grammar in Example
2.19 is an example of such a grammar. Derivation trees (c) and (d) (Fig. 2.8)
in that example have equal frontiers but are not the same trees.
DEFINITION

We say that a C F G G is ambiguous if there is at least one sentence w


in L(G) for which there is more than one distinct derivation tree with frontier
w. This is equivalent to saying that G is ambiguous if there is a sentence w
in L(G) with two or more distinct leftmost (or rightmost) derivations (Exercise
2.4.4).
We shall consider ambiguity in more detail in Section 2.6.5.

2.4.2. Transformations on C o n t e x t - F r e e Grammars

Given a grammar it is often desirable to modify the grammar so that


a certain structure is imposed on the language generated. For example, let us
consider L(G0). This language can be generated by the grammar G with pro-
ductions

E → E + E | E * E | (E) | a

But there are two features of G which are not desirable. First of all, G is
ambiguous because of the productions E → E + E | E * E. This ambiguity
can be removed by using the grammar G1 with productions

E → E + T | E * T | T
T → (E) | a

The other drawback to G, which is shared by G1, is that the operators


+ and * have the same precedence. That is to say, in the expressions
a + a * a and a * a + a, the operators would associate from the left as in
(a + a) * a and (a * a) + a, respectively.
In going to the grammar G0 we can obtain the conventional precedence
of + and *.
In general, there is no algorithmic method to impose an arbitrary struc-
ture on a given language. However, there are a number of useful transfor-
mations which can be used to modify a grammar without disturbing the
language generated. In this section and in Sections 2.4.3-2.4.5 we shall con-
sider a number of transformations of this nature.
We shall begin by considering some very obvious but important transfor-
mations. In certain situations, a C F G may contain useless symbols and
productions. For example, consider the grammar G = ({S, A}, {a, b}, P, S),
where P = {S → a, A → b}. In G, the nonterminal A and the terminal b
cannot appear in any sentential form. Thus these two symbols are irrelevant
insofar as L(G) is concerned and can be removed from the specification of
G without affecting L(G).
DEFINITION

We say that a symbol X ∈ N ∪ Σ is useless in a CFG G = (N, Σ, P, S)
if there does not exist a derivation of the form S ⇒* wXy ⇒* wxy. Note that
w, x, and y are in Σ*.
To determine whether a nonterminal A is useless, we first provide an
algorithm to determine whether a nonterminal can generate any terminal
strings; i.e., is {w | A ⇒* w, w in Σ*} ≠ ∅? The existence of such an algorithm
implies that the emptiness problem is solvable for context-free grammars.
ALGORITHM 2.7
Is L(G) nonempty?
Input. CFG G = (N, Σ, P, S).
Output. "YES" if L(G) ≠ ∅, "NO" otherwise.
Method. We construct sets N0, N1, . . . recursively as follows:
(1) Let N0 = ∅ and set i = 1.
(2) Let Ni = {A | A → α is in P and α ∈ (N_{i−1} ∪ Σ)*} ∪ N_{i−1}.
(3) If Ni ≠ N_{i−1}, then set i = i + 1 and go to step (2). Otherwise, let
Ne = Ni.
(4) If S is in Ne, output "YES"; otherwise, output "NO." □
Since Ni ⊆ N, Algorithm 2.7 must terminate after at most n + 1 iterations
of step (2) if N has n members. We shall prove the correctness of Algorithm
2.7. The proof is simple and will serve as a model for several similar proofs.
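A Python sketch of Algorithm 2.7 (ours) may be helpful; it assumes a grammar is given with P as a list of pairs (A, α), where α is a tuple of symbols, and iterates the construction of N0, N1, . . . to a fixed point.

def language_nonempty(N, Sigma, P, S):
    """Algorithm 2.7: "YES" iff S derives some terminal string."""
    Ne = set()
    changed = True
    while changed:                    # compute N0, N1, ... until they stabilize
        changed = False
        for A, alpha in P:
            if A not in Ne and all(X in Sigma or X in Ne for X in alpha):
                Ne.add(A)
                changed = True
    return S in Ne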

THEOREM 2.12
Algorithm 2.7 says "YES" if and only if S ⇒* w for some w in Σ*.
Proof. We first prove the following statement by induction on i:

(2.4.1) If A is in Ni, then A ⇒* w for some w in Σ*.

The basis, i = 0, holds vacuously, since N0 = ∅. Assume that (2.4.1)
is true for i, and let A be in N_{i+1}. If A is also in Ni, the inductive step is
trivial. If A is in N_{i+1} − Ni, then there is a production A → X1 · · · Xk,
where each Xj is either in Σ or a nonterminal in Ni. Thus we can find a string
wj such that Xj ⇒* wj for each j. If Xj is in Σ, wj = Xj, and otherwise the
existence of wj follows from (2.4.1). It is simple to see that

A ⇒ X1 · · · Xk ⇒* w1X2 · · · Xk ⇒* · · · ⇒* w1 · · · wk.

The case k = 0 (i.e., production A → e) is not ruled out. The inductive
step is complete.
The definition of Ni assures us that if Ni = N_{i−1}, then Ni = N_{i+1} = · · · .
We must show that if A ⇒* w for some w ∈ Σ*, then A is in Ne. By the above
comment, all we need to show is that A is in Ni for some i. We show the
following by induction on n:

(2.4.2) If A ⇒ⁿ w, then A is in Ni for some i.

The basis, n = 1, is trivial; i = 1 in this case. Assume that (2.4.2) is true
for n, and let A ⇒ⁿ⁺¹ w. Then we can write A ⇒ X1 · · · Xk ⇒ⁿ w, where
w = w1 · · · wk such that Xj ⇒^{nj} wj for each j, where nj ≤ n.†
By (2.4.2), if Xj is in N, then Xj is in N_{ij} for some ij. If Xj is in Σ, let
ij = 0. Let i = 1 + max(i1, . . . , ik). Then by definition, A is in Ni. The
induction is complete. Letting A = S in (2.4.1) and (2.4.2), we have the
theorem. □
COROLLARY

It is decidable, for CFG G, whether L(G) = ∅. □

DEFINITION
We say that a symbol X in N ∪ Σ is inaccessible in a CFG G =
(N, Σ, P, S) if X does not appear in any sentential form.

"l'This is an "obvious" comment that requires a little thought. Think about the deriva-
n+l
tion tree for the derivation A ~ w. wy is the frontier of the subtree with root Xj.

The following algorithm, which is an adaptation of Algorithm 0.3, can


be used to remove inaccessible symbols from a CFG.
ALGORITHM 2.8

Removal of inaccessible symbols.
Input. CFG G = (N, Σ, P, S).
Output. CFG G' = (N', Σ', P', S) such that
(i) L(G') = L(G).
(ii) For all X in N' ∪ Σ' there exist α and β in (N' ∪ Σ')* such that
S ⇒*_{G'} αXβ.
Method.
(1) Let V0 = {S} and set i = 1.
(2) Let Vi = {X | some A → αXβ is in P and A is in V_{i−1}} ∪ V_{i−1}.
(3) If Vi ≠ V_{i−1}, set i = i + 1 and go to step (2). Otherwise, let
N' = Vi ∩ N,
Σ' = Vi ∩ Σ,
P' be those productions in P which involve
only symbols in Vi, and
G' = (N', Σ', P', S). □
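A Python sketch of Algorithm 2.8 (ours), with the same (A, α) production pairs as in the previous sketch and with N and Σ given as sets:

def remove_inaccessible(N, Sigma, P, S):
    """Algorithm 2.8: keep only the symbols accessible from S."""
    V = {S}
    changed = True
    while changed:                    # V0, V1, ... until no change
        changed = False
        for A, alpha in P:
            if A in V:
                for X in alpha:
                    if X not in V:
                        V.add(X)
                        changed = True
    P2 = [(A, alpha) for A, alpha in P
          if A in V and all(X in V for X in alpha)]
    return N & V, Sigma & V, P2, S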

There is a great deal of similarity between Algorithms 2.7 and 2.8. Note
that in Algorithm 2.8, since Vi ⊆ N ∪ Σ, step (2) of the algorithm can be
repeated at most a finite number of times. Moreover, a straightforward proof
by induction on i shows that S ⇒*_G αXβ if and only if X is in Vi for some i.
We are now in a position to remove all useless symbols from a CFG.
ALGORITHM 2.9
Useless symbol removal.
Input. CFG G = (N, Σ, P, S), such that L(G) ≠ ∅.
Output. CFG G' = (N', Σ', P', S) such that L(G') = L(G) and no symbol
in N' ∪ Σ' is useless.
Method.
(1) Apply Algorithm 2.7 to G to obtain Ne. Let G1 = (N ∩ Ne, Σ, P1, S),
where P1 contains those productions of P involving only symbols in Ne ∪ Σ.
(2) Apply Algorithm 2.8 to G1 to obtain G' = (N', Σ', P', S). □
Step (1) of Algorithm 2.9 removes from G all nonterminals which cannot
generate a terminal string. Step (2) then proceeds to remove all symbols
which are not accessible. Each symbol X in the resulting grammar must

appear in at least one derivation of the form S ⇒* wXy ⇒* wxy. Note that
applying Algorithm 2.8 first and then applying Algorithm 2.7 will not always
result in a grammar with no useless symbols.
THEOREM 2.13
G' of Algorithm 2.9 has no useless symbols, and L(G') = L(G).
Proof. We leave it for the Exercises to show that L(G') = L(G). Suppose
that A ∈ N' is useless. From the definition of useless, there are two cases
to consider.
Case 1: S ⇒*_{G'} αAβ is false for all α and β. In this case, A would have
been removed in step (2) of Algorithm 2.9.
Case 2: S ⇒*_{G'} αAβ for some α and β, but A ⇒*_{G'} w is false for all w in Σ'*.
Then A is not removed in step (2), and, moreover, if A ⇒*_{G1} γBδ, then B is not
removed in step (2). Thus, if A ⇒*_{G1} w, it would follow that A ⇒*_{G'} w. We con-
clude that A ⇒*_{G1} w is also false for all w, and A is eliminated in step (1).
The proof that no terminal of G' is useless is handled similarly and is
left for the Exercises. □

Example 2.22
Consider the grammar G = ({S, A, B}, {a, b}, P, S), where P consists of

S → a | A
A → AB
B → b

Let us apply Algorithm 2.9 to G. In step (1), Ne = {S, B}, so that
G1 = ({S, B}, {a, b}, {S → a, B → b}, S). Applying Algorithm 2.8, we have
V2 = V1 = {S, a}. Thus, G' = ({S}, {a}, {S → a}, S).
If we apply Algorithm 2.8 first to G, we find that all symbols are acces-
sible, so the grammar does not change. Then applying Algorithm 2.7 gives
Ne = {S, B}, so the resulting grammar is G1 above, not G'. □

It is often convenient to eliminate e-productions, that is, productions of
the form A → e, from a CFG G. However, if e is in L(G), then clearly it is
impossible to have no productions of the form A → e.
DEFINITION
We say that a CFG G = (N, Σ, P, S) is e-free if either
(1) P has no e-productions, or

(2) There is exactly one e-production S → e and S does not appear on
the right side of any production in P.

the right side of any production in P.
ALGORITHM 2.10
Conversion to an e-free grammar.
Input. CFG G = (N, Σ, P, S).
Output. An equivalent e-free CFG G' = (N', Σ, P', S').
Method.
(1) Construct Ne = {A | A ∈ N and A ⇒⁺_G e}. The algorithm is similar to
that used in Algorithms 2.7 and 2.8 and is left for the Exercises.
(2) Let P' be the set of productions constructed as follows:
(a) If A → α0B1α1B2α2 · · · Bkαk is in P, k ≥ 0, and for 1 ≤ i ≤ k
each Bi is in Ne but no symbols in any αj are in Ne, 0 ≤ j ≤ k,
then add to P' all productions of the form

A → α0X1α1X2α2 · · · Xkαk

where Xi is either Bi or e, without adding A → e to P'. (This
could occur if all αj = e.)
(b) If S is in Ne, add to P' the productions

S' → e | S

where S' is a new symbol, and let N' = N ∪ {S'}. Otherwise,
let N' = N and S' = S.
(3) Let G' = (N', Σ, P', S'). □
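A Python sketch of Algorithm 2.10 (ours) is given below. It assumes the same (A, α) representation of productions, with A → e encoded as a pair whose right side is the empty tuple, and it builds the new start symbol, when one is needed, by priming the old one; these conventions are assumptions of the sketch.

from itertools import product

def nullable_nonterminals(P):
    """Step (1): the set Ne of nonterminals that derive e."""
    Ne, changed = set(), True
    while changed:
        changed = False
        for A, alpha in P:
            if A not in Ne and all(X in Ne for X in alpha):
                Ne.add(A)
                changed = True
    return Ne

def to_e_free(N, Sigma, P, S):
    """Step (2): keep or erase each nullable symbol, never adding A -> e."""
    Ne = nullable_nonterminals(P)
    P2 = set()
    for A, alpha in P:
        options = [((X,), ()) if X in Ne else ((X,),) for X in alpha]
        for choice in product(*options):
            rhs = tuple(sym for part in choice for sym in part)
            if rhs:                              # do not add A -> e
                P2.add((A, rhs))
    start = S
    if S in Ne:                                  # step (2b): S' -> e | S
        start = S + "'"
        P2 |= {(start, ()), (start, (S,))}
        N = N | {start}
    return N, Sigma, sorted(P2), start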

Example 2.23
Consider the grammar of Example 2.19 with productions

S → aSbS | bSaS | e

Applying Algorithm 2.10 to this grammar, we would obtain the grammar
with the following productions:

S' → S | e
S → aSbS | bSaS | aSb | abS | ab | bSa | baS | ba

THEOREM 2.14
Algorithm 2.10 produces an e-free grammar equivalent to its input
grammar.
Proof. By inspection, G' of Algorithm 2.10 is e-free. To prove that
L(G) = L(G'), we can prove the following statement by induction on the
length of w:

(2.4.3) A ⇒*_{G'} w if and only if w ≠ e and A ⇒*_G w.

The proof of (2.4.3) is left for the Exercises. Substituting S for A in (2.4.3),
we see that for w ≠ e, w ∈ L(G) if and only if w ∈ L(G'). The fact that
e ∈ L(G) if and only if e ∈ L(G') is evident. Thus, L(G) = L(G'). □
Another transformation on grammars which we find useful is the removal
of productions of the form A → B, which we shall call single productions.

ALGORITHM 2.11
Removal of single productions.
Input. An e-free CFG G.
Output. An equivalent e-free CFG G' with no single productions.
Method.
(1) Construct for each A in N the set N_A = {B | A ⇒* B} as follows:
(a) Let N0 = {A} and set i = 1.
(b) Let Ni = {C | B → C is in P and B ∈ N_{i−1}} ∪ N_{i−1}.
(c) If Ni ≠ N_{i−1}, set i = i + 1 and repeat step (b). Otherwise, let
N_A = Ni.
(2) Construct P' as follows: If B → α is in P and not a single production,
place A → α in P' for all A such that B ∈ N_A.
(3) Let G' = (N, Σ, P', S). □
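A Python sketch of Algorithm 2.11 (ours), again over (A, α) production pairs:

def remove_single_productions(N, Sigma, P, S):
    """Remove single productions A -> B from an e-free grammar."""
    # chain[A] is N_A: all B with A =>* B using only single productions.
    chain = {A: {A} for A in N}
    for A in N:
        changed = True
        while changed:
            changed = False
            for B, alpha in P:
                if B in chain[A] and len(alpha) == 1 and alpha[0] in N:
                    if alpha[0] not in chain[A]:
                        chain[A].add(alpha[0])
                        changed = True
    P2 = set()
    for B, alpha in P:
        if len(alpha) == 1 and alpha[0] in N:
            continue                               # drop single productions
        for A in N:
            if B in chain[A]:                      # A =>* B, so add A -> alpha
                P2.add((A, alpha))
    return N, Sigma, sorted(P2), S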

Example 2.24
Let us apply Algorithm 2.11 to the grammar G0 with productions

E → E + T | T
T → T * F | F
F → (E) | a

In step (1), N_E = {E, T, F}, N_T = {T, F}, N_F = {F}. After step (2), P' becomes

E → E + T | T * F | (E) | a
T → T * F | (E) | a
F → (E) | a   □

THEOREM 2.15
In Algorithm 2.11, G' has no single productions, and L(G) = L(G').

Proof. By inspection, G' has no single productions. We shall first show
that L(G') ⊆ L(G). Let w be in L(G'). Then there exists in G' a derivation
S = α0 ⇒ α1 ⇒ · · · ⇒ αn = w. If the production applied going from αi to
α_{i+1} is A → β, then there is some B in N (possibly A = B) such that
A ⇒*_G B and B → β is a production in P. Thus, A ⇒*_G β and αi ⇒*_G α_{i+1}.
It follows that S ⇒*_G w, and w is in L(G). Thus, L(G') ⊆ L(G).
To show that L(G') = L(G), we must show that L(G) ⊆ L(G'). Thus let
w be in L(G) and S = α0 ⇒_{lm} α1 ⇒_{lm} · · · ⇒_{lm} αn = w be a leftmost derivation
of w in G. We can find a sequence of subscripts i1, i2, . . . , ik consisting of
exactly those j such that α_{j−1} ⇒_{lm} αj by an application of a production other
than a single production. In particular, since the derivation of a terminal
string cannot end with a single production, ik = n.
Since the derivation is leftmost, consecutive uses of single productions
replace the symbol at the same position in the left sentential forms involved.
Thus we see that S ⇒*_{G'} α_{i1} ⇒*_{G'} α_{i2} ⇒*_{G'} · · · ⇒*_{G'} α_{ik} = w. Thus, w is in L(G').
We conclude that L(G') = L(G). □


DEFINITION
A CFG G = (N, Σ, P, S) is said to be cycle-free if there is no derivation
of the form A ⇒⁺ A for any A in N. G is said to be proper if it is cycle-free,
is e-free, and has no useless symbols.
Grammars which have cycles or e-productions are sometimes more dif-
ficult to parse than grammars which are cycle-free and e-free. In addition,
in any practical situation useless symbols increase the size of a parser un-
necessarily. Throughout this book we shall assume a grammar has no useless
symbols. For some of the parsing algorithms to be discussed in this book we
shall insist that the grammar at hand be proper. The following theorem shows
that this requirement still allows us to consider all context-free languages.
THEOREM 2.16
If L is a CFL, then L = L(G) for some proper CFG G.
Proof. Use Algorithms 2.8-2.11. □
DEFINITION
An A-production in a CFG is a production of the form A → α for some
α. (Do not confuse an "A-production" with an "e-production," which is one
of the form B → e.)
Next we introduce a transformation which can be used to eliminate from
a grammar a production of the form A → αBβ. To eliminate this production
we must add to the grammar a set of new productions formed by replacing
the nonterminal B by all right sides of B-productions.

LEMMA 2.14
Let G = (N, Σ, P, S) be a CFG and A → αBβ be in P for some B ∈ N
and α and β in (N ∪ Σ)*. Let B → γ1 | γ2 | · · · | γk be all the B-productions
in P. Let G' = (N, Σ, P', S), where

P' = (P − {A → αBβ}) ∪ {A → αγ1β | αγ2β | · · · | αγkβ}.

Then L(G) = L(G').
Proof. Exercise. □

Example 2.25
Let us replace the production A → aAA in the grammar G having the
two productions A → aAA | b. Applying Lemma 2.14, assuming that α = a,
B = A, and β = A, we would obtain G' having productions

A → aaAAA | abA | b.

Derivation trees corresponding to the derivations of aabbb in G and G'
are shown in Fig. 2.12(a) and (b). Note that the effect of the transformation
is to "merge" the root of the tree in Fig. 2.12(a) with its second direct descen-
dant. □

[Fig. 2.12 Derivation trees in G and G': (a) the tree for aabbb in G; (b) the corresponding tree in G'.]

2.4.3. Chomsky Normal Form

DEFINITION
A CFG G = (N, Σ, P, S) is said to be in Chomsky normal form (CNF)
if each production in P is of one of the forms
(1) A → BC with A, B, and C in N, or
(2) A → a with a ∈ Σ, or

(3) If e ∈ L(G), then S → e is a production, and S does not appear on
the right side of any production.

the right side of any production.
We shall show that every context-free language has a Chomsky normal
form grammar. This result is useful in simplifying the notation needed to
represent a context-free language.

ALGORITHM 2.12
Conversion to Chomsky normal form.
Input. A proper CFG G = (N, Σ, P, S) with no single productions.
Output. A CFG G' in CNF, with L(G) = L(G').
Method. From G we shall construct an equivalent CNF grammar G'
as follows. Let P' be the following set of productions:
(1) Add each production of the form A → a in P to P'.
(2) Add each production of the form A → BC in P to P'.
(3) If S → e is in P, add S → e to P'.
(4) For each production of the form A → X1 · · · Xk in P, where k > 2,
add to P' the following set of productions. We let Xi' stand for Xi if Xi is in
N, and let Xi' be a new nonterminal if Xi is in Σ.

A → X1'⟨X2 · · · Xk⟩
⟨X2 · · · Xk⟩ → X2'⟨X3 · · · Xk⟩
    . . .
⟨X_{k−2} · · · Xk⟩ → X'_{k−2}⟨X_{k−1}Xk⟩
⟨X_{k−1}Xk⟩ → X'_{k−1}Xk'

where each ⟨Xi · · · Xk⟩ is a new nonterminal symbol.
(5) For each production of the form A → X1X2, where either X1 or X2
or both are in Σ, add to P' the production A → X1'X2'.
(6) For each nonterminal of the form a' introduced in steps (4) and (5),
add to P' the production a' → a. Finally, let N' be N together with all new
nonterminals introduced in the construction of P'. Then our desired gram-
mar is G' = (N', Σ, P', S). □
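The construction can be sketched in Python as follows (ours). Nonterminals and terminals are assumed to be strings, with the new nonterminals a' and ⟨Xi · · · Xk⟩ built by priming a terminal and by bracketing a suffix of the right side, respectively.

def to_cnf(N, Sigma, P, S):
    """Algorithm 2.12 for a proper grammar with no single productions."""
    N2, P2, primes = set(N), [], {}
    def prime(a):                        # new nonterminal a' for terminal a
        name = primes.setdefault(a, a + "'")
        N2.add(name)
        return name
    for A, alpha in P:
        if len(alpha) <= 1:              # A -> a, or S -> e: keep as is
            P2.append((A, alpha))
            continue
        syms = [X if X in N else prime(X) for X in alpha]
        left = A
        while len(syms) > 2:             # introduce <X2...Xk>, <X3...Xk>, ...
            rest = '<' + ''.join(syms[1:]) + '>'
            N2.add(rest)
            P2.append((left, (syms[0], rest)))
            left, syms = rest, syms[1:]
        P2.append((left, tuple(syms)))
    for a, name in primes.items():
        P2.append((name, (a,)))          # a' -> a
    return N2, Sigma, P2, S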
THEOREM 2.17
Let L be a CFL. Then L -- L(G) for some C F G G in Chomsky normal
form.

Proof. By Theorem 2.16, L has a proper grammar. The grammar G' of
Algorithm 2.12 is clearly in CNF. It suffices to show that in Algorithm 2.12,
L(G) = L(G'). This statement follows by an application of Lemma 2.14 to
each production of G' with a nonterminal a', and then to each production
with a nonterminal of the form ⟨Xi · · · Xj⟩. The resulting grammar will
be G. □

Example 2.26
Let G be the proper CFG defined by

S → aAB | BA
A → BBB | a
B → AS | b

We construct P' in Algorithm 2.12 by retaining the productions S → BA,
A → a, B → AS, and B → b. We replace S → aAB by S → a'⟨AB⟩ and
⟨AB⟩ → AB. A → BBB is replaced by A → B⟨BB⟩ and ⟨BB⟩ → BB.
Finally, we add a' → a. The resulting grammar is G' = (N', {a, b}, P', S),
where N' = {S, A, B, ⟨AB⟩, ⟨BB⟩, a'} and P' consists of

S → a'⟨AB⟩ | BA
A → B⟨BB⟩ | a
B → AS | b
⟨AB⟩ → AB
⟨BB⟩ → BB
a' → a   □

2.4.4. Greibach Normal Form

We next show that it is possible to find for each CFL a grammar in which
every production has a right side beginning with a terminal. Central to the
construction is the idea of left recursion and its elimination.
DEFINITION
A nonterminal A in a CFG G = (N, Σ, P, S) is said to be recursive if
A ⇒⁺ αAβ for some α and β. If α = e, then A is said to be left-recursive. Simi-
larly, if β = e, then A is right-recursive. A grammar with at least one left-
(right-) recursive nonterminal is said to be left- (right-) recursive. A grammar
in which all nonterminals, except possibly the start symbol, are recursive is
said to be recursive.
Certain of the parsing algorithms which we shall discuss do not work
with left-recursive grammars. We shall show that every context-free language
has at least one non-left-recursive grammar. We begin by showing how to
eliminate immediate left recursion from a CFG.

LEMMA 2.15
Let G = (N, Σ, P, S) be a CFG in which

A → Aα1 | Aα2 | · · · | Aαm | β1 | β2 | · · · | βp

are all the A-productions in P and no βi begins with A. Let

G' = (N ∪ {A'}, Σ, P', S),

where P' is P with these productions replaced by

A → β1 | β2 | · · · | βp | β1A' | β2A' | · · · | βpA'
A' → α1 | α2 | · · · | αm | α1A' | α2A' | · · · | αmA'

A' is a new nonterminal not in N.† Then L(G') = L(G).
Proof. In G, the strings which can be derived leftmost from A using only
A-productions are seen to be exactly those strings in the regular set
(β1 + β2 + · · · + βp)(α1 + α2 + · · · + αm)*. These are exactly the strings
which can be derived rightmost from A using one A-production and some
number of A'-productions of G'. (The resulting derivation is no longer left-
most.) All steps of the derivation in G that do not use an A-production can
be done directly in G', since the non-A-productions of G and G' are the same.
We conclude that w is in L(G') whenever w is in L(G), and that L(G) ⊆ L(G').
For the converse, essentially the same argument is used. The derivation
in G' is taken to be rightmost, and sequences of one A-production and any
number of A'-productions are considered. Thus, L(G) = L(G'). □
The effect of the transformation in Lemma 2.15 on derivation trees is
shown in Fig. 2.13.

Example 2.27
Let G0 be our usual grammar with productions

E → E + T | T
T → T * F | F
F → (E) | a

The grammar G' with productions

E → T | TE'
E' → +T | +TE'
T → F | FT'
†Note that the A → βi's are in the initial and final sets of A-productions.

[Fig. 2.13 Portions of trees: (a) a left-recursive portion of a tree in G; (b) the corresponding right-recursive portion in G'.]

T' → *F | *FT'
F → (E) | a

is equivalent to G0 and is the one obtained by applying the construction in
Lemma 2.15 with A = E and then A = T. □

We are now ready to give an algorithm to eliminate left recursion from


a proper CFG. This algorithm is similar in spirit to the algorithm we used
to solve regular expression equations.

ALGORITHM 2.13
Elimination of left recursion.
Input. A proper CFG G = (N, Σ, P, S).
Output. A CFG G' with no left recursion.
Method.
(1) Let N = {A1, . . . , An}. We shall first transform G so that if Ai → α
is a production, then α begins either with a terminal or some Aj such that
j > i. For this purpose, set i = 1.
(2) Let the Ai-productions be Ai → Aiα1 | · · · | Aiαm | β1 | · · · | βp, where
no βj begins with Ak if k < i. (It will always be possible to do this.) Replace
these Ai-productions by

Ai → β1 | · · · | βp | β1Ai' | · · · | βpAi'
Ai' → α1 | · · · | αm | α1Ai' | · · · | αmAi'

where Ai' is a new nonterminal. All the Ai-productions now begin with a terminal
or Ak for some k > i.
(3) If i = n, let G' be the resulting grammar, and halt. Otherwise, set
i = i + 1 and j = 1.
(4) Replace each production of the form Ai → Ajα by the productions
Ai → β1α | · · · | βmα, where Aj → β1 | · · · | βm are all the Aj-productions. It
will now be the case that all Aj-productions begin with a terminal or Ak,
for k > j, so all such Ai-productions will then also have that property.
(5) If j = i − 1, go to step (2). Otherwise, set j = j + 1 and go to step (4).
□
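A Python sketch of Algorithm 2.13 (ours) appears below. Productions are assumed to be kept in a dictionary mapping each nonterminal to a list of right sides (tuples of symbols); remove_immediate is the transformation of Lemma 2.15, and the primed name A + "'" stands for the new nonterminal Ai'.

def remove_immediate(prods, A):
    """Lemma 2.15: remove immediate left recursion from the A-productions."""
    recursive = [rhs[1:] for rhs in prods[A] if rhs[:1] == (A,)]
    others    = [rhs      for rhs in prods[A] if rhs[:1] != (A,)]
    if not recursive:
        return
    Ap = A + "'"                                   # the new nonterminal A'
    prods[A]  = others + [beta + (Ap,) for beta in others]
    prods[Ap] = recursive + [alpha + (Ap,) for alpha in recursive]

def eliminate_left_recursion(prods, order):
    """Algorithm 2.13: order = [A1, ..., An]; afterwards every Ai-production
    begins with a terminal, a larger Aj, or a new primed nonterminal."""
    for i, Ai in enumerate(order):
        for Aj in order[:i]:                       # step (4) for j = 1, ..., i-1
            new = []
            for rhs in prods[Ai]:
                if rhs[:1] == (Aj,):               # expand a leading Aj
                    new += [beta + rhs[1:] for beta in prods[Aj]]
                else:
                    new.append(rhs)
            prods[Ai] = new
        remove_immediate(prods, Ai)                # step (2)
    return prods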
THEOREM 2.18
Every CFL has a non-left-recursive grammar.
Proof. Let G be a proper grammar for CFL L. If we apply Algorithm
2.13, the only transformations used are those of Lemmas 2.14 and 2.15.
Thus the resulting G' generates L.
We must show that G' is free of left recursion. The following two state-
ments are proved by induction on a quantity which we shall subsequently
define:

(2.4.4) After step (2) is executed for i, all Ai-productions begin with
a terminal or Ak, for k > i.
(2.4.5) After step (4) is executed for i and j, all Ai-productions begin with
a terminal or Ak, for k > j.

We define the score of an instance of (2.4.4) to be ni. The score of an
instance of (2.4.5) is ni + j. We prove (2.4.4) and (2.4.5) by induction on
the score of an instance of these statements.
Basis (score of n): Here i = 1 and j = 0. The only instance is (2.4.4)
with i = 1. None of β1, . . . , βp in step (2) can begin with A1, so (2.4.4) is
immediate if i = 1.
Induction. Assume (2.4.4) and (2.4.5) for scores less than s, and let i
and j be such that 0 < j < i ≤ n and ni + j = s. We shall prove this instance
of (2.4.5). By inductive hypothesis (2.4.4), all Aj-productions begin with
a terminal or Ak, for k > j. [This follows because, if j > 1, the instance of
(2.4.5) with parameters i and j − 1 has a score lower than s. The case j = 1
follows from (2.4.4).] Statement (2.4.5) with parameters i and j is thus imme-
diate from the form of the new productions.
An inductive proof of (2.4.4) with score s (i.e., ni = s, j = 0) is left for
the Exercises.
It follows from (2.4.4) that none of A1, . . . , An could be left-recursive.
Indeed, if Ai ⇒⁺_{lm} Aiα for some α, there would have to be Aj and Ak with
k ≤ j such that Ai ⇒*_{lm} Ajβ ⇒_{lm} Akγ ⇒*_{lm} Aiα. We must now show that no Ai'
introduced in step (2) can be left-recursive. This follows immediately from
the fact that if Ai' → Aj'γ is a production created in step (2), then j < i since
Ai' is introduced after Aj'.

Example 2.28
Let G be

A → BC | a
B → CA | Ab
C → AB | CC | a

We take A1 = A, A2 = B, and A3 = C. The grammar after each appli-
cation of step (2) or step (4) of Algorithm 2.13 is shown below. At each step
we show only the new productions for nonterminals whose productions
change.

Step (2) with i = 1: no change.
Step (4) with i = 2, j = 1: B → CA | BCb | ab
Step (2) with i = 2: B → CA | ab | CAB' | abB'
                     B' → CbB' | Cb
Step (4) with i = 3, j = 1: C → BCB | aB | CC | a
Step (4) with i = 3, j = 2:
    C → CACB | abCB | CAB'CB | abB'CB | aB | CC | a
Step (2) with i = 3:
    C → abCB | abB'CB | aB | a | abCBC' | abB'CBC' | aBC' | aC'
    C' → ACBC' | AB'CBC' | CC' | ACB | AB'CB | C

An interesting special case of non-left-recursiveness is Greibach normal
form.
DEFINITION
A CFG G = (N, Σ, P, S) is said to be in Greibach normal form (GNF)
if G is e-free and each non-e-production in P is of the form A → aα with
a ∈ Σ and α ∈ N*.
If a grammar is not left-recursive, then we can find a natural partial order
on the nonterminals. This partial order can be embedded in a linear order
which is useful in putting a grammar into Greibach normal form.
LEMMA 2.16
Let G = (N, Σ, P, S) be a non-left-recursive grammar. Then there is
a linear order < on N such that if A → Bα is in P, then A < B.
Proof. Let R be the relation A R B if and only if A ⇒⁺ Bα for some α.
By definition of left recursion, R is a partial order. (Transitivity is easy to
show.) By Algorithm 0.1, R can be extended to a linear order < with the
desired property. □
ALGORITHM 2.14
Conversion to Greibach normal form.
Input. A non-left-recursive proper CFG G = (N, Σ, P, S).
Output. A grammar G' in GNF such that L(G) = L(G').
Method.
(1) Construct, by Lemma 2.16, a linear order < on N such that every
A-production begins either with a terminal or some nonterminal B such
that A < B. Let N = {A1, . . . , An}, so that A1 < A2 < · · · < An.
(2) Set i = n − 1.
(3) If i = 0, go to step (5). Otherwise, replace each production of the form
Ai → Ajα, where j > i, by Ai → β1α | · · · | βmα, where Aj → β1 | · · · | βm
are all the Aj-productions. It will be true that each of β1, . . . , βm begins
with a terminal.
(4) Set i = i − 1 and return to step (3).
(5) At this point all productions (except possibly S → e) begin with
a terminal. For each production, say A → aX2 · · · Xk, replace those Xj
which are terminals by Xj', a new nonterminal.
(6) For all Xj' introduced in step (5), add the production Xj' → Xj. □
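Steps (2)-(4) can be sketched in Python as follows (ours). The sketch assumes the productions dictionary of the previous sketch and that order lists all nonterminals (including any primed ones) consistently with the linear order of step (1); steps (5) and (6), which replace interior terminals by primed nonterminals, are omitted.

def to_greibach(prods, order, terminals):
    """Back-substitute so that every right side begins with a terminal."""
    for Ai in reversed(order[:-1]):            # i = n-1, n-2, ..., 1
        new = []
        for rhs in prods[Ai]:
            if rhs and rhs[0] not in terminals:    # leading Aj with j > i
                new += [beta + rhs[1:] for beta in prods[rhs[0]]]
            else:
                new.append(rhs)
        prods[Ai] = new
    return prods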
THEOREM 2.19
If L is a CFL, then L = L(G) for some G in GNF.
Proof. A straightforward induction on n − i (that is, backwards, starting
at i = n − 1 and finishing at i = 1) shows that after applying step (3) of
Algorithm 2.14 for i, all Ai-productions begin with a terminal. The property
of the linear order < is crucial here. Step (5) puts the grammar into GNF,
and by Lemma 2.14 does not change the language generated. □

Example 2.29
Consider the grammar G with productions

E → T | TE'
E' → +T | +TE'
T → F | FT'
T' → *F | *FT'
F → (E) | a

Take E' < E < T' < T < F as the linear order on nonterminals.
All F-productions begin with a terminal, as they must, since F is highest
in the order. The next highest symbol, T, has productions T → F | FT', so
we substitute for F in both to obtain T → (E) | a | (E)T' | aT'. Proceeding to
T', we find no change necessary. We then replace the E-productions by
E → (E) | a | (E)T' | aT' | (E)E' | aE' | (E)T'E' | aT'E'. No change for E' is nec-
essary.
Steps (5) and (6) introduce a new nonterminal )' and a production
)' → ). All instances of ) are replaced by )' in the previous productions.
Thus the resulting GNF grammar has the productions

E → (E)' | a | (E)'T' | aT' | (E)'E' | aE' | (E)'T'E' | aT'E'
E' → +T | +TE'
T → (E)' | a | (E)'T' | aT'
T' → *F | *FT'
F → (E)' | a
)' → )

One undesirable aspect of using this technique to put a grammar into


G N F is the large number of new productions created. The following tech-
nique can be used to find a G N F grammar without introducing too many
new productions. However, this new method may introduce more nonter-
minals.

2.4.5. An Alternative Method of Achieving


Greibach Normal Form

There is another way to obtain a grammar in which each production is
of the form A → aα. This technique requires the grammar to be rewritten
only once. Let G = (N, Σ, P, A) be a CFG which contains no e-productions
(not even A → e) and no single productions.
Instead of describing the method in terms of the set of productions we
shall use a set of defining equations, of the type introduced in Section 2.2.2,
to represent the productions. For example, the set of productions

A → AaB | BB | b
B → aA | BAa | Bd | c

can be represented by the equations

(2.4.6)    A = AaB + BB + b
           B = aA + BAa + Bd + c

where A and B are now indeterminates representing sets.



DEFINITION
Let Δ and Σ be two disjoint alphabets. A set of defining equations over
Δ and Σ is a set of equations of the form A = α1 + α2 + · · · + αk, where
A ∈ Δ and each αi is a string in (Δ ∪ Σ)*. If k = 0, the equation is taken
to be A = ∅. There is one equation for each A in Δ. A solution to the set of
defining equations is a function f from Δ to 𝒫(Σ*) such that if f(A) is sub-
stituted everywhere for A, for each A ∈ Δ, then the equations become set
equalities. We say that solution f is a minimal fixed point if f(A) ⊆ g(A) for
all A ∈ Δ and solutions g.
We define a CFG corresponding to a set of defining equations by creating
the productions A → α1 | α2 | · · · | αk for each equation A = α1 + · · · + αk.
The nonterminals are the symbols in Δ. Obviously, the correspondence is
one-to-one. We shall state some results about defining equations that are
generalizations of the results proved for standard form regular expression
equations (which are a special case of defining equations). The proofs are
left for the Exercises.
LEMMA 2.17
The minimal fixed point of a set of defining equations over Δ and Σ is
unique and is given by f(A) = {w | A ⇒*_G w with w ∈ Σ*}, where G is the
corresponding CFG.
Proof. Exercise. □
We shall employ a matrix notation to represent defining equations. Let us
assume that Δ = {A1, A2, . . . , An}. The matrix equation

A = A R + B

represents n equations. Here A is the row vector [A1, A2, . . . , An], R is an
n × n matrix whose entries are regular expressions, and B is a row vector
consisting of n regular expressions.
We take "scalar" multiplication to be concatenation, and scalar addition
to be + (i.e., union). Matrix and vector addition and multiplication are
defined as in the usual (integer, real, etc.) case. We let the regular expression
in row i, column j of R be α1 + · · · + αk if Aiα1, . . . , Aiαk are all the terms
with leading symbol Ai in the equation for Aj. We let the jth component of
B be those terms in the equation for Aj which begin with a symbol of Σ.
Thus, Bj and Rij are those expressions such that the productions for Aj
can be written as

Aj = A1R1j + A2R2j + · · · + AiRij + · · · + AnRnj + Bj

where Bj is a sum of expressions beginning with terminals.
Thus the defining equations (2.4.6) would be written as

(2.4.7)    [A, B] = [A, B] | aB   ∅      |  +  [b, aA + c]
                           | B    Aa + d |

We shall now find an equivalent set of defining equations for A =
A R + B such that the new set of defining equations corresponds to a set
of productions all of whose right sides begin with a terminal symbol.
The transformation turns on the following observation.
LEMMA 2.18
Let A = A R + B be a set of defining equations. Then the minimal fixed
point is A = B R*, where R* = I + R + R² + R³ + · · · . I is an identity
matrix (e along the diagonal and ∅ elsewhere), R² = RR, R³ = RRR, and
so forth.
Proof. Exercise.
If we let R⁺ = RR*, then we can write the minimal fixed point of the
equations A = A R + B as A = B(R⁺ + I) = B R⁺ + B I = B R⁺ + B. Un-
fortunately, we cannot find a corresponding grammar for these equations;
they are not defining equations, as the elements of R⁺ may be infinite sets of
terms. However, we can replace R⁺ by a new matrix of "unknowns." That
is, we can replace R⁺ by a matrix Q with qij as a new symbol in row i,
column j.
We can then obtain equations for the qij's by observing that R⁺ =
RR⁺ + R. Thus, Q = RQ + R is a set of defining equations for the qij's.
Note that there are n² equations if Q and R are n × n matrices. The fol-
lowing lemma relates the two sets of equations.
LEMMA 2.19
Let A = A R + B be a set of defining equations over Δ and Σ. Let Q be
a matrix of the size of R such that each component of Q is a unique new
symbol. Then the system of defining equations represented by A = B Q + B
and Q = RQ + R has a minimal fixed point which agrees on Δ with that of
A = A R + B.
Proof. Exercise. □
We now give another algorithm to convert a proper grammar to GNF.
ALGORITHM 2.15
Conversion to Greibach normal form.
Input. A proper grammar G = (N, Σ, P, S) such that S → e is not in P.
Output. A grammar G′ = (N′, Σ, P′, S) in GNF.
Method.
(1) From G, write the corresponding set of defining equations A = AR + B over N and Σ.

(2) Let Q be an n × n matrix of new symbols, where #N = n. Construct the new set of defining equations A = BQ + B, Q = RQ + R, and let G₁ be the corresponding grammar. Since every term in B begins with a terminal, all A-productions of G₁, for A ∈ N, will begin with terminals.
(3) Since G is proper, e is not a coefficient in R. Thus each q-production of G₁, where q is a component of Q, begins with a symbol in N ∪ Σ. Replace each leading nonterminal A in these productions by all the right sides of the A-productions. The resulting grammar has only productions whose right sides begin with terminals.
(4) For each terminal a appearing in a production as other than the first symbol on the right side, replace it by a new nonterminal of the form a′ and add the production a′ → a. Call the resulting grammar G′. □
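As a rough illustration, the following Python sketch (ours, not the book's; the helper name and the Q-symbol naming scheme are invented, and step (4) is omitted, just as it is in Example 2.30 below) carries out steps (1)-(3) for a proper grammar whose right sides are tuples of single-character symbols:

    def gnf_steps(nonterminals, productions):
        """productions maps each nonterminal to a list of right sides,
        each right side a tuple of single-character symbols.  Assumes,
        as Algorithm 2.15 does, that e is not a coefficient in R."""
        n = len(nonterminals)
        # Step (1): R[(i, j)] holds alpha for each term A_i alpha in the
        # equation for A_j; Bvec[j] holds the terminal-initial terms.
        R = {(i, j): [] for i in range(n) for j in range(n)}
        Bvec = {j: [] for j in range(n)}
        for j, Aj in enumerate(nonterminals):
            for rhs in productions[Aj]:
                if rhs[0] in nonterminals:
                    R[(nonterminals.index(rhs[0]), j)].append(rhs[1:])
                else:
                    Bvec[j].append(rhs)
        # Step (2): new symbols q_ij and the productions of A = BQ + B, Q = RQ + R.
        q = {(i, j): "Q" + nonterminals[i] + nonterminals[j]
             for i in range(n) for j in range(n)}
        new = {}
        for j, Aj in enumerate(nonterminals):      # A_j = sum_i B_i q_ij + B_j
            new[Aj] = [b + (q[(i, j)],) for i in range(n) for b in Bvec[i]]
            new[Aj] += Bvec[j]
        for (i, j), name in q.items():             # q_ij = sum_k R_ik q_kj + R_ij
            new[name] = [a + (q[(k, j)],) for k in range(n) for a in R[(i, k)]]
            new[name] += R[(i, j)]
        # Step (3): expand a leading original nonterminal in each q-production
        # by the (terminal-initial) right sides just constructed.
        for name in q.values():
            expanded = []
            for rhs in new[name]:
                if rhs and rhs[0] in nonterminals:
                    for gamma in new[rhs[0]]:
                        expanded.append(gamma + rhs[1:])
                else:
                    expanded.append(rhs)
            new[name] = expanded
        return new

    # The grammar of Example 2.30:  A -> AaB | BB | b,  B -> aA | BAa | Bd | c
    g1 = gnf_steps(["A", "B"],
                   {"A": [("A", "a", "B"), ("B", "B"), ("b",)],
                    "B": [("a", "A"), ("B", "A", "a"), ("B", "d"), ("c",)]})
    print(g1["QBA"])   # the Y-productions of the text, after step (3)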
THEOREM 2.20
Algorithm 2.15 yields a grammar G' in GNF, and L(G) = L(G').
Proof. That G′ is in GNF follows from the properness of G. That is, no component of B or R is e. That L(G′) = L(G) follows from Lemmas 2.14, 2.17, and 2.19. □

Example 2.30
Let us consider the grammar whose corresponding defining equations are (2.4.7), that is,

A → AaB | BB | b
B → aA | BAa | Bd | c
We rewrite these equations according to step (2) of Algorithm 2.15 as

(2.4.8)    [A, B] = [b, aA + c] [W   X] + [b, aA + c]
                                [Y   Z]

We then add the equations

(2.4.9)    [W   X] = [aB     ∅    ] [W   X] + [aB     ∅    ]
           [Y   Z]   [B    Aa + d ] [Y   Z]   [B    Aa + d ]
The grammar corresponding to (2.4.8) and (2.4.9) is

A → bW | aAY | cY | b
B → bX | aAZ | cZ | aA | c
W → aBW | aB
X → aBX
Y → BW | AaY | dY | B
Z → BX | AaZ | dZ | Aa | d

Note that X is a useless symbol. In step (3), the productions Y → BW | AaY | B and Z → BX | AaZ | Aa are replaced by substituting for the leading A's and B's. We omit this transformation, as well as that of step (4), which should now be familiar to the reader. □

EXERCISES

2.4.1. Let G be defined by

S → AB
A → Aa | bB
B → a | Sb

Give derivation trees for the following sentential forms:


(a) baabaab.
(b) bBABb.
(c) baSb.
2.4.2. Give a leftmost and rightmost derivation of the string baabaab in the
grammar of Exercise 2.4.1.
2.4.3. Give all cuts of the tree of Fig. 2.14.

Fig. 2.14 Unlabelled derivation tree. [The tree itself, with nodes n₁ through n₇, is not reproduced here.]

2.4.4. Show that the following are equivalent statements about a CFG G and
sentence w:
(a) w is the frontier of two distinct derivation trees of G.
(b) w has two distinct leftmost derivations in G.
(c) w has two distinct rightmost derivations in G.
**2.4.5. What is the largest number of different derivations that are representable by the same derivation tree of n nodes?
2.4.6. Convert the grammar

S → A | B
A → aB | bS | b
B → AB | Ba
C → AS | b

to an equivalent CFG with no useless symbols.



2.4.7. Prove that Algorithm 2.8 correctly removes inaccessible symbols.


2.4.8. Complete the proof of Theorem 2.13.
2.4.9. Discuss the time and space complexity of Algorithm 2.8. Use a random access computer model.
2.4.10. Give an algorithm to compute for a CFG G = (N, Σ, P, S) the set of A ∈ N such that A ⇒* e. How fast is your algorithm?
2.4.11. Find an e-free grammar equivalent to the following:

S → ABC
A → BB | e
B → CC | a
C → AA | b

2.4.12. Complete the proof of Theorem 2.14.


2.4.13. Find a proper grammar equivalent to the following:

S → A | B
A → C | D
B → D | E
C → S | a | e
D → S | b
E → S | c | e

2.4.14. Prove Theorem 2.16.


2.4.15. Prove Lemma 2.14.
2.4.16. Put the following grammars in Chomsky normal form:
(a) S → 0S1 | 01.
(b) S → aB | bA
    A → aS | bAA | a
    B → bS | aBB | b.
2.4.17. If G = (N, Σ, P, S) is in CNF, S ⇒ᵏ_G w, |w| = n, and w is in Σ*, what is k?
2.4.18. Give a detailed proof of Theorem 2.17.
2.4.19. Put the grammar

S → Ba | Ab
A → Sa | AAb | a
B → Sb | BBa | b

into GNF
(a) Using Algorithm 2.14.
(b) Using Algorithm 2.15.

*2.4.20. Give a fast algorithm to test if a C F G G is left-recursive.


2.4.21. Give an algorithm to eliminate right recursion from a C F G .
2.4.22. Complete the proof of L e m m a 2.15.
*2.4.23. Prove L e m m a s 2.17-2.19.
2.4.24. Complete Example 2.30 to yield a proper g r a m m a r in G N F .
2.4.25. Discuss the relative merits of Algorithms 2.14 and 2.15, especially with
regard to the size of the resulting grammar.
*2.4.26. Show that every C F L without e has a g r a m m a r where all productions
are of the forms A ~ a B C , A ---~ aB, and A ----~ a.
DEFINITION
A C F G is an operator g r a m m a r if no production has a right side
with two adjacent nonterminals.
*2.4.27. Show that every C F L has an operator grammar. H i n t : Begin with a
G N F grammar.
*2.4.28. Show that every C F L is generated by a g r a m m a r in which each pro-
duction is of one of the forms

A ~ aBbC, A ~ aBb, A ~ aB, or A~ a

If e ~ L(G), then S ---~ e is also in P.


**2.4.29. Consider the grammar with the two productions S → SS | a. Show that the number of distinct leftmost derivations of aⁿ is given by

X_n = Σ_{i+j=n, i>0, j>0} X_i X_j,   where X₁ = 1.

Show that

X_{n+1} = (1/(n + 1)) (2n choose n)

(These are the Catalan numbers.)
*2.4.30. Show that if L is a CFL containing no sentence of length less than 2, then L has a grammar with all productions of the form A → aαb.
2.4.31. Show that every C F L has a g r a m m a r in which if X1 X2 . . . Xk is the
right side of a production, then XI . . . . . Xk are all distinct.

DEFINITION
A C F G G = (N, E, P, S) is linear if every production is of the form
A ~ w B x or A ---~ w for w and x in Z* and B in N.
2.4.32. Show that every linear language without e has a g r a m m a r in which
each production is of one of the forms A ~ aB, A ~ Ba, or A ~ a.
*2.4.33. Show that every C F L has a g r a m m a r G ---- (N, E, P, S) such that if A
,
is in N -- [S], then [ w l A :=~ w and w is in Z*} is infinite.
2.4.34. Show that every C F L has a recursive grammar. H i n t : Use L e m m a 2.14
and Exercise 2.4.33.

*2.4.35. Let us call a CFG G = (N, Σ, P, S) quasi-linear if for every production A → X₁ ⋯ X_k there is at most one Xᵢ which generates an infinite set of terminal strings. Show that every quasi-linear grammar generates a linear language.

DEFINITION

The graph of a CFG G = (N, Σ, P, S) is a directed unordered graph (N ∪ Σ ∪ {e}, R) such that A R X if and only if A → αXβ is a production in P for some α and β.
2.4.36. Show that if a grammar has no useless symbols, then all nodes are ac-
cessible from S. Is the converse of this statement true ?
2.4.37. Let T be the transformation on context-free grammars defined in
Lemma 2.14. That is, if G and G' are the grammars in the statement of
Lemma 2.14, then T maps G into G'. Show that Algorithms 2.10 and
2.11 can be implemented by means of repeated applications of this
transformation T.

Programming Exercises

2.4.38. Construct a program that eliminates all useless symbols from a CFG.
2.4.39. Write a program that maps a CFG into an equivalent proper CFG.
2.4.40. Construct a program that removes all left recursion from a CFG.
2.4.41. Write a program that decides whether a given derivation tree is a valid
derivation tree for a CFG.

BIBLIOGRAPHIC NOTES

A derivation tree is also called a variety of other names including generation


tree, parsing diagram, parse tree, syntax tree, phrase marker, and p-marker. The
representation of a derivation in terms of a derivation tree has been a familiar
concept in linguistics. The concept of leftmost derivation appeared in Evey
[1963].
Many of the algorithms in this chapter have been known since the early 1960's,
although many did not appear in the literature until considerably later. Theorem
2.17 (Chomsky normal form) was first presented by Chomsky [1959a]. Theorem
2.18 (Greibach normal form) was presented by Greibach [1965]. The alternative
method of achieving G N F (Algorithm 2.15) and the result stated in Exercise
2.4.30 were presented by Rosenkrantz [1967]. Algorithm 2.14 for Greibach normal
form has been attributed to M. Paull.
Chomsky [1963], Chomsky and Schutzenberger [1963], and Ginsburg and Rice
[1962] have used equations to represent the productions of a context-free gram-
mar.
Operator grammars were first considered by Floyd [1963]. The normal forms
given in Exercises 2.4.26-2.4.28 were derived by Greibach [1965].

2.5. PUSHDOWN AUTOMATA

We now introduce the pushdown automaton, a recognizer that is a natural model for syntactic analyzers of context-free languages. The pushdown automaton is a one-way nondeterministic recognizer whose infinite storage consists of one pushdown list, as shown in Fig. 2.15.

Fig. 2.15 Pushdown automaton. [The figure shows a read-only input tape a₁a₂ ⋯ aₙ scanned by a finite state control, which also has access to a pushdown list with symbols Z₁, Z₂, …, Zₘ.]

We shall prove a fundamental result regarding pushdown a u t o m a t a - -


that a language is context-free if and only if it is accepted by a nondetermin-
istic pushdown automaton. We shall also consider a subclass of context-free
languages which are of prime importance when parsability is considered.
These, called the deterministic CFL's, are those CFL's which can be recog-
nized by a deterministic pushdown automaton.

2.5.1. The Basic Definition

We shall represent a pushdown list as a string of symbols with the top-


most symbol written either on the left or on the right depending on which
convention is most convenient for the situation at hand. For the time being
we shall assume that the top symbol on the pushdown list is the leftmost
symbol of the string representing the pushdown list.
DEFINITION

A pushdown automaton (PDA for short) is a 7-tuple

P = (Q, Σ, Γ, δ, q₀, Z₀, F),

where
(1) Q is a finite set of state symbols representing the possible states of the finite state control,
(2) Σ is a finite input alphabet,
(3) Γ is a finite alphabet of pushdown list symbols,
(4) δ is a mapping from Q × (Σ ∪ {e}) × Γ to the finite subsets of Q × Γ*,
(5) q₀ ∈ Q is the initial state of the finite control,
(6) Z₀ ∈ Γ is the symbol that appears initially on the pushdown list (the start symbol), and
(7) F ⊆ Q is the set of final states.
A configuration of P is a triple (q, w, α) in Q × Σ* × Γ*, where
(1) q represents the current state of the finite control,
(2) w represents the unused portion of the input. The first symbol of w is under the input head. If w = e, then it is assumed that all of the input tape has been read.
(3) α represents the contents of the pushdown list. The leftmost symbol of α is the topmost pushdown symbol. If α = e, then the pushdown list is assumed to be empty.
A move by P will be represented by the binary relation ⊢_P (or ⊢ whenever P is understood) on configurations. We write

(2.5.1)    (q, aw, Zα) ⊢ (q′, w, γα)

if δ(q, a, Z) contains (q′, γ) for any q ∈ Q, a ∈ Σ ∪ {e}, w ∈ Σ*, Z ∈ Γ, and α ∈ Γ*.
If a ≠ e, Eq. (2.5.1) states that if P is in a configuration such that the finite control is in state q, the current input symbol is a, and the symbol on top of the pushdown list is Z, then P may go into a configuration in which the finite control is now in state q′, the input head has been shifted one square to the right, and the topmost symbol on the pushdown list has been replaced by the string γ of pushdown list symbols. If γ = e, we say that the pushdown list has been popped.
If a = e, then the move is called an e-move. In an e-move the current input symbol is not taken into consideration, and the input head is not moved. However, the state of the finite control can be changed, and the contents of the memory can be adjusted. Note that an e-move can occur even if all of the input has been read.
No move is possible if the pushdown list is empty.
We can define the relations ⊢ⁱ, for i ≥ 0, ⊢⁺, and ⊢* in the customary fashion. Thus, ⊢* and ⊢⁺ are, respectively, the reflexive-transitive and transitive closures of ⊢.

An initial configuration of P is one of the form (q₀, w, Z₀) for some w in Σ*. That is, the finite state control is in the initial state, the input contains the string to be recognized, and the pushdown list contains only the symbol Z₀. A final configuration is one of the form (q, e, α), where q is in F and α is in Γ*.
We say that a string w is accepted by P if (q₀, w, Z₀) ⊢* (q, e, α) for some q in F and α in Γ*. The language defined by P, denoted L(P), is the set of strings accepted by P. L(P) will be called a pushdown automaton language.

Example 2.31

Let us give a pushdown automaton for the language L = {0ⁿ1ⁿ | n ≥ 0}.
Let P = ({q₀, q₁, q₂}, {0, 1}, {Z, 0}, δ, q₀, Z, {q₀}), where

δ(q₀, 0, Z) = {(q₁, 0Z)}
δ(q₁, 0, 0) = {(q₁, 00)}
δ(q₁, 1, 0) = {(q₂, e)}
δ(q₂, 1, 0) = {(q₂, e)}
δ(q₂, e, Z) = {(q₀, e)}

P operates by copying the initial string of 0's from its input tape onto its pushdown list and then popping one 0 from the pushdown list for each 1 that is seen on the input. Moreover, the state transitions ensure that all 0's must precede the 1's. For example, with the input string 0011, P would make the following sequence of moves:

(q₀, 0011, Z) ⊢ (q₁, 011, 0Z)
              ⊢ (q₁, 11, 00Z)
              ⊢ (q₂, 1, 0Z)
              ⊢ (q₂, e, Z)
              ⊢ (q₀, e, e)
In general we can show that

(q₀, 0, Z) ⊢ (q₁, e, 0Z)
(q₁, 0ⁱ, 0Z) ⊢ⁱ (q₁, e, 0ⁱ⁺¹Z)
(q₁, 1, 0ⁱ⁺¹Z) ⊢ (q₂, e, 0ⁱZ)
(q₂, 1ⁱ, 0ⁱZ) ⊢ⁱ (q₂, e, Z)
(q₂, e, Z) ⊢ (q₀, e, e)

Stringing all of this together, we have the following sequence of moves by P:

(q₀, 0ⁿ1ⁿ, Z) ⊢^{2n+1} (q₀, e, e)   for n ≥ 1

and

(q₀, e, Z) ⊢⁰ (q₀, e, Z)

Thus, L ⊆ L(P).
Now we need to show that L ⊇ L(P). That is, P will accept only strings of the form 0ⁿ1ⁿ. This is the hard part. It is generally easy to show that a recognizer accepts certain strings. As with grammars, it is invariably much more difficult to show that a recognizer accepts only strings of a certain form.
Here we notice that if P accepts an input string other than e, it must cycle through the sequence of states q₀, q₁, q₂, q₀.
We notice that if (q₀, w, Z) ⊢ⁱ (q₁, e, α), i ≥ 1, then w = 0ⁱ and α = 0ⁱZ. Likewise, if (q₂, w, α) ⊢ⁱ (q₂, e, β), then w = 1ⁱ and α = 0ⁱβ. Also, (q₁, w, α) ⊢ (q₂, e, β) only if w = 1 and α = 0β; (q₂, w, Z) ⊢ (q₀, e, e) only if w = e. Thus if (q₀, w, Z) ⊢ⁱ (q₀, e, α), for some i ≥ 0, either w = e and i = 0 or w = 0ⁿ1ⁿ, i = 2n + 1, and α = e. Hence L ⊇ L(P). □
We emphasize that a pushdown automaton, as we have defined it, can
make moves even though it has scanned all of its input. However, a pushdown
automaton cannot make a move if its pushdown list is empty.
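A pushdown automaton can also be simulated directly by searching the space of its configurations. The following Python sketch (ours, not the book's; the dictionary encoding of δ and the function name are invented for the illustration) does a breadth-first search and then runs the PDA of Example 2.31. The search terminates only when the set of reachable configurations is finite, which is the case for that PDA:

    from collections import deque

    def accepts(delta, q0, Z0, finals, w, by_empty_list=False):
        """Breadth-first search over configurations (state, unread input,
        pushdown list); the leftmost symbol of the list string is the top."""
        start = (q0, w, Z0)
        seen, queue = {start}, deque([start])
        while queue:
            q, rest, stack = queue.popleft()
            if by_empty_list:
                if rest == "" and stack == "":
                    return True
            elif rest == "" and q in finals:
                return True
            if stack == "":
                continue                     # no move on an empty pushdown list
            moves = []
            if rest:                         # moves consuming the next input symbol
                moves += [(rest[1:], m) for m in delta.get((q, rest[0], stack[0]), [])]
            moves += [(rest, m) for m in delta.get((q, "", stack[0]), [])]   # e-moves
            for rest2, (q2, push) in moves:
                cfg = (q2, rest2, push + stack[1:])
                if cfg not in seen:
                    seen.add(cfg)
                    queue.append(cfg)
        return False

    # The PDA of Example 2.31 for {0^n 1^n | n >= 0}:
    delta_31 = {
        ("q0", "0", "Z"): [("q1", "0Z")],
        ("q1", "0", "0"): [("q1", "00")],
        ("q1", "1", "0"): [("q2", "")],
        ("q2", "1", "0"): [("q2", "")],
        ("q2", "", "Z"): [("q0", "")],
    }
    assert accepts(delta_31, "q0", "Z", {"q0"}, "0011")
    assert accepts(delta_31, "q0", "Z", {"q0"}, "")
    assert not accepts(delta_31, "q0", "Z", {"q0"}, "0010")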

Example 2.32
Let us design a pushdown automaton for the language

L = {wwᴿ | w ∈ {a, b}⁺}.

Let P = ({q₀, q₁, q₂}, {a, b}, {Z, a, b}, δ, q₀, Z, {q₂}), where
(1) δ(q₀, a, Z) = {(q₀, aZ)}
(2) δ(q₀, b, Z) = {(q₀, bZ)}
(3) δ(q₀, a, a) = {(q₀, aa), (q₁, e)}
(4) δ(q₀, a, b) = {(q₀, ab)}
(5) δ(q₀, b, a) = {(q₀, ba)}
(6) δ(q₀, b, b) = {(q₀, bb), (q₁, e)}
(7) δ(q₁, a, a) = {(q₁, e)}
(8) δ(q₁, b, b) = {(q₁, e)}
(9) δ(q₁, e, Z) = {(q₂, e)}
P initially copies some of its input onto its pushdown list, by rules (1), (2), (4), and (5) and the first alternatives of rules (3) and (6). However, P is nondeterministic. Anytime it wishes, as long as its current input matches the top of the pushdown list, it may enter state q₁ and begin matching its pushdown list against the input. The second alternatives of rules (3) and (6) represent this choice, and the matching continues by rules (7) and (8). Note that if P ever fails to find a match, then this instance of P "dies." However, since P is nondeterministic, it makes all possible moves. If any choice causes P to expose the Z on its pushdown list, then by rule (9) that Z is erased and state q₂ entered. Thus P accepts if and only if all matches are made.
For example, with the input string abba, P can make the following sequences of moves, among others:
(1) (q₀, abba, Z) ⊢ (q₀, bba, aZ)
                  ⊢ (q₀, ba, baZ)
                  ⊢ (q₀, a, bbaZ)
                  ⊢ (q₀, e, abbaZ)
(2) (q₀, abba, Z) ⊢ (q₀, bba, aZ)
                  ⊢ (q₀, ba, baZ)
                  ⊢ (q₁, a, aZ)
                  ⊢ (q₁, e, Z)
                  ⊢ (q₂, e, e).
Since the sequence (2) ends in final state q₂, P accepts the input string abba.
Again it is relatively easy to show that if w = c₁c₂ ⋯ cₙcₙcₙ₋₁ ⋯ c₁, each cᵢ in {a, b}, 1 ≤ i ≤ n, then

(q₀, w, Z) ⊢ⁿ (q₀, cₙcₙ₋₁ ⋯ c₁, cₙcₙ₋₁ ⋯ c₁Z)
           ⊢ (q₁, cₙ₋₁ ⋯ c₁, cₙ₋₁ ⋯ c₁Z)
           ⊢ⁿ⁻¹ (q₁, e, Z)
           ⊢ (q₂, e, e).
Thus, L ⊆ L(P).
It is not quite as easy to show that if (q₀, w, Z) ⊢* (q₂, e, α) for some α ∈ Γ*, then w is of the form xxᴿ for some x in (a + b)⁺ and α = e. This proof is left for the Exercises. We can then conclude that L(P) = L.

The pushdown automaton of Example 2.32 quite clearly brings out the
nondeterministic nature of a PDA. From any configuration of the form
(qo, aw, a~) it is possible for P to make one of two moves---either push
another a on the pushdown list or pop the a from the top of the pushdown
list.
We should emphasize that although a nondeterministic pushdown
automaton may provide a convenient abstract definition for a language,
the device must be deterministically simulated to be realized in practice. In
Chapter 4 we shall discuss systematic methods for simulating nondetermin-
istic pushdown automata.

2.5.2. Variants of Pushdown Automata

In this section we shall define some variants of PDA's and relate the languages defined to the original PDA languages. First we would like to bring out a fundamental aspect of the behavior of a PDA which should be quite intuitive. This can be stated as "What transpires on top of the pushdown list is independent of what is under the top of the pushdown list."
LEMMA 2.20
Let P = (Q, Σ, Γ, δ, q₀, Z₀, F) be a PDA. If (q, w, A) ⊢ⁿ (q′, e, e), then (q, w, Aα) ⊢ⁿ (q′, e, α) for all A ∈ Γ and α ∈ Γ*.
Proof. A proof by induction on n is quite elementary. For n = 1, the lemma is certainly true. Assuming that it is true for all 1 ≤ n < n′, let (q, w, A) ⊢^{n′} (q′, e, e). Such a sequence of moves must be of the form

(q, w, A) ⊢ (q₁, w₁, X₁ ⋯ X_k)
          ⊢^{n₁} (q₂, w₂, X₂ ⋯ X_k)
          ⋯
          ⊢^{n_{k−1}} (q_k, w_k, X_k)
          ⊢^{n_k} (q′, e, e)

where k ≥ 1 and nᵢ < n′ for 1 ≤ i ≤ k.†

Then the following sequence of moves must also be possible for any α ∈ Γ*:

(q, w, Aα) ⊢ (q₁, w₁, X₁ ⋯ X_kα)
           ⊢^{n₁} (q₂, w₂, X₂ ⋯ X_kα)
           ⋯
           ⊢^{n_{k−1}} (q_k, w_k, X_kα)
           ⊢^{n_k} (q′, e, α)

Except for the first move, we invoke the inductive hypothesis. □


Next, we would like to extend the definition of a PDA slightly to permit the PDA to replace a finite-length string of symbols on top of the pushdown list by some other finite-length string in a single move. Recall that our original version of PDA could replace only the topmost symbol of the pushdown list on a given move.

†This is another of those "obvious" statements which may require some thought. Imagine the PDA running through the indicated sequence of configurations. Eventually, the length of the pushdown list becomes k − 1 for the first time. Since none of X₂, …, X_k has ever been the top symbol, they must still be there, so let n₁ be the number of elapsed moves. Then wait until the length of the list first becomes k − 2 and let n₂ be the number of additional moves made. Proceed in this way until the list becomes empty.
DEFINITION

Let an extended PDA be a 7-tuple P = (Q, Σ, Γ, δ, q₀, Z₀, F), where δ is a mapping from a finite subset of Q × (Σ ∪ {e}) × Γ* to the finite subsets of Q × Γ*, and all other symbols have the same meaning as before.
A configuration is as before, and we write (q, aw, αγ) ⊢ (q′, w, βγ) if δ(q, a, α) contains (q′, β) for q in Q, a in Σ ∪ {e}, and α in Γ*. In this move the string α is replaced by the string β on top of the pushdown list. As before, the language defined by P, denoted L(P), is

{w | (q₀, w, Z₀) ⊢* (q, e, α) for some q in F and α in Γ*}.

Notice that unlike a conventional PDA, an extended pushdown automa-


ton is capable of making moves when its pushdown list is empty.

Example 2.33
Let us define an extended PDA P to recognize L = {wwᴿ | w ∈ {a, b}*}.
Let P = ({q, p}, {a, b}, {a, b, S, Z}, δ, q, Z, {p}), where
(1) δ(q, a, e) = {(q, a)}
(2) δ(q, b, e) = {(q, b)}
(3) δ(q, e, e) = {(q, S)}
(4) δ(q, e, aSa) = {(q, S)}
(5) δ(q, e, bSb) = {(q, S)}
(6) δ(q, e, SZ) = {(p, e)}
With input aabbaa, P can make the following sequence of moves:

(q, aabbaa, Z) ⊢ (q, abbaa, aZ)
               ⊢ (q, bbaa, aaZ)
               ⊢ (q, baa, baaZ)
               ⊢ (q, baa, SbaaZ)
               ⊢ (q, aa, bSbaaZ)
               ⊢ (q, aa, SaaZ)
               ⊢ (q, a, aSaaZ)
               ⊢ (q, a, SaZ)
               ⊢ (q, e, aSaZ)
               ⊢ (q, e, SZ)
               ⊢ (p, e, e)

P operates by first storing a prefix of the input on the pushdown list. Then
a centermarker S is placed on top of the pushdown list. P then places the
next input symbol on the pushdown list and replaces aSa or bSb by S on
the list. P continues in this fashion until all of the input is used. If SZ then
remains on the pushdown list, P erases SZ and enters the final state. D

We would now like to show that L is a PDA language if and only if L is an extended PDA language. The "only if" part of this statement is clearly true. The "if" part is the following lemma.
LEMMA 2.21
Let P = (Q, Σ, Γ, δ, q₀, Z₀, F) be an extended PDA. Then there is a PDA P₁ such that L(P₁) = L(P).
Proof. Let

m = max{|α| : δ(q, a, α) is nonempty for some q ∈ Q and a ∈ Σ ∪ {e}}.

We shall construct a PDA P₁ to simulate P by storing the top m symbols that appear on P's pushdown list in a "buffer" of length m located in the finite state control of P₁. In this way P₁ can tell at the start of each move what the top m symbols of P's pushdown list are. If, in a move, P replaces the top k symbols on the pushdown list by a string of l symbols, then P₁ will replace the first k symbols in the buffer by the string of length l. If l < k, then P₁ will make k − l bookkeeping e-moves in which k − l symbols are transferred from the top of the pushdown list to the buffer in the finite control. The buffer will then be full and P₁ ready to simulate another move of P. If l > k, symbols are transferred from the buffer to the pushdown list.
Formally, let P₁ = (Q₁, Σ, Γ₁, δ₁, q₁, Z₁, F₁), where
(1) Q₁ = {[q, α] | q ∈ Q, α ∈ Γ₁*, and 0 ≤ |α| ≤ m}.
(2) Γ₁ = Γ ∪ {Z₁}.
(3) δ₁ is defined as follows:
(a) Suppose that δ(q, a, X₁ ⋯ X_k) contains (r, Y₁ ⋯ Y_l).
(i) If l ≥ k, then for all Z ∈ Γ₁ and α ∈ Γ₁* such that |α| = m − k,

δ₁([q, X₁ ⋯ X_kα], a, Z) contains ([r, β], γZ)

where βγ = Y₁ ⋯ Y_lα and |β| = m.
(ii) If l < k, then for all Z ∈ Γ₁ and α ∈ Γ₁* such that |α| = m − k,

δ₁([q, X₁ ⋯ X_kα], a, Z) contains ([r, Y₁ ⋯ Y_lαZ], e)

(b) For all q ∈ Q, Z ∈ Γ₁, and α ∈ Γ₁* such that |α| < m,

δ₁([q, α], e, Z) = {([q, αZ], e)}

These rules cause the buffer in the finite control to fill up (i.e., contain m symbols).
(4) q₁ = [q₀, Z₀Z₁^{m−1}]. The buffer initially contains Z₀ on top and m − 1 Z₁'s below. The Z₁'s are used as a special marker for the bottom of the pushdown list.
(5) F₁ = {[q, α] | q ∈ F, α ∈ Γ₁*}.
It is not difficult to show that

(q, aw, X₁ ⋯ X_kX_{k+1} ⋯ X_n) ⊢ (r, w, Y₁ ⋯ Y_lX_{k+1} ⋯ X_n)

if and only if ([q, α], aw, β) ⊢⁺ ([r, α′], w, β′), where
(1) αβ = X₁ ⋯ X_nZ₁ᵐ,
(2) α′β′ = Y₁ ⋯ Y_lX_{k+1} ⋯ X_nZ₁ᵐ,
(3) |α| = |α′| = m, and
(4) between the two configurations of P₁ shown there is none whose state has a second component (buffer) of length m. Direct examination of the rules of P₁ is sufficient.
Thus, (q₀, w, Z₀) ⊢* (q, e, α) for some q in F and α in Γ* if and only if

([q₀, Z₀Z₁^{m−1}], w, Z₁) ⊢*_{P₁} ([q, β], e, γ)

where |β| = m and βγ = αZ₁ᵐ. Thus, L(P₁) = L(P). □
Let us now examine those inputs to a PDA which cause the pushdown list to become empty.
DEFINITION
Let P = (Q, Σ, Γ, δ, q₀, Z₀, F) be a PDA, or an extended PDA. We say that a string w ∈ Σ* is accepted by P by empty pushdown list whenever (q₀, w, Z₀) ⊢* (q, e, e) for some q ∈ Q. Let L_e(P) be the set of strings accepted by P by empty pushdown list.
LEMMA 2.22
Let L be L(P) for some PDA P = (Q, Σ, Γ, δ, q₀, Z₀, F). We can construct a PDA P′ such that L_e(P′) = L.
Proof. We shall let P′ simulate P. Anytime P enters a final state, P′ will have a choice to continue simulating P or to enter a special state q_e which causes the pushdown list to be emptied. However, there is one complication. P may make a sequence of moves on an input string w which causes its pushdown list to become empty without the finite control being in a final state. Thus, to prevent P′ from accepting w when it should not, we add to P′ a special bottom marker for the pushdown list which can be removed only by P′ in state q_e. Formally, let P′ be (Q ∪ {q_e, q′}, Σ, Γ ∪ {Z′}, δ′, q′, Z′, ∅),†

†We shall usually make the set of final states ∅ if the PDA is to accept by empty pushdown list. Obviously, the set of final states could be anything we wished.

where δ′ is defined as follows:
(1) If δ(q, a, Z) contains (r, γ), then δ′(q, a, Z) contains (r, γ) for all q ∈ Q, a ∈ Σ ∪ {e}, and Z ∈ Γ.
(2) δ′(q′, e, Z′) = {(q₀, Z₀Z′)}. P′'s first move is to write Z₀Z′ on the pushdown list and enter the initial state of P. Z′ will act as the special marker for the bottom of the pushdown list.
(3) For all q ∈ F and Z ∈ Γ ∪ {Z′}, δ′(q, e, Z) contains (q_e, e).
(4) For all Z ∈ Γ ∪ {Z′}, δ′(q_e, e, Z) = {(q_e, e)}.
We can clearly see that

(q′, w, Z′) ⊢ (q₀, w, Z₀Z′)
            ⊢* (q, e, Y₁ ⋯ Y_r)
            ⊢ (q_e, e, Y₂ ⋯ Y_r)
            ⊢* (q_e, e, e)

where Y_r = Z′, if and only if

(q₀, w, Z₀) ⊢* (q, e, Y₁ ⋯ Y_{r−1})

for q ∈ F and Y₁ ⋯ Y_{r−1} ∈ Γ*. Hence, L_e(P′) = L(P). □


The converse of Lemma 2.22 is also true.
LEMMA 2.23
Let P = (Q, Σ, Γ, δ, q₀, Z₀, ∅) be a PDA. We can construct a PDA P′ such that L(P′) = L_e(P).
Proof. P′ will simulate P but have a special symbol Z′ on the bottom of its pushdown list. As soon as P′ can read Z′, P′ will enter a new final state q_f. A formal construction is left for the Exercises.

2.5.3. Equivalence of PDA Languages and CFL's

We can now use these results to show that the PDA languages are exactly the context-free languages. In the following lemma we construct the natural (nondeterministic) "top-down" parser for a context-free grammar.
LEMMA 2.24
Let G = (N, Σ, P, S) be a CFG. From G we can construct a PDA R such that L_e(R) = L(G).
Proof. We shall construct R to simulate all leftmost derivations in G. Let R = ({q}, Σ, N ∪ Σ, δ, q, S, ∅), where δ is defined as follows:
(1) If A → α is in P, then δ(q, e, A) contains (q, α).
(2) δ(q, a, a) = {(q, e)} for all a in Σ.
We now want to show that

(2.5.2)   A ⇒ᵐ_{lm} w if and only if (q, w, A) ⊢ⁿ (q, e, e) for some m, n ≥ 1

Only if: We shall prove this part by induction on m. Suppose that A ⇒ᵐ_{lm} w. If m = 1 and w = a₁ ⋯ a_k, k ≥ 0, then

(q, a₁ ⋯ a_k, A) ⊢ (q, a₁ ⋯ a_k, a₁ ⋯ a_k) ⊢ᵏ (q, e, e)

Now suppose that A ⇒ᵐ_{lm} w for some m > 1. The first step of this derivation must be of the form A ⇒ X₁X₂ ⋯ X_k, where Xᵢ ⇒^{mᵢ}_{lm} xᵢ for some mᵢ < m, 1 ≤ i ≤ k, and where x₁x₂ ⋯ x_k = w. Then

(q, w, A) ⊢ (q, w, X₁X₂ ⋯ X_k)

If Xᵢ is in N, then

(q, xᵢ, Xᵢ) ⊢⁺ (q, e, e)

by the inductive hypothesis. If Xᵢ = xᵢ is in Σ, then

(q, xᵢ, Xᵢ) ⊢ (q, e, e)

Putting this sequence of moves together, we have (q, w, A) ⊢⁺ (q, e, e).

If: We shall now show by induction on n that if (q, w, A) ⊢ⁿ (q, e, e), then A ⇒⁺ w.
For n = 1, w = e and A → e is in P. Let us assume that this statement is true for all n′ < n. Then the first move made by R must be of the form

(q, w, A) ⊢ (q, w, X₁ ⋯ X_k)

and (q, xᵢ, Xᵢ) ⊢^{nᵢ} (q, e, e) for 1 ≤ i ≤ k, where w = x₁x₂ ⋯ x_k (Lemma 2.20). Then A → X₁ ⋯ X_k is a production in P, and Xᵢ ⇒⁺ xᵢ from the inductive hypothesis if Xᵢ ∈ N. If Xᵢ is in Σ, then Xᵢ ⇒⁰ xᵢ. Thus

A ⇒ X₁X₂ ⋯ X_k
  ⇒* x₁X₂ ⋯ X_k
  ⋮
  ⇒* x₁x₂ ⋯ x_{k−1}X_k
  ⇒* x₁x₂ ⋯ x_{k−1}x_k = w

is a derivation of w from A in G.
As a special case of (2.5.2), we have the derivation S ⇒⁺ w if and only if (q, w, S) ⊢⁺ (q, e, e). Thus, L_e(R) = L(G). □
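The construction of Lemma 2.24 is easy to mechanize. The following Python sketch (ours, not the book's; grammar symbols are assumed to be single characters) builds the one-state PDA for the grammar G₀ used in Example 2.34, in the same dictionary encoding as the simulator sketched after Example 2.31, so the two can be combined with acceptance by empty pushdown list:

    def top_down_pda(productions, terminals):
        """Rules (1) and (2) of Lemma 2.24 for a CFG with one-character symbols."""
        delta = {}
        for A, rhss in productions.items():           # rule (1): expand nonterminal A
            delta[("q", "", A)] = [("q", rhs) for rhs in rhss]
        for a in terminals:                            # rule (2): match a terminal
            delta[("q", a, a)] = [("q", "")]
        return delta

    # G0:  E -> E+T | T,   T -> T*F | F,   F -> (E) | a
    delta_g0 = top_down_pda({"E": ["E+T", "T"], "T": ["T*F", "F"], "F": ["(E)", "a"]},
                            "a+*()")
    # accepts(delta_g0, "q", "E", set(), "a+a*a", by_empty_list=True) would hold,
    # but because G0 is left-recursive the naive search does not terminate on
    # strings that must be rejected.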

Example 2.34
Let us construct a PDA P such that L_e(P) = L(G₀), where G₀ is our usual grammar for arithmetic expressions. Let P = ({q}, Σ, Γ, δ, q, E, ∅), where δ is defined as follows:
(1) δ(q, e, E) = {(q, E + T), (q, T)}.
(2) δ(q, e, T) = {(q, T * F), (q, F)}.
(3) δ(q, e, F) = {(q, (E)), (q, a)}.
(4) δ(q, b, b) = {(q, e)} for all b ∈ {a, +, *, (, )}.
With input a + a * a, P can make the following moves among others:

(q, a + a * a, E) ⊢ (q, a + a * a, E + T)
                  ⊢ (q, a + a * a, T + T)
                  ⊢ (q, a + a * a, F + T)
                  ⊢ (q, a + a * a, a + T)
                  ⊢ (q, + a * a, + T)
                  ⊢ (q, a * a, T)
                  ⊢ (q, a * a, T * F)
                  ⊢ (q, a * a, F * F)
                  ⊢ (q, a * a, a * F)
                  ⊢ (q, * a, * F)
                  ⊢ (q, a, F)
                  ⊢ (q, a, a)
                  ⊢ (q, e, e)

Notice that in this sequence of moves P has used the rules in a sequence that corresponds to a leftmost derivation of a + a * a from E in G₀. □

This type of analysis is called "top-down parsing," or "predictive analy-


sis," because we are in effect constructing a derivation tree starting off from
the top (at the root) and working down. We shall discuss top-down parsing
in greater detail in Chapters 3, 4, and 5.
We can construct an extended PDA that acts as a "bottom-up parser" by simulating rightmost derivations in reverse in a CFG G. Let us consider the sentence a + a * a in L(G₀). The sequence

E ⇒ E + T ⇒ E + T * F ⇒ E + T * a ⇒ E + F * a
  ⇒ E + a * a ⇒ T + a * a ⇒ F + a * a ⇒ a + a * a

of right-sentential forms represents a rightmost derivation of a + a * a from E in G₀.

Now suppose that we write this derivation reversed. If we consider that in going from the string a + a * a to the string F + a * a we have applied the production F → a in reverse, then we can say that the string a + a * a has been "left-reduced" to the string F + a * a. Moreover, this represents the only leftmost reduction that is possible. Similarly, the right-sentential form F + a * a can be left-reduced to T + a * a by means of the production T → F, and so forth. We can formally define the process of left reduction as follows.
DEFINITION

Let G = (N, Σ, P, S) be a CFG, and suppose that

S ⇒*_{rm} αAw ⇒_{rm} αβw ⇒*_{rm} xw

is a rightmost derivation. Then we say that the right-sentential form αβw can be left-reduced under the production A → β to the right-sentential form αAw. Furthermore, we call the substring β at the explicitly shown position a handle of αβw. Thus a handle of a right-sentential form is any substring which is the right side of some production and which can be replaced by the left side of that production so that the resulting string is also a right-sentential form.

Example 2.35
Consider the grammar with the following productions:

S → Ac | Bd
A → aAb | ab
B → aBbb | abb

This grammar generates the language {aⁿbⁿc | n ≥ 1} ∪ {aⁿb²ⁿd | n ≥ 1}. Consider the right-sentential form aabbbbd. The only handle of this string is abb, since aBbbd is a right-sentential form. Note that although ab is the right side of the production A → ab, ab is not a handle of aabbbbd since aAbbbd is not a right-sentential form. □

Another way of defining the handle of a right-sentential form is to say that the handle is the frontier of the leftmost complete subtree of depth 1 (i.e., a node all of whose direct descendants are leaves, together with these leaves) of some derivation tree for that right-sentential form.
In the grammar G₀, the derivation tree for a + a * a is shown in Fig. 2.16(a). The leftmost complete subtree has the leftmost node labeled F as root and frontier a.
If we delete the leaves of the leftmost complete subtree, we are left with the derivation tree of Fig. 2.16(b). The frontier of this tree is F + a * a, and this string is precisely the result of left-reducing a + a * a.
[Fig. 2.16 Handle pruning: (a) the derivation tree for a + a * a in G₀; (b) the tree after the handle a has been pruned, with frontier F + a * a; (c) the tree after the handle F has been pruned. The drawings are not reproduced here.]

The handle of this tree is the frontier F of the subtree with root labeled T. Again removing the handle, we are left with Fig. 2.16(c).
The process of reducing trees in this manner is called handle pruning.
From a CFG G we can construct an equivalent extended PDA P which operates by handle pruning. At this point it is convenient to represent a pushdown list as a string such that the rightmost symbol of the pushdown list, rather than the leftmost, is at the top. Using this convention, if P = (Q, Σ, Γ, δ, q₀, Z₀, F) is a PDA, its configurations are exactly as before. However, the ⊢ relation is defined slightly differently. If δ(q, a, α) contains (p, β), then we write (q, aw, γα) ⊢ (p, w, γβ) for all w ∈ Σ* and γ ∈ Γ*.
Thus a notation such as "δ(q, a, YZ) contains (p, VWX)" means different things depending on whether the (extended) PDA has the top of its pushdown list at the left or right. If at the left, Y and V are the top symbols before and after the move. If at the right, Z and X are the top symbols. Given a PDA with the top at the left, one can create a PDA doing exactly the same things, but with the pushdown top at the right, by reversing all strings in Γ*. For example, (p, VWX) ∈ δ(q, a, YZ) becomes (p, XWV) ∈ δ(q, a, ZY). Of course, one must specify the fact that the top is now at the right. Conversely, a PDA with top at the right can easily be converted to one with the top at the left.
We see that the 7-tuple notation for PDA's can be interpreted as two
different PDA's, depending on whether the top is taken at the right or left.
We feel that the notational convenience which results from having these two
conventions outweighs any initial confusion. As the "default condition,"
unless it is specified otherwise, ordinary PDA's have their pushdown tops
on the left and extended PDA's have their pushdown tops on the right.

LEMMA 2.25
Let G = (N, Σ, P, S) be a CFG. From G we can construct an extended PDA R such that L(R) = L(G).† R can "reasonably" be said to operate by handle pruning.
Proof. Let R = ({q, r}, Σ, N ∪ Σ ∪ {$}, δ, q, $, {r}) be an extended PDA‡ in which δ is defined as follows:
(1) δ(q, a, e) = {(q, a)} for all a ∈ Σ. These moves cause input symbols to be shifted on top of the pushdown list.
(2) If A → α is in P, then δ(q, e, α) contains (q, A).
(3) δ(q, e, $S) = {(r, e)}.
We shall show that R operates by computing right-sentential forms of G, starting with a string of all terminals (on R's input) and ending with the string S. The inductive hypothesis, which will be proved by induction on n, is

(2.5.3)   S ⇒*_{rm} αAy ⇒ⁿ_{rm} xy implies (q, xy, $) ⊢* (q, y, $αA)

The basis, n = 0, is trivial; no moves of R are involved. Let us assume (2.5.3) for values of n smaller than the value we now choose for n. We can write αAy ⇒_{rm} αβy ⇒^{n−1}_{rm} xy. Suppose that αβ consists solely of terminals. Then αβ = x and (q, xy, $) ⊢* (q, y, $αβ) ⊢ (q, y, $αA).
If αβ is not in Σ*, then we can write αβ = γBz, where B is the rightmost nonterminal. By (2.5.3), S ⇒*_{rm} γBzy ⇒^{n−1}_{rm} xy implies (q, xy, $) ⊢* (q, zy, $γB). Also, (q, zy, $γB) ⊢* (q, y, $γBz) ⊢ (q, y, $αA) is a valid sequence of moves. We conclude that (2.5.3) is true. Since (q, e, $S) ⊢ (r, e, e), we have L(G) ⊆ L(R).

†Obviously, Lemma 2.25 is implied by Lemmas 2.23 and 2.24. It is the construction that is of interest here.
‡Our convention puts pushdown tops on the right.
We must now show the following, in order to conclude that L(R) ⊆ L(G), and hence, L(G) = L(R):

(2.5.4)   If (q, xy, $) ⊢ⁿ (q, y, $αA), then αAy ⇒*_{rm} xy

The basis, n = 0, holds vacuously. For the inductive step, assume that (2.5.4) is true for all values of n < m. When the top symbol of the pushdown list of R is a nonterminal, we know that the last move of R was caused by rule (2) of the definition of δ. Thus we can write

(q, xy, $) ⊢* (q, y, $αβ) ⊢ (q, y, $αA),

where A → β is in P. If αβ has a nonterminal, then by inductive hypothesis (2.5.4), αβy ⇒*_{rm} xy. Thus, αAy ⇒_{rm} αβy ⇒*_{rm} xy, as contended.
As a special case of (2.5.4), (q, w, $) ⊢* (q, e, $S) implies that S ⇒*_{rm} w. Since R only accepts w if (q, w, $) ⊢* (q, e, $S) ⊢ (r, e, e), it follows that L(R) ⊆ L(G). Thus, L(R) = L(G). □

Notice that R stores a right-sentential form of the type αAx, with αA on the pushdown list and x remaining on the input tape, immediately after a reduction. Then R can proceed to shift symbols of x onto the pushdown list until the handle is on top of the pushdown list. Then R can make another reduction. This type of syntactic analysis is called "bottom-up parsing" or "reduction analysis."

Example 2.36

Let us construct a bottom-up analyzer R for G₀. Let R be the extended PDA ({q, r}, Σ, Γ, δ, q, $, {r}), where δ is as follows:
(1) δ(q, b, e) = {(q, b)} for all b in {a, +, *, (, )}.
(2) δ(q, e, E + T) = {(q, E)}
    δ(q, e, T) = {(q, E)}
    δ(q, e, T * F) = {(q, T)}
    δ(q, e, F) = {(q, T)}
    δ(q, e, (E)) = {(q, F)}
    δ(q, e, a) = {(q, F)}.
(3) δ(q, e, $E) = {(r, e)}.
With input a + a * a, R can make the following sequence of moves:
(q, a + a * a, $) ⊢ (q, + a * a, $a)
                  ⊢ (q, + a * a, $F)
                  ⊢ (q, + a * a, $T)
                  ⊢ (q, + a * a, $E)
                  ⊢ (q, a * a, $E +)
                  ⊢ (q, * a, $E + a)
                  ⊢ (q, * a, $E + F)
                  ⊢ (q, * a, $E + T)
                  ⊢ (q, a, $E + T *)
                  ⊢ (q, e, $E + T * a)
                  ⊢ (q, e, $E + T * F)
                  ⊢ (q, e, $E + T)
                  ⊢ (q, e, $E)
                  ⊢ (r, e, e)

Notice that R can make a great number of different sequences of moves with input a + a * a. This sequence, however, is the only one that goes from an initial configuration to a final configuration. □

We shall now demonstrate that a language defined by a PDA is a context-free language.
LEMMA 2.26
Let R = (Q, Σ, Γ, δ, q₀, Z₀, F) be a PDA. We can construct a CFG G such that L(G) = L_e(R).
Proof. We shall construct G so that a leftmost derivation of w in G directly corresponds to a sequence of moves made by R in processing w. We shall use nonterminal symbols of the form [qZr] with q and r in Q and Z ∈ Γ. We shall then show that [qZr] ⇒⁺ w if and only if (q, w, Z) ⊢⁺ (r, e, e).
Formally, let G = (N, Σ, P, S), where
(1) N = {[qZr] | q, r ∈ Q, Z ∈ Γ} ∪ {S}.
(2) The productions in P are constructed as follows:
(a) If δ(q, a, Z) contains (r, X₁ ⋯ X_k),† k ≥ 1, then add to P all productions of the form

[qZs_k] → a[rX₁s₁][s₁X₂s₂] ⋯ [s_{k−1}X_ks_k]

for every sequence s₁, s₂, …, s_k of states in Q.
(b) If δ(q, a, Z) contains (r, e), then add the production [qZr] → a to P.
(c) Add to P, S → [q₀Z₀q] for each q ∈ Q.
It is straightforward to show by induction on m and n that for all q, r ∈ Q and Z ∈ Γ, [qZr] ⇒ᵐ w for some m ≥ 1 if and only if (q, w, Z) ⊢ⁿ (r, e, e) for some n ≥ 1. We leave the proof for the Exercises. Then, S ⇒ [q₀Z₀q] ⇒⁺ w if and only if (q₀, w, Z₀) ⊢⁺ (q, e, e) for q in Q. Thus, L_e(R) = L(G). □

†R has its pushdown list top on the left, since we did not state otherwise.
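This construction can also be carried out mechanically. The following Python sketch (ours, not the book's; the bracket nonterminals [qZr] are flattened into strings, and δ uses the dictionary encoding of the earlier simulator sketch) builds the productions of Lemma 2.26 and applies them to the PDA of Example 2.31, viewed here as accepting by empty pushdown list:

    from itertools import product

    def pda_to_cfg(states, delta, q0, Z0):
        """Rules (a)-(c) of Lemma 2.26."""
        def br(q, Z, r):
            return "[%s %s %s]" % (q, Z, r)
        prods = [("S", (br(q0, Z0, q),)) for q in states]          # rule (c)
        for (q, a, Z), moves in delta.items():
            for r, gamma in moves:
                if gamma == "":                                    # rule (b)
                    prods.append((br(q, Z, r), (a,) if a else ()))
                else:                                              # rule (a)
                    for seq in product(states, repeat=len(gamma)):
                        rhs, prev = tuple([a] if a else []), r
                        for X, s in zip(gamma, seq):
                            rhs += (br(prev, X, s),)
                            prev = s
                        prods.append((br(q, Z, seq[-1]), rhs))
        return prods

    # For the PDA of Example 2.31, taken as accepting by empty pushdown list,
    # the resulting grammar generates L_e(P) = {0^n 1^n | n >= 1}.
    grammar = pda_to_cfg(["q0", "q1", "q2"], {
        ("q0", "0", "Z"): [("q1", "0Z")],
        ("q1", "0", "0"): [("q1", "00")],
        ("q1", "1", "0"): [("q2", "")],
        ("q2", "1", "0"): [("q2", "")],
        ("q2", "", "Z"): [("q0", "")],
    }, "q0", "Z")
    print(len(grammar), "productions, most of them involving useless nonterminals")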
We can summarize these results in the following theorem.
THEOREM 2.21
The following statements are equivalent:
(1) L is L(G) for a CFG G.
(2) L is L(P) for a PDA P.
(3) L is L_e(P) for a PDA P.
(4) L is L(P) for an extended PDA P.
Proof. (3) ⇒ (1) by Lemma 2.26. (1) ⇒ (3) by Lemma 2.24. (4) ⇒ (2) by Lemma 2.21, and (2) ⇒ (4) is trivial. (2) ⇒ (3) by Lemma 2.22 and (3) ⇒ (2) by Lemma 2.23. □

2.5.4. Deterministic Pushdown Automata

We have seen that for every context-free grammar G we can construct a PDA to recognize L(G). The PDA constructed was nondeterministic, however. For practical applications we are more interested in deterministic pushdown automata, PDA's which can make at most one move in any configuration. In this section we shall study deterministic PDA's, and later on we shall see that, unfortunately, deterministic PDA's are not as powerful in their recognitive capability as nondeterministic PDA's. There are context-free languages which cannot be defined by any deterministic PDA.
A language which is defined by a deterministic pushdown automaton will
be called a deterministic CFL. In Chapter 5 we shall define a subclass of the
context-free grammars called LR(k) grammars. In Chapter 8 we shall show
that every LR(k) grammar generates a deterministic CFL and that every
deterministic CFL has an LR(1) grammar.
DEFINITION

A PDA P = (Q, Σ, Γ, δ, q₀, Z₀, F) is said to be deterministic (a DPDA for short) if for each q ∈ Q and Z ∈ Γ either
(1) δ(q, a, Z) contains at most one element for each a in Σ and δ(q, e, Z) = ∅, or
(2) δ(q, a, Z) = ∅ for all a ∈ Σ and δ(q, e, Z) contains at most one element.
These two restrictions imply that a DPDA has at most one choice of move in any configuration. Thus in practice it is much easier to simulate a deterministic PDA than a nondeterministic PDA. For this reason the deterministic CFL's are an important class of languages for practical applications.
CONVENTION
Since δ(q, a, Z) contains at most one element for a DPDA, we shall write δ(q, a, Z) = (r, γ) instead of δ(q, a, Z) = {(r, γ)}.

Example 2.37
Let us construct a DPDA for the language L = {wcwᴿ | w ∈ {a, b}⁺}. Let P = ({q₀, q₁, q₂}, {a, b, c}, {Z, a, b}, δ, q₀, Z, {q₂}), where the rules of δ are

δ(q₀, X, Y) = (q₀, XY)   for all X ∈ {a, b} and Y ∈ {Z, a, b}
δ(q₀, c, Y) = (q₁, Y)    for all Y ∈ {a, b}
δ(q₁, X, X) = (q₁, e)    for all X ∈ {a, b}
δ(q₁, e, Z) = (q₂, e)

Until P sees the centermarker c, it stores its input on the pushdown list. When the c is reached, P goes to state q₁ and proceeds to match its subsequent input against the pushdown list. A proof that L(P) = L is left for the Exercises. □
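Because a DPDA has at most one applicable move in each configuration, it can be run without any search. The following Python sketch (ours, not the book's; the dictionary encoding and the helper name are invented) encodes the δ of Example 2.37 and traces the unique computation:

    def dpda_accepts(delta, q0, Z0, finals, w):
        q, stack, i = q0, Z0, 0          # leftmost symbol of `stack` is the top
        while stack:
            a = w[i] if i < len(w) else None
            if a is not None and (q, a, stack[0]) in delta:      # consuming move
                q, push = delta[(q, a, stack[0])]
                i += 1
            elif (q, None, stack[0]) in delta:                   # e-move
                q, push = delta[(q, None, stack[0])]
            else:
                break
            stack = push + stack[1:]
        return i == len(w) and q in finals

    delta = {}
    for X in "ab":
        for Y in "Zab":
            delta[("q0", X, Y)] = ("q0", X + Y)        # store input until the c
    for Y in "ab":
        delta[("q0", "c", Y)] = ("q1", Y)              # switch to matching
    for X in "ab":
        delta[("q1", X, X)] = ("q1", "")               # match and pop
    delta[("q1", None, "Z")] = ("q2", "")              # bottom marker reached

    assert dpda_accepts(delta, "q0", "Z", {"q2"}, "abcba")
    assert not dpda_accepts(delta, "q0", "Z", {"q2"}, "abcab")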

The definition of a DPDA can be naturally widened to include the extended PDA's which we would naturally consider deterministic.
DEFINITION
An extended PDA P = (Q, Σ, Γ, δ, q₀, Z₀, F) is an (extended) deterministic PDA if the following conditions hold:
(1) For no q ∈ Q, a ∈ Σ ∪ {e}, and γ ∈ Γ* is #δ(q, a, γ) > 1.
(2) If δ(q, a, α) ≠ ∅, δ(q, a, β) ≠ ∅, and α ≠ β, then neither of α and β is a suffix of the other.†
(3) If δ(q, a, α) ≠ ∅ and δ(q, e, β) ≠ ∅, then neither of α and β is a suffix of the other.
We see that in the special case in which the extended PDA is an ordinary PDA, the two definitions agree. Also, if the construction of Lemma 2.21 is applied to an extended PDA P, the result will be a DPDA if and only if P is an extended DPDA.

†If the extended PDA has its pushdown list top at the left, replace "suffix" by "prefix."

When modeling a syntactic analyzer, it is desirable to use a DPDA P that reads all of its input, even when the input is not in L(P). We shall show that it is possible to always find such a DPDA.
We first modify a DPDA so that in any configuration with input remaining there is a next move. The next lemma shows how.
LEMMA 2.27
Let P = (Q, Σ, Γ, δ, q₀, Z₀, F) be a DPDA. We can construct an equivalent DPDA P′ = (Q′, Σ, Γ′, δ′, q₀′, Z₀′, F′) such that
(1) For all a ∈ Σ, q ∈ Q′, and Z ∈ Γ′, either
(a) δ′(q, a, Z) contains exactly one element and δ′(q, e, Z) = ∅, or
(b) δ′(q, a, Z) = ∅ and δ′(q, e, Z) contains exactly one element.
(2) If δ′(q, a, Z₀′) = (r, γ) for some a in Σ ∪ {e}, then γ = αZ₀′ for some α ∈ Γ*.
Proof. Z₀′ will act as an endmarker on the pushdown list to prevent the pushdown list from becoming completely empty. Let Γ′ = Γ ∪ {Z₀′}, and let Q′ = {q₀′, q_e} ∪ Q. δ′ is defined thus:
(1) δ′(q₀′, e, Z₀′) = (q₀, Z₀Z₀′).
(2) For all q ∈ Q, a ∈ Σ ∪ {e}, and Z ∈ Γ such that δ(q, a, Z) ≠ ∅, δ′(q, a, Z) = δ(q, a, Z).
(3) If δ(q, e, Z) = ∅ and δ(q, a, Z) = ∅ for some a ∈ Σ and Z ∈ Γ, let δ′(q, a, Z) = (q_e, Z).
(4) For all Z ∈ Γ′ and a ∈ Σ, δ′(q_e, a, Z) = (q_e, Z).
The first rule allows P′ to simulate P by having P′ write Z₀ on top of Z₀′ on the pushdown list and enter state q₀. The rules in (2) permit P′ to simulate P until no next move is possible. In such a situation P′ will go into a nonfinal state q_e, by rule (3), and remain there without altering the pushdown list, while consuming any remaining input. A proof that L(P′) = L(P) is left for the Exercises. □

It is possible for a DPDA to make an infinite number of e-moves from some configurations without ever using an input symbol. We call these configurations looping.
DEFINITION
Configuration (q, w, α) of DPDA P is looping if for all integers i there exists a configuration (pᵢ, w, βᵢ) such that |βᵢ| ≥ |α| and

(q, w, α) ⊢ (p₁, w, β₁) ⊢ (p₂, w, β₂) ⊢ ⋯

Thus a configuration is looping if P can make an infinite number of e-moves without creating a shorter pushdown list; that list might grow indefinitely or cycle between several different strings.
Note that there are nonlooping configurations which after popping part of their list using e-moves enter a looping configuration. We shall show that it is impossible to make an infinite number of e-moves from a configuration unless a looping configuration is entered after a finite, calculable number of moves.
If P enters a looping configuration in the middle of the input string, then P will not use any more input, even though P might satisfy Lemma 2.27. Given a DPDA P, we want to modify P to form an equivalent DPDA P′ such that P′ can never enter a looping configuration.
ALGORITHM 2.16

Detection of looping configurations.


Input. DPDA P = (Q, Σ, Γ, δ, q₀, Z₀, F).
Output.
(1) C₁ = {(q, A) | (q, e, A) is a looping configuration and there is no r in F such that (q, e, A) ⊢* (r, e, α) for any α ∈ Γ*}, and
(2) C₂ = {(q, A) | (q, e, A) is a looping configuration and (q, e, A) ⊢* (r, e, α) for some r ∈ F and α ∈ Γ*}.
Method. Let #Q = n₁, #Γ = n₂, and let l be the length of the longest string written on the pushdown list by P in a single move. Let n₃ = n₁(n₂^{n₁n₂l+1} − n₂)/(n₂ − 1), taking n₃ = n₁²l if n₂ = 1. n₃ is the maximum number of e-moves P can make without looping.
(1) For each q ∈ Q and A ∈ Γ determine whether (q, e, A) ⊢^{n₃} (r, e, α) for some r ∈ Q and α ∈ Γ⁺. Direct simulation of P is used. If so, (q, e, A) is a looping configuration, for then we shall see that there must be a pair (q′, A′), with q′ ∈ Q and A′ ∈ Γ, such that

(q, e, A) ⊢* (q′, e, A′β) ⊢ᵐ (q′, e, A′γβ) ⊢^{m(j−1)} (q′, e, A′γʲβ)

where m > 0 and j ≥ 1. Note that γ can be e.
(2) If (q, e, A) is a looping configuration, determine whether there is an r in F such that (q, e, A) ⊢ʲ (r, e, α) for some 0 ≤ j ≤ n₃. Again, direct simulation is used. If so, add (q, A) to C₂. Otherwise, add (q, A) to C₁. We claim that if P can reach a final configuration from (q, e, A), it must do so in n₃ or fewer moves. □
THEOREM 2.22
Algorithm 2.16 correctly determines C₁ and C₂.

Proof. We first prove that step (1) correctly determines C₁ ∪ C₂. If (q, A) is in C₁ ∪ C₂, then, obviously, (q, e, A) ⊢^{n₃} (r, e, α). Conversely, suppose that (q, e, A) ⊢^{n₃} (r, e, α).
Case 1: There exists β ∈ Γ*, with |β| > n₁n₂l, such that (q, e, A) ⊢* (p, e, β) ⊢* (r, e, α) for some p ∈ Q. If we consider, for j = 1, 2, …, n₁n₂l + 1, the configurations which P entered in the sequence of moves (q, e, A) ⊢* (p, e, β) the last time the pushdown list had length j, then we see that there must exist q′ and A′ such that at two of those times the state of P was q′ and A′ was on top of the list. In other words, we can write (q, e, A) ⊢* (q′, e, A′δ) ⊢ᵐ (q′, e, A′γδ) ⊢* (p, e, β). Thus, (q, e, A) ⊢* (q′, e, A′δ) ⊢^{mj} (q′, e, A′γʲδ) for all j ≥ 0 by Lemma 2.20. Here, m > 0, so an infinity of e-moves can be made from configuration (q, e, A), and (q, A) is in C₁ ∪ C₂.
Case 2: Suppose that the opposite of case 1 is true, namely that for all β such that (q, e, A) ⊢* (p, e, β) ⊢* (r, e, α) we have |β| ≤ n₁n₂l. Since the sequence contains n₃ + 1 configurations, and there are only n₁ possible states and n₂ + n₂² + ⋯ + n₂^{n₁n₂l} = (n₂^{n₁n₂l+1} − n₂)/(n₂ − 1) possible pushdown lists of length at most n₁n₂l, there must be some repeated configuration. It is immediate that (q, A) is in C₁ ∪ C₂.
The proof that step (2) correctly apportions C₁ ∪ C₂ between C₁ and C₂ is left for the Exercises. □

DEFINITION

A DPDA P = (Q, Σ, Γ, δ, q₀, Z₀, F) is continuing if for all w ∈ Σ* there exist p ∈ Q and α ∈ Γ* such that (q₀, w, Z₀) ⊢* (p, e, α). Intuitively, a continuing DPDA is one which is capable of reading all of its input string.
LEMMA 2.28
Let P = (Q, Σ, Γ, δ, q₀, Z₀, F) be a DPDA. Then there is an equivalent continuing DPDA P′.
Proof. Let us assume by Lemma 2.27 that P always has a next move. Let P′ = (Q ∪ {p, r}, Σ, Γ, δ′, q₀, Z₀, F ∪ {p}), where p and r are new states. δ′ is defined as follows:
(1) For all q ∈ Q, a ∈ Σ, and Z ∈ Γ, let δ′(q, a, Z) = δ(q, a, Z).
(2) For all q ∈ Q and Z ∈ Γ such that (q, e, Z) is not a looping configuration, let δ′(q, e, Z) = δ(q, e, Z).
(3) For all (q, Z) in the set C₁ of Algorithm 2.16, let δ′(q, e, Z) = (r, Z).
(4) For all (q, Z) in the set C₂ of Algorithm 2.16, let δ′(q, e, Z) = (p, Z).
(5) For all a ∈ Σ and Z ∈ Γ, δ′(p, a, Z) = (r, Z) and δ′(r, a, Z) = (r, Z).
Thus, P′ simulates P. If P enters a looping configuration, then P′ will enter on the next move either state p or r, depending on whether the loop of configurations contains or does not contain a final state. Then, under all inputs, P′ enters state r from p and stays in state r without altering the pushdown list. Thus, L(P′) = L(P).
It is necessary to show that P′ is continuing. Rules (3), (4), and (5) assure us that no violation of the "continuing" condition occurs if P enters a looping configuration. It is necessary to observe only that if P is in a configuration which is not looping, then within a finite number of moves it must either
(1) Make a non-e-move or
(2) Enter a configuration which has a shorter pushdown list.
Moreover, (2) cannot occur indefinitely, because the pushdown list is initially of finite length. Thus either (1) must eventually occur or P enters a looping configuration after some instance of (2). We may conclude that P′ is continuing. □

We can now prove an important property of DPDA's, namely that their languages are closed under complementation. We shall see in the next section that this is not true for the class of all CFL's.
THEOREM 2.23
If L = L(P) for a DPDA P, then L̄, the complement of L, is L(P′) for some DPDA P′.

Proof. We may, by Lemma 2.28, assume that P is continuing. We shall construct P′ to simulate P and see, between two shifts of its input head, whether or not P has entered an accepting state. Since P′ must accept the complement of L(P), P′ accepts an input if P has not accepted it and is about to shift its input head (so P could not subsequently accept that input).
Formally, let P = (Q, Σ, Γ, δ, q₀, Z₀, F) and P′ = (Q′, Σ, Γ, δ′, q₀′, Z₀, F′), where
(1) Q′ = {[q, i] | q ∈ Q, i ∈ {0, 1, 2}},
(2) q₀′ = [q₀, 0] if q₀ ∉ F and q₀′ = [q₀, 1] if q₀ ∈ F, and
(3) F′ = {[q, 2] | q in Q}.
The states [q, 0] are intended to mean that P has not been in a final state since it last made a non-e-move. [q, 1] states indicate that P has entered a final state in that time. [q, 2] states are used only for final states. If P′ is in a [q, 0] state and P (in simulation) is about to make a non-e-move, then P′ first enters state [q, 2] and then simulates P. Thus, P′ accepts if and only if P does not accept. The fact that P is continuing assures us that P′ will always get a chance to accept an input if P does not. The formal definition of δ′ follows:
(i) If q ∈ Q, a ∈ Σ, and Z ∈ Γ, then

δ′([q, 1], a, Z) = δ′([q, 2], a, Z) = ([p, i], γ),

where δ(q, a, Z) = (p, γ), i = 0 if p ∉ F, and i = 1 if p ∈ F.
(ii) If q ∈ Q, Z ∈ Γ, and δ(q, e, Z) = (p, γ), then

δ′([q, 1], e, Z) = ([p, 1], γ)

and δ′([q, 0], e, Z) = ([p, i], γ), where i = 0 if p ∉ F and i = 1 if p ∈ F.
(iii) If δ(q, e, Z) = ∅, then δ′([q, 0], e, Z) = ([q, 2], Z).
Rule (i) handles non-e-moves. The second component of the state is set to 0 or 1 properly. Rule (ii) handles e-moves; again the second component of the state is handled as intended. Rule (iii) allows P′ to accept an input exactly when P does not. A formal proof that L(P′) is the complement of L(P) will be omitted. □
There are a n u m b e r of other i m p o r t a n t properties of deterministic CFL's.
We shall defer the discussion of these to the Exercises and the next section.

EXERCISES

2.5.1. Construct PDA's accepting the complements (with respect to {a, b}*) of the following languages:
(a) {aⁿbⁿaⁿ | n ≥ 1}.
(b) {wwᴿ | w ∈ {a, b}*}.
(c) {aᵐbⁿaᵐbⁿ | m, n ≥ 1}.
(d) {ww | w ∈ {a, b}*}.
Hint: Have the nondeterministic PDA "guess" why its input is not in the language and check that its guess is correct.
2.5.2. Prove that the PDA of Example 2.32 accepts {wwᴿ | w ∈ {a, b}⁺}.
2.5.3. Show that every CFL is accepted by a PDA which never increases the
length of its pushdown list by more than one on a single move.
2.5.4. Show that every CFL is accepted by a PDA P = (Q, E, 1-', 6, q0, Z0, F)
such that if (p, 7) is in O(q, a, Z), then either ~, = e, ~, - Z, or 7 -- YZ
for some Y ~ F. Hint: Consider the construction of Lemma 2.21.
2.5.5. Show that every CFL is accepted by a PDA which makes no e-moves.
Hint: Recall that every CFL has a grammar in Greibach normal form.
2.5.6. Show that every CFL is L(P) for some two-state PDA P.
2.5.7. Complete the proof of Lemma 2.23.
2.5.8. Find bottom-up and top-down recognizers (PDA's) for the following
grammars:
(a) S → aSb | e.
(b) S → AS | b
    A → SA | a.
(c) S → SS | A
    A → 0A1 | S | 01.
2.5.9. Find a grammar generating L(P), where

P = ({q₀, q₁, q₂}, {a, b}, {Z₀, A}, δ, q₀, Z₀, {q₂})

and δ is given by

δ(q₀, a, Z₀) = (q₁, AZ₀)
δ(q₀, a, A) = (q₁, AA)
δ(q₁, a, A) = (q₀, AA)
δ(q₁, e, A) = (q₂, A)
δ(q₂, b, A) = (q₂, e)

Hint: It is not necessary to construct the productions for useless nonterminals.
"2.5.10. Show that if P -- (Q, Z, F, 6, q0, Z0, F) is a PDA, then the set of strings
which can appear on the pushdown list is a regular set. That is, show
that {0~I(q0, w, 2"o) ~ (q, x, ~) for some q, w, and x} is regular.
2.5.11. Complete the proof of Lemma 2.26.
2.5.12. Let P be a P D A for which there is a constant k such that P can never
have more than k symbols on its pushdown list at any time. Show that
L(P) is a regular set.
2.5.13. Give DPDA's accepting the following languages:
(a) {0ⁱ1ʲ | j ≥ i}.
(b) {w | w consists of an equal number of a's and b's}.
(c) L(G₀), where G₀ is the usual grammar for rudimentary arithmetic expressions.
2.5.14. Show that the DPDA of Example 2.37 accepts {wcwᴿ | w ∈ {a, b}⁺}.
2.5.15. Show that if the construction of Lemma 2.21 is applied to an extended
DPDA, then the result is a DPDA.
2.5.16. Prove that P and P' in Eemma 2.27 accept the same language.
2.5.17. Prove that step (2) of Algorithm 2.16 correctly distinguishes C1 from Cz.
2.5.18. Complete the proof of Theorem 2.23.
2.5.19. The PDA's we have defined make a move independent of their input
unless they move their input head. We could relax this restriction and
allow the input symbol scanned to influence the move even when the
input head remains stationary. Show that this extension still accepts only
the CFL's.
*2.5.20. We could further augment the PDA by allowing it to move two ways on the input. Also, let the device have endmarkers on the input. We call such an automaton a 2PDA, and if it is deterministic, a 2DPDA. Show that the following languages can be recognized by 2DPDA's:
(a) {aⁿbⁿcⁿ | n ≥ 1}.
(b) {ww | w ∈ {a, b}*}.
(c) {a^(2ⁿ) | n ≥ 1}.
2.5.21. Show that a 2PDA can recognize {wxw | w and x are in {0, 1}⁺}.

Open Questions
2.5.22. Does there exist a language accepted by a 2PDA that is not accepted by
a 2DPDA ?
2.5.23. Does there exist a CFL which is not accepted by any 2DPDA ?

Programming Exercises
2.5.24. Write a program that simulates a deterministic PDA.
*2.5.25. Devise a programming language that can be used to specify pushdown
automata. Construct a compiler for your programming language. A
source program in the language is to define a PDA P. The object program
is to be a recognizer which given an input string w simulates the behavior
of P on w in some reasonable sense.
2.5.26. Write a program that takes as input CFG G and constructs a nondeter-
ministic top-down (or bottom-up) recognizer for G.

BIBLIOGRAPHIC NOTES

The importance of pushdown lists, or stacks, as they are also known, in lan-
guage processing was recognized by the early 1950's. Oettinger [1961] and
Schutzenberger [1963] were the first to formalize the concept of a pushdown auto-
maton. The equivalence of pushdown automaton languages and context-free
languages was demonstrated by Chomsky [1962] and Evey [1963].
Two-way pushdown automata have been studied by Hartmanis et al. [1965],
Gray et al. [1967], Aho et al. [1968], and Cook [1971].

2.6. PROPERTIES OF CONTEXT-FREE


LANGUAGES

In this section we shall examine some of the basic properties of context-


free languages. The results mentioned here are actually a small sampling of
the great wealth of knowledge about context-free languages. In particular,
we shall discuss some operations under which C F L ' s are closed, some decid-
ability results, and matters of ambiguous context-free grammars and lan-
guages.

2.6.1. Ogden's Lemma

We begin by proving a theorem (Ogden's lemma) about context-free


grammars from which we can derive a number of results about context-free
languages. F r o m this theorem we can derive a "pumping lemma" for context-
free languages.
SEC. 2.6 PROPERTIES OF CONTEXT-FREE LANGUAGES 193

DEFINITION

A position in a string of length k is an integer i such that 1 < i < k. We


say that symbol a occurs at position i of string w if w = w law 2 and
i w~l= i -- 1. For example, the symbol a occurs at the third position of the
string baacc.
THEOREM 2.24
For each C F G G = (N, ~, P, S), there is an integer k ~ 1 such that if
z is in L(G), ]z[ ~ k, and if any k or more distinct positions in z are designated
as being "distinguished," then z can be written as uvwxy such that
(1) w contains at least one of the distinguished positions.
(2) Either u and v both contain distinguished positions, or x and y both
contain distinguished positions.
(3) vwx has at most k distinguished positions.
(4) There is a nonterminal A such that
+ + + + +

S ~ uAy ~ uvAxy :- . . . ~ uv~Axty ~ uv~wxiy


G G G G G

for all integers i (including i = 0, in which case the derivation is


+ +

S ==~ u A y ==~ uwy).


o G

Proof. Let m = ~ N and l be the length of the longest right side of


a production in P. Choose k = l 2"+3, and consider a derivation tree T for
some sentence z in L(G), where Jz [ ~ k and at least k positions of z are
designated distinguished. Note that T must contain at least one path of
length at least 2m -k- 3. We can distinguish those leaves of T which, in the
frontier z of T, fill the distinguished positions.
Let us call node n of T a branch node if n has at least two direct descen-
dants, say n 1 and n 2, such that nl and n 2 both have distinguished leaves as
descendants.
We construct a path nl, n 2 , . . , in T as follows"
(1) nl is the root of T.
(2) If we have found n i and only one of nt's direct descendants has distin-
guished leaves among its descendants (i.e., nt is not a branch node), then
let n~+~ be that direct descendant of n~.
(3) If n~ is a branch node, choose n~+~ to be that direct descendant of nt
with the largest number of distinguished leaves for descendants. If there is
a tie, choose the rightmost (this choice is arbitrary).
(4) If ni is a leaf, terminate the path.
Let nl, n 2 , . . . ,np be the path so constructed. A simple induction on i
shows that if n i , . . . , n~ have r branch nodes among them, then ni+l has at
least 12m+3-r distinguished descendants. The basis, i = 0, is trivial; r = 0,
194 ELEMENTSOF LANGUAGE THEORY CHAP. 2

and nl has at least k = 12m+3 distinguished descendants. For the induction,


observe that if n~ is not a branch node, then n~ and n~+ 1 have the same number
of distinguished descendants, and that if n~ is a branch node, n~+ 1 has at
least 1/Ith as many.
Since n~ has l 2m+3 distinguished descendants, the path n l , . . . , np has at
least 2m -k 3 branch nodes. Moreover, np is a leaf and so is not a branch
node. Thus, p > 2m -q- 3.
Let ba,b2,..., b2m+3 be the last 2m ÷ 3 branch nodes in the path n l , . . . ,n~.
We call bt a left branch node if a direct descendant of b~ not on the path
has a distinguished descendant to the left of np and a right branch node
otherwise.
We assume that at least m -t- 2 of b l , . . . , b2m+3 are left branch nodes.
The case in which at least m -k 2 are right branch nodes is handled analo-
gously.
Let 1 1 , . . . , lm+z be the last m - q - 2 left branch nodes in the sequence
b~,..., b2m+3.Since ~ N = m, we can find two nodes among lz, . . . , lm+2,
say Is and 18, such that f < g, and the labels of Ir and lg are the same, say A.
This situation is depicted in Fig. 2.17. The double line represents the path
n 1, . . . , np; . ' s represent distinguished leaves, but there may be others.
If we delete all of ls's descendants, we have a derivation tree with frontier
uAy, where u represents those leaves to the left of If and y represents those
+

to the right. Thus, S ==~ uAy. If we consider the subtree dominated by 1r


+

with the descendants of lg deleted, we see that A ==~ vAx, where v and x are
the frontiers from the descendant leaves of lr to the left and right, respec-
tively, of Ig. Finally, let w be the frontier of the subtree dominated by Ig.
+

Then A ~ w. We observe that z = uvwxy.

o w x y

Fig. 2.17 D e r i v a t i o n tree T.


SEC. 2.6 P R O P E R T I E S OF C O N T E X T - F R E E L A N G U A G E S 195

4- +
Putting all these derivations together, we have S =-~ u A y ==~ uwy, and
+ + + + + +
for all i > 1, S =-~ u A y ==~ u v A x y =-~ uv2AxZy ==~. . . ==~ uvtAxty ==~ uvtwxty.
Thus condition (4) is satisfied. Moreover, u has at least one distinguished
position, the descendant of some direct descendant of 1~. v likewise
has at least one distinguished position, descending from 1r. Thus condition
(2) is satisfied. Condition (1) is satisfied, since w has a distinguished position,
namely np.
To see that condition (3), that v w x has no more than k distinguished
positions, is satisfied, we observe that b 1, being the 2m -t-- 3rd branch node
from the end of path n 1, . . . , np, has no more than k distinguished positions.
Since 1r is a descendant of b 1, our desired result is immediate.
We should also consider the alternative case in which at least m -+ 2 of
b l, . . . , b2,,+ 3 are right branch nodes. However, this case is handled symmet-
rically, and we shall find condition (2) satisfied because x and y each have
distinguished positions. [~

An important corollary of Ogden's lemma is what is usually referred to


as the pumping lemma for context-free languages.
COROLLARY

Let L be a CFL. Then there exists a constant k such that if lzl >_ k and
z ~ L, then we can write z = u v w x y such that v x ~ e, l v w x [ < k, and for
all i, uvtwxty is in L.
P r o o f In Theorem 2.24, choose any C F G for L and let all positions of
each sentence be distinguished. [~]

It is the corollary to Theorem 2.24 that we most often use when proving
certain languages not to be context-free. Theorem 2.24 itself will be used
when we talk about inherent ambiguity of CFL's in Section 2.6.5.

Example 2.38

Let us use the pumping lemma to show that L = [a"'[n > 1} is not a
CFL. If L were a CFL, then we would have an integer k such that if n 2 > k,
then a"' = uvwxy, where v and x are not both e and i vwxl < k. In particular,
let n be k itself. Certainly k 2 > k. Then uv2wxZy is supposedly in L. But
since Ivwxl_< k, we have 1 _<lvxl_< k, so k 2 < t uv~wx~y l < k 2 + k.
But the next perfect square after k 2 is (k + 1 ) 2 = k 2 + 2k + 1. Since
k 2 + k < k 2 + 2k + 1, we see that luvZwxZy[ is not a perfect square. But
by the pumping lemma, uv2wx2y is in L, which is a contradiction. D

Example 2.39

Let us show that L -- {a"b"c"[n ~ 1} is not a CFL. If it were, then we


would have a constant k as defined in the pumping lemma. Let z = akb~'e k.
196 ELEMENTSOF LANGUAGE THEORY CHAP. 2

Then z = uvwxy. Since [vwxl_~ k, it is not possible that v and x together


have occurrences of a's, b's, and c's; vwx will not "stretch" across the k b's.
Thus, uwy, which is in L by the pumping lemma, has either k a's or k c's.
It does not, however, have k instances of each of the three symbols, because
l uwyl < 3k. Thus, uwy has more of one symbol than another and is not in L.
We thus have a contradiction and can conclude only that L is not context-
free. E]

2.6.2. Closure Properties of CFL's

Closure properties can often be used to help prove that certain languages
are not context-free, as well as being interesting from a theoretical point of
view. In this section we shall summarize some of the major closure properties
of the context-free languages.
DEFINITION

Let ,,C be a class of languages and let L ~ ~* be in ,C. Suppose that


for each a in ~, La is a language in ~. ,,C is closed under substitution if for all
choices of L,

L'=[xlx2... x, l alaz...a, ~ L
x l ~ L,t
x2 ~ Za,
.

x,~Lo.}

is in £.
Example 2.40

Let L = [0"l"in ~ 1}, L0 = [a}, and L 1 = [bmcmlm _~ 1}. Then the sub-
stitution of L0 and Li into Z is
L' -- [anbmlcmlbm"-cm. . . . bm.cm"I n > 1, mt ~> 1} [--]

THEOREM 2.25
The class of context-free languages is closed under substitution.
P r o o f Let L ~ Z* be a CFL where ~ -- {a 1, a 2 , . . . , a,]. Let L a _ ~*
be a CFL for each a in ~. Call the language that results from the substitution
of the La's for a in L by the name L'. Let G -- (N, E, P, S) be a C F G for L
and G~ -- (N,, E,, Pa, a') be a C F G for La. We assume that N and all Na are
mutually disjoint. Let G ' = (N', E', P', S), where
SEC. 2.6 PROPERTIES OF CONTEXT-FREE LANGUAGES 197

(1) N ' = U N~ w N.
aEE
(2) ~ f f = U ~a"
aEZ
(3) Let h be the homomorphism on N U Z such that h ( A ) = A for all A in
N and h ( a ) = a' for a in Z. Let P ' = {A ----~h(a) lA ---~ a is in P} U U P,.
aEZ
Thus, P ' consists of the productions of the Ga'S together with the produc-
tions of G with all terminals made (primed) nonterminals. Let al . . . a , be
! f f
inLandx~inL~,forl ~i~n. ThenS~a'l ... a, ~ xlaz
,

.. a, ~ . . .
G' G" O'

x~ ... x,. Thus, L' _~ L(G').


G'
Suppose that w is in L ( G ' ) and consider a derivation tree T of w. Because
of the disjointness of N and the N,'s, each leaf with non-e-label has at least
one ancestor labeled a' for some a in Z. If we delete all nodes of Twhich have
an ancestor other than themselves with label a' for a ~ Z, then we have
a derivation tree T' with frontier a'l . . . a',, where a I . . . a , is in L. If we
let x i be the frontier of the subtree of T dominated by the ith leaf of T',
then w = x~ . . . x, and xt is in L,,. Thus, L ( G ' ) = L ' . [Z]
COROLLARY
The context-free languages are closed under (1) union, (2) product,
(3) ,, (4) + , and (5) homomorphism.
Proof. Let La and L b be context-free languages.
(1) Substitute L a for a and Lb for b in the C F L {a, b}.
(2) Substitute Z a for a and L b for b in {ab}.
(3) Substitute La for a in a*.
(4) Substitute L, for a in a ÷.
(5) Let L~ = {h(a)} for homomorphism h, and substitute into L to obtain
h(L).
THEOREM 2.26
The class of context-free languages is closed under intersection with
regular sets.
Proof We can show that a P D A P and a finite automaton A running in
parallel can be simulated by a P D A P'. The composite P D A P ' simulates P
directly and changes the state of A each time P makes a non-e-move. P '
accepts if and only if both P accepts and A is in a final state. The details of
such a proof are left for the Exercises. [Z]
Unlike the regular sets, the context-free languages are not a Boolean
algebra of sets.
THEOREM 2.27
The class of context-free languages is not closed under intersection or
complement.
1 98 ELEMENTS OF LANGUAGE THEORY CHAP. 2

P r o o f L 1 = {a"b"dln > 1, i _> 1} and L2 = {db"c"[i _> 1, n > 1} are both


context-free languages. However, by Example 2.39, L1 ~ L z = {a"b"c"[n > 1}
is not a context-free language. Thus the context-free languages are not
closed under intersection.
We can also conclude that the context-free languages are not closed under
complement. This follows from the fact that any class of languages closed
under union and complement must also be closed under intersection, using
De Morgan's law.
The CFL's are closed under union by the corollary to Theorem 2.25. [Z]

There are many other operations under which the context-free languages
are closed. Some of these operations will be discussed in the Exercises. We
shall conclude this section by providing a few applications of closure prop-
erties in showing that certain sets are not context-free languages.

Example 2.41
L = {ww[w ~ {a, b} +} is not a context-free language. Suppose that L
were context-free. Then L' = L A a+b +a+b + = {amb"amb" l m, n > 1} would
also be context-free by Theorem 2.26. But from Exercise 2.6.3(e), we know
that L' is not a context-free language. [Z]

Example 2.42
L = { w w l w ~ {c,f} +} is not a context-free language. Let h be the homo-
morphism h ( c ) = a and h ( f ) = b. Then h ( L ) = { w w l w ~ {a, b}+}, which by
the previous example is not a context-free language. Since the CFL's are
closed under homomorphism (corollary to Theorem 2.25), we conclude
that L is not a CFL. [2]

Example 2.43
A L G O L is not a context-free language. Consider the following class of
A L G O L programs:

L = {begin integer w; w : = 1; endtw is any string in {c,f}+}.

Let LA be the set of all valid A L G O L programs. Let R be the regular set
denoted by the regular expression

begin integer (c + f ) + ; (c ÷ f ) + : = 1; end

Then L = LA ~ R. Finally let h be the homomorphism such that h(c) = c,


h ( f ) = f , and h ( X ) = e otherwise. Then h(L) = {ww [w e {c, f}+}.
Consequently, if LA is context-free, then h(LA ~ R) must also be context-
free. However, we know that h(LA ~ R) is not context-free so we must con-
clude that L A, the set of all valid A L G O L programs, is not a context-free
language.
SEC. 2.6 PROPERTIES OF CONTEXT-FREE LANGUAGES 199

Example 2.43 shows that a programming language requiring declaration


of identifiers which can be arbitrarily Iong is not context-free. In a compiler,
however, identifiers are usually handled by the lexical analyzer and reduced
to single tokens before reaching the syntactic analyzer. Thus the language
that is to be recognized by the syntactic analyzer usually can be considered
to be a context-free language.
There are other non-context-free aspects of A L G O L and many other
languages. For example, each procedure takes the same number of arguments
each time it is mentioned. It is thus possible to show that the language which
the syntactic analyzer sees is not context-free by mapping programs with
three calls of the same procedure to {0"10"10" In > 0}, which is not a CFL.
Normally, however, some process outside of syntactic analysis is used to
check that the number of arguments to a procedure is consistent with the
definition of the procedure.

2.6.3. Decidability Results

We have already seen that the emptiness problem is decidable for context-
free grammars. Algorithm 2.7 will accept any context-free grammar G as
input and determine whether or not L(G) is empty.
Let us consider the membership problem for CFG's. We must find
an algorithm which given a context-free grammar G = (N, E, P, S) and
a word w in E*, will determine whether or not w is in L(G). Obtaining an
efficient algorithm for this problem will provide much of the subject matter
of Chapters 4-7. However, from a purely theoretical point of view we
can immediately conclude that the membership problem is solvable for
CFG's, since we can always transform G into an equivalent proper context-
free grammar G' using the transformations of Section 2.4.2. Neglecting
the empty word, a proper context-free grammar is a context-sensitive gram-
mar, so we can apply the brute force algorithm for deciding the membership
problem for context-sensitive grammars to G'. (See Exercise 2.1.19.)
Let us consider the equivalence problem for context-free grammars.
Unfortunately, here we encounter a problem which is not decidable. We
shall prove that there is no algorithm which, given any two CFG's G1 and G2,
can determine whether L(G1) = L(G2). In fact, we shall show that even given
a C F G G1 and a right-linear grammar G2 there is no algorithm to determine
whether L(Gi) = L(G2). As with most undecidable problems, we shall show
that if we can solve the equivalence problem for CFG's, then we can solve
Post's correspondence problem. We can construct from an instance of Post's
correspondence problem two naturally related context-free languages.
DEFINITION
Let C = (x 1, y l ) , . . . , (x,, y,) be an instance of Post's problem over
alphabet E. Let I = {1, 2 , . . . , n}, assume that I ~ E = .•, and let Lc
200 ELEMENTSOF LANGUAGE THEORY CHAP. 2

be { x ~ , x , . . . X~,.i,,,ir,,-t''" i 1 1 i ~ , ' ' ' , im a r e in L m ~> 1}. Let Mc be


{YtlY, " " Yi,.i,,,ir,,-t "'" it lit, . . . ,im a r e i n / , m > 1}.
LEMMA 2.29
Let C - - ( x t , Y t ) , . . . , (x,, y,) be an instance of Post's correspondence
problem over X, where X rq {1, 2 . . . . , n} = ~ . Then
(1) We can find extended D P D A ' s accepting Lc and M o
(2) Lc n Mc -- ~ if and only if C has no solution.
Proof
(1) It is straightforward to construct an extended D P D A (with pushdown
top on the right) which stores all symbols in ~: on its pushdown list. When
symbols from { 1 , . . . , n} appear on the input, it pops x~ from the top of its
list if integer i appears on the input. If x~ is not at the top of the list, the D P D A
halts. The D P D A also checks with its finite control that its input is in
X + { 1 , . . . , n} + and accepts when all symbols from X are removed from the
pushdown list. Thus, L c is accepted. We may find an extended D P D A
for M c similarly.
(2) If L c ~ Mc contains the sentence wire "'" it, where w is in X +, then
w is clearly a viable sequence. If xil "-" x~ = yi~ - . . y~, = w, then wim . . . i t
will be in Lc ~ M o D
Let us return to the equivalence problem for CFG's. We need two addi-
tional languages related t o an instance of Post's correspondence problem.
DEFINITION

Let C - - ( x ~ , Y t ) , . . . , (x,, y,) be an instance of Post's correspondence


problem over X and let I = ~ 1 , . . . , n}. Assume that X (3 I = ~ . Define
Q____c= {w4pwR!] w is in x+I+}, where # is not in ]g or L Define Pc = Lc:#:Mg.
LEMMA 2.30
Let C be as above. Then
(1) We can find extended D P D A ' s accepting Qc and Pc, and
(2) Qc ~ Pc = ~ if and only if C has no solution.
Proof.
(1) A D P D A accepting Qc can be constructed easily. For Pc, we know
by L e m m a 2.29 that there exists a D P D A , say M~, accepting L o To find
a D P D A M2 that accepts M g is not much harder; one stores integers and
checks them against the portion of input that is in X+. Thus we can construct
D P D A M3 to simulate Mr, check for :#, and then simulate M2.
(2) If u v ~ w x is in Qc ~ Pc, where u and x are in X + and v and w in
1 +, then u = x ~, and v = wR, because u v @ w x is in Qo Because u v ~ w x is
in Pc, u is a viable sequence. Thus, C has a solution. Conversely, if we have
x i , ' " xi,. = Yii "'" Yt,., then x~, . . . Xt,,i m ' ' ' it ~ il "'" i,,,x~.,.., xi, is in
Qc A P o
SEC. 2.6 PROPERTIES OF CONTEXT-FREE LANGUAGES 201

LEMMA 2.31

Let C be as above. Then


(1) We can find a C F G for Qc u Pc, and
(2) Qc u Pc = (E u I)* if and only if C has no solution.
Proof.
(1) From the closure of deterministic CFL's under complement (Theorem
2.23), we can find DPDA's for Qc and Pc. From the equivalence of CFL's
and PDA languages (Lemma 2.26), we can find CFG's for these languages.
From closure of CFL's under union, we can find a C F G for Qc u Pc.
(2) Immediate from Lemma 2.30(2) and De Morgan's law. D

We can now show that it is undecidable whether two CFG's generate


the same language. In fact, we can prove somethingstronger" It is still unde-
cidable even if one of the grammars is right-linear.
THEOREM 2.28
It is undecidable for a C F G G~ and a right-linear grammar G 2 whether
L(G~) = L(G~).
Proof. If not, then we could decide Post's problem as follows"
(1) Given an instance C, construct, by Lemma 2.31, a C F G G~ generating
Qc L9 Pc, and construct a right-linear grammar G z generating the regular set
(~ U I)*, where C is over ~, C has lists of length n, and I = {1 . . . . , n}.
Again, some renaming of symbols may first be necessary, but the existence
or nonexistence of a solution is left intact.
(2) Apply the hypothetical algorithm to determine if L(G~)= L(G2).
By Lemma 2.31(2), this equality holds if and only if C has no solution. [Z]

Since there are algorithms to convert a C F G to a PDA and vice versa,


Theorem 2.28 also implies that it is undecidable whether two PDA's, or
a PDA and a finite automaton, recognize the same language, whether a PDA
recognizes the set denoted by a regular expression, and so forth.

2.6.4. Properties of Deterministic CFL's

The deterministic context-flee languages are closed under remarkably


few of the operations under which the entire class of context-free languages
is closed. We already know that the deterministic CFL's are closed under
complement. Since La = {a~b~cJ[i,j ~ 1} and L2 ---- {a~bJcJ[i,j ~ 1} are both
deterministic CFL's and L~ ~ Lz = {a"b"c"[n ~ 1} is a language which is
not context-free (Example 2.39), we have the following two nonclosure
properties.
THEOREM 2.29
The class of deterministic CFL's is not closed under intersection or union.
202 ELEMENTS OF LANGUAGE THEORY CHAP. 2

Proof. Nonclosure under intersection is immediate from the above.


Nonclosure under union follows from De Morgan's law and closure under
complement.

The deterministic CFL's form a proper subset of the CFL's, as we see


from the following example.

Example 2.44
We can easily show that the complement of L : {a"bnc"ln ~ 1} is a CFL.
The sentence w ~ L if and only if one or more of the following hold"
(1) w is not in a+b÷c÷.
(2) w : aib@k, and i ~ j.
(3) w : atbJck, and j ~ k.
The set satisfying (1) is regular, and the sets satisfying (2) and (3) are each
context-free, as the reader can easily show by constructing nondeterministic
PDA's recognizing them. Since the CFL's are closed under union, L is a CFL.
But if L were a deterministic CFL, then L would be likewise, by Theorem
2.23. But L is not even a CFL. [Z]

The deterministic CFL's have the same positive decidability results as


the CFL. That is, given a D P D A P, we can determine whether L ( P ) = ~,
and given an input string w, we can easily determine whether w is in L(P).
Moreover, given a deterministic D P D A P and a regular set R, we can
determine whether L(P)= R, since L(P)= R if and only if we have (L(P)n R)
U (L(P) n R) = ~. (L(P) n /~) U (L(P) N R) is easily seen to be a CFL.
Other decidability results appear in the Exercises.

2.6.5. Ambiguity

Recall that a context-free g r a m m a r G = (N, E, P, S) is ambiguous if


there is a sentence w in L(G) with two or more distinct derivation trees.
Equivalently, G is ambiguous if there exists a sentence w with two distinct
leftmost (or rightmost) derivations.
When we are using a g r a m m a r to help define a programming language
we would like that g r a m m a r to be unambiguous. Otherwise, a programmer
and a compiler may have differing opinions as to the meaning of some
sentences.

Example 2.45
Perhaps the most famous example of ambiguity in a programming lan-
guage is the dangling else. Consider the grammar G with productions

S > if b then S else S[if b then S la


SEC. 2.6 PROPERTIES OF CONTEXT-FREE LANGUAGES 203

G is ambiguous since the sentence

if b then if b then a else a

has two derivation trees as shown in Fig. 2.18. The derivation tree in Fig.
2.18(a) imposes the interpretation

if b then (if b then a) else a

while the tree in Fig. 2.18(b) gives

if b then (if b then a else a)

if

(a)

if b then S

if b then S else

i
a

(b)
Fig. 2.18 Two derivation trees.

We might like to have an algorithm to determine whether an arbitrary


C F G is unambiguous. Unfortunately, such an algorithm does not exist.
THEOREM 2.30
It is undecidable whether a C F G G is ambiguous.
Proof. Let C = (x 1, Yl), • • •, (x,, y,,) be an instance of Post's correspon-
dence problem over X. Let G be the C F G (IS, A, B}, X U / , P, S), where
I = {1, 2 , . . . , hi, and P contains the productions

-- 7
204 ELEMENTS OF LANGUAGE THEORY CHAP. 2

S >AIB
A > x~Ailxti, for 1 < i < n
B~ yiBilyti, for 1 < i < n

The nonterminals A and B generate the languages Lc and Mc, respec-


tively, defined on p. 199. It is easy to see that no sentence has more than one
distinct leftmost derivation from A, or from B. Thus, if there exists a sentence
with two leftmost derivations from S, one must begin with S ~ A, and
Im
the other with S ~ B. But by Lemma 2.29, there is a sentence derived from
Im
both A and B if and only if instance C of Post's problem has a solution.
Thus, G is ambiguous if and only if C has a solution. It is then a straight-
forward matter to show that if there were an algorithm to decide the ambi-
guity of an arbitrary CFG, then we could decide Post's correspondence
problem. [~]
Ambiguity is a function of the grammar rather than the language. Certain
ambiguous grammars may have equivalent unambiguous ones.

Example 2.46
Let us consider the grammar and language of the previous example.
The reason that grammar G is ambiguous is that an else can be associated
with two different then's. For this reason, programming languages which
allow both if-then--else and if-then statements can be ambiguous. This
ambiguity can be removed if we arbitrarily decide that an else should be
attached to the last preceding then, as in Fig. 2.18(b).
We can revise the grammar of Example 2.45 to have two nonterminals,
S~ and $2. We insist that $2 generate if-then-else, while S~ is free to generate
either kind of statement. The rules of the new grammar are

S1 > if b then S~lif b then $2 else Sxla


$2 > if b then $2 else S2la

The fact that only $2 precedes else ensures that between the then-else
pair generated by any one production must appear either the single symbol a
or another else. Thus the structure of Fig. 2.18(a) cannot occur. In Chapter 5
we shall develop deterministic parsing methods for various grammars,
including the current one, and shall be able at that time to prove our new
grammar to be unambiguous. [Z]

Although there is no general algorithm which can be used to determine if


a grammar is ambiguous, it is possible to isolate certain constructs in produc-
tions which lead to ambiguous grammars. Since ambiguous grammars are
SEC. 2.6 PROPERTIES OF CONTEXT-FREE LANGUAGES 205

often harder to parse than unambiguous ones, we shall mention some of the
more common constructs of this nature here so that they can be recognized
in practice.
A proper grammar containing the productions A--~ AAI;o~ will be am-
biguous because the substring A A A has two parses-

A A

A
/\ /\
A A A A

This ambiguity disappears if instead we use the productions

A >AB1B
B >~
or the productions
A >BAIB
B >¢

Another example of an ambiguous production is A ~ AocA. The pair of


productions A --~ ocA ]A fl introduces ambiguity since A ~=~ ocA ~ ~A fl and
A ~> Aft ~=~ ~Afl imply two distinct leftmost derivations of ocAfl. A slightly
more elaborate pair of productions which gives rise to an ambiguous gram-
mar is A - - ~ ocA ]ocArlA. Other exampIes of ambiguous grammars can be
found in the Exercises.
We shall call a CFL inherently ambiguous if it has no unambiguous
CFG. It is not at first obvious that there is such a thing as an inherently
ambiguous CFL, but we shall present one in the next example. In fact, it is
undecidable whether a given C F G generates an inherently ambiguous lan-
guage (i.e., whether there exists an equivalent unambiguous CFG). However,
there are large subclasses of the CFL's known not to be inherently ambiguous
and no inherently ambiguous programming languages have been devised yet.
Most important, every deterministic CFL has an unambiguous grammar, as
we shall see in Chapter 8.

Example 2.47
Let L - - [ a i b : c 1 ] i - - j or j ~ l~. L is an inherently ambiguous CFL.
Intuitively, the reason is that the words with i - - j must be generated by
a set of productions different from those generating the words with j - - I.
At least some of the words with i -- j -- l must be generated by both mecha-
nisms.
206 ELEMENTSOF LANGUAGE THEORY CHAP. 2

One C F G for L is
S: ~ > A B I D C
A > aAle
B > bBcle
C > cCl e
D > aDb[e

Clearly the above grammar is ambiguous.


We can use Ogden's lemma to prove that L is inherently ambiguous. Let
G be an arbitrary grammar for L, and let k be the constant associated with
G in Theorem 2.24. If that constant is less than 3, let k = 3. Consider the
word z = akbkc k+k', where the a's are all distinguished. We can write
z = uvwxy. Since w has distinguished positions, u and v consist only of a's.
If x consists of two different symbols, then uvZwx~y is surely not in L, so x
is either in a*, b*, or c*.
If x is in a*, uv2wx2y would be ak+Pbkc k+k t for some p, 1 ~ p ~ k, which
is not in L. If x is in c*, uv2wx2y would be ak+P'bkc k÷kt÷~'', where 1 ~ pt ~ k.
This word likewise is not in L.
In the second case, where x is in b*, we have uv2wx2y = ak+P'bk+P'c k÷k:,
where 1 ~ p~ ~ k. If this word is in L, then either p~ = P2 or p~ :¢: P2 and
p~ = k! In the latter case, uv3wx3y = a k + 2 P l b k + Z r ' C k+kt is surely not in L.
So we conclude that Pl = P2- Observe that pl = Iv[ and pz ----[x].
By Theorem 2.24, there is a derivation
+ + +

(2.6.1) S ~ uAy ~ uvmAx'y ~ btvmwxmy for all m :> 0

In particular, let m = k ! / p 1. Since 1 ~ pl ~ k, we know that m is an


integer. Then uv'nwxmy : ak+k:bk+ktC k+k:.
A symmetric argument starting with the word ak+k~bkc k shows that
there exist u !, v !, W ! , x', y', where only u' has an a, v' is in b*, and there is
a nonterminal B such that
+ + +
S ~ u'By' ~ u'(v')m'B(x')m'y ' ~ U'(v')m'w(x')m'y
(2.6.2)
= ak+k!bk+ktck+kt

If we can show that the two derivations of ak+k~be+ktCk÷kt have different


derivation trees, then we shall have shown that L is inherently ambiguous,
since G was chosen without restriction and has been shown ambiguous.
Suppose that the two derivations (2.6.1) and (2.6.2) have the same deriva-
tion tree. Since A generates a's and b's and B generates b's and c's, neither
A nor B could appear as a label of a descendant of a node labeled by the
EXERCISES 207

other. Thus there exists a sentential f o r m tiAt2Bt3, where the t's are terminal
strings. F o r all i a n d j, tlv~wxq2(v')iw'(x')it3 would presumably be in L. But
I vl = ]x[ a n d [v'l = Ix' [. Also, x a n d v' consist exclusively of b's, v consists
of a's, a n d x' consists of c's. Thus choosing i a n d j equal a n d sufficiently
large will ensure that the above w o r d has m o r e b's than a's or o's. W e m a y
thus conclude that G is a m b i g u o u s and that L is inherently ambiguous. [ ]

EXERCISES

2.6.1. Let L be a context-free language and R a regular set. Show that the
following languages are context-free:
(a) INIT(L).
(b) FIN(L).
(c) SUB(L).
(d) L/R.
(e) L ~ R.
The definitions of these operations are found in the Exercises of Sec-
tion 2.3 on p. 135.
2.6.2. Show that if L is a CFL and h a homomorphism, then h-a(L) is a CFL.
Hint: Let P be a PDA accepting L. Construct P' to apply h to each of
its input symbols in turn, store the result in a buffer (in the finite con-
trol), and simulate P on the symbols in the buffer. Be sure that your
buffer is of finite length.
2.6,3. Show that the following are not CFL's:
(a) [aibic j [j < i].
(b) {a~bJckl i < j < k}.
(c) The set of strings with an equal number of a's, b's, and c's.
(d) [a'bJaJb' IJ ~ i].
(e) [amb"a'~b ~ [m, n ~ 1].
(f) {a~biek[none of i, j, and k are equal}.
(g) {nHa ~ [n is a decimal integer > 1}. (This construct is representative
of F O R T R A N Hollerith fields.)
**2.6.4. Show that every CFL over a one-symbol alphabet is regular. Hint:
Use the pumping lemma.
**2.6.$. Show that the following are not always CFL's when L is a CFL:
(a) MAX(L).
(b) MIN(L).
(c) L 1/2 = [x[ for some y, xy is in L and [xl = [y[}.
*2.6.6. Show the following pumping lemma for linear languages. If L is a linear
language, there is a constant k such that if z ~ L and Iz] > k, then
z = uvwxy, where[uvxy] < k, vx ~ e and for all i, uvtwx~y is in L.
2.6.7. Show that [a"b"amb m In, m > 1} is not a linear language.
'2.6.8. A one-turn PDA is one which in any sequence of moves first writes
208 ELEMENTS OF LANGUAGE THEORY CHAP. 2

symbols on the pushdown list and then pops symbols from the push-
down list. Once it starts popping symbols from the pushdown list, it
can then never write on its pushdown list. Show that a C F L is linear
if and only if it can be recognized by a one-turn PDA.
*2.6.9. Let G = (N, 2~, P, S) be a CFG, Show that the following are CFL's"
(a) {tg[S ==~ tg}.
Im

(b) { ~ l s ~ ~}.
rm

(c) (~ls=~ ~].


2.6.10. Give details of the proof of the corollary to Theorem 2.25.
2.6.11. Complete the proof of Theorem 2.26.
2.6.12. Give formal constructions of the D P D A ' s used in the proofs of Lemmas
2.29( 1) and 2.30(1 ).
"2.6.13. Show that the language Qc n Pc of Section 2.6.3 is a C F L if and only
if it is empty.
2.6.14. Show that it is undecidable for C F G G whether
(a) L(G) is a CFL.
(b) L(G) is regular.
(c) L(G) is a deterministic CFL.
Hint: Use Exercise 2.6.13 and consider a C F G for Qc u Pc.
2.6.15. Show that it is undecidable whether context-sensitive grammar G
generates a CFL.
2.6.16. Let G1 and G2 be CFG's. Show that it is undecidable whether
L(G1) n L(Gz) = ~ .
"2.6.17. Let G1 be a C F G and Gz a right-linear grammar. Show that
(a) It is undecidable whether L(Gz) ~ L(G1).
(b) It is decidable whether L(Gi) ~ L(G2).
"2.6.18. Let P1 and Pz be DPDA's. Show that it is undecidable whether
(a) L(Pi) U L(Pz) is a deterministic CFL.
(b) L(Pa)L(P2) is a deterministic CFL.
(c) L(P1) ~ L(Pz).
(d) L(P1)* is a deterministic CFL.
*'2.6.19, Show that it is decidable, for D P D A P, whether L(P) is regular. Con-
trast Exercise 2.6.14(b).
**2.6.20. Let L be a deterministic C F L and R a regular set. Show that the fol-
lowing are deterministic CFL's"
(a) LR. (b) L/R.
(c) L u R. (d) MAX(L).
(e) MIN(L). (f) L N R.
Hint: F o r (a, b, e, f ) , let P be a D P D A for L and M a finite automaton
for some regular set R. We must show that there is a D P D A P ' which
EXERCISES 209

simulates P but keeps on each cell of its pushdown list the information,
" F o r what states p of M and q of P does there exist w that will take
M from state p to a final state and cause P to accept if started in state
q with this cell the top of the pushdown list ?" We must show that
there is but a finite amount of information for each cell and that P '
can keep track of it as the pushdown list grows and shrinks. Once we
know how to construct P', the four desired D P D A ' s are relatively easy
to construct.
2.6.21. Show that for deterministic C F L L and regular set R, the following
may not be deterministic C F L ' s :
(a) R L .
(b) [ x l x R ~ L].
(c) {xi for some y ~ R, we have y x ~ L}.
(d) h(L), for h o m o m o r p h i s m h.
2.6.22. Show that h-I(L) is a deterministic C F L if L is.
**2.6.23. Show that Qc u Pc is an inherently ambiguous C F L whenever it is
not empty.
**2.6.24. Show that it is undecidable whether a C F G G generates an inherently
ambiguous language.
*2.6.25. Show that the grammar of Example 2.46 is unambiguous.
**2.6.26. Show that the language LI U Lz, where L1 = {a'b'ambmlm, n > 1}
and L2 = {a'bma~b" i m, n > 1}, is inherently ambiguous.
**2.6.27. Shove that the C F G with productions S----~ a S b S c l a S b l b S c l d is
ambiguous. Is the language inherently ambiguous ?
*2.6.28. Show that it is decidable for a D P D A P whether L ( P ) has the prefix
property. Is the prefix property decidable for an arbitrary C F L ?
DEFINITION
A D y c k language is a C F L generated by a grammar G =
({S}, Z, P, S), where E = {al . . . . . a~, bl . . . . . bk} for some k _~ 1 and
P consists of the productions S --~ S S l a i S b l l a 2 S b z l " " l a k S b k l e .
**2.6.29. Show that given alphabet Z, we can find an alphabet Z', a Dyck lan-
guage LD ~ ~ ' , and a homorphism h from Z'* to Z* such that for any
C F L L ~ Z* there is a regular set R such that h(LD ~ R ) = L.
*2.6.30. Let L be a C F L and S ( L ) = [ilfor some w ~ L, we have Iwl = i}.
Show that S ( L ) is a finite union of arithmetic progressions.
DEFINITION
An n-vector is an n-tuple of nonnegative integers. If v i = (a l , . . . , a,)
and v~ = (bl . . . . . b,) are n-vectors and c a nonnegative integer, then
v l -~- vz = (a l ÷ b l . . . . . a, + b,) and cv l = (ca l , . . ., ca,). A set S
of n-vectors is linear if there are n-vectors v0 . . . . . Vk such that
S=[vlv = v 0 + ClVl + . . . + CkVk, for some nonnegative integers
21 0 ELEMENTS OF LANGUAGE THEORY CHAP. 2

Cl . . . . . ck]. A set of n-vectors is semilinear if it is the union of a finite


n u m b e r of linear sets.
*'2.6.31. Let E - - a ~ , a2, . . . . a,. Let ~b(x) be the n u m b e r of instances of b
in the string x. Show that [(#al(w), @as(w) . . . . . :~a.(w))l w ~ L) is a
semilinear set for each C F L L _ E*.
DEFINITION
The index o f a derivation in a C F G G is the m a x i m u m number of
nonterminals in any sentential form of that derivation, l(w), the index
o f a sentence w, is the smallest index of any derivation of w in G. I(G),
the index o f G, is max I(w) taken over all w in L(G). The index o f a
CFL L is min I(G) taken over all G such that L(G) = L.
**2.6.32. Show that the index of the g r a m m a r G with productions

S ~ SSIOSI le

is infinite. Show that the index of L(G) is infinite.


+
*2.6.33. A C F G G -- (N, Z, P, S) is self-embedding if A =~ uAv for some u and
v in Z +. (Neither u nor v can be e.) Show that a C F L L is not regular
if and only if all grammars that generate L are self-embedding.
DEFINITION

Let ~ be a class of languages with L1 ~ E~ and L2 ~ Ez* in £ .


Let a and b be new symbols not in Z1 U Z2. ~ is closed under
(1) Marked union if aL 1 U bL2 is in £ ,
(2) M a r k e d concatenation if LlaL2 is in L, and
(3) Marked • if (aLl)* is in £ .
2.6.34. Show that the deterministic C F L ' s are closed under marked union,
marked concatenation, and marked ..
*2.6.35. Let G be a (not necessarily context-free) g r a m m a r (N, Z, P, S), where
each production in P is of the form x A y ~ x~y, x and y are in Z*,
A E N, and ~' E (N u E)*. Show that L(G) is a CFL.
**2.6.36. Let G1 -- (N1, E l , P1, $1) and G2 -- (N2, Z2, Pz, $2) be two C F G ' s .
Show it is undecidable whether [~1S1 - > ~ } = [ f l l S z : = ~ f l } and
Gl G2

whether [~1Si ~ 0~} = [fllS2 ~ fl].


Gi Im G~ Im

Open Problem
2.6.37. Is it decidable, for D P D A ' s PI and P2, whether L(P1) -- L(P2)?

Research Problems
2.6.38. Develop methods for proving certain grammars to be unambiguous.
By Theorem 2.30 it is impossible to find a method that will work for
BIBLIOGRAPHIC NOTES 211

an arbitrary unambiguous grammar. However, it would be nice to have


techniques that could be applied to large classes of context-free gram-
mars.
2.6.39. A related research area is to find large classes of CFL's which are
known to have at least one unambiguous CFG. The reader should be
aware that in Chapter 8 we shall prove the deterministic CFL's to be
such a class.
2.6.40. Find transformations which can be used to make classes of ambiguous
grammars unambiguous.

BIBLIOGRAPHIC NOTES

We shall not attempt to reference here all the numerous papers that have been
written on context-free languages. The works by Hopcroft and Ullman [1969],
Ginsburg [1966], Gross and Lentin [1970], and Book [1970] contain many of the
references on the theoretical developments of context-free languages.
Theorem 2.24, Ogden's lemma, is from Ogden [1968]. Bar-Hillel et al. [1961]
give several of the basic theorems about closure properties and decidability results
of CFL's. Ginsburg and Greibach [1966] give many of the basic properties of
deterministic CFL's.
Cantor [1962], Floyd [1962a], and Chomsky and Schutzenberger [19631 inde-
pendently discovered that it is undecidable whether a CFG is ambiguous. The
existence of inherently ambiguous CFL's was noted by Parikh [1966]. Inherently
ambiguous CFL's are treated in detail by Ginsburg [1966] and Hopcroft and
Ullman [19691.
The Exercises contain many results that appear in the literature. Exercise 2.6.19
is from Stearns [1967]. The constructions hinted at in Exercise 2.6.20 are given in
detail by Hopcroft and Ullman [1969]. Exercise 2.6.29 is proved by Ginsburg
[1966]. Exercise 2.6.31 is known as Parikh's theorem and was first given by Parikh
[1966]. Exercise 2.6.32 is from Salomaa [1969b]. Exercise 2.6.33 is from Chomsky
[1959a]. Exercise 2.6.36 is from Blattner [19721.
THEORY OF
TRANSLATION

A translation is a set of pairs of strings. A compiler defines a translation


in which the pairs are (source program, object program). If we consider
a compiler consisting of the three phases lexical analysis, syntactic analysis,
and code generation, then each of these phases itself defines a translation.
As we mentioned in Chapter 1, lexical analysis can be considered as a trans-
lation in which strings representing source programs are mapped into sti'ings
of tokens. The syntactic analyzer maps strings of tokens into strings repre-
senting trees. The code generator then takes these strings into machine or
assembly language.
In this chapter we shall present some elementary methods for defining
translations. We shall also present devices which can be used to implement
these translations and algorithms which can be used to automatically con-
struct these devices from the specification of a translation.
We shall first explore translations from an abstract point of view and then
consider the applicability of the translation models to lexical analysis and
syntactic analysis. For the most part, we defer treatment of code generation,
which is the principal application of translation theory, to Chapter 9.
In general, when designing a large system, such as a compiler, one should
partition the overall system into components whose behavior and properties
can be understood and precisely defined. Then it is possible to compare
algorithms which can be used to implement the function to be performed by
that component and to select the most appropriate algorithm for that com-
ponent. Once the components have been isolated and specified, it should then
also be possible to establish performance standards for each component and
tests by which a given component can be evaluated. We must therefore

212
SEC. 3.1 FORMALISMS FOR TRANSLATIONS 213

understand the specification and implementation of translations before we


can apply engineering design criteria to compilers.

3.1. FORMALISMS FOR TRANSLATIONS

In this section two fundamental methods of defining translations are


presented. One of these is the "translation scheme," which is a grammar with
a mechanism for producing an output for each sentence generated. The other
method is the "transducer," a recognizer which can emit a finite-length string
of output symbols on each move. First we shall consider translation schemes
based on context-free grammars. We shall then consider finite transducers
and pushdown transducers.

3.1.1. Translation and Semantics

In Chapter 2 we considered only the syntactic aspects of languages. There


we saw several methods for defining the well-formed sentences of a language.
We now wish to investigate techniques for associating with each sentence of
a language another string which is to be the output for that sentence. The
term "semantics" is sometimes used to denote this association of outputs
with sentences when the output string defines the "meaning" of the input
sentence.
DEFINITION

Suppose that Z is an input alphabet and A an output alphabet. We define


a translation f r o m a language L 1 ~ E* to a language L 2 ~ A* as a relation
T from Z* to A* such that the domain of T is L 1 and the range of T is L 2.
A sentence y such that (x, y) is in T is called an output for x. Note that,
in general, in a translation a given input can have more than one output.
However, any translation describing a programming language should be
a function (i.e., there exists at most one output for each input).
There are many examples of translations. Perhaps the most rudimentary
type of translation is that which can be specified by a homomorphism.

Example 3.1

Suppose that we wish to change every Greek letter in a sentence in E*


into its corresponding English name. We can use the homomorphism h,
where
(1) h(a) -- a if a is a member of Z minus the Greek letters and
(2) h(a) is defined in the following table if a is a Greek letter:
214 THEORY OF TRANSLATION CHAP. 3

Greek Letter h Greek Letter h

A a alpha N v nu
B fl beta E, ~ xi
F y gamma O o omicron
A 5 delta II r~ pi
E e epsilon P p rho
Z ( zeta Z a sigma
H 1/ eta T z tau
O 0 theta T o upsilon
I z iota • ~b phi
K x kappa X Z chi
A 2 lambda ~F V psi
M g mu ~ co omega

For example, the sentence a = nr 2 would have the translation a = pi r 2. [Z]

Another example of a translation, one which is useful in describing


a process that often occurs in compilation, is mapping arithmetic expressions
in infix notation into equivalent expressions in Polish notation.
DEFINITION
There is a useful way of representing ordinary (or infix) arithmetic expres-
sions without using parentheses. This notation is referred to as Polish nota-
tion.t Let 19 be a set of binary operators (e.g., [ + , ,]), and let Z be a set of
operands. The two forms of Polish notation, prefix Polish and postfix Polish,
are defined recursively as follows:
(1) If an infix expression E is a single operand a, in Z, then both the prefix
Polish and postfix Polish representation of E is a.
(2) If E1 0 E2 is an infix expression, where 0 is an operator, and Et and
E2 are infix expressions, the operands of 0, then
(a) 0 E'tE~ is the prefix Polish representation of El 0 E2, where E'i
and E~ are the prefix Polish representations of E~ and Ez, respec-
tively, and
(b) ,_.~r"0r"'
~ z is the postfix Polish representation of Ei 0 Ez, where E~'
and E~ are the postfix Polish representations of E~ and Ez, respec-
tively.
(3) If (E) is an infix expression, then
(a) The prefix Polish representation of (E) is the prefix Polish repre-
sentation of E, and
(b) The postfix Polish representation of (E) is the postfix Polish
representation of E.

]'The term "Polish" is used, as this notation was first described by the Polish mathe-
matician Lukasiewicz, whose name is significantly harder to pronounce than is "Polish."
SEC. 3.1 FORMALISMS FOR TRANSLATIONS 215

Example 3.2
Consider the infix expression (a + b) • c. This expression is of the form
E1 * E2, where E1 = (a q- b) and E2 = c. Thus the prefix and postfix Polish
expressions for E2 are both c. The prefix expression for E~ is the same as that
for a-q-b, which is q-ab. Thus the prefix expression for (a ÷ b ) , c is
,+abe.
Similarly, the postfix expression for a ÷ b is ab ÷, so the postfix expres-
sion for (a + b) • c is a b ÷ c , . D

It is not at all obvious that a prefix or postfix expression can be uniquely


returned to an infix expression. The observations leading to a proof of this
fact are found in Exercises 3.1.16 and 3.1.17.
We can use trees to conveniently represent arithmetic expressions. For
example, (a ÷ b ) , c has the tree representation shown in Fig. 3.1. In the

Fig. 3.1 Tree representation for


(a + b)*c.

tree representation each interior node is labeled by an operator from O and


each leaf by an operand from ~. The prefix Polish representation is merely
the left-bracketed representation of the tree with all parentheses deleted.
Similarly, the postfix Polish representation is the right-bracketed represen-
tation of the tree, with parentheses again deleted.
Two important examples of translations are the sets of pairs

((x, y)lx is an infix expression and y is the prefix (or,


alternatively, postfix) Polish representation of x].

These translations cannot be specified by a homomorphism. We need transla-


tion specifiers with more power and shall now turn our attention to formal-
isms which allow these and other translations to be conveniently specified.

3.1.2. Syntax-Directed Translation Schemata

The problem of finitely specifying an infinite translation is similar to


the problem of specifying an infinite language. There are several possible
approaches toward the specification of translations. Analogous to a language
generator, such as a grammar, we can have a system which generates the pairs
in the translation. We can also use a recognizer with two tapes to recognize
216 THEORY OF TRANSLATION CHAP. 3

those pairs in the translation. Or we could define an automaton which takes


a string x as input and emits (nondeterministically if necessary) all y such
that y is a translation of x. While this list does not exhaust all possibilities,
it does cover the models in common use.
Let us call a device which given an input string x, calculates an output
string y such that (x, y) is in a given translation T, a translator for T. There
are several features which are desirable in the definition of a translation.
Two of these features are
(1) The definition of the translation should be readable. That is to say,
it should be easy to determine what pairs are in the translation.
(2) It should be possible to mechanically construct an efficient translator
for that translation directly from the definition.
Features which are desirable in translators are
(1) Efficient operation. For an input string w of length n, the amount of
time required to process w should be linearly proportional to n.
(2) Small size.
(3) Correctness. It would be desirable to have a small finite test such that
if the translator passed this test, this would be a guarantee that the translator
works correctly on all inputs.
One formalism for defining translations is the syntax-directed translation
schema. Intuitively, a syntax-directed translation schema is simply a grammar
in which translation elements are attached to each production. Whenever
a production is used in the derivation of an input sentence, the translation
element is used to help compute a portion of the output sentence associated
with the portion of the input sentence generated by that production.

Example 3.3
Consider the following translation schema which defines the translation
{(x, XR)[X ~ {0, 1}*}. That is, for each input x, the output is x reversed. The
rules defining this translation are

Production Translation Element

(1) S---* OS S = SO
(2) S ~ I S S=S1
(3) S---. e S-- e

An input-output pair in the translation defined by this schema can be


obtained by generating a sequence of pairs of strings (0c, fl) called translation
f o r m s , where ~ is an input sentential form and fl an output sentential form.
We begin with the translation form (S, S). We can then apply the first rule
sac. 3.1 FORMALISMS FOR TRANSLATIONS 217

to this form. To do so, we expand the first S using the production S ~ 0S.
Then we replace the output sentential form S by SO in accordance with the
translation element S ---- SO. For the time being, we can think of the trans-
lation element simply as a production S --~ SO. Thus we obtain the transla-
tion form (0S, SO). We can expand each S in this new translation form by
using rule (1) again to obtain (00S, SO0). If we then apply rule (2), we obtain
(001S, S100). If we then apply rule (3), we obtain (001,100). No further rules
can be applied to this translation form and thus (001,100) is in the transla-
tion defined by this translation schema. V-]

A translation schema T defines some translation z(T). We can build


a translator for T(T) from the translation schema that works as follows.
Given an input string x, the translator finds (if possible) some derivation of
x from S using the productions in the translation schema. Suppose that
S = ~0 =* ~ =* ~z =~ "'" =* ~, = x is such a derivation. Then the trans-
lator creates a derivation of translation forms

(~0,/~0) --~- ( ~ , P,) . . . . . > (~., p.)

such that (~0, fl0) = (S, S), (~,, ft.) = (x, y), and each fl, is obtained by
applying to fl~-i the translation element corresponding to the production
used in going from t~_ 1 to ~t at the "corresponding" place. The string y is
an output for x.
Often the output sentential forms can be created at the time the input is
being parsed.

Example 3.4
Consider the following translation scheme which maps arithmetic expres-
sions of L ( G o ) to postfix Polish"

Production Translation Element

E----~ E + T E = ET-+-
E--~T E-'T
T--* T* F T= TF*
T---~F T=F
F ~ (E) F = E
F---~a F=a

The production E ~ E q- T is associated with the translation element


E = ET +. This translation element says that the translation associated with
E on the left of the production is the translation associated with E on the
right of the production followed by the translation of T followed by + .
21 8 THEORYOF TRANSLATION CHAP. 3

Let us determine the output for the input a + a • a. To do so, let us first
find a leftmost derivation of a -+ a • a from E using the productions of the
translation scheme. Then we compute the corresponding sequence of trans-
lation forms as shown"

(E, E) ~ (E -q- T, E T + )
- - - - > ( T + T, T T + )
------~ (F q- T, F T + )
~- (a q- T, aT q-)
----> (a -q- T , F, aTF , +)
u ( a -q- F , F, aFF , + )
> (a + a , F, aaF , + )
~ >(a-+a,a, aaa,+)

Each output sentential form is computed by replacing the appropriate non-


terminal in the previous output sentential form by the right side of transla-
tion rule associated with the production used in deriving the corresponding
input sentential form.

The translation schemata in Examples 3.3 and 3.4 are special cases of
an important class of translation schemata called syntax-directed translation
schemata.
DEFINITION

A syntax-directed translation schema (SDTS for short) is a 5-tuple


T - - (N, Z, A, R, S), where
(1) N is a finite set of nonterminal symbols.
(2) Z is a finite input alphabet.
(3) A is a finite output alphabet.
(4) R is a finite set of rules of the form A ----~~, fl, where ~ ~ (N U Z)*,
fl ~ (N u A)*, and the nonterminals in ,6' are a permutation of the non-
terminals in ~.
(5) S is a distinguished nonterminal in N, the start symbol.
Let A ~ ~, fl be a rule. To each nonterminal of ~ there is associated
an identical nonterminal of ft. If a nonterminal B appears only once in
and ,8, then the association is obvious. If B appears more than once, we use
integer superscripts to indicate the association. This association is an intimate
part of the rule. For example, in the rule A ~ B(I~CB .~2~,B(2~B(I~C, the three
positions in BcI~CB Cz~ are associated with positions 2, 3, and 1, respectively,
in B~2~B~I~C.
We define a translation form of T as follows:
SEC. 3.1 FORMALISMS FOR TRANSLATIONS 219

(1) (S, S) is a translation form, and the two S's are said to be associated.
(2) If (eArl, e'Afl') is a translation form, in which the two explicit
instances of A are associated, and if A ---~ 7, ~" is a rule in R, then (e?fl, e'?'fl')
is a translation form. The nonterminals of ? and ~,' are associated in the trans-
lation form exactly as they are associated in the rule. The nonterminals of
• and fl are associated with those of e' and fl' in the new translation form
exactly as in the old. The association will again be indicated by superscripts,
when needed, and this association is an essential feature of the form.
If the forms (eArl, e'Afl') and (eI, fl, e'~,'fl'), together with their associa-
tions, are related as above, then we write (eArl, e'Afl') =-~ (e?fl, e'?'fl').
T
+ * k
We use =~, ==~, and ==~ to stand for the transitive closure, reflexive-transitive
T T T
closure, and k-fold product of ==~. As is customary, we shall drop the sub-
T
script T whenever possible.
The translation defined by T, denoted z(T), is the set of pairs

{(x, y) l (S, S) ~ (x, y), x ~ 12" and y ~ A*}.

Example 3.5

Consider the SDTS T = (IS}, {a, -q-}, {a, + }, R, S), where R has the rules

S > -k Sc~)S ~2~, S ~ + S <z)


S > a, a

Consider the following derivation in T:

(S, S) ~ (+ S ~ ) S ~2~, S C~' + S TM)


(-q- -~- S(3)S(4)S(2) ' S ( 3 ) .q_ S ( 4 ) .~- S TM)

(+ + aS~4~S ~2~, a + S " ) + S ~2~)


> (-q- + aaS, a -q- a + S)
- - - ~ ( + + aaa, a q- a + a)

z(T) = ((x, a(q-a)~) [ i ~ 0 and x is a prefix polish representation of a(-t-a) ~


with some order of association of the -q-'s}. D

DEFINITION

If T = (N, E, A, R, S) is an SDTS, then z(T) is called a syntax-directed


translation (SDT). The grammar G~ = (N, E, P, S), where

P = [A ~ e l A ---~ e, fl is in R},
220 THEORY OF TRANSLATION CHAP. 3

is called the underlying (or input)grammar of the SDTS T. The grammar


Go = (N, A, P ' , S), where P ' = {A----~ ill A ---~ t~, fl is in R} is called the
output grammar of T.
We can alternatively view a syntax-directed translation as a method of
transforming derivation trees in the input grammar Gi into derivation trees
in the output grammar Go. Given an input sentence x, a translation for x
can be obtained by constructing a derivation tree for x, then transforming
the derivation tree into a tree in the output grammar, and then taking the
frontier of the output tree as a translation for x.
ALGORITHM 3.1
Tree transformation via an SDTS.
Input. An SDTS T = (N, Σ, Δ, R, S), with input grammar Gi =
(N, Σ, Pi, S), output grammar Go = (N, Δ, Po, S), and a derivation tree
D in Gi, with frontier in Σ*.
Output. Some derivation tree D' in Go such that if x and y are the frontiers of D and D', respectively, then (x, y) ∈ τ(T).
Method.
(1) Apply step (2), recursively, starting with the root of D.
(2) Let this step be applied to node n. It will be the case that n is an interior node of D. Let n have direct descendants n1, . . . , nk.
(a) Delete those of n1, . . . , nk which are leaves (i.e., have terminal
or e-labels).
(b) Let the production of Gi represented by n and its direct descendants be A → α. That is, A is the label of n and α is formed by
concatenating the labels of n1, . . . , nk. Choose some rule of the
form A → α, β in R.† Permute the remaining direct descendants
of n, if any, in accordance with the association between the nonterminals of α and β. (The subtrees dominated by these nodes
remain in fixed relationship to the direct descendants of n.)
(c) Insert direct descendant leaves of n so that the labels of its direct
descendants form β.
(d) Apply step (2) to the direct descendants of n which are not leaves,
in order from the left.
(3) The resulting tree is D'. □

Example 3.6
Let us consider the SDTS T = ({S, A}, {0, 1}, {a, b}, R, S), where R
consists of

†Note that β may not be uniquely determined from A and α. If more than one rule
is applicable, the choice can be arbitrary.

S → 0AS, SAa
A → 0SA, ASa
S → 1, b
A → 1, b

A derivation tree in the input grammar is shown in Fig. 3.2(a). If we apply
step (2) of Algorithm 3.1 to the root of Fig. 3.2(a), we delete the leftmost leaf
labeled 0. Then, since S → 0AS was the production used at the root and
the only translation element for that production is SAa, we must reverse the
order of the remaining direct descendants of the root. Then we add a third
direct descendant, labeled a, at the rightmost position. The resulting tree is
shown in Fig. 3.2(b).

[Fig. 3.2 shows three derivation trees: (a) the input derivation tree with frontier 00111; (b) the tree after step (2) has been applied to the root; and (c) the final output tree with frontier bbbaa.]

Fig. 3.2 Application of Algorithm 3.1.

We next apply step (2) to the first two direct descendants of the root.
Application of step (2) to the second of these descendants results in two more
calls of step (2). The resulting tree is shown in Fig. 3.2(c). Notice that
(00111, bbbaa) is in τ(T).
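
As an informal aside (not in the original text), the tree transformation just carried out can be prototyped in a few lines of Python. The sketch below represents a node as a pair (label, children), a rule as a pair (output right side, association list), and reproduces the computation of Example 3.6; the tree `D` is the one of Fig. 3.2(a).

```python
# Illustrative sketch of Algorithm 3.1 (not the authors' code).
def transform(node, rules, nonterminals):
    label, children = node
    kept = [c for c in children if c[1]]            # (a) drop leaf descendants
    alpha = tuple(c[0] for c in children)           # labels of n1, ..., nk
    beta, assoc = rules[(label, alpha)]             # choose a rule A -> alpha, beta
    new_children, j = [], 0
    for sym in beta:                                # (b)-(c) permute subtrees and
        if sym in nonterminals:                     #         insert output leaves
            new_children.append(kept[assoc[j]]); j += 1
        else:
            new_children.append((sym, []))
    new_children = [transform(c, rules, nonterminals) if c[1] else c
                    for c in new_children]          # (d) recurse, left to right
    return (label, new_children)

def frontier(node):
    label, children = node
    return label if not children else "".join(frontier(c) for c in children)

# Example 3.6: S -> 0AS, SAa ;  A -> 0SA, ASa ;  S -> 1, b ;  A -> 1, b
RULES = {("S", ("0", "A", "S")): (("S", "A", "a"), [1, 0]),
         ("A", ("0", "S", "A")): (("A", "S", "a"), [1, 0]),
         ("S", ("1",)): (("b",), []),
         ("A", ("1",)): (("b",), [])}
D = ("S", [("0", []),
           ("A", [("0", []), ("S", [("1", [])]), ("A", [("1", [])])]),
           ("S", [("1", [])])])
print(frontier(D), "->", frontier(transform(D, RULES, {"S", "A"})))   # 00111 -> bbbaa
```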

To show the relation between the translation process of Algorithm 3.1


and the SDTS which is input to that algorithm, we prove the following
theorem.
THEOREM 3.1
(1) If x and y are the frontiers of D and D', respectively, in Algorithm
3.1, then (x, y) is in τ(T).
(2) If (x, y) is in τ(T), then there exists a derivation tree D with frontier
x and a sequence of choices for each execution of step (2b) such that the
frontier of the resulting tree D' is y.
Proof.
(1) We show the following by induction on the number of interior nodes
of a tree E:

(3.1.1) Let E be a derivation tree in Gi with frontier x and root labeled A,
and suppose that step (2) applied to E yields a tree E' with frontier
y. Then (A, A) ⇒* (x, y).

The basis, one interior node, is trivial. All direct descendants are leaves,
and there must be a rule A → x, y in R.
For the inductive step, assume that statement (3.1.1) holds for smaller
trees, and let the root of E have direct descendants with labels X1, . . . , Xk.
Then x = x1 ⋯ xk, where Xj ⇒* xj in Gi, 1 ≤ j ≤ k. Let the direct descendants of
the root of E' have labels Y1, . . . , Yl. Then y = y1 ⋯ yl, where Yj ⇒* yj in Go,
1 ≤ j ≤ l. Also, there is a rule A → X1 ⋯ Xk, Y1 ⋯ Yl in R.
If Xj is a nonterminal, then it is associated with some Yp_j, where
Xj = Yp_j. By the inductive hypothesis (3.1.1), (Xj, Xj) ⇒* (xj, yp_j). Because
of the permutation of nodes in step (2b), we know that

(A, A) ⇒ (X1 ⋯ Xk, Y1 ⋯ Yl)
       ⇒* (x1X2 ⋯ Xk, α1(1) ⋯ αl(1))
       ⇒* ⋯ ⇒* (x1 ⋯ xk, α1(k) ⋯ αl(k)),

where αj(m) is
(a) yj if Yj is in N and is associated with one of X1, . . . , Xm, and
(b) Yj otherwise.
Thus (3.1.1) follows.
Part (2) of the theorem is a special case of the following statement:

(3.1.2) If (A, A) ⇒^i (x, y), then there is a derivation tree D in Gi, with
root labeled A, frontier x, and a sequence of choices in step (2b)
so that the application of step (2) to D gives a tree with frontier y.

A proof of (3.1.2) by induction on i is left for the Exercises.

We comment that the order in which step (2) of Algorithm 3.1 is applied
to nodes is unimportant. We could choose any order that considered each
interior node exactly once. This statement is also left for the Exercises.

DEFINITION

An SDTS T = (N, Σ, Δ, R, S) such that in each rule A → α, β in R,
associated nonterminals occur in the same order in α and β is called a simple
SDTS. The translation defined by a simple SDTS is called a simple syntax-directed translation (simple SDT).
The syntax-directed translation schemata of Examples 3.3-3.5 are all
simple. That of Example 3.6 is not.
The association of nonterminals in a form of a simple SDTS is straight-
forward. They must be associated in the order in which they appear.
The simple syntax-directed translations are important because for each
simple SDT we can easily construct a translator consisting of a pushdown
transducer. This construction will be given in Section 3.1.4. Many, but not
all, useful translations can be described as simple SDT's. In Chapter 9 we
shall present several generalizations of syntax-directed translation schemata
which can be used to define larger classes of translations on context-free
languages. We close this section with another example of a simple SDT.

Example 3.7
The following simple SDTS maps the arithmetic expressions in L(Go)
to arithmetic expressions with no redundant parentheses:

(1) E → (E), E
(2) E → E + E, E + E
(3) E → T, T
(4) T → (T), T
(5) T → A * A, A * A
(6) T → a, a
(7) A → (E + E), (E + E)
(8) A → T, T

For example, the translation of ((a + (a * a)) * a) according to this
SDTS is (a + a * a) * a.† □

3.1.3. Finite Transducers

We shall now introduce our simplest translator, the finite transducer.


A transducer is simply a recognizer which emits an output string during
each move made. (The output may be e, however.) The finite transducer is
obtained by taking a finite automaton and permitting the machine to emit
a string of output symbols on each move (Fig. 3.3). In Section 3.3 we shall
use a finite transducer as a model for a lexical analyzer.

†Note that the underlying grammar is ambiguous, but that each input word has exactly
one output.

[Fig. 3.3 depicts a finite transducer: a finite control reading a1 a2 ⋯ an from a read-only input tape and writing to a write-only output tape.]
Fig. 3.3 Finite transducer.

For generality we shall consider a nondeterministic finite automaton


which is capable of making e-moves, as the basis of a finite transducer.
DEFINITION

A finite transducer M is a 6-tuple (Q, Σ, Δ, δ, q0, F), where

(1) Q is a finite set of states.
(2) Σ is a finite input alphabet.
(3) Δ is a finite output alphabet.
(4) δ is a mapping from Q × (Σ ∪ {e}) to finite subsets of Q × Δ*.
(5) q0 ∈ Q is the initial state.
(6) F ⊆ Q is the set of final states.
We define a configuration of M as a triple (q, x, y), where
(1) q ∈ Q is the current state of the finite control.
(2) x ∈ Σ* is the input string remaining on the input tape, with the leftmost symbol of x under the input head.
(3) y ∈ Δ* is the output string emitted up to this point.
We define ⊢ (or ⊢_M, when M is clear), a binary relation on configurations,
to reflect a move by M. Specifically, for all q ∈ Q, a ∈ Σ ∪ {e}, x ∈ Σ*,
and y ∈ Δ* such that δ(q, a) contains (r, z), we write

(q, ax, y) ⊢ (r, x, yz)

We can then define ⊢+, ⊢*, and ⊢k in the usual fashion.

We say that y is an output for x if (q0, x, e) ⊢* (q, e, y) for some q in F.
The translation defined by M, denoted τ(M), is {(x, y) | (q0, x, e) ⊢* (q, e, y)
for some q in F}. A translation defined by a finite transducer will be called
a regular translation or finite transducer mapping.
Notice that before an output string y can be considered a translation of
an input x, the input string x must take M from an initial state to a final
state.
Example 3.8
Let us design a finite transducer which recognizes arithmetic expressions
generated by the productions

S → a + S | a − S | + S | − S | a

and removes redundant unary operators from these expressions. For example, we would translate −a + −a − + −a into −a − a + a. In this
language, a represents an identifier, and an arbitrary sequence of unary +'s
and −'s is permitted in front of an identifier. Notice that the input language
is a regular set. Let M = (Q, Σ, Δ, δ, q0, F), where
(1) Q = {q0, q1, q2, q3, q4}.
(2) Σ = {a, +, −}.
(3) Δ = Σ.
(4) δ is defined by the transition graph of Fig. 3.4. A label x/y on an edge
directed from the node labeled qi to the node labeled qj indicates that δ(qi, x)
contains (qj, y).
(5) F = {q1}.
M starts in state q0 and determines whether there are an odd or even
number of minus signs preceding the first a by alternating between q0 and q4
on input −. When an a appears, M goes to state q1, to accept the input, and
emits either a or −a, depending on whether an even or odd number of −'s
have appeared. For subsequent a's, M counts whether the number of −'s
is even or odd using states q2 and q3. The only difference between the
q2 − q3 pair and q0 − q4 is that the former emits +a, rather than a alone,
if an even number of − signs precede a.

[Fig. 3.4 Transition graph of M. Only the Start marker and edge labels such as +/e and −/e are recoverable from the figure.]
With input −a + −a − + −a, M would make the following sequence
of moves:

(q0, −a + −a − + −a, e) ⊢ (q4, a + −a − + −a, e)
                         ⊢ (q1, + −a − + −a, −a)
                         ⊢ (q2, −a − + −a, −a)
                         ⊢ (q3, a − + −a, −a)
                         ⊢ (q1, − + −a, −a − a)
                         ⊢ (q3, + −a, −a − a)
                         ⊢ (q3, −a, −a − a)
                         ⊢ (q2, a, −a − a)
                         ⊢ (q1, e, −a − a + a)

Thus, M maps −a + −a − + −a into −a − a + a, since q1 is a final
state. □
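
As an informal supplement (not part of the original text), the definition of a finite transducer is easy to animate in a modern language. The Python sketch below simulates an arbitrary nondeterministic finite transducer and is then run on the machine of Example 3.8; since Fig. 3.4 is not reproduced here, the transition table `DELTA` is a reconstruction from the prose description of M and should be read as an assumption. The sketch also assumes there are no e-cycles, since otherwise the set of outputs may be infinite (compare Example 3.9 below).

```python
# Illustrative sketch: simulate a (nondeterministic) finite transducer.
# delta maps (state, input symbol or "" for e) to a list of (next state, output string).

def outputs(delta, q0, finals, x):
    """All y with (q0, x, e) |-* (q, e, y) for some q in finals (assumes no e-cycles)."""
    results, seen, stack = set(), set(), [(q0, x, "")]
    while stack:
        cfg = stack.pop()
        if cfg in seen:
            continue
        seen.add(cfg)
        q, rest, y = cfg
        if not rest and q in finals:
            results.add(y)
        moves = [""] + ([rest[0]] if rest else [])   # try e-moves and a reading move
        for a in moves:
            for (r, z) in delta.get((q, a), []):
                stack.append((r, rest[len(a):], y + z))
    return results

# Reconstruction of the transducer M of Example 3.8 (removal of redundant unary signs).
DELTA = {
    ("q0", "+"): [("q0", "")], ("q0", "-"): [("q4", "")], ("q0", "a"): [("q1", "a")],
    ("q4", "+"): [("q4", "")], ("q4", "-"): [("q0", "")], ("q4", "a"): [("q1", "-a")],
    ("q1", "+"): [("q2", "")], ("q1", "-"): [("q3", "")],
    ("q2", "+"): [("q2", "")], ("q2", "-"): [("q3", "")], ("q2", "a"): [("q1", "+a")],
    ("q3", "+"): [("q3", "")], ("q3", "-"): [("q2", "")], ("q3", "a"): [("q1", "-a")],
}
print(outputs(DELTA, "q0", {"q1"}, "-a+-a-+-a"))   # {'-a-a+a'}
```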

We say that a finite transducer M is deterministic if the following condition holds for all q ∈ Q:
(1) Either δ(q, a) contains at most one element for each a ∈ Σ, and
δ(q, e) is empty, or
(2) δ(q, e) contains one element, and for all a ∈ Σ, δ(q, a) is empty.
The finite transducer in Example 3.8 is deterministic.
Note that a deterministic finite transducer can define several translations
for a single input.

Example 3.9
Let M = ({q0, q~}, {a}, {b}, ~, qo, {q~}) a n d let J(q0, a) = {(ql, b)} a n d
d~(ql, e) = {(q~, b)}. T h e n

(q0, a, e ) 1 ~ (q~, e, b)
~-~-(ql, e, b '+1)

is a valid sequence o f m o v e s for all i ~ 0. Thus, 'r(M) = {(a, bt)[i ~ 1}. [Z]

There are several simple modifications of the definition of determinism
for finite transducers that will ensure the uniqueness of output. The one we
suggest is to require that no e-moves be made in a final state.
A number of closure properties for classes of languages can be obtained
by using transducers as operators on languages. For example, if M is a finite
transducer and L is included in the domain of τ(M), then we can define
M(L) = {y | x ∈ L and (x, y) ∈ τ(M)}.
We can also define an inverse finite transducer mapping as follows. Let M
be a finite transducer. Then M⁻¹(L) = {x | y ∈ L and (x, y) ∈ τ(M)}.
It is not difficult to show that finite transducer mappings and inverse
finite transducer mappings preserve both the regular sets and the context-free languages. That is, if L is a regular set (CFL) and M is a finite transducer,
then M(L) and M⁻¹(L) are both regular sets (CFL's). Proofs are left for the
Exercises. We can use these observations to show that certain languages are
not regular or not context-free.

Example 3.10
The language generated by the following grammar G is not regular:

S → if S then S | a

Let
L1 = L(G) ∩ (if)* a (then a)*
   = {(if)^n a (then a)^n | n ≥ 0}

Consider the finite transducer M = (Q, Σ, Δ, δ, q0, F), where

(1) Q = {qi | 0 ≤ i ≤ 6}.
(2) Σ = {a, i, f, t, h, 'e', n}.
(3) Δ = {0, 1}.
(4) δ is defined by the transition graph of Fig. 3.5.
(5) F = {q2}.
Here 'e' denotes the letter e, as distinguished from the empty string.
Thus M(L1) = {0^k 1^k | k ≥ 0}, which we know is not a regular set. Since the
regular sets are closed under intersection and finite transducer mappings,
we must conclude that L(G) is not a regular set. □

3.1.4. Pushdown Transducers

We shall now introduce another important class of translators called


pushdown transducers. A pushdown transducer is obtained by providing
a pushdown automaton with an output. On each move the automaton is
permitted to emit a finite-length output string.

[Fig. 3.5 Transition graph of M. Only the Start marker and edge labels such as a/e and i/e are recoverable from the figure.]

DEFINITION

A pushdown transducer (PDT) P is an 8-tuple (Q, Σ, Γ, Δ, δ, q0, Z0, F),
where all symbols have the same meaning as for a PDA except that Δ is
a finite output alphabet and δ is now a mapping from Q × (Σ ∪ {e}) × Γ
to finite subsets of Q × Γ* × Δ*.
We define a configuration of P as a 4-tuple (q, x, α, y), where q, x, and α
are the same as for a PDA and y is the output string emitted to this point.
If δ(q, a, Z) contains (r, α, z), then we write (q, ax, Zγ, y) ⊢ (r, x, αγ, yz)
for all x ∈ Σ*, γ ∈ Γ*, and y ∈ Δ*.
We say that y is an output for x if (q0, x, Z0, e) ⊢* (q, e, α, y) for some
q ∈ F and α ∈ Γ*. The translation defined by P, denoted τ(P), is

{(x, y) | (q0, x, Z0, e) ⊢* (q, e, α, y) for some q ∈ F and α ∈ Γ*}.

As with PDA's, we can say that y is an output for x by empty pushdown
list if (q0, x, Z0, e) ⊢* (q, e, e, y) for some q in Q. The translation defined by
P by empty pushdown list, denoted τe(P), is

{(x, y) | (q0, x, Z0, e) ⊢* (q, e, e, y) for some q ∈ Q}.

We can also define extended PDT's with their pushdown list top on the
right in a way analogous to extended PDA's.

Example 3.11
Let P be the pushdown transducer

({q}, {a, +, *}, {+, *, E}, {a, +, *}, δ, q, E, {q}),

where δ is defined as follows:

δ(q, a, E) = {(q, e, a)}
δ(q, +, E) = {(q, EE+, e)}
δ(q, *, E) = {(q, EE*, e)}
δ(q, e, +) = {(q, e, +)}
δ(q, e, *) = {(q, e, *)}

With input + * aaa, P makes the following sequence of moves:

(q, + * aaa, E, e) ⊢ (q, * aaa, EE+, e)
                   ⊢ (q, aaa, EE*E+, e)
                   ⊢ (q, aa, E*E+, a)
                   ⊢ (q, a, *E+, aa)
                   ⊢ (q, a, E+, aa*)
                   ⊢ (q, e, +, aa*a)
                   ⊢ (q, e, e, aa*a+)

Thus a translation by empty pushdown list of + * aaa is aa * a +. It can be
verified that τe(P) is the set

{(x, y) | x is a prefix Polish arithmetic expression over {+, *, a}
and y is the corresponding postfix Polish expression}. □
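
Since this particular PDT is deterministic, its behavior can be simulated directly, without search. The following Python sketch (an illustration added here, not the authors' code) runs the machine of Example 3.11 with the pushdown list kept in a Python list whose top is at the end; note that a pushed string such as EE+ has its top symbol at the left, so it is appended in reverse.

```python
# Illustrative sketch of the PDT of Example 3.11 (prefix to postfix Polish).
def prefix_to_postfix(x):
    """Translate a prefix Polish expression over {a, +, *} to postfix, as P does."""
    stack, out, i = ["E"], [], 0
    while stack:
        Z = stack.pop()
        if Z == "E":
            if i >= len(x):
                raise ValueError("input exhausted while expecting an expression")
            c = x[i]; i += 1
            if c == "a":
                out.append("a")              # delta(q, a, E) = (q, e, a)
            elif c in "+*":
                stack += [c, "E", "E"]       # delta(q, c, E) = (q, E E c, e)
            else:
                raise ValueError("bad symbol " + c)
        else:
            out.append(Z)                    # delta(q, e, +) and (q, e, *): pop and emit
    if i != len(x):
        raise ValueError("trailing input")
    return "".join(out)

print(prefix_to_postfix("+*aaa"))   # aa*a+
```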

DEFINITION
If P = (Q, Σ, Γ, Δ, δ, q0, Z0, F) is a pushdown transducer, then the pushdown automaton (Q, Σ, Γ, δ', q0, Z0, F), where δ'(q, a, Z) contains (r, γ) if
and only if δ(q, a, Z) contains (r, γ, y) for some y, is called the PDA underlying P.
We say that the PDT P = (Q, Σ, Γ, Δ, δ, q0, Z0, F) is deterministic
(a DPDT) when
(1) For all q ∈ Q, a ∈ Σ ∪ {e}, and Z ∈ Γ, δ(q, a, Z) contains at most
one element, and
(2) If δ(q, e, Z) ≠ ∅, then δ(q, a, Z) = ∅ for all a ∈ Σ.†
Clearly, if L is the domain of τ(P) for some pushdown transducer P, then
L = L(P'), where P' is the pushdown automaton underlying P.

†Note that this definition is slightly stronger than saying that the underlying PDA is
deterministic. The latter could be deterministic, but (1) may not hold because the PDT
can give two different outputs on two moves which are otherwise identical. Also note that
condition (2) implies that if δ(q, a, Z) ≠ ∅ for some a ∈ Σ, then δ(q, e, Z) = ∅.

Many of the results proved in Section 2.5 for pushdown automata carry
over naturally to pushdown transducers. In particular, the following lemma
can be shown in a way analogous to Lemmas 2.22 and 2.23.
LEMMA 3.1
A translation T is τ(P1) for a pushdown transducer P1 if and only if T
is τe(P2) for a pushdown transducer P2.
Proof. Exercise.
A pushdown transducer, particularly a deterministic pushdown trans-
ducer, is a useful model of the syntactic analysis phase of compiling. In Sec-
tion 3.4 we shall use the pushdown transducer in this phase of compiling.
Now we shall prove that a translation is a simple SDT if and only if it
can be defined by a pushdown transducer. Thus the pushdown transducers
characterize the class of simple SDT's in the same manner that pushdown
automata characterize the context-free languages.
LEMMA 3.2
Let T = (N, Σ, Δ, R, S) be a simple SDTS. Then there is a pushdown
transducer P such that τe(P) = τ(T).
Proof. Let Gi be the input grammar of T. We construct P to recognize
L(Gi) top-down as in Lemma 2.24.
To simulate a rule A → α, β of T, P will replace A on top of its pushdown
list by α with output symbols of β intermeshed. That is, if α = x0A1x1 ⋯ Anxn
and β = y0A1y1 ⋯ Anyn, then P will place x0y0A1x1y1 ⋯ Anxnyn on its
pushdown list. We need, however, to distinguish between the symbols of Σ
and those of Δ, so that the word xiyi can be broken up correctly. If Σ and
Δ are disjoint, there is no problem, but to take care of the general case, we
define a new alphabet Δ' corresponding to Δ but known to be disjoint
from Σ. That is, let Δ' consist of new symbols a' for each a ∈ Δ. Then
Σ ∩ Δ' = ∅. Let h be the homomorphism defined by h(a) = a' for each
a in Δ.
Let P = ({q}, Σ, N ∪ Σ ∪ Δ', Δ, δ, q, S, ∅), where δ is defined as follows:
(1) If A → x0B1x1 ⋯ Bkxk, y0B1y1 ⋯ Bkyk is a rule in R with k ≥ 0,
then δ(q, e, A) contains (q, x0y'0B1x1y'1 ⋯ Bkxky'k, e), where y'i = h(yi),
0 ≤ i ≤ k.
(2) δ(q, a, a) = {(q, e, e)} for all a in Σ.
(3) δ(q, e, a') = {(q, e, a)} for all a in Δ.
By induction on m and n, we can show that, for A in N and m, n ≥ 1,

(3.1.3) (A, A) ⇒^m (x, y) for some m if and only if (q, x, A, e) ⊢^n (q, e, e, y)
for some n

Only if: The basis, m = 1, holds trivially, as A → x, y must be in R.
Then (q, x, A, e) ⊢ (q, x, xh(y), e) ⊢* (q, e, h(y), e) ⊢* (q, e, e, y).
For the inductive step assume (3.1.3) for values smaller than m, and
let (A, A) ⇒ (x0B1x1 ⋯ Bkxk, y0B1y1 ⋯ Bkyk) ⇒^(m−1) (x, y). Because simple
SDTS's involve no permutation of the order of nonterminals, we can write
x = x0u1x1 ⋯ ukxk and y = y0v1y1 ⋯ vkyk, so that (Bi, Bi) ⇒^mi (ui, vi) for
1 ≤ i ≤ k, where mi < m for each i. Thus, by the inductive hypothesis
(3.1.3), (q, ui, Bi, e) ⊢* (q, e, e, vi). Putting these sequences of moves together,
we have

(q, x, A, e) ⊢  (q, x, x0h(y0)B1 ⋯ Bkxkh(yk), e)
             ⊢* (q, u1x1 ⋯ ukxk, h(y0)B1 ⋯ Bkxkh(yk), e)
             ⊢* (q, u1x1 ⋯ ukxk, B1 ⋯ Bkxkh(yk), y0)
             ⊢* (q, x1 ⋯ ukxk, x1h(y1) ⋯ Bkxkh(yk), y0v1)
             ⊢* ⋯ ⊢* (q, e, e, y)

If: Again the basis, n = 1, is trivial. It must be that A → e, e is in R.
For the inductive step, let the first move of P be

(q, x, A, e) ⊢ (q, x, x0h(y0)B1x1h(y1) ⋯ Bkxkh(yk), e),

where the xi's are in Σ* and the h(yi)'s denote strings in (Δ')*, with the yi's
in Δ*. Then x0 must be a prefix of x, and the next moves of P remove x0
from the input and pushdown list and then emit y0. Let x' be the
remaining input. There must be some prefix u1 of x' that causes the level
holding B1 to be popped from the pushdown list. Let v1 be emitted up to
the time the pushdown list first becomes shorter than |B1 ⋯ Bkxkh(yk)|.
Then (q, u1, B1, e) ⊢* (q, e, e, v1) by a sequence of fewer than n moves. By
inductive hypothesis (3.1.3), (B1, B1) ⇒* (u1, v1).
Reasoning in this way, we find that we can write x as x0u1x1 ⋯ ukxk and
y as y0v1y1 ⋯ vkyk so that (Bi, Bi) ⇒* (ui, vi) for 1 ≤ i ≤ k. Since rule
A → x0B1x1 ⋯ Bkxk, y0B1y1 ⋯ Bkyk is clearly in R, we have
(A, A) ⇒* (x, y).
As a special case of (3.1.3), we have (S, S) ⇒* (x, y) if and only if
(q, x, S, e) ⊢* (q, e, e, y), so τe(P) = τ(T). □
Example 3.12
The simple SDTS T having rules

E → + EE, EE +
E → * EE, EE *
E → a, a

would give rise to the pushdown transducer

P = ({q}, {a, +, *}, {E, a, +, *, a', +', *'}, {a, +, *}, δ, q, E, ∅),

where δ is defined by
(1) δ(q, e, E) = {(q, +EE+', e), (q, *EE*', e), (q, aa', e)}
(2) δ(q, b, b) = {(q, e, e)} for all b in {a, +, *}
(3) δ(q, e, b') = {(q, e, b)} for all b in {a, +, *}.
This is a nondeterministic pushdown transducer. Example 3.11 gives
an equivalent deterministic pushdown transducer. □
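
The construction in the proof of Lemma 3.2 is mechanical enough to express directly as a program. The following Python sketch (an illustration added here, under the assumption that rules are given as triples (A, α, β) over explicit symbol lists) builds the transition function of the one-state PDT and, applied to the rules of Example 3.12, reproduces the δ displayed above.

```python
# Illustrative sketch of the construction of Lemma 3.2 (not the authors' code).
N = {"E"}
RULES = [("E", ["+", "E", "E"], ["E", "E", "+"]),
         ("E", ["*", "E", "E"], ["E", "E", "*"]),
         ("E", ["a"], ["a"])]

def build_delta(rules, nonterms):
    delta = {}
    def add(q, a, Z, move):
        delta.setdefault((q, a, Z), []).append(move)
    ins = {s for _, alpha, _ in rules for s in alpha if s not in nonterms}
    outs = {s for _, _, beta in rules for s in beta if s not in nonterms}
    for A, alpha, beta in rules:
        # Split both sides around their nonterminals (same number, same order,
        # because the schema is simple); then interleave x0 y0' B1 x1 y1' ... .
        def split(gamma):
            parts = [[]]
            for s in gamma:
                if s in nonterms:
                    parts.append([])
                else:
                    parts[-1].append(s)
            return parts
        xs, ys = split(alpha), split(beta)
        Bs = [s for s in alpha if s in nonterms]
        push = []
        for k in range(len(Bs) + 1):
            push += xs[k] + [y + "'" for y in ys[k]]
            if k < len(Bs):
                push.append(Bs[k])
        add("q", "", A, (push, ""))            # (1) expand A
    for a in ins:
        add("q", a, a, ([], ""))               # (2) match an input symbol
    for b in outs:
        add("q", "", b + "'", ([], b))         # (3) emit an output symbol
    return delta

for key, moves in sorted(build_delta(RULES, N).items()):
    print(key, "->", moves)
```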

LEMMA 3.3
Let P = (Q, Σ, Γ, Δ, δ, q0, Z0, F) be a pushdown transducer. Then there
is a simple SDTS T such that τ(T) = τe(P).
Proof. The construction is similar to that of obtaining a CFG from
a PDA. Let T = (N, Σ, Δ, R, S), where
(1) N = {[pAq] | p, q ∈ Q, A ∈ Γ} ∪ {S}.
(2) R is defined as follows:
(a) If δ(p, a, A) contains (r, X1X2 ⋯ Xk, y), then if k > 0, R contains the rules

[pAqk] → a[rX1q1][q1X2q2] ⋯ [qk−1Xkqk],
         y[rX1q1][q1X2q2] ⋯ [qk−1Xkqk]

for all sequences q1, q2, . . . , qk of states in Q. If k = 0, then

the rule is [pAr] → a, y.
(b) For each q in Q, R contains the rule S → [q0Z0q], [q0Z0q].
Clearly, T is a simple SDTS. Again, by induction on m and n it is straightforward to show that

(3.1.4) ([pAq], [pAq]) ⇒^m (x, y) if and only if (p, x, A, e) ⊢^n (q, e, e, y)
for all p and q in Q and A ∈ Γ

We leave the proof of (3.1.4) for the Exercises.

Thus we have (S, S) ⇒ ([q0Z0q], [q0Z0q]) ⇒+ (x, y) if and only if
(q0, x, Z0, e) ⊢* (q, e, e, y). Hence τ(T) = τe(P). □

Example 3.13
Using the construction in the previous lemma let us build a simple SDTS
from the pushdown transducer in Example 3.11. We obtain the SDTS
T = (N, {a, +, *}, {a, +, *}, R, S), where N = {[qXq] | X ∈ {+, *, E}} ∪ {S}
and where R has the rules

S → [qEq], [qEq]
[qEq] → a, a
[qEq] → + [qEq][qEq][q + q], [qEq][qEq][q + q]
[qEq] → * [qEq][qEq][q * q], [qEq][qEq][q * q]
[q + q] → e, +
[q * q] → e, *

Notice that using transformations similar to those for removing single
productions and e-productions from a CFG, we can simplify the rules to

S → a, a
S → + SS, SS +
S → * SS, SS *          □
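
The construction of Lemma 3.3 can also be programmed in a few lines. In the Python sketch below (an illustration added here, not from the text), a nonterminal [pAq] is represented by the tuple (p, A, q); applied to the PDT of Example 3.11, the program reproduces the rules of Example 3.13 before simplification.

```python
# Illustrative sketch of the construction in Lemma 3.3.
from itertools import product

def sdts_rules(Q, delta, q0, Z0):
    rules = [("S", [(q0, Z0, q)], [(q0, Z0, q)]) for q in Q]     # rule (b)
    for (p, a, A), moves in delta.items():
        for (r, gamma, y) in moves:
            k = len(gamma)
            if k == 0:                                           # rule (a), k = 0
                rules.append(((p, A, r), [a] if a else [], [y] if y else []))
                continue
            for qs in product(Q, repeat=k):                      # all q1, ..., qk
                states = (r,) + qs
                body = [(states[i], gamma[i], qs[i]) for i in range(k)]
                rules.append(((p, A, qs[-1]),
                              ([a] if a else []) + body,         # a [rX1q1] ... [q(k-1)Xkqk]
                              ([y] if y else []) + body))        # y [rX1q1] ... [q(k-1)Xkqk]
    return rules

DELTA = {("q", "a", "E"): [("q", [], "a")],
         ("q", "+", "E"): [("q", ["E", "E", "+"], "")],
         ("q", "*", "E"): [("q", ["E", "E", "*"], "")],
         ("q", "", "+"):  [("q", [], "+")],
         ("q", "", "*"):  [("q", [], "*")]}

for lhs, alpha, beta in sdts_rules({"q"}, DELTA, "q", "E"):
    print(lhs, "->", alpha, ",", beta)
```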

THEOREM 3.2
T is a simple SDT if and only if T is τ(P) for some pushdown transducer P.
Proof. Immediate from Lemmas 3.1, 3.2, and 3.3. □
In Chapter 9 we shall introduce a machine called the pushdown processor
which is capable of defining all syntax-directed translations.

EXERCISES

3.1.1. An operator with one argument is called a unary operator, one with two
arguments a binary operator, and in general an operator with n argu-
ments is called an n-ary operator. For example, -- can be either a
unary operator (as in - - a ) or a binary operator (as in a - b). The
degree of an operator is the number of arguments it takes. Let Θ be
a set of operators each of whose degree is known and let Σ be a set
of operands. Construct context-free grammars G1 and G2 to generate
the prefix Polish and postfix Polish expressions over Θ and Σ.
"3.1.2. The "precedence" of infix operators determines the order in which
the operators are to be applied. If binary operator θ1 "takes precedence
over" θ2, then a θ2 b θ1 c is to be evaluated as a θ2 (b θ1 c). For
example, * takes precedence over +, so a + b * c means a + (b * c)
rather than (a + b) * c. Consider the Boolean operators ≡ (equivalence), ⊃ (implication), ∨ (or), ∧ (and), and ¬ (not). These operators
are listed in order of increasing precedence. ¬ is a unary operator and
the others are binary. As an example, ¬(a ∨ b) ≡ ¬a ∧ ¬b has the
implied parenthesization (¬(a ∨ b)) ≡ ((¬a) ∧ (¬b)). Construct a
CFG which generates all valid Boolean expressions over these operators and operands a, b, c with no superfluous parentheses.

"3.1.3. Construct a simple SDTS which maps Boolean expressions in infix


n o t a t i o n into prefix notation.
*3.1.4. In ALGOL, expressions can be constructed using the following binary
operators listed in order of their precedence levels. If more than one
operator appears on a level, these operators are applied in left-to-right
order. For example a − b + c means (a − b) + c.
(1) ≡   (2) ⊃   (3) ∨
(4) ∧   (5) ¬   (6) < ≤ = ≥ >
(7) + −   (8) × / ÷   (9) ↑
Construct a simple SDTS which maps infix expressions containing these
operators into postfix Polish notation.
3.1.5. Consider the following SDTS. A string of the form <x> is a single
nonterminal:
<exp> → sum <exp>(1) with <var> ← <exp>(2) to <exp>(3),†
        begin local t;
        t ← 0;
        for <var> ← <exp>(2) to <exp>(3) do
        t ← t + <exp>(1); result t
        end
<var> → <id>, <id>
<exp> → <id>, <id>
<id> → a<id>, a<id>
<id> → b<id>, b<id>
<id> → a, a
<id> → b, b
Give the translation for the sentences
(a) sum aa with a ← b to bb.
(b) sum sum a with aa ← aaa to aaaa with b ← bb to bbb.

"3.1.6. Consider the following translation scheme"


<statement> → for <var> ← <exp>(1) to <exp>(2) do <statement>,
              begin <var> ← <exp>(1);
              L: if <var> ≤ <exp>(2) then
                 begin <statement>;
                 <var> ← <var> + 1;
                 go to L
                 end
              end

†Note that this comma separates the two parts of the rule.

<var> → <id>, <id>
<exp> → <id>, <id>
<statement> → <var> ← <exp>, <var> ← <exp>
<id> → a<id>, a<id>
<id> → b<id>, b<id>
<id> → a, a
<id> → b, b

Why is this not an SDTS? What should the output be for the following
input sentence:

for a ← b to aa do baa ← bba

Hint: Apply Algorithm 3.1 duplicating the nodes labeled <var> in the
output tree.
Exercises 3.1.5 and 3.1.6 provide examples of how a language can
be extended using syntax macros. Appendix A.1 contains the details
of how such an extension mechanism can be incorporated into a
language.
3.1.7. Prove that the domain and range of any syntax-directed translation are
context-free languages.
3.1.8. Let L ⊆ Σ* be a CFL and R ⊆ Σ* a regular set. Construct an SDTS
T such that

τ(T) = {(x, y) | if x ∈ L − R, then y = 0;
                 if x ∈ L ∩ R, then y = 1}

"3.1.9. Construct an SDTS T such that

τ(T) = {(x, y) | x ∈ {a, b}* and y = a^i, where
        i = |#a(x) − #b(x)|, where #a(x) is
        the number of a's in x}

"3.1.10. Show that if Z is a regular set and M is a finite transducer, then


M ( L ) and M - I ( L ) are regular.
"3.1.11. Show that if L is a C F L and M is a finite transducer, then M ( L ) and
M - ~ ( L ) are CFL's.
"3.1.12. Let R be a regular set. Construct a finite transducer M such that
M ( L ) = L / R for any language L. With Exercise 3.1.11, this implies that
the regular sets and CFL's are closed u n d e r / R .
"3.1.13. Let R be a regular set. Construct a finite transducer M such that
M ( L ) = R/L for any language L.

3.1.14. An SDTS T = (N, Σ, Δ, R, S) is right-linear if each rule in R is of the
form

A → xB, yB

or

A → x, y

where A, B are in N, x ∈ Σ*, and y ∈ Δ*. Show that if T is right-linear,
then τ(T) is a regular translation.
**3.1.15. Show that if T ⊆ a* × b* is an SDT, then T can be defined by a finite
transducer.
3.1.16. Let us consider the class of prefix expressions over operators Θ and
operands Σ. If a1 ⋯ an is a sequence in (Θ ∪ Σ)*, compute si, the
score at position i, 0 ≤ i ≤ n, as follows:

(1) s0 = 1.
(2) If ai is an m-ary operator, let si = si−1 + m − 1.
(3) If ai ∈ Σ, let si = si−1 − 1.
Prove that a1 ⋯ an is a prefix expression if and only if sn = 0 and
si > 0 for all i < n.
*3.1.17. Let a1 ⋯ an be a prefix expression in which a1 is an m-ary operator.
Prove that the unique way to write a1 ⋯ an as a1w1w2 ⋯ wm, where
w1, . . . , wm are prefix expressions, is to choose wj, 1 ≤ j ≤ m, so that
it ends with the first ak such that sk = m − j.
"3.1.18. Show that every prefix expression with binary operators comes from
a unique infix expression with no redundant parentheses.
3.1.19. Restate and prove Exercises 3.1.16-3.1.18 for postfix expressions.
3.1.20. Complete the proof of Theorem 3.1.
"3.1.21. Prove that the order in which step (2) of Algorithm 3.1 is applied t o
nodes does not affect the resulting tree.
3.1.22. Prove Lemrna 3.1.
3.1.23. Give pushdown transducers for the simple SDT's defined by the trans-
lation schemata of Examples 3.5 and 3.7.
3.1.2,4. Construct a grammar for SNOBOL4 statements that reflects the
associativity and precedence of operators given in Appendix A.2.
3.1.25. Give an SDTS that defines the (empty store) translation of the following
PDT:

({q, p}, {a, b}, {Z0, A, B}, {a, b}, δ, q, Z0, ∅)

where δ is given by

δ(q, a, X) = {(q, AX, e)}, for all X = Z0, A, B
δ(q, b, X) = {(q, BX, e)}, for all X = Z0, A, B
δ(q, e, A) = {(p, A, a)}
δ(p, a, A) = {(p, e, b)}
δ(p, b, B) = {(p, e, b)}
δ(p, e, Z0) = {(p, e, a)}

*'3.1.26. Consider two pushdown transducers connected in series, so the output


of the first forms the input of the second. Show that with such a tandem
connection, the set of possible output strings of the second PDT can be
any recursively enumerable set.
3.1.27. Show that T is a regular translation if and only if there is a linear
context-free language L such that T = {(x, y) | xcy^R ∈ L}, where c is a
new symbol.
*3.1.28. Show that it is undecidable for two regular translations T1 and T2
whether T1 = T2.

Open Problems
3.1.29. Is it decidable whether two deterministic finite transducers are equiva-
lent ?
3.1.30. Is it decidable whether two deterministic pushdown transducers are
equivalent ?

Research Problem
3.1.31. It is known to be undecidable whether two nondeterministic finite
transducers are equivalent (Exercise 3.1.28). Thus we cannot "mini-
mize" them in the same sense that we minimized finite automata in
Section 3.3.1. However, there are some techniques that can serve to
make the number of states smaller. Can you find a useful collection of
these ? The same can be attempted for PDT's.

BIBLIOGRAPHIC NOTES

The concept of syntax-directed translation has occurred to many people. Irons


[1961] and Barnett and Futrelle [1962] were among the first to advocate its use.
Finite transducers are similar to the generalized sequential machines introduced
by Ginsburg [1962]. Our definitions of syntax-directed translation schema and
pushdown transducer along with Theorem 3.2 are similar to those of Lewis and
Stearns [1968]. Griffiths [1968] shows that the equivalence problem for nondeter-
ministic finite transducers with no e-outputs is also unsolvable.

3.2. PROPERTIES OF SYNTAX-DIRECTED


TRANSLATIONS

In this section we shall examine some of the theoretical properties of


syntax-directed translations. We shall also characterize those translations
which can be defined as simple syntax-directed translations.

3.2.1. Characterizing Languages

DEFINITION

We say that language L characterizes a translation T if there exist two
homomorphisms h1 and h2 such that T = {(h1(w), h2(w)) | w ∈ L}.

Example 3.14
The translation T = {(a^n, a^n) | n ≥ 1} is characterized by 0+, since
T = {(h1(w), h2(w)) | w ∈ 0+}, where h1(0) = h2(0) = a. □

We say that a language L ⊆ (Σ ∪ Δ')* strongly characterizes a translation T ⊆ Σ* × Δ* if
(1) Σ ∩ Δ' = ∅.
(2) T = {(h1(w), h2(w)) | w ∈ L}, where
(a) h1(a) = a for all a in Σ and h1(b) = e for all b in Δ'.
(b) h2(a) = e for all a in Σ and h2 is a one-to-one correspondence
between Δ' and Δ [i.e., h2(b) ∈ Δ for all b in Δ' and h2(b) = h2(b')
implies that b = b'].

Example 3.15
The translation T = [ ( a " , a " ) [ n > 1} i s strongly characterized by
L 1 = {a"b"! n > 1}. It is also strongly characterized by L 2 = {wlw consists
of an equal number of a's and b's]. The homomorphisms in each case are
ha(a) = a, h i ( b ) = e and h2(a)= e, hz(b)= a. T is not strongly characterized
by the language 0 +. [Z
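
The two homomorphisms of this example are simple enough to state as code. The short Python sketch below (an illustration added here, not from the text) applies them to a few members of L1 and recovers the corresponding pairs of T.

```python
# Illustrative sketch of Example 3.15: L1 = {a^n b^n} strongly characterizes
# T = {(a^n, a^n)} under h1 (erase b's) and h2 (erase a's, map b to a).
def h1(w):
    return w.replace("b", "")                        # h1(a) = a, h1(b) = e

def h2(w):
    return w.replace("a", "").replace("b", "a")      # h2(a) = e, h2(b) = a

L1 = ["a" * n + "b" * n for n in range(1, 6)]
print([(h1(w), h2(w)) for w in L1])                  # [('a', 'a'), ('aa', 'aa'), ...]
```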

We can use the concept of a characterizing language to investigate the


classes of translations defined by finite transducers and pushdown trans-
ducers.

LEMMA 3.4

Let T = (N, Σ, Δ, R, S) be an SDTS in which each rule is of the form
A → aB, bB or A → a, b for a ∈ Σ ∪ {e}, b ∈ Δ ∪ {e}, and B ∈ N. Then
τ(T) is a regular translation.
Proof. Let M be the finite transducer (N ∪ {f}, Σ, Δ, δ, S, {f}), where f
is a new symbol. Define δ(A, a) to contain (B, b) if A → aB, bB is in R, and
to contain (f, b) if A → a, b is in R. Then a straightforward induction on n
shows that

(S, x, e) ⊢^n (A, e, y) if and only if (S, S) ⇒^n (xA, yA)

It follows that (S, x, e) ⊢+ (f, e, y) if and only if (S, S) ⇒+ (x, y). The
details are left for the Exercises. Thus, τ(T) = τ(M). □

THEOREM 3.3

T is a regular translation if and only if T is strongly characterized by
a regular set.
Proof.
If: Suppose that L ⊆ (Σ ∪ Δ')* is a regular set and that h1 and h2 are
homomorphisms such that h1(a) = a for a ∈ Σ, h1(a) = e for a ∈ Δ',
h2(a) = e for a ∈ Σ, and h2 is a one-to-one correspondence from Δ' to Δ. Let
T = {(h1(w), h2(w)) | w ∈ L}, and let G = (N, Σ ∪ Δ', P, S) be a regular grammar such that L(G) = L. Then consider the SDTS U = (N, Σ, Δ, R, S),
where R is defined as follows:
(1) If A → aB is in P, then A → h1(a)B, h2(a)B is in R.
(2) If A → a is in P, then A → h1(a), h2(a) is in R.
An elementary induction shows that (A, A) ⇒^n (x, y) in U if and only if
A ⇒^n w in G, h1(w) = x, and h2(w) = y.

Thus we can conclude that (S, S) ⇒+ (x, y) in U if and only if (x, y) is in T.
Hence τ(U) = T. By Lemma 3.4, there is a finite transducer M such that
τ(M) = T.
Only if: Suppose that T ⊆ Σ* × Δ* is a regular translation, and that
M = (Q, Σ, Δ, δ, q0, F) is a finite transducer such that τ(M) = T.
Let Δ' = {a' | a ∈ Δ} be an alphabet of new symbols. Let G =
(Q, Σ ∪ Δ', P, q0) be the right-linear grammar in which P has the following
productions:
(1) If δ(q, a) contains (r, y), then q → ah(y)r is in P, where h is a homomorphism such that h(a) = a' for all a in Δ.
(2) If q is in F, then q → e is in P.
Let h1 and h2 be the following homomorphisms:

h1(a) = a for all a in Σ
h1(b) = e for all b in Δ'
h2(a) = e for all a in Σ
h2(b') = b for all b' in Δ'

We can now show by induction on m and n that (q, x, e) ⊢^m (r, e, y) for
some m if and only if q ⇒^n wr for some n, where h1(w) = x and h2(w) = y.

Thus, (q0, x, e) ⊢+ (q, e, y), with q in F, if and only if q0 ⇒+ wq ⇒ w, where
h1(w) = x and h2(w) = y. Hence, T = {(h1(w), h2(w)) | w ∈ L(G)}. Thus, L(G)
strongly characterizes T. □

COROLLARY

T is a regular translation if and only if T is characterized by a regular set.


P r o o f . Strong characterization is a special case of characterization. Thus
the "only if" portion is immediate. The "if" portion is a simple generaliza-
tion of the "if" portion of the theorem.

In much the same fashion we can show an analogous result for simple
syntax-directed translations.
THEOREM 3.4

T is a simple syntax-directed translation if and only if it is strongly characterized by a context-free language.
Proof.
If: Let T be strongly characterized by the language generated by
G1 = (N, Σ ∪ Δ', P, S), where h1 and h2 are the two homomorphisms involved.
Construct a simple SDTS T1 = (N, Σ, Δ, R, S), where R is defined by:
For each production A → w0B1w1 ⋯ Bkwk in P, let

A → h1(w0)B1h1(w1) ⋯ Bkh1(wk), h2(w0)B1h2(w1) ⋯ Bkh2(wk)

be a rule in R.
A straightforward induction on n shows that
(1) If A ⇒^n w in G1, then (A, A) ⇒^n (h1(w), h2(w)) in T1.
(2) If (A, A) ⇒^n (x, y) in T1, then there is some w such that A ⇒^n w in G1,
h1(w) = x, and h2(w) = y.
Thus, τ(T1) = T.
Only if: Let T = τ(T2), where T2 = (N, Σ, Δ, R, S), and let Δ' =
{a' | a ∈ Δ} be an alphabet of new symbols. Construct CFG G2 =
(N, Σ ∪ Δ', P, S), where P contains production A → x0y'0B1x1y'1 ⋯ Bkxky'k
for each rule A → x0B1x1 ⋯ Bkxk, y0B1y1 ⋯ Bkyk in R; y'i is yi with each
symbol a ∈ Δ replaced by a'. Let h1 and h2 be the obvious homomorphisms,
h1(a) = a for a ∈ Σ, h1(a') = e for a' ∈ Δ', h2(a) = e for a ∈ Σ, and h2(a') = a
for a ∈ Δ. Again it is elementary to prove by induction that

(1) If A ⇒^n w in G2, then (A, A) ⇒^n (h1(w), h2(w)) in T2.
(2) If (A, A) ⇒^n (x, y) in T2, then for some w, we have A ⇒^n w in G2, h1(w) = x,

and h2(w) = y. □

COROLLARY

A translation is a simple SDT if and only if it is characterized by a context-


free language. Eli

We can use Theorems 3.3 and 3.4 to show that certain translations are not
regular translations or not simple SDT's. It is easy to show that the domain
and range of every simple SDT is a CFL. But there are simple syntax-directed
translations whose domain and range are regular sets but which cannot be
specified by any finite transducer or even pushdown transducer.

Example 3.16
Consider the simple SDTS T with rules

S → 0S, S0
S → 1S, S1
S → e, e

τ(T) = {(w, w^R) | w ∈ {0, 1}*}. We shall show that τ(T) is not a regular translation.
Suppose that τ(T) is a regular translation. Then there is some regular
language L which strongly characterizes τ(T). We can assume without loss of
generality that L ⊆ {0, 1, a, b}*, and that the two homomorphisms involved
are h1(0) = 0, h1(1) = 1, h1(a) = h1(b) = e and h2(0) = h2(1) = e, h2(a) = 0,
h2(b) = 1.
If L is regular, it is accepted by a finite automaton

M = (Q, {0, 1, a, b}, δ, q0, F)

with s states for some s. There must be some z ∈ L such that h1(z) = 0^s 1^s
and h2(z) = 1^s 0^s. This is because (0^s 1^s, 1^s 0^s) ∈ τ(T). All 0's precede all
1's in z, and all b's precede all a's. Thus the first s symbols of z are only
0's and b's. If we consider the states entered by M when reading the first s
symbols of z, we see that these cannot all be different; we can write z = uvw
such that (q0, z) ⊢* (q, vw) ⊢+ (q, w) ⊢* (p, e), where |uv| ≤ s, |v| ≥ 1, and
p ∈ F. Then uvvw is in L. But h1(uvvw) = 0^(s+m) 1^s and h2(uvvw) = 1^(s+n) 0^s,
where not both m and n are zero. Thus, (0^(s+m) 1^s, 1^(s+n) 0^s) ∈ τ(T), a contradiction. We conclude that τ(T) is not a regular translation. □

Example 3.17
Consider the SDTS T with the rules

S → A(1)cA(2), A(2)cA(1)
A → 0A, 0A
A → 1A, 1A
A → e, e

Here τ(T) = {(ucv, vcu) | u, v ∈ {0, 1}*}. We shall show that τ(T) is not
a simple SDT.
Suppose that L is a CFL which strongly characterizes τ(T). We can suppose that Δ' = {c', 0', 1'}, L ⊆ ({0, 1, c} ∪ Δ')*, and that h1 and h2 are the
obvious homomorphisms. For every u and v in {0, 1}*, there is a word zuv
in L such that h1(zuv) = ucv and h2(zuv) = vcu. We consider two cases,
depending on whether c precedes or follows c' in certain of the zuv's.
Case 1: For all u there is some v such that c precedes c' in zuv. Let R be
the regular set {0, 1, 0', 1'}*c{0, 1, 0', 1'}*c'{0, 1, 0', 1'}*. Then L ∩ R is
a CFL, since the CFL's are closed under intersection with regular sets.
Note that L ∩ R is the set of sentences in L in which c precedes c'. Let M be
the finite transducer which, until it reads c, transmits 0's and 1's, while
skipping over primed symbols. After reading c, M does nothing until it
reaches c'. Subsequently, M prints 0 for 0' and 1 for 1', skipping over 0's
and 1's. Then M(L ∩ R) is a CFL, since the CFL's are closed under finite
transductions, and in this case M(L ∩ R) = {uu | u ∈ {0, 1}*}. The latter is not
a CFL by Example 2.41.
Case 2: For some u there is no v such that c precedes c' in zuv. Then for
every v there is a u such that c' precedes c in zuv. An argument similar to
case 1 shows that if L were a CFL, then {vv | v ∈ {0, 1}*} would also be
a CFL. We leave this argument for the Exercises.
We conclude that τ(T) is not strongly characterized by any context-free
language and hence is not a simple SDT. □

Let ℑr denote the class of regular translations, ℑs the simple SDT's, and
ℑ the SDT's. From these examples we have the following result.
THEOREM 3.5
ℑr ⊊ ℑs ⊊ ℑ.
Proof. ℑs ⊆ ℑ is by definition. ℑr ⊆ ℑs is immediate when one realizes
that a finite transducer is a special case of a PDT. Proper inclusion follows
from Examples 3.16 and 3.17. □

3.2.2. Properties of Simple SDT's

Using the idea of a characterizing language, we can prove analogs for


many of the normal form theorems of Section 2.6. We shall mention two of
them here and leave some others for the Exercises. The first is an analog of
Chomsky normal form.
THEOREM 3.6
Let T be a simple SDT. Then T = τ(T1), where T1 = (N, Σ, Δ, R, S) is
a simple SDTS such that each rule in R is of one of the forms
(1) A → BC, BC, where A, B, and C are (not necessarily distinct) members of N, or
(2) A → a, b, where exactly one of a and b is e and the other is in Σ or
Δ, as appropriate.
(3) S → e, e if (e, e) is in T and S does not appear on the right of any
rule.
Proof. Apply the construction of Theorem 3.4 to a grammar in CNF. □
The second is an analog of Greibach normal form.
THEOREM 3.7
Let T be a simple SDT. Then T = τ(T1), where T1 = (N, Σ, Δ, R, S) is
a simple SDTS such that each rule in R is of the form A → aα, bα, where α
is in N*, exactly one of a and b is e, and the other is in Σ or Δ [with the same
exception as case (3) of the previous theorem].
Proof. Apply the construction of Theorem 3.4 to a grammar in GNF. □
We comment that in the previous two theorems we cannot make both a
be in Σ and b in Δ at the same time. Then the translation would be length
preserving, which is not always the case for an arbitrary SDT.

3.2.3. A Hierarchy of SDT's

The main result of this section is that there is no analog of Chomsky


normal form for arbitrary syntax-directed translations. With one exception,
each time we increase the number of nonterminals which we allow on the right
side of rules of an SDTS, we can define a strictly larger class of SDT's. Some
other interesting properties of SDT's are proved along the way.
DEFINITION
Let T = (N, Σ, Δ, R, S) be an SDTS. We say that T is of order k if for
no rule A → α, β in R does α (equivalently β) have more than k instances of
nonterminals. We also say that τ(T) is of order k. Let ℑk be the class of all
SDT's of order k.

Obviously, ℑ1 ⊆ ℑ2 ⊆ ⋯ ⊆ ℑk ⊆ ⋯. We shall show that each of these
inclusions is proper, except that ℑ3 = ℑ2. A sequence of preliminary results
is needed.
LEMMA 3.5
ℑ1 ⊊ ℑ2.
Proof. It is elementary to show that the domain of an SDTS of order 1
is a linear CFL. However, by Theorem 3.6, every simple SDT is of order 2,
and every CFL is the domain of some simple SDT (say, the identity translation with that language as domain). Since the linear languages are a proper
subset of the CFL's (Exercise 2.6.7), the inclusion of ℑ1 in ℑ2 is proper. □
There are various normal forms for SDTS's. We claim that it is possible
to eliminate useless nonterminals from an SDTS as from a CFG. Also, there
is a normal form for SDTS's somewhat similar to CNF. All rules can be put
in a form where the right side consists wholly of nonterminals or has no
nonterminals.
DEFINITION
A nonterminal A in an SDTS T = (N, Σ, Δ, R, S) is useless if either

(1) There exist no x ∈ Σ* and y ∈ Δ* such that (A, A) ⇒* (x, y), or
(2) For no α1 and α2 in (N ∪ Σ)* and β1 and β2 in (N ∪ Δ)* does
(S, S) ⇒* (α1Aα2, β1Aβ2).
LEMMA 3.6
Every SDT of order k is defined by an SDTS of order k with no useless
nonterminals.
Proof. Exercise analogous to Theorem 2.13. □

LEMMA 3.7
Every SDT T of order k ≥ 2 is defined by an SDTS T1 = (N, Σ, Δ, R, S),
where if A → α, β is in R, then either
(1) α and β are in N*, or
(2) α is in Σ* and β in Δ*.
Moreover, T1 has no useless nonterminals.
Proof. Let T2 = (N', Σ, Δ, R', S) be an SDTS with no useless nonterminals such that τ(T2) = T. We construct R from R' as follows. Let
A → x0B1x1 ⋯ Bkxk, y0C1y1 ⋯ Ckyk be a rule in R', with k > 0. Let π
be the permutation on the set of integers 1 to k such that the nonterminal
Bi is associated with the nonterminal Cπ(i). Introduce new nonterminals
A', D1, . . . , Dk and E0, . . . , Ek, and replace the rule by

A → E0A', E0A'
E0 → x0, y0
A' → D1 ⋯ Dk, D'1 ⋯ D'k   where Di = D'π(i) for 1 ≤ i ≤ k
Di → BiEi, BiEi           for 1 ≤ i ≤ k
Ei → xi, yπ(i)            for 1 ≤ i ≤ k

For example, if the rule is A → x0B1x1B2x2B3x3, y0B3y1B1y2B2y3, then
π = (2, 3, 1). We would replace this rule by

A → E0A', E0A'
E0 → x0, y0
A' → D1D2D3, D3D1D2
Di → BiEi, BiEi, for i = 1, 2, 3
E1 → x1, y2
E2 → x2, y3
E3 → x3, y1

Since each Di and Ei has only one rule, it is easy to see that the effect of
all these new rules is exactly the same as the rule they replace. Rules in R'
with no nonterminals on the right are placed directly in R. Let N be N'
together with the new nonterminals. Then τ(T2) = τ(T1) and T1 satisfies
the conditions of the lemma. □

LEMMA 3.8
ℑ2 = ℑ3.
Proof. It suffices, by Lemma 3.7, to show how a rule of the form
A → B1B2B3, C1C2C3 can be replaced by two rules with two nonterminals in
each component of the right side. Let π be the permutation such that Bi is
associated with Cπ(i). There are six possible values for π. In each case, we can
introduce a new nonterminal D and replace the rule in question by two rules
as shown in Fig. 3.6.

π(1) π(2) π(3)    Rules

1    2    3       A → B1D, B1D     D → B2B3, B2B3
1    3    2       A → B1D, B1D     D → B2B3, B3B2
2    1    3       A → DB3, DB3     D → B1B2, B2B1
2    3    1       A → DB3, B3D     D → B1B2, B1B2
3    1    2       A → B1D, DB1     D → B2B3, B2B3
3    2    1       A → B1D, DB1     D → B2B3, B3B2

Fig. 3.6 New rules.

It is straightforward to check that the effect of the new rules is the same
as the old in each case.

LEMMA 3.9

Every SDT of order k ≥ 2 has an SDTS T = (N, Σ, Δ, R, S) satisfying
Lemma 3.7, and the following:
(1) There is no rule of the form A → B, B in R.
(2) There is no rule of the form A → e, e in R (unless A = S, and then
S does not appear on the right side of any rule).
Proof. Exercise analogous to Theorems 2.14 and 2.15. □

We shall now define a family of translations Tk for k ≥ 4 such that Tk
is of order k but not of order k − 1. Subsequent lemmas will prove this.
DEFINITION

Let k ≥ 4. Define Σk, for the remainder of this section only, to be
{a1, . . . , ak}. Define the permutation πk, for k even, by

πk(i) = (k + i + 1)/2,   if i is odd
πk(i) = i/2,             if i is even

Thus, π4 is [3, 1, 4, 2] and π6 is [4, 1, 5, 2, 6, 3]. Define πk for k odd by

πk(i) = (k + 1)/2,       if i = 1
πk(i) = k − i/2 + 1,     if i is even
πk(i) = (i − 1)/2,       if i is odd and i ≠ 1

Thus, π5 = [3, 5, 1, 4, 2] and π7 = [4, 7, 1, 6, 2, 5, 3].

Let Tk be the one-to-one correspondence which takes

a1^i1 a2^i2 ⋯ ak^ik   to   aπ(1)^iπ(1) aπ(2)^iπ(2) ⋯ aπ(k)^iπ(k)

where π(j) abbreviates πk(j). For example, if a1, a2, a3, and a4 are called a, b, c, and d, then

T4 = {(a^i b^j c^k d^l, c^k a^i d^l b^j) | i, j, k, l ≥ 0}
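
Since the piecewise definition of πk is easy to misread, the short Python sketch below (an illustration added here, not from the text) computes πk and the image of a string under Tk, and checks the values quoted above.

```python
# Illustrative check of pi_k and T_k, for 1 <= i <= k.
def pi(k, i):
    if k % 2 == 0:
        return (k + i + 1) // 2 if i % 2 == 1 else i // 2
    if i == 1:
        return (k + 1) // 2
    return k - i // 2 + 1 if i % 2 == 0 else (i - 1) // 2

print([pi(4, i) for i in range(1, 5)])   # [3, 1, 4, 2]
print([pi(6, i) for i in range(1, 7)])   # [4, 1, 5, 2, 6, 3]
print([pi(5, i) for i in range(1, 6)])   # [3, 5, 1, 4, 2]
print([pi(7, i) for i in range(1, 8)])   # [4, 7, 1, 6, 2, 5, 3]

def T(k, exponents):
    """Image under T_k of a1^i1 ... ak^ik, given the exponent list [i1, ..., ik]."""
    return "".join(("a%d" % pi(k, j)) * exponents[pi(k, j) - 1] for j in range(1, k + 1))

# With a1, ..., a4 written a, b, c, d:  a b^2 c^3 d^4  ->  c^3 a d^4 b^2
print(T(4, [1, 2, 3, 4]))   # a3 a3 a3 a1 a4 a4 a4 a4 a2 a2, i.e. c c c a d d d d b b
```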

In what follows, we shall assume that k is a fixed integer, k ≥ 4, and
that there is some SDTS T = (N, Σk, Σk, R, S) of order k − 1 which defines
Tk. We assume without loss of generality that T satisfies Lemma 3.9, and
hence Lemmas 3.6 and 3.7. We shall prove, by contradiction, that T cannot
exist.
DEFINITION

Let Θ be a subset of Σk and A ∈ N. (Recall that we are referring to the
hypothetical SDTS T.) We say that Θ is (A, d)-bounded in the domain (alt.
range) if for every (x, y) such that (A, A) ⇒* (x, y) in T, there is some a ∈ Θ
such that x (alt. y) has no more than d occurrences of a. If Θ is not (A, d)-bounded
in the domain (alt. range) for any d, we say that A covers Θ in the domain
(alt. range).
LEMMA 3.10
If A covers Θ in the domain, then it covers Θ in the range, and conversely.
Proof. Suppose that A covers Θ in the domain, but that Θ is (A, d)-bounded in the range. By Lemma 3.6, there exist w1, w2, w3, and w4 in Σk*
such that (S, S) ⇒* (w1Aw2, w3Aw4). Let m = |w3w4|. Since A covers Θ in
the domain, there exist w5 and w6 in Σk* such that (A, A) ⇒* (w5, w6), and for
all a ∈ Θ, w5 has more than m + d occurrences of a. However, since Θ is
(A, d)-bounded in the range, there is some b ∈ Θ such that w6 has no more
than d occurrences of b. But (w1w5w2, w3w6w4) would be a member of Tk
under these circumstances, although w1w5w2 has more than m + d occurrences of b and w3w6w4 has no more than m + d occurrences of b.
By contradiction, we see that if Θ is covered by A in the domain, then
it is also covered by A in the range. The converse is proved by a symmetric
argument. □

As a consequence of Lemma 3.10, we are entitled to say that A covers E


without mentioning domain or range.
LEMMA 3.11
Let A cover Σk. Then there is a rule A → B1 ⋯ Bm, C1 ⋯ Cm in R,
and sets Θ1, . . . , Θm, whose union is Σk, such that Bi covers Θi, for 1 ≤ i
≤ m.
Proof Let do be the largest finite integer such that for some X ~ Xk and
Bi, 1 < i _< m, X is (B~, d0)-bounded but not (B~, do -- 1)-bounded. Clearly,
do exists. Define d~ = do(k -- 1) + 1. There must exist strings x and y in X~
such that (A, A) ==~ (x, y), and for aU a E Xk, x and y each have at least d~
occurrences of a, for otherwise Xk would be (A, dl)-bounded.
Let the first step of the derivation (A, A) ==~ (x, y) be (A, A) ==~
(B~ . . . Bin, C 1 " " C~). Since T is assumed to be of order k - 1, we have
m ~ k -- 1. We can write x = x ~ . . . xm so that (B~, Bi) =-~ (xt, yi) for some y~.

If a is an arbitrary element of E k, it is not possible that none of x i has more


than do occurrences of a, because then x would have no more than do(k -- 1)
= dl -- 1 occurrences of a. Let ®t be the subset of £k such that xt has more
than do occurrences of all and only those members of Oi. By the foregoing,
O1 U 02 t,A . . . U O m = £k" We claim that Bt covers Ot for each i. For if
not, then O~ is (Be, d)-bounded for some d > do. By our choice of do, this is
impossible. D
DEFINITION

Let ai, aj, and al be distinct members of Σk. We say that aj is between
ai and al if either
(1) i < j < l, or
(2) πk(i) < πk(j) < πk(l).
Thus a symbol is formally between two others if it appears physically between
them either in the domain or range of Tk.
LEMMA 3.12
Let A cover Σk, and let A → B1 ⋯ Bm, C1 ⋯ Cm be a rule satisfying
Lemma 3.11. If Bi covers {ar} and also covers {as}, and at is between ar and
as, then Bi covers {at}, and for no j ≠ i does Bj cover {at}.
P r o o f Let us suppose that r < t < s. Suppose that Bj covers {a,}, j ~ i.
There are two cases to consider, depending on whether j < i or j > i.
Case 1: j < i. Since in the underlying grammar of T, Bt derives a string
g
with a, in it and Bj derives a string with a, in it, we have (A, A):=~ (x, y),
where x has an instance of at preceding an instance of a t. Then by Lemma
3.6, there exists such a sentence in the domain of Tk, which we know not to
be the case.
Case 2: j > i. Allow Be to derive a sentence with a, in it, and we can
similarly find a sentence in the domain of T k with a, preceding a,.
By contradiction, we rule out the possibility that r < t < s. The only
other possibility, that nk(r) < nk(t) < nk(S), is handled similarly, reasoning
about the range of T k. Thus no Bj, j ~ i, covers {at}. If Bj covers £, where
a, e £, then B~ certainly covers {a,}. Thus by Lemma 3.11, Bt covers some
set containing a,, and hence covers {a,}. D

LEMMA 3.13
If A covers Σk, k ≥ 4, then there is some rule A → B1 ⋯ Bm, C1 ⋯ Cm
and some i, 1 ≤ i ≤ m, such that Bi covers Σk.
Proof. We shall do the case in which k is even. The case of odd k is simi-
lar and will be left for the Exercises. Let A ~ B ~ . . . Bin, C x "'" Cm be a rule
satisfying Lemma 3.11. Since m ~ k -- 1 by hypothesis about T, there must

be some Bt which covers two members of Ek, say Bt covers [a,, as}, r ~ s.
Hence, Bt covers {at} and [as}, and by Lemma 3.12, if at is between ar and as,
then Bt covers (at} and no C i, j ~ i, covers {at}.
If we consider the range of Tk, we see that, should Bt cover (ak/z] and
{akin+ ~}, then it covers (a} for all a ~ E~, and no other Bj covers any {a}.
It will follow by Lemma 3.11 that Bt covers Ek. Reasoning further, if Bt
covers [am} and {a,}, where m < k/2 and n > k/2, then consideration of
the domain assures us that B~ covers {ak/2} and (ak/2+ ~}.
Thus, if one of r and s is equal to or less than k/2, while the other is greater
than k/2, the desired result is immediate.
The other cases are that r < k/2 and s < k/2 or r > k/2, s > k/2. But in
the range, any distinct r and s, both equal to or less than k/2, have some at,
t > k/2, between them. Likewise, if r > k/2 and s > k/2, we find some at,
t < k/2, between them. The lemma thus follows in any case. [~]

LEMMA 3.14
Tk is in ℑk − ℑk−1, for k ≥ 4.
Proof. Clearly, Tk is in ℑk. It suffices to show that T, the hypothetical
SDTS of order k − 1, does not exist. Since S certainly covers Σk, by Lemma
3.13 we can find a sequence of nonterminals A0, A1, . . . , A#N in N, where
A0 = S and for 0 ≤ i < #N, there is a rule Ai → αiAi+1βi, γiAi+1δi. Moreover, for all i, Ai covers Σk. Not all the A's can be distinct, so we can find
i and j, with i < j and Ai = Aj. By Lemma 3.6, we can find w1, . . . , w10 so
that for all p ≥ 0,

(S, S) ⇒* (w1Aiw2, w3Aiw4)
       ⇒* (w1w5Aiw6w2, w3w7Aiw8w4)
       ⇒* (w1w5^p Ai w6^p w2, w3w7^p Ai w8^p w4)
       ⇒* (w1w5^p w9 w6^p w2, w3w7^p w10 w8^p w4).

By Lemma 3.9(1) we can assume that not all of αi, βi, γi, and δi are e, and
by Lemma 3.9(2) that not all of w5, w6, w7, and w8 are e.
For each a ∈ Σk, it must be that w5w6 and w7w8 have the same number of
occurrences of a, or else there would be a pair in τ(T) not in Tk. Since Ai
covers Σk, should w5, w6, w7, and w8 have any symbol but a1 or ak, we could
easily choose w9 to obtain a pair not in Tk. Hence there is an occurrence of
a1 or ak in w7 or w8. Since Ai covers Σk again, we could choose w10 to yield
a pair not in Tk. We conclude that T does not exist, and that Tk is not in
ℑk−1. □

THEOREM 3.8
With the exception of k = 2, ℑk is properly contained in ℑk+1 for k ≥ 1.
Proof. The case k = 1 is Lemma 3.5. The other cases are Lemma 3.14.
□
An interesting practical consequence of Theorem 3.8 is that while it may
be attractive to build a compiler writing system that assumes the underlying
grammar to be in Chomsky normal form, such a system is not capable of
performing any syntax-directed translation of which a more general system
is capable. However, it is likely that a practically motivated SDT would at
worst be in ℑ3 (and hence in ℑ2).

EXERCISES

"3.2.1. Let T be a SDT. Show that there is a constant c such that for each x in
the domain of T, there exists y such that (x, y) ∈ T and |y| ≤ c(|x| + 1).
*3.2.2 (a) Show that if T1 is a regular translation and T2 is an SDT, then
T1 ∘ T2 = {(x, z) | for some y, (x, y) ∈ T1 and (y, z) ∈ T2} is an SDT.†
(b) Show that T1 ∘ T2 is simple if T2 is.
3.2.3 (a) Show that if T is an SDT, then T-~ is an SDT.
(b) Show that T-1 is simple if T is.
*3.2.4 (a) Let T~ be a regular translation and T2 an SDT. Show that T2 o T~
is an SDT.
(b) Show that T2 o T~ is simple if T2 is.
3.2.5. Give strong characterizing languages for
(a) The SDT Example 3.5.
(b) The SDT of Example 3.7.
(c) The SDT of Example 3.12
3.2.6. Give characterizing languages for the SDT's of Exercise 3.2.5 which do
not strongly characterize them.
3.2.7. Complete the proof of Lemma 3.4.
3.2.8. Complete case 2 of Example 3.17.
3.2.9. Show that every simple SDT is defined by a simple SDTS with no useless
nonterminals.
3.2.10. Let T_1 be a simple SDT and T_2 a regular translation. Is T_1 ∘ T_2 always
        a simple SDT?
3.2.11. Prove Lemma 3.6.

†Often, this operation on translations, called composition, is written with the operands
in the opposite order. That is, our definition above would be written T_2 ∘ T_1 rather than
T_1 ∘ T_2. We shall nevertheless use the definition given here, for the sake of natural appearance.

3.2.12. Prove Lemma 3.9.


3.2.13. Give an SDTS of order k for T_k.
3.2.14. Let T = (N, Σ, Δ, R, S), where N = {S, A, B, C, D}, Σ = {a, b, c, d}, and
        R has the rules

            A → aA, aA
            A → e, e
            B → bB, bB
            B → e, e
            C → cC, cC
            C → e, e
            D → dD, dD
            D → e, e

        and one other rule. Give the minimum order of τ(T) if that additional
        rule is
        (a) S → ABCD, ABCD.
        (b) S → ABCD, BCDA.
        (c) S → ABCD, DBCA.
        (d) S → ABCD, BDAC.
3.2.15. Show that if T is defined by a DPDT, then T is strongly characterized by
a deterministic context-free language.
3.2.16. Is the converse of Exercise 3.2.15 true?
3.2.17. Prove the corollaries to Theorems 3.3 and 3.4.

BIBLIOGRAPHIC NOTES

The concept of a characterizing language and the results of Sections 3.2.1 and
3.2.2 are from Aho and Ullman [1969b]. The results of Section 3.2.3 are from
Aho and Ullman [1969a].

3.3. LEXICAL ANALYSIS

Lexical analysis is the first phase of the compiling process. In this phase,
characters from the source program are read and collected into single logical
items called tokens. Lexical analysis is important in compilation for several
reasons. Perhaps most significant, replacing identifiers and constants in
a program by single tokens makes the representation of a program much
more convenient for later processing. Lexical analysis further reduces the
length of the representation of the program by removing irrelevant blanks

and comments from the representation of the source program. During sub-
sequent stages of compilation, the compiler may make several passes over
the internal representation of the program. Consequently, reducing the length
of this representation by lexical analysis can reduce the overall compilation
time.
In many situations the constructs we choose to isolate as tokens are some-
what arbitrary. For example, if a language allows complex number constants
of the form

<complex constant> → (<real>, <real>)


then two strategies are possible. We can treat <real> as a lexical item and
defer recognition of the construct (<real>, <real>) as a complex constant until
syntactic analysis. Alternatively, utilizing a more complicated lexical analyzer,
we might recognize the construct (<real>, <real>) as a complex constant at
the lexical level and pass the token identifier to the syntax analyzer. It is also
important to note that the variations in the terminal character set local to
one computer center can be confined to the lexical level.
Much of the activity that occurs during lexical analysis can be modeled
by finite transducers acting in series or parallel. As an example, we might
have a series of finite transducers constituting the lexical analyzer. The first
transducer in this chain might remove all irrelevant blanks from the source
program, the second might suppress all comments, the third might search for
constants, and so forth. Another possibility might be to have a collection of
finite transducers, one of which would be activated to look for a certain
lexical construct.
In this section we shall discuss techniques which can be used in the con-
struction of efficient lexical analyzers. As mentioned in Section 1.2.1, there
are essentially two kinds of lexical analyzers--direct and indirect. We shall
discuss how to design both from the regular expressions that describe the
tokens involved.

3.3.1. An Extended Language for Regular Expressions

The sets of allowable character strings that form the identifiers and other
tokens of programming languages are almost invariably regular sets. For
example, F O R T R A N identifiers are described by "from one to six letters or
digits, beginning with a letter." This set is clearly regular and has the regular
expression

    (A + ··· + Z)(e + (A + ··· + Z + 0 + ··· + 9)(e + (A + ··· + Z + 0 + ··· + 9)
    (e + (A + ··· + Z + 0 + ··· + 9)(e + (A + ··· + Z + 0 + ··· + 9)
    (e + (A + ··· + Z + 0 + ··· + 9))))))

Since the above expression is cumbersome, it would be wise to introduce
extended regular expressions that would describe this and other regular
expressions of practical interest conveniently.
DEFINITION
An extended regular expression and the regular set it denotes are defined
recursively as follows:
(1) If R is a regular expression, then it is an extended regular expression
and denotes itself.†
(2) If R is an extended regular expression, then
    (a) R+ is an extended regular expression and denotes RR*.
    (b) R*n is an extended regular expression and denotes
        {e} ∪ R ∪ RR ∪ ··· ∪ R^n.
    (c) R+n is an extended regular expression and denotes
        R ∪ RR ∪ ··· ∪ R^n.
(3) If R_1 and R_2 are extended regular expressions, then R_1 ∩ R_2 and
R_1 − R_2 are extended regular expressions and denote {x | x ∈ R_1 and x ∈ R_2}
and {x | x ∈ R_1 and x ∉ R_2}, respectively.
(4) Nothing else is an extended regular expression.
CONVENTION
We shall use | in extended regular expressions in place of the binary +
operator (union) to make the distinction between the latter operator and
the unary + and +n operators more apparent.
Another useful facility when defining regular sets is the ability to give
names to regular sets. We must be careful not to make such definitions cir-
cular, or we have essentially a system of defining equations, similar to that
in Section 2.6, capable of defining any context-free language. There is, in
principle, nothing wrong with using the power of defining equations to make
our definitions of tokens (or using a pushdown transducer to recognize
tokens). However, as a general rule, the lexical analyzer has simple structure,
normally that of a finite automaton. Thus we prefer to use a definition mecha-
nism that can define only regular sets and from which finite transducers can
be readily constructed. The inherently context-free portions of a language
are analyzed during the parsing phase, which is considerably more complex
than the lexical phase.
†Recall that we do not distinguish between a regular expression and the set it denotes
if the distinction is clear.

DEFINITION
A sequence of regular definitions over alphabet Σ is a list of definitions
A_1 = R_1, A_2 = R_2, ..., A_n = R_n, where A_1, ..., A_n are distinct symbols
not in Σ and, for 1 ≤ i ≤ n, R_i is an extended regular expression over
Σ ∪ {A_1, ..., A_{i−1}}. We define R'_i, for 1 ≤ i ≤ n, an extended regular expres-
sion over Σ, recursively as follows:
(1) R'_1 = R_1.
(2) R'_i is R_i with R'_j substituted for each instance of A_j, 1 ≤ j < i.
The set denoted by A_i is the set denoted by R'_i.
It should be clear that the sets denoted by extended regular expressions
and sequences of regular definitions are regular. A proof is requested in
the Exercises.

Example 3.18
We can specify the FORTRAN identifiers by the following sequence of
regular definitions:

    <letter> = A | B | ··· | Z
    <digit> = 0 | 1 | ··· | 9
    <identifier> = <letter>(<letter> | <digit>)*5

If we did not wish to allow the keywords of FORTRAN to be used as
identifiers, then we could revise the definition of <identifier> to exclude those
strings. Then the last definition should read

    <identifier> = (<letter>(<letter> | <digit>)*5) − (DO | IF | ···)   □
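For comparison, the same set can be written directly in a modern pattern notation. The sketch below is my own illustration, not part of the original text; it uses Python's re module, whose character classes and bounded repetition play the role of the extended operators above.

    import re

    # FORTRAN identifiers of Example 3.18: a letter followed by up to five
    # further letters or digits (a sketch in Python's re notation).
    identifier = re.compile(r'[A-Z][A-Z0-9]{0,5}')

    assert identifier.fullmatch('X1')
    assert not identifier.fullmatch('1X')          # must begin with a letter
    assert not identifier.fullmatch('ABCDEFG')     # more than six characters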


Example 3.19
We can define the usual real constants, such as 3.14159, −682, or 6.6E−29,
by the following sequence of regular definitions†:

    <digit> = 0 | 1 | ··· | 9
    <sign> = + | − | e
    <integer> = <sign><digit>+
    <decimal> = <sign>(<digit>* . <digit>+ | <digit>+ . <digit>*)
    <constant> = <integer> | <decimal> | <decimal>E<integer>

†A specific implementation of a language would usually impose a restriction on the
length of a constant.
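The substitution of R'_j for A_j described in the definition above is easy to mechanize. The sketch below is my own illustration (names and the use of Python's re syntax as a stand-in for extended regular expressions are assumptions): it expands the definitions of Example 3.19 textually, in order, and checks the resulting pattern against the three constants mentioned.

    import re

    # Each right side may refer only to earlier names; expansion substitutes
    # the already-expanded definitions, exactly as in the construction of R'_i.
    definitions = [
        ('digit',    r'[0-9]'),
        ('sign',     r'(\+|-|)'),
        ('integer',  r'{sign}{digit}+'),
        ('decimal',  r'{sign}({digit}*\.{digit}+|{digit}+\.{digit}*)'),
        ('constant', r'({integer}|{decimal}|{decimal}E{integer})'),
    ]

    expanded = {}
    for name, rhs in definitions:
        expanded[name] = rhs.format(**expanded)   # substitute earlier names

    constant = re.compile(expanded['constant'])
    for s in ('3.14159', '-682', '6.6E-29'):
        assert constant.fullmatch(s)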

3.3.2. Indirect Lexical Analysis

In indirect lexical analysis, we are expected to determine, while scanning
a string of characters, whether a substring forming a particular token appears.
If the set of possible strings of characters which can form this token is denoted
by a regular set, as it usually can be, then the problem of building an indirect
lexical analyzer for this token can be thought of as a problem in the imple-
mentation of a finite transducer. The finite transducer is almost a finite
automaton in that it looks at the input without producing any output until
it has determined that a token of the given type is present (i.e., it reaches
a final state). It then signals that this token has appeared, and the output is
the string of symbols constituting the token.
Obviously, reaching the final state is itself such an indication. However, a lexical ana-
lyzer may have to examine one or more symbols beyond the right end of
the token. A simple example is that we cannot determine the right end of
an ALGOL identifier until we encounter a symbol that is neither a letter nor
a digit, and such symbols are normally not considered part of the identifier.
In indirect lexical analysis it is possible to accept an output from the lexical
analyzer which says that a certain token might appear, and if we later discover
that this token does not appear, then backtracking of the parsing algorithm
will ensure that the analyzer for the correct token is eventually set to work
on the same string. Using indirect lexical analysis we must be careful that we
do not perform any erroneous bookkeeping operations. Normally, we should
not enter an identifier in the symbol table until we are sure that it is a valid
identifier. (Alternatively, we can provide a mechanism for deleting entries
from tables.)
The problem of indirect lexical analysis is thus essentially the problem of
constructing a deterministic finite automaton from a regular expression and
its implementation in software. The results of Chapter 2 convince us that
the construction is possible, although much work is involved. It turns out
that it is not hard to go directly from a regular expression to a nondetermin-
istic finite automaton. We can then use Theorem 2.3 to convert to a deter-
ministic one or we can simulate the nondeterministic finite automaton by
keeping track of all possible move sequences in parallel. In direct lexical
analysis as well, it is convenient to begin the design of a direct lexical analyzer
with concise nondeterministic finite automata for each of the tokens.
The nondeterministic finite automaton can be constructed by an algorithm
similar to the one by which right-linear grammars were constructed from
regular expressions in Section 2.2. It is rather tricky to extend the construc-
tion of nondeterministic automata to all the extended regular expressions
directly, especially since the ∩ and − operations imply constructions on
deterministic automata. (It is very difficult to prove that R_1 ∩ R_2 or
R_1 − R_2 is regular if R_1 and R_2 are defined by nondeterministic automata
without somehow making reference to deterministic automata. On the other
hand, proofs of closure under ∪, ·, and * need no reference to deterministic
automata.) However, the operators +, +n, and *n are handled naturally.

ALGORITHM 3.2
Construction of a nondeterministic finite automaton from an extended
regular expression.

Input. An extended regular expression R over alphabet Σ, with no
instance of the symbol ∅ and no occurrence of the operators ∩ or −.
Output. A nondeterministic finite automaton M such that T(M) = R.
Method.
(1) Execute step (2) recursively, beginning with expression R. Let M be
the automaton constructed by the first call of that step.
(2) Let R_0 be the extended regular expression to which this step is applied.
A nondeterministic finite automaton M_0 is constructed. Several cases occur:
    (a) R_0 is the symbol e. Let M_0 = ({q}, Σ, ∅, q, {q}), where q is a new
        symbol.
    (b) R_0 is symbol a in Σ. Let M_0 = ({q_1, q_2}, Σ, δ_0, q_1, {q_2}), where
        δ_0(q_1, a) = {q_2} and δ_0 is undefined otherwise; q_1 and q_2 are new
        symbols.
    (c) R_0 is R_1 | R_2. Then we can apply step (2) to R_1 and R_2
        to yield M_1 = (Q_1, Σ, δ_1, q_1, F_1) and M_2 = (Q_2, Σ, δ_2, q_2, F_2),
        respectively, where Q_1 and Q_2 are disjoint. Construct M_0 =
        (Q_1 ∪ Q_2 ∪ {q_0}, Σ, δ_0, q_0, F_0), where
        (i) q_0 is a new symbol.
        (ii) δ_0 includes δ_1 and δ_2, and δ_0(q_0, a) = δ_1(q_1, a) ∪ δ_2(q_2, a).
        (iii) F_0 is F_1 ∪ F_2 if neither q_1 ∈ F_1 nor q_2 ∈ F_2, and F_0 =
              F_1 ∪ F_2 ∪ {q_0} otherwise.
    (d) R_0 is R_1 R_2. Apply step (2) to R_1 and R_2 to yield M_1 and M_2 as
        in case (c). Construct M_0 = (Q_1 ∪ Q_2, Σ, δ_0, q_1, F_0), where
        (i) δ_0 includes δ_2; for all q ∈ Q_1 and a ∈ Σ, δ_0(q, a) = δ_1(q, a)
            if q ∉ F_1, and δ_0(q, a) = δ_1(q, a) ∪ δ_2(q_2, a) otherwise.
        (ii) F_0 = F_2 if q_2 is not in F_2, and F_0 = F_1 ∪ F_2 otherwise.
    (e) R_0 is R_1*. Apply step (2) to R_1 to yield M_1 = (Q_1, Σ, δ_1, q_1, F_1).
        Construct M_0 = (Q_1 ∪ {q_0}, Σ, δ_0, q_0, F_1 ∪ {q_0}), where q_0 is
        a new symbol, and δ_0 is defined by
        (i) δ_0(q_0, a) = δ_1(q_1, a).
        (ii) If q ∉ F_1, then δ_0(q, a) = δ_1(q, a).
        (iii) If q ∈ F_1, then δ_0(q, a) = δ_1(q, a) ∪ δ_1(q_1, a).
    (f) R_0 is R_1+. Apply step (2) to R_1 to yield M_1 as in (e). Construct
        M_0 = (Q_1, Σ, δ_0, q_1, F_1), where δ_0(q, a) = δ_1(q, a) if q ∉ F_1 and
        δ_0(q, a) = δ_1(q, a) ∪ δ_1(q_1, a) if q ∈ F_1.
    (g) R_0 is R_1*n. Apply step (2) to R_1 to yield M_1 as in (e). Construct
        M_0 = (Q_1 × {1, ..., n}, Σ, δ_0, [q_1, 1], F_0), where
        (i) If q ∉ F_1 or i = n, then δ_0([q, i], a) = {[p, i] | δ_1(q, a) contains p}.
        (ii) If q ∈ F_1 and i < n, then δ_0([q, i], a) =
             {[p, i] | δ_1(q, a) contains p} ∪ {[p, i + 1] | δ_1(q_1, a) contains p}.
        (iii) F_0 = {[q, i] | q ∈ F_1, 1 ≤ i ≤ n} ∪ {[q_1, 1]}.
    (h) R_0 is R_1+n. Do the same as in step (g), but in part (iii) F_0 is defined
        as {[q, i] | q ∈ F_1, 1 ≤ i ≤ n} instead.
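The sketch below renders cases (b) through (e) of Algorithm 3.2 in Python. It is my own illustration: the dictionary representation of an automaton and the helper names are assumptions, not part of the text. An automaton holds a start state, a set of final states, and a transition map from (state, symbol) pairs to sets of states; fresh state names keep the state sets of subautomata disjoint, as the algorithm requires.

    from itertools import count

    _fresh = count()                        # generator of new state names

    def nfa_symbol(a):                      # case (b): the one-symbol string a
        q1, q2 = next(_fresh), next(_fresh)
        return {'start': q1, 'finals': {q2}, 'delta': {(q1, a): {q2}}}

    def _copy_delta(*machines):
        d = {}
        for m in machines:
            for key, targets in m['delta'].items():
                d.setdefault(key, set()).update(targets)
        return d

    def _start_moves(m):
        return [(a, t) for (p, a), t in m['delta'].items() if p == m['start']]

    def nfa_union(m1, m2):                  # case (c)
        q0 = next(_fresh)
        delta = _copy_delta(m1, m2)
        for m in (m1, m2):                  # q0 copies both old start states
            for a, targets in _start_moves(m):
                delta.setdefault((q0, a), set()).update(targets)
        finals = m1['finals'] | m2['finals']
        if m1['start'] in m1['finals'] or m2['start'] in m2['finals']:
            finals = finals | {q0}
        return {'start': q0, 'finals': finals, 'delta': delta}

    def nfa_concat(m1, m2):                 # case (d)
        delta = _copy_delta(m1, m2)
        for q in m1['finals']:              # finals of M1 also get M2's start moves
            for a, targets in _start_moves(m2):
                delta.setdefault((q, a), set()).update(targets)
        finals = m2['finals'] if m2['start'] not in m2['finals'] \
                 else m1['finals'] | m2['finals']
        return {'start': m1['start'], 'finals': finals, 'delta': delta}

    def nfa_star(m1):                       # case (e)
        q0 = next(_fresh)
        delta = _copy_delta(m1)
        for a, targets in _start_moves(m1):
            delta.setdefault((q0, a), set()).update(targets)
            for q in m1['finals']:          # finals repeat the old start's moves
                delta.setdefault((q, a), set()).update(targets)
        return {'start': q0, 'finals': m1['finals'] | {q0}, 'delta': delta}

    def accepts(m, w):                      # subset simulation of the automaton
        current = {m['start']}
        for a in w:
            current = set().union(*(m['delta'].get((q, a), set()) for q in current))
        return bool(current & m['finals'])

    # (a|b)c*: accepts "a" and "bccc" but neither "c" nor "ab".
    M = nfa_concat(nfa_union(nfa_symbol('a'), nfa_symbol('b')), nfa_star(nfa_symbol('c')))
    assert accepts(M, 'a') and accepts(M, 'bccc')
    assert not accepts(M, 'c') and not accepts(M, 'ab')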

THEOREM 3.9

Algorithm 3.2 yields a nondeterministic finite automaton M such that
T(M) = R.
Proof. Inductive exercise. □

We comment that in parts (g) and (h) of Algorithm 3.2 the second com-
ponent of the state of M_0 can in many cases be implemented efficiently in
software as a counter, even when the automaton is converted to a deter-
ministic version. This is so because in many cases R_1 has the prefix property,
and a word in R_1^n can be broken into words in R_1 trivially. For example,
R_1 might be <digit> as in Example 3.18, and all members of <digit> are of
length 1.

Example 3.20

Let us develop a nondeterministic automaton for the identifiers defined
in Example 3.18. To apply step (2) of Algorithm 3.2 to the expression named
<identifier>, we must apply it to <letter> and (<letter> | <digit>)*5. The con-
struction for the former actually involves 26 applications of step (2b) and 25
of step (2c). However, the result is seen to be ({q_1, q_2}, Σ, δ_1, q_1, {q_2}), if
obvious state identifications are applied,† where Σ = {A, ..., Z, 0, ..., 9} and
δ_1(q_1, A) = δ_1(q_1, B) = ··· = δ_1(q_1, Z) = {q_2}.
To obtain an automaton for (<letter> | <digit>)*5, we need another
automaton for <letter>, say ({q_3, q_4}, Σ, δ_2, q_3, {q_4}), and the obvious one for
<digit>, say ({q_5, q_6}, Σ, δ_3, q_5, {q_6}). To take the union of these, we add
a new initial state, q_7, and find that q_3 and q_5 cannot be reached therefrom.
Moreover, q_4 and q_6 can clearly be identified. The resulting machine is
({q_4, q_7}, Σ, δ_4, q_7, {q_4}), where

    δ_4(q_7, A) = ··· = δ_4(q_7, Z) = δ_4(q_7, 0) = ··· = δ_4(q_7, 9) = {q_4}.

To apply case (g), we construct states [q_4, i] and [q_7, i], for 1 ≤ i ≤ 5.
The final states are [q_4, i], 1 ≤ i ≤ 5, and [q_7, 1]. The last is also the initial
state. We have a machine (Q_5, Σ, δ_5, [q_7, 1], F_5), where F_5 is as above, and
δ_5([q_7, 1], a) = {[q_4, 1]} and δ_5([q_4, i], a) = {[q_4, i + 1]}, for all a in Σ and
i = 1, 2, 3, 4. Thus states [q_7, 2], ..., [q_7, 5] are not accessible and do not have
to appear in Q_5. Hence, Q_5 = F_5.
To obtain the final automaton for <identifier> we use case (d). The result-
ing automaton is

    M = ({q_1, q_2, [q_4, 1], ..., [q_4, 5]}, Σ, δ, q_1, {q_2, [q_4, 1], ..., [q_4, 5]}),

where δ is defined by
(1) δ(q_1, α) = {q_2} for all letters α.
(2) δ(q_2, α) = {[q_4, 1]} for all α in Σ.
(3) δ([q_4, i], α) = {[q_4, i + 1]} for all α in Σ and 1 ≤ i < 5.
Note that [q_7, 1] is inaccessible and has been removed from M. Also, M is
deterministic here, although it need not be in general.
The transition graph for this machine is shown in Fig. 3.7. □

†That is, two states of a nondeterministic finite automaton can be identified if both are
final or both are nonfinal and on each input they transfer to the same set of states. There
are other conditions under which two states of a nondeterministic finite automaton can be
identified, but this condition is all that is needed here.

Fig. 3.7 Nondeterministic finite automaton for identifiers. [Transition graph not reproduced.]

3.3.3. Direct Lexical Analysis

When the lexical analysis is direct, one must search for one of a large
number of tokens. The most efficient way is generally to search for these in
parallel, since the search often narrows quite quickly. Thus the model of
a direct lexical analyzer is many finite automata operating in parallel or, to
be exact, one finite transducer simulating many automata and emitting a sig-
nal as to which of the automata has successfully recognized a string.
If we have a set of nondeterministic finite automata to simulate in parallel
and their state sets are disjoint, we can merge the state sets and next state
functions to create one nondeterministic finite automaton, which may be
converted to a deterministic one by Theorem 2.3. (The only nuance is that

the initial state of the deterministic automaton is the set of all initial states
of the components.) Thus it is more convenient to merge before converting
to a deterministic device than the other way round.
The combined deterministic automaton can be considered to be a simple
kind of finite transducer. It emits the token name and, perhaps, information
that will locate the instance of the token. Each state of the combined automa-
ton represents states from various of the component automata. Apparently,
when the combined automaton enters a state which contains a final state
of one of the component automata, and no other states, it should stop and
emit the name of the token for that component automaton. However, matters
are often not that simple.
For example, if an identifier can be any string of characters except for
a keyword, it is not good practice to define an identifier by the
exact regular set, because it is complicated and requires many states. Instead,
one uses a simple definition for identifier (Example 3.18 is one such) and
leaves it to the combined automaton to make the right decision.
In this case, should the combined automaton enter a state which included
a final state for one of the keyword automata and a state of the automaton
for identifiers and the next input symbol (perhaps a blank or special sign)
indicated the end of the token, the keyword would take priority, and indi-
cation that the keyword was found would be emitted.

Example 3.21
Let us consider a somewhat abstract example. Suppose that identifiers
are composed of any string of the four symbols D, F, I, and O, followed by
a blank (b), except for the keywords DO and IF, which need not be followed
by a blank, but may not be followed immediately by any of the letters D,
F, I, or O.
The identifiers are recognized by the finite automaton of Fig. 3.8(a),
DO by that of Fig. 3.8(b), and IF by Fig. 3.8(c). (All automata here are deter-
ministic, although that need not be true in general, of course.)
The merged automaton is shown in Fig. 3.9. State q_2 indicates that an
identifier has been found. However, states {q_1, q_8} and {q_1, q_5} are ambiguous.
They might indicate IF or DO, respectively, or they might just indicate the
initial portion of some identifier, such as DOOF. To resolve the conflict,
the lexical analyzer must look at an additional character. If a D, O, I, or F
follows, we had the prefix of an identifier. If anything else, including a blank,
follows (assume that there are more characters than the five mentioned), we
enter new states, q_9 or q_10, and emit a signal to the effect that DO or IF,
respectively, was detected and that it ends one symbol previously. If we
enter q_2, we emit a signal saying that an identifier has been found, ending one
symbol previously.
Since it is the output of the device, not the state, that is important, states
q_2, q_9, and q_10 can be identified and, in fact, will have no representation at
all in the implementation. □

Fig. 3.8 Automata for lexical analysis: (a) identifiers; (b) DO; (c) IF. [Transition diagrams not reproduced.]
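A hand-coded scanner in the spirit of Example 3.21 can resolve the DO/IF ambiguity exactly as described: read the maximal run of letters and only then decide. The sketch below is my own illustration; the token names and the error convention are assumptions, not part of the example.

    # Identifiers are strings over D, F, I, O followed by a blank, while the
    # keywords DO and IF end as soon as a non-letter follows and take priority.
    KEYWORDS = {'DO', 'IF'}
    LETTERS = set('DFIO')

    def next_token(s, i):
        """Return (kind, lexeme, next position) for the token starting at s[i]."""
        j = i
        while j < len(s) and s[j] in LETTERS:
            j += 1                                 # scan the maximal letter run
        word = s[i:j]
        if word in KEYWORDS:
            return ('keyword', word, j)            # DO or IF; blank not required
        if word and j < len(s) and s[j] == ' ':
            return ('identifier', word, j + 1)     # identifier plus its blank
        raise ValueError('lexical error at position %d' % i)

    assert next_token('DOOF IF', 0) == ('identifier', 'DOOF', 5)
    assert next_token('DOOF IF', 5) == ('keyword', 'IF', 7)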

3.3.4. Software Simulation of Finite Transducers

There are several approaches to the simulation of finite automata or
transducers. A slow but compact technique is to encode the next move
function of the device and execute the encoding interpretively. Since lexical
analysis is a major portion of the activity of a translator, this mode of opera-
tion is frequently too slow to be acceptable. However, some computers have
single instructions that can recognize the kinds of tokens with which we have
been dealing. While these instructions cannot simulate an arbitrary finite
automaton, they work very well when tokens are either keywords or identi-
fiers.
An alternative approach is to make a piece of program for each state.
The function of the program is to determine the next character (a subroutine
may be used to locate that character), emit any output required, and transfer
to the entry of the program corresponding to the next state.
An important design question is the proper method of determining the
next character. If the next state function for the current state were such
that most different next characters lead to different next states, there is proba-
bly nothing better to do than to transfer indirectly through a table based on
the next character. This method is as fast as any, but requires a table whose
size is proportional to the number of different characters.
Fig. 3.9 Combined lexical analyzer. [Transition diagram not reproduced. Unlike Fig. 3.8(a), Fig. 3.9 does not permit the empty string to be an identifier.]

In the typical lexical analyzer, there will be many states such that all but
very few next characters lead to the same state. It may be too expensive in
space to allocate a full table for each such state. A reasonable compromise
between time and space considerations, for many states, would be to use
binary decisions to weed out those few characters that cause a transition to
an unusual state.
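As a concrete and entirely hypothetical illustration of this compromise, a hand-written handler for one state might look as follows; the state names are invented for illustration only.

    def state_in_identifier(ch):
        # A couple of explicit tests weed out the rare characters that end the
        # token; everything else falls through to the common next state.
        if ch == ' ' or ch == '\n':       # rare: the token ends here
            return 'emit_identifier'
        return 'in_identifier'            # common: stay in the same state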

EXERCISES

3.3.1. Give regular expressions for the following extended regular expressions:
(a) (a+3 b+3)*2.
(b) (a | b)* − (ab)*.
(c) (aa | bb)*4 ∩ a(ab | ba)+b.


3.3.2. Give a sequence of regular definitions that culminate in the definition of


(a) A L G O L identifiers.
(b) PL/I identifiers.
(c) Complex constants of the form (α, β), where α and β are real
    FORTRAN constants.
(d) Comments in PL/I.
3.3.3. Prove Theorem 3.9.
3.3.4. Give indirect lexical analyzers for the three regular sets of Exercise 3.3.2.
3.3.5. Give a direct lexical analyzer that distinguishes among the following
tokens"
(1) Identifiers consisting of any sequence of letters and digits, with
at least one letter somewhere. [An exception occurs in rule (3).]
(2) Constants as in Example 3.19.
(3) The keywords IF, IN, and INTEGER, which are not to be
considered identifiers.
3.3.6. Extend the notion of indistinguishable states (Section 2.3) to apply to
nondeterministic finite automata. If all indistinguishable states are
merged, do we necessarily get a minimum state nondeterministic auto-
maton ?
**3.3.7. Is direct lexical analysis for F O R T R A N easier if the source program is
scanned backward ?

Research Problem
3.3.8. Give an algorithm to choose an implementation for direct lexical ana-
lyzers. Your algorithm should be able to accept some indication of the
desired time-space trade off. You may not wish to implement the symbol-
by-symbol action of a finite automaton, but rather allow for the possibility
of other actions. For example, if many of the tokens were arithmetic
signs of length 1, and these had to be separated by blanks, as in
SNOBOL, it might be wise to separate out these tokens from others as
the first move of the lexical analyzer by checking whether the second
character was blank.

Programming Exercises
3.3.9. Construct a lexical analyzer for one of the programming languages given
in the Appendix. Give consideration to how the lexical analyzer will
recover from lexical errors, particularly misspellings.
3.3.10. Devise a programming language based on extended regular expressions.
Construct a compiler for this language. The object language program
should be an implementation of the lexical analyzer described by the
source program.

BIBLIOGRAPHIC NOTES

The AED RWORD (Read a WORD) system was the first major system to use
finite state machine techniques in the construction of lexical analyzers. Johnson
et al. [1968] provide an overview of this system.
An algorithm that constructs from a regular expression a machine language
program that simulates a corresponding nondeterministic finite automaton is given
by Thompson [1968]. This algorithm has been used as a pattern-matching mecha-
nism in a powerful text-editing language called QED.
A lexical analyzer should be designed to cope with lexical errors in its input.
Some examples of lexical errors are
(1) Substitution of an incorrect symbol for a correct symbol in a token.
(2) Insertion of an extra symbol in a token.
(3) Deletion of a symbol from a token.
(4) Transposition of a pair of adjacent symbols in a token.
Freeman [1964] and Morgan [1970] describe techniques which can be used to
detect and recover from errors of this nature. The Bibliographic Notes at the end
of Section 1.2 provide additional references to error detection and recovery in
compiling.

3.4. PARSING

The second phase of compiling is normally that of parsing or syntax
analysis. In this section, formal definitions of two common types of parsing
are given, and their capabilities are briefly compared. We shall also discuss
what it means for one grammar to "cover" another grammar.
3.4.1. Definition of Parsing

We say that a sentence w in L(G) for some C F G G has been parsed when
we know one (or perhaps all) of its derivation trees. In a translator, this tree
may be "physically" constructed in the computer memory, but it is more
likely that its representation is more subtle. One can deduce the parse tree
by watching the steps taken by the syntax analyzer, although the connection
would hardly be obvious at first.
Fortunately, most compilers parse by simulating a PDA which is recog-
nizing the input either top-down or bottom-up (see Section 2.5). We shall
see that the ability of a PDA to parse top-down is associated with the ability
of a P D T to map input strings to their leftmost derivations. Bottom-up pars-
ing is similarly associated with mapping input strings to the reverse of their
rightmost derivations. We shall thus treat the parsing problem as that of
mapping strings to either leftmost or rightmost derivations. While there are
many other parsing strategies, these two definitions serve as the significant
benchmarks.

Some other parsing strategies are mentioned in various parts of the book.
In the Exercises at the end of Sections 3.4, 4.1, and 5.1 we shall discuss left-
corner parsing, a parsing method that is both top-down and bottom-up in
nature. In Section 6.2.1 of Chapter 6 we shall discuss generalized top-down
and bottom-up parsing.
DEFINITION

Let G = (N, Σ, P, S) be a CFG, and suppose that the productions of P
are numbered 1, 2, ..., p. Let α be in (N ∪ Σ)*. Then
(1) A left parse of α is a sequence of productions used in a leftmost deri-
vation of α from S.
(2) A right parse of α is the reverse of a sequence of productions used in
a rightmost derivation of α from S in G.
We can represent these parses by a sequence of numbers from 1 to p.

Example 3.22
Consider the grammar G_0, where the productions are numbered as shown:
(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → (E)
(6) F → a
The left parse of the sentence a * (a + a) is 23465124646. The right parse
of a * (a + a) is 64642641532.

We shall use an extension of the ⇒ notation to describe left and right
parses.
CONVENTION
Let G = (N, Σ, P, S) be a CFG, and assume that the productions are
numbered from 1 to p. We write α =i=>_lm β if α ⇒_lm β and the production
applied is numbered i. Similarly, we write α =i=>_rm β if α ⇒_rm β and production
i is used. We extend these notations by
(1) If α =π=>_lm β and β =π'=>_lm γ, then α =ππ'=>_lm γ.
(2) If α =π=>_rm β and β =π'=>_rm γ, then α =ππ'=>_rm γ.

3.4.2. Top-Down Parsing

In this section we wish to examine the nature of the left-parsing problem
for CFG's. Let π = i_1 ··· i_n be a left parse of a sentence w in L(G), where
G is a CFG. Knowing π, we can construct a parse tree for w in the following
"top-down" manner. We begin with the root labeled S. Then i_1 gives the
production to be used to expand S. Suppose that i_1 is the number of the
production S → X_1 ··· X_k. We then create k descendants of the node labeled
S and label these descendants X_1, X_2, ..., X_k. If X_1, X_2, ..., X_{i−1} are ter-
minals, then the first i − 1 symbols of w must be X_1 ··· X_{i−1}. Production
i_2 must then be of the form X_i → Y_1 ··· Y_l, and we can continue building
the parse tree for w by expanding the node labeled X_i. We can proceed in
this fashion and construct the entire parse tree for w corresponding to
the left parse π.
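The top-down reconstruction just described is easy to carry out mechanically. The sketch below is my own illustration, not from the text: it replays a left parse of G_0, always expanding the leftmost nonterminal, and prints every sentential form of the corresponding leftmost derivation.

    # Grammar G0, with productions numbered as in Example 3.22.
    PROD = {1: ('E', ['E', '+', 'T']), 2: ('E', ['T']),
            3: ('T', ['T', '*', 'F']), 4: ('T', ['F']),
            5: ('F', ['(', 'E', ')']), 6: ('F', ['a'])}
    NONTERMINALS = {'E', 'T', 'F'}

    def expand_left_parse(parse, start='E'):
        """Apply the productions of `parse` to the leftmost nonterminal in
        turn, yielding every sentential form of the leftmost derivation."""
        form = [start]
        yield list(form)
        for i in parse:
            lhs, rhs = PROD[i]
            pos = next(k for k, x in enumerate(form) if x in NONTERMINALS)
            assert form[pos] == lhs, "not a valid left parse"
            form[pos:pos + 1] = rhs
            yield list(form)

    # The left parse 23465124646 of a * (a + a), from Example 3.22.
    for sf in expand_left_parse([2, 3, 4, 6, 5, 1, 2, 4, 6, 4, 6]):
        print(''.join(sf))    # E, T, T*F, F*F, a*F, a*(E), ..., a*(a+a)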
Now suppose that we are given a CFG G = (N, Σ, P, S) in which the
productions are numbered from 1 through p and a string w ∈ Σ* for which
we wish to construct a left parse. One way of looking at this problem is that
we know the root and frontier of a parse tree and "all" we need to do is
fill in the intermediate nodes. Left parsing suggests that we attempt to fill in
the parse starting from the root and then working left to right toward the
frontier.
It is quite easy to show that there is a simple SDTS which maps strings
in L(G) to all their left (or right, if you prefer) parses. We shall define such
an SDTS here, although we prefer to examine the PDT which implements
the translation, because the latter gives an introduction to the physical
execution of its translation.
DEFINITION
Let G = (N, Σ, P, S) be a CFG in which the productions have been
numbered from 1 to p. Define T_l^G, or T_l where G is understood, to be the
SDTS (N, Σ, {1, ..., p}, R, S), where R consists of rules A → α, β such
that A → α is production i in P and β is iα', where α' is α with the terminals
deleted.

Example 3.23
Let G_0 be the usual grammar with productions numbered as in Example
3.22. Then T_l^{G_0} = ({E, T, F}, {+, *, (, ), a}, {1, ..., 6}, R, E), where R consists
of

    E → E + T, 1ET
    E → T, 2T
    T → T * F, 3TF
    T → F, 4F
    F → (E), 5E
    F → a, 6

The pair of derivation trees in Fig. 3.10 shows the translation defined for
a * (a + a). □
The following theorem is left for the Exercises.

Fig. 3.10 Translation T_l: (a) input; (b) output. [Derivation trees not reproduced.]

THEOREM 3.10
Let G = (N, Σ, P, S) be a CFG. Then τ(T_l^G) = {(w, π) | S =π=>_lm w}.
Proof. We can prove by induction that (A, A) =π=> (w, π) in T_l^G if and only if
A =π=>_lm w in G. □

Using a construction similar to that in Lemma 3.2, we can construct for


any grammar G a nondeterministic pushdown transducer that acts as a left
parser for G.
DEFINITION
Let G = (N, Σ, P, S) be a CFG in which the productions have been num-
bered from 1 to p. Let M_l^G (or M_l when G is understood) be the nondeter-
ministic pushdown transducer ({q}, Σ, N ∪ Σ, {1, 2, ..., p}, δ, q, S, ∅), where
δ is defined as follows:
(1) δ(q, e, A) contains (q, α, i) if the ith production in P is A → α.
(2) δ(q, a, a) = {(q, e, e)} for all a in Σ.
We call M_l^G the left parser for G.
With input w, M_l simulates a leftmost derivation of w from S in G. Using
the rules in (1), each time M_l expands a nonterminal on top of the pushdown
list according to a production in P, M_l will also emit the number of that
production. If there is a terminal symbol on top of the pushdown list, M_l
will use a rule in (2) to ensure that this terminal matches the current input
symbol. Thus, M_l can produce only a leftmost derivation for w.
THEOREM 3.11
Let G = (N, Σ, P, S) be a CFG. Then τ(M_l^G) = {(w, π) | S =π=>_lm w}.
Proof. Another elementary inductive exercise. The inductive hypothesis
this time is that (q, w, A, e) ⊢* (q, e, e, π) if and only if A =π=>_lm w. □

Note that M_l is almost, but not quite, the PDT that one obtains by
Lemma 3.2 from the SDTS T_l^G.

Example 3.24
Let us construct a left parser for G_0. Here

    M_l^{G_0} = ({q}, Σ, N ∪ Σ, {1, 2, ..., 6}, δ, q, E, ∅),

where
    δ(q, e, E) = {(q, E + T, 1), (q, T, 2)}
    δ(q, e, T) = {(q, T * F, 3), (q, F, 4)}
    δ(q, e, F) = {(q, (E), 5), (q, a, 6)}
    δ(q, b, b) = {(q, e, e)} for all b in Σ

With the input string a + a * a, M_l^{G_0} can make the following sequence of
moves, among others:

    (q, a + a * a, E, e) ⊢ (q, a + a * a, E + T, 1)
                         ⊢ (q, a + a * a, T + T, 12)
                         ⊢ (q, a + a * a, F + T, 124)
                         ⊢ (q, a + a * a, a + T, 1246)
                         ⊢ (q, + a * a, + T, 1246)
                         ⊢ (q, a * a, T, 1246)
                         ⊢ (q, a * a, T * F, 12463)
                         ⊢ (q, a * a, F * F, 124634)
                         ⊢ (q, a * a, a * F, 1246346)
                         ⊢ (q, * a, * F, 1246346)
                         ⊢ (q, a, F, 1246346)
                         ⊢ (q, a, a, 12463466)
                         ⊢ (q, e, e, 12463466)   □

The left parser is in general a nondeterministic device. To use it in prac-


tice, we must simulate it deterministically. There are some grammars, such
as those which are not cycle-free, for which a complete simulation is impos-
sible, in this case because there are an infinity of left parses for some words.
Moreover, the natural simulation, which we shall discuss in Chapter 4, fails
on a larger class of grammars, those which are left-recursive. An essential
requirement for doing top-down parsing is that left recursion be eliminated.
There is a natural class of grammars, which we shall call LL (for scanning
the input from the left producing a left parse) and discuss in Section 5.1, for
which the left parser can be made deterministic by the simple expedient of
allowing it to look some finite number of symbols ahead on the input and
to base its move on what it sees. The LL grammars are those which can be
parsed "in a natural way" by a deterministic left parser.
There is a wider class of grammars for which there is some D P D T which
can implement the SDTS T_l. These include all the LL grammars and some
others which can be parsed only in an "unnatural" way, i.e., those in which
the contents of the pushdown list do not reflect successive steps of a leftmost
derivation, as does Mz. Such grammars are of only theoretical interest, insofar
as top-down parsing is concerned, but we shall treat them briefly in Section
3.4.4.

3.4.3. Bottom-Up Parsing

Let us now turn our attention to the right-parsing problem. Consider the
rightmost derivation of a + a * a from E in G_0:

    E =1=>_rm E + T
      =3=>_rm E + T * F
      =6=>_rm E + T * a
      =4=>_rm E + F * a
      =6=>_rm E + a * a
      =2=>_rm T + a * a
      =4=>_rm F + a * a
      =6=>_rm a + a * a

Writing in reverse the sequence of productions used in this derivation gives
us the right parse 64264631 for a + a * a.
In general, a right parse for a string w in a grammar G = (N, Σ, P, S) is
a sequence of productions which can be used to reduce w to the sentence
symbol S. Viewed in terms of a derivation tree, a right parse for a sentence
w represents the sequence of handle prunings in which a derivation tree with

frontier w is pruned to a single node labeled S. In effect, this is equivalent to


starting with only the frontier of a derivation tree for w and then "filling in"
the derivation tree from the leaves to the root. Thus the term "bottom-up"
parsing is often associated with the generation of a right parse.
In analogy with the SDTS T_l which maps words in L(G) to their left
parses, we can define T_r, an SDTS which maps words to right parses. The
translation elements have terminals deleted and the production numbers at
the right end. We leave it for the Exercises to show that this SDTS correctly
defines the desired translation.
As for top-down parsing, we are really interested in a PDT which imple-
ments Tr. We shall define an extended PDT in analogy with the extended
PDA.
DEFINITION
An extended PDT is an 8-tuple P = (Q, Σ, Γ, Δ, δ, q_0, Z_0, F), where all
symbols are as before except δ, which is a map from a finite subset of
Q × (Σ ∪ {e}) × Γ* to the finite subsets of Q × Γ* × Δ*. Configurations
are defined as before, but with the pushdown top normally on the right,
and we say that (q, aw, βα, x) ⊢ (p, w, βγ, xy) if and only if δ(q, a, α)
contains (p, γ, y).
The extended PDT P is deterministic if
(1) For all q ∈ Q, a ∈ Σ ∪ {e}, and α ∈ Γ*, #δ(q, a, α) ≤ 1, and
(2) Whenever δ(q, a, α) ≠ ∅ and δ(q, b, β) ≠ ∅, with b = a or b = e and
(b, β) ≠ (a, α), neither of α and β is a suffix of the other.
DEFINITION
Let G = (N, Σ, P, S) be a CFG. Let M_r^G be the extended nondeterministic
pushdown transducer ({q}, Σ, N ∪ Σ ∪ {$}, {1, ..., p}, δ, q, $, ∅). The push-
down top is on the right, and δ is defined as follows:
(1) δ(q, e, α) contains (q, A, i) if production i in P is A → α.
(2) δ(q, a, e) = {(q, a, e)} for all a in Σ.
(3) δ(q, e, $S) = {(q, e, e)}.
This pushdown transducer embodies the elements of what is known as
a shift-reduce parsing algorithm. Under rule (2), M_r shifts input symbols
onto the top of the pushdown list. Whenever a handle appears on top of the
pushdown list, M_r can reduce the handle under rule (1) and emit the number
of the production used to reduce the handle. M_r may then shift more input
symbols onto the pushdown list until the next handle appears on top of
the pushdown list. The handle can then be reduced and the production
number emitted. M_r continues to operate in this fashion until the pushdown
list contains only the sentence symbol on top of the end-of-pushdown-list
marker. Under rule (3) M_r can then enter a configuration in which the push-
down list is empty.

THEOREM 3.12
Let G = (N, Σ, P, S) be a CFG. Then τ(M_r^G) = {(w, π^R) | S =π=>_rm w}.
Proof. The proof is similar to that of Lemma 2.25 and is left for the
Exercises. □

Example 3.25
The right parser for G_0 would be

    M_r^{G_0} = ({q}, Σ, N ∪ Σ ∪ {$}, {1, 2, ..., 6}, δ, q, $, ∅),

where
    δ(q, e, E + T) = {(q, E, 1)}
    δ(q, e, T) = {(q, E, 2)}
    δ(q, e, T * F) = {(q, T, 3)}
    δ(q, e, F) = {(q, T, 4)}
    δ(q, e, (E)) = {(q, F, 5)}
    δ(q, e, a) = {(q, F, 6)}
    δ(q, b, e) = {(q, b, e)} for all b in Σ
    δ(q, e, $E) = {(q, e, e)}

With input a + a * a, M_r^{G_0} could make the following sequence of moves,
among others:

    (q, a + a * a, $, e) ⊢ (q, + a * a, $a, e)
                         ⊢ (q, + a * a, $F, 6)
                         ⊢ (q, + a * a, $T, 64)
                         ⊢ (q, + a * a, $E, 642)
                         ⊢ (q, a * a, $E +, 642)
                         ⊢ (q, * a, $E + a, 642)
                         ⊢ (q, * a, $E + F, 6426)
                         ⊢ (q, * a, $E + T, 64264)
                         ⊢ (q, a, $E + T *, 64264)
                         ⊢ (q, e, $E + T * a, 64264)
                         ⊢ (q, e, $E + T * F, 642646)
                         ⊢ (q, e, $E + T, 6426463)
                         ⊢ (q, e, $E, 64264631)
                         ⊢ (q, e, e, 64264631)

Thus, M_r would produce the right parse 64264631 for the input string
a + a * a. □
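The shift-reduce behavior of M_r can be replayed mechanically once a right parse is known. The sketch below is my own illustration, not from the text: it shifts symbols of the input and reduces the handle dictated by each production number in turn, reproducing the reverse of the rightmost derivation for the input of Example 3.25.

    # Grammar G0, productions numbered as in Example 3.22.
    PROD = {1: ('E', ['E', '+', 'T']), 2: ('E', ['T']),
            3: ('T', ['T', '*', 'F']), 4: ('T', ['F']),
            5: ('F', ['(', 'E', ')']), 6: ('F', ['a'])}

    def replay_right_parse(w, right_parse):
        """Shift input symbols and reduce handles in the order given by the
        right parse, returning the sentential forms after each reduction."""
        stack, rest, forms = [], list(w), []
        for i in right_parse:
            lhs, rhs = PROD[i]
            while stack[-len(rhs):] != rhs:        # shift until the handle
                stack.append(rest.pop(0))          # is on top of the stack
            del stack[-len(rhs):]                  # reduce the handle
            stack.append(lhs)
            forms.append(''.join(stack) + ''.join(rest))
        assert stack == ['E'] and not rest
        return forms

    # The right parse 64264631 of a + a * a (Example 3.25).
    print(replay_right_parse('a+a*a', [6, 4, 2, 6, 4, 6, 3, 1]))
    # ['F+a*a', 'T+a*a', 'E+a*a', 'E+F*a', 'E+T*a', 'E+T*F', 'E+T', 'E']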

We shall discuss deterministic simulation of a nondeterministic right


parser in Chapter 4. In Section 5.2 we shall discuss an important subclass of
CFG's, the LR (for scanning the input from left to right and producing
a right parse), for which the PDT can be made to operate deterministically
by allowing it to look some finite number of symbols ahead on the input.
The LR grammars are thus those which can be parsed naturally bottom-
up and deterministically. As in left parsing, there are grammars which may
be right-parsed deterministically, but not in the natural way. We shall treat
these in the next section.

3.4.4. Comparison of Top-Down and Bottom-Up Parsing

If we consider only nondeterministic parsers, then there is little compari-


son to be made. By Theorems 3.11 and 3.12, every C F G has both a left and
right parser. However, if we consider the important question of whether
deterministic parsers exist for a given grammar, things are not so simple.
DEFINITION

A CFG G is left-parsable if there exists a DPDT P such that

    τ(P) = {(x$, π) | (x, π) ∈ τ(T_l^G)}.

G is right-parsable if there exists a DPDT P with

    τ(P) = {(x$, π) | (x, π) ∈ τ(T_r^G)}.

In both cases we shall permit the DPDT to use an endmarker to delimit the
right end of the input string.
Note that all grammars are left- and right-parsable in an informal sense,
but it is determinism that is reflected in the formal definition.
We find that the classes of left- and right-parsable grammars are incom-
mensurate; that is, neither is a subset of the other. This is surprising in view
of Section 8.1, where we shall show that the LL grammars, those which can
be left-parsed deterministically in a natural way, are a subset of the LR
grammars, those which can be right-parsed deterministically in a natural way.
The following examples give grammars which are left- (right-) parsable but
not right- (left-) parsable.

Example 3.26
Let G1 be defined by

(1) S → BAb    (2) S → CAc
(3) A → BA     (4) A → a
(5) B → a      (6) C → a
L(G_1) = aa⁺b + aa⁺c. We can show that G_1 is neither LL nor LR,
because we do not know whether the first a in any sentence comes from B
or C until we have seen the last symbol of the sentence.
However, we can "unnaturally" produce a left parse for any input string
with a DPDT as follows. Suppose that the input is a^{n+2}b, n ≥ 0. Then the
DPDT can produce the left parse 15(35)^n4 by storing all a's on the pushdown
list until the b is seen. No output is generated until the b is encountered.
Then the DPDT can emit 15(35)^n4 by using the a's stored on the pushdown
list to count to n. Likewise, if the input is a^{n+2}c, we can produce 26(35)^n4 as
output. In either case, the trick is to delay producing any output until b or c
is seen.
We shall now attempt to convince the reader that there is no DPDT
which can produce a valid right parse for all inputs. Suppose that M were
a DPDT which produced the right parses 55^n43^n1 for a^{n+2}b and 65^n43^n2 for
a^{n+2}c. We shall give an informal proof that M does not exist. The proof
draws heavily on ideas in Ginsburg and Greibach [1966], in which it is shown
that {a^n b^n | n ≥ 1} ∪ {a^n b^{2n} | n ≥ 1} is not a deterministic CFL. The reader is
referred there for assistance in constructing a formal proof. We can show
each of the following:
(1) Let a string of a's be the input to M. Then the output of M is empty, or else M would
emit a 5 or 6, and we could "fool" it by placing c or b, respectively, on the
input, causing M to produce an erroneous output.
(2) As a's enter the input of M, they must be stored in some way on the
pushdown list. Specifically, we can show that there exist integers j and k,
pushdown strings α and β, and a state q such that for all integers p ≥ 0,
(q_0, a^{k+jp}, Z_0, e) ⊢* (q, e, β^p α, e), where q_0 and Z_0 are the initial state and
pushdown symbol of M.
(3) If after k + jp a's, one b appears on M's input, M cannot emit symbol
4 before erasing its pushdown list down to α. For if it did, we could "fool" it by
previously placing j more a's on the input and finding that M emits the same
number of 5's as it did previously.
(4) After reducing its pushdown list to α, M cannot "remember" how
many a's were on the input, because the only thing different about M's
configurations for different values of p (where k + jp is the number of a's)
is now the state. Thus, M does not know how many 3's to emit.
Example 3.27
Let G_2 be defined by
(1) S → Ab    (2) S → Ac
(3) A → AB    (4) A → a
(5) B → a
L(G_2) = a⁺b + a⁺c. It is easy to show that G_2 is right-parsable. Using an
argument similar to that in Example 3.26, it can be shown that G_2 is not left-
parsable. □

THEOREM 3.13
The classes of left- and right-parsable grammars are incommensurate.
Proof. By Examples 3.26 and 3.27. □

Despite the above theorem, as a general rule, bottom-up parsing is more


appealing than top-down parsing. For a given programming language it is
often easier to write down a grammar that is right-parsable than one that is
left-parsable. Also, as was mentioned, the LL grammars are included in
the LR grammars. In the next chapter, we shall also see that the natural
simulation of a nondeterministic PDT works for a class of grammars that
is, in a sense to be discussed there, more general when the PDT is a right
parser than a left parser.
When we look at translation, however, the left parse appears more desir-
able. We shall show that every simple SDT can be performed by
(1) A PDT which produces left parses of words, followed by
(2) A DPDT which maps the left parses into output strings of the SDT.
Interestingly, there are simple SDT's such that "left parse" cannot be
replaced by "right parse" in the above.
If a compiler translated by first constructing the entire parse and then
converting the parse to object code, the above claim would be sufficient to
prove that there are certain translations which require a left parse at the
intermediate stage.
However, many compilers construct the parse tree node by node and
compute the translation at each node when that node is constructed. We
claim that if a translation cannot be computed directly from the right parse,
then it cannot be computed node by node, if the nodes themselves are con-
structed in a bottom-up way. These ideas will be discussed in more detail in
Chapter 9, and we ask the reader to wait until then for the matter of node-
by-node translation to be formalized.
DEFINITION
Let G = (N, Σ, P, S) be a CFG. We define L_l^G and L_r^G, the left and right
parse languages of G, respectively, by

    L_l^G = {π | S =π=>_lm w for some w in L(G)}
and
    L_r^G = {π^R | S =π=>_rm w for some w in L(G)}

We can extend the =π=>_lm and =π=>_rm notations to SDT's by saying that
(α, β) =π=>_lm (γ, δ) if and only if (α, β) ⇒* (γ, δ) by a sequence of rules such
that the leftmost nonterminal of α is replaced at each step and these rules,
with translation elements deleted, form the sequence of productions π. We
define =π=>_rm for SDT's analogously.
DEFINITION

An SDTS is semantically unambiguous if there are no two distinct rules
of the form A → α, β and A → α, γ.
A semantically unambiguous SDTS has exactly one translation element
for each production of the underlying grammar.
THEOREM 3.14
Let T = (N, Σ, Δ, R, S) be a semantically unambiguous simple SDTS.
Then there exists a DPDT P such that τ(P) = {(π, y) | (S, S) =π=>_lm (x, y) for
some x ∈ Σ*}.
Proof. Assume N and Δ are disjoint. Let P = ({q}, {1, ..., p}, N ∪ Δ,
Δ, δ, q, S, ∅), where 1, ..., p are the numbers of the productions of the
underlying grammar, and δ is defined by
(1) Let A → α be production i, and A → α, β the lone rule beginning
with A → α. Then δ(q, i, A) = (q, β, e).
(2) For all b in Δ, δ(q, e, b) = (q, e, b).
P is deterministic because rule (1) applies only with a nonterminal on top
of the pushdown list, and rule (2) applies only with an output symbol on top.
The proof that P works correctly follows from an easy inductive hypothesis:
(q, π, A, e) ⊢* (q, e, e, y) if and only if there exists some x in Σ* such that
(A, A) =π=>_lm (x, y). We leave the proof for the Exercises.
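To make the construction concrete, the sketch below (my own illustration) applies it to a small semantically unambiguous simple SDTS with G_0 as underlying grammar, namely the familiar infix-to-postfix translation; that particular SDTS and all the Python names are assumptions chosen here for illustration, not material from the text. The pushdown list starts with the start symbol, production numbers drive the expansion of nonterminals, and output symbols are emitted as they surface.

    # Translation elements of a simple SDTS on G0 (infix to postfix):
    # for production i: A -> alpha, the translation element is listed.
    TRANS = {1: ('E', ['E', 'T', '+']), 2: ('E', ['T']),
             3: ('T', ['T', 'F', '*']), 4: ('T', ['F']),
             5: ('F', ['E']), 6: ('F', ['a'])}
    NONTERMINALS = {'E', 'T', 'F'}

    def translate(left_parse):
        """Sketch of the DPDT of Theorem 3.14: map a left parse (the input)
        to the output string of the SDTS.  Rule (1): a production number
        replaces the nonterminal on top by its translation element.
        Rule (2): an output symbol on top is popped and emitted."""
        stack, out, inp = ['E'], [], list(left_parse)
        while stack:
            top = stack[-1]
            if top in NONTERMINALS:
                lhs, beta = TRANS[inp.pop(0)]
                assert top == lhs, "input is not a left parse"
                stack.pop()
                stack.extend(reversed(beta))   # leftmost symbol of beta on top
            else:
                out.append(stack.pop())
        assert not inp
        return ''.join(out)

    # The left parse of a * (a + a) is mapped to its postfix form.
    print(translate([2, 3, 4, 6, 5, 1, 2, 4, 6, 4, 6]))    # prints aaa+*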

To show a simple SDT not to be executable by any DPDT which maps
L_r^G, where G is the underlying grammar, to the output of the SDT, we need
the following lemma.

LEMMA 3.15

There is no DPDT P such that τ(P) = {(wc, w^R c w) | w ∈ {a, b}*}.
Proof. Here the symbol c plays the role of a right endmarker. Suppose
that with input w, P emitted some non-e string, say dx, where d = a or b.
Let d' be the other of a and b, and consider the action of P with wd'c as input.
It must emit some string, but that string begins with d. Hence, P does not
map wd'c to d'w^R c wd', as demanded. Thus, P may not emit any output until
the right endmarker c is reached. At that time, it has some string α_w on its
pushdown list and is in state q_w.
Informally, α_w must be essentially w, in which case, by erasing α_w, P can
emit w^R. But once P has erased α_w, P cannot then "remember" all of w in order
to print it. A formal proof of the lemma draws upon the ideas outlined in
Example 3.26. We shall sketch such a proof here.
Consider inputs of the form w = a^i. Then there are integers j and k,
a state q, and strings α and β such that when the input is a^{j+nk}c, P will place
αβ^n on its pushdown list and enter state q. Then P must erase the pushdown
list down to α at or before the time it emits w^R c. But since α is independent
of w, it is no longer possible to emit w. □
THEOREM 3.15
There exists a simple SDTS T = (N, Σ, Δ, R, S) such that there is no
DPDT P for which τ(P) = {(π^R, x) | (S, S) =π=>_rm (w, x) for some w}.
Proof. Let T be defined by the rules
(1) S → Sa, aSa
(2) S → Sb, bSb
(3) S → c, c
Then L_r^G = 3(1 + 2)*, where G is the underlying grammar. If we let
h(1) = a and h(2) = b, then the desired τ(P) is {(3α, h(α)^R c h(α)) | α ∈ {1, 2}*}.
If P existed, with or without a right endmarker, then we could easily con-
struct a DPDT to define the translation {(wc, w^R c w) | w ∈ {a, b}*}, in contra-
diction of Lemma 3.15. □

We conclude that both left parsing and right parsing are of interest, and
we shall study both in succeeding chapters. Another type of parsing which
embodies features of both top-down and bottom-up parsing is left-corner
parsing. Left-corner parsing will be treated in the Exercises.

3.4.5. Grammatical Covering

Let G 1 be a CFG. We can consider a grammar G 2 to be similar from


the point of view of the parsing process if L(G_2) = L(G_1) and we can express
the left and/or right parse of a sentence generated by G_1 in terms of its parse
in G_2. If such is the case, we say that G_2 covers G_1. There are several uses for
covering grammars. For example, if a programming language is expressed
in terms of a grammar which is "hard" to parse, then it would be desirable
to find a covering grammar which is "easier" to parse. Also, certain parsing
algorithms which we shall study work only if a grammar is in some normal
form, e.g., CNF or non-left-recursive. If G_1 is an arbitrary grammar and G_2
a particular normal form of G_1, then it would be desirable if the parses in
G_1 can be simply recovered from those in G_2. If this is the case, it is not
necessary that we be able to recover parses in G_2 from those in G_1.
For a formal definition of what it means to "recover" parses in one gram-
mar from those in another, we use the notion of a string homomorphism
between the parses. Other, stronger mappings could be used, and some of
these are discussed in the Exercises.

DEFINITION

Let G_1 = (N_1, Σ, P_1, S_1) and G_2 = (N_2, Σ, P_2, S_2) be CFG's such that
L(G_1) = L(G_2). We say that G_2 left-covers G_1 if there is a homomorphism h
from P_2 to P_1 such that
(1) If S_2 =π=>_lm w, then S_1 =h(π)=>_lm w, and
(2) For all π such that S_1 =π=>_lm w, there exists π' such that S_2 =π'=>_lm w and
h(π') = π.
We say that G_2 right-covers G_1 if there is a homomorphism h from P_2 to
P_1 such that
(1) If S_2 =π=>_rm w, then S_1 =h(π)=>_rm w, and
(2) For all π such that S_1 =π=>_rm w, there exists π' such that S_2 =π'=>_rm w and
h(π') = π.

Example 3.28
Let G_1 be the grammar
(1) S → 0S1
(2) S → 01
and let G_2 be the following CNF grammar equivalent to G_1:
(1) S → AB
(2) S → AC
(3) B → SC
(4) A → 0
(5) C → 1
We see that G_2 left-covers G_1 with the homomorphism h(1) = 1, h(2) = 2, and
h(3) = h(4) = h(5) = e. For example,

    S =1432455=>_lm 0011 in G_2,   h(1432455) = 12,   and   S =12=>_lm 0011 in G_1.

G_2 also right-covers G_1, and in this case the same h can be used. For
example,

    S =1352544=>_rm 0011 in G_2,   h(1352544) = 12,   and   S =12=>_rm 0011 in G_1.

G_1 does not left- or right-cover G_2. Since both grammars are unambigu-
ous, the mapping between parses is fixed. Thus a homomorphism g showing
that G_1 was a left cover would have to map 1^n 2 into (143)^n 2 4 5^{n+1}, which
can easily be shown to be impossible. □
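The cover of Example 3.28 can be checked mechanically on any particular sentence: take a left parse in G_2, map each production number through h, and verify that the image is a left parse of the same sentence in G_1. The sketch below is my own illustration; the representation of the grammars is an assumption.

    # Grammars G1 and G2 of Example 3.28, and the covering homomorphism h.
    G1 = {1: ('S', ['0', 'S', '1']), 2: ('S', ['0', '1'])}
    G2 = {1: ('S', ['A', 'B']), 2: ('S', ['A', 'C']), 3: ('B', ['S', 'C']),
          4: ('A', ['0']), 5: ('C', ['1'])}
    h = {1: [1], 2: [2], 3: [], 4: [], 5: []}

    def derive(grammar, parse, start='S'):
        """Return the string obtained by the leftmost derivation `parse`."""
        nonterminals = {lhs for lhs, _ in grammar.values()}
        form = [start]
        for i in parse:
            lhs, rhs = grammar[i]
            pos = next(k for k, x in enumerate(form) if x in nonterminals)
            assert form[pos] == lhs
            form[pos:pos + 1] = rhs
        return ''.join(form)

    pi2 = [1, 4, 3, 2, 4, 5, 5]            # a left parse of 0011 in G2
    pi1 = [j for i in pi2 for j in h[i]]   # its image under h, namely [1, 2]
    assert derive(G2, pi2) == derive(G1, pi1) == '0011'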

Many of the constructions in Section 2.4, which put grammars into


normal forms, can be shown to yield grammars which left- or right-cover
the original.

Example 3.29
The key step in the Chomsky normal form construction (Algorithm 2.12)
is the replacement of a production A → X_1 ··· X_n, n > 2, by A → X_1 B_1,
B_1 → X_2 B_2, ..., B_{n−2} → X_{n−1} X_n. The resulting grammar can be shown to
left-cover the original if we map production A → X_1 B_1 to A → X_1 ··· X_n
and each of the productions B_1 → X_2 B_2, ..., B_{n−2} → X_{n−1} X_n to the empty
string. If we wish a right cover instead, we may replace A → X_1 ··· X_n by
A → B_1 X_n, B_1 → B_2 X_{n−1}, ..., B_{n−2} → X_1 X_2. □

Other covering results are left for the Exercises.

EXERCISES

3.4.1. Give an algorithm to construct a derivation tree from a left or right


parse.
 3.4.2. Let G be a CFG. Show that L_l^G is a deterministic CFL.
 3.4.3. Is L_r^G always a deterministic CFL?
*3.4.4. Construct a deterministic pushdown transducer P such that

            τ(P) = {(π, π') | π is in L_l^G and π' is the right parse
                    for the same derivation tree}.

*3.4.5. Can you construct a deterministic pushdown transducer P such that
            τ(P) = {(π, π') | π is in L_r^G and π' is the corresponding left parse}?
 3.4.6. Give left and right parses in G_0 for the following words:
        (a) ((a))
        (b) a + (a + a)
        (c) a * a * a
 3.4.7. Let G be the CFG defined by the following numbered productions:
        (1) S → if B then S else S
        (2) S → s
        (3) B → B ∧ B
        (4) B → B ∨ B
        (5) B → b
        Give SDTS's which define T_l^G and T_r^G.
 3.4.8. Give PDT's which define T_l^G and T_r^G, where G is as in Exercise 3.4.7.
3.4.9. Prove Theorem 3.10.
3.4.10. Prove Theorem 3.11.
3.4.11. Give an appropriate definition for T_r^G, and prove that for your SDTS,
words in L(G) are mapped to their right parses.
3.4.12. Give an algorithm to convert an extended PDT to an equivalent PDT.
Your algorithm should be such that if applied to a deterministic extended
PDT, the result is a DPDT. Prove that your algorithm does this.

3.4.13. Prove Theorem 3.12.


*3.4.14. Give deterministic right parsers for the grammars
         (a) (1) S → S0
             (2) S → S1
             (3) S → e
         (b) (1) S → AB
             (2) A → 0A1
             (3) A → e
             (4) B → B1
             (5) B → e
*3.4.15. Give deterministic left parsers for the grammars
         (a) (1) S → 0S
             (2) S → 1S
             (3) S → e
         (b) (1) S → 0S1
             (2) S → A
             (3) A → A1
             (4) A → e
"3.4.16. Which of the grammars in Exercise 3.4.14 have deterministic left par-
sers ? Which in Exercise 3.4.15 have deterministic right parsers ?
"3.4.17. Give a detailed proof that the grammars in Examples 3.26 and 3.27
are right- (left-) parsable but not left- (right-) parsable.
3.4.18. Complete the proof of Theorem 3.14.
3.4.19. Complete the proof of Lemma 3.15.
3.4.20. Complete the proof of Theorem 3.15.
DEFINITION
The left corner of a non-e-production is the leftmost symbol (ter-
minal or nonterminal) on the right side. A left-corner parse of a sentence
is the sequence of productions used at the interior nodes of a parse
tree in which all nodes have been ordered as follows. If a node n has
p direct descendants n_1, n_2, ..., n_p, then all nodes in the subtree with
root n_1 precede n. Node n precedes all its other descendants. The des-
cendants of n_2 precede those of n_3, which precede those of n_4, and
so forth.
Roughly speaking, in left-corner parsing the left corner of a produc-
tion is recognized bottom-up and the remainder of the production is
recognized top-down.

Example 3.30
Figure 3.11 shows a parse tree for the sentence bbaaab generated
by the following grammar:
(1) S → AS    (2) S → BB
(3) A → bAA   (4) A → a
(5) B → b     (6) B → e
Fig. 3.11 Parse tree. [Tree not reproduced. Its root n_1 (labeled S) has descendants n_2 (A) and n_3 (S); n_2 has descendants n_4 (b), n_5 (A), n_6 (A); n_5 has descendants n_9 (b), n_10 (A), n_11 (A); n_10, n_11, and n_6 derive a at nodes n_15, n_16, and n_12; n_3 has descendants n_7 (B) and n_8 (B), which derive b at n_13 and e at n_14.]

The ordering of the nodes imposed by the left-corner-parse definition
states that node n_2 and its descendants precede n_1, which is then fol-
lowed by n_3 and its descendants. Node n_4 precedes n_2, which precedes
n_5, n_6, and their descendants. Then n_9 precedes n_5, which precedes n_10,
n_11, and their descendants. Continuing in this fashion we obtain the
following ordering of nodes:

    n_4 n_2 n_9 n_5 n_15 n_10 n_16 n_11 n_12 n_6 n_1 n_13 n_7 n_3 n_14 n_8

The left-corner parse is the sequence of productions applied at the


interior nodes in this order. Thus the left-corner parse for bbaaab is
334441625.

Another method of defining the left-corner parse of a sentence of
a grammar G is to use the following simple SDTS associated with G.

DEFINITION

Let G = (N, Σ, P, S) be a CFG in which the productions are
numbered 1 to p. Let Tlc be the simple SDTS (N, Σ, {1, 2, . . . , p}, R, S),
where R contains a rule for each production in P, determined as follows:
If the ith production in P is A → Bα or A → aα or A → e, then R
contains the rule A → Bα, Biα′ or A → aα, iα′ or A → e, i, respectively, where α′ is α with all terminal symbols removed. Then, if (w, π)
is in τ(Tlc), π is a left-corner parse for w.

Example 3.31
Tlc for the grammar of the previous example is

S → AS, A1S        S → BB, B2B
A → bAA, 3AA       A → a, 4
B → b, 5           B → e, 6

We can confirm that (bbaaab, 334441625) is in τ(Tlc). □
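To make the node ordering concrete, here is a small Python sketch (ours, not from the text; the function name and the tree encoding are illustrative). A derivation tree is written as (production number, list of subtrees), with terminal leaves as plain strings, and the function lists the productions at the interior nodes in left-corner order: first the subtree of the left corner, then the node itself, then the remaining subtrees from left to right.

    def left_corner_parse(tree):
        # A tree is either a terminal leaf (a string) or a pair
        # (production number, list of subtrees).
        if isinstance(tree, str):
            return []                       # leaves contribute no productions
        prod, children = tree
        first = left_corner_parse(children[0]) if children else []
        rest = [p for c in children[1:] for p in left_corner_parse(c)]
        return first + [prod] + rest

    # One derivation tree for bbaaab in the grammar of Example 3.30, chosen so
    # that the first B derives e and the second derives b (consistent with the
    # parse quoted above; the exact drawing of Fig. 3.11 is not reproduced here):
    tree = (1, [(3, ['b', (3, ['b', (4, ['a']), (4, ['a'])]), (4, ['a'])]),
                (2, [(6, []), (5, ['b'])])])
    print(left_corner_parse(tree))          # -> [3, 3, 4, 4, 4, 1, 6, 2, 5]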

3.4.21. Prove that (w, π) is in τ(Tlc) if and only if π is a left-corner parse for w.
3.4.22. Show that for each CFG there is a (nondeterministic) PDT which maps
the sentences of the language to their left-corner parses.
3.4.23. Devise algorithms which will map a left-corner parse into (1) the
corresponding left parse and (2) the corresponding right parse and
conversely.
3.4.24. Show that if G3 left- (right-) covers G2 and G2 left- (right-) covers G1,
then G3 left- (right-) covers G1.
3.4.25. Let G1 be a cycle-free grammar. Show that G1 is left- and right-covered
by grammars with no single productions.
3.4.26. Show that every cycle-free grammar is left- and right-covered by gram-
mars in CNF.
*3.4.27. Show that not every CFG is covered by an e-free grammar.
3.4.28. Show that Algorithm 2.9, which eliminates useless symbols, produces
a grammar which left- and right-covers the original.
**3.4.29. Show that not every proper grammar is left- or right-covered by a
grammar in GNF. Hint: Consider the grammar S → S0 | S1 | 0 | 1.
**3.4.30. Show that Exercise 3.4.29 still holds if the homomorphism in the
definition of cover is replaced by a finite transduction.
"3.4.31. Does Exercise 3.4.29 still hold if the homomorphism is replaced by a
pushdown transducer mapping ?

Research Problem
3.4.32. It would be nice if whenever G2 left- or right-covers G1, every SDTS
with G1 as underlying grammar were equivalent to an SDTS with G2
as underlying grammar. Unfortunately, this is not so. Can you find
conditions relating G1 and G2 so that the SDT's with underlying
grammar G1 are a subset of those with underlying grammar G2?

BIBLIOGRAPHIC NOTES

Additional details concerning grammatical covering can be found in Reynolds
and Haskell [1970], Gray [1969], and Gray and Harrison [1969]. In some early
articles left-corner parsing was called bottom-up parsing. A more extensive
treatment of left-corner parsing is contained in Cheatham [1967].
4 GENERAL PARSING METHODS

This chapter is devoted to parsing algorithms that are applicable to


the entire class of context-free languages. Not all these algorithms can be
used on all context-free grammars, but each context-free language has at
least one grammar for which all these methods are applicable.
The full backtracking algorithms will be discussed first. These algorithms
deterministically simulate nondeterministic parsers. As a function of the
length of the string to be parsed, these backtracking methods require linear
space but may take exponential time.
The algorithms discussed in the second section of this chapter are tabular
in nature. These algorithms are the Cocke-Younger-Kasami algorithm and
Earley's algorithm. They each take space n² and time n³. Earley's algorithm
works for any context-free grammar and requires time n² whenever the
grammar is unambiguous.
The algorithms in this chapter are included in this book primarily to give
more insight into the design of parsers. It should be clearly stated at the
outset that backtrack parsing algorithms should be shunned in most practical
applications. Even the tabular methods, which are asymptotically much faster
than the backtracking algorithms, should be avoided if the language at hand
has a grammar for which the more efficient parsing algorithms of Chapters 5 and
6 are applicable. It is almost certain that virtually all programming languages
have easily parsable grammars for which these algorithms are applicable.
The methods of this chapter would be used in applications where the
grammars encountered do not possess the special properties that are needed
by the algorithms of Chapters 5 and 6. For example, if ambiguous grammars
are necessary, and all parses are of interest, as in natural language processing,
then some of the methods of this chapter might be considered.


4.1. BACKTRACK PARSING

Suppose that we have a nondeterministic pushdown transducer P and


an input string w. Suppose further that each sequence of moves that P can
make on input w is of bounded length. Then the total number of distinct
sequences of moves that P can make is also finite, although possibly an expo-
nential function of the length of w. A crude, but straightforward, way of
deterministically simulating P is to linearly order the sequences of moves in
some manner and then simulate each sequence of moves in the prescribed
order.
If we are interested in all outputs for input w, then we would have to
simulate all move sequences. If we are interested in only one output for w,
then once we have found the first sequence of moves that terminates in
a final configuration, we can stop simulating P. Of course, if no sequence of
moves terminates in a final configuration, then all move sequences would
have to be tried.
We can think of backtrack parsing in the following terms. Usually,
the sequences of moves are arranged in such an order that it is possible to
simulate the next move sequence by retracing (backtracking) the last moves
made until a configuration is reached in which an untried alternative move
is possible. This alternative move would then be taken. In practice, local
criteria by which it is possible, without simulating an entire sequence, to
determine that the sequence cannot lead to a final configuration, are used
to speed up the backtracking process.
In this section we shall describe how we can deterministically simulate
a nondeterministic pushdown transducer using backtracking. We shall then
discuss two special cases. The first will be top-down backtrack parsing in
which we produce a left parse for the input. The second case is bottom-up
backtrack parsing in which we produce a right parse.

4.1.1. Simulation of a PDT

Let us consider a PDT P and its underlying PDA M. If we give M an


input w, it is convenient to know that while M may nondeterministically try
many sequences of moves, each sequence is of bounded length. If so, then
these sequences can all be tried in some reasonable order. If there are infinite
sequences of moves with input w, it is, in at least one sense, impossible to
directly simulate M completely. Thus we make the following definition.
DEFINITION

A PDA M = (Q, Σ, Γ, δ, q0, Z0, F) is halting if for each w in Σ* there is
a constant kw such that if (q0, w, Z0) ⊢^m (q, x, γ), then m < kw. A PDT is
halting if its underlying PDA is halting.

It is interesting to observe the conditions on a grammar G under which
the left or right parser for G is halting. It is left for the Exercises to show
that the left parser is halting if and only if G is not left-recursive; the right
parser is halting if and only if G is cycle-free and has no e-productions. We
shall show subsequently that these conditions are the ones under which our
general top-down and bottom-up backtrack parsing algorithms work,
although more general algorithms work on a larger class of grammars.
We should observe that the condition of cycle freedom plus no e-produc-
tions is not really very restrictive. Every CFL without e has such a grammar,
and, moreover, any context-free grammar can be made cycle-free and e-free
by simple transformations (Algorithms 2.10 and 2.11). What is more, if the
original grammar is unambiguous, then the modified grammar left and right
covers it. Non-left recursion is a more stringent condition in this sense. While
every CFL has a non-left-recursive grammar (Theorem 2.18), there may be
no non-left-recursive covering grammar. (See Exercise 3.4.29.)
As an example of what is involved in backtrack parsing and, in general,
simulating a nondeterministic pushdown transducer, let us consider the
grammar G with productions
(1) S → aSbS
(2) S → aS
(3) S → c
The following pushdown transducer T is a left parser for G. The moves of
T are given by
δ(q, a, S) = {(q, SbS, 1), (q, S, 2)}
δ(q, c, S) = {(q, e, 3)}
δ(q, b, b) = {(q, e, e)}
Suppose that we wish to parse the input string aacbc. Figure 4.1 shows
a tree which represents the possible sequences of moves that T can make
with this input.

Fig. 4.1 Moves of parser.

C0 represents the initial configuration (q, aacbc, S, e). The rules of T show
that two next configurations are possible from C0, namely C1 = (q, acbc, SbS, 1)
and C2 = (q, acbc, S, 2). (The ordering here is arbitrary.) From C1, T can
enter configurations C3 = (q, cbc, SbSbS, 11) and C4 = (q, cbc, SbS, 12).
From C2, T can enter configurations C11 = (q, cbc, SbS, 21) and C15 =
(q, cbc, S, 22). The remaining configurations are determined uniquely.
One way to determine all parses for the given input string is to determine
all accepting configurations which are accessible from Co in the tree of con-
figurations. This can be done by tracing out all possible paths which begin
at C0 and terminate in a configuration from which no next move is possible.
We can assign an order in which the paths are tried by ordering the choices
of next moves available to T for each combination of state, input symbol,
and symbol on top of the pushdown list. For example, let us choose (q, SbS, 1)
as the first choice and (q, S, 2) as the second choice of move whenever
the rule δ(q, a, S) is applicable.
Let us now consider how all the accepting configurations of T can be
determined by systematically tracing out all possible sequences of moves of
T. From C0 suppose that we make the first choice of next move to obtain C1.
From C1 we again take the first choice to obtain C3. Continuing in this
fashion we follow the sequence of configurations C0, C1, C3, C5, C6, C7. C7
represents the terminal configuration (q, e, bS, 1133), which is not an accept-
ing configuration. To determine if there is another terminal configuration,
we can "backtrack" up the tree until we encounter a configuration from
which another choice of next move not yet considered is available. Thus we
must be able to restore configuration C6 from C7. Going back to C6 from
C7 can involve moving the input head back on the input, recovering what
was previously on the pushdown list, and deleting any output symbols that
were emitted in going from C6 to C7. Having restored C6, we must also have
available to us the next choice of moves (if any). Since no alternate choices
exist in C6, we continue backtracking to C5, and then C3 and C1.
From C1 we can then use the second choice of move for δ(q, a, S) and
obtain configuration C4. We can then continue through configurations C8
and C9 to obtain C10 = (q, e, e, 1233), which happens to be an accepting
configuration.
We can then emit the left parse 1233 as output. If we are interested in
obtaining only one parse for the input we can halt at this point. However,
if we are interested in all parses, we can proceed to backtrack to configuration
C0 and then try all configurations accessible from C2. C14 represents another
accepting configuration, (q, e, e, 2133).
We would then halt after all possible sequences of moves that T could
have made have been considered. If the input string had not been syntactically
well formed, then all possible move sequences would have to be considered.

After exhausting all choices of moves without finding an accepting configu-


ration we would output the message "error."
The above analysis illustrates the salient features of what is sometimes
known as a nondeterministic algorithm, one in which choices are allowed at
certain steps and all choices must be followed. In effect, we systematically
generate all configurations that the data underlying the algorithm can be in
until we either encounter a solution or exhaust all possibilities. The notion
of a nondeterministic algorithm is thus applicable not only to the simulation
of nondeterministic automata, but to many other problems as well. It is
interesting to note that something analogous to the halting condition for
PDT's always enters into the question of whether a nondeterministic
algorithm can be simulated deterministically. Some specific examples of
nondeterministic algorithms are found in the Exercises.
In syntax analysis a grammar rather than a pushdown transducer will
usually be given. For this reason we shall now discuss top-down and bottom-
up parsing directly in terms of the given grammar rather than in terms of
the left or right parser for the grammar. However, the manner in which
the algorithms work is identical to the serial simulation of the pushdown
parser. Instead of cycling through the possible sequences of moves the parser
can make, we shall cycle through all possible derivations that are consistent
with the input.

4.1.2. Informal Top-Down Parsing

The name top-down parsing comes from the idea that we attempt to
produce a parse tree for the input string starting from the top (root) and
working down to the leaves. We begin by taking the given grammar and
numbering in some order the alternates for every nonterminal. That is, if
A → α1 | α2 | ⋯ | αn are all the A-productions in the grammar, we assign
some ordering to the αi's (the alternates for A).
For example, consider the grammar mentioned in the previous section.
The S-productions are
S → aSbS | aS | c

and let us use them in the order given. That is, aSbS will be the first alternate
for S, aS the second, and c the third. Let us assume that our input string is
aacbc. We shall use an input pointer which initially points at the leftmost
symbol of the input string.
Briefly stated, a top-down parser attempts to generate a derivation tree
for the input as follows. We begin with a tree containing one node labeled S.
That node is the initial active node. We then perform the following steps
recursively"

(1) If the active node is labeled by a nonterminal, say A, then choose


the first alternate, say X1 ⋯ Xk, for A and create k direct descendants for
A labeled X1, X2, . . . , Xk. Make X1 the active node. If k = 0, then make
the node immediately to the right of A active.
(2) If the active node is labeled by a terminal, say a, then compare the
current input symbol with a. If they match, then make active the node
immediately to the right of a and move the input pointer one symbol to the
right. If a does not match the current input symbol, go back to the node
where the previous production was applied, adjust the input pointer if neces-
sary, and try the next alternate. If no alternate is possible, go back to the next
previous node, and so forth.
At all times we attempt to keep the derivation tree consistent with the
input string. That is, if xα is the frontier of the tree generated thus far, where
α is either e or begins with a nonterminal symbol, then x is a prefix of the input
string.
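The procedure just described can be written as a short recursive Python sketch (ours, not the text's; it presupposes a grammar that is not left-recursive). The grammar is a dict from each nonterminal to its ordered list of alternates, each alternate a tuple of symbols; the function returns the sequence of (nonterminal, alternate index) choices of the first successful left parse, or None.

    def informal_top_down(grammar, start, w):
        def parse(beta, i):
            # Try to derive w[i:] from the sentential-form suffix beta.
            if not beta:
                return [] if i == len(w) else None
            head, rest = beta[0], beta[1:]
            if head in grammar:                      # expand the active nonterminal
                for k, alt in enumerate(grammar[head]):
                    tail = parse(alt + rest, i)
                    if tail is not None:
                        return [(head, k)] + tail
                return None                          # all alternates fail: back up
            if i < len(w) and head == w[i]:          # match a terminal, advance
                return parse(rest, i + 1)
            return None                              # mismatch: back up
        return parse((start,), 0)

    # For S -> aSbS | aS | c and input aacbc this yields
    # [('S', 0), ('S', 1), ('S', 2), ('S', 2)], i.e. the left parse 1233.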
In our example we begin with a derivation tree initially having one node
labeled S. We then apply the first S-production, extending the tree in a man-
ner that is consistent with the given input string. Here, we would use
S → aSbS to extend the tree to Fig. 4.2(a). Since the active node of the tree is
a at this instant and the first input symbol is a, we advance the input pointer
to the second input symbol and make the S immediately to the right of a
the new active node. We then expand this S in Fig. 4.2(a), using the first
alternate, to obtain Fig. 4.2(b). Since the new active node is a, which matches

Fig. 4.2 Partial derivation trees.



the second input symbol, we advance the input pointer to the third input
symbol.
We then expand the leftmost S in Fig. 4.2(b), but this time we cannot use
either the first or second alternate because then the resulting left-sentential
form would not be consistent with the input string. Thus we must use the
third alternate to obtain Fig. 4.2(c). We can now advance the input pointer
from the third to the fourth and then to the fifth input symbol, since the next
two active symbols in the left-sentential form represented by Fig. 4.2(c) are
c and b.
We can expand the leftmost S in Fig. 4.2(c) using the third alternate for
S to obtain Fig. 4.2(d). (The first two alternates are again inconsistent with
the input.) The fifth terminal symbol is c, and thus we can advance the input
pointer one symbol to the left. (We assume that there is a marker to denote
the end of the input string.) However, there are more symbols generated by
Fig. 4.2(d), namely bS, than there are in the input string, so we now know
that we are on the wrong track in finding a correct parse for the input.
Recalling the pushdown parser of Section 4.1.1, we have at this point gone
through the sequence of configurations C0, C1, C3, C5, C6, C7. There is no
next move possible from C7.
We must now find some other left-sentential form. We first see if there is
another alternate for the production used to obtain the tree of Fig. 4.2(d)
from the previous tree. There is none, since we used S ~ c to obtain Fig.
4.2(d) from Fig. 4.2(c). We then return to the tree of Fig. 4.2(c) and reset
the input pointer to position 3 on the input. We determine if there is another
alternate for the production used to obtain Fig. 4.2(c) from the previous tree.
Again there is none, since we used S - 4 c to obtain Fig. 4.2(c) from Fig.
4.2(b). We thus return to Fig. 4.2(b), resetting the input pointer to position 2.
We used the first alternate for S to obtain Fig. 4.2(b) from Fig. 4.2(a), so now
we try the second alternate and obtain the tree of Fig. 4.3(a).
We can now advance the input pointer to position 3, since the a generated
matches the a at position 2 in the input string. Now, we may use only the
third alternate to expand the leftmost S in Fig. 4.3(a) to obtain Fig. 4.3(b).
The input symbols at positions 3 and 4 are now matched, so we can advance
the input pointer to position 5. We can apply only the third alternate for S
in Fig. 4.3(b), and we obtain Fig. 4.3(c). The final input symbol is matched
with the rightmost symbol of Fig. 4.3(c). We thus know that Fig. 4.3(c) is
a valid parse for the input. At this point we can backtrack to continue look-
ing for other parses, or terminate.
Because our grammar is not left-recursive, we shall eventually exhaust all
possibilities by backtracking. That is, we would be at the root, and all alter-
nates for S would have been tried. At this point we can halt, and if we have
not found a parse, we can report that the input string is not syntactically
well formed.

Fig. 4.3 Further attempts at parsing.

There is a major pitfall in this procedure. If the grammar is left-recursive,


then this process may never terminate. For example, suppose that Aα is
the first alternate for A. We would then apply this production forever whenever A is to be expanded.
One might argue that this problem could be avoided by trying the alternate Aα for A last. However, the left recursion might be far more subtle,
involving several productions. For example, the first A-production might be
A → SC. Then if S → AB is the first production for S, we would have
A ⇒ SC ⇒ ABC, and this pattern would repeat. Even if a suitable ordering
for the productions of all nonterminals is found, on inputs which are not
syntactically well formed, the left-recursive cycles would occur eventually,
since all preceding choices would fail.
A second attempt to nullify the effects of left recursion might be to bound
the number of nodes in the temporary tree in terms of the length of the input
string. If we have a CFG G = (N, Σ, P, S) with #N = k and an input string
w of length n ≥ 1, we can show that if w is in L(G), then there is at least one
derivation tree for w that has no path of length greater than kn. Thus we
could confine our search to derivation trees of depth (maximum path length)
no greater than kn.
However, the number of derivation trees of depth < d can be an enor-
mous function of d for some grammars. For example, consider the grammar

G with productions S → SS | e. The number of derivation trees of depth
d for this grammar is given by the recurrence

D(1) = 1
D(d) = (D(d − 1))² + 1

Values of D(d) for d from 1 to 6 are given in Fig. 4.4.

d D(d)

1 1
2 2
3 5
4 26
5 677
6 458330
Fig. 4.4 Values of D(d).

D(d) grows very rapidly, faster than 2^(2^(d−2)) for d > 3. (Also, see Exercise
4.1.4.) This growth is so huge that any grammar in which two productions
of this form need to be considered could not possibly be reasonably parsed
using this modification of the top-down parsing algorithm.
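The recurrence is easy to check mechanically; the following Python fragment (ours) reproduces the table of Fig. 4.4.

    def D(d):
        # D(d) from the recurrence above (number of derivation trees, Fig. 4.4).
        return 1 if d == 1 else D(d - 1) ** 2 + 1

    # [D(d) for d in range(1, 7)]  ->  [1, 2, 5, 26, 677, 458330]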
For these reasons the approach generally taken is to apply the top-down
parsing algorithm only to grammars that are free of left recursion.

4.1.3. The Top-Down Parsing Algorithm

We are now ready to describe our top-down backtrack parsing algorithm.


The algorithm uses two pushdown lists (L1 and L2) and a counter containing
the current position of the input pointer. To describe the algorithm precisely,
we shall use a stylized notation similar to that used to describe configurations
of a pushdown transducer.
ALGORITHM 4.1
Top-down backtrack parsing.
Input. A non-left-recursive CFG G = (N, Σ, P, S) and an input string
w = a1a2 ⋯ an, n ≥ 0. We assume that the productions in P are numbered
1, 2, . . . , p.
Output. One left parse for w if one exists. The output "error" otherwise.

Method.
(1) For each nonterminal A in N, order the alternates for A. Let Ai be
the index for the ith alternate of A. For example, if A → α1 | α2 | ⋯ | αk

are all the A-productions in P and we have ordered the alternates as shown,
then A1 is the index for α1, A2 is the index for α2, and so forth.
(2) A 4-tuple (s, i, α, β) will be used to denote a configuration of the
algorithm:
(a) s denotes the state of the algorithm.
(b) i represents the location of the input pointer. We assume that
the n + 1st "input symbol" is $, the right endmarker.
(c) α represents the first pushdown list (L1).
(d) β represents the second pushdown list (L2).
The top of α will be on the right and the top of β will be on the left. L2 repre-
sents the "current" left-sentential form, the one which our expansion of
nonterminals has produced. Referring to our informal description of top-
down parsing in Section 4.1.2, the symbol on top of L2 is the symbol labeling
the active node of the derivation tree being generated. L1 represents the cur-
rent history of the choices of alternates made and the input symbols over
which the input head has shifted. The algorithm will be in one of three states
q, b, or t; q denotes normal operation, b denotes backtracking, and t is the
terminating state.
(3) The initial configuration of the algorithm is (q, 1, e, S$).
(4) There are six types of steps. These steps will be described in terms of
their effect on the configuration of the algorithm. The heart of the algorithm
is to compute successive configurations defined by a "goes to" relation, ⊢.
The notation (s, i, α, β) ⊢ (s′, i′, α′, β′) means that if the current configuration is (s, i, α, β), then we are to go next into the configuration (s′, i′, α′, β′).
Unless otherwise stated, i can be any integer from 1 to n + 1, α a string in
(Σ ∪ I)*, where I is the set of indices for the alternates, and β a string in
(N ∪ Σ)*. The six types of move are as follows:
(a) Tree expansion

(q, i, α, Aβ) ⊢ (q, i, αA1, γ1β)

where A → γ1 is a production in P and γ1 is the first alternate for


A. This step corresponds to an expansion of the partial derivation
tree using the first alternate for the leftmost nonterminal in the
tree.
(b) Successful match of input symbols and derived symbol

(q, i, α, aβ) ⊢ (q, i + 1, αa, β)

provided ai = a, i ≤ n. If the ith input symbol matches the next


terminal symbol derived, we move that terminal symbol from
the top of L2 to the top of L1 and increment the input pointer.

(c) Successful conclusion

(q, n + 1, α, $) ⊢ (t, n + 1, α, e)

We have reached the end of the input and have found a left-
sentential form which matches the input. We can recover the left
parse from α by applying the following homomorphism h to α:
h(a) = e for all a in Σ; h(Ai) = p, where p is the production number associated with the production A → γi, and γi is the ith alternate for A.
(d) Unsuccessful match of input symbol and derived symbol

(q, i, α, aβ) ⊢ (b, i, α, aβ) if ai ≠ a

We go into the backtracking mode as soon as the left-sentential


form being derived is not consistent with the input.
(e) Backtracking on input

(b, i, αa, β) ⊢ (b, i − 1, α, aβ)

for all a in Σ. In the backtracking mode we shift input symbols


back from L1 to L2.
(f) Try next alternate

(b, i, αAj, γjβ) ⊢

(i) (q, i, αAj+1, γj+1β), if γj+1 is the (j + 1)st alternate for A.
(Note that γj is replaced by γj+1 on the top of L2.)
(ii) No configuration, if i = 1, A = S, and there are only j
alternates for S. (This condition indicates that we have
exhausted all possible left-sentential forms consistent with
the input w without having found a parse for w.)
(iii) (b, i, α, Aβ) otherwise. (Here, the alternates for A are
exhausted, and we backtrack by removing Aj from L1 and
replacing γj by A on L2.)
The execution of the algorithm is as follows.
Step 1" Starting in the initial configuration, compute successive next
configurations Co ~ C1 [-- " " ~ Ci ~ .." until no further configurations
can be computed.
Step 2: If the last computed configuration is (t, n + 1, ?, e), emit h(?)
and halt. h(?) is the first found left parse. Otherwise, emit the error signal. D

Algorithm 4.1 is essentially the algorithm we described informally earlier,


with a few bookkeeping features added to perform the backtracking.
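For readers who prefer an executable form, here is a compact Python rendering of Algorithm 4.1 (a sketch of ours, not part of the algorithm's statement; it uses 0-based input positions in place of the 1-based pointer and assumes the endmarker $ is not a grammar symbol). The grammar is a dict from each nonterminal to its ordered list of alternates, and the left parse is returned as the list of (nonterminal, alternate index) pairs collected on L1.

    def top_down_parse(grammar, start, w):
        # One left parse of w as (A, alternate) choices, or None ("error").
        # Termination presupposes a non-left-recursive grammar, as in the text.
        n = len(w)
        l1 = []                    # L1: history, top at the right end
        l2 = ['$', start]          # L2: left-sentential form, top at the right end
        i, state = 0, 'q'
        while True:
            if state == 'q':
                top = l2[-1]
                if top in grammar:                       # move (a): expand
                    l2.pop()
                    l2.extend(reversed(grammar[top][0]))
                    l1.append((top, 0))
                elif top == '$':
                    if i == n:                           # move (c): success
                        return [x for x in l1 if isinstance(x, tuple)]
                    state = 'b'                          # form exhausted, input is not
                elif i < n and top == w[i]:              # move (b): match a terminal
                    l2.pop(); l1.append(top); i += 1
                else:                                    # move (d): mismatch
                    state = 'b'
            else:                                        # backtracking mode
                if not l1:
                    return None                          # move (f ii): report error
                top = l1.pop()
                if not isinstance(top, tuple):           # move (e): un-shift a terminal
                    l2.append(top); i -= 1
                else:
                    A, j = top
                    alt = grammar[A][j]
                    if alt:                              # remove the alternate from L2
                        del l2[len(l2) - len(alt):]
                    if j + 1 < len(grammar[A]):          # move (f i): next alternate
                        l2.extend(reversed(grammar[A][j + 1]))
                        l1.append((A, j + 1))
                        state = 'q'
                    else:                                # move (f iii): restore A on L2
                        l2.append(A)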

Example 4.1
Let us consider the operation of Algorithm 4.1 using the grammar G
with productions
(1) E → T + E
(2) E → T
(3) T → F * T
(4) T → F
(5) F → a

Let E1 be T + E, E2 be T, T1 be F * T, and T2 be F. With the input a + a,
Algorithm 4.1 computes the following sequence of configurations:

(q, 1, e, E$) ⊢ (q, 1, E1, T + E$)
⊢ (q, 1, E1T1, F * T + E$)
⊢ (q, 1, E1T1F1, a * T + E$)
⊢ (q, 2, E1T1F1a, * T + E$)
⊢ (b, 2, E1T1F1a, * T + E$)
⊢ (b, 1, E1T1F1, a * T + E$)
⊢ (b, 1, E1T1, F * T + E$)
⊢ (q, 1, E1T2, F + E$)
⊢ (q, 1, E1T2F1, a + E$)
⊢ (q, 2, E1T2F1a, + E$)
⊢ (q, 3, E1T2F1a +, E$)
⊢ (q, 3, E1T2F1a + E1, T + E$)
⊢ (q, 3, E1T2F1a + E1T1, F * T + E$)
⊢ (q, 3, E1T2F1a + E1T1F1, a * T + E$)
⊢ (q, 4, E1T2F1a + E1T1F1a, * T + E$)
⊢ (b, 4, E1T2F1a + E1T1F1a, * T + E$)
⊢ (b, 3, E1T2F1a + E1T1F1, a * T + E$)
⊢ (b, 3, E1T2F1a + E1T1, F * T + E$)
⊢ (q, 3, E1T2F1a + E1T2, F + E$)
⊢ (q, 3, E1T2F1a + E1T2F1, a + E$)
⊢ (q, 4, E1T2F1a + E1T2F1a, + E$)
⊢ (b, 4, E1T2F1a + E1T2F1a, + E$)
⊢ (b, 3, E1T2F1a + E1T2F1, a + E$)
⊢ (b, 3, E1T2F1a + E1T2, F + E$)
⊢ (b, 3, E1T2F1a + E1, T + E$)
⊢ (q, 3, E1T2F1a + E2, T$)
⊢ (q, 3, E1T2F1a + E2T1, F * T$)
⊢ (q, 3, E1T2F1a + E2T1F1, a * T$)
⊢ (q, 4, E1T2F1a + E2T1F1a, * T$)
⊢ (b, 4, E1T2F1a + E2T1F1a, * T$)
⊢ (b, 3, E1T2F1a + E2T1F1, a * T$)
⊢ (b, 3, E1T2F1a + E2T1, F * T$)
⊢ (q, 3, E1T2F1a + E2T2, F$)
⊢ (q, 3, E1T2F1a + E2T2F1, a$)
⊢ (q, 4, E1T2F1a + E2T2F1a, $)
⊢ (t, 4, E1T2F1a + E2T2F1a, e)

The left parse is h(E1T2F1a + E2T2F1a) = 145245. □
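Run on the grammar of Example 4.1 (production numbers in the comments are those of the text), the sketch given after Algorithm 4.1 reproduces the same parse:

    g = {'E': [('T', '+', 'E'), ('T',)],        # productions 1 and 2
         'T': [('F', '*', 'T'), ('F',)],        # productions 3 and 4
         'F': [('a',)]}                          # production 5
    print(top_down_parse(g, 'E', 'a+a'))
    # -> [('E', 0), ('T', 1), ('F', 0), ('E', 1), ('T', 1), ('F', 0)],
    #    which is the left parse 145245.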

We shall now show that Algorithm 4.1 does indeed produce a left parse
for w according to G if one exists.
DEFINITION
A partial left parse is the sequence of productions used in a leftmost
derivation of a left-sentential form. We say that a partial left parse is con-
sistent with the input string w if the associated left-sentential form is consis-
tent with w.
Let G = (N, Σ, P, S) be the non-left-recursive grammar of Example 4.1
and let w = a1 ⋯ an be the input string. The sequence of consistent partial
left parses for w, π0, π1, π2, . . . , πi, . . . , is defined as follows:
(1) π0 is e and represents a derivation of S from S. (π0 is not strictly
a parse.)
(2) π1 is the production number for S → α, where α is the first alternate
for S.
(3) πi is defined as follows: Suppose that S ⇒^(πi−1) xAγ. Let β be the lowest-numbered alternate for A, if it exists, such that we can write xβγ = xyδ,
where δ is either e or begins with a nonterminal and xy is a prefix of w.
Then πi = πi−1Ak, where k is the number of alternate β. In this case we
call πi a continuation of πi−1. If, on the other hand, no such β exists, or
S ⇒^(πi−1) x for some terminal string x, then let j be the largest integer less
than i − 1 such that the following conditions hold:

(a) Let S ⇒^(πj) xBγ, and let πj+1 be a continuation of πj, with alternate
αk replacing B in the last step of πj+1. Then there exists an alternate αm for B which follows αk in the order of alternates for B.
(b) We can write xαmγ = xyδ, where δ is e or begins with a nonterminal; xy is a prefix of w. Then πi = πjBm, where Bm is the number of production B → αm. In this case, we call πi a modification
of πi−1.
(c) πi is undefined if (a) or (b) does not apply.

Example 4.2
For the grammar G of Example 4.1 and the input string a + a, the
sequence of consistent partial left parses is

e
1
13
14
145
1451
14513
14514
1452
14523
14524
145245
2
23
24

It should be observed that the sequence of consistent partial left parses


up to the first correct parse is related to the sequence of strings appearing
on L1. Neglecting the terminal symbols on L1, the two sequences are the
same, except that L1 will have certain sequences that are not consistent with
the input. When such a sequence appears on L1, backtracking immediately
occurs. [~]

It should be obvious that the sequence of consistent partial left parses is


unique and includes all the consistent partial left parses in a natural lexi-
cographic order.
LEMMA 4.1

Let G = (N, Σ, P, S) be a non-left-recursive grammar. Then there exists
a constant c such that if A ⇒^i_lm wBα and |w| = n, then i < c^(n+2).†

† In fact, a stronger result is possible; i is linear in n. However, this result suffices for
the time being and will help prove the stronger result.

Proof. Let #N = k, and consider the derivation tree D corresponding
to the leftmost derivation A ⇒^i_lm wBα. Suppose that there exists a path of
length more than k(n + 2) from the root to a leaf. Let n0 be the node labeled
by the explicitly shown B in wBα. If the path reaches a leaf to the right of
n0, then the path to n0 must be at least as long. This follows because in
a leftmost derivation the leftmost nonterminal is always rewritten. Thus
the direct ancestor of each node to the right of n0 is an ancestor of n0. The
derivation tree D is shown in Fig. 4.5.

Fig. 4.5 Derivation tree D.

Thus, if there is a path of length greater than k(n + 2) in D, we can find
one such path which reaches n0 or a node to its left. Then we can find k + 1
consecutive nodes, say n1, . . . , nk+1, on the path such that each node yields
the same portion of wB. All the direct descendants of ni, 1 ≤ i ≤ k, that lie
to the left of ni+1 derive e. We must thus be able to find two of n1, . . . , nk+1
with the same label, and this label is easily shown to be a left-recursive
nonterminal.
We may conclude that D has no path of length greater than k(n + 2).
Let l be the length of the longest right side of a production. Then D has no
more than l^(k(n+2)) interior nodes. We conclude that if A ⇒^i wBα, then
i ≤ l^(k(n+2)). Choosing c = l^k proves the lemma. □

COROLLARY
Let G = (N, Σ, P, S) be a non-left-recursive grammar. Then there is
a constant c′ such that if S ⇒^i_lm wBα and w ≠ e, then |α| ≤ c′|w|.

Proof. Referring to Fig. 4.5, we have shown that the path from the root to
n0 is no longer than k(|w| + 2). Thus, |α| ≤ kl(|w| + 2). Choose c′ = 3kl. □

LEMMA 4.2
Let G = (N, Σ, P, S) be a CFG with no useless nonterminals and
w = a1a2 ⋯ an an input string in Σ*. The sequence of consistent left parses
for w is finite if and only if G is not left-recursive.

Proof. If G is left-recursive, then clearly the sequence of consistent left


parses is infinite for some terminal string. Suppose that G is not left-recursive.
Then each consistent left parse is of length at most c^(n+2), for some c, by Lemma
4.1. There are thus a finite number of consistent left parses. □

DEFINITION

Let G = (N, Σ, P, S) be a CFG and γ a sequence of subscripted nonterminals (indices for alternates) and terminals. Let π be a partial left parse
consistent with w. We say that γ describes π if the following holds:
(1) Let π = p1 ⋯ pk, and S = α0 ⇒^(p1) α1 ⇒^(p2) α2 ⋯ ⇒^(pk) αk. Let αj = xjβj,
where βj is e or begins with a nonterminal.
(2) Then γ = Ai1w1Ai2w2 ⋯ Aikwk, where Aij is the index for the production applied going from αj−1 to αj, and wj is the suffix of xj such that
xj = xj−1wj.
LEMMA 4.3
Let G = (N, Σ, P, S) be a non-left-recursive grammar and π0, π1, . . . ,
πi, . . . be the sequence of consistent partial left parses for w. Suppose that
none of π0, . . . , πi are left parses for w. Let S ⇒^(πi) α and S ⇒^(πi+1) β. Write
α and β, respectively, as xα1 and yβ1, where α1 and β1 are each either e or
begin with a nonterminal. Then in Algorithm 4.1,

(q, 1, e, S$) ⊢* (q, j1, γ1, α1$) ⊢* (q, j2, γ2, β1$),

where j1 = |x| + 1, j2 = |y| + 1, and γ1 and γ2 describe πi and πi+1,
respectively.

Proof. The proof is by induction on i. The basis, i = 0, is trivial. For
the inductive step, we need to consider two cases.
Case 1: πi+1 is a continuation of πi. Let α1 have first symbol A, with
alternates β1, . . . , βl. If replacing A by βj yields a left-sentential form that is
not consistent with the input, then rules (d), (e), and (fi) ensure that the alternate βj+1 will be the next expansion tried. Since we assumed πi+1 to be
a continuation of πi, the desired alternate for A will subsequently be tried.
After input symbols are shifted by rule (b), configuration (q, j2, γ2, β1$) is
reached.
Case 2: Suppose that πi+1 is a modification of πi. Then all untried alternates for A immediately lead to backtracking, and by rules (e) and (fiii),
the contents of L1 will eventually describe πj, the partial left parse mentioned
in the definition of a modification. Configuration (q, j2, γ2, β1$) is then
reached as in Case 1. □

THEOREM 4.1
Algorithm 4.1 produces a left parse for w if one exists and otherwise
emits an error message.
Proof. From Lemma 4.3 we see that the algorithm cycles through all
consistent partial left parses until either a left parse is found for the input
or all consistent partial left parses are exhausted. From Lemma 4.2 we know
that the number of partial left parses is finite, so the algorithm must eventually terminate. □

4.1.4. Time and Space Complexity of the


Top-Down Parser

Let us consider a computer in which the space needed to store a con-


figuration of Algorithm 4.1 is proportional to the sum of the lengths of
the two lists, a very reasonable assumption. It is also reasonable to assume
that the time spent computing configuration C 2 from C I, if C 1 l:-C2, is
a constant, independent of the configurations involved. Under these assump-
tions, we shall show that Algorithm 4.1 takes linear space and at most expo-
nential time as functions of input length. The proofs require the following
lemma, a strengthening of Lemma 4.1.
LEMMA 4.4
Let G = (N, Σ, P, S) be a non-left-recursive grammar. Then there exists
a constant c such that if A ⇒^i α and |α| ≥ 1, then i ≤ c|α|.
Proof. By Lemma 4.1, there is a constant c1 such that if A ⇒^i e, then
i ≤ c1. Let #N = k and let l be the length of the longest right side of a production. By Lemma 2.16 we can express N as {A0, A1, . . . , Ak−1} such that
if Ai ⇒^+ Ajα, then j > i. We shall prove the following statement by induction
on the parameter p = kn − j:

(4.1.1)   If Aj ⇒^i α and |α| = n ≥ 1, then i ≤ klc1|α| − jlc1.

Basis. The basis, p = 0, holds vacuously, since we assume that n ≥ 1.

Induction. Assume all instances of (4.1.1) such that kn − j < p are true.
Now consider a particular instance with kn − j = p. Let the first step in
the derivation be Aj ⇒ X1 ⋯ Xr, for r ≥ 1. Then we can write α =
α1 ⋯ αr such that Xm ⇒^(im) αm, 1 ≤ m ≤ r, and i = 1 + i1 + ⋯ + ir. Let
α1 = α2 = ⋯ = αs−1 = e, and αs ≠ e. Since α ≠ e, s exists.

Case 1: Xs is a nonterminal, say Ag. Then Aj ⇒^+ AgXs+1 ⋯ Xr, so
g > j. Since k|αs| − g < p, we have by (4.1.1), is ≤ klc1|αs| − glc1. Since
|αm| < |α| for s + 1 ≤ m ≤ r, we have by (4.1.1) that im ≤ klc1|αm| whenever s + 1 ≤ m ≤ r and αm ≠ e. Certainly at most l − 1 of α1, . . . , αr are
e, so the sum of im over those m such that αm = e is at most (l − 1)c1. Thus,

i = 1 + i1 + ⋯ + ir
  ≤ 1 + (l − 1)c1 + klc1|α| − glc1
  ≤ klc1|α| − (g − 1)lc1
  ≤ klc1|α| − jlc1.

Case 2: Xs is a terminal. It is left for the Exercises to show that in this
case i ≤ 1 + klc1(|α| − 1) ≤ klc1|α| − jlc1.

We conclude from (4.1.1) that if S ⇒^i α and |α| ≥ 1, then i ≤ klc1|α|.
Let c = klc1 to conclude the lemma. □

COROLLARY 1
Let G = (N, Σ, P, S) be a non-left-recursive grammar. Then there exists
a constant c′ such that if S ⇒^i_lm wAα and w ≠ e, then i ≤ c′|w|.

Proof. By the corollary to Lemma 4.1, there is a constant c″ such that
|α| ≤ c″|w|. By Lemma 4.4, i ≤ c|wAα|. Since |wAα| ≤ (2 + c″)|w|, the
choice c′ = c(2 + c″) yields the desired result. □

COROLLARY 2
Let G = (N, Σ, P, S) be a non-left-recursive grammar. Then there is
a constant k such that if π is a partial left parse consistent with sentence w,
and S ⇒^π xα, where α is either e or begins with a nonterminal, then
|π| ≤ k(|w| + 1).

Proof. If x ≠ e, then by Corollary 1, we have |π| ≤ c′|x|. Certainly,
|x| ≤ |w|, so |π| ≤ c′|w|. As an exercise, we can show that if x = e, then
|π| ≤ c′. Thus, |π| ≤ c′(|w| + 1) in either case. □
THEOREM 4.2
There is a constant c such that Algorithm 4.1, with input w of length
n ≥ 1, uses no more than cn cells, if only one cell is needed for each symbol
on the two lists of the configurations.

Proof. Except possibly for the last expansion made, list L2 is part of
a left-sentential form α such that S ⇒^π α, where π is a partial left parse
consistent with w. By Corollary 2 to Lemma 4.4, |π| ≤ k(|w| + 1). Since
there is a bound on the length of the right side of any production, say l, we
know that |α| ≤ kl(|w| + 1) ≤ 2kl|w|. Thus the length of L2 is no greater
than 2kl|w| + l − 1 < 3kl|w|.
List L1 consists of part of the left-sentential form α (most or all of the
terminal prefix) and |π| indices. It thus follows by Corollary 2 to Lemma
4.4 that the length of L1 is at most 2k(l + 1)|w|. The sum of the two lengths
is thus proportional to |w|. □

THEOREM 4.3
There is a constant c such that Algorithm 4.1, when its input w is of length
n ≥ 1, makes no more than c^n elementary operations, provided the calculation of one step of Algorithm 4.1 takes a constant number of elementary
operations.

Proof. By Corollary 2, every partial left parse consistent with w is of
length at most c1n for some c1. Thus there are at most c2^n different partial
left parses consistent with w for some constant c2. Algorithm 4.1 computes
at most n configurations between configurations whose contents of L1
describe consecutive partial left parses. The total number of configurations
computed by Algorithm 4.1 is thus no more than nc2^n. From the binomial
theorem the relation nc2^n ≤ (c2 + 1)^(2n) is immediate. Choose c to be (c2 + 1)²m,
where m is the maximum number of elementary operations required to compute one step of Algorithm 4.1. □

Theorem 4.3 is in a sense as strong as possible. That is, there are non-
left-recursive grammars which cause Algorithm 4.1 to spend an exponential
amount of time, because there are c^n partial left parses consistent with some
words of length n.

Example 4.3
Let G = ({S}, {a, b}, P, S), where P consists of S → aSS | e. Let X(n) be
the number of different leftmost parses of a^n, and let Y(n) be the number of
partial left parses consistent with a^n. The following recurrence equations
define X(n) and Y(n):

(4.1.2)    X(0) = 1
           X(n) = Σ (i = 0 to n−1) X(i) X(n − 1 − i)

(4.1.3)    Y(0) = 2
           Y(n) = Y(n − 1) + Σ (i = 0 to n−1) X(i) Y(n − 1 − i)

Line (4.1.2) comes from the fact that every derivation for a sentence a^n
with n ≥ 1 begins with production S → aSS. The remaining n − 1 a's can
be divided any way between the two S's. In line (4.1.3), the Y(n − 1) term
corresponds to the possibility that after the first step S ⇒ aSS, the second
S is never rewritten; the summation corresponds to the possibility that
the first S derives a^i for some i. The formula Y(0) = 2 is from the observation

that the null derivation and the derivation S ⇒ e are consistent with string e.
From Exercise 2.4.29 we have

X(n) = (1/(n + 1)) (2n choose n),

so X(n) ≥ 2^(n−1). Thus

Y(n) ≥ Y(n − 1) + Σ (i = 0 to n−1) 2^(i−1) Y(n − 1 − i),

from which Y(n) > 2^n certainly follows. □
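The two recurrences can be tabulated directly; the following Python fragment (ours, adequate only for small n since it does not memoize) shows how quickly Y(n) outruns 2^n.

    def X(n):
        # Number of distinct leftmost parses of a**n, recurrence (4.1.2).
        return 1 if n == 0 else sum(X(i) * X(n - 1 - i) for i in range(n))

    def Y(n):
        # Number of partial left parses consistent with a**n, recurrence (4.1.3).
        return 2 if n == 0 else Y(n - 1) + sum(X(i) * Y(n - 1 - i) for i in range(n))

    # [X(n) for n in range(6)] -> [1, 1, 2, 5, 14, 42]      (the Catalan numbers)
    # [Y(n) for n in range(6)] -> [2, 4, 10, 28, 84, 264]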


This example points out a major problem with top-down backtrack pars-
ing. The number of steps necessary to parse by Algorithm 4.1 can be enor-
mous. There are several techniques that can be used to speed this algorithm
somewhat. We shall mention a few of them here.
(1) We can order productions so that the most likely alternates are tried
first. However, this will not help in those cases in which the input is not
syntactically well formed, and all possibilities have to be tried.
DEFINITION
For a CFG G = (N, Σ, P, S),

FIRSTk(α) = {x | α ⇒*lm xβ and |x| = k, or α ⇒* x and |x| < k}.

That is, FIRSTk(α) consists of all terminal prefixes of length k (or less if α
derives a terminal string of length less than k) of the terminal strings that
can be derived from α.
(2) We can look ahead at the next k input symbols to determine whether
a given alternate should be used. For example, we can tabulate, for each
alternate α, a lookahead set FIRSTk(α); a sketch of how these sets can be
computed appears after this list. If no prefix of the remaining input
string is contained in FIRSTk(α), we can immediately reject α and try the
next alternate. This technique is very useful both when the given input is
in L(G) and when it is not in L(G). In Chapter 5 we shall see that for certain
classes of grammars the use of lookahead can entirely eliminate the need
for backtracking.
(3) We can add bookkeeping features which will allow faster backtrack-
ing. For example, if we know that the last m productions applied have no
applicable next alternates, when failure occurs we can skip back directly to
the position where there is an applicable alternate.
(4) We can restrict the amount of backtracking that can be done. We
shall discuss parsing techniques of this nature in Chapter 6.
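The lookahead sets of suggestion (2) can be tabulated by a straightforward fixed-point computation with k-truncated concatenation. The Python sketch below is ours (it assumes each terminal is a single character and represents each FIRSTk set as a set of strings of length at most k).

    def first_k(grammar, k):
        # FIRSTk set for every nonterminal, by least-fixed-point iteration.
        # grammar: dict nonterminal -> list of alternates (tuples of symbols).
        def concat(string_sets):
            # k-truncated concatenation of a sequence of sets of strings.
            result = {''}
            for s in string_sets:
                result = {(x + y)[:k] for x in result for y in s}
            return result
        first = {A: set() for A in grammar}
        changed = True
        while changed:
            changed = False
            for A, alts in grammar.items():
                for alt in alts:
                    new = concat([first[X] if X in grammar else {X} for X in alt])
                    if not new <= first[A]:
                        first[A] |= new
                        changed = True
        return first

    # For E -> T+E | T, T -> F*T | F, F -> a and k = 2 this gives
    # FIRST2(E) = {'a', 'a+', 'a*'}, FIRST2(T) = {'a', 'a*'}, FIRST2(F) = {'a'}.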
Another severe problem with backtrack parsing is its poor error-locating
capability. If an input string is not syntactically well formed, then a compiler

should announce which input symbols are in error. Moreover, once one error
has been found, the compiler should recover from that error so that parsing
can resume in order to detect any additional errors that might occur.
If the input string is not syntactically well formed, then the backtracking
algorithm as formulated will merely announce error, leaving the input
pointer at the first input symbol. To obtain more detailed error information,
we can incorporate error productions into the grammar. Error productions
are used to generate strings containing common syntactic errors and would
make syntactically invalid strings well formed. The production numbers in
the output corresponding to these error productions can then be used to
signal the location of errors in the input string. However, from a practical
point of view, the parsing algorithms presented in Chapter 5 have better
error-announcing capabilities than backtracking algorithms with error pro-
ductions.
4.1.5. Bottom-Up Parsing

There is a general approach to parsing that is in a sense opposite to that


of top-down parsing. The top-down parsing algorithm can be thought of
as building the parse tree by trial and error from the root (top) and proceed-
ing downward to the leaves. Its opposite, bottom-up parsing, starts with
the leaves (i.e., the input symbols themselves) and attempts to build the tree
upwards toward the root.
We shall describe a formulation of bottom-up parsing that is called shift-
reduce parsing. The parsing proceeds using essentially a right parser cycling
through all possible rightmost derivations, in reverse, that are consistent
with the input. A move consists of scanning the string on top of the pushdown
list to see if there is a right side of a production that matches the symbols
on top of the list. If so, a reduction is made, replacing these symbols by the
left side of the production. If more than one reduction is possible, we order
the possible reductions in some arbitrary manner and apply the first.
If no reduction is possible, we shift the next input symbol onto the push-
down list and proceed as before. We shall always attempt to make a reduc-
tion before shifting. If we come to the end of the string and no reduction is
possible, we backtrack to the last move at which we made a reduction. If
another reduction was possible at that point we try that.
Let us consider a grammar with productions S → AB, A → ab, and
B → aba. Let the input string be ababa. We would shift the first a on the push-
down list. Since no reduction is possible, we would then shift the b on the
pushdown list. We then replace ab on top of the pushdown list by A. At this
point we have the partial tree of Fig. 4.6(a).
As the A cannot be further reduced, we shift a onto the pushdown list.
Again no reduction is possible, so we shift b onto the pushdown list. We can
then reduce ab to A. We now have the partial tree of Fig. 4.6(b).

Fig. 4.6 Partial parse trees in bottom-up parse.

We shift a on the pushdown list and find that no reductions are possible.
We then backtrack to the last position at which we made a reduction, namely
where the pushdown list contained Aab (b is on top here) and we replaced
ab by A, i.e., when the partial tree was that of Fig. 4.6(a). Since no other
reduction is possible, we now shift instead of reducing. The pushdown list
now contains Aaba. We can then reduce aba to B, to obtain Fig. 4.6(c).
Next, we replace AB by S and thus have a complete tree, shown in Fig. 4.6(d).
This method can be viewed as considering all possible sequences of moves
of a nondeterministic right parser for a grammar. However, as with top-
down parsing, we must avoid situations in which the number of possible
moves is infinite.
One such pitfall occurs when a grammar has cycles, that is, derivations
of the form A ⇒+ A for some nonterminal A. The number of partial trees


can be infinite in this case, so we shall rule out grammars with cycles. Also,
e-productions cause difficulty, since we can make an arbitrary number of
reductions in which the empty string is "reduced" to a nonterminal. Bottom-
up parsing can be extended to embrace grammars with e-productions, but
for simplicity we shall choose to outlaw e-productions here.

ALGORITHM 4.2
Bottom-up backtrack parsing.
Input. A CFG G = (N, Σ, P, S) with no cycles or e-productions, whose
productions are numbered 1 to p, and an input string w = a1a2 ⋯ an, n ≥ 1.
Output. One right parse for w if one exists. The output "error" otherwise.
Method.
(1) Order the productions arbitrarily.
(2) We shall couch our algorithm in the 4-tuple configurations similar
to those used in Algorithm 4.1. In a configuration (s, i, α, β)
(a) s represents the state of the algorithm.
(b) i represents the current location of the input pointer. We assume
the n + 1st input symbol is $, the right endmarker.
(c) α represents a pushdown list L1 (whose top is on the right).
(d) β represents a pushdown list L2 (whose top is on the left).
As before, the algorithm can be in one of three states q, b, or t. L1 will hold
a string of terminals and nonterminals that derives the portion of input to
the left of the input pointer. L2 will hold a history of the shifts and reductions
necessary to obtain the contents of L1 from the input.
(3) The initial configuration of the algorithm is (q, 1, $, e).
(4) The algorithm itself is as follows. We begin by trying to apply step 1.
Step 1" Attempt to reduce

(q, i, ~fl, ?) ~ (q, i, ~A, j~,)

provided A --~ fl is the jth production in P and fl is the first right side in
the linear ordering in (1) that is a suffix of aft. The production number is
written on L2. If step 1 applies, return to step 1. Otherwise go to step 2.
Step 2: Shift
(q, i, α, γ) ⊢ (q, i + 1, αai, sγ)

provided i ≠ n + 1. Go to step 1.

If i = n + 1, instead go to step 3.
If step 2 is successful, we write the ith input symbol on top of L1, incre-
ment the input pointer, and write s on L2, to indicate that a shift has been
made.
Step 3: Accept
(q, n + 1, $S, γ) ⊢ (t, n + 1, $S, γ)

Emit h(γ), where h is the homomorphism



h(s) = e
h(j) = j for all production numbers

h(γ) is a right parse of w in reverse. Then halt.


If step 3 is not applicable, go to step 4.
Step 4" Enter backtracking mode

(q,n + 1, ~x, ~) I-- (b,n + 1, o~, 7)

provided ~ -~ $S. Go to step 5.


Step 5" Backtracking

(a) (b, i, ~A, Jr) ~ (q, i, rE'B, k~,)

if the jth production in P is A ----~fl and the next production in the ordering
of (1) whose right side is a suffix of ~fl is B ~ fl', numbered k. Note that
t~fl = t~'fl'. Go to step 1. (Here we have backtracked to the previous reduc-
tion, and we try the next alternative reduction.)

(b) (b, n + 1, αA, jγ) ⊢ (b, n + 1, αβ, γ)

if the jth production in P is A → β and no other alternative reductions of
αβ remain. Go to step 5. (If no alternative reductions exist, "undo" the reduction and continue backtracking when the input pointer is at n + 1.)

(c) (b, i, αA, jγ) ⊢ (q, i + 1, αβai, sγ)

if i ≠ n + 1, the jth production in P is A → β, and no other alternative
reductions of αβ remain. Here a = ai is shifted onto L1, and an s is entered
on L2. Go to step 1.
Here we have backtracked to the previous reduction. No alternative
reductions exist, so we try a shift instead.

(d) (b, i, αa, sγ) ⊢ (b, i − 1, α, γ)

if the top entry on L2 is the shift symbol. (Here all alternatives at position i
have been exhausted, and the shift action must be undone. The input pointer
moves left, the terminal symbol ai is removed from L1, and the symbol s is
removed from L2.) □
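As with Algorithm 4.1, a compact Python rendering may help (a sketch of ours, not part of the algorithm's statement; it uses 0-based input positions and assumes $ is not a grammar symbol). Productions are given as an ordered list of (left side, right side) pairs, the order being the reduction-trial order of step (1); the value returned is the right parse in reverse, as produced by the homomorphism h.

    def bottom_up_parse(productions, start, w):
        # Termination presupposes a cycle-free grammar with no e-productions.
        n = len(w)
        l1, l2, i, state = ['$'], [], 0, 'q'     # both tops at the right ends

        def next_reduction(stack, from_index):
            # First production at or after from_index whose right side is a
            # suffix of the stack (never reaching down to the $ marker).
            for j in range(from_index, len(productions)):
                rhs = productions[j][1]
                if len(rhs) < len(stack) and tuple(stack[-len(rhs):]) == rhs:
                    return j
            return None

        while True:
            if state == 'q':
                j = next_reduction(l1, 0)
                if j is not None:                        # step (1): reduce
                    lhs, rhs = productions[j]
                    del l1[-len(rhs):]
                    l1.append(lhs); l2.append(j)
                elif i < n:                              # step (2): shift
                    l1.append(w[i]); i += 1; l2.append('s')
                elif l1 == ['$', start]:                 # step (3): accept
                    return [x for x in reversed(l2) if x != 's']
                else:                                    # step (4): backtrack
                    state = 'b'
            else:
                if not l2:
                    return None                          # every possibility tried
                top = l2.pop()
                if top == 's':                           # step 5(d): undo a shift
                    l1.pop(); i -= 1
                else:                                    # undo reduction numbered top
                    lhs, rhs = productions[top]
                    l1.pop(); l1.extend(rhs)
                    k = next_reduction(l1, top + 1)
                    if k is not None:                    # step 5(a): next reduction
                        lhs, rhs = productions[k]
                        del l1[-len(rhs):]
                        l1.append(lhs); l2.append(k)
                        state = 'q'
                    elif i < n:                          # step 5(c): shift instead
                        l1.append(w[i]); i += 1; l2.append('s')
                        state = 'q'
                    # else step 5(b): stay in backtracking mode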

Example 4.4
Let us apply this bottom-up parsing algorithm to the grammar G with
productions

(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → a

If E + T appears on top of L1, we shall first try reducing, using E → E + T
and then using E → T. If T * F appears on top of L1, we shall first try
T → T * F and then T → F. With input a * a the bottom-up algorithm would
go through the following configurations:

(q, 1, $, e) ⊢ (q, 2, $a, s)
⊢ (q, 2, $F, 5s)
⊢ (q, 2, $T, 45s)
⊢ (q, 2, $E, 245s)
⊢ (q, 3, $E *, s245s)
⊢ (q, 4, $E * a, ss245s)
⊢ (q, 4, $E * F, 5ss245s)
⊢ (q, 4, $E * T, 45ss245s)
⊢ (q, 4, $E * E, 245ss245s)
⊢ (b, 4, $E * E, 245ss245s)
⊢ (b, 4, $E * T, 45ss245s)
⊢ (b, 4, $E * F, 5ss245s)
⊢ (b, 4, $E * a, ss245s)
⊢ (b, 3, $E *, s245s)
⊢ (b, 2, $E, 245s)
⊢ (q, 3, $T *, s45s)
⊢ (q, 4, $T * a, ss45s)
⊢ (q, 4, $T * F, 5ss45s)
⊢ (q, 4, $T, 35ss45s)
⊢ (q, 4, $E, 235ss45s)
⊢ (t, 4, $E, 235ss45s) □
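Run on the grammar of Example 4.4 with the same ordering of reductions, the sketch given after Algorithm 4.2 mirrors the trace just shown:

    ps = [('E', ('E', '+', 'T')),    # production 1
          ('E', ('T',)),             # production 2
          ('T', ('T', '*', 'F')),    # production 3
          ('T', ('F',)),             # production 4
          ('F', ('a',))]             # production 5
    print(bottom_up_parse(ps, 'E', 'a*a'))
    # -> [1, 2, 4, 3, 4] in 0-based indices, i.e. the reversed right parse 23545.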
We can prove the correctness of Algorithm 4.2 in a manner analogous
to the way we showed that top-down parsing worked. We shall outline
a proof here, leaving most of the details as Exercises.

DEFINITION

Let G = (N, Σ, P, S) be a CFG. We say that π is a partial right parse
consistent with w if there is some α in (N ∪ Σ)* and a prefix x of w such that
π, read in reverse, is the sequence of productions used in a rightmost derivation of x from α.

LEMMA 4.5

Let G be a cycle-free CFG with no e-productions. Then there is a constant


c such that the number of partial right parses consistent with an input of
length n is at most c^n.
Proof. Exercise. □

THEOREM 4.4

Algorithm 4.2 correctly finds a right parse of w if one exists, and signals
an error otherwise.
Proof. By Lemma 4.5, the number of partial right parses consistent with
the input is finite. It is left as an Exercise to show that unless Algorithm 4.2
finds a parse, it cycles through all partial right parses in a natural order.
Namely, each partial right parse can be coded by a sequence of production
indices and shift symbols (s). Algorithm 4.2 considers each such sequence
that is a partial right parse in a lexicographic order. That lexicographic order
is determined by an order of the symbols, placing s last, and ordering the
production indices as in step 1 of Algorithm 4.2. Note that not every sequence
of such symbols is a consistent partial right parse. [-7

Paralleling the analysis for Algorithm 4.1, we can also show that the
lengths of the lists in the configurations for Algorithm 4.2 remain linear in
the input length.
THEOREM 4.5
Let one cell be needed for each symbol on a list in a configuration of
Algorithm 4.2, and let the number of elementary operations needed to com-
pute one step of Algorithm 4.2 be bounded. Then for some constants c1 and
c2, Algorithm 4.2 requires c1n space and c2^n time, when given input of length
n ≥ 1.
Proof. Exercise. □

There are a number of modifications we can make to the basic bottom-up


parsing algorithm in order to speed up its operations"
(1) We can add "lookahead" so that if we find that the next k symbols
to the right of the input pointer could not possibly follow an A in any right-
sentential form, then we do not make a reduction according to any A-pro-
duction.

(2) We can attempt to order the reductions so that the most likely reduc-
tions are made first.
(3) We can add information to determine whether certain reductions
will lead to success. For example, if the first reduction uses the production
A → a1 ⋯ ak, where a1 is the first input symbol and we know that there is
no y in Σ* such that S ⇒* Ay, then this reduction can be immediately ruled
out. In general we want to be sure that if $α is on L1, then α is the prefix
of a right-sentential form. While this test is complicated in general, certain
notions, such as precedence, discussed in Chapter 5, will make it easy to rule
out many α's that might appear on L1.
(4) We can add features to make backtracking faster. For example, we
might store information that will allow us to directly recover the previous
configuration at which a reduction was made.
Some of these considerations are explored in Exercises 4.1.12-4.1.14 and
4.1.25. The remarks on error detection and recovery with the backtracking
top-down algorithm also apply to the bottom-up algorithm.

EXERCISES

4.1.1. Let G be defined by


S----~ ASIa
A ~ bSA[b

What sequence of steps is taken by Algorithm 4.1 if the order of alter-


nates is as shown, and the input is
(a) ba?
(b) baba ?
What are the sequences if the order of alternates is reversed ?
4.1.2. Let G be the grammar
S----> S A I A
A ~ aAlb

What sequence of steps are taken by Algorithm 4.2 if the order of


reductions is longest first, and the input is
(a) ab ?
(b) abab ?
What if the order of choice is shortest first ?
4.1.3. Show that every cycle-free CFG that does not generate e is right-
covered by one for which Algorithm 4.2 works, but may not be left-
covered by any for which Algorithm 4.1 works.
"4.1.4. Show that the solution to the recurrence
308 GENERAL PARSING METHODS CHAP. 4

D(1) = 1

D(d) = ( D ( d - 1) 2) + 1

is D(d) = [k~], where k is a real number and [x] is the greatest integer
_< x. Here, k = 1.502837 . . . .
4.1.5. Complete the proof of Corollary 2 to Lemma 4.4.
4.1.6. Modify Algorithm 4.1 to refrain from using an alternate if it is impos-
sible to derive the next k input symbols, for fixed k, from the resulting
left-sentential form.
4.1.7. Modify Algorithm 4.1 to work on an arbitrary grammar by putting
bounds on the length to which L1 and L2 can grow.
*'4.1.8. Give a necessary and sufficient condition on the input grammar such
that Algorithm 4.1 will never enter the backtrack mode.
4.1.9. Prove L e m m a 4.5.
4.1.10. Prove Theorem 4.5.
4.1.11. Modify Algorithm 4.2 to work for an arbitrary C F G by bounding the
length of the lists L1 and L2.
4.1.12. Modify Algorithm 4.2 to run faster by checking that the partial right
parse together with the input to the right of the pointer does not con-
tain any sequence of k symbols that could not be part of a right-
sentential form.
4.1.13. Modify Algorithms 4.1 and 4.2 to backtrack to any specially designated
previous configuration using a finite number of reasonably defined
elementary operations.
*'4.1.14. Give a necessary and sufficient condition on a grammar such that
Algorithm 4.2 will operate with no backtracking. What if the modifi-
cation of Exercise 4.1.12 is first m a d e ?
4.1.15. Find a cycle-free grammar with no e-productions on which Algorithm
4.2 takes an exponential amount of time.
4.1.16. Improve the bound of Lemma 4.4 if the grammar has no e-productions.
4.1.17. Show that if a grammar G with no useless symbols has either a cycle
or an e-production, then Algorithm 4.2 will not terminate on any
sentence not in L(G).
DEFINITION
We shall outline a programming language in which we can write
nondeterministic algorithms. We call the language N D F (nondeter-
ministic F O R T R A N ) , because it consists of F O R T R A N - l i k e statements
plus the statement C H O I C E (nl . . . . . nk), where k ~ 2 and nl . . . . . nk
are statement numbers.
To define the meaning of an N D F program, we postulate the
existence of an interpreter capable of executing any finite number of
EXERCISES 309

programs in a round robin fashion (i.e., working on the compiled


version of each in turn for a fixed number of machine operations).
We assume that the meaning of the usual F O R T R A N statements is
understood. However, if the statement CHOICE (nl . . . . . nk) is ex-
ecuted, the interpreter makes k copies of the program and its entire
data region. Control is transferred to statement nt in the ith copy of
the program for 1 ~ i < k. All output appears on a single printer,
and all input is received from a single card reader (so that we had
better read all input before executing any CHOICE statement).

Example 4.5
The following N D F program prints the legend NOT A PRIME
one or more times if the input is not a prime number, and prints
nothing if it is a prime"

READ N
I=l
PICK A VALUE OF I G R E A T E R T H A N 1
1 I=I+l
CHOICE (1, 2)
2 IF (I .EQ. N) STOP
F I N D IF I IS A DIVISOR OF N A N D NOT EQUAL
TO N
IF ( ( N / I ) , I .NE. N) STOP
WRITE ("NOT A PRIME")
STOP

"4.1.38. Write an N D F program which prints all answers to the "eight queens
problem." (Select eight points on an 8 x 8 grid so that no two lie on
any row, column, or diagonal line.)
'4.1.19. Write N D F programs to simulate a left or right parser.
It would be nice if there were an algorithm which determined if a
given N D F program could run forever on some input. Unfortunately
this is not decidable for F O R T R A N , or any other programming lan-
guage. However, we can make such a determination if we assume that
branches (from IF and assigned GOTO statements, not CHOICE
statements) are externally controlled by a "demon" who is trying to
make the program run forever, rather than by the values of the pro-
gram variables. We say an N D F program is halting if for each input
there is no sequence of branches and nondeterministic choices that
cause any copy of the program to run beyond some constant number
of executed statements, the constant being a function of the number of
310 GENERAL PARSING METHODS CHAP. 4

input cards available for data. (Assume that the program halts if it
attempts to read and no data is available.)
• 4.1.20. Give an algorithm to determine whether an N D F program is halting
under the assumption that no D O loop index is ever decremented.
"4.1.21. Give an algorithm which takes a halting N D F program and constructs
from it an equivalent A L G O L program. By "equivalent program,"
we have to mean that the A L G O L program does input and output
in an order which the N D F program might do it, since no order for the
N D F program is known. ALGOL, rather than F O R T R A N , is preferred
here, because recursion is very convenient.
4.1.22. Let G = (N, ~, P, S) be a CFG. From G construct a C F G G' such
that L(G') = ~* and if S '~=> w, then S ~==~ w.
G G'
A P D T (with pushdown top on the left) that behaves as a nondeter-
ministic left-corner parser for a grammar can be constructed from the
grammar. The parser will use as pushdown symbols nonterminals,
terminals, and special symbols of the form [A, B], where A and B are
nonterminals.
Nonterminals and terminals appearing on the pushdown list are
goals to be recognized top-down. In a symbol [A, B], A is the current
goal to be recognized and B is the nonterminal which has just been
recognized bottom-up. F r o m a C F G G = (N, 2~, P, S) we can construct
a P D T M = ([q}, E, N x N W N W E, A, ~, q, S, ~ ) which will be a
left-corner parser for G. Here A = [ 1, 2 . . . . , p} is the set of production
numbers, and 5 is defined as follows:
(1) Suppose that A ~ ot is the ith production in P.
(a) If 0t is of the form Bfl, where B E N, then O(q, e, [C, B])
contains (q, fl[C, A], i) for all C ~ N. Here we assume that
we have recognized the left-corner B bottom-up so we
establish the symbols in fl as goals to be recognized top-
down. Once we have recognized fl, we shall have recognized
an A.
(b) If 0t does not begin with a nonterminal, then O(q, e, C)
contains (q, a[C, A], i) for all nonterminals C. Here, once
is recognized, the nonterminal A will have been recognized.
(2) 6(q, e, [A, A]) contains (q, e, e) for all A ~ N. Here an instance
of the goal A which we have been looking for has been recognized. If
this instance of A is not a left corner, we remove [A, A] from the push-
down list signifying that this instance of A was the goal we sought.
(3) O(q, a, a ) = [(q, e, e)} for all a ~ E. Here the current goal is
a terminal symbol which matches the current input symbol. The goal,
being satisfied, is removed.
M defines the translation

[(w, rc)lw ~ L(G) and ~ is a left-corner parse for w}.


EXERCISES 311

Example 4.6
Consider the C F G G = (N, ~, P, S) with the productions

(1) E ~ E q- T
(2) E ~ T
(3) T - >F t T
(4) T ~F
(5) F - > (e)
(6) F - - + a

A nondeterministic left-corner parser for G is the P D T

M = ([q}, ~, N x N U N U ~ , { 1 , 2 . . . . ,6}, ~,q, E, Z~)

where ~ is defined as follows for all A ~ N:


(1) (a) 5(q, e, [A, E]) contains (q, ÷ T[A, El, 1).
(b) O(q, e, [A, T]) contains (q, [A, E], 2).
(c) 5(q, e, [A, F]) contains (q, t T[A, T], 3) and (q, [A, T], 4)
(d) O(q, e, A) = [(q, (E)[A, F], 5), (q, a[A, F], 6)}.
(2) 6(q, e, [A, A]) contains (q, e, e).
(3) O(q, a, a) = {(q, e, e)} for all a ~ E.
Let us parse the input string a t a + a using M. The derivation
tree for this string is shown in Fig. 4.7. Since the P D T has only one
state, we shall ignore the state. The P D T starts off in configuration

( a t a Jr- a, E, e)

The second rule in (1 d) is applicable (so is the first), so the PDT can go
into configuration

(a t a -+- a, a[E, F], 6)

!
T F
I

a
1 F
1
a
I Fig. 4.7 Derivation tree for a t a -k a.
31 2 GENERALPARSING METHODS CHAP. 4

Here, the left-corner a has been generated using production 6. The


symbol a is then compared with the current input symbol to yield

(1" a + a, [E, F], 6)

We can then use the first rule in (lc) to obtain

(1" a q- a, "1"T[E, T], 63)

Here we are saying that the left corner of the production T ~ F 1" T
will now be recognized once we find t and T. We can then enter the
following configurations:

(a + a, TIE, T], 63) ~---(a + a, a[T, F][E, T], 636)


( + a, [ T, F] [E, T], 636)
l--- (+ a, [T, T][E, T], 6364)

At this point T is the current goal and an instance of T which is not a


left corner has been found. Thus using rule (2) to erase the goal, we
obtain
( + a, [E, T], 6364)

Continuing, we can terminate with the following sequence of configura-


tions:
( + a, [E, El, 63642) ~ (+ a, + T[E, E], 636421)
1--- (a, TIE, El, 636421)
(a, a[T, F][E, El, 6364216)
(e, [T, F][E, E], 6364216)
(e, IT, T][E, El, 63642164)
(e, [E, E], 63642164)
F- (e, e, 63642164) D

"4.1.23. Show that the construction above yields a nondeterministic left-corner


parser for a CFG.
4.1.24. Construct a left-corner backtrack parsing algorithm.
Let G = (N, X~,P, S) be a CFG which contains no production with
a right side of length 0 or 1. (Every CFL L such that w ~ L implies
that [ w I>_ 2 has such a grammar.) A nondeterministic shift-reduce right
parser for G can be constructed such that each entry on the pushdown
list is a pair of the form (X, Q), where X ~ N u E u {$} ($ is an
endmarker for the pushdown list) and Q is the set of productions P
with an indication of all possible prefixes of the right side of each
production which could have been recognized to this point. That is,
Q will be P with dots placed between some of the symbols of the
right sides. There will be a dot in front of Xt in the production
BIBLIOGRAPHIC NOTES 313

A----~ X~ Xz .." X, if and only if X1 -.. X,-1 is a suffix of the list of


grammar symbols on the pushdown list.
A shift move can be made if the current input symbol is the con-
tinuation of some production. In particular, a shift move can always
be made if A ---~ 0¢ is in P and the current input symbol is in FIRST i(~).
A reduce move can be made if the end of the right side of a pro-
duction has been reached. Suppose that A ~ tz is such a production.
To reduce we remove It~f entries from the top of the pushdown list. If
(X, Q) is the entry now on top of the pushdown list, we then write
(A, Q') on the list, where Q' is computed from Q by assuming that
an A has been recognized. That is, Q' is formed from Q by moving all
dots that are immediately to the left of an A to the right of A and
adding dots at the left end of the right sides if not already there.

Example 4.7
Consider the grammar S - - ~ Scl ab and the input string abc. Ini-
tially, the parser would have ($, Q0) on the pushdown list, where Q0
is S---~. Scl .ab. We can then shift the first input symbol and write
(a, Q1) on the pushdown list, where Q1 is S - - - , . Sc[. a.b. Here, we
can be beginning the productions S ~ Scl ab, or we could have seen
the first a of production S ~ ab. Shifting the next input symbol b,
we would write (b, Qz) on the pushdown list, where Q1 is
S - - ~ . Sc] .ab.. We can then reduce using production S ~ ab. The
pushdown list would now contain ($, Qo)(S, Q3), where Q3 is
S ~ .S.c[.ab. [[]

Domolki has suggested implementing this algorithm using a binary


matrix M to represent the productions and a binary vector V to store
the possible positions in each production. The vector V can be used in
place of Q in the algorithm above. Each new vector on the pushdown
list can be easily computed from M and the current value of V using
simple bit operations.
4.1.25. Use Domolki's algorithm to help determine possible reductions in
Algorithm 4.2.

BIBLIOGRAPHIC NOTES

Many of the early compiler-compilers and syntax-directed compilers used


nondeterministic parsing algorithms. Variants of top-down backtrack parsing
methods were used in Brooker and Morris' compiler-compiler [Rosen, 1967b] and
in the META compiler writing systems [Schorre, 1964]. The symbolic program-
ming system C O G E N T simulated a nondeterministic top-down parser by carrying
along all viable move sequences in parallel [Reynolds, 1965]. Top-down back-
tracking methods have also been used to parse natural languages [Kuno and
Oettinger, 1962].
314 GENERAL PARSING METHODS CHAP. 4

One of the earliest published parsing algorithms is the essentially left-corner


parsing algorithm of Irons [1961]. Surveys of early parsing techniques are given
by Floyd [1964b], Cheatham and Sattley [1963], and Griffiths and Petrick [1965].
Unger [1968] describes a top-down algorithm in which the initial and final sym-
bols derivable from a nonterminal are used to reduce backtracking. Nondeter-
ministic algorithms are discussed by Floyd [1967b].
One implementation of Domolki's algorithm is described by Hext and Roberts
[1970].
The survey article by Cohen and Gotlieb [1970] describes the use of list struc-
ture representations for context-free grammars in backtrack and nonbacktrack
parsing algorithms.

4.2, TABULAR PARSING METHODS

We shall study two parsing methods that work for all context-free gram-
mars, the Cocke-Younger-Kasami algorithm and Earley's algorithm. Each
algorithm requires n 3 time and n 2 space, but the latter requires only n z time
when the underlying grammar is unambiguous. Moreover, Earley's algorithm
can be made to work in linear time and space for most of the grammars which
can be parsed in linear time by the methods to be discussed in subsequent
chapters.

4.2.1. The Cocke-Younger-Kasami Algorithm

In the last section we observed that the top-down and bottom-up back-
tracking methods may take an exponential amount of time to parse according
to an arbitrary grammar. In this section, we shall give a method guaranteed
to do the job in time proportional to the cube of the input length. It is essen-
tially a "dynamic programming" method and is included here because of
its simplicity. It is doubtful, however, that it will find practical use, for three
reasons:
(1) n 3 time is too much to allow for parsing.
(2) The method uses an amount of space proportional to the square of
the input length.
(3) The method of the next section (Earley's algorithm) does at least as
well in all respects as this one, and for many grammars does better.
The method works as follows. Let G = (N, Z, P, S) be a Chomsky normal
form C F G with no e-production. A simple generalization works for non-
C N F grammars as well, but we leave this generalization to the reader. Since
a cycle-free C F G can be left- or right-covered by a C F G in Chomsky normal
form, the generalization is not too important.
Let w = a l a 2 . . . a n be the input string which is to be parsed according
to G. We assume that each a~ is in X for 1 < i < n. The essence of the
SEC. 4.2 TABULAR PARSING METHODS 315

algorithm is the construction of a triangular p a r s e t a b l e T, whose elements


we denote t~j for 1 ~ i ~ n and 1 < j < n -- i + 1. Each tij will have a value
+

which is a subset of N. Nonterminal A will be in tij if and only if A =~


a~a;+~ • . . a~+j_~, that is, if A derives the j input symbols beginning at posi-
tion i. As a special case, the input string w is in L ( G ) if and only if S is in t l,.
Thus, to determine whether string w is in L ( G ) , we compute the parse
table T for w and look to see if S is in entry t 2,. Then, if we want one (or all)
parses of w, we can use the parse table to construct these parses. Algorithm
4.4 can be used for this purpose.
We shall first give an algorithm to compute the parse table and then
the algorithm to construct the parses from the table.
ALGORITHM 4.3
C o c k e - Y o u n g e r - K a s a m i parsing algorithm.
I n p u t . A Chomsky normal form C F G G = (N, X, P, S) with no e-pro-
duction and an input string w = a l a 2 . . . a , in X +.
Output. The parse table T for w such that tij contains A if and only if
+

A ==~ a~a~+ ~ • • • ai+ j_ 1.

Method.

(1) Set tit = [A I A ~ a~ is in P} for each i. After this step, if t~1 contains
+

A, then clearly A ==~ a t.


(2) Assume that t~j, has been computed for all i, 1 < i < n, and all j ' ,
1 < j ' < j . Set

t~y = {A if or some k , 1 < k < j , A ~ BC is in P,


B is in t~k, and C is in ti+k,j_k}.'l"

Since I < k < j, both k and j -- k are less than j. Thus both t~k and ti+k,j_ k
are computed before t~j is computed. After this step, if t~j contains A, then
+ +

A ~ BC ~ ai • • • ai+k_aC ~ a,. • • • a i + k _ i a i + k • • • ai+j_ 1.

(3) Repeat step (2) until tij is known for all 1 _~ i < n, and 1 < j
n--i+l.

Example 4.8
Consider the C N F grammar G with productions

tNote that we are not discussing in detail how this is to be done. Obviously, the com-
putation involved can be done by computer. When we discuss the time complexity of
Algorithm 4.3, we shall give details of this step that enable it to be done efficiently.
316 GENERAL PARSING METHODS cam. 4

S > AA IASIb
A > SA IASla

Let abaab be the input string. The parse table T that results from Algorithm
4.3 is shown in Fig. 4.8. F r o m step (1), tit = {A} since A ~ a is in P and
at = a. In step (2) we add S to t32, since S ---~ A A is in P and A is in both
t3t and t4t. Note that, in general, if the tt~'s are displayed as shown, we can

A,S A,S

A,S S A,S

A,S A S A,S

jt 1 A S A A S

i~ 1 2 3 4 5 Fig. 4.8 Parse table T.

compute ttj, i > 1, by examining the nonterminals in the following pairs of


entries"
(ttl, t,+i.~-1), (t,2, t,+z,j-2), • • •, (t,.j-1, t,+j-l.,)

Then, if B is in tie and C is in t~+k.j-k for some k such that 1 ~ k < j and
A ~ B C is in P, we add A to ttj. That is, we move up the ith column and
down the diagonal extending to the right of cell ttj simultaneously, observing
the nonterminals in the pairs of cells as we go.
Since S is in t~5, abaab is in L(G). [B

THEOREM 4.6
If Algorithm 4.3 is applied to C N F g r a m m a r G and input string a~ .. • a,,
+
then upon termination, A is in t~j if and only if A ==~ a s . . . a~+~_~.
Proof. The proof is a straightforward induction on j and is left for the
Exercises. The most difficult step occurs in the "if" portion, where one must
+
observe that i f j > 1 and A ==~ a s . . . at+i_~, then there exist nonterminals
+
B and C and integer k such that A ~ B C is in P, B ~ a~ . . . an+k_ 1, and
+
C ~ at+ k • • • a,+j_ l. E]

Next, we show that Algorithm 4.3 can be executed on a random access


computer in n 3 suitably defined elementary operations. For this purpose, we
SEC. 4.2 TABULAR PARSING METHODS 31 '7

shall assume that we have several integer variables available, one of which
is n, the input length. An elementary operation, for the purposes of this discus-
sion, is one of the following'
(1) Setting a variable to a constant, to the value held by some variable,
or to the sum or difference of the value of two variables or constants;
(2) Testing if two variables are equal,
(3) Examining and/or altering the value of t~j, if i and j are the current
values of two integer variables or constants, or
(4) Examining a,, the ith input symbol, if i is the value of some variable.
We note that operation (3) is a finite operation if the grammar is known
in advance. As the grammar becomes more complex, the amount of space
necessary to store t,.j and the amount of time necessary to examine it both
increase, in terms of reasonable steps of a more elementary nature. However,
here we are interested only in the variation of time with input length. It is
left to the reader to define some more elementary steps to replace (3) and
find the functional variation of the computation time with the number of
nonterminais and productions of the grammar.

CONVENTION

We take the notation '~f(n) is 0(g(n))" to mean that there exists a constant
k such that for all n ~ 1, f(n) ~ kg(n). Thus, when we say that Algorithm
4.3 operates in time 0(n3), we mean that there exists a constant k for which
it never takes more than kn 3 elementary operations on a word of length n.
THEOREM 4.7
Algorithm 4.3 requires 0(n 3) elementary operations of the type enumerated
above to compute tij for all i and j.
Proof To compute tit for all i merely requires that we set i = 1 [opera-
tion(l)], then repeatedly set t~l to ~A]A --~ a~ is in P~} [operations (3) and (4)],
test if i = n [operation (2)], and if not, increment i by I [operation (1)].
The total number of elementary operations performed is 0(n).
Next, we must perform the following steps to compute t~j"
(1) Set j -- 1.
(2) Test ifj -- n. If not, increment j by 1 and perform line(j), a procedure
to be defined below.
(3) Repeat step (2) until j = n.
Exclusive of operations required for line(j), this routine involves 2 n - 2
elementary operations. The total number of elementary operations required
for Algorithm 4.3 is thus 0(n) plus ~ = 2 l(j), where l(j)is the number of
elementary operations used in line(j). We shall show that l(j) is 0(n 2) and
thus that the total number of operations is 0(n0.
318 GENERAL PARSING METHODS CHAP. 4

The procedure line(j) computes all entries t~j such that 1 _< i < n -
j -I-- 1. It embodies the procedure outlined in Example 4.8 to compute tij. It is
defined as follows (we assume that all t~j initially have value N)'
(1) Let i = 1 a n d j ' = n - - j - k 1.
(2) L e t k = 1.
(3) Let k' -- i q- k and j " = j - k.
(4) Examine t;k and tk,i,,. Let

t~j --- t~j u { A [ A - - , B C is in P, B in t,.k, and C in tk,j,,].

(5) Increment k by 1.
(6) If k - - j , go to step (7). Otherwise, go to step (3).
(7) If i-----j', halt. Otherwise do step (8).
(8) Increment i by 1 and go to step (2).
We observe that the above routine consists of an inner loop, (3)-(6),
and an outer loop, (2)-(8). The inner loop is executedj -- 1 times (for values
of k from 1 to j - 1) each time it is entered. At the end, t,.j has the value
defined in Algorithm 4.3. It consists of seven elementary operations, and so
the inner loop uses 0(j) elementary operations each time it is entered.
The outer loop is entered n -- j -Jr- 1 times and consists of 0(j) elementary
operations each time it is entered. Since j ~ n, each computation of line(j)
takes 0(n 2) operations.
Since line(j) is computed n times, the total number of elementary opera-
tions needed to execute Algorithm 4.3 is thus 0(n3). [--]

We shall now describe how to find a left parse from the parse table.
The method is given by Algorithm 4.4.
ALGORITHM 4.4
Left parse from parse table.
Input. A Chomsky normal form C F G G - - ( N , E, P, S) in which the
productions in P are numbered from 1 to p, an input string w -- a~a 2 . . . a,,
and the parse table T for w constructed by Algorithm 4.3.
Output. A left parse for w or the signal "error."
Method. We shall describe a recursive routine gen(i,j, A) to generate
+
a left parse corresponding to the derivation A = , a i a ~ + 1 . . . ai+j_ 1. The
lm
routine gen(i, j, A) is defined as follows"
(1) I f j -- 1 and the mth production in P is A --, a,., then emit the produc-
tion number m.
(2) If j > 1, k is the smallest integer, 1 < k < j, such that for some B
in t;k and C in t~+k,j-k, A --* B C is a production in P, say the mth. (There
SEC. 4.2 TABULAR PARSING METHODS 31 9

may be several choices for A --~ B C here. We can arbitrarily choose the one
with the smallest m.) Then emit the production number m and execute
gen(i, k, B), followed by gen(i -t-- k, j - k, C).
Algorithm 4.4, then, is to execute gen(1, n, S), provided that S is in t~,. If
S is not in t,,, emit the message "error."

We shall extend the notion of an elementary operation to include the


writing of a production number associated with a production. We can then
show the following result.
THEOREM 4.8
If Algorithm 4.4 is executed with input string a I . . - a,, then it will termi-
nate with some left parse for the input if one exists. The number of elementary
steps taken by Algorithm 4.4 is 0(nZ).
P r o o f An induction on the order in which gen is called shows that
whenever gen(i,j, A) is called, then A is in t;j. It is thus straightforward to
show that Algorithm 4.4 produces a left parse.
t

To show that Algorithm 4.4 operates in time O(nZ), we prove by induction


on j that for all j a call of gen(i, j, A) takes no more than e l i 2 steps for some
constant c 1. The basis, j = 1, is trivial, since step (1) of Algorithm 4.4 applies
and uses one elementary operation.
For the induction, a call of gen(i,j, A) with j > 1 causes step (2) to be
executed. The reader can verify that there is a constant c z such that step (2)
takes no more than c z j elementary operations, exclusive of calls. If gen(i, k, B)
and gen(i q - k , j - k, C) are called, then by the inductive hypothesis, no
more than elk z + c1( j -- k) z -? c2j steps are taken by gen(i, j, A). This expres-
sion reduces to c~(j z + 2k z -- 2kj) + czj. Since 1 < k < j and j~> 2, we
know that 2k z - 2kj < 2 - 2j < --j. Thus, if we chose c, to be c 2 in the
inductive hypothesis, we would have elk z q- c ~ ( j - k) 2 -t- czj < e l i z. Since
we are free to make this choice of c~, we conclude the theorem. [--]

Example 4.9
Let G be the grammar with the productions

(1) S ~ ~ A A
(2) S ~ AS
(3) S >b
(4) A ~ SA
(5) A ~ AS
(6) A ,a
320 GENERAL PARSING METHODS CHAP. 4

Let w = abaab be the input string. The parse table for w is given in Example
4.8.
Since S is in T 15, w is in L ( G ) . To find a left parse for abaab we call routine
gen(1, 5, S). We find A in t~l and in t24 and the production S ---~ A A in the
set of productions. Thus we emit 1 (the production number for S---~ A A )
and then call gen(1, 1, A) and gen(2, 4, A). gen(1, 1, A) gives the production
number 6. Since S is in t21 and A is in t33 and A --~ S A is the fourth produc-
tion, gen(2, 4, A) emits 4 and calls gen(2, 1, S) followed by gen(3, 3, A).
Continuing in this fashion we obtain the left parse 164356263.
Note that G is ambiguous; in fact, abaab has more than one left parse.
It is not in general possible to obtain all parses of the input from a parse
table in less than exponential time, as there may be an exponential number
of left parses for the input. D

We should mention that Algorithm 4.4 can be made to run faster if,
when we construct the parse table and add a new entry, we place pointers to
those entries which cause the new entry to appear (see Exercise 4.2.21).

4.2.2. The Parsing Method of Earley

In this section we shall present a parsing method which will parse an input
string according to an arbitrary C F G using time 0(n 3) and space 0(n2), where
n is the length of the input string. Moreover, if the C F G is unambiguous,
the time variation is quadratic, and on most grammars for programming
languages the algorithm can be modified so that both the time and space
variations are linear with respect to input length (Exercise 4.2.18). We shall
first give the basic algorithm informally and later show that the computation
can be organized in such a manner that the time bounds stated above can
be obtained.
The central idea of the algorithm is the following. Let G = (N, Z, P, S)
be a C F G and let w = a l a 2 . . . a n be an input string in Z*. An object of
the form [A--~ X i X 2 . . . X k • Xk+l... Xm, i] is called an item for w if
A --~ X'~ . . - X'~ is a production in P and 0 ~ i ~ n. The dot between X k and
Xk+~ is a metasymbol not in N or Z. The integer k can be any number
including 0 (in which case the • is the first symbol) or m (in which case it
is the last).t
For each integer j, 0 ~ j ~ n, we shall construct a list of items Ij such
that [A --~ t~ • fl, i] is in Ij for 0 ~ i ~ j if and only if for some }, and ~, we
have S ~ ~.4~, ~ ~ a~ . . . a,, and tx ~ a,.l " " aj. Thus the second com-
ponent of the item and the number of the list on which it appears bracket
the portion of the input derived from the string ~. The other conditions on
the item merely assure us of the possibility that the production A --~ t~fl

tIf the production is A ~ e, then the item is [A ~ . , i].


SEC. 4.2 TABULAR PARSING METHODS 321

could be used in the way indicated in some input sequence that is consistent
with w up to position j.
The sequence of lists Io, I1,. • •, I, will be called the parse lists for the
input string w. We note that w is in L(G) if and only if there is some item of
the form [S ~ ~ . , 0] in I,.
We shall now describe an algorithm which, given any g r a m m a r , will
generate the parse lists for any input string.
ALGORITHM 4.5
Earley's parsing algorithm.
Input. C F G G = (N, E, P, S) and an input string w -- ala 2 • .. a, in Z*.
Output. The parse lists Io, I~ . . . . . I,.
Method. First, we construct Io as follows"
(1) If S --~ a is a production in P, add [S ~ • 0~, 0] to Io.
N o w perform steps (2) and (3) until no new items can be added to Io.
(2) If [B ~ y-, 0] is on Io,t add [A --~ ~B • p, 01 for all [A --- a • Bp, 0]
on I o.
(3) Suppose that [A ~ ~ . Bfl, 0] is an item in I o. A d d to Io, for all
productions in P of the form B ~ y, the item [B ~ • y, 0] (provided this item
is not already in Io).
We now construct Ij, having constructed I0, I 1 , . . . , Ij_ t-
(4) For each [B --~ 0~ • aft, i] in Ij_~ such that a -- aj, add [B --. 0~a • fl, i]
to I v.
N o w perform steps (5) and (6) until no new items can be added.
(5) Let [A --~ ~,., i] be an item in Ij. Examine It for items of the form
[B ~ 0~ • Ap, k]. F o r each one found, we add [B ~ ~A • fl, k] to Ij.
(6) Let [A ~ 0~- Bfl, i] be an item in Ij. For all B ~ 7 in P, we add
[B ~ • y, j] to Ij.
N o t e that consideration of an item with a terminal to the right of the dot
yields no new items in steps (2), (3), (5) and (6).
The algorithm, then, is to construct Ij for 0 ~ j _~ n. [~]

Example 4.10
Let us consider the g r a m m a r G with the productions

(1) E >T + E
(2) E-- >T
(3) T - - - > F , T
(4) T >F
(5) F ~ > (E)
(6) F - - - ~ a

tNote that ? can be e. This is the way rule (2) becomes applicable initially.
322 GENERAL PARSING METHODS CHAP. 4

and let (a q - a ) , a be the input string. From step (1) we add new items
[E----,. T + E, 0] and [ E - - , . T, 0] to I 0. These items are considered by
adding to I0 the items [T ~ • F • T, 0] and [T --, • F, 0] from rule (3). Con-
tinuing, we then add [F ~ • (E), 0] and [F--~ • a, 0]. No more items can
be added to I 0.
We now construct I~. By rule (4) we add I F - - , (. E), 0], since aa -----(.
Then rule (6) causes [ E ~ . T + E , 1], [ E ~ . T , 1], [ T - - ~ . F , T , 1],
[T--, • F, 1], [F ~ • (E), 1], and [F----, • a, 1] to be added. Now, no more
items can be added to I~.
To construct 12, we note that a2 = a and that by rule (4) [F----, a . , 1] is
to be added to Iz. Then by rule (5), we consider this item by going to I~ and
looking for items with F immediately to the right of the dot. We find two,
and add [ T - - , F . • T, 1] and [T--~ F . , 1] to 12. Considering the first of
these yields nothing, but the second causes us to again examine I~, this time
for items with. T in them. Two more items are added to/2, [E ~ T . q- E, 1]
and [E--~ T . , 1]. Again the first yields nothing, but the second causes
[F ~ (E .), 0] to be added to 12. Now no more items can be added to 12,
so I2 is complete.
The values of all the lists are given in Fig. 4.9.

Io 11 Iz
[E----> • T q - E, 0] I F - - ~ (. E ) , 0 ] [F---. a . , 1]
[E~. T, 0] [E~. T - t - E , 1] [T~F..T, 1]
[T---~ • F . T, 0] [ E - - ~ • T, 1] [ T - - ~ F . , 1]
[T--~ • F, 0] [ T - - ~ . F . T , 1] [ E - - ~ T . W E , l]
[F--~ • (E), 01 [T----~ • F, 11 [E ~ T . , 1]
[F ~ ° a, 0] [F ~ • (E), 1] [ F - o (E .), 0]
[F ---~ . a , l ]

13 h ls
[E ~ T q- • E, 11 [F ~ a., 3] [F ~ ( E ) . , 0]
[E------~ • T-t- E, 31 [T----~ F. • T, 3] [T----~ F . , T, 0]
[E ~ • T, 31 [T ~ F., 31 [T ~ F . , 0]
[T---> • F , T, 3] [E ~ T. + E, 31 [E ~ T . + E, 01
[T --~ • F, 3] [E ~ T., 3] [E --~ T . , 0]
[F---~ • (E), 31 [E----* T + E . , 1]
[F ~ • a, 31 [F ~ ( E . ) , 01

16 I7
[T-.F.. T, 0] [F---~ a . , 6]
[T---, • F , T, 6] IT---, F . • T, 61
[T----~ • F, 6] [T----~ F . , 61
[F ~ • (E), 6] [T ~ F , T . , 01
[F ~ • a, 6] [E ----* T . q- E, 01
[E--~ T.,0]

Fig. 4.9 Parse lists for Example 4.10.

Since [E --~ T . , 0] is on the last list, the input is in L ( G ) .


SEC. 4.2 TABULAR PARSING METHODS 323

We shall pursue the following course in analyzing Earley's algorithm.


First, we shall show that the informal interpretation of the items mentioned
before is correct. Then, we shall show that with a reasonable concept of
an elementary operation, if G is unambiguous, then the time of execution is
quadratic in input length. Finally, we shall show how to construct a parse
from the lists and, in fact, how this may be done in quadratic time.
THEOREM 4.9
If parse lists are constructed as in Algorithm 4.5, then [A ---~ t~. fl, i] is
on Ij if and only if ~ =~ a~+l "'" aj and, moreover, there are strings y and ,6

such that S => ~,AJ and 7 =~ al " " a~.


Proof
O n l y if: We shall do this part by induction on the n u m b e r of items which
have been added to I0, 11, . . . . 1i before [A ~ t~- fl, i] is added to Ij. For
the basis, which we take to be all of I0, we observe that anything added to
I 0 has a ~ e, so S =-~ 7AJ holds with ? = e.
For the inductive step, assume that we have constructed I0 and that the
hypothesis holds for all items presently on/~, i_~j. Suppose that [A ~ ~ . fl, i]
is added to Ij because of rule (4). Then a = oc'aj and [A ---~ t x ' - a j f l , i] is
on Ij_ 1. By the inductive hypothesis, o~' ~ a~+l • • " a j_ 1 and there exist

strings ~,' and 6' such that S =-~ ? ' A J ' and y ' = > a 1 • • • a r It then follows that
= t~'aj ==~ at+ 1 "'" aj, and the inductive hypothesis is satisfied with ~, = ~,'
and ~ = 6'.
Next, suppose that [,4 ---. a . fl, i] is added by rule (5). Then 0~ = ogB
for some B in N, and for some k , [,4 ~ o~' • B f l , i] is on I k. Also, [B --, 1/., k]
.
is on Ij for some I/in (N u E)*. By the inductive hypothesis, r / ~ > ak+ 1 "" • aj
and 0~'==~ a ~ + l . . . a k . Thus, tx = o g B = > a ~ + l . . . a j. Also by hypothesis,

there exist ~,' and 6' such that S ~ ?'A6' and ~,'=> a 1 .-- ai. Again, the rest
of the inductive hypothesis is satisfied with y = ~,' and J = 6'.
The remaining case, in which [A ---. ct. fl, i] is added by rule (6), has
0c = e and i = j. Its elementary verification is left to the reader, and we
conclude the "only if" portion.
I f: The "if" portion is the p r o o f of the statement

(4.2.1) If S =~ ~,AO, ~, =~ a 1 . . . a~, A ---, ctfl is in P, and tz = ~ a~+l " ' " aj,
then [A --. ~ • fl, i] is on list Ij

We must prove all possible instances of (4.2.1). Any instance can be


characterized by specifying the strings 0~, fl, y, and ~, the nonterminal A, and
the integers i and j, since S and al . . . a , are fixed. We shall denote such
324 GENERAL PARSING METHODS CHAP. 4

an instance by [~, fl, 7, $, A , i, j]. The conclusion to be d r a w n f r o m the above


instance is that [A ----~ at • fl, i] is o n list I 1. N o t e that 7 a n d J do not figure
explicitly into the conclusion.
The p r o o f will turn on ranking the various instances a n d proving the
result by induction on the rank. The r a n k of the instance ~ = [at, fl, ~, $, A, i,j]
is c o m p u t e d as follows"
Let %(~) be the length o f a shortest derivation S ==~ ?AJ.
Let ~2(~) be the length o f a shortest derivation 7 ==~ a I . . . a~.
.
Let %(t0 be the length o f a shortest derivation at ==~ at+~ . . . a v.
The r a n k o f t~ is defined to be zl(t0 q- 2[.] -+- z2(tt) -q- z~(~)].
W e n o w prove (4.2.1) by induction on the r a n k o f an instance g =
[at, fl, 7, t~, A , i,j]. If the r a n k is 0, then ~r~(g) = zz(g) = %(g) = j = 0. W e
can conclude that at = 7 = $ = e a n d that A = S. T h e n we need to show
that [S ~ • ,0, 0] is on list 0. However, this follows immediately f r o m the
first rule for that list, as S ---~ fl must be in P.
F o r the inductive step, let ~, as above, be an instance o f (4.2.1) of some
r a n k r > 0, a n d assume that (4.2.1) is true for instances o f smaller rank.
Three cases arise, depending on whether at ends in a terminal, ends in a non-
terminal, or is e.

Case 1: ot = at'a for some a in E. Since ~ ==~ at+ 1 . . . a~, we conclude


that a = a v. Consider the instance ~' = [at', avfl , ~,, $, A, i , j -- 1]. Since
A ~ at'avfl is in P, tt' is an instance o f (4.2.1), a n d its r a n k is easily seen to be
r - 2. W e m a y conclude that [A --~ at'. a~fl, i] is on list I v_ 1. By rule (4),
[A ----~ at • fl, i] will be placed on list I v.
Case 2: at = at'B for some B in N. There is some k, i < k _< j, such that
at'=,, a~+ 1 . . . a k a n d B ==~ ak+ 1 . - . ay. F r o m the instance o f lower r a n k
~' = [at', B fl, 7, $, A , i, k] we conclude that [A ~ at'. B fl, i] is on list Ik. Let
B ==~ t / b e the first step in a m i n i m u m length derivation B ==~ ak+ 1 • • • aj.
Consider the instance a " = [t/, e, 7at', fl$, B, k , j ] . Since S ==~ 7A$ ==~
7at'Bfl$, we conclude that zl(a") _< %(a) -Jr 1. Let n 1 be the m i n i m u m num-
.
ber o f steps in a derivation at' ==~ at+ 1 .. • a k a n d n 2 be the m i n i m u m n u m b e r
in a derivation B =-~ ak+l " ' " aj. T h e n %(~) = n I -4- n2. Since B ==~ t/=-~
ak+ 1 . . " a v, we conclude that z3(a") = n2 -- 1. It is straightforward to see
that % ( t l " ) = % ( ~ ) -b n~. Hence %(~") q-- zs(a") = %(a) -k nl + nz -- 1 =
~:2(a) -Jr- z3(a) -- 1. Thus ~:l(a") + 2[j -q- z2(a") .jr_%(a")] is less than r. By the
inductive hypothesis for a " we conclude that [B --~ 1//., k] is on list Iv, a n d
with [A ~ at'- B f l , i] on list Ik, conclude by rule (2) or (5) that [A ~ at. fl, i]
is on list I v.
Case 3: at = e. We m a y conclude that i = j a n d z3(~) = 0 in this case.
SEC. 4.2 TABULAR PARSING METHODS 325

Since r > 0, we may conclude 'that the derivation S =~ yA,~ is of length


greater than 0. If it were of length 0, then z l(a) -- 0. Then we would have
7 ----- e, so x2(a) = i -- 0. Since i = j and z3(a) = 0 have been shown in gen-
eral for this case, we would have r -- 0.
We can thus find some B in N and 7', 7", 6', and ~" in (N u X)* such
that S =~ y'B6'=:> y'7"A,~"6', where B ---, 7 " A 6 " is in P, 7 = Y'7", ~ = ~"6',
,
and 7'B6' is the penultimate step in some shortest derivation S = ~ 7A6.
Consider the instance a ' - - [ 7 " , A 6 " , 7', 6', B, k, j], where k is an integer
such that 7 ' = ~ al . " a k and 7 " = ~ a k + l . . . a j . Let the smallest length of
the latter derivations be nl and n2, respectively. Then Zz(a') -- n 1, "r3(a') -- n 2,
and ~2(a) -- n~ ÷ n 2. We have already argued that z3(a) -- 0, and B, 7', and
6' were selected so that zl(a') -- ~ ( a ) -- I. It follows that the rank of a' is
r -- 1. We may conclude that [B --, 7" • Ad;", k] is on list 1i. By rule (6), or
rule (3) for list I0, we place [A --, • fl, j] on list 1i. D

Note that as a special case of T h e o r e m 4.9, [S ~ a . , 0] is on list I, if


and only if S --, a is in P and a =~ a I - . . a,; i.e., al " " a, is in L(G) if and
only if [S --, a . , 0] is on list I, for some ~.
We shall now examine the running time of Algorithm 4.5. We leave it to
the reader to show that in general 0(n0 suitably defined elementary steps are
sufficient to parse any word of length n according to a k n o w n grammar.
We shall concentrate on showing that if the g r a m m a r is unambiguous, 0(n 2)
steps are sufficient.
LEMMA 4.6
Let G = (N, X, P, S) be an unambiguous g r a m m a r and a 1 . . - a, a string
in X*. Then when executing Algorithm 4.5, we attempt to add an item
[A ~ tz • fl, i] to list Ij at most once if ct ~ e.
P r o o f This item can be added only in steps (2), (4), or (5). If added in
step (4), the last symbol of ~ is a terminal, and if in steps (2) or (5), the last
symbol is a nonterminal. In the first case, the result is obvious. In the second
case, suppose that [A ---~ tx'B. fl, i] is added to list Ij when two distinct
items, [ B - - , 7 ", k] and [B----~ 6 . , l], are considered. Then it must be that
[A ~ tz' • Bfl, i] is on both list k and list 1. (The case k = l is not ruled out,
however.)
Suppose that k ~ 1. By Theorem 4.9, there exist 01, 02, 03, and 04 such
that S =-> 01A02 => 01~'Bp02 ~ a l " " ajll02 and S => 0 3 A O 4 ~ 03o(BflO 4

=> a ~ . " a j l l O , . But in the first derivation, 0 ~ ' ==> a l " ' ' a k , and in the
second, 03~' =-~ a l " " a v Then there are two distinct derivation trees for
some al " " an, with ~'B deriving ai+l " " aj in two different ways.
326 GENERAL PARSING METHODS CHAP. 4

Now, suppose that k = l. Then it must be that 7 ~ ~. It is again easy


to find two distinct derivation trees for a 1 . . . a,. The details are left for
the Exercises. E]
We now examine the steps of Algorithm 4.5. We shall leave the definition
of "elementary operation" for this algorithm to the reader. The crucial step
in showing that Algorithm 4.5 is of quadratic time complexity is not how
"elementary operation" is defined--any reasonable set of list-processing
primitives will do. The crucial step in the argument concerns "bookkeeping"
for the costs involved. We here assume that the g r a m m a r G is fixed, so that
any processes concerning its symbols can be considered elementary. As in
the previous section, the matter of time variation with the "size" of the gram-
mar is left for the Exercises.
For I0, step (1) clearly can be done in a fixed number of elementary
operations. Step (3) for I0 and step (6) for the general case can be done in
a finite number of elementary operations each time an item is considered,
provided we keep track of those items [A ~ ~ . fl, j] which have been
added to Ij. Since g r a m m a r G is fixed, this information can be kept in a
finite table for each j. If this is done, it is not necessary to scan the entire list
I v to see if items are already on the list.
For steps (2), (4), and (5), addition of items to 1i is facilitated if we can
scan some list I~ such that i < j for all those items having a desired symbol
to the right of the dot, the desired symbol being a terminal in step (4) and
a nonterminal in steps (2) and (5). Thus we need two links from every item
on a list.
The first points to the next item on the list. This link allows us to consider
each item in turn. The second points to the next item with the same symbol
to the right of the dot. It is this link which allows us to scan a list efficiently
in steps (2), (4), and (5).
The general strategy will be to consider each item on a list once to
add new items. However, immediately upon adding an item of the form
[A ~ oc. Bfl, i] to I v, we consult the finite table for I v to determine if
[B ~ 7 ", J] is on I v for any 7. If so, we also add [A ~ ~ B . fl, i] to I v.
We observe that there are a fixed number of strings, say k, that can appear
as the first half of an item. Thus at most k ( j ÷ !) items appear on I v. If we
can show that Algorithm 4.5 spends a fixed amount of time, say c, for each
item on a list, we shall show that the amount of time taken is 0(n2), since

c ~ k( j + 1) = ½ck(n + 1 ) ( n + 2 ) ~ c ' n 2 for some constant c'


j=0

The "bookkeeping trick" is as follows. We charge time to an item, under


certain circumstances, both when it is considered and when it is entered onto
a list. The m a x i m u m amount of time charged in either case is fixed. We also
charge a fixed amount of time to the list itself.
SEC. 4.2 TABULAR PARSING METHODS 327

We leave it to the reader to show that I0 can be constructed in a fixed


amount of time. We shall consider the items on lists Ij, for j > 0. In step (4)
of Algorithm 4.5 for Ij, we examine aj and the previous list. For each entry
on Ij_ 1 with a~ to the right of the dot, we add an item to Ij. As we can examine
only those items on Ij_ 1 satisfying that condition, we need charge only a finite
amount of time to each item added, and a finite amount of time to 1~ for
examining a~ and for finding the first item of Ij_ ~ with • aj in it.
Now, we consider each item on Ij and charge time to it in order to see
if step (5) or step (6) applies. We can accomplish step (6) in a finite amount
of time, as we need examine only the table associated with Ij that tells whether
all [A ~ • ~, j] have been added for the relevant A. This table can be exam-
ined in fixed time, and if necessary, a fixed number of items are added to Ij.
This time is all charged to the item considered.
If step (5) applies, we must scan some list I~, k < j , for all items having
• B in them for some particular B. Each time one is found, an item is added
to list lj, and the time is charged to the item added, not the one being con-
sidered !
To show that the amount of time charged to any item on any list is bound-
ed above by some finite number, we need observe only that by Lemma 4.6,
if the grammar is unambiguous, only one attempt will ever be made to add
an item to a list. This observation also ensures that in step (5) we do not have
to spend time checking to see if an item already appears on a list.
THEOREM 4.10
If the underlying grammar is unambiguous, then Algorithm 4.5 can be
executed in 0(n 2) reasonably defined elementary operations when the input
is of length n.
Proof. A formalization of the above argument and the notion of an ele-
mentary operation is left for the Exercises. D

THEOREM 4.11
In all cases, Algorithm 4.5 can be executed in 0(n 3) reasonably defined
elementary operations when the input is of length n.
Proof. Exercise.
Our last portion of the analysis of Earley's algorithm concerns the method
of constructing a parse from the completed lists. For this purpose we give
Algorithm 4.6, which generates a right parse from the parse lists. We choose
to produce a right parse because the algorithm is slightly simpler. A left
parse can also be found with a simple alteration in the algorithm.
Also for the sake of simplicity, we shall assume that the grammar at hand
has no cycles. If a cycle does exist in the grammar, then it is possible to have
328 GENERAL PARSING METHODS CHAP. 4

arbitrarily many parses for some input strings. However, Algorithm 4.6 can
be modified to accommodate grammars with cycles (Exercise 4.2.23).
It should be pointed out that as for Algorithm 4.4, we can make Algorithm
4.6 simpler by placing pointers with each item added to a list in Algorithm
4.5. Those pointers give the one or two items which lead to its placement on
its list.
ALGORITHM 4.6
Construction of a right parse from the parse lists.
Input. A cycle-free C F G G = (N, X, P, S) with the productions in P
numbered from 1 to p, an input string w = a 1 . . . a n, and the parse lists
I0, I a , . . . , I, for w.
Output. 7r, a right parse for w, or an "error" message.
Method. If no item of the form [S---~ 0c., 0] is on In, then w is not in
L(G), so emit "error" and halt. Otherwise, initialize the parse 7r to e and
execute the routine R([S----~ ~z., 0], n) where the routine R is defined as
follows:
Routine R([A ---+ f l . , i], j ) :
(1) Let n be h followed by the previous value of ~z, where h is the number
of production A ---~ ft. (We assume that 7r is a global variable.)
(2) If fl = X i X z . . . Xm, set k = m and 1 = j.
(3) (a) If Xk ~ X, subtract 1 from both k a n d / .
(b) If X k ~ N, find an item [Xk ~ 7' ", r] in I~ for some r such that
[A ~ X ~ X z . . . Xk_~ . X k . . . Xm, i] is in I,. Then execute
R([Xk ---' 7' ", r], l). Subtract 1 from k and set l = r.
(4) Repeat step (3) until k = 0. Halt. E]
Algorithm 4.6 works by tracing out a rightmost derivation of the input
string using the parse lists to determine the productions to use. The routine
R called with arguments [A --~ f l . , i] and j appends to the left end of the
current partial parse the number corresponding to the production A ~ ft.
If fl = voBlvlB2v 2 . . . B,v,, where B 1 , . . . , Bs are all the nonterminals in fl,
then the routine R determines the first production used to expand each Bt,
say B,---~ fl,, and the position in the input string w immediately before
the first terminal symbol derived from B,. The following recursive calls of
R are then made in the order shown:

R([B, --,/~,.,i,],j,)
R([B,_, ~ /L-~ ", ~,-~],L-,)
°
SEC. 4.2 T A B U L A R P A R S I N G METHODS 329

where
(1) j, = j - Iv, I and
(2) jq -- iq+x --[vq [for 1 ~ q < s.

Example 4.11
Let us apply Algorithm 4.6 to the parse lists of Example 4.10 in order to
produce a right parse for the input string (a + a) • a. Initially, we can execute
R([E----~ T . , 0], 7). In step (1), n gets value 2, the number associated with
production E ~ T. We then set k = 1 and 1 = 7 and execute step (3b) of
Algorithm 4.6. We find [T ----~ F • T . , 0] on 17 and [E ~ • T, 0] on Io. Thus
we execute R([T ~ F , T . , 0], 7), which results in production number 3
being appended to the left of n. Thus, n = 32. Following this call of R, in
step (2) we set k = 3 and l = 7.
Step (3b) is then executed with k = 3. We find [T---, F . , 6] on I0 and
IT ~ F , . T, 0] on /6, so we call R([T----, F . , 6], 7). After completion of
this call, we set k = 2 and l = 6. In Step (3a) we consider • and set k = 1
and 1 = 5. We then find [ F - - , ( E ) . , 0] on ls and [T---,. FF, T, 0] on I0,
so we call R([F---~ ( E ) . , 0], 5).
Continuing in this fashion we obtain the right parse 64642156432.
The calls of routine R are shown in Fig. 4.10 superimposed on the deri-
vation tree for (a + a) • a. D

E R([E~ T-,01,7)
!
i R([T~F,T.,01,7)

F R([F~(E)-,0I, 5) T R([T~ F., 61,7)

( )
i R([,,,,,~,~E~
T+E., 1],4) F R([F~a -,61,7)
R([T~F-,ll,2) T/ E R([E~T.,3],4)
a
I
I +
R([F~a.,11,2) F T R([T~F.,31,4)

a F R([Foa., 31,4)

Fig. 4.10 Diagram of execution of Algorithm 4.6.


330 GENERAL PARSING METHODS CHAP. 4

THEOREM 4.12
A l g o r i t h m 4.6 correctly finds a right parse of a 1 . . . a, if one exists, a n d
can be m a d e to operate in time 0(n2).
P r o o f A straightforward induction on the order of the calls o f routine
R shows that a right parse is produced. We leave this portion of the p r o o f for
the Exercises.
In a m a n n e r analogous to T h e o r e m 4.10, we can show that a call of
R([A --~ f t . , i], j ) takes time 0 ( ( j - 0 2) if we can show that step (3b) takes
0 ( j - i) elementary operations. To do so, we must preprocess the lists in such
a way that the time taken to examine all the finite n u m b e r o f items on I k
whose second c o m p o n e n t is l requires a fixed c o m p u t a t i o n time. That is, for
each parse list, we must link the items with a c o m m o n second c o m p o n e n t
a n d establish a header pointing to the first entry on that list. This preprocess-
ing can be done in time 0(n 2) in an obvious way.
In step (3b), then, we examine the items on list Iz with second c o m p o n e n t
r = l, l - 1 , . . . , i until a desired item of the f o r m [X k --, 7 ", r] is found.
The verification that we have the desired item takes fixed time, since all items
with second c o m p o n e n t i on I, can be f o u n d in finite time. The total a m o u n t
of time spent in step (3b) is thus p r o p o r t i o n a l to j - i. [--]

EXERCISES

4.2.1. Let G be defined by S - - , AS! b, A ---, S A l a . Construct the parse tables


by Algorithm 4.3 for the following words:
(a) bbaab.
(b) ababab.
(c) aabba.
4.2.2. Use Algorithm 4.4 to obtain left parses for those words of Exercise 4.2.1
which are in L(G).
4.2.3. Construct parse lists for the grammar G of Exercise 4.2.1 and the words
of that exercise using Algorithm 4.5.
4.2.4. Use Algorithm 4.6 to construct right parses for those words of Exercise
4.2.1 which are in L(G).
4.2.5. Let G be given by S - - ~ SS[ a. Use Algorithm 4.5 to construct a few of
the parse lists Io, 11 . . . . when the input is aa . . . . How many elementary
operations are needed before I; is computed ?
4.2.6. Prove Theorem 4.6.
4.2.7. Prove that Algorithm 4.4 correctly produces a left parse.
4.2.8. Complete the "only if" portion of Theorem 4.9.
4.2.9. Show that Earley's algorithm operates in time 0(n 3) on any grammar.
EXERCISES 331

4.2.10. Complete the proof of Lemma 4.6.


4.2.11. Give a reasonable set of elementary operations for Theorems 4.10-4.12.
4.2.12. Prove Theorem 4.10.
4.2.13. Prove Theorem 4.11.
4.2.14. Show that Algorithm 4.6 correctly produces a right parse.
4.2.15. Modify Algorithm 4.3 to work on non-CNF grammars. H i n t : Each t~j
must hold not only those nonterminals A which derive a t . . . a ~ + j _ ,
but also certain substrings of right sides of productions which derive
ai • • • ai+j-1.

"4.2.16. Show that if the underlying grammar is linear, then a modification of


Algorithm 4.3 can be made to work in time 0(n2).
"4.2.17. We can modify Algorithm 4.3 to use "lookahead strings" of length
k _~ 0. Given a grammar G and an input string w = a l a 2 . . . a , , we
create a parse table T such that t~j contains A if and only if

(1) S ==~ o~Ax,


+
(2) A ==~ ai . . . at+j-l, and
(3) at+jai+j+l . . . ai+~+k-1 -- FIRSTk(X).
Thus, A would be placed in entry t~j provided the k input symbols to the
right of input symbol a~+j_a can legitimately appear after A in a sen-
tential form. Algorithm 4.3 uses lookahead strings of length 0. Modify
Algorithm 4.3 to use lookahead strings of length k ~> 1. What is the
time complexity of such an algorithm ?
"4.2.18. We can also modify Earley's algorithm to use lookahead. Here we would
use items of the form [A ---, ~ • fl, i, u] where u is a lookahead string of
length k. We would not enter this item on list Ij unless there is a deriva-
tion S ==~ ? A u v , where ~' ==~ al -.- at, • ==~ a~+~ ..- aj, and FIRSTk(flu)
contains a~+l " ' ' a j + k . Complete the details of modifying Earley's al-
gorithm to incorporate lookahead and then examine the time complexity
of the algorithm.
4.2.19. Modify Algorithm 4.4 to produce a right parse.
4.2.20. Modify Algorithm 4.6 to produce a left parse.
4.2.21. Show that it is possible to modify Algorithm 4.4 to produce a parse in
linear time if, in constructing the parse table, we add pointers with each
A in ttj to the B in t~k and C in tt+k.j-k that caused A to be placed in
t~j in step (2) of Algorithm 4.3.
4.2.22. Show that if Algorithm 4.5 is modified to include pointers from an item
to the other items which caused it to be placed on a list, then a right
(or left)parse can be obtained from the parse lists in linear time.
4.2.23. Modify Algorithm 4.6 to work on arbitrary CFG's (including those
with cycles). H i n t : Include the pointers in the parse lists as in Exercise
4.2.22.
332 GENERAL PARSING METHODS CHAP. 4

4.2.24. What is the maximum number of items that can appear in a list Ij in
Algorithm 4.5 ?
*4.2.25. A grammar G is said to be of finite ambiguity if there is a constant k
such that if w is in L(G), then w has no more than k distinct left parses.
Show that Earley's algorithm takes time 0(n 2) on all grammars of finite
ambiguity.

Open Problems
There is little known about the actual time necessary to parse an
arbitrary context-free grammar. In fact, no good upper bounds are
known for the time it takes to recognize sentences in L(G) for arbitrary
CFG G, let alone parse it. We therefore propose the following open
problems and research areas.
4.2.26. Does there exist an upper bound lower than 0(n 3) on the time needed
to recognize an arbitrary CFL on some reasonable model of a random
access computer or a multitape Turing machine ?
4.2.27. Does there exist an upper bound better than 0(n 2) on the time needed
to recognize unambiguous CFL's ?

Research Problems
4.2.28. Find a CFL which cannot be recognized in time f(n) on a random
access computer or Turing machine (the latter would be easier), where
f(n) grows faster than n; i.e., lim,_.~ (n/f (n)) = 0. Can you find a CFL
which appears to take more than 0(n 2) time for recognition, even if you
cannot prove this to be so ?
4.2.29. Find large classes of CFG's which can be parsed in linear time by
Eadey's algorithm. Find large classes of ambiguous CFG's which can
be parsed in time 0(n 2) by Earley's algorithm. It should be mentioned
that all the deterministic CFL's have grammars in the former class.

Programming Exercises
4.2.30. Use Earley's algorithm to construct a parser for one of the grammars
in the Appendix.
4.2.31. Construct a program that takes as input any CFG G and produces as
output a parser for G that uses Earley's algorithm.

BIBLIOGRAPHIC NOTES

Algorithm 4.3 has been discovered independently by a number of people.


Hays [1967] reports a version of it, which he attributes to J. Cocke.
Younger [1967] uses Algorithm 4.3 to show that the time complexity of the
membership problem for context-free languages is O(n^3). Kasami [1965] also gives
a similar algorithm. Algorithm 4.5 is found in Earley's Ph.D. thesis [1968]. An
O(n^2) parsing algorithm for unambiguous CFG's is reported by Kasami and Torii
[1969].
5   ONE-PASS NO BACKTRACK PARSING

In Chapter 4 we discussed backtrack techniques that could be used to


simulate the nondeterministic left and right parsers for large classes of con-
text-free grammars. However, we saw that in some cases such a simulation
could be quite extravagant in terms of time. In this chapter we shall discuss
classes of context-free grammars for which we can construct efficient parsers
--parsers which make c1·n operations and use c2·n space in processing an
input of length n, where c1 and c2 are small constants.
We shall have to pay a price for this efficiency, as none of the classes of
grammars for which we can construct these efficient parsers generate all
the context-free languages. However, there is strong evidence that the restrict-
ed classes of grammars for which we can construct these efficient parsers are
adequate to specify all the syntactic features of programming languages that
are normally specified by context-free grammars.
The parsing algorithms to be discussed are characterized by the facts
that the input string is scanned once from left to right and that the parsing
process is completely deterministic. In effect, we are merely restricting the
class of CFG's so that we are always able to construct a deterministic left
parser or a deterministic right parser for the grammar under consideration.
The classes of grammars to be discussed in this chapter include
(1) The LL(k) grammars--those for which the left parser can be made
to work deterministically if it is allowed to look at k input symbols to the
right of its current input position.†
(2) The LR(k) grammars--those for which the right parser can be made

†This does not involve an extension of the definition of a DPDT. The k "lookahead
symbols" are stored in the finite control.


to work deterministically if it is allowed to look k input symbols beyond its


current input position.
(3) The precedence grammars--those for which the right parser can find
the handle of a right-sentential form by looking only at certain relations
between pairs of adjacent symbols of that sentential form.

5.1. LL(k) GRAMMARS

In this section we shall present the largest "natural" class of left-parsable


grammars--the LL(k) grammars.

5.1.1. Definition of LL(k) Grammar

As an introduction, let G = (N, Σ, P, S) be an unambiguous grammar
and w = a1 a2 ··· an a sentence in L(G). Then there exists a unique sequence
of left-sentential forms α0, α1, ..., αm such that S = α0, αi =>lm α_{i+1} using
production p_i for 0 ≤ i < m, and αm = w. The left parse for w is p0 p1 ··· p_{m-1}.
Now, suppose that we want to find this left parse by scanning w once
from left to right. We might try to do this by constructing α0, α1, ..., αm,
the sequence of left-sentential forms. If αi = a1 ··· aj Aβ, then at this point
we could have read the first j input symbols and compared them with the
first j symbols of αi. It would be desirable if α_{i+1} could be determined know-
ing only a1 ··· aj (the part of the input we have scanned to this point),
the next few input symbols (a_{j+1} a_{j+2} ··· a_{j+k} for some fixed k), and the non-
terminal A. If these three quantities uniquely determine which production is
to be used to expand A, we can then precisely determine α_{i+1} from αi and the
k input symbols a_{j+1} a_{j+2} ··· a_{j+k}.
A grammar in which each leftmost derivation has this property is said
to be an LL(k) grammar. We shall see that for each LL(k) grammar we can
construct a deterministic left parser which operates in linear time. A few
definitions are needed before we proceed.
DEFINITION

Let α = xβ be a left-sentential form in some grammar G = (N, Σ, P, S)
such that x is in Σ* and β either begins with a nonterminal or is e. We say
that x is the closed portion of α and that β is the open portion of α. The bound-
ary between x and β will be referred to as the border.

Example 5.1
Let α = abacAaB. The closed portion of α is abac; the open portion is
AaB. If α = abc, then abc is its closed portion and e its open portion. Its
border is at the right end.

The intuitive idea behind LL(k) grammars is that if we are constructing
a leftmost derivation S =>* w and we have already constructed S =>lm α1 =>lm α2
=>lm ··· =>lm αi such that αi =>*lm w, then we can construct α_{i+1}, the next step of
the derivation, by observing only the closed portion of αi and a "little
more," the "little more" being the next k input symbols of w. (Note that
the closed portion of αi is a prefix of w.) It is important to observe that if
we do not see all of w when α_{i+1} is constructed, then we do not really know
what terminal string is ultimately derived from S. Thus the LL(k) condition
implies that α_{i+1} is substantially independent (except for the next k terminal
symbols) of what is derived from the open portion of αi.
Viewed in terms of a derivation tree, we can construct a derivation tree
for a sentence wxy in an LL(k) grammar starting from the root and working
top-down deterministically. Specifically, if we have constructed the partial
derivation tree with frontier wAα, then knowing w and the first k symbols of
xy we would know which production to use to expand A. The outline of
the complete tree is shown in Fig. 5.1.

Fig. 5.1 Partial derivation tree for the sentence wxy.

Recall that in Chapter 4 we defined for a CFG G = (N, Σ, P, S) the func-
tion FIRSTk^G(α), where k is an integer and α is in (N ∪ Σ)*, to be
{w in Σ* | either |w| < k and α =>* w, or |w| = k and α =>* wx for some x}.
We shall delete the subscript k and/or the superscript G from FIRST
whenever no confusion will result.
If α consists solely of terminals, then FIRSTk(α) is just {w}, where w is
the first k symbols of α if |α| ≥ k, and w = α if |α| < k. We shall write
FIRSTk(α) = w, rather than {w}, in this case. It is straightforward to deter-
mine FIRSTk(α) for a particular grammar G. We shall defer an algorithm to
Section 5.1.6.
DEFINITION

Let G = (N, Σ, P, S) be a CFG. We say that G is LL(k), for some fixed
integer k, if whenever there are two leftmost derivations
(1) S =>*lm wAα =>lm wβα =>*lm wx and
(2) S =>*lm wAα =>lm wγα =>*lm wy
such that FIRSTk(x) = FIRSTk(y), it follows that β = γ.


Stated less formally, G is LL(k) if, given a string wAα in (N ∪ Σ)* and
the first k terminal symbols (if they exist) to be derived from Aα, there is at
most one production which can be applied to A to yield a derivation of any
terminal string beginning with w followed by those k terminals.
We say that a grammar is LL if it is LL(k) for some k.

Example 5.2
Let G1 be the grammar S → aAS | b, A → a | bSA. Intuitively, G1 is LL(1)
because, given C, the leftmost nonterminal in any left-sentential form, and
c, the next input symbol, there is at most one production for C capable of
deriving a terminal string beginning with c. Going to the definition of an
LL(1) grammar, if S =>*lm wSα =>lm wβα =>*lm wx and S =>*lm wSα =>lm wγα =>*lm wy
and x and y start with the same symbol, we must have β = γ. Specifically, if
x and y start with a, then production S → aAS was used and β = γ = aAS.
S → b is not a possible alternative. Conversely, if x and y start with b,
S → b must be the production used, and β = γ = b. Note that x = y = e
is impossible, since S does not derive e in G1.
A similar argument prevails when we consider two derivations
S =>*lm wAα =>lm wβα =>*lm wx and S =>*lm wAα =>lm wγα =>*lm wy.

The grammar in Example 5.2 is an example of what is known as a simple


LL(1) grammar.
DEFINITION

A context-free grammar G = (N, Σ, P, S) with no e-productions such
that for all A ∈ N each alternate for A begins with a distinct terminal symbol
is called a simple LL(1) grammar. Thus in a simple LL(1) grammar, given
a pair (A, a), where A ∈ N and a ∈ Σ, there is at most one production of
the form A → aα.

Example 5.3
Let us consider the more complicated case of the grammar G2 defined by
S → e | abA, A → Saa | b. We shall show that G2 is LL(2). To do this, we shall
show that if wBα is any left-sentential form of G2 and wx is a sentence in
L(G2), then there is at most one production B → β in G2 such that
FIRST2(βα) contains FIRST2(x). Suppose that S =>*lm wSα =>lm wβα =>*lm wx
and S =>*lm wSα =>lm wγα =>*lm wy, where the first two symbols of x and y agree if
they exist. Since G2 is a linear grammar, α must be in (a + b)*. In fact,
we can say more. Either w = α = e, or the last production used in the deri-
vation S =>*lm wSα was A → Saa. (There is no other way for S to "appear"
in a sentential form.) Thus either α = e or α begins with aa.
Suppose that S → e is used going from wSα to wβα. Then β = e, and x
is either e or begins with aa. Likewise, if S → e is used going from wSα to
wγα, then γ = e and y = e, or y begins with aa. If S → abA is used going
from wSα to wβα, then β = abA, and x begins with ab. Likewise, if S → abA
is used going from wSα to wγα, then γ = abA, and y begins with ab. There
are thus no possibilities other than x = y = e, x and y begin with aa, or both
begin with ab. Any other condition on the first two symbols of x and y implies
that one or both derivations are impossible. In the first two cases, S → e is
used in both derivations, and β = γ = e. In the third case, S → abA must
be used, and β = γ = abA.
It is left for the Exercises to prove that the situation in which A is the
symbol to the right of the border of the sentential form in question does not
yield a contradiction of the LL(2) condition. The reader should also verify
that G2 is not LL(1).

Example 5.4
Let us consider the grammar G3 = ({S, A, B}, {0, 1, a, b}, P3, S), where
P3 consists of S → A | B, A → aAb | 0, B → aBbb | 1. L(G3) is the language
{a^n 0 b^n | n ≥ 0} ∪ {a^n 1 b^{2n} | n ≥ 0}. G3 is not LL(k) for any k. Intuitively, if
we begin scanning a string of a's which is arbitrarily long, we do not know
whether the production S → A or S → B was first used until a 0 or 1 is seen.
Referring to the definition of an LL(k) grammar, we may take w = α = e,
β = A, γ = B, x = a^k 0 b^k, and y = a^k 1 b^{2k} in the derivations
S =>*lm S =>lm A =>*lm a^k 0 b^k
S =>*lm S =>lm B =>*lm a^k 1 b^{2k}
to satisfy conditions (1) and (2) of the definition. Moreover, x and y agree in
the first k symbols. However, the conclusion that β = γ is false. Since k can
be arbitrary here, we may conclude that G3 is not an LL grammar. In fact,
in Chapter 8 we shall show that L(G3) has no LL(k) grammar.

5.1.2. Predictive Parsing Algorithms

We shall show that we can parse LL(k) grammars very conveniently using
what we call a k-predictive parsing algorithm. A k-predictive parsing algorithm
for a CFG G = (N, X, P, S) uses an input tape, a pushdown list, and an
output tape as shown in Fig. 5.2. The k-predictive parsing algorithm attempts
to trace out a leftmost derivation of the string placed on its input tape.

Fig. 5.2 Predictive parsing algorithm.

The input tape contains the input string to be parsed. The input tape is
read by an input head capable of reading the next k input symbols (whence
the k in k-predictive). The string scanned by the input head will be called the
lookahead string. In Fig. 5.2 the substring u of the input string wux represents
the lookahead string.
The pushdown list contains a string Xα$, where Xα is a string of push-
down symbols and $ is a special symbol used as a bottom-of-the-pushdown-
list marker. The symbol X is on top of the pushdown list. We shall use Γ to
represent the alphabet of pushdown list symbols (excluding $).
The output tape contains a string π of production indices.
We shall represent the configuration of a predictive parsing algorithm by
a triple (x, Xα, π), where
(1) x represents the unused portion of the original input string,
(2) Xα represents the string on the pushdown list (with X on top), and
(3) π is the string on the output tape.
For example, the configuration in Fig. 5.2 is (ux, Xα$, π).

The action of a k-predictive parsing algorithm 𝒜 is dictated by a parsing
table M, which is a mapping from (Γ ∪ {$}) × Σ*k to a set containing the
following elements:
(1) (β, i), where β is in Γ* and i is a production number. Presumably,
β will be either the right side of production i or a representation of it.
(2) pop.
(3) accept.
(4) error.
The parsing algorithm parses an input by making a sequence of moves,
each move being very similar to a move of a pushdown transducer. In a move
the lookahead string u and the symbol X on top of the pushdown list are
determined. Then the entry M(X, u) in the parsing table is consulted to deter-
mine the actual move to be made. As would be expected, we shall describe
the moves of the parsing algorithm in terms of a relation ⊢ on the set of
configurations. Let u be FIRSTk(x). We write
(1) (x, Xα, π) ⊢ (x, βα, πi) if M(X, u) = (β, i). Here the top symbol
X on the pushdown list is replaced by the string β ∈ Γ*, and the production
number i is appended to the output. The input head is not moved.
(2) (x, aα, π) ⊢ (x', α, π) if M(a, u) = pop and x = ax'. When the symbol
on top of the pushdown list matches the current input symbol (the first
symbol of the lookahead string), the pushdown list is popped and the input
head is moved one symbol to the right.
(3) If the parsing algorithm reaches configuration (e, $, π), then parsing
ceases, and the output string π is the parse of the original input string. We
shall assume that M($, e) is always accept. Configuration (e, $, π) is called
accepting.
(4) If the parsing algorithm reaches configuration (x, Xα, π) and
M(X, u) = error, then parsing ceases, and an error is reported. The configu-
ration (x, Xα, π) is called an error configuration.
If w ∈ Σ* is the string to be parsed, then the initial configuration of the
parsing algorithm is (w, X0$, e), where X0 is a designated initial symbol.
If (w, X0$, e) ⊢* (e, $, π), we write 𝒜(w) = π and call π the output of 𝒜
for input w. If (w, X0$, e) does not reach an accepting configuration, we say
that 𝒜(w) is undefined. The translation defined by 𝒜, denoted τ(𝒜), is the set
of pairs {(w, π) | 𝒜(w) = π}.
We say that 𝒜 is a valid k-predictive parsing algorithm for CFG G if
(1) L(G) = {w | 𝒜(w) is defined}, and
(2) if 𝒜(w) = π, then π is a left parse of w.
If a k-predictive parsing algorithm 𝒜 uses a parsing table M and 𝒜 is
a valid parsing algorithm for a CFG G, we say that M is a valid parsing
table for G.

Example 5.5
Let us construct a 1-predictive parsing algorithm 𝒜 for G1, the simple
LL(1) grammar of Example 5.2. First, let us number the productions of G1
as follows:
(1) S → aAS
(2) S → b
(3) A → a
(4) A → bSA

A parsing table for 𝒜 is shown in Fig. 5.3. The rows are indexed by the
symbol on top of the pushdown list, the columns by the lookahead string;
blank entries are error.

                        Lookahead string
                   a           b           e
     S          aAS, 1       b, 2
     A           a, 3       bSA, 4
     a           pop
     b                       pop
     $                                   accept

              Fig. 5.3  Parsing table for 𝒜.

Using this table, 𝒜 would parse the input string abbab as follows:

(abbab, S$, e) ⊢ (abbab, aAS$, 1)
               ⊢ (bbab, AS$, 1)
               ⊢ (bbab, bSAS$, 14)
               ⊢ (bab, SAS$, 14)
               ⊢ (bab, bAS$, 142)
               ⊢ (ab, AS$, 142)
               ⊢ (ab, aS$, 1423)
               ⊢ (b, S$, 1423)
               ⊢ (b, b$, 14232)
               ⊢ (e, $, 14232)

For the first move M(S, a) = (aAS, 1), so S on top of the pushdown list is
replaced by aAS, and production number 1 is written on the output tape.
For the next move M(a, a) = pop, so a is removed from the pushdown
list, and the input head is moved one position to the right.
Continuing in this fashion, we obtain the accepting configuration
(e, $, 14232). It should be clear that 14232 is a left parse of abbab and, in
fact, that 𝒜 is a valid 1-predictive parsing algorithm for G1.
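The table-driven behavior just traced is easy to mechanize. The following is a minimal sketch (ours, not the book's) in Python of a 1-predictive parsing algorithm driving the table of Fig. 5.3; the table encoding and the function name parse are our own choices.

    # Sketch of a 1-predictive parser for the simple LL(1) grammar G1:
    #   (1) S -> aAS   (2) S -> b   (3) A -> a   (4) A -> bSA
    # M maps (top-of-pushdown symbol, lookahead) to an action; '' plays
    # the role of the empty lookahead e.
    M = {
        ('S', 'a'): ('aAS', 1), ('S', 'b'): ('b', 2),
        ('A', 'a'): ('a', 3),   ('A', 'b'): ('bSA', 4),
        ('a', 'a'): 'pop',      ('b', 'b'): 'pop',
        ('$', ''): 'accept',
    }

    def parse(w):
        """Return the left parse of w, or None if w is rejected."""
        stack, i, output = ['$', 'S'], 0, []          # S on top of the list
        while True:
            top = stack[-1]
            lookahead = w[i] if i < len(w) else ''
            action = M.get((top, lookahead), 'error')
            if action == 'accept':
                return output
            if action == 'error':
                return None
            if action == 'pop':                        # top matches the input symbol
                stack.pop(); i += 1
            else:                                      # expand: replace top by right side
                rhs, prod = action
                stack.pop()
                stack.extend(reversed(rhs))            # leftmost symbol ends up on top
                output.append(prod)

    print(parse('abbab'))   # [1, 4, 2, 3, 2], i.e. the left parse 14232

Running this sketch on abbab reproduces the sequence of moves above.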

A k-predictive parsing algorithm for a CFG G can be simulated by a
deterministic pushdown transducer with an endmarker on the input. Since
a pushdown transducer can look only at one input symbol, the lookahead
string should be stored as part of the state of the finite control. The rest of
the simulation should be obvious.
THEOREM 5.1
Let 𝒜 be a k-predictive parsing algorithm for a CFG G. Then there exists
a deterministic pushdown transducer T such that τ(T) = {(w$, π) | 𝒜(w) = π}.
Proof. Exercise.

COROLLARY
Let 𝒜 be a valid k-predictive parsing algorithm for G. Then there is
a deterministic left parser for G.

Example 5.6
Let us construct a deterministic left parser P1 from the 1-predictive parsing
algorithm in Example 5.5. Since the grammar is simple, we can obtain
a smaller DPDT if we move the input head one symbol to the right on each
move. The left parser will use $ both as a right endmarker on the input tape
and as a bottom-of-the-pushdown-list marker.
Let P1 = ({q0, q, accept}, {a, b, $}, {S, A, a, b, $}, δ, q0, $, {accept}), where
δ is defined as follows:
δ(q0, e, $) = (q, S$, e)
δ(q, a, S) = (q, AS, 1)
δ(q, b, S) = (q, e, 2)
δ(q, a, A) = (q, e, 3)
δ(q, b, A) = (q, SA, 4)
δ(q, $, $) = (accept, e, e)

It is easy to see that (w$, π) ∈ τ(P1) if and only if 𝒜(w) = π.



5.1.3. Implications of the LL(k) Definition

We shall show that for every LL(k) grammar we can mechanically con-
struct a valid k-predictive parsing algorithm. Since the parsing table is the
heart of the predictive parsing algorithm, we shall show how a parsing table
can be constructed from the grammar. We begin by examining the impli-
cations of the LL(k) definition.
The LL(k) definition states that, given a left-sentential form wAα, then
w and the next k input symbols following w will uniquely determine which
production is to be used to expand A. Thus at first glance it might appear
that we have to remember all of w to determine which production is to be
used next. However, this is not the case. The following theorem is fundamen-
tal to an understanding of LL(k) grammars.
THEOREM 5.2
Let G = (N, Σ, P, S) be a CFG. Then G is LL(k) if and only if the follow-
ing condition holds: if A → β and A → γ are distinct productions in P,
then FIRSTk(βα) ∩ FIRSTk(γα) = ∅ for all wAα such that S =>*lm wAα.
Proof
Only if: Suppose that there exist w, A, α, β, and γ as above, but
FIRSTk(βα) ∩ FIRSTk(γα) contains x. Then by the definition of FIRST, we
have derivations S =>*lm wAα =>lm wβα =>*lm wxy and S =>*lm wAα =>lm wγα =>*lm wxz
for some y and z. (Note that here we need the fact that G has no useless
nonterminals, as we assume for all grammars.) If |x| < k, then y = z = e.
Since β ≠ γ, G is not LL(k).
If: Suppose that G is not LL(k). Then there exist two derivations
S =>*lm wAα =>lm wβα =>*lm wx and S =>*lm wAα =>lm wγα =>*lm wy such that x and y
agree up to the first k places, but β ≠ γ. Then A → β and A → γ are distinct
productions in P, and FIRSTk(βα) and FIRSTk(γα) each contain the string
FIRSTk(x), which is also FIRSTk(y).

Let us look at some applications of Theorem 5.2 to LL(1) grammars.
Suppose that G = (N, Σ, P, S) is an e-free CFG, and we wish to determine
whether G is LL(1). Theorem 5.2 implies that G is LL(1) if and only if for
all A in N each set of A-productions A → α1 | α2 | ··· | αn in P is such that
FIRST1(α1), FIRST1(α2), ..., FIRST1(αn) are all pairwise disjoint. (Note
that e-freedom is essential here.)

Example 5.7
The grammar G having the two productions S → aS | a cannot be LL(1),
since FIRST1(aS) = FIRST1(a) = a. Intuitively, in parsing a string begin-
ning with an a, looking only at the first input symbol we would not know
whether to use S → aS or S → a to expand S. On the other hand, G is LL(2).
Using the notation in Theorem 5.2, if S =>*lm wAα, then A = S and α = e.
The only two productions for S are as given, so that β = aS and γ = a.
Since FIRST2(aS) = aa and FIRST2(a) = a, G is LL(2) by Theorem 5.2.

Let us consider LL(1) grammars with e-productions. At this point it is
convenient to introduce the function FOLLOWk.
DEFINITION
Let G = (N, Σ, P, S) be a CFG. We define FOLLOWk^G(β), where k is
an integer and β is in (N ∪ Σ)*, to be the set {w | S =>* αβγ and w is in
FIRSTk(γ)}. As is customary, we shall omit k and G whenever they are
understood.
Thus, FOLLOW1(A) includes the set of terminal symbols that can occur
immediately to the right of A in any sentential form, and if αA is a sentential
form, then e is also in FOLLOW1(A).
We can extend the functions FIRST and FOLLOW to domains which
are sets of strings rather than single strings, in the obvious manner. That is,
let G = (N, Σ, P, S) be a CFG. If L ⊆ (N ∪ Σ)*, then
FIRSTk(L) = {w | for some α in L, w is in FIRSTk(α)}
and FOLLOWk(L) = {w | for some α in L, w is in FOLLOWk(α)}.
For LL(1) grammars we can make the following important observation.
For LL(1) grammars we can make the following important observation.
THEOREM 5.3
A CFG G = (N, Σ, P, S) is LL(1) if and only if the following condition
holds: for each A in N, if A → β and A → γ are distinct productions, then
FIRST1(β FOLLOW1(A)) ∩ FIRST1(γ FOLLOW1(A)) = ∅.
Proof. Exercise.

Thus we can show that a grammar G is LL(1) if and only if for each set
of A-productions A → α1 | α2 | ··· | αn the following conditions hold:
(1) FIRST1(α1), FIRST1(α2), ..., FIRST1(αn) are all pairwise disjoint.
(2) If αi =>* e, then FIRST1(αj) ∩ FOLLOW1(A) = ∅ for 1 ≤ j ≤ n,
i ≠ j.
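These two conditions translate directly into a mechanical check. The following is a minimal sketch (ours, not the book's) in Python; it assumes FIRST1 of each alternate and FOLLOW1 of each nonterminal have already been computed and are supplied as plain dictionaries (an algorithm for FIRST appears in Section 5.1.6, and FOLLOW is Exercise 5.1.8). The grammar of Example 5.2 is used as a test case.

    # Sketch of the LL(1) test stated above.  'e' stands for the empty string.
    def is_ll1(alternates, first1, follow1):
        """alternates maps A -> list of its right sides (as strings)."""
        for A, alts in alternates.items():
            for i in range(len(alts)):
                for j in range(len(alts)):
                    if i == j:
                        continue
                    fi, fj = first1[alts[i]], first1[alts[j]]
                    # condition (1): FIRST1 sets pairwise disjoint
                    if i < j and fi & fj:
                        return False
                    # condition (2): if alt_i derives e, FIRST1(alt_j) must
                    # not meet FOLLOW1(A)
                    if 'e' in fi and fj & follow1[A]:
                        return False
        return True

    # Grammar G1 of Example 5.2:  S -> aAS | b,  A -> a | bSA
    alternates = {'S': ['aAS', 'b'], 'A': ['a', 'bSA']}
    first1 = {'aAS': {'a'}, 'b': {'b'}, 'a': {'a'}, 'bSA': {'b'}}
    follow1 = {'S': {'e', 'a', 'b'}, 'A': {'a', 'b'}}
    print(is_ll1(alternates, first1, follow1))   # True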
These conditions are merely a restatement of Theorem 5.3. We should cau-
tion the reader that, appealing as it may seem, Theorem 5.3 does not gener-
alize directly. That is, let G be a CFG such that statement (5.1.1) holds:

(5.1.1)  If A → β and A → γ are distinct A-productions, then
         FIRSTk(β FOLLOWk(A)) ∩ FIRSTk(γ FOLLOWk(A)) = ∅.

Such a grammar is called a strong LL(k) grammar. Every LL(1) grammar is
strong. However, the next example shows that when k > 1 there are LL(k)
grammars that are not strong LL(k) grammars.

Example 5.8
Consider the grammar G defined by
S → aAaa | bAba
A → b | e

Using Theorem 5.2, we can verify that G is an LL(2) grammar. Consider
the derivation S =>lm aAaa. We observe that FIRST2(baa) ∩ FIRST2(aa) =
∅. Using the notation of Theorem 5.2 here, α = aa, β = b, and γ = e.
Likewise, if S =>lm bAba, then FIRST2(bba) ∩ FIRST2(ba) = ∅. Since all
derivations in G are of length 2, we have shown G to be LL(2), by Theorem
5.2. But FOLLOW2(A) = {aa, ba}, so
FIRST2(b FOLLOW2(A)) ∩ FIRST2(FOLLOW2(A)) = {ba},
violating (5.1.1). Hence G is not a strong LL(2) grammar.

One important consequence of the LL(k) definition is that a left-recursive


grammar cannot be LL(k) for any k (Exercise 5.1.1).

Example 5.9
Consider the grammar G with the two productions S ~ Sa[ b. From
i
Theorem 5.2, consider the derivation S ~ Sa t, i ~ 0 with A = S, ~ = e,
fl = Sa, and ~, = b. Then for i ~ k, FIRSTk(Saa t) A FIRSTk(ba ~) = ba k- ~.
Thus, G cannot be LL(k) for any k. [[]

It is also important to observe that every LL(k) grammar is unambiguous


(Exercise 5.1.3). Thus, if we are given an ambiguous grammar, we can imme-
diately conclude that it cannot be LL(k) for any k.
In Chapter 8 we shall see that many deterministic context-free languages
do not have an LL(k) grammar. For example, {a^n 0 b^n | n ≥ 1} ∪ {a^n 1 b^{2n} | n ≥ 1}
is such a language. Also, given a CFG G which is not LL(k) for a fixed k,
it is undecidable whether G has an equivalent LL(k) grammar. But in spite
of these obstacles, there are several situations in which various transforma-
tions can be applied to a grammar which is not LL(1) to change the given
grammar into an equivalent LL(1) grammar. We shall give two useful exam-
ples of such transformations here.
The first is the elimination of left recursion. We shall illustrate the tech-
nique with an example.

Example 5.10
Let G be the grammar S → Sa | b, which we saw in Example 5.9 was not
LL. We can replace these two productions by the three productions

S → bS'
S' → aS' | e

to obtain an equivalent grammar G'. Using Theorem 5.3, we can readily
show G' to be LL(1).
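The transformation illustrated here handles immediate left recursion. A small sketch (ours, not the book's) in Python, covering only the immediate case and using a primed name for the new nonterminal:

    # For A -> A a1 | ... | A am | b1 | ... | bn (immediate left recursion only),
    # produce A -> b1 A' | ... | bn A' and A' -> a1 A' | ... | am A' | e.
    # Right sides are tuples of symbols; the empty tuple stands for e.
    def remove_immediate_left_recursion(A, right_sides):
        recursive = [rhs[1:] for rhs in right_sides if rhs[:1] == (A,)]
        others    = [rhs      for rhs in right_sides if rhs[:1] != (A,)]
        if not recursive:
            return {A: right_sides}
        A1 = A + "'"                     # fresh nonterminal, assumed unused
        return {
            A:  [rhs + (A1,) for rhs in others],
            A1: [alpha + (A1,) for alpha in recursive] + [()],
        }

    # S -> Sa | b  becomes  S -> bS',  S' -> aS' | e
    print(remove_immediate_left_recursion('S', [('S', 'a'), ('b',)]))
    # {'S': [('b', "S'")], "S'": [('a', "S'"), ()]}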

Another useful transformation is left factoring. We again illustrate the


technique through an example.

Example 5.11
Consider the LL(2) grammar G with the two productions S---~ aS! a.
We can "factor" these two productions by writing them as S---~ a(S[ e).
That is, we assume that concatenation distributes over alternation (the
vertical bar). We can then replace these productions by

S > aA
A >Sic

to obtain an equivalent LL(1) grammar.

In general, the process of left factoring involves replacing the produc-
tions A → αβ1 | ··· | αβn by A → αA' and A' → β1 | ··· | βn.
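A one-step sketch of this replacement (ours, not the book's) in Python, again with right sides as tuples of symbols and a primed name for the new nonterminal:

    # One left-factoring step on the alternates of A.
    def common_prefix(seqs):
        prefix = []
        for symbols in zip(*seqs):
            if len(set(symbols)) != 1:
                break
            prefix.append(symbols[0])
        return tuple(prefix)

    def left_factor(A, right_sides):
        alpha = common_prefix(right_sides)
        if not alpha or len(right_sides) < 2:
            return {A: right_sides}          # nothing to factor
        A1 = A + "'"                          # fresh nonterminal, assumed unused
        return {
            A:  [alpha + (A1,)],
            A1: [rhs[len(alpha):] for rhs in right_sides],   # () plays the role of e
        }

    # S -> aS | a  becomes  S -> aA,  A -> S | e   (A printed here as S')
    print(left_factor('S', [('a', 'S'), ('a',)]))
    # {'S': [('a', "S'")], "S'": [('S',), ()]}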

5.1.4. Parsing LL(1) Grammars

The heart of a k-predictive parsing algorithm is its parsing table M.


In this section and the next we show that every LL(k) grammar G can be left-
parsed by a k-predictive parsing algorithm by showing how a valid parsing
table can be constructed from G. We shall first consider the important special
case where G is an LL(1) grammar.
ALGORITHM 5.1
A parsing table for an LL(1) grammar.
Input. An LL(1) CFG G = (N, Σ, P, S).
Output. M, a valid parsing table for G.
Method. We shall assume that $ is the bottom-of-the-pushdown-list
marker. M is defined on (N ∪ Σ ∪ {$}) × (Σ ∪ {e}) as follows:
(1) If A → α is the ith production in P, then M(A, a) = (α, i) for all a
in FIRST1(α), a ≠ e. If e is also in FIRST1(α), then M(A, b) = (α, i) for all
b in FOLLOW1(A).
(2) M(a, a) = pop for all a in Σ.
(3) M($, e) = accept.
(4) Otherwise M(X, a) = error, for X in N ∪ Σ ∪ {$} and a in Σ ∪ {e}.
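Steps (1)-(4) are easy to render as code. The following is a minimal sketch (ours, not the book's) in Python; FIRST1 of each right side and FOLLOW1 of each nonterminal are assumed precomputed and passed in as dictionaries, and the small sample grammar below uses only the first three productions of Example 5.12.

    # Sketch of Algorithm 5.1.  Productions are numbered from 1; 'e' denotes
    # the empty string/lookahead and '$' the bottom-of-list marker.
    def ll1_table(productions, terminals, first1, follow1):
        M = {}
        for i, (A, alpha) in enumerate(productions, start=1):     # step (1)
            for a in first1[alpha] - {'e'}:
                M[(A, a)] = (alpha, i)
            if 'e' in first1[alpha]:
                for b in follow1[A]:
                    M[(A, b)] = (alpha, i)
        for a in terminals:                                        # step (2)
            M[(a, a)] = 'pop'
        M[('$', 'e')] = 'accept'                                   # step (3)
        return M                # any entry not present is error (step (4))

    # First three productions of the grammar of Example 5.12:
    prods = [('E', "TE'"), ("E'", "+TE'"), ("E'", 'e')]
    first1 = {"TE'": {'(', 'a'}, "+TE'": {'+'}, 'e': {'e'}}
    follow1 = {'E': {'e', ')'}, "E'": {'e', ')'}}
    table = ll1_table(prods, {'a', '(', ')', '+', '*'}, first1, follow1)
    print(table[('E', 'a')], table[("E'", ')')])    # ("TE'", 1) ('e', 3)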
Before we prove that Algorithm 5.1 does produce a valid parsing table
for G, let us consider an example of Algorithm 5.1.

Example 5.12
Let us consider producing a parsing table for the grammar G with pro-
ductions
(1) E → TE'        (2) E' → +TE'
(3) E' → e         (4) T → FT'
(5) T' → *FT'      (6) T' → e
(7) F → (E)        (8) F → a
Using Theorem 5.3, the reader can verify that G is an LL(1) grammar.
In fact, the discerning reader will observe that G has been obtained from G0
using the transformation eliminating left recursion as in Example 5.10. G0
is not LL, by the way.
Let us now compute the entries for the E-row, using step (1) of Algorithm
5.1. Here FIRST1(TE') = {(, a}, so M(E, () = (TE', 1) and M(E, a) = (TE', 1).
All other entries in the E-row are error. Let us now compute the entries
for the E'-row. We note FIRST1(+TE') = +, so M(E', +) = (+TE', 2).
Since E' → e is a production, we must compute FOLLOW1(E') = {e, )}.
Thus, M(E', e) = M(E', )) = (e, 3). All other entries for E' are error. Continu-
ing in this fashion, we obtain the parsing table for G shown in Fig. 5.4.
Error entries have been left blank.

              a         (         )         +          *         e
    E       TE', 1    TE', 1
    E'                          e, 3     +TE', 2                e, 3
    T       FT', 4    FT', 4
    T'                          e, 6      e, 6     *FT', 5      e, 6
    F        a, 8     (E), 7
    a        pop
    (                  pop
    )                           pop
    +                                     pop
    *                                               pop
    $                                                          accept

              Fig. 5.4  Parsing table for G.

The 1-predictive parsing algorithm using this table would parse the input
string (a * a) in the following sequence of moves:

[(a * a), E$, e] ⊢ [(a * a), TE'$, 1]
                 ⊢ [(a * a), FT'E'$, 14]
                 ⊢ [(a * a), (E)T'E'$, 147]
                 ⊢ [a * a), E)T'E'$, 147]
                 ⊢ [a * a), TE')T'E'$, 1471]
                 ⊢ [a * a), FT'E')T'E'$, 14714]
                 ⊢ [a * a), aT'E')T'E'$, 147148]
                 ⊢ [* a), T'E')T'E'$, 147148]
                 ⊢ [* a), *FT'E')T'E'$, 1471485]
                 ⊢ [a), FT'E')T'E'$, 1471485]
                 ⊢ [a), aT'E')T'E'$, 14714858]
                 ⊢ [), T'E')T'E'$, 14714858]
                 ⊢ [), E')T'E'$, 147148586]
                 ⊢ [), )T'E'$, 1471485863]
                 ⊢ [e, T'E'$, 1471485863]
                 ⊢ [e, E'$, 14714858636]
                 ⊢ [e, $, 147148586363]

THEOREM 5.4
Algorithm 5.1 produces a valid parsing table for an LL(1) grammar G.
Proof. We first note that if G is an LL(1) grammar, then at most one
value is defined in step (1) of Algorithm 5.1 for each entry M(A, a) of the
parsing table. This observation is merely a restatement of Theorem 5.3.
Next, a straightforward induction on the number of moves executed by
a 1-predictive parsing algorithm 𝒜 using the parsing table M shows that if
(xy, S$, e) ⊢* (y, α$, π), then S =>*lm xα with left parse π. Another induction on the number
of steps in a leftmost derivation can be used to show the converse, namely
that if S =>*lm xα with left parse π, where α is the open portion of xα, and FIRST1(y) is in
FIRST1(α), then (xy, S$, e) ⊢* (y, α$, π). It then follows that (w, S$, e)
⊢* (e, $, π) if and only if S =>*lm w with left parse π. Thus 𝒜 is a valid parsing algorithm for
G, and M a valid parsing table for G.

5.1.5. Parsing LL(k) Grammars

Let us now consider the construction of a parsing table for an arbitrary


LL(k) grammar G = (N, Σ, P, S), where k > 1. If G is a strong LL(k) gram-
mar, then we can use Algorithm 5.1 with lookahead strings of length up to
k symbols. However, the situation is somewhat more complicated when G
is not a strong LL(k) grammar. In the LL(1)-predictive parsing algorithm
we placed only symbols in N ∪ Σ on the pushdown list, and we found that
the combination of the nonterminal symbol on top of the pushdown list
and the current input symbol was sufficient to uniquely determine the next
production to be applied. However, when G is not strong, we find that a non-
terminal symbol and the lookahead string are not always sufficient to uniquely
determine the next production.
For example, consider the LL(2) grammar
S → aAaa | bAba
A → b | e
of Example 5.8. Given the nonterminal A and the lookahead string ba, we do
not know whether we should apply production A → b or A → e.
We can, however, resolve uncertainties of this nature by associating with
each nonterminal and the portion of a left-sentential form which may appear
to its right, a special symbol which we shall call an LL(k) table (not to be
confused with the parsing table). The LL(k) table, given a lookahead string,
will uniquely specify which production is to be applied next in a leftmost
derivation in an LL(k) grammar.
DEFINITION

Let Σ be an alphabet. If L1 and L2 are subsets of Σ*, let

L1 ⊕k L2 = {w | for some x ∈ L1 and y ∈ L2, w = xy
if |xy| ≤ k, and w is the first k symbols of xy otherwise}.

Example 5.13
Let L1 = {e, abb} and L2 = {b, bab}. Then L1 ⊕2 L2 = {b, ba, ab}.

The ⊕k operator is similar to an infix FIRST operator.
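A one-line sketch of ⊕k (ours, not the book's) in Python, with Example 5.13 as a check; the empty string stands for e:

    # Concatenate every pair and truncate to the first k symbols.
    def oplus_k(L1, L2, k):
        return {(x + y)[:k] for x in L1 for y in L2}

    print(oplus_k({'', 'abb'}, {'b', 'bab'}, 2))   # {'b', 'ba', 'ab'}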


LEMMA 5.1
For any CFG G = (N, Σ, P, S), and for all α and β in (N ∪ Σ)*,
FIRSTk(αβ) = FIRSTk(α) ⊕k FIRSTk(β).

Proof. Exercise.

DEFINITION

Let G = (N, Σ, P, S) be a CFG. For each A in N and L ⊆ Σ*k we define
T_{A,L}, the LL(k) table associated with A and L, to be a function which, given
a lookahead string u in Σ*k, returns either the symbol error or an A-produc-
tion and a finite list of subsets of Σ*k.
Specifically,
(1) T_{A,L}(u) = error if there is no production A → α in P such that
FIRSTk(α) ⊕k L contains u.
(2) T_{A,L}(u) = (A → α, <Y1, Y2, ..., Ym>) if A → α is the unique pro-
duction in P such that FIRSTk(α) ⊕k L contains u. If

α = x0 B1 x1 B2 x2 ··· Bm xm,   m ≥ 0,

where each Bi ∈ N and each xi ∈ Σ*, then Yi = FIRSTk(xi B_{i+1} x_{i+1} ··· Bm xm) ⊕k L.
We shall call Yi a local follow set for Bi. [If m = 0, T_{A,L}(u) = (A → α, <>).]
(3) T_{A,L}(u) is undefined if there are two or more productions
A → α1 | α2 | ··· | αn, n ≥ 2, such that FIRSTk(αi) ⊕k L contains u for 1 ≤ i ≤ n.
This situation will not occur if G is an LL(k) grammar.

Intuitively, if T_{A,L}(u) = error, then there is no possible derivation
in G of the form Ax =>+ uv for any x ∈ L and v ∈ Σ*. Whenever
T_{A,L}(u) = (A → α, <Y1, Y2, ..., Ym>), there is exactly one production, A → α,
which can be used in the first step of a derivation Ax =>+ uv for any x ∈ L
and v ∈ Σ*. Each set of strings Yi gives all possible prefixes of length
up to k of terminal strings which can follow a string derived from Bi when
we use the production A → α, where α = x0 B1 x1 B2 x2 ··· Bm xm, in any
derivation of the form Ax =>lm αx =>*lm uv, with x in L.
lm
By Theorem 5.2, G = (N, Σ, P, S) is not LL(k) if and only if there exists
α in (N ∪ Σ)* such that

(1) S =>*lm wAα, and
(2) FIRSTk(βα) ∩ FIRSTk(γα) ≠ ∅ for some β ≠ γ such that A → β
and A → γ are in P.
By Lemma 5.1 we can rephrase condition (2) as
(2') If L = FIRSTk(α), then (FIRSTk(β) ⊕k L) ∩ (FIRSTk(γ) ⊕k L) ≠ ∅.

Therefore, if G is LL(k) and we have the derivation S =>*lm wAα =>*lm wx,
then T_{A,L}(u) will uniquely determine which production is to be used to expand
A, where u is FIRSTk(x) and L is FIRSTk(α).

Example 5.14
Consider the LL(2) grammar

S → aAaa | bAba
A → b | e

Let us compute the LL(2) table T_{S,{e}}, which we shall denote T0. Since
S → aAaa is a production, we compute FIRST2(aAaa) ⊕2 {e} = {aa, ab}.
Likewise, S → bAba is a production, and FIRST2(bAba) ⊕2 {e} = {bb}.
Thus we find T0(aa) = (S → aAaa, Y). Y is the local follow set for A;
Y = FIRST2(aa) ⊕2 {e} = {aa}. The string aa is the string to the right of
A in the production S → aAaa. Continuing in this fashion, we obtain the
table T0 shown below:

Table T0
  u     Production     Sets
  aa    S → aAaa       {aa}
  ab    S → aAaa       {aa}
  bb    S → bAba       {ba}

For each u in (a + b)*2 not shown, T0(u) = error.
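The definition of T_{A,L} can be computed mechanically once FIRSTk is available. The following is a rough sketch (ours, not the book's) in Python for the grammar above with k = 2; the per-symbol FIRST2 sets are simply assumed known here (Algorithm 5.5 in Section 5.1.6 computes them in general), and FIRST2 of a string is folded from them with ⊕2 as Lemma 5.1 permits.

    def oplus_k(L1, L2, k):
        return {(x + y)[:k] for x in L1 for y in L2}

    def first_k(alpha, symbol_first, k):
        # FIRSTk of a string, folded symbol by symbol (Lemma 5.1)
        result = {''}
        for X in alpha:
            result = oplus_k(result, symbol_first.get(X, {X}), k)
        return result

    def ll_k_table(A, L, productions, nonterminals, symbol_first, k):
        """Map each lookahead u to (production, list of local follow sets)."""
        table = {}
        for prod_A, alpha in productions:
            if prod_A != A:
                continue
            follow_sets = []
            for i, X in enumerate(alpha):
                if X in nonterminals:
                    follow_sets.append(
                        oplus_k(first_k(alpha[i + 1:], symbol_first, k), L, k))
            for u in oplus_k(first_k(alpha, symbol_first, k), L, k):
                table[u] = ((prod_A, alpha), follow_sets)  # unique if G is LL(k)
        return table

    # S -> aAaa | bAba, A -> b | e ; FIRST2(A) = {b, e}, FIRST2(S) = {ab, aa, bb}
    symbol_first = {'A': {'b', ''}, 'S': {'ab', 'aa', 'bb'}}
    prods = [('S', 'aAaa'), ('S', 'bAba'), ('A', 'b'), ('A', '')]
    T0 = ll_k_table('S', {''}, prods, {'S', 'A'}, symbol_first, 2)
    print(T0['aa'], T0['bb'])
    # (('S', 'aAaa'), [{'aa'}]) and (('S', 'bAba'), [{'ba'}]), as in Table T0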

We shall now provide an algorithm to compute those LL(k) tables for


an LL(k) grammar G which are needed to construct a parsing table for G.
It should be noted that if G is an LL(1) grammar, this algorithm might pro-
duce more than one table per nonterminal. However, the parsers constructed
by Algorithms 5.1 and 5.2 will be quite similar. They act the same way on
inputs in the language, of course. On other inputs, the parser of Algorithm
5.2 might detect the error while the parser of Algorithm 5.1 proceeds to make
a few more moves.
ALGORITHM 5.2
Construction of LL(k) tables.
Input. An LL(k) CFG G = (N, Σ, P, S).
Output. 𝒯, the set of LL(k) tables needed to construct a parsing table
for G.
Method.
(1) Construct T0, the LL(k) table associated with S and {e}.
(2) Initially set 𝒯 = {T0}.
(3) For each LL(k) table T in 𝒯 with entry

T(u) = (A → x0 B1 x1 B2 x2 ··· Bm xm, <Y1, Y2, ..., Ym>),

add to 𝒯 the LL(k) table T_{Bi,Yi}, for 1 ≤ i ≤ m, if T_{Bi,Yi} is not already in 𝒯.
(4) Repeat step (3) until no new LL(k) tables can be added to 𝒯.

Example 5.15
Let us construct the relevant set of LL(2) tables for the grammar

S → aAaa | bAba
A → b | e

We begin with 𝒯 = {T_{S,{e}}}. Since T_{S,{e}}(aa) = (S → aAaa, {aa}), we must
add T_{A,{aa}} to 𝒯. Likewise, since T0(bb) = (S → bAba, {ba}), we must also
add T_{A,{ba}} to 𝒯. The nonerror entries for the LL(2) tables T_{A,{aa}} and T_{A,{ba}}
are shown below:

Table T_{A,{aa}}
  u     Production     Sets
  ba    A → b
  aa    A → e

Table T_{A,{ba}}
  u     Production     Sets
  ba    A → e
  bb    A → b

At this point 𝒯 = {T_{S,{e}}, T_{A,{aa}}, T_{A,{ba}}} and no new entries can be added to
𝒯 in Algorithm 5.2, so the three LL(2) tables in 𝒯 are the relevant LL(2)
tables for G.

From the relevant set of LL(k) tables for an LL(k) grammar G we can
use the following algorithm to construct a valid parsing table for G. The k-
predictive parsing algorithm using this parsing table will actually use the
LL(k) tables themselves as nonterminal symbols on the pushdown list.
ALGORITHM 5.3
A parsing table for an LL(k) grammar G = (N, Σ, P, S).
Input. An LL(k) CFG G = (N, Σ, P, S) and 𝒯, the set of LL(k) tables
for G.
Output. M, a valid parsing table for G.
Method. M is defined on (𝒯 ∪ Σ ∪ {$}) × Σ*k as follows:
(1) If A → x0 B1 x1 B2 x2 ··· Bm xm is the ith production in P and T_{A,L} is
in 𝒯, then for all u such that

T_{A,L}(u) = (A → x0 B1 x1 B2 x2 ··· Bm xm, <Y1, Y2, ..., Ym>),

we have M(T_{A,L}, u) = (x0 T_{B1,Y1} x1 T_{B2,Y2} x2 ··· T_{Bm,Ym} xm, i).
(2) M(a, av) = pop for all v in Σ*(k-1).
(3) M($, e) = accept.
(4) Otherwise, M(X, u) = error.
(5) T_{S,{e}} is the initial table.
Example 5.16
Let us construct the parsing table for the LL(2) grammar

(1) S → aAaa
(2) S → bAba
(3) A → b
(4) A → e

using the relevant set of LL(2) tables constructed in Example 5.15. The pars-
ing table resulting from Algorithm 5.3 is shown in Fig. 5.5. In Fig. 5.5,
T0 = T_{S,{e}}, T1 = T_{A,{aa}}, and T2 = T_{A,{ba}}. Blank entries indicate error.

            aa          ab         a        ba        bb       b       e
    T0   aT1aa, 1    aT1aa, 1            bT2ba, 2
    T1     e, 4                            b, 3
    T2                                     e, 4      b, 3
    a      pop         pop       pop
    b                                      pop       pop      pop
    $                                                                accept

                    Fig. 5.5  Parsing table.

The 2-predictive parsing algorithm would make the following sequence
of moves with input bba:

(bba, T0$, e) ⊢ (bba, bT2ba$, 2)
              ⊢ (ba, T2ba$, 2)
              ⊢ (ba, ba$, 24)
              ⊢ (a, a$, 24)
              ⊢ (e, $, 24)
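The only changes relative to the 1-predictive driver sketched after Example 5.5 are that the LL(2) tables sit on the pushdown list and that two symbols of lookahead are read. A minimal sketch (ours, not the book's) in Python, with the table of Fig. 5.5 hard-coded and right sides written as tuples so that names like T1 stay intact:

    # Parsing table of Fig. 5.5 (nonterminal-table rows only; error omitted).
    M = {
        ('T0', 'aa'): (('a', 'T1', 'a', 'a'), 1),
        ('T0', 'ab'): (('a', 'T1', 'a', 'a'), 1),
        ('T0', 'bb'): (('b', 'T2', 'b', 'a'), 2),
        ('T1', 'ba'): (('b',), 3), ('T1', 'aa'): ((), 4),
        ('T2', 'bb'): (('b',), 3), ('T2', 'ba'): ((), 4),
    }

    def parse(w, k=2):
        stack, i, output = ['$', 'T0'], 0, []
        while True:
            top = stack[-1]
            u = w[i:i + k]                     # lookahead (may be short at the end)
            if top == '$':
                return output if u == '' else None   # accept / error
            if top.islower():                  # terminal on top: compare and pop
                if u[:1] != top:
                    return None
                stack.pop(); i += 1
            else:                              # an LL(2) table on top: expand
                if (top, u) not in M:
                    return None
                rhs, prod = M[(top, u)]
                stack.pop()
                stack.extend(reversed(rhs))
                output.append(prod)

    print(parse('bba'))   # [2, 4], i.e. the parse 24 obtained above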

THEOREM 5.5

If G = (N, Σ, P, S) is an LL(k) grammar, then the parsing table con-
structed in Algorithm 5.3 is a valid parsing table for G under a k-predictive
parsing algorithm.
Proof. The proof is similar to that of Theorem 5.4. If G is LL(k), then
no conflicts occur in the construction of the relevant LL(k) tables for G,
since if A → β and A → γ are in P and S =>*lm wAα, then

(FIRSTk(β) ⊕k FIRSTk(α)) ∩ (FIRSTk(γ) ⊕k FIRSTk(α)) = ∅.

In the construction of the relevant LL(k) tables for G, we compute a table
T_{A,L} only if for some S, w, and α we have S =>*lm wAα and L = FIRSTk(α). That
is, L will be a local follow set for A. Thus, if u is in Σ*k, then there is at most
one production A → β such that u is in FIRSTk(β) ⊕k L.
Let us define the homomorphism h on 𝒯 ∪ Σ as follows:
h(a) = a for all a ∈ Σ
h(T) = A if T is an LL(k) table associated with A and L for
some L.
Note that each table in 𝒯 must have at least one entry which is a production
index. Thus, A is uniquely determined by T.
We shall now prove that

(5.1.2) S =>*lm xα with left parse π if and only if there is some α' in (𝒯 ∪ Σ)* such that
h(α') = α and (xy, T0$, e) ⊢* (y, α'$, π) for all y such that
α =>* y. Here T0 is the LL(k) table associated with S and {e}.

If: From the manner in which the parsing table is constructed, whenever
a production number i is emitted corresponding to the ith production
A → β, the parsing algorithm replaces a table T such that h(T) = A by a string
β' such that h(β') = β. The "if" portion of statement (5.1.2) can thus be
proved by a straightforward induction on the number of moves made by
the parsing algorithm.
Only if: Here we shall show that

(5.1.3) If A =>*lm x with left parse π, then the parsing algorithm will make the sequence of
moves (xy, T, e) ⊢* (y, e, π) for any LL(k) table T associated with
A and L, where L = FIRSTk(α) for some α such that S =>*lm wAα,
and y is in L.

The proof will proceed by induction on |π|. If A → a1 a2 ··· an is the ith
production, then

(a1 a2 ··· an y, T, e) ⊢ (a1 ··· an y, a1 ··· an, i)

since T(u) = (A → a1 a2 ··· an, <>) for all u in FIRSTk(a1 a2 ··· an) ⊕k L.
Then (a1 ··· an y, a1 ··· an, i) ⊢* (y, e, i). Now suppose that statement
(5.1.3) is true for all leftmost derivations of length up to l, and suppose
that A =>lm x0 B1 x1 B2 x2 ··· Bm xm by production i and Bj =>*lm yj with left
parse πj, where |πj| ≤ l. Then
(x0 y1 x1 ··· ym xm y, T, e) ⊢ (x0 y1 x1 ··· ym xm y, x0 T1 x1 ··· Tm xm, i), since
T(u) = (A → x0 B1 x1 ··· Bm xm, <Y1, ..., Ym>) for all u included in
FIRSTk(x0 B1 x1 ··· Bm xm) ⊕k L. Each Tj is the LL(k) table associated with
Bj and Yj, 1 ≤ j ≤ m, so that the inductive hypothesis holds for each
sequence of moves of the form

(yj xj ··· ym xm y, Tj, e) ⊢* (xj ··· ym xm y, e, πj)

Putting in the popping moves for the xj's, we obtain

(x0 y1 x1 y2 x2 ··· ym xm y, T, e)
  ⊢ (x0 y1 x1 y2 x2 ··· ym xm y, x0 T1 x1 T2 x2 ··· Tm xm, i)
  ⊢* (y1 x1 y2 x2 ··· ym xm y, T1 x1 T2 x2 ··· Tm xm, i)
  ⊢* (x1 y2 x2 ··· ym xm y, x1 T2 x2 ··· Tm xm, iπ1)
  ⊢* (y2 x2 ··· ym xm y, T2 x2 ··· Tm xm, iπ1)
    ⋮
  ⊢* (y, e, iπ1 π2 ··· πm)

From statement (5.1.3) we have, as a special case, that if S =>*lm w with left
parse π, then (w, T0$, e) ⊢* (e, $, π).

As another example, let us construct the parsing table for the LL(2)
grammar G 2 of Example 5.3.

Example 5.17
Consider the LL(2) grammar G2:

(1) S → e
(2) S → abA
(3) A → Saa
(4) A → b

Let us first construct the relevant LL(2) tables for G2. We begin by con-
structing T0 = T_{S,{e}}:

Table T0
  u     Production     Sets
  e     S → e
  ab    S → abA        {e}

From T0 we obtain T1 = T_{A,{e}}:

Table T1
  u     Production     Sets
  b     A → b
  aa    A → Saa        {aa}
  ab    A → Saa        {aa}

From T1 we obtain T2 = T_{S,{aa}}:

Table T2
  u     Production     Sets
  aa    S → e
  ab    S → abA        {aa}

From T2 we obtain T3 = T_{A,{aa}}:

Table T3
  u     Production     Sets
  aa    A → Saa        {aa}
  ab    A → Saa        {aa}
  ba    A → b

From these LL(2) tables we obtain the parsing table shown in Fig. 5.6.
The 2-predictive parsing algorithm using this parsing table would parse
the input string abaa by the following sequence of moves:

(abaa, T0$, e) ⊢ (abaa, abT1$, 2)
               ⊢ (baa, bT1$, 2)
               ⊢ (aa, T1$, 2)
               ⊢ (aa, T2aa$, 23)
               ⊢ (aa, aa$, 231)
               ⊢ (a, a$, 231)
               ⊢ (e, $, 231)

            aa         ab         a        ba        bb        b        e
    T0               abT1, 2                                           e, 1
    T1   T2aa, 3    T2aa, 3                                   b, 4
    T2     e, 1     abT3, 2
    T3   T2aa, 3    T2aa, 3              b, 4
    a      pop        pop       pop
    b                                    pop       pop       pop
    $                                                                accept

                  Fig. 5.6  Parsing table for G2.

We conclude this section by showing that the k-predictive parsing
algorithm parses every input string in linear time.
THEOREM 5.6
The number of steps executed by a k-predictive parsing algorithm, using
the parsing table resulting from Algorithm 5.3 for an LL(k) context-free
grammar G = (N, Σ, P, S), with an input of length n, is a linear function of n.
Proof. If G is an LL(k) grammar, G cannot be left-recursive. From
Lemma 4.1, the maximum number of steps in a leftmost derivation of the form
A =>+lm Bα is less than some constant c. Thus the maximum number of moves
that can be made by a k-predictive parsing algorithm 𝒜 before a pop move,
which consumes another input symbol, is bounded above by c. Therefore,
𝒜 can execute at most O(n) moves in processing an input of length n.

5.1.6. Testing for the LL(k) Condition

Given a grammar G, there are several questions we might naturally ask


about G. First, one might ask whether G is LL(k) for a given value of k.
Second, is G an LL grammar? That is, does there exist some value of k
such that G is LL(k)? Finally, since the left parsers for LL(1) grammars are
particularly straightforward to construct, we might ask, if G is not LL(1),
whether there is an LL(1) grammar G' such that L(G')= L(G).
Unfortunately we can provide an algorithm to answer only the first
question. It can be shown that the second and third questions are undecidable.
In this section we shall provide a test to determine whether a grammar is
LL(k) for a specified value of k. If k = 1, we can use Theorem 5.3. For
arbitrary k, we can use Theorem 5.2. Here we shall give the general case.
It is essentially just showing that Algorithm 5.3 succeeds in producing
a parsing table only if G is LL(k).

Recall that G = (N, Σ, P, S) is not LL(k) if and only if for some α in
(N ∪ Σ)* the following conditions hold:

(1) S =>*lm wAα,
(2) L = FIRSTk(α), and
(3) (FIRSTk(β) ⊕k L) ∩ (FIRSTk(γ) ⊕k L) ≠ ∅
for some β ≠ γ such that A → β and A → γ are productions in P.
ALGORITHM 5.4
Test for LL(k)-ness.
Input. A CFG G = (N, Σ, P, S) and an integer k.
Output. "Yes" if G is LL(k); "no" otherwise.
Method.
(1) For each nonterminal A in N such that A has two or more alternates,
compute σ(A) = {L ⊆ Σ*k | S =>*lm wAα and L = FIRSTk(α)}. (We shall pro-
vide an algorithm to do this subsequently.)
(2) If A → β and A → γ are distinct A-productions, compute, for each
L in σ(A), f(L) = (FIRSTk(β) ⊕k L) ∩ (FIRSTk(γ) ⊕k L). If f(L) ≠ ∅, then
halt and return "no." If f(L) = ∅ for all L in σ(A), repeat step (2) for all
distinct pairs of A-productions.
(3) Repeat steps (1) and (2) for all nonterminals in N.
(4) Return "yes" if no violation of the LL(k) condition is found.

To implement Algorithm 5.4, we must first be able to compute FIRSTk(β)
for any β in (N ∪ Σ)* and CFG G = (N, Σ, P, S). Second, we must be able
to find the sets in σ(A) = {L ⊆ Σ*k | there exists α such that S =>*lm wAα and
L = FIRSTk(α)}. We shall now provide algorithms to compute both these
items.
ALGORITHM 5.5
Computation of FIRSTk(β).
Input. A CFG G = (N, Σ, P, S) and a string β = X1 X2 ··· Xn in
(N ∪ Σ)*.
Output. FIRSTk(β).
Method. We compute FIRSTk(Xi) for 1 ≤ i ≤ n and observe that by
Lemma 5.1

FIRSTk(β) = FIRSTk(X1) ⊕k FIRSTk(X2) ⊕k ··· ⊕k FIRSTk(Xn)

It will thus suffice to show how to find FIRSTk(X) when X is in N; if X is
in Σ ∪ {e}, then obviously FIRSTk(X) = {X}.
We define sets Fi(X) for all X in N ∪ Σ and for increasing values of i,
i ≥ 0, as follows:
(1) Fi(a) = {a} for all a in Σ and i ≥ 0.
(2) F0(A) = {x ∈ Σ*k | A → xα is in P, where either |x| = k or |x| < k
and α = e}.
(3) Suppose that F0, F1, ..., F_{i-1} have been defined for all A in N.
Then
Fi(A) = {x | A → Y1 ··· Yn is in P and x is in
F_{i-1}(Y1) ⊕k F_{i-1}(Y2) ⊕k ··· ⊕k F_{i-1}(Yn)} ∪ F_{i-1}(A).

(4) Since F_{i-1}(A) ⊆ Fi(A) ⊆ Σ*k for all A and i, eventually we must reach
an i for which F_{i-1}(A) = Fi(A) for all A in N. Let FIRSTk(A) = Fi(A) for
that value of i.

Example 5.18
Let us construct the sets Fi(X), assuming that k has the value 1, for
the grammar G with productions

S → BA
A → +BA | e
B → DC
C → *DC | e
D → (S) | a

Initially,
F0(S) = F0(B) = ∅
F0(A) = {+, e}
F0(C) = {*, e}
F0(D) = {(, a}
Then F1(B) = {(, a} and F1(X) = F0(X) for all other X. Then F2(S) = {(, a}
and F2(X) = F1(X) for all other X. F3(X) = F2(X) for all X, so that

FIRST(S) = FIRST(B) = FIRST(D) = {(, a}
FIRST(A) = {+, e}
FIRST(C) = {*, e}
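A direct transcription of Algorithm 5.5 for k = 1, run on this grammar, is sketched below (ours, not the book's). Right sides are tuples of grammar symbols; the empty tuple stands for e and the empty string for the lookahead e.

    def oplus_k(L1, L2, k):
        return {(x + y)[:k] for x in L1 for y in L2}

    def first_sets(productions, nonterminals, k=1):
        F = {A: set() for A in nonterminals}
        # step (2): F0(A) = {x : A -> x alpha, x terminal, |x| = k,
        #                    or |x| < k and alpha = e}
        for A, rhs in productions:
            n = 0
            while n < len(rhs) and rhs[n] not in nonterminals:
                n += 1
            x = ''.join(rhs[:n])
            if len(x) >= k or n == len(rhs):
                F[A].add(x[:k])
        changed = True
        while changed:              # steps (3)-(4): iterate until no set grows
            changed = False
            for A, rhs in productions:
                acc = {''}
                for Y in rhs:       # F(Y1) (+)k F(Y2) (+)k ... (+)k F(Yn)
                    acc = oplus_k(acc, F[Y] if Y in nonterminals else {Y}, k)
                if not acc <= F[A]:
                    F[A] |= acc
                    changed = True
        return F

    prods = [('S', ('B', 'A')), ('A', ('+', 'B', 'A')), ('A', ()),
             ('B', ('D', 'C')), ('C', ('*', 'D', 'C')), ('C', ()),
             ('D', ('(', 'S', ')')), ('D', ('a',))]
    print(first_sets(prods, {'S', 'A', 'B', 'C', 'D'}, k=1))
    # Up to set ordering: S, B, D -> {'(', 'a'}; A -> {'+', ''}; C -> {'*', ''}

The printed sets agree with the FIRST values computed by hand in Example 5.18.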

THEOREM 5.7
Algorithm 5.5 correctly computes FIRSTk(A).
Proof. We observe that if, for all X in N ∪ Σ, F_{i-1}(X) = Fi(X), then
Fi(X) = Fj(X) for all j > i and all X. Thus we must prove that x is in
FIRSTk(A) if and only if x is in Fj(A) for some j.
If: We show that Fj(A) ⊆ FIRSTk(A) by induction on j. The basis,
j = 0, is trivial. Let us consider a fixed value of j and assume that the
hypothesis is true for smaller values of j.
If x is in Fj(A), then either it is in F_{j-1}(A), in which case the result is
immediate, or we can find A → Y1 ··· Yn in P, with xp in F_{j-1}(Yp), where
1 ≤ p ≤ n, such that x = FIRSTk(x1 ··· xn). By the inductive hypothesis,
xp is in FIRSTk(Yp). Thus there exists, for each p, a derivation Yp =>* yp,
where xp is FIRSTk(yp). Hence, A =>* y1 ··· yn. We must now show that
x = FIRSTk(y1 ··· yn), and thus conclude that x is in FIRSTk(A).
Case 1: |x1 ··· xn| < k. Then yp = xp for each p, and x = y1 ··· yn.
Since y1 ··· yn is in FIRSTk(A) in this case, x is in FIRSTk(A).
Case 2: For some s ≥ 0, |x1 ··· xs| < k but |x1 ··· x_{s+1}| ≥ k. Then
yp = xp for 1 ≤ p ≤ s, and x is the first k symbols of x1 ··· x_{s+1}. Since
x_{s+1} is a prefix of y_{s+1}, x is a prefix of y1 ··· y_{s+1} and hence of y1 ··· yn.
Thus, x is in FIRSTk(A).

Only if: Let x be in FIRSTk(A). Then for some r, A =>r y and x =
FIRSTk(y). We show by induction on r that x is in Fr(A). The basis, r = 1,
is trivial, since x is in F0(A). (In fact, the hypothesis could have been tightened
somewhat, but there is no point in doing so.)
Fix r, and assume that the hypothesis is true for smaller r. Then

A => Y1 ··· Yn =>^{r-1} y,

where y = y1 ··· yn and Yp =>^{rp} yp for 1 ≤ p ≤ n. Evidently, rp < r. By the
inductive hypothesis, xp is in F_{r-1}(Yp), where xp = FIRSTk(yp). Thus,
FIRSTk(x1 ··· xn), which is x, is in Fr(A).

In the next algorithm we shall see a method of computing, for a given
grammar G = (N, Σ, P, S), those sets L ⊆ Σ*k such that S =>*lm wAα and
FIRSTk(α) = L for some w, A, and α.

ALGORITHM 5.6

Computation of σ(A).
Input. A CFG G = (N, Σ, P, S).
Output.

σ(A) = {L | L ⊆ Σ*k such that S =>*lm wAα and
FIRSTk(α) = L for some w and α}.

Method. We shall compute, for all A and B in N, sets σ(A, B) such that
σ(A, B) = {L | L ⊆ Σ*k, and for some x and α, A =>*lm xBα and L =
FIRSTk(α)}. We construct sets σi(A, B) for each A and B and for i = 0, 1, ...
as follows:
(1) Let σ0(A, B) = {L ⊆ Σ*k | A → βBα is in P and L = FIRSTk(α)}.
(2) Assume that σ_{i-1}(A, B) has been computed for all A and B. Define
σi(A, B) as follows:
(a) If L is in σ_{i-1}(A, B), place L in σi(A, B).
(b) If there is a production A → X1 ··· Xn in P, place L in σi(A, B)
if for some j, 1 ≤ j ≤ n, there is a set L' in σ_{i-1}(Xj, B) and L =
L' ⊕k FIRSTk(X_{j+1}) ⊕k ··· ⊕k FIRSTk(Xn).
(3) When for some i, σi(A, B) = σ_{i-1}(A, B) for all A and B, let σ(A, B) =
σi(A, B). Since for all i, σ_{i-1}(A, B) ⊆ σi(A, B) ⊆ 𝒫(Σ*k), such an i must
exist.
(4) The desired set is σ(S, A).

THEOREM 5.8
In Algorithm 5.6, L is in σ(S, A) if and only if for some w ∈ Σ* and
α ∈ (N ∪ Σ)*, S =>*lm wAα and L = FIRSTk(α).
Proof. The proof is similar to that of the previous theorem and is left
for the Exercises.

Example 5.19
Let us test the grammar G with productions

S → AS | e
A → aA | b

for the LL(1) condition.
We begin by computing FIRST1(S) = {e, a, b} and FIRST1(A) = {a, b}.
We then must compute σ(S) = σ(S, S) and σ(A) = σ(S, A). From step
(1) of Algorithm 5.6 we have

σ0(S, S) = {{e}}     σ0(S, A) = {{e, a, b}}
σ0(A, S) = ∅         σ0(A, A) = {{e}}

From step (2) we find no additions to these sets. For example, since S → AS
is a production and σ0(A, A) contains {e}, we must add to σ1(S, A) the set
L = {e} ⊕1 FIRST(S) = {e, a, b} by step (2b). But σ1(S, A) already contains
{e, a, b}, because σ0(S, A) contains this set.
Thus, σ(A) = {{e, a, b}} and σ(S) = {{e}}. To check that G is LL(1), we
have to verify that (FIRST(AS) ⊕1 {e}) ∩ (FIRST(e) ⊕1 {e}) = ∅. [This is
for the two S-productions and the lone member of σ(S, S).] Since
FIRST(AS) = FIRST(A) ⊕1 FIRST(S) = {a, b}
and FIRST(e) = {e}, we indeed verify that {a, b} ∩ {e} = ∅.
For the two A-productions, we must show that
(FIRST(aA) ⊕1 {e, a, b}) ∩ (FIRST(b) ⊕1 {e, a, b}) = ∅.
This relation reduces to {a} ∩ {b} = ∅, which is true. Thus, G is LL(1).

EXERCISES

5.1.1. Show that if G is left-recursive, then G is not an LL grammar.


5.1.2. Show that if G has two productions A → aα | aβ, where α ≠ β, then
G cannot be LL(1).
5.1.3. Show that every LL grammar is unambiguous.
5.1.4. Show that every grammar obeying statement (5.1.1) on page 343 is LL(k).
5.1.5. Show that the grammar with productions

S → aAaB | bAbB
A → a | ab
B → aB | a

is LL(3) but not LL(2).


5.1.6. Construct the LL(3) tables for the grammar in Exercise 5.1.5.
5.1.7. Construct a deterministic left parser for the grammar in Example 5.17.
5.1.8. Give an algorithm to compute FOLLOW~(A) for nonterminal A.
"5.1.9. Show that every regular set has an LL(I) grammar.
5.1.10. Show that G = (N, Σ, P, S) is an LL(1) grammar if and only if for
each set of A-productions A → α1 | α2 | ··· | αn the following condi-
tions hold:
(1) FIRST1(αi) ∩ FIRST1(αj) = ∅ for i ≠ j.
(2) If αi =>* e, then FIRST1(αj) ∩ FOLLOW1(A) = ∅ for 1 ≤ j ≤ n,
i ≠ j. Note that at most one αi can derive e.
*'5.1.11. Show that it is undecidable whether there exists an integer k such that
a CFG is LL(k). [In contrast, if we are given a fixed value for k, we
can determine if G is LL(k) for that particular value of k.]
"5.1.12. Show that it is undecidable whether a CFG generates an LL language.
"5.1.13. The definition of an LL(k) grammar is often stated in the following
manner. Let G - ( N , E, P, S) be a CFG. If S =~ wax, for w and x
362 ONE-PASS N O B A C K T R A C K PARSING CHAP. 5

in ~* and A ~ N, then for each y in Z *k there is at most one production


A ---~ a such that y is in FIRSTk(ax). Show that this definition is equi-
valent to the one given in Section 5.1.1.
5.1.14. Complete the proof of Theorem 5.4.
5.1.15. Prove Lemma 5.1.
5.1.16. Prove Theorem 5.8.
*'5.1.17. Show that if L is an LL(k) language, then L has an LL(k) grammar
in Chomsky normal form.
5.1.18. Show that an LL(0) language has at most one member.
"5.1.19. Show that the grammar G with productions S ~ aaSbbla[e is LL(2).
Find an equivalent LL(1) grammar for L(G).
**5.1.20. Show that the language {a^n 0 b^n | n ≥ 0} ∪ {a^n 1 b^{2n} | n ≥ 0} is not an LL
language.
Exercises 5.1.21-5.1.24 are done in Chapter 8. The reader may
wish to try his hand at them now.
**5.1.21. Show that it is decidable for two LL(k) grammars, G1 and G2, whether
L(G1) = L(G2).
*'5.1.22. Show that for all k > 0, there exist languages which are LL(k + 1)
but not LL(k).
*'5.1.23. Show that every LL(k) language has an LL(k + 1) grammar with no
e-productions.
*'5.1.24. Show that every LL(k) language has an LL(k + 1) grammar in Grei-
bach normal form.
"5.1.25. Suppose that A ~ 0eft lay are two productions in a grammar G such
that a does not derive e and fl and ~ begin with different symbols.
Show that G is not LL(1). Under what conditions will the replacement
of these productions by

A ------~ 0cA'
h ' ---> ]31~,

transform G into an equivalent LL(1) grammar 9.


5.1.26. Show that if G = (N, Σ, P, S) is an LL(k) grammar, then, for all
A ∈ N, G_A is LL(k), where G_A is the grammar obtained by removing
all useless productions and symbols from the grammar (N, Σ, P, A).
Analogous to the LL grammars there is a class of grammars, called
the LC grammars, which can be parsed in a left-corner manner with
a deterministic pushdown transducer scanning the input from left to
right. Intuitively, a grammar G = (N, Σ, P, S) is LC(k) if, knowing
the leftmost derivation S =>*lm wAδ, we can uniquely determine that the
production to replace A is A → X1 ··· Xp once we have seen the
portion of the input derived from X1 (the symbol X1 is the left corner)
and the next k input symbols. In the formal definition, should X1 be
a terminal, we may then look only at the next k - 1 symbols. This
restriction is made for the sake of simplicity in stating an interesting
theorem which will be Exercise 5.1.33. In Fig. 5.7, we would recognize
production A → X1 ··· Xp after seeing wx and the first k symbols
(k - 1 if X1 is in Σ) of y. Note that if G were LL(k), we could recognize
the production "sooner," specifically, once we had seen w and
FIRSTk(xy).

Fig. 5.7 Left-corner parsing.

We shall make use of the following type of derivation in the defini-
tion of an LC grammar.
DEFINITION
Let G be a CFG. We say that S =>*lc wAδ if S =>*lm wAδ and the
nonterminal A is not the left corner of the production which intro-
duced it into a left-sentential form of the sequence represented by
S =>*lm wAδ.

For example, in G0, E =>*lc E + T is false, since the E in E + T
arises from the left corner of the production E → E + T. On the other
hand, E =>*lc a + T is true, since T is not the left corner of the produc-
tion E → E + T, which introduced the symbol T in the sequence
E =>lm E + T =>lm a + T.
DEFINITION
A CFG G = (N, Σ, P, S) is LC(k) if the following conditions are
satisfied. Suppose that S =>*lc wAδ. Then for each lookahead string u
there is at most one production B → α such that A =>*lm Bγ and

(1) (a) If α = Cβ, C ∈ N, then u ∈ FIRSTk(βγδ), and
    (b) in addition, if C = A, then u is not in FIRSTk(δ);
(2) If α does not begin with a nonterminal, then u ∈ FIRSTk(αγδ).
Condition (1a) guarantees that the use of the production B → Cβ
can be uniquely determined once we have seen w, the terminal string
derived from C (the left corner), and FIRSTk(βγδ) (the lookahead
string).
Condition (1b) ensures that if the nonterminal A is left-recursive
(which is possible in an LC grammar), then we can tell after an instance
of A has been found whether that instance is the left corner of the
production B → Aγ or the A in the left-sentential form wAδ.
Condition (2) states that FIRSTk(αγδ) uniquely determines that the
production B → α is to be used next in a left-corner parse after having
seen w, when α does not begin with a nonterminal symbol. Note
that α might be e here.
For each LC(k) grammar G we can construct a deterministic left
corner parsing algorithm that parses an input string recognizing the left
corner of each production used bottom-up, and the remainder of the
production top-down.
Here, we shall outline how such a parser can be constructed for
LC(1) grammars. Let G = ( N , ~ , P , S ) be an L C ( 1 ) g r a m m a r .
F r o m G we shall construct a left corner parser ~ such that ~r(~) --
{(x, n)lx ~ L(G)and n; is a left-corner parse for x}. ~ uses an input
tape, a pushdown list and an output tape as does a k-prodictive parser.
The set of pushdown symbols is I" = N U ~ w (N x N) u {$}.
Initially, the pushdown list contains S$ (with S on top). A single
nonterminai or terminal symbol appearing on top of the pushdown list
can be interpreted as the current goal to be recognized. When a push-
down symbol is a pair of nonterminals of the form [A, B], we can think
of the first component A as the current goal to be recognized and
the second component B as a left comer which has just been recog-
nized.
For convenience, we shall construct a left-corner parsing table T
which is a mapping from F × (~ u {e}) to (F* × (P u {e}) u {pop,
accept, error}. This parsing table is similar to a 1-predictive parsing table
for an LL(1) grammar. A configuration of M will be a triple (w, X0~, zt),
where w represents the remaining input, X0c represents the pushdown
list with X e F on top, and 7t is the output to this point. If T(X, a) =
(fl, i), X in N u (N x N), then we write (aw, Xoc, rt) ~ (aw, floe, rti).
If T(a, a) = pop, then we write (aw, aoc, 70 ~ (w, oc, 70. We say that 7t
is a (left-corner) parse of x if (x, S$, e)I ~ (e, $, 70.
Let G = (N, E, P, S) be an LC(1) grammar. T is constructed from
G as follows:
(1) Suppose that B ----~ tz is the ith production in P.
(a) If 0c = Cfl, where C is a nonterminal, then T([A, C], a)
= (iliA, B], i) for all A ~ N and a ~ F I R S T i(fl)'~) such
EXERCISES 365

that S ~ wAO and A ~ B~,. Here, 12 recognizes left


lc
corners bottom-up. Note that A is either S or not the left
corner of some production, so at some point in the parsing
A will be a goal.
(b) If 0~ does not begin with a nonterminal, then T(A, a)
= (tz[A, B], i) for all A ~ N and a ~ FIRSTi(~}'~) such
that S ~ wAO and A =-> BT.
lc
(2) T([A, A], a ) = (e, e) for all A ~ N and a ~ FIRSTi(O) such
that S ~ wA~.
le
(3) T(a, a) = pop for all a E E.
(4) T($, e) = aecept.
(5) T(X, a) = error otherwise.

Example 5.20
Consider the following grammar G with productions
(1) S - - * S + A (2) S--~ A
(3) A --> A . B (4) A--> B
(5) B---> ( S ) (6) B ~ a
G is an LC(1) grammar. G is, in fact, Go slightly disguised. A left-
corner parsing table for G is shown in Fig. 5.8.
The parser using this left-corner parsing table would make the
following sequence of moves on input ~ a , a):

((a • a), S$, e) ~---((a • a), (S)[S, B]$, 5)


(a • a), ST[S, B]$, 5)
F- (a • a), a[S, B])[S, B]$, 56)
l-- (* a), [S, B])[S, B]$, 56)
[- (,a), IS, A])[S, B]$, 564)
}- (,a), • B[S, A])[S, B]$, 5643)
t- (a), B[S, A])[S, B]$, 5643)
(a), a[B, B][S, hi)IS, B]$, 56436)
~-- (), [B, B][S, A])[S, B]$, 56436)
(), [S, A])[S, B]$, 56436)
(), IS, S])[S, B]$, 564362)
1-- (),)[S, B]$, 564362)
1-- (e, [S, B]$, 564362)
[--- (e, [S, A]$, 5643624)
(e, IS, S]$, 56436242)
t- (e, $, 56436242)
366 ONE-PASS NO BACKTRACK PARSING CHAP. 5

Pushdown Input symbol


symbol a < > +
i'i
S a[S,B], 6 < S > [ S , B], 5
p . . . . . . . . . . . . . . .
, i~

A a[A,B] , 6 <S>[A,B] , 5
L. . . . . . . . . . . , ,, ~h

B a[B,B], 6 <S>[B,B], 5

[S,S] e,e +A[S, SI, l e,e

Is, A] [S, S], 2 [S, SI, 2 *B[S, A I , 3 [S, S], 2


• ,, ,,

[S, BI [S,A],4 [S,A],4 [S,A],4 [S,A],4


[A,AI e,e e,e .B[A,AI,3 e,e
. .

IA,B] [A,A],4 [A,A],4 Ih,a],4 [A,AI,4


, ,, ., ..

[B, BI e,e e,e e,e e,e


p . . . .

a pop
< pop
> pop

+[ pop

pop
sl accept

Fig. 5.8 Left-corner parsing table for G.

The reader can easily verify that 56436242 is the correct left corner
parse for (a • a).
"5.1.27. Show that the grammar with productions

S----+ A [ B
A ~ aAblO
B - > aBbb l 1

is not LC(k) for any k.


"5.1.28. Show that the following grammar is LC(1)"

E----->.E + T [ T
T--->" T . F [ F
F----> PI" F I P
P----~" (E) l a

5.1.29. Construct a left-corner parser for the grammar in Exercise 5.1.28.


EXERCISES 367

*'5.1.30. Provide an algorithm which will test whether an arbitrary grammar is


LC(1).
*'5.1.31. Show that every LL(k) grammar is LC (k).
5.1.32. Give an example of an LC(1) grammar which is not LL.
*'5.1.33. Show that ff Algorithm 2.14 is applied to a grammar G to put it in Grei-
bach normal form, then the resulting grammar is LL(k) if and only if
G is LC(k). Hence the class of LC languages is identical to the class of
LL languages.
"5.1.34. Provide an algorithm to construct a left-corner parser for an arbitrary
LC(k) grammar.

Research Problems
5.1.35. Find transformations which can be used to convert non-LL(k) grammars
into equivalent LL(1) grammars.

Programming Exercises
5.1.36. Write a program that takes as input an arbitrary C F G G and con-
structs a 1-predictive parsing table for G if G is LL(i).
5.1.37. Write a program that takes as input a parsing table and an input string
and parses the input string using the given parsing table.
5.1.38. Transform one of the grammars in the Appendix into an LL(1) grammar.
Then construct an LL(1) parser for that grammar.
5.1.39. Write a program that tests whether a grammar is LL(1).
Let M be a parsing table for an LL(1) grammar G. Suppose that we
are parsing an input string and the parser has reached the configuration
(ax, Xoc, ~z). If M(X, a ) = error, we would like to announce that an
error occurred at this input position and transfer to an error recovery
routine which modifies the contents of the pushdown list and input
tape so that parsing can proceed normally. Some possible error recovery
strategies are
(1) Delete a and try to continue parsing.
(2) Replace a by a symbol b such that M(X, b ) ~ error and con-
tinue parsing.
(3) Insert a symbol b in front of a on the input such that M(X, b)
error and continue parsing. This third technique should be used with
care since an infinite loop is easily possible.
(4) Scan forward on the input until some designated input symbol
b is found. Pop symbols from the pushdown list until a symbol X is
found such that X : ~ bfl for some ft. Then resume normal parsing.
We also might list for each pair (X, a) such that M(X, a) = error,
several possible error recovery methods with the most promising
method listed first. It is entirely possible that in some situations inser-
368 ONE-PASS NO BACKTRACK PARSING CHAP. 5

tion of a symbol may be the most reasonable course of action, while


in other cases deletion or change would be most likely to succeed.
5.1.40. Devise an error recovery algorithm for the LL(1) parser constructed
in Exercise 5.1.38.

BIBLIOGRAPHIC NOTES

LL(k) grammars were first defined by Lewis and Steams [1968]. In an early
version of that paper, these grammars were called TD(k) grammars, TD being an
acronym for top-down. Simple LL(1) grammars were first investigated by Koren-
jak and Hopcroft [1966], where they were called s-grammars.
The theory of LL(k) grammars was extensively developed by Rosenkrantz and
Steams [1970], and the answers to Exercises 5.1.21-5.1.24 can be found there.
LL(k) grammars and other versions of deterministic top-down grammars have
been considered by Knuth [1967], Kurki-Suonio [1969], Wood [1969a, 1970] and
Culik [1968].
P.M. Lewis, R.E. Stearns, and D.J. Rosenkrantz have designed compilers
for ALGOL and FORTRAN whose syntax analysis phase is based on an LL(1)
parser. Details of the ALGOL compiler are given by Lewis and Rosenkrantz
[1971]. This reference also contains an LL(1) grammar for ALGOL 60.
LC(k) grammars were first defined by Rosenkrantz and Lewis [1970]. Clues
to Exercises 5.1.27-5.1.34 can be found there.

5.2. DETERMINISTIC BOTTOM-UP PARSING

In the previous section, we saw a class of grammars which could be parsed


top-down deterministically, while scanning the input from left to right. There
is an analogous class of languages that can be parsed deterministically bot-
tom-up, using a left-to-right input scan. These are called LR grammars, and
their development closely parallels the development of the LL grammars in
the preceding section.

5.2.1. Deterministic Shift-Reduce Parsing

In Chapter 4 we indicated that bottom-up parsing can proceed in a shift-


reduce fashion employing two pushdown lists. Shift-reduce parsing consists
of shifting input symbols onto a pushdown list until a handle appears on
top of the pushdown list. The handle is then reduced. If no errors occur, this
process is repeated until all of the input string is scanned and only the sentence
symbol appears on the pushdown list. In Chapter 4 we provided a backtrack
algorithm that worked in essentially this fashion, normally making some
initially incorrect choices for some handles, but ultimately making the correct
choices. In this section we shall consider a large class of grammars for which
sEc. 5.2 DETERMINISTIC BOTTOM-UP PARSING 369

this type of parsing can always be done in a deterministic manner. These


are the LR(k) grammars the largest class of grammars which can be "natu-
rally" parsed bottom-up using a deterministic pushdown transducer. The L
stands for left-to-right scanning of the input, the R for producing a right
parse, and k for the number of input "lookahead" symbols.
We shall later consider various subclasses of LR(k) grammars, including
precedence grammars and bounded-right-context grammars.
Let ~x be a right-sentential form in some grammar, and suppose that
is either the empty string or ends with a nonterminal symbol. Then we shall
call ~ the open portion of 0~x and x the closed portion of 0cx. The boundary
between ~ and x is called the border. These definitions of open and closed
portion of a right-sentential form should not be confused with the previous
definitions of open and closed portion, which were for a left-sentential form.
A "shift-reduce" parsing algorithm can be considered a program for
an extended deterministic pushdown transducer which parses bottom-up.
Given an input string w, the DPDT simulates a rightmost derivation in
reverse. Suppose that

S ~ - oCo ------~ o~1------- ~ . . . ~. O~m = W


rm rm rm

is a rightmost derivation of w. Each right-sentential form at is stored by


the DPDT with the open portion of ~t on the pushdown list and the closed
portion as the unexpended input. For example, if 0~ -- o~Ax, then 0~A would
be on the pushdown list (with A on top) and x would be the as yet unseanned
portion of the original input string.
Suppose that 0ct_~--?Bz and that the production B - - . fly is used in
the step ~t-1 ~ ~, where yfl = o~A and y z = x. With 0~A on the pushdown
rm

list, the PDT will shift some number (possibly none) of the leading symbols
of x onto the pushdown list until the right end of the handle of o~t is found.
In this case, the string y is shifted onto the pushdown list.
Then the PDT must locate the left end of the handle. Once this has been
done, the PDT will replace the handle (here fly), which is on top of the push-
down list, b y the appropriate nonterminal (here B) and emit the number of
the production B----~ fly. The PDT now has yB on its pushdown list, and
the unexpended input is z. These strings are the open and closed portions,
respectively, of the right-sentential form ~_ i.
Note that the handle of ocAx can never lie entirely within ~, although it
could be wholly within x. That is, ~t-1 could be of the form o~AxlBx2, and
a production of the form B ---~y, where x l y x 2 = x could be applied to obtain
0~. Since Xl could be arbitrarily long, many shifts may occur before ~t can be
reduced to ~_ 1.
To sum up, there are three decisions which a shift-reduce parsing
370 ONE-PASS NO BACKTRACK PARSING CHAP. 5

algorithm must make. The first is to determine before each move whether to
shift an input symbol onto the pushdown list or to call for a reduction. This
decision is really the determination of where the right end of a handle occurs
in a right-sentential form.
The second and third decisions occur after the right end of a handle is
located. Once the handle is known to lie on top of the pushdown list, the left
end of the handle must be located within the pushdown list. Then, when
the handle has been thus isolated, we must find the appropriate nonterminal
by which it is to be replaced.
A grammar in which no two distinct productions have the same right
side is said to be uniquely invertible (UI) or, alternatively, backwards deter-
ministic. It is not difficult to show that every context-free language is gener-
ated by at least one uniquely invertible context-free grammar.
If a grammar is uniquely invertible, then once we have isolated the handle
of a right-sentential form, there is exactly one nonterminal by which it can
be replaced. However, many useful grammars are not uniquely invertible,
So in general we must have some mechanism for knowing with which non-
terminal to replace a handle.

Example 5.21
Let us consider the grammar G with the productions

(1) S ~ SaSb
(2) S ~e

Consider the rightmost derivation:

S ~ SaSb ~ SaSaSbb ~ SaSabb ~ Saabb ~ aabb

Let us parse the sentence aabb using a pushdown list and a shift-reduce
parsing algorithm. We shall use $ as an endmarker for both the input string
and the bottom of the pushdown list.
We shall describe the shift-reduce parsing algorithm in terms of configu-
rations consisting of triples of the form (~X, x, n), where
(1) ~X represents the contents of the pushdown list, with X on top;
(2) x is the unexpended input; and
(3) n is the output to this point.
We can picture this configuration as the configuration of an extended PDT
with the state omitted and the pushdown list preceeding the input. In
Section 5.3.1 we shall give a formal description of a shift-reduce parsing
algorithm.
Initially, the algorithm will be in configuration ($, aabb$, e). The algo-
SEC. 5.2 DETERMINISTIC BOTTOM-UP PARSING 371

rithm must then recognize that the handle of the right-sentential form aabb
is e, occurring at the left end, and that this handle is to be reduced to S.
We defer describing the actual mechanism whereby handle recognition
occurs. Thus the algorithm must next enter the configuration ($S, aabb$, 2).
It will then shift an input symbol on top of the pushdown list to enter
configuration ($Sa, abb$, 2). Then it will recognize that the handle e is
on top of the pushdown list and make a reduction to enter configuration
( $SaS, abb $, 22).
Continuing in this fashion, the algorithm would make the following
sequence of moves:

($, aabb $, e) ~ ($S, aabb $, 2)


k- ( $Sa, abb $, 2)
k- ( $SaS, abb $, 22)
F- ( $SaSa, bb $, 22)
k- ( $SaSaS, bb $, 222)
k- ( $SaSaSb, b $, 222)
?- ( $SaS, b $, 2221)
?- ( $SaSb, $, 2221)
?- ($S, $, 22211)
accept [~
5.2.2. LR(k) Grammars

In this section we shall define a large class of grammars for which we can
always construct deterministic right parsers. These grammars are the LR(k)
grammars.
Informally, we say that a grammar is LR(k) if given a rightmost derivation
S = ao - ~ al =~ a2 ~ . . . - ~ am = z, we can isolate the handle of each
rm rm rm rm
right-sentential form and determine which nonterminal is to replace the han-
dle by scanning ai from left to right, but only going at most k symbols past
the right end of the handle of ai.
Suppose that a,_~ = aAw and a~ = aflw, where fl is the handle of a~.
Suppose further that /3 = X1X2... X,.. If the grammar is LR(k), then we
can be sure of the following facts:
(1) Knowing aX~X~ ... Xj and the first k symbols of Xj+~... X,w, we
can be certain that the right end of the handle has not been reached until
j = r .

(2) Knowing aft and at most the first k symbols of w, we can always deter-
mine that fl is the handle and that fl is to be reduced to A.
372 ONE-PASSNO BACKTRACK PARSING CHAP. 5

(3) When ~_ ~ = S, we can signal with certainty that the input string is
to be accepted.
Note that in going through the sequence ~m, ~ - ~ , ' ' ' , ~0, we begin by
looking at only FIRSTk(~m)= FIRSTk(w ). At each step our lookahead
string will consist only o f k or fewer terminal symbols.
We Shall now define the term LR(k) grammar. But before we do so, we
first introduce the simple concept of an augmented grammar.

DEFINITION

Let G = (N, Z, P, S) be a CFG. We define the augmented grammar


derived from G as G' : (N U {S'}, ~, P U {S' ----~S}, S'). The augmented
grammar G' is merely G with a new starting production S ' - ~ S, where S' is
a new start symbol, not in N. We assume that S' ---~ S is the zeroth pro-
duction in G' and that the other productions of G are numbered 1, 2 . . . . , p.
We add the starting production so that when a reduction using the zeroth
production is called for, we can interpret this "reduction" as a signal to
accept.
We shall now give the precise definition of an LR(k) grammar.

DEFINITION

Let G = (N, E, P, S) be a C F G and let G ' = (N', ~, P', S') be its aug-
mented grammar. We say that G is LR(k), k ~ 0, if the three conditions

(1) S' ~ ocAw ~ ~pw,


G' rm G p rm

(2) S' ~ ~Bx ~ oq~y, and


G' rm G" r m

(3) FIRSTk(w ) = FIRSTk(Y)


imply that aAy = ~,Bx. (That is, ~ = ~,, A = B, and x = y.)
A grammar is LR if it is LR(k) for some k.
Intuitively this definition says that if aflw and afly are right-sentential
forms of the augmented grammar with FIRSTk(w ) = FIRSTk(y ) and if
A ---~ fl is the last production used to derive aflw in a rightmost derivation,
then A ~ fl must also be used to reduce afly to aAy in a right parse. Since
A can derive fl independently of w, the LR(k) condition says that there is
sufficient information in FIRSTk(w ) to determine that aft was derived from
aA. Thus there can never be any confusion about how to reduce any right-
sentential form of the augmented grammar. In addition, with an LR(k)
grammar we will always know whether we should accept the present input
string or continue parsing. If the start symbol does not appear on the right
side of any production, we can alternatively define an LR(k) grammar
G = (N, E, P, S) as one in which the three conditions
sEc. 5.2 DETERMINISTIC BOTTOM-UP PARSING 373

:g
(1) S =:~ ~aw ::~ ~pw,
rill I'm

(2) S =:~ ?Bx ~ ocpy, and


rlll 1"rrl.

(3) F I R S T k ( w ) = FIRSTk(y )
imply that otAy = yBx.
The reason we cannot always use this definition is that if the start symbol
appears on the right side of some production we may not be able to determine
whether we have reached the end of the input string and should accept or
whether we should continue parsing.

Example 5.22
Consider the grammar G with the productions

S >Sa[a

If we ignore the restriction against the start symbol appearing on the right
side of a production, i.e., use the alternative definition, G would be an LR(0)
grammar.
However, using the correct definition, G is not LR(0), since the three
conditions
0
(1) S'"---~ S"----'-~ S,
Gt rnl Gt rrll

(2) S' ~ S ~ Sa, and


G" r m G' r m
(3) FIRST0(e ) = FIRSTo(a ) = e
do not imply that S'a = S. Relating this situation to the definition we would
have ~ = e, , 8 = S, w = e , 7, = e, A = S', B = S, x = e, a n d y = a . The
problem here is that in the right-sentential form Sa of G' we cannot determine
whether S is the handle of Sa (i.e., whether to accept the input derived from
S) looking zero symbols past the S. Intuitively, G should not be an LR(0)
grammar and it is not, if we use the first definition. Throughout this book,
we shall use the first definition of LR(k)-ness. [~

In this section we show that for each LR(k) grammar G = (N, E, P, S)


we can construct a deterministic right parser which behaves in the following
manner.
First of all, the parser will be constructed from the augmented grammar
G'. The parser will behave very much like the shift-reduce parser introduced
in Example 5.21, except that the LR(k) parser will put special information
symbols, called LR(k) tables, on the pushdown list above each grammar
symbol on the pushdown list. These LR(k) tables will determine whether
a shift move or a reduce move is to be made and, in the case of a reduce
move, which production is to be used.
374 ONE-PASS NO BACKTRACK PARSING CHAP. 5

Perhaps the best way to describe the behavior of an LR(k) parser is via
a running example.
Let us consider the g r a m m a r G of Example 5.21, which we can verify is
an LR(1) g r a m m a r . The a u g m e n t e d g r a m m a r G' is

(0) S ' - - + S
(1) S - >SaSb
(2) S >e

A n LR(1) parser for G is displayed in Fig. 5.9.

Parsing action Goto


b e S a b

To 2 X 2 T1 X X

T1 S X A X T2 X

T2 2 2 X T3 X X

T3 S S X X T4 Ts

75 2 2 X r6 X X

T5 1 X 1 X X X

r6 S S X X T4 TT,

T7 1 X X X X

Legend
i -= reduce using production i
S ~ shift
A -= accept
X ~ error

Fig. 5.9 LR(t) parser for G.

An LR(k) parser for a C F G G is nothing m o r e than a set of rows in


a large table, where each row is called an " L R ( k ) table." One row, here To,
is distinguished as the initial LR(k) table. Each LR(k) table consists of two
f u n c t i o n s - - a parsing action function f a n d a goto function g:
(1) A parsing action function f takes a string u in E,k as a r g u m e n t (this
string is called the l o o k a h e a d string), and the value of f(u) is either shift,
reduce i, error, or accept.
SEC. 5.2 DETERMINISTIC BO'I~rOM-UP PARSING 375

(2) A goto function g takes a symbol X in N u ]E as argument and has


as value either the name of another LR(k) table or error.
Admittedly, we have not explained how to construct such a parser at this
point. The construction is delayed until Sections 5.2.3 and 5.2.4.
The LR parser behaves as a shift-reduce parsing algorithm, using a push-
down list, an input tape, and an output buffer. At the start, the pushdown
list contains the initial LR(k) table To and nothing else. The input tape con-
tains the word to be parsed, and the output buffer is initially empty. If we
assume that the input word to be parsed is aabb, then the parser would
initially be in configuration
(T0, aabb, e)

Parsing then proceeds by performing the following algorithm.


ALGORITHM 5.7
LR(k) parsing algorithm.
Input. A set 3 of LR(k) tables for an LR(k) grammar G = (N, ~, P, S),
with To ~ 3 designated as the initial table, and an input string z ~ ~*,
which is to be parsed.
Output. If z ~ L(G), the right parse of G. Otherwise, an error indication.
Method. Perform steps (1) and (2) until acceptance occurs or an error is
encountered. If acceptance occurs, the string in the output buffer is the right
parse of z.
(1) The lookahead string u, consisting of the next k input symbols, is
determined.
(2) The parsing action function f of the table on top of the pushdown
list is applied to the lookahead string u.
(a) If f ( u ) = shift, then the next input symbol, say a, is removed
from the input and shifted onto the pushdown list. The goto
function g of the table on top of the pushdown list is applied to a
to determine the new table to be placed on top of the pushdown
list. We then return to step (1). If there is no next input symbol or
g (a) is undefined, halt and declare error.
(b) If f(u) = reduce i and production i is A --~ ~, then 21~1 symbols?
are removed from the top of the pushdown list, and production
number i is placed in the output buffer. A new table T' is then
exposed as the top table of the pushdown list, and the goto func-
tion of T' is applied to A to determine the next table to be placed
?If 0~ = X ~ - . - Xr, at this point the top of the pushdown list will be of the form
ToX1T1XzTz... X, Tr. Removing 2 I~1 symbols removes the handle from the top of
the pushdown list along with any intervening LR tables.
376 ONE-PASS NO BACKTRACK PARSING CHAP. 5

on top of the pushdown list. We place A and this new table on


top of the pushdown list and return to step (1).
(c) If f ( u ) = error, we halt parsing (and, in practice, transfer to
an error recovery routine).
(d) If f ( u ) = accept, we halt and declare the string in the output
buffer to be the right parse of the original input string. D

Example 5.23
Let us apply Algorithm 5.7 to the initial configuration (To, aabb, e)
using the LR(1) tables of Fig. 5.9. The lookahead string here is a. The parsing
action function of To on a is reduce 2, where production 2 is S --~ e. By step
(2b), we are to remove 2[el = 0 symbols from the pushdown list and emit 2.
The table on top of the pushdown list after this process is still T 0. Since
the goto part of table To with argument S is T~, we then place STi on top of
the pushdown list to obtain the configuration (ToST~, aabb, 2).
Let us go through this cycle once more. The lookahead string is still a.
The parsing action of T~ on a is shift, so we remove a from the input and
place a on the pushdown list. The goto function of Tt on a is T2, so after this
step we have reached the configuration (ToSTlaT2, abb, 2).
Continuing in this fashion, the LR parser would make the following
sequence of moves:

(To, aabb, e) ~ (ToST1, aabb, 2)


}--- (ToSTiaT2, abb, 2)
(ToSTa aT2ST3, abb, 22)
1---(ToSTaaT2ST3aT4, bb, 22)
]--- (ToSTlaT2ST3aT4ST6, bb, 222)
(ToSTiaT2ST3aT4ST6bTT, b, 222)
}- (ToST~aT~ST3, b, 2221)
1---(ToST~aT2ST3bTs, e, 2221)
[-- (ToSTa, e, 22211)
Note that these steps are essentially the same as those of Example 5.21 and
that the LR(1) tables explain the way in which choices were made in that
example. D
In this section we shall develop the necessary algorithms to be able to
automatically construct an LR parser of this form for each LR grammar.
In fact, we shall see that a grammar G is LR(k) if and only if it has an LR(k)
parser. But first, let us return to the basic definition of an LR(k) grammar
and examine some of its consequences.
SEC. 5.2 DETERMINISTIC BOTTOM-UP PARSING 377

A proof that Algorithm 5.7 correctly parses an LR(k) grammar requires


considerable development of the theory of LR(k) grammars. Let us first
verify that our intuitive notions of what a deterministically right-parsable
grammar ought to be are in fact implied by the LR(k) definition. Suppose
that we are given a right-sentential form ~tflw of an augmented LR(k) gram-
mar such that ~Aw:::~. ~flw. We shall show that by scanning ~fl and
rm

FIRST~(w), there can be no confusion as to


(1) The location of the right end of the handle,
(2) The location of the left end of the handle, or
(3) What reduction to make once the handle has been isolated.
(1) Suppose that there were another right-sentential form ~fly such that
FIRSTk(y ) = FIRSTk(W) but that y can be written as Y lY2Y3, where B ~ Y2
is a production and t~flyiBy3 is a right-sentential form such that ¢tfly~By3
=::* ~flY~Y2Y3. This case is explicitly ruled out by the LR(k) definition. This
rm

becomes evident when we let x = Y3 and ~,B = ~flylB in that definition.


The right end of the handle might occur before the end of ft. That is, there
may be another right-sentential form ~,~Bvy such that B ----~~'2 is a production,
FIRSTk(y ) = FIRSTk(w ) and ~,lBvy ~ }'l~2vy tufty. This case is also ruled
=

rm

out if we let ~,B = 7,1B and x = vy in the LR(k) definition.


(2) Now suppose that we know where the right end of the handle of
a right-sentential form is but that there is confusion about its left end. That
is, suppose that ~Aw and tx'A'y are right-sentential forms such that
FIRSTk(W) = FIRSTk(y ) a n d ~Aw==~tzflw and t~'A'y=>~'fl'y= txfly.
rm rill

However, the L R ( k ) c o n d i t i o n stipulates that tzA = tz'A', so that both


fl = fl' and A = A'. Thus the left end of the handle is uniquely specified.
(3) There can be no confusion of type (3), since A = A' above. Thus
the nonterminal which is to replace the handle is always uniquely determined.
Let us now give some examples of LR and non-LR grammars.

Example 5.24

Let G~ be the right-linear grammar having the productions

S - - - ~ CID
C > aClb
D > aDlc

We shall show G~ to be LR(1).I"

t In fact, G~ is LR(0).
378 ONE-PASS NO BACKTRACK PARSING CHAP. 5

Every (rightmost) derivation in G'i (the augmented version of G~) is either


of the form
i
S' ~ S ~ C ~ aiC ~ aib for i ~ 0
or
i
S'-----~ S ~ D ~ aid ~ aic for i ~ 0

Let us refer to the LR(1) definition, and suppose that we have derivation
S' ~ tzAw =~, ocflw and S ' ==~ 7 B x =~ t~fly. Then since G'i is right-linear,
VIII rm rm r/n

we must have w = x = e. If F I R S T i ( w ) = FIRST~(y), then y = e also.


We must now show that txA = 7,B; i.e., ~ = 7' and A = B. Let B ---~ ,5 be
the production applied going from 3,Bx to ocfly. There are three cases to con-
sider.
Case 1: A = S ' (i.e., the derivation S ' = ~ o~Aw is trivial). Then tx = e
rill
and fl = S. By the form of derivations in G'i, there is only one way to derive
the right-sentential from S, so 7' = e and B = S', as was to be shown.
Case 2: A = C. Then fl is either a C or b. In the first case, we must have
B = C, for only C and S have a production which end in C. If B = S, then
3' = e by the form of derivations in G'i. Then 7B ~ ~fl. Thus we m a y con-
clude that B = C, 6 = aC, and 3' = ~. In the second case (fl b), we must
have B = C, because only C has a production ending in b. The conclusion
that 7' = tz and B A is again immediate.
Case 3" A ---- D. This case is symmetric to case 2.
N o t e that G 1 is not LL. D

Example 5.25
Let G 2 be the left-linear g r a m m a r with productions

S= > A b l Bc
A~ > Aale
B >Bale

Note that L(G2) = L(G1) for G1 above. However, G 2 is not LR(k) for any k.
Suppose that G2 is LR(k). Consider the two rightmost derivations in
the a u g m e n t e d g r a m m a r G~"

S'~----~ S ~ Aakb ~ akb


rm rm rm
and
S' ~ S ~ Bakc ~ akc
rm rm rm
sec. 5.2 DETERMINISTIC BOTTOM-UP PARSING 379

These two derivations satisfy the hypotheses of the LR(k) definition with
tz = e, fl = e, w = akb, y = e, and y = akc. Since A ~ B, G2 is not LR(k).
Moreover, this violation of the LR(k) condition holds for any k, so that
G2 i s n o t L R . 5

The grammar in Example 5.25 is not uniquely invertible, and although


we know where the handle is in any right-sentential form, we do not always
know whether to reduce the first handle, which is the empty string, to A or B
if we allow ourselves to scan only a finite number of terminal symbols
beyond the right end of the handle.

Example 5.26
A situation in which the location of a handle cannot be uniquely deter-
mined is found in the grammar G 3 with the productions

S-----~AB
A ----~ a
B ~ CDIaE
C .-----~. ab
D- >bb
E- > bba

G3 is not LR(1). We can see this by considering the two rightmost derivations
in the augmented grammar"

S' ~ S ~ AB ~ A CD ~ A Cbb ~ Aabbb

and
S' ~ S ~ AB ~ AaE ~ Aabba

In the right-sentential form A a b w we cannot determine whether the right end


of the handle occurs between b and w (when w = bb) or to the right of A a b
(when w = ha) if we know only the first symbol of w. Note that G3 is LR(2),
however. El

We can give an informal but appealing definition of an LR(k) grammar in


terms of its parse trees. We say that G is LR(k) if when examining a parse
tree for G, we know which production is used at any interior node after seeing
the frontier to the left of that node, what is derived from that node, and
the next k terminal symbols. For example in Fig. 5.10 we can determine with
certainty which production is used at node A by examining uv and
FIRSTk(W). In contrast, the LL(k) condition states that the production
which is used at A can be determined by examining u and F I R S T k ( V W ) .
380 ONE-PASS NO BACKTRACK PARSING CHAP. 5

u A w

1J Fig. 5.10 P a r s e tree.

In Example 5.25 we would argue that G 2 is not LR(k) because after seeing
the first k a's, we cannot determine whether production A ~ e or B ---~ e is
to be used to derive the empty string at the beginning of the input. We
cannot tell which production is used until we see the last input symbol, b
or c. In Chapter 8 we shall try to make rigorous arguments of this type, but
although the notion is intuitively appealing, it is rather difficult to formalize.

5.2.3 Implications of the LR(k) Definition

We shall now develop the theory necessary to construct LR(k) parsers.


DEFINITION

Suppose that S ~ ~ A w =~ ~flw is a rightmost derivation in grammar G.


rm rm

We say that a string ~, is a viable prefix of G if ~, is a prefix of ~fl. That is, ~,


is a string which is a prefix of some right-sentential form but which does not
extend past the right end of the handle of that right-sentential form.
The heart of the LR(k) parser is a set of tables. These are analogous to
the LL tables for LL grammars, which told us, given a lookahead string,
what production might be applied next. For an LR(k) grammar the tables
are associated with viable prefixes. The table associated with viable prefix ~,
will tell us, given a lookahead string consisting of the next k input symbols,
whether we have reached the right end of the handle. If so, it tells us what
the handle is and which production is to be used to reduce the handle.
Several problems arise. Since ~, can be arbitrarily long, it is not clear
that any finite set of tables will suffice. The LR(k) condition says that we
can uniquely determine the handle of a right-sentential form if we know
all of the right-sentential form in front of the handle as well as the next k
input symbols. Thus it is not obvious that we can always determine the
handle by knowing only a fixed amount of information about the string in
front of the handle. Moreover, if S ~ ~Aw ~ ~flw and the question "Can
rm rm

txflw be derived rightmost by a sequence of productions ending in production


p ?" can be answered reasonably, it may not be possible to calculate the tables
SEC. 5.2 DETERMINISTIC BOTTOM-UP PARSING 381

for aA from those for aft in a way that can be "implemented" on a pushdown
transducer (or possibly in any other convenient way). Thus we must consider
a table that includes enough information to computethe table corresponding
to aA from that for aft if it is decided that aAw ==~ apw for an appropriate w.
rm
We thus make the following definitions.
DEFINITION

Let G = (N, E, P, S) be a CFG. We say that [A ---~ fll • flz, u] is an LR(k)


item (for k and G, but we usually omit reference to these parameters when
they are understood) if A ---~ fl~fl2 is a production in P and u is in E *k. We
say that LR(k) item [A ----~fll • flz, u] is valid for afl~, a viable prefix of G,
if there is a derivation S =-~ a A w = , ocfllflzw such that u = FIRSTk(w ).
rnl rIi1

Note that fix may be e and that every viable prefix has at least one valid
LR(k) item.

Example 5.27
Consider grammar G1 of Example 5.24. Item [C ---~ a • C, e] is valid for
aaa, since there is a derivation S ~ aaC => aaaC. That is, ~ = aa and
rill rill
w = e in this example.

Note the similarity of our definition of item here to that found in the
description of Earley's algorithm. There is an interesting relation between
the two when Earley's algorithm is applied to an LR(k) grammar. See
Exercise 5.2.16.
The LR(k) items associated with the viable prefixes of a grammar are
the key to understanding how a deterministic right parser for an LR(k)
grammar works. In a sense we are primarily interested in LR(k) items of
the form [A ~ fl -, u], where the dot is at the right end of the production.
These items indicate which productions can be used to reduce right-sententia!
forms. The next definition and the following theorem are at the heart of
LR(k) parsing.
DEFINITION

We define the e-free first function, EFF~(00 as follows (we shall delete
the k and/or G when clear)"
(1) If a does not begin with a nonterminal, then EFFk(a) = FIRSTk(00.
(2) If a begins with a nonterminal, then
*

EFFk(a) = (w I there is a derivation a ~ fl ~ wx,


rlTl rm

where B ~ A w x for any nonterminal A), and w = FIRSTk(WX)

Thus, EFFk(a ) captures all members of FIRSTk(a ) whose derivation does


382 ONE-PASS NO BACKTRACK PARSING CHAP. 5

not involve replacing a leading nonterminal by e (equivalently, whose right-


most derivation does not use an e-production at the last step, when g begins
with a nonterminal).

Example 5.28
Consider the grammar G with the productions

S >AB
A >Bale
B~ > CbIC
C >cle
FIRST2(S) = {e, a, b, c, ab, ac, ba, ca, cb}
EFF2(S ) = {ca, cb} D

Recall that in Chapter 4 we considered a bottom-up parsing algorithm


which would ~not work on grammars having e-productions. For LR(k)
parsing we can permit e-productions in the grammar, but we must be careful
when we reduce the empty string to a nonterminal.
We shall see that using the E F F function we are able to correctly deter-
mine when the empty string is the handle to be reduced to a nonterminal.
First, however, we introduce a slight revision of the LR(k) definition. The
two derivations involved in that definition really play interchangeable roles,
and we can therefore assume without loss of generality that the handle of
the second derivation is at least as far right as that of the first.
LEMMA 5.2
If G = (N, ~, P, S') is an augmented grammar which is not LR(k), then
there exist derivations S' ~ gAw ~ t~j3w and S' ~ ?Bx ~ ?#x = g/3y,
rm rm rm rill

where FIRSTk(w ) = FIRSTk(y ) and l r,~t >_ Is,at but ?Bx ~ gay.
Proof. We know by the LR(k) definition that we can find derivations
satisfying all conditions, except possibly the condition ]76[ > I~Pl. Thus,
assume that Iral < I~Pl. We shall show that there is another counter-
example to the LR(k) condition, where ?~ plays the role of t~,a in that con-
dition.
Since we are given that ?6x = ~fly and i ral < I~/~ !, we find that for some
z in E+, we can write gfl = ?6z. Thus we have the derivations

S' ~ ?Bx ~ ?~x,


rill rill

and
S' ~ ~Aw ~ ~pw = ?d~zw
rill rill
SEC. 5.2 DETERMINISTIC BOTTOM-UP PARSING 383

Now z was defined so that x - - z y . Since FIRSTk(w ) = FIRSTk(y), it


follows that FIRSTk(x ) = FIRST~(zw). The LR(k) condition, if it held,
would say that ocAw = 7Bzw. We would have 7Bz = ~A and 7Bzy = o~Ay,
using operations of "cancellation" and concatenation, which preserve equal-
ity. But zy = x, so we have shown that e~Ay = 7Bx, which we originally
assumed to be false. If we relate the two derivations above to the LR(k)
condition, we see that they satisfy the conditions of the lemma, when the
proper substitutions of string names are made, of course.

LR(k) parsing techniques are based on the following theorem.


THEOREM 5.9
A grammar G = (N, 1~, P, S) is LR(k) if and only if the following condi-
tion holds for each u in ]~.k. Let ~fl be a viable prefix of a right-sentential
form ~]~w of the augmented grammar G'. If LR(k) item [A --~ f l . , u] is valid
for ~p, then there is no other LR(k) item [Ax ~ fl~ • P2, v] which is valid
for ~fl with u in EFFk(flzv ). (Note that f12 may be e.)
Proof.
Only if: Suppose that [A --~ p . , u] and [A~ ~ p~ • P2, v] are two dis-
tinct items valid for 0eft. That is to say, in the augmented grammar

S' ~ ocAw ~ ocpw with FIRSTk(w ) : u


rm rill

S' ~ 0~tA~x ~ ~P2x with FIRSTk(x ) = v


rill rm

and ~p = 0c~p~. Moreover, P2x ~ uy for some y in a (possibly zero step)


rm

derivation in which a leading nonterminal is never replaced by e.


We claim that G cannot be LR(k). To see this, we shall examine three
cases depending on whether (1) P2 = e, (2) P2 is in E+, or (3) ,02 has a non-
terminal.
Case 1: If P2 = e, then u = v, and the two derivations are

:o
S' ~ ocAw ~ ~pw
rill rill

and

S' ~ ~,AlX ---->. ~ p ~ x


rill rill

where FIRSTk(W) = FIRSTk(x ) = u ---- v. Since the two items are distinct,
either A ~ A~ or fl ~ fl~. In either case we have a violation of the LR(k)
definition.
384 ONE-PASS NO BACKTRACK PARSING CHAP. 5

Case 2: If f12 = z for some z in E +, then

S' ~ aAw ~ aflw


rill riD.

and

S' ~ a~A ~x ~ a~fl~zx


rill rm

where aft = a~fl~ and FIRSTk(zx ) = u. But then G is not LR(k), since
a A z x cannot be equal to a~Alx if z ~ E+.
Case 3: Suppose that f12 contains at least one nonterminal symbol. Then
f12 ==~ u~Bu3 ==~ u~u2u3, where u~u2 ~ e, since a leading nonterminal is not
rm rm

to be replaced by e in this derivation. Thus we would have two derivations

S' ~ aAw ~ aflw


rm rill

and

rm rill

alfllu~Bu3x ~ alfllulu2u3x
I'm rm

such that alfl~ = aft and uluzu3x = uy. The LR(k) definition requires that
OCAUlUzU3X = ~xfllulBu3x. That is, aAu~u2 = a~fl~u~B. Substituting aft for
a~fl~, we must have Aulu2 = flu~B. But since uluz ~ e, this is impossible.
Note that this is the place where the condition that u is in EFFk(fl2v ) is
required. If we had replaced E F F by F I R S T in the statement of the theorem,
then u~u2 could be e and aAu~u2u3x could then be equal to a~fl~u~Bu3x (if
ulu2 = e and fl = e).
If: Suppose that G is not LR(k). Then there are two derivations in the
augmented grammar

(5.2.1) S' ~ aAw ~ aflw


rm rm

and

(5.2.2) S' ~ ~Bx - - ~ 7~x = afly


rill rill

such that FIRSTk(w ) = FIRSTk(y ) = u, but a A y ~ ~,Bx. Moreover, we can


choose these derivations such that aft is as short as possible.
By Lemma 5.2, we may assume that ]~,~[ ~ [aft[. Let alA1yx be the last
SEC. 5.2 DETERMINISTIC BOTTOM-UP PARSING 385

right-sentential form in the derivation

S' ~ },Bx
rm

such that the length of its open portion is no more than lafll + 1. That is,
I~A~ I _<[~fll + 1. Then we can write (5.2.2) as

(5.2.3)
rm rm rm

where a,]31 = aft. By our choice of txlAly~, we have [ a , [ _ ~ ]afll _~ 1~'61.


Moreover, flzYl =o. y does not use a production B --~ e at the last step from
rm

our choice of a lA~yl. That is to say, if B ~ e were the last production


applied, then alA ~yl would not be the last right-sentential form in the deri-
vation S=> ?,Bx whose open portion is no longer than l aft[ + 1. Thus, u
rm

= FIRSTk(y) is in EFFk(fl2yl). We may conclude that [A1 ~ fll • f12, v] is


valid for aft where v = FIRST~(yi).
From derivation (5.2.1), [A--+ f l . , u] is also valid for aft, so that it
remains to show that A~ ~ ¢/1-¢/2 is not the same as A ~ / 3 - .
To show this, suppose that A1 ~ & . & is A ~ #-. Then derivation
(5.2.3) is of the form

S'---> alAy =--~ alfly


i'm i'm

where alfl = aft. Thus el = a and aAy = o~Bx, contrary to the hypothesis
that G is not LR(k). [~]

The construction of a deterministic right parser for an LR(k) grammar


requires knowing how to find all valid LR(k) items for each viable prefix of
a right-sentential form.
DEFINITION

Let G be a C F G and y a viable prefix of G. We define V~(~,) to be the set


of LR(k) items valid for ~, with respect to k and G. We again delete k and/or
G if understood. We define ,~ = { a l a -= V~(~') for some viable prefix ? of G}
as the collection of the sets of valid LR(k) items for G. ~ contains all sets of
LR(k) items which are valid for some viable prefix of G.
We shall next present an algorithm for constructing a set of LR(k) items
for any sentential form, followed by an algorithm to construct the collection
of the sets of valid items for any grammar G.
386 ONE-PASSNO BACKTRACK PARSING CHAP. 5

ALGORITHM 5.8

Construction of V~(?).
Input. C F G G = (N, X, P, S) and 7 in (N u X)*.
Output. V~(r).
Method. If 7 = X t X 2 " ' " X,, we construct V~(7)by constructing Vk(e),
v~(x~), v~(xxx~), . . ., v~(x,x~ ... x.).
(1) We construct Vk(e) as follows"
(a) If S ~ ~ is in P, add [S ----, • ~, e] to Vk(e).
(b) If [A --~ • B~, u] is in Ve(e) and B ~ fl is in P, then for each x
in FIRSTk(~u) add [ B - - , . ,8, x] to Vk(e), provided it is not
already there.
(c) Repeat step (b) until no more new items can be added to Vk(e).
(2) Suppose that we have constructed V,(X1X2... Xi_x), i ~ n. We
construct Vk(XtX2 "'" Xi) as follows"
(a) If [A---* ~ - X , fl, u] is in Vk(Xx... Xi_,), add [A--* ~Xt" fl, u]
to Vk(X, . . . X,).
(b) If [A --* ~ • Bfl, u] has been placed in Vk(Xi "'" X3 and B ---, J
is in P, then add [B---.. J, x] to V k ( X , . . . X3 for each x in
FIRSTk(flu), provided it is not already there.
(c) Repeat step (2b) until no more new items can be added to
v~(x, ... x,). El

DEFINITION

The repeated application of step (lb) or (2b) of Algorithm 5.8 to a set of


items is called taking the closure of that set.
We shall define a function GOTO on sets of items for a grammar G =
(N, Z, P, S). If a is a set of items such that ~ = V~(7), where 7 ~ (N U Z)*,
then GOTO(~, X) is that ~t' such that ~ t ' = V~(?X), where X ~ (N u Z).
In Algorithm 5.8 step (2) computes

V k ( X a X 2 . . . X~) = G O T O ( V k ( X , X 2 . . . X t _ ,), Xt).

Note that step (2) is really independent of X1 . " X~_ ,, depending only on the
set Vk(Xi ... Xt-,) itself.

Example 5.29
Let us construct Va(e), V,(S), and Vx(Sa) for the augmented grammar

S' >S
S > SaSb
S >e
SEC. 5.2 DETERMINISTIC BOTTOM-UP PARSING 387

(Note, however, that Algorithm 5.8 does not require that the grammar be
augmented.) We first compute V(e) using step 1 of Algorithm 5.8. In step
(la) we add [S' --, • S, e] to V(e). In step (lb) we add [S ----~ • SaSb, e] and
[S ~ . , e] to V(e). Since [S ----~ • SaSb, e] is now in V(e), we must also add
IS ---~ • SaSb, x] and [S ---~., x] to V(e) for all x in FIRST(aSb) = a. Thus,
V(e) contains the following items"

IS' ~ • S, e]
[S --->-. SaSb, e/a]
[S~ > ., e/a]

Here we have used the shorthand notation [A ~ ~x. ,8, x ~ / x 2 / " ' / x , ] for
the set of items [A ----~~ . ,8, x~], [A ---~ 0~ • fl, x 2 ] , . . . , [A ~ ~ . fl, x,]. To
obtain V(S), we compute GOTO(V(e), S). F r o m step (2a) we add the three
items [S' ---~ S . , e] and IS ---~ S . aSb, e/a] to V(S). Computing the closure
adds no new items to V(S), so V(S) is

IS' > S . , e]
[S ~ S . aSb, e/a]

V(Sa) is computed as GOTO(V(S), a). V(Sa) contains the following six


items"

[S > Sa • Sb, e/a]


[s > • SaSh, a/b]
[S = ~ ., a/b] [[]

We now show that Algorithm 5.8 correctly computes V~(y).


THrORnM 5.10
An LR(k) item is in V~(7) after step (2) of Algorithm 5.8 if and only if
that item is valid for ?.
Proof.
If: It is left to the reader to show that Algorithm 5.8 terminates and
correctly computes Vk(e). We shall show that if all and only the valid items
for X1 "-- Xt_l are in Vk(XxX2 . " Xt-1), then all and only the valid items
for X1 .." X~ are in V~(Xi . . . Xt).
Suppose that [A ~ fl~ • ,82, u] is valid for X1 .." X~. Then there exists
a derivation S ==~ o~Aw ~ ocfli[32w such that 0q81 = X~Xz . . . Xt and
rill rill

u = FIRSTk(w ). There are two cases to consider.


Suppose ,8, = ,8'~X~. Then [A ---~ ,8~ • X,82, u] is valid for X~ . . . Xt_~
308 ONE-PASS NO BACKTRACK PARSING CHAP. 5

and, by the inductive hypothesis, is in Vk(X~ . . . Xt_x). By step (2a) of


Algorithm 5.8, [A ---~ ]3'~X~ • ,132, u] is added to Vk(X~ . . . Xt).
Suppose that fl~ = e, in which case ~ = X~ . . . Xt. Since S => ~ A w is
rm
a rightmost derivation, there is an intermediate step in this derivation in
which the last symbol Xt of 0c is introduced. Thus we can write S ~ ogBy
rm

• 'TXfly =-~ ocAw, where ~'~, = X ~ . . . X~_~, and every step in the deri-
rill rill

vation ~'TXfly ~ ~ A w rewrites a nonterminal to the right of the explicitly


rm

shown Xt. Then [B ~ 7 . Xfl, v], where v = FIRSTk(y), is valid for


X~ . . . Xt_ ~, and by the inductive hypothesis is in Vk(X~ "'" X~_ ~). By step
(2a) of Algorithm 5.5, [B ~ 7X~. ~, v] is added to V k ( g ~ ' ' " Xt). Since
@=:,. Aw, we can find a sequence of nonterminals D~, D2,... , D m and strings
rm

0 2 , . . . , Om in (N W E)* such that $ begins with D~, A = Din, and produc-


tion D~ ----~Dr+ ~0~+~ is in P for 1 ~ i ~ m. By repeated application of step
(2b), [A ~ • f12, u] is added to V k ( X 1 . . . X~). The detail necessary to show
that u is a valid second component of items containing A ~ • f12 is left to
the reader.
Only if: Suppose that [A ----~fit • f12, u] is added to Vk(X~ . . . Xt). We
show by induction on the number of items previously added to Vk(Xt . . . Xt)
that this item is valid for X~ . . . X~.
The basis, zero items in Vk(X~ "'" Xt), is straightforward. In this case
[A --~ 1/~ • f12, u] must be placed in Vk(X~ "'" X~) in step (2a), so fl~ = fl'~X~
and [A ~ fl'~ • Xtfl~, u] is in Vk(X~ " " X~_ ~). Thus, S ~ txAw =-~ o~fl'~X~flzw
rm rill
and 0~fl'~ = X ~ . . . X~_~. Hence, [A --~ i l l . flz, u] is valid for X ~ . . . Xt.
For the inductive step, if [A ---~ fl~ • flz, u] is placed in Ve(X~ . . . X~) at
step (2a), the argument is the same as for the basis. If this item is added in
step (2b), then fl~ = e, and there is an item [B --~ 7 • A$, v] which has been
previously added to Vk(Xt "'" X~), with u in FIRST~($v). By the inductive
hypothesis [B---~ ~, • A$, v] is valid for X~ . . . X~, so there is a derivation
S ~ rE'By ==~ tx'TAJy, where 0¢'7 = X~ . . . X~. Then
rm rm

S =~ X , . . . X , A @ =~ X , . . . X, A z =~ X , . . . X, fl2z,
rm rm rm

where u : FiRST~(z). Hence [A ~ • ,8~, u] is valid for X~ . . . X i.

Algorithm 5.8 provides a method for constructing the set of LR(k) items
valid for any viable prefix. In the construction of a right parser for an LR(k)
grammar G we are interested in the sets of items which are valid for all viable
prefixes of G, namely the collection of the sets of valid items for G. Since
a grammar contains a finite number of productions, the number of sets of
SEC. 5.2 DETERMINISTIC BOTTOM-UP PARSING 389

items is also finite, but often very large. If ? is a viable prefix of a right
sententiai form ?w, then we shall see that Vk(?) contains all the information
about ? needed to continue parsing yw.
The following algorithm provides a systematic method for computing
the sets of LR(k) items for G.
ALGORITHM 5.9

Collection of sets of valid LR(k) items for G.


Input. CFG G = (N, X, P, S) and an integer k.
Output. S = [t~l(2 = Vk(?), and ? is a viable prefix of G}.
Method. Initially S is empty.
(1) Place Vk(e) in g. The set Vk(e) is initially "unmarked."
(2) If a set of items a in $ is unmarked, mark 6t by computing, for
each X in N U X, GOTO(~t, X). (Algorithm 5.8 can be used here.) If
e t ' = GOTO(a, X) is nonempty and is not already in $, then add a ' to $
as an unmarked set of items.
(3) Repeat step (2) until all sets of items in $ are marked. D

DEFINITION
If G is a CFG, then the collection of sets of valid LR(k) items for its
augmented grammar will be called the canonical collection of sets of LR(k)
items for G.
Note that it is never necessary to compute GOTO(~, S'), as this set of
items will always be empty.

Example 5.30
Let us compute the canonical collection of sets of LR(1) items for the
grammar G whose augmented grammar contains the productions

St = > S

S > SaSb
S >e

We begin by computing tg0 = V(e). (This was done in Example 5.29.)

(go: IS' > • S, e]


IS > • SaSb, e/a]
IS > . , e/a]

We then compute GOTO((g o, X) for all X ~ (S, a, b}. Let GOTO(ao, S)


be a~1.
300 ONE-PASS NO BACKTRACK PARSING CHAP. 5

121" [S' > S.,e]


[S > S . aSb, e/a]

GOTO(120, a) and GOTO(a0, b) are both empty, since neither a nor b are
viable prefixes of G. Next we must compute GOTO(al, X) for X ~ {S, a, b}.
GOTO(~i, S) and GOTO(~i, b) are empty and a; 2 = GOTO(12~, a) is

122" [S > Sa • Sb, e/a]


[S > • SaSb, a/b]
[S > . , a/b]

Continuing, we obtain the following sets of items"

t~3 : [S > S a S . b, e/a]


[S > S. aSb, a/b]
124: [S , S a . Sb, a/b]
[S > . SaSb, a/b]
[S > ., a/b]
125" [s ~ S a S h . , e/a]
126" [S > S a S . b, a/b]
[S > S . aSb, a/b]
127" [S > S a S b ., a/b]

The GOTO function is summarized in the following table"

Grammar Symbol
S a b

Set 120 121 u m

of 121 122 m
Items 122 123
123 124 t25
124 126
125
126 124 127
127

Note that GOTO(12, X) will always be empty if all items in 12 have the dot at
the right end of the production. Here, 125 and 127 are examples of such sets
of items.
The reader should note the similarity in the GOTO table above and
the GOTO function of the LR(1) parser for G in Fig. 5.9. [Z]
SEC. 5.2 DETERMINISTIC BOTTOM-UP PARSING 391

THEOREM 5.11
Algorithm 5.9 correctly determines ~.
Proof. By Theorem 5.10 it sumces to prove that a set of items (~ is placed
in S if and only if there exists a derivation S =* ocAw =~ 6¢flw, where 7 is
rm rm
a prefix of 0eft and 6 - - Vk(7). The "only if" portion is a straightforward
induction on the order in which the sets of items are placed in S. The "if"
portion is a no less straightforward induction on the length of 7. These are
both left for the Exercises. [-]

5.2.4. Testing for the LR(k) Condition

It may be of interest to know that a particular grammar is LR(k) for some


given value of k. We can provide an algorithm based on Theorem 5.9 and
Algorithm 5.9.
DEFINITION
Let G -- (N, Z, P, S) be a C F G and k an integer. A set 6 of LR(k) items
for G is said to be consistent if no two distinct members of 6 are of the form
[A --~ fl-, u] and [B ~ fll " f12, v], where u is in EFFk(flzv). flz may be e.
ALGORITHM 5.10
Test for LR(k)-ness.
Input. C F G G -- (N, X, P, S) and an integer k ~ 0.
Output. "Yes" if G is LR(k); "no" otherwise.
Method.
(1) Using Algorithm 5.9, compute g, the canonical collection of the sets
of LR(k) items for G.
(2) Examine each set of LR(k) items in $ and determine whether it is
consistent.
(3) If all sets in g are consistent, output Yes. Otherwise, declare G not to
be LR(k) for this particular value of k. [---]

The correctness of Algorithm 5.10 is merely a restatement of Theorem 5.9.

Example 5.31
Let us test the grammar in Example 5.30 for LR(l)-ness. We have
S = {60, • • •, 67}. The only sets of LR(1) items which need to be tested are
those that contain a dot at the right end of a production. These sets of items
are 60, 6 I, 6 z, 64, 65, and 67.
Let us consider 60. I n the items [S' --, • S, e] and [ S - - , • SaSb, e/a] in
6o, EFF(S) and EFF(Sa) are both empty, so no violation of consistency
with the items [S - - , . , e/a] occurs.
392 ONE-PASS NO BACKTRACK PARSING CHAP. 5

Let us consider a 1. Here EFF(aSb)= EFF(aSba)= a, but a is not


a lookahead string of the item [S'----~ S . , e]. Therefore, ~1 is consistent.
The sets of items ~2 and ~4 are consistent because EFF(Sbx)and
EFF(SaSbx) are both empty for all x. The sets of items ~5 and ~7 are clearly
consistent.
Thus all sets in S are consistent, so we have shown that G is LR(1). D

5.2.5. Deterministic Right Parsers for


LR(k) Grammars

In this section we shall informally describe how a deterministic extended


pushdown transducer with k symbol lookahead can be constructed from
an LR(k) grammar to act as a right parser for that grammar. We can view
the pushdown transducer described earlier as a shift-reduce parsing algorithm
which decides on the basis of its state, the top pushdown list entry, and
the lookahead string whether to make a shift or a reduction and, in the latter
use, what reduction to make.
To help make the decisions, the parser will have in every other pushdown
list cell an "LR(k) table," which summarizes the parsing information which
can be gleaned from a set of items. In particular, if ~ is a prefix of the push-
down string (top is on the right), then the table attached to the rightmost
symbol of ~ comes from the set of items Vk(00. The essence of the construc-
tion of the right parser, then, is finding the LR(k) table associated with
a set of items.
DEFINITION

Let G be a C F G and let S be a collection of sets of LR(k) items for G.


T(~), the LR(k) table associated with the set of items ~ in S, is a pair of
functions ( f , g~. f is called the parsing action function and g the goto
function.
(1) f maps E *k to {error, shift, accept} U {reduce i{ i is the number of
a production in P, i _~ 1}, where
(a) f ( u ) = shift if [A ~ i l l " f12, v] is in ~, f12 ~ e, and u is in
EFFk(fl2v).
(b) f(u) = reduce i if [A ~ f l . , u] is in ~ and A ---~ fl is production
iinP, i ~ 1.
(c) f(e) = accept if [S' --, S . , e] is in ~.
(d) f ( u ) = error otherwise.
(2) g, the goto function, determines the next applicable table. Some g
will be invoked immediately after each shift and reduction. Formally, g maps
N U E to the set of tables or the message error, g(X) is the table associated
with GOTO(~, X). If GOTO(~, X) is the empty set, then g(X)= error.
We should emphasize that by Theorem 5.9, if G is LR(k) and $ is the
sEc. 5.2 DETERMINISTIC BOTTOM-UP PARSING 393

canonical collection of sets of LR(k) items for G, then there can be no con-
flicts between actions specified by rules (la), (1 b), and (l c) above.
We say that the table T(~) is associated with a viable prefix ? of G if
a =

DEFINITION
The canonical set of LR(k) tables for an LR(k) grammar G is the pair
(~, To), where 5 the set of LR(k) tables associated with the canonical collec-
tion of sets of LR(k) items for G. T o is the LR(k) table associated with V~(e).
W e shall usually represent a canonical LR(k) parser as a table, of which
each row is an LR(k) table.
The LR(k) parsing algorithm given as Algorithm 5.7 using the canonical
set of LR(k) tables will be called the canonical LR(k)parsing algorithm or
canonical LR(k) parser, for short.
W e shall now summarize the process of constructing the canonical set of
LR(k) tables from an LR(k) grammar.
ALGORITHM 5.11

Construction of the canonical set of LR(k) tables from an LR(k) gram-


mar.
Input. An LR(k) grammar G = (N, Z, P, S).
Output. The canonical set of LR(k) tables for G.
Method.
(1) Construct the augmented grammar

G ' = (N U IS'}, Z, P U {S' --~ S}, S').

S ' - - , S is to be the zeroth production.


(2) From G' construct S, the canonical collection of sets of valid LR(k)
items for G.
(3) Let 3 be the set of LR(k) tables for G, where ~3= {T[T = T(a) for
some Ct ~ 8}. Let To = T(Cto), where Cto = V~(e). [~
Example 5.32
Let us construct the canonical set of LR(1) tables for the grammar G
whose augmented grammar is

(o) s'
(1) S > SaSb
(2) S >e

The canonical collection $ of sets of LR(1) items for G is given in Example


5.30. From S we shall construct the set of LR(k) tables.
394 ONE-PASSNO BACKTRACK PARSING CHAP. 5

Let us construct To = <f0, go>, the table associated with a~0. Since k = 1,
the possible lookahead strings are a, b, and e. Since ~o contains the items
[S ~ . , e/a], f o ( e ) = fo(a)= reduce 2. From the remaining items in ~to we
determine that fo(b)= error [since EFF(S00 is empty]. To compute the
GOTO function go, we note that GOTO(a~o, S ) = ~i and GOTO(~o, X) is
empty otherwise. If T1 is the name given to T(~i), then go(S)= Tx and
go(X) = e r r o r for all other X. We have now completed the computation of
To. We can represent To as follows"
fo go
b ,Sc a b

To 2 X 2 T1 X X

Here, 2 represents reduce using production 2, and X represents error.


Let us now compute the entries for T~ = (f~, g~). Since [S'--~ S . , e] is
in ~t, we have f~(e)= accept. Since IS ~ S . aSb, e/a] is in ~1, f t ( a ) =
shift. Note that the lookahead strings in this item have no relevance here.
Then, f j ( b ) = error. Since GOTO(t~, a ) = t~2, we let gx(a)= T2, where
T~ = T ( ~ ) .
Continuing in this fashion we obtain the set of LR(1) tables given in Fig.
5.9 on p. 374. [[]
In Chapter 7 we shall discuss a number of other methods for producing
LR(k) parsers from a grammar. These methods often produce parsers much
smaller than the canonical LR(k) parser. However, the canonical LR(k)
parser has several outstanding features, and these will be used as a yardstick
by which other LR(k) parsers will be evaluated. We will mention several
features concerning the behavior of the canonical LR(k) parsing algorithm"
(1) A simple induction on the number of moves made shows that each
table on the pushdown list is associated with the string of grammar symbols
to its left. Thus as soon as the first k input symbols of the remaining input
are such that no possible suffix could yield a sentence in L(G), the parser will
report error. At all times the string of grammar symbols on the pushdown list
must be a viable prefix of the grammar. Thus an LR(k) parser announces
error at the first possible opportunity in a left to right scan of the input
string.
(2) Let Tj = (~, gj). If f~(u)= shift and the parser is in configuration

(5.2.4) (ToXaTaXzT z . . . XiT ~, x, tO


then there is an item [B--, fl~ •/~z, v] which is valid for XaX2 . . . Xj, with u
,
in EFF(flzv). Thus by Theorem 5.9, if S' =~ XxX2 . . . Xjuy for some y in
rm

Y*, then the right end of the handle of Xa . . . X~uy must occur somewhere
to the right of X~.
sEc. 5.2 DETERMINISTIC B O T T O M - U P P A R S I N G 395

(3) If f j ( u ) = reduce i in configuration (5.2.4) and production i is


A ---~ YiY2 "'" Yr, then the string Xj_r+iX'j_,+z . . . X 1 on the pushdown list
in configuration (5.2.4) must be Y1 "'" Yr, since the set of items from which
table T~ is constructed contains the item [A ----~ Y1Y2"'" Y, ", u]. Thus in
a reduce move the symbols on top of the pushdown list do not need to be
examined. It is only necessary to pop 2r symbols from the pushdown list.
(4) If f / u ) = accept, then u = e. The pushdown list at this point is
ToST, where T is the LR(k) table associated with the set of items containing
[ S ' - . S ., el.
(5) A D P D T with an endmarker can be constructed to implement the
canonical LR(k) parsing algorithm. Once we realize that we can store
the lookahead string in the finite control of the DPDT, it should be evident
how an extended D P D T can be constructed to implement Algorithm 5.7,
the LR(k) parsing algorithm.
We leave the proofs of these observations for the Exercises. They are
essentially restatements of the definitions of valid item and LR(k) table. We
thus have the following theorem.
THEOREM 5.12
The canonical LR(k) parsing algorithm correctly produces a right parse
of its input if there is one, and declares "error" otherwise.
Proof. Based on the above observations, it follows immediately by induc-
tion on the number of moves made by the parsing algorithm that if e is the
string of grammar symbols on its pushdown list and x the unexpended input
including the lookahead string, then otx :=~ w, where w is the original input
string and ~ is the current output. As a special case, if it accepts w and
emits output nR, then S ==~ w. [Z]

A proof of the unambiguity of an LR grammar is a simple application


of the LR condition. Given two distinct rightmost derivations S ==~ el => .. •
rill rill

=~ e, ==~ w and S ==~ fl~ =~ . . . ==~ tim :=~ W, consider the smallest i such
rm rm rm rm rm
that OCn-t ~ flm-r A violation of the LR(k) definition for any k is immediate.
We leave details for the Exercises. It follows that the canonical LR(k) parsing
algorithm for an LR(k) grammar G produces a right parse for an input w
if and only if w ~ L(G).
It may not be completely obvious at first that the canonical LR(k) parser
operates in linear time, even when the elementary operations are taken to
be its own steps. That such is the case is the next theorem.
THEOREM 5.13

The number of steps executed by the canonical LR(k) parsing algorithm


in parsing an input of length n is 0(n).
396 ONE-PASS NO BACKTRACK PARSING CHAP. 5

P r o o f Let us define a C-configuration of the parser as follows"


(1) An initial configuration is a C-configuration.
(2) A configuration immediately after a shift move is a C-configuration.
(3) A configuration immediately after a reduction which makes the stack
shorter than in the previous C-configuration is a C-configuration.
In parsing an input of length n the parser can enter at most 2n C-configu-
rations. Let the characteristic of a C-configuration be the sum of the number
of grammar symbols on the pushdown list plus twice the number of remain-
ing input symbols. If C 1 and C z are successive C-configurations, then the
characteristic of C1 is at least one more than the characteristic of C2. Since
the characteristic of the initial configuration is 2n, the parser can enter at
most 2n C-configurations.
Now it suffices to show that there is a constant c such that the parser can
make at most c moves between successive C-configurations. To prove this
let us simulate the LR(k) parser by a D P D A which keeps the pushdown list
of the algorithm as its own pushdown list. By Theorem 2.22, if the D P D A
does not shift an input or reduce the size of its stack in a constant number of
moves, then it is in a loop. Hence, the parsing algorithm is also in a loop.
But we have observed that the parsing algorithm detects an error if there
is no succeeding input that completes a word in L(G). Thus there is some
word in L(G) with arbitrarily long rightmost derivations. The unambiguity
of LR(k) grammars is contradicted. We conclude that the parsing algorithm
enters no loops and that hence the constant c exists. D

5.2.6. Implementation of LL(k) and LR(k) Parsers

Both the LL(k) and LR(k) parser implementations seem to require plac-
ing large tables on the pushdown list. Actually, we can avoid this situation,
as follows:
(1) Make one copy of each possible table in memory. Then, on the push-
down list, replace the tables by pointers to the tables.
(2) Since both the LL(k) tables and LR(k) tables return the names of
other tables, we can use pointers to the tables instead of names.
We note that the grammar symbols are actually redundant on the push-
down list and in practice would not be written there.

EXERCISES

5.2.1. Determine which of the following grammars are LR(1):


(a) Go.
(b) S - - , AB, A --. OAlt e, B --~ 1B[ 1.
(c) S - - . OSI l A, A --~ 1All.
(d) S ----~ S + A IA, A ---~ (S) t a(S) l a.
EXERCISES 397

The last grammar generates parenthesized expressions with operator +


and with identifiers, denoted a, possibly singly subscripted.
5.2.2. Which of the grammars of Exercise 5.2.1 are LR(0)?
5.2.3. Construct the sets of LR(1) tables for those grammars of Exercise 5.2.1
which are LR(1). Do not forget to augment the grammars first.
5.2.4. Give the sequence of moves made by the LR(1) right parser for Go
with input (a + a) • (a + (a + a ) . a).
• 5.2.5. Prove or disprove each of the following:
(a) Every right-linear grammar is LL.
(b) Every right-linear grammar is LR.
(c) Every regular grammar is LL.
(d) Every regular grammar is LR.
(e) Every regular set has an LL(1) grammar.
(f) Every regular set has an LR(1) grammar.
(g) Every regular set has an LR(0) grammar.
5.2.6. Show that every LR grammar is unambiguous.
*5.2.7. Let G = (N, E, P, S), and define GR = (N, E, PR, S), where PR is P
with all right sides reversed. That is, PR = {A ~ 0~RIA ~ t~ is in P].
Give an example to show that GR need not be LR(k) even though G is.
*5.2.8. Let G = (N, E, P, S) be an arbitrary CFG. Define R~(i, u), for u in
E , e and production number i, to be {ocflulS ~ o~Aw ~ txflw, where
[m rlTl

u = FIRSTk(w) and production i is A ~ fl}. Show that R~(i, u) is


regular.
5.2.9. Give an alternate parsing algorithm for LR(k) grammars by keeping
track of the states of finite automata that recognize R~(i, u) for the
various i and u.
'5.2.10. Show that G is LR(k) if and only if for all ~, fl, u, and v, ~ in R~(i, u)
and t~fl in R~(j, v) implies fl = e and i = j.
'5.2.11. Show that G is LR(k) if and only if G is unambiguous and for all
w, x, y, and z in E*, the four conditions S ~ way, A ~ x, S ~ wxz,
and FIRSTk(y) = FIRSTk(Z) imply that S ~ wAz.
*'5.2.12. Show that it is undecidable whether a CFG is LR(k) for some k.
*'5.2.13. Show that it is undecidable whether an LR(k) grammar is an LL
grammar.
5.2.14. Show that it is decidable whether an LR(k) grammar is an LL(k) gram-
mar for the same value of k.
"5.2.15. Show that every e-free CFL is generated by some e-free uniquely
invertible CFG.
'5.2.16. Let G = (N, E, P, S) be a CFG grammar, and w = al, . . . an a string
in E". Suppose that when applying Earley's algorithm to G, we find
398 ONE-PASS NO BACKTRACK PARSING CHAP. 5

item [A--~ 0c-fl, j] (in the sense of Earley's algorithm) on list Ii.
.
Show that there is a derivation S ==~ y x such that item [A - ~ 0c • fl, u]
rm

(in the LR sense) is valid for y, u = FIRSTk(x), and 7 ~ a l . . . a~.


"5.2.17. Prove the converse of Exercise 5.2.16.

"5.2.18. Use Exercise 5.2.16 to show that if G is LR(k), then Earley's algorithm
with k symbol lookahead (see Exercise 4.2.17) takes linear time and
space.
5.2.19. Let X be any symbol. Show that EFFk(Xa) = EFF~(X) ~ FIRST~(a).
5.2.20. Use Exercise 5.2.19 to give an efficient algorithm to compute EFF(t~)
for any 0~.
5.2.21. Give formal details to show that cases 1 and 3 of Theorem 5.9 yield
violations of the LR(k) condition.
5.2.22. Complete the proof of Theorem 5.11.
5.2.23. Prove the correctness of Algorithm 5.10.
5.2.24. Prove the correctness of Algorithm 5.11 by showing each of the obser-
vations following that algorithm.
In Chapter 8 we shall prove various results regarding LR grammars.
The reader may wish to try his hand at some of them now (Exercises
5.2.25-5.2.28).
• *5.2.25. Show that every LL(k) grammar is an LR(k) grammar.
• *5.2.26. Show that every deterministic C F L has an LR(1) grammar.

• 5.2.27. Show that there exist grammars which are (deterministically) right-
parsable but are not LR.
• 5.2.28. Show that there exist languages which are LR but not LL.

• *5.2.29. Show that every LC(k) grammar is an LR(k) grammar.


• *5.2.30. What is the maximum number of sets of valid items an LR(k) grammar
can have as a function of the number of grammar symbols, productions,
and the length of the longest production ?
"5.2.31. Let us call an item essential if it has its dot other than at the left end
[i.e., it is added in step (2a) of Algorithm 5.8]. Show that other than
for the set of items associated with e, and for reductions of the empty
string, the definition of the LR(k) table associated with a set of items
could have restricted attention to essential items, with no change in the
table constructed.
• 5.2.32. Show that the action of an LR(1) table on symbol a is shift if and
only if a appears immediately to the right of a dot in some item in the
set from which the table is constructed.
SEC. 5.3 PRECEDENCE GRAMMARS 399

Programming Exercises
5.2.33, Write a program to test whether an arbitrary grammar is LR(1). Esti-
mate how much time and space your program will require as a function
of the size of the input grammar.
5.2.34. Write a program that uses an LR(1) parsing table as in Fig. 5.9 to parse
input strings.
5.2.35. Write a program that generates an LR(1) parser for an LR(1) grammar.
5.2.36. Construct an LR(1) parser for a small grammar.
*5.2.37. Write a program that tests whether an arbitrary set of LR(1) tables
forms a valid parser for a given CFG.
Suppose that an LR(1) parser is in the configuration (ocT, ax, 7t)
and that the parsing action associated with T and a is error. As in LL
parsing, at this point we would like to announce error and transfer to
an error recovery routine that modifies the input and/or the pushdown
list so that the LR(1) parser can continue. As in the LL case we can
delete the input symbol, change it, or insert another input symbol
depending on which strategy seems most promising for the situation
at hand.
Leinius [1970] describes a more elaborate strategy in which LR(1)
tabIes stored in the pushdown list are consulted.
5.2.38. Write an LR(1) grammar for a sma11 language. Devise an error recovery
procedure to be used in conjunction with an LR(i) parser for this
grammar. Evaluate the efficacy of your procedure.

BIBLIOGRAPHIC NOTES

LR(k) grammars were first defined by Knuth [1965]. Unfortunately, the method
given in this section for producing an LR parser will result in very large parsers
for grammars of practical interest. In Chapter 7 we shall investigate techniques
developed by De Remer [1969], Korenjak [1969], and Aho and Ullman [i971],
which can often be used to construct much smaller LR parsers.
The LR(k) concept has also been extended to context-sensitive grammars by
Walters [1970].
The answer to Exercises 5.2.8-5.2.10 are given by Hopcroft and Ullman [1969].
Exercise 5.2.12 is from Knuth [1965].

5.3. PRECEDENCE G R A M M A R S

The class of shift-reduce parsable g r a m m a r s includes the LR(k) g r a m m a r s


and various subsets of the class of LR(k) g r a m m a r s . In this section we shall
precisely define a shift-reduce parsing a l g o r i t h m a n d consider the class of
400 ONE=PASSNO BACKTRACK PARSING CHAP. 5
t

precedence grammars, an important class of grammars which can be parsed


by an easily implemented shift-reduce parsing algorithm.

5.3.1. Formal Shift-Reduce Parsing Algorithms

DEHNITION
Let G = (N, ~, P, S) be a C F G in which the productions have been
numbered from I to p. A shift-reduce parsing algorithm for G is a pair of
functions (2 = (f, g),tt where f is called the shift-reduce function and g the
reduce function. These functions are defined as follows:
(1) f maps V* x (~ U [$})* to {shift, reduce, error, accept], where
V = N U ~ u {$}, and $ is a new symbol, the endmarker.
(2) g maps V* x (E u [$})* to {1, 2 . . . . , p, error], under the constraint
that if g(a, w) = i, then the right side of production i is a suffix of a.
A shift-reduce parsing algorithm uses a left-to-right input scan and
a pushdown list. The function f decides on the basis of what is on the push-
down list and what remains on the input tape whether to shift the current
input symbol onto the pushdown list or call for a reduction. If a reduction
is called for, then the function g is invoked to decide what reduction to make.
We can view the action of a shift-reduce parsing algorithm in terms of
configurations which are triples of the form

($Xi "'" Xm, al " " a . $ , p l "'" P.)

where
(1) $X1 " " Xm represents the pushdown list, with Xm on top. Each X't
is in N W E, and $ acts as a bottom of the pushdown list marker.
(2) al . " a, is the remaining portion of the original input, a 1 is the cur-
rent input symbol, and $ acts as a right endmarker for the input.
(3) p l . . . Pr is the string of production numbers used to reduce the origi-
nal input to X~ . . . Xma~ " " a,.
We can describe the action of (2 by two relations, t--a-and !-¢-, on configu-
rations. (The subscript (2 will be dropped whenever possible.)
(1) If f ( a , aw) = shift, then (a, aw, r~)p- (aa, w, z~) for all a in V*, w in
(E U {$})*, and n in { 1 , . . . , p ] * .
(2) If f(afl, w) = reduce, g(o~fl, w) = i, and production i is A ~ fl, then
(aft, w, n:) ~ (aA, w, n:i).
(3) If f ( a , w) = accept, then (a, w, n ) ~ accept.
(4) Otherwise, (a, w, 70 ~ error.

•lThese functions are not the functions associated with an LR(k) table.
SEC. 5.3 PRECEDENCE GRAMMARS 401

We define 1- to be the union of ~ and ~--.. We then define ~ and ~ to


have their usual meanings.
We define ~(w) for w ~ ~* to be n if ($, w$, e) t----($S, $, re) 1--- accept,
and ~(w) = error if no such n exists.
We say that the shift-reduce parsing algorithm is valid for G if
(1) L(G) = [wl ~2(w) ~ error}, and
(2) If ~(w) = zc, then ~ is a right parse of w.

Example 5.33
Let us construct a shift-reduce parsing algorithm ~ = (f, g) for the
grammar G with productions

(i) S-~- > SaSb


(2) S re

The shift-reduce function f is specified as follows: For all ~ ~ V* and


x ~ (~: u [$})*,
(1) f(o~S, ex) = shift if c ~ {a, b}.
(2) f(ow, dx) = reduce if e ~ {a, b} and d ~ {a, b}.
(3) f ( $, ax) = reduce.
(4) f($, bx) = error.
(5) f(~zX, $ ) = error for X ~ IS, a}.
(6) f(~b, $) = reduce.
(7) f($S, $) = accept.
(8) f ( $ , $) = error.
The reduce function g is as follows. For all ~ ~ V* and x ~ (~ U [$})*,
(1) g($, ax) = 2
(2) g(oca,cx) = 2 for c ~ {a, b}.
(3) g($SaSb, cx)= 1 for c ~ {a, $}.
(4) g(ocaSaSb, cx)= 1 for c ~ {a, b}.
(5) Otherwise, g(~, x) = error.
Let us parse the input string aabb using ~. The parsing algorithm starts off
in the initial configuration

($, aabb$, e)

The first move is determined by f ( $ , aabb$), which we see, from the speci-
fication o f f , is reduce. To determine the reduction we consult g($, aabb$),
which we find is 2. Thus the first move is the reduction

($, aabb$, e) ~ ($S, aabb$, 2)


402 ONE-PASS NO BACKTRACK PARSING CrIAP. 5

The next move is determined by f($S, aabb$), which is shift. Thus the next
move is
($S, aabb$, 2) ~ ($Sa, abb$, 2)

Continuing in this fashion, the shift-reduce parsing algorithm e would make


the following sequence of moves:

($, aabb$, e) ~ ($S, aabb$, 2)


~- ( $Sa, abb $, 2)
~-- ($SaS, abb$, 22)
($SaSa, bb$, 22)
($SaSaS, bb $, 222)
.L ($SaSaSb, b$, 222)
_r_ ($SaS, b$, 2221)
($SaSb, $, 2221)
.z_ ($S, $, 22211)
accept

Thus, e(aabb) = 22211. Clearly, 22211 is the right parse of aabb. D

In practice, we do not want to look at the entire string in the pushdown


list and all of the remaining input string to determine what the next move
of a parsing algorithm should be. Usually, we want the shift-reduce function
to depend only on the top few symbols on the pushdown list and the next
few input symbols. Likewise, we would like the reduce function to depend
on only one or two symbols below the left end of the handle and on only
one or two of the next input symbols.
In the previous example, in fact, we notice that f depends only on the
symbol on top of the pushdown list and the next input symbol. The reduce
function g depends only on one symbol below the handle and the next input
symbol.
The LR(k) parser that we constructed in the previous section can be
viewed as a "shift-reduce parsing algorithm" in which the pushdown alphabet
is augmented with LR(k) tables. Treating the LR(k) parser as a shift-reduce
parsing algorithm, the shift-reduce function depends only on the symbol
on top of the pushdown list [the current LR(k) table] and the next k input
symbols. The reduce function depends on the table immediately below the
handle on the pushdown list and on zero input symbols. However, in our
present formalism, we may have to look at the entire contents of the stack
in order to determine what the top table is. The algorithms to be discussed
SEC. 5.3 PRECEDENCE GRAMMARS 403

subsequently in this chapter will need only information near the top of the
stack. We therefore adopt the following convention.
CONVENTION
I f f and g are functions of a shift-reduce parsing algorithm and f(0~, w)
is defined, then we assume that f(floc, wx) = f(oc, w) for all fl and x, unless
otherwise stated. The analogous statement applies to g.
5.3.2, Simple Precedence Grammars

The simplest class of shift-reduce algorithms are based on "precedence


relations." In a precedence grammar the boundaries of the handle of a right-
sentential form can be located by consulting certain (precedence) relations
that hold among symbols appearing in right-sentential forms.
Precedence-oriented parsing techniques were among the first techniques
to be used in the construction of parsers for programming languages and
a number of variants of precedence grammars have appeared in the
literature. We shall discuss left-to-right deterministic precedence parsing in
which a right parse is to be produced. In this discussion we shall introduce
the following types of precedence grammars:
(1) Simple precedence.
(2) Extended precedence.
(3) Weak precedence.
(4) Mixed strategy precedence.
(5) Operator precedence.
The key to precedence parsing is the definition of a precedence relation
3> between grammar symbols such that scanning from left to right a right-
sentential form ocflw, of which fl is the handle, the precedence relation 3> is
first found to hold between the last symbol of fl and the first symbol of w.
If we use a shift-reduce parsing algorithm, then the decision to reduce
will occur whenever the precedence relation 3> holds between what is on top
of the pushdown list and the first remaining input symbol. If the relation 3>
does not hold, then a shift may be called for.
Thus the relation .> is used to locate the right end of a handle in a right-
sentential form. Location of the left end of the handle and determination of
the exact reduction to be made is done in one of several ways, depending on
the type of precedence being used.
The so-called "simple precedence" parsing technique uses three prece-
dence relations 4 , -~-, and 3> to isolate the handle in a right-sentential form
ocflw. If fl is the handle, then the relation ~ or ~-- is to hold between all pairs
of symbols in 0~, <~ is to hold between the last symbol of 0¢ and the first
symbol of fl, ~ is to hold between all pairs of symbols in the handle itself,
and the relation 3> is to hold between the last symbol of fl and the first
symbol of w.
404 ONE=PASS NO BACKTRACK PARSING CHAP. 5

Thus the handle of a right-sentential form of a simple precedence grammar


can be located by scanning the sentential form from left to right until the
precedence relation > is first encountered. The left end of the handle is
located by scanning backward until the precedence relation .~ holds. The
handle is the string between < and 3>. If we assume that the grammar is
uniquely invertible, then the handle can be uniquely reduced. This process
can be repeated until the input string is either reduced to the sentence symbol
or no further reductions are possible.
DEHNITION
The Wirth-Weber precedence relations <~, ~-, and 3> for a C F G G =
(N, ~, P, S) are defined on N u ~ as follows:
(t) We say that X . ~ Y if there exists A ---~ ~XBfl in P such that B ~ YT'.
(2) We say that X-~- Y if there exists A --~ ~XYfl in P.
(3) 3> is defined on (N U E ) x ~E, since the symbol immediately to
the right of a handle in a right-sentential form is always a terminal. We say
that X 3> a if A ---~ ~BYfl is in P, B ~ 7,X, and Y *~ a~. Notice that Y
will be a in the case Y ~ ad~.
In precedence parsing procedures we shall find it convenient to add a left
and right endmarker to the input string. We shall use $ as this endmarker
+
and we assume that $ .~ X for all X such that S ==~ X~ and Y 3> $ for all
Y such that S ~ ~ Y.
The calculation of Wirth-Weber precedence relations is not hard. We
leave it to the reader to devise an algorithm, or he may use the algorithm to
calculate extended precedence relations given in Section 5.3.3.
DEFINITION
A C F G G = (N, E, P, S) which is proper, t which has no e-productions,
and in which at most one Wirth-Weber precedence relation exists between
any pair of symbols in N u ~ is called a precedence grammar. A prece-
dence grammar which is uniquely invertible is called a simple precedence
grammar.
By our usual convention, we define the language generated by a (simple)
precedence grammar to be a (simple) precedence language.

Example 5.34
Let G have the productions
S > aSSblc

?Recall that a CFG G is proper if there is no derivation of the form A ~ A, if G has


no useless symbols, and if there are no e-productions except possibly S ~ e in which
case S does not appear on the right side of any production.
SEC. 5.3 PRECEDENCE GRAMMARS 405

The precedence relations for G, together with the added precedence relations
involving the endmarkers, are shown in the precedence matrix of Fig. 5.11.
Each entry gives the precedence relations that hold between the symbol
labeling the row and the symbol labeling the column. Blank entries are
interpreted as error.

a b c

• <. • <.

<. ~ <.

•> .> .> .>

•> .> .> .>

<. <.
Fig. 5.11 Precedence relations.

The following technique is a systematic approach to the construction of


the precedence relations. First, " is easy to compute. We scan the right
sides of the productions and find that a " S, S " S, and S " b.
To compute <Z, we scan the right sides of the productions for adjacent
pairs XC. Then X is related by ~z to any leftmost symbol of a string derived
nontrivially from C. (We leave it to the reader to give an algorithm to find
all such symbols.) In our example, we consider the pairs aS and SS. In each
case, C can be identified with S, and S derives strings beginning with a and
c. Thus, X ~ Y, where X is any of a or S and Y is any of a or e.
To compute .>, we again consider adjacent pairs in the right sides, this
time of the form CX. We find those symbols Y that can appear at the end
of a string derived in one or more steps from C and those terminals d at the
beginning of a string derived in zero or more steps from X. If X is itself
a terminal, then X - - d is the only possibility. Here, SS and Sb are sub-
strings of this form. Y is b or e, and d is a or c in the first case and b in the
second.
It should be emphasized that ~----, ~z, and -> do not have the properties
normally ascribed to = , < , and > on the reals, integers, etc. For example,
"-- is not usually an equivalence relation; ~Z and .> are not normally transi-
tive, and they may be symmetric or reflexive.
Since there is at most one precedence relation in each entry of Fig. 5.1 i,
G is a precedence grammar. Moreover, all productions in G have unique
right sides, so that G is a simple precedence grammar, and L(G) is a simple
precedence language.
Let us consider $accb$, a right-sentential form of G delimited by endmark-
406 ONE-PASSNO BACKTRACK PARSING CHAP. 5

ers. We have $ < a, a < c, and c 3> c. The handle of accb is the first c, so
the precedence relations have isolated this handle.

We can often represent the relevant information in an n × n precedence


matrix by two vectors of dimension n. We shall discuss such representations
of precedence matrices in Section 7.1.
The following theorem shows that the precedence relation ~ occurs at
the beginning of a handle in a right-sentential form, ~ holds between adjacent
symbols of a handle, and .> holds at the right end of a handle. This is true
for all grammars with no e-productions, but it is only in a precedence gram-
mar that there is at most one precedence relation between any pair of symbols
in a viable prefix of a right-sentential form.
First we shall show a consequence of a precedence relation holding
between two symbols.

LEMMA 5.3

Let G = (N, E, P, S) be a proper C F G with no e-productions.


(1) I f X < A o r X " AandA~YaisinP, t h e n X - < Y.
(2) If A < a, A " a, or A -> a and A ~ a Yis a production, then Y 3> a.
Proof We leave (1) for the Exercises and prove (2). If A < a, then there
is a right side fllABfl2 such that B ~ a? for some 7. Since A ~ 0~Y, Y -> a
is immediate. If A " a, there is a right side flIAafl2. As we have a ==~ a and
A ~ 0~Y, it follows that Y .2> a again. If A .> a, then there is a right side
flxBXfl2, where B =* ?A and X *~ aO for some 7 and ~. Since B ~ ?ix Y,
we again have the desired conclusion. [Z]

THEOREM 5.14
Let G = (N, E, P, S) be a proper C F G with no e-productions. If

ss$ @ ... x + Aa, ...

XpXp_~ . . . Xk+aX~,'" X~a~ . . . aq

then
(1) For p < i < k, either X~+~ < X~ or Xt+~ "--- X~;
(2) Xk+a <~Xk;
(3) For k > i ;> 1, Xt+~ --" X~; and
(4) X~ > a~.
Proof. The proof will proceed by induction on n. For n = 0, we have
$S$ ~ $Xk . . . Xa$. F r o m the definition of the precedence relations we
SEC. 5.3 PRECEDENCE GRAMMARS 407

have $ < X k , X~+, " X ~ f o r k > i ~ I a n d X , > $. Note that X k . . . X 1


cannot be the empty string, since G is assumed to be free of e-productions.
For the inductive step suppose that the statement of the theorem is true
for n. Now consider a derivation'

$S$~X,,...Xk+IAa,...aq
n

X p . . . Xk+lX1, . . . Xaal ... aq

Xp . . . Xj+,Y, ... YiXj_, ... X1a I . . aq

That is, Xj is replaced by Yr "'" Y1 at the last step. Thus, X j _ i , . . . , X, are


terminals; the case j = 1 is not ruled out.
By the inductive hypothesis, Xj+ a ~ X j or Xj+ 1 "-- Xj. Thus, Xj+ , <;. Y,.
by Lemma 5.3(1). Also, Xj is related by one of the three relations to the sym-
bol on its right (which may be a,). Thus, Ya "> Xj_ 1, or Y1 .2> al if j - - 1.
We have Yr " Yr-a " "'" "2--Yi, since Y r ' " Y1 is a right side. Finally,
Xi+ 1 < Xe or X;+ 1 " X; follows by the inductive hypothesis, for p < i < j.
Thus the induction is complete.

COROLLARY 1

If G is a precedence grammar, then conclusion (1) of Theorem 5.14 can


be strengthened by adding "exactly one of < and ~_." Conclusions (1)-(4)
can be strengthened by appending "and no other relations hold."
P r o o f . Immediate from the definition of a precedence grammar. [Z]

COROLLARY 2
Every simple precedence grammar is unambiguous.
P r o o f . All we need to do is observe that for any right-sentential form fl,
other than S, the previous right-sentential form a such that a r~m fl is unique.
From Corollary 1 we know that the handle of fl can be uniquely determined
by scanning fl surrounded by endmarkers from left to right until the first 3>
relation is found, and then scanning back until a <~ relation is encountered.
The handle lies between these points. Because a simple precedence grammar is
uniquely invertible, the nonterminal to which the handle is to be reduced is
unique. Thus, ~ can be uniquely found from ft. D

We note that since we are dealing only with proper grammars, the fact
that this and subsequent parsing algorithms operate in linear time is not
difficult to prove. The proofs are left for the Exercises.
We shall now describe how a deterministic right parser can be constructed
for a simple precedence grammar.
408 ONE-PASS NO BACKTRACK PARSING CI-IAP. 5

ALGORITHM 5.12
Shift-reduce parsing algorithm for a simple precedence grammar.
Input. A simple precedence grammar G = (N, X, P, S) in which the
productions in P are numbered from 1 to p.
Output. CZ= ( f , g), a shift-reduce parsing algorithm.
Method.
(1) The shift-reduce parsing algorithm will employ $ as a bottom marker
for the pushdown list and a right endmarker for the input.
(2) The shift-reduce function f will be independent of the contents of
the pushdown list except for the topmost symbol and independent of the
remaining input except for the leftmost input symbol. Thus we shall define
f only on (N u X U {$}) x (X u [$}), except in one case (rule c).
(a) f(X, a) = shift if X < a or X - - ~ a.
(b) f ( X , a) = reduce if X -> a.
(c) f($S, $) = accept.l"
(d) f ( X , a) = error otherwise.
(These rules can be implemented by consulting the precedence matrix itself.)
(3) The reduce function g depends only on the string on top of the push-
down list up to one symbol below the handle. The remaining input does not
affect g. Thus we define g only on (N U X t,.) {$})* as follows:
(a) g(X~+lXkXk_l "'" XI, e) = i if Xk+~ < Xk, Xj+~ " Xj for k > j
> 1, and production i is A ~ X k X k - ~ ' ' " Xi. (Note that the
reduce function g is only invoked when X~ 3> a, where a is the
current input symbol.)
(b) g(~z, e) = error, otherwise. [Z]

Example 5.35
Let us construct a shift-reduce parsing algorithm ~ = ( f , g) for the gram-
mar G with productions
(1) S > aSSb
(2) S ~ c

The precedence relations for G are given in Fig. 5.11 on p. 405. We can
use the precedence matrix itself for the shift-reduce function f. The reduce
function g is as follows"
(1) g(XaSSb) = 1 if X ~ IS, a, $].
(2) g(Xc) = 2 if X ~ [S, a, $}.
(3) g(00 = error, otherwise.

tNote that this rule may take priority over rules (2a) and (2b) when X = S and a = $.
SEC. 5.3 PRECEDENCE GRAMMARS 409

With input accb, ¢~ would make the following sequence of moves"

($, accb $, e) ~ ($a, ccb $, e)


(Sac, cb $, e)
_r_ ( $aS, cb $, 2)
($aSc, b$, 2)
._r_ ( $aSS, b $, 22)
($aSSb, $, 22)
.z_ ($S, $, 221)

In configuration (Sac, cb$, e), for example, we have f(c, b ) = reduce and
g(ac, e) = 2. Thus

(Sac, cb $, e) R ( $aS, cb $, 2)
Let us examine the behavior of a on aeb, an input not in L(G). With
acb as input a would make the following moves:

($, acb $, e) ~ ( $a, cb $, e)


~2_ (Sac, b $, e)
[--($aS, b$,2)
12- (SaSh, $, 2)
R error

In configuration ($aSb, $, 2), f(b, $) = reduce. Since $ < a and a-~- S ~ b,


we can make a reduction only if aSb is the right side of some production.
However, no such production exists, so g(aSb, e) = error.
In practice we might keep a list of "error productions." Whenever an
error is encountered by the g function, we could then consult the list of error
productions to see if a reduction by an error production can be made. Other
precedence-oriented error recovery techniques are discussed in the biblio-
graphical notes at the end of this section. D

THEOREM 5.15
Algorithm 5.12 constructs a valid shift-reduce parsing algorithm for
a simple precedence grammar.
Proof. The proof is a straightforward consequence of Theorem 5.14,
the unique invertibility property, and the construction in Algorithm 5.12.
The details are left for the Exercises. El
410 ONE-PASSNO BACKTRACK PARSING CHAP. 5

It is interesting to consider the classes of languages which can be generated


by precedence grammars and simple precedence grammars. We discover
that every CFL without e has a precedence grammar but that not every CFL
without e has a simple precedence grammar. Moreover, for every CFL
without e we can find an e-free uniquely invertible CFG. Thus, insisting
that grammars be both precedence and uniquely invertible diminishes their
language-generating capability. Every simple precedence grammar is an
LR(1) grammar, but the LR(1) language (a0'l'l i ~ 1} w {b0tl2'l i ~ 1} has
no simple precedence grammar, as we shall see in Section 8.3.

5.3.3. Extended Precedence Grammars

It is possible to extend the definition of the Wirth-Weber precedence


relations to pairs of strings rather than pairs of symbols. We shall give a defi-
nition of extended precedence relations that relate strings of m symbols to
strings of n symbols. Our definition is designed with shift-reduce parsing in
mind.
Understanding the motivation of extended precedence requires that we
recall the two roles of precedence relations in a shift-reduce parsing
algorithm"
(1) Let ~X be the m symbols on top of the pushdown list (with X on top)
and a w the next n input symbols. If ~X < a w or ~ X " aw, then a is to be
shifted on top of the pushdown list. If ~ X 3> aw, then a reduction is to be
made.
(2) Suppose that X~, . . . X 2 X i is the string on the pushdown list and that
a 1 . . . a~ is the remaining input string when a reduction is called for (i.e.,
Xm "-" Xi 3> a 1 . . . a,). If the handle is X k " ' " X1, we then want

for k > j ~ l and

Xm+k''" Xk+l < Xk ''" Xlal "" a._k.'t

Thus parsing according to a uniquely invertible extended precedence


grammar is similar to parsing according to a simple Wirth--Weber precedence
grammar, except that the precedence relation between a pair of symbols X
and Y is determined by ~X and Y f l , where ~ is the m -- 1 symbols to the
left of X and fl is the n -- 1 symbols to the right of Y.
We keep shifting symbols onto the pushdown list until a 3> relation is
encountered between the string on top of the pushdown list and the remain-
ing input. We then scan back into the pushdown list over " relations until

"tWo assume that X, X r - t . . . X l a l . . . a,_,. is XrX,-1 . . . X,-n+l if r ~ n.


SEC. 5.3 PRECEDENCE GRAMMARS 4'1 1

the first .~ relation is encountered. The handle lies between the <~ and 3>
relations.
This discussion motivates the following definition.
DEFINITION

Let G = (N, E, P, S) be a proper C F G with no e-production. We


define the (m, n) precedence relations 4 , ~ , and 3> on (N U E U {$})m ×
(N U ~ U {$})" as follows" Let

$r"s$" ~ XpXp_ 1 ... Xk+ a A a ~ . . . aq


rm

X~,X~,_ i " " " X k + l X k " " " X l a i . . . aq

be any rightmost derivation. Then,


(1) 0c .~ fl if 0c consists of the last m symbols of XpXp_~ . . . Xk+~, and
either
(a) fl consists of the first n symbols of X k " " X~a~ . . . aq, o r
(b) X k is a terminal and fl is in FIRST,(Xk • • • X~a~ . . . aq).
(2) 0c ~- fl for all j, k > j ~ 1, such that 0c consists of the last m symbols
of X p X p _ ~ . . . X j + ~, and either
(a) p consists of the first n symbols of X)Xj_~ . . . X~a~ . . . aq, or
(b) X i is a terminal and fl is in FIRST,(XjXj_~ . . . X ~ a ~ . . . aq).
(3) X ~ X m _ ~ . . . X~ 3> a~ . . . a,,
We say that G is an (m, n) p r e c e d e n c e g r a m m a r if G is a proper C F G with
no e-production and the relations ~ , __~__,and 3> are pairwise disjoint. It
should be clear from Lemma 5.3 that G is a precedence grammar if and only
if G is a (1, i) precedence grammar. The details concerning endmarkers are
easy to handle. Whenever n = 1, conditions (lb) and (2b) yield nothing new.
We also comment that the disjointness of the portions of <~ and " arising
solely from definitions (1 b) and (2b) do not really affect our ability to find a
shift-reduce parsing algorithm for extended precedence grammars. We could
have given a more complicated but less restrictive definition, and leave the
development of such a class of grammars for the Exercises.
We shall give an algorithm to compute the extended precedence relations.
It is clearly applicable to Wirth-Weber precedence relations also.
ALGORITHM 5.13
Construction of (m, n) precedence relations.
Input. A proper C F G G = (N, E, P, S) with no e-production.
Output. The (m, n) precedence relations <~, ~--, and 3> for G.
M e t h o d . We begin by constructing the set $ of all substrings of length
rn -+ n that can appear in a string 0q~u such that $mS$" ~ ~ A w =*. ~ f l w and
rm rrn
412 ONE-PASS NO BACKTRACK PARSING CHAP. 5

u = FIRST,(w). The following steps do the job"


(1) Let S--{$ms$"-~, $m-~s$"} • The two strings in S have n o t been
"considered."
(2) If ~ is an unconsidered string in $, "consider" it by performing the
following two operations.
(a) If d~ is not of the form aAx, where Ix[ _< n, do nothing.
(b) If 5 = aAx, l xl _< n, and A ~ N, add to $, if not already there,
those strings ? such that there exists A ----~fl in P and ? is a sub-
string of length m -+- n of aflx. Note that since G is proper, we
have l~Pxl>_m q-- n. New strings added to S are not yet consid-
ered.
(3) Repeat step (2) until no string in $ remains unconsidered.
From set S, we construct the relations < , -~-, and ->, as follows:
(4) For each string aAw in $ such that I • I = m and for each A ---~ fl in P,
let a < 6, where 5 is the first n symbols of flw or fl begins with a terminal
and d~ is in FIRST,(flw).
(5) For each string aA in g such that t~1= m and for each production
A ~ fllXYfl2 in P, let ~1 ~ ~2, where 51 is the last m symbols of affiX
and 5z is the first n symbols of Yfl2?, or Y is a terminal and ~2 = Yw for
some w in FIRST,_I(fl2?).
(6) For each string aAw in S such that twl = n and for each A ---~ fl in
P, let ~ .> w, where ,6 is the last m symbols of aft. D

Example 5.36
Consider the grammar G having the productions

S > 0 S l l 1011

The (1, 1) precedence relations for G are shown in Fig. 5.12. Since 1 ~- 1
and 1 3> 1, G is not a (1, 1) precedence grammar.
Let us use Algorithm 5.13 to compute the (2, 1) precedence relations for G.
We start by computing S. Initially, $ = [$S$, $$S}. We consider $S$ by
adding $0S, 0S1, S11, 11 $, (these are all the substrings of $0S11 $ of length 3),
S 0 1 $

"_,.> .>

Fig. 5.12 (1, 1) precedence relations.


SEC. 5.3 PRECEDENCEGRAMMARS 413

and $01 and 011 (substrings of $0115 of length 3). Consideration of $$S
adds $$0. Consideration of $0S adds $00, 00S, and 001. Consideration of
0S1 adds 111, and consideration of 00S adds 000. These are all the members
of $.
To construct <Z, we consider those strings in S with S at the right. We
obtain $$ <Z 0, $0 ~ 0, and 00 ~ 0. To construct -~-, we again consider the
strings in S with S at the right and find $0 " S, 0S " 1, S1 " 1, $0 " 1,
01 ~ 1, 00-~- S, and 00-~- 1.
To construct .>, we consider strings in $ with S in the middle. We find
11 3> $ from $S$ and i 1 3> 1 from 0S1.
The (2,1)precedence relations for G are shown in Fig. 5.13. Strings of
length 2 which are not in the domain of ~-, <Z, or 3> do not appear.

S 0 l $
$$ <.

$0

0S
l

00 <.

01

SI

11 i -> ">
Fig. 5.13 (2, 1) precedence relations.

Since there are no (2, 1) precedence conflicts, G is a (2, 1) precedence


grammar. D

THEOREM 5.16
Algorithm 5.13 correctly computes 4 , -~-, and 3>.
Proof We first show that S is defined correctly. That is, 7, ~ S if and
only ifl r l = m + n and 7' is a substring of t~flu, where $mS$" *~rmt~Aw ==>,:m
~flW
and u = FIRST,(w).
Only if: The proof is by induction on the order in which strings are added
to $. The basis, the first two members of $, is immediate. For the induction,
suppose that 7 is added to $, because txAx is in S and A --~ fl is in P; that is,
~, is a substring of ¢tflx. Since ~Ax is in $, from the inductive hypothesis we
414 ONE-PASS NO BACKTRACK PARSING CHAP. 5

have the derivation $'S$" *=* l-Ill


~ ' A ' w =:>
l-hi
oc'fl'uv, where u = FIRST,(w) and
e'fl'u can be written as JlOCAx~ 2 for some j~ and ~2 in (N u I~ U {$})*.
Since G is proper, there is some y ~ (I~ u {$})* such that J2 ~ Y- Thus,
$ " S $ " ~ ~ o ~ A x y v ~ ~ l e f l x y v . Since ~, is a substring of eflx of length
m ÷ n, it is certainly a substring of eflz, where z = F I R S T , ( x y v ) .

If: An induction on k shows that if $~S$" =~k


l'Ill
~ A w :=>
rill
OCflw, then every
substring of ocflu of length m + n is in $, where u = FIRST,(w).
That steps (4)-(6) correctly compute <~, - - , and .> is a straightforward
consequence of the definitions of these relations. [--]

We may show the following theorem, which is the basis of the shift-reduce
parser for uniquely invertible (m, n) precedence grammars analogous to
that of Algorithm 5.12.

THEOREM 5 . 1 7

Let G - - (N, E, P, S) be an arbitrary proper C F G and let m and n be


integers. Let

(5.3.1) l-m
x , x , _ , . . . x , + xAax ... a,

l-m> XpXp_ 1 . . . Xk +l Xk . . . X l a 1 . . . aq

(1) For j such that p - m ~j > k, let 0c be the last m symbols of


X~Xp_~ . . . Xj+~ and let fl be the first n symbols of X j X j _ ~ . . . Xaa~ . . . aq.
If fl ~ (E u {$})*, then either ~ <Z fl or ~ " ft.
(2) Xm+kXm+k_1 . . . Xk+ 1 ~ fl, where fl consists of the first n symbols
of X~ . . . X l a 1 . . . aq.
(3) For k > j ~> 1 let ~ be the last m symbols in XpXp_ 1 "'" Xj+ ~ and let
fl be the first n symbols of X j X j _ i . . . Xaal . . . ap. Then ~ " ft.
(4) X , , % , , _ 1 . . . X i .> a x . . . a , .

P r o o f All but statement (1) are immediate consequences of the defini-


tions. To prove (1), we observe that since j > k, fl does not consist entirely
of $'s. Thus the derivation (5.3.1) can be written as

i
$~S$" ~ r m ~,Bw
- -r m~ ~ ' ~ 2 w
(5.3.2)
------>
rm
XpXp_l ... Xk+lAal ... aq

r m ~ X p X p _ l "'" X k + l X k ' ' " X l a l . . " aq


SEC. 5.3 PRECEDENCE GRAMMARS 41 5

where i is as large as possible such that B --, ~152 is a production in which


62 ~ e, B derives both Xj+~ and Xj, and ydi1 = XpXp_l . . . Xj+I.
If the first symbol of ~2 is a terminal, say ~2 = a83, then by rule 2(b) in
the definition o f - - ' , we have Xj+,,,Xj+,,,_i . . . Xj+i "-~ fl, where fl = ax and
x is in FIRST,_ a(63 w).
If the first symbol of ~2 is a nonterminal, let 6z = C~3. Since Xj is a ter-
minal, by hypothesis, C must be subsequently rewritten after several steps of
derivation (5.3.2) as De, for some D in N and e in (N W E)*. Then, D is
replaced by XiO for some 0, and the desired relation follows from rule 2(b)
of the definition of <~. D

COROLLARY
If G of Theorem 5.17 is an (m, n) precedence grammar, then Theorem
5.17 can be strengthened by adding the condition that no other relation holds
between the strings in question to each of (1)-(4) in the statement of the
theorem. D

The shift-reduce parsing algorithm for uniquely invertible extended


precedence grammars is exactly analogous to Algorithm 5.12 for simple
precedence grammars, and we shall only outline it here. The first n unexpend-
ed input symbols can be kept on top of the pushdown list. If Xm " " X1
appears top, al -. • a, is the first n input symbols and Xm • .. X1 " al " " a,
or Xm " " " X~ ~ al • "" a,, then we shift. If Xm "- • X 1 "> a~ .. • a,, we reduce.
Part (1) of Theorem 5.17 assures us that one of the first two cases will occur
whenever the handle lies to the right of X1. By part (4) of Theorem 5.17,
the right end of the handle has been reached if and only if the third case
applies.
To reduce, we search backwards through ~-- relations for a ~ relation,
exactly as in Algorithm 5.12. Parts (2) and (3) of Theorem 5.17 imply that
the handle will be correctly isolated.

5.3.4. Weak Precedence Grammars

Many naturally occurring grammars are not simple precedence grammars,


and in many cases rather awkward grammars result from an attempt to find
a simple precedence grammar for the language at hand. We can obtain
a larger class of grammars which can be parsed using precedence techniques
by relaxing the restriction that the <~ and ~-- precedence relations be disjoint.
We still use the 3> relation to locate the right end of the handle. We can
then use the right sides of the productions to locate the left end of the handle
by finding a production whose right side matches the symbols immediately
to the left of the right end of the handle. This is not much more expensive
than simple precedence parsing. When parsing with a simple precedence
grammar, once we had isolated the handle we still needed to determine which
41 6 ONE-PASSNO BACKTRACK PARSING CHAP. 5

production was to be used in making the reduction, and thus had to examine
these symbols anyway.
To make this scheme work, we must be able to determine which produc-
tion to use in case the right side of one production is a suffix of the right
side of another. For example, suppose that ocflTw is a right-sentential form in
which the right end of the handle occurs between ? and w. If A----~ ? and
B ~ / 3 7 are two productions, then it is not apparent which production should
be used to make the reduction.
We shall restrict ourselves to applying the longest applicable production.
The weak precedence grammars are one class of grammars for which this rule
is the correct one.
DEFINITION

Let G = (N, E, P, S) be a proper C F G with no e-productions. We say


that G is a weak precedence grammar if the following conditions hold:
(1) The relation 3> is disjoint from the union of < and "
(2) If A --. otXfl and B --* fl are in P with X in N u Z, then neither of
the relations X <Z B and X " B are valid.

Example 5.37
The grammar G with the following productions is an example of a weak
precedence g r a m m a r t :

E > E -t- TI + TI T
T >T,F[F
F > (E)la

The precedence matrix for G is shown in Fig. 5.14.


Note that the only precedence conflicts are between < and ~-, so condi-
tion (1) of the definition of a weak precedence grammar is satisfied. To
see that condition (2) is not violated, first consider the three productions
E---+ E + T, E ~ -b T, and E---~ T.$ From the precedence table we see that
no precedence relation holds between E and E or between -b and E (with
--b on the left side of the relation, that is). Thus these three productions do
not cause a violation of condition (2). The only other productions having
one right side a suffix of the other are T ----~ T • F and T ~ F. Since there is
no precedence relation between • and T, condition (2) is again satisfied.
Thus G is a weak precedence grammar.

t It should be obvious that G is related to our favorite grammar Go. In fact, L(G)
is just L(Go) with superfluous unary + signs, as in -q- a . (q-- a -k- a), included. Go is
another example of a uniquely invertible weak precedence grammar which is not a simple
precedence grammar.
:l:The fact that these three productions have the same left side is coincidental.
SEC. 5.3 PRECEDENCE GRAMMARS 417

E T F a ( )

•> .> .> .>

•> .> .> .>

) •> .> .> .>


i7

( <., "= <- <. <. <. <.

+ <.," <. <. <.

_• <. <.

$ <- <- ~ <- <. <. <.

Fig. 5.14 Precedence matrix.

Although G is not a simple precedence grammar, it does generate a simple


precedence language. Later we shall see that this is always true. Every unique-
ly invertible weak precedence grammar generates a simple precedence lan-
guage. [Z]

Let us now verify that in a right-sentential form of a weak precedence


grammar the handle is always the right side of the longest applicable produc-
tion.
LEMMA 5.4
Let G = (N, 2;, P, S) be a weak precedence grammar, and let P contain
production B ---~ ft. Suppose that $S$ *~m~,Cw =~'~m~Xflw. If there exists
a production A ~ tzXfl for any g, then the last production applied was
not B ---~ ft.
Proof. Assume on the contrary that C = B and 7 = 5X. Then X ~ B
or X " B by Theorem 5.14 applied to derivation S *~ rm
~,Cw. This follows
because the handle of ~,Cw ends somewhere to the right of C, and thus C is
one of the X's of Theorem 5.14. But we then have an immediate violation
of the weak precedence condition. [Z]

LEMMA 5.5
Let G be as in Lemma 5.4, and suppose that G is uniquely invertible.
If there is no production of the form A--~ txXfl, then in the derivation
418 ONE-PASSNO BACKTRACK PARSING CHAP. 5

$S$ ~l ' I l l ?Cw ==>


rill
gXflw, we must have C = B and 7' = OX (i.e., the last
production used was B ----~fl).
Proof Obviously, C was replaced at the last step. The left end of the
handle of OXltw could not be anywhere to the left of X by the nonexistence
of any production A ~ o~X]3. If the handle ends somewhere right of the
first symbol of fl, then a violation of Lemma 5.4 is seen to occur with B ~ ,0
playing the role of A ----~ocXfl in that lemma. Thus the handle is fl, and the
result follows by unique invertibility. [53

Thus the essence of the parsing algorithm for uniquely invertible weak
precedence grammars is that we can scan a right-sentential form (surrounded
by endmarkers) from left to right until we encounter the first 3> relation.
This relation delimits the right end of the handle. We then examine symbols
one at a time to the left of .~. Suppose that B ~ fl is a production and we
see Xfl to the left of the .~ relation. If there is no production of the form
A--* ~X,8, then by Lemma 5.5, fl is the handle. If there is a production
A --. ocXfl, then we can infer by Lemma 5.4 that B --* fl is not applicable.
Thus the decision whether to reduce fl can be made examining only one
symbol to the left of ft.
We can thus construct a shift-reduce parsing algorithm for each uniquely
invertible weak precedence grammar.
ALGORITHM 5.14
Shift-reduce parsing algorithm for weak precedence grammars.
Input. A uniquely invertible weak precedence grammar G = (N, Z, P, S)
in which the productions are numbered from 1 to p.
Output. ~Z = ( f , g), a shift-reduce parsing algorithm for G.
Method. The construction is similar to Algorithm 5.12. The shift-reduce
function f is defined directly from the precedence relations:
(1) f ( X , a) = shift if X < a or X ~ - a.
(2) f ( X , a) = reduce if X .> a.
(3) f($S, $) = accept.
(3) f ( X , a) = error otherwise.
The reduce function g is defined to reduce using the longest applicable
production"
(4) g(Xfl) = i if B ~ fl is the ith production in P and there is no produc-
tion in P of the form A ~ ocXfl for any A and u.
(5) g(cx) = error otherwise.

THEOREM 5.18
Algorithm 5.14 constructs a valid shift-reduce parsing algorithm for G.
SEC. 5.3 PRECEDENCE GRAMMARS 419

Proof The proof is a straightforward consequence of Lemmas 5.4 and


5.5, the definition of a uniquely invertible weak precedence grammar, and
the construction of a itself. D

There are several transformations which can be used to eliminate preced-


ence conflicts from grammars. Here we shall present some useful transfor-
mations of this nature which can often be used to map a nonprecedence
grammar into an equivalent (1, 1) precedence grammar or a weak precedence
grammar.
Suppose that in a grammar we have a precedence conflict of the form
X " Y and X-> Y. Since X " Y there exist one or more productions in
which the substring X Y appears on the right side. If in these productions we
replace X by a new nonterminal A, we will eliminate the precedence relation
X ~ Y and thus resolve this precedence conflict. We can then add the
production A ----~ X to the grammar to preserve equivalence. If X alone is
not the right side of any other production, then unique invertibility will be
preserved.

Example 5.38

Consider the grammar G having the productions

S~0Sl11011

We saw in Example 5.36 that G is not a simple precedence grammar because


1 " 1 and 1 -> 1. However, if we substitute the new nonterminal A for
the first 1 in each right side and add the production A ~ 1, we obtain the
simple precedence grammar G' with productions

S > OSAltOA1
A- >1

The precedence relations for G' are shown in Fig. 5.15. D

S A 0 1
, ,

S "= ~

Ai "=

0 =" I ~ <-

1 "~ "~

$ <-
Fig. 5.15 Precedence relations for G'.
420 ONE-PASS NO BACKTRACK PARSING CHAP, 5

Similar transformations can be used to eliminate some precedence con-


flicts of the form .7( ~ Y, X .> Y (and also of the form X ~ Y, X ~ Y if
simple precedence is desired).
When these techniques destroy unique invertibility, it may be possible to
resolve precedence conflicts by eliminating productions as in Lemma 2.14.

Example 5.39
Consider the grammar G with productions

E >E+TIT
T >T*FIF
F > al(E) la(L )
L > L,E[E

In this grammar L represents a list of expressions, and variables can be sub-


scripted by an arbitrary sequence of expressions.
G is not a weak precedence grammar since E ~ ) and E 3> ). We could
eliminate this precedence conflict by replacing E in F---, (E) by E' and adding
the production E'----~ E. But then we would have two productions with E
as the right side. However, if we instead eliminate the production F ~ a(L)
from G by substituting for L as in Lemma 2.14, we obtain the equivalent
grammar G with the productions

E >E+TIT
T > T* FIF
F >al(E) la(Z, E) Ia(E)
L >L, EIE

Since L no longer appears to the left of), we do not have E 3> ) in this gram-
mar. We can easily verify that G' is a weak precedence grammar. D

We can use a slight generalization of these techniques to show that every


uniquely invertible weak precedence grammar can be transformed into
a simple precedence grammar. Thus the uniquely invertible weak precedence
grammars are no more powerful than the simple precedence grammars in
their language-generating capability, although, as we saw in Example 5.37,
there are uniquely invertible weak precedence grammars which are not
simple precedence grammars.
THEOREM 5.19
A language is defined by a uniquely invertible weak precedence grammar
if and only if it is a simple precedence language.

Proof.
If: Let G = (N, Σ, P, S) be a simple precedence grammar. Then clearly condition (1) of the definition of weak precedence grammar is satisfied. Suppose that condition (2) were not satisfied. That is, there exist A → αXYβ and B → Yβ in P, and either X ⋖ B or X ≐ B. Then X ⋖ Y, by Lemma 5.3. But X ≐ Y because of the production A → αXYβ. This situation is impossible because G is a precedence grammar.

Only if: Let G = (N, Σ, P, S) be a uniquely invertible weak precedence grammar. We construct a simple precedence grammar G′ = (N′, Σ, P′, S) such that L(G′) = L(G). The construction of G′ is given as follows:

(1) Let N′ be N plus new symbols of the form [α] for each α ≠ e such that A → βα is in P for some A and β.
(2) Let P′ consist of the following productions:
    (a) [X] → X for each [X] in N′ such that X is in N ∪ Σ.
    (b) [Xα] → X[α] for each [Xα] in N′, where X is in N ∪ Σ and α ≠ e.
    (c) A → [α] for each A → α in P.

We shall show that ⋖, ≐, and ⋗ for the grammar G′ are mutually disjoint. No conflicts can involve the endmarker. Thus let X and Y be in N′ ∪ Σ. We observe that

(1) If X ⋖ Y, then X is in N ∪ Σ;
(2) If X ≐ Y, then X is in N ∪ Σ, and Y is in N′ − N, since right sides of length greater than one appear only in rule (2b); and
(3) If X ⋗ Y, then X is in N′ ∪ Σ and Y is in Σ.

Part 1: ≐ ∩ ⋗ = ∅. If X ≐ Y, then Y is in N′ − N. If X ⋗ Y, then Y is in Σ. Clearly, ≐ ∩ ⋗ = ∅.

Part 2: ⋖ ∩ ⋗ = ∅. Suppose that X ⋖ Y and X ⋗ Y. Then X is in N ∪ Σ and Y is in Σ. Since X ⋖ Y in G′, there is a production of the form [Xα₁] → X[α₁] in P′ such that [α₁] ⇒+ Yα₂ for some α₂ in (N′ ∪ Σ)*. But Xα₁ must be the suffix of some production A → α₃Xα₁ in P. Now α₁ ⇒* Yα′₂ for some α′₂ in (N ∪ Σ)*. Thus in G we have X ≐ Y or X ⋖ Y.
Now consider X ⋗ Y in G′. There must be a production [Bβ₁] → B[β₁] in P′ such that B ⇒+ β₂X and [β₁] ⇒* Yβ₃ for some β₂ in (N ∪ Σ)* and β₃ in (N′ ∪ Σ)*. In G, Bβ₁ is the suffix of some production C → γBβ₁ in P. Moreover, B ⇒+ β₂X and β₁ ⇒* Yβ′₃ in G for some β′₃ in (N ∪ Σ)*. Thus X ⋗ Y in G.
We have shown that if in G′ we have X ⋖ Y and X ⋗ Y, then in G either X ≐ Y and X ⋗ Y, or X ⋖ Y and X ⋗ Y. Either situation contradicts the assumption that G is a weak precedence grammar. Thus, ⋖ ∩ ⋗ = ∅ in G′.

Part 3: ⋖ ∩ ≐ = ∅. We may assume that X ⋖ [Yα] and X ≐ [Yα], for some X in N ∪ Σ and [Yα] in N′ − N. This implies that there are productions [XAβ] → X[Aβ] and B → [Yα] in P′ such that [Aβ] ⇒+ Bγβ ⇒ [Yα]γβ for some γ in (N′ ∪ Σ)*, with A and B in N.
This, in turn, implies that there are productions C → δXAμ and B → Yα in P such that A ⇒* Bγ′ for some γ′ in (N ∪ Σ)*. Thus in G we have X ⋖ B or X ≐ B. (The latter occurs if and only if B = A.)
Now consider X ≐ [Yα]. Then there is a production [XYα] → X[Yα] in P′, and thus there is a production D → εXYα in P.
Therefore if X ⋖ [Yα] and X ≐ [Yα] in G′, then there are two productions in P of the form B → Yα and D → εXYα, and X ⋖ B or X ≐ B in G, violating condition (2) of the definition of weak precedence grammar.
The form of the productions in P′ allows us to conclude immediately that G′ is uniquely invertible if G is. Thus we have that L(G′) is a simple precedence language. A proof that L(G′) = L(G) is quite straightforward and is left for the Exercises. □

COROLLARY

Every uniquely invertible weak precedence grammar is unambiguous.

Proof. If there were two distinct rightmost derivations in the grammar G of Theorem 5.19, we could construct distinct rightmost derivations in G′ in a straightforward manner. □

The construction in the proof of Theorem 5.19 is more appropriate for


a theoretical proof than a practical tool. In practice we could use a far less
exhaustive approach. We shall give a simple algorithm to convert a uniquely
invertible weak precedence grammar to a simple precedence grammar. We
leave it for the Exercises to show that the algorithm works.
ALGORITHM 5.15

Conversion from uniquely invertible weak precedence to simple precedence.
Input. A uniquely invertible weak precedence grammar G = (N, Σ, P, S).
Output. A simple precedence grammar G′ with L(G′) = L(G).
Method.
(1) Suppose that there exist a particular X and Y in the vocabulary of G such that X ⋖ Y and X ≐ Y. Remove from P each production of the form A → αXYβ, and replace it by A → αX[Yβ], where [Yβ] is a new nonterminal.
(2) For each [Yβ] introduced in step (1), replace any productions of the form B → Yβ by B → [Yβ], and add the production [Yβ] → Yβ to P.
(3) Return to step (1) as often as it is applicable. When it is no longer applicable, let the resulting grammar be G′, and halt.
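A rough rendering of one round of steps (1) and (2), assuming the conflicting pair (X, Y) has already been found by inspecting the precedence relations; the dictionary layout and the encoding of a bracketed nonterminal [Yβ] as a tuple are assumptions made only for this sketch.

```python
# Hypothetical sketch of one pass of Algorithm 5.15.  Step (3), iterating
# until no conflicting pair remains, is left to the caller.
def eliminate_pair(grammar, X, Y):
    new_nts = {}                                  # maps Yβ to the symbol [Yβ]
    out = {lhs: [] for lhs in grammar}
    for lhs, alts in grammar.items():             # step (1)
        for rhs in alts:
            rhs, i = list(rhs), 0
            while i + 1 < len(rhs):
                if rhs[i] == X and rhs[i + 1] == Y:
                    tail = tuple(rhs[i + 1:])     # the string Yβ
                    sym = new_nts.setdefault(tail, ("[",) + tail)
                    rhs = rhs[: i + 1] + [sym]    # A -> αX[Yβ]
                    break
                i += 1
            out[lhs].append(tuple(rhs))
    for tail, sym in new_nts.items():             # step (2)
        for lhs in list(out):
            out[lhs] = [(sym,) if r == tail else r for r in out[lhs]]
        out.setdefault(sym, []).append(tail)      # add [Yβ] -> Yβ
    return out

# Example 5.40, first application, with X = '(' and Y = 'E':
g0 = {"E": [("E", "+", "T"), ("T",)],
      "T": [("T", "*", "F"), ("F",)],
      "F": [("(", "E", ")"), ("a",)]}
print(eliminate_pair(g0, "(", "E")["F"])
# [('(', ('[', 'E', ')')), ('a',)]   i.e. F -> ( [E)] | a
```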

Example 5.40

Let G be as in Example 5.37. We can apply Algorithm 5.15 to obtain the grammar G′ having productions

    E → E + [T] | [T]
    T → T * F | F
    F → ([E)] | a
    [T] → T
    [E)] → E)

The two applications of step (1) are to the pairs X = (, Y = E and X = +, Y = T. The precedence relations for G′ are given in Fig. 5.16. □

Fig. 5.16 Simple precedence matrix for G′. [Matrix omitted.]



EXERCISES

5.3.1. Which of the following grammars are simple precedence grammars?
    (a) G₀.
    (b) S → if E then S else S | a
        E → E or b | b.
    (c) S → AS | A
        A → (S) | ().
    (d) S → SA | A
        A → (S) | ().
5.3.2. Which of the grammars of Exercise 5.3.1 are weak precedence grammars?
5.3.3. Which of the grammars of Exercise 5.3.1 are (2, 1) precedence grammars?
5.3.4. Give examples of precedence grammars for which
    (a) ≐ is neither reflexive, symmetric, nor transitive.
    (b) ⋖ is neither irreflexive nor transitive.
    (c) ⋗ is neither irreflexive nor transitive.
*5.3.5. Show that every regular set has a simple precedence grammar. Hint: Make sure your grammar is uniquely invertible.
*5.3.6. Show that every uniquely invertible (m, n) precedence grammar is an LR grammar.
*5.3.7. Show that every weak precedence grammar is an LR grammar.
5.3.8. Prove that Algorithm 5.12 correctly produces a right parse.
5.3.9. Prove that G is a precedence grammar if and only if G is a (1, 1) precedence grammar.
5.3.10. Prove Lemma 5.3(1).
5.3.11. Prove that Algorithm 5.14 correctly produces a right parse.
5.3.12. Give a right parsing algorithm for uniquely invertible (m, n) precedence grammars.
5.3.13. Prove the corollary to Theorem 5.17.
*5.3.14. Show that the construction of Algorithm 5.15 yields a simple precedence grammar equivalent to the original.
5.3.15. For those grammars of Exercise 5.3.1 which are weak precedence grammars, give equivalent simple precedence grammars.
*5.3.16. Show that the language L = {a0ⁿ1ⁿ | n ≥ 1} ∪ {b0ⁿ1²ⁿ | n ≥ 1} is not a simple precedence language. Hint: Think of the action of the right parser of Algorithm 5.12 on strings of the form a0ⁿ1ⁿ and b0ⁿ1ⁿ, if L had a simple precedence grammar.
*5.3.17. Give a (2, 1) precedence grammar for the language of Exercise 5.3.16.

"5.3.18. Give a simple precedence grammar for the language {0nalnln > 1}
u {0nbl2nin >__ 1}.
'5.3.19. Show that every context-free grammar with no e-productions can be
transformed into a (1, 1) precedence grammar.
5.3.20. For C F G G = (N, X, P, S), define the relations 2,/z, and p as follows:
(1) AAX if A ---~ X~ is in P for some 0~.
(2) X/t Y if A ~ ~XYfl is in P for some 0~ and ft. Also, $/zS and
S/z$.
(3) XpA if A ---~ ~ X is in P for some 0~.
Show the following relations between the Wirth-Weber precedence
relations and the above relations ( + denotes the transitive closure;
• denotes reflexive and transitive closure)'
(a) < = / z 2 ÷.
(b) ~ u {($, S), (S, $)} = ~.
(c) 3> = p*ltA* n ((N u ~) x ~).
• "5.3.21. Show that it is undecidable whether a given grammar is an extended
precedence grammar [i.e., whether it is (m, n) precedence for some m
and n].
• 5.3.22. Show that if G is a weak precedence grammar, then G is an extended
precedence grammar (for some m and n).
5.3.23. Show t h a t a is in FOLLOWl(A) if and only if A < a, A ~ a, or A 3> a.
5.3.24. Generalize Lemma 5.3 to extended precedence grammars.
5.3.25. Suppose that we relax the extended precedence conditions to permit
0c < w and 0~ ~ w if they are generated only by rules (lb) and (2b).
Give a shift-reduce algorithm to parse any grammar meeting the
relaxed definition.

Research Problem
5.3.26. Find transformations which can be used to convert grammars into simple or weak precedence grammars.

Open Problem
5.3.27. Is every simple precedence language generated by a simple precedence grammar in which the start symbol does not appear on the right side of any production? It would be nice if so, as otherwise we might attempt to reduce when $S is on the pushdown list and $ on the input.

Programming Exercises
5.3.28. Write a program to construct the Wirth-Weber precedence relations
for a context-free grammar G. Use your program on the grammar for
PL360 in the Appendix.
5.3.29. Write a program that takes a context-free grammar G as input and
constructs a shift-reduce parsing algorithm for G, if G is a simple
precedence grammar. Use your program to construct a parser for PL360.

5.3.30. Write a program that will test whether a grammar is a uniquely invert-
ible weak precedence grammar.
5.3.31. Write a program to construct a shift-reduce parsing algorithm for a
uniquely invertible weak precedence grammar.

BIBLIOGRAPHIC NOTES

The origins of shift-reduce parsing appear in Floyd [1961]. Our treatment


here follows Aho et al. [1972]. Simple precedence grammars were defined by
Wirth and Weber [1966] and independently by Pair [1964]. The simple precedence
concept has been used in compilers for several languages, including Euler [Wirth
and Weber, 1966], ALGOL W [Bauer et al., 1968], and PL360 [Wirth, 1968].
Fischer [1969] proved that every CFL without e is generated by a (not necessarily
UI) (1, 1) precedence grammar (Exercise 5.3.19).
Extended precedence was suggested by Wirth and Weber. Gray [1969] points
out that several of the early definitions of extended precedence were incorrect.
Because of its large memory requirements, (m, n) extended precedence with
m + n > 3 seems to have little practical utility. McKeeman [1966] studies methods
of reducing the table size for an extended precedence parser. Graham [1970] gives
the interesting theorem that every deterministic language has a UI (2, 1) precedence
grammar.
Weak precedence grammars were defined by Ichbiah and Morse [1970].
Theorem 5.19 is from Aho et al. [1972].
Several error recovery schemes are possible for shift-reduce parsers. In shift-
reduce parsing an error can be announced both in the shift-reduce phase and
in the reduce phase. If an error is reported by the shift-reduce function, then we
can make deletions, changes, and insertions as in both LL and LR parsing. When
a reduce error occurs, it is possible to maintain a list of error productions which
can then be applied to the top of the pushdown list.
Error recovery techniques for simple precedence grammars are discussed by
Wirth [1968] and Leinius [1970]. To enhance the error detection capability of a
simple precedence parser, Leinius also suggests checking after a reduction is made
that a permissible precedence relation holds between symbols X and A, where A
is the new nonterminal on top of the pushdown list and X the symbol immediately
below.

5.4. OTHER CLASSES OF SHIFT-REDUCE PARSABLE GRAMMARS

We shall mention several other subclasses of the LR grammars having shift-reduce parsing algorithms. These are the bounded-right-context grammars, mixed strategy precedence grammars, and operator precedence grammars. We shall also consider the Floyd-Evans production language, which is essentially a programming language for deterministic parsing algorithms.

5.4.1. Bounded-Right-Context Grammars

We would like to enlarge the class of weak precedence grammars that we have considered by relaxing the requirement of unique invertibility. We cannot remove the requirement of unique invertibility altogether, since an economical parsing algorithm is not known for all precedence grammars. However, we can parse many grammars using the weak precedence concept to locate the right end of a handle and then using local context, if the grammar is not uniquely invertible, to locate the left end of the handle and to determine which nonterminal is to replace the handle.
A large class of grammars which can be parsed in this fashion are the (m, n)-bounded-right-context (BRC) grammars. Informally, G = (N, Σ, P, S) is an (m, n)-BRC grammar if whenever there is a rightmost derivation

    S′ ⇒*rm αAw ⇒rm αβw

in the augmented grammar G′ = (N ∪ {S′}, Σ, P ∪ {S′ → S}, S′), then the handle β and the production A → β which is used to reduce the handle in αβw can be uniquely determined by
(1) Scanning αβw from left to right until the handle is encountered.
(2) Basing the decision of whether δ is the handle of αβw, where γδ is a prefix of αβ, only on δ, the m symbols to the left of δ, and the n symbols to the right of δ.
(3) Choosing for the handle the leftmost substring which includes, or is to the right of, the rightmost nonterminal of αβw, from among the possible candidates suggested in (2).
For notational convenience we shall append m $'s to the left and n $'s to the right of every right-sentential form. With the added $'s we can be sure that there will always be at least m symbols to the left of and n symbols to the right of the handle in a padded right-sentential form.
DEFINITION
G = (N, Σ, P, S) is an (m, n)-bounded-right-context (BRC) grammar if the four conditions
(1) $ᵐS′$ⁿ ⇒*rm αAw ⇒rm αβw and
(2) $ᵐS′$ⁿ ⇒*rm γBx ⇒rm γδx = α′βy are rightmost derivations in the augmented grammar G′ = (N ∪ {S′}, Σ, P ∪ {S′ → S}, S′),
(3) |x| ≤ |y|, and
(4) the last m symbols of α and α′ coincide, and the first n symbols of w and y coincide
imply that α′Ay = γBx; that is, α′ = γ, A = B, and y = x.
A grammar is BRC if it is (m, n)-BRC for some m and n.

If we think of derivation (2) as the "real" derivation and of (1) as a possible cause of confusion, then condition (3) ensures that we shall not encounter a substring that looks like a handle (β surrounded by the last m symbols of α and the first n of w) to the left of the real handle δ. Thus we can choose as the handle the leftmost substring that "looks" like a handle. Condition (4) assures that we use only m symbols of left context and n symbols of right context to decide whether something is a handle or not.
As with LR(k) grammars, the use of the augmented grammar in the definition is required only when S appears on the right side of some production. For example, the grammar G with the two productions

    S → Sa | a

would be (1, 0)-BRC without the proviso of an augmented grammar. As in Example 5.22 (p. 373), we cannot determine whether to accept S in the right-sentential form Sa without looking ahead one symbol. Thus we do not want G to be considered (1, 0)-BRC.
We shall prove later that every (m, k)-BRC grammar is LR(k). However, not every LR(0) grammar is (m, n)-BRC for any m and n, intuitively because the LR definition allows us to use the entire portion of a right-sentential form to the left of the handle to make our parsing decisions, while the BRC condition limits the portion to the left of the handle which we may use to m symbols. Both definitions limit the use of the portion to the right of the handle, of course.

Example 5.41
The grammar G₁ with productions

    S → aAc
    A → Abb | b

is a (1, 0)-BRC grammar. The right-sentential forms (other than S′ and S) are aAb²ⁿc for all n ≥ 0 and ab²ⁿ⁺¹c for n ≥ 0. The possible handles are aAc, Abb, and b, and in each right-sentential form the handle can be uniquely determined by scanning the sentential form from left to right until aAc or Abb is encountered, or b is encountered with an a to its left. Note that neither b in Abb could possibly be a handle by itself, because A or b appears to its left.
On the other hand, the grammar G₂ with productions

    S → aAc
    A → bAb | b

generates the same language but is not even an LR grammar. □



Example 5.42
The grammar G with productions

    S → aA | bB
    A → 0A | 1
    B → 0B | 1

is an LR(0) grammar, but fails to be BRC, since the handle in either of the right-sentential forms a0ⁿ1 and b0ⁿ1 is 1, but knowing only a fixed number of symbols immediately to the left of 1 is not sufficient to determine whether A → 1 or B → 1 is to be used to reduce the handle.
Formally, we have derivations

    $ᵐS′$ⁿ ⇒*rm $ᵐa0ᵐA$ⁿ ⇒rm $ᵐa0ᵐ1$ⁿ
and
    $ᵐS′$ⁿ ⇒*rm $ᵐb0ᵐB$ⁿ ⇒rm $ᵐb0ᵐ1$ⁿ

Referring to the BRC definition, we note that α = $ᵐa0ᵐ, α′ = γ = $ᵐb0ᵐ, β = δ = 1, and y = w = x = $ⁿ. Then α and α′ end in the same m symbols, 0ᵐ; w and y begin with the same n symbols, $ⁿ; and |x| ≤ |y|, but α′Ay ≠ γBx. (A and B here are the A and B of the BRC definition.)
The grammar with productions

    S → aA | bA
    A → 0A | 1

generates the same language and is (0, 0)-BRC. □

Condition (3) in the definition of BRC may at first seem odd. However, it is this condition that guarantees that if, in a right-sentential form α′βy, β is the leftmost substring which is the right side of some production A → β and the left and right context of β in α′βy is correct, then the string α′Ay which results after the reduction will be a right-sentential form.
The BRC grammars are related to some of the classes of grammars we have previously considered in this chapter. As mentioned, they are a subset of the LR grammars. The BRC grammars are extended precedence grammars, and every uniquely invertible (m, n) precedence grammar is a BRC grammar. The (1, 1)-BRC grammars include all uniquely invertible weak precedence grammars. We shall prove this relation first.

THEOREM 5.20
If G = (N, Σ, P, S) is a uniquely invertible weak precedence grammar, then it is a (1, 1)-BRC grammar.

Proof. Suppose that we have a violation of the (1, 1)-BRC condition, i.e., a pair of derivations

    $S′$ ⇒*rm αAw ⇒rm αβw
and
    $S′$ ⇒*rm γBx ⇒rm γδx = α′βy

where α and α′ end in the same symbol; w and y begin with the same symbol; and |x| ≤ |y|, but γBx ≠ α′Ay. Since G is weak precedence, by Theorem 5.14 applied to γδx, we encounter the ⋗ relation first between δ and x. Applying Theorem 5.14 to αβw, we encounter ⋗ between β and w, and since w and y begin with the same symbol, we encounter ⋗ between β and y. Thus |α′β| ≥ |γδ|. Since we are given |x| ≤ |y|, we must have α′β = γδ and x = y.
If we can show that β = δ, we shall have α′ = γ. But by unique invertibility, A = B. We would then contradict the hypothesis that γBx ≠ α′Ay.
If β ≠ δ, then one is a suffix of the other. We consider cases to show β = δ.
Case 1: β = εXδ for some ε and X. Then X is the last symbol of γ, and therefore we have X ⋖ B or X ≐ B by Theorem 5.14 applied to the right-sentential form γBx. This violates the weak precedence condition.
Case 2: δ = εXβ for some ε and X. This case is symmetric to the above.
We conclude that β = δ and that G is (1, 1)-BRC. □

THEOREM 5.21
Every (m, k)-BRC grammar is an LR(k) grammar.
Proof. Let G = (N, Σ, P, S) be (m, k)-BRC but not LR(k). Then by Lemma 5.2, we have two derivations in the augmented grammar G′

    S′ ⇒*rm αAw ⇒rm αβw
and
    S′ ⇒*rm γBx ⇒rm γδx = αβy

where |γδ| ≥ |αβ| and FIRSTₖ(y) = FIRSTₖ(w), but γBx ≠ αAy. If we surround all strings by $'s and let α′ = α, we have an immediate violation of the (m, k)-BRC condition. □

COROLLARY

Every BRC grammar is unambiguous. □



We shall now give a shift-reduce parsing algorithm for BRC grammars and discuss its efficient implementation. Suppose that we are using a shift-reduce parsing algorithm to parse a BRC grammar G and that the parsing algorithm is in configuration (α, w, π). Then we can define sets 𝒮 and ℛ, which will tell us whether the handle in the right-sentential form αw appears on top of the stack (i.e., is a suffix of α) or whether the right end of the handle is somewhere in w (and we need to shift). If the handle is on top of the stack, these sets will also tell us what the handle is and what production is to be used to reduce the handle.

DEFINITION

Let G be an (m, n)-BRC grammar. 𝒮ᴳₘ,ₙ(A), for A ∈ N, is the set of triples (α, β, x) such that |α| = m, |x| = n, and there exists a derivation $ᵐS′$ⁿ ⇒*rm γαAxy ⇒rm γαβxy in the augmented grammar.
ℛᴳₘ,ₙ is the set of pairs (α, x) such that
(1) Either |α| = m + l, where l is the length of the longest right side in P, or |α| < m + l and α begins with $ᵐ;
(2) |x| = n; and
(3) There is a derivation $ᵐS′$ⁿ ⇒*rm βAy ⇒rm βγy, where αx is a substring of βγy positioned so that α lies within βγ and does not include the last symbol of βγ.
We delete G, m, and n from 𝒮(A) and ℛ when they are obvious.
The intention is that the appearance of the substring αβx in scanning a right-sentential form from left to right should indicate that the handle is β and that it is to be reduced to A whenever (α, β, x) is in 𝒮(A). The appearance of αx, when (α, x) is in ℛ, indicates that we do not have the handle yet, but it is possible that the handle lies to the right of α. The following lemma assures us that this is, in fact, the case.

LEMMA 5.6
G = (N, Σ, P, S) is (m, n)-BRC if and only if
(1) whenever A → β and B → δ are distinct productions, (α, β, x) is in 𝒮ₘ,ₙ(A), and (γ, δ, x) is in 𝒮ₘ,ₙ(B), then αβ is not a suffix of γδ, nor vice versa; and
(2) for all A ∈ N, if (α, β, x) is in 𝒮ₘ,ₙ(A), then (θαβ, x) is not in ℛₘ,ₙ for any θ.
Proof.
If: Suppose that G is not (m, n)-BRC. Then we can find derivations in the augmented grammar G′

    $ᵐS′$ⁿ ⇒*rm αAw ⇒rm αβw
and
    $ᵐS′$ⁿ ⇒*rm γBx ⇒rm γδx = α′βy

where α and α′ coincide in the last m places, w and y coincide in the first n places, and |x| ≤ |y|, but γBx ≠ α′Ay. Let ε be the last m places of α and z the first n places of w. Then (ε, β, z) is in 𝒮(A). If x ≠ y, so that |x| < |y|, we must have (θεβ, z) in ℛ for some θ, and thus condition (2) is violated. If x = y, then (η, δ, z) is in 𝒮(B), where η is the last m symbols of γ. If A → β and B → δ are the same production, then with x = y we conclude γBx = α′Ay, contrary to hypothesis. But since one of ηδ or εβ is a suffix of the other, we have a violation of (1) if A → β and B → δ are distinct.
Only if: Given a violation of (1) or (2), a violation of the (m, n)-BRC condition is easy to construct. We leave this part for the Exercises. □

We can now give a shift-reduce parsing algorithm for BRC grammars. Since it involves knowing the sets 𝒮(A) and ℛ, we shall first discuss how to compute them.
ALGORITHM 5.16
Construction of 𝒮ₘ,ₙ(A) and ℛₘ,ₙ.
Input. A proper grammar G = (N, Σ, P, S).
Output. The sets 𝒮ₘ,ₙ(A) for A ∈ N and ℛₘ,ₙ.
Method.
(1) Let l be the length of the longest right side of a production. Compute the set — call it 𝒥 — of strings γ such that
    (a) |γ| = m + n + l, or |γ| < m + n + l and γ begins with $ᵐ;
    (b) γ is a substring of αβu, where αβw is a right-sentential form with handle β and u = FIRSTₙ(w); and
    (c) γ contains at least one nonterminal.
We can use a method similar to the first part of Algorithm 5.13 here.
(2) For A ∈ N, let 𝒮(A) be the set of (α, β, x) such that there is a string γαAxy in 𝒥, A → β is in P, |α| = m, and |x| = n.
(3) Let ℛ be the set of (α, x) such that there exists γBy in 𝒥, B → δ in P, αx is a substring of γδy, and α is within γδ, exclusive of the last symbol of γδ. Of course, we also require that |x| = n and that |α| = m + l, where l is the length of the longest right side of a production, or α begins with $ᵐ and |α| < m + l. □
THEOREM 5.22
Algorithm 5.16 correctly computes 𝒮(A) and ℛ.
Proof. Exercise. □

ALGORITHM 5.17

Shift-reduce parsing algorithm for BRC grammars.

Input. An (m, n)-BRC grammar G = (N, Σ, P, S), with augmented grammar G′ = (N′, Σ, P′, S′).
Output. 𝒜 = (f, g), a shift-reduce parsing algorithm for G.

Method.
(1) Let f(α, w) = shift if (α, w) is in ℛₘ,ₙ.
(2) f(α, w) = reduce if α = α₁α₂ and (α₁, α₂, w) is in 𝒮ₘ,ₙ(A) for some A, unless A = S′, α₁ = $ᵐ, and α₂ = S.
(3) f($ᵐS, $ⁿ) = accept.
(4) f(α, w) = error otherwise.
(5) g(α, w) = i if we can write α = α₁α₂, (α₁, α₂, w) is in 𝒮(A), and the ith production is A → α₂.
(6) g(α, w) = error otherwise. □

THEOREM 5.23

The shift-reduce parsing algorithm constructed by Algorithm 5.17 is valid for G.
Proof. By Lemma 5.6, there is never any ambiguity in defining f and g. By definition of 𝒮(A), whenever a reduction is made, the string reduced, α₂, is the handle of some string βα₁α₂wz. If it is the handle of some other string β′α₁α₂wz′, a violation of the BRC condition is immediate. The only difficult point is ensuring that condition (3), |x| ≤ |y|, of the BRC definition is satisfied. It is not hard to show that the derivations of βα₁α₂wz and β′α₁α₂wz′ can be made the first and second derivations in the BRC definition in one order or the other, so condition (3) will hold. □

The shift-reduce algorithm of Algorithm 5.17 has f and g functions which clearly depend only on bounded portions of the argument strings, although one must look at substrings of varying lengths at different times. Let us discuss the implementation of f and g together as a decision tree. First, given α on the pushdown list and x on the input, one might branch on the first n symbols of x. For each such sequence of n symbols, one might then scan α backwards, at each step making the decision whether to proceed further on α or announce an error, a shift, or a reduction. If a reduction is called for, we have enough information to tell exactly which production is used, thus incorporating the g function into the decision tree as well. It is also possible to generalize Domolki's algorithm (see the Exercises of Section 4.1) to make the decisions.

Example 5.43

Let us consider the grammar G given by

    (0) S′ → S
    (1) S → 0A
    (2) S → 1S
    (3) A → 0A
    (4) A → 1

G is (1, 0)-BRC. To compute 𝒮(A), 𝒮(S), and ℛ, we need the set of strings of length 3 or less that can appear in the viable prefix of a right-sentential form and contain a nonterminal. These are $S′, $S, $0A, $1S, 00A, 11S, 10A, and substrings thereof.
We calculate

    𝒮₁,₀(S′) = {($, S, e)}
    𝒮₁,₀(S) = {($, 0A, e), ($, 1S, e), (1, 0A, e), (1, 1S, e)}
    𝒮₁,₀(A) = {(0, 0A, e), (0, 1, e)}

ℛ consists of the pairs (α, e), where α is $, $0, $00, 000, $1, $11, $10, 111, 100, or 110. The functions f and g are given in Fig. 5.17. By "ending of α" we mean the shortest suffix of α necessary to determine f(α, e) and to determine g(α, e) if necessary.

    Ending of α    f(α, e)    g(α, e)
    $0A            reduce     1
    10A            reduce     1
    00A            reduce     3
    $1S            reduce     2
    11S            reduce     2
    01             reduce     4
    00             shift
    $0             shift
    $10            shift
    110            shift
    $1             shift
    $11            shift
    111            shift
    $              shift
    $S             accept

    Fig. 5.17 Shift-reduce functions.



A decision tree implementing f and g is shown in Fig. 5.18 on p. 436.


We omit nodes with labels A and S below level 1. They would all have the
outcome error, of course. □
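The table of Fig. 5.17 can also be driven directly by a short program; the following sketch (with the table hard-coded and the longest ending tried first) is an illustration, not part of the algorithm as stated.

```python
# Hypothetical table-driven rendering of the functions of Fig. 5.17 for the
# (1, 0)-BRC grammar of Example 5.43; decisions depend only on a short
# ending of the pushdown list.
PRODS = {1: ("S", "0A"), 2: ("S", "1S"), 3: ("A", "0A"), 4: ("A", "1")}
TABLE = {                       # ending of the stack -> action
    "$0A": ("reduce", 1), "10A": ("reduce", 1), "00A": ("reduce", 3),
    "$1S": ("reduce", 2), "11S": ("reduce", 2), "01":  ("reduce", 4),
    "00": ("shift",), "$0": ("shift",), "$10": ("shift",), "110": ("shift",),
    "$1": ("shift",), "$11": ("shift",), "111": ("shift",), "$": ("shift",),
    "$S": ("accept",),
}

def parse(word):
    stack, inp, output = "$", list(word), []
    while True:
        act = None
        for k in (3, 2, 1):                     # longest ending first
            if len(stack) >= k and stack[-k:] in TABLE:
                act = TABLE[stack[-k:]]
                break
        if act is None:
            return "error"
        if act[0] == "accept":
            return output
        if act[0] == "shift":
            if not inp:
                return "error"
            stack += inp.pop(0)
        else:                                   # reduce
            lhs, rhs = PRODS[act[1]]
            stack = stack[: -len(rhs)] + lhs
            output.append(act[1])

print(parse("1001"))     # right parse [4, 3, 1, 2]
```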

5.4.2. Mixed Strategy Precedence Grammars

Unfortunately, the shift-reduce algorithm of Algorithm 5.17 is rather expensive to implement because of the storage requirements for the f and g functions. We can define a less costly shift-reduce parsable class of grammars by using precedence to locate the right end of the handle and then using local context both to isolate the left end of the handle and to determine which nonterminal is to replace the handle.

Example 5.44
Consider the (non-UI) weak precedence grammar G with productions

    S → aA | bB
    A → CA1 | C1
    B → DBE1 | DE1
    C → 0
    D → 0
    E → 1

G generates the language {a0ⁿ1ⁿ | n ≥ 1} ∪ {b0ⁿ1²ⁿ | n ≥ 1}, which we shall show in Chapter 8 not to be a simple precedence language. Precedence relations for G are given in Fig. 5.19 (p. 437). Note that G is not uniquely invertible, because 0 appears as the right side of two productions, C → 0 and D → 0.
However, 𝒮₁,₀(C) = {(a, 0, e), (C, 0, e)} and 𝒮₁,₀(D) = {(b, 0, e), (D, 0, e)}. Thus, if we have isolated 0 as the handle of a right-sentential form, then the symbol immediately to the left of 0 will determine whether to reduce the 0 to C or to D. Specifically, we reduce 0 to C if that symbol is a or C, and we reduce 0 to D if that symbol is b or D. □

[Fig. 5.18 (the decision tree implementing f and g of Example 5.43, p. 436) and Fig. 5.19 (the precedence relations for the grammar G of Example 5.44) are not reproduced here.]

This example suggests the following definition.

DEFINITION

Let G = (N, Σ, P, S) be a proper CFG with no e-productions. We say that G is a (p, q; m, n) mixed strategy precedence (MSP) grammar if
(1) The extended (p, q) precedence relation ⋗ is disjoint from the union of the (p, q) precedence relations ⋖ and ≐.
(2) If A → αβ and B → β are distinct productions in P, then the following three conditions can never all be true simultaneously:
    (a) 𝒮ₘ,ₙ(A) contains (γ, αβ, x).
    (b) 𝒮ₘ,ₙ(B) contains (δ, β, x).
    (c) δ is the last m symbols of γα.
A (1, 1; 1, 0)-MSP grammar will be called a simple MSP grammar. For example, the grammar of Example 5.44 is simple MSP. In fact, every uniquely invertible weak precedence grammar is a simple MSP grammar.
Let l(A) = {X | X ⋖ A or X ≐ A}. For a simple MSP grammar, condition (2) above reduces to the following:
(2a) If A → β′Xβ and B → β are productions, then X is not in l(B).
(2b) If A → β and B → β are productions, A ≠ B, then l(A) and l(B) are disjoint.
Condition (1) above and condition (2a) are recognizable as the weak precedence conditions. Thus a simple MSP grammar can be thought of as a (possibly non-UI) weak precedence grammar in which one symbol of left context is sufficient to distinguish between productions with the same right side [condition (2b)].
ALGORITHM 5.18
Parsing algorithm for MSP grammars.
Input. A (p, q; m, n)-MSP grammar G = (N, Σ, P, S) in which the productions are numbered.
Output. 𝒜 = (f, g), a shift-reduce parsing algorithm for G.
Method.
(1) Let |α| = p and |x| = q. Then
    (a) f(α, x) = shift if α ⋖ x or α ≐ x, and
    (b) f(α, x) = reduce if α ⋗ x.
(2) f($ᵖS, $ᵠ) = accept.
(3) f(γ, w) = error otherwise.
(4) Let 𝒮ₘ,ₙ(A) contain (α, β, x) and A → β be production i. Then g(αβ, x) = i.
(5) g(γ, w) = error otherwise. □

THEOREM 5.24
Algorithm 5.18 is valid for G.
Proof. Exercise. It suffices to show that every MSP grammar is a BRC grammar, and then show that the functions of Algorithm 5.18 agree with those of Algorithm 5.17. □

5.4.3. Operator Precedence Grammars

An efficient parsing procedure can be given for a class of grammars called operator precedence grammars. Operator precedence parsing is simple to implement and has been used in many compilers.

DEFINITION

An operator grammar is a proper CFG with no e-productions in which no production has a right side with two adjacent nonterminals.
For an operator grammar we can define precedence relations on the set of terminal symbols and $, while ignoring nonterminals. Let G = (N, Σ, P, S) be an operator grammar and let $ be a new symbol. We define three operator precedence relations on Σ ∪ {$} as follows:
(1) a ≐ b if A → αaγbβ is in P with γ in N ∪ {e}.
(2) a ⋖ b if A → αaBβ is in P and B ⇒+ γbδ, where γ is in N ∪ {e}.
(3) a ⋗ b if A → αBbβ is in P and B ⇒+ δaγ, where γ is in N ∪ {e}.
(4) $ ⋖ a if S ⇒+ γaα with γ in N ∪ {e}.
(5) a ⋗ $ if S ⇒+ αaγ with γ in N ∪ {e}.
G is an operator precedence grammar if G is an operator grammar and at most one operator precedence relation holds between any pair of terminal symbols.

Example 5.45
The grammar G₀ is a classic example of an operator precedence grammar:

    (1) E → E + T    (2) E → T
    (3) T → T * F    (4) T → F
    (5) F → (E)      (6) F → a

The operator precedence relations are given in Fig. 5.20.

    a:  a ⋗ ),  a ⋗ +,  a ⋗ *,  a ⋗ $
    (:  ( ⋖ a,  ( ⋖ (,  ( ≐ ),  ( ⋖ +,  ( ⋖ *
    ):  ) ⋗ ),  ) ⋗ +,  ) ⋗ *,  ) ⋗ $
    +:  + ⋖ a,  + ⋖ (,  + ⋗ ),  + ⋗ +,  + ⋖ *,  + ⋗ $
    *:  * ⋖ a,  * ⋖ (,  * ⋗ ),  * ⋗ +,  * ⋗ *,  * ⋗ $
    $:  $ ⋖ a,  $ ⋖ (,  $ ⋖ +,  $ ⋖ *

    Fig. 5.20 Operator precedence relations for G₀ (pairs not listed have no relation).

We can produce skeletal parses for operator precedence grammars very efficiently. The parsing principle is the same as for simple precedence analysis. It is easy to verify the following theorem.

THEOREM 5.25
Let G = (N, Σ, P, S) be an operator grammar, and suppose that $S$ ⇒*rm αAw ⇒rm αβw. Then
(1) The operator precedence relation ⋖ or ≐ holds between consecutive terminals (and $) of α;
(2) The operator precedence relation ⋖ holds between the rightmost terminal symbol of α and the leftmost terminal symbol of β;
(3) The relation ≐ holds between the consecutive terminal symbols of β;
(4) The relation ⋗ holds between the rightmost terminal symbol of β and the first symbol of w.
Proof. Exercise. □

COROLLARY

If G is an operator precedence grammar, then we can add to (1)-(4) of Theorem 5.25 that "no other relations hold."
Proof. By definition of operator precedence grammar. □

Thus we can readily isolate the terminal symbols appearing in a handle


using a shift-reduce parsing algorithm. However, nonterminal symbols
cause some problems, as no precedence relations are defined on nonterminal
symbols. Nevertheless, the fact that we have an operator grammar allows
us to produce a "skeletal" right parse.

Example 5.46
Let us parse the string (a + a) * a according to the operator precedence relations of Fig. 5.20 obtained from G₀. However, we shall not worry about nonterminals and merely keep their place with the symbol E. That way we do not have to worry about whether F should be reduced to T, or T to E (although in this particular case, we could handle such matters by going outside the methods of operator precedence parsing). We are effectively parsing according to the grammar G:

    (1) E → E + E
    (3) E → E * E
    (5) E → (E)
    (6) E → a

derived from G₀ by replacing all nonterminals by E and deleting single productions. (Note that we cannot have a production with no terminals on the right side in an operator grammar unless it is a single production.)
Obviously, G is ambiguous, but the operator precedence relations will assure us that a unique parse is found. The shift-reduce algorithm which we shall use on grammar G is given by the functions f and g below. Note that strings which form arguments for f and g are expected to consist only of the terminals of G₀ and the symbols $ and E. Below, γ is either E or the empty string; b and c are terminals or $.

    (1) f(bγ, c) = shift    if b ⋖ c or b ≐ c
                   reduce   if b ⋗ c
                   accept   if b = $, γ = E, and c = $
                   error    otherwise

    (2) g(bγa, x) = 6        if b ⋖ a
        g(bE * E, x) = 3     if b ⋖ *
        g(bE + E, x) = 1     if b ⋖ +
        g(bγ(E), x) = 5      if b ⋖ (
        g(α, x) = error      otherwise

Thus 𝒜 would make the following sequence of moves with (a + a) * a as input:

    [$, (a + a) * a$, e] ⊢ [$(, a + a) * a$, e]
                         ⊢ [$(a, + a) * a$, e]
                         ⊢ [$(E, + a) * a$, 6]
                         ⊢ [$(E +, a) * a$, 6]
                         ⊢ [$(E + a, ) * a$, 6]
                         ⊢ [$(E + E, ) * a$, 66]
                         ⊢ [$(E, ) * a$, 661]
                         ⊢ [$(E), * a$, 661]
                         ⊢ [$E, * a$, 6615]
                         ⊢ [$E *, a$, 6615]
                         ⊢ [$E * a, $, 6615]
                         ⊢ [$E * E, $, 66156]
                         ⊢ [$E, $, 661563]
                         ⊢ accept

We can verify that 661563 is indeed a skeletal right parse for (a + a) * a according to G. We can view this skeletal parse as a tree representation of (a + a) * a, as shown in Fig. 5.21. □

We should observe that it is possible to fill in the skeletal parse tree of Fig. 5.21 to build the corresponding tree of G₀. But in a practical sense, this is often not necessary. The purpose of building the tree is for translation, and the natural translation of E, T, and F in G₀ is a computer program which computes the expression derived from that nonterminal. Thus when production E → T or T → F is applied, the translation of the right side is very likely to be the same as the translation of the left side.

Fig. 5.21 Skeletal parse tree for (a + a) * a. [Diagram omitted.]

Example 5.46 is a special case of a technique that works for many grammars, especially those that define languages which are sets of arithmetic expressions. Involved is the construction of a new grammar with all nonterminals of the old grammar replaced by one nonterminal and single productions deleted. If we begin with an operator precedence grammar, we can always find one parse of each input by a shift-reduce algorithm. Quite often the new grammar and its parser are sufficient for the purposes of translation, and in such situations the operator precedence parsing technique is a particularly simple and efficient one.

DEFINITION
Let G = (N, Σ, P, S) be an operator grammar. Define Gₛ = ({S}, Σ, P′, S), the skeletal grammar for G, to consist of all productions S → X₁ ⋯ Xₘ such that there is a production A → Y₁ ⋯ Yₘ in P, and for 1 ≤ i ≤ m,
(1) Xᵢ = Yᵢ if Yᵢ ∈ Σ.
(2) Xᵢ = S if Yᵢ ∈ N.
However, we do not allow S → S in P′.
We should warn the reader that L(G) ⊆ L(Gₛ), and in general L(Gₛ) may contain strings not in L(G). We can now give a shift-reduce algorithm for operator precedence grammars.
ALGORITHM 5.19
Operator precedence parser.
Input. An operator precedence grammar G = (N, Σ, P, S).
Output. Shift-reduce parsing functions f and g for Gₛ.
Method. Let β be S or e.
(1) f(aβ, b) = shift if a ⋖ b or a ≐ b.
(2) f(aβ, b) = reduce if a ⋗ b.
(3) f($S, $) = accept.
(4) f(α, w) = error otherwise.
(5) g(aβbγ, w) = i if
    (a) β is S or e;
    (b) a ⋖ b;
    (c) the ≐ relation holds between the consecutive terminal symbols of bγ, if any; and
    (d) production i of Gₛ is S → βbγ.
(6) g(α, w) = error otherwise. □
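As an illustration of Algorithm 5.19, the following sketch parses the skeletal grammar of Example 5.46 using the operator precedence relations of Fig. 5.20; the table encoding and the explicit list of handle patterns are assumptions made only for this sketch.

```python
# Hypothetical operator precedence parser for the skeletal grammar of G0:
# E -> E + E (1) | E * E (3) | ( E ) (5) | a (6).  '<' stands for the
# relation ⋖, '=' for ≐, and '>' for ⋗.
REL = {}
for a in "a)":                                    # a and ) are ⋗ these symbols
    for b in "*+)$":
        REL[a, b] = ">"
REL.update({("*", "("): "<", ("*", "a"): "<", ("*", "*"): ">",
            ("*", "+"): ">", ("*", ")"): ">", ("*", "$"): ">",
            ("+", "("): "<", ("+", "a"): "<", ("+", "*"): "<",
            ("+", "+"): ">", ("+", ")"): ">", ("+", "$"): ">",
            ("(", ")"): "="})
for b in "(a*+":
    REL["(", b] = "<"
    REL["$", b] = "<"

HANDLES = {("a",): 6, ("E", "*", "E"): 3, ("E", "+", "E"): 1, ("(", "E", ")"): 5}

def parse(tokens):
    stack, inp, out = ["$"], list(tokens) + ["$"], []
    while True:
        top = next(s for s in reversed(stack) if s != "E")   # topmost terminal or $
        if top == "$" and stack[-1] == "E" and inp[0] == "$":
            return out                                        # accept
        rel = REL.get((top, inp[0]))
        if rel in ("<", "="):
            stack.append(inp.pop(0))                          # shift
        elif rel == ">":                                      # reduce a handle
            for rhs, prod in HANDLES.items():
                if tuple(stack[-len(rhs):]) == rhs:
                    del stack[-len(rhs):]
                    stack.append("E")
                    out.append(prod)
                    break
            else:
                return "error"
        else:
            return "error"

print(parse(list("(a+a)*a")))        # skeletal parse [6, 6, 1, 5, 6, 3]
```

Running it on (a + a) * a produces the skeletal right parse 661563 obtained in Example 5.46.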

Example 5.46 is an example of Algorithm 5.19 applied to G₀. To show the correctness of Algorithm 5.19, two lemmas are needed.
LEMMA 5.7
If α is a right-sentential form of an operator grammar, then α does not have two adjacent nonterminals.

Proof. Elementary induction on the length of the derivation of α. □

LEMMA 5.8
If α is a right-sentential form of an operator grammar, then the symbol appearing immediately to the left of the handle cannot be a nonterminal.
Proof. If it were, then the right-sentential form to which α is reduced would have two adjacent nonterminals. □

THEOREM 5.26
Algorithm 5.19 parses all sentences in L(G).
Proof. By the corollary to Theorem 5.25, the first ⋗ and the previous ⋖ correctly isolate a handle. Lemma 5.7 justifies the restriction that β be only S or e (rather than any string in S*). Lemma 5.8 justifies the inclusion of β in the handle in rule (5). □

5.4.4. Floyd-Evans Production Language

What we shall discuss next is not another parsing algorithm, but rather a language in which deterministic (nonbacktracking) top-down and bottom-up parsing algorithms can be described. This language is called the Floyd-Evans production language, and a number of compilers have been implemented using this syntactic metalanguage. The name is somewhat of a misnomer, since the statements of the language need not refer to any particular productions in a grammar. A program written in Floyd-Evans productions is a specification of a parsing algorithm with a finite state control influencing decisions.†
A production language parser is a list of production language statements. Each statement has a label, and the labels can be considered to be the states of the finite control. We assume that no two statements have the same label. The statements act on an input string and a pushdown list and cause a right parse to be constructed. We can give an instantaneous description of the parser as a configuration of the form

    (q, $Xₘ ⋯ X₁, a₁ ⋯ aₙ$, π)

where
(1) q is the label of the currently active statement;

†We might add that this is not the ultimate generalization of shift-reduce algorithms. The LR(k) parsing algorithm uses a finite control and also keeps extra information on its pushdown list. In fact, a DPDT might really be considered the most general kind of shift-reduce algorithm. However, as we saw in Section 3.4, the DPDT is not really constrained to parse by making reductions according to the grammar for which its output is a presumed parse, as is the LR(k) algorithm and the algorithms of Sections 5.3 and 5.4.


(2) Xₘ ⋯ X₁ is the contents of the pushdown list with X₁ on top ($ is used as a bottom-of-pushdown-list marker);
(3) a₁ ⋯ aₙ is the remaining input string ($ is also used as a right endmarker for the input tape);
(4) π represents the output of the parser to this point, presumably the right parse of the input according to some CFG.
A production language statement is of the form

    ⟨label⟩: α | a → β | ⟨action⟩ * ⟨next label⟩

where the metasymbols → and * are optional.

Suppose that the parser is in configuration

    (L1, γα, ax, π)

and statement L1 is

    L1: α | a → β | emit s  * L2

L1 says that if the string on top of the pushdown list is α and the current input symbol is a, then replace α by β, emit the string s, move the input head one symbol to the right (indicated by the presence of *), and go next to statement L2. Thus the parser would enter the configuration (L2, γβ, x, πs). The symbol a may be e, in which case the current input symbol is not relevant, although if the * is present, an input symbol will be shifted anyway.
If statement L1 did not apply, because the top of the pushdown list did not match α or the current input symbol was not a, then the statement immediately following L1 on the list of statements must be applied next.
Both labels on a statement are optional (although we assume that each statement has a name for use in configurations). If the symbol → is missing, then the pushdown list is not to be changed, and there would be no point in having β ≠ e. If the symbol * is missing, then the input head is not to be moved. If the ⟨next label⟩ is missing, the next statement on the list is always taken.
Other possible actions are accept and error. A blank in the action field indicates that no action is to be taken other than the pattern matching and possible reduction.
Initially, the parser is in configuration (L, $, w$, e), where w is the input string to be parsed and L is a designated statement. The statements are then checked serially until an applicable statement is found. The various actions specified by this statement are performed, and then control is transferred to the statement specified by the next label.
The parser continues until an error or accept action is encountered. The output is valid only when the accept is executed.

We shall discuss production language in the context of shift-reduce parsing, but the reader should bear in mind that top-down parsing algorithms can also be implemented in production language. There, the presumption is that we can write α = α₁α₂α₃ and β = α₁Aα₃, or β = α₁Aα₃a if the * is present. Moreover, A → α₂ is a production of the grammar we are attempting to parse, and the output s is just the number of the production A → α₂.
Floyd-Evans productions can be modified to take "semantic routines" as actions. Then, instead of emitting a parse, the parser would perform a syntax-directed translation, computing the translation of A in terms of the translations of the various components of α₂. Feldman [1966] describes such a system.

Example 5.47

We shall construct a production language parser for the grammar G₀ with the productions

    (1) E → E + T    (2) E → T
    (3) T → T * F    (4) T → F
    (5) F → (E)      (6) F → a

The symbol # is used as a match for any symbol. It is presumed to represent the same symbol on both sides of the arrow. L11 is the initial statement.

    L0:   (#     → (#   |            * L0
    L1:   a#     → F#   | emit 6     *
    L2:   T*F#   → T#   | emit 3       L4
    L3:   F#     → T#   | emit 4
    L4:   T*#    → T*#  |            * L0
    L5:   E+T#   → E#   | emit 1       L7
    L6:   T#     → E#   | emit 2
    L7:   E+#    → E+#  |            * L0
    L8:   (E)#   → F#   | emit 5     * L2
    L9:   $E$           | accept
    L10:  $             | error
    L11:                |            * L0

The parser would go through the following configurations with input (a + a) * a:

    [L11, $, (a + a) * a$, e] ⊢ [L0, $(, a + a) * a$, e]
                              ⊢ [L0, $(a, + a) * a$, e]
                              ⊢ [L1, $(a, + a) * a$, e]
                              ⊢ [L2, $(F +, a) * a$, 6]
                              ⊢ [L3, $(F +, a) * a$, 6]
                              ⊢ [L4, $(T +, a) * a$, 64]
                              ⊢ [L5, $(T +, a) * a$, 64]
                              ⊢ [L6, $(T +, a) * a$, 64]
                              ⊢ [L7, $(E +, a) * a$, 642]
                              ⊢ [L0, $(E + a, ) * a$, 642]
                              ⊢ [L1, $(E + a, ) * a$, 642]
                              ⊢ [L2, $(E + F), * a$, 6426]
                              ⊢ [L3, $(E + F), * a$, 6426]
                              ⊢ [L4, $(E + T), * a$, 64264]
                              ⊢ [L5, $(E + T), * a$, 64264]
                              ⊢ [L7, $(E), * a$, 642641]
                              ⊢ [L8, $(E), * a$, 642641]
                              ⊢ [L2, $F *, a$, 6426415]
                              ⊢ [L3, $F *, a$, 6426415]
                              ⊢ [L4, $T *, a$, 64264154]
                              ⊢ [L0, $T * a, $, 64264154]
                              ⊢ [L1, $T * a, $, 64264154]
                              ⊢ [L2, $T * F$, e, 642641546]
                              ⊢ [L4, $T$, e, 6426415463]
                              ⊢ [L5, $T$, e, 6426415463]
                              ⊢ [L6, $T$, e, 6426415463]
                              ⊢ [L7, $E$, e, 64264154632]
                              ⊢ [L8, $E$, e, 64264154632]
                              ⊢ [L9, $E$, e, 64264154632]
                              ⊢ accept □

It should be observed that a Floyd-Evans parser can be simulated by


a DPDT. Thus each Floyd-Evans parser recognizes a deterministic CFL.
However, the language recognized may not be the language of the grammar

for which the Floyd-Evans parser is constructing a parse. This phenomenon


occurs because the flow of control in the statements may cause certain reduc-
tions to be overlooked.

Example 5.48
Let G consist of the productions

    (1) S → aS
    (2) S → bS
    (3) S → a

L(G) = (a + b)*a. The following sequence of statements parses words in b*a according to G, but accepts no other words:

    L0:   #    → #   |          *
    L1:   a    → S   | emit 3     L4
    L2:   b          |            L0
    L3:   $          | error
    L4:   aS   → S   | emit 1     L4
    L5:   bS   → S   | emit 2     L4
    L6:   $S $ → $S$ | accept
    L7:              | error

With input ba, the parser makes the following moves:

    [L0, $, ba$, e] ⊢ [L1, $b, a$, e]
                    ⊢ [L2, $b, a$, e]
                    ⊢ [L0, $b, a$, e]
                    ⊢ [L1, $ba, $, e]
                    ⊢ [L4, $bS, $, 3]
                    ⊢ [L5, $bS, $, 3]
                    ⊢ [L4, $S, $, 32]
                    ⊢ [L5, $S, $, 32]
                    ⊢ [L6, $S, $, 32]

The input is accepted at statement L6. However, with input aa, the following sequence of moves is made:

    [L0, $, aa$, e] ⊢ [L1, $a, a$, e]
                    ⊢ [L4, $S, a$, 3]
                    ⊢ [L5, $S, a$, 3]
                    ⊢ [L6, $S, a$, 3]
                    ⊢ [L7, $S, a$, 3]

An error is declared at L7, even though aa is in L(G).
There is nothing mysterious going on in Example 5.48. Production language programs are not tied to the grammar that they are parsing in the way that the other parsing algorithms of this chapter are tied to their grammars. □
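To make the flavor of such programs concrete, here is a small interpreter for a simplified form of production language statements, together with a re-expression of the statements of Example 5.48. The statement layout and field names are assumptions of this sketch and do not reproduce the book's exact notation.

```python
# Hypothetical interpreter.  A statement is (label, pattern, lookahead,
# replacement, emit, shift, action, next_label): pattern must be a suffix of
# the pushdown list; lookahead (if not None) must equal the current input
# symbol; replacement replaces the matched suffix; shift pushes the input
# symbol and advances the head; on a failed match control falls through to
# the next statement in the list.
PROGRAM = [
    ("L0", "",   None, "",   "",  True,  None,     "L1"),
    ("L1", "a",  None, "S",  "3", False, None,     "L4"),
    ("L2", "b",  None, "b",  "",  False, None,     "L0"),
    ("L3", "",   None, "",   "",  False, "error",  None),
    ("L4", "aS", None, "S",  "1", False, None,     "L4"),
    ("L5", "bS", None, "S",  "2", False, None,     "L4"),
    ("L6", "$S", "$",  "$S", "",  False, "accept", None),
    ("L7", "",   None, "",   "",  False, "error",  None),
]
INDEX = {s[0]: i for i, s in enumerate(PROGRAM)}

def run(word, start="L0"):
    stack, inp, out, i = "$", word + "$", "", INDEX[start]
    while True:
        label, pat, la, repl, emit, shift, action, nxt = PROGRAM[i]
        if stack.endswith(pat) and (la is None or inp[:1] == la):
            if action == "accept":
                return out
            if action == "error":
                return "error"
            stack = stack[: len(stack) - len(pat)] + repl
            out += emit
            if shift:
                stack, inp = stack + inp[0], inp[1:]
            i = INDEX[nxt] if nxt else i + 1
        else:
            i += 1                       # fall through to the next statement

print(run("ba"))    # accepts with output "32", as in the text
print(run("aa"))    # reports an error, as in the text
```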

In Chapter 7 we shall provide an algorithm for mechanically generating


a Floyd-Evans production language parser for a uniquely invertible weak
precedence grammar.

5.4.5. Chapter Summary

The diagram in Fig. 5.22 gives the hierarchy of grammars encountered


in this chapter.
All containments in Fig. 5.22 can be shown to be proper. Those inclu-
sions which have not been proved in this chapter are left for the Exercises.
The inclusion of LL grammars in the LR grammars is proved in Chapter
8.
Insofar as the classes of languages that are generated by these classes of grammars are concerned, we can demonstrate the following results:
(1) The class of languages generated by each of the following classes of grammars is precisely the class of deterministic context-free languages:
    (a) LR.      (b) LR(1).
    (c) BRC.     (d) (1, 1)-BRC.
    (e) MSP.     (f) Simple MSP.
    (g) Uniquely invertible (2, 1) precedence.    (h) Floyd-Evans parsable.
(2) The class of LL languages is a proper subset of the deterministic CFL's.
(3) The uniquely invertible weak precedence grammars generate exactly the simple precedence languages, which are
    (a) A proper subset of the deterministic CFL's and
    (b) Incommensurate with the LL languages.
(4) The class of languages generated by operator precedence grammars is the same as that generated by the uniquely invertible operator precedence grammars. This class of languages is properly contained in the class of simple precedence languages.

[Fig. 5.22 (hierarchy of grammars) is not reproduced here; its nodes include the context-free grammars, the unambiguous CFG's, and the Floyd-Evans parsable, operator precedence, LR, LR(1), BRC, MSP, simple MSP, uniquely invertible extended precedence, uniquely invertible weak precedence, and simple precedence grammars.]

    Fig. 5.22 Hierarchy of grammars.

Many of these results on languages will be found in Chapter 8.
The reader may well ask which class of grammars is best suited for describing programming languages and which parsing technique is best. There is no clear-cut answer to such a question. The simplest class of grammars may often require manipulating a given grammar in order to make it fall into that class. Often the grammar becomes unnatural and unsuitable for use in a syntax-directed translation scheme.
The LL(1) grammars are particularly attractive for practical use. For each LL(1) grammar we can find a parser which is small, fast, and produces a left parse, which is advantageous for translation purposes. However, there are some disadvantages. An LL(1) grammar for a given language can be unnatural and difficult to construct. Moreover, not every deterministic CFL has an LL grammar, let alone an LL(1) grammar, as we shall see in Chapter 8.
Operator precedence techniques have been used in several compilers, are easy to implement, and work quite efficiently. The (1, 1)-precedence grammars are also easy to parse, but obtaining a (1, 1)-precedence grammar for a language often requires the addition of many single productions of the form A → X to make the precedence relations disjoint. Also, there are many deterministic CFL's for which no uniquely invertible simple or weak precedence grammar exists.
The LR(1) technique presented in this chapter closely follows Knuth's original work. The resulting parsers can be extremely large. However, the techniques to be presented in Chapter 7 produce LR(1) parsers whose size and operating speed are competitive with precedence parsers for a wide variety of programming languages. See Lalonde et al. [1971] for some empirical results. Since the LR(1) grammars embrace a large class of grammars, LR(1) parsing techniques are also attractive.
Finally, we should point out that it is often possible to improve the performance of any given parsing technique in a specific application. In Chapter 7 we shall discuss some methods which can be used to reduce the size and increase the speed of parsers.

EXERCISES

5.4.1. Give a shift-reduce parsing algorithm based on the (1, 0)-BRC technique for G₁ of Example 5.41.
5.4.2. Which of the following grammars are (1, 1)-BRC?
    (a) S → aA | B
        A → 0A1 | a
        B → 0B1 | b.
    (b) S → aA | bB
        A → 0A1 | 01
        B → 0B1 | 01.
    (c) E → E + T | E − T | T
        T → T * F | T/F | F
        F → (E) | −E | a.

DEFINITION

A proper CFG G = (N, Σ, P, S) is an (m, n)-bounded-context (BC) grammar if the three conditions
(1) $ᵐS′$ⁿ ⇒* α₁A₁γ₁ ⇒ α₁β₁γ₁ in the augmented grammar,
(2) $ᵐS′$ⁿ ⇒* α₂A₂γ₂ ⇒ α₂β₂γ₂ = α₃β₁γ₃ in the augmented grammar, and
(3) the last m symbols of α₁ and α₃ agree and the first n symbols of γ₁ and γ₃ agree
imply that α₃A₁γ₃ = α₂A₂γ₂.
imply that ~3A173 = 0~A2~2.

5.4.3. Show that every (m, n)-BC grammar is an (m, n)-BRC grammar.
5.4.4. Give a shift-reduce parsing algorithm for BC grammars.
5.4.5. Give an example of a BRC grammar that is not BC.
5.4.6. Show that every uniquely invertible extended precedence grammar is BRC.
5.4.7. Show that every BRC grammar is extended precedence (not necessarily uniquely invertible, of course).
5.4.8. For those grammars of Exercise 5.4.2 which are (1, 1)-BRC, give shift-reduce parsing algorithms and implement them with decision trees.
5.4.9. Prove the "only if" portion of Lemma 5.6.
5.4.10. Prove Theorem 5.22.
5.4.11. Which of the following grammars are simple MSP grammars?
    (a) G₀.
    (b) S → A | B
        A → 0A1 | 01
        B → 2B1 | 1.
    (c) S → A | B
        A → 0A1 | 01
        B → 0B1 | 1.
    (d) S → A | B
        A → 0A1 | 01
        B → 01B1 | 01.
5.4.12. Show that every uniquely invertible weak precedence grammar is a simple MSP grammar.
5.4.13. Is every (m, n; m, n)-MSP grammar an (m, n)-BRC grammar?
5.4.14. Prove Theorem 5.24.
5.4.15. Are the following grammars operator precedence?
    (a) The grammar of Exercise 5.4.2(b).
    (b) S → if B then S else S
        S → if B then S
        S → s
        B → B or b
        B → b.
    (c) S → if B then S₁ else S
        S → if B then S
        S₁ → if B then S₁ else S₁
        S → s
        S₁ → s
        B → B or b
        B → b.
    The intention in (b) and (c) is that the terminal symbols are if, then, else, or, b, and s.

5.4.16. Give the skeletal grammars for the grammars of Exercise 5.4.15.
5.4.17. Give shift-reduce parsing functions for those grammars of Exercise 5.4.15 which are operator precedence.
5.4.18. Prove Theorem 5.25.
5.4.19. Show that the skeletal grammar Gₛ is uniquely invertible for every operator grammar G.
*5.4.20. Show that every operator precedence language has an operator precedence grammar with no single productions.
*5.4.21. Show that every operator precedence language has a uniquely invertible operator precedence grammar.
5.4.22. Give production language parsers for the grammars of Exercise 5.4.2.
5.4.23. Show that every BRC grammar has a Floyd-Evans production language parser.
5.4.24. Show that every LL(k) grammar has a production language parser (generating left parses).
**5.4.25. Show that it is undecidable whether a grammar is
    (a) BRC.
    (b) BC.
    (c) MSP.
5.4.26. Prove that a grammar G = (N, Σ, P, S) is simple MSP if and only if it is a weak precedence grammar and, whenever A → α and B → α are in P with A ≠ B, then l(A) ∩ l(B) = ∅.
*5.4.27. Suppose we relax the condition on an operator grammar that it be proper and have no e-productions. Show that under this new definition, L is an operator precedence language if and only if L − {e} is an operator precedence language under our definition.
5.4.28. Extend Domolki's algorithm as presented in the Exercises of Section 4.1 to carry along information on the pushdown list so that it can be used to parse BRC grammars.

DEFINITION

We can generalize the idea of operator precedence to utilize for our parsing a set of symbols including all the terminals and, perhaps, some of the nonterminals as well. Let G = (N, Σ, P, S) be a proper CFG with no e-productions and T a subset of N ∪ Σ, with Σ ⊆ T. Let V denote N ∪ Σ. We say that G is a T-canonical grammar if
(1) For each right side of a production, say αXYβ, if X is not in T, then Y is in Σ, and
(2) If A is in T and A ⇒ α, then α has a symbol of T.
Thus a Σ-canonical grammar is the same as an operator grammar. If G is a T-canonical grammar, we say that T is a token set for G.

If G is a T-canonical grammar, we define T-canonical precedence relations ⋖, ≐, and ⋗ on T ∪ {$}, as follows:

(1) If there is a production A → αXβYγ, X and Y are in T, and β is either e or in (V − T), then X ≐ Y.
(2) If A → αXBβ is in P and B ⇒+ γYδ, where X and Y are in T and γ is either e or in V − T, then X ⋖ Y.
(3) Let A → α1Bα2Zα3 be in P, where α2 is either e or a symbol of V − T. Suppose that Z ⇒* β1aβ2, where β1 is either e or in V − T, and a is in Σ. (Note that this derivation must be zero steps if α2 ≠ e, by the T-canonical grammar definition.) Suppose also that there is a derivation

    B ⇒ γ1C1δ1 ⇒ γ1γ2C2δ2δ1 ⇒ ··· ⇒ γ1 ··· γk−1Ck−1δk−1 ··· δ1 ⇒ γ1 ··· γkXδk ··· δ1,

where the C's are replaced at each step, the δ's are all in {e} ∪ (V − T), and X is in T. Then we say that X ⋗ a.
(4) If S ⇒* αXβ and α is in {e} ∪ (V − T), then $ ⋖ X. If β is in {e} ∪ (V − T), then X ⋗ $.

Note that if T = Σ, we have defined the operator precedence relations, and if T = V, we have the Wirth-Weber relations.

Example 5.49
Consider G0, with token set Δ = {F, a, (, ), +, *}. We find ( ≐ ), since (E) is a right side, and E is not a token. We have + ⋖ *, since there is a right side E + T, and a derivation T ⇒ T * F, and T is not a token. Also, + ⋗ +, since there is a right side E + T and a derivation E ⇒ E + T, and T is not a token. The Δ-canonical relations for G0 are shown in Fig. 5.23.

[Fig. 5.23  Δ-canonical precedence relations for G0: the matrix of ⋖, ≐, ⋗ over the symbols a, +, *, (, ), F, and $.]
5.4.29. Find all token sets for Go.


5.4.30. Show that Σ is a token set for G = (N, Σ, P, S) if and only if G is an operator precedence grammar. Show that N ∪ Σ is a token set if and only if G is a precedence grammar.
DEFINITION

Let G = (N, Σ, P, S) be a T-canonical grammar. The T-skeletal grammar for G is formed by replacing all instances of symbols in V − T by a new symbol S0 and deleting the production S0 → S0 if it appears.

Example 5.50
Let Δ be as in Example 5.49. The Δ-skeletal grammar for G0 is

    S0 → S0 + S0 | S0 * F | F
    F → (S0) | a   □

5.4.31. Give a shift-reduce parsing algorithm for T-canonical precedence


grammars whose T-skeletal grammar is uniquely invertible. Parses in
the skeletal grammar are produced, of course.

Research Problem
5.4.32. Develop transformations which can be used to make grammars BRC,
simple precedence, or operator precedence.

Programming Exercises
5.4.33. Write a program that tests whether a given grammar is an operator
precedence grammar.
5.4.34. Write a program that constructs an operator precedence parser for an
operator precedence grammar.
5.4.35. Find an operator precedence grammar for one of the languages in the
Appendix and then construct an operator precedence parser for that
language.
5.4.36. Write a program that constructs a bounded-right-context parser for a
grammar G if G is (1, 1)-BRC.
5.4.37. Write a program that constructs a simple mixed strategy precedence
parser for a grammar G if G is simple MSP.
5.4.38. Define a programming language centered around the Floyd-Evans
production language. Construct a compiler for this programming
language.
BIBLIOGRAPHIC NOTES

Various precedence-oriented parsing techniques were employed in the earliest


compilers. The formalization of operator precedence is due to Floyd [1963].
Bounded context and bounded-right-context parsing methods were also defined in
the early 1960's. Most of the early development of bounded context parsing and
variants of it is reported by Eickel et al. [1963], Floyd [1963, 1964a, 1964b],
Graham [1964], Irons [1964], and Paul [1962].
The definition of bounded-right-context grammar here is equivalent to that
given by Floyd [1964a]. An algorithm for constructing parsers for certain classes
of BRC grammars is given by Loeckx [1970]. An extension of Domolki's algorithm
to BRC grammars is given by Wise [1971].
Mixed strategy precedence was introduced by McKeeman and used by
McKeeman et al. [1970] as the basis of the XPL compiler writing system.
Production language was first introduced by Floyd [1961] and later modified
by Evans [1964]. Feldman [1966] used it as the basis of a compiler writing system
called Formal Semantic Language (FSL) by permitting general semantic routines
in the (action) field of each production language statement.
T-canonical precedence was defined by Gray and Harrison [1969]. Example
5.47 is from Hopgood [1969].
6   LIMITED BACKTRACK PARSING ALGORITHMS

In this chapter we shall discuss several parsing algorithms which, like


the general top-down and bottom-up algorithms in Section 4.1, may involve
backtracking. However, in the algorithms of this chapter the amount of
backtracking that can occur is limited. As a consequence, the parsing
algorithms to be presented here are more economical than those in Chapter 4.
Nevertheless, these algorithms should not be used in situations where a deter-
ministic nonbacktracking algorithm will suffice.
In the first section we shall discuss two high-level languages in which top-
down parsing algorithms with restricted backtracking capabilities can be
written. These languages, called TDPL and GTDPL, are capable of specify-
ing recognizers for all deterministic context-free languages with an endmarker
and, because of the restricted backtracking, even some non-context-free
languages, but (probably) not all context-free languages. We shall then
discuss a method of constructing, for a large class of CFG's, precedence-
oriented bottom-up parsing algorithms, which allow a limited amount of
backtracking.

6.1. LIMITED BACKTRACK TOP-DOWN PARSING

In this section we shall define two formalisms for limited backtrack pars-
ing algorithms that create parse trees top-down, exhaustively trying all
alternates for each nonterminal, until one alternate has been found which
derives a prefix of the remaining input. Once such an alternate is found, no
other alternates will be tried. Of course, the "wrong" prefix may have been
found, and in this case the algorithm will not backtrack but will fail. Fortu-


nately, this aspect of the algorithm is rarely a serious problem in practical


situations, provided we order the alternates so that the longest is tried first.
We shall show relationships between the two formalisms, discuss their
implementation, and then treat them briefly as mechanisms which define
classes of languages. We shall discover that the classes of languages defined
are different from the class of CFL's.

6.1.1. TDPL

Consider the general top-down parsing algorithm of Section 4.1. Suppose we decide to generate a string from a nonterminal A and that α1, α2, ..., αn are the alternates for A. Suppose further that in a correct parse of the input, A derives some prefix x of the remaining input, starting with the derivation A ⇒lm αm, 1 ≤ m ≤ n, but that A ⇒lm αj, for j < m, does not lead to a correct parse.
The top-down parsing algorithm in Chapter 4 would try the alternates α1, α2, ..., αm in order. After each αj failed, j < m, the input pointer would be reset, and a new attempt would be made using αj+1. This new attempt would be made regardless of whether αj derived a terminal string which was a prefix of the remaining input.
Here we shall consider a parsing technique in which nonterminals are treated as string-matching procedures. To illustrate this technique, suppose that a1 ··· an is the input string and that we have generated a partial left parse successfully matching the first i − 1 input symbols. If nonterminal A is to be expanded next, then the nonterminal A can be "called" as a procedure, with input position i as an argument. If A derives a terminal string that is a prefix of ai ai+1 ··· an, then A is said to succeed starting at input position i. Otherwise, A fails at position i.
These procedures call themselves recursively. If A were called in this manner, A itself would call the nonterminals of its first alternate, α1. If α1 failed, then A would reset the input pointer to where it was when A was first called, and then A would call α2, and so forth. If αj succeeds in matching ai ai+1 ··· ak, then A returns to the procedure that called it and advances the input pointer to position k + 1.
The difference between the current algorithm and Algorithm 4.1 is that should the latter fail to find a complete parse in which αj derives ai ··· ak, then it will backtrack and try derivations beginning with productions A → αj+1, A → αj+2, and so forth, possibly deriving a different prefix of ai ··· an from A. Our algorithm will not do so. Once it has found that αj derives a prefix of the input and that the subsequent derivation fails to match the input, our parsing algorithm returns to the procedure that called A, reporting failure. The algorithm will act as if A can derive no prefix whatsoever of ai ··· an. Thus our algorithm may miss some parses and may not even recognize
the same language as its underlying CFG defines. We shall therefore not tie
our algorithm to a particular CFG, but will treat it as a formalism for lan-
guage definition and syntactic analysis in its own right.
Let us consider a concrete example. If

    S → Ac
    A → a | ab

are productions and the alternates are taken in the order shown, then the limited backtrack algorithm will not recognize the sentence abc. The nonterminal S called at input position 1 will call A at input position 1. Using the first alternate, A reports success and moves the input pointer to position 2. However, c does not match the second input symbol, so S reports failure starting at input position 1. Since A reported success the first time it was called, it will not be called to try the second alternate. Note that we can avoid this difficulty by writing the alternates for A as

    A → ab | a
We shall now describe the "top-down parsing language," TDPL, which can be used to describe parsing procedures of this nature. A statement (or rule) of TDPL is a string of one of the following forms:

    A → BC/D
or
    A → a

where A, B, C, and D are nonterminal symbols and a is a terminal symbol, the empty string, or a special symbol f (for failure).
DEFINITION

A TDPL program P is a 4-tuple (N, Σ, R, S), where
(1) N and Σ are finite disjoint sets of nonterminals and terminals,
(2) R is a sequence of TDPL statements such that for each A in N there is at most one statement with A to the left of the arrow, and
(3) S in N is the start symbol.
A TDPL program can be likened to a grammar in a special normal form. A statement of the form A → BC/D is representative of the two productions A → BC and A → D, where the former is always to be tried first. A statement of the form A → a represents a production of that form when a ∈ Σ or a = e. If a = f, then the nonterminal A has a special meaning, which will be described later.
Alternatively, we can describe a TDPL program as a set of procedures (the nonterminals) which are called recursively with certain inputs. The outcome of a call will either be failure (no prefix of the input is matched or recognized) or success (some prefix of the input is matched).
The following sequence of procedure calls results from a call of a statement of the form A → BC/D, with input w:

(1) First, A calls B with input w. If w = xx' and B matches x, then B reports success. A then calls C with input x'.
    (a) If x' = yz and C matches y, then C reports success. A then returns success and reports that it has matched the prefix xy of w.
    (b) If C does not match any prefix of x', then C reports failure. A then calls D with input w. Note that the success of B is undone in this case.
(2) If, when A calls B with input w, B cannot match any prefix of w, then B reports failure. A then calls D with input w.
(3) If D has been called with input w = uv and D matches u, a prefix of w, then D reports success. A then returns success and reports that it has matched the prefix u of w.
(4) If D has been called with input w and D cannot match any prefix of w, then D reports failure. A then reports failure.

Note that D gets called unless both B and C succeed. We shall later explore a parsing system in which D is called only if B fails. Note also that if both B and C succeed, then the alternate D can never be called. This feature distinguishes TDPL from the general top-down parsing algorithm of Chapter 4.
The special statements A → a, A → e, and A → f are handled as follows:

(1) If A → a is the rule for A with a ∈ Σ, and A is called on an input beginning with a, then A succeeds and matches this a. Otherwise, A fails.
(2) If A → e is the rule for A, then A succeeds whenever it is called and always matches the empty string.
(3) If A → f is the rule, A fails whenever it is called.

We shall now formalize the notion of a nonterminal "acting on an input string."

DEFINITION

Let P = (N, Σ, R, S) be a TDPL program. We define a set of relations ⇒^n_P, n ≥ 1, from nonterminals to pairs of the form (x ↑ y, r), where x and y are in Σ* and r is either s (for success) or f (for failure). The metasymbol ↑ is used to indicate the position of the current input symbol. We shall drop the subscript P wherever possible.
(1) If A → e is in R, then A ⇒^1 (↑ w, s) for all w ∈ Σ*.
(2) If A → f is in R, then A ⇒^1 (↑ w, f) for all w ∈ Σ*.
(3) If A → a is in R, with a ∈ Σ, then
    (a) A ⇒^1 (a ↑ x, s) for all x ∈ Σ*.
    (b) A ⇒^1 (↑ y, f) for all those y ∈ Σ* (including e) which do not begin with the symbol a.
(4) Let A → BC/D be in R.
    (a) A ⇒^{m+n+1} (xy ↑ z, s) if
        (i) B ⇒^m (x ↑ yz, s) and
        (ii) C ⇒^n (y ↑ z, s).
    (b) A ⇒^i (u ↑ v, s), with i = m + n + p + 1, if
        (i) B ⇒^m (x ↑ y, s),
        (ii) C ⇒^n (↑ y, f), and
        (iii) D ⇒^p (u ↑ v, s), where uv = xy.
    (c) A ⇒^i (↑ xy, f), with i = m + n + p + 1, if
        (i) B ⇒^m (x ↑ y, s),
        (ii) C ⇒^n (↑ y, f), and
        (iii) D ⇒^p (↑ xy, f).
    (d) A ⇒^{m+n+1} (x ↑ y, s) if
        (i) B ⇒^m (↑ xy, f), and
        (ii) D ⇒^n (x ↑ y, s).
    (e) A ⇒^{m+n+1} (↑ x, f) if
        (i) B ⇒^m (↑ x, f), and
        (ii) D ⇒^n (↑ x, f).
(5) The relations ⇒^n do not hold except when required by (1)-(4).

Case (4a) takes care of the case in which B and C both succeed. In (4b) and (4c), B succeeds but C fails. In (4d) and (4e), B fails. In the last four cases, D is called and alternately succeeds and fails. Note that the integer on the arrow indicates the number of "calls" which were made before the outcome is reached. Observe also that if A ⇒^n (x ↑ y, f), then x = e. That is, failure always resets the input pointer to where it was at the beginning of the call.
We define A ⇒ (x ↑ y, r) if and only if A ⇒^n (x ↑ y, r) for some n ≥ 1. The language defined by P, denoted L(P), is {w | S ⇒ (w ↑, s) and w ∈ Σ*}.
Example 6.1
Let P be the TDPL program ({S, A, B, C}, {a, b}, R, S), where R is the sequence of statements

    S → AB/C
    A → a
    B → CB/A
    C → b

Let us investigate the action of P on the input string aba using the relations defined above. To begin, since S → AB/C is the rule for S, S calls A with input aba. A recognizes the first input symbol and returns success. Using part (3) of the previous definition we can write A ⇒ (a ↑ ba, s). Then S calls B with input ba. Since B → CB/A is the rule for B, we must examine the behavior of C on ba. We find that C matches b and returns success. Using (3) we write C ⇒ (b ↑ a, s).
Then B calls itself recursively with input a. However, C fails on a, and so C ⇒ (↑ a, f). B then calls A with input a. Since A matches a, A ⇒ (a ↑, s). Since A succeeds, the second call of B succeeds. Using rule (4d) we write B ⇒ (a ↑, s).
Returning to the first call of B on input ba, both C and B have succeeded, so this call of B succeeds and we can write B ⇒ (ba ↑, s) using rule (4a).
Now returning to the call of S, both A and B have succeeded. Thus, S matches aba and returns success. Using rule (4a) we can write S ⇒ (aba ↑, s). Thus, aba is in L(P).
It is not difficult to show that L(P) = ab*a + b.  □
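The procedural reading of TDPL rules translates almost directly into code. The following is a minimal sketch of a recursive TDPL interpreter in Python; it is our own illustration, not something from the text, and the function names and dictionary encoding of rules are invented here. A nonterminal is a procedure that, given a position in the input, returns the position just past the matched prefix on success, or None on failure, mirroring the limited-backtracking behavior of A → BC/D.

```python
# Sketch of a TDPL interpreter (illustrative only; the encoding is ours).
# Rules: ("seq", B, C, D) encodes A -> BC/D; ("sym", a) encodes A -> a;
# ("empty",) encodes A -> e; ("fail",) encodes A -> f.

def call(rules, A, w, i):
    """Call nonterminal A at position i of w; return the position past the
    matched prefix on success, or None (pointer reset to i) on failure."""
    r = rules[A]
    if r[0] == "empty":
        return i
    if r[0] == "fail":
        return None
    if r[0] == "sym":
        return i + 1 if i < len(w) and w[i] == r[1] else None
    _, B, C, D = r                      # A -> BC/D
    j = call(rules, B, w, i)
    if j is not None:
        k = call(rules, C, w, j)
        if k is not None:
            return k                    # B and C both succeeded
    return call(rules, D, w, i)         # B failed, or C failed after B

# The program of Example 6.1: S -> AB/C, A -> a, B -> CB/A, C -> b.
P = {"S": ("seq", "A", "B", "C"),
     "A": ("sym", "a"),
     "B": ("seq", "C", "B", "A"),
     "C": ("sym", "b")}

accepts = lambda w: call(P, "S", w, 0) == len(w)
assert accepts("aba") and accepts("b") and not accepts("ab")
# A rule such as S -> SS/S would make this naive interpreter recurse forever,
# matching the remark below that such a program has an empty outcome relation.
```

Acceptance requires that the start symbol match the entire input, which is why the sketch compares the returned position with len(w).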

An important property of a TDPL program is that the outcome of any program on a given input is unique. We prove this in the following lemma.

LEMMA 6.1
Suppose that P = (N, Σ, R, S) is a TDPL program such that for some A ∈ N, A ⇒^{n1} (x1 ↑ y1, r1) and A ⇒^{n2} (x2 ↑ y2, r2), where x1y1 = x2y2 = w ∈ Σ*. Then we must have x1 = x2, y1 = y2, and r1 = r2.
Proof. The proof is a simple induction on the minimum of n1 and n2, which we can take without loss of generality to be n1.
Basis. n1 = 1. Then the rule for A is A → a, A → e, or A → f. The conclusion is immediate.
Induction. Assume the conclusion for n < n1, and let n1 > 1. Let the
rule for A be A → BC/D. Suppose that for i = 1 and 2, A ⇒^{ni} (xi ↑ yi, ri) was formed by rule (4) from B ⇒^{mi} (ui ↑ vi, ti) and (possibly) C ⇒^{ki} (ui′ ↑ vi′, ti′) and/or D ⇒^{li} (ui″ ↑ vi″, ti″). Then m1 < n1, so the inductive hypothesis applies to give u1 = u2, v1 = v2, and t1 = t2. Now two cases, depending on the value of t1, have to be considered.

Case 1: t1 = t2 = f. Then since l1 < n1, we have u1″ = u2″, v1″ = v2″, and t1″ = t2″. Since xi = ui″, yi = vi″, and ri = ti″ for i = 1 and 2 in this case, the desired result follows.

Case 2: t1 = t2 = s. Then ui′vi′ = vi for i = 1 and 2. Since k1 < n1, we may conclude that u1′ = u2′, v1′ = v2′, and t1′ = t2′. If t1′ = s, then xi = uiui′, yi = vi′, and ri = s for i = 1 and 2. We reach the desired conclusion. If t1′ = f, the argument proceeds with ui″ and vi″ as in Case 1.  □

It should also be noted that a TDPL program need not have a response to every input. For example, any program having the rule S → SS/S, where S is the start symbol, will not recognize any sentence (that is, the ⇒ relation is empty).
The notation we have used for a TDPL to this point was designed for ease of presentation. In practical situations it is desirable to use more general rules. For this purpose, we now introduce extended TDPL rules and define their meaning in terms of the basic rules:

(1) We take the rule A → BC to stand for the pair of rules A → BC/D and D → f, where D is a new symbol.
(2) We take the rule A → B/C to stand for the pair of rules A → BD/C and D → e.
(3) We take the rule A → B to stand for the rules A → BC and C → e.
(4) We take the rule A → A1A2 ··· An, n > 2, to stand for the set of rules A → A1B1, B1 → A2B2, ..., B_{n−3} → A_{n−2}B_{n−2}, B_{n−2} → A_{n−1}A_n.
(5) We take the rule A → α1/α2/···/αn, where the α's are strings of nonterminals, to stand for the set of rules A → B1/C1, C1 → B2/C2, ..., C_{n−3} → B_{n−2}/C_{n−2}, C_{n−2} → B_{n−1}/B_n, and B1 → α1, B2 → α2, ..., Bn → αn. If n = 2, these rules reduce to A → B1/B2, B1 → α1, and B2 → α2. For 1 ≤ i ≤ n, if |αi| = 1, we can let Bi be αi and eliminate the rule Bi → αi.
(6) We take the rule A → α1/α2/···/αn, where the α's are strings of nonterminals and terminals, to stand for the set of rules A → α1′/α2′/···/αn′ and Xa → a for each terminal a, where αi′ is αi with each terminal a replaced by Xa.

Henceforth we shall allow extended rules of this type in TDPL programs. The definitions above provide a mechanical way of constructing an equivalent TDPL program that meets the original definition.
These extended rules have natural meanings. For example, if A has the rule A → Y1Y2 ··· Yn, then A succeeds if and only if Y1 succeeds at the input position where A is called, Y2 succeeds where Y1 left off, Y3 succeeds where Y2 left off, and so forth.
Likewise, if A has the rule A → α1/α2/···/αn, then A succeeds if and only if α1 succeeds where A is called, or if α1 fails and α2 succeeds where A is called, and so forth.

Example 6.2

Consider the extended TDPL program P = ({E, F, T}, {a, (, ), +, *}, R, E), where R consists of

    E → T+E/T
    T → F*T/F
    F → (E)/a

The reader should convince himself that L(P) = L(G0), where G0 is our standard grammar for arithmetic expressions.
To convert P to standard form, we first apply rule (6), introducing nonterminals Xa, X(, X), X+, and X*. The rules become

    E → TX+E/T
    T → FX*T/F
    F → X(EX)/Xa
    Xa → a
    X( → (
    X) → )
    X+ → +
    X* → *

By rule (5), the first rule is replaced by E → B1/T and B1 → TX+E. By rule (4), B1 → TX+E is replaced by B1 → TB2 and B2 → X+E. Then these are replaced by B1 → TB2/D, B2 → X+E/D, and D → f. Rule E → B1/T is replaced by E → B1C/T and C → e. The entire set of rules constructed is

    Xa → a      D → f          B4 → X*T/D
    X( → (      E → B1C/T      F → B5C/Xa
    X) → )      B1 → TB2/D     B5 → X(B6/D
    X+ → +      B2 → X+E/D     B6 → EX)/D
    X* → *      T → B3C/F
    C → e       B3 → FB4/D

We have simplified the rules by identifying each nonterminal X that has rule X → e with C and each nonterminal Y that has rule Y → f with D.
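Example 6.2 carries out the conversion by hand; the construction is mechanical enough to automate. Below is a rough Python sketch, entirely our own (the name expand and the auxiliary-nonterminal naming are invented and do not reproduce the book's naming), that follows the spirit of extended rules (1)-(6): it turns rules of the form A → α1/α2/···/αn, each αi a string of terminals and nonterminals, into basic rules of the forms A → BC/D, A → a, A → e, and A → f.

```python
# Sketch: expanding extended TDPL rules into basic ones (names are ours).
from itertools import count

def expand(extended, terminals):
    basic = {"E!": ("empty",), "F!": ("fail",)}   # shared e- and f-nonterminals
    fresh = count(1)
    new = lambda p: f"{p}{next(fresh)}"

    def sym(x):                       # rule (6): terminal a gets a nonterminal X_a -> a
        if x in terminals:
            basic["X_" + x] = ("sym", x)
            return "X_" + x
        return x

    def seq(name, alpha):             # rules (1), (3), (4): name matches the string alpha
        alpha = [sym(x) for x in alpha]
        while len(alpha) > 2:         # group from the right, as in rule (4)
            b = new("B")
            basic[b] = ("seq", alpha[-2], alpha[-1], "F!")
            alpha[-2:] = [b]
        if len(alpha) == 1:
            alpha.append("E!")
        basic[name] = ("seq", alpha[0], alpha[1], "F!")

    def alts(name, alternates):       # rules (2), (5): try alternates in order
        b = new("B")
        seq(b, alternates[0])
        if len(alternates) == 1:
            basic[name] = ("seq", b, "E!", "F!")
        else:
            c = new("C")
            basic[name] = ("seq", b, "E!", c)   # name -> B e / C
            alts(c, alternates[1:])

    for nt, alternates in extended.items():
        alts(nt, alternates)
    return basic

# The extended program of Example 6.2.
ext = {"E": [["T", "+", "E"], ["T"]],
       "T": [["F", "*", "T"], ["F"]],
       "F": [["(", "E", ")"], ["a"]]}
rules = expand(ext, {"a", "(", ")", "+", "*"})
```

The resulting dictionary uses the same rule format as the interpreter sketched after Example 6.1, so the two sketches can be combined to run extended programs directly.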

Whenever a TDPL program recognizes a sentence w, we can build a "parse tree" for that sentence top-down by tracing through the sequence of TDPL statements executed in recognizing w. The interior nodes of this parse tree correspond to nonterminals which are called during the execution of the program and which report success.

ALGORITHM 6.1
Derivation tree from the execution of a TDPL program.
Input. A TDPL program P = (N, Σ, R, S) and a sentence w in Σ* such that S ⇒ (w ↑, s).
Output. A derivation tree for w.
Method. The heart of the algorithm is a recursive routine buildtree which takes as argument a statement of the form A ⇒^m (x ↑ y, s) and builds a tree whose root is labeled A and whose frontier is x. Routine buildtree is initially called with the statement S ⇒ (w ↑, s) as argument.
Routine buildtree: Let A ⇒^m (x ↑ y, s) be the input to the routine.
(1) If the rule for A is A → a or A → e, then create a node with label A and one direct descendant, labeled a or e, respectively. Halt.
(2) If the rule for A is A → BC/D and we can write x = x1x2 such that B ⇒^{m1} (x1 ↑ x2y, s) and C ⇒^{m2} (x2 ↑ y, s), create a node labeled A. Execute routine buildtree with argument B ⇒^{m1} (x1 ↑ x2y, s), and then with argument C ⇒^{m2} (x2 ↑ y, s). Attach the resulting trees to the node labeled A, so that the roots of the trees resulting from the first and second calls are the left and right direct descendants of the node. Halt.
(3) If the rule for A is A → BC/D but (2) does not hold, then it must be that D ⇒^{m3} (x ↑ y, s). Call routine buildtree with this argument and make the root of the resulting tree the lone direct descendant of a created node labeled A. Halt.

Note that routine buildtree calls itself recursively only with smaller values of m, so Algorithm 6.1 must terminate.

Example 6.3

Let us use Algorithm 6.1 to construct a parse tree generated by the TDPL program P of Example 6.1 for the input sentence aba.
We initially call routine buildtree with the statement S ⇒ (aba ↑, s) as argument. The rule S → AB/C succeeds because A and B each succeed, recognizing a and ba, respectively. We then call routine buildtree twice, first with argument A ⇒ (a ↑ ba, s) and then with argument B ⇒ (ba ↑, s). Thus the tree begins as shown in Fig. 6.1(a). A succeeds directly on a, so the node labeled A is given one descendant labeled a. B succeeds because its rule is B → CB/A and C and B succeed on b and a, respectively. Thus the tree grows to that in Fig. 6.1(b).

[Fig. 6.1  Construction of the parse tree in TDPL, shown in three stages (a), (b), and (c).]

C succeeds directly on b, so the node labeled C gets one descendant labeled b. B succeeds on a because A succeeds, so B gets a descendant labeled A, and that node gets a descendant labeled a. The entire tree is shown in Fig. 6.1(c).  □

We note that if the set of TDPL rules is treated as a parsing program, then whenever a nonterminal succeeds, a translation (which may later have to be "canceled") can be produced for the portion of input that it recognizes, in terms of the translations of its "descendants" in the sense of the parse tree just described. This method of translation is similar to the syntax-directed translation for context-free grammars, and we shall have more to say about this in Chapter 9.

It is impossible to "prove" Algorithm 6.1 correct, since, as we mentioned, it itself serves as the constructive definition of a parse tree for a TDPL program. However, it is straightforward to show that the frontier of the parse tree is the input. One shows by induction on the number of calls of routine buildtree that when called with argument A ⇒ (x ↑ y, s) the result is a tree with frontier x.
Some of the successful outcomes will be "canceled." That is, if we have A ⇒ (xy ↑ z, s) and the rule for A is A → BC/D, we may find that B ⇒ (x ↑ yz, s) but that C ⇒ (↑ yz, f). Then the relationship B ⇒ (x ↑ yz, s) and all the successful recognitions that served to build up B ⇒ (x ↑ yz, s) are not really involved in defining the parse tree for the input (except in a negative way). These successful recognitions are not reflected in the parse tree, nor is routine buildtree called with argument B ⇒ (x ↑ yz, s). But all those successful recognitions which ultimately contribute to the successful recognition of the input are included.

6.1.2. TDPL and Deterministic Context-Free Languages

It can be quite difficult to determine what language is defined by a TDPL program. To get some feel for the power of TDPL programs, we shall prove that every deterministic CFL with an endmarker has a TDPL program recognizing it. Moreover, the parse trees for that TDPL program are closely related to the parse trees from the "natural" CFG constructed by Lemma 2.26 from a DPDA for the language. The following lemma will be used to simplify the representation of a DPDA in this section.

LEMMA 6.2

If L = Le(M1) for DPDA M1, then L = Le(M) for some DPDA M which never increases the length of its pushdown list by more than 1 on any single move.
Proof. A proof was requested in Exercise 2.5.3. Informally, replace a move which rewrites Z as X1 ··· Xk, k > 2, by moves which replace Z by YkXk, Yk by Yk−1Xk−1, ..., Y4 by Y3X3, and Y3 by X1X2. The Y's are new pushdown symbols and the pushdown top is on the left. □

LEMMA 6.3

If L is a deterministic CFL and $ is a new symbol, then L$ = Le(M) for some DPDA M.
Proof. The proof is a simple extension of Lemma 2.22. Let L = L(M1). Then M simulates M1, keeping the next input symbol in its finite control. M erases its pushdown list if M1 enters a final state and $ is the next input symbol. No other moves are possible or needed, so M is deterministic. □
THEOREM 6.1

Let M = (Q, Σ, Γ, δ, q0, Z0, F) be a DPDA with Le(M) = L. Then there exists a TDPL program P such that L = L(P).
Proof. Assume that M satisfies Lemma 6.2. We construct P = (N, Σ, R, S), an extended TDPL program, such that L(P) = Le(M). N consists of
(1) The symbol S;
(2) Symbols of the form [qZp], where q and p are in Q, and Z ∈ Γ; and
(3) Symbols of the form [qZp]_a, where q, Z, and p are as in (2), and a ∈ Σ.
The intention is that a call of the nonterminal [qZp] will succeed and recognize string w if and only if (q, w, Z) ⊢+ (p, e, e), and [qZp] will fail under all other conditions, including the case where (q, w, Z) ⊢+ (p′, e, e) for some p′ ≠ p. A call of the nonterminal [qZp]_a recognizes a string w if and only if (q, aw, Z) ⊢+ (p, e, e). The rules of P are defined as follows:
(1) The rule for S is S → [q0Z0q0]/[q0Z0q1]/···/[q0Z0qk], where q0, q1, ..., qk are all the states in Q.
(2) If δ(q, e, Z) = (p, e), then the rule for [qZp] is [qZp] → e, and for all p′ ≠ p, [qZp′] → f is a rule.
(3) If δ(q, e, Z) = (p, X), then the rule for [qZr] is [qZr] → [pXr] for all r in Q.
(4) If δ(q, e, Z) = (p, XY), then for each r in Q, the rule for [qZr] is [qZr] → [pXq0][q0Yr]/[pXq1][q1Yr]/···/[pXqk][qkYr], where q0, q1, ..., qk are all the states in Q.
(5) If δ(q, e, Z) is empty, let a1, ..., al be the symbols in Σ for which δ(q, a, Z) ≠ ∅. Then for r ∈ Q, the rule for nonterminal [qZr] is [qZr] → a1[qZr]_{a1}/a2[qZr]_{a2}/···/al[qZr]_{al}. If l = 0, the rule is [qZr] → f.
(6) If δ(q, a, Z) = (p, e) for a ∈ Σ, then we have rule [qZp]_a → e, and for p′ ≠ p, we have rule [qZp′]_a → f.
(7) If δ(q, a, Z) = (p, X), then for each r ∈ Q, we have a rule of the form [qZr]_a → [pXr].
(8) If δ(q, a, Z) = (p, XY), then for each r ∈ Q, we have the rule [qZr]_a → [pXq0][q0Yr]/[pXq1][q1Yr]/···/[pXqk][qkYr].
We observe that because M is deterministic, these definitions are consistent; no member of N has more than one rule. We shall now show the following:

(6.1.1) [qZp] ⇒ (w ↑ x, s), for any x, if and only if (q, wx, Z) ⊢+ (p, x, e)

(6.1.2) If (q, wx, Z) ⊢+ (p, x, e), then for all p′ ≠ p, [qZp′] ⇒ (↑ wx, f)

We shall prove (6.1.2) and the "if" portion of (6.1.1) simultaneously by induction on the number of moves made by M in going from configuration (q, wx, Z) to (p, x, e). If one move is made, then δ(q, a, Z) = (p, e), where a is e or the first symbol of wx. By rule (2) or rules (5) and (6), [qZp] ⇒ (a ↑ y, s), where ay = wx, and [qZp′] ⇒ (↑ wx, f) for all p′ ≠ p. Thus the basis is proved.
Suppose that the result is true for numbers of moves fewer than the number required to go from configuration (q, wx, Z) to (p, x, e). Let w = aw′ for some a ∈ Σ ∪ {e}. There are two cases to consider.

Case 1: The first move is (q, wx, Z) ⊢ (q′, w′x, X) for some X in Γ. By the inductive hypothesis, [q′Xp] ⇒ (w′ ↑ x, s) and [q′Xp′] ⇒ (↑ w′x, f) for p′ ≠ p. Thus by rule (3) or rules (5) and (7), we have [qZp] ⇒ (w ↑ x, s) and [qZp′] ⇒ (↑ wx, f) for all p′ ≠ p. The extended rules of P should first be translated into rules of the original type to prove these contentions rigorously.
Case 2: For some X and Y in Γ, we have, assuming that w′ = yw″, (q, wx, Z) ⊢ (q′, w′x, XY) ⊢+ (q″, w″x, Y) ⊢+ (p, x, e), where the pushdown list always has at least two symbols between configurations (q′, w′x, XY) and (q″, w″x, Y). By the inductive hypothesis, [q′Xq″] ⇒ (y ↑ w″x, s) and [q″Yp] ⇒ (w″ ↑ x, s). Also, if p′ ≠ q″, then [q′Xp′] ⇒ (↑ w′x, f). Suppose first that a = e. If we examine rule (4) and use the definition of the extended TDPL statements, we see that every sequence of the form [q′Xp′][p′Yp] fails for p′ ≠ q″. However, [q′Xq″][q″Yp] succeeds, and so [qZp] ⇒ (w ↑ x, s) as desired.
We further note that if p′ ≠ p, then [q″Yp′] ⇒ (↑ w″x, f), so that all terms [q′Xp″][p″Yp′] fail. (If p″ ≠ q″, then [q′Xp″] fails, and if p″ = q″, then [p″Yp′] fails.) Thus, [qZp′] ⇒ (↑ wx, f) for p′ ≠ p. The case in which a ∈ Σ is handled similarly, using rules (5) and (8).
We must now show the "only if" portion of (6.1.1). If [qZp] ⇒ (w ↑ x, s), then [qZp] ⇒^n (w ↑ x, s) for some n.† We prove the result by induction on n. If n = 1, then rule (2) must have been used, and the result is elementary. Suppose that it is true for n < n0, and let [qZp] ⇒^{n0} (w ↑ x, s).

Case 1: The rule for [qZp] is [qZp] → [q′Xp]. Then δ(q, e, Z) = (q′, X), and [q′Xp] ⇒^{n1} (w ↑ x, s), where n1 < n0. By the inductive hypothesis, (q′, wx, X) ⊢+ (p, x, e). Thus, (q, wx, Z) ⊢+ (p, x, e).
Case 2: The rule for [qZp] is [qZp] → [q′Xq0][q0Yp]/···/[q′Xqk][qkYp]. Then we can write w = w′w″ such that for some p′, [q′Xp′] ⇒^{n1} (w′ ↑ w″x, s) and [p′Yp] ⇒^{n2} (w″ ↑ x, s), where n1 and n2 are less than n0. By hypothesis, (q′, w′w″x, XY) ⊢+ (p′, w″x, Y) ⊢+ (p, x, e). By rule (4), δ(q, e, Z) = (q′, XY). Thus, (q, wx, Z) ⊢+ (p, x, e).

†The step counting must be performed by converting the extended rules to the original form.
Case 3: The rule for [qZp] is defined by rule (5). That is, δ(q, e, Z) = ∅. Then it is not possible that w = e, so let w = aw′. If the rule for nonterminal [qZp]_a is [qZp]_a → e, we know that δ(q, a, Z) = (p, e), so w′ = e, w = a, and (q, w, Z) ⊢ (p, e, e). The situations in which the rule for [qZp]_a is defined by (7) or (8) are handled analogously to Cases 1 and 2, respectively. We omit these considerations.
To complete the proof of the theorem, we note that S ⇒ (w ↑, s) if and only if for some p, [q0Z0p] ⇒ (w ↑, s). By (6.1.1), [q0Z0p] ⇒ (w ↑, s) if and only if (q0, w, Z0) ⊢+ (p, e, e). Thus, L(P) = Le(M).

COROLLARY

If L is a deterministic CFL and $ is a new symbol, then L$ is a TDPL language.
Proof. From Lemma 6.3. □

6.1.3. A Generalization of TDPL

We note that if we have a statement A → BC/D in TDPL, then D is called if either B or C fails. There is no way to cause the flow of control to differ in the cases in which B fails or in which B succeeds and C fails. To overcome this defect we shall define another parsing language, which we call GTDPL (generalized TDPL). A program in GTDPL consists of a sequence of statements of one of the forms
(1) A → B[C, D]
(2) A → a
(3) A → e
(4) A → f
The intuitive meaning of the statement A → B[C, D] is that if A is called, it calls B. If B succeeds, C is called. If B fails, D is called at the point on the input where A was called. The outcome of A is the outcome of C or D, whichever gets called. Note that this arrangement differs from that of the TDPL statement A → BC/D, where D gets called if B succeeds but C fails. Statements of types (2), (3), and (4) have the same meaning as in TDPL.
We formalize the meaning of GTDPL programs as follows.
DEFINITION

A GTDPL program is a 4-tuple P = (N, Σ, R, S), where N, Σ, and S are as for a TDPL program and R is a list of rules of the forms A → B[C, D], A → a, or A → f. Here, A, B, C, and D are in N, a is in Σ ∪ {e}, and f is the failure metasymbol, as in the TDPL program. There is at most one rule with any particular A to the left of the arrow.
We define relations ⇒^n as for the TDPL program:

(1) If A has rule A → a, for a in Σ ∪ {e}, then A ⇒^1 (a ↑ w, s) for all w ∈ Σ*, and A ⇒^1 (↑ w, f) for all w ∈ Σ* which do not have prefix a.
(2) If A has rule A → f, then A ⇒^1 (↑ w, f) for all w ∈ Σ*.
(3) If A has rule A → B[C, D], then the following hold:
    (a) If B ⇒^m (w ↑ xy, s) and C ⇒^n (x ↑ y, s), then A ⇒^{m+n+1} (wx ↑ y, s).
    (b) If B ⇒^m (w ↑ x, s) and C ⇒^n (↑ x, f), then A ⇒^{m+n+1} (↑ wx, f).
    (c) If B ⇒^m (↑ wx, f) and D ⇒^n (w ↑ x, s), then A ⇒^{m+n+1} (w ↑ x, s).
    (d) If B ⇒^m (↑ w, f) and D ⇒^n (↑ w, f), then A ⇒^{m+n+1} (↑ w, f).

We say that A ⇒ (x ↑ y, r) if A ⇒^n (x ↑ y, r) for some n ≥ 1. The language defined by P, denoted L(P), is the set {w | S ⇒ (w ↑, s)}.
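The four cases of rule (3) can again be read as a procedure. Here is a minimal Python sketch of the A → B[C, D] control flow (our own illustration and encoding, not notation from the book). The contrast with the TDPL sketch given earlier is that D is consulted only when B fails; a failure of C after a success of B makes A fail outright.

```python
# Sketch of a GTDPL interpreter (illustrative; the encoding is ours).
# Rules: ("br", B, C, D) encodes A -> B[C, D]; ("sym", a), ("empty",),
# and ("fail",) are as in the TDPL sketch.

def call(rules, A, w, i):
    """Return the position past the matched prefix, or None on failure."""
    r = rules[A]
    if r[0] == "empty":
        return i
    if r[0] == "fail":
        return None
    if r[0] == "sym":
        return i + 1 if i < len(w) and w[i] == r[1] else None
    _, B, C, D = r                       # A -> B[C, D]
    j = call(rules, B, w, i)
    if j is None:
        return call(rules, D, w, i)      # B failed: D decides, back at position i
    return call(rules, C, w, j)          # B succeeded: C decides; D is never tried

# The program of Example 6.4 below: S -> A[C, E], C -> S[B, E], A -> a, B -> b, E -> e.
P = {"S": ("br", "A", "C", "E"), "C": ("br", "S", "B", "E"),
     "A": ("sym", "a"), "B": ("sym", "b"), "E": ("empty",)}
assert call(P, "S", "aabb", 0) == 4 and call(P, "S", "ab", 0) == 2
assert call(P, "S", "aab", 0) is None   # commits to A, then C fails; no retry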

Example 6.4

Let P be a GTDPL program with rules

    S → A[C, E]
    C → S[B, E]
    A → a
    B → b
    E → e

We claim that P recognizes {aⁿbⁿ | n ≥ 0}. We can show by simultaneous induction on n that S ⇒ (aⁿbⁿ ↑ x, s) and C ⇒ (aⁿbⁿ⁺¹ ↑ x, s) for all x and n. For example, with input aabb, we make the following sequence of observations:

    A ⇒ (↑ bb, f)
    E ⇒ (↑ bb, s)
    S ⇒ (↑ bb, s)
    B ⇒ (b ↑ b, s)
    C ⇒ (b ↑ b, s)
    A ⇒ (a ↑ bb, s)
    S ⇒ (ab ↑ b, s)
    B ⇒ (b ↑, s)
    C ⇒ (abb ↑, s)
    A ⇒ (a ↑ abb, s)
    S ⇒ (aabb ↑, s)   □

The next example is a GTDPL program that defines a non-context-free language.

Example 6.5
We construct a GTDPL program to recognize the non-CFL {0ⁿ1ⁿ2ⁿ | n ≥ 1}.
By Example 6.4, we know how to write rules that check whether the string at a certain point begins with 0ⁿ1ⁿ or 1ⁿ2ⁿ. Our strategy will be to first check that the input has a prefix of the form 0^m 1^m 2 for m ≥ 0. If not, we shall arrange it so that acceptance cannot occur. If so, we shall arrange an intermediate failure outcome that causes the input to be reconsidered from the beginning. We shall then check that the input is of the form 0^i 1^j 2^j. Thus both tests will be met if and only if the input is of the form 0ⁿ1ⁿ2ⁿ for n ≥ 1.
We shall need nonterminals that recognize a single terminal or cause immediate success or failure; let us list them first:

    (1) X → 0
    (2) Y → 1
    (3) Z → 2
    (4) E → e
    (5) F → f

We can utilize the program of Example 6.4 to recognize 0^m 1^m 2 by a nonterminal S1. The rules associated with S1 are

    (6) S1 → A[Z, Z]
    (7) A → X[B, E]
    (8) B → A[Y, E]

Rules (7), (8), (1), (2), and (4) correspond to those of Example 6.4 exactly. Rule (6) assures that S1 will recognize what A recognizes (0^m 1^m), followed by 2. Note that A always succeeds, so the rule for S1 could be S1 → A[Z, W] for any W.
Next, we must write rules that recognize 0* followed by 1^j 2^j for some j. The following rules suffice:

    (9) S2 → X[S2, C]
    (10) C → Y[D, E]
    (11) D → C[Z, E]

Rules (10), (11), (2), (3), and (4) correspond to Example 6.4, and C recognizes 1^j 2^j. The rule for S2 works as follows. As long as there is a prefix of 0's on the input, S2 recognizes one of them and calls itself further along the input. When X fails, i.e., the input pointer has shifted over the 0's, C is called and recognizes a prefix of the form 1^j 2^j. Note that C always succeeds, so S2 always succeeds.
We must now put the subprograms for S1 and S2 together. We first create a nonterminal S3, which never consumes any input, but succeeds or fails as S1 fails or succeeds. The rule for S3 is

    (12) S3 → S1[F, E]

Note that if S1 succeeds, S3 will call F, which must fail and retract the input pointer to the place where S1 was called. If S1 fails, S3 calls E, which succeeds. Thus, S3 uses no input in any case. Now we can let S be the start symbol, with rule

    (13) S → S3[F, S2]

If S1 succeeds, then S3 fails and S2 is called at the beginning of the input. Thus, S succeeds whenever S1 and S2 succeed on the same input. If S1 fails, then S3 succeeds, so S fails. If S1 succeeds but S2 fails, then S also fails. Thus the program recognizes {0ⁿ1ⁿ2ⁿ | n ≥ 1}, which is the intersection of the sets recognized by S1 and S2. Hence there are languages which are not context-free which can be defined by GTDPL programs. The same is true for TDPL programs, incidentally. (See Exercise 6.1.1.)  □

We shall now investigate some properties of GTDPL programs. The following lemma is analogous to Lemma 6.1.

LEMMA 6.4

Let P = (N, Σ, R, S) be any GTDPL program. If A ⇒ (x ↑ y, r1) and A ⇒ (u ↑ v, r2), where xy = uv, then x = u, y = v, and r1 = r2.
Proof. Exercise. □

We now establish two theorems about GTDPL programs. First, the class of TDPL definable languages is contained in the class of GTDPL definable languages. Second, every language defined by a GTDPL program can be recognized in linear time on a reasonable random access machine.
THEOREM 6.2

Every TDPL definable language is a GTDPL definable language.

Proof. Let L = L(P) for a TDPL program P = (N, Σ, R, S). We define the GTDPL program P′ = (N′, Σ, R′, S), where R′ is defined as follows:
(1) If A → e is in R, add A → e to R′.
(2) If A → a is in R, add A → a to R′.
(3) If A → f is in R, add A → f to R′.
(4) Create nonterminals E and F and add rules E → e and F → f to R′. (Note that other nonterminals with the same rules can be identified with these.)
(5) If A → BC/D is in R, add

    A → A′[E, D]
    A′ → B[C, F]

to R′, where A′ is a new nonterminal.

Let N′ be N together with all new nonterminals introduced in the construction of R′.
It is elementary to observe that if B and C succeed, then A′ succeeds, and that otherwise A′ fails. Thus, A succeeds if and only if A′ succeeds (i.e., B and C succeed), or A′ fails (i.e., B fails or B succeeds and C fails) and D succeeds. It is also easy to check that B, C, and D are called at the same points on the input by R′ that they are called by R. Since R′ simulates each rule of R directly, we conclude that S ⇒_P (w ↑, s) if and only if S ⇒_P′ (w ↑, s), and L(P) = L(P′). □

Seemingly, the GTDPL programs do more in the way of recognition than TDPL programs. For example, a GTDPL program can readily be written to simulate a statement of the form

    A → BC/(D1, D2)

in which D1 is to be called if B fails and D2 is to be called if B succeeds and C fails. It is open, however, whether the containment of Theorem 6.2 is proper.
As with TDPL, we can embellish GTDPL with extended statements (see Exercise 6.1.12). For example, every extended TDPL statement can be regarded as an extended form of GTDPL statement.

6.1.4. Time Complexity of GTDPL Languages

The main result of this section is that we can simulate the successful recognition of an input sentence by a GTDPL program (and hence a TDPL program) in linear time on a machine that resembles a random access computer. The algorithm recalls both the Cocke-Younger-Kasami algorithm and Earley's algorithm of Section 4.2, and works backward on the input string.
ALGORITHM 6.2
Recognition of GTDPL languages in linear time.
Input. A GTDPL program P = (N, Σ, R, S), with N = {A1, A2, ..., Ak} and S = A1, and an input string w = a1a2 ··· an in Σ*. We assume that a_{n+1} = $, a right endmarker.
Output. A k × (n + 1) matrix [t_ij]. Each entry is either undefined, an integer m such that 0 ≤ m ≤ n, or the failure symbol f. If t_ij = m, then Ai ⇒ (aj a_{j+1} ··· a_{j+m−1} ↑ a_{j+m} ··· an, s). If t_ij = f, then Ai ⇒ (↑ aj ··· an, f). Otherwise, t_ij is undefined.
Method. We compute the matrix of t_ij's as follows. Initially, all entries are undefined.
(1) Do steps (2)-(4) for j = n + 1, n, ..., 1.
(2) For each i, 1 ≤ i ≤ k, if Ai → f is in R, set t_ij = f. If Ai → e is in R, set t_ij = 0. If Ai → aj is in R, set t_ij = 1, and if Ai → b is in R, b ≠ aj, set t_ij = f. (We take a_{n+1} to be a symbol not in Σ, so Ai → a_{n+1} is never in R.)
(3) Do step (4) repeatedly for i = 1, 2, ..., k, until no changes to the t_ij's occur in a step.
(4) Let the rule for Ai be of the form Ai → Ap[Aq, Ar], and suppose that t_ij is not yet defined.
    (a) If t_pj = f and t_rj = x, then set t_ij = x, where x is an integer or f.
    (b) If t_pj = m1 and t_{q,j+m1} = m2 ≠ f, set t_ij = m1 + m2.
    (c) If t_pj = m1 and t_{q,j+m1} = f, set t_ij = f.
In all other cases do nothing to t_ij.  □

THEOREM 6.3

Algorithm 6.2 correctly determines the t_ij's.

Proof. We claim that after execution of Algorithm 6.2 on the input string w = a1 ··· an, t_ij = f if and only if Ai ⇒ (↑ aj ··· an, f) and t_ij = m if and only if Ai ⇒ (aj ··· a_{j+m−1} ↑ a_{j+m} ··· an, s). A straightforward induction on the order in which the t_ij's are computed shows that whenever t_ij is given a value, that value is as stated above. Conversely, an induction on l shows that if Ai ⇒^l (↑ aj ··· an, f) or Ai ⇒^l (aj ··· a_{j+m−1} ↑ a_{j+m} ··· an, s), then t_ij is given the value f or m, respectively. Entry t_ij is left undefined if Ai called at position j does not halt. The details are left for the Exercises. □

Note that a1 ··· an is in L(P) if and only if t_{1,1} = n.
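As a concrete rendering of Algorithm 6.2, here is a short Python sketch (our own; the rule encoding and names are invented). It fills the table column by column from the right, iterating step (4) until the column stabilizes, and recognizes w exactly when the start symbol's entry in column 1 equals |w|.

```python
# Sketch of Algorithm 6.2 (illustrative; the encoding is ours).
# Rules: ("br", B, C, D) for A -> B[C, D]; ("sym", a); ("empty",); ("fail",).
FAIL = "f"

def gtdpl_table(rules, order, w):
    """order lists the nonterminals A1, ..., Ak; returns {(A, j): entry}
    for 1 <= j <= len(w) + 1, with entries an integer length or FAIL."""
    n = len(w)
    t = {}
    for j in range(n + 1, 0, -1):                  # step (1): right to left
        for A in order:                            # step (2): basic rules
            r = rules[A]
            if r[0] == "fail":
                t[A, j] = FAIL
            elif r[0] == "empty":
                t[A, j] = 0
            elif r[0] == "sym":
                t[A, j] = 1 if j <= n and w[j - 1] == r[1] else FAIL
        changed = True
        while changed:                             # step (3): iterate step (4)
            changed = False
            for A in order:
                r = rules[A]
                if r[0] != "br" or (A, j) in t:
                    continue
                _, B, C, D = r
                if t.get((B, j)) == FAIL and (D, j) in t:
                    t[A, j] = t[D, j]              # case (a)
                elif isinstance(t.get((B, j)), int):
                    m1 = t[B, j]
                    sub = t.get((C, j + m1))
                    if isinstance(sub, int):
                        t[A, j] = m1 + sub         # case (b)
                    elif sub == FAIL:
                        t[A, j] = FAIL             # case (c)
                changed |= (A, j) in t
        # entries may stay undefined, e.g. for a call that never halts
    return t

def recognizes(rules, order, start, w):
    return gtdpl_table(rules, order, w).get((start, 1)) == len(w)

# The program of Example 6.4 accepts a^n b^n:
P = {"S": ("br", "A", "C", "E"), "C": ("br", "S", "B", "E"),
     "A": ("sym", "a"), "B": ("sym", "b"), "E": ("empty",)}
assert recognizes(P, ["S", "C", "A", "B", "E"], "S", "aaabbb")
```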


THEOREM 6.4

For each GTDPL program there is a constant c such that Algorithm 6.2 takes no more than cn elementary steps on an input string of length n ≥ 1, where elementary steps are of the type used for Algorithm 4.3.
Proof. The crux of the proof is to observe that in step (3) we cycle through all the nonterminals at most k times for any given j. □

Last we observe that from the matrix in Algorithm 6.2 it is possible to


build a tree-like parse structure for accepted inputs, similar to the structure
that was built in Algorithm 6.1 for TDPL programs. Additionally, Algorithm
6.2 can be modified to recognize and "parse" according to a TDPL (rather
than GTDPL) program.

Example 6.6
Let P = (N, Σ, R1, E), where

    N = {E, E+, T, T*, F, F′, X, Y, P, M, A, L, R},

Σ = {a, (, ), +, *}, and R1 consists of

    (1) E → T[E+, X]
    (2) E+ → P[E, Y]
    (3) T → F[T*, X]
    (4) T* → M[T, Y]
    (5) F → L[F′, A]
    (6) F′ → E[R, X]
    (7) X → f
    (8) Y → e
    (9) P → +
    (10) M → *
    (11) A → a
    (12) L → (
    (13) R → )

This GTDPL program is intended to recognize arithmetic expressions over + and *, i.e., L(G0). E recognizes an expression consisting of a sequence of terms (T's) separated by +'s. The nonterminal E+ is intended to recognize an expression with the first term deleted. Thus rule (2) says that E+ recognizes a + sign (P) followed by any expression, and if there is no + sign, the empty string (Y) serves. Then we can interpret statement (1) as saying that an expression is a term followed by something recognized by E+, consisting of either the empty string or an alternating sequence of +'s and terms beginning with + and ending in a term. A similar relation applies to statements (3) and (4).
Statements (5) and (6) say that a factor (F) is either ( followed by an expression followed by ) or, if no ( is present, a single symbol a.
Now, suppose that (a + a) * a is the input to Algorithm 6.2. The matrix [t_ij] constructed by Algorithm 6.2 is shown in Fig. 6.2.
Let us compute the entries in the eighth column of the matrix. The entries for P, M, A, L, and R have value f, since they look for input symbols in Σ and the eighth input symbol is the right endmarker. X always yields value f, and Y always yields value 0. Applying step (3) of Algorithm 6.2, we find that in the first cycle through step (4) the values for E+, T*, and F can be filled in and are 0, 0, and f, respectively. On the second cycle, T is given the value f. The values for E and F′ can be computed on the third cycle.

         (   a   +   a   )   *   a   $

    E    7   3   f   1   f   f   1   f
    E+   0   0   2   0   0   0   0   0
    T    7   1   f   1   f   f   1   f
    T*   0   0   0   0   0   2   0   0
    F    5   1   f   1   f   f   1   f
    F′   f   4   f   2   f   f   f   f
    X    f   f   f   f   f   f   f   f
    Y    0   0   0   0   0   0   0   0
    P    f   f   1   f   f   f   f   f
    M    f   f   f   f   f   1   f   f
    A    f   1   f   1   f   f   1   f
    L    1   f   f   f   f   f   f   f
    R    f   f   f   f   1   f   f   f

Fig. 6.2  Recognition table from Algorithm 6.2.

An example of a less trivial computation occurs in column 3. The bottom seven rows are easily filled in by statement (2). Then, by statement (3), since the P entry in column 3 is 1, we examine the E entry in column 4 (= 3 + 1) and find that this is also 1. Thus the E+ entry in column 3 is 2 (= 1 + 1).  □

6.1.5. Implementation of GTDPL Programs

In practice, implementations of GTDPL-like parsing systems do not take the tabular form of Algorithm 6.2. Instead, a trial-and-error method is normally used. In this section we shall construct an automaton that "implements" the recognition portion of a GTDPL program. We shall leave it to the reader to show how this automaton could be extended to a transducer which would indicate the successful sequence of routine calls from which a "parse" or translation can be constructed.
The automaton consists of an input tape with an input pointer, which may be reset; a three-state finite control; and a pushdown list consisting of symbols from a finite alphabet and pointers to the input. The device operates in a way that exactly implements our intuitive idea of routines (nonterminals) calling one another, with the retraction of the input pointer required on each failure. The retraction is to the point at which the input pointer dwelt when the call of the failing routine occurred.

DEFINITION

A parsing machine is a 6-tuple M = (Q, Σ, Γ, δ, begin, Z0), where
(1) Q = {success, failure, begin}.
(2) Σ is a finite set of input symbols.
(3) Γ is a finite set of pushdown symbols.
(4) δ is a mapping from Q × (Σ ∪ {e}) × Γ to Q × Γ*, which is restricted as follows:
    (a) If q is success or failure, then δ(q, a, Z) is undefined if a ∈ Σ, and δ(q, e, Z) is of the form (begin, Y) for some Y ∈ Γ.
    (b) If δ(begin, a, Z) is defined for some a ∈ Σ, then δ(begin, b, Z) is undefined for all b ≠ a in Σ ∪ {e}.
    (c) For a in Σ, δ(begin, a, Z) can only be (success, e) if it is defined.
    (d) δ(begin, e, Z) can only be of the form (begin, YZ), for some Y in Γ, or of the form (q, e), for q = success or failure.
(5) begin is the initial state.
(6) Z0 in Γ is the initial pushdown symbol.
M resembles a pushdown automaton, but there are several major differences. We can think of the elements of Γ as routines that either call or transfer to each other. The pushdown list is used to record recursive calls and the position of the input head each time a call was made. The state begin normally causes a call of another routine, reflected in that if δ(begin, e, Z) = (begin, YZ), where Y is in Γ and Z is on top of the list, then Y will be placed above Z on a new level of the pushdown list. The states success and failure cause transfers to, rather than calls of, another routine. If, for example, δ(success, e, Z) = (begin, Y), then Y merely replaces Z on top of the list.
We formally define the operation of M as follows.
A configuration of M is a triple (q, w ↑ x, γ), where
(1) q is one of success, failure, or begin;
(2) w and x are in Σ*; ↑ is a metasymbol, the input head;
(3) γ is a pushdown list of the form (Z1, i1) ··· (Zm, im), where Zj ∈ Γ and ij is an integer, for 1 ≤ j ≤ m. The top is at the left. The Z's are "routine" calls; the i's are input pointers.
We define the ⊢_M relation, or ⊢ when M is understood, on configurations as follows:
(1) Let δ(begin, e, Z) = (begin, YZ) for Y in Γ. Then

    (begin, w ↑ x, (Z, i)γ) ⊢ (begin, w ↑ x, (Y, j)(Z, i)γ),

where j = |w|. Here, Y is "called," and the position of the input head when Y is called is recorded, along with the entry on the pushdown list for Y.
(2) Let δ(q, e, Z) = (begin, Y), where Y ∈ Γ, and q = success or failure. Then (q, w ↑ x, (Z, i)γ) ⊢ (begin, w ↑ x, (Y, i)γ). Here Z "transfers" to Y. The input position associated with Y is the same as that associated with Z.
(3) Let δ(begin, a, Z) = (q, e) for a in Σ ∪ {e}. If q = success, then (begin, w ↑ ax, (Z, i)γ) ⊢ (success, wa ↑ x, γ). If a is not a prefix of x or q = failure, then (begin, w ↑ x, (Z, i)γ) ⊢ (failure, u ↑ v, γ), where uv = wx and |u| = i. In the latter case the input pointer is retracted to the location given by the pointer on top of the pushdown list.
Note that if δ(begin, a, Z) = (success, e), then the next state of the parsing machine is success if the unexpended input string begins with a and failure otherwise.
Let ⊢+ be the transitive closure of ⊢. The language defined by M, denoted L(M), is {w | w is in Σ* and (begin, ↑ w, (Z0, 0)) ⊢+ (success, w ↑, e)}.
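The three clauses of the ⊢ relation can be simulated directly. The sketch below is our own Python illustration (the encoding of δ as a dictionary and all names are invented); it keeps the pushdown list as a Python list of (symbol, pointer) pairs with the top at the end, and makes no attempt to guard against machines that loop forever.

```python
# Sketch of a parsing-machine simulator (illustrative; the encoding is ours).
# delta maps (state, symbol_or_"", Z) to (state', pushed_symbols), where a call
# (begin, "", Z) -> (begin, (Y, Z)) pushes Y, and (q, "", Z) -> (begin, (Y,))
# with q in {success, failure} transfers, as in clauses (1)-(3) above.

def run(delta, Z0, w):
    state, pos = "begin", 0
    stack = [(Z0, 0)]                        # top of the pushdown list at the end
    while stack:
        Z, ptr = stack[-1]
        if state == "begin" and ("begin", "", Z) in delta:
            q, push = delta["begin", "", Z]
            if q == "begin":                 # clause (1): call Y, record position
                stack.append((push[0], pos))
            else:                            # clause (3) with a = e
                stack.pop()
                state, pos = q, (pos if q == "success" else ptr)
        elif state == "begin":               # clause (3): Z expects one input symbol
            matches = [a for (s, a, Y) in delta if s == "begin" and Y == Z]
            if not matches:
                return False                 # delta undefined: the machine halts
            stack.pop()
            if pos < len(w) and w[pos] == matches[0]:
                state, pos = "success", pos + 1
            else:
                state, pos = "failure", ptr  # retract to the recorded pointer
        else:                                # clause (2): success/failure transfers
            move = delta.get((state, "", Z))
            if move is None:
                return False
            stack[-1] = (move[1][0], ptr)
            state = "begin"
    return state == "success" and pos == len(w)

# The machine of Example 6.7 below.
d = {("begin", "", "Z0"): ("begin", ("Y", "Z0")),
     ("success", "", "Z0"): ("begin", ("Z0",)),
     ("failure", "", "Z0"): ("begin", ("E",)),
     ("begin", "", "Y"): ("begin", ("A", "Y")),
     ("success", "", "Y"): ("begin", ("Y",)),
     ("failure", "", "Y"): ("begin", ("B",)),
     ("begin", "a", "A"): ("success", ()),
     ("begin", "b", "B"): ("success", ()),
     ("begin", "", "E"): ("success", ())}
assert run(d, "Z0", "ab") and not run(d, "Z0", "abaa")
```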

Example 6.7
Let M = (Q, {a, b}, {Z0, Y, A, B, E}, δ, begin, Z0), where δ is given by
(1) δ(begin, e, Z0) = (begin, YZ0)
(2) δ(success, e, Z0) = (begin, Z0)
(3) δ(failure, e, Z0) = (begin, E)
(4) δ(begin, e, Y) = (begin, AY)
(5) δ(success, e, Y) = (begin, Y)
(6) δ(failure, e, Y) = (begin, B)
(7) δ(begin, a, A) = (success, e)
(8) δ(begin, b, B) = (success, e)
(9) δ(begin, e, E) = (success, e)
M recognizes e or any string of a's and b's ending in b, but does so in a peculiar way. A and B recognize a and b, respectively. When Y begins, it looks for an a, and if Y finds it, Y "transfers" to itself. Thus the pushdown list remains intact, and a's are consumed on the input. If b or the end of the input is reached, Y in state failure causes the top of the pushdown list to be erased. That is, Y is replaced by B, and, whether B succeeds or fails, that B is eventually erased.
Z0 calls Y and transfers to itself in the same way that Y calls A. Thus
any string of a's and b's ending in b will eventually cause Z0 to be erased and state success entered. The action of M on input abaa is given by the following sequence of configurations:

    (begin, ↑ abaa, (Z0, 0)) ⊢ (begin, ↑ abaa, (Y, 0)(Z0, 0))
      ⊢ (begin, ↑ abaa, (A, 0)(Y, 0)(Z0, 0))
      ⊢ (success, a ↑ baa, (Y, 0)(Z0, 0))
      ⊢ (begin, a ↑ baa, (Y, 0)(Z0, 0))
      ⊢ (begin, a ↑ baa, (A, 1)(Y, 0)(Z0, 0))
      ⊢ (failure, a ↑ baa, (Y, 0)(Z0, 0))
      ⊢ (begin, a ↑ baa, (B, 0)(Z0, 0))
      ⊢ (success, ab ↑ aa, (Z0, 0))
      ⊢ (begin, ab ↑ aa, (Z0, 0))
      ⊢ (begin, ab ↑ aa, (Y, 2)(Z0, 0))
      ⊢ (begin, ab ↑ aa, (A, 2)(Y, 2)(Z0, 0))
      ⊢ (success, aba ↑ a, (Y, 2)(Z0, 0))
      ⊢ (begin, aba ↑ a, (Y, 2)(Z0, 0))
      ⊢ (begin, aba ↑ a, (A, 3)(Y, 2)(Z0, 0))
      ⊢ (success, abaa ↑, (Y, 2)(Z0, 0))
      ⊢ (begin, abaa ↑, (Y, 2)(Z0, 0))
      ⊢ (begin, abaa ↑, (A, 4)(Y, 2)(Z0, 0))
      ⊢ (failure, abaa ↑, (Y, 2)(Z0, 0))
      ⊢ (begin, abaa ↑, (B, 2)(Z0, 0))
      ⊢ (failure, ab ↑ aa, (Z0, 0))
      ⊢ (begin, ab ↑ aa, (E, 0))
      ⊢ (success, ab ↑ aa, e)

Note that abaa is not accepted because the end of the input was not reached at the last step. However, ab alone would be accepted. It is important also to note that in the fourth from last configuration B is not "called" but replaces Y. Thus the number 2, rather than 4, appears on top of the list, and when B fails, the input head backtracks.  □
We shall now prove that a language is defined by a parsing machine if and only if it is defined by a GTDPL program.

LEMMA 6.5
If L = L(M) for some parsing machine M = (Q, Σ, Γ, δ, begin, Z0), then L = L(P) for some GTDPL program P.
Proof. Let P = (N, Σ, R, Z0), where N = Γ ∪ {X} with X a new symbol. Define R as follows:
(1) X has no rule.
(2) If δ(begin, a, Z) = (q, e), let Z → a be the rule for Z if q = success, and Z → f be the rule if q = failure.
(3) For the other Z's in Γ define Y1, Y2, and Y3 as follows:
    (a) If δ(begin, e, Z) = (begin, YZ), let Y1 = Y.
    (b) If δ(q, e, Z) = (begin, Y), let Y2 = Y if q is success and let Y3 = Y if q is failure.
    (c) Take Yi to be X for each Yi not otherwise defined by (a) or (b).
Then the rule for Z is Z → Y1[Y2, Y3].
We shall show that the following statements hold for all Z in Γ.

(6.1.3) Z ⇒ (w ↑ x, s) if and only if (begin, ↑ wx, (Z, 0)) ⊢+ (success, w ↑ x, e)

(6.1.4) Z ⇒ (↑ w, f) if and only if (begin, ↑ w, (Z, 0)) ⊢+ (failure, ↑ w, e)

We prove both simultaneously by induction on the length of a derivation in P or computation of M.
Only if: The bases for (6.1.3) and (6.1.4) are both trivial applications of the definition of the ⊢ relation.
For the inductive step of (6.1.3), assume that Z ⇒^n (w ↑ x, s) and that (6.1.3) and (6.1.4) are true for smaller n. Since we may take n > 1, let the rule for Z be Z → Y1[Y2, Y3].

Case 1: w = w1w2, Y1 ⇒^{n1} (w1 ↑ w2x, s) and Y2 ⇒^{n2} (w2 ↑ x, s). Then n1 and n2 are less than n, and we have, by the inductive hypothesis (6.1.3),

(6.1.5) (begin, ↑ wx, (Y1, 0)) ⊢+ (success, w1 ↑ w2x, e)

and

(6.1.6) (begin, ↑ w2x, (Y2, 0)) ⊢+ (success, w2 ↑ x, e)

We must now observe that if we insert some string, w1 in particular, to the left of the input head of M, then M will undergo essentially the same action. Thus from (6.1.6) we obtain

(6.1.7) (begin, w1 ↑ w2x, (Y2, 0)) ⊢+ (success, w ↑ x, e)

This inference requires an inductive proof in its own right, but is left for the Exercises.
From the definition of R we know that δ(begin, e, Z) = (begin, Y1Z) and δ(success, e, Z) = (begin, Y2). Thus

(6.1.8) (begin, ↑ wx, (Z, 0)) ⊢ (begin, ↑ wx, (Y1, 0)(Z, 0))

and

(6.1.9) (success, w1 ↑ w2x, (Z, 0)) ⊢ (begin, w1 ↑ w2x, (Y2, 0)).

Putting (6.1.8), (6.1.5), (6.1.9), and (6.1.7) together, we have

    (begin, ↑ wx, (Z, 0)) ⊢+ (success, w ↑ x, e),

as desired.

Case 2: Y1 ⇒^{n1} (↑ wx, f) and Y3 ⇒^{n2} (w ↑ x, s). The proof in this case is similar to Case 1 and is left for the reader.
We now turn to the induction for (6.1.4). We assume that Z ⇒^n (↑ w, f).

Case 1: Y1 ⇒^{n1} (w1 ↑ w2, s) and Y2 ⇒^{n2} (↑ w2, f), where w1w2 = w. Then n1, n2 < n, and by (6.1.3) and (6.1.4), we have

(6.1.10) (begin, ↑ w, (Y1, 0)) ⊢+ (success, w1 ↑ w2, e)

(6.1.11) (begin, ↑ w2, (Y2, 0)) ⊢+ (failure, ↑ w2, e)

If we insert w1 to the left of ↑ in (6.1.11), we have

(6.1.12) (begin, w1 ↑ w2, (Y2, 0)) ⊢+ (failure, ↑ w1w2, e)

The truth of this implication is left for the Exercises. One has to observe that when (Y2, 0) is erased, the input head must be reset all the way to the left. Otherwise, the presence of w1 on the input cannot affect matters, because numbers written on the pushdown list above (Y2, 0) will have |w1| added to them [when constructing the sequence of steps represented by (6.1.12) from (6.1.11)], and thus there is no way to get the input head to move into w1 without erasing (Y2, 0).
By definition of Y1 and Y2, we have

(6.1.13) (begin, ↑ w, (Z, 0)) ⊢ (begin, ↑ w, (Y1, 0)(Z, 0))

and

(6.1.14) (success, w1 ↑ w2, (Z, 0)) ⊢ (begin, w1 ↑ w2, (Y2, 0))

Putting (6.1.13), (6.1.10), (6.1.14), and (6.1.12) together, we have

    (begin, ↑ w, (Z, 0)) ⊢+ (failure, ↑ w, e).
Case 2 : Y 1 ~ (~ w, f ) and Y3 =%. ([' w, f). This case is similar and left
to the reader.
lf: The "if" portion of the proof is similar to the foregoing, and we leave
the details for the Exercises.
As a special case of (6.1.3), Z0 =~ (w l', s) if and only if (begin, [' w, (Z0, 0))
!-~-- (success, w [', e), so L(M) = L(P). D
LEMMA 6.6
If L = L(P) for some GTDPL program P, then L = L(M) for some parsing
machine M.
Proof. Let P = (N, Σ, R, S) and define M = (Q, Σ, N, δ, begin, S).
Define δ as follows:
(1) If R contains rule A → B[C, D], let δ(begin, e, A) = (begin, BA),
δ(success, e, A) = (begin, C), and δ(failure, e, A) = (begin, D).
(2) (a) If A → a is in R, where a is in Σ ∪ {e}, let δ(begin, a, A) =
(success, e).
    (b) If A → f is in R, let δ(begin, e, A) = (failure, e).
A proof that L(M) = L(P) is straightforward and left for the Exercises.
□
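The construction in Lemma 6.6 is mechanical enough to program directly. The following Python sketch is an editorial illustration rather than part of the original text; the tuple encoding of GTDPL rules and the dictionary representation of δ are our own assumptions.

    def parsing_machine_delta(rules):
        """Build delta for the parsing machine of Lemma 6.6 (a sketch).

        rules maps each nonterminal A to one of:
          ('choice', B, C, D)   # A -> B[C, D]
          ('match', a)          # A -> a, with a a terminal or '' (meaning e)
          ('fail',)             # A -> f
        delta maps (state, scanned symbol, top stack symbol) to
        (new state, string pushed in place of the popped symbol).
        """
        delta = {}
        for A, rule in rules.items():
            if rule[0] == 'choice':
                _, B, C, D = rule
                delta[('begin', '', A)] = ('begin', (B, A))   # push B above A
                delta[('success', '', A)] = ('begin', (C,))   # replace A by C
                delta[('failure', '', A)] = ('begin', (D,))   # replace A by D
            elif rule[0] == 'match':
                a = rule[1]
                delta[('begin', a, A)] = ('success', ())      # scan a (or e), pop A
            else:  # ('fail',)
                delta[('begin', '', A)] = ('failure', ())     # pop A, report failure
        return delta

Rule form (1) pushes B above A; the states success and failure then select C or D when the call on B returns, exactly as in the construction above.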
THEOREM 6.5
A language L is L(M) for some parsing machine M if and only if it is
L(P) for some GTDPL program P.
Proof. Immediate from Lemmas 6.5 and 6.6. □

EXERCISES

6.1.1. Construct TDPL programs to recognize the following languages:
(a) L(G0).
(b) The set of strings with an equal number of a's and b's.
(c) {wcw^R | w ∈ (a + b)*}.
*(d) {a^(2^n) | n ≥ 1}. Hint: Consider S → aSa/aa.
(e) Some infinite subset of FORTRAN.
6.1.2. Construct GTDPL programs to recognize the following languages:
(a) The languages in Exercise 6.1.1.
(b) The language generated by (with start symbol E)
    E → E + T | T
    T → T * F | F
    F → (E) | I
    I → a | a(L)
    L → a | a, L
**(c) {a^(n^2) | n ≥ 1}.

"6.1.3. Show that for every LL(1) language there is a GTDPL program which
recognizes the language with no backtracking; i.e., the parsing machine
constructed by Lemrna 6.6 never moves the input pointer t o the left
between successive configurations.
"6.1.4. Show that it is undecidable whether a TDPL program P = (N, X~, R, S)
recognizes
(a) ~.
(b) X*.
6.1.5. Show that every TDPL or GTDPL program is equivalent to one in
which every nonterminal has a rule. Hint: Show that if A has no rule,
you can give it rule A ~ A A / A (or the equivalent in GTDPL) with no
change in the language recognized.
"6.1.6. Give a TDPL program equivalent to the following extended program.
What is the language defined ? From a practical point of view, what
defects does this program have ?

s ~ Zln/C
h ------~ a
B -----~ S C A
C--+ b

6.1.7. Give a formal proof of Lemma 6.3.


6.1.8. Prove Lemma 6.4.
6.1.9. Give a formal proof that P of Example 6.5 defines {0^n 1^n 2^n | n ≥ 1}.
6.1.10. Complete the proof of Theorem 6.2.
6.1.11. Use Algorithm 6.2 to show that the string ((a)) + a is in L(P), where
P is given in Example 6.6.
6.1.12. GTDPL statements can be extended in much the same manner as we
extended TDPL statements. For example, we can permit GTDPL
statements of the form

A → X1g1X2g2 ... Xkgk

where each Xi is a terminal or nonterminal and each gi is either e or
a pair of the form [αi, βi], where αi and βi are strings of symbols.
A reports success if and only if each Xigi succeeds, where success is
defined recursively, as follows. The string Xi[αi, βi] succeeds if and
only if
(1) Xi succeeds and αi succeeds, or
(2) Xi fails and βi succeeds.
(a) Show how this extended statement can be replaced by an equivalent
set of conventional GTDPL statements.
(b) Show that every extended TDPL statement can be replaced by an
equivalent set of (extended) GTDPL statements.

6.1.13. Show that there are TDPL (and GTDPL) programs in which the
number of statements executed by the parsing machine of Lemma 6.6
is an exponential function of the length of the input string.
6.1.14. Construct a GTDPL program to simulate the meaning of the rule
A → BC/(D1, D2) mentioned on p. 473.
6.1.15. Find a GTDPL program which defines the language L(M), where M
is the parsing machine given in Example 6.7.
6.1.16. Find parsing machines to recognize the languages of Exercise 6.1.2.
DEFINITION
A TDPL or GTDPL program P = (N, Σ, R, S) has a partial acceptance
failure on w if w = uv such that v ≠ e and S ⇒ (u↑v, s). We
say that P is well formed if for every w in Σ*, either S ⇒ (↑w, f) or
S ⇒ (w↑, s).
*6.1.17. Show that if L is a TDPL language (alt. GTDPL language) and $ is
a new symbol, then L$ is defined by a TDPL program (alt. GTDPL
program) with no partial acceptance failures.
*6.1.18. Let L1 be defined by a TDPL (alt. GTDPL) program and L2 by a well-
formed TDPL (alt. GTDPL) program. Show that
(a) L1 ∪ L2,
(b) L1L2,
(c) L1 ∩ L2, and
(d) L1 − L2
are TDPL (alt. GTDPL) languages.
*6.1.19. Show that every GTDPL program with no partial acceptance failure is
equivalent to a well-formed GTDPL program. Hint: It suffices to look
for and eliminate "left recursion." That is, if we have a normal form
GTDPL program, create a CFG by replacing rules A → B[C, D] by
productions A → BC and A → D. Let A → a or A → e be produc-
tions of the CFG also. The "left recursion" referred to is in the CFG
constructed.
**6.1.20. Show that it is undecidable for a well-formed TDPL program
P = (N, Σ, R, S) whether L(P) = ∅. Note: The natural embedding of
Post's correspondence problem proves Exercise 6.1.4(a), but does not
always yield a well-formed program.
6.1.21. Complete the proof of Lemma 6.5.
6.1.22. Prove Lemma 6.6.

Open Problems

6.1.23. Does there exist a context-free language which is not a GTDPL lan-
guage?
6.1.24. Are the TDPL languages closed under complementation?
6.1.25. Is every TDPL program equivalent to a well-formed TDPL program?
6.1.26. Is every GTDPL program equivalent to a TDPL program? Here we
conjecture that {a^(n^2) | n ≥ 1} is a GTDPL language but not a TDPL
language.

Programming Exercises
6.1.27. Design an interpreter for parsing machines. Write a program that takes
an extended GTDPL program as input and constructs from it an
equivalent parsing machine which the interpreter can then simulate.
6.1.28. Design a programming language centered around GTDPL (or TDPL)
which can be used to specify translators. A source program would be
the specification of a translator and the object program would be the
actual translator. Construct a compiler for this language.

BIBLIOGRAPHIC NOTES

TDPL is an abstraction of the parsing language used by McClure [1965] in his
compiler-compiler TMG.† The parsing machine in Section 6.1.5 is similar to the
one used by Knuth [1967]. Most of the theoretical results concerning TDPL and
GTDPL reported in this section were developed by Birman and Ullman [1970].
The solutions to many of the exercises can be found there.
GTDPL is a model of the META family of compiler-compilers [Schorre, 1964]
and others.

6.2. LIMITED BACKTRACK BOTTOM-UP PARSING

We shall discuss possibilities of parsing deterministically and bottom-up
in ways that allow more freedom than the shift-reduce methods of Chapter
5. In particular, we allow limited backtracking on the input, and the parse
produced need not be a right parse. The principal method to be discussed
is that of Colmerauer's precedence-based algorithm.

6.2.1. Noncanonical Parsing

There are several techniques which might be used to deterministically


parse grammars which are not LR. One technique would be to permit arbi-
trarily long lookahead by allowing the input pointer to migrate forward
along the input to resolve some ambiguity. When a decision has been reached,
the input pointer finds its way back to the proper place for a reduction.

†TMG comes from the word "transmogrify," whose meaning is "to change in appear-
ance or form, especially strangely or grotesquely."

Example 6.8
Consider the grammar G with productions

S → Aa | Bb
A → 0A1 | 01
B → 0B11 | 011

G generates the language {0^n 1^n a | n ≥ 1} ∪ {0^n 1^(2n) b | n ≥ 1}, which is not a
deterministic context-free language. However, we can clearly parse G by
first moving the input pointer to the end of an input string to see whether
the last symbol is a or b and then returning to the beginning of the string and
parsing as though the string were 0^n 1^n or 0^n 1^(2n), as appropriate. □
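As a concrete illustration of this strategy (an editorial sketch, not part of the original text), the following Python function decides membership in L(G) for the grammar of Example 6.8 by looking at the last input symbol first and then re-scanning from the left; the function name and the use of regular expressions are our own choices.

    import re

    def parse_example_6_8(s):
        # Look at the last symbol first, then re-scan from the left.
        if not s:
            return False
        if s[-1] == 'a':                       # expect 0^n 1^n a, n >= 1
            m = re.fullmatch(r'(0+)(1+)a', s)
            return bool(m) and len(m.group(1)) == len(m.group(2))
        if s[-1] == 'b':                       # expect 0^n 1^2n b, n >= 1
            m = re.fullmatch(r'(0+)(1+)b', s)
            return bool(m) and 2 * len(m.group(1)) == len(m.group(2))
        return False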

Another parsing technique would be to reduce phrases which may not
be handles.

DEFINITION

If G = (N, Σ, P, S) is a CFG, then β is a phrase of a sentential form αβγ
if there is a derivation S ⇒* αAγ ⇒ αβγ. If Xi ... Xk and Xj ... Xl are
phrases of a sentential form X1 ... Xn, we say that phrase Xi ... Xk is to
the left of phrase Xj ... Xl if i < j or if i = j and k < l. Thus, if a grammar
is unambiguous, the handle is the leftmost phrase of a right-sentential form.

Example 6.9
Consider the grammar G having productions

S → 0ABb | 0aBc
A → a
B → B1 | 1

L(G) is the regular set 0a1⁺(b + c), but G is not LR. However, we can parse
G bottom-up if we defer the decision of whether a is a phrase in a sentential
form until we have scanned the last input symbol. That is, an input string of
the form 0a1^n can be reduced to 0aB independently of whether it is followed
by b or c. In the former case, 0aBb is first reduced to 0ABb and then to S.
In the latter case, 0aBc is reduced directly to S. Of course, we shall not pro-
duce either a left or right parse. □

Let G = (N, Σ, P, S) be a CFG in which the productions are numbered
from 1 to p, and let

(6.2.1)   S = α0 ⇒ α1 ⇒ α2 ⇒ ... ⇒ αn = w

be a derivation of w from S. For 0 ≤ i < n, let αi = βiAiδi, suppose that
Ai → γi is production pi, and suppose that this production is used to derive
αi+1 = βiγiδi by replacing the explicitly shown Ai. We can represent this
step of the derivation by the pair of integers (pi, li), where li = |βi|. Thus
we can represent the derivation (6.2.1) by the string of n pairs

(6.2.2)   (p0, l0)(p1, l1) ... (pn−1, ln−1)

If the derivation is leftmost or rightmost, then the second components in
(6.2.2), those giving the position of the nonterminal to be expanded in the
next step of the derivation, are redundant.

DEFINITION

We shall call a string of pairs of the form (6.2.2) a (generalized) top-down
parse for w. Clearly, a left parse is a special case of a top-down parse. Like-
wise, we shall call the reverse of this string, that is,

(pn−1, ln−1)(pn−2, ln−2) ... (p1, l1)(p0, l0)

a (generalized) bottom-up parse of w. Thus a right parse is a special case of
a bottom-up parse.
If we relax the restriction of scanning the input string only from left to
right, but instead permit backtracking on the input, then we can determin-
istically parse grammars which cannot be so parsed using only the left-to-
right scan.

6.2.2. Two-Stack Parsers

To model some backtracking algorithms, we introduce an automaton
with two pushdown lists, the second of which also serves the function of
an input tape. The deterministic version of this device is a cousin of the two-
stack parser used in Algorithms 4.1 and 4.2 for general top-down and bottom-
up parsing. We shall, however, put some restrictions on the device which
will make it behave as a bottom-up precedence parser.

DEFINITION

A two-stack (bottom-up) parser for grammar G = (N, Σ, P, S) is a finite
set of rules of the form (α, β) → (γ, δ), where α, β, γ, and δ are strings of
symbols in N ∪ Σ ∪ {$}; $ is a new symbol, an endmarker. Each rule of the
parser (α, β) → (γ, δ) must be of one of two forms: either
(1) β = Xδ for some X ∈ N ∪ Σ, and γ = αX, or
(2) α = γη for some η in (N ∪ Σ)*, δ = Aβ, and A → η is a production
in P.
In general, a rule (α, β) → (γ, δ) implies that if the string α is on top of
the first pushdown list and the string β is on top of the second, then we
can replace α by γ on the first pushdown list and β by δ on the second. Rules
of type (1) correspond to a shift in a shift-reduce parsing algorithm. Those of
type (2) are related to reduce moves; the essential difference is that the
symbol A, which is the left side of the production involved, winds up on
the top of the second pushdown list rather than the first. This arrangement
corresponds to limited backtracking. It is possible to move symbols from
the first pushdown list to the second (which acts as the input tape), but only
at the time of a reduction. Of course, rules of type (1) allow symbols to move
from the second list to the first at any time.
A configuration of a two-stack parser T is a triple (α, β, π), where α ∈
$(N ∪ Σ)*, β ∈ (N ∪ Σ)*$, and π is a string of pairs, each consisting of
a production number and an integer. Thus, π could be part of a parse of
some string in L(G). We say that (α, β, π) ⊢_T (α', β', π') if
(1) α = α1α2, β = β2β1, and (α2, β2) → (γ, δ) is a rule of T;
(2) α' = α1γ, β' = δβ1; and
(3) if (α2, β2) → (γ, δ) is a type 1 rule, then π' = π; if it is a type 2 rule and
production i is the applicable production, then π' = π(i, j), where j is equal
to |α'| − 1.†

†The −1 term is present since α' includes a left endmarker.

Note that the first stack has its top at the right and that the second has
its top at the left.
We define ⊢^k, ⊢⁺, and ⊢* from ⊢_T in the usual manner. The subscript
T will be dropped whenever possible.
The translation defined by T, denoted τ(T), is {(w, π) | ($, w$, e) ⊢*
($, S$, π)}. We say that T is valid for G if for every w ∈ L(G), there exists
a bottom-up parse π of w such that (w, π) ∈ τ(T). It is elementary to show
that if (w, π) ∈ τ(T), then π is a bottom-up parse of w.
T is deterministic if whenever (α1, β1) → (γ1, δ1) and (α2, β2) → (γ2, δ2)
are rules such that α1 is a suffix of α2 or vice versa, and β1 is a prefix of
β2 or vice versa, then γ1 = γ2 and δ1 = δ2. Thus for each configuration C,
there is at most one C' such that C ⊢ C'.

Example 6.10

Consider the grammar G with productions

(1) S → aSA
(2) S → bSA
(3) S → b
(4) A → a

This grammar generates the nondeterministic CFL {wba^n | w ∈ (a + b)*
and n = |w|}.
We can design a (nondeterministic) two-stack transducer which can
parse sentences according to G by first putting all of an input string on
the first pushdown list and then parsing in essence from right to left.
The rules of T are the following:
(1) (e, X) → (X, e) for all X ∈ {a, b, S, A}. (Any symbol may be shifted
from the second pushdown list to the first.)
(2) (a, e) → (e, A). (An a may be reduced to A.)
(3) (b, e) → (e, S). (A b may be reduced to S.)
(4) (aSA, e) → (e, S).
(5) (bSA, e) → (e, S).
[The last two rules allow reductions by productions (1) and (2).]
Note that T is nondeterministic and that many parses of each input can
be achieved. One bottom-up parse of abbaa is traced out in the following
sequence of configurations:

($, abbaa$, e) ⊢ ($a, bbaa$, e)
               ⊢ ($ab, baa$, e)
               ⊢ ($abb, aa$, e)
               ⊢ ($ab, Saa$, (3, 2))
               ⊢ ($abS, aa$, (3, 2))
               ⊢ ($abSa, a$, (3, 2))
               ⊢ ($abSaa, $, (3, 2))
               ⊢ ($abSa, A$, (3, 2)(4, 4))
               ⊢ ($abS, AA$, (3, 2)(4, 4)(4, 3))
               ⊢ ($abSA, A$, (3, 2)(4, 4)(4, 3))
               ⊢ ($a, SA$, (3, 2)(4, 4)(4, 3)(2, 1))
               ⊢ ($aS, A$, (3, 2)(4, 4)(4, 3)(2, 1))
               ⊢ ($aSA, $, (3, 2)(4, 4)(4, 3)(2, 1))
               ⊢ ($, S$, (3, 2)(4, 4)(4, 3)(2, 1)(1, 0))

The string (3, 2)(4, 4)(4, 3)(2, 1)(1, 0) is a bottom-up parse of abbaa,
corresponding to the derivation

S ⇒ aSA ⇒ abSAA ⇒ abSaA ⇒ abSaa ⇒ abbaa.  □
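The move relation ⊢ is equally easy to simulate. The following sketch is an editorial illustration; the encoding of rules and configurations as tuples of strings is assumed, not taken from the text. The comment at the end shows the function reproducing one move from the trace of Example 6.10.

    def step(config, rule, prod_num=None):
        """Apply one rule of a two-stack parser to a configuration (a sketch).

        config = (stack1, stack2, parse); stack1 has its top at the right,
        stack2 at the left; parse is a list of (production, position) pairs.
        rule = (alpha, beta, gamma, delta) encodes (alpha, beta) -> (gamma, delta);
        prod_num is supplied exactly when the rule is a type-2 (reduce) rule.
        Returns the next configuration, or None if the rule does not apply."""
        s1, s2, parse = config
        alpha, beta, gamma, delta = rule
        if not (s1.endswith(alpha) and s2.startswith(beta)):
            return None
        s1_new = s1[:len(s1) - len(alpha)] + gamma
        s2_new = delta + s2[len(beta):]
        if prod_num is None:                      # type 1: shift
            return (s1_new, s2_new, parse)
        j = len(s1_new) - 1                       # -1 for the left endmarker $
        return (s1_new, s2_new, parse + [(prod_num, j)])

    # One move of Example 6.10, reducing b to S (production 3):
    # step(('$abb', 'aa$', []), ('b', '', '', 'S'), prod_num=3)
    #   == ('$ab', 'Saa$', [(3, 2)])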

The two-stack parser has an anomaly in common with the general shift-
reduce parsing algorithms; if a grammar is ambiguous, it may still be pos-
sible to find a deterministic two-stack parser for it by ignoring some of
the possible parses. Later developments will rule out this problem.

Example 6.11
Let G be defined by the productions

S → A | B
A → aA | a
B → Ba | a

G is an ambiguous grammar for a⁺. By ignoring B and its productions,
the following set of rules forms a deterministic two-stack parser for G:

(e, a) → (a, e)
(a, $) → (e, A$)
(a, A) → (aA, e)
(aA, $) → (e, A$)
($, A) → ($A, e)
($A, $) → ($, S$)   □
6.2.3. Colmerauer Precedence Relations

The two-stack parser can be made to act in a manner somewhat similar
to a precedence parser by assuming the existence of three disjoint relations,
⋖, ≐, and ⋗, on the symbols of a grammar, letting ⋗ indicate a reduction,
and ⋖ and ≐ indicate shifts. When reductions are made, ⋖ will indicate
the left end of a phrase (not necessarily a handle). It should be emphasized
that, at least temporarily, we are not assuming that the relations ⋖, ≐, and
⋗ bear any connection with the productions of a grammar. Thus, for exam-
ple, we could have X ≐ Y even though X and Y never appear together on
the right side of a production.

DEFINITION

Let G = (N, Σ, P, S) be a CFG, and let ⋖, ≐, and ⋗ be three disjoint
relations on N ∪ Σ ∪ {$}, where $ is a new symbol, the endmarker. The
two-stack parser induced by the relations ⋖, ≐, and ⋗ is defined by the
following set of rules:
(1) (X, Y) → (XY, e) if and only if X ⋖ Y or X ≐ Y.
(2) (XZ1 ... Zk, Y) → (X, AY) if and only if Zk ⋗ Y, Zi ≐ Zi+1 for
1 ≤ i < k, X ⋖ Z1, and A → Z1 ... Zk is a production.
We observe that if G is uniquely invertible, then the induced two-stack
parser is deterministic, and conversely.
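Given the three relations, the induced parser can be generated mechanically. The sketch below is ours; it represents each relation as a set of pairs and each production as a numbered (left side, right side) pair, and simply enumerates the rules required by clauses (1) and (2) of the definition.

    def induced_rules(prods, lt, eq, gt, symbols):
        """Enumerate the rules of the two-stack parser induced by the relations
        lt, eq, gt, given as sets of pairs over symbols plus the endmarker '$'.
        prods is a list of (number, A, rhs) with rhs a string.  (A sketch.)"""
        syms = set(symbols) | {'$'}
        shifts, reduces = [], []
        for X in syms:
            for Y in syms:
                if (X, Y) in lt or (X, Y) in eq:
                    shifts.append(((X, Y), (X + Y, '')))       # (X, Y) -> (XY, e)
        for num, A, rhs in prods:
            if not rhs:
                continue                                        # proper grammars have no e-rules
            if any((rhs[i], rhs[i + 1]) not in eq for i in range(len(rhs) - 1)):
                continue                                        # need Z_i related by the middle relation
            for X in syms:
                if (X, rhs[0]) not in lt:
                    continue                                    # need X below the left end of the phrase
                for Y in syms:
                    if (rhs[-1], Y) in gt:                      # need the right end above Y
                        reduces.append((num, (X + rhs, Y), (X, A + Y)))
        return shifts, reduces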

Example 6.12
Let G be the grammar with productions

(1) S → aSA
(2) S → bSA
(3) S → b
(4) A → a

as in Example 6.10.
Let ⋖, ≐, and ⋗ be defined by Fig. 6.3.

        |  a    b    S    A    $
   -----+------------------------
     $  |  ⋖    ⋖
     a  |  ⋖    ⋖    ≐    ⋗    ⋗
     b  |  ⋖    ⋖    ≐    ⋗    ⋗
     S  |                 ≐
     A  |                 ⋗    ⋗

          Fig. 6.3  "Precedence" relations.

These relations induce the two-stack transducer with rules defined as
follows:

(X, Y) → (XY, e)     for all X ∈ {$, a, b}, Y ∈ {a, b}
(Xa, Y) → (X, AY)    for all X ∈ {$, a, b}, Y ∈ {$, A}
(Xb, Y) → (X, SY)    for all X ∈ {$, a, b}, Y ∈ {$, A}
(X, S) → (XS, e)     for all X ∈ {a, b}
(S, A) → (SA, e)
(XaSA, Y) → (X, SY)  for all X ∈ {$, a, b} and Y ∈ {A, $}
(XbSA, Y) → (X, SY)  for all X ∈ {$, a, b} and Y ∈ {A, $}

T accepts a string wba^n such that |w| = n by the following sequence of
moves:

($, wba^n$, e) ⊢^(2n+1) ($wba^n, $, e)
               ⊢^n      ($wb, A^n$, (4, 2n) ... (4, n + 1))
               ⊢        ($w, SA^n$, (4, 2n) ... (4, n + 1)(3, n))
               ⊢^(3n)   ($, S$, (4, 2n) ... (4, n + 1)(3, n)(i_n, n − 1) ... (i_1, 0))

where each i_j is 1 or 2, 1 ≤ j ≤ n. Note that the last 3n moves alternately shift S and
A and then reduce either aSA or bSA to S.
It is easy to check that T is deterministic, so no other sequences of moves
are possible with words in L(G). Since all reductions of T are according to
productions of G, it follows that T is a two-stack parser for G. □

On certain grammars, we can define "precedence" relations such that
the induced two-stack parser is both deterministic and valid. We shall make
such a definition here, and in the next section we shall give a simple test by
which we can determine whether a grammar has such a parser.

DEFINITION

Let G = (N, Σ, P, S) be a CFG. We say that G is a Colmerauer grammar if
(1) G is unambiguous,
(2) G is proper, and
(3) there exist disjoint relations ⋖, ≐, and ⋗ on N ∪ Σ ∪ {$} which
induce a deterministic two-stack parser which is valid for G.
We call the three relations above Colmerauer precedence relations. Note
that condition (3) implies that a Colmerauer grammar must be uniquely
invertible.

Example 6.13

The relations of Fig. 6.3 are Colmerauer precedence relations, and G of
Examples 6.10 and 6.12 is therefore a Colmerauer grammar. □

Example 6.14

Every simple precedence grammar is a Colmerauer grammar. Let ⋖, ≐,
and ⋗ be the Wirth-Weber precedence relations for the grammar G =
(N, Σ, P, S). If G is simple precedence, it is by definition proper and un-
ambiguous. The induced two-stack parser acts almost as the shift-reduce
precedence parser.
However, when a reduction of right-sentential form αβw to αAw is made,
we wind up with $α on the first stack and Aw$ on the second, whereas in
the precedence parsing algorithm, we would have $αA on the pushdown
list and w$ on the input. If X is the last symbol of $α, then either X ⋖ A or
X ≐ A, by Theorem 5.14. Thus the next move of the two-stack parser must
shift the A to the first stack. The two-stack parser then acts as the simple
precedence parser until the next reduction.
Note that if the Colmerauer precedence relations are the Wirth-Weber
ones, then the induced two-stack parser yields rightmost parses. In general,
however, we cannot always expect this to be the case. □

6.2.4. Test for Colmerauer Precedence

We shall give a necessary and sufficient condition for an unambiguous,
proper grammar to be a Colmerauer grammar. The condition involves three
relations which we shall define below. We should recall, however, that it is
undecidable whether a CFG is unambiguous and that, as we saw in Example
6.11, there are ambiguous grammars which have deterministic two-stack
parsers. Thus we cannot always determine whether an arbitrary CFG is
a Colmerauer grammar unless we know a priori that the grammar is un-
ambiguous.

DEFINITION

Let G = (N, Σ, P, S) be a CFG. We define three new relations λ (for left),
μ (for mutual or adjacent), and ρ (for right) on N ∪ Σ as follows: For all
X and Y in N ∪ Σ and A in N,

(1) A λ Y if A → Yα is a production,
(2) X μ Y if A → αXYβ is a production, and
(3) X ρ A if A → αX is a production.

As is customary, for each relation R we shall use R⁺ to denote the union of
R^i over all i ≥ 1 and R* to denote the union of R^i over all i ≥ 0. Recall that
R⁺ and R* can be conveniently computed using Algorithm 0.2.

Note that the Wirth-Weber precedence relations ⋖, ≐, and ⋗ on N ∪ Σ
can be defined in terms of λ, μ, and ρ as follows:

(1) ⋖ = μλ⁺.
(2) ≐ = μ.
(3) ⋗ = ρ⁺μλ* ∩ ((N ∪ Σ) × Σ).

The remainder of this section is devoted to proving that an unambiguous,
proper CFG has Colmerauer precedence relations if and only if

(1) ρ⁺μ ∩ μλ* = ∅, and
(2) μ ∩ ρ*μλ⁺ = ∅.

Example 6.15

Consider the previous grammar

S → aSA | bSA | b
A → a

Here

λ = {(S, a), (S, b), (A, a)}
μ = {(a, S), (S, A), (b, S)}
ρ = {(A, S), (b, S), (a, A)}
ρ⁺μ = {(A, A), (b, A), (a, A)}
μλ* = {(a, S), (S, A), (b, S), (a, a), (a, b), (S, a), (b, a), (b, b)}
ρ*μλ⁺ = {(a, a), (a, b), (b, a), (b, b), (S, a), (A, a)}

Since ρ⁺μ ∩ μλ* = ∅ and ρ*μλ⁺ ∩ μ = ∅, G has Colmerauer precedence
relations, a set of which we saw in Fig. 6.3.
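The computations of Example 6.15 are easily mechanized. The following sketch is an editorial illustration; it represents the relations as Python sets of pairs, forms the compositions and closures needed, and then tests the two conditions stated above for the grammar of the example.

    def relations(prods):
        # Compute lambda, mu, rho from a list of productions (A, rhs),
        # rhs given as a string of grammar symbols.  (An editorial sketch.)
        lam, mu, rho = set(), set(), set()
        for A, rhs in prods:
            if rhs:
                lam.add((A, rhs[0]))        # A lambda Y   if A -> Y alpha
                rho.add((rhs[-1], A))       # X rho A      if A -> alpha X
            for X, Y in zip(rhs, rhs[1:]):
                mu.add((X, Y))              # X mu Y       if A -> alpha X Y beta
        return lam, mu, rho

    def compose(R, S):
        return {(x, z) for (x, y) in R for (u, z) in S if y == u}

    def closure(R):
        # Transitive closure R+ (Algorithm 0.2 does this more efficiently).
        plus, frontier = set(R), set(R)
        while frontier:
            frontier = compose(frontier, R) - plus
            plus |= frontier
        return plus

    prods = [('S', 'aSA'), ('S', 'bSA'), ('S', 'b'), ('A', 'a')]
    lam, mu, rho = relations(prods)
    lam_plus, rho_plus = closure(lam), closure(rho)
    ident = {(x, x) for x in 'abSA'}
    lam_star, rho_star = lam_plus | ident, rho_plus | ident

    cond1 = compose(rho_plus, mu) & compose(mu, lam_star)           # rho+ mu  intersect  mu lambda*
    cond2 = mu & compose(rho_star, compose(mu, lam_plus))           # mu  intersect  rho* mu lambda+
    print(cond1 == set(), cond2 == set())   # both True: the conditions hold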

We shall now show that if a grammar G contains symbols X and Y such
that X μ Y and X ρ*μλ⁺ Y, then G cannot be a Colmerauer grammar. Here,
X and Y need not be distinct.

LEMMA 6.7

Let G = (N, Σ, P, S) be a Colmerauer grammar with Colmerauer preced-
ence relations ⋖, ≐, and ⋗. If X μ Y, then X ≐ Y.

Proof. Since G is proper, there exists a derivation in which a production
A → αXYβ is used. When parsing a word w in L(G) whose derivation
involves that production, at some time αXYβ must appear at the top of
stack 1 and be reduced. This can happen only if X ≐ Y. □

LEMMA 6.8

Let G = (N, Σ, P, S) be a CFG such that for some X and Y in N ∪ Σ,
X ρ*μλ⁺ Y and X μ Y. Then G is not a Colmerauer grammar.

Proof. Suppose that G is. Let G have Colmerauer relations ⋖, ≐, and
⋗, and let T be the induced two-stack parser. Since G is assumed to be proper,
there exist x and y in Σ* such that X ⇒* x and Y ⇒* y. Since X μ Y, there
is a production A → αXYβ and strings w1, w2, w3, and w4 in Σ* such
that S ⇒* w1Aw4 ⇒ w1αXYβw4 ⇒* w1w2XYw3w4 ⇒* w1w2xyw3w4. Since
X ρ*μλ⁺ Y, there exists a production B → γZCδ such that Z ⇒* γ'X, C ⇒⁺
Yδ', and for some z1, z2, z3, and z4, we have S ⇒* z1Bz4 ⇒ z1γZCδz4 ⇒*
z1γγ'XYδ'δz4 ⇒* z1z2XYz3z4 ⇒* z1z2xyz3z4.
By Lemma 6.7, we may assume that X ≐ Y. Let us watch the processing
by T of the two strings u = w1w2xyw3w4 and v = z1z2xyz3z4. In particular,
let us concentrate on the strings to which x and y are reduced in each case,
and whether these strings appear on stack 1, stack 2, or spread between
them. Let θ1, θ2, ... be the sequence of strings to which xy is reduced
in u and Ψ1, Ψ2, ... that sequence in v. We know that there is some j such
that θj = XY, because, since G is assumed to be unambiguous, X and Y
must be reduced together in the reduction of u. If Ψi = θi for 1 ≤ i ≤ j,
then when this Y is reduced in the processing of v, the X to its left will also
be reduced, since X ≐ Y. This situation cannot be correct, since C ⇒⁺ Yδ'
for some δ' in the derivation of v.†

†Note that we are using symbols such as X and Y to represent specific instances of
those symbols in the derivations, i.e., particular nodes of the derivation tree. We trust
that the intended meaning will be clear.
Therefore, suppose that for some smallest i, 1 < i ≤ j, either θi ≠ Ψi
or Ψi does not exist (because the next reduction of a symbol of Ψi−1 also
involves a symbol outside of Ψi−1). We know that if i > 2, then the break
point between the stacks when θi−1 and Ψi−1 were constructed by a reduction
was at the same position in θi−1 as in Ψi−1. Therefore, if the break point
migrated out of θi−1 before θi was created, it did so for Ψi−1, and it left in
the same direction in each case. Taking into account the case i = 2, in which
θ1 = Ψ1 = xy, we know that immediately before the creation of θi and Ψi,
the break point between the stacks is either
(1) within θi−1 and Ψi−1, and at the same position in both cases, i.e., θi−1
and Ψi−1 straddle the two stacks; or
(2) to the right of both θi−1 and Ψi−1, i.e., both are on stack 1.
Note that it is impossible for the break point to be to the left of θi−1 and Ψi−1
and still have these altered on the next move. Also, the number of moves
between the creation of θi−1 and θi may not be the same as the number
between Ψi−1 and Ψi. We do not worry about the time that the break point
spends outside these substrings; changes of these strings occur only when
the break point migrates back into them.
It follows that since θi ≠ Ψi, the first reduction which involves a symbol
of Ψi−1 must involve at least one symbol outside of Ψi−1; i.e., Ψi really does
not exist, for we know that the reduction of θi−1 to θi involves only symbols
of θi−1, by definition of the θ's. If the next reduction involving Ψi−1 were
wholly within Ψi−1, the result would, by (1) and (2) above, have to be that
Ψi−1 was reduced to θi.
Let us now consider several cases, depending on whether, in θi−1, x has
been reduced to X and/or y has been reduced to Y.

Case 1: Both have been reduced. This is impossible because we chose
i ≤ j.

Case 2: y has not been reduced to Y, but x has been reduced to X. Now
the reduction of Ψi−1 involves symbols of Ψi−1 and symbols outside of Ψi−1.
Therefore, the break point is within θi−1 and Ψi−1, and a prefix of both is
reduced. The parser on input u thus reduces X before Y. Since we have
assumed that T is valid, we must conclude that there are two distinct parse
trees for u, and thus that G is ambiguous. Since G is unambiguous, we discard
this case.
Case 3: x has not been reduced to X, but y has been reduced to Y. Then
θi−1 = θY for some θ. We must consider the position of the stack break
point in two subcases:
(a) If the break point is within θi−1, and hence Ψi−1, the only way that
Ψi could be different from θi occurs when the reduction of θi−1 reduces
a prefix of θi−1, and the symbol to the left of θi−1 is ⋖ related to the leftmost
symbol of θi−1. However, for Ψi−1, the ≐ relation holds, so a different
reduction occurs. But then, some symbols of Ψi−1 that have yet to be reduced
to X are reduced along with some symbols outside of Ψi−1. We rule out this
possibility using the argument of Case 2.
(b) If the break point is to the right of θi−1, then its prefix θ can never
reach the top of stack 1 without Y being reduced, for the only way to decrease
the length of stack 1 is to perform reductions of its top symbols. But then,
in the processing of u, by the time x is reduced to X, the Y has been reduced.
However, we know that the X and Y must be reduced together in the unique
derivation tree for u. We are forced to conclude in this case, too, that either
T is not valid or G is ambiguous.

Case 4: Neither x nor y has been reduced to X or Y. Here, one of
the arguments of Cases 2 and 3 must apply.
We have thus ruled out all possibilities and conclude that μ ∩ ρ*μλ⁺
must be empty for a Colmerauer grammar. □

We now show that if there are symbols X and Y in a CFG G such that
X ρ⁺μ Y and X μλ* Y, then G cannot be a Colmerauer grammar.

LEMMA 6.9

Let G = (N, Σ, P, S) be a CFG such that for some X and Y in N ∪ Σ,
X ρ⁺μ Y and X μλ* Y. Then G is not a Colmerauer grammar.
Proof. The proof is left for the Exercises, and is similar, unfortunately,
to that of Lemma 6.8. Since X ρ⁺μ Y, we can find A → αZYβ in P such that
Z ⇒⁺ α'X. Since X μλ* Y, we can find B → γXCδ in P such that C ⇒* Yδ'. By
the properness of G, we can find words u and v in L(G) such that each deri-
vation of u involves the production A → αZYβ and the derivation of α'X
from that Z; each derivation of v involves B → γXCδ and the derivation of
Yδ' from C. In each case, X derives x and Y derives y for some x and y
in Σ*.
As in Lemma 6.8, we watch what happens to xy in u and v. In u, we find
that X must be reduced before Y, while in v, either X and Y are reduced at
the same time (if C ⇒* Yδ' is a trivial derivation) or Y is reduced before
X (if C ⇒⁺ Yδ'). Using arguments similar to those of the previous lemma,
we can prove that as soon as the strings to which xy is reduced in u and v
differ, one derivation or the other has gone astray. □

Thus the conditions μ ∩ ρ*μλ⁺ = ∅ and ρ⁺μ ∩ μλ* = ∅ are necessary
for a Colmerauer grammar. We shall now proceed to show that, along with
unambiguity, properness, and unique invertibility, they are sufficient.

LEMMA 6.10

Let G = (N, Σ, P, S) be any proper grammar. Then if αXYβ is any
sentential form of G, we have X ρ*μλ* Y.

Proof. Elementary induction on the length of a derivation of αXYβ. □

LEMMA 6.11

Let G = (N, Σ, P, S) be unambiguous and proper, with μ ∩ ρ*μλ⁺ = ∅
and ρ⁺μ ∩ μλ* = ∅. If αYX1 ... XkZβ is a sentential form of G, then
the conditions X1 μ X2, ..., Xk−1 μ Xk, Y ρ*μλ⁺ X1, and Xk ρ⁺μλ* Z imply
that X1 ... Xk is a phrase of αYX1 ... XkZβ.

Proof. If not, then there is some other phrase of αYX1 ... XkZβ which
includes X1.

Case 1: Assume that X2, ..., Xk are all included in this other phrase.
Then either Y or Z is also included, since the phrase is not X1 ... Xk. Assum-
ing that Y is included, then Y μ X1. But we know that Y ρ*μλ⁺ X1, so that
μ ∩ ρ*μλ⁺ ≠ ∅. If Z is included, then Xk μ Z. But we also have Xk ρ⁺μλ* Z.
If λ* represents at least one instance of λ, i.e., Xk ρ⁺μλ⁺ Z, then
μ ∩ ρ*μλ⁺ ≠ ∅. If λ* represents zero instances of λ, then Xk ρ⁺μ Z. Since
Xk μ Z, we have Xk μλ* Z, so ρ⁺μ ∩ μλ* ≠ ∅.

Case 2: Xi is in the phrase, but Xi+1 is not, for some i such that 1 ≤ i < k.
Let the phrase be reduced to A. Then by Lemma 6.10 applied to the sentential
form to which we may reduce αYX1 ... XkZβ, we have A ρ*μλ* Xi+1, and
hence Xi ρ⁺μλ* Xi+1. But we already have Xi μ Xi+1, so either μ ∩ ρ*μλ⁺
≠ ∅ or ρ⁺μ ∩ μλ* ≠ ∅, depending on whether zero or at least one instance
of λ is represented by λ* in ρ⁺μλ*. □

LEMMA 6.12
Let G = (N, Σ, P, S) be a CFG which is unambiguous, proper, and
uniquely invertible and for which μ ∩ ρ*μλ⁺ = ∅ and ρ⁺μ ∩ μλ* = ∅.
Then G is a Colmerauer grammar.

Proof. We define Colmerauer precedence relations as follows:

(1) X ≐ Y if and only if X μ Y.
(2) X ⋖ Y if and only if X μλ⁺ Y, or X = $ and Y ≠ $.
(3) X ⋗ Y if and only if X ≠ $ and Y = $, or X ρ⁺μλ* Y but X μλ⁺ Y
is false.

It is easy to show that these relations are disjoint. If ≐ ∩ ⋖ ≠ ∅ or
≐ ∩ ⋗ ≠ ∅, then μ ∩ ρ*μλ⁺ ≠ ∅ or μ ∩ ρ⁺μ ≠ ∅, in which case
μλ* ∩ ρ⁺μ ≠ ∅. If ⋖ ∩ ⋗ ≠ ∅, then for some X and Y we have X μλ⁺ Y
and also that X μλ⁺ Y is false, an obvious impossibility.
Suppose that T, the induced two-stack parser, has rule (YX1 ... Xk, Z)
→ (Y, AZ). Then Y ⋖ X1, so Y = $ or Y μλ⁺ X1. Also, Xi ≐ Xi+1 for
1 ≤ i < k, so Xi μ Xi+1. Finally, Xk ⋗ Z, so Z = $ or Xk ρ⁺μλ* Z. Ignoring
the cases Y = $ or Z = $ for the moment, Lemma 6.11 assures us that if
the string on the two stacks is a sentential form of G, then X1 ... Xk is
a phrase thereof. The cases Y = $ or Z = $ are easily treated, and we can
conclude that every reduction performed by T on a sentential form yields
a sentential form.
It thus suffices to show that when started with w in L(G), T will continue
to perform reductions until it reduces to S.
By Lemma 6.10, if X and Y are adjacent symbols of any sentential form,
then X ρ*μλ* Y. Thus either X μ Y, X ρ⁺μ Y, X μλ⁺ Y, or X ρ⁺μλ⁺ Y. In
each case, X and Y are related by one of the Colmerauer precedence relations.
A straightforward induction on the number of moves made by T shows
that if X and Y are adjacent symbols on stack 1, then X ⋖ Y or X ≐ Y.
The argument is, essentially, that the only way X and Y could become adja-
cent is for Y to be shifted onto stack 1 when X is the top symbol. The rules
of T imply that X ⋖ Y or X ≐ Y.
Since $ remains on stack 2, there is always some pair of adjacent symbols
on stack 2 related by ⋗. Thus, unless configuration ($, S$, π) is reached by
T, it will always shift until the tops of stacks 1 and 2 are related by ⋗. At that
time, since the ⋗ relation never holds between adjacent symbols on stack 1,
a reduction is possible and T proceeds. □

THEOREM 6.6

A grammar is a Colmerauer grammar if and only if it is unambiguous, proper,
uniquely invertible, and μ ∩ ρ*μλ⁺ = ρ⁺μ ∩ μλ* = ∅.

Proof. Immediate from Lemmas 6.8, 6.9, and 6.12. □

Example 6.16

We saw in Example 6.15 that the grammar S → aSA | bSA | b, A → a
satisfies the desired conditions. Lemma 6.12 suggests that we define
Colmerauer precedence relations for this grammar according to Fig. 6.4. □

        |  a    b    S    A    $
   -----+------------------------
     $  |  ⋖    ⋖    ⋖    ⋖
     a  |  ⋖    ⋖    ≐    ⋗    ⋗
     b  |  ⋖    ⋖    ≐    ⋗    ⋗
     S  |  ⋖              ≐    ⋗
     A  |  ⋗              ⋗    ⋗

        Fig. 6.4  Colmerauer precedence relations.
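The relations of Lemma 6.12, and hence tables such as Fig. 6.4, can be produced automatically. The sketch below is ours; it reuses relations(), compose(), and closure() from the sketch following Example 6.15, and the set of grammar symbols is passed in explicitly.

    def colmerauer_relations(prods, symbols):
        # Construct the relations of Lemma 6.12 over symbols plus '$'.
        # Reuses relations(), compose(), closure() from the earlier sketch.
        lam, mu, rho = relations(prods)
        lam_plus, rho_plus = closure(lam), closure(rho)
        lam_star = lam_plus | {(x, x) for x in symbols}
        eq = set(mu)                                               # middle relation: X mu Y
        lt = compose(mu, lam_plus) | {('$', Y) for Y in symbols}   # X mu lambda+ Y, or X = $
        gt = {(X, '$') for X in symbols}                           # every X above $
        gt |= compose(rho_plus, compose(mu, lam_star)) - compose(mu, lam_plus)
        return lt, eq, gt

    # colmerauer_relations([('S', 'aSA'), ('S', 'bSA'), ('S', 'b'), ('A', 'a')], 'abSA')
    # reproduces the entries of Fig. 6.4.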

EXERCISES

6.2.1. Which of the following are top-down parses in G0? What word is derived
if it is?
(a) (1, 0) (3, 2) (5, 4) (2, 0) (4, 2) (2, 5) (4, 0) (4, 5) (6, 5) (6, 2) (6, 0).
(b) (2, 0) (4, 0) (5, 0) (6, 1).
6.2.2. Give a two-stack parser valid for G0.
6.2.3. Which of the following are Colmerauer grammars?
(a) G0.
(b) S → aA | bB
    A → 0A1 | 01
    B → 0B11 | 011.
(c) S → aAB | b
    A → bSB | a
    B → a.
*6.2.4. Show that if G is proper, uniquely invertible, μ ∩ ρ*μλ⁺ = ∅, and
ρ⁺μ ∩ μλ* = ∅, then G is unambiguous. Can you use this result to
strengthen Theorem 6.6?
6.2.5. Show that every uniquely invertible regular grammar is a Colmerauer
grammar.
6.2.6. Show that every uniquely invertible grammar in GNF such that
ρ⁺μ ∩ μ = ∅ is a Colmerauer grammar.
6.2.7. Show that the two-stack parser of Example 6.12 is valid.
6.2.8. Show, using Theorem 6.6, that every simple precedence grammar is a
Colmerauer grammar.
6.2.9. Prove Lemma 6.9.
6.2.10. Prove Lemma 6.10.
6.2.11. Let G be a Colmerauer grammar with Colmerauer precedence relations
⋖, ≐, and ⋗ such that the induced two-stack parser not only parses
every word in L(G), but correctly parses every sentential form of G.
Show that
(a) μ ⊆ ≐.†
(b) μλ⁺ ⊆ ⋖.
(c) ρ⁺μ ⊆ ⋗.
(d) ρ⁺μλ⁺ ⊆ ⋖ ∪ ⋗.

†This is a special case of Lemma 6.7 and is included only for completeness.

*6.2.12. Let G be a Colmerauer grammar, and let σ be any subset of
ρ⁺μλ⁺ − ρ⁺μ − μλ⁺. Show that the relations ≐ = μ, ⋖ = ρ*μλ⁺ − σ,
and ⋗ = ρ⁺μ ∪ σ are Colmerauer precedence relations capable of
parsing any sentential form of G.
6.2.13. Show that every two-stack parser operates in time O(n) on strings of
length n.
*6.2.14. Show that if L has a Colmerauer grammar, then L^R has a Colmerauer
grammar.
*6.2.15. Is every (1, 1)-BRC grammar a Colmerauer grammar?
*6.2.16. Show that there exists a Colmerauer language L such that neither L nor
L^R is a deterministic CFL. Note that the language {wba^n | |w| = n}, which
we have been using as an example, is not deterministic but that its
reverse is.
*6.2.17. We say that a two-stack parser T recognizes the domain of τ(T) regard-
less of whether a parse emitted has any relation to the input. Show that
every recursively enumerable language is recognized by some determin-
istic two-stack parser. Hint: It helps to make the underlying grammar
ambiguous.

Open Problem

6.2.18. Characterize the class of CFG's, unambiguous or not, which have valid
deterministic two-stack parsers induced by disjoint "precedence" rela-
tions.

BIBLIOGRAPHIC NOTES

Colmerauer grammars and Theorem 6.6 were first given by Colmerauer [1970].
These ideas were related to the token set concept by Gray and Harrison [1969].
Cohen and Culik [1971] consider an LR(k)-based scheme which effectively incor-
porates backtrack.
APPENDIX

The Appendix contains the syntactic descriptions of four programming


languages:
(1) A simple base language for an extensible language.
(2) SNOBOL4, a string-manipulating language.
(3) PL360, a high-level machine language for the IBM 360 computers.
(4) PAL, a language combining lambda calculus with assignment state-
ments.
These languages were chosen for their diversity. In addition, the syntactic
descriptions of these languages are small enough to be used in some of
the programming exercises throughout this book without consuming an
excessive amount of time (both human and computer). At the same time
the languages are quite sophisticated and will provide a flavor of the prob-
lems incurred in implementing the more traditional programming languages
such as ALGOL, FORTRAN, and PL/I. Syntactic descriptions of the latter
languages can be found in the following references:
(1) ALGOL 60 in Naur [1963].
(2) ALGOL 68 in Van Wijngaarden [1969].
(3) FORTRAN in ANS X3.9 [1966]. (Also see ANSI Subcommittee X3J3 [1971].)
(4) PL/I in the IBM Vienna Laboratory Technical Report TR 25.096.

A.1. SYNTAX FOR AN EXTENSIBLE BASE
LANGUAGE

The following language was proposed by Leavenworth† as a base language
which can be extended by the use of syntax macros. We shall give the syntax

†B. M. Leavenworth, "Syntax Macros and Extended Translation," Comm. ACM 9,
No. 11 (November 1966), 790-793. Copyright © 1966, Association for Computing
Machinery, Inc. The syntax of the base language is reprinted here by permission of the
Association for Computing Machinery.


of this language in two parts. The first part consists of the high-level produc-
tions which define the base language. This base language can be used as
a block-structured algebraic language by itself.
The second part of the description is the set of productions which defines
the extension mechanism. The extension mechanism allows new forms of
statements and functions to be declared by means of a syntax macro defini-
tion statement using production 37. This production states that an instance
of a (statement) can be a (syntax macro definition), which, by productions
39 and 40, can be either a (statement macro definition) or a (function macro
definition).
In productions 41 and 42 we see that each of these macro definitions
involves a (macro structure) and a (definition). The (macro structure)
portion defines the form of the new syntactic construct, and the (definition)
portion gives the translation that is to be associated with the new syntactic
construct. Both the (macro structure) and (definition) can be any string of
nonterminal and terminal symbols except that each nonterminal in the
(definition) portion must appear in the (macro structure). (This is similar
to a rule in an SDTS except that here there is no restriction on how many
times one nonterminal can be used in the translation element.)
We have not given the explicit rules for (macro structure) and (defini-
tion). In fact, the specification that each nonterminal in the (definition)
portion appear in the (macro structure) portion of a syntax macro definition
cannot be specified by context-free productions.
Production 37 indicates that we can use any instance of a (macro struc-
ture) defined in a statement macro definition wherever (statement) appears
in a sentential form. Likewise, production 43 allows us to use any instance
of a (macro structure) defined in a function macro definition anywhere
(primary) appears in a sentential form.
For example, we can define a sum statement by the derivation

(statement) ⇒ (syntax macro definition)
            ⇒ (statement macro definition)
            ⇒ smacro (macro structure) define (definition) endmacro
A possible (macro structure) is the following:

sum (expression)(1) with (variable) ← (expression)(2) to (expression)(3)

We can define a translation for this macro structure by expanding (defini-
tion) as

begin local t; local s; local r;
t ← 0;
(variable) ← (expression)(2);
r: if (variable) > (expression)(3) then goto s;
t ← t + (expression)(1);
(variable) ← (variable) + 1;
goto r;
s: result t
end

Then if we write the statement

sum a with b ← c to d

this would first be translated into

begin local t; local s; local r;
t ← 0;
b ← c;
r: if b > d then goto s;
t ← t + a;
b ← b + 1;
goto r;
s: result t
end

before being parsed according to the high-level productions.


Finally, the nonterminals (identifier), (label), and (constant) are lexical
items which we shall leave unspecified. The reader is invited to insert his
favorite definitions of these items or to treat them as terminal symbols.

High-Level Productions

1 (program) →
    (block)
2 (block) →
    begin (opt local ids) (statement list) end
3 (opt local ids) →
    (opt local ids) local (identifier); | e
5 (statement list) →
    (statement) | (statement list); (statement)
7 (statement) →
    (variable) ← (expression) | goto (identifier) |
    if (expression) then (statement) | (block) | result (expression) |
    (label): (statement)
13 (expression) →
    (arithmetic expression) (relation op) (arithmetic expression) |
    (arithmetic expression)
15 (arithmetic expression) →
    (arithmetic expression) (add op) (term) | (term)
17 (term) →
    (term) (mult op) (primary) | (primary)
19 (primary) →
    (variable) | (constant) | ((expression)) | (block)
23 (variable) →
    (identifier) | (identifier) ((expression list))
25 (expression list) →
    (expression list), (expression) | (expression)
27 (relation op) →
    < | ≤ | = | ≠ | > | ≥
33 (add op) →
    + | −
35 (mult op) →
    * | /
Extension Mechanism

37 (statement) →
    (syntax macro definition) | (macro structure)
39 (syntax macro definition) →
    (statement macro definition) | (function macro definition)
41 (statement macro definition) →
    smacro (macro structure) define (definition) endmacro
42 (function macro definition) →
    fmacro (macro structure) define (definition) endmacro
43 (primary) →
    (macro structure)

Notes

1. (macro structure) and (definition) can be any string of nonterminal
or terminal symbols. However, any nonterminal used in (definition) must
also appear in the corresponding (macro structure).
2. (constant), (identifier), and (label) are lexical variables which have
not been defined here.

A.2. SYNTAX OF SNOBOL4 STATEMENTS

Here we shall define the syntactic structure of SNOBOL4 statements as
described by Griswold et al.† The syntactic description is in two parts. The
first part contains the context-free productions describing the syntax in terms
of lexical variables which are described in the second part using regular
definitions of Chapter 3. The division between the syntactic and lexical parts
is arbitrary here, and the syntactic description does not reflect the relative
precedence or associativity of the operators. All operators associate from
left to right except ¬, !, and **. The precedence of the operators is as
follows:

1. &        2. |        3. (blanks)
4. @        5. + −      6. #
7. /        8. *        9. %
10. ! **    11. $ .     12. ¬ ?
High-Level Productions

1 (statement>--~
(assignment statement> i (matching statement>!
(replacement statement>l(degenerate statement>l<end statement>
6 (assignment statement>
(optional label> <subject field> (equal> (object field> (goto field>
<eos>
7 (matching statement>
<optional label> (subject field> <pattern field> <goto field> (eos>
8 (replacement statement> --~
<optional label> <subject field> <pattern field> <equal> <object
field> (goto field> (eos>
9 (degenerate statement> ---~
(optional label> <subject field> (goto field> <eos> I
(optional label> (goto field> (eos>
11 (end statement> --~
END <eos> l END <blanks> <label> (eos> I
END <blanks> END (eos>
14 <optional label>
(label> [e
16 <subject field> --,
(blanks> (element>

†R. E. Griswold, J. F. Poage, and I. P. Polonsky, The SNOBOL4 Programming Lan-
guage (2nd ed.) (Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1971), pp. 198-199.

17 (equal) ---~
(blanks) =
18 (object field)
(blanks) (expression)
19 (goto field)
(blanks) : (optional blanks)(basic goto)l e
21 (basic goto)----~
(goto) !S (goto) (optional blanks) (optional F goto) I
F (goto)(optional blanks)(optional S goto)
24 (goto)---~
((expression)) [ < (expression) >
26 (optional S goto) ---~
S (goto) Ie
28 (optional F goto) --~
F(goto)le
30 (eos) →
    (optional blanks) ; | (optional blanks) (eol)
32 (pattern field~
(blanks) (expression~
33 (element)
(optional unaries~ (basic element~
34 (optional unaries~---~
(operator) (optional unaries) 1e
36 ~basic element~
~identifier~ [~literal~ i (function call) I(reference~ I((expressionS)
41 (function call~ ---~
(identifier~ ((arg list~)
42 (reference~
(identifier) < (arg list) >
43 (arg list~ ---~
(arg list), (expression~ I(expression)
45 (expression~----~
(optional blanks~ (element) (optional blanks) [
(optional blanks~ (operation~ (optional blanks) 1
(optional blanks~
48 (optional blanks~
(blanks)! e
50 (operation~
(element) (binary) (element) l (element) (binary> (expression~
Regular Definitions for Lexical Syntax

(digit) =
    0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
(letter) =
    A | B | C | ... | Z
(alphanumeric) =
    (letter) | (digit)
(identifier) =
    (letter) ((alphanumeric) | . | _)*
(blanks) =
    (blank character)+
(integer) =
    (digit)+
(real) =
    (integer) . (integer) | (integer) .
(operator) =
    ¬ | ? | $ | . | ! | % | * | / | # | + | − | @ | | | &
(binary) =
    (blanks) | (blanks) (operator) (blanks) | (blanks) ** (blanks)
(sliteral) =
    ' ((EBCDIC character) − ')* '†
(dliteral) =
    " ((EBCDIC character) − ")* "
(literal) =
    (sliteral) | (dliteral) | (integer) | (real)
(label) =
    (alphanumeric) ((EBCDIC character) − ((blank character) | ;))* − END

Lexical Variables

(blank character)
(EBCDIC character)
(eol)‡

†The minus sign is a metasymbol here and on the following lines.
‡End of line.

A.3. SYNTAX FOR PL360

This section contains the syntactic description of PL360, a high-level
machine language devised by Niklaus Wirth for the IBM 360 computers.
The syntactic description is the precedence grammar given by Wirth [1968].§

§Niklaus Wirth, "PL360, a Programming Language for the 360 Computers," J. ACM
15, No. 1 (January 1968), 37-74. Copyright © 1968, Association for Computing Ma-
chinery, Inc. The syntax is reprinted by permission of the Association for Computing
Machinery.

High-Level Productions

1 (register)--~
(identifier)
2 (cell identifier) ----~
(identifier)
3 (procedure identifier)----~
(identifier)
4 (function identifier)
(identifier)
5 (cell)---~
(cell identifier) ](celll)) l (cell2))
8 (celll) --,
(cell2) (arith op) (number)! (cell3) (number)
lO (cell2)
(cell3) (register)
11 (cell3)
(cell identifier) (
12 (unary op) →
    abs | neg | neg abs
15 (arith op) →
    + | − | * | / | ++ | −−
21 (logical op) →
    and | or | xor
24 (shift op) →
    shla | shra | shll | shrl
28 (register assignment) --~
(register) := (cell) 1
(register) := (number)l
(register) := (string)[
(register) := (register) i
(register) "= (unary op) (cell) !
(register) := (unary op)(number)l
(register) := (unary op)(register) !
(register) := @ (cell)]
(register assignment) (arith op) (cell) l
(register assignment) (arith op) (number)[
(register assignment) (arith op) (register) !
(register assignment) (logical op) (cell) !
(register assignment) (logical op) (number)[
(register assignment) (logical op) (register) !
(register assignment) (shift op) (number) I
(register assignment) (shift op) (register)

44 (funcl) --~
(func2) (number) I
(func2) (register)[
(func2) (cell) [
(func2) (string)
48 (func2)--~
(function identifier) ) ](func 1),
50 (case sequence)----~
case (register) of begin I(case sequence) (statement) ;
52 (simple statement) ---~
(cell) := (register) I(register assignment) Inull [goto (identifier) [
(procedure identifier) ](function identifier) !(func 1) ( ]
(case sequence) end I(blockbody) end
61 (relation) →
    < | = | > | <= | >= | ¬=
67 (not) →
    ¬
68 (condition)
(register) (relation) (cell) I
(register) (relation) (number)[
(register) (relation) (register) [
(register) (relation) (string) !
overflow 1(relation) I
(cell) [(not) (cell)
76 (compound condition)
(condition) ](comp aor) (condition)
78 (comp aor)--~
(compound condition) and 1(compound condition) or
80 (cond then.)----~
(compound condition) then
81 (true part) ---~
(simple statement) else
82 (while)--~
while
83 (cond do)---~
(compound condition) do
84 (assignment step) ---~
(register assignment) step (number)
85 (limit)
until (register) Iuntil (cell) Iuntil (number)
88 (do)
do
89 (statement*)

(simple statement)[
if (cond then) (statement*) I
if (cond then) (true part) (statement*) I
(while) (cond do) (statement*) l
for (assignment step) (limit) (do) (statement*)
94 (statement)----~
(statement*)
95 (simple type)
short integer [ integer ! logical treal [ long real Ibyte [ character
102 (type) ---~
(simple type) !array (number) (simple type)
104 (decll)--~
(type) (identifier) I(decl2) (identifier)
106 (decl2)
(decl7),
107 (decl3) ---~
(decll~ =
108 (decl4) --~
(decl3) (I (decl5),
110 (decl5~ ---o
(decl4) (number) I(decl4) (string)
112 (decl6) ---~
(decl3)
113 (dee17) --~
(decll) I
(decl6) (number) [
(decl6) (string) I
(decl5))
117 ~function declarationl)
function I(function declaration7)
119 (function declaration2) ---~
(function declarationl) (identifier)
120 (function declaration3)---~
(function declaration2) (
121 (function declaration4) ---~
(function declaration3) (number)
122 (function declaration5)
(function declaration4),
123 (function declaration6)
(function declaration5) (number)
124 (function declaration7) --~
(function declaration6) )
125 (synonymous dcl) ----~

(type) (identifier) syn[


(simple type) register (identifier) synl
(synonymous dc3) (identifier) syn
128 (synonymous dc2) ----~
(synonymous de l) (cell) I
(synonymous dcl) (number) 1
(synonymous dcl) (register)
131 (synonymous dc3) --~
(synonymous dc2),
132 (segment head) ----~
segment
133 (procedure headingl) --~
procedure [(segment head) procedure
135 (procedure heading2) --~
(procedure headingl) (identifier)
136 (procedure heading3) ---~
(procedure heading2) (
137 (procedure heading4) ---~
(procedure heading3) (register)
138 (procedure heading5)
(procedure heading4) )
139 (procedure heading@ --~
(procedure heading5) ;
140 (declaration) --~
(decl7) I(function declaration7) l (synonymous dc2) l
(procedure heading@ (statement*) I(segment head) base (register)
145 (label definition)
(identifier):
i 46 (blockhead) ---~
begin I(blockhead) (declaration) ;
148 (blockbody)
(blockhead) 1
(blockbody) (statement) ;1
(blockbody) (label definition)
151 (program) ---~
. (statement).
Lexical Variables

(identifier)
(string)
(number)

A.4. A SYNTAX-DIRECTED TRANSLATION
SCHEME FOR PAL

Here we provide a simple syntax-directed translation scheme for PAL, a
programming language devised by J. Wozencraft and A. Evans† embodying
lambda calculus and assignment statements. PAL is an acronym for
Pedagogic Algorithmic Language.
The simple SDTS presented here maps programs in a slightly modified
version of PAL into a postfix Polish notation. This SDTS is taken from
DeRemer [1969].‡ The SDTS is presented in two parts. The first part is
the underlying context-free grammar, a simple LR(1) grammar. The second
part defines the semantic rule associated with each production of the underly-
ing grammar.
We also provide a regular definition description of the lexical variables
<relational functor>, <variable>, and <constant> used in the SDTS.

High-Level Productions

I (program)--~
(definition list)I <expressio n)
3 (definition list)
def (definition) (definition list) ]def (definition)
5 (expression)
let (definition) in (expression)]
fn (by p a r t ) . (expression)]
(where expression)
8 (where expression)
(valof expression) where (rec definition) ](valof expression)
10 (valof expression)
valof (command) [(command)
12 (command)--~
(labeled command) ; (command) ](labeled command)
14 (labeled command)
(variable) : (labeled command)[(conditional command)
16 (conditional command) --~
test (boolean) ifso (labeled command) ifnot (labeled command)[
test (boolean) ifnot (labeled command) ifso (labeled command)[

†A complete description of PAL is given in: John M. Wozencraft and Arthur Evans,
Jr., Notes on Programming Linguistics, Department of Electrical Engineering, Massa-
chusetts Institute of Technology, Cambridge, Mass., July 1969. The syntax is reprinted
by permission of the authors.
‡F. L. DeRemer, Practical Translators for LR(k) Languages, Ph.D. Thesis, M.I.T.,
Cambridge, Mass., 1969, by permission of the author.

if (boolean) do (labeled command) l


unless (boolean) do (labeled command) I
while (boolean) do (labeled command) 1
until (boolean) do (labeled command) I
(basic command)
23 (basic command) ---~
(tuNe> := (tuple) Igoto (combination) [
res (tuple) I(tuple)
27 (tuple)
(T1)I(T1), (tuple)
29 (T1) ---~
(T1) aug (conditional expression) I(conditional expression)
31 (conditional expression)
(boolean) --~ (conditional expression) I @onditional expression) [
<T2>
33 (T2)
$ (combination) !(boolean)
35 (boolean)--~
(boolean) or @onjunction) I@onjunction)
37 (conjunction)
(conjunction) & (negation) l (negation)
39 (negation)
not (relation) I(relation)
41 (relation) --~
(arithmetic expression)(relational functor)
(arithmetic expression) [
(arithmetic expression)
43 (arithmetic expression)--~
(arithmetic expression) + (term)l
(arithmetic expression) -- (term)!
+ ( t e r m ) l - (term)[ (term)
48 (term)--~
(term) • (factor) i (term) / (factor) [(factor)
51 (factor) --~
(primary) ** (factor) I(primary)
53 (primary) --~
(primary) 7o (variable) (combination) I(combination)
55 (combination)
(combination) (rand) [(rand)
57 (rand)
(variable) 1@onstant) I((expression)) I[(expression)]
61 (definition) --~

(inwhich definition) within (definition) l


(inwhich definition)
63 (inwhich definition)
(inwhich definition) inwhich (simultaneous definition) I
(simultaneous definition)
65 (simultaneous definition)---~
(rec definition) and (simultaneous definition) I
(rec definition)
67 (rec definition)
tee (basic definition) I(basic definition)
69 (basic definition)
(variable list)= (expression)[
(variable) (bv part) = (expression) I
((definition)) I[(definition)]
73 (bv part)----~
(bv part) (basic bv) 1(basic bv)
75 (basic bv)
(variable) I((variable list)) i ( )
78 (variable list)
(variable), (variable list) [(variable)
Rules Corresponding to the High-Level
Productions

1 (program)=
(definition list) I(expression)
3 (definition list) =
(definition) (definition list) defl (definition) lastdef
5 (expression)=
(definition) (expression) let l
(bv part) (expression) lambda I
(where expression)
8 (where expression) =
(valof expression) (rec definition) where I
(valof expression)
10 (valof expression) =
(command) valof !(command)
12 (command)=
(labeled command)(command); I(labeled command)
14 (labeled command) =
(variable) (labeled command): I(conditional command)
16 (conditional command) =
(boolean) (labeled command) (labeled command) test-true I
(boolean) (labeled command) (labeled command) test-false I

        <boolean> <labeled command> ift |
        <boolean> <labeled command> unless |
        <boolean> <labeled command> while |
        <boolean> <labeled command> until |
        <basic command>
23  <basic command> =
        <tuple> <tuple> := | <combination> goto |
        <tuple> res | <tuple>
27  <tuple> =
        <T1> | <T1> <tuple> ,
29  <T1> =
        <T1> <conditional expression> aug | <conditional expression>
31  <conditional expression> =
        <boolean> <conditional expression>
        <conditional expression> test-true |
        <T2>
33  <T2> =
        <combination> $ | <boolean>
35  <boolean> =
        <boolean> <conjunction> or | <conjunction>
37  <conjunction> =
        <conjunction> <negation> & | <negation>
39  <negation> =
        <relation> not | <relation>
41  <relation> =
        <arithmetic expression> <arithmetic expression>
        <relational functor> |
        <arithmetic expression>
43  <arithmetic expression> =
        <arithmetic expression> <term> + |
        <arithmetic expression> <term> - |
        <term> pos | <term> neg | <term>
48  <term> =
        <term> <factor> * | <term> <factor> / | <factor>
51  <factor> =
        <primary> <factor> exp | <primary>
53  <primary> =
        <primary> <variable> <combination> % | <combination>
55  <combination> =
        <combination> <rand> gamma | <rand>
57  <rand> =
        <variable> | <constant> | <expression> | <expression>
61  <definition> =
        <inwhich definition> <definition> within |
        <inwhich definition>
63  <inwhich definition> =
        <inwhich definition> <simultaneous definition> inwhich |
        <simultaneous definition>
65  <simultaneous definition> =
        <rec definition> <simultaneous definition> and |
        <rec definition>
67  <rec definition> =
        <basic definition> rec | <basic definition>
69  <basic definition> =
        <variable list> <expression> = |
        <variable> <bv part> <expression> ff |
        <definition> | <definition>
73  <bv part> =
        <bv part> <basic bv> | <basic bv>
75  <basic bv> =
        <variable> | <variable list> | ( )
78  <variable list> =
        <variable> <variable list> vl | <variable>

Regular Definitions

<uppercase letter> =
        A | B | C | ... | Z
<lowercase letter> =
        a | b | c | ... | z
<digit> =
        0 | 1 | 2 | ... | 9
<letter> =
        <uppercase letter> | <lowercase letter>
<alphanumeric> =
        <letter> | <digit>
<truthvalue> =
        true | false
<variable head> =
        <digit>+ ( <letter> | _ ) |
        <lowercase letter>+ ( <uppercase letter> | <digit> | _ ) |
        <uppercase letter> | _
<variable> =
        <lowercase letter> | <variable head> ( <alphanumeric> | _ )*
<integer> =
        <digit>+
<real> =
        <digit>+ . <digit>+
<quotation element> =
        <any character other than * or '> |
        *n | *t | *b | *s | ** | *' | *k | *r
<quotation> =
        ' <quotation element>* '
<constant> =
        <integer> | <real> | <quotation> | <truthvalue> | e
<relational functor> =
        gr | ge | eq | ne | ls | le
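In these rules the translation element of a production is simply its operands followed by a postfix operator name, so the translation of an expression is its postfix (reverse Polish) form. The sketch below is not part of the original appendix; it is a small, hypothetical illustration, in Python, of a recursive-descent translator for only the +, -, *, /, and ** portion of rules 43-57, with the left recursion of rules 43 and 48 replaced by iteration, and with unary pos/neg, %, conditionals, and function application omitted. It emits the postfix translation that those rules prescribe.

# Illustrative sketch only (not from the text): a recursive-descent translator
# for the +, -, *, /, ** fragment of the PAL arithmetic-expression rules,
# emitting the postfix translation defined by rules 43-57 above.

import re

TOKEN = re.compile(r"\*\*|[()+\-*/]|[A-Za-z_]\w*|\d+")

def tokenize(text):
    pos, toks = 0, []
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError("unexpected character %r" % text[pos])
        toks.append(m.group(0))
        pos = m.end()
    toks.append("$")                      # endmarker
    return toks

class Translator:
    def __init__(self, toks):
        self.toks, self.i, self.out = toks, 0, []

    def peek(self):
        return self.toks[self.i]

    def take(self, expected=None):
        tok = self.toks[self.i]
        if expected is not None and tok != expected:
            raise SyntaxError("expected %s, found %s" % (expected, tok))
        self.i += 1
        return tok

    def expression(self):                 # rule 43, left recursion as a loop
        self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            self.term()
            self.out.append(op)           # operands first, then + or -

    def term(self):                       # rule 48
        self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            self.factor()
            self.out.append(op)

    def factor(self):                     # rule 51: ** is right associative
        self.primary()
        if self.peek() == "**":
            self.take()
            self.factor()
            self.out.append("exp")        # high-level rule 51 emits "exp"

    def primary(self):                    # rules 53-57, cut down to <rand>
        if self.peek() == "(":
            self.take("(")
            self.expression()
            self.take(")")
        else:
            self.out.append(self.take())  # a <variable> or <constant>

def translate(text):
    tr = Translator(tokenize(text))
    tr.expression()
    tr.take("$")
    return " ".join(tr.out)

if __name__ == "__main__":
    print(translate("x + y * z ** 2"))    # prints: x y z 2 exp * +

For example, translate("x + y * z ** 2") produces x y z 2 exp * +, which is the operand-then-operator order given by high-level rules 43, 48, and 51.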

BIBLIOGRAPHY

AHO, A. V. [1968]. Indexed grammars--an extension of context-free grammars.
J. ACM 15: 4, 647-671.
AHO, A. V., and J. D. ULLMAN [1969a]. Syntax directed translations and the
pushdown assembler. J. Computer and System Sciences 3: 1, 37-56.
AHO, A. V., and J. D. ULLMAN [1969b]. Properties of syntax directed translations.
J. Computer and System Sciences 3: 3, 319-334.
AHO, A. V., and J. D. ULLMAN [1971]. The care and feeding of LR(k) grammars.
Proc. of 3rd ACM Conf. on Theory of Computing, 159-170.
AHO, A. V., P. J. DENNING, and J. D. ULLMAN [1972]. Weak and mixed strategy
precedence parsing. J. ACM 19: 2, 225-243.
AHO, A. V., J. E. HOPCROFT, and J. D. ULLMAN [1968]. Time and tape complexity
of pushdown automaton languages. Information and Control 13: 3, 186-206.
ANS X3.9 [1966]. American National Standards FORTRAN. American National
Standards Institute, New York.
ANSI SUBCOMMITTEE X3J3 [1971]. Clarification of FORTRAN Standards--Second
Report. Comm. ACM 14: 10, 628-642.
ARBIB, M. A. [1970]. Theories of Abstract Automata. Prentice-Hall, Inc., Engle-
wood Cliffs, N.J.
BACKUS, J. W., et al. [1957]. The FORTRAN automatic coding system. Proc.
Western Joint Computer Conference 11, 188-198.
BAR-HILLEL, Y. [1964]. Language and Information. Addison-Wesley, Reading, Mass.
BAR-HILLEL, Y., M. PERLES, and E. SHAMIR [1961]. On formal properties of simple
phrase structure grammars. Z. Phonetik, Sprachwissenschaft und Kommunika-
tionsforschung 14, 143-172. Also in Bar-Hillel [1964], pp. 116-150.
BARNETT, M. P., and R. P. FUTRELLE [1962]. Syntactic analysis by digital com-
puter. Comm. ACM 5: 10, 515-526.


BAUER, H., S. BECKER, and S. L. GRAHAM [1968]. ALGOL W Implementation. CS98,
Computer Science Department, Stanford Univ., Stanford, Calif.
BERGE, C. [1958]. The Theory of Graphs and Its Applications. Wiley, New York.
BIRMAN, A., and J. D. ULLMAN [1970]. Parsing algorithms with backtrack. IEEE
Conf. Record of 11th Annual Symposium on Switching and Automata Theory,
pp. 153-174.
BLATTNER, M. [1972]. The Unsolvability of the Equality Problem for Sentential Forms
of Context-free Languages. Unpublished memorandum, UCLA, Los Angeles,
Calif.
BOBROW, D. G. [1963]. Syntactic analysis of English by computer--a survey.
Proc. AFIPS Fall Joint Computer Conference, 24. Spartan, New York, pp.
365-387.
BOOK, R. V. [1970]. Problems in formal language theory. Proc. Fourth Annual
Princeton Conference on Information Sciences and Systems, pp. 253-256.
BOOTH, T. L. [1967]. Sequential Machines and Automata Theory. Wiley, New York.
BORODIN, A. [1970]. Computational complexity--a survey. Proc. Fourth Annual
Princeton Conference on Information Sciences and Systems, pp. 257-262.
BRAFFORT, P., and D. HIRSCHBERG (eds.) [1963]. Computer Programming and
Formal Systems. North-Holland, Amsterdam.
BROOKER, R. A., and D. MORRIS [1963]. The compiler-compiler. Annual Review
in Automatic Programming, 3. Pergamon, Elmsford, N.Y., pp. 229-275.
BRZOZOWSKI, J. A. [1962]. A survey of regular expressions and their applications.
IRE Trans. on Electronic Computers 11: 3, 324-335.
BRZOZOWSKI, J. A. [1964]. Derivatives of regular expressions. J. ACM 11: 4,
481-494.
CANTOR, D. G. [1962]. On the ambiguity problem of Backus systems. J. ACM
9: 4, 477-479.
CHEATHAM, T. E. [1965]. The TGS-II translator-generator system. Proc. IFIP
Congress 65. Spartan, New York, pp. 529-593.
CHEATHAM, T. E. [1966]. The introduction of definitional facilities into higher level
programming languages. Proc. AFIPS Fall Joint Computer Conference, 30.
Spartan, New York, pp. 623-637.
CHEATHAM, T. E. [1967]. The Theory and Construction of Compilers (2nd ed.).
Computer Associates, Inc., Wakefield, Mass.
CHEATHAM, T. E., and K. SATTLEY [1964]. Syntax directed compiling. Proc. AFIPS
Spring Joint Computer Conference, 25. Spartan, New York, pp. 31-57.
CHEATHAM, T. E., and T. STANDISH [1970]. Optimization aspects of compiler-
compilers. ACM SIGPLAN Notices 5: 10, 10-17.
CHOMSKY, N. [1956]. Three models for the description of language. IEEE Trans.
on Information Theory, 2: 3, 113-124.

CHOMSKY, N. [1957]. Syntactic Structures. Mouton and Co., The Hague.
CHOMSKY, N. [1959a]. On certain formal properties of grammars. Information and
Control 2: 2, 137-167.
CHOMSKY, N. [1959b]. A note on phrase structure grammars. Information and
Control 2: 4, 393-395.
CHOMSKY, N. [1962]. Context-free grammars and pushdown storage. Quarterly
Progress Report, No. 65, Research Laboratory of Electronics, Massachusetts
Institute of Technology, Cambridge, Mass.
CHOMSKY, N. [1963]. Formal properties of grammars. In Handbook of Mathematical
Psychology, 2 (R. D. Luce, R. R. Bush, and E. Galanter, eds.). Wiley, New
York.
CHOMSKY, N. [1965]. Aspects of the Theory of Syntax. M.I.T. Press, Cambridge,
Mass.
CHOMSKY, N., and G. A. MILLER [1958]. Finite state languages. Information and
Control 1: 2, 91-112.
CHOMSKY, N., and M. P. SCHUTZENBERGER [1963]. The algebraic theory of context-
free languages. In Braffort and Hirschberg [1963], pp. 118-161.
CHRISTENSEN, C., and J. C. SHAW (eds.) [1969]. Proc. of the extensible languages
symposium. ACM SIGPLAN Notices 4: 8.
CHURCH, A. [1941]. The Calculi of Lambda Conversion. Annals of Mathematics
Studies 6. Princeton University Press, Princeton, N.J.
CHURCH, A. [1965]. Introduction to Mathematical Logic. Princeton University
Press, Princeton, N.J.
COCKE, J., and J. T. SCHWARTZ [1970]. Programming Languages and Their Com-
pilers. Courant Institute of Mathematical Sciences, New York University,
New York.
COHEN, R. S., and K. CULIK II [1971]. LR-Regular Grammars--an Extension of
LR(k) Grammars. IEEE Conf. Record of 12th Annual Symposium on Switching
and Automata Theory, pp. 153-165.
COHEN, D. J., and C. C. GOTLIEB [1970]. A list structure form of grammars for
syntactic analysis. Computing Surveys 2: 1, 65-82.
COLMERAUER, A. [1970]. Total precedence relations. J. ACM 17: 1, 14-30.
CONWAY, M. E. [1963]. Design of a separable transition-diagram compiler. Comm.
ACM 6: 7, 396-408.
CONWAY, R. W., and W. L. MAXWELL [1963]. CORC: the Cornell computing
language. Comm. ACM 6: 6, 317-321.
CONWAY, R. W., and W. L. MAXWELL [1968]. CUPL--An Approach to Introductory
Computing Instruction. TR No. 68-4, Dept. of Computer Science, Cornell
Univ., Ithaca, N.Y.
CONWAY, R. W., et al. [1970]. PL/C. A High Performance Subset of PL/I. TR70-55,
Dept. of Computer Science, Cornell Univ., Ithaca, N.Y.

COOK, S. A. [1971]. Linear time simulation of deterministic two-way pushdown
automata. Proc. IFIP Congress 71, TA-2. North-Holland Publishing Co.,
Netherlands, pp. 174-179.
COOK, S. A., and S. D. AANDERAA [1969]. On the minimum computation time of
functions. Trans. American Math. Soc. 142, 291-314.
CULIK, K. II [1968]. Contribution to deterministic top-down analysis of context-
free languages. Kybernetika 4: 5, 422-431.
DAVIS, M. [1958]. Computability and Unsolvability. McGraw-Hill, New York.
DAVIS, M. (ed.) [1965]. The Undecidable. Basic papers in undecidable propositions,
unsolvable problems and computable functions. Raven Press, New York.
DEREMER, F. L. [1969]. Practical translators for LR(k) languages. Ph.D. Thesis,
Massachusetts Institute of Technology, Cambridge, Mass.
DEREMER, F. L. [1971]. Simple LR(k) grammars. Comm. ACM 14: 7, 453-460.
DEWAR, R. B. K., R. R. HOCHSPRUNG, and W. S. WORLEY [1969]. The IITRAN
programming language. Comm. ACM 12: 10, 569-575.
EARLEY, J. [1968]. An efficient context-free parsing algorithm. Ph.D. Thesis,
Carnegie-Mellon Univ., Pittsburgh, Pa. Also see Comm. ACM 13: 2 (February,
1970), 94-102.
EICKEL, J., M. PAUL, F. L. BAUER, and K. SAMELSON [1963]. A syntax-controlled
generator of formal language processors. Comm. ACM 6: 8, 451-455.
ELSPAS, B., M. W. GREEN, and K. N. LEVITT [1971]. Software reliability. Computer
1, 21-27.
ENGELER, E. (ed.) [1971]. Symposium on Semantics of Algorithmic Languages.
Lecture Notes in Mathematics, Springer, Berlin.
EVANS, A., Jr. [1964]. An ALGOL 60 compiler. Annual Review in Automatic
Programming, 4. Pergamon, Elmsford, N.Y., pp. 87-124.
EVEY, R. J. [1963]. Applications of pushdown-store machines. Proc. AFIPS Fall
Joint Computer Conference, 24. Spartan, New York, pp. 215-227.
FELDMAN, J. A. [1966]. A formal semantics for computer languages and its applica-
tion in a compiler-compiler. Comm. ACM 9: 1, 3-9.
FELDMAN, J., and D. GRIES [1968]. Translator writing systems. Comm. ACM
11: 2, 77-113.
FISCHER, M. J. [1968]. Grammars with macro-like productions. IEEE Conf. Record
of 9th Annual Symposium on Switching and Automata Theory, pp. 131-142.
FISCHER, M. J. [1969]. Some properties of precedence languages. Proc. ACM
Symposium on Theory of Computing, pp. 181-190.
FLOYD, R. W. [1961]. A descriptive language for symbol manipulation. J. ACM
8: 4, 579-584.
FLOYD, R. W. [1962a]. Algorithm 97: shortest path. Comm. ACM 5: 6, 345.
FLOYD, R. W. [1962b]. On ambiguity in phrase structure languages. Comm. ACM
5: 10, 526-534.

FLOYD, R. W. [1963]. Syntactic analysis and operator precedence. J. ACM 10: 3,
316-333.
FLOYD, R. W. [1964a]. Bounded context syntactic analysis. Comm. ACM 7: 2,
62-67.
FLOYD, R. W. [1964b]. The syntax of programming languages--a survey. IEEE
Trans. on Electronic Computers 13: 4, 346-353.
FLOYD, R. W. [1967a]. Assigning meanings to programs. In Schwartz [1967], pp.
19-32.
FLOYD, R. W. [1967b]. Nondeterministic algorithms. J. ACM 14: 4, 636-644.
FREEMAN, D. N. [1964]. Error correction in CORC, the Cornell computing language.
Proc. AFIPS Fall Joint Computer Conference, 26. Spartan, New York, pp.
15-34.
GALLER, B. A., and A. J. PERLIS [1967]. A proposal for definitions in ALGOL.
Comm. ACM 10: 4, 204-219.
GARWICK, J. V. [1964]. GARGOYLE, a language for compiler writing. Comm.
ACM 7: 1, 16-20.
GENTLEMAN, W. M. [1971]. A portable coroutine system. Proc. IFIP Congress 71,
TA-3. North-Holland Publishing Co., Netherlands, pp. 94-98.
GILL, A. [1962]. Introduction to the Theory of Finite State Machines. McGraw-Hill,
New York.
GINSBURG, S. [1962]. An Introduction to Mathematical Machine Theory. Addison-
Wesley, Reading, Mass.
GINSBURG, S. [1966]. The Mathematical Theory of Context-Free Languages.
McGraw-Hill, New York.
GINSBURG, S., and S. GREIBACH [1966]. Deterministic context-free languages.
Information and Control 9: 6, 620-648.
GINSBURG, S., and S. GREIBACH [1969]. Abstract families of languages. Memoir
Amer. Math. Soc. No. 87.
GINSBURG, S., and H. G. RICE [1962]. Two families of languages related to ALGOL.
J. ACM 9: 3, 350-371.
GINZBURG, A. [1968]. Algebraic Theory of Automata. Academic Press, New York.
GRAHAM, R. M. [1964]. Bounded context translation. Proc. AFIPS Spring Joint
Computer Conference, 25. Spartan, New York, pp. 17-29.
GRAHAM, S. L. [1970]. Extended precedence languages, bounded right context
languages and deterministic languages. IEEE Conf. Record of 11th Annual
Symposium on Switching and Automata Theory, pp. 175-180.
GRAU, A. A., U. HILL, and H. LANGMAACK [1967]. Translation of ALGOL 60.
Springer, Berlin.
GRAY, J. N. [1969]. Precedence parsers for programming languages. Ph.D. Thesis,
Univ. of California, Berkeley.

GRAY, J. N., and M. A. HARRISON [1969]. Single pass precedence analysis. IEEE
Conf. Record of 10th Annual Symposium on Switching and Automata Theory,
pp. 106-117.
GRAY, J. N., M. A. HARRISON, and O. IBARRA [1967]. Two way pushdown auto-
mata. Information and Control 11: 1, 30-70.
GREIBACH, S. A. [1965]. A new normal form theorem for context-free phrase struc-
ture grammars. J. ACM 12: 1, 42-52.
GREIBACH, S., and J. HOPCROFT [1969]. Scattered context grammars. J. Computer
and System Sciences 3: 3, 233-247.
GRIES, D. [1971]. Compiler Construction for Digital Computers. Wiley, New York.
GRIFFITHS, T. V. [1968]. The unsolvability of the equivalence problem for A-free
nondeterministic generalized machines. J. ACM 15: 3, 409-413.
GRIFFITHS, T. V., and S. R. PETRICK [1965]. On the relative efficiencies of context-
free grammar recognizers. Comm. ACM 8: 5, 289-300.
GRISWOLD, R. E., J. F. POAGE, and I. P. POLONSKY [1971]. The SNOBOL4 Pro-
gramming Language (2nd ed.). Prentice-Hall, Inc., Englewood Cliffs, N.J.
GROSS, M., and A. LENTIN [1970]. Introduction to Formal Grammars. Springer,
Berlin.
HAINES, L. H. [1970]. Representation Theorems for Context-Sensitive Languages.
Department of Electrical Engineering and Computer Sciences, Univ. of
California, Berkeley.
HALMOS, P. R. [1960]. Naive Set Theory. Van Nostrand Reinhold, New York.
HALMOS, P. R. [1963]. Lectures on Boolean Algebras. Van Nostrand Reinhold,
New York.
HARARY, F. [1969]. Graph Theory. Addison-Wesley, Reading, Mass.
HARRISON, M. A. [1965]. Introduction to Switching and Automata Theory. McGraw-
Hill, New York.
HARTMANIS, J., and J. E. HOPCROFT [1970]. An overview of the theory of compu-
tational complexity. J. ACM 18: 3, 444-475.
HARTMANIS, J., P. M. LEWIS II, and R. E. STEARNS [1965]. Classifications of com-
putations by time and memory requirements. Proc. IFIP Congress 65. Spartan,
New York, pp. 31-35.
HAYS, D. G. [1967]. Introduction to Computational Linguistics. American Elsevier,
New York.
HEXT, J. B., and P. S. ROBERTS [1970]. Syntax analysis by Domolki's algorithm.
Computer J. 13: 3, 263-271.
HOPCROFT, J. E. [1971]. An n log n Algorithm for Minimizing States in a Finite
Automaton. CS71-190, Computer Science Department, Stanford Univ.,
Stanford, Calif. Also in Theory of Machines and Computations (Z. Kohavi
and A. Paz, eds.), Academic Press, New York, 1972, pp. 189-196.

HOPCROFT, J. E., and J. D. ULLMAN [1967]. An approach to a unified theory of
automata. Bell System Tech. J. 46: 8, 1763-1829.
HOPCROFT, J. E., and J. D. ULLMAN [1969]. Formal Languages and Their Relation
to Automata. Addison-Wesley, Reading, Mass.
HOPGOOD, F. R. A. [1969]. Compiling Techniques. American Elsevier, New York.
HUFFMAN, D. A. [1954]. The synthesis of sequential switching circuits. J. of the
Franklin Institute 257: 3-4, 161-190 and 275-303.
ICHBIAH, J. D., and S. P. MORSE [1970]. A technique for generating almost optimal
Floyd-Evans productions for precedence grammars. Comm. ACM 13: 8,
501-508.
INGERMAN, P. Z. [1966]. A Syntax Oriented Translator. Academic Press, New York.
IRLAND, M. I., and P. C. FISCHER [1970]. A Bibliography on Computational Com-
plexity. CSRR 2028, Dept. of Applied Analysis and Computer Science, Univ.
of Waterloo, Waterloo, Ontario.
IRONS, E. T. [1961]. A syntax directed compiler for ALGOL 60. Comm. ACM
4: 1, 51-55.
IRONS, E. T. [1963a]. An error correcting parse algorithm. Comm. ACM 6: 11,
669-673.
IRONS, E. T. [1963b]. The structure and use of the syntax directed compiler. Annual
Review in Automatic Programming, 3. Pergamon, Elmsford, N.Y., pp. 207-227.
IRONS, E. T. [1964]. Structural connections in formal languages. Comm. ACM
7: 2, 62-67.
JOHNSON, W. L., J. H. PORTER, S. I. ACKLEY, and D. T. ROSS [1968]. Automatic
generation of efficient lexical processors using finite state techniques. Comm.
ACM 11: 12, 805-813.
KAMEDA, T., and P. WEINER [1968]. On the reduction of nondeterministic auto-
mata. Proc. Second Annual Princeton Conference on Information Sciences and
Systems, pp. 348-352.
KASAMI, T. [1965]. An efficient recognition and syntax analysis algorithm for
context-free languages. Sci. Rep. AFCRL-65-758, Air Force Cambridge
Research Laboratory, Bedford, Mass.
KASAMI, T., and K. TORII [1969]. A syntax analysis procedure for unambiguous
context-free grammars. J. ACM 16: 3, 423-431.
KLEENE, S. C. [1952]. Introduction to Metamathematics. Van Nostrand Reinhold,
New York.
KLEENE, S. C. [1956]. Representation of events in nerve nets. In Shannon and
McCarthy [1956], pp. 3-40.
KNUTH, D. E. [1965]. On the translation of languages from left to right. Information
and Control 8: 6, 607-639.

KNUTH, D. E. [1967]. Top-down syntax analysis. Lecture Notes. International
Summer School on Computer Programming, Copenhagen, Denmark.
KNUTH, D. E. [1968]. The Art of Computer Programming. Vol. I: Fundamental
Algorithms. Addison-Wesley, Reading, Mass.
KORENJAK, A. J. [1969]. A practical method for constructing LR(k) processors.
Comm. ACM 12: 11, 613-623.
KORENJAK, A. J., and J. E. HOPCROFT [1966]. Simple deterministic languages.
IEEE Conf. Record of 7th Annual Symposium on Switching and Automata
Theory, pp. 36-46.
KOSARAJU, S. R. [1970]. Finite state automata with markers. Proc. Fourth Annual
Princeton Conference on Information Sciences and Systems, p. 380.
KUNO, S., and A. G. OETTINGER [1962]. Multiple-path syntactic analyzer. Infor-
mation Processing, 62 (IFIP Cong.) (Popplewell, ed.). North-Holland, Amster-
dam, pp. 306-311.
KURKI-SUONIO, R. [1969]. Note on top down languages. BIT 9, 225-238.
LAFRANCE, J. [1970]. Optimization of error recovery in syntax directed parsing
algorithms. ACM SIGPLAN Notices 5: 12, 2-17.
LALONDE, W. R., E. S. LEE, and J. J. HORNING [1971]. An LALR(k) parser genera-
tor. Proc. IFIP Congress 71, TA-3. North-Holland Publishing Co., Nether-
lands, pp. 153-157.
LEAVENWORTH, B. M. [1966]. Syntax macros and extended translation. Comm.
ACM 9: 11, 790-793.
LEE, J. A. N. [1967]. Anatomy of a Compiler. Reinhold, New York.
LEINIUS, R. P. [1970]. Error detection and recovery for syntax directed compiler
systems. Ph.D. Thesis, Univ. of Wisconsin, Madison.
LEWIS, P. M. II, and D. J. ROSENKRANTZ [1971]. An ALGOL compiler designed
using automata theory. Proc. Polytechnic Institute of Brooklyn Symposium on
Computers and Automata.
LEWIS, P. M. II, and R. E. STEARNS [1968]. Syntax directed transduction. J. ACM
15: 3, 464-488.
LOECKX, J. [1970]. An algorithm for the construction of bounded-context parsers.
Comm. ACM 13: 5, 297-307.
LUCAS, P., and K. WALK [1969]. On the formal description of PL/I. Annual Review
in Automatic Programming 6: 3. Pergamon, pp. 105-182.
MARKOV, A. A. [1951]. The theory of algorithms (Russian). Trudi Mathemati-
cheskova Instituta imeni V. A. Steklova 38, pp. 176-189. (English translation,
1961, National Science Foundation, Washington, D.C.)
MCCARTHY, J. [1963]. A basis for the mathematical theory of computation. In
Braffort and Hirschberg [1963], pp. 33-71.

MCCARTHY, J., and J. A. PAINTER [1967]. Correctness of a compiler for arithmetic
expressions. In Schwartz [1967], pp. 33-41.
MCCLURE, R. M. [1965]. TMG--a syntax directed compiler. Proc. ACM National
Conference, 20, pp. 262-274.
MCCULLOUGH, W. S., and E. PITTS [1943]. A logical calculus of the ideas imma-
nent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115-133.
MCILROY, M. D. [1960]. Macro instruction extensions of compiler languages.
Comm. ACM 3: 4, 214-220.
MCILROY, M. D. [1968]. Coroutines. Unpublished memorandum.
MCKEEMAN, W. M. [1966]. An Approach to Computer Language Design. CS48,
Computer Science Department, Stanford Univ., Stanford, Calif.
MCKEEMAN, W. M., J. J. HORNING, and D. B. WORTMAN [1970]. A Compiler
Generator. Prentice-Hall, Inc., Englewood Cliffs, N.J.
MCNAUGHTON, R., and H. YAMADA [1960]. Regular expressions and state graphs
for automata. IRE Trans. on Electronic Computers 9: 1, 39-47. Reprinted in
Moore [1964], pp. 157-174.
MENDELSON, E. [1968]. Introduction to Mathematical Logic. Van Nostrand Rein-
hold, New York.
MILLER, W. F., and A. C. SHAW [1968]. Linguistic methods in picture processing--
a survey. Proc. AFIPS Fall Joint Computer Conference, 33. The Thompson
Book Co., Washington, D.C., pp. 279-290.
MINSKY, M. [1967]. Computation: Finite and Infinite Machines. Prentice-Hall, Inc.,
Englewood Cliffs, N.J.
MONTANARI, U. G. [1970]. Separable graphs, planar graphs and web grammars.
Information and Control 16: 3, 243-267.
MOORE, E. F. [1956]. Gedanken experiments on sequential machines. In Shannon
and McCarthy [1956], pp. 129-153.
MOORE, E. F. [1964]. Sequential Machines: Selected Papers. Addison-Wesley,
Reading, Mass.
MORGAN, H. L. [1970]. Spelling correction in systems programs. Comm. ACM
13: 2, 90-93.
MOULTON, P. G., and M. E. MULLER [1967]. A compiler emphasizing diagnostics.
Comm. ACM 10: 1, 45-52.
MUNRO, I. [1971]. Efficient Determination of the Transitive Closure of a Directed
Graph. Information Processing Letters 1: 2, 56-58.
NAUR, P. (ed.) [1963]. Revised report on the algorithmic language ALGOL 60.
Comm. ACM 6: 1, 1-17.
OETTINGER, A. [1961]. Automatic syntactic analysis and the pushdown store.
In Structure of Language and its Mathematical Concepts, Proc. 12th Symposium
on Applied Mathematics. American Mathematical Society, Providence, R.I.,
pp. 104-129.
OGDEN, W. [1968]. A helpful result for proving inherent ambiguity. Mathematical
Systems Theory 2: 3, 191-194.
ORE, O. [1962]. Theory of Graphs. Amer. Math. Soc. Colloquium Publications, 38.
PAINTER, J. A. [1970]. Effectiveness of an optimizing compiler for arithmetic
expressions. ACM SIGPLAN Notices 5: 7, 101-126.
PAIR, C. [1964]. Trees, pushdown stores and compilation. RFTI--Chiffres 7: 3,
199-216.
PARIKH, R. J. [1966]. On context-free languages. J. ACM 13: 4, 570-581.
PAUL, M. [1962]. A general processor for certain formal languages. Proc. ICC
Symposium Symb. Lang. Data Processing. Gordon & Breach, New York,
pp. 65-74.
PAULL, M. C., and S. H. UNGER [1968]. Structural equivalence of context-free
grammars. J. Computer and System Sciences 2: 1, 427-463.
PAVLIDIS, T. [1972]. Linear and context-free graph grammars. J. ACM 19: 1, 11-23.
PFALTZ, J. L., and A. ROSENFELD [1969]. Web grammars. Proc. International Joint
Conf. on Artificial Intelligence, Washington, D.C., pp. 609-619.
POST, E. L. [1943]. Formal reductions of the general combinatorial decision
problem. American Journal of Mathematics 65, 197-215.
POST, E. L. [1947]. Recursive unsolvability of a problem of Thue. J. of Symbolic
Logic 12, 1-11. Reprinted in Davis [1965], pp. 292-303.
POST, E. L. [1965]. Absolutely unsolvable problems and relatively undecidable
propositions. In Davis [1965], pp. 340-433.
PRATHER, R. E. [1969]. Minimal solutions of Paull-Unger problems. Mathematical
Systems Theory 3: 1, 76-85.
RABIN, M. O. [1967]. Mathematical theory of automata. In Schwartz [1967], pp.
173-175.
RABIN, M. O., and D. SCOTT [1959]. Finite automata and their decision problems.
IBM J. of Research and Development 3, 114-125. Reprinted in Moore [1964],
pp. 63-91.
RANDELL, B., and L. J. RUSSELL [1964]. ALGOL 60 Implementation. Academic
Press, New York.
REYNOLDS, J. C. [1965]. An introduction to the COGENT programming system.
Proc. ACM National Conference, 422.
REYNOLDS, J. C., and R. HASKELL [1970]. Grammatical coverings. Unpublished
memorandum.
ROGERS, H., JR. [1967]. Theory of Recursive Functions and Effective Computability.
McGraw-Hill, New York.

ROSEN, S. (ed.) [1967a]. Programming Systems and Languages. McGraw-Hill, New
York.
ROSEN, S. [1967b]. A compiler-building system developed by Brooker and Morris.
In Rosen [1967a], pp. 306-331.
ROSENKRANTZ, D. J. [1967]. Matrix equations and normal forms for context-free
grammars. J. ACM 14: 3, 501-507.
ROSENKRANTZ, D. J. [1968]. Programmed grammars and classes of formal lan-
guages. J. ACM 16: 1, 107-131.
ROSENKRANTZ, D. J., and P. M. LEWIS II [1970]. Deterministic left corner parsing.
IEEE Conf. Record of 11th Annual Symposium on Switching and Automata
Theory, pp. 139-152.
ROSENKRANTZ, D. J., and R. E. STEARNS [1970]. Properties of deterministic top-
down grammars. Information and Control 17: 3, 226-256.
SALOMAA, A. [1966]. Two complete axiom systems for the algebra of regular events.
J. ACM 13: 1, 158-169.
SALOMAA, A. [1969a]. Theory of Automata. Pergamon, Elmsford, N.Y.
SALOMAA, A. [1969b]. On the index of a context-free grammar and language.
Information and Control 14: 5, 474-477.
SAMMET, J. E. [1969]. Programming Languages: History and Fundamentals. Prentice-
Hall, Englewood Cliffs, N.J.
SCHORRE, D. V. [1964]. META II, a syntax oriented compiler writing language.
Proc. ACM National Conference 19, pp. D1.3-1-D1.3-11.
SCHUTZENBERGER, M. P. [1963]. On context-free languages and pushdown auto-
mata. Information and Control 6: 3, 246-264.
SCHWARTZ, J. T. (ed.) [1967]. Mathematical aspects of computer science. Proc.
Symposia in Applied Mathematics, 19. American Mathematical Society,
Providence, R.I.
SHANNON, C. E., and J. MCCARTHY (eds.) [1956]. Automata Studies. Princeton
University Press, Princeton, N.J.
SHAW, A. C. [1970]. Parsing of graph-representable pictures. J. ACM 17: 3,
453-481.
SHEPHERDSON, J. C. [1959]. The reduction of two-way automata to one-way
automata. IBM J. Res. 3, 198-200. Reprinted in Moore [1964], pp. 92-97.
STEARNS, R. E. [1967]. A regularity test for pushdown machines. Information and
Control 11: 3, 323-340.
STEEL, T. B. (ed.) [1966]. Formal Language Description Languages for Computer
Programming. North-Holland, Amsterdam.
STRASSEN, V. [1969]. Gaussian elimination is not optimal. Numerische Mathematik
13, 354-356.
SUPPES, P. [1960]. Axiomatic Set Theory. Van Nostrand Reinhold, New York.

THOMPSON, K. [1968]. Regular expression search algorithm. Comm. ACM 11: 6,
419-422.
TURING, A. M. [1936-1937]. On computable numbers, with an application to the
Entscheidungsproblem. Proc. of the London Mathematical Society, ser. 2, 42,
230-265. Corrections, ibid. 43 (1937), 544-546.
UNGER, S. H. [1968]. A global parser for context-free phrase structure grammars.
Comm. ACM 11: 4, 240-246, and 11: 6, 427.
VAN WIJNGAARDEN, A. (ed.) [1969]. Report on the algorithmic language ALGOL
68. Numerische Mathematik 14, 79-218.
WALTERS, D. A. [1970]. Deterministic context-sensitive languages. Information and
Control 17: 1, 14-61.
WARSHALL, S. [1962]. A theorem on Boolean matrices. J. ACM 9: 1, 11-12.
WARSHALL, S., and R. M. SHAPIRO [1964]. A general purpose table driven com-
piler. Proc. AFIPS Spring Joint Computer Conference, 25. Spartan, New York,
pp. 59-65.
WEGBREIT, B. [1970]. Studies in extensible programming languages. Ph.D. Thesis,
Harvard Univ., Cambridge, Mass.
WINOGRAD, S. [1965]. On the time required to perform addition. J. ACM 12: 2,
277-285.
WINOGRAD, S. [1967]. On the time required to perform multiplication. J. ACM
14: 4, 793-802.
WIRTH, N. [1968]. PL360--a programming language for the 360 computers. J.
ACM 15: 1, 37-74.
WIRTH, N., and H. WEBER [1966]. EULER--a generalization of ALGOL and its
formal definition, Parts 1 and 2. Comm. ACM 9: 1, 13-23 and 9: 2, 89-99.
WISE, D. S. [1971]. Domolki's algorithm applied to generalized overlap resolvable
grammars. Proc. Third Annual ACM Symp. on Theory of Computing, pp.
171-184.
WOOD, D. [1969a]. The theory of left factored languages. Computer J. 12: 4, 349-
356, and 13: 1, 55-62.
WOOD, D. [1969b]. A note on top-down deterministic languages. BIT 9: 4, 387-399.
WOOD, D. [1970]. Bibliography 23: Formal language theory and automata theory.
Computing Reviews 11: 7, 417-430.
WOZENCRAFT, J. M., and A. EVANS, JR. [1969]. Notes on Programming Languages.
Dept. of Electrical Engineering, Massachusetts Institute of Technology,
Cambridge, Mass.
YOUNGER, D. H. [1967]. Recognition and parsing of context-free languages in time
n³. Information and Control 10: 2, 189-208.
INDEX TO LEMMAS, THEOREMS,
AND ALGORITHMS

Theorem          Theorem          Theorem          Theorem
Number   Page    Number   Page    Number   Page    Number   Page

0-1        7     2-20     162     3-14     274     5-11     391
0-2        8     2-21     184     3-15     275     5-12     395
0-3       40     2-22     187     4-1      297     5-13     395
0-4       45     2-23     189     4-2      298     5-14     406
0-5       48     2-24     193     4-3      299     5-15     409
2-1      110     2-25     196     4-4      306     5-16     413
2-2      111     2-26     197     4-5      306     5-17     414
2-3      117     2-27     197     4-6      316     5-18     418
2-4      120     2-28     201     4-7      317     5-19     420
2-5      120     2-29     201     4-8      319     5-20     429
2-6      127     2-30     203     4-9      323     5-21     430
2-7      128     3-1      221     4-10     327     5-22     432
2-8      129     3-2      233     4-11     327     5-23     433
2-9      129     3-3      239     4-12     330     5-24     438
2-10     132     3-4      240     5-1      341     5-25     439
2-11     143     3-5      242     5-2      342     5-26     443
2-12     145     3-6      243     5-3      343     6-1      467
2-13     147     3-7      243     5-4      347     6-2      473
2-14     148     3-8      250     5-5      353     6-3      474
2-15     149     3-9      257     5-6      356     6-4      475
2-16     150     3-10     266     5-7      358     6-5      482
2-17     152     3-11     267     5-8      360     6-6      498
2-18     156     3-12     270     5-9      383
2-19     158     3-13     273     5-10     387


Lemma            Lemma            Lemma            Lemma
Number   Page    Number   Page    Number   Page    Number   Page

2-1      104     2-19     161     3-6      244     5-3      406
2-2      108     2-20     172     3-7      245     5-4      417
2-3      108     2-21     174     3-8      245     5-5      417
2-4      109     2-22     175     3-9      246     5-6      431
2-5      110     2-23     176     3-10     247     5-7      442
2-6      110     2-24     176     3-11     247     5-8      443
2-7      110     2-25     181     3-12     248     6-1      461
2-8      118     2-26     183     3-13     248     6-2      466
2-9      119     2-27     186     3-14     249     6-3      466
2-10     119     2-28     188     3-15     274     6-4      472
2-11     125     2-29     200     4-1      294     6-5      479
2-12     141     2-30     200     4-2      295     6-6      482
2-13     141     2-31     201     4-3      296     6-7      494
2-14     151     3-1      230     4-4      296     6-8      494
2-15     154     3-2      230     4-5      306     6-9      496
2-16     157     3-3      232     4-6      325     6-10     497
2-17     160     3-4      238     5-1      348     6-11     497
2-18     161     3-5      244     5-2      382     6-12     497

Algorithm        Algorithm        Algorithm        Algorithm
Number   Page    Number   Page    Number   Page    Number   Page

0-1       45     2-10     148     4-4      318     5-10     391
0-2       48     2-11     149     4-5      321     5-11     393
0-3       50     2-12     152     4-6      328     5-12     408
1-1       68     2-13     155     5-1      345     5-13     411
2-1      106     2-14     158     5-2      350     5-14     418
2-2      126     2-15     161     5-3      351     5-15     422
2-3      130     2-16     187     5-4      357     5-16     432
2-4      131     3-1      220     5-5      357     5-17     433
2-5      131     3-2      255     5-6      359     5-18     437
2-6      134     4-1      289     5-7      375     5-19     442
2-7      144     4-2      303     5-8      386     6-1      464
2-8      146     4-3      315     5-9      389     6-2      474
2-9      146
INDEX TO VOLUME I

Automaton (see Recognizer, Transducer)


Axiom, 19-20
Aanderaa, S. D., 34
Acceptance, 96 (see also Final configura- B
tion)
Accessible state, 117, 125-126 Backtrack parsing, 281-314, 456-500
Accumulator, 65 Backus, J. W., 76
Ackley, S. I., 263 Backus-Naur form, 58 (see also Context-
Action function, 374, 392 free grammar)
Adjacency matrix, 47-51 Backwards determinism (see Unique in-
Aho, A. V., 102-103, 192, 251, 399, 426 vertibility)
ALGOL, 198-199,234, 281,489-490, 501 Bar-Hillel, Y., 82, 102, 211
Algorithm, 27-36 Barnett, M. P., 237
Alphabet, 15 Bauer, F. L., 455
Alternates (for a nonterminal), 285, 457 Bauer, H., 426
Ambiguous grammar, 143, 163, 202-207, Becker, S., 426
281,489-490 (see also Unambigu- Berge, C., 52
ous grammar) Bijection, 10
Ancestor, 39 Birman, A., 485
Antisymmetric relation, 10 Blattner, M., 211
Arbib, M. A., 138 BNF (see Backus-Naur form)
Arc (see Edge) Bobrow, D. G., 82
Arithmetic expression, 86 Book, R. V., 103, 211
Arithmetic progression, 124, 209 Bookkeeping, 59, 62-63, 74, 255
Assembler, 59, 74 Boolean algebra, 23-24, 129
Assembly language, 65-70 Booth, T. L., 138
Assignment statement, 65-70 Border (of a sentential form), 334, 369
Asymmetric relation, 9 Borodin, A., 36
Atom, 1-2 Bottom-up parsing, 178-184, 268-271,
Augmented grammar, 372-373, 427-428 301-307, 485-500 (see also


Bottom-up parsing (cont.) Code optimization, 59, 70-72


Bounded right context grammar, Cohen, R. S., 500
Floyd-Evans productions, LR (k) Colmerauer, A., 500
grammar, Precedence grammar) Colmerauer grammar, 492, 497-500
Bounded context grammar/language, 450- Colmerauer precedence relations, 490-
452 500
Bounded-right-context grammar/language, Commutative law, 71
427-435, 448, 451-452 Compiler, 53-77 (see also Bookkeeping,
BRC (see Bounded-right context) Code generation, Code optimiza-
Brooker, R. A., 77, 313 tion, Error correction, Lexical
Brzozowski, J. A., 124, 138 analysis, Parsing)
Compiler-compiler, 77
Complementation, 4, 189-190, 197, 208,
484
Completely specified finite automaton,
Canonical collection of sets of valid items, 117
389-391 Composition (of relations), 13, 250
Canonical LR (k) parser, 393-396 Computational complexity, 27-28, 208,
Canonical set of LR(k) tables, 393-394 210, 297-300, 316-320, 326-328,
Cantor, D. G., 211 356, 395-396, 473-476
Cardinality (of a set), 11, 14 Concatenation, 15, 17, 197, 208-210
Cartesian product, 5 Configuration, 34, 95, 113-114, 168-169,
Catalan number, 165 224, 228, 290, 303, 338, 477, 488
CFG (see Context-free grammar) Conflict, precedence (see Precedence con-
CFL (see Context-free language) flict)
Characteristic function, 34 Congruence relation, 134
Characterizing language, 238-243, 251 Consistent set of items, 391
Cheatham, T. E., 58, 77, 280, 314 Constant, 254
Chomsky, N., 29, 58, 82, 102, 124, 166, Context-free grammar/language, 91-93,
192, 211 97, 99, 101, 208, 399
Chomsky grammar, 29 (see also Gram- Context-sensitive grammar/language, 91-
mar ) 93, 97, 99, 101,208, 399
Chomsky hierarchy, 92 Continuing pushdown automaton, 188
Chomsky normal form, 150-153, 243, Conway, R. W., 77
276-277, 280, 314, 362 Cook, S. A., 34, 192
Christensen, C., 58 Correspondence problem (see Post's cor-
Church, A., 25, 29 respondence problem)
Church-Turing thesis, 29 Countable set, 11, 14
Circuit (see Cycle) Cover, grammatical (see Left cover,
Closed portion (of a sentential form), Right cover)
334, 369 CSG (see Context-sensitive grammar)
Closure CSL (see Context-sensitive language)
of a language, 17-18, 197 Culik, K. II, 368, 500
of a set of valid items, 386 Cut (of a parse tree), 140-141
reflexive and transitive, 8-9 Cycle, 39
transitive, 8-9, 47-50, 52 Cycle-free grammar, 150, 280, 302-303,
CNF (see Chomsky normal form) 307
Cocke, J., 76, 332
Cocke-Younger-Kasami algorithm, 281,
314-320
Code generation, 59, 65-70, 72, 74 (see
also Translation) Dag, 39-40, 42-45, 116

Dangling else, 202-204 Edge, 37


Davis, M., 36 e;free first (EFF), 381-382, 392, 398
D-chart, 79-82 e-free grammar, 147-149, 280, 302-303,
Decidable problem (see Problem) 397
Defining equations (for context-free lan- Eickel, J., 455
guages), 159-163 Eight queens problem, 309
De Morgan's laws, 12 Elementary operation, 317, 319, 326, 395
Denning, P. J., 426 e-move, 168, 190
De Remer, F. L., 399, 512 Emptiness problem, 130-132, 144-145,
Derivation, 86, 98 483
Derivation tree (see Parse tree) Empty set, 2
Derivative (of a regular expression), 136 Empty string, 15
Descendant, 39 Endmarker, 94, 271, 341,404, 469, 484
Deterministic finite automaton, 116, 255 Engeler, E., 58
(see also Finite automaton) English, structure of, 55-56, 78
Deterministic finite transducer, 226-227 e-production, 92, 147-149, 362
(see also Finite transducer) Equivalence class (see Equivalence rela-
Deterministic pushdown automaton, 184- tion)
192, 201-202, 208-210, 251, 344, Equivalence problem, 130-132, 201,237,
398, 446, 448, 466-469 362
Deterministic pushdown transducer, 229, Equivalence relation, 6-7, 12-13, 126,
251,271-275, 341, 395, 443,446 133-134
Deterministic recognizer, 95 (see also Error correction, 59, 72-74, 77, 367, 394,
Recognizer, Deterministic finite 399, 426
automaton, Deterministic push- Euclid's algorithm, 26--27, 36
down automaton, Deterministic Evans, A., 455
two-stack parser) Evey, R. J., 166, 192
Deterministic two-stack parser, 488, 492- Extended precedence grammar, 410-415,
493, 500 424-425, 429, 451
Dewar, R. B. K., 77 Extended pushdown automaton, 173-
Diagnostics (see Error correction) 175, 185-186
Difference (of sets), 4 Extended pushdown transducer, 269
Dijkstra, E. W., 79 Extended regular expression, 253-258
Direct lexical analysis, 61-62, 258-261 Extensible language, 58, 501-504
Directed acyclic graph (see Dag)
Directed graph (see Graph)
Disjoint sets, 4
Distinguishable states (of a finite automa-
ton), 124-128 Feldman, J. A., 77, 455
Domain, 6, 10 Fetch function (of a recognizer), 94
Domolki's algorithm, 312-313,452 FIN, 135, 207
DPDA (see Deterministic pushdown autom- Final configuration, 95, 113, 169, 175,
aton) 224, 228, 339
DPDT (see Deterministic pushdown trans- Final state, 113, 168, 224
ducer) Finite ambiguity, 332
Dyck language, 209 Finite automaton, 112-121, 124-128,
255-261, 397
Finite control, 95, 443 (see also State)
Finite set, 11, 14
Earley, J., 332 Finite transducer, 223-227, 235, 237-
Earley's algorithm, 73, 281, 320-331, 240, 242, 250, 252, 254-255, 258
397-398 FIRST, 300, 335-336, 357-359

Fischer, M. J., 102, 426


Fischer, P. C., 36
Flow chart, 79-82 Haines, L. H., 103
Floyd, R. W., 52, 77, 166, 211, 314, 426, Halmos, P. R., 3, 25
455 Halting (see Recursive, Algorithm)
Floyd-Evans productions, 443--448, 452 Halting problem, 35
FOLLOW, 343, 425 Halting pushdown automaton, 282-285
Formal Semantic Language, 455 Handle (of a right sentential form), 179-
FORTRAN, 252, 501 180, 377, 379-380, 403--404, 486
Freeman, D. N., 77, 263 Harary, E., 52
Frontier (of a parse tree), 140 Harrison, M. A., 138, 192, 280, 455, 500
Function, 10, 14 Hartmanis, J., 36, 192
Futrelle, R. P., 237 Hash table, 63
Haskell, R., 280
Hays, D. G., 332
O Hext, J. B., 314
Hochsprung, R. R., 77
Galler, B. A., 58 Homomorphism, 17-18, 197, 207, 209,
Generalized top-down parsing language, 213
469-485 Hopcroft, J. E., 36, 102-103, 138, 192,
Gentleman, W. M., 61 211, 368, 399
Gill, A., 138 Hopgood, F. R. A., 76, 455
Ginsburg, S., 102-103, 138, 166, 211, 237 Horning, J. J., 76-77, 450, 465
Ginzburg, A., 138 Huffman, D. A:, 138
GNF (see Greibach normal form)
Gotlieb, C. C., 314
GOTO, 386---390, 392
Goto function, 374, 392
Graham, R. M., 455 Ibarra, O., 192
Graham, S. L., 426 Ichbiah, J. D., 426
Grammar, 85 (see Bounded-right-context Identifier, 60-63, 252, 254
grammar, Colmerauer grammar, Inaccessible state (see Accessible state)
Context-free grammar, Context- Inaccessible symbol (of a context-free
sensitive grammar, Indexed gram, grammar), 145-147
mar, LC(k) grammar, LL(k) Inclusion (of sets), 3,208
grammar, LR(k) grammar, Op- In-degree, 39
erator grammar, Precedence gram- Index (of an equivalence relation), 7
mar, Right linear grammar, Unre- Indexed grammar, 100-101
stricted grammar, Web grammar) Indirect lexical analysis, 61-62, 254-258
Graph, 37-52 Indistinguishable states (see Distinguish-
Graph grammar (see Web grammar) able states)
Gray, J. N., 192, 280, 455, 500 Induction (see Proof by induction)
Greek letters, 214 Infinite set, 11, 14
Greibach, S. A., 102-103, 166, 211 Ingerman, P. Z., 77
Greibach normal form, 153-162, 243, Inherent ambiguity, 205-207, 209
280, 362 INIT, 135, 207
Gries, D., 76-77 Initial configuration, 95, 113, 169
Griffiths, T. V., 314 Initial state, 113, 168, 224
Griswold, R. E., 505 Initial symbol (of a pushdown automa-
Gross, M,, 211 ton), 168
GTDPL (see Generalized top-down parsing Injection, 10
language) Input head, 94-96

Input symbol, 113, 168, 218, 224 Left-bracketed representation (for trees),
Input tape, 93-96 46
Intermediate (of an indexed grammar), Left-corner parse, 278-280, 310-312,
100 362-367
Intermediate code, 59, 65-70 Left-corner parser, 310-312
Interpreter, 55 Left cover, 275-277, 280, 307
Intersection, 4, 197, 201,208, 484 Left factoring, 345
Inverse (of a relation), 6, 10-11 Left linear grammar, 122
Inverse finite transducer mapping, 227 Leftmost derivation, 142-143, 204, 318-
Irland, M. I., 36 320
Irons, E. T., 77, 237, 314, 455 Left parsable grammar, 271-275, 341
Irreflexive relation, 9 Left parse (see Leftmost derivation, Top-
Item (Earley's algorithm), 320, 331,397- down parsing)
398 Left parse language, 273, 277
Item (LR(k)), 381 (see also Valid item) Left parser, 266--268
Left recursion, 484 (see also Left re,cur-
sive grammar)
Left-recursive grammar, 153-158, 287-
288, 294-298, 344-345
Johnson, W. L., 263 Left sentential form, 143
Leinius, R. P., 399, 426
Length
of a derivation, 86
of a string, 16
Lentin, A., 211
Lewis, P. M., II, 192, 237, 368
Kameda, T., 138 Lexical analysis, 59-63, 72-74, 251-264
Kasami, T., 332 Lexicographic order, 13
Keyword, 59, 259 Linear bounded automaton, 100 (see also
Kleene, S. C., 36, 124 Context-sensitive grammar)
Knuth, D. E., 36, 58, 368, 399, 485 Linear grammar/language, 165-170, 207-
Korenjak, A. J., 368, 399 208, 237
Kosaraju, S. R., 138 Linear order, 10, 13-14, 43-45
k-predictive parsing algorithm (see Pre-
Linear set, 209-210
dictive parsing algorithm)
LL(k) grammar/language, 73, 268, 333-368,
Kuno, S., 313
397-398, 448, 452
Kurki-Suonio, R., 368
LL(k) table, 349-351,354-355
LL(1) grammar, 342-349, 483
Loeckx, J., 455
Logic, 19-25
Logical connective, 21-25
Labeled graph, 38, 42 Lookahead, 300, 306, 331, 334-336, 363,
Lalonde, W. R., 450 371
Lambda calculus, 29 Looping (in a pushdown automaton),
Language, 16-17, 83-84, 86, 96, 114, 169 186-189
(see also Recognizer, Grammar) LR(k) grammar/language, 73, 271, 369,
LBA (see Linear bounded automaton) 371-399, 402, 424, 428, 430, 448
LC(k) grammar/language, 362-367 LR(k) table, 374-376, 392-394, 398
Leavenworth, B. M., 58, 501 LR (1) grammar, 410, 448-450
Lee, E. S., 450 Lueas, P., 58
Lee, J. A. N., 76 Lukasiewicz, J., 214

M Nondeterministic recognizer, 95 (see also


Recognizer)
Mapping (see Function) Non-left-recursive grammar (see Left re-
Marked closure, 210 cursive grammar)
Marked concatenation, 210 Nonterminal, 85, 100, 218, 458
Marked union, 210 Nose colds, in hogs, 78
Markov, A. A., 29
Markov algorithm, 29
MAX, 135 O
Maxwell, W. L., 77
McCarthy, J., 77 Object code, 59
McClure, R. M., 77, 485 Oettinger, A., 192, 313
McCullough, W. S., 103 Ogden, W., 211
Mcllroy, M. D., 58, 61 Ogden's lemma, 192-196
McKeeman, W. M., 76-77, 426, 455 (1, 1)-bounded-right-context grammar/lan-
McNaughton, R., 124 guage, 429-430, 448
Membership (relation on sets), 1 One-turn pushdown automaton, 207-208
Membership problem, 130-132 One-way recognizer, 94
Memory (of a recognizer), 93-96 Open portion (of a sentential form), 334,
Mendelson, E., 25 369
META, 485 Operator grammar/languagel 165, 438
Miller, G. A., 124 Operator precedence grammar/language,
Miller, W. F., 82 439-443, 448-450, 452
MIN, 135, 207, 209 Order (of a syntax-directed translation),
Minimal fixed point, 108-110, 121-123, 243-251
160-161 Ordered dag, 42 (see also Dag)
Minsky, M, 29, 36, 102, 138 Ordered graph, 41-42
Mixed-strategy precedence grammar, 435, Ordered tree, 42-44 (see also Tree)
437-439, 448, 452 Ore, O., 52
Montanad, U. G., '82 Out degree, 39
Moore, E. F., 103, 138 Output symbol, 218, 224
Morgan, H. L., 77, 263
Morris, D., 77, 313
Morse, S. P., 426
Moulton, P. G., 77
Move (of a recognizer), 95
MSP (see Mixed strategy precedence) Painter, J. A., 77
Muller, M. E., 77 Pair, C., 426
Munro, I., 52 PAL, 512-517
Parikh, R. J., 211
Parikh's theorem, 209-211
N Parse lists, 321
Parse table, 316, 339, 345-346, 348, 351-
Natural languages, 78, 281 (see also Eng- 356, 364-365, 374
lish) Parse tree, 139-143, 179-180, 220-222,
Naur, P., 58 273,379, 464-466
Next move function, 168, 224 Parsing, 56, 59, 63-65, 72-74, 263-280
Node, 37 (see also Bottom-up parsing, Shift-
Nondeterministic algorithm, 285, 308-310 reduce parsing, Top-down parsing)
Nondeterministic finite automaton, 117 Parsing action function (see Action func-
(see also Finite automaton) tion )
Nondeterministic FORTRAN, 308-310 Parsing machine, 477-482, 484

Partial acceptance failure (of a TDPL or Prefix expression, 214-215, 229, 236
GTDPL program), 484 Prefix property, 17, 19, 209
Partial correspondence problem, 36 Preorder (of a tree), 43
Partial function, 10 Problem, 29-36
Partial left parse, 293-296 Procedure, 25-36
Partial order, 9-10, 13-15, 43-45 Product
Partial recursive function (see Recursive of languages, 17 (see also Concatena-
function) tion )
Partial right parse, 306 of relations, 7
Path, 39, 51 Production, 85, 100
Pattern recognition, 79-82 Production language, 443
Paul, M., 455 Proof, 19-21, 43
Paull, M. C., 166 Proof by induction, 20-21, 43
Pavlidis, T., 82 Proper grammar, 150
PDA (see Pushdown automaton) Propositional calculus, 22-23, 35
PDT (see Pushdown transducer)
Pumping lemma
Perles, M., 211
for context-free languages, 195-196
Perlis, A. J., 58
Petrick, S. R., 314 (see also Ogden's lemma)
Pfaltz, J. L., 82 for regular sets, 128-129
Phrase, 486 Pushdown automaton, 167-192, 201,282
Pitts, E., 103 (see also Deterministic pushdown
PL/I, 501 automaton)
PL360, 507-511 Pushdown symbol, 168
Poage, J. F., 505 Pushdown transducer, 227-233, 237,265-
Polish notation (see Prefix expression, 268, 282-285 (see also Determin-
Postfix expression) istic pushdown transducer)
Polonsky, I. P., 505
Porter, J. H., 263
Position (in a string), 193
Q
Post, E. L., 29, 36
Postfix expression, 214-215, 217-218, Question (see Problem)
229, 512
Postorder (of a tree), 43
Post's correspondence problem, 32-36,
199-201
Post system, 29 Rabin, M. O., 103, 124
Power set, 5, 12 Randell, B., 76
Prather, R. E., 138 Range, 6, 10
Precedence (of operators), 65, 233-234 Recognizer, 93-96, 103 (see also Finite
Precedence conflict, 419-420 automaton, Linear bounded auto-
Precedence grammar/language, 399-400, maton, Parsing machine, Push-
403-404 (see also Extended prece- down automaton, Turing machine,
dence grammar, Mixed-strategy Two-stack parser)
precedence grammar, Operator Recursive function, 28
precedence grammar, Simple pre- Recursive grammar, 153, 163
cedence grammar, T-canonical Recursively enumerable set, 28, 34, 92,
precedence grammar, Weak prece- 97, 500
dence grammar) Recursive set, 28, 34, 99
Predecessor, 37 Reduced finite automaton, 125-128
Predicate, 2 Reflexive relation, 6
Predictive parsing algorithm, 338-348, Reflexive-transitive closure (see Closure,
351-356 reflexive and transitive)

Regular definition, 253-254 Schwartz, J. T., 76


Regular expression, 104--110, 121-123 Scott, D., 103, 124
Regular expression equations, 105-112, SDT (see Syntax directed translation)
121-123 Self-embedding grammar, 210
Regular grammar, 122, 499 Semantics, 55-58, 213
Regular set, 103-138, 191, 197, 208-210, Semantic unambiguity, 274
227, 235, 238-240, 424 Semi-linear set, 210
Regular translation (see Finite trans- Sentence (see String)
ducer) Sentence symbol (see Start symbol)
Relation, 5-15 Sentential form, 86, 406--407, 414--415,
Reversal, 16, 121, 129-130, 397, 500 442
Reynolds, J. C., 77, 280, 313 Set, 1-19
Rice, H. G., 166 Shamir, E., 211
Right-bracketed representation (for Shapiro, R. M., 77
trees), 46 Shaw, A. C., 82
Right cover, 275-277, 280, 307 Shepherdson, J. C., 124
Right invariant equivalence relation, 133- Shift-reduce parsing, 269, 301-302, 368-
134 371, 392, 400-403, 408, 415, 418-
Right linear grammar, 91-92, 99, 110- 419, 433, 438-439, 442 (see also
112, 118-121, 20'1 Bounded-right-context grammar,
Right linear syntax-directed translation LR (k) grammar, Precedence
scheme, 236 grammar)
Rightmost derivation, 142-143, 264, 327- Simple LL(I) grammar/language, 336
330 Simple mixed-strategy precedence gram-
Right parsable grammar, 271-275, 398 mar/language, 437, 448, 451-452
Right parse (see Rightmost derivation, Simple precedence grammar/language,
Bottom-up parsing) 403-412, 420-424, 492--493, 507
Right parse language, 273 Simple syntax-directed translation, 222-
Right parser, 269-271 223, 230-233, 240~242, 250, 265,
Right reeursive grammar, 153 512
Right sentential form, 143 Single production, 149-150, 452
Roberts, P. S., 314 Skeletal grammar, 440-442, 452
Rogers, H., 36 SNOBOL, 505-507
Rosen, S., 76 Solvable problem (see Problem)
Rosenfeld, A., 82 Source program, 59
Rosenkrantz, D. J., 102, 166, 368 Space complexity (see Computational
Ross, D. T., 263 complexity)
Rule of inference, 19-20 Spanning tree, 51
Russell, L. J., 76 Standard form (of regular expression
Russell's paradox, 2 equations), 106-110
Standish, T., 77, 280
Start state (see Initial state)
Start symbol, 85, 100, 168, 218, 458 (see
also Initial symbol)
Salomaa, A., 124, 138, 211 State (of a recognizer), 113, 168, 224
Samelson, K., 455 State transition function, 113
Sammet, J. E., 29, 58 Stearns, R. E., 192, 211, 237, 368
Sattley, K., 312 Steel, T. B., 58
Scatter table (see Hash table) Store function (of a recognizer), 94
Schorre, D. V., 77, 313, 485 Strassen, V., 34, 52
Schutzenberger, M. P., 166, 192, 211 String, 15, 86

Strong characterization (see Character- Transition graph, 116, 225


izing language) Transitive closure (see Closure, transi-
Strong connectivity, 39 tive)
Strong LL(k) grammar, 344, 348 Transitive relation, 6
SUB, 135 Translation, 55, 212-213 (see also Code
Subset, 3 generation, Syntax-directed trans-
Substitution (of languages), 196-197 lation, Transducer)
Successor, 37 Translation form (pair of strings), 216-
Suffix property, 17, 19 217
Superset, 3 Translator, 216 (see also Transducer)
Suppes, P., 3 Tree, 40-42, 45-47, 55-56, 64-69, 80-
Symbol table (see Bookkeeping) 82, 283-284, 434-436 (see also
Symmetric relation, 6 Parse tree)
Syntactic analysis (see Parsing) Truth table, 21
Syntax, 55-57 T-skeletal grammar, 454
Syntax-directed translation, 57, 66-70, Turing, A. M., 29, 102
215-251, 445 (see also Simple Turing machine, 29, 33-36, 100, 132
syntax-directed translation) (2, 1 )-precedence grammar, 426, 448
Syntax macro, 501-503 Two-stack parser, 487-490, 499-500
Syntax tree (see Parse tree) Two-way finite automaton, 123
Two-way pushdown automaton, 191-192

Tag system, 29, 102


Tautology, 23
T-canonical precedence grammar, 452- Ullman, J. D., 36, 102-103, 192, 211,
454 251, 399, 426, 485
TDPL (see Top-down parsing language) Unambiguous grammar, 98-99, 325-328,
Temporary storage, 67-70 344, 395, 397, 407, 422, 430 (see
Terminal (of a grammar), 85, 100, 458 also Ambiguous grammar)
Theorem, 20 Undecidable problem (see Problem)
Thompson, K., 138, 263 Undirected graph, 50-51
Three address code, 65 Undirected tree, 51
Time complexity (see Computational Unger, S. H., 314
complexity) Union, 4, 197, 201,208, 484
TMG, 485 Unique invertibility, 370, 397, 404, 448,
Token, 59-63, 252 452, 490, 499
Token set, 452 Universal set, 4
Top-down parsing, 178, 264-268, 285- Universal Turing machine, 35
301, 445, 456-485, 487 (see also Unordered graph (see Graph-)
LL(k) grammar) Unrestricted grammar/language, 84-92,
Top-down parsing language, 458-469, 97-98, 100, 102
472-473, 484.--485 Unsolvable problem (see Problem)
Topological sort, 43-45 Useless symbol, 123, 146-147, 244, 250,
Torii, K., 332 280
Total function, 10, 14
Total recursive function, 28
Transducer (see Finite transducer, Push-
down transducer)
Transformation, 71-72 (see also Func- Valid item, 381,383-391,394
tion, Translation) Valid parsing algorithm, 339, 347-348
Transition function (see State transition Van Wijngaarden, A., 58
function, Next move function) Variable (see Nonterminal)

Venn diagram, 3 Well-formed (TDPL or GTDPL pro-


Vertex (see Node) gram), 484
Viable prefix (of a right sentential form), Well order, 13, 19, 24-25
380, 393 Winograd, S., 33
Viable sequence, 33 Wirth, N., 426, 507
Wirth-Weber precedence (see Simple
precedence grammar)
W Wise, D. S., 455
Wood, D., 368
Walk, K., 58 Word (see String)
Walters, D. A., 399 Worley, W. S., 77
Warshall, S., 52, 77 Wortman, D. B., 76-77, 455, 512
Warshall's algorithm, 48-49 Wozencraft, J. M., 512
Weak precedence grammar, 415-425,
437, 451-452
Weber, H., 426
Web grammar, 79-82
Wegbreit, B., 58 Yamada, H., 124
Weiner, P., 138 Younger, D. H., 332
(continued from front flap)
are designed to test understanding
at a variety of levels and to provide
additional information.
• States all algorithms in clear, easy-
to-understand English.

ALFRED V. AHO is a member of the


Technical Staff of Bell Telephone
Laboratories' Computing Science Re-
search Center at Murray Hill, New
Jersey, and an Affiliate Associate Pro-
fessor at Stevens Institute of Tech-
nology. The author of numerous pub-
lished papers on language theory and
compiling theory, Dr. Aho received
his Ph.D. from Princeton University.

JEFFREY D. ULLMAN is an Associate


Professor of Electrical Engineering at
Princeton University. He has pub-
lished numerous papers in the com-
puter science field and was previously
co-author of a text on language theory.
Professor Ullman received his Ph.D.
from Princeton University.

PRENTICE-HALL, Inc.
Englewood Cliffs, New Jersey
Printed in the U.S.A.
This second volume of THE THEORY
OF PARSING, TRANSLATION,
AND COMPILING completes the
definitive work in the field of com-
piling theory.

Techniques for the optimization of


parsers are presented with emphasis
on linear precedence functions, weak
precedence parsers, and LR(k) par-
sers.

The principal theoretical results con-


cerning deterministic one-pass parsers
are given. The important parsing
methods are compared in terms of
their language processing ability.

Later chapters are devoted to syntax-


directed translations, generalized
translation schemes, symbol tables,
property grammars, optimization of
straight line code and arithmetic ex-
pressions, programs with loops, and
global code optimization.

Among the features


• Organizes and systematizes the en-

tire field of compiler theory into a


unified terminology, orientation and
methodology.
• Covers parser optimization tech-
niques, deterministic parsing theory,
translation and code generation,
"bookkeeping," and code optimiza-
tion.
(continued on back flap)
THE THEORY OF
PARSING, TRANSLATION,
A N D COMPILING
Prentice-Hall
Series in Automatic Computation
George Forsythe, editor

AHO, editor, Currents in the Theory of Computing


AHO AND ULLMAN, Theory of Parsing, Translation, and Compiling,
V o l u m e I: Parsing; V o l u m e I I : Compiling
(ANDREE), 3 Computer Programming: Techniques, Analysis, and Mathematics
ANSELONE, Collectively Compact Operator Approximation Theory
and Applications to Integral Equations
ARBIB, Theories of Abstract Automata
BATES AND DOUGLAS, Programming Language/One, 2nd ed.
BLUMENTHAL, Management Information Systems
BRENT, Algorithms for Minimization without Derivatives
CRESS, et al., FORTRAN IV with WATFOR and WATFIV
DANIEL, The Approximate Minimization of Functionals
DESMONDE, Computers and Their Uses, 2nd ed.
DESMONDE, Real-Time Data Processing Systems
DRUMMOND, Evaluation and Measurement Techniques for Digital Computer Systems
EVANS, et al., Simulation Using Digital Computers
FIKE, Computer Evaluation of Mathematical Functions
FIKE, PL/1 for Scientific Programers
FORSYTHE AND MOLER, Computer Solution of Linear Algebraic Systems
GAUTHIER AND PONTO, Designing Systems Programs
GEAR, Numerical Initial Value Problems in Ordinary Differential Equations
GOLDEN, FORTRAN I V Programming and Computing
GOLDEN AND LEICHUS, IBM/360 Programming and Computing
GORDON, System Simulation
GRUENBERGER, editor, Computers and Communications
GRUENBERGER, editor, Critical Factors in Data Management
GRUENBERGER, editor, Expanding Use of Computers in the 70's
GRUENBERGER, editor, Fourth Generation Computers
HARTMANIS AND STEARNS, Algebraic Structure Theory of Sequential Machines
HULL, Introduction to Computing
JACOBY, et al., Iterative Methods for Nonlinear Optimization Problems
JOHNSON, System Structure in Data, Programs, and Computers
KANTER, The Computer and the Executive
KIVIAT, et al., The SIMSCRIPT II Programming Language
LORIN, Parallelism in Hardware and Software:
Real and Apparent Concurrency
LOUDEN AND LEDIN, Programming the IBM 1130, 2nd ed.
MARTIN, Design of Man-Computer Dialogues
MARTIN, Design of Real-Time Computer Systems
MARTIN, Future Developments in Telecommunications
MARTIN, Programming Real-Time Computing Systems
MARTIN, Systems Analysis for Data Transmission
MARTIN, Telecommunications and the Computer
MARTIN, Teleprocessing Network Organization
MARTIN AND NORMAN, The Computerized Society
MATHISON AND WALKER, Computers and Telecommunications: Issues in Public Policy
MCKEEMAN, et al., A Compiler Generator
MEYERS, Time-Sharing Computation in the Social Sciences
MINSKY, Computation: Finite and Infinite Machines
PLANE AND MCMILLAN, Discrete Optimization: Integer Programming and
Network Analysis for Management Decisions
PRITSKER AND KIVIAT, Simulation with GASP II: a FORTRAN-Based
Simulation Language
PYLYSHYN, editor, Perspectives on the Computer Revolution
RICH, Internal Sorting Methods: Illustrated with PL/1 Program
RUSTIN, editor, Algorithm Specification
RUSTIN, editor, Computer Networks
RUSTIN, editor, Data Base Systems
RUSTIN, editor, Debugging Techniques in Large Systems
RUSTIN, editor, Design and Optimization of Compilers
RUSTIN, editor, Formal Semantics of Programming Languages
SACKMAN AND CITRENBAUM, editors, On-line Planning: Towards
Creative Problem-Solving
SALTON, editor, The SMART Retrieval System: Experiments in Automatic
Document Processing
SAMMET, Programming Languages: History and Fundamentals
SCHAEFER, A Mathematical Theory of Global Program Optimization
SCHULTZ, Digital Processing: A System Orientation
SCHULTZ, Spline Analysis
SCHWARZ, et al., Numerical Analysis of Symmetric Matrices
SHERMAN, Techniques in Computer Programming
SIMON AND SIKLOSSY, Representation and Meaning: Experiments with Information
Processing Systems
SNYDER, Chebyshev Methods in Numerical Approximation
STERLING AND POLLACK, Introduction to Statistical Data Processing
STOUTEMYER, PL/1 Programming for Engineering and Science
STRANG AND FIX, An Analysis of the Finite Element Method
STROUD, Approximate Calculation of Multiple Integrals
STROUD AND SECREST, Gaussian Quadrature Formulas
TAVISS, editor, The Computer Impact
TRAUB, Iterative Methods for the Solution of Polynomial Equations
UHR, Pattern Recognition, Learning, and Thought
VAN TASSEL, Computer Security Management
VARGA, Matrix Iterative Analysis
WAITE, Implementing Software for Non-Numeric Application
WILKINSON, Rounding Errors in Algebraic Processes
WIRTH, Systematic Programming: An Introduction
THE THEORY OF

PARSING, TRANSLATION,

A N D COMPILING

VOLUME II: COMPILING

ALFRED V. AHO

Bell Telephone Laboratories, Inc.


Murray Hill, N. J.

JEFFREY D. ULLMAN

Department of Electrical Engineering


Princeton University

PRENTICE-HALL, INC.

ENGLEWOOD CLIFFS, N.J.


© 1973 by Bell Telephone Laboratories,
Incorporated, and J. D. Ullman

All rights reserved. No part of this book


may be reproduced in any form or by any means
without permission in writing from the publisher.

10 9 8 7 6

ISBN: 0-13-914564-8
Library of Congress Catalog Card No. 72-1073

Printed in the United States of America

PRENTICE-HALL INTERNATIONAL, INC., London


PRENTICE-HALL OF AUSTRALIA, PTY. LTD., Sydney
PRENTICE-HALL OF CANADA, LTD., Toronto
PRENTICE-HALL OF INDIA PRIVATE LIMITED, New Delhi
PRENTICE-HALL OF JAPAN, INC., Tokyo
PREFACE

Compiler design is one of the first major areas of systems programming


for which a strong theoretical foundation is becoming available. Volume I
of The Theory of Parsing, Translation, and Compiling developed the relevant
parts of mathematics and language theory for this foundation and developed
the principal methods of fast syntactic analysis. Volume II is a continuation
of Volume I, but except for Chapters 7 and 8 it is oriented towards the non-
syntactic aspects of compiler design.
The treatment of the material in Volume II is much the same as in Volume
I, although proofs have become a little more sketchy. We have tried to make
the discussion as readable as possible by providing numerous examples, each
illustrating one or two concepts.
Since the text emphasizes concepts rather than language or machine
details, a programming laboratory should accompany a course based on this
book, so that a student can develop some facility in applying the concepts
discussed to practical problems. The programming exercises appearing at the
ends of sections can be used as recommended projects in such a laboratory.
Part of the laboratory course should discuss the code to be generated for such
programming language constructs as recursion, parameter passing, subroutine
linkages, array references, loops, and so forth.
Use of the Book
The notes from which this book evolved were used in courses at Princeton
University and Stevens Institute of Technology at both the senior and grad-
uate levels. The material in Volume II was used at Stevens as a one semester
course in compiler design following a one semester course based on Volume I.
As a text in compiler design, we feel, certain sections of the book are more
important than others. On a first reading proofs can be omitted, along with
Chapter 8 and Sections 7.4.3, 7.5.3, 9.3.3, 10.2.3, and 10.2.4.


As in Volume I, problems and bibliographic notes appear at the end of


each section. We have coarsely graded problems, other than research and
open problems, according to their level of difficulty, using stars. Unstarred
problems test understanding of basic definitions. Singly starred problems re-
quire one significant insight for their solution. Doubly starred problems are
considerably harder than singly starred problems.
Acknowledgments
In addition to the acknowledgments made in the Preface to Volume I,
we would also like to thank Karel Culik, Amelia Fong, Mike Hammer, and
Steve Johnson for helpful comments.

ALFRED V. AHO

JEFFREY D. ULLMAN
CONTENTS

PREFACE VII

7 TECHNIQUES FOR PARSER OPTIMIZATION 543

7.1 Linear Precedence Functions 543


7.1.1 A Matrix Representation Theorem 544
7.1.2 Applications to Operator Precedence Parsing 550
7.1.3 Weak Precedence Functions 551
7.1.4 Modification of Precedence Matrices 555
Exercises 560
Bibliographic notes 563

7.2 Optimization of Floyd-Evans Parsers 564


7.2,1 Mechanical Generation of Floyd-Evans Parsers for Weak
Precedence Grammars 564
7.2.2 Improvement of Floyd-Evans Parsers 569
Exercises 576
Bibliographic Notes 578

7.3 Transformations on Sets of LR(k) Tables 579


7.3.1 The General Notion of an LR(k) Table 580
7.3.2 Equivalence of Table Sets 585
7.3.3 φ-Inaccessible Sets of Tables 588
7.3.4 Table Mergers by Compatible Partitions 592
7.3.5 Postponement of Error Checking 597
7.3.6 Elimination of Reductions by Single Productions 607
Exercises 615
Bibliographic Notes 621

7.4 Techniques for Constructing LR(k) Parsers 621


7.4.1 Simple LR Grammars 622


7.4.2 Extending the SLR Concept to Non-SLR Grammars 627


7.4.3 Grammar Splitting 631
Exercises 642
Bibliographic Notes 645

7.5 Parsing Automata 645


7.5.1 Canonical Parsing Automata 645
7.5.2 Splitting the Functions of States 650
7.5.3 Generalizations to LR(k) Parsers 657
7.5.4 Chapter Summary 661
Exercises 663
Bibliographic Notes 665

8 THEORY OF DETERMINISTIC PARSING 666

8.1 Theory of LL Languages 667


8.1.1 LL and LR Grammars 668
8.1.2 LL Grammars in Greibach Normal Form 674
8.1.3 The Equivalence Problem for LL Grammars 683
8.1.4 The Hierarchy of LL Languages 686
Exercises 689
Bibliographic Notes 690

8.2 Classes of Grammars Generating the Deterministic


Languages 690
8.2.1 Normal Form DPDA's and Canonical Grammars 690
8.2.2 Simple MSP Grammars and Deterministic Languages 695
8.2.3 BRC Grammars, L R Grammars, and Deterministic
Languages 699
8.2.4 Extended Precedence Grammars and Deterministic
Languages 701
Exercises 707
Bibliographic Notes 709

8.3 Theory of Simple Precedence Languages 709


8.3.1 The Class of Simple Precedence Languages 709
8.3.2 Operator Precedence Languages 711
8.3.3 Chapter Summary 716
Exercises 717
Bibliographic Notes 719

9 T R A N S L A T I O N A N D CODE GENERATION 720

9.1 The Role of Translation in Compiling 720


9.1.1 Phases of Compilation 721
9.1.2 Representations of the Intermediate Program 724
9.1.3 Models for Code Generation 728
Exercises 728
Bibliographic Notes 729

9.2 Syntax-Directed Translations 730


9.2.1 Simple Syntax-Directed Translations 730
9.2.2 A Generalized Transducer 737
9.2.3 Deterministic One-Pass Bottom-Up Translation 740
9.2.4 Deterministic One-Pass Top-Down Translation 742
9.2.5 Translation in a Backtrack Environment 746
Exercises 753
Bibliographic Notes 757

9.3 Generalized Translation Schemes 757


9.3.1 Multiple Translation Schemes 758
9.3.2 Varieties of Translations 765
9.3.3 Inherited and Synthesized Translations 776
9.3.4 A Word About Timing 781
Exercises 782
Bibliographic Notes 787

10 BOOKKEEPING 788

10.1 Symbol Tables 788


10.1.1 Storage of Information About Tokens 789
10.1.2 Storage Mechanisms 791
10.1.3 Hash Tables 793
10.1.4 Hashing Functions 797
10.1.5 Hash Table Efficiency 799
Exercises 807
Bibliographic Notes 811

10.2 Property Grammars 811


10.2.1 Motivation 812
10.2.2 Definition of Property Grammar 814
10.2.3 Implementation of Property Grammars 823
10.2.4 Analysis of the Table Handling Algorithm 832
Exercises 840
Bibliographic Notes 843

11 CODE OPTIMIZATION 844

11.1 Optimization of Straight-Line Code 845


11.1.1 A Model of Straight Line Code 845
11.1.2 Transformations on Blocks 848
11.1.3 A Graphical Representation of Blocks 854
11.1.4 Characterization of Equivalences Between Blocks 859
11.1.5 Optimization of Blocks 861
11.1.6 Algebraic Transformations 867
Exercises 874
Bibliographic Notes 878

11.2 Arithmetic Expressions 878


11.2.1 The Machine Model 879
11.2.2 The Labeling of Trees 882
11.2.3 Programs With STORE's 889
11.2.4 Effect of Some Algebraic Laws 891
Exercises 902
Bibliographic Notes 906

11.3 Programs with Loops 907


11.3.1 The Program Model 908
11.3.2 Flow Analysis 912
11.3.3 Examples of Transformations on Programs 917
11.3.4 Loop Optimization 921
Exercises 932
Bibliographic Notes 936

11.4 Data Flow Analysis 937


11.4.1 Intervals 937
11.4.2 Data Flow Analysis Using Intervals 944
11.4.3 Irreducible Flow Graphs 952
11.4.4 Chapter Summary 955
Exercises 956
Bibliographic Notes 960

BIBLIOGRAPHY FOR VOLUMES I AND II 961

INDEX TO LEMMAS, THEOREMS, AND ALGORITHMS 987

INDEX TO VOLUMES I AND II 989
THE THEORY OF
PARSING, TRANSLATION,
AND COMPILING
7  TECHNIQUES FOR PARSER OPTIMIZATION

In this chapter we shall discuss various techniques that can be used to
reduce the size and/or increase the speed of parsers.
First, we shall consider reducing storage requirements for precedence
matrices. In certain cases, including many of practical interest, we shall
show that an m × n precedence matrix can be replaced by two vectors of
length m and n, respectively. We shall also discuss how a precedence matrix
can be modified without affecting the shift-reduce parsing algorithm con-
structed from the matrix.
Next we shall show how a production language parser can be mechan-
ically generated from a weak precedence grammar, and then we shall consider
various techniques which can be used to reduce the size of the resulting parser.
Finally, we shall consider in some detail various transformations which
can be used to reduce the size of an LR parser without adversely affecting
its error-detecting ability. The "Simple LR" method of DeRemer and the
grammar splitting method of Korenjak are discussed in detail.
The techniques presented in this chapter are indicative of the types of
optimization that can be performed on all parsers constructed by the methods
of Chapter 5 (in Volume I). Many more optimizations are possible, but a
complete "catalogue" of these does not exist. The summary at the end of
this chapter is recommended for those readers desiring merely an overview
of parser optimization techniques.

7.1. LINEAR PRECEDENCE FUNCTIONS

A matrix whose entries are either -1, 0, +1, or "blank" will be called
a precedence matrix. There are obvious applications for precedence matrices


in the implementation of precedence-oriented parsing algorithms. For exam-
ple, we can use a precedence matrix to represent the Wirth-Weber precedence
relations for a precedence grammar by associating

-1 with <·
 0 with ≐
+1 with ·>
blank with error

Or we can use a precedence matrix to represent the parsing decisions of
a shift-reduce parsing algorithm. One such representation would be to
associate
-1 with shift
 0 with error
+1 with reduce

In this section we shall show how a precedence matrix can often be concisely
represented by a pair of vectors called linear precedence functions.

7.1.1. A Matrix Representation Theorem

Let M be an m × n precedence matrix. We say that a pair (f, g) of vectors
of integers represents M if
(1) f = (f1, f2, ..., fm);
(2) g = (g1, g2, ..., gn); and
(3) fi < gj whenever Mij = -1,
    fi = gj whenever Mij = 0, and
    fi > gj whenever Mij = +1.
We can use f and g in place of M as follows. To determine Mij we look up
fi and gj. If fi < gj, fi = gj, or fi > gj, we shall assume that Mij = -1, 0,
or +1, respectively. Note that by using f and g in place of M in this manner,
we do not recover the blank entries of M, because one of the relations <, =,
or > holds between each fi and gj.
We shall call the vectors f and g linear precedence functions for M. By
using f and g to represent M, we can reduce the storage requirement for the
precedence matrix from m × n entries to m + n entries. We should point
out, however, that linear precedence functions do not exist for every pre-
cedence matrix.

Example 7.1
Consider the simple precedence grammar G with productions

S → aSc | bSc | c
      S    a    b    c    $
S                    ≐
a     ≐    <·   <·   <·
b     ≐    <·   <·   <·
c                    ·>   ·>
$          <·   <·   <·

Fig. 7.1  Matrix of Wirth-Weber precedence relations.

The Wirth-Weber precedence relations for G are shown in the matrix in
Fig. 7.1. We shall henceforth call this matrix the matrix of Wirth-Weber
precedence relations to avoid confusion with the term precedence matrix.
We can represent the precedence relations in Fig. 7.1 by the precedence
matrix M shown in Fig. 7.2, associating

-1 with <·
 0 with ≐
+1 with ·>

and leaving blank entries unchanged. We can then represent this precedence
matrix by the linear precedence functions

f = (1, 0, 0, 2, 0)
g = (0, 1, 1, 1, 0)

      1    2    3    4    5
      S    a    b    c    $
1 S                   0
2 a    0   -1   -1   -1
3 b    0   -1   -1   -1
4 c                  +1   +1
5 $        -1   -1   -1

Fig. 7.2  Precedence matrix M.

We can easily verify that these are linear precedence functions for M. For
example, f4 = 2 and g5 = 0. Thus, since f4 > g5, f and g faithfully represent
the +1 entry M45.
The entry M41 in the precedence matrix is blank. However, f4 = 2 and
g1 = 0. Thus, if we use f and g to represent M, we would reconstruct M41
as +1 (since f4 > g1). Likewise, the blank entries M11, M15, M42, and M43
would all be represented by +1's, and M12, M13, M25, M35, M51, and M55
would be represented by 0's.
The blank entries in the original precedence matrix represent error con-
ditions. Thus, if we use linear precedence functions to represent the prece-
dence relations in this fashion, we shall lose the ability to detect an error
when none of the three precedence relations holds. However, this error will
eventually be caught by attempting a reduction and discovering that there is
no production whose right side is on top of the pushdown list. Nevertheless,
this delay in error detection could be an unacceptable price to pay for the
convenience of using precedence functions in place of precedence matrices,
depending on how important early error detection is in the particular com-
piler involved. □
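
As a concrete illustration of the lookup just described, the following small sketch (Python is used only for illustration; it is not part of the text) consults the f and g of Example 7.1 in place of the matrix M.

```python
# A minimal sketch: using linear precedence functions in place of an
# m x n precedence matrix.  Symbol order, as in Example 7.1: S a b c $.

def precedence(f, g, i, j):
    """Reconstruct the (i, j) entry of the matrix from f and g.
    Blank entries are not recoverable: some value is always returned."""
    if f[i] < g[j]:
        return -1
    if f[i] == g[j]:
        return 0
    return +1

f = [1, 0, 0, 2, 0]
g = [0, 1, 1, 1, 0]

# M45 (c versus $) was +1 in Fig. 7.2; f4 > g5 reproduces it.
assert precedence(f, g, 3, 4) == +1
# M41 (c versus S) was blank; the functions report +1 instead,
# which is exactly the loss of error detection discussed above.
assert precedence(f, g, 3, 0) == +1
```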

Example 7.2
We can overcome much of this loss of timely error detection by imple-
menting a shift-reduce parsing algorithm for a precedence grammar in which
we associate both the precedence relations <· and ≐ with shift and ·> with
reduce. Moreover, for the shift-reduce parsing action function we need only
the precedence relations from N ∪ Σ ∪ {$} to Σ ∪ {$}. For example, we can
associate <· and ≐ with -1 and ·> with +1 and obtain the precedence

      1    2    3    4
      a    b    c    $
1 S             -1
2 a   -1   -1   -1
3 b   -1   -1   -1
4 c             +1   +1
5 $   -1   -1   -1

Fig. 7.3  Precedence matrix M'.

matrix M' in Fig. 7.3 from Fig. 7.1. The blank entries represent error condi-
tions. We can show that

f = (0, 0, 0, 2, 0) and g = (1, 1, 1, 0)

are linear precedence functions for M'. These linear precedence functions
have the advantage that they reproduce the blank entries M14, M24, M34, and
M54 as 0 (since f1 = f2 = f3 = f5 = g4). We can thus use 0 to denote an error
condition and in this way preserve error detection that was present in the
original matrix M'. We shall consider this problem in greater detail in Section
7.1.3.

We shall first present an algorithm which, given a precedence matrix M,
will find precedence functions for M whenever they exist. In the next section
we shall present a modification of this algorithm which when presented with
a precedence matrix with --1, + 1, and blank entries will find precedence
functions for the matrix such that blank entries will be represented by O's
as often as possible.
We first observe that if two rows of a precedence matrix M have identical
entries, then the two rows can be merged into a single row without affecting
the existence of linear precedence functions for M. Likewise, identical col-
umns can be merged. We shall call a precedence matrix in which all identical
rows and identical columns have been merged a reduced precedence matrix.
We can find precedence functions more efficiently if we first reduce the pre-
cedence matrix, of course.
ALGORITHM 7.1
Computation of linear precedence functions.
Input. An m × n matrix M whose entries are -1, 0, +1, and blank.
Output. Two vectors of integers f = (f1, ..., fm) and g = (g1, ..., gn)
such that
    fi < gj if Mij = -1
    fi = gj if Mij = 0
    fi > gj if Mij = +1

or the output "no" if no such vectors exist.


Method.
(1) Construct a directed graph with at most m + n nodes, called the
linearization graph for M. Initially, label m nodes F1, F2, ..., Fm and the
remaining n nodes G1, G2, ..., Gn. These nodes will be manipulated, and at
all times there will be some node F̂i representing Fi and a node Ĝj representing
Gj. Initially, F̂i = Fi and Ĝj = Gj for all i and j. Then do step (2) or (3), as
appropriate, for each i and j.
(2) If Mij = 0, create a new node N by merging F̂i and Ĝj. N now repre-
sents all those nodes previously represented by F̂i and Ĝj.
(3) If Mij = +1, draw an edge from F̂i to Ĝj. If Mij = -1, draw an
edge from Ĝj to F̂i.
(4) If the resulting graph is cyclic, answer "no."
(5) If the linearization graph is acyclic, let fi be the length of a longest
path beginning at F̂i and let gj be the length of a longest path beginning
at Ĝj. □

In step (4) of Algorithm 7.1 we can use the following general technique
to determine whether a directed graph G is cyclic or acyclic:
(1) Let G be the graph at hand initially.
(2) Find a node N in the graph at hand that has no descendants. If no
(2) Find a node N in the graph at hand that has no descendants. If no
such node exists, report that G is cyclic. Otherwise, remove N.
(3) If the resulting graph is empty, report that G is acyclic. Otherwise,
repeat step (2).
Once we have determined that the graph is acyclic, we can use the follow-
ing labeling technique in step (5) of Algorithm 7.1 to determine the length
of a longest path extending from every node.
Let G be a directed acyclic graph (dag).
(1) Initially, label each node in G with 0.
(2) Repeat step (3) until no further changes can be made to the labels
of G. At that time the label on each node gives the length of a longest path
beginning at that node.
(3) Find a node N in G. Let N have direct descendants N1, N2, ..., Nk
with labels l1, l2, ..., lk. Change the label of N to max{l1, l2, ..., lk} + 1.
(If k = 0, the label of N remains 0.) Repeat this step for every node in G.
It should be clear that we shall repeat step (3) at most l times per node,
where l is the length of a longest path in G.
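
The following sketch (in Python; the function and variable names are ours, not the book's) implements Algorithm 7.1 together with the cycle test and the longest-path labeling just described. It is intended only as an illustration under the stated assumptions, not as a definitive implementation.

```python
# A sketch of Algorithm 7.1.  M is a list of m rows, each a list of n
# entries drawn from {-1, 0, +1, None}, with None standing for blank.
# Returns (f, g), or the answer "no" if the linearization graph is cyclic.

def linear_precedence_functions(M):
    m, n = len(M), len(M[0])

    # Steps (1)-(2): group F1..Fm and G1..Gn into merged nodes.
    # parent[] implements the merging with a simple union-find.
    parent = list(range(m + n))          # 0..m-1 are F's, m..m+n-1 are G's

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(m):
        for j in range(n):
            if M[i][j] == 0:
                parent[find(i)] = find(m + j)

    # Step (3): draw edges between merged nodes.
    edges = {}                           # node -> set of successor nodes
    for i in range(m):
        for j in range(n):
            fi, gj = find(i), find(m + j)
            if M[i][j] == +1:
                edges.setdefault(fi, set()).add(gj)
            elif M[i][j] == -1:
                edges.setdefault(gj, set()).add(fi)

    # Steps (4)-(5): longest path from each node, detecting cycles by
    # depth-first search.  longest[v] is the length of a longest path
    # beginning at v; None marks "in progress" on the current path.
    longest = {}

    def longest_path(v):
        if v in longest:
            if longest[v] is None:       # back edge: the graph is cyclic
                raise ValueError("cyclic")
            return longest[v]
        longest[v] = None
        best = 0
        for w in edges.get(v, ()):
            best = max(best, 1 + longest_path(w))
        longest[v] = best
        return best

    try:
        f = [longest_path(find(i)) for i in range(m)]
        g = [longest_path(find(m + j)) for j in range(n)]
    except ValueError:
        return "no"
    return f, g
```

Run on the matrix of Fig. 7.2 (with None for the blank entries), this returns f = [1, 0, 0, 2, 0] and g = [0, 1, 1, 1, 0], matching Example 7.1.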

Example 7.3
Consider the precedence matrix M of Fig. 7.4.

Fig. 7.4  Precedence matrix.

The linearization graph constructed from M is shown in Fig. 7.5. Note
that in step (2) of Algorithm 7.1, three pairs of nodes are merged: (F3, G4),
(F2, G5), and (F1, G3).
The linearization graph is acyclic. From step (5) of Algorithm 7.1 we
obtain linear precedence functions f = (0, 1, 2, 1, 3) and g = (3, 2, 0, 2, 1).
For example, f5 is 3 since the longest path beginning at node F5 is of
length 3. □

THEOREM 7.1
A precedence matrix has linear precedence functions if and only if its
linearization graph is acyclic.

Fig. 7.5 Linearization graph.

Proof.
If: We first note that Algorithm 7.1 emits f and g only if the linearization
graph is acyclic. It suffices to show that if f and g are computed by Algorithm
7.1, then
(1) Mij = 0 implies that fi = gj,
(2) Mij = +1 implies that fi > gj, and
(3) Mij = -1 implies that fi < gj.
Assertion (1) is immediate from step (2) of Algorithm 7.1. To prove assertion
(2), we note that if Mij = +1, then edge (F̂i, Ĝj) is added to the linearization
graph. Hence, fi > gj, since the length of a longest path from node
F̂i must be at least one more than the length of a longest path from Ĝj if
the linearization graph is acyclic. Assertion (3) follows similarly.
Only if: Suppose that a precedence matrix M has linear precedence
functions f and g but that the linearization graph for M has a cycle consist-
ing of the sequence of nodes N1, N2, ..., Nk, Nk+1, where Nk+1 = N1 and
k > 1. Then by step (3), for all i, 1 ≤ i ≤ k, we can find nodes Hi and Ii+1
such that
(1) Hi and Ii+1 are original F's and G's;
(2) Hi and Ii+1 are represented by Ni and Ni+1, respectively; and
(3) Either Hi is Fm, Ii+1 is Gn, and Mmn = +1, or Hi is Gm, Ii+1 is Fn, and
Mnm = -1.

We observe by rule (2) that if nodes Fm and Gn are represented by the same
Ni, then fm must equal gn if f and g are to be linearizing functions for M.
Let f and g be the supposed linearizing functions for M. Let hi be fm if Hi
is Fm and let hi be gm if Hi is Gm. Let h'i be fm if Ii is Fm and let h'i be gm if Ii
is Gm. Then

    h1 > h'2 = h2 > h'3 = ··· = hk > h'k+1

But since Nk+1 is N1, we have h'k+1 = h1. However, we just showed that
h1 > h'k+1. Thus, a precedence matrix with a cyclic linearization graph cannot
have linear precedence functions. □

COROLLARY

Algorithm 7.1 computes linear precedence functions for M whenever
they exist and produces the answer "no" otherwise. □

7.1.2. Applications to Operator Precedence Parsing

We can try to find precedence functions for any matrix whose entries
have at most three values. The applicability of this technique is not affected
by what the entries represent. To illustrate this point, in this section we shall
show how precedence functions can be applied to represent operator pre-
cedence relations.

Example 7.4
Consider our favorite grammar G0 with productions

E → E + T | T
T → T * F | F
F → (E) | a

The matrix giving the operator precedence relations for G0 is shown in
Fig. 7.6.

      $    (    +    *    a    )
$          <·   <·   <·   <·
(          <·   <·   <·   <·   ≐
+     ·>   <·   ·>   <·   <·   ·>
*     ·>   <·   ·>   ·>   <·   ·>
a     ·>        ·>   ·>        ·>
)     ·>        ·>   ·>        ·>

Fig. 7.6  Matrix of operator precedence relations for G0.
         1     2     3    4    5
         $    (, a   +    *    )
1  $          -1    -1   -1
2  (          -1    -1   -1    0
3  +    +1    -1    +1   -1   +1
4  *    +1    -1    +1   +1   +1
5  a, ) +1          +1   +1   +1

Fig. 7.7  Reduced precedence matrix M'.

Fig. 7.8  Linearization graph for M'.

If we replace <· by -1, ≐ by 0, and ·> by +1, we obtain the reduced
precedence matrix M' shown in Fig. 7.7. Here we have combined the rows
labeled a and ) and the columns labeled ( and a. The linearization graph for
M' is shown in Fig. 7.8. From this graph we obtain the linear precedence
functions f' = (0, 0, 2, 4, 4) and g' = (0, 5, 1, 3, 0) for M'. Hence, the linear
precedence functions for the original matrix are f = (0, 0, 2, 4, 4, 4) and
g = (0, 5, 1, 3, 5, 0). □
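
As a quick consistency check (our own, not from the text), the sketch below verifies the functions of this example against a few operator precedence relations of Fig. 7.6, using the same symbol order $ ( + * a ).

```python
# A small check for Example 7.4; -1, 0, +1 stand for <., =., .> as before.
symbols = ['$', '(', '+', '*', 'a', ')']
f = dict(zip(symbols, [0, 0, 2, 4, 4, 4]))
g = dict(zip(symbols, [0, 5, 1, 3, 5, 0]))

def rel(x, y):
    return -1 if f[x] < g[y] else (0 if f[x] == g[y] else +1)

# Relations read off Fig. 7.6: ( =. ), + <. *, * .> +, a .> ).
assert rel('(', ')') == 0
assert rel('+', '*') == -1
assert rel('*', '+') == +1
assert rel('a', ')') == +1
```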

7.1.3. Weak Precedence Functions

As pointed out previously, the -1, 0, and +1 of the matrix of Algorithm
7.1 can be identified with the Wirth-Weber precedence relations <·, ≐, and
·>, respectively. If linear precedence functions are found, then the precedence
relation between X and Y is determined by applying the first function to X
and the second to Y. In this case, all pairs X and Y will have some precedence
relation between them, so error detection is delayed until either the end of
the input is reached or an impossible reduction is called for.
However, the linear precedence function technique can be applied to
the representation of shift-reduce parsing decisions with an opportunity of
retaining some of the error-checking capability present in the blank entries
of the original matrix of precedence relations. Let us define a weak prece-
dence matrix as an m × n matrix M whose entries are -1, +1, and blank.
The -1 entries generally denote shifts, the +1 entries reductions, and the
blank entries errors. Such a matrix can be used to describe the shift-reduce
function of a shift-reduce parsing algorithm for a weak precedence grammar,
a (1, 1)-precedence grammar, or a simple mixed strategy precedence grammar.
We say that vectors f and g are weak precedence functions for a weak pre-
cedence matrix M if fi < gj whenever Mij = -1 and fi > gj whenever
Mij = +1.
The condition fi = gj can then be used to denote an error condition,
represented by a blank entry Mij. In general, we may not always be able to
have fi = gj wherever Mij is blank, but we would like to retain as much of
the error-detecting capability of the original matrix as possible.
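
For concreteness (this is our own sketch, not the book's), the shift-reduce decision with weak precedence functions then takes the following form, with equality reserved for error.

```python
# A sketch of the shift-reduce decision using weak precedence functions.
# f is indexed by the stack-top symbol, g by the lookahead; equality is
# interpreted as an error (blank) entry of the weak precedence matrix.

def decide(f, g, top, lookahead):
    if f[top] < g[lookahead]:
        return "shift"
    if f[top] > g[lookahead]:
        return "reduce"
    return "error"
```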
Thus, we can view the problem of finding weak precedence functions for
a weak precedence matrix M as one of finding functions which will produce
as many 0's for the critical blank entries of M as possible. We choose not to
fill in all blanks of the weak precedence matrix with 0's immediately, since
this would restrict the number of useful weak precedence matrices that have
weak precedence functions. Some blank entries may have to be changed to
-1 or +1 in order for weak precedence functions to exist (Exercise 7.1.9).
In addition, some blank entries may never be consulted by the parser, so
these entries need not be represented by O's.
The concept of independent nodes in a directed acyclic graph is of impor-
tance here. We say that two nodes N1 and N2 of a directed acyclic graph
are independent if there is no path from N1 to N2 or from N2 to N1.
We could use Algorithm 7.1 directly to produce weak precedence func-
tions for a weak precedence matrix M, but this algorithm as given did not
attempt to maximize the number of 0's produced for blank entries. However,
we shall use the first three steps of Algorithm 7.1 to produce a linearization
graph for M.
From Theorem 7.1 we know that M has weak precedence functions if
and only if the linearization graph for M is acyclic. The independent nodes
of the linearization graph determine which blank entries of M can be pre-
served. That is, we can have fi = gj if and only if Fi and Gj are independent
nodes. Of course, if we choose to have fi = gj, then there may be other pairs
of independent nodes whose corresponding numbers cannot be made equal.
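
A direct way to test the independence of two nodes is a pair of reachability searches; the sketch below (our own illustration, reusing the edges dictionary of the earlier Algorithm 7.1 sketch) is one way to do it.

```python
# A sketch of the independence test: nodes u and v of an acyclic
# linearization graph are independent when neither reaches the other.
# 'edges' maps each node to the set of its direct descendants.

def reaches(edges, u, v):
    """True if there is a path (possibly of length zero) from u to v."""
    stack, seen = [u], set()
    while stack:
        x = stack.pop()
        if x == v:
            return True
        if x not in seen:
            seen.add(x)
            stack.extend(edges.get(x, ()))
    return False

def independent(edges, u, v):
    return u != v and not reaches(edges, u, v) and not reaches(edges, v, u)
```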

Example 7.5
The matrix of Wirth-Weber precedence relations for the grammar G0
is shown in Fig. 7.9. The columns corresponding to nonterminals have been
deleted, since we shall use this matrix only for shift-reduce decisions. The
corresponding reduced weak precedence matrix is shown in Fig. 7.10, and
the linearization graph that results from this reduced matrix is shown in
Fig. 7.11. In this graph the nodes labeled F1 and G4 are independent. Also,
G2 and G4 are independent, but F1 and G3 are not. □

      a    (    )    +    *    $
E               ≐    ≐
T               ·>   ·>   ≐    ·>
F               ·>   ·>   ·>   ·>
a               ·>   ·>   ·>   ·>
)               ·>   ·>   ·>   ·>
(     <·   <·
+     <·   <·
*     <·   <·
$     <·   <·

Fig. 7.9  Precedence relations for G0.

                1      2     3    4
              a, (   ), +    *    $
1  E                 -1
2  T                 +1    -1   +1
3  F, a, )           +1    +1   +1
4  (, +, *, $  -1

Fig. 7.10  Reduced weak precedence matrix.

We can generalize step (5) of Algorithm 7.1 to determine precedence
functions which maximize the number of 0's produced for blank entries.
We can view the determination of the components of the precedence vectors
as an assignment of numbers to the nodes of the linearization graph. Any
set of pairwise-independent nodes can be assigned the same number, but
a node which is an ancestor of one or more nodes must be assigned a larger
number than any of its descendants.
We shall assign numbers to the nodes as follows. First, we partition the
nodes of the linearization graph into clusters of independent nodes such

Fig. 7.11  Linearization graph.


that the total number of F-G pairs together in a cluster is as large as possible
and no one cluster contains both descendants and ancestors of another
cluster. In general there may be many different sets of clusters possible, and
certain F-G pairs may be more desirable than others. This part of the process
may well be a large combinatorial problem.
However, once we have partitioned the graph into a set of clusters, we
can then find a linear order < on the clusters such that, for clusters C and
C', C < C' if C contains a node that is a descendant of a node in C'. If
C0, C1, ..., Ck is the sequence of clusters in this linear order, we then assign
0 to all nodes in C0, 1 to all nodes in C1, and so forth.
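
As an illustration of this final numbering step (our own sketch, with hypothetical inputs: a list of clusters and the edge relation of the linearization graph), one can order the clusters so that descendants come first and then number them consecutively.

```python
# A sketch of the cluster-numbering step.  'clusters' is a partition of
# the nodes of the (acyclic) linearization graph into sets of pairwise
# independent nodes; 'edges' maps a node to its direct descendants.
# A cluster receives a smaller number than any cluster containing an
# ancestor of one of its nodes.

def number_clusters(clusters, edges):
    index = {v: c for c, cluster in enumerate(clusters) for v in cluster}

    # cluster_edges[c] holds the clusters that must receive smaller numbers.
    cluster_edges = {c: set() for c in range(len(clusters))}
    for v, succs in edges.items():
        for w in succs:
            if index[v] != index[w]:
                cluster_edges[index[v]].add(index[w])

    # Topological order with descendants first (depth-first postorder).
    order, seen = [], set()
    def visit(c):
        if c in seen:
            return
        seen.add(c)
        for d in cluster_edges[c]:
            visit(d)
        order.append(c)
    for c in range(len(clusters)):
        visit(c)

    value = {}
    for number, c in enumerate(order):
        for v in clusters[c]:
            value[v] = number
    return value
```

For the clustering chosen in Example 7.6 below, this assignment reproduces f = (0, 2, 4, 1) and g = (4, 1, 3, 1).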

Example 7.6
Consider the linearization graph for G0 shown in Fig. 7.11. The set
{F1, F4, G4} is an example of a cluster of independent nodes, and so are
{F4, G2, G4}, {F2, G1}, {G1, G3}, and {F3, G1}. However, the cluster {G1, G3}
is not desirable, since both nodes in this cluster are labeled by G's and
thus it would not produce a 0 entry in the weak precedence matrix. The
cluster {F3, G1} might be more desirable than the cluster {F2, G1}, since
f3 = g1 will detect errors whenever aa, a(, )a, or )( appear in an input string,
while f2 = g1 will detect errors only for the pairs Ta and T(. Also, note that
if we detect an error whenever aa appears, we shall not be able to reduce
a to F, so the adjacencies Fa and Ta would never occur.
Thus, one possible clustering of nodes is {F1}, {F4, G2, G4}, {F2}, {G3},
{F3, G1}. Taking the linear order on clusters to be the left-to-right order

shown, we obtain the weak precedence functions

f = (0, 2, 4, 1) and g = (4, 1, 3, 1)

These functions define the precedence matrix shown in Fig. 7.12. □


      a    (    )    +    *    $
E    -1   -1   -1   -1   -1   -1
T    -1   -1   +1   +1   -1   +1
F     0    0   +1   +1   +1   +1
a     0    0   +1   +1   +1   +1
)     0    0   +1   +1   +1   +1
(    -1   -1    0    0   -1    0
+    -1   -1    0    0   -1    0
*    -1   -1    0    0   -1    0
$    -1   -1    0    0   -1    0

Fig. 7.12  Resulting precedence matrix for G0.

7.1.4. Modification of Precedence Matrices

Example 7.6 suggests that certain error entries in the matrix of Wirth-
Weber precedence relations will never be consulted by the shift-reduce
parsing algorithms for simple and weak precedence grammars (Algorithms
5.12 and 5.14). If we can isolate these entries and replace them by "don't
cares," then we can ignore these entries when we are attempting to find weak
precedence functions that cover as many error entries as possible.
To understand what modifications can be made to a matrix of precedence
relations, we first define what we mean when we say two shift-reduce parsing
algorithms are exactly equivalent. We shall use the notation for shift-reduce
parsing algorithms that was given in Section 5.3 (Volume I).

DEFINITION
Let 𝒜1 = (f1, g1)† and 𝒜2 = (f2, g2) be two shift-reduce parsing
algorithms for a context-free grammar G = (N, Σ, P, S). We say that 𝒜1 and
𝒜2 are exactly equivalent if their behavior on each input string is identical:
that is, if an input string w is in L(G), then both parsing algorithms accept
w. If w is not in L(G), then both parsers announce error after the same number
of steps and in the same phase. If the error is announced in the reduction phase,
then it is found after scanning an equal number of symbols down the stack.
We shall determine which blank entries in the canonical matrix of Wirth-
Weber precedence relations can be changed without affecting the parsing

†Here f1 is the shift-reduce function and g1 is the reduce function.



behavior of the shift-reduce parsing algorithm constructed from that matrix
using Algorithm 5.12. The chief use of this analysis is in finding good clusters
for weak precedence functions as discussed in the previous section. Blank
entries which should not be changed will be called essential blanks. The
theorem that follows identifies the essential blanks.
First, let us establish some notational conventions. Suppose that G =
(N, Σ, P, S) is a CFG. We let Mc be the matrix of canonical Wirth-Weber
precedence relations for G. (These are the ones that are created by the defini-
tion.) We shall subscript these precedence relations with c. If no Wirth-
Weber precedence relation holds between a pair of symbols X and Y, we
shall write X ?c Y.
We can also create an arbitrary matrix M of <·'s, ≐'s, ·>'s, and blanks.
If M has the same dimensions as Mc, then we shall call M a matrix of pre-
cedence relations for G. We shall write X ? Y if the (X, Y) entry in M is
blank.
We can use Algorithm 5.12 to construct the shift-reduce parsing algor-
ithm 𝒜c = (fc, gc) for G using the Wirth-Weber precedence relations in Mc.
We can also use Algorithm 5.12 to construct another shift-reduce parsing
algorithm 𝒜 = (f, g) for G using the precedence relations in M. Theorem
5.15 guarantees that 𝒜c is a valid parsing algorithm for G, but there is no
guarantee that 𝒜 will be a valid parsing algorithm for G. However, the fol-
lowing theorem states necessary and sufficient conditions for 𝒜 to be exactly
equivalent to 𝒜c.
THEOREM 7.2
𝒜 is exactly equivalent to 𝒜c if and only if the following four conditions
are satisfied for all X and Y in N ∪ Σ ∪ {$}, a and b in Σ ∪ {$}, and A in N.
(1) (a) If X <·c Y, then X <· Y.
    (b) If X ≐c Y, then X ≐ Y.
    (c) If X ·>c a, then X ·> a.
(2) If b ?c a, then b ? a.
(3) If A ?c a, then either
    (a) A ? a, or
    (b) for all Z in N ∪ Σ such that A → αZ is a production in P,
        the relation Z ·>c a is false.
(4) If X ?c A, then either
    (a) X ? A, or
    (b) for all Z in N ∪ Σ such that A → Zα is in P, the relation X <·c Z
        is false.
Proof.
If: By condition (1), the moves of 𝒜 and 𝒜c must agree until the latter
detects an error. Therefore, it suffices to show that if the two parsing

algorithms reach configuration Q = (X1 ··· Xm, a1 ··· ar, π) and we find
Q ⊢𝒜c error, then Q ⊢𝒜 error, and, moreover, the mechanism of error detec-
tion is the same in both 𝒜 and 𝒜c.
Let us first assume that in configuration Q, fc(Xm, a1) = error but
f(Xm, a1) ≠ error. We shall show that a contradiction arises. Thus, suppose
that Xm ?c a1 but that Xm ? a1 does not hold. By condition (2), Xm must be
a nonterminal. By condition (3), for all Y such that Xm → αY is in P, Y ·>c a1
is false.
Examination of the precedence parsing algorithm indicates that the
only way for a nonterminal to be on top of the stack is for the previous move
to have been a reduction. Then there is some production Xm → αY in P
such that the move of both parsers before configuration Q was entered is
(X1 ··· Xm-1αY, a1 ··· ar, π') ⊢ Q. But this implies that Y ·>c a1, a con-
tradiction.
The other possibility is that in configuration Q, gc(X1 ··· Xm, e) = error
but g(X1 ··· Xm, e) ≠ error. The only case that needs to be considered here
is that in which Xm ·>c a1, Xm ·> a1, and there is some s such that Xs ?c Xs+1,
while Xi ≐c Xi+1 and Xi ≐ Xi+1 for s < i < m, but the relation Xs ? Xs+1
does not hold. We claim that Xs+1 must be a nonterminal, because the only
way 𝒜c could place a terminal above Xs on the stack is if Xs <·c Xs+1 or
Xs ≐c Xs+1.
By condition (4), we cannot have Xs <·c Y if Xs+1 → Yα is in P. But the
only way that Xs+1 could appear next to Xs on the stack is for a reduction
of some Yα to Xs+1 to occur. That is, there must be some configuration
(X1 ··· Xs Yα, b1 ··· bk, π″) leading to Q such that

    (X1 ··· Xs Yα, b1 ··· bk, π″) ⊢𝒜c (X1 ··· Xs Xs+1, b1 ··· bk, π″i).

But then Xs <·c Y, in violation of condition (4).


Only if: It is straightforward to show that if condition (1) is violated,
the parsers are not exactly equivalent. We therefore omit this portion of
the proof and proceed to the more difficult portions.
Case 1: Suppose that condition (2) is violated. That is, for some b ?c a,
we do not have b ? a. Since G is a simple precedence grammar (and hence
proper), there is some sentence wbx in L(G). Consider the parsing of wba by
𝒜c and 𝒜. Since wbx is in L(G), neither parser can declare an error until
the a in wba becomes the next input symbol. Thus, both parsers must enter
some configuration ($α, ba$, π), at which time the b is shifted onto the stack,
yielding configuration ($αb, a$, π). Since b ?c a but b ? a is false, 𝒜c and 𝒜
are not exactly equivalent.
Case 2: Suppose that condition (3) is violated. That is, we have A ?c a,
A ? a is false, and there is some A → αX in P such that X ·>c a.
Since G is proper, there is some right-sentential form βAw of G and there
is some x in Σ* such that A ⇒rm αX ⇒rm β1 ⇒rm ··· ⇒rm βn ⇒rm x. Moreover,
there exists y in Σ* such that β ⇒* y. By Lemma 5.3, if Y is the last symbol
of any of αX, β1, ..., βn, or x, then Y ·>c a.
In the parsing of yxw, we note that the first symbol of w is not shifted
until yx is reduced to βA. Therefore, the parsing of yxa will proceed exactly
as that of yxw until configuration ($βA, a$, π') is reached. But A ?c a
holds while A ? a does not, so the two parsers are not exactly equivalent.

Case 3: Suppose that condition (4) is violated. That is, we have X ?c A
and some production A → Yα such that X ? A is false and X <·c Y.
Let βAw be a right-sentential form and A ⇒rm Yα ⇒rm γ1 ⇒rm ··· ⇒rm γn ⇒rm x.
Also, let δXu be a right-sentential form such that δ ⇒* y and X ⇒* z. Then
by Lemma 5.3, X is related by <·c to the first symbol of each of γ1, ..., γn,
and x. Moreover, the last symbol of each right-sentential form in a deriva-
tion X ⇒* z is related by ·>c to the first symbol of x.
Then, when parsing yzxw, the configuration ($δX, xw$, π) will be reached,
and subsequently ($δXA, w$, π') will be entered. The parsers will eventually
attempt to reduce by the production that introduced A into βAw. If X ?c A,
but X ? A is false, the exact equivalence of the two parsers is again contra-
dicted. □

Example 7.7
Consider the following simple precedence grammar G:

E → E + A | A
A → T
T → T * F | F
F → (B | a
B → E)

It should be evident that L(G) = L(G0). The matrix of canonical Wirth-
Weber precedence relations for G is shown in Fig. 7.13. Let us consider
which entries of Fig. 7.13 can be modified according to Theorem 7.2. Condi-
tion (1) states that no nonblank entries can be changed. Condition (2) states
that all blank entries in the intersection of the last six rows and the last six
columns are essential.
By condition (3), (E, $) is an essential blank since E → A is a production
and A ·> $. The remaining blanks in the last six columns are not essential
and thus can be changed arbitrarily.
      E    A    T    F    B    a    (    )    +    *    $
E                                        ≐    ≐
A                                        ·>   ·>        ·>
T                                        ·>   ·>   ≐    ·>
F                                        ·>   ·>   ·>   ·>
B                                        ·>   ·>   ·>   ·>
a                                        ·>   ·>   ·>   ·>
)                                        ·>   ·>   ·>   ·>
(     <·   <·   <·   <·   ≐    <·   <·
+          ≐    <·   <·        <·   <·
*                    ≐         <·   <·
$     <·   <·   <·   <·        <·   <·

Fig. 7.13  Matrix of canonical Wirth-Weber precedence relations.

Condition (4) requires that the ($, B) entry be an essential blank, because
B → E) is a production and $ <· E. The remaining blank entries in the first
five columns can be changed arbitrarily.

If we use Algorithm 5.14 to construct a shift-reduce parsing
algorithm for a uniquely invertible weak precedence grammar, then we can
show that the analogous parsers 𝒜 and 𝒜c of Theorem 7.2 are exactly
equivalent if and only if the first three conditions of Theorem 7.2 are sat-
isfied.†

Example 7.8
Using conditions (1)-(3) of Theorem 7.2 on the weak precedence relations
for G0 shown in Fig. 7.9 (p. 553), we find that all the blanks in the last six
rows are essential. The only other essential blank is (E, $), since E → T is a
production and T ·> $.
Examining the linearization graph of Fig. 7.11, we find that there are
no precedence functions such that every essential blank is represented by 0.

†Recall that reductions do not depend on the precedence matrix in a weak precedence
parser.

This would require, for example, that nodes F4, G2, G3, and G4 all be placed
in one cluster.
At this point we might give up trying to use precedence functions to
implement the parser. However, we can consider using a slightly weaker
definition of equivalence between parsers.
Exact equivalence is very stringent. In practical situations we would be
willing to say that two shift-reduce parsing algorithms are equivalent if they
either both accept the same input strings or both announce error at the same
position on erroneous input strings. Thus, one parser could announce error
while the other made several reductions (but no shift moves) before announc-
ing error. Under this definition, which we shall call simply equivalence, we
can modify precedence relations even more drastically but still preserve
equivalence. (See Exercise 7.1.13.)
With this weaker definition of equivalence we can show that a shift-
reduce parsing algorithm using the precedence functions

         E    T    F    a    (    )    +    *    $
    f    0    2    5    5    4    5    4    4    4
    g                   5    5    1    1    3    0

is equivalent to the parser constructed by Algorithm 5.14 from the weak
precedence relations in Fig. 7.9. □

We shall explore this weaker form of equivalence in much greater detail
in Sections 7.2, 7.3, and 7.4.

EXERCISES

7.1.1. Find linear weak precedence functions for the following grammars
or prove that none exist:
(a) S → SA | A
    A → (S) | ( )
(b) E → E + T | +T | T
    T → T * F | F
    F → (E) | a
7.1.2. Show that if M ' is a matrix formed from M by permuting some rows
and/or columns, then the vectors f and g produced by Algorithm 7.1
for M ' will be permutations of those produced for M.
7.1.3. Find linear precedence functions for the matrix of Fig. 7.14.
7.1.4. Find an algorithm to determine whether a matrix has linear precedence
functions f and g such that f = g.
    +1   +1   +1
    +1   +1   +1
    -1   -1   +1   +1   +1
    -1   -1   -1   +1   +1   +1
    -1   -1   -1   -1   +1   +1
    -1   -1   -1   -1   -1    0

    Fig. 7.14  Matrix.

"7.1.5. (a) Show that the technique given after Algorithm 7.1 for determining
whether a directed graph is acyclic actually works.
(b) Show that this technique can be implemented to work in time
O(n) + O(e), where n is the number of nodes and e is the number
of edges in the given graph. H i n t : Choose a node in the graph.
Color all nodes on a path extending from this node until either a
leaf or previously colored node is encountered. If a leaf is found,
remove it, back up to its immediate ancestor, and then continue the
coloring process.
7.1.6. (a) Show that the labeling technique given after Algorithm 7.1 will
find the length of a longest path beginning at each node.
(b) Show that this technique can be implemented in time O(n) + O(e),
where n is the number of nodes and e the number of edges in the
graph.
7.1.7. Give an algorithm which takes a matrix M with entries -1, 0, +1,
and blank and a constant k and determines whether there exist vectors
f and g such that
(1) If Mij = -1, then fi + k < gj;
(2) If Mij = 0, then |fi - gj| ≤ k;
(3) If Mij = +1, then fi > gj + k.
DEFINITION
Let M be a weak precedence matrix. We say a sequence of integers
i1, i2, ..., ik, where k is even and greater than 3, is a cycle of M if
(1) M_{i_j i_{j+1}} = -1 for odd j, M_{i_{j+1} i_j} = +1 for even j, and M_{i_1 i_k} = +1,
or
(2) M_{i_j i_{j+1}} = +1 for odd j, M_{i_{j+1} i_j} = -1 for even j, and M_{i_1 i_k} = -1.
7.1.8. Show that there exist weak precedence functions for a weak precedence
matrix if and only if M contains no cycle.
7.1.9. Let M be a weak precedence matrix and let i, j, k, and l be indices such
that either
(1) Mik = Mjl = -1, Mjk = +1, and Mil is blank, or
(2) Mik = Mjl = +1, Mjk = -1, and Mil is blank.
Let M' be M with Mil replaced by -1 in case (1) and by +1 in case (2).
Show that f and g are weak precedence functions for M if and only if f
and g are weak precedence functions for M ' .
DEFINITION
We say that two rows (columns) of a precedence matrix are com-
patible if whenever they differ one is blank. We can merge compatible
rows (columns) by replacing them by a single row (column) which
agrees with all their nonblank entries.
7.1.10. Show that the operations of row and column merger preserve the prop-
erty of not having linearizing functions.
We can also use linear precedence functions to represent the <·
and ≐ relations used by the reduce function in the shift-reduce parsing
algorithm constructed by Algorithm 5.12. First, we construct a weak
precedence matrix M in which -1 represents <·, +1 represents ≐,
and blanks represent both ·> and error. We then attempt to find linear
precedence functions for M, again attempting to represent as many
blanks as possible by 0's.
7.1.11. Represent the <· and ≐ relations of Fig. 7.13 with linear precedence
functions. Use Theorem 7.2 to locate the essential blanks and attempt
to preserve these blanks.
"7.1.12. Show that under the definition of exact equivalence for weak prece-
dence parsers a blank entry (X, Y) of the matrix of Wirth-Weber
precedence relations is an essential blank if and only if one of the fol-
lowing conditions holds"
(1) X and Y are in ~ U {$}; or
(2) X is in N, Y is in X U {$}, and there is a production X ~ czZ
such that Z 3>c Y.
In the following problems, "equivalent" is used in the sense of
Example 7.8.
*7.1.13. Let 𝒜c and 𝒜 be shift-reduce parsing algorithms for a simple precedence
grammar as in Theorem 7.2. Prove that 𝒜c is equivalent to 𝒜 if and
only if the following conditions are satisfied:
(1) (a) If X <·c Y, then X <· Y.
    (b) If X ≐c Y, then X ≐ Y.
    (c) If X ·>c a, then X ·> a.
(2) If b ?c a, then b <· a is false.
(3) If A ?c a and A <· a or A ≐ a, then there is no derivation
    A ⇒rm α1X1 ⇒rm ··· ⇒rm αmXm, m ≥ 1, such that for 1 ≤ i < m, Xi ?c a
    and Xi ·> a, and Xm ·>c a, or Xm is a terminal and Xm ·> a.
(4) If A1 <· a or A1 ≐ a for some a, then there does not exist a
    derivation A1 ⇒ A2 ⇒ ··· ⇒ Am ⇒ Bα, m ≥ 1, a symbol X, and a
    production B → Yβ such that
    (a) X ?c Ai but X <· Ai, for 2 ≤ i ≤ m;
    (b) X ?c B but X <· B; and
    (c) X <· Y.

7.1.14. Show that the parser using the precedence functions of Example 7.8
is equivalent to the canonical precedence parser for Go.
7.1.15. Let M be a matrix of precedence relations constructed from Mc, the
matrix of canonical Wirth-Weber precedence relations, by replacing
some blank entries by ·>. Show that the parsers constructed from M
and Mc by Algorithm 5.12 (or 5.14) are equivalent.
"7.1.16. Consider a shift-reduce parsing algorithm for a simple precedence
grammar in which after each reduction a check is made to determine
whether the < or ~ relation holds between the symbol that was
immediately to the left of the handle and the nonterminal to Which the
handle is reduced. Under what conditions will an arbitrary matrix of
precedence relations yield a parser that is exactly equivalent (or equiv-
alent) to the parser of this form constructed from the canonical Wirth-
Weber precedence relations ?
*'7.1.17. Show that every CFL has a precedence grammar (not necessarily
uniquely invertible) for which linear precedence functions can be found.

Research Problems
7.1.18. Give an efficient algorithm to find linear precedence functions for a
weak precedence grammar G that yields a parser which is equivalent
to the canonical precedence parser for G.
7.1.19. Devise good error recovery routines to be used in conjunction with
precedence functions.

Programming Exercises
7.1.20. Construct a program that implements Algorithm 7.1.
7.1.21. Write a program that implements a shift-reduce parsing algorithm using
linear precedence functions to implement the f and g functions.
7.1.22. Write a program that determines whether a CFG is a precedence gram-
mar that has linear precedence functions.
7.1.23. Write a program that takes as input a simple precedence grammar G
that has linear precedence functions and constructs for G a shift-reduce
parser utilizing the precedence functions.

BIBLIOGRAPHIC NOTES

Floyd [1963] used linear precedence functions to represent the matrix of operator
precedence relations. Wirth and Weber [1966] suggested their use for representing
Wirth-Weber precedence relations. Algorithms to compute linear precedence
functions have been given by Floyd [1963], Wirth [1965], Bell [1969], Martin [1972],
and Aho and Ullman [1972a].
Theorem 7.2 is from Aho and Ullman [1972b], which also contains answers to
Exercises 7.1.13 and 7.1.15. Exercise 7.1.17 is from Martin [1972].

7.2. OPTIMIZATION OF FLOYD-EVANS PARSERS

A shift-reduce parsing algorithm provides a conceptually simple method
of parsing. However, when we attempt to implement the two functions of
the parser, we are confronted with problems of efficiency. In this section we
shall discuss how a shift-reduce parsing algorithm for a uniquely invertible
weak precedence grammar can be implemented using the Floyd-Evans
production language. The emphasis in the discussion will be on methods
by which we can reduce the size of the resulting Floyd-Evans production
language program without changing the behavior of the parser. Although
we only consider precedence grammars here, the techniques of this section
are also applicable to the implementation of parsers for each of the other
classes of grammars discussed in Chapter 5 (Volume I).

7.2.1. Mechanical Generation of Floyd-Evans
       Parsers for Weak Precedence Grammars

We begin by showing how a Floyd-Evans production language parser
can be mechanically constructed for a uniquely invertible weak precedence
grammar. The Floyd-Evans production language is described in Section
5.4.4 of Chapter 5. We shall illustrate the algorithm by means of an example.
As expected, we shall use for our example the weak precedence grammar
Go with productions
(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → (E)
(6) F → a
The Wirth-Weber precedence relations for G0 were shown in Fig. 7.9 (p.
553). From each row of this precedence matrix we shall generate statements
of the Floyd-Evans parser. We use four types of statements: shift statements,
reduce statements, checking statements, and computed goto statements.†
We shall give statements symbolic labels that denote both the type of state-
ment and the top symbol of the pushdown list. In these labels we shall use
S for shift, R for reduce, C for checking, and G for goto, followed by the
symbol assumed to be on top of the pushdown list.
We shall generate the shift statements first and then the reduce statements.

1"The computed goto statement involves an extension of the production language of


Section 5.4.4 in that the next label can be an expression involving the symbol .#, which,
as in Section 5.4.4, represents an unknown symbol matching the symbol at a designated
position on the stack or in the lookahead. While we do not wish to discuss details of
implementation, the reader should observe that such computed gotos as are used here
can be easily implemented on his favorite computer.

For weak precedence grammars the precedence relations <· and ≐ indicate
a shift and ·> indicates a reduction.
The E-row of Fig. 7.9 generates the statements

(7.2.1)    SE:    E | )     →  E) |    *    S)
                  E | +     →  E+ |    *    S+
                  $E | $    |               accept
                  E |       |               error

The first statement states that if E is on top of the pushdown list and the
current input symbol is ), then we shift ) onto the pushdown list, read the
next input symbol, and go to the statement labeled S). If this statement does
not apply, we see whether the current input symbol is + . If the second
statement does not apply, we next see if the current input symbol is $. The
relevant action here would be to go into the halting state accept if the
pushdown list contained $E. Otherwise, we report error. Note that no reduc-
tions are possible if E is the top stack symbol.
Since the first component of the label indicates the symbol on top of
the pushdown list, we can in many cases avoid unnecessary checking of
the top symbol of the pushdown list if we know what it is. Knowing that
E is on top of the pushdown list, we could replace the statements (7.2.1) by

SE:    | )     →  ) |     *    S)
       | +     →  + |     *    S+
       $# | $  |               accept
       |       |               error

Notice that it is important that the error statement appear last. When E is
on top of the pushdown list, the current input symbol must be ), + , or $.
Otherwise we have an error. By ordering the statements accordingly, we can
first check for ), then for + , and then for $, and if none of these is the cur-
rent input symbol, we report error.
The row for T in Fig. 7.9 generates the statements

(7.2.2)    ST:    T | *     →  T* |    *    S*
           RT:    E+T |     →  E |          CT
                  T |       →  E |          CT
           CT:    | )       |               SE
                  | +       |               SE
                  | $       |               SE
                  |         |               error

Here the precedence relation T ≐ * generates the first statement. The
precedence relations T ·> ), T ·> +, and T ·> $ indicate that with T on top
of the pushdown list we are to reduce. Since we are dealing with a weak
precedence grammar, we always reduce using the longest applicable pro-
duction, by Lemma 5.4. Thus, we first look to see if E + T appears on top
of the pushdown list. If so, we replace E + T by E. Otherwise, we reduce
T to E. When the two RT statements are applicable, we know that T is on
top of the pushdown list. Thus we could use

RT:    E + # |    →  E |    CT
       # |        →  E |    CT

and again avoid the unnecessary checking of the top symbol on the push-
down list.
After we perform the reduction, we check to see whether it was legal.
That is, we check to see whether the current input symbol is either ), + ,
or $. The group of checking statements labeled CT is used for this purpose.
We report error if the current input symbol is not ), + , or $. Reducing first
and then checking to see if we should have made a reduction may not always
be desirable, but by performing these actions in this order we shall be able
to merge common checking operations.
To implement this checking, we shall introduce a computed goto state-
ment of the form

G:     # |    |    S#
indicating that the top symbol of the pushdown list is to become the last
symbol of the label.
Now we can replace the checking statements in (7.2.2) by the following
sequence of statements:

CT:    | )    |    G
       | +    |    G
       | $    |    G
       |      |    error

G:     # |    |    S#
We shall then be able to use these checking statements in other sequences.
For example, if a reduction in G0 is accomplished with T on top of the stack,
the new top of the stack must be E. Thus, the statements in the CT group
could all transfer to SE. However, in general, reductions to several different
nonterminals could be made, and the computed goto is quite useful in
establishing the new top of the stack.

Finally, for convenience we shall allow statements to have more than
one label. The use of this feature, which is not difficult to implement, will
become apparent later. We shall now give an algorithm which makes use
of the preceding ideas.
ALGORITHM 7.2
Floyd-Evans parser from a uniquely invertible weak precedence grammar.
Input. A uniquely invertible weak precedence grammar G = (N, Σ, P, S).
Output. A Floyd-Evans production language parser for G.
Method.
(1) Compute the Wirth-Weber precedence relations for G.
(2) Linearly order the elements in N ∪ Σ ∪ {$} as X1, X2, ..., Xm.
(3) Generate statements for X1, X2, ..., Xm as follows. Suppose that Xi
is not the start symbol. Suppose further that either Xi <· a or Xi ≐ a for all
a in {a1, a2, ..., aj}, and Xi ·> b for all b in {b1, ..., bl}. Also, suppose
A1 → α1Xi, A2 → α2Xi, ..., Ak → αkXi are the productions having Xi as
the last symbol on the right-hand side, arranged in an order such that αpXi
is not a suffix of αqXi for p < q. Moreover, let us assume that Ah → αhXi has
number ph, 1 ≤ h ≤ k. Then generate the statements

sx, " l a, ~ a, t , Sa~


l az > a2 [ * Saz
°

l aj ~ a:l • Sa k
RXi: 0~1~[ >All emitpl cx,
0~2@1 - >A21 emitp2 cx,
,

0~k~l > Ak I emit Pk cx,


I 1 error

c~- Ib~ I G
G

[bl G

L error

If j is zero, then the first statement of the RXi group also has the label SXi.
If k is zero, the error statement in the RXi group has the label RXi. If Xi is
the start symbol, then we do as above and also add the statement

            $#| $                   accept

to the end of the SXi group. S$ is the initial statement of the parser.


(4) Append the computed goto statement:

    G:        #|                   S#     □
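As an illustration of step (3), the statement groups can be generated mechanically from the precedence relations. The following is a minimal sketch in Python, not part of the algorithm as stated above; the representation of the relations as a dictionary prec keyed by symbol pairs, and of productions as numbered triples, is an assumption made only for this illustration.

    # Sketch of step (3) of Algorithm 7.2 for a single symbol X.
    # prec maps (X, a) to one of "<.", "=", ".>"; productions is a list of
    # (number, lefthand, righthand-tuple) triples.  These representations
    # are assumptions of the sketch.

    def statement_groups(X, symbols, prec, productions, start_symbol):
        sx, rx, cx = [], [], []

        # SX group: one shift statement for each a with X <. a or X = a.
        for a in symbols:
            if prec.get((X, a)) in ("<.", "="):
                sx.append("| %s --> %s|   *   S%s" % (a, a, a))
        if X == start_symbol:
            sx.append("$#| $              accept")

        # RX group: reductions for productions ending in X, longest alpha
        # first (so no earlier right side is a suffix of a later one),
        # followed by an error statement.
        ending = [p for p in productions if p[2] and p[2][-1] == X]
        ending.sort(key=lambda p: len(p[2]), reverse=True)
        for num, lhs, rhs in ending:
            alpha = "".join(rhs[:-1])
            rx.append("%s#| --> %s|   emit %d   C%s" % (alpha, lhs, num, X))
        rx.append("|                   error")

        # CX group: one checking statement for each b with X .> b, then error.
        for b in symbols:
            if prec.get((X, b)) == ".>":
                cx.append("| %s                G" % b)
        cx.append("|                   error")

        return ("S" + X, sx), ("R" + X, rx), ("C" + X, cx)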
Example 7.9

Consider the grammar G0. From the F row of the precedence matrix
we would get the following statements:

    SF: RF:†   T*#  |    -->  T |     emit 3   CF
                 #  |    -->  T |     emit 4   CF
                    |                 error
        CF:         | )               G
                    | +               G
                    | *               G
                    | $               G
                    |                 error

Note that the third statement is useless, as the second statement will always
produce a successful match. We could, of course, incorporate a test into
Algorithm 7.2 which would cause useless statements not to be produced.
From now on we shall assume useless statements are not generated.
From the a row we get the following statements:

    Sa: Ra:      #  |    -->  F |     emit 6   Ca
        Ca:         | )               G
                    | +               G
                    | *               G
                    | $               G
                    |                 error

Notice that the checking statements for a are identical to those for F.

†Note the use of multiple labels for a location. Here, the SF group is empty.

In the next section we shall outline an algorithm which will merge redun-
dant statements. In fact, the checking statements labeled CT could also be
merged with CF if we write

    Ca: CF:         | *               G
        CT:         | )               G
                    | +               G
                    | $               G
                    |                 error

Our merging algorithm will also consider partial mergers of this nature.
The row labeled ( in the precedence matrix generates the statements

                    | (  -->  (|      *   S(
                    | a  -->  a|      *   Sa
                    |                 error

Similar statements are also generated by the rows labeled +, *, and $.  □

We leave the verification that Algorithm 7.2 produces a valid right parser
for G to the Exercises.

7.2.2. Improvement of Floyd-Evans Parsers

In this section we shall consider techniques which can be used to reduce


the number of shift and checking statements in the Floyd-Evans parser
that results from Algorithm 7.2. Our basic technique will be to merge com-
mon shift statements and common checking statements. The procedure
may introduce additional statements having the effect of an unconditional
transfer, but we shall assume that these branch statements have relatively
small cost. We shall treat the merger of shift statements here; the same
technique can be used to merge checking statements.
Let G = (N, Σ, P, S) be a uniquely invertible weak precedence grammar.
Let M be its matrix of Wirth-Weber precedence relations. The matrix M
determines the shift and checking statements that arise in Algorithm 7.2.
From the precedence matrix M, we construct a merged shift matrix Ms
as follows:

(1) Delete all ·> entries and replace the ≐ entries by <·. (Since we care
only about shifts, the <· and ≐ relations can be identified.)
(2) If two or more rows of the resulting matrix are identical, replace
them by one row in Ms, with the new row identified with the set of symbols
in N ∪ Σ ∪ {$} with which the original rows were associated.
(3) Delete all rows with no <· entries and call the resulting matrix Ms.
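A minimal sketch of this construction, assuming the precedence matrix is given as a dictionary M mapping (row symbol, column symbol) pairs to the relations "<.", "=", or ".>" (a representation chosen only for illustration):

    # Build the merged shift matrix Ms from a Wirth-Weber precedence matrix.
    # Returns a list of (set-of-row-symbols, frozenset-of-shift-columns) pairs.

    def merged_shift_matrix(M, symbols, columns):
        shift_cols = {}
        for X in symbols:
            # Step (1): keep only <. and =, identifying the two relations.
            cols = frozenset(a for a in columns if M.get((X, a)) in ("<.", "="))
            if cols:                     # step (3): drop rows with no shift entries
                shift_cols.setdefault(cols, set()).add(X)   # step (2): merge equal rows
        return [(rows, cols) for cols, rows in shift_cols.items()]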

Example 7.10

The merged shift matrix for G0 from Fig. 7.9 is shown in Fig. 7.15. This
merged shift matrix is a concise representation of the situations in which
the parser is to make a shift move.

                      (     a     )     +     *     $

      E                           <·    <·

      T                                       <·

      {(, +, *, $}    <·    <·

                 Fig. 7.15  Merged shift matrix.

From the merged shift matrix Ms, we construct an unordered labeled
directed graph (A, R), called the shift graph associated with Ms, as follows:
(1) For each row of Ms labeled Y, there is a node in A labeled Y.
(2) There is one additional node labeled ∅ in A representing a fictitious
empty row.
(3) If row Y of Ms is covered by row Z of Ms (that is, in whatever column
Y has a <· entry, Z has a <· entry), then edge (Y, Z) is in R, and edge (Y, Z)
is labeled with the number of columns in which Z, but not Y, has a <· entry.
Note that Y may be the empty row. We let l(Y, Z) denote the label of edge
(Y, Z).
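Continuing the sketch above, the shift graph can be built directly from the merged rows; the data structures (each row a frozenset of its <· columns) are again assumptions made only for the illustration.

    # Build the shift graph (A, R) from the merged shift matrix rows.
    # Edges run from a covered row to a covering row; each edge is labeled
    # with the number of extra columns in the covering row.

    def shift_graph(merged_rows):
        nodes = [frozenset()] + [cols for (_, cols) in merged_rows]  # frozenset() = empty row
        edges = {}
        for Y in nodes:
            for Z in nodes:
                if Y != Z and Y <= Z:            # row Y is covered by row Z
                    edges[(Y, Z)] = len(Z - Y)   # the label l(Y, Z)
        return nodes, edges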

Example 7.11

Consider the shift matrix Ms given in Fig. 7.16. The shift graph associated
with Ms is shown in Fig. 7.17.  □

                 a1    a2    a3    a4    a5    a6

      Y1         <·    <·    <·    <·          <·

      Y2         <·          <·    <·          <·

      Y3         <·                <·

      Y4         <·                            <·

      Y5                                 <·

                 Fig. 7.16  Shift matrix Ms.



Fig. 7.17 Shift graph from Ms.

It should be clear that the shift graph is a directed acyclic graph with
a single root, ∅. The number of shift statements generated by Algorithm
7.2 is equal to the number of shift (<· and ≐) entries in the precedence
matrix M. Using the shift matrix Ms and merging rows with similar shift
entries, we can reduce the number of shift statements that are required.
The technique is to construct a minimum cost directed spanning tree (subset
of the edges which forms a tree, with all nodes included) for the shift graph,
where the cost of a spanning tree is the sum of the labels of the edges in
the tree.
A path from ∅ to Y to Z in the shift graph (A, R) has the following
interpretation. The label l(∅, Y) gives the number of shift statements gener-
ated for row Y of Ms. Thus, the number of shift statements that would be
generated for rows Y and Z is l(∅, Y) + l(∅, Z). However, if there is a path
from ∅ to Y to Z in the graph, we can first generate the shift statements for
row Y. To generate the shift statements for row Z, we can use the shift
statements for row Y and precede them by those shift statements for row Z
which are not already present. Thus, we would generate the following se-
quence of shift statements for rows Y and Z:

    SZ:  Shift statements for entries in Z but not in Y

    SY:  Shift statements for entries in Y

The number of shift statements for Y and Z would thus be l(∅, Y) + l(Y, Z)
= l(∅, Z), rather than l(∅, Y) + l(∅, Z). We thus get the shift statements
for row Y "for free."



We can generalize this technique in an algorithm which takes an arbitrary


directed spanning tree for a shift graph and constructs a set of shift state-
ments "corresponding" to that spanning tree. The number of shift statements
is equal to the sum of the labels of the edges of the tree. The method is given
in the following algorithm.
ALGORITHM 7.3
Set of shift statements from spanning tree.
Input. A shift matrix Ms and a spanning tree for its shift graph.
Output. A sequence of Floyd-Evans production language statements.
Method. For each node Y of the spanning tree except the root, construct
the sequence of statements

    L:          | a1  -->  a1 |     *   Sa1
                | a2  -->  a2 |     *   Sa2
                        .
                        .
                | an  -->  an |     *   San
                |                   L'

where L is the label of row Y in Ms (i.e., the set of labels SX1, ..., SXm,
where X1, ..., Xm are the symbols whose rows in the precedence matrix
form row Y in Ms). L' is the label for the direct ancestor of node Y in the
spanning tree; a1, ..., an are the columns covered by row Y of Ms but not
by its direct ancestor.
For node ∅, we add a new computed goto statement:

    ∅:         #|                  R#
The statements for the nodes can be placed in any order. However, if
the statement
    |                   L'

immediately precedes the statement labeled L', then the former statement
may be deleted.  □

Example 7.12
The tree of Fig. 7.18 is a spanning tree for the shift graph of Fig. 7.17.
The following sequence of statements could be generated from the tree
of Fig. 7.18 by Algorithm 7.3. Of course, the sequence of the statements is
not completely fixed by Algorithm 7.3, and other sequences are possible.
By SYi is meant the set of labels corresponding to row Yi in the shift graph.

Fig. 7.18 Spanning tree.

    SY4:        | a1  -->  a1 |     *   Sa1
                | a6  -->  a6 |     *   Sa6
    SY5:        | a5  -->  a5 |     *   Sa5
    SY2:        | a3  -->  a3 |     *   Sa3
                | a6  -->  a6 |     *   Sa6
    SY3:        | a1  -->  a1 |     *   Sa1
                | a4  -->  a4 |     *   Sa4
    SY1:        | a2  -->  a2 |     *   Sa2
    ∅:         #|                  R#     □
THEOREM 7.3
Algorithm 7.3 produces a sequence of production language statements
which may replace the shift statements generated by Algorithm 7.2, with
no change in the parsing action of the program.
Proof. We observe that when started at the sequence of statements gener-
ated by Algorithm 7.3 for node Y, the statements which may subsequently
be executed are precisely those generated for the ancestors of node Y. It is
straightforward to show that these statements test for the presence in the
lookahead position of exactly those symbols whose columns are covered
by row Y of the shift matrix.
The statement with label ∅ ensures that if no shift is made, we transfer
to the proper R-group.

The spanning tree for a given shift graph which produces the fewest shift
statements by Algorithm 7.3 is surprisingly easy to find. We observe that,
neglecting the unconditional transfer statements, the number of statements
generated by Algorithm 7.3 (all of which are of the form  | a --> a|  * Sa
for some a) is exactly the sum of the labels of the edges in the tree.
ALGORITHM 7.4
Minimum cost spanning tree from shift graph.
Input. Shift graph (A, R) for a precedence matrix M.
Output. A spanning tree (A, R') such that Σ(X,Y)∈R' l(X, Y) is minimal.
Method. For each node Y in A other than the root, choose a node X such
that l(X, Y) is smallest among all edges entering Y. Add (X, Y) to R'.  □
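Because exactly one edge of a spanning tree enters each non-root node, the minimization decomposes node by node. A minimal sketch, reusing the graph representation assumed in the earlier sketches:

    # Algorithm 7.4 (sketch): for every node other than the root, pick the
    # cheapest incoming edge of the shift graph.

    def min_cost_spanning_tree(nodes, edges):
        tree = {}                                    # maps node -> chosen ancestor
        for Y in nodes:
            if Y == frozenset():                     # skip the root
                continue
            candidates = [(lab, X) for (X, Z), lab in edges.items() if Z == Y]
            lab, X = min(candidates, key=lambda c: c[0])   # smallest label l(X, Y)
            tree[Y] = X
        return tree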

Example 7.13
The spanning tree in Fig. 7.18 is obtained from the shift graph of Fig.
7.17 using Algorithm 7.4.  □
THEOREM 7.4
The number of shift statements generated by Algorithm 7.3 from a span-
ning tree is minimized for a given shift graph when the tree produced by
Algorithm 7.4 is chosen.
Proof. Since every node except the root of (A, R) has a unique direct
ancestor, (A, R') must be a tree. Since in every spanning tree of (A, R) one
edge enters each node other than the root, the minimality of (A, R') is
immediate.  □

We observe that we can define a reduce matrix Mr from a precedence


matrix M by deleting all but the ·> entries and merging rows exactly as we
did for the shift matrix. We can then define a reduce graph in exact analogy
with the shift graph and minimize the number of checking statements by an
obvious analog of Algorithm 7.4. We leave the details to the reader. We
shall give an example of the minimization of the entire Floyd-Evans parser
for Go.

Example 7.14
The Ms matrix for G0 is

                      (     a     )     +     *     $

        E                         <·    <·

        T                                     <·

        Y             <·    <·

where Y = {(, +, *, $}. The Mr matrix (the matrix for reductions) is

                      )     +     *     $

        T             ·>    ·>          ·>

        Z             ·>    ·>    ·>    ·>

where Z = {F, a, )}.


Employing Algorithm 7.4 for merging both shift statements and for
merging checking statements, we generate the following Floyd-Evans parser
for G0. The initial statement is S$.

    S(: S+: S*: S$:       | (  -->  (|      *   S(
                          | a  -->  a|      *   Sa
    SE:                   | )  -->  )|      *   S)
                          | +  -->  +|      *   S+
                        $#| $               accept†
                          |                 ∅
    ST:                   | *  -->  *|      *   S*
    ∅:                   #|                 R#
    RT:                E+#|      -->  E|    emit 1   CT
                         #|      -->  E|    emit 2   CT
    SF: RF:            T*#|      -->  T|    emit 3   CF
                         #|      -->  T|    emit 4   CF
    Sa: Ra:              #|      -->  F|    emit 6   Ca
    S): R):            (E#|      -->  F|    emit 5   C)
    RE: R(: R+: R*: R$:   |                 error
    CF: Ca: C):           | *               G
    CT:                   | )               G
                          | +               G
                          | $               G
                          |                 error
    G:                   #|                 S#
†Here, this statement can be executed only with E on top of the stack. If it could be
executed otherwise, we would have to replace # by E (or, in general, by the start symbol).

The penultimate statement plays the role of ∅ for the checking state-
ments.  □

There are other improvements that can be made to Floyd-Evans parsers.


One is to combine two statements into a single statement. For example,
the second statement of the parser for G0 could have been combined with
the statement labeled Sa into

                          | a  -->  F|      *   emit 6   Ca

If we are willing to change the behavior of the parser slightly, we can


effect further changes. One change would be to delay error detection. For
example, the parser in Section 5.4.3 for G0 uses only 11 statements, but it
delays error detection in some cases.
We should emphasize that all the parsing algorithms discussed in Chapter
5 can be implemented in the Floyd-Evans production language. To imple-
ment these parsing algorithms efficiently, we can use techniques similar to
those presented in this section.
Finally, there is the question of implementing the Floyd-Evans produc-
tion language parser itself. A production language statement may cause
the following elementary operations to be performed: read an input symbol,
compare symbols, place symbols on the stack, pop symbols off the stack, generate
output. These operations are quite straightforward to implement. Hence,
we can construct a program that will map a Floyd-Evans parser into a se-
quence of these elementary operations. A small interpreter can then be
provided to execute this sequence of elementary operations.
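A minimal sketch of such an interpreter follows; the encoding of each statement as a dictionary of fields (pattern, lookahead, replacement, advance, emit, target) is an assumption made only for this illustration, and the computed goto statement (S#) is not modeled here.

    # Sketch of an interpreter for Floyd-Evans-style statements.  "#" in a
    # stack pattern matches any single symbol; a lookahead of None matches
    # anything; a replacement of None leaves the stack unchanged; target is
    # a label, "accept", "error", or None (fall through).

    def run(program, labels, tokens):
        stack, pos, output = ["$"], 0, []        # "$" as bottom marker (assumption)
        i = labels["S$"]                         # the initial statement
        while i < len(program):
            s = program[i]
            n = len(s["pattern"])
            top = stack[len(stack) - n:] if n else []
            ok = (len(top) == n
                  and all(p in ("#", t) for p, t in zip(s["pattern"], top))
                  and s["lookahead"] in (None, current(tokens, pos)))
            if not ok:
                i += 1                           # fall through to the next statement
                continue
            if s["replacement"] is not None:     # rewrite the top of the stack
                del stack[len(stack) - n:]
                stack.extend(s["replacement"])
            if s["advance"]:
                pos += 1                         # the '*' action: scan one symbol
            if s["emit"] is not None:
                output.append(s["emit"])
            if s["target"] == "accept":
                return output
            if s["target"] == "error":
                raise SyntaxError("error near position %d" % pos)
            i = labels[s["target"]] if s["target"] else i + 1
        raise SyntaxError("fell off the end of the program")

    def current(tokens, pos):
        return tokens[pos] if pos < len(tokens) else "$"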

EXERCISES

7.2.1. Use Algorithm 7.2 to generate Floyd-Evans parsers for the following
weak precedence grammars:
(a) S → S + I | I
    I → (S) | a(S) | a
(b) S → 0S1 | 01
(c) E → E + T | T
    T → T * F | F
    F → P ↑ F | P
    P → (E) | a
7.2.2. Use the techniques of this section to improve the parsers constructed
in Exercise 7.2.1.
7.2.3. For the weak precedence matrix of Fig. 7.19, find the shift and reduce
matrices.
         Fig. 7.19  Matrix of Wirth-Weber precedence relations over X1, ..., X8.

7.2.4. From the shift and reduce matrices constructed in Exercise 7.2.3, con-
struct the shift and reduce graphs.
7.2.5. Use Algorithm 7.3 to find shortest sequences of shift and checking state-
ments for the graphs of Exercise 7.2.4.
*7.2.6. Devise an algorithm to generate a deterministic left parser in production
language for an LL(1) grammar.
7.2.7. Using the algorithm developed in Exercise 7.2.6, construct a left parser
in production language for the following LL(1) grammar:

E → TE'
E' → +TE' | e
T → FT'
T' → *FT' | e
F → (E) | a

*7.2.8. Use the techniques of this section to improve the parser constructed in
Exercise 7.2.7.
*7.2.9. Devise an algorithm to generate a deterministic right parser in pro-
duction language for an LR(1) grammar.
7.2.10. Using the algorithm developed in Exercise 7.2.9, construct right parsers
for Go and the grammar in Exercise 7.2.7.
"7.2.11. Use the techniques of this section to improve the parsers constructed in
Exercise 7.2.10. Compare the resulting parsers with those in Examples
5.47 and 7.14 and the one in Exercise 7.2.8.
"7.2.12. Is it possible to test a production language parser to determine if it is
a valid weak precedence parser for a given grammar ?

"7.2.13. Construct an algorithm to generate a production language program that


simulates a deterministic pushdown transducer. What improvements
can be made to the resulting program ?

Research Problems
It should be evident that this section does not go very deeply into
its subject matter and that considerable further improvements can be
made in parsers implemented in production language. We therefore
suggest the following areas for further research.
7.2.14. Study the optimizations which are possible when various kinds of shift-
reduce algorithms are to be implemented in production language. In
particular, one might examine the algorithms used to parse LL(k), BRC,
extended precedence, simple mixed strategy precedence, and LR(k)
grammars.
7.2.15. Extend the parser optimizations given in this section by allowing post-
ponement of error detection, statement merger and/or other reasonable
alterations of the production language program.
7.2.16. Develop an alternative to production language for the implementation
of parsing algorithms. Your language should have the property that
each statement can be implemented by some constant number of machine
statements per character in the statement of your language. A "reason-
able" random access machine should serve as a benchmark here.

Programming Exercises
7.2.17. Design elementary operations that can be used to implement Floyd-
Evans production language statements. Construct an interpreter that
will execute these elementary operations.
7.2.18. Construct a compiler which will take a program in production language
and generate for it a sequence of elementary operations which can be
executed by the interpreter in Exercise 7.2.17.
7.2.19. Write a program that will construct production language parsers for a
useful class of context-free grammars.
7.2.20. Construct a production language parser for one of the grammars in the
Appendix of Volume I. Incorporate an error recovery routine which gets
called whenever error is announced. The error recovery routine should
adjust the stack and/or input so that normal parsing can resume.

BIBLIOGRAPHIC NOTES

Production language and variants thereof have been popular for implementing
parsers. Techniques for the generation of production language parsers have been
developed by a number of people, including Beals [1969], Beals et al. [1969],
DeRemer [1968], Earley [1966], and Haynes and Schutte [1970]. The techniques

presented in this section for generating production language parsers were origi-
nated by Cheatham [1967]. The use of computed goto's in production language
and the optimization method in Section 7.2.2 is due to Ichbiah and Morse [1970].
Some error recovery techniques for production language parsers are described in
LaFrance [1970].

7.3. TRANSFORMATIONS ON SETS OF LR(k) TABLES

We shall discuss the optimization of LR(k) parsers for the remainder of


this chapter. The reason for devoting such a large amount of space to LR(k)
parsers and their optimization is twofold. First, the LR(k) grammars are the
largest natural class of unambiguous grammars for which we can construct
deterministic parsers. Secondly, using optimizations it is possible to produce
LR parsers that are quite competitive with other types of parsers.
In Chapter 5 the parsing algorithm for LR(k) grammars was given. The
heart of this algorithm is a set of LR(k) tables which governs the behavior
of the parser. In Section 5.2.5 an algorithm was presented which could be
used to automatically construct the canonical set of LR(k) tables for an
LR(k) grammar (Algorithm 5.7).
However, as we noted, this canonical set of LR(k) tables can be imprac-
tically large for a grammar of practical interest if k ≥ 1. Nevertheless, the
LR(k) parsing algorithm using the canonical set of LR(k) tables [the
canonical LR(k) parser] has some desirable features:
(1) The parser is fast. An input string of length n can be parsed in cn
moves, where c is a small constant.
(2) The parser has good error-detecting capability. For example, suppose
that the string xa is a prefix of some sentence in the language at hand but
that xab is not a prefix of any sentence. On an input string of the form
xaby, the canonical LR(1) parser would parse x, shift a, and then announce
error. Thus, it would announce error when the input symbol b becomes
the lookahead string for the first time. In general, the canonical LR(k) parser
will announce error at the earliest possible opportunity in a left-to-right
scan of the input string.
Precedence parsers do not enjoy this early error-detecting capability.
For example, in parsing the input string xaby mentioned above, it is possible
for a precedence parser to scan arbitrarily many symbols of y before announc-
ing error. (See Exercise 7.3.5.)
LL(k) parsers share the fast speed and good error-detecting capability
of LR(k) parsers. However, not every deterministic language has an LL
grammar, and, in general, it is often possible to find a more "natural" LR
grammar to describe a programming language and its translation. For
this reason, for the remainder of this chapter we shall concentrate on tech-

niques for constructing small LR(k) parsers. Many of these techniques can
also be applied to LL(k) grammars.
In this section we shall discuss LR(k) parsers from a general point of
view. We shall say that two LR(k) parsers are equivalent if given an input
string w either they both accept w or they both announce error at the same
symbol in w. This is exactly akin to the notion of "equivalence" we encoun-
tered in Example 7.8.
In this section we shall present several transformations which can be
used to reduce the size of an LR(k) parser, while producing an equivalent
LR(k) parser. In Section 7.4 we shall present techniques which can be used
to produce directly, from certain types of LR(k) grammars, LR(k) parsers
that are considerably smaller than, but equivalent to, the canonical LR(k)
parser. In addition, the techniques discussed in this section can also be
applied to these parsers. Finally, in Section 7.5 we shall consider a more
detailed implementation of an LR parser in which common scanning actions
can be merged.

7.3.1. The General Notion of an LR(k) Table

A set of LR(k) tables forms the basis of the LR(k) parsing algorithm
(Algorithm 5.7). In general, there are many different sets of tables which
can be used to construct equivalent parsers for the same LR(k) grammar.
Consequently, we can search for a set of tables with certain desirable prop-
erties, e.g., smallness.
To understand what changes can be made to a set of LR(k) tables, let us
examine the behavior of the LR(k) parsing algorithm in detail. This algorithm
places LR(k) tables on the pushdown list. The LR(k) table on top of the push-
down list dictates the behavior of the parsing algorithm. Each table is a pair
of functions ( f , g). Recall that f, the parsing action function, given a look-
ahead string, tells us what parsing action to take. The action may be to (1)
shift the next input symbol onto the pushdown list, (2) reduce the top of
the pushdown list according to a named production, (3) announce comple-
tion of the parsing, or (4) declare that a syntactic error has been found in
the input. The second function g, the goto function, is invoked after each
shift action and each reduce action. Given a symbol of the grammar, the goto
function returns either the name of another table or an error notation.
A sample LR(1) table is shown in Fig. 7.20.

                     action                    goto

                a      b      e          S      A      a      b

     Ti:        S      3      X          T4     X      T7     X

                      Fig. 7.20  LR(1) table.

In the LR(k) parsing
algorithm, a table can influence the operation of the parser in two ways.
First, suppose that table Ti is at the top of the pushdown list. Then the
parsing action function of Ti influences events. For example, if b is the look-
ahead string, then the parser calls for a reduction using production 3. Or if
a is the lookahead string, the parser shifts the current input symbol (here, a)
onto the pushdown list, and since g(a) = T7, the name of table T7 follows a
onto the top of the pushdown list.†
The second way in which a table can influence the action of the parser
appears immediately after a reduction. Suppose that the pushdown list is
• ATlbT2ST3 and that a reduction using the production S ~ bS is called
for by T3. The parser will then remove four symbols (two grammar symbols
and two tables) from the stack, leaving ~AT1 there.l: At this point table T~
is exposed. The nonterminal S is then placed on top of the stack, and the goto
function of T1 is invoked to determine that table T4 = g(S) is to be placed
on top of S.
We shall take the viewpoint that the important times in an LR parsing
process occur when a new table has just been placed on top of the pushdown
list. We call such a table a governing table. We shall examine the charac-
teristics of an LR parser in terms of the sequence of governing tables. If a
governing table T calls for a shift, then the next governing table is determined
by the goto function of T in a straightforward manner. If T calls for a reduc-
tion, on the other hand, the next governing table is determined by the goto
function of the ith table from the top of the pushdown list, where i is the
length of the right-hand side of the reducing production. What table might
be there may seem hard to decide, but we can give an algorithm to determine
the set of possible tables.
With the viewpoint of governing tables in mind, we shall attempt to deter-
mine when two sets of LR(k) tables give rise to equivalent parsers. We shall
set performance criteria that elaborate on what "equivalent" means. First,
we shall give the definition of a set of LR(k) tables that extends the definition
given in Section 5.2.5. Included as both a possible action and a possible
goto entry will be a special symbol φ, which can be interpreted as "don't
care." It turns out, as we shall see, that many of the error entries in a set of
LR(k) tables are never exercised; that is, the LR(k) parsing algorithm will
never consult certain error entries no matter what the input is. Thus, we can

†As we mentioned in Chapter 5, it is not necessary to write the grammar symbols on


the pushdown list. However, since the action of an LR(k) parser is more evident with the
grammar symbols present, in this section we shall assume that the grammar symbols also
appear on the pushdown list.
‡Note that when the canonical LR(k) parser calls for a reduction according to pro-
duction i, the right-hand side of production i will always be a suffix of the grammar
symbols on the stack. Thus, in the parsing process it is not necessary to match the right-
hand side of the production with the grammar symbols on the stack.

change these entries in any fashion whatsoever, and the parser will still operate
in the same way.
DEFINITION

Let G be a CFG. A set of LR(k) tables for G is a pair (𝒯, T0), where 𝒯 is
a set of tables for G and T0, called the initial table, is in 𝒯. A table for G is
a pair of functions (f, g), where
(1) f is a mapping from Σ*k to the set consisting of φ, error, shift, accept,
and reduce i for all production indices i, and
(2) g maps N ∪ Σ to 𝒯 ∪ {φ, error}.
When T0 is understood, we shall refer to (𝒯, T0) simply as 𝒯. The canonical
set of LR(k) tables constructed in Section 5.2.5 is a set of LR(k) tables in
the formal sense used here. Note that φ never appears in a canonical set of
LR(k) tables.

Example 7.15
Let G be defined by the productions
(1) S → SA
(2) S → A
(3) A → aA
(4) A → b
In Fig. 7.21 is a set of LR(1) tables for G.
                     action                    goto

                a      b      e          S      A      a      b

     T1         S      S      X          X      T3     T1     T2

     T2         4      4      4

     T3         φ      φ      φ          T1     X      T2     φ

                  Fig. 7.21  Set of LR(1) tables.

We shall see that the tables of Fig. 7.21 do not in any sense parse accord-
ing to the grammar G. They merely "fit" the grammar, in that the tables
defined use only symbols in G, and the reductions called for use productions
which actually exist in G.

We can redefine the LR(k) parsing algorithm based on a set of LR(k)


tables as defined above. This algorithm is essentially the same as Algorithm
5.7 when we consider a φ entry as an error entry. For completeness we shall
restate the algorithm.
DEFINITION

Let (𝒯, T0) be a set of LR(k) tables for a CFG G = (N, Σ, P, S). A con-
figuration of the LR(k) parser for (𝒯, T0) is a triple (T0X1T1 ⋯ XmTm, w, π),

where
(1) Ti is in 𝒯, 0 ≤ i ≤ m, and T0 is the initial table;
(2) Xi is in N ∪ Σ, 1 ≤ i ≤ m;
(3) w is in Σ*; and
(4) π is an output string of production numbers.
The first component of a configuration represents the contents of the
pushdown list, the second the unused input, and the third the parse found
so far.
An initial configuration of the parser is one of the form (T0, w, e) for
some w in Σ*. As before, we shall express a move of the parsing algorithm
by the relation ⊢ on configurations defined as follows.
Suppose that the parser is in configuration (T0X1T1 ⋯ XmTm, w, π), where
Tm = (f, g) is a table in 𝒯. Let w be in Σ* and let u = FIRSTk(w). That is, w is
the string remaining on the input, and u is the lookahead string.
(1) If f(u) = shift and w = aw', where a ∈ Σ, then

    (T0X1T1 ⋯ XmTm, aw', π) ⊢ (T0X1T1 ⋯ XmTm aT, w', π)

where T = g(a). Here, we make a shift move in which the next input symbol,
a, is shifted onto the pushdown list and then the table g(a) is placed on top
of the pushdown list.
(2) Suppose that f(u) = reduce i and that production i is A → γ, where
|γ| = r. Suppose further that r ≤ m and that Tm−r = (f', g'). Tm−r is the
table that is exposed when the string Xm−r+1Tm−r+1 ⋯ XmTm is removed
from the pushdown list. Then we say that

    (T0X1T1 ⋯ XmTm, w, π) ⊢ (T0X1T1 ⋯ Xm−rTm−r AT, w, πi)

where T = g'(A). Here a reduction according to the production A → γ is
made. The index of this production is appended to the output string, and
a string of length 2|γ| is removed from the top of the pushdown list and
replaced by AT, where T = g'(A) and g' is the goto function of the table
immediately below the last table removed. Note that in the reduction the
symbols removed from the pushdown list are not examined. Thus, it may
be possible to make a reduction such that the grammar symbols removed
do not correspond to the right-hand side of the production governing the
reduction.†
(3) If f(u) = φ, error, or accept, there is no next configuration C such
that (T0X1T1 ⋯ XmTm, w, π) ⊢ C.
There is also no next configuration if in rule (1) w = e (then there is no
next input symbol) or g(a) is not a table name, or in rule (2) r > m (then

†This will never happen if the canonical set of tables is used. In general, this situation
is undesirable and should only be permitted if we are sure an error will be declared shortly.

there are not enough symbols on the pushdown list) or g'(A) is not a table
name.
We call (T0X1T1 ⋯ XmTm, w, π) an error indication if there is no next
configuration. However, as an exception, we call (T0ST1, e, π) an accepting
configuration if T1 = (f, g) and f(e) = accept.
We define ⊢i, ⊢+, and ⊢* in the usual manner. We say that configuration
C is accessible if C0 ⊢* C for some initial configuration C0. We shall now
summarize the parsing algorithm.
ALGORITHM 7.5
LR(k) parsing algorithm.
Input. A CFG G = (N, Σ, P, S), a set (𝒯, T0) of LR(k) tables for G, and
an input string w in Σ*.
Output. A sequence of productions π or an error indication.
Method.
(1) Construct the initial configuration (T0, w, e).
(2) Let C be the latest configuration constructed. Construct the next
configuration C' such that C ⊢ C' and then repeat step (2). If there is no
next configuration C', go to step (3).
(3) Let C = (α, x, π) be the last configuration constructed. If C is an
accepting configuration, then emit π and halt. Otherwise, indicate error.

It should be evident that this algorithm can be implemented by a deter-


ministic pushdown transducer with a right endmarker.
If Algorithm 7.5 reaches an accepting configuration, then the output
string π is called a parse for the input string w. We say that π is valid if π is
a right parse for w according to the grammar G. Likewise, we say that a set
of LR(k) tables is valid for a grammar G if and only if Algorithm 7.5 produces
a valid parse for each sentence in L(G) and does not produce a parse for any
w not in L(G).
By Theorem 5.12 we know that the canonical set of LR(k) tables for an
LR(k) grammar G is valid for G. However, an arbitrary set of LR(k) tables
for a grammar G obviously need not be valid for G.
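A minimal sketch of Algorithm 7.5 for k = 1 follows; representing each table as a pair of dictionaries and each production as a (left side, right-side length) pair is an assumption made only for this illustration.

    # Sketch of the LR(k) parsing loop of Algorithm 7.5, for k = 1.
    # tables[name] = (action, goto): action maps a lookahead to "shift",
    # "accept", "error", "phi", or ("reduce", i); goto maps a grammar symbol
    # to a table name, "phi", or "error".  prods[i] = (A, r).

    def lr_parse(tables, T0, prods, w):
        stack = [T0]                       # holds T0 X1 T1 ... Xm Tm
        pos, output = 0, []
        while True:
            action, goto = tables[stack[-1]]
            u = w[pos] if pos < len(w) else "e"      # lookahead ("e" = end of input)
            a = action.get(u, "error")
            if a == "shift" and pos < len(w):
                T = goto.get(u)
                if T not in tables:
                    return None                       # error indication
                stack.extend([u, T]); pos += 1
            elif isinstance(a, tuple) and a[0] == "reduce":
                A, r = prods[a[1]]
                if 2 * r > len(stack) - 1:
                    return None                       # too few symbols on the list
                del stack[len(stack) - 2 * r:]
                T = tables[stack[-1]][1].get(A)       # goto of the exposed table on A
                if T not in tables:
                    return None
                stack.extend([A, T]); output.append(a[1])
            elif a == "accept" and pos == len(w) and len(stack) == 3:
                return output                         # accepting configuration
            else:
                return None                           # phi, error, or misapplied accept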

Example 7.16
Let us trace through the sequence of moves made by Algorithm 7.5 on
input ab using the LR(1) tables of Fig. 7.21 (p. 582) with T1 as the initial
table.
The initial configuration is (T1, ab, e). The action of T1 on lookahead a
is shift and the goto of T1 on a is T1, so

    (T1, ab, e) ⊢ (T1aT1, b, e)



The action of T1 on b is also shift but the goto is T2, so

    (T1aT1, b, e) ⊢ (T1aT1bT2, e, e)

The action of T2 on e is reduce 4; production 4 is A → b. Thus, the topmost
table and grammar symbol are removed from T1aT1bT2, leaving T1aT1.
The goto of T1 on A is T3, so

    (T1aT1bT2, e, e) ⊢ (T1aT1AT3, e, 4)

The action of T3 on lookahead e is φ. That is, no next configuration can be
constructed. Since (T1aT1AT3, e, 4) is not an accepting configuration, we have
an indication of error.
Since ab is in L(G), this set of tables is obviously not valid for the grammar
in Example 7.15.  □

7.3.2. Equivalence of Table Sets

We can now describe what it means for two sets of LR(k) tables to be
equivalent. The weakest equivalence we might be interested in would require
that, using Algorithm 7.5, the two sets of tables produce the same parse
for those sentences in the language L(G) and that one would not parse a sen-
tence not parsed by the other. The error condition might be detected at
different times by the two sets of tables.
The strongest equivalence we might consider would be one which required
that the two sets of tables produce identical sequences of parsing actions.
That is, suppose that (T0, w, e) and (T0', w, e) are initial configurations for
two sets of tables 𝒯 and 𝒯'. Then for any i ≥ 0

    (T0, w, e) ⊢i (T0X1T1 ⋯ XmTm, x, π)

using tables in 𝒯 if and only if

    (T0', w, e) ⊢i (T0'X1T1' ⋯ XnTn', x', π')

using tables in 𝒯', where m = n, x = x', π = π', and Xi = Xi', for 1 ≤ i ≤ m.


Each of these definitions allows us to develop techniques by which sets
of tables can be modified while these equivalences are preserved. Here we
shall consider an equivalence that is intermediate in stringency. We require,
of course, that Algorithm 7.5 using one set of tables finds a parse for an input
string w if and only if it finds the same parse for w using the other set of
tables. Moreover, as in the strongest kind of equivalence, we further require
that Algorithm 7.5 trace through the same sequence of parsing actions on
each input string whenever both sets of tables specify parsing actions. How-
ever, we shall allow one set of tables to continue making reductions, even

though the other set has stopped parsing. We have the following motivation
for this definition.
When an error occurs, we wish to detect it using either set of tables, and
we want the position of the error to be as apparent as possible. It will not
do for one set of tables to detect an error while the other shifts a large number
of input symbols before the error is detected. The reason for this requirement
is that one would in practice like to discover an error as close to where it
occurred as possible, for the purpose of producing intelligent and intelligible
diagnostics.
In practice, on encountering an error, one would transfer to an error
recovery routine which would modify the remaining input string and/or
the contents of the pushdown list so that the parser could proceed to parse the
rest of the input and detect as many errors as possible in one pass over
the input. It would be unfortunate if large portions of the input had been
processed in a meaningless way before the error was detected. We are thus
motivated to make the following definition of equivalence on sets of LR
tables. It is a special case of the informal notion of equivalence discussed
in Section 7.1.
DEFINITION

Let (𝒯, T0) and (𝒯', T0') be two sets of LR(k) tables for a context-free
grammar G = (N, Σ, P, S).
Let w be an input string in Σ*, C0 = (T0, w, e), and C0' = (T0', w, e).
Let C0 ⊢ C1 ⊢ C2 ⊢ ⋯ and C0' ⊢ C1' ⊢ C2' ⊢ ⋯ be the respective
sequences of configurations constructed by Algorithm 7.5. We say that
(𝒯, T0) and (𝒯', T0') are equivalent if the following four conditions hold, for
all i ≥ 0 and for arbitrary w.
(1) If Ci and Ci' both exist, then we can express these configurations
as Ci = (T0X1T1 ⋯ XmTm, x, π) and Ci' = (T0'X1T1' ⋯ XmTm', x, π), that
is, as long as both sequences of configurations exist, they are identical except
for table names.
(2) Ci is an accepting configuration if and only if Ci' is an accepting
configuration.
(3) If Ci is defined but Ci' is not, then the second components of Ci−1
and Ci are the same.
(4) If Ci' is defined but Ci is not, then the second components of Ci−1'
and Ci' are the same.

What conditions (3) and (4) are saying is that once one of the sets of tables
has detected an error, the other must not consume any more input, that is,
not call for any shift actions. However, conditions (3) and (4) allow one set
of tables to call for one or more reduce actions while the other set has halted
with a don't care or error action.
Notice that neither set of tables has to be valid for G. However, if two

sets of tables are equivalent and one is valid for G, then the other must also
be valid for G.

Example 7.17
Consider the LR(1) grammar G with the productions
(1) S → aSb
(2) S → ab
(𝒯, T0), the canonical set of LR(1) tables for G, is shown in Fig. 7.22. Figure
7.23 shows another set of LR(1) tables, (𝒰, U0), for G. Let us consider the

                     action                  goto

                a      b      e         S      a      b

     T0         S      X      X         T1     T2     X

     T1         X      X      A         X      X      X

     T2         S      S      X         T3     T4     T5

     T3         X      S      X         X      X      T6

     T4         S      S      X         T7     T4     T8

     T5         X      X      2         X      X      X

     T6         X      X      1         X      X      X

     T7         X      S      X         X      X      T9

     T8         X      2      X         X      X      X

     T9         X      1      X         X      X      X

                     Fig. 7.22  (𝒯, T0)

behavior of the LR(1) parsing algorithm using 𝒯 and 𝒰 on the input string
abb. Using 𝒯, the parsing algorithm would make the following sequence
of moves:

    (T0, abb, e) ⊢ (T0aT2, bb, e)
                 ⊢ (T0aT2bT5, b, e)

The last configuration is an error indication. Using the set of tables 𝒰, the
parsing algorithm would proceed as follows:

    (U0, abb, e) ⊢ (U0aU2, bb, e)
                 ⊢ (U0aU2bU4, b, e)
                 ⊢ (U0SU1, b, 2)

                     action                  goto

                a      b      e         S      a      b

     U0         S      X      X         U1     U2     φ

     U1         φ      X      A         φ      φ      φ

     U2         S      S      X         U3     U2     U4

     U3         φ      S      X         φ      φ      U5

     U4         X      2      2         φ      φ      φ

     U5         X      1      1         φ      φ      φ

                     Fig. 7.23  (𝒰, U0)

The last configuration is an error configuration. Note that the canonical


parser announces error as soon as it sees the second b in the input string
for the first time. But the parser using the set of tables 𝒰 reduced ab to S
before announcing error. However, the second b is not shifted onto the push-
down list, so the equivalence condition has not been violated.
It is not too difficult to show that 𝒯 and 𝒰 are indeed equivalent. There is
an algorithm to determine whether an arbitrary set of LR(k) tables for
an LR(k) grammar is equivalent to the canonical set of LR(k) tables for
that grammar. We leave the algorithm as an exercise.

7.3.3. φ-Inaccessible Sets of Tables

Many of the error entries in a canonical set of LR(k) tables are never
used by the LR(k) parsing algorithm. Such error entries can be replaced by
φ's, which are truly don't cares in the sense that these entries never influence
the computation of the next configuration for any accessible configuration.
We show that all error symbols in the goto field of a canonical set of tables
can be replaced by φ's and that if a given table can become the governing
table only immediately after a reduction, then the same replacements can
be made in the action field.
DEFINITION
Let (𝒯, T0) be a set of LR(k) tables, k ≥ 1, and let

    C = (T0X1T1X2T2 ⋯ XmTm, w, π)

be any accessible configuration. Let Tm = (f, g) and u = FIRSTk(w). We say
that 𝒯 is free of accessible φ entries, or φ-inaccessible for short, if the follow-
ing statements are true for arbitrary C:
(1) f(u) ≠ φ.
(2) If f(u) = shift, then g(a) ≠ φ, where a is the first symbol of u.

(3) If f(u) = reduce i, production i is A → Y1 ⋯ Yr, r > 0, and Tm−r
is (f', g'), then g'(A) ≠ φ.
Informally, a set of LR(k) tables is φ-inaccessible if whatever φ entries
appear in the tables are never referred to by Algorithm 7.5 during the parsing
of any input string. We shall now give an algorithm which replaces as many
error entries as possible by φ in the canonical set of LR(k) tables while keep-
ing the resulting set of tables φ-inaccessible.
Thus, we can identify the error entries in a canonical set of LR(k) tables
which are never consulted by the LR(k) parsing algorithm by using this
algorithm to change the unused error entries to φ's.
ALGORITHM 7.6

Construction of a φ-inaccessible set of LR(k) tables with as many φ's
as possible.
Input. An LR(k) grammar G = (N, Σ, P, S), with k ≥ 1, and (𝒯, T0),
the canonical set of LR(k) tables for G.
Output. (𝒯', T0'), an equivalent set of LR(k) tables with all unused error
entries replaced by φ.
Method.
(1) For each T = (f, g) in 𝒯, construct a new table (f, g'), where

    g'(X) = g(X)    if g(X) ≠ error
          = φ       otherwise

Let (𝒯1, T01) be the set of tables so constructed.
(2) The set of tables (𝒯', T0') is then constructed as follows:
(a) T01 is in 𝒯'.
(b) For each table T = (f, g) in 𝒯1 − {T01} we add to 𝒯' the table
T' = (f', g), where f' is defined as follows:
(i) f'(u) = f(u) whenever f(u) ≠ error.
(ii) If f(vb) = error for some v in Σk−1 and b in Σ and for some
a ∈ Σ there is a table (f1, g1) in 𝒯1 such that f1(av) = shift
and g1(a) = T, then f'(vb) = error.
(iii) If f(u) = error for some u in Σ*(k−1) and for some a ∈ Σ
there is a table (f1, g1) in 𝒯1 such that f1(au) = shift and
g1(a) = T, then f'(u) = error.
(iv) Otherwise f'(u) = φ.  □

In step (1) of Algorithm 7.6 all error entries in the goto functions are
changed to φ entries, because when k ≥ 1, a canonical LR(k) parser will
always detect an error immediately after a shift move. Hence, an error entry
in the goto field will never be exercised.
Step (2) of Algorithm 7.6 replaces error by φ in the action field of table

T if there is no table (f1, g1) and lookahead string au such that f1(au) =
shift and g1(a) = T. Under these circumstances table T can appear on top
of the stack only after a reduction. However, if an error has occurred, the
canonical LR(1) parser would have announced error before making the reduc-
tion. Thus, all error entries in tables such as T will never be consulted and
can therefore be treated as don't cares.
We should reiterate that Algorithm 7.6 as stated works only for sets of
LR(k) tables where k ≥ 1. For the LR(0) case, the lookahead string will
always be the empty string. Therefore, all errors must be caught by the goto
entries. Thus, in a set of LR(0) tables not all error entries in the goto field
are don't cares. We leave it for the Exercises to deduce which of these entries
are don't cares.
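For k = 1 the algorithm reduces to a simple marking rule; a rough sketch under the table representation assumed in the earlier parsing sketch (again an illustration, not the algorithm's own notation):

    # Sketch of Algorithm 7.6 for k = 1.  Goto "error" entries always become
    # "phi"; a table's action "error" entries become "phi" unless some table
    # can shift onto it (or it is the initial table).

    def phi_inaccessible(tables, T0):
        reachable_by_shift = set()
        for name, (action, goto) in tables.items():
            for a, act in action.items():
                if act == "shift" and goto.get(a) in tables:
                    reachable_by_shift.add(goto[a])

        new_tables = {}
        for name, (action, goto) in tables.items():
            g = {X: (v if v != "error" else "phi") for X, v in goto.items()}
            if name == T0 or name in reachable_by_shift:
                f = dict(action)                      # keep error entries intact
            else:
                f = {u: (v if v != "error" else "phi") for u, v in action.items()}
            new_tables[name] = (f, g)
        return new_tables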

Example 7.18
Let G be the LR(1) grammar with productions
(1) S → SaSb
(2) S → e
The canonical set of LR(1) tables for G is shown in Fig. 7.24(a), and the tables
after application of Algorithm 7.6 are shown in Fig. 7.24(b).
Note that in Fig. 7.24(b) all error's in the right-hand portions of the
tables (goto fields) have been replaced by φ's. In the action fields, T0 has been
left intact by rule (2a) of Algorithm 7.6. The only shift actions occur in tables
T1, T3, and T6; these result in T2, T4, T5, or T7 becoming the governing table.
Thus, error entries in the action fields of these tables are left intact. We have
changed error to φ elsewhere.  □

THEOREM 7.5
The set of tables 𝒯' constructed by Algorithm 7.6 is φ-inaccessible and
equivalent to the canonical set 𝒯.
Proof. The equivalence of 𝒯 and 𝒯' is immediate, since in Algorithm 7.6
the only alterations of 𝒯 are to replace error's by φ's, and the LR(k) parsing
algorithm does not distinguish between error and φ in any way.† We shall
now show that if a φ entry is encountered by Algorithm 7.5 using the set of
tables 𝒯', then 𝒯' was not properly constructed from 𝒯.
Suppose that 𝒯' is not φ-inaccessible. Then there must be some smallest
i such that C0 ⊢i C, where C0 is an initial configuration and in configuration
C Algorithm 7.5 consults a φ entry of 𝒯'. Since T0 is not altered in step (2a),
we must have i > 0. Let C = (T0X1T1 ⋯ XmTm, w, π), where Tm = (f, g)
and FIRSTk(w) = u. There are three ways in which a φ entry might be
encountered.
†The purpose of the φ's is only to mark entries which can be changed.

                     action                  goto

                a      b      e         S      a      b

     T0         2      X      2         T1     X      X
     T1         S      X      A         X      T2     X
     T2         2      2      X         T3     X      X
     T3         S      S      X         X      T4     T5
     T4         2      2      X         T6     X      X
     T5         1      X      1         X      X      X
     T6         S      S      X         X      T4     T7
     T7         1      1      X         X      X      X

              (a) Canonical Set of LR(1) Tables

                     action                  goto

                a      b      e         S      a      b

     T0         2      X      2         T1     φ      φ
     T1         S      φ      A         φ      T2     φ
     T2         2      2      X         T3     φ      φ
     T3         S      S      φ         φ      T4     T5
     T4         2      2      X         T6     φ      φ
     T5         1      X      1         φ      φ      φ
     T6         S      S      φ         φ      T4     T7
     T7         1      1      X         φ      φ      φ

              (b) φ-Inaccessible Set of Tables

     Fig. 7.24  A set of LR(1) tables before and after application of Algorithm 7.6.

Case 1: Suppose that f(u) = φ. Then by steps (2bii) and (2biii) of Algorithm
7.6, the previous move of the parser could not have been shift and must have
been reduce. Thus, C0 ⊢i−1 C' ⊢ C, where

    C' = (T0X1T1 ⋯ Xm−1Tm−1 Y1U1 ⋯ YrUr, x, π'),

and the reduction was by production Xm → Y1 ⋯ Yr. (π is π' followed by
the number of this production.)
Let us consider the set of items from which the table Ur is constructed.
(If r = 0, read Tm−1 for Ur.) This set of items must include the item

[Xm → Y1 ⋯ Yr·, u]. Recalling the definition of a valid item, there is
some y ∈ Σ* such that the string X1 ⋯ Xm u y is a right-sentential form.
Suppose that the nonterminal Xm is introduced by production A → αXmβ.
That is, in the augmented grammar we have the derivation

    S' ⇒*rm γAx ⇒rm γαXmβx

where γα = X1 ⋯ Xm−1. Since u is in FIRST(βx), item [A → αXm·β, v]
must be valid for X1 ⋯ Xm if v = FIRST(x).
We may conclude that the parsing action of table Tm on u in the canonical
set of tables was not error and thus could not have been changed to φ by
Algorithm 7.6, contradicting what we had supposed.
Algorithm 7.6, contradicting what we had supposed.
Case 2: Suppose that f(u) = shift and that a is the first symbol of u
but that g(a) = φ. Since f(u) = shift, in the set of items associated with
table Tm there is an item of the form [A → α·aβ, v] such that u is in
EFF(aβv) and [A → α·aβ, v] is valid for the viable prefix X1 ⋯ Xm.
(See Exercise 7.3.8.) But it then follows that [A → αa·β, v] is valid for
X1 ⋯ Xm a and that X1 ⋯ Xm a is also a viable prefix. Hence, the set
of valid items for X1 ⋯ Xm a is nonempty, and g(a) should not be φ as
supposed.
Case 3: Suppose that f(u) = reduce p, where production p is A → Xr ⋯ Xm,
Tr−1 is (f', g'), and g'(A) = φ. Then the item [A → Xr ⋯ Xm·, u] is valid
for X1 ⋯ Xm, and the item [A → ·Xr ⋯ Xm, u] is valid for X1 ⋯ Xr−1.
In a manner similar to Case 1 we claim that there is an item [B → αA·β, v]
which is valid for X1 ⋯ Xr−1 A, and thus g'(A) should not be φ.  □

We can also show that Algorithm 7.6 changes as many error entries in
the canonical set of tables to φ as possible. Thus, if we change any error
entry in 𝒯' to φ, the resulting set of tables will no longer be φ-inaccessible.

7.3.4. Table Mergers by Compatible Partitions

In this section we shall present an important technique which can be used


to reduce the size of a set of LR(k) tables. This technique is to merge two
tables into one whenever this can be done without altering the behavior of
the LR(k) parsing algorithm using this set of tables. Let us take a to-inacces-
sible set of tables, and let T1 and Tz be two tables i n t h i s set. Suppose that
whenever the action or goto entries of Ti and T2 disagree, one of them is ~0.
Then we say that Ti and T2 are compatible, and we can merge T, and 7"2,
treating them as one table.
In fact, we can do more. Suppose that T~ and 7'2 were almost compatible
but disagreed in some goto entry by having 7'3 and 7'4 there. If 7'3 and T4
were themselves compatible, then we could simultaneously merge T3 with

T4 and T1 with T2. And if T3 and T4 missed being compatible only because
they had T1 and T2 in corresponding goto entries, we could still do the
merger.
We shall describe this merger algorithm by defining a compatible parti-
tion on a set of tables. We shall then show that all members of each block
in the compatible partition may be simultaneously merged into a single table.
DEFINITION

Let (𝒯, T0) be a set of φ-inaccessible LR(k) tables and let Π = {S1, ..., Sp}
be a partition on 𝒯. That is, S1 ∪ S2 ∪ ⋯ ∪ Sp = 𝒯, and for all i ≠ j, Si
and Sj are disjoint. We say that Π is a compatible partition of width p if for
all blocks Si, 1 ≤ i ≤ p, whenever (f1, g1) and (f2, g2) are in Si, it follows
that
(1) f1(u) ≠ f2(u) implies that at least one of f1(u) and f2(u) is φ and that
(2) g1(X) ≠ g2(X) implies that either
(a) at least one of g1(X) and g2(X) is φ, or
(b) g1(X) and g2(X) are in the same block of Π.
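A rough sketch of the pairwise test underlying this definition, for tables represented as in the earlier sketches (the refinement procedure around it is not shown; the block assignment is an assumption of the illustration):

    # Test whether two tables may lie in the same block of a compatible
    # partition, given a tentative assignment block[T] of tables to blocks.

    def compatible_in(T1, T2, tables, block):
        (f1, g1), (f2, g2) = tables[T1], tables[T2]
        for u in set(f1) | set(f2):
            a, b = f1.get(u, "phi"), f2.get(u, "phi")
            if a != b and "phi" not in (a, b):
                return False                      # conflicting parsing actions
        for X in set(g1) | set(g2):
            a, b = g1.get(X, "phi"), g2.get(X, "phi")
            if a != b and "phi" not in (a, b) and block.get(a) != block.get(b):
                return False                      # gotos land in different blocks
        return True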
We can find compatible partitions of a set of LR(k) tables using tech-
niques reminiscent of those used to find indistinguishable states of an incom-
pletely specified finite automaton. Our goal is to find compatible partitions
of least width. The following algorithm shows how we can use a compatible
partition of width p on a set of LR(k) tables to find an equivalent set con-
taining p tables.
ALGORITHM 7.7
Merger by compatible partitions.
Input. A φ-inaccessible set (𝒯, T0) of LR(k) tables and a compatible
partition Π = {S1, ..., Sp} on 𝒯.
Output. An equivalent φ-inaccessible set (𝒯', T0') of LR(k) tables such
that #𝒯' = p.
Method.
(1) For all i, 1 ≤ i ≤ p, construct the table Ui = (f, g) from the block
Si of Π as follows:
(a) Suppose that (f', g') is in Si and that for lookahead string u,
f'(u) ≠ φ. Then let f(u) = f'(u). If there is no such table in Si,
set f(u) = φ.
(b) Suppose that (f', g') is in Si and that g'(X) is in block Sj. Then
set g(X) = Uj. If there is no table in Si with g'(X) in 𝒯, set
g(X) = φ.
(2) T0' is the table constructed from the block containing T0.  □

The definition of compatible partition ensures us that the construction


given in Algorithm 7.7 will be consistent.
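Under the same assumed representation, the merge itself is mechanical; the following sketch is an illustration only.

    # Sketch of Algorithm 7.7: collapse each block of a compatible partition
    # into one table.  blocks is a list of lists of table names; block_of
    # maps a table name to its block index.

    def merge_by_partition(tables, blocks, block_of, T0):
        merged = {}
        for i, names in enumerate(blocks):
            f, g = {}, {}
            for name in names:
                action, goto = tables[name]
                for u, act in action.items():
                    if act != "phi":
                        f[u] = act                     # rule (1a): a non-phi action wins
                for X, tgt in goto.items():
                    if tgt in block_of:
                        g[X] = "U%d" % block_of[tgt]   # rule (1b): goto the block's table
            merged["U%d" % i] = (f, g)
        return merged, "U%d" % block_of[T0]            # rule (2): new initial table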

Example 7.19
Consider G0, our usual grammar for arithmetic expressions:
(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → (E)
(6) F → a

The φ-inaccessible set of LR(1) tables for G0 is given in Fig. 7.25.


We observe that T3 and T10 are compatible and that T14 and T20 are com-
patible. Thus, we can construct a compatible partition with {T3, T10} and
{T14, T20} as blocks and all other tables in blocks by themselves. If we replace
{T3, T10} by U1 and {T14, T20} by U2, the resulting set of tables is as shown
in Fig. 7.26. Note that goto entries T3, T10, T14, and T20 have been changed
to U1 or U2 as appropriate.
The compatible partition above is the best we can find. For example,
T16 and T17 are almost compatible, and we would group them in one block
of a partition if we could also group T10 and T20 in a block of the same parti-
tion. But T10 and T20 disagree irreconcilably in the actions for +, *, and ).
□
We shall now prove that Algorithm 7.7 produces as output an equivalent
φ-inaccessible set of LR(k) tables.
THEOREM 7.6
Let 𝒯' be the set of LR(k) tables constructed from 𝒯 using Algorithm 7.7.
Then 𝒯 and 𝒯' are equivalent and φ-inaccessible.
Proof. Let T' in 𝒯' be the table constructed from the block of the com-
patible partition that contains the table T in 𝒯. Let C0 = (T0, w, e) and
C0' = (T0', w, e) be initial configurations of the LR(k) parser using 𝒯 and 𝒯',
respectively. We shall show that

(7.3.1)    C0 ⊢i (T0X1T1 ⋯ XmTm, x, π) using 𝒯
           if and only if C0' ⊢i (T0'X1T1' ⋯ XmTm', x, π) using 𝒯'

That is, the only difference between the LR(k) parser using 𝒯 and 𝒯' is that
in using 𝒯' the parser replaces table T in 𝒯 by the representative of the block
of T in the partition Π.
We shall prove statement (7.3.1) by induction on i. Let us consider the
"only if" portion. The basis, i = 0, is trivial. For the inductive step, assume
that statement (7.3.1) is true for i. Now consider the i + 1st move. Since
𝒯 is φ-inaccessible, the actions of Tm and Tm' on FIRSTk(x) are the same.

                  action                                    goto

        a   +   *   (   )   e          E    T    F    a    +    *    (    )

T0      S   X   X   S   X   X          T1   T2   T3   T4   φ    φ    T5   φ
T1      φ   S   φ   φ   φ   A          φ    φ    φ    φ    T6   φ    φ    φ
T2      φ   2   S   φ   φ   2          φ    φ    φ    φ    φ    T7   φ    φ
T3      φ   4   4   φ   φ   4          φ    φ    φ    φ    φ    φ    φ    φ
T4      X   6   6   X   X   6          φ    φ    φ    φ    φ    φ    φ    φ
T5      S   X   X   S   X   X          T8   T9   T10  T11  φ    φ    T12  φ
T6      S   X   X   S   X   X          φ    T13  T3   T4   φ    φ    T5   φ
T7      S   X   X   S   X   X          φ    φ    T14  T4   φ    φ    T5   φ
T8      φ   S   φ   φ   S   φ          φ    φ    φ    φ    T16  φ    φ    T15
T9      φ   2   S   φ   2   φ          φ    φ    φ    φ    φ    T17  φ    φ
T10     φ   4   4   φ   4   φ          φ    φ    φ    φ    φ    φ    φ    φ
T11     X   6   6   X   6   X          φ    φ    φ    φ    φ    φ    φ    φ
T12     S   X   X   S   X   X          T18  T9   T10  T11  φ    φ    T12  φ
T13     φ   1   S   φ   φ   1          φ    φ    φ    φ    φ    T7   φ    φ
T14     φ   3   3   φ   φ   3          φ    φ    φ    φ    φ    φ    φ    φ
T15     X   5   5   X   X   5          φ    φ    φ    φ    φ    φ    φ    φ
T16     S   X   X   S   X   X          φ    T19  T10  T11  φ    φ    T12  φ
T17     S   X   X   S   X   X          φ    φ    T20  T11  φ    φ    T12  φ
T18     φ   S   φ   φ   S   φ          φ    φ    φ    φ    T16  φ    φ    T21
T19     φ   1   S   φ   1   φ          φ    φ    φ    φ    φ    T17  φ    φ
T20     φ   3   3   φ   3   φ          φ    φ    φ    φ    φ    φ    φ    φ
T21     X   5   5   X   5   X          φ    φ    φ    φ    φ    φ    φ    φ

                 Fig. 7.25  φ-inaccessible tables for G0.

Suppose that the action is shift, that a is the first symbol of x, and that the
goto entry of Tm on a is T. By Algorithm 7.7 and the definition of compatible
partition, the goto of Tm' on a is T' if and only if T' is the representative of
the block of Π containing T.

                  action                                    goto

        a   +   *   (   )   e          E    T    F    a    +    *    (    )

T0      S   X   X   S   X   X          T1   T2   U1   T4   φ    φ    T5   φ
T1      φ   S   φ   φ   φ   A          φ    φ    φ    φ    T6   φ    φ    φ
T2      φ   2   S   φ   φ   2          φ    φ    φ    φ    φ    T7   φ    φ
T4      X   6   6   X   X   6          φ    φ    φ    φ    φ    φ    φ    φ
T5      S   X   X   S   X   X          T8   T9   U1   T11  φ    φ    T12  φ
T6      S   X   X   S   X   X          φ    T13  U1   T4   φ    φ    T5   φ
T7      S   X   X   S   X   X          φ    φ    U2   T4   φ    φ    T5   φ
T8      φ   S   φ   φ   S   φ          φ    φ    φ    φ    T16  φ    φ    T15
T9      φ   2   S   φ   2   φ          φ    φ    φ    φ    φ    T17  φ    φ
T11     X   6   6   X   6   X          φ    φ    φ    φ    φ    φ    φ    φ
T12     S   X   X   S   X   X          T18  T9   U1   T11  φ    φ    T12  φ
T13     φ   1   S   φ   φ   1          φ    φ    φ    φ    φ    T7   φ    φ
T15     X   5   5   X   X   5          φ    φ    φ    φ    φ    φ    φ    φ
T16     S   X   X   S   X   X          φ    T19  U1   T11  φ    φ    T12  φ
T17     S   X   X   S   X   X          φ    φ    U2   T11  φ    φ    T12  φ
T18     φ   S   φ   φ   S   φ          φ    φ    φ    φ    T16  φ    φ    T21
T19     φ   1   S   φ   1   φ          φ    φ    φ    φ    φ    T17  φ    φ
T21     X   5   5   X   5   X          φ    φ    φ    φ    φ    φ    φ    φ
U1      φ   4   4   φ   4   4          φ    φ    φ    φ    φ    φ    φ    φ
U2      φ   3   3   φ   3   3          φ    φ    φ    φ    φ    φ    φ    φ

                 Fig. 7.26  Merged tables.

If the action is reduce by some production with r symbols on the right,
then comparison of Tm−r and Tm−r' yields the inductive hypothesis for i + 1.
For the "if" portion of (7.3.1), we have only to observe that if Tm' has
a non-φ action on FIRSTk(x), then Tm' must agree with Tm, since 𝒯 is φ-
inaccessible.  □

We observe that Algorithm 7.7 preserves equivalence in the strongest


sense. The two sets of tables involved always compute next configurations
for the same number of steps regardless of whether the input has a parse.

7.3.5. Postponement of Error Checking

Our basic technique in reducing the size of a set of LR(k) tables is to


merge tables wherever possible. However, two tables can be merged into
one only if they are compatible. In this section we shall discuss a technique
which can be used to change essential error entries in certain tables to reduce
entries with the hope of increasing the number of compatible pairs of tables
in a set of LR(k) tables.
As an example let us consider tables T4 and T11 in Fig. 7.25 (p. 595) whose
action fields are shown below:

                            action

                    a    +    *    (    )    e

        T4:         X    6    6    X    X    6

        T11:        X    6    6    X    6    X

If we changed the action of T4 on lookahead ) from error to reduce 6 and
the action of T11 on e from error to reduce 6, then T4 and T11 would be com-
patible and could be merged into a single table. However, before we make
these changes, we would like to be sure that the error detected by T4 on )
and the error detected by T11 on e will be detected on a subsequent move
before a shift move is made. In this section we shall derive conditions under
which we can postpone error detection in time without affecting the position
in the input string at which an LR(k) parser announces error. In particular,
it is easy to show that any such change is permissible in the canonical set
of tables.
Suppose that an LR(k) parser is in configuration (T0X1T1 ⋯ XmTm, w, π)
and that table Tm has action error on lookahead string u = FIRSTk(w).
Now, suppose that we change this error entry in Tm to reduce p, where p is
production A → Y1 ⋯ Yr. There are two ways in which this error could
be subsequently detected.
In the reduction process 2r symbols are removed from the pushdown
list and table Tm−r is exposed. If the goto of Tm−r on A is φ, then error would
be announced. We could change this φ to error. However, the goto of Tm−r
on A could be some table T. If the action of T on lookahead u is error or φ
(which we can change to error to preserve φ-inaccessibility), then we would
catch the error at this point. We would also maintain error detection if
the action of T on u was reduce p' and the process above was repeated. In
short, we do not want any of the tables that become governing tables after
the reduction by production p to call for a shift on lookahead u (or for
acceptance).
Note that in order to change error entries to reduce entries, the full
598 TECHNIQUESFOR PARSER OPTIMIZATION CHAP. 7

generality of the definition of equivalence of sets of LR(k) tables is needed


here. While parsing, the new set of tables may compute next configurations
several steps farther than the old set, but no input symbols will be shifted.
To describe the conditions under which this alteration can take place,
we need to know, for each table appearing on top of the pushdown list
during the parse, what tables can appear as the r -+- 1st table from the top
of the pushdown list. We begin by defining three functions on tables and
strings of grammar symbols.
DEFINITION

Let (3, To) be a set of LR(k) tables for grammar G = (N, X, P, S). We
extend the GOTO function of Section 5.2.3 to tables and strings of grammar
symbols. GOTO maps 5 × (N U X)* to 3 as follows:
(1) GOTO(T, e) = T for all T in 3.
(2) If T = ( f , g>, GOTO(T, X) --- g ( X ) for all X in N U E and T in 5.
(3) GOTO(T, eX) -- GOTO(GOTO(T, e), X) for all 0~ in (N W E)* and
T i n :3.
We say table that T in (5, To) has height r if GOTO(T0, e) = T implies
that [a[ > r.
We shall also have occasion to use GOTO-~, the "inverse" of the GOTO
function. GOTO -~ maps 5 x (N u Z)* to the subsets of 5. We define
GOTO-~(T, t~) = IT' [GOTO(T', 00 = T}.
Finally, we define a function NEXT(T, p), where T is in 3 and p is produc-
tion A ~ Xx . . . X, as follows:
(1) If T does not have height r, then NEXT(T, p) is undefined.
(2) If T has height r, then NEXT(T, p ) - - [ T ' ] t h e r e exists T" ~ 5 and
~ (N U E) r such that T" ~ GOTO-I(T, e) and T ' - - G O T O ( T " , A)}.
Thus, NEXT(T, p) gives all tables which could be the next governing
table after T if T is on t o p of the pushdown list and calls for a reduction by
production p. Note that there is no requirement that Xx ..- Xr be the top
r grammar symbols on the pushdown list. The only requirement is that there
be at least r grammar symbols on the list. If the tables are the canonical
ones, we can show that only for ~t -- X 1 --. Xr among strings of length r
will GOTO -1 (T, e) be nonempty.
Certain algebraic properties of the GOTO and NEXT functions are left
for the Exercises.
The GOTO function for a set of LR(k) tables can be conveniently por-
trayed in terms of a labeled graph. The nodes of the GOTO graph are labeled
by table names, and an edge labeled X is drawn from node T,. to node Tj if
GOTO(T,., X) = Tj. Thus, if GOTO(T, X i ) ( 2 . . . X~) = T ' , then there will
be a path from node T to node T' whose edge labels spell out the string
X1 X2 --..Yr. The height of a table T can then be interpreted as the length
of the shortest path from To to T in the GOTO graph.
SEC. 7.3 TRANSFORMATIONS ON SETS OF LR(k) TABLES 599

T h e N E X T function can be easily computed from the GOTO graph.


To determine NEXT(T, i), where production i is A - - . X 1 X 2 . . . Xr, we
find all nodes T" in the GOTO graph such that there is a path of length r
from T" to T. We then add GOTO(T", A) to NEXT(T, i) for each such T".

Example 7.20
The GOTO graph for the set of tables in Fig. 7.25 (p. 595) is shown in
Fig. 7.27.

F ~ )

)j,,,_J +

Fig. 7.27 GOTO graph.


600 TECHNIQUES FOR PARSER OPTIMIZATION CHAP. 7

From this graph we can deduce that GOTO[T6, (E)] = T15, since
GOTO[T6, (] = Ts, GOTO[Ts, E] = Ts, and GOTO[Ts, )] = T15.
Table T6 has height 2, so NEXT(T6, 5), where production 5 is F ~ (E),
is undefined.
Let us now compute NEXT(T1 s, 5). The only tables from which there is
a p a t h of length 3 to T15 are To, T6, and TT. Then GOTO(T0, F ) ~ - T 3 ,
GOTO(T6, F) = T3, and GOTO(TT, F) = T 14, and so NEXT(T1 s, 5) =
{T3, T14}. [~
We shall now give an algorithm whereby a set of q~-inaccessible tables
can be modified to allow certain errors to be detected later in time, although
not in terms of distance covered on the input. The algorithm we give here is
not as general as possible, but it should give an indication of how the more
general modifications can be performed.
We shall change certain error entries and q~-entries in the action field to
r e d u c e entries. For each entry to be changed, we specify the production to
be used in the new reduce move. We collect permissible changes into what
we call a postponement set. Each element of the postponement set is a triple
(T, u, i), where T is a table name, u is a lookahead string, and i is a produc-
tion number. The element (T, u, i) signifies that we are to change the action
of table T on lookahead u to reduce i.
DEFINITION

Let (3, To) be a set of LR(k) tables for a grammar G = (N, E, P, S).
We call (P, a subset of 3 × E *k × P, a postponement set for (3, To) if the
following conditions are satisfied.
If (T, u, i) is in 6' with T = ( f , g), then
(1) f ( u ) = e r r o r or ~0;
(2) If production i is A --~ e and T = GOTO(To, fl), then e is a suffix
of fl;
(3) There is no i' such that (T, u, i') is also in (P; and
(4) If T' is in NEXT(T, i) and T' = ( f ' , g'), then f'(u) = error or ~.

Condition (1) states that only error entries and (p-entries are to be changed
to reduce entries. Condition (2) ensures that a reduction by production i
will occur only if a appears on top of the pushdown list. Condition (3) ensures
uniqueness, and condition (4) implies that reductions caused by introducing
extra reduce actions will eventually be caught without a shift occurring.
Referring to condition (4), note that (T', u, j) may also be in (P. In this
case the value o f f ' ( u ) will also be changed from error or ~ to r e d u c e j. Thus,
several reductions may be made in sequence before error is announced.
Finding a postponement set for a set of LR(k) tables which will maximize
the total number of compatible tables in the set is a large combinatorial
problem. In one of the examples to follow we shall hint at some heuristic
techniques which can be used to find appropriate postponement sets. How-
SEC. 7.3 TRANSFORMATIONS ON SETS OF LR(k) TABLES 601

ever, we shall first show how a postponement set is used to modify a given
set of LR(k) tables.
ALGORITHM 7.8
Postponement of error checking.
Input. An LR(k) grammar G----(N, ~, P, S), a g-inaccessible set (3, To)
of LR(k) tables for G, and a postponement set 6).
Output. A a-inaccessible set 3' of LR(k) tables equivalent to 3.
Method.
(1) For each (T, u, i) in 6', where T = ( f , g), change f(u) to reduce i.
(2) Suppose that (T, u, i) is in 6) and that production i is A ----~0c. For all
T' = ( f ' , g ' ) such that GOTO(T', ~) = T and g'(A) = ~, change g'(A)
to e r r o r .
(3) Suppose that (T, u, i) is in 6' and that T' = ( f ' , g'~ is in NEXT(T, i).
Iff'(u) = q~, change f'(u) to e r r o r .
(4) Let 3' be the resulting set of tables with the original names retained.
D
E x a m p l e 7.21

Consider the grammar G with productions


(1) S ~ A S
(2) S ~ b
(3) A ~ # ~
(4) B ~ aB
(5) B ~ b
The T-inaccessible set of LR(1) tables for G obtained by using Algorithm
7.6 on the canonical set of LR(1) tables for G is shown in Fig. 7.28.
We can choose to replace both error entries in the action field of T4 with
reduce 5 and the error entry in T8 with reduce 2. That is, we pick a postpone-
ment set 6) = {(T4, a, 5), (T4, b, 5), (Ta, e, 2)}. Production 5 is B ---~ b, and
GOTO(T0, b) -~ GOTO(T2, b) ~ T4. Thus, the entries under B in To and T2
must be changed from ~ to error. Similarly, the entries under S for T3 and T7
are changed to e r r o r . Since NEXT(T4, 5) and NEXT(Ts, 2) are empty, no
(0's in the action fields need be changed to e r r o r . The resulting set of tables
is shown in Fig. 7.29.
If we wish, we can now apply Algorithm 7.7 with a compatible partition
grouping T4 with Ts, T~ with T~, and T5 with T6. (Other combinations of
three pairs are also possible.) The resulting set of tables is given in Fig. 7.30.

THEOREM 7.7

Algorithm 7.8 produces a a-inaccessible set of tables 3' which is equiva-


lent to 3.
602 TECHNIQUES FOR PARSER OPTIMIZATION CHAP. 7

action goto
a b e S A B a b

r0 S S X T1T2 SO T3 T4
~q SO SO A SO SO SO SO SO

r2 S S SO Ts T2 ~ T3 T4
r3 S S X ~ r6rTT8
r4 X X 2 SO SO SO SO SO

rs SO SO 1 SO SO SO SO SO

r6 3 3 SO SO so so SO so

r7 S S X ~ Tg r7 T8
7q 5 5 X SO so so SO SO

r9 4 4 SO SO so SO so SO

Fig. 7.28 ~0-inaccessible tables for G.

action goto
a b e S A B a b

To S S X TiT2XT3T4
Ti SO SO A SO SO SO SO

73 S S SO Ts T2 X T3 T4

T3 s s x X , T6 TT T8

r4 5 5 2 SO SO SO SO SO

r5 SO SO 1 SO SO SO SO SO

r~ 3 3 SO SO SO SO SO SO

T7 S S X x ~0 Tg T7 Ts
T8 5 5 2 so so so so ¸ so

r9 4 4 SO so so so so so

Fig. 7.29 Tables after postponement of error checking.


SEC. 7.3 TRANSFORMATIONS ON SETS OF LR(k) TABLES 603

action goto
b e S A B a b

ro S S X T1T 1 X T3 T4

75 S S A Ts Ti X T3 T4

75 S S X x ~ Ts rT r4

r4 5 5 2 ~o ~o ~o ~o ~o

r5 3 3 1 ~o ~o so ~o ~o

T7 S S X X ~o T9 TT T4

r9 ¸ 4 4 ~o ~o ~p ~p tp ¢

Fig. 7.30 Merged tables.

P r o o f Let Co = (To, w, e) be an initial configuration of the LR(k) parser


(using either :3 or 3' - - t h e table names are the same). Let Co k- C1 k- • • • k- C,
be the entire sequence of moves made by the parser using ~ and let
Co ~- C'1 [--- • • • k- C " be the corresponding sequence for 3'. Since 3' is formed
from 3 by replacing only error entries and (0-entries, we must have m > n,
and C~ -- C i, for 1 < i ~ n. (That is, the table names are the same, although
different tables are represented.)
We shall now show that either m = n or if m > n, then no shift or accept
moves are made after configuration C', has been entered.
If m -- n, the theorem is immediate, and if C, is an accepting configu-
ration, then m = n. Thus, assume that C, declares an error and that m > n.
By the definition of a p o s t p o n e m e n t set, since the action in configuration
C, is error and the action in C', is not error, the action in configuration C',
must be reduce. Thus, let s be the smallest integer greater than n such that
the action in configuration C's is shift or accept. (If there is no such s, we
have the theorem.)
Then there is some r, where n < r _< s, such that the action entry consulted
in configuration C'r is one which was present in one of the tables of 5. The
case r -- s is not ruled out, as certainly the shift or accept entry of C'r was
present in 3. The action entry consulted in configuration C'~_ i was of the form
reduce i for some i. By our assumption on r, that entry must have been intro-
duced by Algorithm 7.8.
Let T~ and T2 be the governing tables in configurations C'r_ ~ and C',, re-
spectively. Then Tz is in NEXT(T~, i), and condition (4) in the definition of
a postponement set is violated. D

We shall now give a rather extensive example in which we illustrate how


postponement sets and compatible partitions might be found. There are
604 TECHNIQUESFOR PARSER OPTIMIZATION CHAP. 7

a number of heuristics used in the example. Since these heuristics will not be
delineated elsewhere, the reader is urged to examine this example with care.

Example 7.22
Consider the tables for G O shown in Fig. 7.25 (p. 595). Our general
strategy will be to use Algorithm 7.8 to replace error actions by reduce
actions in order to increase the number of tables with similar parsing action
functions. In particular, we shall try to merge into one table all those tables
which call for the same reductions.
Let us try to arrange to merge tables T~ 5 and T21 , because they reduce
according to production 5, and tables T4 and T11, which reduce according
to production 6.
To merge T~ 5 and T21, we must make the action of T15 on ) be reduce 5
and the action of T2~ on e be reduce 5. Now we must check the actions of
NEXT(Tls, 5) = {T3, T14} and NEXT(T2~, 5) = {Tl0, T2o} on ) and e. Since
T3 and T14 each have (~ action on), we could change these ~'s to error's and
be done with it. However, then T3 and T1 o would no longer be compatible
nor would T14 and T20---~so we would be wise to change the actions of T3
and T14 on ) instead to reduce 4 and reduce 3, respectively.
We must then cheek NEXT(T3, 4) = NEXT(T14, 3) = [T2, T13}. A similar
argument tells us that we should not change the actions of T2 and T13 on )
to error, but rather to reduce 2 and reduce 1, respectively. Further, we see that
NEXT(T2, 2) = [Ti } = NEXT(T13, 1). There is nothing wrong with changing
the action of Ti on ) to error, so at this point we have taken into account all
modifications needed to change the action of T15 on ) to reduce 5.
We must now consider what happens if we change the action of T21 on
e to reduce 5. NEXT(T21, 4) = ITs0, T20}, but we do not want to change
the actions of Tlo and T2o on e to error, because then we could not possibly
merge these tables with T3 and T14. We thus change the actions of T10 and
T20 on e to reduce 4 and reduce 3, respectively. We find that

NEXT(Tlo, 4) = NEXT(T2o, 3) = {Tg, T19}.

We do not want to change the actions of T9 and T19 on e to error, so let


T9 have action reduce 2 on e and T19 have action reduce 1 on e. We find
NEXT(Tg, 2) = NEXT(T19, 1) = {Ts, T18}. These tables will have their ac-
tions on e changed to error.
We have now made T15 and T21 compatible without disturbing the pos-
sible compatibility ofT3 and Tlo, ofT1, and T20, ofT2 and T9, ofT13 and T19,
or of T14 and T20. Now let us consider making T4 and T11 compatible by
changing the action of T4 on ) to reduce 6 and the action of Tll on e to
reduce 6. Since NEXT(T,, 6) = IT3, T14} and NEXT(T11, 6) = {T10, T20}, the
changes we have already made to T3 and T14 and to T10 and T20 allow us to
SEC. 7.3 TRANSFORMATIONS ON SETS OF LR(k) TABLES 605

make these changes to T4 and T11 without further ado. The complete post-
ponement set consists of the following elements"

[T2, ), 2] [Tg, e, 2]
[T3, ), 4] [T~ o, e, 4]
[T4, ), 6] [T~ ,, e, 6]
[T,3, ), 1] [T,9, e, 1]
[T,,, ), 3] [Tzo, e, 3]
[r, 5, ), 5] [r2,, e, 5]

The result of applying Algorithm 7.8 to the tables of Fig. 7.25 with this
postponement set is shown in Fig. 7.31. Note that no error entries are intro-
duced into the goto field.
Looking at Fig. 7.3 l, we see that the following pairs of tables are imme-
diately compatible"

T~-Ti0
T~-T~i
T~-T~o
T~-T~i

Moreover, if these pairs form blocks of a compatible partition, then


the following pairs may also be grouped"

T~-T, ~
T~'T12
T~'T17
T~-T~
T~-T~
T~-Ti ~

If we apply Algorithm 7.7 with the partition whose blocks are the above
pairs and the singletons {To} and {T1}, we obtain the set of tables shown in
Fig. 7.32. The smaller index of the paired tables is used as representative
in each case. It is interesting to note that the "SLR" method of Section
7.4.1 can construct the particular set of tables shown in Fig. 7.32 directly
from Go.
To illustrate the effect of error postponement, let us parse the erroneous
input string a). Using the tables in Fig. 7.25, the canonical parser would
606 TECHNIQUES FOR PARSER OPTIMIZATION CHAP. 7

action ,,goto
a + • ( ) e E T F a + * ( )

ro S X X S X X T1 T: T3 T4 ~ ~ Ts
rl ~p S ~ ~p X A

r~ ,# 2 S ~ 2 2

r3 4 4 ¢ 4 4

r4 X 6 6 X 6 6

rs S X X S X X T8 T9 Zlo Tll ~o tp T12 ~0

r6 S X X S X X g9 T13 T3 T4 ho ~0 T5 g9

r7 S X X S X X tp tp T14 T4 ho tp T5

r8 ~. S ,¢ ¢ S X ~0 ~ ~0 ~ T16 ~0 ~ T15

r9 2 S ,~ 2 2 ~p ~o tp tp tp T17 ~p tp

r~o ,~ 4 4 ¢ 4 4

Tll X 6 6 X 6 6

r~2 S X X S X X T~8 T9 T10 T~ ~o ~ T~2


~o I S ~o 1 1 tp tp tp ~0 tp T7 tp ¢

r~4 ,~ 3 3 ~o 3 3

rls X 5 5 X 5 5 tp ~ ~ tp ~o tp tp ,¢

r16 S X X S X X tp T19 TlO Tll tp ~p T12 tp

r17 S X X S X X ¢ tp T2o Tll ¢ tp T12 ¢

T18 S ¢ ~, S X , ~0 "9 , T16 ¢ : T21

T19 ~p 1 S ~ 1 1 ~p tp ho 9 9 T17 tp ~p

r~o 3 3 ~o 3 3

r~l X 5 5 X 5 5 ~p ~p tp tp ~p ~o ¢ ~o

Fig. 7.31 A p p l i c a t i o n of p o s t p o n e m e n t algorithm.

make one move:

[To, a), e] F-- [ToaT4, ), e]

The action of T4 on ) is error.


SEC. 7.3 TRANSFORMATIONS ON SETS OF LR(k) TABLES 607

action goto
a + * ( ) e E T F a + • (

r0 S X X S X X

T1 ~p S ~ ~p X A ~p ~p ~p ~o T6 ~p ~p

T2 ~p 2 S ~p 2 2

T3 ,# 4 4 ~0 4 4 ~p ~o ~p ~p ~p ~p ~p ~p

T4 X 6 6 X 6 6

T5 S X X S X X r8 r2 r3 r4
T6 S X X S X X r~3 r3 73
T7 S X X S X X TI4 T4

r8 S ¢ ¢ S X

T13 ,p 1 S ~o 1 1

T14 ~p 3 3 ~ 3 3

T15 X 5 5 X 5 5

Fig. 7.32 After application of merging algorithm.

However, using the tables in Fig. 7.32 to parse this same input string,
the parser would now make the following sequence of moves:

[To, a), e] [--- [ToaT4, ), e]


[- [ToFT3, ), 6]
I-- [ToTT2, ), 64]
~- [ToET1, ), 642]
Here three reductions are made before the announcement of error. [~

7.3.6. Elimination of Reductions by Single Productions

We shall now consider an important modification of LR(k) tables that


does not preserve equivalence in the sense of the previous sections. This
modification will never introduce additional shift actions, nor will it even
cause a reduction when the unmodified tables detect error. However, this
modification can cause certain reductions to be skipped altogether. As a result,
a table appearing on the pushdown list may not be the one associated with
the string of grammar symbols appearing below it (although it will be com-
patible with the table associated with that string). The modification of this
608 TECHNIQUES FOR PARSER OPTIMIZATION CHAP. 7

section has to do with single productions, and we shall treat it rather more
informally than we did previous modifications.
A production of the form A ~ B, where A and B are nonterminals, is
called a single production. Productions of this nature occur frequently in
grammars describing programming languages. For instance, single produc-
tions often arise when a context-free grammar is used to describe the prece-
dence levels of operators in programming languages. From example, if a
string al-t- az * a3 is to be interpreted as a 1 ÷ (a 2 • a3), then we say the
operator • has higher precedence than the oPerator ÷ .
Our grammar G Ofor arithmetic expressions m a k e s , of higher precedence
than + . The productions in Go are
(1) E ~ E + T
(2) E----~ T
(3) T---~ T . F
(4) T--~ F
(5) F ~ (E)
(6) F--~ a
We can think of the nonterminals E, T, and F as generating expressions on
different precedence levels reflecting the precedence levels of the operators.
E generates the first level of expressions. These are strings of T's separated
by -t-'s. The operator -q- is on the first precedence level. T generates the second
level of expressions consisting of F's separated by • ;s. The third level of
expressions are those generated by F, and we can consider these to be the
primary expressions.
Thus, when we parse the string al -q- a2 * a3 according to G 0, we must
first parse a 2 • a3 as a T before combining this Twith al into an expression E.
The only function served by the two single productions E ~ T and
T ----~F is to permit an expression on a higher precedence level to be trivially
reduced to an expression on a lower precedence level. In a compiler the
translation rules usually associated with these single productions merely
state that the translations for the nonterminal on the left are the same as
those for the nonterminal on the right. Under this condition, we may, if we
wish, eliminate reductions by the single production.
Some programming languages have operators on 12 or more different
precedence levels. Thus, if we are parsing according to a grammar which
reflects a hierarchy of precedence levels, the parser will often make many
sequences of reductions by single productions. We can speed up the parsing
process considerably if we can eliminate these sequences of reductions, and
in most practical cases we can do so without affecting the translation that is
being computed.
In this section we shall describe a transformation on a set of LR(k) tables
which has the effect of eliminating reductions by single productions wherever
desired.
SEC. 7.3 TRANSFORMATIONS ON SETS OF LR(k) TABLES 600

Let (3, To) be a 0-inaccessible set of LR(k) tables for an LR(k) grammar
G = (N, l~, P, S), and assume that 3 has as many (p-entries as possible. Sup-
pose that A ~ B is a single production in P.
Now, suppose the LR(k) parsing algorithm using this set of tables has
the property that whenever a handle Y~ Y2 "'" Y, is reduced to B on look-
ahead string u, B is then immediately reduced to A. We can often modify
this set of tables so that Y~ • • • Y,. is reduced to A in one step. Let us examine
the conditions under which this can be done.
Let the index of production A ---~ B be p. Suppose that T = ( f , g~ is
a table such that f ( u ) = reduce p for some lookahead string u. Let 3 ' =
GOTO-~(T, B) and let '11. = {U IU = GOTO(T', A) and T' ~ 3'}. 3' consists
of those tables which can appear immediately below B on the pushdown
list when T is the table above B. If (3, To) is the canonical set of LR(k)
tables, then 'It is NEXT(T, p)(Exercise 7.3.19).
To eliminate the reduction by production p, we would like to change
the entry g'(B) from T to g'(A) for each ( f ' , g ' ) in 3'. Then instead of making
the two moves

(yT' Yi U1 Yz Uz . . . Yr Ur, w, n:) ~ (?T'BT, w, n:i)


(?T'A U, w, nip)

the parser would just make one move"

(?T' Y1 U1 Y2 U2 . . . Yr U,., w, rt) ~ (?T'A U, w, n:i)

We can make this change in g'(B) provided that the entries in T and all
tables in q.t are in agreement except for those lookaheads which call
for a reduction by production p. That is, let T = ( f , g ) and suppose
q'[ -~- ( ( f l, g~), ( f z, gz), . . . , (fro, g,,)}" Then we require that

(1) For all u in E,k if f ( u ) is not ~o orreduce p, then f.(u) is either ~o or


the same as f ( u ) for 1 < i < m.
(2) For all X in N u E, if g(X) is not ~o, then g~(X) is either ~por the same
a s g ( X ) for 1 ~ i ~ m .
If both these conditions hold, then we modify the tables in 3' and 'It as
follows"
(3) We let g'(B) be g'(A) for all ( f ' , g ' ) in 3'.
(4) For 1 ~ i ~ m
(a) for each u ~ E *k change f~(u) to f(u) if f~(u) =~0 and if f ( u ) i s
not ~0 or reduce p, and
(b) for each X ~ N U E change g,(X) to g(X) if g(X) -~ ~o.
The modification in rule (3) will make table T inaccessible if it is possible
to reach T only via the entries g'(B) in tables ( f ' , g ' ) in 3'.
Note that the modified parser can place symbols and tables on the push-
610 TECHNIQUES FOR PARSER OPTIMIZATION CHAP. 7

down list that were not placed there by the original parser. For example,
suppose that the original parser makes a reduction to B and then calls for
a shift, as follows"

(?T' Y1 U1 Yz U2 ... YrU,, aw, n:) ~ (?T'BT, aw, zti)


F- (TT'BTaT", w, hi)
The new parser would make the same sequence of moves, but different sym-
bols would appear on the pushdown list. Here, we would have

(?T' Y1 U1 Y2 U2 ... Y,. Ur, aw, rt) ~ (?T'A U', aw, zti)
(TT'A U'aT", w, hi)

Suppose that T = ( f , g), T' = ( f ' , g'), and U = ( f . , g,). Then, U is g'(A).
Table U' has been constructed from U according to rule (4) above. Thus,
if f ( v ) = shift, where v = FIRST(aw), we know that f~(v) is also shift.
Moreover, we know that gi(a) will be the same as g(a). Thus, the new parser
makes a sequence of moves which is correct except that it ignores the ques-
tion of whether the reduction by A ~ B was actually made or not.
In subsequent moves the grammar symbols on the pushdown list are
never consulted. Since the goto entries of U' and T agree, we can be sure
that the two parsers will continue to behave identically (except for reductions
by single production A ~ B).
We can repeat this modification on the new set of tables, attempting to
eliminate as many reductions by semantically insignificant single produc-
tions as possible.

Example 7.23
Let us eliminate reductions by single productions wherever possible in the
set of LR(1) tables for G Oin Fig. 7.32 (p. 607). Table T2 calls for reduction
by production 2, which is E ~ T. The set of tables which can appear imme-
diately below Tz on the stack is [To, T5}, since GOTO(T0, E ) = T1 and
GOTO(Ts, E) -- Ts.
We must check that except for the reduce 2 entries, tables T2 and T1 are
compatible and T2 and T8 are compatible. The action of T2 on • is shift. The
action of T1 and T8 on • is ~o. The goto of T2 on • is T7. The goto of T1 and
T8 on • is ~. Therefore, T2 and T1 are compatible and T2 and T8 are com-
patible. Thus, we can change the goto o f table To on nonterminal T from T2
to T1 and the goto of T5 on T from T2 to Ts. We must also change the action
of both T1 and Ts on • from f to shift, since the action of T2 on • is shift.
Finally, we change the goto of both T1 and T8 on • from (p to T7 since the
goto of T2 on • is Z7.
Table T2 is now inaccessible from To and thus can be removed.
SEC. 7.3 TRANSFORMATIONS ON SETS OF LR(k) TABLES 611

Let us now consider the reduce 4 moves in table T3. (Production 4 is


T--~ F.)The set of tables which can appear directly below T3 is (To, T~, T6}.
Now GOTO(T0, T) = T1, GOTO(Ts, T) = T8, and GOTO(T6, T) = T13.
[Before the modification above, GOTO(T0, T) was T2, and GOTO(Ts, T)
was Tz.] We must now check that T3 is compatible with each of T1, Ts, and
T13. This is clearly the case, since the actions of T3 are either ~oor reduce 4,
and the gotos of T3 are all ~o.
Thus, we can change the goto of To, T~, and T6 on F to Tt, Ts, and Ti 3,
respectively. This makes table T3 inaccessible from To.
The resulting set of tables is shown in Fig. 7.33(a). Tables T2 and T3 have
been removed.
There is one further observation we can make. The goto entries in the
columns under E, T, and F all happen to be compatible. Thus, we can merge
these three columns into a single column. Let us label this new column by E.
The resulting set of tables is shown in Fig. 7.33(b).
The only additional change that we need to make in the parsing algorithm
is to use E in place of T and F to compute the goto entries. In effect the set
of tables in Fig. 7.33(b) is parsing according to the skeletal grammar
(1) E ~ E + E
(3) E--> E , E
(5) E----~ (E)
(6) E--,a
as defined in Section 5.4.3.
For example, let us parse the input string (a + a) • a using the tables in
Fig. 7.33(b). The LR(1) parser will make the following sequence of moves.

[To, (a + a) • a, el I- [To(Ts, a + a) • a, e]
[To(T, aT4, + a) • a, e]
]--- [To(T~ET8, + a) • a, 6]
[To(TsET8 + T6, a) • a, 6]
[- [To(TsET8 + T6aT4, ) • a, 6]
1-- [To(TsET8 + T6ET~3, ) • a, 66]
~- [To(TsET8, ) • a, 661]
[To(TsETs)Tls, • a, 661]
t-- [ToET~, * a, 6615]
[ToET1, TT, a, 6615]
[ToET~ ,TTaT+, e, 6615]
t-- [ToET~,TTET~4, e, 66156]
~- [ToETi , e, 661563]
612 TECHNIQUESFOR PARSER OPTIMIZATION CHAP. 7

action goto
a + -, ( ) e E T F a + • ( )

To S X X S X X ~q rl ~ r4 ~ ~ r5
r~ ~p S S ~p X A

T~ X 6 6 X 6 6

T~ S X X S X X T8 r8 T8 T4 ~ ~ r5
T~ S X X S X X

T7 S X X S X X ~ r~4 r4 ~ ~ r~ ~

T8 ~p S S ,p S X

Tx~ ~p 1 S ~p 1 1

rl4 ,~ 3 3 ~p 3 3

X 5 ~ 5 X 5 5

(a) before column merger

action got__..o
a + * ( ) e E a + * ( )

To S X X S X X 7q r4 ~ ~ r5
r~ ~p S S ~p X A

T4 X 6 6 X 6 6

T5 S X X S X X r8 T4 ~ ~ T5
r~ S X X S X X T~3 r4 ~ ~ Ts
r7 S X X S X X r14 r4 ~ ~ r5
r8 S S ~ S X ~, ,p T6 T7 ~ T~5
T~3 ~p 1 S ,# 1 1

Ti4 ~p 3 3 ~p 3 3 ~o ~p ~p tp ~p ~¢

TI5 X 5 5 X 5 5

(b) after column merger

Fig. 7.33 ~ LR(1) tables after elimination of single productions.


SEC. 7.3 TRANSFORMATIONS ON SETS OF LR(k) TABLES 613

The last configuration is an accepting configuration. In parsing this same


input string the canonical LR(1) parser for G o would have made five addi-
tional moves corresponding to reductions by single productions. D

The technique for eliminating single productions is summarized in


Algorithm 7.9. While we shall not prove it here in detail, this algorithm
enables us to remove all reductions by single productions if the grammar
has at most one single production for each nonterminal. Even if some non-
terminal has more than one single production, this algorithm will still do
reasonably well.
ALGORITHM 7.9
Elimination of reductions by single productions.
Input. An LR(k) grammar G = (N, Z, P, S) and a a-inaccessible set of
tables (3, To) for G.
Output. A modified set of tables for G that will be "equivalent" to (~3, To)
in the sense of detecting errors just as soon but which may fail to reduce by
some single productions.
Method.
(1) Order the nonterminals so that N ---- [A~ . . . . . A,], and if A~ --~ Aj is
a single production, then i < j. The unambiguity of an LR grammar guaran-
tees that this may be done.
(2) Do step (3) f o r j = 1, 2 , . . . , n, in turn.
(3) Let At--~ A~ be a single production, numbered p, and let T1 be a
table which calls for the reduce p action on one or more lookahead strings.
Suppose that T2 is in NEXT(T~, p)t and that for all lookaheads either the
actions of T~ and T~ are the same or one is ~ or the action of T~ is reduce p.
Finally, suppose that the goto entries of T~ and T2 also either agree or one
is ~. Then create a new table T3 which agrees with T~ and T2 wherever they
are not ~,, except at those lookaheads for which the action of T1 is reduce p.
There, the action of T3 is to agree with T2. Then, modify every table T -
(f,g) such that g(AO = T~ and g(Ai)= T~ by replacing T~ and T2 by T~ in
the range'of g.
(4) After completing all modifications of step (3), remove all tables which
are no longer accessible from the initial table.
We shall state a series of results necessary to prove our contention that
given an LR(1) grammar, Algorithm 7.9 completely eliminates reductions by
single productions if no two having the same left or same right sides, provided
that no nonterminal of the grammar derives the empty string only. (Such a
tNEXT must always be computed for the current set of tables, incorporating all
previous modifications of step (3)
614 TECHNIQUES FOR PARSER OPTIMIZATION CHAP. 7

nonterminal can easily be removed without affecting the LR(1)-ness of the


grammar.)
LEMMA 7.1
Let G - (N, X, P, S) be an LR(1) grammar such that for each C ~ N,
there is some w =/= e in E* such that C *=* w. Suppose that A =2, B by a se-
quence of single productions. Let a l and a 2 be the sets of LR(1) items that
are valid for viable prefixes ),A and ),B, respectively. Then the following
conditions hold"
(1) If [ C - * ~, • Xfll, a] is in a 1 and [D--~ ~2 " Yflz, b] is in 002, then
X =/= Y.
(2) If [C--* el • ill, a] is in 6 1 and b is in EFF(flla),t then there is no
item of the form [ D - ~ 0c2. f12, c] in 6 2 such that b ~ EFF(fl2c ) except
possibly for [E --~ B . , b], where A *=* E =:> B by a sequence of single produc-
tions.
Proof A derivation of a contradiction of the LR(1) condition when
condition (1) or.(2) is violated is left for the Exercises. [~
COROLLARY
The LR(1) tables constructed from ~; and 6 2 do not disagree in any
action or goto entry unless one of them is cp or the table for t2z calls for
a reduction by a single production in that entry.
Proof. By Theorem 7.7, since the two tables are constructed from the sets
of valid items for strings ending in a nonterminal, all error entries are don't
cares. Lemma 7.1(1) assures that there are no conflicts in the goto entries"
part (2) assures that there are no conflicts in the action entries, except for
resolvable ones regarding single productions.
LEMMA 7.2
During the application of Algorithm 7.9, if no two single productions
have the same left or the same right sides, then each table is the result of
merging a list (possibly of length 1) of LR(1) tables T ~ , . . . , T~ which were con-
structed from the sets of valid items for some viable prefixes yA1, y A 2 , . . . ,
yAh, where A i---~ A i+1 is in P for 1 < i < n.
Proof Exercise. [~]

THEOREM 7.8
If Algorithm 7.9 is applied to an LR(1) grammar G and its canonical set
of LR(1) tables 3, if G has no more than one single production for any non-
terminal, and if no nonterminal derives e alone, then the resulting set of
tables has no reductions by single productions.
]'Note that f i x c o u l d be e here, in which case etl calls for a reduction on lookahead b.
Otherwise, (~1 calls for a shift.
EXERCISES 61 5

Proof. Intuitively, Lemmas 7.1 and 7.2 assure that all pairs T1 and Tz
considered in step (3) do in fact meet the conditions of that step. A formal
proof is left for the Exercises.

EXERCISES

7.3.1. Construct the canonical sets of LR(I) tables for the following grammars:
(a) S---. A B A C
A - - . aD
B---, b l c
C--, cld
D---~ D010
(b) S ---, a S S [ b
(c) S --~ SSa l b
(d) E ~ E + T [ T
T---, T * F1F
V ~ P? FIP
P ---~ (E)1 al a(L)
L--~L, EIE
7.3.2. Use the canonicalset of tables from Exercise 7.3.1(a) and Algorithm
7.5 to parse the input string aObaOOc.
7.3.3. Show how Algorithm 7.5 can be implemented by
(a) a deterministic pushdown transducer,
(b) a Floyd-Evans production language parser.
7.3.4. Construct a Floyd-Evans production language parser for Go from
(a) The LR(1) tables in Fig. 7.33(a).
(b) The LR(1) tables in Fig. 7.33(b).
Compare the resulting parsers with those in Exercise 7.2.11.
7.3.5. Consider the following grammar G which generates the language
L = {anOa~bn[i, n ~ 0) W {Oa"laicn[i, n ~ 0}:

S- +AIOB
A ---+ aAb[O[OC
B~ aBc [ I I1C
C - - - ~ aC[a

Construct a simple precedence parser and the canonical LR(1) parser


for G. Show that the simple precedence parser will read all the a's
following the 1 in the input string a"laib before announcing error.
Show that the canonical LR(1) parser will announce error as soon as
it reads the I.
7.3.6. Use Algorithm 7.6 to construct a ~0-inaccessible set of LR(I) tables for
each of the grammars in Exercise 7.3.1.
61 6 TECHNIQUESFOR PARSER OPTIMIZATION CHAP. 7

7.3.7. Use the techniques of this section to find a smaller equivalent set of
LR(1) tables for each of the grammars of Exercise 7.3.1.
7.3.8. Let ~ be the canonical collection of sets of LR(k) items for an LR(k)
grammar G = (N, E, P, S). Let 12 be a set of items in ,~. Show that
(a) If item [A ~ tg • fl, u] is in 12, then u ~ FOLLOWk(A).
(b) If 12 is not the initial set of items, then 12 contains at least one item
of the form [A --~ tzX. fl, u] for some X in N w E.
(c) If [B ~ • fl, v] is in 12 and B ~ S', then there is an item of the
form [A ~ tz - BT, u] in 12.
(d) If [A --~ tg • Bfl, u] is in 12 and EFFi(Bflu) contains a, then there is
an item of the form [C ~ • a~', v] in 12 for some ~, and v. (This result
provides an easy method for computing the shift entries in an LR(1)
parser.)
*7.3.9. Show that if an error entry is replaced by tp in 3', the set of LR(k)
tables constructed by Algorithm 7.6, the resulting set of tables will no
longer be ~0-inaccessible.
"7.3.10. Show that a canonical LR(k) parser will announce error either in the
initial configuration or immediately after a shift move.
*'7.3.11. Let G be an LR(k) grammar. Give upper and lower bounds on the
number of tables in the canonical set of LR(k) tables for G. Can you
give meaningful upper and lower bounds on the number of tables in
an arbitrary valid set of LR(k) tables for G ?
"7.3.12. Modify Algorithm 7.6 to construct a tp-inaccessible set of LR(0) tables
for an LR(0) grammar.
"7.3.13. Devise an algorithm to find all a-entries in an arbitrary set of LR(k)
tables.
"7.3.14. Devise an algorithm to find all tp-entries in an arbitrary set of LL(k)
tables.
*'7.3.15. Devise an algorithm to find all a-entries in an LC(k) parsing table.
"7.3.16. Devise a reasonable algorithm to find compatible partitions on a set of
LR(k) tables.
7.3.17. Find compatible partitions for the sets of LR(1) tables in Exercise 7.3.1.
7.3.18. Show that the relation of compatibility of LR(k) tables is reflexive and
symmetric but not transitive.
7.3.19. Let (3, To) be the canonical set of LR(k) tables for an LR(k) grammar
G. Show that GOTO(T0, t~) is not empty if and only if t~ is a viable
prefix of G. Is this true for an arbitrary valid set of LR(k) tables for G ?
*7.3.20, Let (3, To) be the canonical set of LR(k) tables for G. Find an upper
bound on the height of any table in 3 (as a function of G).
7.3.21. Let 3 be the canonical set of LR(k) tables for LR(k) grammar
G = (N, E, P, S). Show that for all T e 3, NEXT(T, p) is the set
{GOTO(T', A)I T' ~ G O T O - I ( T , t~), where production p is A ~ t~}.
EXERCISES 617

7.3.22. Give an algorithm to compute NEXT(T,p) for an arbitrary set of


LR(k) tables for a grammar G .
*7.3.23. Let 3 be a canonical set of LR(k) tables. Suppose that for each T ~ 3
having one or more reduce actions, we select one production, p, by
which T reduces and replace all error and ~0 actions by reduce p. Show
that the resulting set of tables is equivalent to 3.
*7.3.24. Show that reductions by single productions cannot always be eliminated
from a set of LR(k) tables for an LR(k) grammar by Algorithm 7.9.
*7.3.25. Prove that Algorithm 7.9 results in a set of LR(k) tables which is equiv-
alent to the original set of tables except for reductions by single pro-
ductions.
7.3.26. A binary operator 0 associates from left to right if aObOe is to be inter-
preted as ((aOb)Oe). Construct an LR(1) grammar for expressions over
the alphabet [a, (,)} together with the operators { + , - - , . , / , ~}. All
operators are binary except + and --, which are both binary and
(prefix) unary. All binary operators associate from left to right except
$. The binary operators 3- and -- have precedence level 1, • and / have
precedence level 2, 1" and the two unary operators have precedence
level 3.
**7.3.27. Develop a technique for automatically constructing an LR(1) parser for
expressions when the specification of the expressions is in terms of a
set of operators together with their associativities and precedence levels,
as in Exercise 7.3.26.
*7.3.28. Prove that 3 and ql. in Example 7.17 are equivalent sets of tables.
**7.3.29. Show that it is decidable whether two sets of LR(k) tables are equivalent.

DEFINITION
The following productions generate arithmetic expressions in which
0 1 , 0 2 , . . . , 0. represent binary operators on n different precedence
levels. 01 has the lowest precedence, and 0. the highest. The operators
associate from left to right.

Eo - - ~ Eo$1 E1 i E1
E1 ~ EiOzEz I E2

En-1- > E.-IOnEnIE.


E. ----> (E1) l a

We can also generate this same set of expressions by the following


"tagged LR(1)" grammar"
(1) E, --~ E, OjEy 0~ i <j ~ n
(2) Ei ~ (Eo) 0 _~ i ~ n
(3) Ei---~a O~i~n
618 TECHNIQUES FOR PARSER OPTIMIZATION CHAP. 7

In these productions the subscripts are to be treated as tags on the


nonterminal E and terminal O. The conditions on the tags reflect the
precedence levels of the operators. For example, the first production
indicates that a level i expression can be a level i expression followed
by an operator on thejth precedence level followed by a levelj expression
provided that 0 < i < j < n. The start symbol has the tag 0.
The expression aO2(aOla), which is analogous to a , (a + a), has
the parse tree shown in Fig. 7.34. In the tree we have shown the values
of the tags associated with the nonterminals.

E0

( Eo )

E0 01 E1

a a Fig. 7.34 Parse tree.

Although the tagged grammar is ambiguous without the tags, we


can construct an LR(1)-like parser that uses tags with LR(1) tables
wherever necessary to correctly parse input strings. Such an LR(1)
parser is shown in Fig. 7.35.
To illustrate the behavior of the parser, let us parse the input string
aOz(aOla). The parser starts off in configuration ([To, 0], aO2(aOla), e)
in which the tag 0 is associated with the initial table To. The parsing
action of[T0, i] on input a is shift, and so the parser enters configuration

([To, 0]aTz, 0 2(aOl a), e)

The action of Tz on 0z is reduce by production 3; that is, E---~ a.


The goto of [To, i] on E is [T1, i]. The value of the tag is transmitted
from To to Ti. Therefore, the parser then enters configuration

([To, 0]E[T1 ,-01, O'2(aO1a), 3)


EXERCISES 619

action goto

a O/ ( ): e E a 0] ( )

[ro,il S X S X X ira,t] T2 X [r3,01 X

[T~ ,i ] X S X X A X X [ r4,/] X X

T2 X 3 X 3 3 X X X X X

[r3,0l S X S X X ITs,el T2 X [r3,01 X

iT4,/] S X S X X [r6,~] T2 X [r3,0l X

[Ts,i ] X S X S X X X IT4,/] X T7

tT6~I X R1 X R2 R2 X X [r,,/l X X

T7 X 2 X 2 2 X X X X X

Fig. 7.35 LR(1) parser with tags.

The complete sequence of moves made by the parser would be as follows:

([To, 0], aO2(aO l a), e)


1---([To, O]aTz, Oz(aOla), e)
l--- ([To, 0]E[Ti, 0], 0 2(aO1a), 3)
([To, 0]E[T1, 0]02[T4, 2], (aOla), 3)
I- ([To, 0]E[Ti, 0]02[T4, 2]([T3, 0], aO ~a), 3)
([To, 0]E[T1, 0]02[T4, 2]([T3, O]aTz, 01a), 3)
1- ([To, 0]E[Ti, 0]02[T4, 2]([T3, 0]E[Ts, 0], O~a), 33)
([To, 0lEITh, 0]02[T4, 2]([T3, 0]E[Ts, 0]01[T4, 1], a), 33)
I- ([To, 0]E[T1, 0]02[T4, 2]([T3, 0]E[Ts, 0]0i[T4, 1]aTz, ), 33)
I-- ([To, 0]E[T1, 0]02[T4, 2]([T3, 0]E[Ts, 0]01[T4, lIE[T6, 1], ), 333)
k- ([To, 0]E[T1, 0]02[T4, 2]([T3, 0]E[Ts, 0], ), 33311)
([To, 0]E[T1, 0]02[T4, 2]([T3, 0]E[Ts, 0])TT, e, 33311)
1- ([To, 0]E[T1, 0]02[Ta, 2]E[T6, 2], e, 333112)
t-- ([To, 0]E[T1, 0], e, 33311212)

**7.3.30. Show that the parser in Fig. 7.35 correctly parses all expressions gen-
erated by the tagged grammar.
7.3.31. Construct an LR(1) parser for the untagged grammar with operators
on n different precedence levels. How big is this parser compared with
620 TECHNIQUES FOR PARSER OPTIMIZATION CHAP. 7

the tagged parser in Fig. 7.35. Compare the operating speed of the two
parsers.
*7.3.32. Construct a tagged LR(1)-like parser for expressions with binary opera-
tors of which some associate from left to right and others from right
to left.
**7.3.33. The following tagged grammar will generate expressions with binary
operators on n different precedence levels"
(1) E~ - - , (Eo)Rt.,, 0 < i< n
(2) E~ --> aRt,,, 0 ~ i~ n
(3) Ri, k ---> OyEyRt.j_t 0 ~ i < j ~ k ~ n
(4) R~,j~e 0~i~j~n
Construct a tagged LL(1)-like parser for this grammar. Hint: Although
this grammar has two tags on R, only the first tag is needed by the
parser.
7.3.34. Complete the proof Of Lemma 7.1 and its corollary.
7.3.35. Prove Lemma 7.2.
7.3.36. Complete the proof of Theorem 7.8.

Open P r o b l e m

7.3.37. Under what conditions is it possible to merge all the goto columns for
the nonterminals after eliminating reductions by single productions,
as we did for Go ? The reader should consider the possibility of relating
this question to operator precedence. Recall that Go is an operator
precedence grammar.

Research Problems
7.3.38. Develop additional techniques for modifying sets of LR tables, while
preserving "equivalence" in the sense we have been using the term.
7.3.39. Develop techniques for compactly representing LR tables, taking advan-
tage of ~ entries.

P r o g r a m m i n g Exercises
7.3.40. Design elementary operations that can be used to implement an LR(1)
parser. Some of these operations might be: read an input symbol, push
a symbol on the pushdown list, pop a certain number of symbols from
the pushdown list, emit an output, and so forth. Construct an inter-
preter that will execute these elementary operations.
7.3.41. Construct a program that takes as input a set of LR(1) tables and
produces as output a sequence of elementary instructions that imple-
ments the LR(1) parser using this set of tables.
SEC. 7.4 TECHNIQUES FOR CONSTRUCTING LR(k) PARSERS 621

7.3.42. Construct a program that takes as input a set of LL(1) tables and
produces as output a sequence of elementary instructions that simulates
the LL(1) parser using this set of tables.
7.3.43. Write a program to add don't care entries to the canonical set of LR
tables.
*7.3.44. Write a program to apply some heuristics for error postponement and
table merger, with the goal of producing small sets of tables.
7.3.45. Implement Algorithm 7.9 to eliminate reductions by single productions
where possible.

BIBLIOGRAPHIC NOTES

The transformations considered in this section were developed by Aho and


Ullman [1972c, 1972d]. Pager [1970] considers another approach to the simplifi-
cation of LR(k) parsers in which a parser can be modified to such an extent that
it may no longer detect errors at the same position on the input as the canonical
LR(k) parser and may need to look at the stack to determine which reduction to
make. The idea of using tags in LL grammars and parsers was suggested by
P. M. Lewis, D. J. Rosenkrantz, and R. E. Stearns. Lewis and Rosenkrantz [1971]
report that by using tags to handle expressions and conditional statements, the
syntax analyzer in their ALGOL 60 compiler was reduced to a 29 by 37 LL(1)
parsing table.

7.4. TECHNIQUES FOR C O N S T R U C T I N G LR(k)


PARSERS

The a m o u n t of w o r k required to construct the sets of LR(k) items [and


hence the canonical LR(k) parser] grows rapidly with the size of the g r a m m a r
and with k, the length of the lookahead string. For large g r a m m a r s the
a m o u n t of computation needed to construct the canonical set of LR(k)
tables is so large as to be impractical, even if k = 1. In this section we shall
consider some more practical techniques which can be used to construct
valid sets of LR(1) tables from certain LR(1) grammars.
T h e first technique that we shall consider is the construction of the canon-
ical collection of sets of LR(0) items for a g r a m m a r G. If each set of LR(0)
items is consistent,t then we can construct a valid set of LR(0) tables for G.
If a set of LR(0) items is not consistent, then it is reasonable to attempt to
use lookahead strings in this set of items to resolve parsing action conflicts.

tA set of LR(k) items, ~, is consistent if we can construct an LR(k) table from ~ in


which the parsing actions are unique.
622 TECHNIQUES FOR PARSER OPTIMIZATION CHAP. 7

The saving in this approach is due to the fact that we would use lookahead
only where lookahead is needed. For many grammars this approach will
produce a set of tables which is considerably smaller than the canonical set
of LR(k) tables for G. However, for some LR(k) grammars this method does
not work at all.
We shall also consider another approach to the design of LR(k) parsers.
In this approach, we split a large grammar into smaller pieces, constructing
sets of LR(k) items for the pieces and then combining the sets of items to
form larger sets of items. However, not every splitting of an LR(k) grammar
G is guaranteed to produce pieces from which we can construct a valid set
of tables for G.

7.4.1. Simple LR Grammars

In this section we shall attempt to construct a parser for an LR(k) gram-


mar G by first constructing the collection of sets of LR(0) items for G. The
method that we shall consider works for a subclass of the LR grammars
called the simple LR grammars.
DEFINITION
Let G = (N, Z, P, S) be a C F G [not necessarily LR(0)]. Let g0 be the
canonical collection of sets of LR(0) items for G. Let a be any set of items
in So. Suppose that whenever [A ~ a - f l , el and [B ~ ~,. 5, el are two
distinct items in a, one of the following conditions is satisfied:
(1) Neither of fl and J are e.
(2) fl ~ e, J = e, and FOLLOWk(B ) ~ EFFk( fl FOLLOWk(A))t = ~.
(3) fl = e, 5 ~ e, and FOLLOWk(A ) ~ EFFk(O FOLLOWk(B) ) = ~.
(4) ,O = 5 = e and FOLLOWk(A ) ~ FOLLOWk(B ) = ~.
Then G is said to be a simple LR(k) grammar [SLR(k) grammar, for short].

Example 7.24
Let G Obe our usual grammar

E ~E+TIT
T >T*F[F
F ~ (E) la

The canonical collection of sets of LR(0) items for G is listed in Fig.


7.36, with the second components, which are all e, omitted.

tWe could use FOLLOWk_I(A) here, since fl must generate at least one symbol of
any string in EFFk(fl FOLLOWk(A)).
SEC. 7.4 TECHNIQUES FOR CONSTRUCTING LR(k) PARSERS 623

60: E'---+ -E 66: E ~ E + -T


E--~ .E + T T---~ .T* F
E----~ .T T----~ .F
T---~ .T* F F----~ .(E)
T---~ .F F---* -a
F---,.(E) 67" T~T, .F
F ~ .a F--~ .(E)
61: E'~ E. F--+ .a
E---+E.+T
68: F ----~ (E, )
62: E-----~ T. E--~ E.+T
T - - + T, , F
69: E---~E+ T.
63: T - - - , F. T-----~ T . , F
64: F---~ a. 6ta0: T - - ~ T , F.
65: F---~ (.E) 61x : F--~ (E).
E---~ .E+ T
E---~ .T
T--~ .T, F
T---~ .F
F ~ .(E)
F .---~ . a

Fig. 7.36 LR(0) items for Go.

G o is not SLR(0) because, for example, 61 contains the two items


[E'--~ E . ] and [E ~ E - ÷ T] and

F O L L O W 0 ( E ' ) = {e} = E F F 0 [ + T FOLLOW0(E)].t

However, Go is SLR(1). To check the SLR(1) condition, it suffices to


consider sets of items which
(1) Have at least two items, and
(2) Have an item with the dot at the right-hand end.
Thus, we need concern ourselves only with 61, 62, and 69. For 61, we observe
that FOLLOWi(E' ) = [ e } and E F F I [ + T FOLLOW i (E)] = { + } . Since
[e} A { + ] = ~ , 61 satisfies condition (3) of the SLR(1) definition. 62 and
69 satisfy condition (3) similarly, and so we can conclude that G Ois SLR(1).

Now let us attempt to construct a set of LR(1) tables for an SLR(1)


grammar G starting from So, the canonical collection of sets of LR(0) items
for G. Suppose that some set of items 6 in So has only the items [A --~ e -, e]
and [B ~ fl.~,, e]. We can construct an LR(I) table from this set of

tNote that for all ~, F I R S T o ( ~ ) -- E F F o ( ~ ) -- F O L L O W o ( ~ ) -- [e].


624 TECHNIQUES FOR PARSER OPTIMIZATION CHAP. 7

items as follows. The goto entries are constructed in the obvious way, as
though the tables were LR(0) tables. But for lookahead a, what should the
action be? Should we shift, or should we reduce by production A----~ ct.
The answer lies in whether or not a E F O L L O W i (A). If a ~ FOLLOWi(A),
then it is impossible that a is in EFFi(?) , by the definition of an SLR(1)
grammar. Thus, reduce is the appropriate parsing action. Conversely, if a is
not in F O L L O W i (A), then it is impossible that reduce is correct. If a is in
EFF(~,), then shift is the action; otherwise, error is correct. This algorithm
is summarized below.
ALGORITHM 7.10
Construction of a set of LR(k) tables for an SLR(k) grammar.
Input. An SLR(k) grammar G = (N, E, P, S) and So, the canonical col-
lection of sets of LR(0) items for G.
Output. (3, To), a set of LR(k) tables for G, which we shall call the SLR(k)
set of tables for G.
Method. Let ~ be a set of LR(0) items in So. The LR(k) table T associated
with ~ is the pair ( f , g), constructed as follows:
(1) For all u in Z *k,
(a) f(u) = shift if [A --~ ~ • fl, e] is in ~, ,8 ~ e, and u is in the set
EFFk(fl FOLLOWk(A)).
(b) f ( u ) = reduce i if [d---~ 0c., e] is in ~, d - - ~ o~ is production i
in P, and u is in FOLLOWk(A).
(c) f(e) = accept if [S' --~ S -, e] is in ~2.'~
(d) f ( u ) = error otherwise.
(2) For all X in N t.3 X, g(X) is the table constructed from G O T O ( a , X).
To, the initial table, is the one associated with the set of items containing
[s' ~ . S, d.

We can relate Algorithm 7.10 to our original method of constructing


a set of tables from a collection of sets of items given in Section 5.2.5. Let
~' be the set of items [A ~ tx • fl, u] such that [A ~ e • fl, e] is in ~ and u
is in FOLLOWk(A ). Let S~ be [ ~ ' [ ~ ~ g0}- Then, Algorithm 7.10 yields the
same set of tables that would be obtained by applying the construction given
in Section 5.2.5 to g~.
It should be clear from the definition that each set of items in g; is con-
sistent if and only if G is an SLR(k) grammar.

Example 7.25
Let us construct the SLR(1) set of tables from the sets of items of Fig.
7.36 (p. 623). We use the name Tt for the table constructed from ~ r We
shall consider the construction of table T2 only.

"l'The canonical collection of sets of items is constructed from the augmented grammar.
SEC. 7.4 TECHNIQUES FOR CONSTRUCTING LR(k) PARSERS 625

tt 2 is {[E--+ T .], [T ~ T - • F]}. Let T2 = ( f , g). Since F O L L O W ( E )


is [ + , ), e}, we have f ( + ) = f D ] = f ( e ) = reduce 2. (The usual p r o d u c t i o n
n u m b e r i n g is being used.) Since E F F ( , F F O L L O W ( T ) ) = [,], f ( , ) = shift.
F o r the other lookaheads, we have f ( a ) = f[(] = error.
The only symbol X for which g ( X ) is defined is X = ,. It is easy to see
by inspection o f Fig. 7.36 that g ( , ) = TT. The entire set of tables is given
in Fig. 7.37.

action goto
a + * ( ) e E T F a + • ( )

r0 S X X S X X T~r2T3T4X x T5 X
rl X S X X X A X X X X T6 X X X

r2 X 2 S X 2 2 X X X X X T7 X X

7q X 4 4 X 4 4 X X X X X X X X

r4 X 6 6 X 6 6 X X X X X X X X

r5 S X X S X X Ts T2 T3 T4 X XTsX

r6 S X X S X X X Tg T3 T4 X X T5 X

r7 S X X S X X X X TlO Tg X T5 X

r8 X S X X S X X X X X T6 x XT~

r9 X 1 S X 1 1 x x x x x rT x x

T10 X 3 3 X 3 3 X X X X X X X X

X 5 5 X 5 5 X X X X X X X X

Fig. 7.37 SLR(1) tables for Go.

Except for names and ~ entries, this set o f tables is exactly the same as
that in Fig. 7.32 (p. 607).

We shall now prove that A l g o r i t h m 7.10 will always p r o d u c e a valid set


of LR(k) tables for an SLR(k) g r a m m a r G. In fact, the set o f tables p r o d u c e d
is equivalent to the canonical set o f LR(k) tables for G.
THEOREM 7.9
If G is an SLR(k) g r a m m a r , (3, To), the SLR(k) set o f tables constructed
by A l g o r i t h m 7.10 for G is equivalent to (3 c, To), the canonical set o f LR(k)
tables for G.
P r o o f Let gk be the canonical collection o f sets o f LR(k) items for G
and let So be the collection o f sets o f LR(0) items for G. Let us define the
626 TECHNIQUES FOR PARSER OPTIMIZATION CHAP. 7

• core of a set of items R as the set of bracketed first components of the items
in that set. For example, the core of [A ~ 0c • fl, u] is [A ~ 0c • fl].t We shall
denote the core of a set of items R by CORE(R).
Each set of items in So is distinct, but there may be several sets of items
in Sk with the same core. However, it can easily be shown that So =
{CORE(et) let ~ Sk}.
Let us define the function h on tables which corresponds to the function
C O R E on sets of items. We let h(T) -- T' if T is the canonical LR(k) table
associated with R and T' is the SLR(k) table constructed by Algorithm 7.10
from CORE(R). It is easy to verify that h commutes with the GOTO function.
That is, GOTO(h(T), %) = h(GOTO(T, X)).
As before, let

R'= {[A---, tx. ,8, ull[A---~ ~ - f l , e] e R and u ~ FOLLOWk(A)}.

Let $~ = [R'IR ~ So}. We know that (3, To) is the same set of LR(k) tables
as that constructed from S'0 using the method of Section 5.2.5. We shall show
that (5, To) can also be obtained by applying a sequence of transformations to
(3c, To), the canonical set of LR(k) tables for G. The necessary steps are the
following.
(1) Let 6' be the postponement set consisting of those triples (T, u, i)
such that the action of T on u is error and the action of h(T) on u is reduce i.
Use Algorithm 7.8 on 6" and (3 c, Tc) to obtain another set of tables (3'c, T'c).
(2) Apply Algorithm 7.7 to merge all pairs of tables T~ and Tz such that
h(T1) = h(Tz). The resulting set of tables is (3, To).
Let (T, u, i) be an element of 6". To show that 6' satisfies the requirements
of being a postponement set for 3c, we must show that if T " = ( f " , g " )
is in NEXT(T, i), then f " ( u ) = error. To this end, suppose that production
i is A ---, 0c and T " = GOTO(To, flA) for some viable prefix flA. Then
T = GOTO(T0,/~).
In contradiction let us suppose t h a t f " ( u ) ~ error. Then there is some item
[B ~ y • ,5, v] valid for flA, where u is in EFF(~v).:I: Every set of items,
except the initial set of items, contains an item in which there is at least one
symbol to the left of the dot. (See Exercise 7.3.8.) The initial set of items is
valid only for e. Thus, we may assume without loss of generality that 7 = y'A
for some y'. Then [B ---~ y' • A6, v] is valid for fl, and so is [A ---~ • ~, u].
Thus, [A --~ ~ . , u] is valid for p~, and f ( u ) should not be error as assumed.
We conclude that 6' is indeed a legitimate postponement set for ~o.
Let ~3~ be the result of applying Algorithm 7.8 to 3c using the postpone-

tWe shall not bother to distinguish between [A ---~ ~ • ,8] and [A ---~ • • ,13,e].
~Note that this statement is true independent of whether ~ = e or not.
SEC. 7.4 TECHNIQUES FOR CONSTRUCTING LR(k) PARSERS 627

ment set (P. Now suppose that T is a table in 3c associated with the set of
items ~ and that T' is the corresponding modified table in 31- Then the only
difference between T and T' is that T' may call for a reduction when T
announces an error. This will occur whenever u is in FOLLOW(A) and the
only items in ~ of the form [A ---, ~ . , v] have v ~ u. This follows from the
fact that because of rule (lb) in Algorithm 7.10, T' will call for a reduction
on all u such that u is FOLLOW(A) and [A --~ ct •] is an item in CORE(a).
We can now define a partition II -- [(B t, ( B 2 , . . . , 6~r} on 31 which groups
tables T1 and Tz in the same block if and only if h(T1) --h(T2). The fact
that h commutes with GOTO ensures that II will be a compatible partition.
Merging all tables in each block of this compatible partition using Algorithm
7.7 then produces 3.
Since Algorithms 7.7 and 7.8 each preserve the equivalence of a set of
tables, we have shown that 3, the set of LR(k) tables for G, is equivalent to
3c, the canonical set of LR(k) tables for G. [--]

Before concluding our discussion of SLR parsing we should point out


that the optimization techniques discussed in Section 7.3 also apply to SLR
tables. Exercise 7.4.16 states which error entries in an SLR(1) set of tables
are don't cares.

7.4.2. Extending the SLR Concept to non-SLR


Grammars

There are two big advantages in attempting to construct a parser for


a grammar from So, the canonical collection of sets of LR(0) items. First,
the amount of computation needed to produce So for a given grammar is
much smaller in general than that required to generate ,~t, the sets of LR(1)
items. Second, the number of sets of LR(0) items is generally considerably
smaller than t h e number of sets of LR(1) items.
However, the following question arises: What should we do if we have
a grammar in which the F O L L O W sets are not sufficient to resolve parsing
action conflicts resulting from inconsistent sets of LR(0) items ? There are
several techniques we should consider before abandoning the LR(0) approach
to parser design. One approach would be to try to use local context to resolve
ambiguities. If this approach is unsuccessful, we might attempt to split one
set of items into several. In each of the pieces the local context might result
in unique parsing decisions. The following two examples illustrate each of
these approaches.

Example 7.26
Consider the LR(1) grammar G with productions
(1) S ~ Aa
(2) S ~ dAb
628 TECHNIQUES FOR PARSER OPTIMIZATION CHAP. 7

(3) S ~ cb
(4) S ~ dca
(5) A-re
Even though L ( G ) consists only of four sentences, G is not an SLR(k) gram-
mar for any k :> 0. The canonical collection of sets of LR(0) items for G is
given in Fig. 7.38. The second components of the items have been omitted

~0: s' ~ .s
s ~ . A a [ . d A b [ .cb [ .dea
A ---. . c

~x : S ' ---> S .

¢gz : S ~ A .a
a,3: S---* d.Ab[d.ea
A--+.e

a~4 : S ---> c . b
A .--~ c.

¢g5: S --~ Aa.

a~6 : S ~ dA.b

a7: S --~ dc.a


A ---~ c-
ms: S - - - ~ cb.
~tg: S ~ dab.

~a0: S ~ dca.

Fig. 7.38 Sets of LR(0) items.

and we have used the notation A --, al • fla i a~ • f121 "'" l a , " ft, as short-
hand for the n items [ A - - , a l . f l ~ ] , [ A ~ a s - f l 2 ] , . . . , [ A ~ a , . f l , ] .
There are two sets of items that are inconsistent, 64 a n d 67. Moreover,
since FOLLOW(A) -- {a, b}, Algorithm 7.10 will not produce unique parsing
actions from 64 and 67 on the lookaheads b and a, respectively.
However, let us examine the GOTO function on the sets of items as
graphically shown in Fig. 7.39.? We see that the only way to get to 64 from
6 0 is to have e on the pushdown list. If we reduce e to A, from the produc-
tions of the grammar, we see that a is the only symbol that can then follow
A. Thus, 7'4, the table constructed from 64, would have the following unique
parsing actions"
c d

4" reduce 5 shift error I error ]

tNote that this graph is acyclic but that, in general, a goto graph has cycles.
SEC. 7.4 TECHNIQUESFOR CONSTRUCTINGLR(k) PARSERS 629

S a c

Fig. 7.39 GOTO graph.

Similarly, from the GOTO graph we see that the only way to get to ~7 from
a0 is to have de on the pushdown list. In this context if c is reduced to A,
the only symbol that can then legitimately follow A is b. Thus, the parsing
actions for TT, the table constructed from a7, would be

a b c d

T7"[ shJift reduce5 ] error ] error

The remaining LR(1) tables for G can be constructed using Algorithm 7.10
directly. [-]

The grammar in Example 7.26 is not an SLR grammar. However, we were


able to use lookahead to resolve all ambiguities in parsing action decisions
in the sets of LR(0) items. The class of LR(k) grammars for which we can
always construct LR parsers in this fashion is called the class of lookahead
L R ( k ) grammars, L A L R ( k ) for short (see Exercise 7.4.1I for a more precise
definition). The LALR(k) grammars are the largest natural subclass of the
LR(k) grammars for which k symbol lookahead will resolve all parsing ac-
tion conflicts arising in S0, the canonical collection of sets of LR(0) items.
The lookaheads can be computed directly from the GOTO graph for S0 or
by merging the sets of LR(k) items with identical cores. LALR grammars
include all SLR grammars, but not all LR grammars are LALR grammars.
We shall now give an example in which a set of items can be "split" to
obtain unique parsing decisions.
630 TECHNIQUESFOR PARSER OPTIMIZATION CHAP. 7

Example 7.27
Consider the LR(1) grammar G with productions
(1) S ~ Aa
(2) S--~ dAb
(3) S ~ Bb
(4) S -~ dBa
(5) A - ~ c
(6) B - - - ~ c
This grammar is quite similar to the one in Example 7.26, but it is not an
L A L R grammar. The canonical collection of sets of LR(0) items for the
augmented grammar is shown in Fig. 7.40. The set of items et s is inconsistent
because we do not know whether we should reduce by production A --~ e
or B ~ e. Since FOLLOW(A) -- FOLLOW(B) = {a, b}, using these sets
as lookaheads will not resolve this ambiguity. Thus G is not SLR(I).

12o: S" ~ "S as" A - - ~ c.


S ~ .Aa B--re.
S ~ .dAb
126: S----~Aa.
S----~ . B b
S ~ .dBa 127 : S---~Bb.

A ---+ . c 128 : S ~ dA.b


B ---~ . e
129: S ~ dB.a
121: S" ----.~. S .
121o: S.--~ dAb.
122: S ~ A.a
121i: S ~ dBa.
123: S ~ B.b

124: S ~ d.Ab
S ~ d.Ba
A .---~ . c
B ---~ . c

Fig. 7.40 LR(O) items.

Examining the productions of the grammar, we know that if we have


only e on the pushdown list and if the next input symbol is a, then we should
use production A --~ e to reduce c. If the next input symbol is b, we should
use production B---, e. However, if de appears on the pushdown list and
the next input symbol is a, we should use production B---, c to reduce c.
If the next input symbol is b, we should use A ----, c.
The GOTO function for the set of items is shown in Fig. 7.41. Unfor-
tunately, ~5 is accessible from a 0 under both ¢ and de. Thus, a 5 does not
tell us whether we have e o r de on the pushdown list, and hence G is not
LALR(1).
SEC. 7.4 TECHNIQUES FOR CONSTRUCTING LR(k) PARSERS 631

S A c

b A B

Fig. 7.41 GOTO graph.

However, we can construct an LR(1) parser for G by replacing ~5 by two


identical sets of items ~ and ~ ' such that ~ is accessible only from t~4 and
~ ' is accessible from only t/,0. These new sets of items provide the additional
needed information about what has appeared on the pushdown list.
From ~ and ~2~' we can construct the tables with unique parsing actions
as follows:
a b c d

t
T~" reduce 6 reduce 5 error error
TJt" reduce 5 reduce 6 error error

The value of the goto functions of T~ and T~' is always error. D


7.4.3. Grammar Splitting

In this section we shall discuss another technique for constructing LR


parsers. It is not as easy to apply as the SLR approach, but it does work in
situations where the SLR approach does not. Here, we partition a grammar
G = (N, E , P , S) into several component grammars by treating certain
nonterminal symbols as terminal symbols. Let N' ~ N be such a set of
"splitting,' nonterminals. For each A in N' we can find GA, the component
grammar with start symbol A, using the following algorithm.
ALGORITHM 7.11
Grammar splitting.
lnput. A C F G G = (N, Z, P, S) and N', a subset of N.
632 TECHNIQUES FOR PARSER OPTIMIZATION CHAP. 7

Output. A set of component grammars GA for each A ~ N'.


Method. For each A E N', construct GA as follows:
(1) On the right-hand side of each production in P, replace every non-
terminal B ~ N' by J~. Let ~ be [./~[B ~ N'} and let the resulting set of
productions be P.
(2) Define G~ = (N - N' U {A}, Z U N, P, A).
-

(3) Apply Algorithm 2.9 to eliminate useless nonterminals and produc-


tions from G]. Call the resulting reduced grammar Ga. D
Here we consider the building of LR(1) parsers for each component gram-
mar and the merger of these parsers. Alternatives involve the design of dif-
ferent kinds of parsers for the various components. For example, we could
use an LL(1) parser for everything but expressions, for which we would use
an operator precedence parser. Research extending the techniques of" this
section to several types of parsers is clearly needed.

Example 7.28

Let Go be the usual grammar and let N' = {E, T} be the splitting nonter-
minals. Then P consists of

T > ~'* F I F
F > (/~) [a

Thus, GE = ([E}, [/~, ~', +}, {E ~ E + ~'l~'}, E), and GT is given by


(IT, F}, {~, E, (,), a, ,}, {T --~ 7~ • FI F, F ---} (~)[ a}, T). [~

We shall now describe a method of constructing LR(1) parsers for certain


large grammars. The procedure is to initially partition a given grammar into
a number of smaller grammars. If a collection of consistent sets of LR(1)
items can be found for each component grammar and if certain conditions
relating these sets of items are satisfied, then a set of LR(1) items can be
constructed for the original grammar by combining the sets of items for
the component grammars. The underlying philosophy of this procedure is
that much less work is usually involved in building the collections of sets of
LR(1) items for smaller grammars and merging them together than in con-
structing the canonical collection of sets of LR(1) items for one large gram-
mar. Moreover, the resulting LR(1) parser will most likely turn out to be
considerably smaller than the canonical parser.
In the grammar-splitting algorithm we treat A as the start symbol of its
own grammar and use FOLLOW(A) as the set of possible lookahead strings
for the initial set of items for the subgrammar GA. The net effect will be to
SEC. 7.4 TECHNIQUES FOR CONSTRUCTING LR(k) PARSERS 633

merge certain sets of items having common cores. The similarity to the SLR
algorithm should be apparent. In fact, we shall see that the SLR algorithm
is really a grammar-splitting algorithm with N' = N.
The complete technique can be summarized as follows.
(1) Given a grammar G = (N, E, P, S), we ascertain a suitable splitting
set of nonterminals N' ~ N. We include S in N'. This set should be large
enough so that the component grammars are small, and we can readily
construct sets of LR(1) tables for each component. At the same time, the
number of components should not be so large that the method will fail to
produce a set of tables. (This comment applies only to non-SLR grammars.
If the grammar is SLR, any choice for N ' will work, and choosing N ----N'
yields the smallest set of tables.)
(2) Having chosen N', we compute the component grammars using
Algorithm 7.11.
(3) Using Algorithm 7.12, below, we compute the sets of LR(1) items for
each component grammar.
(4) Then, using Algorithm 7.13, we combine the component sets of items
into S, a collection of sets of items for the original grammar. This process
may not always yield a collection of consistent sets of items for the original
grammar. However, if $ is consistent, we then construct a set of LR(1)
tables from 8 in the usual manner.
ALGORITHM 7.12
Construction of sets of LR(1) items for the component grammars of
a given grammar.
Input. A grammar G----(N, E, P, S), a subset N' of N, with S ~ N',
and the component grammars GA for each A ~ N'.
Output. Sets of LR(1) items for each component grammar.
Method. For notational convenience let N' = [Sx, Sz,. • •, am}. We shall
denote Gs, as Gi.
If O, is a set of LR(1) items, we compute O0', the closure of ~ with respect
to GA, in a manner similar, but not identical, to Algorithm 5.8. a ' is defined
as follows:
(i) Oo ~ O~'. (That is, all-items in 6t are in ~2'.)
(2) If [B --~ ~ • Cfl, u] is in a,' and C --~ 7 is a production in G~, then
[C ~ • 7, v] is in a ' for all v in FIRST~(fl'u), where fl' is fl with each symbol
in N replaced by the original symbol in N.
Thus all lookahead strings are in E*, while the first components of the
items reflect productions in GA.
For each Gi, we construct $i, the collection of sets of LR(1) items for G~,
as follows:
634 TECHNIQUESFOR PARSER OPTIMIZATION CHAP. 7

(1) Let ~ be the closure (with respect to Gi) of

{[St --+ • a, a][S, ~ a is a production in G, and a is in FOLLOW?(S,)}.

Let g, = [it, t}.


(2) Then repeat step (3) until no new sets of items can be added to $i.
(3) If ~ is in $i, let ~2, be {[A ~ a X . fl, u]l[A ~ a . Xfl, u] is in ~}.
Here X is in N U E W N. Add ~", the closure (with respect to Gt) of ~',
to S~. Thus, ~" = GOTO(~, X). [~]

Note that we have chosen to add FOLLOW(A) to the lookahead set of


each initial set of items rather than augmenting each component grammar
with a zeroth production. The effect will be the same.
Example 7.29
Let us apply Algorithm 7.12 to G 0, with N ' = [E, T}. We find that
FOLLOW(E) -- {+, ), e} and FOLLOW(T) -- { + , . , ), e}. Thus, by step (1),
ag consists of
[E---+ . E-+- f', -+- l ) l e]
[E > • ~P, - k - / ) / e l
Likewise, ao~ consists of
[T---+. ~',F, +/,/)/el
[r > • F, + ~ , ~ ) ~ e l
[F > • (~), + i • / ) / el
[F- >.a,+l*l)le]
The complete sets of items generated for Ge are shown in Fig. 7.42, and
those for Gr are shown in Fig. 7.43.
Note that when a f is constructed from a f , for example, the symbol ~P
is a terminal, and the closure operation yields no new items. [--]

We now give an algorithm that takes the sets of items generated by


Algorithm 7.12 for the component grammars and combines them to form

/[E--+ -£' + L + / ) / e ]
~o~. / [E---~ L + I)le]
[e--~ D.+~, + l)le]
[E---> ~., + I ) l e]
[E---> ~ + ' L + I)le]
a~- [E--> ~ -t- ~', ÷ / ) / e ]
Fig. 7.42 Sets of items for Gz.
SEC. 7.4 TECHNIQUES FOR CONSTRUCTING LR(k) PARSERS 635

a,0r"
{
[T----, . ~ . F ,
[T----, .F,
[F---, •(E),
[F---~-a,
+/./)/e]
+/./)/e]
+ / • / ) / e]
÷/./)/e]
aT: [T--~ f-*F, + l * l ) l e ]
t~" [T--> F., +l./)/e]
a~'" [F--~ (./~), +/*/)/el
a~- [F---~ a., + / • / ) / e]
f[T---~ ~'*.F, ÷ / , / ) / e ]
~sr" I[F--~'(E)' + / , / ) / e ]
,[F---, .a, + / • / ) / e]
a~': [F~(E.), +/,/)/e]
eta': [T----~f , F . , + / , / ) / e ]
at: [F--~(E)., +/,/)/e]

Fig. 7.43 Sets of items for Gr.

a set of LR(1) tables for the original grammar, provided that certain condi-
tions hold.
ALGORITHM 7.13
Construction of a set of LR(1) tables from the sets of LR(1) items of
component grammars.
lnput. A C F G G = (N, E, P, $1), a splitting set N' = ($1, $2,. • . , Sin},
and a collection {6g, 6 ] , . . . , 6~,} of sets of LR(1) items for each component
grammar Gi.
Output. A valid set of LR(1) tables for G or an indication that the sets of
items will not yield a valid set of tables.
Method.
(1) In the first component of each item, replace each symbol of the form
,Si by S~. Each such S~ is in N'. Retain the original name for each set of items
so altered.
(2) Let ~'0 = [[S'1 ~ • $I, e]}. Apply the following augmenting operation
to a0, and call the resulting set of items ~'0- a0 will then be the "initial" set
of items in the merged collection of sets of items.
Augmenting Operation. If a set of items 6 contains an item with a first
c o m p o n e n t of the form A ~ ~ • Bfl and B *~
G
Sit, for some S i in N', y in
(N U ~)*, then add 6~ to 6. Repeat this process until no new sets of items
can be added to 6.
(3) We shall now construct ,~, the collection of sets of items accessible
from tt o. Initially, let S = (tt0). We then perform step (4) until no new sets
of items can be added to 8.
636 TECHNIQUES FOR PARSER OPTIMIZATION CHAP. 7

(4) Let a be in g. a can be written as ~ U a,l) U ~tl,' U . . . u 6t[;, where


~t is either the empty set or {[S'1 ~ • S , , e]} or {[S', ~ S , . , e]}. For each X
in N u X, let a,' = GOTO(a,, X) and g ~ = GOTO02L,5 X).? Let tf be the
union of g' and these ~]~'s. Then apply the augmenting operation to a' and
call the resulting set of items a'. Let KGOTO~: be the function such that
KGOTO(a, X) = a' if a, X, and a' are related as above. Add a' to g if it is
not already there. Repeat this process until for all ~ in $ and X in N u X,
KGOTO(a, X) is in g.
(5) When no new set of items can be added to g, construct a set of LR(1)
tables from g using the methods of Section 5.2.5. If table T = ( f , g ) is
being constructed from the set of items a, then g ( X ) is KGOTO(a, X). If any
set of items produces parsing action conflicts, report failure. D

Example 7.30
Let us apply Algorithm 7.13 to the sets of items in Figs. 7.42 and 7.43.
The effect of step (1) should be obvious. Step (2) first creates the set of items
ao = { [ E ' - - , • E , e]}, and after applying the augmenting operation, a0 =
{[E' --~ • E, el} U ago U ao~.
At the beginning of step (3), $ = [a0}. Applying step (4), we first compute

~I = GOTO(go, E) = {[E' --~ E . , e]} U ale.

That is, GOTO({[E' --~ • E, el}, E) = [[E'--, E . , e]} and GOTO(ag, E) = af.
GOTO(ff,~, E) is empty. The augmenting operation does not enlarge g,.
We then compute g2 = GOTO(go, T) = ff,f Y ~'. The augmenting operation
does not enlarge a 2. Continuing in this fashion, we obtain the following col-
lection of sets of items for ,~"

go = {[E' > • E, e]} U 12o


~ U aTo
a, = {[E' > E . , e]} U ~

a4 = ~
a~ = ~ u ~ u ~

~rs = elf u e~
tThe GOTO function for G],, is meant here. However if X is splitting nonterminal then
use X in place of X.
:l:The K honors A. J. Korenjak, inventor of the method being described.
SEC. 7.4 TECHNIQUES FOR CONSTRUCTING LR(k) PARSERS 637

alo = a,~
atx : ~

All sets of items in S are consistent, and so from S we can construct


the set of LR(1) tables shown in Fig. 7.44. Table Tt is constructed from a t.
This set of tables is identical to that in Fig. 7.37 (p. 625). We shall see
that this is not a coincidence.

action goto
a + * ( ) e E T F a + * ( )

To s X x s x x T1 T2 T3 T4 X xTsx
T1 X S X X X A X X X X r6 X X X

T2 X 2 S X 2 2 X X X X X T7 X X

T3 X 4 4 X 4 4 X X X X X X X X

T4 X 6 6 X 6 6 X X X X X X X X

T5 S X X S X X T8 T2 T3 T4 X X Ts X

T6 S X X S X X X T9 T3 T4 X X T5 X

T7 S X X S X X X X TlO T4 X X T5 X

T8 X S X X S X X X X X 76 X X Ti1

T9 X 1 S X 1 1 X X X X X T7 X X

Tlo X 3 3 X 3 3 X X X X X X X X

Tll X 5 5 X 5 5 X X X X X X X X

Fig. 7.44 Tables for Go from Algorithm 7.13.

We shall now show that this approach yields a set of LR(1) tables that is
equivalent to the canonical set of LR(1) tables. We begin by characterizing
the merged collection of sets of items generated in step (4) of Algorithm 7.13.
DEFINITION

Let KGOTO be the function in step (4) of Algorithm 7.13. We extend


KGOTO to strings in the obvious way; i.e.,
(1) KGOTO(a, e) = a, and
(2) KGOTO(a, czX) = KGOTO(KGOTO(a, ~), X).
638 TECHNIQUES FOR PARSER OPTIMIZATION CHAP. 7

Let G -- (N, Z, P, S) be a C F G and N' a splitting set. We say that item


[A --~ ~ • fl, a] is quasi-valid for string ), if it is valid (as defined in Section
5.2.3) or if there is a derivation S' *=~ rm
O lBx *=,
rm
61~2Ax =~rm
~162~flx in the
augmented g r a m m a r such that
(1) 61t~2t~ : 7,
(2) B is in N', and
(3) a is in F O L L O W ( B ) .
Note that if [A --~ ~ • fl, a] is quasi-valid for 7, then there exists a lookahead
b such that [A ----. ~ • fl, b] is valid for y. (b might be a.)
LEMMA 7.3
- ~ e t G - - ( N , Z , P , S) be a C F G as above and let KGOTO(a0, ) , ) - -
for some 7 in (N U E)*. Then tt is the set of quasi-valid items for 7.
Proof We prove the result by induction on l7 l- The basis, ~, -- e, is omit-
ted, as it follows from observations made during the inductive step.
Assume that ~ , - - y ' X and ~ , the set of quasi-valid items for ?', is
K G O T O ( a 0 , y') Let ~ = all' U --- U ~.:" The case where [S'
" lk • S, e] or
~ "

[S' --~ S -, e] is in ~ can be handled easily, and we shall omit these details
here. Suppose that [A---~ ~ . p, a] is in a = KGOTO(a0, 7"X). There are
three ways in which [.4 ----~ t~ • fl, a] can be added to a.
Case 1" Suppose that [A ~ tx- ,8, a] is in GOTO(a/~', X) for some p,
a : a ' X and that [A --~ a' • Xfl, a] is in a[,". Then [A --~ a' . Xfl, a] is quasi-
valid for y', and it follows that [A --~ a • fl, a] is quasi-valid for ~,.
Case 2." Suppose that [A --~ a • fl, a] is in GOTO(a/,", X) and that a = e.
Then there is an item [B ---, $1X" COz, b] in GOTO(~/,", X), and C *=, A w,
rm
where a ~ FIRST(wO2b). Then [ B - - , ~ . XCO2, b] is quasi-valid for ?',
and [B ~ ~ i X . Cd~2, b] is quasi-valid for ?. If a is the first symbol of w or
w ---- e and a comes from ~2, then [A --~ a • fl, a] is valid for ~,. Likewise, if
[B --~ O l X . C~2, b] is valid for ),, so is [A - ~ a • fl, a].
Thus, suppose that w-----e, Oz =~ e, a = b, and [B---. O l X " C~2, b] is
quasi-valid, but not valid for ?. Then there is a derivation

S' *=, ~3Dx *=, ~3O4Bx =* ~3O4~XCO2x *=* O3~O~XAx,


rm rm rm rm

where 6 ~ 4 ~ X = ~,, D is in N', and b is in F O L L O W ( D ) . Thus, item


[A ~ ~z • fl, a] is quasi-valid for ~,, since a = b.
Case 3: Suppose that [h ~ a • fl, a] is added to tt during the augmenting
operation in step (4). Then ~ = e, a n d there must be some [B ~ ~ X . COz,b]
in GOTO(a/¢, X) such that C ~r m Dw~ *~
rm
A wzw~, D is in N', and a is in
FIRST(wzc) for some e in F O L L O W ( D ) . That is, D is St, where [A --~ ~x- ,8, a]
is added to ~ when ~ is adjoined. The argument now proceeds as in case 2.
We must now show the converse of the above, that if [A --~ ~ • fl, a] is
SEC. 7.4 TECHNIQUES FOR CONSTRUCTING LR(k) PARSERS 639

quasi-valid for y, then it is in g. We omit the easier case, where the item is
actually valid for ?. Let us assume that [A ~ e • fl, a] is quasi-valid, but not
valid. Then there is a derivation

S' & ,~Bx& O,6~ax~ O,O~e#x


rIIl rm

where ? = d ~ 2 a , B is in N', and a is in FOLLOW(B). If ~ ~ e, then we may


write ~ = ~'X. Then [A ~ ~' • Xfl, a] is quasi-valid for y' and is therefore
in ~ . It is immediate that [A ---~ ~ • fl, a] is in ~.
Thus, suppose that ~ = e. We consider two cases, depending on whether
t~2 -~ e or g2 = e.
Case 1: ~ ~ e. Then there is a derivation B *~ t~3C =~ ~3t~4X~5 and
rm I'm

~5 *~ A. Then [C--~ $4" Xt~5, a] is quasi-valid for ?' and hence is in ~ .


rm

It follows that [C ----~ g4X" ~5, a] is in ~. Since $5 *~ A, it is not hard to show


rm

that either in the closure operation or in the augmenting operation


[A ---, ~ • fl, a] is placed in a.
Case 2:~2 = e. Then there is a derivation S' *~ g3Cy=~ t~3t~4Xt55y, rill rm

where gsy*~ Bx. Then [ C - - - ~ 4 - X~5, c] is valid for y', where c is


rill

FIRST(y). Hence, [C ~ ~ 4 X . ~5, c] is in g. Then, since d;sy *~ Bx, in the rill

augmenting operation, all items of the form [B ~ . e , b] are added to g,


where B ~ e is a production and b is in FOLLOW(B). Then, since B *~ A, rm

the item [A ~ ~ • fl, a] is added to g, either in the modified closure of the set
containing [B ~ • e, b] or in a subsequent augmenting operation. D

THEOREM 7.10
Let (N, E, P, S) be a CFG. Let (3, To) be the set of LR(1) tables for G
generated by Algorithm 7.13. Let (3 o To) be the canonical set. Then the two
sets of tables are equivalent.
Proof We observe by Lemma 7.3 that the table of 3 associated with string
y agrees in action with the table of 3c associated with ~, wherever the latter is
not error. Thus, if the two sets are inequivalent, we can find an input w such
that (To, w) ~ (TcX~T~ ... XmTm, x)t using 3~, and an error is then declared,
while

(To, w) ~ (To X~TI ... XmT'~, x)


F-- (To Y~ U1 "'" Y.U., x)
t--- (To Yi Ui ... Y,U, aU, x')

using 5.

"l'We have omitted the output field in these configurations for simplicity.

-~ 7 i~ - ~ :
640 TECHNIQUES FOR PARSER OPTIMIZATION CHAP. 7

Suppose that table U, is constructed from the set of items ~. Then ~ has
some member [A ~ a - fl, b] such that ,8 ~ e and a is in EFF(fl). Since
[A --~ a • fl, b] is quasi-valid for Y1 "'" Y,, by Lemma 7.3, there is a deri-
vation S' *~
rm
7'Ay =~
rm
Taffy for some y, where 7'a = Yi "'" Y,. Since we have
derivation Yi . . . Y,, *~
rm
X~ . . . Xm, it follows that there is an item
[B --~ ~ • e, c] valid for X~ . . . Xm, where a is in EFF(ec). (The case e = e,
where a = e, is not ruled out. In fact, it will occur whenever the sequence of
steps

(ToX, T'i . . . XmT~, x) l--~--(To Y1 U1 " ' " YnUn, X)

is not null. If that sequence is null, then [ A - - , a . fl, b] suffices for


[B ~ ,~ • e, el.)
Since a = FIRST(x), the hypothesis that 3' declares an error in configu-
ration (ToX~T'i . . . X, nT'm, x) is false, and we may conclude the theorem. [~

Let us compare the grammar-splitting algorithm with the SLR approach.


The grammar-splitting algorithm is a generalization of the SLR method in
the following sense.
THEOREM 7.11
Let G = (N, E, P, S) be a CFG. Then G is SLR(1) if and only if Algorithm
7.13 succeeds, with splitting set N. If so, then the sets of tables produced by
the two methods are the same.
Proof. If N' = N, then [A ~ a • fl, a] is quasi-valid for 7' if and only if
[A ~ a • fl, b] is valid for some b and a is in FOLLOW(A). This is a con-
sequence of the fact that if B *~ OC, then FOLLOW(B) ~ F O L L O W (C).
rm

It follows from Lemma 7.3 that the SLR sets of items are the same as those
generated by Algorithm 7.13. [~]

Theorem 7.9 is thus a corollary of Theorems 7.10 and 7.11.


Algorithm 7.13 brings the full power of the canonical LR(1) parser con-
struction procedure to bear on each component grammar. Thus, from an
intuitive point of view, if a grammar is not SLR, we would like to isolate in
a single component each aspect of the given grammar that results in its being
non-SLR. The following example illustrates this concept.

Example 7.31
Consider the following LR(1) grammar"
(1) S - - , Aa
(2) S--~ dab
(3) s ~ cb
SEC. 7.4 TECHNIQUES FOR CONSTRUCTING LR(k) PARSERS 641

(4) S---, BB
(5) A ~c
(6) B---, Bc
(7) B--, b

This g r a m m a r is n o t S L R because of p r o d u c t i o n s (1), (2), (3), and (5). Using


the splitting set [S, B}, these four p r o d u c t i o n s will be together in one com-
p o n e n t g r a m m a r to which the full LR(1) technique will then be applied.
With this splitting set A l g o r i t h m 7.13 would p r o d u c e the set of L R ( I ) tables
in Fig. 7.45. [~

action goto
a b c d e S A B a b c d

To X S S S X T1 T2 T3 X T4 T5 T6
T~ X X X X A X X X X X X X
T2 S X X X X x x x TT x x x
T3 X S S X X x x T8 X T4 T9 X
T4 X 7 7 X 7 X X X X X X X

T5 5 S X X X X X X X Tlo X X
T6 X X S X X X Tll. X X X T12 X
V7 X X X X 1 X X X X X X X
T8 X X S X 4 x x x x X Tg X
r9 X 6 6 X 6 X X X X X X X

T~o X X X X 3 X X X X X X X

X S X X X X X X X Tt3 X X
r~2 X 5 X X X X X X X X X X
T~3 X X X X 2 X X X X X X X

Fig. 7.45 LR(1) tables.

Finally, we observe that neither A l g o r i t h m 7.10 n o r A l g o r i t h m 7.13 m a k e


m a x i m a l use of the principles of e r r o r p o s t p o n e m e n t a n d table merger. F o r
example, the SLR(1) g r a m m a r in Exercise 7.3.1(a) has a canonical set of 18
LR(1) tables. A l g o r i t h m 7.13 will yield a set o f LR(1) tables containing at
least 14 tables a n d A l g o r i t h m 7.10 will p r o d u c e a set of 14 SLR(1) tables.
642 TECHNIQUES FOR PARSER OPTIMIZATION CHAP. 7

But a judicious application of error p o s t p o n e m e n t and table merger can result


in an equivalent set of 7 LR(1) tables. •

EXERCISES

"7.4.1. Consider the class {G1, G2 . . . . } of LR(0) grammars, where G. has the
following productions"

S ~Ai l~i<n
A~- >aiA i l <i~j<n
A~ > aiBi l bi 1 < i _~ n
Bi ~ ajBt l bi 1 ~ i, j < n

Show that the number of tables in the canonical set of LR(0) tables
for G. is exponential in n.
7.4.2. Show that each grammar in Exercise 7.3.1 is SLR(1).
7.4.3. Show that every LR(0) grammar is an SLR(0) grammar,
*7.4.4. Show that every SMSP grammar is an SLR(1) grammar.
7.4.5. Show that the grammar in Example 7.26 is not SLR(k) for any k _> 0.
*7.4.6. Show that every LL(1) grammar is an SLR(1) grammar. Is every LL(2)
grammar an SLR(2) grammar ?
7.4.7. Using Algorithm 7.10, construct a parser for each grammar in Exercise
7.3.1.
7.4.8. Let ,~c be the canonical collection of sets of LR(k) items for G. Let So
be the sets of LR(0) items for G. Show that ,~c and 80 have the same
sets of cores. Hint: Let ~ = GOTO(~0, ~), where ~0 is the initial set
of So, and proceed by induction on l~ I.
7.4.9. Show that CORE(GOTO(tY, ~)) = GOTO(CORE(~), 06), where ~ ~ Sc
as above.
DEFINITION
A grammar G = (N, E , P , S) is said to be lookahead LR(k)
[LALR(k)] if the following algorithm succeeds in producing LR(k)
tables:
(1) Construct So, the canonical collection of sets of LR(k) items
for G.
(2) For each tY ~ ~c, let (Y' be the union of those (B ~ ~ such
that CORE((B) = CORE(G).
(3) Let S be the set of those ~ ' constructed in step (2). Construct
a set of LR(k) tables from S in the usual manner.
7.4.10. Show that if G is SLR(k), then it is LALR(k).
EXERCISES 643

7.4.11. Show that the LALR table-constructing algorithm above yields a set of
tables equivalent to the canonical set.
7.4.12. (a) Show that the grammar in Example 7.26 is LALR(1).
(b) Show that the grammar in Example 7.27 is not LALR(k) for any k.
7.4.13. Let G be defined by

S----~L= RIR
L----~ * RI a
R -----~ L

Show that G is LALR(1) but not SLR(1).


7.4.14. Show that there are LALR grammars that are not LL, and conversely.
• 7.4.15. Let G be an LALR(1) grammar. Let So be the canonical collection of
sets of LR(0) items for G. We say that So has a shift-reduce conflict
if some ~ ~ So contains items [ A ~ . ] and [ B ~ f l . ay] where
a ~ FOLLOW(A). Show that the LALR parser resolves each shift-
reduce conflict in the LR(0) items in favor of shifting.
"7.4.16. Let (3, To) be the set of SLR(1) tables for an SLR(1) grammar
G = (N, ~, P, S). Show that:
(1) All error entries in the goto field of each table are don't cares.
(2) An error entry on input symbol a in the action field of table T
is essential (not a don't care) if and only if one of the following con-
ditions holds.
(a) T is To, the initial table.
(b) There is a table T ' = (f, g) in ~3 such that T = g(b) for some
binS.
(c) There is some table T' = (f, g) such that T c NEXT (T', i) and
f(a) = reduce i.
"7.4.17. Let G = (N, ~, P, S) be a CFG. Show that G is LR(k) if and only if,
for each splitting set N ' ~ N, all component grammars G,~ are LR(k),
A ~ N'. [Note: G may not be LR(k) but yet have some splitting set
N' such that each Ga is LR(k) for A ~ N'.]
7.4.18. Repeat Exercise 7.4.17 for LL(k) grammars.
7.4.19. Use Algorithms 7.12 and 7.13 to construct a set of LR(1) tables for
Go using the splitting set {E, T, F}. Compare the set of tables obtained
with that in Fig. 7.44 (p. 637).
**7.4.20. Under what conditions will all splitting sets on the nonterminals of a
grammar G cause Algorithms 7.12 and 7.13 to produce the same set
of LR(1) tables for G ?
"7.4.21. Suppose that the LALR(1) algorithm fails to produce a set of LR(1)
tables for a grammar G = (N, ~, P, S) because a set of items containing
[A----~ t~., a] and [B ~ f l ' 7 , b] is generated such that ~, ~ e and
644 TECHNIQUES FOR PARSER OPTIMIZATION CHAP. 7

a ~ EFF(?b). Show that if N' ~ N is a splitting set and A ~ N', then


Algorithm 7.13 will also not produce a set of LR(1) tables with unique
parsing actions.
7.4.22. Use Algorithms 7.12 and 7.13 to try to construct a set of LR(1) tables
for the grammar in Example 7.27 using the splitting set [S, A}.
7.4.23. Use Algorithms 7.12 and 7.13 to construct a set of LR(1) tables for
the grammar in Exercise 7.3.1(a) using the splitting set {S, ,4}.
*7.4.24. Use error postponement and table merger to find an equivalent set of
LR(1) tables for the grammar in Exercise 7.3,1(a) that has seven tables.
*7.4.25. Give an example of a (1, 1)-BRC grammar which is not SLR(1).
*7.4.26. Show that if Algorithm 7.9 is applied to a set of SLR(1) tables for a
grammar having at most one single production for any nonterminal,
then all reductions by single productions are eliminated.

Research Problems
7.4.27. F i n d additional ways of constructing small sets of LR(k) tables for
LR(k) grammars without resorting to the detailed transformations of
Section 7.3. Your methods need not work for all LR(k) grammars but
should be applicable to at least some of the practically important
grammars, such as those listed in the Appendix of VolUme 1.
7.4.28. When an LR(k) parser enters an error configuration, in practice we
would call an error recovery routine that modifies the input and the
pushdown list so that normal parsing can resume. One method of
modifying the error configuration of an LR(1) parser is to search
forward on the input tape until we find one of certain input symbols.
Once such an input symbol a has been found, we look down into the
pushdown list for a table T : (f, g~ such that T was constructed from
a set of items ~ containing an item of the form [A ---, • 0c, a], A ~ S.
The errOr recovery procedure is to delete all input symbols up to a
a n d to remove all symbols and tables above T on the pushdown list.
The nonterminal A is then placed on top of the pushdown list and
table g(A) is placed on top of A. Because of Exercise 7.3.8(c),
g(A) ~ error. The effect of this error recovery action is to assume that
the grammar symbols above T on the pushdown list together with the
input symbols up to a form an instance of A. Evaluate the effectiveness
of this error recovery procedure, either empirically or theoretically.
A reasonable criterion of effectiveness is the likelihood of properly
correcting the errors chosen from a set of "likely" programmer errors.
7.4.29. When a grammar is split, the component grammars can be parsed in
different ways. Investigate ways to combine various types of parsers for
the components. In particular, is it possible to parse one component
bottom up and another top down ?
SEC. 7.5 PARSING AUTOMATA 645

Programming Exercises
7.4.30. Write a program that constructs the SLR(1) set of tables from an
SLR(I) grammar.
7.4.31. Write a program that finds all inaccessible error entries in a set of
LR(1) tables.
7.4.32. Write a program to construct an LALR(1) parser from an LALR(1)
grammar.
7.4.33. Construct an SLR(1) parser with error recovery for one of the gram-
mars in the Appendix of Volume I.

BIBLIOGRAPHIC NOTES

Simple LR(k) grammars and LALR(k) grammars were first studied by DeRemer
[1969, 1971]. The technique of constructing the canonical set of LR(0) items for a
grammar and then using lookahead to resolve parsing decision ambiguities was
also advocated by DeRemer. The grammar-splitting approach to LR parser design
was put forward by Korenjak [1969].
Exercise 7.4.1 is from Earley [1968]. The error recovery procedure in Exercise
7.4.28 was suggested by Leinius [1970]. Exercise 7.4.26 is from Aho and Ullman
[1972d].

7.5. PARSING A U T O M A T A

Instead of looking at an L R parser as a routine which treats the LR tables


as data, in this section we shall take the point of view that the L R tables
control the parser. Adopting this point of view, we can develop another
approach to the simplification of L R parsers.
The central idea of this section is that if the LR tables are in control of
the parser, then each table can be considered as the state of an automaton
which implements the LR parsing algorithm. The automaton can be consider-
ed as a finite automaton with "side effects" that manipulate a pushdown list.
Minimization of the states of the automaton can then take place in a manner
similar to Algorithm 2.2.

7.5.1. Canonical Parsing Automata

An LR parsing algorithm makes its decision to shift or reduce by looking


at the next k input symbols and consulting the governing table, the table on
top of the pushdown list. If a reduction is made, the new governing table is
determined by examining the table on the pushdown list which is exposed
d u r i n g the reduction.
646 TECHNIQUES FOR PARSER OPTIMIZATION CHAP. 7

I t is entirely possible to imagine that the tables themselves are parts of


a program, and that program control lies with the governing table itself.
Typically, the program will be written in some easily interpreted language,
so the distinction between a set of tables driving a parsing routine and an
interpreted program is not significant.
Let G be an LR(0) grammar and (5, To) the set of LR(0) tables for G.
From the tables we shall construct a parsing automaton for G that mimics
the behavior of the LR(0) parsing algorithm for G using the set of tables 3.
We notice that if T = ( f , g ) is an LR(0) table in 3, then f(e) is either
shift, reduce, or accept. Thus, we can refer to tables as "shift" tables or
"reduce" tables as determined by the parsing action function. We shall
initially have one program for each table. We can interpret these programs
as states of the parsing automaton for G.
A shift state T ~ - ( f , g) does the following:
(1) T, the name of the state, is placed on top of the pushdown list.
(2) The next input symbol, say a, is removed from the input and control
passes to the state g(a).
In the previous versions of LR parsing we also placed the input symbol a
on the pushdown list on top of the table T. But, as we have pointed out,
storing the grammar symbols on the pushdown list is not necessary, and for
the remainder of this section we shall not place a n y grammar symbols on
the pushdown list.
A reduce state does the following"
(1) Let A ~ ~ be production i according to which the reduction is to be
made. The top Itzl -- 1 symbols are removed (popped) from the pushdown
list.t (If ~ = e, then the controlling state is placed on top of the pushdown
list.)
(2) The state name now on top of the pushdown list is determined. Sup-
pose that that state is T = ( f , g). Control passes to the state g(A) and the
production number i is emitted.
A special case occurs when the "reduce" action really means accept.
In that case, the entire process terminates and the automaton enters an
(accepting) final state.
It is straightforward to show that the collection of states defined above
will do to the pushdown list exactly what the LR(k) parser does (except here
we have not written any grammar symbols on the pushdown list) if we
identify the state names and the tables from which the states are derived.
The only exception is that the LR(k) parsing algorithm places the name of
the governing table on top of the pushdown list, while here the name of

tWe remove loci -- 1 symbols, rather than t~1 symbols because the table corresponding
to the rightmost symbol of 0cis in control and does not appear on the list.
SEC. 7.5 PARSING AUTOMATA 647

that table does not appear but is indicated by the fact that program control
lies with that table.
We shall now define the parsing automaton which executes these parsing
actions directly. There is a state of the automaton for each state (i.e., table)
in the above sense. The input symbols for the automaton are the terminals
of the grammar and the state names themselves. A shift state makes tran-
sitions only on terminals, and a reduce state makes transitions only on state
names. I n fact, a state T calling for a reduction according to production
A ~ ~ need have a transition specified for state T' only when T is in
GOTO(T', ~).
We should remember that this automaton is more than a finite automaton,
in that the states have side effects on a pushdown list. That is, each time
a state transition occurs, something happens to the pushdown list which is
not reflected in the finite automaton model of the system. Nevertheless, we
can reduce the number of states of the parsing automaton in a manner similar
to Algorithm 2.2. The difference in this case is that we must be sure that all
subsequent side effects are the same if two states are to be placed in the same
equivalence class. We now give a formal definition of a parsing automaton.
DEFINITION

Let G = (N, Z, P, S) be an LR(0) grammar and (3, To) its set of LR(0)
tables. We define an incompletely specified automaton M, called the canonical
parsing automaton for G. M is a 5-tuple (3, Z U 3 U {$}, ~, To, [T~}), where
(1) 3 is the set of states.
(2) E U 3 is the set of possible input symbols. The symbols in Z are on
the input tape, and those i n 3 are on the pushdown list. Thus, 3 is both the
set of states and a subset of the inputs to the parsing automaton.
(3) ~ is a mapping from 3 × (Z U 3) to 3. c~is defined as follows:
(a) If T ~ 3 is a shift state, 6(T, a ) = GOTO(T, a) for all a ~ 2~.
(b) If T ~ 3 is a reduce state calling for a reduction by production
A ~ ~x and if T' is in GOTO-I(T, tz) [i.e., T = GOTO(T', tz)],
then c~(T, T') = GOTO(T', A).
(c) O(T, X) is undefined otherwise.
The canonical parsing automaton is a finite transducer with side effects
on a pushdown list. Its behavior can be described in terms of configurations
consisting of 4-tuples of the form (~t, T, w, n), where
(1) 0c represents the contents of the pushdown list (with the topmost
symbol on the right).
(2) Z is the governing state.
(3) w is the unexpended input.
(4) n is the output string to this point.
Moves can be reflected by a relation ~- on configurations. If T is a shift
648 TECHNIQUES FOR PARSER OPTIMIZATION CHAP. 7

state and J(T, a) = T', we write (oc, T, aw, n) ~ (ocT, T', w, n). If T calls for
a reduction according to the ith production A ~ ~, and J(T, T') ---- T", we
write (ocT'fl, T, w, n) ~ (ocT', T", w, hi) for all fl of length I~, [ -- 1. If l~'[----0,
then (~z, T, w, n) ~ (aT, T", w, hi). In this case, T and T' are the same. Note
that if we had included the controlling state symbol as the top symbol of
the pushdown list, we would have the configurations of the usual LR parsing
algorithm.
We define ~--., I--, and ~ in the usual fashion. An initial configuration is
one of the form (e, To, w, e), and an accepting configuration is one of the form
(To, T1, e, z0. If (e, To, w, e)]--- (To, T1, e, n), then we say that n is the parse
produced by M for w.

Example 7.32
Let us consider the LR(0) grammar G
(l) S--~ aA
(2) S--~ aB
(3) A --~ bA
(4) A --* c
(5) B - - , bB
(6) B--~ d
generating the regular set ab*(c + d). The ten LR(0) tables for G are listed
in Fig. 7.46.

action goto
e S A B a b c d

To T1 X X T2 X X X

T1 X X X X X X X

T2 x T3 T4 X T5 T, T7
T3 X X X X X X X

T4 X X X X X X X

T5 x T8 T9 X T5 T, T7
T6 X X X X X X X

T7 X X X X X X X

T8 X X X X X X X

T9 X X X X X X X

Fig. 7.46 LR(0) tables.


SEC. 7.5 PARSING AUTOMATA 649

To, T2, and T5 are shift states. Thus, we have a parsing automaton with
the following shift rules"

6(To, a) = T~
,~(T~, b) = T~
6(T~, c) = T~
6(T~, d) = r~
6(T~, b) = T~
6(T~, c) = T~
6(T5, d) = T 7

We compute the transition rules for the reduce states T3, T4, T6, T7, Ts and
Tg. Table T3 reduces using production S ~ aA. Since GOTO-I(T3, aA)={To}
and GOTO(To, S ) = TI, ~(T3, To) = T1 is the only rule for T3. Table T7
reduces by B ---~ d. Since GOTO-I(TT, d) = [T2, Ts}, GOTO(T2, B) = T4, and
GOTO(Ts, B) = Tg, the rules for T7 are ~(TT, T2) = T4 and ~6(T7, Ts) = Tg.
The reduce rules are summarized below"

~(T3, To) : T 1
~(Z,, To) : T,
~(T6, T2) : T3 ~(T6, Ts) : Ts
6(T~, T9 = T, ~(T7, Ts) : T9
~(Ts, T 9 = Z~ ~(T8, Ts) : Ts
6(T~, T9 : T~ ~(Tg, Ts) : T9

The transition graph of the parsing automaton is shown in Fig. 7.47.


With input abc, this canonical automaton would enter the following
sequence of configurations:

(e, To, abe, e) ~- (To, Tz, be, e)


(ToTs, T~, c, e)
1-- (ToT2Ts, T6, e, e)
(Torero, T~, e, 4)
(TOT2, T3, e, 43)
1- (To, Ti, e, 431)

Thus, the parsing automaton produces the parse 431 for the input abc.
D
650 TECHNIQUESFOR PARSER OPTIMIZATION CHAP. 7

start

¢ c

T7

Ts

T5 T8 T 9 ~ T5

/,o

Fig. 7.47 Transition graph of canonical automaton.

7.5.2. Splitting the Functions of States

One good feature of the parsing automaton approach to parser design is


that we can often split the functions of certain states and attribute them to
two separate states, connected in series. If state A is split into two states A I
and A 2 while B is split into B 1 and B2, it may be possible to merge, say A 2
and B2, while it was not possible to merge A and B. Since the amount of
work done by A1 and A 2 (or B1 and B2) exactly equals the amount done by
A (or B), no increase in cost occurs if the split is made. However, if state
mergers can be made, then improvement is possible. In this section we shall
explore ways in which the functions of certain states can be split with the hope
of merging common actions.
We shall split every reduce state into a pop state followed by an interroga-
tion state. Suppose that T is a reduce state calling for a reduction by produc-
tion A --~ ~t whose number is i. When we split T, we create a pop state whose
sole function is to remove [~l -- 1 symbols from the top of the pushdown
list. If 0~ = e, the pop state will actually add the name T to the top of the
pushdown list. In addition, the pop state will emit the production number i.
Control is then transferred to the interrogation state which examines
SEC. 7.5 PARSINGAUTOMATA 651

the state name now on top of the pushdown list, say U, and then transfers
control to GOTO(U, A).
In the transition graph we can replace a reduce state T by a pop state,
which we shall also call T, and an interrogation state, which we shall call T'.
All edges entering the old T still enter the new T, but all edges leaving the old
T now leave T'. One unlabeled edge goes from the new T to T'. This trans-
formation is sketched in Fig. 7.48, where production i is A--~ 0~.
Shift states and the accept state will not be split here.

T Pop
State

Interrogation
State
Old reduce state

Split Reduce State

Fig. 7.48 Splittingreduce states.

The automaton constructed from a canonical parsing automaton in


the above manner is called a split canonical parsing automaton.

Example 7.33
The split parsing automaton from Fig. 7.47 is shown in Fig. 7.49. We
show shift states by [], pop states by A , and interrogation and accept states
by (D.
To compare the behavior of this split automaton with the automaton in
Example 7.32, consider the sequence of moves the split automaton makes on
input abc :

(e, To, abe, e) ]-- (To, Tz, be, e)


(roT~, r~, c, e)
(ToT2Ts., T6, e, e)
[-- (TorzTs, T~, e, 4)
652 TECHNIQUES FOR PARSER OPTIMIZATION CHAP. 7

(ToTzTs, Ts, e, 4)
I--- (TOT2, T~, e, 43)
J-- (TOT2, T3, e, 43)
(To, T~, e, 431)
l- (To, T~, e, 431)
D

start

b °

L I a¸ (
s

Fig. 7.49 Split canonical automaton.

If M1 and Mz are two parsing automata for a grammar G, then we say


that Mi and Mz are equivalent if, for each input string w, they both produce
the same parse or they both produce an error indication after having read
SEC. 7.5 PARSING AUTOMATA 653

the same number of input symbols. Thus, we are using the same definition
of equivalence as for two sets of LR(k) tables.
If the canonical and split canonical automata are run side by side, then
it is easy to see that the resulting pushdown lists are the same each time
the split automaton enters a shift or interrogation state. Thus, it should be
evident that these two automata are equivalent.
There are two kinds of simplifications that can be made to split parsing
automata. The first is to eliminate certain states completely if their actions
are not needed. The second is to merge states which are indistinguishable.
The first kind of simplification eliminates certain interrogation states.
If an interrogation state has out-degree 1, it may be removed. The pop
state connected to it will be connected directly, by an unlabeled edge, to
the state to which the interrogation state was connected.
We call the automaton constructed from a split canonical automaton
by applying this simplification a semireduced automaton.

Example 7.34
Let us consider the split parsing automaton of Fig. 7.49. T~ and T~ have
only one transition, on To. Applying our transformation, these states and
the To transitions are eliminated. The resulting semireduced automaton is
shown in Fig. 7.50.

THEOREM 7.12
A split canonical parsing automaton M1 and its semireduced automaton
M~ are equivalent.
Proof. An interrogation state does not change the symbols appearing on
the pushdown list. Moreover, if an interrogation state T has only one transi-
tion, then the state labeling that transition must appear at the top of the push-
down list whenever M1 enters state T. This follows from the definition of
the GOTO function and of the canonical automaton. Thus, the first trans-
formation does not affect any sequence of stack, input or output moves
made by the automaton. [Z]

We now turn to the problem of minimizing the states of a semireduced


automaton by merging states whose side effects (other than placing their
own name on the pushdown list) are the same and which transfer on corre-
sponding edges to states that are themselves indistinguishable. The mini-
mization algorithm is similar in spirit to Algorithm 2.2, although
modifications are necessary because we must account for the operations on
the pushdown list.
DEFINITION
Let M = (Q, E, ~, q0, {ql}) be a semireduced parsing automaton, where
q0 is the initial state and q l the accept state. Note that Q ~ E. Also, we shall
654 TECHNIQUES FOR PARSER OPTIMIZATION CHAP. 7

start

T2 - 7

T5

Fig. 7.50 Semi-reducedautomaton.

use e to "label" transitions hitherto unlabeled. We say that p and q in Q are


0
O-indistinguishable, written p~ q, if one of the following conditions is satisfied
by the transition diagram for M (the case p = q is not excluded):

(1) p and q are both shift states.


(2) p and q are both interrogation states.
(3) p and q are both pop states, and they pop the same number of symbols
from the pushdown list and cause the same production number to be emitted.
(That is, p and q reduce according to the same production.)
(4) p = q = q l (the final state).

Otherwise, p and q are 0-distinguishable. In particular, states of different


types are always 0-distinguishable.
k
We say that p and q are k-indistinguishable, written p - ~ q, if they are
( k - 1)-indistinguishable and one of the following holds:
SEC. 7.5 PARSING AUTOMATA 655

(1) p and q are shift states and


(a) For every a ~ X u {e}, an edge labeled a leaves either both or
neither of p and q. If an edge labeled a leaves each of p and q
and enters p' and q', respectively, then p' and q' are ( k - 1)-
indistinguishable states.
(b) There is no interrogation state with transitions on p and q to
(k -- !)-distinguishable states.
(2) p and q are pop states and the edges leaving them go to ( k - 1)-
indistinguishable states.
(3) p and q are interrogation states, and for all states s either both or
neither o f p and q have transitions on s. If both do, then the transitions lead
to ( k - 1)-indistinguishable states.
Otherwise, p and q are k-distinguishable. We say that p and q are indistin-
guishable, written p - - q, if they are k-indistinguishable for all k _> 0. Other-
wise, p and q are distinguishable.
LEMMA 7.4
Let M = (Q, E, 6, qo, {ql]) be a semireduced automaton. Then
(1) For a l l k , k IS
- -
. an equivalence relation on Q, and
k+l k+l k+2
(2) If ~ -- ~ , t h e n ~ -- ~ -- . . . .
Proof Exercise similar to Lemma 2.11.
Example 7.35
Consider the semireduced automaton of Fig. 7.50. Recalling the LR(0)
tables from which that automaton was constructed (Fig. 7.46 on p. 648),
we see that all six pop states reduce according to different productions and
hence are 0,distinguishable. The other kinds of states are, by definition,
0
0-indistinguishable from those of the same kind, and so - - has equivalence
classes [To, Tz, Ts}, {T1}, {T3], {T4}, [T6}, {TT}, {T8}, {Tg}, [T~, T~, T~, T~].
1
To compute ~ , we observe that Tz and T5 are 1-distinguishable, because
T~ branches to 0,distinguishable states T3 and T8 on T2 and Ts, respectively.
Also, To is 1-distinguishable from Tz and Ts, because the former has a transi-
tion on a, while the latter do not. T~ and T~ are 1-distinguishable because they
branch on Tz to 0-distinguishable states. Likewise, the pairs T~-T~, T~-T~
and T~-T~ are 1-distinguishable. The other pairs which are 0-indistinguish-
able are also 1-indistinguishable. Thus, the equivalence classes of ~ which
have more than one member are {T~, T~} and {T~r, T~}. We find that =2 = !

DEFINITION
Let M1 = (Q, E, J, q0, {ql}) be a semireduced automaton. Let Q' be
the set of equivalence classes of -- and let [q] stand for the equivalence class
656 T E C H N I Q U E S FOR P A R S E R O P T I M I Z A T I O N CI-IAP. 7

containing q. The reduced automaton for M1 is M2 -- (Q', E', 6', [q0], {[ql]}),
where
(1) E ' - - ( Z - Q) u Q';
(2) For all q ~ Q and a ~ E U {e} -- Q, 6'([q], a) -- [~(q, a)]; and
(3) For all q and p ~ Q, 5'([q], [p]) = [~(q, p)].

Example 7.36
In Example 7.35 we found that T~ ~ T~ and T~ ~ T~. The transition
graph of the reduced automaton for Fig. 7.50 is shown in Fig. 7.51. T~ and
T~r have been chosen as representatives for the two equivalence classes with
more than one member. [~]

From the definition of ~ , it follows that the definition of the reduced


automaton is consistent; that is, rules (2) and (3) of the definition do not
depend on which representative of an equivalence class is chosen.
We can also show in a straightforward way that the reduced and semi-

start -

a b

/
7

T1

Fig. 7.51 Reduced automaton.


SEC. 7.5 PARSING AUTOMATA 657

reduced automata are equivalent. Essentially, the two automata will always
make similar sequences of moves. The reduced automaton enters a state
representing the equivalence class of each state entered by the semireduced
automaton. We state the correspondence formally as follows.
THEOREM 7.13
Let Mi be a semireduced automaton and Mz the corresponding reduced
automaton. Then for all i > 0, there exist T , , . . . , T~, T such that

(e, To, w, e)[--~, (TOT, . . . T,,,, T, x, n)

if and only if

(e, [To], w, e ) I ~ ([T0][T,] . . . [Tm], [T], x, n:),

where To is the initial state of Mi and [u] denotes the equivalence class of
state u.
Proof. Elementary induction on i. [~

COROLLARY

M1 and M2 are equivalent. [~]

7.5.3. G e n e r a l i z a t i o n s t o LR(k) Parsers

We can also construct a canonical parsing automaton from a set of LR(k)


tables for an LR(k) grammar with k > 0. Here we consider the case in which
k = 1. The parsing automaton behaves in much the same fashion as the
canonical parsing automaton for an LR(0) grammar, except that we now can-
not classify each table solely as a shift, reduce, or accept table.
As before, there will be a state of the automaton corresponding to each
table. If the automaton is in configuration (ToT1 . . . Tin, T, w, n), the auto-
maton behaves as follows:
(1) The lookahead string a = FIRST(w) is determined.
(2) A decision is made whether to shift or reduce. That is, if T = ( f , g),
then f ( a ) is determined.
(a) If f ( a ) = shift, then the automaton moves into the configuration
(TOT1 "'" TroT, T', w', n), where T' = g(a) and aw' = w.
(b) If f ( a ) -= reduce i and production i is A ----~a, where I~1 = r > 0,
the automaton enters configuration (ToT1 -.- Tin-r+1, T', w, zti),
where T' = g'(A) if Tin_r+ 1 = ( f ' , g ' ) . [If the production is A ---~ e,
then the resulting configuration will be (TOT, . . . TroT, T', w, rti),
where T' = g(A), assuming that T -= ( f , g).]
(c) If f ( a ) = accept or error, then the automaton halts and reports
acceptance or rejection of the input.
It is possible to split states in various ways, so that particular pushdown
658 TECHNIQUES FOR PARSER OPTIMIZATION CHAP. 7

list operations can be isolated with the hope of merging common operations.
Here, we shall consider the following state-splitting scheme.
Let Tbe a state corresponding to table T = ( f , g). We split this state into
read, push, pop, and interrogation states as follows"
(1) w e create a read state labeled T which reads the next input symbol.
Read states are indicated by G.
(2) If f(a) = shift, we then create a push state labeled T" and draw an edge
with label a from the read state T to this push state. If g(a) = T', we then
draw an unlabeled edge from T" to the read state labeled T'. The push state
T" has two side effects. The input symbol a is removed from the input, and
the table name T is pushed on top of the pushdown list. We indicate push
states by ~ .
(3) If f(a) = reduce i, then we create a pop state T1 and an interrogation
state T2. A n edge labeled a is drawn from T to T1. If production i is A ~ a,
then the action of state T~ is to remove l a[ -- 1 symbols from the top of the
pushdown list and to emit the production number i. If a = e, then T1 places
the original table name T on the pushdown list. Pop states are indicated by ~..
An unlabeled edge is then drawn from T1 to T2. The action of T2 is to examine
the symbol now on top of the pushdown list. If GOTO-~(T, a) contains T'
and GOTO(T', A) = T", then an edge labeled T' is drawn from T2 to the
read state of T". Interrogation states are also indicated by G. The labels on
the edges leaving distinguish these circles from read states.
Thus, state T would be represented as in Fig. 7.52 if f(a) = shift and
f(b) = reduce i.

Read State b
3'
7i Pop State

Push State [ a Interrogation State

Fig. 7.52 Possiblerepresentation for state T.

(4) The accepting state is not split.

Example 7.37

Let us consider the LR(1) grammar G


SEC. 7.5 PARSING AUTOMATA 659

(1) s---, A s
(2) A ---~ aAb
(3) A ---, e
(4) B---~ b B
(5) B----~b
A set of LR(1) tables for G is shown in Fig. 7.53.

action goto

a b e S A B a b

To S 3 X T1 T2 X T3 X

T~ X X A X X X X X

T2 X S X X X T4 x r5

T3 S 3 X x /'6 X T3 X

T4 X X 1 X X X X X

T5 X S 5 x X rT X rs

T6 X S X X X X X T8

r7 X X 4 X X X X X

r8 X 2 X X X X X X

Fig. 7.53 L R ( I ) tables.

The parsing automaton which results from splitting states as described


above is shown in Fig. 7.54. [-]
There are several ways in which the number of states in a split automaton
for an LR(1) grammar can be reduced:
(1) If an interrogation state has only one edge leaving, the interrogation
state can be eliminated. This simplification is exactly like the corresponding
LR(0) simplification.
(2) Let T be a read state such that in every path into T the last edge
entering a pop state is always labeled by the same input symbol or always by
e. Then the read state may be eliminated. (One can show that in this case
the read state has only one edge leaving and that the edge is labeled by
that symbol.)

Example 7.38
Let us consider the automaton of Fig. 7.54. There are three interrogation
states with out-degree 1, namely T~, T~, and T]. These states and the edges
leaving can all be eliminated.
660 TECHNIQUESFOR PARSER OPTIMIZATION CHAP. 7

Start

r3

T5

T2

T0

Fig. 7.54 Split a u t o m at o n for LR(1) grammar.

Next, let us consider read state T6. The only way T6 can be reached is via
the paths from T3 and T~ or T8 and T~. The previous input symbol label is b
in either case, meaning that if T3 or 7'8 see b on the input, they transfer to T~
or T~ for a reduction. The b remains on the input until 7'6 examines it and
decides to transfer to T b for a shift. Since we know that the b is there, 7'6 is
superfluous; T~ can push the state name 7'6 on the pushdown list without
looking at the next input symbol, since that input symbol must be b if the
automaton has reached state/'6.
SEC. 7.5 PARSINGAUTOMATA 661

Similarly, Tz can be eliminated. The resulting automaton is shown in


Fig. 7.55. [[]

As with the LR(0) semireduced automaton, we can also merge compatible


states without affecting the action of the automaton. We leave these matters
to the reader. In Fig. 7.55, the only pair of states which can be merged is
T~ and T~.

Start

To
T3

Fig. 7.55 Semireduced automaton.

7.5.4. Chapter Summary

In this chapter we have seen a large number of techniques that can be


used to reduce the size and increase the speed of parsers. In view of all these
possibilities, how should one go about constructing a parser for a given
grammar?
First a decision whether to use a top-down or bottom-up parser must be
made. This decision is affected by the types of translations which need to be
computed. The matter will be discussed in Chapter 9.
662 TECHNIQUES FOR PARSER OPTIMIZATION CHAP. 7

If a top-down parser is desired, an LL(1) parser is recommended. To


construct such a parser, we need to perform the following steps:
(1) The grammar must be first transformed into an LL(1) grammar.
Very rarely will the initial grammar be LL(1). Left factoring and elimination
of left recursion are the primary tools in attempting t o make a grammar
LL(1), b u t t h e r e is no guarantee that these transformations will always
succeed. (See Examples 5.10 and 5.11 on p. 345 of Volume I.)
(2) However, if we can obtain an equivalent LL(1) grammar G, then
using the techniques of Section 5.1 (Algorithm 5.1 in particular), we can
readily produce an LL(1) parsing table for G. The entries in this parsing
table will in practice be calls to routines that manipulate the pushdown list,
produce output, or generate error messages.
(3) There are two techniques that can be used to reduce the size of the
parsing table"
(a) If a production begins with a terminal symbol, then it is not
necessary to stack the terminal symbol if we advance the input
pointer. (That is, if production A ---~ a~ is to be used and a is
the current input symbol, then we put ~ on the stack and move the
input head one symbol to the right.) This technique can reduce
the number of different symbols that can appear on the pushdown
list and hence the number of rows in the LL(1) parsing table.
(b) Several nonterminals with similar parsing actions can be com-
bined into a single nonterminal with a "tag" which describes
what nonterminal it represents. Nonterminals representing
expressions are amenable to this combination. (See Exercises
7.3.28 and 7.3.31.)
If a bottom-up parser is desired, then we recommend a deterministic
shift-reduce parsing algorithm such as an SLR(1) parser or LALR(1) parser,
if necessary. It is easy to describe the syntax of most programming languages
by an SLR(1) grammar, so little preliminary modification of the given gram-
mar should be necessary. The size of an SLR(1) or LALR(1)parser can be
reduced significantly by a few optimizations. It is usually worthwhile to
eliminate reductions by single productions.
Further space optimization is possible if we implement an LALR(1) parser
in the style of the production language parsers of Section 7.2. The parsing
action entries of each LR(1) table could be implemented as a sequence of
shift statements, followed by a sequence of reduce statements, followed by
one unconditional error statement. If all reduce statements involve the same
production, then all these reduce statements and the following error statement
could be replaced by one statement which reduces by that production regard-
less of the input. The error-detecting capability of the parser would not be
affected by this optimization. See Exercises 7.3.23 and 7.5.13. The non-g0 goto
EXERCISES 663

entries for each LR(1) table could be stored as a list of pairs (A, T) meaning
on nonterminat A place T on top of the stack. The gotos on terminals could
be encoded in the shift statements themselves. Note that no p-entries would
have to be stored. The optimizations of Section 7.2 merging common se-
quences of statements would then be applicable.
These approaches to parser design have several practical advantages.
First, we can mechanically debug the resulting parser by generating input
strings that will check the behavior of the parser. F o r example, we can easily
construct input strings that cause each useful entry in an LL(1) or LR(1)
table to be exercised. Another advantage of LL(1) and LR(1) parsers,
especially the former, is that minor changes to the syntax or semantics can
be made by simply changing the appropriate entries in a parsing table.
Finally, the reader should be aware that certain ambiguous grammars
have " L L " o r " L R " parsers that are formed by resolving parsing action con-
flicts in an apparently arbitrary manner (see Exercise 7.5.14). Design of
parsers of this type warrants further research.

EXERCISES

7.5.1. Construct a parsing automaton M for Go. Construct from M an equiva-


lent split, a semireduced, and a reduced automaton.
7.5.2. Construct parsing automata for each grammar in Exercise 7.3.1.
7.5.3. Split states to construct reduced parsing automata from the parsing
automata in Exercise 7.5.2.
7.5.4. Prove Lemma 7.4.
7.5.5. Prove that the definition of the reduced automaton in Section 7.5.2 is
consistent; that is, if p _= q, then d;(p, a) -_- 6(q, a) for all a in E' u {e}.
7.5.6. Prove Theorem 7.13.

DEFINITION

Let G = (N, E,P, S') be an augmented CFG with productions


numbered 0, 1, . . . such that the zeroth production is S'--~ S. Let E' =
(@ 0, @1 . . . . , @p} be a set of special symbols not in N U E. Let the
ith production be A ---~ fl and suppose that S' *~ t~Aw =~. rm
ocflw. Then
t~fl#~ is called a characteristic string of the righ~-~sentential form t~flw.
*7.5.7. Show that the set of characteristic strings for a CFG is a regular set.
7.5.8. Show that a CFG G is unambiguous if and only if each right-sentential
form of G, except S', has a unique characteristic string.
664 TECHNIQUES FOR PARSER O P T I M I Z A T I O N CHAP. 7

7.5.9. Show that a C F G is LR(k) if and only if each right-sentential form t~flw
such that S" *~
rm
~Aw ==~
rm
~flw has a characteristic string which may be
determined from only ~fl and FIRSTk(W).
7.5.10. Let G = (N, E, P, S) be an LR(0) grammar and (3, To) its canonical set
of LR(0) tables. Let M = (3, E U 3, ~, To, {T1}) be the canonical
parsing automaton for G. Let M ' = (3 U (qr}, E u E', 6', To, {qs]) be
the deterministic finite automaton constructed from M by letting
(1) ~'(T, a) -- ~(T, a) for all T in 3 and a in E.
(2) O'(T, :~,.) -- qs ifO(T, T') is defined and T' is areduce state calling
for a reduction using production i.
Show that L(M') is the set of characteristic strings for G.
7.5.11. Give an algorithm for merging "equivalent" states of the semireduced
automaton constructed for an LR(1) grammar, where equivalence is
taken to mean that the two states have transitions on the same set of
symbols and transitions on each symbol are to equivalent states.t
*%5.12. Suppose that we modify the definition of "equivalence" in Exercise
7.5.11 to admit the equivalence of states that transfer to equivalent states
on symbol a whenever both states have a transition on a. Is the resulting
automaton equivalent (in the formal sense, meaning one may not shift
if the other declares error) to the semireduced automation ?
7.5.13. Suppose a read state T of an LR(1) parsing automaton has all its tran-
sitions to pop states which reduce by the same production. Show that if
we delete T and merge all those pop states to one, the new automaton will
make the reduction independent of the lookahead, but will be equivalent
to the original automaton,
"7.5.14. Let G be the ambiguous grammar with productions

S -+ if b then SEIa
E --~ else S [ e

(a) Show that L(G) is not an LL language.


(b) Construct a 1-predictive parser for G assuming that whenever E
is on top of the stack and else is the next input symbol, production
E ~ else S is to be applied.
(c) Construct an LR(1) parser for G by making an analogous assump-
tion.

Research Problems
7.5.15. Apply the technique used here--breaking a parser into a large number
of active components and merging or eliminating some of t h e m - - t o
parsers other than LR ones. For example, the technique in Section 7.2

tThis definition can be made precise by defining relations 0=, __1_. . . . as in Lemma 7.2.
BIBLIOGRAPHIC NOTES 665

effectively treated the rows of a precedence matrix as active elements.


Develop techniques applicable to LL parsers and various kinds of pre-
cedence parsers.
7.5.16. Certain states of a canonical parsing automaton may recognize only
regular sets. Consider splitting the shift states of a parsing automaton
into scan states and push states. The scan state might remove input
symbols and emit output but would not affect the pushdown list. Then
a scan state might transfer control to another scan state or a push state.
Thus, a set of scan states cart behave as a finite transducer. A push state
would place the name of the current state on the pushdown list. Develop
transformations that can be used to optimize parsing automata with
scan and push states as well as split reduce states.
7.5.17. Express the optimizations of Section 7.3 and 7.5 in each other's terms.

Programming Exercises
7.5.18. Design elementary operations that can be used to implement split
canonical parsing automata. Construct an interpreter for these elemen-
tary operations.
7.5.19. Write a program that will take a split canonical parsing automaton and
construct from it a sequence of elementary operations that simulates
the behavior of the parsing automaton.
7.5.20. Construct two LR(1) parsers for one of the grammars in the Appendix
of Volume 1. One LR(1) parser should be the interpretive LR(1) parser
working from a set of LR(1) tables. The other should be a sequence of
elementary operations simulating the parsing automaton. Compare the
size and speed of the parsers.

BIBLIOGRAPHIC NOTES

The parsing automaton approach was suggested by DeRemer [1969]. The


definition of characteristic string preceding Exercise 7.5.7 is from the same source.
Classes of ambiguous grammars that can be parsed by LL or LR means are dis-
cussed by Aho, Johnson, and Ullman [1972].
THEORY OF DETERMINISTIC
PARSING

In Chapter 5 we were introduced to various classes of grammars for which


we can construct efficient deterministic parsers. In that chapter some inclu-
sion relations among these classes of grammars were demonstrated. For
example, it was shown that every (m, k)-BRC grammar is an LR(k) grammar.
In this chapter we shall complete the hierarchy of relationships among these
classes of grammars.
One can also ask what class of languages is generated by the grammars in
a given class. In this chapter we shall see that most of the classes of grammars
in Chapter 5 generate exactly the deterministic context-free languages.
Specifically, we shall show that each of the following classes of grammars
generates exactly the deterministic context-free languages"
(1) LR(1),
(2) (1, 1)-BRC,
(3) Uniquely invertible (2, 1)-precedence, and
(4) Simple mixed strategy precedence.
In deriving these results we provide algorithms to convert grammars of one
kind into another. Thus, for each deterministic context-free language we can
find a grammar that can be parsed by a UI (2, 1)-precedence or simple mixed
strategy precedence algorithm. However, if these conversion algorithms are
used indiscriminately, the resulting grammars will often be too large for
practical use.
There are three interesting proper subclasses of the deterministic context-
free languages:
(1) The simple precedence languages,
(2) The operator precedence languages, and
(3) The LL languages.

666
SEC. 8.1 THEORY OF LL LANGUAGES 667

The operator precedence languages are a proper subset of the simple pre-
cedence languages and incommensurate with the LL languages. In Chapter
5 we saw that the class of UI weak precedence languages is the same as the
simple precedence languages. Thus, we have the hierarchy of languages
shown in Fig. 8.1.

Deterministic
Context-free Languages

Simple
Precedence

LL

Operator
Precedence

Fig. 8.1 Hierarchyof deterministic context-free languages.

In this chapter we shall derive this hierarchy and mention the most strik-
ing features of each class of languages. This chapter is organized into three
sections. In the first, we shall discuss LL languages and their properties.
In the second, we shall investigate the class of deterministic languages, and
in the third section we shall discuss the simple precedence and operator
precedence languages.
This chapter is the caviar of the book. However, it is not essential to
a strict diet of"theory of compiling," and it can be skipped on a first reading.'t

8.1. T H E O R Y OF LL L A N G U A G E S

We begin by deriving the principal results about LL languages and gram-


mars. In this section we shall bring out the following six results"

'tReaders who dislike caviar can skip it on subsequent readings as well.


668 THEORY OF DETERMINISTIC PARSING CHAP. 8

(1) Every LL(k) grammar is an LR(k) grammar (Theorem 8.1).


(2) For every LL(k) language there is a Greibach normal form LL(k q-- 1)
grammar (Theorem 8.5).
(3) It is decidable whether two LL grammars are equivalent (Theorem
8.6).
(4) An e-free language is LL(k) if and only if it has an e-free LL(k -q- 1)
grammar (Theorem 8.7).
(5) For k > 0, the LL(k) languages are a proper subset of the LL(k -t- 1)
languages (Theorem 8.8).
(6) There exist LR languages which are not LL languages (Exercise
8.1.11).

8.1.1. LL and LR Grammars

Our first task is to prove that every LL(k) grammar is an LR(k) grammar.
This result can be intuited by the following argument. Consider the deriva-
tion tree sketched in Fig. 8.2.

w x y Fig. 8.2 S k e t c h o f d e r i v a t i o n tree.

In scanning the input string wxy, the LR(k) condition requires us to


recognize the production A ~ ~ knowing wx and FIRSTk(y ). On the other
hand, the LL(k) condition requires us to recognize the production A ~ t~
knowing only w and F I R S T k ( x y ). Thus, it would appear that the LL(k)
condition is more stringent than the LR(k) condition, so that every LL(k)
grammar is LR(k). We shall now formalize this argument.
Suppose that we have an LL(k) grammar G a n d two parse trees in G.
Moreover, suppose that the frontiers of the two trees agree for the first m
symbols. Then to every node of one tree such that no more than m - k
leaves labeled with terminals appear to its left, there corresponds an "essen-
tially identical" node in the other tree. This relationship is represented picto-
rially in Fig. 8.3, where the shaded region represents the "same" nodes. We
assume that ] w[ = m -- k and that FIRSTk(xl) = FIRSTk(Xz). We can state
this observation more precisely as the following lemma.
SEe. 8.1 THEORY OF LL LANGUAGES 669

S S

W X1 W X2

(a) (b)

Fig. 8.3 Two parse trees for an LL(k) grammar.

LEMMA 8.1

Let G be an LL(k) g r a m m a r and let

S ~Im w~ Aa ~lm w~x~ and S ~Ira w2Bfl ~l m w2x 2

be two leftmost derivations such that F I R S T ~ ( w l x l ) = FIRST~(w2xz)for


1 = k + max( I wl 1, I wz !). (That is, w~x~ and w~x2 agree at least k symbols
beyond w~ and wz.)
(1) If ms = m 2 , t h e n w 1 = w 2 , A = B , a n d a = f l .
~I ~ --~!
(2) If m, < m2, then S ~ira w~Aa ~ira w2Bfl ==~
* W z X 2.

Proof. Examining the LL(k) definition, we find that if m 1 ~ mz, each


m2
of the first m 1 steps of the derivation S ==~ wzB fi are the same as those of
Ira
ml
S ==~ wiAa, since the lookahead strings are the same at each step. (1) and
Ira
(2) follow immediately. [Z]
THEOREM 8.1
Every LL(k) g r a m m a r t is LR(k).
Proof. Let G = (N, E, P, S) be an LL(k) g r a m m a r and suppose that it is
not LR(k). Then there exist two rightmost derivations in the augmented
grammar
i
(8.1.1) S' ~
rm
aAx I ===~ O~flx 1
rm

(8.1.2) S' ~i'm ?By ~r ra 7@

such that y@ = aflx2 for some x z for which F I R S T e ( x 2 ) = FIRSTe(xl).


Since G is assumed not to be LR(1), we can assume that aAx2 ~ ?By.

•~Throughout this book we are assuming that a grammar has no useless productions.
670 THEORY OF DETERMINISTIC PARSING CHAP. 8

In these derivations, we can assume that i a n d j are both greater than zero.
Otherwise, we would have, for example,

S'-~ S'---> S

S' ---> B y ---> S fly

which would imply that G was left-recursive and hence that G was not LL.
Thus, for the remainder of this proof we can assume that we can replace S'
by S in derivations (8.1.1) and (8.1.2).
Let x,, xp, x v, and x ~ b e terminal strings derived from a, fl, ~,, and t~,
respectively, such that x, xpx2 = x~x~y. Consider the leftmost derivations
which correspond to the derivations

(8.1.3) S - - r~m ocAx~ ~r m aflx~ * X~XpX 1


rm

and

(8.1.4) S ~rm ~,By ~r m ?'6y ==:> xvx~y


rm

Specifically, let
(8.1.5) S ~l m x.Arl ~ x.flrl ~I m x.xprl ~I m x.xpx,
and
(8.1.6) S -Y-->
lm
x~BO --->
Im
x~OO --->
* x~x~O ::=,
lm
* x~x ~y
lm

where t/and 0 are the appropriate strings in (N U Z)*.


By Lemma 8.1, the sequence of steps in the derivation S *=>
lm
x~Arl is
the initial sequence of steps• in the derivation S *=,
Im
xvBO or conversely. We
assume the former case; the converse can be handled in a symmetric manner.
Thus, derivation (8.1.6) can be written as

(8.1.7) S - -l m~ x,Arl ~l m xJ3rl =Ira: ~ xvBO ~l m xvt~O - -l m~ xvx~O ~l m xyx,~y

Let us fix our attention on the parse tree T of derivation (8.1.7). Let nA
be the node corresponding to A in x~Arl and nn the node corresponding to
B in XyBO. These nodes are shown in Fig. 8.4. Note that nB may be a descen-
dant of nA. There cannot really be overlap between xp and x~. Either they
are disjoint or x~ is a subword of xp. We depict them this way merely to
imply either case.
Let us now consider two rightmost derivations associated with parse
tree T. In the first, T is expanded rightmost up to (and including) node nA;
in the second the parse tree is expanded up to node nB. The latter derivation
can be written as
s ~r m ~By ~1-m ~,,~y
SEC. 8.1 THEORY OF LL LANGUAGES 671

Xa X# X2
t , • 13 L l. T . JIL II J

x~ x6 Y

Fig. 8.4 Parse tree T.

This derivation is in fact derivation (8.1.2). The rightmost derivation up to


node nA is

(8.1.8) S ~r r l l o(Ax2 ~r m ~'flX2

for some a'. We shall subsequently prove that ~' = ~.


Let us temporarily assume that ~' = ~. We shall then derive a contradic-
tion of the LL(k)-ness of G and thus be able to conclude the theorem. If
~ ' = ~, then ~,8y = oc'flx2 = ~flx2. Thus, the same rightmost derivations can
be used to extend derivations (8.1.2) and (8.1.8) to the string of terminals
x~,x,sy. But since we assume that nodes nA and nn are distinct, derivations
(8.1.2) and (8.1.8) are different, and thus the completed derivations are
different. We may conclude that xrx,~y has two different rightmost derivations
and that G is ambiguous. By Exercise 5.1.3, no LL grammar is ambiguous.
Hence, G cannot be LL, a contradiction of what was assumed.
Now we must show that ~' = ~. We note that ~' is the string formed by
concatenating the labels from the left of those nodes of T whose direct
ancestor is an ancestor of nA. (The reader should verify this property of right-
most derivations.) Now let us again consider the leftmost derivation (8.1.5),
which has the same parse tree as the rightmost derivation (8.1.3). Let T' be
the parse tree associated with derivation (8.1.3). The steps of derivation
(8.1.5) up to x , Atl are the same as those of derivation (8.1.7) up to xo,Aql.
Let n] be the node of T' corresponding to the nonterminal replaced in deriva-
672 THEORY OF DETERMINISTIC PARSING CHAP. 8

tion (8.1.7) at the step x~Arl ==,


lm
x~flrl. Let II be the preorderingt of the inte-
rior nodes of parse tree T up to node nA and II' the preordering of the interior
nodes of T' up to node n]. The ith node in II matches the ith node in II' in
that both these nodes have the same label and that corresponding descen-
dants either are matched or are to the right of nA and n], respectively.
t
The nodes in T' whose direct ancestor is an ancestor of nA have labels
which, concatenated from the left, form e. But these nodes are matched with
those in T which form e', so that e' -- e. The proof is thus complete. [-]

The grammar
S >A[B
A > aAb[O
B > aBbb[ 1

is an LR(0) grammar but not an LL grammar (Example 5.4), so the contain-


ment of LL grammars in LR grammars is proper. In fact, if we consider the
LL and LR grammars along with the classes of grammars that are left- or
right-parsable (by a D P D T with an endmarker), we have, by Theorems 5.5,
5.12, and 8.1, the containment relations shown in Fig. 8.5.

LR

Left Right
Parsable Parsable
LL

Fig. 8.5 Relations between classes of grammars.

t I I is the sequence of interior nodes cff T in the order in which the nodes are expanded
in a leftmost derivation.
SEC. 8.1 THEORY OF LL LANGUAGES 673

We claim that each of the six classes of grammars depicted in Fig. 8.5 is
nonempty. We know that there exists an LL grammar, so we must s h o w
the following.
THEOREM 8.2
There exist grammars which are
(1) LR and left-parsable but not LL.
(2) LR but not left-parsable.
(3) Left- and right-parsable but not LR.
(4) Right-parsable but not LR or left-parsable.
(5) Left-parsable but not right-parsable.
Proof Each of the following grammars inhabits the appropriate region.
(1) The grammar Go with productions

S >AIB
A ~ aaA l aa
B > aaBla

is LR(1) and left-parsable but not LL.


(2) The grammar Go with productions

S >AbIAc
A ~ ABla
B ~a

is LR(1) but not left-parsable. See Example 3.27 (p. 272 of Volume I).
(3) The grammar Go with productions

S- ~AblBc
A ~ Aala
B ~ Bala

is both left- and right-parsable but is not LR.


(4) The grammar Gd with productions

S >AblBc
A ~ACla
B ~ BCIa
C- ~a

is right-parsable but neither LR nor left-parsable.


(5) The grammar G, with productions
674 THEORYOF DETERMINISTIC PARSING CHAP. 8

S >BAbICAc
A > BAIa
B >a
C >a

is left-parsable but not right-parsable. See Example 3.26 (p. 271, Volume I).

8.1.2. LL Grammars in Greibach Normal Form

In this section we shall consider transformations which can be applied to


LL grammars while preserving the LL property. Our first results involve
e-productions. We shall give two algorithms which together convert an LL(k)
grammar into an equivalent LL(k-F- 1) grammar without e-productions.
The first algorithm modifies a grammar so that each right-hand side is either
e or begins with a symbol (possibly a terminal) which does not derive the
empty string. The second algorithm converts a grammar satisfying that con-
dition to one without e-productions. Both algorithms preserve the LL-ness
of grammars.

DEFINITION

Let G = (N, ~, P, S) be a CFG. We say that nonterminal A is nullable


if A can derive the empty string. Otherwise, a symbol in N u E is said to be
nonnullable. Every terminal is thus nonnullable. We say that G is nonnullable
if each production in P is of the form A ~ e or A ---~ X1 " " Xk, where X1
is a nonnullable symbol.

ALGORITHM 8.1
Conversion to a nonnullable grammar,
lnput. C F G G = (N, E, P, S).
Output. A nonnullable context-free grammar G1 = (Na, ~E' P~, S~) such
that L(Gi) = L(G) - - [e}.

Method.

(1) Let N ' = N U [A I A is a nullable nonterminal in N}. The barred


nonterminal A will generate the same strings as A except e. Hence A is non-
nullable.
(2) If e is in L(G), let S~ = S. Otherwise S~ = S.
(3) Each non-e-production in P can be uniquely written in the form
A ----~ B i . . . BmX1 " " X , (with m ~ 0, n ~ 0, and m --F-n > 0), where each
Bi is a nullable symbol, and if n 3> 0, X1 is a nonnullable symbol. The
SEC. 8.1 THEORY OF LL LANGUAGES 675

remaining Xj, 1 < j ~ n, can be either nullable or nonnullable. Thus, X~


is the leftmost nonnullable symbol on the right-hand side of the production.
For each non-e-production A ---~ B1 " " B m X i " • X , , we construct P ' as fol-
lows"
(a) If m ~ 1, add to P ' the m productions

A= > BIB2 ... BmX 1 ... X.

A > B 2 B 3 . . . BmX" 1 ' ' ' X.


°

A: >BmX~ "'" X .

(b) If n ~ 1 and m ~ 0, add to P ' the production

A >X~ ...X~
(c) In addition, if A is itself nullable, we also add to P' all the produc-
tions in (a) and (b) above with A instead of A on the left.
(4) If A ---~ e is in P, we add A ~ e to P'.
(5) Let G~ = (Na, Z, Pa, S~) be G ' = (N', Z, P', $1) with all useless sym-
bols and productions removed. D

Example 8.1
Let G be the LL(1) grammar with productions

S---->AB

A >aAle
B >bAle

Each of the nonterminals is nullable, so we introduce new nontermi-


nals S-, A, and B; the first of these is the new start symbol. Productions
A .--~ a A and B ---~ b A each begin with a nonnullabte symbol, and so their
right-hand sides are of the form X 1 X 2 for the purpose of step (3) of Algorithm m

8.1. Thus, we retain these productions and also add A ~ a A and B ~ b A


to the set of productions.
In the production S ~ A B , each symbol on the right is nullable, and so
we can write the right-hand side as B I B 2 . This production is replaced by
the following four productions"
D

S > ABIB

S > ABIB
676 THEORY OF DETERMINISTIC PARSING CHAP. 8

Since S is the new start symbol, we find that S is now inaccessible. The
final set of productions constructed by Algorithm 8.1 is

S > ABIB
A > aA
A > aAle
B > bA
B >bAle

The following theorem shows that Algorithm 8.1 preserves the LL(k)-
ness of a grammar.
THEOREM 8.3
If G1 is constructed from G by Algorithm 8.1, then
(1) L ( G , ) = L(G) -- {e}, a n d
(2) If G is LL(k), then so is G1.
Proof
(1) Straightforward induction on the length of strings shows that for all
A in N,

(a) A ~ w if and only if A ~ w and


G1 G

(b) A ~ G~ w if and only if w =/= e and A ~ 17 w.

The details are left for the Exercises.


For part (2), assume the contrary. Then we can find derivations in G1

s, ~
Gt lm
w ~ Gt
- - .lm wp~ G~*-~
lm
wx

$1 t71
~ l m wAoc 17x
~ l m wToc ==~
(71 l m
wy

where FIRSTk(x ) -- FIRST~,(y), fl ~ 7, and A is either A or A.


We can construct from these two derivations corresponding derivations
in G of w x and wy as follows. First define h(A) = h(A) = A for A ~ N and
h(a)=afora ~ Z.
For each production/~ --, t~ in P1 we can find a production B ---, O'h(O)
in P from which it was constructed in step (3) of Algorithm 8.1. That is,
B -- h(/t), and 6' is some (possibly empty) string of nullable symbols. Each
time production/~ --~ ~ is used in one of the derivations in G~, in the corre-
sponding derivation in G we replace it by B ---, O'h(d;), followed by a leftmost
SEC. 8.1 T H E O R Y OF LL L A N G U A G E S 677

derivation of e from ~' if ~' ~ e. In this manner, we construct the derivations


in G"

S ~G l m wAh(a) ===~
G lm
wfl'h(fl)h(a) *~ wh(fl)h(a) ~G l m wx
G lm
,
s ~ wAh(~) ~ w~'h(~)h(~) ~ wh(r)h(~) ~ wy
G lm G lm G Im G lm

We can write the steps between wfl'h(fl)h(a) and wh(fl)h(a) as

lm wO~ - -Im
wfl'h(fl)h(~)=wO~ ==~ ~ ... lm ~. wO. = wh(,O)h(oO

and those between wy'h(y)h(a) and wh(y)h(a) as

W ~ tth(y)h( 0C) = WE" I ~lm we z ~ l m . . . ~lm we,, = wh(y)h(a).

If z = F I R S T ( x ) = FIRST(y), then we claim that z is in FIRST(0,) and


FIRST(e~) for all i, since fl' and y' consist only of nullable symbols (if fl'
and y' are not e). Since G is LL(k), it follows that ~ -- e~ for all i. In par-
ticular, fl'h(fl) = y'h(y). Thus, fl and y are formed from the same production
by Algorithm 8.1. If fl' ~ y', then in one of the above derivations, a non-
terminal derives e, while in the other it does not. We may contradict the
LL(k)-ness of G in this way and so conclude that fl' = y'. Since fl and y are
assumed to be different, they cannot both be e, and so they each start with
the same, nonnullable symbol. It is then possible to conclude that m = n
and arrive at the contradiction fl -- ?. [--]

Our next transformation will eliminate e-productions entirely from an LL


grammar G. We shall assume that e q~ L(G). Our strategy will be to first
apply Algorithm 8.1 to an LL(k) grammar, and then to combine nonnullable
symbols in derivations with all following nullable symbols, replacing such
a string by a single symbol. If the grammar is LL(k), then there is a limit to
the number of consecutive nullable symbols which can follow a nonnull-
able symbol.

DEFINITION

Let G = (N, X, P, S) be a CFG. Let VG be the set of symbols of the form


[ X B 1 . '. B,] such that
(1) X is a nonnullable symbol of G (possibly a terminal).
(2) B 1 , . . . , B, are nullable symbols (hence nonterminals).
(3) If i ~ j , then Bi ~ Bj. (That is, the list B 1 , . . . , B, contains no
repeats.)
Define the homomorphism g from VG to (N U E)* by g([a]) = a.
678 THEORYOF DETERMINISTIC PARSING CHAP. 8

LEMMA 8.2

Let G = (N, Z, P, S) be a nonnullable LL(k) grammar such that every


nonterminal derives at least one nonempty terminal string. Let Vo and g
be as defined above. Then for each left-sentential form a of G such that
a ~ e there is a unique fl in Vo* such that h(fl) = a.
Proof We can write a uniquely as a~a2...am, m > 1, where each a~
is a nonnullable symbol X followed by some string of nullable symbols
(because every non-e-sentential form of G begins with a nonnullable symbol).
It suffices to show that [a~] is in Vo for each i. If not, then we can write at
as XflB?B5, where B is some nullable symbol and fl, 7, and 6 consist only
of nullable symbols, i.e., a~ does not satisfy condition (3) of the definition
of Vo. Let w be a nonempty string such that B ~ w. Then there are two dis-
G
tinct leftmost derivations:

fiB?B5 ~lm B?B5 ~lm w?B5 - - lm


-*~ w

and

fiB?B5 ~lm B~ ~Im w5 Ira


*-~ w

From these, it is straightforward to show that G is ambiguous and hence not


LL. [[]

The following algorithm can be used to prove that every LL(k) language
without e has an LL(k + 1) grammar with no e-productions.
ALGORITHM 8.2
Elimination of e-productions from an LL(k) grammar.
Input. An LL(k) grammar Ga = (N~, E, P1, S1)-
Output. An LL(k + 1) grammar G = (N, Z, P, S) such that L ( G ) =
L(G,) -- {e}.
Method.
(1) First apply Algorithm 8.1 to obtain a nonnullable LL(k) grammar
G 2 = (N 2, Z, P2, $2).
(2) Eliminate from G2 each nonterminal A that derives only the empty
string by deleting A from the right-hand sides of productions in which it
appears and then deleting all A-productions. Let the resulting grammar be
G 3 : ( N 3, E, P3, $2)"
(3) Construct grammar G = (N, Z, P, S) as follows:
(a) Let N be the set of symbols [Xa] such that
(i) X is a nonnullable symbol of G3,
(ii) a is a string of nullable symbols,
SEC. 8.1 THEORY OF LL LANGUAGES 679

(iii) Xe ~ E (i.e., we do not have X ~ l ~ and 0~ = e simulta-


neously),
(iv) e has no repeating symbols, and
(v) Xe actually appears as a substring of some left-sentential
form of G 3.
(b) S = [Szl.
(c) Let g be the homomorphism g([e]) = e for all [e] in N and let
g(a) = a for a in E. Since g-~(p)contains at most one member,
we use g-~(fl) to stand for l' if 7 is the lone member of g-l(fl).
We construct P as follows:
(i) Let [Ae] be in N and let A ~ fl be a production in P3.
Then [A~]--, g-~(fl~) is a production in P.
(ii) Let [aeAfl] be in N, with a ~ Z and A ~ N 3. Let A ---, y be
in P3, with ), =/= e. Then [aeAfl] --, ag-x(yfl) is in P.
(iii) Let [aocAfl] be in N with a ~ E. Then [aocAfl]--. a is also
i n P . [-7

Example 8.2
Let us consider the g r a m m a r of Example 8.1. Algorithm 8.1 has already
been applied in that example. Step (2) of Algorithm 8.2 does not affect the
grammar. We shall generate the productions of g r a m m a r G as needed to
assure that each nonterminal involved appears in some left-sentential form.
The start symbol is [S]. There are two S-productions, with right-hand sides
A-B and/~. Since A a n d / ~ are nonnullable but B is nullable, g-a(A-B) = [A-B]
and g-a(B-) -- [/~]. Thus, by rule (i) of step (3c), we have productions

[Sl -~ ~ [ABll[B]

Let us consider nonterminal [AB]. A has one production, A---* aA.


Since g-~(aAB) -- [aAB], we add production

[AB] > [aAB]

Consideration of [B] causes us to add


s

[B]-- ~ [bA]

We now apply rules (ii) and (iii) to the nonterminal [aAB]. There is one
non-e-production for A and one for B. Since g - ~ ( a A B ) - - [ a A B ] , we add

[aAB] > a[aAB]

corresponding to the A-production. Since g-l(bA) -= [bA], we add

[aAB] ~ a[bA]
680 T H E O R Y OF D E T E R M I N I S T I C P A R S I N G CHAP. 8

corresponding to the B-production. By step (iii), we add

[(tAB], >a

Similarly, from nonterminal [bA] we get

[bA] > b[aA] l b

Then, considering the newly introduced nonterminal [aA], we add produc-


tions
[aA] ~ a[aA] l a

Thus, we have constructed the productions for all nonterminals intro-


duced and so completed the construction of G.

THEOREM 8.4

The grammar G constructed from G 1 in Algorithm 8.2 is such that


(1) L(G) ----L(Gi) -- {e}, and
(2) If G~ is LL(k), then G is LL(k + 1).
Proof Let g be the homomorphism in step (3c) of Algorithm 8.2.
(1) A straightforward inductive argument shows that A *~
Ga
fl, where
A ~ N and fl ~ (N U Z)*, if and only if g ( A ) ~Gg ( f l ) and fl ~ e. Thus,
[$2] ~O w, for w i n E*, if and only if $2 ==~
Ga
w and w ~ e. Hence, L(G)
L(G3). That L(G2) ----L(G1) -- (e} is part (1) of Theorem 8.3, and it is easy to
see that step (2) of Algorithm 8.2, converting G 2 to G 3, does not change the
language generated.
(2) Here, the argument is similar to that of the second part of Theorem
8.2. Given a leftmost derivation in G, we find a corresponding one in G3
and show that an LL(k q- 1) conflict in the former implies an LL(k) conflict
in the latter. The intuitive reason for the parameter k + 1, instead of k,
is that if a production of G constructed in step (3cii) or (iii) is used, the ter-
minal a is still part of the lookahead when we mustdetermine which produc-
tion to apply for o~Afl. Let us suppose that G is not LL(k + 1). Then there
exist derivations

S~ w a s - - ~ wp~ * WX
G lm G lm G lm
and
S ~G lm wAoc ~G lm wyt~ ==~
G lm
wy

where fl ~ 7', but FIRSTk+i(x ) = FIRSTk+I(y ). We construct corresponding


derivations in G 3 as follows"
SEC. 8.1 THEORY OF LL LANGUAGES 681

(a) Each time production [Aa] ~ g-l(fla), which is in P because of


rule (3ci), is used, we use production A ~ fl of G 3.
(b) Each time production [aaAfl] ~ ag-1(3,fl), which is in P by rule
(3cii), is used, we do a leftmost derivation of e from ~, followed
by an application of production A ~ 3'.
(c) Each time production [aaA/3] ---~ a is used, we do a leftmost deri-
vation of e from aAfl. Note that this derivation will involve one
or more steps, since t~Afl ~ e.
Thus, corresponding to the derivation S ~ wAa is a unique derivation
G lm

g(S) ~ wg(A)g(a). In the case that A is a bracketed string of symbols


Gs I m
beginning with a terminal, say a, the bordert of wg(A)g(oO is one symbol
to the right of w. Otherwise it is immediately to the right of w. In either case,
since x and y agree for k + 1 symbols and Gs is LL(k), the steps in G 3 corre-
sponding to the application of productions A ~ fl and A ~ 3, in G must be
the same.
Straightforward examination of the three different origins of productions
of G and their relation to their corresponding derivations in G 3 suffices to
show that we must have fl = 3,, contrary to hypothesis. That is, let A ~- [~].
If 6 begins with a nonterminal, say ~ = C~', case (2a) above must apply in
both derivations. There is one production of G3, say C ~ ~", such that
/~ = ~ = g-~(6,,6,).

If 6 begins with a terminal, say ~ - - a 6 ' , case (2b) or (2c) must apply.
The two derivations in G3 replace a certain prefix of ~ by e, followed by
the application of a non-e-production in case (2b). It is easy to argue that
fl -- 3, in either case.

We shall now prove that every LL(k) language has an LL(k + 1) grammar
in Greibach normal form (GNF). This theorem has several important appli-
cations and will be used as a tool to derive other results. Two preliminary
lemmas are needed.
LEMMA 8.3
No LL(k) grammar is left-recursive.
Proof Suppose that G = (N, E, P, S) has a left-recursive nonterminal A.
+
Then there is a derivation A =~ A~. If ~ ~ e, then it is easy to show that G
is ambiguous and hence cannot be LL. Thus, assume that ~ *~ v for some
v ~ l~+. We can further assume that A *~ u for some u ~ E* and that there
exists a derivation

S =L~
lm
wA,~~l m wAock,5 l*-~
m
wuvkx

•l'Border as in Section 5.1.1.


682 THEORY OF DETERMINISTIC PARSING CHAP. 8

Hence, there is another derivation"

S ~Im . wAc~ - -lm~ wAakc~ ~lm . wAa k+~c~~ lm


* WUV k+ 1 x

Since FIRSTk(UVkX) = FIRSTk(uv k+lx), we can readily obtain a contradic-


tion of the LL(k) definition from these two derivations, for arbitrary k. D

LEMMA 8 . 4

Let G = (N, Z, P, S) be a C F G with A ~ Ba in P, where B ~ N. Let


B ~ fla [f12[ "'" [fin be all the B-productions of G and let G~ = (N, E, P~, S)
be formed by deleting A ---~ Ba from P and substituting the productions
A ~ P ~ I P ~ I " " 1,O~. ThenL(G~) = L(G), and if G is LL(k), then so is G~.
Proof By Lemma 2.14, L(G1) = L(G). To show that Gi is LL(k) when G
is, we observe that leftmost derivations in G1 are essentially the same as
those of G, except that the successive application of the productions A ~ B~
and B ~ fit in G is done in one step in G 1. Informally, since Ba begins with
a nonterminal, the two steps in G are dictated by the same k symbol look-
ahead string. Thus, when parsing according to G~, that lookahead dictates
the use of the production A ~ ilia. A more detailed proof is left for the
Exercises. [~]

ALGORITHM 8.3

Conversion of an LL(k) grammar to an LL(k + 1) grammar in GNF.


lnput. LL(k) grammar G1 = (N~, Z, P1, $1).
Output. LL(k q- 1) grammar G = (N, E, P, S) in Greibach normal form,
such that L(G) = L(Gi) -- [e}.
Method.
(1) Using Algorithm 8.2, construct from G 1 an LL(k + 1) grammar
G2 = (N2, E, P2, S) with no e-productions.
(2) Number the nonterminals of N2, say N 2 = [ A 1 , . . . , Am}, such that
if At ---~ Aja is in P2, then j > i. Since, by Lemma 8.3, G is not left-recursive,
we can do this ordering by Lemma 2.16.
(3) Successively, for i = m - I, m - 2 , . . . , 1, replace all productions
A t ---~ Aja by those A~ ~ fla such that Aj --~ fl is currently a production.
We shall show that this operation causes all productions to have right-hand
sides that begin with terminals. Call the new grammar G 3 = (N3, E, P3, S).
(4) For each a in E, let Xa be a new nonterminal symbol. Let

N=N sU[Xala~ ~}.

Let P be formed from P3 by replacing each instance of terminal a, which is


SEC. 8.1 THEORY OF LL LANGUAGES 683

not the leftmost symbol of its string, by X, and by adding productions Xa ~ a


for each a. Let G = (N, E, P, S). G is in GNF. D

THEOREM 8.5

Every LL(k) language has an LL(k -1- !) grammar in GNF.


Proof It suffices to show that G constructed in Algorithm 8.3 is LL(k + 1)
if G 1 is LL(k). By Lemma 8.4, G3 is LL(k + 1). We claim that the right-hand
side of every production of G3 begins with a terminal. Since G1 is not ieft-
recursive, the argument is the same as for Algorithm 2.14.
It is easy to show that the construction of step (4) preserves the LL(k + 1)
property and the language generated. It is also clear that G is in GNF.
The proofs are left for the Exercises. E]

8.1.3. The Equivalence Problem for LL Grammars

We are almost prepared to give an algorithm to test if two LL(k) gram-


mars are equivalent. However, one additional concept is necessary.
DEFINITION

Let G = (N, E, P, S) be a CFG. For a in (N u E)*, we define the thick-


ness of ~ in G, denoted THO(a), to be the length of the shortest string w in
E* such that a =~ w. We leave for the Exercises the observations that
G

Tno(afl) = Tno(a) q- THO(fl) and that if a ~ fl, then WHO(a) ~ THO(fl).


We further define THe(a, w), where a is in (N U E)* and w is in 2~*k,
to be the length of the shortest string x in ~* such that a ==> x and w =
G
FIRSTk(X ). If no such x exists, THe(a, w) is undefined. We omit k and G
from TH~ or TH ° where obvious.
The algorithm to test the equivalence of two LL(k) grammars is based on
the following lemma.
LEMMA 8.5

Let G 1 = (N1, I2, P,, $1) and G z = (N2, E, P2, $2) be LL(k) grammars
in G N F such that L ( G , ) = L(G2). Then there is a constant p, depending
on G 1 and G 2, with the following property. Suppose that Si ---> wa ---> wx
Gt l m Gt l m

and that $2 G2
~ l m w fl G2 *I m wy, where a and fl are the open portions of w0~and
===>
wfl, and FIRSTk(x) = FIRSTk(y). Then ITHO'(a) -- THO,(fl) ! ~ p.t
P r o o f Let t be the maximum of THO'(y) or TH"'(?) such that ? is a right-
hand side of a production in P1 or P2, respectively. Let p = t(k ÷ 1), and
suppose in contradiction that

(8.1.9) WHO'(a) -- TH°'(fl) > p

•~Absolut¢ value, not length, is meant here by[ I.


684 THEORYOF DETERMINISTIC PARSING CHAP. 8

We shall show that as a consequence of this assumption, L ( G ~ ) ~ L(Gz).


Let z = FIRST(x) = FIRST(y). We claim that THO~(fl, z) ~ THO~(fl) + p,
for there is a derivation fl G2
===-~
lm
y, and hence, since G 2 is in GNF, there is a
derivation, fl 03
~ l m z~ for some ~, requiring no m o r e t h a n k steps. It is easy
to show that
TH°~(6) < THO~(fl) + kt

since "at worst" ~ is fl with most of the right-hand sides of k productions


appended. We conclude that

(8.1.1o) TH°*(fl, z) < k + THG*(8) _< THG~(fl) + p

It is trivial to show that THO,(a, z) ~ THO,(a), and so from (8.1.9) and


(8.1.10) we have

(8.1.11) T.H°'(~, z) > THO~(fl, z)

If we let u be a shortest string derivable from g, then TH°,(fl, z ) < l z u I.


The string wzu is in L(G2), since S *~ wfl *~ wzg =~
* wzu. But it is impossible
G~ G~ G~

that a =~- G~
zu, because by (8.1.11) THO,(a, z) > Jzu [. Since G1 is LL(k), if
there is any leftmost derivation of wzu in G~, it begins with the derivation
$1 Gt
- - l~m wa. Thus, wzu is not in L(G1) , contradicting the assumption that
L(G1) = L(G2). We conclude that THO,(a) -- THO,(fl) < p = t(k + 1). The
case THO~(fl) -- THO,(e) > p is handled symmetrically. ~

LEMMA 8.6

It is decidable, for DPDA P, whether P accepts all strings over its input
alphabet.
Proof By Theorem 2.23, L(P), the complement of L(P), is a deterministic
language and hence a CFL. Moreover, we can effectively construct a CFG
G such that L(G) = L(P). Algorithm 2.7 can be used to test if L(G) = ~ .
Thus, we can determine whether P accepts all strings over its input alphabet.

We are now ready to describe the algorithm to test the equivalence of


two LL(k) grammars.
THEOREM 8.6

It is decidable, for two LL(k) grammars G 1 = (N1, El, P1, $1) and
G2 = (N2, Z2, e2, $2), whether L(G~) = L ( G 2 ) .
Proof. We first construct, by Algorithm 8.3, G N F grammars G'~ and G~
equivalent to G 1 and Gz, respectively (except possibly for the empty string,
SEC. 8.1 THEORY OF LL LANGUAGES 685

which can be handled in an obvious way). We then construct a D P D A P


which accepts an input string w in (E~ u E2)* if and only if
(1) w is in both L(G~) and L(G2) or
(2) w is in neither L(G1) nor L(G2).
Thus, L(G~) = L(G2) if and only if L(P) = (E~ U E2)*. We can use Lemma
8.6 for this test.
Thus, to complete the proof, all we need to do is show how we construct
the D P D A P. P has a pushdown list consisting of two parallel tracks. P
processes an input string by simultaneously parsing its input top-down
according to G'~ and G'2"
Suppose that P's input is of the form wx. After simulating I wl steps of
the leftmost derivations in G'i and G~, P will have on its pushdown list the
contents of each stack of the k-predictive parser for G'i and G~, as shown in
Fig. 8.6. We note from Algorithm 5.3 that the stack contents are in each

i w ! i

Parser

open portion for G '1 = a

open portion for G ~ =

Fig. 8.6 Representation of left sentential forms w~ and wfl.

case the open portions of the two current left-sentential forms, together with
some extra information appended to the nonterminals to guide the parsing.
We can thus think of the stack contents as consisting of symbols of G'i and
G~. The extra information is carried along automatically.
Note that the two open portions may not take the same amount of space.
However, since we can bound from above the difference in their thicknesses,
then, whenever L(G1) = L(G2), we know that P can simulate both derivations
by reading and writing a fixed distance down its pushdown list. Since G'I
and G~ are in GNF, P alternately simulates one step of the derivation in G'i,
686 THEORY OF DETERMINISTIC PARSING CHAP. 8

one in G~, and then moves its input head one position. If one parse reaches
an error condition, the simulation of the parse in the remaining grammar
continues until it reaches an error or accepting configuration.
It is necessary only to explain how the two open portions can be placed
so that they have approximately the same length, on the assumption that
L(G1) = L(G2). By Lemma 8.5, there is a constant p such that the thicknesses
of the two open portions, resulting from processing the prefix of any input
string, do not differ by more than p.
For each grammar symbol of thickness t, P will reserve t cells of the
appropriate track of its pushdown list, placing the symbol on one of them.
Since G'x and G~ are in G N F , there are no nullable symbols in either grammar,
and so t > 1 in each case. Since the two strings ~ and fl of Fig. 8.6 differ in
thickness by at most p, their representations on P's pushdown list differ in
length by at most p cells.
To complete the proof, we design P to reject its input if the two open
portions on its pushdown list ever have thicknesses differing by more than
p symbols. By Lemma 8.5, L ( G ~ ) ~ L(G2) in this case. Also, should the
thicknesses never differ by more than p, P accepts its input if and only if it
finds a parse of that input in both G'i and G~ or in neither of G'~ and G~.
Thus, P accepts all strings over its input alphabet if and only if L(Gi) =
L(G2). E]

8.1.4, The Hierarchy of LL Languages

We shall show that for all k > 0 the LL(k) languages are a proper subset
of the LL(k + 1) languages. As we shall see, this situation is in direct con-
trast to the situation for LR languages, where for each LR language we can
find an LR(1) grammar.
Consider the sequence of languages L1, L 2 , . . , L k , . . . , where

Lk = {a"w l n > 1 and w is in {b, c, bkd}"}.

In this section, we shall show that L k is an LL(k) language but not an


LL(k -- I) language, thereby demonstrating that there is an infinite proper
hierarchy of LL(k) languages. The following LL(k) grammar generates L k"

S > aT
T >S A I A
A= > bB[c
B > bk-ld[ e

We now show that every LL(k) grammar for L k must contain at least one
e-production.
SEC. 8.1 THEORY OF LL LANGUAGES 687

LEMMA 8.7
Lk is not generated by any LL(k) grammar without e-productions.
P r o o f Assume the contrary. Then we may, by steps (2)-(4) of Algorithm
8.3, find an LL(k) grammar, G - - - ( N , {a, b, c, d}, P, S), in G N F such that
L(G)--L k. We shall now proceed to show that any such grammar must
generate sentences not in L k.
Consider the sequence of strings ~, i - - 1, 2 , . . . , such that

S==~ ==~
lm lm

for some J. Since G is LL(k) and in G N F , ~t is unique for each i. For if not,
let ~ = ~j. Then it is easy to show that a~+kbj+k is in L(G), which is contrary
to assumptions if i ~ j. Thus, we can find i such that I~,1>_2 k - - 1.
Pick a value of i such that 0ct = / 3 B y for some fl and ? in N* and B ~ N
such that lPt and 17,1 are at least k ~ 1. Since G is LL(k), the derivation of
the sentence a~+k-~b ~+~'-~ is of the form

S =~::,.a, ]3B r =~, a, ÷k - l b , ÷k -1.


lm Im

Since G is in G N F and I/~1>_ k -- 1, we must have fl *~ a ~-1bi, B *~ bl,


and ~, *~ b m for some j ~ 0, l _~ 1, and m _~ k -- 1, where

i+k-- 1 =j+l+m.

If we consider the derivation of the sentence at+k-le i+k-~, we can also


conclude that B *=~ c" for some n :> 1.
Finally, if we consider the derivation of the sentence at+k-abJ+t+k-adb m,
we find that
S *=~a~flB7 ~ a'+k-lbiB7 *=~a'+k-lbJ+l? *=~at+k-lbJ+l+k-ldb m.
lm Im lm Im

The existence of the latter derivation follows from the fact that the sentence
at+k-~bJ+Z+k-~db r" agrees with a~+k-lb i+k-I for (i + k -- 1) + (j + 1 + k - - 1)
symbols. Thus, ? =~
* bk-ldb ~.
Putting these partial derivations together, we can obtain the derivation

S ~ aiflBy ~ ai+k-XbJB? -Y-> a~+k-XbJc"? ~ ai+k-lbJc"bk-Xdb "


lm Ira lm lm

But the result of this derivation is not in L k, because it contains a substring


o f the form ebk-ld. (Every d must have k b's before it.) We conclude that L k
has no LL(k) grammar without e-productions.

We now show that if a language L is generated by an LL(k) grammar


with no e-productions, then L is an LL(k -- 1) language.
688 THEORYOF DETERMINISTIC PARSING CHAP. 8

THEOREM 8.7

If a language L has an LL(k) grammar without e-productions, k > 2,


then L has an LL(k -- I) grammar.
P r o o f By the third step of Algorithm 8.3 we can find an LL(k) grammar
in G N F for L. Let G -- (N, E, P, S) be such a grammar. From G, we can
construct LL(k -- 1) grammar G 1 = (N1, E, P1, S), where
(1) N~ = N u { [ A , a ] [ A ~ N, a ~ E, and A ~ aa is in P for some a}
and
(2) P~ = (A --~ a[A, a] l A --~ aoc is in P} u ([A, a] ~ a [A --~ aa is in P}.
It is left for the Exercises to prove that G1 is LL(k -- 1). Note that the con-
struction here is an example of left factoring. [~]

Example 8.3
Let us consider Gk, the natural LL(k -+- 1) grammar for the language L k of
Lemma 8.7, defined by

S ~ aSAlaA
A ~ bkdlb[c

We construct an LL(k) grammar for L k by adding the new symbols


[S, a], [A, b], and [A, c]. This new grammar G, is defined by productions

S > a[S, a]
A , b[A, b]lc[A, el
[S, al "~ S A I A
[A, b] > bk-ldle
[A, c] ~e

It is left for the Exercises to prove that G k and G~, are, respectively,
LL(k + 1) and LL(k).

THEOREM 8.8

For all k ~ 1, the class of L L ( k - 1) languages is properly included


within the L L ( k ) languages.
P r o o f Clearly, the ~LL(0) languages are a proper subset of the LL(1)
languages. For k > 1, using Lemma 8.7, the language L k has no LL(k)
grammar without e-productions. Hence, by Theorem 8.4 it has no LL(k -- 1)
grammar. It does, as we have seen, have an LL(k) grammar.
EXERCISES 689

EXERCISES

8.1.1. Give additional examples of grammars which are


(a) LR and (deterministically) left-parsable but not LL5
(b) Right- and left-parsable but not LR.
(c) L R but not left-parsable.
8.1.2. Convert the following LL(1) grammar to an LL(2) grammar with no
e-productions"

E - - - ~ TE"
E'- ~ + TE'[e
T - - - ~ FT'
T'- ~,FT'[e
F---> al(E)

8.1.3. Convert the grammar of Exercise 8.1.2 to an LL(2) grammar in G N F .


8.1.4' Prove part (1) of Theorem 8.3.
8.1.5. Complete the proof of Lemma 8.4.
8.1.6. Complete the proof of Theorem 8.5.
8.1.7. Show that Gi, constructed in the proof of Theorem 8.7, is LL(k -- 1).
8.1.8. Show that a language is LL(0) if and only if it is ~ or a singleton.
8.1.9. Show that Gk and G~, in Example 8.3 are, respectively, LL(k + 1) and
LL(k).
• 8.1.10. Prove that each of the grammars in Theorem 8.2 has the properties
attributed to it there.
"8.1.11. Show that the language L = ~a"b"ln ~ 1} u {a"c" In ~ 1} is a determin-
istic language which is not an LL language. Hint: Assume that L has
an LL(k) grammar G in G N F . Show that L(G) must contain strings not
in L by considering left-sentential forms of the appearance ai~ for i > 1.
8.1.12. Show that L = {a"b m I1 ~ m <~ n} is a deterministic language which is
not LL. Note that L is the concatenation of the two LL(1) languages
a* and {a"b" [n ~ 1}.
"8.1.13. Show that every LL(k) language has an LL(k + 1) grammar in CNF.
• "8.1.14. Let L----L1 U Lz U - - . U L,,, where each Li is an LL language,
1 ~ i ~ m. Show that if L is regular, then L~ is regular for all i.
• "8.1.15. Show that if L is an LL language but not regular, then L is not LL.
"8.1.16. Show that the LL languages are not closed under union, intersection,
complementation, concatenation, reversal, or e-free homomorphism.
Hint: See Exercises 8.1.11, 8.1.12, and 8.1.15.
690 THEORY OF DETERMINISTIC PARSING CHAP. 8

8.1.17. Prove that THG(afl) = THa(a) ÷ THe(fl) and that if a *~ fl, then
THe(a) .~ WHe(fl).
8.1.18. Give algorithms to compute THe(a) and THG(a, z).
8.1.19. Show that G1 of Algorithm 8.1 left-covers G of that algorithm.
8.1.20. Show that every LL(k) grammar G is left-covered by an LL(k + 1)
grammar in GNF.
8.1.21. For k ~ 2, show that every LL(k) grammar without e-productions is
left-covered by an LL(k -- 1) grammar.
*'8.1.22. Show that it is decidable, given an LR(k) grammar G, whether there
exists a k' such that G is LL(k').

BIBLIOGRAPHIC NOTES

Theorem 8.1 was first suggested by Knuth [1967]. The results in Sections 8.1.2
and 8.1.3 first appeared in Rosenkrantz and Stearns [1970]. Solutions to Exercises
8.1.14-8.1.16 and 8.1.22 can be found in there also. The hierarchy of LL(k) lan-
guages was first noted by Kurki-Suonio [1969].
Several earlier papers gave decidability results related to Theorem 8.6. Korenjak
and Hopcroft [1966] showed that it was decidable whether two simple LL(1)
grammars were equivalent. McNaughton [1967] showed that equivalence was
decidable for parenthesis grammars, which are grammars in which the right-hand
side of each production is surrounded by a pair of parentheses, which do not
appear elsewhere within any production. Independently, Paull and Unger [1968a]
showed that it was decidable whether two grammars were structurally equivalent,
meaning that they generate the same strings, and that their parse trees are the
same except for labels. (Two grammars are structurally equivalent if and only if
the parenthesis grammars constructed from them are equivalent.)

8.2. CLASSES OF G R A M M A R S GENERATING


THE D E T E R M I N I S T I C LANGUAGES

In this section we shall see that various classes of grammars generate


exactly the deterministic languages. Among these are the LR(1), (1, 1)-BRC,
simple MSP, and UI (2, 1)-precedence grammars. In addition, if a deter-
ministic language has the prefix property, then it has LR(0) and (1, 0)-BRC
grammars. Note that any language can be given the prefix property by
appending a right endmarker to each sentence in the language.

8.2.1. Normal Form DPDA's and Canonical Grammars

The general strategy of Section 8.2 is to construct grammars from DPDA's having certain special properties. These grammars, or simple modifications of them, will be in the classes mentioned above. We shall first define the special properties desired in a DPDA.
DEFINITION

A DPDA P = (Q, Σ, Γ, δ, q0, Z0, F) is in normal form if it has all the following properties:
(1) P is loop-free. Thus, on each input, P can make only a bounded number of moves.
(2) F has a single member, qf, and if (q0, w, Z0) ⊢* (qf, e, γ), then γ = Z0. That is, if P accepts an input string, then P is in the final state qf and the pushdown list consists of the start symbol alone.
(3) Q can be written as Q = Qs ∪ Qw ∪ Qe ∪ {qf}, where Qs, Qw, and Qe are disjoint sets, called the scan, write, and erase states, respectively; qf is in none of these three sets. The states have the following properties:
(a) If q is in Qs, then for each a ∈ Σ, there is some state pa such that δ(q, a, Z) = (pa, Z) for all Z. Thus, if P is in a scan state, the next move is to scan the input symbol. In addition, this move is always independent of the symbol on top of the pushdown list.
(b) If q is in Qw, then δ(q, e, Z) = (p, YZ) for some p and Y and for all Z. A write state always prints a new symbol on top of the pushdown list, and the move is independent of the current input symbol and the symbol on top of the pushdown list.
(c) If q is in Qe, then for each Z ∈ Γ, there is some state pZ such that δ(q, e, Z) = (pZ, e). An erase state always removes the topmost symbol from the pushdown list without scanning a new input symbol.
(d) δ(qf, a, Z) = ∅ for all a in Σ ∪ {e} and Z ∈ Γ. No moves are possible in the final state.
(4) If (q, w, Z) ⊢+ (p, e, Z), then w ≠ e. That is, a sequence of moves which (possibly) enlarges the stack and returns to the same level cannot occur on e input. A sequence of moves (q, w, Z) ⊢+ (p, e, Z) will be called a traverse. Note that the possibility or impossibility of a traverse for given q, p, and w is independent of Z, the symbol on top of the pushdown list.
In short, a scan state reads the next input symbol, a write state prints a new symbol on the stack, and an erase state examines the top stack symbol, erasing it. Only scan states may shift the input head.
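By way of illustration (not part of the original text), a normal-form DPDA might be represented and simulated in Python as in the following sketch. The class and field names (scan_moves, write_moves, erase_moves) are assumptions introduced for this example.

```python
from dataclasses import dataclass

@dataclass
class NormalFormDPDA:
    """Sketch of a DPDA restricted to the normal form above (assumed encoding)."""
    scan_moves: dict    # scan state q  -> {input symbol a: next state}
    write_moves: dict   # write state q -> (next state, stack symbol pushed)
    erase_moves: dict   # erase state q -> {popped stack symbol Z: next state}
    start_state: str
    start_symbol: str
    final_state: str

    def accepts(self, w):
        """Simulate the DPDA; accept iff it halts in (qf, e, Z0).
        Termination relies on property (1), loop-freeness."""
        state, stack, i = self.start_state, [self.start_symbol], 0
        while True:
            if state == self.final_state:
                # Property (2): accept only with all input read and stack = Z0 alone.
                return i == len(w) and stack == [self.start_symbol]
            if state in self.scan_moves:        # property (3a): read one input symbol
                if i == len(w) or w[i] not in self.scan_moves[state]:
                    return False
                state, i = self.scan_moves[state][w[i]], i + 1
            elif state in self.write_moves:     # property (3b): push one symbol on e-input
                nxt, sym = self.write_moves[state]
                stack.append(sym)
                state = nxt
            elif state in self.erase_moves:     # property (3c): pop the top symbol on e-input
                if not stack:
                    return False
                top = stack.pop()
                if top not in self.erase_moves[state]:
                    return False
                state = self.erase_moves[state][top]
            else:
                return False                    # no move defined: reject
```

The scan-state move ignores the stack top and the write-state move ignores both the input and the stack top, exactly as conditions (3a) and (3b) require.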
THEOREM 8.9

If L ⊆ Σ* is a deterministic language, and ¢ is not in Σ, then L¢ is L(P) for some DPDA P in normal form.
Proof. We shall construct a sequence of six DPDA's P1-P6, constructing Pi+1 from Pi such that Pi+1 has more of the properties of a normal form DPDA than Pi does. P6 will be our desired DPDA in normal form.
For P1 we use Lemma 2.28 to find a continuing, and hence loop-free, DPDA such that L = L(P1). We then transform P1 into P2 using the construction of Lemma 2.21, treating P1 as an extended DPDA. The resulting DPDA P2 will be continuing, and each state will act either as an erase or write state or will leave the stack fixed, although any state of P2 may advance the input.
Then we use the construction of Theorem 2.23 to construct P3 from P2. P3 will have all the properties of P2 and, in addition, will make no e-move in a final state.
Next, we construct P4 from P3, so that P4 has all the properties of P3 mentioned but accepts L¢, where ¢ is not in the alphabet of L. P4 will have a unique final state, qf, and will only accept if the stack consists of its start symbol only. P4 has two bottom markers, Z0 and Z1. Z0 is the start symbol, and the first two moves of P4 print Z1 above Z0 and P3's start symbol above Z1. P4 then simulates P3. If P3 accepts, then on input ¢, P4 enters a new state qe and erases its stack down to Z1. Then, P4 erases Z1, enters state qf, and makes no further moves.
To put P4 in normal form, it remains to
(1) Separate the (input) scan operation from the stack manipulations and
(2) Eliminate traverses on input e.
For (1), we modify P4 to create P5. For each state q of P4 on which an e-move is not possible, except q = qf, we create new states qa for each a in Σ. We have q transfer to qa on input a and then have P5 make the move from state qa on input e that P4 made from state q on input a.
Finally, we observe that it is decidable whether (q, e, Z) ⊢* (p, e, Z) for each q and p. This question is independent of Z because of the construction of P2. An algorithm to decide this question is left for the Exercises. We observe that all DPDA's constructed, including P5, are loop-free. Hence, for each state q there is a unique state q' (possibly q' = q) such that

(q, e, Z) ⊢* (q', e, Z),

but for no q'' does (q', e, Z) ⊢+ (q'', e, Z). We construct the final DPDA P6 in our sequence from P5 by giving q the moves of q' in each situation above. P6 is then the desired DPDA P.
The detailed construction corresponding to these intuitive ideas is left for the Exercises. □
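The decision procedure alluded to above can be phrased as a small fixpoint computation, at least under the assumption that, as in the normal form, the only e-moves push or pop a single symbol. The Python sketch below is an illustration (not the book's construction), using the dictionary encoding of moves assumed in the earlier DPDA sketch.

```python
def epsilon_traverse_pairs(write_moves, erase_moves, states):
    """All pairs (q, p) with (q, e, Z) |-* (p, e, Z), i.e. traverses on e input.

    Characterization used: (q, p) holds iff q = p, or q pushes Y and enters s,
    (s, b) is already known, b pops Y entering b2, and (b2, p) is already known.
    """
    pairs = {(q, q) for q in states}
    changed = True
    while changed:
        changed = False
        for q, (s, Y) in write_moves.items():
            for (a, b) in list(pairs):
                if a != s or b not in erase_moves or Y not in erase_moves[b]:
                    continue
                b2 = erase_moves[b][Y]
                for (c, p) in list(pairs):
                    if c == b2 and (q, p) not in pairs:
                        pairs.add((q, p))
                        changed = True
    return pairs
```

In the spirit of the construction of P6, each state q would then be given the moves of the last state reachable from it by such an e-input traverse.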

We next give a method of constructing what we call the canonical grammar from a normal form DPDA. This method is somewhat different from that of Lemma 2.26 in that here we make use of the special properties of a normal form DPDA.

DEFINITION

Let M = (Q, Σ, Γ, δ, q0, Z0, {qf}) be a normal form DPDA with pushdown top at the left. The canonical grammar for M is G = (N, Σ, P, [q0qf]), where
(1) N' is the set of pairs [qp] in Q × Q such that q is a scan or write state and p is arbitrary. The nonterminal [qp] will generate exactly those terminal strings w such that M can make a traverse from state q to state p under input w. That is, [qp] ⇒* w if and only if (q, w, Z) ⊢+ (p, e, Z) for all Z in Γ.
(2) The set of productions P' is constructed as follows:
(a) If δ(q, a, Z) = (q', Z), then we add

[qq'] → a

to P'. Also, for all r ∈ Qs ∪ Qw we add

[rq'] → [rq]a

to P'. Note that here q is a scan state.
(b) If δ(q, e, Z) = (s, YZ) and δ(p, e, Y) = (q', e), then we add

[qq'] → [sp]

to P', and for all r ∈ Qs ∪ Qw,

[rq'] → [rq][sp]

is added to P'. Here, q is a write state and p an erase state.
(3) N and P are constructed by eliminating useless nonterminals and productions from N' and P'. The productions in P will be of the forms
(1) [qq'] → a,
(2) [qq'] → [pp']a,
(3) [qq'] → [pp'], and
(4) [qq'] → [pp'][rr'].
We say that a production of the ith form is of type i for 1 ≤ i ≤ 4.
We make the following observations about canonical grammars.
(1) If [qq'] → a is in P, then q is a scan state.
(2) If [qq'] → [pp']a is in P, then p' is a scan state.
(3) If [qq'] → [pp'] is in P, then q is a write state and p' is an erase state.
(4) If [qq'] → [pp'][rr'] is in P, then p' is a write state and r' an erase state.
The next observation is also useful. Let q be any write state of M. From state q, M can write only a fixed number of symbols on its pushdown list before scanning another input symbol. That is, there exists a finite sequence of states q1, ..., qk such that q1 = q, δ(qi, e, Z) = (qi+1, YiZ) for 1 ≤ i < k and all Z, and qk is a scan state. The sequence has no repeats, and k = 1 is possible. The justification is that should there be a repeat, then M is not loop-free; if the sequence is longer than #Q, then there must be a repeat. We call this sequence of states the write sequence for state q.
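As an illustration of the construction just given (and of the write sequence), the following Python sketch builds the candidate production set P' of the canonical grammar from the normal-form DPDA representation sketched earlier. The function and field names are assumptions made for this example, and the elimination of useless symbols in step (3) is omitted.

```python
def canonical_productions(dpda):
    """Build the production set P' of the canonical grammar (step (2) only).

    Nonterminals are pairs (q, p); terminals are input symbols.
    Useless-symbol elimination (step (3)) is not performed here.
    """
    scan_or_write = set(dpda.scan_moves) | set(dpda.write_moves)
    prods = set()  # each production is (lhs, rhs-tuple)

    # Rule (2a): a scan move delta(q, a, Z) = (q', Z) yields
    #   [q q'] -> a   and   [r q'] -> [r q] a   for every scan or write state r.
    for q, moves in dpda.scan_moves.items():
        for a, q2 in moves.items():
            prods.add(((q, q2), (a,)))
            for r in scan_or_write:
                prods.add(((r, q2), ((r, q), a)))

    # Rule (2b): a write move delta(q, e, Z) = (s, Y Z) followed by an erase
    # move delta(p, e, Y) = (q', e) yields
    #   [q q'] -> [s p]   and   [r q'] -> [r q] [s p].
    for q, (s, Y) in dpda.write_moves.items():
        for p, erases in dpda.erase_moves.items():
            if Y in erases:
                q2 = erases[Y]
                prods.add(((q, q2), ((s, p),)))
                for r in scan_or_write:
                    prods.add(((r, q2), ((r, q), (s, p))))
    return prods


def write_sequence(dpda, q):
    """Write sequence q = q1, ..., qk of a write state q (qk is a scan state).
    Assumes the DPDA is loop-free, so the sequence terminates."""
    seq = [q]
    while seq[-1] in dpda.write_moves:
        seq.append(dpda.write_moves[seq[-1]][0])
    return seq
```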
THEOREM 8.10
If G = (N, Σ, P, S) is the canonical grammar constructed from a normal form DPDA M = (Q, Σ, Γ, δ, q0, Z0, {qf}), then L(G) = L(M) − {e}.
Proof. Here we shall prove that [qq'] generates exactly the input strings for which a traverse from q to q' is possible. To do so, we shall prove the following statement inductively:

(8.2.1)  [qq'] ⇒^n w for some n and w ≠ e if and only if (q, w, Z) ⊢^m (q', e, Z) for some m > 0 and arbitrary Z.

If: The basis, m = 1, is trivial. In this case, w must be a symbol in Σ, and [qq'] → w must be a production in P. For the inductive step, assume that (8.2.1) is true for values smaller than m and that (q, w, Z) ⊢^m (q', e, Z). Then the configuration immediately before (q', e, Z) must be either of the form (p, a, Z) or (p, e, YZ).
In the first case, p is a scan state, and (q, w, Z) ⊢^{m-1} (p, a, Z). Hence, (q, w', Z) ⊢^{m-1} (p, e, Z) if w'a = w. By the inductive hypothesis, [qp] ⇒* w'. By definition of G, [qq'] → [qp]a is in P, so [qq'] ⇒* w.
In the second case, we must be able to find states r and s and express w as w1w2, so that

(q, w1w2, Z) ⊢^{m1} (r, w2, Z) ⊢ (s, w2, YZ) ⊢^{m2} (p, e, YZ) ⊢ (q', e, Z)

where m1 < m, m2 < m, and the sequence (s, w2, YZ) ⊢* (p, e, YZ) never erases the explicitly shown Y. If m1 = 0, then r = q and w2 = w. It follows that [qq'] → [sp] is in P. It also follows from the form of the moves that (s, w2, Y) ⊢* (p, e, Y), and hence, [sp] ⇒* w2. Thus, [qq'] ⇒* w.
If m1 > 0, then (q, w1, Z) ⊢^{m1} (r, e, Z), and so by hypothesis, [qr] ⇒* w1. As above, we also have [sp] ⇒* w2. The construction of G assures that [qq'] → [qr][sp] is in P, and so [qq'] ⇒* w.

Only if: This portion is another straightforward induction and is left for the Exercises.
The special case of (8.2.1) where q = q0 and q' = qf yields the theorem. □
COROLLARY 1
If L is a deterministic language with the prefix property and e is not in L,† then L is generated by a canonical grammar.
Proof. The construction of a normal form DPDA for such a language is similar to the construction in Theorem 8.9. □

COROLLARY 2
If L ⊆ Σ* is a deterministic language and ¢ is not in Σ, then L¢ has a canonical grammar. □

8.2.2. Simple M S P Grammars and Deterministic Languages

We shall now proceed to prove that every canonical grammar is a simple MSP grammar. Recall that a simple MSP grammar is a (not necessarily UI) weak precedence grammar G = (N, Σ, P, S) such that if A → α and B → α are in P, then l(A) ∩ l(B) = ∅, where l(C) is the set of nonterminal or terminal symbols which may appear immediately to the left of C in a right-sentential form; i.e., l(C) = {X | X <· C or X ≐ C}.
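The condition just recalled is easy to test mechanically once the Wirth-Weber relations <· and ≐ are known. The following Python sketch (an illustration, not an algorithm from the text) computes ≐ and <· for a grammar given as (lhs, rhs) productions, derives l(C), and checks the extra simple MSP condition on productions with identical right-hand sides; it ignores the endmarker $ and does not check that the grammar is also weak precedence.

```python
from collections import defaultdict

def simple_msp_condition(productions):
    """Return pairs (A, B), A != B, with a common right-hand side and l(A) & l(B) != {}.

    An empty result means the extra simple-MSP condition holds.
    productions: list of (lhs, rhs) pairs, rhs a tuple of symbols.
    """
    # leftmost(A) = symbols X such that A =>+ X gamma (left-corner closure)
    leftmost = defaultdict(set)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if rhs:
                new = {rhs[0]} | leftmost[rhs[0]]
                if not new <= leftmost[lhs]:
                    leftmost[lhs] |= new
                    changed = True

    # X =. Y if some right-hand side contains XY adjacent;
    # X <. Y if X =. Z for some Z and Z =>+ Y gamma.
    eq, less = set(), set()
    for _, rhs in productions:
        for X, Z in zip(rhs, rhs[1:]):
            eq.add((X, Z))
            for Y in leftmost[Z]:
                less.add((X, Y))

    def l(C):
        return {X for (X, Y) in eq | less if Y == C}

    by_rhs = defaultdict(set)
    for lhs, rhs in productions:
        by_rhs[rhs].add(lhs)
    offending = []
    for rhs, lhss in by_rhs.items():
        lhss = sorted(lhss)
        for i, A in enumerate(lhss):
            for B in lhss[i + 1:]:
                if l(A) & l(B):
                    offending.append((A, B))
    return offending
```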
We begin by showing that a canonical grammar is a (1, 1)-precedence grammar (not necessarily UI).
LEMMA 8.8
A canonical grammar is proper (i.e., has no useless symbols, e-productions, or cycles).
Proof. The construction of a canonical grammar eliminates useless symbols and e-productions. It suffices to show that there are no cycles. A cycle can occur only if there is a sequence of productions of type 3, say [qiq'i] → [qi+1q'i+1], 1 ≤ i < j, where [q1q'1] = [qjq'j]. But then the rules for the construction of a canonical grammar imply that the write sequence for q1 begins with q1, q2, ..., qj and thus has a repeat. This would imply that the underlying normal form DPDA has a loop, and we may therefore conclude that no cycles occur. □

"~Note that if L has the prefix property and e is in L, then L = [e}.



LEMMA 8.9
A canonical grammar is a (not necessarily UI) (1, 1)-precedence grammar.†
Proof. Let G = (N, Σ, P, S) be a canonical grammar. We consider the three possible precedence conflicts and show that none can occur.

Case 1: Suppose that X <· Y and X ≐ Y. Since X ≐ Y, there must be a production A → XY of type 2 or 4. Thus, X = [qq'], and either Y ∈ Σ and q' is a scan state or Y = [pp'] and p' is an erase state.
Since X <· Y, there must also be a production B → XC of type 4, where C ⇒+ Yγ for some γ. Let X = [qq'] as above. Then q' must be a write state, because B → XC is a type 4 production. Moreover, Y must be of the form [pp'], where p' is an erase state, and hence A → XY is of type 4. We may conclude from the form of type 4 productions that p is the second state in the write sequence of q'. Because B → XC is a type 4 production, we may also conclude that C = [pp''] for some p''.
Now, let us consider the derivation C ⇒+ Yγ, which we may write as

[s1s'1] ⇒lm [s2s'2]α2 ⇒lm ... ⇒lm [sns'n]αn,

where [s1s'1] = [pp''] and [sns'n] = [pp']. We observe from the form of productions that for each i either si+1 = si (if [sis'i] is replaced by a production of type 2 or 4) or si+1 is the state following si in the write sequence of q' (if [sis'i] is replaced by a production of type 3). Only in the latter case will s'i+1 be an erase state, and thus we may conclude that since s'n (= p') is an erase state, sn (= p) follows sn−1 on the write sequence of q'. Since sn−1 is either p or follows p on that sequence, we may conclude that p appears twice in the write sequence of q'. Since this would imply a loop, we conclude that there are no conflicts between <· and ≐ in a canonical grammar.

Case 2: X <· Y and X ·> Y. Since X <· Y, we may conclude as in case 1 that X = [qq'], where q' is a write state. But if X ·> Y, then there is a production A → BZ, where B ⇒+ γX and Z ⇒* Yβ. The form of the productions assures us that if B ⇒+ γ[qq'], then q' is an erase state. But we already found q' to be a write state. We may conclude that no conflicts between <· and ·> exist.

Case 3: X ≐ Y and X ·> Y. Since X ≐ Y, we may conclude as in case 1 that X = [qq'], where q' is a write or scan state. But, since X ·> Y, we may conclude as in case 2 that q' is an erase state.
Thus, a canonical grammar is a precedence grammar. □

†To prove that a canonical grammar is simple MSP, we need only prove it to be weak precedence, rather than (1, 1)-precedence. However, the additional portion of this lemma is interesting and easy to prove.

THEOREM 8.11
A canonical grammar is a simple mixed strategy precedence grammar.
Proof. By Theorem 8.10 and Lemmas 8.8 and 8.9, it suffices to show that for every canonical grammar G = (N, Σ, P, S)
(1) If A → αXYβ and B → Yβ are in P, then X is not in l(B); and
(2) If A → α and B → α are in P, A ≠ B, then l(A) ∩ l(B) = ∅.
We have (1) immediately, since if X <· B or X ≐ B, then X <· Y. But if A → αXYβ is a production, then X ≐ Y, and so we have a precedence conflict, in violation of Lemma 8.9.
Now let us consider (2). A → α and B → α cannot be distinct type 2 productions, for if A = [qq'], B = [pp'], and α = [rr']a, and if G comes from a DPDA M = (Q, Σ, Γ, δ, q0, Z0, {qf}), we have

δ(r', a, Z) = (q', Z) = (p', Z).

Thus q' = p'. But the form of type 2 productions assures us that q = p = r, so A = B, which we assumed not to be the case. A similar argument, left for the Exercises, shows that A → α and B → α cannot be distinct type 4 productions.
Let us now consider the case α = a, i.e., where A → a and B → a are productions of type 1. In general, if X <· Y or X ≐ Y, we have seen that X must be a nonterminal (proof of Lemma 8.9, case 1). Thus, suppose that C is in both l(A) and l(B). Then there exist productions D1 → CD2 and E1 → CE2, where D2 ⇒* Aβ and E2 ⇒* Bγ. The cases A = D2 or B = E2 are not ruled out. Let C = [qq'], A = [pp'], and B = [rr']. Then p and r are scan states, since [pp'] → a and [rr'] → a are type 1 productions. By a previous argument, p and r must each appear in the write sequence of q', and thus each must end that sequence. Hence, p = r. Since

δ(p, a, Z) = (p', Z) = (r', Z)

we have p' = r' and A = B, in contradiction. Thus, A → α and B → α may not be of type 1.
Last, suppose that A → α and B → α are of type 3. As in the previous paragraph, let C be in l(A) ∩ l(B), with C = [qq'], A = [pp'], and B = [rr']. Then p and r are each in the write sequence of q'. If p = r, let δ(p, e, Z) be (s, YZ). Then α = [ss'] for some s', and δ(s', e, Y) = (p', e) = (r', e). Thus, r' = p', and again A = B, which we know not to be the case.
We conclude that p ≠ r. However, if α = [ss'], then s follows both p and r in the write sequence of q' and thus appears twice. We conclude that A → α and B → α are not of type 3 and that condition (2) does not occur. Thus, G is simple MSP. □

As a consequence of Theorem 8.11, L¢ has a simple MSP grammar for every deterministic language L, where ¢ is an endmarker. In fact, we can prove more, by showing that ¢ can be removed from the end of each string generated by a canonical grammar; the modified grammar will also be simple MSP. As the construction which eliminates endmarkers will be used several times, we shall give it the status of an algorithm.
ALGORITHM 8.4
Elimination of the right endmarker from the sentences generated by a proper grammar.
Input. A proper grammar G = (N, Σ ∪ {¢}, P, S), where ¢ is not in Σ and L(G) is of the form L¢ for some L ⊆ Σ+.
Output. A grammar G1 = (N1, Σ, P1, S) such that L(G1) = L.
Method.
(1) Remove all productions of the form A → ¢ from P.
(2) Replace all productions in P of the form A → α¢, where α ≠ e, by A → α.
(3) If A → αB is in P, α ≠ e, and B ⇒*_G ¢, then add A → α to P.
(4) Remove useless nonterminals and productions from N and the resultant set of productions. Let N1 and P1 be the nonterminals and productions remaining. □
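A direct transcription of Algorithm 8.4 is short. The sketch below is illustrative only: it assumes productions are (lhs, rhs) pairs with rhs a tuple of symbols and "¢" the endmarker, and it omits step (4), the removal of useless symbols.

```python
def eliminate_right_endmarker(productions, endmarker="¢"):
    """Algorithm 8.4, steps (1)-(3): strip a right endmarker from L(G) = L¢."""
    # Nonterminals B with B =>* ¢.  Since G is proper (no e-productions),
    # such a derivation can only pass through length-1 right-hand sides.
    derives_endmarker = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if lhs in derives_endmarker or len(rhs) != 1:
                continue
            if rhs[0] == endmarker or rhs[0] in derives_endmarker:
                derives_endmarker.add(lhs)
                changed = True

    new_productions = set()
    for lhs, rhs in productions:
        if rhs == (endmarker,):
            continue                                  # step (1): drop A -> ¢
        if rhs and rhs[-1] == endmarker:
            new_productions.add((lhs, rhs[:-1]))      # step (2): A -> alpha ¢ becomes A -> alpha
        else:
            new_productions.add((lhs, rhs))
        if len(rhs) >= 2 and rhs[-1] in derives_endmarker:
            new_productions.add((lhs, rhs[:-1]))      # step (3): A -> alpha B with B =>* ¢
    return new_productions
```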

THEOREM 8.12
If G1 is the grammar constructed in Algorithm 8.4, then L(G1) = L.
Proof. Since every sentence w in L(G) is of the form x¢ for x ∈ Σ+, it follows that for every A ∈ N, either A ⇒*_G u implies u ∈ Σ+, or A ⇒*_G u implies u = v¢, where v ∈ Σ*. Let us call nonterminals of the first kind intermediate and nonterminals of the latter type completing. A straightforward induction on the length of derivations shows that if A is an intermediate nonterminal, then A ⇒*_G u if and only if A ⇒*_{G1} u, u ∈ Σ*. Likewise, if A is a completing nonterminal, then A ⇒*_G v¢ if and only if A ⇒*_{G1} v. The proof is left for the Exercises.
We now have S ⇒*_G w¢ if and only if S ⇒*_{G1} w. Thus L(G1) = L. □

THEOREM 8.13
If L is a deterministic language and e is not in L, then L is generated by a simple MSP grammar.
Proof. Let L ⊆ Σ+ and ¢ not be in Σ. Then by Corollary 2 to Theorem 8.10, L¢ is generated by a canonical grammar G = (N, Σ, P, S), which by Theorem 8.11 is a simple MSP grammar. Let G1 = (N1, Σ, P1, S) be the grammar constructed from G by Algorithm 8.4. Then L(G1) = L, and we shall show that G1 is also simple MSP.
It is easy to show that G1 will have no e-productions or useless symbols. If G1 had a cycle, it would have to involve a production A → B which is in P1 but not in P. Suppose that A → BX is in P for some X such that X ⇒*_G ¢. Suppose further that A ⇒_{G1} B ⇒*_{G1} A. It follows that A ⇒*_G Aw¢ for some w in (Σ ∪ {¢})*, and hence G generates words with more than one ¢. Since we know this not to be the case, we conclude that G1 is proper.
Next, we must show that G1 is a precedence grammar. Since ¢ appears only at the right-hand end of strings generated by S in G, it is easy to show that the only new relations for G1 that do not hold for G involve $ (the endmarker in the precedence formalism) on the right. But X ·> $ is the only relation that can hold with $ on the right, and so G1 must be a precedence grammar.
Third, we must show that no new conflicts A → αXYβ and B → Yβ, with X in l(B), occur in G1. As in Theorem 8.10, the simple precedence property rules out such problems.
Last, we must show that there is no pair of productions A → α and B → α in P1, where A ≠ B. Three cases need to be considered.

Case 1: Suppose that α = C and that A → CX and B → CY are in P, where X ⇒*_G ¢ and Y ⇒*_G ¢. Let C = [qq']. If q' is a scan state, then X = Y = ¢. As we saw in the proof of Theorem 8.11, the left-hand side of a type 2 production is uniquely determined by its right-hand side, and so A = B, contrary to hypothesis. If q' is a write state, let p be the unique scan state in the write sequence of q' and let δ(p, ¢, Z) = (p', Z), where δ is the move function of the DPDA from which G was constructed. After entering state p', the DPDA can do nothing but erase its stack, because if it scans or writes, either it accepts a string with ¢ in the middle or X or Y are useless symbols. There is then a unique state p'' in which the DPDA finds itself after erasing the symbols pushed on the pushdown list during the write sequence of q'. Thus, X = Y again, and we conclude that A = B.

Case 2: Suppose that α = C and that A → C and B → CX are in P, where X ⇒*_G ¢. If C = [qq'], then q' is an erase state, as A → C is a type 3 production. But since B → CX is a type 2 or 4 production, q' must be a write or scan state. We thus have a contradiction.

Case 3: If α = C and A → CX and B → C are in P, then we have a situation symmetric to case 2.
We conclude that G1 is simple MSP. □

8.2.3. BRC Grammars, LR Grammars, and Deterministic Languages

We shall show that a canonical grammar is also a (1, 0)-BRC grammar and hence, by Theorem 5.21, an LR(0) grammar. Intuitively, the reason that no lookahead is needed for the shift-reduce parsing of a canonical grammar is that every terminal and every nonterminal [qq'], where q' is an erase state, indicates the right-hand end of a handle. We shall give a formal proof incorporating this idea.
THEOREM 8.14

A canonical grammar is a (1, 0)-BRC grammar.
Proof. Let G = (N, Σ, P, S) be a canonical grammar. Suppose that G were not (1, 0)-BRC. Then we can find derivations in the augmented grammar G' = (N ∪ {S'}, Σ, P ∪ {S' → S}, S'), namely $S' ⇒*rm αXAw ⇒rm αXβw and $S' ⇒*rm γBx ⇒rm γδx, where γδx can be written as α'Xβy with |x| ≤ |y| but α'XAy ≠ γBx. We observe that every right-sentential form of G has an open portion consisting of nonterminals only, and so X ∈ N and α, α', and γ are in N*. We shall consider four cases, depending on the type of the production A → β.

Case 1: Suppose that β = a, where a ∈ Σ. Since |x| ≤ |y|, we must have x = y and either δ = a or δ = Xa. In the first case, X is in l(A) ∩ l(B), which can only occur if A = B, since we know G to be simple MSP. In the second case, we have productions A → a and B → Xa, with X ∈ l(A), again violating the simple MSP condition.

Case 2: Suppose that β = Ca for some C ∈ N. Then since |x| ≤ |y|, we must have x = y and δ = a or δ = Ca. If δ = Ca, then A = B, since productions of type 2 are uniquely invertible, as we saw in Theorem 8.11. It then follows that α'XAy = γBx, contrary to hypothesis. If δ = a, we have C ≐ a from A → Ca. Since C must be the last symbol of γ, we have C <· B or C ≐ B, and hence C <· a, in violation of the fact that G is a precedence grammar.

Case 3: Suppose β = C for some C ∈ N. Either δ = C, δ = a or δ = Ca for some a ∈ Σ, or δ = XC. If δ = C, then X is in l(A) ∩ l(B), which violates the simple MSP condition. Let C = [qq']. Then q' is an erase state. If δ = a, then C is the last symbol of γ. Since B appears to the right of C in a right-sentential form, q' must be a write state. We thus have a contradiction. If δ = Ca, then q' would be a scan state, and so we eliminate this possibility. If δ = XC, then since X is in l(A), we have a violation of the simple MSP condition, with productions A → C and B → XC.

Case 4: Suppose that β = CD for C and D in N. Then δ is one of D, a or Da for some a ∈ Σ, or CD. Since type 4 productions are uniquely invertible (proof of Theorem 8.11), if δ = CD, then A = B and α'XAy = γBx. Let D = [qq']. Because of production A → CD, q' is an erase state. If δ = Da, then q' would be a scan state, and so we eliminate this possibility. If δ = a, then as in Case 3, we can show that q' is a write state. If δ = D, then the last symbol of γ is C, and so C is in l(B). With productions A → CD and B → D, we have a violation of the simple MSP condition. □

COROLLARY 1
Every deterministic language L with the prefix property has a (1, 0)-BRC grammar.
Proof. If e is not in L, the result is immediate. If e ∈ L, then L = {e}. It is easy to find a (1, 0)-BRC grammar for this language. □

COROLLARY 2
If L ⊆ Σ* is a deterministic language and ¢ is not in Σ, then L¢ has a (1, 0)-BRC grammar. □

We can now prove a theorem about arbitrary deterministic languages which is almost a restatement of Corollary 2 above.

THEOREM 8.15
Every deterministic language is generated by a (1, 1)-BRC grammar.
Proof. Let G be a canonical grammar and construct G1 from G by Algorithm 8.4. It is necessary to show that G1 is (1, 1)-BRC. Intuitively, the problem is to recognize when a production A → B of G1, which is not a production of G, is to be used for a reduction. However, the one-symbol lookahead allows us to make such a reduction only when $, the BRC endmarker, is the lookahead. A formal proof is left for the Exercises. □

THEOREM 8.16
(1) Every deterministic language with the prefix property has an LR(0) grammar.
(2) Every deterministic language has an LR(1) grammar.
Proof. From Theorem 5.21, every (m, k)-BRC grammar is an LR(k) grammar. The result is thus immediate from Theorem 8.15 and Corollary 1 to Theorem 8.14. □

8.2.4. Extended Precedence Grammars and Deterministic Languages

Up to this point we have seen that if L is a deterministic context-free language, then
(1) There is a normal form DPDA P such that L(P) = L¢,
(2) There is a simple mixed strategy precedence grammar G such that L(G) = L − {e}, and
(3) There is a (1, 1)-BRC grammar G such that L(G) = L.
We shall now show that there is also a UI (2, 1)-precedence grammar G such that L(G) = L − {e}. We already know that a canonical grammar is a (1, 1)-precedence grammar, although it is not necessarily UI. To obtain our result, we shall perform several transformations on a canonical grammar that will convert it to a UI (2, 1)-precedence grammar. These transformations are summarized as follows:
(1) Each nonterminal is augmented to record the symbol to its left in a rightmost derivation.
(2) Single productions are eliminated by replacing the productions A → B and B → α1 | ... | αm by the productions A → α1 | ... | αm.
(3) Nonterminals are "split" so that each nonterminal either has one production of the form A → a or has no production of that form.
(4) Finally, nonterminals are "strung out" to restore unique invertibility to nonterminals whose production is A → a. That is, a set of productions A1 → a, ..., Ak → a will be replaced by A1 → A2, ..., Ak−1 → Ak and Ak → a.
The first three operations each preserve the (1, 1)-precedence nature of the grammar. The fourth step may make the grammar (2, 1)-precedence at worst, but the fourth step is needed to ensure unique invertibility. We now give the complete algorithm.
ALGORITHM 8.5
Conversion of a canonical grammar to a UI (2, 1)-precedence grammar.
Input. A canonical grammar G = (N, Σ, P, S).
Output. An equivalent uniquely invertible (2, 1)-precedence grammar G4 = (N4, Σ, P4, S4).
Method.
(1) Construct G1 = (N1, Σ, P1, S1) from G as follows:
(a) Let N'1 be the set of symbols A_X, where A ∈ N and X ∈ N ∪ {$}.
(b) Let S1 = S_$.
(c) P'1 consists of productions A_X → B_X, A_X → B_X C_B, A_X → B_X a, and A_X → a, for all X ∈ N ∪ {$}, whenever A → B, A → BC, A → Ba, and A → a are in P, respectively.
(d) N1 and P1 are formed from N'1 and P'1, respectively, by deleting useless symbols and productions.
(2) Construct G2 = (N2, Σ, P2, S1) from G1 as follows:
(a) Remove all single productions from P1 by performing the following operation until no further change is possible. If A → B is currently a production of P1, add A → α for each production B → α in P1 and delete A → B.
(b) Let N2 and P2 be the resulting set of useful nonterminals and productions.
(3) Construct G3 = (N3, Σ, P3, S1) from G2 as follows:
(a) Let N'3 consist of N2 and all symbols A^a such that A → a is in P2.
(b) Add to P'3 the production A^a → a for all A^a in N'3.
(c) If A is in N2, A → α is in P2, and α is not a single terminal, add A → α' to P'3 for each α' in h^{-1}(α), where h is the homomorphism:

h(a) = a for all a ∈ Σ
h(B^a) = B for all B^a ∈ N'3
h(B) = B for all B ∈ N2

(d) Let N3 and P3 be the useful portions of N'3 and P'3.
(4) Construct G4 = (N4, Σ, P4, S4) from G3 as follows:
(a) N4 = N3 and S4 = S1.
(b) P4 is P3 with the following change. If A1 → a, A2 → a, ..., Ak → a are all the productions with a on the right in P3 in some order and k > 1, then these productions are replaced in P4 by A1 → A2, A2 → A3, ..., Ak−1 → Ak, and Ak → a. □

Example 8.4
Let G be defined by the following productions:

S → AB
A → a | b
B → AC
C → D
D → a

Then G1 is defined by

S_$ → A_$ B_A
A_$ → a | b
B_A → A_A C_A
A_A → a | b
C_A → D_A
D_A → a

In step (2) of Algorithm 8.5, C_A → D_A is replaced by C_A → a. D_A is then useless.
In step (3), we add nonterminals A_$^a, A_$^b, A_A^a, A_A^b, and C_A^a. Then A_$, A_A, and C_A become useless, so the resulting productions of G3 are

S_$ → A_$^a B_A | A_$^b B_A
A_$^a → a
A_$^b → b
B_A → A_A^a C_A^a | A_A^b C_A^a
A_A^a → a
A_A^b → b
C_A^a → a

In step (4), we string out A_$^a, A_A^a, and C_A^a, and we string out A_$^b and A_A^b. The resulting productions of G4 are

S_$ → A_$^a B_A | A_$^b B_A
B_A → A_A^a C_A^a | A_A^b C_A^a
A_$^a → A_A^a
A_A^a → C_A^a
C_A^a → a
A_$^b → A_A^b
A_A^b → b  □
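Steps (2) and (4) of Algorithm 8.5, which the example above illustrates, are purely mechanical rewritings of the production set. The Python sketch below shows one way they might be coded, using the same (lhs, rhs) production representation assumed in the earlier sketches; steps (1) and (3), which rename nonterminals, and the removal of useless symbols are omitted.

```python
def remove_single_productions(productions):
    """Step (2): while some A -> B (B a nonterminal) remains, replace it by
    A -> alpha for every current production B -> alpha.
    Assumes the grammar is cycle-free, as G1 built from a canonical grammar is."""
    prods = set(productions)
    nonterminals = {lhs for lhs, _ in prods}
    while True:
        single = next(((A, rhs) for A, rhs in prods
                       if len(rhs) == 1 and rhs[0] in nonterminals), None)
        if single is None:
            return prods
        A, (B,) = single
        prods.discard(single)
        prods |= {(A, rhs) for lhs, rhs in prods if lhs == B}


def string_out(productions):
    """Step (4): if A1 -> a, ..., Ak -> a (k > 1) all have the same terminal a as
    their entire right-hand side, replace them by A1 -> A2, ..., A(k-1) -> Ak, Ak -> a."""
    prods = set(productions)
    lhs_symbols = {lhs for lhs, _ in productions}
    by_terminal = {}
    for lhs, rhs in productions:
        if len(rhs) == 1 and rhs[0] not in lhs_symbols:   # rhs is a single terminal
            by_terminal.setdefault(rhs[0], []).append(lhs)
    for a, lhss in by_terminal.items():
        if len(lhss) > 1:
            lhss = sorted(lhss)                            # "in some order"
            for A in lhss:
                prods.discard((A, (a,)))
            for A, B in zip(lhss, lhss[1:]):
                prods.add((A, (B,)))
            prods.add((lhss[-1], (a,)))
    return prods
```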

We shall now prove by a series of lemmas that G4 is a UI (2, 1)-precedence grammar.
LEMMA 8.10
In Algorithm 8.5,
(1) L(G1) = L(G),
(2) G1 is a (1, 1)-precedence grammar, and
(3) If A → α and B → α are in P1 and α is not a single terminal, then A = B.
Proof. Assertion (1) is a straightforward induction on lengths of derivations. For (2), we observe that if X and Y are related by <·, ≐, or ·> in G1, then X' and Y', which are X and Y with subscripts (if any) deleted, are similarly related in G. Thus G1 is a (1, 1)-precedence grammar, since G is.
For (3), we have noted that productions of types 2 and 4 are uniquely invertible in G. That is, if A → CX and B → CX are in P, then A = B. It thus follows that if A_Y → C_Y X_C and B_Y → C_Y X_C are in P1, then A_Y = B_Y. Similarly, if X is a terminal, and A_Y → C_Y X and B_Y → C_Y X are in P1, then A_Y = B_Y.
We must consider productions in P1 derived from type 3 productions; that is, suppose that we have A_X → C_X and B_X → C_X. Since we have eliminated useless productions in G1, there must be right-hand sides XD and XE of productions of G such that D ⇒*_G Aα and E ⇒*_G Bβ for some α and β. Let X = [qq'], A = [pp'], B = [rr'], and C = [ss']. Then p and r are in the write sequence of q', and s follows both of them. We may conclude that p = r. Since A → C and B → C are both type 3 productions, we have p' = r'. That is, let δ be the transition function of the DPDA underlying G. Then δ(p, e, Z) = (s, YZ) for some Y, and δ(s', e, Y) = (p', e) = (r', e). Thus, A = B, and A_X = B_X.
The case X = $ is handled similarly, with q0, the start state of the underlying DPDA, playing the role of q' in the above. □

LEMMA 8.11
The three properties of G1 stated in Lemma 8.10 apply equally to G2 of Algorithm 8.5.
Proof. Again (1) is a simple induction. To prove (2), we observe that step (2) of Algorithm 8.5 introduces no new right-hand sides and hence no new ≐ relationships. Also, every right-sentential form of G2 is a right-sentential form of G1, and so it is easy to show that no <· or ·> relationships are introduced.
For (3), it suffices to show that if A_X → B_X is a production in P1, then B_X appears on the right of no other production and hence is useless and eliminated in step (2b). We already know that there is no production C_X → B_X in P1 if C ≠ A. The only possibilities are productions C_X → B_X a, C_X → B_X D_B, or C_Y → X_Y B_X. Now A → B is a type 3 production, and so if B = [qq'], then q' is an erase state. Thus, C_X → B_X a and C_X → B_X D_B are impossible, because C → Ba and C → BD are of types 2 and 4, respectively.
Let us consider a production C_Y → X_Y B_X. Let X = [pp'] and A = [rr']. Then q is the second state in the write sequence of p', because C → XB is a type 4 production. Also, since A → B is a type 3 production, q follows r in any write sequence in which r appears. But since X can appear immediately to the left of A in a right-sentential form of G, r appears in the write sequence of p', and r ≠ p'. Thus, q appears twice in the write sequence of p', which is impossible. □

LEMMA 8.12
The three properties of G1 stated in Lemma 8.10 apply equally to G3 of Algorithm 8.5.
Proof. Again, (1) is straightforward. To prove (2), we note that if X and Y are related in G3 by <·, ≐, or ·>, then X' and Y' are so related in G2, where X' and Y' are X and Y with superscripts (if any) removed. Thus, G3 is (1, 1)-precedence, and (2) holds. Part (3) is straightforward. □

THEOREM 8.17
G4 of Algorithm 8.5 is a UI (2, 1)-precedence grammar.
Proof. Since for each a in Σ, P4 has at most one production A → a, it should be clear that G4 is UI. It is easy to show that the only new (1, 1)-precedence conflicts which might be introduced involve new relations
(1) A_X^a ·> b and (2) C <· A_X^a.
Conflicts of form (1) occur because some B_X^a ·> b or B_X^a ≐ b holds in G3, and B_X^a ⇒+_{G4} A_X^a. We might also have A_X^a ≐ b or A_X^a <· b in G3 and hence in G4.
Conflicts of form (2) occur by essentially the same mechanism. We could have C <· B_X^a or C ≐ B_X^a in G3, and B_X^a ⇒+_{G4} A_X^a. We could also have C ≐ A_X^a in G3 and hence in G4. (C' is C with subscripts and superscripts deleted.)
We shall show that potential conflicts of form (1) are resolved by considering (2, 1)-precedence relations. Those of form (2) cannot occur.

Case 1: Suppose for some C ∈ N4 that CA_X^a ·> b in G4 and that either CA_X^a ≐ b or CA_X^a <· b in both G3 and G4. Then C = X_Z^d for some Z and d; the latter symbol may not actually appear. We observed that since A_X^a ·> b is a relation in G4 but not in G3, there must exist a derivation B_Y^a ⇒+_{G4} A_X^a such that C can appear to the left of B_Y^a in a rightmost derivation of G3. Thus, Y = X.
We must show that there cannot be two distinct nonterminals B_X^a and A_X^a in N3. That is, there cannot exist B_X and A_X in N2, with A ≠ B, A ⇒*_G a, and B ⇒*_G a. Let A = [qq'], B = [pp'], and X = [rr']. We observe that q and p are both in the write sequence of r'. Also, if [ss'] ⇒*_G a and [ss''] ⇒*_G a, then referring to the DPDA underlying G, we may conclude that s' = s''. Thus, q' is uniquely determined by q, and p' by p, given that [qq'] ⇒*_G a and [pp'] ⇒*_G a. It follows that since q and p are in the write sequence of r' and q ≠ p, then either B appears as a sentential form in the derivation A ⇒*_G a or A appears in B ⇒*_G a.
Assume the former without loss of generality. Then there is a nontrivial derivation of B from A in G, and this derivation must use only type 3 productions. Then A_X ⇒+_{G1} B_X, and B_X should have been removed in step (2) of Algorithm 8.5. In contradiction, we conclude that B_X^a does not exist in G4.
The case where $ replaces C in the above is handled similarly. Here, q0, the start state of the underlying DPDA, plays the role of r'.

Case 2: Suppose that for some C, we have C <· A_X^a in G4 and C ≐ A_X^a in both G3 and G4. Then there is some B_X^a such that C <· B_X^a or C ≐ B_X^a in G3 and B_X^a ⇒+_{G4} A_X^a. By the same argument as in case 1 we conclude that C = X_Z^d for some Z and d and that A_X ⇒+_{G1} B_X, or vice versa. Thus, A_X or B_X should have been removed in step (2) of Algorithm 8.5. Note that there cannot be a (1, 1)-precedence conflict here, let alone a (2, 1)-precedence conflict.
We conclude that G4 is a UI (2, 1)-precedence grammar. □

COROLLARY 1
Every deterministic language L with the prefix property but without e has a UI (2, 1)-precedence grammar. □

COROLLARY 2
If L ⊆ Σ* is a deterministic language and ¢ is not in Σ, then L¢ has a UI (2, 1)-precedence grammar. □

We can strengthen Theorem 8.17 by deleting the endmarker in Corollary 2.

THEOREM 8.18
Every deterministic language without e has a UI (2, 1)-precedence grammar.
Proof. Let L¢ have canonical grammar G. Apply Algorithm 8.5 to G and Algorithm 8.4 to the resulting G4. We claim that the grammar constructed by Algorithm 8.4 is also a UI (2, 1)-precedence grammar. The proof is left for the Exercises. □

EXERCISES

8.2.1. Find a normal form DPDA accepting
(a) {wcw^R | w ∈ (a + b)*}.
(b) {a^m b^n a^n b^m | m, n ≥ 1}.
(c) L(G0).
8.2.2. Find the canonical grammar for the normal form DPDA

P = ({q0, q1, q2, q3, qf}, {0, 1}, {Z0, Z1, X}, δ, q0, Z0, {qf})

where δ is given, for all Y, by

δ(q0, e, Y) = (q1, Z1Y)
δ(q1, 0, Y) = (q2, Y)
δ(q1, 1, Y) = (q3, Y)
δ(q2, e, Y) = (q1, XY)
δ(q3, e, X) = (q1, e)
δ(q3, e, Z1) = (qf, e)
δ(q3, e, Z0) = (qf, e)†

8.2.3. Identify the write, scan, and erase states in Exercise 8.2.2.
8.2.4. Give formal constructions for Theorem 8.9.
The following three exercises refer to a canonical grammar G = (N, Σ, P, S).
8.2.5. Show that
(a) If [qq'] → a is in P, then q is a scan state.
(b) If [qq'] → [pp']a is in P, then p' is a scan state.
†This rule is never used but appears for the sake of the normal form.

(c) If [qq'] → [pp'] is in P, then q is a write state and p' an erase state.
(d) If [qq'] → [pp'][rr'] is in P, then p' is a write state and r' is an erase state.
8.2.6. Show that if [qq'][pp'] appears as a substring in a right-sentential form of G, then
(a) q' is a write state.
(b) q' ≠ p.
(c) p is in the write sequence of q'.
8.2.7. Show that if [qq'] ⇒+ α[pp'], then p' is an erase state.
8.2.8. Prove the "only if" portion of Theorem 8.10.
8.2.9. Give a formal proof of Theorem 8.15.
8.2.10. Use Algorithm 8.5 to find UI (2, 1)-precedence grammars for the following deterministic languages:
(a) {a0^n c0^n | n > 0} ∪ {b0^n c0^{2n} | n ≥ 0}.
(b) {0^n a1^n 0^m | n > 0, m > 0} ∪ {0^m b1^n 0^n | n > 0, m > 0}.
8.2.11. Complete the case X = $ in Lemma 8.10.
8.2.12. Complete the proof of Theorem 8.17.
8.2.13. Prove Theorem 8.18.
*8.2.14. Show that a CFL has an LR(0) grammar if and only if it is deterministic and has the prefix property.
8.2.15. Show that a CFL has a (1, 0)-BRC grammar if and only if it is deterministic and has the prefix property.
**8.2.16. Show that every deterministic language has an LR(1) grammar in
(a) CNF.
(b) GNF.
8.2.17. Show that if A → α and B → α are type 4 productions of a canonical grammar, then A = B.
*8.2.18. If G of Algorithm 8.4 is a canonical grammar, does G1 constructed in that algorithm right-cover G? What if G is an arbitrary grammar?
8.2.19. Complete the proof of Theorem 8.12.
*8.2.20. Show that every LR(k) grammar is right-covered by a (1, k)-BRC grammar. Hint: Modify the LR(k) grammar by replacing each terminal a on the right of productions by a new nonterminal Xa and adding production Xa → a. Then modify nonterminals of the grammar to record the set of valid items for the viable prefix to their right.
**8.2.21. Show that every LR(k) grammar is right-covered by an LR(k) grammar which is also a (not necessarily UI) (1, 1)-precedence grammar.
*8.2.22. Show that G4 of Algorithm 8.5 right-covers G of that algorithm.

Open Problems
8.2.23. Is every LR(k) grammar covered by an LR(1) grammar?
8.2.24. Is every LR(k) grammar covered by a UI (2, 1)-precedence grammar? A positive answer here would yield a positive answer to Exercise 8.2.23.
8.2.25. We stated this one in Chapter 2, but no one has solved it yet, so we shall state it again. Is the equivalence problem for DPDA's decidable? Since all the constructions of this section can be effectively carried out, we have many equivalent forms for this problem. For example, one might show that the equivalence problem for simple MSP grammars or for UI (2, 1)-precedence grammars is decidable.

BIBLIOGRAPHIC NOTES

Theorem 8.11 was first derived by Aho, Denning, and Ullman [1972]. Theorems 8.15 and 8.16 initially appeared in Knuth [1965]. Theorem 8.18 is from Graham [1970]. Exercise 8.2.21 is from Gray and Harrison [1969].

8.3. THEORY OF SIMPLE PRECEDENCE LANGUAGES

We have seen that many classes of grammars generate exactly the deterministic languages. However, there are also several important classes of grammars which do not generate all the deterministic languages. The LL grammars are such a class. In this section we shall study another such class, the simple precedence grammars. We shall show that the simple precedence languages are a proper subset of the deterministic languages and are incommensurate with the LL languages. We shall also show that the operator precedence languages are a proper subset of the simple precedence languages.

8.3.1. The Class of Simple Precedence Languages

As we remarked in Chapter 5, every CFL has a UI grammar and a precedence grammar. When these properties occur simultaneously in a grammar, we have a simple precedence grammar and language. It is interesting to examine the power of simple precedence grammars. They can generate only deterministic languages, since each simple precedence grammar has a deterministic parser. We shall now prove the two main results regarding the class of simple precedence languages. First, we show that

L1 = {a0^n 1^n | n ≥ 1} ∪ {b0^n 1^{2n} | n ≥ 1}

is not a simple precedence language.



THEOREM 8.19
The simple precedence languages form a proper subset of the deterministic languages.
Proof. Clearly, every simple precedence grammar is an LR(1) grammar. To prove proper inclusion, we shall show that there is no simple precedence grammar that generates the deterministic language L1.
Intuitively, the reason for this is that any simple precedence parser for L1 cannot keep count of the number of 0's in an input string and at the same time know whether an a or a b was seen at the beginning of the input. If the parser stores the first input symbol on the pushdown list followed by the succeeding string of 0's, then, when the 1's are encountered on the input, the parser will not know whether to match one or two 1's with each 0 on the pushdown list without first erasing all the 0's stored on the stack. On the other hand, if the parser tries to maintain on top of the pushdown list an indication of whether an a or b was initially seen, then it must make a sequence of reductions while reading the 0's, which destroys the count of the number of 0's seen on the input.
We shall now construct a formal proof motivated by this intuitive reasoning. Suppose that G = (N, Σ, P, S) is a simple precedence grammar such that L(G) = L1. We shall show as a contradiction that any such grammar must also derive sentences not in L1.
Suppose that an input string a0^n w, w ∈ 1*, is to be parsed by the simple precedence parser constructed for G according to Algorithm 5.12. As a0^n is the prefix of some sentence in L1 for all n, each 0 must eventually be shifted onto the stack. Let αi be the stack contents after the ith 0 is shifted. If αi = αj for some i < j, then a0^i 1^i and a0^j 1^i would either both be accepted or both be rejected by the parser, and so αi ≠ αj if i ≠ j.
Thus, for any constant c we can find an αi such that |αi| > c and αi is a prefix of every αj, j > i (for if not, then we could construct an arbitrarily long sequence αj1, αj2, ... such that |αjt| ≤ |αj(t−1)| for t ≥ 2 and thus find two identical α's). Choose i as small as possible. Then there must be some shortest string β ≠ e such that for each k, αiβ^k is αi+mk for some m > 0. The reason is that since αi is never erased as long as 0's appear on the input, the symbols written on the stack by the parser do not depend on αi. The behavior of the parser on input a0^n must be cyclic, and it must enlarge the stack or repeat the same stack contents (and we have just argued that it may not do the latter).
Now, let us consider the behavior of the parser on an input string of the form b0^n x. Let γk be the stack after reading b0^k. We may argue as above, that for some γj, j as small as possible, there is a shortest string δ ≠ e such that for each k, γjδ^k = γj+qk for some q > 0. In fact, since γj is never erased, we must have δ = β and q = m. That is, a simple induction on r ≥ 0 shows that if after reading a0^{i+r} the stack holds αiθ, then after reading b0^{j+r}, the stack will hold γjθ.
Consider the moves made by the parser acting on the input string a0^{i+mk} 1^{i+mk}. After reading a0^{i+mk}, the stack will contain αiβ^k. Then let s be the largest number such that after reading 1^{i+mk−s} the parser will have αiψ left on its stack for some ψ in (N ∪ Σ)* (i.e., on the next 1 input, one of the symbols of αi is involved in a reduction). It is easy to show that s is unique; otherwise, the parser would accept a string not in L1.
Similarly, let r be the largest number such that beginning with γjβ^k on its stack the parser with input 1^{2(j+mk)−r} makes a sequence of moves ending up with some γjψ' on its stack. Again, r must be unique.
Then for all k, the input b0^{j+mk} 1^{i+mk−s+r} must be accepted, since b0^{j+mk} causes the stack to become γjβ^k, and 1^{i+mk−s} causes the stack to become γjψ. The erasure of the 0's occurs independently of whether αi or γj is below, and 1^r causes acceptance. But since m ≠ 0, it is impossible that we have i + mk − s + r = 2(j + mk) for all k.
We conclude that L1 is not a simple precedence language. □

THEOREM 8.20
The classes of LL and simple precedence languages are incommensurate.
Proof. L1 of Theorem 8.19 is an LL(1) language which is not a simple precedence language. The natural grammar for L1

S → aA | bB
A → 0A1 | 01
B → 0B11 | 011

is easily shown to be an LL(2) grammar. Left factoring converts it to an LL(1) grammar.
We claim that the language L2 = {0^n a1^n | n ≥ 1} ∪ {0^n b2^n | n ≥ 1} is not an LL language. (See Exercise 8.3.2.) L2 is a simple precedence language; it has the following simple precedence grammar:

S → A | B
A → 0A1 | a
B → 0B2 | b  □

8.3.2. Operator Precedence Languages

Let us now turn our attention to the class of languages generated by the operator precedence grammars. Although there are ambiguous operator precedence grammars (the grammar S → A | B, A → a, B → a is a simple example), we shall discover that operator precedence languages are a proper subset of the simple precedence languages. We begin by showing that every operator precedence language has an operator grammar which has no single productions and which is uniquely invertible.
LEMMA 8.13
Every (not necessarily UI) operator precedence grammar is equivalent to one with no single productions.
Proof. Algorithm 2.11, which removes single productions, is easily seen to preserve the operator precedence relations between terminals. □

We shall now provide an algorithm to make any context-free grammar with no single productions uniquely invertible.

ALGORITHM 8.6
Conversion of a CFG with no single productions to an equivalent UI CFG.
Input. A CFG G = (N, Σ, P, S) with no single productions.
Output. An equivalent UI CFG G1 = (N1, Σ, P1, S1).
Method.
(1) The nonterminals of the new grammar will be nonempty subsets of N. Formally, let N'1 = {M | M ⊆ N and M ≠ ∅} ∪ {S1}.
(2) For each w0, w1, ..., wk in Σ* and M1, ..., Mk in N'1, place in P'1 the production M → w0M1w1 ··· Mkwk, where

M = {A | there is a production of the form A → w0B1w1 ··· Bkwk, where Bi ∈ Mi, for 1 ≤ i ≤ k}

provided that M ≠ ∅. Note that only a finite number of productions are so generated.
(3) For all M ⊆ N such that S ∈ M, add the production S1 → M to P'1.
(4) Remove all useless nonterminals and productions, and let N1 and P1 be the useful portions of N'1 and P'1, respectively. □

Example 8.5
Consider the grammar

S → a | aAbS
A → a | aSbA

From step (2) we obtain the productions

{S} → a{A}b{S} | a{S, A}b{S} | a{A}b{S, A}
{A} → a{S}b{A} | a{S, A}b{A} | a{S}b{S, A}
{S, A} → a | a{S, A}b{S, A}

From step (3) we obtain the productions

S1 → {S} | {S, A}

From step (4) we discover that all {S}- and {A}-productions are useless, and so the resulting grammar is

S1 → {S, A}
{S, A} → a | a{S, A}b{S, A}
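Algorithm 8.6 is essentially a subset construction on nonterminals, and for small grammars it can be run directly. The Python sketch below is an illustrative transcription (with the assumed (lhs, rhs) production format and frozensets as the new nonterminals); it enumerates step (2) by brute force over all nonempty subsets, so it is exponential in |N|, and it omits step (4). Running it on the grammar of Example 8.5 reproduces the productions listed there, before useless symbols are discarded.

```python
from itertools import product

def make_uniquely_invertible(productions, start, nonterminals):
    """Algorithm 8.6, steps (1)-(3): subset construction for unique invertibility."""
    nts = set(nonterminals)
    subsets = [frozenset(s) for s in _nonempty_subsets(nts)]

    # Group old productions by "shape": the rhs with each nonterminal position
    # abstracted out.  shape -> list of (lhs, tuple of nonterminals at the holes)
    shapes = {}
    for lhs, rhs in productions:
        shape = tuple(None if s in nts else s for s in rhs)
        holes = tuple(s for s in rhs if s in nts)
        shapes.setdefault(shape, []).append((lhs, holes))

    new_productions = set()
    for shape, instances in shapes.items():            # step (2)
        k = shape.count(None)
        for choice in product(subsets, repeat=k):       # one subset per hole
            M = frozenset(lhs for lhs, holes in instances
                          if all(B in Mi for B, Mi in zip(holes, choice)))
            if M:
                it = iter(choice)
                rhs = tuple(next(it) if s is None else s for s in shape)
                new_productions.add((M, rhs))

    S1 = "S1"                                           # step (3)
    for M in subsets:
        if start in M:
            new_productions.add((S1, (M,)))
    return new_productions, S1

def _nonempty_subsets(xs):
    xs = list(xs)
    for mask in range(1, 1 << len(xs)):
        yield {x for i, x in enumerate(xs) if mask >> i & 1}
```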

LEMMA 8.14
The grammar G1 constructed from G in Algorithm 8.6 is UI if G has no single productions and is an operator precedence grammar if G is operator precedence. Moreover, L(G1) = L(G).
Proof. Step (2) certainly enforces unique invertibility, and since G has no single productions, step (3) cannot introduce a nonuniquely invertible right side. A proof that Algorithm 8.6 preserves the operator precedence relations among terminals is straightforward and left for the Exercises. Finally, it is easy to verify by induction on the length of a derivation that if A ∈ M, then A ⇒*_G w if and only if M ⇒*_{G1} w. □

We thus have the following normal form for operator precedence grammars.

THEOREM 8.21
If L is an operator precedence language, then L = L(G) for some operator precedence grammar G = (N, Σ, P, S) such that
(1) G is UI,
(2) S appears on the right of no production, and
(3) The only single productions in P have S on the left.
Proof. Apply Algorithms 2.11 and 8.6 to an arbitrary operator precedence grammar. □

We shall now take an operator precedence grammar G in the normal form given in Theorem 8.21 and change G into a UI weak precedence grammar G1 such that L(G1) = ¢L(G), where ¢ is a left endmarker. We can then construct a UI weak precedence grammar G2 such that L(G2) = L(G). In this way we shall show that the operator precedence languages are a subset of the simple precedence languages.

ALGORITHM 8.7
Conversion of an operator precedence grammar to a weak precedence grammar.
Input. A UI operator precedence grammar G = (N, Σ, P, S) satisfying the conditions of Theorem 8.21.
Output. A UI weak precedence grammar G1 = (N1, Σ ∪ {¢}, P1, S1) such that L(G1) = ¢L(G), where ¢ is not in Σ.
Method.
(1) Let N'1 consist of all symbols [XA] such that X ∈ Σ ∪ {¢} and A ∈ N.
(2) Let S1 = [¢S].
(3) Let h be the homomorphism from N'1 ∪ Σ ∪ {¢} to (N ∪ Σ ∪ {¢})* such that
(a) h(a) = a, for a ∈ Σ ∪ {¢}, and
(b) h([aA]) = aA.
Then h^{-1}(α) is defined only for strings α in (N ∪ Σ ∪ {¢})* which begin with a symbol in Σ ∪ {¢} and do not have adjacent nonterminals. Moreover, h^{-1}(α) is unique if it is defined. That is, h^{-1} combines a nonterminal with the terminal appearing to its left. Let P'1 consist of all productions [aA] → h^{-1}(aα) such that A → α is in P and a is in Σ ∪ {¢}.
(4) Let N1 and P1 be the useful portions of N'1 and P'1, respectively. □

Example 8.6
Let G be defined by

S → A
A → aAbAc | aAd | a

We shall generate only the useful portion of N'1 and P'1 in Algorithm 8.7. We begin with nonterminal [¢S]. Its production is [¢S] → [¢A]. The productions for [¢A] are [¢A] → ¢[aA][bA]c, [¢A] → ¢[aA]d, and [¢A] → ¢a. The productions for [aA] and [bA] are constructed similarly. Thus, G1 is

[¢S] → [¢A]
[¢A] → ¢[aA][bA]c | ¢[aA]d | ¢a
[aA] → a[aA][bA]c | a[aA]d | aa
[bA] → b[aA][bA]c | b[aA]d | ba  □
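The heart of Algorithm 8.7 is the inverse homomorphism h^{-1}, which glues each nonterminal to the terminal on its left. A small illustrative Python sketch is given below (again with the assumed (lhs, rhs) representation, taking Σ to be the terminals occurring in the productions, and without step (4)'s removal of useless symbols); applied to the grammar of Example 8.6 it generates the productions listed there.

```python
def left_endmarker_grammar(productions, start, nonterminals, endmarker="¢"):
    """Algorithm 8.7, steps (1)-(3): build G1 with L(G1) = ¢ L(G).

    New nonterminals are pairs (a, A): nonterminal A tagged with the terminal a
    appearing immediately to its left."""
    nts = set(nonterminals)
    terminals = {s for _, rhs in productions for s in rhs if s not in nts}

    def h_inverse(symbols):
        """Combine each nonterminal with the terminal to its left; None if undefined."""
        out, prev = [], None
        for s in symbols:
            if s in nts:
                if prev is None:           # leading nonterminal or two adjacent ones
                    return None
                out[-1] = (prev, s)        # replace the terminal by the pair [prev s]
                prev = None
            else:
                out.append(s)
                prev = s
        return tuple(out)

    new_productions = set()
    for a in terminals | {endmarker}:
        for A, rhs in productions:
            new_rhs = h_inverse((a,) + rhs)
            if new_rhs is not None:
                new_productions.add(((a, A), new_rhs))
    return new_productions, (endmarker, start)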

LEMMA 8.15
In Algorithm 8.7 the grammar G1 is a UI weak precedence grammar such that L(G1) = ¢L(G).
Proof. Unique invertibility is easy to show, and we omit the proof. To show that G1 is a weak precedence grammar, we must show that <· and ≐ are disjoint from ·> in G1.† Let us define the homomorphism g by
(1) g(a) = a for a ∈ Σ,
(2) g(¢) = $, and
(3) g([aA]) = a.
It suffices to show that
(1) If X <· Y in G1, then g(X) <· g(Y) or g(X) ≐ g(Y) in G.
(2) If X ≐ Y in G1, then g(X) <· g(Y) or g(X) ≐ g(Y) in G.
(3) If X ·> Y in G1, then g(X) ·> g(Y) in G.

Case 1: Suppose that X <· Y in G1, where X ≠ ¢ and X ≠ $. Then there is a right-hand side in P1, say αX[aA]β, such that [aA] ⇒+_{G1} Yγ for some γ. If α is not e, then it is easy to show that there is a right-hand side in P with substring h(X[aA])‡ and thus g(X) ≐ a in G. But the form of the productions in P1 implies that g(Y) must be a. Thus, g(X) ≐ g(Y).
If α is e and X ∈ Σ, then the left-hand side associated with right-hand side αX[aA]β is of the form [XB] for some B ∈ N. Then there must be a production in P whose right-hand side has substring XB. Moreover, B → aAh(β) is a production in P. Hence, X <· a in G. Since g(X) = X in this case, the conclusion g(X) <· g(Y) follows. If α is e and X = [bB] for some b ∈ Σ and B ∈ N, let the left-hand side associated with right-hand side αX[aA]β be [bC]. Then there is a right-hand side in P with substring bC, and C → BaAh(β) is in P. Thus, b <· a in G, and g(X) <· g(Y) follows. The case in which α = e and X = [¢B] for some B ∈ N is easily handled, as is the case where X itself is ¢ or $.

Case 2: The case X ≐ Y is handled similarly to case 1.

Case 3: X ·> Y. Assume that Y ≠ $. Then there is some right-hand side in P1, say α[aA]Zβ, such that [aA] ⇒+_{G1} γX and Z ⇒*_{G1} Yδ for some γ and δ. The form of productions in P1 again implies that either Z = Y and both are in Σ ∪ {¢}, or Z = [aB] and Y = [aC] or Y = a, for some B and C in N. In any case, g(Z) = g(Y).
There must be a right-hand side in P with substring Ag(Z) because α[aA]Zβ is a right-hand side of P1. Since G has no single productions except for starting productions, γ is not e, and so g(X) is derived from A rather than a in the derivation [aA] ⇒+_{G1} γX. Thus, g(X) ·> g(Z), and g(X) ·> g(Y) in G.
The case Y = $ is handled easily.
To complete the proof that G1 is a weak precedence grammar, we must show that if A → αXβ and B → β are in P1, then neither X <· B nor X ≐ B holds. Let B = [aC], and suppose for the moment that C ≠ S. Then, since G has no single productions without S on the left, we know that β = Yβ' for some β' ≠ e, where h(Y) begins with a. Since β' ≠ e, h(β') has at least one terminal; let b be the leftmost one. Then, since a can appear to the left of C in a right-sentential form of G, we know that a <· b in G. But since αXβ is a right-hand side in P1, h(β) is a subword of some right-hand side in P, and so a ≐ b in G. We thus rule out the possibility that C ≠ S.
If C = S, then by Theorem 8.21(2), we must have a = ¢. But then, since αXβ is a right-hand side in P1, ¢ must appear in a right-hand side in P, which we assumed not to be the case.
We conclude that G1 is a weak precedence grammar. The proof that L(G1) = ¢L(G) is straightforward and will be omitted. □

†Here and subsequently, the symbols <·, ≐, and ·> refer to operator precedence relations in G and Wirth-Weber precedence relations in G1.
‡h is the homomorphism in Algorithm 8.7.

THEOREM 8.22
If L is an operator precedence language, then L is a simple precedence language.
Proof. By Theorem 8.21 and Lemma 8.15, if L ⊆ Σ+ is an operator precedence language, then ¢L is a UI weak precedence language, where ¢ is not in Σ. A straightforward generalization of Algorithm 8.4 allows us to remove the left endmarker from the grammar for ¢L constructed by Algorithm 8.7. (The form of productions in Algorithm 8.7 assures that the resulting grammar will be UI and proper.) The actual proof is left for the Exercises. By Theorem 5.16, L is a simple precedence language. □

8.3.3. " Chapter Summary

We have, in Chapter 8, shown the inclusion relations for classes of lan-


guages indicated in Fig. 8.7. All inclusions are proper.
Examples of languages which are contained in the regions shown in Fig.
8.7 are
(1) {0ⁿ1ⁿ | n ≥ 1} is LL(1) and operator precedence.
(2) {0ⁿ1ⁿ | n ≥ 1} ∪ {0ⁿ2ⁿ | n ≥ 1} is operator precedence but not LL.
(3) {a0ⁿ1ⁿ0ᵐ | m, n ≥ 1} ∪ {b0ⁿ1ᵐ0ᵐ | m, n ≥ 1} ∪ {a0ⁿ2ⁿ0ᵐ | m, n ≥ 1} is
simple precedence but neither operator precedence nor LL.
(4) {a0ⁿ1ⁿ0ᵐ | m, n ≥ 1} ∪ {b0ᵐ1ⁿ0ⁿ | m, n ≥ 1} is LL(1) and simple pre-
cedence but not operator precedence.

[Fig. 8.7 Subclasses of deterministic languages. The figure shows the containment
relations among the classes CFL, deterministic CFL, LR(1), (1,1)-BRC, UI (2,1)-
precedence, simple MSP, simple precedence, UI weak precedence, LL, and operator
precedence; lettered regions locate the example languages listed in the text.]

(5) {a0ⁿ1ⁿ | n ≥ 1} ∪ {b0ⁿ1²ⁿ | n ≥ 1} is LL(1) but not simple precedence.
(6) {a0ⁿ1ⁿ | n ≥ 1} ∪ {a0ⁿ2²ⁿ | n ≥ 1} ∪ {b0ⁿ1²ⁿ | n ≥ 1} is deterministic but
not LL or simple precedence.
(7) {0ⁿ1ⁿ | n ≥ 1} ∪ {0ⁿ1²ⁿ | n ≥ 1} is context-free but not deterministic.
Proofs that these languages have the ascribed properties are requested
in the Exercises.

EXERCISES

8.3.1. Show that the grammar for L₁ given in Theorem 8.20 is LL(2). Find an
LL(1) grammar for L₁.
*8.3.2. Prove that the language L₂ = {0ⁿa1ⁿ | n ≥ 1} ∪ {0ⁿb2ⁿ | n ≥ 1} is not an
LL language. Hint: Assume that L₂ has an LL(k) grammar in GNF.
*8.3.3. Show that L₂ of Exercise 8.3.2 is an operator precedence language.
8.3.4. Prove that Algorithm 2.11 (elimination of single productions) preserves
the properties of
(a) Operator precedence.
(b) (m, n)-precedence.
(c) (m, n)-BRC.
8.3.5. Prove Lemma 8.14.

*8.3.6. Why does Algorithm 8.6 not necessarily convert an arbitrary precedence
grammar into a simple precedence grammar?
8.3.7. Convert the following operator precedence grammar to an equivalent
simple precedence grammar:

S → if b then S₁ else S | if b then S | a

S₁ → if b then S₁ else S₁ | a

8.3.8. Prove case 2 of Lemma 8.15.


8.3.9. Give an algorithm to remove the left endmarker from the language
generated by a grammar. Show that your algorithm preserves the UI
weak precedence property when applied to a grammar constructed by
Algorithm 8.7.
8.3.10. Show that the simple precedence language

L = {a0ⁿ1ⁿ0ᵐ | n ≥ 1, m ≥ 1} ∪ {b0ⁿ1ᵐ0ᵐ | n ≥ 1, m ≥ 1}

is not an operator precedence language.


Hint: Show that any grammar for L must have 1 ⋖ 1 and 1 ⋗ 1 as
part of the operator precedence relations.
*8.3.11. Show that L of Exercise 8.3.10 is a simple precedence language.
*8.3.12. Prove that the languages given in Section 8.3.3 have the properties
ascribed to them.
8.3.13. Give additional examples of languages which belong to the various
regions in Fig. 8.7.
8.3.14. Generalize Theorem 8.18 to show that L1 of that theorem is not UI
(1, k)-precedence for any k.
8.3.15. Show that G₁ of Algorithm 8.7 right-covers G of that algorithm.
8.3.16. Does G₁ constructed in Algorithm 8.6 right-cover G of that algorithm
if G is proper?
**8.3.17. Let L be the simple precedence language

L = {aⁿ0aⁱb | i, n ≥ 0} ∪ {0aⁿ1aⁱc | i, n ≥ 0}.

Show that there is no simple precedence parser for L that will announce
error immediately after reading the 1 in an input string of the form
aⁿ1aⁱb. (See Exercise 7.3.5.)

Open Question
8.3.18. What is the relationship of the class of UI (1, k)-precedence languages,
k > 1, to the classes shown in Fig. 8.7? The reader should be aware
of Exercise 8.3.14.

BIBLIOGRAPHIC NOTES

The proper inclusion of the simple precedence languages in the deterministic


context-free languages and the proper containment of the operator precedence
languages in the simple precedence languages were first proved by Fischer [1969].
9   TRANSLATION AND CODE GENERATION

A compiler designer is usually presented with an incomplete specification


of the language for which he is to design a compiler. Much of the syntax of
the language can be precisely defined in terms of a context-free grammar.
However, the object code that is to be generated for each source program is
more difficult to specify. Broadly applicable formal methods for specifying the
semantics of a programming language are still a subject of current research.
In this chapter we shall present and give examples of declarative formal-
isms that can be used to specify some of the translations performed within a
compiler. Then we shall investigate techniques for mechanically implementing
the translations defined by these formalisms.

9.1. THE ROLE OF TRANSLATION IN COMPILING

We recall from Chapter 1 that a compiler is a translator that maps strings


into strings. The input to the compiler is a string of symbols that constitutes
the source program. The output of the compiler, called the object (or target)
program, is also a string of symbols. The object program can be
(1) A sequence of absolute machine instructions,
(2) A sequence of relocatable machine instructions,
(3) An assembly language program, or
(4) A program in some other language.
Let us briefly discuss the characteristics of each of these forms of the object
program.
(1) Mapping a source program into an absolute machine language pro-


gram that can immediately be executed is one way of achieving very
fast compilation. WATFOR is an example of such a compiler. This type
of compilation is best suited for small programs that do not use sepa-
rately compiled subroutines.
(2) A relocatable machine instruction is an instruction that references
memory locations relative to some movable origin. An object program in
the form of a sequence of relocatable machine instructions is usually called
a relocatable object deck. This object deck can be linked together with other
object decks such as separately compiled user subprograms, input-output
routines, and library functions to produce a single relocatable object deck,
often called a load module. The program that produces the load module
from a set of binary decks is called a link editor. The load module is then
put into memory by a program called a loader that converts the relocatable
addresses into absolute addresses. The object program is then ready for
execution.
Although linking and loading consume time, most commercial compilers
produce relocatable binary decks because of the flexibility in being able to
use extensive library subroutines and separately compiled subprograms.
(3) Translating a source program into an assembly language program
that is then run through an assembler simplifies the design of the compiler.
However, the total time now required to produce an executable machine
language program is rather high because assembling the output of this type
of compiler may take as much time as the compilation itself.
(4) Certain compilers (e.g., SNOBOL) map a source program into
another program in a special internal language. This internal program is
then executed by simulating the sequence of instructions in the internal
program. Such compilers are usually called interpreters. However, we can
view the mapping of the source program into the internal language as an
instance of compilation in itself.
In this book we shall not assume any fixed format for the output of
a compiler, although in many examples we use assembly language as the
object code. In this section we shall review compiling and the main processes
that map a source program into object code.

9.1.1. Phases of Compilation

In Chapter 1 we saw that a compiler can be partitioned into several


subtranslators, each of which participated in translating some representation
of the source program toward the object program. We can model each
subtranslator by a mapping that defines a phase of the compilation such
that the composition of all the phases models the entire compilation.
What we choose to call a phase is somewhat arbitrary. However, it is
convenient to think of lexical analysis, syntactic analysis, and code generation

as the main phases of compilation. However, in many sophisticated compilers


these phases are often subdivided into several subphases, and other phases
(e.g., code optimization) may also be present.
The input to a compiler is a string of symbols. The lexical analyzer is
the first phase of compilation. It maps the input into a string consisting of
tokens and symbol table entries. During lexical analysis the original source
program is compressed in size somewhat because identifiers are replaced by
tokens and unnecessary blanks and comments are removed. After lexical
analysis, the original source program is still essentially a string, but certain
portions of this string are pointers to information in a symbol table.
A finite transducer is a good model for a lexical analyzer. We discussed
finite transducers and their application to lexical analysis in Section 3.3.
In Chapter 10 we shall discuss techniques that can be used to insert and
retrieve information from symbol tables.
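As a small illustration of this phase, here is a sketch, not from the book, of a
routine that maps a source string to a token string in which identifiers have been
replaced by symbol-table indices and blanks have been removed; the token classes
and regular expressions are assumptions made only for the example.

    import re

    # Illustrative token classes; a real lexical analyzer would be driven by
    # the conventions of the source language.
    TOKEN_SPEC = [('ID', r'[A-Za-z][A-Za-z0-9]*'),
                  ('NUM', r'[0-9]+'),
                  ('OP', r'[+*/()=-]'),
                  ('BLANK', r'\s+')]
    SCANNER = re.compile('|'.join('(?P<%s>%s)' % p for p in TOKEN_SPEC))

    def lex(source):
        tokens, symtab = [], []
        for m in SCANNER.finditer(source):
            kind, text = m.lastgroup, m.group()
            if kind == 'BLANK':
                continue                       # blanks are removed
            if kind == 'ID':
                if text not in symtab:
                    symtab.append(text)        # enter identifier in symbol table
                tokens.append(('ID', symtab.index(text)))
            else:
                tokens.append((kind, text))
        return tokens, symtab

    print(lex('A = B + C * -D'))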
The syntax analyzer takes the output of the lexical analyzer and parses
it according to some underlying grammar. This grammar is similar to the one
used in the specification of the source language. However, the grammar for
the source language usually does not specify what constructs are to be
treated as lexical items. Keywords and identifiers such as labels, variable
names, and constants are some of the constructs that are usually recognized
during lexical analysis. But these constructs could also be recognized by
the syntax analyzer, and in practice there is no hard and fast rule as to what
constructs are to be recognized lexically and what should be left for the
syntactic analyzer.
After syntactic analysis, we can visualize that the source program has
been transformed into a tree, called the syntax tree. The syntax tree is closely
related to the derivation tree for the source program, often being the deri-
vation tree with chains of single productions deleted. In the syntax tree
interior nodes generally correspond to operators, and the leaves represent
operands consisting of pointers into the symbol table. The structure of
the syntax tree reflects the syntactic rules of the programming language in
which the source program was written. There are several ways of physically
representing the syntax tree, which we shall discuss in the next section. We
shall call a representation of the syntax tree an intermediate program.
The actual output of the syntactic analyzer can be a sequence of com-
mands to construct the intermediate program, to consult and modify the
symbol table, and to produce diagnostic messages where necessary. The
model that we have used for a syntactic analyzer in Chapters 4-7 produced
a left or right parse for the input. However, it is the nature of syntax trees
used in practice that one may easily replace the production numbers in a left
or right parse by commands that construct the syntax tree and perform
symbol table operations. Thus, it is an appealing simplification to regard the
left or right parses produced by the parsers of Chapters 4-7 as the inter-

mediate program itself, and to avoid making a major distinction between a


syntax tree and a parse tree.
A compiler must also check that certain semantic conventions of the
source language have been obeyed. Some common examples of conventions
of this nature are
(1) Each statement label referenced must actually appear as the label of
an appropriate statement in the source program.
(2) No identifier can be declared more than once.
(3) All variables must be defined before use.
(4) The arguments of a function call must be compatible both in number
and in attributes with the definition of the function.

The checking of these conventions by a compiler is called semantic analy-


sis. Semantic analysis often occurs immediately after syntactic analysis, but
it can also be done in some later phases of compilation. For example, check-
ing that variables are defined before use can be done during code optimization
if there is a code optimization phase. Checking of operands for correct
attributes can be deferred to the code generation phase.
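A sketch, not from the book, of how one of these conventions, number (2) above,
might be checked against the symbol table; the declaration-record format is an
assumption made only for the illustration.

    # Check convention (2): no identifier may be declared more than once.
    # Each declaration is an (identifier, attribute) pair.
    def check_declarations(decls):
        symtab, errors = {}, []
        for name, attribute in decls:
            if name in symtab:
                errors.append('identifier %s declared more than once' % name)
            else:
                symtab[name] = attribute
        return symtab, errors

    symtab, errors = check_declarations([('I', 'integer'), ('X', 'real'),
                                         ('I', 'real')])
    print(errors)    # ['identifier I declared more than once']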
In Chapter 10, we shall discuss property grammars, a formalism that
can be used to model aspects of syntactic and semantic analysis.
After syntactic analysis, the intermediate program is mapped by the code
generator into the object program. However, to generate correct object code,
the code generator must also have access to the information in the symbol
table. For example, the attributes of the operands of a given operator deter-
mine the code that is to be generated for the operator. For instance, when
A and B are floating-point variables, different object code will be generated
for A + B than will be generated when A and B are integer variables.
Storage allocation also occurs during code generation (or in some sub-
phase of code generation). Thus, the code generator must know whether
a variable represents a scalar, an array, or a structure. This information is
contained in the symbol table.
Some compilers have an optimizing phase before code generation. In this
optimizing phase the intermediate program is subjected to transformations
that attempt to put the intermediate program into a form from which a more
efficient object language program can be produced. It is often difficult to
distinguish some optimizing transformations from good code generation
techniques. In Chapter 11 we shall discuss some of the optimizations that
can be performed on intermediate programs or while generating intermediate
programs.
An actual compiler may implement a phase of the compilation process
in one or more passes, where a pass consists of reading input from secondary
memory and then writing intermediate results into secondary memory.
What is implemented in a pass is a function of the size of the machine on

which the compiler is to run, the language for which the compiler is being
developed, the number of people engaged in implementing the compiler,
and so forth. It is even possible to implement all phases in one pass. What
the optimal number of passes to implement a given compiler should be is
a topic that is beyond the scope of this book.

9.1.2. Representations of the Intermediate Program

In this section we shall discuss some possible representations for the


intermediate program that is produced by the syntactic analyzer. The inter-
mediate program should reflect the syntactic structure of the source program.
However, it should also be relatively easy to translate each statement of the
intermediate program into machine code.
Compilers performing extensive amounts of code optimization create
a detailed representation of the intermediate program, explicitly showing
the flow of control inherent in the source program. In other compilers the
representation of the intermediate program is a simple representation of
the syntax tree, such as Polish notation. Other compilers, doing little code
optimization, will generate object code as the parse proceeds. In this case,
the "intermediate" program appears only figuratively, as a sequence of steps
taken by the parser.
Some of the more common representations for the intermediate program
are
(1) Postfix Polish notation,
(2) Prefix Polish notation,
(3) Linked list structures representing trees,
(4) Multiple address code with named results, and
(5) Multiple address code with implicitly named results.
Let us examine some examples of these representations.
We defined Polish notation for arithmetic expressions in Section 3.1.1.
For example, the assignment statement

(9.1.1)        A = B + C ∗ −D

with the normal order of precedence for the operators and assignment symbol
(=) has the postfix Polish representation†

        A B C D − ∗ + =

and the prefix Polish representation

        = A + B ∗ C − D

†In this representation, − is a unary operator and ∗, +, and = are all binary operators.

In postfix Polish notation, the operands appear from left to right in the
order in which they are used. The operators appear right after the operands
and in the order in which they are used. Postfix Polish notation is often used
as an intermediate language by interpreters. The execution phase of the inter-
preter can evaluate the postfix expression using a pushdown list for an accu-
mulator. (See Example 9.4.)
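A minimal sketch, not from the book, of that evaluation strategy: the pushdown
list holds operand values, and each operator replaces its operands on top of the
list by the computed result. The token encoding and the sample values are
assumptions made for the example, and the assignment itself is omitted.

    def eval_postfix(tokens, values):
        stack = []                               # the pushdown list of values
        for t in tokens:
            if t == 'neg':                       # unary minus
                stack.append(-stack.pop())
            elif t in ('+', '*'):
                right, left = stack.pop(), stack.pop()
                stack.append(left + right if t == '+' else left * right)
            else:                                # an operand name
                stack.append(values[t])
        return stack[0]

    # B + C * -D  in postfix is  B C D neg * +
    print(eval_postfix(['B', 'C', 'D', 'neg', '*', '+'],
                       {'B': 7, 'C': 2, 'D': 3}))    # prints 1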
Both types of Polish expressions are linear representations of the syntax
tree for expression (9.1.1), shown in Fig. 9.1. This tree reflects the syntactic
structure of expression (9.1.1). We can also use this tree itself as the inter-
mediate program, encoding it as a linked list structure.

[Fig. 9.1 Syntax tree for expression (9.1.1).]

Another method of encoding of the syntax tree is to use multiple address


code. For example, using multiple address code with named results, expres-
sion (9.1.1) could be represented by the following sequence of assignment
statements:

        T₁ ← −D
        T₂ ← ∗CT₁
        T₃ ← +BT₂
        A ← T₃†

A statement of the form A ← θB₁ ⋯ Bᵣ means that the r-ary operator θ is to
be applied to the current values of variables B₁, …, Bᵣ and that the resulting
value is to be assigned to variable A. We shall formally define this particular
intermediate language in Section 11.1.

†Note that the assignment operator must be treated differently from other operators.
A simple "optimization" is to replace T₃ by A in the third line and to delete the fourth
line.
Multiple address code with named results requires a temporary variable
name in each assignment instruction to hold the value of the expression
computed by the right-hand side of each statement. We can avoid the use of
the temporary variables by using multiple address code with implicitly
named results. In this notation we label each assignment statement by
a number. We then delete the left-hand side of the statement and reference
temporary results by the number assigned to the statement generating the
temporary result. For example, expression (9.1.1) would be represented by
the sequence

1:  −D
2:  ∗C(1)
3:  +B(2)
4:  =A(3)

Here a parenthesized number refers to the expression labeled by that number.
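Either multiple address form can be produced mechanically from the postfix
representation. The following sketch, not from the book, emits the implicitly
named form for expression (9.1.1); the token spelling, with 'neg' standing for
the unary −, is an assumption made for the example.

    def postfix_to_numbered_code(tokens):
        stack, code = [], []
        arity = {'neg': 1, '*': 2, '+': 2, '=': 2}
        for t in tokens:
            if t in arity:
                # Pop the operands, emit a numbered statement, and push a
                # reference to that statement as the temporary result.
                args = [stack.pop() for _ in range(arity[t])][::-1]
                code.append('%d: %s %s' % (len(code) + 1, t, ' '.join(args)))
                stack.append('(%d)' % len(code))
            else:
                stack.append(t)                  # an operand name
        return code

    # A B C D neg * + =  is the postfix form of  A = B + C * -D
    for line in postfix_to_numbered_code(['A', 'B', 'C', 'D', 'neg', '*', '+', '=']):
        print(line)
    # 1: neg D
    # 2: * C (1)
    # 3: + B (2)
    # 4: = A (3)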


Multiple address code representations are computationally convenient if
a code optimization phase occurs after the syntactic analysis phase. During
the code optimization phase the representation of the program may change
considerably. It is much more difficult to make global changes on a Polish
representation than on a linked list representation of the intermediate program.
As a second example let us consider representations for the statement

(9.1.2)        if I = J then S₁ else S₂

where S₁ and S₂ represent arbitrary statements.

A possible postfix Polish representation for this statement might be

        I J EQUAL? L₂ JFALSE S′₁ L JUMP S′₂

Here, S′₁ and S′₂ are the postfix representations of S₁ and S₂, respectively;
EQUAL? is a Boolean-valued binary operator that has the value true if its
two arguments are equal and false otherwise. L₂ is a constant which names
the beginning of S′₂. JFALSE is a binary operator which causes a jump to
the location given by its second argument if the value of the first argument is
false and has no effect if the first argument is true. L is a constant which is
the first instruction following S′₂. JUMP is a unary operator that causes
a jump to the location given by its argument.
A derivation tree for statement (9.1.2) is shown in Fig. 9.2. The important
syntactic information in this derivation tree can be represented by the syntax
tree shown in Fig. 9.3, where S′₁ and S′₂ represent syntax trees for S₁ and S₂.

[Fig. 9.2 Derivation tree for statement (9.1.2): the root ⟨if statement⟩ derives
if ⟨expression⟩ then ⟨statement⟩ else ⟨statement⟩, and ⟨expression⟩ derives
⟨var⟩ ⟨relop⟩ ⟨var⟩, with the ⟨var⟩'s deriving I and J and ⟨relop⟩ deriving =.]

[Fig. 9.3 Syntax tree for statement (9.1.2).]

A syntactic analyzer generating an intermediate program would parse


expression (9.1.2) tracing out the tree in Fig. 9.2, but its output would be
a sequence of commands that would construct the syntax tree in Fig. 9.3.
In general, in the syntax tree an interior node represents an operator
whose operands are given by its direct descendants. The leaves of the syntax
tree correspond to identifiers. The leaves will actually be pointers into the
symbol table where the names and attributes of these identifiers are kept.
Part of the output of the syntax analyzer will be commands to enter
information into the symbol table. For example, a source language state-
ment of the form

INTEGER I

will be translated into a command that enters the attribute "integer" in the
symbol table location reserved for identifier I. There will be no explicit repre-
sentation for this statement in the intermediate program.

9.1.3. Models for Code Generation

Code generation is a mapping from the intermediate program to a string.


We shall consider this mapping to be a function defined on the syntax tree
and information in the symbol table. The nature of this mapping depends
on the source language, the target machine, and the quality of the object
code desired.
One of the simplest code generation schemes would map each multiple
address statement into a sequence of object language instructions independent
of the context of the multiple address instructions. For example, an assign-
ment instruction of the form

A ← +BC

might be mapped into the three machine instructions

LOAD B
ADD C
STORE A

Here we are assuming a one-accumulator machine, where the instruction
LOAD B places the value of memory location B in the accumulator, ADD
C adds† the value of memory location C to the accumulator, and STORE A
places the value in the accumulator into memory location A. The STORE
instruction leaves the contents of the accumulator unchanged.
However, if the accumulator initially contained the value of memory
location B (because, for example, the previous assignment instruction was
B ← +DE), then the LOAD B instruction would be unnecessary. Also,
if the next assignment instruction is F ← +AG and no other reference is
made to A, then the STORE A instruction is not needed.
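A sketch, not from the book, of this context-independent scheme together with
the first of the two improvements just described: the LOAD is suppressed when
the accumulator is already known to hold the left operand. (Suppressing the
STORE would additionally require knowing that no later reference to the result
exists, and is not shown.)

    # Generate code for a sequence of statements of the form  dest <- + left right
    # on the one-accumulator machine described above.
    def gen(statements):
        code, accumulator = [], None
        for dest, left, right in statements:
            if accumulator != left:
                code.append('LOAD ' + left)
            code.append('ADD ' + right)
            code.append('STORE ' + dest)
            accumulator = dest            # STORE leaves the accumulator unchanged
        return code

    # B <- + D E  followed by  A <- + B C : the second LOAD B is omitted.
    print(gen([('B', 'D', 'E'), ('A', 'B', 'C')]))
    # ['LOAD D', 'ADD E', 'STORE B', 'ADD C', 'STORE A']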
In Sections 11.1 and 11.2 of Chapter 11 we shall consider some techniques
for generating code from multiple address statements.

EXERCISES

9.1.1. Draw syntax trees for the following source language statements:
(a) A = (B − C)/(B + C) (as in FORTRAN).
(b) I = LENGTH(C1 || C2) (as in PL/I).
(c) if B > C then
        if D > E then A := B + C else A := B − C
    else A := B ∗ C (as in ALGOL).

†Let us assume for simplicity that there is only one type of arithmetic. If more than
one, e.g., fixed and floating, is available, then the translation of + will depend on symbol
table information about the attributes of B and C.

9.1.2. Define postfix Polish representations for the programs in Exercise 9.1.1.
9.1.3. Generate multiple address code with named results for the statements
in Exercise 9.1.1.
9.1.4. Construct a deterministic pushdown transducer that maps prefix Polish
notation into postfix Polish notation.
9.1.5. Show that there is no deterministic pushdown transducer that maps
postfix Polish notation into prefix Polish notation. Is there a nondeter-
ministic pushdown transducer that performs this mapping? Hint: see
Theorem 3.15.
9.1.6. Devise an algorithm using a pushdown list that will evaluate a postfix
Polish expression.
9.1.7. Design a pushdown transducer that takes as input an expression w in
L(Go) and produces as output a sequence of commands that will build
a syntax tree (or multiple address code) for w.
9.1.8. Generate assembly code for your favorite computer for the programs
in Exercise 9.1.1.
*9.1.9. Devise algorithms to generate assembly code for your favorite computer
for intermediate programs representing arithmetic assignments when
the intermediate program is in
(a) Postfix Polish notation.
(b) Multiple address code with named results.
(c) Multiple address code with implicitly named results.
(d) The form of a syntax tree.
*9.1.10. Design an intermediate language that is suitable for the representation
of some subset of F O R T R A N (or PL/I or ALGOL) programs. The
subset should include assignment statements and some control state-
ments. Subscripted variables should also be allowed.
**9.1.11. Design a code generator that will map an intermediate program of
Exercise 9.1.10 into machine code for your favorite computer.

BIBLIOGRAPHIC NOTES

Unfortunately, it is impossible to specify the best object code even for common
source language constructs. However, there are several papers and books that
discuss the translation of various programming languages. Backus et al. [1957]
give the details of an early F O R T R A N compiler. Randell and Russell [1964] and
Grau et al. [1967] discuss the implementation of ALGOL 60. Some details of PL/I
implementation are given in IBM [1969].
There are many publications describing techniques that are useful in code
generation. Knuth [1968a] discusses and analyzes various storage allocation tech-
niques. Elson and Rake [1970] consider the generation of code from a tree-
structured intermediate language. Wilcox [1971] presents some general models
for code generation.


9.2. SYNTAX-DIRECTED TRANSLATIONS

In this section we shall consider a compiler model in which syntax analysis


and code generation are combined into a single phase. We can view such
a model as one in which code generation operations are interspersed with
parsing operations. The term syntax-directed compiling is often used to
describe this type of compilation.
The techniques discussed here can also be used to generate intermediate code
instead of object or assembly code, and we give one example of translation to
intermediate code. The remainder of our examples are to assembly code.
Our starting point is the syntax-directed translation scheme of Chapter 3.
We show how a syntax-directed translation can be implemented on a deter-
ministic pushdown transducer that does top-down or bottom-up parsing.
Throughout this section we shall assume that a DPDT uses a special symbol
$ to delimit the right-hand end of the input string. Then we add various
features to make the SDTS more versatile. First, we allow semantic rules
that permit more than one translation to be defined at various nodes of
the parse tree. We also allow repetition and conditionals in the formulas
for these transIations. We then consider translations which are not strings;
integers and Boolean variables are useful additional types of translations.
Finally, we allow translations to be defined in terms of other translations
found not only at the direct descendants of the node in question but at its
direct ancestor.
First, we shall show that every simple SDTS on an LL grammar can be
implemented by a deterministic pushdown transducer. We shall then inves-
tigate what simple SDTS's on an LR grammar can be so implemented. We
shall discuss an extension of the DPDA, called a pushdown processor, to
implement the full class of SDT's whose underlying grammar is LL or LR.
Then, the implementation of syntax-directed translations in connection with
backtrack parsing algorithms is studied briefly.

9.2.1. Simple Syntax-Directed Translations

In Chapter 3 we saw that a simple syntax-directed translation scheme can


be implemented by a nondeterministic pushdown transducer. In this section
we shall discuss deterministic implementations of certain simple syntax-
directed translation schemes.
The grammar G underlying an SDTS can determine the translations
definable on L(G) and the efficiency with which these translations can be
directly implemented.

Example 9.1
Suppose that we wish to map expressions generated by the grammar G,
below, into prefix Polish expressions, on the assumption that ∗'s are to take
precedence over +'s, e.g., a ∗ a + a has prefix expression +∗aaa, not
∗a+aa. G is given by E → a + E | a ∗ E | a. However, there is no syntax-
directed translation scheme which uses G as an underlying grammar and
which can define this translation. The reason for this is that the output gram-
mar of such an SDTS must be a linear CFG, and it is not difficult to show
that the set of prefix Polish expressions over {+, ∗, a} corresponding to the
infix expressions in L(G) is not a linear context-free language. However, this
particular translation can be defined using a simple SDTS with G₀ [except
for production F → (E)] as the underlying grammar.

In Theorem 3.14 we showed that if (x, y) is an element of a simple syntax-


directed translation, then the output y can be generated from the left parse
of x using a deterministic pushdown transducer. As a consequence, if a gram-
mar G can be deterministically parsed top-down in the natural manner using
a DPDT M, then M can be readily modified to implement any simple SDTS
whose underlying grammar is G. If G is an LL(k) grammar, then any simple
SDT defined on G can be implemented by a D P D T in the following manner.
THEOREM 9.1
Let T = (N, Σ, Δ, R, S) be a semantically unambiguous simple SDTS
with an underlying LL(k) grammar. Then {(x$, y) | (x, y) ∈ τ(T)} can be de-
fined by a deterministic pushdown transducer.
Proof. The proof follows from the methods of Theorems 3.14 and 5.4.
If G is the underlying grammar of T, then we can construct a k-predictive
parsing algorithm for G using Algorithm 5.3. We can construct a DPDT
M with an endmarker to simulate this k-predictive parser and perform the
translation as follows.
Let Δ′ = {a′ | a ∈ Δ}, assume Δ′ ∩ Σ = ∅, and let h(a) = a′ for all
a ∈ Δ. The parser of Algorithm 5.3 repeatedly replaces nonterminals by
right-hand sides of productions, carrying along LL(k) tables with the non-
terminals. M will do essentially the same thing. Suppose that the left parser
replaces A by w₀B₁w₁ ⋯ Bₘwₘ [with some LL(k) tables, not shown, append-
ed to the nonterminals]. Let

        A → w₀B₁w₁ ⋯ Bₘwₘ,    x₀B₁x₁ ⋯ Bₘxₘ

be a rule of R. Then M will replace A by w₀h(x₀)B₁w₁h(x₁) ⋯ Bₘwₘh(xₘ).
As in Algorithm 5.3, whenever a symbol of Σ appears on top of the push-
down list, it is compared with the current input symbol, and if a match occurs,
the symbol is deleted from the pushdown list, and the input head moves
right one cell. When a symbol a′ in Δ′ is found on top of the pushdown list,
M emits a and removes a′ from the top of the pushdown list without moving
the input head. M does not emit production indices.
A more formal construction of M is left to the reader.

Example 9.2
Let T be the simple SDTS with the rules

        S → aSbSc,    1S2S3
        S → d,        4

The underlying grammar is a simple LL(1) grammar. Therefore, no LL
tables need be kept on the stack. Let M be the deterministic pushdown
transducer (Q, Σ, Δ, Γ, δ, q, S, {accept}), where

        Q = {q, accept, error}
        Σ = {a, b, c, d, $}
        Γ = {S, $} ∪ Σ ∪ Δ
        Δ = {1, 2, 3, 4}

Since Σ ∩ Δ = ∅, we shall let Δ′ = Δ. δ could be defined as follows:

        δ(q, a, S) = (q, Sb2Sc3, 1)†
        δ(q, d, S) = (q, e, 4)
        δ(q, b, b) = (q, e, e)
        δ(q, c, c) = (q, e, e)
        δ(q, e, 2) = (q, e, 2)
        δ(q, e, 3) = (q, e, 3)
        δ(q, $, $) = (accept, e, e)

Otherwise,

        δ(q, X, Y) = (error, e, e)

It is easy to verify that τ(M) = {(x$, y) | (x, y) ∈ τ(T)}.
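The transitions above are easy to simulate directly. The following sketch, not
from the book, runs M on a single input string; it assumes that the pushdown
list initially holds $ below the start symbol S, so that the accepting move
δ(q, $, $) can eventually apply.

    def translate(word):
        stack = ['$', 'S']            # top of the pushdown list is stack[-1]
        output = []
        i = 0                         # input head position
        word = word + '$'             # right endmarker
        while True:
            top, a = stack[-1], word[i]
            if top == 'S' and a == 'a':        # rule S -> aSbSc, 1S2S3
                stack.pop()
                stack.extend(reversed('Sb2Sc3'))
                output.append('1'); i += 1
            elif top == 'S' and a == 'd':      # rule S -> d, 4
                stack.pop(); output.append('4'); i += 1
            elif top in 'bc' and a == top:     # match input terminals
                stack.pop(); i += 1
            elif top in '23':                  # emit buffered output symbols
                stack.pop(); output.append(top)
            elif top == '$' and a == '$':
                return ''.join(output)         # accept
            else:
                raise ValueError('error')      # dead configuration

    print(translate('adbdc'))   # prints 14243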

Let us now consider a simple SDTS in which the underlying grammar is


LR(k). Since the class of LR(k) grammars is larger than the class of LL(k)
grammars, it is interesting to investigate what class of simple SDTS's with
underlying LR(k) grammars can be implemented by DPDT's. It turns out
that there are semantically unambiguous simple SDT's which have an under-
lying LR(k) grammar but which cannot be performed by any DPDT. Intui-

†In these rules we have taken the liberty of producing an output symbol and shifting
the input as soon as a production has been recognized, rather than doing these actions
in separate steps as indicated in the proof of Theorem 9.1.

tively, the reason for this is that a translation element may require the
generation of output long before it can be ascertained that the production
to which this translation element is attached is actually used.

Example 9.3
Consider the simple SDTS T with the rules

        S → Sa,    aSa
        S → Sb,    bSb
        S → e,     e

The underlying grammar is LR(1), but by Lemma 3.15 there is no DPDT
defining {(x$, y) | (x, y) ∈ τ(T)}. The intuitive reason for this is that the first
rule of this SDTS requires that an a be emitted before the use of production
S → Sa can be recognized. □

However, if a simple SDTS with an underlying LR(k) grammar does


not require the generation of any output until after a production is recog-
nized, then the translation can be implemented on a DPDT.
DEFINITION
An SDTS T = (N, Σ, Δ, R, S) will be called a postfix SDTS if each rule
in R is of the form A → α, β, where β is in N*Δ*. That is, each translation
element is a string of nonterminals followed by a string of output symbols.
THEOREM 9.2
Let T = (N, Σ, Δ, R, S) be a semantically unambiguous simple postfix
SDTS with an underlying LR(k) grammar. Then {(x$, y) | (x, y) ∈ τ(T)}
can be defined by a deterministic pushdown transducer.
Proof. Using Algorithm 5.11, we can construct a deterministic right parser
M with an endmarker for the underlying LR(k) grammar. However, rather
than emitting the production number, M can emit the string of output
symbols of the translation element associated with that production. That
is, if A → α, βx is a rule in R where β ∈ N* and x ∈ Δ*, then when the
production number of A → α is to be emitted by the parser, the string x
is emitted by M. In this fashion M defines {(x$, y) | (x, y) ∈ τ(T)}. □

We leave the converse of Theorem 9.2, that every deterministic pushdown


transduction can be expressed as a postfix simple SDTS on an LR(1) gram-
mar, for the Exercises.

Example 9.4
Postfix translations are more useful than it might appear at first. Here,
let us consider an extended DPDT that maps the arithmetic expressions of
L(G₀) into machine code for a very convenient machine. The computer for
this example has a pushdown stack for an accumulator. The instruction

LOAD X

puts the value held in location X on top of the stack; all other entries on
the stack are pushed down. The instructions ADD and MPY, respectively,
add and multiply the top two levels of the stack, removing the two levels
but then pushing the result on top of the stack. We shall use semicolons to
separate these instructions.
The SDTS we have in mind is

E → E + T,    E T 'ADD;'
E → T,        T
T → T ∗ F,    T F 'MPY;'
T → F,        F
F → (E),      E
F → a,        'LOAD a;'

In this example and the ones to follow we shall use the SNOBOL convention
of surrounding literal strings in translation rules by quote marks. Quote
marks are not part of the output string.
With input a + (a ∗ a)$, the DPDT enters the following sequence of
configurations; we have deleted the LR(k) tables from the pushdown list.
We have also deleted certain obvious configurations from the sequence as well
as the states and bottom of stack marker.

[e, a + (a ∗ a)$, e]
⊢ [a, + (a ∗ a)$, e]
⊢ [F, + (a ∗ a)$, LOAD a;]
⊢ [E, + (a ∗ a)$, LOAD a;]
⊢ [E + (a, ∗ a)$, LOAD a;]
⊢ [E + (F, ∗ a)$, LOAD a; LOAD a;]
⊢ [E + (T, ∗ a)$, LOAD a; LOAD a;]
⊢ [E + (T ∗ a, )$, LOAD a; LOAD a;]
⊢ [E + (T ∗ F, )$, LOAD a; LOAD a; LOAD a;]
⊢ [E + (T, )$, LOAD a; LOAD a; LOAD a; MPY;]
⊢ [E + (E, )$, LOAD a; LOAD a; LOAD a; MPY;]
⊢ [E + (E), $, LOAD a; LOAD a; LOAD a; MPY;]
⊢ [E + T, $, LOAD a; LOAD a; LOAD a; MPY;]
⊢ [E, $, LOAD a; LOAD a; LOAD a; MPY; ADD;]

Note that if the different a's representing identifiers are indexed, so that
the input expression becomes, say, a₁ + (a₂ ∗ a₃), then the output code
would be

        LOAD a₁
        LOAD a₂
        LOAD a₃
        MPY
        ADD

which computes the expression correctly on this machine. □
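The order in which this postfix SDTS emits instructions can be reproduced
without keeping the LR tables at all. The sketch below, not from the book, uses
recursive descent on the same grammar and appends 'MPY;' or 'ADD;' only after
both operands of the operator have been translated, exactly as the translation
elements above prescribe; it is meant only to make the emitted order concrete,
not to model the DPDT itself.

    def translate(expr):
        toks = list(expr)        # tokens are single characters: a + * ( )
        out = []
        pos = 0

        def factor():
            nonlocal pos
            if toks[pos] == 'a':                 # F -> a,  'LOAD a;'
                pos += 1
                out.append('LOAD a;')
            elif toks[pos] == '(':               # F -> (E),  E
                pos += 1
                expression()
                assert toks[pos] == ')'
                pos += 1

        def term():                              # T -> T * F,  T F 'MPY;'
            nonlocal pos
            factor()
            while pos < len(toks) and toks[pos] == '*':
                pos += 1
                factor()
                out.append('MPY;')

        def expression():                        # E -> E + T,  E T 'ADD;'
            nonlocal pos
            term()
            while pos < len(toks) and toks[pos] == '+':
                pos += 1
                term()
                out.append('ADD;')

        expression()
        return ' '.join(out)

    print(translate('a+(a*a)'))   # LOAD a; LOAD a; LOAD a; MPY; ADD;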

While the computer model used in Example 9.4 was designed for the
purpose of demonstrating a syntax-directed translation of expressions with
few of the complexities of generating code for more common machine models,
the postfix scheme is capable of defining useful classes of translations. In
the remainder of this chapter, we shall show how object code can be generated
for other machine models using a pushdown transducer operating on what is
in essence a simple postfix SDTS with an underlying LR grammar.
Suppose that we have a simple SDTS which has an underlying LR(k)
grammar, but which is not postfix. How can such a translation be performed?
One possible technique is to use the following multipass translation scheme.
This technique illustrates a cascade connection of DPDT's. However, in
practice this translation would be implemented in one pass using the tech-
niques of the next section for arbitrary SDTS's.
Let T = (N, Σ, Δ, R, S) be a semantically unambiguous simple SDTS
with an underlying LR(k) grammar G. We can design a four-stage translation
scheme to implement τ(T). The first stage consists of a DPDT. The input to
the first stage is the input string w$. The output of the first stage is π, the right
parse for w according to the underlying input grammar G. The second stage
reverses π to create πᴿ, the right parse in reverse.†
The input to the third stage will be πᴿ. The output of the third stage will
be the translation defined by the simple SDTS T′ = (N, Σ′, Δ, R′, S), where

†Recall that π, the right parse, is the reverse of the sequence of productions used in a
rightmost derivation. Thus, πᴿ begins with the first production used and ends with the
last production used in a rightmost derivation. To obtain πᴿ we can merely read the
buffer in which π is stored backward.

R′ contains the rule

        A → iBₘBₘ₋₁ ⋯ B₁,    yₘBₘ ⋯ y₁B₁y₀

if and only if A → x₀B₁x₁ ⋯ Bₘxₘ, y₀B₁y₁ ⋯ Bₘyₘ is a rule in R and
A → x₀B₁x₁ ⋯ Bₘxₘ is the ith production in the underlying LR(k) grammar.
It is easy to prove that (πᴿ, yᴿ) is in τ(T′) if and only if (S, S) ⇒* (w, y) in T.
T′ is a simple SDTS based on an LL(1) grammar and can thus be imple-
mented on a DPDT. The fourth stage merely reverses the output of the third
stage. If the output of the third stage is put on a pushdown list, then the fourth
stage merely pops off the symbols from the pushdown list, emitting each
symbol as it is popped off.
Figure 9.4 summarizes this procedure.

[Fig. 9.4 Simple SDT on an LR(k) grammar: w$ enters Stage 1, which produces π,
the right parse for w; Stage 2 produces πᴿ; Stage 3 produces yᴿ; and Stage 4
produces y.]

Each of these three stages requires a number of basic operations that is


linearly proportional to the length of w. Thus, we can state the following
result.
THEOREM 9.3
Any simple SDTS with an underlying LR(k) grammar can be implemented
in time proportional to its input length.
Proof. A formalization of the discussion above. □

9.2.2. A Generalized Transducer

While the pushdown transducer is adequate for defining all simple SDTS's
on an LL grammar and for some simple SDTS's on an LR grammar, we
need a more versatile model of a translator when doing
(1) Nonsimple SDTS's,
(2) Nonpostfix simple SDTS's on an LR grammar,
(3) Simple SDTS's on a non-LR grammar, and
(4) Syntax-directed translation when the parsing is not deterministic
one pass, such as the algorithms of Chapters 4 and 6.
We shall now define a new device called a pushdown processor (PP) for
defining syntax-directed translations that map strings into graphs. A push-
down processor is a PDT whose output is a labeled directed graph, generally
a tree or part of a tree which the processor is constructing. The major feature
of the PP is that its pushdown list, in addition to pushdown symbols, can
hold pointers to nodes in the output graph.
Like the extended PDA, the pushdown processor can examine the top k
cells of its pushdown list for any finite k and can manipulate the contents of
these cells arbitrarily. Unlike the PDA, if these k cells include some pointers
to the output graph, the PP can modify the output graph by adding or deleting
directed edges connected to the nodes pointed to. The PP can also create
new nodes, label them, create pointers to them, and create edges between
these nodes and the nodes pointed to by those pointers on the top k cells of
the pushdown list.
As it is difficult to develop a concise lucid notation for such manipulations,
we shall use written descriptions of the moves of the PP. Since each move of
the PP can involve only a finite number of pointers, nodes, and edges, such
descriptions are, in principle, possible, but we feel that a formal notation
would serve to obscure the essential simplicity of the translation algorithms
involved. We proceed directly to an example.

Example 9.5
Let us design a pushdown processor P to map the arithmetic expressions
of L(G₀) into syntax trees. In this case, a syntax tree will be a tree in which
each interior node is labeled by + or ∗ and leaves are labeled by a. The fol-
lowing table gives the parsing and output actions that the pushdown proces-
sor is to take under various combinations of current input symbol and
symbol on top of the pushdown list. P has been designed from the SLR(1)
parser for G₀ given in Fig. 7.37 (p. 625).
However, here we have eliminated table T₄ from Fig. 7.37, treating
F → a as a single production, and have renamed the tables as follows:

Old name    T₀   T₁   T₅   T₆   T₇   T₈   T₉   T₁₀   T₁₁

New name    T₀   T₁   (    +    ∗    T₂   T₃   T₄    )

In addition $ is used as a right endmarker on the input. The parsing and out-
put actions of P are given in Figs. 9.5 and 9.6. P uses the LR(1) tables to
determine its actions. In addition P will attach a pointer to tables T₁, T₂, T₃,
and T₄ when these tables are placed on the pushdown list.† These pointers
are to the output graph being generated. However, the pointers do not affect
the parsing actions. In parsing we shall not place the grammar symbols on
the pushdown list. However, the table names indicate what that grammar
symbol would be.
symbol would be.
The last column of Fig. 9.5 gives the new LR(1) table to be placed on
top of the pushdown list after a reduce move. Blank entries denote error
situations. A new input symbol is read only after a shift move. The numbers
in the table refer to the actions described in Fig. 9.6.
Let us trace the behavior of P on input a₁ ∗ (a₂ + a₃)$. We have sub-
scripted the a's for clarity. The sequence of moves made by P is as follows:

                Pushdown List                     Input

 (1)    T₀                                 a₁ ∗ (a₂ + a₃)$
 (2)    T₀[T₁,p₁]                          ∗ (a₂ + a₃)$
 (3)    T₀[T₁,p₁] ∗                        (a₂ + a₃)$
 (4)    T₀[T₁,p₁] ∗ (                      a₂ + a₃)$
 (5)    T₀[T₁,p₁] ∗ ([T₂,p₂]               + a₃)$
 (6)    T₀[T₁,p₁] ∗ ([T₂,p₂] +             a₃)$
 (7)    T₀[T₁,p₁] ∗ ([T₂,p₂] + [T₃,p₃]     )$
 (8)    T₀[T₁,p₁] ∗ ([T₂,p₄]               )$
 (9)    T₀[T₁,p₁] ∗ ([T₂,p₄])              $
(10)    T₀[T₁,p₁] ∗ [T₄,p₄]                $
(11)    T₀[T₁,p₅]                          $

Let us examine the interesting moves in this sequence. In going from
configuration (1) to configuration (2), P creates a node n₁ labeled a₁ and sets
a pointer p₁ to this node. The pointer p₁ is stored with the LR(1) table T₁ on
the pushdown list. Similarly, going from configuration (4) to configuration
(5), P creates a new node n₂ labeled a₂ and places a pointer p₂ to this node
on the pushdown list with LR(1) table T₂. In configuration (7) another node
n₃ labeled a₃ is created and p₃ is established as a pointer to n₃.

†In practice, these pointers can be stored immediately below the tables on the push-
down list.

[Fig. 9.5 Pushdown processor P: for each symbol that may appear on top of the
pushdown list (the tables T₀-T₄ and the symbols +, ∗, (, and )) and each input
symbol (a, +, ∗, (, ), $), the entry gives a shift action, an accept action, or one
of the output actions 1-7 described in Fig. 9.6; a goto column gives the new LR(1)
table to be placed on top of the pushdown list after a reduce move. Blank entries
denote error situations. A new input symbol is read only after a shift move.]

(1) Create a new node n labeled a. Push the symbol [T₁, p] on top of the pushdown list,
where p is a pointer to n. Read a new input symbol.
(2) At this point, the top of the pushdown list contains a string of four symbols of the
form X[Tᵢ, p₁] + [Tⱼ, p₂], where p₁ and p₂ are pointers to nodes n₁ and n₂, respectively.
Create a new node n labeled +. Make n₁ and n₂ the left and right direct descendants
of n. Replace [Tᵢ, p₁] + [Tⱼ, p₂] by [T, p], where T = goto(X) and p is a pointer to
node n.
(3) Same as (2) above with ∗ in place of +.
(4) Same as (1) with T₃ in place of T₁.
(5) Same as (1) with T₄ in place of T₁.
(6) Same as (1) with T₂ in place of T₁.
(7) The pushdown list now contains X([T, p]), where p is a pointer to some node n.
Replace ([T, p]) by [T′, p], where T′ = goto(X).

Fig. 9.6 Processor output actions.

In going to configuration (8), P creates the output graph shown in Fig.
9.7. Here, p₄ is a pointer to node n₄. After entering configuration (11),
the output graph is as shown in Fig. 9.8. Here p₅ is a pointer to node n₅.
In configuration (11), the action of T₁ on input $ is accept, and so the tree
in Fig. 9.8 is the final output. This tree is the syntax tree for the expression
a₁ ∗ (a₂ + a₃). □
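A sketch, not from the book, of the stack discipline P uses: a cell of the pushdown
list holds either a grammar or table symbol or a (table, pointer) pair, and output
action (2) of Fig. 9.6 builds a new interior node from the two pointers in the
handle. The goto function shown is a made-up fragment for illustration only.

    class Node:
        def __init__(self, label, children=()):
            self.label = label
            self.children = list(children)   # direct descendants, left to right

    def reduce_plus(stack, goto):
        # stack top is at the right: ... X, (T_i, p1), '+', (T_j, p2)
        _, p2 = stack.pop()                  # [T_j, p2]
        assert stack.pop() == '+'
        _, p1 = stack.pop()                  # [T_i, p1]
        x = stack[-1]                        # symbol below the handle
        n = Node('+', [p1, p2])              # new interior node labeled '+'
        stack.append((goto[x], n))           # push [goto(X), pointer to n]
        return n

    goto = {'(': 'T2', 'T0': 'T1'}           # hypothetical goto fragment
    stack = ['T0', '(', ('T2', Node('a2')), '+', ('T3', Node('a3'))]
    root = reduce_plus(stack, goto)
    print(root.label, [c.label for c in root.children])   # + ['a2', 'a3']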

[Fig. 9.7 Output after configuration (8): p₄ points to node n₄, labeled +, whose
direct descendants are n₂ and n₃.]

[Fig. 9.8 Final output: p₅ points to node n₅, labeled ∗, whose direct descendants
are n₁ and n₄; this is the syntax tree for a₁ ∗ (a₂ + a₃).]

9.2.3. Deterministic One-Pass Bottom-Up Translation

This is the first of three sections showing how various parsing algorithms
can be extended to implement an SDTS by the use of a deterministic push-
down processor instead of a pushdown transducer. We begin by giving
an algorithm to implement an arbitrary SDTS with an underlying LR
grammar.
ALGORITHM 9.1
SDTS on an LR grammar.
Input. A semantically unambiguous SDTS T = (N, Σ, Δ, R, S) with
underlying LR(k) grammar G = (N, Σ, P, S) and an input w in Σ*.

Output. An output tree whose frontier is the output for w.
Method. The tree is constructed by a pushdown processor M, which
simulates 𝒜, an LR(k) parser for G. M will hold on its pushdown list (top
at the right) the symbols in N ∪ Σ and the LR(k) tables exactly as 𝒜 does.
In addition, immediately below each nonterminal, M will have a pointer to
the output graph being generated. The shift, reduce, and accept actions of M
are as follows:
(1) When 𝒜 shifts symbol a onto its stack, M does the same.
(2) If 𝒜 reduces X₁ ⋯ Xₘ to nonterminal A, M does the following:
(a) Let A → X₁ ⋯ Xₘ, w₀B₁w₁ ⋯ Bᵣwᵣ be a rule of the SDTS,
where the B's are in one-to-one correspondence with those of
the X's which are nonterminals.
(b) M removes X₁ ⋯ Xₘ from the top of its stack, along with inter-
vening pointers, LR(k) tables, and, if X₁ is in N, the pointer
immediately below X₁.
(c) M creates a new node n, labels it A, and places a pointer to n
below the symbol A on top of its stack.
(d) The direct descendants of n have labels reading, from the left,
w₀B₁w₁ ⋯ Bᵣwᵣ. Nodes are created for each of the symbols of
the w's. The node for Bᵢ, 1 ≤ i ≤ r, is the node pointed to by
the pointer which was immediately below Xⱼ on the stack of M,
where Xⱼ is the nonterminal corresponding to Bᵢ in this particular
rule of the SDTS.
(3) If 𝒜's input becomes empty (we have reached the right endmarker)
and only p, S, and two LR(k) tables appear on M's pushdown list, then M
accepts if 𝒜 accepts; p points to the root of the output tree of M. □

Example 9.6
Let Algorithm 9.1 be applied to the SDTS

        S → aSA,    0AS
        S → b,      1
        A → bAS,    1SA
        A → a,      0

with input abbab. The underlying grammar is SLR(1), and we shall omit
discussion of the LR(1) tables, assuming that they are there and guide the
parse properly. We shall list the successive stack contents entered by the push-
down processor and then show the tree structure pointed to by each of the
pointers. The LR(1) tables on the stack have been omitted.

(e, abbab$) ⊢ (ab, bab$)
            ⊢ (ap₁S, bab$)
            ⊢ (ap₁Sba, b$)
            ⊢ (ap₁Sbp₂A, b$)
            ⊢ (ap₁Sbp₂Ab, $)
            ⊢ (ap₁Sbp₂Ap₃S, $)
            ⊢ (ap₁Sp₄A, $)
            ⊢ (p₅S, $)
The trees constructed after the third, fifth, seventh, eighth, and ninth
indicated configurations are shown in Fig. 9.9(a)-(e).
Note that when bp₂Ap₃S is reduced to A, the subtree formerly pointed to
by p₃ appears to the left of that pointed to by p₂, because the translation
element associated with A → bAS permutes the A and S on the right. A simi-
lar permutation occurs at the final reduction. Observe that the yield of Fig.
9.9(e) is 01101.
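That permutation is easy to see in a direct transcription of step (2) of Algorithm
9.1 for the rule A → bAS, 1SA. The following sketch, not from the book, pops
the right-hand side, with the pointers kept below its nonterminals, and attaches
the subtrees to the new node in the order the translation element dictates.

    class Node:
        def __init__(self, label, children=()):
            self.label = label
            self.children = list(children)

        def frontier(self):                    # yield of the subtree
            return (self.label if not self.children
                    else ''.join(c.frontier() for c in self.children))

    def reduce_bAS(stack):
        # stack top at the right: ..., 'b', pA, 'A', pS, 'S'
        stack.pop(); p_s = stack.pop()         # pop S and the pointer below it
        stack.pop(); p_a = stack.pop()         # pop A and the pointer below it
        stack.pop()                            # pop b
        n = Node('A', [Node('1'), p_s, p_a])   # translation element 1SA
        stack.extend([n, 'A'])                 # pointer to n goes below A
        return n

    stack = ['a', Node('S', [Node('1')]), 'S',
             'b', Node('A', [Node('0')]), 'A',
             Node('S', [Node('1')]), 'S']
    node = reduce_bAS(stack)
    print(node.frontier())                     # prints 110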

THEOREM 9.4

Algorithm 9.1 correctly produces the translation of the input word


according to the given SDTS.
Proof. Elementary induction on the order in which the pointers are created
by Algorithm 9.1. □

We comment that if we "implement" a pushdown processor on a reason-


able random access computer, a single move of the PP can be done in a finite
number of steps of the random access machine. Thus, Algorithm 9.1 takes
time which is a linear function of the input length.

9.2.4. Deterministic One-Pass Top-Down Translation

Suppose that we have a predictive (top-down) parser. Converting such


a parser to a translator requires a somewhat different approach from the way
in which a bottom-up parser was converted into a translator. Let us suppose
that we have an SDTS with an underlying LL grammar. The parsing process
builds a tree top-down, and at any stage in the process, we can imagine that
a partial tree has been constructed. Those leaves in this partial tree labeled
by nonterminals correspond to the nonterminals contained on the pushdown
list of the predictive parser generated by Algorithm 5.3. The expansion of
a nonterminal is tantamount to the creation of descendants for the corre-
sponding leaf of the tree.
The translation strategy is to maintain a pointer to each leaf of the "cur-

[Fig. 9.9 Translation by pushdown processor: the partial output trees built after
the third, fifth, seventh, eighth, and ninth configurations of Example 9.6, panels
(a)-(e).]

rent" tree having a nonterminal label. While doing ordinary LL parsing,


that pointer will be kept on the pushdown list immediately below the nonter-
minal corresponding to the node pointed to. When a nonterminal is expanded
according to some production, new leaves are created for the corresponding
translation element, and nodes with nonterminal labels are pointed to by
newly created pointers on the pushdown list. The pointer below the expanded
nonterminal disappears. It is therefore necessary to keep, outside the push-
down processor, a pointer to the root of the tree being created. For example,
if the pushdown list contains Ap, and the production A → aBbCc, with
translation element 0C1B2, is used to expand A, then the pushdown processor

will replace Ap by aBp₁bCp₂c. If p pointed to the leaf labeled A before the
expansion, then after the expansion A is the root of the following subtree:
a node labeled A with direct descendants labeled, from the left, 0, C, 1, B, 2,
where p₁ points to node B and p₂ points to node C.


ALGORITHM 9.2
Top-down implementation of an SDTS on an LL grammar.
Input. A semantically unambiguous SDTS T = (N, Σ, Δ, R, S) with
an underlying LL(k) grammar G = (N, Σ, P, S).
Output. A pushdown processor which produces a tree whose frontier is
the output for w for each w in L(G).
Method. We shall construct a pushdown processor M which simulates
the LL(k) parser 𝒜 for G. The simulation of 𝒜 by M proceeds as follows.
As in Algorithm 9.1, we ignore the handling of tables. It is the same for M
as for 𝒜.
(1) Initially, M will have Spᵣ on its pushdown stack (the top is at the left),
where pᵣ is a pointer to the root node nᵣ.
(2) If 𝒜 has a terminal on top of its pushdown list and compares it with
the current input symbol, deleting both, M does the same.
(3) Suppose that 𝒜 expands a nonterminal A [possibly with an associated
LL(k) table] by production A → X₁ ⋯ Xₘ, having translation element
y₀B₁y₁ ⋯ Bᵣyᵣ, and that the pointer immediately below A (it is easy to show
that there will always be one) points to node n. Then M does the following:
(a) M creates direct descendant nodes for n labeled, from the left,
with the symbols of y₀B₁y₁ ⋯ Bᵣyᵣ.
(b) On the stack M replaces A and the pointer below by X₁ ⋯ Xₘ,
with pointers immediately below those of X₁ ⋯ Xₘ which are
nonterminals. The pointer below Xⱼ points to the node created
for Bᵢ if Xⱼ and Bᵢ correspond in the rule

        A → X₁ ⋯ Xₘ,    y₀B₁y₁ ⋯ Bᵣyᵣ

(4) If M's pushdown list becomes empty when it has reached the end of
the input sentence, it accepts; the output is the tree which has been construct-
ed with root nᵣ. □

Example 9.7
We shall consider an example drawn from the area of natural language
translation. It is a little known fact that an SDTS forms a precise model for
the translation of English to another commonly spoken natural language,
pig Latin. The following rules informally define the translation of a word
in English to the corresponding word in pig Latin:
(1) If a word begins with a vowel, add the suffix YAY.
(2) If a word begins with a nonempty string of consonants, move all
consonants before the first vowel to the back of the word and append
suffix AY.
(3) One-letter words are not changed.
(4) U following a Q is a consonant.
(5) Y beginning a word is a vowel if it is not followed by a vowel.
We shall give an SDTS that incorporates only rules (1) and (2). It is left
for the Exercises to incorporate the remaining rules.
The rules of the SDTS are as follows:

⟨word⟩ → ⟨consonants⟩⟨vowel⟩⟨letters⟩,
                ⟨vowel⟩⟨letters⟩⟨consonants⟩ 'AY'
⟨word⟩ → ⟨vowel⟩⟨letters⟩,    ⟨vowel⟩⟨letters⟩ 'YAY'
⟨consonants⟩ → ⟨consonant⟩⟨consonants⟩,    ⟨consonant⟩⟨consonants⟩
⟨consonants⟩ → ⟨consonant⟩,    ⟨consonant⟩
⟨letters⟩ → ⟨letter⟩⟨letters⟩,    ⟨letter⟩⟨letters⟩
⟨letters⟩ → e,    e
⟨vowel⟩ → 'A',    'A'
⟨vowel⟩ → 'E',    'E'
        …
⟨vowel⟩ → 'U',    'U'
⟨consonant⟩ → 'B',    'B'
⟨consonant⟩ → 'C',    'C'
        …
⟨consonant⟩ → 'Z',    'Z'
⟨letter⟩ → ⟨vowel⟩,    ⟨vowel⟩
⟨letter⟩ → ⟨consonant⟩,    ⟨consonant⟩
746 TRANSLATIONAND CODEGENERATION CHAP. 9

The underlying grammar is easily seen to be LL(2). Let us compute


the output translation corresponding to the input word "THE". As in
the previous two examples, we shall first list the configurations entered by the
processor and then show the tree at various stages in its construction. The
pushdown top is at the left this time.

        Input        Stack

 (1)    THE$         ⟨word⟩p₁
 (2)    THE$         ⟨consonants⟩p₂⟨vowel⟩p₃⟨letters⟩p₄
 (3)    THE$         ⟨consonant⟩p₅⟨consonants⟩p₆⟨vowel⟩p₃⟨letters⟩p₄
 (4)    THE$         T⟨consonants⟩p₆⟨vowel⟩p₃⟨letters⟩p₄
 (5)    HE$          ⟨consonants⟩p₆⟨vowel⟩p₃⟨letters⟩p₄
 (6)    HE$          ⟨consonant⟩p₇⟨vowel⟩p₃⟨letters⟩p₄
 (7)    HE$          H⟨vowel⟩p₃⟨letters⟩p₄
 (8)    E$           ⟨vowel⟩p₃⟨letters⟩p₄
 (9)    E$           E⟨letters⟩p₄
(10)    $            ⟨letters⟩p₄
(11)    $            e

The tree structures after steps 1, 2, 6, and 11 are shown in Fig. 9.10(a)-(d),
respectively. □
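For comparison, here is a sketch, not from the book, of rules (1) and (2) written
directly as a program; like the SDTS above, it ignores rules (3)-(5).

    VOWELS = set('AEIOU')

    def pig_latin(word):
        if word[0] in VOWELS:                 # rule (1): word begins with a vowel
            return word + 'YAY'
        for i, letter in enumerate(word):     # rule (2): find the first vowel
            if letter in VOWELS:
                return word[i:] + word[:i] + 'AY'
        return word + 'AY'                    # no vowel at all: every letter moves

    print(pig_latin('THE'))     # ETHAY, the translation computed in the example
    print(pig_latin('APPLE'))   # APPLEYAY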

As in the previous section, there is an easy proof that the current algorithm
performs the correct translation and that on a suitable random access
machine the algorithm can be implemented to run in time which is linear in
the input length. For the record, we state the following theorem.
THEOREM 9.5
Algorithm 9.2 constructs a pushdown processor which produces as output
a tree whose frontier is the translation of the input string.
Proof. We can prove by induction that an input string w has the net effect
of erasing nonterminal A and pointer p from the pushdown list if and only if
(A, A) ⇒* (w, x) in T, where x is the frontier of the subtree whose root is the
node pointed to by p (after erasure of A and p and the symbols to which A
is expanded). Details are left for the Exercises.

9.2.5. Translation in a Backtrack E n v i r o n m e n t

The ideas central to the pushdown processor can be applied to backtrack


parsing algorithms as well. To be specific, we shall discuss how the parsing
machine of Section 6.1 can be extended to incorporate tree construction.
The chief new idea is that the processor must be able to "destroy," if need be,
subtrees which it has constructed. That is, certain subtrees can become inac-
cessible, and while we shall not discuss it here, the memory cells used to

[Fig. 9.10 Translation to pig Latin: the partial output trees after steps 1, 2, 6,
and 11 of Example 9.7, panels (a)-(d). The final tree in (d) has root ⟨word⟩ with
direct descendants ⟨vowel⟩, ⟨letters⟩, ⟨consonants⟩, and AY, and its frontier
spells ETHAY.]

represent the subtree are in practice returned to the available memory.


We shall modify the parsing machine of Section 6.1 to give it the capa-
bility of placing pointers to nodes of a graph on its pushdown list. (Recall

that the parsing machine already has pointers to its input; these pointers are
kept on one cell along with an information symbol.) The rules for manipu-
lating these pointers are the same as for the pushdown processor, and we shall
not discuss the matter in any more detail.
Before giving the translation algorithm associated with the parsing machine, let us discuss how the notion of a syntax-directed translation carries over to the GTDPL programs of Section 6.1. It is reasonable to suppose that we shall associate a translation with a "call" of a nonterminal if and only if that call succeeds.

Let P = (N, Σ, R, S) be a GTDPL program. Let us interpret a GTDPL statement A → a, where a is in Σ ∪ {e}, as though it were an attempt to apply production A → a in a CFG. Then, in analogy with the SDTS, we would expect to find associated with that rule a string of output symbols. That is, the complete rule would be A → a, w, where w is in Δ*, Δ being the "output alphabet."

The only other GTDPL statement which might yield a translation is A → B[C, D], where A, B, C, and D are in N. We can suppose that two CFG productions are implied here, namely A → BC and A → D. If B and C succeed, we want the translation of A to involve the translations of B and C. Therefore, it appears natural to associate with rule A → B[C, D] a translation element of the form wBxCy or wCxBy, where w, x, and y are in Δ*. (If B = C, then there must be a correspondence specified between the nonterminals of the rule and those of the translation element.)

If B fails, however, we want the translation of A to involve the translation of D. Thus, a second translation element, of the form uDv, must be associated with the rule A → B[C, D]. We shall give a formal definition of such a translation-defining method and then discuss how the parsing machine can be generalized to perform such translations.
DEFINITION

A GTDPL program with output is a system P = (N, Σ, Δ, R, S), where N, Σ, and S are as for any GTDPL program, Δ is a finite set of output symbols, and R is a set of rules of the following forms:

    (1) A → f, e
    (2) A → a, y        where a ∈ Σ ∪ {e} and y ∈ Δ*
    (3) (a) A → B[C, D], y1By2Cy3, y4Dy5
        (b) A → B[C, D], y1Cy2By3, y4Dy5
        where each yi is in Δ*

There is at most one rule for each nonterminal.

We define relations ⇒^n between nonterminals and triples of the form (u ↑ v, y, r), where u and v are in Σ*, y ∈ Δ*, and r is s or f. Here, the first

component is the input string, with input head position indicated; the
second component is an output string, and the third is the outcome, either
success or failure.
(1) If A has rule A → a, y, then for all u in Σ*, A ⇒^1 (a ↑ u, y, s). If v is in Σ* but does not begin with a, then A ⇒^1 (↑ v, e, f).

(2) If A has rule A → B[C, D], y1Ey2Fy3, y4Dy5, where E = B and F = C or vice versa, then the following hold:

    (a) If B ⇒^n1 (u1 ↑ u2u3, x1, s) and C ⇒^n2 (u2 ↑ u3, x2, s), then

            A ⇒^(n1+n2+1) (u1u2 ↑ u3, y1x1y2x2y3, s)

        if E = B and F = C, and

            A ⇒^(n1+n2+1) (u1u2 ↑ u3, y1x2y2x1y3, s)

        if E = C and F = B. In case B = C, we presume that the correspondence between E and F on the one hand and the positions held by B and C on the other is indicated by superscripts, e.g., A → B^(1)[B^(2), D], y1B^(2)y2B^(1)y3, y4Dy5.

    (b) If B ⇒^n1 (u1 ↑ u2, x, s) and C ⇒^n2 (↑ u2, e, f), then

            A ⇒^(n1+n2+1) (↑ u1u2, e, f)

    (c) If B ⇒^n1 (↑ u1u2, e, f) and D ⇒^n2 (u1 ↑ u2, x, s), then

            A ⇒^(n1+n2+1) (u1 ↑ u2, y4xy5, s)

    (d) If B ⇒^n1 (↑ u, e, f) and D ⇒^n2 (↑ u, e, f), then A ⇒^(n1+n2+1) (↑ u, e, f).

Note that if A ⇒^n (u ↑ v, y, f), then u = e and y = e. That is, on failure the input pointer is not moved and no translation is produced. It also should be observed that in case (2b) the translation of B is "canceled" when C fails.

We let ⇒^+ be the union of ⇒^n for n ≥ 1. The translation defined by P, denoted τ(P), is {(w, x) | S ⇒^+ (w ↑, x, s)}.
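Read operationally, these relations amount to a recursive interpreter. The sketch below in Python (ours, not from the text) encodes each rule as a tuple and returns the triple (new position, output, outcome); the helper names and the rule encoding are our own choices.

    def run(rules, A, inp, pos):
        """Evaluate nonterminal A of a GTDPL program with output at position pos.

        Rule encodings (ours):
          ('fail',)                        A -> f, e
          ('match', a, y)                  A -> a, y   (a == '' stands for e)
          ('call', B, C, D,
           (y1, first, y2, second, y3),    success element; first/second are 'B' or 'C'
           (y4, y5))                       failure element, wrapped around D's output
        Returns (pos', output, success); on failure pos' == pos and output == ''.
        """
        rule = rules[A]
        if rule[0] == 'fail':
            return pos, '', False
        if rule[0] == 'match':
            a, y = rule[1], rule[2]
            if inp.startswith(a, pos):
                return pos + len(a), y, True
            return pos, '', False
        _, B, C, D, (y1, first, y2, second, y3), (y4, y5) = rule
        p1, x1, ok = run(rules, B, inp, pos)
        if ok:
            p2, x2, ok = run(rules, C, inp, p1)
            if not ok:
                return pos, '', False              # case (2b): B's output is canceled
            parts = {'B': x1, 'C': x2}             # first/second refer to these roles
            return p2, y1 + parts[first] + y2 + parts[second] + y3, True
        p3, x3, ok = run(rules, D, inp, pos)       # cases (2c) and (2d)
        if ok:
            return p3, y4 + x3 + y5, True
        return pos, '', False

    # A toy program of ours: S -> A[S, E]; it maps a^n to 1^n.
    rules = {'S': ('call', 'A', 'S', 'E', ('', 'B', '', 'C', ''), ('', '')),
             'A': ('match', 'a', '1'),
             'E': ('match', '', '')}
    print(run(rules, 'S', 'aaa', 0))    # (3, '111', True)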

Example 9.8

Let us define a GTDPL program with output that performs the pig Latin translation of Example 9.7. Here we shall use lowercase output and the following nonterminals, whose correspondence with the previous example is listed below. Note that X and C3 represent strings of nonterminals.

    Here        Example 9.7

    W           <word>
    C1          <consonant>
    C2          <consonants>
    C3          <consonant>*
    L1          <letter>
    L2          <letters>
    V           <vowel>
    X           <vowel><letters>

In addition, we shall use nonterminals S and F with rules S → e and F → f. Finally, the rules for C1, V, and L1 must be such that they match any consonant, vowel, or letter, respectively, giving a translation which is the letter matched. These rules involve additional nonterminals and will be omitted. The important rules are

    W  → C2[X, X],   XC2 'ay',   X 'yay'
    C2 → C1[C3, F],  C1C3,       F
    C3 → C1[C3, S],  C1C3,       S
    L2 → L1[L2, S],  L1L2,       S
    X  → V[L2, F],   VL2,        F

For example, if we consider the input string 'and', we observe that V ⇒^+ (a ↑ nd, a, s) and that L2 ⇒^+ (nd ↑, nd, s). Thus, X ⇒^+ (and ↑, and, s). Since C1 ⇒^+ (↑ and, e, f), it follows that C2 ⇒^+ (↑ and, e, f). Thus, we have W ⇒^+ (and ↑, andyay, s). □

Such a translation can be implemented by a modification of the parsing machine of Section 6.1. The action of this modified machine is based on the following observation. If procedure A with rule A → B[C, D] is called, then A will produce a translation only if B and C succeed or if B fails and D succeeds. The following algorithm describes the behavior of the modified parsing machine.
ALGORITHM 9.3

Implementation of GTDPL programs with output.

Input. P = (N, Σ, Δ, R, S), a GTDPL program with output.

Output. A modified parsing machine M such that for each input w, M produces a tree with frontier x if and only if (w, x) is in τ(P).

Method. If we ignore the translation elements of the rules of P, we have a GTDPL program P' in the sense of Section 6.1. Then we can use Lemma 6.6 to construct M', a parsing machine that recognizes L(P').

We shall now informally describe the modifications of M' we need to make to obtain the modified parsing machine M. The modified machine M simulates M' and also creates nodes in an output tree, placing pointers to these nodes on its pushdown list.

With an input string w, M has initial configuration (begin, ↑ w, (S, 0)pr). Here pr is a pointer to a root node nr.
(1) (a) Suppose that A → B[C, D], y1Ey2Fy3, y4Dy5 is the rule for A. Suppose that M' makes the following sequence of moves implementing a call of procedure A under the rule A → B[C, D]:

    (begin, u1 ↑ u2u3, (A, i)γ) ⊢ (begin, u1 ↑ u2u3, (B, j)(A, i)γ)
                                ⊢ (success, u1u2 ↑ u3, (A, i)γ)
                                ⊢ (begin, u1u2 ↑ u3, (C, i)γ)

Then M would make the following sequence of moves corresponding to the moves above made by M':

    (9.2.1)  (begin, u1 ↑ u2u3, (A, i)pAγ')
    (9.2.2)  ⊢ (begin, u1 ↑ u2u3, (B, j)pB(A, i)pBpAγ')
    (9.2.3)  ⊢ (success, u1u2 ↑ u3, (A, i)pBpAγ')
    (9.2.4)  ⊢ (begin, u1u2 ↑ u3, (C, i)pCγ')

In configuration (9.2.1) M has a pointer pA to a leaf nA, directly below A. (We shall ignore the pointers to the input in this discussion.) In going to configuration (9.2.2), M creates a new node nB, makes it a direct descendant of nA, and places a pointer pB to nB immediately below and above A. Then M places B on top of the pushdown list. In (9.2.3) M returns to A in state success. In going to configuration (9.2.4), M creates a direct descendant of nA for each symbol of y1Ey2Fy3 and orders them from the left. Node nB, which has already been created, becomes the node for E or F, whichever is B. Let the node for the other of E and F be nC. Then in configuration (9.2.4), M has replaced ApBpA by CpC, where pC is a pointer to nC. Thus if E = C and F = B, then at configuration (9.2.4), node nA is the root of the following subtree:

                     nA
          y1   nC   y2   nB   y3

(b) If, on the other hand, B returns failure in rule A → B[C, D], M will make the following sequence of moves:

    (9.2.1)  (begin, u1 ↑ u2u3, (A, i)pAγ')
    (9.2.2)  ⊢ (begin, u1 ↑ u2u3, (B, j)pB(A, i)pBpAγ')
    (9.2.5)  ⊢ (failure, u1 ↑ u2u3, (A, i)pBpAγ')
    (9.2.6)  ⊢ (begin, u1 ↑ u2u3, (D, i)pDγ')

In going from configuration (9.2.5) to (9.2.6), M first deletes nB and all its descendants from the output tree, using the pointer pB below A to locate nB. Then M creates a direct descendant of nA for each symbol of y4Dy5 in order from the left and replaces ApBpA on top of its pushdown list by DpD, where pD is the pointer to the node created for D.

(2) If A → a, y is a rule, where a is in Σ ∪ {e}, then M would make one of the following moves:
    (a) (begin, u ↑ av, (A, i)pAγ) ⊢ (success, ua ↑ v, γ). In this move a direct descendant of nA (the node pointed to by pA) would be created for each symbol of y. If y = e, then one node, labeled e, would be created as a direct descendant.
    (b) (begin, u ↑ v, (A, i)pAγ) ⊢ (failure, u' ↑ v', γ), where |u'| = i and v does not begin with a.

(3) If A → f, e is a rule, then M would make the move in (2b).

(4) If M' reads its entire input w and erases all symbols on its pushdown list, then the tree constructed with root nr will be the translation of w. □

THEOREM 9.6

Algorithm 9.3 defines a modified parsing machine which for an input w produces a tree whose frontier is the translation of w.

Proof. This is another straightforward inductive argument on the number of moves made by M. The inductive hypothesis is that if M, started in state begin with A and a pointer p to node n on top of its pushdown list and uv to the right of its input head, uses input u with the net effect of deleting A and p from the stack and ending in state success, then the node pointed to by p will be the root of a subtree with frontier y such that A ⇒^+ (u ↑ v, y, s).

Example 9.9
Let us show how a parsing machine would implement the translation of
Example 9.8. The usual format will be followed. Configurations will be
indicated, followed by the constructed tree. Moves representing recognition
by C1, V, and L 1 of consonants, vowels, and letters will not be shown.

    State      Input      Pushdown List

    begin      ↑ and      W p1
    begin      ↑ and      C2 p2 W p2 p1
    begin      ↑ and      C1 p3 C2 p3 p2 W p2 p1
    failure    ↑ and      C2 p3 p2 W p2 p1
    begin      ↑ and      F p4 W p2 p1
    failure    ↑ and      W p2 p1
    begin      ↑ and      X p5
    begin      ↑ and      V p6 X p6 p5
    success    a ↑ nd     X p6 p5
    begin      a ↑ nd     L2 p7
    begin      a ↑ nd     L1 p8 L2 p8 p7
    success    an ↑ d     L2 p8 p7
    begin      an ↑ d     L2 p9
    begin      an ↑ d     L1 p10 L2 p10 p9
    success    and ↑      L2 p10 p9
    begin      and ↑      L2 p11
    begin      and ↑      L1 p12 L2 p12 p11
    failure    and ↑      L2 p12 p11
    begin      and ↑      S p13
    success    and ↑      e

We show the tree after every fourth-listed configuration in Fig. 9.11(a)-(e).

Although we shall not discuss it, the techniques suggested in Algorithm 9.3 are applicable to the other top-down parsing algorithms, such as Algorithm 4.1, and to translation built on a TDPL program.

EXERCISES

9.2.1. Construct a pushdown transducer to implement the following simple SDTS:
[Fig. 9.11  Translation of pig Latin by parsing machine, parts (a)-(e). In the last tree the empty string is meant; it is the translation of S. The tree diagrams are not reproduced here.]

    E → E + T,    ET 'ADD;'
    E → T,        T
    T → T * F,    TF 'MPY;'
    T → F,        F
    F → F ↑ P,    FP 'EXP;'
    F → P,        P
    P → (E),      E
    P → a,        'LOAD a;'

The transducer should be based on an SLR(1) parsing algorithm. Give the sequence of output strings during the processing of
    (a) a ↑ a * (a + a).
    (b) a + a * a ↑ a.
9.2.2. Repeat Exercise 9.2.1 for a pushdown processor parsing in an LL(1)
manner. Modify the underlying grammar when necessary.
9.2.3. Show how a pushdown transducer working as an LL(1) parser would
translate the following strings according to the simple SDTS

    E  → TE',      TE'
    E' → + TE',    T 'ADD;' E'
    E' → e,        e
    T  → FT',      FT'
    T' → * FT',    F 'MPY;' T'
    T' → e,        e
    F  → (E),      E
    F  → a,        'LOAD a;'

    (a) a * (a + a).
    (b) ((a + a) * a) + a.
9.2.4. Show how a pushdown processor would translate the word abbaaaa
according to the SDTS

    S → aA^(1)A^(2),    0A^(2)A^(1)1
    S → b,              2
    A → bS^(1)S^(2),    1S^(1)S^(2)0
    A → a,              e

    (a) Parsing in an LL(1) mode.
    (b) Parsing in an LR(1) mode.

9.2.5. Show that there is no SDTS which performs the translation of Example 9.1 on the given grammar E → a + E | a * E | a.
9.2.6. Give a formal construction for the PDT M of Theorem 9.1.

*9.2.7. Prove that every DPDT defines a postfix simple syntax-directed translation on an LR(1) grammar. Hint: Show that every DPDT can be put in a normal form analogous to that of Section 8.2.1.

*9.2.8. Show that there exist translations T = {(x$, y)} such that T is definable by a DPDT but {(x, y) | (x$, y) ∈ T} is not definable by any DPDT. Contrast this result with Exercise 2.6.20(b).

9.2.9. Give a formal proof of Theorem 9.3.

9.2.10. Prove Theorem 9.4.


9.2.11. Extend the SDTS of Example 9.7 to incorporate rules (3), (4), and (5)
for pig Latin.

9.2.12. Can the SDTS of Example 9.7 be replaced by a simple SDTS if we assume that no English word begins with more than four consonants? What happens to the number of rules of the SDTS?

9.2.13. Show how a parsing machine with pointers would translate the word abb according to the following GTDPL program with output:

    S → A[B, C],    0BA,    11C
    A → a,          0
    B → S[C, A],    0CS,    1A
    C → b,          1

**9.2.14. Construct P, a GTDPL program with output, such that τ(P) = {(x, y) | x is a string with n a's and n b's, n ≥ 0, and y = a^n b^n}. Construct M, a modified parsing machine, from P so that τ(M) = τ(P). Is there (a) a TDPL program with output or (b) an SDTS that defines the same translation?

9.2.15. Prove Theorem 9.5.


9.2.16. Prove Theorem 9.6.

"9.2.17. Extend the notion of a processor with pointers to a graph and give
translation algorithms for SDTS's based on the following algorithms:
(a) Algorithm 4.1.
(b) Algorithm 4.2.
(c) The Cocke-Younger-Kasami algorithm.
(d) Earley's algorithm.
(e) A two-stack parser (see Section 6.2).
(f) A Floyd-Evans parser.

Research Problem
9.2.18. In implementing a translation, we are often concerned with the efficiency
with which the translation is performed. Develop optimization tech-
niques similar to those in Chapter 7 to find efficient translators for
useful classes of syntax-directed translations.

Programming Exercises
9.2.19. Construct a program that takes as input a simple SDTS on an LL(1)
grammar and produces as output a translator which implements the
given SDTS.
9.2.20. Construct a program that produces a translator which implements a
postfix simple SDTS on an LALR(1) grammar.
9.2.21. Construct a program that produces a translator which implements an
arbitrary SDTS on an LALR(1) grammar.

BIBLIOGRAPHIC NOTES

Lewis and Stearns [1968] were the first to prove that a simple SDTS with an underlying LL(k) grammar can be implemented by a deterministic pushdown transducer. They also showed that a simple postfix SDTS on an LR(k) grammar can be performed on a DPDT and that every DPDT translation can be effectively described by a simple postfix SDTS on an LR(k) grammar (Exercise 9.2.7). The pushdown processor was introduced by Aho and Ullman [1969a].
In many compiler-compilers and compiler writing systems, the formalism used
to describe the object compiler is similar to a syntax-directed translation scheme.
The syntax of the language for which a compiler is being constructed is specified
in terms of a context-free grammar. Semantic routines are also specified and
associated with each production. The object compiler that is produced can be
modeled by a pushdown processor; as the object compiler parses its input, the
semantic routines are invoked to compute the output. TDPL with output is a
simplified model of the TMG compiler writing system [McClure, 1965]. McIlroy
[1972] has implemented an extension of T M G to allow GTDPL-type parsing rules
and simulation of bottom-up parsers. GTDPL with output is a simplified model
for the META family of compiler-compilers [Schorre, 1964].
Two classes of simple SDTS's which each include the simple SDTS's on an
LL grammar and the postfix simple SDTS's on an LR grammar are mentioned by
Lewis and Stearns [1968] and Aho and Ullman [1972g]. Each of these classes is
implementable on a DPDT.

9.3. GENERALIZED TRANSLATION SCHEMES

In this section we shall consider how the idealized translations discussed previously can be extended in a natural way to enable us to perform a wider and more useful class of translations. Here we shall adopt the point of view that the most general translation element that can be associated with a production can be any type of function. The main extensions are to allow several translations at each node of the parse tree, to allow use of other than string-valued variables, and to allow a translation at one node to depend on translations at its direct ancestor, as well as its direct descendants.
We shall also discuss the important matter of the timing of the evaluation
of translations at various nodes.

9.3.1. Multiple Translation Schemes

Our first extension of the SDTS will allow each node in the parse tree to
possess several string-valued translations. As in the SDTS, each translation
depends on the translations of the various direct descendants of the node in
question. However, the translation elements can be arbitrary strings of output
symbols and symbols representing the translations at the descendants. Thus,
translation symbols can be repeated.
DEFINITION

A generalized syntax-directed translation scheme (GSDTS) is a system T = (N, Σ, Δ, Γ, R, S), where N, Σ, and Δ are finite sets of nonterminals, input symbols, and output symbols, respectively. Γ is a finite set of translation symbols of the form Ai, where A ∈ N and i is an integer. We assume that S1 ∈ Γ. S is the start symbol, a member of N. Finally, R is a set of rules of the form

    A → α,    A1 = β1, . . . , Am = βm

subject to the following constraints:

(1) Aj ∈ Γ for 1 ≤ j ≤ m.
(2) Each symbol of β1, . . . , βm is either in Δ or a symbol Bk in Γ such that B is a nonterminal which appears in α.
(3) If α has more than one symbol B, then each Bk in the β's is associated by a superscript with one of these instances of B.

We call A → α the underlying production of the rule. We call Ai a translation of A and Ai = βi a translation element associated with this rule. If P denotes the set of underlying productions of all rules, then G = (N, Σ, P, S) is the underlying grammar of T. If no two rules have the same underlying production, then T is said to be semantically unambiguous.

We define the output of a GSDTS in a bottom-up fashion. With each interior node n of a parse tree (in the underlying grammar) labeled A, we associate one string for each Ai in Γ. This string is called the value (or translation) of Ai at node n. Each value is computed by substituting the values

defined at the direct descendants of n for the translation symbols of the translation element for Ai. The proper translation element for Ai is the one associated with the production used at n. For example, suppose that

    A → BaC,    A1 = bB1C2B1,    A2 = C1cB2

is a rule in a GSDTS and that the underlying production A → BaC is used to expand a node labeled A in the derivation tree, as shown in Fig. 9.12. Then if

[Fig. 9.12  Portion of a parse tree.]

the values of B1 and B2 at node B and of C1 and C2 at node C are as in Fig. 9.12, the value of A1 defined at the node labeled A is bx1y2x1 and the value of A2 at that node is y1cx2.
The translation defined by T, denoted τ(T), is the set of pairs

    {(x, y) | x has a parse tree in the underlying grammar of T,
              and y is the value of S1 at the root of that tree}.

Example 9.10

Let T = ({S}, {a}, {a}, {S1, S2}, R, S) be a GSDTS, where R consists of the following rules:

    S → aS,    S1 = S1S2S2a,    S2 = S2a
    S → a,     S1 = a,          S2 = a

Then τ(T) = {(a^n, a^(n^2)) | n ≥ 1}. For example, a^4 has the parse tree of Fig. 9.13(a). The values of the two translations at each interior node are shown in Fig. 9.13(b).

For example, to calculate the value of S1 at the root, we substitute into the expression S1S2S2a the values for S1 and S2 at the node below the root. These values are a^9 and a^3, respectively. A proof that τ(T) maps a^n to a^(n^2) reduces to observing that (n + 1)^2 = n^2 + 2n + 1. □
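Since every translation at a node depends only on translations at the node below it, the two values can be computed bottom-up along the chain of S-nodes. A small Python sketch of ours:

    def translate(n):
        """Values of S1 and S2 for the parse tree of a^n in Example 9.10."""
        s1, s2 = "a", "a"                               # lowest interior node: S -> a
        for _ in range(n - 1):                          # each higher node: S -> aS
            s1, s2 = s1 + s2 + s2 + "a", s2 + "a"       # S1 = S1 S2 S2 a,  S2 = S2 a
        return s1, s2

    for n in range(1, 6):
        assert translate(n)[0] == "a" * (n * n)         # S1 at the root is a^(n^2)
    print(len(translate(4)[0]))                         # 16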

[Fig. 9.13  Generalized syntax-directed translation: (a) the parse tree for a^4; (b) the values at its interior nodes, from the bottom up: S1 = a, S2 = a; S1 = a^4, S2 = aa; S1 = a^9, S2 = aaa; S1 = a^16, S2 = aaaa.]

Example 9.11

We shall give an example of formal differentiation of expressions involving the constants 0 and 1, the variable x, and the functions sine, cosine, +, and *. The following grammar generates the expressions:

    E → E + T | T
    T → T * F | F
    F → (E) | sin(E) | cos(E) | x | 0 | 1

We associate with each of E, T, and F two translations indicated by subscripts 1 and 2. Subscript 1 indicates an undifferentiated expression; 2 indicates a differentiated expression. E2 is the distinguished translation. The appropriate laws for the derivatives are

    d(f(x) + g(x)) = df(x) + dg(x)
    d(f(x)g(x)) = f(x)dg(x) + g(x)df(x)
    d sin(f(x)) = cos(f(x))df(x)
    d cos(f(x)) = -sin(f(x))df(x)
    dx = 1
    d0 = 0
    d1 = 0

The following GSDTS, T, reflects these laws:

    E → E + T       E1 = E1 + T1
                    E2 = E2 + T2
    E → T           E1 = T1
                    E2 = T2
    T → T * F       T1 = T1 * F1
                    T2 = T2 * F1 + (T1) * F2
    T → F           T1 = F1
                    T2 = F2
    F → (E)         F1 = (E1)
                    F2 = (E2)
    F → sin(E)      F1 = sin(E1)
                    F2 = cos(E1) * (E2)
    F → cos(E)      F1 = cos(E1)
                    F2 = -sin(E1) * (E2)
    F → x           F1 = x
                    F2 = 1
    F → 0           F1 = 0
                    F2 = 0
    F → 1           F1 = 1
                    F2 = 0

We leave for the Exercises a proof that if (α, β) is in τ(T), then β is the derivative of α. β may contain some redundant parentheses.
The derivation tree for sin(cos(x)) + x is given in Fig. 9.14. The values of the translation symbols at each of the interior nodes are listed below:

    Nodes            E1, T1, or F1        E2, T2, or F2

    n3, n2           x                    1
    n12, n11, n10    x                    1
    n9, n8, n7       cos(x)               -sin(x) * (1)
    n6, n5, n4       sin(cos(x))          cos(cos(x)) * (-sin(x) * (1))
    n1               sin(cos(x)) + x      cos(cos(x)) * (-sin(x) * (1)) + 1


[Fig. 9.14  Derivation tree for sin(cos(x)) + x, with interior nodes n1 through n12. The tree diagram is not reproduced here.]
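The same translations can be computed by a simple recursion over a syntax tree for the expression. The Python sketch below (ours, operating on an already-built tree rather than on the parse) returns the pair of strings corresponding to the two translations at each node; the tree encoding is our own, and the parenthesization follows the rules above, so the output may contain redundant parentheses exactly as noted.

    def diff(node):
        """Return (undifferentiated, differentiated) strings, as in Example 9.11.

        A node is ('+', l, r), ('*', l, r), ('sin', e), ('cos', e),
        or one of the leaves 'x', '0', '1'.
        """
        if node == 'x':
            return 'x', '1'
        if node in ('0', '1'):
            return node, '0'
        op = node[0]
        if op in ('+', '*'):
            l1, l2 = diff(node[1])
            r1, r2 = diff(node[2])
            if op == '+':
                return l1 + '+' + r1, l2 + '+' + r2
            # product rule: T2 = T2 * F1 + (T1) * F2
            return l1 + '*' + r1, l2 + '*' + r1 + '+(' + l1 + ')*' + r2
        e1, e2 = diff(node[1])
        if op == 'sin':
            return 'sin(' + e1 + ')', 'cos(' + e1 + ')*(' + e2 + ')'
        return 'cos(' + e1 + ')', '-sin(' + e1 + ')*(' + e2 + ')'

    # sin(cos(x)) + x, the expression of Fig. 9.14
    print(diff(('+', ('sin', ('cos', 'x')), 'x'))[1])
    # cos(cos(x))*(-sin(x)*(1))+1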

The implementation of a GSDTS is not much different from that of an SDTS using Algorithms 9.1 and 9.2. We shall generalize Algorithm 9.1 to produce as output a directed acyclic graph (dag) rather than a tree. It is then possible to "walk" over the dag in such a way that the desired translation can be obtained.

ALGORITHM 9.4

Bottom-up execution of a GSDTS.

Input. A semantically unambiguous GSDTS T = (N, Σ, Δ, Γ, R, S), whose underlying grammar G is LR(k), and an input string x ∈ Σ*.

Output. A dag from which we can recover the output y such that (x, y) is in τ(T).

Method. Let M' be an LR(k) parser for G. We shall construct a pushdown processor M with an endmarker, which will simulate M' and construct a dag. If M' has nonterminal A on its pushdown list, M will place below A one pointer for each translation symbol Ai ∈ Γ. Thus, corresponding to a node labeled A on the parse tree will be as many nodes of the dag as there are translations of A, i.e., symbols Ai in Γ. The action of M is as follows:

(1) If M' shifts, M does the same.

(2) Suppose that M' is about to reduce according to production A → α, with translation elements A1 = β1, . . . , Am = βm. At this point M will have α on top of its pushdown list, and immediately below each nonterminal in α there will be pointers to each of the translations of that nonterminal. When M makes the reduction, M first creates m nodes, one for each translation symbol Ai. The direct descendants of these nodes are determined by the symbols in β1, . . . , βm. New nodes for output symbols are created. The node for translation symbol Bk ∈ Γ is the node indicated by that pointer below the nonterminal B in α which represents the kth translation of B. (As usual, if there is more than one B in α, the particular instance of B referred to will be indicated in the translation element by a superscript.) In making the reduction, M replaces α and its pointers by A and the m pointers to the translations for A. For example, suppose that M' reduces according to the underlying production of the rule

    A → BaC,    A1 = bB1C2B1,    A2 = C1cB2

Then M would have the string pB2 pB1 B a pC2 pC1 C on top of its pushdown list (C is on top), where the p's are pointers to nodes representing translations. In making the reduction, M would replace this string by pA2 pA1 A, where pA1 and pA2 are pointers to the first and second translations of A. After the reduction the output dag is as shown in Fig. 9.15. We assume that pXi points to the node for Xi.

Fig. 9.15 Portion of output dag.



(3) If M has reached the end of the input string and its pushdown list contains S and some pointers, then the pointer to the node for S1 is the root of the desired output dag. □

We shall delay a complete example until we have discussed the interpretation of the dag. Apparently, each node of the dag "represents" the value
of a translation symbol at some node of the parse tree. But how can that value
be produced from the dag ? It should be evident from the definition of the
translation associated with a dag that the value represented by a node n
should be the concatenation, in order from the left, of the values represented
by the direct descendants of node n. Note that two or more direct descendants
may in fact be the same node. In that case, the value of that node is to be
repeated.
With the above in mind, it should be clear that the following method of
walking through a dag will produce the desired output.
ALGORITHM 9.5

Translation from dag.

Input. A dag with leaves labeled and a single root.

Output. A sequence of the symbols used to label the leaves.

Method. We shall use a recursive procedure R(n), where n is a node of the dag. Initially, R(n0) is called, where n0 is the root.

Procedure R(n).

(1) Let a1, a2, . . . , am be the edges leaving n, in this order. Do step (2) for a1, a2, . . . , am in order.
(2) Let ai be the current edge.
    (a) If ai enters a leaf, emit the label of that leaf.
    (b) If ai enters an interior node n', perform R(n'). □
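In Python the walk is a direct transcription (our sketch, not part of the text): interior nodes are lists of their direct descendants in left-to-right order, leaves are strings, and a shared node is simply visited once for each edge entering it, so its value is emitted repeatedly.

    def R(node, emit):
        """Procedure R(n) of Algorithm 9.5."""
        for child in node:               # the edges a1, ..., am leaving node, in order
            if isinstance(child, str):
                emit(child)              # (2a): the edge enters a leaf
            else:
                R(child, emit)           # (2b): the edge enters an interior node

    # The dag that Algorithm 9.4 builds for the GSDTS of Example 9.10 on input aaaa
    n7, n8 = ["a"], ["a"]                # S1- and S2-nodes at the lowest level
    n5, n6 = [n7, n8, n8, "a"], [n8, "a"]
    n3, n4 = [n5, n6, n6, "a"], [n6, "a"]
    n1 = [n3, n4, n4, "a"]               # the S1-node at the root

    out = []
    R(n1, out.append)
    print("".join(out))                  # sixteen a's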

THEOREM 9.7
If Algorithm 9.5 is applied to the dag produced by Algorithm 9.4, then
the output of Algorithm 9.5 is the translation of the input x to Algorithm 9.4.
Proof. Each node n produced by Algorithm 9.4 corresponds in an obvious
way to the value of a translation symbol at a particular node of the parse
tree for x. A straightforward induction on the height of a node shows that
R(n) does produce the value of that translation symbol.

Example 9.12
Let us apply Algorithms 9.4 and 9.5 to the GSDTS of Example 9.10
(p. 759), with input aaaa.
The sequence of configurations of the processor is [with LR(1) tables
omitted, as usual]

    Pushdown List       Input

    (1)  e              aaaa$
    (2)  a              aaa$
    (3)  aa             aa$
    (4)  aaa            a$
    (5)  aaaa           $
    (6)  aaa p1p2S      $
    (7)  aa p3p4S       $
    (8)  a p5p6S        $
    (9)  p7p8S          $

The trees constructed after steps 6, 7, and 9 are shown in Fig. 9.16(a)-(c). Nodes on the left correspond to values of S1 and those on the right to values of S2.

The application of Algorithm 9.5 to the dag of Fig. 9.16(c) requires many invocations of the procedure R(n). We begin with node n1, since that corresponds to S1. The sequence of calls of R(n) and the generations of output a will be listed. A call of R(ni) is indicated simply as ni. The sequence is

    n1 n3 n5 n7 a n8 a n8 a a n6 n8 a a n6 n8 a a a n4 n6 n8 a a a n4 n6 n8 a a a a.  □

9.3.2. Varieties of Translations

Heretofore, we have considered only string-valued translation variables in a translation scheme. The same principles that enabled us to define SDTS's and GSDTS's will allow us to define and implement translation schemes containing arithmetic and Boolean variables, for example, in addition to string variables.
The strings produced in the course of code generation can fall into several different classes:
(1) Machine or assembly code which is to be output of the compiler.
(2) Diagnostic messages; also output of the compiler but not in the same
stream as (1).
(3) Instructions indicating that certain operations should be performed
on data managed by the compiler itself.
Under (3), we would include instructions which do bookkeeping operations,
arithmetic operations on certain variables used by the compiler other than
in the parsing mechanism, and operations that allocate storage and generate
new labels for output statements. We defer discussion of when these instruc-
tions are executed to the next section.
A few examples of types of translations which we might find useful during
code generation are
(1) Elements of a finite set of modes (real, integer, and so forth) to indi-
cate the mode of an arithmetic expression,

[Fig. 9.16  Dag constructed by Algorithm 9.4, parts (a)-(c). The dag diagrams are not reproduced here.]

(2) Strings representing the labels of certain statements when compiling flow of control structures (if-then-else, for example), and
(3) Integers indicating the height of a node in the parse tree.
We shall generalize the types of formulas that can be used in a syntax-
directed translation scheme to compute translations. Of course, when dealing
with numbers or Boolean variables, we shall use Boolean and arithmetic
operators to express translations. However, we find it convenient to also use
conditional statements of the form if B then E1 else E2, where B is a Boolean
expression and E1 and E2 are arbitrary expressions, including conditionals.
For example, we might have a production A → BC, where B has a Boolean translation B1 and C has two string-valued translations C1 and C2. The formula for the translation of A1 might be if B1 then C1 else C2. That is, if the left direct descendant of the node whose translation A1 is being computed has translation B1 = true, then take C1, the first translation of the right direct descendant, as A1; otherwise, take C2. Alternatively, B might have an integer-valued translation B2, and the formula for A2 might be if B2 = 1 then C1 else C2. This statement would cause A2 to be the same as C2 unless B2 had the value 1.
We observe that it is not hard to generalize Algorithm 9.4, a bottom-up
algorithm, to incorporate the possibility that certain translations are numbers,
Boolean values, or elements of some finite set. In a bottom-up scheme the
formulas for these variables (and the string-valued variables) are evaluated
only when all arguments are known.
In fact, it will often be the case that all but one of the translations asso-
ciated with a node are Boolean, integer or elements of a finite set. If the
remaining (string-valued) translation can be produced by rules that are
essentially postfix simple SDTS rules (with conditionals, perhaps), then
we can implement the entire translation on a DPDT which keeps the non-string translations on its pushdown list. These translations can be easily
"attached" to the pushdown cells holding the nonterminals, there being
an obvious connection between nonterminals on the pushdown list and
interior nodes of the parse tree. If some translation is integer-valued, we have
gone outside the DPDT formalism, but the extension should pose no problem
to the person implementing the translator on a computer.
However, generalizing Algorithms 9.2 and 9.3, which are top-down, is
not as easy. When only strings are to be constructed, we have allowed the
formulas to be expanded implicitly, i.e., with pointers to nodes which will
eventually represent the desired string. However, it may not be possible to
treat arithmetic or conditional formulas in the same way.
Referring to Algorithm 9.2, we can make the following modification.
If a nonterminal A on top of the pushdown list is expanded, we leave the
pointer immediately below A on the pushdown list. This pointer indicates
the node n which this instance of A represents. When the expansion of A is

complete, the pointer will again be at the top of the pushdown list, and the
translations for all descendants of n will have been computed. (We can show
this inductively.) It is then possible to compute the translation of n exactly
as if the parse were bottom-up. We leave the details of such a generalization
for the Exercises.
We conclude this section with several examples of more general trans-
lation schemes.

Example 9.13
We shall elaborate on Section 1.2.5, wherein we spoke of the generation of code for arithmetic expressions for a single accumulator random access machine. Specifically, we assume that the assembly language instructions

    ADD α
    MPY α
    LOAD α
    STORE α

are available and have the obvious meanings.


We shall base our translation on the grammar G0. Nonterminals E, T, and F will each have two translation elements. The first will produce a string of output characters that will cause the value of the corresponding input expression to be brought to the accumulator. The second translation will be an integer, representing the height of the node in a parse tree. We shall not, however, count the productions E → T, T → F, or F → (E) when determining height. Since the only need for this height measure is to determine a safe temporary location name, it is permissible to disregard these productions. Put another way, we shall really measure height in the syntax tree.
The six productions and their translation elements are listed below:

    (1) E → E + T      E1 = T1 ';STORE $' E2 ';' E1 ';ADD $' E2
                       E2 = max(E2, T2) + 1
    (2) E → T          E1 = T1†
                       E2 = T2
    (3) T → T * F      T1 = F1 ';STORE $' T2 ';' T1 ';MPY $' T2
                       T2 = max(T2, F2) + 1

†If we were working from the syntax tree, rather than the parse tree, reductions by E → T, T → F and F → (E) would not appear. We would then have no need to implement the trivial translation rules associated with these productions.

    (4) T → F          T1 = F1
                       T2 = F2
    (5) F → (E)        F1 = E1
                       F2 = E2
    (6) F → a          F1 = 'LOAD' NAME(a)
                       F2 = 1

Again we use the SNOBOL convention in the translation elements for the definition of strings. Quotes surround strings that denote themselves. Unquoted symbols denote their current value, which is to be substituted for that symbol. Thus, in rule (1), the translation element for E1 states that the value of E1 is to be the concatenation of

(1) The value of T1 followed by a semicolon;
(2) The instruction STORE with an address $n, where n is the height of the node for E (on the right-hand side of the production), followed by a semicolon;
(3) The value of E1 (the left argument of +) followed by a semicolon; and
(4) The instruction ADD $n, where n is the same as in (2).

Here $n is intended to be a temporary location, and the translation for E1 (on the right-hand side of the translation element) will not use that location because of the way E2, T2, and F2 are handled.

The rule for production (6) uses the function NAME(a), where NAME is a bookkeeping function that retrieves the internal (to the compiler) name of the identifier represented by the token a. Recall that the terminal symbols of all our grammars are intended to be tokens. If a token represents an identifier, the token will contain a pointer to the place in the symbol table where information about the particular identifier is stored. This information tells which identifier the token really represents, as well as giving attributes of that identifier.

The parse tree of (a + a) * (a + a), with values of some of the translation symbols, is shown in Fig. 9.17. We assume that the function NAME(a) yields a1, a2, a3, and a4, respectively, for the four identifiers represented by a. □
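For concreteness, here is a small Python sketch of ours (not from the text) of these two translations computed over a syntax tree, so the chain productions are already gone, as the footnote anticipates; it returns the pair playing the roles of (E1, E2).

    def gen(node):
        """Return (code, height) as defined by the rules of Example 9.13.

        A node is ('+', l, r), ('*', l, r), or ('id', name).
        """
        if node[0] == 'id':
            return 'LOAD ' + node[1], 1                      # rule (6): F -> a
        op, left, right = node
        lcode, lh = gen(left)
        rcode, rh = gen(right)
        instr = 'ADD' if op == '+' else 'MPY'
        # The temporary $n is numbered by the height of the left operand,
        # a location that the left operand's own code never uses.
        code = '%s; STORE $%d; %s; %s $%d' % (rcode, lh, lcode, instr, lh)
        return code, max(lh, rh) + 1

    tree = ('*', ('+', ('id', 'a1'), ('id', 'a2')),
                 ('+', ('id', 'a3'), ('id', 'a4')))
    print(gen(tree)[0])
    # LOAD a4; STORE $1; LOAD a3; ADD $1; STORE $2; LOAD a2; STORE $1; LOAD a1; ADD $1; MPY $2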

Example 9.14
The code produced by Example 9.13 is by no means optimal. A con-
siderable improvement can be made by observing that if the right operand is
a single identifier, we need not load it and store it in a temporary. We shall
therefore add a third translation for E, T, and F, which is a Boolean variable

[Fig. 9.17  Parse tree and translations for (a + a) * (a + a). At the root, T1 = LOAD a4; STORE $1; LOAD a3; ADD $1; STORE $2; LOAD a2; STORE $1; LOAD a1; ADD $1; MPY $2. The tree diagram is not reproduced here.]

with value true if and only if the expression dominated by the node is a single
identifier.
The first translation of E, T, and F is again code to compute the expres-
sion. However, if the expression is a single identifier, then this translation is
only the " N A M E " of that identifier. Thus, the translation scheme does not
"work" for single identifiers. This should cause little trouble, since the expres-
sion grammar is presumably part of a grammar for assignment and the
translation for assignments such as A ← B can be handled at a higher level.
The new rules are the following:

    (1) E → E + T      E1 = if T3 then
                              if E3 then 'LOAD' E1 ';ADD' T1
                              else E1 ';ADD' T1
                            else if E3 then T1 ';STORE $1;LOAD' E1 ';ADD $1'
                            else T1 ';STORE $' E2 ';' E1 ';ADD $' E2
                       E2 = max(E2, T2) + 1
                       E3 = false
    (2) E → T          E1 = T1
                       E2 = T2
                       E3 = T3
    (3) T → T * F      T1 = if F3 then
                              if T3 then 'LOAD' T1 ';MPY' F1
                              else T1 ';MPY' F1
                            else if T3 then F1 ';STORE $1;LOAD' T1 ';MPY $1'
                            else F1 ';STORE $' T2 ';' T1 ';MPY $' T2
                       T2 = max(T2, F2) + 1
                       T3 = false
    (4) T → F          T1 = F1
                       T2 = F2
                       T3 = F3
    (5) F → (E)        F1 = E1
                       F2 = E2
                       F3 = E3
    (6) F → a          F1 = NAME(a)
                       F2 = 1
                       F3 = true

In rule (1), the formula for E1 checks whether either or both arguments are single identifiers. If the right-hand argument is a single identifier, then the code generated causes the left-hand argument to be computed and the right-hand argument to be added to the accumulator. If the left-hand argument is a single identifier, then the code generated for this argument is the

identifier name. Thus, 'LOAD' must be prefixed to it. Note that E2 = 1 in this case, and so $1 can be used as the temporary store rather than '$' E2 (which would be $1 in this case anyway).

The code produced for (a + a) * (a + a) is

    LOAD a3; ADD a4; STORE $2; LOAD a1; ADD a2; MPY $2

Note that these rules do not assume that + and * are commutative. □

Our next example again deals with arithmetic expressions. It shows that if three-address code is chosen as the intermediate language, we can write what is essentially a simple SDTS, implementable by a deterministic pushdown transducer that holds some extra information in the cells of its pushdown list.

Example 9.15

Let us translate L(G0) to a sequence of three-address statements of the form A ← + BC and A ← * BC, meaning that A is to be assigned the sum or product, respectively, of B and C. In this example A will be a string of the form $i, where i is an integer. The principal translations, E1, T1, and F1, will be a sequence of three-address statements which evaluate the expression dominated by the node in question; E2, T2, and F2 are integers indicating levels, as in the previous examples. E3, T3, and F3 will be the name of a variable which has been assigned the value of the expression by the aforementioned code. This name is a program variable in the case that the expression is a single identifier and a temporary name otherwise. The following is the translation scheme:

    E → E + T      E1 = E1T1 '$' max(E2, T2) ' ← + ' E3T3 ';'
                   E2 = max(E2, T2) + 1
                   E3 = '$' max(E2, T2)
    E → T          E1 = T1
                   E2 = T2
                   E3 = T3
    T → T * F      T1 = T1F1 '$' max(T2, F2) ' ← * ' T3F3 ';'
                   T2 = max(T2, F2) + 1
                   T3 = '$' max(T2, F2)
    T → F          T1 = F1
                   T2 = F2
                   T3 = F3

    F → (E)        F1 = E1
                   F2 = E2
                   F3 = E3
    F → a          F1 = e
                   F2 = 1
                   F3 = NAME(a)

As an example, the output for the input a1 * (a2 + a3) is

    $1 ← + a2 a3;
    $2 ← * a1 $1;

We leave it to the reader to observe that the rules for E1, T1, and F1 form a postfix simple SDTS if we assume that the values of the second and third translations of E, T, and F are output symbols. A practical method of implementation is to parse G0 in an LR(1) manner by a DPDT which keeps the values of the second and third translations on its stack. That is, each pushdown cell holding E will also hold the values of E2 and E3 for the associated node of the parse tree (and similarly for cells holding T and F).

The translation is implemented by emitting

    '$' max(E2, T2) ' ← + ' E3T3 ';'

every time a reduction of E + T to E is called for, where E2, E3, T2, and T3 are the values attached to the pushdown cells involved in the reduction. Reductions by T → T * F are treated analogously, and nothing is emitted when other reductions are made.

We should observe that since the second and third translations of E, T, and F can assume an infinity of values, the device doing the translation is not, strictly speaking, a DPDT. However, the extension is easy to implement in practice on a random access computer. □
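The same translations can also be expressed as a recursive routine. The following Python sketch is ours (with '<-' written for the left arrow); it returns the triple playing the roles of (E1, E2, E3).

    def three_address(node):
        """Return (code, level, name) as in Example 9.15.

        A node is ('+', l, r), ('*', l, r), or ('id', name).
        """
        if node[0] == 'id':
            return '', 1, node[1]            # F -> a:  F1 = e, F2 = 1, F3 = NAME(a)
        op, left, right = node
        lcode, llev, lname = three_address(left)
        rcode, rlev, rname = three_address(right)
        temp = '$%d' % max(llev, rlev)
        code = '%s%s%s <- %s %s %s;\n' % (lcode, rcode, temp, op, lname, rname)
        return code, max(llev, rlev) + 1, temp

    print(three_address(('*', ('id', 'a1'),
                              ('+', ('id', 'a2'), ('id', 'a3'))))[0], end='')
    # $1 <- + a2 a3;
    # $2 <- * a1 $1;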

Example 9.16

We shall generate assembly code for control statements of the if-then-else form. We presume that the nonterminal S stands for a statement and that one of its productions is S → if B then S else S. We suppose that S has a translation S1 which is code to execute that statement. Thus, if a production for S other than the one shown is used, we can presume that S1 is correctly computed.

Let us assume that B stands for a Boolean expression and that it has two translations, B1 and B2, which are computed by other rules of the translation

system. Specifically, B1 is code that causes a jump to location B2 if the Boolean expression has value false.

To generate the expected code, S will need another translation S2, which
is the location to be transferred to after execution of S is finished. We assume
that our computer has an instruction JUMP α which transfers control to the location named α.
To generate names for these locations, we shall assume the existence of
a function NEWLABEL, which when invoked generates a name for a label
that has never been used before. For example, the compiler might keep
a global integer i available. Each time NEWLABEL is invoked, it might
increment i by 1 and return the name $$i. The function NEWLABEL is not
actually invoked for this S-production but will be invoked for some other
S-productions and for B-productions.
We also make use of a convenient assembler pseudo-operation, the likes
of which is found in most assembly languages. This assembly instruction is
of the form

    EQUAL α, β

It generates no code but causes the assembler to treat the locations named α and β as the same location.
The EQUAL instruction is needed because the two instances of S on
the right-hand side of the production S ~ if B then S else S each have a name
for the instruction they expect to execute next. We must make sure that
a location allotted for one serves for the other as well.
The translation elements for the production mentioned are

    S → if B then S^(1) else S^(2)    S1 = 'EQUAL' S2^(1) ',' S2^(2) ';' B1 ';' S1^(1) ';JUMP' S2^(1) ';' B2 ':' S1^(2)
                                      S2 = S2^(1)

That is, the translation for S consists of the concatenation of

(1) An instruction to cause S2^(1) and S2^(2) to represent the same location;
(2) Object code for the Boolean expression (B1), which causes a jump to location B2 if false;
(3) Object code for the first statement (S1^(1)) followed by a jump to the location labeled S2^(1) (the location with that label exists outside the statement being compiled); and
(4) Object code for the second statement (S1^(2)). The first location for that code is given label B2.

The translation for S2 is the same as the translation S2 of the first substate-

ment. Thus, whether B is true or false, the location S2^(1) (which now equals S2^(2)) will be reached.
Let us consider the nested statement

    if B^(1) then if B^(2) then S^(1) else S^(2) else S^(3)

generated by two applications of the production in question. (The superscripts are just for reference and strictly speaking should not appear.) The object code generated for this nested statement would be (with semicolons replaced by new lines)

    EQUAL S2^(1), S2^(3)
    code for B^(1)
    EQUAL S2^(1), S2^(2)
    code for B^(2)
    code for S^(1)
    JUMP S2^(1)
    B2^(2):  code for S^(2)
    JUMP S2^(1)
    B2^(1):  code for S^(3)                    □
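Here is a Python sketch of ours of the label bookkeeping for this production; the statement and Boolean operands are represented by placeholder strings, since their own rules lie outside the production shown, and NEWLABEL is modeled by a simple counter.

    label_count = 0
    def newlabel():
        """NEWLABEL: return a label name never used before."""
        global label_count
        label_count += 1
        return '$$%d' % label_count

    def gen_if(b_code, b_false, then_part, else_part):
        """S -> if B then S(1) else S(2): return (S1, S2) as in Example 9.16.

        then_part and else_part are (code, exit_label) pairs, i.e. (S1, S2);
        b_code and b_false play the roles of B1 and B2.
        """
        s1_then, s2_then = then_part
        s1_else, s2_else = else_part
        code = ('EQUAL %s, %s\n' % (s2_then, s2_else) + b_code + s1_then
                + 'JUMP %s\n' % s2_then + '%s: %s' % (b_false, s1_else))
        return code, s2_then

    # if B(1) then (if B(2) then S(1) else S(2)) else S(3)
    s1, s2, s3 = [('code for S(%d)\n' % i, newlabel()) for i in (1, 2, 3)]
    inner = gen_if('code for B(2)\n', newlabel(), s1, s2)
    outer = gen_if('code for B(1)\n', newlabel(), inner, s3)
    print(outer[0], end='')

The printed result has the same shape as the object code displayed above, with $$1, $$2, $$3 standing for S2^(1), S2^(2), S2^(3) and $$4, $$5 for B2^(2), B2^(1).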

Example 9.17

As the last example in this section, we consider generating object code for a call of the form

    call X(E^(1), . . . , E^(n))

We suppose that there is an assembly instruction CALL X which transfers to location X and places the current location in a register reserved for returning from a function call. We translate call X(E^(1), . . . , E^(n)) into code which computes the values denoted by each of the expressions E^(1), . . . , E^(n) and stores their values in temporary locations t1, . . . , tn. These computations are followed by the assembly instruction CALL X and n "argument-holding" instructions ARG t1, . . . , ARG tn, which are used as pointers to the arguments of this call of X.

We should comment that an alternative method of implementing a call is to place the values of E^(1), . . . , E^(n) directly in the locations following the instruction CALL X. A translation scheme generating this type of call is left for the Exercises.

The important productions describing the call statement are

    S → call a A

    A → e | (L)
    L → E, L | E

That is, a statement can be the keyword call followed by an identifier and
an argument list (A). The argument list can be the empty string or a paren-
thesized list (L) of expressions. We assume that each expression E has two
translations E1 and E2, where the translation of E1 is object code that leaves
the value of E in the accumulator and the translation of E2 is the name of
a temporary location for storing the value of E. The names for the tempo-
raries are created by the function NEWLABEL. The following rules perform
the desired translation:

    S → call a A     S1 = A1 ';CALL' NAME(a) A2
    A → (L)          A1 = L1
                     A2 = ';' L2
    A → e            A1 = e
                     A2 = e
    L → E, L         L1 = E1 ';STORE' E2 ';' L1
                     L2 = 'ARG' E2 ';' L2
    L → E            L1 = E1 ';STORE' E2
                     L2 = 'ARG' E2

For example, the statement call AB(E^(1), E^(2)) would, if the temporaries for E^(1) and E^(2) were $$1 and $$2, respectively, be compiled into

    code for E^(1)
    STORE $$1
    code for E^(2)
    STORE $$2
    CALL AB
    ARG $$1
    ARG $$2
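A minimal Python sketch of ours of the S → call a A rule; each argument expression is represented by a (code, temporary) pair standing for (E1, E2).

    def gen_call(name, args):
        """Code for  call name(E(1), ..., E(n))  as in Example 9.17."""
        eval_part = ''.join('%sSTORE %s\n' % (code, temp) for code, temp in args)
        arg_part = ''.join('ARG %s\n' % temp for _, temp in args)
        return eval_part + 'CALL %s\n' % name + arg_part

    print(gen_call('AB', [('code for E(1)\n', '$$1'),
                          ('code for E(2)\n', '$$2')]), end='')
    # prints the seven-line sequence shown above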

9.3.3. Inherited and Synthesized Translations

There is another generalization of the syntax-directed translation that may be useful in certain applications. We have considered translations of nonterminals which are functions only of translations at the direct descendant nodes. It is also possible that the value of a translation could be a function

of the values of the translations at its direct ancestor as well as its direct
descendants. We make the following definition.
DEFINITION

A translation of a nonterminal in a translation scheme is said to be a synthesized translation if it is a function only of translations at itself and the direct descendants of the nodes at which it is computed. The translation is an inherited translation if it is a function only of translations at itself and the direct ancestor of nodes at which it is computed. All translations considered so far have been synthesized and, in fact, have not had formulas involving translations at the same node.

We shall define translation elements associated with a production in the usual way if the translation defined is synthesized. However, the rules for inherited translations associated with a production may use as arguments the translations associated with the left-hand side of the production and compute a translation associated with some particular symbol on the right. It is thus necessary that we distinguish the nonterminal on the left from any instances of that symbol on the right. As usual, superscripts will be used for this task.

Example 9.18
Let us consider the translation rule

    A^(1) → aA^(2)A^(3)b      A1^(1) = aA2^(2)A1^(3)b
                              A2^(2) = A1^(1)
                              A2^(3) = aA1^(1)

Here A has two translations: A1, which is synthesized, and A2, which is inherited. The rule for A1 is of the type with which we are familiar. The rules for A2^(2) and A2^(3) are examples of rules for inherited attributes. Let us refer to the portion of a tree in Fig. 9.18. The rule for A2^(2) says that translation A2 at node n2 is to be made equal to the translation A1 at n1. Since A1 at n1 is defined in terms of A2 at n2, here we have a simple example of circular definition, and this rule is not acceptable as it stands. □

Fig. 9.18 Portion of a tree.



Implementation of a translation scheme with both inherited and synthesized attributes is not easy. In the most general case, one must first construct the entire parse tree and then compute at each node whatever translations can be computed in terms of the already computed translations at the descendant and ancestor nodes. Presumably, one begins with the synthesized attributes at nodes of height 1. When a translation is recomputed, all translations depending on it, whether inherited or synthesized, must be recomputed. It is thus possible that there is no end to the sequence of translations which must be recomputed. Such a translation scheme is said to be circular. It is decidable whether a translation scheme is circular, although the decision algorithm is complicated, and we shall leave it for the Exercises. We shall give one example in which for any parse tree there is a natural order in which to compute the various translations.

Example 9.19
Let us compile code for arithmetic expressions to be executed on a machine with two fast registers, denoted the A and B registers. The relevant instructions are

    LOADA α
    LOADB α
    STOREA α
    STOREB α
    ADDA α
    ADDB α
    MPY α
    ATOB

The meaning of the first six instructions should be obvious. We can load, store, or perform addition in either register. We presume that the MPY instruction takes its left argument in the B register and leaves the result in the A register, as is the case for floating-point arithmetic on some computers. The last instruction, ATOB, with no argument, transfers the contents of the A register to the B register.

We shall build our translation on G0, exactly as we did in Example 9.13. The translations E1, T1, and F1 will represent code to compute the value of the associated input expression, sometimes leaving the result in the A register and sometimes in B. However, the code for E1 at the root of the parse tree for the input expression will always leave the value of the expression in the A register. E2, T2, and F2 are integers which measure the height of the node, as in Example 9.13. There will be translations E3, T3, and F3 which are Boolean and have the value true if and only if the value of the expression is

to be left in the A register. These last three translations are all inherited,
while the first six are synthesized. We dispense with the code-improving
feature of Example 9.14. The rules of the translation scheme are

    E^(1) → E^(2) + T    E1^(1) = if E3^(1) then
                                    T1 ';STOREA $' E2^(2) ';' E1^(2) ';ADDA $' E2^(2)
                                  else
                                    T1 ';STOREA $' E2^(2) ';' E1^(2) ';ADDB $' E2^(2)
                         E2^(1) = max(E2^(2), T2) + 1
                         E3^(2) = E3^(1)†
                         T3 = true
    E → T                E1 = T1
                         E2 = T2
                         T3 = E3
    T^(1) → T^(2) * F    T1^(1) = if T3^(1) then
                                    F1 ';STOREA $' T2^(2) ';' T1^(2) ';MPY $' T2^(2)
                                  else
                                    F1 ';STOREA $' T2^(2) ';' T1^(2) ';MPY $' T2^(2) ';ATOB'
                         T2^(1) = max(T2^(2), F2) + 1
                         T3^(2) = false
                         F3 = true
    T → F                T1 = F1
                         T2 = F2
                         F3 = T3
    F → (E)              F1 = E1
                         F2 = E2
                         E3 = F3
    F → a                F1 = if F3 then 'LOADA' NAME(a)
                              else 'LOADB' NAME(a)
                         F2 = 1

†We assume that all Boolean translations initially have the value true. Thus, if E3^(1) refers to the root, it is already defined.

The strategy is to compute all right-hand operands in register A. The left-hand operand of * is always computed in B, and the left-hand operand of + is computed in either A or B, depending on where the value of the complete expression is desired. Thus, the translation element for E1^(1) associated with production E → E + T gives a translation which computes T in register A, stores it in a safe temporary, then computes E in either A or B, whichever is desired, and performs the addition. The rule for T1^(1) associated with production T → T * F computes F in register A, stores it, computes T in register B, multiplies, and, if desired, transfers the result to register B.

The parse tree of (a + a) * a is shown in Fig. 9.19, with some interior

[Fig. 9.19  Parse tree for (a + a) * a, with some interior nodes named n1 through n5. The tree diagram is not reproduced here.]

nodes named. Inherited and synthesized attributes propagate unchanged up and down chains E → T → F. Here we describe one sequence of computation of the translations, omitting the propagation of translations from E to T to F and conversely:

    Translation    At Node    Value

    T3             n5         true
    E3             n3         false
    F3             n1         false
    F3             n2         true
    F1             n2         LOADA a2
    F2             n2         1
    F1             n1         LOADB a1
    F2             n1         1
    E1             n3         LOADA a2; STOREA $1; LOADB a1; ADDB $1
    E2             n3         2
    F3             n4         true
    F1             n4         LOADA a3
    F2             n4         1
    T1             n5         LOADA a3; STOREA $2; LOADA a2; STOREA $1; LOADB a1; ADDB $1; MPY $2
    T2             n5         3

□
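Because the register choice is inherited, a recursive implementation naturally passes it downward as an argument. In the Python sketch below (ours, not from the text), the parameter in_a plays the role of the inherited E3/T3/F3 and the function returns the synthesized pair (code, height).

    def gen2(node, in_a=True):
        """Two-register code generation in the style of Example 9.19.

        node is ('+', l, r), ('*', l, r), or ('id', name); in_a is the inherited
        Boolean saying whether the value is to be left in the A register.
        """
        if node[0] == 'id':
            return ('LOADA ' if in_a else 'LOADB ') + node[1], 1
        op, left, right = node
        rcode, rh = gen2(right, True)          # right-hand operands go to register A
        if op == '+':
            lcode, lh = gen2(left, in_a)       # left operand of + follows in_a
            code = '%s; STOREA $%d; %s; %s $%d' % (
                rcode, lh, lcode, 'ADDA' if in_a else 'ADDB', lh)
        else:
            lcode, lh = gen2(left, False)      # left operand of * goes to register B
            code = '%s; STOREA $%d; %s; MPY $%d' % (rcode, lh, lcode, lh)
            if not in_a:
                code += '; ATOB'               # move the product to B if required
        return code, max(lh, rh) + 1

    print(gen2(('*', ('+', ('id', 'a1'), ('id', 'a2')), ('id', 'a3')))[0])
    # LOADA a3; STOREA $2; LOADA a2; STOREA $1; LOADB a1; ADDB $1; MPY $2

The printed string is exactly the value of T1 at node n5 in the table above.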
9.3.4. A Word About Timing

We have pictured a compiler as though the three steps of lexical analysis, syntactic analysis, and code generation were done one at a time, in that order.
However, this time division is only for representational purposes and may
not occur in practice.
First, the lexical analysis phase normally produces tokens only as they
are needed by the parser. The input string of tokens, which we have shown
when demonstrating the action of parsers, does not necessarily exist in
reality. The tokens are found only when the parser is about to look at them.
Of course, this difference does not affect the action of the parser in any way.
As we have already indicated, the parsing and code generation phases
may occur simultaneously. Thus, the three phases can operate in lock-step.
When the parser cannot parse further, it gets another token from the lexical
analyzer. After each reduction (if bottom-up) or nonterminal expansion (if
top-down) by the parser, the code generation phase operates on the node or
nodes of the parse tree that have just been produced.
If the translation being produced were one string, there would be little
concern about when different pieces of the translation were produced.
However, we recall that the various pieces of translation being produced
may be of several types: for example, object code, diagnostics, and instruc-
tions to be executed by the compiler itself, such as instructions to enter infor-
mation into the bookkeeping mechanism.

Should the method of translation be a pushdown transducer, or a similar device, where a single stream of output emerges from the device, we again
have little problem with the timing of translations. Symbols are deemed out-
put as they emerge from the device. If we assume that different kinds of
output are differentiated by metasymbols, then the output stream can be
divided as it emerges. For example, intermediate code is passed to the opti-
mization phase, diagnostics are placed on their own list to await printing,
and bookkeeping instructions are executed immediately.
Let us suppose that one of the more general versions of Algorithms 9.1-
9.4 is being used to perform some generalized syntax-directed translation.
We could wait until the entire tree or dag is constructed and then construct
a single output stream. The division of the stream into object code and
instructions would occur exactly as for a pushdown transducer. This means
that bookkeeping instructions would not be executed until the entire output
was constructed and that the instructions are reached as the output is pro-
cessed. This arrangement requires an extra pass over the output and a large
random access memory but may be the most practical arrangement if the
power of the more general syntax-directed translation schemes is needed.
Alternatively, one could adopt the convention that bookkeeping and
other compiler instructions are associated with particular nodes of the parse
tree and are executed just as soon as that node is constructed. However, if
the parsing algorithm involves backtrack, one must be careful not to execute
an instruction associated with a node which is subsequently found not to be
part of the parse. In such a situation, a mechanism to negate the effect of
such an instruction is needed.

EXERCISES

9.3.1. Find GSDTS's for the following translations:


(a) {(a^n, a^(n^3)) | n ≥ 1}.
(b) {(a^n, a^(2^n)) | n ≥ 1}.
(c) {(w, ww) | w ∈ (0 + 1)*}.
9.3.2. Show that there exist GSDTS definable translations that are not SDT's.
**9.3.3. Let T be a semantically unambiguous GSDTS with infinite domain
whose underlying CFG is proper. Show that one of the following three
conditions must hold:
(1) There exist constants c1 and c2 greater than 1 such that if (x, y) ∈ τ(T), then |y| ≤ c1^|x|, and for an infinity of x, there exists (x, y) ∈ τ(T) with |y| ≥ c2^|x|.
(2) There exist constants c1 and c2 greater than 0 and an integer i ≥ 1 such that if (x, y) ∈ τ(T) and x ≠ e, then |y| ≤ c2|x|^i, and for an infinity of x, there exists (x, y) ∈ τ(T) with |y| ≥ c1|x|^i.
(3) The range of T is finite.

9.3.4. Show that the translation {(a^n, a^m) | m is the integer part of √n} cannot be defined by any GSDTS.
9.3.5. For Exercise 9.3.1(a)-(c), give the dags produced by Algorithm 9.4
with inputs a^3, a^4, and 011, respectively.
9.3.6. Give the sequence of nodes visited by Algorithm 9.5 when applied to
the three dags of Exercise 9.3.5.
*9.3.7. Embellish Example 9.16 to include the production S → while B do S,
with the intended meaning that alternately expression B is to be tested,
and then the statement S done, until such time as the value of B becomes
false.
9.3.8. The following grammar generates PL/I-like declarations:

D → (L)M
L → a, L | D, L | a | D
M → m1 | m2 | ··· | mk

The terminal a stands for any identifier; m1, ..., mk are the k possible


attributes of identifiers in the language. Comma and parentheses are
also terminal symbols. L stands for a list of identifiers and declarations;
D is a declaration, which consists of a parenthesized list of identifiers
and declarations. The intention is that the attribute derived from M
is to apply to all those identifiers generated by L in the production
D → (L)M even if the identifier is within a nested declaration. For
example, the declaration (a1, (a2, a3)m1)m2 assigns attribute m1 to a2
and a3 and attribute m2 to a1, a2, and a3. Build a translation scheme,
involving any type of translation, that will translate strings generated
by D into k lists; list i is to hold exactly those identifiers which are
given attribute mi.
9.3.9. Modify Example 9.17 to place values of the expressions, rather than
pointers to those values, in the ARG instructions.
9.3.10. Consider the following translation scheme:

N → L^(1) . L^(2)        N1 = L1^(1) + L1^(2)/2^(L2^(2))
N → L                    N1 = L1
L → LB                   L1 = 2L1 + B1
                         L2 = L2 + 1
L → B                    L1 = B1
                         L2 = 1
B → 0                    B1 = 0
B → 1                    B1 = 1

In the underlying grammar, N is the start symbol and derives binary
numbers (possibly with a binary point). L stands for a list of bits and
B for bit. The translation elements are arithmetic formulas. The trans-
lation element N1 represents a rational number, the value of the binary
number derived by the nonterminal N. The translation elements L1, L2,
and B1 take integer values. For example, 11.01 has the translation 3¼.
Show that τ(T) = {(b, d) | b is a binary number and d is the value of b}.
"9.3.11. Consider the following translation scheme with the same underlying
grammar as in Exercise 9.3.10 but involving both synthesized and
inherited attributes:

N → L^(1) . L^(2)        N1 = L1^(1) + L1^(2)
                         L2^(1) = 0
                         L2^(2) = −L3^(2)
N → L                    N1 = L1
                         L2 = 0
L^(1) → L^(2) B          L1^(1) = L1^(2) + B1
                         B2 = L2^(1)
                         L2^(2) = L2^(1) + 1
                         L3^(1) = L3^(2) + 1
L → B                    L1 = B1
                         B2 = L2
                         L3 = 1
B → 0                    B1 = 0
B → 1                    B1 = 2^(B2)

The parse tree for 11.01 together with the values of the translation
elements associated with each node is shown in Fig. 9.20. Note that
to compute the translation element N1 we must first compute the L3's
to the right of the radix point bottom-up, then the L2's top-down, and
finally the L1's bottom-up. Show that this translation scheme defines
the same translation as the scheme in Exercise 9.3.10.
"9.3.12. Show that any translation that can be performed using inherited and
synthesized translations can be performed using synthesized translations
only. H i n t : There is no restriction on the structure of a translation.
Thus, one translation defined at a node can be the entire subtree that
it dominates.
9.3.13. Can every translation using synthesized translations be performed using
inherited translations only ?
*'9.3.14. Give an algorithm to test whether a given translation scheme involving
inherited and synthesized attributes is circular.

Fig. 9.20 Parse tree with translations.

9.3.15. Modify Example 9.18 to incorporate the code-improving feature of


Example 9.14.
"9.3.16. The differentiation algorithm of Example 9.11 allowed the generation
of expressions such as 1 • cos(x) or 0 • x + (1) • 1 (the formal derivative
of 1 • x). We can detect and eliminat~i expressions which are explicitly
0 or I, where the definition of explicit is as follows:
(1) 0 is explicitly 0; 1 is explicitly 1.
(2) If E1 is explicitly 0 and E2 is any expression, then E1 • E2 and
Ez • E1 are explicitly 0.
(3) If E1 and Ez are explicitly 1, then El • Ez is explicitly 1.
(4) If E1 is explicitly 0 and Ez is explicitly 1, then E1 + E2 and
Ez + E~ are explicitly 1.
(5) If E~ and Ez are explicitly 0, then E~ ÷ Ez is explicitly 0.
Modify the GSDTS, including the underlying grammar, if necessary,
so that no subexpressions which are explicitly 0 appear in the trans-
lation, and no explicit 1 appears as a multiplicative factor.
9.3.17. Consider the following grammar for assignment statements involving
subscripted variables:

A → I := E
E → E <adop> T | T
T → T <mulop> F | F
F → (E) | I
I → a | a(L)
L → E, L | E
<adop> → + | −
<mulop> → * | /

An example of a statement generated by this grammar would be

a(a, a) := a(a + a, a * (a + a)) + a

Here, a is a token representing an identifier. Construct a translation


scheme based on this grammar that will generate suitable assembly or
multiple address code for assignment statements.
9.3.18. Show that the GSDTS of Example 9.11 correctly produces an expression
for the derivative of its input expression.
• '9.3.19. Show that

{(a1 ··· an b1 ··· bn, a1b1 ··· anbn) | n ≥ 1, ai ∈ {0, 1} and bi ∈ {2, 3} for 1 ≤ i ≤ n}

is not definable by a GSDTS.

Research Problem
9.3.20. Translation of arithmetic expressions can become quite complicated if
operands can be of many different data types. For example, we could
be dealing with identifiers that could be Boolean, string, integer, real,
or complex--the last three in single, double, or perhaps higher preci-
sions. Moreover, some identifiers could be in dynamically allocated
storage, while others are statically allocated. The number of combina-
tions can easily be large enough to make the translation elements asso-
ciated with a production such as E ~ E + T quite cumbersome. Given
a translation which spells out the desired code for each case, can you
develop an automatic way of simplifying the notation ? For instance,
in Example 9.19, the then and else portions of the translation of E~ 1~
differ only in the single character A or B at the end of ADD. Thus,
almost a factor of 2 in space could be saved if a more versatile defining
mechanism were used.

Programming Exercise
"9.3.21. Construct a translation scheme that maps a subset of F O R T R A N into
intermediate language as in Exercise 9.1.10. Write a program to imple-
ment this translation scheme. Implement the code generator designed
in Exercise 9.1.11. Combine these programs with a lexical analyzer to
produce a compiler for the subset of FORTRAN. Design test strings


that will check the correct operation of each program.

BIBLIOGRAPHIC NOTES

Generalized syntax-directed translation schemes are discussed by Aho and


Ullman [1971], and a solution to Exercise 9.3.3 can be found there. Knuth [1968b]
defined translation schemes with inherited and synthesized attributes. The GSDT's
in Exercises 9.3.10 and 9.3.11 are discussed there.
The solution to Exercise 9.3.14 is found in Bruno and Burkhard [1970] and
Knuth [1968b]. Exercise 9.3.12 is from Knuth [1968b].
10 BOOKKEEPING

This chapter discusses methods by which information can be quickly


stored in tables and accessed from these tables. The primary application of
these techniques is the storage of information about tokens during lexical
analysis and the retrieval of this information during code generation. We shall
discuss two ideas--simple information retrieval techniques and the formalism
of property grammars. The latter is a method of associating attributes and
identifiers while ensuring that the proper information about each identifier
is available at each node of the parse tree for translation purposes.

10.1. S Y M B O L TABLES

The term symbol table is given to a table which stores names and infor-
mation associated with these names. Symbol tables are an integral feature of
virtually all compilers. A symbol table is pictured in Fig. 10.1. The entries
in the name field are usually identifiers. If names can be of different lengths,
then it is more convenient for the entry in the name field to be a pointer to
a storage area in which the names are actually stored.
The entries in the data field, sometimes called descriptors, provide infor-
mation that has been collected about each name. In some situations a dozen
or more pieces of information are associated with a given name. For example,
we might need to know the data type (real, integer, string, and so forth) of
an identifier; whether it was perhaps a label, a procedure name, or a formal
parameter of a procedure; whether it was to be given statically or dynamically
allocated storage; or whether it was an identifier with structure (e.g., an
array), and if so, what the structure was (e.g., the dimensions of an array).
If the number of pieces of information associated with a given name is vari-


NAME      DATA

I         INTEGER
LOOP      LABEL
          REAL ARRAY

Fig. 10.1 Symbol table.

able, then it is again convenient to store a pointer in the data field to this
information.

10.1.1. Storage of Information About Tokens

A compiler uses a symbol table to store information about tokens, par-


ticularly identifiers. This information is then used for two purposes. First,
it is used to check for the semantic correctness of a source program. For
example, if a statement of the form

GOTO LOOP

is found in the source program, then the compiler must check that the identi-
fier LOOP appears as the label of an appropriate statement in the program.
This information will be found in the symbol table (although not necessarily
at the time at which the goto statement is processed). The second use of
the information in the symbol table is in generating code. For example, if
we have a FORTRAN statement of the form

A = B + C

in the source program, then the code that is generated for the operator +
depends on the attributes of identifiers B and C (e.g., are they fixed- or
floating-point, in or out of "common," and so forth).
The lexical analyzer enters names and information into the symbol table.
For example, whenever the lexical analyzer discovers an identifier, it consults
the symbol table to see whether this token has previously been used. If not,
the lexical analyzer inserts the name of this identifier into the symbol table
along with any associated information. If the identifier is already present in
the symbol table at some location l, then the lexical analyzer produces the
token (<identifier>, l) as output.
Thus, every time the lexical analyzer finds a token, it consults the symbol
table. Therefore, to design an efficient compiler, we must, given an instance
of an identifier, be able to rapidly determine whether or not a location in
the symbol table has been reserved for that identifier. If no such entry exists,
we must then be able to insert the identifier quickly into the table.
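As a concrete illustration (a minimal sketch, not from the text; the class and method names are our own), the lookup-or-insert operation just described might look as follows, with a simple linear scan standing in for whatever storage mechanism is actually used:

# A minimal sketch of the symbol-table operation performed by the lexical
# analyzer each time it meets an identifier: look the name up, insert it if it
# is new, and return the token (<identifier>, l), where l is its location.
class SymbolTable:
    def __init__(self):
        self.names = []                      # name field, indexed by location l
        self.data = []                       # data field (descriptor) for each location

    def lookup_or_insert(self, name, descriptor=None):
        for l, existing in enumerate(self.names):
            if existing == name:
                return l                     # identifier previously encountered
        self.names.append(name)              # new identifier: reserve a location
        self.data.append(descriptor or {})
        return len(self.names) - 1

table = SymbolTable()
loc = table.lookup_or_insert("LOOP", {"kind": "label"})
print(("<identifier>", loc))                 # the token passed on to the parser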

Example 10.1
Let us suppose that we are compiling a FORTRAN-like language and
wish to use a single token type (identifier) for all variable names. When
the (direct) lexical analyzer first encounters an identifier, it could enter into
a symbol table information as to whether this identifier was fixed- or floating-
point. The lexical analyzer obtains the information by observing the first
letter of the identifier. Of course, a previous declaration of the identifier to be
a function or subroutine or not to obey the usual fixed-floating convention
would already appear in the symbol table and would overrule the attempt
by the lexical analyzer to store its information. [~

Example 10.2
Let us suppose that we are compiling a language in which array declara-
tions are defined by the following productions:

<array statement> → array <array list>
<array list> → <array definition>, <array list> | <array definition>
<array definition> → <identifier> ( <integer> )

An example of an array declaration in this language is

(10.1.1) array AB(10), CD(20)

For simplicity, we are assuming that arrays are one-dimensional here.


The parse tree for statement (10.1.1) is shown in Fig. 10.2, treating AB and
CD as identifiers and 10 and 20 as integers.


Fig. 10.2 Parse tree of array statement.

In this parse tree the tokens <identifier>1, <identifier>2, <integer>1, and
<integer>2 represent AB, CD, 10, and 20, respectively.
The array statement is, naturally, nonexecutable; it is not compiled into
machine code. However, it makes sense to think of its translation as a se-
quence of bookkeeping steps to be executed immediately by the compiler.
That is, if the translation of an identifier is a pointer to the place in the symbol
table reserved for it and the translation of an integer is itself, then the syntax-
directed translation of a node labeled (array definition) can be instructions
for the bookkeeping mechanism to record that the identifier is an array
and that its size is measured by the integer. [~]
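To make the idea concrete, here is a hypothetical sketch (the function and dictionary names are ours, not the book's) of the bookkeeping instruction that could serve as the translation of an <array definition> node:

# The translation of <array definition> -> <identifier> ( <integer> ) is taken
# to be an instruction, executed at once by the compiler, that records in the
# symbol table that the identifier is an array whose size is the integer.
symbol_table = {"AB": {}, "CD": {}}          # descriptors entered by the lexical analyzer

def array_definition_action(symtab, name, size):
    symtab[name]["kind"] = "array"           # the identifier is an array
    symtab[name]["size"] = size              # its size is measured by the integer

# Processing "array AB(10), CD(20)" executes two such bookkeeping steps:
array_definition_action(symbol_table, "AB", 10)
array_definition_action(symbol_table, "CD", 20)
print(symbol_table)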

The ways in which the information stored in the symbol table is used are
numerous. As a simple example, every subexpression in an arithmetic expres-
sion may need mode information, so that the arithmetic operators can be
interpreted as fixed, floating, complex, and so forth. This information is
collected from the symbol table for those leaves which have identifier or
constant labels and are passed up the tree by rules such as fixed + floating =
floating, and floating + complex = complex. Alternatively, the language, and
hence the compiler, may prohibit mixed mode expressions altogether (e.g.,
as in some versions of FORTRAN).

10.1.2. Storage Mechanisms

We conclude that there is a need in a compiler for a bookkeeping method


which rapidly stores and retrieves information about a large number of differ-
ent items (e.g., identifiers). Moreover, while the number of items that actually
occur is large, say on the order of 100 to 1000 for a typical program, the
number of possible items is orders of magnitude larger; most of the possible
identifiers do not appear in a given program.
Let us briefly consider possible ways of storing information in tables in
order to better motivate the use of the hash or scatter storage tables, to be
discussed in the next section.
Our basic problem can be formulated as follows. We have a large collec-
tion of possible items that may occur. Here, an item can be considered to
be a name consisting of a string of symbols. We encounter items in an un-
predictable fashion, and the exact number of items to be encountered is
unknown in advance. As each item is encountered, we wish to check a table
to determine whether that item has been previously encountered and if it
has not, to enter the name of the item into the table. In addition, there will
be other information about items which we wish to store in the table.
We might initially consider using a direct access table to store information
about items. In such a table a distinct location is reserved for each possible
item. Information concerning the item would be stored at that location, and
the name of the item need not be entered in the table. If the number of pos-
sible items is small and a unique location can readily be assigned to each item,
then the direct access table provides a very fast mechanism for storing and
retrieving information about items. However, we would quickly discard the
idea of using a direct access table for most symbol table applications, since
the size of the table would be prohibitive and most of it would never be used.
For example, the number of FORTRAN identifiers (a letter followed by up
to five letters or digits) is about 1.33 × 10^9.
Another possible method of storage is to use a pushdown list. If a new
item is encountered, its name and a pointer to information concerning that
item is pushed onto the pushdown list. Here, the size of the table is propor-
tional to the number of items actually encountered, and new items can be
inserted very quickly. However, the retrieval of information about an item
requires that we search the list until the item is found. Thus, retrieval on
the average requires time proportional to the number of items on the list.
This technique is often adequate for small lists. In addition, it has advantages
when a block-structured language is being compiled, as a new declaration
of a variable can be pushed on top of an old one. When the block ends, all
its declarations are popped off the list and the old declarations of the vari-
ables are still there.
A third method, which is faster than the pushdown list, is to use a binary
search tree. In a binary search tree each node can have a left direct descendant
and a right direct descendant. We assume that data items can be linearly
ordered by some relation < , e.g., the relation "precedes in alphabetical
order." Items are stored as the labels of the nodes of the tree. When the first
item, say α1, is encountered, a root is created and labeled α1. If α2 is the next
item and α2 < α1, then a leaf labeled α2 is added to the tree and this leaf is
made the left direct descendant of the root. (If α1 < α2, then this leaf would
have been made the right direct descendant.) Each new item causes a new
leaf to be added to the tree in such a position that at all times the tree will
have the following property. Suppose that N is any node in the tree and that
N is labeled β. If node N has a left subtree containing a node labeled α, then
α < β. If node N has a right subtree with a node labeled γ, then β < γ.


The following algorithm can be used to insert items into a binary search
tree.
ALGORITHM 10.1
Insertion of items into a binary search tree.
Input. A sequence α1, ..., αn of items from a set of items A with a linear
order < on A.
Output. A binary tree whose nodes are each labeled by one of α1, ..., αn,
with the property that if node N is labeled α and some descendant N' of N
is labeled β, then β < α if and only if N' is in the left subtree of N.
Method.
(1) Create a single node (the root) and label it α1.
(2) Suppose that α1, ..., α(i−1) have been placed in the tree, i > 0. If
i = n + 1, halt. Otherwise, insert αi in the tree by executing step (3) begin-
ning at the root.
(3) Let this step be executed at node N with label β.
(a) If αi < β and N has a left direct descendant, Nl, execute step (3)
at Nl. If N has no left direct descendant, create such a node and
label it αi. Return to step (2).
(b) If β < αi and N has a right direct descendant, Nr, execute step (3)
at Nr. If N has no right direct descendant, create such a node
and label it αi. Return to step (2).

The method of retrieval of items is essentially the same as the method for
insertion, except that one must check at each node encountered in step (3)
whether the label of that node is the desired item.

Example 10.3
Let the sequence of items input to Algorithm 10.1 be XY, M, QB, ACE,
and OP. We assume that the ordering is alphabetic. The tree constructed is
shown in Fig. 10.3.
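The following is a small Python rendering of Algorithm 10.1 (a sketch; the class and function names are ours). Run on the items of Example 10.3 it builds the tree of Fig. 10.3:

# Binary-search-tree insertion as in Algorithm 10.1: walk down from the root,
# going left when the new item precedes the node's label and right otherwise,
# and attach a new leaf where no descendant exists.
class Node:
    def __init__(self, item):
        self.item = item
        self.left = None
        self.right = None

def insert(root, item):
    if root is None:
        return Node(item)                    # step (1): create the root
    node = root
    while True:                              # step (3), iterated
        if item < node.item:
            if node.left is None:
                node.left = Node(item)       # new left direct descendant
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(item)      # new right direct descendant
                return root
            node = node.right

root = None
for item in ["XY", "M", "QB", "ACE", "OP"]:  # the items of Example 10.3
    root = insert(root, item)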

It can be shown that, after n items have been placed in a binary search
tree, the expected number of nodes which must be searched to retrieve one
of them is proportional to log n. This cost is acceptable, although hash
tables, which we shall discuss next, give a faster expected retrieval time.

10.1.3. Hash Tables

The most efficient and commonly used method for the bookkeeping neces-
sary in a compiler is the hash table. A hash storage symbol table is shown
schematically in Fig. 10.4.

Fig. 10.3 Binary search tree.


Fig. 10.4 Hash storage scheme.

The hash storage mechanism uses a hashing function h, a hash table,
and a data storage table. The hash table has n entries, where n is fixed before-
hand. Each entry in the hash table consists of two fields--a name field and
a pointer field. Initially, each entry in the hash table is assumed to be empty.†
If an item α has been encountered, then some location in the hash table,
usually h(α), contains an entry whose name field contains α (or possibly
a pointer to a location in a name table in which α is stored) and whose pointer
field holds a pointer to a block in the data storage table containing the infor-
mation associated with α.

†Sometimes it is convenient to put the reserved words and standard functions in the symbol table at the start.
The data storage table need not be physically distinct from the hash
table. For example, if k words of information are needed for each item,
then it is possible to use a hash table of size kn. Each item stored in the hash
table would occupy a block of k consecutive words of storage. The appro-
priate location in the hash table for an item α can then readily be found by
multiplying h(α), the hash address for α, by k and using the resulting address
as the location of the first word in the block of words for α.
The hashing function h is actually a list of functions h0, h1, ..., hm, each
from the set of items to the set of integers {0, 1, ..., n − 1}. We shall call
h0 the primary hashing function. When a new item α is encountered, we can
use the following algorithm to compute h(α), the hash address of α. If α has
been previously encountered, h(α) is the location in the hash table at which
α is stored. If α has not been encountered, then h(α) is an empty location
into which α can be stored.
ALGORITHM 10.2
Computation of a hash table address.
Input. An item α, a hashing function h consisting of a sequence of functions
h0, h1, ..., hm, each from the set of items to the set of integers {0, 1, ..., n − 1},
and a (not necessarily empty) hash table with n locations.
Output. The hash address h(α) and an indication of whether α has been
previously encountered. If α has already been entered into the hash table,
h(α) is the location assigned to α. If α has not been encountered, h(α) is
an empty location into which α is to be stored.
Method.
(1) We compute h0(α), h1(α), ..., hm(α) in order using step (2) until no
"collision" occurs. If hm(α) produces a collision, we terminate this algorithm
with a failure indication.
(2) Compute hi(α) and do the following:
(a) If location hi(α) in the hash table is empty, let h(α) = hi(α),
report that α has not been encountered, and halt.
(b) If location hi(α) is not empty, check the name entry of this
location.† If the name is α, let h(α) = hi(α), report that α has already
been entered, and halt. If the name is not α, a collision occurs
and we repeat step (2) to compute the next alternate address.

†If the name entry contains a pointer to a name table, we need to consult the name table to determine the actual name.

Each time a location hi(α) is examined, we say that a probe of the table
is made.
When the hash table is sparsely filled, collisions are rare, and for a new
item α, h(α) can be computed very quickly, usually just by evaluating the
primary hashing function h0(α). However, as the table fills, it becomes
increasingly likely that for each new item α, h0(α) will already contain another
item. Thus, collisions become more frequent as more items are inserted into
the table, and thus the number of probes required to determine h(α) increases.
However, it is possible to design hash tables whose overall performance is
much superior to binary search trees.
Ideally, for each distinct item encountered we would like the primary
hashing function h 0 to yield a distinct location in the hash table. This, of
course, is not generally feasible because the total number of possible items
is usually much larger than n, the number of locations in the table. In
practice, n will be somewhat larger than the number of distinct items ex-
pected. However, some course of action must be planned in case the table
overflows.
To store information about an item α, we first compute h(α). If α has not
been previously encountered, we store the name α in the name field of loca-
tion h(α). [If we are using a separate name table, we store α in the next empty
location in the name table and put a pointer to this location in the name
field of location h(α).] Then we seize the next available block of storage in
the data storage table and put a pointer to this block in the pointer field of
location h(α). We can then insert the information in this block of the data
storage table.
Likewise, to fetch information about an item α, we can compute h(α),
if it exists, by Algorithm 10.2. We can then use the pointer in the pointer field
to locate the information in the data storage table associated with item α.

Example 10.4
Let us choose n = 10 and let an item consist of any string of capital
Roman letters. We define CODE(α), where α is a string of letters, to be the
sum of the "numerical value" of each letter in α, where A has numerical
value 1, B has value 2, and so forth. Let us define hj(α), for 0 ≤ j ≤ 9,
to be (CODE(α) + j) mod 10.† Let us insert the items A, W, and EF into
the hash table.
We find that h0(A) = 1 mod 10 = 1, so A is assigned to location 1.
Next, W is assigned to location h0(W) = 23 mod 10 = 3. Then, EF
is encountered. We find h0(EF) = (5 + 6) mod 10 = 1. Since location 1 is
already occupied, we try h1(EF) = (5 + 6 + 1) mod 10 = 2. Thus, EF is
given location 2. The hash storage contents are now as depicted in Fig. 10.5.

†a mod b is the remainder when a is divided by b.

Location   Name    Pointer
0          —       —
1          A       → data for A
2          EF      → data for EF
3          W       → data for W
4–9        —       —

Fig. 10.5 Hash table contents.

Now, suppose that we wish to determine whether item HX is in the table.


We find h0(HX) = 2. By Algorithm 10.2, we examine location 2 and find
that a collision occurs, since location 2 is filled but not with HX. We then
examine location h1(HX) = 3 and have another collision. Finally, we com-
pute h2(HX) = 4 and find that location 4 is empty. Thus, we can conclude
that the item HX is not in the table.
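The hashing system of Example 10.4 is small enough to write out in full. The following sketch (function names are ours, not the book's) reproduces the insertions and the unsuccessful search for HX:

# CODE(alpha) is the sum of letter values (A = 1, ..., Z = 26); the probe
# sequence is h_j(alpha) = (CODE(alpha) + j) mod 10 for j = 0, 1, ..., 9.
def code(item):
    return sum(ord(c) - ord('A') + 1 for c in item)

def probes(item, n=10):
    return [(code(item) + j) % n for j in range(n)]

def insert(table, item):
    """Follow Algorithm 10.2 and return the location given to the item."""
    for loc in probes(item, len(table)):
        if table[loc] is None or table[loc] == item:
            table[loc] = item
            return loc
        # otherwise a collision occurred; try the next alternate address
    raise RuntimeError("table is full")

table = [None] * 10
print([insert(table, x) for x in ["A", "W", "EF"]])            # [1, 3, 2]
# HX is absent: its probes 2 and 3 collide, and location 4 is empty.
print([loc for loc in probes("HX") if table[loc] is None][0])  # 4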

10.1.4. Hashing Functions

It is desirable to use a primary hashing function h0 that scatters items


uniformly throughout the hash table. Functions which do not map onto
the entire range of locations or tend to favor certain locations, as well as
those which are expensive to compute, should be avoided.
Some commonly used primary hashing functions are the following.


(1) If an item α is spread across several computer words, we can numeric-
ally sum these words (or exclusive-or these words) to obtain a single word
representation of α. We can then treat this word as a number, square it, and
use the middle log2 n bits of the result as h0(α). Since the middle bits of
the square depend on all symbols in the item, distinct items are likely to
yield different addresses regardless of whether the items share common
prefixes or suffixes.
(2) We can partition the single word representation of α into sections of
some fixed length (log2 n bits is common). These sections can then be sum-
med, and the log2 n low-order bits of the sum determine h0(α).
(3) We can divide the single word representation of α by the table size
n and use the remainder as h0(α). (n should be a prime here.)
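Each of these three methods is easy to state in code. The sketch below (ours) assumes the item has already been reduced to a single 32-bit word and, for methods (1) and (2), that n is a power of two; for method (3), n should be prime:

def mid_square(word, n):
    # (1) square the one-word representation and take the middle log2(n) bits
    bits = n.bit_length() - 1
    return (word * word >> (32 - bits // 2)) & (n - 1)

def folding(word, n):
    # (2) partition the word into log2(n)-bit sections, sum them, and keep
    #     the low-order log2(n) bits of the sum
    bits, total = n.bit_length() - 1, 0
    while word:
        total += word & (n - 1)
        word >>= bits
    return total & (n - 1)

def division(word, n):
    # (3) divide by the table size and use the remainder
    return word % n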
Let us now consider methods of resolving collisions, that is, the design
of the alternate functions h1, h2, ..., hm. First we note that hi(α) should be
different from hj(α) for all i ≠ j. If hi(α) produces a collision, it would be
senseless to then probe h_{i+r}(α) if h_{i+r}(α) = hi(α). Also, in most cases we want
m = n -- 1, because we always want to find an empty slot in the table if one
exists. In general the method used to resolve collisions can have a significant
effect on the overall efficiency of a scatter store system.
The simplest, but as we shall see one of the worst, choices for the func-
tions h1, h2, ..., h_{n−1} is to use

hi(α) = [h0(α) + i] mod n    for 1 ≤ i ≤ n − 1.

Here we search forward from the primary location h0(α) until no collision
occurs. If we reach location n − 1, we proceed to location 0. This method
is simple to implement, but clusters tend to occur once several collisions are
encountered. For example, given that h0(α) produces a collision, the prob-
ability that h1(α) will also produce a collision is greater than average.
A more efficient method of generating alternate addresses is to use

hi(α) = (h0(α) + ri) mod n    for 1 ≤ i ≤ n − 1

where ri is a pseudorandom number. The most rudimentary random number
generator that generates every integer between 1 and n − 1 exactly once
will often suffice. (See Exercise 10.1.8.) Each time the alternate functions are
used, the random number generator is initialized to the same point. Thus, the
same sequence r1, r2, ... is generated each time h is invoked, and the "random"
number generator is quite predictable.
Other possibilities for the alternate functions are

hi(α) = [i(h0(α) + 1)] mod n



and

hi(α) = [h0(α) + ai² + bi] mod n,

where a and b are suitably chosen constants.
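For comparison, the collision-resolution schemes above can be sketched as probe-sequence generators (a sketch with our own function names; r stands for a fixed pseudorandom sequence r1, ..., r_{n−1}):

def linear(h0, n):
    # h_i = (h0 + i) mod n: search forward from the primary location
    return [(h0 + i) % n for i in range(n)]

def pseudorandom(h0, n, r):
    # h_i = (h0 + r_i) mod n, where r contains each of 1, ..., n-1 exactly once
    return [h0] + [(h0 + ri) % n for ri in r]

def quadratic(h0, n, a=1, b=1):
    # h_i = (h0 + a*i*i + b*i) mod n for suitably chosen constants a and b
    return [(h0 + a * i * i + b * i) % n for i in range(n)]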
A somewhat different method of resolving collisions, called chaining, is
discussed in the Exercises.

10.1.5. Hash Table Efficiency

We shall now address ourselves to the question, "How long, on the


average, does it take to insert or retrieve data from a hash table ?" A com-
panion question is, "Given a set of probabilities for the various items, how
should the functions h0, ..., h_{n−1} be selected to optimize the performance of
the hash table ?" Interestingly, there are a number of open questions in this
area.
As we have noted, it is foolish to have duplications in the sequence of
locations h0(α), ..., h_{n−1}(α) for any α. We shall thus assume that the best
system of hashing functions avoids duplication. For example, the sequence
h0, ..., h9 of Example 10.4 never causes the same location to be exam-
ined twice for the same item.
If n is the size of the hash table, with locations numbered from 0 to n − 1,
and h0, ..., h_{n−1} are the hashing functions, we can associate with each item
α a permutation Πα of {0, ..., n − 1}, namely Πα = [h0(α), ..., h_{n−1}(α)].
Thus, the first component of Πα gives the primary location to be assigned
to α, the second component gives the next alternate location, and so forth.
If we know pα, the probability of item α occurring, for all items α, we can
define the probability of permutation Π to be Σ pα, where the sum is taken
over all items α such that Πα = Π. Hereafter we shall assume that we have
been given the probabilities of the permutations.
It should be clear that we can calculate the expected number of locations
which must be examined to insert an item into a hash table or find an item
knowing only the probabilities of the permutations. It is not necessary to
know the actual functions h0, ..., h_{n−1} to evaluate the efficiency of a par-
ticular hashing function. In this section we shall study properties of hashing
functions as measured by the probabilities of the permutations.
We are thus motivated to make the following definition, which abstracts
the problem of hash table design to a question of what is a desirable set of
probabilities for permutations.
DEFINITION
A hashing system is a number n, the table size, and a probability function
p on permutations of the integers 0 through n − 1. We say that a hashing
system is random if p(Π) = 1/n! for all permutations Π.
800 BOOKKEEPING CHAP. 10

The important functions associated with a hashing system are


(1) p ( i l i ~ . . . ik), the probability that someitem will have il as its primary
location, i2 the first alternate, i 3 the next alternate, and so forth (each ij here
is an integer between 0 and n -- 1), and
(2) p([i~, i 2 , . . . , ik}), the probability that a sequence of k items will fill
exactly the set of locations [il, • • •, ik}.
The following formulas are easy to derive:

(10.1.2)    p(i1 ··· ik) = Σ p(i1 ··· ik i)    if k < n, where the sum is taken over all i not among i1, ..., ik

(10.1.3)    p(i1 ··· in) = p(Π), where Π is the permutation [i1, ..., in]

(10.1.4)    p(S) = Σ_{i ∈ S} p(S − {i}) Σ_w p(wi)

where S is any subset of {0, ..., n − 1} of size k and the rightmost sum is
taken over all w such that w is any string of k − 1 or fewer distinct locations
in S − {i}. We take p(∅) = 1.
Formula (10.1.2) allows us to compute the probability of any sequence
of locations occurring as alternates, starting with the probabilities of the
permutations. Formula (10.1.4) allows us to compute the probability that
the locations assigned to k items will be exactly those in set S. This probability
is the sum of the probability that the first k -- 1 items will fill all locations in
S except location i and that the last item will fill i, taken over all i in S.

Example 10.5
Let n = 3 and let the probabilities of the six permutations be

Permutation    Probability

[0, 1, 2]      .1
[0, 2, 1]      .2
[1, 0, 2]      .1
[1, 2, 0]      .3
[2, 0, 1]      .2
[2, 1, 0]      .1

By (10.1.3) the probability that the string of locations 012 is generated


by applying the hashing function to some item is just the probability of
the permutation [0, 1, 2]. Thus, p(012) = .1. By (10.1.2) the probability that
01 is generated is p(0!2). Likewise, p(02) is the probability of permutation
[0,2, 1], which is .2. Using formula (10.1.2), p ( 0 ) = p ( 0 1 ) - t - p ( 0 2 ) = .3.
SEC. 1 0 . 1 SYMBOL TABLES 801

The probability of each string of length 2 or 3 is the probability of the unique


permutation of which it is a prefix. The probabilities of strings 1 and 2 are,
respectively, p(10) + p(12) = .4 and p(20) + p(21) = .3.
Let us now compute the probabilities that various sets of locations are
filled. For sets S of one element, (10.1.4) reduces to p({i}) = p(i). Let us com-
pute p({0, 1}) by (10.1.4). Direct substitution gives

p({0, 1}) = p({0})[p(1) + p(01)] + p({1})[p(0) + p(10)]
          = .3[.4 + .1] + .4[.3 + .1] = .31

Similarly, we obtain

p({0, 2}) = .30    and    p({1, 2}) = .39

To compute p({0, 1, 2}) by (10.1.4), we must evaluate

p({0, 1})[p(2) + p(02) + p(12) + p(012) + p(102)]
  + p({0, 2})[p(1) + p(01) + p(21) + p(021) + p(201)]
  + p({1, 2})[p(0) + p(10) + p(20) + p(120) + p(210)]

which sums to 1, of course.

The figure of merit which we shall use to evaluate a hashing system is


the expected number of probes necessary to insert an item α into a table in
which k out of n locations are filled. We shall succeed on the first probe if
h0(α) = i and location i is one of the n − k empty locations. Thus, the
probability that we succeed on the first probe is given by

Σ_{i=0}^{n−1} p(i) Σ_S p(S)

where the rightmost sum is taken over all sets S of k locations which do not
contain i.
If h0(α) is in S but h1(α) ∉ S, then we shall succeed on the second try.
Therefore, the probability that we fail on the first try but succeed on the
second is given by

Σ_{i=0}^{n−1} Σ_{j=0}^{n−1} p(ij) Σ_S p(S)

where the rightmost sum is taken over all sets S such that #S = k, i ∈ S,
and j ∉ S. Note that p(ij) = 0 if i = j.
Proceeding in this manner, we arrive at the following formula for E(k, n),
the expected number of probes required to insert an item into a table in which
802 BOOKKEEPING CHAP. 10

k out of n locations are filled, k < n:

(10.1.5)    E(k, n) = Σ_{m=1}^{k+1} m Σ_w p(w) Σ_S p(S)

where
(1) The middle summation is taken over all w which are strings of distinct
locations of length m and
(2) The rightmost sum is taken over all sets S of k locations such that all
but the last symbol of w is in S. (The last symbol of w is not in S.)
The first summation assumes that m steps are required to compute the
primary location and its first m -- 1 alternates. Note that if k < n, an empty
location will always be found after at most k + 1 tries.

Example 10.6
Let us use the statistics of Example 10.5 to compute E(2, 3). Equation
(10.1.5) gives

E(2, 3) = Σ_{m=1}^{3} m Σ_w p(w) Σ_S p(S), the inner sum over all S such that #S = 2 and all but the last symbol of w is in S

        = p(0)p({1, 2}) + p(1)p({0, 2}) + p(2)p({0, 1})
        + 2[p(01)p({0, 2}) + p(10)p({1, 2}) + p(02)p({0, 1})
            + p(20)p({1, 2}) + p(12)p({0, 1}) + p(21)p({0, 2})]
        + 3[p(012)p({0, 1}) + p(021)p({0, 2}) + p(102)p({0, 1})
            + p(120)p({1, 2}) + p(201)p({0, 2}) + p(210)p({1, 2})]
        = 2.008
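As a check on this arithmetic, the hashing system of Example 10.5 can be simulated directly (a sketch of our own): draw a permutation for each item with the stated probabilities, fill two of the three locations, and average the number of probes needed for a third item.

import random

perms = {(0, 1, 2): .1, (0, 2, 1): .2, (1, 0, 2): .1,
         (1, 2, 0): .3, (2, 0, 1): .2, (2, 1, 0): .1}
choices, weights = list(perms), list(perms.values())

def insert(filled):
    """Insert one item and return the number of probes it used."""
    seq = random.choices(choices, weights)[0]      # the item's permutation
    for probe, loc in enumerate(seq, start=1):
        if loc not in filled:
            filled.add(loc)
            return probe

trials, total = 200_000, 0
for _ in range(trials):
    filled = set()
    insert(filled); insert(filled)                 # fill k = 2 of the n = 3 locations
    total += insert(filled)                        # probes for the third item
print(total / trials)                              # tends to E(2, 3) = 2.008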

Another figure of merit used to evaluate hashing systems is R(k, n), the
expected number of probes required to retrieve an item from a table in which
k out of n locations are filled. However, this figure of merit can be readily
computed from E(k, n). We can assume that each of the k items in the table
is equally likely to be retrieved. Thus, the expected retrieval time is equal to
the average number of probes that were required to originally insert these
k items into the table. That is,

R(k, n) - - - ~1 ~,=o
lE(i,n)

For this reason, we shall consider E(k, n) as the exclusive figure of merit.
A natural conjecture is that performance is best when a hashing system
is random, on the grounds that any nonrandomness can only make certain
SEC. 10.1 SYMBOLTABLES 803

locations more likely to be filled than others and that these are exactly the
locations more likely to be examined when we attempt to insert new items.
While this will be seen not to be precisely true, the exact optimum is not
known. Random hashing is conjectured to be optimum in the sense of mini-
mum retrieval time, and other common hashing systems do not compare
favorably with a random hashing system. We shall therefore calculate E(k, n)
for a random hashing system.
LEMMA 10.1
If a hashing system is random, then
(1) For all sequences w of locations such that 1 ≤ |w| ≤ n,

p(w) = (n − |w|)!/n!

(2) For all subsets S of {0, 1, ..., n − 1},

p(S) = 1 / (n choose #S)

Proof.
(1) Using (10.1.2), this is an elementary induction on n − |w|, starting
at |w| = n and ending at |w| = 1.
(2) A simple argument of symmetry assures us that p(S) is the same for
all S of size k. Since the number of sets of size k is (n choose k), part (2) is immedi-
ate.
LEMMA 10.2
If n ≥ k, then Σ_{j=0}^{k} (n − j choose k − j) = (n + 1 choose k).

Proof. Exercise.

THEOREM 10.1
If a hashing system is random, then E(k, n) = (n + 1)/(n + 1 -- k).
Proof. Let us suppose that we have a hash table with k out of n locations
filled. We wish to insert the k + 1st item α. It follows from Lemma 10.1(2)
that every set of k locations has the same probability of being filled. Thus
E(k, n) is independent of which k locations are actually filled. We can there-
fore assume without loss of generality that locations 0, 1, 2, ..., k − 1
are filled.
To determine the expected number of probes required to insert α, we
examine the sequence of addresses obtained by applying h to α. Let this
sequence be h0(α), h1(α), ..., h_{n−1}(α). By definition, all such sequences of
n locations are equally probable.
804 BOOKKEEPING CHAP. 10

Let q j be the probability that the first j - 1 locations in this sequence


are in {0, 1, ..., k − 1} but that the jth is not. Clearly, E(k, n), the expected
number of probes to insert α, is Σ_{j=1}^{k+1} j·qj. We observe that

(10.1.6)    Σ_{j=1}^{k+1} j·qj = Σ_{m=1}^{k+1} Σ_{j=1}^{m} qm = Σ_{j=1}^{k+1} Σ_{m=j}^{k+1} qm

But Σ_{m=j}^{k+1} qm is just the probability that the first j − 1 locations in the
sequence h0(α), h1(α), ..., h_{n−1}(α) are between 0 and k − 1, i.e., that at
least j probes are required to insert the k + 1st item. By Lemma 10.1(1),
this quantity is

(k/n)((k − 1)/(n − 1)) ··· ((k − j + 2)/(n − j + 2)) = [k!/(k − j + 1)!] · [(n − j + 1)!/n!] = (n − j + 1 choose k − j + 1) / (n choose k)

Then,

E(k, n) = Σ_{j=1}^{k+1} (n − j + 1 choose k − j + 1) / (n choose k) = (n + 1 choose k) / (n choose k) = (n + 1)/(n − k + 1)

using Lemma 10.2.

We observe from Theorem 10.1 that for large n and k, the expected time
to insert an item depends only on the ratio of k and n and is approximately
1/(1 − ρ), where ρ = k/n. This function is plotted in Fig. 10.6.
The ratio k/n is termed the load factor. When the load factor is small,
the insertion time increases with k, the number of filled locations, at a slower
rate than log k, and hashing is thus superior to a binary search. Of course,
if k approaches n, that is, as the table gets filled, insertion becomes very
expensive, and at k = n, further insertion is impossible unless some mecha-
nism is provided to handle overflows. One method of dealing with overflows
is suggested in Exercises 10.1.11 and 10.1.12.
The expected number of trials to insert an item is not the only criterion
of goodness of a hashing scheme. One also desires that the computation of
the hashing functions be simple. The hashing schemes we have considered
compute the alternate functions h1(α), ..., h_{n−1}(α) not from α itself but from
h0(α), and this is characteristic of most hashing schemes. This arrangement is
efficient because h0(α) is an integer of known length, while α may be arbitrarily
long. We shall call such a method hashing on locations. A more restricted
case, and one that is even easier to implement, is linear hashing, where hi(α)
is given by (h0(α) + i) mod n. That is, successive locations in the table are
tried until an empty one is found; if the bottom of the table is reached, we
proceed to the top. Example 10.4 (p. 796) is an example of linear hashing.
SEC. 10.1 SYMBOL TABLES 805

Fig. 10.6 Expected insertion time as a function of load factor for random hashing.

We shall give an example to show that linear hashing can be inferior to


random hashing in the expected number of trials for insertion. We shall
then discuss hashing on locations and show that at least in one case it, too,
is inferior to random hashing.
Example 1 0 . 7
Let us consider a linear hashing system with n = 4 and probability 1/4
for each of the four permutations [0123], [1230], [2301], and [3012]. The reader
can show that performance is made worse only if these probabilities are
made unequal. For random hashing, Theorem 10.1 gives E(2, 4) = 5/3.
By (10.1.4), we can calculate the probabilities that a set of two locations
is filled. These probabilities are

p({0, 1}) = p({1, 2}) = p({2, 3}) = p({3, 0}) = 3/16

and

p({0, 2}) = p({1, 3}) = 1/8

Then, by (10.1.5), we compute E(2, 4) for linear hashing to be 27/16, which


is greater than 5/3, the cost for random hashing. [~]

We shall next compare hashing on locations with random hashing in


the special case that the third item is entered into the table. We note that
806 BOOKKEEPING CHAP. 10

when we hash on locations there is, for each location i, exactly one permu-
tation, Πi, that begins with i and has nonzero probability. We can denote
the probability of Πi by pi. We shall denote the second entry in Πi, the first
alternate of i, by ai. If pi = 1/n for each i, we call the system random hashing
on locations.
THEOREM 10.2
E(2, n) is smaller for random hashing than for random hashing on loca-
tions for all n > 3.
Proof. We know by Theorem 10.1 that E(2, n) for random hashing is
(n + 1)/(n − 1). We shall derive a lower bound on E(2, n) for hashing on
locations. Let us suppose that the first three items to be entered into the table
have permutations Πi, Πj, and Πk, respectively. We shall consider two
cases, depending on whether i = j or not.
Case 1: i ≠ j. This occurs with probability (n − 1)/n. The expected num-
ber of trials to insert the third item is seen to be

1 + (2/n) + (2/n)[1/(n − 1)] = (n + 1)/(n − 1).

Case 2: i = j. This occurs with probability 1/n. Then with probability
(n − 2)/n, the third item is inserted in one try, that is, k ≠ i and k ≠ ai,
the second location filled. With probability 1/n, k = ai, and at least two
tries are made. Also with probability 1/n, k = i, and three tries must be
made. (This follows because we know that the second try is for ai, which
was filled by j.) The expected number of tries in this case is thus at least
[(n − 2)/n] + (2/n) + (3/n) = (n + 3)/n.
Weighting the two cases according to their likelihoods, we get, for random
hashing on locations,

E(2, n) ≥ ((n + 1)/(n − 1))((n − 1)/n) + ((n + 3)/n)(1/n) = (n² + 2n + 3)/n²

The latter expression exceeds (n + 1)/(n − 1) for n > 3.

The point of the previous example and theorem is that many simple
hashing schemes do not meet the performance of random hashing. Intuitively,
the cause is that when nonrandom schemes are used, there is a tendency for
the same location to be tried over and over again. Even if the load factor is
small, with high probability there will still be some locations that have been
tried many times. If a scheme such as hashing on locations is used, each
time a primary location h0(α) is filled, all the alternates of h0(α) which were
tried before will be tried again, resulting in poor performance.
The foregoing does not imply that one should not use a scheme such
EXERCISES 807

as hashing on locations if there is a compensating saving in time per insertion


try.
In fact, it is at least possible that random hashing is not the best we can do.
The following example shows that E(k, n) does not always attain a minimum
when random hashing is used. We conjecture, however, that r a n d o m hashing
does minimize R(k, n), the expected retrieval time.

Example 10.8

Let the permutations [0123] and [1032] have probability .2 and let [2013],
[2103], [3012], and [3102] have probability .15, all others having zero probabil-
ity. We can calculate E(2, 4) directly by (10.1.5), obtaining the value 1.665.
This value is smaller than the figure 5/3 for random hashing.

EXERCISES

10.1.1. Use Algorithm 10.1 to insert the following sequence of items into a
binary search tree: T, D, H, F, A, P, O, Q, W, TO, TH. Assume that
the items have alphabetic order.
10.1.2. Design an algorithm which will take a binary search tree as input and
list all elements stored in the tree in order. Apply your algorithm to
the tree constructed in Exercise 10.1.1.
"10.1.3. Show that the expected time to insert (or retrieve) one item in a binary
search tree is O(log n), where n is the number of nodes in the tree.
What is the maximum amount of time required to insert any one item ?
"10.1.4. What information about FORTRAN variables and constants is needed
in the symbol table for code generation ?
10.1.5. Describe a symbol table storage mechanism for a block-structured
language such as ALGOL in which the scope of a variable X is limited
to a given block and all blocks contained in that block in which X is
not redeclared.
10.1.6. Choose a table size and a primary hashing function h0. Compute
h0(α), where α is drawn from the set of (a) FORTRAN keywords,
(b) ALGOL keywords, and (c) PL/I keywords. What is the maximum
number of items with the same primary hash address ? You may wish
to do this calculation by computer. Sammet [1969] will provide the
needed sets of keywords.
"10.1.7. Show that R(k, n) for random hashing approximates (-- 1/p) log (1 --p),
where p = k/n. Plot this function.
Hint: Approximate (n/k) ~k~-o~(n + 1)/(n -- i + 1) by an integral.
*10.1.8. Consider the following pseudorandom number generator. This genera-
tor creates a sequence r1, r2, ..., r_{n−1} of numbers which can be used
to compute hi(α) = [h0(α) + ri] mod n for 1 ≤ i ≤ n − 1. Each time
that a sequence of numbers is to be generated, the integer R is initially
set to 1. We assume that n = 2^p. Each time another number is required,
the following steps are executed:
(1) R = 5 · R.
(2) R = R mod 4n.
(3) Return r = ⌊R/4⌋.
Show that for each i, the differences r_{i+k} − ri are all distinct for k ≥ 1
and i + k ≤ n − 1.
**10.1.9. Show that if hi = [h0 + ai² + bi] mod n for 1 ≤ i ≤ n − 1, then at
most half the locations in the sequence h0, h1, h2, ..., h_{n−1} are distinct.
Under what conditions will exactly half the locations in this sequence
be distinct?
**10.1.10. How many distinct locations are there in the sequence h0, h1, ..., h_{p−1}
if, for 1 ≤ i < p/2,

h_{2i−1} = [h0 + i²] mod p
h_{2i} = [h0 − i²] mod p

and p is a prime number?

DEFINITION
Another technique for resolving collisions that is more efficient in
terms of insertion and retrieval time is chaining. In this method one
field is set aside in each entry of the hash table to hold a pointer to
additional entries with the same primary hash address. All entries with
the same primary address are chained on a linked list starting at that
primary location.
There are several methods of implementing chaining. One method,
called direct chaining, uses the hash table itself to store all items. To
insert an item α, we consult location h0(α).
(1) If that location is empty, α is installed there. If h0(α) is filled
and is the head of a chain, we find an empty entry in the hash table
by any convenient mechanism and place this entry on the chain
headed by h0(α).
(2) If h0(α) is filled but not by the head of a chain, we move the
current entry β in h0(α) to an empty location in the hash table and
insert α in h0(α). [We must recompute h(β) to keep β in the proper
chain.]
This movement of entries is the primary disadvantage of direct
chaining. However, the method is fast. Another advantage of the
technique is that when the table becomes full, additional items can be
placed in an overflow table with the same insertion and retrieval
strategy.
EXERCISES 809

10.1.11. Show that if alternate locations are chosen randomly, then R(k, n),
the expected retrieval time for direct chaining, is 1 + ρ/2, where
ρ = k/n. Compare this function with R(k, n) in Exercise 10.1.7.
Another chaining technique which does not require items to be
moved uses an index table in front of the hash table. The primary
hashing function h0 computes addresses in the index table. The entries
in the index table are pointers to the hash table, whose entries are
filled in sequence.
To insert an item α in this new scheme, we compute h0(α), which
is an address in the index table. If h0(α) is empty, we seize the next
available location in the hash table and insert α into that location.
We then place a pointer to this location in h0(α).
If h0(α) already contains a pointer to a location in the hash table,
we go to that location. We then search down the chain headed by
that location. Once we reach the end of the chain, we take the next
available location in the hash table, insert α into that location, and
If we fill the hash table in order beginning from the top, we can
find the next available location very quickly. In this scheme no items
ever need to be moved because each entry in the index table always
points to the head of a chain.
Moreover, overflows can be simply accommodated in this scheme
by adding additional space to the end of the hash table.
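The index-table variant just described is easy to sketch in code (the function and variable names below are ours, not the book's); no item is ever moved because the index entry always points at the head of its chain:

def insert_with_index(index, names, links, item, h0):
    loc = h0(item)
    new = len(names)               # next available location in the hash table
    names.append(item)
    links.append(None)
    if index[loc] is None:         # empty index entry: start a new chain
        index[loc] = new
        return new
    cur = index[loc]               # otherwise walk to the end of the chain
    while links[cur] is not None:
        cur = links[cur]
    links[cur] = new               # attach the new location to the chain
    return new

n = 10
index, names, links = [None] * n, [], []
h0 = lambda s: sum(ord(c) - ord('A') + 1 for c in s) % n   # CODE(alpha) mod n
for item in ["A", "W", "EF", "HX"]:
    insert_with_index(index, names, links, item, h0)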

"10.1.12. What is the expected retrieval time for a chaining scheme with an
index table? Assume that the primary locations are uniformly dis-
tributed.
10.1.13. Consider a random hashing system with n locations as in Section
10.1.5. Show that if S is a set of k locations and i ∉ S, then
Σ_w p(wi) = 1/(n − k), where the sum is taken over all w such that w
is a string of k or fewer distinct locations in S.

"10.1.14. Prove the following identities"

(a) Σ_{i=0}^{k} (n + i choose i) = (n + k + 1 choose k).

(c) Σ_{i=0}^{k} i·(n + i choose i) = k·(n + k + 1 choose k) − (n + k + 1 choose k − 1).

"10.1.15. Suppose that items are strings of from one to six capital R o m a n letters.
Let CODE(a) be the function defined in Example 10.4. Suppose that
item tx has probability (1/6)26-1~,I. Compute the probabilities of the
permutations on {0, 1 , . . . , n -- 1} if
(a) h,(ct) = (CODE(t~) + i)mod n, 0 < i < n - 1.
810 BOOKKEEPING CHAP. 10

(b) hi(a) = (i(ho(~) + 1))mod n, 1 .~ i _~ n - 1 where


h0(~) = CODE(a) mod n
**10.1.16. Show that for linear hashing with the primary location h0(α) randomly
distributed, the limit of E(k, n) as k and n approach ∞ with k/n = ρ
is given by 1 + [ρ(1 − ρ/2)/(1 − ρ)²]. How does this compare with
the corresponding function of ρ for random hashing?
DEFINITION

A hashing system is said to be k-uniform if for each set
S ⊆ {0, 1, 2, ..., n − 1} such that #S = k, the probability that a
sequence of k distinct locations consists of exactly the locations of S
is 1/(n choose k). (Most important, the probability is independent of S.)
*10.1.17. Show that if a hashing system is k-uniform, then
E(k, n) = (n + 1)/(n + 1 -- k),

as for random hashing.


"10.1.18. Show that for each hashing system such that there exists k for which
E(k, n) < (n + 1)/(n + 1 -- k), there exists k' < k such that
E(k', n) > (n + 1)/(n q-- 1 -- k')

Thus, if a given hashing system is better than random for some k, there
is a smaller k' for which performance is worse than random. Hint:
Show that if a hashing system is (k -- 1)-uniform but not k-uniform,
then E(k, n) > (n q- 1)/(n -- k q- 1).
"10.1.19. Give an example of a hashing systemwhich is k-uniform for all k but
is not random.
"10.1.20. Generalize Example 10.7 to the case of unequal probabilities for the
cyclic permutations.
"10.1.21. Strengthen Theorem 10.2 to include systems which hash on locations
but do not have equal pt's.

Open Problems
10.1.22. Is random hashing optimal in the sense of expected retrieval time ?
That is, is it true that R(k, n) is always bounded below by

(1/k) Σ_{i=0}^{k−1} (n + 1)/(n + 1 − i) ?

We conjecture that random hashing is optimal.


10.1.23. Find the greatest lower bound on E(k, n) for systems which hash on
locations. Any lower bound that exceeds (n + 1)/(n + 1 − k) would be
of interest.
10.1.24. Find the greatest lower bound on E(k, n) for arbitrary hashing systems.
We saw in Example 10.8 that (n + 1)/(n + 1 − k) is not such a bound.
SEC. 10.2 PROPERTY GRAMMARS 811

Research Problem
10.1.25. In certain uses of a hash table, the items entered are known in advance.
Examples are tables of library routines or tables of assembly language
operation codes. If we know what the residents of the hash table are,
we have the opportunity to select our hashing system to minimize the
expected lookup time. Can you provide an algorithm which takes the
list of items to be stored and yields a hashing system which is efficient
to implement, yet has a low lookup time for this particular loading of
the hash table ?

Programming Exercises
10.1.26. Implement a hashing system that does hashing on locations. Test the
behavior of the system on FORTRAN keywords and common func-
tion names.
10.1.27. Implement a hashing system that uses chaining to resolve collisions.
Compare the behavior of this system with that in Exercise 10.1.26.

BIBLIOGRAPHIC NOTES

Hash tables are also known as scatter storage tables, key transformation tables,
randomized tables, and computed entry tables. Hash tables have been used by
programmers since the early 1950's. The earliest paper on hash addressing is by
Peterson [1957]. Morris [1968] provides a good survey of hashing techniques. The
answer to Exercise 10.1.7 can be found there.
Methods of computing the alternate functions to reduce the expected number
of collisions are discussed by Maurer [1968], Radke [1970], and Bell [1970]. An
answer to Exercise 10.1.10 can be found in Radke [1970]. Ullman [1972] discusses
k-uniform hashing systems.
Knuth [1973] is a good reference on binary search trees and hash tables.

10.2. PROPERTY GRAMMARS

An interesting and highly structured method of assigning properties to
the identifiers of a programming language is through the formalism of
"property grammars." These are context-free grammars with an additional
mechanism to record information about identifiers and to handle some of
the non-context-free aspects of the syntax of programming languages (such
as requiring identifiers to be declared before their use). In this section we
provide an introduction to the theory of property grammars and show how
a property grammar can be implemented to model a syntactic analyzer that
combines parsing with certain aspects of semantic analysis.

10.2.1. Motivation

Let us try first to understand why it is not always sufficient to establish
the properties of each identifier as it is declared and to place these properties
in some location in a symbol table reserved for that identifier. If we consider
a block-structured language such as ALGOL or PL/I, we realize that the
properties of an identifier may change many times, as it may be defined in
an outer block and redefined in an inner block. When the inner block ter-
minates, the properties return to what they were in the outer block.
For example, consider the diagram in Fig. 10.7 indicating the block struc-
ture of a program. The letters indicate regions in the program. This block
structure can be represented by the tree in Fig. 10.8.

    begin block 1
        A
        begin block 2
            B
            begin block 3
                C
            end block 3
            D
        end block 2
        E
        begin block 4
            F
        end block 4
        G
    end block 1

    Fig. 10.7 Block structure of a program.

Suppose that an identifier I is defined in block 1 of this program and is
recorded in a symbol table as soon as it is encountered. Suppose that I is
then redefined in block 2. In regions B, C, and D of this program, I has the
new definition. Thus, on encountering the definition of I in block 2, we must
enter the new definition of I in the symbol table. However, we cannot merely
replace the definition that I had in block 1 by the new definition in block 2,
because once region E is encountered we must revert to the original definition
of I.
One way to handle this problem is to associate two numbers, a level
number and an index, with each definition of an identifier. The level number
is the nesting depth, and each block with the same level number is given
a distinct index. For example, identifiers in areas B and D of block 2 would

    [Fig. 10.8 Tree representation of block structure: block 1 is the root, with
    children A, block 2, E, block 4, and G; block 2 has children B, block 3, and D;
    block 3 has child C; block 4 has child F.]

have level number 2 and index 1. Identifiers in area F of block 4 would have
level number 2 and index 2.
If an identifier in the block with level i and index j is referenced, we look
in the symbol table for a definition of that identifier with the same level
number and index. However, if that identifier is nowhere defined in the block
with level number i, we would then look for a definition of that identifier
in the block of level number i − 1 which contained the block of level number
i and index j, and so forth. If we encounter a definition at the desired level
but with too small an index, we may delete the definition, as it will never
again apply. Thus, a pushdown list is useful for the storing of definitions of
each identifier as encountered. The search described is also facilitated if the
index of the currently active block at each level is available.
For example, if an identifier K is encountered in region C and no defini-
tion of K appeared in region C, we would accept a definition of K appear-
i n g in region B (or D, if declarations after use are permitted). However, if
no definition of K appeared in regions B or D, we would then accept a
definition of K in regions A, E, or G. But we would not look in region F for
a definition of K.
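The level-index bookkeeping just described is easy to sketch in code. The
following Python fragment is purely illustrative (the class and method names
are ours, not the text's): each identifier keeps a pushdown list of (level,
index, definition) triples, and active[l − 1] records the index of the currently
open block at nesting level l.

    class BlockStructuredTable:
        """Illustrative sketch of the level/index scheme described above."""

        def __init__(self):
            self.defs = {}         # identifier -> stack of (level, index, definition)
            self.active = []       # indices of the currently open blocks, outermost first
            self.blocks_seen = []  # blocks_seen[l-1] = blocks opened so far at level l

        def enter_block(self):
            level = len(self.active) + 1
            if level > len(self.blocks_seen):
                self.blocks_seen.append(0)
            self.blocks_seen[level - 1] += 1
            self.active.append(self.blocks_seen[level - 1])  # its distinct index

        def exit_block(self):
            self.active.pop()

        def define(self, ident, definition):
            level = len(self.active)
            self.defs.setdefault(ident, []).append(
                (level, self.active[level - 1], definition))

        def lookup(self, ident):
            stack = self.defs.get(ident, [])
            while stack:
                level, index, definition = stack[-1]
                # Visible iff its block is one of the currently active blocks.
                if level <= len(self.active) and self.active[level - 1] == index:
                    return definition
                stack.pop()  # its block is closed; this definition can never apply again
            return None

With these conventions, a lookup made in region C of Fig. 10.7 sees a
definition from block 3 first, then block 2, then block 1, and never one from
block 4.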
The level-index method of recording definitions can be used for languages
with the conventional nesting of definitions, e.g., ALGOL and PL/I. How-
ever, in this section we shall discuss a more general formalism, called prop-
erty grammars, which permits arbitrary conventions regarding scope of
definition. Property grammars have an inherent generality and elegance
which stem from the uniformity of the treatment of identifiers and their
properties. Moreover, they can be implemented in an amount of time which
is essentially linear in the length of the compiler input. While the constant

of proportionality may be high, we present the concept in the hope that


future research may make it a practical compiler technique.

10.2.2. Definition of Property Grammar

Informally, a property grammar consists of an underlying CFG to whose


nonterminal and terminal symbols we have attached "property tables."
We can picture a property table as an abstraction of a symbol table. When
we parse bottom-up according to the underlying grammar, the property
tables attached to the nonterminals together represent the information avail-
able in the symbol table at that point of the parse.
A property table T is a mapping from an index set I to a property set V.
Here we shall use the set of nonnegative integers as the index set. We can
interpret each integer in the index set as a pointer into a symbol table. Thus, if
the entry pointed to in the symbol table represents an identifier, we can treat
the integer as the name of that identifier.
The set V is a set of "properties" or "values" and we shall use a finite set
of integers for V. One integer (usually 0) is distinguished as the "neutral"
property. Other integers can be associated with various properties which
are of interest. For example, the integer 1 might be associated with the prop-
erty "this identifier has been referenced," 2 with "declared real," and so forth.
In tables associated with terminals, all but one index is mapped to the
neutral property. The remaining index can be mapped to any property
(including the neutral one). However, if the terminal is the token <identifier>,
the index which represents the name of the particular token (that is, the data
component of the token) would be mapped onto a property such as "this
is the identifier mentioned here."
For example, if we encounter the declaration real B in a program, this
string might be parsed as

            <declaration> [1:2]
             /          \
          real      <identifier> [1:1]

where the data component of the token <identifier> refers to the string B.
With the terminal <identifier> is associated the table [1:1], which associates
the property 1 (perhaps "mentioned") with the index 1 (which now corre-
sponds to B) and the neutral property with all other indices. The terminal
token real is associated with a table which maps all indices to the neutral
property. Such a table will normally not be explicitly present.
With the nonterminal <declaration> in the parse tree we associate a table
which would be constructed by merging the tables of its direct descendants
according to some rule. Here, the table that is constructed is [1:2], which

associates property 2 with index 1 and the neutral property with all other
indices. This table can then be interpreted as meaning that the identifier
associated with index 1 (namely B) has the property "declared real."
In general, if we have the structure

            A  T

      X1 T1   X2 T2   ...   Xk Tk

in a parse tree, then the property of index i in table T is a function only of
the property of index i in tables T1, T2, ..., Tk. That is, each index is treated
independently of all others.
We shall now define a property grammar precisely.
DEFINITION
A property grammar is an 8-tuple G = (N, Σ, P, S, V, v0, F, μ), where
(1) (N, Σ, P, S) is a CFG, called the underlying CFG;
(2) V is a finite set of properties;
(3) v0 in V is the neutral property;
(4) F ⊆ V is the set of acceptable properties; v0 is always in F; and
(5) μ is a mapping from P × V* to V, such that
    (a) If μ(p, s) is defined, then production p has a right-hand side exactly
        as long as s, where s is a string of properties;
    (b) μ(p, v0v0 ··· v0) is v0 if the string of v0's is as long as the right-hand
        side of production p, and is undefined otherwise.
The function μ tells how to assign properties in the tables associated with
interior nodes of parse trees. Depending on the production used at that node,
the property associated with each integer is computed independently of
the property of any other integer. Condition (5b) establishes that the table
of some interior node can have a nonneutral property for some integer only
if one of its direct descendants has a nonneutral property for that integer.
A sentential form of G is a string X1T1X2T2 ··· XnTn, where the X's are
symbols in N ∪ Σ and the T's are property tables. Each table is assumed to
be attached to the symbol immediately to its left and represents a mapping
from indices to V such that all but a finite number of indices are mapped
to v0.
We define the relation ⇒_G, or ⇒ if G is understood, on sentential forms
as follows. Let αATβ be a sentential form of G and A → X1 ··· Xk be pro-
duction p in P. Then αATβ ⇒ αX1T1 ··· XkTkβ if for all indices i,

    μ(p, T1(i)T2(i) ··· Tk(i)) = T(i).

Let ⇒*_G, or ⇒* if G is understood, be the reflexive, transitive closure of ⇒. The
language defined by G, denoted L(G), is the set of all a1T1a2T2 ··· anTn such
that for some table T
(1) ST ⇒* a1T1a2T2 ··· anTn;
(2) Each aj is in Σ;
(3) For all indices i, T(i) is in F; and
(4) For each j, Tj maps all indices, or all but one index, to v0.
We should observe that although the definition of a derivation is top-
down, the definition of a property grammar lends itself well to bottom-up
parsing. If we can determine the tables associated with the terminals, we can
construct the tables associated with each node of the parse tree deterministi-
cally, since μ is a function of the tables associated with the direct descendants
of a node.
It should be clear that if G is a property grammar, then the set

    {a1a2 ··· an | a1T1a2T2 ··· anTn is in L(G)
                   for some sequence of tables T1, T2, ..., Tn}

is a context-free language, because any string generated by the underlying
CFG can be given all-neutral tables and generated in the property grammar.
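Because μ treats each index independently, the bottom-up computation of a
node's table from the tables of its direct descendants is straightforward to
express. The sketch below is our own illustration (the function name and
the dictionary representation of tables are assumptions, not part of the
formalism): a table is a dictionary from indices to properties, and absent
indices carry the neutral property.

    def combine(mu_p, child_tables, neutral=0):
        """Compute a node's table from its direct descendants' tables.

        mu_p maps a tuple of child properties (one per right-hand-side symbol)
        to the node's property, returning None where mu is undefined ("jam").
        Tables are dicts index -> property; missing indices are neutral.
        """
        parent = {}
        # By condition (5b), only indices that are nonneutral in some child
        # can become nonneutral in the parent, so those are the only ones to scan.
        for i in set().union(*(t.keys() for t in child_tables)):
            s = tuple(t.get(i, neutral) for t in child_tables)
            v = mu_p(s)
            if v is None:
                raise ValueError(f"mu undefined for index {i}: s = {s}")
            if v != neutral:
                parent[i] = v
        return parent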
CONVENTION

We continue to use the conventions regarding context-free grammars
(a, b, c, ... are in Σ, and so forth), except that v now represents a property
and s a string of properties. If T maps all integers to the neutral property,
we write X instead of XT.

Example 10.9
We shall give a rather lengthy example using property grammars to handle
declarations in a block-structured language. We shall also show how, if
the underlying CFG is deterministically parsable in a bottom-up way, the
tables can be deterministically constructed as the parse proceeds.†
Let G = (N, Σ, P, S, V, 0, {0}, μ) be a property grammar with
(i) N = {<block>, <statement>, <declaration list>, <statement list>,
<variable list>}. The nonterminal <variable list> generates a list of variables
used in a statement. We are going to represent a statement by the actual
variables used in the statement rather than giving its entire structure. This

†Our example grammar happens to be ambiguous but will illustrate the points to be
made.

abstraction brings out the salient features of property grammars without
involving us in too much detail.
(ii) Σ = {begin, end, declare, label, goto, a}. The terminal declare stands
for the declaration of one identifier. We do not show the identifier declared,
as this information will be in the property table attached to declare. Likewise,
label stands for the use of an identifier as a statement label. The identifier
itself is not explicitly shown. The terminal goto stands for goto <label>, but
again we do not show the label explicitly, since it will be in the property
table attached to goto. The terminal a represents a variable.
(iii) P consists of the following productions.

     (1) <block> → begin <declaration list> <statement list> end
     (2) <statement list> → <statement list> <statement>
     (3) <statement list> → <statement>
     (4) <statement> → <block>
     (5) <statement> → label <variable list>
     (6) <statement> → <variable list>
     (7) <statement> → goto
     (8) <variable list> → <variable list> a
     (9) <variable list> → e
    (10) <declaration list> → declare <declaration list>
    (11) <declaration list> → e

Informally, production (1) says that a block is a declaration list and a list of
statements surrounded by begin and end. Production (4) says that a statement
can be a block; productions (5) and (6) say that a statement is a list of the
variables used therein, possibly prefixed with a label. Production (7) says that
a statement can be a goto statement. Productions (8) and (9) say that a vari-
able list is a string of 0 or more a's, and productions (10) and (11) say that
a declaration list is a string of 0 or more declare's.
(iv) V = {0, 1, 2, 3, 4} is a set of properties with the following meanings:

0 Identifier does not appear in the string derived from this node (neutral
property).
1 Identifier is declared to be a variable.
2 Identifier is a label of a statement.
3 Identifier is used as a variable but is not (insofar as the descendants
of the node in question are concerned) yet declared.
4 Identifier is used as a goto target but has not yet appeared as a label.

(v) We define the function μ on properties with the following ideas in
mind:
(a) If an invalid use of a variable or label is found, there will be no
way to construct further tables, and so the process of table com-
putation will "jam." (We could also have defined an "error"
property.)
(b) An identifier used in a goto must be the label of some statement
of the block in which it is used.†
(c) An identifier used as a variable must be declared within its block
or a block in which its block is nested.
We shall give μ(p, s) for each production in turn, with comments as to
the motivation.

(1) <block> → begin <declaration list> <statement list> end

          s             μ(1, s)

    0  0  0  0             0
    0  1  0  0             0
    0  1  3  0             0
    0  0  3  0             3
    0  0  2  0             0

The only possible property for all integers associated with begin and end
is 0 and hence the two columns of 0's. In the declaration list each identifier
will have property 0 ( = not declared) or 1 ( = declared). If an identifier is
declared, then within the body of the block, i.e., the statement list, it can
be used only as a variable (3) or not used (0). In either case, the identifier is
not declared insofar as the program outside the block is concerned, and so
we give the identifier the 0 property. Thus, the second and third lines appear
as they do.
If an identifier is not declared in this block, it may still be used, either
as a label or variable. If used as a variable (property 3), this fact must be
transmitted outside the block, so we can check that it is declared at some
appropriate place, as in line 4. If an identifier is defined as a label within
the block, this fact is not transmitted outside the block (line 5), because
a label within the block may not be transferred to from outside the block.
Since μ is not defined for other values of s, the property grammar catches
uses of labels not found within the block as well as uses of declared variables
as labels within the block. A label used as a variable will be caught at another
point.

†This differs from the convention of ALGOL, e.g., in that ALGOL allows transfers
to a block which surrounds the current one. We use this convention to make the handling
of labels differ from that of identifiers.

(2) <statement list> → <statement list> <statement>

      s        μ(2, s)

    0  0          0
    3  3          3
    0  3          3
    3  0          3
    4  4          4
    0  4          4
    4  0          4
    4  2          2
    2  4          2
    0  2          2
    2  0          2

Lines 2-4 say that an identifier used as a variable in <statement list>
or <statement> on the right-hand side of the production is used as a vari-
able insofar as the <statement list> on the left is concerned. Lines 5-7 say
the same thing about labels. Lines 8-11 say that a label defined in either
<statement list> or <statement> on the right is defined for <statement list>
on the left, whether or not that label has been used.
At this point, we catch uses of an identifier as both a variable and label
within one block.

(3) <statement list> → <statement>

      s        μ(3, s)

      0           0
      2           2
      3           3
      4           4

Properties are transmitted naturally. Property 1 is impossible for a state-
ment.

(4) <statement> → <block>

      s        μ(4, s)

      0           0
      3           3

The philosophy for production (3) also applies for production (4).

(5) <statement> → label <variable list>

      s        μ(5, s)

    0  0          0
    0  3          3
    2  0          2

The use of an identifier as a label in label or as a variable in <variable list>
on the right is transmitted to the <statement>.

(6) <statement> → <variable list>

      s        μ(6, s)

      0           0
      3           3

Here, the use of a variable in <variable list> is transmitted to <statement>.

(7) <statement> → goto

      s        μ(7, s)

      0           0
      4           4

The use of a label in goto is transmitted to <statement>.

(8) <variable list> → <variable list> a

      s        μ(8, s)

    0  0          0
    3  0          3
    0  3          3
    3  3          3

Any use of a variable is transmitted to <variable list>.

(9) <variable list> → e

      s        μ(9, s)

      e           0

The one value for which μ is defined is s = e. The property must be 0
by definition of the neutral property.

(10) <declaration list> → declare <declaration list>

      s        μ(10, s)

    0  0          0
    0  1          1
    1  0          1
    1  1          1

Declared identifiers are transmitted to <declaration list>.

(11) <declaration list> → e

      s        μ(11, s)

      e           0

The situation with regard to production (9) also applies here.


We shall now give an example of how tables are constructed bottom-up
on a parse tree. We shall use the notation [i1:v1, i2:v2, ..., in:vn] for the
table which assigns to index ij the property vj for 1 ≤ j ≤ n and assigns the
neutral property to all other indices.
Let us consider the input string

    begin
        declare [1:1]
        declare [2:1]
        begin
            label [1:2] a [2:3]
            goto [1:4]
        end
        a [1:3]
    end

That is, in the outer block, identifiers 1 and 2 are declared as variables
by symbols declare[1:1] and declare[2:1]. Then, in the inner block, identifier
1 is declared and used as a label (which is legitimate) by symbols label[1:2]
and goto[1:4], respectively, and identifier 2 is used as a variable by symbol
a[2:3]. Returning to the outer block, 1 is used as a variable by symbol
a[1:3].
A parse tree with tables associated with each node is given in Fig. 10.9.

    [Fig. 10.9 Parse tree with tables. The diagram attaches a property table
    to each node of the parse tree for the input above; for example, the inner
    <statement list> carries [1:2, 2:3] and the inner <block> carries [2:3].]
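To make the table computation concrete, here is the inner <block> of Fig.
10.9 worked through the combine sketch given earlier. The μ(1, s) entries
are the ones tabulated (and partly reconstructed) above, so this is an
illustration only, not part of the original text.

    # mu for production (1): <block> -> begin <declaration list> <statement list> end.
    # Keys are the property strings s, as tabulated above.
    MU_1 = {
        (0, 0, 0, 0): 0,
        (0, 1, 0, 0): 0,
        (0, 1, 3, 0): 0,
        (0, 0, 3, 0): 3,
        (0, 0, 2, 0): 0,
    }

    # Children of the inner <block>: begin, an empty <declaration list>,
    # the <statement list> with table [1:2, 2:3], and end.
    children = [{}, {}, {1: 2, 2: 3}, {}]
    inner_block_table = combine(MU_1.get, children)
    print(inner_block_table)   # {2: 3}: the local label (index 1) is hidden,
                               # and index 2 is reported as used but not yet declared.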



The notion of a property grammar can be generalized. For example,
(1) The mapping μ can be made nondeterministic; that is, let μ(p, s)
be a subset of V.
(2) The existence of a neutral property need not be assumed.
(3) Constraints may be placed on the class of tables which may be asso-
ciated with symbols, e.g., the all-neutral table may not appear.
Certain theorems about property grammars in this more general formu-
lation are reserved for the Exercises.

10.2.3. Implementation of Property Grammars

We shall discuss the implementation of a property grammar when the


underlying CFG can be parsed bottom-up deterministically by a shift-
reduce algorithm. When parsing top-down, or using any of the other parsing
algorithms discussed in this book, one encounters the same timing problems
in the construction of tables as one has in timing the construction of translation
strings in syntax-directed translation. Since the solutions are essentially those
presented in Chapter 9, we shall discuss only the bottom-up case.
In our model of a compiler the input to the lexical analyzer does not have
tables associated with its terminals. Let us assume that the lexical analyzer
will signal whether a token involved is an identifier or not. Thus, each token,
when it becomes input to the parser, will have one of two tables, an all-neutral
table or a table in which one index has the nonneutral property. We shall
thus assume that unused input to the parser will have no tables at all; these
are constructed when the symbol is shifted onto the pushdown list.
We also assume that the tables do not influence the parse, except to inter-
rupt it when there is an error. Thus, our parsing mechanism will be the normal
shift-reduce mechanism, with a pointer to a representation of the property
table for each symbol on the pushdown list.
Suppose that we have [B, T1][C, T2] on top of the pushdown list and that
a reduction according to the production A → BC is called for. Our problem
is to construct the table associated with A from T1 and T2 quickly and con-
veniently. Since most indices in a table are mapped to the neutral property,
it is desirable to have entries only for those indices which do not have the
neutral property.
We shall implement the table-handling scheme in such a way that all
table-handling operations and inquiries about the property of a given index
can be accomplished in time that is virtually linear in the number of opera-
tions and inquiries. We make the natural assumption that the number of
table inquiries is proportional to the length of the input. The assumption is
correct if we are parsing deterministically, as the number of reductions made
(and hence the number of table mergers) is proportional to the length of the


input. Also, a reasonable translation algorithm would not inquire of prop-


erties more than a constant number of times per reduction.
In what follows, we shall assume that we have a property grammar whose
underlying CFG is in Chomsky normal form. Generalizations of the
algorithms are left for the Exercises. We shall also assume that each index
(identifier) with a nonneutral property in any property table has a location
in a hash table and that we may construct data structures out of elementary
building blocks called cells. Each cell is of the form

    DATUM1 | ··· | DATUMm | POINTER1 | ··· | POINTERn

consisting of one or more fields, each of which can contain some data or
a pointer to another cell. The cells will be used to construct linked lists.
Suppose that there are k properties in V. The property table associated
with a grammar symbol on the pushdown list is represented by a data struc-
ture consisting of up to k property lists and an intersection list. Each property
list is headed by a property list header cell. The intersection list is headed by
an intersection list header cell. These header cells are linked as shown in Fig.
10.10.
The property header cell has three fields:

    PROPERTY | COUNT | NEXT HEADER

            Property Header Cell

    [Fig. 10.10 A pushdown list entry with structure representing a property
    table: the entry for grammar symbol B holds a pointer to its property table,
    consisting of the intersection list header followed by the headers for the
    property lists.]

The intersection list header cell contains only a pointer to the first cell
on the intersection list. All cells on the property lists and the intersection lists
are index cells. An index cell consists of four pointer fields:

    TOWARD PROPERTY | INTERSECTION LIST | TOWARD HASH TABLE | AWAY FROM HASH TABLE

                          Index Cell

Suppose that T is a property table associated with a symbol on the pushdown


list. Then T will be represented by p property lists, where p is the number of
distinct properties in T, and one intersection list. For each index in T having
a nonneutrai property j, there is one index cell on the property list headed
by the property list header cell for property j.
All index cells having the same property are linked into a tree whose root
is the header for that property. The first pointer in an index cell is to its
direct ancestor in that tree.
If an index cell is on the intersection list, then the second pointer in that
cell is to the next cell on the intersection list. The pointer is absent if an index
cell is not on the intersection list.
The last two pointers link index cells which represent the same index but
in different property tables. Suppose that δT3γT2βT1α represents a string of
tables on the pushdown list such that T1, T2, and T3 each contain index i
with a nonneutral property and that all tables in α, β, γ, or δ give index i
the neutral property.
If table T1 is closest to the top of the pushdown list, then C1, the index
cell representing i in T1, will have in its third field a pointer to the symbol
table location for i. The fourth field in C1 has a pointer to C2, the cell in T2
that represents i. In C2, the third field has a pointer to cell C1 and the fourth
field has a pointer to the cell representing i in T3.
Thus, an additional structure is placed on the index cells: All index cells
representing the same index in all tables on the pushdown list are in a doubly
linked list, with the hash table entry for the index at the head.
A cell is on the intersection list of a table if and only if some table above
it on the pushdown list has an index cell representing the same index. In
the example above, the cell C2 will be on the intersection list for T2.
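One way to transcribe the two cell formats into record declarations is
sketched below; the field names are ours, and the text does not prescribe any
particular layout.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class PropertyHeader:
        prop: int                                        # PROPERTY
        count: int = 0                                   # COUNT (size of its tree)
        next_header: Optional["PropertyHeader"] = None   # NEXT HEADER

    @dataclass
    class IndexCell:
        toward_property: object = None                   # direct ancestor in the property tree
        intersection: Optional["IndexCell"] = None       # next cell on the intersection list
        toward_hash: object = None                       # cell or hash-table entry nearer the top
        away_from_hash: Optional["IndexCell"] = None     # cell for the same index farther down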

Example 10.10
Suppose that we have a pushdown list containing grammar symbols B
and C with C on top. Let the tables associated with these entries be, respec-
tively,

    T1 = [1:v1, 2:v2, 5:v2, 6:v1, 8:v2]

and

    T2 = [2:v1, 3:v1, 4:v1, 5:v1, 7:v2, 8:v3]

Then a possible implementation for these tables is shown in Fig. 10.11.

    [Fig. 10.11 Implementation of property tables: the pushdown list entries
    for B and C point to the structures representing T1 and T2.]

Circles

indicate index cells. The number inside the circle is the index represented
by that cell. Dotted lines indicate the links of the intersection list. Note that
the intersection list of the topmost table is empty, by definition, and that
the intersection list of table T1 consists of index cells for indices 2, 5, and 8.
Dashed lines indicate links to the hash table and to cells representing the
same index on other tables. We show these only for indices 2 and 3 to avoid
clutter. □

Suppose that we are parsing and that the parser calls for BC on its stack
to be reduced to A. We must compute table T for A from tables T1 and T2
for B and C, respectively. Since C is the top symbol, the intersection list
of T1 contains exactly those indices having a nonneutral property on both
T1 and T2 (hence the name "intersection list"). These indices will be set aside
for later consideration.
Those indices which are not on the intersection list of T1 have the neutral
property on at least one of T1 or T2. Thus, their property on T is a function
only of their one nonneutral property. Neglecting those indices on the inter-
section list, the data structure representing table T can be constructed by
combining various trees of T1 and T2. After doing so, each entry on the inter-
section list of T1 is treated separately and made to point to the appropriate
cell of T.
Before formalizing these ideas, we should point out that in practice we
would expect that the properties can be partitioned into disjoint subsets such
that we can express V as V1 × V2 × ··· × Vm for some relatively large m.
The various components, V1, V2, ..., would in general be small. For example,
V1 might contain two elements designating "real" and "integer"; V2 might
have two elements "single precision" and "double precision"; V3 might
consist of "dynamically allocated" and "statically allocated," and so on.
One element of each of V1, V2, ... can be considered the default condition,
and the product of the default elements is the neutral property. Finally, we
may expect that the various components of an identifier's property can be
determined independently of the others.
If this situation pertains, it is possible to create one property header for
each nondefault element of V1, one for each nondefault element of V2, and
so on. Each index cell is linked to several property headers, but at most one
from any Vi. If the links to the headers for Vi are made distinct from those
to the headers of Vj for i ≠ j, then the ideas of this section apply equally
well to this situation, and the total number of property headers will approxi-
mate the sum of the sizes of the Vi's rather than their product.
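As a small, hypothetical illustration of this economy, three two-valued
components yield 2 × 2 × 2 − 1 = 7 nonneutral properties but only three
nondefault component values, so three headers per table suffice instead of
seven:

    # Hypothetical components; the first entry of each is its default.
    V1 = ("integer", "real")       # arithmetic type
    V2 = ("single", "double")      # precision
    V3 = ("static", "dynamic")     # allocation

    headers_if_flat  = len(V1) * len(V2) * len(V3) - 1        # 7 (all-defaults is neutral)
    headers_per_part = sum(len(v) - 1 for v in (V1, V2, V3))  # 3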
We shall now give a formal algorithm for implementing a property gram-
mar. For simplicity in exposition, we restrict our consideration to property
grammars whose underlying CFG is in Chomsky normal form.
ALGORITHM 10.3
Table handling for property grammar implementation.
Input. A property grammar G = (N, Σ, P, S, V, v0, F, μ) whose underly-
ing CFG is in Chomsky normal form. We shall assume that a nonneutral
property is never mapped into the neutral property; that is, if μ(p, v1v2) = v0,
then v1 = v2 = v0. [This condition may be easily assumed, because if we
find μ(p, v1v2) = v0 but v1 or v2 is not v0, we could on the right-hand side
replace v0 by v0', a new, nonneutral property, and introduce rules that would

make v0' "look like" v0.] Also, part of the input to this algorithm is a shift-
reduce parsing algorithm for the underlying CFG.
Output. A modified shift-reduce parsing algorithm for the underlying
CFG, which while parsing computes the tables associated with those nodes
of the parse tree corresponding to the symbols on the pushdown list.
Method. Let us suppose that each table has the format of Fig. 10.10,
that is, an intersection list and a list of headers, at most one for each property,
with a tree of indices with that property attached to each header. The opera-
tion of the table mechanism will be described in two parts, depending on
whether a terminal or two nonterminals are reduced. (Recall that the underly-
ing grammar is in Chomsky normal form.)
Part 1: Suppose that a terminal symbol a is shifted onto the pushdown
list and reduced to a nonterminal A. Let the required table for A be [i:v].
To implement this operation, we shall shift A onto the pushdown list directly
and create the table [i:v] for A as follows.
(1) In the entry for A at the top of the pushdown list, place a pointer to
a single property header cell having property v and count 1. This property
header points to an intersection list header with an empty intersection list.
(2) Create C, an index cell for i.
(3) Place a pointer in the first field of C to the property header cell.
(4) Make the second field of C blank.
(5) Place a pointer in the third field of C to the hash table entry for i and
also make that hash table entry point to C.
(6) If there was previously another cell C' which was linked to the hash
table entry for i, place C' on the intersection list of its table. (Specifically,
make the intersection list header point to C' and make the third field in C'
point to the previous first cell of the intersection list if there was one.)
(7) Place a pointer in the fourth field of C to C'.
(8) Make the pointer in the third field of C' point to C.
Part 2: Now, suppose that two nonterminals are reduced to one, say by
production A → BD. Let T1 and T2 be the tables associated with B and D,
respectively. Then do the following to compute T, the table for A.
(1) Consider each index cell on the intersection list of T1. (Recall that T2
has no intersection list.) Each such cell represents an entry for some index i
on both T1 and T2. Find the properties of this index on these tables by
Algorithm 10.4.† Let these properties be v1 and v2. Compute v = μ(p, v1v2),

†Obviously, one can find the property of the index by going from its cells on the two
tables to the roots of the trees on which the cells are found. However, in order that the table
handling as a whole be virtually linear in time, it is necessary that following the path to
the root be done in a special way. This method will be described subsequently in Algorithm
10.4.

where p is production A → BD. Make a list of all index cells on the inter-
section list along with their new properties and the old contents of the cells
on T1 and T2.
(2) Consider the property header cells of T1. Change the property of
the header cell with property v to the property μ(p, vv0). That is, assume
that all indices with property v on T1 have the neutral property on T2.
(3) Consider the property header cells of T2. Change the property of
the header cell with property v to μ(p, v0v). That is, assume that all indices
with property v on T2 have the neutral property on T1.
(4) Now, several of the property header cells formerly belonging to T1
and T2 may have the same property. These are merged by the following
steps, which combine two trees into one:
    (a) Change the property header cell with the smaller count (break
        ties arbitrarily) into a dummy index cell not corresponding to
        any index.
    (b) Make the new index cell point to the property header cell with
        the larger count.
    (c) Adjust the count of the remaining property header cell to be
        the sum of the counts of the two headers plus 1, so that it reflects
        the number of index cells in the tree, including dummy cells.
(5) Now, consider the list of indices created in step (1). For each such
index,
    (a) Create a new index cell C.
    (b) Place a pointer in the first field of C to the property header cell
        with the correct property and adjust the count in that header cell.
    (c) Place pointers in the third field of C to the hash table location for
        that index and from this hash table entry to C.
    (d) Place a pointer in the fourth field of C to the first index cell below
        (on the pushdown list) having the same index.
    (e) Now, consider C1 and C2, the two original cells representing this
        index on T1 and T2. Make C1 and C2 into "dummy" cells by
        preserving the pointers in the first field of C1 and C2 (their links
        to their ancestors in their trees) but by removing the pointers in
        the third and fourth fields (their links to the hash table and to
        cells on other tables having the same index). Thus, the newly
        created index cell C plays the role of the two cells C1 and C2
        that have been made dummy cells.
(6) Dummy cells that are leaves can be returned to available storage. □

Example 10.11

Let us consider the two property tables of Example 10.10 (Fig. 10.11
on p. 826). Suppose that μ(p, st) is given by the following table:

     s     t     μ(p, st)
    v0    v1        v1
    v0    v2        v2
    v0    v3        v3
    v1    v0        v1
    v2    v0        v2
    v2    v1        v3
    v2    v3        v2

In part 2 of Algorithm 10.3 we must first consider the intersection list of
T1, which consists of index cells 2, 5, and 8. (See Fig. 10.11.) Inspection of
the above table shows that these indices will have properties v3, v3, and v2,
respectively, on the new property table T.
Then, we consider the property header cells of T1 and T2. μ(p, v0vi) = vi
and μ(p, viv0) = vi, so the properties in the header cells all remain the
same. Now, we merge the tree for v1 on T1 into the tree for v1 on T2, since
the latter is the larger. The resulting tree is shown in Fig. 10.12(a). Then,
we merge the tree for v2 on T2 into the tree for v2 on T1. The resulting tree is
shown in Fig. 10.12(b).

    [Fig. 10.12 Merged trees: (a) the merged v1 tree; (b) the merged v2 tree.]

It should be observed that in Fig. 10.12(a) node 5


has been made a direct descendant of the header, while in Fig. 10.11 it was
a direct descendant of the node numbered 3. This is an effect of Algorithm
10.4 and occurred when the intersection list of T1 was examined. Node 2 in
Fig. 10.12(b) has been moved for the same reason.
In the last step, we consider the indices on the intersection list of T1.
New index cells are created for these indices; the new cells point directly to
the appropriate header. All other cells for that index in table T are made
dummy cells. Dummy cells with no descendants are then removed. The
resulting table T is shown in Fig. 10.13. The symbol table is not shown.
Note that the intersection list of T is empty.

    [Fig. 10.13 New table T, attached to the pushdown list entry for A.]

Now suppose that an input symbol is shifted onto the pushdown list and
reduced to D[2:v1]. Then the index cell for 2, which points to v3 in Fig.
10.13, is linked to the intersection list of its table. The changes are shown
in Fig. 10.14. □

We shall now give the algorithm whereby we inquire about the property
of index i on table T. This algorithm is used in Algorithm 10.3 to find the
property of indices on the intersection list.

    [Fig. 10.14 Tables after the shift: the new table for D, with the index
    cell for 2 on T now linked onto T's intersection list.]


ALGORITHM 10.4
Finding the property of an index.
Input. An index cell on some table T. We assume that tables are structured
as in Algorithm 10.3.
Output. The property of that index in table T.
Method.
(1) Follow pointers from the index cell to the root of the tree on which
it appears. Make a list of all cells encountered on this path.
(2) Make each cell on the path, except the root itself, point to the root
directly. (Of course, the cell on the path immediately before the root already
does so.) □

Note that it is step (2) of Algorithm 10.4, which is essentially a "side


effect" of the algorithm, that is of significance. It is step (2) which guarantees
that the aggregate time spent on the table handling will be virtually propor-
tional to input length.

10.2.4. Analysis of the Table Handling Algorithm

The remainder of this chapter is devoted to the analysis of the time com-
plexity of Algorithms 10.3 and 10.4. To begin, we define two functions F
and G which will be used throughout this section.
DEFINITION
We define F(n) by the recurrence:

    F(1) = 1
    F(n) = 2^(F(n−1))

The following table shows some values of F(n).

    n      F(n)
    1        1
    2        2
    3        4
    4       16
    5    65536

Now let us define G(n) to be the least integer i such that F(i) ≥ n. G(n)
grows so slowly that it is reasonable to say that G(n) ≤ 6 for all n which are
representable in a single computer word, even in floating-point notation.
Alternatively, we could define G(n) to be the number of times we have to
apply log2 to n in order to produce a number equal to or less than 0.
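A direct transcription of these two definitions (illustrative Python, not from
the text) makes the growth rates easy to check:

    def F(n):
        """F(1) = 1, F(n) = 2 ** F(n - 1)."""
        value = 1
        for _ in range(n - 1):
            value = 2 ** value
        return value

    def G(n):
        """The least integer i such that F(i) >= n."""
        i = 1
        while F(i) < n:
            i += 1
        return i

    # G(65536) == 5 and G(2 ** 65536) == 6, so G(n) <= 6 for any n that
    # fits in a machine word.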

The remainder of this section is devoted to proving that Algorithm 10.3
requires O(nG(n)) steps of a random access computer when the input string is
of length n. We begin by showing that exclusive of the time spent in Algorithm
10.4, the time complexity of Algorithm 10.3 is O(n).
LEMMA 10.3
Algorithm 10.3, exclusive of calls to Algorithm 10.4, can be implemented
to run in time O(n) on a random access computer, where n is the length of
the input string being parsed.
Proof. Each execution of part 1 requires a fixed amount of time, and there
are exactly n such executions.
In part 2, we note that steps (1) and (5) take time proportional to the
length of the intersection list. (Again recall that we are not counting the time
spent in calls of Algorithm 10.4.) But the only way for an index cell to find
its way onto an intersection list is in part 1. Each execution of part 1 places
at most one index cell on an intersection list. Thus, at most n indices are
placed on all the intersection lists of all tables. After execution of part 2,
all index cells are removed from the intersection list, and so the aggregate
amount of time spent in steps (1) and (5) of part 2 is O(n).
The other steps of part 2 are easily seen to require a constant amount of
time per execution of part 2. Since part 2 is executed exactly n − 1 times,
we conclude that all time spent in Algorithm 10.3, exclusive of calls to
Algorithm 10.4, is O(n). □

We shall now define an abstract problem and provide a solution, mirror-
ing the ideas of Algorithms 10.3 and 10.4 in terms that are somewhat more
abstract but easier to analyze.
DEFINITION
For the remainder of this section let us define a set merging problem as
(1) A collection of objects a1, ..., an;
(2) A collection of set names, including A1, A2, ..., An; and
(3) A sequence of instructions I1, I2, ..., Im, where each Ij is of the form
    (a) merge(A, B, C) or
    (b) find(a),
where A, B, and C are set names and a is an object.
(Think of the set names as pairs consisting of a table and a property. Think
of the objects as index cells.)
The instruction merge(A, B, C) creates the union of the sets named A
and B and calls the resulting set C. No output is generated.
The instruction find(a) prints the name of the set of which a is currently
a member.
Initially, we assume that each object ai is in the set Ai; that is, Ai = {ai}.

The response to a sequence of instructions I1, I2, ..., Im is the sequence of
outputs generated when each instruction is executed in turn.

Example 10.12
Suppose that we have objects a1, a2, ..., a6 and the sequence of instruc-
tions

    merge(A1, A2, A2)
    merge(A3, A4, A4)
    merge(A5, A6, A6)
    merge(A2, A4, A4)
    merge(A4, A6, A6)
    find(a3)

After executing the first instruction, A2 is {a1, a2}. After the second instruc-
tion, A4 is {a3, a4}. After the third instruction, A6 is {a5, a6}. Then after the
instruction merge(A2, A4, A4), A4 becomes {a1, a2, a3, a4}. After the last
merge instruction, A6 = {a1, a2, ..., a6}. Then, the instruction find(a3)
prints the name A6, which is the response to this sequence of instructions.

We shall now give an algorithm to execute any instruction sequence of
length O(n) on n objects in O(nG(n)) time. We shall make certain assumptions
about the way in which objects and sets can be accessed. While it may not
initially appear that the property grammar implementation of Algorithms
10.3 and 10.4 meets these conditions, a little reflection will suffice to see that
in fact objects (index cells) are easily accessible at times when we wish to
determine their property and that sets (property headers) are accessible when
we want to merge them.
ALGORITHM 10.5

Computation of the response to a sequence of instructions.

Input. A collection of objects {a1, ..., an} and a sequence of instructions
I1, ..., Im of the type described above.
Output. The response to the sequence I1, ..., Im.
Method. A set will be stored as a tree in which each node represents one
element of the set. The root of the tree has a label which gives
(1) The name of the set represented by the tree and
(2) The number of nodes in that tree (count).
We assume that it is possible to find the node representing an object or
the root of the tree representing a set in a fixed number of steps. One way of
accomplishing this is to use two vectors OBJECT and SET such that
OBJECT(a) is a pointer to the node representing a and SET(A) is a pointer
to the root of the tree representing set A.
Initially, we construct n nodes, one for each object ai. The node for ai
is the root of a one-node tree. Initially, this root is labeled Ai and has a count
of 1.
(1) To execute the instruction merge(A, B, C), locate the roots of the trees
for A and B [via SET(A) and SET(B)]. Compare the counts of the trees
named A and B. The root of the smaller tree is made a direct descendant of
the larger. (Break ties arbitrarily.) The larger root is given the name C, and
its count becomes the sum of the counts of A and B.† Place a pointer in
location SET(C) to the root of C.
(2) To execute the instruction find(a), determine the node representing a
via OBJECT(a). Then follow the path from that node to the root r of its
tree. Print the name found at r. Make all nodes on this path, except r, direct
descendants of r.‡
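The following sketch of Algorithm 10.5 is our own illustration (objects and
set names are renumbered from 0): the trees are kept in a parent array,
merge performs union by count, and find shortens the path it traverses,
exactly as in steps (1) and (2).

    class SetMerging:
        """Illustrative sketch of Algorithm 10.5 (objects and sets numbered from 0)."""

        def __init__(self, n):
            # Initially object i is alone in the set named i (our stand-in for A_{i+1}).
            self.parent = list(range(n))     # parent[i] == i  <=>  node i is a root
            self.count = [1] * n             # number of nodes in the tree, kept at roots
            self.name = list(range(n))       # set name, kept at roots
            self.set_root = list(range(n))   # the SET vector: set name -> root of its tree

        def merge(self, a, b, c):
            ra, rb = self.set_root[a], self.set_root[b]
            if self.count[ra] < self.count[rb]:
                ra, rb = rb, ra              # make ra the root of the larger tree
            self.parent[rb] = ra             # smaller tree becomes a direct descendant
            self.count[ra] += self.count[rb]
            self.name[ra] = c
            self.set_root[c] = ra

        def find(self, x):
            path = []
            while self.parent[x] != x:       # climb to the root
                path.append(x)
                x = self.parent[x]
            for y in path:                   # make every node on the path point at the root
                self.parent[y] = x
            return self.name[x]

    # The sequence of Example 10.12, with a_i and A_i renumbered to i - 1:
    s = SetMerging(6)
    s.merge(0, 1, 1); s.merge(2, 3, 3); s.merge(4, 5, 5)
    s.merge(1, 3, 3); s.merge(3, 5, 5)
    print(s.find(2))                         # prints 5, i.e. the name A_6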

Example 10.13
Let us consider the sequence of instructions in Example 10.12. After
executing the first three merge instructions, we would have three trees, as
shown in Fig. 10.15. The roots are labeled with a set name and a count.

    [Fig. 10.15 Trees after three merge instructions, with roots A2, A4, and A6.]

(The count is not shown.) Then executing the instruction merge(A2, A4, A4),
we obtain the structure of Fig. 10.16. After the final merge instruction
merge(A4, A6, A6), we obtain Fig. 10.17. Then, executing the instruction
find(a3), we print the name A6 and make nodes a3 and a4 direct descendants
of the root. (a4 is already a direct descendant.) The final structure is shown
in Fig. 10.18. □

†The analogy between this step and the merge procedure of Algorithm 10.3 should
be obvious. The discrepancy in the way counts are handled has to do simply with the
question of whether the root is counted or not. Here it is; in Algorithm 10.3 it was not.
‡The analogy to Algorithm 10.4 should be obvious.


    [Fig. 10.16 Trees after the next merge instruction.]

    [Fig. 10.17 Tree after the last merge instruction.]

    [Fig. 10.18 Tree after the find instruction.]

We shall now show that Algorithm 10.5 can be executed in O(nG(n))


time on the reasonable assumption that execution of a merge instruction
requires one unit of time and an instruction of the form find(a) requires time
proportional to the length of the path from the node representing a to the

root. All subsequent results are predicated on this assumption. From this point
on we shall assume that n, the number of objects, has been fixed, and that
the sequence of instructions is of length O(n).
DEFINITION
We define the rank of a node on one of the structures created by
Algorithm 10.5 as follows.
(1) A leaf is of rank 0.
(2) If a node N ever has a direct descendant of rank i, then N is of rank
at least i + 1.
(3) The rank of a node is the least integer consistent with (2).
It may not be immediately apparent that this definition is consistent.
However, if node M is made a direct descendant of node N in Algorithm
10.5, then M will never subsequently be given any more direct descendants.
Thus the rank of M may be fixed at that time. For example, in Fig. 10.17
the rank of node a6 can be fixed at 1 since a6 has one direct descendant of
rank 0 and a6 subsequently acquires no new descendants.
The next three lemmas state some properties of the rank of a node.
LEMMA 10.4
Let N be a root of rank i created by Algorithm 10.5. Then N has at least
2^i descendants.
Proof. The basis, i = 0, is clear, since a node is trivially its own descen-
dant. For the inductive step, suppose node N is a root of rank i. Then, N
must have at some time been given a direct descendant M of rank i − 1.
Moreover, M must have been made a direct descendant of N in step (1) of
Algorithm 10.5, or else the rank of N would be at least i + 1. This implies
that M was then a root so that, by the inductive hypothesis, M has at least
2^(i−1) descendants at that time, and in step (1) of Algorithm 10.5, N has at
least 2^(i−1) descendants at that time. Thus, N has at least 2^i descendants after
the merger. As long as N remains a root, it cannot lose descendants.

LEMMA 10.5
At all times during the execution of Algorithm 10.5, if N has a direct
descendant M, then the rank of N is greater than the rank of M.
Proof. Straightforward induction on the number of instructions ex-
ecuted. □

The following lemma gives a bound on the number of nodes of rank i.


LEMMA 10.6
There are at most n·2^(−i) nodes of rank i.
Proof. The structure created by Algorithm 10.5 is a collection of trees.

Thus, no node is a descendant of two different nodes of rank i. Since there
are n nodes in the structure, the result follows immediately from Lemma
10.4. □

COROLLARY

No node has rank greater than log2 n. □


DEFINITION

With the number of objects n fixed, define groups of ranks as follows.
We say integer i is in group j if and only if

    log2^(j)(n) ≥ i > log2^(j+1)(n),

where log2^(1)(n) = log2 n and log2^(k+1)(n) = log2^(k)(log2(n)). That is, log2^(k)
is the function which applies the log2 function k times. For example,

    log2^(3)(65536) = log2^(2)(16) = log2(4) = 2.

A node of rank r is said to be in rank group j if r is in group j. Since
log2^(k)(F(k)) = 1 and no node has rank greater than log2 n, we note that no
node is in a rank group higher than G(n). For example, if n = 65536, we
have the following rank groups.

    Rank of Node        Rank Group

    0                        5
    1                        4
    2                        3
    3, 4                     2
    5, 6, ..., 16            1

We are now ready to prove that Algorithm 10.5 is of time complexity


O(nG(n)).
THEOREM 10.3
The cost of executing any sequence a of O(n) instructions on n objects is
O(nG(n)).
Proof. Clearly the total cost of the merge instructions in a is O(n) units
of time. We shall account for the lengths of the paths traversed by the find
instructions in a in two ways. In executing a find instruction suppose we
move from node M to node N going up a path. If M and N are in different
rank groups, then we charge one unit of time to the find instruction itself.
We also charge 1 if N is the root. Since there are at most G(n) different rank
groups along any path, no find instruction is charged more than G(n) units.
If, on the other hand, M and N are in the same rank group, and N is not
a root, we charge 1 time unit to node M itself. Note that M must be moved

in this case. By Lemma 10.5, the new direct ancestor of M is of higher rank
than its previous direct ancestor. Thus, if M is in rank group j, M may be
charged at most log2^(j)(n) time units before its direct ancestor becomes one
of a lower rank group. From that time on, M will never be charged; the cost
of moving M will be borne by the find instruction executed, as described in
the paragraph above.
Clearly the charge to all the find instructions is O(nG(n)). To find an upper
bound on the total charge to all the objects we sum over all rank groups the
maximum charge to each node in the group times the maximum number of
nodes in the group. Let gj be the number of nodes in rank group j and cj
the charge to all nodes in group j. Then:

                             log2^(j)(n)
    (10.2.1)    gj  ≤            Σ            n·2^(−k)
                          k = log2^(j+1)(n)

by Lemma 10.6.
The terms of (10.2.1) form a geometric series with ratio 1/2, so their
sum is no greater than twice the first term. Thus gj ≤ 2n·2^(−log2^(j+1)(n)), which
is 2n/log2^(j)(n). Now cj is bounded above by gj·log2^(j)(n), so cj ≤ 2n. Since j
may vary only from 1 to G(n), we see that O(nG(n)) units of time are charged
to nodes. It follows that the total cost of executing Algorithm 10.5 is
O(nG(n)). □

Now we apply our abstract result to property grammars.


THEOREM 10.4
Suppose that the parsing and table-handling mechanism constructed in
Algorithms 10.3 and 10.4 is applied to an input of length n. Also, assume
that the number of inquiries regarding properties of an index in tables is
O(n) and that these inquiries are restricted to the top table on the pushdown
list for which the index has a nonneutral property.† Then the total time
spent in table handling on a random access computer is O(nG(n)).
Proof. First, we must observe that the assumptions of our model apply,
namely that the time needed in Algorithm 10.3 to reach any node which
we want to manipulate is fixed, independent of n. The property headers
(roots of trees) can be reached in fixed time since there are a finite number of
them per table and they are linked. The index cells (nodes for objects) are
directly accessible either from the hash table when we wish to inquire of their
properties (this is why we assume that we inquire only of indices on the top
table) or in turn as we proceed down an intersection list.

†If we think of the typical use of these properties, e.g., when a is reduced to F in G0
and properties of the particular identifier a are desired, we see that the assumption is quite
plausible.

It thus suffices to show that each index and header cell we create can be
modeled as an object node, that there are O(n) of them, and that all manipu-
lations can be expressed exactly as some sequence of merge and find instruc-
tions. The following is a complete list of all the cells ever created.
(1) 2n "objects" correspond to the n header cells and n index cells created
during the shift operation (part 1 of Algorithm 10.3). We can cause the index
cell to point to the header cell by a merge operation.
(2) At most n objects correspond to the new index cells created in step
(5) of part 2 of Algorithm 10.3. These cells can be made to point to the
correct root by an appropriate merge operation.
Thus, there are at most 3n objects (and n in Algorithm 10.5 means 3n
here). Moreover, the number of instructions needed to manipulate the sets
and objects when "simulating" Algorithms 10.3 and 10.4 by Algorithm 10.5
is O(n). We have commented that at most 3n merge instructions suffice to
initialize tables after a shift [(1) above] and to attach new index cells to headers
[(2) above]. In addition, O(n) merge instructions suffice in step (4) of part 2
of Algorithm 10.3 when two sets of indices having the same property are
merged. This follows from the fact that the number of distinct properties is
fixed and that only n − 1 reductions can be made.
Lemma 10.3 implies that O(n) find instructions suffice to account for the
examination of properties of indices on an intersection list [step (1) of part 2
of Algorithm 10.3]. Finally, we assume in the hypothesis of the theorem
that O(n) additional find instructions are needed to determine properties of
indices (presumably for use in translation). If we put all these instructions
in the order dictated by the parser and Algorithm 10.3, we have a sequence
of O(n) instructions. Thus, the present theorem follows from Lemma 10.3
and Theorem 10.3. □

EXERCISES

10.2.1. Let G be the CFG with productions

    E → E + T | E ⊕ T | T
    T → (E) | a

where a represents an identifier, + represents fixed-point addition,
and ⊕ represents floating-point addition. Create from G a property
grammar that will do the following:
(1) Assume that each a has a table with one nonneutral entry.
(2) If node n in a parse tree uses production E → E + T, then
all the a's whose nodes are dominated by n are said to be "used in a
fixed-point addition."

(3) If, as in (2), production E → E ⊕ T is used, all a's dominated
by n are "used in a floating-point addition."
(4) The property grammar must parse according to G, but check
that an identifier which is used in a floating-point addition is not
subsequently (higher up the parse tree) used in a fixed-point addition.
10.2.2. Use the property grammar of Example 10.9 and give a parse tree, if
one exists, for each of the following input strings:
(a) begin
        declare[1:1]
        declare[1:1]
        begin
            a[1:3]
        end
        label[2:2] a[1:3]
        begin
            declare[3:1]
            a[3:3]
        end
        goto[2:4]
    end
(b) begin
        declare[1:1]
        a[1:3]
        begin
            declare[2:1]
            goto[1:4]
        end
    end
DEFINITION
A nondeterministic property grammar is defined exactly as a prop-
erty grammar, except that
(1) The range of μ is the subsets of V, and
(2) There is no requirement as to the existence of a neutral property.
Using the conventions of this section, we say that αATβ ⇒
αX1T1 ··· XkTkβ if A → X1 ··· Xk is some production, say p, and
for each i, T(i) is a member of μ(p, T1(i) ··· Tk(i)). The ⇒* relation
and the language generated are defined exactly as for the deterministic
case.
*10.2.3. Prove that for every nondeterministic property grammar G there is a
property grammar G' (possibly without a neutral property) generating
the same language, in which μ is a function.
10.2.4. Show that if the grammar G of Exercise 10.2.3 has a neutral property
(i.e., all but a finite number of integers must have that property in
each table of each derivation), then grammar G' is a property grammar
with the neutral property.

10.2.5. Generalize Algorithm 10.3 to grammars which are not in CNF. Your
generalized algorithm should have the same time complexity as the
original.
"10.2.6. Let us modify our property grammar definition by requiring that no
terminal symbol have the all-neutral table. If G is such a property
grammar, let L'(G) be (al . . . anlalT1 . . . anTn is in L(G) for some
(not all-neutral) tables T1 . . . . . Tn}. Show that L'(G) need not be a
CFG. Hint: Show that (aibJckli < j < k} can be so generated.
*'10.2.7. Show that for "property grammar" G, as modified in Exercise 10.2.6,
it is undecidable whether L'(G) : ~ , even if the underlying C F G of
G is right-linear.
10.2.8. Let G be the underlying CFG of Example 10.9. Suppose the termi-
nal declare associates one of two properties with an index, either
"declared real" or "declared integer." Define a property grammar on
G such that if the implementation of Algorithm 10.3 is used, the highest
table on the pushdown list having a particular index i with nonneutral
property (i.e., the one with the cell for i pointed to by the hash table
entry for i) will have the currently valid declaration for identifier i as
the property for i. Thus, the decision whether an identifier is real or
integer can be made as soon as its use is detected on the input stream.
10.2.9. Find the output of Algorithm 10.5 and the final tree structure when
given a set of objects {a1, …, a12} and the following sequence of
instructions. Assume that in case of tie counts, the root of Ai becomes
a descendant of the root of Aj if i < j.

merge(A1, A2, A1)
merge(A3, A1, A1)
merge(A4, A5, A4)
merge(A6, A4, A4)
merge(A7, A8, A7)
merge(A9, A7, A7)
merge(A10, A11, A10)
merge(A12, A10, A10)
find(a1)
merge(A1, A4, A1)
merge(A7, A10, A2)
find(a7)
merge(A1, A2, A1)
find(a3)
find(a4)
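
The following Python fragment is our own sketch of the kind of merge/find structure Algorithm 10.5 manipulates, not the algorithm itself; the class and field names are illustrative assumptions. Each set is kept as a tree of objects, and on a merge the root of the set with the smaller count is made a descendant of the other root (ties are broken here by simply putting the first argument's root below, which coincides with the exercise's lower-index rule only for sequences like the one above).

class MergeFind:
    def __init__(self, n):
        # Objects a1, ..., an; initially the set named Ai contains only ai.
        self.parent = {f"a{i}": None for i in range(1, n + 1)}
        self.size = {f"a{i}": 1 for i in range(1, n + 1)}
        self.root = {f"A{i}": f"a{i}" for i in range(1, n + 1)}

    def find(self, obj):
        # Follow parent pointers to the root; the root identifies the set.
        while self.parent[obj] is not None:
            obj = self.parent[obj]
        return obj

    def merge(self, a, b, c):
        # Merge the sets currently named a and b; the result is named c.
        ra, rb = self.root[a], self.root[b]
        if self.size[ra] <= self.size[rb]:
            low, high = ra, rb        # root of the smaller tree goes below
        else:
            low, high = rb, ra
        self.parent[low] = high
        self.size[high] += self.size[low]
        self.root[c] = high

mf = MergeFind(12)
mf.merge("A1", "A2", "A1")
mf.merge("A3", "A1", "A1")
print(mf.find("a3"))    # prints the object at the root of a3's tree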

**10.2.10. Suppose that Algorithm 10.5 were modified to allow either root to be
made a descendant of the other when mergers were made. Show that
the revised algorithm is of time complexity at best O(n log n).

Open Problems
10.2.11. Is Algorithm 10.5 as stated in this book really O(nG(n)) in complexity,
or is it O(n), or perhaps something in between?
10.2.12. Is the revised algorithm of Exercise 10.2.10 of time complexity
O(n log n)?

Research Problem
10.2.13. Investigate or characterize the kinds of properties of identifiers which
can be handled correctly by property grammars.

BIBLIOGRAPHIC NOTES

Property grammars were first defined by Stearns and Lewis [1969]. Answers
to Exercises 10.2.6 and 10.2.7 can be found there. An n log log n method of imple-
menting property grammars is discussed by Stearns and Rosenkrantz [1969].
To our knowledge, Algorithm 10.5 originated with R. Morris and M. D. McIlroy,
but was not published. The analysis of the algorithm is due to Hopcroft and
Ullman [1972a]. Exercise 10.2.10 is from Fischer [1972].
11 CODE OPTIMIZATION

One of the most difficult and least understood problems in the design of
compilers is the generation of "good" object code. The two most common
criteria by which the goodness of a program is judged are its running time
and size. Unfortunately, for a given program it is generally impossible to
ascertain the running time of the fastest equivalent program or the length of
the shortest equivalent program. As mentioned in Chapter 1, we must be
content with code improvement, rather than true optimization when programs
have loops.
Most code improvement algorithms can be viewed as the application of
various transformations on some intermediate representation of the source
program in an attempt to manipulate the intermediate program into a form
from which more efficient object code can be produced. These code improve-
ment transformations can be applied at any point in the compilation process.
One common technique is to apply the transformations to the intermediate
language program that occurs after syntactic analysis but before code gen-
eration.
Code improvement transformations can be classified as being either
machine-independent or machine-dependent. An example of machine-inde-
pendent optimization would be the removal of useless statements from a
program, those which do not in any way affect its output.† Such machine-
independent transformations would be beneficial in all compilers.
Machine-dependent transformations would attempt to transform a pro-
gram into a form whereby advantage could be taken of special-purpose

†Since such statements should not normally appear in a program, it is likely that an
error is present, and thus the compiler ought to inform the user of the uselessness of the
statement.


machine instructions. As a consequence, machine-dependent transformations


are hard to characterize in general, and for this reason they will not be dis-
cussed further here.
In this chapter we shall study various machine-independent transfor-
mations that can be applied to the intermediate programs occurring within
a compiler after syntactic analysis but before code generation. We shall
begin by showing how optimal code can be generated for a certain simple
but important class of straight-line programs. We shall then extend this class
of programs to include loops and examine some of the code improvement
techniques that can be applied to these programs.

11.1. OPTIMIZATION OF STRAIGHT-LINE CODE

We shall first consider a program schema that models a block of code


consisting of a sequence of assignment statements, each of which has the
form A ← f(B1, …, Br), where A and B1, …, Br are scalar variables and f
is a function of r variables for some r. For this restricted class of programs,
we develop a set of transformations and show how these transformations can
be used to find an optimal program under a rather general cost function.
Since the actual cost of a program depends on the nature of the machine
code that will eventually be produced, we shall consider what portions of
the optimization procedure are machine-independent and how the rest de-
pends on the actual machine model chosen.

11.1.1. A Model of Straight-Line Code

We shall begin by defining a block. A block models a portion of an


intermediate language program that contains only multiple-address assign-
ment statements.
DEFINITION

Let Σ be a countable set of variable names and Θ a finite set of operators.
We assume that each operator θ in Θ takes a known fixed number of oper-
ands. We also assume that Θ and Σ are disjoint.
A statement is a string of the form

A ← θB1 ⋯ Br

where A, B1, …, Br are variables in Σ and θ is an r-ary operator in Θ.
We say that this statement assigns (or sets) A and references B1, …, Br.
A block ℬ is a triple (P, I, U), where
(1) P is a list of statements S1; S2; ⋯ ; Sn, where n ≥ 0;
(2) I is a set of input variables; and
(3) U is a set of output variables.

We shall assume that if statement Sj references A, then A is either an
input variable or assigned by some statement before Sj (i.e., by some Si
such that i < j). Thus, in a block all variables referenced are previously
defined, either internally, as assigned variables, or externally, as input vari-
ables. Similarly, we assume each output variable either is an input variable
or is set by some statement.
A typical statement is A ← +BC, which is just the prefix form of the
more common assignment "A ← B + C." If a statement sets variable A,
we can associate a "value" with that assignment of A. This value is the for-
mula for A in terms of the (unknown) initial values of the input variables.
This formula can be written as a prefix expression involving the input vari-
ables and the operators.
For the time being we are assuming that the input variables have unknown
values and should be treated as algebraic unknowns. Moreover, the meaning
of the operators and the set of quantities on which they are defined is not
specified, and so a formula, rather than a quantity, is all that we can expect
as a value.
DEFINITION

To be more precise, let (P, I, U) be a block with P = S1; … ; Sn. We
define vt(A), the value of variable A immediately after time t, 0 ≤ t ≤ n,
to be the following prefix expression:
(1) If A is in I, then v0(A) = A.
(2) If statement St is A ← θB1 ⋯ Br, then
(a) vt(A) = θ vt-1(B1) ⋯ vt-1(Br).
(b) vt(C) = vt-1(C) for all C ≠ A, provided that vt-1(C) is defined.
(3) For all A in Σ, vt(A) is undefined unless defined by (1) or (2)
above.
We observe that since each operator takes a known number of operands,
every value expression is either a single symbol of Σ or can be uniquely
written as θE1 ⋯ Er, where θ is an r-ary operator and E1, …, Er are value
expressions. (See Exercise 3.1.17.)
The value of block ℬ = (P, I, U), denoted v(ℬ), is the set

{vn(A) | A ∈ U, and n is the number of statements of P}

Two blocks are (topologically) equivalent (≡) if they have the same value.
Note that the strings forming prefix expressions are equal if and only if they
are identical. That is, we assume no algebraic identities for the time being.
In Section 11.1.6 we shall study equivalence of blocks under certain alge-
braic laws.

Example 11.1
Let I = {A, B}, let U = {F, G}, and let P consist of the statements†

T ← A + B
S ← A - B
T ← T * T
S ← S * S
F ← T + S
G ← T - S

Initially, v0(A) = A and v0(B) = B. After the first statement, v1(T) =
A + B [in prefix notation v1(T) = +AB], v1(A) = A, and v1(B) = B. After
the second statement, v2(S) = A - B, and other variables retain their previous
values. After the third statement, v3(T) = (A + B) * (A + B). The last three
statements cause the following values to be computed (other values carry over
from the previous step):

v4(S) = (A - B) * (A - B)
v5(F) = (A + B) * (A + B) + (A - B) * (A - B)
v6(G) = (A + B) * (A + B) - (A - B) * (A - B)

Since v6(F) = v5(F), the value of the block is

{(A + B) * (A + B) + (A - B) * (A - B),
(A + B) * (A + B) - (A - B) * (A - B)}‡

Note that we are assuming that no algebraic laws pertain. If the usual laws
of algebra applied, then we could write F = 2(A² + B²) and G = 4AB. □
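
As an informal illustration of these definitions (our own sketch; the tuple representation of statements is an assumption, not the book's notation), the values vt and the value of a block can be computed in one left-to-right pass:

def block_value(statements, inputs, outputs):
    # A statement A <- theta B1 ... Br is the tuple (A, theta, [B1, ..., Br]).
    value = {a: a for a in inputs}              # v0(A) = A for input variables
    for target, op, args in statements:
        # vt(target) = theta vt-1(B1) ... vt-1(Br); other values carry over.
        value[target] = op + " " + " ".join(value[b] for b in args)
    return {value[a] for a in outputs}          # the value of the block

# The block of Example 11.1:
P = [("T", "+", ["A", "B"]), ("S", "-", ["A", "B"]),
     ("T", "*", ["T", "T"]), ("S", "*", ["S", "S"]),
     ("F", "+", ["T", "S"]), ("G", "-", ["T", "S"])]
print(block_value(P, ["A", "B"], ["F", "G"]))
# Prints the two prefix expressions listed in the footnote to Example 11.1.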

To avoid undue complexity, we shall not consider statements involving
structured variables (arrays and so forth) at this time. One way to handle
arrays is the following. If A is an array, treat an assignment such as A(I) ← J
as if A were a scalar variable assigned some function of I, J and its former
value. That is, we write A ← θAIJ, where θ is an operator symbolizing as-
signment in an array. Similarly, an assignment such as J = A(I) could be
expressed as J ← ψAI.

†In displaying a list of statements we often use a new line in place of a semicolon to
separate statements. In examples we shall use infix notation for binary operators.
‡In prefix notation the value of the block is
{+ * +AB +AB * -AB -AB,  - * +AB +AB * -AB -AB}.

In addition, we make several other assumptions which make this theory


less than generally applicable. For example, we ignore test statements, con-
stants, and assignments of the form A ,--- B. However, changing the assump-
tions will lead to a similar theory, and we make our assumptions primarily
for convenience, in order to give an example of this family of theories.
Our principal assumptions are:
(1) The important thing about a block is the set of functions of the input
variables (variables defined outside the block) computed within the block.
The number of times a particular function is computed is not important.
This philosophy stems from the view that the blocks of a program pass
values from one to another. We assume that it is never necessary for a block
to pass two copies of a value to another block.
(2) The variable names given to the functions computed are not impor-
tant. This assumption is not a good one if the block is part of a loop and
the function computed is fed back. That is, if I ← I + 1 is computed, it
would not do to change this computation to J ← I + 1 and then repeat the
block, expecting the computation to be the same. Nevertheless, we make
this assumption because it lends a certain symmetry to the solution and is
often valid. In the Exercises, the reader is asked to make the modifications
necessary to allow certain output values to be given fixed names.
(3) We do not include statements of the form X ← Y. If such a statement
occurred, we could substitute Y for X and delete it anyway, provided that
assumption (2) holds. Again, this assumption lends symmetry to the model,
and the reader is asked to modify the theory to include such statements.

11.1.2. T r a n s f o r m a t i o n s on Blocks

We observe that given two blocks ℬ1 and ℬ2, we can test whether ℬ1
and ℬ2 are equivalent by computing their values v(ℬ1) and v(ℬ2) and deter-
mining whether v(ℬ1) = v(ℬ2). However, there are an infinity of blocks
equivalent to any given block.
For example, if ℬ = (P, I, U) is a block, X is a variable not mentioned
in ℬ, A is an input variable, and θ is any operator, then we can append the
statement X ← θA ⋯ A to P as many times as we choose without changing
the value of ℬ.
Under a reasonable cost function, all equivalent blocks are not equally
efficient. Given a block ℬ, there are various transformations that we can
apply to map ℬ into an equivalent, and possibly more desirable, block ℬ'.
Let 𝒯 be the set of all transformations which preserve the equivalence of
blocks. We shall show that each transformation in 𝒯 can be implemented
by a finite sequence of four primitive transformations on blocks. We shall
then characterize those sequences of transformations which lead to a block
that is optimal under a reasonable cost criterion.

DEFINITION
Let ℬ = (P, I, U) be a block with P = S1; S2; … ; Sn. For notational
uniformity we shall adopt the convention that all members of the input
set I are assigned at a zeroth statement, S0, and all members of the output
set U are referenced at an (n + 1)st statement, Sn+1.
Variable A is active immediately after time t, if
(1) A is assigned by some statement Si;
(2) A is not assigned by statements Si+1, Si+2, …, Sj;
(3) A is referenced by statement Sj+1; and
(4) 0 ≤ i ≤ t ≤ j ≤ n.
If j above is as large as possible, then the sequence of statements Si+1,
Si+2, …, Sj+1 is said to be the scope of statement Si and the scope of this
assignment of variable A. If A is an output variable and not assigned after
Si, then j = n, and U is also said to be in the scope of Si. (This follows
from the above convention; we state it only for emphasis.)
If a block contains a statement S such that the variable assigned in S
is not active immediately after this statement, then the scope of S is null,
and S is said to be a useless statement. Put another way, S is useless if S sets
a variable that is neither an output variable nor subsequently referenced.

Example 11.2
Consider the following block, where α, β, and γ are lists of zero or more
statements:

α
A ← B + C
β
D ← A * E
γ

If A is not assigned in the sequence of statements β or referenced in γ, then
the scope of A ← B + C includes all of β and the statement D ← A * E.
If no statement in γ references D, and D is not an output variable, then the
statement D ← A * E is useless. □

We shall now define four primitive equivalence-preserving transforma-
tions on blocks. We assume that ℬ = (P, I, U) is a block and that P is
S1; S2; ⋯ ; Sn. As before, we assume that S0 assigns all input variables and
that Sn+1 references all output variables. We shall define our transformations
in terms of their effect on this block ℬ. Our first transformation is intuitively
appealing. We remove from a block any input variable or statement that
does not affect an output variable.

Ti" Elimination of Useless Assignments


If statement S i, 0 ~ i ~ n, assigns A, and A is not active after time i, then
(1) If i > 0, S,. can be deleted from P, or
(2) If i = 0, A can be deleted from L

Example 11.3
Let (g = (P, I, U), where I = {A, B, C}, U = {F, G}, and P consists of

F< A+A
G<-F,C
F< -A-+-B
G+----A,B

The second statement is useless, since its scope is null. Thus one application
of T~ maps (B into (B1 -- (P1,/, U), where P~ is

F< -A+A
F< A+B
G+---A*B

In 6~1, the input variable C is now useless, and the first statement in P~ is
also useless. Thus, we can apply transformation T~ twice in succession to
obtain ~2 = (P2, [A, B}, U), where P2 consists of

F< -A+B
G< A,B

Note that (B2 is obtained whether we first remove input variable C or whether
we first remove the first statement in P1. D

A systematic method of eliminating all useless statements from a block
ℬ = (P, I, U) is to determine the set of useful variables (those that are used
directly or indirectly in computing an output) after each statement of the
block, beginning with the last statement in P and working up. Certainly,
Un = U is the set of variables that are useful after the last statement Sn.
Suppose that statement Si is A ← θB1 ⋯ Br and that Ui is the set of
variables useful after Si.
(1) If A ∈ Ui, Si is a useful statement, since the variable A is used to
compute an output variable. Then Ui-1, the set of useful variables after
statement Si-1, is found by replacing A in Ui by the variables B1, …, Br
[i.e., Ui-1 = (Ui - {A}) ∪ {B1, …, Br}].
(2) If A ∉ Ui, then statement Si is useless and can be deleted. In this
case Ui-1 = Ui.
(3) Once we have computed U0, we can remove all input variables in I
which do not appear in U0.
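
A Python transcription of this backward sweep might look as follows (our sketch; statements are again represented as tuples (A, θ, [B1, …, Br]), which is an assumption of the illustration):

def remove_useless(statements, inputs, outputs):
    # Backward sweep: Un = U; keep a statement only if it assigns a currently
    # useful variable, replacing that variable by the statement's operands.
    useful = set(outputs)
    kept = []
    for target, op, args in reversed(statements):
        if target in useful:
            kept.append((target, op, args))
            useful.discard(target)       # Ui-1 = (Ui - {A}) ...
            useful.update(args)          #        ... union {B1, ..., Br}
        # otherwise the statement is useless and is dropped
    kept.reverse()
    return kept, [a for a in inputs if a in useful]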
Our second transformation on blocks merges common expressions as
follows.

T2: Elimination of Redundant Computations

Now let us suppose that ℬ = (P, I, U) is a block in which P is of the form

α
A ← θC1 ⋯ Cr
β
B ← θC1 ⋯ Cr
γ

where none of C1, …, Cr is A or is assigned in a statement of β. Transfor-
mation T2 maps ℬ into ℬ' = (P', I, U'), where P' is

α
D ← θC1 ⋯ Cr
β'
γ'

and
(1) β' is β with all references to A changed to D in the scope of the
explicitly shown A, and
(2) γ' is γ with all references to A and B changed to D in the scopes of
the explicitly shown A and B.
If the scope of A or B extends to Sn+1, then U' is U with A or B changed to
D. Otherwise U' = U.
D can be any symbol which does not change the value of the block.
Any symbol not mentioned in P is suitable, and some symbols of P might
also be usable.

Example 11.4
Suppose that ℬ = (P, {A, B}, {F, G}), where P consists of

S ← A + B
F ← A * S
R ← B + B
T ← A * S
G ← T * R

The second and fourth statements perform redundant computations, so trans-
formation T2 can be applied to ℬ to produce ℬ' = (P', {A, B}, {D, G}), where
P' consists of

S ← A + B
D ← A * S
R ← B + B
G ← D * R

The output set becomes {D, G}. D may be any new symbol or one of the
variables F, A, S, or T. It is easy to check that letting D be B, R, or G changes
the value of the program. □
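
The following Python sketch is ours and is not the single transformation T2 itself: it assumes the block has first been made open by renaming (each statement assigns a distinct new variable, as the renaming transformation defined below permits) and then merges all redundant computations in one pass, recording for each eliminated variable the surviving name that plays the role of D.

def remove_redundant(statements, outputs):
    # On an open block, keep only the first statement computing each value
    # expression; later references to an eliminated variable are renamed to
    # the surviving one.
    expr_of = {}    # assigned variable -> its value expression
    first = {}      # value expression -> variable that first computed it
    rename = {}     # eliminated variable -> surviving variable (the D of T2)
    kept = []
    for target, op, args in statements:
        args = [rename.get(a, a) for a in args]
        key = (op,) + tuple(expr_of.get(a, a) for a in args)
        if key in first:
            rename[target] = first[key]   # a redundant computation; reuse it
        else:
            first[key] = target
            expr_of[target] = key
            kept.append((target, op, args))
    return kept, [rename.get(a, a) for a in outputs]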

T3" Renaming
Clearly, the name of an assigned variable is irrelevant insofar as the value
of a block 63 = (P,/, U) is concerned. Suppose that statement S~ in P is
A ~ 0 B1 .. • Br and that C is a variable that is not active in the scope of Si.
Then we can let ( B ' = ( P ' , / , U'), where P' is P with S~ replaced by
C ~ 0 B1 -.- Br and with all references to A replaced by references to C
in the scope of S~. If U is in the scope of S~, then U' is U with A changed
to C. Otherwise U' = U. Transformation T3 maps 63 into 63'.

Example i l .5
Let 63 = (P, {A, B}, IF}), where P is

T< A,B
T~ T+A
F< T,T

One application of T 3 enables us to change the name of the first assigned


variable from T to S. Thus T 3 maps 63 into 63' ----(P', [A, B}, [F}), where P' is

S~ A,B
T~. S+A
F~ T.T

Note that only the first assignment of T has been replaced by S. [Z]

T4" Flipping
Let (B = ( P , / , U) be a block in which statement S~ is A ~--- 0 B1 .." Br,
statement Si+l is C ~ ¢D1 . . . D s, A is not one of C, D 1 , . . . , D s, and C
is not one of A, Bi, • - . , Br. Then transformation T4 maps the block (B into
6~' = ( P ' , / , U), where P' is P with S t and Si+ ~ interchanged.

Example 11.6
Let (B = (P, [A, B}, IF, G}) in which P is

F.~--- A -k- B
G.~---A . B
T4 can be applied to transform 6~ into (P', [A, B}, (F, G}), where P ' is

G< A.B
F+--A -k- B

However, T4 can not map the block CB1 = (P~, [d, B}, IF, G}), where P1 is

F< ~A+B
G< F,A

into the block (B2 = (P2, {A, B}, IF, G}), where P2 is

G< F,A
F+--A+B

In fact, 6~2 is not even a block, because variable F is used without a previous
definition.

We shall now define certain equivalence relations that reflect the action
of the four transformations defined.
DEFINITION

Let S be a subset of {1, 2, 3, 4}. We say that ℬ1 ⇒_S ℬ2 if one application
of transformation Ti changes ℬ1 into ℬ2, where i is in S. We say ℬ1 ⇔_S ℬ2
if there is a sequence 𝒞0, …, 𝒞n of blocks such that
(1) 𝒞0 = ℬ1.
(2) 𝒞n = ℬ2.
(3) For each i, 0 ≤ i < n, either 𝒞i ⇒_S 𝒞i+1 or 𝒞i+1 ⇒_S 𝒞i.
Thus, ⇔_S is the least equivalence relation containing ⇒_S and reflects
the idea that the transformations can be applied in either direction.

CONVENTION

We shall represent subsets of {1, 2, 3, 4} without braces, so, for exam-
ple, ⇔_{1,2} would be written ⇔_1,2.
We would now like to show that ℬ1 and ℬ2 are equivalent blocks if and
only if there is a sequence of transformations involving only T1 through T4
which transforms ℬ1 into ℬ2. That is, ℬ1 ≡ ℬ2 if and only if ℬ1 ⇔_1,2,3,4 ℬ2.
The "if" portion of this statement is easy to verify. All that is needed is
to show that each transformation individually preserves the value of a
block.
THEOREM 11.1
If ℬ1 ⇒_1,2,3,4 ℬ2, then ℬ1 ≡ ℬ2.
Proof. Exercise. The reader should also show that any new name may
be chosen for D in T2, as stated in the description of that transformation. □
COROLLARY
If ℬ1 ⇔_1,2,3,4 ℬ2, then ℬ1 ≡ ℬ2. □

We shall prove the converse of the corollary to Theorem 11.1 in Section
11.1.4.

11.1.3. A Graphical Representation of Blocks

In this section we shall show that for each block ℬ = (P, I, U) we can
find a directed acyclic graph (dag) D that represents ℬ in a natural way.
Each leaf of D corresponds to one input variable in I and each interior node
of D corresponds to a statement of P. The transformations on blocks con-
sidered in the previous section can then be applied to dags with equal ease.
DEFINITION

Let ℬ = (P, I, U) be a block. We construct a labeled ordered dag,
denoted D(ℬ), from ℬ as follows:
(1) Let P = S1; ⋯ ; Sn.
(2) For each A in I, create a node with label A. At this point in the
algorithm, the node for A is said to be the last definition of A.
(3) For i = 1, 2, …, n, do the following. Let Si be A ← θB1 ⋯ Br.
Create a new node labeled θ, with r directed edges leaving. Order the edges
from the left, and let the jth edge from the left point to the last definition of
Bj, 1 ≤ j ≤ r. The new node labeled θ becomes the last definition of A.
This node corresponds to statement Si in P.
(4) After step (3), those nodes which are the last definition of an output
variable are further labeled "distinguished." We shall circle distinguished
nodes.
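
A direct transcription of this construction into Python (our sketch; the node numbering and field layout are assumptions of the illustration):

def build_dag(statements, inputs, outputs):
    # Nodes are numbered in order of creation; each node is a pair
    # (label, list of children); last_def records the "last definition".
    nodes, last_def = [], {}
    for a in inputs:                         # step (2): one leaf per input
        last_def[a] = len(nodes)
        nodes.append((a, []))
    for target, op, args in statements:      # step (3): one node per statement
        children = [last_def[b] for b in args]   # jth edge -> last definition of Bj
        last_def[target] = len(nodes)
        nodes.append((op, children))
    distinguished = {last_def[a] for a in outputs}   # step (4)
    return nodes, distinguished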

Example 11.7
Let ℬ = (P, {A, B}, {F, G}) be a block, where P consists of the statements

T ← A + B
F ← A * T
T ← B + F
G ← B * T

The dag D(ℬ) is given in Fig. 11.1.

Fig. 11.1  Example of a dag.

Note in Fig. 11.1 that the four statements of ℬ correspond in order to
nodes n1, n2, n3, n4. Also note that the right descendant of n4 is n3 rather
than n1, because when n4 is created, n3 is the last definition of T. □

Each dag represents an equivalence class of ⇔_3,4 in a natural way. That is,
if a block ℬ1 can be transformed into ℬ2 by some sequence of transformations
T3 and T4, then blocks ℬ1 and ℬ2 have the same dag, and conversely. Half
of this assertion is the following lemma, which is left to the reader to check
using the definitions.
LEMMA 11.1
If ℬ1 ⇒_3,4 ℬ2, then D(ℬ1) = D(ℬ2).
Proof. Exercise. □

COROLLARY

If ℬ1 ⇔_3,4 ℬ2, then D(ℬ1) = D(ℬ2). □

The more difficult portion of our assertion is the other direction. For its
proof we need the following definition and lemma.
DEFINITION
A block ℬ = (P, I, U) is said to be open if
(1) No statement in P is of the form A ← θB1 ⋯ Br, where A is in I, and
(2) No two statements in P assign the same variable.
In an open block ℬ = (P, I, U), a distinct variable Xi not in I is assigned
by each statement Si in P. The following lemma states that an open block
can always be created by renaming variables using only transformation T3.
LEMMA 11.2
Let ℬ = (P, I, U) be a block. Then there is an equivalent open block
ℬ' = (P', I, U') such that ℬ ⇔_3 ℬ'.
Proof. Exercise. □

The following theorem shows that two blocks have the same dag if and
only if one block can be transformed into the other by renaming and flipping.
THEOREM 11.2
D(ℬ1) = D(ℬ2) if and only if ℬ1 ⇔_3,4 ℬ2.
Proof. The "if" portion is the corollary to Lemma 11.1. Thus, it suffices
to consider two blocks ℬ1 = (P1, I1, U1) and ℬ2 = (P2, I2, U2) such that
D(ℬ1) = D(ℬ2) = D. Since the dags are identical, the input sets must be
the same, and so we may let I1 = I2 = I. Also, the number of statements in
P1 and P2 must be the same, and so we may suppose P1 = S1; … ; Sn and
P2 = R1; … ; Rn.
Using T3, the renaming transformation, we can construct two open
blocks ℬ'1 = (P'1, I'1, U'1) and ℬ'2 = (P'2, I'2, U'2) having the same set of as-
signed variables, such that
(1) ℬ'1 ⇔_3 ℬ1;
(2) ℬ'2 ⇔_3 ℬ2;
(3) Let P'1 = S'1; … ; S'n and P'2 = R'1; … ; R'n. Then S'i and R'j assign
the same variable if and only if they correspond to the same node of D.
[Observe by the corollary to Lemma 11.1 that D(ℬ'1) = D(ℬ'2) = D.]
In creating the open blocks, we first rename all the variables of ℬ1 and ℬ2
with entirely new names. Then, we can rename again to satisfy condition (3).
Now we shall construct a sequence of blocks 𝒞0, …, 𝒞n such that
(4) 𝒞0 = ℬ'1;
(5) 𝒞n = ℬ'2;
(6) 𝒞i ⇔_4 𝒞i+1 for 0 ≤ i < n;

(7) The statements of 𝒞i are R'1; … ; R'i followed by those statements
among S'1; … ; S'n which do not set variables also assigned by any of
R'1, …, R'i. Clearly D(𝒞i) = D, and statements defining the same variable
in 𝒞i and ℬ'2 create the same node in D.
We begin with 𝒞0 = ℬ'1. Condition (7) is satisfied trivially. Suppose
that we have constructed 𝒞i, i ≥ 0. We can write the list of statements in 𝒞i
as R'1; … ; R'i; S'j1; … ; S'j(n-i), in which the statements S'j1, …, S'j(n-i) satisfy
condition (7). By definition of P'1 and P'2, we can find S'jk which assigns the
same variable as R'i+1. Since S'jk and R'i+1 correspond to the same node of D,
it follows that S'jk references only variables which are in I or assigned by
R'1; … ; R'i, and, in fact, R'i+1 = S'jk. Thus, by repeated application of T4,
we may move S'jk in front of all of S'j1, …, S'j(k-1). The resulting block is
𝒞i+1, and conditions (6) and (7) are easily checked.
When i in condition (7) is equal to n, we obtain condition (5). Thus

ℬ1 ⇔_3 ℬ'1 ⇔_4 ℬ'2 ⇔_3 ℬ2

from which ℬ1 ⇔_3,4 ℬ2 is immediate. □

COROLLARY

If D(ℬ1) = D(ℬ2), then ℬ1 ≡ ℬ2.
Proof. Immediate from Theorems 11.1 and 11.2. □

By the above corollary, we can naturally give a value to a dag, namely
the value of any block having that dag.

Example 11.8
Consider the two blocks ℬ1 = (P1, {A, B}, {F}) and ℬ2 = (P2, {A, B}, {F}),
with P1 and P2 as follows:

        P1                  P2

C ← A * A           C ← B * B
D ← B * B           D ← A * A
E ← C - D           E ← D + C
F ← C + D           C ← D - C
F ← E / F           F ← C / E

Blocks ℬ1 and ℬ2 have the same dag, which is shown in Fig. 11.2. Using T3,
we can map ℬ1 and ℬ2 into open blocks ℬ'1 = (P'1, {A, B}, {X5}) and
ℬ'2 = (P'2, {A, B}, {X5}) so that condition (3) in the proof of Theorem 11.2
is satisfied. P'1 and P'2 are shown below.

Fig. 11.2  Dag for ℬ1 and ℬ2.

        P'1                 P'2

X1 ← A * A          X2 ← B * B
X2 ← B * B          X1 ← A * A
X3 ← X1 - X2        X4 ← X1 + X2
X4 ← X1 + X2        X3 ← X1 - X2
X5 ← X3 / X4        X5 ← X3 / X4

Then, beginning with block 𝒞0 having the list of statements P'1, we can readily
construct the blocks 𝒞1, 𝒞2, 𝒞3, 𝒞4, and 𝒞5 in the proof of Theorem 11.2.
Block 𝒞1 is obtained by using T4 to move the second statement in front of
the first, as shown below:

        𝒞1                  𝒞3

X2 ← B * B          X2 ← B * B
X1 ← A * A          X1 ← A * A
X3 ← X1 - X2        X4 ← X1 + X2
X4 ← X1 + X2        X3 ← X1 - X2
X5 ← X3 / X4        X5 ← X3 / X4

Then 𝒞2 = 𝒞1. Block 𝒞3 is constructed from 𝒞2, using T4 to move the fourth
statement in front of the third as shown above, and 𝒞4 and 𝒞5 are both the
same as 𝒞3. □

11.1.4. Characterization of Equivalences Between Blocks

We shall now show that ℬ1 ≡ ℬ2 if and only if ℬ1 ⇔_1,2,3,4 ℬ2. In fact there
is a stronger result, namely that ℬ1 ≡ ℬ2 if and only if ℬ1 ⇔_1,2 ℬ2. That is,
transformations T1 and T2 are sufficient to map any block into any other
equivalent block. We shall leave the proof of this stronger result for the
Exercises. (See Exercises 11.1.9 and 11.1.10.)
DEFINITION

A block ℬ is reduced if there is no block ℬ' such that ℬ ⇒_1,2 ℬ'.
A reduced block contains no useless statements or redundant compu-
tations. Given any block ℬ, we can find a reduced block equivalent to it by
repeatedly applying T1 and T2 in their forward directions. Since each appli-
cation of T1 or T2 reduces the length of the block, we must eventually come
upon a reduced block. Our goal is to show that for reduced blocks ℬ1 and ℬ2,
we have ℬ1 ≡ ℬ2 if and only if D(ℬ1) = D(ℬ2). Thus, given a block ℬ,
we can find one dag corresponding to all reduced blocks obtainable from ℬ
by any sequence of transformations T1 and T2 whatsoever. Finding this
dag is an important step in the "optimization" of the block, whatever machine
model we are using.
DEFINITION

Let P = S1; ⋯ ; Sn be the list of statements in a block. Let E(P) be the
set of expressions computed by P. Formally,

E(P) = {vt(A) | St assigns A, 1 ≤ t ≤ n}.

Expression η is computed k times by P if there are exactly k distinct
values of t such that vt(A) = η and St sets A.
LEMMA 11.3
If ℬ = (P, I, U) is a reduced block, then P does not compute any expres-
sion more than once.
Proof. If two statements compute the same expression, find the "first"
instance of an expression computed twice. That is, if Si and Sj, i < j, com-
pute the same expression η, we say that (i, j) is the first instance if for all
pairs Sk and Sl, k < l, which compute the same expression, either i < k or
both i = k and j < l. It is left for the Exercises to show that T2 would then
be applicable in the forward direction to Si and Sj, contradicting the
assumption that P is reduced. □

LEMMA 11.4

If ℬ1 = (P1, I1, U1) and ℬ2 = (P2, I2, U2) are equivalent reduced blocks,
then E(P1) = E(P2).
Proof. If E(P1) ≠ E(P2), we may, without loss of generality, let η be the
last computed expression in E(P1) - E(P2). Since v(ℬ1) = v(ℬ2) and each
expression may be uniquely split into subexpressions, it follows that η is not
a subexpression of any expression in v(ℬ1). Thus, the statement computing
η in P1 is useless and can be eliminated using transformation T1, contradicting
the assumption that ℬ1 was reduced. Details are left for the Exercises. □

THEOREM 11.3

Let ℬ1 and ℬ2 be two reduced blocks. Then ℬ1 ≡ ℬ2 if and only if
D(ℬ1) = D(ℬ2).
Proof. The "if" portion is a special case of the corollary to Theorem
11.2. Thus, let ℬ1 ≡ ℬ2. By the previous two lemmas, there is a one-to-one
correspondence between those statements of ℬ1 and ℬ2 which compute the
same expression.
Suppose that D(ℬ1) ≠ D(ℬ2). We shall attempt to "match" the nodes
of D(ℬ1) and D(ℬ2) as far "up" the dag as possible. Clearly, the leaves of
the two dags must match, for if not, the input sets of ℬ1 and ℬ2 would be
different. We could then find an input variable of one which was not refer-
enced and apply T1 to eliminate that input variable. Since ℬ1 and ℬ2 are
reduced, we would have a contradiction.
We proceed to match nodes; if a node of D(ℬ1) and a node of D(ℬ2)
have the same (operator) label, if their edges leaving are equal in number,
and if corresponding edges (from the left) point to matching nodes, then
the two nodes in question are matched. If in so doing we match all nodes of
D(ℬ1) and D(ℬ2), then these dags are the same.
Otherwise, we shall come to some node of D(ℬ1) or D(ℬ2) which does
not match a node in the other dag. Without loss of generality, we can assume
that such a node occurs in D(ℬ1) and that we pick the "lowest" such node,
one such that each edge leaving it points to a node which is matched. Let
this node be n1. We observe that matched nodes of D(ℬ1) and D(ℬ2) are
created by statements of ℬ1 and ℬ2 which compute the same expression.
An easy induction on the order of matching shows this.
However, by Lemma 11.3, no node can possibly be matched with two
nodes of the other dag. By Lemma 11.4, there is a node n2 of D(ℬ2) which
is created by a statement of ℬ2 which computes the same expression as the
statement of ℬ1 which creates n1. Since expressions "parse" uniquely,
the direct descendants of n1 and n2 are matched. This follows from our
assumption that n1 was as "low" on D(ℬ1) as possible. Thus, n1 and n2

could have been matched, contrary to hypothesis. Hence, D(ℬ1) = D(ℬ2). □

COROLLARY
All reduced blocks equivalent to a given block have the same dag. □
We can now put the various pieces together to obtain the result that
the four transformations are sufficient to transform a block into any of its
equivalents.
THEOREM 11.4
ℬ1 ≡ ℬ2 if and only if ℬ1 ⇔_1,2,3,4 ℬ2.
Proof. The "if" portion is the corollary to Theorem 11.1. Conversely,
assume that ℬ1 ≡ ℬ2. Then there exist reduced blocks ℬ'1 and ℬ'2 such that
ℬ1 ⇔_1,2 ℬ'1 and ℬ2 ⇔_1,2 ℬ'2. By the corollary to Theorem 11.1, ℬ1 ≡ ℬ'1 and
ℬ2 ≡ ℬ'2. Thus, ℬ'1 ≡ ℬ'2. By Theorem 11.3, D(ℬ'1) = D(ℬ'2). By Theorem
11.2, ℬ'1 ⇔_3,4 ℬ'2. Hence, ℬ1 ⇔_1,2,3,4 ℬ2. □

11.1.5. Optimization of Blocks

Let us now consider the question of transforming a block ℬ into a block
ℬ' which is optimal with respect to some cost criterion on blocks. In practice,
we have the situation portrayed in Fig. 11.3. Given a block ℬ, we want
to ultimately produce an object language program that is optimal with
respect to some cost function on object programs such as program size or
execution speed. Our optimizer applies a sequence of transformations to
ℬ in order to produce ℬ', a block equivalent to ℬ, from which an optimal
object language program can be generated. Thus, one problem is to find
some cost criterion on blocks that mirrors the cost of the object program
which will ultimately be produced.

Fig. 11.3  Optimization scheme: the optimizer maps the block ℬ into an
equivalent block ℬ', from which the code generator produces the object code.

There are certain cost criteria on blocks for which the idea of optimiza-
tion does not even make sense. For example, if we said that the longer a block
is, the better it is, then there would be no optimal block equivalent to a given
block. Here we shall restrict our thinking to cost functions on blocks that
reflect most of the common criteria applied to object language programs,
such as speed of execution or amount of storage used.

DEFINITION

A cost criterion on blocks is a function from blocks to real numbers.
A block ℬ is optimal under cost criterion C if C(ℬ) ≤ C(ℬ') for all ℬ'
equivalent to ℬ. A cost criterion C is reasonable if ℬ1 ⇒_1,2 ℬ2 implies that
C(ℬ2) ≤ C(ℬ1), and every block has an optimal equivalent under C. That
is, a cost criterion is reasonable if transformations T1 and T2 applied in
the forward direction do not increase the cost of a block.
LEMMA 11.5
If C is a reasonable cost criterion, then every block has a reduced equiva-
lent which is optimal under C.
Proof. Immediate from definitions. □

Lemma 11.5 states that given a block ℬ we can confine our search for
an equivalent optimal block to the set of reduced blocks equivalent to ℬ.
The following lemma states that only reduced blocks equivalent to a given
reduced block ℬ will be found by applying a sequence of transformations
T3 and T4 to ℬ.
LEMMA 11.6
If ℬ1 is a reduced block and ℬ1 ⇔_3,4 ℬ2, then ℬ2 is reduced.
Proof. Exercise. □

Our next result shows that if we have an open block initially, then a se-
quence of renamings followed by a flip can be replaced by the flip followed
by the renamings.
LEMMA 11.7
Let ℬ1 be an open block and ℬ1 ⇔_3 ℬ2 ⇒_4 ℬ3. Then there exists a block
ℬ such that ℬ1 ⇒_4 ℬ ⇔_3 ℬ3.
Proof. Exercise. □

We are now prepared to give a general framework for optimizing blocks
according to any reasonable cost criterion. The following theorem provides
the basis for this optimization.
THEOREM 11.5
Let ℬ be any block. There exists a block ℬ' equivalent to ℬ such that if
C is any reasonable cost criterion, then there also exist blocks ℬ1 and ℬ2
such that
(1) ℬ' ⇔_4 ℬ1,
(2) ℬ1 ⇔_3 ℬ2, and
(3) ℬ2 is optimal under C.

Proof. Let ℬ'' be any reduced block equivalent to ℬ. We can transform
ℬ'' into ℬ', an open block equivalent to ℬ'', using only T3. By Lemma 11.6,
ℬ' is reduced as well as open.
Let ℬ2 be an optimal reduced block equivalent to ℬ. By Lemma 11.5,
ℬ2 exists. Thus, D(ℬ2) = D(ℬ') by the corollary to Theorem 11.3. By
Theorem 11.2, ℬ' ⇔_3,4 ℬ2. We observe that T3 and T4 are their own "inverses,"
that is, 𝒞 ⇒_3,4 𝒞' if and only if 𝒞' ⇒_3,4 𝒞. Hence, we can find a sequence of
blocks 𝒞1, …, 𝒞n such that ℬ' = 𝒞1, ℬ2 = 𝒞n, and 𝒞i ⇒_3,4 𝒞i+1 for 1 ≤ i < n.
Using Lemma 11.7 iteratively, we can move all uses of T4 ahead of those of
T3. Thus, we can find ℬ1 such that ℬ' ⇔_4 ℬ1 ⇔_3 ℬ2. □

If we examine Theorem 11.5, we see that it divides the optimization pro-
cess into three stages. Suppose that we wish to optimize a given block ℬ:
(1) From ℬ we can first eliminate redundant and useless computations
and rename variables to obtain a reduced open block ℬ'.
(2) In ℬ' we can then reorder statements by flipping, until a block ℬ1 is
obtained in which the statements are in the best order.
(3) Finally we can rename variables in ℬ1 until an optimal block ℬ2 is
found.
We note that step (1) can be performed efficiently (as a function of block
length). It is left to the reader to give an algorithm for step (1) which takes
time O(n log n) on a block having n statements.
Often, one of steps (2) and (3) is trivial. Our next example shows how
statements in our intermediate language can be converted to assembly lan-
guage in such a way that the number of assembly language instructions
executed is minimized. This optimization algorithm will be seen not to need
step (3). Renaming of variables will not subsequently affect the cost.

Example 11.9
We shall now take an example that has some interesting ideas not found
elsewhere in the book. The reader is urged to examine it closely. Let us
consider generating machine code for blocks. We postulate a computer
with a single accumulator and the following assembly language instructions
with meanings as shown.
(1) LOAD M. Here the contents of memory location M are loaded into
the accumulator.
(2) STORE M. Here the contents of the accumulator are stored into
memory location M.
(3) θ M2, M3, …, Mr. Here θ is the name of an r-ary operator. The first
argument of θ is in the accumulator, the second in memory location M2,
the third in memory location M3, and so forth. The result obtained by apply-
ing θ to its arguments is placed in the accumulator.

A code generator would translate a statement of the form A ← θB1 ⋯ Br
into the following sequence of machine instructions:

LOAD B1
θ B2, …, Br
STORE A

However, if the value of B1 is already in the accumulator (i.e., the previous
statement assigned B1), then the first LOAD instruction need not be gener-
ated. Likewise, if the value of A is not required, except as the first argument
of the next statement, then the final STORE instruction is not necessary.
The cost of the statement A ← θB1 ⋯ Br can thus be 1, 2, or 3. It is 3
if B1 is not found in the accumulator and there is a subsequent reference
to this assignment of A that is not the first argument of the next statement
(i.e., A has to be stored). It is 1 if B1 is already in the accumulator and there
is no reference to this computation of A other than as the first argument of
the next statement. Otherwise, the cost is 2.
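
The cost of a block under this model can be computed mechanically; the following Python sketch (ours, not part of the example) generates the instruction sequence for a block under exactly the two omissions just described, so the cost is the length of the generated list. Operator names are assumed here to double as assembly mnemonics, which is a convention of the illustration only.

def needs_store(statements, outputs, i):
    # True unless every use of the value assigned at position i is as the
    # first operand of statement i + 1 (so it can stay in the accumulator).
    target = statements[i][0]
    for j in range(i + 1, len(statements)):
        t, _, args = statements[j]
        for k, b in enumerate(args):
            if b == target and not (j == i + 1 and k == 0):
                return True            # referenced later, from memory
        if t == target:
            return False               # value killed before any other use
    return target in outputs           # survives to the end; store if an output

def gen_code(statements, outputs):
    code, acc = [], None    # acc = variable whose value is in the accumulator
    for i, (target, op, args) in enumerate(statements):
        if args[0] != acc:
            code.append("LOAD " + args[0])
        code.append((op + " " + ", ".join(args[1:])).rstrip())
        acc = target
        if needs_store(statements, outputs, i):
            code.append("STORE " + target)
    return code             # the cost of the block is len(code)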
We should point out that this cost assumption glosses over a number of
considerations. To show that it correctly reflects the number of instructions
needed to execute the block on our machine, we should first rigorously
define the effect of a sequence of assembly instructions. If this is done in
the expected way, then every assembly language program can be related to
a block in our intermediate language by identifying assembly instructions
of type (3), the operations, with statements of the block. All these details
are left for the Exercises. In this example we shall take the cost function on
blocks to be as we have stated it.
Let us consider the block ℬ1 = (P1, {A, B, C}, {F, G}), which might be
obtained from the FORTRAN statements

F = (A + B) * (A - B)
G = (A - B) * (A - C) * (B - C)

The list of statements in P1 is

T ← A + B
S ← A - B
F ← T * S
T ← A - B
S ← A - C
R ← B - C
T ← T * S
G ← T * R

There are no useless statements. However, we note one instance of redun-
dancy, between the second and fourth statements. We can eliminate this
redundancy and then give each statement a new variable name to assign,
obtaining the reduced open block ℬ2 = (P2, {A, B, C}, {X3, X7}). ℬ2 plays the
role of ℬ' in Theorem 11.5. The statements of P2 are

X1 ← A + B
X2 ← A - B
X3 ← X1 * X2
X4 ← A - C
X5 ← B - C
X6 ← X2 * X4
X7 ← X6 * X5

The dag for ℬ2 is shown in Fig. 11.4. Node ni is created from the state-
ment of P2 which sets Xi.
We observe that there are a large number of programs into which ℬ2
can be transformed using only T4. We leave it for the Exercises to show
that this number is the same as the number of linear orders of which the
partial order represented by Fig. 11.4 is a subset.
An upper bound on that number would be 7!, the number of permuta-
tions of the seven statements. However, the actual number will be less in
this case, as not all statements of P2 can ever pass over each other by using
T4. For example, the third statement of P2 must always follow the second,
because the third references X2 and the second defines it. Note that an appli-
cation of T3 may change the name of X2 but that the same relation will hold
with a new name.
Another interpretation of the limits on T4's ability to reorder the block
is to observe that in any such reordering, each node of D(ℬ2) will correspond
to some statement. The statement corresponding to an interior node n
cannot precede any statement corresponding to an interior node which is
a descendant of node n.
While the problem of this example is simple enough to enumerate all
linear orderings of P2, we cannot afford the time to do this for an arbitrary
block. Some heuristic that will produce good, although not necessarily
optimal, orderings quickly is needed. We propose one here. The following
algorithm produces a linear ordering of the nodes of a dag. The desired block
has statements corresponding to these nodes in reverse order. We express
the algorithm as follows:

Fig. 11.4  Dag for ℬ2.

(1) We construct a list L. Initially, L is empty.


(2) Choose a node n of the dag such that n is not on L, and if there are
any edges entering n, they come from nodes already on L. Add n to L. If no
such n exists, end.
(3) If n1 is the last node added to L, the leftmost edge leaving n1 points
to an interior node n not in L, and all of n's direct ancestors are already in L,
add n to L and repeat step (3). Otherwise go to step (2).
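
In Python the heuristic might be sketched as follows (our illustration; it uses the (label, children) node representation from the dag-construction sketch earlier, adds only interior nodes to L since these correspond to statements, and makes an arbitrary choice in step (2)):

def order_nodes(nodes):
    # nodes is the (label, children) list from build_dag above;
    # interior nodes (nonempty children) correspond to statements.
    ancestors = [[] for _ in nodes]
    for n, (_, children) in enumerate(nodes):
        for c in children:
            ancestors[c].append(n)           # n is a direct ancestor of c
    on_list = [False] * len(nodes)
    L = []

    def eligible(n):
        return not on_list[n] and all(on_list[a] for a in ancestors[n])

    while True:
        # step (2): choose any interior node all of whose ancestors are on L
        candidates = [n for n in range(len(nodes)) if nodes[n][1] and eligible(n)]
        if not candidates:
            return L     # statements in reverse order of L give the new block
        n = candidates[0]                    # an arbitrary (here: first) choice
        while True:
            L.append(n)
            on_list[n] = True
            children = nodes[n][1]           # step (3): follow the leftmost edge
            if children and nodes[children[0]][1] and eligible(children[0]):
                n = children[0]
            else:
                break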
For example, using the dag of Fig. 11.4, we might begin with L = n3.
By step (3), we would add n1 to L. Then we would choose n7, add it to L, and
follow it by n6 and n2. Two more uses of rule (2) would add n4 and n5, so
a candidate for L is n3, n1, n7, n6, n2, n4, n5. Recalling that the statement
assigning Xi creates node ni and that the list L corresponds to the statements
in reverse, we obtain the block ℬ3 = (P3, {A, B, C}, {X3, X7}). It is easy to
check that ℬ2 ⇔_4 ℬ3. The list of statements in P3 is

X5 ← B - C
X4 ← A - C
X2 ← A - B
X6 ← X2 * X4
X7 ← X6 * X5
X1 ← A + B
X3 ← X1 * X2

The assembly language programs obtained from ℬ2 and ℬ3 are shown
in Fig. 11.5.

LOAD A          LOAD B
ADD B           SUBTR C
STORE X1        STORE X5
LOAD A          LOAD A
SUBTR B         SUBTR C
STORE X2        STORE X4
LOAD X1         LOAD A
MULT X2         SUBTR B
STORE X3        STORE X2
LOAD A          MULT X4
SUBTR C         MULT X5
STORE X4        STORE X7
LOAD B          LOAD A
SUBTR C         ADD B
STORE X5        MULT X2
LOAD X2         STORE X3
MULT X4
MULT X5
STORE X7
(a) From ℬ2    (b) From ℬ3

Fig. 11.5  Assembly language programs.

11.1.6. Algebraic Transformations

In many programming languages certain algebraic laws are known to
hold among some operators and operands. These algebraic laws can often
be used to reduce the cost of a program in a manner which would not be
possible using only the four topological transformations hitherto considered.
Some useful, common algebraic laws are the following:
(1) A binary operator θ is commutative if α θ β = β θ α for all expres-
sions α and β. Integer addition and multiplication are examples of commu-
tative operators.†

†However, care must be exercised if the operands of a commutative operator are func-
tions with side effects. For example, f(x) + g(y) may not be equal to g(y) + f(x) if the
function f alters the value of y.

(2) A binary operator θ is associative if α θ (β θ γ) = (α θ β) θ γ for all
α, β, and γ. For example, addition is associative because

α + (β + γ) = (α + β) + γ.†

(3) A binary operator θ1 distributes over a binary operator θ2 if
α θ1 (β θ2 γ) = (α θ1 β) θ2 (α θ1 γ). For example, multiplication distributes
over addition because α * (β + γ) = α * β + α * γ. The same caveats as
for (1) and (2) also apply here.
(4) A unary operator θ is a self-inverse if θθα = α for all α. For exam-
ple, Boolean not and unary minus are self-inverses.
(5) An expression e is said to be an identity under a (binary) operator
θ if e θ α = α θ e = α for all α. Some common examples of identity expres-
sions are
(a) The constant 0 is an identity under addition. So is any expression
which has the value 0, such as α - α, α * 0, (-α) + α, and so
forth.
(b) The constant 1 is a multiplicative identity.
(c) The Boolean constant true is a conjunctive identity.
(That is, α and true = α for all α.)
(d) The Boolean constant false is a disjunctive identity.
(That is, α or false = α for all α.)
If 𝒜 is a set of algebraic laws, we say that expression α is equivalent to
expression β under 𝒜, written α ≡_𝒜 β, if α can be transformed into β using
the algebraic laws in 𝒜.

Example 11.10
Suppose that we have the expression

A * (B * C) + (B * A) * D + A * E

Using the associative law of *, we can write A * (B * C) as (A * B) * C. Using
the commutative law for *, we can write B * A as A * B. Then using the dis-
tributive law, we can write the entire expression as

(A * B) * (C + D) + A * E

Finally, applying the associative law to the first term and then the distributive
law, we can write the expression as

A * (B * (C + D) + E)

†One must also use this transformation with care. For example, suppose x is very
much larger than y, z = -x, and floating-point calculation is done. Then (y + x) + z
may give 0 as a result, while y + (x + z) gives y as an answer.

Thus, this expression is equivalent to the original under the associative,
commutative, and distributive laws for + and *. However, this final expres-
sion can be evaluated using two multiplications and two additions, while
the original expression required five multiplications and two additions. □

We can extend the definition of equivalence under a set of algebraic laws
𝒜 to blocks. We say that blocks ℬ1 and ℬ2 are equivalent under 𝒜, written
ℬ1 ≡_𝒜 ℬ2, if for each expression α in v(ℬ1) there is an expression β in v(ℬ2)
such that α ≡_𝒜 β, and conversely.
Each algebraic law induces a corresponding transformation on blocks
(and dags).

Example 11.11
If + is commutative, then the transformation on blocks correspond-
ing to this algebraic law would allow us to replace a statement of the form
X ← A + B in a block by the statement X ← B + A.
The associated transformation on dags would allow us to interchange
the order of the two edges leaving a node labeled +, anywhere within a dag. □

Example 11.12
Let us consider the transformation on blocks corresponding to the
associative law for +. Here we can replace a sequence of two statements
of the form

X ← B + C
Y ← A + X

by the three statements

X ← B + C
X' ← A + B
Y ← X' + C

where X' is a new variable. The analogous transformation on dags replaces
the subdag computing A + (B + C) at the node for Y by a subdag computing
(A + B) + C.

Note that we preserve the statement X ← B + C, because the variable
X may be referenced by some later statement. However, if the statement
X ← B + C is useless after the transformation, then we may remove
this statement using transformation T1. If, in addition, the statement
X' ← A + B can be removed by T2, we have used the associative law to
advantage. (See Exercise 11.1.17.) □

Given a finite set of algebraic laws and the corresponding transformations
on blocks, we would like to use these in conjunction with the four topological
transformations of Section 11.1.2 on a given block to find an optimal
equivalent block. Unfortunately, for a particular set of algebraic laws, there
may be no effective way of applying these transformations to find an opti-
mal block.
The approach usually taken is to apply algebraic transformations in
limited ways, in the hope of doing most of the possible "simplification" of
expressions and of producing as many common subexpressions as possible.
A typical scheme would uniformly replace β θ α by α θ β if θ were a com-
mutative binary operator and α preceded β under some lexicographic order-
ing of variable names. If θ were an associative and commutative binary
operator, then α1 θ α2 θ ⋯ θ αn would be transformed by ordering the names
α1, …, αn lexicographically and grouping from the left.
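
For a single commutative and associative operator, such a canonicalization might be sketched as follows (our illustration; expressions here are variable names or nested pairs under that one operator, which is an assumption of the sketch, not a construction from the text):

def canonicalize(e, op="+"):
    # e is a variable name (a string) or a pair (op, left, right) built from
    # the single commutative, associative operator op.
    if isinstance(e, str):
        return e
    operands, stack = [], [e]
    while stack:                  # flatten the nested applications of op
        x = stack.pop()
        if isinstance(x, tuple):
            stack.append(x[1])
            stack.append(x[2])
        else:
            operands.append(x)
    operands.sort()               # order the operand names lexicographically
    result = operands[0]
    for x in operands[1:]:        # group from the left
        result = (op, result, x)
    return result

# B + (A + C) and (C + B) + A get the same canonical form (A + B) + C:
print(canonicalize(("+", "B", ("+", "A", "C"))) ==
      canonicalize(("+", ("+", "C", "B"), "A")))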
We conclude this section with an example that illustrates the possible
effect of algebraic transformations on blocks.

Example 11.13
Consider the block ℬ = (P, I, {Y}), where I = {A, B, C, D, E, F} and
P is the following sequence of statements:

X1 ← B - C
X2 ← A * X1
X3 ← E * F
X4 ← D * X3
Y ← X2 * X4

ℬ computes the expression

Y = (A * (B - C)) * (D * (E * F))

The dag for ℬ is shown in Fig. 11.6.

Fig. 11.6  Dag for ℬ.

Suppose that we wish to generate an assembly code program for ℬ where
we are using the assembly code and cost function of Example 11.9 (p. 863).
If we generate assembly code directly from ℬ, the resulting assembly lan-
guage program would have a cost of 15.
Now let us suppose that * is a commutative and associative operator and
that we wish to find an optimal block for ℬ that is equivalent to ℬ under
the associative and commutative law for *. We shall apply to ℬ the algebraic
transformations corresponding to the two algebraic laws for *. Our goal

in applying these transformations will be to try to obtain a sequence of
statements in which intermediate results can be immediately used by the
following instruction without being stored.
Assuming that * is associative, in ℬ we can replace the two statements

X3 ← E * F
X4 ← D * X3

by the three statements

X3 ← E * F
X'3 ← D * E
X4 ← X'3 * F

Now the statement X3 ← E * F is useless and can be deleted by transfor-
mation T1. Then using the associative transformation, we can replace the
statements

X4 ← X'3 * F
Y ← X2 * X4

by the statements

X4 ← X'3 * F
X'4 ← X2 * X'3
Y ← X'4 * F

The statement X4 ← X'3 * F is now useless and can be deleted. At this point
we have the statements

X1 ← B - C
X2 ← A * X1
X'3 ← D * E
X'4 ← X2 * X'3
Y ← X'4 * F

Now if we apply the associative transformation once more to the third and
fourth statements, we obtain (after deleting the resulting useless statement)
the block

X1 ← B - C
X2 ← A * X1
X''3 ← X2 * D
X'4 ← X''3 * E
Y ← X'4 * F

Finally, if we assume that * is commutative, we can permute the operands
of the second statement to obtain the following block ℬ':

X1 ← B - C
X2 ← X1 * A
X''3 ← X2 * D
X'4 ← X''3 * E
Y ← X'4 * F

The dag for ℬ' is shown in Fig. 11.7. ℬ' has a cost of 7, the lowest possible
cost for a block equivalent to ℬ under these laws. In the next section we shall
give a systematic method for optimizing arithmetic expressions using the
associative and commutative algebraic laws.

Fig. 11.7  Dag for ℬ'.



EXERCISES

11.1.1. Let ℬ = (P, {A, B, C}, {F, G}) be a block in which P is

T ← A + B
R ← A * T
S ← B + C
F ← R * S
T ← A * A
R ← A + B
S ← A * R
G ← S + T

(a) What is v(ℬ)?
(b) Indicate the scope of each statement in P.
(c) Does P have any useless statements?
(d) Transformation T2 is applicable to the first and sixth statements.
What values may D (as defined on p. 851) take in this application
of T2?
(e) Draw a dag for ℬ.
(f) Find an equivalent reduced block for ℬ.
(g) How many different reduced blocks are equivalent to ℬ except for
renaming? (More technically, let ℬ' be an open reduced block
equivalent to ℬ. What is the cardinality of {ℬ'' | ℬ'' ⇔_4 ℬ'}?)
(h) Find a block equivalent to ℬ that is optimal according to the cost
criterion of Example 11.9 (p. 863).
11.1.2. Prove that transformations T1, T3, and T4 preserve block equivalence
(that is, if ℬ ⇒_i ℬ', then v(ℬ) = v(ℬ') for i = 1, 3, and 4).

11.1.3. Show that in transformation T2, as presented on p. 851, if D is any
symbol not mentioned in P, then v(ℬ) = v(ℬ').
*11.1.4. Give an algorithm to determine the set of permissible names for D in
transformation T2.
11.1.5. Prove that the algorithm following Example 11.3 removes all useless
statements and input variables from a block. Show that the number
of steps required to implement this algorithm is linearly proportional
to the number of statements in a block.
*11.1.6. Devise an algorithm to remove all redundant computations (trans-
formation T2) from a block in time O(n log n), where n is the number
of statements in the block.† (Note the similarity to minimizing finite
state machines by Algorithm 2.6.)

†Do not forget that the set of possible names of variables is infinite. Thus, some book-
keeping techniques such as those mentioned in Section 10.1 must be used.

11.1.7. Devise an algorithm to compute the scope of a statement in a block.

11.1.8. Define the transformations on dags that are analogous to the trans-
formations T1-T4 on blocks.
Exercises 11.1.9 and 11.1.10 show that if ℬ1 ⇔_1,2,3,4 ℬ2, then ℬ1 ⇔_1,2 ℬ2.

*11.1.9. Show that if ℬ1 ⇒_3 ℬ2, then there is a block ℬ3 such that ℬ3 ⇒_2 ℬ1
and ℬ3 ⇒_2 ℬ2. Thus, transformation T3 can be implemented using
one application of T2 in reverse followed by one application of T2.
*11.1.10. Show that if ℬ1 ⇒_4 ℬ2, then there is a block ℬ3 such that ℬ3 ⇒_1 ℬ2
and ℬ3 ⇒_2 ℬ1.
DEFINITION
A set S of transformations on blocks is complete if v(ℬ1) = v(ℬ2)
implies that ℬ1 ⇔_S ℬ2. S is minimal complete if no proper subset
of S is complete.
Exercises 11.1.9 and 11.1.10 show that {T1, T2} is complete. The
following two exercises show that {T1, T2} is minimal complete.
*11.1.11. Show that block ℬ = (P, {A, B}, {C, D}) cannot be transformed into
ℬ' = (P', {A, B}, {C, D}), where P and P' are as shown, using only trans-
formations T1, T3, and T4.

        P                   P'

E ← A + B           C ← A + B
D ← E * E           D ← C * C
C ← A + B

Hence, {T1, T3, T4} is not complete, and so {T1} cannot be complete.

*11.1.12. Show that block ℬ = (P, {A, B}, {C}) cannot be transformed into
ℬ' = (P', {A, B}, {C}) using only transformations T2, T3, and T4,
where P and P' are

        P                   P'

C ← A * B           C ← A + B
C ← A + B

11.1.13. Provide an algorithm to determine whether two blocks are equivalent.


11.1.14. Let P = S1; S2; ...; Sn be a sequence of assignment statements. Let
I be a set of input variables. Give an algorithm to locate all undefined
(referenced before being assigned) variables in P.

*11.1.15. Consider blocks as defined but also include statements of the form
A ← B with the obvious meaning. Find a complete set of transfor-
mations for such blocks.
• "11.1.16. Assume that addition is commutative. Let T5 be the transformation
which replaces a statement A ~ B + C by A ~ C + B. Show that T5
together with transformations T1 and T2 transform two blocks into
one another if and only if they are equivalent under the commutative
law of addition.
• "11.1.17. Assume that addition is associative. Le(T6 be the transformation which
replaces two statements X ~ A + B; Y ~ X + C by the three state-
ments X ~ A + B ; X'~B+C; Y~A +X'or the statements
X ~ - - B + C;Y ~---A + X b y X ~ - - B + C; X" + - A + B;Y~-- X" + C,
where X' is a new variable. Show that T6, T1, and Tz transform two
blocks into one another if and only if they are equivalent under the as-
sociative law of addition.
11.1.18. What is the transformation on blocks that corresponds to the distri-
butive law of * over +? What is the corresponding transformation on
dags?
**11.1.19. Show that there exist sets of algebraic laws for which it is recursively
undecidable whether two expressions are equivalent.

DEFINITION
An algebraic law is operand-preserving if no operands are created
or destroyed by one application of the algebraic law. For example,
the commutative and associative laws are operand-preserving but the
distributive law is not.
An algebraic law is operator-preserving if the number of operators
is not affected by one application of the law. The algebraic law
θθα = α (self-inverse) is not operator-preserving, but the law
(α − β) − γ = α − (β + γ) is.
The number of interior nodes and the number of leaves in the
dag associated with a block are preserved when the transformations
corresponding to operator- and operand-preserving algebraic laws are
applied to the block.
"11.1.20. Show that under a set of operator- and operand-preserving algebraic
laws it is decidable whether two blocks are equivalent.
• "11.1.21. Extend Theorem 11.5 to apply to optimization of blocks using both
the topological transformations of Section 11.1.2 and an arbitrary
collection of operator- and operand-preserving algebraic transfor-
mations.
"11.1.22. Consider blocks in which variables can represent one-dimensional
arrays. Let us consider assignment statements of the form
(1) A(X) ~ - B and
(2) B ~ A(X),

where A is a one-dimensional array and B and X are scalars.
If we have a block in which each statement is of form (1) or (2) or
B ← θ C1 ... Cr, where B, C1, ..., Cr are scalars, find some transfor-
mations that can be applied to these blocks making use of the fact
that A is an array.
11.1.23. Prove Lemma 11.1.
11.1.24. Prove Lemma 11.2.
11.1.25. Complete the proof of Lemma 11.3.
11.1.26. Complete the proof of Lemma 11.4.
"11.1.27. Give an example of a cost criterion C such that if (B1 ==~ (B2, then
1,2
C((B2) < C((BI), yet not every block has an optimal block under C.
11.1.28. Prove Lemma 11.6.
11.1.29. Show that if ℬ1 is open and ℬ1 ⇒_3 ℬ2 ⇒_4 ℬ3, then there is a block
ℬ such that ℬ1 ⇒_4 ℬ ⇒_3 ℬ3.
11.1.30. Prove Lemma 11.7. Hint: Use Exercise 11.1.29.
"11.1.31. Suppose that we have a machine with N registers such that operations
can be done with any or all arguments in registers, the result appearing
in any designated register. Show that the output values of a block
can be computed on such a machine with no store instructions (the
results appearing in registers) if and only if that block has an equiva-
lent block in which no more than N variable names appear in the
instructions.
"11.1.32. Show that if T~ and T2 are applied to a given block (B in any order
until a reduced block is obtained, then a unique block (up to renam-
ing) results.

Research Problems

11.1.33. Using the cost criterion of Example 11.9, or some other interesting cost
criterion, find a fast algorithm to find an optimal block equivalent to
a given one.
11.1.34. Find a collection of algebraic transformations that is useful in opti-
mizing a large class of programs. Devise efficient techniques for
applying these transformations.

Programming Exercises
11.1.35. Using a suitable representation for dags, implement transformations
T1 and T2 of this section.
11.1.36. Implement the heuristic suggested in Example 11.9 to "optimize" code
for a one-accumulator machine.

BIBLIOGRAPHIC NOTES

The presentation in this section follows Aho and Ullman [1972e]. Igarashi [1968]
discusses transformations on similar blocks with A ← B statements permitted
and names of output variables considered important. DeBakker [1971] considers
blocks in which all statements are of the form A ← B. Bracha [1972] treats straight-
line blocks with forward jumps.
Richardson [1968] proved that no algorithm to "simplify" expressions exists
when the expressions are taken to be over quite simple operators. The answer to
Exercise 11.1.19 can be found in his article. Caviness [1970] also treats classes of
algebraic laws for which equivalence of blocks is undecidable.
Floyd [1961a] and Breuer [1969] have considered algorithms to find common
subexpressions in straight-line blocks when certain algebraic laws pertain. Aho
and Ullman [1972f] discuss the equivalence of blocks with structured variables as
in Exercise 11.1.22. Some techniques useful for Exercise 11.1.32 can be found in
Aho, Sethi, and Ullman [1972].

11.2. ARITHMETIC EXPRESSIONS

Let us now turn our attention to the design of a code generator which
produces assembly language code for blocks. The input to the code generator
is a block consisting of a sequence of assignment statements. The output is
an equivalent assembly language program.
We would like the resulting assembly language program to be good under
some cost function such as number of assembly language instructions or
number of memory fetches. Unfortunately, as mentioned in the last section,
there is no efficient algorithm known that will produce optimal assembly
code, even for the simple "one-accumulator" machine of Example 11.9.
In this section we shall provide an efficient algorithm for generating
assembly code for a restricted class of blocks, namely those that represent one
expression with no identical operands. For this class of blocks our algorithm
will generate assembly language code that is optimal under a variety of cost
criteria, including program length and number of accumulators used.
While the assumption of no identical operands is certainly not realistic,
it is often a good first-order approximation. Moreover, if we are to generate
code using a syntax directed translation with synthesized attributes only, the
assumption is quite convenient. Finally, experience has shown that the prob-
lem of generating optimal code for expressions with even one pair of identical
operands is extremely difficult in comparison.
A block representing one expression has only one output variable. For
example, the assignment statement F = Z * (X + Y) can be represented by
the block ℬ = (P, {X, Y, Z}, {F}), where P is

        R ← X + Y
        F ← Z * R

The restriction that the assignment involve an expression with no identical
operands is equivalent to requiring that the dag for the expression be a
tree.
For convenience we shall assume that all operators are binary. This
restriction is not serious, since it is straightforward to generalize the results
of this section to expressions involving arbitrary operators.
We shall generate assembly code for a machine having N accumulators,
where N ≥ 1. The cost criterion will be the length of the assembly language
program (i.e., the number of instructions). The algorithm is then extended to
take advantage of operators which we know are commutative or associative.

11.2.1. The Machine Model

We consider a computer with N ≥ 1 general-purpose accumulators and
four types of instructions.

DEFINITION

An assembly language instruction is a string of symbols of one of the fol-
lowing four types:

        (1) LOAD M, A
        (2) STORE A, M
        (3) OPθ A, M, B
        (4) OPθ A, B, C

In these instructions, M is a memory location and A, B, and C are accumu-
lator names (possibly the same). OPθ is the operation code for the binary
operator θ. We assume that each operator θ has a corresponding machine
instruction of types (3) and (4).
These instructions perform the following actions:
(1) LOAD M, A places the contents of memory location M into accumu-
lator A.
(2) STORE A, M places the contents of accumulator A into memory
location M.
(3) OPθ A, M, B applies the binary operator θ to the contents of accumu-
lator A and memory location M and places the result in accumulator B.
(4) OPθ A, B, C applies the binary operator θ to the contents of accumu-
lators A and B and stores the result in accumulator C.
If there is but one accumulator, this set of instructions reduces to that in
If there is but one accumulator, this set of instructions reduces to that in

Example 11.9, except for type (4) instructions, which become OPθ A, A, A.


The algorithm we have in mind does not take advantage of such an instruc-
tion, and so for these purposes, our instruction set can be thought of as
a generalization of one-address, single-accumulator instructions.
An assembly language program (program for short) is a sequence of assem-
bly language instructions.
If P = I1; I2; ...; In is a program, we can define the value of register
R after instruction t, denoted v_t(R), as follows. (A register is either an accumu-
lator or a memory location.)
(1) v_0(R) = R if R is a memory location and is undefined if R is an
accumulator.
(2) Let I_t be LOAD M, A. Then v_t(A) = v_{t-1}(M).
(3) Let I_t be STORE A, M. Then v_t(M) = v_{t-1}(A).
(4) Let I_t be OPθ A, R, C. Then v_t(C) = θ v_{t-1}(A) v_{t-1}(R). Note that R
may be an accumulator or a memory location.
(5) If v_t(R) is not defined by (2)-(4) but v_{t-1}(R) has been defined, then
v_t(R) = v_{t-1}(R). Otherwise, v_t(R) is undefined.
Thus, values are computed exactly as one would expect. LOAD's and
STORE's move values from one register to another, leaving them also in
the original register. Operations place the computed value in the accumu-
lator designated by the third argument, leaving other registers unchanged in
value. We say that a program P computes expression α, leaving the result in
accumulator A, if after the last statement of P, accumulator A has the
value α.

Example 11.14
Consider the following assembly language program with two accumu-
lators A and B. The values of the accumulators after each instruction are
shown beside each instruction in infix notation, as usual.

                              v(A)            v(B)

        LOAD X, A             X               undefined
        ADD A, Y, A           X + Y           undefined
        LOAD Z, B             X + Y           Z
        MULT B, A, A          Z * (X + Y)     Z

The value of accumulator A at the end of the program corresponds to the
(infix) expression Z * (X + Y). Thus, this program computes Z * (X + Y),
leaving the result in accumulator A. (Technically, the expression Z is also
computed.) □
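The value rules (1)-(5) can be mechanized directly. The following Python sketch is our own illustration, not from the text; the tuple encoding of instructions and the function names are assumptions. Run on the program of Example 11.14, it leaves the value Z * (X + Y) in accumulator A.

    # Sketch of the register-value semantics v_t(R).  Instructions are
    # encoded as tuples; each operator is supplied as a Python function.
    def run_program(program, ops, accumulators):
        vals = {}                       # rule (5): unchanged registers keep their value

        def v(r):                       # rule (1): an unwritten memory location holds itself
            if r in vals:
                return vals[r]
            return None if r in accumulators else r

        for ins in program:
            if ins[0] == 'LOAD':        # rule (2): LOAD M, A
                _, m, a = ins
                vals[a] = v(m)
            elif ins[0] == 'STORE':     # rule (3): STORE A, M
                _, a, m = ins
                vals[m] = v(a)
            else:                       # rule (4): ('OP', theta, A, R, C)
                _, theta, a, r, c = ins
                vals[c] = ops[theta](v(a), v(r))
        return vals

    ops = {'+': lambda x, y: '(%s + %s)' % (x, y),
           '*': lambda x, y: '(%s * %s)' % (x, y)}
    prog = [('LOAD', 'X', 'A'), ('OP', '+', 'A', 'Y', 'A'),
            ('LOAD', 'Z', 'B'), ('OP', '*', 'B', 'A', 'A')]
    run_program(prog, ops, {'A', 'B'})['A']      # '(Z * (X + Y))'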

In this section we shall formally define an (arithmetic) syntax tree as a
labeled binary tree T having one or more nodes such that
(1) Each interior node is labeled by a binary operator θ in Θ, and
(2) Each leaf is labeled by a distinct variable name X in Σ.
For convenience we assume that Θ and Σ are disjoint. Figure 11.8 shows
the tree for Z * (X + Y).

Fig. 11.8 Tree for Z * (X + Y).

We can assign values to the nodes of a tree from the bottom as follows:
(1) If node n is a leaf labeled X, then n has value X.
(2) If n is an interior node labeled θ with direct descendants n1 and n2
whose values are v1 and v2, then n has value θ v1 v2.
The value of a tree is the value of its root. For example, the value of the
tree in Fig. 11.8 is Z * (X + Y) in infix notation.
Let us briefly discuss the relation between the intermediate language
blocks of Section 11.1 and the assembly language programs we have just
defined. First, given a reduced block in which
(1) All operators are binary,
(2) Each input variable is referenced once, and
(3) There is exactly one output variable,
the dag associated with the block will be a tree. This tree is a syntax tree in
our current terminology. The value of the expression is also the value of
the block.
We can naturally convert the intermediate language block to an assembly
language program, statement by statement. It turns out that if this conversion
takes account of the possibility that desired values are already in accumu-
lators, then we can produce an optimal assembly program from a given
reduced open block using only transformation T4, as suggested by Theorem
11.5, and then performing conversion to assembly language.
However, it may not be entirely obvious that the above is true; the reader
should verify these facts for himself. What we achieve by essentially rework-

ing many of the definitions of Section 11.1 for assembly language programs
is to show that there is no strange optimal assembly language program which
is not related by any natural statement-by-statement conversion to an inter-
mediate language block obtainable from a reduced open block and trans-
formation T4.

11.2.2. The Labeling of Trees

Fundamental to our algorithm for generating code for expressions is


a method of attaching additional labels to the nodes of a syntax tree. These
labels are integers, and we shall subsequently refer to them as the labels of
nodes, even though each node is also labeled by an operator or variable.
The integer label determines the number of accumulators needed to evaluate
an expression optimally.
ALGORITHM 11.1
Labeling of syntax trees.
Input. A syntax tree T.
Output. A labeled syntax tree.
Method. We assign integer labels to the nodes of T recursively from the
bottom as follows:
(1) If a node is a leaf and either the left direct descendant of its direct
ancestor or a root (i.e., the tree consists of this one node), label this node 1;
if it is a leaf and the right direct descendant, label it 0.
(2) Let node n have direct descendants n1 and n2 with labels l1 and l2.
If l1 ≠ l2, let the label of n be the larger of l1 and l2. If l1 = l2, let the label
of n be one greater than l1. □
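Algorithm 11.1 translates almost line for line into code. The sketch below is our own; the Node class and its field names are assumptions, not from the text, and the same representation is reused in the later sketches.

    # Sketch of Algorithm 11.1.  A leaf has op = None and a variable name;
    # an interior node has an operator and two direct descendants.
    class Node:
        def __init__(self, op=None, left=None, right=None, name=None):
            self.op, self.left, self.right, self.name = op, left, right, name
            self.label = None

    def label_tree(n, is_left=True):
        if n.op is None:                      # rule (1): a leaf (or a one-node tree)
            n.label = 1 if is_left else 0
        else:                                 # rule (2): interior nodes
            label_tree(n.left, True)
            label_tree(n.right, False)
            l1, l2 = n.left.label, n.right.label
            n.label = max(l1, l2) if l1 != l2 else l1 + 1
        return n.label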

Example 11.15

The arithmetic expression A * (B − C) / (D * (E − F)) is expressed in
tree form in Fig. 11.9. The integer labels are shown. □
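For instance, the tree of Fig. 11.9 can be built and labeled with the sketch above; the root receives label 3, so by Theorem 11.6 (proved below) three accumulators are enough to evaluate the expression with no STORE instructions.

    # Assumes the Node class and label_tree of the previous sketch.
    leaf = lambda x: Node(name=x)
    t = Node('/',
             Node('*', leaf('A'), Node('-', leaf('B'), leaf('C'))),
             Node('*', leaf('D'), Node('-', leaf('E'), leaf('F'))))
    label_tree(t)      # returns 3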

The following algorithm converts a labeled syntax tree into an assembly


language program for a machine with N accumulators. We shall show that
for each N the program produced is optimal under a variety of cost criteria,
including program length.
ALGORITHM 11.2
Assembly code for expressions.
Input. A labeled syntax tree T and N accumulators A1, A2, ..., AN for
some N ≥ 1.

Fig. 11.9 Labeled syntax tree.

Output. An assembly language program P such that v(A1) after the last
instruction of P is v(T); i.e., P computes the expression represented by T,
leaving the result in accumulator A1.
Method. We assume that T has been labeled using Algorithm 11.1. We
then execute the following procedure code(n, i) recursively. The input to code
is a node n of T and an integer i between 1 and N. The integer i means that
accumulators Ai, Ai+1, ..., AN are currently available to compute the expres-
sion for node n. The output of code(n, i) is a sequence of assembly language
instructions which computes the value v(n), leaving the result in accumu-
lator Ai.
Initially we execute code(n0, 1), where n0 is the root of T. The sequence
of instructions generated by this call of the procedure code is the desired
assembly language program.
Procedure code(n, i).

We assume that n is a node of T and that i is an integer between 1 and N.


(1) If node n is a leaf, do step (2). Otherwise, do step (3).
(2) If code(n, i) is called and n is a leaf, then n will always be a left direct
descendant (or the root if n is the only node in the tree). If leaf n has variable
name X associated with it, then

        code(n, i) = 'LOAD X, Ai'

[meaning that the output of code(n, i) is the instruction LOAD X, Ai]. End.
(3) We reach this point only if n is an interior node. Let n have operator
θ associated with it and direct descendants n1 and n2 with labels l1 and l2

as shown:

The next step is determined by the values of labels l1 and l2:

(a) If l2 = 0 (node n2 is a right leaf), then do step (4).
(b) If 1 ≤ l1 < l2 and l1 < N, then do step (5).
(c) If 1 ≤ l2 ≤ l1 and l2 < N, then do step (6).
(d) If N ≤ l1 and N ≤ l2, then do step (7).

(4) code(n, i) = code(n1, i)
                 'OPθ Ai, X, Ai'

Here X is the variable associated with leaf n2, and OPθ is the operation code
for operation θ. The output of code(n, i) is the output of code(n1, i) followed
by the instruction OPθ Ai, X, Ai.

(5) code(n, i) = code(n2, i)
                 code(n1, i + 1)
                 'OPθ Ai+1, Ai, Ai'
(6) code(n, i) = code(n1, i)
                 code(n2, i + 1)
                 'OPθ Ai, Ai+1, Ai'
(7) code(n, i) = code(n2, i)
                 T ← newtemp
                 'STORE Ai, T'
                 code(n1, i)
                 'OPθ Ai, T, Ai'

Here newtemp is a function which whenever invoked produces a new tempo-
rary memory location for storing intermediate results. □

Later we shall show that the following relationships between l1, l2, and i
hold when steps (5), (6), and (7) of Algorithm 11.2 are invoked:

        Step        Relation

        (5)         i ≤ N − l1
        (6)         i ≤ N − l2
        (7)         i = 1

Note also that Algorithm 11.2 requires instructions of type (4) of the form

        OPθ A, B, A
        OPθ A, B, B

By making the procedure code slightly more complicated in step (5), we can
eliminate the need for instructions of the form

        OPθ A, B, B

which is not part of the instruction repertoire of some multiregister machines.
(See Exercise 11.2.11.)
We can view code(n, i) as a function which computes a translation at each
node of an expression in terms of the translations and labels of the direct
descendants of the node. To get acquainted with Algorithm 11.2, let us con-
sider several examples.

Example 11.16
Let T be the syntax tree consisting of the single node X (labeled 1). From
step (2), code is the single instruction LOAD X, A1. □

Example 11.17
Let T be the labeled syntax tree in Fig. 11.10. The assembly language
program for T using Algorithm 11.2 with N = 2 is produced as follows.
The following sequence of calls of code(n, i) is generated. We also show the
step of Algorithm 11.2 which is invoked during each call. Here, we indicate
a node by the variable or operator associated with it.

        Call              Step of Algorithm 11.2

        code(*, 1)        (3c)
        code(Z, 1)        (2)
        code(+, 2)        (3a)
        code(X, 2)        (2)

'"7"32 T4: T1
r3
. 1 ~ ULTAI'A2'A1

T3"T2
Ti: LOAD Z, ADD A2, Y, A 2

G O
T2: LOAD X, A 2

Fig. 11.10 Labeled syntax tree with translations.

The call code(X, 2) generates the instruction LOAD X, A2, which is the trans-
lation associated with node X. The call code(+, 2) generates the instruction
sequence

        LOAD X, A2
        ADD A2, Y, A2

which is the translation for node +.

The call code(Z, 1) generates the instruction LOAD Z, A1, the translation
for node Z. The call code(*, 1) generates the final program, which is the trans-
lation for the root:

        LOAD Z, A1
        LOAD X, A2
        ADD A2, Y, A2
        MULT A1, A2, A1

This program is similar (but not identical) to that in Example 11.14. The
value in accumulator A1 at the end of this program is clearly Z * (X + Y).
□
Example 11.18
Let us apply Algorithm 11.2 with N = 2 to the syntax tree in Fig. 11.9
(p. 883). The following sequence of calls of code(n, i) is generated. Here *L
refers to the left descendant of /, *R to the right descendant of /, −L to the
right descendant of *L, and −R to the right descendant of *R. The step of
Algorithm 11.2 which is applicable during each call is also shown.

        Call              Step

        code(/, 1)        (3d)
        code(*R, 1)       (3c)
        code(D, 1)        (2)
        code(−R, 2)       (3a)
        code(E, 2)        (2)
        code(*L, 1)       (3c)
        code(A, 1)        (2)
        code(−L, 2)       (3a)
        code(B, 2)        (2)

The following program is generated by code(/, 1):

        LOAD D, A1
        LOAD E, A2
        SUBTR A2, F, A2
        MULT A1, A2, A1
        STORE A1, TEMP1
        LOAD A, A1
        LOAD B, A2
        SUBTR A2, C, A2
        MULT A1, A2, A1
        DIV A1, TEMP1, A1

Here TEMP1 is a memory location generated by newtemp. □
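Procedure code(n, i) can also be transcribed almost directly. The sketch below is ours; it builds on the Node class of the earlier sketches and emits the generic OPθ mnemonic rather than ADD, SUBTR, MULT, and DIV. Applied to the tree of Fig. 11.9 with N = 2, it reproduces the ten-instruction program of Example 11.18 up to those mnemonics.

    # Sketch of Algorithm 11.2 on a tree labeled by Algorithm 11.1.
    def generate(tree, N):
        out, temps = [], [0]

        def newtemp():
            temps[0] += 1
            return 'TEMP%d' % temps[0]

        def code(n, i):
            if n.op is None:                                  # step (2): a left leaf or the root
                out.append('LOAD %s, A%d' % (n.name, i))
                return
            n1, n2 = n.left, n.right
            l1, l2 = n1.label, n2.label
            if l2 == 0:                                       # step (4): n2 is a right leaf
                code(n1, i)
                out.append('OP%s A%d, %s, A%d' % (n.op, i, n2.name, i))
            elif 1 <= l1 < l2 and l1 < N:                     # step (5)
                code(n2, i)
                code(n1, i + 1)
                out.append('OP%s A%d, A%d, A%d' % (n.op, i + 1, i, i))
            elif 1 <= l2 <= l1 and l2 < N:                    # step (6)
                code(n1, i)
                code(n2, i + 1)
                out.append('OP%s A%d, A%d, A%d' % (n.op, i, i + 1, i))
            else:                                             # step (7): both labels >= N
                code(n2, i)
                t = newtemp()
                out.append('STORE A%d, %s' % (i, t))
                code(n1, i)
                out.append('OP%s A%d, %s, A%d' % (n.op, i, t, i))

        code(tree, 1)
        return out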

We shall prove that the label of the root of the labeled syntax tree pro-
duced by Algorithm 11.1 is the smallest number of accumulators needed to
compute that expression without using any STORE instructions.
We begin by making several observations about Algorithm 11.2.
LEMMA 11.8
The program produced by procedure code(n, i) in Algorithm 11.2 correctly
computes the value of node n, leaving that value in the ith accumulator.
Proof. An elementary induction on the height of a node. □

LEMMA 11.9
If Algorithm 11.2, with N accumulators available, is applied to the root
of a syntax tree, then when procedure code(n, i) is called on node n with
label l, either

(1) l > N and N accumulators are available for this call (i.e., i = 1), or
(2) l ≤ N and at least l accumulators are available for this call (i.e.,
i ≤ N − l + 1).
Proof. Another elementary induction, this time on the number of calls
of code(n, i) made prior to the call in question. □

THEOREM 11.6
Let T be a syntax tree and let N be the number of available accumulators.
Let l be the label of the root of T. Then there exists a program to compute
T which uses no STORE instructions if and only if l ≤ N.
Proof.
If: If l ≤ N, then step (7) of procedure code(n, i) is never executed. That
is, a node whose two direct descendants have labels equal to or greater than N
has a label at least N + 1 itself. Step (7) is the only step which generates a
STORE instruction. Therefore, if l ≤ N, the program constructed by Algo-
rithm 11.2 has no STORE's.
Only if: Assume that l > N. Since N ≥ 1, we must have l ≥ 2. Suppose
that the conclusion is false. Then we may assume without loss of generality
that T has a program P which computes it using N accumulators, that P
has no STORE statements, and that there is no syntax tree T' which has
fewer nodes than T and also violates the conclusion. Since the label of the
root of T exceeds 1, T cannot be a single leaf. Let n be the root and let n1
and n2 be its direct descendants, with labels l1 and l2, respectively.
Case 1: l1 = l. The only way the value of n can be computed is for the
value of n1 to appear at some time in an accumulator, since n1 cannot be
a leaf. We form a new program P' from P by deleting those statements follow-
ing the computation of the value of n1. Then P' computes the subtree with
root n1 and has no STORE's. Thus, a violation with fewer nodes than T
occurs, contrary to our assumption about T.
Case 2: l2 = l. This case is similar to case 1.
Case 3: l1 = l2 = l − 1. We have assumed that no two leaves have the
same associated variable name. We can assume without loss of generality
that P is "as short as possible," in the sense that if any statement were deleted,
the value of n would no longer appear in the same accumulator at the end
of P. Thus, the first statement of P must be LOAD X, A, where X is the
variable name associated with some leaf of T, for any other first statement
could be deleted.
Let us assume that X is the value of a leaf which is a descendant of n1
(possibly n1 itself). The case in which X is a value of a descendant of n2 is
symmetric and will be omitted. Then until n1 is computed, there is always at

least one accumulator which holds a value involving X. This value could
not be used in a correct computation of the value of n2. We may conclude
that from P we can find a program P' which computes the value of n2, with
label l − 1, which uses no STORE's and no more than N − 1 accumulators
at any time. We leave it to the reader to show that from P' we can find an
equivalent P'' which never mentions more than N − 1 different accumulators.
(Note that P' may mention all N accumulators, even though it is not "using"
more than N − 1 at any time.) Thus, the subtree of n2 forms a smaller vio-
lation of our conditions, contradicting the minimality of T. We conclude
that no violation can occur. □

11.2.3. Programs with STORE's

We shall now consider how many LOAD's and STORE's are needed to
compute a syntax tree using N accumulators when the root has a label greater
than N. The following definitions are useful.
DEFINITION

Let T be a syntax tree and let N be the number of available accumulators.


A node of T is major if each of its direct descendants has a label equal to or
greater than N. A node is minor if it is a leaf and the left direct descendant
of its direct ancestor (i.e., a leaf with label 1).

Example 11.19
Consider the syntax tree of Fig. 11.9 (p. 883) again, with N = 2. The only
major node is the root. There are four minor nodes, the leaves with values
A, B, D, and E. □

LEMMA 11.10
Let T be a syntax tree. There exists a program to compute T using m
LOAD's if and only if T has no more than m minor nodes.
Proof. If we examine procedure code(n, i) of Algorithm 11.2, we find that
only step (2) introduces a LOAD statement. Since step (2) applies only to
minor nodes, the "if" portion is immediate.
The "only if" portion is proved by an argument similar to that of Theorem
11.6, making use of the facts that the only way the value of a leaf can appear
in an accumulator is for it to be "LOADed" and that the left argument of
any operator must be in an accumulator. □

LEMMA 11.11
Let T be a syntax tree. There exists a program P to compute T using M
STORE's if and only if T has no more than M major nodes.

Proof.
If: Again referring to procedure code(n, i), only step (7) introduces a
STORE, and it applies only to major nodes.
Only if: This portion is by induction on the number of nodes in T. The
basis, a tree with one node, is trivial, as the label of the root is 1, and there
are thus no major nodes. Assume the result for syntax trees of up to k − 1
nodes, and let T have k nodes.
Consider a program P which computes T, and let M be the number of
major nodes of T. We can assume without loss of generality that P has as
few STORE's as any program computing T. If M = 0, the desired result is
immediate, and so assume that M ≥ 1. Then P has at least one STORE,
because the label of a major node is at least N + 1, and if no STORE's were
present in P, a violation of Theorem 11.6 would occur.
The value stored by the first STORE instruction of P must be the value
of some node n of T, or else a program with fewer STORE's than P but com-
puting T could easily be found. Moreover, we may assume that n is not
a leaf for the same reason. Let T' be the syntax tree formed from T by making
node n a leaf and giving it some new name X as value. Then T' has fewer nodes
than T, and so the inductive hypothesis applies to it. We can find a program
P' which evaluates T' using exactly one fewer STORE than P. P' is constructed
from P by deleting exactly those statements needed to compute the first
value stored and replacing subsequent references in P to the location used
for that STORE by the name X until a new value is stored there.
If we can show that T' has at least M − 1 major nodes, we are done,
since by the inductive hypothesis, we can then conclude that P' has at least
M − 1 STORE's and thus that P has at least M STORE's.
We observe that no descendant of n in T can be major, since a violation
of Theorem 11.6 would occur. Consider a major node n' of T. If n is not
a descendant of n', then n' will be a major node in T'. Thus, it suffices to
consider those major nodes n1, n2, ... on the path from n to the root of T.
By the argument of case 3 of Theorem 11.6, n cannot itself be major. The
first node, n1, if it exists, may no longer be major in T'. However, the label
of n1 in T' is at least N, because the direct descendant of n1 that is not an
ancestor of n must have a label at least N in T and T'. Thus, n2, n3, ... are
still major nodes in T'. We conclude that T' has at least M − 1 major nodes.
The induction is now complete. □

THEOREM 11.7

Algorithm 11.2 always produces a shortest-length program to compute


a given expression.
Proof. By Lemmas 11.10 and 11.11, Algorithm 11.2 generates a program
with the fewest LOAD's and STORE's possible. Since the minimum number
of operation instructions is clearly equal to the number of interior nodes of

the tree and Algorithm 11.2 yields one such instruction for each interior
node, the theorem follows.

Example 11.20
As pointed out in Example 11.19, the arithmetic expression of Fig. 11.9
has one major and four minor nodes (assuming that N = 2). It also has five
interior nodes. Thus, at least ten statements are necessary to compute it.
The program of Example 11.18 has ten statements. Note that one of these is
a STORE, four are LOAD's, and the rest operations. □

11.2.4. Effect of Some Algebraic Laws

We can define the cost of a syntax tree as the sum of


(1) The number of interior nodes,
(2) The number of major nodes, and
(3) The number of minor nodes.
The results of the previous section indicate that this cost is a reasonable
measure of the "complexity" of a syntax tree, in that the number of instruc-
tions needed to compute a syntax tree is equal to the cost of the tree.
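In terms of the labeled Node trees of the earlier sketches, the cost measure can be computed as follows (our own sketch, not from the text). For the tree of Fig. 11.9 with N = 2 it gives 5 + 1 + 4 = 10, agreeing with Example 11.20.

    # Sketch: cost = interior nodes + major nodes + minor nodes,
    # evaluated on a tree already labeled by Algorithm 11.1.
    def cost(n, N, is_left=True):
        if n.op is None:                       # a leaf is minor iff it is a left descendant
            return 1 if is_left else 0
        c = 1                                  # count this interior node
        if n.left.label >= N and n.right.label >= N:
            c += 1                             # a major node forces one STORE
        return c + cost(n.left, N, True) + cost(n.right, N, False)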
Often, algebraic laws may apply to certain operators, and making use of
these identities can reduce the cost of a given syntax tree. From Section 11.1.6
we know that each algebraic law induces a corresponding transformation on
syntax trees. For example, if n is an interior node of a syntax tree associated
with a commutative operator, then the commutative transformation reverses
the order of the direct descendants of n.
Likewise, if θ is an associative operator [i.e., α θ (β θ γ) = (α θ β) θ γ],
then using the corresponding associative transformation on trees we can
transform two syntax trees as shown in Fig. 11.11. The associative transfor-
mation depicted in Fig. 11.11 corresponds to the transformation

        X ← B θ C                X' ← A θ B
        Y ← A θ X                Y ← X' θ C

on blocks. In Section 11.1.6 we retained the statement X ← B θ C after the
transformation from left to right. However, in our present discussion this
statement will always be useless after the transformation, and so it can be
safely removed without changing the value of the block.
DEFINITION

Given a set 𝒜 of algebraic laws, we say that two syntax trees T1 and T2
are equivalent under 𝒜, written T1 ≡_𝒜 T2, if there exists a sequence of trans-
formations derived from these laws which will transform T1 into T2. We shall
write [T]_𝒜 to denote the equivalence class of trees {T' | T ≡_𝒜 T'}.

Fig. 11.11 Associative transformation on syntax trees.

Thus, if we are given a syntax tree T and we know that a certain set 𝒜
of algebraic laws prevails, then to find an optimal program for T we might
want to search [T]_𝒜 for an expression tree with the minimum cost. Once we
have found a minimum cost tree, we can apply Algorithm 11.2 to find the
optimal program. Theorem 11.7 guarantees that the resulting program will
be optimal.
If each law preserves the number of operators, as do the commutative
and associative laws, then we need only minimize the sum of major and
minor nodes. As an example, we shall give algorithms to do this minimization,
first in the case that some operators are commutative and second in the case
that some commutative operators are also associative.
Given a syntax tree T and a set 𝒜 of algebraic laws, the next algorithm
will find a syntax tree T' in [T]_𝒜 of minimal cost provided that 𝒜 contains
only commutative laws applying to certain operators. Algorithm 11.2 can
then be applied to T' to find the optimal program for the original tree T.

ALGORITHM 11.3
Minimal cost syntax tree assuming some commutative operators.
Input. A syntax tree T (with three or more nodes) and a set of commu-
tative laws 𝒜.
Output. A syntax tree in [T]_𝒜 of minimal cost.
Method. The heart of the algorithm is a recursive procedure commute(n)
which takes a node n of the syntax tree as argument and returns as output
a modified subtree with node n as root. Initially, commute(n0) is called,
where n0 is the root of the given tree T.

Procedure commute(n).
(1) If node n is a leaf, commute(n) = n.
(2) If node n is an interior node, there are two cases to consider:
(a) Suppose that node n has two direct descendants n1 and n2 (in this
order) and that the operator attached to n is commutative. If n1
is a leaf and n2 is not, then the output of commute(n) is the tree of
Fig. 11.12(a).
(b) In all other cases the output of commute(n) is the tree of Fig.
11.12(b). □
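A sketch of procedure commute follows (ours; 'commutative' is an assumed predicate on operator symbols, and the Node representation is the one used in the earlier sketches). Applied to the tree of Fig. 11.9 with only * commutative and then relabeled, it yields a tree whose root has label 2 and which has only two minor nodes, as in Example 11.21 below.

    # Sketch of Algorithm 11.3.
    def commute(n, commutative):
        if n.op is None:                              # step (1): leaves are unchanged
            return n
        n1 = commute(n.left, commutative)
        n2 = commute(n.right, commutative)
        # step (2a): a commutative node with a leaf on the left and a
        # non-leaf on the right has its descendants interchanged
        if commutative(n.op) and n1.op is None and n2.op is not None:
            n.left, n.right = n2, n1
        else:                                         # step (2b): keep the original order
            n.left, n.right = n1, n2
        return n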

Example 11.21
Consider Fig. 11.9 (p. 883) and assume only * is commutative. Then the
result of applying Algorithm 11.3 to that tree is shown in Fig. 11.13. Note
that the label of the root of Fig. 11.13 is 2 and that there are two minor nodes.

Fig. 11.12 Result of commute procedure.

Fig. 11.13 Revised arithmetic expression.

Thus, if two accumulators are available, only seven statements are needed
to compute this tree, compared with ten for Fig. 11.9. □

THEOREM 11.8
If the only algebraic law permitted is the commutative law of certain
operators, then Algorithm 11.3 produces that syntax tree in the equivalence
class of the given tree with the least cost.
Proof. It is easy to see that the commutative law cannot change the
number of interior nodes. A simple induction on the height of a node shows
that Algorithm 11.3 minimizes the number of minor nodes and the label
that would be associated with each node after applying Algorithm 11.1.
Hence, the number of major nodes is also minimized. □

The situation is more complex when certain operators are both commu-
tative and associative. In this case we can often transform the tree extensively
to reduce the number of major nodes.
DEFINITION
Let T be a syntax tree. A set S of two or more nodes of T is a cluster if
(1) Each node of S is an interior node with the same associative and
commutative operator,
(2) The nodes of S, together with their connecting edges, form a tree, and
(3) No proper superset of S has properties (1) and (2).

The root of the cluster is the root of the tree formed as in (2) above. The
direct descendants of a cluster S are those nodes of T which are not in S but
are direct descendants of a node in S.

Example 11.22
Consider the syntax tree of Fig. 11.14, where + and * are considered
associative and commutative, while no other algebraic laws pertain.

"\\\\

\ \ Cluster 1
\ \

Cluster 2 \ ~ \\ \ k \\ Cluster 3

/\ , I

Fig. 11.14 Syntax tree.

The three clusters are circled. The cluster which includes the root of
the tree has as direct descendants, in order from the left, the root of cluster
2, the node to which the − operator is attached, and the root of cluster 3.

We observe that the clusters in a syntax tree T can be uniquely found
and that the clusters are disjoint. To find a tree of minimal cost in [T]_𝒜, when
𝒜 contains laws reflecting that some operators are associative and commu-
tative while others may be only commutative, the concept of an associative
tree, which condenses clusters into a single node, is introduced.
DEFINITION
Let T be a syntax tree. Then T', the associative tree for T, is formed by
replacing each cluster S of T by a single node n having the same associative

and commutative operator as the nodes of the cluster S. The direct descen-
dants of the cluster in T are made direct descendants of n in T'.

Example 11.23

Consider the syntax tree T in Fig. 11.15. Assuming that + and * are both
associative and commutative, we obtain the clusters which are circled in
Fig. 11.15. The associative tree for T is shown in Fig. 11.16. Note that the
associative tree is not necessarily a binary tree.

Fig. 11.15 Syntax tree with clusters.

We can label the nodes of an associative tree with integers from the bot-
tom up as follows:

(1) A leaf which is the leftmost direct descendant of its ancestor is labeled
1. All other leaves are labeled 0.
(2) Let n be an interior node having nodes n1, n2, ..., nm with labels
l1, l2, ..., lm as direct descendants, m ≥ 2.
(a) If one of l1, l2, ..., lm is larger than the others, let that integer be
the label of node n.
(b) If node n has a commutative operator and ni is an interior node
with li = 1 and the rest of n1, ..., ni−1, ni+1, ..., nm are leaves,
then label node n by 1.

Fig. 11.16 Associative tree.

(c) Provided that (b) does not apply, if li = lj for some i ≠ j and li
is greater than or equal to all other lk's, let the label of node n
be li + 1.
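A sketch of this labeling procedure (ours, not from the text; an associative-tree node is assumed to carry a list kids of direct descendants, leftmost first, which is empty for leaves):

    # Sketch of the labeling rules for associative trees.
    def label_assoc(n, commutative, is_leftmost=True):
        if not n.kids:                                   # rule (1): leaves
            n.label = 1 if is_leftmost else 0
            return n.label
        labels = [label_assoc(k, commutative, i == 0) for i, k in enumerate(n.kids)]
        top = max(labels)
        interior = [k for k in n.kids if k.kids]
        if labels.count(top) == 1:                       # rule (2a): a unique maximum
            n.label = top
        elif commutative(n.op) and top == 1 and len(interior) == 1:
            n.label = 1                                  # rule (2b)
        else:                                            # rule (2c)
            n.label = top + 1
        return n.label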

Example 11.24
Consider the associative tree in Fig. 11.16. The labeled associative tree is
shown in Fig. 11.17.
Note that condition (2b) of the labeling procedure applies to the third
and fourth direct descendants of the root, since * is a commutative opera-
tor. □

We now give an algorithm which takes a given syntax tree and produces
that tree in its equivalence class with the smallest cost.
ALGORITHM 11.4
Minimal cost syntax tree, assuming that certain operators are commu-
tative and that certain operators are both associative and commutative but
that no other algebraic laws pertain.
Input. A syntax tree T and a set 𝒜 of commutative and associative-
commutative laws.
Output. A syntax tree in [T]_𝒜 of minimal cost.
Method. First create T', the labeled associative tree for T. Then compute
acommute(n0), where acommute is the procedure defined below and n0 is

Fig. 11.17 Labeled associative tree.

the root of T'. The output of acommute(n0) is a syntax tree in [T]_𝒜 of minimal
cost.

Procedure acommute(n).
The argument n is a node of the labeled associative tree. If n is a leaf,
acommute(n) is n itself. If n is an interior node, there are three cases to
consider:
(1) Suppose that node n has two direct descendants n1 and n2 (in this
order) and that the operator attached to n is commutative (and possibly
associative).
(a) If n1 is a leaf and n2 is not, then the output acommute(n) is the tree
of Fig. 11.18(a).
(b) Otherwise, acommute(n) is the tree of Fig. 11.18(b).
(2) Suppose that θ, the operator attached to n, is commutative and asso-
ciative and that n has direct descendants n1, n2, ..., nm, m ≥ 3, in order
from the left.
Let nmax be a node among n1, ..., nm having the largest label. If two or
more nodes have the same largest label, then choose nmax to be an interior
node. Let p1, p2, ..., pm−1 be, in any order, the remaining nodes in
{n1, ..., nm} − {nmax}.
Then the output of acommute(n) is the binary tree of Fig. 11.19, where
each ri, 1 ≤ i ≤ m − 1, is a new node with the associative and commutative
operator θ of n attached.

Fig. 11.18 Result of acommute procedure.

(3) If the operator attached to n is neither commutative nor associative,
then the output of acommute(n) is as in Fig. 11.18(b). □
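The following sketch of acommute (ours) converts a labeled associative-tree node back into a binary Node tree. In case (2) it builds a left-spine tree with nmax placed deepest and the remaining descendants attached as right operands going up; this is one reading of Fig. 11.19 and is stated here as an assumption rather than as the book's exact construction.

    # Sketch of Algorithm 11.4 (procedure acommute).
    def acommute(n, commutative):
        if not n.kids:                                       # a leaf of the associative tree
            return Node(name=n.name)
        if len(n.kids) == 2:                                 # cases (1) and (3)
            a, b = n.kids
            if commutative(n.op) and not a.kids and b.kids:  # case (1a): move the leaf to the right
                a, b = b, a
            return Node(n.op, acommute(a, commutative), acommute(b, commutative))
        # case (2): a cluster node, so n.op is commutative and associative, m >= 3
        nmax = max(n.kids, key=lambda k: (k.label, 1 if k.kids else 0))
        rest = [k for k in n.kids if k is not nmax]
        tree = acommute(nmax, commutative)
        for p in rest:                                       # the new nodes r_1, ..., r_(m-1)
            tree = Node(n.op, tree, acommute(p, commutative))
        return tree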

Example 11.25
Let us apply Algorithm 11.4 to the labeled associative tree in Fig. 11.17.
Applying acommute to the root, case (2) applies, and we choose to treat the
first direct descendant from the left as nmax. The binary tree which is the out-
put of Algorithm 11.4 is shown in Fig. 11.20. □

We shall conclude this section by proving that Algorithm 11.4 finds
a tree in [T]_𝒜 with the least cost. The following lemma is central to the proof.
LEMMA 11.12
Let T be a labeled syntax tree and S a cluster of T. Suppose that r of
the direct descendants of S have labels ≥ N, where N is the number of accu-
mulators. Then at least r − 1 of the nodes of S are major.

Fig. 11.19 Result of acommute procedure.

Proof. We prove the result by induction on the number of nodes in T.
The basis, one node, is trivial. Thus, assume that the result is true for all
trees with fewer nodes than T. Let node n be the root of S and let n have
direct descendants n1 and n2 with labels l1 and l2. Let T1 and T2 be the names
of the subtrees with roots n1 and n2, respectively.
Case 1: Neither n1 nor n2 is in S. Then the result is trivially true.

Case 2: n1 is in S, but n2 is not. Since T1 has fewer nodes than T, the induc-
tive hypothesis applies to it. Thus, in T1, S − {n} has at least r − 2 major
nodes if l2 ≥ N and at least r − 1 major nodes if l2 < N. In the latter case,
the conclusion is trivial. In both cases, the result is trivial if r ≤ 1. Thus,
consider the case r > 1 and l2 ≥ N. Then S − {n} has at least one direct
descendant with label ≥ N, so l1 ≥ N. Thus, n is a major node, and S
contains at least r − 1 major nodes.

Fig. 11.20 Output of Algorithm 11.4.

Case 3" n 2 is in S, b u t n l is not. This case is similar to case 2.


Case 4: B o t h n 1 a n d n 2 are in S. Let rl o f the direct d e s c e n d a n t s of S with
labels at least N be d e s c e n d a n t s o f nl a n d r2 o f t h e m be d e s c e n d a n t s o f n 2.
T h e n r l q- r2 = r. By the inductive hypothesis, the p o r t i o n s o f S in T 1 a n d T2
have, respectively, at least r l -- 1 a n d at least r2 -- 1 m a j o r nodes. If neither
r 1 n o r r 2 is zero, then ll _~ N a n d 12 ~ N, a n d so n is major. Thus, S has at
least (r 1 -- 1) + (r~ -- 1) -q- 1 = r -- 1 m a j o r nodes. If rl = 0, t h e n r2 = r,
a n d so the p o r t i o n o f S in Tz has at least r -- 1 m a j o r nodes. T h e case r2 = 0
is a n a l o g o u s . [~]
THEOREM 11.9
Algorithm 11.4 produces a tree in [T]_𝒜 that has the least cost.
Proof. A straightforward induction on the number of nodes in an associa-
tive tree A shows that the result of applying procedure acommute to its root

is a syntax tree T whose root after applying the labeling Algorithm 11.1
has the same label as the root of A. No tree in [T]_𝒜 has a root with label
smaller than the label of the root of A, and no tree in [T]_𝒜 has fewer major
or minor nodes.
Suppose otherwise. Then let T be a smallest tree violating one of those
conditions. Let θ be the operator at the root of T.
Case 1: θ is neither associative nor commutative. Every associative or
commutative transformation on T must take place wholly within the subtree
dominated by one of the two direct descendants of the root of T. Thus,
whether the violation is on the label, the number of major nodes, or the num-
ber of minor nodes, the same violation must occur in one of these subtrees,
contradicting the minimality of T.
Case 2: θ is commutative but not associative. This case is similar to case
1, except that now the commutative transformation may be applied to the
root. Since step (1) of procedure acommute takes full advantage of this trans-
formation, any violation by T again implies a violation in one of its subtrees.
Case 3: θ is commutative and associative. Let S be the cluster containing
the root. We may assume that no violation occurs in any of the subtrees
whose roots are the direct descendants of S. Any application of an associative
or commutative transformation must take place wholly within one of these
subtrees, or wholly within S. Inspection of the result of step (2) of procedure
acommute assures us that the number of minor nodes resulting from cluster
S is minimized. By Lemma 11.12, the number of major nodes resulting from
S is as small as possible (inspection of the result of procedure acommute is
necessary to see this), and hence the label of the root is as small as possible.
Finally we observe that the alterations made by Algorithm 11.4 can
always be accomplished by applications of the associative and commutative
transformations. □

EXERCISES

11.2.1. Assuming that no algebraic laws prevail, find an optimal assembly
program with the number of accumulators N = 1, 2, and 3 for each
of the following expressions:
(a) A − B * C − D * (E + F).
(b) A + (B + (C * (D + E / F + G) * H)) + (I + J).
(c) (A * (B − C)) * (D * (E * F)) + ((G + (H + I)) + (J + (K + L))).
Determine the cost of each program found.
11.2.2. Repeat Exercise 11.2.1 assuming that + and * are commutative.
11.2.3. Repeat Exercise 11.2.1 assuming that + and * are both associative
and commutative.

11.2.4. Let E be a binary expression with k operators. What is the maximum
number of parentheses required to express E without using any
unnecessary parentheses?
*11.2.5. Let T be a binary syntax tree whose root is labeled N ≥ 2 after apply-
ing Algorithm 11.1. Show that T contains at least 3 × 2^(N−2) − 1
interior nodes.
11.2.6. Let T be a binary expression tree with k interior nodes. Show that T
can have at most k minor nodes.
*11.2.7. Given N ≥ 2, show that a tree with M major nodes has at least
3(M + 1)2^(N−2) − 1 interior nodes.
*11.2.8. What is the maximum cost of a binary syntax tree with k nodes?
*11.2.9. What is the maximum saving in the cost of a binary syntax tree of k
nodes in going from a machine with N accumulators to one with
N + 1 accumulators?
**11.2.10. Let 𝒜 be an arbitrary set of algebraic identities. Is it decidable whether
two binary syntax trees are equivalent under 𝒜?
11.2.11. For Algorithm 11.2 define the procedure code(n, [i1, i2, ..., ik]) to
compute the value of node n with accumulators Ai1, Ai2, ..., Aik,
leaving the result in accumulator Ai1. Show that by changing step (5)
to be

        code(n, [i1, i2, ..., ik]) =
            code(n2, [i2, i1, i3, ..., ik])
            code(n1, [i1, i3, i4, ..., ik])
            'OPθ Ai1, Ai2, Ai1'

Algorithm 11.2 can be modified so that assembly language instructions
of types (3) and (4) are only of the form

        OPθ A, B, A

11.2.12. Let

        sign(a, b) =  |a|    if b > 0
                       0     if b = 0
                      −|a|   if b < 0

Show that sign is associative but not commutative. Give examples of
other operators which are associative but not commutative.
"11.2.13. The instructions
LOAD M, A
STORE A, M
oPO A,M,B

each use one memory reference. If B is an accumulator, then
OPθ A, B, C uses none. Find the minimum number of storage refer-
ences generated by a program that computes a binary syntax tree with
k nodes.
*11.2.14. Show that Algorithm 11.2 produces a program which requires the
fewest number of memory references to compute a given expression.
"11.2.15. Taking Go as the underlying grammar, construct a syntax-directed
translation scheme which translates infix expressions into optimal
assembly language programs, assuming N accumulators and that
(a) No algebraic identities hold,
(b) + and • are commutative, and
(c) ÷ and • are associative and commutative.
11.2.16. Give algorithms to implement the procedures code, commute, and
acommute in linear time.
"11.2.17. Certain operators may require extra accumulators (e.g., subroutine
calls or multiple precision arithmetic). Modify Algorithms 11.1 and
11.2 to take into account the possible need for extra accumulators
by operators.
11.2.18. Algorithms 11.1-11.4 can also be applied to arbitrary binary dags if
we first convert a dag into a tree by duplicating nodes. Show that now
Algorithm 11.2 will not always generate an optimal program. Esti-
mate how bad Algorithm 11.2 can be under these conditions.
11.2.19. Generalize Algorithms 11.1-11.4 to work on expressions involving
operators with arbitrary numbers of arguments.
"11.2.20. Arithmetic expressions can have unary q- and unary -- operators.
Construct an algorithm to generate optimal code for arithmetic
expressions with unary q-- and --. [Assume that all operands are
distinct and that the usual algebraic laws relating unary ÷ and --
to the four binary arithmetic operators apply.]
"11.2.21. Construct an algorithm to generate optimal code for a single arith-
metic expression in which each operand is a distinct variable or an
integer constant. Assume the associative and commutative laws for
÷ and • as well as the following identities:
(1) a + O : O + a = O .
(2) a . 1 = 1 . a =t~.
(3) c1 0 cz = c3, where c3 is the integer that is the result of apply-
ing the operator/9 to integers c1 and cz.
11.2.22. Find an algorithm to generate optimal code for a single Boolean
expression in which each operand is a distinct variable or a Boolean
constant (0 or 1). Assume that Boolean expressions involve the
operators and, or, and not and that these operators satisfy the laws
of Boolean algebra. (See p. 23 of Volume I.)

**11.2.23. In certain situations two or more operations can be executed in parallel.
The algorithms in Sections 11.1 and 11.2 assume serial execution.
However, if we have a machine capable of performing parallel opera-
tions, then we might attempt to arrange the order of execution to
create as many simultaneous parallel computations as possible. For
example, suppose that we have a four-register machine in which four
operators can be simultaneously executed. Then the expression
A1 + A2 + A3 + A4 + A5 + A6 + A7 + A8 can be done as shown
in Fig. 11.21. In the first step we would load A1 into the first register,

Fig. 11.21 Tree for parallel computation.

A3 into the second, A5 into the third, and A7 into the fourth. In the
second step we would add A2 to register 1, A4 to register 2, A6 to
register 3, and A8 to register 4. After this step register 1 would con-
tain A1 + A2, register 2 would contain A3 + A4, and so forth. At
the third step we would add register 2 to register 1 and register 4 to
register 3. At the fourth step we would add register 2 to register 1.
Define an N-register machine in which up to N parallel operations can
be executed in one step. Assuming this machine, modify Algorithm
11.1 to generate optimal code (in the sense of fewest steps) for single
arithmetic expressions with distinct operands.

Research Problem
11.2.24. Find an efficient algorithm that will generate optimal code of the type
mentioned in this section for an arbitrary block.

Programming Exercise
11.2.25. Write programs to implement Algorithms 11.1-11.4.

BIBLIOGRAPHIC NOTES

Many papers have been written on the generation of good code for arithmetic
expressions for a specific machine or class of machines. Floyd [1961a] discusses a
number of optimizations involving arithmetic expressions including detection of
common subexpressions. He also suggested that the second operand of a non-
commutative binary operator be evaluated first. Anderson [1964] gives an algorithm
for generating code for a one-register machine that is essentially the same as the
code produced by Algorithm 11.1 when N = 1. Nakata [1967] and Meyers [1965]
give similar results.
The number of registers required to compute an expression tree has been inves-
tigated by Nakata [1967], Redziejowski [1969], and Sethi and Ullman [1970].
Algorithms 11.1-11.4 as presented here were developed by Sethi and Ullman [1970].
Exercise 11.2.11 was suggested by P. Stockhausen. Beatty [1972] and Frailey [1970]
discuss extensions involving the unary minus operator. An extension of Algo-
rithm 11.2 to certain dags was made by Chen [1972].
There are no known efficient algorithms for generating optimal code for arbi-
trary expressions. One heuristic technique for making register assignments in a
sequence of expression evaluations is to use the following algorithm.
Suppose that expression α is to be computed next and its value stored in a fast
register (accumulator).
(1) If the value of α is already stored in some register i, then do not recompute α.
Register i is now "in use."
(2) If the value of α is not in any register, store the value of α in the next unused
register, say register j. Register j is now in use. If there is no unused register avail-
able, store the contents of some register k in main memory, and store the value
of α in register k. Choose register k to be that register whose value will be unrefer-
enced for the longest time.
Belady [1966] has shown that this algorithm is optimal in some situations. However,
the model assumed by this algorithm (which was designed for paging) does not
exactly model straight-line code. In particular, it assumes the order of computation
to be fixed, while as we have seen in Sections 11.1 and 11.2, there is often much
advantage to be had by reordering computations.
A similar register allocation problem is discussed by Horwitz et al. [1966].
They assume that we are given a sequence of operations which reference and
change values. The problem is to assign these values to fast registers so that the
number of loads and stores from the fast registers to main memory is minimized.
Their solution is to select a least-cost path in a dag of possible solutions. Techniques
for reducing the size of the dag are given. Further investigation of register alloca-
tion where order of computation is not fixed has been done by Kennedy [1972]
and Sethi [1972].
Translating arithmetic expressions into code for parallel computers is discussed
by Allard et al. [1964], Hellerman [1966], Stone [1967], and Baer and Bovet [1968].

The general problem of assigning tasks optimally to parallel processors is


very difficult. Some interesting aspects of this problem are discussed by Graham
[1972].

11.3. PROGRAMS WITH LOOPS

When we consider programs that contain loops, it becomes literally
impossible to mechanically optimize such programs. Most of the difficulty
stems from undecidability results. Given two arbitrary programs there is no
algorithm to determine whether they are equivalent in any worthwhile sense.
As a consequence there is no algorithm which will find an optimal program
equivalent to a given program under an arbitrary cost criterion.
These results are understandable when we realize that, in general, there
are arbitrarily many ways to compute the same function. Thus, there is an
infinity of algorithms that can be used to implement the function defined
by the source program. If we want true optimization, a compiler would
have to determine the most efficient algorithm for the function computed by
the source program, and then it would have to generate the most efficient
code for this algorithm. Needless to say, from both a theoretical and a prac-
tical point of view, optimizing compilers of this nature do not exist.
However, in many situations there are a number of transformations that
can be applied to a program to reduce the size and/or increase the speed of
the resulting object language program. In this section we shall investigate
several such transformations. Through popular usage, transformations of
this nature have become known as "optimizing" transformations. A more
accurate term would be "code-improving" transformations. However, we
shall bow to tradition and use the more popular, but less accurate term
"optimizing transformation" for the remainder of this book. Our primary
goal will be to reduce the running time of the object language program.
We begin by defining intermediate programs with loops. These programs
will be very primitive, so that we can present the essential concepts without
going into a tremendous amount of detail. Then we define a flow graph for
a program. The flow graph is a two-dimensional representation of a program
that displays the flow of control between the basic blocks of a program.
A two-dimensional structure usually gives a more accurate representation
of a program than a linear sequence of statements.
In the remainder of this section we shall describe some important trans-
formations that can be applied to a program in an attempt to reduce the
running time of the object program. In the next section we shall look at ways
of collecting the information needed to apply some of the transformations
presented in this section.

11.3.1. The Program Model

We shall use a representation for programs that is intermediate between
source language and assembly language. A program consists of a sequence
of statements. Each statement may be labeled by an identifier followed by
a colon. There will be five basic types of statements: assignment, goto,
conditional, input-output, and halt.
(1) An assignment statement is a string of the form A ← θ B1 ... Br,
where A is a variable, B1, ..., Br are variables or constants, and θ is an
r-ary operator. As in the previous sections, we shall usually use infix nota-
tion for binary operators. We also allow a statement of the form A ← B in
this category.
(2) A goto statement is a string of the form

goto (label)

where (label) is a string of letters. We shall assume that if a goto statement is


used in a program, then the label following the word goto appears as the
label of a unique statement in the program.
(3) A conditional statement is of the form

if A (relation) B goto (label)

where A and B are variables or constants and (relation) is a binary relation


such as <, ≤, =, and ≠.
(4) An input-output statement is either a read statement of the form

read A

where A is a variable, or a write statement of the form

write B

where B is a variable or a constant. For convenience we shall use the state-


ment
read A1, A2, . . . , An

to denote the sequence of statements

read A1
read A2
. . .
read An

We shall use a similar convention for write statements.



(5) Finally, a halt statement is the instruction halt.


The intuitive meaning of each type of statement should be evident. For
example, a conditional statement of the form

if A r B goto L

means that if the relation r holds between the current values of A and B,
then control is to be transferred to statement labeled L. Otherwise, control
passes to the following statement.
A definition statement (or definition for short) is a statement of the form read A or of the form A ← θB1 ⋯ Br. Both statements are said to define the variable A.
We shall make some further assumptions about programs. Variables
are simple variables, e.g., A, B, C , . . . , or simple variables indexed by one
simple variable or constant, e.g., A(1), A(2), A(I), or A(J). Further, we shall
assume that all variables referenced in a program must be either input vari-
ables (i.e., appear in a previous read statement) or have been previously
defined by an assignment statement. Finally, we shall assume that each
program has at least one halt statement and that if a program terminates,
then the last statement executed is a halt statement.
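To make the program model concrete, the sketch below shows one way the five statement types might be represented as data structures. It is only an illustration in Python, under assumptions of our own; the class and field names are not from the text.

from dataclasses import dataclass
from typing import List, Optional, Union

Operand = Union[str, int, float]     # a variable name or a constant

@dataclass
class Assign:                        # A <- theta B1 ... Br  (or A <- B)
    target: str
    op: Optional[str]                # None for a simple copy A <- B
    args: List[Operand]

@dataclass
class Goto:                          # goto <label>
    label: str

@dataclass
class Cond:                          # if A <relation> B goto <label>
    left: Operand
    relation: str                    # e.g. "<", "<=", "=", "!="
    right: Operand
    label: str

@dataclass
class Read:                          # read A
    var: str

@dataclass
class Write:                         # write B
    value: Operand

@dataclass
class Halt:                          # halt
    pass

@dataclass
class Stmt:                          # a possibly labeled statement
    label: Optional[str]
    instr: Union[Assign, Goto, Cond, Read, Write, Halt]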
Execution of a program begins with the first statement of the program
and continues until a halt statement is encountered. We suppose that each
variable is of known type (e.g., integer, real) and that its value at any time
during the execution is either undefined or is a quantity of the appropriate
type. (It will be assumed that all operators used are appropriate to the types
of the variables to which they apply and that conversion of types occurs
when appropriate.)
In general the input variables of a program are those variables associated
with read statements and the output variables are the variables associated
with write statements. An assignment of a value to each input variable each
time it is read is called an input setting. The value of a program under an input
setting is the sequence of values written by the output variables during
the execution of the program. We say that two programs are equivalent if for each input setting the two programs have the same value.†
This definition of equivalence is a generalization of the definition of equivalent blocks used in Section 11.1. To see this, suppose that two blocks ℬ1 = (P1, I1, U1) and ℬ2 = (P2, I2, U2) are equivalent in the sense of Section 11.1. We convert ℬ1 and ℬ2 into programs 𝒫1 and 𝒫2 in the obvious way.

†We are assuming that the meaning of each operator and relational symbol, as well as
the data type of each variable, is established. Thus, our notion of equivalence differs from
that of the schematologists (see for example, Paterson [1968] or Luckham et al. [1970]),
in that they require two programs to give the same value not only for each input setting,
but for each data type for the variables and for each set of functions and relations that
we substitute for the operators and relational symbols.

That is, we place read statements for the variables in I1 and I2 in front of P1 and P2, respectively, and place write statements for the variables in U1 and U2 after P1 and P2. Then we append a halt statement to each program. However, we must add the write statements to P1 and P2 in such a fashion that each output variable is printed at least once and that the sequences of values printed will be the same for both 𝒫1 and 𝒫2. Since ℬ1 is equivalent to ℬ2, we can always do this.
The programs 𝒫1 and 𝒫2 are easily seen to be equivalent no matter what the space of input settings is and no matter what interpretation is placed on the functions represented by the operators appearing in 𝒫1 and 𝒫2. For example, we could choose the set of prefix expressions for the input space and interpret an application of operator θ to expressions e1, . . . , er to yield θe1 ⋯ er.
However, if ℬ1 and ℬ2 are not equivalent and 𝒫1 and 𝒫2 are programs that correspond to ℬ1 and ℬ2, respectively, then there will always be a set of data types for the variables and interpretations for the operators that causes 𝒫1 and 𝒫2 to produce different output sequences. In particular, let the variables have prefix expressions as a "type" and let the effect of operator θ on prefix expressions e1, e2, . . . , ek be the prefix expression θe1e2 ⋯ ek.
Of course, we may make assumptions about data types and the algebra connected with the function and relation symbols that will cause 𝒫1 and 𝒫2 to be equivalent. In that case ℬ1 and ℬ2 will be equivalent under the corresponding set of algebraic laws.

Example 11.26
Consider the following program for the Euclidean algorithm described on p. 26 (Volume I). The output is to be the greatest common divisor of two positive integers p and q.

read p
read q
loop: r ← remainder(p, q)
if r = 0 goto done
p ← q
q ← r
goto loop
done: write q
halt

If, for example, we assign the input variables p and q the values 72 and 56,
respectively, then the output variable q in the write statement will have value
8 when that statement is executed with the normal interpretation of the operators. Thus, the value of this program for the input setting p = 72, q = 56 is the "sequence" 8, which is generated by the output variable q.
If we replace the statement goto loop by if q ≠ 0 goto loop, we have an equivalent program. This follows because the statement goto loop cannot be reached unless the fourth statement finds r ≠ 0. Since q is given the value of r at the sixth statement, it is not possible that q = 0 when the seventh statement is executed. □

It should be observed that the transformations which we may apply are to


a large extent determined by the algebraic laws which we assume hold.

Example 11.27
For some types of data we might assume that a * a = 0 if and only if a = 0. If we assume such a law, then the following program is equivalent to the one in Example 11.26:

read p
read q
loop: r ← remainder(p, q)
t ← r * r
if t = 0 goto done
p ← q
q ← r
goto loop
done: write q
halt

Of course, this program would not be more desirable in any circumstance we can think of. However, without the law stated above, this program and the one in Example 11.26 might not be equivalent. □

Given a program P, our goal is to find an equivalent program P ' such


that the expected running time of the machine language version of P ' is less
than that of the machine language version of P. A reasonable approximation
of this goal is to find an equivalent program P " such that the expected num-
ber of machine language instructions to be executed by P" is less than the
number of instructions executed by P. The latter goal is an approximation in
that not every machine instruction requires the same amount of machine
time to be executed. For example, an operation such as multiplication or
division usually requires more time than an addition or subtraction. However, initially we shall concentrate on reducing the number of machine language instructions that need to be executed.
Most programs contain certain sequences of statements which are exe-
cuted considerably more often than the remaining statements in the pro-
gram. Knuth [1971] found that in a large sample of FORTRAN programs, a typical program spent over one-half of its execution time in less than 4% of the program. Thus, in practice it is often sufficient to apply the optimization procedures only to these heavily traveled regions of a program. Part of the optimization may involve moving statements from heavily traveled regions to lightly traveled ones even though the actual number of statements in the program itself remains the same or even increases.
We can often deduce what the most frequently executed parts of a source
program will be and pass this information along to the optimizing compiler
along with the source program. In other cases it is relatively easy to write
a routine that will count the number of times a given statement in a program
is executed as the program is run. With these counts we can obtain the "fre-
quency profile" of a program to determine those parts of the program in
which we should concentrate our optimization effort.

11.3.2. Flow Analysis

Our first step in optimizing a program is to determine the flow of control


within the program. To do this, we partition a program into groups of
statements such that no transfer occurs into a group except to the first state-
ment in that group, and once the first statement is executed, all statements
in the group are executed sequentially. We shall call such a group of state-
ments a basic block, or block if no confusion with the term "block" in the
sense of Section 11.1 arises.
DEFINITION
A statement S in a program P is a basic block entry if
(1) S is the first statement in P, or
(2) S is labeled by an identifier which appears after goto in a goto or
conditional statement, or
(3) S is a statement immediately following a conditional statement.
The basic block belonging to a block entry S consists of S and all state-
ments following S
(1) Up to and including a halt statement or
(2) Up to but not including the next block entry.
Notice that the program constructed from a block in the sense of Section
11.1 will be a basic block in the sense of this section.
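As an illustration (not part of the original text), the following Python sketch finds the block entries of a statement list and splits it into basic blocks by the three rules above. It assumes the hypothetical Stmt, Goto, Cond, and Halt classes sketched earlier, and that every referenced label exists.

def basic_blocks(stmts):
    """Partition a list of Stmt objects into basic blocks (lists of Stmt)."""
    entries = {0}                                   # rule (1): the first statement
    for i, s in enumerate(stmts):
        if isinstance(s.instr, (Goto, Cond)):
            # rule (2): the statement carrying the target label is an entry
            target = next(j for j, t in enumerate(stmts)
                          if t.label == s.instr.label)
            entries.add(target)
        if isinstance(s.instr, Cond) and i + 1 < len(stmts):
            entries.add(i + 1)                      # rule (3): statement after a conditional
    blocks, current = [], []
    for i, s in enumerate(stmts):
        if i in entries and current:
            blocks.append(current)
            current = []
        current.append(s)
        if isinstance(s.instr, Halt):               # a block also ends at a halt;
            blocks.append(current)                  # anything following that is not an
            current = []                            # entry would be dead code
    if current:
        blocks.append(current)
    return blocks

Applied to the Euclidean-algorithm program of Example 11.26, this sketch yields the four blocks of Example 11.28.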

Example 11.28

Consider the program of Example 11.26. There are four block entries, namely the first statement in the program, the statement labeled loop, the assignment statement p ← q, and the statement labeled done.
Thus, there are four basic blocks in the program. These blocks are given below:

Block 1:  read p
          read q
Block 2:  loop: r ← remainder(p, q)
          if r = 0 goto done
Block 3:  p ← q
          q ← r
          goto loop
Block 4:  done: write q
          halt  □
From the blocks of a program we can construct a graph that resembles
the familiar flow chart for the program.
DEFINITION

A flow graph is a labeled directed graph G containing a distinguished node n such that every node in G is accessible from n. Node n is called the begin node.
A flow graph of a program is a flow graph in which each node of the graph corresponds to a block of the program. Suppose that nodes i and j of the flow graph correspond to blocks i and j of the program. Then an edge is drawn from node i to node j if
(1) The last statement in block i is not a goto or halt statement and block j follows block i in the program, or
(2) The last statement in block i is goto L or if ⋯ goto L and L is the label of the first statement of block j.
The node corresponding to the block containing the first statement of
the program is the begin node.
Clearly, any block that is not accessible from the begin node can be
removed from a given program without changing its value. From now on
we shall assume that all such blocks have been removed from each program
under consideration.
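Continuing the same illustrative sketch, the edges of the flow graph of a program can be built directly from the two rules of the definition (again assuming the hypothetical statement classes above; blocks are the lists produced by basic_blocks).

def flow_graph(blocks):
    """Return (begin, edges): edges[i] is the set of block indices to which
    block i has an edge, following the two rules of the definition."""
    label_of = {}                        # label on a block's first statement -> block index
    for i, b in enumerate(blocks):
        if b[0].label is not None:
            label_of[b[0].label] = i
    edges = {i: set() for i in range(len(blocks))}
    for i, b in enumerate(blocks):
        last = b[-1].instr
        # rule (1): fall through to the next block unless the block ends
        # in an unconditional goto or a halt
        if not isinstance(last, (Goto, Halt)) and i + 1 < len(blocks):
            edges[i].add(i + 1)
        # rule (2): an explicit transfer goes to the block whose first
        # statement carries the target label
        if isinstance(last, (Goto, Cond)):
            edges[i].add(label_of[last.label])
    return 0, edges                      # block 0 holds the program's first statement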

Example 11.29
The flow graph for the program of Example 11.26 is given in Fig. 11.22. Block 1 is the begin node. □

[Fig. 11.22 shows the flow graph: Block 1 (read p; read q) is the begin node and has an edge to Block 2 (loop: r ← remainder(p, q); if r = 0 goto done); Block 2 has edges to Block 3 (p ← q; q ← r; goto loop) and to Block 4 (done: write q; halt); Block 3 has an edge back to Block 2.]

Fig. 11.22 Flow graph.

Many optimizing transformations on programs require knowing the


places in a program at which a variable is defined and where that definition
is subsequently referenced. These definition-reference relationships are
determined by the sequences of blocks that can actually be executed. The
first block in such a sequence is the begin node, and each subsequent block
must have an edge from the previous block. Sometimes the predicates used
in the conditional statements may preclude some paths in the flow graph
from being executed. However, there is no algorithm to detect all such situ-
ations, and we shall assume that no paths are precluded from execution.
It is also convenient to know for a block ℬ whether there is another block ℬ' such that each time ℬ is executed, ℬ' was previously executed. One application of this knowledge is that if the same value is computed in both ℬ and ℬ', then we can store this value after it is computed in ℬ' and thus avoid recomputing the same value in ℬ. We now develop these ideas formally.
DEFINITION

Let F be a flow graph whose blocks have names chosen from a set A.
A sequence of blocks ℬ1 ⋯ ℬn in A* is a (block) computation path of F if
(1) ℬ1 is the begin node of F, and
(2) For 1 < i ≤ n, there is an edge from block ℬi−1 to ℬi.

In other words, a computation path ℬ1 ⋯ ℬn is a path from ℬ1 to ℬn in F such that ℬ1 is the begin node.
We say that block ℬ' dominates ℬ if ℬ' ≠ ℬ and every path from the begin node to ℬ contains ℬ'. We say that ℬ' directly dominates ℬ if
(1) ℬ' dominates ℬ, and
(2) If ℬ" dominates ℬ and ℬ" ≠ ℬ', then ℬ" dominates ℬ'.
Thus, block ℬ' directly dominates ℬ if ℬ' is the block "closest" to ℬ which dominates ℬ.

Example 11.30
Referring to Fig. 11.22, the sequence 1232324 is a computation path.
Block 1 directly dominates block 2 and dominates blocks 3 and 4. Block 2
directly dominates blocks 3 and 4. □

Here are some algebraic properties of the dominance relation.


LEMMA 11.13
(1) If ℬ1 dominates ℬ2 and ℬ2 dominates ℬ3, then ℬ1 dominates ℬ3 (transitivity).
(2) If ℬ1 dominates ℬ2, then ℬ2 does not dominate ℬ1 (asymmetry).
(3) If ℬ1 and ℬ2 dominate ℬ3, then either ℬ1 dominates ℬ2 or conversely.
Proof. (1) and (2) are Exercises. We shall prove (3). Let 𝒞1 ⋯ 𝒞nℬ3 be any computation path with no cycles (i.e., 𝒞i ≠ ℬ3, and 𝒞i ≠ 𝒞j if i ≠ j). One such path exists since we assume that all nodes are accessible from the begin node. By hypothesis, 𝒞i = ℬ1 and 𝒞j = ℬ2 for some i and j. Assume without loss of generality that i < j. Then we claim that ℬ1 dominates ℬ2.
In proof, suppose that ℬ1 did not dominate ℬ2. Then there is a computation path 𝒟1 ⋯ 𝒟mℬ2, where none of 𝒟1, . . . , 𝒟m are ℬ1. It follows that 𝒟1 ⋯ 𝒟mℬ2𝒞j+1 ⋯ 𝒞nℬ3 is also a computation path. But none of the symbols preceding ℬ3 are ℬ1, contradicting the hypothesis that ℬ1 dominates ℬ3. □

LEMMA 11.14
Every block except the begin node (which has no dominators) has a unique direct dominator.
Proof. Let 𝒮 be the set of blocks that dominate some block ℬ. By Lemma 11.13 the dominance relation is a (strict) linear order on 𝒮. Thus, 𝒮 has a minimal element, which must be the direct dominator of ℬ. (See Exercise 0.1.23.) □

We now give an algorithm to compute the direct dominance relation for a flow graph.
ALGORITHM 11.5
Computation of direct dominance.
Input. A flow graph F with A = {ℬ1, ℬ2, . . . , ℬn}, the set of blocks in F. We assume that ℬ1 is the begin node.
Output. DOM(ℬ), the block that is the direct dominator of block ℬ, for each ℬ in A, other than the begin node.
Method. We compute DOM(ℬ) recursively for each ℬ in A − {ℬ1}. At any time, DOM(ℬ) will be the block closest to ℬ found to dominate ℬ. Ultimately, DOM(ℬ) will be the direct dominator of ℬ. Initially, DOM(ℬ) is ℬ1 for all ℬ in A − {ℬ1}. For i = 2, 3, . . . , n, do the following two steps:
(1) Delete block ℬi from F. Using Algorithm 0.3, find each block ℬ which is now inaccessible from the begin node of F. Block ℬi dominates ℬ if and only if ℬ is no longer accessible from the begin node when ℬi is deleted from F. Restore ℬi to F.
(2) Suppose that it has been determined that ℬi dominates ℬ in step (1). If DOM(ℬ) = DOM(ℬi), set DOM(ℬ) to ℬi. Otherwise, leave DOM(ℬ) unchanged.
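A direct, if naive, transcription of Algorithm 11.5 into Python might read as follows. Reachability (the role played by Algorithm 0.3) is done here with a simple depth-first search, and blocks are represented by the integers 1, . . . , n with block 1 the begin node. This is a sketch for illustration, not the authors' implementation.

def direct_dominators(n, edges):
    """edges[i] is the set of successors of block i; block 1 is the begin node.
    Returns DOM, with DOM[b] the direct dominator of each block b != 1."""
    def reachable(avoiding):
        seen, stack = set(), [1]
        while stack:
            v = stack.pop()
            if v in seen or v == avoiding:
                continue
            seen.add(v)
            stack.extend(edges.get(v, ()))
        return seen
    dom = {b: 1 for b in range(2, n + 1)}           # initially DOM(B) = B1
    for i in range(2, n + 1):
        alive = reachable(avoiding=i)               # step (1): delete block i
        for b in range(2, n + 1):
            if b != i and b not in alive:           # block i dominates block b
                if dom[b] == dom[i]:                # step (2)
                    dom[b] = i
    return dom

On the flow graph of Fig. 11.22 (edges 1 → 2, 2 → 3, 2 → 4, 3 → 2), this sketch returns {2: 1, 3: 2, 4: 2}, in agreement with Example 11.31.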

Example 11.31
Let us compute the direct dominators for the flow graph of Fig. 11.22 using Algorithm 11.5. Here A = {ℬ1, ℬ2, ℬ3, ℬ4}. The successive values of DOM(ℬ) after considering ℬi, 2 ≤ i ≤ 4, are given below:

      i      DOM(ℬ2)   DOM(ℬ3)   DOM(ℬ4)
   Initial     ℬ1        ℬ1        ℬ1
      2        ℬ1        ℬ2        ℬ2
      3        ℬ1        ℬ2        ℬ2
      4        ℬ1        ℬ2        ℬ2

Let us compute line 2. Deleting block ℬ2 makes blocks ℬ3 and ℬ4 inaccessible. We have thus determined that ℬ2 dominates ℬ3 and ℬ4. Prior to this point, DOM(ℬ2) = DOM(ℬ3) = ℬ1, and so by step (2) of Algorithm 11.5 we set DOM(ℬ3) to ℬ2. Likewise, DOM(ℬ4) is set to ℬ2. Deleting block ℬ3 or ℬ4 does not make any block inaccessible, so no further changes occur.

THEOREM 11.10
When Algorithm 11.5 terminates, DOM(ℬ) is the direct dominator of ℬ.

Proof. We first observe that step (1) correctly determines those ℬ's dominated by ℬi, for ℬi dominates ℬ if and only if every path to ℬ from the begin node of F goes through ℬi.
We show by induction on i that after step (2) is executed, DOM(ℬ) is that block ℬh, 1 ≤ h ≤ i, which dominates ℬ but which, in turn, is dominated by all ℬj's, 1 ≤ j ≤ i, which also dominate ℬ. That such a ℬh must exist follows directly from Lemma 11.13. The basis, i = 2, is trivial.
Let us turn to the inductive step. If ℬi+1 does not dominate ℬ, the conclusion is immediate from the inductive hypothesis. If ℬi+1 does dominate ℬ, but there is some ℬj, 1 ≤ j ≤ i, such that ℬj dominates ℬ and ℬi+1 dominates ℬj, then DOM(ℬ) ≠ DOM(ℬi+1). Thus, DOM(ℬ) does not change, which correctly fulfills the inductive hypothesis. If ℬi+1 dominates ℬ but is dominated by all ℬj's which dominate ℬ, 1 ≤ j ≤ i, we claim that prior to this step, DOM(ℬ) = DOM(ℬi+1). For if not, there must be some ℬk, 1 ≤ k ≤ i, which dominates ℬi+1 but not ℬ, which is impossible by Lemma 11.13(1). Thus DOM(ℬ) is correctly set to ℬi+1, completing the induction. □

We observe that if F is constructed from a program, then the number of edges is at most twice the number of blocks. Thus, step (1) of Algorithm 11.5 takes time proportional to the square of the number of blocks. The space taken is proportional to the number of blocks.
If ℬ1, ℬ2, . . . , ℬn are the blocks of a program (except for the begin block), then we can store the direct dominators of these blocks as the sequence 𝒞1, 𝒞2, . . . , 𝒞n, where 𝒞i is the direct dominator of ℬi, for 1 ≤ i ≤ n. All dominators for block ℬi can be recovered from this sequence easily by finding DOM(ℬi), DOM(DOM(ℬi)), and so forth until we reach the begin block.
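Recovering the full set of dominators of a block from this stored sequence is then a short chain walk; a minimal sketch:

def dominators_of(b, dom, begin=1):
    """All dominators of block b, given DOM from direct_dominators."""
    result = []
    while b != begin:
        b = dom[b]            # follow DOM(b), DOM(DOM(b)), ...
        result.append(b)
    return result             # ends with the begin block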

11.3.3. Examples of Transformations on Programs

Let us now turn our attention to transformations that can be applied to


a program, or its flow graph, in an attempt to reduce the running time of
the object language program that is ultimately produced. In this section we
shall consider examples of such transformations. Although there does not
exist a complete catalog of optimizing transformations for programs with
loops, the transformations considered here are useful for a wide class of
programs.

1. Removal of Useless Statements


This is a generalization of transformation T1 of Section 11.1. A statement
that does not affect the value of a program is unnecessary in a program and
can be removed. Basic blocks that are not accessible from the begin node
are clearly useless and can be removed when the flow graph is constructed.
Statements which compute values that are not ultimately used in computing

an output variable also fall into this category. In Section 11.4, we shall pro-
vide a tool for implementing this transformation in a program with loops.

2. Elimination of Redundant Computations


This transformation is a generalization of transformation T2 of Section 11.1. Suppose that we have a program in which block ℬ dominates block ℬ' and that ℬ and ℬ' have statements A ← B + C and A' ← B + C, respectively. If neither B nor C is redefined on any (not necessarily cycle-free) path from ℬ to ℬ' (it is not hard to detect this; see Exercise 11.3.5), then the values computed by the two expressions are the same. We may insert the statement X ← A after A ← B + C in ℬ, where X is a new variable. We then replace A' ← B + C by A' ← X. Moreover, if A is never redefined going from ℬ to ℬ', then we do not need X ← A, and A' ← A serves for A' ← B + C.
In this transformation we assume that it is cheaper to do the two assignments X ← A and A' ← X than to evaluate A' ← B + C, an assumption which is realistic for many reasonable machine models.

Example 11.32
Consider the flow graph shown in Fig. 11.23. In this flow graph block ℬ1 dominates blocks ℬ2, ℬ3, and ℬ4. Suppose that all assignment statements involving the variables A, B, C, and D are as shown in Fig. 11.23. Then the expression B + C has the same value when it is computed in blocks ℬ1, ℬ3, and ℬ4. Thus, it is unnecessary to recompute the expression B + C in blocks ℬ3 and ℬ4. In block ℬ1 we can insert the assignment statement X ← A after the statement A ← B + C. Here X is a new variable name. Then in blocks ℬ3 and ℬ4 we can replace the statements A ← B + C and G ← B + C by the simple assignment statements A ← X and G ← X, respectively, without affecting the value of the program. Note that since A is computed in block ℬ2, we cannot use A in place of X. The resulting flow graph is shown in Fig. 11.24.
The assignment A ← X now in ℬ3 is redundant and can be eliminated. Also note that if the statement F ← A + G in block ℬ2 is changed to B ← A + G, then we can no longer replace G ← B + C by G ← X in block ℬ4. □

Eliminating redundant computations (common subexpressions) from


a program requires the detection of computations that are common to two
or more blocks of a program. While we have shown a redundant computa-
tion occurring at a block and one of its dominators, an expression such as
A + B might be computed in several blocks, none of which dominate a given

[Fig. 11.23 shows a flow graph with five blocks: ℬ1 (B ← D + D; C ← D * D; A ← B + C) dominates ℬ2 (A ← B * C; F ← A + G), ℬ3 (A ← B + C; E ← A * A), and ℬ4 (which contains G ← B + C); a final block ℬ5 follows.]

Fig. 11.23 Flow graph.

block ℬ (which also needs expression A + B). In general, a computation of A + B is redundant in a block ℬ if
(1) Every path from the begin block to ℬ (including those which pass several times through ℬ) passes through a computation of A + B, and
(2) Along any such path, no definition of A or B occurs between the last computation of A + B and the use of A + B in ℬ.
In Section 11.4 we shall provide some tools for the detection of this more
general situation.
We should note that as in the straight-line case, algebraic laws can increase
the number of common subexpressions.

3. Replacing Run Time Computations by Compile Time Computations


It makes sense, if possible, to perform a computation once when a pro-
gram is being compiled, rather than repeatedly when the object program is
being executed. A simple instance of this is constant propagation, the replace-
ment of a variable by a constant when the constant value of that variable is
known.

[Fig. 11.24 shows the transformed flow graph: ℬ1 now contains B ← D + D; C ← D * D; A ← B + C; X ← A; block ℬ3 has A ← X in place of A ← B + C, and block ℬ4 has G ← X in place of G ← B + C; ℬ2 and ℬ5 are unchanged.]

Fig. 11.24 Transformed flow graph.

Example 11.33
Suppose that we have the block

read R
PI ← 3.14159
A ← 4/3
B ← A * PI
C ← R ↑ 3
V ← B * C
write V

We can substitute the value 3.14159 for PI in the fourth statement to obtain the statement B ← A * 3.14159. We can also compute 4/3 and substitute the resulting value in B ← A * 3.14159 to obtain B ← 1.33333 * 3.14159. We can compute 1.33333 * 3.14159 = 4.18878 and substitute 4.18878 in the statement V ← B * C to obtain V ← 4.18878 * C. Finally, we can eliminate the resulting useless statements to obtain the following shorter equivalent program:

read R
C ← R ↑ 3
V ← 4.18878 * C
write V
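Within a single basic block, constant propagation and folding of the kind used in Example 11.33 can be carried out in one left-to-right pass that keeps a table of variables currently known to hold constants. The sketch below is illustrative only: it reuses the hypothetical Assign, Read, and Write classes from earlier, handles only a few operators, and does not remove the useless statements that folding leaves behind.

def fold_constants(block):
    """One pass of constant propagation and folding over a basic block."""
    known = {}                                      # variable -> constant value
    ops = {'+': lambda a, b: a + b, '-': lambda a, b: a - b,
           '*': lambda a, b: a * b, '/': lambda a, b: a / b}
    for s in block:
        ins = s.instr
        if isinstance(ins, Read):
            known.pop(ins.var, None)                # value supplied at run time
        elif isinstance(ins, Assign):
            # replace each argument known to be constant by its value
            args = [known.get(a, a) if isinstance(a, str) else a
                    for a in ins.args]
            ins.args = args
            if ins.op is None and not isinstance(args[0], str):
                known[ins.target] = args[0]         # A <- constant
            elif ins.op in ops and all(not isinstance(a, str) for a in args):
                value = ops[ins.op](*args)          # fold at compile time
                ins.op, ins.args = None, [value]
                known[ins.target] = value
            else:
                known.pop(ins.target, None)         # A defined, value unknown
        elif isinstance(ins, Write) and isinstance(ins.value, str):
            ins.value = known.get(ins.value, ins.value)
    return block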

4. Reduction in Strength
Reduction in strength involves the replacement of one operator, requiring a substantial amount of machine time for execution, by a less costly computation. For example, suppose that a PL/I source program contains the statement

I = LENGTH(S1 || S2)

where S1 and S2 are strings of variable length. The operator || denotes string concatenation. String concatenation is relatively expensive to implement. However, suppose we replace this statement by the equivalent statement

I = LENGTH(S1) + LENGTH(S2)

We would now have to perform the length operation twice and perform one addition. But these operations are substantially less expensive than string concatenation.
Other examples of this type of optimization are the replacement of certain multiplications by additions and the replacement of certain exponentiations by repeated multiplication. For example, we might replace the statement C ← R ↑ 3 by the sequence

C ← R * R
C ← C * R

assuming that it is cheaper to compute R * R * R than to call subroutines to evaluate R ↑ 3 as ANTILOG(3 * LOG(R)).
In the next section we shall consider a more interesting form of reduction in strength within loops, where it is possible to replace certain multiplications by additions.

11.3.4. Loop Optimization


Roughly speaking, a loop in a program is a sequence of blocks that can
be executed repeatedly. Loops are an integral feature of most programs,
and many programs have loops that are executed a large number of times.
Many programming languages have constructs whose explicit purpose is


the establishment of a loop. Often substantial improvements in the running


time of a program can be made by taking advantage of transformations
that only reduce the cost of loops. The general transformations we just dis-
cussed--removal of useless statements, elimination of redundant compu-
tation, constant propagation, and reduction in strength--are particularly
beneficial when applied to loops. However, there are certain transformations
that are specifically directed at loops. These are the movement of compu-
tations out of loops, the replacement of expensive computations in a loop
by cheaper ones, and the unrolling of loops.
To apply these transformations, we must first isolate the loops in a given
program. In the case of FORTRAN DO loops, or the intermediate code
arising from a DO loop, the identification of a loop is easy. However, the
concept of a loop in a flow graph is more general than the loops that result
from DO statements in FORTRAN. These generalized loops in flow graphs
are called "strongly connected regions." Every cycle in a flow graph with a
single entry point is an example of a strongly connected region. However,
more general loop structures are also strongly connected regions. We define
a strongly connected region as follows.
DEFINITION

Let F be a flow graph and 𝒮 a subset of the blocks of F. We say that 𝒮 is a strongly connected region (region, for short) of F if
(1) There is a unique block ℬ of 𝒮 (the entry) such that there is a path from the begin node of F to ℬ which does not pass through any other block in 𝒮.
(2) There is a path (of nonzero length) that is wholly within 𝒮 from every block in 𝒮 to every other block in 𝒮.

Example 11.34
Consider the abstract flow graph of Fig. 11.25. {2, 3, 4, 5} is a strongly connected region with entry 2. {4} is a strongly connected region with entry 4. {3, 4, 5, 6} is a region with entry 3. {2, 3, 7} is a region with entry 2. Another region with entry 2 is {2, 3, 4, 5, 6, 7}. The latter region is maximal in that every other region with entry 2 is contained in this region. □

The important feature of a strongly connected region that makes it


amenable to code improvement is the single identifiable entry block. For
example, one optimization that we can perform on flow graphs is to move
a computation that is invariant within a region into the predecessors of the
entry block of the region (or we may construct a new block preceding the
entry, to hold the invariant computations).
We can characterize the entry blocks of a flow graph in terms of the
dominance relation.

Fig. 11.25 Flow graph.

THEOREM 11.11
Let F be a flow graph. Block ℬ in F is an entry block of a region if and only if there is some block ℬ' such that there is an edge from ℬ' to ℬ and ℬ either dominates ℬ' or is ℬ'.
Proof.
Only if: Suppose that ℬ is the entry block of region 𝒮. If 𝒮 = {ℬ}, the result is trivial. Otherwise, let ℬ' be in 𝒮, ℬ' ≠ ℬ. Then ℬ dominates ℬ', for if not, there is a path from the begin node to ℬ' that does not pass through ℬ, violating the assumption that ℬ is the unique entry block. Thus, the entry block of a region dominates every other block in the region. Since there is a path from every member of 𝒮 to ℬ, there must be at least one ℬ' in 𝒮 − {ℬ} which links directly to ℬ.
If: The case in which ℬ = ℬ' is trivial, and so assume that ℬ ≠ ℬ'. Define 𝒮 to be ℬ together with those blocks ℬ" such that ℬ dominates ℬ" and there is a path from ℬ" to ℬ which passes only through nodes dominated by ℬ. By hypothesis, ℬ and ℬ' are in 𝒮. We must show that 𝒮 is a region with entry ℬ. Clearly, condition (2) of the region definition is satisfied, and so we must show that there is a path from the begin node to ℬ that does not pass through any other block in 𝒮. Let 𝒞1 ⋯ 𝒞nℬ be a shortest computation path leading to ℬ. If some 𝒞j is in 𝒮, then, since ℬ dominates every other member of 𝒮, there is an i, 1 ≤ i ≤ j, such that 𝒞i = ℬ. But then 𝒞1 ⋯ 𝒞i is a computation path leading to ℬ that is shorter than 𝒞1 ⋯ 𝒞nℬ, a contradiction. Thus no 𝒞j is in 𝒮, and condition (1) of the definition of a strongly connected region holds. □

The set 𝒮 constructed in the "if" portion of Theorem 11.11 is clearly the maximal region with entry ℬ. It would be nice if there were a unique region with entry ℬ, but unfortunately this is not always the case. In Example 11.34, there are three regions with entry 2. Nevertheless, Theorem 11.11 is useful in constructing an efficient algorithm to compute maximal regions, which are unique.
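Such an algorithm can be read off the "if" part of Theorem 11.11: collect the entry together with every block that is dominated by the entry and can reach the entry through dominated blocks. The sketch below is one possible rendering (illustrative only); dominates(a, b) is assumed to be supplied, say by walking the DOM chains produced by Algorithm 11.5.

def maximal_region(entry, edges, dominates):
    """Maximal region with the given entry block, built as in the 'if' part of
    Theorem 11.11.  dominates(a, b) must be true iff block a dominates b."""
    preds = {}                                       # predecessor map
    for u, succs in edges.items():
        for v in succs:
            preds.setdefault(v, set()).add(u)
    region = {entry}
    # walk backwards from the entry through blocks dominated by the entry
    stack = [p for p in preds.get(entry, ()) if dominates(entry, p)]
    while stack:
        b = stack.pop()
        if b in region:
            continue
        region.add(b)
        stack.extend(p for p in preds.get(b, ()) if dominates(entry, p))
    return region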
Unless a region is maximal, the entry block may dominate blocks not in the region, and blocks in the region may be accessible from these blocks. In Example 11.34, e.g., region {2, 3, 7} can be reached via block 6. We therefore say that a region is single-entry if every edge entering a block of the region, other than the entry block, comes from a block inside the region. In Example 11.34, region {2, 3, 4, 5, 6, 7} is a single-entry region. In what follows, we assume regions to be single-entry, although generalization to all regions is not difficult.
1. Code Motion
There are several transformations in which knowledge of regions can be used to improve code. A principal one is code motion. We can move a region-independent computation outside the region. Let us say that within some single-entry region variables Y and Z are not changed but that the statement X ← Y + Z appears. We may move the computation of Y + Z to a newly created block which links only to the entry block of the region.† All links from outside the region that formerly went to the entry now go to the new block.

†The addition of such a block may make the flow graph unconstructable from any program. However, the property of constructability from a program is never used here.
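A sketch of code motion on the flow-graph representation appears below. It is illustrative only: is_invariant is an assumed predicate that must certify both that the statement's operands are not defined anywhere in the region and that hoisting the statement is otherwise safe, and Assign is the hypothetical statement class sketched earlier.

def move_invariant_code(blocks, edges, region, entry, is_invariant):
    """Hoist region-invariant assignments into a new block placed in front
    of the region's entry.  Returns the index of the new block."""
    pre = len(blocks)
    blocks.append([])                                # the new block before the entry
    edges[pre] = {entry}
    # edges from outside the region that went to the entry now go to the new block
    for b, succs in edges.items():
        if b not in region and b != pre and entry in succs:
            succs.discard(entry)
            succs.add(pre)
    for b in region:
        kept = []
        for s in blocks[b]:
            if isinstance(s.instr, Assign) and is_invariant(s, region):
                blocks[pre].append(s)                # move the computation out
            else:
                kept.append(s)
        blocks[b] = kept
    return pre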
Example 11.35
It may appear that region-invariant computations would not appear except in the most carelessly written programs. However, let us consider the following inner DO loop of a FORTRAN source program, where J is defined outside the loop:

      K = 0
      DO 3 I = 1, 1000
    3 K = J + 1 + I + K

The intermediate program for this portion of the source program might
look like this:

K ← 0
I ← 1
loop: T ← J + 1
S ← T + I
K ← S + K
if I = 1000 goto done
I ← I + 1
goto loop
done: halt

The corresponding flow graph is shown in Fig. 11.26.

[Fig. 11.26 shows the flow graph: ℬ1 (K ← 0; I ← 1) leads to ℬ2 (T ← J + 1; S ← T + I; K ← S + K; I = 1000?); on the false branch ℬ2 leads to ℬ3 (I ← I + 1), which returns to ℬ2; on the true branch ℬ2 leads to ℬ4 (halt).]

Fig. 11.26 Flow graph.

We observe that {ℬ2, ℬ3} in Fig. 11.26 is a region with entry ℬ2. The statement T ← J + 1 is invariant in the region, so it may be moved to a new block, as shown in Fig. 11.27.
While the number of statements in the flow graphs of Figs. 11.26 and 11.27 is the same, the presumption is that statements in a region will tend to be executed frequently, so that the expected time of execution has been decreased. □

2. Induction Variables
Another useful transformation concerns the elimination of what we shall
call induction variables.

[Fig. 11.27 shows the revised flow graph: ℬ1 (K ← 0; I ← 1) leads to a new block ℬ'2 (T ← J + 1), which leads to ℬ2 (S ← T + I; K ← S + K; I = 1000?); ℬ2 leads to ℬ3 (I ← I + 1), which returns to ℬ2, and to ℬ4 (halt).]

Fig. 11.27 Revised flow graph.

DEFINITION

Let 𝒮 be a single-entry region with entry ℬ and let X be some variable appearing in a statement of the blocks of 𝒮. Let ℬ1 ⋯ ℬnℬ𝒞1 ⋯ 𝒞m be any computation path such that 𝒞i is in 𝒮, 1 ≤ i ≤ m, and ℬn, if it exists, is not in 𝒮. Define X1, X2, . . . to be the sequence of values of X each time X is assigned in the sequence ℬ𝒞1 ⋯ 𝒞m. If X1, X2, . . . forms an arithmetic progression (with positive or negative difference) for arbitrary computation paths as above, then we say that X is an induction variable of 𝒮.
We shall also consider X to be an induction variable if it is undefined the first time through ℬ and forms an arithmetic progression otherwise. In this case, it may be necessary to initialize it appropriately on entry to the region from outside, in order that the optimizations to be discussed here may be performed.
Note that it is not trivial to find all the induction variables in a region. In fact, it can be proven that no such algorithm exists. Nevertheless, we can detect enough induction variables in common situations to make the concept worth considering.

Example 11.36
In Fig. 11.27, the region {ℬ2, ℬ3} has entry ℬ2. If ℬ2 is entered from ℬ'2 and the flow of control passes repeatedly from ℬ2 to ℬ3 and then back to ℬ2, the variable I takes on the values 1, 2, 3, . . . . Thus, I is an induction variable. Less obviously, S is an induction variable, since it takes on values T + 1, T + 2, T + 3, . . . . However, K is not an induction variable, because it takes values T + 1, 2T + 3, 3T + 6, . . . . □

The important feature of induction variables is that they are linearly related to each other as long as control is within their region. For example, in Fig. 11.27, the relations S = T + I and I = S − T hold every time we leave ℬ2.
If, as in Fig. 11.27, one induction variable is used only to control the region (indicated by the fact that its value is not needed outside the region and that it is always set to the same constant immediately before entering the region), it is possible to eliminate that variable. Even if all induction variables are needed outside the region, we may use only one inside the region and compute the remaining ones when we leave the region.

Example 11.37
Consider Fig. 11.27. We shall eliminate the induction variable I, which qualifies under the criteria listed above. Its role will be played by S. We observe that after executing ℬ2, S has the value T + I, so when control passes from ℬ3 back to ℬ2, the relation S = T + I − 1 must hold. We can thus replace the statement S ← T + I by S ← S + 1. But we must then initialize S correctly in ℬ'2, so that when control goes from ℬ'2 to ℬ2, the value of S after executing the statement S ← S + 1 is T + 1. Clearly, in block ℬ'2, we must introduce the new statement S ← T after the statement T ← J + 1.
We must then revise the test I = 1000? so that it is an equivalent test on S. When the test is executed, S has the value T + I. Consequently, an equivalent test is

R ← T + 1000
S = R?

Since R is region-independent, the calculation R ← T + 1000 can be moved to block ℬ'2. It is then possible to dispense with I entirely. The resulting flow graph is shown in Fig. 11.28.
We observe from Fig. 11.28 that block ℬ3 has been entirely eliminated and that the region has been shortened by one statement. Of course, ℬ'2 has been increased in size, but we are presuming that regions are executed more frequently than blocks outside the region. Thus, Fig. 11.28 represents a speeding up of Fig. 11.27.
We observe that the step S ← T in ℬ'2 can be eliminated if we identify S and T. This is possible only because at no time will the values of S and T be different, yet both will be "active" in the sense that they may both be used later in the computation. That is, only S is active in ℬ2, neither is active in ℬ1, and in ℬ'2 both are active only between the statements S ← T and R ← T + 1000. At that time they certainly have the same value. If we replace T by S, the result is Fig. 11.29.
To see the improvement between Figs. 11.26 and 11.29, let us convert
[Fig. 11.28 (further revised flow graph): ℬ1 (K ← 0) leads to ℬ'2 (T ← J + 1; S ← T; R ← T + 1000), which leads to ℬ2 (S ← S + 1; K ← S + K; S = R?); ℬ2 loops back to itself on the false branch and leads to ℬ4 (halt) on the true branch. Fig. 11.29 (final flow graph) is identical except that T has been replaced by S: ℬ'2 contains S ← J + 1; R ← S + 1000.]

Fig. 11.28 Further revised flow graph.   Fig. 11.29 Final flow graph.

each into assembly language programs for a crude one-accumulator machine. The operation codes should be transparent. (JZERO stands for "jump if accumulator is zero" and JNZ for "jump if accumulator is not zero.") The two programs are shown in Fig. 11.30.

(a) Program from Fig. 11.26:

      LOAD  =0
      STORE K
      LOAD  =1
LOOP: STORE I
      LOAD  J
      ADD   =1
      ADD   I
      ADD   K
      STORE K
      LOAD  I
      SUBTR =1000
      JZERO DONE
      LOAD  I
      ADD   =1
      JUMP  LOOP
DONE: END

(b) Program from Fig. 11.29:

      LOAD  =0
      STORE K
      LOAD  J
      ADD   =1
      STORE S
      ADD   =1000
      STORE R
LOOP: LOAD  S
      ADD   =1
      STORE S
      ADD   K
      STORE K
      LOAD  S
      SUBTR R
      JNZ   LOOP
      END

Fig. 11.30 Equivalent programs.


Observe that the length of the program in Fig. 11.30(b) is the same as that of Fig. 11.30(a). However, the loop in Fig. 11.30(b) is shorter than the loop in Fig. 11.30(a) (8 instructions vs. 12), which is the important factor when time is considered. □

3. Reduction in Strength
An interesting form of reduction in strength is possible within regions. If within a region there is a statement of the form A ← B * I, where the value of B is region-independent and the values of I at that statement form an arithmetic progression, we can replace the multiplication by addition or subtraction of a quantity which is the product of the region-independent value and the difference in the arithmetic progression of the induction variable. It is necessary to initialize properly the quantity computed by the former multiplication statement.
Example 11.38
Consider the following portion of a source program,

      DO 5 J = 1, N
      DO 5 I = 1, M
    5 A(I, J) = B(I, J)

which sets array A equal to array B, assuming that both A and B are M by N arrays. Suppose element A(I, J) is stored in location A + M * (J − 1) + I − 1 for 1 ≤ I ≤ M, 1 ≤ J ≤ N. Let us make a similar assumption about B(I, J). For convenience, let us denote location A + L by A(L). Then the following partially optimized intermediate program might be created from this source program:

M' ← M − 1
N' ← N − 1
J ← −1
outer: J ← J + 1
I ← −1
K ← M * J
loop: I ← I + 1
L ← K + I
A(L) ← B(L)
if I < M' goto loop
if J < N' goto outer
halt
The flow graph for this program is shown in Fig. 11.31. In this flow

[Fig. 11.31 shows the flow graph: ℬ1 (M' ← M − 1; N' ← N − 1; J ← −1) leads to ℬ2 (J ← J + 1; I ← −1; K ← M * J), which leads to ℬ3 (I ← I + 1; L ← K + I; A(L) ← B(L); I < M'?); ℬ3 loops back to itself, and on exit leads to ℬ4 (J < N'?), which returns to ℬ2 or goes on to ℬ5 (halt).]

Fig. 11.31 Flow graph.

graph {ℬ2, ℬ3, ℬ4} is a region in which M is invariant and J is an induction variable with increment 1. We can therefore replace the statement K ← M * J by the statement K ← K + M, provided K is initialized to −M outside the region. The flow graph that results is shown in Fig. 11.32. The program represented by this new flow graph is longer than before, but the region represented by blocks ℬ'2, ℬ3, and ℬ4 can be executed more quickly, because a multiplication has been replaced by an addition. Moreover, additional time can be saved by eliminating the induction variable J in favor of L.
It is interesting to note that we can obtain a far more economical program by replacing the entire region {ℬ'2, ℬ3, ℬ4} by a single block in which A(L) is set to B(L) for 0 ≤ L ≤ M * N − 1. The resulting flow graph is shown in Fig. 11.33. □

4. Loop Unrolling
The final code-improving transformation which we shall consider is exceedingly simple but often overlooked. It is loop unrolling. Consider the

[Fig. 11.32 (new flow graph): ℬ1 (M' ← M − 1; N' ← N − 1; J ← −1) leads to a new block containing K ← −M, which leads to ℬ'2 (J ← J + 1; I ← −1; K ← K + M); ℬ'2 leads to ℬ3 (I ← I + 1; L ← K + I; A(L) ← B(L); I < M'?), which loops to itself and exits to ℬ4 (J < N'?); ℬ4 returns to ℬ'2 or goes on to ℬ5 (halt).]

Fig. 11.32 New flow graph.

[Fig. 11.33 (final flow graph): an initialization block (L ← −1; T ← M * N) leads to a single loop block (L ← L + 1; A(L) ← B(L); L < T?) that repeats until the test fails and then halts.]

Fig. 11.33 Final flow graph.


flow graph in Fig. 11.34. Blocks ℬ2 and ℬ3 are executed 100 times. Thus 100 test instructions are executed. We could dispense with all 100 test instructions by "unrolling" the loop. That is, the loop could be unfolded into a straight-line block consisting of 100 assignment statements:

A(1) ← B(1)
A(2) ← B(2)
. . .
A(100) ← B(100)

A less frivolous approach would be to unroll the loop "once" to obtain the flow graph in Fig. 11.35. The program in Fig. 11.35 is longer, but fewer instructions are executed. In Fig. 11.35 only 50 test instructions are used, versus 100 for the program in Fig. 11.34.

[Fig. 11.34 (flow graph): ℬ1 (I ← 1) leads to ℬ2 (A(I) ← B(I); I > 100?), which goes to a halt block when the test succeeds and otherwise to ℬ3 (I ← I + 1), which returns to ℬ2. Fig. 11.35 (unrolled flow graph): ℬ1 (I ← 1) leads to a block containing A(I) ← B(I); I ← I + 1; A(I) ← B(I); I > 100?; on the false branch a block with I ← I + 1 returns to it, and on the true branch control goes to halt.]

Fig. 11.34 Flow graph.   Fig. 11.35 Unrolled flow graph.
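The effect can also be seen at the source level. The sketch below (in Python rather than the intermediate language, purely for illustration) copies one 100-element array into another, first with one test per element and then unrolled once, so that the loop test is executed only about half as often; it assumes the trip count is even.

# Straightforward loop: 100 iterations, one loop test per element copied.
def copy_plain(a, b):
    i = 0
    while i < 100:
        a[i] = b[i]
        i += 1

# The same loop unrolled once: two elements are copied per iteration,
# so the loop test is executed only half as many times.
def copy_unrolled(a, b):
    i = 0
    while i < 100:
        a[i] = b[i]
        a[i + 1] = b[i + 1]
        i += 2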

EXERCISES

11.3.1. Construct intermediate programs equivalent to the following source programs:
(a)  S = (A + B + C) * .5
     D = S * (S − A) * (S − B) * (S − C)
     AREA = SQRT(D)

(b)  for I := 1 step 1 until N do
     begin
       A[I] := B[I] + C[I * 2];
       if (A[I] = 0) then halt
       else A[I] := I
     end
(c)    DO 5 I = 1, N
     5 A(I, I) = C * A(I, I)

11.3.2. What functions are computed by the following two intermediate language programs?
(a)  read N
     S ← 0
     I ← 1
loop: S ← S + I
     if I ≥ N goto done
     I ← I + 1
     goto loop
done: write S
     halt
(b)  read N
     T ← N + 1
     T ← T * N
     T ← T * .5
     write T
     halt
Are the two programs equivalent if N and I represent integers and S and T represent reals?
11.3.3. Consider the following program P:

read A, B
R ← 1
C ← A * A
D ← B * B
if C < D goto X
E ← A * A
R ← R + 1
E ← E + R
write E
halt
X: E ← B * B
R ← R + 2
E ← E + R
write E
if E > 100 goto Y
halt
Y: R ← R − 1
goto X

Construct a flow graph for P.
11.3.4. Find the dominators and direct dominators of each node in the flow
graph of Fig. 11.36.
11.3.5. Let ON(ℬ1, ℬ2) be the set of blocks that can appear on a path from block ℬ1 to block ℬ2 (without going through ℬ1 again, although the path may go through ℬ2 more than once) in a flow graph. Show that if ℬ1 dominates ℬ2, then
ON(ℬ1, ℬ2) = {ℬ | there is a path from ℬ to ℬ2 when ℬ1 is deleted from the flow graph}.
What is the time required to compute ON(ℬ1, ℬ2)?

Fig. 11.36 Flow graph.



11.3.6. Prove assertions (1) and (2) of Lemma 11.13.


11.3.7. We can define a postdominance relation as follows. Let F be a flow graph and ℬ a node of F. A node ℬ' is a postdominator of ℬ if every path from ℬ to a halt statement passes through ℬ'. An immediate postdominator of ℬ is postdominated by every other postdominator of ℬ. Show that if a node in a flow graph has a postdominator, then it has an immediate postdominator.
11.3.8. Devise an algorithm to construct the immediate postdominators of all
nodes in a flow graph.
11.3.9. Find all strongly connected regions in Fig. 11.36. Which regions are
maximal ?
11.3.10. Let P be the program in Exercise 11.3.3.
(a) Eliminate all common subexpressions from P.
(b) Eliminate all unnecessary constant computations from P.
(c) Remove all invariant computations from the loop in P.
11.3.11. Find all induction variables in the following program. Eliminate as many as you can and replace as many multiplications by additions as possible.

I ← 1
read J, K
X: A ← K * I
B ← J * I
C ← A + B
write C
I ← I + 1
if I < 100 goto X
halt

11.3.12. Give algorithms to find all (a) regions, (b) single-entry regions, and (c) maximal regions in a flow graph.
*11.3.13. Give an algorithm to detect some of the induction variables in a single-entry region.
*11.3.14. Generalize the algorithm in Exercise 11.3.13 to handle regions that are not single-entry.
11.3.15. Give an algorithm to move region-independent computations out of a
(not necessarily single-entry) region. Hint: Blocks outside the region
that can reach the region other than by the region entry are permitted
to change variables involved in the region-invariant computation.

We may need to place new blocks between blocks outside the region
and blocks within the region.
**11.3.16. Show that it is undecidable whether two programs are equivalent. Hint: Choose appropriate data types and interpretations for the operators.
**11.3.17. Show that it is undecidable whether a variable is an induction variable.
11.3.18. Generalize the notion of scope of variables and statements to programs with loops. Give an algorithm to compute the scope of a variable in a program with loops.
*11.3.19. Extend transformations T1–T4 of Section 11.1 to apply to programs with no backward loops (programs with assignment statements and conditional statements of the form if x R y goto L, where L refers to a statement after this conditional statement).
*11.3.20. Show that it is undecidable whether a program will ever terminate.

Research Questions
11.3.21. Characterize the machine models for which the transformations we
have described will result in faster-running programs.
11.3.22. Develop algorithms that will detect large classes of the phenomena
with which we have been dealing in this section, e.g., loop-invariant
computations or induction variables. Note that, for most of these
phenomena, there is no algorithm to detect all such instances.

Open Question
11.3.23. Is it possible to compute direct dominators of an n-node flow graph in less than O(n²) steps? It is reasonable to suppose that O(n²) is the best we can do for the entire dominance relation, since it takes that long just to print the answer in matrix form.

BIBLIOGRAPHIC NOTES

There are several papers that have proposed various optimizing transformations
for programs. Nievergelt [1965], Marill [1962], McKeeman [1965], and Clark [1967]
list a number of machine independent transformations. Gear [1965] proposes an
optimizer capable of some common subexpression elimination, the propagation of
constants, and loop optimizations such as strength reduction and removal of in-
variant computations. Busam and Englund [1969] discuss similar optimizations in
the context of FORTRAN. Allen and Cocke [1972] provide a good survey of these
techniques. Allen [1969] discusses a global optimization scheme based on finding
the strongly connected regions of a program.
The dominator approach to code optimization was pioneered by Lowry and Medlock [1969], although the idea of the dominance relation comes from Prosser [1959].

There has been a great deal of theoretical work on program schemas, which are
similar to our flow graphs, but with unspecified spaces for the values of variables
and unspecified functions for operators. Two fundamental papers regarding equiva-
lence between such schemas independent of the actual spaces and functions are
Ianov [1958] and Luckham et al. [1970]. Kaplan [1970] and Manna [1973] survey
the area.

11.4. DATA FLOW ANALYSIS

In the previous section, we used certain information about the com-


putations in the blocks of a program without describing how this informa-
tion could be efficiently computed. In particular, we have used
(1) The "available" expressions upon entering a block. An expression
A -t- B is said to be available on entering a block if A -+- B is always computed
before reaching the block but not before a definition of A or B.
(2) The set of blocks in which a variable could have last been defined
before the flow of control reaches the current block. This information is
useful for propagating constants and detecting useless computations. Another
application is in detecting possible programmer errors in which a variable
is referenced before being defined.
A third type of information that can be computed using the techniques
of this section is the computation of active variables, those whose value
must be retained on exit from a block. This information is useful when blocks
are converted to machine code, as it indicates those variables which must
either be stored or retained in a fast register on exit from the block. In terms
of Section 11.1, this information is needed to determine which variables are
output variables. Note that a variable might not be computed in the block
in question (but rather in a previous block) and still be an input and output
variable of the block.
Of these three problems, we shall here discuss only question (2)--the
determination of where a variable could have been defined previous to reach-
ing a given block. Our technique, called "interval analysis," partitions a flow
graph into larger and larger sets of nodes, placing a hierarchical structure
on the entire graph. This structure is used to give an efficient algorithm for
a class of flow graphs, called "reducible" graphs, that occurs with surprising
frequency in flow graphs derived from actual programs. We then show
the extension necessary to handle irreducible graphs.
In the Exercises, we shall discuss some of the changes necessary to gather
the other two types of information using interval analysis.

11.4.1. Intervals

We begin with the definition of a type of subgraph that is useful in data


flow analysis.

DEFINITION

If h is a node of a flow graph F, we define I(h), the interval with header h, as the set of nodes of F constructed as follows:
(1) h is in I(h).
(2) If n is a node not yet in I(h), n is not the begin node, and all edges entering n leave nodes in I(h), then add n to I(h).
(3) Repeat step (2) until no more nodes can be added to I(h).

Example 11.39
Consider the flow graph of Fig. 11.37.

[Fig. 11.37 Flow graph.]

Let us consider the interval with header n1, the begin node. By step (1), I(n1) includes n1. Since the only edge to enter node n2 leaves n1, we add n2 to I(n1). Node n3 cannot be added to I(n1), since n3 can be entered from node n5 as well as n2. No other nodes can be added to I(n1). Thus, I(n1) = {n1, n2}.
Now let us consider I(n3). By step (1), n3 is in I(n3). However, we cannot add n4 to I(n3), since n4 may be entered via n6 (as well as n3) and n6 is not in I(n3). No other nodes can be added to I(n3), and so I(n3) = {n3}.
Continuing in this fashion, we can partition this flow graph into the following intervals:

I(n4) = {n4, n5, n6}
I(n7) = {n7, n8, n9}  □
We shall provide an algorithm for selecting interval headers and constructing the associated intervals so that a flow graph is partitioned into disjoint intervals. However, we shall first make three observations concerning intervals.
THEOREM 11.12
(1) The header h dominates every other node in I(h) [although not every node dominated by h need be in I(h)].
(2) For each node h of a flow graph F, the interval I(h) is unique and independent of the order in which candidates for n in step (2) of the definition of interval are chosen.
(3) Every cycle in an interval I(h) includes the interval header h.
Proof. We shall leave (1) and (2) for the Exercises and prove (3). Suppose that I(h) has a cycle n1, . . . , nk which excludes h. That is, there is an edge from ni to ni+1, 1 ≤ i < k, and an edge from nk to n1. Let ni be the first of n1, . . . , nk added to I(h). Then ni−1 (or nk, if i = 1) must have been in I(h) at that time, in contradiction. □

One of the interesting aspects of interval analysis is that flow graphs can be partitioned uniquely into intervals, and the intervals of one flow graph can be considered to be the nodes of another flow graph, in which an edge is drawn from interval I1 to a distinct interval I2 if there is any edge from a node of I1 to the header of I2. (There clearly cannot be an edge from I1 to a node of I2 other than the header.) This new graph can then be broken into intervals in the same way, and this process can be continued. For this reason, we shall subsequently consider a flow graph to be composed of nodes of unspecified type, rather than blocks. The nodes may thus represent structures of arbitrary complexity.
We shall now give the algorithm that partitions a flow graph into a set of disjoint intervals.

ALGORITHM 11.6
Partitioning a flow graph into disjoint intervals.
Input. A flow graph F.
Output. A set of disjoint intervals whose union is all the nodes of F.
Method.
(1) We shall associate with each node of F two parameters, a count and a reach. The count of a node n is a number which is initially the number of edges entering n. While executing the algorithm, the count of n is the number of these edges which have not yet been traversed. The reach of n is either undefined or some node of F. Initially, the reach of each node is undefined, except for the begin node, whose reach is itself. Eventually, the reach of a node n will be the first interval header h found such that there is an edge from some node in I(h) to n.
(2) We create a list of nodes called the header list. Initially, the header list contains only the begin node of F.
(3) If the header list is empty, halt. Otherwise, let n be the next node on the header list. Remove n from the header list.
(4) Then use steps (5)–(7) to construct the interval I(n). In these steps the direct successors of I(n) are added to the header list.
(5) I(n) is constructed as a list of nodes. Initially, I(n) contains only node n, and n is "unmarked."
(6) Select an unmarked node n′ on I(n), mark n′, and for each node n″ such that there is an edge from n′ to n″ perform the following operations:
(a) Decrease the count of n″ by 1.
(b) (i) If the reach of n″ is undefined, set it to n and do the following. If the count of n″ is now 0 (having been 1), then add n″ to I(n) and go to step (7); otherwise, add n″ to the header list if not already there and go to step (7).
(ii) If the reach of n″ is n and the count of n″ is 0, add n″ to I(n) and remove n″ from the header list, if it is there. Go to step (7).
If neither (i) nor (ii) applies, do nothing in part (b).
(7) If an unmarked node remains in I(n), return to step (6). Otherwise, I(n) is complete, and we return to step (3). □
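A compact rendering of the interval construction in Python is sketched below. It follows the definition of I(h) and the spirit of Algorithm 11.6, but for brevity it re-examines predecessor sets instead of maintaining the count and reach parameters; it is an illustration, not the authors' algorithm.

def intervals(begin, edges):
    """Partition a flow graph into disjoint intervals.  edges[n] is the set
    of successors of node n; begin is the begin node.  Returns a map from
    interval header to the set of nodes in its interval."""
    preds = {}
    for u, succs in edges.items():
        for v in succs:
            preds.setdefault(v, set()).add(u)
    nodes = set(edges) | set(preds) | {begin}
    assigned, headers, result = set(), [begin], {}
    while headers:
        h = headers.pop(0)
        if h in assigned:
            continue
        interval = {h}
        changed = True
        while changed:                        # step (2) of the definition of I(h)
            changed = False
            for n in nodes:
                if (n not in interval and n != begin and n not in assigned
                        and preds.get(n) and preds[n] <= interval):
                    interval.add(n)
                    changed = True
        assigned |= interval
        result[h] = interval
        for n in interval:                    # successors outside every interval
            for s in edges.get(n, ()):        # become candidate headers
                if s not in assigned and s not in headers:
                    headers.append(s)
    return result

On the flow graph of Fig. 11.38 this sketch produces the three intervals found in Example 11.40.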
DEFINITION
From the intervals of a flow graph F, we can construct another flow
graph I(F) which we call the derived graph of F. I(F) is defined as follows:
(1) I(F) has one node for each interval constructed in Algorithm 11.6.
(2) The begin node of I(F) is the interval containing the begin node of F.
(3) There is an edge from interval I to interval J if and only if I ~ J
and there is an edge from a node o f / t o the header of J.
sEc. 11.4 DATA FLOWANALYSIS 941

I(F), the derived graph of a flow graph F, shows the flow of control among
the intervals of F. Since I(F) is a flow graph itself, we can also construct
I(I(F)), the derived graph of I(F). Thus, given a flow graph F0 we can con-
struct a sequence of flow graphs F0, F1, . . . , Fn, which we call the derived
sequence of F0, in which Fi+1 is the derived graph of Fi, for 0 ≤ i < n, and Fn
is its own derived graph [i.e., I(Fn) = Fn]. We say that Fi is the ith derived
graph of F0. Fn is called the limit of F0. It is not hard to show that Fn always
exists and is unique.
If Fn is a single node, then F0 is said to be reducible.
It is interesting to note that if F0 is constructed from an actual pro-
gram, there is a high probability that F0 will be reducible. In Section 11.4.3,
we shall discuss a node-splitting technique whereby every irreducible flow
graph can be transformed into one that is reducible.
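
Continuing the illustrative sketch above, the derived sequence and the reducibility test might be programmed as follows; this is a hypothetical sketch built on the intervals function given earlier, and it assumes node names are mutually comparable (all strings, say) so that successive graphs can be compared for equality.

def derived_graph(succ, begin):
    # One step F -> I(F): collapse each interval of F to a single node,
    # named after its header.  In a valid interval partition, any edge that
    # leaves an interval enters the header of another interval, so it is
    # enough to test that the two endpoints lie in different intervals.
    parts = intervals(succ, begin)                    # header -> members
    owner = {n: h for h, ns in parts.items() for n in ns}
    dsucc = {h: set() for h in parts}
    for m in succ:
        for n in succ[m]:
            if owner[m] != owner[n]:
                dsucc[owner[m]].add(owner[n])
    return {h: sorted(dsucc[h]) for h in parts}, owner[begin]

def is_reducible(succ, begin):
    # Iterate F, I(F), I(I(F)), ... until I(F) = F; the flow graph is
    # reducible exactly when that limit is a single node.
    g = {m: sorted(set(succ[m])) for m in succ}       # canonical form
    b = begin
    while True:
        h, hb = derived_graph(g, b)
        if h == g:
            return len(g) == 1
        g, b = h, hb

For the flow graph of Example 11.40 this test should answer True; for the graph F of Example 11.41 and Fig. 11.40, whose derived graph is F itself, it should answer False.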

Example 11.40
Let us use Algorithm 11.6 to construct the intervals for the flow graph of
Fig. 11.38.
The begin node is n1. Initially, the header list contains only n1. To con-
struct I(n1), we add n1 to I(n1) as an unmarked node. We mark n1 by
considering n2, the direct successor of n1. In doing so, we decrease the count
of n2 from its initial value of 2 to 1, set the reach of n2 to n1, and add n2
to the header list. At this point no unmarked nodes remain in I(n1), so
I(n1) = {n1} is complete.
The header list now contains n2, the successor of I(n1). To compute I(n2),
we add n2 to I(n2) and then consider n3, whose count is 2. We decrease the
count of n3 by 1, set the reach of n3 to n2, and add n3 to the header list. Thus,
we find that I(n2) = {n2}.
The header list now contains n3, the successor of I(n2). To compute I(n3),
we begin by placing n3 in I(n3). We then consider nodes n4 and n5, decreasing
their counts from 1 to 0, making their reach n3, and adding both n4 and n5
as unmarked nodes to I(n3). We mark n4 by decreasing the count of n6 from
its initial value of 2 to 1, making n3 the reach of n6 and adding n6 to the
header list. When we mark n5 on I(n3), we change the count of n6 from 1 to 0,
remove n6 from the header list, and add n6 to I(n3).
To mark n6 on I(n3), we make the count of n7 0, set the reach of n7 to n3,
and add n7 to I(n3). Node n3 is considered next, since there is an edge from
n6 to n3. Since its reach is n2, n3 does not affect I(n3) or the header list at
this point. To mark n7, we make the count of n8 0, set the reach of n8 to
n3, and add n8 to I(n3). Node n2 is also a successor of n7, but since the reach
of n2 is n1, n2 is not added to I(n3) or the header list. Finally, to mark n8,
no operations are needed, since n8 has no successors. At this point no un-
marked nodes remain in I(n3), and so I(n3) = {n3, n4, n5, n6, n7, n8}.
The header list is now empty, and so the algorithm terminates. In sum-
mary, we have partitioned the flow graph into three disjoint intervals:

Fig. 11.38 Flow graph.

I(n1) = {n1}
I(n2) = {n2}
I(n3) = {n3, n4, n5, n6, n7, n8}

From these intervals we can construct the first derived flow graph F1. We can
then apply Algorithm 11.6 to F1 to obtain its intervals. Repeating this entire
process, we construct the sequence of derived flow graphs shown in Fig.
11.39. □
Example 11.41
Consider the flow graph F in Fig. 11.40. The intervals for F are

I(n1) = {n1}
I(n2) = {n2}
I(n3) = {n3}


Fig. 11.39 Sequence of flow graphs.

Fig. 11.40 Flow graph F.

We find that I(F) = F. Thus, F is not a reducible flow graph. □

THEOREM 11.13
Algorithm 11.6 constructs a set of disjoint intervals whose union is the
entire graph.

Proof. Disjointness is obvious. If a node is added to an interval in step
(6bi) of Algorithm 11.6, that node will not be added to the header list. If
a node is added to an interval in step (6bii), that node is removed from the
header list. Likewise, it is easy to show that the union of all I(n) constructed
is the set of nodes of F. Since F is a flow graph, every node is accessible
from the begin node of F and so is placed either on the header list or
in an interval. Unless a node is added to an interval, it will become the header
of its own interval.

Finally, we must show that each I(n) constructed is an interval. In step
(6), n" is added to I(n) if and only if its reach is n and its count has been
reduced to 0. Thus, every edge entering n" comes from a node already in I(n),
and n" can be added to I(n) according to the definition of an interval. □

We observe that Algorithm 11.6 can be executed in time proportional
to the number of edges in the flow graph on a random access computer.
Since a flow graph whose nodes are blocks of a program has no more than
two edges leaving any node, this is tantamount to saying that Algorithm
11.6 is linear in the number of blocks in the program. It is left for the Exer-
cises to show that each derived graph constructed from a program of n
blocks by repeated application of Algorithm 11.6 has no more than 2n edges.

11.4.2. Data Flow Analysis Using Intervals

We shall show how interval analysis can be used to determine the data
flow within a reducible flow graph. The particular problem that we shall discuss is
that of determining, for each block ℬ and for each variable A of a reducible
flow graph, at which statements of the program A could last have been defined
when control reaches ℬ. Subsequently, we shall extend the basic interval
analysis algorithm to irreducible flow graphs.
It is worthwhile pointing out that part of the merit in the interval approach
to data flow analysis lies in treating sets as packed bit vectors. The logical
AND, OR, and NOT operations on bit vectors serve to compute set inter-
sections, unions, and complements in a way that is quite efficient on most
computers.
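As a concrete illustration (with invented names and an invented numbering), the definition statements of a program can be numbered 0 through m − 1 and a set of definitions held in a single integer whose ith bit is 1 exactly when definition number i belongs to the set; the set operations then reduce to the logical instructions just mentioned. A minimal Python sketch:

m = 4                                # say, four definitions S1 .. S4
UNIVERSE = (1 << m) - 1              # the set of all definitions
IN_B     = 0b1011                    # {S1, S2, S4}, bit 0 standing for S1
TRANS_B  = 0b0110                    # {S2, S3}
GEN_B    = 0b1000                    # {S4}

OUT_B  = (IN_B & TRANS_B) | GEN_B    # one AND and one OR give 0b1010, i.e.
                                     # {S2, S4}, the kind of combination
                                     # used repeatedly below
NOT_IN = UNIVERSE & ~IN_B            # complement relative to all definitions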
We shall now construct tables that give, for each block ℬ in a program,
all locations l at which a given variable A is defined, such that there is a
path from l to ℬ along which A is not redefined. This information can be
used to determine the possible values of A upon entering ℬ.
We begin by defining four set-valued functions on blocks.
DEFINITION

A computation path from statement s1 to statement s2 is a sequence of
statements beginning with s1 and ending with s2 that may be executed in
that order during the execution of a program.
Let ℬ be a block of P. We define four sets of definition statements as
follows:
(1) IN(ℬ) = {d in P | there is a computation path from definition state-
ment d to the first statement of ℬ, such that no statement in this path, except
possibly the first statement of ℬ, redefines the variable defined by d}.
(2) OUT(ℬ) = {d in P | there is a computation path from d to the last
statement of ℬ, such that no statement in this path redefines the variable
defined by d}.
(3) TRANS(ℬ) = {d in P | the variable defined by d is not defined by any
statement in ℬ}.
(4) GEN(ℬ) = {d in ℬ | the variable defined by d is not subsequently
defined in ℬ}.
Informally, IN(ℬ) contains those definitions that can be active going into
ℬ. OUT(ℬ) contains those definitions that can be active coming out of ℬ.
TRANS(ℬ) contains the definitions transmitted through ℬ without redefini-
tion in ℬ. GEN(ℬ) contains those definitions generated in ℬ that are active
on leaving ℬ. It is easy to show that

OUT(ℬ) = (IN(ℬ) ∩ TRANS(ℬ)) ∪ GEN(ℬ)
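
As a hypothetical sketch of the local computations behind this identity, GEN(ℬ) and TRANS(ℬ) can be extracted from the definition statements of a single block, and OUT(ℬ) then follows from IN(ℬ) by the identity itself. The representation below (a block's definition statements as (label, variable) pairs in order, and a map from each variable to the labels of all statements in the program defining it) is an assumption of the sketch.

def gen_and_trans(block_defs, defs_of_var):
    # block_defs: definition statements of one block, in order, as
    # (label, variable) pairs.
    # defs_of_var: for each variable, the labels of ALL statements in the
    # program that define it.
    gen = set()
    defined_here = set()
    for label, var in reversed(block_defs):   # scan backwards: the last
        if var not in defined_here:           # definition of each variable
            gen.add(label)                    # in the block survives to exit
            defined_here.add(var)
    all_defs = set().union(*defs_of_var.values())
    killed = set().union(*(defs_of_var[v] for v in defined_here)) if defined_here else set()
    trans = all_defs - killed                 # definitions of variables the
    return gen, trans                         # block never defines

def out_of(in_set, trans, gen):
    # OUT(B) = (IN(B) ∩ TRANS(B)) ∪ GEN(B)
    return (in_set & trans) | gen

With the definitions of Example 11.42 below (I defined at S1, S4, and S8; J at S2 and S3), the call for the block whose only definition statement is S8 should return GEN = {S8} and TRANS = {S2, S3}, in agreement with the table of Example 11.45.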

Example 11.42
Consider the following program:

S1: I ← 1
S2: J ← 0
S3: J ← J + I
S4: read I
S5: if I < 100 goto S8
S6: write J
S7: halt
S8: I ← I * I
S9: goto S3

We have labeled each statement for convenience. The flow graph for this
program is shown in Fig. 11.41. Each block has been explicitly labeled.
Let us determine IN, OUT, TRANS, and GEN for block ℬ2.
Statement S1 defines I, and S1, S2, S3 is a computation path that does
not define I (except at S1). Since this path goes from S1 to the first statement
of ℬ2, we see that S1 ∈ IN(ℬ2). In this manner we can show that

IN(ℬ2) = {S1, S2, S3, S8}

Note that S4 is not in IN(ℬ2), because there is no computation path from
S4 to S3 that does not redefine I after S4.
OUT(ℬ2) does not include S1, since all computation paths from S1 to S5
redefine I. The reader should verify that

Fig. 11.41 Flow graph.

OUT(ℬ2) = {S3, S4}
TRANS(ℬ2) = ∅
GEN(ℬ2) = {S3, S4} □
The remainder of this section is concerned with the development of an
algorithm to compute IN(ℬ) for all blocks of a program. Suppose that
ℬ1, . . . , ℬk are all the direct predecessors of a block ℬ in P. (One of these
direct predecessors may be ℬ itself.) Clearly,

IN(ℬ) = ⋃_{i=1}^{k} OUT(ℬi)
      = ⋃_{i=1}^{k} [(IN(ℬi) ∩ TRANS(ℬi)) ∪ GEN(ℬi)]

To compute IN(ℬ), we could write this set equation for each block in the
program, along with IN(ℬ0) = ∅, where ℬ0 is the begin block, and then
attempt to solve the collection of simultaneous equations.† However, we
shall give an alternative method of solution that takes advantage of the
interval structure of flow graphs. We first define what we mean by an en-
trance and exit of an interval.

†As with the regular expression equations of Section 2.2, the solution may not be
unique. Here we want the smallest solution.
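
Before turning to the interval method, it may help to see, as an illustrative sketch with invented names, that the smallest solution of these simultaneous equations can also be obtained by plain iteration from empty sets, recomputing each IN(ℬ) from its predecessors' values until nothing changes; the interval technique developed below computes the same sets while avoiding this repeated recomputation.

def solve_in_iteratively(blocks, preds, begin, gen, trans):
    # Least solution of IN(B) = union over direct predecessors B' of
    # (IN(B') ∩ TRANS(B')) ∪ GEN(B'), with IN(begin block) = empty set.
    # blocks: the block names; preds[B]: direct predecessors of B;
    # gen[B], trans[B]: sets of definition labels.
    IN = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == begin:
                continue
            new = set()
            for p in preds[b]:
                new |= (IN[p] & trans[p]) | gen[p]
            if new != IN[b]:
                IN[b] = new
                changed = True
    return IN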

DEFINITION
Let P be a program and F0 its flow graph. Let F0, F1, . . . , Fn be the
derived sequence of F0. Each node in Fi, i ≥ 1, is an interval of Fi−1 and is
called an interval of order i.
The entrance of an interval of order 1 is the interval header. (Note that
this header is a block of the program.) The entrance of an interval of order
i > 1 is the entrance of the header of that interval. Thus, the entrance of
any interval is a basic block of the underlying program P.
An exit of I(n), an interval of order 1, is the last statement of a block ℬ
in I(n) such that ℬ has a direct descendant which is either the interval header
n or a block not in I(n). An exit of an interval I(n) of order i > 1 is the last
statement of a block ℬ contained within I(n)† such that there is an edge in
F0 from ℬ either to the header of interval n or to a block outside I(n).
Note that each interval has one entrance and zero or more exits.

Example 11.43
Let F0 be the flow graph in Fig. 11.41. Using Algorithm 11.6, we obtain

I1 = I(ℬ1) = {ℬ1}
I2 = I(ℬ2) = {ℬ2, ℬ3, ℬ4}

as the partition of F0 into intervals. From these intervals we can construct
the first derived graph F1 shown in Fig. 11.42. From F1 we can construct its
intervals (there is only one),

I3 = I(I1) = {I1, I2} = {ℬ1, ℬ2, ℬ3, ℬ4}

and obtain the limit flow graph F2, also shown in Fig. 11.42.

Fig. 11.42 Derived sequence of F0.

†Strictly speaking, an interval I of order i > 1 has intervals of order i − 1 as members.
We shall informally say that a block ℬ is in I if it is in one of I's members. Thus, the set
of blocks comprising an interval of arbitrary order is defined as we would intuitively expect.

I1 and I2 are intervals of order 1. The entrance of I2 is ℬ2. The entrance
of I3 is ℬ1. The only exit of I1 is statement S2. The only exit of I2 is statement
S9. I3 is an interval of order 2 with entrance ℬ1. I3 has no exits. □

We now extend the functions IN, OUT, TRANS, and GEN to intervals.
Let F0, F1, . . . , Fn be the derived sequence of F0, where F0 is the flow graph
of a program P. Let ℬ be a block of P and I an interval of some Fi, i ≥ 1.
We make the following recursive definitions:
(1) IN(I) = IN(ℬ) if I is of order 1 and ℬ is the header of I;
    IN(I) = IN(I′) if I is of order i > 1 and I′ is the header of I.
In (2), (3), and (4) below, s is an exit of I.
(2) OUT(I, s) = OUT(ℬ) if s is the last statement in ℬ and ℬ is in I.
(3) (a) TRANS(ℬ, s) = TRANS(ℬ) if s is the last statement in ℬ.
    (b) TRANS(I, s) is the set of statements d in P such that there exists
        a cycle-free path I1, I2, . . . , Ik consisting solely of nodes in I and
        a sequence of exits s1, . . . , sk of I1, . . . , Ik, respectively, such that
        (i) I1 is the header of I.
        (ii) In F0, sj is in a block that is a direct predecessor of the
             entrance of Ij+1 for 1 ≤ j < k.
        (iii) d is in TRANS(Ij, sj) for 1 ≤ j ≤ k.
        (iv) sk = s.
These conditions are illustrated in Fig. 11.43.

Fig. 11.43 TRANS(I, s).



(4) (a) GEN(ℬ, s) = GEN(ℬ) if s is the last statement of ℬ.
    (b) GEN(I, s) is the set of d in P such that there is a cycle-free path
        I1, . . . , Ik consisting solely of nodes in I and a sequence of exits
        s1, . . . , sk of I1, . . . , Ik, respectively, such that
        (i) d is in GEN(I1, s1).
        (ii) In F0, sj is in a block that is a direct predecessor of the
             entrance of Ij+1, 1 ≤ j < k.
        (iii) d is in TRANS(Ij, sj) for 2 ≤ j ≤ k.
        (iv) sk = s.
Note that I1 need not be the header of I here.
Thus, TRANS(I, s) is the set of definitions that can pass from the
entrance of I to exit s without being redefined in I. GEN(I, s) is the set of
definitions in I which can reach s without being redefined.

Example 11.44
Let us consider F0 of Fig. 11.41 and F1 and F2 of Fig. 11.42. In F1, interval
I2 is {ℬ2, ℬ3, ℬ4} and has exit S9. Thus, IN(I2) = IN(ℬ2) = {S1, S2, S3, S8},
and OUT(I2, S9) = OUT(ℬ3) = {S3, S8}.
TRANS(I2, S9) = ∅, since TRANS(ℬ2) = ∅.
GEN(I2, S9) contains S8, since there is a sequence of blocks consisting
of ℬ3 alone, with S8 in GEN(ℬ3, S9). Also, S3 is in GEN(I2, S9), because
of the sequence of blocks ℬ2, ℬ3, with exits S5 and S9. That is, S3 is in
GEN(ℬ2, S5), ℬ2 is a direct predecessor of ℬ3, and S3 is in TRANS(ℬ3, S9). □
We shall now give an algorithm to compute IN(ℬ) for all blocks of a pro-
gram P. The following algorithm works only for those programs that have
a reducible flow graph. Modifications necessary to do the computation for
irreducible flow graphs are given in the next section.
ALGORITHM 11.7
Computation of the IN function.
Input. A reducible flow graph F0 for a program P.
Output. IN(ℬ) for each block ℬ of P.
Method.
(1) Let F0, F1, . . . , Fk be the derived sequence of F0. Compute
TRANS(ℬ) and GEN(ℬ) for all blocks ℬ of F0.
(2) For i = 1, . . . , k, in turn, compute TRANS(I, s) and GEN(I, s)
for all intervals I of order i and exits s of I. The recursive definition of these
functions assures that this can be done.
(3) Define IN(I) = ∅, where I is the lone interval of order k. Set i = k.
(4) Do the following for all intervals of order i. Let I = {I1, . . . , In} be
an interval of order i. (I1, . . . , In are intervals of order i − 1, or blocks,
if i = 1.) We may assume that the ordering of these subintervals is the order
in which they were added to I in Algorithm 11.6. That is, I1 is the header,
and for each j > 1, {I1, . . . , Ij−1} contains all nodes of Fi−1 that are direct
predecessors of Ij.
(a) Let s1, s2, . . . , sr be the exits of I such that each st is in a block
of F0 that is a direct predecessor of the entrance of I. Set

IN(I1) = IN(I) ∪ ⋃_{t=1}^{r} GEN(I, st)

(b) For all exits s of I1,† set

OUT(I1, s) = (IN(I1) ∩ TRANS(I1, s)) ∪ GEN(I1, s)

(c) For j = 2, . . . , n, let sr1, sr2, . . . , srkr be the exits of Ir, 1 ≤ r < j,
such that each exit is in a block of F0 that is a direct predecessor of
the entrance of Ij. Set

IN(Ij) = ⋃_{r,l} OUT(Ir, srl)

OUT(Ij, s) = (IN(Ij) ∩ TRANS(Ij, s)) ∪ GEN(Ij, s)

for all exits s of Ij.

(5) If i = 1, halt. Otherwise, decrease i by 1 and return to step (4). □
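
The heart of the algorithm is step (4). The following schematic sketch shows how one application of step (4) to a single interval I might be coded, assuming that the derived sequence, the exits, and the GEN and TRANS functions of steps (1) and (2) have already been computed and stored; every data structure and name below is an assumption of the sketch, not a prescription.

def apply_step_4(I, subints, IN_of_I, exits, gen, trans, feeds_entrance):
    # subints: the subintervals I1, ..., In of I, in the order Algorithm 11.6
    #          added them (the first is the header).
    # IN_of_I: the set IN(I), already known for the enclosing interval.
    # exits[X]: the exits of interval (or block) X.
    # gen[(X, s)], trans[(X, s)]: GEN and TRANS at exit s of X (step (2)).
    # feeds_entrance: set of pairs (s, X) such that exit s lies in a block of
    #          F0 that is a direct predecessor of the entrance of X.
    IN, OUT = {}, {}
    header = subints[0]
    # (4a) IN of the header: IN(I) plus whatever is generated inside I on a
    #      path that feeds back to the entrance of I.
    IN[header] = set(IN_of_I)
    for s in exits[I]:
        if (s, I) in feeds_entrance:
            IN[header] |= gen[(I, s)]
    # (4b) OUT at every exit of the header subinterval.
    for s in exits[header]:
        OUT[(header, s)] = (IN[header] & trans[(header, s)]) | gen[(header, s)]
    # (4c) The remaining subintervals, in addition order, so that every
    #      direct predecessor inside I already has its OUT sets.
    for j, J in enumerate(subints[1:], start=1):
        IN[J] = set()
        for K in subints[:j]:
            for s in exits[K]:
                if (s, J) in feeds_entrance:
                    IN[J] |= OUT[(K, s)]
        for s in exits[J]:
            OUT[(J, s)] = (IN[J] & trans[(J, s)]) | gen[(J, s)]
    return IN, OUT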

Example 11.45
Let us apply Algorithm 11.7 to the flow graph of Fig. 11.41.
It is straightforward to compute GEN and TRANS for the four blocks of
F0. These results are summarized below:

Block    GEN          TRANS

ℬ1       {S1, S2}     ∅
ℬ2       {S3, S4}     ∅
ℬ3       {S8}         {S2, S3}
ℬ4       ∅            {S1, S2, S3, S4, S8}

For example, since ℬ3 defines only the variable I, ℬ3 "kills" the previous
definitions of I but transmits the definitions of J, namely S2 and S3. Since
no block defines a variable twice, all definition statements within a block
are in GEN of that block.
We observe that I1, consisting of ℬ1 alone, has one exit, the statement S2.

†If an interval has two exits connecting to the same next interval, they can be "merged"
for efficiency of implementation. The "merger" consists of taking the union of the GEN
and TRANS functions.

Since paths in I1 are trivial, GEN(I1, S2) = {S1, S2} and TRANS(I1, S2)
is the empty set.
I2 has exit S9. We saw in Example 11.44 that GEN(I2, S9) = {S3, S8}
and TRANS(I2, S9) = ∅.
We can thus begin to compute the IN function. As required, IN(I3) = ∅.
Then we can apply step (4) of Algorithm 11.7 to the two subintervals of I3.
The only permissible order for these is I1, I2. We compute in step (4a),
IN(I1) = IN(I3) = ∅, and in step (4b),

OUT(I1, S2) = (IN(I1) ∩ TRANS(I1, S2)) ∪ GEN(I1, S2) = {S1, S2}

Then, in step (4c),

IN(I2) = OUT(I1, S2) = {S1, S2}

Going to intervals of order 1, we must consider the constituents of I1
and I2. I1 consists only of ℬ1, and so we compute IN(ℬ1) = ∅. I2 consists
of ℬ2, ℬ3, and ℬ4, which we may consider in that order. In step (4a), we have
IN(ℬ2) = IN(I2) ∪ GEN(I2, S9) = {S1, S2, S3, S8}. In step (4b),

OUT(ℬ2, S5) = (IN(ℬ2) ∩ TRANS(ℬ2, S5)) ∪ GEN(ℬ2, S5) = {S3, S4}

Since S5 leads to ℬ3, we find that IN(ℬ3) = OUT(ℬ2, S5) = {S3, S4}. Then,
since S5 also leads to ℬ4, we find that

IN(ℬ4) = OUT(ℬ2, S5) = {S3, S4}

Summarizing, we have

IN(ℬ1) = ∅
IN(ℬ2) = {S1, S2, S3, S8}
IN(ℬ3) = {S3, S4}
IN(ℬ4) = {S3, S4} □

We can prove by induction on the order of I that:
(1) TRANS(I, s) is the set of definition statements d in P such that there
is a path from the first statement of the header of I up to s along which no
statement redefines the variable defined by d.
(2) GEN(I, s) is the set of definitions d such that there is a path from d
to s along which no statement redefines the variable defined by d.
Then we can prove the following statement by induction on the number
of applications of step (4) of Algorithm 11.7.
(11.4.1) If step (4) is applied to compute IN(Ij), then IN(Ij) is the set of
definitions d such that there is a path from d to the entrance of
Ij along which no statement redefines the variable defined by d,
and OUT(Ij, s) is the set of d such that there is a path from d to
s along which no statement redefines the variable defined by d.

The special case of (11.4.1), where Ij is a block, is the following theorem.

THEOREM 11.14
In Algorithm 11.7, for all basic blocks ℬ in P, IN(ℬ) is the set of defini-
tions d such that there is a path in F0 from d to the first statement of ℬ along
which no statement redefines the variable defined by d. □

11.4.3. Irreducible Flow Graphs

While not every flow graph is reducible, there is an additional concept,


called node splitting, which allows us to generalize Algorithm 11.7 to all
flow graphs. A node n With more than one edge entering is "split" into several
identical copies, one for each entering edge. Each copy of n thus has a single
edge entering and becomes part of the interval of the node from whence
this edge comes. Thus, :an application of node splitting followed by interval
construction will reduce the number of nodes in the graph by at least 1.
Repeating this process if necessary, we can transform any irreducible flow
graph into a reducible one.
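
Here is a minimal sketch of the splitting step by itself, under the same successor-list representation used in the earlier sketches; the copy names (n, 1), (n, 2), . . . are an invention of this sketch, which also assumes, as Lemma 11.15 below guarantees in the situation where splitting is applied, that the node being split has no edge to itself.

def split_node(succ, begin, n):
    # Split node n into one copy per entering edge (Section 11.4.3).
    # succ: dict from each node to its list of direct successors.
    # n must not be the begin node and must have at least two entering edges.
    entering = [(m, i) for m in succ for i, t in enumerate(succ[m]) if t == n]
    assert n != begin and len(entering) >= 2 and n not in succ[n]
    new_succ = {m: list(ts) for m, ts in succ.items() if m != n}
    for k, (m, i) in enumerate(entering, start=1):
        copy = (n, k)
        new_succ[copy] = list(succ[n])     # each copy inherits n's out-edges
        new_succ[m][i] = copy              # and receives exactly one in-edge
    return new_succ

Splitting node n3 of Fig. 11.40 in this way corresponds to the two copies written n3′ and n3″ in Example 11.46 below.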

Example 11.46
Consider the irreducible flow graph in Fig. 11.40 (p. 943). We can split
node n3 into two copies, n3! and n3, tt
to obtain the flow graph F ,t shown in
Fig. 11.44. The intervals for F' are
I, -- I(n,) - - [ n , , n'3]
Iz - - I ( n z ) - - [nz, nT}

Fig. 11.44 Split flow graph.

F~, the first derived graph of F' will have two nodes, as shown in Fig. 11.45.
The second derived graph of F' consists of a single node. Thus by node split-
ting we have transformed F into a reducible flow graph F'. [--]

Fig. 11.45 First derived graph.

We shall give a modified version of Algorithm 11.7 to take this new
technique into account. First, a simple observation is useful.
LEMMA 11.15
If G is a flow graph and I(G) = G, then every node n other than the begin
node has at least two entering edges, none of which comes from n itself.
Proof. Each edge from a node to itself disappears in an application of
the interval construction. Thus, assume that node n has only one entering
edge, from another node m. Then n is in I(m). If I(G) = G, then node m
eventually appears on the header list in Algorithm 11.6. But then n is placed
in I(m), and so I(G) cannot be G. □

ALGORITHM 11.8
General computation of the IN function.
Input. An arbitrary flow graph F for a program P.
Output. IN(ℬ) for each block ℬ of P.
Method.
(1) Compute GEN(ℬ) and TRANS(ℬ) for each block ℬ of F. Then
apply step (2) recursively to F. The input to step (2) is a flow graph G with
GEN(I, s) and TRANS(I, s) known for each node I of G and each exit s of I.
The output of step (2) is IN(I) for each node I of G.
(2) (a) Let G be the input to this step and let G, G1, . . . , Gk be the derived
sequence of G. If Gk is a single node, proceed exactly as in
Algorithm 11.7. If Gk is not a single node, we may compute GEN
and TRANS for all the nodes of G1, . . . , Gk as in Algorithm 11.7.
Then, by Lemma 11.15, Gk has some node other than the begin
node with more than one entering edge. Select one such node I.
If I has j entering edges, replace I by new nodes I1, . . . , Ij. One
edge enters each of I1, . . . , Ij, each from a different node from
which an edge previously entered I.
(b) For each exit s of I, create an exit si of Ii, 1 ≤ i ≤ j, and imagine
that in F there is an edge from each si to the entrance of every
node to which s connected in Gk. Define GEN(Ii, si) = GEN(I, s)
and TRANS(Ii, si) = TRANS(I, s) for 1 ≤ i ≤ j. Call the resulting
graph G′.
(c) Apply step (2) to G′. Recursively, the IN function will be com-
puted for G′. Then, compute the IN function for the nodes of Gk
by letting IN(I) = ⋃_{i=1}^{j} IN(Ii). No other changes to the IN
function are made.
(d) Compute IN for G from the IN function for Gk as in Algorithm
11.7.
(3) After step (2) is complete, the IN function will have been computed
for each block of F. This information forms the output of the algorithm. □

Example 11.47
Consider the flow graph F0 of Fig. 11.46(a). We can compute F1 = I(F0),
which is shown in Fig. 11.46(b). However, I(F1) = F1, so we must apply the
node splitting procedure of step (2). Let node {n2, n5} be I, and split I into I1
and I2. The result is shown in Fig. 11.47. We have chosen to connect n1 to I1
and {n3, n4} to I2. Edges from I1 and I2 to {n3, n4} have been drawn. Actually,
each exit of I is duplicated, one for I1 and one for I2. It is the duplicated
exits which connect to the entrance of {n3, n4}, a fact which is represented
by the two edges in Fig. 11.47. Note that the graph of Fig. 11.47 is reducible. □

Fig. 11.46 Nonreducible flow graphs.

Fig. 11.47 Flow graph.

THEOREM 11.15
Algorithm 11.8 always terminates.
Proof. By Lemma 11.15, each call of step (2) is either on a reducible
graph, in which case the call surely terminates, or there is a node I which
can be split. We observe that each of I1, . . . , Ij created in step (2) has a single
entering edge. Thus, when the interval construction is applied, they will
each find themselves in an interval with another node as header. We conclude
that the next call of step (2) will be on graphs with at least one fewer node,
so Algorithm 11.8 must terminate. □

THEOREM 11.16
Algorithm 11.8 correctly computes the IN function.
Proof. It suffices to observe that GEN and TRANS for I1, . . . , Ij in step
(2) are the same as for I. Moreover, IN(I) is clearly ⋃_{i=1}^{j} IN(Ii) and OUT(I)
is ⋃_{i=1}^{j} OUT(Ii). Since each Ii connects wherever I connects, the IN function
for nodes other than I in Gk is the same as in G′. Thus, a simple induction on
the number of calls of step (2) shows that IN is correctly computed. □

11.4.4. Chapter Summary

If we are going to construct an optimizing compiler, we must first decide
what optimizations are worthwhile. This decision should be based on the
characteristics of the class of programs the compiler is expected to compile.
Unfortunately, these characteristics are often hard to determine and little
has been published on this subject.

In this chapter we have approached code optimization from a rather
general point of view, and it would be wise to ask how the various aspects of
code optimization that we have discussed relate to each other.
The techniques of Section 11.2, on arithmetic expressions, can be used at
the time the final object program is constructed. However, some aspects of
the algorithms in Section 11.2 can also be incorporated into the generation
of intermediate code. That is, portions of the algorithms of that section can
be built into the output of the syntax analyzer. This will lead to straight-line
blocks that tend to make efficient use of registers.
At the code generation phase of the compiler, we have an intermediate
program which we may suppose looks something like the "programs" of
Section 11.3. Our first task is to construct the flow graph in the manner
described in that section. A possible next step is to perform loop optimiza-
tions as described in Section 11.3, starting with inner loops and proceeding
outward.
Having done this, we can compute global data flow information, e.g.,
as suggested by Algorithm 11.8 and/or Exercises 11.4.19 and 11.4.20. With
this information, we can perform the "global" optimizations of Section 11.3,
e.g., constant propagation and common subexpression elimination. At this
stage, however, we must be careful not to add steps to inner loops. To do
this, we could flag blocks in inner loops and in such blocks avoid an addi-
tional store of an expression, even if that expression were used later on in
a block outside the loop. If the machine for which we are generating code
has more than one register, we can use active variable determination (Exercise
11.4.20) to determine which variables should occupy registers on exit from
blocks.
Finally, we can treat the basic blocks by the methods of Section 11.1
or an analogous method, depending on the exact form of the intermediate
language. Also at this stage, we allocate registers within the block, subject
to the constraints imposed by the global register assignment mentioned above.
Some heuristic techniques are usually needed here.

EXERCISES

11.4.1. Construct the derived sequence of flow graphs for the flow graphs in
Fig. 11.32 (p. 931) and Fig. 11.36 (p. 934). Are the flow graphs reduci-
ble?
11.4.2. Give additional examples of irreducible flow graphs.
11.4.3. Prove Theorem 11.12(1)and (2).
*11.4.4. Show that Algorithm 11.6 can be implemented to run in time propor-
tional to the number of edges in flow graph F.

11.4.5. Prove Theorem 11.14.


11.4.6. Complete the proof of Theorem 11.16.
11.4.7. Use interval analysis (Algorithm 11.7) as the basis of an algorithm that
determines, given a statement which references variable A, whether A
was explicitly defined to have the same constant value at each execu-
tion of the statement. Hint: It is necessary to determine which defini-
tion statements defining A could have been the previous definition of
A before the current execution of the statement in question. It is easy
to determine this if there is a previous statement defining A in the
block of the statement in question. If not, we need IN(G) for the block
of the statement. In the latter case, we say that A has a constant value
if and only if all definition statements in IN((B) which define A give
A the same constant value.
11.4.8. Give an algorithm using interval analysis as a basis to determine
whether a statement S is useless, i.e., whether there is no statement
which might use the value defined by S.
11.4.9. Let ℬ be a block of a flow graph with edges to ℬ1 and ℬ2. Let d be
a definition statement in ℬ whose value is not used by ℬ. If no block
accessible from ℬ2 uses the value defined by d, then d may be moved
to ℬ1. Use interval analysis as the basis of an algorithm to detect such
situations.
11.4.10. Compute IN for each block of the following program:

              N ← 2
        Y:    I ← 2
        W:    if I < N goto X
              write N
        Z:    N ← N + 1
              goto Y
        X:    J ← remainder(N, I)
              if J = 0 goto Z
              I ← I + 1
              goto W

11.4.11. Compute IN for each block of the following program:

              read I
              if I = 1 goto X
        Z:    if I > 10 goto Y
        X:    J ← I + 3
              write J
        W:    I ← I + 1
              goto Z
        Y:    I ← I − 1
              if I > 15 goto W
              halt

*11.4.12. Let T1 and T2 be two transformations defined on flow graphs as
follows:
T1: If node n has an edge to itself, remove that edge.
T2: If node n has a single entering edge, from node m, and n is
not the begin node, merge m and n by replacing the edge
from m to n by edges from m to each node n′ which was
formerly entered by an edge from n. Then delete n.
Show that if T1 and T2 are applied to a flow graph F until they can
no longer be applied, then the result is the limit of F.
"11.4.13. Use the transformations T1 and T2 to give an alternative way of
computing the IN function without using interval analysis.
**11.4.14. Let G be a flow graph with initial node n0. Show that G is irreducible
if and only if it has nodes n1, n2, and n3 such that there are paths
from n0 to n1, from n1 to n2 and n3, from n2 to n3, and from n3 to n2
(see Fig. 11.48) which do not coincide except at their end points. All
of n0, n1, n2, and n3 must be distinct, with the exception that n1 may
be n0.

Fig. 11.48 Pattern in every irreducible flow graph.

11.4.15. Show that every d-chart (See Section 1.3.2) is reducible. Hint: Use Exer-
cise 11.4.14.
11.4.16. Show that every FORTRAN program in which every transfer to a
previous statement of the program is caused by a DO loop has a
reducible flow graph.

**11.4.17. Show that one can determine in time O(n log n) whether a program
flow graph is reducible. Hint: Use Exercise 11.4.12.
*11.4.18. What is the relation between the concepts of intervals and single-entry
regions?
*11.4.19. Give an interval-based algorithm that determines for each expression
(say A + B) and each block ℬ whether every execution of the program
must reach a statement which computes A + B (i.e., there is a state-
ment of the form C ← A + B) and which does not subsequently redefine
A or B. Hint: If ℬ is not the begin block, let IN(ℬ) = ⋂ OUT(ℬi),
where the ℬi's are all the direct predecessors of ℬ. Let OUT(ℬ) be

(IN(ℬ) − X) ∪ Y

where X is the set of expressions "killed" in ℬ (we "kill" A + B by
redefining A or B) and Y is the set of expressions computed by the
block and not killed. For each interval I of the various derived graphs
and each exit s of I, compute GEN′(I, s) to be the set of expres-
sions which are computed and not subsequently killed in every path
from the entrance of I to exit s. Also, compute TRANS′(I, s) to be
the set of expressions which, if killed, are subsequently generated in
every such path. Note that GEN′(I, s) ⊆ TRANS′(I, s).
*11.4.20. Give an algorithm, based on interval analysis, that determines for
each variable A and each block ℬ whether there is an execution
which, after passing through ℬ, will reference A before redefining it.
11.4.21. Let F be an n node flow graph with e edges. Show that the ith derived
graph of F has no more than e − i edges.
*11.4.22. Give an example of an n node flow graph with 2n edges whose derived
sequence is of length n.
*11.4.23. Show that Algorithm 11.7 and the algorithms of Exercises 11.4.19
and 11.4.20 take at most O(n²) bit vector steps on flow graphs of n
nodes and at most 2n edges.
There is another approach to data flow analysis which is tabular
in nature. For example, in analogy with Algorithm 11.8, we could
compute a table IN(d, ℬ) which has value 1 if definition d is in
IN(ℬ) and 0 otherwise. Initially, let IN(d, ℬ) = 1 if and only if there is
a node ℬ′ with an edge to ℬ and d is in GEN(ℬ′). For each 1 added
to the table, say at entry (d, ℬ), place a 1 in entry (d, ℬ″) if there is
an edge from ℬ to ℬ″ and ℬ does not kill d.
*11.4.24. Show that the above algorithm correctly computes IN(ℬ) and that it
operates in time O(mn) on an n node flow graph with at most 2n edges
and m definitions.
*11.4.25. Give algorithms similar to the above performing the tasks of Exercises
11.4.19 and 11.4.20. How fast do they run?
**11.4.26. Give algorithms requiring O(n log n) bit vector steps to compute the IN
functions of Algorithm 11.7 or Exercise 11.4.19 for flow graphs of n
nodes.
**11.4.27. Show that a flow graph is reducible if and only if its edge set can be
partitioned into two sets E1 and E2, where (1) E1 forms a dag, and
(2) if (m, n) is in E2, then m = n or n dominates m.
**11.4.28. Give an O(n log n) algorithm to compute direct dominators for an n
node reducible graph.

Research Problems
11.4.29. Suggest some additional data flow information (other than that men-
tioned in Algorithm 11.7 and Exercises 11.4.19 and 11.4.20) which
would be useful for code optimization purposes. Give algorithms to
compute these, both for reducible and for irreducible flow graphs.
11.4.30. Are there techniques to compute the IN function of Algorithm 11.8 or
other data flow functions that are superior to node splitting for irre-
ducible graphs? By "superior," we are assuming that bit vector
operations are permissible, or else the algorithms of Exercises 11.4.24
and 11.4.25 are clearly optimal.

BIBLIOGRAPHIC NOTES

The interval analysis approach to code optimization was developed by Cocke
[1970] and further elaborated by Cocke and Schwartz [1970] and Allen [1970].
Kennedy [1971] discusses a global algorithm that uses interval analysis to recognize
active variables in a program (Exercise 11.4.20).
The solutions to Exercises 11.4.12-11.4.16 can be found in Hecht and Ullman
[1972a]. Exercise 11.4.17 is from Hopcroft and Ullman [1972b]. An answer to
Exercise 11.4.19 can be found in Cocke [1970] or Schaefer [1973]. Exercises 11.4.24
and 11.4.26 are from Ullman [1972b]. Exercise 11.4.27 is from Hecht and Ullman
[1972b]. Exercise 11.4.28 is from Aho, Hopcroft and Ullman [1972].
There are several papers that discuss the implementation of optimizing com-
pilers. Lowry and Medlock [1969] discuss some optimizations used in the OS/360
FORTRAN H compiler. Busam and Englund [1969] present techniques for the
recognition of common subexpressions, the removal of invariant computations
from loops, and register allocation in another FORTRAN compiler.
Knuth [1971] collected a large sample of FORTRAN programs and analyzed
some of their characteristics.
BIBLIOGRAPHY FOR VOLUMES I AND II

AHO, A. V. [1968]
Indexed grammars~an extension of context-free grammars.
J. A CM 15: 4, 647-671.
AHO, A. V. (ed.) [1973]
Currents in the Theory of Computing.
Prentice-Hall, Englewood Cliffs, N.J.
AHO, A. V., P. J. DENNING, and J. D. ULLMAN [1972]
Weak and mixed strategy precedence parsing.
J. A C M 19:2, 225-243.
AHO, A. V., J. E. HOPCROFT, and J. D. ULLMAN[1968]
Time and tape complexity of pushdown automaton languages.
Information and Control 13:3, 186-206.
AHO A. V., J. E. HOPCROFT, and J. D. ULLMAN [1972]
On finding lowest common ancestors in trees.
Proc. Fifth Annual ACM Symposium on Theory of Computing, (May, 1973),
253-265.
AHO, A. V., S. C. JOHNSON, and J. D. ULLMAN [1972]
Deterministic parsing of ambiguous grammars.
Unpublished manuscript, Bell Laboratories, Murray Hill, N.J.
AHO, A. V., R. SETHI, and J. D. ULLMAN[1972]
Code optimization and finite Church-Rosser systems.
In Design and Optimization of Compilers
(R. Rustin, ed.). Prentice-Hall, Englewood
Cliffs, N.J., pp. 89-106.
AHO, A. V., and J. D. ULLMAN [1969a]
Syntax directed translations and the pushdown assembler.
J. Computer and System Sciences 3:1, 37-56.


AHO, A. V., and J. D. ULLMAN[1969b]


Properties of syntax directed translations.
J. Computer and System Sciences 3: 3, 319-334.
AHO, A. V., and J. D. ULLMAN[1971]
Translations on a context-free grammar.
Information and Control 19: 5, 439-475.
AHO, A. V., and J. D. ULLMAN[1972a]
Linear precedence functions for weak precedence grammars.
International J. Computer Mathematics, Section A, 3, 149-155.
AHO, A. V., and J. D. ULLMAN[1972b]
Error detection in precedence parsers.
Mathematical Systems Theory, 7:2 (February 1973), 97-113.
AHO, A. V., and J. D. ULLMAN[1972C]
Optimization of LR(k) parsers.
J. Computer and System Sciences, 6:6, 573-602.
AHO, A. V., and J. D. ULLMAN[1972d]
A technique for speeding up LR(k) parsers.
Proc. Fourth Annual A C M Symposium on Theory of Computing, pp. 251-263.
AHO, A. V., and J. D. ULLMAN[1972e]
Optimization of straight line code.
S I A M J. on Computing 1: 1, 1-19.
AHO, A. V., and J. D. ULLMAN[1972f]
Equivalence of programs with structured variables.
J. Computer and System Sciences 6: 2, 125-137.
AHO, A. V., and J. D. ULLMAN[1972g]
LR(k) syntax directed translation.
Unpublished manuscript, Bell Laboratories, Murray Hill, N.J.
ALLARD, R. W., K. A. WOLF, and R. A. ZEMLIN [1964]
Some effects of the 6600 computer on language structures.
Comm. A C M 7:2, 112-127.
ALLEN, F. E. [1969]
Program optimization.
Annual Review in Automatic Programming, Vol. 5., Pergamon, Elmsford, N.Y.
ALLEN, F. E. [1970]
Control flow analysis.
A C M SIGPLAN Notices 5:7, 1-19.
ALLEN, F. E., and J. COCKE [1972]
A catalogue of optimizing transformations.
In Design and Optimization of Compilers
(R. Rustin, ed.). Prentice-Hall, Englewood Cliffs, N.J., pp. 1-30.
ANDERSON, J. P. [1964]
A note on some compiling algorithms.
Comm. A C M 7:3, 149-150.

ANS X.3.9 [1966]


American National Standard FORTRAN.
American National Standards Institute, New York.
ANSI SUBCOMMITTEEX3J3 [1971]
Clarification of FORTRAN standards--second report.
Comm. A C M 14:10, 628-642.
ARBIB, M. A. [1969]
Theories of Abstract Automata.
Prentice-Hall, Englewood Cliffs, N.J.
BACKUS, J. W., et al. [1957]
The FORTRAN Automatic Coding System.
Proc. Western Joint Computer Conference, Vol. 11, pp. 188-198.
BAER, J. L., and D. P. BOVET[1968]
Compilation of arithmetic expressions for parallel computations.
Proc. IFIP Congress 68, B4-B10.
BAGWELL, J. T. [1970]
Local optimizations.
A CM SIGPLAN Notices 5:7, 52-66.
BAR-HILLEL, Y. [1964]
Language and Information.
Addison-Wesley, Reading, Mass.
BAR-HILLEL Y., M. PERLES, and E. SHAMIR [1961]
On formal properties of simple phrase structure grammars.
Z. Phonetik, Sprachwissenschaft und Kommunikationsforschung 14, 143-172.
Also in Bar-Hillel [1964], pp. 116-150.
BARNETT, M. P., and R. P. FUTRELLE[1962]
Syntactic analysis by digital computer.

Comm. ACM 5: 10, 515-526.


BAUER, H., S. BECKER,and S. GRAHAM[1968]
ALGOL W implementation.
CS98, Computer Science Department, Stanford University, Stanford, Cal.
BEALS, A. J. [1969]
The generation of a deterministic parsing algorithm.
Report No. 304, Department of Computer Science,
University of Illinois, Urbana.
BEALS, A. J., J. E. LAFRANCE, and R. S. NORTHCOTE[1969]
The automatic generation of Floyd production syntactic analyzers.
Report No. 350. Department of Computer Science,
University of Illinois, Urbana.
BEATTY, J. C. [1972]
An axiomatic approach to code optimization for expressions.
J. ACM, 19: 4.

BELADY, L. A. [1966]
A study of replacement algorithms for a virtual storage computer.
IBM Systems J. 5, 78-82.
BELL, J. R. [1969]
A new method for determining linear precedence functions for precedence
grammars.
Comm. A CM 12:10, 316-333.
BELL, J. R. [1970]
The quadratic quotient method: a hash code eliminating secondary clustering.
Comm. A C M 13:2, 107-109.
BERGE, C. [1958]
The Theory of Graphs and Its Applications.
Wiley, New York.
BIRMAN,A., and J. D. ULLMAN[1970]
Parsing algorithms with backtrack.
IEEE Conference Record of llth Annual Symposium on Switching and Automata
Theory, pp. 153-174.
BLATTNER, M. [1972]
The unsolvability of the equality problem for sentential forms of context-free
languages.
Unpublished Memorandum, UCLA, Los Angeles, Calif. To appear in JCSS.
BOBROW, D. G. [1963]
Syntactic analysis of English by computer--a survey.
Proc. AFIPS Fall Joint Computer Conference, Vol. 24.
Spartan, New York, pp. 365-387.
BOOK, R. V. [1970]
Problems in formal language theory.
Proc. Fourth Annual Princeton Conference on Information Sciences and Systems,
pp. 253-256. Also see Aho [1973].
BOOTH, T. L. [1967]
Sequential Machines and Automata Theory.
Wiley, New York.
BORODIN, A. [1970]
Computational complexity--a survey.
Proc. Fourth Annual Princeton Conference on Information Sciences and Systems,
pp. 257-262. Also see Aho [1973].
BRACHA, N. [1972]
Transformations on loop-free program schemata.
Report No. UIUCDCS-R-72-516, Department of Computer Science, Univer-
sity of Illinois, Urbana
BRAFFORT, P., and D. HIRSCHBERG(eds.) [1963]
Computer Programming and Formal Systems.
North-Holland, Amsterdam.

BREUER, M. A. [1969]
Generation of optimal code for expressions via factorization.
Comm. A C M 12: 6, 333-340.
BROOKER, R. A., and D. MORRIS[1963]
The compiler-compiler.
Annual Review in Automatic Programming, Vol. 3.
Pergamon, EImsford, N.Y., pp. 229-275.
BRUNO, J. L., and W. A. BURKHARD[1970]
A circularity test for interpreted grammars.
Technical Report 88. Computer Sciences Laboratory, Department of Electrical
Engineering, Princeton University, Princeton, N.J.
BRZOZOWSKI,J. A. [1962]
A survey of regular expressions and their applications.
IRE Trans. on Electronic Computers 11:3, 324-335.

BRZOZOWSKI, J, A. [1964]
Derivatives of regular expressions.
J. A C M 11:4, 481-494.
BUSAM, V. A., and D. E. ENGLUND [1969]
Optimization of expressions in Fortran.
Comm. A CM 12:12, 666-674.
CANTOR, D. G. [1962]
On the ambiguity problem of Backus systems.
J. A CM 9: 4, 477--479.
CAVINESS, B. F. [1970]
On canonical forms and simplification.
J. A C M 17:2, 385-396.
CHEATHAM, T. E., [1965]
The TGS-Ii translator-generator system.
Proc. IFIP Congress 65. Spartan, New York, pp. 592-593.

CHEATHAM, T. E. [1966]
The introduction of definitional facilities into higher level programming
languages.
Proc. AFIPS Fall Joint Computer Conference. Vol. 30.
Spartan, New York, pp. 623-637.
CHEATHAM, T. E. [1967]
The Theory and Construction of Compilers (2nd ed.).
Computer Associates, Inc., Wakefield, Mass.
CHEATHAM, T. E., and K. SATTLEY[1964]
Syntax directed compiling.
Proc. AFIPS Spring Joint Computer Conference, Vol. 25. Spartan, New York,
pp. 31-57.

CHEATHAM, T. E., and T. STANDISH[1970]


Optimization aspects of compiler-compilers.
A C M SIGPLAN Notices 5: 10, 10-17.
CHEN, S. [1972]
On the Sethi-Ullman algorithm.
Unpublished memorandum, Bell Laboratories, Holmdel, N.J.
CHOMSKY, N. [1956]
Three models for the description of language.
IRE Trans. on Information Theory 2:3, 113-124.
CHOMSKY, N. [1957]
Syntactic Structures.
Mouton and Co., The Hague.
CHOMSKY, N. [1959a]
On certain formal properties of grammars.
Information and Control 2:2, 137-167.
CHOMSKY, N. [1959b]
A note on phrase structure grammars.
Information and Control 2: 4, 393-395.
CHOMSKY, N. [1962]
Context-free grammars and pushdown storage.
Quarterly Progress Report No. 65. Research Laboratory of Electronics,
Massachusetts Institute of Technology, Cambridge, Mass.
CHOMSKY, N. [1963]
Formal properties of grammars.
In Handbook of Mathematical Psychology, Vol. 2, R.D. Luce, R. R. Bush,
and E. Galanter (eds.). Wiley, New York, pp. 323-418.
CHOMSKY, N. [1965]
Aspects of the Theory of Syntax.
M.I.T. Press, Cambridge, Mass.
CHOMSKY, N., and G. A. MILLER[1958]
Finite state languages.
Information and Control 1:2, 91-112.
CHOMSKY, N., and M. P. SCHUTZENBERGER[1963]
The algebraic theory of context-free languages.
In Braffort and Hirschberg [1963], pp. 118-161.
CHRISTENSEN, C., and J. C. SHAW (eds.) [1969]
Proc. of the extensible languages symposium.
A CM SIGPLAN Notices 4: 8.
CHURCH, A. [1941]
The Calculi of Lambda-Conversion.
Annals of Mathematics Studies, Vol. 6.
Princeton University Press, Princeton, N.J.

CHURCH, A. [1956]
Introduction to Mathematical Logic.
Princeton University Press, Princeton, N.J.
CLARK, E. R. [1967]
On the automatic simplification of source language programs.
Comm. A C M 10:3, 160-164.
COCKE, J. [1970]
Global common subexpression elimination.
A CM SIGPLAN Notices 5:7, 20-24.
COCKE, J., and J. T. SCHWARTZ[1970]
Programming Languages and Their Compilers (2rid ed.).
Courant Institute of Mathematical Sciences, New York University, New York.
COHEN, D. J., and C. C. GOTLIEB [1970]
A list structure form of grammars for syntactic analysis.
Computing Surveys 2: 1, 65-82.
COHEN, R. S., and K. CULIK, II [1971]
LR-regular grammars--an extension of LR(k) grammars.
IEEE Conference Record of 12th Annual Symposium on Switching and Auto-
mata Theory, pp. 153-165.
COLMERAUER, A. [1970]
Total precedence relations.
J. A C M 17:1, 14-30.
CONWAY, M. E. [1963]
Design of a separable transition-diagram compiler.
Comm. A C M 6:7, 396-408.
CONWAY, R. W., and W. L. MAXWELL[1963]
CORC: the Cornell computing language.
Comm. A C M 6:6, 317-321.
CONWAY, R. W., and W. L. MAXWELL[1968]
CUPL--an approach to introductory computing instruction.
Technical Report No. 68-4. Department of Computer Science,
Cornell University, Ithaca, N.Y.
CONWAY, R. W., et al. [1970]
PL/C. A high performance subset of PL/I.
Technical Report 70-55. Department of Computer Science,
Cornell University, Ithaca, N.Y.
COOK, S. A. [1971]
Linear time simulation of deterministic two-way pushdown automata.
Proc. IFIP Congress 71, TA-2. North-Holland, Amsterdam. pp. 174-179.
COOK, S. A., and S. D. AANDERAA[1969]
On the minimum computation time of functions.
Trans. American Math. Soc. 142, 291-314.

CULIK, K., II [1968]


Contribution to deterministic top-down analysis of context-free languages.
Kybernetika 4: 5, 422---431.
CULIK, K., II [1970]
n-ary grammars and the description of mapping of languages.
Kybernetika 6, 99-117.
DAVIS, M. [1958]
Computability and Unsolvability.
McGraw-Hill, New York.
DAVIS, M. (ed.) [1965]
The Undecidable. Basic papers in undecidable propositions, unsolvable problems
and computable functions.
Raven Press, New York.
DE BAKKER, J . W . [1971]
Axiom systems for simple assignment statements.
In Engeler [1971], pp. 1-22.
DEREMER, F. L. [1968]
On the generation of parsers for BNF grammars: an algorithm.
Report No. 276. Department of Computer Science,
University of Illinois, Urbana.
DEREMER, F. L. [1969]
Practical translators for LR(k) languages.
Ph. D. Thesis, Massachusetts Institute of Technology, Cambridge, Mass.
DEREMER, F. L. [1971]
Simple LR(k) grammars.
Comm. A C M 14: 7, 453-460.
DEWAR, R. B. K., R. R. HOCHSPRUNG, and W. S. WORLEY[1969]
The IITRAN programming language.
Comm. A C M 12: 10, 569-575.
EARLEY, J. [1966]
Generating a recognizer for a BNF grammar.
Computation Center Report, Carnegie-Mellon University, Pittsburgh.
EARLEY, J. [1968]
An efficient context-free parsing algorithm.
Ph.D. Thesis, Carnegie-Mellon University, Pittsburgh.
Also see Comm. A CM (February, 1970) 13 : 2, 94-102.
EICKEL, J., M. PAUL, F. L. BAUER, and K. SAMELSON [1963]
A syntax-controlled generator of formal language processors.
Comm. A C M 6:8, 451-455.
ELSON, M., and S. T. RAKE [1970]
Code-generation technique for large-language compilers.
IBM Systems J. 9: 3, 166-188.

ELSPAS, B., M. W. GREEN, and K. N. LEVITT[1971]


Software reliability.
Computer l, 21-27.
ENGELER, E. (ed.) [1971]
Symposium on Semantics of Algorithmic Languages.
Lecture Notes in Mathematics. Springer, Berlin.
EVANS, A., JR. [1964]
An ALGOL 60 compiler.
Annual Review in Automatic Programming, Vol. 4.
Pergamon, Elmsford, N.Y., pp. 87-124.
EVEY, R. J. [1963]
Applications of pushdown-store machines.
Proc. AFIPS Fall Joint Computer Conference, Vol. 24.
Spartan, New York, pp. 215-227.
FELDMAN, J. A. [1966]
A formal semantics for computer languages and its application in a compiler-
compiler.
Comm. ACM 9:1, 3-9.
FELDMAN, J. A., and D. GRIES [1968]
Translator writing systems.
Comm. ACM 11:2, 77-113.
FISCHER, M. J. [1968]
Grammars with macro-like productions.
IEEE Conference Record of 9th Annual Symposium on Switching and Automata
Theory, pp. 131-142.
FISCHER, M. J. [1969]
Some properties of precedence languages.
Proc. A C M Symposium on Theory of Computing, pp. 181-190.
FISCHER, M. J. [1972]
Efficiency of equivalence algorithms.
Memo No. 256, Artificial Intelligence Laboratory, Massachusetts Institute of
Technology, Cambridge, Mass.
FLOYD, R. W. [1961a]
An algorithm for coding efficient arithmetic operations.
Comm. ACM 4:1, 42-51.
FLOYD, R. W. [1961b]
A descriptive language for symbol manipulation.
J. ACM 8:4, 579-584.
FLOYD, R. W. [1962a]
Algorithm 97: shortest path.
Comm. ACM 5:6, 345.

FLOYD, R. W. [1962b]
On ambiguity in phrase structure languages.
Comm. A C M 5:10, 526-534.
FLOYD, R. W. [1963]
Syntactic analysis and operator precedence.
J. A C M 10:3, 316-333.
FLOYD, R. W. [1964a]
Bounded context syntactic analysis.
Comm. ACM 7:2, 62-67.
FLOYD, R. W. [1964b]
The syntax of programming languages--a survey.
IEEE Trans. on Electronic Computers EC-13:4, 346-353.
FLOYD, R. W. [1967a]
Assigning meanings to programs.
In Schwartz [1967], pp. 19-32.
FLOYD, R. W. [1967b]
Nondeterministic algorithms.
J. ACM 14:4, 636-644.
FRAILLY, D. J. [1970]
Expression optimization using unary complement operators.
ACM SIGPLAN Notices 5:7, 67-85.
FREEMAN, D. N. [1964]
Error correction in CORC, the Cornell computing language.
Proc. AFIPS Fall Joint Computer Conference, Vol. 26.
Spartan, New York, pp. 15-34.
GALLER, B. A., and A. J. PERLIS [1967]
A proposal for definitions in ALGOL.
Comm. ACM 10:4, 204-219.
GARWICK, J. V. [1964]
GARGOYLE, a language for compiler writing.
Comm. A C M 7: 1, 16-20.
GARWICK, J. V. [1968]
GPL, a truly general purpose language.
Comm. ACM 11:9, 634-638.
GEAR, C. W. [1965]
High speed compilation of efficient object code.
Comm. A C M 8:8, 483-487.
GENTLEMAN, W. M. [1971]
A portable coroutine system.
Proc. IFIP Congress 71, TA-3. North-Holland, Amsterdam, pp. 94-98.
GILL, A. [1962]
Introduction to the Theory of Finite State Machines.
McGraw-Hill, New York.

GINSBURG, S. [1962]
An Introduction to Mathematical Machine Theory.
Addison-Wesley, Reading, Mass.
GINSBURG, S. [1966]
The Mathematical Theory of Context-Free Languages.
McGraw-Hill, New York.
GINSBURG, S., and S. A. GREIBACH[1966]
Deterministic context-free languages.
Information and Control 9:6, 620-648.
GINSBURG, S., and S. A. GREIBACH[1969]
Abstract families of languages.
Memoir American Math. Soc. No. 87, 1-32.
GINSBURG, S., and H. G. RICE [1962]
Two families of languages related to ALGOL.
J. A C M 9:3, 350-371.
GINZBURG, A. [1968]
Algebraic Theory of Automata.
Academic Press, New York.
GLENNIE, A. [1960]
On the syntax machine and the construction of a universal compiler.
Technical Report No. 2. Computation Center,
Carnegie-Mellon University, Pittsburg, Pa.
GRAHAM, R. L. [1972]
Bounds on multiprocessing anomalies and related packing algorithms.
Proc. AFIPS Spring Joint Computer Conference, Vol. 40
AFIPS Press, Montvale, N.J. pp. 205-217.
GRAHAM, R. M. [1964]
Bounded context translation.
Proc. AFIPS Spring Joint Computer Conference, Vol. 25.
Spartan, New York, pp. 17-29.
GRAHAM, S. L. [1970]
Extended precedence languages, bounded right context languages and deter-
ministic languages.
IEEE Conference Record of l lth Annual Symposium on Switching and Auto-
mata Theory, pp. 175-180.
GRAU, A. A., U. HILL, and H. LANGMAACK[1967]
Translation of ALGOL 60.
Springer-Verlag, New York
GRAY, J. N. [1969]
Precedence parsers for programming languages.
Ph.D. Thesis, Department of Computer Science, University of California,
Berkeley.

GRAY, J. N., and M. A. HARRISON[1969]


Single pass precedence analysis.
IEEE Conference Record of lOth Annual Symposium on Switching and Auto-
mata Theory, pp. 106-117.
GRAY, J. N,, M. A. HARRISON, and O. IBARRA [1967]
Two way pushdown automata.
Information and Control 11 : 1, 30-70.
GREIBACH, S. A. [1965]
A new normal form theorem for context-free phrase structure grammars.
J. A C M 12: l, 42-52.
GREIBACH S. A., and J. E. HOPCROFT[1969]
Scattered context grammars.
J. Computer and System Sciences 3:3, 233-247.
GRIES, D. [1971]
Compiler Construction for Digital Computers.
Wiley, New York.
GRIFFITHS, T. V. [1968]
The unsolvability of the equivalence problem for A-free nondeterministic
generalized machines.
J. A C M 15:3, 409-413.
GRIFFITHS, T. V., and S. R. PETRICK [1965]
On the relative efficiencies of context-free grammar recognizers.
Comm. A CM 8: 5, 289-300.
GRISWOLD, R. E., J. F. POAGE, and I. P. POLONSKY[1971]
The SNOBOL4 Programming Language (2nd ed.).
Prentice-Hall, Englewood Cliffs, N.J.
GROSS, M., and A. LENTIN [1970]
Introduction to Formal Grammars.
Springer-Verlag, New York.
HAINES, L. H. [1970]
Representation theorems for context-sensitive languages.
Department of Electrical Engineering and Computer Sciences, University o f
California, Berkeley.
HALMOS, P. R. [1960]
Naive Set Theory.
Van Nostrand Reinhold, New York.
HALMOS, P. R. [1963]
Lectures on Boolean Algebras.
Van Nostrand Reinhold, New York.
HARARY, F. [1969]
Graph Theory.
Addison-Wesley, Reading, Mass.

HARRISON, M. A. [1965]
Introduction to Switching and Automata Theory.
McGraw-Hill, New York.
HARTMANIS, J., and J. E. HOPCROFT[1970]
An overview of the theory of computational complexity.
J. A C M 18:3, 444--475.
HARTMANIS, J., P. M. LEWIS, II, and R. E. STEARNS[1965]
Classifications of computations by time and memory requirements.
Proc. IFIP Congress 65. Spartan, New York, pp. 31-35.
HAYNES, H. R., and L. J. SCHUTTE[1970]
Compilation of optimized syntactic recognizers from Floyd-Evans productions.
A C M SIGPLAN Notices 5: 7, 38-51.
HAYS, D. G. [1967]
Introduction to Computational Linguistics.
American Elsevier, New York.
HECHT, M. S., and J. D. ULLMAN[1972a]
Flow graph reducibility.
S I A M J. on Computing 1"2, 188-202.
HECHT, M. S., and J. D. ULLMAN [1972b]
Unpublished memorandum,
Department of Electrical Engineering, Princeton University
HELLERMAN, H . [1966]
Parallel processing of algebraic expressions.
IEEE Trans. on Electronic Computers EC-15:1, 82-91.
HEXT, J. B., and P. S. ROBERTS[1970]
Syntax analysis by Domolki's algorithm.
Computer J. 13: 3, 263-271.
HOPCROFT, J. E. [1971]
An n log n algorithm for minimizing states in a finite automaton.
CS71-190. Computer Science Department, Stanford University, Stanford, Cal.
Also in Theory of Machines and Computations, Z. Kohavi and A. Paz (eds).
Academic Press, New York, pp. 189-196.
HOPCROFT, J. E., and J. D. ULLMAN [1967]
An approach to a unified theory of automata.
Bell System Tech. J. 46: 8, 1763-1829.
HOPCROFT, J. E., and J. D. ULLMAN[1969]
Formal Languages and Their Relation to Automata.
Addison-Wesley, Reading, Mass.
HOPCROFT, J. E., and J. D. ULLMAN[1972a]
Set merging algorithms.
Unpublished memorandum. Department of Computer Science,
Cornell University, Ithaca, N.Y.

HOPCROFT, J.E., and J. D. ULLMAN [1972b]


An n log n algorithm to detect reducible graphs
Proc. Sixth Annual Princeton Conference on Information Sciences and Systems,
pp. 119-122.
HOPGOOD, F. R. A. [1969]
Compiling Techniques.
American Elsevier, New York.
HOPKINS, M. E. [1971]
An optimizing compiler design.
Proc. IFIP Congress 71, TA-3. North-Holland, Amsterdam, pp. 69-73.
HORWtTZ, L. P., R. M. KARP, R. E. MILLER, and S. WINOCRAD [1966]
Index register allocation.
J. A C M 13: 1, 43-61.
HUFFMAN, D. A. [1954]
The synthesis of sequential switching circuits.
J. Franklin Institute 257:3-4, 161-190 and 275-303.
HUXTABLE, D. H. R. [1964]
On writing an optimizing translator for ALGOL 60.
In Introduction to System Programming, Academic Press, New York.
IANOV, I. I. [1958]
On the equivalence and transformation of program schemes.
Translation in Comm. A C M 1:10, 8-11.
IBM [1969]
System 360 Operating System PL/I (F) Compiler Program Logic Manual.
Publ. No. Y286800, IBM, Hursley, Winchester, England.
ICHBIAH, J. D., and S. P. MORSE [1970]
A technique for generating almost optimal Floyd-Evans productions for
precedence grammars.
Comm. A C M 13: 8, 501-508.
IGARISHI, S. [1968]
On the equivalence of programs represented by Algol-like statements.
Report of the Computer Centre, University of Tokyo 1, pp. 103-118.
INGERMAN, P. Z. [1966]
A Syntax Oriented Translator.
Academic Press, New York.
IRLAND, M. I., and P. C. FISCHER [1970]
A bibliography on computational complexity.
CSRR 2028. Department of Applied Analysis and Computer Science,
University of Waterloo, Ontario.
IRONS, E. T. [ 1961]
A syntax directed compiler for ALGOL 60.
Comm. A C M 4:1, 51-55.

IRONS, E. T. [1963a] "


An error correcting parse algorithm.
Comm. ACM 6:11, 669-673.
IRONS, E. T. [1963b]
The structure and use of the syntax directed compiler.
Annual Review in Automatic Programming, Vol. 3.
Pergamon, Elmsford, N.Y., pp. 207-227.
IRONS, E. T. [1964]
Structural connections in formal languages.
Comm. A C M 7: 2, 62-67.
JOHNSON, W. L., J. H. PORTER, S. I. ACKLEY, and D. T. Ross [1968]
Automatic generation of efficient lexical processors using finite state tech-
niques.
Comm. A C M 11:12, 805-813.
KAMEDA, T., and P. WEINER [1968]
On the reduction of nondeterministic automata.
Proc. Second Annual Princeton Conference on Information Sciences and Systems,
pp. 348-352.
KAPLAN, D. M. [1970]
Proving things about programs.
Proc. 4th Annual Princeton Conference on Information Sciences and Systems,
pp. 244-251.
KASAMI, T. [1965]
An efficient recognition and syntax analysis algorithm for context-free languages.
Scientific Report AFCRL-65-758. Air Force Cambridge Research Laboratory,
Bedford, Mass.
KASAMI, T., and K. TORII [1969]
A syntax analysis procedure for unambiguous context-free grammars.
J. ACM 16:3, 423-431.
KENNEDY, K. [1971]
A global flow analysis algorithm.
International J. Computer Mathematics 3:1, 5-16.
KENNEDY, K. [1972]
Index register allocation in straight line code and simple loops.
In Design and Optimization of Compilers (R. Rustin, ed.).
Prentice-Hall, Englewood Cliffs, N.J., pp. 51-64.
KLEENE, S. C. [1952]
Introduction to Metamathematics.
Van Nostrand Reinhold, New York.
KLEENE, S. C. [1956]
Representation of events in nerve nets.
In Shannon and McCarthy [1956], pp. 3-40.
KNUTH, D. E. [1965]
On the translation of languages from left to right.
Information and Control 8:6, 607-639.
KNUTH, D. E. [1967]
Top-down syntax analysis.
Lecture Notes.
International Summer School on Computer Programming, Copenhagen.
Also in Acta Informatica 1:2 (1971), 79-110.
KNUTH, D. E. [1968a]
The Art of Computer Programming, Vol. 1: Fundamental Algorithms.
Addison-Wesley, Reading, Mass.
KNUTH, D. E. [1968b]
Semantics of context-free languages.
Math. Systems Theory 2:2, 127-146.
Also see Math. Systems Theory 5:1, 95-96.
KNUTH, D. E. [1971]
An empirical study of FORTRAN programs.
Software--Practice and Experience 1:2, 105-134.
KNUTH, D. E. [1973]
The Art of Computer Programming, Vol. 3: Sorting and Searching.
Addison-Wesley, Reading, Mass.
KORENJAK, A. J. [1969]
A practical method for constructing LR(k) processors.
Comm. ACM 12:11, 613-623.
KORENJAK, A. J., and J. E. HOPCROFT [1966]
Simple deterministic languages.
IEEE Conference Record of 7th Annual Symposium on Switching and Automata
Theory, pp. 36-46.
KOSARAJU, S. R. [1970]
Finite state automata with markers.
Proc. Fourth Annual Princeton Conference on Information Sciences and Systems,
p. 380.
KUNO, S., and A. G. OETTINGER [1962]
Multiple-path syntactic analyzer.
Information Processing 62 (IFIP Congress), Popplewell (ed.).
North-Holland, Amsterdam, pp. 306-311.
KURKI-SUONIO, R. [1969]
Notes on top-down languages.
BIT 9, 225-238.
LAFRANCE, J. [1970]
Optimization of error recovery in syntax directed parsing algorithms.
ACM SIGPLAN Notices 5:12, 2-17.
LALONDE, W. R., E. S. LEE, and J. J. HORNING [1971]
An LALR(k) parser generator.
Proc. IFIP Congress 71, TA-3. North-Holland, Amsterdam, pp. 153-157.
LEAVENWORTH, B. M. [1966]
Syntax macros and extended translation.
Comm. ACM 9:11, 790-793.
LEDLEY, R. S., and J. B. WILSON [1960]
Automatic programming language translation through syntactical analysis.
Comm. ACM 3, 213-214.
LEE, J. A. N. [1967]
Anatomy of a Compiler.
Van Nostrand Reinhold, New York.
LEINIUS, R. P. [1970]
Error detection and recovery for syntax directed compiler systems.
Ph.D. Thesis, University of Wisconsin, Madison.
LEWIS, P. M., II, and D. J. ROSENKRANTZ [1971]
An ALGOL compiler designed using automata theory.
Proc. Symposium on Computers and Automata, Microwave Research Institute
Symposia Series, Vol. 21. Polytechnic Institute of Brooklyn, New York,
pp. 75-88.
LEWIS, P. M., II, and R. E. STEARNS [1968]
Syntax directed transduction.
J. ACM 15:3, 464-488.
LOECKX, J. [1970]
An algorithm for the construction of bounded-context parsers.
Comm. ACM 13:5, 297-307.
LOWRY, E. S., and C. W. MEDLOCK [1969]
Object code optimization.
Comm. ACM 12:1, 13-22.
LUCAS, P., and K. WALK [1969]
On the formal description of PL/I.
Annual Review in Automatic Programming, Vol. 6, No. 3.
Pergamon, Elmsford, N.Y., pp. 105-182.
LUCKHAM, D. C., D. M. R. PARK, and M. S. PATERSON [1970]
On formalized computer programs.
J. Computer and System Sciences 4:3, 220-249.
MANNA, Z. [1973]
Program schemas.
In Aho [1973].
MARKOV, A. A. [1951]
The theory of algorithms (in Russian).
Trudi Mathematicheskova Instituta imeni V. A. Steklova 38, 176-189. (English
translation, American Math. Soc. Trans. 2:15 (1960), 1-14.)
MARILL, M. [1962]
Computational chains and the size of computer programs.
IRE Trans. on Electronic Computers EC-11:2, 173-180.
MARTIN, D. F. [1972]
A Boolean matrix method for the computation of linear precedence functions.
Comm. ACM 15:6, 448-454.
MAURER, W. D. [1968]
An improved hash code for scatter storage.
Comm. ACM 11:1, 35-38.
MCCARTHY, J. [1963]
A basis for the mathematical theory of computation.
In Braffort and Hirschberg [1963], pp. 33-71.
MCCARTHY, J., and J. A. PAINTER [1967]
Correctness of a compiler for arithmetic expressions.
In Schwartz [1967], pp. 33-41.
MCCLURE, R. M. [1965]
TMG--a syntax directed compiler.
Proc. ACM National Conference, Vol. 20, pp. 262-274.
MCCULLOCH, W. S., and W. PITTS [1943]
A logical calculus of the ideas immanent in nervous activity.
Bulletin of Math. Biophysics 5, 115-133.
MCILROY, M. D. [1960]
Macro instruction extensions of compiler languages.
Comm. ACM 3:4, 214-220.
MCILROY, M. D. [1968]
Coroutines.
Unpublished manuscript, Bell Laboratories, Murray Hill, N.J.
MCILROY, M. D. [1972]
A manual for the TMG compiler writing language.
Unpublished memorandum, Bell Laboratories, Murray Hill, N.J.
MCKEEMAN, W. M. [1965]
Peephole optimization.
Comm. ACM 8:7, 443-444.
MCKEEMAN, W. M. [1966]
An approach to computer language design.
CS48. Computer Science Department, Stanford University, Stanford, Cal.
MCKEEMAN, W. M., J. J. HORNING, and D. B. WORTMAN [1970]
A Compiler Generator.
Prentice-Hall, Englewood Cliffs, N.J.
MCNAUGHTON, R. [1967]
Parenthesis grammars.
J. ACM 14:3, 490-500.
MCNAUGHTON, R., and H. YAMADA [1960]
Regular expressions and state graphs for automata.
IRE Trans. on Electronic Computers 9:1, 39-47.
Reprinted in Moore [1964], pp. 157-174.
MENDELSON, E. [1968]
Introduction to Mathematical Logic.
Van Nostrand Reinhold, New York.
MEYERS, W. J. [1965]
Optimization of computer code.
Unpublished memorandum. G. E. Research Center, Schenectady, N.Y.
MILLER, W. F., and A. C. SHAW [1968]
Linguistic methods in picture processing--a survey.
Proc. AFIPS Fall Joint Computer Conference, Vol. 33.
The Thompson Book Co., Washington, D.C., pp. 279-290.
MINSKY, M. [1967]
Computation: Finite and Infinite Machines.
Prentice-Hall, Englewood Cliffs, N.J.
MONTANARI, U. G. [1970]
Separable graphs, planar graphs and web grammars.
Information and Control 16:3, 243-267.
MOORE, E. F. [1956]
Gedanken experiments on sequential machines.
In Shannon and McCarthy [1956], pp. 129-153.
MOORE, E. F. [1964]
Sequential Machines: Selected Papers.
Addison-Wesley, Reading, Mass.
MORGAN, H. L. [1970]
Spelling correction in systems programs.
Comm. ACM 13:2, 90-93.
MORRIS, ROBERT [1968]
Scatter storage techniques.
Comm. ACM 11:1, 38-44.
MOULTON, P. G., and M. E. MULLER [1967]
A compiler emphasizing diagnostics.
Comm. ACM 10:1, 45-52.
MUNRO, I. [1971]
Efficient determination of the transitive closure of a directed graph.
Information Processing Letters 1:2, 56-58.
NAKATA, I. [1967]
On compiling algorithms for arithmetic expressions.
Comm. ACM 12:2, 81-84.
NAUR, P. (ed.) [1963]
Revised report on the algorithmic language ALGOL 60.
Comm. ACM 6:1, 1-17.
NIEVERGELT, J. [1965]
On the automatic simplification of computer programs.
Comm. ACM 8:6, 366-370.
OETTINGER, A. [1961]
Automatic syntactic analysis and the pushdown store.
In Structure of Language and its Mathematical Concepts, Proc. 12th Symposium
on Applied Mathematics. American Mathematical Society, Providence, pp.
104-129.
OGDEN, W. [1968]
A helpful result for proving inherent ambiguity.
Mathematical Systems Theory 2: 3, 191-194.
ORE, O. [1962]
Theory of Graphs.
American Mathematical Society Colloquium Publications, Vol. 38, Providence.
PAGER, D. [1970]
A solution to an open problem by Knuth.
Information and Control 17: 5, 462-473.
PAINTER, J. A. [1970]
Effectiveness of an optimizing compiler for arithmetic expressions.
ACM SIGPLAN Notices 5:7, 101-126.
PAIR, C. [1964]
Trees, pushdown stores and compilation.
RFTI--Chiffres 7:3, 199-216.
PARIKH, R. J. [1966]
On context-free languages.
J. ACM 13:4, 570-581.
PATERSON, M. S. [1968]
Program schemata.
Machine Intelligence, Vol. 3 (Michie, ed.).
Edinburgh University Press, Edinburgh, pp. 19-31.
PAUL, M. [1962]
A general processor for certain formal languages.
Proc. ICC Symposium on Symbolic Language Data Processing.
Gordon & Breach, New York, pp. 65-74.
PAULL, M. C., and S. H. UNGER [1968a]
Structural equivalence of context-free grammars.
J. Computer and System Sciences 2:1, 427-463.
PAULL, M. C., and S. H. UNGER [1968b]
Structural equivalence and LL-k grammars.
IEEE Conference Record of Ninth Annual Symposium on Switching and Automata Theory, pp. 176-186.
PAVLIDIS, T. [1972]
Linear and context-free graph grammars.
J. ACM 19:1, 11-23.
PETERSON, W. W. [1957]
Addressing for random access storage.
IBM J. Research and Development 1:2, 130-146.
PETRONE, L. [1968]
Syntax directed mapping of context-free languages.
IEEE Conference Record of 9th Annual Symposium on Switching and Automata
Theory, pp. 160-175.
PFALTZ, J. L., and A. ROSENFELD [1969]
Web grammars.
Proc. International Joint Conference on Artificial Intelligence,
Washington, D. C., pp. 609-619.
POST, E. L. [1943]
Formal reductions of the general combinatorial decision problem.
American J. of Math. 65, 197-215.
POST, E. L. [1947]
Recursive unsolvability of a problem of Thue.
J. Symbolic Logic 12, 1-11. Reprinted in Davis [1965], pp. 292-303.
POST, E. L. [1965]
Absolutely unsolvable problems and relatively undecidable propositions--
account of an anticipation.
In Davis [1965], pp. 338-433.
PRATHER, R. E. [1969]
Minimal solutions of Paull-Unger problems.
Math. Systems Theory 3:1, 76-85.
PRICE, C. E. [1971]
Table lookup techniques.
ACM Computing Surveys 3:2, 49-66.
PROSSER, R. T. [1959]
Applications of Boolean matrices to the analysis of flow diagrams.
Proc. Eastern Joint Computer Conference, Spartan Books, N.Y., pp. 133-138.
RABIN, M. O. [1967]
Mathematical theory of automata.
In Schwartz [1967], pp. 173-175.
RABIN, M.O., and D. SCOTT [1959]
Finite automata and their decision problems.
IBM J. Research and Development 3, 114-125.
Reprinted in Moore [1964], pp. 63-91.
RADKE, C. E. [1970]
The use of quadratic residue search.
Comm. ACM 13:2, 103-109.
RANDELL, B., and L. J. RUSSELL [1964]
ALGOL 60 Implementation.
Academic Press, New York.
REDZIEJOWSKI, R. R. [1969]
On arithmetic expressions and trees.
Comm. ACM 12:2, 81-84.
REYNOLDS, J. C. [1965]
An introduction to the COGENT programming system.
Proc. ACM National Conference, Vol. 20, p. 422.
REYNOLDS, J. C., and R. HASKELL [1970]
Grammatical coverings.
Unpublished memorandum, Syracuse University.
RICHARDSON, D. [1968]
Some unsolvable problems involving elementary functions of a real variable.
J. Symbolic Logic 33, 514-520.
ROGERS, H., JR. [1967]
Theory of Recursive Functions and Effective Computability.
McGraw-Hill, New York.
ROSEN, S. (ed.) [1967a]
Programming Systems and Languages.
McGraw-Hill, New York.
ROSEN, S. [1967b]
A compiler-building system developed by Brooker and Morris.
In Rosen [1967a], pp. 306-331.
ROSENKRANTZ, D. J. [1967]
Matrix equations and normal forms for context-free grammars.
J. ACM 14:3, 501-507.
ROSENKRANTZ, D. J. [1968]
Programmed grammars and classes of formal languages.
J. ACM 16:1, 107-131.
ROSENKRANTZ, D. J., and P. M. LEWIS, II [1970]
Deterministic left corner parsing.
IEEE Conference Record of 11th Annual Symposium on Switching and Automata
Theory, pp. 139-152.
ROSENKRANTZ, D. J., and R. E. STEARNS [1970]
Properties of deterministic top-down grammars.
Information and Control 17:3, 226-256.
SALOMAA, A. [1966]
Two complete axiom systems for the algebra of regular events.
J. ACM 13:1, 158-169.
SALOMAA, A. [1969a]
Theory of Automata.
Pergamon, Elmsford, N.Y.
SALOMAA, A. [1969b]
On the index of a context-free grammar and language.
Information and Control 14:5, 474-477.
SAMELSON, K., and F. L. BAUER [1960]
Sequential formula translation.
Comm. ACM 3:2, 76-83.
SAMMET, J. E. [1969]
Programming Languages: History and Fundamentals.
Prentice-Hall, Englewood Cliffs, N.J.
SCHAEFER, M. [1973]
A Mathematical Theory of Global Program Optimization.
Prentice-Hall, Englewood Cliffs, N.J., to appear.
SCHORRE, D. V. [1964]
META II, a syntax oriented compiler writing language.
Proc. ACM National Conference, Vol. 19, pp. D1.3-1-D1.3-11.
SCHUTZENBERGER, M. P. [1963]
On context-free languages and pushdown automata.
Information and Control 6:3, 246-264.
SCHWARTZ, J. T. (ed.) [1967]
Mathematical Aspects of Computer Science.
Proc. Symposia in Applied Mathematics, Vol. 19.
American Mathematical Society, Providence.
SCOTT, D., and C. STRACHEY [1971]
Towards a mathematical semantics for computer languages.
Proc. Symposium on Computers and Automata, Microwave Research Institute
Symposia Series, Vol. 21. Polytechnic Institute of Brooklyn, New York, pp.
19-46.
SETHI, R. [1973]
Validating register allocations for straight line programs.
Ph.D. Thesis, Department of Electrical Engineering, Princeton University.
SETHI, R., and J. D. ULLMAN [1970]
The generation of optimal code for arithmetic expressions.
J. ACM 17:4, 715-728.
SHANNON, C. E., and J. MCCARTHY (eds.) [1956]
Automata Studies.
Princeton University Press, Princeton, N.J.
SHAW, A. C. [1970]
Parsing of graph-representable pictures.
J. ACM 17:3, 453-481.
SHEPHERDSON, J. C. [1959]
The reduction of two-way automata to one-way automata.
IBM J. Research 3, 198-200. Reprinted in Moore [1964], pp. 92-97.
STEARNS, R. E. [1967]
A regularity test for pushdown machines.
Information and Control 11:3, 323-340.
STEARNS, R. E. [1971]
Deterministic top-down parsing.
Proc. Fifth Annual Princeton Conference on Information Sciences and Systems,
pp. 182-188.
STEARNS, R. E., and P. M. LEWIS, II [1969]
Property grammars and table machines.
Information and Control 14:6, 524-549.
STEARNS, R. E., and D. J. ROSENKRANTZ[1969]
Table machine simulation.
IEEE Conference Record of lOth Annual Symposium on Switching and Automata
Theory, pp. 118-128.
STEEL, T. B. (ed.) [1966]
Formal Language Description Languages for Computer Programming.
North-Holland, Amsterdam.
STONE, H. S. [1967]
One-pass compilation of arithmetic expressions for a parallel processor.
Comm. ACM 10:4, 220-223.
STRASSEN, V. [1969]
Gaussian elimination is not optimal.
Numerische Mathematik 13, 354-356.
SUPPES, P. [1960]
Axiomatic Set Theory.
Van Nostrand Reinhold, New York.
TARJAN, R. [1972]
Depth first search and linear graph algorithms.
SIAM J. on Computing 1:2, 146-160.
THOMPSON, K. [1968]
Regular expression search algorithm.
Comm. ACM 11:6, 419-422.
TURING, A. M. [1936]
On computable numbers, with an application to the Entscheidungsproblem.
Proc. London Mathematical Soc. Ser. 2, 42, 230-265. Corrections, Ibid., 43
(1937), 544-546.
ULLMAN, J. D. [1972a]
A note on hashing functions.
J. ACM 19:3, 569-575.
ULLMAN, J. D. [1972b]
Fast Algorithms for the Elimination of Common Subexpressions.
Technical Report TR-106, Dept. of Electrical Engineering, Princeton Univer-
sity, Princeton, N.J.
UNGER, S. H. [1968]
A global parser for context-free phrase structure grammars.
Comm. ACM 11:4, 240-246, and 11:6, 427.
VAN WIJNGAARDEN, A. (ed.) [1969]
Report on the algorithmic language ALGOL 68.
Numerische Mathematik 14, 79-218.
WALTERS, D. A. [1970]
Deterministic context-sensitive languages.
Information and Control 17:1, 14-61.
WARSHALL, S. [1962]
A theorem on Boolean matrices.
J. ACM 9:1, 11-12.
WARSHALL, S., and R. M. SHAPIRO [1964]
A general purpose table driven compiler.
Proc. AFIPS Spring Joint Computer Conference, Vol. 25.
Spartan, New York, pp. 59-65.
WEGBREIT, B. [1970]
Studies in extensible programming languages.
Ph.D. Thesis, Harvard University, Cambridge, Mass.
WILCOX, T. R. [1971]
Generating machine code for high-level programming languages.
Technical Report 71-103. Department of Computer Science,
Cornell University, Ithaca, N.Y.
WINOGRAD, S. [1965]
On the time required to perform addition.
J. ACM 12:2, 277-285.
WINOGRAD, S. [1967]
On the time required to perform multiplication.
J. ACM 14:4, 793-802.
WIRTH, N. [1965]
Algorithm 265: Find precedence functions.
Comm. ACM 8:10, 604-605.
WIRTH, N. [1968]
PL360--a programming language for the 360 computers.
J. ACM 15:1, 37-74.
WIRTH, N., and H. WEBER [1966]
EULER--a generalization of ALGOL and its formal definition, Parts 1 and 2.
Comm. ACM 9:1-2, 13-23 and 89-99.
WISE, D. S. [1971]
Domolki's algorithm applied to generalized overlap resolvable grammars.
Proc. Third Annual ACM Symposium on Theory of Computing, pp. 171-184.
WOOD, D. [1969a]
The theory of left factored languages.
Computer J. 12:4, 349-356, and 13:1, 55-62.
WOOD, D. [1969b]
A note on top-down deterministic languages.
BIT 9:4, 387-399.
WOOD, D. [1970]
Bibliography 23: Formal language theory and automata theory.
Computing Reviews 11:7, 417-430.
WOZENCRAFT, J. M., and A. EVANS, JR. [1969]
Notes on Programming Languages.
Department of Electrical Engineering, Massachusetts Institute of Technology,
Cambridge, Mass.
YERSHOV, A. P. [1966]
ALPHA--an automatic programming system of high efficiency.
J. ACM 13:1, 17-24.
YOUNGER, D. H. [1967]
Recognition and parsing of context-free languages in time n^3.
Information and Control 10: 2, 189-208.
INDEX TO LEMMAS, THEOREMS,
AND ALGORITHMS

Theorem        Theorem        Theorem
Number Page    Number Page    Number Page

7.1  548       8.9  691       10.1  803
7.2  556       8.10 694       10.2  806
7.3  573       8.11 697       10.3  838
7.4  574       8.12 698       10.4  839
7.5  590       8.13 698       11.1  854
7.6  594       8.14 700       11.2  856
7.7  601       8.15 701       11.3  860
7.8  614       8.16 701       11.4  861
7.9  625       8.17 705       11.5  862
7.10 639       8.18 707       11.6  888
7.11 640       8.19 710       11.7  890
7.12 653       8.20 711       11.8  894
7.13 657       8.21 713       11.9  901
8.1  669       8.22 716       11.10 916
8.2  672       9.1  731       11.11 923
8.3  676       9.2  733       11.12 939
8.4  680       9.3  736       11.13 943
8.5  683       9.4  742       11.14 952
8.6  684       9.5  746       11.15 955
8.7  688       9.6  752       11.16 955
8.8  688       9.7  764

Lemma          Lemma          Lemma
Number Page    Number Page    Number Page

7.1  614       8.11 705       11.3  859
7.2  614       8.12 705       11.4  860
7.3  638       8.13 712       11.5  862
8.1  669       8.14 713       11.6  862
8.2  678       8.15 715       11.7  862
8.3  681       10.1 803       11.8  887
8.4  682       10.2 803       11.9  887
8.5  683       10.3 833       11.10 889
8.6  684       10.4 837       11.11 889
8.7  687       10.5 837       11.12 899
8.8  695       10.6 837       11.13 915
8.9  696       11.1 855       11.14 915
8.10 704       11.2 856       11.15 953

Algorithm      Algorithm      Algorithm
Number Page    Number Page    Number Page

7.1  547       8.1  674       10.2 795
7.2  567       8.2  678       10.3 827
7.3  572       8.3  682       10.4 832
7.4  574       8.4  698       10.5 834
7.5  584       8.5  702       11.1 882
7.6  589       8.6  712       11.2 882
7.7  593       8.7  714       11.3 892
7.8  601       9.1  740       11.4 897
7.9  613       9.2  744       11.5 916
7.10 624       9.3  750       11.6 940
7.11 631       9.4  762       11.7 949
7.12 633       9.5  764       11.8 953
7.13 635       10.1 793
INDEX TO VOLUMES I AND II

A Anderson, J. P., 906


Antisymmetric relation, 10
Aanderaa, S. D., 34 Arbib, M. A., 138
Absolute machine code, 720-721 Arc (see Edge)
Acceptable property, 815 Arithmetic expression, 86, 768-773,778-
Acceptance, 96 (see also Final configura- 781, 878-907
tion) Arithmetic progression, 124, 209, 925-
Accessible configuration, 583 929
Accessible state, 117, 125-126 Assembler, 59, 74
Accumulator, 65 Assembly language, 65-70, 721, 863,
Ackley, S. I., 263 879-880
Action function, 374, 392 Assignment statement, 65-70
Active variable, 849, 937 Associative law, 23, 617, 868-873, 876,
Adjacency matrix, 47-51 891, 894-903
Aho, A. V., 102-103, 192, 251, 399, 426, Associative tree, 895-902
563, 621, 645, 665, 709, 757, 787, Asymmetric relation, 9
878, 960 Atom, 1-2
Algebraic law, 846, 867-873, 910-911 Attribute (see Inherited attribute, Syn-
ALGOL, 198-199, 234, 281, 489-490, thesized attribute, Translation
501, 621 symbol)
Algorithm, 27-36 Augmented grammar, 372-373, 427-428,
Allard, R. W., 906 634
Allen, F. E., 936, 960 Automaton (see Recognizer, Transducer)
Alphabet, 15 Available expression, 937
Alternates (for a nonterminal), 285, 457 Axiom, 19-20
Ambiguous grammar, 143, 163, 202,207,
281,489-490, 662-663, 678, 711- B
712 (see also Unambiguous gram-
mar) Backtrack parsing, 281-314, 456-500,
Ancestor, 39 746-753

Backus, J. W., 76 Brooker, R. A., 77, 313


Backus-Naur form, 58 (see also Context- Bruno, J. L., 787
free grammar) Brzozowski, J. A., 124, 138
Backwards determinism (see Unique in- Burkhard, W. A., 787
vertibility) Busam, V. A., 936, 960
Baer, J. L., 906
Bar-Hillel, Y., 82, 102, 211
C
Barnett, M. P., 237
Base node, 39 Canonical collection of sets of valid
Basic block (see Straight-line code) items, 389-391, 616, 621
Bauer, F. L., 455 Canonical grammar, 692-697, 699-700
Bauer, H., 426 Canonical LR (k) parser, 393-396
Beals, A. J., 578 Canonical parsing automaton, 647-648,
Beatty, J. C., 906 657
Becker, S., 426 Canonical set of LR (k) tables, 393-394,
Begin block, 913, 938 584, 589-590, 625
Belady, L. A., 906 Cantor, D. G., 211
Bell, J. R., 563, 811 Cardinality (of a set), 11, 14
Berge, C., 52 Cartesian product, 5
Bijection, 10 Catalan number, 165
Binary search tree, 792-793 Caviar, 667
Birman, A., 485 Caviness, B. F., 878
Blattner, M., 211 CFG (see Context-free grammar)
Block (see Straight-line code) CFL (see Context-free language)
Block-structured language, bookkeeping Chaining, 808-809
for, 792, 812-813, 816-821 Characteristic function, 34
BNF (see Backus-Naur form) Characteristic string (of a right senten-
Bobrow, D. G., 82 tial form), 663
Book, R. V., 103, 211 Characterizing language, 238-243, 251
Bookkeeping, 59, 62-63, 74, 255, 722- Cheatham, T. E., 58, 77, 280, 314, 579
723, 781-782, 788-843 Chen, S. C., 906
Boolean algebra, 23-24, 129 Chomsky, N., 29, 58, 82, 102, 124, 166,
Booth, T. L., 138 192, 211
Border (of a sentential form), 334, 369 Chomsky grammar, 29 (see also Gram-
Borodin, A., 36 mar)
Bottom-up parsing, 178-184, 268-271, Chomsky hierarchy, 92
301-307, 485-500, 661-662, 740- Chomsky normal form, 150-153, 243,
742, 767, 816 (see also Bounded- 276-277, 280, 314, 362, 689, 708,
right-context grammar, Floyd- 824
Evans productions, LR(k) gram- Christensen, C., 58
mar, Precedence grammar) Church, A., 25, 29
Bounded context grammar/language, Church-Turing thesis, 29
450-452 Circuit (see Cycle)
Bounded-right-context (BRC) grammar/ Circular translation scheme, 777-778
language, 427-435, 448, 451-452, Clark, E. R., 936
666, 699-701, 708, 717 (see also Closed portion (of a sentential form),
P

(1, 1)-BRC grammar, (1, 0)- 334-369


BRC grammar) Closure
Bovet, D. P., 966 of a language, 17-18, 197
Bracha, N., 878 reflexive and transitive, 8-9
Breuer, M. A., 878 of a set of valid items, 386, 633
Closure (cont.) Conflict, precedence (see Precedence con-
transitive, 8-9, 47-50, 52 flict)
Cluster (of a syntax tree), 894-895 Congruence relation, 134
CNF (see Chomsky normal form) Consistent set of items, 391
Cocke, J., 76, 332, 936, 960 Constant, 254
Cocke-Younger-Kasami algorithm, 281, Constant propagation (see Compile time
314-320 computation)
Code generation, 59, 65-70, 72, 74, 728, Context-free grammar/language, 57, 91-
765-766, 781-782, 863-867 (see 93, 97, 99, 101, 138-211, 240-
also Translation) 242, 842
Code motion, 924-925 Context-sensitive grammar/language, 91-
Code optimization, 59, 70-72, 723-724, 93, 97, 99, 101,208, 399
726, 769-772, 844-960 Continuing pushdown automaton, 188-
Cohen, R. S., 500 189
Collision (in a hash table), 795-796, Conway, R. W., 77
798-799 Cook, S. A., 34, 192
Colmerauer, A., 500 Core [of a set of LR(k) items], 626-627
Colmerauer grammar, 492, 497-500 Correspondence problem (see Post's cor-
Colmerauer precedence relations, 490- respondence problem)
500 Cost criterion, 844, 861-863, 891
Column merger [in LR(k) parser] 611- Countable set, 11, 14
612 Cover, grammatical (see Left cover, Right
Common subexpression (see Redundant cover)
computation) CSG (see Context-sensitive grammar)
Commutative law, 23, 71, 867, 869-873, CSL (see Context-sensitive language)
876, 891-903 Countable set, 11, 14
Compatible partition, 593-596, 601, 627 Cut (of a parse tree), 140-141
Compiler, 53-57 (see also Bookkeeping, Cycle, 39
Code generation, Code optimiza- Cycle-free grammar, 150, 280, 302-303,
tion, Error correction, Lexical 307
analysis, Parsing)
Compiler-compiler, 77
Compile time computation, 919-921
Complementation, 4, 189-190, 197, 208,
484, 689 Dag, 39-40, 42-45, 116, 547-549, 552,
Completely specified finite automaton, 763-765, 854-863, 865-866, 959
117 Dangling else, 202-204
Component grammar (see Grammar Data flow, 937-960
splitting) Davis, M., 36
Composition (of relations), 13, 250 D-chart, 79-82, 958
Computational complexity, 27-28, 208, De Bakker, J. W., 878
210, 297-300, 316-320, 326-328, Debugging, 662
356, 395-396, 473-476, 736, 839- Decidable problem (see Problem)
840, 863, 874, 944, 958-959 Defining equations (for context-free lan-
Computation path, 914-915, 944 Defining equations (for context-free lan-
Computed goto (in Floyd-Evans produc- Definition (statement of a program), 909
tions), 564 De Morgan's laws, 12
Concatenation, 15, 17, 197, 208-210, 689 Denning, P. J., 426, 709
Configuration, 34, 95, 113-114, 168-169, Depth (in a graph), 43
224, 228, 290, 303, 338, 477, 488, De Remer, F. L., 399, 512, 578, 645, 665
582 Derivation, 86, 98
Derivation tree (see Parse tree) Dyck language, 209


Derivative (of a regular expression), 136
Derived graph, 940-941
Descendant, 39
Deterministic finite automaton, 116, 255
(see also Finite automaton) e (see Empty string)
Deterministic finite transducer, 226-227 Earley, J., 332, 578, 645
(see also Finite transducer) Earley's algorithm, 73, 281, 320-331, 397-
Deterministic language (see Determin- 398
istic pushdown automaton.) Edge, 37
Deterministic pushdown automaton, 184- e-free first (EFF), 381-382, 392, 398
192, 201-202, 208-210, 251, 344, e-free grammar, 147-149, 280, 302-303,
398, 446, 448, 466-469, 684-686, Eickel, J., 455
695, 701, 708-709, 711, 712, 717 Eickel, J., 455
Deterministic pushdown transducer, 229, Eight queens problem, 309
251, 271-275, 341, 395, 443, 446, Elementary operation, 317, 319, 326, 395
730-736, 756 e-move, 168, 190
Deterministic recognizer, 95 (see also Emptiness problem, 130-132, 144-145,
Deterministic finite automaton, 483
Deterministic pushdown automa- Empty set, 2
ton, Deterministic two-stack Empty string, 15
parser, Recognizer) Endmarker, 94, 271, 341, 404, 469, 484,
Deterministic two-stack parser, 488, 492- 698, 701,707, 716
493, 500 Engeler, E., 58
Dewar, R. B. K., 77 English, structure of, 55-56, 78
Diagnostics (see Error correction) Englund, D. E., 936, 960
Difference (of sets), 4 Entrance (of an interval), 947
Differentiation, 760-763 Entry (of a block), 912
Dijkstra, E. W., 79 e-production, 92, 147-149, 362, 674-680,
Direct access table, 791-792 686--688, 690
Direct chaining, 808 Equivalence
Direct dominator (see Dominator) of LR(k) table sets, 585-588, 590-
Direct lexical analysis, 61-62, 258-.281 596, 601, 617, 625, 652-683
Directed acyclic graph (see Dag) of parsers, 560, 562-563, 580-581 (see
Directed graph (see Graph, directed) also Equivalence of LR(k) table
Disjoint sets, 4 sets, Exact equivalence)
Distance (in a graph), 47-50 of programs, 909, 936
Distinguishable states, 124-128, 593, of straight-line blocks, 848, 891
654-657 topological (see Equivalence of
Distributive law, 23, 868 straight-line blocks)
Domain, 6, 10 under algebraic laws, 868-869
Domolki's algorithm, 312-313, 452 Equivalence class (see Equivalence re-
Dominator, 915-917, 923-924, 934, 939, lation)
959 Equivalence problem, 130-132, 201, 237,
Don't care entry [in an LR(k) table], 362, 684-686, 709, 936
581-582, 643 (see also φ-inacces- Equivalence relation, 6-7, 12-13, 126,
sible) 133-134
DPDA (see Deterministic pushdown Erase state, 691, 693, 708
automaton) Error correction/recovery, 59, 72-74, 77,
DPDT (see Deterministic pushdown 367, 394, 399, 426, 546, 586, 615,
transducer) 644, 781, 937
Error indication, 583 Frequency profile, 912


Essential blank (of a precedence matrix), Frontier (of a parse tree), 140
556 Function, 10, 14
Euclid's algorithm, 26-27, 36, 910 Futrelle, R. P., 237
Evans, A., 455, 512
Evey, R. J., 166, 192 G
Exact equivalence (of parsers), 555-559,
562, 585 Galler, B. A., 58
Exit (of an interval), 947 Gear, C. W., 936
Extended precedence grammar, 410-415, GEN, 945-955
424-425, 429, 451,717 Generalized syntax-directed translation
Extended pushdown automaton, 173-175, scheme, 758-765, 782-783
185-186 Generalized top-down parsing language,
Extended pushdown transducer, 269 469-485, 748-753
Extended regular expression, 253-258 Gentleman, W. M., 61
Extensible language, 58, 501-504 Gill, A., 138
Ginsburg, S., 102-103, 138, 166, 211,
237
Ginzburg, A., 138
GNF (see Greibach normal form)
Feldman, J. A., 77, 455 Gotlieb, C. C., 314
Fetch function (of a recognizer), 94 GOTO, 386-390, 392, 598, 616
FIN, 135, 207 GOTO -a, 598-600
Final configuration, 95, 113, 169, 175, Goto function, 374, 392
224, 228, 339, 583, 648 GOTO graph, 599-600
Final state, t 13, 168, 224 Governing table, 581
Finite ambiguity, 332 Graham, R. L., 907
Finite automaton, 112-121, 124-128, Graham, R. M., 455
255-261, 397 (see also Canonical Graham, S. L., 426, 709
parsing automaton) Grammar (see Bounded-right-context
Finite control, 95, 443 (see also State) grammar, Colmerauer grammar,
Finite set, 11, 14 Context-free grammar, Context-
Finite transducer, 223-227, 235, 237- sensitive grammar, Indexed gram-
240, 242, 250, 252, 254-255, 258, mar, LC(k) grammar, LL(k)
722 grammar, LR(k) grammar, Op-
FIRST, 300, 335-336, 357-359 erator grammar, Precedence gram-
Fischer, M. J., 102, 426, 719, 843 mar, Right-linear grammar, Web
Fischer, P. C., 36 grammar)
Flipping (of statements), 853-859, 861- Grammar splitting, 631-645
863 Graph
Flow chart, 79-82 (see also Flow graph) directed, 37-52
Flow graph, 907, 913-914 undirected, 51
Floyd, R. W., 52, 77, 166, 211, 314, 426, Graph grammar (see Web grammar)
455, 563, 878, 906 Gray, J. N., 192, 280, 455, 500, 709
Floyd-Evans productions, 443-448, 452, Greek letters, 2t4
564-579 Greibach, S. A., 102-103, 166, 211
FOLLOW, 343, 425, 616, 640 Greibach normal form, 153-162, 243,
Formal Semantic Language, 455 280, 362, 668, 681-684, 690, 708
FORTRAN, 252, 501,912, 958 Gries, D., 76--77
Frailey, D. J., 906 Griffiths, T. V., 314
Freeman, D. N., 77, 263 Griswold, R. E., 505
Gross, M., 211 Igarishi, S., 878


GTDPL (see Generalized top-down IN, 944-955, 959
parsing language) Inaccessible state (see Accessible state)
Inaccessible symbol (of a context-free
grammar), 145-147
Inclusion (of sets), 3, 208
In-degree, 39
Haines, L. H., 103
Halmos, P. R., 3, 25 Independent nodes (of a dag), 552-555
Index (of an equivalence relation), 7
Halting (see Recursive set, Algorithm)
Indexed grammar, 100-101
Halting problem, 35
Indirect chaining (see Chaining)
Halting pushdown automaton, 282-285
Handle (of a right-sentential form) 179- Indirect lexical analysis, 61-62, 254-258
Indistinguishable states (see Distinguish-
180, 377, 379-380, 403-404, 486
Harary, F., 52 able states.)
Harrison, M. A., 138, 192, 280, 455, 500, Induction (see Proof by induction)
709 Induction variable, 925-929
Infinite set, 11, 14
Hartmanis, J., 36, 192
Infix expression, 214-215
Hashing function, 794-795, 797-798 (see
Ingerman, P. Z., 77
also Hashing on locations, Linear
Inherent ambiguity, 205-207, 209
hashing function, Random hash-
ing function, Uniform hashing Inherited attribute, 777-781,784
INIT, 135, 207
function )
Initial configuration, 95, 113, 169, 583,
Hashing on locations, 804-807
648
Hash table, 63, 793-811
Initial state, 113, 168, 224
Haskell, R., 280
Initial symbol (of a pushdown automa-
Haynes, H. R., 578
ton), 168
Hays, D. G., 332
Injection, 10
Header (of an interval), 938-939
Input head, 94-96
Hecht, M. S., 960
Height Input symbol, 113, 168, 218, 224
Input tape, 93-96
in a graph, 43
Input variable, 845
of an LR(k) table, 598
Interior frontier, 140
Hellermann, H., 906
Hext, J. B., 314 Intermediate (of an indexed grammar),
100
Hochsprung, R. R., 77
Homomorphism, 17-18, 197, 207, 209, Intermediate code, 59, 65-70, 722-727,
213, 689 844-845, 908-909
Interpreter, 55, 721, 725
Hopcroft, J. E., 36, 102-103, 138, 192,
Interrogation state, 650, 658
211, 368, 399, 690, 843, 960
Intersection, 4, 197, 201, 208, 484, 689
Hopgood, F. R. A., 76, 455
Intersection list, 824-833, 839-840
Horning, J. J., 76-77, 450, 465
Horwitz, L. P., 906 Interval analysis, 937-960
Inverse (of a relation), 6, 10-11
Huffman, D. A., 138
Inverse finite transducer mapping, 227
Irland, M. I., 36
Irons, E. T., 77, 237, 314, 455
Irreducible flow graph, 953-955
Ianov, I. I., 937 Irreflexive relation, 9
Ibarra, O., 192 Item (Earley's algorithm), 320, 331,397-
Ichbiah, J. D., 426, 579 398
Identifier, 60-63, 252, 254 Item [LR(k)], 381 (see also Valid item)
Left cover, 275-277, 280, 307, 690


Left factoring, 345
Johnson, S. C., 665 Left-linear grammar, 122
Johnson, W. L., 263 Leftmost derivation, 142-143, 204, 318-
320
Left-parsable grammar, 271-275, 341,
672-674
Left parse (see Leftmost derivation, Top-
Kameda, T., 138
down parsing)
Kaplan, D. M., 937
Left parse language, 273, 277
Karp, R. M., 906
Left parser, 266-268
Kasami, T., 332
Left recursion, 484 (see also Left-recur-
Kennedy, K., 906, 960
sive grammar)
Keyword, 59, 259
Left-recursive grammar, 153-158, 287-
KGOTO, 636-637
288, 294-298, 344-345, 681-682
Kleene, S. C., 36, 124
Left-sentential form, 143
Kleene closure (see Closure, of a lan-
Leinius, R. P., 399, 426, 645
guage)
Knuth, D. E., 36, 58, 368, 399, 485, 690, Length,
of a derivation, 86
709, 787, 811, 912, 960
of a string, 16
Korenjak, A. J., 368, 399, 636, 645, 690
Lentin, A., 211
Kosaraju, S. R., 138
Lewis, P. M. II, 192, 237, 368, 621, 757,
k-predictive parsing algorithm (see Pre-
843
dictive parsing algorithm)
Lexical analysis, 59-63, 72-74, 251-264,
k'-uniform hashing function (see Uniform
721-722, 781, 789, 823
hashing function)
Lexicographic order, 13
Kuno, S., 313
Limit flow graph, 941
Kurki-Suonio, R., 368, 690
Linear bounded automaton, 100 (see also
Context-sensitive grammar)
Linear grammar/language, 165-170,
207-208, 237
Labeled graph, 38, 42, 882, 896-897 Linear hashing function, 804-805
La France, J. E., 578-579 Linearization graph, 547-550
Lalonde, W. R., 450 Linear order 10, 13-14, 43-45, 865
LALR(k) grammar (see Lookahead Linear precedence functions, 543-563
LR(k) grammar) Linear set, 209-210
Lambda calculus, 29 Link editor, 721
Language, 16-17, 83-84, 86, 96, 114, LL(k) grammar/language, 73, 333-368,
169, 816 (see also Recognizer, 397-398, 448, 452, 579, 643, 664,
Grammar) 666-690, 709, 711, 716-717, 730-
LBA (see Linear bounded automaton) 732, 742-746
LC(k) grammar/language, 362-367 LL(k) table, 349-351, 354-355
Leaf, 39 LL(1) grammar, 342-349, 483, 662
Leavenworth, B. M., 58, 501 LL(0) grammar, 688-689
Lee, E. S., 450 Loader, 721
Lee, J. A. N., 76 Load module, 721
Left-bracketed representation (for trees), Loeckx, J., 455
46 Logic, 19-25
Left-corner parse, 278-280, 310-312, Logical connective, 21-25
362-367 Lookahead, 300, 306, 331, 334-336, 363,
Left-corner parser, 310-312 371
Lookahead LR(k) grammar, 627-630, Miller, R. E., 906


642-643, 662 Miller, W. F., 82
Looping (in a pushdown automaton), MIN, 135, 207, 209
186-189 Minimal fixed point, 108-110, 121-123,
Loops (in programs), 907-960 160-161
Loop unrolling, 930-932 Minor node, 889, 903
Lowry, E. S., 936, 960 Minsky, M., 29, 36, 102, 138
LR(k) grammar/language, 73, 369, 371- Mixed strategy precedence grammar, 435,
399, 402, 424, 428, 430, 448, 579- 437-439, 448, 452, 552
665, 666-674, 730, 732-736, 740- Modus ponens, 23
742 Montanari, U. G., 82
LR(k) table, 374-376, 392-394, 398, Moore, E. F., 103, 138
580-582 Morgan, H. L., 77, 263
LR(1) grammar, 410, 448-450, 690, 701 Morris, D., 77, 313
LR(0) grammar, 590, 642, 646-657, 690, Morris, R., 811, 843
699, 701,708 Morse, S. P., 426, 579
Lucas, P., 58 Moulton, P. G., 77
Luckham, D. C., 909, 937 Move (of a recognizer), 95
Lukasiewicz, J., 214 MSP (see Mixed strategy precedence
grammar)
Muller, M. E., 77
M Multiple address code, 65, 724-726
Munro, I., 52
Major node, 889-890, 903
Manna, Z., 937
N
Mapping (see Function)
Marill, M., 936 N akata, I., 906
Marked closure, 210 NAME, 769
Marked concatenation, 210 Natural language, 78, 281 (see also En-
Marked union, 210 glish )
Markov, A. A., 29 Naur, P., 58
Markov algorithm, 29 Neutral property, 814-815, 823, 827,
Martin, D. F., 563 840-841
Maurer, W. D., 811 NEWLABEL, 774
MAX, 135, 207, 209 NEXT, 598-599, 609, 616
Maxwell, W. L., 77 Next move function, 168, 224
McCarthy, J., 77 Nievergelt, J., 936
McClure, R. M., 77, 485, 757 Node, 37
McCullough, W. S., 103 Nondeterministic algorithm, 285, 308-
McIlroy, M. D., 58, 61, 757, 843 Nondeterministic finite automaton, 117
McKeeman, W. M., 76-77, 426, 455,936 Nondeterministic finite automaton, 117
McNaughton, R., 124, 690 (see also Finite automaton)
Medlock, C. W., 936, 960 Nondeterministic FORTRAN, 308-310
Membership (relation on sets), 1 Nondeterministic property grammar, 823,
Membership problem, 130-132 841
Memory (of a recognizer), 93-96 Nondeterministic recognizer, 95 (see also
Memory references, 903-904 Recognizer)
Mendelson, E., 25 Non-left-recursive grammar (see Left-
META, 485 recursive grammar)
Meyers, W. J., 906 Nonnullable symbol (see Nullable sym-
Miller, G. A., 124 bol)
Nonterminal, 85, 100, 218, 458 Parse tree, 139-143, 179-180, 220-222,
Normal form deterministic pushdown 273, 379, 464-466 (see also Syn-
automaton, 690-695 tax tree)
Northcote, R. S., 578 Parsing, 56, 59, 63-65, 72-74, 263-280,
Nullable symbol, 674-680 722, 781 (see also Bottom-up
parsing, Shift-reduce parsing,
Top-down parsing)
O Parsing action function (see Action func-
tion)
Object code, 59, 720 Parsing automaton, 645-665
Oettinger, A., 192, 313 Parsing machine, 477-482, 484, 747-748,
Ogden, W., 211 750-753
Ogden's lemma, 192-196 Partial acceptance failure (of a TDPL
(1, 1)-bounded-right-context grammar, or GTDPL program), 484
429-430, 448, 690, 701 Partial correspondence problem, 36
(1, 0)-bounded-right-context grammar, Partial function, 10
690, 699-701, 708 Partial left parse, 293-296
One-turn pushdown automaton, 207-208 Partial order, 9-10, 13-15, 43-45, 865
One-way recognizer, 94 Partial recursive function (see Recursive
Open block, 856-858 function)
Open portion (of a sentential form), 334, Partial right parse, 306
369 Pass (of a compiler), 723-724, 782
Operator grammar/language, 165, 438 Paterson, M. S., 909, 937
Operator precedence grammar/language, Path, 39, 51
439-443, 448-450, 452, 550-551, Pattern recognition, 79-82
711-718 Paul, M., 455
Order (of a syntax-directed translation), Paull, M. C., i66, 690
243-251 Pavlidis, T., 82
Ordered dag, 42 (see also Dag) PDA (see Pushdown automaton)
Ordered graph, 41-42 PDT (see Pushdown transducer)
Ordered tree, 42-44 (see also Tree) Perfect induction, 20
Ore, O., 52 Perles, M., 211
OUT, 944-958 Perlis, A. J., 58
Out degree, 39 Peterson, W. W., 811
Output symbol, 218, 224 Petrick, S. R., 314
Output variable, 844 Pfaltz, J. L., 82
Phase (of compilation), 721,781-782
φ-inaccessible set of LR(k) tables, 588-
597, 601,613, 616
Pager, D., 621 Phrase, 486
Painter, J. A., 77 Pig Latin, 745-747
Pair, C., 426 Pitts, E., 103
PAL, 512-517 PL/I, 501
Parallel processing, 905 PL360, 507-511
Parenthesis grammar, 690 Poage, J. F., 505
Parikh, R. J., 211 Polish notation (see Prefix expression,
Parikh's theorem, 209-211 Postfix expression)
Park, D. M. R., 909, 937 Polonsky, I. P., 505
Parse lists, 321 Pop state, 650, 658
Parse table, 316, 339, 345-346, 348, 351- Porter, J. H., 263
356, 364-365, 374 Position (in a string), 193
Post, E. L., 29, 36 Pseudorandom number, 798, 807, 808


Postdominator, 935 Pumping lemma,
Postfix expression, 214-215, 217-218, for context-free languages, 195-196
229, 512, 724, 733-735 (see also Ogden's lemma)
Postfix simple syntax-directed translation, for regular sets, 128-129
Pushdown automaton, 167-192, 201, 282
Postorder (of a tree), 43 (see also Deterministic pushdown
Postponement set, 600-607, 617, 626 automaton)
Post's correspondence problem, 32-36, Pushdown list, 94, 168, 734-735
199-201 Pushdown processor, 737-746, 763-764
Post system, 29 Pushdown symbol, 168
Power set, 5, 12 Pushdown transducer, 227-233, 237,
PP (see Pushdown processor) 265-268, 282-285 (see also De-
Prather, R. E., 138 terministic pushdown transducer)
Precedence (of operators) 65, 233-234, Push state, 658
608, 617
Precedence conflict, 419-420
Precedence grammar/language, 399-400, Q
403, 404, 563, 579 (see also Ex-
tended precedence grammar, Quasi-valid LR(k) item, 638-639
Mixed strategy precedence gram- Question (see Problem)
mar, Operator precedence gram- Quotient (of languages), 135, 207, 209
mar, Simple precedence grammar,
T-canonical precedence grammar,
(2, 1)-precedence grammar, Weak
precedence grammar)
Predecessor, 37 Rabin, M. O., 103, 124
Predicate, 2 Radke, C. E., 811
Predictive parsing algorithm, 338-348, Randell, B., 76
351-356 Random hashing function, 799, 803-807,
Prefix expression, 214-215, 229, 236, 724, Range, 6, 10
730-731 Range, 6, 10
Prefix property, 17, 19, 209, 690, 708 Read state (of parsing automaton), 658
Preorder (of a tree), 43, 672 Reasonable cost criterion, 862-863
Problem, 29-36 Recognizer, 93-96, 103 (see also Finite
Procedure, 25-36 automaton, Linear bounded auto-
Product maton, Parsing machine, Push-
of languages, 17 (see also Concatena- down automaton, Pushdown
tion) processor, Turing machine, Two-
of relations, 7 stack parser)
Production, 85, 100 Recursive function, 28
Production language (see Floyd-Evans Recursive grammar, 153, 163
productions) Recursively enumerable set, 28, 34, 92,
Program schemata, 909 97, 500
Proof, 19-21, 43 Recursive set, 28, 34, 99
Proof by induction, 20-21, 43 Reduced block, 859-861
Proper grammar, 150, 695 Reduced finite automaton, 125-128
Property grammar, 723, 788, 811-843 Reduced parsing automaton, 656-657
Property list, 824-833, 839-840 Reduced precedence matrix, 547
Propositional calculus, 22-23, 35 Reduce graph, 574-575
Prosser, R. T., 936 Reduce matrix, 574-575
Reduce state, 646 Rosenfeld, A., 82


Reducible flow graph, 937, 941,944-952, Rosenkrantz, D. J., 102, 166, 368, 621,
958-959 690, 843
Reduction in strength, 921, 929-930 Ross, D. T., 263
Redundant computation, 851-852, 854- Rule of inference, 19-20
859, 861-863, 918-919 Russell, L. J., 76
Redziejowski, R. R., 906 Russell's paradox, 2
Reflexive relation, 6
Reflexive-transitive closure (see Closure,
reflexive and transitive)
Region, 922-924, 926-927, 929 Salomaa, A., 124, 138, 211
Regular definition, 253-254 Samelson, K., 455
Regular expression, 104-110, 121-123 Sammet, J. E., 29, 58, 807
Regular expression equations, 105-112, Sattley, K., 312
121-123 Scan state, 691, 693, 707
Regular grammar, 122, 499 Scatter table (see Hash table)
Regular set, 103-138, 191, 197, 208-210, Schaefer, M., 960
227, 235, 238-240, 424, 689 Schorre, D. V., 77, 313, 485
Regular translation (see Finite trans- Schutte, L. J., 578
ducer) Schutzenberger, M. P., 166, 192, 211
Relation, 5-15 Schwartz, J. T., 76, 960
Relocatable machine code, 721 Scope (of a definition), 849
Renaming of variables, 852, 854-859, Scott, D., 103, 124
861-863 SDT (see Syntax-directed translation)
Reversal, 16, 121, 129-130, 397, 500, Self-embedding grammar, 210
689 Self-inverse operator, 868
Reynolds, J. C., 77, 280, 313 Semantic analysis, 723
Rice, H. G., 166 Semantics, 55-58, 213
Richardson, D., 878 Semantic unambiguity, 274, 758
Right-bracketed representation (for Semilinear set, 210
trees), 46 Semireduced automaton, 653, 661
Right cover, 275-277, 280, 307, 708, 718 Sentence (see String)
Right-invariant equivalence relation, 133- Sentence symbol (see Start symbol)
134 Sentential form, 86, 406-407, 414-415,
Right-linear grammar, 91-92, 99, 110- 442, 815
112, 118-121,201 Set, 1-19
Right-linear syntax-directed translation Set merging problem, 833-840, 843
scheme, 236 Sethi, R., 878, 906
Rightmost derivation, 142-143,264, 327- Shamir, E., 211
330 Shapiro, R. M., 77
Right parsable grammar, 271-275, 398, Shaw, A. C., 82
672-674 Shepherdson, J. C., 124
Right parse (see Rightmost derivation, Shift graph, 570-574
Bottom-up parsing) Shift matrix, 569-574
Right parse language, 273 Shift-reduce conflict, 643
Right parser, 269-271 Shift-reduce parsing, 269, 301-302, 368-
Right-sentential form, 143 371, 392, 400-403, 408, 415, 418-
Roberts, P. S., 314 419, 433, 438-439, 442, 544, 823
Rogers, H., 36 (see also Bounded-right-context
Root, 40 grammar, LR (k) grammar, Prece-
Rosen, S., 76 dence grammar)
Shift state, 646 Straight line code, 845-879, 909-910,


Simple LL(1) grammar/language, 336 912
Simple LR(k) grammar, 605, 626-627, Strassen, V., 34, 52
633, 640-642, 662 String, 15, 86
Simple mixed strategy precedence gram- Strong characterization (see Characteriz-
mar/language, 437, 448, 451-452, ing language)
642, 666, 690, 695-699 Strong connectivity, 39
Simple precedence grammar/language, Strong LL(k) grammar, 344, 348
403-412, 420-424, 492-493, 507, Strongly connected region (see Region)
544-552, 555-559, 615, 666, 709- Structural equivalence, 690
712, 716-718 SUB, 135
Simple syntax-directed translation, 222- Subset, 3
223, 230-233, 240-242, 250, 265, Substitution (of languages), 196-197
512, 730-736 (see also Postfix Successor, 37
simple SDTS) Suffix property, 17, 19
Single entry region, 924 (see also Region) Superset, 3
Single productions, 149-150, 452, 607- Suppes, P., 3
615, 617, 712 Symbol table (see Bookkeeping)
Skeletal grammar, 440-442, 452, 611 Symmetric relation, 6
SLR(k) grammar (see Simple LR(k) Syntactic analysis (see Parsing)
grammar) Syntax, 55-57
SNOBOL, 505-507 Syntax-directed compiler, 730
Solvable problem (see Problem) Syntax-directed translation, 57, 66-70,
Source program, 59, 720 215-251, 445, 730-787, 878 (see
Space complexity (see Computational also Simple syntax-directed trans-
complexity) lation)
Spanning tree, 51, 571-574 Syntax macro, 501-503
Split canonical parsing automaton, 651- Syntax tree, 722-727, 881 (see also Parse
652, 659-660 tree)
Splitting Synthesized attribute, 777, 784
of grammars (see Grammar split-
ting)
of LR(k) tables, 629-631
of states of parsing automaton, 650-
652, 657-660
Splitting nonterminal, 631 Tagged grammar, 617-620, 662
Standard form (of regular expression Tag system, 29, 102
equations), 106-110 Tautology, 23
Standish, T., 77, 280 T-canonical precedence grammar, 452-
Start state (see Initial state) 454
Start symbol, 85, 100, 168, 218, 458 (see TDPL (see Top-down parsing language)
also Initial symbol) Temporary storage, 67-70
State (of a recognizer), 113, 168,224 Terminal (of a grammar) 85, 100, 458
State transition function, 113 Theorem, 20
Stearns, R. E., 192, 211, 237, 368, 621, Thickness (of a string), 683-684, 690
690, 757, 843 Thompson, K., 138, 263
Steel, T. B., 58 Three address code, 65 (see also Mul-
Stockhausen, P., 906 Time complexity (see Computational
Stone, H. S., 906 Time complexity (see Computational
Storage allocation, 723 complexity)
Store function (of a recognizer), 94 TMG, 485
Token, 59-63, 252 Ullman (cont.)


Token set, 452 251, 399, 426, 485, 563, 621, 645,
Top-down parsing, 178, 264-268, 285- 665, 709, 757, 787, 811, 843, 878,
301, 445, 456-485, 487, 661-662, 960
742-746, 767-768, 816 (see also Unambiguous grammar, 98-99, 325-328,
LL (k) grammar) 344, 395, 397, 407, 422, 430, 613,
Top-down parsing language, 458-469, 663 (see also Ambiguous gram-
472-473, 484-485 mar)
Topological sort, 43-45 Undecidable problem (see Problem)
Torii, K., 332 Underlying grammar, 226, 758, 815
Total function, 10, 14 Undirected graph (see Graph, undi-
Total recursive function, 28 rected)
TRANS, 945-955 Undirected tree, 51
Transducer (see Finite transducer, Pars- Unger, S. H., 314, 690
ing machine, Pushdown processor, Uniform hashing function, 810
Pushdown transducer) Union, 4, 197, 201,208, 484, 689
Transformation, 71-72 (see also Func- Unique invertibility, 370, 397, 404, 448,
tion, Translation) 452, 490, 499, 712-713
Transition function (see State transition Universal set, 4
function, Next move function) Universal Turing machine, 35
Transition graph, 116, 225 Unordered graph (see Graph)
Transitive closure (see Closure, transi- Unrestricted grammar/language, 84-92,
tive) 97-98, 100, 102
Transitive relation, 6 Unsolvable problem (see Problem)
Translation, 55, 212-213, 720-787 (see Useless statement, 844, 849-851, 854-
also Code generation, Syntax-di- 859, 861-863, 917-918, 937
rected translation, Transducer) Useless symbol, 123, 146-147, 244, 250,
Translation element, 758 280
Translation form (pair of strings), 216-
217
Translation symbol, 758
Translator, 216 (see also Transducer)
Traverse (of a PDA), 691,693-694 Valid item, 381, 383-391, 394
Tree, 40-42, 45-47, 55-56, 64-69, 80-82, Valid parsing algorithm, 339, 347-348
283-284, 434-436 (see also Parse Valid set of LR(k) tables, 584
tree) Value (of a block) 846, 857
Truth table, 21 Van Wijngaarden, A., 58
T-skeletal grammar, 454 Variable (see Nonterminal)
Turing, A. M., 29, 102 Venn diagram, 3
Turing machine, 29, 33-36, 100-132 Vertex (see Node)
(2, 1)-precedence grammar, 426, 448, 666, Viable prefix (of a right-sentential form)
690,702-707 380, 393, 616
Two-stack parser, 487-490, 499-500
Two-way finite automaton, 123
Two-way pushdown automaton, 191-192 W

Walk, K., 58
Waiters, D. A., 399
Warshall, S., 52, 77
UI (see Unique invertibility) Warshall's algorithm, 48-49
Ullman, J. D., 36, 102-103, 192, 211, WATFOR, 721
Weak precedence function, 551-555, 561 Word (see String)


Weak precedence grammar, 415-425, Worley, W. S., 77
437, 451-452, 552, 559, 561,564- Wortman, D. B., 76-77, 455, 512
579, 667, 714-716 Wozencraft, J. M., 512
Weber, H., 426, 563 Write sequence (for a write state of a
Web grammar, 79-82 PDA), 694
Wegbreit, B., 58 Write state, 691,693,708
Weiner, P., 138
Well-formed (TDPL or GTDPL pro-
gram), 484
Well order, 13, 19, 24-25
Winograd, S., 33, 906 Yamada, H., 124
Wirth, N., 426, 507, 563 Younger, D. H., 332
Wirth-Weber precedence (see Simple
precedence grammar)
Wise, D. S., 455
Wolf, K. A., 906
Wood, D., 368 Zemlin, R. A., 906
(continued from front flap)
• Describes methods for rapid storage
of information in symbol tables.
• Describes syntax-directed transla-
tions in detail and explains their
implementation by computer.
• Provides detailed mathematical so-
lutions and many valuable examples.

ALFRED V. AHO is a member of the


Technical Staff of Bell Telephone
Laboratories' Computing Science Re-
search Center at Murray Hill, New
Jersey, and an Affiliate Associate Pro-
fessor at Stevens Institute of Tech-
nology. The author of numerous pub-
lished papers on language theory and
compiling theory, Dr. Aho received
his Ph.D. from Princeton University.

JEFFREY D. ULLMAN is an Associate


Professor of Electrical Engineering at
Princeton University. He has pub-
lished numerous papers in the com-
puter science field and was previously
co-author of a text on language theory.
Professor Ullman received his Ph.D.
from Princeton University.

PRENTICE-HALL, Inc.
Englewood Cliffs, New Jersey
1176 • Printed in U.S. of America
