
Semantic Analysis

Compiler Design, WS 2005/2006

Semantic Analysis
Goal: check the correctness of the program and enable proper execution:
- gather semantic information, e.g. the symbol table
- check semantic rules (type checking)
Often used approach:
- basis: a context-free grammar
- associate information with language constructs by attaching attributes (properties) to grammar symbols
- specify semantic rules (attribute equations) for the grammar productions to compute the values of the attributes
Attribute grammar: a context-free grammar with attributes and semantic rules.


Attribute Grammar (Attributierte Grammatik)


Synthesized attributes (synthetisierte Attribute): an attribute at a node is synthesized if its value is computed from the attribute values of the children of that node in the parse tree.
Inherited attributes (ererbte Attribute): an attribute at a node is inherited if its value is computed from attribute values of the parent and/or siblings of that node in the parse tree.

[Figure: attribute dependencies at a node n — a synthesized attribute is computed from the children, an inherited attribute from the parent and/or siblings]

Example: Synthesized Attributes, Calculator


Production        Semantic Rule
L → E             print(E.val)
E → E1 + T        E.val := E1.val + T.val
E → T             E.val := T.val
T → T1 * F        T.val := T1.val * F.val
T → F             T.val := F.val
F → ( E )         F.val := E.val
F → number        F.val := number.lexval


An integer-valued synthesized attribute val is associated with each nonterminal; the integer-valued attribute lexval of number is supplied by the lexical analyzer.

Example: Inherited Attributes, Declaration


Production        Semantic Rule
D → T L           L.in := T.type
T → int           T.type := integer
T → real          T.type := real
L → L1 , id       L1.in := L.in; symtab(id.entry, L.in)
L → id            symtab(id.entry, L.in)


A declaration consists of a type T followed by a list L of variables. The type of T is synthesized and then inherited down the list L: type is a synthesized attribute, in an inherited attribute. The procedure symtab adds the type of each identifier to its entry in the symbol table, pointed to by the attribute entry.

Attribute Grammar


Consider a production X0 → X1 X2 ... Xn and attributes a1, ..., ak, with the value of attribute aj at symbol Xi written Xi.aj. An attribute equation for this production has the form:

Xi.aj = fij(X0.a1, ..., X0.ak, ..., Xn.a1, ..., Xn.ak)


Attribute computation:
- Attribute evaluator: derived from the attribute equations. Attribute evaluation has to follow an evaluation order; the order is based on the dependency graph, and multiple passes over the parse tree may be necessary.
- Attribute computation during parsing: restricted attribute grammars are required.

Attribute Computation During Parsing


LL/LR parsing processes the input from left to right. Consequence: attribute evaluation has to correspond to a left-to-right traversal of the parse tree.
- Synthesized attributes: the children of a node can be processed in arbitrary order (in particular from left to right).
- Inherited attributes: backward dependencies are not allowed, i.e. no dependencies from right to left in the parse tree.


Restricted Attribute Grammars


S-attributed grammar (S ... for synthesized): an attribute grammar in which all attributes are synthesized, i.e. for A → X1 X2 ... Xn:

A.a = f(X1.a1, ..., X1.ak, ..., Xn.a1, ..., Xn.ak)

L-attributed grammar (L ... for left to right): an attribute grammar is called L-attributed if for each production X0 → X1 X2 ... Xn and for each inherited attribute aj the semantic rules are all of the form:

Xi.aj = fij(X0.a1, ..., X0.ak, ..., Xi-1.a1, ..., Xi-1.ak)

i.e. an inherited attribute of Xi may depend only on attributes of X0 and of the left siblings X1, ..., Xi-1. Every S-attributed grammar is L-attributed. Synthesized attributes during LR parsing: simple (see example). Inherited attributes during LR parsing: can cause problems.


Inherited Attributes and LR Parsing


LR parsers put off the decision on which production to use in a derivation until the RHS of the production is fully formed. This makes it difficult to make inherited attributes available.
YACC example: a rule with a mid-rule action

    A : B {...some action...} C ;

is interpreted as

    A : B U C ;
    U : {...some action...} ;

Note: semantic actions can add new parsing conflicts!


Symbol Table


Central repository for distinct kinds of information. Alternative: keep the information directly in the intermediate representation (e.g. as attributes).
Variety of names: variables, defined constants, type definitions, procedures, compiler-generated variables, etc.
Typical symbol table entry:
- identifier
- type information
- scope information
- memory location
Principal symbol table operations: insert, lookup. Often realized as a hash table.

Hash Functions (1)
The hash function must distribute the keys uniformly. The hash function should be efficient. It is strongly recommended to choose a prime number for PMAX. We now consider hash functions for strings of length k with characters ci, 1 ≤ i ≤ k; let h be the computed value. The index is obtained as h mod PMAX.
- x: hi = x * hi-1 + ci for 1 ≤ i ≤ k, with h0 = 0 and h = hk (see x65599, x16, x5, x2, x1 in the table).
- quad: four consecutive characters form an integer; these integers are summed to give h.
- middle: h is taken from the middle 4 characters of the string.
- ends: adds the first 3 and the last 3 characters of the string to give h.


Hash Functions (2)
The function hashpjw in C (from P. J. Weinberger's C compiler; see the Dragon Book):

    #define PRIME 211
    #define EOS '\0'

    int hashpjw(char *s)
    {
        char *p;
        unsigned h = 0, g;

        for (p = s; *p != EOS; p++) {
            h = (h << 4) + (*p);
            if ((g = h & 0xf0000000)) {
                h = h ^ (g >> 24);
                h = h ^ g;
            }
        }
        return h % PRIME;
    }


Experiment with Hash Functions (1)

(Experiment from A. V. Aho, R. Sethi, J. D. Ullman: Compilers.) The following test series is used:
1. The 50 most frequent names and keywords from a selection of C programs.
2. As in 1, but with the 100 most frequent names.
3. As in 1, but with the 500 most frequent names.
4. 952 external names in the UNIX operating system.
5. 627 names of a C program generated from C++.
6. 915 random names.
7. 614 words from Chapter 3.1 of the Dragon Book (Compilers).
8. 1201 English words with xxx as prefix and suffix.
9. The 300 words v100, v101, ..., v399.

Experiment with Hash Functions (2)

On collisions, external chaining with lists is used. A table of size 211 is used. The basis for the comparison is the lengths of the lists: for each hash function the list lengths are determined and a distribution measure is computed from them. This measure is normalized to 1 by dividing it by a theoretically computed measure for a uniform distribution; i.e. values around 1 indicate a uniform distribution. The numbers in the diagram are the individual test cases of the test series; the best hash functions come first. (Why x65599: a prime close to 2^16 that quickly overflows 32-bit integers.)


Experiment with Hash Functions (3)

[Figure: normalized distribution measure for each hash function across the nine test cases; values around 1 indicate a uniform distribution]

Symbol Table, Handling Nested Scopes (1)


Approach: Separate symbol table for each scope.
    static int w; int x;          // L0

    void f(int a, int b) {
        int c;                    // L1
        {
            int b, z;             // L2a
            ...
        }
        {
            int a, x;             // L2b
            ...
            {
                int c, x;         // L3
                b = a + b + c + w;
            }
        }
    }

[Figure: stack tblptr points to the symbol tables of the currently open scopes; the tables for L3, L2b, L2a, L1 and L0 each hold their identifiers and link to the table of the enclosing scope]


Symbol Table, Handling Nested Scopes (2)


Operations:
1. mktable(previous): creates a new table (linked to the table previous of the enclosing scope)
2. enter(table, name, type, offset): creates a new entry for name
3. addwidth(table, width): stores the width of all entries of table in the header of table
4. enterproc(table, name, corrtable): creates a new entry for procedure name (with its table corrtable)
Stacks for symbol tables (tblptr) and offsets (offset). Translation scheme: ASU, Chapter 8.
D → proc id ; N D1 ; S    { t := top(tblptr);
                            addwidth(t, top(offset));
                            pop(tblptr); pop(offset);
                            enterproc(top(tblptr), id.name, t) }

D → id : T                { enter(top(tblptr), id.name, T.type, top(offset));
                            top(offset) := top(offset) + T.width }

N → ε                     { t := mktable(top(tblptr));
                            push(t, tblptr); push(0, offset) }


Type Checking
Types: simple types; structured types: array, struct, union, pointer.
Type equivalence (are two type expressions equivalent?): structural / declaration / name equivalence.
Type checking:
- expressions: e.g. usage of operators
- statements: e.g. Boolean type of conditional expressions
Additional topics in type checking: type conversion, overloading, etc.


Structural/Declaration/Name-Equivalence


Structural equivalence: two types are equivalent if they have the same structure (i.e. consist of the same components).
Name equivalence: two types are equivalent if they either have the same simple type or have the same type name.
Declaration equivalence: type aliases are supported (a weaker version of name equivalence); two types are equivalent if they lead back to the same type name.
    struct A {int a; float b;} a;
    struct B {int c; float d;} b;
    typedef struct A C; C c, c1;

              struct. equiv.   decl. equiv.   name equiv.
    a = b;    ok               error          error
    a = c;    ok               ok             error
    c = c1;   ok               ok             ok


Intermediate Representations


Intermediate Representation (Zwischendarstellung)


Intermediate representation (IR): a compile-time data structure that represents the source program during translation. IR design is rather an art than a science: a compiler may need several different IRs, the best choice depends on the tasks to be fulfilled, and there is no widespread agreement on this subject.


Taxonomy: Axis 1
Organizational structure:
1. Structural representations: trees (e.g. parse tree, abstract syntax tree), graphs (e.g. control flow graph)
2. Linear representations: pseudo-code for some abstract machine, e.g. three-address code
3. Hybrid representations: combinations of graphs and linear code, e.g. control flow graph over basic blocks


Taxonomy: Axis 2
Level of abstraction:
1. High-level IR: close to the source program; appropriate for high-level optimizations (loop transformations) and source-to-source translators; e.g. control flow graph
2. Medium-level IR: represents source variables and temporaries; reduces control flow to conditional/unconditional branches; machine-independent, powerful instruction set
3. Low-level IR: almost target machine instructions


Examples of Intermediate Representations


- parse tree (cf. syntax analysis)
- abstract syntax tree (AST) (see next pages)
- directed acyclic graph (DAG) (see next pages)
- control flow graph (CFG) (see next pages)
- program dependence graph (PDG)
- static single assignment form (SSA)
- stack code
- three-address code (see next pages)
- hybrid combinations


(Abstract) Syntax Tree (1)


Abstract syntax tree: a condensed form of the parse tree (the concrete syntax tree):
- superficial nodes are omitted, which makes it more efficient
- not unique (the syntax tree, unlike the parse tree, is not defined by the grammar)

[Figure: parse tree and corresponding (abstract) syntax tree for a sum expression over the identifier a]

(Abstract) Syntax Tree (2)


if-statement:

[Figure: parse trees for the if-statement with and without else part, and the corresponding (abstract) syntax trees; the abstract tree reduces the statement to a node if whose children are the condition exp, stmt1 and, if an else part is present, stmt2]


Directed Acyclic Graph (DAG)


Directed acyclic graph (DAG): a contraction of the AST that avoids duplication: identical subtrees are reused. This exposes redundancies and gives a smaller memory footprint (but beware: changes through assignments or calls must be taken into account).
Example: a * (a-b) + c * (a-b)

[Figure: AST and DAG for the example; in the DAG the subtree a-b is built only once and shared by both multiplications]

CFG: Basic Blocks


The program is broken up into a set of basic blocks.
Def. basic block: a maximum-length sequence of instructions I1, ..., In (n ≥ 1) with exactly one entry point (I1) and exactly one exit point (In). (I.e. no branch instructions except perhaps the last instruction, and no branch targets except perhaps at the first instruction.)


CFG: Definition
The CFG models the flow of control of a procedure: each node represents a basic block, each edge a potential flow of control.
Def.: a control flow graph G is a quadruple G = (N, E, s, e), where (N, E) is a directed graph with nodes n ∈ N representing basic blocks and edges in E modeling the (nondeterministic) branching structure of G, s ∈ N is the entry node, e ∈ N is the exit node, and there is a path from s to every node of G.
Predecessors: preds(x) = {u | (u, x) ∈ E}
Successors: succs(x) = {u | (x, u) ∈ E}
Entry/exit node properties: preds(s) = ∅ (the entry node has no predecessors), succs(e) = ∅ (the exit node has no successors).


Construction CFG: Nodes


Input: sequence of instructions (linear IR). Output: list of basic blocks.
Method:
1. Find the set of leaders (first statements of basic blocks):
   i.  The first stmt of a function is a leader.
   ii. Any stmt that is the target of some jump is a leader.
   iii. Any stmt that follows some jump stmt is a leader.
2. For each leader: all stmts up to, but not including, the next leader (or the end of the function) form its basic block.


Construction CFG: Edges


Input: list of basic blocks BBi of a procedure. Output: CFG.
Method:
1. There is an edge from BBs to BBt if
   i.  there is a jump from the last stmt of BBs to the first stmt of BBt, or
   ii. BBt immediately follows BBs in the program and BBs does not end with an unconditional jump.
2. If there are no unique entry and exit nodes, create such nodes.


Example: Control Flow Graph and Basic Blocks


Procedure: SQRT(L), for a non-negative integer L.

    B1:       READ(L)
              N = 0
              K = 0
              M = 1
    B2:  L1:  K = K + M
              C = K > L
              IF C GOTO L2
    B3:       N = N + 1
              M = M + 2
              GOTO L1
    B4:  L2:  WRITE(N)

[Figure: the CFG with edges B1 → B2, B2 → B3 (fall-through), B2 → B4 (conditional jump to L2) and B3 → B2 (GOTO L1)]

Three-Address Code
Stands for a variety of representations. In general, instructions of the form res := arg1 op arg2, with an operator op, at most two operands arg1 and arg2, and one result res.
Common three-address statements:
- simple assignments
- conditional/unconditional jumps
- indexed assignments: x := y[i], x[i] := y
- address and pointer assignments: x := &y, x := *y
- some more complex operations


Representing Three-Address Code


1. Quadruples: record structure with four fields: op, arg1, arg2, res.
2. Triples: refer to a temporary by the location (index) of the instruction that computes it. Requirement: three-address instructions must be referenceable. Assumption: if a three-address instruction contains all three addresses, the target address is a temporary.
3. Indirect triples: list pointers to triples rather than the triples themselves.


Example: Quadruples and Triples


[Table: one assignment statement over x, y and z encoded three ways. Quadruples: six instructions with explicit result fields t1, ..., t5 and z. Triples: the same instructions numbered (1)-(6), with operands referring to instruction numbers instead of named temporaries. Indirect triples: a list (23)-(28) of pointers to the triples.]

Pros/Cons of Three-Address Representations


Quadruples: four fields, more memory; easy to reorder.
Triples: three fields, memory-efficient; harder to reorder.
Indirect triples: about the same amount of memory as quadruples; as easy to reorder as quadruples.


Implementing Linear IRs: Quadruples


Three ways to implement a list of quadruples:
- Array: the quadruples are stored consecutively in a single array.
- Array of pointers: the array holds pointers to the quadruples.
- Linked list: each quadruple carries a pointer to its successor.

[Figure: the quadruple sequence t1 := ..., ..., z := t5 shown in all three layouts]

Pros / cons?

Addressing Array Elements


Array storage layout:
- row-major order: row by row, rightmost subscript varies fastest (C)
- column-major order: column by column, leftmost subscript varies fastest (Fortran)
- indirection vectors (Java)
Addressing: for type A[low..high], the address of A[i] is base + (i - low) * w, where base is the address of A[low] and w = sizeof(type).


Addressing Array Elements


A[i]: base + (i - low) * w

A[i,j], row-major: base + (i - low1) * len2 * w + (j - low2) * w, with len2 = high2 - low2 + 1
A[i,j], column-major: base + (j - low2) * len1 * w + (i - low1) * w, with len1 = high1 - low1 + 1

A[i], optimized: i * w + (base - low * w)
A[i,j], row-major, optimized: ((i * len2) + j) * w + (base - ((low1 * len2) + low2) * w)
A[i,j], column-major, optimized: ((j * len1) + i) * w + (base - ((low2 * len1) + low1) * w)


Addressing Array Elements


Generalization for A[i1, i2, ..., ik]:

((...((i1 * len2 + i2) * len3 + i3)...) * lenk + ik) * w
+ base - ((...((low1 * len2 + low2) * len3 + low3)...) * lenk + lowk) * w

The second line is evaluated statically by the compiler!
