Semantic Analysis
Goal: check the correctness of the program and enable proper execution.
- Gather semantic information, e.g. the symbol table.
- Check semantic rules (type checking).

Often used approach:
- Basis: a context-free grammar.
- Associate information with language constructs by attaching attributes (properties) to grammar symbols.
- Specify semantic rules (attribute equations) for the grammar productions to compute the values of the attributes.

Attribute grammar: a context-free grammar with attributes and semantic rules.
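An attribute is synthesized at node n if its value is computed from the attribute values at the children of n (and at n itself).
An attribute is inherited at node n if its value is computed from the attribute values at the parent and the siblings of n (and at n itself).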
Compiler Design, WS 2005/2006
Each nonterminal has an integer-valued synthesized attribute val. The integer-valued synthesized attribute lexval is supplied by the lexical analyzer.
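As a minimal sketch of how val can be computed bottom-up, the following C fragment evaluates the attribute in a post-order walk. The node layout, the names `Node` and `eval_val`, and the restriction to addition and multiplication are assumptions made for illustration; they are not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree node; the kinds cover numbers, + and * only. */
typedef enum { NODE_NUM, NODE_ADD, NODE_MUL } NodeKind;

typedef struct Node {
    NodeKind kind;
    int lexval;                        /* set by the lexer for NODE_NUM leaves */
    const struct Node *left, *right;   /* children, NULL for leaves */
} Node;

/* Compute the synthesized attribute val: the value at a node depends
   only on the values at its children (post-order evaluation). */
int eval_val(const Node *n)
{
    assert(n != NULL);
    switch (n->kind) {
    case NODE_NUM:
        return n->lexval;                               /* val = lexval */
    case NODE_ADD:
        return eval_val(n->left) + eval_val(n->right);  /* val = val1 + val2 */
    case NODE_MUL:
        return eval_val(n->left) * eval_val(n->right);  /* val = val1 * val2 */
    }
    return 0; /* not reached for valid node kinds */
}
```

For a tree representing 3 * 5 + 4, `eval_val` on the root yields 19.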
A declaration consists of a type T followed by a list of variables L. The type of T is synthesized and then inherited by L: type is a synthesized attribute, in is an inherited attribute. The procedure symtab adds the type of each identifier to its entry in the symbol table, which is pointed to by the attribute entry.
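The flow of the inherited attribute can be sketched in C as follows. This is only an illustration: the `Entry` record, the `Type` enumeration, and the flattening of the list L into an array are assumptions, and `symtab` here is a stand-in for the procedure named on the slide.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TYPE_UNKNOWN, TYPE_INT, TYPE_REAL } Type;

/* Hypothetical symbol-table entry; only the type field matters here. */
typedef struct {
    const char *name;
    Type type;
} Entry;

/* Stand-in for the symtab procedure: record the type in the entry. */
static void symtab(Entry *entry, Type t)
{
    entry->type = t;
}

/* T.type is synthesized from the type keyword and handed down to the
   list as the inherited attribute `in`; every identifier in the list
   receives the same value. */
void declare_list(Type t_type, Entry *ids[], size_t count)
{
    Type in = t_type;                  /* L.in = T.type */
    for (size_t i = 0; i < count; i++) {
        assert(ids[i] != NULL);
        symtab(ids[i], in);            /* symtab(id.entry, L.in) */
    }
}
```

Calling `declare_list(TYPE_REAL, ids, 2)` for two identifiers sets both entries to `TYPE_REAL`.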
Attribute Grammar
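Consider the production $X_0 \to X_1 X_2 \dots X_n$ and the attributes $a_1, \dots, a_k$, with the values of the attributes denoted by $X_i.a_j$. An attribute equation for this production defines one attribute value $X_i.a_j$ as a function of the other attribute values occurring in the production.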
Symbol Table
Central repository for distinct kinds of information.
Alternative: keep the information directly in the intermediate representation (e.g. as attributes).
Variety of names: variables, defined constants, type definitions, procedures, compiler-generated variables, etc.
Typical symbol table entry:
- identifier
- type information
- scope information
- memory location
Principal symbol table operations: insert, lookup.
Often realized as a hash table.
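A minimal sketch of such a table, assuming separate chaining and integer placeholders for the type, scope, and location fields, could look like this in C. All names (`SymTab`, `symtab_insert`, `symtab_lookup`), the bucket count, and the hash function are illustrative choices, not part of the lecture.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 211            /* assumed bucket count; a prime */

typedef struct Symbol {
    char *name;                   /* identifier */
    int type;                     /* placeholder for type information */
    int scope;                    /* placeholder for scope information */
    int offset;                   /* placeholder for the memory location */
    struct Symbol *next;          /* next entry in the same bucket */
} Symbol;

typedef struct {
    Symbol *buckets[TABLE_SIZE];
} SymTab;

/* Simple multiplicative string hash (an arbitrary choice for this sketch). */
static unsigned hash_name(const char *s)
{
    unsigned h = 0;
    while (*s != '\0')
        h = h * 31u + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

/* lookup: return the entry for name, or NULL if it is not in the table. */
Symbol *symtab_lookup(const SymTab *t, const char *name)
{
    assert(t != NULL && name != NULL);
    for (Symbol *s = t->buckets[hash_name(name)]; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;
    return NULL;
}

/* insert: add a new entry, or return the existing one for the same name.
   Returns NULL only if memory allocation fails. */
Symbol *symtab_insert(SymTab *t, const char *name, int type, int scope, int offset)
{
    Symbol *s = symtab_lookup(t, name);
    if (s != NULL)
        return s;
    s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->name = malloc(strlen(name) + 1);
    if (s->name == NULL) {
        free(s);
        return NULL;
    }
    strcpy(s->name, name);
    s->type = type;
    s->scope = scope;
    s->offset = offset;
    unsigned h = hash_name(name);
    s->next = t->buckets[h];      /* prepend to the bucket's chain */
    t->buckets[h] = s;
    return s;
}
```

The table must start out zero-initialized, for example `SymTab t = { { NULL } };`. Entries are never freed in this sketch.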
Hash Functions (1)
- The hash function must distribute the keys uniformly.
- The hash function should be efficient to compute.
- Choosing a prime number for PMAX is strongly recommended.

Consider hash functions for strings of length $k$ with characters $c_i$, $1 \le i \le k$, where $h$ denotes the computed value. The table index is $h \bmod \mathrm{PMAX}$.
- $\times\alpha$: $h_i = \alpha \cdot h_{i-1} + c_i$ for $1 \le i \le k$, with $h_0 = 0$ and $h = h_k$ (see x65599, x16, x5, x2, x1 in the table).
- quad: every 4 consecutive characters form an integer; these integers are added up to give $h$.
- middle: $h$ is formed from the middle 4 characters of a string.
- ends: adds the first 3 and the last 3 characters of a string to give $h$.
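The multiplicative family can be sketched in C as below. The function name `hash_times_alpha` and its parameters are hypothetical; the multiplier and the table size PMAX are passed in explicitly so that variants such as x65599 or x16 can be tried with the same code.

```c
#include <assert.h>
#include <stddef.h>

/* h_i = alpha * h_{i-1} + c_i with h_0 = 0; the table index is h mod pmax.
   Overflow of the unsigned accumulator wraps around, which is well
   defined in C. */
unsigned hash_times_alpha(const char *s, unsigned alpha, unsigned pmax)
{
    assert(s != NULL && pmax > 0);
    unsigned h = 0;
    for (; *s != '\0'; s++)
        h = alpha * h + (unsigned char)*s;
    return h % pmax;
}
```

With ASCII input, `hash_times_alpha("ab", 2, 211)` computes 2 * 97 + 98 = 292 and returns 292 mod 211 = 81.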
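Hash Functions (2)
Function hashpjw in C (from P.J. Weinberger's C compiler, see the Dragon Book):

#define PRIME 211
#define EOS '\0'

/* Assumes that unsigned is 32 bits wide. */
int hashpjw(const char *s)
{
    const char *p;
    unsigned h = 0, g;

    for (p = s; *p != EOS; p++) {
        h = (h << 4) + (*p);                  /* shift in the next character */
        if ((g = h & 0xf0000000u) != 0) {     /* are the top 4 bits set? */
            h = h ^ (g >> 24);                /* fold them into the low bits */
            h = h ^ g;                        /* and clear them */
        }
    }
    return h % PRIME;
}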
(Experiment from A.V. Aho, R. Sethi, J.D. Ullman: Compilers)
The following test series is used:
1. The 50 most frequent names and keywords from a selection of C programs.
2. As 1, but with the 100 most frequent names.
3. As 1, but with the 500 most frequent names.
4. 952 external names in the UNIX operating system.
5. 627 names of a C program generated from C++.
6. 915 random names.
7. 614 words from Section 3.1 of the Dragon Book (Compilers).
8. 1201 English words with xxx as prefix and suffix.
9. The 300 words v100, v101, ..., v399.
void f(int a, int b)
{
    int c;                      // L1
    {
        int b, z;               // L2a
        ...
    }
    {
        int a, x;               // L2b
        ...
        {
            int c, x;           // L3
            b = a + b + c + w;
        }
    }
}
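How the names in the innermost assignment are resolved can be sketched with a chain of scope records that is searched from the innermost scope outward. The `Scope` structure and the `resolve` function are assumptions for illustration, not the data structure used in the lecture.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAMES 8

/* Hypothetical scope record: the names declared in one block plus a
   link to the enclosing block. */
typedef struct Scope {
    const struct Scope *parent;        /* NULL for the outermost scope */
    const char *label;                 /* e.g. "L1", "L2b", "L3" */
    const char *names[MAX_NAMES];
    size_t count;
} Scope;

/* Return the innermost scope that declares name, searching outward
   from s; return NULL if the name is not declared in any scope. */
const Scope *resolve(const Scope *s, const char *name)
{
    assert(name != NULL);
    for (; s != NULL; s = s->parent)
        for (size_t i = 0; i < s->count; i++)
            if (strcmp(s->names[i], name) == 0)
                return s;
    return NULL;
}
```

With scopes L1 = {a, b, c}, L2b = {a, x}, and L3 = {c, x} chained in that order, resolving from L3 finds a in L2b, b in L1, and c in L3, while w is not found at all.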
D id : T N
Type Checking
Types:
- simple types
- structured types: array, struct, union, pointer
Type equivalence (are two type expressions equivalent?):
- structural/declaration/name equivalence
Type checking:
- Expressions: e.g. usage of operators
- Statements: e.g. Boolean type of conditional expressions
Additional topics in type checking:
- type conversion
- overloading
- etc.
Structural/Declaration/Name-Equivalence
Structural equivalence: two types are equivalent if they have the same structure (i.e. consist of the same components).
Name equivalence: two types are equivalent if they are the same simple type or have the same type name.
Declaration equivalence: type aliases are supported (a weaker version of name equivalence). Two types are equivalent if they lead back to the same type name.
struct A {int a; float b;} a;
struct B {int c; float d;} b;
typedef struct A C;
C c, c1;

            struct.equiv.   decl.equiv.
a = b;      ok              error
a = c;      ok              ok
c = c1;     ok              ok
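The two checks can be sketched in C over a small type representation. `TypeExpr`, `struct_equiv`, and `decl_equiv` are hypothetical names; the sketch assumes non-recursive types, ignores field names, and assumes that aliases such as C have already been resolved to the type they stand for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { T_INT, T_FLOAT, T_STRUCT } Kind;

/* Hypothetical type expression: a simple type or a struct with fields. */
typedef struct TypeExpr {
    Kind kind;
    const char *name;                  /* declared name for structs, else NULL */
    size_t nfields;
    const struct TypeExpr *fields[4];  /* field types in declaration order */
} TypeExpr;

/* Structural equivalence: same kind and pairwise equivalent components. */
bool struct_equiv(const TypeExpr *s, const TypeExpr *t)
{
    if (s->kind != t->kind)
        return false;
    if (s->kind != T_STRUCT)
        return true;
    if (s->nfields != t->nfields)
        return false;
    for (size_t i = 0; i < s->nfields; i++)
        if (!struct_equiv(s->fields[i], t->fields[i]))
            return false;
    return true;
}

/* Declaration equivalence: both types lead back to the same type name. */
bool decl_equiv(const TypeExpr *s, const TypeExpr *t)
{
    if (s->kind != t->kind)
        return false;
    if (s->kind != T_STRUCT)
        return true;
    return strcmp(s->name, t->name) == 0;
}
```

For the declarations above, struct A and struct B compare equal under `struct_equiv` but not under `decl_equiv`, which matches the first row of the table.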
Intermediate Representations
Taxonomy: Axis 1
Organizational structure:
1. Structural representations
   - trees, e.g. parse tree, abstract syntax tree
   - graphs, e.g. control flow graph
2. Linear representations
   - pseudo-code for some abstract machine, e.g. three-address code
3. Hybrid representations
   - combination of graphs and linear code, e.g. control flow graph
Taxonomy, Axis 2
Level of abstraction:
1. High-level IR
   - close to the source program
   - appropriate for high-level optimizations (loop transformations) and source-to-source translators
   - e.g. control flow graph
2. Medium-level IR
   - represents source variables and temporaries
   - reduces control flow to conditional and unconditional branches
   - machine independent, powerful instruction set
3. Low-level IR
   - almost target machine instructions
[Figure: parse tree with the nonterminal E over the operand a and the operator +]
[Figure: syntax trees for if statements, with the children exp and stmt1 and, for the else variant, stmt2]
[Figure: expression tree and the corresponding DAG over the operands a, b, c with the operators + and *]
CFG: Definition
A CFG models the flow of control of a procedure: each node represents a basic block, each edge represents a potential flow of control.
Def.: A control flow graph $G$ is a quadruple $G = (N, E, s, e)$, where $(N, E)$ is a directed graph with nodes $n \in N$ representing basic blocks and edges $(u, v) \in E$ modeling the nondeterministic branching structure of $G$, $s \in N$ is the entry node, $e \in N$ is the exit node, and there is a path from $s$ to every node of $G$.
- Predecessors: $\mathit{preds}(x) = \{ u \mid (u, x) \in E \}$
- Successors: $\mathit{succs}(x) = \{ u \mid (x, u) \in E \}$
- Start/end node properties:
  $\mathit{preds}(s) = \emptyset$ (the start node has no predecessors)
  $\mathit{succs}(e) = \emptyset$ (the end node has no successors)
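The definition translates fairly directly into code. The following C sketch stores E as an adjacency matrix, which is an assumption made for brevity, and derives preds and succs from it; the names `Cfg`, `cfg_preds`, `cfg_succs`, and `cfg_well_formed` are hypothetical. The check covers only the two start/end node properties, not reachability from s.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 8

typedef struct {
    size_t n;                            /* number of basic blocks, |N| */
    bool edge[MAX_NODES][MAX_NODES];     /* edge[u][v] is true iff (u,v) in E */
    size_t entry_node;                   /* s */
    size_t exit_node;                    /* e */
} Cfg;

/* Store preds(x) = { u | (u,x) in E } in out and return its size. */
size_t cfg_preds(const Cfg *g, size_t x, size_t out[MAX_NODES])
{
    assert(g != NULL && x < g->n);
    size_t k = 0;
    for (size_t u = 0; u < g->n; u++)
        if (g->edge[u][x])
            out[k++] = u;
    return k;
}

/* Store succs(x) = { u | (x,u) in E } in out and return its size. */
size_t cfg_succs(const Cfg *g, size_t x, size_t out[MAX_NODES])
{
    assert(g != NULL && x < g->n);
    size_t k = 0;
    for (size_t u = 0; u < g->n; u++)
        if (g->edge[x][u])
            out[k++] = u;
    return k;
}

/* Check preds(s) = {} and succs(e) = {}. */
bool cfg_well_formed(const Cfg *g)
{
    size_t tmp[MAX_NODES];
    return cfg_preds(g, g->entry_node, tmp) == 0
        && cfg_succs(g, g->exit_node, tmp) == 0;
}
```

For a diamond-shaped graph with edges 0→1, 0→2, 1→3, and 2→3, node 3 has the predecessors 1 and 2, and the graph passes `cfg_well_formed` with entry 0 and exit 3.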
[Figure: control flow graphs over the basic blocks B1 to B4]
Three-Address Code
Stands for a variety of representations. In general, instructions have the form

res := arg1 op arg2

with an operator op, at most two operands arg1 and arg2, and one result res.
Common three-address statements:
- simple assignments
- conditional/unconditional jumps
- indexed assignments: x := y[i], x[i] := y
- address and pointer assignments: x := & y, x := *y
- some more complex operations?
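One possible in-memory form of such an instruction is a record with four fields. The C sketch below is an assumption for illustration: it stores operator, operands, and result as strings and formats them in the notation used above; `Quad` and `quad_format` are hypothetical names, and jumps and indexed assignments are left out.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record for one three-address instruction. */
typedef struct {
    const char *op;      /* operator, or NULL for a plain copy */
    const char *arg1;    /* first operand (always present) */
    const char *arg2;    /* second operand, or NULL for unary ops and copies */
    const char *res;     /* result */
} Quad;

/* Write the instruction to buf as "res := arg1 op arg2" (binary),
   "res := op arg1" (unary) or "res := arg1" (copy).
   Returns the value returned by snprintf. */
int quad_format(const Quad *q, char *buf, size_t size)
{
    assert(q != NULL && q->arg1 != NULL && q->res != NULL);
    if (q->op == NULL)
        return snprintf(buf, size, "%s := %s", q->res, q->arg1);
    if (q->arg2 == NULL)
        return snprintf(buf, size, "%s := %s %s", q->res, q->op, q->arg1);
    return snprintf(buf, size, "%s := %s %s %s", q->res, q->arg1, q->op, q->arg2);
}
```

For example, the record `{ "*", "2", "y", "t1" }` is formatted as `t1 := 2 * y`, and `{ "&", "y", NULL, "x" }` as `x := & y`.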
Triples
Indirect triples
Array of pointers: [Figure: three-address instructions defining t1 to t5 and z, referenced through an array of pointers]
Linked list: [Figure: the same three-address instructions defining t1 to t5 and z, stored as a linked list]
Pros / Cons ?
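Optimized address computation:
- A[i]: $i \cdot w + (\mathit{base} - \mathit{low} \cdot w)$
- A[i,j], row-major: $((i \cdot \mathit{len}_2) + j) \cdot w + (\mathit{base} - ((\mathit{low}_1 \cdot \mathit{len}_2) + \mathit{low}_2) \cdot w)$
- A[i,j], column-major: $((j \cdot \mathit{len}_1) + i) \cdot w + (\mathit{base} - ((\mathit{low}_2 \cdot \mathit{len}_1) + \mathit{low}_1) \cdot w)$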
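$$
\begin{aligned}
& ((\cdots((i_1 \cdot \mathit{len}_2 + i_2) \cdot \mathit{len}_3 + i_3) \cdots) \cdot \mathit{len}_k + i_k) \cdot w \\
& {} + \mathit{base} - ((\cdots((\mathit{low}_1 \cdot \mathit{len}_2 + \mathit{low}_2) \cdot \mathit{len}_3 + \mathit{low}_3) \cdots) \cdot \mathit{len}_k + \mathit{low}_k) \cdot w
\end{aligned}
$$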
Second line: evaluated statically by compiler!
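The split into an index-dependent and an index-independent part can be sketched in C as follows, assuming row-major layout. The descriptor `ArrayDesc` and the functions `static_part` and `element_address` are hypothetical; in a compiler, the value returned by `static_part` would be folded into a constant at compile time when bounds and base are known.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical array descriptor for a k-dimensional array. */
typedef struct {
    size_t k;            /* number of dimensions, at least 1 */
    const long *low;     /* low[0..k-1]: lower bounds low_1..low_k */
    const long *len;     /* len[0..k-1]: lengths len_1..len_k (len_1 unused) */
    long w;              /* element width */
    long base;           /* address of the first element */
} ArrayDesc;

/* Horner-style evaluation of
   (...((x_1 * len_2 + x_2) * len_3 + x_3)...) * len_k + x_k. */
static long fold(const ArrayDesc *a, const long *x)
{
    long acc = x[0];
    for (size_t d = 1; d < a->k; d++)
        acc = acc * a->len[d] + x[d];
    return acc;
}

/* Index-independent part: base minus the folded lower bounds times w. */
long static_part(const ArrayDesc *a)
{
    assert(a != NULL && a->k >= 1);
    return a->base - fold(a, a->low) * a->w;
}

/* Address of A[i_1, ..., i_k]: folded indices times w plus the static part. */
long element_address(const ArrayDesc *a, const long *idx)
{
    assert(a != NULL && idx != NULL && a->k >= 1);
    return fold(a, idx) * a->w + static_part(a);
}
```

For a 3 x 4 array with lower bounds 1, w = 4, and base = 1000, the static part is 1000 - (1 * 4 + 1) * 4 = 980, and the address of A[2,3] is (2 * 4 + 3) * 4 + 980 = 1024.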