Introduction
Lexical Analysis
Christian Schulte
IMIT, KTH
www.imit.kth.se/~schulte/

Overview
Organizational
Course overview
Compiler structure
Lexical analysis
2005-10-25
Textbook
Andrew W. Appel, Modern Compiler Implementation in Java, 2nd edition, Cambridge University Press, 2002.
Organizational
Course committee (kursnämnd)
No labs
Examination
course passed
labs passed
full exam
240 points
Reading Suggestion
Chapters 1 and 2
Course Overview
Compiler
Two aspects
source language
target language
execution
execute program
Execution Environments
Can be concrete hardware
Compilation
Compilation Phases
[Figure: source program → frontend → intermediate representation → backend → target program]
Frontend: Tasks
Lexical analysis
Syntax analysis
Semantic analysis
Optimization
Instruction selection
Register allocation

Optimization
Dead-code elimination
Strength reduction
Constant/value propagation
Code motion
Overall Structure
Compiler has two main phases
analysis: understand program ("front end")
synthesis: put it together in a different way ("back end")

Lexical Analysis
Lexical Analyzer
Also: lexer
Takes a stream of characters
Produces a stream of tokens: names, keywords, punctuation marks
Discards white space and comments
Simple task

Lexical Tokens
Non-tokens
Example Program
float match0(char* s) {
    /* find a zero */
    if (!strncmp(s, "0.0", 3))
        return 0.;
}
FLOAT ID(match0) LPAREN CHAR STAR ID(s) RPAREN LBRACE
IF LPAREN BANG ID(strncmp) LPAREN ID(s) COMMA STRING(0.0) COMMA NUM(3) RPAREN RPAREN
RETURN REAL(0.0) SEMI
RBRACE EOF

Approach
Specification of lexical tokens
regular expression (regexp)
Implementation of lexer
deterministic finite automaton (DFA)
Computing DFA from regexp
nondeterministic finite automaton (NFA)
Regular Expressions
Language: set of strings
String: finite sequence of symbols
Example
language of primes:

Regular Expressions
Symbol
Alternation      M|N    string in language of M or in language of N
Concatenation    MN
Epsilon          ε
Repetition       M*

Regular Expressions
a|b           {"a","b"}
(a|b)a        {"aa","ba"}
(ab)|ε        {"ab",""}
((a|b)a)*     {"","aa","ba","aaaa","aaba","baaa","baba",…}
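The example languages can be checked mechanically. The following is a minimal sketch, assuming Python's standard `re` module as a stand-in for the slide notation (ε is written as an empty alternative). The helper name `language` is hypothetical; it enumerates all strings over {a, b} up to a given length and keeps those the expression matches in full.

```python
import re
from itertools import product

def language(regexp, alphabet="ab", max_len=4):
    """Return the strings over `alphabet` (length <= max_len) that `regexp` matches entirely."""
    pattern = re.compile(regexp)
    found = []
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            s = "".join(symbols)
            if pattern.fullmatch(s):
                found.append(s)
    return found

print(language("a|b"))        # ['a', 'b']
print(language("(a|b)a"))     # ['aa', 'ba']
print(language("(ab)|"))      # ['', 'ab']  (empty alternative plays the role of epsilon)
print(language("((a|b)a)*"))  # ['', 'aa', 'ba', 'aaaa', 'aaba', 'baaa', 'baba']
```

Because `((a|b)a)*` denotes an infinite language, the enumeration only shows its members up to `max_len`.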
Lexical Specification
Examples
Conventions
Sometimes omit · or ε
ab means a·b
(a|) means (a|ε)
ab* means a(b)*
Abbreviations
[abcd]         means a|b|c|d
[b-g]          means [bcdefg]
[a-cA-C01]     means [abcABC01]
M?             means (M|ε)
M+             means (MM*)
.              any character but newline
"xyz+-*"       stands for itself

Programming Language
Token Specifications
if                                       IF
[a-z][a-z0-9]*                           ID
[0-9]+                                   NUM
([0-9]+"."[0-9]*)|([0-9]*"."[0-9]+)      REAL
(" "|"\t"|"\n"|"\r")                     no token
.                                        error
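Each abbreviation in this section is only shorthand for a longer regular expression. A small sketch, assuming Python's `re` module (whose character classes, `?`, `+` and `.` happen to follow the same conventions), compares an abbreviation with its expansion on all short strings over a sample alphabet. The helper `same_language` and the sample alphabets are hypothetical choices made for illustration.

```python
import re
from itertools import product

def same_language(short, expanded, alphabet, max_len=3):
    """True if both regexps accept exactly the same strings over `alphabet` up to max_len."""
    a, b = re.compile(short), re.compile(expanded)
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            s = "".join(symbols)
            if bool(a.fullmatch(s)) != bool(b.fullmatch(s)):
                return False
    return True

# (abbreviation, expansion, sample alphabet to test over)
checks = [
    ("[abcd]", "a|b|c|d", "abcde"),
    ("[b-g]", "[bcdefg]", "abcdefgh"),
    ("[a-cA-C01]", "[abcABC01]", "abcdABCD012"),
    ("(ab)?", "(ab|)", "ab"),       # M? means (M|epsilon), with M = ab
    ("(ab)+", "(ab)(ab)*", "ab"),   # M+ means MM*, with M = ab
]
for short, expanded, alphabet in checks:
    print(short, "vs", expanded, "->", same_language(short, expanded, alphabet))

# "." matches any character but newline.
print(re.fullmatch(".", "x") is not None, re.fullmatch(".", "\n") is None)
```

The check is bounded by `max_len`, so it illustrates the equivalences rather than proving them.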
Disambiguation
Does if8 match ID or IF NUM(8)?
Disambiguation rules commonly used
longest match
rule priority
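The two disambiguation rules can be sketched as a small table-driven lexer. This is an illustration under stated assumptions, not the course's implementation: the rule list transcribes the token specifications into Python `re` syntax, the function name `tokenize` and the output format are hypothetical, and the error rule is modelled by raising `ValueError`.

```python
import re

# Token rules in priority order; None marks white space, which yields no token.
RULES = [
    ("IF",   r"if"),
    ("ID",   r"[a-z][a-z0-9]*"),
    ("NUM",  r"[0-9]+"),
    ("REAL", r"([0-9]+\.[0-9]*)|([0-9]*\.[0-9]+)"),
    (None,   r"[ \t\n\r]"),
]
COMPILED = [(name, re.compile(rx)) for name, rx in RULES]

def tokenize(text):
    """Split text into tokens: longest match first, then rule priority."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, rx in COMPILED:
            m = rx.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on a tie the earlier (higher-priority) rule is kept.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_len == 0:
            raise ValueError(f"illegal character {text[pos]!r} at position {pos}")
        lexeme = text[pos:pos + best_len]
        if best_name is not None:
            tokens.append("IF" if best_name == "IF" else f"{best_name}({lexeme})")
        pos += best_len
    return tokens

print(tokenize("if8"))   # ['ID(if8)']       longest match beats the IF rule
print(tokenize("if 8"))  # ['IF', 'NUM(8)']
print(tokenize("if"))    # ['IF']            tie between IF and ID, priority picks IF
```

Each pattern here is written so that a single `match` call already returns its longest match; a general implementation would instead run the combined DFA and remember the last final state seen.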
Finite Automata
Finite Automata
[Figure: finite automaton state diagram (edge label i, state 1)]
Start state: 1
Final states: 3
Finite Automata
Deterministic finite automaton (DFA)
[Figure: finite automaton state diagram (edge labels a-z, 0-9)]
Start state: 1
Final states: 2
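A DFA like this one can be run with a simple loop over the input. The sketch below assumes the edges are 1 to 2 on a-z and 2 to 2 on a-z or 0-9, which is one reading of the surviving labels and matches the ID pattern `[a-z][a-z0-9]*`; the diagram itself is not available, and the names `step` and `accepts` are hypothetical.

```python
import string

START = 1
FINAL = {2}

def step(state, ch):
    """Return the next state, or None if there is no edge for ch (assumed edges)."""
    if state == 1 and ch in string.ascii_lowercase:
        return 2
    if state == 2 and (ch in string.ascii_lowercase or ch in string.digits):
        return 2
    return None

def accepts(text):
    """Follow exactly one edge per character; accept if the run ends in a final state."""
    state = START
    for ch in text:
        state = step(state, ch)
        if state is None:
            return False  # stuck: no edge for this character
    return state in FINAL

print(accepts("if8"))  # True
print(accepts("8x"))   # False
```

Determinism shows up in `step` returning at most one state for each pair of state and character.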
Accepted Language
Example DFA
[Figure: DFA state diagram (edge labels a, b; states 1, 2, 4, 5)]
Accepting abab
String to process    State
abab                 1 (start state)
bab                  4
ab                   5
b                    4
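The step-by-step table can be produced by a table-driven loop. In this sketch the transition table `DELTA` and the final-state set are hypothetical: they were chosen only so that the trace reproduces the rows listed for abab, since the original state diagram is not recoverable from the text.

```python
# Hypothetical transition table, picked to reproduce the listed rows only.
DELTA = {(1, "a"): 4, (4, "b"): 5, (5, "a"): 4}
START = 1
FINAL = {5}  # assumption: the state reached after abab is final

def trace(text):
    """Return (string still to process, current state) pairs, one per step."""
    state = START
    rows = [(text, state)]
    for i, ch in enumerate(text):
        state = DELTA.get((state, ch))
        if state is None:
            break  # no edge for ch: the automaton is stuck
        rows.append((text[i + 1:], state))
    return rows

def accepts(text):
    """Accept if the whole input was consumed and the last state is final."""
    remaining, state = trace(text)[-1]
    return remaining == "" and state in FINAL

for remaining, state in trace("abab"):
    print(f"{remaining!r:8} {state}")
```

With this table the first four rows printed are the ones in the table above; the last row shows the empty string together with the state in which the run ends.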
Combining DFAs
Nondeterministic Finite Automata
NFAs
Example NFA
[Figure: NFA state diagram (edge labels a, b; states 1, 2, 4, 5)]
How to accept?
To process: abbb
Accepting abbb
String to process    Set of states
abbb                 {1} (containing start state)
bbb                  {2,4}
bb                   {2,3,5}
b                    {2,3}
(empty)              {2,3}
immediately
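Tracking a set of states instead of a single state is what makes the NFA run deterministic in practice. The sketch below is an assumption-laden illustration: the transition table `DELTA` and the final state 3 are hypothetical, chosen only so that the state sets for abbb come out as listed above, and the NFA is assumed to have no ε edges.

```python
# Hypothetical NFA: each (state, symbol) pair maps to a SET of successor states.
DELTA = {
    (1, "a"): {2, 4},
    (2, "b"): {2, 3},
    (4, "b"): {5},
}
START = 1
FINAL = {3}  # assumption

def trace(text):
    """Return (string still to process, set of possible states) pairs."""
    current = {START}
    rows = [(text, current)]
    for i, ch in enumerate(text):
        nxt = set()
        for state in current:
            # Union of all successors over every state the NFA might be in.
            nxt |= DELTA.get((state, ch), set())
        current = nxt
        rows.append((text[i + 1:], current))
    return rows

def accepts(text):
    """Accept if at least one possible state at the end is final."""
    return bool(trace(text)[-1][1] & FINAL)

for remaining, states in trace("abbb"):
    print(f"{remaining!r:8} {sorted(states)}")
print(accepts("abbb"))  # True
```

Each set computed here corresponds to one state of the DFA that the subset construction would build from the NFA.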
Summary
Compilers
Summary