Language About Compiler Construction

What is a language?
• An alphabet is a well-defined set of
characters. The character Σ is typically used
to represent an alphabet.
• A string is a finite sequence of alphabet
symbols. It can be ε, the empty string (some
texts use λ for the empty string).
• A language, L, is simply any set of strings
(infinite or finite) over a fixed alphabet. It can be
{ }, the empty language.
1
What is a language? (cont’d)
Examples:
Alphabet: A-Z Language: English
Alphabet: ASCII Language: C++
2
Suppose Σ = {a,b,c}. Some languages
over Σ could be:
• {aa,ab,ac,bb,bc,cc}
• {ab,abc,abcc,abccc,…}
• {ε}
• {}
• {a,b,c,ε}
• …
3
What is a language? (cont’d)
Alphabet    Languages
{0,1}       {0,10,100,1000,10000,…}
            {0,1,00,11,000,111,…}
{a,b,c}     {abc,aabbcc,aaab,bbccc}
4
Regular Languages
• Formally describe tokens in the language
– Regular Expressions
– NFA
– DFA
5
Regular Expressions
• A regular expression is a set of rules
(techniques) for constructing sequences of
symbols (strings) from an alphabet.
If A is a regular expression, then L(A) is the
language defined by that regular expression.
L(“c”) is the language with the single word “c”.
L(“i” “f”) is the language with just “if” in it.
6
Regular Expressions (cont’d)
L(“if” | “then” | “else”) is the language with just
the words “if”, “then”, and “else”.
L((“0” | “1”)(“0” | “1”)) is the language
consisting of “00”, “01”, “10” and “11”.
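These small languages can be checked by brute force. The sketch below uses Python's re module as a stand-in for the slide notation; language_of is a hypothetical helper that tries every string over a given alphabet up to a length bound and keeps those the pattern matches in full.

```python
import re
from itertools import product

def language_of(pattern, alphabet, max_len):
    """List the strings over `alphabet` (up to max_len symbols) that the pattern matches in full."""
    words = []
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            word = "".join(symbols)
            if re.fullmatch(pattern, word):
                words.append(word)
    return words

print(language_of(r"(0|1)(0|1)", "01", 3))          # ['00', '01', '10', '11']
print(language_of(r"if|then|else", "ifthenls", 4))  # ['if', 'then', 'else']
```

The bound max_len is only needed to keep the search finite; both languages here are finite anyway.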
7
Regular Expressions (cont’d)
• Let Σ be an alphabet and r a regular
expression. Then L(r) is the language that
is characterized by the rules of r.
8
Rules
• Fix an alphabet Σ.
• ε is a regular expression (it denotes the language {ε}).
• If a is in Σ, then a is a regular expression (it denotes the
language {a}).
• If r and s are regular expressions denoting L(r) and L(s)
respectively, then so are:
– (r) | (s), which denotes the language L(r) ∪ L(s)
– (r)(s), which denotes the language L(r)L(s)
– (r)*, which denotes the language (L(r))*
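Because the rules are inductive, L(r) can be computed by recursion on the structure of r. The following is a minimal sketch under stated assumptions: the nested-tuple encoding and the function name lang are invented for this example, symbols are single characters, and since (r)* is usually infinite, only strings of at most max_len symbols are produced.

```python
def lang(r, max_len):
    """Return the strings of L(r) that have at most max_len symbols.

    r is a nested tuple (hypothetical encoding):
      ("eps",)        the empty string
      ("sym", a)      a single symbol a of the alphabet
      ("alt", r, s)   (r) | (s)
      ("cat", r, s)   (r)(s)
      ("star", r)     (r)*
    """
    kind = r[0]
    if kind == "eps":
        return {""}
    if kind == "sym":
        return {r[1]} if max_len >= 1 else set()
    if kind == "alt":
        return lang(r[1], max_len) | lang(r[2], max_len)
    if kind == "cat":
        return {x + y
                for x in lang(r[1], max_len)
                for y in lang(r[2], max_len)
                if len(x + y) <= max_len}
    if kind == "star":
        base = lang(r[1], max_len)
        result = {""}       # zero repetitions
        frontier = {""}
        while frontier:     # keep appending one more copy until nothing new fits
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node: {kind}")

a_or_b = ("alt", ("sym", "a"), ("sym", "b"))
r = ("cat", ("star", a_or_b), ("sym", "b"))   # (a|b)*b
print(sorted(lang(r, 2)))                     # ['ab', 'b', 'bb']
```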
9
Example
L = {A, B, C, D}    D = {1, 2, 3}
A | B | C | D = L
(A | B | C | D)(A | B | C | D) = L²
(A | B | C | D)* = L*
(A | B | C | D)((A | B | C | D) | (1 | 2 | 3)) = L(L ∪ D)
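The same example can be worked with Python sets, treating each language as a set of strings. The helper concat is an illustrative name, not part of the slides.

```python
L = {"A", "B", "C", "D"}
D = {"1", "2", "3"}

def concat(X, Y):
    """Concatenation of two languages: every x followed by every y."""
    return {x + y for x in X for y in Y}

L2 = concat(L, L)              # (A|B|C|D)(A|B|C|D)
L_then_LD = concat(L, L | D)   # (A|B|C|D)((A|B|C|D)|(1|2|3))

print(len(L2))                               # 16 two-letter strings
print(len(L_then_LD))                        # 28: a letter followed by a letter or a digit
print("A1" in L_then_LD, "1A" in L_then_LD)  # True False
```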
10
Regular Expression Operation
• There are three basic operations in regular
expressions:
– Alternation (union): RE1 | RE2
– Concatenation: RE1 RE2
– Repetition (closure): RE* (zero or more
REs)
11
Regular Expression Operation
If P and Q are regular expressions over Σ, then so are:
• P | Q (union)
If P denotes the set {a,…,e} and Q denotes the set {0,…,9}, then P | Q
denotes the set {a,…,e,0,…,9}
• PQ (concatenation)
If P denotes the set {a,…,e} and Q denotes the set {0,…,9}, then PQ
denotes the set {a0,…,e0,a1,…,e9}
• Q* (closure)
If Q denotes the set {0,…,9}, then Q* denotes the set
{ε,0,…,9,00,…,99,…}
12
Examples
If Σ = {a,b}:
• (a | b)*b
• b(a | b)*
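A quick way to see what these two expressions describe is to list their short members. The sketch below reads the second expression as b(a|b)*, which is an assumption about the slide's intent; the generator words is a hypothetical helper.

```python
import re
from itertools import product

def words(alphabet, max_len):
    """Yield every string over `alphabet` with 0..max_len symbols, shortest first."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)

ends_in_b = [w for w in words("ab", 2) if re.fullmatch(r"(a|b)*b", w)]
starts_with_b = [w for w in words("ab", 2) if re.fullmatch(r"b(a|b)*", w)]
print(ends_in_b)      # ['b', 'ab', 'bb']
print(starts_with_b)  # ['b', 'ba', 'bb']
```

So (a|b)*b is the set of strings over {a,b} that end in b, and b(a|b)* is the set of strings that start with b.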
13
Regular Expression Overview
Expression   Meaning
ε            Empty pattern
a            Any pattern represented by ‘a’
ab           Strings with pattern ‘a’ followed by ‘b’
a|b          Strings consisting of pattern ‘a’ or ‘b’
a*           Zero or more occurrences of patterns in ‘a’
a+           One or more occurrences of patterns in ‘a’
a³           Patterns in ‘a’ repeated exactly 3 times
14
• A regular expression R describes a set of
strings of characters denoted L(R)
• L(R) = the language defined by R
– L(abc) = { abc }
– L(hello|goodbye) = { hello, goodbye }
– L(1(0|1)*) = all binary numbers that start with a 1
• Each token can be defined using a regular
expression
15
RE Notational Shorthand
• R+      one or more strings of R: R(R*)
• R?      optional R: (R|ε)
• [abcd]  one of the listed characters: (a|b|c|d)
• [a-z]   one character from this range: (a|b|c|d|...|z)
• [^ab]   any character except the listed ones
• [^a-z]  any character not from this range
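Each shorthand is only an abbreviation for an expression built from the three basic operations. As a rough check, the sketch below compares a shorthand and its expansion on a handful of sample strings using Python's re module; agree and SAMPLES are names made up for this example, and agreeing on a sample is evidence, not a proof of equivalence.

```python
import re

SAMPLES = ["", "a", "aa", "aaa", "b", "d", "e", "ab"]

def agree(shorthand, expanded):
    """True if both patterns accept exactly the same strings from SAMPLES."""
    return all(
        bool(re.fullmatch(shorthand, s)) == bool(re.fullmatch(expanded, s))
        for s in SAMPLES
    )

print(agree(r"a+", r"aa*"))               # R+ written as R(R*)
print(agree(r"a?", r"(a|)"))              # R? written as (R|ε); the empty alternative plays the role of ε
print(agree(r"[abcd]", r"(a|b|c|d)"))     # a character class written as an alternation
print(bool(re.fullmatch(r"[^ab]", "c")))  # [^ab] accepts one character other than a or b
```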
16
Regular expression, R          Strings in L(R)
– a                            “a”
– ab                           “ab”
– a|b                          “a”, “b”
– (ab)*                        “”, “ab”, “abab”, ...
– (a|ε)b                       “ab”, “b”
– digit = [0-9]                “0”, “1”, “2”, ...
– posint = digit+              “8”, “412”, “23”, “34”, ...
– [0-9]+ (ε|(.[0-9]+))         “12”, “1.056”, ...
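The last expression in the table, digits optionally followed by a point and more digits, translates almost directly into Python's regex syntax. This is a sketch of one possible translation; the name NUMBER is an assumption for illustration.

```python
import re

# The dot is escaped so that it means a literal '.', and the optional
# fractional part uses '?' in place of the explicit empty-string alternative.
NUMBER = re.compile(r"[0-9]+(\.[0-9]+)?")

for text in ["12", "1.056", "", "1.", ".5", "1.2.3"]:
    print(repr(text), bool(NUMBER.fullmatch(text)))
```

Only "12" and "1.056" are accepted: the expression requires at least one digit on each side of the point when a point is present.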
17
More Examples
• All strings that start with “tab” or end with
“bat”:
tab{A,…,Z,a,…,z}* | {A,…,Z,a,…,z}*bat
• All strings in which {1,2,3} exist in ascending
order:
{A,…,Z}*1{A,…,Z}*2{A,…,Z}*3{A,…,Z}*
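Both examples can be tried out in Python, writing the sets {A,…,Z,a,…,z} and {A,…,Z} as character classes. The names TAB_OR_BAT and ASCENDING are illustrative only.

```python
import re

# "tab" followed by letters, or letters followed by "bat"
TAB_OR_BAT = re.compile(r"tab[A-Za-z]*|[A-Za-z]*bat")
# 1, 2, 3 in ascending order, each separated by any number of capital letters
ASCENDING = re.compile(r"[A-Z]*1[A-Z]*2[A-Z]*3[A-Z]*")

print(bool(TAB_OR_BAT.fullmatch("table")))    # True
print(bool(TAB_OR_BAT.fullmatch("combat")))   # True
print(bool(TAB_OR_BAT.fullmatch("stable")))   # False
print(bool(ASCENDING.fullmatch("A1BC23Z")))   # True
print(bool(ASCENDING.fullmatch("3A2B1")))     # False
```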
18
Defining Our Language
The first things we can define in our language are
keywords. These are easy:
if | else | while | find | …
When we scan a file, we can either have a single
token represent all keywords, or else break them
down into groups, such as “commands”, “types”,
etc.
19
Language Def (cont’d)
Next we will define integers in a language:
digit = “0” | “1” | “2” | “3” | “4” | “5” | “6” | “7” | “8” | “9”
integer = {digit}+
Note that we can abbreviate ranges using the dash (“-”). Thus,
digit = 0-9
Relation = ‘<’ | ‘<=’ | ‘>’ | ‘>=’ | ‘<>’ | ‘=’
Floating point numbers are not much more complicated:
float = {digit}+ “.” {digit}+
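The integer, relation, and float definitions can be written as Python patterns and used to classify a lexeme. This is a minimal sketch: classify and the constant names are assumptions for the example, and a real scanner would match prefixes of the input instead of whole strings.

```python
import re

INTEGER = re.compile(r"[0-9]+")            # integer = {digit}+
FLOAT = re.compile(r"[0-9]+\.[0-9]+")      # float = {digit}+ "." {digit}+
# Longer operators come first, which matters when matching a prefix of the input.
RELATION = re.compile(r"<=|>=|<>|<|>|=")

def classify(lexeme):
    """Return the name of the first definition that matches the whole lexeme, else None."""
    for name, pattern in [("integer", INTEGER), ("float", FLOAT), ("relation", RELATION)]:
        if pattern.fullmatch(lexeme):
            return name
    return None

for lexeme in ["42", "3.14", "<=", "<>", "3.", "=="]:
    print(lexeme, classify(lexeme))
```

Note that "3." is rejected: the float definition requires at least one digit after the point.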

20
Language Def (cont’d)
Identifiers are strings of letters, underscores, or digits
beginning with a non-digit.
letter = a-z | A-Z
digit = 0-9
Identifier = ({letter} | “_”)({letter} | “_” | {digit})*
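Putting the definitions together gives a small scanner. The sketch below is one possible arrangement, not the only one: it uses a single "keyword" token kind for all keywords, recognizes keywords by looking up identifiers in a set, and the names KEYWORDS, TOKEN, and scan are hypothetical.

```python
import re

KEYWORDS = {"if", "else", "while", "find"}   # the keywords named on the earlier slide
TOKEN = re.compile(r"""
    (?P<float>[0-9]+\.[0-9]+)                 # tried before integer so 3.14 is one token
  | (?P<integer>[0-9]+)
  | (?P<identifier>[A-Za-z_][A-Za-z_0-9]*)    # non-digit first, then letters, _, digits
  | (?P<relation><=|>=|<>|<|>|=)
  | (?P<skip>\s+)                             # whitespace separates tokens
""", re.VERBOSE)

def scan(text):
    """Split text into (kind, lexeme) pairs; raise on a character no rule accepts."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        if kind == "identifier" and lexeme in KEYWORDS:
            kind = "keyword"
        if kind != "skip":
            tokens.append((kind, lexeme))
        pos = m.end()
    return tokens

print(scan("while x_1 <= 3.14"))
```

Breaking keywords into groups such as “commands” and “types” would only require replacing the KEYWORDS set with a dictionary from keyword to group name.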
21
Real-world example
What is the regular expression that defines all phone
numbers in the US?
Σ = { 0-9 }
Area = {digit}³
Exchange = {digit}³
Local = {digit}⁴
Phone_number = “(” {Area} “)” {Exchange} {Local}
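The phone number definition carries over to Python with counted repetition. The sketch follows the slide's definition literally, so it accepts no spaces or dashes; PHONE and the group names are assumptions for illustration, and the sample number is made up.

```python
import re

# Area and Exchange are three digits each and Local is four digits.
# The parentheses around the area code are escaped because unescaped
# parentheses are grouping operators in Python's regex syntax.
PHONE = re.compile(r"\((?P<area>[0-9]{3})\)(?P<exchange>[0-9]{3})(?P<local>[0-9]{4})")

m = PHONE.fullmatch("(555)0100199")
print(m.group("area"), m.group("exchange"), m.group("local"))  # 555 010 0199
print(PHONE.fullmatch("555-010-0199"))  # None: this definition allows no dashes
```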

22
THANKS
23
