Examples:
Alphabet: A-Z Language: English
Alphabet: ASCII Language: C++
Suppose Σ = {a, b, c}. Some languages
over Σ could be:
{aa, ab, ac, bb, bc, cc}
{ab, abc, abcc, abccc, ...}
{ε}
{}
{a, b, c, ε}
…
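A language over an alphabet is just a set of strings built from that alphabet's symbols. The following is a minimal sketch of that idea; the helper names `is_over_alphabet` and `strings_up_to` are illustrative and do not come from the slides.

```python
from itertools import product

SIGMA = {"a", "b", "c"}  # the alphabet {a, b, c} used in the example

def is_over_alphabet(language, alphabet):
    """Return True if every string in `language` uses only symbols of `alphabet`."""
    return all(set(word) <= alphabet for word in language)

def strings_up_to(alphabet, max_len):
    """All strings over `alphabet` of length 0..max_len (a finite slice of all strings)."""
    return {"".join(p)
            for n in range(max_len + 1)
            for p in product(sorted(alphabet), repeat=n)}

examples = [
    {"aa", "ab", "ac", "bb", "bc", "cc"},
    {""},                 # the language containing only the empty string
    set(),                # the empty language
    {"a", "b", "c", ""},
]

print(all(is_over_alphabet(lang, SIGMA) for lang in examples))  # True
print(len(strings_up_to(SIGMA, 2)))  # 13 = 1 + 3 + 9
```

Note that the empty language `set()` and the language `{""}` are different: the first contains no strings, the second contains exactly one string of length zero.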
What is a language? (cont’d)
Alphabet    Languages
{0,1}       {0, 10, 100, 1000, 10000, ...}
            {0, 1, 00, 11, 000, 111, ...}
{a,b,c}     {abc, aabbcc, aaabbbccc}
Regular Languages
Regular Expressions
Regular Expressions (cont’d)
Rules
Fix an alphabet Σ.
ε is a regular expression (denotes the language {ε}).
If a is in Σ, then a is a regular expression (denotes the
language {a}).
If r and s are regular expressions denoting L(r) and L(s)
respectively, then:
(r) | (s) is a regular expression (denotes the language L(r) ∪ L(s))
(r)(s) is a regular expression (denotes the language L(r)L(s))
(r)* is a regular expression (denotes the language (L(r))*)
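The three language operations in these rules (union, concatenation, and Kleene star) can be sketched on finite sets of strings. Because L* is infinite in general, the `star` helper below only builds a bounded approximation; all function names here are illustrative assumptions.

```python
def union(l1, l2):
    """L(r) union L(s): strings that are in either language."""
    return l1 | l2

def concat(l1, l2):
    """L(r)L(s): every string of l1 followed by every string of l2."""
    return {x + y for x in l1 for y in l2}

def star(lang, max_reps):
    """Finite approximation of L*: concatenations of 0..max_reps strings of lang."""
    result = {""}    # zero repetitions always yields the empty string
    current = {""}
    for _ in range(max_reps):
        current = concat(current, lang)
        result |= current
    return result

L_r = {"a"}  # language of the regular expression a
L_s = {"b"}  # language of the regular expression b

print(sorted(union(L_r, L_s)))   # ['a', 'b']
print(sorted(concat(L_r, L_s)))  # ['ab']
print(sorted(star(L_r, 3)))      # ['', 'a', 'aa', 'aaa']
```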
Example
L = {A, B, C, D}    D = {1, 2, 3}
A | B | C | D = L
(A | B | C | D)(A | B | C | D) = L²
(A | B | C | D)* = L*
(A | B | C | D)((A | B | C | D) | (1 | 2 | 3)) = L(L ∪ D)
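As a quick check of the last identity, the set L(L ∪ D) can be built directly and compared with what the regular expression accepts. This is a sketch using Python's `re` module as a stand-in for the formal notation.

```python
import re
from itertools import product

L = {"A", "B", "C", "D"}
D = {"1", "2", "3"}

# L(L ∪ D): one symbol from L followed by one symbol from L ∪ D.
expected = {x + y for x, y in product(L, L | D)}

pattern = re.compile(r"(A|B|C|D)((A|B|C|D)|(1|2|3))")

# Try every two-symbol string over L ∪ D and keep those the pattern accepts.
candidates = {x + y for x, y in product(L | D, repeat=2)}
matched = {w for w in candidates if pattern.fullmatch(w)}

print(matched == expected)  # True
print(len(expected))        # 28 = 4 * 7
```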
Regular Expression Operation
Examples
If Σ = {a, b}
(a | b)*b
b(a | b)*
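Reading the two expressions as (a|b)*b and b(a|b)*, the first describes strings over {a, b} that end in b and the second describes strings that start with b. A small sketch with Python's `re` module (the variable names are illustrative):

```python
import re

ends_with_b = re.compile(r"(a|b)*b")    # any mix of a/b, then a final b
starts_with_b = re.compile(r"b(a|b)*")  # a leading b, then any mix of a/b

for word in ["b", "ab", "ba", "abab", "a", ""]:
    print(repr(word),
          bool(ends_with_b.fullmatch(word)),
          bool(starts_with_b.fullmatch(word)))
```

`fullmatch` is used so that the whole string must belong to the language, rather than just some substring of it.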
Regular Expression Overview
Expression   Meaning
ε            Empty pattern
a            Any pattern represented by ‘a’
ab           Strings with pattern ‘a’ followed by ‘b’
a|b          Strings consisting of pattern ‘a’ or ‘b’
a*           Zero or more occurrences of patterns in ‘a’
a+           One or more occurrences of patterns in ‘a’
a³           Patterns in ‘a’ repeated exactly 3 times
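For readers who want to try these operators, here is a sketch of how each row could be written in Python's `re` syntax. The mapping is an assumption about one particular tool: the empty pattern is written as an empty string and "exactly 3 times" is written `a{3}`.

```python
import re

# (pattern in Python syntax, strings that should match, strings that should not)
CASES = [
    ("",     [""],          ["a"]),    # empty pattern
    ("a",    ["a"],         ["aa"]),   # the single symbol a
    ("ab",   ["ab"],        ["ba"]),   # a followed by b
    ("a|b",  ["a", "b"],    ["ab"]),   # a or b
    ("a*",   ["", "aaa"],   ["b"]),    # zero or more a
    ("a+",   ["a", "aa"],   [""]),     # one or more a
    ("a{3}", ["aaa"],       ["aa"]),   # exactly three a
]

def check_table(cases):
    """Return True if every pattern accepts and rejects the listed strings."""
    for pattern, accepted, rejected in cases:
        if not all(re.fullmatch(pattern, s) for s in accepted):
            return False
        if any(re.fullmatch(pattern, s) for s in rejected):
            return False
    return True

print(check_table(CASES))  # True
```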
A regular expression R describes a set of
strings of characters denoted L(R)
L(R) = the language defined by R
– L(abc) = { abc }
– L(hello|goodbye) = { hello, goodbye }
– L(1(0|1)*) = all binary numbers that start with a 1
Each token can be defined using a regular
expression
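To make L(R) concrete, the sketch below enumerates the part of L(R) that fits within a length bound by testing every short string over a given alphabet. The helper name `language_slice` is hypothetical.

```python
import re
from itertools import product

def language_slice(regex, alphabet, max_len):
    """Strings over `alphabet` of length <= max_len that belong to L(regex)."""
    compiled = re.compile(regex)
    words = ("".join(p)
             for n in range(max_len + 1)
             for p in product(alphabet, repeat=n))
    return [w for w in words if compiled.fullmatch(w)]

print(language_slice(r"abc", "abc", 3))
# ['abc']
print(language_slice(r"1(0|1)*", "01", 3))
# ['1', '10', '11', '100', '101', '110', '111']
```

The second call lists the binary numbers of up to three digits that start with a 1; the full language is infinite, so only a bounded slice can be printed.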
RE Notational Shorthand
Regular Expression, R        Strings in L(R)
a                            “a”
ab                           “ab”
a|b                          “a”, “b”
(ab)*                        “”, “ab”, “abab”, ...
(a|ε)b                       “ab”, “b”
digit = [0-9]                “0”, “1”, “2”, ...
posint = digit+              “8”, “412”, ...
…                            “23”, “34”, ...
[0-9]+ (ε|(.[0-9]+))         “12”, “1.056”, ...
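Named definitions such as digit and posint can be composed into larger patterns. The sketch below does this with Python strings; it assumes Python's `re` dialect, where the literal dot must be escaped as `\.` and an optional part (ε | x) is written `(x)?`.

```python
import re

DIGIT = r"[0-9]"                         # digit = [0-9]
POSINT = rf"{DIGIT}+"                    # posint = digit+
DECIMAL = rf"{DIGIT}+(\.{DIGIT}+)?"      # [0-9]+ (ε | (.[0-9]+))
OPTIONAL_A_THEN_B = r"a?b"               # (a | ε)b

def matches(pattern, text):
    """True if the whole of `text` is in the language of `pattern`."""
    return re.fullmatch(pattern, text) is not None

print(matches(POSINT, "412"))           # True
print(matches(DECIMAL, "1.056"))        # True
print(matches(DECIMAL, "1."))           # False: the dot needs digits after it
print(matches(OPTIONAL_A_THEN_B, "b"))  # True
```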
More Examples
Defining Our Language
digit = “0” | “1” | “2” | “3” | “4” | “5” | “6” | “7” | “8” | “9”
integer = {digit}+
Note that we can abbreviate ranges using the dash (“-”). Thus,
digit = [0-9]
Relation = ‘<’ | ‘<=’ | ‘>’ | ‘>=’ | ‘<>’ | ‘=’
Floating point numbers are not much more complicated:
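A minimal sketch of how token definitions like these could drive a tokenizer. The floating-point form used here (digits, a dot, digits) is an assumption and is not taken from the slides; the names `TOKEN_SPEC`, `MASTER`, and `tokenize` are likewise illustrative.

```python
import re

# Order matters: FLOAT is tried before INTEGER, and the two-character
# relations are listed before the one-character ones.
TOKEN_SPEC = [
    ("FLOAT",    r"[0-9]+\.[0-9]+"),   # assumed form: digits '.' digits
    ("INTEGER",  r"[0-9]+"),           # integer = {digit}+
    ("RELATION", r"<=|>=|<>|<|>|="),
    ("SKIP",     r"\s+"),              # whitespace between tokens
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Split `text` into (token_name, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("12 <= 3.5 <> 7"))
# [('INTEGER', '12'), ('RELATION', '<='), ('FLOAT', '3.5'), ('RELATION', '<>'), ('INTEGER', '7')]
```

Listing longer alternatives first is a design choice forced by how Python's alternation works: it takes the first alternative that matches, not the longest one.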
Real-world example
What is the regular expression that defines all phone
numbers in the US?
Σ = { 0-9 }
Area = {digit}³
Exchange = {digit}³
Local = {digit}⁴
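The three parts can be combined into one pattern once a written format is chosen. The sketch below assumes the dash-separated form Area-Exchange-Local; that separator convention is an assumption made for illustration, and the sample number is fictional.

```python
import re

DIGIT = r"[0-9]"
AREA = rf"{DIGIT}{{3}}"       # three digits
EXCHANGE = rf"{DIGIT}{{3}}"   # three digits
LOCAL = rf"{DIGIT}{{4}}"      # four digits

# Assumed written format: AAA-EEE-LLLL
PHONE = re.compile(rf"({AREA})-({EXCHANGE})-({LOCAL})")

m = PHONE.fullmatch("555-555-0123")
print(m.groups())                        # ('555', '555', '0123')
print(PHONE.fullmatch("5555550123"))     # None: dashes are required in this sketch
```

Other conventions, such as parentheses around the area code, would need extra alternatives in the pattern.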