
VYSYA COLLEGE, SALEM-103.
NAME : A. KAJA MOHIDEEN
SUBJECT : THEORY OF AUTOMATA
CLASS : I M.Sc. CS
UNITS : 5

AUGUST
From : 11/08/08 To : 1&/08/08
UNIT - I Automata Theory: Introduction, Structural representation, Automata and complexity, Alphabets, Strings, Languages, Problems (T)
From : 18/08/08 To : 31/08/08
UNIT - I (Contd.) Finite Automata - Introduction - Deterministic finite automata - Non-deterministic finite automata. Application - Text search - Finite automata with epsilon transition (T)

SEPTEMBER
From : 02/09/08 To : 19/09/08
UNIT - II Regular Expressions - Finite automata and regular expression - Applications of regular expressions - Algebraic laws of regular expressions - Proving languages not to be regular - Decision properties of regular languages - Equivalence and minimization of automata - Moore and Mealy machines (T)
From : 22/09/08 To : 30/09/08
UNIT - III Context Free Grammar - Definition - Derivations using a grammar - Leftmost and rightmost derivation - The language of a grammar - Sentential forms - Parse trees. Push down automata - Definition - Languages of a PDA - Equivalence of PDAs and CFG's - Deterministic PDA (T)

OCTOBER
From : 03/10/08 To : 0;/10/08
UNIT - IV Turing Machine - Introduction - Notations - Descriptions - Transition diagrams - Languages - Turing machines and halting.
From : 10/10/08 To : 1;/10/08
UNIT - IV (Contd.) Programming techniques of Turing machines, Multitape Turing machines, Restricted Turing machines, Turing machines and computers (T)
From : 20/10/08 To : 31/10/08
UNIT - V Intractable Problems: The classes P and NP, The NP complete problem, Complements of languages in NP, Problem solvable in polynomial space (T)

UNIT - I

What is the purpose of studying automata theory? (or) What is the need to study Automata theory? (5 marks)

Automata theory is the study of abstract computing devices. There are several reasons why the study of automata and complexity is an important part of the core of computer science.

Introduction to Finite Automata

Finite automata are a useful model for many important kinds of hardware and software. Let us list some of the most important kinds:
o Software for designing and checking the behavior of digital circuits.
o The "lexical analyzer" of a typical compiler, that is, the compiler component that breaks the input text into logical units, such as identifiers, keywords and punctuation.
o Software for scanning large bodies of text, such as collections of web pages, to find occurrences of words, phrases, or other patterns.
o Software for verifying systems of all types that have a finite number of distinct states, such as communications protocols or protocols for secure exchange of information.

Many systems or components can be viewed as being, at all times, in one of a finite number of "states". The purpose of a state is to remember the relevant portion of the system's history. The advantage of having only a finite number of states is that we can implement the system with a fixed set of resources.

Example
The simplest finite automaton is an on/off switch. The device remembers whether it is in the "on" state or the "off" state, and it allows the user to press a button whose effect is different, depending on the state of the switch. That is, if the switch is in the off state, then pressing the button changes it to the on state, and if the switch is in the on state, then pressing the same button turns it to the off state.

Fig: A finite automaton modeling an on/off switch (states off and on, a Start arrow into off, and an arc labeled Push in each direction)

As for all finite automata, the states are represented by circles; in this example, we have named the states on and off. Arcs between states are labeled by "inputs", which represent external influences on the system. Here, both arcs are labeled by the input Push, which represents a user pushing the button. The intent of the two arcs is that whichever state the system is in, when the Push input is received it goes to the other state. The state in which the system is placed initially is called the "start state". In our example, the start state is off, and we conveniently indicate the start state by the word Start and an arrow leading to that state.

Structural Representations:
There are two important notations that are not automaton-like, but play an important role in the study of automata and their applications.

Grammars

These are useful models when designing software that processes data with a recursive structure. The best known example is a "parser", the component of a compiler that deals with the recursively nested features of the typical programming language, such as expressions: arithmetic, conditional and so on. For instance, a grammatical rule like E → E + E states that an expression can be formed by taking any two expressions and connecting them by a plus sign; this rule is typical of how expressions of real programming languages are formed.

Regular Expressions
These also denote the structure of data, especially text strings. The style of these expressions differs significantly from that of grammars. The UNIX-style regular expression '[A-Z][a-z]*[ ][A-Z][A-Z]' represents capitalized words followed by a space and two capital letters. This expression represents patterns in text that could be a city and state, e.g., Ithaca NY. It misses multiword city names, such as Palo Alto CA, which could be captured by the more complex expression '[A-Z][a-z]*([ ][A-Z][a-z]*)*[ ][A-Z][A-Z]'.

When interpreting such expressions, we only need to know that [A-Z] represents a range of characters from capital "A" to capital "Z" and [ ] is used to represent the blank character alone. Also, the symbol * represents "any number of" the preceding expression. Parentheses are used to group components of the expression; they do not represent characters of the text described.

Automata and complexity
Automata are essential for the study of the limits of computation. There are two important issues:
o What can a computer do at all? This study is called "decidability", and the problems that can be solved by computer are called "decidable".
o What can a computer do efficiently? This study is called "intractability", and the problems that can be solved by a computer using no more time than some slowly growing function of the size of the input are called "tractable".

Give the central concepts of Automata theory. (or) Explain briefly about Alphabets, Strings, Languages and Problems. (10 marks)

The most important definitions include the "alphabet" (a finite set of symbols), "strings" (a list of symbols from an alphabet) and "language" (a set of strings from the same alphabet).

Alphabets
An alphabet is a finite, nonempty set of symbols. We use the symbol Σ for an alphabet. Common alphabets include:
1. Σ = {0, 1}, the binary alphabet.
2. Σ = {a, b, ..., z}, the set of all lower-case letters.
3. The set of all ASCII characters, or the set of all printable ASCII characters.

Strings
A string (or sometimes word) is a finite sequence of symbols chosen from some alphabet. For e.g., 01101 is a string from the binary alphabet Σ = {0, 1}. The string 111 is another string chosen from this alphabet.

The Empty String
The empty string is the string with zero occurrences of symbols. This string, denoted ε, is a string that may be chosen from any alphabet whatsoever.

Length of a String

It is often useful to classify strings by their length, that is, the number of positions for symbols in the string. For instance, 01101 has length 5. The standard notation for the length of a string w is |w|. For e.g., |011| = 3 and |ε| = 0.

Powers of an Alphabet
If Σ is an alphabet, the set of all strings of a certain length from that alphabet is expressed by using an exponential notation. We define Σ^k to be the set of strings of length k, each of whose symbols is in Σ. Note that Σ^0 = {ε}, regardless of what alphabet Σ is. That is, ε is the only string whose length is 0.

Eg: If Σ = {0, 1}, then
Σ^1 = {0, 1},
Σ^2 = {00, 01, 10, 11},
Σ^3 = {000, 001, 010, 011, 100, 101, 110, 111}, ...

There is a slight confusion between Σ and Σ^1. The former is an alphabet; its members 0 and 1 are symbols. The latter is a set of strings; its members are the strings 0 and 1, each of which is of length 1.

The set of all strings over an alphabet Σ is denoted by Σ*. For instance, {0, 1}* = {ε, 0, 1, 00, 01, 10, 11, 000, ...}. That is, Σ* = Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ ... The set of nonempty strings from alphabet Σ is denoted by Σ+. Thus the equivalences are
o Σ+ = Σ^1 ∪ Σ^2 ∪ Σ^3 ∪ ...
o Σ* = Σ+ ∪ {ε}
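The powers of an alphabet can be enumerated mechanically. The following is a minimal Python sketch, not part of the original notes; the function names power and closure_up_to are illustrative assumptions. Since Σ* is infinite, only a finite slice of it (strings up to a chosen length) is generated.

```python
from itertools import product

def power(alphabet, k):
    """Return Sigma^k: the set of all strings of length k over the alphabet."""
    # product(..., repeat=0) yields one empty tuple, so power(alphabet, 0) == {""},
    # which plays the role of {epsilon}.
    return {"".join(symbols) for symbols in product(sorted(alphabet), repeat=k)}

def closure_up_to(alphabet, n):
    """Return the finite part of Sigma* made of strings of length 0..n."""
    result = set()
    for k in range(n + 1):
        result |= power(alphabet, k)
    return result

sigma = {"0", "1"}
print(sorted(power(sigma, 2)))  # ['00', '01', '10', '11']
```

Here the empty Python string "" stands for ε, so Σ+ up to length n is simply closure_up_to(sigma, n) with "" removed.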

Concatenation of strings
Let x and y be strings. Then xy denotes the concatenation of x and y, that is, the string formed by making a copy of x and following it by a copy of y. If x is the string composed of i symbols x = a1a2...ai and y is the string composed of j symbols y = b1b2...bj, then xy is the string of length i + j: xy = a1a2...aib1b2...bj.

Eg: Let x = 01101 and y = 110. Then xy = 01101110 and yx = 11001101. For any string w, the equations εw = wε = w hold.

Languages
A set of strings, all of which are chosen from some Σ*, is called a language, denoted by L. The language can be expressed as L ⊆ Σ*. For any programming language, the legal programs are a subset of the possible strings that can be formed from the alphabet of the language. This alphabet is a subset of the ASCII characters. For example, consider the following languages:
o The language of all strings consisting of n 0's followed by n 1's, for some n ≥ 0: {ε, 01, 0011, 000111, ...}.
o The set of strings of 0's and 1's with an equal number of each: {ε, 01, 10, 0011, 0101, 1001, ...}
o The set of binary numbers whose value is a prime: {10, 11, 101, 111, 1011, ...}
o Σ* is a language for any alphabet Σ.
o ∅, the empty language, is a language over any alphabet.
o {ε}, the language consisting of only the empty string, is also a language over any alphabet. Notice that ∅ ≠ {ε}; the former has no strings and the latter has one string.

Problems

In automata theory, a problem is the question of deciding whether a given string is a member of some particular language. A "problem" can be expressed as membership in a language. If Σ is an alphabet, and L is a language over Σ, then the problem L is: Given a string w in Σ*, decide whether or not w is in L.

Eg: The problem of testing primality can be expressed by the language Lp consisting of all binary strings whose value as a binary number is a prime. That is, given a string of 0's and 1's, say "yes" if the string is the binary representation of a prime and say "no" if not.

One potentially unsatisfactory aspect of our definition of "problem" is that one commonly thinks of problems not as decision questions but as requests to compute or transform some input. The technique of showing one problem hard by using its supposed efficient algorithm to solve efficiently another problem that is already known to be hard is called a "reduction" of the second problem to the first.

Explain in detail about Deterministic Finite Automata. (or) Discuss about Deterministic Finite Automata. (10 marks)

The term "deterministic" refers to the fact that on each input there is one and only one state to which the automaton can transition from its current state. A deterministic finite automaton consists of:
1. A finite set of states, often denoted Q.
2. A finite set of input symbols, often denoted Σ.
3. A transition function, often denoted δ, that takes as arguments a state and an input symbol and returns a state.
4. A start state, one of the states in Q.
5. A set of final or accepting states F. The set F is a subset of Q.

A Deterministic Finite Automaton will often be referred to by its acronym: DFA. A DFA in "five-tuple" notation is given as
A = (Q, Σ, δ, q0, F)
where A is the name of the DFA, Q is its set of states, Σ its input symbols, δ its transition function, q0 its start state, and F its set of accepting states.

Language of the DFA
The "language" of the DFA is the set of all strings that the DFA accepts. Suppose a1a2...an is a sequence of input symbols. We start out with the DFA in its start state, q0. We consult the transition function δ, say δ(q0, a1) = q1, to find the state that the DFA enters after processing the first input symbol a1. We then process the next input symbol a2 by evaluating δ(q1, a2); say this state is q2. We continue in this manner, finding states q3, q4, ..., qn such that δ(q_{i-1}, a_i) = q_i for each i. If qn is a member of F, then the input a1a2...an is accepted, and if not then it is "rejected".

Example:
Let us formally specify a DFA that accepts all and only the strings of 0's and 1's that have the sequence 01 somewhere in the string. We can write this language as:
L = {w | w is of the form x01y for some strings x and y consisting of 0's and 1's only}

Another equivalent description, using parameters x and y to the left of the vertical bar, is:
{x01y | x and y are any strings of 0's and 1's}
Examples of strings in the language include 01, 11010, and 100011. Examples of strings not in the language include ε, 0 and 111000.
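The acceptance procedure for a DFA can be sketched as a short table-driven loop. The following Python sketch is an illustration, not part of the original notes: the transition function uses the entries of the transition table given in these notes for the DFA accepting strings with substring 01 (start state q0, accepting state q1), while the dictionary name DELTA and the function name accepts are assumptions made for the example.

```python
# Transition function of the DFA for "contains the substring 01".
# Keys are (state, input symbol); values are the single next state.
DELTA = {
    ("q0", "0"): "q2", ("q0", "1"): "q0",
    ("q1", "0"): "q1", ("q1", "1"): "q1",
    ("q2", "0"): "q2", ("q2", "1"): "q1",
}

def accepts(w, delta=DELTA, start="q0", finals=frozenset({"q1"})):
    """Run the DFA on string w and report whether it ends in an accepting state."""
    state = start
    for symbol in w:
        # Deterministic: exactly one next state for each (state, symbol) pair.
        state = delta[(state, symbol)]
    return state in finals

for w in ["01", "11010", "100011", "", "0", "111000"]:
    print(repr(w), accepts(w))
```

The first three strings are accepted and the last three are rejected, in agreement with the examples listed above.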

Simpler Notations for DFA's
Specifying a DFA as a five-tuple with a detailed description of the δ transition function is both tedious and hard to read. There are two preferred notations for describing automata:
1. A transition diagram, which is a graph.
2. A transition table, which is a tabular listing of the δ function, which by implication tells us the set of states and the input alphabet.

Transition Diagrams:
A transition diagram for a DFA A = (Q, Σ, δ, q0, F) is a graph defined as follows:
1. For each state in Q there is a node.
2. For each state q in Q and each input symbol a in Σ, let δ(q, a) = p. Then the transition diagram has an arc from node q to node p, labeled a. If there are several input symbols that cause transitions from q to p, then the transition diagram can have one arc, labeled by the list of these symbols.
3. There is an arrow into the start state q0, labeled Start. This arrow does not originate at any node.
4. Nodes corresponding to accepting states are marked by a double circle. States not in F have a single circle.

Example
In the figure given below, the three nodes correspond to the three states. There is a Start arrow entering the start state, q0, and the one accepting state, q1, is represented by a double circle. Out of each state is one arc labeled 0 and one arc labeled 1.

Fig: The transition diagram for the DFA accepting all strings with a substring 01

Transition tables
A transition table is a conventional, tabular representation of a function like δ that takes two arguments and returns a value. The rows of the table correspond to the states, and the columns correspond to the inputs. The entry for the row corresponding to state q and the column corresponding to input a is the state δ(q, a).

          0     1
  → q0    q2    q0
  * q1    q1    q1
    q2    q2    q1

Example
Two features of the transition table are marked: the start state is marked with an arrow, and the accepting states are marked with a star. Since we can deduce the sets of states and input symbols by looking at the row and column heads, we can now read from the transition table all the information we need to specify the finite automaton uniquely.

Extending the transition function to strings:
In terms of the transition diagram, the language of a DFA is the set of labels along all the paths that lead from the start state to any accepting state. We define an extended transition function that describes what happens when we start in any state and follow any sequence of inputs. If δ is our transition function, then the extended transition function constructed from δ will be called δ̂. The extended transition function is a function that takes a state q and a string w and returns a state p: the state that the automaton reaches when starting in state q and processing the sequence of inputs w. We define δ̂ by induction on the length of the input string, as follows:

BASIS: δ̂(q, ε) = q. That is, if we are in state q and read no inputs, then we are still in state q.

INDUCTION: Suppose w is a string of the form xa; that is, a is the last symbol of w, and x is the string consisting of all but the last symbol. For e.g., w = 1101 is broken into x = 110 and a = 1. Then
δ̂(q, w) = δ(δ̂(q, x), a)
To compute δ̂(q, w), first compute δ̂(q, x), the state that the automaton is in after processing all but the last symbol of w. Suppose this state is p; that is, δ̂(q, x) = p. Then δ̂(q, w) is what we get by making a transition from state p on input a, the last symbol of w. That is, δ̂(q, w) = δ(p, a).

Language of a DFA:
We can define the language of a DFA A = (Q, Σ, δ, q0, F). This language is denoted L(A), and is defined by
L(A) = {w | δ̂(q0, w) is in F}
That is, the language of A is the set of strings w that take the start state q0 to one of the accepting states. If L is L(A) for some DFA A, then we say L is a regular language.

Design a DFA to accept the language L = {w | w has both an even number of 0's and an even number of 1's} (or) Give the Deterministic Finite Automata for the Language L = {w | w has both an even number of 0's and an even number of 1's} (5 marks)

The job of the states of this DFA is to count both the number of 0's and the number of 1's, but to count them modulo 2. That is, the state is used to remember whether the number of 0's seen so far is even or odd, and also to remember whether the number of 1's seen so far is even or odd. There are thus four states, which can be given the following interpretations:
q0 : Both the number of 0's seen so far and the number of 1's seen so far are even.
q1 : The number of 0's seen so far is even, but the number of 1's seen so far is odd.
q2 : The number of 1's seen so far is even, but the number of 0's seen so far is odd.
q3 : Both the number of 0's seen so far and the number of 1's seen so far are odd.

State q0 is both the start state and the lone accepting state. It is the start state, because before reading any inputs, the numbers of 0's and 1's seen so far are both zero, and zero is even. It is the only accepting state, because it describes exactly the condition for a sequence of 0's and 1's to be in language L. We now know almost how to specify the DFA for language L. It is
A = ({q0, q1, q2, q3}, {0, 1}, δ, q0, {q0})
where the transition function δ is described by the transition diagram.
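The four-state design can be checked with a small simulation. The sketch below is an illustration in Python, not part of the original notes. The transitions are derived from the state interpretations above (each 0 flips the parity of 0's, each 1 flips the parity of 1's), and delta_hat follows the inductive definition of the extended transition function; the names DELTA, delta_hat and trace are assumptions made for the example.

```python
# q0: both even, q1: 1's odd, q2: 0's odd, q3: both odd.
DELTA = {
    ("q0", "0"): "q2", ("q0", "1"): "q1",
    ("q1", "0"): "q3", ("q1", "1"): "q0",
    ("q2", "0"): "q0", ("q2", "1"): "q3",
    ("q3", "0"): "q1", ("q3", "1"): "q2",
}

def delta_hat(state, w):
    """Extended transition function, defined by induction on the length of w."""
    if w == "":                      # BASIS: reading no input leaves the state unchanged
        return state
    x, a = w[:-1], w[-1]             # INDUCTION: w = xa, a is the last symbol
    return DELTA[(delta_hat(state, x), a)]

def trace(w, start="q0"):
    """States reached after each prefix of w, from the empty prefix to w itself."""
    return [delta_hat(start, w[:i]) for i in range(len(w) + 1)]

print(trace("110101"))  # ['q0', 'q1', 'q0', 'q2', 'q3', 'q1', 'q0']
```

A string is in L exactly when delta_hat("q0", w) returns q0, the lone accepting state.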

Each input 0 causes the state to cross the horizontal, dashed line. Thus, after seeing an even number of 0's we are always above the line, in state q0 or q1, while after seeing an odd number of 0's we are always below the line, in state q2 or q3. Likewise, every 1 causes the state to cross the vertical, dashed line. Thus, after seeing an even number of 1's, we are always to the left, in state q0 or q2, while after seeing an odd number of 1's we are to the right, in state q1 or q3.

We can also represent this DFA by a transition table. Here, we illustrate the construction of δ̂. Suppose the input is 110101. Since this string has an even number of both 0's and 1's, we expect it is in the language. Thus, we expect δ̂(q0, 110101) = q0, since q0 is the only accepting state. The check involves computing δ̂(q0, w) for each prefix w of 110101, starting at ε and going in increasing size. The summary of this calculation is:
δ̂(q0, ε) = q0
δ̂(q0, 1) = δ(δ̂(q0, ε), 1) = δ(q0, 1) = q1
δ̂(q0, 11) = δ(δ̂(q0, 1), 1) = δ(q1, 1) = q0
δ̂(q0, 110) = δ(δ̂(q0, 11), 0) = δ(q0, 0) = q2
δ̂(q0, 1101) = δ(δ̂(q0, 110), 1) = δ(q2, 1) = q3
δ̂(q0, 11010) = δ(δ̂(q0, 1101), 0) = δ(q3, 0) = q1
δ̂(q0, 110101) = δ(δ̂(q0, 11010), 1) = δ(q1, 1) = q0

Explain about Non-deterministic Finite Automata. (or) Discuss in detail about NFA? (10 marks)

A non-deterministic finite automaton (NFA) has the power to be in several states at once. This ability is often expressed as an ability to "guess" something about its input. For instance, when the automaton is used to search for certain sequences of characters (e.g., keywords) in a long text string, it is helpful to "guess" that we are at the beginning of one of those strings and use a sequence of states to do nothing but check that the string appears, character by character.

An Informal View of NFA:
Like the DFA, an NFA has a finite set of states, a finite set of input symbols, one start state and a set of accepting states. It also has a transition function, which we shall commonly call δ. The difference between the DFA and the NFA is in the type of δ. For the NFA, δ is a function that takes a state and input symbol as arguments, but returns a set of zero, one or more states.

Example
Consider a non-deterministic finite automaton whose job is to accept all and only the strings of 0's and 1's that end in 01. State q0 is the start state, and we think of the automaton as being in state q0 whenever it has not yet "guessed" that the final 01 has begun. It is always possible that the next symbol does not begin the final 01, even if that symbol is 0. Thus, state q0 may transition to itself on both 0 and 1.

Fig(a): An NFA accepting all strings that end in 01

However, if the next symbol is 0, this NFA also guesses that the final 01 has begun. An arc labeled 0 thus leads from q0 to state q1. Notice that there are two arcs labeled 0 out of q0. The NFA has the option of going either to q0 or to q1, and in fact it does both. In state q1, the NFA checks that the next symbol is 1, and if so, it goes to state q2 and accepts. Notice that there is no arc out of q1 labeled 0, and there are no arcs at all out of q2. In these situations, the thread of the NFA's existence corresponding to those states simply "dies", although other threads may continue to exist.
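The idea of following all threads at once can be sketched by tracking the set of states the NFA is currently in. The Python code below is an illustration, not part of the original notes; it encodes the three arcs of the NFA of fig(a) as described above, and the names DELTA, step, run and accepts are assumptions made for the example. A missing dictionary entry means there is no arc, so that thread dies.

```python
# NFA of fig(a): accepts strings of 0's and 1's that end in 01.
# Each entry maps (state, symbol) to the SET of possible next states.
DELTA = {
    ("q0", "0"): {"q0", "q1"},   # stay in q0, or guess that the final 01 has begun
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}

def step(states, symbol):
    """Union of the next states over every state the NFA is currently in."""
    nxt = set()
    for q in states:
        nxt |= DELTA.get((q, symbol), set())   # no entry: this thread dies
    return nxt

def run(w, start="q0"):
    """List of state sets after reading each prefix of w."""
    current = {start}
    history = [set(current)]
    for symbol in w:
        current = step(current, symbol)
        history.append(set(current))
    return history

def accepts(w, finals=frozenset({"q2"})):
    """Accept if at least one surviving thread is in an accepting state."""
    return bool(run(w)[-1] & finals)

for states in run("00101"):
    print(sorted(states))
```

On input 00101 the state sets are {q0}, {q0, q1}, {q0, q1}, {q0, q2}, {q0, q1}, {q0, q2}, so the string is accepted.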

The following fig(b) suggests how an NFA processes inputs. We have shown what happens when the automaton of fig(a) receives the input sequence 00101. It starts in only its start state, q0. When the first 0 is read, the NFA may go to either state q0 or state q1, so it does both. These two threads are suggested by the second column in fig(b).

Fig(b): The states an NFA is in during the processing of input sequence 00101

Then, the second 0 is read. State q0 may again go to both q0 and q1. However, state q1 has no transition on 0, so it "dies". When the third input, a 1, occurs, we must consider transitions from both q0 and q1. We find that q0 goes only to q0 on 1, while q1 goes only to q2. Thus, after reading 001, the NFA is in states q0 and q2. Since q2 is an accepting state, the NFA accepts 001. However, the input is not finished. The fourth input, a 0, causes q2's thread to die, while q0 goes to both q0 and q1. The last input, a 1, sends q0 to q0 and q1 to q2. Since we are again in an accepting state, 00101 is accepted.

Definition of Nondeterministic Finite Automata:
An NFA is represented essentially like a DFA:
A = (Q, Σ, δ, q0, F)
where:
1. Q is a finite set of states.
2. Σ is a finite set of input symbols.
3. q0, a member of Q, is the start state.
4. F, a subset of Q, is the set of final (or accepting) states.
5. δ, the transition function, is a function that takes a state in Q and an input symbol in Σ as arguments and returns a subset of Q.

Example
The NFA of fig(a) can be specified formally as
({q0, q1, q2}, {0, 1}, δ, q0, {q2})
where the transition function δ is given by the transition table as follows:

          0           1
  → q0    {q0, q1}    {q0}
    q1    ∅           {q2}
  * q2    ∅           ∅

Notice that transition tables can be used to specify the transition function for an NFA as well as for a DFA. The only difference is that each entry in the table for the NFA is a set, even if the set is a singleton (has one member). When there is no transition at all from a given state on a given input symbol, the proper entry is ∅, the empty set.

The Extended Transition Function:
We need to extend the transition function δ of an NFA to a function δ̂ that takes a state q and a string of input symbols w, and returns the set of states that the NFA is in if it starts in state q and processes the string w. δ̂(q, w) is the column of states found after reading w, if q is the lone state in the first column. Formally, we define δ̂ for an NFA's transition function δ by:

BASIS: δ̂(q, ε) = {q}. That is, without reading any input symbols, we are in the state we began in.

INDUCTION: Suppose w is of the form w = xa, where a is the final symbol of w and x is the rest of w. Also suppose that δ̂(q, x) = {p1, p2, ..., pk}. Let
δ(p1, a) ∪ δ(p2, a) ∪ ... ∪ δ(pk, a) = {r1, r2, ..., rm}
Then δ̂(q, w) = {r1, r2, ..., rm}. Less formally, we compute δ̂(q, w) by first computing δ̂(q, x), and by then following any transition from any of these states that is labeled a.

Example
Let us use δ̂ to describe the processing of input 00101 by the NFA of fig(a). A summary of the steps is:
1. δ̂(q0, ε) = {q0}
2. δ̂(q0, 0) = δ(q0, 0) = {q0, q1}
3. δ̂(q0, 00) = δ(q0, 0) ∪ δ(q1, 0) = {q0, q1} ∪ ∅ = {q0, q1}
4. δ̂(q0, 001) = δ(q0, 1) ∪ δ(q1, 1) = {q0} ∪ {q2} = {q0, q2}
5. δ̂(q0, 0010) = δ(q0, 0) ∪ δ(q2, 0) = {q0, q1} ∪ ∅ = {q0, q1}
6. δ̂(q0, 00101) = δ(q0, 1) ∪ δ(q1, 1) = {q0} ∪ {q2} = {q0, q2}

Line (1) is the basis rule. We obtain line (2) by applying δ to the lone state, q0, that is in the previous set, and get {q0, q1} as a result. Line (3) is obtained by taking the union over the two states in the previous set of what we get when we apply δ to them with input 0. That is, δ(q0, 0) = {q0, q1}, while δ(q1, 0) = ∅. For line (4), we take the union of δ(q0, 1) = {q0} and δ(q1, 1) = {q2}. Lines (5) and (6) are similar to lines (3) and (4).

The Language of an NFA:
An NFA accepts a string w if it is possible to make any sequence of choices of next state, while reading the characters of w, and go from the start state to any accepting state. The fact that other choices using the input symbols of w lead to a non-accepting state, or do not lead to any state at all (i.e., the sequence of states "dies"), does not prevent w from being accepted by the NFA as a whole. Formally, if A = (Q, Σ, δ, q0, F) is an NFA, then
L(A) = {w | δ̂(q0, w) ∩ F ≠ ∅}
That is, L(A) is the set of strings w in Σ* such that δ̂(q0, w) contains at least one accepting state.

Give a brief note on the equivalence of Deterministic and Non-deterministic Finite Automata? (or) What do you mean by equivalence? Explain equivalence of DFA and NFA. (10 marks)

Every language that can be described by some NFA can also be described by some DFA. In practice, the DFA has about as many states as the NFA, although it often has more transitions. In the worst case, however,
the smallest DFA can have 2^n states while the smallest NFA for the same language has only n states. The proof that DFA's can do whatever NFA's can do involves an important "construction" called the subset construction, because it involves constructing all subsets of the set of states of the NFA. The subset construction starts from an NFA N = (QN, Σ, δN, q0, FN). Its goal is the description of a DFA D = (QD, Σ, δD, {q0}, FD) such that L(D) = L(N). The input alphabets of the two automata are the same, and the

start state of D is the set containing only the start state of N. The other components of D are constructed as follows.
o QD is the set of subsets of QN; i.e., QD is the power set of QN. Note that if QN has n states, then QD will have 2^n states. Often, not all these states are accessible from the start state of QD. Inaccessible states can be "thrown away", so effectively, the number of states of D may be much smaller than 2^n.
o FD is the set of subsets S of QN such that S ∩ FN ≠ ∅. That is, FD is all sets of N's states that include at least one accepting state of N.
o For each set S ⊆ QN and for each input symbol a in Σ,
  δD(S, a) = ∪ δN(p, a), where the union is taken over all p in S.
  That is, to compute δD(S, a) we look at all the states p in S, see what states N goes to from p on input a, and take the union of all those states.

Fig: The NFA accepting all strings that end in 01 (the automaton of fig(a))

Example

Let N be the automaton of the above figure that accepts all strings that end in 01. Since N's set of states is {q0, q1, q2}, the subset construction produces a DFA with 2^3 = 8 states, corresponding to all the subsets of these three states. The following table shows the transition table for these eight states. This transition table belongs to a deterministic finite automaton. Even though the entries in the table are sets, the states of the constructed DFA are sets. We can invent new names for these states, e.g., A for ∅, B for {q0}, etc. The renamed DFA transition table defines exactly the same automaton but makes clear the point that the entries in the table are single states of the DFA.

Table 1: Subset Construction from NFA

                      0           1
    ∅                 ∅           ∅
  → {q0}              {q0, q1}    {q0}
    {q1}              ∅           {q2}
  * {q2}              ∅           ∅
    {q0, q1}          {q0, q1}    {q0, q2}
  * {q0, q2}          {q0, q1}    {q0}
  * {q1, q2}          ∅           {q2}
  * {q0, q1, q2}      {q0, q1}    {q0, q2}

Table 2: Renaming the States

         0    1
    A    A    A
  → B    E    B
    C    A    D
  * D    A    A
    E    E    F
  * F    E    B
  * G    A    D
  * H    E    F

Of the eight states in table 2, starting in the start state B, we can only reach states B, E and F. The other five states are inaccessible from the start state. Those states are removed, the table is reconstructed, and the DFA diagram is drawn with that table as shown in the figure.
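The construction restricted to accessible states can be sketched as a breadth-first exploration from {q0}, so that the inaccessible subsets are never generated. The Python code below is an illustration, not part of the original notes; the function name subset_construction and the variable names are assumptions made for the example, and DFA states are represented as frozensets of NFA states.

```python
from collections import deque

# NFA accepting strings that end in 01: (state, symbol) -> set of next states.
NFA_DELTA = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}

def subset_construction(delta, start, finals, alphabet):
    """Build only the DFA states that are accessible from the start state {start}."""
    start_set = frozenset({start})
    dfa_delta = {}
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        S = queue.popleft()
        for a in alphabet:
            # delta_D(S, a) is the union of delta_N(p, a) over all p in S.
            T = frozenset(r for p in S for r in delta.get((p, a), ()))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                queue.append(T)
    # A DFA state is accepting if it contains at least one accepting NFA state.
    dfa_finals = {S for S in seen if S & finals}
    return seen, dfa_delta, start_set, dfa_finals

states, dfa_delta, dfa_start, dfa_finals = subset_construction(
    NFA_DELTA, "q0", frozenset({"q2"}), ["0", "1"])
for S in sorted(states, key=sorted):
    print(sorted(S), "accepting" if S in dfa_finals else "")
```

For this NFA the exploration finds exactly the three states {q0}, {q0, q1} and {q0, q2}, which correspond to B, E and F in the renamed table.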
Fig: The DFA constructed from the NFA, with states {q0}, {q0, q1} and {q0, q2}

Explain how to find string in a text? (or) Explain the steps involved in finding a string in a text? (10 marks)

A common problem in the age of the web and other on-line text repositories is the following. Given a set of words, find all documents that contain one of those words. A search engine uses a particular technology, called inverted indexes, where for each word appearing on the web a list of all the places where that word occurs is stored. Machines with very large amounts of main memory keep the most common of these lists available, allowing many people to search for documents at once. Inverted-index techniques do not make use of finite automata, but they also take very large amounts of time for crawlers to copy the web and set up the indexes.

There are a number of related applications that are unsuited for inverted indexes, but are good applications for automaton-based techniques. The characteristics that make an application suitable for searches that use automata are:
1. The repository on which the search is conducted is rapidly changing. For e.g.:
   a) Every day, news analysts want to search the day's on-line news articles for relevant topics. For e.g., a financial analyst might search for certain stock ticker symbols or names of companies.
   b) A "shopping robot" wants to search for the current prices charged for the items that its clients request. The robot will retrieve current catalog pages from the web and then search those pages for words that suggest a price for a particular item.
2. The documents to be searched cannot be cataloged. For e.g., Amazon.com does not make it easy for crawlers to find all the pages for all the books that the company sells. Rather, these pages are generated "on the fly" in response to queries. However, we could send a query for books on a certain topic, say "finite automata", and then search the pages retrieved for certain words, e.g., "excellent" in a review portion.

Construct an NFA for text search? (or) Give a short note on text search for non deterministic finite automata? (5 marks)

Suppose we are given a set of words, which we shall call the keywords, and we want to find occurrences of any of these words. In applications such as these, a useful way to proceed is to design a non-deterministic finite automaton, which signals, by entering an accepting state, that it has seen one of the keywords. The text of a document is fed, one character at a time, to this NFA, which then recognizes occurrences of the keywords in this text. There is a simple form to an NFA that recognizes a set of keywords.
1) There is a start state with a transition to itself on every input symbol, e.g., every printable ASCII character if we are examining text. Intuitively, the start state represents a "guess" that we have not yet begun to see one of the keywords, even if we have seen some letters of one of these words.
2) For each keyword a1a2...ak, there are k states, say q1, q2, ..., qk. There is a transition from the start state to q1 on symbol a1, a transition from q1 to q2 on symbol a2, and so on. The state qk is an accepting state and indicates that the keyword a1a2...ak has been found.

E.g., suppose we want to design an NFA to recognize occurrences of the words web and ebay. The transition diagram for the NFA designed using the rule above is in fig. 2.16. State 1 is the start state, and we use Σ to stand for the set of all printable ASCII characters. States 2 through 4 have the job of recognizing web, while states 5 through 8 recognize ebay.

Of course the NFA is not a program. We have two major choices for an implementation of this NFA.
1. Write a program that simulates this NFA by computing the set of states it is in after reading each input symbol. The simulation was suggested in fig(b).
2. Convert the NFA to an equivalent DFA using the subset construction. Then simulate the DFA directly.
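The first implementation choice, simulating the NFA directly, can be sketched as follows. This Python code is an illustration, not part of the original notes; the function names build_keyword_nfa and find_keywords are assumptions made for the example. The state numbering follows the description above: state 1 is the start state, and for the keywords web and ebay the chains use states 2 through 4 and 5 through 8.

```python
def build_keyword_nfa(keywords):
    """Build the keyword NFA: start state 1, one chain of states per keyword."""
    delta = {}        # (state, symbol) -> set of next states
    accepting = {}    # accepting state -> the keyword it recognizes
    next_state = 2
    for word in keywords:          # keywords are assumed to be nonempty
        prev = 1
        for ch in word:
            delta.setdefault((prev, ch), set()).add(next_state)
            prev = next_state
            next_state += 1
        accepting[prev] = word
    return delta, accepting

def find_keywords(text, keywords):
    """Feed text one character at a time; report (end position, keyword) hits."""
    delta, accepting = build_keyword_nfa(keywords)
    current = {1}
    hits = []
    for pos, ch in enumerate(text):
        nxt = {1}                  # the start state loops to itself on every symbol
        for q in current:
            nxt |= delta.get((q, ch), set())
        current = nxt
        for q in current:
            if q in accepting:     # entering an accepting state signals a keyword
                hits.append((pos, accepting[q]))
    return hits

print(find_keywords("the web and ebay", ["web", "ebay"]))  # [(6, 'web'), (15, 'ebay')]
```

Positions are zero-based indexes of the last character of each occurrence. Overlapping occurrences are reported too: in the text webay, web ends at index 2 and ebay ends at index 4.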

Some text processing programs, such as advanced forms of the UNIX grep command (egrep and fgrep), actually use a mixture of these two approaches. However, for our purpose conversion to a DFA is easy and is guaranteed not to increase the number of states.

Explain the conversion of NFA to DFA using text search. (or)
Give short notes on conversion of an NFA to DFA form for text searches. (5 marks)

When we apply the subset construction to an NFA that was designed from a set of keywords, we find that the number of states of the DFA is never greater than the number of states of the NFA. The rules for constructing the set of DFA states are as follows.

a) If q0 is the start state of the NFA, then {q0} is one of the states of the DFA.
b) Suppose p is one of the NFA states, and it is reached from the start state along a path whose symbols are a1a2...am. Then one of the DFA states is the set of NFA states consisting of:
1. q0
2. p
3. Every other state of the NFA that is reachable from q0 by following a path whose labels are a suffix of a1a2...am, that is, any sequence of symbols of the form aj aj+1 ... am.

Fig (c). Conversion of the NFA to a DFA

There will be one DFA state for each NFA state p. However, in step (b), two states may actually yield the same set of NFA states, and thus become one state of the DFA. For eg., if two of the keywords begin with the same letter, say a, then the two NFA states that are reached from q0 by an arc labeled a will yield the same set of NFA states and thus get merged in the DFA.
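The claim that the DFA never has more states than the keyword NFA can be checked with a small subset construction. The sketch below is an assumption-laden illustration: the helper names are hypothetical, states are numbered from 0, and the alphabet is cut down to the keyword letters plus "#", which stands in for every other character.

```python
def keyword_nfa(keywords):
    """Return (transitions, number_of_states) for the keyword NFA; state 0 is the start."""
    trans = {}
    n = 1
    for word in keywords:
        prev = 0
        for ch in word:
            trans.setdefault((prev, ch), set()).add(n)
            prev = n
            n += 1
    return trans, n


def subset_construction(trans, alphabet):
    """Build only the DFA states reachable from {0}."""
    start = frozenset({0})
    dfa = {}
    todo = [start]
    while todo:
        subset = todo.pop()
        if subset in dfa:
            continue
        dfa[subset] = {}
        for a in alphabet:
            target = {0}  # the start state loops on every symbol
            for q in subset:
                target |= trans.get((q, a), set())
            target = frozenset(target)
            dfa[subset][a] = target
            todo.append(target)
    return dfa


def dfa_size(keywords):
    trans, nfa_states = keyword_nfa(keywords)
    alphabet = sorted(set("".join(keywords)) | {"#"})  # "#" = any other character
    return len(subset_construction(trans, alphabet)), nfa_states


web_ebay = dfa_size(["web", "ebay"])   # (DFA states, NFA states)
shared_a = dfa_size(["ab", "ac"])      # keywords sharing a first letter
```

For web and ebay both automata have 8 states. For the keywords ab and ac, which begin with the same letter, the two NFA states reached on a merge into one DFA state, so the DFA has 4 states against the NFA's 5.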
Give short notes on Finite Automata with Epsilon Transitions. (or)
Explain about Epsilon Transitions on Finite Automata with example. (10 marks)

The new "feature" is that we allow a transition on ε, the empty string. In effect, an NFA is allowed to make a transition spontaneously, without receiving an input symbol.

Uses of ε-Transitions:
We shall begin with an informal treatment of ε-NFA's, using transition diagrams with ε allowed as a label. In the examples to follow, think of the automaton as accepting those sequences of labels along paths from the start state to an accepting state. However, each ε along a path is "invisible"; i.e., it contributes nothing to the string along the path.

Example
The following figure is an ε-NFA that accepts decimal numbers consisting of:
1. An optional + or - sign.
2. A string of digits.
3. A decimal point, and
4. Another string of digits. Either this string of digits, or the string (2) can be empty, but at least one of the two strings of digits must be nonempty.

Of particular interest is the transition from q0 to q1 on any of ε, + or -. Thus, state q1 represents the situation in which we have seen the sign if there is one, and perhaps some digits, but not the decimal point. State q2 represents the situation where we have just seen the decimal point, and may or may not have seen prior digits. In q4 we have definitely seen at least one digit, but not the decimal point. Thus, the interpretation of q3 is that we have seen a decimal point and at least one digit, either before or after the decimal point.

The formal Notation for an ε-NFA:
We may represent an ε-NFA exactly as we do an NFA, with one exception: the transition function must include information about transitions on ε. Formally, we represent an ε-NFA A by A = (Q, Σ, δ, q0, F), where all components have the same interpretation as for an NFA, except that δ is now a function that takes as arguments:
1. A state in Q, and
2. A member of Σ ∪ {ε}, that is, either an input symbol, or the symbol ε. We require that ε, the symbol for the empty string, cannot be a member of the alphabet Σ, so no confusion results.

Example
The above ε-NFA is formally represented as
E = ({q0, q1, ..., q5}, {., +, -, 0, 1, ..., 9}, δ, q0, {q5})
where δ is defined by the transition table in the figure:

        ε       +,-     .       0,1,...,9
q0      {q1}    {q1}    ∅       ∅
q1      ∅       ∅       {q2}    {q1,q4}
q2      ∅       ∅       ∅       {q3}
q3      {q5}    ∅       ∅       {q3}
q4      ∅       ∅       {q3}    ∅
q5      ∅       ∅       ∅       ∅

Epsilon Closures

The ε-closure of a state is found by collecting every state that can be reached from the current state along any path whose arcs are all labeled ε. The formal definition is as follows:

BASIS State q is in ECLOSE(q).
INDUCTION If state p is in ECLOSE(q), and there is a transition from state p to state r labeled ε, then r is in ECLOSE(q). More precisely, if δ is the transition function of the ε-NFA involved, and p is in ECLOSE(q), then ECLOSE(q) also contains all the states in δ(p, ε).

Example
For the ε-NFA of the decimal-number example, each state is its own ε-closure, with two exceptions: ECLOSE(q0) = {q0, q1} and ECLOSE(q3) = {q3, q5}. The reason is that there are only two ε-transitions, one that adds q1 to ECLOSE(q0) and the other that adds q5 to ECLOSE(q3).

Now consider the following figure. For this collection of states, which may be part of some ε-NFA, we can conclude that ECLOSE(1) = {1,2,3,4,6}. Each of these states can be reached from state 1 along a path exclusively labeled ε. For eg., state 6 is reached by the path 1→2→3→6. State 7 is not in ECLOSE(1), since, although it is reachable from state 1, the path must use the arc 4→5 that is not labeled ε. The fact that state 6 is also reached from state 1 along a path 1→4→5→6 that has non-ε transitions is unimportant. The existence of one path with all labels ε is sufficient to show state 6 is in ECLOSE(1).
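The basis and induction of ECLOSE translate directly into a small worklist procedure. The sketch below is illustrative only. Since the figure is not reproduced here, the ε-arcs are reconstructed from the paths named in the text (1→2→3→6 and 1→4), and the arc 5→7 is assumed to be an ε-arc; the arcs 4→5 and 5→6 carry non-ε labels and are therefore left out.

```python
def eclose(state, eps_arcs):
    """Return ECLOSE(state) given a map from each state to its epsilon-successors."""
    closure = {state}            # BASIS: q is in ECLOSE(q)
    stack = [state]
    while stack:
        p = stack.pop()
        for r in eps_arcs.get(p, ()):   # INDUCTION: follow an epsilon-arc p -> r
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


# Epsilon-arcs only (assumed from the description of the figure).
eps_arcs = {1: [2, 4], 2: [3], 3: [6], 5: [7]}

closure_of_1 = eclose(1, eps_arcs)
```

Here closure_of_1 is the set {1, 2, 3, 4, 6}, matching the example: state 7 is excluded because reaching it requires the non-ε arc 4→5.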

Write short notes on the extended transitions and languages for ε-NFA's. (or)
Explain how to extend the transitions and languages for ε-NFA's. (10 marks)

The ε-closure allows us to explain easily what the transitions of an ε-NFA look like when given a sequence of inputs. From there, we can define what it means for an ε-NFA to accept its input.

Suppose that E = (Q, Σ, δ, q0, F) is an ε-NFA. We first define δ̂, the extended transition function, to reflect what happens on a sequence of inputs. The intent is that δ̂(q, w) is the set of states that can be reached along a path whose labels, when concatenated, form the string w. ε's along this path do not contribute to w.

BASIS δ̂(q, ε) = ECLOSE(q). That is, if the label of the path is ε, then we can follow only ε-labeled arcs extending from state q; that is exactly what ECLOSE does.

INDUCTION Suppose w is of the form xa, where a is the last symbol of w. Note that a is a member of Σ; it cannot be ε, which is not in Σ. We compute δ̂(q, w) as follows:

1. Let {p1, p2, ..., pk} be δ̂(q, x). That is, the pi's are the states that we can reach from q following a path labeled x. This path may end with transitions labeled ε, and may have other ε-transitions as well.

2. Let δ(p1, a) ∪ δ(p2, a) ∪ ... ∪ δ(pk, a) be the set {r1, r2, ..., rm}. That is, follow all transitions labeled a from states we can reach from q along paths labeled x. The rj's are some of the states we can reach from q along paths labeled w. The additional states we can reach are found from the rj's by following ε-labeled arcs in step (3) below.

3. Then δ̂(q, w) = ECLOSE(r1) ∪ ECLOSE(r2) ∪ ... ∪ ECLOSE(rm). This closure step includes all the paths from q labeled w, by considering the possibility that there are additional ε-labeled arcs that we can follow after making a transition on the final "real" symbol, a.
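The basis and the three inductive steps can be sketched in Python. The sketch below is a hedged illustration rather than a definitive implementation: it hard-codes the transition table of the decimal-number ε-NFA described earlier, uses the empty string "" to stand for ε, and the names delta, eclose and delta_hat are hypothetical.

```python
EPS = ""  # stands for epsilon (an assumption of this sketch)
DIGITS = "0123456789"

# Transition table of the decimal-number epsilon-NFA; missing entries mean the empty set.
delta = {
    ("q0", EPS): {"q1"}, ("q0", "+"): {"q1"}, ("q0", "-"): {"q1"},
    ("q1", "."): {"q2"},
    ("q3", EPS): {"q5"},
    ("q4", "."): {"q3"},
}
for d in DIGITS:
    delta[("q1", d)] = {"q1", "q4"}
    delta[("q2", d)] = {"q3"}
    delta[("q3", d)] = {"q3"}


def eclose(states):
    """Epsilon-closure of a set of states."""
    closure, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for r in delta.get((p, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def delta_hat(q, w):
    """Extended transition function: the set of states reachable from q on string w."""
    current = eclose({q})                  # BASIS
    for a in w:                            # INDUCTION, one symbol at a time
        moved = set()
        for p in current:                  # step 2: follow arcs labeled a
            moved |= delta.get((p, a), set())
        current = eclose(moved)            # step 3: close the result
    return current


result = delta_hat("q0", "5.6")
```

With this table, delta_hat("q0", "5") is {q1, q4}, delta_hat("q0", "5.") is {q2, q3, q5}, and delta_hat("q0", "5.6") is {q3, q5}. Since q5 is the accepting state, the string 5.6 is accepted.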

Example
Let us compute δ̂(q0, 5.6) for the above ε-NFA (page no. 13). A summary of the steps needed is as follows:

δ̂(q0, ε) = ECLOSE(q0) = {q0, q1}

Compute δ̂(q0, 5) as follows:
1. First compute the transitions on input 5 from the states q0 and q1 that we obtained in the calculation of δ̂(q0, ε), above. That is, we compute δ(q0, 5) ∪ δ(q1, 5) = {q1, q4}.
2. Next, ε-close the members of the set computed in step (1). We get ECLOSE(q1) ∪ ECLOSE(q4) = {q1} ∪ {q4} = {q1, q4}. That set is δ̂(q0, 5).

This two-step pattern repeats for the next two symbols.

Compute δ̂(q0, 5.) as follows:
1. First compute δ(q1, .) ∪ δ(q4, .) = {q2} ∪ {q3} = {q2, q3}.
2. Then compute δ̂(q0, 5.) = ECLOSE(q2) ∪ ECLOSE(q3) = {q2} ∪ {q3, q5} = {q2, q3, q5}.

Compute δ̂(q0, 5.6) as follows:
1. First compute δ(q2, 6) ∪ δ(q3, 6) ∪ δ(q5, 6) = {q3} ∪ {q3} ∪ ∅ = {q3}.
2. Then compute δ̂(q0, 5.6) = ECLOSE(q3) = {q3, q5}.

Explain how to eliminate ε-transitions with an example. (or)
Give a brief note on elimination of ε-transitions. (5 marks)

Given any ε-NFA E, we can find a DFA D that accepts the same language as E. The construction we use is very close to the subset construction, as the states of D are subsets of the states of E. The only difference is that we must incorporate ε-transitions of E, which we do through the mechanism of the ε-closure.

Let E = (QE, Σ, δE, q0, FE). Then the equivalent DFA D = (QD, Σ, δD, qD, FD) is defined as follows:
1. QD is the set of subsets of QE. More precisely, we shall find that the only accessible states of D are the ε-closed subsets of QE, that is, those sets S ⊆ QE such that S = ECLOSE(S).
2. qD = ECLOSE(q0); we get the start state of D by closing the set consisting of only the start state of E.
3. FD is those sets of states that contain at least one accepting state of E. That is, FD = {S | S is in QD and S ∩ FE ≠ ∅}.
4. δD(S, a) is computed, for all a in Σ and sets S in QD, by:
a. Let S = {p1, p2, ..., pk}.
b. Compute δE(p1, a) ∪ δE(p2, a) ∪ ... ∪ δE(pk, a); let this set be {r1, r2, ..., rm}.
c. Then δD(S, a) = ECLOSE(r1) ∪ ECLOSE(r2) ∪ ... ∪ ECLOSE(rm).

Let us eliminate ε-transitions from the ε-NFA of the above figure, which we shall call E in what follows. From E, we construct a DFA D, which is shown in the following figure. Imagine that for each state shown in fig. 2.22 there are additional transitions to the dead state ∅ on any input symbols for which a transition is not indicated. Also, the state ∅ has transitions to itself on all input symbols.

Since the start state of E is q0, the start state of D is ECLOSE(q0), which is {q0, q1}. Our first job is to find the successors of q0 and q1 on the various symbols in Σ; note that these symbols are the plus and minus signs, the dot, and the digits 0 through 9. On + and -, q1 goes nowhere in the NFA, while q0 goes to q1. Thus, to compute δD({q0, q1}, +) we start with {q1} and ε-close it. Since there are no ε-transitions out of q1, we have δD({q0, q1}, +) = {q1}. Similarly, δD({q0, q1}, -) = {q1}. These two transitions are shown by one arc in the figure.

Next, we need to compute δD({q0, q1}, .). Since q0 goes nowhere on the dot, and q1 goes to q2 in the NFA, we must ε-close {q2}. As there are no ε-transitions out of q2, this state is its own closure, so δD({q0, q1}, .) = {q2}.

Finally, we must compute δD({q0, q1}, 0), as an example of the transitions from {q0, q1} on all the digits. We find that q0 goes nowhere on the digits, but q1 goes to both q1 and q4. Since neither of those states has ε-transitions out, we conclude δD({q0, q1}, 0) = {q1, q4}, and likewise for the other digits.

Since q5 is the only accepting state of E, the accepting states of D are those accessible states that contain q5. We see these two sets, {q3, q5} and {q2, q3, q5}, indicated by double circles in the figure.
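The whole elimination procedure can be sketched as a modified subset construction. The code below is an illustrative sketch under stated assumptions: it re-enters the transition table of the decimal-number ε-NFA, uses "" for ε, and builds only the DFA states accessible from the start state. All function names are hypothetical.

```python
EPS = ""  # stands for epsilon (an assumption of this sketch)
DIGITS = "0123456789"
SIGMA = "+-." + DIGITS

delta_E = {
    ("q0", EPS): {"q1"}, ("q0", "+"): {"q1"}, ("q0", "-"): {"q1"},
    ("q1", "."): {"q2"}, ("q3", EPS): {"q5"}, ("q4", "."): {"q3"},
}
for d in DIGITS:
    delta_E[("q1", d)] = {"q1", "q4"}
    delta_E[("q2", d)] = {"q3"}
    delta_E[("q3", d)] = {"q3"}


def eclose(states):
    closure, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for r in delta_E.get((p, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)


def eliminate_epsilon(start="q0", accepting_E=frozenset({"q5"})):
    """Return (q_D, delta_D, F_D) for the accessible part of the DFA D."""
    q_D = eclose({start})
    delta_D, todo = {}, [q_D]
    while todo:
        S = todo.pop()
        if S in delta_D:
            continue
        delta_D[S] = {}
        for a in SIGMA:
            moved = set()
            for p in S:                       # step 4b
                moved |= delta_E.get((p, a), set())
            T = eclose(moved)                 # step 4c
            delta_D[S][a] = T
            todo.append(T)
    F_D = {S for S in delta_D if S & accepting_E}
    return q_D, delta_D, F_D


def accepts(w):
    state, delta_D, F_D = eliminate_epsilon()
    for a in w:
        state = delta_D[state][a]
    return state in F_D


q_D, delta_D, F_D = eliminate_epsilon()
```

The accessible DFA has seven states, counting the dead state ∅ (the empty frozenset), and its accepting states are exactly {q3, q5} and {q2, q3, q5}, as in the worked example.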

End of Unit - I


UNIT - II

Define regular expression with example. (or)
Give a brief note on regular expressions. (10 marks)

The algebraic description of a language is called a "regular expression". Regular expressions offer a declarative way to express the strings we want to accept. Thus, regular expressions serve as the input language for many systems that process strings. Examples include:
Search commands
Lexical analyzer generators

The operators of Regular Expressions:
Regular expressions denote languages. For a simple example, the regular expression 01* + 10* denotes the language consisting of all strings that are either a single 0 followed by any number of 1's or a single 1 followed by any number of 0's. Before describing the regular-expression notation, we need to learn the three operations on languages that the operators of regular expressions represent. These operations are:

1. The union of two languages L and M, denoted L ∪ M, is the set of strings that are in either L or M, or both. For eg., if L = {001, 10, 111} and M = {ε, 001}, then L ∪ M = {ε, 10, 001, 111}.

2. The concatenation of languages L and M is the set of strings that can be formed by taking any string in L and concatenating it with any string in M. This operator is denoted either with a dot or with no operator at all. For eg., if L = {001, 10, 111} and M = {ε, 001}, then L.M, or just LM, is {001, 10, 111, 001001, 10001, 111001}. The first three strings in LM are the strings in L concatenated with ε. Since ε is the identity for concatenation, the resulting strings are the same as the strings of L. However, the last three strings in LM are formed by taking each string in L and concatenating it with the second string in M, which is 001. For instance, 10 from L concatenated with 001 from M gives us 10001 for LM.

3. The closure of a language L is denoted L* and represents the set of those strings that can be formed by taking any number of strings from L, possibly with repetitions, and concatenating all of them. For instance, if L = {0, 1}, then L* is all strings of 0's and 1's. If L = {0, 11}, then L* consists of those strings of 0's and 1's such that the 1's come in pairs, e.g., 011, 11110 and ε, but not 01011 or 101.

Building Regular Expressions:
Algebras of all kinds start with some elementary expressions, usually constants or variables. Algebras then allow us to construct more expressions by applying a certain set of operators to these elementary expressions and to previously constructed expressions. We can describe the regular expressions recursively, as follows. In this definition we not only describe what the legal regular expressions are, but for each regular expression E, we describe the language it represents, which we denote L(E).

BASIS The basis consists of three parts.
1. The constants ε and ∅ are regular expressions, denoting the languages {ε} and ∅ respectively. That is, L(ε) = {ε}, and L(∅) = ∅.
2. If a is any symbol, then a is a regular expression. This expression denotes the language {a}. That is, L(a) = {a}.
3. A variable, usually capitalized and italic such as L, represents any language.

INDUCTION There are four parts to the inductive step, one for each of the three operators and one for the introduction of parentheses.
1. If E and F are regular expressions, then E+F is a regular expression denoting the union of L(E) and L(F). That is, L(E+F) = L(E) ∪ L(F).
2. If E and F are regular expressions, then EF is a regular expression denoting the concatenation of L(E) and L(F). That is, L(EF) = L(E)L(F).
3. If E is a regular expression, then E* is a regular expression, denoting the closure of L(E). That is, L(E*) = (L(E))*.
4. If E is a regular expression, then (E), a parenthesized E, is also a regular expression, denoting the same language as E. Formally, L((E)) = L(E).

Precedence of Regular-Expression operators:
Like other algebras, the regular expression operators have an assumed order of "precedence", which means that operators are associated with their operands in a particular order. For instance, we know that xy+z groups the product xy before the sum, so it is equivalent to the parenthesized expression (xy)+z and not to the expression x(y+z). For regular expressions, the following is the order of precedence for the operators.
1. The star (closure) operator is of highest precedence.
2. Next in precedence comes the concatenation or "dot" operator. After grouping all stars to their operands, we group concatenation operators to their operands. Since concatenation is an associative operator, the grouping order does not affect the language; by convention we group from the left. For instance, 012 is grouped (01)2.
3. Finally, all unions (+ operators) are grouped with their operands. Since union is also associative, it too is grouped from the left.

Explain finite automata with regular expressions. (or)
Describe finite automata and regular expressions. (10 marks)

While the regular-expression approach to describing languages is fundamentally different from the finite-automaton approach, these two notations turn out to represent exactly the same set of languages, which we termed the "regular languages". In order to show that the regular expressions define the same class, we must show that:
1. Every language defined by one of these automata is also defined by a regular expression. For this proof, we assume the language is accepted by some DFA.
2. Every language defined by a regular expression is defined by one of these automata. For this part of the proof, the easiest approach is to show that there is an NFA with ε-transitions accepting the same language.

The above figure shows all the equivalences. An arc from class X to class Y means that we prove every language defined by class X is also defined by class Y. Since the graph is strongly connected, we see that all four classes are really the same.

From DFA's to Regular Expressions:
We build expressions that describe sets of strings that label certain paths in the DFA's transition diagram. However, the paths are allowed to pass through only a limited subset of the states. In an inductive definition of these expressions, we start with the simplest expressions, which describe paths that are not allowed to go through any intermediate state. We then build expressions for paths that may go through progressively larger sets of states, until the expressions we generate at the end represent all possible paths.

Example:
Let us convert the DFA to a regular expression. This DFA accepts all strings that have at least one 0 in them. To see why, note that the automaton goes from the start state 1 to accepting state 2 as soon as it sees an input 0. The automaton then stays in state 2 on all input sequences.

From the DFA, the basis expressions can be constructed as shown in the table. For instance, R_{11}^{(0)} has the term ε because the beginning and ending states are the same, state 1. It has the term 1 because there is an arc from state 1 to state 1 on input 1. As another example, R_{12}^{(0)} is 0 because there is an arc labeled 0 from state 1 to state 2. There is no ε term because the beginning and ending states are different. For a third example, R_{21}^{(0)} = ∅, because there is no arc from state 2 to state 1.

Now, construct the expressions R_{ij}^{(1)} by also allowing paths to pass through state 1. The rule for computing the expressions R_{ij}^{(1)} is

R_{ij}^{(1)} = R_{ij}^{(0)} + R_{i1}^{(0)} (R_{11}^{(0)})* R_{1j}^{(0)}

The following table gives first the expressions computed by direct substitution into the above formula, and then a simplified expression that we can show to represent the same language as the more complex expression.

To understand the simplification, note the general principle that if R is any regular expression, then (ε+R)* = R*. In our case, we have (ε+1)* = 1*; both expressions denote any number of 1's. Also, (ε+1)1* = 1*, which again denotes any number of 1's. Thus, the original expression R_{12}^{(1)} = 0 + (ε+1)(ε+1)*0 is equivalent to 0+1*0. This expression denotes the language containing the string 0 and all strings consisting of a 0 preceded by any number of 1's. This language is also expressed by the simpler expression 1*0.

For any regular expression R: ∅R = R∅ = ∅. That is, ∅ is an annihilator for concatenation. And ∅+R = R+∅ = R. That is, ∅ is the identity for union; it results in the other expression whenever it appears in a union.

Let us compute the expressions R_{ij}^{(2)}. The inductive rule applied with k=2 gives us:

R_{ij}^{(2)} = R_{ij}^{(1)} + R_{i2}^{(1)} (R_{22}^{(1)})* R_{2j}^{(1)}

The expressions are given in the following table.
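As a worked instance of the inductive rule, the one expression needed for the final answer, R_{12}^{(2)}, can be derived step by step. The value R_{22}^{(0)} = ε+0+1 used below is not stated in the text; it is inferred from the description of the DFA, which stays in state 2 on both inputs.

```latex
\begin{align*}
R_{11}^{(0)} &= \varepsilon + 1, \quad R_{12}^{(0)} = 0, \quad
R_{21}^{(0)} = \emptyset, \quad R_{22}^{(0)} = \varepsilon + 0 + 1 \\[4pt]
R_{12}^{(1)} &= R_{12}^{(0)} + R_{11}^{(0)} \bigl(R_{11}^{(0)}\bigr)^{*} R_{12}^{(0)}
  = 0 + (\varepsilon + 1)(\varepsilon + 1)^{*} 0 = 1^{*}0 \\
R_{22}^{(1)} &= R_{22}^{(0)} + R_{21}^{(0)} \bigl(R_{11}^{(0)}\bigr)^{*} R_{12}^{(0)}
  = (\varepsilon + 0 + 1) + \emptyset (\varepsilon + 1)^{*} 0 = \varepsilon + 0 + 1 \\[4pt]
R_{12}^{(2)} &= R_{12}^{(1)} + R_{12}^{(1)} \bigl(R_{22}^{(1)}\bigr)^{*} R_{22}^{(1)} \\
  &= 1^{*}0 + 1^{*}0 (\varepsilon + 0 + 1)^{*} (\varepsilon + 0 + 1)
  = 1^{*}0 (0 + 1)^{*}
\end{align*}
```

The last simplification uses (ε+0+1)* = (0+1)* and the fact that 1*0 + 1*0(0+1)*(ε+0+1) factors as 1*0(ε + (0+1)*(ε+0+1)) = 1*0(0+1)*.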

The final regular expression equivalent to the DFA is constructed by taking the union of all the expressions where the first state is the start state and the second state is accepting. In this eg., with 1 as the start state and 2 as the only accepting state, we need only the expression R_{12}^{(2)}. This expression is 1*0(0+1)*. It is simple to interpret this expression. Its language consists of all strings that begin with zero or more 1's, then have a 0, and then any string of 0's and 1's. Put another way, the language is all strings of 0's and 1's with at least one 0.

Explain the conversion of DFA to regular expression. (or)
Describe the concepts of eliminating states in the conversion of DFA to Regular expression. (10 marks)

The regular expressions can be constructed by eliminating the states of the DFA. When a state s is eliminated, all the paths that went through s no longer exist in the automaton. If the language of the automaton is not to change, we must include, on an arc that goes directly from each predecessor of s to each successor of s, the labels of the paths that went between them through s. Since such a label may now stand for infinitely many strings, a regular expression is used as the label. Finally, the language of the automaton is the union, over all paths from the start state to an accepting state, of the language formed by concatenating the languages of the regular expressions along that path.

The following figure (a) shows a generic state s about to be eliminated. The state s has predecessor states q1, q2, ..., qk and successor states p1, p2, ..., pm.

[Figure (a): a state s about to be eliminated. Figure (b): the result of eliminating s.]

Fig. (b) shows what happens when we eliminate state s. All arcs involving state s are deleted. To compensate, we introduce, for each predecessor qi of s and each successor pj of s, a regular expression that represents all the paths that start at qi, go to s, perhaps loop around s zero or more times, and finally go to pj. The expression for these paths is Qi S* Pj, where Qi labels the arc from qi to s, S labels the loop on s, and Pj labels the arc from s to pj. This expression is added (with the union operator) to the arc from qi to pj. If there was no arc qi→pj, then first introduce one with regular expression ∅.

The strategy for constructing a regular expression from a finite automaton is as follows:

1. For each accepting state q, apply the above reduction process to produce an equivalent automaton with regular expression labels on the arcs. Eliminate all states except q and the start state q0.

2. If q ≠ q0, then we shall be left with a two-state automaton that looks like the one in the figure. The regular expression for the accepted strings can be described in various ways. One is (R+SU*T)*SU*.
a. In explanation, we can go from the start state to itself any number of times, by following a sequence of paths whose labels are in either L(R) or L(SU*T).
b. The expression SU*T represents paths that go to the accepting state via a path in L(S), perhaps return to the accepting state several times using a sequence of paths with labels in L(U), and then return to the start state with a path whose label is in L(T).
c. Then we must go to the accepting state, never to return to the start state, by following a path with a label in L(S).
d. Once in the accepting state, we can return to it as many times as we like, by following a path whose label is in L(U).

3. If the start state is also an accepting state, then we must also perform a state-elimination from the original automaton that gets rid of every state but the start state. When we do so, we are left with a one-state automaton that looks like the one in the following figure. The regular expression is R*.

4. The desired regular expression is the sum of all the expressions derived from the reduced automata for each accepting state, by rules (2) and (3).
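The strategy above can be sketched in Python. This is a rough illustration under several assumptions: arc labels are written in Python `re` syntax (so union is `|` rather than `+`), a missing arc plays the role of ∅, the empty string plays the role of ε, no simplification is attempted, and all function names are hypothetical. The automaton used is the one described in the example that follows (A loops on 0 and 1, A goes to B on 1, B to C and C to D on 0 or 1, with C and D accepting).

```python
import re
from itertools import product


def union(a, b):
    if a is None:            # None stands for the empty-set expression
        return b
    if b is None:
        return a
    return f"{a}|{b}"


def concat(*parts):
    if any(p is None for p in parts):   # the empty set annihilates concatenation
        return None
    return "".join(f"(?:{p})" for p in parts if p != "")


def star(a):
    if a is None or a == "":            # closure of empty set or epsilon is epsilon
        return ""
    return f"(?:{a})*"


def eliminate(arcs, s):
    """Remove state s, adding Qi S* Pj to the arc from each predecessor to each successor."""
    loop = arcs.get((s, s))
    preds = [(q, r) for (q, t), r in arcs.items() if t == s and q != s]
    succs = [(p, r) for (t, p), r in arcs.items() if t == s and p != s]
    new_arcs = {k: r for k, r in arcs.items() if s not in k}
    for q, q_label in preds:
        for p, p_label in succs:
            path = concat(q_label, star(loop), p_label)
            new_arcs[(q, p)] = union(new_arcs.get((q, p)), path)
    return new_arcs


def regex_for(arcs, states, start, accept):
    for s in states:
        if s not in (start, accept):
            arcs = eliminate(arcs, s)
    R = arcs.get((start, start))
    if start == accept:
        return star(R)                                        # rule 3: R*
    S, U, T = arcs.get((start, accept)), arcs.get((accept, accept)), arcs.get((accept, start))
    return concat(star(union(R, concat(S, star(U), T))), S, star(U))   # rule 2


def automaton_regex(arcs, states, start, accepting):
    parts = [regex_for(arcs, states, start, f) for f in accepting]
    parts = [p for p in parts if p is not None]
    return "|".join(f"(?:{p})" for p in parts) if parts else None      # rule 4


arcs = {("A", "A"): "0|1", ("A", "B"): "1", ("B", "C"): "0|1", ("C", "D"): "0|1"}
rx = automaton_regex(arcs, ["A", "B", "C", "D"], "A", ["C", "D"])

# Compare against the intended language on all strings of length up to 6.
for n in range(7):
    for w in map("".join, product("01", repeat=n)):
        expected = (n >= 2 and w[-2] == "1") or (n >= 3 and w[-3] == "1")
        assert bool(re.fullmatch(rx, w)) == expected
```

The resulting pattern is not simplified, but it matches exactly the strings whose second or third symbol from the end is 1.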

Eg: Let us consider the NFA in the above figure that accepts all strings of 0's and 1's such that either the second or third position from the end has a 1. Our first step is to convert it to an automaton with regular expression labels. Since no state elimination has been performed, all we have to do is replace the labels "0,1" with the equivalent regular expression 0+1. The result is shown in the following figure.

Let us first eliminate state B. Since this state is neither accepting nor the start state, it will not be in any of the reduced automata. State B has one predecessor, A, and one successor, C. As a result, the expression on the new arc from A to C is ∅+1∅*(0+1). To simplify, we first eliminate the initial ∅, which may be ignored in a union. The expression thus becomes 1∅*(0+1). Note that the regular expression ∅* is equivalent to the regular expression ε. Thus, 1∅*(0+1) is equivalent to 1(0+1).

Now, we must branch, eliminating states C and D in separate reductions. Eliminating state C proceeds in the same way as eliminating state B, and the resulting automaton is as shown in the figure.

In terms of the generic two-state automaton, the regular expressions are: R = 0+1, S = 1(0+1)(0+1), T = ∅, and U = ∅. The expression U* can be replaced by ε. Also, the expression SU*T is equivalent to ∅. The generic expression (R+SU*T)*SU* thus simplifies in this case to R*S, or (0+1)*1(0+1)(0+1). In informal terms, the language of this expression is any string ending in a 1 followed by two symbols that are each either 0 or 1. That language is one portion of the strings accepted by the original automaton: those strings whose third position from the end has a 1.

Now we start again and eliminate state D instead of C. Since D has no successors, no new arcs are introduced; the arc from C to D simply disappears along with state D. The resulting two-state automaton is shown in the figure.

Thus, we can apply the rule for two-state automata and simplify the expression to get (0+1)*1(0+1). This expression represents the other type of string the automaton accepts: those with a 1 in the second position from the end. All that remains is to sum the two expressions to get the expression for the entire automaton. This expression is
(0+1)*1(0+1) + (0+1)*1(0+1)(0+1)

Describe the conversion of regular expression into automata. (or)
Explain the conversion of regular expression into finite automata. (10 marks)

Automata can be constructed for single symbols, ε and ∅, and they can be combined into larger automata that accept the union, concatenation, or closure of the languages accepted by smaller automata. All of the automata we construct are ε-NFA's with a single accepting state.

Theorem: Every language defined by a regular expression is also defined by a finite automaton.

Proof: Suppose L = L(R) for a regular expression R. We show that L = L(E) for some ε-NFA E with:
1. Exactly one accepting state
2. No arcs into the initial state
3. No arcs out of the accepting state

BASIS There are three parts to the basis, as shown in the figure. Part (a) shows how to handle the expression ε. The language of the automaton is easily seen to be {ε}, since the only path from the start state to an accepting state is labeled ε. Part (b) shows the construction for ∅. Clearly there are no paths from start state to accepting state, so ∅ is the language of this automaton. Finally, part (c) gives the automaton for a regular expression a. The language of this automaton consists of the one string a, which is also L(a). It is easy to check that these automata all satisfy conditions (1), (2) and (3) of the inductive hypothesis.

INDUCTION The three parts of the induction are shown in the figure. It is assumed that the statement of the theorem is true for the immediate subexpressions of a given regular expression; that is, the languages of these subexpressions are also the languages of ε-NFA's with a single accepting state. The four cases are:

1. The expression is R+S for some smaller expressions R and S. Then the automaton of fig (a) serves. That is, starting at the new start state, we can go to the start state of the automaton for either R or S. We then reach the accepting state of one of these automata, following a path labeled by some string in L(R) or L(S). Once we reach the accepting state of the automaton for R or S, we can follow one of the ε-arcs to the accepting state of the new automaton. Thus, the language of the automaton in fig (a) is L(R) ∪ L(S).

2. The expression is RS for some smaller expressions R and S. The automaton for the concatenation is shown in fig (b). Note that the start state of the first automaton becomes the start state of the whole, and the accepting state of the second automaton becomes the accepting state of the whole. The idea is that the only paths from the start state to the accepting state go first through the automaton for R, where they must follow a path labeled by a string in L(R), and then through the automaton for S, where they follow a path labeled by a string in L(S). Thus, the paths in the automaton of fig (b) are all and only those labeled by strings in L(R)L(S).

3. The expression is R* for some smaller expression R. Then we use the automaton of fig (c). That automaton allows us to go either:
a. Directly from the start state to the accepting state along a path labeled ε. That path lets us accept ε, which is in L(R*) no matter what R is.
b. To the start state of the automaton for R, through that automaton one or more times, and then to the accepting state. This set of paths allows us to accept strings in L(R), L(R)L(R), L(R)L(R)L(R), and so on, thus covering all strings in L(R*) except perhaps ε, which was covered by the direct arc to the accepting state mentioned in (3a).

4. The expression is (R) for some smaller expression R. The automaton for R also serves as the automaton for (R), since the parentheses do not change the language defined by the expression.

Conclusion: It is a simple observation that the constructed automata satisfy the three conditions given in the inductive hypothesis: one accepting state, with no arcs into the initial state or out of the accepting state.

Example
Let us convert the regular expression (0+1)*1(0+1) to an ε-NFA. Our first step is to construct an automaton for 0+1. Next, we apply the closure operation to it. The third automaton in the concatenation is another automaton for 0+1. The step by step construction is shown in the figure.
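The basis and induction cases can be sketched as small Python combinators that build the ε-NFA bottom-up, in the same order as the example. This is only a sketch: an automaton is represented as a hypothetical triple (start, accept, arcs), "" stands for ε, the concatenation case links the two automata with an ε-arc (one common way to realise it), and no parser is included, so the expression is assembled by hand.

```python
from itertools import count, product

_fresh = count()   # source of new state numbers
EPS = ""           # stands for epsilon (an assumption of this sketch)


def symbol(a):
    s, f = next(_fresh), next(_fresh)
    return s, f, [(s, a, f)]


def union(m, n):
    s, f = next(_fresh), next(_fresh)
    arcs = m[2] + n[2] + [(s, EPS, m[0]), (s, EPS, n[0]), (m[1], EPS, f), (n[1], EPS, f)]
    return s, f, arcs


def concat(m, n):
    # The accepting state of m is linked to the start state of n by an epsilon-arc.
    return m[0], n[1], m[2] + n[2] + [(m[1], EPS, n[0])]


def star(m):
    s, f = next(_fresh), next(_fresh)
    arcs = m[2] + [(s, EPS, m[0]), (s, EPS, f), (m[1], EPS, m[0]), (m[1], EPS, f)]
    return s, f, arcs


def eclose(states, arcs):
    closure, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for src, label, dst in arcs:
            if src == p and label == EPS and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return closure


def accepts(nfa, w):
    start, accept, arcs = nfa
    current = eclose({start}, arcs)
    for a in w:
        moved = {dst for src, label, dst in arcs if src in current and label == a}
        current = eclose(moved, arcs)
    return accept in current


def zero_or_one():
    return union(symbol("0"), symbol("1"))


# (0+1)*1(0+1), built in the order used in the example.
nfa = concat(concat(star(zero_or_one()), symbol("1")), zero_or_one())

# The language should be: strings whose second symbol from the end is 1.
for n in range(6):
    for w in map("".join, product("01", repeat=n)):
        assert accepts(nfa, w) == (n >= 2 and w[-2] == "1")
```

The constructed automaton also satisfies the three conditions of the proof: it has one accepting state, no arc enters its start state, and no arc leaves its accepting state.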

State the algebraic laws for regular expressions. (or)
Explain the Algebraic laws available in regular expressions. (10 marks)

Like arithmetic expressions, the regular expressions have a number of laws that work for them. Many of these are similar to the laws for arithmetic, if we think of union as addition and concatenation as multiplication.

Commutativity
Commutativity is the property of an operator that says we can switch the order of its operands and get the same result. An example from arithmetic is x+y = y+x.
L+M = M+L. This law, the commutative law for union, says that we may take the union of two languages in either order.

Associativity
Associativity is the property of an operator that allows us to regroup the operands when the operator is applied twice. For eg., the associative law of multiplication is (x × y) × z = x × (y × z).
(L+M)+N = L+(M+N). This law, the associative law for union, says that we may take the union of three languages either by taking the union of the first two initially, or by taking the union of the last two initially.
(LM)N = L(MN). This law, the associative law for concatenation, says that we can concatenate three languages by concatenating either the first two or the last two initially.

Identities
An identity for an operator is a value such that when the operator is applied to the identity and some other value, the result is the other value. For instance, 0 is the identity for addition, since 0+x = x+0 = x, and 1 is the identity for multiplication, since 1 × y = y × 1 = y.
∅+L = L+∅ = L. This law asserts that ∅ is the identity for union.
εL = Lε = L. This law asserts that ε is the identity for concatenation.

Annihilators
An annihilator for an operator is a value such that when the operator is applied to the annihilator and some other value, the result is the annihilator. For instance, 0 is an annihilator for multiplication, since 0 × y = y × 0 = 0. There is no annihilator for addition.
∅L = L∅ = ∅. This law asserts that ∅ is the annihilator for concatenation.

Distributive Laws:
A distributive law involves two operators, and asserts that one operator can be pushed down to be applied to each argument of the other operator individually. The most common example from arithmetic is the distributive law of multiplication over addition, that is, x × (y+z) = x × y + x × z. Because concatenation is not commutative, there are two forms of the law:
L(M+N) = LM + LN. This law is the left distributive law of concatenation over union.
(M+N)L = ML + NL. This law is the right distributive law of concatenation over union.

The Idempotent Law:
An operator is said to be idempotent if the result of applying it to two of the same values as arguments is that value. The common arithmetic operators are not idempotent; x+x ≠ x in general and x × x ≠ x in general. However, union and intersection are common examples of idempotent operators. Thus, for regular expressions,
L+L = L. This law, the idempotent law for union, states that if we take the union of two identical expressions, we can replace them by one copy of the expression.
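The laws above can be spot-checked on small finite languages, representing a language as a Python set of strings. This is a sanity check on sample languages, not a proof, and the helper names are hypothetical. The languages L and M are the ones used earlier in the discussion of union and concatenation; N is an extra sample.

```python
def union(L, M):
    return L | M


def concat(L, M):
    return {x + y for x in L for y in M}


EMPTY = set()       # the language with no strings
EPSILON = {""}      # the language containing only the empty string

L = {"001", "10", "111"}
M = {"", "001"}
N = {"0", "11"}

laws = {
    "union is commutative": union(L, M) == union(M, L),
    "union is associative": union(union(L, M), N) == union(L, union(M, N)),
    "concatenation is associative": concat(concat(L, M), N) == concat(L, concat(M, N)),
    "empty set is the identity for union": union(EMPTY, L) == L == union(L, EMPTY),
    "epsilon is the identity for concatenation": concat(EPSILON, L) == L == concat(L, EPSILON),
    "empty set annihilates concatenation": concat(EMPTY, L) == EMPTY == concat(L, EMPTY),
    "left distributive law": concat(L, union(M, N)) == union(concat(L, M), concat(L, N)),
    "right distributive law": concat(union(M, N), L) == union(concat(M, L), concat(N, L)),
    "union is idempotent": union(L, L) == L,
}

# Concatenation is not commutative in general, which is why two distributive laws are needed.
concat_commutes_here = concat(L, M) == concat(M, L)
```

Every entry of laws is True for these sample languages, while concat_commutes_here is False.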

Laws involving closures:
There are a number of laws involving the closure operators, as given below.
(L*)* = L*. This law says that closing an expression that is already closed does not change the language.
∅* = ε. The closure of ∅ contains only the string ε.
ε* = ε. It is easy to check that the only string that can be formed by concatenating any number of copies of the empty string is the empty string itself.
L+ = LL* = L*L.
L* = L+ + ε.

Describe the pumping Lemma for regular languages and applications of the pumping Lemma. (or)
Prove that the languages are not regular. (10 marks)

Regular languages have at least four different descriptions. They are the languages accepted by DFA's, by NFA's and by ε-NFA's; they are also the languages defined by regular expressions. Not every language is a regular language. We shall introduce a powerful technique, known as the "pumping lemma", for showing certain languages not to be regular.

The pumping Lemma for Regular Languages
Let us consider the language L01 = {0^n 1^n | n ≥ 1}. This language contains all strings 01, 0011, 000111, and so on, that consist of one or more 0's followed by an equal number of 1's. L01 is not a regular language. Intuitively, a DFA with k states that reads more than k 0's must be in the same state q after two different numbers of 0's. When the 1's start, the automaton is in state q and cannot "remember" how many 0's it has received already, so it cannot check that the number of 1's is equal. So this language cannot be represented by a finite automaton or a regular expression.

The pumping lemma for regular languages:

Let L be a regular language. Then there exists a constant n such that for every string w in L such that |w| ≥ n, the string can be divided into three strings, w = xyz, such that:
1. y ≠ ε
2. |xy| ≤ n
3. For all k ≥ 0, the string xy^k z is also in L.

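Proof: Suppose L is regular. Then L = L(A) for some DFA A. Suppose A has n states. Now, consider any string w of length n or more, say w = a_1a_2...a_m, where m ≥ n and each a_i is an input symbol. For i = 0, 1, ..., n let p_i be the state A is in after reading a_1a_2...a_i (so p_0 is the start state). These are n + 1 states, but A has only n different states, so by the pigeonhole principle there are integers i and j with 0 ≤ i < j ≤ n such that p_i = p_j. Now we can break w = xyz as follows:
1. x = a_1a_2...a_i
2. y = a_{i+1}a_{i+2}...a_j
3. z = a_{j+1}a_{j+2}...a_m
Since y takes A from state p_i back to state p_i, the string xy^k z is accepted for every k ≥ 0. Moreover y ≠ ε because i < j, and |xy| = j ≤ n. The following figure shows the path of the DFA A on the string xyz.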
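The use of the lemma on L01 = {0^n 1^n | n ≥ 1} can be illustrated with a small sketch. It is only a demonstration under stated assumptions: the function names and the choice n = 5 are hypothetical, and checking k only up to a small bound can exhibit a failure of pumping but is not by itself a proof. For w = 0^n 1^n, every allowed split puts y inside the leading 0's, so pumping changes the number of 0's only.

```python
def in_L01(s: str) -> bool:
    """Membership test for L01 = {0^n 1^n | n >= 1}."""
    half = len(s) // 2
    return len(s) >= 2 and len(s) % 2 == 0 and s == "0" * half + "1" * half


def pumpable_splits(w: str, n: int, max_k: int = 3):
    """Return every split w = xyz with y != '' and |xy| <= n for which
    x y^k z stays in L01 for all k in 0..max_k (a bounded check only)."""
    good = []
    for i in range(0, n + 1):              # x = w[:i]
        for j in range(i + 1, n + 1):      # y = w[i:j] is non-empty, |xy| = j <= n
            x, y, z = w[:i], w[i:j], w[j:]
            if all(in_L01(x + y * k + z) for k in range(max_k + 1)):
                good.append((x, y, z))
    return good


n = 5                                      # hypothetical pumping constant
w = "0" * n + "1" * n                      # w is in L01 and |w| >= n
print(pumpable_splits(w, n))               # prints [] : no split survives pumping
```

Because no split of w satisfies all three conditions, no constant n can work for L01, which is the contradiction the pumping lemma argument needs.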

Write short notes on Decision properties of Regular Languages

(or)

Explain Regular Languages based on decision properties.

(10 marks)

Consider some of the fundamental questions about languages:
1. Is the language described empty?
2. Is a particular string w in the described language?
3. Do two descriptions of a language actually describe the same language? This question is often called "equivalence" of languages.

CONVERTING AMONG REPRESENTATIONS

We can convert any of the four representations for regular languages to any of the other three representations. While there are algorithms for any of the conversions, sometimes we are interested not only in the possibility of making a conversion, but in the amount of time it takes.

Converting NFA's to DFA's

When we start with either an NFA or ε-NFA and convert it to a DFA, the time can be exponential in the number of states of the NFA. First, computing the ε-closure of n states takes O(n^3) time. Once the ε-closure is computed, we can compute the equivalent DFA by the subset construction. Only the subsets that are reachable from the start state, together with their transitions, are kept as the states of the DFA.

DFA-to-NFA conversion

Modify the transition table for the DFA by putting set-brackets around states and, if the output is an ε-NFA, adding a column for ε. Since we treat the number of input symbols as a constant, copying and processing the table takes O(n) time.

Automaton-to-Regular Expression conversion

Converting an n-state DFA to a regular expression constructs on the order of n^3 expressions and can take time O(n^3 4^n). The same construction works in the same running time if the input is an NFA, or an ε-NFA. If we first convert an NFA to a DFA and then convert the DFA to a regular expression, it could take time O(n^3 4^(n^3 2^n)), which is doubly exponential.

Regular-Expression-to-Automaton Conversion

Conversion of a regular expression to an ε-NFA takes linear time. We need to parse the expression efficiently, using a technique that takes only O(n) time on a regular expression of length n. The result is an expression tree with one node for each symbol of the regular expression. Once we have an expression tree for a regular expression, we can work up the tree, building the ε-NFA for each node.

TESTING EMPTINESS OF REGULAR LANGUAGES

"Is regular language L empty?" The answer is: ∅ is empty and all other regular languages are not. If our representation is any kind of finite automaton, the emptiness question is whether there is any path whatsoever from the start state to some accepting state. If so, the language is nonempty, while if the accepting states are all separated from the start state, then the language is empty. The algorithm can be summarized by this recursive process.

BASIS

The start state is surely reachable from the start state.

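INDUCTION

If state q is reachable from the start state, and there is an arc from q to p with any label (an input symbol, or ε if the automaton is an ε-NFA), then p is reachable. In this way we compute the set of states reachable from the start state. If the set contains any accepting state, then the language is nonempty; otherwise it is empty.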
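The basis and induction above amount to a graph search from the start state. The following is a minimal sketch, assuming the automaton is given as a dictionary from (state, symbol) to a set of next states, with the empty string standing for ε. The function name and the small example automaton are hypothetical and are not taken from these notes.

```python
from collections import deque


def is_empty(start, accepting, delta):
    """Emptiness test for a finite automaton.

    delta maps (state, symbol) -> set of next states; symbol "" means epsilon.
    The labels are irrelevant here: only the existence of an arc matters.
    """
    reachable = {start}                    # BASIS: the start state is reachable
    queue = deque([start])
    while queue:                           # INDUCTION: follow every arc out of q
        q = queue.popleft()
        for (src, _symbol), targets in delta.items():
            if src != q:
                continue
            for p in targets:
                if p not in reachable:
                    reachable.add(p)
                    queue.append(p)
    return reachable.isdisjoint(accepting)


# Hypothetical automaton: s3 is accepting but cannot be reached from s0.
delta = {("s0", "a"): {"s1"}, ("s1", ""): {"s0"}, ("s2", "b"): {"s3"}}
print(is_empty("s0", {"s3"}, delta))       # True: the language is empty
print(is_empty("s0", {"s1"}, delta))       # False: s1 is reachable
```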

The following recursive rules tell whether a regular expression denotes the empty language.

BASIS

∅ denotes the empty language; ε and a, for any input symbol a, do not.

INDUCTION

Suppose R is a regular expression. There are four cases to consider, corresponding to the ways that R could be constructed:
1. R = R1 + R2. Then L(R) is empty if and only if both L(R1) and L(R2) are empty.
2. R = R1R2. Then L(R) is empty if and only if either L(R1) or L(R2) is empty.
3. R = R1*. Then L(R) is not empty; it always includes at least ε.
4. R = (R1). Then L(R) is empty if and only if L(R1) is empty, since they are the same language.

TESTING MEMBERSHIP IN A REGULAR LANGUAGE

Given a string w and a regular language L, the problem is to find out whether w is in L. While w is represented explicitly, L is represented by an automaton or regular expression. If the language L is represented by a DFA, simulate the DFA on w starting from the start state: if the DFA ends in an accepting state, the answer is "yes"; otherwise the answer is "no". If L has any other representation besides a DFA, like an NFA, then we could convert it to a DFA and run the test above. Since that conversion can take exponential time, it is often more efficient to simulate the NFA directly, keeping track of the set of states it can be in. If the NFA has ε-transitions, then we must compute the ε-closure before starting the simulation. Then the processing of each input symbol a has two stages (follow the transitions on a from the current set of states, then take the ε-closure of the result), each of which requires O(s^2) time for an automaton with s states. Lastly, if the representation of L is a regular expression of size s, we can convert it to an ε-NFA with at most 2s states, in O(s) time. We then perform the simulation above, taking O(ns^2) time on an input of length n.
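The recursive emptiness rules for regular expressions given earlier in this section translate directly into a recursive function. This is a sketch under an assumed representation: an expression is a nested tuple whose first element is one of the hypothetical tags "empty", "eps", "sym", "union", "concat" or "star". Case 4 (parentheses) needs no code because grouping is already expressed by the nesting of the tuples.

```python
def denotes_empty(r) -> bool:
    """Return True if the regular expression tree r denotes the empty language."""
    kind = r[0]
    if kind == "empty":                    # BASIS: the expression for the empty set
        return True
    if kind in ("eps", "sym"):             # BASIS: epsilon and a single symbol
        return False
    if kind == "union":                    # R1 + R2: empty iff both are empty
        return denotes_empty(r[1]) and denotes_empty(r[2])
    if kind == "concat":                   # R1 R2: empty iff either one is empty
        return denotes_empty(r[1]) or denotes_empty(r[2])
    if kind == "star":                     # R1*: always contains epsilon
        return False
    raise ValueError(f"unknown node type: {kind!r}")


EMPTY = ("empty",)
a = ("sym", "a")
print(denotes_empty(("concat", a, EMPTY)))           # True
print(denotes_empty(("union", a, EMPTY)))            # False
print(denotes_empty(("star", EMPTY)))                # False
```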

EQUIVALENCE AND MINIMIZATION OF AUTOMATA

In this section we discuss how to test whether two descriptors for regular languages are equivalent, in the sense that they define the same language. An important consequence of this test is that there is a way to minimize a DFA. That is, we can take any DFA and find an equivalent DFA that has the minimum number of states.

Describe about Testing Equivalence of states. (or) Explain about the testing equivalence of states. (10 marks)

Our goal is to understand when two distinct states p and q can be replaced by a single state that behaves like both p and q. We say that states p and q are equivalent if, for all input strings w, δ̂(p, w) is an accepting state if and only if δ̂(q, w) is an accepting state. That is, starting from p and from q, every string leads either to two accepting states or to two non-accepting states. If two states are not equivalent, then we say they are distinguishable.

For example, consider the DFA. C and G are not equivalent because one is accepting and the other is not. Consider states A and G.
o String ε doesn't distinguish them because both are non-accepting states.

o For string 0, they go to states B and G, and both states are non-accepting.
o Likewise, string 1 doesn't distinguish A from G, because they go to F and E, respectively, and both are non-accepting.
o However, 01 distinguishes A from G, because δ̂(A, 01) = C, δ̂(G, 01) = E, C is accepting, and E is not.
o So the states A and G are not equivalent.

In contrast, consider states A and E.
o Neither is accepting, so ε does not distinguish them.
o On input 1, they both go to state F.
o With 0, they go to states B and H, respectively. Since neither is accepting, string 0 by itself does not distinguish A from E.
o B and H are no help either: on input 1 they both go to C, and on input 0 they both go to G. Thus, all inputs that begin with 0 will fail to distinguish A from E.
o So both are equivalent states.

To find states that are equivalent, we make our best efforts to find pairs of states that are distinguishable. The following table-filling algorithm is used to find out the non-equivalent state pairs.

BASIS: If p is an accepting state and q is non-accepting, then the pair {p, q} is distinguishable.

INDUCTION: Let p and q be states such that for some input symbol a, r = δ(p, a) and s = δ(q, a) are a pair of states known to be distinguishable. Then {p, q} is a pair of distinguishable states.

Execute the table-filling algorithm on the above DFA. The final table is shown in the following figure, where an x indicates pairs of distinguishable states, and the blank squares indicate those pairs that have been found equivalent. Initially, there are no x's in the table. For the basis, since C is the only accepting state, we put x's in each pair that involves C. For instance, since {C, H} is distinguishable, and states E and F go to H and C, respectively, on input 0, we know that {E, F} is also a distinguishable pair. All the x's except the one for the pair {A, G} can be discovered simply by looking at the transitions from the pairs of states on either 0 or 1, and observing that one state goes to C and the other does not. We can show {A, G} is distinguishable on the next round, since on input 1 they go to F and E, respectively, and we already established that the pair {E, F} is distinguishable.

The three remaining pairs, which are therefore equivalent pairs, are {A, E}, {B, H} and {D, F}. For example, consider why we cannot infer that {A, E} is a distinguishable pair. On input 0, A and E go to B and H, respectively, and {B, H} has not yet been shown distinguishable. On input 1, A and E both go to F, so there is no hope of distinguishing them that way. The other two pairs, {B, H} and {D, F}, will never be distinguished because they each have identical transitions on 0 and identical transitions on 1.

Thus, the table-filling algorithm stops with the table as shown in the figure, which is the correct determination of equivalent and distinguishable states.
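The table-filling algorithm can be sketched in a few lines. The transition table below is an assumption: the figure of the DFA is not reproduced in these notes, so the table keeps the transitions that the text states (for A, B, E, G, H and for F on input 0) and fills in the remaining entries (all of C, and D and F on input 1) with values chosen to be consistent with the results described above. The function names are hypothetical.

```python
from itertools import combinations

# Assumed transition table of the eight-state DFA; C is the only accepting state.
DELTA = {
    "A": {"0": "B", "1": "F"},
    "B": {"0": "G", "1": "C"},
    "C": {"0": "A", "1": "C"},   # assumption: not stated in the text
    "D": {"0": "C", "1": "G"},   # assumption: only "identical to F" is stated
    "E": {"0": "H", "1": "F"},
    "F": {"0": "C", "1": "G"},   # transition on 1 is an assumption
    "G": {"0": "G", "1": "E"},
    "H": {"0": "G", "1": "C"},
}
ACCEPTING = {"C"}


def distinguishable_pairs(delta, accepting):
    """Return the set of distinguishable pairs (as frozensets of two states)."""
    states = sorted(delta)
    # BASIS: one state accepting, the other not.
    marked = {frozenset((p, q)) for p, q in combinations(states, 2)
              if (p in accepting) != (q in accepting)}
    changed = True
    while changed:                          # INDUCTION: repeat until no new x
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in delta[p]:
                successors = frozenset((delta[p][a], delta[q][a]))
                if len(successors) == 2 and successors in marked:
                    marked.add(pair)
                    changed = True
                    break
    return marked


def equivalent_pairs(delta, accepting):
    """Pairs left blank in the table, i.e. the equivalent pairs."""
    marked = distinguishable_pairs(delta, accepting)
    return sorted((p, q) for p, q in combinations(sorted(delta), 2)
                  if frozenset((p, q)) not in marked)


print(equivalent_pairs(DELTA, ACCEPTING))
# [('A', 'E'), ('B', 'H'), ('D', 'F')]
```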

Testing Equivalence of Regular Languages

The table-filling algorithm gives us an easy way to test if two regular languages are the same. Suppose languages L and M are each represented in some way, e.g., one by a regular expression and one by an NFA. Convert each representation to a DFA. Now, imagine one DFA whose states are the union of the states of the DFA's for L and M. Technically, this DFA has two start states, but actually the start state is irrelevant as far as testing state equivalence is concerned, so make any state the lone start state. Now, test if the start states of the two original DFA's are equivalent, using the table-filling algorithm. If they are equivalent, then L = M, and if not, then L ≠ M.

Example: Consider the two DFA's in the figure. Each DFA accepts the empty string and all strings that end in 0; that is the language of regular expression ε + (0 + 1)*0. We can imagine that the figure represents a single DFA, with five states A through E. If we apply the table-filling algorithm to that automaton, the result is as shown in the figure. To see how the table is filled out, we start by placing x's in all pairs of states where exactly one of the states is accepting. It turns out that there is no more to do. The four remaining pairs, {A, C}, {A, D}, {C, D} and {B, E}, are all equivalent pairs. You should check that no more distinguishable pairs are discovered in the inductive part of the table-filling algorithm. For instance, with the table as in fig. 4.11, we can't distinguish the pair {A, D} because on 0 they go to themselves and on 1 they go to the pair {B, E}, which has not yet been distinguished. Since A and C are found equivalent by this test, and those states were the start states of the two original automata, we conclude that these DFA's do accept the same language.

Explain about Minimization of DFA's. (or) Give a brief note on Minimization of DFA's. (10 marks)

For each DFA we can find an equivalent DFA that has as few states as any DFA accepting the same language. The central idea behind the minimization of DFA's is that the notion of state equivalence lets us partition the states into blocks such that:
1. All the states in a block are equivalent.
2. No two states chosen from two different blocks are equivalent.

Example

Consider the first DFA and the table of the previous question, where we determined the state equivalences and distinguishabilities for the states in the DFA. The partition of the states into equivalent blocks is ({A, E}, {B, H}, {C}, {D, F}, {G}). Notice that the three pairs of states that are equivalent are each placed in a block together, while the states that are distinguishable from all the other states are each in a block alone. For the automaton of the second DFA, the partition is ({A, C, D}, {B, E}). This example shows that we can have more than two states in a block. It may appear fortuitous that A, C and D can all live together in a block, because every pair of them is equivalent, and none of them is equivalent to any other state. This is not an accident: equivalence of states is transitive, so it always partitions the states in this way.

Consider the first DFA. The states of the minimized DFA are the five blocks of the partition. The start state is {A, E}, since A was the start state of the DFA. The only accepting state is {C}, since C is the only accepting state of the DFA. The transitions of the following figure properly reflect the transitions of the original DFA. For instance, it has a transition on input 0 from {A, E} to {B, H}. If we examine the original DFA, we find that both A and E go to F on input 1, so the selection of the successor of {A, E} on input 1 is also correct. Check that all of the other transitions are also proper.

Write short notes on Moore and Mealy machines. (or) Explain Moore and Mealy machines. (10 marks)

One limitation of the finite automaton as we have defined it is that its output is limited to a binary signal: "accept" / "don't accept." Models in which the output is chosen from some other alphabet have been considered. There are two distinct approaches; the output may be associated with the state (called a Moore machine) or with the transition (called a Mealy machine). We shall define each formally and then show that the two machine types produce the same input-output mappings.

Moore machines

A Moore machine is a six-tuple (Q, Σ, Δ, δ, λ, q_0), where Q, Σ, δ and q_0 are as in the DFA. Δ is the output alphabet and λ is a mapping from Q to Δ giving the output associated with each state. The output of M in response to input a_1a_2...a_n, n ≥ 0, is λ(q_0)λ(q_1)...λ(q_n), where q_0, q_1, ..., q_n is the sequence of states such that δ(q_{i-1}, a_i) = q_i for 1 ≤ i ≤ n. Note that any Moore machine gives output λ(q_0) in response to input ε. The DFA may be viewed as a special case of a Moore machine where the output alphabet is {0, 1} and state q is "accepting" if and only if λ(q) = 1.

[Figure: transition diagram of a Moore machine with states q_0, q_1 and q_2]

A Moore machine calculating residues

Example: Suppose we wish to determine the residue mod 3 for each binary string treated as a binary integer. To begin, observe that if i written in binary is followed by a 0, the resulting string has value 2i, and if i in binary is followed by a 1, the resulting string has value 2i + 1. If the remainder of i/3 is p, then the remainder of 2i/3 is 2p mod 3. If p = 0, 1, or 2, then 2p mod 3 is 0, 2, or 1, respectively. Similarly, the remainder of (2i + 1)/3 is 1, 0, or 2, respectively. It suffices therefore to design a Moore machine with three states, q_0, q_1, and q_2, where q_j is entered if and only if the input seen so far has residue j. We define λ(q_j) = j for j = 0, 1, and 2. The machine is shown in the above transition diagram, where outputs label the states. The transition function δ is designed to reflect the rules regarding calculation of residues described above. On input 1010 the sequence of states entered is q_0, q_1, q_2, q_2, q_1, giving output sequence 01221. That is, ε (which has "value" 0) has residue 0, 1 has residue 1, 2 (in decimal) has residue 2, 5 (in decimal) has residue 2, and 10 (in decimal) has residue 1.

Mealy machines

A Mealy machine is also a six-tuple M = (Q, Σ, Δ, δ, λ, q_0), where all is as in the Moore machine, except that λ maps Q × Σ to Δ. That is, λ(q, a) gives the output associated with the transition from state q on input a. The output of M in response to input a_1a_2...a_n is λ(q_0, a_1)λ(q_1, a_2)...λ(q_{n-1}, a_n), where q_0, q_1, ..., q_n is the sequence of states such that δ(q_{i-1}, a_i) = q_i for 1 ≤ i ≤ n. Note that this sequence has length n rather than length n + 1 as for the Moore machine, and on input ε a Mealy machine gives output ε.

[Figure: A Mealy machine with start state q_0 and states p_0 and p_1; the arcs are labeled 0/y, 0/n, 1/y and 1/n]

Example

Even if the output alphabet has only two symbols, the Mealy machine model can save states when compared with a finite automaton. Consider the language (0 + 1)*(00 + 11) of all strings of 0's and 1's whose last two symbols are the same. In the next chapter we shall develop the tools necessary to show that this language is accepted by no DFA with fewer than five states. However, we may define a three-state Mealy machine that uses its state to remember the last symbol read, emits output y whenever the current input matches the previous one, and emits n otherwise. The sequence of y's and n's emitted by the Mealy machine corresponds to the sequence of accepting and non-accepting states entered by a DFA on the same input; however, the Mealy machine does not make an output prior to any input, while the DFA rejects the string ε, as its initial state is nonfinal.

The Mealy machine M = ({q_0, p_0, p_1}, {0, 1}, {y, n}, δ, λ, q_0) is shown in the figure below. We use the label a/b on an arc from state p to state q to indicate that δ(p, a) = q and λ(p, a) = b. The response of M to input 01100 is nnyny, with the sequence of states entered being q_0 p_0 p_1 p_1 p_0 p_0. Note how p_0 remembers a zero and p_1 remembers a one. State q_0 is initial and "remembers" that no input has yet been received.
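Both examples of this answer, the Moore machine for residues mod 3 and the three-state Mealy machine for repeated symbols, can be simulated in a short sketch. The function and table names are hypothetical. The Moore states q_0, q_1, q_2 are represented by the integers 0, 1, 2, and the Mealy table is read off from the description above (the state remembers the last symbol; the output is y exactly when the current symbol repeats it).

```python
def moore_residue_mod3(bits: str) -> str:
    """Moore machine: state j means 'the input so far has residue j mod 3'.
    The output for a state is the state number itself."""
    state = 0
    out = [str(state)]                     # output for q_0 comes before any input
    for b in bits:
        state = (2 * state + int(b)) % 3   # appending bit b maps i to 2i + b
        out.append(str(state))
    return "".join(out)


# Mealy machine: (state, input) -> (next state, output)
MEALY = {
    ("q0", "0"): ("p0", "n"), ("q0", "1"): ("p1", "n"),
    ("p0", "0"): ("p0", "y"), ("p0", "1"): ("p1", "n"),
    ("p1", "0"): ("p0", "n"), ("p1", "1"): ("p1", "y"),
}


def mealy_last_two_equal(bits: str) -> str:
    """Emit y when the current symbol equals the previous one, n otherwise."""
    state, out = "q0", []
    for b in bits:
        state, symbol = MEALY[(state, b)]
        out.append(symbol)
    return "".join(out)


print(moore_residue_mod3("1010"))          # 01221
print(mealy_last_two_equal("01100"))       # nnyny
```

The Moore output has one more symbol than the input and the Mealy output has exactly as many, matching the remark on sequence lengths made in the definitions.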

End of Unit - 2


UNIT - III

Explain Context Free Grammar (CFG) with example? (or) Explain the components of context free grammar? (5 marks)

There are four important components in a grammatical description of a language:

1. There is a finite set of symbols that form the strings of the language being defined. In the palindrome example below this set is {0, 1}. We call this alphabet the terminals or terminal symbols.

2. There is a finite set of variables, also sometimes called non-terminals or syntactic categories. Each variable represents a language, that is, a set of strings.

3. One of the variables represents the language being defined; it is called the start symbol. Other variables represent auxiliary classes of strings that are used to help define the language of the start symbol.

4. There is a finite set of productions or rules that represent the recursive definition of a language. Each production consists of:
(a) A variable that is being defined by the production; it is called the head of the production.
(b) The production symbol →.
(c) A string of zero or more terminals and variables. This string, called the body of the production, represents one way to form strings in the language of the variable of the head.

The CFG G can be represented by its 4 components, that is G = (V, T, P, S), where V is the set of variables, T the terminals, P the set of productions and S the start symbol.

Example-1: Consider the context free grammar for palindromes.
1. P → ε
2. P → 0
3. P → 1
4. P → 0P0
5. P → 1P1

The grammar G_pal for palindromes is represented by G_pal = ({P}, {0, 1}, A, P), where A represents the set of 5 productions seen above.
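The five productions of G_pal can be read as a recursive membership test: a string is derivable from P exactly when it matches one of the production bodies. The following is a minimal sketch; the function name is hypothetical.

```python
def derives_pal(w: str) -> bool:
    """Return True if the string w can be derived from P in G_pal."""
    if w in ("", "0", "1"):                # productions 1-3: P -> epsilon | 0 | 1
        return True
    if len(w) >= 2 and w[0] == w[-1] and w[0] in "01":
        return derives_pal(w[1:-1])        # productions 4-5: P -> 0P0 | 1P1
    return False


print(derives_pal("0110"))                 # True
print(derives_pal("01"))                   # False
```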

Example-2: Consider a grammar for simple expressions whose identifiers use only the letters a, b and the digits 0 and 1. Every identifier must begin with a or b, which may be followed by any string in {a, b, 0, 1}*. We need two variables in this grammar. One, which we call E, represents the expressions. It is the start symbol and represents the language of expressions we are defining. The other variable, I, represents the identifiers. The regular expression for the language of identifiers is (a + b)(a + b + 0 + 1)*. This is expressed in a CFG as follows:
1. E → I
2. E → E + E
3. E → E * E
4. E → (E)
5. I → a
6. I → b
7. I → Ia
8. I → Ib
9. I → I0
10. I → I1

The grammar for expressions is stated formally as G = ({E, I}, T, P, E), where T is the set of symbols {+, *, (, ), a, b, 0, 1} and P is the set of productions shown above. Rule (1) is the basis rule for expressions. It says that an expression can be a single identifier. Rules (2) through (4) describe the inductive case for expressions. Rule (2) says that an expression can be two expressions connected by a plus sign; rule (3) says the same with a multiplication sign.

Rule (4) says that if we take any expression and put matching parentheses around it, the result is also an expression. Rules (5) through (10) describe the identifiers I. The basis is rules (5) and (6); they say that a and b are identifiers. The remaining 4 rules are the inductive case. They say that if we have any identifier, we can follow it by a, b, 0 or 1, and the result will be another identifier.

Discuss the derivations of grammars with example? (or) Write short notes on (i) Recursive inference and (ii) Derivations? (5 marks)

There are two approaches to inferring that certain strings are in the language of a variable of a CFG. The more conventional approach is to use the rules from body to head. We take strings known to be in the language of each of the variables of the body, concatenate them in the proper order with any terminals appearing in the body, and infer that the resulting string is in the language of the variable in the head. We shall refer to this procedure as recursive inference.

Example: Let us consider some of the inferences we can make using the grammar for expressions, leading to the string a * (a + b00). For example, line (i) says that we can infer that string a is in the language of I by using production 5. Lines (ii) through (iv) say we can infer that b00 is an identifier by using production 6 once (to get the b) and then applying production 9 twice (to attach the two 0's).

Lines (v) and (vi) exploit production 1 to infer that, since any identifier is an expression, the strings a and b00, which we inferred in lines (i) and (iv) to be identifiers, are also in the language of variable E. Line (vii) uses production 2 to infer that the sum of these identifiers is an expression, line (viii) uses production 4 to infer that the same string with parentheses around it is also an expression, and line (ix) uses production 3 to multiply the identifier a by the expression we had discovered in line (viii).

There is another approach to defining the language of a grammar, in which we use the productions from head to body. We expand the start symbol using one of its productions. We further expand the resulting string by replacing one of the variables by the body of one of its productions, and so on, until we derive a string consisting entirely of terminals. The language of the grammar is all strings of terminals that we can obtain in this way. This use of grammars is called derivation.

The process of deriving strings by applying productions from head to body requires the definition of a new relation symbol ⇒. Suppose G = (V, T, P, S) is a CFG. Let αAβ be a string of terminals and variables, with A a variable, and let A → γ be a production of G. Then we say αAβ ⇒ αγβ. We may extend the ⇒ relationship to represent zero, one or many derivation steps. For this we use a * on the arrow, writing ⇒*, to denote "zero or more steps".

Example: The inference that a * (a + b00) is in the language of variable E can be reflected in a derivation of that string, starting with the string E. Here is one such derivation:
E ⇒ E*E ⇒ I*E ⇒ a*E ⇒ a*(E) ⇒ a*(E+E) ⇒ a*(I+E) ⇒ a*(a+E) ⇒ a*(a+I) ⇒ a*(a+I0) ⇒ a*(a+I00) ⇒ a*(a+b00)

Give short notes on leftmost and rightmost derivations? (or) Discuss LMD and RMD? (5 Marks)

To restrict the number of choices we have in deriving a string, it is often useful to require that at each step we replace the leftmost variable by one of its production bodies. Such a derivation is called a leftmost derivation, and we indicate that a derivation is leftmost by using the relations ⇒lm and ⇒*lm for one or many steps, respectively. If the grammar G that is being used is not obvious, we can place the name G below these symbols. Similarly, it is possible to require that at each step the rightmost variable is replaced by one of its bodies. If so, we call the derivation rightmost and use the symbols ⇒rm and ⇒*rm to indicate one or many rightmost derivation steps, respectively. Again, the name of the grammar may appear below these symbols if it is not clear which grammar is being used.

Example: Construct the LMD for the given string w = a * (a + b00) using the productions of the expression grammar, whose identifiers are given by (a + b)(a + b + 0 + 1)*.

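There is a rightmost derivation that uses the same replacements for each variable, although it makes the replacements in a different order. This rightmost derivation is
E ⇒rm E*E ⇒rm E*(E) ⇒rm E*(E+E) ⇒rm E*(E+I) ⇒rm E*(E+I0) ⇒rm E*(E+I00) ⇒rm E*(E+b00) ⇒rm E*(I+b00) ⇒rm E*(a+b00) ⇒rm I*(a+b00) ⇒rm a*(a+b00)

Any derivation has an equivalent leftmost and an equivalent rightmost derivation.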
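A leftmost derivation can be replayed mechanically: at each step the leftmost variable of the current sentential form is replaced by the body of the chosen production. The sketch below does this for the expression grammar, using its production numbers 1 to 10. The function name and the particular sequence of production numbers are choices made for this illustration.

```python
# Productions of the expression grammar, numbered as in Example-2: number -> (head, body)
PRODUCTIONS = {
    1: ("E", "I"), 2: ("E", "E+E"), 3: ("E", "E*E"), 4: ("E", "(E)"),
    5: ("I", "a"), 6: ("I", "b"), 7: ("I", "Ia"), 8: ("I", "Ib"),
    9: ("I", "I0"), 10: ("I", "I1"),
}
VARIABLES = {"E", "I"}


def leftmost_derivation(start: str, rule_numbers):
    """Apply the given productions, always to the leftmost variable,
    and return the list of sentential forms."""
    form = start
    steps = [form]
    for n in rule_numbers:
        head, body = PRODUCTIONS[n]
        pos = next((i for i, c in enumerate(form) if c in VARIABLES), None)
        if pos is None or form[pos] != head:
            raise ValueError(f"production {n} does not apply to {form!r}")
        form = form[:pos] + body + form[pos + 1:]
        steps.append(form)
    return steps


steps = leftmost_derivation("E", [3, 1, 5, 4, 2, 1, 5, 1, 9, 9, 6])
print(" => ".join(steps))
# E => E*E => I*E => a*E => a*(E) => a*(E+E) => a*(I+E) => a*(a+E) => a*(a+I) => a*(a+I0) => a*(a+I00) => a*(a+b00)
```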

Explain Context Free Language? (or) Explain the language of the given grammar? (5 marks)

If G = (V, T, P, S) is a CFG, the language of G, denoted L(G), is the set of terminal strings that have derivations from the start symbol. That is,

L(G) = {w in T* | S ⇒*_G w}

If a language L is the language of some context free grammar, then L is said to be a context free language or CFL. For instance, we asserted that the grammar of Example-1 defines the language of palindromes over alphabet {0, 1}. Thus the set of palindromes is a context free language.

Ex: Context free grammar for palindromes.

Explain sentential form with example? (or) Explain right sentential form and left sentential form? (5 marks)

Derivations from the start symbol produce strings that have a special role. We call these "sentential forms". That is, if G = (V, T, P, S) is a CFG, then any string α in (V ∪ T)* such that S ⇒* α is a sentential form. If S ⇒*lm α, then α is a left sentential form, and if S ⇒*rm α, then α is a right sentential form. Note that the language L(G) is those sentential forms that are in T*, that is, they consist solely of terminals.

Consider the grammar for expressions. For example, E*(I+E) is a sentential form, since there is a derivation
E ⇒ E*E ⇒ E*(E) ⇒ E*(E+E) ⇒ E*(I+E)
However, this derivation is neither leftmost nor rightmost, since at the last step the middle E is replaced. As an example of a left sentential form, consider a*E, with the leftmost derivation
E ⇒lm E*E ⇒lm I*E ⇒lm a*E
Additionally, the derivation
E ⇒rm E*E ⇒rm E*(E) ⇒rm E*(E+E)
shows that E*(E+E) is a right sentential form.

Write short notes on parse trees? (or) Explain the construction of parse trees? (or) Give the construction of parse tree with example? (10 marks)

There is a tree representation for derivations that has proved extremely useful. This tree shows clearly how the symbols of a terminal string are grouped into substrings, each of which belongs to the language of one of the variables of the grammar. These trees are known as "parse trees". In a compiler, the tree structure of the source program facilitates the translation of the source program into executable code by allowing natural, recursive functions to perform this translation process.

Let us fix on a grammar G = (V, T, P, S). The parse trees for G are trees with the following conditions:
1. Each interior node is labeled by a variable in V.
2. Each leaf is labeled by a variable, a terminal, or ε. However, if the leaf is labeled ε, then it must be the only child of its parent.
3. If an interior node is labeled A, and its children are labeled X_1, X_2, ..., X_k respectively from the left, then A → X_1X_2...X_k is a production in P. Note that the only time one of the X's can be ε is if that is the label of the only child, and A → ε is a production of G.

The following figure shows a parse tree denoting the derivation of I + E from E. Fig. (b) shows a parse tree showing the derivation P ⇒* 0110.

The yield of the parse tree:

If we concatenate the leaves of the parse tree from the left, we get a string called the yield of the tree, which is always a string derived from the root variable. Of special importance are those parse trees such that:
1. The yield is a terminal string. That is, all leaves are labeled either with a terminal or with ε.
2. The root is labeled by the start symbol.

The following figure is an example of a tree with a terminal string as yield and the start symbol at the root. This particular parse tree represents the derivation of a*(a+b00).

Give a brief note on the Inferences, Derivations & Parse Trees? (10 marks)

Given a grammar G = (V, T, P, S), we shall show that the following are equivalent:
1. The recursive inference procedure determines that terminal string w is in the language of variable A.
2. A ⇒* w
3. A ⇒*lm w
4. A ⇒*rm w
5. There is a parse tree with root A and yield w.

In fact, except for the use of recursive inference, which we only defined for terminal strings, all the other conditions (the existence of derivations, leftmost or rightmost derivations, and parse trees) are also equivalent if w is a string that has some variables. Each arc in the diagram indicates that we prove a theorem that says if w meets the condition at the tail, then it meets the condition at the head of the arc.

From Inference to Trees:

Let G = (V, T, P, S) be a CFG. If the recursive inference procedure tells us that the terminal string w is in the language of variable A, then there is a parse tree with root A and yield w.

From Trees to Derivations

In order to understand how derivations may be constructed, we need first to see how one derivation of a string from a variable can be embedded within another derivation. Let us consider the expression grammar. It is easy to check that there is a derivation
E ⇒ I ⇒ Ib ⇒ ab
As a result, for any strings α and β, it is also true that
αEβ ⇒ αIβ ⇒ αIbβ ⇒ αabβ
The justification is that we can make the same replacements of production bodies for heads in the context of α and β. For instance, if we have a derivation that begins E ⇒ E+E ⇒ E+(E), we could apply the derivation of ab from the second E by treating "E+(" as α and ")" as β. This derivation would then continue
E+(E) ⇒ E+(I) ⇒ E+(Ib) ⇒ E+(ab)

Ex2: Let us construct the leftmost derivation from the following tree.
E ⇒ E*E ⇒ I*E ⇒ a*E ⇒ a*(E) ⇒ a*(E+E) ⇒ a*(I+E) ⇒ a*(a+E) ⇒ a*(a+I) ⇒ a*(a+I0) ⇒ a*(a+I00) ⇒ a*(a+b00)

From Derivations to Recursive Inferences

Whenever there is a derivation A ⇒* w for some CFG, the fact that w is in the language of A is discovered by the recursive inference procedure. Suppose that we have a derivation A ⇒ X_1X_2...X_k ⇒* w. Then we can break w into pieces w = w_1w_2...w_k such that X_i ⇒* w_i. Note that if X_i is a terminal, then w_i = X_i, and the derivation is zero steps. If X_i is a variable, we can obtain the derivation X_i ⇒* w_i by starting with the derivation A ⇒* w and stripping away:
(a) All the positions of the sentential forms that are either to the left or to the right of the positions that are derived from X_i.
(b) All the steps that are not relevant to the derivation of w_i from X_i.

Explain the formal definition of push down automata? (or) Give short notes on the definition of push down automata? (5 marks)

Our formal notation for a pushdown automaton (PDA) involves seven components. We write the specification of a PDA P as

P = (Q, Σ, Γ, δ, q_0, Z_0, F)

The components have the following meanings.
Q: A finite set of states, like the states of a finite automaton.
Σ: A finite set of input symbols.
Γ: A finite stack alphabet, the set of symbols that we are allowed to push onto the stack.
δ: The transition function. Formally, δ takes as argument a triple (q, a, X), where q is a state in Q, a is either an input symbol or ε, and X is a stack symbol that is a member of Γ. Its value is a finite set of pairs (p, γ), where p is the new state and γ is the string of stack symbols that replaces X at the top of the stack.
q_0: The start state. The PDA is in this state before making any transition.
Z_0: The start symbol. Initially, the PDA's stack consists of one instance of this symbol and nothing else.
F: The set of accepting states or final states.

Example

Let us design a PDA P to accept the language L_wwr = {ww^R | w is in (0 + 1)*}. We shall use the stack symbol Z_0 to mark the bottom of the stack. We need to have this symbol present so that, after we pop w off the stack and realize that we have seen ww^R on the input, we still have something on the stack to permit us to make a transition to the accepting state q_2. Thus our PDA for L_wwr can be described as

P = ({q_0, q_1, q_2}, {0, 1}, {0, 1, Z_0}, δ, q_0, Z_0, {q_2})

where δ is defined by the following rules.
1. δ(q_0, 0, Z_0) = {(q_0, 0Z_0)} and δ(q_0, 1, Z_0) = {(q_0, 1Z_0)}. One of these rules applies initially, when we are in state q_0 and we see the start symbol Z_0 at the top of the stack. We read the first input and push it onto the stack, leaving Z_0 below to mark the bottom.
2. δ(q_0, 0, 0) = {(q_0, 00)}, δ(q_0, 0, 1) = {(q_0, 01)}, δ(q_0, 1, 0) = {(q_0, 10)}, δ(q_0, 1, 1) = {(q_0, 11)}. These 4 similar rules allow us to stay in state q_0 and read inputs, pushing each onto the top of the stack and leaving the previous top stack symbol alone.

3. δ(q_0, ε, Z_0) = {(q_1, Z_0)}, δ(q_0, ε, 0) = {(q_1, 0)} and δ(q_0, ε, 1) = {(q_1, 1)}. These 3 rules allow P to go from state q_0 to state q_1 spontaneously (on ε input), leaving the symbol at the top of the stack as it is.
4. δ(q_1, 0, 0) = {(q_1, ε)} and δ(q_1, 1, 1) = {(q_1, ε)}. Now, in state q_1 we can match input symbols against the top symbols on the stack and pop when the symbols match.
5. δ(q_1, ε, Z_0) = {(q_2, Z_0)}. Finally, if we expose the bottom-of-stack marker Z_0 and we are in state q_1, then we have found an input of the form ww^R. We go to state q_2 and accept.

Give short notes on graphical notations and description of PDA? (or) Explain PDA with reference to graphical notations and descriptions? (15 marks)

A transition diagram, like that of a finite automaton, will make aspects of the behavior of a given PDA clearer. So we introduce and use transition diagrams for PDA's in which:
1. The nodes correspond to the states of the PDA.
2. An arrow labeled start indicates the start state, and doubly circled states are accepting, as for finite automata.
3. The arcs correspond to transitions of the PDA in the following sense. An arc labeled a, X/α from state q to state p means that δ(q, a, X) contains the pair (p, α), perhaps among other pairs. That is, the arc label tells what input is used and also gives the old and new tops of the stack.

The only thing that the diagram does not tell us is which stack symbol is the start symbol. Conventionally, it is Z_0, unless we indicate otherwise. Thus it shows the graphical representation of a PDA.

Instantaneous description of a PDA:

The PDA goes from configuration to configuration, in response to input symbols (or sometimes ε). Unlike the finite automaton, where the state is the only thing that we need to know about the automaton, the PDA's configuration involves both the state and the contents of the stack. We represent the configuration of a PDA by a triple (q, w, γ), where q is the state, w is the remaining input and γ is the stack contents. Such a triple is called an instantaneous description, or ID.

Example

Let us consider the action of the PDA of the example above on the input 1111. Since $q_0$ is the start state and $Z_0$ is the start symbol, the initial ID is $(q_0, 1111, Z_0)$. On this input, the PDA has an opportunity to guess wrongly several times. The entire sequence of ID's that the PDA can reach from the initial ID $(q_0, 1111, Z_0)$ is shown in the accompanying figure.

Explain the languages of PDA. (or) Give a brief note on languages of PDA. (10 Marks)

The PDA accepts its input by consuming it and entering an accepting state. We call this approach "acceptance by final state". There is a second approach to defining the language of a PDA, "acceptance by empty stack": the language is the set of strings that cause the PDA to empty its stack, starting from the initial ID.

Acceptance by Final State

Let $P = (Q, \Sigma, \Gamma, \delta, q_0, Z_0, F)$ be a PDA. Then $L(P)$, the language accepted by $P$ by final state, is

$L(P) = \{ w \mid (q_0, w, Z_0) \vdash^* (q, \epsilon, \alpha) \}$

for some state $q$ in $F$ and any stack string $\alpha$.


That is, starting in the initial ID with $w$ waiting on the input, $P$ consumes $w$ from the input and enters an accepting state. The contents of the stack at that time are irrelevant.

Acceptance by Empty Stack

For each PDA $P = (Q, \Sigma, \Gamma, \delta, q_0, Z_0, F)$, we also define

$N(P) = \{ w \mid (q_0, w, Z_0) \vdash^* (q, \epsilon, \epsilon) \}$


for any state $q$. That is, $N(P)$ is the set of inputs $w$ that $P$ can consume and at the same time empty its stack.

From Empty Stack to Final State

To convert a PDA that accepts by empty stack into one that accepts by final state, we add one final state to the PDA diagram.

Example: Let us design a PDA that processes sequences of if's and else's in a C program, where i stands for if and e stands for else. We shall use a stack symbol $Z$ to count the difference between the number of i's seen so far and the number of e's. This simple one-state PDA is suggested by its transition diagram. The PDA is defined as

$P_N = (\{q\}, \{i, e\}, \{Z\}, \delta_N, q, Z)$

The above PDA can be changed to accept the same language by reaching a final state.

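From Final State to Empty Stack

From each accepting state we add one more transition, on $\epsilon$, to a new state that is not accepting. In that new state the PDA pops the rest of its stack without consuming any input.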

Explain the equivalence of PDA's and CFG's. (or) Write the similarities between PDA's and CFG's. (10 Marks)

The languages defined by PDA's are exactly the context-free languages. The goal is to prove that the following three classes of languages are all the same class:

1. The context-free languages, that is, the languages defined by CFG's.
2. The languages that are accepted by final state by some PDA.
3. The languages that are accepted by empty stack by some PDA.

From Grammars to Pushdown Automata

Given a CFG $G$, we construct a PDA that simulates the leftmost derivations of $G$. Any left-sentential form that is not a terminal string can be written as $xA\alpha$, where $A$ is the leftmost variable, $x$ is whatever terminals appear to its left, and $\alpha$ is the string of terminals and variables that appear to the right of $A$. We call $A\alpha$ the tail of this left-sentential form. If a left-sentential form consists of terminals only, then its tail is $\epsilon$.

The idea behind the construction of a PDA from a grammar is to have the PDA simulate the sequence of left-sentential forms that the grammar uses to generate a given terminal string $w$. The tail of each sentential form $xA\alpha$ appears on the stack, with $A$ at the top. At that time, $x$ will be "represented" by our having consumed $x$ from the input, leaving whatever of $w$ follows its prefix $x$. That is, if $w = xy$, then $y$ will remain on the input.

Suppose the PDA is in an ID $(q, y, A\alpha)$, representing left-sentential form $xA\alpha$. It guesses the production to use to expand $A$, say $A \to \beta$. The move of the PDA is to replace $A$ on the top of the stack by $\beta$, entering ID $(q, y, \beta\alpha)$. Note that there is only one state, $q$, for this PDA.

Now $(q, y, \beta\alpha)$ may not be a representation of the next left-sentential form, because $\beta$ may have a prefix of terminals. In fact, $\beta$ may have no variables at all, and $\alpha$ may have a prefix of terminals. Whatever terminals appear at the beginning of $\beta\alpha$ need to be removed, to expose the next variable at the top of the stack. These terminals are compared against the next input symbols, to make sure our guesses at the leftmost derivation of input string $w$ are correct; if not, this branch of the PDA dies.

If we succeed in this way in guessing a leftmost derivation of $w$, then we shall eventually reach the left-sentential form $w$. At that point, all the symbols on the stack have either been expanded (if they are variables) or matched against the input (if they are terminals). The stack is empty, and we accept by empty stack.

Let $G = (V, T, Q, S)$ be a CFG, where $Q$ is the set of productions. Construct the PDA $P$ that accepts $L(G)$ by empty stack as follows:

$P = (\{q\}, T, V \cup T, \delta, q, S)$

where the transition function $\delta$ is defined by:

1. For each variable $A$, $\delta(q, \epsilon, A) = \{(q, \beta) \mid A \to \beta \text{ is a production of } G\}$.
2. For each terminal $a$, $\delta(q, a, a) = \{(q, \epsilon)\}$.

Example: Let us convert the expression grammar to a PDA. The grammar is

$I \to a \mid b \mid Ia \mid Ib \mid I0 \mid I1$
$E \to I \mid E*E \mid E+E \mid (E)$

The set of terminals for the PDA is $\{a, b, 0, 1, (, ), +, *\}$. These eight symbols and the symbols $I$ and $E$ form the stack alphabet. The transition function for the PDA is:

1. $\delta(q, \epsilon, I) = \{(q, a), (q, b), (q, Ia), (q, Ib), (q, I0), (q, I1)\}$
2. $\delta(q, \epsilon, E) = \{(q, I), (q, E+E), (q, E*E), (q, (E))\}$
3. $\delta(q, a, a) = \{(q, \epsilon)\}$; $\delta(q, b, b) = \{(q, \epsilon)\}$; $\delta(q, 0, 0) = \{(q, \epsilon)\}$; $\delta(q, 1, 1) = \{(q, \epsilon)\}$; $\delta(q, (, () = \{(q, \epsilon)\}$; $\delta(q, ), )) = \{(q, \epsilon)\}$; $\delta(q, +, +) = \{(q, \epsilon)\}$; $\delta(q, *, *) = \{(q, \epsilon)\}$

From PDA to Grammar

A PDA may change state as it pops stack symbols, so we should also note the state that it enters when it finally pops a level off its stack.
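Returning to the grammar-to-PDA construction, the following Python sketch builds $\delta$ for the one-state PDA from the expression grammar and tests acceptance by empty stack with a breadth-first search over IDs. The function names are illustrative. The pruning step is an assumption that holds for this grammar only because it has no $\epsilon$-productions, so every stack symbol must still consume at least one input symbol; it is what keeps the search finite despite left-recursive productions such as $I \to Ia$.

```python
from collections import deque

# Expression grammar from the text: variable -> list of production bodies.
GRAMMAR = {
    "I": ["a", "b", "Ia", "Ib", "I0", "I1"],
    "E": ["I", "E*E", "E+E", "(E)"],
}


def build_delta(grammar):
    """Return delta as {(input symbol or '', stack top): [replacement strings]}."""
    delta = {}
    terminals = {c for bodies in grammar.values() for body in bodies
                 for c in body if c not in grammar}
    for var, bodies in grammar.items():
        delta[("", var)] = list(bodies)      # rule 1: expand a variable
    for t in terminals:
        delta[(t, t)] = [""]                 # rule 2: match a terminal and pop
    return delta


def accepts_by_empty_stack(grammar, start, word):
    """Search the IDs (remaining input, stack) of the one-state PDA."""
    delta = build_delta(grammar)
    first = (word, start)                    # the single state q is left implicit
    seen, queue = {first}, deque([first])
    while queue:
        rest, stack = queue.popleft()
        if not rest and not stack:
            return True                      # input consumed and stack empty
        if not stack:
            continue
        top, below = stack[0], stack[1:]
        moves = [(rest, body + below) for body in delta.get(("", top), [])]
        if rest and (rest[0], top) in delta:
            moves.append((rest[1:], below))
        for nxt in moves:
            # Pruning (assumption: no epsilon-productions): the stack may not
            # be longer than the remaining input.
            if len(nxt[1]) <= len(nxt[0]) and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

For example, `accepts_by_empty_stack(GRAMMAR, "E", "a+b")` follows the leftmost derivation $E \Rightarrow E+E \Rightarrow I+E \Rightarrow a+E \Rightarrow a+I \Rightarrow a+b$ on one of its branches.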

Give a brief note on Deterministic Pushdown Automata (DPDA). (10 Marks)
Discuss DPDA based on regular expressions, context-free languages, and ambiguous grammars.

A PDA is deterministic if there is never a choice of move in any situation. These choices are of two kinds. If $\delta(q, a, X)$ contains more than one pair, then surely the PDA is nondeterministic, because we can choose among these pairs when deciding on the next move. However, even if $\delta(q, a, X)$ is always a singleton, we could still have a choice between using a real input symbol and making a move on $\epsilon$. Thus we define a PDA $P = (Q, \Sigma, \Gamma, \delta, q_0, Z_0, F)$ to be deterministic if and only if the following conditions are met:

1. $\delta(q, a, X)$ has at most one member for any $q$ in $Q$, $a$ in $\Sigma$ or $a = \epsilon$, and $X$ in $\Gamma$.
2. If $\delta(q, a, X)$ is nonempty for some $a$ in $\Sigma$, then $\delta(q, \epsilon, X)$ must be empty.

Example: The language $L_{wwr}$ is a CFL that has no DPDA. If we put a center marker $c$ in the middle, we can make the language recognizable by a DPDA. The resulting language is $L_{wcwr} = \{wcw^R \mid w \in (0+1)^*\}$.

The strategy of the DPDA is to store 0's and 1's on its stack until it sees the center marker $c$. It then goes to another state, in which it matches input symbols against stack symbols and pops the stack if they match. If it ever finds a nonmatch, it dies; its input cannot be of the form $wcw^R$. If it succeeds in popping its stack down to the initial symbol, which marks the bottom of the stack, then it accepts its input.

By contrast, the PDA for $L_{wwr}$ is nondeterministic, because in state $q_0$ it always has the choice of pushing the next input symbol onto the stack or making a transition on $\epsilon$ to state $q_1$; that is, it has to guess when it has reached the middle. The DPDA for $L_{wcwr}$ is shown in the following figure.
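The strategy just described can be sketched in a few lines of Python. Because the machine is deterministic, a single loop with one current state and one stack is enough; no search over alternatives is needed. The function name and the use of a Python list as the stack are illustrative choices, not part of the original notes.

```python
def dpda_accepts_wcwr(text):
    """Deterministically decide membership in {w c w^R | w in (0+1)*}."""
    state = "q0"
    stack = ["Z0"]                       # Z0 marks the bottom of the stack
    for symbol in text:
        if state == "q0" and symbol in ("0", "1"):
            stack.append(symbol)         # store w on the stack
        elif state == "q0" and symbol == "c":
            state = "q1"                 # center marker seen: start matching
        elif state == "q1" and symbol in ("0", "1") and stack[-1] == symbol:
            stack.pop()                  # input matches the stack top: pop
        else:
            return False                 # no move is defined: the DPDA dies
    # The only epsilon-move goes from q1 to the accepting state q2,
    # and it is possible only when Z0 is exposed.
    return state == "q1" and stack == ["Z0"]
```

At every step exactly one branch of the `if` chain can apply, which mirrors the two conditions in the definition of a deterministic PDA.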

This PDA is clearly deterministic. It never has a choice of move in the same state, using the same input and stack symbol. As for choices between using a real input symbol or $\epsilon$, the only $\epsilon$-transition it makes is from $q_1$ to $q_2$ with $Z_0$ at the top of the stack. However, in state $q_1$, there are no other moves when $Z_0$ is at the stack top.

Regular Languages and Deterministic PDA's

The DPDA's accept a class of languages that is between the regular languages and the CFL's. We shall first note that the DPDA languages include all the regular languages. If we want the DPDA to accept by empty stack, then we find that our language-recognizing capability is rather limited. Say that a language $L$ has the prefix property if there are no two different strings $x$ and $y$ in $L$ such that $x$ is a prefix of $y$.

Example: The language $L_{wcwr}$ has the prefix property. That is, it is not possible for there to be two strings $wcw^R$ and $xcx^R$, one of which is a prefix of the other, unless they are the same string. To see why, suppose $wcw^R$ is a prefix of $xcx^R$, and $w \neq x$. Then $w$ must be shorter than $x$. Therefore, the $c$ in $wcw^R$ comes in a position where $xcx^R$ has a 0 or 1; it is a position in the first $x$. That point contradicts the assumption that $wcw^R$ is a prefix of $xcx^R$.

On the other hand, there are some very simple languages that do not have the prefix property. Consider $\{0\}^*$, that is, the set of all strings of 0's. Clearly there are pairs of strings in this language one of which is a prefix of the other, so this language does not have the prefix property. In fact, of any two strings, one is a prefix of the other, although that condition is stronger than we need to establish that the prefix property does not hold.

DPDA's and Context-Free Languages

To see that the language $L_{wcwr}$ is not regular, suppose it were, and use the pumping lemma. If $n$ is the constant of the pumping lemma, then consider the string $w = 0^n c 0^n$, which is in $L_{wcwr}$. When we pump this string, it is the first group of 0's whose length must change, so we get strings that have the "center" marker not in the center. Since these strings are not in $L_{wcwr}$, we have a contradiction and conclude that $L_{wcwr}$ is not regular.

On the other hand, there are CFL's like $L_{wwr}$ that cannot be $L(P)$ for any DPDA $P$. A formal proof is complex, but the intuition is transparent. If $P$ is a DPDA accepting $L_{wwr}$, then given a sequence of 0's, it must store them on the stack, or do something equivalent to count the arbitrary number of 0's. Suppose $P$ has seen $n$ 0's and then sees $110^n$. It must verify that there were $n$ 0's after the 11, and to do so it must pop its stack. Now $P$ has seen $0^n 11 0^n$. If it sees the identical string next, it must accept, because the complete input is of the form $ww^R$, with $w = 0^n 11 0^n$. However, if it sees $0^m 11 0^m$ for some $m \neq n$, $P$ must not accept. Since its stack is empty, it cannot remember what arbitrary integer $n$ was, and must fail to recognize $L_{wwr}$ correctly. Our conclusion is that:

The languages accepted by DPDA's by final state properly include the regular languages, but are properly included in the CFL's.

End of Unit - III


UNIT - IV

Explain about the Turing machines. (or) What is a Turing Machine? Explain its components. (10 marks)

A problem that cannot be solved by any computer is called an "undecidable" problem. To study which problems are undecidable, we use a simple model of a computing device called the Turing machine.

Notation for the Turing Machine

The Turing machine consists of a finite control, which can be in any of a finite set of states. There is a tape divided into squares or cells; each cell can hold any one of a finite number of symbols.

[Figure: A Turing machine]

Initially, the input, a finite-length string of symbols chosen from the input alphabet, is placed on the tape. All other tape cells initially hold a special symbol called the blank. The blank is a tape symbol, but not an input symbol. There is a tape head that is always positioned at one of the tape cells. The Turing machine is said to be scanning that cell. Initially, the tape head is at the leftmost cell that holds the input.

A move of the Turing machine (TM) is a function of the state of the finite control and the tape symbol scanned. In one move, the Turing machine will:

1. Change state. The next state optionally may be the same as the current state.
2. Write a tape symbol in the cell scanned.
3. Move the tape head left or right.

The formal notation we shall use for a Turing machine is similar to that used for finite automata or PDA's. We describe a TM by the 7-tuple

$M = (Q, \Sigma, \Gamma, \delta, q_0, B, F)$

whose components have the following meanings:

$Q$: The finite set of states of the finite control.
$\Sigma$: The finite set of input symbols.
$\Gamma$: The complete set of tape symbols; $\Sigma$ is always a subset of $\Gamma$.
$\delta$: The transition function. The arguments of $\delta(q, X)$ are a state $q$ and a tape symbol $X$. The value of $\delta(q, X)$, if it is defined, is a triple $(p, Y, D)$, where $p$ is the next state in $Q$, $Y$ is the symbol in $\Gamma$ written in the cell being scanned, and $D$ is a direction, either "left" or "right".
$q_0$: The start state, a member of $Q$, in which the finite control is found initially.
$B$: The blank symbol.

$F$: The set of final or accepting states, a subset of $Q$.

Instantaneous Descriptions

The string $X_1 X_2 \cdots X_{i-1}\, q\, X_i X_{i+1} \cdots X_n$ is used to describe an ID of a Turing machine, in which:

1. $q$ is the state of the Turing machine.
2. The tape head is scanning the $i$th symbol from the left.
3. $X_1 X_2 \cdots X_n$ is the portion of the tape between the leftmost and the rightmost nonblank.
4. The moves of the Turing machine are described by the $\vdash$ notation, and $\vdash^*$ is used to indicate zero, one, or more moves of the TM.

Example: Let us design a Turing machine that will accept the language $\{0^n 1^n \mid n \geq 1\}$. Initially, it is given a finite sequence of 0's and 1's on its tape, preceded and followed by an infinity of blanks. Alternately, the TM will change a 0 to an $X$ and then a 1 to a $Y$, until all 0's and 1's have been matched.

In more detail, starting at the left end of the input, it repeatedly changes a 0 to an $X$ and moves to the right over whatever 0's and $Y$'s it sees, until it comes to a 1. It changes the 1 to a $Y$, and moves left, over $Y$'s and 0's, until it finds an $X$. At that point, it looks for a 0 immediately to the right, and if it finds one, changes it to $X$ and repeats the process, changing a matching 1 to a $Y$.

The formal specification of the TM is

$M = (\{q_0, q_1, q_2, q_3, q_4\}, \{0, 1\}, \{0, 1, X, Y, B\}, \delta, q_0, B, \{q_4\})$

where $\delta$ is given by the table in the figure.
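Since the transition table itself is not reproduced in these notes, the sketch below uses a table reconstructed from the prose description above; treat the exact entries as an assumption. The helper `run_tm` is a hypothetical, minimal simulator for a deterministic TM: it keeps the tape in a dictionary so that every cell not yet written reads as the blank $B$.

```python
BLANK = "B"

# Transition table assumed from the prose description (the figure is missing):
# (state, scanned symbol) -> (next state, written symbol, head move).
DELTA = {
    ("q0", "0"): ("q1", "X", "R"),      # mark the leftmost remaining 0
    ("q0", "Y"): ("q3", "Y", "R"),      # no 0's left: check that only Y's remain
    ("q1", "0"): ("q1", "0", "R"),      # move right over 0's ...
    ("q1", "Y"): ("q1", "Y", "R"),      # ... and Y's
    ("q1", "1"): ("q2", "Y", "L"),      # mark the matching 1
    ("q2", "0"): ("q2", "0", "L"),      # move left over 0's ...
    ("q2", "Y"): ("q2", "Y", "L"),      # ... and Y's
    ("q2", "X"): ("q0", "X", "R"),      # found the rightmost X: start again
    ("q3", "Y"): ("q3", "Y", "R"),
    ("q3", BLANK): ("q4", BLANK, "R"),  # end of input reached: accept
}


def run_tm(delta, word, start="q0", max_steps=10_000):
    """Run a deterministic TM until no move is defined; return (state, tape)."""
    tape = {i: c for i, c in enumerate(word)}   # sparse tape, blank elsewhere
    state, head = start, 0
    for _ in range(max_steps):
        key = (state, tape.get(head, BLANK))
        if key not in delta:
            break                               # the TM halts
        state, tape[head], move = delta[key]
        head += 1 if move == "R" else -1
    return state, tape


def accepts(word):
    """The input is accepted if the TM halts in the accepting state q4."""
    return run_tm(DELTA, word)[0] == "q4"
```

On input `0011` the simulated tape passes through `X011`, `X0Y1`, `XXY1` and `XXYY` before the machine reaches $q_4$.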

Transition Diagrams

A transition diagram consists of a set of nodes corresponding to the states of the TM. An arc from state $q$ to state $p$ is labeled by one or more items of the form $X / Y\,D$, where $X$ and $Y$ are tape symbols, and $D$ is a direction, either L or R. The start state is represented by the word "Start" and an arrow entering that state. Accepting states are indicated by double circles. Thus, the only information about the TM one cannot read directly from the diagram is the symbol used for the blank. We shall assume that symbol is $B$ unless we state otherwise. The transition diagram for the previous example is given in the figure.

Example

In this example we shall show how a Turing machine might compute the function $\dot{-}$, which is called monus or proper subtraction and is defined by $m \dot{-} n = \max(m - n, 0)$; that is, $m \dot{-} n$ is $m - n$ if $m \geq n$ and 0 if $m < n$. A TM that performs this operation is specified by

$M = (\{q_0, q_1, \ldots, q_6\}, \{0, 1\}, \{0, 1, B\}, \delta, q_0, B)$

Note that, since this TM is not used to accept inputs, we have omitted the seventh component, which is the set of accepting states. $M$ will start with a tape consisting of $0^m 1 0^n$ surrounded by blanks. $M$ halts with $0^{m \dot{-} n}$ on its tape.

$M$ repeatedly finds its leftmost remaining 0 and replaces it by a blank. It then searches right, looking for a 1. After finding a 1, it continues right, until it comes to a 0, which it replaces by a 1. $M$ then returns left, seeking the leftmost 0, which it identifies when it first meets a blank and then moves one cell to the right. The repetition ends if either:

1. Searching right for a 0, $M$ encounters a blank. Then the $n$ 0's in $0^m 1 0^n$ have all been changed to 1's, and $n + 1$ of the $m$ 0's have been changed to $B$. $M$ replaces the $n + 1$ 1's by one 0 and $n$ $B$'s, leaving $m - n$ 0's on the tape. Since $m \geq n$ in this case, $m - n = m \dot{-} n$.
2. Beginning the cycle, $M$ cannot find a 0 to change to a blank, because the first $m$ 0's already have been changed to $B$. Then $n \geq m$, so $m \dot{-} n = 0$. $M$ replaces all remaining 1's and 0's by $B$ and ends with a completely blank tape.

The figure shows the rules of the transition function $\delta$, and $\delta$ is also represented there as a transition diagram.

The following is a summary of the role played by each of the seven states:

$q_0$: This state begins the cycle, and also breaks the cycle when appropriate. If $M$ is scanning a 0, the cycle must repeat. The 0 is replaced by $B$, the head moves right, and state $q_1$ is entered. On the other hand, if $M$ is scanning the separating 1, then all possible matches between the two groups of 0's on the tape have been made, and $M$ goes to state $q_5$ to make the tape blank.

$q_1$: In this state, $M$ searches right, through the initial block of 0's, looking for the leftmost 1. When found, $M$ goes to state $q_2$.

$q_2$: $M$ moves right, skipping over 1's, until it finds a 0. It changes that 0 to a 1, turns leftward, and enters state $q_3$. However, it is also possible that there are no more 0's left after the block of 1's. In that case $M$ encounters a blank in state $q_2$; this is case (1) above, and $M$ enters state $q_4$.

$q_3$: $M$ moves left, skipping over 0's and 1's, until it finds a blank. When it finds $B$, it moves right and returns to state $q_0$, beginning the cycle again.

$q_4$: Here, the subtraction is complete, but one unmatched 0 in the first block was incorrectly changed to a $B$. $M$ therefore moves left, changing 1's to $B$'s, until it encounters a $B$ on the tape. It changes that $B$ back to 0, and enters state $q_6$, wherein $M$ halts.

$q_5$: State $q_5$ is entered from $q_0$ when it is found that all 0's in the first block have been changed to $B$. In this case, described in (2) above, the result of the proper subtraction is 0. $M$ changes all remaining 0's and 1's to $B$ and enters state $q_6$.

$q_6$: The sole purpose of this state is to allow $M$ to halt when it has finished its task. If the subtraction had been a subroutine of some more complex function, then $q_6$ would initiate the next step of that larger computation.
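The state-by-state summary above can be checked by simulation. The transition table below is reconstructed from that summary, since the figure with the actual table is not reproduced here, so its entries should be read as an assumption. The function `monus` is a hypothetical helper that writes $0^m 1 0^n$ on a sparse tape, runs the machine until it halts, and counts the 0's that remain.

```python
BLANK = "B"

# Table assumed from the summary of states q0..q6 (the figure is missing):
# (state, scanned symbol) -> (next state, written symbol, head move).
DELTA = {
    ("q0", "0"): ("q1", BLANK, "R"),    # erase the leftmost 0, start a cycle
    ("q0", "1"): ("q5", BLANK, "R"),    # first block exhausted: case (2)
    ("q1", "0"): ("q1", "0", "R"),
    ("q1", "1"): ("q2", "1", "R"),      # found the leftmost 1
    ("q2", "1"): ("q2", "1", "R"),
    ("q2", "0"): ("q3", "1", "L"),      # cancel one 0 of the second block
    ("q2", BLANK): ("q4", BLANK, "L"),  # second block exhausted: case (1)
    ("q3", "0"): ("q3", "0", "L"),
    ("q3", "1"): ("q3", "1", "L"),
    ("q3", BLANK): ("q0", BLANK, "R"),  # back at the left end: next cycle
    ("q4", "1"): ("q4", BLANK, "L"),    # erase the 1's
    ("q4", "0"): ("q4", "0", "L"),
    ("q4", BLANK): ("q6", "0", "R"),    # restore the extra 0 and halt
    ("q5", "0"): ("q5", BLANK, "R"),    # blank out everything that is left
    ("q5", "1"): ("q5", BLANK, "R"),
    ("q5", BLANK): ("q6", BLANK, "R"),
}


def monus(m, n):
    """Simulate the TM on 0^m 1 0^n and return the number of 0's left on the tape."""
    tape = dict(enumerate("0" * m + "1" + "0" * n))
    state, head = "q0", 0
    while (state, tape.get(head, BLANK)) in DELTA:
        state, tape[head], move = DELTA[(state, tape.get(head, BLANK))]
        head += 1 if move == "R" else -1
    assert state == "q6", "the machine is expected to halt in q6"
    return sum(1 for symbol in tape.values() if symbol == "0")
```

For instance, `monus(3, 1)` runs two full cycles and then takes the $q_2 \to q_4 \to q_6$ exit, while `monus(1, 2)` leaves through $q_5$ with a blank tape.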

Language of a Turing Machine and Halting

The input string is placed on the tape, and the tape head begins at the leftmost input symbol. If the TM eventually enters an accepting state, then the input is accepted, and otherwise not. Let $M = (Q, \Sigma, \Gamma, \delta, q_0, B, F)$ be a Turing machine. Then $L(M)$ is the set of strings $w$ in $\Sigma^*$ such that $q_0 w \vdash^* \alpha p \beta$ for some state $p$ in $F$ and any tape strings $\alpha$ and $\beta$. This definition was assumed when we discussed the Turing machine of the earlier example, which accepts strings of the form $0^n 1^n$.

The set of languages we can accept using a Turing machine is often called the recursively enumerable languages or RE languages.

There is another notion of "acceptance" that is commonly used for Turing machines: acceptance by halting. We say a TM halts if it enters a state $q$, scanning a tape symbol $X$, and there is no move in this situation; that is, $\delta(q, X)$ is undefined.

Give short notes on Programming Techniques for Turing Machines. (or) Explain the Programming Techniques involved in Turing Machines. (10 Marks)

The programming techniques for a TM are:

1. Storage in the state
2. Multiple tracks
3. Subroutines

Storage in the State

The finite control is used not only to represent a position in the "program" of the Turing machine, but also to hold a finite amount of data. Here we see the finite control as consisting of not only a "control" state $q$ but also three data elements $A$, $B$, and $C$. The technique requires no extension to the TM model; we merely think of the state as a tuple.

We shall design a TM

$M = (Q, \{0, 1\}, \{0, 1, B\}, \delta, [q_0, B], B, \{[q_1, B]\})$

that remembers in its finite control the first symbol that it sees, and checks that it does not appear elsewhere on its input. Thus, $M$ accepts the language $01^* + 10^*$. Accepting regular languages such as this one does not stress the ability of Turing machines, but it will serve as a simple demonstration.

The set of states $Q$ is $\{q_0, q_1\} \times \{0, 1, B\}$. That is, the states may be thought of as pairs with two components:

a) A control portion, $q_0$ or $q_1$, that remembers what the TM is doing. Control state $q_0$ indicates that $M$ has not yet read its first symbol, while $q_1$ indicates that it has read the symbol, and is checking that it does not appear elsewhere, by moving right and hoping to reach a blank cell.
b) A data portion, which remembers the first symbol seen, which must be 0 or 1. The symbol $B$ in this component means that no symbol has been read.

The transition function $\delta$ of $M$ is as follows:

1. $\delta([q_0, B], a) = ([q_1, a], a, R)$ for $a = 0$ or $a = 1$. Initially, $q_0$ is the control state, and the data portion of the state is $B$. The symbol scanned is copied into the second component of the state, and $M$ moves right, entering control state $q_1$ as it does so.

2. $\delta([q_1, a], \bar{a}) = ([q_1, a], \bar{a}, R)$, where $\bar{a}$ is the "complement" of $a$, that is, 0 if $a = 1$ and 1 if $a = 0$. In state $q_1$, $M$ skips over each symbol 0 or 1 that is different from the one it has stored in its state, and continues moving right.
3. $\delta([q_1, a], B) = ([q_1, B], B, R)$ for $a = 0$ or $a = 1$. If $M$ reaches the first blank, it enters the accepting state $[q_1, B]$.

Multiple Tracks

Another useful technique is to think of the tape of a Turing machine as composed of several tracks. Each track can hold one symbol, and the tape alphabet of the TM consists of tuples, with one component for each "track". A common use of multiple tracks is to treat one track as holding the data and a second track as holding a mark. In the present example, we shall use a second track explicitly to recognize the non-context-free language

$L_{wcw} = \{wcw \mid w \in (0+1)^+\}$

The Turing machine we shall design is

$M = (Q, \Sigma, \Gamma, \delta, [q_1, B], [B, B], \{[q_9, B]\})$

where:

$Q$: The set of states is $\{q_1, q_2, \ldots, q_9\} \times \{0, 1, B\}$, that is, pairs consisting of a control state $q_i$ and a data component 0, 1, or $B$.
$\Gamma$: The set of tape symbols is $\{B, *\} \times \{0, 1, c, B\}$. The first component, or track, can be either blank or "checked", represented by the symbols $B$ and $*$ respectively.
$\Sigma$: The input symbols are $[B, 0]$ and $[B, 1]$, identified with 0 and 1 respectively.
$\delta$: The transition function.

Subroutines

A Turing machine subroutine is a set of states that perform some useful process. This set of states includes a start state and another state that temporarily has no moves, and that serves as the "return" state to pass control to whatever other set of states called the subroutine. The "call" of a subroutine occurs whenever there is a transition to its initial state. The TM has no mechanism for remembering a "return address", that is, a state to go to after it finishes. So if our design of a TM requires one subroutine to be called from several states, we can make copies of the subroutine, using a new set of states for each copy. The "calls" are made to the start states of different copies of the subroutine, and each copy "returns" to a different state.

Example: We shall design a TM to implement the function "multiplication". That is, our TM will start with $0^m 1 0^n 1$ on its tape, and will end with $0^{mn}$ on the tape. An outline of the strategy is:

1. The tape will in general have one nonblank string of the form $0^i 1 0^n 1 0^{kn}$ for some $k$.
2. In one basic step, we change a 0 in the first group to $B$ and add $n$ 0's to the last group, giving us a string of the form $0^{i-1} 1 0^n 1 0^{(k+1)n}$.
3. As a result, we copy the group of $n$ 0's to the end $m$ times, once each time we change a 0 in the first group to $B$. When the first group of 0's is completely changed to blanks, there will be $mn$ 0's in the last group.
4. The final step is to change the leading $1 0^n 1$ to blanks, and we are done.

The heart of this algorithm is a subroutine, which we call Copy. This subroutine implements step (2) above, copying the block of $n$ 0's to the end. Copy converts an ID of the form $0^{m-k} 1 q_1 0^n 1 0^{(k-1)n}$ to ID $0^{m-k} 1 q_5 0^n 1 0^{kn}$. The following figure shows the transitions of subroutine Copy.

This subroutine marks the first 0 with an $X$, moves right in state $q_2$ until it finds a blank, copies the 0 there, and moves left in state $q_3$ to find the marker $X$. It repeats this cycle until in state $q_1$ it finds a 1 instead of a 0. At that point, it uses state $q_4$ to change the $X$'s back to 0's and ends in state $q_5$.

The complete multiplication Turing machine starts in state $q_0$. The first thing it does is go, in several steps, from ID $q_0 0^m 1 0^n 1$ to ID $0^{m-1} 1 q_1 0^n 1$. The transitions needed are shown in the portion of the figure to the left of the subroutine call; these transitions involve states $q_0$ and $q_6$ only.

The purpose of states $q_7$, $q_8$ and $q_9$ is to take control after Copy has just copied a block of $n$ 0's and is in ID $0^{m-k} 1 q_5 0^n 1 0^{kn}$. Eventually, these states bring us to ID $q_0 0^{m-k} 1 0^n 1 0^{kn}$. At that point the cycle starts again, and Copy is called to copy the block of $n$ 0's again.

As an exception, in state $q_8$ the TM may find that all $m$ 0's have been changed to blanks. In that case, a transition to state $q_{10}$ occurs. This state, with the help of state $q_{11}$, changes the leading $1 0^n 1$ to blanks and enters the halting state $q_{12}$. At this point, the TM is in ID $q_{12} 0^{mn}$, and its job is done.

Write short notes on Extensions to the Basic Turing Machine. (or) Explain the various models of Basic Turing Machine. (10 Marks)

The basic Turing machine can be extended in several ways. The multitape Turing machine has several tapes, and the nondeterministic Turing machine is an extension of the basic model that is allowed to make any of a finite set of choices of move in a given situation. These extensions make "programming" Turing machines easier, but add no language-defining power to the basic model.

Multitape Turing Machines

The device has a finite control and some finite number of tapes. Each tape is divided into cells, and each cell can hold any symbol of the finite tape alphabet. As in the single-tape TM, the set of tape symbols includes a blank, and has a subset called the input symbols, of which the blank is not a member. The set of states includes an initial state and some accepting states. Initially:

1. The input, a finite sequence of input symbols, is placed on the first tape.
2. All other cells of all the tapes hold the blank.
3. The finite control is in the initial state.
4. The head of the first tape is at the left end of the input.
5. All other tape heads are at some arbitrary cell.

A move of the multitape TM depends on the state and the symbol scanned by each of the tape heads. In one move, the multitape TM does the following:

1. The control enters a new state, which could be the same as the previous state.

2. On each tape, a new tape symbol is written on the cell scanned. Any of these symbols may be the same as the symbol previously there.
3. Each of the tape heads makes a move, which can be left, right or stationary. The heads move independently, so different heads may move in different directions, and some may not move at all.

Multitape Turing machines, like one-tape TM's, accept by entering an accepting state.

Equivalence of One-Tape and Multitape TM's

Recursively enumerable languages are defined to be those accepted by a one-tape TM. Multitape TM's accept all the recursively enumerable languages, since a one-tape TM is a multitape TM. Conversely, every language accepted by a multitape TM is recursively enumerable, because a one-tape TM can simulate a multitape TM.

Running Time and the Many-Tapes-to-One Construction

The running time of TM $M$ on input $w$ is the number of steps that $M$ makes before halting. If $M$ doesn't halt on $w$, then the running time of $M$ on $w$ is infinite. The time complexity of TM $M$ is the function $T(n)$ that is the maximum, over all inputs $w$ of length $n$, of the running time of $M$ on $w$. For Turing machines that do not halt on all inputs, $T(n)$ may be infinite for some or even all $n$.

The constructed one-tape TM may take much more running time than the multitape TM. However, the amounts of time taken by the two Turing machines are commensurate in a weak sense: the one-tape TM takes time that is no more than the square of the time taken by the other. While "squaring" is not a very strong guarantee, it does preserve polynomial running time.

a) The difference between polynomial time and higher growth rates in running time is really the divide between what we can solve by computer and what is in practice not solvable.
b) Despite extensive research, the running time needed to solve many problems has not been resolved closer than to within some polynomial.

Nondeterministic Turing Machines

A nondeterministic Turing machine (NTM) differs from the deterministic variety we have been studying by having a transition function $\delta$ such that for each state $q$ and tape symbol $X$, $\delta(q, X)$ is a set of triples

$\{(q_1, Y_1, D_1), (q_2, Y_2, D_2), \ldots, (q_k, Y_k, D_k)\}$

where $k$ is any finite integer. The language accepted by an NTM $M$ is defined in the expected manner, in analogy with the other nondeterministic devices, such as NFA's and PDA's, that we have studied. That is, $M$ accepts an input $w$ if there is any sequence of choices of move that leads from the initial ID with $w$ as input to an ID with an accepting state.

The NTM's accept no languages not accepted by a deterministic TM. The proof involves showing that for every NTM $M_N$ we can construct a DTM $M_D$ that explores the ID's that $M_N$ can reach by any sequence of its choices. If $M_D$ finds one that has an accepting state, then $M_D$ enters an accepting state of its own. $M_D$ must be systematic, putting new ID's on a queue rather than a stack, so that after some finite time $M_D$ has simulated all sequences of up to $k$ moves of $M_N$, for $k = 1, 2, \ldots$
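The queue-based exploration of ID's can be illustrated with a short Python sketch. The NTM used here is a hypothetical example that is not taken from the notes: it scans right and nondeterministically guesses where the substring 11 begins. The function `ntm_accepts` performs a breadth-first search over ID's, which is the systematic strategy described above; the `max_ids` bound is an added safeguard, since an NTM in general need not halt.

```python
from collections import deque

BLANK = "B"

# Hypothetical NTM (not from the notes): accepts strings containing "11".
# (state, scanned symbol) -> set of (next state, written symbol, head move).
DELTA = {
    ("q0", "0"): {("q0", "0", "R")},
    ("q0", "1"): {("q0", "1", "R"), ("q1", "1", "R")},   # two choices of move
    ("q1", "1"): {("qa", "1", "R")},
}
ACCEPTING = {"qa"}


def ntm_accepts(word, max_ids=100_000):
    """Breadth-first search over the IDs (state, head position, tape)."""
    start = ("q0", 0, tuple(word))
    seen, queue = {start}, deque([start])        # FIFO queue, not a stack
    while queue and len(seen) <= max_ids:
        state, head, tape = queue.popleft()
        if state in ACCEPTING:
            return True
        scanned = tape[head] if 0 <= head < len(tape) else BLANK
        for nxt, written, move in DELTA.get((state, scanned), ()):
            cells, pos = list(tape), head
            if pos == len(cells):
                cells.append(BLANK)              # extend the tape to the right
            elif pos < 0:
                cells.insert(0, BLANK)           # extend the tape to the left
                pos = 0
            cells[pos] = written
            new_id = (nxt, pos + (1 if move == "R" else -1), tuple(cells))
            if new_id not in seen:
                seen.add(new_id)
                queue.append(new_id)
    return False
```

Replacing the queue by a stack would turn the search into a depth-first one, which could follow a single non-halting branch forever and never examine an accepting branch.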

Explain Restricted Turing Machines with example. (or) Give a brief note on Restricted Turing Machines with example. (10 Marks)

First, we restrict the tapes of the TM to behave like stacks. Then, we further restrict the tapes to be "counters", that is, they can only represent one integer, and the TM can only distinguish a count of 0 from any nonzero count.

Turing Machines with Semi-infinite Tapes

We can assume the tape is semi-infinite, that is, there are no cells to the left of the initial head position. A construction shows that a TM with a semi-infinite tape can simulate one whose tape is, like our original TM model, infinite in both directions.

The trick behind the construction is to use two tracks on the semi-infinite tape. The upper track represents the cells of the original TM that are at or to the right of the initial head position. The lower track represents the positions to the left of the initial position, but in reverse order. The exact arrangement is suggested in the figure. The upper track represents cells $X_0, X_1, \ldots$, where $X_0$ is the initial position of the head and $X_1$, $X_2$, and so on, are the cells to its right. Cells $X_{-1}$, $X_{-2}$, and so on, represent cells to the left of the initial position. The leftmost cell of the lower track holds an end marker, which prevents the head of the semi-infinite TM from accidentally falling off the left end of the tape.

We can make one more restriction on our Turing machine: it never writes a blank. This simple restriction, coupled with the restriction that the tape is only semi-infinite, means that the tape is at all times a prefix of nonblank symbols followed by an infinity of blanks. Further, the sequence of nonblanks always begins at the initial tape position.

Multistack Machines

We now consider a class of machines called "counter machines". These machines have only the ability to store a finite number of integers ("counters"), and to make different moves depending on which, if any, of the counters are currently 0. The counter machine can only add or subtract one from the counter, and cannot tell two different nonzero counts from each other. In effect, a counter is like a stack on which we can place only two symbols: a bottom-of-stack marker that appears only at the bottom, and one other symbol that may be pushed and popped from the stack.

A $k$-stack machine is a deterministic PDA with $k$ stacks. It obtains its input, like the PDA does, from an input source, rather than having the input placed on a tape or stack, as the TM does. The multistack machine has a finite control, which is in one of a finite set of states. It has a finite stack alphabet, which it uses for all its stacks. A move of the multistack machine is based on:

1. The state of the finite control.
2. The input symbol read, which is chosen from the finite input alphabet. Alternatively, the multistack machine can make a move using $\epsilon$ input, but to make the machine deterministic, there cannot be a choice of an $\epsilon$-move or a non-$\epsilon$-move in any situation.
3. The top stack symbol on each of its stacks.

In one move, the multistack machine can:

a) Change to a new state.
b) Replace the top symbol of each stack with a string of zero or more stack symbols. There can be (and usually is) a different replacement string for each stack.

Thus, a typical transition rule for a $k$-stack machine looks like:

$\delta(q, a, X_1, X_2, \ldots, X_k) = (p, \gamma_1, \gamma_2, \ldots, \gamma_k)$

The interpretation of the rule is that in state $q$, with $X_i$ on top of the $i$th stack, for $i = 1, 2, \ldots, k$, the machine may consume $a$ (either an input symbol or $\epsilon$) from its input, go to state $p$, and replace $X_i$ on top of the $i$th stack by string $\gamma_i$, for each $i = 1, 2, \ldots, k$. The multistack machine accepts by entering a final state.

We add one capability that simplifies input processing by this deterministic machine: we assume there is a special symbol $, called the end-marker, that appears only at the end of the input and is not part of that input. The presence of the end-marker allows us to know when we have consumed all the available input. We shall see in the next theorem how the end-marker makes it easy for the multistack machine to simulate a Turing machine. Notice that the conventional TM needs no special end-marker, because the first blank serves to mark the end of the input.

Counter Machines

A counter machine may be thought of in one of two ways:
1. The counter machine has the same structure as the multistack machine, but in place of each stack is a counter. Counters hold any nonnegative integer, but we can only distinguish between zero and nonzero counters. That is, the move of the counter machine depends on its state, input symbol, and which, if any, of the counters are zero. In one move, the counter machine can:
   a) Change state.
   b) Add or subtract 1 from any of its counters, independently. However, a counter is not allowed to become negative, so it cannot subtract 1 from a counter that is currently 0.
2. A counter machine may also be regarded as a restricted multistack machine. The restrictions are as follows:
   a) There are only two stack symbols, which we shall refer to as Z0 (the bottom-of-stack marker) and X.
   b) Z0 is initially on each stack.
   c) We may replace Z0 only by a string of the form X^i Z0, for some i ≥ 0.
   d) We may replace X only by X^i, for some i ≥ 0. That is, Z0 appears only on the bottom of each stack, and all other stack symbols, if any, are X.

We shall use definition (1) for counter machines, but the two definitions clearly define machines of equivalent power. The reason is that stack X^i Z0 can be identified with the count i. In definition (2), we can tell count 0 from other counts, because for count 0 we see Z0 on top of the stack, and otherwise we see X.

The Power of Counter Machines

The observations about the languages accepted by counter machines are as follows:
- Every language accepted by a counter machine is recursively enumerable. The reason is that a counter machine is a special case of a stack machine, and a stack machine is a special case of a multitape Turing machine, which accepts only recursively enumerable languages.
- Every language accepted by a one-counter machine is a CFL. Note that a counter, in point of view (2), is a stack, so a one-counter machine is a special case of a one-stack machine, i.e., a PDA. In fact, the languages of one-counter machines are accepted by deterministic PDAs, although the proof is surprisingly complex. The difficulty in the proof stems from the fact that the multistack and counter machines have an end-marker at the end of their input.
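As an illustration of definition (1), here is a minimal sketch of a deterministic one-counter machine for the language {0^n 1^n | n ≥ 1}. The language, the state names read0, read1 and accept, and the function name accepts_0n1n are assumptions chosen for this example and do not come from the notes. The machine looks only at its state, the next input symbol, and whether the counter is zero, and it relies on the end-marker $ to know when the input is finished.

```python
def accepts_0n1n(word: str) -> bool:
    """One-counter machine for {0^n 1^n : n >= 1}; the input is followed by '$'."""
    state = "read0"      # hypothetical state names: read0, read1, accept
    counter = 0          # the single nonnegative counter
    for symbol in word + "$":
        zero = (counter == 0)          # the only fact about the counter we may test
        if state == "read0" and symbol == "0":
            counter += 1               # count each leading 0
        elif state in ("read0", "read1") and symbol == "1" and not zero:
            counter -= 1               # cancel one 0 for each 1
            state = "read1"
        elif state == "read1" and symbol == "$" and zero:
            state = "accept"           # end-marker seen with every 0 cancelled
        else:
            return False               # no move is defined: the machine dies
    return state == "accept"
```

The counter is never decremented when it is zero, which respects the rule that a counter may not become negative.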

Write short notes on Turing machines and computers. Explain the steps involved in the simulation of Turing machines by computers. (or) (10 Marks)

The claims about TMs and computers can be divided into two parts:
1. A computer can simulate a Turing machine.
2. A Turing machine can simulate a computer, and can do so in an amount of time that is at most some polynomial in the number of steps taken by the computer.

Simulating a Turing Machine by Computer

Given a particular TM M, we must write a program that acts like M. One aspect of M is its finite control. Since there are only a finite number of states and a finite number of transition rules, our program can encode states as character strings and use a table of transitions, which it looks up to determine each move. Likewise, the tape symbols can be encoded as character strings of a fixed length, since there are only a finite number of tape symbols.

A serious question arises when we consider how our program is to simulate the Turing-machine tape. This tape can grow infinitely long, but the computer's memory (main memory, disk, and other storage devices) is finite. If there is no opportunity to replace storage devices, then in fact we cannot simulate the tape; a computer would then be a finite automaton, and the only languages it could accept would be regular. However, common computers have swappable storage devices, perhaps a "Zip" disk, for example. Since there is no obvious limit on how many disks we could use, let us assume that as many disks as the computer needs are available.

We can thus arrange that the disks are placed in two stacks, as shown in the figure. One stack holds the data in cells of the Turing-machine tape that are located significantly to the left of the tape head, and the other stack holds data significantly to the right of the tape head. If the tape head of the TM moves sufficiently far to the left that it reaches cells that are not represented by the disk currently mounted in the computer, then it prints a message "swap left". The currently mounted disk is removed by a human operator and placed on the top of the right stack. The disk on top of the left stack is mounted in the computer, and computation resumes.

Similarly, if the TM's tape head reaches cells so far to the right that these cells are not represented by the mounted disk, then a "swap right" message is printed. The human operator moves the currently mounted disk to the top of the left stack, and mounts the disk on top of the right stack in the computer. If either stack is empty when the computer asks that a disk from that stack be mounted, then the TM has entered an all-blank region of the tape. In that case, the human operator must go to the store and buy a fresh disk to mount.

Simulating a Computer by a Turing Machine

The figure suggests how a Turing machine would be designed to simulate a computer. This TM uses several tapes, but it could be converted to a one-tape TM using the construction for multitape Turing machines.

The first tape represents the entire memory of the computer. We have used a code in which addresses of memory words, in numerical order, alternate with the contents of those memory words. Both addresses and contents are written in binary. The marker symbols * and # are used to make it easy to find the ends of addresses and contents, and to tell whether a binary string is an address or contents.

The second tape is the "instruction counter." This tape holds one integer in binary, which represents one of the memory locations on tape 1. The value stored in this location will be interpreted as the next computer instruction to be executed.

The third tape holds a "memory address" or the contents of that address after the address has been located on tape 1. To execute an instruction, the TM must find the contents of one or more memory addresses that hold data involved in the computation.

Our TM will simulate the instruction cycle of the computer, as follows.
1. Search the first tape for an address that matches the instruction number on tape 2. We start at the $ on the first tape, and move right, comparing each address with the contents of tape 2.
2. When the instruction address is found, examine its value. Let us assume that when a word is an instruction, its first few bits represent the action to be taken (e.g., copy, add, branch), and the remaining bits code an address or addresses.
3. If the instruction requires the value of some address, copy that address onto the third tape and mark the position of the instruction, using a second track of the first tape.
4. Execute the instruction, or the part of the instruction involving the value.
5. After performing the instruction, and determining that the instruction is not a jump, add 1 to the instruction counter on tape 2 and begin the instruction cycle again.

The fourth tape holds the simulated input to the computer, since the computer must read its input from a file. A scratch tape is also shown. Simulation of some computer instructions might make effective use of a scratch tape or tapes to compute arithmetic operations such as multiplication. Finally, we assume that the computer makes an output that tells whether or not its input is accepted.

Comparing the Running Times of Computers and Turing Machines

The issue of running time is important because we shall use the TM not only to examine the question of what can be computed at all, but also what can be computed with enough efficiency to be useful. The dividing line between the tractable (that which can be solved efficiently) and the intractable (problems that can be solved, but not fast enough for the solution to be usable) is generally held to be between what can be computed in polynomial time and what requires more than any polynomial running time. Thus, we need to assure ourselves that if a problem can be solved in polynomial time on a typical computer, then it can be solved in polynomial time by a Turing machine, and conversely. To argue that the TM described above can simulate n steps of a computer in O(n^3) time, we need to confront the issue of multiplication as a computer instruction.
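The first direction, a program that acts like a given TM by looking up a table of transitions, can be sketched as follows. This is only an illustration under stated assumptions: the function name run_tm, the step budget max_steps, and the example machine (the standard five-state TM for {0^n 1^n | n ≥ 1}) are chosen for this sketch and are not taken from the notes. A dictionary that grows on demand stands in for the unbounded supply of swappable disks.

```python
from collections import defaultdict

BLANK = "B"

def run_tm(delta, start, accepting, word, max_steps=10_000):
    """Simulate a deterministic TM given as a transition table (hypothetical helper)."""
    tape = defaultdict(lambda: BLANK)      # cells appear only when visited
    for position, symbol in enumerate(word):
        tape[position] = symbol
    state, head = start, 0
    for _ in range(max_steps):
        if state in accepting:
            return True
        move = delta.get((state, tape[head]))   # table lookup decides each move
        if move is None:
            return False                        # no move defined: halt, reject
        state, tape[head], direction = move
        head += 1 if direction == "R" else -1
    raise RuntimeError("step budget exhausted; the TM may not halt")

# Example table: a TM for {0^n 1^n | n >= 1} that marks 0's as X and 1's as Y.
DELTA = {
    ("q0", "0"): ("q1", "X", "R"), ("q0", "Y"): ("q3", "Y", "R"),
    ("q1", "0"): ("q1", "0", "R"), ("q1", "Y"): ("q1", "Y", "R"),
    ("q1", "1"): ("q2", "Y", "L"),
    ("q2", "0"): ("q2", "0", "L"), ("q2", "Y"): ("q2", "Y", "L"),
    ("q2", "X"): ("q0", "X", "R"),
    ("q3", "Y"): ("q3", "Y", "R"), ("q3", BLANK): ("q4", BLANK, "R"),
}
```

For instance, run_tm(DELTA, "q0", {"q4"}, "0011") follows the table until it reaches the accepting state q4. The step budget exists only because a real program cannot wait forever on a TM that never halts.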

Explain the informal definition of pushdown automata with example? Explain PDA with example? (or) (5 marks)

The pushdown automaton is a nondeterministic finite automaton with ε-transitions permitted and one additional capability: a stack on which it can store a string of "stack symbols". The PDA can only access the information on its stack in a last-in-first-out way. It recognizes only the context-free languages. While there are many languages that are context-free, including some we have seen that are not regular languages, there are also some simple-to-describe languages that are not context-free. An example of a non-context-free language is {0^n 1^n 2^n | n ≥ 1}, the set of strings consisting of equal groups of 0's, 1's and 2's.

A "finite-state control" reads input, one symbol at a time. The pushdown automaton is allowed to observe the symbol at the top of the stack and to base its transition on its current state, the input symbol, and the symbol at the top of the stack. Alternatively, it may make a "spontaneous" transition, using ε as its input instead of an input symbol. In one transition, the pushdown automaton:
1. Consumes from the input the symbol that it uses in the transition. If ε is used for the input, then no input symbol is consumed.
2. Goes to a new state, which may or may not be the same as the previous state.
3. Replaces the symbol at the top of the stack by any string. The string could be ε, which corresponds to a pop of the stack. It could be the same symbol that appeared at the top of the stack previously; that is, no change to the stack is made.

Example: Let us consider the language

L_wwr = { w w^R | w is in (0+1)* }

This language, often referred to as "w-w-reversed", is the even-length palindromes over alphabet {0, 1}. We can design an informal pushdown automaton accepting L_wwr, as follows.
1. Start in a state q0 that represents a "guess" that we have not yet seen the middle; that is, we have not seen the end of the string w that is to be followed by its own reverse.
2. While in state q0, we read symbols and store them on the stack, by pushing a copy of each input symbol onto the stack.
3. At any time, we may guess that we have seen the middle, that is, the end of w. At this time, w will be on the stack, with the right end of w at the top and the left end at the bottom. We signify this choice by spontaneously going to state q1. Since the automaton is nondeterministic, we actually make both guesses: we guess we have seen the end of w, but we also stay in state q0 and continue to read inputs and store them on the stack.
4. Once in state q1, we compare input symbols with the symbol at the top of the stack. If they match, we consume the input symbol, pop the stack, and proceed. If they do not match, we have guessed wrong; our guessed w was not followed by w^R. This branch dies, although other branches of the nondeterministic automaton may survive and eventually lead to acceptance.
5. If we empty the stack, then we have indeed seen some input w followed by w^R. We accept the input that was read up to this point.
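The five steps above can be sketched as a small simulation that keeps every live branch of the nondeterministic automaton at once. The function name accepts_wwr and the representation of a branch as a (state, stack) pair are assumptions made for this illustration; the states q0 and q1 are those of the informal description. The sketch accepts exactly when some branch is in q1 with an empty stack after the whole input has been read.

```python
def accepts_wwr(word: str) -> bool:
    """Track all branches of the informal PDA for L_wwr over {0, 1}."""
    # A branch is (state, stack); the stack is a tuple with its top at the end.
    branches = {("q0", ())}
    # Spontaneous move: guess that the middle has been reached (q0 -> q1).
    branches |= {("q1", stack) for state, stack in branches if state == "q0"}
    for symbol in word:
        survivors = set()
        for state, stack in branches:
            if state == "q0":
                survivors.add(("q0", stack + (symbol,)))   # push a copy of the input
            elif stack and stack[-1] == symbol:
                survivors.add(("q1", stack[:-1]))          # match: pop and proceed
            # otherwise this branch guessed wrong and dies
        # After each symbol, again guess that the middle may be here.
        survivors |= {("q1", stack) for state, stack in survivors if state == "q0"}
        branches = survivors
    return ("q1", ()) in branches
```

Because every branch is kept, the number of tracked configurations can grow with the input length; this is a convenience of the simulation, not a property of the PDA itself.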

End of Unit-4


UNIT -V

Explain the concept of intractability theory based on polynomial time? Describe the classes P and NP problem using intractability theory? (or) (10 Marks)

We introduce the basic components of intractability theory: the classes P and NP of problems solvable in polynomial time by deterministic and nondeterministic TMs, respectively, and the technique of polynomial-time reduction.

Problems solvable in polynomial time:

A Turing machine M is said to be of time complexity T(n) if, whenever M is given an input w of length n, M halts after making at most T(n) moves, regardless of whether or not M accepts. This definition applies to any function T(n), such as T(n) = 50n^2 or T(n) = 3^n + 5n^4; we shall be interested predominantly in the case where T(n) is a polynomial in n. We say a language L is in class P if there is some polynomial T(n) such that L = L(M) for some deterministic TM M of time complexity T(n).

An example: Kruskal's algorithm:

Many problems have efficient solutions; perhaps you studied some in a course on data structures and algorithms. These problems are generally in P. We shall consider one such problem: finding a minimum-weight spanning tree (MWST) for a graph. Informally, we think of graphs as diagrams such as that of the figure.

There are nodes, which are numbered 1-4 in this example graph, and there are edges between some pairs of nodes. Each edge has a weight, which is an integer. A spanning tree is a subset of the edges such that all nodes are connected through these edges, yet there are no cycles. An example of a spanning tree is shown by bold edges. A minimum-weight spanning tree has the least possible total edge weight of all spanning trees.

There is a well-known "greedy" algorithm, called Kruskal's Algorithm, for finding a MWST. Here is an informal outline of the key ideas:
1. Maintain for each node the connected component in which the node appears, using whatever edges of the tree have been selected so far. Initially, no edges are selected, so every node is in a connected component by itself.
2. Consider the lowest-weight edge that has not yet been considered; break ties any way you like. If this edge connects two nodes that are currently in different connected components, then:
   o Select that edge for the spanning tree, and
   o Merge the two connected components involved.
   Otherwise, reject the edge, since it would create a cycle.
3. Continue considering edges until either all edges have been considered, or the number of edges selected for the spanning tree is one less than the number of nodes. Note that in the latter case, all nodes must be in one connected component, and we can stop considering edges.

Example: In the graph, we first consider the edge (1,3) because it has the lowest weight, 10. Since 1 and 3 are initially in different components, we accept this edge and make 1 and 3 have the same component number, say "component 1". The next edge in order of weights is (2,3), with weight 12. Since 2 and 3 are in different components, we accept this edge and merge node 2 into "component 1". The third edge is (1,2), with weight 15. However, 1 and 2 are now in the same component, so we reject this edge and proceed to the fourth edge, (3,4). Since 4 is not in "component 1", we accept this edge. Now we have three edges for the spanning tree of a 4-node graph, and so may stop.

Nondeterministic polynomial time

Formally, we say a language L is in the class NP if there is a nondeterministic TM M and a polynomial time complexity T(n) such that L = L(M), and when M is given an input of length n, there are no sequences of more than T(n) moves of M. Our first observation is that, since every deterministic TM is a nondeterministic TM that happens never to have a choice of moves, P ⊆ NP. However, it appears that NP contains many problems not in P. The intuitive reason is that an NTM running in polynomial time has the ability to guess an exponential number of possible solutions to a problem and check each one in polynomial time, "in parallel". However, it is one of the deepest open questions of mathematics whether P = NP, that is, whether everything that can be done in polynomial time by an NTM can in fact be done by a DTM in polynomial time, perhaps with a higher-degree polynomial.

The Traveling Salesman Problem

The input to TSP is the same as to MWST: a graph with integer weights on the edges, such as that of the figure, and a weight limit W. The question asked is whether the graph has a "Hamilton circuit" of total weight at most W. A Hamilton circuit is a set of edges that connect the nodes into a single cycle, with each node appearing exactly once. Note that the number of edges on a Hamilton circuit must equal the number of nodes in the graph.

Example: The graph of the figure actually has only one Hamilton circuit: the cycle (1,2,4,3,1). The total weight of this cycle is 15 + 20 + 18 + 10 = 63. Thus, if W is 63 or more, the answer is "yes", and if W < 63 the answer is "no".

However, the TSP on four-node graphs is deceptively simple, since there can never be more than three different Hamilton circuits once we account for the different nodes at which the same cycle can start, and for the direction in which we traverse the cycle. In m-node graphs, the number of distinct cycles grows as O(m!), the factorial of m, which is more than 2^(cm) for any constant c. It appears that all ways to solve the TSP involve trying essentially all cycles and computing their total weight. By being clever, we can eliminate some obviously bad choices. But it seems that no matter what we do, we must examine an exponential number of cycles before we can conclude that there is none with the desired weight limit W, or find one if we are unlucky in the order in which we consider the cycles.

On the other hand, if we had a nondeterministic computer, we could guess a permutation of the nodes, and compute the total weight for the cycle of nodes in that order. If there were a real computer that was nondeterministic, no branch would use more than O(n) steps if the input was of length n. On a multitape NTM, we can guess a permutation in O(n^2) steps and check its total weight in a similar amount of time. Thus, a single-tape NTM can solve the TSP in O(n^4) time at most. We conclude that the TSP is in NP.

Polynomial-Time Reductions

Our principal methodology for proving that a problem P2 cannot be solved in polynomial time (i.e., P2 is not in P) is the reduction of a problem P1, which is known not to be in P, to P2. The approach was suggested in the figure, which we reproduce here.
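To make the contrast between the two graph problems of this unit concrete, here is a minimal sketch that runs both on the four-node example. The edge weights are reconstructed from the worked examples (10, 12, 15 and 18 from the Kruskal walkthrough, 20 from the Hamilton circuit total of 63), which is an assumption about the missing figure. The function names kruskal, weight and tsp_decision are hypothetical. Kruskal's algorithm considers each edge once, while the TSP check simply tries every ordering of the nodes.

```python
from itertools import permutations

# Assumed reconstruction of the example graph: {edge: weight}.
EDGES = {(1, 3): 10, (2, 3): 12, (1, 2): 15, (3, 4): 18, (2, 4): 20}
NODES = [1, 2, 3, 4]

def kruskal(nodes, edges):
    """Greedy MWST: take edges in weight order if they join two components."""
    component = {v: v for v in nodes}            # every node starts alone
    tree = []
    for (u, v), w in sorted(edges.items(), key=lambda item: item[1]):
        if component[u] != component[v]:
            tree.append((u, v, w))               # select the edge
            old, new = component[v], component[u]
            for x in nodes:                      # merge the two components
                if component[x] == old:
                    component[x] = new
        if len(tree) == len(nodes) - 1:          # spanning tree is complete
            break
    return tree

def weight(u, v, edges):
    """Weight of the undirected edge {u, v}, or None if it is absent."""
    return edges.get((u, v), edges.get((v, u)))

def tsp_decision(nodes, edges, limit):
    """Exhaustive search: is there a Hamilton circuit of total weight <= limit?"""
    first, rest = nodes[0], nodes[1:]
    for order in permutations(rest):             # (m-1)! candidate cycles
        cycle = [first, *order, first]
        weights = [weight(a, b, edges) for a, b in zip(cycle, cycle[1:])]
        if None not in weights and sum(weights) <= limit:
            return True
    return False
```

On this graph, kruskal selects (1,3), (2,3) and (3,4), matching the walkthrough, and tsp_decision answers "yes" for W = 63 and "no" for W = 62. The loop over permutations is what makes the second function impractical as the number of nodes grows.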

Suppose we want to prove the statement "if P2 is in P, then so is P1". Since we claim that P1 is not in P, we could then claim that P2 is not in P either. However, the mere existence of the algorithm labeled "Construct" in Fig. 10.2 is not sufficient to prove the desired statement.

For instance, suppose that when given an instance of P1 of length m, the algorithm produced an output string of length 2^m, which it fed to the hypothetical polynomial-time algorithm for P2. If that decision algorithm ran in, say, time O(n^k), then on an input of length 2^m it would run in time O(2^(km)), which is exponential in m. Thus, the decision algorithm for P1 takes, when given an input of length m, time that is exponential in m. These facts are entirely consistent with the situation where P2 is in P and P1 is not in P.

Even if the algorithm that constructs a P2 instance from a P1 instance always produces an instance that is polynomial in the size of its input, we can fail to reach our desired conclusion. For instance, suppose that the instance of P2 constructed is of the same size, m, as the P1 instance, but the construction algorithm itself takes time that is exponential in m, say O(2^m). Now, a decision algorithm for P2 that takes polynomial time O(n^k) on input of length n only implies that there is a decision algorithm for P1 that takes time O(2^m + m^k) on input of length m. This running-time bound takes into account the fact that we have to perform the translation to P2 as well as solve the resulting P2 instance. Again, it would be possible for P2 to be in P and P1 not.

Thus, in the theory of intractability we shall use polynomial-time reductions only. A reduction from P1 to P2 is polynomial-time if it takes time that is some polynomial in the length of the P1 instance. Note that as a consequence, the P2 instance will be of a length that is polynomial in the length of the P1 instance.

NP-Complete Problems

Let L be a language (problem) in NP. We say L is NP-complete if the following statements are true about L:
1. L is in NP.
2. For every language L' in NP there is a polynomial-time reduction of L' to L.

An example of an NP-complete problem, as we shall see, is the Traveling Salesman Problem, which we introduced already. Since it appears that P ≠ NP, and in particular, all the NP-complete problems are in NP - P, we generally view a proof of NP-completeness for a problem as a proof that the problem is not in P.

We shall prove our first problem, called SAT (for Boolean satisfiability), to be NP-complete by showing that the language of every polynomial-time NTM has a polynomial-time reduction to SAT. However, once we have some NP-complete problems, we can prove a new problem to be NP-complete by reducing some known NP-complete problem to it, using a polynomial-time reduction.

Explain an NP-Complete Problem? (or) (10 Marks)

Describe the NP-complete problem?

This problem, whether a Boolean expression is satisfiable, is proved NP-complete by explicitly reducing the language of any nondeterministic, polynomial-time TM to the satisfiability problem.

The satisfiability problem

Boolean expressions are built from:
1. Variables whose values are Boolean; i.e., they either have the value 1 (true) or 0 (false).
2. Binary operators ∧ and ∨, standing for the logical AND and OR of two expressions.
3. Unary operator ¬, standing for logical negation.
4. Parentheses to group operators and operands, if necessary to alter the default precedence of operators: ¬ highest, then ∧, and finally ∨.

Example: An example of a Boolean expression is x ∧ ¬(y ∨ z). The subexpression y ∨ z is true whenever either variable y or variable z has the value true, but the subexpression is false whenever both y and z are false. The larger subexpression ¬(y ∨ z) is true exactly when y ∨ z is false, that is, when both y and z are false. If either y or z or both are true, then ¬(y ∨ z) is false. Finally, consider the entire expression. Since it is the logical AND of two subexpressions, it is true exactly when both subexpressions are true. That is, x ∧ ¬(y ∨ z) is true exactly when x is true, y is false, and z is false.

A truth assignment for a given Boolean expression E assigns either true or false to each of the variables mentioned in E. The value of expression E given a truth assignment T, denoted E(T), is the result of evaluating E with each variable x replaced by the value T(x) (true or false) that T assigns to x. A truth assignment T satisfies Boolean expression E if E(T) = 1; i.e., the truth assignment T makes expression E true. A Boolean expression E is said to be satisfiable if there exists at least one truth assignment T that satisfies E.
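The definitions of truth assignment and satisfiability can be illustrated with a small sketch that tries every truth assignment for the example expression x ∧ ¬(y ∨ z). The function names satisfying_assignments and example, and the choice to represent an expression as a Python function of an assignment dictionary, are assumptions made for this illustration. With n variables the loop examines 2^n assignments, so this is a definition check rather than an efficient method.

```python
from itertools import product

def satisfying_assignments(expr, variables):
    """Return every truth assignment T (as a dict) for which expr(T) is true."""
    result = []
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))   # one candidate T
        if expr(assignment):                        # does T satisfy E, i.e. E(T) = 1?
            result.append(assignment)
    return result

def example(t):
    """The expression x AND NOT (y OR z) from the example."""
    return t["x"] and not (t["y"] or t["z"])

def is_satisfiable(expr, variables):
    """E is satisfiable if at least one truth assignment satisfies it."""
    return len(satisfying_assignments(expr, variables)) > 0
```

For the example expression, the only satisfying assignment found is x true, y false, z false, in agreement with the analysis above.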

Representing SAT instances

The symbols in a Boolean expression are ∧, ∨, ¬, the left and right parentheses, and symbols representing variables. The satisfiability of an expression does not depend on the names of the variables, only on whether two occurrences of variables are the same variable or different variables. Thus, we may assume that the variables are x1, x2, …, although in examples we shall continue to use variable names like y or z, as well as x's. We shall also assume that variables are renamed so that we use the lowest possible subscripts for the variables. For instance, we would not use x5 unless we also used x1 through x4 in the same expression.

Since there are an infinite number of symbols that could, in principle, appear in a Boolean expression, we have a familiar problem of having to devise a code with a fixed, finite alphabet to represent expressions with arbitrarily large numbers of variables. Only then can we talk about SAT as a "problem", that is, as a language over a fixed alphabet consisting of the codes for those Boolean expressions that are satisfiable. The code we shall use is as follows.
1. The symbols ∧, ∨, ¬, (, and ) are represented by themselves.
2. The variable xi is represented by the symbol x followed by 0's and 1's that represent i in binary.

Thus, the alphabet for the SAT problem/language has only eight symbols. All instances of SAT are strings in this fixed, finite alphabet.

Example: Consider the expression x ∧ ¬(y ∨ z) from Example 10.6. Our first step in coding it is to replace the variables by subscripted x's. Since there are three variables, we must use x1, x2, and x3. We have freedom regarding which of x, y, and z is replaced by each of the xi's, and to be specific, let x = x1, y = x2, and z = x3. Then the expression becomes x1 ∧ ¬(x2 ∨ x3). The code for this expression is x1 ∧ ¬(x10 ∨ x11).

Notice that the length of a coded Boolean expression is approximately the same as the number of positions in the expression, counting each variable occurrence as 1. The reason for the difference is that if the expression has m positions, it can have O(m) variables, so variables may take O(log m) symbols to code. Thus, an expression whose length is m positions can have a code as long as n = O(m log m) symbols. However, the difference between m and m log m is surely limited by a polynomial. Thus, as long as we only deal with the issue of whether or not a problem can be solved in time that is polynomial in its input length, there is no need to distinguish between the length of an expression's code and the number of positions in the expression itself.

NP-Completeness of the SAT problem

We now prove "Cook's Theorem", the fact that SAT is NP-complete. To prove a problem is NP-complete, we need first to show that it is in NP. Then, we must show that every language in NP reduces to the problem in question. In general, we show the second part by offering a polynomial-time reduction from some other NP-complete problem. But right now, we don't know any NP-complete problems to reduce to SAT. Thus, the only strategy available is to reduce absolutely every problem in NP to SAT.

Explain the Complements of Languages in NP? (or) (10 Marks) Write short notes on the complements of languages in NP?

The class of languages P is closed under complementation. For a simple argument why, let L be in P and let M be a TM for L. Modify M as follows, to accept the complement of L. Introduce a new accepting state q and have the new TM transition to q whenever M halts in a state that is not accepting. Make the former accepting states of M be nonaccepting. Then the modified TM accepts the complement of L, and runs in the same amount of time that M does, with the possible addition of one move. Thus, the complement of L is in P if L is.

It is not known whether NP is closed under complementation. It appears not, however, and in particular we expect that whenever a language L is NP-complete, then its complement is not in NP.

The Class of Languages Co-NP

Co-NP is the set of languages whose complements are in NP. We observed at the beginning of this section that every language in P has its complement also in P, and therefore in NP. On the other hand, we believe that none of the NP-complete problems have their complements in NP, and therefore no NP-complete problem is in co-NP. Likewise, we believe the complements of NP-complete problems, which are by definition in co-NP, are not in NP. However, we should bear in mind that, should P turn out to equal NP, then all three classes are actually the same.

Example: Consider the complement of the language SAT, which is surely a member of co-NP. We shall refer to this complement as USAT (unsatisfiable). The strings in USAT include all those that code Boolean expressions that are not satisfiable. However, also in USAT are those strings that do not code valid Boolean expressions, because surely none of those strings are in SAT. We believe that USAT is not in NP, but there is no proof.

NP-Complete Problems and Co-NP

Let us assume that P ≠ NP. It is still possible that the situation regarding co-NP is not exactly as suggested by the figure, because we could have NP and co-NP equal, but larger than P. That is, we might discover that problems like USAT are in NP, and yet not be able to solve them in deterministic polynomial time. However, the fact that we have not been able to find even one NP-complete problem whose complement is in NP is strong evidence that NP ≠ co-NP, as we prove in the next theorem.

Theorem: NP = co-NP if and only if there is some NP-complete problem whose complement is in NP.

Write short notes on Problems Solvable in Polynomial Space? (or) (10 Marks) Describe the problem solving in polynomial space? (or) Explain polynomial space on Turing machine? (or) Show the relationship of PS and NPS to previously defined classes?

Initially, we shall distinguish between the languages accepted by deterministic and nondeterministic TMs with a polynomial space bound, but we shall soon see that these two classes of languages are the same.

There are complete problems for polynomial space, in the sense that all problems in this class are reducible in polynomial time to such a problem. Thus, if a complete problem is in P or in NP, then all languages with polynomial-space-bounded TMs are in P or NP, respectively. We shall offer one example of such a problem: "quantified Boolean formulas".

Polynomial-Space Turing Machines

A polynomial-space-bounded Turing machine is suggested in the figure. There is some polynomial p(n) such that when given input w of length n, the TM never visits more than p(n) cells of its tape. Define the class of languages PS to include all and only the languages that are L(M) for some polynomial-space-bounded, deterministic Turing machine M. Also, define the class NPS (nondeterministic polynomial space) to consist of those languages that are L(M) for some nondeterministic, polynomial-space-bounded TM M. Evidently PS ⊆ NPS, since every deterministic TM is technically nondeterministic also. However, we shall prove the surprising result that PS = NPS.

Relationship of PS and NPS to Previously Defined Classes

To start, the relationships P ⊆ PS and NP ⊆ NPS should be obvious. The reason is that if a TM makes only a polynomial number of moves, then it uses no more than a polynomial number of cells; in particular, it cannot visit more cells than one plus the number of moves it makes. Once we prove PS = NPS, we shall see that in fact the three classes form a chain of containment: P ⊆ NP ⊆ PS.

An essential property of polynomial-space-bounded TMs is that they can make only an exponential number of moves before they must repeat an ID. We need this fact to prove other interesting facts about PS, and also to show that PS contains only recursive languages; i.e., languages with algorithms. Note that there is nothing in the definition of PS or NPS that requires the TM to halt. It is possible that the TM cycles forever, without leaving a polynomial-sized region of its tape.

Deterministic and Nondeterministic Polynomial Space

Since the comparison between P and NP seems so difficult, it is surprising that the same comparison between PS and NPS is easy: they are the same classes of languages. The proof involves simulating a nondeterministic TM that has a polynomial space bound p(n) by a deterministic TM with polynomial space bound O(p^2(n)).

The heart of the proof is a deterministic, recursive test for whether an NTM N can move from ID I to ID J in at most m moves. A DTM D systematically tries all middle IDs K to check whether I can become K in m/2 moves, and then K can become J in m/2 moves. That is, imagine there is a recursive function reach(I, J, m) that decides if I can become J by at most m moves. Think of the tape of D as a stack, where the arguments of the recursive calls to reach are placed. That is, in one stack frame D holds [I, J, m]. A sketch of the algorithm executed by reach:

    BOOLEAN FUNCTION reach(I, J, m)
    ID: I, J; INT: m;
    BEGIN
        IF (m == 1) THEN /* basis */ BEGIN
            test if I == J or I can become J after one move;
            RETURN TRUE if so, FALSE if not;
        END;
        ELSE /* inductive part */ BEGIN
            FOR each possible ID K DO
                IF (reach(I, K, m/2) AND reach(K, J, m/2)) THEN
                    RETURN TRUE;
            RETURN FALSE;
        END;
    END;

It is important to observe that, although reach calls itself twice, it makes those calls in sequence, and therefore only one of the calls is active at a time. Each active call has a third argument half that of its caller, and so on, until at some point the third argument becomes 1. At that point, reach can apply the basis step, and needs no more recursive calls. It tests if I = J or I can become J in one move, returning TRUE if either holds and FALSE if neither does. The figure suggests what the stack of the DTM D looks like when there are as many active calls to reach as possible, given an initial move count of m.
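The recursive test can be sketched in runnable form as follows. This is an illustration under assumptions: IDs are modeled as plain integers, the one-move relation is given as a dictionary named MOVES, and the names reach, IDS and MOVES are hypothetical. To handle move counts that are not powers of two, the sketch splits m into a rounded-up half and a rounded-down half, a small departure from the m/2 of the pseudocode.

```python
def reach(i, j, m, ids, moves):
    """Can ID i become ID j in at most m moves (m >= 1)?"""
    if m == 1:
        # Basis: i equals j, or j follows from i in a single move.
        return i == j or j in moves[i]
    first, second = (m + 1) // 2, m // 2      # the two halves of the move budget
    for k in ids:                             # try every possible middle ID
        # The two calls run one after the other, so only one is active at a time.
        if reach(i, k, first, ids, moves) and reach(k, j, second, ids, moves):
            return True
    return False

# A toy one-move relation between six IDs (assumed data for the example).
IDS = range(6)
MOVES = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: [], 5: [0]}
```

Here the shortest route from ID 0 to ID 4 takes three moves, so reach(0, 4, 4, IDS, MOVES) succeeds while reach(0, 4, 2, IDS, MOVES) fails. The recursion depth is about log2 m, which is the point of the construction: the running time may be very large, but only a few stack frames exist at once.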

While it may appear that many calls to reach are possible, and the tape of D can become very long, we shall show that it cannot become "too long". That is, if started with a move count of m, there can only be log2 m stack frames on the tape at any one time. Since Theorem 11.3 assures us that the NTM N cannot make more than c^(p(n)) moves, m does not have to start with a number greater than that. Thus, the number of stack frames is at most log2 c^(p(n)), which is O(p(n)). We now have the essentials behind the proof of the following theorem.

Theorem (Savitch's Theorem): PS = NPS.

PROOF: It is obvious that PS ⊆ NPS, since every DTM is technically an NTM as well. Thus, we need only to show that NPS ⊆ PS; that is, if L is accepted by some NTM N with space bound p(n), for some polynomial p(n), then L is also accepted by some DTM D with polynomial space bound q(n), for some other polynomial q(n). In fact, we shall show that q(n) can be chosen to be on the order of the square of p(n).

First, we may assume by Theorem 11.3 that if N accepts, it does so within c^(1+p(n)) steps for some constant c. Given input w of length n, D discovers what N does with input w by repeatedly placing the triple [I0, J, m] on its tape and calling reach with these arguments, where:
1. I0 is the initial ID of N with input w.
2. J is any accepting ID that uses at most p(n) tape cells; the different J's are enumerated systematically by D, using a scratch tape.
3. m = c^(1+p(n)).

We argued above that there will never be more than log2 m recursive calls that are active at the same time: one with third argument m, one with m/2, one with m/4, and so on, down to 1. Thus, there are no more than log2 m stack frames on the stack, and log2 m is O(p(n)).

Further, each stack frame itself takes O(p(n)) space. Each ID requires O(p(n)) cells to write down, and if we write m in binary, it requires log2 c^(1+p(n)) cells, which is O(p(n)). Thus, the entire stack frame, consisting of two IDs and an integer, takes O(p(n)) space.

Since D can have O(p(n)) stack frames at most, the total amount of space used is O(p^2(n)). This amount of space is a polynomial if p(n) is polynomial, so we conclude that L has a DTM that is polynomial-space bounded.

In summary, we can extend what we know about complexity classes to include the polynomial-space classes. The complete diagram is shown in the figure.

End of Unit-5
