Professional Documents
Culture Documents
McCLUR, Editor
CR CATEGORIES:
The Algorithm
The specific implementation of this algorithm is a compiler that translates a regular expression into I B M 7094
code. The compiled code, along with certain runtime
routines, accepts the text to be searched as input and
finds all substrings in the text that match the regular
expression. The compiling phase of the implemention does
not detract from the overall speed since any search routine
must translate the input regular expression into some
sort of machine accessible form.
Volume 11 / Number 6 / June, 1968
In the compiled code, the lists mentioned in the algorithm are not characters, but transfer instructions into
the compiled code. The execution is extremely fast since
a transfer to the top of the current list automatically
searches for all possible sequel characters in the regular
expression.
This compile-search algorithm is incorporated as the
context search in a time-sharing text editor. This is by
no means the only use of such a search routine. For
example, a variant of this algorithm is used as the symbol
table search in an assembler.
I t is assumed that the reader is familiar with regular
expressions [2] and the machine language of the I B M 7094
computer [3].
The Compiler
419
NNODE onto the existing code to produce the final regular expression in the only stack entry. (See Figure 5.)
s(o)
b~c
FIG. 2
[blc)~
FiG. 3
s(o)-
j
a-(bic)*
FIG. 4
Communications of t i l e A C M
FIG. 5
a-(b c}~-d
FIG. 1
end
6 / J u n e , 1968
Runtime
TRA
TXL
TXH
TSX
TRA
TXL
TXH
TSX
TRA
TXL
TXH
TSX
TRA
TSX
TRA
TRA
TSX
CODE+i
FAIL,1,-'a w--1
FAIL,1,-Wa t
NNODE,4
CODE+16
FAIL,l, - Ib' - 1
FAIL,1,-'b I
NNODE,4
CODE+16
FAIL,1,-'cT-I
FAIL,l, - ~cI
NNODE,4
CODE+16
CNODE,4
CODE+9
CODE+5
CNODE,4
TRA
CODE+i3
TRA
TXL
TXH
TSX
TRA
CODE+19
FAIL,1,--'d'--i
FAIL,1,--'d'
NNODE,4
FOUND
0
1
2
3
4
5
6
7
$
9
10
11
12
13
14
15
16
17
18
19
20
21
22
eof
**,7
CLIST,7
CLIST+i,7
PCA ,4
ACL TSXCMD
SLW CLIST,7
TXI
SCA
*+1,7,-1
CNODE,7
TRA 2,4
TSX
1,2
TXI
SCA
,+1,7,-1
NNODE,7
NLIST COUNT
PLACE NEW TSX **,2
INSTRUCTION
INCREMENT NLIST
COUNT
*
.d
CLIST COUNT
MOVE TRA XCHG
INSTRUCTION
INSERT NEW TSX **,2
INSTRUCTION
INCREMENT CLIST
COUNT
RETURN
CONSTANT, NOT
EXECUTED
TRA 1,2
V o l u m e 11 / N u m b e r 6 / J u n e , 1968
**,7
,4
TSXCMD
NLIST,7
TSXCMD
AXC
PCA
ACL
SLW
TRA 1,2
Routines
AXC
CAL
SLW
NNODE
X1
X2
LAC
AXC
TXL
TXI
CAL
SLW
TXI
CLA
SLW
NNODE,7
0,6
X2,7,0
*+1,7,1
NLIST,7
CLIST,6
X1,6,-1
TRACMD
CLIST,6
SCA
CNODE,6
SCA
NNODE,0
TSX GETCHA,4
PAC ,1
TSX CODE,2
TRA CLIST
TRACMD TRA XCHG
SCA NNODE,0
TRA XCHG
421
the code in order to obtain complete results. F O U N D depends upon the use of the search routine and is therefore
not discussed in detail.
The integer procedure G E T C H A (called from X C H G )
obtains the next character from the text to be searched.
This character is right adjusted in the accumulator.
G E T C H A must also recognize the end of the text and
terminate the search.
Notes
Code compiled for a** will go into a loop due to the
closure operator on an operand containing the null regular
expression, k. ']?here are two ways out of this problem. T h e
first is to not allow such an expression to get t h r o u g h the
s y n t a x sieve. I n m o s t practical applications, this would
not be a serious restriction. T h e second w a y out is to
recognize l a m b d a separately in operands and r e m e m b e r
the C O D E location of the recognition of lambda. This
means t h a t a , is compiled as a search for h l a a , . I f the
Connmunications of t h e ACM
if lambda[lc--1] ~ 0 t h e n
code[lambda[le--1]] := instruction('tra', value('fail'), O, 0);
lambda[lc-1] := pc+5;
pc := pc+6;
go to advance;
or: