Professional Documents
Culture Documents
T (N ) =
T N
+ bN
if N = k > 1 ,
if N = 1,
represents the arithmetic cost of an algorithm which solves the original problem of size
N by combining the results from (recursively) solving problems of size N/. In this
chapter, the cases for = 4 as well as = 2s are considered.
11.1
that is,
(11.1)
N 1
Xr =
r
x N
,
r = 0, 1, . . . , N 1
=0
1
N
4
x4k N
k=0
N
4
1
N
4
r(4k)
x4k+1 N
k=0
N
4
r(4k)
x4k N
+N
k=0
1
N
4
r(4k+1)
x4k+2 N
k=0
1
x4k+1 N
2r
+N
k=0
r(4k+3)
x4k+3 N
k=0
N
4
r(4k)
1
N
4
r(4k+2)
1
N
4
r(4k)
x4k+2 N
k=0
3r
+N
r(4k)
x4k+3 N
k=0
By decimating the time series into four sets, namely the set {yk |yk = x4k , 0 k
N/4 1}, the set {zk |zk = x4k+1 , 0 k N/4 1}, the set {gk |gk = x4k+2 , 0
k N/4 1}, and the set {hk |hk = x4k+3 , 0 k N/4 1}, the four subproblems
4
with period of N/4 can be dened after the appropriate twiddle factor N = N
is
4
identied. The four subproblems are
(11.2)
1
N
4
Yr =
1
N
4
r(4k)
x4k N
k=0
4 rk
x4k N
1
N
4
k=0
yk rk
N ,
4
k=0
r = 0, 1, . . . , N/4 1.
(11.3)
1
N
2
Zr =
1
N
4
r(4k)
x4k+1 N
k=0
4 rk
x4k+1 N
1
N
4
k=0
k=0
zk rk
N ,
r = 0, 1, . . . , N/4 1.
gk rk
N ,
r = 0, 1, . . . , N/4 1.
hk rk
N ,
r = 0, 1, . . . , N/4 1.
(11.4)
1
N
2
Gr =
1
N
4
r(4k)
x4k+2 N
k=0
4 rk
x4k+2 N
1
N
4
k=0
k=0
(11.5)
1
N
2
Hr =
1
N
4
r(4k)
h4k+3 N
k=0
k=0
4 rk
x4k+3 N
1
N
4
k=0
The size of each subproblem is thus N/4, which is equal to the number of input data
points or the number of computed output data points in one period. After these four
subproblems are each (recursively) solved, the solution to the original problem of size
N can be obtained according to (11.1) for r = 0, 1, . . . , N 1. Since the series Yr has
a period of N/4, Yr = Yr+ N = Yr+ N = Yr+ 3N , and the same applies to the series
4
2
4
Zr , Gr , and Hr . The output Xr may be expressed in terms of Yr , Zr , Gr , and Hr for
(11.6)
r+ N
4
(11.7)
Xr+ N = Yr + N
(11.8)
Xr+ N = Yr + N
(11.9)
Xr+ 3N = Yr + N
r+ N
2
3(r+ N
4 )
Gr + N
Hr .
2(r+ N
2 )
3(r+ N
2 )
Gr + N
Hr .
Zr + N
r+ 3N
4
2(r+ N
4 )
Zr + N
2(r+ 3N
4 )
Zr + N
N
3(r+ 3N
4 )
Gr + N
Hr .
3N
4
N
2
4
= j, N2 = (j)2 = 1, and
(11.10)
(11.11)
r
2r
3r
Xr+ N = Yr jN
Zr N
Gr + jN
Hr ,
(11.12)
r
2r
3r
Xr+ N = Yr N
Zr + N
Gr N
Hr ,
r
2r
3r
Xr+ 3N = Yr + jN
Zr N
Gr jN
Hr ,
(11.13)
where r = 0, 1, . . . , N/4 1.
If the radix-4 algorithm is implemented based on equations (11.10), (11.11), (11.12),
and (11.13), one step of the radix-4 algorithm will require more arithmetic operations
than two steps of the radix-2 algorithm, because some partial results were computed
more than once. However, if such partial results can be identied and computed only
once, one step of the radix-4 algorithm can require fewer arithmetic operations than
two steps of the radix-2 algorithm, and the total cost of the radix-4 algorithm can be
lower than the radix-2 algorithm. The four recurrent partial results are shown below
inside each pair of parentheses.
r
2r
3r
Xr = Yr + N
Gr + N
Zr + N
Hr ,
r
2r
3r
Xr+ N = Yr N
Gr j N
Zr N
Hr ,
4
r
3r
2r
Xr+ N = N
Zr + N
Hr + Yr + N
Gr ,
2
r
3r
2r
Xr+ 3N = j N
Zr N
Hr + Yr N
Gr ,
(11.14)
(11.15)
(11.16)
(11.17)
11.1.1
2r
To determine the arithmetic cost of the radix-4 FFT algorithm, observe that N
Gr ,
r
3r
N Zr and N Hr need to be computed before the four partial sums can be obtained.
Since the size of each subproblem is N/4, 3N/4 complex multiplications and N complex
additions are performed during the rst stage of buttery computation. The second
stage of buttery computation involves no multiplication by the twiddle factors, so
only N complex additions are needed. Thus, 3N/4 complex multiplications and 2N
complex additions are required to implement the buttery computation in Figure 11.1.
Recall again that the arithmetic cost of computer algorithms is measured by the
number of real arithmetic operations, and that one complex addition incurs two real
additions according to (2.1), and one complex multiplication (with pre-computed intermediate results involving the real and imaginary parts of a twiddle factor) incurs three
real multiplications and three real additions according to (2.4). Accordingly, 9N/4 real
multiplications and 25N/4 real additions are required per step of radix-4 DIT FFT.
Thus, one step of the radix-4 DIT FFT algorithm requires 17N/2 ops in total.
Since the objective of developing the radix-4 algorithm is to minimize the essential
real operations, a careful analysis of the cost should exclude the trivial multiplication
0
by N
= 1 and 4 = (j) = 1 or j, since they will certainly not be done in an
ecient implementation. Furthermore, note thatthe cost of multiplication by a twiddle
factor which is an odd power of 8 = (1 j)/ 2 is less than the cost of a complex
multiplication because
82+1 = 4 8 = (j)
(11.18)
1j
1j
or
1+j
2r
r
3r
These special factors are identied from the computation of N
Gr , N
Zr , and N
Hr
N
for r = 0, 1, . . . , 4 1 below.
Table
Table 11.1
11.1 Special
Specialcases
cases of
of twiddle-factor
twiddle-factor multiplication
multiplication in
in the
the radix-4
radix-4 algorithm.
algorithm.
W=G
N
P-
1 x G,. = Gr
zp = ZP
WLZP
1x
W3rH
N
1 x H, = H,
wdGr =
-jGr
W-L
W8Zr
Will,
w;G
-
Thus, there are eight special cases: the four cases involving multiplication by 1 and
j are trivial, and the other four cases involving the multiplication by an odd power
of 8 are to be treated specially. The total nontrivial complex multiplications thus is
reduced to 34 N 8. Since only 4 ops are needed to compute each of 8 Gr , 83 Gr ,
special twiddle-factor
r=O
o<r<+&-1
Ir .G
r=
wN/4
1 x G, = G,
w;;,4i 2,
lx&
W$,4iH,
1 x HP = H,
= 2,
f -&
( 14
w4Gr = -jG,
multiplications
r=
in the radix-4
-g
waGr
r=
algo-
g -&
( ).A
w%G
W8.G
l&f,
(11.21)
The derivation above conrms similar results given in [70, 1981] and [46, 1996].
Therefore, compared to the arithmetic cost of T (N ) = 5N log2 N of the radix-2
algorithm in (3.10), the saving by the radix-4 algorithm is 15 percent. It will be shown
in Chapter 12 that the split-radix algorithm can further reduce the arithmetic cost to
T (N ) = 4N log2 N + (N ), which represents a saving of 25 percent compared to the
radix-2 algorithm.
11.2
A radix-4 DIF FFT algorithm can be derived from recursively decimating the frequency
series into four subsets, i.e., the set denoted by Yk = X4k for 0 k N/4 1, the
set denoted by Zk = X4k+1 for 0 k N/4 1, the set denoted by Gk = X4k+2 for
(11.22)
N 1
Xr =
r = 0, 1, . . . , N 1 ,
r
x N
,
=0
1
1
N
4
N
2
r
x N +
=0
= N
4
1
N
4
r
x N +
r (+ N
4 )
x+ N N
= 3N
4
1
x N +
x+ N N
x+ N N +
4
42r
r (+ 3N
4 )
x+ 3N N
4
=0
1
1
N
4
r
=0
1
N
4
r (+ N
2 )
=0
1
4r
r
x N
N
4
N
4
r
=0
N
4
= N
2
=0
N
4
N 1
r
x N
+
=0
1
x N +
1
N
4
1
3N
4
r
N
4
r
x+ N N +
2
43r
r
x+ 3N N
4
=0
=0
r
x + x+ N 4r + x+ N 42r + x+ 3N 43r N
.
4
=0
N
4
Yk = X4k =
4k
x + x+ N 44k + x+ N 424k + x+ 3N 434k N
4
=0
1
N
4
x + x+ N + x+ N + x+ 3N k
N
4
=0
(11.23)
N
4
x + x+ N
2
+ x+ N + x+ 3N
k
N
4
=0
1
N
4
y k
N ,
4
=0
k = 0, 1, . . . , N/4 1.
(11.24)
1
N
4
Zk = X4k+1 =
2(4k+1)
3(4k+1)
+ x+ 3N 4
4
=0
1
N
4
x x+ N
2
j x+ N x+ 3N
N
N k
4
=0
1
N
4
=0
z k
N ,
4
k = 0, 1, . . . , N/4 1.
(4k+1)
(11.25)
1
N
4
Gk = X4k+2 =
2(4k+2)
3(4k+2)
(4k+2)
+ x+ 3N 4
4
=0
1
N
4
x + x+ N
2
2
x+ N + x+ 3N
N
N k
4
=0
1
N
4
g k
N ,
4
=0
k = 0, 1, . . . , N/4 1 .
(11.26)
1
N
4
Hk = X4k+3 =
2(4k+3)
3(4k+3)
+ x+ 3N 4
4
(4k+3)
=0
1
N
4
x x+ N
2
3
+ j x+ N x+ 3N
N
N k
4
=0
1
N
4
=0
h k
N ,
4
k = 0, 1, . . . , N/4 1.
To form these four subproblems using two stages of buttery computation, the partial
sums identied above are rst rearranged to facilitate the buttery computation as
shown below.
(11.27)
(11.28)
(11.29)
(11.30)
N
y = x + x+ N + x+ N + x+ 3N , 0
1.
2
4
4
4
N
z = x x+ N j x+ N x+ 3N
N
1.
, 0
2
4
4
4
N
2
g = x+ N + x+ 3N + x + x+ N
N
, 0
1.
4
4
2
4
N
3
h = j x+ N x+ 3N + x x+ N
N
, 0
1.
4
4
2
4
The computation represented by (11.27), (11.28), (11.29), and (11.30) can now be
represented by the two stages of buttery computation in Figure 11.2.
Figure 11.2 The radix-4 DIF FFT butteries.
11.3
The techniques used to develop the radix-2 and the radix-4 FFT algorithms can be
generalized to develop the entire class of radix-2s FFTs. Setting q = 2s , a radix-q DIT
FFT algorithm may be developed from decomposing (3.1) into q partial sums:
N 1
Xr =
r = 0, 1, . . . , N 1 ,
r
x N
,
=0
N
q1
q
r(qk+u)
xqk+u N
u=0 k=0
(11.31)
q1
N
q
ur
u=0
q1
u=0
r(qk)
xqk+u N
k=0
N
q
ur
rk
q
xqk+u (N
)
k=0
q1
q 1
ur
.
=
N
xqk+u rk
N
u=0
k=0
Observe that (3.3) and (11.1) are special cases of the equation above when q = 2 and
q = 4. According to (11.31), the time series can be decimated into q = 2s sets so that
Nq 1
each of the q partial sums represented by k=0
xqk+u rk
N for u = 0, 1, . . . , q 1, can
q
be recursively computed independent of each other. Each partial sum represents the
DFT of a subproblem of size N/q.
The output frequencies are computed as q separate segments, and each segment denoted by X+ N has N/q consecutive elements indexed by , where 0 N/q1 and
q
N
u N
q1
q 1
N
k
+
(
)
q
u
X+ N =
N
xqk+u N
Nq
q
u=0
(11.32)
=
q1
u=0
k=0
q 1
u u
,
N
q
xqk+u k
N
k=0
= 0, 1, . . . , N/q 1.
Of course, to minimize the arithmetic cost, the computation of the q frequency segments
should be reorganized to avoid redundant computation as demonstrated earlier in the
derivation of the radix-4 DIT FFT algorithm. Observe also that by substituting q =
2, = 0, 1 in (11.32), one obtains (3.7) and (3.8) for computing the two frequency
segments in the radix-2 DIT FFT algorithm; on the other hand, by substituting q = 4,
= 0, 1, 2, 3 in (11.32), one obtains the four equations (11.6), (11.7), (11.8), and (11.9)
for computing the four frequency segments in the radix-4 DIT FFT algorithm.
To develop a radix-q DIF FFT algorithm, one would simply decimate the frequency
series Xr into q sets with each set containing {Yk (u)|Yk (u) = Xqk+u , 0 k N/q 1}
N 1
Xr =
r = 0, 1, . . . , N 1 ,
r
x N
,
=0
N
q1
q
r (+ N
q )
x+ N N
q
=0 =0
N
q
(11.33)
q1
r (+ N
q )
x+ N N
q
=0 =0
N
q
q1
r N
q
x+ N N
q
=0 =0
N
q
1 q1
x+ N
q
=0
r
N
qr
r
N
.
=0
=0
N
q
(11.34)
1 q1
x+ N q(qk+u)
q
=0
N
q
=0
=0
u
N
k
N ,
q
=0
y (u) k
N ,
q
k = 0, 1, . . . , N/q 1 .
(u)
Note that the N/q input data points to each subproblems are labelled by y for
u = 0, 1, . . . , q 1. To show that the radix-2 and radix-4 DIF FFT are special cases
when q = 2 and q = 4, the generalized formulae for forming each of the q = 2s 2
subproblems are explicitly identied from (11.34) and it is displayed once again below.
(0)
(1)
Observe that when q = 2, y in (3.13) and z in (3.15) correspond to y and y in
the generalized formula; when q = 4, y , z , g and h in equations (11.23) to (11.26)
(0)
(1)
(2)
(3)
correspond to y , y , y and y in this generalized formula.
q1
u
y (u) =
(11.35)
x+ N q(qk+u) N
, = 0, 1, . . . , N/q 1 .
q
=0
Since it is known that a radix-q FFT for q = 2s > 4 is less ecient than the probably optimal split-radix algorithm which recursively applies both radix-2 and radix-4
algorithms to solve each subproblem [86], further details on higher radix algorithms are
omitted here, and readers are referred to [5, 84] for details about the popular radix-8
and radix-16 FFT algorithms.