
Chapter 24

Computing and Distributing Twiddle Factors in the Parallel FFTs
It was assumed in the previous chapters that all $\omega_N^r$ values, which are
commonly referred to as twiddle factors, are pre-computed and available to the
computer programs implementing the various transform algorithms. One could argue
that this is a reasonable assumption to make, since FFT codes are usually applied
sequentially to large numbers of vectors, and thus pre-computation of the twiddle
factors is an efficient strategy. The argument appears to be valid for single-processor
machines and for shared-memory multiprocessors, since access to the twiddle factors
is straightforward. As pointed out by [23], this is also an efficient strategy for
distributed-memory machines if multiple transforms are performed by each processor
all at once, which is certainly the case in computing multiple 1D FFTs or 2D FFTs
as discussed in Chapter 23.

However, for transforming a single large vector on a distributed-memory machine,
the best strategy is not at all obvious. If every processor has a copy of all the twiddle
factors, then they may consume more memory than the data being transformed,
a somewhat incongruous circumstance. If this represents an unacceptable amount of
storage, then the FFT algorithm must arrange that the (pre-computed) twiddle factors
are conveyed among the processors so that they are available when needed. A final
option is to compute the twiddle factors on the fly as required, which relieves the
storage and communication burdens at the expense of additional computation.
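The storage side of this trade-off can be made concrete with a minimal Python sketch. The function names `twiddle_table`, `twiddle_on_the_fly`, and `storage_ratio` are hypothetical, and the convention $\omega_N = e^{-2\pi j/N}$ is assumed here purely for illustration. The sketch contrasts a full pre-computed table of N/2 twiddle factors with computing one factor on demand, and compares the table size with the N/p data elements held by each of p processors.

```python
import cmath

def twiddle_table(N):
    """Pre-compute all N/2 twiddle factors w_N^r, r = 0, ..., N/2 - 1.

    Assumed convention (not taken from the text): w_N = exp(-2*pi*j/N).
    Storage cost: N/2 complex numbers per processor holding the table.
    """
    return [cmath.exp(-2j * cmath.pi * r / N) for r in range(N // 2)]

def twiddle_on_the_fly(N, r):
    """Compute a single twiddle factor w_N^r when it is needed.

    No table is stored, at the price of one complex exponential per call.
    """
    return cmath.exp(-2j * cmath.pi * r / N)

def storage_ratio(N, p):
    """Size of a full twiddle table (N/2) relative to the local data (N/p)."""
    return (N // 2) / (N // p)

N, p = 32, 4
table = twiddle_table(N)
ratio = storage_ratio(N, p)
```

Under these assumptions the ratio equals p/2, so for any p greater than 2 a full copy of the table on every processor is larger than the local share of the data.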
The choice of strategy depends on a number of factors: the relative speeds of
communication and computation, the amount of memory available compared to the
size of the problem being solved, and the algorithm itself. Some specific strategies were
considered and compared under special circumstances in [23, 51, 98, 104], but they do
not seem to generalize because the distribution of twiddle factors can be drastically
different for different algorithms, as discussed in the following two sections.

2000 by CRC Press LLC

24.1 Twiddle Factors for Parallel FFT Without Inter-Processor Permutations

Since the twiddle factors are completely dictated by the data elements involved in
each butterfly computation, the twiddle factors required by each processor are easily
identified by the data it owns, which are different for different mappings. Assuming
that naturally ordered input data are transformed by DIF FFT without inter-processor
permutations, an example using a consecutive block mapping is given in Figure 24.1,
and another example using a cyclic mapping is given in Figure 24.2.
Observe that in either case, one processor may need to use more twiddle factors
than another. For comparison, the distribution of the twiddle factors is tabulated
in Table 24.1 for the consecutive block map in Figure 24.1, and in Table 24.2 for the
cyclic map in Figure 24.2. Apparently, in the former case, the twiddle factors are not
evenly distributed among the processors, whereas in the latter case, a more balanced
(but still not fully balanced) distribution results from using the cyclic data map in
parallelizing the DIF FFT algorithm.
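The dependence of the twiddle factor distribution on the data mapping can be explored with a short Python sketch. It rests on several assumptions that are not stated in the text: N = 32 and p = 4 (five stages, four processors), the processor is identified by the top two bits of the global index for the block map and by the bottom two bits for the cyclic map, and in each butterfly the twiddle factor is applied by the processor owning the element whose pivot bit is 1. The function names are hypothetical.

```python
def dif_exponent(m, k, n):
    """Exponent r of the twiddle factor w_N^r at DIF stage k (1-based)
    for the element with n-bit global index m: drop the k leading bits
    of m and append k - 1 zero bits."""
    return (m & ((1 << (n - k)) - 1)) << (k - 1)

def twiddles_per_processor(n, d, mapping):
    """Return needs[proc][stage - 1] = sorted list of exponents.

    N = 2**n points, p = 2**d processors, no inter-processor permutations.
    mapping: "block"  -> processor = top d bits of m,
             "cyclic" -> processor = bottom d bits of m.
    Assumption: at stage k the twiddle is applied by the owner of the
    element whose pivot bit (bit n - k of m) equals 1.
    """
    N, p = 1 << n, 1 << d
    needs = [[set() for _ in range(n)] for _ in range(p)]
    for m in range(N):
        proc = m >> (n - d) if mapping == "block" else m & (p - 1)
        for k in range(1, n + 1):
            if (m >> (n - k)) & 1:
                needs[proc][k - 1].add(dif_exponent(m, k, n))
    return [[sorted(s) for s in row] for row in needs]

block = twiddles_per_processor(5, 2, "block")
cyclic = twiddles_per_processor(5, 2, "cyclic")

# Number of (stage, twiddle factor) pairs each processor has to supply.
block_counts = [sum(len(s) for s in row) for row in block]
cyclic_counts = [sum(len(s) for s in row) for row in cyclic]
```

Under these assumptions `block_counts` is `[7, 15, 15, 23]` and `cyclic_counts` is `[7, 8, 8, 9]`, which illustrates why the cyclic map gives a more balanced, though still not fully balanced, distribution.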


Figure 24.1 DIFNR FFT twiddle factors required if a consecutive block map is used.


Table 24.1 DIFNR FFT twiddle factors required by each processor in Figure 24.1.

Twiddle factor form per stage: Stage 1: $\omega_N^{i_3 i_2 i_1 i_0}$; Stage 2: $\omega_N^{i_2 i_1 i_0 0}$; Stage 3: $\omega_N^{i_1 i_0 00}$; Stage 4: $\omega_N^{i_0 000}$; Stage 5: $\omega_N^0$.
Each entry lists the exponents r of the twiddle factors $\omega_N^r$ required at that stage.

Processor   Stage 1                        Stage 2                     Stage 3       Stage 4   Stage 5
P0          none                           none                        0, 4, 8, 12   0, 8      0
P1          none                           0, 2, 4, 6, 8, 10, 12, 14   0, 4, 8, 12   0, 8      0
P2          0, 1, 2, 3, 4, 5, 6, 7         none                        0, 4, 8, 12   0, 8      0
P3          8, 9, 10, 11, 12, 13, 14, 15   0, 2, 4, 6, 8, 10, 12, 14   0, 4, 8, 12   0, 8      0

Figure 24.2 DIFNR FFT twiddle factors required if a cyclic map is used.


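Table 24.2 DIFNR FFT twiddle factors required by each processor in Figure 24.2.

Twiddle factor form per stage: Stage 1: $\omega_N^{i_3 i_2 i_1 i_0}$; Stage 2: $\omega_N^{i_2 i_1 i_0 0}$; Stage 3: $\omega_N^{i_1 i_0 00}$; Stage 4: $\omega_N^{i_0 000}$; Stage 5: $\omega_N^0$.
Each entry lists the exponents r of the twiddle factors $\omega_N^r$ required at that stage.

Processor   Stage 1        Stage 2   Stage 3   Stage 4   Stage 5
P0          0, 4, 8, 12    0, 8      0         none      none
P1          1, 5, 9, 13    2, 10     4         none      0
P2          2, 6, 10, 14   4, 12     8         0         none
P3          3, 7, 11, 15   6, 14     12        8         0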


24.2 Twiddle Factors for Parallel FFT With Inter-Processor Permutations

Referring to Figures 20.1, 20.2, 20.3, 20.4, 20.5, and 20.6 in Chapter 20 on parallel
FFTs with inter-processor permutations, one can tabulate the twiddle factors
$\omega_N^{i_3 i_2 i_1 i_0}$, $\omega_N^{i_2 i_1 i_0 0}$, $\omega_N^{i_1 i_0 00}$,
$\omega_N^{i_0 000}$, and $\omega_N^0$ (inferred from global $m = i_4 i_3 i_2 i_1 i_0$) required
by each processor as shown in Table 24.3 below. (Note that p = 4 and N = 32 in the
example.) Again, it is assumed that a DIFNR FFT is used. Observe that in this case,
each processor needs to compute almost all N/2 twiddle factors either in advance or
on the fly (to save storage).
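The five exponent patterns listed above follow a single rule, which may be written compactly as below. This is a sketch in notation introduced here for illustration only: $n = \log_2 N$ is the number of bits in the global index m, k is the stage number, and $e_k(m)$ denotes the exponent of the twiddle factor associated with m at stage k.

```latex
\[
  e_k(m) \;=\; 2^{\,k-1}\,\bigl(m \bmod 2^{\,n-k}\bigr),
  \qquad k = 1, 2, \ldots, n .
\]
% Worked example with N = 32 (n = 5) and m = 22 = (10110)_2,
% i.e. i_4 i_3 i_2 i_1 i_0 = 1 0 1 1 0:
%   e_1 = (0110)_2 = 6,   e_2 = (1100)_2 = 12,  e_3 = (1000)_2 = 8,
%   e_4 = (0000)_2 = 0,   e_5 = 0.
```

In words, stage k drops the k leading bits of m and appends k - 1 zero bits, which reproduces the patterns $i_3 i_2 i_1 i_0$, $i_2 i_1 i_0 0$, $i_1 i_0 00$, $i_0 000$, and 0 for k = 1, ..., 5.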
Table 24.3 DIFNR FFT twiddle factors required by each processor in Figures 20.1,
20.2, 20.3, 20.4, 20.5, and 20.6. (p = 4 and N = 32)
[Table entries not recoverable. Columns: Stage 1 ($\omega_N^{i_3 i_2 i_1 i_0}$), Stage 2 ($\omega_N^{i_2 i_1 i_0 0}$), Stage 3 ($\omega_N^{i_1 i_0 00}$), Stage 4 ($\omega_N^{i_0 000}$), Stage 5 ($\omega_N^0$); rows: processors P0 to P3.]
