Fast Fourier Transforms (6x9 Version) by C. Sidney Burrus - HTML preview

PLEASE NOTE: This is an HTML preview only and some elements such as links or page numbers may be incorrect.
Download the book in PDF, ePub for a complete version.

Chapter 15

Conclusions: Fast Fourier

Transforms1

This book has developed a class of efficient algorithms based on

index mapping and polynomial algebra. This provides a frame-

work from which the Cooley-Tukey FFT, the split-radix FFT, the

PFA, and WFTA can be derived. Even the programs implementing

these algorithms can have a similar structure. Winograd’s theorems

were presented and shown to be very powerful in both deriving al-

gorithms and in evaluating them. The simple radix-2 FFT provides

a compact, elegant means for efficiently calculating the DFT. If

some elaboration is allowed, significant improvement can be had

from the split-radix FFT, the radix-4 FFT or the PFA. If multipli-

cations are expensive, the WFTA requires the least of all.

Several method for transforming real data were described that are

more efficient than directly using a complex FFT. A complex FFT

can be used for real data by artificially creating a complex input

from two sections of real input. An alternative and slightly more

efficient method is to construct a special FFT that utilizes the sym-

metries at each stage.

1This content is available online at <http://cnx.org/content/m16340/1.7/>.

Available for free at Connexions

<http://cnx.org/content/col10683/1.5>

231

CHAPTER 15. CONCLUSIONS: FAST

232

FOURIER TRANSFORMS

As computers move to multiprocessors and multicore, writing and

maintaining efficient programs becomes more and more difficult.

The highly structured form of FFTs allows automatic generation of

very efficient programs that are tailored specifically to a particular

DSP or computer architecture.

For high-speed convolution, the traditional use of the FFT or PFA

with blocking is probably the fastest method although rectangular

transforms, distributed arithmetic, or number theoretic transforms

may have a future with special VLSI hardware.

The ideas presented in these notes can also be applied to the cal-

culation of the discrete Hartley transform [355], [112], the discrete

cosine transform [119], [395], and to number theoretic transforms

[32], [239], [267].

There are many areas for future research. The relationship of

hardware to algorithms, the proper use of multiple processors,

the proper design and use of array processors and vector proces-

sors are all open. There are still many unanswered questions in

multi-dimensional algorithms where a simple extension of one-

dimensional methods will not suffice.

Available for free at Connexions

<http://cnx.org/content/col10683/1.5>

Appendix 1: FFT

Flowgraphs1

16.1 Signal Flow Graphs of Cooley-Tukey

FFTs

The following four figures are flow graphs for Radix-2 Cooley-

Tukey FFTs. The first is a length-16, decimation-in-frequency

Radix-2 FFT with the input data in order and output data scram-

bled. The first stage has 8 length-2 "butterflies" (which overlap in the figure) followed by 8 multiplications by powers of W which

are called "twiddle factors". The second stage has 2 length-8 FFTs

which are each calculated by 4 butterflies followed by 4 multiplies.

The third stage has 4 length-4 FFTs, each calculated by 2 butterflies

followed by 2 multiplies and the last stage is simply 8 butterflies

followed by trivial multiplies by one. This flow graph should be

compared with the index map in Polynomial Description of Signals

(Chapter 4), the polynomial decomposition in The DFT as Convo-

lution or Filtering (Chapter 5), and the program in Appendix 3. In

the program, the butterflies and twiddle factor multiplications are

done together in the inner most loop. The outer most loop indexes

through the stages. If the length of the FFT is a power of two, the

number of stages is that power (log N).

1This content is available online at <http://cnx.org/content/m16352/1.11/>.

Available for free at Connexions

<http://cnx.org/content/col10683/1.5>

233

234

APPENDIX

The second figure below is a length-16, decimation-in-time FFT

with the input data scrambled and output data in order. The first

stage has 8 length-2 "butterflies" followed by 8 twiddle factors

multiplications. The second stage has 4 length-4 FFTs which are

each calculated by 2 butterflies followed by 2 multiplies. The third

stage has 2 length-8 FFTs, each calculated by 4 butterflies followed

by 8 multiplies and the last stage is simply 8 length-2 butterflies.

This flow graph should be compared with the index map in Poly-

nomial Description of Signals (Chapter 4), the polynomial decom-

position in The DFT as Convolution or Filtering (Chapter 5), and

the program in Appendix 3 (Chapter 18). Here, the FFT must be

preceded by a scrambler.

The third and fourth figures below are a length-16 decimation-in-

frequency and a decimation-in-time but, in contrast to the figures

above, the DIF has the output in order which requires a scrambled

input and the DIT has the input in order which requires the output

be unscrambled. Compare with the first two figures. Note the order

of the twiddle factors. The number of additions and multiplications

in all four flow graphs is the same and the structure of the three-

loop program which executes the flow graph is the same.

Available for free at Connexions

<http://cnx.org/content/col10683/1.5>

APPENDIX

235

x(0)

X (0)

x(1)

X (8)

x(2)

W 0

X (4)

x(3)

W 4

X (12)

x(4)

W 0

X (2)

x(5)

W 2

X (10)

x(6)

W 4

W 0

X (6)

x(7)

W 6

W 4

X (14)

x(8)

W 0

X (1)

x(9)

W 1

X (9)

x(10)

W 2

W 0

X (5)

x(11)

W 3

W 4

X (13)

x(12)

W 4

W 0

X (3)

x(13)

W 5

W 2

X (11)

x(14)

W 6

W 4

W 0

X (7)

x(15)

W 7

W 6

W 4

X (15)

Figure 16.1: Length-16, Decimation-in-Frequency, In-order

input, Radix-2 FFT

Available for free at Connexions

<http://cnx.org/content/col10683/1.5>

236

APPENDIX

x(0)

X (0)

x(8)

X (1)

x(4)

W 0

X (2)

x(12)

W 4

X (3)

x(2)

W 0

X (4)

x(10)

W 2

X (5)

x(6)

W 0

W 4

X (6)

x(14)

W 4

W 6

X (7)

x(1)

W 0

X (8)

x(9)

W 1

X (9)

x(5)

W 0

W 2

X (10

x(13)

W 4

W 3

X (11)

x(3)

W 0

W 4

X (12)

x(11)

W 2

W 5

X (13)

x(7)

W 0

W 4

W 6

X (14)

x(15)

W 4

W 6

W 7

X (15)

Figure 16.2: Length-16, Decimation-in-Time, In-order out-

put, Radix-2 FFT

Available for free at Connexions

<http://cnx.org/content/col10683/1.5>

APPENDIX

237

x(0)

X (0)

x(8)

W 0

X (1)

x(4)

W 0

X (2)

x(12)

W 4

W 0

X (3)

x(2)

W 0

X (4)

x(10)

W 2

W 0

X (5)

x(6)

W 4

W 0

X (6)

x(14)

W 6

W 4

W 0

X (7)

x(1)

X (8)

x(9)

W 1

X (9)

x(5)

W 2

X (10

x(13)

W 5

W 2

X (11)

x(3)

W 4

X (12)

x(11)

W 3

W 4

X (13)

x(7)

W 6

W 4

X (14)

x(15)

W 7

W 6

W 4

X (15)

Figure 16.3: Length-16, alternate Decimation-in-Frequency,

In-order output, Radix-2 FFT

Available for free at Connexions

<http://cnx.org/content/col10683/1.5>

238

APPENDIX

x(0)

X (0)

x(1)

W 0

X (8)

x(2)

W 0

X (4)

x(3)

W 0

W 4

X (12)

x(4)

W 0

X (2)

x(5)

W 0

W 2

X (10)

x(6)

W 0

W 4

X (6)

x(7)

W 0

W 4

W 6

X (14)

x(8)

X (1)

x(9)

W 1

X (9)

x(10)

W 2

X (5)

x(11)

W 2

W 5

X (13)

x(12)

W 4

X (3)

x(13)

W 4

W 3

X (11)

x(14)

W 4

W 6

X (7)

x(15)

W 4

W 6

W 7

X (15)

Figure 16.4: Length-16, alternate Decimation-in-Time, In-

order input, Radix-2 FFT

The following is a length-16, decimation-in-frequency Radix-4

FFT with the input data in order and output data scrambled. There

are two stages with the first stage having 4 length-4 "butterflies"

followed by 12 multiplications by powers of W which are called

"twiddle factors. The second stage has 4 length-4 FFTs which are

each calculated by 4 butterflies followed by 4 multiplies. Note,

each stage here looks like two stages but it is one and there is only

one place where twiddle factor multiplications appear. This flow

graph should be compared with the index map in Polynomial De-

scription of Signals (Chapter 4), the polynomial decomposition in

The DFT as Convolution or Filtering (Chapter 5), and the program

in Appendix 3 (Chapter 18). Log to the base 4 of 16 is 2. The

total number of twiddle factor multiplication here is 12 compared

Available for free at Connexions

<http://cnx.org/content/col10683/1.5>

APPENDIX

239

to 24 for the radix-2. The unscrambler is a base-four reverse order

counter rather than a bit reverse counter, however, a modification

of the radix four butterflies will allow a bit reverse counter to be

used with the radix-4 FFT as with the radix-2.

x(0)

X (0)

x(1)

X (8)

x(2)

X (4)

- j

x(3)

X (12)

W 0

x(4)

X (2)

W 2

x(5)

X (10)

x(6)

W 4

X (6)

- j

x(7)

W 6

X (14)

W 0

x(8)

X (1)

W 1

x(9)

X (9)

W 2

x(10)

X (5)

W 3

- j

x(11)

X (13)

- j

W 4

x(12)

X (3)

- j

W 5

x(13)

X (11)

- j

W 6

x(14)

X (7)

- j

W 7

- j

x(15)

X (15)

Figure 16.5: Length-16, Decimation-in-Frequency, In-order

input, Radix-4 FFT

The following two flowgraphs are length-16, decimation-in-

frequency Split Radix FFTs with the input data in order and output

data scrambled. Because the "butterflies" are L shaped, the stages

do not progress uniformly like the Radix-2 or 4. These two fig-

ures are the same with the first drawn in a way to compare with

the Radix-2 and 4, and the second to illustrate the L shaped butter-

flies. These flow graphs should be compared with the index map

in Polynomial Description of Signals (Chapter 4) and the program

Available for free at Connexions

<http://cnx.org/content/col10683/1.5>

240

APPENDIX

in Appendix 3 (Chapter 18). Because of the non-uniform stages,

the program indexing is more complicated. Although the num-

ber of twiddle factor multiplications is 12 as was the radix-4 case,

for longer lengths, the split-radix has slightly fewer multiplications

than the radix-4.

Because the structures of the radix-2, radix-4, and split-radix FFTs

are the same, the number of data additions is same for all of them.

However, each complex twiddle factor multiplication requires two

real additions (and four real multiplications) the number of addi-

tions will be fewer for the structures with fewer multiplications.

x(0)

X (0)

x(1)

X (8)

x(2)

X (4)

- j

x(3)

X (12)

x(4)

X (2)

x(5)

W 2

X (10)

- j

x(6)

W 4

X (6)

- j

x(7)

W 6

X (14)

x(8)

W 0

X (1)

x(9)

W 1

X (9)

x(10)

W 2

X (5)

- j

x(11)

W 3

X (13)

- j

x(12)

W 4

X (3)

- j

x(13)

W 5

X (11)

- j

x(14)

W 6

X (7)

- j

- j

x(15)

W 7

X (15)

Figure 16.6: Length-16, Decimation-in-Frequency, In-order

input, Split-Radix FFT

Available for free at Connexions

<http://cnx.org/content/col10683/1.5>

APPENDIX

241

x(0)

X (0)

x(1)

X (8)

x(2)

X (4)

x(3)

X (12)

x(4)

W 0

X (2)

x(5)

W 2

X (10)

x(6)

W 4

X (6)

x(7)

W 6

X (14)

x(8)

W 0

X (1)

x(9)

W 1

X (9)

x(10)

W 2

X (5)

x(11)

W 3

X (13)

x(12)

W 0

X (3)

x(13)

W 3

X (11)

x(14)

W 6

X (7)

x(1