
Chapter 14

Comments: Fast Fourier Transforms

(This content is available online at <http://cnx.org/content/m16434/1.8/>.)

14.1 Other work and Results

This section comes from a note describing results on efficient algorithms to
calculate the discrete Fourier transform (DFT) that were collected over the
years. Perhaps the most interesting is the discovery that the Cooley-Tukey
FFT was described by Gauss in 1805 [175]. That gives some indication of the
age of research on the topic, and the fact that a 1995 compiled bibliography
[363] on efficient algorithms contains over 3400 entries indicates its volume.
Three IEEE Press reprint books contain papers on the FFT [303], [84], [85].
An excellent general purpose FFT program has been described in [132], [129]
and is used in Matlab and available over the internet.

In addition to this book there are several others [238], [266], [25], [170],
[383], [254], [33], [37], [345] that give a good modern theoretical background
for the FFT, one book [67] that gives the basic theory plus both FORTRAN and
TMS 320 assembly language programs, and other books [219], [348], [70] that
contain chapters on advanced FFT topics. A good up-to-date, on-line reference
with both theory and programming techniques is in [11]. The history of the FFT
is outlined in [87], [175] and excellent survey articles can be found in [115],
[93]. The foundation of much of the modern work on efficient algorithms was
done by S. Winograd. These results can be found in [412], [415], [418]. An
outline and discussion of his theorems can be found in [219] as well as [238],
[266], [25], [170].

Efficient FFT algorithms for length-2^M were described by Gauss and discovered
in modern times by Cooley and Tukey [91]. These have been highly developed and
good examples of FORTRAN programs can be found in [67]. Several new algorithms
have been published that require the least known amount of total arithmetic
[423], [108], [104], [229], [394], [71]. Of these, the split-radix FFT [108],
[104], [392], [366] seems to have the best structure for programming, and an
efficient program has been written [351] to implement it. A mixture of
decimation-in-time and decimation-in-frequency with very good efficiency is
given in [323], [324], and another, called the Sine-Cosine FT, in [71].
Recently a modification to the split-radix algorithm has been described [203]
that has a slightly better total arithmetic count. Theoretical bounds on the
number of multiplications required for the FFT based on Winograd's theories
are given in [170], [172]. Schemes for calculating an in-place, in-order
radix-2 FFT are given in [17], [19], [196], [379]. Discussion of various forms
of unscramblers is given in [51], [321], [186], [123], [318], [400], [424],
[370], [315]. A discussion of the relation of the computer architecture,
algorithm and compiler can be found in [251], [242]. A modification to allow
lengths of N = q·2^m for q odd is given in [24].
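
As a point of reference, the decimation-in-time idea behind these length-2^M
algorithms can be stated in a few lines. The sketch below (in Python with
NumPy, not part of the original text) recursively splits the DFT into the
DFTs of the even- and odd-indexed samples; practical programs use iterative,
in-place or split-radix forms of the same recursion.

    import numpy as np

    def fft_radix2(x):
        # Recursive radix-2 decimation-in-time FFT for length N = 2^M.
        # A textbook sketch; production codes use in-place iterative
        # (or split-radix) versions with far less overhead.
        x = np.asarray(x, dtype=complex)
        N = len(x)
        if N == 1:
            return x
        even = fft_radix2(x[0::2])                       # DFT of even-indexed samples
        odd = fft_radix2(x[1::2])                        # DFT of odd-indexed samples
        w = np.exp(-2j * np.pi * np.arange(N // 2) / N)  # twiddle factors
        return np.concatenate([even + w * odd, even - w * odd])

    # Agrees with a library FFT for any power-of-two length
    x = np.random.randn(8)
    assert np.allclose(fft_radix2(x), np.fft.fft(x))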

The "other" FFT is the prime factor algorithm (PFA) which uses an index map
originally developed by Thomas and by Good. The theory of the PFA was derived
in [214] and further developed, and an efficient in-order, in-place program is
given in [58], [67]. More results on the PFA are given in [377], [378], [379],
[380], [364]. A method has been developed to use dynamic programming to design
optimal FFT programs that minimize the number of additions and data transfers
as well as multiplications [191]. This new approach designs custom algorithms
for a particular computer architecture. An efficient and practical development
of Winograd's ideas has given a design method that does not require the rather
difficult Chinese remainder theorem [219], [199] for short prime length FFT's.
These ideas have been used to design modules of length 11, 13, 17, 19, and 25
[189]. Other methods for designing short DFT's can be found in [376], [223].
A use of these ideas with distributed arithmetic and table look-up rather than
multiplication is given in [80]. A program that implements the nested Winograd
Fourier transform algorithm (WFTA) is given in [238] but it has not proven as
fast or as versatile as the PFA [58]. An interesting use of the PFA was
announced [75] in searching for large prime numbers.
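
To illustrate the Thomas/Good index map on which the PFA rests, here is a
hedged sketch (Python/NumPy, not code from the cited programs): for
N = N1·N2 with N1 and N2 relatively prime, the input and output maps turn
the length-N DFT into a true N1-by-N2 two-dimensional DFT with no twiddle
factors between the stages.

    import numpy as np

    def pfa_dft(x, N1, N2):
        # Prime factor algorithm sketch for N = N1*N2, gcd(N1, N2) = 1.
        N = N1 * N2
        assert len(x) == N and np.gcd(N1, N2) == 1
        n1 = np.arange(N1)[:, None]
        n2 = np.arange(N2)[None, :]
        X2d = np.asarray(x, dtype=complex)[(n1 * N2 + n2 * N1) % N]  # input map
        X2d = np.fft.fft(X2d, axis=0)        # length-N1 DFTs (short modules)
        X2d = np.fft.fft(X2d, axis=1)        # length-N2 DFTs (short modules)
        q1 = pow(int(N2), -1, int(N1))       # N2^{-1} mod N1
        q2 = pow(int(N1), -1, int(N2))       # N1^{-1} mod N2
        k1 = np.arange(N1)[:, None]
        k2 = np.arange(N2)[None, :]
        X = np.empty(N, dtype=complex)
        X[(k1 * N2 * q1 + k2 * N1 * q2) % N] = X2d   # output (CRT) map
        return X

    # Check against a direct FFT for N = 3 * 5
    x = np.random.randn(15)
    assert np.allclose(pfa_dft(x, 3, 5), np.fft.fft(x))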

These efficient algorithms can be used not only on DFT's but also on other
transforms with a similar structure. They have been applied to the discrete
Hartley transform [354], [36] and the discrete cosine transform [394], [401],
[314].

The fast Hartley transform has been proposed as a superior method for real
data analysis but that has been shown not to be the case. A well-designed
real-data FFT [360] is always as good as or better than a well-designed
Hartley transform [354], [113], [289], [386], [371]. The Bruun algorithm [41],
[369] also looks promising for real data applications as does the
Rader-Brenner algorithm [310], [76], [386]. A novel approach to calculating
the inverse DFT is given in [109].
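
One reason a well-designed real-data FFT is hard to beat is conjugate
symmetry: the transform of real data is determined by half of its values, so,
for example, two real sequences can share a single complex FFT. The following
sketch (Python/NumPy, an illustration of this standard packing trick rather
than of any particular reference) shows the idea.

    import numpy as np

    def two_real_ffts(x, y):
        # DFTs of two real length-N sequences from one complex length-N FFT,
        # using the conjugate symmetry of real-data transforms.
        N = len(x)
        Z = np.fft.fft(np.asarray(x) + 1j * np.asarray(y))
        Zr = np.conj(Z[(-np.arange(N)) % N])   # Z*[(N-k) mod N]
        X = 0.5 * (Z + Zr)                     # symmetric part -> DFT of x
        Y = -0.5j * (Z - Zr)                   # antisymmetric part -> DFT of y
        return X, Y

    x, y = np.random.randn(16), np.random.randn(16)
    X, Y = two_real_ffts(x, y)
    assert np.allclose(X, np.fft.fft(x)) and np.allclose(Y, np.fft.fft(y))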

General length algorithms include [340], [143], [125]. For lengths that are
not highly composite or prime, the chirp z-transform is a good candidate [67],
[307] for longer lengths, and an efficient order-N^2 algorithm called the QFT
[343], [157], [160] for shorter lengths. A method which automatically
generates near-optimal prime length Winograd based programs has been given in
[199], [330], [332], [334], [336]. This gives the same efficiency for shorter
lengths (i.e. N ≤ 19) and well-structured new algorithms for much longer
lengths. Another approach is given in [285]. Special methods are available for
very long lengths [183], [365]. A very interesting general length FFT system
called the FFTW has been developed by Frigo and Johnson at MIT. It uses a
library of efficient "codelets" which are composed for a very efficient
calculation of the DFT on a wide variety of computers [132], [129], [136].
For most lengths and on most computers, this is the fastest FFT today.
Surprisingly, it uses a recursive program structure. The FFTW won the 1999
Wilkinson Prize for Numerical Software.
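
The chirp z-transform mentioned above handles awkward lengths by rewriting the
DFT as a convolution that can be carried out with convenient (e.g.,
power-of-two) FFTs. A minimal sketch of Bluestein's form of this idea, in
Python/NumPy (not taken from the cited programs), is:

    import numpy as np

    def bluestein_dft(x):
        # Chirp z-transform (Bluestein) sketch: a DFT of arbitrary length N
        # computed as a convolution carried out with power-of-two FFTs.
        x = np.asarray(x, dtype=complex)
        N = len(x)
        n = np.arange(N)
        chirp = np.exp(-1j * np.pi * n * n / N)        # "chirp" sequence
        a = x * chirp                                  # pre-multiply by the chirp
        M = 1 << (2 * N - 1).bit_length()              # FFT length >= 2N - 1
        b = np.zeros(M, dtype=complex)
        b[:N] = np.conj(chirp)                         # conjugate chirp ...
        b[M - N + 1:] = np.conj(chirp[1:][::-1])       # ... wrapped for negative lags
        conv = np.fft.ifft(np.fft.fft(a, M) * np.fft.fft(b))
        return chirp * conv[:N]                        # post-multiply by the chirp

    # Agrees with a direct FFT for an awkward length such as N = 17
    x = np.random.randn(17) + 1j * np.random.randn(17)
    assert np.allclose(bluestein_dft(x), np.fft.fft(x))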

The use of the FFT to calculate discrete convolution was one of its earliest
uses. Although the more direct rectangular transform [9] would seem to be more
efficient, use of the FFT or PFA is still probably the fastest method on a
general purpose computer or DSP chip [287], [360], [113], [241]. On special
purpose hardware or special architectures, the use of distributed arithmetic
[80] or number theoretic transforms [5] may be even faster. Special algorithms
for use with the short-time Fourier transform [346], for the calculation of a
few DFT values [349], [316], [347], and for recursive implementation [399],
[129] have also been developed.
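
The fast-convolution use of the FFT is simple to state: transform, multiply,
inverse-transform, with enough zero padding that the circular convolution
implied by the DFT equals the desired linear one. A short Python/NumPy sketch
(illustrative only):

    import numpy as np

    def fft_convolve(x, h):
        # Linear convolution by the FFT: zero-pad to at least
        # len(x) + len(h) - 1 so circular wrap-around does not corrupt it.
        L = len(x) + len(h) - 1
        M = 1 << (L - 1).bit_length()          # convenient power-of-two length
        y = np.fft.ifft(np.fft.fft(x, M) * np.fft.fft(h, M))[:L]
        return y.real if np.isrealobj(x) and np.isrealobj(h) else y

    x, h = np.random.randn(100), np.random.randn(32)
    assert np.allclose(fft_convolve(x, h), np.convolve(x, h))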

An excellent analysis of efficient programming of the FFT on DSP
microprocessors is given in [243], [242]. Formulations of the DFT in terms of
tensor or Kronecker products look promising for developing algorithms for
parallel and vector computer architectures [361], [383], [200], [390], [385],
[154], [153].
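
The Kronecker-product point of view writes a Cooley-Tukey step as a product of
structured matrices, which is what makes it attractive for mapping onto
parallel and vector machines. The sketch below (Python/NumPy; the factorization
shown is the standard one from the literature, not code from these references)
builds F_{PQ} = (F_P ⊗ I_Q) D (I_P ⊗ F_Q) L, where L is a stride permutation
and D a diagonal of twiddle factors, and checks it against the full DFT matrix.

    import numpy as np

    def dft_matrix(n):
        # Full n-by-n DFT matrix, entry exp(-2*pi*i*j*k/n)
        j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
        return np.exp(-2j * np.pi * j * k / n)

    def kron_fft_factorization(P, Q):
        # Cooley-Tukey step as F_{PQ} = (F_P kron I_Q) D (I_P kron F_Q) L
        N = P * Q
        L = np.zeros((N, N))
        for p in range(P):
            for q in range(Q):
                L[p * Q + q, q * P + p] = 1.0      # stride-P permutation
        d = np.exp(-2j * np.pi * np.outer(np.arange(P), np.arange(Q)) / N).ravel()
        D = np.diag(d)                             # twiddle W_N^{p*q} at row p*Q+q
        return (np.kron(dft_matrix(P), np.eye(Q)) @ D
                @ np.kron(np.eye(P), dft_matrix(Q)) @ L)

    # The factored form reproduces the full DFT matrix, e.g. for N = 4 * 3
    assert np.allclose(kron_fft_factorization(4, 3), dft_matrix(12))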

Various approaches to calculating approximate DFTs have been based on cordic
methods, short word lengths, or some form of pruning. A new method that uses
the characteristics of the signals being transformed combines the discrete
wavelet transform (DWT) with the DFT to give an approximate FFT with O(N)
multiplications [162], [164], [69] for certain signal classes. A similar
approach has been developed using filter banks [339], [185].

The study of efficient algorithms not only has a long history and a large
bibliography, it is still an exciting research field where new results are
used in practical applications.

More information can be found on the Rice DSP Group's web page:
<http://www-dsp.rice.edu>.
