This section comes from a note describing results on efficient algorithms to calculate the discrete Fourier transform (DFT) that were collected over years. Perhaps the most interesting is the discovery that the Cooley-Tukey FFT was described by Gauss in 1805 46. That gives some indication of the age of research on the topic, and the fact that a 1995 compiled bibliography 88 on efficient algorithms contains over 3400 entries indicates its volume. Three IEEE Press reprint books contain papers on the FFT 74, 24, 25. An excellent general purpose FFT program has been described in 34, 35 and is used in Matlab and available over the internet.
In addition to this book there are several others 60, 65, 10, 44, 102, 64, 11, 9, 97 that give a good modern theoretical background for the FFT, one book 12 that gives the basic theory plus both FORTRAN and TMS 320 assembly language programs, and other books 57, 84, 15 that contain chapters on advanced FFT topics. A good up-to-date, on-line reference with both theory and programming techniques is in 3. The history of the FFT is outlined in 21, 46 and excellent survey articles can be found in 30, 20. The foundation of much of the modern work on efficient algorithms was done by S. Winograd. These results can be found in 115, 116, 117. An outline and discussion of his theorems can be found in 57 as well as 60, 65, 10, 44.
Efficient FFT algorithms for length-2M were described by Gauss and discovered in modern times by Cooley and Tukey 22. These have been highly developed and good examples of FORTRAN programs can be found in 12. Several new algorithms have been published that require the least known amount of total arithmetic 119, 26, 28, 58, 112, 17. Of these, the split-radix FFT 26, 28, 111, 100 seems to have the best structure for programming, and an efficient program has been written 90 to implement it. A mixture of decimation-in-time and decimation-in-frequency with very good efficiency is given in 77, 78 and one called the Sine-Cosine FT 17. Recently a modification to the split-radix algorithm has been described 52 that has a slightly better total arithmetic count. Theoretical bounds on the number of multiplications required for the FFT based on Winograd's theories are given in 44, 43. Schemes for calculating an in-place, in-order radix-2 FFT are given in 7, 6, 50, 107. Discussion of various forms of unscramblers is given in 16, 69, 54, 31, 73, 114, 120, 79, 71. A discussion of the relation of the computer architecture, algorithm and compiler can be found in 59, 62. A modification to allow lengths of for q odd is given in 4.
The “other” FFT is the prime factor algorithm (PFA) which uses an index map originally developed by Thomas and by Good. The theory of the PFA was derived in 55 and further developed and an efficient in-order and in-place program given in 5, 12. More results on the PFA are given in 105, 106, 107, 108, 98. A method has been developed to use dynamic programming to design optimal FFT programs that minimize the number of additions and data transfers as well as multiplications 49. This new approach designs custom algorithms for a particular computer architecture. An efficient and practical development of Winograd's ideas has given a design method that does not require the rather difficult Chinese remainder theorem 57, 51 for short prime length FFT's. These ideas have been used to design modules of length 11, 13, 17, 19, and 25 48. Other methods for designing short DFT's can be found in 104, 56. A use of these ideas with distributed arithmetic and table look-up rather than multiplication is given in 18. A program that implements the nested Winograd Fourier transform algorithm (WFTA) is given in 60 but it has not proven as fast or as versatile as the PFA 5. An interesting use of the PFA was announced 19 in searching for large prime numbers.
These efficient algorithms can not only be used on DFT's but on other transforms with a similar structure. They have been applied to the discrete Hartley transform 93, 13 and the discrete cosine transform 112, 118, 76.
The fast Hartley transform has been proposed as a superior method for real data analysis but that has been shown not to be the case. A well-designed real-data FFT 94 is always as good as or better than a well-designed Hartley transform 93, 29, 68, 109, 80. The Bruun algorithm 14, 101 also looks promising for real data applications as does the Rader-Brenner algorithm 70, 23, 109. A novel approach to calculating the inverse DFT is given in 27.
General length algorithms include 91, 40, 32. For lengths that are not highly composite or prime, the chirp z-transform in a good candidate 12, 75 for longer lengths and an efficient order-N2 algorithm called the QFT 92, 41, 42 for shorter lengths. A method which automatically generates near-optimal prime length Winograd based programs has been given in 51, 82, 85, 86, 87. This gives the same efficiency for shorter lengths (i.e. N≤19) and new algorithms for much longer lengths and with well-structured algorithms. Another approach is given in 67. Special methods are available for very long lengths 47, 99. A very interesting general length FFT system called the FFTW has been developed by Frigo and Johnson at MIT. It uses a library of efficient “codelets" which are composed for a very efficient calculation of the DFT on a wide variety of computers 34, 35, 33. For most lengths and on most computers, this is the fastest FFT today. Surprisingly, it uses a recursive program structure. The FFTW won the 1999 Wilkinson Prize for Numerical Software.
The use of the FFT to calculate discrete convolution was one of its earliest uses. Although the more direct rectangular transform 2 would seem to be more efficient, use of the FFT or PFA is still probably the fastest method on a general purpose computer or DSP chip 66, 94, 29, 61. On special purpose hardware or special architectures, the use of distributed arithmetic 18 or number theoretic transforms 1 may be even faster. Special algorithms for use with the short-time Fourier transform 81 and for the calculation of a few DFT values 89, 72, 83 and for recursive implementation 113, 35 have also been developed. An excellent analysis of efficient programming the FFT on DSP microprocessors is given in 63, 62. Formulations of the DFT in terms of tensor or Kronecker products look promising for developing algorithms for parallel and vector computer architectures 95, 102, 53, 110, 103, 39, 38.
Various approaches to calculating approximate DFTs have been based on cordic methods, short word lengths, or some form of pruning. A new method that uses the characteristics of the signals being transformed has combined the discrete wavelet transform (DWT) combined with the DFT to give an approximate FFT with O(N) multiplications 36, 37, 8 for certain signal classes. A similar approach has been developed using filter banks 96, 45.
The study of efficient algorithms not only has a long history and large bibliography, it is still an exciting research field where new results are used in practical applications.
More information can be found on the Rice DSP Group's web page
Agarwal, R. C. and Burrus, C. S. (1975, April). Number Theoretic Transforms to Implement Fast Digital Convolution. [also in IEEE Press DSP Reprints II, 1979]. Proceedings of the IEEE, 63(4), 550–560.
Agarwal, R. C. and Cooley, J. W. (1977, October). New Algorithms for Digital Convolution. IEEE Trans. on ASSP, 25(2), 392–410.
Arndt, Jörg. (2008). Algorithms for Programmers: Ideas and Source Code. [FFT book available on-line, 1000 pages, continually updated]. Bayreuth, Germany: http://www.jjj.de/fxt/.
Bi, Guoan and Chen, Yan Qiu. (1998, June). Fast DFT Algorithms for Length. IEEE Transactions on Circuits and Systems – II, 45(6), 685–690.
Burrus, C. S. and Eschenbacher, P. W. (1981, August). An In–Place, In–Order Prime Factor FFT Algorithm. [Reprinted in it DSP Software, by L.R. Morris, 1983]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 29(4), 806–817.
Beard, James K. (2003). The FFT in the 21st Century: Eigenspace Processing. Boston: Kluwer.
Beard, James K. (1978, April). An In-Place, Self-Reordering FFT. In Proceedings of the ICASSP-78. (pp. 632-633). Tulsa
Burrus, C. Sidney and Gopinath, Ramesh A. and Guo, Haitao. (1998). Introduction to Wavelets and the Wavelet Transform. Upper Saddle River, NJ: Prentice Hall.
Briggs, William L. and Henson, Van Emden. (1995). The DFT: An Owner's Manual for the Discrete Fourier Transform. Philadelphia: SIAM.
Blahut, R. E. (1984). Fast Algorithms for Digital Signal Processing. Reading, MA: Addison-Wesley, Inc.
Blahut, Richard E. (1992). Algebraic Methods for Signal Processing and Communications Coding. New York: Springer-Verlag.
Burrus, C. S. and Parks, T. W. (1985). DFT/FFT and Convolution Algorithms. New York: John Wiley & Sons.
Bracewell, R. N. (1986). The Hartley Transform. Oxford Press.
Bruun, G. (1978, February). z-Transform DFT Filters and FFTs. IEEE Transactions on ASSP, 26(1), 56–63.
Burrus, C. Sidney and Selesnick, Ivan W. (1998). Fast Convolution and Filtering. In Madisetti, V. K. and Williams, D. B. (Eds.), The Digital Signal Processing Handbook. Boca Raton: CRC Press.
Burrus, C. S. (1988, July). Unscrambling for Fast DFT Algorithms. IEEE Transactions on Acoustics, Speech, and Signal Processing, 36(7), 1086–1087.
Byam, Iain R. (1999, June). A New Fast Fourier Transform Algorithm. [A short version is in Technical Report PG-99001, ECE Dept., Univ. of the West Indies, Aug. 1999]. Technical report. University of the West Indies (UWI), St. Augustine, Trinidad.
Chu, Shuni and Burrus, C. S. (1982, April). A Prime Factor FFT Algorithm using Distributed Arithmetic. IEEE Transactions on Acoustics, Speech, and Signal Processing, 30(2), 217–227.
Chen, K. T. (1990, February). A New Record: The Largest Known Prime Number. IEEE Spectrum, 27(2), 47.
Cooley, James W. (1990, April). The Structure of FFT Algorithms. [Notes for a Tutorial at IEEE ICASSP-90].
Cooley, J. W. (1992, January). How the FFT Gained Acceptance. [Also presented at the ACM Conference on the History of Scientific and Numerical Computation, Princeton, NJ, May 1987 and published in: A History of Scientific Computing, edited by S. G. Nash, Addison-Wesley, 1990, pp. 133-140.]. IEEE Signal Processing Magazine, 9(1), 10–13.
Cooley, J. W. and Tukey, J. W. (1965). An Algorithm for the Machine Calculation of Complex Fourier Series. Math. Computat., 19, 297–301.
Cho, K. M. and Temes, G. C. (1978, April). Real-factor FFT Algorithms. In Proceedings of IEEE ICASSP-78. (pp. 634-637). Tulsa, OK
DSP Committee, (Ed.). (1979). Digital Signal Processing II, selected reprints. New York: IEEE Press.
DSP Committee, (Ed.). (1979). Programs for Digital Signal Processing. New York: IEEE Press.
Duhamel, P. and Hollmann, H. (1984, January 5). Split Radix FFT Algorithm. Electronic Letters, 20(1), 14–16.
Duhamel, P. and Piron, B. and Etcheto, J. M. (1978, February). On Computing the Inverse DFT. IEEE Transactions on ASSP, 36(2), 285–286.
Duhamel, P. (1986, April). Implementation of `Split-Radix' FFT Algorithms for Complex, Real, and Real-Symmetric Data. [A shorter version appeared in the ICASSP-85 Proceedings, p. 20.6, March 1985]. IEEE Trans. on ASSP, 34, 285–295.
Duhamel, P. and Vetterli, M. (1987, June). Improved Fourier and Hartley Transfrom Algorithms, Application to Cyclic Convolution of Real Data. IEEE Trans. on ASSP, 35(6), 818–824.
Duhamel, P. and Vetterli, M. (1990, April). Fast Fourier Transforms: A Tutorial Review and a State of the Art. Signal Processing, 19(4), 259–299.
Evans, D. M. W. (1989, August). A Second Improved Digit–Reversal Permutation Algorithm for Fast Transforms. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37(8), 1288–1291.
Ferguson, Jr., W. E. (1982). A Simple Derivation of Glassman General-N Fast Fourier Transform. [Also, in Report AD-A083811, NTIS, Dec. 1979]. Comput. and Math. with Appls., 8(6), 401–411.
Frigo, Matteo and Johnson, Steven G. (2005, February). The Design and Implementtion of FFTW. Proceedings of the IEEE, 93(2), 216–231.
Frigo, Matteo and Johnson, Steven G. (1998, May 12–15). FFTW: An Adaptive Software Architecture for the FFT. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. (Vol. III, p. 1381–1384). ICASSP-98, Seattle
Frigo, Matteo. (1999, May). A Fast Fourier Transform Compiler. In Proceedings of the 1999 ACM SIGPLAN Conference on Programming Language Design and Implentation. PLDI-99, Atlanta
Guo, Haitao and Burrus, C. Sidney. (1996, August 6–9). Approximate FFT via the Discrete Wavelet Transform. In Proceedings of SPIE Conference 2825. Denver
Guo, Haitao and Burrus, C. Sidney. (1997, April 21–24). Wavelet Transform Based Fast Approximate Fourier Transform. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. (Vol. 3, p. III:1973–1976). IEEE ICASSP-97, Munich
Granata, John and Conner, Michael and Tolimieri, Richard. (1992, December). Recursive Fast Algorithms and the Role of the Tensor Product. IEEE Transactions on Signal Processing, 40(12), 2921–2930.
Granata, John and Conner, Michael and Tolimieri, Richard. (1992, January). The Tensor Product: A Mathematical Programming Language for FFTs. IEEE Signal Processing Magazine, 9(1), 40–48.
Glassman, J. A. (1970, Feburary). A Generalization of the Fast Fourier Transform. IEEE Transactions on Computers, C-19(2), 105–116.
Guo, H. and Sitton, G. A. and Burrus, C. S. (1994, April 19–22). The Quick Discrete Fourier Transform. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. (p. III:445–448). IEEE ICASSP-94, Adelaide, Australia
Guo, H. and Sitton, G. A. and Burrus, C. S. (1998, February). The Quick Fourier Transform: an FFT based on Symmetries. IEEE Transactions on Signal Processing, 46(2), 335–341.
Heideman, M. T. and Burrus, C. S. (1986, February). On the Number of Multiplications Necessary to Compute a Length- DFT. IEEE Transactions on Acoustics, Speech, and Signal Processing, 34(1), 91–95.
Heideman, M. T. (1988). Multiplicative Complexity, Convolution, and the DFT. Springer–Verlag.
Hossen, A. N. and Heute, U. and Shentov, O. V. and Mitra, S. K. (1995). Subband DFT – Part II: Accuracy, Complexity, and Applications. Signal Processing, 41, 279–295.
Heideman, M. T. and Johnson, D. H. and Burrus, C. S. (1984, October). Gauss and the History of the FFT. [also in Archive for History of Exact Sciences, 1985]. IEEE Acoustics, Speech, and Signal Processing Magazine, 1(4), 14–21.
Hocking, W. K. (1989, January). Performing Fourier Transforms on Extremely Long Data Streams. Computers in Physics, 3(1), 59–65.
Johnson, H. W. and Burrus, C. S. (1981). Large DFT Modules: N = 11, 13, 17, 19, and 25. (8105). Technical report. Department of Electrical Engineering, Rice University, Houston, TX 77251–1892.
Johnson, H. W. and Burrus, C. S. (1983, April). The Design of Optimal DFT Algorithms Using Dynamic Programming. IEEE Transactions on Acoustics, Speech, and Signal Processing, 31(2), 378–387.
Johnson, H. W. and Burrus, C. S. (1984, March). An In-Place, In-Order Radix-2 FFT. In ICASSP-84 Proceedings. (p. 28A.2).
Johnson, Howard W. and Burrus, C. S. (1985, February). On the Structure of Efficient DFT Algorithms. IEEE Transactions on Acoustics, Speech, and Signal Processing, 33(1), 248–254.
Johnson, Steven G. and Frigo, Matteo. (2007, January). A Modified Split-Radix FFT with Fewer Arithmetic Operations. IEEE Transactions on Signal Processing, 55(1), 111–119.
Johnson, J. and Johnson, R. W. and Rodriguez, D. and Tolimieri, R. (1990). A Methodology for Designing, Modifying, and Implementing Fourier Transform Algorithms on Various Architectures. Circuits, Systems and Signal Processing, 9(4), 449–500.
Jeong, Jechang and Williams, William J. (1990, April). A Fast Recursive Bit-Reversal Algorithm. In Proceedings of the ICASSP-90. (p. 1511–1514). Albuquerque, NM
Kolba, D. P. and Parks, T. W. (1977, August). A Prime Factor FFT Algorithm Using High Speed Convolution. [also in]. IEEE Trans. on ASSP, 25, 281–294.
Lu, Chao and Cooley, James W. and Tolimieri, Richard. (1993, Febr