The development of algorithms for the fast computation of the Discrete Fourier Transform in the last 30 years originated with the radix 2 Cooley-Tukey FFT and the theory and variety of FFTs has grown significantly since then. Most of the work has focused on FFTs whose sizes are composite, for the algorithms depend on the ability to factor the length of the data sequence, so that the transform can be found by taking the transform of smaller lengths. For this reason, algorithms for prime length transforms are building blocks for many composite length FFTs - the maximum length and the variety of lengths of a PFA or WFTA algorithm depend upon the availability of prime length FFT modules. As such, prime length Fast Fourier Transforms are a special, important and difficult case.
Fast algorithms designed for specific short prime lengths have been developed and have been written as straight line code 3, 2. These dedicated programs rely upon an observation made in Rader's paper 6 in which he shows that a prime p length DFT can be found by performing a p–1 length circular convolution. Since the publication of that paper, Winograd had developed a theory of multiplicative complexity for transforms and designed algorithms for convolution that attain the minimum number of multiplications 13. Although Winograd's algorithms are very efficient for small prime lengths, for longer lengths they require a large number of additions and the algorithms become very cumbersome to design. This has prevented the design of useful prime length FFT programs for lengths greater than 31. Although we have previously reported the design of programs for prime lengths greater than 31 7 those programs required more additions than necessary and were long. Like the previously existing ones, they simply consisted of a long list of instructions and did not take advantage of the attainable common structures.
In this paper we describe a set of programs for circular convolution and prime length FFTs that are are short, possess great structure, share many computational procedures, and cover a large variety of lengths. Because the underlying convolution is decomposed into a set of disjoint operations they can be performed in parallel and this parallelism is made clear in the programs. Moreover, each of these independent operations is made up of a sequence of sub-operations of the form I⊗A⊗I where ⊗ denotes the Kronecker product. These operations can be implemented as vector/parallel operations 12. Previous programs for prime length FFTs do not have these features: they consist of straight line code and are not amenable to vector/parallel implementations.
We have also developed a program that automatically generates these programs for circular convolution and prime length DFTs. This code generating program requires information only about a set of modules for computing cyclotomic convolutions. We compute these non-circular convolutions by computing a linear convolution and reducing the result. Furthermore, because these linear convolution algorithms can be built from smaller ones, the only modules needed are ones for the linear convolution of prime length sequences. It turns out that with linear convolution algorithms for only the lengths 2 and 3, we can generate a wide variety of prime length FFT algorithms. In addition, the code we generate is made up of calls to a relatively small set of functions. Accordingly, the subroutines can be designed and optimized to specifically suit a given architecture.
The programs we describe use Rader's conversion of a prime point DFT into a circular convolution, but this convolution we compute using the split nesting algorithm 5. As Stasinski notes 11, this yields algorithms possessing greater structure and simpler programs and doesn't generally require more computation.
In computing the DFT of an n=n1n2 point sequence where n1 and n2 are relatively prime, a row-column method can be employed. That is, if an n1×n2 array is appropriately formed from the n point sequence, then its DFT can be computed by computing the DFT of the rows and by then computing the DFT of the columns. The separability of the DFT makes this possible. It should be mentioned, however, that in at least two papers 11, 4 it is mistakenly assumed that the row-column method can also be applied to convolution. Unfortunately, the convolution of two sequences can not be found by forming two arrays, by convolving their rows, and by then convolving their columns. This misunderstanding about the separability of convolution also appears in 1 where the author incorrectly writes a diagonal matrix of a bilinear form as a Kronecker product. If it were a Kronecker product, then there would indeed exist a row-column method for convolution.
Earlier reports on this work were published in the conference proceedings 7, 8, 9 and a fairly complete report was published in the IEEE Transaction on Signal Processing 10. Some parts of this approach appear in the Connexions book, Fast Fourier Transforms. This work is built on and an extension of that in 9 which is also in the Connexions Technical Report.
Blahut, R. E. (1985). Fast Algorithms for Digital Signal Processing. Addison-Wesley.
Johnson, H. W. and Burrus, S. (1981). Large DFT Modules: 11, 13, 17, 19, 25. (8105). [Connexions Collection: col10569]. Technical report. Rice University.
Johnson, H. W. and Burrus, C. S. (1983, April). The Design of Optimal DFT Algorithms Using Dynamic Programming. IEEE Trans. Acoust., Speech, Signal Proc., 31(2), 378-387.
Lu, C. and Cooley, J. W. and Tolimieri, R. (1993, February). FFT Algorithms for Prime Transform Sizes and their Implementations of VAX, IBM3090VF, and IBM RS/6000. IEEE Trans. Acoust., Speech, Signal Proc., 41(2), 638-648.
Nussbaumer, H. J. (1982). Fast Fourier Transform and Convolution Algorithms. Sringer-Verlag.
Rader, C. M. (1968, June). Discrete Fourier Transform When the Number of Data Samples is Prime. Proc. IEEE, 56(6), 1107-1108.
Selesnick, I. W. and Burrus, C. S. (1992). Automating the Design of Prime Length FFT Programs. In Proc. of 1992 ISCAS.
Selesnick, I. W. and Burrus, C. S. (1993). Multidimensional Mapping Techniques for Convolution. In Proc. of 1993 ICASSP.
Selesnick, I. W. and Burrus, C. S. (1994). Extending Winograd's Small Convolution Algorithm to Longer Lengths. In Proc. of 1994 ISCAS.
Selesnick, I.W. and Burrus, C.S. (1996, January). Automatic Generation of Prime Length FFT Programs. IEEE Transactions on Signal Processing, 44(1), 14–24.
Stasinski, R. (1986, June). Easy Generation of Small-N Discrete Fourier Transform Algorithms. IEE Proceedings, part G, 133(3), 133-139.
Tolimieri, R. and An, M. and Lu, C. (1989). Algorithms for Discrete Fourier Transform and Convolution. Springer-Verlag.
Winograd, S. (1980). Arithmetic Complexity of Computations. SIAM.