Fast Fourier Transforms by C. Sidney Burrus, Matteo Frigo, Steven G. Johnson

Chapter 7 Winograd's Short DFT Algorithms

In 1976, S. Winograd [20] presented a new DFT algorithm that required significantly fewer multiplications than the Cooley-Tukey FFT, which had been published eleven years earlier. This new Winograd Fourier Transform Algorithm (WFTA) is based on the type-one index map from Multidimensional Index Mapping, with each of the relatively prime length short DFTs calculated by very efficient special algorithms. It is these short algorithms that this section develops. They use the index permutation of Rader, described in another module, to convert the prime-length short DFTs into cyclic convolutions. Winograd developed a method for calculating digital convolution with the minimum number of multiplications. These optimal algorithms are based on the polynomial residue reduction techniques of Polynomial Description of Signals: Equation 1 to break the convolution into multiple small ones [2, 12, 14, 23, 21, 9].

The operation of discrete convolution defined by

(7.1) $y(n) = \sum_{k} h(n-k) \, x(k)$

is called a bilinear operation because, for a fixed h(n), y(n) is a linear function of x(n) and for a fixed x(n) it is a linear function of h(n). The operation of cyclic convolution is the same but with all indices evaluated modulo N.

Recall from Polynomial Description of Signals: Equation 3 that length-N cyclic convolution of x(n) and h(n) can be represented by polynomial multiplication

(7.2) $Y(s) = X(s) \, H(s) \bmod (s^N - 1)$
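The equivalence between cyclic convolution and polynomial multiplication modulo $s^N - 1$ can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names `cyclic_convolve` and `poly_cyclic_convolve` and the test data are illustrative choices, not part of the original text.

```python
import numpy as np

def cyclic_convolve(x, h):
    """Direct length-N cyclic convolution: y(n) = sum_k h((n-k) mod N) x(k)."""
    N = len(x)
    return np.array([sum(h[(n - k) % N] * x[k] for k in range(N))
                     for n in range(N)])

def poly_cyclic_convolve(x, h):
    """Same result via the product X(s) H(s) reduced modulo s^N - 1."""
    N = len(x)
    full = np.convolve(x, h)        # coefficients of X(s) H(s), lowest power first
    y = np.zeros(N, dtype=full.dtype)
    for power, coeff in enumerate(full):
        y[power % N] += coeff       # s^N = 1, so s^(N+m) folds back onto s^m
    return y

x = np.array([1, 2, 3, 4])          # arbitrary test data
h = np.array([5, 6, 7, 8])
print(cyclic_convolve(x, h))
print(poly_cyclic_convolve(x, h))
```

Both calls should print the same vector, since reducing modulo $s^N - 1$ simply wraps every power of s back into the range 0 to N-1.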

The bilinear operation of Equation 7.1 and Equation 7.2 can also be expressed in terms of linear matrix operators and a simpler bilinear operator denoted by o, which may be only a simple element-by-element multiplication of the two vectors [12, 9, 10]. This matrix formulation is

(7.3) $Y = C \, [ A X \; o \; B H ]$

where X, H and Y are length-N vectors with elements of x(n), h(n) and y(n) respectively. The matrices A and B have dimension $M \times N$, and C is $N \times M$ with $M \ge N$. The elements of A, B, and C are constrained to be simple, typically small integers or rational numbers. It will be these matrix operators that do the equivalent of the residue reduction on the polynomials in Equation 7.2.
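A small numerical instance may make the form of Equation 7.3 concrete. The sketch below uses a length-2 cyclic convolution with N = M = 2; the particular matrices A, B and C are an illustrative assumption (they are not given in the text), chosen so that A and B contain only plus or minus unity and C contains only rational entries.

```python
import numpy as np

# Hypothetical length-2 example of Y = C [A X o B H]; matrices chosen for illustration.
A = np.array([[1.0, 1.0],
              [1.0, -1.0]])     # residues of X(s) modulo (s - 1) and (s + 1)
B = A.copy()                    # same reduction applied to H(s)
C = np.array([[1.0, 1.0],
              [1.0, -1.0]]) / 2 # recombination with rational entries

def bilinear_cyclic_conv(x, h):
    """Evaluate C [A x o B h], with o as element-by-element multiplication."""
    return C @ ((A @ x) * (B @ h))

x = np.array([3.0, -1.0])       # arbitrary test data
h = np.array([2.0, 5.0])

# Direct length-2 cyclic convolution for comparison.
direct = np.array([x[0] * h[0] + x[1] * h[1],
                   x[0] * h[1] + x[1] * h[0]])
print(bilinear_cyclic_conv(x, h), direct)
```

In this example only the two products inside the o operation are general multiplications; the direct form uses four.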

In order to derive a useful algorithm of the form Equation 7.3 to calculate Equation 7.1, consider the polynomial formulation Equation 7.2 again. To use the residue reduction scheme, the modulus is factored into relatively prime factors. Fortunately the factoring of this particular polynomial, $s^N - 1$, has been extensively studied and it has considerable structure. When factored over the rationals, which means that the only coefficients allowed are rational numbers, the factors are called cyclotomic polynomials [2, 12, 14]. The most interesting property for our purposes is that most of the coefficients of cyclotomic polynomials are zero and the others are plus or minus unity for every N up to over one hundred (the first exception occurs at N = 105). This means the residue reduction will generally require no multiplications.
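The coefficient property of the cyclotomic polynomials can be explored with a few lines of code. The following is a minimal sketch in pure Python, assuming the standard recursion in which the N-th cyclotomic polynomial is $s^N - 1$ divided by the cyclotomic polynomials of all proper divisors of N; the helper names `divide_exact` and `cyclotomic` are illustrative.

```python
from functools import lru_cache

def divide_exact(num, den):
    """Divide integer polynomials (coefficients lowest power first).
    Assumes den is monic and divides num with zero remainder."""
    num = list(num)
    quot = [0] * (len(num) - len(den) + 1)
    for i in range(len(quot) - 1, -1, -1):
        q = num[i + len(den) - 1]        # den is monic, so no division is needed
        quot[i] = q
        for j, d in enumerate(den):
            num[i + j] -= q * d
    assert not any(num), "division was not exact"
    return quot

@lru_cache(maxsize=None)
def cyclotomic(n):
    """Coefficients of the n-th cyclotomic polynomial, lowest power first."""
    poly = [-1] + [0] * (n - 1) + [1]    # s^n - 1
    for d in range(1, n):
        if n % d == 0:                   # remove the factor for each proper divisor
            poly = divide_exact(poly, cyclotomic(d))
    return tuple(poly)

print(cyclotomic(4))                     # the factor s^2 + 1 of s^4 - 1
largest = max(abs(c) for n in range(1, 105) for c in cyclotomic(n))
print(largest, min(cyclotomic(105)))
```

The last line reports the largest coefficient magnitude found for all indices below 105 and the most negative coefficient at index 105, where the plus-or-minus-unity pattern first breaks.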

The operations of reducing X(s) and H(s) in Equation 7.2 are carried out by the matrices A and B in Equation 7.3. The convolution of the residue polynomials is carried out by the o operator and the recombination by the CRT is done by the C matrix. More details are in [2, 12, 14, 9, 10], but the important fact is that the A and B matrices usually contain only zero and plus or minus unity entries and the C matrix contains only rational numbers. The only general multiplications are those represented by o. Indeed, in the theoretical results from computational complexity theory, these real or complex multiplications are usually the only ones counted. In practical algorithms, the rational multiplications represented by C could be a limiting factor.

The h(n) terms are fixed for a digital filter, or they represent the W terms from Multidimensional Index Mapping: Equation 1 if the convolution is being used to calculate a DFT. Because of this, d = BH in Equation 7.3 can be precalculated and only the A and C operators represent the mathematics done at execution of the algorithm. In order to exploit this feature, it was shown [23, 9] that the properties of Equation 7.3 allow the exchange of the more complicated operator C with the simpler operator B. Specifically this is given by

(7.4) $Y = C \, [ A X \; o \; B H ]$
(7.5) $Y' = B^T [ A X \; o \; C^T H' ]$

where H' has the same elements as H, but in a permuted order, and likewise Y' and Y. This very important property allows precomputing the more complicated $C^T H'$ in Equation 7.5 rather than BH as in Equation 7.3.

Because BH or $C^T H'$ can be precomputed, the bilinear form of Equation 7.3 and Equation 7.5 can be written as a linear form. If an $M \times M$ diagonal matrix D is formed from $d = C^T H'$, or in the case of Equation 7.3, d = BH, assuming a commutative property for o, Equation 7.5 becomes

(7.6) $Y' = B^T D A X$

and Equation 7.3 becomes

(7.7) $Y = C D A X$

In most cases there is no reason not to use the same reduction operations on X and H; therefore, B can be the same as A, and Equation 7.6 then becomes

(7.8) $Y' = A^T D A X$
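The linear form of Equation 7.7 can be sketched for a fixed filter as follows. This is a minimal illustration assuming NumPy and a hypothetical length-2 cyclic convolution; the matrices A, B, C, the filter h, and the name `apply_filter` are assumptions made for the example, not values from the text.

```python
import numpy as np

# Hypothetical length-2 operators (illustrative, not from the text).
A = np.array([[1.0, 1.0],
              [1.0, -1.0]])
B = A
C = A / 2

h = np.array([2.0, 5.0])        # fixed filter coefficients
D = np.diag(B @ h)              # d = B H is computed once, ahead of time

def apply_filter(x):
    """Run-time work only: Y = C D A X."""
    return C @ (D @ (A @ x))

x = np.array([3.0, -1.0])       # arbitrary input block
print(apply_filter(x))
```

At run time only the additions in A and C and the M multiplications by the diagonal of D remain, which is the point of precomputing d.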

In order to illustrate how the residue reduction is carried out and how the A matrix is obtained, the length-5 DFT algorithm started in The DFT as Convolution or Filtering: Matrix 1 will be continued. The DFT is first converted to a length-4 cyclic convolution by the index permutation from The DFT as Convolution or Filtering: Equation 3 to give the cyclic convolution in The DFT as Convolution or Filtering. To avoid confusion from the permuted order of the data x(n) in The DFT as Convolution or Filtering, the cyclic convolution will first be developed without the permutation, using the polynomial U(s)

(7.9) $U(s) = x(1) + x(3) \, s + x(4) \, s^2 + x(2) \, s^3$
(7.10) $U(s) = u(0) + u(1) \, s + u(2) \, s^2 + u(3) \, s^3$

and then the results will be converted back to the permuted x(n). The length-4 cyclic convolution in terms of polynomials is

(7.11) $Y(s) = U(s) \, H(s) \bmod (s^4 - 1)$

and the modulus factors into three cyclotomic polynomials

(7.12) $s^4 - 1 = (s^2 - 1)(s^2 + 1)$
(7.13) $\phantom{s^4 - 1} = (s - 1)(s + 1)(s^2 + 1)$
(7.14) $\phantom{s^4 - 1} = P_1 \, P_2 \, P_3$
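The conversion of the length-5 DFT into a length-4 cyclic convolution, with the data taken in the order x(1), x(3), x(4), x(2), can be checked numerically. The following is a minimal sketch assuming NumPy, the primitive root g = 3 modulo 5 (which produces that input order), and the DFT kernel $W = e^{-j 2\pi/5}$; the function name and the output ordering convention are illustrative assumptions.

```python
import numpy as np

p, g = 5, 3                          # prime length and a primitive root modulo 5
g_inv = pow(g, p - 2, p)             # inverse of g modulo p (here 2)
W = np.exp(-2j * np.pi / p)

def dft5_via_cyclic_convolution(x):
    x = np.asarray(x, dtype=complex)
    idx_in = [pow(g, q, p) for q in range(p - 1)]       # 1, 3, 4, 2
    idx_out = [pow(g_inv, m, p) for m in range(p - 1)]  # 1, 2, 4, 3
    u = x[idx_in]                                       # u(q) = x(g^q)
    h = W ** np.array(idx_out)                          # h(j) = W^(g^-j)
    # Length-4 cyclic convolution y(m) = sum_q u(q) h((m - q) mod 4)
    y = np.array([sum(u[q] * h[(m - q) % (p - 1)] for q in range(p - 1))
                  for m in range(p - 1)])
    X = np.empty(p, dtype=complex)
    X[0] = x.sum()                                      # DC term handled separately
    X[idx_out] = x[0] + y                               # X(g^-m) = x(0) + y(m)
    return X

x = np.array([1.0, 2.0, 0.5, -1.0, 3.0])                # arbitrary test data
print(np.allclose(dft5_via_cyclic_convolution(x), np.fft.fft(x)))
```

The inner sum is written directly here for clarity; it is this length-4 cyclic convolution that the residue reduction is designed to compute with few multiplications.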

Both U(s) and H(s) are reduced modulo these three polynomials. The reduction modulo $P_1$ and $P_2$ is done in two stages. First it is done modulo $(s^2 - 1)$, then that residue is further reduced modulo $(s - 1)$ and $(s + 1)$.

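(7.15) $U(s) = u_0 + u_1 s + u_2 s^2 + u_3 s^3$
(7.16) $U'(s) = U(s) \bmod (s^2 - 1) = (u_0 + u_2) + (u_1 + u_3) \, s$
(7.17) $U_1(s) = U'(s) \bmod P_1 = u_0 + u_1 + u_2 + u_3$
(7.18) $U_2(s) = U'(s) \bmod P_2 = u_0 - u_1 + u_2 - u_3$
(7.19) $U_3(s) = U(s) \bmod P_3 = (u_0 - u_2) + (u_1 - u_3) \, s$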
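These residues can be verified by ordinary polynomial division. The sketch below assumes NumPy and uses `np.polydiv`, which expects coefficients with the highest power first; the helper `residue` and the test data are illustrative.

```python
import numpy as np

def residue(poly, modulus):
    """Remainder of poly divided by modulus; coefficients lowest power first."""
    _, r = np.polydiv(poly[::-1], modulus[::-1])  # polydiv wants highest power first
    r = r[::-1]
    # polydiv trims leading zeros, so pad back to one entry per power below the degree
    return np.pad(r, (0, len(modulus) - 1 - len(r)))

u = np.array([1.0, 2.0, 3.0, 4.0])    # u_0, u_1, u_2, u_3 (arbitrary test data)
P1 = np.array([-1.0, 1.0])            # s - 1
P2 = np.array([1.0, 1.0])             # s + 1
P3 = np.array([1.0, 0.0, 1.0])        # s^2 + 1
P1P2 = np.convolve(P1, P2)            # s^2 - 1

# The three factors multiply back to s^4 - 1.
print(np.convolve(P1P2, P3))

U_prime = residue(u, P1P2)            # first stage: (u0 + u2) + (u1 + u3) s
U1 = residue(U_prime, P1)             # second stage: u0 + u1 + u2 + u3
U2 = residue(U_prime, P2)             # second stage: u0 - u1 + u2 - u3
U3 = residue(u, P3)                   # (u0 - u2) + (u1 - u3) s
print(U_prime, U1, U2, U3)
```

Reducing in two stages gives the same $U_1$ and $U_2$ as reducing U(s) directly modulo $s - 1$ and $s + 1$, because both of those factors divide $s^2 - 1$.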

The reduction in Equation 7.16 of the data polynomial Equation 7.15 can be denoted by a matrix operation on a vector which has the data as entries.

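(7.20) $\begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} u_0 \\ u_1 \\ u_2 \\ u_3 \end{bmatrix} = \begin{bmatrix} u_0 + u_2 \\ u_1 + u_3 \end{bmatrix}$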

and the reduction in Equation 7.19 is

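(7.21) $\begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix} \begin{bmatrix} u_0 \\ u_1 \\ u_2 \\ u_3 \end{bmatrix} = \begin{bmatrix} u_0 - u_2 \\ u_1 - u_3 \end{bmatrix}$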

Combining Equation 7.20 and Equation 7.21 gives one operator

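(7.22) $\begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix} \begin{bmatrix} u_0 \\ u_1 \\ u_2 \\ u_3 \end{bmatrix} = \begin{bmatrix} u_0 + u_2 \\ u_1 + u_3 \\ u_0 - u_2 \\ u_1 - u_3 \end{bmatrix} = \begin{bmatrix} w_0 \\ w_1 \\ v_0 \\ v_1 \end{bmatrix}$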

Further reduction of $v_0 + v_1 s$ is not possible because $P_3 = s^2 + 1$ cannot be factored over the rationals. However, $s^2 - 1$ can be factored into $P_1 P_2 = (s - 1)(s + 1)$ and, therefore, $w_0 + w_1 s$ can be further reduced as was done in Equation 7.17 and Equation 7.18 by

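(7.23) $\begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \end{bmatrix} = w_0 + w_1 = u_0 + u_2 + u_1 + u_3$
(7.24) $\begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \end{bmatrix} = w_0 - w_1 = u_0 + u_2 - u_1 - u_3$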

Combining Equation 7.22, Equation 7.23 and Equation 7.24 gives

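(7.25) $\begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix} \begin{bmatrix} u_0 \\ u_1 \\ u_2 \\ u_3 \end{bmatrix} = \begin{bmatrix} r_0 \\ r_1 \\ v_0 \\ v_1 \end{bmatrix}$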
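The combined reduction operator can be formed and checked numerically. The following is a minimal sketch assuming NumPy; the names `stage1`, `stage2`, and the test vector are illustrative. It builds the product of the two stage matrices and compares its output with U(s) evaluated at the roots of the three factors, s = 1, s = -1, and s = j.

```python
import numpy as np

# First stage: residues modulo s^2 - 1 (rows 0-1) and s^2 + 1 (rows 2-3).
stage1 = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [1, 0, -1, 0],
                   [0, 1, 0, -1]])
# Second stage: split the s^2 - 1 residue into residues modulo s - 1 and s + 1.
stage2 = np.array([[1, 1, 0, 0],
                   [1, -1, 0, 0],
                   [0, 0, 1, 0],
                   [0, 0, 0, 1]])
A = stage2 @ stage1                  # combined reduction operator
print(A)

u = np.array([1, 2, 3, 4])           # u_0, u_1, u_2, u_3 (arbitrary test data)
reduced = A @ u                      # residues in the order s - 1, s + 1, then s^2 + 1

# Cross-check: a residue modulo (s - a) is U(a); modulo s^2 + 1 it is U(j).
U_at = lambda s: np.polyval(u[::-1], s)   # polyval wants highest power first
print(reduced, U_at(1), U_at(-1), U_at(1j))
```

Every entry of the combined matrix is zero or plus or minus unity, so applying it costs additions only.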

The same reduction is done to H(s). The convolution of Equation 7.11 is then done by multiplying each residue polynomial of U(s) and H(s) modulo each corresponding cyclotomic factor of P(s), and finally recombining using the polynomial Chinese Remainder Theorem (CRT) as in Polynomial Description of Signals: Equation 9 and Polynomial Description of Signals: Equation 13.

(7.26) Y ( s ) = K1 ( s ) U1 ( s ) H1 ( s ) + K2 ( s ) U2 ( s ) H2 ( s ) + K3 ( s ) U3 ( s )