In 1976, S. Winograd [20] presented a new DFT algorithm that required significantly fewer multiplications than the Cooley-Tukey FFT, which had been published eleven years earlier. This new Winograd Fourier Transform Algorithm (WFTA) is based on the type-one index map from Multidimensional Index Mapping, with each of the relatively prime length short DFTs calculated by very efficient special algorithms. It is these short algorithms that this section will develop. They use the index permutation of Rader, described in another module, to convert the prime-length short DFTs into cyclic convolutions. Winograd developed a method for calculating digital convolution with the minimum number of multiplications. These optimal algorithms are based on the polynomial residue reduction techniques of Polynomial Description of Signals: Equation 1, which break the convolution into multiple small ones [2], [12], [14], [23], [21], [9].
The operation of discrete convolution defined by
$$y(n) = \sum_{k} h(n-k)\,x(k) \tag{7.1}$$
is called a bilinear operation because, for a fixed h(n), y(n) is a linear function of x(n) and for a fixed x(n) it is a linear function of h(n). The operation of cyclic convolution is the same but with all indices evaluated modulo N.
Recall from Polynomial Description of Signals: Equation 3 that length-N cyclic convolution of x(n) and h(n) can be represented by polynomial multiplication
$$Y(s) = X(s)\,H(s) \bmod (s^N - 1) \tag{7.2}$$
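The equivalence between cyclic convolution and polynomial multiplication modulo s^N - 1 can be checked numerically. The following is a minimal sketch in plain Python; the function names `cyclic_convolve` and `poly_multiply_mod` are illustrative choices, not taken from the text, and polynomials are assumed to be stored as coefficient lists with the lowest power first.

```python
def cyclic_convolve(x, h):
    """Length-N cyclic convolution: y(n) = sum_k h((n - k) mod N) x(k)."""
    n_len = len(x)
    return [sum(h[(n - k) % n_len] * x[k] for k in range(n_len))
            for n in range(n_len)]


def poly_multiply_mod(x, h):
    """Coefficients of X(s) H(s) mod (s^N - 1), lowest power first."""
    n_len = len(x)
    y = [0] * n_len
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            # s^(i+j) reduces to s^((i+j) mod N) because s^N = 1 mod (s^N - 1)
            y[(i + j) % n_len] += xi * hj
    return y


x = [1, 2, 3, 4]
h = [5, 6, 7, 8]
print(cyclic_convolve(x, h))    # [66, 68, 66, 60]
print(poly_multiply_mod(x, h))  # [66, 68, 66, 60]
```

Both routines cost N^2 multiplications; they serve only as a reference against which the reduced-multiplication forms developed in this section can be compared.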
This bilinear operation of Equation 7.1 and Equation 7.2 can also be expressed in terms of linear matrix operators and a simpler bilinear operator, denoted by o, which may be only a simple element-by-element multiplication of the two vectors [12], [9], [10]. This matrix formulation is
$$Y = C\,[\,AX \; o \; BH\,] \tag{7.3}$$
where X, H, and Y are length-N vectors with elements of x(n), h(n), and y(n) respectively. The matrices A and B have dimension M x N, and C is N x M with M≥N. The elements of A, B, and C are constrained to be simple, typically small integers or rational numbers. These matrix operators do the equivalent of the residue reduction on the polynomials in Equation 7.2.
In order to derive a useful algorithm of the form Equation 7.3 to calculate Equation 7.1, consider the polynomial formulation Equation 7.2 again. To use the residue reduction scheme, the modulus is factored into relatively prime factors. Fortunately the factoring of this particular polynomial, $s^N - 1$, has been extensively studied and it has considerable structure. When it is factored over the rationals, which means that the only coefficients allowed are rational numbers, the factors are called cyclotomic polynomials [2], [12], [14]. The most interesting property for our purposes is that most of the coefficients of cyclotomic polynomials are zero and the others are plus or minus unity for every N less than 105. This means the residue reduction will generally require no multiplications.
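The factoring of s^N - 1 into cyclotomic polynomials can be sketched with integer polynomial division alone. The code below is an illustration under stated assumptions: it relies on the standard identity that s^n - 1 is the product of the cyclotomic polynomials Phi_d(s) over all divisors d of n, stores coefficients lowest power first, and uses the hypothetical helper names `divide_exact`, `cyclotomic`, and `cyclotomic_factors`.

```python
from functools import lru_cache


def divide_exact(num, den):
    """Quotient num / den for integer polynomials (lowest power first).

    den must be monic and must divide num with zero remainder.
    """
    rem = list(num)
    quot = [0] * (len(num) - len(den) + 1)
    for i in range(len(quot) - 1, -1, -1):
        q = rem[i + len(den) - 1]      # den is monic, so no division is needed
        quot[i] = q
        for j, d in enumerate(den):
            rem[i + j] -= q * d
    assert not any(rem), "division left a remainder"
    return quot


@lru_cache(maxsize=None)
def cyclotomic(n):
    """Phi_n(s), obtained by dividing s^n - 1 by Phi_d(s) for each proper divisor d."""
    poly = [-1] + [0] * (n - 1) + [1]          # s^n - 1
    for d in range(1, n):
        if n % d == 0:
            poly = divide_exact(poly, cyclotomic(d))
    return tuple(poly)


def cyclotomic_factors(n):
    """Cyclotomic factors of s^n - 1, keyed by the divisor d of n."""
    return {d: cyclotomic(d) for d in range(1, n + 1) if n % d == 0}


print(cyclotomic_factors(4))
# {1: (-1, 1), 2: (1, 1), 4: (1, 0, 1)}, i.e. (s - 1)(s + 1)(s^2 + 1)
```

Because every factor has coefficients of only 0 and ±1 for the short lengths of interest, reducing a data polynomial modulo any of them needs additions and subtractions only.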
The operations of reducing X(s) and H(s) in Equation 7.2 are carried out by the matrices A and B in Equation 7.3. The convolution of the residue polynomials is carried out by the o operator, and the recombination by the CRT is done by the C matrix. More details are in [2], [12], [14], [9], [10], but the important fact is that the A and B matrices usually contain only zero and plus or minus unity entries and the C matrix contains only rational numbers. The only general multiplications are those represented by o. Indeed, in the theoretical results from computational complexity theory, these real or complex multiplications are usually the only ones counted. In practical algorithms, the rational multiplications represented by C could be a limiting factor.
The h(n) terms are fixed for a digital filter, or they represent the W terms from Multidimensional Index Mapping: Equation 1 if the convolution is being used to calculate a DFT. Because of this, d=BH in Equation 7.3 can be precalculated, and only the A and C operators represent the mathematics done at execution of the algorithm. In order to exploit this feature, it was shown [23], [9] that the properties of Equation 7.3 allow the exchange of the more complicated operator C with the simpler operator B. Specifically this is given by
$$Y = C\,[\,AX \; o \; BH\,] \tag{7.4}$$
$$Y' = B^T\,[\,AX \; o \; C^T H'\,] \tag{7.5}$$
where H' has the same elements as H, but in a permuted order, and likewise Y' and Y. This very important property allows precomputing the more complicated $C^T H'$ in Equation 7.5 rather than BH as in Equation 7.3.
Because BH or $C^T H'$ can be precomputed, the bilinear form of Equation 7.3 and Equation 7.5 can be written as a linear form. If an M x M diagonal matrix D is formed from $d = C^T H'$, or in the case of Equation 7.3, d=BH, assuming a commutative property for o, Equation 7.5 becomes
$$Y' = B^T D A X \tag{7.6}$$
and Equation 7.3 becomes
$$Y = C D A X \tag{7.7}$$
In most cases there is no reason not to use the same reduction operations on X and H. Therefore B can be the same as A, and Equation 7.6 then becomes
$$Y' = A^T D A X \tag{7.8}$$
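As a small illustration of the bilinear form of Equation 7.3 and its linear counterpart with a precomputed diagonal, consider a length-2 cyclic convolution, where s^2 - 1 = (s - 1)(s + 1). The matrices below are an assumed example chosen for this sketch and do not come from the text: A = B reduces modulo the two factors, and C recombines with the rational factor 1/2. The helper names are hypothetical.

```python
from fractions import Fraction

# Assumed length-2 operators: A = B reduces mod (s - 1) and (s + 1); C recombines.
A = [[1, 1],
     [1, -1]]
C = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 2), Fraction(-1, 2)]]


def mat_vec(m, v):
    """Matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def bilinear_convolve(x, h):
    """Y = C [AX o BH] with B = A and o taken as element-wise multiplication."""
    ax, bh = mat_vec(A, x), mat_vec(A, h)
    return mat_vec(C, [p * q for p, q in zip(ax, bh)])


def make_filter(h):
    """Precompute d = BH once and return a function computing Y = C D A X."""
    d = mat_vec(A, h)

    def apply(x):
        ax = mat_vec(A, x)
        # Multiplying by the diagonal matrix D is an element-wise product with d.
        return mat_vec(C, [di * ai for di, ai in zip(d, ax)])

    return apply


fixed_filter = make_filter([2, 7])
print([int(v) for v in bilinear_convolve([3, 5], [2, 7])])  # [41, 31]
print([int(v) for v in fixed_filter([3, 5])])               # [41, 31]
```

Only the two products with d are general multiplications here. The additions in A and the halving in C are the "simple" operations the formulation sets aside.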
In order to illustrate how the residue reduction is carried out and how the A matrix is obtained, the length-5 DFT algorithm started in The DFT as Convolution or Filtering: Matrix 1 will be continued. The DFT is first converted to a length-4 cyclic convolution by the index permutation from The DFT as Convolution or Filtering: Equation 3 to give the cyclic convolution in The DFT as Convolution or Filtering. To avoid confusion from the permuted order of the data x(n) in The DFT as Convolution or Filtering, the cyclic convolution will first be developed without the permutation, using the polynomial U(s)
$$U(s) = u_0 + u_1 s + u_2 s^2 + u_3 s^3$$
and then the results will be converted back to the permuted x(n). The length-4 cyclic convolution in terms of polynomials is
$$Y(s) = U(s)\,H(s) \bmod (s^4 - 1) \tag{7.11}$$
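and the modulus factors into three cyclotomic polynomials
$$s^4 - 1 = (s^2 - 1)(s^2 + 1) \tag{7.12}$$
$$\phantom{s^4 - 1} = (s - 1)(s + 1)(s^2 + 1) \tag{7.13}$$
$$\phantom{s^4 - 1} = P_1 P_2 P_3 \tag{7.14}$$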
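Both U(s) and H(s) are reduced modulo these three polynomials. The reduction modulo P1 and P2 is done in two stages. First it is done modulo $(s^2 - 1)$, then that residue is further reduced modulo (s–1) and (s+1).
$$U(s) = u_0 + u_1 s + u_2 s^2 + u_3 s^3 \tag{7.15}$$
$$U(s) \bmod (s^2 - 1) = (u_0 + u_2) + (u_1 + u_3)\,s = w_0 + w_1 s \tag{7.16}$$
$$(w_0 + w_1 s) \bmod (s - 1) = w_0 + w_1 = u_0 + u_1 + u_2 + u_3 \tag{7.17}$$
$$(w_0 + w_1 s) \bmod (s + 1) = w_0 - w_1 = u_0 - u_1 + u_2 - u_3 \tag{7.18}$$
$$U(s) \bmod (s^2 + 1) = (u_0 - u_2) + (u_1 - u_3)\,s = v_0 + v_1 s \tag{7.19}$$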
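The two-stage reduction can be checked with ordinary polynomial long division. The sketch below assumes coefficient lists ordered lowest power first and a monic modulus; `poly_remainder` is a hypothetical helper name and the sample data values are arbitrary.

```python
def poly_remainder(poly, modulus):
    """Remainder of poly divided by a monic modulus (coefficients lowest power first)."""
    rem = list(poly)
    deg_m = len(modulus) - 1
    for i in range(len(rem) - 1, deg_m - 1, -1):
        q = rem[i]                       # leading coefficient to cancel
        for j, m in enumerate(modulus):
            rem[i - deg_m + j] -= q * m
    return rem[:deg_m]


u = [1, 2, 3, 4]                              # u0, u1, u2, u3

w = poly_remainder(u, [-1, 0, 1])             # mod s^2 - 1
print(w)                                      # [4, 6]  = [u0 + u2, u1 + u3]
print(poly_remainder(w, [-1, 1]))             # [10]    = u0 + u1 + u2 + u3
print(poly_remainder(w, [1, 1]))              # [-2]    = u0 - u1 + u2 - u3
print(poly_remainder(u, [1, 0, 1]))           # [-2, -2] = [u0 - u2, u1 - u3]
```

Because every modulus has coefficients of 0 and ±1, each step of the division is an addition or subtraction of data values, which is why the reduction needs no multiplications.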
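The reduction in Equation 7.16 of the data polynomial Equation 7.15 can be denoted by a matrix operation on a vector which has the data as entries,
$$\begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} u_0 \\ u_1 \\ u_2 \\ u_3 \end{bmatrix} = \begin{bmatrix} u_0 + u_2 \\ u_1 + u_3 \end{bmatrix} \tag{7.20}$$
and the reduction in Equation 7.19 is
$$\begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix} \begin{bmatrix} u_0 \\ u_1 \\ u_2 \\ u_3 \end{bmatrix} = \begin{bmatrix} u_0 - u_2 \\ u_1 - u_3 \end{bmatrix} \tag{7.21}$$
Combining Equation 7.20 and Equation 7.21 gives one operator
$$\begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix} \begin{bmatrix} u_0 \\ u_1 \\ u_2 \\ u_3 \end{bmatrix} = \begin{bmatrix} u_0 + u_2 \\ u_1 + u_3 \\ u_0 - u_2 \\ u_1 - u_3 \end{bmatrix} = \begin{bmatrix} w_0 \\ w_1 \\ v_0 \\ v_1 \end{bmatrix} \tag{7.22}$$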
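Further reduction of $v_0 + v_1 s$ is not possible because $P_3 = s^2 + 1$ cannot be factored over the rationals. However, $s^2 - 1$ can be factored into $P_1 P_2 = (s - 1)(s + 1)$ and, therefore, $w_0 + w_1 s$ can be further reduced as was done in Equation 7.17 and Equation 7.18 by
$$\begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \end{bmatrix} = w_0 + w_1 \tag{7.23}$$
$$\begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \end{bmatrix} = w_0 - w_1 \tag{7.24}$$
Combining Equation 7.22, Equation 7.23 and Equation 7.24 gives
$$\begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix} \begin{bmatrix} u_0 \\ u_1 \\ u_2 \\ u_3 \end{bmatrix} = \begin{bmatrix} w_0 + w_1 \\ w_0 - w_1 \\ v_0 \\ v_1 \end{bmatrix} \tag{7.25}$$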
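The combined reduction operator can be formed by multiplying the two stage matrices. The sketch below uses plain nested lists; the names `STAGE_1`, `STAGE_2`, `mat_mul`, and `mat_vec` are illustrative, and the data values are arbitrary.

```python
def mat_mul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_vec(m, v):
    """Matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# First stage: reduce mod (s^2 - 1) and mod (s^2 + 1).
STAGE_1 = [[1, 0, 1, 0],
           [0, 1, 0, 1],
           [1, 0, -1, 0],
           [0, 1, 0, -1]]

# Second stage: reduce w0 + w1 s mod (s - 1) and (s + 1); pass v0, v1 through.
STAGE_2 = [[1, 1, 0, 0],
           [1, -1, 0, 0],
           [0, 0, 1, 0],
           [0, 0, 0, 1]]

A = mat_mul(STAGE_2, STAGE_1)
for row in A:
    print(row)
# [1, 1, 1, 1]
# [1, -1, 1, -1]
# [1, 0, -1, 0]
# [0, 1, 0, -1]

print(mat_vec(A, [1, 2, 3, 4]))  # [10, -2, -2, -2]
```

Keeping the two stages separate, rather than using the product directly, takes 6 additions instead of 8 for this length, since the sums u0 + u2 and u1 + u3 are each formed once and reused.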
The same reduction is applied to H(s). The convolution of Equation 7.11 is then carried out by multiplying the corresponding residue polynomials of X(s) and H(s) modulo each cyclotomic factor of P(s), and the results are recombined using the polynomial Chinese Remainder Theorem (CRT) as in Polynomial Description of Signals: Equation 9 and Polynomial Description of Signals: Equation 13.
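The whole procedure for the length-4 case (reduce, multiply the residues, recombine) can be sketched as follows. This is an illustration under stated assumptions rather than the minimal-multiplication algorithm: the product modulo s^2 + 1 is written with four multiplications for readability, the CRT recombination is written out by solving the four residue equations directly, and the function names are hypothetical.

```python
from fractions import Fraction


def reduce4(u):
    """Residues of u0 + u1 s + u2 s^2 + u3 s^3 mod (s - 1), (s + 1), (s^2 + 1)."""
    u0, u1, u2, u3 = u
    return u0 + u1 + u2 + u3, u0 - u1 + u2 - u3, u0 - u2, u1 - u3


def cyclic_convolve_4(u, h):
    """Length-4 cyclic convolution via residue reduction and CRT recombination."""
    r0, r1, v0, v1 = reduce4(u)
    g0, g1, k0, k1 = reduce4(h)

    a = r0 * g0                     # product mod (s - 1)
    b = r1 * g1                     # product mod (s + 1)
    c0 = v0 * k0 - v1 * k1          # product mod (s^2 + 1), using s^2 = -1
    c1 = v0 * k1 + v1 * k0

    # Recombination: a = y0+y1+y2+y3, b = y0-y1+y2-y3, c0 = y0-y2, c1 = y1-y3.
    half = Fraction(1, 2)
    even = half * (a + b)           # y0 + y2
    odd = half * (a - b)            # y1 + y3
    return [half * (even + c0), half * (odd + c1),
            half * (even - c0), half * (odd - c1)]


def cyclic_convolve_direct(u, h):
    """Reference result from the defining sum with indices modulo 4."""
    return [sum(h[(n - k) % 4] * u[k] for k in range(4)) for n in range(4)]


u, h = [1, 2, 3, 4], [5, 6, 7, 8]
print([int(y) for y in cyclic_convolve_4(u, h)])  # [66, 68, 66, 60]
print(cyclic_convolve_direct(u, h))               # [66, 68, 66, 60]
```

When h is fixed, its residues g0, g1, k0, k1 and the factors of 1/2 can be computed ahead of time, which is the precomputation the linear forms above exploit.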