Automatic Generation of Prime Length FFT Programs by C. Sidney Burrus

Chapter 2. Preliminaries

Because we compute prime point DFTs by converting them into circular convolutions, most of this section and the next is devoted to explaining the split nesting convolution algorithm. In this section we introduce the operations needed to carry out the split nesting algorithm. In particular, we describe the prime factor permutation that converts a one-dimensional circular convolution into a multi-dimensional one. We also discuss the reduction operations needed when the Chinese Remainder Theorem for polynomials is used to compute convolutions. The reduction operations needed for the split nesting algorithm are particularly well organized. We give an explicit matrix description of these reduction operations and a program that implements their action.

The presentation relies upon the notions of similarity transformations, companion matrices and Kronecker products. With them, we describe the split nesting algorithm in a manner that brings out its structure. We find that when companion matrices are used to describe convolution, the reduction operations block diagonalize the circular shift matrix.

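The companion matrix of a monic polynomial, M(s)=m₀+m₁s+⋯+m_{n–1}s^{n–1}+sⁿ, is given by

(2.1) C_M = \begin{bmatrix} 0 & & & & -m_0 \\ 1 & 0 & & & -m_1 \\ & 1 & \ddots & & \vdots \\ & & \ddots & 0 & -m_{n-2} \\ & & & 1 & -m_{n-1} \end{bmatrix}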

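Its usefulness in the following discussion comes from a relation that permits a matrix formulation of convolution. Let

(2.2) X(s)=x₀+x₁s+⋯+x_{n–1}s^{n–1}, H(s)=h₀+h₁s+⋯+h_{n–1}s^{n–1}, Y(s)=y₀+y₁s+⋯+y_{n–1}s^{n–1}, M(s)=m₀+m₁s+⋯+m_{n–1}s^{n–1}+sⁿ

Then

(2.3) Y(s) ≡ H(s)X(s) mod M(s) ⟺ y = (∑_{k=0}^{n–1} h_k C_M^k) x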

where y=(y₀,⋯,y_{n–1})^t, x=(x₀,⋯,x_{n–1})^t, and C_M is the companion matrix of M(s). In Equation 2.3, we say y is the convolution of x and h with respect to M(s). In the case of circular convolution, M(s)=sⁿ–1 and C_{sⁿ–1} is the circular shift matrix denoted by S_n,

(2.4) S_n = \begin{bmatrix} & & & 1 \\ 1 & & & \\ & \ddots & & \\ & & 1 & \end{bmatrix}

Notice that any circulant matrix can be written as ∑_k h_k S_n^k.
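The matrix formulation of convolution can be checked numerically. The following Python sketch assumes the companion matrix has ones on its subdiagonal and –m₀,⋯,–m_{n–1} in its last column, forms y = (∑_k h_k C_M^k) x, and compares the result with a direct multiply-and-reduce of H(s)X(s) modulo M(s). The function names are ours and are introduced only for illustration; they are not the programs of the appendices.

```python
def companion(m):
    """Companion matrix of the monic M(s) = m[0] + m[1] s + ... + m[n-1] s^(n-1) + s^n."""
    n = len(m)
    C = [[0] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = 1            # ones on the subdiagonal
    for i in range(n):
        C[i][n - 1] = -m[i]        # last column holds -m_0, ..., -m_{n-1}
    return C


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def conv_via_companion(h, x, m):
    """y = (sum_k h_k C_M^k) x, accumulating C_M^k x one power at a time."""
    C = companion(m)
    y = [0] * len(m)
    v = list(x)                    # v holds C_M^k x for the current k
    for hk in h:
        y = [yi + hk * vi for yi, vi in zip(y, v)]
        v = matvec(C, v)
    return y


def polymul_mod(h, x, m):
    """Coefficients of H(s) X(s) reduced modulo M(s), lowest degree first."""
    n = len(m)
    prod = [0] * (2 * n - 1)
    for i, hi in enumerate(h):
        for j, xj in enumerate(x):
            prod[i + j] += hi * xj
    for d in range(2 * n - 2, n - 1, -1):
        c, prod[d] = prod[d], 0    # replace s^d using s^n = -(m_0 + ... + m_{n-1} s^(n-1))
        for i in range(n):
            prod[d - n + i] -= c * m[i]
    return prod[:n]
```

With m = [-1, 0, 0], that is M(s)=s³–1, conv_via_companion([1, 2, 3], [4, 5, 6], m) is the 3-point circular convolution of the two sequences.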

Similarity transformations can be used to interpret the action of some convolution algorithms. If C_M=T^–1AT for some matrix T (C_M and A are similar, denoted C_M∼A), then Equation 2.3 becomes

(2.5) y = T^–1 (∑_k h_k A^k) T x

That is, by employing the similarity transformation given by T in this way, the action of S^k_n is replaced by that of A^k. Many circular convolution algorithms can be understood, in part, by understanding the manipulations made to S_n and the resulting new matrix A. If the transformation T is to be useful, it must satisfy two requirements: (1) Tx must be simple to compute, and (2) A must have some advantageous structure. For example, by the convolution property of the DFT, the DFT matrix F diagonalizes S_n,

(2.6) S_n = F^–1 diag(w⁰, w¹, ⋯, w^{n–1}) F (with w=e^{–j2π/n} when the entries of F are w^{kl}, 0≤k,l≤n–1)

so that it diagonalizes every circulant matrix. In this case, Tx can be computed by using an FFT and the structure of A is the simplest possible, so both of the conditions above are met.
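As a numerical check of this diagonalization, the sketch below forms F S_n F^–1 in plain Python. It assumes the convention S_n e_k = e_{(k+1) mod n} and a DFT matrix with entries e^{–j2πkl/n}; with a different sign convention the diagonal entries are conjugated. The helper names are hypothetical.

```python
import cmath


def shift_matrix(n):
    # S_n e_k = e_{(k+1) mod n}: column c has its single 1 in row (c+1) mod n
    return [[1 if r == (c + 1) % n else 0 for c in range(n)] for r in range(n)]


def dft_matrix(n, inverse=False):
    # forward entries exp(-2*pi*j*k*l/n); the inverse uses the opposite sign and 1/n
    sign = 1 if inverse else -1
    scale = n if inverse else 1
    return [[cmath.exp(sign * 2j * cmath.pi * k * l / n) / scale
             for l in range(n)] for k in range(n)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def diagonalized_shift(n):
    """Return A = F S_n F^{-1}, which should be diagonal up to rounding error."""
    F = dft_matrix(n)
    F_inv = dft_matrix(n, inverse=True)
    return matmul(matmul(F, shift_matrix(n)), F_inv)
```

Up to rounding, diagonalized_shift(n) has the n-th roots of unity e^{–j2πk/n} on its diagonal and zeros elsewhere.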

The Winograd structure can be described in this manner also. If M(s) can be factored as M(s)=M₁(s)M₂(s), where M₁ and M₂ have no common roots, then C_M∼(C_{M₁} ⊕ C_{M₂}), where ⊕ denotes the matrix direct sum. Using this similarity and recalling Equation 2.3, the original convolution is decomposed into disjoint convolutions. This is, in fact, a statement of the Chinese Remainder Theorem for polynomials expressed in matrix notation. In the case of circular convolution, sⁿ–1=∏_{d|n} Φ_d(s), so that S_n can be transformed to a block diagonal matrix,

(2.7) S_n = T^–1 (⊕_{d|n} C_{Φ_d}) T

where Φ_d(s) is the d^th cyclotomic polynomial. In this case, each block represents a convolution with respect to a cyclotomic polynomial, or a "cyclotomic convolution". Winograd's approach carries out these cyclotomic convolutions using the Toom-Cook algorithm. Note that for each divisor, d, of n there is a corresponding block on the diagonal of size φ(d), for the degree of Φ_d(s) is φ(d), where φ(·) is the Euler totient function. This method is good for short lengths, but as n increases the cyclotomic convolutions become cumbersome: as the number of distinct prime divisors of d increases, the operation described by ∑_k h_k (C_{Φ_d})^k becomes more difficult to implement.
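The factorization of sⁿ–1 into cyclotomic polynomials, and the resulting block sizes, can be illustrated with a short Python sketch. It computes Φ_d(s) by dividing s^d–1 by the cyclotomic polynomials of the proper divisors of d, a standard recursion chosen here for brevity rather than efficiency. Polynomials are stored as coefficient sequences, lowest degree first, and all names are ours.

```python
from functools import lru_cache


def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def poly_div_exact(num, den):
    """Quotient of integer polynomials when den (monic here) divides num exactly."""
    num = list(num)
    quo = [0] * (len(num) - len(den) + 1)
    for i in range(len(quo) - 1, -1, -1):
        quo[i] = num[i + len(den) - 1] // den[-1]
        for j, dj in enumerate(den):
            num[i + j] -= quo[i] * dj
    if any(num):
        raise ValueError("division is not exact")
    return quo


@lru_cache(maxsize=None)
def cyclotomic(d):
    """Phi_d(s) = (s^d - 1) / product of Phi_e(s) over proper divisors e of d."""
    poly = [-1] + [0] * (d - 1) + [1]          # s^d - 1
    for e in range(1, d):
        if d % e == 0:
            poly = poly_div_exact(poly, cyclotomic(e))
    return tuple(poly)


def block_sizes(n):
    """Map each divisor d of n to deg Phi_d, the size of the block C_{Phi_d}."""
    return {d: len(cyclotomic(d)) - 1 for d in range(1, n + 1) if n % d == 0}
```

For n=12 the divisors 1, 2, 3, 4, 6, 12 give blocks of sizes 1, 1, 2, 2, 2, 4, which sum to 12 as the block diagonal form requires.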

The Agarwal-Cooley Algorithm utilizes the fact that

(2.8) S_n = P^t (S_n₁ ⊗ S_n₂) P

where n=n₁n₂, n₁ and n₂ are relatively prime, and P is an appropriate permutation [1]. This converts the one-dimensional circular convolution of length n to a two-dimensional one of length n₁ along one dimension and length n₂ along the second. Then an n₁-point and an n₂-point circular convolution algorithm can be combined to obtain an n-point algorithm. In polynomial notation, the mapping accomplished by this permutation P can be informally indicated by

(2.9) Y(s) ≡ X(s)H(s) mod (sⁿ–1) ⟺ Y(s,t) ≡ X(s,t)H(s,t) mod (s^{n₁}–1, t^{n₂}–1)

It should be noted that Equation 2.8 implies that a circulant matrix of size n₁n₂ can be written as a block circulant matrix with circulant blocks.
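The conversion of a one-dimensional circular convolution into a two-dimensional one can be sketched in Python as follows. The index map used here, sending k to the pair (k mod n₁, k mod n₂), is one standard choice of permutation for relatively prime n₁ and n₂; it is an assumption of this sketch and not necessarily the ordering used by the programs in the appendices.

```python
def circular_conv(h, x):
    """Direct n-point circular convolution."""
    n = len(x)
    return [sum(h[k] * x[(i - k) % n] for k in range(n)) for i in range(n)]


def to_2d(v, n1, n2):
    """Place v[k] at row k mod n1, column k mod n2 (a bijection when gcd(n1, n2) = 1)."""
    grid = [[0] * n2 for _ in range(n1)]
    for k, vk in enumerate(v):
        grid[k % n1][k % n2] = vk
    return grid


def from_2d(grid, n1, n2):
    """Inverse of to_2d: read the entries back in their original order."""
    return [grid[k % n1][k % n2] for k in range(n1 * n2)]


def circular_conv_2d(H, X):
    """Circular convolution of length n1 along the rows and n2 along the columns."""
    n1, n2 = len(X), len(X[0])
    return [[sum(H[a][b] * X[(i - a) % n1][(j - b) % n2]
                 for a in range(n1) for b in range(n2))
             for j in range(n2)] for i in range(n1)]


def conv_by_prime_factor_map(h, x, n1, n2):
    """n1*n2-point circular convolution computed as a two-dimensional one."""
    Y = circular_conv_2d(to_2d(h, n1, n2), to_2d(x, n1, n2))
    return from_2d(Y, n1, n2)
```

In practice the two-dimensional convolution would be carried out by nesting short n₁-point and n₂-point algorithms; the direct double sum above is used only to verify that the mapping is correct.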

The Split-Nesting algorithm [3] combines the structures of the Winograd and Agarwal-Cooley methods, so that S_n is transformed to a block diagonal matrix as in Equation 2.7,

(2.10) S_n = T^–1 (⊕_{d|n} Ψ(d)) T

Here Ψ(d)=⊗_{p|d, p∈𝒫} C_{Φ_{H_d(p)}}, where H_d(p) is the highest power of p dividing d, and 𝒫 is the set of primes.

Example 2.1.

(2.11)

In this structure a multidimensional cyclotomic convolution, represented by Ψ(d), replaces each cyclotomic convolution in Winograd's algorithm (represented by C_{Φ_d} in Equation 2.7). Indeed, if the product of b₁,⋯,b_k is d and they are pairwise relatively prime, then C_{Φ_d}∼C_{Φ_{b₁}}⊗⋯⊗C_{Φ_{b_k}}. This gives a method for combining cyclotomic convolutions to compute a longer circular convolution. It is like the Agarwal-Cooley method but requires fewer additions [3].
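The bookkeeping behind Ψ(d) can be sketched in Python. For a given d the code lists the prime powers H_d(p), which index the Kronecker factors C_{Φ_{H_d(p)}}, and checks that the dimension of Ψ(d) equals φ(d), the size of the block C_{Φ_d} it replaces. The value d=45 in the usage note is an arbitrary illustration, and the function names are ours.

```python
from math import gcd


def prime_power_parts(d):
    """Return [H_d(p) for each prime p dividing d], in increasing order of p."""
    parts, p = [], 2
    while d > 1:
        if d % p == 0:
            q = 1
            while d % p == 0:      # strip every factor of p from d
                d //= p
                q *= p
            parts.append(q)
        p += 1
    return parts


def totient(m):
    """Euler totient by direct count (adequate for small m)."""
    return sum(1 for k in range(1, m + 1) if gcd(k, m) == 1)


def psi_dimension(d):
    """Dimension of Psi(d): the product of phi(H_d(p)) over primes p dividing d."""
    dim = 1
    for q in prime_power_parts(d):
        dim *= totient(q)
    return dim
```

For instance, prime_power_parts(45) returns [9, 5], so under the definition above Ψ(45) is the Kronecker product of C_{Φ_9} and C_{Φ_5}, of dimension 6·4=24=φ(45).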

Prime Factor Permutations

One can obtain S_n₁⊗S_n₂ from S_n₁n₂ when gcd(n₁,n₂)=1, for in this case S_n is similar to S_n₁⊗S_n₂, where n=n₁n₂. Moreover, they are related by a permutation. This permutation is that of the prime factor FFT algorithms and is employed in nesting algorithms for circular convolution [1], [2]. The permutation is described by Zalcstein [7], among others, and it is his description we draw on in the following.

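Let n=n₁n₂ where gcd(n₁,n₂)=1. Define e_k, (0≤k≤n–1), to be the standard basis vector, (0,⋯,0,1,0,⋯,0)^t, where the 1 is in the k^th position. Then, the circular shift matrix, S_n, can be described by

(2.12) S_n e_k = e_{〈k+1〉_n}

where 〈k〉_n denotes k modulo n. Note that, by inspection,

(2.13) (S_n₁ ⊗ S_n₂) e_{b+n₂a} = e_{〈b+1〉_{n₂} + n₂〈a+1〉_{n₁}}

where 0≤a≤n₁–1 and 0≤b≤n₂–1. Because n₁ and n₂ are relatively prime a permutation matrix P can be defined by

(2.14) P e_k = e_{〈k〉_{n₂} + n₂〈k〉_{n₁}}

With this P,

(2.15) P S_n e_k = P e_{〈k+1〉_n} = e_{〈k+1〉_{n₂} + n₂〈k+1〉_{n₁}}

and

(2.16) (S_n₁ ⊗ S_n₂) P e_k = (S_n₁ ⊗ S_n₂) e_{〈k〉_{n₂} + n₂〈k〉_{n₁}} = e_{〈k+1〉_{n₂} + n₂〈k+1〉_{n₁}}

Since P S_n = (S_n₁ ⊗ S_n₂) P and P^–1=P^t, one gets, in the multi-factor case, the following.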

Lemma

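If n=n₁⋯n_k and n₁,...,n_k are pairwise relatively prime, then S_n = P^t (S_n₁ ⊗ ⋯ ⊗ S_{n_k}) P where P is the permutation matrix given by P e_j = e_{r(j)}, r(j) = ∑_{i=1}^{k} (j mod n_i) n_{i+1}⋯n_k.

This useful permutation will be denoted here as P_{n₁,⋯,n_k}. If n=p₁^e₁p₂^e₂⋯p_k^e_k then, with n_i=p_i^{e_i}, this permutation yields the matrix S_{p₁^e₁}⊗⋯⊗S_{p_k^e_k}. This product can be written simply as ⊗_{i=1}^{k} S_{p_i^{e_i}}, so that one has S_n = P^t_{n₁,⋯,n_k} (⊗_{i=1}^{k} S_{p_i^{e_i}}) P_{n₁,⋯,n_k}.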
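The similarity between S_n and a Kronecker product of shorter shift matrices can be verified numerically with small explicit matrices. The sketch below assumes one particular indexing convention: P e_j = e_{r(j)}, where r(j) is built from the residues j mod n₁,⋯,j mod n_k with the last factor varying fastest, which matches the usual Kronecker product ordering. Other conventions permute the factors. All function names are hypothetical.

```python
def shift_matrix(n):
    # S_n e_k = e_{(k+1) mod n}
    return [[1 if r == (c + 1) % n else 0 for c in range(n)] for r in range(n)]


def kron(A, B):
    """Standard Kronecker product: the index of B varies fastest."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def pfp_index(j, factors):
    """r(j): mixed-radix index built from the residues j mod n_i."""
    idx = 0
    for ni in factors:
        idx = idx * ni + j % ni
    return idx


def pfp_matrix(factors):
    """Matrix P with P e_j = e_{r(j)}; a permutation when the factors are coprime."""
    n = 1
    for ni in factors:
        n *= ni
    P = [[0] * n for _ in range(n)]
    for j in range(n):
        P[pfp_index(j, factors)][j] = 1
    return P


def lemma_holds(factors):
    """Check S_n == P^t (S_{n1} (x) ... (x) S_{nk}) P for the given factors."""
    K = [[1]]
    for ni in factors:
        K = kron(K, shift_matrix(ni))
    P = pfp_matrix(factors)
    return matmul(matmul(transpose(P), K), P) == shift_matrix(len(P))
```

The check succeeds for factor lists such as (3, 4) or (2, 3, 5) and fails for (2, 4), where the factors share a common divisor and the index map is no longer one-to-one.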

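It is quite simple to show that

(2.17) P_{a,b,c} = (P_{a,b} ⊗ I_c) P_{ab,c} = (I_a ⊗ P_{b,c}) P_{a,bc}

In general, one has

(2.18) P_{n₁,⋯,n_k} = (P_{m₁,n₂} ⊗ I_{o₁}) (P_{m₂,n₃} ⊗ I_{o₂}) ⋯ (P_{m_{k–1},n_k} ⊗ I_{o_{k–1}})

where m_i=n₁⋯n_i and o_i=n_{i+2}⋯n_k (so that o_{k–1}=1).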

A Matlab function for P_{a,b}⊗I_s is pfp2I() in one of the appendices. This program is a direct implementation of the definition. In a paper by Templeton [5], another method for implementing P_{a,b}, without "if" statements, is given. That method requires some precalculations, however. A function for P_{n₁,⋯,n_k} is pfp(). It uses Equation 2.18.
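The idea of building a multi-factor permutation from two-factor stages of the form P_{a,b}⊗I_s can be sketched in Python. This is not a translation of pfp2I() or pfp(); it is an independent illustration under an assumed convention, in which a permutation matrix P with P e_j = e_{perm[j]} is stored as the list perm and the last factor varies fastest in the output index.

```python
from math import prod


def pfp_direct(factors):
    """Index map of the multi-factor permutation, straight from the residues."""
    out = []
    for j in range(prod(factors)):
        idx = 0
        for ni in factors:
            idx = idx * ni + j % ni
        out.append(idx)
    return out


def pfp2(a, b):
    """Index map of the two-factor permutation P_{a,b}."""
    return [(j % a) * b + j % b for j in range(a * b)]


def kron_identity(perm, s):
    """Index map of P (x) I_s: permute blocks of s consecutive entries."""
    return [perm[j // s] * s + j % s for j in range(len(perm) * s)]


def compose(left, right):
    """Index map of the matrix product left * right (right acts first)."""
    return [left[r] for r in right]


def pfp_by_stages(factors):
    """Multiply the stages P_{m_i, n_{i+1}} (x) I_{o_i} from left to right."""
    n = prod(factors)
    total = list(range(n))               # identity permutation
    m = factors[0]                       # m_i = n_1 * ... * n_i
    for i in range(1, len(factors)):
        o = n // (m * factors[i])        # o_i = n_{i+2} * ... * n_k
        stage = kron_identity(pfp2(m, factors[i]), o)
        total = compose(total, stage)
        m *= factors[i]
    return total
```

Storing each stage as an index list keeps the cost linear in n per stage, so the staged and direct constructions can be compared even for lengths such as n=2·3·5·7=210.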