To write a program that computes the circular convolution of $h$ and $x$ using the bilinear form of Equation 24 in Bilinear Forms for Circular Convolution, we need subprograms that carry out the action of $P$, $P^t$, $R$, $R^t$, $A$ and $B^t$. We assume, as is usually done, that $h$ is fixed and known, so that $u = C^t R^{-t} P J h$ can be pre-computed and stored. To compute these multiplicative constants $u$ we need additional subprograms that carry out the action of $C^t$ and $R^{-t}$, but the efficiency with which we compute $u$ is unimportant, since this is done beforehand and $u$ is stored.
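When testing any fast algorithm of this kind, a direct evaluation of the definition of circular convolution is a convenient reference. The following is a minimal Python sketch of that reference (the appendix programs are Matlab; the name circular_convolution is illustrative and does not appear in the appendix).

```python
def circular_convolution(h, x):
    """Direct O(N^2) circular convolution: y[k] = sum_j h[j] * x[(k - j) mod N]."""
    n = len(x)
    if len(h) != n:
        raise ValueError("h and x must have the same length")
    return [sum(h[j] * x[(k - j) % n] for j in range(n)) for k in range(n)]


# Small check: convolving with a unit impulse returns the input unchanged.
impulse = [1, 0, 0]
print(circular_convolution(impulse, [4, 5, 6]))    # [4, 5, 6]
print(circular_convolution([1, 2, 3], [4, 5, 6]))  # [31, 31, 28]
```

Any program assembled from the subprograms described here should agree with this reference for every input, which makes it useful as a regression check.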
In Prime Factor Permutations we discussed the permutation $P$; a program for it, pfp(), appears in the appendix. The reduction operations $R$, $R^t$ and $R^{-t}$ are described in Reduction Operations, and programs for them, KRED() etc., also appear in the appendix.
To carry out the operations $A$ and $B^t$ we need to be able to carry out the action of $A_{d_1} \otimes \cdots \otimes A_{d_k}$, which was discussed in Implementing Kronecker Products Efficiently. Note that since $A$ and $B^t$ are block diagonal, each diagonal block can be applied separately. However, since the blocks are rectangular, care is needed to use the correct indexing.
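The indexing issue for rectangular diagonal blocks can be made concrete with a short Python sketch. It applies a block diagonal operator one block at a time, advancing the input offset by the number of columns of each block while the output grows by its number of rows. The helper names are illustrative, and the matrix used for $D_2$ (three products for a 2 point linear convolution) is an assumption, not taken from the appendix.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def apply_direct_sum(blocks, x):
    """Apply the block diagonal operator blocks[0] (+) blocks[1] (+) ... to x."""
    y = []
    offset = 0  # position in x where the current block's input starts
    for block in blocks:
        cols = len(block[0])
        y.extend(matvec(block, x[offset:offset + cols]))  # output grows by len(block) rows
        offset += cols                                    # input advances by cols
    if offset != len(x):
        raise ValueError("input length does not match the total number of columns")
    return y


# Assumed 3 x 2 matrix for D_2; the operator 1 (+) D_2 maps 3 inputs to 4 outputs.
D2 = [[1, 0], [0, 1], [1, 1]]
print(apply_direct_sum([[[1]], D2], [7, 2, 5]))  # [7, 2, 5, 7]
```

Because the input and output offsets differ after the first rectangular block, they have to be tracked separately; this is the bookkeeping the generated programs must get right.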
To facilitate the discussion of the programs we generate, it is useful to consider an example. Take the 45 point circular convolution algorithm listed in the appendix. From Equation 19 in Bilinear Forms for Circular Convolution we find that we need to compute $x = P_{9,5}\,x$ and $x = R_{9,5}\,x$. These are the first two commands in the program.
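The role of the permutation $P_{9,5}$ can be illustrated with a small Python sketch. It uses one common convention for a prime factor index map, sending index $k$ to the pair $(k \bmod 9,\ k \bmod 5)$ stored in row-major order; the ordering used by pfp in the appendix may differ, so this is an assumption. Under this map a 45 point circular convolution becomes a $9 \times 5$ two-dimensional circular convolution, which the code checks numerically.

```python
from math import gcd


def circular_convolution(h, x):
    """Direct 1-D circular convolution, used as the reference."""
    n = len(x)
    return [sum(h[j] * x[(k - j) % n] for j in range(n)) for k in range(n)]


def prime_factor_permute(x, n1, n2):
    """Move x[k] to position (k mod n1) * n2 + (k mod n2); needs gcd(n1, n2) = 1."""
    if gcd(n1, n2) != 1:
        raise ValueError("n1 and n2 must be relatively prime")
    y = [None] * (n1 * n2)
    for k in range(n1 * n2):
        y[(k % n1) * n2 + (k % n2)] = x[k]
    return y


def prime_factor_unpermute(y, n1, n2):
    """Inverse (transpose) of prime_factor_permute."""
    return [y[(k % n1) * n2 + (k % n2)] for k in range(n1 * n2)]


def circular_convolution_2d(h, x, n1, n2):
    """Direct n1 x n2 circular convolution on row-major flattened arrays."""
    y = [0] * (n1 * n2)
    for k1 in range(n1):
        for k2 in range(n2):
            y[k1 * n2 + k2] = sum(
                h[j1 * n2 + j2] * x[((k1 - j1) % n1) * n2 + ((k2 - j2) % n2)]
                for j1 in range(n1) for j2 in range(n2))
    return y


x = list(range(45))
h = [(3 * i * i + 1) % 11 for i in range(45)]
direct = circular_convolution(h, x)
via_2d = prime_factor_unpermute(
    circular_convolution_2d(prime_factor_permute(h, 9, 5),
                            prime_factor_permute(x, 9, 5), 9, 5), 9, 5)
print(direct == via_2d)  # True
```

The two results agree because the index map is a group isomorphism between the integers modulo 45 and pairs of integers modulo 9 and modulo 5.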
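We noted above that bilinear forms for linear convolution, $(D_d, E_d, F_d)$, can be used for these cyclotomic convolutions. Specifically we can take $A_{p^i} = D_{\phi(p^i)}$, $B_{p^i} = E_{\phi(p^i)}$ and $C_{p^i} = G_{p^i} F_{\phi(p^i)}$. In this case Equation 20 in Bilinear Forms for Circular Convolution becomes
$$A = 1 \oplus D_2 \oplus D_6 \oplus D_4 \oplus (D_2 \otimes D_4) \oplus (D_6 \otimes D_4).$$
In our approach this is what we have done. When we use the bilinear forms for convolution given in the appendix, for which $D_4 = D_2 \otimes D_2$ and $D_6 = D_2 \otimes D_3$, we get
$$A = 1 \oplus D_2 \oplus (D_2 \otimes D_3) \oplus (D_2 \otimes D_2) \oplus (D_2 \otimes D_2 \otimes D_2) \oplus (D_2 \otimes D_3 \otimes D_2 \otimes D_2)$$
and since $E_d = D_d$ for the linear convolution algorithms listed in the appendix, we get
$$B^t = 1 \oplus D_2^t \oplus (D_2^t \otimes D_3^t) \oplus (D_2^t \otimes D_2^t) \oplus (D_2^t \otimes D_2^t \otimes D_2^t) \oplus (D_2^t \otimes D_3^t \otimes D_2^t \otimes D_2^t).$$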
From the discussion above, we found that Kronecker products like $D_2 \otimes D_2 \otimes D_2$ appearing in these expressions are best carried out by factoring the product into factors of the form $I_a \otimes D_2 \otimes I_b$. Therefore we need programs to carry out $I_a \otimes D_2 \otimes I_b$ and $I_a \otimes D_3 \otimes I_b$. These functions are called ID2I(a,b,x) and ID3I(a,b,x) and are listed in the appendix. The transposed form, $I_a \otimes D_2^t \otimes I_b$, is called ID2tI(a,b,x).
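The factoring just described can be sketched in Python. The function below applies $I_a \otimes D \otimes I_b$ to a vector without forming the Kronecker product, and the example checks that three such passes reproduce $D_2 \otimes D_2 \otimes D_2$ computed explicitly. The names apply_IDI and kron are illustrative and are not the appendix routines, and the matrix used for $D_2$ is an assumption.

```python
def apply_IDI(a, b, D, x):
    """Apply (I_a kron D kron I_b) to x, where D is r x c and len(x) == a * c * b."""
    r, c = len(D), len(D[0])
    if len(x) != a * c * b:
        raise ValueError("input length must be a * cols(D) * b")
    y = [0] * (a * r * b)
    for i in range(a):          # index of the I_a factor
        for m in range(r):      # output row of D
            for k in range(b):  # index of the I_b factor
                y[(i * r + m) * b + k] = sum(
                    D[m][j] * x[(i * c + j) * b + k] for j in range(c))
    return y


def kron(A, B):
    """Explicit Kronecker product; used here only to verify the factored form."""
    return [[p * q for p in row_a for q in row_b] for row_a in A for row_b in B]


def matvec(m, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


D2 = [[1, 0], [0, 1], [1, 1]]  # assumed 3 x 2 matrix for D_2

x = [1, 2, 3, 4, 5, 6, 7, 8]
# D2 kron D2 kron D2 = (D2 kron I_9) (I_2 kron D2 kron I_3) (I_4 kron D2)
step = apply_IDI(4, 1, D2, x)      # length 8  -> 12
step = apply_IDI(2, 3, D2, step)   # length 12 -> 18
factored = apply_IDI(1, 9, D2, step)  # length 18 -> 27
explicit = matvec(kron(kron(D2, D2), D2), x)
print(factored == explicit)  # True
```

Because $D_2$ is rectangular, the identity sizes change from one pass to the next (here 4 and 1, then 2 and 3, then 1 and 9); applying the factors in a different order would require different sizes.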
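To compute the multiplicative constants we need $C^t$. Using $C_{p^i} = G_{p^i} F_{\phi(p^i)}$ we get
$$C^t = 1 \oplus F_2^t G_3^t \oplus F_6^t G_9^t \oplus F_4^t G_5^t \oplus (F_2^t G_3^t \otimes F_4^t G_5^t) \oplus (F_6^t G_9^t \otimes F_4^t G_5^t).$$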
The Matlab function KFt carries out the operation $F_{d_1} \otimes \cdots \otimes F_{d_K}$. The Matlab function Kcrot implements the operation $G_{p_1^{e_1}} \otimes \cdots \otimes G_{p_K^{e_K}}$. They are both listed in the appendix.
By recognizing that the convolution algorithms for different lengths share many of the same computations, it is possible to write a set of programs that takes advantage of this. The programs we have generated call functions from a relatively small set. Each program calls these functions with different arguments, in differing orders, and a different number of times. By organizing the program structure in a modular way, we are able to generate relatively compact code for a wide variety of lengths.
In the appendix we have listed code for the following functions, from which we create circular convolution algorithms. In the next section we generate FFT programs using this same set of functions.
Prime Factor Permutations: The Matlab function pfp implements the permutation of Prime Factor Permutations. Its transpose is implemented by pfpt.
Reduction Operations: The Matlab function KRED implements the reduction operations of Reduction Operations. Its transpose is implemented by tKRED. Its inverse transpose is implemented by itKRED; this function is used only for computing the multiplicative constants.
Linear Convolution Operations: ID2I and ID3I are Matlab functions for the operations $I \otimes D_2 \otimes I$ and $I \otimes D_3 \otimes I$. These linear convolution operations are also described in the appendix `Bilinear Forms for Linear Convolution.' ID2tI and ID3tI implement the transposes, $I \otimes D_2^t \otimes I$ and $I \otimes D_3^t \otimes I$.
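Each operation in this set comes paired with its transpose, and a transposed routine can be checked against the forward one with an inner product test: $\langle Mx, y\rangle = \langle x, M^t y\rangle$ for all $x$ and $y$. The Python sketch below does this for $I_a \otimes D \otimes I_b$. The function names are illustrative rather than the appendix routines, and the matrix used for $D_2$ is an assumption.

```python
def apply_IDI(a, b, D, x):
    """Forward operator (I_a kron D kron I_b), with D of size r x c."""
    r, c = len(D), len(D[0])
    y = [0] * (a * r * b)
    for i in range(a):
        for m in range(r):
            for k in range(b):
                y[(i * r + m) * b + k] = sum(
                    D[m][j] * x[(i * c + j) * b + k] for j in range(c))
    return y


def apply_IDtI(a, b, D, y):
    """Transposed operator (I_a kron D^t kron I_b): the roles of rows and columns swap."""
    r, c = len(D), len(D[0])
    x = [0] * (a * c * b)
    for i in range(a):
        for j in range(c):
            for k in range(b):
                x[(i * c + j) * b + k] = sum(
                    D[m][j] * y[(i * r + m) * b + k] for m in range(r))
    return x


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


D2 = [[1, 0], [0, 1], [1, 1]]  # assumed 3 x 2 matrix for D_2
a, b = 2, 3
x = list(range(1, 13))                           # length a * 2 * b = 12
y = [(-1) ** n * (n + 2) for n in range(18)]     # length a * 3 * b = 18

lhs = dot(apply_IDI(a, b, D2, x), y)
rhs = dot(x, apply_IDtI(a, b, D2, y))
print(lhs == rhs)  # True
```

With integer test data the two inner products agree exactly, so the test has no tolerance to tune.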
Table 6.1 lists operation counts for some of the circular convolution algorithms we have generated. The operation counts do not include any arithmetic operations involved in the index variable or loops. They include only the arithmetic operations that involve the data sequence x in the convolution of x and h.
The table in [2] for the split nesting algorithm gives very similar arithmetic operation counts. For all lengths not divisible by 9, the algorithms we have developed use the same number of multiplications and the same number of additions or fewer. For lengths that are divisible by 9, the algorithms described in [2] require fewer additions than ours do. This is because the algorithms whose operation counts are tabulated in [2] use a special $\Phi_9(s)$ convolution algorithm. It should be noted, however, that the efficient $\Phi_9(s)$ convolution algorithm of [2] is not constructed from smaller algorithms using the Kronecker product, as ours are. As we have discussed above, the use of the Kronecker product facilitates adaptation to special computer architectures and yields a very compact program with function calls to a small set of functions.
| N | muls | adds | N | muls | adds | N | muls | adds | N | muls | adds |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2 | 2 | 4 | 24 | 56 | 244 | 80 | 410 | 1546 | 240 | 1640 | 6508 |
| 3 | 4 | 11 | 27 | 94 | 485 | 84 | 320 | 1712 | 252 | 1520 | 7920 |
| 4 | 5 | 15 | 28 | 80 | 416 | 90 | 380 | 1858 | 270 | 1880 | 9074 |
| 5 | 10 | 31 | 30 | 80 | 386 | 105 | 640 | 2881 | 280 | 2240 | 9516 |
| 6 | 8 | 34 | 35 | 160 | 707 | 108 | 470 | 2546 | 315 | 3040 | 13383 |
| 7 | 16 | 71 | 36 | 95 | 493 | 112 | 656 | 2756 | 336 | 2624 | 11132 |
| 8 | 14 | 46 | 40 | 140 | 568 | 120 | 560 | 2444 | 360 | 2660 | 11392 |
| 9 | 19 | 82 | 42 | 128 | 718 | 126 | 608 | 3378 | 378 | | |
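The multiplication counts in the table can be reproduced from the Kronecker structure alone. The Python sketch below rests on assumptions inferred from the table rather than stated in the text: that $D_2$ has 3 rows and $D_3$ has 5 rows, that the $\Phi_{p^i}(s)$ convolution uses a Kronecker product of these for $d = \phi(p^i)$, and that counts multiply across relatively prime factors of $N$. The function names are illustrative. Addition counts depend on the details of each routine and are not modeled.

```python
def factorize(n):
    """Prime factorization by trial division, returned as {prime: exponent}."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def linear_conv_mults(d):
    """Rows of D_d when D_d is a Kronecker product of D_2 (3 rows) and D_3 (5 rows)."""
    count = 1
    while d % 2 == 0:
        d //= 2
        count *= 3
    while d % 3 == 0:
        d //= 3
        count *= 5
    if d != 1:
        raise ValueError("d must be of the form 2^a * 3^b for this sketch")
    return count


def circular_conv_mults(n):
    """Multiplications for an n point circular convolution under the stated assumptions."""
    total = 1
    for p, e in factorize(n).items():
        block = 1  # the factor s - 1 costs one multiplication
        for i in range(1, e + 1):
            block += linear_conv_mults((p - 1) * p ** (i - 1))  # degree of Phi_{p^i}
        total *= block
    return total


print([circular_conv_mults(n) for n in (9, 36, 80, 360)])  # [19, 95, 410, 2660]
```

For example, $N = 9$ gives $1 + 3 + 15 = 19$, from the factors $s - 1$, $\Phi_3(s)$ and $\Phi_9(s)$, which matches the table entry.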