Transforms1
This book has developed a class of efficient algorithms based on
index mapping and polynomial algebra. This provides a frame-
work from which the Cooley-Tukey FFT, the split-radix FFT, the
PFA, and WFTA can be derived. Even the programs implementing
these algorithms can have a similar structure. Winograd’s theorems
were presented and shown to be very powerful in both deriving al-
gorithms and in evaluating them. The simple radix-2 FFT provides
a compact, elegant means for efficiently calculating the DFT. If
some elaboration is allowed, significant improvement can be had
from the split-radix FFT, the radix-4 FFT or the PFA. If multipli-
cations are expensive, the WFTA requires the least of all.
Several method for transforming real data were described that are
more efficient than directly using a complex FFT. A complex FFT
can be used for real data by artificially creating a complex input
from two sections of real input. An alternative and slightly more
efficient method is to construct a special FFT that utilizes the sym-
metries at each stage.
1This content is available online at <http://cnx.org/content/m16340/1.7/>.
Available for free at Connexions
<http://cnx.org/content/col10683/1.5>
231
CHAPTER 15. CONCLUSIONS: FAST
232
FOURIER TRANSFORMS
As computers move to multiprocessors and multicore, writing and
maintaining efficient programs becomes more and more difficult.
The highly structured form of FFTs allows automatic generation of
very efficient programs that are tailored specifically to a particular
DSP or computer architecture.
For high-speed convolution, the traditional use of the FFT or PFA
with blocking is probably the fastest method although rectangular
transforms, distributed arithmetic, or number theoretic transforms
may have a future with special VLSI hardware.
The ideas presented in these notes can also be applied to the cal-
culation of the discrete Hartley transform [355], [112], the discrete
cosine transform [119], [395], and to number theoretic transforms
[32], [239], [267].
There are many areas for future research. The relationship of
hardware to algorithms, the proper use of multiple processors,
the proper design and use of array processors and vector proces-
sors are all open. There are still many unanswered questions in
multi-dimensional algorithms where a simple extension of one-
dimensional methods will not suffice.
Available for free at Connexions
<http://cnx.org/content/col10683/1.5>
Appendix 1: FFT
Flowgraphs1
16.1 Signal Flow Graphs of Cooley-Tukey
FFTs
The following four figures are flow graphs for Radix-2 Cooley-
Tukey FFTs. The first is a length-16, decimation-in-frequency
Radix-2 FFT with the input data in order and output data scram-
bled. The first stage has 8 length-2 "butterflies" (which overlap in the figure) followed by 8 multiplications by powers of W which
are called "twiddle factors". The second stage has 2 length-8 FFTs
which are each calculated by 4 butterflies followed by 4 multiplies.
The third stage has 4 length-4 FFTs, each calculated by 2 butterflies
followed by 2 multiplies and the last stage is simply 8 butterflies
followed by trivial multiplies by one. This flow graph should be
compared with the index map in Polynomial Description of Signals
(Chapter 4), the polynomial decomposition in The DFT as Convo-
lution or Filtering (Chapter 5), and the program in Appendix 3. In
the program, the butterflies and twiddle factor multiplications are
done together in the inner most loop. The outer most loop indexes
through the stages. If the length of the FFT is a power of two, the
number of stages is that power (log N).
1This content is available online at <http://cnx.org/content/m16352/1.11/>.
Available for free at Connexions
<http://cnx.org/content/col10683/1.5>
233
234
APPENDIX
The second figure below is a length-16, decimation-in-time FFT
with the input data scrambled and output data in order. The first
stage has 8 length-2 "butterflies" followed by 8 twiddle factors
multiplications. The second stage has 4 length-4 FFTs which are
each calculated by 2 butterflies followed by 2 multiplies. The third
stage has 2 length-8 FFTs, each calculated by 4 butterflies followed
by 8 multiplies and the last stage is simply 8 length-2 butterflies.
This flow graph should be compared with the index map in Poly-
nomial Description of Signals (Chapter 4), the polynomial decom-
position in The DFT as Convolution or Filtering (Chapter 5), and
the program in Appendix 3 (Chapter 18). Here, the FFT must be
preceded by a scrambler.
The third and fourth figures below are a length-16 decimation-in-
frequency and a decimation-in-time but, in contrast to the figures
above, the DIF has the output in order which requires a scrambled
input and the DIT has the input in order which requires the output
be unscrambled. Compare with the first two figures. Note the order
of the twiddle factors. The number of additions and multiplications
in all four flow graphs is the same and the structure of the three-
loop program which executes the flow graph is the same.
Available for free at Connexions
<http://cnx.org/content/col10683/1.5>
APPENDIX
235
x(0)
X (0)
x(1)
X (8)
x(2)
W 0
X (4)
x(3)
W 4
X (12)
x(4)
W 0
X (2)
x(5)
W 2
X (10)
x(6)
W 4
W 0
X (6)
x(7)
W 6
W 4
X (14)
x(8)
W 0
X (1)
x(9)
W 1
X (9)
x(10)
W 2
W 0
X (5)
x(11)
W 3
W 4
X (13)
x(12)
W 4
W 0
X (3)
x(13)
W 5
W 2
X (11)
x(14)
W 6
W 4
W 0
X (7)
x(15)
W 7
W 6
W 4
X (15)
Figure 16.1: Length-16, Decimation-in-Frequency, In-order
input, Radix-2 FFT
Available for free at Connexions
<http://cnx.org/content/col10683/1.5>
236
APPENDIX
x(0)
X (0)
x(8)
X (1)
x(4)
W 0
X (2)
x(12)
W 4
X (3)
x(2)
W 0
X (4)
x(10)
W 2
X (5)
x(6)
W 0
W 4
X (6)
x(14)
W 4
W 6
X (7)
x(1)
W 0
X (8)
x(9)
W 1
X (9)
x(5)
W 0
W 2
X (10
x(13)
W 4
W 3
X (11)
x(3)
W 0
W 4
X (12)
x(11)
W 2
W 5
X (13)
x(7)
W 0
W 4
W 6
X (14)
x(15)
W 4
W 6
W 7
X (15)
Figure 16.2: Length-16, Decimation-in-Time, In-order out-
put, Radix-2 FFT
Available for free at Connexions
<http://cnx.org/content/col10683/1.5>
APPENDIX
237
x(0)
X (0)
x(8)
W 0
X (1)
x(4)
W 0
X (2)
x(12)
W 4
W 0
X (3)
x(2)
W 0
X (4)
x(10)
W 2
W 0
X (5)
x(6)
W 4
W 0
X (6)
x(14)
W 6
W 4
W 0
X (7)
x(1)
X (8)
x(9)
W 1
X (9)
x(5)
W 2
X (10
x(13)
W 5
W 2
X (11)
x(3)
W 4
X (12)
x(11)
W 3
W 4
X (13)
x(7)
W 6
W 4
X (14)
x(15)
W 7
W 6
W 4
X (15)
Figure 16.3: Length-16, alternate Decimation-in-Frequency,
In-order output, Radix-2 FFT
Available for free at Connexions
<http://cnx.org/content/col10683/1.5>
238
APPENDIX
x(0)
X (0)
x(1)
W 0
X (8)
x(2)
W 0
X (4)
x(3)
W 0
W 4
X (12)
x(4)
W 0
X (2)
x(5)
W 0
W 2
X (10)
x(6)
W 0
W 4
X (6)
x(7)
W 0
W 4
W 6
X (14)
x(8)
X (1)
x(9)
W 1
X (9)
x(10)
W 2
X (5)
x(11)
W 2
W 5
X (13)
x(12)
W 4
X (3)
x(13)
W 4
W 3
X (11)
x(14)
W 4
W 6
X (7)
x(15)
W 4
W 6
W 7
X (15)
Figure 16.4: Length-16, alternate Decimation-in-Time, In-
order input, Radix-2 FFT
The following is a length-16, decimation-in-frequency Radix-4
FFT with the input data in order and output data scrambled. There
are two stages with the first stage having 4 length-4 "butterflies"
followed by 12 multiplications by powers of W which are called
"twiddle factors. The second stage has 4 length-4 FFTs which are
each calculated by 4 butterflies followed by 4 multiplies. Note,
each stage here looks like two stages but it is one and there is only
one place where twiddle factor multiplications appear. This flow
graph should be compared with the index map in Polynomial De-
scription of Signals (Chapter 4), the polynomial decomposition in
The DFT as Convolution or Filtering (Chapter 5), and the program
in Appendix 3 (Chapter 18). Log to the base 4 of 16 is 2. The
total number of twiddle factor multiplication here is 12 compared
Available for free at Connexions
<http://cnx.org/content/col10683/1.5>
APPENDIX
239
to 24 for the radix-2. The unscrambler is a base-four reverse order
counter rather than a bit reverse counter, however, a modification
of the radix four butterflies will allow a bit reverse counter to be
used with the radix-4 FFT as with the radix-2.
x(0)
X (0)
x(1)
X (8)
x(2)
X (4)
- j
x(3)
X (12)
W 0
x(4)
X (2)
W 2
x(5)
X (10)
x(6)
W 4
X (6)
- j
x(7)
W 6
X (14)
W 0
x(8)
X (1)
W 1
x(9)
X (9)
W 2
x(10)
X (5)
W 3
- j
x(11)
X (13)
- j
W 4
x(12)
X (3)
- j
W 5
x(13)
X (11)
- j
W 6
x(14)
X (7)
- j
W 7
- j
x(15)
X (15)
Figure 16.5: Length-16, Decimation-in-Frequency, In-order
input, Radix-4 FFT
The following two flowgraphs are length-16, decimation-in-
frequency Split Radix FFTs with the input data in order and output
data scrambled. Because the "butterflies" are L shaped, the stages
do not progress uniformly like the Radix-2 or 4. These two fig-
ures are the same with the first drawn in a way to compare with
the Radix-2 and 4, and the second to illustrate the L shaped butter-
flies. These flow graphs should be compared with the index map
in Polynomial Description of Signals (Chapter 4) and the program
Available for free at Connexions
<http://cnx.org/content/col10683/1.5>
240
APPENDIX
in Appendix 3 (Chapter 18). Because of the non-uniform stages,
the program indexing is more complicated. Although the num-
ber of twiddle factor multiplications is 12 as was the radix-4 case,
for longer lengths, the split-radix has slightly fewer multiplications
than the radix-4.
Because the structures of the radix-2, radix-4, and split-radix FFTs
are the same, the number of data additions is same for all of them.
However, each complex twiddle factor multiplication requires two
real additions (and four real multiplications) the number of addi-
tions will be fewer for the structures with fewer multiplications.
x(0)
X (0)
x(1)
X (8)
x(2)
X (4)
- j
x(3)
X (12)
x(4)
X (2)
x(5)
W 2
X (10)
- j
x(6)
W 4
X (6)
- j
x(7)
W 6
X (14)
x(8)
W 0
X (1)
x(9)
W 1
X (9)
x(10)
W 2
X (5)
- j
x(11)
W 3
X (13)
- j
x(12)
W 4
X (3)
- j
x(13)
W 5
X (11)
- j
x(14)
W 6
X (7)
- j
- j
x(15)
W 7
X (15)
Figure 16.6: Length-16, Decimation-in-Frequency, In-order
input, Split-Radix FFT
Available for free at Connexions
<http://cnx.org/content/col10683/1.5>
APPENDIX
241
x(0)
X (0)
x(1)
X (8)
x(2)
X (4)
x(3)
X (12)
x(4)
W 0
X (2)
x(5)
W 2
X (10)
x(6)
W 4
X (6)
x(7)
W 6
X (14)
x(8)
W 0
X (1)
x(9)
W 1
X (9)
x(10)
W 2
X (5)
x(11)
W 3
X (13)
x(12)
W 0
X (3)
x(13)
W 3
X (11)
x(14)
W 6
X (7)
x(1