Mapping1
A powerful approach to the development of efficient algorithms is
to break a large problem into multiple small ones. One method for
doing this with both the DFT and convolution uses a linear change
of index variables to map the original one-dimensional problem
into a multi-dimensional problem. This approach provides a uni-
fied derivation of the Cooley-Tukey FFT, the prime factor algo-
rithm (PFA) FFT, and the Winograd Fourier transform algorithm
(WFTA) FFT. It can also be applied directly to convolution to break
it down into multiple short convolutions that can be executed faster
than a direct implementation. It is often easy to translate an algo-
rithm using index mapping into an efficient program.
The basic definition of the discrete Fourier transform (DFT) is
N−1
C (k) = ∑ x(n) W nk
N
(3.1)
n=0
√
where n, k, and N are integers, j =
−1, the basis functions are
1This content is available online at <http://cnx.org/content/m16326/1.12/>.
Available for free at Connexions
<http://cnx.org/content/col10683/1.5>
9
CHAPTER 3. MULTIDIMENSIONAL
10
INDEX MAPPING
the N roots of unity,
WN = e− j2π/N
(3.2)
and k = 0, 1, 2, · · · , N − 1.
If the N values of the transform are calculated from the N val-
ues of the data, x (n), it is easily seen that N2 complex multipli-
cations and approximately that same number of complex additions
are required. One method for reducing this required arithmetic is
to use an index mapping (a change of variables) to change the one-
dimensional DFT into a two- or higher dimensional DFT. This is
one of the ideas behind the very efficient Cooley-Tukey [89] and
Winograd [404] algorithms. The purpose of index mapping is to
change a large problem into several easier ones [46], [120]. This
is sometimes called the “divide and conquer" approach [26] but a
more accurate description would be “organize and share" which
explains the process of redundancy removal or reduction.
3.1 The Index Map
For a length-N sequence, the time index takes on the values
n = 0, 1, 2, ..., N − 1
(3.3)
When the length of the DFT is not prime, N can be factored as
N = N1N2 and two new independent variables can be defined over
the ranges
n1 = 0, 1, 2, ..., N1 − 1
(3.4)
n2 = 0, 1, 2, ..., N2 − 1
(3.5)
Available for free at Connexions
<http://cnx.org/content/col10683/1.5>
11
A linear change of variables is defined which maps n1 and n2 to n
and is expressed by
n = ((K1n1 + K2n2))
(3.6)
N
where Ki are integers and the notation ((x)) denotes the integer
N
residue of x modulo N[232]. This map defines a relation between
all possible combinations of n1 and n2 in (3.4) and (3.5) and the
values for n in (3.3). The question as to whether all of the n in
(3.3) are represented, i.e., whether the map is one-to-one (unique),
has been answered in [46] showing that certain integer Ki always
exist such that the map in (3.6) is one-to-one. Two cases must be
considered.
3.1.1 Case 1.
N1 and N2 are relatively prime, i.e., the greatest common divisor
(N1, N2) = 1.
The integer map of (3.6) is one-to-one if and only if:
(K1 = aN2) and/or (K2 = bN1) and (K1, N1) = (3.7)
(K2, N2) = 1
where a and b are integers.
3.1.2 Case 2.
N1 and N2 are not relatively prime, i.e., (N1, N2) > 1.
The integer map of (3.6) is one-to-one if and only if:
(K1 = aN2) and (K2 = bN1) and (a, N1) = (K2, N2) = 1
(3.8)
Available for free at Connexions
<http://cnx.org/content/col10683/1.5>
CHAPTER 3. MULTIDIMENSIONAL
12
INDEX MAPPING
or
(K1 = aN2) and (K2 = bN1) and (K1, N1) = (b, N2) = 1
(3.9)
Reference [46] should be consulted for the details of these condi-
tions and examples. Two classes of index maps are defined from
these conditions.
3.1.3 Type-One Index Map:
The map of (3.6) is called a type-one map when integers a and b
exist such that
K1 = aN2 and K2 = bN1
(3.10)
3.1.4 Type-Two Index Map:
The map of (3.6) is called a type-two map when when integers a
and b exist such that
K1 = aN2 or K2 = bN1,
but not both.
(3.11)
The type-one can be used only if the factors of N are relatively
prime, but the type-two can be used whether they are relatively
prime or not. Good [149], Thomas, and Winograd [404] all used
the type-one map in their DFT algorithms. Cooley and Tukey
[89] used the type-two in their algorithms, both for a fixed radix
N = RM and a mixed radix [301].
The frequency index is defined by a map similar to (3.6) as
k = ((K3k1 + K4k2))
(3.12)
N
where the same conditions, (3.7) and (3.8), are used for determin-
ing the uniqueness of this map in terms of the integers K3 and K4.
Available for free at Connexions
<http://cnx.org/content/col10683/1.5>
13
Two-dimensional arrays for the input data and its DFT are defined
using these index maps to give
^
x (n1, n2) = x((K1n1 + K2n2))
(3.13)
N
^
X (k1, k2) = X ((K3k1 + K4k2))
(3.14)
N
In some of the following equations, the residue reduction notation
will be omitted for clarity. These changes of variables applied to
the definition of the DFT given in (3.1) give
N2−1N1−1
C (k) = ∑
∑
x (n) W K1K3n1k1 W K1K4n1k2 W K2K3n2k1 W K2K4n2k2
N
N
N
(3.15) N
n2=0 n1=0
where all of the exponents are evaluated modulo N.
The amount of arithmetic required to calculate (3.15) is the same as
in the direct calculation of (3.1). However, because of the special
nature of the DFT, the integer constants Ki can be chosen in such
a way that the calculations are “uncoupled" and the arithmetic is
reduced. The requirements for this are
((K1K4)) = 0 and/or ((K
= 0
(3.16)
N
2K3))N
When this condition and those for uniqueness in (3.6) are applied,
it is found that the Ki may always be chosen such that one of the
terms in (3.16) is zero. If the Ni are relatively prime, it is always
possible to make both terms zero. If the Ni are not relatively prime,
only one of the terms can be set to zero. When they are relatively
prime, there is a choice, it is possible to either set one or both to
zero. This in turn causes one or both of the center two W terms in
(3.15) to become unity.
Available for free at Connexions
<http://cnx.org/content/col10683/1.5>
CHAPTER 3. MULTIDIMENSIONAL
14
INDEX MAPPING
An example of the Cooley-Tukey radix-4 FFT for a length-16 DFT
uses the type-two map with K1 = 4, K2 = 1, K3 = 1, K4 = 4 giving
n = 4n1 + n2
(3.17)
k = k1 + 4k2
(3.18)
The residue reduction in (3.6) is not needed here since n does not
exceed N as n1 and n2 take on their values. Since, in this example,
the factors of N have a common factor, only one of the conditions
in (3.16) can hold and, therefore, (3.15) becomes
^
3
3
C (k1, k2) = C (k) = ∑ ∑ x (n) W n1k1 W n2k1 W n2k2
(3.19)
4
16
4
n2=0n1=0
Note the definition of WN in (3.3) allows the simple form of
W K1K3 = W
16
4
This has the form of a two-dimensional DFT with an extra term
W16, called a “twiddle factor". The inner sum over n1 represents
four length-4 DFTs, the W16 term represents 16 complex multipli-
cations, and the outer sum over n2 represents another four length-4
DFTs. This choice of the Ki “uncouples" the calculations since the
first sum over n1 for n2 = 0 calculates the DFT of the first row of
^
the data array x (n1, n2), and those data values are never needed
in the succeeding row calculations. The row calculations are inde-
pendent, and examination of the outer sum shows that the column
calculations are likewise independent. This is illustrated in Fig-
ure 3.1.
Available for free at Connexions
<http://cnx.org/content/col10683/1.5>
15
n1
k1
n2
k1
n2
x(n ,n )
1
2
k2
x(k ,n ) x TF
1
2
X(k ,k )
1
2
Figure 3.1: Uncoupling of the Row and Column Calculations
(Rectangles are Data Arrays)
The left 4-by-4 array is the mapped input data, the center array has
the rows transformed, and the right array is the DFT array. The
row DFTs and the column DFTs are independent of each other.
The twiddle factors (TF) which are the center W in (3.19), are the
multiplications which take place on the center array of Figure 3.1.
This uncoupling feature reduces the amount of arithmetic required
and allows the results of each row DFT to be written back over
the input data locations, since that input row will not be needed
again. This is called “in-place" calculation and it results in a large
memory requirement savings.
An example of the type-two map used when the factors of N are
relatively prime is given for N = 15 as
n = 5n1 + n2
(3.20)
k = k1 + 3k2
(3.21)
Available for free at Connexions
<http://cnx.org/content/col10683/1.5>
CHAPTER 3. MULTIDIMENSIONAL
16
INDEX MAPPING
The residue reduction is again not explicitly needed. Although the
factors 3 and 5 are relatively prime, use of the type-two map sets
only one of the terms in (3.16) to zero. The DFT in (3.15) becomes
4
2
X = ∑ ∑ x W n1k1 W n2k1 W n2k2
(3.22)
3
15
5
n2=0n1=0
which has the same form as (3.19), including the existence of the
twiddle factors (TF). Here the inner sum is five length-3 DFTs, one
for each value of k1. This is illustrated in (3.2) where the rectangles
are the 5 by 3 data arrays and the system is called a “mixed radix"
FFT.
n1
k1
n2
k1
n2
x(n ,n )
1
2
k2
x(k ,n )
1
2
X(k ,k )
1
2
Figure 3.2: Uncoupling of the Row and Column Calculations
(Rectangles are Data Arrays)
An alternate illustration is shown in Figure 3.3 where the rectan-
gles are the short length 3 and 5 DFTs.
Available for free at Connexions
<http://cnx.org/content/col10683/1.5>
17
Figure 3.3: Uncoupling of the Row and Column Calculations
(Rectangles are Short DFTs)
The type-one map is illustrated next on the same length-15 exam-
ple. This time the situation of (3.7) with the “and" condition is
used in (3.10) using an index map of
n = 5n1 + 3n2
(3.23)
and
k = 10k1 + 6k2
(3.24)
The residue reduction is now necessary. Since the factors of N are
relatively prime and the type-one map is being used, both terms in
(3.16) are zero, and (3.15) becomes
^
4
2
^
X = ∑ ∑ x W n1k1W n2k2
(3.25)
3
5
n2=0n1=0
which is similar to (3.22), except that now the type-one map gives
a pure two-dimensional DFT calculation with no TFs, and the sums
can be done in either order. Figures Figure 3.2 and Figure 3.3 also
Available for free at Connexions
<http://cnx.org/content/col10683/1.5>
CHAPTER 3. MULTIDIMENSIONAL
18
INDEX MAPPING
describe this case but now there are no Twiddle Factor multiplica-
tions in the center and the resulting system is called a “prime factor
algorithm" (PFA).
The purpose of index mapping is to improve the arithmetic effi-
ciency. For example a direct calculation of a length-16 DFT re-
quires 16^2 or 256 real multiplications (recall, one complex multi-
plication requires 4 real multiplications and 2 real additions) and an
uncoupled version requires 144. A direct calculation of a length-
15 DFT requires 225 multiplications but with a type-two map only
135 and with a type-one map, 120. Recall one complex multipli-
cation requires four real multiplications and two real additions.
Algorithms of practical interest use short DFT’s that require fewer
than N2 multiplications. For example, length-4 DFTs require no
multiplications and, therefore, for the length-16 DFT, only the TFs
must be calculated. That calculation uses 16 multiplications, many
fewer than the 256 or 144 required for the direct or uncoupled cal-
culation.
The concept of using an index map can also be applied to convolu-
tion to convert a length N = N1N2 one-dimensional cyclic convolu-
tion into a N1 by N2 two-dimensional cyclic convolution [46], [6].
There is no savings of arithmetic from the mapping alone as there
is with the DFT, but savings can be obtained by using special short
algorithms along each dimension. This is discussed in Algorithms
for Data with Restrictions (Chapter 12) .
3.2 In-Place Calculation of the DFT and
Scrambling
Because use of both the type-one and two index maps uncouples
the calculations of the rows and columns of the data array, the re-
sults of each short length Ni DFT can be written back over the data
Available for free at Connexions
<http://cnx.org/content/col10683/1.5>
19
as it will not be needed again after that particular row or column
is transformed. This is easily seen from Figures Figure 3.1, Fig-
ure 3.2, and Figure 3.3 where the DFT of the first row of x (n1, n2)
can be put back over the data rather written into a new array. After
all the calculations are finished, the total DFT is in the array of the
original data. This gives a significant memory savings over using
a separate array for the output.
Unfortunately, the use of in-place calculations results in the order
of the DFT values being permuted or scrambled. This is because
the data is indexed according to the input map (3.6) and the results
are put into the same locations rather than the locations dictated by
the output map (3.12). For example with a length-8 radix-2 FFT,
the input index map is
n = 4n1 + 2n2 + n3
(3.26)
which to satisfy (3.16) requires an output map of
k = k1 + 2k2 + 4k3
(3.27)
The in-place calculations will place the DFT results in the loca-
tions of the input map and these should be reordered or unscram-
bled into the locations given by the output map. Examination of
these two maps shows the scrambled output to be in a “bit reversed"
order.
For certain applications, this scrambled output order is not impor-
tant, but for many applications, the order must be unscrambled be-
fore the DFT can be considered complete. Because the radix of
the radix-2 FFT is the same as the base of the binary number rep-
resentation, the correct address for any term is found by reversing
the binary bits of the address. The part of most FFT programs that
does this reordering is called a bit-reversed counter. Examples of
various unscramblers are found in [146], [60] and in the appen-
dices.
Available for free at Connexions
<http://cnx.org/content/col10683/1.5>
CHAPTER 3. MULTIDIMENSIONAL
20
INDEX MAPPING
The development here uses the input map and the resulting algo-
rithm is called “decimation-in-frequency". If the output rather than
the input map is used to derive the FFT algorithm so the correct
output order is obtained, the input order must be scrambled so that
its values are in locations specified by the output map rather than
the input map. This algorithm is called “decimation-in-time". The
scrambling is the same bit-reverse counting as before, but it pre-
cedes the FFT algorithm in this case. The same process of a post-
unscrambler or pre-scrambler occurs for the in-place calculations
with the type-one maps. Details can be found in [60], [56]. It is
possible to do the unscrambling while calculating the FFT and to
avoid a separate unscrambler. This is done for the Cooley-Tukey
FFT in [192] and for the PFA in [60], [56], [319].
If a radix-2 FFT is used, the unscrambler is a bit-reversed counter.
If a radix-4 FFT is used, the unscrambler is a base-4 reversed
counter, and similarly for radix-8 and others. However, if for the
radix-4 FFT, the short length-4 DFTs (butterflies) have their out-
puts in bit-revered order, the output of the total radix-4 FFT will
be in bit-reversed order, not base-4 reversed order. This means any
radix-2n FFT can use the same radix-2 bit-reversed counter as an
unscrambler if the proper butterflies are used.
3.3 Efficiencies Resulting from Index Map-
ping with the DFT
In this section the reductions in arithmetic in the DFT that result
from the index mapping alone will be examined. In practical al-
gorithms several methods are always combined, but it is helpful in
understanding the effects of a particular method to study it alone.
The most general form of an uncoupled two-dimensional DFT is
given by
Available for free at Connexions
<http://cnx.org/content/col10683/1.5>
21
N2−1 N1−1
X (k1, k2) = ∑ { ∑ x (n1, n2) f1 (n1, n2, k1)} f2 (n2, k
(3.28)1, k2)
n2=0 n1=0
where the inner sum calculates N2 length-N1 DFT’s and, if for a
type-two map, the effects of the TFs. If the number of arithmetic
operations for a length-N DFT is denoted by F (N), the number
of operations for this inner sum is F = N2F (N1). The outer sum
which gives N1 length-N2 DFT’s requires N1F (N2) operations.
The total number of arithmetic operations is then
F = N2F (N1) + N1F (N2)
(3.29)
The first question to be considered is for a fixed length N, what
is the optimal relation of N1 and N2 in the sense of minimizing the
required amount of arithmetic. To answer this question, N1 and N2
are temporarily assumed to be real variables rather than integers. If
the short length-Ni DFT’s in (3.28) and any TF multiplications are
assumed to require N2 operations, i.e. F (N
, "Efficiencies
i
i) = N2
i
Resulting from Index Mapping with the DFT" (Section 3.3: Effi-
ciencies Resulting from Index Mapping with the DFT) becomes
F = N2N21 + N1N22 = N (N1 + N2) = N N1 + NN−1
(3.30)
1
To find the minimum of F over N1, the derivative of F with re-
spect to N1 is set to zero (temporarily assuming the variables to be
continuous) and the result requires N1 = N2.
dF/dN1 = 0
=>
N1 = N2
(3.31)
This result is also easily seen from the symmetry of N1 and N2
in N = N1N2. If a more general model of the arithmetic complex-
ity of the short DFT’s is used, the same result is obtained, but a
closer examination must be made to assure that N1 = N2 is a global
minimum.
If only the effects of the index mapping are to be considered, then
the F (N) = N2 model is used and (3.31) states that the two factors
Available for free at Connexions
<http://cnx.org/content/col10683/1.5>
CHAPTER 3. MULTIDIMENSIONAL
22
INDEX MAPPING
should be equal. If there are M factors, a similar reasoning shows
that all M factors should be equal. For the sequence of length
N = RM
(3.32)
there are now M length-R DFT’s and, since the factors are all
equal, the index map must be type two. This means there must be
twiddle factors.
In order to simplify the analysis, only the number of multiplica-
tions will be considered. If the number of multiplications for a
length-R DFT is F (R), then the formula for operation counts in
(3.30) generalizes to
M
F = N ∑F (Ni)/Ni = NMF (R)/R
(3.33)
i=1
for Ni = R
F = NlnR (N) F (R) /R = (NlnN) (F (R) / (RlnR))
(3.34)
This is a very important formula which was derived by Cooley
and Tukey in their famous paper [89] on the FFT. It states that for
a given R which is called the radix, the number of multiplications
(and additions) is proportional to NlnN. It also shows the relation
to the value of the radix, R.
In order to get some idea of the “best" radix, the number of multi-
plications to compute a length-R DFT is assumed to be F (R) = Rx.
If this is used with (3.34), the optimal R can be found.
dF/dR = 0
=>
R = e1/(x−1)
(3.35)
For x = 2 this gives R = e, with the closest integer being three.
The result of this analysis states that if no other arithmetic saving
methods other than index mapping are used, and if the length-R
Available for free at Connexions
<http://cnx.org/content/col10683/1.5>
23
DFT’s plus TFs require F = R2 multiplications, the optimal algo-
rithm requires
F = 3Nlog3N
(3.36)
multiplications for a length N = 3M DFT. Compare this with N2
for a direct calculation and the improvement is obvious.
While this is an interesting result from the analysis of the effects of
index mapping alone, in practice, index mapping is almost always
used in conjunction with special algorithms for the short length-Ni
DFT’s in (3.15). For example, if R = 2 or 4, there are no mul-
tiplications required for the short DFT’s. Only the TFs require
multiplications. Winograd (see Winorad’s Short DFT Algorithms
(Chapter 7)) has derived some algorithms for short DFT’s that re-
quire O (N) multiplications. This means that F (Ni) = KNi and
the operation count F in "Efficiencies Resulting from Index Map-
ping with the DFT" (Section 3.3: Efficiencies Resulting from In-
dex Mapping with the DFT) is independent of Ni. Therefore, the
derivative of F is zero for all Ni. Obviously, these particular cases
must be examined.
3.4 The FFT as a Recursive Evaluation of
the DFT
It is possible to formulate the DFT so a length-N DFT can be cal-
culated in terms of two length-(N/2) DFTs. And, if N = 2M, each