Fast Fourier Transforms (6x9 Version) by C. Sidney Burrus - HTML preview

/ Home / Engineering / Fast Fourier Transforms (6x9 Version)

PLEASE NOTE: This is an HTML preview only and some elements such as links or page numbers may be incorrect.
Download the book in PDF, ePub for a complete version.

Chapter 3

Multidimensional Index

Mapping1

A powerful approach to the development of efficient algorithms is

to break a large problem into multiple small ones. One method for

doing this with both the DFT and convolution uses a linear change

of index variables to map the original one-dimensional problem

into a multi-dimensional problem. This approach provides a uni-

fied derivation of the Cooley-Tukey FFT, the prime factor algo-

rithm (PFA) FFT, and the Winograd Fourier transform algorithm

(WFTA) FFT. It can also be applied directly to convolution to break

it down into multiple short convolutions that can be executed faster

than a direct implementation. It is often easy to translate an algo-

rithm using index mapping into an efficient program.

The basic definition of the discrete Fourier transform (DFT) is

N−1

C (k) = ∑ x(n) W nk

(3.1)

n=0

√

where n, k, and N are integers, j =

−1, the basis functions are

1This content is available online at <http://cnx.org/content/m16326/1.12/>.

Available for free at Connexions

<http://cnx.org/content/col10683/1.5>

CHAPTER 3. MULTIDIMENSIONAL

INDEX MAPPING

the N roots of unity,

WN = e− j2π/N

(3.2)

and k = 0, 1, 2, · · · , N − 1.

If the N values of the transform are calculated from the N val-

ues of the data, x (n), it is easily seen that N2 complex multipli-

cations and approximately that same number of complex additions

are required. One method for reducing this required arithmetic is

to use an index mapping (a change of variables) to change the one-

dimensional DFT into a two- or higher dimensional DFT. This is

one of the ideas behind the very efficient Cooley-Tukey [89] and

Winograd [404] algorithms. The purpose of index mapping is to

change a large problem into several easier ones [46], [120]. This

is sometimes called the “divide and conquer" approach [26] but a

more accurate description would be “organize and share" which

explains the process of redundancy removal or reduction.

3.1 The Index Map

For a length-N sequence, the time index takes on the values

n = 0, 1, 2, ..., N − 1

(3.3)

When the length of the DFT is not prime, N can be factored as

N = N1N2 and two new independent variables can be defined over

the ranges

n1 = 0, 1, 2, ..., N1 − 1

(3.4)

n2 = 0, 1, 2, ..., N2 − 1

(3.5)

Available for free at Connexions

<http://cnx.org/content/col10683/1.5>

A linear change of variables is defined which maps n1 and n2 to n

and is expressed by

n = ((K1n1 + K2n2))

(3.6)

where Ki are integers and the notation ((x)) denotes the integer

residue of x modulo N[232]. This map defines a relation between

all possible combinations of n1 and n2 in (3.4) and (3.5) and the

values for n in (3.3). The question as to whether all of the n in

(3.3) are represented, i.e., whether the map is one-to-one (unique),

has been answered in [46] showing that certain integer Ki always

exist such that the map in (3.6) is one-to-one. Two cases must be

considered.

3.1.1 Case 1.

N1 and N2 are relatively prime, i.e., the greatest common divisor

(N1, N2) = 1.

The integer map of (3.6) is one-to-one if and only if:

(K1 = aN2) and/or (K2 = bN1) and (K1, N1) = (3.7)

(K2, N2) = 1

where a and b are integers.

3.1.2 Case 2.

N1 and N2 are not relatively prime, i.e., (N1, N2) > 1.

The integer map of (3.6) is one-to-one if and only if:

(K1 = aN2) and (K2 = bN1) and (a, N1) = (K2, N2) = 1

(3.8)

Available for free at Connexions

<http://cnx.org/content/col10683/1.5>

CHAPTER 3. MULTIDIMENSIONAL

INDEX MAPPING

(K1 = aN2) and (K2 = bN1) and (K1, N1) = (b, N2) = 1

(3.9)

Reference [46] should be consulted for the details of these condi-

tions and examples. Two classes of index maps are defined from

these conditions.

3.1.3 Type-One Index Map:

The map of (3.6) is called a type-one map when integers a and b

exist such that

K1 = aN2 and K2 = bN1

(3.10)

3.1.4 Type-Two Index Map:

The map of (3.6) is called a type-two map when when integers a

and b exist such that

K1 = aN2 or K2 = bN1,

but not both.

(3.11)

The type-one can be used only if the factors of N are relatively

prime, but the type-two can be used whether they are relatively

prime or not. Good [149], Thomas, and Winograd [404] all used

the type-one map in their DFT algorithms. Cooley and Tukey

[89] used the type-two in their algorithms, both for a fixed radix

N = RM and a mixed radix [301].

The frequency index is defined by a map similar to (3.6) as

k = ((K3k1 + K4k2))

(3.12)

where the same conditions, (3.7) and (3.8), are used for determin-

ing the uniqueness of this map in terms of the integers K3 and K4.

Available for free at Connexions

<http://cnx.org/content/col10683/1.5>

Two-dimensional arrays for the input data and its DFT are defined

using these index maps to give

x (n1, n2) = x((K1n1 + K2n2))

(3.13)

X (k1, k2) = X ((K3k1 + K4k2))

(3.14)

In some of the following equations, the residue reduction notation

will be omitted for clarity. These changes of variables applied to

the definition of the DFT given in (3.1) give

N2−1N1−1

C (k) = ∑

∑

x (n) W K1K3n1k1 W K1K4n1k2 W K2K3n2k1 W K2K4n2k2

(3.15) N

n2=0 n1=0

where all of the exponents are evaluated modulo N.

The amount of arithmetic required to calculate (3.15) is the same as

in the direct calculation of (3.1). However, because of the special

nature of the DFT, the integer constants Ki can be chosen in such

a way that the calculations are “uncoupled" and the arithmetic is

reduced. The requirements for this are

((K1K4)) = 0 and/or ((K

= 0

(3.16)

2K3))N

When this condition and those for uniqueness in (3.6) are applied,

it is found that the Ki may always be chosen such that one of the

terms in (3.16) is zero. If the Ni are relatively prime, it is always

possible to make both terms zero. If the Ni are not relatively prime,

only one of the terms can be set to zero. When they are relatively

prime, there is a choice, it is possible to either set one or both to

zero. This in turn causes one or both of the center two W terms in

(3.15) to become unity.

Available for free at Connexions

<http://cnx.org/content/col10683/1.5>

CHAPTER 3. MULTIDIMENSIONAL

INDEX MAPPING

An example of the Cooley-Tukey radix-4 FFT for a length-16 DFT

uses the type-two map with K1 = 4, K2 = 1, K3 = 1, K4 = 4 giving

n = 4n1 + n2

(3.17)

k = k1 + 4k2

(3.18)

The residue reduction in (3.6) is not needed here since n does not

exceed N as n1 and n2 take on their values. Since, in this example,

the factors of N have a common factor, only one of the conditions

in (3.16) can hold and, therefore, (3.15) becomes

C (k1, k2) = C (k) = ∑ ∑ x (n) W n1k1 W n2k1 W n2k2

(3.19)

n2=0n1=0

Note the definition of WN in (3.3) allows the simple form of

W K1K3 = W

This has the form of a two-dimensional DFT with an extra term

W16, called a “twiddle factor". The inner sum over n1 represents

four length-4 DFTs, the W16 term represents 16 complex multipli-

cations, and the outer sum over n2 represents another four length-4

DFTs. This choice of the Ki “uncouples" the calculations since the

first sum over n1 for n2 = 0 calculates the DFT of the first row of

the data array x (n1, n2), and those data values are never needed

in the succeeding row calculations. The row calculations are inde-

pendent, and examination of the outer sum shows that the column

calculations are likewise independent. This is illustrated in Fig-

ure 3.1.

Available for free at Connexions

<http://cnx.org/content/col10683/1.5>

x(n ,n )

x(k ,n ) x TF

X(k ,k )

Figure 3.1: Uncoupling of the Row and Column Calculations

(Rectangles are Data Arrays)

The left 4-by-4 array is the mapped input data, the center array has

the rows transformed, and the right array is the DFT array. The

row DFTs and the column DFTs are independent of each other.

The twiddle factors (TF) which are the center W in (3.19), are the

multiplications which take place on the center array of Figure 3.1.

This uncoupling feature reduces the amount of arithmetic required

and allows the results of each row DFT to be written back over

the input data locations, since that input row will not be needed

again. This is called “in-place" calculation and it results in a large

memory requirement savings.

An example of the type-two map used when the factors of N are

relatively prime is given for N = 15 as

n = 5n1 + n2

(3.20)

k = k1 + 3k2

(3.21)

Available for free at Connexions

<http://cnx.org/content/col10683/1.5>

CHAPTER 3. MULTIDIMENSIONAL

INDEX MAPPING

The residue reduction is again not explicitly needed. Although the

factors 3 and 5 are relatively prime, use of the type-two map sets

only one of the terms in (3.16) to zero. The DFT in (3.15) becomes

X = ∑ ∑ x W n1k1 W n2k1 W n2k2

(3.22)

n2=0n1=0

which has the same form as (3.19), including the existence of the

twiddle factors (TF). Here the inner sum is five length-3 DFTs, one

for each value of k1. This is illustrated in (3.2) where the rectangles

are the 5 by 3 data arrays and the system is called a “mixed radix"

FFT.

x(n ,n )

x(k ,n )

X(k ,k )

Figure 3.2: Uncoupling of the Row and Column Calculations

(Rectangles are Data Arrays)

An alternate illustration is shown in Figure 3.3 where the rectan-

gles are the short length 3 and 5 DFTs.

Available for free at Connexions

<http://cnx.org/content/col10683/1.5>

Figure 3.3: Uncoupling of the Row and Column Calculations

(Rectangles are Short DFTs)

The type-one map is illustrated next on the same length-15 exam-

ple. This time the situation of (3.7) with the “and" condition is

used in (3.10) using an index map of

n = 5n1 + 3n2

(3.23)

and

k = 10k1 + 6k2

(3.24)

The residue reduction is now necessary. Since the factors of N are

relatively prime and the type-one map is being used, both terms in

(3.16) are zero, and (3.15) becomes

X = ∑ ∑ x W n1k1W n2k2

(3.25)

n2=0n1=0

which is similar to (3.22), except that now the type-one map gives

a pure two-dimensional DFT calculation with no TFs, and the sums

can be done in either order. Figures Figure 3.2 and Figure 3.3 also

Available for free at Connexions

<http://cnx.org/content/col10683/1.5>

CHAPTER 3. MULTIDIMENSIONAL

INDEX MAPPING

describe this case but now there are no Twiddle Factor multiplica-

tions in the center and the resulting system is called a “prime factor

algorithm" (PFA).

The purpose of index mapping is to improve the arithmetic effi-

ciency. For example a direct calculation of a length-16 DFT re-

quires 16^2 or 256 real multiplications (recall, one complex multi-

plication requires 4 real multiplications and 2 real additions) and an

uncoupled version requires 144. A direct calculation of a length-

15 DFT requires 225 multiplications but with a type-two map only

135 and with a type-one map, 120. Recall one complex multipli-

cation requires four real multiplications and two real additions.

Algorithms of practical interest use short DFT’s that require fewer

than N2 multiplications. For example, length-4 DFTs require no

multiplications and, therefore, for the length-16 DFT, only the TFs

must be calculated. That calculation uses 16 multiplications, many

fewer than the 256 or 144 required for the direct or uncoupled cal-

culation.

The concept of using an index map can also be applied to convolu-

tion to convert a length N = N1N2 one-dimensional cyclic convolu-

tion into a N1 by N2 two-dimensional cyclic convolution [46], [6].

There is no savings of arithmetic from the mapping alone as there

is with the DFT, but savings can be obtained by using special short

algorithms along each dimension. This is discussed in Algorithms

for Data with Restrictions (Chapter 12) .

3.2 In-Place Calculation of the DFT and

Scrambling

Because use of both the type-one and two index maps uncouples

the calculations of the rows and columns of the data array, the re-

sults of each short length Ni DFT can be written back over the data

Available for free at Connexions

<http://cnx.org/content/col10683/1.5>

as it will not be needed again after that particular row or column

is transformed. This is easily seen from Figures Figure 3.1, Fig-

ure 3.2, and Figure 3.3 where the DFT of the first row of x (n1, n2)

can be put back over the data rather written into a new array. After

all the calculations are finished, the total DFT is in the array of the

original data. This gives a significant memory savings over using

a separate array for the output.

Unfortunately, the use of in-place calculations results in the order

of the DFT values being permuted or scrambled. This is because

the data is indexed according to the input map (3.6) and the results

are put into the same locations rather than the locations dictated by

the output map (3.12). For example with a length-8 radix-2 FFT,

the input index map is

n = 4n1 + 2n2 + n3

(3.26)

which to satisfy (3.16) requires an output map of

k = k1 + 2k2 + 4k3

(3.27)

The in-place calculations will place the DFT results in the loca-

tions of the input map and these should be reordered or unscram-

bled into the locations given by the output map. Examination of

these two maps shows the scrambled output to be in a “bit reversed"

order.

For certain applications, this scrambled output order is not impor-

tant, but for many applications, the order must be unscrambled be-

fore the DFT can be considered complete. Because the radix of

the radix-2 FFT is the same as the base of the binary number rep-

resentation, the correct address for any term is found by reversing

the binary bits of the address. The part of most FFT programs that

does this reordering is called a bit-reversed counter. Examples of

various unscramblers are found in [146], [60] and in the appen-

dices.

Available for free at Connexions

<http://cnx.org/content/col10683/1.5>

CHAPTER 3. MULTIDIMENSIONAL

INDEX MAPPING

The development here uses the input map and the resulting algo-

rithm is called “decimation-in-frequency". If the output rather than

the input map is used to derive the FFT algorithm so the correct

output order is obtained, the input order must be scrambled so that

its values are in locations specified by the output map rather than

the input map. This algorithm is called “decimation-in-time". The

scrambling is the same bit-reverse counting as before, but it pre-

cedes the FFT algorithm in this case. The same process of a post-

unscrambler or pre-scrambler occurs for the in-place calculations

with the type-one maps. Details can be found in [60], [56]. It is

possible to do the unscrambling while calculating the FFT and to

avoid a separate unscrambler. This is done for the Cooley-Tukey

FFT in [192] and for the PFA in [60], [56], [319].

If a radix-2 FFT is used, the unscrambler is a bit-reversed counter.

If a radix-4 FFT is used, the unscrambler is a base-4 reversed

counter, and similarly for radix-8 and others. However, if for the

radix-4 FFT, the short length-4 DFTs (butterflies) have their out-

puts in bit-revered order, the output of the total radix-4 FFT will

be in bit-reversed order, not base-4 reversed order. This means any

radix-2n FFT can use the same radix-2 bit-reversed counter as an

unscrambler if the proper butterflies are used.

3.3 Efficiencies Resulting from Index Map-

ping with the DFT

In this section the reductions in arithmetic in the DFT that result

from the index mapping alone will be examined. In practical al-

gorithms several methods are always combined, but it is helpful in

understanding the effects of a particular method to study it alone.

The most general form of an uncoupled two-dimensional DFT is

given by

Available for free at Connexions

<http://cnx.org/content/col10683/1.5>

N2−1 N1−1

X (k1, k2) = ∑ { ∑ x (n1, n2) f1 (n1, n2, k1)} f2 (n2, k

(3.28)1, k2)

n2=0 n1=0

where the inner sum calculates N2 length-N1 DFT’s and, if for a

type-two map, the effects of the TFs. If the number of arithmetic

operations for a length-N DFT is denoted by F (N), the number

of operations for this inner sum is F = N2F (N1). The outer sum

which gives N1 length-N2 DFT’s requires N1F (N2) operations.

The total number of arithmetic operations is then

F = N2F (N1) + N1F (N2)

(3.29)

The first question to be considered is for a fixed length N, what

is the optimal relation of N1 and N2 in the sense of minimizing the

required amount of arithmetic. To answer this question, N1 and N2

are temporarily assumed to be real variables rather than integers. If

the short length-Ni DFT’s in (3.28) and any TF multiplications are

assumed to require N2 operations, i.e. F (N

, "Efficiencies

i) = N2

Resulting from Index Mapping with the DFT" (Section 3.3: Effi-

ciencies Resulting from Index Mapping with the DFT) becomes

F = N2N21 + N1N22 = N (N1 + N2) = N N1 + NN−1

(3.30)

To find the minimum of F over N1, the derivative of F with re-

spect to N1 is set to zero (temporarily assuming the variables to be

continuous) and the result requires N1 = N2.

dF/dN1 = 0

N1 = N2

(3.31)

This result is also easily seen from the symmetry of N1 and N2

in N = N1N2. If a more general model of the arithmetic complex-

ity of the short DFT’s is used, the same result is obtained, but a

closer examination must be made to assure that N1 = N2 is a global

minimum.

If only the effects of the index mapping are to be considered, then

the F (N) = N2 model is used and (3.31) states that the two factors

Available for free at Connexions

<http://cnx.org/content/col10683/1.5>

CHAPTER 3. MULTIDIMENSIONAL

INDEX MAPPING

should be equal. If there are M factors, a similar reasoning shows

that all M factors should be equal. For the sequence of length

N = RM

(3.32)

there are now M length-R DFT’s and, since the factors are all

equal, the index map must be type two. This means there must be

twiddle factors.

In order to simplify the analysis, only the number of multiplica-

tions will be considered. If the number of multiplications for a

length-R DFT is F (R), then the formula for operation counts in

(3.30) generalizes to

F = N ∑F (Ni)/Ni = NMF (R)/R

(3.33)

i=1

for Ni = R

F = NlnR (N) F (R) /R = (NlnN) (F (R) / (RlnR))

(3.34)

This is a very important formula which was derived by Cooley

and Tukey in their famous paper [89] on the FFT. It states that for

a given R which is called the radix, the number of multiplications

(and additions) is proportional to NlnN. It also shows the relation

to the value of the radix, R.

In order to get some idea of the “best" radix, the number of multi-

plications to compute a length-R DFT is assumed to be F (R) = Rx.

If this is used with (3.34), the optimal R can be found.

dF/dR = 0

R = e1/(x−1)

(3.35)

For x = 2 this gives R = e, with the closest integer being three.

The result of this analysis states that if no other arithmetic saving

methods other than index mapping are used, and if the length-R

Available for free at Connexions

<http://cnx.org/content/col10683/1.5>

DFT’s plus TFs require F = R2 multiplications, the optimal algo-

rithm requires

F = 3Nlog3N

(3.36)

multiplications for a length N = 3M DFT. Compare this with N2

for a direct calculation and the improvement is obvious.

While this is an interesting result from the analysis of the effects of

index mapping alone, in practice, index mapping is almost always

used in conjunction with special algorithms for the short length-Ni

DFT’s in (3.15). For example, if R = 2 or 4, there are no mul-

tiplications required for the short DFT’s. Only the TFs require

multiplications. Winograd (see Winorad’s Short DFT Algorithms

(Chapter 7)) has derived some algorithms for short DFT’s that re-

quire O (N) multiplications. This means that F (Ni) = KNi and

the operation count F in "Efficiencies Resulting from Index Map-

ping with the DFT" (Section 3.3: Efficiencies Resulting from In-

dex Mapping with the DFT) is independent of Ni. Therefore, the

derivative of F is zero for all Ni. Obviously, these particular cases

must be examined.

3.4 The FFT as a Recursive Evaluation of

the DFT

It is possible to formulate the DFT so a length-N DFT can be cal-

culated in terms of two length-(N/2) DFTs. And, if N = 2M, each