Fast Fourier Transforms (6x9 Version) by C. Sidney Burrus

Chapter 13

Convolution Algorithms¹

13.1 Fast Convolution by the FFT

One of the main applications of the FFT is to do convolution more

efficiently than the direct calculation from the definition which is:

$$
y(n) = \sum_{m} h(m)\, x(n-m)
\qquad (13.1)
$$

which, with a change of variables, can also be written as:

$$
y(n) = \sum_{m} x(m)\, h(n-m)
\qquad (13.2)
$$

This is often used to filter a signal x (n) with a filter whose im-

pulse response is h (n). Each output value y (n) requires N multi-

plications and N − 1 additions if y (n) and h (n) have N terms. So,

for N output values, on the order of N² arithmetic operations are

required.

Because the DFT converts convolution to multiplication:

$$
\mathrm{DFT}\{y(n)\} = \mathrm{DFT}\{h(n)\}\;\mathrm{DFT}\{x(n)\}
\qquad (13.3)
$$

¹ This content is available online at <http://cnx.org/content/m16339/1.10/>.

Available for free at Connexions

<http://cnx.org/content/col10683/1.5>


can be calculated with the FFT, bringing the order of arithmetic operations down to N log(N), which can be significant for large N. This approach, called “fast convolution,” is a form of block processing, since a whole block or segment of x(n) must be available to calculate even one output value, y(n). So, a time delay of one block length is always required. Another problem is that the convolution used for filtering is usually non-cyclic while the convolution implemented with the DFT is cyclic. This is dealt with by appending zeros to x(n) and h(n) such that the output of the cyclic convolution gives one block of the output of the desired non-cyclic convolution.
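The zero-padding trick described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not from the text; the function name `fft_convolve` and the test signals are choices made here.

```python
import numpy as np

def fft_convolve(x, h):
    """Linear (non-cyclic) convolution of x and h via the FFT.

    Both sequences are zero-padded to length len(x) + len(h) - 1 so the
    cyclic convolution computed through the DFT equals the desired
    non-cyclic convolution.
    """
    n = len(x) + len(h) - 1
    X = np.fft.rfft(x, n)          # FFTs of the zero-padded sequences
    H = np.fft.rfft(h, n)
    return np.fft.irfft(X * H, n)  # pointwise product, then inverse FFT

# Compare against the direct O(N^2) definition y(n) = sum_m h(m) x(n-m)
rng = np.random.default_rng(0)
x = rng.standard_normal(64)
h = rng.standard_normal(16)
assert np.allclose(fft_convolve(x, h), np.convolve(x, h))
```

Without the padding to length len(x) + len(h) − 1, the inverse FFT would return the cyclic convolution, whose first samples are aliased sums of the head and tail of the linear result.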

For filtering and some other applications, one wants “on-going” convolution where the filter response h(n) may be finite in length or duration, but the input x(n) is of arbitrary length. Two methods have traditionally been used to break the input into blocks and use the FFT to convolve the blocks so that the output that would have been calculated by directly implementing (13.1) or (13.2) can be constructed efficiently. These are called “overlap-add” and “overlap-save.”

13.1.1 Fast Convolution by Overlap-Add

In order to use the FFT to convolve (or filter) a long input sequence

x (n) with a finite length-M impulse response, h (n), we partition

the input sequence into segments or blocks of length L. Because

convolution (or filtering) is linear, the output is a linear sum of

the result of convolving the first block with h (n) plus the result of

convolving the second block with h (n), plus the rest. Each of these

block convolutions can be calculated by using the FFT. The output

is the inverse FFT of the product of the FFT of x (n) and the FFT

of h (n). Since the number of arithmetic operations needed to calculate the convolution directly is on the order of M² and, if done with the FFT, is on the order of M log(M), there can be a great savings by


using the FFT for large M.

The reason this procedure is not totally straightforward is that the length of the output of convolving a length-L block with a length-M filter is L + M − 1. This means the output blocks cannot simply be concatenated but must be overlapped and added; hence the name “overlap-add.”

The second issue that must be taken into account is that the overlap-add steps need non-cyclic convolution while convolution by the FFT is cyclic. This is easily handled by appending L − 1 zeros

to the impulse response and M − 1 zeros to each input block so that

all FFTs are of length M + L − 1. This means there is no aliasing

and the implemented cyclic convolution gives the same output as

the desired non-cyclic convolution.
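The overlap-add procedure described above can be sketched as follows. This is a minimal NumPy illustration; the function name, default block length, and test signals are choices made here, not from the text.

```python
import numpy as np

def overlap_add(x, h, L=256):
    """Overlap-add convolution: filter x with h using length-L input blocks.

    Each block convolution yields L + M - 1 samples; successive output
    blocks therefore overlap by M - 1 samples and are added together.
    """
    M = len(h)
    nfft = L + M - 1                    # all FFTs are of length L + M - 1
    H = np.fft.rfft(h, nfft)            # h implicitly padded with L - 1 zeros
    y = np.zeros(len(x) + M - 1)
    for start in range(0, len(x), L):
        block = x[start:start + L]      # the last block may be short
        yb = np.fft.irfft(np.fft.rfft(block, nfft) * H, nfft)
        y[start:start + nfft] += yb[:len(block) + M - 1]  # overlap and add
    return y

rng = np.random.default_rng(1)
x = rng.standard_normal(1000)
h = rng.standard_normal(31)
assert np.allclose(overlap_add(x, h, L=128), np.convolve(x, h))
```

Note that the result is independent of the block length L; only the cost per output sample changes.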

The savings in arithmetic can be considerable when implementing

convolution or performing FIR digital filtering. However, there are

two penalties. The use of blocks introduces a delay of one block

length. None of the first block of output can be calculated until

all of the first block of input is available. This is not a problem

for “off line" or “batch" processing but can be serious for real-time processing. The second penalty is the memory required to store

and process the blocks. The continuing reduction of memory cost

often removes this problem.

The efficiency in terms of number of arithmetic operations per output point increases for large blocks because of the M log(M) requirements of the FFT. However, if the blocks become very large (L ≫ M), much of the input block will be the appended zeros and efficiency is lost. For any particular application, taking into account the particular filter, FFT algorithm, and hardware being used, a plot of efficiency vs. block length L should be made, and L chosen to maximize efficiency given any other constraints that are applicable.
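One rough way to carry out such an optimization numerically is sketched below. The cost model (an assumed c·n·log₂(n) FFT cost plus n pointwise multiplies, amortized over L output samples) is only illustrative; as the text stresses, the real constants depend on the particular filter, FFT algorithm, and hardware, so in practice the curve should be measured rather than modeled.

```python
import math

def ops_per_output(L, M, c=3.0):
    """Rough overlap-add cost model, per output sample.

    Per length-L block we assume FFT work proportional to n * log2(n) with
    n = L + M - 1, plus n pointwise multiplies; c is an assumed constant
    absorbing the number of transforms and per-butterfly cost.
    """
    n = L + M - 1
    return (c * n * math.log2(n) + n) / L   # cost amortized over L outputs

M = 64                                      # illustrative filter length
# Search power-of-two block lengths for the cheapest choice under this model
best_L = min((2 ** k for k in range(4, 16)),
             key=lambda L: ops_per_output(L, M))
```

Under this model, very small L pays the full FFT cost for few outputs, and very large L wastes work on appended zeros, so `ops_per_output` has an interior minimum, which is the qualitative behavior the text describes.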


Usually, the block convolutions are done by the FFT, but they could

be done by any efficient, finite length method. One could use “rect-

angular transforms" or “number-theoretic transforms". A general-

ization of this method is presented later in the notes.

13.1.2 Fast Convolution by Overlap-Save

An alternative approach to overlap-add can be developed by starting with segmenting the output rather than the input. If one considers the calculation of a block of output, it is seen that not only the corresponding input block is needed, but part of the preceding input block is also needed. Indeed, one can show that a length M + L − 1 segment of the input is needed for each output block. So, one saves the last part of the preceding block and concatenates it with the current input block, then convolves that with h(n) to calculate the current output block.
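The save-and-concatenate procedure just described can be sketched as follows. Again this is an illustrative NumPy sketch with names and parameters chosen here; the first M − 1 samples of each cyclic convolution are the aliased ones and are discarded.

```python
import numpy as np

def overlap_save(x, h, L=256):
    """Overlap-save: each length-(L + M - 1) input segment consists of the
    last M - 1 samples of the preceding segment followed by L new samples.

    The first M - 1 outputs of each cyclic convolution are corrupted by
    wrap-around and are discarded; the remaining L outputs are kept.
    """
    M = len(h)
    nfft = L + M - 1
    H = np.fft.rfft(h, nfft)
    xp = np.concatenate([np.zeros(M - 1), x])    # zeros stand in for the
    y = []                                       # nonexistent "previous" block
    for start in range(0, len(x), L):
        seg = xp[start:start + nfft]             # overlapping input segment
        seg = np.pad(seg, (0, nfft - len(seg)))  # zero-pad the final segment
        yb = np.fft.irfft(np.fft.rfft(seg) * H, nfft)
        y.append(yb[M - 1:])                     # discard the aliased part
    return np.concatenate(y)[:len(x)]

rng = np.random.default_rng(2)
x = rng.standard_normal(1000)
h = rng.standard_normal(31)
assert np.allclose(overlap_save(x, h, L=128), np.convolve(x, h)[:len(x)])
```

Unlike overlap-add, no output samples are added together; the overlap is entirely on the input side, which is why the method is sometimes preferred in streaming implementations.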

13.2 Block Processing, a Generalization of

Overlap Methods

Convolution is intimately related to the DFT. It was shown in The

DFT as Convolution or Filtering (Chapter 5) that a prime length

DFT could be converted to cyclic convolution. It has long been known [276] that convolution can be calculated by multiplying the DFTs of signals.

An important question is what is the fastest method for calculating

digital convolution. There are several methods that each have some

advantage. The earliest method for fast convolution was the use

of sectioning with overlap-add or overlap-save and the FFT [276],

[300], [66]. In most cases the convolution is of real data and, there-

fore, real-data FFTs should be used. That approach is still proba-

bly the fastest method for longer convolution on a general purpose


computer or microprocessor. The shorter convolutions should sim-

ply be calculated directly.

13.3 Introduction

The partitioning of long or infinite strings of data into shorter sec-

tions or blocks has been used to allow application of the FFT to

realize on-going or continuous convolution [368], [181]. This sec-

tion develops the idea of block processing and shows that it is a

generalization of the overlap-add and overlap-save methods [368], [147]. These notes further generalize the idea to a multidimensional formulation of convolution [3], [47]. Moving in the opposite direc-

tion, it is shown that, rather than partitioning a string of scalars into

blocks and then into blocks of blocks, one can partition a scalar

number into blocks of bits and then include the operation of mul-

tiplication in the signal processing formulation. This is called dis-

tributed arithmetic [45] and, since it describes operations at the bit

level, is completely general. These notes try to present a coherent

development of these ideas.

13.4 Block Signal Processing

In this section the usual convolution and recursion that implement FIR and IIR discrete-time filters are reformulated in terms of vectors and matrices. Because the same data is partitioned and grouped in a variety of ways, it is important to have a consistent notation in order to be clear. The nth element of a data sequence is expressed as h(n) or, in some cases to simplify, $h_n$. A block or finite-length column vector is denoted $\mathbf{h}_n$, with n indicating the nth block or section of a longer vector. A matrix, square or rectangular, is indicated by an upper case letter such as H, with a subscript if appropriate.


13.4.1 Block Convolution

The operation of a finite impulse response (FIR) filter is described

by a finite convolution as

$$
y(n) = \sum_{k=0}^{L-1} h(k)\, x(n-k)
\qquad (13.4)
$$

where x (n) is causal, h (n) is causal and of length L, and the time

index n goes from zero to infinity or some large value. With a

change of index variables this becomes

$$
y(n) = \sum_{k} h(n-k)\, x(k)
\qquad (13.5)
$$

which can be expressed as a matrix operation by

$$
\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ \vdots \end{bmatrix}
=
\begin{bmatrix}
h_0 & 0 & 0 & \cdots & 0 \\
h_1 & h_0 & 0 & & \\
h_2 & h_1 & h_0 & & \\
\vdots & & & \ddots &
\end{bmatrix}
\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \end{bmatrix}
\qquad (13.6)
$$

The H matrix of impulse response values is partitioned into N by N square submatrices, and the X and Y vectors are partitioned into length-N blocks or sections. This is illustrated for N = 3 by


$$
H_0 = \begin{bmatrix} h_0 & 0 & 0 \\ h_1 & h_0 & 0 \\ h_2 & h_1 & h_0 \end{bmatrix},
\quad
H_1 = \begin{bmatrix} h_3 & h_2 & h_1 \\ h_4 & h_3 & h_2 \\ h_5 & h_4 & h_3 \end{bmatrix},
\quad \text{etc.}
\qquad (13.7)
$$

$$
\mathbf{x}_0 = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix},
\quad
\mathbf{x}_1 = \begin{bmatrix} x_3 \\ x_4 \\ x_5 \end{bmatrix},
\quad
\mathbf{y}_0 = \begin{bmatrix} y_0 \\ y_1 \\ y_2 \end{bmatrix},
\quad \text{etc.}
\qquad (13.8)
$$

Substituting these definitions into (13.6) gives

$$
\begin{bmatrix} \mathbf{y}_0 \\ \mathbf{y}_1 \\ \mathbf{y}_2 \\ \vdots \end{bmatrix}
=
\begin{bmatrix}
H_0 & 0 & 0 & \cdots & 0 \\
H_1 & H_0 & 0 & & \\
H_2 & H_1 & H_0 & & \\
\vdots & & & \ddots &
\end{bmatrix}
\begin{bmatrix} \mathbf{x}_0 \\ \mathbf{x}_1 \\ \mathbf{x}_2 \\ \vdots \end{bmatrix}
\qquad (13.9)
$$


The general expression for the nth output block is

$$
\mathbf{y}_n = \sum_{k=0}^{n} H_{n-k}\, \mathbf{x}_k
\qquad (13.10)
$$

which is a vector or block convolution. Since the matrix-vector

multiplication within the block convolution is itself a convolution,

(13.10) is a sort of convolution of convolutions and the finite length

matrix-vector multiplication can be carried out using the FFT or

other fast convolution methods.
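The block convolution (13.10) can be checked numerically against the ordinary scalar convolution it repartitions. The sketch below builds the N by N partitions H_k of the lower-triangular Toeplitz filter matrix as in (13.7); the function name and test sizes are illustrative choices, and for simplicity the matrix-vector products are done directly rather than by the FFT.

```python
import numpy as np

def block_convolve(h, x, N=3):
    """Block (vector) convolution y_n = sum_k H_{n-k} x_k, as in (13.10).

    H_k are the N x N partitions of the lower-triangular Toeplitz filter
    matrix of (13.6)/(13.7); x_k and y_n are length-N blocks.
    """
    nblocks = len(x) // N
    hp = np.concatenate([h, np.zeros(nblocks * N - len(h))])  # zero-extend h

    def Hk(k):
        # Entry (i, j) of block k is h[kN + i - j] (zero when out of range)
        M = np.zeros((N, N))
        for i in range(N):
            for j in range(N):
                idx = k * N + i - j
                if 0 <= idx < len(hp):
                    M[i, j] = hp[idx]
        return M

    xb = x.reshape(nblocks, N)
    y = np.zeros((nblocks, N))
    for n in range(nblocks):
        for k in range(n + 1):
            y[n] += Hk(n - k) @ xb[k]      # the block convolution sum
    return y.ravel()

rng = np.random.default_rng(3)
h = rng.standard_normal(5)
x = rng.standard_normal(12)
assert np.allclose(block_convolve(h, x, N=3), np.convolve(h, x)[:len(x)])
```

Replacing the inner matrix-vector products with FFT-based convolutions recovers the overlap methods as special cases.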

The equation for one output block can be written as the product

$$
\mathbf{y}_2 = \begin{bmatrix} H_2 & H_1 & H_0 \end{bmatrix}
\begin{bmatrix} \mathbf{x}_0 \\ \mathbf{x}_1 \\ \mathbf{x}_2 \end{bmatrix}
\qquad (13.11)
$$

and the effects of one input block can be written

$$
\begin{bmatrix} H_0 \\ H_1 \\ H_2 \end{bmatrix} \mathbf{x}_1
=
\begin{bmatrix} \mathbf{y}_0 \\ \mathbf{y}_1 \\ \mathbf{y}_2 \end{bmatrix}
\qquad (13.12)
$$

These are generalized statements of overlap-save and overlap-add [368], [147]. The block length can be longer, shorter, or equal to

the filter length.

13.4.2 Block Recursion

Although less well-known, IIR filters can also be implemented

with block processing [145], [74], [396], [43], [44]. The block

form of an IIR filter is developed in much the same way as for the

block convolution implementation of the FIR filter. The general


constant-coefficient difference equation which describes an IIR filter with recursive coefficients $a_l$, convolution coefficients $b_k$, input signal x(n), and output signal y(n) is given by

$$
y(n) = \sum_{l=1}^{N-1} a_l\, y_{n-l} + \sum_{k=0}^{M-1} b_k\, x_{n-k}
\qquad (13.13)
$$

using both functional notation and subscripts, depending on which

is easier and clearer. The impulse response h (n) is

$$
h(n) = \sum_{l=1}^{N-1} a_l\, h(n-l) + \sum_{k=0}^{M-1} b_k\, \delta(n-k)
\qquad (13.14)
$$

which can be written in matrix operator form

$$
\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
a_1 & 1 & 0 & & \\
a_2 & a_1 & 1 & & \\
a_3 & a_2 & a_1 & & \\
0 & a_3 & a_2 & & \\
\vdots & & & \ddots &
\end{bmatrix}
\begin{bmatrix} h_0 \\ h_1 \\ h_2 \\ h_3 \\ h_4 \\ \vdots \end{bmatrix}
=
\begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \\ 0 \\ \vdots \end{bmatrix}
\qquad (13.15)
$$

In terms of N by N submatrices and length-N blocks, this becomes

$$
\begin{bmatrix}
A_0 & 0 & 0 & \cdots & 0 \\
A_1 & A_0 & 0 & & \\
0 & A_1 & A_0 & & \\
\vdots & & & \ddots &
\end{bmatrix}
\begin{bmatrix} \mathbf{h}_0 \\ \mathbf{h}_1 \\ \mathbf{h}_2 \\ \vdots \end{bmatrix}
=
\begin{bmatrix} \mathbf{b}_0 \\ \mathbf{b}_1 \\ 0 \\ \vdots \end{bmatrix}
\qquad (13.16)
$$

From this formulation, a block recursive equation can be written

that will generate the impulse response block by block.

$$
A_0\, \mathbf{h}_n + A_1\, \mathbf{h}_{n-1} = 0 \quad \text{for} \quad n \ge 2
\qquad (13.17)
$$


$$
\mathbf{h}_n = -A_0^{-1} A_1\, \mathbf{h}_{n-1} = K\, \mathbf{h}_{n-1} \quad \text{for} \quad n \ge 2
\qquad (13.18)
$$

with initial conditions given by

$$
\mathbf{h}_1 = -A_0^{-1} A_1 A_0^{-1}\, \mathbf{b}_0 + A_0^{-1}\, \mathbf{b}_1
\qquad (13.19)
$$

This can also be written to generate the square partitions of the

impulse response matrix by

$$
H_n = K\, H_{n-1} \quad \text{for} \quad n \ge 2
\qquad (13.20)
$$

with initial conditions given by

$$
H_1 = K A_0^{-1} B_0 + A_0^{-1} B_1
\qquad (13.21)
$$

and $K = -A_0^{-1} A_1$. This recursively generates square submatrices

of H similar to those defined in (13.7) and (13.9) and shows the

basic structure of the dynamic system.
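The recursion (13.17)-(13.19) can be checked with a small numerical sketch. This is an illustrative example, not from the text: the second-order coefficient values are chosen arbitrarily, and the sign convention follows the matrix form (13.15), where the scalar relation is h(n) + a₁h(n−1) + a₂h(n−2) = bₙ.

```python
import numpy as np

# Illustrative second-order filter; a[0] = 1, a[1], a[2] recursive coefficients
a = np.array([1.0, -0.9, 0.2])
b = np.array([1.0, 0.5, 0.25])
N = 3                                    # block length (here >= filter order)

# N x N partitions of the banded Toeplitz operator of (13.15)/(13.16)
A0 = np.array([[1.0,  0.0, 0.0],
               [a[1], 1.0, 0.0],
               [a[2], a[1], 1.0]])
A1 = np.array([[0.0, a[2], a[1]],
               [0.0, 0.0,  a[2]],
               [0.0, 0.0,  0.0]])
b0 = b[:N]                               # first block of the b sequence
b1 = np.zeros(N)                         # later blocks of b are zero

A0inv = np.linalg.inv(A0)
K = -A0inv @ A1                          # as in (13.18)
h0 = A0inv @ b0                          # from the first block row of (13.16)
h1 = K @ h0 + A0inv @ b1                 # matches (13.19) when expanded
blocks = [h0, h1]
for _ in range(4):
    blocks.append(K @ blocks[-1])        # h_n = K h_{n-1} for n >= 2
h_block = np.concatenate(blocks)

# Direct scalar recursion for comparison (same sign convention as (13.15))
h = np.zeros(len(h_block))
for n in range(len(h)):
    h[n] = b[n] if n < len(b) else 0.0
    for l in range(1, len(a)):
        if n - l >= 0:
            h[n] -= a[l] * h[n - l]
assert np.allclose(h_block, h)
```

The block recursion reproduces the impulse response exactly; its interest is that each step is a single N by N matrix multiply, which exposes parallelism and different round-off behavior than the scalar recursion.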

Next, we develop the recursive formulation for a general input as described by the scalar difference equation (13.13) and in matrix operator form by


 

$$
\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
a_1 & 1 & 0 & & \\
a_2 & a_1 & 1 & & \\
a_3 & a_2 & a_1 & & \\
0 & a_3 & a_2 & & \\
\vdots & & & \ddots &
\end{bmatrix}
\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \\ y_4 \\ \vdots \end{bmatrix}
=
\begin{bmatrix}
b_0 & 0 & 0 & \cdots & 0 \\
b_1 & b_0 & 0 & & \\
b_2 & b_1 & b_0 & & \\
0 & b_2 & b_1 & & \\
0 & 0 & b_2 & & \\
\vdots & & & \ddots &
\end{bmatrix}
\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \\ x_4 \\ \vdots \end{bmatrix}
\qquad (13.22)
$$

which, after substituting the definitions of the submatrices and assuming the block length is larger than the order of the numerator or denominator, becomes


 

$$
\begin{bmatrix}
A_0 & 0 & 0 & \cdots & 0 \\
A_1 & A_0 & 0 & & \\
0 & A_1 & A_0 & & \\
\vdots & & & \ddots &
\end{bmatrix}
\begin{bmatrix} \mathbf{y}_0 \\ \mathbf{y}_1 \\ \mathbf{y}_2 \\ \vdots \end{bmatrix}
=
\begin{bmatrix}
B_0 & 0 & 0 & \cdots & 0 \\
B_1 & B_0 & 0 & & \\
0 & B_1 & B_0 & & \\
\vdots & & & \ddots &
\end{bmatrix}
\begin{bmatrix} \mathbf{x}_0 \\ \mathbf{x}_1 \\ \mathbf{x}_2 \\ \vdots \end{bmatrix}
$$