Introduction to Compressive Sensing by Marco F. Duarte

Chapter 2 Sparsity and Compressibility

2.1 Introduction to vector spaces*

Summary

This module provides a brief review of some of the key concepts in vector spaces that will be required in developing the theory of compressive sensing.

For much of its history, signal processing has focused on signals produced by physical systems. Many natural and man-made systems can be modeled as linear. Thus, it is natural to consider signal models that complement this kind of linear structure. This notion has been incorporated into modern signal processing by modeling signals as vectors living in an appropriate vector space. This captures the linear structure that we often desire, namely that if we add two signals together then we obtain a new, physically meaningful signal. Moreover, vector spaces allow us to apply intuitions and tools from geometry in $\mathbb{R}^3$, such as lengths, distances, and angles, to describe and compare signals of interest. This is useful even when our signals live in high-dimensional or infinite-dimensional spaces.

Throughout this course, we will treat signals as real-valued functions having domains that are either continuous or discrete, and either infinite or finite. These assumptions will be made clear as necessary in each chapter. In this course, we will assume that the reader is relatively comfortable with the key concepts in vector spaces. We now provide only a brief review of some of the key concepts in vector spaces that will be required in developing the theory of compressive sensing (CS). For a more thorough review of vector spaces see this introductory course in Digital Signal Processing.

We will typically be concerned with normed vector spaces, i.e., vector spaces endowed with a norm. In the case of a discrete, finite domain, we can view our signals as vectors in an $N$-dimensional Euclidean space, denoted by $\mathbb{R}^N$. When dealing with vectors in $\mathbb{R}^N$, we will make frequent use of the $\ell_p$ norms, which are defined for $p \in [1, \infty]$ as

(2.1)
$$\|x\|_p = \begin{cases} \left( \sum_{i=1}^{N} |x_i|^p \right)^{1/p}, & p \in [1, \infty), \\ \max_{i=1,\dots,N} |x_i|, & p = \infty. \end{cases}$$

In Euclidean space we can also consider the standard inner product in $\mathbb{R}^N$, which we denote

(2.2)
$$\langle x, z \rangle = z^T x = \sum_{i=1}^{N} x_i z_i .$$

This inner product leads to the $\ell_2$ norm: $\|x\|_2 = \sqrt{\langle x, x \rangle}$.
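To make these definitions concrete, here is a minimal NumPy sketch (the `lp_norm` helper and the example vectors are our own, not from the text) evaluating Equation 2.1 and Equation 2.2:

```python
import numpy as np

def lp_norm(x, p):
    """Evaluate the l_p norm of Equation 2.1, including the p = inf case."""
    if p == np.inf:
        return np.max(np.abs(x))
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0])
z = np.array([1.0, 2.0])

print(lp_norm(x, 1))       # 7.0
print(lp_norm(x, 2))       # 5.0
print(lp_norm(x, np.inf))  # 4.0

# The standard inner product of Equation 2.2 and the l_2 norm it induces.
print(np.dot(z, x))              # -5.0
print(np.sqrt(np.dot(x, x)))     # 5.0, matches lp_norm(x, 2)
```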

In some contexts it is useful to extend the notion of $\ell_p$ norms to the case where $p < 1$. In this case, the "norm" defined in Equation 2.1 fails to satisfy the triangle inequality, so it is actually a quasinorm. We will also make frequent use of the notation $\|x\|_0 := |\mathrm{supp}(x)|$, where $\mathrm{supp}(x) = \{i : x_i \neq 0\}$ denotes the support of $x$ and $|\mathrm{supp}(x)|$ denotes the cardinality of $\mathrm{supp}(x)$. Note that $\|\cdot\|_0$ is not even a quasinorm, but one can easily show that

(2.3)
$$\lim_{p \to 0} \|x\|_p^p = |\mathrm{supp}(x)| ,$$

justifying this choice of notation. The $\ell_p$ (quasi-)norms have notably different properties for different values of $p$. To illustrate this, in Figure 2.1 we show the unit sphere, i.e., $\{x : \|x\|_p = 1\}$, induced by each of these norms in $\mathbb{R}^2$. Note that for $p < 1$ the corresponding unit sphere is nonconvex (reflecting the quasinorm's violation of the triangle inequality).
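Both claims are easy to verify numerically. The following sketch (our own illustration, with arbitrary example vectors) shows the triangle inequality failing for $p = 1/2$ and the limit of Equation 2.3 emerging as $p \to 0$:

```python
import numpy as np

def lp_quasinorm(x, p):
    """(sum |x_i|^p)^(1/p) for 0 < p < 1; fails the triangle inequality."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
p = 0.5

print(lp_quasinorm(x + y, p))                   # 4.0
print(lp_quasinorm(x, p) + lp_quasinorm(y, p))  # 2.0: ||x+y||_p > ||x||_p + ||y||_p

# Equation 2.3: sum_i |v_i|^p approaches ||v||_0 as p -> 0.
v = np.array([0.5, 0.0, -2.0, 0.0])             # ||v||_0 = 2
for q in [1.0, 0.1, 0.001]:
    print(np.sum(np.abs(v) ** q))               # 2.5, then values approaching 2.0
```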

Figure 2.1
Unit spheres in $\mathbb{R}^2$ for the $\ell_p$ norms with $p = 1, 2, \infty$, and for the $\ell_p$ quasinorm with $p = \frac{1}{2}$. (a) Unit sphere for the $\ell_1$ norm; (b) unit sphere for the $\ell_2$ norm; (c) unit sphere for the $\ell_\infty$ norm; (d) unit sphere for the $\ell_p$ quasinorm.

We typically use norms as a measure of the strength of a signal, or the size of an error. For example, suppose we are given a signal $x \in \mathbb{R}^2$ and wish to approximate it using a point in a one-dimensional affine space $A$. If we measure the approximation error using an $\ell_p$ norm, then our task is to find the $\widehat{x} \in A$ that minimizes $\|x - \widehat{x}\|_p$. The choice of $p$ will have a significant effect on the properties of the resulting approximation error. An example is illustrated in Figure 2.2. To compute the closest point in $A$ to $x$ using each $\ell_p$ norm, we can imagine growing an $\ell_p$ sphere centered on $x$ until it intersects with $A$. This will be the point $\widehat{x} \in A$ that is closest to $x$ in the corresponding $\ell_p$ norm. We observe that larger $p$ tends to spread out the error more evenly between the two coefficients, while smaller $p$ leads to an error that is more unevenly distributed and tends to be sparse. This intuition generalizes to higher dimensions and plays an important role in the development of CS theory.
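This sphere-growing picture can be reproduced numerically. Below is a sketch using `scipy.optimize.minimize_scalar` on a toy line and point of our own choosing; the quasinorm case is nonconvex, so we restrict the loop to $p = 1, 2, \infty$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# A one-dimensional affine space A = {a0 + t*d : t real} and a point x
# (all data here is a toy example of our own, not from the text).
a0 = np.array([0.0, 1.0])
d = np.array([1.0, 0.5])
x = np.array([2.0, 3.0])

def best_lp_error(p):
    """Error vector x - xhat, where xhat is the closest point in A in the l_p norm."""
    def objective(t):
        r = np.abs(x - (a0 + t * d))
        return r.max() if p == np.inf else np.sum(r ** p) ** (1.0 / p)
    t_best = minimize_scalar(objective).x
    return x - (a0 + t_best * d)

for p in [1, 2, np.inf]:
    print(p, np.round(best_lp_error(p), 3))
# l_1 concentrates the error on one coordinate (sparse error: [0, 1]),
# l_2 balances it ([-0.4, 0.8]), and l_inf spreads it evenly ([-0.667, 0.667]).
```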

Figure 2.2
Best approximation of a point in $\mathbb{R}^2$ by a one-dimensional subspace using the $\ell_p$ norms for $p = 1, 2, \infty$, and the $\ell_p$ quasinorm with $p = \frac{1}{2}$. (a) Approximation in the $\ell_1$ norm; (b) approximation in the $\ell_2$ norm; (c) approximation in the $\ell_\infty$ norm; (d) approximation in the $\ell_p$ quasinorm.

2.2 Bases and frames*

Summary

This module provides an overview of bases and frames in finite-dimensional Hilbert spaces.

A set $\{\psi_i\}_{i \in I}$ is called a basis for a finite-dimensional vector space $V$ if the vectors in the set span $V$ and are linearly independent. This implies that each vector in the space can be represented as a linear combination of this (smaller, except in the trivial case) set of basis vectors in a unique fashion. Furthermore, the coefficients of this linear combination can be found by the inner product of the signal and a dual set of vectors. In discrete settings, we will only consider real finite-dimensional Hilbert spaces where $V = \mathbb{R}^N$ and $I = \{1, \dots, N\}$.

Mathematically, any signal $x \in \mathbb{R}^N$ may be expressed as

(2.4)
$$x = \sum_{i \in I} a_i \tilde{\psi}_i ,$$

where our coefficients are computed as $a_i = \langle x, \psi_i \rangle$ and $\{\tilde{\psi}_i\}_{i \in I}$ are the vectors that constitute our dual basis. Another way to denote our basis and its dual is by how they operate on $x$. Here, we call our dual basis $\tilde{\Psi}$ our synthesis basis (used to reconstruct our signal via Equation 2.4) and $\Psi$ our analysis basis.

An orthonormal basis (ONB) is defined as a set of vectors $\{\psi_i\}_{i \in I}$ that form a basis and whose elements are orthogonal and have unit norm. In other words, $\langle \psi_i, \psi_j \rangle = 0$ if $i \neq j$ and $1$ otherwise. In the case of an ONB, the synthesis basis is simply the Hermitian adjoint of the analysis basis ($\tilde{\Psi} = \Psi^T$).
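As a quick sanity check of these properties, the following sketch (our own construction via a QR factorization; any orthonormal $\Psi$ would do) verifies analysis, synthesis, and perfect reconstruction for an ONB:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8

# Columns of Psi form an orthonormal basis for R^N
# (built here from a QR factorization of a random matrix).
Psi, _ = np.linalg.qr(rng.standard_normal((N, N)))

x = rng.standard_normal(N)

a = Psi.T @ x    # analysis: a_i = <x, psi_i>
x_rec = Psi @ a  # synthesis: x = sum_i a_i psi_i

print(np.allclose(Psi.T @ Psi, np.eye(N)))  # True: orthonormal columns
print(np.allclose(x_rec, x))                # True: perfect reconstruction
```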

It is often useful to generalize the concept of a basis to allow for sets of possibly linearly dependent vectors, resulting in what is known as a frame. More formally, a frame is a set of vectors $\{\psi_i\}_{i=1}^{n}$ in $\mathbb{R}^d$, $d < n$, corresponding to a matrix $\Psi \in \mathbb{R}^{d \times n}$, such that for all vectors $x \in \mathbb{R}^d$,

(2.5)
$$A \|x\|_2^2 \le \|\Psi^T x\|_2^2 \le B \|x\|_2^2$$

with $0 < A \le B < \infty$. Note that the condition $A > 0$ implies that the rows of $\Psi$ must be linearly independent. When $A$ is chosen as the largest possible value and $B$ as the smallest for these inequalities to hold, then we call them the (optimal) frame bounds. If $A$ and $B$ can be chosen as $A = B$, then the frame is called $A$-tight, and if $A = B = 1$, then $\Psi$ is a Parseval frame. A frame is called equal-norm if there exists some $\lambda > 0$ such that $\|\psi_i\|_2 = \lambda$ for all $i = 1, \dots, n$, and it is unit-norm if $\lambda = 1$. Note also that while the concept of a frame is very general and can be defined in infinite-dimensional spaces, in the case where $\Psi$ is a $d \times n$ matrix, $A$ and $B$ simply correspond to the smallest and largest eigenvalues of $\Psi \Psi^T$, respectively.
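This last observation gives a direct numerical recipe for the optimal frame bounds. A minimal sketch, using a random Gaussian matrix as a stand-in frame (our own example, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 4, 10

# A hypothetical random frame: n > d vectors as the columns of Psi.
Psi = rng.standard_normal((d, n))

# Optimal frame bounds: extreme eigenvalues of Psi @ Psi.T.
eigvals = np.linalg.eigvalsh(Psi @ Psi.T)   # returned in ascending order
A, B = eigvals[0], eigvals[-1]

# Spot-check Equation 2.5 on a random vector.
x = rng.standard_normal(d)
middle = np.sum((Psi.T @ x) ** 2)
print(A * np.sum(x**2) <= middle <= B * np.sum(x**2))  # True
```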

Frames can provide richer representations of data due to their redundancy: for a given signal $x$, there exist infinitely many coefficient vectors $\alpha$ such that $x = \Psi \alpha$. In particular, each choice of a dual frame $\tilde{\Psi}$ provides a different choice of a coefficient vector $\alpha$. More formally, any frame satisfying

(2.6)
$$\tilde{\Psi} \Psi^T = \Psi \tilde{\Psi}^T = I$$

is called an (alternate) dual frame. The particular choice $\tilde{\Psi} = (\Psi \Psi^T)^{-1} \Psi$ is referred to as the canonical dual frame. It is also known as the Moore-Penrose pseudoinverse. Note that since $A > 0$ requires $\Psi$ to have linearly independent rows, we ensure that $\Psi \Psi^T$ is invertible, so that $\tilde{\Psi}$ is well-defined. Thus, one way to obtain a set of feasible coefficients is via

(2.7)
$$\alpha_d = \tilde{\Psi}^T x = \Psi^T (\Psi \Psi^T)^{-1} x .$$

One can show that this sequence is the smallest coefficient sequence in the $\ell_2$ norm, i.e., $\|\alpha_d\|_2 \le \|\alpha\|_2$ for all $\alpha$ such that $x = \Psi \alpha$.
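The following sketch (again with a random stand-in frame of our own) forms the canonical dual frame, computes $\alpha_d$ via Equation 2.7, and illustrates the minimal-norm property just stated:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 4, 10
Psi = rng.standard_normal((d, n))   # a hypothetical example frame
x = rng.standard_normal(d)

# Canonical dual frame and the coefficients of Equation 2.7.
Psi_dual = np.linalg.inv(Psi @ Psi.T) @ Psi
alpha_d = Psi_dual.T @ x
print(np.allclose(Psi @ alpha_d, x))        # True: the coefficients are feasible

# Perturbing alpha_d by a null-space vector of Psi stays feasible
# but can only increase the l_2 norm.
null_vec = np.linalg.svd(Psi)[2][-1]        # Psi @ null_vec is (numerically) zero
alpha_other = alpha_d + 0.5 * null_vec
print(np.allclose(Psi @ alpha_other, x))    # True: still represents x
print(np.linalg.norm(alpha_d) < np.linalg.norm(alpha_other))  # True
```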

Finally, note that in the sparse approximation literature, it is also common for a basis or frame to be referred to as a dictionary or overcomplete dictionary, respectively, with the dictionary elements being called atoms.

2.3 Sparse representations*

Summary

This module provides an overview of sparsity and sparse representations, giving examples for both 1-D and 2-D signals.

Transforming a signal to a new basis or frame may allow us to represent a signal more concisely. The resulting compression is useful for reducing data storage and data transmission, which can be quite expensive in some applications. Hence, one might wish to simply transmit the analysis coefficients obtained in our basis or frame expansion instead of the high-dimensional signal itself. In cases where the number of non-zero coefficients is small, we say that we have a sparse representation. Sparse signal models allow us to achieve high rates of compression, and in the case of compressive sensing, we may use the knowledge that our signal is sparse in a known basis or frame to recover the original signal from a small number of measurements. In many cases, only the non-zero coefficients of sparse data need to be stored or transmitted; the rest can be assumed to be zero.

Mathematically, we say that a signal $x$ is $K$-sparse when it has at most $K$ nonzeros, i.e., $\|x\|_0 \le K$. We let

(2.8)
$$\Sigma_K = \{x : \|x\|_0 \le K\}$$

denote the set of all $K$-sparse signals. Typically, we will be dealing with signals that are not themselves sparse, but which admit a sparse representation in some basis $\Psi$. In this case we will still refer to $x$ as being $K$-sparse, with the understanding that we can express $x$ as $x = \Psi \alpha$ where $\|\alpha\|_0 \le K$.
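To make this concrete, here is a small sketch (with hypothetical example data of our own) constructing a signal that is dense in the canonical domain yet $K$-sparse in an orthonormal basis $\Psi$:

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 16, 3

# An orthonormal basis Psi (columns) and a K-sparse coefficient vector alpha.
Psi, _ = np.linalg.qr(rng.standard_normal((N, N)))
alpha = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
alpha[support] = rng.standard_normal(K)

x = Psi @ alpha   # dense as a raw signal, but K-sparse in the basis Psi

print(np.count_nonzero(x))      # typically 16: x itself has no zero entries
print(np.count_nonzero(alpha))  # 3: ||alpha||_0 = K, so x is K-sparse in Psi
```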