Consider $x^n \in \mathcal{A}^n$. The goal of lossy compression [3] is to describe $\hat{x}^n$, also of length $n$ but possibly defined over another reconstruction alphabet $\hat{\mathcal{A}}$, such that the description requires few bits and the distortion
$$ d(x^n, \hat{x}^n) = \frac{1}{n} \sum_{i=1}^{n} d(x_i, \hat{x}_i) $$
is small, where $d(\cdot,\cdot)$ is some distortion metric. It is well known that for every $d(\cdot,\cdot)$ and distortion level $D$ there is a minimum rate $R(D)$ such that $x^n$ can be described at rate $R(D)$ with distortion at most $D$. The rate $R(D)$ is known as the rate distortion (RD) function; it is the fundamental information-theoretic limit of lossy compression [3, 9].
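As a point of reference, consider the standard textbook example of a Bernoulli($p$) source under Hamming distortion (included here only for illustration; it is not part of the development above). Its RD function is
$$ R(D) = h_b(p) - h_b(D), \qquad 0 \le D \le \min(p, 1-p), $$
where $h_b(q) = -q \log_2 q - (1-q) \log_2 (1-q)$ is the binary entropy function, and $R(D) = 0$ for larger $D$.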
The design of good lossy compression algorithms has been a challenging problem for decades. Despite numerous applications such as image compression [24, 16], video compression [23], and speech coding [18, 4, 20], there is a significant gap between theory and practice: practical lossy compressors do not achieve the RD function, while theoretical constructions that achieve the RD function are impractical.
A promising recent algorithm by Jalali and Weissman [13] is universal in the limit of infinite runtime, and its RD performance is reasonable even with modest runtime. The main idea is that the distorted version $\hat{x}^n$ of the input $x^n$ can be computed as follows,
$$ \hat{x}^n = \arg\min_{w^n \in \hat{\mathcal{A}}^n} \left[ H_k(w^n) - \beta\, d(x^n, w^n) \right], $$
where $\beta < 0$ is the slope at the particular point of interest on the RD function, and $H_k(w^n)$ is the empirical conditional entropy of order $k$,
$$ H_k(w^n) \triangleq -\frac{1}{n} \sum_{u^k \in \hat{\mathcal{A}}^k} \sum_{a \in \hat{\mathcal{A}}} n_w(u^k, a) \log_2 \frac{n_w(u^k, a)}{\sum_{b \in \hat{\mathcal{A}}} n_w(u^k, b)}, $$
where $u^k$ is a context of order $k$ and, as before, $n_w(u^k, a)$ is the number of times that the symbol $a$ appears following the context $u^k$ in $w^n$. Jalali and Weissman proved [14] that when $k = o(\log(n))$, the RD pair $\left(H_k(\hat{x}^n), d(x^n, \hat{x}^n)\right)$ converges to the RD function asymptotically in $n$. Therefore, an excellent lossy compression technique is to compute $\hat{x}^n$ and then compress it. Moreover, this compression can be universal. In particular, the choice of context order $k = o(\log(n))$ ensures that universal compressors for context tree sources can emulate the coding length implied by the empirical conditional entropy $H_k(\hat{x}^n)$.
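To make the definition of $H_k$ concrete, the following minimal Python sketch computes the order-$k$ empirical conditional entropy from the context counts $n_w(u^k, a)$; the function name, the dictionary-based count structure, and the cyclic handling of boundary contexts are illustrative choices rather than part of the original formulation.

```python
from collections import defaultdict
from math import log2

def empirical_conditional_entropy(w, k):
    """Order-k empirical conditional entropy H_k(w) in bits per symbol.

    counts[u][a] plays the role of n_w(u^k, a): the number of times symbol a
    follows the context u^k in w (boundary contexts are taken cyclically here).
    """
    n = len(w)
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(n):
        context = tuple(w[(i - k + m) % n] for m in range(k))
        counts[context][w[i]] += 1

    h = 0.0
    for symbol_counts in counts.values():
        total = sum(symbol_counts.values())
        for c in symbol_counts.values():
            h -= c * log2(c / total)
    return h / n

# Example: an alternating sequence is fully predictable from one past symbol,
# so its order-1 empirical conditional entropy is 0
w = [0, 1] * 8
print(empirical_conditional_entropy(w, k=1))
```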
Despite this excellent potential performance, there is still a tremendous challenge: brute-force computation of the globally minimum energy solution involves an exhaustive search over exponentially many sequences and is thus infeasible. Therefore, Jalali and Weissman rely on Markov chain Monte Carlo (MCMC) [12], a stochastic relaxation approach to optimization. The crux of the matter is to define an energy function,
$$ \varepsilon(w^n) \triangleq n \left[ H_k(w^n) - \beta\, d(x^n, w^n) \right]. $$
The Boltzmann probability mass function (pmf) is
$$ p_t(w^n) \triangleq \frac{1}{Z_t} \exp\left\{ -\frac{\varepsilon(w^n)}{t} \right\}, \qquad (6.5) $$
where $t > 0$ is related to temperature in simulated annealing, and $Z_t$ is the normalization constant, which does not need to be computed.
Because it is difficult to sample from the Boltzmann pmf (Equation 6.5) directly, we instead use a Gibbs sampler, which computes the conditional distribution of each of the $n$ locations while the rest of $w^n$ is kept fixed. For each location $i$, the Gibbs sampler resamples from the distribution of $w_i$ conditioned on $w^{n \setminus i} \triangleq (w_1, \ldots, w_{i-1}, w_{i+1}, \ldots, w_n)$, as induced by the joint pmf in Equation 6.5, which is computed as follows,
$$ p_t\!\left(w_i = a \mid w^{n \setminus i}\right) = \frac{\exp\left\{ -\varepsilon(w^{i-1}\, a\, w_{i+1}^n)/t \right\}}{\sum_{b \in \hat{\mathcal{A}}} \exp\left\{ -\varepsilon(w^{i-1}\, b\, w_{i+1}^n)/t \right\}}. \qquad (6.6) $$
We can now write,
$$ \varepsilon(w^{i-1}\, b\, w_{i+1}^n) - \varepsilon(w^{i-1}\, a\, w_{i+1}^n) = n\left[ \Delta H(a \to b) - \beta\, \Delta d(a \to b) \right], \qquad (6.7) $$
where $\Delta H(a \to b)$ is the change in $H_k(w^n)$ when $w_i = a$ is replaced by $b$, and $\Delta d(a \to b)$ is the corresponding change in distortion. Combining Equation 6.6 and Equation 6.7,
$$ p_t\!\left(w_i = a \mid w^{n \setminus i}\right) = \frac{1}{\sum_{b \in \hat{\mathcal{A}}} \exp\left\{ -\frac{n}{t}\left[ \Delta H(a \to b) - \beta\, \Delta d(a \to b) \right] \right\}}. $$
The maximum change in the energy within an iteration of the MCMC algorithm is then bounded by
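As an illustration of how one Gibbs iteration can be carried out, the Python sketch below resamples a single location from the conditional in Equation 6.6 by comparing candidate energies; the function name `gibbs_resample_location`, the explicit `distortion` argument, and the brute-force recomputation of $H_k$ (rather than the $O(k)$ incremental update discussed below) are simplifications made for clarity.

```python
import random
from math import exp

def gibbs_resample_location(w, i, x, alphabet, k, beta, t, distortion):
    """Resample w[i] from its conditional under the Boltzmann pmf (Equation 6.5).

    For each candidate symbol b, the energy n*(H_k(w) - beta*d(x, w)) is
    recomputed from scratch using empirical_conditional_entropy() from the
    earlier sketch; the real algorithm updates the context counts incrementally.
    """
    n = len(w)
    energies = []
    for b in alphabet:
        w[i] = b
        energies.append(n * (empirical_conditional_entropy(w, k)
                             - beta * distortion(x, w)))
    # Shift by the minimum energy for numerical stability before exponentiating
    e_min = min(energies)
    weights = [exp(-(e - e_min) / t) for e in energies]
    w[i] = random.choices(alphabet, weights=weights)[0]
```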
We refer to the resampling from a single location as an iteration, and group the n possible locations into super-iterations.[2]
During simulated annealing, the temperature $t$ is gradually decreased; in super-iteration $i$ we use $t = O(1/\log(i))$ [12, 13]. In each iteration, the Gibbs sampler modifies $w^n$ in a random manner that resembles heat bath concepts in statistical physics. Although MCMC could get stuck in a local minimum, we decrease the temperature slowly enough that the randomness of Gibbs sampling eventually drives MCMC out of local minima toward the set of minimal-energy solutions, which includes $\hat{x}^n$, because a low temperature $t$ favors low-energy $w^n$. Pseudo-code for our encoder appears in Algorithm 1 below.
Algorithm 1: Lossy encoder with fixed reproduction alphabet
Input: $x^n \in \mathcal{A}^n$, $\hat{\mathcal{A}}$, $\beta$, $c$, $r$
Output: bit-stream
Procedure:
Initialize $w^n$ by quantizing $x^n$ to $\hat{\mathcal{A}}$
Initialize $n_w(\cdot,\cdot)$ using $w^n$
for $i = 1$ to $r$ do // super-iteration
    $t \leftarrow c / \log(i+1)$ // temperature, $t = O(1/\log(i))$
    Draw a permutation of the numbers $\{1, \ldots, n\}$ at random
    for $i' = 1$ to $n$ do
        Let $j$ be component $i'$ of the permutation
        Generate new $w_j$ using the conditional distribution in Equation 6.6
        Update $n_w(\cdot,\cdot)$
Apply CTW to $w^n$ // compress outcome
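For concreteness, here is a compact Python sketch of the control flow of Algorithm 1; it reuses the helper functions from the earlier sketches, uses $c/\log(i+1)$ as an illustrative instance of the $O(1/\log(i))$ schedule, recomputes $H_k$ naively, and omits the final CTW compression stage, so it illustrates the structure of the encoder rather than an efficient implementation.

```python
import math
import random

def squared_error(x, w):
    """Per-symbol squared error distortion d(x^n, w^n)."""
    return sum((a - b) ** 2 for a, b in zip(x, w)) / len(x)

def mcmc_lossy_encoder(x, alphabet, k, beta, c, r, distortion=squared_error):
    """Sketch of the annealing loop of Algorithm 1 (no CTW stage).

    x        : input sequence
    alphabet : reproduction alphabet (list of numeric symbols)
    k        : context order
    beta     : slope of the RD curve at the operating point (beta < 0)
    c, r     : annealing constant and number of super-iterations
    """
    n = len(x)
    # Initialize w by per-symbol quantization of x to the reproduction alphabet
    w = [min(alphabet, key=lambda a: (a - xi) ** 2) for xi in x]
    for i in range(1, r + 1):                   # super-iteration
        t = c / math.log(i + 1)                 # illustrative O(1/log(i)) schedule
        order = list(range(n))
        random.shuffle(order)                   # visit locations in random order
        for j in order:
            gibbs_resample_location(w, j, x, alphabet, k, beta, t, distortion)
    return w  # the full encoder would now compress w losslessly, e.g., with CTW
```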
Looking at the pseudo-code, it is clear that the following could be computational bottlenecks:
(a) Initializing $n_w(\cdot,\cdot)$: a naive implementation needs to scan the sequence $w^n$ (complexity $O(n)$) and initialize a data structure with $|\hat{\mathcal{A}}|^{k+1}$ elements. Unless $|\hat{\mathcal{A}}|^{k+1} = O(n)$, this is super-linear in $n$. However, recalling that $k = o(\log(n))$, the number of elements is sub-linear in $n$, and initializing $n_w(\cdot,\cdot)$ requires linear complexity $O(n)$.
(b) The inner loop is run $rn$ times, and each time the conditional pmf of Equation 6.6 must be evaluated for all possible symbols in $\hat{\mathcal{A}}$, which might be challenging. In particular, let us consider the computation of $\Delta d$ and $\Delta H$.
Computation of $\Delta d$ requires constant time and is not burdensome.
Computation of $\Delta H$ requires modifying the symbol counts for each context that is affected. A key contribution by Jalali and Weissman was to recognize that the array of symbol counts, $n_w(\cdot,\cdot)$, changes in only $O(k)$ locations when a single $w_i$ is modified, where $k$ is the context order. Therefore, each computation of $\Delta H$ requires $O(k)$ time. Seeing that $|\hat{\mathcal{A}}|$ such computations are needed per iteration, and there are $rn$ iterations, this is $O(rnk|\hat{\mathcal{A}}|)$ (a sketch of the incremental count update appears after this list).
(c) Updating $n_w(\cdot,\cdot)$ after $w_j$ is re-sampled from the Boltzmann distribution also requires $O(k)$ time. However, this step is performed only once per iteration, and not $|\hat{\mathcal{A}}|$ times. Therefore, it requires less computation than step (b).
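To make the $O(k)$ bookkeeping in items (b) and (c) concrete, the following Python sketch updates the context counts incrementally when a single symbol changes; the flat dictionary keyed by (context, symbol) and the cyclic boundary convention are illustrative choices consistent with the earlier sketches, not the authors' data structure.

```python
from collections import defaultdict

def update_counts(counts, w, j, new_symbol, k):
    """Update n_w(., .) in O(k) time when w[j] is changed to new_symbol.

    counts : defaultdict(int) mapping (context tuple, symbol) -> count,
             where the context of position p is the k preceding symbols
             (taken cyclically, as in the earlier sketches).
    Only the k+1 (context, symbol) pairs that involve position j change.
    """
    n = len(w)
    if w[j] == new_symbol:
        return
    affected = [(j + offset) % n for offset in range(k + 1)]
    # Remove the old contributions of the affected positions
    for pos in affected:
        context = tuple(w[(pos - k + m) % n] for m in range(k))
        counts[(context, w[pos])] -= 1
    w[j] = new_symbol
    # Add the new contributions, now that w[j] has changed
    for pos in affected:
        context = tuple(w[(pos - k + m) % n] for m in range(k))
        counts[(context, w[pos])] += 1

# Example usage: build order-1 counts for w, then change one symbol
w, k = [0, 1, 0, 1, 1, 0], 1
counts = defaultdict(int)
for p in range(len(w)):
    ctx = tuple(w[(p - k + m) % len(w)] for m in range(k))
    counts[(ctx, w[p])] += 1
update_counts(counts, w, j=2, new_symbol=1, k=k)
```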
So far, we have described the approach by Jalali and Weissman to compress $x^n \in \mathcal{A}^n$. Suppose instead that $x^n \in \mathbb{R}^n$ is analog. Seeing that MCMC optimizes $w^n$ over a discrete alphabet, it would be convenient to keep doing so. However, because $\hat{\mathcal{A}}$ is finite, and assuming that squared error distortion is used, i.e., $d(x_i, w_i) = (x_i - w_i)^2$, we see that the distortion could be large.
Fortunately, Baron and Weissman showed [6] that the quantizer with the following set of reproduction levels has favorable properties,
$$ \hat{\mathcal{A}}_\gamma = \left\{ -\gamma,\; -\gamma + \tfrac{1}{\gamma},\; -\gamma + \tfrac{2}{\gamma},\; \ldots,\; \gamma - \tfrac{1}{\gamma},\; \gamma \right\}, $$
where $\gamma$ is an integer that grows with $n$. That is, as $\gamma$ grows, the set of reproduction levels quantizes a wider interval more finely. Therefore, the probability that some $x_i$ is an outlier, i.e., $|x_i| > \gamma$, vanishes with $n$, and the effect of outliers on the distortion vanishes. Moreover, because the interval is sampled more finely as $\gamma$ increases with $n$, this quantizer can emulate any codebook with continuous levels, and so in the limit its RD performance converges to the RD function.
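The short Python sketch below builds this family of reproduction levels and maps analog samples to the nearest level; the function names and the specific value of $\gamma$ in the example are illustrative only.

```python
import numpy as np

def reproduction_levels(gamma):
    """Uniform grid with spacing 1/gamma covering [-gamma, gamma] (2*gamma**2 + 1 levels)."""
    return np.linspace(-gamma, gamma, 2 * gamma * gamma + 1)

def quantize(x, levels):
    """Map each analog sample to its nearest reproduction level."""
    x = np.asarray(x, dtype=float)
    idx = np.abs(x[:, None] - levels[None, :]).argmin(axis=1)
    return levels[idx]

# Example with gamma = 4: levels span [-4, 4] with step 0.25,
# and samples outside the interval are clipped to the outermost level.
levels = reproduction_levels(4)
print(quantize([0.1, -2.33, 7.5], levels))
```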
To evaluate the RD performance, we must first define the RD function. Consider a source $X$ that generates a sequence $x^n \in \mathbb{R}^n$. A lossy encoder is a mapping $e: \mathbb{R}^n \to \mathcal{C}$, where $\mathcal{C} \subset \mathbb{R}^n$ is a codebook that contains a set of codewords in $\mathbb{R}^n$. Let $\mathcal{C}_n(X, D)$ be the smallest-cardinality codebook for inputs of length $n$ generated by $X$ such that the expected per-symbol distortion between the input $x^n$ and the codeword $e(x^n) \in \mathcal{C}_n(X, D)$ is at most $D$. The rate $R_n(X, D)$ is the per-symbol log-cardinality of the codebook,
$$ R_n(X, D) \triangleq \frac{1}{n} \log_2 \left| \mathcal{C}_n(X, D) \right|, $$
and the RD function is its limit, $R(X, D) \triangleq \lim_{n \to \infty} R_n(X, D)$.
This is an operational definition of the RD function in terms of the best block code. In contrast, it can be shown that the RD function is equal to a minimization over mutual information, yielding a different flavor of definition [9, 3].
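For reference, the informational definition mentioned above takes the following single-letter form for a memoryless source (this standard statement is included only as a reminder; for stationary ergodic sources it is obtained as a limit of $n$-letter mutual informations):
$$ R(D) = \min_{p(\hat{x} \mid x)\,:\; \mathbb{E}\left[ d(X, \hat{X}) \right] \le D} I(X; \hat{X}). $$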
Having defined the RD function, we can describe the RD performance of the compression algorithm by Baron and Weissman for continuous-valued inputs.
Theorem 8. Consider squared error distortion, $d(x, \hat{x}) = (x - \hat{x})^2$, and
let $X$ be a finite-variance stationary and ergodic source with RD function