
Chapter 7 Compression of nonparametric sources

This chapter is licensed under the Creative Commons Attribution License: http://creativecommons.org/licenses/by/3.0/


Summary

Consider a stationary ergodic source that we no longer assume to be parametric. We need to define notions of probability that fit the universal framework. For this, we study Kac's lemma [5].

We have a stationary ergodic source $X = \{X_i\}_{i=-\infty}^{\infty}$ with symbols $X_i \in \mathcal{X}$, and a window $z \in \mathcal{X}^l$ of length $l$. Define $N_r$ as the number of shifts forward of a window of length $l$ until we see $z$ again; this is called the recurrence time. Define

(7.1)
$$Y_k = \begin{cases} 1, & X_{k+1}^{k+l} = z, \\ 0, & \text{otherwise.} \end{cases}$$

For example, if $Y_0 = 1$, then $N_r$ is the smallest positive $k$ for which $Y_k = 1$. Note that $\{Y_k\}$ is a binary stationary ergodic source. Define $q = \Pr(Y_0 = 1)$. Denote the average recurrence time by $\mu = E[N_r \mid Y_0 = 1]$. We can now present Kac's lemma.

Lemma 3 [5] $\mu < \infty$ and $E[N_r \mid X_1^l = z] = \mu = \frac{1}{\Pr(X_1^l = z)}$.

Let $q = \Pr(Y_0 = 1) = \Pr(X_1^l = z)$. Because $z$ just appeared, its probability is positive, $q > 0$. We will prove that $E[N_r \mid Y_0 = 1] = 1/q$.

Define $A = \{Y_k = 0,\ \forall k \le 0\}$, the event that $z$ never appears at or before time $0$, and for $k \ge 0$ define $B_k = \{Y_{-k} = 1,\ Y_j = 0 \text{ for } -k < j \le 0\}$, the event that the last appearance of $z$ at or before time $0$ was at time $-k$. Then $A, B_0, B_1, \ldots$ partition the sample space, so $\Pr(A) + \sum_{k=0}^{\infty} \Pr(B_k) = 1$. We claim that $\Pr(A) = 0$. This can be shown formally, but is easily seen by realizing that if $z$ appears at any time $n$ (say, positive), then it must appear at some time $n \le 0$ with probability 1, because its probability is positive and the source is stationary and ergodic.

Therefore, noting that stationarity gives $\Pr(B_k) = \Pr(Y_0 = 1,\ N_r > k)$, we have

(7.2)
$$1 - \Pr(A) = \sum_{k=0}^{\infty} \Pr(Y_0 = 1,\ N_r > k) = E\left[ N_r \mathbf{1}_{\{Y_0 = 1\}} \right] = q\, E[N_r \mid Y_0 = 1].$$

Therefore, $E[N_r \mid Y_0 = 1] = \frac{1 - \Pr(A)}{q}$. We conclude the proof by noting that $\Pr(A) = 0$, which yields $\mu = 1/q < \infty$.
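To build intuition, here is a minimal Monte Carlo sketch that checks Kac's lemma on an i.i.d. Bernoulli source; the parameter $p$, the pattern $z$, and the trial count are illustrative choices, not from the text. For i.i.d. bits, conditioning on $X_1^l = z$ leaves the future distribution unchanged, so we may start the window at $z$ and count forward shifts until $z$ recurs.

```python
# Monte Carlo check of Kac's lemma for an i.i.d. Bernoulli(p) source.
import random

def recurrence_time(p, z, max_steps=10**7):
    l = len(z)
    window = list(z)                     # condition on Y_0 = 1
    for k in range(1, max_steps):
        window.pop(0)                    # shift the window forward by one
        window.append(1 if random.random() < p else 0)
        if window == list(z):
            return k                     # N_r: smallest k > 0 with Y_k = 1
    raise RuntimeError("pattern did not recur")

p, z = 0.3, (1, 0, 1)
prob_z = p * (1 - p) * p                 # Pr(X_1^3 = z) for i.i.d. bits
trials = [recurrence_time(p, z) for _ in range(20000)]
print("empirical E[N_r | Y_0 = 1]:", sum(trials) / len(trials))
print("Kac's lemma 1/Pr(z)       :", 1 / prob_z)
```

The empirical average recurrence time should agree with $1/\Pr(z) \approx 15.9$ up to sampling noise.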

Let us now develop a universal coding technique for stationary sources. Recall the entropy rate $H = \lim_{l \to \infty} \frac{1}{l} H(X_1^l)$. The asymptotic equipartition property (AEP) of information theory [1] gives

(7.3)
$$\Pr\left( \left| -\frac{1}{l} \log \Pr(X_1^l) - H \right| < \delta \right) > 1 - \epsilon(\delta, l),$$

where $\lim_{l \to \infty} \epsilon(\delta, l) = 0$. Define in this context a typical set $T(\delta, l)$ that satisfies

(7.4)
$$T(\delta, l) = \left\{ z \in \mathcal{X}^l : 2^{-l(H+\delta)} \le \Pr(z) \le 2^{-l(H-\delta)} \right\}.$$

For a typical sequence $z \in T(\delta, l)$, $\Pr(z) \ge 2^{-l(H+\delta)}$. Then, by Kac's lemma,

(7.5)
$$E\left[ N_r \mid X_1^l = z \right] = \frac{1}{\Pr(z)} \le 2^{l(H+\delta)}.$$
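As a concrete illustration (not from the text), the sketch below computes $\Pr(T(\delta, l))$ exactly for an i.i.d. Bernoulli($p$) source, for which $H = -p \log p - (1-p) \log(1-p)$; all blocks with the same number of ones share the same probability, so a sum over counts suffices. The parameters are arbitrary choices.

```python
# Probability of the typical set T(delta, l) in (7.4) for i.i.d. Bernoulli(p).
from math import comb, log2

p, delta = 0.3, 0.1
H = -p * log2(p) - (1 - p) * log2(1 - p)   # entropy of the i.i.d. source

for l in (10, 50, 100, 500):
    prob_typical = 0.0
    for k in range(l + 1):                 # k = number of ones in the block
        log_pr = k * log2(p) + (l - k) * log2(1 - p)   # log2 Pr(z), |z| = l
        # z is typical iff 2^{-l(H+delta)} <= Pr(z) <= 2^{-l(H-delta)}
        if -l * (H + delta) <= log_pr <= -l * (H - delta):
            prob_typical += comb(l, k) * 2.0 ** log_pr
    print(f"l = {l:3d}   Pr(T(delta, l)) = {prob_typical:.4f}")
```

As $l$ grows, the printed probabilities approach 1, which is the content of the AEP.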

Consider our situation: we have a memory containing the last $n_w$ symbols of the source, and we want to transmit the next block $z = X_1^l$.

  1. Choose $\delta > 0$ and a block length $l$ such that the memory length satisfies $n_w = 2^{l(H+2\delta)}$.

  2. For $z = X_1^l$, search the memory for $z$ and find the value of $N_r$ if it appears.

  3. If it appears, then transmit a 0 flag bit followed by the value of $N_r$.

  4. Else transmit a 1 followed by the uncompressed $z$.
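The following sketch implements steps 2–4 for binary symbols. As a simplification (my assumption, not from the text), it only searches for occurrences lying entirely inside the memory (shifts $N_r \ge l$), and it returns the integer $N_r$ raw rather than as a bit string; the function names are illustrative.

```python
# Sketch of the recurrence-time coding scheme (steps 2-4 above).
def encode_block(memory, z):
    """memory: the last n_w source symbols; z: the next l symbols."""
    l, n_w = len(z), len(memory)
    # Step 2: smallest backward shift k such that z appeared k positions ago.
    # We only consider k >= l, so the match lies fully inside the memory.
    for k in range(l, n_w + 1):
        if memory[n_w - k : n_w - k + l] == z:
            return ("0", k)              # Step 3: flag 0, then N_r = k
    return ("1", z)                      # Step 4: flag 1, then z uncompressed

def decode_block(memory, flag, payload, l):
    if flag == "0":                      # payload is the recurrence time N_r
        n_w = len(memory)
        return memory[n_w - payload : n_w - payload + l]
    return list(payload)                 # payload is z itself

memory, z = [0, 1, 1, 0, 1, 0, 1, 1, 0, 1], [0, 1, 1]
flag, payload = encode_block(memory, z)
assert decode_block(memory, flag, payload, len(z)) == z
```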

Transmitting $z$ via $N_r$ requires at most $\lceil \log(n_w) \rceil = \lceil l(H+2\delta) \rceil$ bits, since $N_r \le n_w$ whenever $z$ appears in the memory. Moreover, for a typical $z$, Markov's inequality and (7.5) give $\Pr(N_r > n_w \mid X_1^l = z) \le 2^{l(H+\delta)}/n_w = 2^{-l\delta}$, so the uncompressed branch is rarely used. The expected coding length therefore satisfies

(7.6)
$$E[\text{length}] \le 1 + l(H + 2\delta) + \Pr\left( z \notin T(\delta, l) \text{ or } N_r > n_w \right) \cdot l \lceil \log |\mathcal{X}| \rceil.$$

After we normalize by $l$, the per-symbol length converges to $H + 2\delta$ as $l \to \infty$; since $\delta$ is arbitrary, the rate can be brought arbitrarily close to the entropy rate $H$.

Note that this analysis assumes that the entropy $H$ for a block of $l$ symbols is known. If not, then we can maintain several typical sets, each adjusted to a different entropy level.

7.1 Universal Coding of the Integers

Coding of an index also appears in other information-theoretic problems, such as indexing a coset in distributed source coding or a codeword in lossy coding. What are good ways to encode the index?

If the index $n$ lies in the range $n \in \{1, \ldots, N\}$ and $N$ is known, then we can encode the index using $\lceil \log(N) \rceil$ bits. However, sometimes the index can be unbounded, or there could be an (unknown) distribution that biases us strongly toward smaller indices. Therefore, we are interested in a universal encoding of the integers such that each $n$ is encoded using roughly $\log(n)$ bits.

To do so, we outline an approach by Elias [2]. Let $n$ be a natural number, and let $b(n)$ be the binary form of $n$; for example, $b(0) = 0$, $b(1) = 1$, $b(2) = 10$, $b(3) = 11$, and so on. A challenge is that the length of this representation, $|b(n)|$, is unknown to the decoder. Let $u(k) = 0^{k-1}1$; for example, $u(4) = 0001$. We can now define $e(n) = u(|b(|b(n)|)|)\, b(|b(n)|)\, b(n)$. For example, for $n = 5$ we have $b(n) = 101$. Therefore, $|b(n)| = 3$, $b(|b(n)|) = 11$, $|b(|b(n)|)| = 2$, $u(|b(|b(n)|)|) = 01$, and $e(n) = 0111101$. We note in passing that the first 2 bits of $e(n)$ describe the length of $b(|b(n)|)$, which is 2; the middle 2 bits are $b(|b(n)|)$, describing $|b(n)| = 3$; and the last 3 bits are $b(n)$, giving the actual number 5. It is easily seen that

(7.7)
$$|e(n)| = |b(n)| + 2\,\big| b(|b(n)|) \big| = \log(n) + 2 \log \log(n) + O(1).$$
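The construction translates directly into code. The sketch below follows the definition of $e(n)$ above; the decoder first reads the unary part $u$, which delimits itself and announces the length of $b(|b(n)|)$, which in turn announces the length of $b(n)$, so no further punctuation is needed.

```python
# Elias encoding e(n) = u(|b(|b(n)|)|) b(|b(n)|) b(n), as defined above.
def b(n):
    return bin(n)[2:] if n > 0 else "0"  # binary form, b(0) = "0"

def u(k):
    return "0" * (k - 1) + "1"           # u(k) = 0^{k-1} 1

def elias_encode(n):
    first = b(len(b(n)))                 # b(|b(n)|)
    return u(len(first)) + first + b(n)

def elias_decode(s):
    i = s.index("1")                     # unary part: |b(|b(n)|)| = i + 1
    m = int(s[i + 1 : 2 * (i + 1)], 2)   # read b(|b(n)|), giving |b(n)| = m
    return int(s[2 * (i + 1) : 2 * (i + 1) + m], 2)

assert elias_encode(5) == "0111101"      # the worked example above
assert all(elias_decode(elias_encode(n)) == n for n in range(1, 1000))
```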

7.2 Sliding Window Universal Coding

Having described an efficient way to encode an index, in particular a large one, we can employ this technique in sliding window coding, as developed by Wyner and Ziv [7].

Take a fixed window of length $n_w$ and a source that generates characters over an alphabet $\mathcal{X}$. Define the history as $x_1^{n_w}$; this is the portion of the data $x$ that we have already processed and thus know. Our goal is to encode the phrase $y_1 = x_{n_w+1}^{n_w+L_1}$, where $L_1$ is not fixed. The length $L_1$ is chosen as the largest integer such that there exists

(7.8)
$$x_{m_1}^{m_1 + L_1 - 1} = x_{n_w+1}^{n_w+L_1}$$

for some $m_1 \in \{1, \ldots, n_w\}$; that is, the phrase also occurs earlier, starting inside the history (the match may run past the end of the history into the phrase itself). For example, if $x = abcbababc$ and we have processed the first 5 symbols $abcba$, then $abcba$ is the history and we begin encoding after that part; the longest match for the upcoming symbols $babc$ is $bab = x_4^6$, so $L_1 = 3$ and $m_1 = 4$.

The actual encoding of the phrase $y_1$ sends $c(y_1)$ bits, where

(7.9)
$$c(y_1) = |e(L_1)| + 1 + \min\left\{ \lceil \log(n_w) \rceil,\ L_1 \lceil \log |\mathcal{X}| \rceil \right\}.$$

First we encode $L_1$ using the Elias code of Section 7.1; then a flag bit indicates whether a match was found, followed by either the index $m_1$ into the history or $y_1$ uncompressed, whichever is shorter; finally, the first $L_1$ characters $x_1^{L_1}$ are removed from the history database, and $y_1$ is appended to its end. The latter process gives this algorithm its sliding window flavor.
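A minimal sketch of one parsing step under these definitions follows; it returns the longest-match length $L$ and the 1-based match position $m$, allowing the match to run past the end of the history as in the example above. The function name and the literal fallback convention are illustrative assumptions.

```python
# One step of sliding-window parsing: find the longest prefix of `upcoming`
# that starts somewhere in `history` (the match may overlap the boundary).
def next_phrase(history, upcoming):
    n_w = len(history)
    best_L, best_m = 0, None
    for start in range(n_w):
        L = 0
        while L < len(upcoming):
            # symbols past the history boundary come from the phrase itself
            sym = history[start + L] if start + L < n_w else upcoming[start + L - n_w]
            if sym != upcoming[L]:
                break
            L += 1
        if L > best_L:
            best_L, best_m = L, start + 1              # 1-based position m
    return (best_L, best_m) if best_L > 0 else (1, None)  # None: send literal

history, upcoming = list("abcba"), list("babc")
print(next_phrase(history, upcoming))                  # (3, 4): the match 'bab'
```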

This rather simple algorithm turns out to be asymptotically optimal in the limit of large $n_w$; that is, it achieves the entropy rate. To see this, let us compute the compression factor. Consider $x_1^N$ where $N \gg n_w$, $x_1^N = y_1 y_2 \cdots y_C$, $N = \sum_{i=1}^{C} |y_i|$, and $|y_i| = L_i$, where $C$ is the number of phrases required to encode $x_1^N$. The number of bits $N_{SW}(x_1^N)$ needed to encode $x_1^N$ using the sliding window algorithm is

(7.10)
$$N_{SW}(x_1^N) = \sum_{i=1}^{C} \left[ |e(L_i)| + 1 + \min\left\{ \lceil \log(n_w) \rceil,\ L_i \lceil \log |\mathcal{X}| \rceil \right\} \right].$$

Therefore, the compression factor $\rho = \frac{N_{SW}(x_1^N)}{N}$ satisfies

(7.11)
$$\rho \le \frac{1}{N} \sum_{i=1}^{C} \left[ \lceil \log(n_w) \rceil + |e(L_i)| + 1 \right].$$

Claim 3 [7] As $n_w \to \infty$ and $N \to \infty$, the compression factor $\rho$ converges to the entropy rate $H$.

A direct proof is difficult, because the locations of the phrases are random variables, which makes a detailed analysis complicated. Let us try to simplify the problem.

Figure 7.1: Block partitioning in the analysis of the sliding window algorithm [7].

Take $l_0$ that divides $n_w + N$ to create $\frac{n_w + N}{l_0}$ intervals, where $l_0 = \frac{\log(n_w)}{H + 2\delta}$ is related to the window size used by the algorithm above. This partition into blocks appears in Figure 7.1. Denote by $w_{l_0}(x) > 0$ the smallest value for which $x_{1 - w_{l_0}}^{l_0 - w_{l_0}} = x_1^{l_0}$, that is, the recurrence time of the block $x_1^{l_0}$. Using Kac's lemma [5],

(7.12)
$$E\left[ w_{l_0}(x) \mid x_1^{l_0} \right] = \frac{1}{\Pr\left( x_1^{l_0} \right)}.$$

Claim 4 [7] We have the following convergence as $l_0$ increases,

(7.13)
$$\frac{1}{l_0} \log\left( w_{l_0}(x) \right) \rightarrow H \quad \text{in probability}.$$

Because of the AEP [1],

(7.14)
$$\Pr\left( x_1^{l_0} \right) \approx 2^{-l_0 H}.$$

Therefore, for a typical input, $w_{l_0}(x) \approx 2^{l_0 H}$.

Recall that the interval length is $l_0 = \frac{\log(n_w)}{H + 2\delta}$, so that $n_w = 2^{l_0(H + 2\delta)}$, and so the probability that an interval cannot be found in the history is

(7.15)
$$\Pr\left( w_{l_0}(x) > n_w \right) \le \frac{E[w_{l_0}(x)]}{n_w} \approx \frac{2^{l_0 H}}{2^{l_0(H + 2\delta)}} = 2^{-2 l_0 \delta},$$

where we used Markov's inequality.

For a long enough interval, this probability goes to zero.

7.3 Redundancy of parsing schemes

There are many Ziv-Lempel style parsing algorithms [8], [9], [7], and each of the variants has different details, but the key idea is to find the longest match in a window of length $n_w$. The length of the match is $L$, where we remind the reader that $L \approx \frac{\log(n_w)}{H}$.

Now, encoding the location of the match requires $\log(n_w)$ bits, and so the per-symbol compression ratio is $\frac{\log(n_w)}{L} \approx H$, which in the limit of large $n_w$ approaches the entropy rate $H$.

However, the encoding must also describe the length $L$ itself, and often the symbol that follows the match. These require $\log(L) + O(\log \log(L))$ additional bits, and the normalized (per-symbol) cost is

(7.16)
$$\frac{\log(n_w) + \log(L) + O(\log \log(L))}{L} = H + O\left( \frac{\log(L)}{L} \right).$$

Therefore, the redundancy of Ziv-Lempel style compression algorithms is proportional to $\frac{\log(L)}{L} \approx \frac{H \log \log(n_w)}{\log(n_w)}$.
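To see the rate of this decay, the following back-of-the-envelope computation evaluates the per-symbol cost in (7.16) with $L = \log(n_w)/H$; the entropy rate $H = 1$ bit/symbol and the window sizes are assumed values for illustration only.

```python
# Numeric illustration of (7.16): coding rate and redundancy versus window size.
from math import log2

H = 1.0                                    # assumed entropy rate, bits/symbol
for log_nw in (10, 20, 40, 80, 160):       # log2 of the window size n_w
    L = log_nw / H                         # typical match length (Section 7.2)
    rate = (log_nw + log2(L) + 2 * log2(log2(L))) / L
    print(f"log2(n_w) = {log_nw:3d}   rate = {rate:.3f}   redundancy = {rate - H:.3f}")
```

The redundancy shrinks from roughly 0.68 bits/symbol at $\log(n_w) = 10$ to below 0.09 bits/symbol at $\log(n_w) = 160$, consistent with the $\log \log(n_w) / \log(n_w)$ behavior.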