Universal Algorithms in Signal Processing and Communications by Denver Greene


Chapter 2 Background

This chapter is licensed under the Creative Commons Attribution License: http://creativecommons.org/licenses/by/3.0/


2.1 Convergence of random variables

We include some background material for the course. Let us recall some notions of convergence of random variables (RV's).

  • A sequence of RV's $X_1, X_2, \ldots$ converges in probability to a RV $X$ if $\lim_{n\to\infty} \Pr\left(|X_n - X| > \epsilon\right) = 0$ for every $\epsilon > 0$. We denote this by $X_n \xrightarrow{P} X$.

  • A sequence of RV's $X_1, X_2, \ldots$ converges to $X$ with probability 1 if $\Pr\left(\lim_{n\to\infty} X_n = X\right) = 1$. We denote this by $X_n \xrightarrow{w.p.1} X$ or $X_n \xrightarrow{a.s.} X$ (almost surely).

  • A sequence of RV's $X_1, X_2, \ldots$ converges to $X$ in the $\ell_p$ sense if $\lim_{n\to\infty} E\left[|X_n - X|^p\right] = 0$. We denote this by $X_n \xrightarrow{\ell_p} X$.

For example, for $p = 2$ we have mean square convergence, $X_n \xrightarrow{m.s.} X$. For $p \geq 2$, Jensen's inequality gives

(2.1)
$$ E\left[ |X_n - X|^2 \right] \leq \left( E\left[ |X_n - X|^p \right] \right)^{2/p} . $$

Therefore, $X_n \xrightarrow{\ell_p} X$ yields $X_n \xrightarrow{m.s.} X$. Note that for convergence in the $\ell_1$ sense, Markov's inequality gives

(2.2)
$$ \Pr\left( |X_n - X| > \epsilon \right) \leq \frac{E\left[ |X_n - X| \right]}{\epsilon} , $$

so convergence in the $\ell_1$ sense implies convergence in probability.
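
As a quick numerical illustration (a sketch, not part of the original notes; the sequence $X_n = X + Z_n/n$ with standard Gaussian $Z_n$ is a made-up example), the following Python snippet estimates $\Pr(|X_n - X| > \epsilon)$ and $E|X_n - X|$ by Monte Carlo. Both quantities shrink as n grows, consistent with convergence in probability and in the $\ell_1$ sense.

    import numpy as np

    rng = np.random.default_rng(0)
    eps = 0.1          # threshold for convergence in probability
    trials = 100_000   # Monte Carlo sample size

    X = rng.standard_normal(trials)                    # limiting RV X
    for n in (1, 10, 100, 1000):
        Xn = X + rng.standard_normal(trials) / n       # illustrative sequence X_n = X + Z_n / n
        prob = np.mean(np.abs(Xn - X) > eps)           # estimates Pr(|X_n - X| > eps)
        l1 = np.mean(np.abs(Xn - X))                   # estimates E|X_n - X|
        print(n, prob, l1)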

2.2 Typical Sequences

The following material appears in most textbooks on information theory (cf. Cover and Thomas [1] and references therein). We include the highlights in order to make these notes self-contained, but skip some details and the proofs. Consider a sequence $x = (x_1, x_2, \ldots, x_n)$, where $x_i \in \alpha$, $\alpha$ is the alphabet, and the cardinality of $\alpha$ is $r$, i.e., $|\alpha| = r$.

Definition 1 The type of x consists of the empirical probabilities of symbols in x,

(2.3)
$$ P_x(a) = \frac{n_x(a)}{n}, \quad a \in \alpha , $$

where $n_x(a)$ is the empirical symbol count, i.e., the number of times that $a \in \alpha$ appears in x.
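
As an illustration (a minimal Python sketch, not from the original notes; the function name sequence_type is ours), the type of a sequence is simply a table of normalized symbol counts:

    from collections import Counter

    def sequence_type(x, alphabet):
        """Empirical probabilities P_x(a) = n_x(a) / n for each symbol a."""
        n = len(x)
        counts = Counter(x)                               # n_x(a) for symbols that occur
        return {a: counts.get(a, 0) / n for a in alphabet}

    # Example from the notes: x = 11321 over the alphabet {1, 2, 3}
    print(sequence_type([1, 1, 3, 2, 1], [1, 2, 3]))      # {1: 0.6, 2: 0.2, 3: 0.2}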

Definition 2 The set of all possible types of length-n sequences is denoted by $P_n$.

Example 2.1

For an alphabet $\alpha = \{0,1\}$ we have $P_n = \left\{ \left(\frac{0}{n}, \frac{n}{n}\right), \left(\frac{1}{n}, \frac{n-1}{n}\right), \ldots, \left(\frac{n}{n}, \frac{0}{n}\right) \right\}$. In this case, $|P_n| = n + 1$.

Definition 3 A type class $T_x$ contains all $x' \in \alpha^n$ such that $P_x = P_{x'}$,

(2.4)
$$ T_x = \left\{ x' \in \alpha^n : P_{x'} = P_x \right\} . $$
Example 2.2

Consider $\alpha = \{1,2,3\}$ and $x = 11321$. We have $n = 5$ and the empirical counts are $n_x = (3,1,1)$. Therefore, the type is $P_x = \left( \frac{3}{5}, \frac{1}{5}, \frac{1}{5} \right)$, and the type class $T_x$ contains all length-5 sequences with 3 ones, 1 two, and 1 three. That is, $T_x = \{11123, 11132, \ldots, 32111\}$. It is easy to see that $|T_x| = \frac{5!}{3! \, 1! \, 1!} = 20$.
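
This count can be double-checked numerically (an illustrative sketch, not from the original notes): enumerate all $3^5 = 243$ length-5 sequences over {1, 2, 3} and keep those with the same type as x = 11321.

    from itertools import product
    from math import factorial

    x = (1, 1, 3, 2, 1)
    n, alphabet = len(x), (1, 2, 3)
    type_of = lambda s: tuple(s.count(a) for a in alphabet)   # empirical counts n_x(a)

    # Brute-force size of the type class T_x
    size = sum(1 for s in product(alphabet, repeat=n) if type_of(s) == type_of(x))

    # Multinomial coefficient 5! / (3! 1! 1!)
    c = type_of(x)
    multinomial = factorial(n) // (factorial(c[0]) * factorial(c[1]) * factorial(c[2]))

    print(size, multinomial)   # both print 20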

Theorem 1 The cardinality of the set of all types satisfies $|P_n| \leq (n+1)^{r-1}$.

The proof is simple, and was given in class. We note in passing that this bound is loose, but it is good enough for our discussion.
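
For intuition about how tight or loose this bound is (an illustrative sketch, not from the original notes), one can enumerate $P_n$ for small n and r and compare its cardinality to $(n+1)^{r-1}$:

    from itertools import product

    def num_types(n, r):
        """Count the count-vectors (n_1, ..., n_r) with nonnegative entries summing to n."""
        return sum(1 for c in product(range(n + 1), repeat=r) if sum(c) == n)

    for n, r in [(5, 2), (5, 3), (10, 3)]:
        print(n, r, num_types(n, r), (n + 1) ** (r - 1))
    # e.g. n=5, r=2: 6 types vs bound 6; n=5, r=3: 21 types vs bound 36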

Next, consider an i.i.d. source with the following prior,

(2.5)
$$ Q(x) = \prod_{i=1}^{n} Q(x_i) . $$

We note in passing that i.i.d. sources are sometimes called memoryless. Let the entropy be

(2.6)
$$ H = H(Q) = - \sum_{a \in \alpha} Q(a) \log Q(a) , $$

where we use base-two logarithms throughout. We are studying the entropy $H(Q)$ in order to show that it is the fundamental performance limit in lossless compression.

We also define the divergence as

(2.7)
$$ D(P \| Q) = \sum_{a \in \alpha} P(a) \log \frac{P(a)}{Q(a)} . $$

It is well known that the divergence is non-negative,

(2.8)
$$ D(P \| Q) \geq 0 . $$

Moreover, $D(P \| Q) = 0$ only if the distributions are identical.
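
As a small numerical illustration (a Python sketch, not from the original notes; the distributions P and Q below are arbitrary examples), entropy and divergence with base-two logarithms can be computed directly from their definitions; the printed divergence is non-negative and equals zero only when the two distributions coincide.

    from math import log2

    def entropy(Q):
        """H(Q) = -sum_a Q(a) log2 Q(a), with the convention 0 log 0 = 0."""
        return -sum(q * log2(q) for q in Q.values() if q > 0)

    def divergence(P, Q):
        """D(P||Q) = sum_a P(a) log2(P(a)/Q(a)); assumes Q(a) > 0 wherever P(a) > 0."""
        return sum(p * log2(p / Q[a]) for a, p in P.items() if p > 0)

    Q = {1: 0.5, 2: 0.25, 3: 0.25}
    P = {1: 0.6, 2: 0.2, 3: 0.2}
    print(entropy(Q), divergence(P, Q), divergence(Q, Q))   # last value is 0.0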

Claim 1 The following relation holds,

(2.9)
$$ Q(x) = 2^{-n \left( H(P_x) + D(P_x \| Q) \right)} . $$

The derivation is straightforward,

(2.10)
$$ Q(x) = \prod_{i=1}^{n} Q(x_i) = \prod_{a \in \alpha} Q(a)^{n_x(a)} = \prod_{a \in \alpha} 2^{n_x(a) \log Q(a)} = 2^{n \sum_{a \in \alpha} P_x(a) \log Q(a)} = 2^{-n \left( H(P_x) + D(P_x \| Q) \right)} . $$

Seeing that the divergence is non-negative (Equation 2.8), and zero only if the distributions are equal, we have $Q(x) \leq 2^{-n H(P_x)} = P_x(x)$. When $P_x = Q$ the divergence between them is zero, and we have that $Q(x) = 2^{-n H(Q)}$.
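
Continuing the running example (an illustrative check, not from the original notes; Q is the same arbitrary prior used above), Equation 2.9 can be verified numerically for x = 11321:

    from math import log2, prod

    Q = {1: 0.5, 2: 0.25, 3: 0.25}
    x = (1, 1, 3, 2, 1)
    n = len(x)
    Px = {a: x.count(a) / n for a in Q}                           # type of x: (0.6, 0.2, 0.2)

    direct = prod(Q[xi] for xi in x)                              # Q(x) = prod_i Q(x_i)
    H = -sum(p * log2(p) for p in Px.values() if p > 0)           # H(P_x)
    D = sum(p * log2(p / Q[a]) for a, p in Px.items() if p > 0)   # D(P_x || Q)
    via_claim = 2 ** (-n * (H + D))                               # 2^{-n(H(P_x) + D(P_x||Q))}

    print(direct, via_claim)   # both print 0.0078125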

The proof of the following theorem was discussed in class.

Theorem 2 The cardinality of the type class $T_x$ obeys,

(2.11)
$$ \frac{2^{n H(P_x)}}{(n+1)^{r-1}} \leq |T_x| \leq 2^{n H(P_x)} . $$
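
These bounds can be sanity-checked on the running example (an illustrative sketch, not from the original notes): for x = 11321 we have $H(P_x) \approx 1.371$ bits, and the bounds in Equation 2.11 bracket $|T_x| = 20$.

    from math import log2, factorial

    n, r = 5, 3
    counts = (3, 1, 1)                                   # n_x(a) for x = 11321
    Px = [c / n for c in counts]
    H = -sum(p * log2(p) for p in Px)                    # H(P_x) ~= 1.371 bits

    Tx = factorial(n) // (factorial(3) * factorial(1) * factorial(1))   # |T_x| = 20
    lower = 2 ** (n * H) / (n + 1) ** (r - 1)            # ~= 3.2
    upper = 2 ** (n * H)                                 # ~= 115.9
    print(lower, Tx, upper)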

Having computed the probability of x and cardinality of its type class, we can easily compute the probability of the type class.

Claim 2 The probability $Q(T_x)$ of the type class $T_x$ obeys,

(2.12)
$$ \frac{2^{-n D(P_x \| Q)}}{(n+1)^{r-1}} \leq Q(T_x) \leq 2^{-n D(P_x \| Q)} . $$
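
One more check in the same spirit (an illustrative sketch, not from the original notes), using $Q(T_x) = |T_x| \cdot Q(x)$ for the running example:

    from math import log2, factorial, prod

    Q = {1: 0.5, 2: 0.25, 3: 0.25}
    x = (1, 1, 3, 2, 1)
    n, r = len(x), len(Q)
    Px = {a: x.count(a) / n for a in Q}

    Tx = factorial(n) // prod(factorial(x.count(a)) for a in Q)       # |T_x| = 20
    QTx = Tx * prod(Q[xi] for xi in x)                                # Q(T_x) = |T_x| Q(x) ~= 0.156
    D = sum(p * log2(p / Q[a]) for a, p in Px.items() if p > 0)       # D(P_x||Q) ~= 0.029
    print(2 ** (-n * D) / (n + 1) ** (r - 1), QTx, 2 ** (-n * D))     # ~0.025 <= 0.156 <= ~0.905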

Consider now an event A that is a union of type classes $T_x$. Suppose $T(Q) \not\subseteq A$; then A is rare with respect to (w.r.t.) the prior Q, and we have $\lim_{n\to\infty} Q(A) = 0$. That is, the probability is concentrated around Q. In general, the probability assigned by the prior Q to an event A satisfies

(2.13)
$$ Q(A) \doteq 2^{-n \min_{P_x : T_x \subseteq A} D(P_x \| Q)} , $$

where we denote $a_n \doteq b_n$ when $\lim_{n\to\infty} \frac{1}{n} \log \frac{a_n}{b_n} = 0$.

2.3 Fixed and Variable Length Coding

Fixed to fixed length source coding: As before, we have a sequence x of length n, and each element of x is from the alphabet α. A source code maps the input $x \in \alpha^n$, one of the $r^n$ possible inputs, to a set of $2^{Rn}$ bit vectors, each of length $Rn$. The rate R quantifies the number of output bits of the code per input element of x.[1] That is, the output of the code consists of $nR$ bits. If n and R are fixed, then we call this a fixed to fixed length source code.

The decoder processes the $nR$ bits and yields $\hat{x}$. Ideally we have that $\hat{x} = x$, but if $2^{nR} < r^n$ then there are inputs that are not mapped to any output, and $\hat{x}$ may differ from x. Therefore, we want $\Pr(\hat{x} \neq x)$ to be small. If R is too small, then the error probability will go to 1. On the other hand, a sufficiently large R will drive this error probability to 0 as n is increased.

If $\log(r) > R$ and $\Pr(\hat{x} \neq x)$ vanishes as n is increased, then we are compressing, because $2^{n \log(r)} = r^n > 2^{Rn}$, where $r^n$ is the number of possible inputs x and there are $2^{Rn}$ possible outputs.

What is a good fixed to fixed length source code? One option is to map $2^{Rn} - 1$ outputs to inputs with high probabilities, and the last output can be mapped to a “don't care” input. We will discuss the performance of this style of code.

An input $x \in \alpha^n$ is called δ-typical if $Q(x) > 2^{-(H+\delta)n}$. We denote the set of δ-typical inputs by $T_Q(\delta)$; this set includes the type classes whose empirical probabilities are equal (or closest) to the true prior Q. Note that for each type class $T_x$, all inputs $x' \in T_x$ in the type class have the same probability, i.e., $Q(x') = Q(x)$. Therefore, the set $T_Q(\delta)$ is a union of type classes, and can be thought of as an event A (Section 2.2) that contains type classes consisting of high-probability sequences. It is easily seen that the event A contains the type class of the true i.i.d. distribution Q, because sequences whose empirical probabilities satisfy $P_x = Q$ also satisfy

(2.14)
$$ Q(x) = 2^{-Hn} > 2^{-(H+\delta)n} . $$

Using the principles discussed in Section 2.2, it is readily seen that the probability under the prior Q of the inputs in $T_Q(\delta)$ satisfies $Q(T_Q(\delta)) \rightarrow 1$ when $n \rightarrow \infty$. Therefore, a code C that enumerates $T_Q(\delta)$ will encode x correctly with high probability.

The key question is the size of C, or the cardinality of $T_Q(\delta)$. Because each $x \in T_Q(\delta)$ satisfies $Q(x) > 2^{-(H+\delta)n}$, and $\sum_{x \in T_Q(\delta)} Q(x) \leq 1$, we have $|T_Q(\delta)| < 2^{(H+\delta)n}$. Therefore, a rate $R \geq H + \delta$ allows near-lossless coding, because the probability of error vanishes (recall that $Q\left( \left( T_Q(\delta) \right)^C \right) \rightarrow 0$, where $(\cdot)^C$ denotes the complement).
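
To make this concrete, here is a small Python sketch (not from the original notes; the binary alphabet, the prior Q, and the parameters n and δ are arbitrary illustrative choices) that enumerates the δ-typical set, assigns each typical sequence a fixed-length index, and computes the resulting rate and error probability.

    from itertools import product
    from math import ceil, log2, prod

    Q = {0: 0.7, 1: 0.3}                           # illustrative i.i.d. prior
    H = -sum(q * log2(q) for q in Q.values())      # entropy in bits, ~0.881
    n, delta = 16, 0.1

    Qprob = lambda s: prod(Q[a] for a in s)        # Q(x) = prod_i Q(x_i)

    # delta-typical set: all x with Q(x) > 2^{-(H+delta)n}
    typical = [s for s in product(Q, repeat=n) if Qprob(s) > 2 ** (-(H + delta) * n)]

    # Fixed to fixed length code: enumerate the typical set, reserving one extra
    # index as the "don't care" output for atypical inputs.
    codebook = {s: i for i, s in enumerate(typical)}
    rate = ceil(log2(len(codebook) + 1)) / n       # output bits per input symbol, at most about H + delta
    p_error = 1.0 - sum(Qprob(s) for s in typical) # probability that the input is atypical

    print(len(codebook), rate, p_error)            # per the notes, p_error vanishes as n grows for R >= H + delta

The enumeration here is exhaustive and only feasible for small n; the point is simply to make the rate versus error-probability trade-off tangible.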

On the other hand, a rate $R \leq H - \delta$ will not allow lossless coding, and the probability of error will go to 1. We will see this intuitively. Because the type class whose empirical prob