We include some background material for the course. Let us recall some notions of convergence of random variables (RV's).
A sequence of RV's $X_n$ converges in probability to a RV $X$ if, for every $\epsilon > 0$, $\lim_{n\to\infty} \Pr(|X_n - X| > \epsilon) = 0$. We denote this by $X_n \xrightarrow{P} X$.
A sequence of RV's $X_n$ converges to $X$ with probability 1 if $\Pr(\lim_{n\to\infty} X_n = X) = 1$. We denote this by $X_n \to X$ a.s. (almost surely) or $X_n \to X$ w.p. 1.
A sequence of RV's $X_n$ converges to $X$ in the $\ell_p$ sense if $\lim_{n\to\infty} E[|X_n - X|^p] = 0$. We denote this by $X_n \xrightarrow{\ell_p} X$.
For example, for $p=2$ we have mean square convergence, $\lim_{n\to\infty} E[(X_n - X)^2] = 0$. For $p \ge 2$, Jensen's inequality gives $E[|X_n - X|^2] \le \left(E[|X_n - X|^p]\right)^{2/p}$. Therefore, convergence in the $\ell_p$ sense yields mean square convergence. Note that for convergence in the $\ell_1$ sense, we have $E[|X_n - X|] \to 0$, and hence $|E[X_n] - E[X]| \le E[|X_n - X|] \to 0$.
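These definitions are easiest to internalize with a concrete sequence of RV's. The following minimal Python simulation (our own illustration, not part of the original notes) takes $X_n$ to be the sample mean of $n$ i.i.d. Uniform(0,1) draws, which converges to $1/2$ both in probability and in the mean square sense:

```python
import random

def sample_mean(n):
    """Sample mean of n i.i.d. Uniform(0,1) draws; E[X] = 1/2."""
    return sum(random.random() for _ in range(n)) / n

random.seed(0)
eps = 0.05
for n in [10, 100, 1000]:
    trials = [sample_mean(n) for _ in range(2000)]
    # Convergence in probability: Pr(|X_n - 1/2| > eps) shrinks with n.
    p_dev = sum(abs(t - 0.5) > eps for t in trials) / len(trials)
    # ell_2 (mean square) convergence: E[(X_n - 1/2)^2] shrinks with n.
    mse = sum((t - 0.5) ** 2 for t in trials) / len(trials)
    print(f"n={n:5d}  Pr(dev>eps)~{p_dev:.3f}  MSE~{mse:.2e}")
```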
The following material appears in most textbooks on information theory (cf. Cover and Thomas [1] and references therein). We include the highlights in order to make these notes self-contained, but skip some details and the proofs. Consider a sequence $x = (x_1, \ldots, x_n) \in \alpha^n$, where $x_i \in \alpha$, $\alpha$ is the alphabet, and the cardinality of $\alpha$ is $r$, i.e., $|\alpha| = r$.
Definition 1 The type of $x$ consists of the empirical probabilities of symbols in $x$,
$$P_x(a) = \frac{n_x(a)}{n}, \quad a \in \alpha,$$
where $n_x(a)$ is the empirical symbol count, i.e., the number of times that $a \in \alpha$ appears in $x$.
Definition 2 The set of all possible types of length-$n$ sequences is defined as $\mathcal{P}_n = \{P_x : x \in \alpha^n\}$.
For an alphabet $\alpha = \{0,1\}$ we have $\mathcal{P}_n = \{(0/n, n/n), (1/n, (n-1)/n), \ldots, (n/n, 0/n)\}$. In this case, $|\mathcal{P}_n| = n+1$.
Definition 3 A type class $T_x$ contains all $x' \in \alpha^n$ such that $P_{x'} = P_x$,
$$T_x = \{x' \in \alpha^n : P_{x'} = P_x\}.$$
Consider $\alpha = \{1,2,3\}$ and $x = 11321$. We have $n=5$ and the empirical counts are $n_x = (3,1,1)$. Therefore, the type is $P_x = (3/5, 1/5, 1/5)$, and the type class $T_x$ contains all length-5 sequences with three 1's, one 2, and one 3. That is, $T_x = \{11123, 11132, \ldots, 32111\}$. It is easy to see that $|T_x| = \binom{5}{3,1,1} = \frac{5!}{3!\,1!\,1!} = 20$.
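The type and type-class computations above are mechanical, so the following short Python sketch (the function names type_of and type_class_size are ours) reproduces the example:

```python
from collections import Counter
from math import factorial, prod

def type_of(x, alphabet):
    """Empirical distribution P_x(a) = n_x(a)/n for each symbol a."""
    counts = Counter(x)
    n = len(x)
    return {a: counts[a] / n for a in alphabet}

def type_class_size(x, alphabet):
    """|T_x| = n! / prod_a n_x(a)!  (multinomial coefficient)."""
    counts = Counter(x)
    n = len(x)
    return factorial(n) // prod(factorial(counts[a]) for a in alphabet)

x, alphabet = "11321", "123"
print(type_of(x, alphabet))          # {'1': 0.6, '2': 0.2, '3': 0.2}
print(type_class_size(x, alphabet))  # 5!/(3!1!1!) = 20
```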
Theorem 1 The cardinality of the set of all types satisfies $|\mathcal{P}_n| \le (n+1)^{r-1}$.
The proof is simple, and was given in class. We note in passing that this bound is loose, but it is good enough for our discussion.
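As a sanity check on Theorem 1, a brute-force enumeration (our own sketch) compares $|\mathcal{P}_n|$ to the bound $(n+1)^{r-1}$ for small $n$ and $r$:

```python
from itertools import product

def num_types(n, r):
    """Count distinct empirical count vectors (n_1,...,n_r) summing to n."""
    return sum(1 for c in product(range(n + 1), repeat=r) if sum(c) == n)

for n, r in [(5, 2), (5, 3), (8, 3)]:
    print(n, r, num_types(n, r), (n + 1) ** (r - 1))
```

For $r=2$ the bound is tight ($|\mathcal{P}_n| = n+1$); for larger $r$ it is loose, consistent with the remark above.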
Next, consider an i.i.d. source with the following prior,
$$Q(x) = \prod_{i=1}^{n} Q(x_i).$$
We note in passing that i.i.d. sources are sometimes called memoryless. Let the entropy be
$$H = H(Q) = -\sum_{a \in \alpha} Q(a) \log Q(a),$$
where we use base-two logarithms throughout. We are studying the entropy in order to show that it is the fundamental performance limit in lossless compression.
We also define the divergence as
$$D(P \| Q) = \sum_{a \in \alpha} P(a) \log \frac{P(a)}{Q(a)}.$$
It is well known that the divergence is non-negative,
$$D(P \| Q) \ge 0.$$
Moreover, $D(P \| Q) = 0$ if and only if the distributions are identical.
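For the numerical checks below, here are minimal Python helpers (our sketch; both quantities in bits, matching the base-two convention above) for $H(Q)$ and $D(P\|Q)$:

```python
from math import log2

def entropy(Q):
    """H(Q) = -sum_a Q(a) log2 Q(a), with the convention 0 log 0 = 0."""
    return -sum(q * log2(q) for q in Q.values() if q > 0)

def divergence(P, Q):
    """D(P||Q) = sum_a P(a) log2(P(a)/Q(a)); infinite if P puts mass where Q has none."""
    return sum(p * log2(p / Q[a]) for a, p in P.items() if p > 0)

Q = {"0": 0.5, "1": 0.5}
P = {"0": 0.8, "1": 0.2}
print(entropy(Q), entropy(P))   # 1.0, ~0.722
print(divergence(P, Q))         # ~0.278 >= 0, zero iff P == Q
```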
Claim 1 The following relation holds,
$$Q(x) = 2^{-n\left(H(P_x) + D(P_x \| Q)\right)}.$$
The derivation is straightforward,
$$Q(x) = \prod_{i=1}^{n} Q(x_i) = \prod_{a \in \alpha} Q(a)^{n_x(a)} = 2^{n \sum_{a} P_x(a) \log Q(a)} = 2^{-n\left(H(P_x) + D(P_x \| Q)\right)},$$
where the last step uses $\sum_a P_x(a) \log Q(a) = -H(P_x) - D(P_x \| Q)$.
Seeing that the divergence is non-negative (Equation 2.8), and zero only if the distributions are equal, we have $Q(x) \le 2^{-nH(P_x)} = P_x(x)$. When $P_x = Q$ the divergence between them is zero, and we have that $Q(x) = 2^{-nH(Q)}$.
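Claim 1 is easy to verify numerically. The sketch below (with an assumed prior $Q$, chosen only for illustration) compares the direct product of symbol probabilities against the exponential form $2^{-n(H(P_x)+D(P_x\|Q))}$ for the running example $x = 11321$:

```python
from collections import Counter
from math import log2, prod

def entropy(P):
    """H(P) in bits."""
    return -sum(p * log2(p) for p in P.values() if p > 0)

def divergence(P, Q):
    """D(P||Q) in bits."""
    return sum(p * log2(p / Q[a]) for a, p in P.items() if p > 0)

Q = {"1": 0.5, "2": 0.25, "3": 0.25}   # assumed prior, for illustration only
x = "11321"
n = len(x)

direct = prod(Q[a] for a in x)                    # product of symbol probabilities
P_x = {a: c / n for a, c in Counter(x).items()}   # the type of x
via_claim = 2 ** (-n * (entropy(P_x) + divergence(P_x, Q)))
print(direct, via_claim)  # both equal 0.5**3 * 0.25**2 = 0.0078125
```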
The proof of the following theorem was discussed in class.
Theorem 2 The cardinality of the type class obeys,
$$\frac{2^{nH(P_x)}}{(n+1)^{r-1}} \le |T_x| \le 2^{nH(P_x)}.$$
Having computed the probability of x and cardinality of its type class, we can easily compute the probability of the type class.
Claim 2 The probability of the type class obeys,
$$\frac{2^{-nD(P_x \| Q)}}{(n+1)^{r-1}} \le Q(T_x) \le 2^{-nD(P_x \| Q)}.$$
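Both sandwiches can be checked on the running example; in the sketch below (our own check), the prior $Q$ is again an assumption made for illustration:

```python
from collections import Counter
from math import factorial, log2, prod

x, alphabet = "11321", "123"
Q = {"1": 0.5, "2": 0.25, "3": 0.25}   # assumed prior, for illustration only
n, r = len(x), len(alphabet)
counts = Counter(x)
P_x = {a: counts[a] / n for a in alphabet}

H = -sum(p * log2(p) for p in P_x.values() if p > 0)
D = sum(p * log2(p / Q[a]) for a, p in P_x.items() if p > 0)
T_size = factorial(n) // prod(factorial(counts[a]) for a in alphabet)
Q_Tx = T_size * prod(Q[a] ** counts[a] for a in alphabet)  # members are equiprobable

print(2 ** (n * H) / (n + 1) ** (r - 1), "<=", T_size, "<=", 2 ** (n * H))
print(2 ** (-n * D) / (n + 1) ** (r - 1), "<=", Q_Tx, "<=", 2 ** (-n * D))
```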
Consider now an event $A$ that is a union of type classes. Suppose $T(Q) \not\subseteq A$; then $A$ is rare with respect to (w.r.t.) the prior $Q$, and we have $\lim_{n\to\infty} Q(A) = 0$. That is, the probability is concentrated around $Q$. In general, the probability assigned by the prior $Q$ to an event $A$ satisfies
$$Q(A) \doteq 2^{-n \min_{P \in A} D(P \| Q)},$$
where we denote $a_n \doteq 2^{nb}$ when $\lim_{n\to\infty} \frac{1}{n} \log a_n = b$.
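This exponential decay rate can be seen numerically in the binary case. In the sketch below (our own construction), the prior is Bernoulli(1/2) and the rare event $A$ collects all sequences whose empirical fraction of ones is at least $0.75$; the empirical exponent $-\frac{1}{n}\log_2 Q(A)$ approaches $\min_{P \in A} D(P\|Q) = D(0.75\|0.5)$ as $n$ grows:

```python
from math import ceil, comb, log2

def D_bern(p, q):
    """Binary divergence D(p||q) in bits."""
    return p * log2(p / q) + (1 - p) * log2((1 - p) / (1 - q))

q, thresh = 0.5, 0.75
for n in [20, 100, 500]:
    # Exact Q(A) for A = {x : at least thresh*n ones} under Bernoulli(q).
    QA = sum(comb(n, k) * q**k * (1 - q)**(n - k)
             for k in range(ceil(thresh * n), n + 1))
    print(f"n={n:4d}  -log2(Q(A))/n = {-log2(QA)/n:.4f}"
          f"  D(0.75||0.5) = {D_bern(thresh, q):.4f}")
```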
Fixed-to-fixed length source coding: As before, we have a sequence $x$ of length $n$, and each element of $x$ is from the alphabet $\alpha$. A source code maps the input $x \in \alpha^n$ to a set of $2^{Rn}$ bit vectors, each of length $Rn$. The rate $R$ quantifies the number of output bits of the code per input element of $x$.[1] That is, the output of the code consists of $nR$ bits. If $n$ and $R$ are fixed, then we call this a fixed-to-fixed length source code.
The decoder processes the $nR$ bits and yields $\hat{x}$. Ideally we have that $\hat{x} = x$, but if $2^{nR} < r^n$ then there are inputs that are not mapped to any output, and $\hat{x}$ may differ from $x$. Therefore, we want $\Pr(\hat{x} \ne x)$ to be small. If $R$ is too small, then the error probability will go to 1. On the other hand, sufficiently large $R$ will drive this error probability to 0 as $n$ is increased.
If $\log(r) > R$ and $\Pr(\hat{x} \ne x)$ is vanishing as $n$ is increased, then we are compressing, because $2^{n\log(r)} = r^n > 2^{Rn}$, where $r^n$ is the number of possible inputs $x$ and there are $2^{Rn}$ possible outputs.
What is a good fixed-to-fixed length source code? One option is to map $2^{Rn} - 1$ outputs to inputs with high probabilities, and the last output can be mapped to a “don't care” input. We will discuss the performance of this style of code.
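A toy version of this style of code is easy to write down. The sketch below (our own construction; build_codebook is a hypothetical helper, and the prior $Q$ is an assumption for illustration) keeps the $2^{Rn}-1$ most probable inputs and maps everything else to the leftover “don't care” index; the printed error probability is exactly the mass of the unmapped inputs:

```python
from itertools import product
from math import ceil, prod

def build_codebook(alphabet, Q, n, R):
    """Map the 2^{nR}-1 most probable length-n inputs to distinct indices."""
    num_codewords = 2 ** ceil(n * R)
    inputs = sorted(product(alphabet, repeat=n),
                    key=lambda x: prod(Q[a] for a in x), reverse=True)
    return {x: i for i, x in enumerate(inputs[:num_codewords - 1])}

Q = {"0": 0.9, "1": 0.1}                     # assumed prior, for illustration
book = build_codebook("01", Q, n=10, R=0.5)  # 2^5 = 32 outputs, 31 used
p_error = 1 - sum(prod(Q[a] for a in x) for x in book)
print(len(book), p_error)  # error prob. = mass mapped to "don't care"
```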
An input $x \in \alpha^n$ is called $\delta$-typical if $Q(x) > 2^{-n(H+\delta)}$. We denote the set of $\delta$-typical inputs by $T_Q(\delta)$; this set includes the type classes whose empirical probabilities are equal (or closest) to the true prior $Q$. Note that for each type class $T_x$, all inputs $x' \in T_x$ in the type class have the same probability, i.e., $Q(x') = 2^{-n(H(P_x) + D(P_x \| Q))}$. Therefore, the set $T_Q(\delta)$ is a union of type classes, and can be thought of as an event $A$ (Section 2.2) that contains type classes consisting of high-probability sequences. It is easily seen that the event $A$ contains the true i.i.d. distribution $Q$, because sequences whose empirical probabilities satisfy $P_x = Q$ also satisfy $Q(x) = 2^{-nH} > 2^{-n(H+\delta)}$.
Using the principles discussed in Section 2.2, it is readily seen that the probability under the prior $Q$ of the inputs in $T_Q(\delta)$ satisfies $Q(T_Q(\delta)) \to 1$ when $n \to \infty$. Therefore, a code $C$ that enumerates $T_Q(\delta)$ will encode $x$ correctly with high probability.
The key question is the size of $C$, or the cardinality of $T_Q(\delta)$. Because each $x \in T_Q(\delta)$ satisfies $Q(x) > 2^{-n(H+\delta)}$, and $\sum_{x \in T_Q(\delta)} Q(x) \le 1$, we have $|T_Q(\delta)| < 2^{n(H+\delta)}$. Therefore, a rate $R \ge H + \delta$ allows near-lossless coding, because the probability of error vanishes (recall that $Q\left(T_Q(\delta)^C\right) \to 0$, where $(\cdot)^C$ denotes the complement).
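The two facts above, $Q(T_Q(\delta)) \to 1$ and $|T_Q(\delta)| < 2^{n(H+\delta)}$, can be checked by brute force for a small binary source (our own sketch; note that at these small $n$ the convergence of the mass to 1 is slow and non-monotone):

```python
from itertools import product
from math import log2, prod

Q = {"0": 0.9, "1": 0.1}   # assumed binary prior, for illustration only
H = -sum(q * log2(q) for q in Q.values())
delta = 0.1
for n in [4, 8, 12, 16]:
    thresh = 2 ** (-n * (H + delta))
    typical = [x for x in product(Q, repeat=n) if prod(Q[a] for a in x) > thresh]
    mass = sum(prod(Q[a] for a in x) for x in typical)
    # |T_Q(delta)| stays below 2^{n(H+delta)}; the mass tends to 1 (slowly).
    print(f"n={n:2d}  |T|={len(typical):4d}"
          f"  bound={2 ** (n * (H + delta)):8.1f}  Q(T)={mass:.3f}")
```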
On the other hand, a rate $R \le H - \delta$ will not allow lossless coding, and the probability of error will go to 1. We will see this intuitively. Because the type class whose empirical probabilities match $Q$ contains roughly $2^{nH}$ sequences, a code with only $2^{n(H-\delta)}$ outputs can cover a vanishing fraction of them.