A Mathematical Theory of Communication, by C. E. Shannon


\[ \rho(x, y) = \frac{1}{T}\int_0^T \bigl[ x(t) - y(t) \bigr]^2 \, dt . \]

2. Frequency weighted R.M.S. criterion. More generally one can apply different weights to the different frequency components before using an R.M.S. measure of fidelity. This is equivalent to passing the difference x(t) − y(t) through a shaping filter and then determining the average power in the output. Thus let
\[ e(t) = x(t) - y(t) \]
and
\[ f(t) = \int_{-\infty}^{\infty} e(\tau)\, k(t - \tau)\, d\tau , \]
then
\[ \rho(x, y) = \frac{1}{T}\int_0^T f(t)^2 \, dt . \]

3. Absolute error criterion.
\[ \rho(x, y) = \frac{1}{T}\int_0^T \bigl| x(t) - y(t) \bigr| \, dt . \]

4. The structure of the ear and brain determine implicitly an evaluation, or rather a number of evaluations,

appropriate in the case of speech or music transmission. There is, for example, an “intelligibility”

criterion in which ρ(x, y) is equal to the relative frequency of incorrectly interpreted words when message x(t) is received as y(t). Although we cannot give an explicit representation of ρ(x, y) in these cases it could, in principle, be determined by sufficient experimentation. Some of its properties follow

from well-known experimental results in hearing, e.g., the ear is relatively insensitive to phase and the

sensitivity to amplitude and frequency is roughly logarithmic.

5. The discrete case can be considered as a specialization in which we have tacitly assumed an evaluation

based on the frequency of errors. The function ρ(x, y) is then defined as the number of symbols in the sequence y differing from the corresponding symbols in x divided by the total number of symbols in x.
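
The distance functions above are straightforward to evaluate for sampled signals. The following is a minimal numerical sketch (my own illustration, not part of the paper), assuming x and y are equal-length NumPy arrays of uniform samples over (0, T) and, for criterion 2, that k is the sampled impulse response of the shaping filter.

```python
# Minimal sketch of fidelity criteria 1, 2, 3 and 5 above for sampled signals.
# Assumptions (mine, not the paper's): x and y are equal-length sample arrays;
# k is the impulse response of the shaping filter in criterion 2.
import numpy as np

def rho_rms(x, y):
    """Criterion 1: mean square discrepancy, (1/T) * integral of (x - y)^2."""
    return np.mean((x - y) ** 2)

def rho_weighted_rms(x, y, k):
    """Criterion 2: pass e = x - y through a shaping filter k, then average the power."""
    e = x - y
    f = np.convolve(e, k, mode="same")   # discrete stand-in for the convolution integral
    return np.mean(f ** 2)

def rho_abs(x, y):
    """Criterion 3: mean absolute error, (1/T) * integral of |x - y|."""
    return np.mean(np.abs(x - y))

def rho_error_frequency(x_symbols, y_symbols):
    """Criterion 5: fraction of symbols in y differing from the corresponding symbols in x."""
    return np.mean(np.asarray(x_symbols) != np.asarray(y_symbols))
```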

28. THE RATE FOR A SOURCE RELATIVE TO A FIDELITY EVALUATION

We are now in a position to define a rate of generating information for a continuous source. We are given

P(x) for the source and an evaluation v determined by a distance function ρ(x, y) which will be assumed continuous in both x and y. With a particular system P(x, y) the quality is measured by
\[ v = \iint \rho(x, y)\, P(x, y)\, dx\, dy . \]

Furthermore the rate of flow of binary digits corresponding to P(x, y) is
\[ R = \iint P(x, y) \log \frac{P(x, y)}{P(x)\, P(y)} \, dx\, dy . \]

We define the rate R₁ of generating information for a given quality v₁ of reproduction to be the minimum of R when we keep v fixed at v₁ and vary P_x(y). That is:
\[ R_1 = \min_{P_x(y)} \iint P(x, y) \log \frac{P(x, y)}{P(x)\, P(y)} \, dx\, dy \]
subject to the constraint:
\[ v_1 = \iint P(x, y)\, \rho(x, y)\, dx\, dy . \]

This means that we consider, in effect, all the communication systems that might be used and that

transmit with the required fidelity. The rate of transmission in bits per second is calculated for each one

and we choose that having the least rate. This latter rate is the rate we assign the source for the fidelity in question.
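
As a concrete instance of this definition (not evaluated in the paper itself, but a standard later consequence of it for the error-frequency criterion of item 5): for a source producing independent binary digits with probabilities p and 1 − p, the constrained minimum works out to
\[ R_1 = H(p) - H(v_1), \qquad 0 \le v_1 \le \min(p,\, 1 - p), \]
where H denotes the binary entropy function measured in bits; for v₁ at or above min(p, 1 − p) the rate is zero, since the receiver can reproduce the more probable symbol throughout without any transmission.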

The justification of this definition lies in the following result:

Theorem 21: If a source has a rate R₁ for a valuation v₁ it is possible to encode the output of the source and transmit it over a channel of capacity C with fidelity as near v₁ as desired provided R₁ ≤ C. This is not possible if R₁ > C.

The last statement in the theorem follows immediately from the definition of R₁ and previous results. If it were not true we could transmit more than C bits per second over a channel of capacity C. The first part of the theorem is proved by a method analogous to that used for Theorem 11. We may, in the first place,

divide the (x, y) space into a large number of small cells and represent the situation as a discrete case. This will not change the evaluation function by more than an arbitrarily small amount (when the cells are very small) because of the continuity assumed for ρ(x, y). Suppose that P₁(x, y) is the particular system which

minimizes the rate and gives R₁. We choose from the high probability y's a set at random containing 2^{(R₁+ε)T} members where ε → 0 as T → ∞. With large T each chosen point will be connected by a high probability

line (as in Fig. 10) to a set of x’s. A calculation similar to that used in proving Theorem 11 shows that with large T almost all x’s are covered by the fans from the chosen y points for almost all choices of the y’s. The communication system to be used operates as follows: The selected points are assigned binary numbers.

When a message x is originated it will (with probability approaching 1 as T → ∞) lie within at least one of the fans. The corresponding binary number is transmitted (or one of them chosen arbitrarily if there are several) over the channel by suitable coding means to give a small probability of error. Since R₁ ≤ C this is

possible. At the receiving point the corresponding y is reconstructed and used as the recovered message.

The evaluation v₁′ for this system can be made arbitrarily close to v₁ by taking T sufficiently large. This is due to the fact that for each long sample of message x(t) and recovered message y(t) the evaluation approaches v₁ (with probability 1).

It is interesting to note that, in this system, the noise in the recovered message is actually produced by a

kind of general quantizing at the transmitter and not produced by the noise in the channel. It is more or less analogous to the quantizing noise in PCM.

29. THE CALCULATION OF RATES

The definition of the rate is similar in many respects to the definition of channel capacity. In the former

\[ R = \min_{P_x(y)} \iint P(x, y) \log \frac{P(x, y)}{P(x)\, P(y)} \, dx\, dy \]
with P(x) and v₁ = ∬ P(x, y) ρ(x, y) dx dy fixed. In the latter

\[ C = \max_{P(x)} \iint P(x, y) \log \frac{P(x, y)}{P(x)\, P(y)} \, dx\, dy \]

with P_x(y) fixed and possibly one or more other constraints (e.g., an average power limitation) of the form K = ∬ P(x, y) λ(x, y) dx dy.

A partial solution of the general maximizing problem for determining the rate of a source can be given.

Using Lagrange’s method we consider

\[ \iint \Bigl[ P(x, y) \log \frac{P(x, y)}{P(x)\, P(y)} + \mu\, P(x, y)\, \rho(x, y) + \nu(x)\, P(x, y) \Bigr] \, dx\, dy . \]


The variational equation (when we take the first variation on P(x, y)) leads to
\[ P_y(x) = B(x)\, e^{-\lambda\, \rho(x, y)} \]

where λ is determined to give the required fidelity and B(x) is chosen to satisfy
\[ \int B(x)\, e^{-\lambda\, \rho(x, y)}\, dx = 1 . \]

This shows that, with best encoding, the conditional probability of a certain cause for various received

y, P_y(x), will decline exponentially with the distance function ρ(x, y) between the x and y in question.

In the special case where the distance function ρ(x, y) depends only on the (vector) difference between x and y,
\[ \rho(x, y) = \rho(x - y) , \]

we have

\[ \int B(x)\, e^{-\lambda\, \rho(x - y)}\, dx = 1 . \]

Hence B(x) is constant, say α, and
\[ P_y(x) = \alpha\, e^{-\lambda\, \rho(x - y)} . \]

Unfortunately these formal solutions are difficult to evaluate in particular cases and seem to be of little value.

In fact, the actual calculation of rates has been carried out in only a few very simple cases.
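
For a source discretized as in the proof of Theorem 21, however, the constrained minimum can be approximated numerically. The sketch below is my own illustration, not a procedure given in this paper: it uses the alternating iteration later known as the Blahut-Arimoto algorithm, with the multiplier λ of the variational solution above fixing the trade-off between fidelity and rate; it returns one point (v, R) of the rate curve.

```python
# Minimal sketch (mine, not from the paper): numerical evaluation of the rate for
# a discretized source via the later Blahut-Arimoto iteration. p[i] approximates
# the source distribution P(x), rho[i, j] the distance function rho(x, y) on the
# cells, and lam plays the role of the Lagrange multiplier lambda above.
import numpy as np

def rate_point(p, rho, lam, iterations=500):
    n_x, n_y = rho.shape
    q = np.full(n_y, 1.0 / n_y)                  # initial guess for the marginal P(y)
    for _ in range(iterations):
        # Conditional assignment P_x(y) proportional to q(y) * exp(-lam * rho(x, y))
        w = q[None, :] * np.exp(-lam * rho)
        cond = w / w.sum(axis=1, keepdims=True)
        q = p @ cond                             # updated marginal P(y)
    v = np.sum(p[:, None] * cond * rho)                      # achieved fidelity
    r = np.sum(p[:, None] * cond * np.log2(cond / q[None, :]))  # rate in bits per symbol
    return v, r

# Example: a symmetric binary source with the error-frequency distance of item 5.
p = np.array([0.5, 0.5])
rho = np.array([[0.0, 1.0], [1.0, 0.0]])
print(rate_point(p, rho, lam=3.0))   # approaches (v1, 1 - H(v1)) as the iteration converges
```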

If the distance function

ρ(x, y) is the mean square discrepancy between x and y and the message ensemble

is white noise, the rate can be determined. In that case we have

\[ R = \min \bigl[ H(x) - H_y(x) \bigr] = H(x) - \max H_y(x) \]
with N = (x − y)², the allowed mean square error. But the maximum of H_y(x) occurs when y − x is a white noise, and is equal to W₁ log 2πeN where W₁ is the bandwidth of the message ensemble. Therefore

\[ R = W_1 \log 2\pi e Q - W_1 \log 2\pi e N = W_1 \log \frac{Q}{N} \]

where Q is the average message power. This proves the following:

Theorem 22: The rate for a white noise source of power Q and band W₁ relative to an R.M.S. measure of fidelity is
\[ R = W_1 \log \frac{Q}{N} \]

where N is the allowed mean square error between original and recovered messages.

More generally with any message source we can obtain inequalities bounding the rate relative to a mean

square error criterion.

Theorem 23: The rate for any source of band W₁ is bounded by
\[ W_1 \log \frac{Q_1}{N} \le R \le W_1 \log \frac{Q}{N} \]
where Q is the average power of the source, Q₁ its entropy power and N the allowed mean square error.

The lower bound follows from the fact that the maximum of H_y(x) for a given mean square error (x − y)² = N occurs in the white noise case. The upper bound results if we place points (used in the proof of Theorem 21) not in the best way but at random in a sphere of radius √(Q − N).
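
As a small numerical illustration of Theorems 22 and 23 (the figures and function names below are mine, chosen only for illustration), taking the logarithms to base 2 so that the rates come out in bits per second:

```python
# Minimal sketch (illustrative values, not from the paper): rates from Theorems 22 and 23.
import math

def rate_white_noise(w1, q, n):
    """Theorem 22: R = W1 * log2(Q / N) bits per second for a white noise source."""
    return w1 * math.log2(q / n)

def rate_bounds(w1, q, q1, n):
    """Theorem 23: W1*log2(Q1/N) <= R <= W1*log2(Q/N) for any source of band W1."""
    return w1 * math.log2(q1 / n), w1 * math.log2(q / n)

# Example: a 5000 cycle band, average power 10, entropy power 6, allowed mean square error 1.
print(rate_white_noise(5000.0, 10.0, 1.0))   # about 16610 bits per second
print(rate_bounds(5000.0, 10.0, 6.0, 1.0))   # lower bound about 12925, upper about 16610
```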


ACKNOWLEDGMENTS

The writer is indebted to his colleagues at the Laboratories, particularly to Dr. H. W. Bode, Dr. J. R. Pierce, Dr. B. McMillan, and Dr. B. M. Oliver for many helpful suggestions and criticisms during the course of this

work. Credit should also be given to Professor N. Wiener, whose elegant solution of the problems of filtering

and prediction of stationary ensembles has considerably influenced the writer’s thinking in this field.

APPENDIX 5

Let S₁ be any measurable subset of the g ensemble, and S₂ the subset of the f ensemble which gives S₁ under the operation T. Then
\[ S_1 = T S_2 . \]

Let H^λ be the operator which shifts all functions in a set by the time λ. Then
\[ H^\lambda S_1 = H^\lambda T S_2 = T H^\lambda S_2 \]

since T is invariant and therefore commutes with H^λ. Hence if m[S] is the probability measure of the set S,
\[ m[H^\lambda S_1] = m[T H^\lambda S_2] = m[H^\lambda S_2] = m[S_2] = m[S_1] \]

where the second equality is by definition of measure in the g space, the third since the f ensemble is stationary, and the last by definition of g measure again.

To prove that the ergodic property is preserved under invariant operations, let S₁ be a subset of the g ensemble which is invariant under H^λ, and let S₂ be the set of all functions f which transform into S₁. Then
\[ H^\lambda S_1 = H^\lambda T S_2 = T H^\lambda S_2 = S_1 \]

so that H^λ S₂ is included in S₂ for all λ. Now, since

\[ m[H^\lambda S_2] = m[S_1] \]

this implies

\[ H^\lambda S_2 = S_2 \]
for all λ with m[S₂] ≠ 0, 1. This contradiction shows that S₁ does not exist.

APPENDIX 6

The upper bound, N₃ ≤ N₁ + N₂, is due to the fact that the maximum possible entropy for a power N₁ + N₂ occurs when we have a white noise of this power. In this case the entropy power is N₁ + N₂.

To obtain the lower bound, suppose we have two distributions in n dimensions p(xᵢ) and q(xᵢ) with entropy powers N₁ and N₂. What form should p and q have to minimize the entropy power N₃ of their convolution r(xᵢ):

\[ r(x_i) = \int p(y_i)\, q(x_i - y_i)\, dy_i . \]
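
The bound this appendix goes on to establish, that the entropy power N₃ of the convolution is at least N₁ + N₂, can be checked numerically for particular densities. The sketch below is my own illustration, not part of the paper; it assumes one-dimensional densities on a uniform grid and natural logarithms, with entropy power N = e^{2h} / 2πe.

```python
# Minimal numerical check (mine, not the paper's) of the lower bound of Appendix 6
# for two one-dimensional densities p and q on a uniform grid.
import numpy as np

def entropy_power(density, dx):
    density = density / (density.sum() * dx)                    # normalize to integrate to 1
    mask = density > 0
    h = -np.sum(density[mask] * np.log(density[mask])) * dx     # differential entropy in nats
    return np.exp(2.0 * h) / (2.0 * np.pi * np.e)               # power of white noise with entropy h

dx = 0.01
t = np.arange(-20.0, 20.0, dx)
p = np.exp(-t**2 / 2.0) / np.sqrt(2.0 * np.pi)      # unit-variance Gaussian density
q = np.where(np.abs(t) <= 1.5, 1.0 / 3.0, 0.0)      # uniform density on [-1.5, 1.5]

r = np.convolve(p, q, mode="same") * dx             # grid approximation of the convolution r = p * q

n1, n2, n3 = entropy_power(p, dx), entropy_power(q, dx), entropy_power(r, dx)
print(n1, n2, n3, n1 + n2 <= n3)                    # the lower bound N1 + N2 <= N3 holds
```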