
letter frequencies, but nothing else. The maximum entropy source is then the first approximation to English

and its entropy determines the required channel capacity.

As a simple example of some of these results consider a source which produces a sequence of letters chosen from among A, B, C, D with probabilities 1/2, 1/4, 1/8, 1/8, successive symbols being chosen independently. We have

$$H = -\left(\tfrac{1}{2}\log\tfrac{1}{2} + \tfrac{1}{4}\log\tfrac{1}{4} + \tfrac{2}{8}\log\tfrac{1}{8}\right) = \tfrac{7}{4} \text{ bits per symbol.}$$

Thus we can approximate a coding system to encode messages from this source into binary digits with an average of 7/4 binary digits per symbol. In this case we can actually achieve the limiting value by the following code (obtained by the method of the second proof of Theorem 9):

A   0
B   10
C   110
D   111

The average number of binary digits used in encoding a sequence of N symbols will be

$$N\left(\tfrac{1}{2}\cdot 1 + \tfrac{1}{4}\cdot 2 + \tfrac{2}{8}\cdot 3\right) = \tfrac{7}{4}N.$$
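As a quick check of this arithmetic, the short Python sketch below computes the source entropy and the expected length of the code above and confirms that both come to 7/4 bits per symbol. The probability list and code table are taken directly from the example; the variable names are only illustrative.

```python
import math

# Source probabilities and the code from the example above.
probs = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}
code = {"A": "0", "B": "10", "C": "110", "D": "111"}

# Entropy H = -sum p log2 p.
H = -sum(p * math.log2(p) for p in probs.values())

# Expected code length per symbol.
avg_len = sum(probs[s] * len(code[s]) for s in probs)

print(H, avg_len)   # both print 1.75, i.e. 7/4 bits per symbol
```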

It is easily seen that the binary digits 0, 1 have probabilities 1/2, 1/2, so the H for the coded sequences is one bit per symbol. Since, on the average, we have 7/4 binary symbols per original letter, the entropies on a time basis are the same. The maximum possible entropy for the original set is log 4 = 2, occurring when A, B, C, D have probabilities 1/4, 1/4, 1/4, 1/4. Hence the relative entropy is 7/8. We can translate the binary sequences into the original set of symbols on a two-to-one basis by the following table:

00   A′
01   B′
10   C′
11   D′


This double process then encodes the original message into the same symbols but with an average compression ratio 7/8.
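To make the double process concrete, the following Python sketch (my own illustration, not part of the original text) encodes a long random message from the source into binary digits with the code above, translates pairs of binary digits into the primed symbols, and measures the resulting compression ratio, which comes out close to 7/8.

```python
import random

random.seed(0)
probs = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}
code = {"A": "0", "B": "10", "C": "110", "D": "111"}

# A long message drawn from the source.
message = random.choices(list(probs), weights=list(probs.values()), k=100_000)

# First stage: letters -> binary digits.
bits = "".join(code[s] for s in message)
if len(bits) % 2:            # pad so the binary digits pair up evenly
    bits += "0"

# Second stage: pairs of binary digits -> primed symbols.
pairs = {"00": "A'", "01": "B'", "10": "C'", "11": "D'"}
recoded = [pairs[bits[i:i+2]] for i in range(0, len(bits), 2)]

print(len(recoded) / len(message))   # approximately 7/8 = 0.875
```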

As a second example consider a source which produces a sequence of A’s and B’s with probability p for A and q for B. If p ≪ q we have

$$H = -\log p^{p}(1-p)^{1-p} = -p\log\bigl[p(1-p)^{(1-p)/p}\bigr] \doteq p\log\frac{e}{p}.$$
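A small numerical check of this approximation may help: the Python sketch below (my own illustration, not from the paper) compares the exact binary entropy with p·log2(e/p) for a few small values of p.

```python
import math

def H(p):
    """Exact entropy, in bits, of a binary source with probabilities p and 1 - p."""
    q = 1 - p
    return -p * math.log2(p) - q * math.log2(q)

for p in (0.1, 0.01, 0.001):
    approx = p * math.log2(math.e / p)
    print(f"p={p}: H={H(p):.5f}  p*log2(e/p)={approx:.5f}")
```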

In such a case one can construct a fairly good coding of the message on a 0, 1 channel by sending a special

sequence, say 0000, for the infrequent symbol A and then a sequence indicating the number of B’s following it. This could be indicated by the binary representation with all numbers containing the special sequence

deleted. All numbers up to 16 are represented as usual; 16 is represented by the next binary number after 16

which does not contain four zeros, namely 17 = 10001, etc.

It can be shown that as p → 0 the coding approaches ideal provided the length of the special sequence is properly adjusted.
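The count representation just described can be sketched in a few lines of Python: a count n is mapped to the binary form of the n-th integer whose representation contains no run of four zeros, so that numbers such as 16 = 10000 are skipped over. The function name is illustrative only, and this sketch covers just the count mapping, not the framing of a full message.

```python
from itertools import count, islice

def count_codeword(n):
    """Binary representation of the n-th integer (counting from 0) whose
    binary form contains no '0000'; integers such as 16 = 10000 are skipped."""
    valid = (k for k in count() if "0000" not in format(k, "b"))
    return format(next(islice(valid, n, None)), "b")

# Counts up to 15 are represented as usual; representing 16 skips over
# 10000 to the next admissible number, 17 = 10001.
print(count_codeword(15))   # '1111'
print(count_codeword(16))   # '10001'
```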

PART II: THE DISCRETE CHANNEL WITH NOISE

11. REPRESENTATION OF A NOISY DISCRETE CHANNEL

We now consider the case where the signal is perturbed by noise during transmission or at one or the other

of the terminals. This means that the received signal is not necessarily the same as that sent out by the

transmitter. Two cases may be distinguished. If a particular transmitted signal always produces the same

received signal, i.e., the received signal is a definite function of the transmitted signal, then the effect may be called distortion. If this function has an inverse — no two transmitted signals producing the same received

signal — distortion may be corrected, at least in principle, by merely performing the inverse functional

operation on the received signal.

The case of interest here is that in which the signal does not always undergo the same change in trans-

mission. In this case we may assume the received signal E to be a function of the transmitted signal S and a second variable, the noise N.

$$E = f(S, N)$$

The noise is considered to be a chance variable just as the message was above. In general it may be repre-

sented by a suitable stochastic process. The most general type of noisy discrete channel we shall consider

is a generalization of the finite state noise-free channel described previously. We assume a finite number of

states and a set of probabilities

p_{α,i}(β, j).

This is the probability, if the channel is in state α and symbol i is transmitted, that symbol j will be received and the channel left in state β. Thus α and β range over the possible states, i over the possible transmitted signals and j over the possible received signals. In the case where successive symbols are independently perturbed by the noise there is only one state, and the channel is described by the set of transition probabilities p_i(j), the probability of transmitted symbol i being received as j.
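For the memoryless case just mentioned, the channel is completely described by its matrix of transition probabilities p_i(j). The Python sketch below, which is only an illustration and not part of the paper, represents a binary channel this way and simulates its effect on a transmitted sequence.

```python
import random

random.seed(1)

# Transition probabilities p_i(j) for a memoryless binary channel:
# outer key i is the transmitted symbol, inner key j the received symbol.
p = {0: {0: 0.99, 1: 0.01},
     1: {0: 0.01, 1: 0.99}}

def transmit(symbols):
    """Pass a sequence of symbols through the channel, perturbing each
    one independently according to p_i(j)."""
    return [random.choices([0, 1], weights=[p[i][0], p[i][1]])[0]
            for i in symbols]

sent = [random.randint(0, 1) for _ in range(10)]
received = transmit(sent)
print(sent)
print(received)
```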

If a noisy channel is fed by a source there are two statistical processes at work: the source and the noise.

Thus there are a number of entropies that can be calculated. First there is the entropy H(x) of the source or of the input to the channel (these will be equal if the transmitter is non-singular). The entropy of the output of the channel, i.e., the received signal, will be denoted by H(y). In the noiseless case H(y) = H(x). The joint entropy of input and output will be H(x, y). Finally there are two conditional entropies H_x(y) and H_y(x), the entropy of the output when the input is known and conversely. Among these quantities we have the relations

$$H(x, y) = H(x) + H_x(y) = H(y) + H_y(x).$$

All of these entropies can be measured on a per-second or a per-symbol basis.
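As a numerical illustration of these relations (my own, not from the paper), the Python sketch below builds the joint distribution for a binary input passed through the noisy channel of the previous sketch, computes the entropies on a per-symbol basis, and checks the identity above.

```python
import math

def H(ps):
    """Entropy in bits of a collection of probabilities (zeros skipped)."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

# Input distribution and channel transition probabilities p_i(j).
px = {0: 0.5, 1: 0.5}
p = {0: {0: 0.99, 1: 0.01}, 1: {0: 0.01, 1: 0.99}}

# Joint distribution P(x, y) and output distribution P(y).
joint = {(x, y): px[x] * p[x][y] for x in px for y in (0, 1)}
py = {y: sum(joint[x, y] for x in px) for y in (0, 1)}

Hx, Hy, Hxy = H(px.values()), H(py.values()), H(joint.values())

# Conditional entropies from their definitions:
# H_x(y): entropy of the output when the input is known.
Hx_of_y = sum(px[x] * H(p[x].values()) for x in px)
# H_y(x): entropy of the input when the output is known.
Hy_of_x = sum(py[y] * H([joint[x, y] / py[y] for x in px]) for y in (0, 1))

print(Hxy, Hx + Hx_of_y, Hy + Hy_of_x)   # all three values agree
```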


12. EQUIVOCATION AND CHANNEL CAPACITY

If the channel is noisy it is not in general possible to reconstruct the original message or the transmitted

signal with certainty by any operation on the received signal E. There are, however, ways of transmitting the information which are optimal in combating noise. This is the problem which we now consider.

Suppose there are two possible symbols 0 and 1, and we are transmitting at a rate of 1000 symbols per second with probabilities p_0 = p_1 = 1/2. Thus our source is producing information at the rate of 1000 bits

per second. During transmission the noise introduces errors so that, on the average, 1 in 100 is received

incorrectly (a 0 as 1, or 1 as 0). What is the rate of transmission of information? Certainly less than 1000

bits per second since about 1% of the received symbols are incorrect. Our first impulse might be to say

the rate is 990 bits per second, merely subtracting the expected number of errors. This is not satisfactory

since it fails to take into account the recipient’s lack of knowledge of where the errors occur. We may carry

it to an extreme case and suppose the noise so great that the received symbols are entirely independent of

the transmitted symbols. The probability of receiving 1 is 1/2 whatever was transmitted and similarly for 0.

Then about half of the received symbols are correct due to chance alone, and we would be giving the system

credit for transmitting 500 bits per second while actually no information is being transmitted at all. Equally

“good” transmission would be obtained by dispensing with the channel entirely and flipping a coin at the

receiving point.

Evidently the proper correction to apply to the amount of information transmitted is the amount of this

information which is missing in the received signal, or alternatively the uncertainty when we have received

a signal of what was actually sent. From our previous discussion of entropy as a measure of uncertainty it

seems reasonable to use the conditional entropy of the message, knowing the received signal, as a measure

of this missing information. This is indeed the proper definition, as we shall see later. Following this idea

the rate of actual transmission, R, would be obtained by subtracting from the rate of production (i.e., the entropy of the source) the average rate of conditional entropy.

$$R = H(x) - H_y(x)$$

The conditional entropy H_y(x) will, for convenience, be called the equivocation. It measures the average ambiguity of the received signal.

In the example considered above, if a 0 is received the a posteriori probability that a 0 was transmitted is .99, and that a 1 was transmitted is .01. These figures are reversed if a 1 is received. Hence

$$H_y(x) = -[0.99\log 0.99 + 0.01\log 0.01] = 0.081 \text{ bits/symbol}$$

or 81 bits per second. We may say that the system is transmitting at a rate 1000 − 81 = 919 bits per second.

In the extreme case where a 0 is equally likely to be received as a 0 or 1 and similarly for 1, the a posteriori probabilities are 1/2, 1/2 and

$$H_y(x) = -\left[\tfrac{1}{2}\log\tfrac{1}{2} + \tfrac{1}{2}\log\tfrac{1}{2}\right] = 1 \text{ bit per symbol}$$

or 1000 bits per second. The rate of transmission is then 0 as it should be.
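The two equivocation figures above are easy to reproduce; the short Python sketch below (an illustration, not part of the original) computes H_y(x) from the a posteriori probabilities for error rates of 1% and 50% and the corresponding transmission rates at 1000 symbols per second.

```python
import math

def equivocation(p_err):
    """H_y(x) in bits per symbol for a binary channel in which the a posteriori
    probability of the transmitted symbol is 1 - p_err."""
    q = 1 - p_err
    return -(q * math.log2(q) + p_err * math.log2(p_err))

for p_err in (0.01, 0.5):
    Hyx = equivocation(p_err)
    rate = 1000 * (1 - Hyx)          # the source entropy is 1 bit per symbol
    print(f"error prob {p_err}: equivocation {Hyx:.3f} bits/symbol, "
          f"rate {rate:.0f} bits/second")
# prints roughly 0.081 and 919 for 1% errors, 1.000 and 0 for 50% errors
```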

The following theorem gives a direct intuitive interpretation of the equivocation and also serves to justify

it as the unique appropriate measure. We consider a communication system and an observer (or auxiliary

device) who can see both what is sent and what is recovered (with errors due to noise). This observer notes

the errors in the recovered message and transmits data to the receiving point over a “correction channel” to

enable the receiver to correct the errors. The situation is indicated schematically in Fig. 8.

Theorem 10: If the correction channel has a capacity equal to H_y(x) it is possible to so encode the correction data as to send it over this channel and correct all but an arbitrarily small fraction of the errors. This is not possible if the channel capacity is less than H_y(x).


[Fig. 8: Schematic diagram of a correction system. The source feeds a transmitter and receiver as usual; an observer compares the original message M with the recovered message M′ and sends correction data over a separate channel to a correcting device at the receiving point, which outputs the corrected message M.]

Roughly then, H_y(x) is the amount of additional information that must be supplied per second at the receiving point to correct the received message.

To prove the first part, consider long sequences of received message M′ and corresponding original message M. There will be logarithmically T H_y(x) of the M’s which could reasonably have produced each M′. Thus we have T H_y(x) binary digits to send each T seconds. This can be done with ε frequency of errors on a channel of capacity H_y(x).

The second part can be proved by noting, first, that for any discrete chance variables x, y, z,

$$H_y(x, z) \ge H_y(x).$$

The left-hand side can be expanded to give

$$H_y(z) + H_{yz}(x) \ge H_y(x)$$
$$H_{yz}(x) \ge H_y(x) - H_y(z) \ge H_y(x) - H(z).$$

If we identify x as the output of the source, y as the received signal and z as the signal sent over the correction channel, then the right-hand side is the equivocation less the rate of transmission over the correction channel.

If the capacity of this channel is less than the equivocation the right-hand side will be greater than zero and H_{yz}(x) > 0. But this is the uncertainty of what was sent, knowing both the received signal and the correction

signal. If this is greater than zero the frequency of errors cannot be arbitrarily small.

Example:

Suppose the errors occur at random in a sequence of binary digits: probability p that a digit is wrong and q = 1 − p that it is right. These errors can be corrected if their position is known. Thus the

correction channel need only send information as to these positions. This amounts to transmitting

from a source which produces binary digits with probability p for 1 (incorrect) and q for 0 (correct).

This requires a channel of capacity

$$-[p\log p + q\log q]$$

which is the equivocation of the original system.

The rate of transmission R can be written in two other forms due to the identities noted above. We have

$$\begin{aligned}
R &= H(x) - H_y(x) \\
  &= H(y) - H_x(y) \\
  &= H(x) + H(y) - H(x, y).
\end{aligned}$$


The first defining expression has already been interpreted as the amount of information sent less the uncer-

tainty of what was sent. The second measures the amount received less the part of this which is due to noise.

The third is the sum of the two amounts less the joint entropy and therefore in a sense is the number of bits

per second common to the two. Thus all three expressions have a certain intuitive significance.
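Continuing the numerical illustration used earlier (again my own sketch, not the paper's), the following Python snippet evaluates the three expressions for R on the same binary-input, binary-output joint distribution and confirms that they agree.

```python
import math

def H(ps):
    """Entropy in bits of a collection of probabilities (zeros skipped)."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

# Same joint distribution as before: equiprobable input, 1% crossover.
px = {0: 0.5, 1: 0.5}
p = {0: {0: 0.99, 1: 0.01}, 1: {0: 0.01, 1: 0.99}}
joint = {(x, y): px[x] * p[x][y] for x in px for y in (0, 1)}
py = {y: sum(joint[x, y] for x in px) for y in (0, 1)}

Hx, Hy, Hxy = H(px.values()), H(py.values()), H(joint.values())
Hy_of_x = Hxy - Hy            # equivocation H_y(x)
Hx_of_y = Hxy - Hx            # noise entropy H_x(y)

R1 = Hx - Hy_of_x
R2 = Hy - Hx_of_y
R3 = Hx + Hy - Hxy
print(R1, R2, R3)             # all three are about 0.919 bits per symbol
```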

The capacity C of a noisy channel should be the maximum possible rate of transmission, i.e., the rate when the source is properly matched to the channel. We therefore define the channel capacity by

$$C = \mathrm{Max}\,\bigl(H(x) - H_y(x)\bigr)$$

where the maximum is with respect to all possible information sources used as input to the channel. If the channel is noiseless, H_y(x) = 0. The definition is then equivalent to that already given for a noiseless channel since the maximum entropy for the channel is its capacity.
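To illustrate the maximization over input sources, the Python sketch below (an illustration under my own assumptions, not a procedure given in the paper) searches over input probabilities for a binary channel with 1% crossover and reports the largest value of H(x) − H_y(x); for this symmetric channel the maximum is attained at an equiprobable input and equals about 0.919 bits per symbol.

```python
import math

def H(ps):
    return -sum(p * math.log2(p) for p in ps if p > 0)

def rate(a, p_err):
    """H(x) - H_y(x) for input probabilities (a, 1 - a) on a binary channel
    with crossover probability p_err."""
    p = {0: {0: 1 - p_err, 1: p_err}, 1: {0: p_err, 1: 1 - p_err}}
    joint = {(x, y): px * p[x][y]
             for x, px in ((0, a), (1, 1 - a)) for y in (0, 1)}
    py = {y: joint[0, y] + joint[1, y] for y in (0, 1)}
    Hy_of_x = H(joint.values()) - H(py.values())   # equivocation
    return H((a, 1 - a)) - Hy_of_x

# Crude grid search over input distributions.
best = max((rate(a / 1000, 0.01), a / 1000) for a in range(1, 1000))
print(best)   # about (0.919, 0.5)
```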

13. THE FUNDAMENTAL THEOREM FOR A DISCRETE CHANNEL WITH NOISE

It may seem surprising that we should define a definite capacity C for a noisy channel since we can never send certain information in such a case. It is clear, however, that by sending the information in a redundant

form the probability of errors can be reduced. For example, by repeating the message many times and by a

statistical study of the different received versions of the message the probability of errors could be made very small. One would expect, however, that to make this probability of errors approach zero, the redundancy

of the encoding must increase indefinitely, and the rate of transmission therefore approach zero. This is by

no means true. If it were, there would not be a very well defined capacity, but only a capacity for a given

frequency of errors, or a given equivocation; the capacity going down as the error requirements are made

more stringent. Actually the capacity C defined above has a very definite significance. It is possible to send information at the rate C through the channel with as small a frequency of errors or equivocation as desired by proper encoding. This statement is not true for any rate greater than C. If an attempt is made to transmit at a higher rate than C, say C + R_1, then there will necessarily be an equivocation equal to or greater than the excess R_1. Nature takes payment by requiring just that much uncertainty, so that we are not actually getting any more than C through correctly.

The situation is indicated in Fig. 9. The rate of information into the channel is plotted horizontally and

the equivocation vertically. Any point above the heavy line in the shaded region can be attained and those

below cannot. The points on the line cannot in general be attained, but there will usually be two points on

the line that can.

These results are the main justification for the definition of C and will now be proved.

Theorem 11: Let a discrete channel have the capacity C and a discrete source the entropy per second H.

If H ≤ C there exists a coding system such that the output of the source can be transmitted over the channel with an arbitrarily small frequency of errors (or an arbitrarily small equivocation). If H > C it is possible to encode the source so that the equivocation is less than H − C + ε, where ε is arbitrarily small. There is no method of encoding which gives an equivocation less than H − C.

The method of proving the first part of this theorem is not by exhibiting a coding method having the

desired properties, but by showing that such a code must exist in a certain group of codes. In fact we will

[Fig. 9: The equivocation possible for a given input entropy to a channel. H(x) is plotted horizontally and H_y(x) vertically; the attainable region lies on and above a line of slope 1.0 rising from the point H(x) = C.]


average the frequency of errors over this group and show that this average can be made less than ε. If the average of a set of numbers is less than ε there must exist at least one in the set which is less than ε. This will establish the desired result.

The capacity C of a noisy channel has been defined as

$$C = \mathrm{Max}\,\bigl(H(x) - H_y(x)\bigr)$$

where x is the input and y the output. The maximization is over all sources which might be used as input to the channel.

Let S_0 be a source which achieves the maximum capacity C. If this maximum is not actually achieved by any source let S_0 be a source which approximates to giving the maximum rate. Suppose S_0 is used as input to the channel. We consider the possible transmitted and received sequences of a long duration T. The following will be true:

1. The transmitted sequences fall into two classes, a high probability gr