Applied Probability by Paul E Pfeiffer


Chapter 14 Conditional Expectation, Regression

14.1 Conditional Expectation, Regression*

Conditional expectation, given a random vector, plays a fundamental role in much of modern probability theory. Various types of “conditioning” characterize some of the more important random sequences and processes. The notion of conditional independence is expressed in terms of conditional expectation. Conditional independence plays an essential role in the theory of Markov processes and in much of decision theory.

We first consider an elementary form of conditional expectation with respect to an event. Then we consider two highly intuitive special cases of conditional expectation, given a random variable. In examining these, we identify a fundamental property which provides the basis for a very general extension. We discover that conditional expectation is a random quantity. The basic property for conditional expectation and properties of ordinary expectation are used to obtain four fundamental properties which imply the “expectationlike” character of conditional expectation. An extension of the fundamental property leads directly to the solution of the regression problem which, in turn, gives an alternate interpretation of conditional expectation.

Conditioning by an event

If a conditioning event C occurs, we modify the original probabilities by introducing the conditional probability measure P(·|C). In making the change from

(14.1)
P(A) to P(A|C) = P(AC)/P(C)

we effectively do two things:

  • We limit the possible outcomes to event C

  • We “normalize” the probability mass by taking P(C) as the new unit

It seems reasonable to make a corresponding modification of mathematical expectation when the occurrence of event C is known. The expectation E[X] is the probability weighted average of the values taken on by X. Two possibilities for making the modification are suggested.

  • We could replace the prior probability measure P(·) with the conditional probability measure P(·|C) and take the weighted average with respect to these new weights.

  • We could continue to use the prior probability measure P(·) and modify the averaging process as follows:

  • Consider the values X(ω) for only those ω ∈ C. This may be done by using the random variable I_C X which has value X(ω) for ω ∈ C and zero elsewhere. The expectation E[I_C X] is the probability weighted sum of those values taken on in C.

    • The weighted average is obtained by dividing by P(C).

These two approaches are equivalent. For a simple random variable X = ∑_{i=1}^{n} t_i I_{A_i} in canonical form

(14.2)
(1/P(C)) E[I_C X] = (1/P(C)) ∑_{i=1}^{n} t_i P(C A_i) = ∑_{i=1}^{n} t_i P(A_i|C)

The final sum is expectation with respect to the conditional probability measure. Arguments using basic theorems on expectation and the approximation of general random variables by simple random variables allow an extension to a general random variable X. The notion of a conditional distribution, given C, and taking weighted averages with respect to the conditional probability is intuitive and natural in this case. However, this point of view is limited. In order to display a natural relationship with the more general concept of conditioning with respect to a random vector, we adopt the following

Definition. The conditional expectation of X, given event C with positive probability, is the quantity

(14.3)
E[X|C] = E[I_C X]/P(C)

Remark. The product form P(C) E[X|C] = E[I_C X] is often useful.
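The equivalence of the two averaging processes can be checked directly for a small simple random variable. The following Python sketch uses hypothetical values t_i and probabilities P(A_i C), and exact rational arithmetic, to compute the conditional expectation both ways.

```python
from fractions import Fraction as F

# Hypothetical simple random variable X = sum of t_i I_{A_i} in canonical form,
# with an event C overlapping some of the A_i (values chosen for illustration).
t = [F(1), F(2), F(3)]               # values t_i
pAC = [F(1, 8), F(1, 4), F(0)]       # P(A_i C), assumed
pC = sum(pAC)                        # P(C), since the A_i partition the space

# Approach 1: E[I_C X] / P(C)
e1 = sum(ti * pi for ti, pi in zip(t, pAC)) / pC
# Approach 2: weighted average using conditional probabilities P(A_i|C)
e2 = sum(ti * (pi / pC) for ti, pi in zip(t, pAC))
print(e1, e2)   # the two sums agree, as in (14.2)
```

The agreement holds for any choice of values and probabilities, since the two expressions differ only by where the division by P(C) is carried out.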

Example 14.1 A numerical example

Suppose X ~ exponential (λ) and C = {1/λ ≤ X ≤ 2/λ}. Now I_C = I_M(X) where M = [1/λ, 2/λ].

(14.4)
E[I_M(X) X] = ∫ I_M(t) t λe^{-λt} dt = ∫_{1/λ}^{2/λ} t λe^{-λt} dt = (1/λ)(2e^{-1} - 3e^{-2})
(14.5)
P(C) = P(X ∈ M) = e^{-1} - e^{-2}

Thus

(14.6)
E[X|C] = E[I_M(X) X]/P(C) = (2e^{-1} - 3e^{-2})/(λ(e^{-1} - e^{-2})) ≈ 1.418/λ
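The value can be cross-checked numerically. The following Python sketch uses a simple midpoint-rule integrator (with λ = 1 chosen for illustration) to compute E[I_C X] and P(C) and form the quotient.

```python
import math

lam = 1.0                                    # rate parameter, chosen for illustration
f = lambda t: lam * math.exp(-lam * t)       # exponential density

def integrate(g, a, b, n=10000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h

a, b = 1 / lam, 2 / lam                      # C = {1/lam <= X <= 2/lam}
EIX = integrate(lambda t: t * f(t), a, b)    # E[I_C X]
PC = integrate(f, a, b)                      # P(C) = e^{-1} - e^{-2}
EXC = EIX / PC                               # E[X|C]
print(round(EXC, 4))                         # 1.418
```

For general λ the answer scales as 1.418/λ, since the exponential distribution is a scale family.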

Conditioning by a random vector—discrete case

Suppose X = ∑_{i=1}^{n} t_i I_{A_i} and Y = ∑_{j=1}^{m} u_j I_{B_j} in canonical form. We suppose P(A_i) = P(X = t_i) > 0 and P(B_j) = P(Y = u_j) > 0, for each permissible i, j. Now

(14.7)
P(Y = u_j|X = t_i) = P(X = t_i, Y = u_j)/P(X = t_i)

We take the expectation relative to the conditional probability P(·|X = t_i) to get

(14.8)
e(t_i) = E[Y|X = t_i] = ∑_{j=1}^{m} u_j P(Y = u_j|X = t_i)

Since we have a value for each t_i in the range of X, the function e(·) is defined on the range of X. Now consider any reasonable set M on the real line and determine the expectation

(14.9)
E[I_M(X) e(X)] = ∑_{i=1}^{n} I_M(t_i) e(t_i) P(X = t_i)
(14.10)
= ∑_{i=1}^{n} I_M(t_i) ∑_{j=1}^{m} u_j P(X = t_i, Y = u_j)
(14.11)
= ∑_{j=1}^{m} u_j ∑_{i=1}^{n} I_M(t_i) P(X = t_i, Y = u_j) = E[I_M(X) Y]

We have the pattern

(14.12)
E[I_M(X) e(X)] = E[I_M(X) Y], where e(t_i) = E[Y|X = t_i]

for all t_i in the range of X.
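The pattern can be verified numerically for any small discrete joint distribution. Here is a Python sketch with hypothetical values, using exact rational arithmetic so that both sides match exactly.

```python
from fractions import Fraction as F

# Hypothetical joint distribution P(X = t_i, Y = u_j) for a small discrete pair
t = [F(0), F(1)]
u = [F(-1), F(1)]
P = {(F(0), F(-1)): F(1, 4), (F(0), F(1)): F(1, 4),
     (F(1), F(-1)): F(1, 8), (F(1), F(1)): F(3, 8)}

PX = {ti: sum(P[(ti, uj)] for uj in u) for ti in t}                 # marginal of X
e = {ti: sum(uj * P[(ti, uj)] for uj in u) / PX[ti] for ti in t}    # e(t_i) = E[Y|X=t_i]

M = {F(1)}                                                          # any reasonable set M
lhs = sum(e[ti] * PX[ti] for ti in t if ti in M)                    # E[I_M(X) e(X)]
rhs = sum(uj * P[(ti, uj)] for ti in t if ti in M for uj in u)      # E[I_M(X) Y]
print(lhs == rhs)   # True
```

Trying other choices of M (including the whole range of X) gives the same equality, which is what makes the pattern a candidate for a general definition.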

We return to examine this property later. But first, consider an example to display the nature of the concept.

Example 14.2 Basic calculations and interpretation

Suppose the pair {X, Y} has the joint distribution

(14.13)
P(X = t_i, Y = u_j), tabulated in Table 14.1.

Table 14.1.
X =        0       1       4       9
Y = 2      0.05    0.04    0.21    0.15
Y = 0      0.05    0.01    0.09    0.10
Y = -1     0.10    0.05    0.10    0.05
P(X = t)   0.20    0.10    0.40    0.30

Calculate E[Y|X = t_i] for each possible value t_i taken on by X

  • E[Y|X=0]=(–1·0.10+0·0.05+2·0.05)/0.20=0

  • E[Y|X=1]=(–1·0.05+0·0.01+2·0.04)/0.10=0.30

  • E[Y|X=4]=(–1·0.10+0·0.09+2·0.21)/0.40=0.80

  • E[Y|X=9]=(–1·0.05+0·0.10+2·0.15)/0.30≈0.83

The pattern of operation in each case can be described as follows:

  • For the ith column, multiply each value u_j by P(X = t_i, Y = u_j), sum, then divide by P(X = t_i).

The following interpretation helps visualize the conditional expectation and points to an important result in the general case.

  • For each t_i we use the mass distributed “above” it. This mass is distributed along a vertical line at values u_j taken on by Y. The result of the computation is to determine the center of mass for the conditional distribution above t = t_i. As in the case of ordinary expectations, this should be the best estimate, in the mean-square sense, of Y when X = t_i. We examine that possibility in the treatment of the regression problem in the section called “The regression problem”.

Although the calculations are not difficult for a problem of this size, the basic pattern can be implemented simply with MATLAB, making the handling of much larger problems quite easy. This is particularly useful in dealing with the simple approximation to an absolutely continuous pair.

X = [0 1 4 9];             % Data for the joint distribution
Y = [-1 0 2];
P = 0.01*[ 5  4 21 15; 5  1  9 10; 10  5 10  5];
jcalc                      % Setup for calculations
Enter JOINT PROBABILITIES (as on the plane)  P
Enter row matrix of VALUES of X  X
Enter row matrix of VALUES of Y  Y
 Use array operations on matrices X, Y, PX, PY, t, u, and P
EYX = sum(u.*P)./sum(P);   % sum(P) = PX  (operation sum yields column sums)
disp([X;EYX]')             % u.*P = u_j P(X = t_i, Y = u_j) for all i, j
         0         0
    1.0000    0.3000
    4.0000    0.8000
    9.0000    0.8333
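For readers without MATLAB or the jcalc setup, the same column-sum computation can be sketched in plain Python; the data are those of Table 14.1, with rows ordered to match Y = [-1, 0, 2].

```python
# Joint distribution from Table 14.1; P[j][i] = P(X = X[i], Y = Y[j])
X = [0, 1, 4, 9]
Y = [-1, 0, 2]
P = [[0.10, 0.05, 0.10, 0.05],    # P(X = t_i, Y = -1)
     [0.05, 0.01, 0.09, 0.10],    # P(X = t_i, Y =  0)
     [0.05, 0.04, 0.21, 0.15]]    # P(X = t_i, Y =  2)

PX = [sum(row[i] for row in P) for i in range(len(X))]   # column sums = P(X = t_i)
EYX = [sum(Y[j] * P[j][i] for j in range(len(Y))) / PX[i]
       for i in range(len(X))]                           # E[Y|X = t_i]
for ti, ei in zip(X, EYX):
    print(ti, round(ei, 4))      # 0 0.0 / 1 0.3 / 4 0.8 / 9 0.8333, as above
```

The structure mirrors the MATLAB line EYX = sum(u.*P)./sum(P): a probability-weighted column sum divided by the marginal probability of that column.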

The calculations extend to E[g(X, Y)|X = t_i]. Instead of values of u_j we use values of g(t_i, u_j) in the calculations. Suppose Z = g(X, Y) = Y^2 - 2XY.

G = u.^2 - 2*t.*u;         % Z = g(X,Y) = Y^2 - 2XY
EZX = sum(G.*P)./sum(P);   % E[Z|X=x]
disp([X;EZX]')
         0    1.5000
    1.0000    1.5000
    4.0000   -4.0500
    9.0000  -12.8333
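The extension to a function of both variables can be sketched the same way in Python: replace u_j by g(t_i, u_j) before forming the weighted column sums.

```python
# Same joint distribution as Table 14.1; now compute E[Z|X=t] for Z = Y^2 - 2XY
X = [0, 1, 4, 9]
Y = [-1, 0, 2]
P = [[0.10, 0.05, 0.10, 0.05],    # P(X = t_i, Y = -1)
     [0.05, 0.01, 0.09, 0.10],    # P(X = t_i, Y =  0)
     [0.05, 0.04, 0.21, 0.15]]    # P(X = t_i, Y =  2)

g = lambda t, u: u**2 - 2 * t * u
PX = [sum(row[i] for row in P) for i in range(len(X))]
EZX = [sum(g(X[i], Y[j]) * P[j][i] for j in range(len(Y))) / PX[i]
       for i in range(len(X))]    # E[Z|X = t_i]
for ti, ei in zip(X, EZX):
    print(ti, round(ei, 4))      # 0 1.5 / 1 1.5 / 4 -4.05 / 9 -12.8333
```

This is the Python analogue of the MATLAB lines G = u.^2 - 2*t.*u and EZX = sum(G.*P)./sum(P).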

Conditioning by a random vector — absolutely continuous case

Suppose the pair {X, Y} has joint density function f_{XY}. We seek to use the concept of a conditional distribution, given X = t. The fact that P(X = t) = 0 for each t requires a modification of the approach adopted in the discrete case. Intuitively, we consider the conditional density

(14.14)
f_{Y|X}(u|t) = f_{XY}(t, u)/f_X(t), for f_X(t) > 0

The condition f_X(t) > 0 effectively determines the range of X. The function f_{Y|X}(·|t) has the properties of a density for each fixed t for which f_X(t) > 0.

(14.15)
f_{Y|X}(u|t) ≥ 0, with ∫ f_{Y|X}(u|t) du = (1/f_X(t)) ∫ f_{XY}(t, u) du = f_X(t)/f_X(t) = 1

We define, in this case,

(14.16)
e(t) = E[Y|X = t] = ∫ u f_{Y|X}(u|t) du

The function e(·) is defined for f_X(t) > 0, hence effectively on the range of X. For any reasonable set M on the real line,

(14.17)
E[I_M(X) e(X)] = ∫ I_M(t) e(t) f_X(t) dt = ∫_M [∫ u f_{Y|X}(u|t) du] f_X(t) dt
(14.18)
= ∫_M ∫ u f_{XY}(t, u) du dt = E[I_M(X) Y]

Thus we have, as in the discrete case, for each t in the range of X,

(14.19)
E[I_M(X) e(X)] = E[I_M(X) Y], where e(t) = E[Y|X = t]

Again, we postpone examination of this pattern until we consider a more general case.

Example 14.3 Basic calculation and interpretation

Suppose the pair {X, Y} has joint density f_{XY}(t, u) = 6(1 - u) on the triangular region bounded by t = 0, u = 1, and u = t (see Figure 14.1). Then

(14.20)
f_X(t) = ∫ f_{XY}(t, u) du = ∫_t^1 6(1 - u) du = 3(1 - t)^2, 0 ≤ t ≤ 1

By definition, then,

(14.21)
f_{Y|X}(u|t) = f_{XY}(t, u)/f_X(t) = 2(1 - u)/(1 - t)^2, t ≤ u ≤ 1

We thus have

(14.22)
E[Y|X = t] = ∫ u f_{Y|X}(u|t) du = (2/(1 - t)^2) ∫_t^1 u(1 - u) du = (1 + 2t)/3, 0 ≤ t < 1

Theoretically, we must rule out t=1 since the denominator is zero for that value of t. This causes no problem in practice.
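A numerical cross-check is straightforward. Assuming the joint density on the triangle is f_XY(t, u) = 6(1 - u) for t ≤ u ≤ 1 (an assumption consistent with the stated region and with the vanishing denominator at t = 1), a midpoint-rule sketch in Python recovers the closed form (1 + 2t)/3 for the conditional expectation.

```python
# Numerical check, assuming f_XY(t, u) = 6(1 - u) on the triangle 0 <= t <= u <= 1
def e(t, n=10000):
    """Approximate E[Y|X=t] = (int_t^1 u f(t,u) du) / (int_t^1 f(t,u) du)."""
    h = (1 - t) / n
    num = den = 0.0
    for k in range(n):
        u = t + (k + 0.5) * h          # midpoint rule over [t, 1]
        num += u * 6 * (1 - u) * h     # numerator integrand u f(t, u)
        den += 6 * (1 - u) * h         # denominator integrand f(t, u) -> f_X(t)
    return num / den

for t in [0.0, 0.25, 0.5, 0.75]:
    print(t, round(e(t), 4), round((1 + 2 * t) / 3, 4))
```

The two printed columns agree for every t strictly less than 1, which is the practical sense in which the excluded point t = 1 causes no difficulty.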

Figure 14.1
The density function for Example 14.3.

We are able to make an interpretation quite analogous to that for the discrete case. This also points the way to practical MATLAB calculations.

  • For any