Conditional expectation, given a random vector, plays a fundamental role in much of modern probability theory. Various types of “conditioning” characterize some of the more important random sequences and processes. The notion of conditional independence is expressed in terms of conditional expectation. Conditional independence plays an essential role in the theory of Markov processes and in much of decision theory.
We first consider an elementary form of conditional expectation with respect to an event. Then we consider two highly intuitive special cases of conditional expectation, given a random variable. In examining these, we identify a fundamental property which provides the basis for a very general extension. We discover that conditional expectation is a random quantity. The basic property for conditional expectation and properties of ordinary expectation are used to obtain four fundamental properties which imply the "expectation-like" character of conditional expectation. An extension of the fundamental property leads directly to the solution of the regression problem, which in turn gives an alternate interpretation of conditional expectation.
If a conditioning event C occurs, we modify the original probabilities by introducing the conditional probability measure $P(\cdot|C)$. In making the change from the prior probability $P(A)$ to the conditional probability

$$P(A|C) = \frac{P(AC)}{P(C)}$$

we effectively do two things:
We limit the possible outcomes to event C
We “normalize” the probability mass by taking P(C) as the new unit
It seems reasonable to make a corresponding modification of mathematical expectation when the occurrence of event C is known. The expectation E[X] is the probability weighted average of the values taken on by X. Two possibilities for making the modification are suggested.
We could replace the prior probability measure P(·) with the conditional probability measure P(·|C) and take the weighted average with respect to these new weights.
We could continue to use the prior probability measure P(·) and modify the averaging process as follows:
Consider the values $X(\omega)$ for only those $\omega \in C$. This may be done by using the random variable $I_C X$, which has value $X(\omega)$ for $\omega \in C$ and zero elsewhere. The expectation is the probability weighted sum of those values taken on in C.
The weighted average is obtained by dividing by P(C).
These two approaches are equivalent. For a simple random variable $X = \sum_{i=1}^{n} t_i I_{A_i}$ in canonical form,

$$\frac{E[I_C X]}{P(C)} = \frac{1}{P(C)} \sum_{i=1}^{n} t_i P(A_i C) = \sum_{i=1}^{n} t_i P(A_i|C)$$
The final sum is expectation with respect to the conditional probability measure. Arguments using basic theorems on expectation and the approximation of general random variables by simple random variables allow an extension to a general random variable X. The notion of a conditional distribution, given C, and taking weighted averages with respect to the conditional probability is intuitive and natural in this case. However, this point of view is limited. In order to display a natural relationship with the more general concept of conditioning with respect to a random vector, we adopt the following
Definition. The conditional expectation of X, given event C with positive probability, is the quantity

$$E[X|C] = \frac{E[I_C X]}{P(C)}$$
Remark. The product form $E[I_C X] = P(C)\, E[X|C]$ is often useful.
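The definition is easy to check numerically. The following MATLAB fragment (a minimal sketch, with hypothetical values and probabilities chosen only for illustration) computes $E[X|C]$ both ways and confirms that the two approaches agree.

x  = [1 3 5 7];                % hypothetical values t_i of a simple random variable
p  = [0.2 0.3 0.4 0.1];        % hypothetical probabilities P(A_i) = P(X = t_i)
IC = [0 1 1 0];                % indicator of the event C = {X = 3 or X = 5}
PC = sum(IC.*p);               % P(C) = 0.7
EXC1 = sum(IC.*x.*p)/PC        % E[I_C X]/P(C)                  = 4.1429
EXC2 = sum(x.*(IC.*p)/PC)      % weighted average w.r.t. P(.|C) = 4.1429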
Suppose $X \sim$ exponential $(\lambda)$ and $C = \{1/\lambda \le X \le 2/\lambda\}$. Now $I_C = I_M(X)$, where $M = [1/\lambda,\, 2/\lambda]$.
Thus, since $P(C) = e^{-1} - e^{-2}$,

$$E[X|C] = \frac{1}{P(C)} \int_{1/\lambda}^{2/\lambda} t\, \lambda e^{-\lambda t}\, dt = \frac{2e^{-1} - 3e^{-2}}{\lambda\,(e^{-1} - e^{-2})} \approx \frac{1.4180}{\lambda}$$
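A quick numerical check of this result (a sketch, with an arbitrary choice of rate) uses MATLAB's built-in numerical integrator:

lambda = 2;                            % arbitrary positive rate, for illustration
f = @(t) lambda*exp(-lambda*t);        % exponential density
a = 1/lambda;  b = 2/lambda;           % C = {1/lambda <= X <= 2/lambda}
PC = integral(f, a, b);                % P(C) = exp(-1) - exp(-2) = 0.2325
EXC = integral(@(t) t.*f(t), a, b)/PC  % E[X|C] = 1.4180/lambda  = 0.7090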
Suppose $X = \sum_{i=1}^{n} t_i I_{A_i}$ and $Y = \sum_{j=1}^{m} u_j I_{B_j}$ in canonical form. We suppose $P(A_i) = P(X = t_i) > 0$ and $P(B_j) = P(Y = u_j) > 0$, for each permissible $i, j$. Now

$$P(Y = u_j | X = t_i) = \frac{P(X = t_i,\, Y = u_j)}{P(X = t_i)}$$
We take the expectation relative to the conditional probability to get

$$E[Y|X = t_i] = \sum_{j=1}^{m} u_j P(Y = u_j | X = t_i) = e(t_i)$$
Since we have a value for each $t_i$ in the range of X, the function $e(\cdot)$ is defined on the range of X. Now consider any reasonable set M on the real line and determine the expectation

$$E[I_M(X)\, Y] = \sum_{i=1}^{n} \sum_{j=1}^{m} I_M(t_i)\, u_j\, P(X = t_i,\, Y = u_j) = \sum_{i=1}^{n} I_M(t_i)\, e(t_i)\, P(X = t_i) = E[I_M(X)\, e(X)]$$
We have the pattern

$$E[I_M(X)\, Y] = E[I_M(X)\, e(X)] \quad \text{where} \quad e(t_i) = E[Y|X = t_i]$$

for all $t_i$ in the range of X.
We return to examine this property later. But first, consider an example to display the nature of the concept.
Suppose the pair $\{X, Y\}$ has the joint distribution
              X = 0    X = 1    X = 4    X = 9
    Y =  2     0.05     0.04     0.21     0.15
    Y =  0     0.05     0.01     0.09     0.10
    Y = -1     0.10     0.05     0.10     0.05
    P(X = t)   0.20     0.10     0.40     0.30
Calculate $E[Y|X = t_i]$ for each possible value $t_i$ taken on by X:
E[Y|X = 0] = (–1·0.10 + 0·0.05 + 2·0.05)/0.20 = 0
E[Y|X = 1] = (–1·0.05 + 0·0.01 + 2·0.04)/0.10 = 0.30
E[Y|X = 4] = (–1·0.10 + 0·0.09 + 2·0.21)/0.40 = 0.80
E[Y|X = 9] = (–1·0.05 + 0·0.10 + 2·0.15)/0.30 ≈ 0.83
The pattern of operation in each case can be described as follows:
For the ith column, multiply each value $u_j$ by $P(X = t_i,\, Y = u_j)$, sum, then divide by $P(X = t_i)$.
The following interpretation helps visualize the conditional expectation and points to an important result in the general case.
For each $t_i$ we use the mass distributed "above" it. This mass is distributed along a vertical line at values $u_j$ taken on by Y. The result of the computation is to determine the center of mass for the conditional distribution above $t = t_i$. As in the case of ordinary expectations, this should be the best estimate, in the mean-square sense, of Y when $X = t_i$. We examine that possibility in the treatment of the regression problem in the section called "The regression problem".
Although the calculations are not difficult for a problem of this size, the basic pattern can be implemented simply with MATLAB, making the handling of much larger problems quite easy. This is particularly useful in dealing with the simple approximation to an absolutely continuous pair.
X = [0 1 4 9];                          % Data for the joint distribution
Y = [-1 0 2];
P = 0.01*[ 5  4 21 15; 5  1  9 10; 10  5 10  5];
jcalc                                   % Setup for calculations
Enter JOINT PROBABILITIES (as on the plane)  P
Enter row matrix of VALUES of X  X
Enter row matrix of VALUES of Y  Y
Use array operations on matrices X, Y, PX, PY, t, u, and P
EYX = sum(u.*P)./sum(P);   % sum(P) = PX (operation sum yields column sums)
disp([X;EYX]')             % u.*P = u_j P(X = t_i, Y = u_j) for all i, j
         0         0
    1.0000    0.3000
    4.0000    0.8000
    9.0000    0.8333
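With the matrices from this session available, the fundamental pattern $E[I_M(X)\,Y] = E[I_M(X)\,e(X)]$ can also be checked numerically; the set M below is an arbitrary choice for illustration.

M = (X >= 1) & (X <= 4);                 % arbitrary test set M = [1, 4]
LHS = sum(sum((t >= 1 & t <= 4).*u.*P))  % E[I_M(X) Y]    = 0.35
RHS = sum(M.*EYX.*PX)                    % E[I_M(X) e(X)] = 0.35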
The calculations extend to $E[g(X, Y)|X = t_i]$. Instead of values of $u_j$ we use values of $g(t_i, u_j)$ in the calculations. Suppose $Z = g(X, Y) = Y^2 - 2XY$.
G = u.^2 - 2*t.*u;             % Z = g(X,Y) = Y^2 - 2XY
EZX = sum(G.*P)./sum(P);       % E[Z|X=x]
disp([X;EZX]')
         0    1.5000
    1.0000    1.5000
    4.0000   -4.0500
    9.0000  -12.8333
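Taking M to be the whole real line in the pattern gives $E[Z] = \sum_i E[Z|X = t_i]\, P(X = t_i)$, which provides a consistency check on the computation:

EZ1 = sum(EZX.*PX)             % sum of E[Z|X = t_i] P(X = t_i) = -5.02
EZ2 = sum(sum(G.*P))           % direct computation of E[Z]     = -5.02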
Suppose the pair $\{X, Y\}$ has joint density function $f_{XY}$. We seek to use the concept of a conditional distribution, given $X = t$. The fact that $P(X = t) = 0$ for each t requires a modification of the approach adopted in the discrete case. Intuitively, we consider the conditional density

$$f_{Y|X}(u|t) = \frac{f_{XY}(t, u)}{f_X(t)} \qquad \text{for } f_X(t) > 0$$
The condition $f_X(t) > 0$ effectively determines the range of X. The function $f_{Y|X}(\cdot|t)$ has the properties of a density for each fixed t for which $f_X(t) > 0$.
We define, in this case,

$$E[Y|X = t] = \int u\, f_{Y|X}(u|t)\, du = e(t)$$
The function $e(\cdot)$ is defined for $f_X(t) > 0$, hence effectively on the range of X. For any reasonable set M on the real line,

$$E[I_M(X)\, Y] = \int\!\!\int I_M(t)\, u\, f_{XY}(t, u)\, du\, dt = \int I_M(t) \left[ \int u\, f_{Y|X}(u|t)\, du \right] f_X(t)\, dt = \int I_M(t)\, e(t)\, f_X(t)\, dt$$
Thus we have, as in the discrete case,

$$E[I_M(X)\, Y] = E[I_M(X)\, e(X)] \quad \text{where} \quad e(t) = E[Y|X = t]$$

for each t in the range of X.
Again, we postpone examination of this pattern until we consider a more general case.
Suppose the pair $\{X, Y\}$ has joint density $f_{XY}$ on the triangular region bounded by $t = 0$, $u = 1$, and $u = t$ (see Figure 14.1). Then, for $0 \le t < 1$,

$$f_X(t) = \int_t^1 f_{XY}(t, u)\, du$$

By definition, then,

$$f_{Y|X}(u|t) = \frac{f_{XY}(t, u)}{f_X(t)} \qquad t \le u \le 1$$

We thus have

$$e(t) = E[Y|X = t] = \frac{1}{f_X(t)} \int_t^1 u\, f_{XY}(t, u)\, du$$
Theoretically, we must rule out t=1 since the denominator is zero for that value of t. This causes no problem in practice.
We are able to make an interpretation quite analogous to that for the discrete case. This also points the way to practical MATLAB calculations.
For any
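As a sketch of such a practical calculation, the following fragment approximates the pair by a discrete grid and applies the same column-by-column weighted average used in the discrete case. The density $f_{XY}(t, u) = 6u$ on the triangle is a hypothetical choice made here purely for illustration.

d = 0.01;                        % grid spacing
t = 0:d:1;  u = 0:d:1;           % grids of t and u values
[T, U] = meshgrid(t, u);
f = 6*U.*(U >= T);               % assumed density, zero off the triangle
P = f*d^2;  P = P/sum(P(:));     % approximate cell probabilities, normalized
PX = sum(P);                     % approximate marginal for X (column sums)
eYX = sum(U.*P)./max(PX, eps);   % e(t) ~ E[Y|X = t] at each grid value of t
plot(t, eYX)                     % approximate regression curve of Y on X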