
Chapter 16 Conditional Independence, Given a Random Vector

16.1 Conditional Independence, Given a Random Vector

In the unit on Conditional Independence, the concept of conditional independence of events is examined and used to model a variety of common situations. In this unit, we investigate a more general concept of conditional independence, based on the theory of conditional expectation. This concept lies at the foundations of Bayesian statistics, of many topics in decision theory, and of the theory of Markov systems. We examine in this unit, very briefly, the first of these. In the unit on Markov Sequences, we provide an introduction to the third.

The concept

The definition of conditional independence of events is based on a product rule which may be expressed in terms of conditional expectation, given an event. The pair {A, B} is conditionally independent, given C, iff

(16.1)
E[I_A I_B | C] = E[I_A | C] E[I_B | C]

If we let A = X^{-1}(M) and B = Y^{-1}(N), then I_A = I_M(X) and I_B = I_N(Y). It would be reasonable to consider the pair {X, Y} conditionally independent, given event C, iff the product rule

(16.2)
E[I_M(X) I_N(Y) | C] = E[I_M(X) | C] E[I_N(Y) | C]

holds for all reasonable M and N (technically, all Borel M and N). This suggests a possible extension to conditional expectation, given a random vector. We examine the following concept.

Definition. The pair {X, Y} is conditionally independent, given Z, designated {X, Y} ci | Z, iff

(16.3)
E[I_M(X) I_N(Y) | Z] = E[I_M(X) | Z] E[I_N(Y) | Z] a.s. for all Borel sets M, N

Remark. Since it is not necessary that X, Y, or Z be real valued, we understand that the sets M and N are on the codomains for X and Y, respectively. For example, if X is a three-dimensional random vector, then M is a subset of R^3.

As in the case of other concepts, it is useful to identify some key properties, which we refer to by the numbers used in the table in Appendix G. We note two kinds of equivalences. For example, the following are equivalent.

(CI1) E[I_M(X) I_N(Y) | Z] = E[I_M(X) | Z] E[I_N(Y) | Z] a.s. for all Borel sets M, N

(CI5) E[g(X, Z) h(Y, Z) | Z] = E[g(X, Z) | Z] E[h(Y, Z) | Z] a.s. for all Borel functions g, h

Because the indicator functions are special Borel functions, (CI1) is a special case of (CI5). To show that (CI1) implies (CI5), we need to use linearity, monotonicity, and monotone convergence in a manner similar to that used in extending properties (CE1) to (CE6) for conditional expectation. A second kind of equivalence involves various patterns. The properties (CI1), (CI2), (CI3), and (CI4) are equivalent, with (CI1) being the defining condition for {X, Y} ci | Z.

(CI1) E[I_M(X) I_N(Y) | Z] = E[I_M(X) | Z] E[I_N(Y) | Z] a.s. for all Borel sets M, N

(CI2) E[I_M(X) | Z, Y] = E[I_M(X) | Z] a.s. for all Borel sets M

(CI3) E[I_M(X) I_Q(Z) | Z, Y] = E[I_M(X) I_Q(Z) | Z] a.s. for all Borel sets M, Q

(CI4) E[I_M(X) I_Q(Z) | Y] = E{E[I_M(X) I_Q(Z) | Z] | Y} a.s. for all Borel sets M, Q

As an example of the kinds of argument needed to verify these equivalences, we show the equivalence of (CI1) and (CI2).

  • (CI1) implies (CI2). Set e_1(Y, Z) = E[I_M(X) | Z, Y] and e_2(Z) = E[I_M(X) | Z]. If we show

    (16.4)
    E[I_N(Y) I_Q(Z) e_1(Y, Z)] = E[I_N(Y) I_Q(Z) e_2(Z)] for all Borel sets N, Q

    then by the uniqueness property (E5b) for expectation we may assert e_1(Y, Z) = e_2(Z) a.s. Using the defining property (CE1) for conditional expectation, we have

    (16.5)
    E[I_N(Y) I_Q(Z) e_1(Y, Z)] = E[I_N(Y) I_Q(Z) I_M(X)]

    On the other hand, use of (CE1), (CE8), (CI1), and (CE1) yields

    (16.6)
    E[I_N(Y) I_Q(Z) e_2(Z)] = E{I_Q(Z) E[I_N(Y) e_2(Z) | Z]} = E{I_Q(Z) E[I_M(X) | Z] E[I_N(Y) | Z]}
    (16.7)
    = E{I_Q(Z) E[I_M(X) I_N(Y) | Z]}
    (16.8)
    = E[I_Q(Z) I_M(X) I_N(Y)] = E[I_N(Y) I_Q(Z) I_M(X)]

    which establishes the desired equality.

  • (CI2) implies (CI1). Using (CE9), (CE8), (CI2), and (CE8), we have

    (16.9)
    E[I_M(X) I_N(Y) | Z] = E{E[I_M(X) I_N(Y) | Z, Y] | Z}
    (16.10)
    = E{I_N(Y) E[I_M(X) | Z, Y] | Z} = E{I_N(Y) E[I_M(X) | Z] | Z}
    (16.11)
    = E[I_M(X) | Z] E[I_N(Y) | Z]

Use of property (CE8) shows that (CI2) and (CI3) are equivalent. Now just as (CI1) extends to (CI5), so also (CI3) is equivalent to

(CI6) E[g(X, Z) | Z, Y] = E[g(X, Z) | Z] a.s. for all Borel functions g

Property (CI6) provides an important interpretation of conditional independence:

E[g(X, Z) | Z] is the best mean-square estimator for g(X, Z), given knowledge of Z. The condition {X, Y} ci | Z implies that additional knowledge about Y does not modify that best estimate. This interpretation is often the most useful as a modeling assumption.

Similarly, property (CI4) is equivalent to

(CI8) E[g(X, Z) | Y] = E{E[g(X, Z) | Z] | Y} a.s. for all Borel functions g

Property (CI7) is an alternate way of expressing (CI6). Property (CI9) is just a convenient way of expressing the other conditions.
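
These equivalences are easy to explore numerically in the discrete case. The Python sketch below is illustrative only: the distributions and variable names are our own choices, not from the text. It builds a joint distribution by the product rule of (CI1) and checks the (CI6)-type conclusion that conditioning on Y as well as Z leaves the best mean-square estimate of X unchanged.

import numpy as np

# Illustrative conditional distributions (assumptions, not from the text)
pZ = np.array([0.3, 0.7])                    # P(Z = z), z in {0, 1}
pX_Z = np.array([[0.2, 0.5, 0.3],            # P(X = x | Z = 0), x in {0, 1, 2}
                 [0.6, 0.3, 0.1]])           # P(X = x | Z = 1)
pY_Z = np.array([[0.4, 0.6],                 # P(Y = y | Z = 0), y in {0, 1}
                 [0.8, 0.2]])                # P(Y = y | Z = 1)
x_vals = np.array([0.0, 1.0, 2.0])

# Product rule (CI1): p(z, x, y) = P(Z = z) P(X = x | Z = z) P(Y = y | Z = z)
joint = pZ[:, None, None] * pX_Z[:, :, None] * pY_Z[:, None, :]

for z in range(2):
    e_X_Z = x_vals @ pX_Z[z]                               # E[X | Z = z]
    for y in range(2):
        p_X_zy = joint[z, :, y] / joint[z, :, y].sum()     # P(X = x | Z = z, Y = y)
        assert np.isclose(x_vals @ p_X_zy, e_X_Z)          # (CI6): E[X | Z, Y] = E[X | Z]
print("E[X | Z, Y] = E[X | Z] for every (z, y)")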

The additional properties in Appendix G are useful in a variety of contexts, particularly in establishing properties of Markov systems. We refer to them as needed.

The Bayesian approach to statistics

In the classical approach to statistics, a fundamental problem is to obtain information about the population distribution from the distribution in a simple random sample. There is an inherent difficulty with this approach. Suppose it is desired to determine the population mean μ. Now μ is an unknown quantity about which there is uncertainty. However, since it is a constant, we cannot assign a probability such as P(a < μ ≤ b). This has no meaning.

The Bayesian approach makes a fundamental change of viewpoint. Since the population mean is a quantity about which there is uncertainty, it is modeled as a random variable whose value is to be determined by experiment. In this view, the population distribution is conceived as randomly selected from a class of such distributions. One way of expressing this idea is to refer to a state of nature. The population distribution has been “selected by nature” from a class of distributions. The mean value is thus a random variable whose value is determined by this selection. To implement this point of view, we assume

  1. The value of the parameter (say μ in the discussion above) is a “realization” of a parameter random variable H. If two or more parameters are sought (say the mean and variance), they may be considered components of a parameter random vector.

  2. The population distribution is a conditional distribution, given the value of H.

The Bayesian model

If X is a random variable whose distribution is the population distribution and H is the parameter random variable, then the pair {X, H} has a joint distribution.

  1. For each u in the range of H, we have a conditional distribution for X, given H=u.

  2. We assume a prior distribution for H. This is based on previous experience.

  3. We have a random sampling process, given H: i.e., the class {X_1, X_2, ..., X_n} is conditionally iid, given H. Let W = (X_1, X_2, ..., X_n) and consider the joint conditional distribution function

    (16.12)
    F_{W|H}(t_1, t_2, ..., t_n | u) = E[∏_{i=1}^n I_{(-∞, t_i]}(X_i) | H = u] = ∏_{i=1}^n E[I_{(-∞, t_i]}(X_i) | H = u]
    (16.13)
    = ∏_{i=1}^n F_{X|H}(t_i | u)

    If X has conditional density, given H, then a similar product rule holds.
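
A Monte Carlo sketch can make this product rule concrete. In the following Python fragment, the exponential population and all parameter values are illustrative assumptions, not from the text; it estimates the joint conditional distribution function at one point and compares it with the product of one-dimensional factors in (16.13).

import numpy as np

rng = np.random.default_rng(1)

# Given H = u, sample n conditionally iid exponential variables (rate u)
u, n = 2.0, 3
t = np.array([0.3, 0.7, 1.1])                            # evaluation point (t1, t2, t3)

X = rng.exponential(scale=1.0 / u, size=(200_000, n))    # draws of (X1, X2, X3), given H = u
joint_cdf = np.mean(np.all(X <= t, axis=1))              # estimate of F_{W|H}(t1, t2, t3 | u)
product = np.prod(1.0 - np.exp(-u * t))                  # product of F_{X|H}(t_i | u)
print(joint_cdf, product)                                # agree up to sampling error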

Population proportion

We illustrate these ideas with one of the simplest, but most important, statistical problems: that of determining the proportion of a population which has a particular characteristic. Examples abound. We mention only a few to indicate the importance.

  1. The proportion of a population of voters who plan to vote for a certain candidate.

  2. The proportion of a given population which has a certain disease.

  3. The fraction of items from a production line which meet specifications.

  4. The fraction of women between the ages of eighteen and fifty-five who hold full-time jobs.

The parameter in this case is the proportion p who meet the criterion. If sampling is at random, then the sampling process is equivalent to a sequence of Bernoulli trials. If H is the parameter random variable and S_n is the number of “successes” in a sample of size n, then the conditional distribution for S_n, given H = u, is binomial (n, u). To see this, consider

(16.14)
X_i = I_{E_i}, with P(E_i | H = u) = E[X_i | H = u] = u, where E_i is the event of a “success” on trial i

Analysis is carried out for each fixed u as in the ordinary Bernoulli case. If

(16.15)
S_n = ∑_{i=1}^n X_i

we have the result

(16.16)
P(S_n = k | H = u) = C(n, k) u^k (1 - u)^{n-k}, 0 ≤ k ≤ n
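
The binomial result (16.16) may be checked by direct simulation; in the Python sketch below, the values of n, u, and k are illustrative.

import numpy as np
from math import comb

rng = np.random.default_rng(2)

# Given H = u, count successes in n Bernoulli trials, many times over
n, u, k = 10, 0.35, 4
Sn = rng.binomial(1, u, size=(200_000, n)).sum(axis=1)   # S_n for each simulated sample
empirical = np.mean(Sn == k)                             # estimate of P(S_n = k | H = u)
exact = comb(n, k) * u**k * (1 - u)**(n - k)             # C(n, k) u^k (1 - u)^(n-k)
print(empirical, exact)                                  # agree up to sampling error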

The objective

We seek to determine the best mean-square estimate of H, given S_n = k. Two steps must be taken:

  1. If H = u, we know the conditional distribution P(S_n = k | H = u). Sampling gives S_n = k. We make a Bayesian reversal to get an expression for E[H | S_n = k].

  2. To complete the task, we must assume a prior distribution for H on the basis of prior knowledge, if any.

The Bayesian reversal

Since {S_n = k} is an event with positive probability, we use the definition of the conditional expectation, given an event, and the law of total probability (CE1b) to obtain

(16.17)
E[g(H) | S_n = k] = E[g(H) I_{\{S_n = k\}}] / P(S_n = k) = E{g(H) P(S_n = k | H)} / E{P(S_n = k | H)}
(16.18)
= ∫ g(u) u^k (1 - u)^{n-k} f_H(u) du / ∫ u^k (1 - u)^{n-k} f_H(u) du

Here f_H is the density for the prior distribution of H, considered next; the common factor C(n, k) cancels in the ratio.
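
Since the reversal formula is a ratio of integrals, it can be evaluated numerically for any prior density. The Python sketch below uses an arbitrary triangular prior (an illustrative choice, not from the text) and g(H) = H, so the ratio is the best mean-square estimate of H, given S_n = k.

import numpy as np
from math import comb

# Numerical form of (16.17)-(16.18); n, k, and the prior are illustrative
n, k = 10, 7
du = 1e-4
u = np.arange(du / 2, 1.0, du)                       # midpoint grid on (0, 1)
f_H = np.where(u <= 0.5, 4 * u, 4 * (1 - u))         # a valid (triangular) prior density
like = comb(n, k) * u**k * (1 - u)**(n - k)          # P(S_n = k | H = u)

g = u                                                # g(H) = H gives E[H | S_n = k]
estimate = np.sum(g * like * f_H) / np.sum(like * f_H)   # the grid step du cancels
print(estimate)                                      # best mean-square estimate of H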

A prior distribution for H

The beta (r, s) distribution (see Appendix G) proves to be a “natural” choice for this purpose. Its range is the unit interval, and by proper choice of the parameters r and s the density function can be given a variety of forms (see Figures 16.1 and 16.2).

Figure 16.1
The Beta(r, s) density for several choices of r and s.
Figure 16.2
The Beta(r, s) density for further choices of r and s.

Its analysis is based on the integrals

(16.19)
∫_0^1 u^{r-1} (1 - u)^{s-1} du = Γ(r)Γ(s) / Γ(r + s), where Γ(a + 1) = aΓ(a)

For H beta (r, s), the density is given by

(16.20)
f_H(u) = [Γ(r + s) / (Γ(r)Γ(s))] u^{r-1} (1 - u)^{s-1}, 0 < u < 1

For r > 1, s > 1, f_H has a maximum at (r - 1)/(r + s - 2). For r, s positive integers, f_H is a polynomial on [0, 1], so that determination of the distribution function is easy. In any case, straightforward integration, using the integral formula above, shows

(16.21)
E[H] = r / (r + s) and Var[H] = rs / [(r + s)^2 (r + s + 1)]
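
A quick simulation, with illustrative parameter values, confirms these moment formulas.

import numpy as np

rng = np.random.default_rng(3)

# Spot check of (16.21) for an illustrative choice of r and s
r, s = 3.0, 5.0
H = rng.beta(r, s, size=1_000_000)
print(H.mean(), r / (r + s))                             # E[H]
print(H.var(), r * s / ((r + s)**2 * (r + s + 1)))       # Var[H]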

If the prior distribution for H is beta (r, s), we may complete the determination of E[g(H) | S_n = k] as follows.

(16.22)
E[g(H) | S_n = k] = ∫_0^1 g(u) u^{k+r-1} (1 - u)^{n+s-k-1} du / ∫_0^1 u^{k+r-1} (1 - u)^{n+s-k-1} du

In particular, for g(H) = H, the integral formula (16.19) and Γ(a + 1) = aΓ(a) yield

(16.23)
E[H | S_n = k] = (k + r) / (n + r + s)

We may adapt the analysis above to show that H is conditionally beta (r + k, s + n - k), given S_n = k.
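
In computational terms, this conjugate update is a two-line bookkeeping step. The sketch below, with illustrative prior parameters and sample outcome, reproduces the posterior mean of (16.23).

# A beta (r, s) prior and k successes in n trials give a
# beta (r + k, s + n - k) posterior; illustrative values below
r, s = 2, 2                                  # prior: beta (2, 2)
n, k = 20, 14                                # observed S_n = 14 in n = 20 trials

r_post, s_post = r + k, s + n - k            # posterior: beta (16, 8)
print(r_post / (r_post + s_post))            # posterior mean
print((k + r) / (n + r + s))                 # same value, from (16.23)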