
Chapter 12 Variance, Covariance, Linear Regression

12.1 Variance

In the treatment of the mathematical expectation of a real random variable X, we note that the mean value locates the center of the probability mass distribution induced by X on the real line. In this unit, we examine how expectation may be used for further characterization of the distribution for X. In particular, we deal with the concept of variance and its square root, the standard deviation. In subsequent units, we show how expectation may be used to characterize the distribution for a pair $\{X, Y\}$ considered jointly, with the concepts of covariance and linear regression.

Variance

Location of the center of mass for a distribution is important, but provides limited information. Two markedly different random variables may have the same mean value. It would be helpful to have a measure of the spread of the probability mass about the mean. Among the possibilities, the variance and its square root, the standard deviation, have been found particularly useful.

Definition. The variance of a random variable X is the mean square of its variation about the mean value:

(12.1)
$$\mathrm{Var}[X] = \sigma_X^2 = E\big[(X - \mu_X)^2\big]$$

The standard deviation for X is the positive square root σX of the variance.

Remarks
  • If X(ω) is the observed value of X, its variation from the mean is X(ω)–μX. The variance is the probability weighted average of the square of these variations.

  • The square of the error treats positive and negative variations alike, and it weights large variations more heavily than smaller ones.

  • As in the case of mean value, the variance is a property of the distribution, rather than of the random variable.

  • We show below that the standard deviation is a “natural” measure of the variation from the mean.

  • In the treatment of mathematical expectation, we show that

    (12.2)
    $$E[(X - c)^2] \text{ is a minimum iff } c = E[X], \text{ in which case } E\big[(X - E[X])^2\big] = E[X^2] - E^2[X]$$

    This shows that the mean value is the constant which best approximates the random variable, in the mean square sense.
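The minimizing property in (12.2) may be illustrated numerically. The following MATLAB sketch uses a small hypothetical distribution (not from the text): it evaluates $E[(X-c)^2]$ on a grid of values of c and locates the minimum, which occurs at c = E[X] with minimum value Var[X].

X = [1 2 3 5 8];                             % hypothetical values of X
PX = [0.2 0.3 0.1 0.25 0.15];                % hypothetical probabilities
EX = X*PX';                                  % mean value E[X] = 3.55
c = 0:0.01:6;                                % grid of candidate constants c
msq = arrayfun(@(cc) ((X - cc).^2)*PX', c);  % E[(X - c)^2] for each c
[~,k] = min(msq);
[c(k) EX]                                    % minimizing c agrees with E[X]
msq(k) - ((X - EX).^2)*PX'                   % minimum equals Var[X] (difference is zero)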

Basic patterns for variance

Since variance is the expectation of a function of the random variable X, we utilize properties of expectation in computations. In addition, we find it expedient to identify several patterns for variance which are frequently useful in performing calculations. For one thing, while the variance is defined as $E[(X - \mu_X)^2]$, this is usually not the most convenient form for computation. The result quoted above gives an alternate expression.

(V1): Calculating formula. $\mathrm{Var}[X] = E[X^2] - E^2[X]$.
(V2): Shift property. $\mathrm{Var}[X + b] = \mathrm{Var}[X]$. Adding a constant b to X shifts the distribution (hence its center of mass) by that amount. The variation of the shifted distribution about the shifted center of mass is the same as the variation of the original, unshifted distribution about the original center of mass.
(V3): Change of scale. $\mathrm{Var}[aX] = a^2\,\mathrm{Var}[X]$. Multiplication of X by constant a changes the scale by a factor |a|. The squares of the variations are multiplied by $a^2$. So also is the mean of the squares of the variations.
(V4): Linear combinations
  1. $\mathrm{Var}[aX \pm bY] = a^2\,\mathrm{Var}[X] + b^2\,\mathrm{Var}[Y] \pm 2ab\,\big(E[XY] - E[X]E[Y]\big)$

  2. More generally,

    (12.3)
    $$\mathrm{Var}\left[\sum_{k=1}^{n} a_k X_k\right] = \sum_{k=1}^{n} a_k^2\,\mathrm{Var}[X_k] + 2\sum_{i<j} a_i a_j c_{ij}$$

The term $c_{ij} = E[X_i X_j] - E[X_i]E[X_j]$ is the covariance of the pair $\{X_i, X_j\}$, whose role we study in the unit on that topic. If the $c_{ij}$ are all zero, we say the class is uncorrelated.

Remarks

  • If the pair $\{X_i, X_j\}$ is independent, it is uncorrelated. The converse is not true, as examples in the next section show.

  • If the $a_i = \pm 1$ and all pairs are uncorrelated, then

    (12.4)
    $$\mathrm{Var}\left[\sum_{i=1}^{n} a_i X_i\right] = \sum_{i=1}^{n} \mathrm{Var}[X_i]$$

    The variances add even if the coefficients are negative. A numerical check of (V1)-(V3) and of this additivity follows.
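The following MATLAB sketch verifies (V1)-(V3) for a hypothetical simple distribution, then checks the additivity (12.4) by simulating an independent (hence uncorrelated) pair; the distributions and sample size here are illustrative choices, not from the text.

X = [0 1 3 4];  PX = [0.1 0.4 0.3 0.2];    % hypothetical simple distribution
EX = X*PX';
VX = (X.^2)*PX' - EX^2                     % (V1): Var[X] = E[X^2] - E^2[X]
Vb = ((X+2).^2)*PX' - ((X+2)*PX')^2        % (V2): Var[X+2] equals VX
Va = ((3*X).^2)*PX' - ((3*X)*PX')^2        % (V3): Var[3X] equals 9*VX
rng(0);  N = 100000;                       % simulation check of (12.4)
x = randn(N,1);  y = 2*randn(N,1) + 1;     % independent; variances 1 and 4
var(x - y)                                 % approximately 1 + 4 = 5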

We calculate variances for some common distributions. Some details are omitted—usually details of algebraic manipulation or the straightforward evaluation of integrals. In some cases we use well known sums of infinite series or values of definite integrals. A number of pertinent facts are summarized in Appendix B, Some Mathematical Aids. The results below are included in the table in Appendix C; a numerical spot check follows each list.

Variances of some discrete distributions

  1. Indicator function: $X = I_E$, $P(E) = p$, $q = 1 - p$, $E[X] = p$.

    (12.5)
    $$\mathrm{Var}[X] = E[X^2] - E^2[X] = E[I_E] - p^2 = p - p^2 = pq$$

  2. Simple random variable $X = \sum_{i=1}^{n} t_i I_{A_i}$ (primitive form), $P(A_i) = p_i$.

    (12.6)
    $$\mathrm{Var}[X] = \sum_{i=1}^{n} t_i^2 p_i q_i - 2\sum_{i<j} t_i t_j p_i p_j, \qquad q_i = 1 - p_i$$

    since $E[I_{A_i}^2] = p_i$ and, for $i \ne j$, $E[I_{A_i} I_{A_j}] = P(A_i A_j) = 0$, so that $c_{ij} = -p_i p_j$.
  3. Binomial $(n, p)$: $X = \sum_{i=1}^{n} I_{E_i}$ with $\{I_{E_i} : 1 \le i \le n\}$ iid, $P(E_i) = p$, $E[X] = np$.

    (12.7)
    $$\mathrm{Var}[X] = \sum_{i=1}^{n} \mathrm{Var}[I_{E_i}] = \sum_{i=1}^{n} pq = npq$$
  4. Geometric $(p)$: $P(X = k) = pq^k$, $k \ge 0$, $E[X] = q/p$.
    We use a trick: $E[X^2] = E[X(X-1)] + E[X]$.

    (12.8)
    $$E[X(X-1)] = p\sum_{k=2}^{\infty} k(k-1)q^k = pq^2\sum_{k=2}^{\infty} k(k-1)q^{k-2} = pq^2\,\frac{2}{(1-q)^3} = \frac{2q^2}{p^2}$$
    (12.9)
    $$\mathrm{Var}[X] = \frac{2q^2}{p^2} + \frac{q}{p} - \left(\frac{q}{p}\right)^2 = \frac{q^2}{p^2} + \frac{q}{p} = \frac{q}{p^2}$$
  5. Poisson $(\mu)$: $P(X = k) = e^{-\mu}\dfrac{\mu^k}{k!}$, $k \ge 0$, $E[X] = \mu$.
    Using $E[X^2] = E[X(X-1)] + E[X]$, we have

    (12.10)
    $$E[X(X-1)] = e^{-\mu}\sum_{k=2}^{\infty} k(k-1)\frac{\mu^k}{k!} = \mu^2 e^{-\mu}\sum_{k=2}^{\infty}\frac{\mu^{k-2}}{(k-2)!} = \mu^2 e^{-\mu} e^{\mu} = \mu^2$$

    Thus, $\mathrm{Var}[X] = \mu^2 + \mu - \mu^2 = \mu$. Note that both the mean and the variance have common value μ.
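These formulas may be spot checked by summing the probability mass functions directly. In the following MATLAB sketch the parameters n = 10, p = 0.3, and μ = 3 are illustrative choices, and the geometric and Poisson sums are truncated where the remaining tail mass is negligible.

n = 10;  p = 0.3;  q = 1 - p;
k = 0:n;
pk = arrayfun(@(j) nchoosek(n,j), k).*p.^k.*q.^(n-k);   % binomial pmf
[(k.^2)*pk' - (k*pk')^2   n*p*q]           % variance agrees with npq
k = 0:200;  pk = p*q.^k;                   % geometric pmf, truncated tail
[(k.^2)*pk' - (k*pk')^2   q/p^2]           % variance agrees with q/p^2
mu = 3;  k = 0:100;
pk = exp(-mu)*mu.^k./factorial(k);         % Poisson pmf, truncated tail
[(k.^2)*pk' - (k*pk')^2   mu]              % variance agrees with mu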

Some absolutely continuous distributions

  1. Uniform on $(a, b)$: $f_X(t) = \dfrac{1}{b-a}$, $a < t < b$; $E[X] = \dfrac{a+b}{2}$.

    (12.11)
    $$E[X^2] = \int_a^b \frac{t^2}{b-a}\,dt = \frac{a^2 + ab + b^2}{3}, \qquad \mathrm{Var}[X] = \frac{a^2 + ab + b^2}{3} - \left(\frac{a+b}{2}\right)^2 = \frac{(b-a)^2}{12}$$
  2. Symmetric triangular $(a, b)$. Because of the shift property (V2), we may center the distribution at the origin. Then the distribution is symmetric triangular $(-c, c)$, where $c = (b-a)/2$. Because of the symmetry,

    (12.12)
    $$\mathrm{Var}[X] = E[X^2] = 2\int_0^c t^2 f_X(t)\,dt$$

    Now, in this case,

    (12.13)
    $$f_X(t) = \frac{c-t}{c^2}, \quad 0 \le t \le c, \qquad \text{so that } \mathrm{Var}[X] = \frac{2}{c^2}\int_0^c t^2 (c-t)\,dt = \frac{c^2}{6} = \frac{(b-a)^2}{24}$$
  3. Exponential $(\lambda)$: $f_X(t) = \lambda e^{-\lambda t}$, $t \ge 0$; $E[X] = 1/\lambda$.

    (12.14)
    $$E[X^2] = \int_0^\infty \lambda t^2 e^{-\lambda t}\,dt = \frac{2}{\lambda^2}, \qquad \text{so } \mathrm{Var}[X] = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}$$

  4. Gamma $(\alpha, \lambda)$: $f_X(t) = \dfrac{1}{\Gamma(\alpha)}\,\lambda^\alpha t^{\alpha-1} e^{-\lambda t}$, $t \ge 0$; $E[X] = \alpha/\lambda$.

    (12.15)
    $$E[X^2] = \frac{1}{\Gamma(\alpha)}\int_0^\infty \lambda^\alpha t^{\alpha+1} e^{-\lambda t}\,dt = \frac{\Gamma(\alpha+2)}{\lambda^2\,\Gamma(\alpha)} = \frac{\alpha(\alpha+1)}{\lambda^2}$$

    Hence $\mathrm{Var}[X] = \dfrac{\alpha(\alpha+1)}{\lambda^2} - \dfrac{\alpha^2}{\lambda^2} = \dfrac{\alpha}{\lambda^2}$.

  5. Normal $(\mu, \sigma^2)$: $E[X] = \mu$.
    Consider $Y \sim N(0, 1)$, with $E[Y] = 0$ and $\mathrm{Var}[Y] = E[Y^2] = \dfrac{1}{\sqrt{2\pi}}\displaystyle\int_{-\infty}^{\infty} t^2 e^{-t^2/2}\,dt = 1$.

    (12.16)
    $$X = \sigma Y + \mu \quad \text{implies} \quad \mathrm{Var}[X] = \sigma^2\,\mathrm{Var}[Y] = \sigma^2$$
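The continuous results may be spot checked by numerical integration. The sketch below uses MATLAB's built-in integral function (not one of the m-procedures supplied with the text), with illustrative parameter values λ = 0.5 and α = 3.

lam = 0.5;                                 % illustrative rate lambda
f = @(t) lam*exp(-lam*t);                  % exponential density
EX = integral(@(t) t.*f(t), 0, Inf);
[integral(@(t) t.^2.*f(t), 0, Inf) - EX^2   1/lam^2]     % Var agrees with 1/lambda^2
a = 3;                                     % illustrative shape alpha
g = @(t) (lam^a/gamma(a))*t.^(a-1).*exp(-lam*t);         % gamma density
EG = integral(@(t) t.*g(t), 0, Inf);
[integral(@(t) t.^2.*g(t), 0, Inf) - EG^2   a/lam^2]     % Var agrees with alpha/lambda^2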

Extensions of some previous examples

In the unit on expectations, we calculate the mean for a variety of cases. We revisit some of those examples and calculate the variances.

Example 12.1 Expected winnings (Example 8 from "Mathematical Expectation: Simple Random Variables")

A bettor places three bets at $2.00 each. The first pays $10.00 with probability 0.15, the second $8.00 with probability 0.20, and the third $20.00 with probability 0.10.

SOLUTION

The net gain may be expressed

(12.17)
$$X = -6 + 10 I_A + 8 I_B + 20 I_C$$

where A, B, and C are the events that the respective bets are won.

We may reasonably suppose the class $\{A, B, C\}$ is independent (this assumption is not necessary in computing the mean). Then

(12.18)
$$\mathrm{Var}[X] = 100\,\mathrm{Var}[I_A] + 64\,\mathrm{Var}[I_B] + 400\,\mathrm{Var}[I_C] = 100(0.15)(0.85) + 64(0.20)(0.80) + 400(0.10)(0.90)$$

Calculation is straightforward. We may use MATLAB to perform the arithmetic.

c = [10 8 20];              % winnings on the three bets
p = 0.01*[15 20 10];        % probabilities of winning
q = 1 - p;
VX = sum(c.^2.*p.*q)        % Var[X] = sum of c_i^2 p_i q_i
VX =  58.9900
Example 12.2 A function of X (Example 9 from "Mathematical Expectation: Simple Random Variables")

Suppose X in a primitive form is

(12.19)
$$X = -3 I_{C_1} - I_{C_2} + 2 I_{C_3} - 3 I_{C_4} + 4 I_{C_5} - I_{C_6} + I_{C_7} + 2 I_{C_8} + 3 I_{C_9} + 2 I_{C_{10}}$$

with probabilities $P(C_j) = 0.08, 0.11, 0.06, 0.13, 0.05, 0.08, 0.12, 0.07, 0.14, 0.16$, respectively.

Let $g(t) = t^2 + 2t$. Determine $E[g(X)]$ and $\mathrm{Var}[g(X)]$.

c = [-3 -1 2 -3 4 -1 1 2 3 2];            % Original coefficients
pc = 0.01*[8 11 6 13 5 8 12 7 14 16];     % Probabilities for C_j
G = c.^2 + 2*c                            % g(c_j)
EG = G*pc'                                % Direct calculation E[g(X)]
EG =  6.4200
VG = (G.^2)*pc' - EG^2                  % Direct calculation Var[g(X)]
VG = 40.8036
[Z,PZ] = csort(G,pc);                   % Distribution for Z = g(X)
EZ = Z*PZ'                              % E[Z]
EZ =  6.4200
VZ = (Z.^2)*PZ' - EZ^2                  % Var[Z]
VZ = 40.8036
Example 12.3 Z = g(X, Y) (Example 10 from "Mathematical Expectation: Simple Random Variables")

We use the same joint distribution as for Example 10 from "Mathematical Expectation: Simple Random Variables" and let $Z = g(X, Y) = X^2 + 2XY - 3Y$. To set up for calculations, we use jcalc.

jdemo1                      % Call for data
jcalc                       % Set up
Enter JOINT PROBABILITIES (as on the plane)  P
Enter row matrix of VALUES of X  X
Enter row matrix of VALUES of Y  Y
 Use array operations on matrices X, Y, PX, PY, t, u, and P
G = t.^2 + 2*t.*u - 3*u;    % Calculation of matrix of [g(t_i, u_j)]
EG = total(G.*P)            % Direct calculation of E[g(X,Y)]
EG =   3.2529
VG = total(G.^2.*P) - EG^2  % Direct calculation of Var[g(X,Y)]
VG =  80.2133
[Z,PZ] = csort(G,P);        % Determination of distribution for Z
EZ = Z*PZ'                  % E[Z] from distribution
EZ =   3.2529
VZ = (Z.^2)*PZ' - EZ^2      % Var[Z] from distribution
VZ =  80.2133
Example 12.4 A function with compound definition (Example 12 from "Mathematical Expectation: General Random Variables")

Suppose $X \sim$ exponential (0.3). Let

(12.20)
$$Z = \begin{cases} X^2 & \text{for } X \le 4 \\ 16 & \text{for } X > 4 \end{cases} \;=\; I_{[0,4]}(X)\,X^2 + I_{(4,\infty)}(X)\,16$$

Determine $E[Z]$ and $\mathrm{Var}[Z]$.

ANALYTIC SOLUTION

(12.21)
$$E[Z] = E\big[I_{[0,4]}(X)\,X^2\big] + 16\,E\big[I_{(4,\infty)}(X)\big] = \int_0^4 t^2\,0.3 e^{-0.3t}\,dt + 16\,e^{-1.2}$$
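The evaluation may be completed numerically. The sketch below uses MATLAB's built-in integral function rather than the text's discrete approximation procedures; it evaluates the integrals in (12.21) together with the corresponding integrals for $E[Z^2]$.

lam = 0.3;
f = @(t) lam*exp(-lam*t);                  % exponential(0.3) density
EZ = integral(@(t) t.^2.*f(t), 0, 4) + 16*integral(f, 4, Inf)    % E[Z], about 7.50
EZ2 = integral(@(t) t.^4.*f(t), 0, 4) + 256*integral(f, 4, Inf); % E[Z^2]
VZ = EZ2 - EZ^2                            % Var[Z] = E[Z^2] - E^2[Z]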