Applied Probability by Paul E. Pfeiffer

Chapter 6 Random Variables and Probabilities

6.1 Random Variables and Probabilities

Introduction

Probability associates with an event a number which indicates the likelihood of the occurrence of that event on any trial. An event is modeled as the set of those possible outcomes of an experiment which satisfy a property or proposition characterizing the event.

Often, each outcome is characterized by a number. The experiment is performed. If the outcome is observed as a physical quantity, the size of that quantity (in prescribed units) is the entity actually observed. In many nonnumerical cases, it is convenient to assign a number to each outcome. For example, in a coin flipping experiment, a “head” may be represented by a 1 and a “tail” by a 0. In a Bernoulli trial, a success may be represented by a 1 and a failure by a 0. We may then be interested in the number of successes in the n component trials of a sequence. One could assign a distinct number to each card in a deck of playing cards. Observations of the result of selecting a card could be recorded in terms of individual numbers. In each case, the associated number becomes a property of the outcome.

Random variables as functions

We consider in this chapter real random variables (i.e., real-valued random variables). In the chapter "Random Vectors and Joint Distributions", we extend the notion to vector-valued random quantities. The fundamental idea of a real random variable is the assignment of a real number to each elementary outcome ω in the basic space Ω. Such an assignment amounts to determining a function X, whose domain is Ω and whose range is a subset of the real line R. Recall that a real-valued function on a domain (say an interval I on the real line) is characterized by the assignment of a real number y to each element x (argument) in the domain. For a real-valued function of a real variable, it is often possible to write a formula or otherwise state a rule describing the assignment of the value to each argument. Except in special cases, we cannot write a formula for a random variable X. However, random variables share some important general properties of functions which play an essential role in determining their usefulness.

Mappings and inverse mappings

There are various ways of characterizing a function. Probably the most useful for our purposes is as a mapping from the domain Ω to the codomain R. We find the mapping diagram of Figure 6.1 extremely useful in visualizing the essential patterns. Random variable X, as a mapping from basic space Ω to the real line R, assigns to each element ω a value t = X(ω). The object point ω is mapped, or carried, into the image point t. Each ω is mapped into exactly one t, although several ω may have the same image point.

Figure 6.1
The basic mapping diagram: each ω in Ω is carried into the image point t = X(ω) on the real line R.

Associated with a function X as a mapping are the inverse mapping X⁻¹ and the inverse images it produces. Let M be a set of numbers on the real line. By the inverse image of M under the mapping X, we mean the set of all those ω ∈ Ω which are mapped into M by X (see Figure 6.2). If X does not take a value in M, the inverse image is the empty set (impossible event). If M includes the range of X (the set of all possible values of X), the inverse image is the entire basic space Ω. Formally we write

(6.1)
X⁻¹(M) = {ω ∈ Ω : X(ω) ∈ M}

Now we assume the set X⁻¹(M), a subset of Ω, is an event for each M. A detailed examination of that assertion is a topic in measure theory. Fortunately, the results of measure theory ensure that we may make the assumption for any X and any subset M of the real line likely to be encountered in practice. The set X⁻¹(M) is the event that X takes a value in M. As an event, it may be assigned a probability.
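For a finite basic space, the inverse image can be computed by simply collecting all outcomes that X maps into M. The short Python sketch below illustrates this; the basic space, the mapping, and the probabilities are assumptions made up for illustration, not taken from the text.

```python
# A minimal sketch: inverse images on a finite basic space.
# The space omega, mapping X, and masses P are illustrative assumptions.
omega = ['a', 'b', 'c', 'd']                   # basic space of elementary outcomes
X = {'a': 0, 'b': 1, 'c': 1, 'd': 2}           # the random variable as an explicit mapping
P = {'a': 0.1, 'b': 0.3, 'c': 0.4, 'd': 0.2}   # probability mass on each outcome

def inverse_image(M):
    """Return X^{-1}(M): all outcomes w with X(w) in M."""
    return {w for w in omega if X[w] in M}

E = inverse_image({1})                  # {'b', 'c'}: the event that X takes the value 1
print(E, sum(P[w] for w in E))          # P(X = 1) = 0.3 + 0.4 = 0.7
print(inverse_image({7}))               # empty set: X never takes the value 7
```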

Figure 6.2
E is the inverse image X⁻¹(M).
Example 6.1 Some illustrative examples.
  1. X = IE, where E is an event with probability p. Now X takes on only two values, 0 and 1. The event that X takes on the value 1 is the set

    (6.2)
    {ω : X(ω) = 1} = E

    so that P({ω : X(ω) = 1}) = p. This rather ungainly notation is shortened to P(X = 1) = p. Similarly, P(X = 0) = 1 − p. Consider any set M. If neither 1 nor 0 is in M, then X⁻¹(M) = ∅. If 0 is in M, but 1 is not, then X⁻¹(M) = Eᶜ. If 1 is in M, but 0 is not, then X⁻¹(M) = E. If both 1 and 0 are in M, then X⁻¹(M) = Ω. In this case the class of all events X⁻¹(M) consists of event E, its complement Eᶜ, the impossible event ∅, and the sure event Ω.

  2. Consider a sequence of n Bernoulli trials, with probability p of success. Let Sn be the random variable whose value is the number of successes in the sequence of n component trials. Then, according to the analysis in the section "Bernoulli Trials and the Binomial Distribution"

    (6.3)
    P(Sn = k) = C(n, k) p^k (1 − p)^(n−k),  0 ≤ k ≤ n
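The formula (6.3) is easy to check numerically. A brief Python sketch, with illustrative values of n and p:

```python
from math import comb

def binomial_pmf(n, p, k):
    """P(Sn = k) = C(n, k) p^k (1 - p)^(n - k), as in (6.3)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.33                                       # illustrative values
pmf = [binomial_pmf(n, p, k) for k in range(n + 1)]
print(sum(pmf))                                       # the n + 1 probabilities sum to 1.0
```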

Before considering further examples, we note a general property of inverse images. We state it in terms of a random variable, which maps Ω to the real line (see Figure 6.3).

Preservation of set operations

Let X be a mapping from Ω to the real line R. If M, Mi, i ∈ J, are sets of real numbers, with respective inverse images E, Ei, then

(6.4)
X⁻¹(Mᶜ) = Eᶜ,  X⁻¹(⋃_{i∈J} Mi) = ⋃_{i∈J} Ei,  X⁻¹(⋂_{i∈J} Mi) = ⋂_{i∈J} Ei

Examination of simple graphical examples exhibits the plausibility of these patterns. Formal proofs amount to careful reading of the notation. Central to the structure are the facts that each element ω is mapped into only one image point t and that the inverse image of M is the set of all those ω which are mapped into image points in M.

Figure 6.3
Preservation of set operations by inverse images.

An easy, but important, consequence of the general patterns is that the inverse images of disjoint M, N are also disjoint. This implies that the inverse image of a disjoint union of Mi is a disjoint union of the separate inverse images.
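These identities can be checked mechanically on a finite space. The sketch below reuses the assumed finite mapping from the earlier snippet:

```python
# Checking the patterns of (6.4) on a small finite space.
omega = {'a', 'b', 'c', 'd'}
X = {'a': 0, 'b': 1, 'c': 1, 'd': 2}     # assumed mapping, for illustration

def inv(M):
    """Inverse image of a set M of real numbers."""
    return {w for w in omega if X[w] in M}

M, N = {0, 1}, {1, 2}
assert inv(M | N) == inv(M) | inv(N)           # unions are preserved
assert inv(M & N) == inv(M) & inv(N)           # intersections are preserved
assert omega - inv(M) == inv({0, 1, 2} - M)    # complements correspond (the range is {0, 1, 2})
assert inv({0}) & inv({2}) == set()            # disjoint sets have disjoint inverse images
print("all checks pass")
```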

Example 6.2 Events determined by a random variable

Consider, again, the random variable Sn which counts the number of successes in a sequence of n Bernoulli trials. Let n = 10 and p = 0.33. Suppose we want to determine the probability P(2 < S10 ≤ 8). Let Ak = {ω : S10(ω) = k}, which we usually shorten to Ak = {S10 = k}. Now the Ak form a partition, since we cannot have ω ∈ Ak and ω ∈ Aj for j ≠ k (i.e., for any ω, we cannot have two values for Sn(ω)). Now,

(6.5)
{2 < S10 ≤ 8} = A3 ∪ A4 ∪ A5 ∪ A6 ∪ A7 ∪ A8

since S10 takes on a value greater than 2 but no greater than 8 iff it takes one of the integer values from 3 to 8. By the additivity of probability,

(6.6)
P(2 < S10 ≤ 8) = Σ_{k=3}^{8} P(Ak) = Σ_{k=3}^{8} C(10, k) 0.33^k 0.67^(10−k)
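Carrying out the sum in (6.6) numerically takes one line of Python:

```python
from math import comb

n, p = 10, 0.33
# Sum P(A_k) = C(10, k) 0.33^k 0.67^(10 - k) over k = 3, ..., 8, as in (6.6).
prob = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(3, 9))
print(prob)   # P(2 < S10 <= 8), approximately 0.693
```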

Mass transfer and induced probability distribution

Because of the abstract nature of the basic space and the class of events, we are limited in the kinds of calculations that can be performed meaningfully with the probabilities on the basic space. We represent probability as mass distributed on the basic space and visualize this with the aid of general Venn diagrams and minterm maps. We now think of the mapping from Ω to R as producing a point-by-point transfer of the probability mass to the real line. This may be done as follows:

To any set M on the real line assign probability mass PX(M) = P(X⁻¹(M))

It is apparent that PX(M) ≥ 0 and PX(R) = P(Ω) = 1. And because of the preservation of set operations by the inverse mapping

(6.7)
PX(⋃_{i} Mi) = P(X⁻¹(⋃_{i} Mi)) = P(⋃_{i} X⁻¹(Mi)) = Σ_{i} P(X⁻¹(Mi)) = Σ_{i} PX(Mi)  for a countable, disjoint class {Mi}

This means that PX has the properties of a probability measure defined on the subsets of the real line. Some results of measure theory show that this probability is defined uniquely on a class of subsets of R that includes any set normally encountered in applications. We have achieved a point-by-point transfer of the probability apparatus to the real line in such a manner that we can make calculations about the random variable X. We call PX the probability measure induced by X. Its importance lies in the fact that P(X ∈ M) = PX(M). Thus, to determine the likelihood that random quantity X will take on a value in set M, we determine how much induced probability mass is in the set M. This transfer produces what is called the probability distribution for X. In the chapter "Distribution and Density Functions", we consider useful ways to describe the probability distribution induced by a random variable. We turn first to a special class of random variables.
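For a finite space the mass transfer can be carried out directly: each outcome's probability mass is moved to its image point, and P(X ∈ M) is then read off from the induced masses. A minimal sketch, with the same assumed space and mapping as before:

```python
from collections import defaultdict

P = {'a': 0.1, 'b': 0.3, 'c': 0.4, 'd': 0.2}   # assumed mass on the basic space
X = {'a': 0, 'b': 1, 'c': 1, 'd': 2}           # assumed random variable

# Point-by-point transfer: PX[t] accumulates the mass of every outcome mapped to t.
PX = defaultdict(float)
for w, mass in P.items():
    PX[X[w]] += mass

print(dict(PX))                  # induced distribution: mass 0.1, 0.7, 0.2 at 0, 1, 2
M = {1, 2}
print(sum(PX[t] for t in M))     # P(X in M) = PX(M) = 0.9
```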

Simple random variables

We consider, in some detail, random variables which have only a finite set of possible values. These are called simple random variables. Thus the term “simple” is used in a special, technical sense. The importance of simple random variables rests on two facts. For one thing, in practice we can distinguish only a finite set of possible values for any random variable. In addition, any random variable may be approximated as closely as we please by a simple random variable. When the structure and properties of simple random variables have been examined, we turn to more general cases. Many properties of simple random variables extend to the general case via the approximation procedure.
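One way to make the approximation concrete is to quantize: partition the range into subintervals of length h and replace each value of X by the left endpoint of its subinterval. The result takes only finitely many values on a bounded range and differs from X by less than h. A rough sketch, with an assumed step size and a uniform sample standing in for an observed value of X:

```python
import random

def simple_approx(x, h=0.01):
    """Quantize x downward to the nearest multiple of h (x assumed nonnegative)."""
    return h * int(x / h)        # takes only finitely many values on [0, 1)

random.seed(1)
x = random.random()              # one observed value of a [0, 1)-valued X
print(x, simple_approx(x))       # the approximation is within h of the observed value
```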

Representation with the aid of indicator functions

In order to deal with simple random variables clearly and precisely, we must find suitable ways to express them analytically. We do this with the aid of indicator functions. Three basic forms of representation are encountered. These are not mutually exclusive representations.

  1. Standard or canonical form, which displays the possible values and the corresponding events. If X takes on distinct values

    (6.8)
    {t1, t2, ⋯, tn}

    and if Ai = {ω : X(ω) = ti}, for 1 ≤ i ≤ n, then the class {Ai : 1 ≤ i ≤ n} is a partition (i.e., on any trial, exactly one of these events occurs). We call this the partition determined by (or, generated by) X. We may write

    (6.9)
    X = t1 IA1 + t2 IA2 + ⋯ + tn IAn = Σ_{i=1}^{n} ti IAi

    If X(ω) = ti, then ω ∈ Ai, so that IAi(ω) = 1 and all the other indicator functions have value zero. The summation expression thus picks out the correct value ti. This is true for any ti, so the expression represents X(ω) for all ω. The distinct set {t1, t2, ⋯, tn} of the values and the corresponding probabilities pi = P(Ai) constitute the distribution for X. Probability calculations for X are made in terms of its distribution. One of the advantages of the canonical form is that it displays the range (set of values), and if the probabilities pi = P(Ai) are known, the distribution is determined. Note that in canonical form, if one of the ti has value zero, we include that term. For some probability distributions it may be that P(Ai) = 0 for one or more of the ti. In that case, we call these values null values, for they can only occur with probability zero, and hence are practically impossible. In the general formulation, we include possible null values, since they do not affect any probability calculations.

    Example 6.3 Successes in Bernoulli trials

    As the analysis of Bernoulli trials and the binomial distribution shows (see Section 4.8), canonical form must be

    (6.10)
    Sn = Σ_{k=0}^{n} k IAk,  where Ak = {Sn = k} and P(Ak) = C(n, k) p^k (1 − p)^(n−k)

    For many purposes, both theoretical and practical, canonical form is desirable. For one thing, it displays directly the range (i.e., set of values) of the random variable. The distribution consists of the set of values {k : 0 ≤ k ≤ n} paired with the corresponding set of probabilities {pk : 0 ≤ k ≤ n}, where pk = P(Sn = k) = C(n, k) p^k (1 − p)^(n−k).

  2. Simple random variable X may be represented by a primitive form

    (6.11)
    X = c1 IC1 + c2 IC2 + ⋯ + cm ICm = Σ_{j=1}^{m} cj ICj

    Remarks

    • If {Cj : 1 ≤ j ≤ m} is a disjoint class, but ⋃_{j=1}^{m} Cj ≠ Ω, we may append the event