In the unit on Random Variables and Probability we introduce real random variables as mappings from the basic space Ω to the real line. The mapping induces a transfer of the probability mass on the basic space to subsets of the real line in such a way that the probability that X takes a value in a set M is exactly the mass assigned to that set by the transfer. To perform probability calculations, we need to describe analytically the distribution on the line. For simple random variables this is easy. We have at each possible value of X a point mass equal to the probability X takes that value. For more general cases, we need a more useful description than that provided by the induced probability measure PX.
In the theoretical discussion on Random Variables and Probability, we note that the probability distribution induced by a random variable X is determined uniquely by a consistent assignment of mass to semi-infinite intervals of the form (–∞,t] for each real t. This suggests that a natural description is provided by the following.
Definition
The distribution function FX for random variable X is given by

FX(t) = P(X ≤ t) for each real t    (7.1)
In terms of the mass distribution on the line, this is the probability mass at or to the left of the point t. As a consequence, FX has the following properties:
(F1) : FX must be a nondecreasing function, for if t > s there must be at least as much probability mass at or to the left of t as there is for s.
(F2) : FX is continuous from the right, with a jump in the amount p0 at t0 iff P(X = t0) = p0 > 0. If the point t approaches t0 from the left, the interval does not include the probability mass at t0 until t reaches that value, at which point the amount at or to the left of t increases ("jumps") by the amount p0; on the other hand, if t approaches t0 from the right, the interval includes the mass p0 all the way to and including t0, but drops immediately as t moves to the left of t0.
(F3) : Except in very unusual cases involving random variables which may take “infinite” values, the probability mass included in (–∞,t] must increase to one as t moves to the right; as t moves to the left, the probability mass included must decrease to zero, so that

lim(t→–∞) FX(t) = 0 and lim(t→∞) FX(t) = 1    (7.2)
A distribution function determines the probability mass in each semi-infinite interval (–∞,t]. In particular, P(a < X ≤ b) = FX(b) – FX(a) for a < b. According to the discussion referred to above, this determines uniquely the induced distribution.
The distribution function FX for a simple random variable is easily visualized. The distribution consists of point mass pi at each point ti in the range. To the left of the smallest value in the range, FX(t) = 0; as t increases to the smallest value t1, FX(t) remains constant at zero until it jumps by the amount p1. FX(t) remains constant at p1 until t increases to t2, where it jumps by an amount p2 to the value p1 + p2. This continues until the value of FX(t) reaches 1 at the largest value tn. The graph of FX is thus a step function, continuous from the right, with a jump in the amount pi at the corresponding point ti in the range. A similar situation exists for a discrete-valued random variable which may take on an infinity of values (e.g., the geometric distribution or the Poisson distribution considered below). In this case, there is always some probability at points to the right of any ti, but this must become vanishingly small as t increases, since the total probability mass is one.
The procedure ddbn may be used to plot the distribution function for a simple random variable from a matrix X of values and a corresponding matrix PX of probabilities.
>> c = [10 18 10 3];                    % Distribution for X in Example 6.5.1
>> pm = minprob(0.1*[6 3 5]);
>> canonic
 Enter row vector of coefficients  c
 Enter row vector of minterm probabilities  pm
Use row matrices X and PX for calculations
Call for XDBN to view the distribution
>> ddbn                                 % Circles show values at jumps
Enter row matrix of VALUES  X
Enter row matrix of PROBABILITIES  PX
% Printing details.  See Figure 7.1
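If the m-function library is not at hand, a minimal sketch in base MATLAB produces the same kind of step-function plot; the values X and probabilities PX here are hypothetical stand-ins:

X  = [1 3 5 7];                 % hypothetical values ti of a simple random variable
PX = [0.2 0.3 0.4 0.1];         % corresponding point masses pi (they sum to one)
F  = cumsum(PX);                % accumulated jumps: F(i) = FX(t) for ti <= t < t(i+1)
stairs([X(1)-1, X], [0, F])     % step graph of FX, each level held to the right

Since stairs holds each level to the right of its jump point, the plot reflects the right-continuity noted in (F2).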
We make repeated use of a number of common distributions which are used in many practical situations. This collection includes several distributions which are studied in the chapter "Random Variables and Probabilities".
Indicator function. X = IE, with P(X = 1) = P(E) = p and P(X = 0) = q = 1 – p. The distribution function has a jump in the amount q at t = 0 and an additional jump of p to the value 1 at t = 1.
Simple random variable (canonical form)

X = ∑ ti IAi (sum over i = 1, …, n), where Ai = {X = ti} and pi = P(Ai)
The distribution function is a step function, continuous from the right, with a jump of pi at t = ti (see Figure 7.1 for Example 7.1).
Binomial (n,p). This random variable appears as the number of successes in a sequence of n Bernoulli trials with probability p of success. In its simplest form, X = ∑ IEi (sum over i = 1, …, n), with the class {Ei : 1 ≤ i ≤ n} independent and P(Ei) = p, so that

P(X = k) = C(n,k) p^k q^(n–k), 0 ≤ k ≤ n, where q = 1 – p
As pointed out in the study of Bernoulli sequences in the unit on Composite Trials, two m-functions ibinom and cbinom are available for computing the individual and cumulative binomial probabilities.
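For reference, the individual probabilities can also be computed in base MATLAB directly from the formula above; a sketch, with n and p chosen arbitrarily:

n = 10; p = 1/3; q = 1 - p;       % hypothetical parameters
k = 0:n;
pk = zeros(size(k));
for i = 1:length(k)
    pk(i) = nchoosek(n, k(i)) * p^k(i) * q^(n - k(i));  % P(X = k) = C(n,k) p^k q^(n-k)
end
Pk = fliplr(cumsum(fliplr(pk)));  % Pk(i) = P(X >= k(i)), a tail sum of the kind cbinom provides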
Geometric (p). There are two related distributions, both arising in the study of continuing Bernoulli sequences. The first counts the number of failures before the first success. This is sometimes called the “waiting time.” The event {X = k} consists of a sequence of k failures, then a success. Thus

P(X = k) = q^k p, 0 ≤ k, where q = 1 – p
The second designates the component trial on which the first success occurs. The event {Y = k} consists of k – 1 failures, then a success on the kth component trial. We have

P(Y = k) = q^(k–1) p, 1 ≤ k
We say X has the geometric distribution with parameter (p), which we often designate by X ∼ geometric (p). Now Y = X + 1 or Y – 1 = X. For this reason, it is customary to refer to the distribution for the number of the trial for the first success by saying Y – 1 ∼ geometric (p). The probability of k or more failures before the first success is P(X ≥ k) = q^k. Also

P(X ≥ n + k | X ≥ n) = P(X ≥ n + k) / P(X ≥ n) = q^(n+k) / q^n = q^k = P(X ≥ k)
This suggests that a Bernoulli sequence essentially "starts over" on each trial. If it has failed n times, the probability of failing an additional k or more times before the next success is the same as the initial probability of failing k or more times before the first success.
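A quick numerical check of this "start over" property, with hypothetical values for p, n, and k:

p = 0.25; q = 1 - p;            % hypothetical success probability
n = 3; k = 5;                   % hypothetical numbers of failures
lhs = q^(n + k) / q^n;          % P(X >= n+k | X >= n)
rhs = q^k;                      % P(X >= k)
disp([lhs rhs])                 % both display 0.2373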
A statistician is taking a random sample from a population in which two percent of the members own a BMW automobile. She takes a sample of size 100. What is the probability of finding no BMW owners in the sample?
The sampling process may be viewed as a sequence of Bernoulli trials with probability p = 0.02 of success. The probability of 100 or more failures before the first success is 0.98^100 = 0.1326, or about 1/7.5.
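The computation is immediate in MATLAB:

>> p = 0.98^100
p  = 0.1326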
Negative binomial (m,p). X is the number of failures before the mth success. It is generally more convenient to work with Y = X + m, the number of the trial on which the mth success occurs. An examination of the possible patterns and elementary combinatorics show that

P(Y = k) = C(k–1, m–1) p^m q^(k–m), m ≤ k
There are m – 1 successes in the first k – 1 trials, then a success. Each combination has probability p^m q^(k–m). We have an m-function nbinom to calculate these probabilities.
A player throws a single six-sided die repeatedly. He scores if he throws a 1 or a 6. What is the probability he scores five times in ten or fewer throws?
>> p = sum(nbinom(5,1/3,5:10))
p  = 0.2131
An alternate solution is possible with the use of the binomial distribution. The mth success comes not later than the kth trial iff the number of successes in k trials is greater than or equal to m.
>> P = cbinom(10,1/3,5)
P  = 0.2131
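Both results can be reproduced from the formula P(Y = k) = C(k–1, m–1) p^m q^(k–m) in base MATLAB, without the m-function library; a sketch:

m = 5; p = 1/3; q = 1 - p;      % parameters of the dice example
k = 5:10;                       % trial numbers on which the 5th success may occur
pk = zeros(size(k));
for i = 1:length(k)
    pk(i) = nchoosek(k(i)-1, m-1) * p^m * q^(k(i)-m);  % P(Y = k)
end
P = sum(pk)                     % P(5 <= Y <= 10), approximately 0.2131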
Poisson (μ). This distribution is assumed in a wide variety of applications. It appears as a counting variable for items arriving with exponential interarrival times (see the relationship to the gamma distribution below). For large n and small p (which may not be a value found in a table), the binomial distribution is approximately Poisson (np). Use of the generating function (see Transform Methods) shows the sum of independent Poisson random variables is Poisson. The Poisson distribution is integer valued, with

P(X = k) = e^(–μ) μ^k / k!, 0 ≤ k
Although Poisson probabilities are usually easier to calculate with scientific calculators than binomial probabilities, the use of tables is often quite helpful. As in the case of the binomial distribution, we have two m-functions for calculating Poisson probabilities. These have advantages of speed and parameter range similar to those for ibinom and cbinom.
ipoisson : P(X = k) is calculated by P = ipoisson(mu,k), where k is a row or column vector of integers and the result P is a row matrix of the probabilities.
cpoisson : P(X ≥ k) is calculated by P = cpoisson(mu,k), where k is a row or column vector of integers and the result P is a row matrix of the probabilities.
The number of messages arriving in a one-minute period at a communications network junction is a random variable N ∼ Poisson (130). What is the probability the number of arrivals is greater than or equal to 110, 120, 130, 140, 150, 160?
>> p = cpoisson(130,110:10:160)
p  = 0.9666    0.8209    0.5117    0.2011    0.0461    0.0060
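Without the m-functions, the same tail probabilities can be computed from the Poisson formula; the sketch below works in logarithms (using gammaln) so that μ^k and k! do not overflow at this large μ:

mu = 130; k = 110:10:160;
pmf = @(j) exp(j*log(mu) - mu - gammaln(j + 1));  % P(N = j), evaluated in log form
P = zeros(size(k));
for i = 1:length(k)
    P(i) = 1 - sum(pmf(0:k(i)-1));                % P(N >= k) = 1 - P(N <= k-1)
end
disp(P)                                           % reproduces the values above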
The descriptions of these distributions, along with a number of other facts, are summarized in the table DATA ON SOME COMMON DISTRIBUTIONS in Appendix C.
If the probability mass in the induced distribution is spread smoothly along the real line, with no point mass concentrations, there is a probability density function fX which satisfies

P(X ∈ M) = PX(M) = ∫M fX(t) dt (area under the graph of fX over M)
At each t, fX(t) is the mass per unit length in the probability distribution. The density function has three characteristic properties:

(f1) : fX ≥ 0
(f2) : ∫ fX = 1
(f3) : FX(t) = ∫ fX over (–∞,t], for each t
A random variable (or distribution) which has a density is called absolutely continuous. This term comes from measure theory. We often simply abbreviate as continuous distribution.
There is a technical mathematical description of the condition “spread smoothly with no point mass concentrations.” And strictly speaking the integrals are Lebesgue integrals rather than the ordinary Riemann kind. But for practical cases, the two agree, so that we are free to use ordinary integration techniques.
By the fundamental theorem of calculus,

fX(t) = FX′(t) at every point of continuity of fX
Any integrable, nonnegative function f with ∫f=1 determines a distribution function F , which in turn determines a probability distribution. If ∫f≠1, multiplication by the appropriate positive constant gives a suitable f . An argument based on the Quantile Function shows the existence of a random variable with that distribution.
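As a concrete numerical illustration, take the hypothetical density fX(t) = 3t^2 on [0,1] (zero elsewhere); a base-MATLAB sketch checks property (f2) and the derivative relation above:

t = linspace(0, 1, 1001);
f = 3*t.^2;                     % hypothetical density: nonnegative on [0,1]
total = trapz(t, f)             % numerical integral of f, approximately 1   (f2)
F = cumtrapz(t, f);             % numerical FX(t), the integral of f over [0,t]
err = max(abs(F - t.^3))        % matches the exact FX(t) = t^3 up to discretization error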
In the literature on probability, it is customary to omit the indication of the region of integration when integrating over the whole line. Thus

∫ fX(t) dt = ∫(–∞,∞) fX(t) dt
The first expression is not an indefinite integral. In many situations, fX will be zero outside an interval. Thus, the integrand effectively determines the region of integration.