Introduction to Statistics by Ewa Paszek - HTML preview

PLEASE NOTE: This is an HTML preview only and some elements such as links or page numbers may be incorrect.
Download the book in PDF, ePub for a complete version.

Chapter 5Pseudo - Numbers

5.1PSEUDO-NUMBERS*

UNIFORM PSEUDO-RANDOM VARIABLE GENERATION

In this paragraph, our goals will be to look at, in more detail, how and whether particular types of pseudo-random variable generators work, and how, if necessary, we can implement a generator of our own choosing. Below a list of requirements is listed for our uniform random variable generator:

  1. A uniform marginal distribution,

  2. Independence of the uniform variables,

  3. Repeatability and portability,

  4. Computational speed.

CURRENT ALGORITHMS

The generation of pseudo-random variates through algorithmic methods is a mature field in the sense that a great deal is known theoretically about different classes of algorithms, and in the sense that particular algorithms in each of those classes have been shown, upon testing, to have good statistical properties. In this section, let describe the main classes of generators, and then let make specific recommendation about which generators should be implemented.

Congruential Generators

The most widely used and best understood class of pseudo-random number generators are those based on the linear congruential method introduced by Lehmer (1951). Such generators are based on the following formula:

(5.1)
_autogen-svg2png-0001.png

where Ui ,i=1,2,... are the output random integers; U0 is the chosen starting value for the recursion, called the seed and a,c, and m are prechosen constants.

Notice That

to convert to uniform ( 0,1 ) variates, we need only divide by modulus m, that is, we use the sequence _autogen-svg2png-0005.png .

The following properties of the algorithm are worth stating explicitly:
  1. Because of the “mod m” operation (for background on modular operations, see Knuth, (1981) ), the only possible values the algorithm can produce are the integers 0,1,2,...,m−1. This follows because, by definition, x mod m is the remainder after x is divided by m.

  2. Because the current random integer Ui depends only on the previous random integer Ui−1 once a previous value has been repeated, the entire sequence after it must be repeated. Such a repeating sequence is called a cycle, and its period is the cycle length. Clearly, the maximum period of the congruential generator is m. For given choices of a, c, and m, a generator may contain many short cycles, (see the Example 1 below), and the cycle you enter will depend on the seed you start with. Notice that the generator with many short cycles is not a good one, since the output sequence will be one of a number of short series, each of which may not be uniformly distributed or randomly dispersed on the line or the plane. Moreover, if the simulation is long enough to cause the random numbers to repeat because of the short cycle length, the outputs will not be independent.

  3. If we are concern with a uniform ( 0,1 ) variates, the finest partition of the interval ( 0,1 ) that this generator can provide is [ 0,1/m,2/m,...,( m−1/m ) ] . This is, of course, not truly a uniform ( 0,1 ) distribution since, for any k in ( 0,m−1 ) , we have P[ k/m<U<( k+1 )/m ]=0 , not 1/m are required by theory for continuous random variables.

  4. Choices of a,c, and m, will determine not only the fineness of the partition of ( 0,1 ) and the cycle length, and therefore, the uniformity of the marginal distribution, but also the independence properties of the output sequence. Properly choosing a,c, and m is a science that incorporates both theoretical results and empirical tests. The first rule is to select the modulus m to be “as large as possible”, so that there is some hope to address point 3 above and to generate uniform variates with an approximately uniform marginal distribution. However, simply having m large is not enough; one may still find that the generator has many short cycles, or that the sequence is not approximately independent. See example 1 below.

Example 5.1

Consider

() Ui =2Ui−1 mod⁡2 32

Where a seed of the form 2k creates a loop containing only integers that are powers of 2, or

()_autogen-svg2png-0019.png

which generates the nonrandom sequence of increasing integers. Therefore, the second equation gives a generator that has the maximum possible cycle length but is useless for simulating a random sequence.

Fortunately, one a value of the m has been selected; theoretical results exist that give conditions for choosing values of the multiplier a and the additive constant c such that all the possible integers, 0 through m−1 , are generated before any are repeated.

Notice, that

this does not eliminate the second counterexample above, which already has the maximal cycle length, but is a useless random number generator.

THEOREM I

A linear congruential generator will have maximal cycle length m, if and only if:

  • c is nonzero and is relatively prime to m (i.e., c and m have no common prime factors).

  • ( amod⁡q )=1 for each prime factor q of m.

  • ( amod⁡4 )=1 if 4 is a factor of m.

PROOF

SEE
Knuth (1981, p.16).

As a mathematical note, c is called relatively prime to m if and only if c and m have no common divisor other than 1, which is equivalent to c and m having no common prime factor.

A related result concerns the case of c chosen to be 0. This case does not conform to condition in a Theorem I, a value Ui of zero must be avoided because the generator will continue to produce zero after the first occurrence of a zero. In particular, a seed of zero is not allowable. By Theorem I, a generator with c=0 , which is called a multiplicative congruential generator, cannot have maximal cycle length m. However, By Theorem II. It can have cycle length m−1 .

THEOREM II

If c=0 in a linear congruential generator, then Ui =0 can never be included in a cycle, since the 0 will always repeat. However, the generator will cycle through all m−1 integers in the set ( amod⁡q ) if and only if:

  • m is a prime integer and

  • m is a primitive element modulo m .

PROOF

SEE
Knuth (1981, p.19).

A formal definition or primitive elements modulo m, as well as theoretical results for finding them, are given in Knuth (1981). In effect, when m is a prime, a is a primitive element if the cycle is of length m−1 . The results of Theorem II are not intuitively useful, but for our purposes, it is enough to note that such primitive elements exist and have veen computed by researchers,

SEE

e.g., Table24.8 in Abramowitz and Stegun, 1965.

Hence, we now must select one of two possibilities:

  • Choose a, c, and m according to Theorem I and work with a generator whose cycle length is known to be m.

  • Choose c=0 , take a and m according to Theorem II, use a number other than zero as the seed, and work with a generator whose cycle length is known to be m−1 . A generator satisfying these conditions is known as a prime-modulus multiplicative congruential generator and, because of the simpler computation, it usually has an advantage in terms of speed over the mixed congruential generator.

Another method frequency speeding up a random number generator that has c=0 is to choose the modulus m to be computationally convenient. For instance, consider m=2k . This is clearly not a prime number, but on a computer the modulus operation becomes a bit-shift operation in machine code. In such cases, Theorem III gives a guise to the maximal cycle length.

THEOREM III

If c=0 and m=2k with k>2 , then the maximal possible cycle length is 2k−2 . This is achieved if and only if two conditions hold:

  • a is a primitive element modulo m.

  • the seed is odd.

PROOF

SEE
Knuth (1981, p.19).

Notice that we sacrifice some of the cycle length and, as we will se in Theorem IV, we also lose some randomness in the low-order bits of the random variates. Having use any of Theorems I, II, or III to select triples (a, c, m) that lead to generators with sufficiently long cycles of known length, we can ask which triple gives the most random (i.e., approximately independent ) sequence. Although some theoretical results exist for generators as a whole, these are generally too weak to eliminate any but the worst generators. Marsaglia (1985) and Knuth(1981, Chap. 3.3.3) are good sources for material on that results.

THEOREM IV

If Ui =aUi−1 mod⁡2k, and we define

() Yi =Ui mod⁡2j ,0<j<k

then

() Yi =aYi−1 mod⁡2j .

In practical terms, this means that the sequence of j-lo-order binary bits of the Ui sequence, namely Yi cycle with cycle length at most 2j. In particular, sequence of the least significant bit (i.e., j=1) in _autogen-svg2png-0045.png must behave as ( 0,0,0,0,... ),( 1,1,1,1,... ),( 0,1,0,1,... ) or ( 1,0,1,0,... ) .

PROOF

SEE
Knuth (1981, pp. 12-14).

Such normal behavior in the low-order bits of a congruential generator with non-prime-modulus m is an undesirably property, which may be aggravated by techniques such as the recycling of uniform variates. It has been observed (Hutchinson, 1966) that prime-modulus multiplicative congruential generators with full cycle (i.e., when m is a positive primitive element) tend to have fairly randomly distributed low-order bits, although no theory exists to explain this.

THEOREM V

If our congruential generator produces the sequence _autogen-svg2png-0048.png, and we look at the following sequence of points in n dimensions:

()_autogen-svg2png-0049.png

then the points will all lie in fewer than ( n|m ) 1/n parallel hyper planes.

PROOF

SEE
Marsaglia (1976).

Given these known limitations of congruential generator, we are still left with the question of how to choose the “best” values for a, c, and m. To do this, researchers have followed a straightforward but time-consuming procedure:

  1. Take values a, c, and m that give a sufficiently long, known cycle length and usa the generator to produce sequences of uniform variates.

  2. Subject the output sequences to batteries of statistical tests for independence and a uniform marginal distribution. Document the results.

  3. Subject the generator to theoretical tests. In particular, the spectral test of Coveyou and MacPherson (1967) is currently widely used and recognized as a very sensitive structural test for distinguishing between good and bad generators. Document the results.

  4. As new, more sensitive tests appear, subject to generator to those tests. Several such tests are discussed in Marsaglia(1985).

5.2PSEUDO-RANDOM VARIABLE GENERATORS, cont.*

PSEUDO-RANDOM VARIABLE GENERATORS, cont.

A Shift-Register Generator

An alternative class of pseudo-numbers generators are shift-register or Tausworthe generators, which have their origins in the work of Golomb (1967). These algorithms operate on n-bit, pseudo-random binary vectors, just as congruential generators operate on pseudo-random integers. To return a uniform ( 0,1 ) variate, the binary vector must be converted to an integer and divided by one plus the largest possible number, 2n.

Fibonacci Generators

The final major class of generators to be considered are the lagged Fibonacci generators, which take their name from the famous Fibonacci sequence Ui =Ui−1 +Ui−2 . This recursion is reminiscent of the congruential generators, which the added feature that the current value depends on the two previous values.

The integer generator based directly on the Fibonacci formula

() 2n

has been investigated, but not found to be satisfactory random. A more general formulation can be given by the equation:

() Ui =Uir Uis ,r≥1,s≥1,rs,

where the symbol ‘square’ represents an arbitrary mathematical operation. We can think of the Ui =0 as either binary vectors, integers, or real numbers between 0 and 1, depending on the operation involved.

As examples:
  1. The Ui =0 are real and dot re