Collaborative Statistics by Robert Gallagher - HTML preview


Chapter 6

The Normal Distribution

6.1 The Normal Distribution1

6.1.1 Student Learning Outcomes

By the end of this chapter, the student should be able to:

• Recognize the normal probability distribution and apply it appropriately.

• Recognize the standard normal probability distribution and apply it appropriately.

• Compare normal probabilities by converting to the standard normal distribution.

6.1.2 Introduction

The normal, a continuous distribution, is the most important of all the distributions. It is widely used

and even more widely abused. Its graph is bell-shaped. You see the bell curve in almost all disciplines.

Some of these include psychology, business, economics, the sciences, nursing, and, of course, mathematics.

Some of your instructors may use the normal distribution to help determine your grade. Most IQ scores are

normally distributed. Often real estate prices fit a normal distribution. The normal distribution is extremely

important but it cannot be applied to everything in the real world.

In this chapter, you will study the normal distribution, the standard normal, and applications associated

with them.

6.1.3 Optional Collaborative Classroom Activity

Your instructor will record the heights of both men and women in your class, separately. Draw histograms

of your data. Then draw a smooth curve through each histogram. Is each curve somewhat bell-shaped? Do

you think that if you had recorded 200 data values for men and 200 for women that the curves would look

bell-shaped? Calculate the mean for each data set. Write the means on the x-axis of the appropriate graph

below the peak. Shade the approximate area that represents the probability that one randomly chosen

male is taller than 72 inches. Shade the approximate area that represents the probability that one randomly

chosen female is shorter than 60 inches. If the total area under each curve is one, does either probability

appear to be more than 0.5?

1This content is available online at <http://cnx.org/content/m16979/1.12/>.


CHAPTER 6. THE NORMAL DISTRIBUTION

The normal distribution has two parameters (two numerical descriptive measures), the mean ( µ) and the

standard deviation ( σ). If X is a quantity to be measured that has a normal distribution with mean ( µ) and

the standard deviation ( σ), we designate this by writing

NORMAL:X∼N ( µ, σ)

The probability density function is a rather complicated function. Do not memorize it. It is not necessary.

\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}} \]

The cumulative distribution function is P (X < x) . It is calculated either by a calculator or a computer or

it is looked up in a table. Technology has made the tables basically obsolete. For that reason, as well as

the fact that there are various table formats, we are not including table instructions in this chapter. See the

NOTE in this chapter in Calculation of Probabilities.
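As a rough illustration of the two functions just described, the sketch below evaluates the density formula directly, compares it with Python's standard-library `statistics.NormalDist`, and then uses the same object for the cumulative probability P (X < x). The use of Python, the helper name `normal_pdf`, and the sample values µ = 5 and σ = 2 are assumptions made for illustration only; the chapter itself relies on a calculator.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density f(x) of N(mu, sigma), written out from the formula above."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -0.5 * ((x - mu) / sigma) ** 2
    return coefficient * math.exp(exponent)

# Illustrative values (assumed, not from the text): mean 5, standard deviation 2.
mu, sigma = 5.0, 2.0
dist = NormalDist(mu, sigma)

density_by_hand = normal_pdf(6.0, mu, sigma)
density_library = dist.pdf(6.0)

# Cumulative distribution function: P(X < x), the area to the left of x.
area_left_of_mean = dist.cdf(mu)

print(density_by_hand, density_library, area_left_of_mean)
```

The two density values should agree to within floating-point error, and the area to the left of the mean should be one half, which matches the symmetry of the curve.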

The curve is symmetrical about a vertical line drawn through the mean, µ. In theory, the mean is the same

as the median since the graph is symmetric about µ. As the notation indicates, the normal distribution

depends only on the mean and the standard deviation. Since the area under the curve must equal one, a

change in the standard deviation, σ, causes a change in the shape of the curve; the curve becomes fatter or

skinnier depending on σ. A change in µ causes the graph to shift to the left or right. This means there are an

infinite number of normal probability distributions. One of special interest is called the standard normal

distribution.
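The effect of each parameter on the curve can be checked numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` and parameter values chosen only for illustration: doubling σ halves the height of the peak (the curve becomes fatter), while changing µ slides the curve without changing its height.

```python
from statistics import NormalDist

# Assumed illustrative parameters: same mean, two different standard deviations.
narrow = NormalDist(mu=0.0, sigma=1.0)
wide = NormalDist(mu=0.0, sigma=2.0)

# Doubling sigma halves the height of the peak, so the curve is flatter and wider.
peak_narrow = narrow.pdf(0.0)
peak_wide = wide.pdf(0.0)

# Changing mu only slides the curve: the peak height is unchanged.
shifted = NormalDist(mu=3.0, sigma=1.0)
peak_shifted = shifted.pdf(3.0)

# Symmetry about the mean: equal heights at equal distances on either side.
left_height = shifted.pdf(3.0 - 1.5)
right_height = shifted.pdf(3.0 + 1.5)

print(peak_narrow, peak_wide, peak_shifted, left_height, right_height)
```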

6.2 The Standard Normal Distribution2

The standard normal distribution is a normal distribution of standardized values called z-scores. A z-

score is measured in units of the standard deviation. For example, if the mean of a normal distribution is

5 and the standard deviation is 2, the value 11 is 3 standard deviations above (or to the right of) the mean.

The calculation is:

x = µ + (z) σ = 5 + (3) (2) = 11

(6.1)

The z-score is 3.

The mean for the standard normal distribution is 0 and the standard deviation is 1. The transformation

z = (x − µ) / σ produces the distribution Z ∼ N (0, 1). The value x comes from a normal distribution with

mean µ and standard deviation σ.

2This content is available online at <http://cnx.org/content/m16986/1.7/>.


6.3 Z-scores3

If X is a normally distributed random variable and X∼N ( µ, σ), then the z-score is:

\[ z = \frac{x - \mu}{\sigma} \tag{6.2} \]

The z-score tells you how many standard deviations that the value x is above (to the right of) or below

(to the left of) the mean, µ. Values of x that are larger than the mean have positive z-scores and values of x

that are smaller than the mean have negative z-scores. If x equals the mean, then x has a z-score of 0.
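The z-score formula and its inverse, x = µ + (z) σ, can be sketched in a few lines of Python. The function names `z_score` and `from_z_score` are hypothetical helpers introduced here for illustration; the numbers are the ones used earlier in the chapter (mean 5, standard deviation 2, x = 11).

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

def from_z_score(z, mu, sigma):
    """Invert the transformation: x = mu + z * sigma."""
    return mu + z * sigma

# Values from the earlier paragraph: mean 5, standard deviation 2, x = 11.
z = z_score(11, 5, 2)
x_back = from_z_score(z, 5, 2)
print(z, x_back)
```

A value equal to the mean gives a z-score of 0, and values below the mean give negative z-scores, as described above.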

Example 6.1

Suppose X ∼ N (5, 6). This says that X is a normally distributed random variable with mean

µ = 5 and standard deviation σ = 6. Suppose x = 17. Then:

\[ z = \frac{x - \mu}{\sigma} = \frac{17 - 5}{6} = 2 \tag{6.3} \]

This means that x = 17 is 2 standard deviations (2 σ) above or to the right of the mean µ = 5.

The standard deviation is σ = 6.

Notice that:

5 + 2 · 6 = 17

(The pattern is µ + z σ = x.)

(6.4)

Now suppose x = 1. Then:

\[ z = \frac{x - \mu}{\sigma} = \frac{1 - 5}{6} \approx -0.67 \quad \text{(rounded to two decimal places)} \tag{6.5} \]

This means that x = 1 is 0.67 standard deviations (− 0.67 σ) below or to the left of the mean

µ = 5. Notice that:

5 + (−0.67) (6) is approximately equal to 1

(This has the pattern µ + (−0.67) σ = 1 )

Summarizing, when z is positive, x is above or to the right of µ and when z is negative, x is to the

left of or below µ.

Example 6.2

Some doctors believe that a person can lose 5 pounds, on the average, in a month by reducing

his/her fat intake and by exercising consistently. Suppose weight loss has a normal distribution.

Let X = the amount of weight lost (in pounds) by a person in a month. Use a standard deviation

of 2 pounds. X∼N (5, 2). Fill in the blanks.

Problem 1

(Solution on p. 275.)

Suppose a person lost 10 pounds in a month. The z-score when x = 10 pounds is z = 2.5

(verify). This z-score tells you that x = 10 is ________ standard deviations to the ________ (right

or left) of the mean _____ (What is the mean?).

Problem 2

(Solution on p. 275.)

Suppose a person gained 3 pounds (a negative weight loss). Then z = __________. This z-score

tells you that x = −3 is ________ standard deviations to the __________ (right or left) of the mean.

Suppose the random variables X and Y have the following normal distributions: X ∼N (5, 6) and

Y ∼ N (2, 1). If x = 17, then z = 2. (This was previously shown.) If y = 4, what is z?

\[ z = \frac{y - \mu}{\sigma} = \frac{4 - 2}{1} = 2, \quad \text{where } \mu = 2 \text{ and } \sigma = 1. \tag{6.6} \]

3This content is available online at <http://cnx.org/content/m16991/1.9/>.


The z-score for y = 4 is z = 2. This means that y = 4 is 2 standard deviations to the right of

the mean. Therefore, x = 17 and y = 4 are both 2 (of their) standard deviations to the right of

their respective means.

The z-score allows us to compare data that are scaled differently. To understand the concept,

suppose X ∼N (5, 6) represents weight gains for one group of people who are trying to gain

weight in a 6 week period and Y ∼N (2, 1) measures the same weight gain for a second group

of people. A negative weight gain would be a weight loss. Since x = 17 and y = 4 are each 2

standard deviations to the right of their means, they represent the same weight gain relative to

their means.

The Empirical Rule

If X is a random variable and has a normal distribution with mean µ and standard deviation σ then the

Empirical Rule says (See the figure below)

• About 68.27% of the x values lie between -1 σ and +1 σ of the mean µ (within 1 standard deviation of

the mean).

• About 95.45% of the x values lie between -2 σ and +2 σ of the mean µ (within 2 standard deviations of

the mean).

• About 99.73% of the x values lie between -3 σ and +3 σ of the mean µ (within 3 standard deviations of

the mean). Notice that almost all the x values lie within 3 standard deviations of the mean.

• The z-scores for +1 σ and –1 σ are +1 and -1, respectively.

• The z-scores for +2 σ and –2 σ are +2 and -2, respectively.

• The z-scores for +3 σ and –3 σ are +3 and -3 respectively.
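The three percentages in the Empirical Rule can be reproduced from the standard normal distribution. The sketch below assumes Python's standard-library `statistics.NormalDist`; the helper name `area_within` is hypothetical and used only for this illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(0.0, 1.0)

def area_within(k):
    """P(-k < Z < k): the area within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")
```

Because the areas are computed on z-scores, the same three percentages apply to any normal distribution, whatever its mean and standard deviation.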

Example 6.3

Suppose X has a normal distribution with mean 50 and standard deviation 6.

• About 68.27% of the x values lie between µ − 1 σ = 50 − 6 = 44 and µ + 1 σ = 50 + 6 = 56. The values 44

and 56 are within 1 standard deviation of the mean 50. The z-scores are -1 and +1 for 44 and

56, respectively.

• About 95.45% of the x values lie between µ − 2 σ = 50 − 12 = 38 and µ + 2 σ = 50 + 12 = 62. The values

38 and 62 are within 2 standard deviations of the mean 50. The z-scores are -2 and +2 for 38

and 62, respectively.

• About 99.73% of the x values lie between µ − 3 σ = 50 − 18 = 32 and µ + 3 σ = 50 + 18 = 68. The values

32 and 68 are within 3 standard deviations of the mean 50. The z-scores are -3 and +3 for 32

and 68, respectively.


6.4 Areas to the Left and Right of x4

The arrow in the graph below points to the area to the left of x. This area is represented by the probability

P (X < x). Normal tables, computers, and calculators provide or calculate the probability P (X < x).

The area to the right is then P (X > x) = 1 − P (X < x).

Remember, P (X < x) = Area to the left of the vertical line through x.

P (X > x) = 1 − P (X < x) = Area to the right of the vertical line through x.

P (X < x) is the same as P (X ≤ x) and P (X > x) is the same as P (X ≥ x) for continuous distributions.

6.5 Calculations of Probabilities5

Probabilities are calculated by using technology. There are instructions in the chapter for the TI-83+ and

TI-84 calculators.

NOTE: In the Table of Contents for Collaborative Statistics, entry 15. Tables has a link to a table

of normal probabilities. Use the probability tables if so desired, instead of a calculator. The tables

include instructions for how to use them.

Example 6.4

If the area to the left is 0.0228, then the area to the right is 1 − 0.0228 = 0.9772.

Example 6.5

The final exam scores in a statistics class were normally distributed with a mean of 63 and a

standard deviation of 5.

Problem 1

Find the probability that a randomly selected student scored more than 65 on the exam.

Solution

Let X = a score on the final exam. X∼N (63, 5), where µ = 63 and σ = 5

Draw a graph.

Then, find P (x > 65).

P (x > 65) = 0.3446 (calculator or computer)

4This content is available online at <http://cnx.org/content/m16976/1.5/>.

5This content is available online at <http://cnx.org/content/m16977/1.12/>.


The probability that one student scores more than 65 is 0.3446.

Using the TI-83+ or the TI-84 calculators, the calculation is as follows. Go into 2nd DISTR.

After pressing 2nd DISTR, press 2:normalcdf.

The syntax for the instruction is shown below.

normalcdf(lower value, upper value, mean, standard deviation) For this problem:

normalcdf(65,1E99,63,5) = 0.3446. You get 1E99 ( = 10^99) by pressing 1, the EE key (a 2nd key) and then 99.

Or, you can enter 10^99 instead. The number 10^99 is way out in the right tail of the normal curve.

We are calculating the area between 65 and 10^99. In some instances, the lower number of the area

might be -1E99 ( = −10^99). The number −10^99 is way out in the left tail of the normal curve.

HISTORICAL NOTE: The TI probability program calculates a z-score and then the probability from

the z-score. Before technology, the z-score was looked up in a standard normal probability table

(because the math involved is too cumbersome) to find the probability. In this example, a standard

normal table with area to the left of the z-score was used. You calculate the z-score and look up

the area to the left. The probability is the area to the right.

\[ z = \frac{65 - 63}{5} = 0.4 \]

Area to the left is 0.6554. P (x > 65) = P (z > 0.4) = 1 − 0.6554 = 0.3446
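For readers working on a computer rather than a TI calculator, the same area can be approximated in Python. This is a sketch under stated assumptions: it uses the standard-library `statistics.NormalDist`, and the function `normalcdf` below is a hypothetical helper named after the calculator command, not part of any Python library. It computes P (x > 65) directly and again by the z-score route described in the historical note.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean, sd):
    """Area under N(mean, sd) between lower and upper (hypothetical helper
    named after the calculator command; not part of any Python library)."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

# Example 6.5, Problem 1: P(X > 65) for X ~ N(63, 5).
# 1e99 plays the same role as 1E99 on the calculator: far out in the right tail.
p_more_than_65 = normalcdf(65, 1e99, 63, 5)

# The same answer by the table route: standardize, then take the complement.
z = (65 - 63) / 5
p_via_z = 1 - NormalDist().cdf(z)

print(round(p_more_than_65, 4), round(p_via_z, 4))
```

Both routes should round to 0.3446, the value given in the solution.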

Problem 2

Find the probability that a randomly selected student scored less than 85.

Solution

Draw a graph.

Then find P (x < 85). Shade the graph. P (x < 85) = 1 (calculator or computer)

The probability that one student scores less than 85 is approximately 1 (or 100%).

The TI-instructions and answer are as follows:

normalcdf(0,85,63,5) = 1 (rounds to 1)

Problem 3

Find the 90th percentile (that is, find the score k that has 90 % of the scores below k and 10% of

the scores above k).

Solution

Find the 90th percentile. For each problem or part of a problem, draw a new graph. Draw the

x-axis. Shade the area that corresponds to the 90th percentile.

Let k = the 90th percentile. k is located on the x-axis. P (x < k) is the area to the left of k. The 90th

percentile k separates the exam scores into those that are the same or lower than k and those that


are the same or higher. Ninety percent of the test scores are the same or lower than k and 10% are

the same or higher. k is often called a critical value.

k = 69.4 (calculator or computer)

The 90th percentile is 69.4. This means that 90% of the test scores fall at or below 69.4 and 10% fall

at or above. For the TI-83+ or TI-84 calculators, use invNorm in 2nd DISTR. invNorm(area to the

left, mean, standard deviation) For this problem, invNorm(0.90,63,5) = 69.4

Problem 4

Find the 70th percentile (that is, find the score k such that 70% of scores are below k and 30% of

the scores are above k).

Solution

Find the 70th percentile.

Draw a new graph and label it appropriately. k = 65.6

The 70th percentile is 65.6. This means that 70% of the test scores fall at or below 65.6 and 30% fall

at or above.

invNorm(0.70,63,5) = 65.6
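The percentile calculations in Problems 3 and 4 can also be sketched in Python. The example assumes the standard-library `statistics.NormalDist`, whose `inv_cdf` method plays the role of invNorm here: it takes the area to the left and returns the corresponding x value. The variable names are illustrative.

```python
from statistics import NormalDist

exam_scores = NormalDist(mu=63, sigma=5)

# inv_cdf takes the area to the left and returns the matching x value.
k_90 = exam_scores.inv_cdf(0.90)
k_70 = exam_scores.inv_cdf(0.70)

# Sanity check: the area to the left of k_90 should come back as 0.90.
area_check = exam_scores.cdf(k_90)

print(round(k_90, 1), round(k_70, 1), round(area_check, 4))
```

Feeding a percentile back into the cumulative distribution function is a quick way to confirm that the right area was requested.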

Example 6.6

A computer is used for office work at home, research, communication, personal finances, education,

entertainment, social networking and a myriad of other things. Suppose that the average

number of hours a household personal computer is used for entertainment is 2 hours per day.

Assume the times for entertainment are normally distributed and the standard deviation for the

times is half an hour.

Problem 1

Find the probability that a household personal computer is used between 1.8 and 2.75 hours per

day.

Solution

Let X = the amount of time (in hours) a household personal computer is used for entertainment.

X∼N (2, 0.5) where µ = 2 and σ = 0.5.

Find P (1.8 < x < 2.75).

The probability for which you are looking is the area between x = 1.8 and x = 2.75.

P (1.8 < x < 2.75) = 0.5886


normalcdf(1.8,2.75,2,0.5) = 0.5886

The probability that a household personal computer is used between 1.8 and 2.75 hours per day

for entertainment is 0.5886.
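The area between two values can be found either on the original scale or after converting both endpoints to z-scores; the two answers should agree. The sketch below, which assumes Python's standard-library `statistics.NormalDist` and uses illustrative variable names, checks this for the values in this problem.

```python
from statistics import NormalDist

mu, sigma = 2.0, 0.5          # Example 6.6: hours of entertainment use per day
lower, upper = 1.8, 2.75

# Direct calculation on X ~ N(2, 0.5).
usage = NormalDist(mu, sigma)
p_direct = usage.cdf(upper) - usage.cdf(lower)

# Same area after converting both endpoints to z-scores.
z_lower = (lower - mu) / sigma
z_upper = (upper - mu) / sigma
standard_normal = NormalDist()
p_standardized = standard_normal.cdf(z_upper) - standard_normal.cdf(z_lower)

print(round(p_direct, 4), round(p_standardized, 4))
```

The endpoints correspond to z-scores of about −0.4 and 1.5, so the probability is the area under the standard normal curve between those two values.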

Problem 2

Find the maximum number of hours per day that the bottom quartile of households use a personal

computer for entertainment.

Solution

To find the maximum number of hours per day that the bottom quartile of households uses a

personal computer for entertainment, find the 25th percentile, k, where P (x < k) = 0.25.

invNorm(0.25,2,.5) = 1.66

The maximum number of hours per day that the bottom quartile of households uses a personal

computer for entertainment is 1.66 hours.


6.6 Summary of Formulas6

Formula 6.1: Normal Probability Distribution

X∼N ( µ, σ)

µ = the mean

σ = the standard deviation

Formula 6.2: Standard Normal Probability Distribution

Z∼N (0, 1)

z = a standardized value (z-score)

mean = 0

standard deviation = 1

Formula 6.3: Finding the kth Percentile

To find the kth percentile when the z-score is known: k = µ + (z) σ

Formula 6.4: z-score

\[ z = \frac{x - \mu}{\sigma} \]

Formula 6.5: Finding the area to the left

The area to the left: P (X < x)

Formula 6.6: Finding the area to the right

The area to the right: P (X > x) = 1 − P (X < x)

6This content is available online at <http://cnx.org/content/m16987/1.5/>.


6.7 Practice: The Normal Distribution7

6.7.1 Student Learning Outcomes

• The student will analyze data following a normal distribution.

6.7.2 Given

The life of Sunshine CD players is normally distributed with a mean of 4.1 years and a standard deviation

of 1.3 years. A CD player is guaranteed for 3 years. We are interested in the length of time a CD player

lasts.

6.7.3 Normal Distribution

Exercise 6.7.1

Define the Random Variable X in words. X =

Exercise 6.7.2

X∼

Exercise 6.7.3

(Solution on p. 275.)

Find the probability that a CD player will break down during the guarantee period.

a. Sketch the situation. Label and scale the axes. Shade the region corresponding to the probabil-

ity.

Figure 6.1

b. P (0 < x < _________) = _________ (Use zero (0) for the minimum value of x.)

Exercise 6.7.4

(Solution on p. 275.)

Find the probability that a CD player will last between 2.8 and 6 years.

a. Sketch the situation. Label and scale the axes. Shade the region corresponding to the probabil-

ity.

7This content is available online at <http://cnx.org/content/m16983/1.10/>.


Figure 6.2

b. P (_______ < x < _______) = _________

Exercise 6.7.5

(Solution on p. 275.)

Find the 70th percentile of the distribution for the time a CD player lasts.

a. Sketch the situation. Label and scale the axes. Shade the region corresponding to the lower

70%.

Figure 6.3

b. P (x < k) = _________. Therefore, k = __________.


6.8 Homework8

Exercise 6.8.1

(Solution on p. 275.)

According to a study done by De Anza students, the height for Asian adult males is normally

distributed with an average of 66 inches and a standard deviation of 2.5 inches. Suppose one

Asian adult male is randomly chosen. Let X =height of the individual.

a. X∼_______(_______,_______)

b. Find the probability that the person is between 65 and 69 inches. Include a sketch of the graph

and write a probability statement.

c. Would you expect to meet many Asian adult males over 72 inches? Explain why or why not,

and justify your answer numerically.

d. The middle 40% of heights fall between what two values? Sketch the graph and write the

probability state