Introduction to Statistics by Ewa Paszek

Chapter 4 Tests of Statistical Hypotheses

4.1 TEST ABOUT PROPORTIONS


Tests of statistical hypotheses are a very important topic; let us introduce it through an illustration.

Suppose a manufacturer of a certain printed circuit observes that about p=0.05 of the circuits fail. An engineer and a statistician working together suggest some changes that might improve the design of the product. To test this new procedure, it was agreed that n=200 circuits would be produced using the proposed method and then checked. Let Y equal the number of these 200 circuits that fail. Clearly, if the number of failures, Y, is such that Y/200 is about 0.05, then it seems that the new procedure has not resulted in an improvement. If, on the other hand, Y is small, so that Y/200 is about 0.01 or 0.02, we might believe that the new method is better than the old one. Conversely, if Y/200 is 0.08 or 0.09, the proposed method has perhaps caused a greater proportion of failures. What is needed is a formal rule that tells us when to accept the new procedure as an improvement. For example, we could accept the new procedure as an improvement if Y≤5, or equivalently Y/n≤0.025. We note, however, that the probability of failure could still be about p=0.05 even with the new procedure, and yet we could observe 5 or fewer failures in n=200 trials.

That is, we would accept the new method as being an improvement when, in fact, it was not. This decision is a mistake which we call a Type I error. On the other hand, the new procedure might actually improve the product so that p is much smaller, say p=0.02, and yet we could observe y=7 failures so that y/200=0.035. Thus we would not accept the new method as resulting in an improvement when in fact it had. This decision would also be a mistake which we call a Type II error.

If we believe that these trials, using the new procedure, are independent and have about the same probability of failure on each trial, then Y is binomial b(200, p). We wish to make a statistical inference about p using the unbiased estimator p̂ = Y/200. We could also construct a confidence interval, say one that has 95% confidence, obtaining p̂ ± 1.96√(p̂(1−p̂)/200).
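
A minimal computational sketch of this estimate and interval, assuming Python (the helper name approx_conf_interval is illustrative, not part of the text):

```python
# Illustrative sketch: point estimate and approximate 95% confidence
# interval for p, based on y observed failures in n = 200 trials.
import math

def approx_conf_interval(y, n=200, z=1.96):
    """Approximate 95% confidence interval for p (normal approximation)."""
    p_hat = y / n                                     # unbiased estimate Y/n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# For example, y = 7 failures out of 200 circuits:
print(approx_conf_interval(7))    # roughly (0.010, 0.060)
```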

This inference is very appropriate and many statisticians simply do this. If the limits of this confidence interval contain 0.05, they would not say the new procedure is necessarily better, at least until more data are taken. If, on the other hand, the upper limit of this confidence interval is less than 0.05, then they feel 95% confident that the true p is now less than 0.05. Here, in this illustration, we are testing whether or not the probability of failure has decreased from 0.05 when the new manufacturing procedure is used.

The "no change" hypothesis, H0: p=0.05, is called the null hypothesis. Since H0: p=0.05 completely specifies the distribution, it is called a simple hypothesis; thus H0: p=0.05 is a simple null hypothesis.

The research worker's hypothesis, H1: p<0.05, is called the alternative hypothesis. Since H1: p<0.05 does not completely specify the distribution, it is a composite hypothesis because it is composed of many simple hypotheses.

The rule of rejecting H0 and accepting H1 if Y≤5, and otherwise accepting H0, is called a test of a statistical hypothesis.

Clearly, two types of errors can be made:
  • Type I error: Rejecting H0 and accepting H1 when H0 is true;

  • Type II error: Accepting H0 when H1 is true, that is, when H0 is false.

Since, in the example above, we make a Type I error if Y≤5 when in fact p=0.05, we can calculate the probability of this error, which we denote by α and call the significance level of the test. Under our assumptions, it is α = P(Y≤5; p=0.05) = Σ_{y=0}^{5} C(200, y)(0.05)^y(0.95)^(200−y).

Since n is rather large and p is small, these binomial probabilities can be approximated extremely well by Poisson probabilities with λ=200(0.05)=10. That is, from the Poisson table, the probability of the Type I error is α ≈ Σ_{y=0}^{5} (10^y e^(−10))/y! = 0.067.
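
A short numerical sketch of this significance-level calculation, assuming Python with scipy, comparing the exact binomial tail with its Poisson approximation:

```python
# Illustrative sketch: significance level alpha = P(Y <= 5; p = 0.05)
# computed exactly from the binomial distribution and approximately
# from the Poisson distribution with lambda = 200(0.05) = 10.
from scipy.stats import binom, poisson

alpha_exact = binom.cdf(5, n=200, p=0.05)    # exact binomial probability
alpha_poisson = poisson.cdf(5, mu=10)        # Poisson approximation

print(alpha_exact, alpha_poisson)            # roughly 0.062 and 0.067
```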

Thus, the approximate significance level of this test is α=0.067. This value is reasonably small. However, what about the probability of a Type II error in case p has been improved to 0.02, say? This error occurs if Y>5 when, in fact, p=0.02; hence its probability, denoted by β, is β = P(Y>5; p=0.02) = Σ_{y=6}^{200} C(200, y)(0.02)^y(0.98)^(200−y).

Again we use the Poisson approximation, here with λ=200(0.02)=4, to obtain β ≈ 1 − Σ_{y=0}^{5} (4^y e^(−4))/y! = 1 − 0.785 = 0.215.
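
A similar sketch for the Type II error probability, again assuming scipy:

```python
# Illustrative sketch: beta = P(Y > 5; p = 0.02), exactly and via the
# Poisson approximation with lambda = 200(0.02) = 4.
from scipy.stats import binom, poisson

beta_exact = 1 - binom.cdf(5, n=200, p=0.02)    # exact binomial tail
beta_poisson = 1 - poisson.cdf(5, mu=4)         # Poisson approximation

print(beta_exact, beta_poisson)                 # roughly 0.214 and 0.215
```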

The engineers and the statisticians who created this new procedure are probably not too pleased with this answer. That is, they note that if their new procedure of manufacturing circuits has actually decreased the probability of failure to 0.02 from 0.05 (a big improvement), there is still a good chance, 0.215, that H0: p=0.05 is accepted and their improvement rejected. Thus, this test of H0: p=0.05 against H1: p=0.02 is unsatisfactory. Without worrying more about the probability of the Type II error, we next describe a frequently used procedure for testing H0: p=p0, where p0 is some specified probability of success.

This test is based upon the fact that the number of successes, Y, in n independent Bernoulli trials is such that Y/n has an approximate normal distribution, N[p0, p0(1−p0)/n], provided H0: p=p0 is true and n is large. Suppose the alternative hypothesis is H1: p>p0; that is, it has been hypothesized by a research worker that something has been done to increase the probability of success. Consider the test of H0: p=p0 against H1: p>p0 that rejects H0 and accepts H1 if and only if

Z = (Y/n − p0)/√(p0(1−p0)/n) ≥ zα.

That is, if Y/n exceeds p0 by zα standard deviations of Y/n, we reject H0 and accept the hypothesis H1: p>p0. Since, under H0, Z is approximately N(0,1), the approximate probability of this occurring when H0: p=p0 is true is α. That is, the significance level of this test is approximately α. If the alternative is H1: p<p0 instead of H1: p>p0, then the appropriate α-level test is given by Z ≤ −zα. That is, if Y/n is smaller than p0 by zα standard deviations of Y/n, we accept H1: p<p0.
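
A minimal sketch of this approximate test, assuming Python with scipy (the function z_test_proportion and its interface are illustrative):

```python
# Illustrative sketch: approximate alpha-level Z test for a proportion.
from scipy.stats import norm

def z_test_proportion(y, n, p0, alpha=0.05, alternative="greater"):
    """Return Z and whether H0: p = p0 is rejected at level alpha."""
    z = (y / n - p0) / (p0 * (1 - p0) / n) ** 0.5
    z_alpha = norm.ppf(1 - alpha)
    if alternative == "greater":     # H1: p > p0, reject if Z >= z_alpha
        return z, z >= z_alpha
    return z, z <= -z_alpha          # H1: p < p0, reject if Z <= -z_alpha

# For example, y = 15 successes in n = 200 trials, H0: p = 0.05 vs H1: p > 0.05:
print(z_test_proportion(15, 200, 0.05))
```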

In general, without changing the sample size or the type of the test of the hypothesis, a decrease in α causes an increase in β , and a decrease in β causes an increase in α . Both probabilities α and β of the two types of errors can be decreased only by increasing the sample size or, in some way, constructing a better test of the hypothesis.
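
This trade-off can be illustrated numerically in the circuit example: with n=200 fixed, lowering the cutoff c in the rule "reject H0 if Y≤c" decreases α but increases β (computed here at the alternative p=0.02). A sketch, assuming scipy:

```python
# Illustrative sketch of the alpha/beta trade-off with n = 200 fixed.
from scipy.stats import binom

for c in (3, 5, 7):
    alpha = binom.cdf(c, n=200, p=0.05)       # P(reject H0 | p = 0.05)
    beta = 1 - binom.cdf(c, n=200, p=0.02)    # P(accept H0 | p = 0.02)
    print(c, round(alpha, 3), round(beta, 3))
```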

EXAMPLE

Assume that X is N(μ, 100) and that we test H0: μ=60 against H1: μ>60 by rejecting H0 when the sample mean X̄ is at least some constant c. If n=100 and we desire a test with significance level α=0.05, then α = P(X̄ ≥ c; μ=60) = 0.05 means, since X̄ is N(μ, 100/100=1),

P((X̄−60)/1 ≥ (c−60)/1; μ=60) = 0.05 and c−60 = 1.645. Thus c = 61.645. The power function is

K(μ) = P(X̄ ≥ 61.645; μ) = 1 − Φ(61.645 − μ) = Φ(μ − 61.645).

In particular, this means that β at μ=65 is β = 1 − K(65) = Φ(61.645 − 65) = Φ(−3.355) ≈ 0; so, with n=100, both α and β have decreased from their respective original values of 0.1587 and 0.0668 when n=25. Rather than guess at the value of n, an ideal power function determines the sample size. Let us use a critical region of the form x̄ ≥ c. Further, suppose that we want α=0.025 and, when μ=65, β=0.05. Thus, since X̄ is N(μ, 100/n),

0.025 = P(X̄ ≥ c; μ=60) = 1 − Φ((c−60)/(10/√n)) and 0.05 = P(X̄ < c; μ=65) = Φ((c−65)/(10/√n)).

That is, (c−60)/(10/√n) = 1.96 and (c−65)/(10/√n) = −1.645.

Solving these equations simultaneously for c and 10/√n, we obtain c = 60 + 1.96(5/3.605) = 62.718 and 10/√n = 5/3.605 = 1.387.

Thus, √n = 10(3.605)/5 = 7.21 and n = 51.98. Since n must be an integer, we would use n=52 and obtain α=0.025 and β=0.05, approximately.
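
A short sketch of this sample-size calculation, assuming Python with scipy (the variable names are illustrative):

```python
# Illustrative sketch: solve for n and c so that alpha = 0.025 at mu = 60
# and beta = 0.05 at mu = 65, when X is N(mu, 100), i.e. sigma = 10.
from scipy.stats import norm

sigma, mu0, mu1 = 10, 60, 65
z_alpha = norm.ppf(1 - 0.025)     # 1.96
z_beta = norm.ppf(1 - 0.05)       # 1.645

# From (c - mu0)/(sigma/sqrt(n)) = z_alpha and (c - mu1)/(sigma/sqrt(n)) = -z_beta:
n = ((z_alpha + z_beta) * sigma / (mu1 - mu0)) ** 2
c = mu0 + z_alpha * sigma / n ** 0.5

print(n, c)    # roughly 51.98 and 62.72, so n = 52 is used
```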

For a number of years there has been another value associated with a statistical test, and most statistical computer programs automatically print this out; it is called the probability value or, for brevity, p-value. The p-value associated with a test is the probability that we obtain the observed value of the test statistic or a value that is more extreme in the direction of the alternative hypothesis, calculated when H0 is true. Rather than select the critical region ahead of time, the p-value of a test can be reported and the reader then makes a decision.

Say we are testing H0: μ=60 against H1: μ>60 with a sample mean X̄ based on n=52 observations. Suppose that we obtain an observed sample mean of x̄ = 62.75. If we compute the probability of obtaining an x̄ of that value of 62.75 or greater when μ=60, then we obtain the p-value associated with x̄ = 62.75. That is,

p-value = P(X̄ ≥ 62.75; μ=60) = P((X̄−60)/(10/√52) ≥ (62.75−60)/(10/√52); μ=60) = 1 − Φ(1.983) ≈ 0.0237.

If this p-value is small, we tend to reject the hypothesis H0: μ=60. For example, rejection of H0: μ=60 if the p-value is less than or equal to 0.025 is exactly the same as rejection if x̄ ≥ 62.718. That is, x̄ = 62.718 has a p-value of 0.025. To help keep the definition of the p-value in mind, we note that it can be thought of as that tail-end probability, under H0, of the distribution of the statistic, here X̄, beyond the observed value of the statistic. See Figure 4.1 for the p-value associated with x̄ = 62.75.

Figure 4.1: The p-value associated with x̄ = 62.75.
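
A minimal sketch of the p-value calculation above, assuming scipy:

```python
# Illustrative sketch: one-sided p-value for the observed mean 62.75
# when X-bar is N(mu, 100/52) under H0: mu = 60.
from scipy.stats import norm

n, mu0, sigma, x_bar = 52, 60, 10, 62.75

z = (x_bar - mu0) / (sigma / n ** 0.5)
p_value = 1 - norm.cdf(z)        # H1: mu > 60, so the upper tail

print(z, p_value)                # roughly 1.98 and 0.024
```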
Example 4.1

Suppose that in the past, a golfer's scores have been (approximately) normally distributed with mean μ=90 and σ²=9. After taking some lessons, the golfer has reason to believe that the mean μ has decreased. (We assume that σ² is still about 9.) To test the null hypothesis H0: μ=90 against the alternative hypothesis H1: μ<90, the golfer plays 16 games and computes the sample mean x̄. If x̄ is small, say x̄ ≤ c, then H0 is rejected and H1 accepted; that is, it seems as if the mean μ has actually decreased after the lessons. If c=88.5, then the power function of the test is

K(μ) = P(X̄ ≤ 88.5; μ) = Φ((88.5 − μ)/(3/4)),

because 9/16 is the variance of X̄. In particular, α = K(90) = Φ(−2) = 1 − 0.9772 = 0.0228.

If, in fact, the true mean is equal to μ=88 after the lessons, the power is K(88) = Φ(2/3) = 0.7475. If μ=87, then K(87) = Φ(2) = 0.9772. An observed sample mean x̄ has a

p-value = P(X̄ ≤ x̄; μ=90) = Φ((x̄ − 90)/(3/4)),

and an observed mean small enough that this p-value falls below 0.01 would lead to a rejection at α=0.0228 (or even α=0.01).
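
A minimal sketch of the power function in this example, assuming scipy (the helper K is illustrative):

```python
# Illustrative sketch: power function K(mu) = P(X-bar <= 88.5; mu)
# for Example 4.1, where X-bar is N(mu, 9/16), i.e. standard deviation 3/4.
from scipy.stats import norm

def K(mu, c=88.5, sd=0.75):
    """Probability of rejecting H0: mu = 90 when the true mean is mu."""
    return norm.cdf((c - mu) / sd)

print(K(90))    # alpha, about 0.0228
print(K(88))    # power at mu = 88, about 0.7475
print(K(87))    # power at mu = 87, about 0.9772
```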

4.2 TESTS ABOUT ONE MEAN AND ONE VARIANCE