Business Research Methodology by SRINIVAS R RAO - HTML preview

observed and expected frequencies, the greater shall be the value of χ2.

The computed value of χ2 is compared with the table value of χ2 for

given degrees of freedom at a certain specified level of significance. If at

the stated level, the calculated value of χ2 is less than the table value, the

difference between theory and observation is not considered as significant.

The following observation may be made with regard to the χ2 distribution:-

i. The sum of the differences between the observed and expected

frequencies is always zero. Symbolically, ∑(O – E) = ∑O – ∑E

= N – N = 0

ii. The χ2 test depends only on the set of observed and expected frequencies

and on degrees of freedom v. It is a non-parametric test.

iii. χ2 distribution is a limiting approximation of the multinomial

distribution.

iv. Even though χ2 distribution is essentially a continuous distribution it

can be applied to discrete random variables whose frequencies can be

counted and tabulated with or without grouping.

The Chi-Square Distribution

For large sample sizes, the sampling distribution of χ2 can be closely

approximated by a continuous curve known as the Chi-square distribution.

The probability function of χ2 distribution is:

$$f(\chi^2) = C\,(\chi^2)^{\frac{v}{2} - 1}\, e^{-\chi^2/2}$$

Where

e = 2.71828, v = number of degrees of freedom, C = a constant

depending only on v.
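
As a rough numerical check on the probability function above, the following Python sketch evaluates the density and estimates the area under the curve. It assumes the standard normalising constant C = 1 / (2^(v/2) Γ(v/2)), which the text does not spell out; the function names are illustrative only.

```python
import math

def chi_square_pdf(x, v):
    """Density of the chi-square distribution with v degrees of freedom at x > 0."""
    # Assumption (not stated in the text): C = 1 / (2**(v/2) * Gamma(v/2)).
    c = 1.0 / (2.0 ** (v / 2.0) * math.gamma(v / 2.0))
    return c * x ** (v / 2.0 - 1.0) * math.exp(-x / 2.0)

def area_under_curve(v, upper=100.0, steps=100_000):
    """Midpoint-rule estimate of the area under the density from 0 to `upper`."""
    h = upper / steps
    return sum(chi_square_pdf((i + 0.5) * h, v) for i in range(steps)) * h

# The area should be very close to 1 for, say, v = 5.
print(area_under_curve(5))
```

The midpoint rule is used only because it avoids evaluating the density at x = 0, where the curve for v = 1 is unbounded.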

The χ2 distribution has only one parameter, v, the number of

degrees of freedom. As in the case of the t-distribution, there is a distribution

for each different number of degrees of freedom. For a very small number

of degrees of freedom, the Chi-square distribution is severely skewed

to the right. As the number of degrees of freedom increases, the curve

rapidly becomes more symmetrical. For large values of v the Chi-square

distribution is closely approximated by the normal curve.

The following diagram gives χ2 distribution for 1, 5 and 10 degrees of

freedom:

[Figure: χ2 Distribution. Density curves f(χ2) for v = 1, v = 5 and v = 10, plotted against χ2 from 0 to 22.]

It is clear from the given diagram that as the degrees of freedom

increase, the curve becomes more and more symmetric. The Chi-square

distribution is a probability distribution and the total area under the curve

in each Chi-square distribution is unity.

Properties of χ2 Distribution

The main properties of χ2 distribution are:-

(i) The mean of the χ2 distribution is equal to the number of degrees of freedom, i.e., $\bar{X} = v$

(ii) The variance of the χ2 distribution is twice the degrees of freedom, i.e., Variance = 2v

(iii) $\mu_1 = 0$

(iv) $\mu_2 = 2v$

(v) $\mu_3 = 8v$

(vi) $\mu_4 = 48v + 12v^2$

(vii) $\beta_1 = \dfrac{\mu_3^2}{\mu_2^3} = \dfrac{64v^2}{8v^3} = \dfrac{8}{v}$

(viii) $\beta_2 = \dfrac{\mu_4}{\mu_2^2} = \dfrac{48v + 12v^2}{4v^2} = 3 + \dfrac{12}{v}$
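
The two shape coefficients can be checked directly from the central moments listed above. The sketch below uses exact fractions so that the simplifications 8/v and 3 + 12/v are visible; the function name is an illustrative choice, not part of the text.

```python
from fractions import Fraction

def chi_square_shape(v):
    """Return (beta1, beta2) for a chi-square distribution with integer v > 0."""
    mu2 = 2 * v                    # second central moment (the variance)
    mu3 = 8 * v                    # third central moment
    mu4 = 48 * v + 12 * v ** 2     # fourth central moment
    beta1 = Fraction(mu3 ** 2, mu2 ** 3)   # simplifies to 8 / v
    beta2 = Fraction(mu4, mu2 ** 2)        # simplifies to 3 + 12 / v
    return beta1, beta2

for v in (1, 5, 10, 30):
    b1, b2 = chi_square_shape(v)
    print(v, float(b1), float(b2))
```

As v grows, the first coefficient moves towards 0 and the second towards 3, the values for a normal curve, which agrees with the earlier remark that the distribution becomes more symmetrical as the degrees of freedom increase.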

The table values of χ2 are available only up to 30 degrees of freedom.

For degrees of freedom greater than 30, the distribution of χ2 approximates

the normal distribution, and the approximation is acceptably close. The mean

of the distribution of √(2χ2) is √(2v – 1), and the standard deviation is

equal to 1. Thus the application of the test is simple, for the deviation of

√(2χ2) from √(2v – 1) may be interpreted as a normal deviate with unit

standard deviation. That is,

$$Z = \sqrt{2\chi^2} - \sqrt{2v - 1}$$
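
A minimal sketch of this large-sample approximation is given below. The input values (χ2 = 50 with v = 40) are hypothetical and serve only to show the arithmetic; the function name is likewise an assumption.

```python
import math

def chi_square_to_z(chi2, v):
    """Normal deviate Z = sqrt(2 * chi2) - sqrt(2 * v - 1), intended for v > 30."""
    return math.sqrt(2.0 * chi2) - math.sqrt(2.0 * v - 1.0)

# Hypothetical example: chi-square of 50 on 40 degrees of freedom.
z = chi_square_to_z(50.0, 40)
print(round(z, 3))
```

The resulting Z is then referred to the normal table in the usual way, for instance against 1.645 for a one-tailed test at the 5% level.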

Alternative Method Of Obtaining The Value of χ2

In a 2x2 table where the cell frequencies and marginal totals are as below:

    a          b          (a+b)
    c          d          (c+d)
    (a+c)      (b+d)      N

Here N is the total frequency and ad is the larger cross-product. The value

of χ2 can easily be obtained by the following formula:

$$\chi^2 = \frac{N\,(ad - bc)^2}{(a + c)(b + d)(c + d)(a + b)}$$

or, with Yates's correction,

$$\chi^2 = \frac{N\,\left(ad - bc - \tfrac{1}{2}N\right)^2}{(a + c)(b + d)(c + d)(a + b)}$$
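
The shortcut formula, with and without Yates's correction, can be sketched in a few lines of Python. The function name and the sample counts are hypothetical. Taking the absolute value of ad – bc is a convenience that has the same effect as labelling the larger cross-product ad; clamping the corrected difference at zero is a further assumption made here so that the correction never overshoots.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square for a 2x2 table [[a, b], [c, d]] by the shortcut formula."""
    n = a + b + c + d
    diff = abs(a * d - b * c)             # same as taking ad as the larger cross-product
    if yates:
        diff = max(diff - n / 2.0, 0.0)   # Yates's correction: subtract N/2 (clamped at 0)
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    return n * diff ** 2 / margins

# Hypothetical counts, for illustration only.
print(chi_square_2x2(30, 10, 20, 40))              # uncorrected
print(chi_square_2x2(30, 10, 20, 40, yates=True))  # with Yates's correction
```

The corrected value is always somewhat smaller than the uncorrected one, which makes the test more conservative.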

Conditions for Applying χ2 Test:

The main conditions considered for employing the χ2 test are:

(i) N must be reasonably large to ensure the similarity between the

theoretically correct distribution and our sampling distribution of χ2.

(ii) No theoretical (expected) cell frequency should be small. When the

expected frequencies are too small, the value of χ2 will be overestimated

and will result in too many rejections of the null hypothesis. To avoid

making incorrect inferences, a general rule is followed that an expected

frequency of less than 5 in one cell of a contingency table is too small to use.

When the table contains more than one cell with an expected frequency of

less than 5, such cells are combined with the preceding or succeeding cell so

that the resulting sum is 5 or more. However, in doing so, we reduce the number

of categories of data and gain less information from the contingency table.

(iii) The constraints on the cell frequencies, if any, should be linear, i.e.,

they should not involve squares or higher powers of the frequencies. A

constraint such as ∑O = ∑E = N is linear.
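
The pooling rule in condition (ii) can be sketched as follows for a one-way frequency distribution. This is only one possible reading of "add with the preceding or succeeding frequency": classes are merged from left to right until each pooled expected frequency reaches 5, and any leftover at the end is added to the last pooled class. The function name and the data are hypothetical.

```python
def pool_small_classes(observed, expected, minimum=5.0):
    """Merge adjacent classes until every pooled expected frequency is >= minimum."""
    pooled_o, pooled_e = [], []
    acc_o, acc_e = 0, 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= minimum:              # this pooled class is now large enough
            pooled_o.append(acc_o)
            pooled_e.append(acc_e)
            acc_o, acc_e = 0, 0.0
    if acc_e > 0:                         # leftover tail: fold it into the last class
        if pooled_e:
            pooled_o[-1] += acc_o
            pooled_e[-1] += acc_e
        else:
            pooled_o.append(acc_o)
            pooled_e.append(acc_e)
    return pooled_o, pooled_e

# Hypothetical frequencies, for illustration only.
obs, exp = pool_small_classes([2, 3, 10, 12, 4, 1], [1.5, 4.0, 9.5, 11.0, 3.5, 2.5])
print(obs, exp)
```

Note that the degrees of freedom must be counted from the number of classes that remain after pooling, which is the loss of information the condition refers to.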

Uses of χ2 test:

The main uses of χ2 test are:

i. χ2 test as a test of independence. With the help of χ2 test, we can find

out whether two or more attributes are associated or not. Let’s assume

that we have n observations classified according to some attributes.

We may ask whether the attributes are related or independent. Thus,

we can find out whether there is any association between skin colour

of husband and wife. To examine the attributes that are associated,

we formulate the null hypothesis that there is no association against

the alternative hypothesis that there is an association between the

attributes under study. If the calculated value of χ2 is less than the

table value at a certain level of significance, we say that the result of the

experiment provides no evidence for doubting the hypothesis. On the

other hand, if the calculated value of χ2 is greater than the table value

at a certain level of significance, the results of the experiment do not

support the hypothesis.

ii. χ2 test as a test of goodness of fit. The test is so called because it enables

us to ascertain how appropriately the theoretical distributions such as

binomial, Poisson, Normal, etc., fit empirical distributions. When an

ideal frequency curve whether normal or some other type is fitted to

the data, we are interested in finding out how well this curve fits with

the observed facts. A test of the concordance of the two can be made

just by inspection, but such a test is obviously inadequate. Precision

can be secured by applying the χ2 test.

iii. χ2 test as a test of homogeneity. The χ2 test of homogeneity is an

extension of the chi-square test of independence. Tests of homogeneity

are designed to determine whether two or more independent random

samples are drawn from the same population or from different

populations. Instead of the one sample that we use in the independence

problem, we shall now have two or more samples. For example, we may be

interested in finding out whether or not university students from various

income groups, i.e., poor, middle and richer income groups, are homogeneous

in performance in the examination.
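
To make the goodness-of-fit use concrete, the sketch below computes ∑(O – E)2/E for a one-way classification. The data are hypothetical: 120 throws of a die, tested against the theoretical expectation of 20 throws per face. The function name is an illustrative choice.

```python
def goodness_of_fit(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of classes")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data: 120 throws of a die, 20 expected per face.
observed = [18, 22, 16, 25, 19, 20]
expected = [20] * 6
chi2 = goodness_of_fit(observed, expected)
dof = len(observed) - 1        # one linear constraint: the totals must agree
print(chi2, dof)
```

Here the statistic is 2.5 on 5 degrees of freedom, well below the 5% table value of 11.07, so these made-up throws give no reason to doubt that the die is fair.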

Illustration:

In an anti-diabetes campaign in a certain area, a particular

medicine, say x was administered to 812 persons out of a total population

of 3248. The number of diabetes cases is shown below:

Treatment          Diabetes    No Diabetes    Total
Medicine x               20            792      812
No Medicine x           220           2216     2436
Total                   240           3008     3248

Discuss the usefulness of medicine x in checking diabetes.

Solution:

Let us take the hypothesis that medicine x is not effective in checking

diabetes. Applying χ2 test:

$$\text{Expectation of } (AB) = \frac{(A) \times (B)}{N} = \frac{240 \times 812}{3248} = 60$$

or E1, i.e., the expected frequency corresponding to the first row and first

column, is 60. The table of expected frequencies shall be:

Treatment          Diabetes    No Diabetes    Total
Medicine x               60            752      812
No Medicine x           180           2256     2436
Total                   240           3008     3248

     O        E     (O – E)²     (O – E)²/E
    20       60         1600         26.667
   220      180         1600          8.889
   792      752         1600          2.128
  2216     2256         1600          0.709
                  ∑[(O – E)²/E] =    38.393

χ2 = ∑[(O – E)²/E] = 38.393

v = (r – 1)(c – 1) = (2 – 1)(2 – 1) = 1

For v = 1, the table value of χ2 at the 0.05 level is 3.84.

The calculated value of χ2 is greater than the table value. The hypothesis

is rejected. Hence medicine x is useful in checking diabetes.
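
The steps of this illustration can be retraced with a short Python sketch that works for any table of observed frequencies. The function names are illustrative; the counts are those given in the illustration, and 3.84 is the 5% table value quoted in the text.

```python
def expected_frequencies(table):
    """Expected cell frequencies: (row total x column total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Return (chi-square, degrees of freedom) for an r x c table of counts."""
    expected = expected_frequencies(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

observed = [[20, 792], [220, 2216]]   # rows: medicine x, no medicine x
chi2, dof = chi_square_independence(observed)
print(expected_frequencies(observed))
print(round(chi2, 2), dof)
print("reject hypothesis" if chi2 > 3.84 else "do not reject hypothesis")
```

For these counts the statistic comes to about 38.39 on 1 degree of freedom, far above 3.84, so the hypothesis of no effect is rejected.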

Illustration:

In an experiment on immunization of cattle from tuberculosis the

following results were obtained:

                    Affected    Not affected
Inoculated                10              20
Not inoculated            15               5

Calculate χ2 and discuss the effect of vaccine in controlling susceptibility

to tuberculosis (5% value of χ2 for one degree of freedom = 3.84).

Solution:

Let us take the hypothesis that the vaccine is not effective in

controlling susceptibility to tuberculosis. Applying χ2 test:

$$\chi^2 = \frac{N\,(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)} = \frac{50\,(10 \times 5 - 20 \times 15)^2}{30 \times 20 \times 25 \times 25} = 8.33$$

Since the calculated value of χ2 is greater than the table value, the hypothesis

is rejected. We, therefore, conclude that the vaccine is effective in controlling

susceptibility to tuberculosis.
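
As a cross-check, the sketch below computes χ2 for the cattle data in two ways: by the shortcut formula and by the longer route through expected frequencies. It also shows the Yates-corrected value, which the illustration does not ask for and which is included here only for comparison. Variable names are illustrative.

```python
a, b, c, d = 10, 20, 15, 5                 # inoculated / not inoculated counts
n = a + b + c + d
margins = (a + b) * (c + d) * (a + c) * (b + d)

# Shortcut formula for a 2x2 table.
shortcut = n * (a * d - b * c) ** 2 / margins

# Longer route: expected frequency = row total x column total / N.
long_form = 0.0
for row in ((a, b), (c, d)):
    for observed, col_total in zip(row, (a + c, b + d)):
        expected = sum(row) * col_total / n
        long_form += (observed - expected) ** 2 / expected

# Yates-corrected value (an addition for comparison, not part of the illustration).
corrected = n * (abs(a * d - b * c) - n / 2) ** 2 / margins

print(round(shortcut, 2), round(long_form, 2), round(corrected, 2))
```

Both routes give about 8.33, and the corrected value of 6.75 is still above 3.84, so the conclusion does not depend on which form is used.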

***

CHAPTER – IV

Statistical Applications

A BRIEF INTRODUCTION TO STATISTICAL APPLICATIONS

A manager in a business organization, whether at the top level,

the middle level or the bottom level, has to perform the important

role of decision making. For solving any organizational problem, which

most of the time happens to be complex in nature, he has to identify

a set of alternatives, evaluate them and choose the best alternative. The

experience, expertise, rationality and wisdom gained by the manager over

a period of time will definitely stand him in good stead in the evaluation of the

alternatives available at his disposal. He has to consider several factors,

sometimes singly and sometimes jointly, during the process of decision

making. He has to deal with the data of not only his organization but also

of other competing organizations.

It would be a challenging situation for a manager when he has

to face so many variables operating simultaneously, some internal

and some external. Among them, he has to identify the important

variables or the dominating factors and he should be able to distinguish

one factor from the other. He should be able to find which factors have

similar characteristics and which factors stand apart. He should be able to

know which factors have an interplay with each other and which factors

remain independent. It would be advantageous to him to know whether

there is any clear pattern followed by the variables under consideration.

At times he may be required to have a good idea of the values that the

variables would assume in future occasions. The task of a manager becomes

all the more difficult in view of the risks and uncertainties surrounding

the future events. It is imperative on the part of a manager to understand

the impact of various policies and programmes on the development of

the organization as well as the environment. Also he should be able to

understand the impact of several of the environmental factors on his

organization. Sometimes a manager has to take a single-stage decision and

at times he is called upon to take a multistage decision on the basis of various

factors operating in a situation.

Statistical analysis is a tool for a manager in the process of decision

making by means of the data on hand. All managerial activities involve

an analysis of data. Statistical approach would enable a manager to have a

scientific guess of the future events also. Statistical methods are systematic

and built by several experts on firmly established theories and consequently

they would enable a manager to overcome the uncertainties associated with

future occasions. However, statistical tools have their shortcomings too.

The limitations do not reflect on the subject. Rather, they can be traced

to the methods of data collection and recording of data. Even with highly

sophisticated statistical methods, one may not arrive at valid conclusions

if the data collected are devoid of representative character.

In any practical problem, one has to see whether the assumptions

are reasonable or not, whether the data represents a wide spectrum,

whether the data is adequate, whether all the conditions for the statistical

tests have been fulfilled, etc. If one takes care of these aspects, it would be

possible to arrive at better alternatives and more reliable solutions, thereby

avoiding future shocks. While it is true that a statistical analysis, by itself,

cannot solve all the problems faced by an organization, it will definitely

enable a manager to comprehend the ground realities of the situation. It

will surely provide foresight in the identification of the crucial variables

and the key areas so that he can locate a set of possible solutions within his

ambit. A manager has to have a proper blend of the statistical theories and

practical wisdom and he shall always strive for a holistic approach to solve

any organizational problem. A manager has to provide some safeguarding

measures against the limitations of the statistical tools. In the process he

will be able to draw valid inferences thereby providing a clue as to the

direction in which the organization shall move in future. He will be ably

guided by the statistical results in the formulation of appropriate strategies

for the organization. Further, he can prepare the organization to face the

possible problems of business fluctuations in future and minimize the

risks with the help of the early warning signals indicated by the relevant

statistical tools.

A marketing manager of a company or a manager in a service

organization will have occasions to come across the general public and

consumers with several social and psychological variables which are

difficult to measure and quantify.

Depending on the situation and the requirement, a manager may have

to deal with the data of just one variable (univariate data), or data on

two variables (bivariate data) or data concerning several simultaneous

variables (multivariate data).

The unit on hand addresses itself to the role of a manager as a

decision maker with the help of data available with him. Different statistical

techniques which are suitable for different requirements are presented

in this unit in a simple style. A manager shall know the strengths and

weaknesses of various statistical tools. He shall know which statistical

tool would be the most appropriate in a particular context so that the

organization will derive the maximum benefit out of it.

The interpretation of the results from statistical analysis occupies

an important place. Statistics is concerned with the aggregates and

not just the individual data items or isolated measurements of certain

variables. Therefore the conclusions from a statistical study will be valid

for a majority of the objects and normal situations only. There are always

extreme cases in any problem and they have to be dealt with separately.

Statistical tools will enable a manager to identify such outliers (abnormal

cases or extreme variables) in a problem. A manager has to evaluate the

statistical inferences, interpret them in the proper context and apply them

in appropriate situations.

While one has to handle a large quantum of data in an actual research

problem, it is not possible for a beginner in the subject to treat such

voluminous data. Keeping this point in mind, any numerical

example in the present unit is based on a few data items only. It would be

worthwhile to the budding managers to make a start in solving statistical

problems by practicing the ones furnished in this unit.

Candidates are advised to use hand calculators for solving

statistical problems. There will be frequent occasions to use the statistical

tables of f-values furnished in this unit. Candidates are advised to

keep a copy of the tables with them for easy, ready reference. The books

and articles listed under the references may be consulted for further study

or applications of statistical techniques in relevant research areas.

***

CHAPTER IV

1. Correlation And Regression Analysis

The Concept Of Correlation

Determination Of Simple Correlation Coefficient

Properties Of Correlation Coefficient

The Concept Of Rank Correlation

Determination Of Rank Correlation Coefficient

The Concept Of Regression

The Principle Of Least Squares

Normal Equations

Determination Of Regression Equations

SIMPLE CORRELATION

Correlation

Correlation means the average relationship between two or more

variables. When changes in the values of one variable are accompanied by

changes in the values of another variable, we say that there is a correlation between the two

variables. The two variables may move in the same direction or in opposite

directions. Simply because of the presence of correlation between two

variables, we cannot jump to the conclusion that there is a cause-effect

relationship between them. Sometimes, it may be due to chance also.

Simple correlation

We say that the correlation is simple if the comparison involves

two variables only.

TYPES OF CORRELATION

Positive correlation

If two variables x and y move in the same direction, we say that

there is a positive correlation between them. In this case, when the value

of one variable increases, the value of the other variable also increases and

when the value of one variable decreases, the value of the other variable

also decreases. E.g., the age and height of a child.

Negative correlation

If two variables x and y move in opposite directions, we say that

there is a negative correlation between them, i.e., when the value of one

variable increases, the value of the other variable decreases and vice versa.

E.g., the price and demand of a normal good.

The following diagrams illustrate positive and negative correlations

between x and y.

[Figure: two scatter diagrams of y against x, titled Positive Correlation and Negative Correlation.]

Perfect Positive Correlation

If changes in two variables are in the same direction and the changes

are in equal proportion, we say that there is a perfect positive correlation

between them.

Perfect Negative Correlation

If changes in two variables are in opposite directions and the

absolute values of changes are in equal proportion, we say that there is a

perfect negative correlation between them.
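
The two perfect cases can be illustrated numerically. The sketch below assumes Pearson's product-moment coefficient as the measure of simple correlation (the chapter outline lists the simple correlation coefficient, but its formula has not yet been introduced at this point). The function name and the data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y_up = [2 * v + 3 for v in x]      # changes in the same direction, in equal proportion
y_down = [10 - 2 * v for v in x]   # changes in opposite directions, in equal proportion

print(pearson_r(x, y_up))    # perfect positive correlation
print(pearson_r(x, y_down))  # perfect negative correlation
```

When every change in x is matched by a proportional change in y, the points lie exactly on a straight line and the coefficient takes its extreme value of +1 or –1.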

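[Figure: scatter diagrams of y against x illustrating perfect positive and perfect negative correlation.]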