Business Research Methodology by SRINIVAS R RAO - HTML preview


correction term is 0.5 + 0.5 + 2 = 3


Problem 9 : Resolving ties in ranks

The following are the details of ratings scored by two popular insurance schemes. Determine the rank correlation coefficient between them.

Scheme I     80   80   83   84   87   87   89   90
Scheme II    55   56   57   57   57   58   59   60

Solution:

From the given values, we have to determine the ranks.

Step 1.

Arrange the scores for Insurance Scheme I in descending order and rank them as 1, 2, 3, …, 8.

Scheme I Score   90   89   87   87   84   83   80   80
Rank              1    2    3    4    5    6    7    8

The score 87 appears twice. The corresponding ranks are 3 and 4. Their average is (3 + 4) / 2 = 3.5. Assign this rank to the two equal scores in Scheme I.

The score 80 appears twice. The corresponding ranks are 7 and 8. Their average is (7 + 8) / 2 = 7.5. Assign this rank to the two equal scores in Scheme I.

The revised ranks for Insurance Scheme I are as follows:

Scheme I Score   90   89   87    87    84   83   80    80
Rank              1    2   3.5   3.5    5    6   7.5   7.5


Step 2.

Arrange the scores for Insurance Scheme II in descending order and rank them as 1, 2, 3, …, 8.

Scheme II Score   60   59   58   57   57   57   56   55
Rank               1    2    3    4    5    6    7    8

The score 57 appears thrice. The corresponding ranks are 4, 5 and 6. Their average is (4 + 5 + 6) / 3 = 15 / 3 = 5. Assign this rank to the three equal scores in Scheme II.

The revised ranks for Insurance Scheme II are as follows:

Scheme II Score   60   59   58   57   57   57   56   55
Rank               1    2    3    5    5    5    7    8
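The averaging rule used in Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration: the function name `average_ranks` is an assumption of this sketch and does not come from the text.

```python
def average_ranks(scores):
    """Rank scores in descending order; tied scores share the mean of their positions."""
    ordered = sorted(scores, reverse=True)
    positions = {}
    # Collect the positions 1, 2, 3, ... occupied by each distinct score.
    for position, score in enumerate(ordered, start=1):
        positions.setdefault(score, []).append(position)
    # A tied score receives the average of the positions it occupies.
    mean_rank = {score: sum(p) / len(p) for score, p in positions.items()}
    # Return the ranks in the order in which the scores were given.
    return [mean_rank[score] for score in scores]


scheme_1 = [80, 80, 83, 84, 87, 87, 89, 90]
scheme_2 = [55, 56, 57, 57, 57, 58, 59, 60]
ranks_1 = average_ranks(scheme_1)
ranks_2 = average_ranks(scheme_2)
print(ranks_1)  # [7.5, 7.5, 6.0, 5.0, 3.5, 3.5, 2.0, 1.0]
print(ranks_2)  # [8.0, 7.0, 5.0, 5.0, 5.0, 3.0, 2.0, 1.0]
```

The ranks are returned in the original order of the data, which is the order needed when the pairs are matched up in the next step.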

Step 3.

Calculation of D²: Assign the revised ranks to the given pairs of values and calculate D² as follows:

Scheme I   Scheme II   Scheme I    Scheme II    D = R1 - R2    D²
Score      Score       Rank: R1    Rank: R2
80         55          7.5         8            -0.5           0.25
80         56          7.5         7             0.5           0.25
83         57          6           5             1             1
84         57          5           5             0             0
87         57          3.5         5            -1.5           2.25
87         58          3.5         3             0.5           0.25
89         59          2           2             0             0
90         60          1           1             0             0
Total                                                          4


Step 4.

Calculation of ρ:

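We have N = 8.

Each tie of m items adds $m(m^2 - 1)/12$ to $\sum D^2$. Since there are 2 ties with 2 items each and another tie with 3 items, the correction term is 0.5 + 0.5 + 2 = 3.

The rank correlation coefficient is

$$
\begin{aligned}
\rho &= 1 - \frac{6\left\{\sum D^2 + \tfrac{1}{2} + \tfrac{1}{2} + 2\right\}}{N^3 - N} \\
     &= 1 - \frac{6\,(4 + 0.5 + 0.5 + 2)}{512 - 8} = 1 - \frac{6 \times 7}{504} = 1 - \frac{42}{504} \\
     &= 1 - 0.083 = 0.917
\end{aligned}
$$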

Inference:

It is inferred that the two insurance schemes are highly and positively correlated.
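The arithmetic of Steps 3 and 4 can be checked with a short Python sketch. It assumes the usual tie correction of m(m² - 1)/12 for each group of m tied ranks, which reproduces the terms 0.5, 0.5 and 2 used in this problem. The helper name `tie_correction` is an assumption of this sketch.

```python
from collections import Counter


def tie_correction(ranks):
    """Sum of m(m^2 - 1)/12 over every group of m tied ranks."""
    return sum(m * (m * m - 1) / 12 for m in Counter(ranks).values() if m > 1)


# Revised ranks from Steps 1 and 2, paired as in the table of Step 3.
ranks_1 = [7.5, 7.5, 6, 5, 3.5, 3.5, 2, 1]
ranks_2 = [8, 7, 5, 5, 5, 3, 2, 1]

n = len(ranks_1)
sum_d2 = sum((r1 - r2) ** 2 for r1, r2 in zip(ranks_1, ranks_2))
correction = tie_correction(ranks_1) + tie_correction(ranks_2)
rho = 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)

print(sum_d2, correction, round(rho, 3))  # 4.0 3.0 0.917
```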

REGRESSION

In the pairs of observations, if there is a cause and effect relationship between the variables X and Y, then the average relationship between these two variables is called regression, which means “stepping back” or “return to the average”. The linear relationship giving the best mean value of a variable corresponding to the other variable is called a regression line or line of best fit. The regression of X on Y is different from the regression of Y on X. Thus, there are two equations of regression and the two regression lines are given as follows:

Regression of Y on X:

$$Y - \bar{Y} = b_{yx}\,(X - \bar{X})$$

Regression of X on Y:

$$X - \bar{X} = b_{xy}\,(Y - \bar{Y})$$

where $\bar{X}$, $\bar{Y}$ are the means of X, Y respectively.

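Result:

Let $\sigma_x$, $\sigma_y$ denote the standard deviations of X, Y respectively. We have the following result.

$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} \quad\text{and}\quad b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$$

$$r^2 = b_{yx}\,b_{xy} \quad\text{and so}\quad r = \sqrt{b_{yx}\,b_{xy}}$$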

Result:

The coefficient of correlation r between X and Y is the square root of the product of the two regression coefficients $b_{yx}$ and $b_{xy}$, with r taking the sign common to both coefficients. This gives another way of finding r.
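As a numerical check of this result, the sketch below computes both regression coefficients and compares the square root of their product with r computed directly. It uses the sales and profit figures of Problem 10, and the variable names are illustrative.

```python
import math

x = [5, 6, 7, 8, 9, 10, 11]   # sales (Problem 10)
y = [2, 4, 5, 5, 3, 8, 7]     # profit (Problem 10)

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(p * q for p, q in zip(x, y))
sum_x2 = sum(p * p for p in x)
sum_y2 = sum(q * q for q in y)

# Both regression coefficients share the same numerator.
numerator = n * sum_xy - sum_x * sum_y
b_yx = numerator / (n * sum_x2 - sum_x ** 2)
b_xy = numerator / (n * sum_y2 - sum_y ** 2)

# r computed directly from the sums, and r recovered from the coefficients.
r_direct = numerator / math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r_from_b = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

print(round(b_yx, 4), round(b_xy, 4))
print(round(r_direct, 4), round(r_from_b, 4))
```

`math.copysign` attaches the sign of the regression coefficients to the square root, so the same lines also handle negatively correlated data.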

Application

The method of regression is very useful for business forecasting.

PRINCIPLE OF LEAST SQUARES

Let x, y be two variables under consideration. Out of them, let x be an independent variable and let y be a dependent variable, depending on x. We desire to build a functional relationship between them. For this purpose, the first and foremost requirement is that x, y have a high degree of correlation. If the correlation coefficient between x and y is moderate or less, we shall not go ahead with the task of fitting a functional relationship between them.

Suppose there is a high degree of correlation (positive or negative) between x and y. Suppose it is required to build a linear relationship between them, i.e., we want a regression of y on x.

Geometrically speaking, if we plot the corresponding values of x and y in a 2-dimensional plane and join such points, we would ideally obtain a straight line. However, we can hardly expect all the pairs (x, y) to lie on a straight line. We can consider several straight lines which are, to some extent, near all the points (x, y). Consider one such line. An observation $(x_1, y_1)$ may be either above the line under consideration or below it. Drop a perpendicular from this point to the x-axis. This vertical line meets the straight line at the point $(x_1, y_1^e)$. Here the theoretical value (or the expected value) of the variable is $y_1^e$ while the observed value is $y_1$. When there is a difference between the expected and observed values, there is an error. This error is $E_1 = y_1 - y_1^e$. It is positive if $(x_1, y_1)$ is a point above the line and negative if $(x_1, y_1)$ is a point below the line. For the n pairs of observations, we have the following n quantities of error:

$$E_1 = y_1 - y_1^e, \quad E_2 = y_2 - y_2^e, \quad \ldots, \quad E_n = y_n - y_n^e.$$

Some of these quantities are positive while the remaining ones are negative. However, the squares of all these quantities are non-negative.

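[Figure: observations (X1, Y1) and (X2, Y2) plotted against the axes X and Y with origin O, showing their deviations e1 and e2 from a straight line.]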

i.e.,

$$E_1^2 = (y_1 - y_1^e)^2 \ge 0, \quad E_2^2 = (y_2 - y_2^e)^2 \ge 0, \quad \ldots, \quad E_n^2 = (y_n - y_n^e)^2 \ge 0.$$

Hence the sum of squares of errors (SSE) is

$$\text{SSE} = E_1^2 + E_2^2 + \cdots + E_n^2 = (y_1 - y_1^e)^2 + (y_2 - y_2^e)^2 + \cdots + (y_n - y_n^e)^2 \ge 0.$$

Among all those straight lines which are somewhat near to the given observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, we consider that straight line as the ideal one for which the SSE is the least. Since the ideal straight line giving the regression of y on x is based on this concept, we call this principle the Principle of Least Squares.
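To make the idea concrete, the following sketch compares the SSE of a few candidate lines for a small data set. The data and the candidate lines are made up for illustration and are not taken from the text. One of the candidates (a = 2.2, b = 0.6) happens to be the least squares line for this data.

```python
def sse(a, b, xs, ys):
    """Sum of squared errors of the line y = a + b*x over the observations."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Illustrative observations (not from the text).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Candidate lines, each stored as (intercept a, slope b).
candidates = {
    "y = 2.2 + 0.6x": (2.2, 0.6),
    "y = 1 + x": (1.0, 1.0),
    "y = 4": (4.0, 0.0),
}

for name, (a, b) in candidates.items():
    print(name, round(sse(a, b, xs, ys), 2))

# The preferred line is the candidate with the least SSE.
best = min(candidates, key=lambda name: sse(*candidates[name], xs, ys))
print("least SSE:", best)
```

Here the SSE values are 2.4, 4 and 6 respectively, so the first line is preferred among the three.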

Normal equations

Suppose we have to fit a straight line to the n pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. Suppose the equation of the straight line finally comes out as

$$Y = a + bX \qquad (1)$$

where a, b are constants to be determined. Mathematically speaking, when we require the equation of a straight line, two distinct points on the straight line are sufficient. However, a different approach is followed here. We want to include all the observations in our attempt to build a straight line. Then all the n observed points (x, y) are required to satisfy the relation (1). Consider the summation of all such terms. We get

$$\sum y = \sum (a + bx) = \sum (a \cdot 1 + bx) = \Big(\sum a \cdot 1\Big) + \Big(\sum bx\Big) = a\Big(\sum 1\Big) + b\Big(\sum x\Big),$$

i.e.,

$$\sum y = an + b\sum x \qquad (2)$$

To find the two quantities a and b, we require two equations. We have obtained one equation, i.e., (2). We need one more equation. For this purpose, multiply both sides of (1) by x. We obtain

$$xy = ax + bx^2.$$

Consider the summation of all such terms. We get

$$\sum xy = \sum (ax + bx^2) = \Big(\sum ax\Big) + \Big(\sum bx^2\Big),$$

i.e.,

$$\sum xy = a\sum x + b\sum x^2 \qquad (3)$$

Equations (2) and (3) are referred to as the normal equations associated with the regression of y on x. Solving these two equations, we obtain

$$a = \frac{\sum X^2 \sum Y - \sum X \sum XY}{n\sum X^2 - \left(\sum X\right)^2}$$

and

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$$
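The elimination behind these two expressions is not spelled out in the text. One way to carry it out, offered here as a sketch, is to eliminate a and b in turn from the normal equations (2) and (3):

```latex
% Eliminate a: multiply (2) by \sum x and (3) by n, then subtract.
\begin{aligned}
\sum x \sum y &= a\,n \sum x + b \Big(\sum x\Big)^2 \\
n \sum xy     &= a\,n \sum x + b\,n \sum x^2 \\
\Rightarrow\quad n \sum xy - \sum x \sum y &= b \Big[\, n \sum x^2 - \Big(\sum x\Big)^2 \Big]
\end{aligned}

% Eliminate b: multiply (2) by \sum x^2 and (3) by \sum x, then subtract.
\begin{aligned}
\sum x^2 \sum y &= a\,n \sum x^2 + b \sum x \sum x^2 \\
\sum x \sum xy  &= a \Big(\sum x\Big)^2 + b \sum x \sum x^2 \\
\Rightarrow\quad \sum x^2 \sum y - \sum x \sum xy &= a \Big[\, n \sum x^2 - \Big(\sum x\Big)^2 \Big]
\end{aligned}
```

Dividing each final line by $n\sum x^2 - (\sum x)^2$, which is non-zero whenever the x values are not all equal, gives the stated expressions for b and a.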

Note:

For calculating the coefficient of correlation, we require ∑X, ∑Y, ∑XY, ∑X², ∑Y².

For calculating the regression of y on x, we require ∑X, ∑Y, ∑XY, ∑X². Thus, the tabular columns are the same in both cases, except that ∑Y² is also required for the coefficient of correlation.

Next, if we consider the regression line of x on y, we get the equation X = a + bY. The expressions for the coefficients can be obtained by interchanging the roles of X and Y in the previous discussion. Thus, we obtain

$$a = \frac{\sum Y^2 \sum X - \sum Y \sum XY}{n\sum Y^2 - \left(\sum Y\right)^2}$$

and

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum Y^2 - \left(\sum Y\right)^2}$$
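Because the two sets of formulas differ only in the roles of X and Y, a single routine can serve both regressions. The sketch below assumes a helper named `fit_line` (not from the text) and uses made-up data. Calling it with the arguments swapped gives the regression of x on y.

```python
def fit_line(u, v):
    """Least squares line v = a + b*u from the normal equations; returns (a, b)."""
    n = len(u)
    sum_u, sum_v = sum(u), sum(v)
    sum_uv = sum(p * q for p, q in zip(u, v))
    sum_u2 = sum(p * p for p in u)
    denom = n * sum_u2 - sum_u ** 2
    if denom == 0:
        raise ValueError("all u values are equal; the slope is undefined")
    a = (sum_u2 * sum_v - sum_u * sum_uv) / denom
    b = (n * sum_uv - sum_u * sum_v) / denom
    return a, b


# Illustrative observations (not from the text).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a_yx, b_yx = fit_line(x, y)   # regression of y on x: Y = a + bX
a_xy, b_xy = fit_line(y, x)   # regression of x on y: X = a + bY

print(a_yx, b_yx)   # 2.2 0.6
print(a_xy, b_xy)   # -1.0 1.0
```

The two fitted lines are different, in keeping with the earlier remark that the regression of X on Y is not the same as the regression of Y on X.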


Problem 10

Consider the following data on sales and profit.

X    5    6    7    8    9   10   11
Y    2    4    5    5    3    8    7

Determine the regression of profit on sales.

Solution:

We have N = 7. Take X = Sales, Y = Profit.

Calculate ∑X, ∑Y, ∑XY, ∑X² as follows:

X            Y     XY     X²
5            2     10     25
6            4     24     36
7            5     35     49
8            5     40     64
9            3     27     81
10           8     80    100
11           7     77    121
Total: 56    34    293    476

a = {(∑x²)(∑y) - (∑x)(∑xy)} / {n(∑x²) - (∑x)²}
  = (476 × 34 - 56 × 293) / (7 × 476 - 56²)
  = (16184 - 16408) / (3332 - 3136)
  = -224 / 196
  = -1.1429

b = {n(∑xy) - (∑x)(∑y)} / {n(∑x²) - (∑x)²}
  = (7 × 293 - 56 × 34) / 196
  = (2051 - 1904) / 196
  = 147 / 196
  = 0.75

The regression of Y on X is given by the equation

Y = a + bX

i.e.,

Y = -1.14 + 0.75 X
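The tabular calculation for this problem can be reproduced with a short Python sketch. The variable names are illustrative. The code builds the XY and X² columns, totals them, and substitutes the totals into the formulas for a and b.

```python
x = [5, 6, 7, 8, 9, 10, 11]   # sales
y = [2, 4, 5, 5, 3, 8, 7]     # profit

# One row per observation: X, Y, XY, X^2.
rows = [(xi, yi, xi * yi, xi * xi) for xi, yi in zip(x, y)]
totals = [sum(column) for column in zip(*rows)]
sum_x, sum_y, sum_xy, sum_x2 = totals

n = len(rows)
denom = n * sum_x2 - sum_x ** 2
a = (sum_x2 * sum_y - sum_x * sum_xy) / denom
b = (n * sum_xy - sum_x * sum_y) / denom

print(totals)                    # [56, 34, 293, 476]
print(round(a, 4), round(b, 2))  # -1.1429 0.75
```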

Problem 11

The following are the details of income and expenditure of 10 households.

Income        40   70   50   60   80   50   90   40   60   60
Expenditure   25   60   45   50   45   20   55   30   35   30

Determine the regression of expenditure on income and estimate the expenditure when the income is 65.

Solution:

We have N = 10. Take X = Income, Y = Expenditure.

Calculate ∑X, ∑Y, ∑XY, ∑X² as follows:

X             Y      XY       X²
40            25     1000     1600
70            60     4200     4900
50            45     2250     2500
60            50     3000     3600
80            45     3600     6400
50            20     1000     2500
90            55     4950     8100
40            30     1200     1600
60            35     2100     3600
60            30     1800     3600
Total: 600    395    25100    38400

a = {(∑x²)(∑y) - (∑x)(∑xy)} / {n(∑x²) - (∑x)²}
  = (38400 × 395 - 600 × 25100) / (10 × 38400 - 600²)
  = (15168000 - 15060000) / (384000 - 360000)
  = 108000 / 24000
  = 4.5

b = {n(∑xy) - (∑x)(∑y)} / {n(∑x²) - (∑x)²}
  = (10 × 25100 - 600 × 395) / 24000
  = (251000 - 237000) / 24000
  = 14000 / 24000
  = 0.583

The regression of Y on X is given by the equation

Y = a + bX

i.e.,

Y = 4.5 + 0.583 X


To estimate the expenditure when income is 65:

Take X = 65 in the above equation. Then we get

Y = 4.5 + 0.583 × 65
  = 4.5 + 37.895
  = 42.395
  = 42 (approximately).
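The same calculation can be repeated with exact fractions to see how much the rounding of b affects the estimate. This sketch uses Python's standard `fractions` module and illustrative variable names.

```python
from fractions import Fraction

income = [40, 70, 50, 60, 80, 50, 90, 40, 60, 60]
expenditure = [25, 60, 45, 50, 45, 20, 55, 30, 35, 30]

n = len(income)
sum_x, sum_y = sum(income), sum(expenditure)
sum_xy = sum(x * y for x, y in zip(income, expenditure))
sum_x2 = sum(x * x for x in income)

# Exact coefficients from the normal-equation formulas.
denom = n * sum_x2 - sum_x ** 2
a = Fraction(sum_x2 * sum_y - sum_x * sum_xy, denom)
b = Fraction(n * sum_xy - sum_x * sum_y, denom)

# Estimated expenditure for an income of 65, without rounding b first.
estimate = a + b * 65

print(a, b)                       # 9/2 7/12
print(round(float(estimate), 3))  # 42.417
```

With the unrounded slope 7/12 the estimate is about 42.42, against 42.395 when b is first rounded to 0.583. Both values round to 42.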

Problem 12

Consider the following data on occupancy rate and profit of a hotel.

Occupancy rate   40   45   70   60   70   75    70    80    95    90
Profit           50   55   65   70   90   95   105   110   120   125

Determine the regressions of

(i) profit on occupancy rate and

(ii) occupancy rate on profit.

Solution:

We have N = 10. Take X = Occupancy Rate, Y = Profit.

Note that in Problems 10 and 11, we wanted only one regression line and so we did not take ∑Y². Now we require two regression lines. Therefore, calculate ∑X, ∑Y, ∑XY, ∑X², ∑Y².

X             Y      XY       X²       Y²
40            50     2000     1600     2500
45            55     2475     2025     3025
70            65     4550     4900     4225
60            70     4200     3600     4900
70            90     6300     4900     8100
75            95     7125     5625     9025
70            105    7350     4900     11025
80            110    8800     6400     12100
95            120    11400    9025     14400
90            125    11250    8100     15625
Total: 695    885    65450    51075    84925

The regression line of Y on X:

Y = a + bX

where

a = {(∑x²)(∑y) - (∑x)(∑xy)} / {n(∑x²) - (∑x)²}

and

b = {n(∑xy) - (∑x)(∑y)} / {n(∑x²) - (∑x)²}