
MODULE No.: 28,NON PARAMETRIC STATISTICS(WILCOXON TEST)


Academic year: 2022


BUSINESS ECONOMICS

PAPER No.: 2,APPLIED BUSINESS STATISTICS

MODULE No.: 28,NON PARAMETRIC STATISTICS(WILCOXON TEST)

Subject BUSINESS ECONOMICS

Paper No. and Title 2,Applied Business Statistics

Module No. and Title 28, Non-Parametric Statistics (Wilcoxon Test)

Module Tag BSE_P2_M28


TABLE OF CONTENTS

1. Learning Outcomes

2. Introduction

3. Wilcoxon Test

3.1 Wilcoxon Signed Rank Test (Single Sample)

3.2 Wilcoxon Signed Rank Test (Paired Sample)

3.3 Normal Approximation of Wilcoxon Signed Rank Test

3.4 Wilcoxon Rank Sum Test

3.5 Normal Approximation of Wilcoxon Rank Sum Test

4. Summary

1. Learning Outcomes

This module deals with non-parametric statistics. After reading it, you should be able to carry out one-sample and two-sample (paired and independent) tests of the mean when your data do not allow for the conventional t test, either because they are measured on an ordinal scale or because they are not normally distributed. The module covers:

• Wilcoxon Signed Rank Test (Single Sample)

• Wilcoxon Signed Rank Test (Paired Sample)

• Wilcoxon Rank Sum Test (Independent Samples)

2. Introduction: Need to Study Non-Parametric Statistics

A potential source of confusion in working out what statistics to use in analyzing data is whether your data allows for parametric or non-parametric statistics.

The issue is of utmost importance and if you get it wrong you risk using an incorrect statistical procedure or you may use a less powerful procedure.

Non-parametric statistical procedures are less powerful because they use less information in their calculations. For example, a parametric correlation uses information about the mean and deviation from the mean while a non-parametric correlation will use only the ordinal position of pairs of scores.

The basic distinction for parametric versus non-parametric is:

If your measurement scale is nominal or ordinal, you use non-parametric statistics.

If you are using interval or ratio scales, you use parametric statistics.

Besides this, you have to look at the distribution of your data. If the measurement scale calls for parametric statistics, you should check that the distributions are approximately normal. If a distribution deviates markedly from normality, you run the risk that the parametric statistic will be inaccurate. The safest thing to do in that case is to use an equivalent non-parametric statistic.

3. Wilcoxon Test

The Wilcoxon signed rank test is the non-parametric counterpart of the parametric t test for one sample and for two paired samples. It is applicable when the data are not normally distributed or are measured on an ordinal scale. In the one-sample Wilcoxon signed rank test, our objective is to test whether the population mean equals a particular hypothesized value.

3.1 Wilcoxon Signed Rank Test (Single Sample)

The hypothesized value of the mean (𝜇0) is subtracted from each sample observation. The absolute values of the resulting differences are arranged in increasing order and ranked: rank 1 is given to the smallest absolute difference and each subsequent observation is given the next higher rank. In the next step each rank is given a + or – sign according to the sign of the corresponding difference. The sum of the positive ranks is our test statistic (s+).

H0: 𝜇 = 𝜇0    Ha: 𝜇 > 𝜇0

We reject the null hypothesis if s+ >= c, where c is the critical value. The critical value c is chosen so that the test has the desired significance level (type I error probability). This requires the distribution of s+ under H0.

Consider the example of n = 4. The number of ways of assigning signs to the four ranks is 2^4 = 16. The key point is that when H0 is true, every assignment of signs to the four ranks is as likely as any other. That is, the smallest absolute difference is equally likely to receive a positive or a negative sign, and the same is true for each of the larger absolute differences.

The table below lists the 16 possible combinations of signed ranks when n = 4. The highest possible value of s+ is n(n+1)/2, i.e. 10, which occurs when all four ranks are positive. Since positive and negative signs are equally likely, on average n/2 of the ranks carry a + sign and n/2 carry a – sign.

Table: 1

Since all the combinations are equally likely, each of them has a probability of 1/16. From this we can determine the distribution of s+.

The probability of getting s+ >= 10 is .0625. So if we get s+ = 10 and our significance level is .05, we would not reject H0. But if our significance level is .10, we would reject the null.

The probability of getting s+>=9 is .125. If we get s+=9 then we will not reject H0 even if our significance level is .10.
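The enumeration behind these probabilities can be reproduced in R. What follows is a minimal sketch rather than part of the original module; the variable names (signs, s_plus, dist) are chosen here for illustration only.

```r
# Enumerate all 2^4 = 16 equally likely sign assignments for ranks 1..4
n <- 4
signs <- expand.grid(rep(list(c(0, 1)), n))      # 1 = positive sign, 0 = negative sign
s_plus <- as.vector(as.matrix(signs) %*% (1:n))  # sum of the positively signed ranks

# Null distribution of s+: each assignment has probability 1/16
dist <- table(s_plus) / 2^n
print(dist)

# Upper-tail probabilities discussed in the text
p_ge_10 <- mean(s_plus >= 10)   # 1/16 = 0.0625
p_ge_9  <- mean(s_plus >= 9)    # 2/16 = 0.125
```

Only the assignment with all four signs positive gives s+ = 10, and only the one with rank 1 negative gives s+ = 9, which is why the two tail probabilities are 1/16 and 2/16.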

Now suppose our hypothesis is

H0: 𝜇 = 𝜇0    Ha: 𝜇 < 𝜇0

In this case we would reject H0 if s+ <= c. Finally, suppose our hypothesis is

H0: 𝜇 = 𝜇0    Ha: 𝜇 ≠ 𝜇0

In this case we would reject H0 if s+ >= c1 or s+ <= c2, where c2 = n(n+1)/2 - c1. This is a two-tailed test.

The distribution of s+ is tabulated at the end of this module for different values of n up to 20.
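Where the printed table is not at hand, the exact tail probabilities of s+ can be obtained from R's built-in signed rank distribution function psignrank. The sketch below is illustrative: the helper name upper_tail and the values n = 15 and c1 = 95 are assumptions made for the example, not values prescribed by the module at this point.

```r
# Exact upper-tail probability P(s+ >= c) under H0 for sample size n.
# psignrank(q, n, lower.tail = FALSE) returns P(s+ > q), hence the c - 1.
upper_tail <- function(c, n) psignrank(c - 1, n, lower.tail = FALSE)

n  <- 15
c1 <- 95                        # candidate upper critical value (illustrative)
c2 <- n * (n + 1) / 2 - c1      # matching lower critical value by symmetry

p_upper <- upper_tail(c1, n)    # P(s+ >= c1)
p_lower <- psignrank(c2, n)     # P(s+ <= c2); equals p_upper by symmetry
alpha_two_tailed <- p_upper + p_lower
```

Because the null distribution of s+ is symmetric about n(n+1)/4, the two tail probabilities agree, and the actual significance level of the two-tailed test is their sum.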

Example 1: A manufacturer of electric irons, wishing to test the accuracy of the thermostat control at the 100°C setting, instructs a test engineer to obtain actual temperatures at that setting for 15 irons. The resulting measurements are as follows:

99.87 100.2 99.54 100.34 99.56 100.21 99.34 100.9 99.2 100.87 100.78 99.02 98.9 101.02 101.56

Can we accept the manufacturer's claim?


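Sol. H0: 𝜇 = 100    Ha: 𝜇 ≠ 100

The test statistic s+ is 66. For n = 15, the table gives P(s+ >= 95) = 0.024, so a two-tailed test at approximately the .05 level rejects H0 if s+ >= 95 or s+ <= 15(15+1)/2 - 95 = 25. Since s+ = 66 lies between these two critical values, we do not reject H0. At the .05 significance level the data are consistent with the manufacturer's claim.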

Test in R:

x = c(99.87, 100.2, 99.5, 100.34, 99.56, 100.21, 99.34, 100.9, 99.2, 100.87, 100.78, 99.02, 98.9, 101.02, 101.56)

# two-tailed one-sample signed rank test against the hypothesized value 100
wilcox.test(x, alternative = "two.sided", mu = 100)

Wilcoxon signed rank test

data: x

V = 66, p-value = 0.7615

alternative hypothesis: true location is not equal to 100
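The statistic V reported by R is the s+ described in Section 3.1. As a hedged sketch, it can be computed by hand from the same data used in the R call above; the variable names (d, r, s_plus, p_two_sided) are illustrative only.

```r
x   <- c(99.87, 100.2, 99.5, 100.34, 99.56, 100.21, 99.34, 100.9,
         99.2, 100.87, 100.78, 99.02, 98.9, 101.02, 101.56)
mu0 <- 100

d      <- x - mu0            # differences from the hypothesized value
r      <- rank(abs(d))       # ranks of the absolute differences (1 = smallest)
s_plus <- sum(r[d > 0])      # sum of the ranks attached to positive differences

# Exact two-tailed p-value: double the smaller tail probability, capped at 1
n       <- length(x)
p_upper <- psignrank(s_plus - 1, n, lower.tail = FALSE)  # P(s+ >= observed)
p_lower <- psignrank(s_plus, n)                          # P(s+ <= observed)
p_two_sided <- min(1, 2 * min(p_upper, p_lower))
```

This assumes there are no zero differences and no tied absolute differences, which holds for these data; with ties or zeros the exact distribution no longer applies directly.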

3.2 Wilcoxon Signed Rank Test (Paired Sample)

In this case the difference between paired sample observations is calculated (first sample minus second sample) and the absolute values of the differences are ranked. The smallest absolute difference is given rank 1. In the next step a + or – sign is assigned to each rank according to the sign of the difference. The sum of the positive ranks is our test statistic s+.

H0: 𝜇1 = 𝜇2    Ha: 𝜇1 > 𝜇2    In this case we would reject H0 if s+ >= c.

H0: 𝜇1 = 𝜇2    Ha: 𝜇1 < 𝜇2    In this case we would reject H0 if s+ <= c.

H0: 𝜇1 = 𝜇2    Ha: 𝜇1 ≠ 𝜇2    In this case we would reject H0 if s+ >= c1 or s+ <= c2, where c2 = n(n+1)/2 - c1.


Example 2: Output per acre was recorded for ten plots of land, in the first year without using fertilizer and in the second year using fertilizer.

Without Fertilizer: 11.05 10.90 10.80 11.32 10.70 10.95 10.60 11.20 11.10 11.15

With Fertilizer: 11.10 10.96 10.71 11.40 10.90 10.94 10.74 11.35 11.50 11.66

Can we say that fertilizer affects productivity significantly?

Sol.

H0: 𝜇1 = 𝜇2    Ha: 𝜇1 < 𝜇2

The test statistic s+ is 6. For n = 10 the table gives P(s+ >= 47) = 0.024, so by symmetry P(s+ <= 8) = 0.024, where 8 = n(n+1)/2 - 47. The probability P(s+ <= 6) is therefore lower than 0.024, and we reject the null at the .05 level of significance. This implies that fertilizer significantly affects productivity.

Test in R:

p = c(11.05, 10.9, 10.8, 11.32, 10.7, 10.95, 10.6, 11.2, 11.1, 11.15)

q = c(11.1, 10.96, 10.71, 11.4, 10.9, 10.94, 10.74, 11.35, 11.5, 11.66)

wilcox.test(p, q, alternative = c("less"), mu = 0, paired = TRUE)

Wilcoxon signed rank test

data: p and q

V = 6, p-value = 0.01367

alternative hypothesis: true location shift is less than 0
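The reported p-value can be traced back to the exact null distribution of s+. The sketch below recomputes the statistic from the paired differences and evaluates the lower tail with psignrank; the names d, s_plus and p_less are illustrative and not taken from the module.

```r
p <- c(11.05, 10.9, 10.8, 11.32, 10.7, 10.95, 10.6, 11.2, 11.1, 11.15)  # without fertilizer
q <- c(11.1, 10.96, 10.71, 11.4, 10.9, 10.94, 10.74, 11.35, 11.5, 11.66) # with fertilizer

d      <- p - q                 # first sample minus second sample
s_plus <- sum(rank(abs(d))[d > 0])

# Lower-tailed test: exact P(s+ <= observed) for n = 10 pairs
p_less <- psignrank(s_plus, length(d))
```

Only two differences are positive here (ranks 1 and 5), so s+ = 6, and 14 of the 2^10 = 1024 equally likely sign assignments give a sum of 6 or less, which yields 14/1024 ≈ 0.01367.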

3.3 Normal Approximation of Wilcoxon Signed Rank Test

The distribution of s+ is given at the end of this module for different values of n up to 20. It can be shown that for n > 20 the distribution of s+ is approximately normal with

$$\mu_s = \frac{n(n+1)}{4}, \qquad \sigma_s^2 = \frac{n(n+1)(2n+1)}{24}$$

$$z = \frac{s_+ - \mu_s}{\sigma_s}$$

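For an upper-tailed test, H0 will be rejected if the calculated value of z is greater than the critical value of z at the given significance level. For a lower-tailed test, H0 is rejected if z is less than the negative critical value, and for a two-tailed test if the absolute value of z exceeds the two-tailed critical value.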

Example 3: A particular type of steel has been designed to have a compressive strength of at least 50. For each beam in a sample of 25 beams, the compressive strength was determined, with the following results:

49.0 51.05 52.0 51.4 51.2 50.8 49.3 49.4 49.6 49.7 49.8 49.9 51.1 50.9 50.5 50.6 50.87 49.4 49.7 49.9 50.9 50.6 49.1 49.22 50.22 50.67 50.71 50.75 49.89 50.01

Can we say that the compressive strength is not different from 50?

Sol. H0: 𝜇 = 50    Ha: 𝜇 ≠ 50

$$\mu_s = \frac{25(25+1)}{4} = 162.5, \qquad \sigma_s^2 = \frac{25(25+1)(2 \cdot 25+1)}{24} = 1381.25$$

$$z = \frac{97 - 162.5}{37.17} = -1.76$$

The critical values of z for a two-tailed test at the .05 level of significance are -1.96 and 1.96. Since the calculated z does not lie in the rejection region, we cannot reject H0. At the .05 level of significance, the data give no evidence that the compressive strength differs from 50.
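The arithmetic of the normal approximation can be wrapped in a small helper. This is a sketch under the formulas of this section; the function name signed_rank_z is hypothetical, and s+ = 97 with n = 25 is taken from the worked solution above.

```r
# z statistic for the large-sample signed rank test
signed_rank_z <- function(s_plus, n) {
  mu_s    <- n * (n + 1) / 4                        # mean of s+ under H0
  sigma_s <- sqrt(n * (n + 1) * (2 * n + 1) / 24)   # standard deviation of s+ under H0
  (s_plus - mu_s) / sigma_s
}

z <- signed_rank_z(97, 25)           # about -1.76
p_two_sided <- 2 * pnorm(-abs(z))    # two-tailed p-value from the standard normal
reject_at_05 <- abs(z) > qnorm(0.975)
```

Comparing abs(z) with qnorm(0.975), which is approximately 1.96, is the same decision rule as checking whether z falls outside the interval from -1.96 to 1.96.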

3.4 Wilcoxon Rank Sum Test

The Wilcoxon rank sum test is the non-parametric counterpart of the parametric t test for two independent samples. It is applicable when the data are not normally distributed or are measured on an ordinal scale. In the Wilcoxon rank sum test, our objective is to compare the means of the two corresponding populations.

Test Procedure: The two samples of size (m) and (n) are mixed and arranged in increasing order and ranked. The lowest observation is given rank 1 and subsequent higher observations are given higher ranks. After ranking, the samples are separated and sum of ranks is obtained sample wise.

The idea behind Wilcoxon rank sum test is that if the two population means are same then the sum of the ranks of two samples should not be very different. After obtaining the sum of ranks the distribution of sum of ranks (w) for one of the sample is obtained.

Suppose two samples are taken, one with m = 3 observations and the other with n = 4 observations.

The lowest possible rank sum for the first sample is the sum of the first 3 integers, i.e. 6. The highest possible rank sum is the sum of the first m+n integers minus the sum of the first n integers, i.e. 28 - 10 = 18. The rank sum for the first sample therefore lies between 6 and 18. There are 7C3 = 35 possible sets of three ranks for the first sample, and under H0 each has a probability of 1/35. The distribution of the rank sum of the first sample is symmetrical about (6+18)/2 = 12. The next step is to count how many sets give a particular rank sum, and from that we can determine the distribution of the rank sum. For example, exactly one set gives rank sum 6 and exactly one gives rank sum 18, so each of these rank sums has probability 1/35.

Four sets, (1,3,7), (1,4,6), (2,3,6) and (2,4,5), give a rank sum of 11, so the probability of rank sum 11 is 4/35. Proceeding in the same way we can determine the probability of every rank sum between 6 and 18 and thus the probability distribution of the rank sum of the first sample.
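The counting argument can be checked by listing every possible set of ranks for the first sample. The following sketch uses combn from base R; the variable names (sets, w, dist) are illustrative assumptions.

```r
m <- 3   # size of the first sample
n <- 4   # size of the second sample

sets <- combn(m + n, m)          # each column is one possible set of ranks for sample 1
w    <- colSums(sets)            # rank sum of each set

# Null distribution of the rank sum: every set has probability 1 / choose(7, 3)
dist <- table(w) / ncol(sets)
print(dist)

# The sets of ranks that produce a rank sum of 11
print(sets[, w == 11])
```

The printed table makes the symmetry about 12 visible, and the same approach works for other small m and n, although the number of sets grows quickly with the sample sizes.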

H0: 𝜇1 = 𝜇2    Ha: 𝜇1 > 𝜇2    In this case we would reject H0 if w >= c.

H0: 𝜇1 = 𝜇2    Ha: 𝜇1 < 𝜇2    In this case we would reject H0 if w <= c.

H0: 𝜇1 = 𝜇2    Ha: 𝜇1 ≠ 𝜇2    In this case we would reject H0 if w >= c1 or w <= c2, where c2 = m(m+n+1) - c1.

The table at the end of the module provides critical values for various possible combinations of sample sizes.

Example 4: Urinary fluoride concentration (parts per million) was measured both for a sample of livestock grazing in an area previously exposed to fluoride pollution and for a similar sample grazing in an unpolluted region.

Polluted: 21.3 18.7 23.0 17.1 16.8 20.9 19.7

Unpolluted: 14.2 18.3 17.2 18.4 20.0

Does the data strongly indicate that true average fluoride concentration for livestock grazing in the polluted region is larger than for the unpolluted region?

Sol. Unpolluted is the first sample (x) and polluted is the second sample (y).

H0: 𝜇1 = 𝜇2    Ha: 𝜇1 < 𝜇2

Sample: x y y x x x y y x y y y

Obs.: 14.2 16.8 17.1 17.2 18.3 18.4 18.7 19.7 20.0 20.9 21.3 23.0

Rank: 1 2 3 4 5 6 7 8 9 10 11 12

The sum of the ranks from the first sample is w = 1 + 4 + 5 + 6 + 9 = 25. Since Ha is lower-tailed, H0 is rejected only if w is less than or equal to the lower critical value. For m = 5 and n = 7 the lower critical value at the .01 significance level is 18, since P(w <= 18) = 7/792, or about .009. The observed rank sum is greater than this critical value, so the null is not rejected. The data do not strongly indicate that the true average fluoride concentration for livestock grazing in the polluted region is larger than for the unpolluted region.

Test in R:

p=c(14.2, 18.3, 17.2, 18.4, 20.0)

q=c(21.3, 18.7, 23.0, 17.1, 16.8, 20.9, 19.7)

wilcox.test(p,q,alternative=c("less"), mu=0, paired=FALSE)

Wilcoxon rank sum test

data: p and q

W = 10, p-value = 0.1338

alternative hypothesis: true location shift is less than 0
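Note that R reports W = 10 rather than the rank sum of 25. R's statistic is the rank sum of the first sample minus its smallest possible value m(m+1)/2. The sketch below, with illustrative variable names, shows the link and recovers the p-value from pwilcox.

```r
x <- c(14.2, 18.3, 17.2, 18.4, 20.0)               # unpolluted (first sample)
y <- c(21.3, 18.7, 23.0, 17.1, 16.8, 20.9, 19.7)   # polluted (second sample)
m <- length(x)
n <- length(y)

r <- rank(c(x, y))              # ranks in the pooled sample
w <- sum(r[seq_len(m)])         # rank sum of the first sample

W <- w - m * (m + 1) / 2        # statistic in the form reported by wilcox.test
p_less <- pwilcox(W, m, n)      # exact lower-tail probability P(W <= observed)
```

Subtracting the constant m(m+1)/2 does not change the test, so a decision based on w with the module's tables agrees with one based on W, provided the same exact distribution and significance level are used.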

3.5 Normal Approximation of Wilcoxon Rank Sum Test

When both m and n exceed 8, the distribution of w can be approximated by an appropriate normal distribution.

$$\mu_w = \frac{m(m+n+1)}{2}, \qquad \sigma_w^2 = \frac{mn(m+n+1)}{12}$$

$$z = \frac{w - \mu_w}{\sigma_w}$$

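For an upper-tailed test, we reject H0 if the calculated value of z is higher than the critical value of z. For a lower-tailed test we reject H0 if z is below the negative critical value, and for a two-tailed test if the absolute value of z exceeds the two-tailed critical value.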

Example 5: Samples of histamine were obtained from 9 allergic and 13 non-allergic individuals.

Allergic: 67.6 39.6 1651.0 100.0 65.9 1112.0 31.0 102.4 64.7

Non-allergic: 34.3 27.3 35.4 48.1 5.2 29.1 4.7 41.7 48.0 6.6 18.9 32.4 45.5

Do the data indicate that there is a difference in true average histamine level between allergic and non-allergic individuals?

Sol. Allergic is the first sample (x) and non-allergic is the second sample (y).

H0: 𝜇1 = 𝜇2    Ha: 𝜇1 ≠ 𝜇2


Since both the sample sizes exceed 8, normal approximations have been used.

$$\mu_w = \frac{9(9+13+1)}{2} = 103.5, \qquad \sigma_w^2 = \frac{9 \cdot 13\,(9+13+1)}{12} = 224.25$$

The sum of the ranks from the allergic sample is 151.

$$z = \frac{151 - 103.5}{\sqrt{224.25}} = 3.17$$

The calculated value of z is greater than the critical value of z=1.96 for two-tailed test at .05 level of significance and thus we reject the null hypothesis. The data indicates that there is a difference in true average histamine level for allergic and non-allergic.
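The same calculation can be written as a short R helper. This is a sketch of the formulas in this section only; the function name rank_sum_z is hypothetical, and the rank sum 151 with m = 9 and n = 13 is taken from the worked solution.

```r
# z statistic for the large-sample rank sum test (w = rank sum of the first sample)
rank_sum_z <- function(w, m, n) {
  mu_w    <- m * (m + n + 1) / 2              # mean of w under H0
  sigma_w <- sqrt(m * n * (m + n + 1) / 12)   # standard deviation of w under H0
  (w - mu_w) / sigma_w
}

z <- rank_sum_z(151, 9, 13)          # about 3.17
p_two_sided <- 2 * pnorm(-abs(z))    # two-tailed p-value from the standard normal
reject_at_05 <- abs(z) > qnorm(0.975)
```

The helper ignores ties and applies no continuity correction, matching the formulas as stated in this module; R's wilcox.test applies a continuity correction by default when it uses the normal approximation, so its p-value can differ slightly.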

4. Summary

In this module we have learnt the non-parametric equivalents of conventional t test for one sample and two-sample (paired and independent). These tests should be used instead of t test if your data is on ordinal scale or not normally distributed. These tests are useful in bio-statistics and other areas where the data obtained is not normally distributed.

Table:1 Critical values of Wilcoxon Signed Rank Test


Critical values of the smallest rank sum for the Wilcoxon Mann Whitney test


Source: Kanji, Gopal K. 100 Statistical Tests. London : SAGE Publication Ltd., 1993.
