
Paper: Design of Experiments and Sample Survey Module: Non-response Issues and Some Thoughts


Academic year: 2022


(1)

Subject: Statistics

Paper: Design of Experiments and Sample Survey

Module: Non-response Issues and Some Thoughts on Census / Sample Survey of Households

Module No: SS-17

(2)

Development Team

Principal investigator:

Dr. Bhaswati Ganguli,

Professor, Department of Statistics, University of Calcutta

Paper co-ordinator:

Dr. Bikas Kumar Sinha,

Retired Professor, Indian Statistical Institute, Kolkata

Content writer: Dr. Santu Ghosh,

Lecturer, Department of Environmental Health Engineering, Sri Ramachandra University, Chennai

Content reviewer: Dr. Sugata SenRoy,

Professor, Department of Statistics, University of Calcutta

(3)

Disclaimer

I take this opportunity to thank Dr. Bhaswati Ganguli of the Department of Statistics, Calcutta University, for approaching me with a specific purpose. In this UGC-initiated epathsala programme, for the subject of 'Statistics', in her capacity as the Principal Investigator, she wanted me to act as a coordinator for the topic: Design of Experiments [DoE] and Sample Survey [SS]. I gladly accepted her proposal and volunteered to prepare all the 40 modules as asked for. I have followed a distinctive style while preparing the modules, viz., that of a dialogue between an Instructor [Professor Bikram Kanti Sahay (BKS)] and his two students [Ms. Sagarika Ghosh (SG) and Mr. Subhra Sankar Gupta (SSG)]. I fondly hope this instructional discourse and my efforts on two of my favorite topics in Statistics will be appreciated and found useful.

In the video recordings, I will impersonate BKS. Mr. Samopriya Basu [MSc (Statistics), Calcutta University] and Ms. Moumita Chatterjee [University of Calcutta, Kolkata] will impersonate the students [SSG and SG], respectively.

Professor Bikas K. Sinha
Retired Professor of Statistics
Indian Statistical Institute, Kolkata

July 10, 2015

(4)

BKS

This is the last module to be taken up in this series of 40 modules covering Design of Experiments [DoE] and Sample Surveys [SS].

I would like to dwell on nonresponse in sample surveys which is regarded as a major practical problem in actual surveys with human populations.

However, I will touch upon some uncommon problems and thoughts.

It is generally said that two of the main sources of nonresponse are refusal to answer, or evasion of, questions of a sensitive nature, and 'Not-at-Homes'.

We have already taken up the first issue and deliberated upon some aspects of the randomized response technique [RRT] for resolving the problems involving sensitive questions.

(5)

As to the Not-at-Home category of individuals [persons/respondents], usually the technique of ’call-backs’ is proposed.

Early attempts to tackle this type of nonresponse may be attributed to Yates (1946) and Hansen and Hurwitz (1946) who introduced the methodology for call-back procedures.

In a way, the population is tacitly divided into two categories or strata: the "response stratum", wherein the units [persons] do respond at the very first attempt, and the "nonresponse stratum", in which the units do not respond in the first attempt and also rarely do so in subsequent attempts.

A second attempt is in order and it is made based on a selection of a subset of nonrespondents and so on.

Deming (1953) considered a mathematical model for call-backs dealing with cost considerations as well.

(6)

SSG

Sir, yes, we have studied a little bit of this in our sample survey course.

Generally, 2 or 3 attempts are made / recommended to recover a part of the nonresponding group.

Then the results are 'mixed together' to arrive at the estimates of, for example, population proportions.

I am sure you have different material to talk about.

(7)

BKS

Yes, it is indeed so.

The individuals labeled as not-at-homes are temporarily absent from home and may be located at work, on a short holiday, or on a visit. Cochran's (1977) textbook describes the call-back technique in detail, with reference to the number of call-backs and the relative costs that are involved.

It is interesting to note that Hartley (1946) suggested an 'ingenious' alternative to call-backs which is 'decidedly cheaper'.

Hartley mentions: ”details of this scheme were given to the War-time Social Survey, but I understand that, owing to pressure of work, an opportunity of trying has, as yet, not arisen.”

(8)

Soon after, Politz and Simmons (1949) published their work, popularly known as the Politz-Simmons Technique, in the Journal of the American Statistical Association; it is on similar lines to the suggestions of Hartley.

Politz and Simmons (1949), in their paper, while acknowledging the work of Hartley, mention thus: "During the past three years we have developed a plan for eliminating the need for call-backs and several experiments have been made applying this plan to market surveys."

We will study below what is known as the Hartley-Politz-Simmons [HPS] Technique.

(9)

The HPS Technique consists of grouping the results of the first call [into what Hartley calls strata] as follows: assuming all calls are made at a particular time in the evening on each of the six week-nights, the interviewers ask each respondent whether he/she had been at home at the same time on each of the five preceding week-nights.

(10)

SSG

Sir, please, let us take stock of what you are saying! As I understand, you are referring to six week-nights, i.e., Monday night through Saturday night.

Suppose the interview is held on Thursday night at more or less 7 pm, say covering a half-hour period.

Then whoever is available at that point of time is asked: Were you at home around this time yesterday [meaning Wednesday], last Tuesday, last Monday, last Saturday, last Friday?

I believe these are the questions posed, and truthful binary responses are sought from the respondents.

I can see that there should not be any issue of 'memory loss' for this short time span.

This is an interesting approach for sure. So, as you said, there is no question of any call-back for those not available on the night of the survey.

(11)

BKS

I appreciate your approach; at times it is worth taking stock of the material being discussed.

Let me proceed now. Imagine a situation wherein we have a total of $N$ HHs [households] in a neighborhood and we conduct 100% screening [i.e., a census] of the HHs on a randomly chosen week-night.

Let $\tilde{N}_0$ be the number of HHs reported as 'unavailable'.

Further, let $\tilde{N}_i$ be the number of persons or contacts among the HHs who report 'being at home on exactly $i$ nights' [including the night of the survey], $i = 1, 2, \ldots, 6$.

Naturally, $N = \tilde{N}_0 + \tilde{N}_1 + \cdots + \tilde{N}_6$.

(12)

Further to this, note that $\tilde{N}_0 > 0$ suggests $\tilde{N}_6 = 0$ while $\tilde{N}_0 = 0$ suggests $\tilde{N}_6 \geq 0$.

Again, if $i$ [$\geq 1$] is the number of nights the person was present at home, out of tonight and the previous five nights, then $i/6$ gives an estimate of the probability of the individual being at home at this time on a randomly chosen night.

In an immediately following paper, Politz and Simmons (1950) discuss the difference in the effects of clustering/stratification in using the 'nights-at-home' plan.

Let $P_i$ be the population proportion of these contacts/persons/HHs available at home on exactly $i$ out of 6 week-nights.

(13)

Noticing that the first call answers are mostly from those persons who are at home most of the time, the response is given a weight inversely proportional to ’i’ for each ’i=1, 2, . . . , 6’.

This suggests $\tilde{N}_i/N$ as an estimate of $P_i \times (i/6)$, so that $\hat{P}_i = 6\tilde{N}_i/(iN)$; $i = 1, 2, \ldots, 6$.

Once the $P_i$'s are estimated for $i = 1$ to $6$, an estimate for $P_0$ is derived through the identity:

$$1 = \sum_{i=0}^{6} \hat{P}_i.$$

(14)

Also it follows that $\tilde{N}_0/N$ serves as an unbiased estimate for $P_0 + 5P_1/6 + 4P_2/6 + 3P_3/6 + 2P_4/6 + 1P_5/6$, i.e., for

$$P_0 + 5P_1/6 + 2P_2/3 + P_3/2 + P_4/3 + P_5/6.$$

From this again $P_0$ can be estimated.

Question

Are the two estimates of $P_0$ necessarily equal? Can either or both be negative at times?
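One possible line of attack on this question is sketched below; it is not necessarily the intended solution. The labels $\hat{P}_0^{(1)}$ and $\hat{P}_0^{(2)}$ are introduced here only for illustration (they are not part of the module) and denote the estimates obtained from the identity and from the relation for $\tilde{N}_0/N$, respectively. A census is assumed, so that $N = \tilde{N}_0 + \tilde{N}_1 + \cdots + \tilde{N}_6$.

```latex
\begin{align*}
\hat{P}_0^{(1)} &= 1 - \sum_{i=1}^{6} \hat{P}_i,
  \qquad \hat{P}_i = \frac{6\tilde{N}_i}{iN}, \\
\hat{P}_0^{(2)} &= \frac{\tilde{N}_0}{N} - \sum_{i=1}^{5} \frac{6-i}{6}\,\hat{P}_i .
\end{align*}
% Use \tilde{N}_i/N = (i/6)\hat{P}_i together with
% N = \tilde{N}_0 + \tilde{N}_1 + \cdots + \tilde{N}_6:
\begin{align*}
\frac{\tilde{N}_0}{N}
  &= 1 - \sum_{i=1}^{6} \frac{\tilde{N}_i}{N}
   = 1 - \sum_{i=1}^{6} \frac{i}{6}\,\hat{P}_i, \\
\hat{P}_0^{(2)}
  &= 1 - \sum_{i=1}^{6} \frac{i}{6}\,\hat{P}_i
       - \sum_{i=1}^{5} \frac{6-i}{6}\,\hat{P}_i
   = 1 - \hat{P}_6 - \sum_{i=1}^{5} \hat{P}_i
   = \hat{P}_0^{(1)} .
\end{align*}
```

Under these assumptions the two estimates coincide algebraically. As for the sign, nothing forces $\hat{P}_1 + \cdots + \hat{P}_6$ to stay below 1: in the extreme case where every HH is found at home and reports $i = 1$, we get $\hat{P}_1 = 6$ and hence $\hat{P}_0 = -5$.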

(15)

SSG

Sir, this seems to be a highly interesting scenario ! Up front, it looks like we are going for a ’census’ of all the HHs but, yet, we are in a position only to provide estimates of different meaningful parameters!

What if indeed we took up a sample, say SRSWOR(N,n) of HHs ? Let me try to figure it out myself.

Naturally, we would have a split of $n$ as $\tilde{n}_0, \tilde{n}_1, \ldots, \tilde{n}_6$ in the same spirit as that of $N$ described above. Oh, I see... this is now in the framework of two-stage expectation.

(16)

I believe, with reference to the same randomly sampled week-night of the survey, we can claim $E_2[\tilde{n}_0/n] = \tilde{N}_0/N$ and hence

$$E[\tilde{n}_0/n] = E_1E_2[\tilde{n}_0/n] = E_1[\tilde{N}_0/N] = P_0 + 5P_1/6 + 2P_2/3 + P_3/2 + P_4/3 + P_5/6.$$

Likewise,

$$E[\tilde{n}_i/n] = E_1E_2[\tilde{n}_i/n] = E_1[\tilde{N}_i/N] = P_i \times (i/6); \quad i = 1, 2, \ldots, 6.$$

We are now in a position to estimate the population proportions in different categories as usual.
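The first-stage expectation (over the randomly chosen week-night) can be checked exactly on a small made-up population, since there are only six possible nights to average over. The sketch below is illustrative only: the stratum sizes, the seed, and the helper name `census_counts` are assumptions, and every HH is assumed to report its at-home nights truthfully.

```python
import random

NIGHTS = range(6)  # the six week-nights, Monday through Saturday


def census_counts(patterns, night):
    """Return [N~_0, ..., N~_6] for a census held on the given night.

    A HH found at home is classified by its number of at-home nights;
    a HH not at home on that night falls into the 'unavailable' cell 0.
    """
    counts = [0] * 7
    for at_home_nights in patterns:
        if night in at_home_nights:
            counts[len(at_home_nights)] += 1
        else:
            counts[0] += 1
    return counts


# Hypothetical population: stratum_sizes[i] HHs are at home on exactly i nights.
stratum_sizes = [300, 100, 200, 150, 100, 100, 50]
rng = random.Random(2015)
patterns = [frozenset(rng.sample(NIGHTS, i))
            for i, size in enumerate(stratum_sizes)
            for _ in range(size)]
N = len(patterns)
P = [size / N for size in stratum_sizes]  # true P_0, ..., P_6

# E_1: average the census proportions over the six equally likely nights.
expected = [sum(census_counts(patterns, night)[i] for night in NIGHTS) / (6 * N)
            for i in range(7)]

for i in range(1, 7):
    print(i, expected[i], P[i] * i / 6)
```

Because each HH in stratum $i$ is at home on exactly $i$ of the six nights, the averages agree with $P_i \times (i/6)$ whichever at-home nights the random draw assigns; the second-stage step for an SRSWOR sample is not simulated here.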

BKS

Let us work out an illustrative example to understand the nature of data and the underlying computations.

(17)

SSG

Sir, if you don’t mind, let me take it up in my own way.

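Case 1: 100% sampling for a population of $N = 1000$ HHs yields $\tilde{N}_0 = 800$, $\tilde{N}_1 = 40$, $\tilde{N}_2 = 80$, $\tilde{N}_3 = 50$, $\tilde{N}_4 = 20$, $\tilde{N}_5 = 10$ [so that $\tilde{N}_6 = 0$].

Then, we derive: $\hat{P}_1 = 6 \times 40/1000 = 0.24$; $\hat{P}_2 = 6 \times 80/2000 = 0.24$; $\hat{P}_3 = 6 \times 50/3000 = 0.10$; $\hat{P}_4 = 6 \times 20/4000 = 0.03$ and $\hat{P}_5 = 6 \times 10/5000 = 0.012$.

Eventually, we obtain $\hat{P}_0 = 1 - \sum_{i=1}^{6} \hat{P}_i = 0.378$.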

(18)

SSG

Again, alternatively, we can compute $\hat{P}_0$ from the relation:

$$0.80 = \tilde{N}_0/N = \hat{P}_0 + 5\hat{P}_1/6 + 4\hat{P}_2/6 + 3\hat{P}_3/6 + 2\hat{P}_4/6 + 1\hat{P}_5/6 = \hat{P}_0 + 2.532/6 = \hat{P}_0 + 0.422,$$

i.e., $\hat{P}_0 = 0.378$, which is the same as before.

This gives the impression that both formulae may lead to the same result.

Again, we will have similar results while dealing with a random sample of HHs.

It is interesting to note that although hardly 40% are 'not-at-all-at-homes', the survey indicates 80% non-response on the night of the survey!
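The computations in this example can be reproduced with a few lines of Python. This is a minimal sketch: the function name `hps_estimates` and its return layout are chosen here for illustration and do not come from the module.

```python
def hps_estimates(counts):
    """Return (p_hat, p0_identity, p0_relation) from census counts.

    counts[0] is the number of HHs not at home; counts[i], i = 1..6, is the
    number found at home who report being at home on exactly i of 6 nights.
    """
    N = sum(counts)
    # Weight each respondent cell inversely to i: P^_i = 6 * N~_i / (i * N).
    p_hat = {i: 6 * counts[i] / (i * N) for i in range(1, 7)}
    # Route 1: the identity 1 = P^_0 + P^_1 + ... + P^_6.
    p0_identity = 1 - sum(p_hat.values())
    # Route 2: N~_0 / N estimates P_0 plus the sum of (6 - i)/6 * P_i.
    p0_relation = counts[0] / N - sum((6 - i) / 6 * p_hat[i] for i in range(1, 7))
    return p_hat, p0_identity, p0_relation


# Counts from the example: N~_0, N~_1, ..., N~_6 with N = 1000.
counts = [800, 40, 80, 50, 20, 10, 0]
p_hat, p0_a, p0_b = hps_estimates(counts)
print(p_hat)
print(round(p0_a, 3), round(p0_b, 3))
```

Both routes give 0.378 for these counts, in line with the hand calculation.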

(19)

BKS

So far, we have discussed estimation of the $P_i$'s based on the HPS sampling and data-gathering scheme.

Suppose we are interested in the incidence proportion $P(A)$ of a qualitative feature 'A' in the population as a whole.

Under 100% inspection of the HHs on a randomly chosen weekday evening, we end up with $[N - \tilde{N}_0]$ HHs [who are present] and, accordingly, collect binary response data on this item of enquiry from each of these HHs.

The incidence proportion of this feature among the $N - \tilde{N}_0$ HHs can be computed outright.

However, this does not represent the whole picture since the nature of incidence of this feature among the remaining N˜0 HHs is totally unknown.

(20)

Can we develop a ’reasonable’ approach towards estimation of P(A) at this stage ?

Recall that towards unbiased estimation of the $P_i$'s, the available HHs have been stratified/classified according to the number of weekday evenings spent at home.

This number varies between 1 and 6. For each such stratum of HHs and for this specific feature [A], we can compute $\tilde{N}_i(A)$ = number of HHs in Stratum $i$ possessing the feature 'A'.

Note that $\tilde{N}_1(A) + \tilde{N}_2(A) + \cdots + \tilde{N}_6(A)$ stands for all HHs possessing the feature 'A' among the $N - \tilde{N}_0$ HHs present on the evening of the survey.

What is not known is the corresponding number among the $\tilde{N}_0$ 'no-show' HHs.

(21)

However, from our understanding about data analysis, we know about a possible decomposition of $\tilde{N}_0$ among the different strata.

Let us write $\tilde{N}_0 = \tilde{N}_0(0) + \tilde{N}_0(1) + \tilde{N}_0(2) + \cdots + \tilde{N}_0(6)$, where $\tilde{N}_0(0)$ is the estimated number of HHs who belong to the 'no show' category.

Likewise, $\tilde{N}_0(1)$ is the estimated number of HHs [among the $\tilde{N}_0$ HHs] who are likely to belong to Stratum 1, and so on.

We now project $\tilde{N}_i(A)$, based on the $\tilde{N}_i$ HHs corresponding to Stratum $i$, to $[\tilde{N}_i(A)/\tilde{N}_i] \times [\tilde{N}_i + \tilde{N}_0(i)]$ for each $i = 1, 2, \ldots, 6$.

To summarize, we have gathered an idea about the splitting of $N$ into 7 categories: no show at all, presence on exactly one weekday evening, presence on exactly 2 weekday evenings, etc.

(22)

Further to this, with respect to possession of the feature 'A', we have an understanding about the number of HHs possessing this feature in each of the six strata, that is, in every stratum except the one representing 'no show'.

The only problem left is to gather information about possession of the feature 'A' among the 'no show at all' HHs, i.e., among the $\tilde{N}_0(0)$ HHs.

At this stage, we have a number of methods available. We can use a suitable interpolation formula. We can develop a suitable probability model and apply it to the available data.

We will now take up an illustrative example.

(23)

SSG

Sir, may I continue with the example we had taken up earlier?

We had adopted 100% sampling for a population of $N = 1000$ HHs, which yielded: $\tilde{N}_0 = 800$, $\tilde{N}_1 = 40$, $\tilde{N}_2 = 80$, $\tilde{N}_3 = 50$, $\tilde{N}_4 = 20$, $\tilde{N}_5 = 10$.

Subsequently, we derived: $\hat{P}_1 = 0.24$; $\hat{P}_2 = 0.24$; $\hat{P}_3 = 0.10$; $\hat{P}_4 = 0.03$; $\hat{P}_5 = 0.012$ and $\hat{P}_6 = 0$.

Eventually, we obtained $\hat{P}_0 = 0.378$.

(24)

Now we introduce a qualitative feature ’A’ and obtain frequency counts.

1. Total frequency count of incidence of 'A' among the $N - \tilde{N}_0$ [= 200] HHs = 60. What is not known at this stage is the frequency count of incidence of 'A' among the $\tilde{N}_0$ [= 800] HHs not available on the evening of the survey.

2. We also have available a decomposition of the total count of 60 as follows: $\tilde{N}_1(A) = 20$; $\tilde{N}_2(A) = 30$; $\tilde{N}_3(A) = 5$; $\tilde{N}_4(A) = 3$ and $\tilde{N}_5(A) = 2$.

(25)

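3. Recall $\tilde{N}_0 = \tilde{N}_0(0) + \tilde{N}_0(1) + \tilde{N}_0(2) + \cdots + \tilde{N}_0(6)$, where $\tilde{N}_0(0)$ is the estimated number of HHs who belong to the 'no show' category.

Using the $\hat{P}_i$'s, we project and compute

- $\tilde{N}_0(1) = \tilde{N}_0 \times \hat{P}_1 = 800 \times 0.24 = 192$
- $\tilde{N}_0(2) = \tilde{N}_0 \times \hat{P}_2 = 800 \times 0.24 = 192$
- $\tilde{N}_0(3) = \tilde{N}_0 \times \hat{P}_3 = 800 \times 0.10 = 80$
- $\tilde{N}_0(4) = \tilde{N}_0 \times \hat{P}_4 = 800 \times 0.03 = 24$
- $\tilde{N}_0(5) = \tilde{N}_0 \times \hat{P}_5 = 800 \times 0.012 = 10$ [approx.]
- $\tilde{N}_0(6) = \tilde{N}_0 \times \hat{P}_6 = 0$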

(26)

4. Projecting $\tilde{N}_i(A)$, based on the $\tilde{N}_i$ HHs corresponding to Stratum $i$, to $[\tilde{N}_i(A)/\tilde{N}_i] \times [\tilde{N}_i + \tilde{N}_0(i)]$ for each $i = 1, 2, \ldots, 6$:

Cell i    Projection of $\tilde{N}_i(A)$ on the whole population
1         (20/40) × [40 + 192] = 116
2         (30/80) × [80 + 192] = 102
3         (5/50) × [50 + 80] = 13
4         (3/20) × [20 + 24] = 7 [approx.]
5         (2/10) × [10 + 10] = 4
6         ZERO

(27)

5. Summarization of data:

Cell i    Projected Cell Count    Projected Count of 'A'
0         302                     ?
1         232                     116
2         272                     102
3         130                     13
4         44                      7
5         20                      4

6. Now it is a matter of applying a suitable interpolation formula to obtain the missing value '?'.
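The projection steps above can be sketched in Python as follows. The first part reproduces the summary table from the example data; note that (2/10) × [10 + 10] evaluates to 4 for Cell 5. The last part fills in '?' with a least-squares straight line through the observed incidence rates, extrapolated to $i = 0$. That choice of "interpolation formula" is purely an assumption made here for illustration; the module does not specify one, and the observed rates are not even monotone in $i$, so the result should be read with caution.

```python
N = 1000
counts = {0: 800, 1: 40, 2: 80, 3: 50, 4: 20, 5: 10, 6: 0}   # N~_i
counts_A = {1: 20, 2: 30, 3: 5, 4: 3, 5: 2, 6: 0}            # N~_i(A)

# HPS estimates of the stratum proportions.
p_hat = {i: 6 * counts[i] / (i * N) for i in range(1, 7)}
p_hat[0] = 1 - sum(p_hat.values())

# Split the 800 no-shows among the strata: N~_0(i) = N~_0 * P^_i.
split = {i: counts[0] * p_hat[i] for i in range(7)}

# Projected cell counts and projected counts of 'A' for cells 1..6.
cell_total = {0: split[0]}
proj_A = {}
for i in range(1, 7):
    cell_total[i] = counts[i] + split[i]
    rate = counts_A[i] / counts[i] if counts[i] else 0.0
    proj_A[i] = rate * cell_total[i]

# ASSUMPTION (not from the module): fit a least-squares line to the observed
# rates N~_i(A) / N~_i over the non-empty cells and extrapolate to i = 0.
xs = [i for i in range(1, 7) if counts[i] > 0]
ys = [counts_A[i] / counts[i] for i in xs]
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
rate0 = min(1.0, max(0.0, y_bar - slope * x_bar))  # keep the rate in [0, 1]
proj_A[0] = rate0 * cell_total[0]

for i in range(6):
    print(i, round(cell_total[i]), round(proj_A[i]))
```

With this particular (assumed) extrapolation, the rate for the 'no show at all' cell comes out at about 0.51, i.e., roughly 155 of the 302 HHs; a different interpolation formula or a probability model would give a different value.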

(28)

BKS

Thanks, SSG, for your very clear thinking and understanding on this problem.

I will now leave it to you to figure out how one could implement an unequal probability sampling design here, assuming that the HH sizes are available in advance and that the larger the HH size, the smaller the chance of staying away from home!

(29)

Thank You
