Estimating the Population Mean in Two-Stage Sampling with Equal Size Clusters under Non-Response Using Auxiliary Characteristic

This paper studies the estimation of the population mean in two-stage sampling with equal size clusters under non-response using an auxiliary variable. It focuses on general families of factor-type estimators of the population mean in two cases: non-response observed on the study variable only, and non-response observed on both the study and auxiliary variables. The optimum properties of the proposed families are discussed for both cases. An empirical study is also carried out in support of the theoretical results.


INTRODUCTION
In cluster sampling, all the elements of the selected clusters are enumerated. Though cluster sampling is economical under certain circumstances, it is generally less efficient than sampling individual units directly. A compromise between cluster sampling and direct sampling of units can be achieved by selecting a sample of clusters and surveying only a sample of units in each sampled cluster instead of completely enumerating all the units in the sampled clusters. This procedure of first selecting clusters and then choosing a specified number of units from each selected cluster is known as two-stage sampling. The clusters, which form the sampling units at the first stage, are called first-stage units, and the elements or groups of elements within clusters, which form the sampling units at the second stage, are called second-stage units.
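The two-stage selection described above can be sketched as follows. This is only an illustrative sketch: the function names `two_stage_sample` and `two_stage_mean`, the list-of-lists data layout, and the use of Python's pseudo-random generator are assumptions made for the example, not part of the paper.

```python
import random


def two_stage_sample(population, n, m, seed=None):
    """Select n clusters by SRSWOR, then m units by SRSWOR within each.

    population: list of N clusters, each a list of M unit values
    (hypothetical layout chosen for this sketch).
    Returns a dict mapping selected cluster index -> list of m sampled values.
    """
    rng = random.Random(seed)
    # First stage: simple random sample of cluster indices without replacement.
    first_stage = rng.sample(range(len(population)), n)
    # Second stage: simple random sample of units within each selected cluster.
    return {i: rng.sample(population[i], m) for i in first_stage}


def two_stage_mean(sample):
    """Mean of the cluster sample means (equal size clusters, equal m)."""
    cluster_means = [sum(v) / len(v) for v in sample.values()]
    return sum(cluster_means) / len(cluster_means)
```

With n = N and m = M the sketch degenerates to complete enumeration, so `two_stage_mean` returns the population mean exactly; this is a convenient sanity check.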
To improve the efficiency of surveys, auxiliary information may be used in the sampling design or in the estimation of parameters. Auxiliary information has been used in several ways to improve the efficiency of estimators at the estimation stage in two-stage sampling. Sahoo and Panda (1997) defined a main class of estimators of the population total in two-stage sampling. Panda (1999a, 1999b) extended the results to the situation in which two auxiliary variables are available for estimating the population mean of the study variable. Srivastava and Garg (2009) proposed a general family of estimators of the population mean using multi-auxiliary information in two-stage sampling.
In the present paper, our aim is to present some new estimation strategies in a two-stage sampling design when the first-stage units are of equal size and information on an auxiliary characteristic is available. The population is assumed to suffer from non-response. We define two different families of estimators, based on the Factor-Type Estimators (FTE) proposed by Singh and Shukla (1987), assuming that (i) non-response occurs only on the study variable and (ii) non-response is observed on both the study and auxiliary variables. Properties of the defined strategies are discussed and illustrated with the help of empirical data.

SAMPLING STRATEGY AND ESTIMATION PROCEDURE UNDER NON-RESPONSE
Let us consider a population of size $NM$ divided into $N$ first-stage units (f.s.u's), each having $M$ second-stage units (s.s.u's). A sample of size $n$ is selected from the $N$ f.s.u's by simple random sampling without replacement (SRSWOR). From each selected f.s.u., a random sample of $m$ s.s.u's is drawn from the $M$ s.s.u's by SRSWOR. It is observed that non-response occurs at the second stage only and that, out of the $m$ s.s.u's, there are $m_{i1}$ respondent and $m_{i2}$ non-respondent units for the $i$-th f.s.u. ($i = 1, 2, \ldots, N$).
In the presence of non-response, using the Hansen and Hurwitz (1946) procedure of sub-sampling of non-respondents, we select a sub-sample of $h_{i2}$ units from the $m_{i2}$ non-respondent units by SRSWOR for the $i$-th f.s.u. ($i = 1, 2, \ldots, N$). It is observed that the estimator $T_{0HH}$ is an unbiased estimator of the population mean $\bar{X}_0$. Thus, the variance of $T_{0HH}$ can be obtained as:
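The sub-sampling estimator described in this section can be sketched in code. The sketch assumes the textbook Hansen-Hurwitz form, in which the mean for the i-th f.s.u. is (m_i1 × respondent mean + m_i2 × sub-sample mean) / m and the overall estimate is the average of these means over the n selected f.s.u's. The defining equation (2.2) is not shown in this text, so this form, the function names, and the tuple layout are all assumptions of the example.

```python
def hh_cluster_mean(resp_values, subsample_values, m_i2):
    """Hansen-Hurwitz mean for one f.s.u. (assumed textbook form).

    resp_values: values observed on the m_i1 respondents.
    subsample_values: values observed on the h_i2 sub-sampled non-respondents.
    m_i2: number of non-respondents among the m selected s.s.u's.
    """
    m_i1 = len(resp_values)
    m = m_i1 + m_i2
    resp_total = sum(resp_values)  # equals m_i1 times the respondent mean
    if m_i2 == 0:
        return resp_total / m
    if not subsample_values:
        raise ValueError("a non-empty sub-sample is needed when m_i2 > 0")
    sub_mean = sum(subsample_values) / len(subsample_values)
    # The sub-sample mean stands in for all m_i2 non-respondents.
    return (resp_total + m_i2 * sub_mean) / m


def hh_two_stage_mean(clusters):
    """Average of the per-cluster Hansen-Hurwitz means over the selected f.s.u's.

    clusters: iterable of (resp_values, subsample_values, m_i2) tuples
    (hypothetical layout chosen for this sketch).
    """
    means = [hh_cluster_mean(r, s, m2) for r, s, m2 in clusters]
    return sum(means) / len(means)
```

For instance, a cluster with respondents 1, 2, 3, two non-respondents and a single sub-sampled value 10 gives (6 + 2 × 10) / 5 = 5.2.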

PROPOSED FAMILIES OF ESTIMATORS
In order to get an improved estimate of the population mean, we utilize the auxiliary information. Let us suppose that $X_1$ is the auxiliary variable with population mean $\bar{X}_1$. Following Singh and Shukla (1987), a family of factor-type estimators for estimating the population mean $\bar{X}_0$ using auxiliary information can be defined as (if the population is free from non-response): where $\bar{x}_0$ and $\bar{x}_1$ are the sample mean estimators based on $nm$ s.s.u's for the study and auxiliary variables respectively and: (to be determined).
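As a rough illustration of how a factor-type family behaves, the sketch below assumes the commonly cited form of the Singh and Shukla (1987) estimator, T_a = ȳ [(A + C) X̄ + f B x̄] / [(A + f B) X̄ + C x̄], with A = (a − 1)(a − 2), B = (a − 1)(a − 4), C = (a − 2)(a − 3)(a − 4) and f = n/N. The displayed equation is missing from this text, so this form, the choice f = n/N, and the function names are assumptions rather than a transcription of the paper.

```python
def fte_constants(a):
    """Constants A, B, C of the assumed factor-type form."""
    A = (a - 1) * (a - 2)
    B = (a - 1) * (a - 4)
    C = (a - 2) * (a - 3) * (a - 4)
    return A, B, C


def factor_type_estimate(y_bar, x_bar, X_bar, a, f):
    """Factor-type estimate of the study-variable mean (assumed form).

    y_bar, x_bar: sample means of the study and auxiliary variables.
    X_bar: known population mean of the auxiliary variable.
    a: family parameter; f: sampling fraction (assumed to be n/N).
    """
    A, B, C = fte_constants(a)
    numerator = (A + C) * X_bar + f * B * x_bar
    denominator = (A + f * B) * X_bar + C * x_bar
    return y_bar * numerator / denominator
```

Under this assumed form, a = 1 reduces to the ratio estimator ȳ X̄ / x̄, a = 2 to the product estimator ȳ x̄ / X̄, and a = 4 to the plain sample mean ȳ, which is consistent with the special cases mentioned in the text.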
We shall now consider two different sampling strategies under non-response utilizing the concept of FTE.
(i) Non-response is present on the study variable and information on the auxiliary variable is obtained for all selected s.s.u's.
(ii) Non-response is present on both the study and auxiliary variables at the second stage.

Non-Response on Study Variable only
In this case, the family of factor-type estimators for estimating the population mean $\bar{X}_0$ can be defined as: where $T_{0HH}$ is given by (2.2). For $a = 1$, the family gives a ratio-type estimator in two-stage sampling in presence of non-response.

Properties of Proposed Family
In order to find the bias and mean square error (MSE) of $T_a^*$, we use large sample approximation. Let us assume that: Thus, the bias of the family $T_a^*$ up to the first order of approximation is given as: Since $T_a^*$ gives a biased estimate of $\bar{X}_0$, we can obtain the MSE of $T_a^*$ up to the first order of approximation as: (2.7)
Mathematical Journal of Interdisciplinary Sciences, Volume 2, Number 1, September 2013

Optimum Value of a
In order to find the optimum estimator of the proposed family, we minimize $M(T_a^*)$ with respect to $a$. On differentiating $M(T_a^*)$ with respect to $a$ and equating the derivative to zero, we get: where $V$, being a function of parameters of the population, is a constant for a given population. The above expression is a cubic equation in $a$ and, therefore, on solving, we may get at most three real and positive optimum values of the parameter $a$ for which $M(T_a^*)$ would attain its minimum.
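The search for optimum values of a can be sketched numerically. The optimality equation itself is missing from this text, so the sketch assumes the condition usually associated with factor-type estimators, φ(a) = V with φ(a) = (C − f B) / (A + f B + C), where A, B, C are the factor-type constants and f = n/N. Clearing the denominator gives a cubic in a, which is solved here by a grid scan followed by bisection. The function names, the search interval, and the form of φ are assumptions of this example.

```python
def _abc(a):
    """Constants A, B, C of the assumed factor-type form."""
    return ((a - 1) * (a - 2),
            (a - 1) * (a - 4),
            (a - 2) * (a - 3) * (a - 4))


def phi(a, f):
    """Assumed phi(a) = (C - f*B) / (A + f*B + C)."""
    A, B, C = _abc(a)
    return (C - f * B) / (A + f * B + C)


def cubic(a, f, V):
    """Cubic in a whose roots satisfy phi(a) = V (denominator cleared)."""
    A, B, C = _abc(a)
    return (C - f * B) - V * (A + f * B + C)


def optimum_a(f, V, lo=0.0, hi=50.0, steps=5000, tol=1e-12):
    """Real roots of the cubic in [lo, hi], found by scanning and bisection."""
    roots = []
    width = (hi - lo) / steps
    for k in range(steps):
        left, right = lo + k * width, lo + (k + 1) * width
        g_left, g_right = cubic(left, f, V), cubic(right, f, V)
        if g_left == 0.0:
            roots.append(left)  # grid point happens to be an exact root
            continue
        if g_left * g_right < 0:  # sign change brackets a root
            while right - left > tol:
                mid = 0.5 * (left + right)
                if cubic(left, f, V) * cubic(mid, f, V) <= 0:
                    right = mid
                else:
                    left = mid
            roots.append(0.5 * (left + right))
    return roots
```

Calling `optimum_a(0.4, 0.8564)`, with f = 10/25 from the empirical study and the reported value φ(a) = 0.8564 used as V, returns three positive roots under these assumptions, in line with the remark that the cubic may have up to three real and positive solutions.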

Non-Response on Both Study and Auxiliary Variables
If the auxiliary variable is also subject to non-response, then, as for the study variable, an unbiased estimator of the population mean $\bar{X}_1$, based on the sub-samples of non-respondents, may be defined as:

For $a = 4$, the estimator $T_a^{(*)}$ reduces to $T_{0HH}$, which is the same as $T_4^*$ defined in sub-section 3.1. For other choices of $a$, similarly, one may get non-response versions of some existing estimators defined in two-stage sampling utilizing an auxiliary characteristic.

Properties of the Proposed Family
We use large sample approximation for finding the bias and MSE of $T_a^{(*)}$. These results are due to Singh (1998). Further, $S_{1i2}^2$ is the mean square of the non-response group for the auxiliary variable in the $i$-th f.s.u. and where $\bar{X}_{0i2}$ and $\bar{X}_{1i2}$ are respectively the means of the study and auxiliary variables of the non-response group for the $i$-th f.s.u. Thus, the bias up to the first order of approximation can be obtained as: (2.13) Now, the MSE up to the first order of approximation can be obtained as:

EMPIRICAL STUDY
In order to understand the applications of the results obtained in this paper and to observe the behavior of the estimators, it is essential to illustrate what has been discussed in the previous sections with some empirical data. However, due to the non-availability of suitable empirical data for the purpose, we have considered fictitious data generated in the following manner: we selected 25 bunches of 20 random numbers each from a random number table (Rao et al., 1966), each bunch forming a cluster. The four-digit random numbers were converted into two-digit numbers by placing the decimal point after two digits so as to reduce the magnitudes of the numbers. The 500 numbers so selected were assumed to be the values of the study variable in a population of 25 clusters of size 20 each. The same procedure was repeated in order to generate the corresponding values of the auxiliary variable in the population. Thus, we have $N = 25$, $M = 20$, $NM = 500$.
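A population with the same layout can be generated along the following lines. This sketch substitutes Python's pseudo-random generator for the printed random number table, so it does not reproduce the paper's data; the function name `make_population` and the seeds are assumptions of the example.

```python
import random


def make_population(N=25, M=20, seed=0):
    """Build N clusters of M values each from four-digit random numbers.

    Each four-digit number (0000 to 9999) is turned into a value with two
    digits before the decimal point, e.g. 1234 -> 12.34.
    A pseudo-random generator replaces the random number table here.
    """
    rng = random.Random(seed)
    return [[rng.randint(0, 9999) / 100.0 for _ in range(M)]
            for _ in range(N)]


# Study and auxiliary variables are generated by repeating the same
# procedure; different seeds stand in for different parts of the table.
study = make_population(seed=1)
auxiliary = make_population(seed=2)
```

Note that generating the two variables independently in this way does not build in any particular correlation between them; the sketch only mirrors the data layout described above.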
Taking $m = 5$ and $n = 10$, we have illustrated the results. Table 1 shows values of some of the parameters of clusters in the population. From the above values of the parameters, equation (2.9) gives $\phi(a) = 0.8564$ for the estimator $T_a^*$. Table 2 shows the values of $M(T_a^*)$ for $a = 1, 4$ and $a_{opt}$ for $W_{i2} = 0.1\,(0.1)\,0.4$ in each cluster.
Table: Values of $M(\cdot)$ for $a = 1, 4$ and $a_{opt}$ with $W_{i2} = 0.1\,(0.1)\,0.4$ and $k_i = 2\,(0.5)\,3.5$ for all $i$.

CONCLUSIONS
We have suggested two different general families of factor-type estimators for estimating the population mean in two-stage sampling with equal size clusters under non-response using an auxiliary variable. The suggested families can generate a number of well-known estimators for different choices of $a$. A comparison of the values of the MSE of the estimator $T_a^*$ in Table 2 reveals that, for a given value of the parameter $a$, the MSE increases with an increase in the non-response rate and also with a smaller size of the sub-samples of non-respondents. This result is also intuitively expected. Further, the same trend is exhibited in Table 4 for the estimator $T_a^{(*)}$. In both cases, the MSEs of the estimators for the optimum $a$ are slightly smaller than those obtained for $a = 1$, implying that ratio-type estimators are almost as precise as the optimum estimator. A comparison of the values obtained in Table 4 makes it clear that the efficiency of the strategy is almost unaffected by the size of the sub-sample of non-respondents, whatever the non-response rate. This might be due to the assumption of large sample approximation while deriving the MSE of the estimator.