Estimating Finite Population Mean Using Ranked Set Two-Stage Sampling Design

Estimation of Finite Population Mean Using Ranked Set Two-stage Sampling Design. By U C Sud and Dwidesh Mishra, IASRI, New Delhi-110012

Introduction • The method of Ranked Set Sampling (RSS) was first introduced by McIntyre (1952) as a cost-efficient alternative to simple random sampling for situations where outside information is available that allows one to rank small sets of sampling units according to the character of interest without actually quantifying the units. • McIntyre was concerned with estimating agricultural yields, where the ranking could be done on the basis of visual inspection. • One of the strengths of the method, however, is that its implementation and performance require only that ranking be possible; they do not depend in any way on how the ranking is accomplished.

The Method of RSS • A basic cycle of the method involves the random selection of m² units from the population. These units are randomly partitioned into m subsets, each containing m sampling units. The members of every subset are ranked according to the character of interest. • Then the lowest ranked member is quantified from the first set, the second lowest ranked member is quantified from the second set, and so on until the highest ranked member of the last set is quantified. • This yields m quantifications from among the m² selected units. Since m is usually taken as small in order to facilitate the ranking, there may not be enough measurements for reasonable inference, so the basic cycle is repeated r times to give n = mr quantifications out of m²r selected units.

Let us take a set-size m=3 with r=4 • Then the sampling scheme can be shown by the following diagram • Here each row indicates a judgement ordered sample for each cycle. Encircled units are quantified. Out of 36 units drawn, 12 units have been quantified
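The selection scheme for m=3 and r=4 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' procedure: the population values, the function name `ranked_set_sample` and the assumption of perfect ranking (sorting on the true values) are all hypothetical, and each cycle is drawn independently of the others.

```python
import random

def ranked_set_sample(population, m, r, rng):
    """Draw one ranked set sample: r cycles, m sets per cycle, m units per set."""
    quantified = []
    for _cycle in range(r):
        # m*m units are drawn for the cycle; rng.sample returns them in random
        # order, so slicing gives a random partition into m sets of size m.
        units = rng.sample(population, m * m)
        sets = [units[i * m:(i + 1) * m] for i in range(m)]
        for i, one_set in enumerate(sets):
            ranked = sorted(one_set)       # perfect ranking is assumed here
            quantified.append(ranked[i])   # i-th lowest unit from the i-th set
    return quantified

rng = random.Random(0)
population = [rng.gauss(50.0, 10.0) for _ in range(1000)]  # hypothetical values
sample = ranked_set_sample(population, m=3, r=4, rng=rng)
print(len(sample), "units quantified out of", 3 * 3 * 4, "units drawn")
```

Only the units appended to `quantified` would need to be measured in practice; the remaining units are used for ranking alone.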

Contd. • Let X11, X12,…, X1m, X21,…, X2m,…, Xm1,…, Xmm be independent random variables all having the same cumulative distribution function F(x). Also let Xi(1), Xi(2),…, Xi(m) denote the corresponding order statistics of Xi1, Xi2,…, Xim (i=1,2,…,m). • Then X1(1), X2(2),…, Xm(m) is the ranked set sample (considering one cycle only), since Xi(i) is the i-th order statistic in the i-th sample. • The values Xij for the randomly drawn units can be arranged as in the following diagram:

Contd. • After ranking the units appear as: The quantified units appear as

Examples • RSS is very useful in environmental and ecological sampling where exact measurement (or quantification) of a selected unit is either difficult or expensive in terms of time, money or labor, but where ranking of a small set of selected units according to the characteristic of interest can be done with reasonable success on the basis of visual inspection or other rough method not requiring actual measurement. • Thus if the interest lies in estimating the mean height of the sampled trees, then measurement of the height of the trees could pose a problem, but it would be relatively easy to rank small sets of trees on the basis of visual inspection. • In situations where visual inspection is not directly available ranking can be done on the basis of a covariate that is more accessible and also correlated with the character of interest. • Thus for estimating volume of trees one can carry out ranking on the basis of diameter of the trees.

Theory of RSS • Performance of the RSS estimator is generally benchmarked against that of the simple random sampling (SRS) estimator with the same number of quantifications. For this purpose, one may employ either the relative precision, $RP = \mathrm{Var}(\bar{X}_{SRS}) / \mathrm{Var}(\bar{X}_{RSS})$, • or the relative savings, $RS = 1 - 1/RP$. • There was little follow-up on McIntyre's (1952) proposal until the late 1960s, when Halls and Dell (1966) published a field evaluation and Takahasi and Wakimoto (1968) developed the statistical theory for the RSS method. When sampling is from a continuous population and the ranking is perfect, Takahasi and Wakimoto proved that the RSS mean $\bar{X}_{RSS}$ is unbiased for the population mean $\mu$ and is at least as efficient as the SRS mean $\bar{X}_{SRS}$.
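The comparison with SRS can be illustrated by a small Monte Carlo experiment. The sketch below assumes a standard normal population, perfect ranking, and the usual definitions of relative precision (ratio of the SRS variance to the RSS variance for the same number of quantifications) and relative savings (one minus the reciprocal of the relative precision); the function names are hypothetical.

```python
import random
import statistics

def srs_mean(rng, n):
    """Mean of a simple random sample of n values from N(0, 1)."""
    return statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n))

def rss_mean(rng, m, r):
    """Mean of a ranked set sample with set size m and r cycles from N(0, 1)."""
    values = []
    for _ in range(r):
        for i in range(m):
            ranked = sorted(rng.gauss(0.0, 1.0) for _ in range(m))
            values.append(ranked[i])  # i-th order statistic from a fresh set
    return statistics.fmean(values)

def relative_precision(m, r, reps, seed=1):
    """Monte Carlo estimate of Var(SRS mean) / Var(RSS mean), both with m*r values."""
    rng = random.Random(seed)
    var_srs = statistics.pvariance([srs_mean(rng, m * r) for _ in range(reps)])
    var_rss = statistics.pvariance([rss_mean(rng, m, r) for _ in range(reps)])
    return var_srs / var_rss

rp = relative_precision(m=3, r=4, reps=20000)
rs = 1.0 - 1.0 / rp
print("relative precision:", round(rp, 2), "relative savings:", round(rs, 2))
```

A relative precision above 1 means that the RSS mean has the smaller variance for the same number of quantified units.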

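Contd. • They also obtained the variance of the RSS estimator as $\mathrm{Var}(\bar{X}_{RSS}) = \frac{1}{mr}\left[\sigma^2 - \frac{1}{m}\sum_{i=1}^{m}\left(\mu_{(i:m)} - \mu\right)^2\right]$ • where $\sigma^2$ is the population variance, $\mu$ is the population mean and $\mu_{(i:m)}$ is the expected i-th out of m order statistic from the population. They also established the bound on the relative precision (RP), $1 \le RP \le (m+1)/2$, • or, in terms of the relative savings (RS), $0 \le RS \le (m-1)/(m+1)$. • The upper bound indicates that ranked set sampling can result in very substantial savings when compared with simple random sampling. Specifically, the method can result in savings in the number of quantifications by as much as 33, 50, 60, 67 percent when m=2, 3, 4, 5 respectively.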
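The quoted savings figures are consistent with an upper bound of (m-1)/(m+1) on the relative savings, which is the form assumed in the short check below; the function name is hypothetical.

```python
def max_relative_savings(m):
    """Upper bound (m - 1) / (m + 1) on the relative savings for set size m."""
    return (m - 1) / (m + 1)

for m in (2, 3, 4, 5):
    # Prints the bound as a whole-number percentage for each set size.
    print(m, round(100 * max_relative_savings(m)))
```

The bound grows slowly with m, which is one reason small set sizes are common in practice: ranking also becomes harder as m grows.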

Review • Stokes (1977) considered the use of a concomitant variable at the estimation stage in the context of RSS • Stokes (1980) dealt with the problem of estimation of population variance • Dell and Clutter (1972) considered the problem of ranking errors • Yu and Lam (1997) developed a regression estimator for RSS

RSS in the Context of Finite Population Sampling • Early developments in RSS were concerned with sampling from an infinite population. • Patil et al. (1994) were the first to consider the situation of sampling from a finite population. • Explicit expressions were obtained for the variance of the RSS estimator and for its precision relative to that of simple random sampling without replacement. • Krishna (2002) extended the theory of RSS to the case of sampling from a finite population by utilising a Horvitz-Thompson estimator for the estimation of the finite population mean. • Calculation of the quantities required by this estimator is tedious.

RSS for Two-Stage Sampling Designs However, the contributions made by Patil et al. (1994) and Krishna (2002) were limited to the case of uni-stage sampling designs. Three different cases have been studied. In the first case, SRS is used at the first stage of sampling and RSS at the second stage. In the second case, RSS is used at the first stage and SRS at the second stage. In the third case, RSS is used at both stages of sampling. In each case, the efficiency of the RSS based estimators has been compared, with the help of real data, against that of the SRS based estimators obtained when the sampling is SRS at both stages. Let there be a finite population of N primary stage units (psu's), each of size M. Let $y_{ab}$ be the value of the b-th secondary stage unit (ssu) of the a-th psu, a = 1, 2, …, N; b = 1, 2, …, M.

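Contd. With $y_{ab}$ the value of the b-th ssu of the a-th psu, $\bar{Y}_a = \frac{1}{M}\sum_{b=1}^{M} y_{ab}$ = mean per ssu in the a-th psu, and $\bar{Y} = \frac{1}{N}\sum_{a=1}^{N} \bar{Y}_a$ = population mean. Case 1: SRS at first stage and RSS at second stage Let a sample of n psu's be drawn from the N psu's by SRSWOR. Also, let a set of size m be selected at random and without replacement from the M ssu's of a selected psu using RSS. Without any loss of generality we assume that the ssu's within each psu are arranged in increasing order, $y_{a1} \le y_{a2} \le \dots \le y_{aM}$.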

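Case 1: SRS at first stage and RSS at second stage • Define the event $A_{s}^{k}$ such that the k-th ranked unit in the subset is the s-th ranked unit in the population of ssu's. Also write $p_{s}^{k} = P(A_{s}^{k})$, and let $P_{k}$ denote the M-dimensional column vector having $p_{s}^{k}$ as its s-th component, s = 1, 2, …, M.

Contd. It may be noted that $p_{s}^{k}$ is given by $p_{s}^{k} = \binom{s-1}{k-1}\binom{M-s}{m-k} \Big/ \binom{M}{m}$. If $y_{a(k)}$ is the quantification of the k-th ranked unit from the set, then $E(y_{a(k)}) = \sum_{s=1}^{M} y_{as}\, p_{s}^{k} = y_a' P_k$, where $y_a = (y_{a1}, \dots, y_{aM})'$ is the vector of ssu values in the a-th psu.

Contd. $\mathrm{Var}(y_{a(k)}) = y_a^{(2)\prime} P_k - (y_a' P_k)^2$

Contd. where $y_a^{(2)}$ is the component-wise square of $y_a$. Next, we study the joint distribution of the order statistics from two disjoint sets. Let two disjoint sets, each of size m, be drawn without replacement from the M ssu's. Write $A_{st}^{kj}$ for the event that the k-th ranked unit from set 1 has rank s and the j-th ranked unit from set 2 has rank t in the population of size M. We define $p_{st}^{kj} = P(A_{st}^{kj})$.

Contd. Following Patil et al. (1994), the probabilities $p_{st}^{kj}$ can be written explicitly. Let $P_{kj}$ be the M × M matrix with $p_{st}^{kj}$ as its (s,t)th component. Notice that $P_{kj}' = P_{jk}$, since $p_{st}^{kj} = p_{ts}^{jk}$. Let $y_{a(k)}$ and $y_{a(j)}$ be the quantifications of the k-th and j-th ranked units from set 1 and set 2, respectively. Then $E(y_{a(k)}\, y_{a(j)}) = y_a' P_{kj}\, y_a$.

Contd. The covariance between $y_{a(k)}$ and $y_{a(j)}$ is given by $\mathrm{Cov}(y_{a(k)}, y_{a(j)}) = y_a' \left(P_{kj} - P_k P_j'\right) y_a$.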
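The single-set rank probabilities used in Case 1 can be checked numerically. The sketch below assumes the standard result that, for a subset of size m drawn by SRSWOR from M ordered units, the k-th ranked unit of the subset has population rank s with probability C(s-1, k-1) C(M-s, m-k) / C(M, m). It compares the resulting mean and variance with a brute-force enumeration of all subsets. The ssu values and function names are hypothetical.

```python
from itertools import combinations
from math import comb

def rank_prob(s, k, M, m):
    """P(k-th ranked unit of an SRSWOR subset of size m has population rank s)."""
    # math.comb returns 0 when the lower index exceeds the upper one, so
    # impossible (s, k) pairs get probability 0 automatically.
    return comb(s - 1, k - 1) * comb(M - s, m - k) / comb(M, m)

def moments_by_enumeration(y, k, m):
    """Mean and variance of the k-th order statistic over all subsets of size m."""
    values = [sorted(subset)[k - 1] for subset in combinations(y, m)]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var

y = [2.0, 3.5, 4.0, 7.0, 9.5]  # hypothetical ssu values in increasing order
M, m = len(y), 3
for k in range(1, m + 1):
    P_k = [rank_prob(s, k, M, m) for s in range(1, M + 1)]
    mean = sum(p * v for p, v in zip(P_k, y))               # y' P_k
    var = sum(p * v * v for p, v in zip(P_k, y)) - mean ** 2  # y2' P_k - (y' P_k)^2
    print(k, mean, var, moments_by_enumeration(y, k, m))
```

Summing the rank probabilities over k gives m/M for every s, which is why averaging the m ranked quantifications reproduces the mean of the M values in expectation.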

Contd. Let mr sets, each of size m, be selected randomly and without replacement from the a-th psu using RSS. Let the lowest ranked unit be quantified in each of the first r sets. In each of the next r sets, the second ranked unit is quantified. This process continues until the highest ranked unit is quantified in each of the last r sets.
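The Case 1 selection scheme can be simulated as follows. This is only a sketch under stated assumptions: the estimator is taken to be the simple average, over the n sampled psu's, of the mean of the mr quantified values in each psu, which is an assumed form and not necessarily the authors' exact expression. The population values and function names are hypothetical, and ranking is assumed to be perfect.

```python
import random
import statistics

def rss_within_psu(ssu_values, m, r, rng):
    """Quantify m*r units from one psu using m*r disjoint sets of size m."""
    units = rng.sample(ssu_values, m * m * r)  # drawn without replacement
    sets = [units[i * m:(i + 1) * m] for i in range(m * r)]
    quantified = []
    for idx, one_set in enumerate(sets):
        k = idx // r  # first r sets give rank 1, the next r give rank 2, ...
        quantified.append(sorted(one_set)[k])
    return quantified

def case1_estimate(population, n, m, r, rng):
    """Average of psu-level RSS means over an SRSWOR sample of n psu's (assumed form)."""
    psus = rng.sample(population, n)
    return statistics.fmean(
        statistics.fmean(rss_within_psu(psu, m, r, rng)) for psu in psus
    )

rng = random.Random(2)
# Hypothetical population: N = 9 psu's, each with M = 4 ssu's.
population = [[10.0 + a + 0.5 * b * b for b in range(4)] for a in range(9)]
true_mean = statistics.fmean(v for psu in population for v in psu)
estimates = [case1_estimate(population, n=3, m=2, r=1, rng=rng) for _ in range(20000)]
print("population mean:", true_mean, "average estimate:", statistics.fmean(estimates))
```

With M = 4, m = 2 and r = 1 every ssu of a selected psu is used for ranking, but only two of the four are quantified.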

Contd. Theorem 1. The estimator is unbiased, and its variance is given by

Proof of the results The matrix T is symmetric with zeroes on the diagonal. A program was written in Turbo C to calculate T. Proof: To prove that the estimator is unbiased, we proceed as follows:

Contd.

Contd. After centering

Contd.

Case 2: RSS at first stage and SRS at second stage Assume that a sample of size ‘m’ is selected by SRSWOR from the a-th psu, a=1,2,…,N. Further, we assume that a set of size ‘n’ is selected from ‘N’ by RSS. Also, as in Case 1, we assume that the psu’s are increasingly arranged. Define the event such that the a-th ranked unit in the subset is the s-th ranked unit in the population of psu’s. Define be the row vector having

Contd. as its s-th component s=1,2,…,N; a=1,2,…,n = sample mean for the a-th psu.

Contd. To study the joint distribution of the order statistics from two disjoint sets, each of size ‘n’, drawn without replacement using RSS, let be the event that the a-th ranked unit from set 1 has rank s in the population and the c-th ranked unit from set 2 has rank t in the population.

Contd. Let and be the quantification of the a-th and c-th ranked units from set 1 and set 2, respectively. Then , Moments of the estimator of population mean: Let nr sets each of size n be selected randomly and without replacement from a population of N psu’s. Let the lowest ranked unit be quantified in each of the first r sets

Contd. Similarly, in each of the next r sets, the second ranked unit is quantified to give This process continues until the highest ranked unit is quantified in each of the last r sets: Thus, the proposed estimator of population mean, when the sample at the first stage is selected by RSS and at the second stage by SRS, is given by

Case 3: RSS at both the stages On the same lines as in Case 1, it can be shown that the estimator is unbiased and that its variance is the sum of two components.

3. Empirical Study For the purpose of comparing the RSS and the SRS based estimators, an empirical study was carried out wherein a part of the data on wheat crop for an experimental station, as given in Singh et al. (1979), was taken (Set I). The data comprised 9 fields, each field having 4 plots. (The population values of the two parameters were 4.163 and 0.306 respectively.) For the RSS protocol, plots in each field were ranked according to the perceived weight of wheat yield. Using these data, estimators of the population mean based on RSS and SRS were considered for the three cases dealt with earlier.

Another data set, given in Singh and Mangat (1996), on outstanding loans of farmers affiliated to cooperatives was utilized to compare the performance of RSS and SRS based estimators (Set II). (The population values of the two parameters were 38.05 and 11.23 respectively.) The data comprised 9 blocks and 4 societies in each block. Finally, data on the number of persons in a household, given in Raj (1971), were also utilized to compare the performance of RSS and SRS based estimators (Set III). (The population values of the two parameters were 7052 and 0.093 respectively.) Here also the data comprised 9 households and 4 persons in a household.

Table 2.1 Per cent gain in precision of RSS based estimators over SRS based estimators

References: • Dell, T.R. and Clutter, J.L. (1972). Ranked set sampling theory with order statistics background. Biometrics, 28, 545-553. • Halls, L.K. and Dell, T.R. (1966). Trial of ranked set sampling for forage yields. Forest Science, 12, 22-26. • Krishna, Pravin (2002). Some aspects of ranked set sampling from finite population. M.Sc. Thesis, I.A.R.I., New Delhi-12. • McIntyre, G.A. (1952). A method of unbiased selective sampling using ranked sets. Australian Journal of Agricultural Research, 3, 385-390. • Patil, G.P., Sinha, A.K. and Taillie, C. (1993). Ranked set sampling from a finite population in the presence of a trend on a site. Journal of Applied Statistical Science, 1(1), 51-65. • Patil, G.P., Sinha, A.K. and Taillie, C. (1994). Ranked set sampling. Handbook of Statistics, 12 (eds. Patil, G.P. and Rao, C.R.), 167-198, North-Holland, Amsterdam. • Patil, G.P., Sinha, A.K. and Taillie, C. (1995). Finite population corrections for ranked set sampling. Annals of the Institute of Statistical Mathematics, 47(4), 621-636.

• Raj, D. (1971). The Design of Sample Surveys. McGraw-Hill Book Co., New York. • Singh, D., Singh, P. and Kumar, P. (1979). Hand Book on Sampling Methods. Indian Agricultural Statistics Research Institute, New Delhi. • Singh, R. and Mangat, N.P.S. (1996). Elements of Survey Sampling. Kluwer Academic Publishers, pp 388. • Stokes, S.L. (1977). Ranked set sampling with concomitant variables. Communications in Statistics, Theory and Methods, 6, 1207-1211. • Stokes, S.L. (1980). Estimation of variance using judgement ordered ranked set samples. Biometrics, 36, 35-42.

• Takahasi, K. and Wakimoto, K. (1968). On unbiased estimates of the population mean based on the sample stratified by means of ordering. Annals of the Institute of Statistical Mathematics, 20, 1-31. • Yu, Philip L.H. and Lam, K. (1997). Regression estimator in ranked set sampling. Biometrics, 53, 1070-1080.

THANKS
