A. Demnati and J. N. K. Rao Statistics Canada / Carleton University

Linearization Variance Estimators for Survey Data: Some Recent Work
A Presentation at the Third International Conference on Establishment Surveys, June 18-21, 2007, Montréal, Québec, Canada
June 20, 2007

Situation
Looking for a method of variance estimation that
• is simple
• is widely applicable
• has good properties
• provides unique choice
for estimators
• of nonlinear finite population parameters (SM, 2004)
• defined explicitly or implicitly (SM, 2004)
• using calibration weights (SM, 2004)
• under missing data (JSM, 2002 and JMS, 2002)
• using repeated surveys (FCSM, 2003)
• of model parameters (Symposium, 2005)
• of dual frames (JSM, 2007)

Demnati–Rao Approach
• General formulation
  • Finite population parameters
  • Model parameters
• Estimator for both parameters
• Variance estimators associated with the finite population parameter and with the model parameter are different

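Demnati–Rao Approach (Survey Methodology, 2004)
• Write the estimator of a finite population parameter as $\hat{\theta} = f(d)$, with $d = (d_1, \dots, d_N)^T$ and
  $d_k = 0$ if element k is not in sample s;
  $d_k = 1/\pi_k$ if element k is in sample s,
  where $\pi_k$ is the inclusion probability of element k.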

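Demnati–Rao Approach (Survey Methodology, 2004)
• A linearization sampling variance estimator is given by $\vartheta_L(\hat{\theta}) = \vartheta(\tilde{z})$, with $\tilde{z}_k = \partial f(b) / \partial b_k \,|_{b = d}$
• $\vartheta(y)$: variance estimator of the H-T estimator $\hat{Y} = \sum_k d_k y_k$ of the total $Y$
• $b$ is an (N×1) vector of arbitrary real numbers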

Demnati–Rao Approach (Survey Methodology, 2004)
• Example – Ratio estimator of
For SRS and

Demnati–Rao Approach (Survey Methodology, 2004)
• Example – Ratio estimator of
• is a better choice over customary
  • Royall and Cumberland (1981)
  • Särndal et al. (1989)
  • Valliant (1993)
  • Binder (1996)
  • Skinner (2004)
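The ratio example can be checked numerically. The sketch below is illustrative only and rests on stated assumptions: the ratio estimator is taken to be X·Ŷ/X̂, the linearization variable is taken to be the derivative of f(b) with respect to b_k evaluated at b = d, and the "customary" estimator is taken to be the usual Taylor form built from the residuals y_k − R̂x_k. The population, seed, and function names are hypothetical and do not come from the slides. Only sampled units are carried, since units with d_k = 0 do not enter the SRS variance estimator.

```python
import random

def ratio_estimator(b, y, x, x_total):
    """f(b): ratio estimator of the total of y, written as a function of weights b."""
    y_hat = sum(bk * yk for bk, yk in zip(b, y))
    x_hat = sum(bk * xk for bk, xk in zip(b, x))
    return x_total * y_hat / x_hat

def linearization_variables(f, d, eps=1e-6):
    """z_k = df(b)/db_k at b = d, approximated by central differences."""
    z = []
    for k in range(len(d)):
        up, dn = list(d), list(d)
        up[k] += eps
        dn[k] -= eps
        z.append((f(up) - f(dn)) / (2 * eps))
    return z

def srs_variance_of_total(z, N):
    """Variance estimator of the H-T total of z under SRS without replacement."""
    n = len(z)
    z_bar = sum(z) / n
    s2 = sum((zk - z_bar) ** 2 for zk in z) / (n - 1)
    return N ** 2 * (1 - n / N) * s2 / n

# Hypothetical population (not the Hospitals data used in the talk).
random.seed(1)
N, n = 400, 40
x_pop = [random.uniform(10, 100) for _ in range(N)]
y_pop = [2.0 * xk + random.gauss(0, 1) * xk ** 0.5 for xk in x_pop]
x_total = sum(x_pop)

sample = random.sample(range(N), n)
x = [x_pop[k] for k in sample]
y = [y_pop[k] for k in sample]
d = [N / n] * n  # SRS design weights of the sampled units

# Numerical linearization variables and their closed form for the ratio estimator.
z = linearization_variables(lambda b: ratio_estimator(b, y, x, x_total), d)
x_hat = sum(dk * xk for dk, xk in zip(d, x))
y_hat = sum(dk * yk for dk, yk in zip(d, y))
r_hat = y_hat / x_hat
z_exact = [(x_total / x_hat) * (yk - r_hat * xk) for xk, yk in zip(x, y)]

# Variance estimate from z, and the assumed customary version from residuals.
residuals = [yk - r_hat * xk for xk, yk in zip(x, y)]
v_dr = srs_variance_of_total(z, N)
v_customary = srs_variance_of_total(residuals, N)
print(v_dr, v_customary, (x_total / x_hat) ** 2)
```

Under these assumptions the two variance estimates differ exactly by the factor (X/X̂)², which is the kind of adjustment the cited references discuss for the ratio estimator.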

Demnati–Rao Approach
Also in Survey Methodology, 2004:
• Calibration Estimators:
  • the GREG Estimator
  • the “Optimal” Regression Estimator
  • the Generalized Raking Estimator
• Two-Phase Sampling
New Extensions:
• Wilcoxon Rank-Sum Test
• Cox Proportional Hazards Model

Model parameters (Symposium, 2005)
• Finite population assumed to be generated from a superpopulation model
• Inference on model parameter
• Total variance of :
  : model expectation and variance
  : design expectation and variance
  i) if f ≈ 0 then
  ii) if f ≈ 1 then
  where f is the sampling fraction. For multistage sampling, the psu sampling fraction plays the role of f.
  In case i),

Example: Ratio estimator when y is assumed to be random
• for
• Define
• We have where $A_d$ is a 2×N matrix of random variables with kth column:
• We get where $A_b$ is a 2×N matrix of arbitrary real numbers with kth column:
  where is an estimator of the total variance of

Estimator of the total variance of and when • A variance estimator of is given by with where Note that is an estimator of model covariance when and when

Hence = model variance + sampling variance where and • Under SRS, where

Under ratio model,
Note: remains valid under misspecification of
• Hence,
Note: g-weight appears automatically in and the finite population correction 1 − n/N is absent in

Simulation 1: Unconditional performance
• We generated R=2,000 finite populations, each of size N=393, from the ratio model, where the errors are independent observations generated from N(0,1) and the x-values are the “number of beds” for the Hospitals population studied in Valliant, Dorfman, and Royall (2000, pp. 424-427)
• One simple random sample of specified size n is drawn from each generated population
• Parameter of interest:

Simulation 1: Unconditional performance
• Ratio estimator:
• We calculated:
  • Simulated
  • and its components and

Simulation 1: Unconditional performance
Figure 1: Averages of variance estimates for selected sample sizes compared to simulated MSE of the ratio estimator.

Simulation 2: Conditional performance
• We generated R=20,000 finite populations, each of size N=393, from the ratio model using the number of beds as x
• One simple random sample of size n=100 is drawn from each generated population
• Parameter of interest:
• We arranged the 20,000 samples in ascending order of -values and then grouped them into 20 groups each of size 1,000
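The grouping step of this conditional analysis can be sketched in a few lines. The code below is a scaled-down illustration under stated assumptions, not the simulation from the talk: the x-values are a synthetic stand-in for the Hospitals "number of beds", the model is assumed to be y = βx + √x·ε with ε from N(0,1), the samples are assumed to be ordered by their sample mean of x, the parameter is taken to be the population total of y, and R is reduced to 2,000 (20 groups of 100). All names are hypothetical.

```python
import random

def group_by_x_bar(records, n_groups):
    """Sort simulated samples by their sample mean of x and split into equal groups."""
    ordered = sorted(records, key=lambda r: r["x_bar"])
    size = len(ordered) // n_groups
    return [ordered[g * size:(g + 1) * size] for g in range(n_groups)]

def relative_bias(group, key):
    """Average error of the estimator named by `key`, relative to the average true total."""
    bias = sum(r[key] - r["total"] for r in group) / len(group)
    return bias / (sum(r["total"] for r in group) / len(group))

random.seed(2)
N, n, R, n_groups = 393, 100, 2000, 20
beta = 1.0                                            # assumed slope of the ratio model
x_pop = [random.uniform(10, 800) for _ in range(N)]   # stand-in for "number of beds"
x_total = sum(x_pop)

records = []
for _ in range(R):
    # One generated population and one simple random sample from it.
    y_pop = [beta * xk + xk ** 0.5 * random.gauss(0, 1) for xk in x_pop]
    s = random.sample(range(N), n)
    x_bar = sum(x_pop[k] for k in s) / n
    y_bar = sum(y_pop[k] for k in s) / n
    records.append({
        "x_bar": x_bar,
        "total": sum(y_pop),
        "expansion": N * y_bar,               # expansion estimator of the total
        "ratio": x_total * y_bar / x_bar,     # ratio estimator of the total
    })

groups = group_by_x_bar(records, n_groups)
expansion_bias = [relative_bias(g, "expansion") for g in groups]
ratio_bias = [relative_bias(g, "ratio") for g in groups]
for g, (eb, rb) in enumerate(zip(expansion_bias, ratio_bias), start=1):
    print(g, round(eb, 4), round(rb, 4))
```

Conditional relative bias of variance estimators and conditional coverage rates can be tabulated the same way, by storing the extra quantities in each record and averaging them within groups.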

Simulation 2: Conditional performance
Figure 2: Conditional relative bias of the expansion and ratio estimators of

Simulation 2: Conditional performance
Figure 3: Conditional relative bias of variance estimators

Simulation 2: Conditional performance
Figure 4: Conditional coverage rates of normal theory confidence intervals based on , and for nominal level of 95%

Generalized Linear Model
g-weighted estimating functions: model parameter
• is the solution of weighted estimating equation:
• is solution
• Special case: (GREG)
• Linear Regression Model
• Logistic Regression Model
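As an illustration of solving a weighted estimating equation, the sketch below applies Newton-Raphson to the logistic regression case with an intercept and one covariate. It assumes the estimating equation has the standard form Σ w_k (1, x_k)ᵀ (y_k − μ_k) = 0 with μ_k = 1/(1 + exp(−(a + b·x_k))); the weights, data, and function names are hypothetical stand-ins, and the weights here are arbitrary positive numbers rather than actual g-weighted survey weights.

```python
import math
import random

def weighted_score(a, b, x, y, w):
    """Weighted estimating function for (intercept, slope) of a logistic model."""
    s0 = s1 = 0.0
    for xk, yk, wk in zip(x, y, w):
        mu = 1.0 / (1.0 + math.exp(-(a + b * xk)))
        s0 += wk * (yk - mu)
        s1 += wk * (yk - mu) * xk
    return s0, s1

def solve_weighted_logistic(x, y, w, iters=25):
    """Newton-Raphson solution of the weighted estimating equation."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        s0, s1 = weighted_score(a, b, x, y, w)
        # Entries of the 2x2 weighted information matrix.
        h00 = h01 = h11 = 0.0
        for xk, wk in zip(x, w):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xk)))
            v = wk * mu * (1.0 - mu)
            h00 += v
            h01 += v * xk
            h11 += v * xk * xk
        det = h00 * h11 - h01 * h01
        # Newton step: (a, b) += H^{-1} * score, using the explicit 2x2 inverse.
        a += (h11 * s0 - h01 * s1) / det
        b += (h00 * s1 - h01 * s0) / det
    return a, b

# Hypothetical sample: covariate, binary response, and positive weights.
random.seed(4)
m = 200
x = [random.gauss(0, 1) for _ in range(m)]
y = [1 if random.random() < 1.0 / (1.0 + math.exp(-(0.4 + 0.8 * xk))) else 0 for xk in x]
w = [random.uniform(1.0, 3.0) for _ in range(m)]

a_hat, b_hat = solve_weighted_logistic(x, y, w)
print(a_hat, b_hat, weighted_score(a_hat, b_hat, x, y, w))
```

The linear regression case has a closed-form solution of the same equation with μ_k = a + b·x_k, so no iteration is needed there.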

Simulation 3: Estimating equations
• We generated R=10,000 finite populations, each of size N=393, from the model
• Using the number of beds as
• leads to an average of about 60% for z
• One simple random sample of size n=30 is drawn from each generated population
• Parameter of interest:
• Population units are grouped into two classes, with 271 units k having x<350 in class 1 and 122 units k having x>=350 in class 2
• Post-stratification: X=(271,122)^T
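The post-stratification step can be sketched as follows, assuming the usual post-stratified adjustment in which each design weight is multiplied by g_k = N_h / N̂_h for the class h containing unit k. The class counts (271, 122), N=393, and n=30 come from the slide; the seed, the function name, and the way class membership is laid out are hypothetical.

```python
import random

def poststratified_weights(classes, d, class_counts):
    """Return g-weights and calibrated weights w_k = d_k * N_h / N_hat_h."""
    n_hat = {h: 0.0 for h in class_counts}
    for h, dk in zip(classes, d):
        n_hat[h] += dk                       # H-T estimate of each class count
    if min(n_hat.values()) == 0.0:
        raise ValueError("a post-stratum has no sampled units")
    g = [class_counts[h] / n_hat[h] for h in classes]
    w = [dk * gk for dk, gk in zip(d, g)]
    return g, w

random.seed(3)
N, n = 393, 30
class_counts = {1: 271, 2: 122}              # X = (271, 122)^T from the slide
pop_classes = [1] * 271 + [2] * 122          # class label of each population unit

sample = random.sample(range(N), n)          # simple random sample
classes = [pop_classes[k] for k in sample]
d = [N / n] * n                              # SRS design weights

g, w = poststratified_weights(classes, d, class_counts)
calibrated = {h: sum(wk for hk, wk in zip(classes, w) if hk == h) for h in class_counts}
print(calibrated)
```

By construction the calibrated weights reproduce the known class counts, which is the calibration constraint that post-stratification imposes.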

Simulation 3: Estimating equations

Multiple Weight Adjustments
• Weight adjustments for
  • Unit (or complete) nonresponse
  • Calibration
• Due to lack of time, this topic was not presented in the talk, but it is included in the proceedings paper

Concluding Remarks
• We provided a method of variance estimation for estimators:
  • of nonlinear model parameters
  • using survey data
  • defined explicitly or implicitly
  • using multiple weight adjustments
  • under missing data
• The method
  • is simple
  • is widely applicable
  • has good properties
  • provides unique choice
Thank you very much
