
A Stata program for calibration weighting

This program in Stata allows for the adjustment of selection weights to match population totals, using methods such as linear and logistic calibration. It is useful for post-stratification and can handle both categorical and numerical variables.


Presentation Transcript


  1. A Stata program for calibration weighting
     John D’Souza
     National Centre for Social Research

  2. Outline
     • Description of calibration
       • Adjust selection weights so that a weighted sample exactly matches the population
       • Generalizes post-stratification
       • Several methods: Linear, logistic …
     • SAS, GenStat
     • A new Stata program
     • Limitations and extensions

  3. Sampling
     • Selection weights: $d_k = 1/P(\text{person } k \text{ is chosen})$
     • Sample frame variables $X_{k1}, \dots, X_{kJ}$ with known population totals $P_1, \dots, P_J$.
     • Horvitz-Thompson estimator of $P_i$: $\sum_k d_k X_{ki} \approx P_i$ for $i = 1, 2, \dots, J$.
     • Calibration: adjust $d_k$ to get calibration weights $w_k$, giving exact equality: $\sum_k w_k X_{ki} = P_i$ for $i = 1, 2, \dots, J$.
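
The Horvitz-Thompson totals on this slide can be illustrated with a few lines of Stata. This is only a sketch on made-up data: the six observations, the variable names (boy, fsm, d) and the helper variables dx_* are assumptions for illustration, not taken from the presentation.

```stata
* Toy sample: two binary frame variables and a selection weight d
clear
input boy fsm d
1 1 10
1 0 10
0 1 12
0 0 12
1 0 11
0 1  9
end

* Horvitz-Thompson estimate of each total: sum over the sample of d_k * X_ki
foreach v of varlist boy fsm {
    generate double dx_`v' = d * `v'
    quietly summarize dx_`v'
    display "HT estimate of total `v': " r(sum)
}
```

In general these estimates will differ from the known population totals; calibration adjusts d so that the differences vanish.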

  4. Example: School Census
     Variables include
     • Age, Gender, Ethnic Group, Exam results
     • Type of School, Region
     • Pupil’s Free School Meal (FSM) eligibility
     We calibrate to J variables. E.g.
       Boy (binary)
       Girl (binary)
       Region (e.g. four categories)
       FSM eligibility (binary)
     J = 1 + 1 + (4-1) + 1 = 6

  5. Special case: post-stratification
     • Simplest case:
       • One categorical variable
       • Easy to deal with (post-stratification)
       • svyset , poststrata() postweight()
     • More general case:
       • Several variables (categorical and numerical)
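
As a rough illustration of the simplest case, the sketch below post-stratifies a toy sample by hand and then declares the same design with svyset. The data, the population counts (120 and 130) and the variable names popsize, dsum and w are all made up for the example.

```stata
* Toy sample: one categorical variable (region), selection weight d, outcome y
clear
input region d y
1 50 10
1 50 12
2 40  9
2 40 11
2 40 13
end

* Assumed population count for each post-stratum
generate double popsize = cond(region == 1, 120, 130)

* By hand: scale d within each region so the weights sum to popsize
bysort region: egen double dsum = total(d)
generate double w = d * popsize / dsum

* Built-in route: let svy do the same adjustment and report SEs
svyset _n [pweight = d], poststrata(region) postweight(popsize)
svy: total y
```

With a single categorical variable the adjusted weights have a closed form, which is why this case needs no equation solving.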

  6. Deville and Särndal (1992)
     Minimize the “distance” between w and d subject to the J calibration constraints.
     Linear calibration: minimize $\sum_{k \in S} (w_k - d_k)^2 / d_k$
       Involves solving J simultaneous linear equations
     Logistic calibration: minimize $\sum_{k \in S} \left( w_k \log(w_k/d_k) - w_k + d_k \right)$
       Involves solving J simultaneous non-linear equations
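
One way to see why the linear method leads to linear equations and the logistic method does not is to work through the two minimizations with Lagrange multipliers. The vector notation $x_k = (X_{k1}, \dots, X_{kJ})^\top$, $P = (P_1, \dots, P_J)^\top$ and the multiplier vector $\lambda$ are introduced here for illustration and do not appear on the slide; the factor 2 in front of $\lambda$ in the linear case is only a convenient scaling.

```latex
\begin{align*}
\textbf{Linear:}\quad
 & L(w,\lambda) = \sum_{k\in S}\frac{(w_k-d_k)^2}{d_k}
   - 2\lambda^\top\Big(\sum_{k\in S} w_k x_k - P\Big), \\
 & \frac{\partial L}{\partial w_k} = 0
   \;\Longrightarrow\; w_k = d_k\,(1 + x_k^\top\lambda), \\
 & \Big(\sum_{k\in S} d_k\, x_k x_k^\top\Big)\lambda
   = P - \sum_{k\in S} d_k x_k
   \qquad\text{($J$ linear equations in $\lambda$).} \\[1.5ex]
\textbf{Logistic:}\quad
 & L(w,\lambda) = \sum_{k\in S}\Big(w_k\log\frac{w_k}{d_k} - w_k + d_k\Big)
   - \lambda^\top\Big(\sum_{k\in S} w_k x_k - P\Big), \\
 & \frac{\partial L}{\partial w_k} = 0
   \;\Longrightarrow\; w_k = d_k \exp(x_k^\top\lambda), \\
 & \sum_{k\in S} d_k \exp(x_k^\top\lambda)\, x_k = P
   \qquad\text{($J$ non-linear equations in $\lambda$).}
\end{align*}
```

In the linear case the weights follow from a single matrix solve; in the logistic case $\lambda$ has to be found iteratively.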

  7. GenStat, SAS, Stata
     • GenStat and SAS
       • Methods: linear, logistic and bounded.
       • Estimation: GenStat gives SEs.
       • SAS handles categorical variables directly. Enter as indicator variables in GenStat.
     • Stata
       • Post-stratification (calibration to one categorical variable). Gives SEs.
       • No routine for general calibration.

  8. A new Stata program
     • Typical syntax.
         matrix M = (10000, 10000, 3000, 4000, 3000, 8000)
         calibrate , entrywt(w1) exitwt(w2) poptot(M) ///
             marginals(boy girl FSM ireg1-ireg3) ///
             method(linear) print(final)
     • 10,000 boys, 10,000 girls, 3,000 FSM
     • Variables boy, girl, FSM are binary
     • Categorical variable region (4 categories) turned into 4 binary indicator variables. Only 3 entered in the syntax (collinearity)

  9. Output

  10. Options
      • Options available to:
        • Control amount of output/graphs
        • Set max number of iterations/tolerance
      • Methods
        • linear, logistic, bounded linear (blinear) and nonresp
          (blinear sets bounds for $w_k/d_k$. GenStat and SAS have something very similar)
          (nonresp adjusts for non-response; see below)

  11. Limitations (1)
      1. Solves the equations by finding a matrix inverse
         • Won’t work if J is large
      2. Can have problems with singular or nearly singular matrices
      3. Iterative methods (logistic, blinear) won’t always converge
      No obvious solution to 1. Problems 2 and 3 are usually down to problems with the data.
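
To make the matrix inverse concrete, here is a minimal Mata sketch of linear calibration on made-up data. It is not the code of the calibrate program, only an illustration of the usual linear-calibration algebra; the data, the population totals in P and all Mata names (X, d, P, A, Ainv, lambda, w) are assumptions.

```stata
* Toy sample: three binary calibration variables and a selection weight d
clear
input boy girl fsm d
1 0 1 10
1 0 0 10
0 1 1 12
0 1 0 12
1 0 0 11
0 1 1  9
end

mata:
X = st_data(., ("boy", "girl", "fsm"))    // n x J calibration variables
d = st_data(., "d")                       // n x 1 selection weights
P = (35 \ 30 \ 28)                        // J x 1 population totals (made up)

A    = cross(X, d, X)                     // J x J matrix X' diag(d) X
Ainv = invsym(A)                          // zeroes out rows/columns if A is singular
diag0cnt(Ainv)                            // 0 here: no collinear column was dropped

lambda = Ainv * (P - cross(X, d))         // solve the J linear equations
w      = d :* (1 :+ X * lambda)           // calibration weights
cross(X, w)'                              // weighted totals, should reproduce P
st_store(., st_addvar("double", "w"), w)  // copy the weights back to Stata
end
```

The matrix being inverted is J x J, so its size grows with the number of calibration variables, and collinear columns in X (for example, entering all four region indicators alongside boy and girl) make it singular.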

  12. Limitations (2)
      • We need to recode categorical variables (SAS doesn’t)
        • Stata: tab region, gen(ireg)
      • More complicated (e.g. two-phase) problems aren’t handled directly
        • Need a bit of syntax to handle this
        • Other packages can handle this directly
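
The recoding step is short in practice. The sketch below runs it on a made-up region variable with four categories; the data and the check variable nreg are assumptions for illustration.

```stata
* Toy data: a categorical region variable with four categories
clear
input region
1
2
3
4
2
1
end

* Create one 0/1 indicator per category: ireg1, ireg2, ireg3, ireg4
tabulate region, generate(ireg)

* The four indicators sum to 1 on every row, so they are collinear as a set
egen byte nreg = rowtotal(ireg1-ireg4)
assert nreg == 1
```

Because the indicators always sum to 1, only three of them (ireg1-ireg3) are entered as calibration variables when the boy and girl indicators are also present.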

  13. Extensions: Standard errors
      Calibration weights are often incorrectly treated as selection weights.
          calibrate , entrywt(w1) exitwt(w2) poptot(M) ///
              marginals(boy girl FSM ireg1-ireg3)
          calibmean , selwt(w1) calibwt(w2) yvar(y) ///
              marginals(boy girl FSM ireg1-ireg3) ///
              psu(school) designops(strata(region))
      This generalizes Stata’s poststrata() option

  14. Extension: Method nonresp (1)
      Example
        Select schools, then classes, then pupils
        Assume all schools respond, pupils might not
      Variables available on responders. (Pop totals available)
        Gender, Exam results, FSM, Region
      Variables on non-responders. (Pop totals not available)
        PTratio: Pupil-teacher ratio
        topset: Is pupil in the top set?

  15. Extension: Method nonresp (2)

           serial  region  topset  outc  sex  FSM
          ------------------------------------------
       1.    1001       1       1     0    .    .
       2.    1002       1       0     1    1    0
       3.    1003       2       0     0    .    .
       4.    1004       1       0     1    1    1
       5.    1005       3       1     0    .    .
          ------------------------------------------
       6.    1006       1       0     1    0    1
       7.    1007       3       1     1    1    0
       8.    1008       2       1     0    .    .
       9.    1009       1       0     1    1    0

  16. Extension: Method nonresp (3)
      Population totals unknown, but variables are available on all the sample (including non-responders)
          calibrate , entrywt(w1) exitwt(w2) poptot(M) ///
              marginals(boy girl FSM ireg1-ireg3) ///
              method(nonresp) outc(outc) ///
              svars(PTratio topset)
      Responders weighted to pop totals on “marginals” and to selected sample totals on “svars” (Lundström & Särndal, 2005)

  17. Conclusions
      • We’ve found the program can handle many practical problems
      • Easy to calculate SEs (but theory assumes no non-response)
      • Method nonresp isn’t available in many packages
      • We don’t have to calibrate to population totals
        • E.g. calibrate Wave n+1 of a survey to totals from Wave n
        • Calibrate one sample to look like another

  18. Questions

  19. References
      • Deville, J.-C. and Särndal, C.-E. 1992. Calibration estimators in survey sampling. Journal of the American Statistical Association 87: 376-382
        • Background and theory behind calibration
      • Lundström, S. and Särndal, C.-E. 2005. Estimation in Surveys with Nonresponse. Wiley
        • Deals with non-response
      • Singh, A.C. and Mohl, C.A. 1996. Understanding calibration estimators in survey sampling. Survey Methodology 22: 107-115
        • Discusses several methods of doing bounded calibration
