Loading in 5 sec....

Weighting and imputationPowerPoint Presentation

Weighting and imputation

- 188 Views
- Uploaded on

Download Presentation
## PowerPoint Slideshow about ' Weighting and imputation' - luther

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

Weighting

- Weighting is the process of adjusting the contribution of each observation in a survey sample based on independent knowledge about appropriate distributions
- Before weighting the implied weight of each observation is 1.0
- After weighting, some observations will have weights >1.0 and some <1.0, and some at 1.0
- No observations should have a weight of 0
- Two general types of weighting:
- Design weights -- Adjusting for differences due to intentional disproportionate sampling (e.g. over-sampling African Americans or certain regions)
- Post-stratification weights -- Adjusting for differences in population or households when release of sample is intended to be representative (e.g. adjustments for non-response of young people)

Common sources for calculating weights

- U.S. Census
- Current Population Survey
- American Community Survey
- For Florida County, Age, Race, Ethnicity the BEBR Population Program

How frequency procedures use weights

- All statistical packages have options on procedures to incorporate weights
- For frequency procedures the weights are multiplied by the unweighted frequencies, then percentages are calculated on the result

Original and adjusted frequency of Employment variable

*This is the sum of the weights for the category

Notes on weighting

- Typically you don’t want weights to make enormous differences
- Keep in mind that with weighting you are saying you have information extraneous to the survey process that informs you of the proper distribution
- You could conceivably up-weight results from a small sample strata
- Weights are typically used for accurate estimates of prevalence
- Models where you test relationships do not need weights if you include the variables you would use to weight

Weighting with more than one variable

- Combined weight with multiplication
- Create individual weights for each variable then multiply weights to get a single weight (Wage*Wgender)
- Not a good solution with a lot of variables

- Combine weights iteratively
- Calculate weight for a variable using frequency table
- Use that weight in frequency of second variable to create weight
- Use that weight in frequency of third variable to create weight
- And so on

Consumer Confidence

- Survey of approximately 500 Florida households each month
- RDD Landline Survey
- Five questions (components) averaged into an Overall Index
- Until now only post-stratification weighting by proportion of households by county

Potential weighting variablesCounty

- Typically we get under-representation from large south Florida counties (Miami-Dade) and over-representation from northern counties (Alachua)
- Household proportions are estimated between census years by BEBR
- Weights June 2011.xls

Potential weighting variablesAge

- RDD tends to lead to over-sampling of seniors with landlines
- Cell phones emerged as a problem around 2005
- No reliable age group data until 2010 Census
- Elderly tend to be less confident than younger respondents due to fixed incomes
- Weights June 2011.xls

Potential weighting variablesHispanic Ethnicity

- Cell phones tend to be used disproportionately by Hispanics
- 2010 Census provided reliable data about proportion of Hispanic Floridians
- Hispanics tend to have lower confidence than non-Hispanics
- Weights June 2011.xls

Potential weighting variablesGender

- Cell phones are disproportionately used by young males
- Monthly CCI uses youngest male/oldest female respondent selection
- Weights June 2011.xls

Example 2- FHIS

- The state of Florida wanted to estimate rates of the uninsured
- They stratified the state into 17 regions and wanted to be able to make estimates for the state and each region with a tolerable margin of error
- On the state level they wanted to be able to say something about Blacks, Hispanics and those under 200 percent of the poverty level
- Data on these demographics for each Florida telephone exchange were obtained prior to sampling
- Strata were created from exchanges
- This made it possible to create weights based on known households in each exchange
- This required design weights to adjust for disproportionate sampling

Example 3 – Medicaid survey

- The state wanted to evaluate Medicaid Reform being conducted in Duval and Broward counties
- They wanted to administer a modified CAHPS instrument to Adults and Children separately
- They wanted to stratify by plan as well, sampling a minimum number of observations per plan
- In the end they wanted to compare plans, counties and adults and children
- These weights required knowledge about total enrollment for each one of these characteristics (plan, age, county)

Imputation

- Like weighting, imputation involves adjusting the analysis after data collection
- Unlike weighting, imputation is the deliberate creation of data that were not actually collected
- The main reason for imputation is to retain observations in a statistical analysis that would otherwise be left out
- Your ability to discover significant results may be compromised by too many missing values
- In some case there may be systematic bias associated with missing data so that not imputing presents an unrepresentative result

Imputation and regressions

- Imputation is particularly common when data are analyzed with regression analysis
- A regression model explains the variability in a dependent variable using one or more independent variables
- Observations can only be included in the regression if they have values for all variables in the model
- Models with a lot of variables increase the probability that an observation will have at least one missing value for them

Example ModelIncome = β1(Age) + β2(Education)+ β3(Employed)

Imputation algorithms

- Two general categories
- Random imputation assigns values randomly, often based on a desired statistical distribution
- Deterministic imputation typically assigns values based on existing knowledge
- Existing knowledge could be in the data
- Single imputation fills missing data with one value, such as the mean of all non-missing values for a continuous variable
- Multiple imputation fills in missing data with a set of plausible values
- Hot deck imputation fills in missing values with those of an observation that matches on key variables

- Existing knowledge could be in the data

Download Presentation

Connecting to Server..