
### Prediction and Imputation in ISEE

### Background information:

- Tools for more efficient use of combined data sources

Li-Chun Zhang, Statistics Norway

Svein Nordbotten, University of Bergen

Use of a statistical register

- Combining administrative and survey data
  - Model-based prediction or weighting
  - Construction of statistical registers
- Uses of a statistical register
  - Prediction of (sub-)population totals
  - Multiple uses & general database quality => inferential concerns associated with imputation
  - How to balance the two types of inferential concerns?

A triple-goal criterion for statistical registers

- Efficient estimates of the population totals of interest
- Correct covariances among the survey variables, as well as between the survey and auxiliary variables
- Non-stochastic imputation, giving constant tabulations on repetition

A simultaneous prediction method

- NNI (nearest neighbor imputation) is the only feasible approach for preserving the covariances among all the variables. To improve efficiency, introduce restrictions on the imputed totals; these may be obtained separately from the imputation, e.g. through regression prediction. This is referred to as NNI with restrictions (NNI-WR).
- A simultaneous prediction method
  - Values are generated for the units outside the sample
  - Efficient for prediction of population totals
  - Not the optimal (best) prediction of each specific unit, but of the ensemble of units, since attention is given to the covariances among the variables

About NNI-WR

- Separates the prediction of totals from general imputation concerns, allowing full freedom in the search for efficient prediction methods
- Solves the variance estimation problem at the same time
- Genuine multivariate imputation with realistic imputed values
- Its non-parametric nature and mild regularity conditions suggest robustness compared with standard regression-based approaches
- NNI can be made non-stochastic, yielding constant tabulations on repetition

An algorithm and current research

- An algorithm
  - Jump-start phase: speeds up the imputation procedure, if desired
  - Fine-tune phase: relaxation to k-nearest neighbor imputation for better agreement with the restrictions; consistency is retained
  - Adjustment between the two phases
- Current research
  - How well does the algorithm perform in real statistical production?
  - How can the restrictions be set up effectively, i.e. maximum control with a minimum number of explicit restrictions on the imputation?
  - Evaluation of micro-data quality
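The two-phase algorithm is only outlined on the slide. As a rough illustration of the idea, and explicitly not the authors' algorithm, the Python sketch below handles one y-variable and one covariate. The jump-start phase is plain 1-nearest-neighbor imputation; the fine-tune phase lets each recipient switch to any of its k nearest donors, greedily accepting whichever single switch most reduces the gap between the imputed total and a restriction `target_total` (assumed to be supplied, e.g. from a separate regression prediction). The function name, the greedy search and the parameter `k` are hypothetical choices.

```python
from typing import List, Optional, Sequence, Tuple


def nni_with_restriction(
    x_obs: Sequence[float],
    y_obs: Sequence[float],
    x_mis: Sequence[float],
    target_total: float,
    k: int = 3,
) -> List[float]:
    """Hypothetical NNI-WR sketch: 1-NN jump-start, then greedy k-NN fine-tuning."""
    # For each recipient, its k nearest donors by |x| distance (ties -> lower index).
    ranked = [
        sorted(range(len(x_obs)), key=lambda j: (abs(x_obs[j] - xm), j))[:k]
        for xm in x_mis
    ]
    # Jump-start phase: every recipient takes its nearest donor (rank 0).
    choice = [0] * len(x_mis)

    def imputed_total() -> float:
        return sum(y_obs[ranked[i][c]] for i, c in enumerate(choice))

    # Fine-tune phase: apply the single donor switch that most reduces the gap
    # to the restriction; stop when no switch helps (a local optimum only).
    while True:
        best_gap = abs(imputed_total() - target_total)
        best_move: Optional[Tuple[int, int]] = None
        for i in range(len(x_mis)):
            current = choice[i]
            for c in range(len(ranked[i])):
                choice[i] = c
                gap = abs(imputed_total() - target_total)
                if gap < best_gap - 1e-12:
                    best_gap, best_move = gap, (i, c)
            choice[i] = current  # undo the trial switch
        if best_move is None:
            break
        choice[best_move[0]] = best_move[1]

    return [y_obs[ranked[i][c]] for i, c in enumerate(choice)]
```

With several y-variables, the chosen donor would supply the whole record, which is what keeps the covariances intact. The exhaustive search over switches is quadratic in the number of recipients per pass, so this is for small illustrations only.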

Some standard methods of prediction and imputation

Basic prediction approach

- Under the general linear model:
  - Target parameter T = a linear combination of the y-values in the population
  - Estimation of T amounts to prediction of the part of T outside the selected sample
  - Prediction of individual y-values is a special case
- Main problems for a statistical register:
  - Lack of natural variation in the data, especially if many units have the same x-values
  - Infeasible simultaneously for a large number of variables; impractical as a production mode; leads to inconsistent cross-tabulations
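In the simplest case, a population total (all coefficients equal to 1) under a linear model with one covariate and equal variances, the prediction approach adds the observed sample values to the model predictions for the units outside the sample. The Python sketch below assumes ordinary least squares for that special case (the general linear model would use generalized least squares). It also makes the first listed problem visible: units with the same x-value receive identical predictions.

```python
from typing import List, Sequence, Tuple


def ols_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares intercept a and slope b for y = a + b*x + error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def predict_total(
    x_s: Sequence[float], y_s: Sequence[float], x_r: Sequence[float]
) -> Tuple[float, List[float]]:
    """Predicted total: observed sample sum plus predictions for non-sample units."""
    a, b = ols_fit(x_s, y_s)
    y_hat = [a + b * xi for xi in x_r]
    return sum(y_s) + sum(y_hat), y_hat


total, y_hat = predict_total([1, 2, 3, 4], [2, 4, 6, 8], [5, 5])
# Both non-sample units have x = 5, so both get the same predicted value.
```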

Random regression imputation (RRI)

- To emulate the natural variation in data: add a random residual to the best predicted y-value
- Hot-deck as a special case
- Main problems:
  - Extra variance of the imputed estimator due to random imputation => never fully efficient
  - Random imputation is not the only means of creating natural variation in data
  - Different tabulations on repetition => lack of acceptability and face value in official statistics
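A minimal sketch of random regression imputation in Python, assuming a single-covariate least-squares fit and, as one common choice, residuals drawn with replacement from the fitted sample residuals (the slide does not say how the residual is generated). The `seed` argument is a hypothetical parameter added so the repetition problem can be shown: different seeds give different imputed data, hence different tabulations.

```python
import random
from typing import List, Optional, Sequence


def random_regression_impute(
    x_s: Sequence[float],
    y_s: Sequence[float],
    x_r: Sequence[float],
    seed: Optional[int] = None,
) -> List[float]:
    """Best linear prediction plus a residual drawn at random from the sample residuals."""
    n = len(x_s)
    mx, my = sum(x_s) / n, sum(y_s) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x_s, y_s)) / sum(
        (xi - mx) ** 2 for xi in x_s
    )
    a = my - b * mx
    residuals = [yi - (a + b * xi) for xi, yi in zip(x_s, y_s)]
    rng = random.Random(seed)
    return [a + b * xi + rng.choice(residuals) for xi in x_r]


x_s, y_s, x_r = [1, 2, 3, 4], [3, 3, 7, 7], [1.5, 2.5, 3.5]
# Imputed totals over 20 repetitions; typically several distinct values.
totals = {round(sum(random_regression_impute(x_s, y_s, x_r, seed=s)), 9) for s in range(20)}
```

With an intercept-only model (slope fixed at zero), each imputed value reduces to a randomly drawn observed y-value, which is the hot-deck special case.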

Multiple imputation (MI)

- Independent random imputations + formulae for combining the results
- Bayesian or frequentist approach
- Main problems:
  - Removes all the extra imputation variance only with an infinite number of repetitions; otherwise still not fully efficient, with non-constant tabulations
  - A common misunderstanding: that only MI can yield acceptable measures of accuracy
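The slide does not spell out the combining formulae; the ones most often used are Rubin's rules, sketched below in Python as an assumption. With m completed data sets, the point estimate is the average of the m estimates, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The B/m part, which reflects the finite number of imputations, vanishes only as m grows without bound, which is the efficiency point made above.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def rubin_combine(
    estimates: Sequence[float], within_variances: Sequence[float]
) -> Tuple[float, float]:
    """Combine m >= 2 completed-data results: return (point estimate, total variance)."""
    m = len(estimates)
    q_bar = mean(estimates)            # combined point estimate
    u_bar = mean(within_variances)     # average within-imputation variance
    b = variance(estimates)            # between-imputation variance (divisor m - 1)
    return q_bar, u_bar + (1 + 1 / m) * b


q, t = rubin_combine([10.0, 12.0, 14.0], [4.0, 4.0, 4.0])
```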

Predictive mean matching (PMM)

- Find the donor among the observed units whose predicted y-value is the same as (or closest to) that of the recipient, and impute the donor's observed y-value
- Differs noticeably from RRI as the chance of multiple equally close donors decreases; PMM is then more efficient because the imputation variance is removed
- Essentially a marginal, variable-by-variable approach
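A sketch of predictive mean matching for one y-variable in Python, assuming a single-covariate least-squares model for the predicted means and taking the closest rather than an exactly equal predicted value (exact matches are rare for continuous predictions). Ties are broken by donor position, so this version is deterministic, and every imputed value is an actually observed value.

```python
from typing import List, Sequence


def pmm_impute(
    x_s: Sequence[float], y_s: Sequence[float], x_r: Sequence[float]
) -> List[float]:
    """Impute each recipient with the observed y of the donor whose predicted mean is closest."""
    n = len(x_s)
    mx, my = sum(x_s) / n, sum(y_s) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x_s, y_s)) / sum(
        (xi - mx) ** 2 for xi in x_s
    )
    a = my - b * mx
    pred_s = [a + b * xi for xi in x_s]  # predicted means of the observed units
    imputed = []
    for xi in x_r:
        p = a + b * xi  # predicted mean of the recipient
        donor = min(range(n), key=lambda j: (abs(pred_s[j] - p), j))
        imputed.append(y_s[donor])  # copy the donor's observed value
    return imputed


result = pmm_impute([1, 2, 3, 4], [3, 3, 7, 7], [1.2, 3.6])  # → [3, 7]
```

With several y-variables, this would typically be run one variable at a time, each with its own model and possibly a different donor, which is why PMM is described above as essentially marginal.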

Nearest neighbor imputation (NNI)

- Given a set of covariates and a distance metric, the donor is the 'nearest' observed unit.
- A non-parametric generalization of PMM, with hot-deck as a special case. More flexible and practical for multivariate imputation than regression models.
- Chen and Shao (2000): consistent estimators of totals as well as of finite population distributions, provided the absolute difference in the conditional means of y between two units is bounded by the 'distance' between them. Linear models are special cases.
- Can be made non-stochastic by introducing extra, seemingly uncorrelated covariates, such as ZIP code, to break ties.
- Main drawback: usually not efficient (local smoothing instead of a global regression predictor)
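A minimal multivariate NNI sketch in Python. The following are assumptions, not taken from the slide: records are dictionaries, distance is Euclidean over the chosen covariates, and a hypothetical extra covariate (here a numeric `zip` field) is used only to break ties, so repeated runs always pick the same donor. All target variables are copied from the same donor record, which is what keeps the relationships among them intact.

```python
import math
from typing import Dict, List, Sequence

Record = Dict[str, float]


def nni_impute(
    donors: Sequence[Record],
    recipients: Sequence[Record],
    covariates: Sequence[str],
    targets: Sequence[str],
    tiebreak: str,
) -> List[Record]:
    """Copy all target values from each recipient's nearest donor (deterministic)."""
    completed = []
    for r in recipients:
        point = [r[c] for c in covariates]

        def key(d: Record) -> tuple:
            dist = math.dist([d[c] for c in covariates], point)
            # Secondary key: closeness on the extra covariate breaks distance ties.
            return (dist, abs(d[tiebreak] - r[tiebreak]))

        donor = min(donors, key=key)
        completed.append({**r, **{t: donor[t] for t in targets}})
    return completed


donors = [
    {"age": 30, "zip": 100, "income": 50, "tax": 10},
    {"age": 30, "zip": 200, "income": 80, "tax": 20},
    {"age": 60, "zip": 150, "income": 40, "tax": 5},
]
out = nni_impute(donors, [{"age": 30, "zip": 190}], ["age"], ["income", "tax"], "zip")
```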

Artificial neural network (ANN)

- A class of functional imputation methods
- ANN as generalized regression functions (Bishop, 1995)
- No analytic predictor
- Unrealistic imputed values for categorical variables of interest
- Usually not fully efficient
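The point about categorical variables applies to functional imputation generally, not only to ANNs. The Python sketch below deliberately uses a least-squares line instead of a neural network, as the simplest stand-in for a fitted regression function f(x), and applies it to a 0/1 variable: the imputed values f(x) are not valid categories and can even fall outside [0, 1].

```python
from typing import List, Sequence


def functional_impute(
    x_s: Sequence[float], y_s: Sequence[float], x_r: Sequence[float]
) -> List[float]:
    """Impute f(x) from a fitted regression function (here a least-squares line)."""
    n = len(x_s)
    mx, my = sum(x_s) / n, sum(y_s) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x_s, y_s)) / sum(
        (xi - mx) ** 2 for xi in x_s
    )
    a = my - b * mx
    return [a + b * xi for xi in x_r]


# y is a 0/1 indicator, x a covariate.
imputed = functional_impute([1, 2, 3, 4], [0, 0, 1, 1], [2.5, 5])
# → approximately [0.5, 1.5]: neither value is a valid category.
```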
