Introduction to survey data analysis
1 / 33

Introduction to Survey Data Analysis - PowerPoint PPT Presentation

  • Uploaded on

Introduction to Survey Data Analysis. Linda K. Owens, PhD Assistant Director for Sampling & Analysis Survey Research Laboratory University of Illinois at Chicago. Focus of the seminar. Data cleaning/missing data Sampling bias reduction. When analyzing survey data.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about ' Introduction to Survey Data Analysis' - tejana

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Introduction to survey data analysis

Introduction to Survey Data Analysis

Linda K. Owens, PhD

Assistant Director for Sampling & Analysis

Survey Research Laboratory

University of Illinois at Chicago

Focus of the seminar
Focus of the seminar

  • Data cleaning/missing data

  • Sampling bias reduction

When analyzing survey data
When analyzing survey data...

  • 1. Understand & evaluate survey design

  • 2. Screen the data

  • 3. Adjust for sampling design

1 understand evaluate survey
1. Understand & evaluate survey

  • Conductor of survey

  • Sponsor of survey

  • Measured variables

  • Unit of analysis

  • Mode of data collection

  • Dates of data collection

1 understand evaluate survey1
1. Understand & evaluate survey

  • Geographic coverage

  • Respondent eligibility criteria

  • Sample design

  • Sample size & response rate

Levels of measurement
Levels of measurement

  • Nominal

  • Ordinal

  • Interval

  • Ratio

2 data screening
2. Data screening

  • ALWAYS examine raw frequency distributions for…

  • (a) out-of-range values (outliers)

  • (b) missing values

2 data screening1
2. Data screening

  • Out-of-range values:

  • Delete data

  • Recode values

Missing data
Missing data:

  • can reduce effective sample size

  • may introduce bias

Reasons for missing data
Reasons for missing data

  • Refusals (question sensitivity)

  • Don’t know responses (cognitive problems, memory problems)

  • Not applicable

  • Data processing errors

  • Questionnaire programming errors

  • Design factors

  • Attrition in panel studies

Effects of ignoring missing data
Effects of ignoring missing data

  • Reduced sample size – loss of statistical power

  • Data may no longer be representative– introduces bias

  • Difficult to identify effects

Assumptions on missing data
Assumptions on missing data

  • Missing completely at random (MCAR)

  • Missing at random (MAR)

  • Ignorable

  • Nonignorable

Missing completely at random mcar
Missing completely at random (MCAR)

  • Being missing is independent from any variables.

  • Cases with complete data are indistinguishable from cases with missing data.

  • Missing cases are a random sub-sample of original sample.

Missing at random mar
Missing at random (MAR)

  • The probability of a variable being observed is independent of the true value of that variable controlling for one or more variables.

  • Example: Probability of missing income is unrelated to income within levels of education.

Ignorable missing data
Ignorable missing data

  • The data are MAR.

  • The missing data mechanism is unrelated to the parameters we want to estimate.

Nonignorable missing data
Nonignorable missing data

  • The pattern of data missingness is non-MAR.

Methods of handling missing data
Methods of handling missing data

  • Listwise (casewise) deletion: uses only complete cases

  • Pairwise deletion: uses all available cases

  • Dummy variable adjustment: Missing value indicator method

  • Mean substitution: substitute mean value computed from available cases (cf. unconditional or conditional)

Methods of handling missing data1
Methods of handling missing data

  • Regression methods: predict value based on regression equation with other variables as predictors

  • Hot deck: identify the most similar case to the case with a missing and impute the value

Methods of handling missing data2
Methods of handling missing data

  • Maximum likelihood methods: use all available data to generate maximum likelihood-based statistics.

Methods of handling missing data3
Methods of handling missing data

  • Multiple imputation: combines the methods of ML to produce multiple data sets with imputed values for missing cases

Multiple imputation software
Multiple Imputation Software

  • SAS—user written IVEware, experimental MI & MIANALYZE

  • STATA—user written ICE

  • R—user written libraries and functions


  • S-PLUS

Methods of handling missing data summary
Methods of handling missing data: summary

  • Listwise deletion assumes MCAR or MAR

  • Dummy variable adjustment is biased, don’t use

  • Conditional mean substitution assumes MCAR within cells

  • Hot Deck & Regression improvement on mean substitution but...

  • Multiple Imputation is least biased but computationally intensive

Missing data final point
Missing Data Final Point

  • All methods of imputation underestimate true variance

    • artificial increase in sample size

    • values treated as though obtained by data collection

  • Rao (1996) On variance estimation with imputed survey data. Jrn. Am. Stat. Assoc. 91:499-506

  • Fay (1996) in bibliography

Types of survey sample designs
Types of survey sample designs

  • Simple Random Sampling

  • Systematic Sampling

  • Complex sample designs

    • stratified designs

    • cluster designs

    • mixed mode designs

Why complex sample designs
Why complex sample designs?

  • Increased efficiency

  • Decreased costs

Why complex sample designs1
Why complex sample designs?

  • Statistical software packages with an assumption of SRS underestimate the sampling variance.

  • Not accounting for the impact of complex sample design can lead to a biased estimate of the sampling variance (Type I error).

Sample weights
Sample weights

  • Used to adjust for differing probabilities of selection.

  • In theory, simple random samples are self-weighted.

  • In practice, simple random samples are likely to also require adjustments for nonresponse.

Types of sample weights
Types of sample weights

  • Poststratification weights: designed to bring the sample proportions in demographic subgroups into agreement with the population proportion in the subgroups.

  • Nonresponse weights: designed to inflate the weights of survey respondents to compensate for nonrespondents with similar characteristics.

  • “Blow-up” (expansion) weights: provide estimates for the total population of interest.

Syntax examples of design based analysis in stata sudaan sas
Syntax examples of design-based analysis in STATA, SUDAAN, & SAS


svyset strata strata

svyset psu psu

svyset pweight finalwt

svyreg fatitk age male black hispanic


proc regress data=”c:\nhanes.sav” filetype=spss desgn=wr;

nest strata psu;

weight finalwt

subpgroup sex race;

levels 2 3;

model fatintk = age sex race;

Syntax examples of design based analysis in stata sudaan sas1
Syntax examples of design-based analysis in STATA, SUDAAN, & SAS

  • SAS

  • proc surveyreg data=nhanes;

  • strata strata;

  • cluster psu;

  • class sex race;

  • model fatintk = age sex race;

  • weight finalwt

In summary when analyzing survey data
In summary, when analyzing survey data... SAS

  • Understand & evaluate survey design

  • Screen the data – deal with missing data & outliers.

  • If necessary, adjust for study design using weights and appropriate computer software.

Bibliography SAS

  • On website with slides

  • Big names in area of missing data are:

  • Roderick Little

  • Donald B. Rubin

  • Paul Allison

  • See also McKnight Missing Data: A Gentle Introduction

  • Large government datasets (NSFG, NHANES, NHIS) often include detailed methodological documentation on imputation and weight construction.