introduction to survey data analysis n.
Skip this Video
Download Presentation
Introduction to Survey Data Analysis

Loading in 2 Seconds...

play fullscreen
1 / 33

Introduction to Survey Data Analysis - PowerPoint PPT Presentation

  • Uploaded on

Introduction to Survey Data Analysis. Linda K. Owens, PhD Assistant Director for Sampling & Analysis Survey Research Laboratory University of Illinois at Chicago. Focus of the seminar. Data cleaning/missing data Sampling bias reduction. When analyzing survey data.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'Introduction to Survey Data Analysis' - tejana

Download Now An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
introduction to survey data analysis

Introduction to Survey Data Analysis

Linda K. Owens, PhD

Assistant Director for Sampling & Analysis

Survey Research Laboratory

University of Illinois at Chicago

focus of the seminar
Focus of the seminar
  • Data cleaning/missing data
  • Sampling bias reduction
when analyzing survey data
When analyzing survey data...
  • 1. Understand & evaluate survey design
  • 2. Screen the data
  • 3. Adjust for sampling design
1 understand evaluate survey
1. Understand & evaluate survey
  • Conductor of survey
  • Sponsor of survey
  • Measured variables
  • Unit of analysis
  • Mode of data collection
  • Dates of data collection
1 understand evaluate survey1
1. Understand & evaluate survey
  • Geographic coverage
  • Respondent eligibility criteria
  • Sample design
  • Sample size & response rate
levels of measurement
Levels of measurement
  • Nominal
  • Ordinal
  • Interval
  • Ratio
2 data screening
2. Data screening
  • ALWAYS examine raw frequency distributions for…
  • (a) out-of-range values (outliers)
  • (b) missing values
2 data screening1
2. Data screening
  • Out-of-range values:
  • Delete data
  • Recode values
missing data
Missing data:
  • can reduce effective sample size
  • may introduce bias
reasons for missing data
Reasons for missing data
  • Refusals (question sensitivity)
  • Don’t know responses (cognitive problems, memory problems)
  • Not applicable
  • Data processing errors
  • Questionnaire programming errors
  • Design factors
  • Attrition in panel studies
effects of ignoring missing data
Effects of ignoring missing data
  • Reduced sample size – loss of statistical power
  • Data may no longer be representative– introduces bias
  • Difficult to identify effects
assumptions on missing data
Assumptions on missing data
  • Missing completely at random (MCAR)
  • Missing at random (MAR)
  • Ignorable
  • Nonignorable
missing completely at random mcar
Missing completely at random (MCAR)
  • Being missing is independent from any variables.
  • Cases with complete data are indistinguishable from cases with missing data.
  • Missing cases are a random sub-sample of original sample.
missing at random mar
Missing at random (MAR)
  • The probability of a variable being observed is independent of the true value of that variable controlling for one or more variables.
  • Example: Probability of missing income is unrelated to income within levels of education.
ignorable missing data
Ignorable missing data
  • The data are MAR.
  • The missing data mechanism is unrelated to the parameters we want to estimate.
nonignorable missing data
Nonignorable missing data
  • The pattern of data missingness is non-MAR.
methods of handling missing data
Methods of handling missing data
  • Listwise (casewise) deletion: uses only complete cases
  • Pairwise deletion: uses all available cases
  • Dummy variable adjustment: Missing value indicator method
  • Mean substitution: substitute mean value computed from available cases (cf. unconditional or conditional)
methods of handling missing data1
Methods of handling missing data
  • Regression methods: predict value based on regression equation with other variables as predictors
  • Hot deck: identify the most similar case to the case with a missing and impute the value
methods of handling missing data2
Methods of handling missing data
  • Maximum likelihood methods: use all available data to generate maximum likelihood-based statistics.
methods of handling missing data3
Methods of handling missing data
  • Multiple imputation: combines the methods of ML to produce multiple data sets with imputed values for missing cases
multiple imputation software
Multiple Imputation Software
  • SAS—user written IVEware, experimental MI & MIANALYZE
  • STATA—user written ICE
  • R—user written libraries and functions
  • S-PLUS
methods of handling missing data summary
Methods of handling missing data: summary
  • Listwise deletion assumes MCAR or MAR
  • Dummy variable adjustment is biased, don’t use
  • Conditional mean substitution assumes MCAR within cells
  • Hot Deck & Regression improvement on mean substitution but...
  • Multiple Imputation is least biased but computationally intensive
missing data final point
Missing Data Final Point
  • All methods of imputation underestimate true variance
    • artificial increase in sample size
    • values treated as though obtained by data collection
  • Rao (1996) On variance estimation with imputed survey data. Jrn. Am. Stat. Assoc. 91:499-506
  • Fay (1996) in bibliography
types of survey sample designs
Types of survey sample designs
  • Simple Random Sampling
  • Systematic Sampling
  • Complex sample designs
    • stratified designs
    • cluster designs
    • mixed mode designs
why complex sample designs
Why complex sample designs?
  • Increased efficiency
  • Decreased costs
why complex sample designs1
Why complex sample designs?
  • Statistical software packages with an assumption of SRS underestimate the sampling variance.
  • Not accounting for the impact of complex sample design can lead to a biased estimate of the sampling variance (Type I error).
sample weights
Sample weights
  • Used to adjust for differing probabilities of selection.
  • In theory, simple random samples are self-weighted.
  • In practice, simple random samples are likely to also require adjustments for nonresponse.
types of sample weights
Types of sample weights
  • Poststratification weights: designed to bring the sample proportions in demographic subgroups into agreement with the population proportion in the subgroups.
  • Nonresponse weights: designed to inflate the weights of survey respondents to compensate for nonrespondents with similar characteristics.
  • “Blow-up” (expansion) weights: provide estimates for the total population of interest.
syntax examples of design based analysis in stata sudaan sas
Syntax examples of design-based analysis in STATA, SUDAAN, & SAS


svyset strata strata

svyset psu psu

svyset pweight finalwt

svyreg fatitk age male black hispanic


proc regress data=”c:\nhanes.sav” filetype=spss desgn=wr;

nest strata psu;

weight finalwt

subpgroup sex race;

levels 2 3;

model fatintk = age sex race;

syntax examples of design based analysis in stata sudaan sas1
Syntax examples of design-based analysis in STATA, SUDAAN, & SAS
  • SAS
  • proc surveyreg data=nhanes;
  • strata strata;
  • cluster psu;
  • class sex race;
  • model fatintk = age sex race;
  • weight finalwt
in summary when analyzing survey data
In summary, when analyzing survey data...
  • Understand & evaluate survey design
  • Screen the data – deal with missing data & outliers.
  • If necessary, adjust for study design using weights and appropriate computer software.
  • On website with slides
  • Big names in area of missing data are:
  • Roderick Little
  • Donald B. Rubin
  • Paul Allison
  • See also McKnight Missing Data: A Gentle Introduction
  • Large government datasets (NSFG, NHANES, NHIS) often include detailed methodological documentation on imputation and weight construction.
Thank You!