
Bridging the Gaps: Dealing with Major Survey Changes in Data Set Harmonization

Bridging the Gaps: Dealing with Major Survey Changes in Data Set Harmonization Joint Statistical Meetings Minneapolis, MN August 9, 2005 Presented by: Michael Davern, Ph.D. Assistant Professor, Research Director SHADAC, Health Services Research and Policy University of Minnesota.


Presentation Transcript


  1. Bridging the Gaps: Dealing with Major Survey Changes in Data Set Harmonization Joint Statistical Meetings Minneapolis, MN August 9, 2005 Presented by: Michael Davern, Ph.D. Assistant Professor, Research Director SHADAC, Health Services Research and Policy University of Minnesota Supported by a grant from The Robert Wood Johnson Foundation

  2. Co-authors • This work is coauthored with Miriam King, Ph.D., Research Associate. • We are both with the Minnesota Population Center at the University of Minnesota.

  3. Data set harmonization • The goal is to simplify access to all available years of a data set for analysis of trends over time. • This goal has many difficulties associated with it. • We focus on the issues involved with handling major sources of survey error over time.

  4. Survey changes present challenges to harmonization • Sample design • The way people and records are drawn into a data set changes over time, and this affects how variance estimation is done. • Nonresponse • The way surveys account for unit, supplement, person and item nonresponse changes over time. • Survey questions and measurement • Changes to question wording and question universes. • Survey processing/editing • Changes to processing and data editing.

  5. Decennial census sample designs • Decennial census sampling • Involves both sampling of people/households to receive the “long form” and sampling of long form records to release (1% and 5%). • Both the household/person selection and the process used to select the public use microdata samples change over time. • Data users need access to the sample design information to calculate appropriate variances/standard errors. • Although appropriate estimates can be obtained with replicate weights, at the moment most users do not use them. • We are testing sample design variables to add to the IPUMS for Taylor Series estimation. • These will include a stratification variable, a cluster variable and a weighting variable (when available) so analysts can simply program in SAS, Stata, SUDAAN, etc. • Our approach will make the changes in sample design seamless to the data user and will increase the use of more appropriate estimation methods.
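
To make the role of the three design variables on slide 5 concrete, here is a minimal sketch of a Taylor Series (linearization) standard error for a weighted mean. It is an illustration only, not the IPUMS implementation or the routine used by SAS, Stata or SUDAAN. The argument names (y, weight, stratum, cluster) are hypothetical, and the sketch assumes clusters are sampled with replacement within strata, with no finite population correction.

```python
from collections import defaultdict
from math import sqrt


def weighted_mean_with_se(y, weight, stratum, cluster):
    """Weighted mean and its Taylor-linearized standard error.

    Assumes a stratified cluster design in which clusters (PSUs) are
    treated as sampled with replacement within each stratum.
    """
    total_w = sum(weight)
    mean = sum(w * v for w, v in zip(weight, y)) / total_w

    # Linearized score of each record, summed up to the cluster level.
    cluster_totals = defaultdict(lambda: defaultdict(float))
    for v, w, h, c in zip(y, weight, stratum, cluster):
        cluster_totals[h][c] += w * (v - mean) / total_w

    variance = 0.0
    for h, clusters in cluster_totals.items():
        n_h = len(clusters)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} has fewer than two clusters")
        scores = list(clusters.values())
        score_mean = sum(scores) / n_h
        # Between-cluster variation within the stratum.
        variance += n_h / (n_h - 1) * sum((s - score_mean) ** 2 for s in scores)

    return mean, sqrt(variance)
```

With harmonized stratum and cluster identifiers, the same call works for any year, which is the sense in which a design change can be made seamless to the analyst.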

  6. Survey sample designs • The NHIS and CPS change sample designs over time. • Non-self-representing PSUs are shuffled, so some are not retained from one design to the next. • Self-representing PSUs (MSAs) can also change (boundaries annex/lose counties). • Pooling data between two sample designs is a major challenge. • Data users often like to pool data to get larger samples of people with rare characteristics (e.g., those with SSI income). • When working with data from years that span two sample designs, it is best to average the estimates and the standard errors from single years. • Also, some surveys (e.g., NHIS) release sample design information that can be used for Taylor Series estimates, whereas others do not (e.g., CPS).
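
The averaging advice on slide 6 can be sketched as follows. The function takes the slide literally and returns the simple average of the single-year estimates and of the single-year standard errors. The function name and input layout are hypothetical, and each pair is assumed to have been computed separately under that year's own sample design.

```python
def average_across_years(yearly_results):
    """Average single-year (estimate, standard_error) pairs.

    Each pair is assumed to come from one year analyzed on its own,
    under that year's sample design.
    """
    if not yearly_results:
        raise ValueError("need at least one year of results")
    n_years = len(yearly_results)
    avg_estimate = sum(est for est, _ in yearly_results) / n_years
    avg_se = sum(se for _, se in yearly_results) / n_years
    return avg_estimate, avg_se
```

The averaged standard error is never smaller than the value obtained by treating the years as independent samples, so this rule errs on the cautious side.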

  7. Nonresponse • There are several types of survey nonresponse. • Unit, person, supplement and item. • Nonresponse is also handled differently by the various surveys and can cause problems for data users. • Unit nonresponse is generally handled by adjusting survey weights of responders to account for nonresponders. • Heterogeneity among the weights makes it important to use appropriate statistical routines for variance estimation.
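
One common way to adjust responder weights for unit nonresponse is a weighting-class adjustment, sketched below. The slides do not say which method each survey uses, so this is an illustration rather than any agency's procedure. The record keys (cell, base_weight, responded) are hypothetical.

```python
from collections import defaultdict


def adjust_for_unit_nonresponse(records):
    """Inflate responder weights so each cell keeps its full base weight.

    records: list of dicts with hypothetical keys 'cell',
    'base_weight' and 'responded' (bool).
    Returns responders only, each with an added 'adj_weight'.
    """
    cell_total = defaultdict(float)
    cell_responding = defaultdict(float)
    for r in records:
        cell_total[r["cell"]] += r["base_weight"]
        if r["responded"]:
            cell_responding[r["cell"]] += r["base_weight"]

    adjusted = []
    for r in records:
        if r["responded"]:
            # Responders carry the weight of nonresponders in their cell.
            factor = cell_total[r["cell"]] / cell_responding[r["cell"]]
            adjusted.append({**r, "adj_weight": r["base_weight"] * factor})
    return adjusted
```

Because the adjustment factor differs by cell, the adjusted weights become more variable, which is the heterogeneity the slide warns about.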

  8. Person and supplement nonresponse • Person and supplement nonresponse can be more difficult to deal with. • NHIS, for example, contains information on a household, but if the household refused the supplement there is no supplement data for it. • This makes the data structure uneven. • The CPS, on the other hand, fully imputes the data for ASEC (i.e., March) supplement nonresponders (currently about 10% of the cases). • This evens out the data structure, making it easier for data users to work with. • This can be problematic, however, as the CPS full supplement imputation process can lead to rather large biases in estimates (e.g., health insurance coverage). • We are investigating ways of evening out portions of the NHIS data structure to make it easier to work with and disseminate.

  9. Item nonresponse • Item nonresponse is also a challenge. • Decennial census and CPS are fully imputed for item nonresponse. • This makes things much easier for data users. • However, it can simplify things too much. • The NHIS, on the other hand, does not impute missing values. • This is a major problem for people who want to work with the income series on the NHIS (recently they released separate imputed income files). • We are experimenting with imputing the income information on the NHIS files using CPS income data.
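
As a generic illustration of item imputation, the sketch below fills a missing value with a value drawn from a responding record in the same cell (a simple hot deck) and flags every imputed record. It is not the procedure used by the Census Bureau, the NHIS, or the authors' CPS-based experiment, none of which the slides describe. The key names and the seed parameter are hypothetical.

```python
import random


def hot_deck_impute(records, value_key, cell_key, seed=0):
    """Fill missing values (None) from a random donor in the same cell.

    Adds an 'imputed_flag' to every record so analysts can tell
    reported values from imputed ones.
    """
    rng = random.Random(seed)

    # Pool of reported values available as donors in each cell.
    donors = {}
    for r in records:
        if r[value_key] is not None:
            donors.setdefault(r[cell_key], []).append(r[value_key])

    result = []
    for r in records:
        row = dict(r)
        row["imputed_flag"] = r[value_key] is None
        pool = donors.get(r[cell_key])
        if row["imputed_flag"] and pool:
            row[value_key] = rng.choice(pool)
        # With no donor in the cell, the value stays missing.
        result.append(row)
    return result
```

Keeping the flag lets users exclude or reweight imputed cases, which guards against imputation simplifying things too much.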

  10. Question wording and measurement • Question wording changes take many forms. • Change in the basic question • The inclusion of examples • The placement of the question in the survey • Changes in the type of response allowed (e.g., can income amounts be reported in smaller than yearly intervals?) • Providing facsimiles of question wording, and highlighting wording changes in variable documentation, allows users to decide whether comparability is possible for their analyses.

  11. Changes to question universes • Changes in universe definitions affect multiple variables (e.g., the age limit for “adults” answering work and income questions). • Other changes affect single variables. • Providing universe definitions in variable documentation tells users how to restrict their data to achieve comparability. • Testing variable universes reveals when data cleaning is needed before the data are released to users.
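
Restricting data to a comparable universe, as slide 11 describes, might look like the sketch below for an age-based universe. The age limits in the usage note are invented for illustration and are not taken from any survey's documentation. The function and key names are hypothetical.

```python
def restrict_to_common_universe(records, min_age_by_year):
    """Keep only records inside the universe shared by all years.

    min_age_by_year: hypothetical mapping of survey year to the minimum
    age at which the question was asked in that year.
    """
    # The strictest (highest) age limit defines the common universe.
    common_min_age = max(min_age_by_year.values())
    return [r for r in records if r["age"] >= common_min_age]
```

For example, if a question was asked of people aged 14 and over in one year and 15 and over in another, only respondents aged 15 and over are comparable across both years.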

  12. Changes in response categories • Many data harmonization projects lose detail by adopting a “least common denominator” approach. • IPUMS projects adopt the joint goal of: • Losing no information • Providing comparability over time • IPUMS projects achieve these goals through composite coding schemes. • The first digit(s) provides detail available across all years • Trailing digits provide additional detail available in only limited years
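
The composite coding idea on slide 12 can be sketched with integer arithmetic: dropping the trailing digits of a detailed code leaves the general code that is available in every year. The codes in the usage note and the default of two trailing digits are hypothetical, not actual IPUMS codes.

```python
def general_code(detailed_code, trailing_digits=2):
    """Strip the trailing detail digits from a composite code.

    The leading digit(s) that remain are comparable across all years.
    """
    if detailed_code < 0 or trailing_digits < 0:
        raise ValueError("code and digit count must be non-negative")
    return detailed_code // 10 ** trailing_digits
```

Under these assumptions, detailed codes 201 and 215 both collapse to general code 2, so an analyst can use the detailed codes where they exist and the general code for trends over time.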

  13. Other strategies for handling changes in response categories • Creating “bridging” variables is another means of achieving comparability over time. • When responses are given in intervalled form in some years, and in full detail in other years, IPUMS projects provide both detailed and intervalled variables. • Recoding data using a common standard (e.g., the 1950 occupation and industry codes), together with providing the original, unrecoded data, is a third strategy employed by IPUMS projects. • When response changes are too great to achieve comparability (e.g., the shift from 4 to 5 categories for health status in NHIS), the data are provided in separate variables and the issue is discussed in the documentation.
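
Where some years report full detail and others report intervals, an intervalled variable can be derived from the detailed one. Below is a minimal sketch using the standard library's bisect module. The interval boundaries in the usage note are hypothetical, and the function name is an assumption.

```python
from bisect import bisect_right


def to_interval(value, lower_bounds):
    """Return the index of the interval that contains value.

    lower_bounds: sorted lower limits of the intervals; the last
    interval is open-ended at the top.
    """
    index = bisect_right(lower_bounds, value) - 1
    if index < 0:
        raise ValueError("value is below the lowest interval")
    return index
```

With hypothetical lower bounds of 0, 10000, 25000 and 50000, a detailed amount of 10000 falls in the second interval (index 1), so the derived variable lines up with years that only report intervals while the detailed variable is kept alongside it.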

  14. Changes in data processing • Variable documentation also helps users by pointing out subtle changes in data processing by the agency releasing the non-harmonized public use data.

  15. Conclusions • Simplifying data dissemination and harmonization is difficult, and changes in demographic survey design and processing play a major role in making it so. • Sample design • Survey nonresponse • Survey questions and items • Survey processing/editing
