2008 OFHS Public Use File: STATA Tutorial

Using the 2008 OFHS Public Use File: A Self-Guided Tutorial (Stata Version)

Introduction • This tutorial is intended for persons who wish to use the 2008 OFHS Public Use File (PUF). • The PUF excludes any information that could, intentionally or unintentionally, identify a respondent. Geographic information below the county level has been removed. • The dataset is a record of the responses to the survey questions at the respondent level. • This version of the dataset is in Stata format and requires Stata, a statistical analysis package from StataCorp. • The dataset is also available for SAS and SPSS. There is a separate tutorial for SAS users.

STATA Users • Prerequisites • User has Stata Release 9 or higher. • User has experience writing Stata programs. • User has an understanding of basic statistics, including analysis of univariate data using nominal and ordinal level variables. • User is comfortable with statistical terms such as proportion, standard error, confidence level, and confidence interval.

OFHS Background • The 2008 OFHS is the largest state-sponsored health survey in the U.S. • Previous surveys were completed in 1998 and 2004. • The survey had a sample size of 50,993. • The survey was stratified so that there are enough respondents to support analysis for each county in the state.

Documents that you may download before you get started. • OFHS Questionnaire • OFHS Codebook These documents are available on the OFHS web site. http://grc.osu.edu/ofhs Look on the Downloads page.

What you need to know about the survey. • Survey Design • Survey Questions • Imputation of Missing Values • Weighting of Responses • Constructed Variables

Survey Design • The survey is a stratified random sample of Ohio’s non-institutionalized population. • Conducted through telephone interviews: • Land lines (49,000 respondents) • Cell phones (2,000 respondents) • Random digit dialing (land lines) within the telephone exchanges associated with each county. • Exchanges are the first 3 digits of a seven-digit phone number. • The last four digits within each exchange are randomly selected.

Survey Design • Cell Phones • Exchanges are at the state level. • Oversamples • African Americans: some exchanges in the 6 largest urban counties have a higher proportion of African Americans in the population, and these exchanges were sampled at a higher rate. • Asians and Hispanics: the survey was supplemented with lists of persons with Hispanic or Asian surnames. • Household clusters • Each household/family forms a cluster within the sample. • One adult and one child are randomly selected within the family. • Each response includes information on the adult and on the child (if there are any children). • The adult who is most knowledgeable about the child’s health responds for the child.

Survey Design • The population of persons within each of the strata (State, County, telephone exchange, household, etc.) is already known or is collected as a part of the survey. • A weight is established for each child and adult which reflects the inverse of the probability of being selected for the survey. • Indicators of the strata and the weights are used in the STATA programs. We will come back to this later on.

Survey Questions • In the survey questionnaire there are different kinds of questions. They include: • Qs that help to establish the weights for the survey. • How many children are in the family? • How many phone numbers are in the home?

Survey Questions • Qs that identify the demographic and socioeconomic characteristics of the individuals and the family. • Age, gender, race, ethnicity. • Family income, employment, occupation. • Education

Survey Questions • Qs that identify the insurance status of the adult and child respondents. • Source of coverage (job-based, Medicare, Medicaid, etc.) • If no insurance, the length of time without insurance. • Difficulty in getting insurance. • Types of coverage (dental, prescriptions, vision, mental health)

Survey Questions • Health Status of Adult and Child • General health status • Chronic health conditions • Special Health Care needs • Functional disability • Height and weight

Survey Questions • Health Care Access, Utilization, Satisfaction and Unmet needs. • Usual source of care • Care coordination • Specialists • Emergency room use • Hospitalizations • Types of unmet needs.

Survey Questions • Questions are at multiple levels. • Anchor Questions are questions that are asked of everyone. • Qualifying Questions are questions that help to narrow down who should be responding to an in-depth question. • In-depth questions probe the dimensions of the respondent’s experience with a particular phenomenon.

Example of Question levels • Anchor Question
D43. //Have you/Has person in S1// ever been told by a doctor or any other health professional that //you/he// had diabetes or sugar diabetes?
01 YES • 02 (Skip to D45) NO • 03 [VOLUNTEERED:] BORDERLINE • 98 DK • 99 REFUSED
D43a. //Have you/Has person in S1// ever been told by a doctor or any other health professional that //you/he/she// had TYPE 1 CHILD ONSET DIABETES or TYPE 2 ADULT ONSET, DIABETES? [INTERVIEWER NOTE: PROBE FOR TYPE, AND IF RESPONDENT SAYS ‘BORDERLINE’ CODE AS ‘03’] //Display response option 97, only if S15 = 02, 99. //
97 (Skip to D45) [VOLUNTEERED:] YES, “GESTATIONAL” OR “ONLY WHEN PREGNANT” MENTIONED • 01 YES - TYPE I (JUVENILE) • 02 YES - TYPE II (ADULT ONSET) • 03 [VOLUNTEERED:] BORDERLINE DIAGNOSIS ONLY • 04 (Skip to D45) NO, NEVER DIAGNOSED WITH DIABETES • 98 (Skip to D45) DK • 99 (Skip to D45) REFUSED

Example of Question levels • Qualifying Question
D43b. //If (s15 = 02) then ask:// //Was your/Was person in S1’s// DIABETES only during a time associated with a pregnancy? [INTERVIEWER: PROBE FOR PROPER CODE]
01 (Skip to D45) YES ONLY WHEN PREGNANT • 02 NO • 98 (Skip to D45) DK • 99 (Skip to D45) REFUSED
• In-Depth Question
D44. //Is your/Is person in S1’s// blood sugar or glucose level, which affects diabetes, USUALLY under control or where a physician wants it, even if medication is required: Always, Usually, Sometimes, Rarely, or Never?
01 ALWAYS • 02 USUALLY • 03 SOMETIMES • 04 RARELY • 05 NEVER • 98 DK • 99 REFUSED

Question levels • Notice in the example that some answers carry instructions to skip to a later question (for example, answering NO to D43 skips to D45). • In this way the anchor and qualifying questions screen respondents out of the in-depth questions. • As a result, when a question is not asked of a respondent, the respondent has a missing value for that question which is MISSING BY DESIGN.

Missing Values • Some data are missing because the respondent refused to answer the question or did not know the answer. • These kinds of missing values need to be treated differently than those that are ‘missing by design’.
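A minimal sketch of keeping the two kinds of missing values apart in Stata. It assumes (check the codebook) that DK and REFUSED are stored with the questionnaire codes 98 and 99 and that missing-by-design records are system missing (.). The variable name d43 is hypothetical, and the input block only creates toy values so the sketch runs on its own; with the OFHS file you would skip it. mvdecode turns 98 and 99 into the extended missing values .d and .r, which estimation commands exclude but which you can still tell apart from missing by design.

```stata
* Toy data standing in for a question like D43 (hypothetical variable name d43)
clear
input d43
1
2
3
98
99
.
end

* Recode DK (98) to .d and REFUSED (99) to .r; system missing (.) stays as
* missing by design. All three are excluded from estimates but remain distinct.
mvdecode d43, mv(98=.d \ 99=.r)

* Count each category, listing each type of missing value separately
tabulate d43, missing
```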

Missing Values • For some questions that are very important to the survey design or to public policy, missing values are not acceptable. • These include questions such as: • Number of children in the family (design) • Family income (public policy)

Imputation of Missing Values • Where it is important for the survey to have no missing values, the survey statisticians have replaced each missing value with an imputed value, estimated from other respondents who answered the rest of the survey in a similar way. • Imputation relies on sophisticated statistical models and procedures, and the practice is well accepted. • When using this survey for analysis, you are expected to use the version of the variable that includes the imputed values. • These variables are labeled and typically have the suffix “_imp”.
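A quick way to find the imputed versions of variables, assuming the OFHS file is already loaded in Stata. The wildcard *_imp matches every variable whose name ends in _imp, and codebook summarizes one of them; h87_imp is the imputed variable used for the poverty-level example later in this tutorial.

```stata
* List every variable whose name ends in _imp (the imputed versions)
describe *_imp

* Summarize one imputed variable; codebook reports its range, labels,
* and the number of missing values
codebook h87_imp
```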

Weighting • A weight is constructed for each adult and child response that reflects the inverse of the probability of being selected for the survey, and the weights should be used in all analyses. • When the weights are used, the results accurately reflect the entire population.
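In symbols, in the simplest form (the notation is introduced here and is not taken from the OFHS documentation): let $\pi_i$ be the probability that respondent $i$ was selected, $S$ the set of respondents, and $y_i = 1$ if respondent $i$ has the characteristic of interest and 0 otherwise.

```latex
w_i = \frac{1}{\pi_i}, \qquad
\hat{N} = \sum_{i \in S} w_i, \qquad
\hat{p} = \frac{\sum_{i \in S} w_i \, y_i}{\sum_{i \in S} w_i}
```

A respondent selected with probability 1/2,000 therefore stands for 2,000 people. The weighted proportion $\hat{p}$ is the point estimate the svy commands report; the strata and sampling units are what they additionally need to compute its standard error.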

Weighting • If the weights for children in the OFHS were summed up across all responses, the total would be equal to the child population of Ohio. The same is true of the adult weights. • The variable name for the adult weight is “wt_a”. • The variable name for the child weight is “wt_c”.
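A sketch of checking this property, assuming the OFHS file is loaded. summarize skips missing values, so if records without a sampled child carry a missing wt_c (check the codebook), they drop out of the child total automatically.

```stata
* Sum the child weights; missing values of wt_c are skipped by summarize
summarize wt_c, meanonly
display "Weighted child population: " %15.0fc r(sum)

* The same check for the adult weights
summarize wt_a, meanonly
display "Weighted adult population: " %15.0fc r(sum)
```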

Constructed Variables • There are many variables in the OFHS file that are constructed from the responses to the survey questions that make it easier to use the OFHS. These variables include: • BMI – Body mass index. BMI is an indicator of adult and child obesity constructed from height and weight. The formula is complicated, especially for children. We make it easier for the user to do analysis of obesity by pre-calculating it.
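For adults, BMI is weight divided by the square of height. The standard formulas in metric and U.S. units are:

```latex
\mathrm{BMI} = \frac{\text{weight (kg)}}{\text{height (m)}^2}
\approx 703 \times \frac{\text{weight (lb)}}{\text{height (in)}^2}
```

For example, an adult who is 70 inches tall and weighs 180 pounds has a BMI of about 703 × 180 / 4,900 ≈ 25.8. For children, BMI is typically interpreted against age- and sex-specific growth-chart percentiles, which is the complicated part that the pre-calculated OFHS variable saves you.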

Constructed Variables • Insurance Type – In many instances, respondents to the survey had more than one source of insurance. For example, many seniors have insurance from their private pension plans and Medicare. For the purpose of creating an unduplicated count of the population by their insurance status, we have created a variable which imposes a hierarchy of insurance sources to classify the population.
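To see the categories of the hierarchy, you can list the value labels attached to the constructed variable. This sketch assumes the OFHS file is loaded and uses i_type_c, the child insurance-type variable analyzed later in this tutorial. The name of the adult counterpart is not given here, so look it up in the codebook.

```stata
* Find the name of the value label attached to i_type_c
local lbl : value label i_type_c

* List the numeric codes and labels that define the insurance categories
label list `lbl'

* Unweighted frequencies, showing codes and labels together
codebook i_type_c
```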

Using Stata with the OFHS • Step 1. Download and Un-zip the Stata dataset. • Step 2. Open dataset in Stata. • Step 3. Set survey design parameters in Stata. • Step 4. Build and run your first OFHS Stata Program

Download and Unzip the Stata dataset. • You will find the OFHS Public Use Dataset at: http://grc.osu.edu/ofhs/datadownloads/index.htm • Right-click on the file name and select ‘Save target as’ (the wording varies by browser). • Save the ZIP file to the directory where you will store the data (for example, c:\statadata\ofhs2008). • After the file has been saved, use WinZip (or another unzip utility) to extract the file into the same directory.
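Once the file is unzipped, Step 2 (opening the dataset) can be done from the Command window or a do-file. A minimal sketch, assuming the directory used above; the file name ofhs2008_puf.dta is hypothetical, so substitute the name of the .dta file you extracted.

```stata
* Work in the directory where the unzipped file was saved
cd "c:\statadata\ofhs2008"

* Load the public use file (hypothetical file name), replacing any data in memory
use "ofhs2008_puf.dta", clear

* Confirm the load: number of observations and variables
describe, short
```

If Stata 9, 10, or 11 reports that there is not enough memory, run clear and then raise the allocation with set memory before use (for example, set memory 200m; the size is only an illustration). Stata 12 and later allocate memory automatically.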

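Setting survey design parameters • After you open the data in Stata, you must set the survey design parameters before running any analyses. To do this, type the following command in the Stata Command window. Here masterid identifies the sampling unit, stratum identifies the sampling stratum, and singleunit(certainty) tells Stata to treat strata that contain only one sampling unit as certainty units. (Note: svyset settings are kept only if you save the dataset afterward, so plan to re-enter them EVERY time you open the original file and whenever you switch between adult and child analyses.)
• If conducting analyses on the adult population:
svyset masterid [pweight=wt_a], strata(stratum) singleunit(certainty) vce(linearized)
• If conducting analyses on the child population:
svyset masterid [pweight=wt_c], strata(stratum) singleunit(certainty) vce(linearized)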
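Because the adult and child commands differ only in the weight variable, a do-file can keep that choice in one place. A minimal sketch using a local macro named wt (the macro name is our choice, not part of the OFHS file); svydescribe then reports the strata and sampling units so you can confirm the design was set.

```stata
* Pick the population: wt_a for adults, wt_c for children
local wt wt_c

* Declare the design using the chosen weight
svyset masterid [pweight=`wt'], strata(stratum) singleunit(certainty) vce(linearized)

* Check the design: number of strata, sampling units, and observations
svydescribe
```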

Build and run your first OFHS Stata Program • You should only use Stata commands that support complex survey designs, including: • svy: mean (estimates means) • svy: proportion (estimates proportions) • svy: tabulate (produces one-way and two-way tables) • A detailed list of commands that support complex survey designs can be found from the Help menu in Stata (on the toolbar): choose Stata command and type “svy estimation”.
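To see what the svy prefix changes, you can compare a design-based estimate with a naive one. A sketch assuming the child design has been set with svyset; the second command is for comparison only, because it ignores the weights, strata, and sampling units.

```stata
* Design-based proportions: uses wt_c, stratum, and masterid from svyset
svy: proportion i_type_c

* Naive proportions: no weights, strata, or clustering (do not report these)
proportion i_type_c
```

The point estimates generally differ because of the weights, and the standard errors differ because of the stratification and clustering.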

svy: tabulate • Here is a simple program which calculates the proportion of children by insurance type (i_type_c), with a 95% confidence interval around each proportion. Note that you have already entered the sampling design parameters for children at the beginning of your session. To analyze adult variables, you will have to re-enter the design parameters using the adult svyset command shown under “Setting survey design parameters”.
svy: tab i_type_c, ci

[Output not reproduced: svy: tab results (with a little cutting and pasting and formatting of values)]
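Much of the cutting and formatting can be avoided by asking svy: tabulate for percentages with a fixed display format. A sketch assuming the child design is set; percent and format() are display options of svy: tabulate, so check help svy tabulate in your Stata release if either is rejected.

```stata
* Percentages instead of proportions, one decimal place, with 95% CIs
svy: tabulate i_type_c, percent ci format(%5.1f)
```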

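svy: tabulate • Now you might add some domain analysis to this, breaking out insurance status for children by poverty level. First create an indicator, poverty200, from the imputed variable h87_imp (1 if h87_imp is 4 or less, 0 if it is greater than 4). Because Stata treats missing values as larger than any number, the last replace line resets any record with a missing h87_imp to missing.
generate poverty200=.
replace poverty200=0 if h87_imp>4
replace poverty200=1 if h87_imp<=4
replace poverty200=. if missing(h87_imp)
svy: tab i_type_c if poverty200==0, se ci
svy: tab i_type_c if poverty200==1, se ci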

[Output not reproduced: svy: tabulate with an if statement]
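Stata’s survey documentation recommends the subpop() option, rather than an if qualifier, for domain estimates. An if qualifier drops the other observations before the variance is computed, which can give incorrect standard errors, while subpop() keeps the full design. A sketch of the same two domains, assuming the child design is set and poverty200 has been created as in the example above:

```stata
* Children in the poverty200 == 0 group (h87_imp greater than 4)
svy, subpop(if poverty200 == 0): tabulate i_type_c, se ci

* Children in the poverty200 == 1 group (h87_imp of 4 or less)
svy, subpop(if poverty200 == 1): tabulate i_type_c, se ci
```

The point estimates match the if version; only the standard errors and confidence intervals can change.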

The END
