1. Using the 2008 OFHS Public Use File
A Self Guided Tutorial
*SAS Version*
2. This tutorial is intended for persons who wish to use the 2008 OFHS Public Use File (PUF).
The PUFs exclude any information that could identify a respondent, whether intentionally or unintentionally. Geographic information below the county level has been removed.
The dataset is a record of the responses to the survey questions at the respondent level.
The dataset is in a format that requires SAS, a statistical analysis package from SAS Institute.
The dataset is also available for STATA and SPSS. There is a separate tutorial for STATA users.
3. SAS Users Prerequisites
User has SAS version 9.1 or later.
User has experience writing SAS programs and running them in the SAS Display Manager user interface, or in SAS Enterprise Guide.
User has an understanding of basic statistics, including analysis of univariate data using nominal and ordinal level variables.
User is comfortable with statistical terms such as proportions, standard error, confidence level, and confidence interval.
4. OFHS Background The 2008 OFHS is the largest state-sponsored health survey in the U.S.
Previous surveys were completed in 1998 and 2004.
The survey had a sample size of 50,993.
The sample was stratified so that each county in the state has enough respondents to support some analysis.
5. Documents that you may download before you get started. OFHS Questionnaire
OFHS Codebook
OFHS Methods Report
6. What you need to know about the survey. Survey Design
Survey Questions
Imputation of Missing Values
Weighting of Responses
Constructed Variables
7. Survey Design The survey is a stratified random sample of Ohio’s non-institutional population.
Conducted through telephone interviews.
Land Lines (49,000 respondents)
Cell Phone (2,000 respondents)
Random Digit Dialing (land lines) within exchange numbers associated with each county.
Exchanges are the first 3 digits of a seven-digit phone number.
The last four digits within each exchange are randomly selected.
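The random digit dialing idea above can be sketched in a few lines of Python. This is only an illustration of the principle, not the survey vendor's procedure. The area code and exchange shown are made-up values.

```python
import random

def random_digit_dial(area_code, exchange, rng):
    """Return one phone number with a randomly chosen four-digit suffix."""
    suffix = rng.randrange(10_000)  # 0000 through 9999, each equally likely
    return f"{area_code}-{exchange}-{suffix:04d}"

rng = random.Random(2008)  # fixed seed so the sketch is repeatable
# "614" and "555" are illustrative values, not actual sampled exchanges.
sample = [random_digit_dial("614", "555", rng) for _ in range(5)]
print(sample)
```

Because the exchange is tied to a county, every number generated this way falls within that county's stratum.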
8. Survey Design Cell Phones
Exchanges are at state level.
Over Samples
African Americans - Some exchanges in the 6 largest urban counties have a higher proportion of African Americans in the population. These higher-proportion exchanges were sampled at a higher rate.
Asians and Hispanics - The sample was supplemented with lists of persons with Hispanic or Asian surnames.
Household clusters
Each household/family forms a cluster within the sample.
One adult and one child are randomly selected within the family.
Each response includes information on the adult, and the child (if there are any children).
The adult who is most knowledgeable about the child’s health responds for the child.
9. Survey Design The population of persons within each of the strata (State, County, telephone exchange, household, etc.) is already known or is collected as a part of the survey.
A weight that reflects the inverse of the probability of being selected for the survey is established for each child and adult.
Indicators of the strata and the weights are used in the SAS programs. We will come back to this later on.
10. Survey Questions In the survey questionnaire there are different kinds of questions. They include:
Qs that help to establish the weights for the survey.
How many children are in the family?
How many phone numbers are in the home?
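To see why these two questions matter for weighting, here is a rough Python sketch. It assumes that a household with more phone numbers is proportionally more likely to be dialed, and that the selected child represents all children in the family. Both adjustments and all numbers are assumptions for illustration. The Methods Report describes how the OFHS weights were actually built.

```python
def household_adjusted_weight(base_weight, phone_numbers):
    # Assumption: a household reachable on k numbers is k times as likely
    # to be dialed, so its weight is divided by k.
    return base_weight / max(phone_numbers, 1)

def child_weight(household_weight, children_in_family):
    # Assumption: one child is picked at random, so that child stands in
    # for every child in the family.
    return household_weight * children_in_family

hh = household_adjusted_weight(base_weight=200.0, phone_numbers=2)
print(hh)                   # 100.0
print(child_weight(hh, 3))  # 300.0
```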
11. Survey Questions Qs that identify the demographic and socioeconomic characteristics of the individuals and the family.
Age, gender, race, ethnicity.
Family income, employment, industry.
Education
12. Survey Questions Qs that identify the insurance status of the adult and child respondents.
Source of Coverage (Job based, Medicare, Medicaid, etc.)
If no insurance, the length of time without insurance, reason for being uninsured.
If insured, length of time covered by current plan.
Types of Coverage (dental, prescriptions, vision, mental health)
13. Survey Questions Health Status of Adult and Child
General health status
Chronic health conditions
Special Health Care needs
Functional disability
Height and weight
14. Survey Questions Health Care Access, Utilization, Satisfaction and Unmet needs.
Usual source of care
Care coordination
Specialists
Emergency room use
Hospitalizations
Types of unmet needs.
15. Survey Questions Questions are at multiple levels.
Anchor Questions are questions that are asked of everyone.
Qualifying Questions are questions that help to narrow down who should be responding to an in-depth question.
In-depth questions probe the dimensions of the respondent’s experience with a particular phenomenon.
16. Example of Question levels D43. //Have you/Has person in S1// ever been told by a doctor or any other health professional that //you/he// had diabetes or sugar diabetes?
01 YES
02 (Skip to D45) NO
03 [VOLUNTEERED:] BORDERLINE
98 DK
99 REFUSED
D43a //Have you/Has person in S1// ever been told by a doctor or any other health professional that //you/he/she// had TYPE 1 CHILD ONSET DIABETES or TYPE 2 ADULT ONSET, DIABETES?
[INTERVIEWER NOTE: PROBE FOR TYPE, AND IF RESPONDENT SAYS ‘BORDERLINE’ CODE AS ‘03’]
//Display response option 97, only if S15 = 02, 99.//
97 (Skip to D45) [VOLUNTEERED:] YES, “GESTATIONAL” OR “ONLY WHEN PREGNANT” MENTIONED
01 YES - TYPE I (JUVENILE)
02 YES - TYPE II (ADULT ONSET)
03 [VOLUNTEERED:] BORDERLINE DIAGNOSIS ONLY
04 (Skip to D45) NO, NEVER DIAGNOSED WITH DIABETES
98 (Skip to D45) DK
99 (Skip to D45) REFUSED
17. Example of Question levels
18. Question levels Notice in the example that there are instructions to skip to another question if the answer is no.
These are anchor and qualifying questions, which screen out persons who should not answer the in-depth questions.
As a result, when a question is not asked of a respondent, the respondent has a missing value for that question which is MISSING BY DESIGN.
19. Missing Values Some data is missing in the survey because the respondent refused to answer the question, or did not know the answer.
These kinds of missing values need to be treated differently than those that are ‘missing by design’.
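The difference between the two kinds of missing values can be sketched with the D43 / D43a example. This Python fragment uses the response codes shown in the questionnaire excerpt. How the PUF itself stores each case is an assumption here, so check the Codebook before relying on it.

```python
DONT_KNOW, REFUSED = 98, 99

def classify_d43a(d43, d43a):
    """Classify the D43a value for one respondent.

    d43a is None when no answer was recorded (an assumption of this sketch).
    """
    if d43 == 2:  # D43 = NO carries the instruction "Skip to D45"
        return "missing by design"
    if d43a in (DONT_KNOW, REFUSED):
        return "don't know / refused"
    if d43a is None:
        return "missing, reason unknown"
    return "answered"

print(classify_d43a(2, None))  # missing by design
print(classify_d43a(1, 99))    # don't know / refused
print(classify_d43a(1, 2))     # answered
```

Persons who are missing by design generally do not belong in the denominator for the in-depth question, while "don't know" and "refused" responses come from persons who were eligible to answer.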
20. Missing Values There are some types of questions which are very important to the survey design or for public policy issues, for which it is not acceptable to have values missing.
These include questions like:
Number of children in the family (design)
Family Income (public policy)
21. Imputation of Missing Values Where it is important for the survey to have no missing values, the survey statisticians have replaced each missing value by imputing it from the other respondents who answered the rest of the survey in a similar way.
Survey statisticians use very sophisticated models and processes to do imputation, and the practice is well accepted.
When using this survey for analysis, the user is expected to decide whether to use the form of the variable that includes the imputed values.
Imputed variables have a suffix of “_imp”.
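A quick way to see which variables have an imputed form is to look for the suffix in the list of variable names. The Python sketch below does this. The variable names in the example are hypothetical, so take the real ones from the Codebook.

```python
def imputed_variables(variable_names):
    """Map each base name to the variable that carries the "_imp" suffix."""
    suffix = "_imp"
    return {
        name[: -len(suffix)]: name
        for name in sorted(variable_names)
        if name.lower().endswith(suffix)
    }

# Hypothetical variable names, for illustration only.
names = ["age", "fam_income", "fam_income_imp", "num_children_imp"]
print(imputed_variables(names))
```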
22. Weighting A weight is constructed for each adult and child response. It reflects the inverse of the probability of being selected for the survey and should be used in all analysis.
When the weights are used, the results accurately reflect the entire population.
23. Weighting If the weights for children in the OFHS were summed up across all responses, the total would be equal to the child population of Ohio. The same is true of the adult weights.
The variable name for the adult weight is “wt_a”.
The variable name for the child weight is “wt_c”.
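The arithmetic behind the weights can be shown with a toy example in Python. The four records and the "insured" indicator below are made up. Only the name wt_a comes from the file.

```python
# Made-up records: each respondent carries an adult weight (wt_a) and a
# hypothetical 0/1 indicator called "insured".
records = [
    {"wt_a": 100.0, "insured": 1},
    {"wt_a": 300.0, "insured": 0},
    {"wt_a": 100.0, "insured": 1},
    {"wt_a": 500.0, "insured": 0},
]

population = sum(r["wt_a"] for r in records)  # estimated adult population
insured = sum(r["wt_a"] for r in records if r["insured"] == 1)

weighted_share = insured / population
unweighted_share = sum(r["insured"] for r in records) / len(records)

print(population)        # 1000.0
print(weighted_share)    # 0.2
print(unweighted_share)  # 0.5
```

Half of the respondents in this toy sample are insured, but they represent only a fifth of the population, which is why unweighted results can mislead.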
24. Constructed Variables There are many variables in the OFHS file that are constructed from the responses to the survey questions that make it easier to use the OFHS. These variables include:
BMI – Body mass index. BMI is an indicator of adult and child obesity constructed from height and weight. The formula is complicated, especially for children. We make it easier for the user to do analysis of obesity by pre-calculating it.
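For reference, the standard adult BMI calculation looks like this in Python. The sketch assumes height in inches and weight in pounds, which may not match the units in the file. It does not cover children, whose BMI is interpreted against age- and sex-specific reference values. The pre-calculated variable in the file should be used for analysis.

```python
def adult_bmi(weight_lb, height_in):
    """Adult BMI: 703 * weight (lb) / height (in) squared (same as kg / m^2)."""
    if weight_lb <= 0 or height_in <= 0:
        raise ValueError("weight and height must be positive")
    return 703.0 * weight_lb / height_in ** 2

print(round(adult_bmi(150, 65), 1))  # 25.0
```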
25. Constructed Variables Insurance Type – In many instances, respondents to the survey had more than one source of insurance. For example, many seniors have insurance from their private pension plans and Medicare. For the purpose of creating an unduplicated count of the population by their insurance status, we have created a variable which imposes a hierarchy of insurance sources to classify the population.
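A hierarchy of this kind can be sketched in Python as follows. The order of the categories is hypothetical and chosen only to show the mechanics. The Codebook documents the hierarchy actually used in the file.

```python
# Hypothetical ordering, highest priority first. Not the OFHS hierarchy.
HIERARCHY = ["Medicaid", "Medicare", "Job based", "Other"]

def primary_insurance(sources):
    """Return the single category assigned to a person with these sources."""
    for category in HIERARCHY:
        if category in sources:
            return category
    return "Uninsured"

print(primary_insurance({"Job based", "Medicare"}))  # Medicare
print(primary_insurance(set()))                      # Uninsured
```

Each person lands in exactly one category, so the categories add up to the whole population without double counting.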
26. Using SAS with the OFHS Step 1. Make your PC Ready.
Step 2. Download and Un-zip the SAS dataset.
Step 3. Assign a SAS Library name and restore SAS formats.
Step 4. Build and run your first OFHS SAS Program
27. Make Your PC Ready Create a directory for the OFHS Public Use File. It should look like this:
C:\sasdata\ofhs2008
Make sure that you have software to decompress the SAS dataset. WinZip is a popular product which works well for this.
Make sure there is enough room on the drive for the OFHS file after it is unzipped. You will need at least 800 megabytes of storage space. You will need additional temporary work space for when the file is processing. You may want to put the file on a separate drive from the drive which houses the temporary work space (typically Drive C).
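If Python is installed, the free space can be checked with a short script like the one below. It is a convenience sketch, not part of the OFHS materials. It checks only the 800 megabyte minimum, not the additional temporary work space.

```python
import shutil

REQUIRED_BYTES = 800 * 1024 ** 2  # the 800 megabyte minimum stated above

def has_room(path, required=REQUIRED_BYTES):
    """Return True if the drive holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= required

# "." checks the current drive; substitute the drive that will hold the data.
print(has_room("."))
```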
28. Download and Unzip the SAS dataset. You will find the OFHS Public Use Dataset at:
http://grc.osu.edu/ofhs/datadownloads/index.htm
Right-click on the file name and select ‘save target as’.
Save the ZIP file to the directory where you will store the data (C:\sasdata\ofhs2008).
After the file has been saved, run WinZip, saving the unzipped files to the same directory.
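If WinZip is not available, the same step can be done with Python's standard library. This is an alternative sketch, not part of the OFHS instructions. The ZIP file name in the comment is hypothetical.

```python
import zipfile
from pathlib import Path

def unzip_to_same_directory(zip_path):
    """Extract a ZIP archive into the directory that contains it."""
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(zip_path.parent)
        return sorted(archive.namelist())

# Example call (hypothetical file name):
# unzip_to_same_directory(r"C:\sasdata\ofhs2008\ofhs2008.zip")
```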
29. Download and Unzip the SAS dataset After you download and unzip the data, the directory will contain the following files:
Formats.sas7bdat
Restore_formats.sas
OFHS2008.sas7bdat
30. Assign a SAS Library name and restore SAS formats First, you must start SAS or SAS Enterprise Guide.
Open Restore_formats.sas in the program editor window.
31. Assign a SAS Library name and restore SAS formats
32. Build and run your first OFHS SAS Program You should use only SAS procedures that support complex survey designs, including:
Proc Surveymeans
Proc Surveyfreq
Proc Surveylogistic
Proc Surveyreg
Most newcomers will use Proc Surveymeans to start out. If you are familiar with Surveylogistic or Surveyreg, you probably do not need this tutorial.
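To show what a design-based procedure does differently, here is a rough Python sketch of a weighted proportion with a Taylor series (linearization) standard error for a stratified, clustered sample. It ignores finite population corrections and is not a substitute for the SAS survey procedures. The stratum and cluster identifiers are hypothetical inputs, since this tutorial text does not name the design variables in the file.

```python
from collections import defaultdict
from math import sqrt

def weighted_proportion_se(rows):
    """rows: (stratum, cluster, weight, y) tuples with y coded 0/1.

    Returns (weighted proportion, linearized standard error).
    """
    total_w = sum(w for _, _, w, _ in rows)
    p = sum(w * y for _, _, w, y in rows) / total_w

    # Sum the linearized values within each cluster, grouped by stratum.
    clusters = defaultdict(lambda: defaultdict(float))
    for stratum, cluster, w, y in rows:
        clusters[stratum][cluster] += w * (y - p) / total_w

    variance = 0.0
    for totals in clusters.values():
        n = len(totals)
        if n < 2:
            continue  # a single-cluster stratum adds nothing in this sketch
        mean = sum(totals.values()) / n
        variance += n / (n - 1) * sum((t - mean) ** 2 for t in totals.values())
    return p, sqrt(variance)

# Made-up data: one stratum, four single-person clusters, equal weights.
rows = [("s", 1, 1.0, 0), ("s", 2, 1.0, 1), ("s", 3, 1.0, 1), ("s", 4, 1.0, 0)]
p, se = weighted_proportion_se(rows)
print(round(p, 3), round(se, 4))  # 0.5 0.2887
```

With equal weights and one person per cluster, the result matches the familiar simple random sample formula. With unequal weights and household clusters, it does not, which is the reason for using the survey procedures.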
33. Proc Surveymeans
34. Proc Surveymeans results (with a little cutting and pasting and formatting of values)
35. Proc Surveymeans
36. Surveymeans with a Domain Statement
37. Proc Surveyfreq
38. Results of Proc Surveyfreq
39. Domain Analysis in Proc Surveyfreq
40. Domain Analysis in Proc Surveyfreq
41. The END