Organizing Your Data for Statistical Analysis in SPSS
Edward A. Greenberg, PhD
ASU HEALTH SOLUTIONS DATA LAB
Revised January 4, 2013
Rows are cases or observations
Columns are variables (measurements)
Up to 2^31-1 columns (2,147,483,647)
No limit on the number of cases
Numeric (40 character maximum length)
Dates and times (various formats)
Other variations of numeric (currency, comma, scientific notation, etc.)
String (32,767 maximum length)
Variable names must be unique.
Variable names may be up to 64 characters in length.
Names can contain letters, numbers, and certain special characters (such as the period, underscore, @, #, and $), but not spaces.
Names must start with a letter or @, #, or $.
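The naming rules above can be checked before data entry begins. The following Python sketch is illustrative only and is not part of SPSS: the function name `check_variable_names` is hypothetical, the set of characters allowed after the first one (letters, digits, `.`, `_`, `@`, `#`, `$`) is an assumption, only ASCII letters are considered, and uniqueness is assumed to ignore case.

```python
import re

# Assumption: after the first character, a name may contain letters, digits,
# and the characters . _ @ # $ (ASCII letters only, for simplicity).
NAME_PATTERN = re.compile(r"[A-Za-z@#$][A-Za-z0-9._@#$]*")
MAX_NAME_LENGTH = 64


def check_variable_names(names):
    """Return a dict mapping each problem name to a list of reasons."""
    problems = {}
    seen = set()
    for name in names:
        reasons = []
        if len(name) > MAX_NAME_LENGTH:
            reasons.append("longer than 64 characters")
        if not NAME_PATTERN.fullmatch(name):
            reasons.append("bad first character or disallowed character")
        # Assumption: names that differ only in case count as duplicates.
        key = name.lower()
        if key in seen:
            reasons.append("duplicate name")
        seen.add(key)
        if reasons:
            problems[name] = reasons
    return problems


if __name__ == "__main__":
    candidates = ["age", "1stvisit", "AGE", "bdi_01", "@tmp", "marital status"]
    for name, reasons in check_variable_names(candidates).items():
        print(name, "->", "; ".join(reasons))
```

In this sketch, `1stvisit` fails because it starts with a digit, `AGE` is flagged as a duplicate of `age`, and `marital status` fails because it contains a space.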
What constitutes a “case”?
An experimental trial
Variable names may be short and cryptic.
Variable labels can be up to 255 characters.
SPSS procedures display at least 40 characters of variable labels.
Value labels can be up to 120 characters.
The order of variables in the SPSS data file normally should be the same as the order of items in the questionnaire.
Use variable names that help you identify the scale or instrument to which they apply.
Each case in an SPSS file should include a case number.
Often this will be the first variable in the file.
The case number does not identify the subject but it links the data record to the subject’s questionnaire.
Useful for correcting data entry errors
Data may be missing for several reasons:
Refused to answer
Skipped a question
Data entry omission
SPSS provides several ways of designating numeric data as “missing values.”
A blank cell is treated as “system missing,” represented by a dot (“.”) in the SPSS Data Editor.
Specific values can be declared as “user missing” values.
Up to three “user missing” values can be declared for a variable.
Or, a range of values plus one additional value can be declared to be missing.
In this example, variable AGEWED has three labeled values that are to be treated as missing.
The three values are declared to be missing in the Missing Values dialog.
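The effect of declaring user-missing values can be sketched in Python. This is only an illustration of the idea, not SPSS code: the codes 97, 98, and 99 are hypothetical (the codes used for AGEWED are not listed here), `None` stands in for a system-missing blank cell, and the function names are made up for this example.

```python
# Hypothetical user-missing codes for AGEWED; the actual codes are not given.
USER_MISSING = {"AGEWED": {97, 98, 99}}


def is_missing(variable, value):
    """True for system-missing (None) or a declared user-missing code."""
    if value is None:  # a blank cell, shown as "." in the Data Editor
        return True
    return value in USER_MISSING.get(variable, set())


def valid_values(variable, values):
    """Keep only the values that are not missing for this variable."""
    return [v for v in values if not is_missing(variable, v)]


agewed = [22, 98, None, 31, 99, 19, 97]
valid = valid_values("AGEWED", agewed)
print(valid)                    # [22, 31, 19]
print(sum(valid) / len(valid))  # 24.0
```

Without the declaration, the codes 97, 98, and 99 would be averaged in as if they were real ages, which is why the missing-value declaration matters.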
Expressions handle missing values in different ways.
The result of (var1+var2+var3)/3 is missing if any of the three variables is missing.
The result of MEAN(var1, var2, var3) is missing only if all three of the variables are missing.
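The difference between the two expressions can be sketched in Python, with `None` standing in for a missing value. The function names are hypothetical and this is a rough model of the behavior described above, not SPSS syntax.

```python
def arithmetic_mean_of_three(var1, var2, var3):
    """Model of (var1+var2+var3)/3: missing if any input is missing."""
    if any(v is None for v in (var1, var2, var3)):
        return None
    return (var1 + var2 + var3) / 3


def mean_function(*values):
    """Model of MEAN(...): averages the valid values; missing only if none are valid."""
    valid = [v for v in values if v is not None]
    if not valid:
        return None
    return sum(valid) / len(valid)


print(arithmetic_mean_of_three(4, None, 8))  # None
print(mean_function(4, None, 8))             # 6.0
print(mean_function(None, None, None))       # None
```

Note that the MEAN-style result for a case with one missing value is based on two values rather than three, so scale scores computed this way can rest on different numbers of items from case to case.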
The FREQUENCIES procedure excludes cases with missing values from computations.
In the MULT RESPONSE procedure, multiple response variables are combined into groups.
The MULT RESPONSE procedure counts the responses in multiple response groups and reports them in frequency tables or crosstabulations.
Because a respondent can give more than one response, percentages based on the number of cases generally total more than 100%.
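The reason the percentages can exceed 100% is easiest to see with a small count. The Python sketch below uses made-up data (four hypothetical respondents who could each select several information sources) and computes percentages two ways: relative to the number of cases and relative to the number of responses.

```python
from collections import Counter

# Hypothetical data: each case lists every option the respondent selected.
cases = [
    {"newspaper", "tv"},
    {"tv"},
    {"tv", "radio", "web"},
    {"web", "radio"},
]

# Count how many times each option was selected across all cases.
counts = Counter(option for selected in cases for option in selected)
n_cases = len(cases)
n_responses = sum(counts.values())

percent_of_cases = {opt: 100 * n / n_cases for opt, n in counts.items()}
percent_of_responses = {opt: 100 * n / n_responses for opt, n in counts.items()}

print(sum(percent_of_cases.values()))      # 200.0
print(sum(percent_of_responses.values()))  # 100.0
```

Here four cases produce eight responses, so the percentages of cases sum to 200% while the percentages of responses sum to 100%.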
Repeated measures are data that are recorded on more than one occasion for each subject.
Some procedures, such as GLM, require that all measurements for a case be on the same data record.
Other procedures, such as the MIXED procedure, may expect one data record per occasion.
One data record per subject, one variable per occasion on which it is measured
One data record per occasion per subject
The good news is that SPSS allows you to easily restructure a data set:
Restructure selected variables into cases
Restructure selected cases into variables
Transpose all data
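To make the two layouts concrete, the following Python sketch converts a small made-up data set from one record per subject to one record per occasion and back. It only illustrates what the restructuring does; the function and variable names (`variables_to_cases`, `cases_to_variables`, `score1`, and so on) are hypothetical and do not correspond to SPSS commands.

```python
# Hypothetical wide-format data: one record per subject, one variable per occasion.
wide = [
    {"id": 1, "score1": 10, "score2": 12, "score3": 15},
    {"id": 2, "score1": 8, "score2": 9, "score3": 11},
]
occasion_vars = ["score1", "score2", "score3"]


def variables_to_cases(records, id_var, value_vars, index_name, value_name):
    """Wide to long: one output record per subject per occasion."""
    long_records = []
    for record in records:
        for occasion, var in enumerate(value_vars, start=1):
            long_records.append(
                {id_var: record[id_var], index_name: occasion, value_name: record[var]}
            )
    return long_records


def cases_to_variables(records, id_var, index_name, value_name, prefix):
    """Long to wide: one output record per subject, one variable per occasion."""
    by_subject = {}
    for record in records:
        row = by_subject.setdefault(record[id_var], {id_var: record[id_var]})
        row[f"{prefix}{record[index_name]}"] = record[value_name]
    return list(by_subject.values())


long_data = variables_to_cases(wide, "id", occasion_vars, "occasion", "score")
print(len(long_data))  # 6
print(long_data[0])    # {'id': 1, 'occasion': 1, 'score': 10}

# Converting back recovers the original wide layout.
print(cases_to_variables(long_data, "id", "occasion", "score", "score") == wide)  # True
```

The case number is carried onto every long-format record, which is what allows the occasions for a subject to be brought back together.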