Preparing Data for Analysis: SPSS Training
Thomas V. Joshua, MS
College of Nursing, July 2012
Lecture Overview
• Why we need data management and data preparation for analysis
• Data preparation and the general data format in SPSS
• Introduction to SPSS and an overview of SPSS for Windows
What Is Data Management (DM)?
• The collecting, entering, cleaning, and processing of information gathered during a research project.
• Work that involves planning, developing, implementing, and administering systems for acquiring, storing, and retrieving data, while protecting the data with a high level of security.
Why Do We Need DM?
• Decision-making, strategic planning, and program design should be data-driven.
  - Appropriate data are often available, but many people find the process of analysis daunting.
• It improves communication.
• It ensures that the data are transferred in the proper format for the proposed analyses.
• It helps you understand reality from the data.
Data Preparation
• We suggest a phased approach that produces analysis-ready data without destroying the original dataset.
• We also look at ways to document your dataset so that it still makes sense when you review it later or when other people use it.
• In general, SPSS, Microsoft Excel, and Microsoft Access are all acceptable as long as the data are appropriately formatted. We will use SPSS as the general example.
Main Steps for Data Preparation
• Create the data file:
  - Original data
  - Interim data
  - Documentation
• Clean the data.
• Process the data.
• Create an analysis-ready copy of the data.
• Document the data.
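The steps above can be sketched in a few lines of plain Python (not SPSS) as a language-neutral illustration. The records, variable names, and codebook below are hypothetical. The only point is that each phase returns a new copy, so the original data are never overwritten, and that the documentation travels with the data.

```python
# Hypothetical raw records, exactly as entered ("original data").
# A tuple is used so the original collection cannot be appended to by accident.
ORIGINAL = (
    {"id": 1, "age": "52", "sex": "M"},
    {"id": 2, "age": ".", "sex": "F"},
    {"id": 3, "age": "38", "sex": "f"},
)

# Minimal documentation: what each variable means and how it is coded.
CODEBOOK = {
    "id": "subject ID (unique per case)",
    "age": "age in years; '.' in the raw file means missing",
    "sex": "M = male, F = female",
    "age_over_45": "derived: 1 if age > 45, 0 otherwise, None if age missing",
}

def clean(records):
    """Return a cleaned copy ("interim data"); the input is left untouched."""
    cleaned = []
    for rec in records:
        rec = dict(rec)  # work on a copy of each record
        rec["age"] = None if rec["age"] in ("", ".") else int(rec["age"])
        rec["sex"] = rec["sex"].upper()  # standardise inconsistent codes
        cleaned.append(rec)
    return cleaned

def process(records):
    """Return an analysis-ready copy with a derived variable added."""
    processed = []
    for rec in records:
        rec = dict(rec)
        rec["age_over_45"] = None if rec["age"] is None else int(rec["age"] > 45)
        processed.append(rec)
    return processed

interim = clean(ORIGINAL)
analysis_ready = process(interim)
```

Because `clean` and `process` copy each record, `ORIGINAL` still holds the raw strings after both phases have run. Any step can be repeated from the original file if a cleaning rule turns out to be wrong.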
Some Critical Points for Formatting Your Dataset
• Do not include headers, trailer information, subtotals, or other extraneous information. For descriptive purposes, you may include one row giving the variable names.
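You can check this rule before importing a file by running a quick layout check. The sketch below is plain Python and rests on hypothetical assumptions: a comma-separated file, exactly one header row, and an integer subject ID in the first column. It flags rows that look like subtotals or trailer notes.

```python
import csv
import io

def check_layout(text):
    """List layout problems in a CSV export; an empty list means none found.

    Assumes (hypothetically) exactly one header row of variable names and an
    integer subject ID in the first column of every case row.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header, cases = rows[0], rows[1:]
    problems = []
    for line_no, row in enumerate(cases, start=2):
        if len(row) != len(header):
            problems.append(f"line {line_no}: has {len(row)} field(s), expected {len(header)}")
        elif not row[0].strip().isdigit():
            problems.append(f"line {line_no}: ID {row[0]!r} is not a subject ID")
    return problems

good = "id,age\n1,52\n2,.\n3,38\n"
bad = "id,age\n1,52\n2,38\nTotal,90\nSource: clinic survey\n"
```

Here `check_layout(good)` finds nothing. `check_layout(bad)` reports the subtotal row on line 4 and the one-field trailer note on line 5.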
Format the data so that each variable is in its own single column.
[Example tables not reproduced: a recommended layout, one marked “better not”, and one marked “absolutely not”.]
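The example tables are not available, so the sketch below assumes one common form of this problem: one column per treatment group. In that layout, treatment is a variable in its own right, but it lives only in the column headers. The plain-Python function `to_long` is a hypothetical helper that reshapes the data so that treatment and response each get a single column.

```python
# Hypothetical wide layout to avoid: one column per treatment group, so the
# variable "treatment" is hidden in the column headers and the columns have
# different lengths.
wide = {
    "control": [5.1, 4.8, 5.6],
    "drug_a": [6.2, 5.9],
}

def to_long(table):
    """Reshape {group: [values]} into one row per case, one column per variable."""
    rows = []
    for group, values in table.items():
        for value in values:
            rows.append({"treatment": group, "response": value})
    return rows

long_rows = to_long(wide)
```

After reshaping, every row has the same two columns. This also satisfies the next rule, because all columns now have the same number of rows.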
• All columns must have the same number of rows.
• Missing data (empty cells): use a blank or a period (.) to indicate missing data.
  - A missing value must not be treated as “0”.
• Keep an identifying field, such as the subject ID.
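A small numerical example shows the cost of coding missing values as 0. In this plain-Python sketch, the data are hypothetical and only a blank or "." counts as a missing code. The subject ID is kept as the key of each value, and the mean age is computed once correctly and once with the zero coding.

```python
from statistics import mean

MISSING_CODES = {"", "."}  # a blank or a period marks a missing value

def parse_value(raw):
    """Convert one cell to a float, or to None if it holds a missing-value code."""
    raw = raw.strip()
    return None if raw in MISSING_CODES else float(raw)

# Hypothetical (subject ID, age) cells as read from a file.
cells = [("101", "52"), ("102", "."), ("103", "38"), ("104", ""), ("105", "45")]
ages = {subject_id: parse_value(raw) for subject_id, raw in cells}

observed = [a for a in ages.values() if a is not None]
correct_mean = mean(observed)  # mean of the 3 observed ages
wrong_mean = mean(a if a is not None else 0.0 for a in ages.values())  # missing coded as 0
```

Coding the two missing ages as 0 pulls the mean from 45 down to 27, even though no subject is anywhere near that age.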
If there are multiple data files, do not rely on the file names to carry variable information. For example, if separate files are used for the results of two treatments, include a column in each file containing the name of the treatment.
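The following minimal sketch shows why the treatment column matters. Once the files are combined, their names are gone, but a treatment column travels with each case. The file contents are hypothetical, and the combining step is plain Python rather than an SPSS procedure.

```python
import csv
import io

# Hypothetical contents of two files, one per treatment. Each file carries
# its own "treatment" column instead of relying on the file name.
FILE_A = "id,treatment,response\n1,A,5.1\n2,A,4.8\n"
FILE_B = "id,treatment,response\n3,B,6.2\n4,B,5.9\n"

def stack(*texts):
    """Append the cases of several CSV texts that share the same header."""
    rows = []
    for text in texts:
        rows.extend(csv.DictReader(io.StringIO(text)))
    return rows

combined = stack(FILE_A, FILE_B)
```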
Counted proportion data. If the data consist of counted proportions (e.g., the number of individuals responding out of the total number of individuals), do not reduce the data to percentages or proportions beforehand. Instead, enter the numerator and the denominator of each proportion as separate columns.
[Example tables not reproduced: a recommended layout and one marked “Not”.]
It is easy to compute proportions during the analysis if they are needed, but alternative analyses such as logistic regression may be ruled out if the original counts are unavailable.
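A small numerical sketch (plain Python, hypothetical counts) shows what is lost when only percentages are kept. With the counts, the pooled proportion is weighted correctly; a simple average of the site percentages is not. A logistic regression on grouped data likewise needs both the number of events and the number of trials in each row.

```python
# Hypothetical counted proportions at two sites, entered as two columns.
responders = [1, 90]   # numerator: number of individuals responding
totals = [2, 100]      # denominator: total number of individuals

# Proportions are easy to compute later from the counts...
percentages = [100 * r / n for r, n in zip(responders, totals)]  # [50.0, 90.0]

# ...and the counts also give the correctly weighted pooled proportion.
pooled = sum(responders) / sum(totals)  # 91 / 102, about 0.89

# With only the percentages, the sample sizes are lost; a simple average
# gives both sites equal weight and understates the overall response.
naive = sum(percentages) / len(percentages) / 100  # 0.70
```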
Polytomous data. If the data consist of counts falling into several mutually exclusive classes, do not reduce them to proportions or percentages beforehand; enter the integer counts.
[Example tables not reproduced: a recommended layout with integer counts and one marked “not”.]
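Integer counts support tests such as Pearson's chi-square test of independence. Percentages alone do not, because the statistic depends on the actual totals. The sketch below is plain Python with hypothetical counts. It computes the statistic and its degrees of freedom but not a p-value.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic for a two-way table of integer counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / N.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: 2 groups (rows) x 3 mutually exclusive classes (columns).
counts = [[10, 20, 30],
          [20, 20, 20]]

chi2 = pearson_chi_square(counts)                 # 16/3, about 5.33
df = (len(counts) - 1) * (len(counts[0]) - 1)     # 2

# Percentages can always be derived from the counts later on.
row_percent = [[100 * c / sum(row) for c in row] for row in counts]
```

Suppose only `row_percent` had been stored. The group sizes (60 each here) could not be recovered, so the statistic could not be computed.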
Each column has its own criterion or “meaning”. Narrowing the definition of a variable means creating a new variable. For example, gender and age → “male older than 45 years”.
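Below is a minimal sketch of such a narrowed variable, in plain Python with hypothetical cases. Sex and age stay in their own columns, and the new variable `male_over_45` (a hypothetical name) is computed from them. In SPSS you would normally use its tools for computing new variables; the code only illustrates the logic. A missing age gives a missing result rather than a guess.

```python
# Hypothetical cases; "sex" and "age" stay as separate columns.
cases = [
    {"id": 1, "sex": "M", "age": 52},
    {"id": 2, "sex": "F", "age": 60},
    {"id": 3, "sex": "M", "age": 45},
    {"id": 4, "sex": "M", "age": None},   # age missing
]

for case in cases:
    if case["age"] is None:
        case["male_over_45"] = None       # propagate missing, don't guess
    else:
        # 1 for a male strictly older than 45, otherwise 0.
        case["male_over_45"] = int(case["sex"] == "M" and case["age"] > 45)
```

Subject 3, exactly 45, is coded 0 because the condition is “older than 45”. Defining the boundary explicitly is part of documenting the new variable.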
Introduction to SPSS Software
• SPSS is a software package used for conducting statistical analyses, manipulating data, and generating tables and graphs that summarize data.
• Statistical analyses range from basic descriptive statistics, such as averages and frequencies, to advanced inferential statistics, such as regression models, analysis of variance, and factor analysis.
• SPSS also contains several tools for manipulating data, including functions for recoding data and computing new variables, as well as for merging and aggregating datasets.
Overview of SPSS for Windows
• SPSS for Windows consists of five different windows, each of which is associated with a particular SPSS file type.
• Data Editor: the window that is open at start-up. It is used to enter and store data in a spreadsheet format and includes the Data View and the Variable View.
• Output Viewer: opens automatically when you execute an analysis or create a graph, whether through a dialog box or through command syntax. It contains the results of all statistical analyses and graphical displays of data.
• Syntax Editor: a text editor where you compose SPSS commands and submit them to the SPSS processor.
• This document focuses on the methods needed for inputting, defining, and organizing data in SPSS.
The Data Editor (.sav)
• The Data Editor window displays the contents of the working dataset. It is arranged in a spreadsheet format, with variables in columns and cases in rows.
• The Data View is the sheet that is visible when you first open the Data Editor; this sheet contains the data.
• The Variable View is the sheet that contains information about the variables in the dataset.
• Datasets that are currently open are called working datasets; all data manipulations, statistical functions, and other SPSS procedures operate on these datasets.
From the menu in the Data Editor window, choose File → Open → Data...
• The Open File dialog box should open automatically to the SPSS directory of example files. Choose Employee data.sav from the list and click Open. The Data Editor should now display the employee dataset.
[Screenshot not reproduced.]
The Syntax Editor (.sps)
• This SPSS training focuses on using dialog boxes to execute procedures. However, there are at least two reasons to be aware of SPSS syntax, even if you plan to use the dialog boxes primarily.
• First, not all procedures are available through the dialog boxes, so you may occasionally have to submit commands from the Syntax Editor.
• Second, the Syntax Editor is a useful way to keep a log of what you have done and to re-run it at a later date.
The Dialog Boxes
• The dialog boxes available through the pull-down menus have a button labeled Paste. It writes the syntax for the procedure you have set up in the dialog box to the Syntax Editor, so you can generate SPSS syntax without typing it yourself.
• The Descriptives dialog box is used to generate descriptive statistics. Open it by choosing Analyze → Descriptive Statistics → Descriptives, then move the two variables over using the arrow button.
[Screenshot not reproduced.]
Clicking the Paste button writes the procedure that the dialog box is set up to run to the Syntax Editor as SPSS syntax. In this example, clicking Paste produces the following syntax:

    DESCRIPTIVES VARIABLES=salbegin salary
      /STATISTICS=MEAN STDDEV MIN MAX.
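To make the pasted command concrete, the sketch below computes the same four statistics in plain Python. It also returns the number of valid cases, which the Descriptives output reports as well. The salaries are hypothetical and are not taken from Employee data.sav. Missing values are dropped for each variable separately. The standard deviation uses the n − 1 (sample) denominator, which we assume matches the STDDEV keyword.

```python
from statistics import mean, stdev

def descriptives(values):
    """N, mean, sample standard deviation, minimum and maximum, ignoring None."""
    valid = [v for v in values if v is not None]
    return {
        "N": len(valid),
        "MEAN": mean(valid),
        "STDDEV": stdev(valid),   # sample SD: n - 1 denominator
        "MIN": min(valid),
        "MAX": max(valid),
    }

# Hypothetical salaries, not taken from Employee data.sav.
salbegin = [12000, 15000, 18000, None]
salary = [20000, 26000, 32000, 41000]

table = {name: descriptives(vals)
         for name, vals in [("salbegin", salbegin), ("salary", salary)]}
```

Writing the statistics out like this makes it clear that the pasted syntax is just a compact, repeatable description of the same calculation. That repeatability is the reason to save it in the Syntax Editor.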
The left frame of the Output Viewer is an outline of the objects contained in the window:
• Descriptives (in the outline): refers to all objects associated with the descriptive statistics.
• Title: refers to the bold title Descriptives in the output.
• Active Dataset: refers to the line in the output that identifies which dataset was used to run the analysis.
• Descriptive Statistics: refers to the table containing the descriptive statistics.
• Notes: has no visible referent in this example, but it would refer to any notes that appeared between the title and the table.