Statistical Methods and SPSS • Physical Therapy 34.616 Research Methods • Robert Karasek and Sean Collins • Robert Karasek, PhD, Department of Work Environment, University of Massachusetts Lowell
Statistical Methods and SPSS 34.616 Course Module Goal: • To develop hands-on proficiency and scientific understanding of basic statistical data analysis, using the SPSS statistical analysis program available in the UML Computer Laboratory. • SPSS Basic Text: J. Pallant, SPSS Survival Manual, 3rd or 4th Ed, McGraw Hill • 34.616 Main Text: Portnoy LG and Watkins MP, Foundations.. (Part IV Data Analysis - selected sections) • http://www.umass.edu/statdata/software/handouts/SPSS%20Syntax.pdf • http://www.vassarstats.net/textbook/toc.html
Lecture Topics / order - dates • Lecture 1: Running SPSS and Descriptive Statistics • April 5, 2012 • Lecture 2: Bivariate Relations and Correlation • April 10, 2012 • Lecture 3: Multiple Regression and Factor Analysis • April 12, 2012 • Lecture 4: Categorical Variable Statistics and T-tests • April 17, 2012 • Lecture 5: Analysis of Variance (ANOVA) • April 19, 2012
Research Methods 34.616: Statistical Methods/SPSS. Lecture 1: Using the SPSS Program and Descriptive Statistics
Research Methods 34.616: Statistical Methods/SPSS. Lecture 1A: Preliminary Task: Getting the SPSS data analysis program / dataset “up and running” in the UML Computer Lab
Using SPSS for Data Analysis, with Pallant, SPSS Survival Manual (SSM). Using a typical SPSS datafile (3ED.sav / 4ED.sav): • Download the dataset from the web: http://www.allenandunwin.com/spss/data_files.html (datafile Survey3ED.sav / Survey4ED.sav). • Open the SPSS statistical program in the UML Computer Lab (it may be called “PASW Stat. 18”). • Double-click the dataset icon (above). (SPSS should open - or... ) • SPSS is both menu-driven and syntax-driven (SSM, Chap 2). So: • Go to the SPSS “File” menu on the far left and find the dataset.
SPSS Syntax • Syntax Editor Window: SPSS statistical commands can be written in two ways: 1. Personal input (using book examples, SPSS manuals). 2. Auto-written from menu click processes (and saved in the Syntax window). • THE SYNTAX NEEDS TO BE SAVED so it can be modified as you adjust your scientific questions for analysis (there is no other record of how you generated your output from the menu analysis). • Syntax in SPSS is further discussed in the UMass Amherst memo (the memo also gives you access to other interesting downloadable datasets).
Research Methods 34.616: Statistical Methods/SPSS. Lecture 1B: Descriptive Statistics
Lecture 1B: Descriptive Statistics I. Goals • 1. Describe the sample • 2. Check variables • 3. Research tests II. SPSS Procedures 1. FREQUENCIES 2. DESCRIPTIVES 3. EXPLORE
A Typical Database (SPSS Survival Manual: 3ED.sav / 4ED.sav): Cases = 429, Variables = 139
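DESCRIPTIVES (continuous variables): Output/Statistics for “Age”. SPSS SYNTAX (the /NTILES and /ORDER subcommands and the MEDIAN keyword belong to the FREQUENCIES command, so the syntax is written with FREQUENCIES):
FREQUENCIES VARIABLES=age
  /NTILES=4
  /STATISTICS=STDDEV VARIANCE MINIMUM MAXIMUM MEAN MEDIAN SKEWNESS KURTOSIS
  /ORDER=ANALYSIS.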
Sample Population Descriptive Statistics • STDDEV, VARIANCE: Distribution Parameters (Normal) • MINIMUM, MAXIMUM: Range • MEAN, MEDIAN, (MODE): Central Tendency • SKEWNESS, KURTOSIS: Distribution Shape
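The groupings above can be illustrated with a short Python sketch (Python is used here only for illustration; it is not SPSS syntax). The `describe` helper and the `ages` values are hypothetical and are not taken from the Survey datafile. Skewness and kurtosis are computed as simple standardized moments; SPSS reports sample-adjusted versions, so its output may differ slightly.

```python
import statistics

def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    # Divide-by-N standard deviation, used here to standardize the moments.
    sd = statistics.pstdev(values)
    # Skewness: third standardized moment (positive = long right tail).
    skewness = sum(((x - mean) / sd) ** 3 for x in values) / n
    # Excess kurtosis: fourth standardized moment minus 3 (0 for a normal curve).
    kurtosis = sum(((x - mean) / sd) ** 4 for x in values) / n - 3
    return {
        "minimum": min(values),                  # range
        "maximum": max(values),
        "mean": mean,                            # central tendency
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "skewness": skewness,                    # distribution shape
        "kurtosis": kurtosis,
    }

# Hypothetical ages, invented for illustration only.
ages = [18, 21, 21, 25, 30, 34, 41, 58]
summary = describe(ages)
for name, value in summary.items():
    print(f"{name:>9}: {value}")
```

With these invented ages the mean exceeds the median and the skewness is positive, which is the usual sign of a distribution with a long right tail.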
Formula for variance, standard deviation • Variance: $s^2 = \sum (X_i - M_X)^2 / N$ • Standard deviation: $s = \sqrt{\sum (X_i - M_X)^2 / N}$
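A minimal Python sketch of the divide-by-N variance and its square root, the standard deviation. The `scores` list and the function name `variance_n` are assumptions made for illustration. The sketch also prints the divide-by-(N-1) sample variance from Python's `statistics` module for comparison, since statistical packages commonly report that version.

```python
import math
import statistics

def variance_n(values):
    """Variance as the mean squared deviation from the mean (divide by N)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

# Hypothetical scores, chosen only to keep the arithmetic easy to follow.
scores = [1, 2, 3, 4, 5, 6]

var_n = variance_n(scores)                  # sum of squared deviates / N
sd_n = math.sqrt(var_n)                     # standard deviation = sqrt(variance)
var_n_minus_1 = statistics.variance(scores) # divide by N - 1 instead

print(f"variance (N)     = {var_n:.4f}")
print(f"std. dev. (N)    = {sd_n:.4f}")
print(f"variance (N - 1) = {var_n_minus_1:.4f}")
```

The two variances differ only in the divisor, so the gap between them shrinks as the sample size grows.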
Modifying Variables and Creating Scales - Recode / Compute - (Labels, Missing Values)
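The logic behind Recode and Compute can be sketched in Python (illustrative only, not SPSS syntax). The item names, the 1-to-5 response range, and the rule that a scale total is missing whenever any item is missing are all assumptions chosen for this example.

```python
MISSING = None  # stand-in for a missing value

def reverse_score(value, low=1, high=5):
    """Recode a negatively worded item so that 1<->5, 2<->4, and 3 stays 3."""
    if value is MISSING:
        return MISSING
    return low + high - value

def scale_total(items):
    """Compute a scale score as the sum of its items.

    Returns MISSING if any item is missing (an assumed rule for this sketch).
    """
    if any(v is MISSING for v in items):
        return MISSING
    return sum(items)

# One hypothetical respondent; item2 is assumed to be negatively worded.
respondent = {"item1": 4, "item2": 2, "item3": 5}

recoded = dict(respondent)
recoded["item2"] = reverse_score(respondent["item2"])  # 2 becomes 4
total = scale_total(list(recoded.values()))            # 4 + 4 + 5

print(recoded, total)
```

Keeping the recoded values in a new dictionary mirrors the usual advice to recode into a new variable rather than overwrite the original data.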
Research Methods 34.616: Statistical Methods/SPSS. Lecture 2: Bivariate Relationship Analysis and Correlation
Choosing the Right Statistics/Methods I. Exploring Relationships Between Multiple Variables • 1. Correlation • a. Bivariate relationships (ordinal data): assess the strength of association. • b. Statistics for a continuous variable and an ordinal categorical variable. • 2. Partial Correlation • a. Use an additional continuous variable as a control, removing its effect from a bivariate relationship (ordinal data). • 3. Multiple Regression • a. Explore the association of one or more continuous (ordinal) independent variables with another variable, the dependent variable (also a continuous variable). The technique apportions the relative strength of association among the variables. • b. A variant is multiple logistic regression, where the dependent variable may be dichotomous (case / non-case). • 4. Factor Analysis • A data reduction technique: find a small set of primary directions of variability among a large set of interrelated variables. Used for creating scales based on multiple variables. • 5. Categorical variables (non-ordinal) • For categorical variables (no clear ordering or interval relationship between categories), use the Chi-square or Kappa statistic.
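As an illustration of partial correlation (item 2 above), the first-order partial correlation between X and Y controlling for Z can be computed from the three pairwise correlations. This is a minimal Python sketch; the function name and the three input correlations are hypothetical values chosen for illustration.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between X and Y after removing the linear effect of Z.

    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator

# Hypothetical pairwise correlations, invented for illustration.
r_xy, r_xz, r_yz = 0.50, 0.40, 0.30

r_partial = partial_correlation(r_xy, r_xz, r_yz)
print(f"zero-order r = {r_xy:.2f}, partial r = {r_partial:.2f}")
```

In this example the association drops from 0.50 to about 0.43 once Z is controlled, which suggests that Z accounts for only a small part of the X-Y relationship.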
Choosing the Right Statistics/Methods II. Exploring Differences Between Groups • 6. T-tests: Used to determine whether the mean values of a continuous dependent variable measured in two samples are statistically different from each other, based on parameters of each sample distribution. • 7. One-way Analysis of Variance: Used to determine whether the mean values of a continuous dependent variable measured in many samples are statistically different from each other. Determines how much of the variation in the samples is within the compared groups and how much is between the groups. • 8. Two-way Analysis of Variance: Used to determine whether the mean values of a continuous dependent variable measured in multiple samples, which differ on two dimensions, are statistically different from each other. Determines how much of the variation in the samples is within the compared groups and how much is between the groups. It can be used to determine whether the association between the dependent variable and the first grouping variable depends on the level of the second grouping variable: an Interaction Effect.
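The within-group versus between-group split that one-way ANOVA relies on can be sketched in Python. The three groups of scores below are hypothetical, and the function name is an assumption; the sketch stops at the F ratio and does not compute a p-value.

```python
def one_way_anova(groups):
    """Return (SS between, SS within, F ratio) for a list of groups of scores."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Between-groups: how far each group mean lies from the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Within-groups: how far each score lies from its own group mean.
    ss_within = sum(
        (x - mean) ** 2
        for group, mean in zip(groups, group_means)
        for x in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_ratio

# Three hypothetical groups, invented for illustration.
groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]
ss_between, ss_within, f_ratio = one_way_anova(groups)
print(f"SS between = {ss_between:.1f}, SS within = {ss_within:.1f}, F = {f_ratio:.1f}")
```

A large F ratio means the group means differ by much more than the scatter inside the groups would lead one to expect.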
Examples of correlations: 3.2c and 3.2e represent perfect correlation, the maximum degree of linear correlation, positive or negative, that could possibly exist between two variables.
Calculating the correlation coefficient
Pair    a    b    c    d    e    f
Xi      1    2    3    4    5    6
Yi      6    2    4   10   12    8
Xi²     1    4    9   16   25   36
Yi²    36    4   16  100  144   64
XiYi    6    4   12   40   60   48
• For any particular item in a set of measures of the variable Y, $\text{deviate}_Y = Y_i - \text{Mean}_Y$ • Similarly, calculate the deviate for X
Calculating the correlation coefficient • With $SS_X = 17.5$, $SS_Y = 70.0$, and $SC_{XY} = 23.0$, you can then easily calculate the correlation coefficient as • $r = SC_{XY} / \sqrt{SS_X \times SS_Y} = 23.0 / \sqrt{17.5 \times 70.0} = +0.66$ • $r^2 = (+0.66)^2 = 0.44$
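The calculation can be reproduced from the six X, Y pairs tabulated on the preceding slide. This is a minimal Python sketch using the computational formulas for the sums of squares and the sum of co-deviates; the variable names are choices made for this example.

```python
import math

# The six pairs (a to f) from the slide.
x = [1, 2, 3, 4, 5, 6]
y = [6, 2, 4, 10, 12, 8]
n = len(x)

# Sum of squared deviates: sum(X^2) - (sum X)^2 / N
ss_x = sum(v * v for v in x) - sum(x) ** 2 / n
ss_y = sum(v * v for v in y) - sum(y) ** 2 / n
# Sum of co-deviates: sum(XY) - (sum X)(sum Y) / N
sc_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

r = sc_xy / math.sqrt(ss_x * ss_y)

print(f"SS_X = {ss_x}, SS_Y = {ss_y}, SC_XY = {sc_xy}")
print(f"r = {r:+.2f}, r squared = {r * r:.2f}")
```

Squaring the unrounded r gives about 0.43, while squaring the rounded value 0.66 gives the 0.44 shown on the slide; the difference is rounding only.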
Research Methods 34.616: Statistical Methods/SPSS. Lecture 3: Partial Correlation, Multiple Regression and Factor Analysis
Regression Coefficients • Slope: $b = SC_{XY} / SS_X = 23.0 / 17.5 = +1.31$ • Intercept: $a = \text{Mean}_Y - b\,\text{Mean}_X = 7.0 - [1.31(3.5)] = 2.4$, the point at which the line crosses the Y axis (the 'intercept')
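The slope and intercept can be checked with a short Python sketch that reuses the six X, Y pairs from the correlation example (X = 1 to 6; Y = 6, 2, 4, 10, 12, 8). The `predict` helper is a hypothetical addition for illustration.

```python
# The six pairs from the correlation example.
x = [1, 2, 3, 4, 5, 6]
y = [6, 2, 4, 10, 12, 8]
n = len(x)

mean_x = sum(x) / n   # 3.5
mean_y = sum(y) / n   # 7.0

# Sum of squared deviates of X and sum of co-deviates of X and Y.
ss_x = sum((v - mean_x) ** 2 for v in x)
sc_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

b = sc_xy / ss_x            # slope
a = mean_y - b * mean_x     # intercept: where the line crosses the Y axis

def predict(x_value):
    """Predicted Y on the regression line for a given X."""
    return a + b * x_value

print(f"slope b = {b:+.2f}, intercept a = {a:.1f}")
print(f"predicted Y at X = {mean_x}: {predict(mean_x):.1f}")
```

The last line illustrates a general property of least-squares lines: the line always passes through the point (Mean X, Mean Y).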
A Two-Factor Solution (vs. 4 Factors): More factors explain more variance, but are more complex to interpret theoretically. • 4 Factors: 68% of variance • 2 Factors: 40% of variance