Relationships Among Variables: Correlation and Regression • KNES 510 Research Methods in Kinesiology
Correlation • Correlation is “a statistical technique used to determine the relationship between two or more variables” • We use two different techniques to determine score relationships: • graphing technique • mathematical technique called correlation
Types of Relationships • The scattergram can indicate a positive relationship, a negative relationship, or a zero relationship • What are the characteristics of positive, negative, and zero relationships?
Mathematical Technique: The Correlation Coefficient (r) • The correlation coefficient, r,* represents the relationship between the z-scores of the subjects on two different variables (usually designated X and Y) • This can be stated mathematically as the mean of the z-score products for all subjects *A more complete name for this statistic is Pearson’s product-moment correlation coefficient
Formula for the Correlation Coefficient • The correlation coefficient can be calculated as follows:
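As a minimal sketch of the z-score definition above (not necessarily the computational formula shown on the original slide), r can be computed in Python as the mean of the paired z-score products. The function names and sample scores are hypothetical. The z-scores here use the population standard deviation (dividing by N), which is the assumption that makes the mean of the products equal Pearson's r.

```python
from statistics import fmean, pstdev

def z_scores(values):
    """Convert raw scores to z-scores using the population SD (divide by N)."""
    mean = fmean(values)
    sd = pstdev(values)
    return [(v - mean) / sd for v in values]

def pearson_r(x, y):
    """Pearson's r as the mean of the paired z-score products."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of scores")
    products = [zx * zy for zx, zy in zip(z_scores(x), z_scores(y))]
    return fmean(products)

# Hypothetical scores for five subjects on two variables (not course data)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))
```

Using sample standard deviations (dividing by N - 1) would require dividing the sum of products by N - 1 as well to get the same value.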
The values of the coefficient will always range from -1.00 to +1.00 • A correlation coefficient near 0.00 indicates no linear relationship
Interpreting the Correlation Coefficient • Because the relationship between two sets of data is seldom perfect, the majority of correlation coefficients are fractions (0.92, -0.80, and the like) • When interpreting correlation coefficients it is sometimes difficult to determine what is high, low, and average
The Correlation Coefficient and Cause-and-Effect • There is a high correlation between a person's shoe size and their math skills in grades K through 6 • Is this an example of cause-and-effect? • Can we predict math skill based on shoe size in grade K through 6 students?
Coefficient of Determination • The coefficient of determination is the proportion of variability in one measure that is explained by the other measure • The coefficient of determination is the square of the correlation coefficient (r²) • For example, if the correlation coefficient between two variables is r = 0.90, the coefficient of determination is (0.90)² = 0.81, so 81% of the variability is explained
Regression • When two variables are related (correlated), it is possible to predict a person’s score on one variable (Y) by knowing their score on the second variable (X)
This scatterplot illustrates that there is a strong, positive relationship between fat-free body mass and daily energy expenditure
Regression Line (Line of Best Fit) • The regression line is a line that best describes the trend in the data • This line is as close as possible to the data points • The equation for this line is: Y' = bX + C, where b is the slope and C is the Y intercept
Simple Prediction • Tests have been developed to predict VO₂ max from the time it takes a person to run 1.5 miles • A person's VO₂ max can thus be predicted from their 1.5-mile run time because a prediction or regression equation has been developed
The simple linear prediction or regression equation takes the following form: Y' = a + bX • Y' = predicted value • a = intercept of the regression line (Y intercept) • b = slope of the regression line (change in Y for each one-unit change in X) • X = score on the predictor variable
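A minimal Python sketch of how a and b might be obtained and then used for prediction, assuming an ordinary least-squares fit (the usual choice for a line of best fit). The function names and data are hypothetical and are not taken from the course.

```python
from statistics import fmean

def fit_line(x, y):
    """Return (a, b) for Y' = a + bX using ordinary least squares."""
    mean_x, mean_y = fmean(x), fmean(y)
    # Sum of cross-products and sum of squares about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # Y intercept
    return a, b

def predict(a, b, x_score):
    """Predicted score Y' for a given score on the predictor X."""
    return a + b * x_score

# Hypothetical scores for five subjects
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)
print(f"Y' = {a:.2f} + {b:.2f}X")
print(predict(a, b, 6))
```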
Determining Error in Prediction • Unless two variables are perfectly related (-1.00 or +1.00) there will always be error associated with a prediction equation • We find the standard deviation of this error, the standard error of prediction (syx), using the following formula:
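As a sketch only, the standard error of prediction can be computed as the standard deviation of the residuals (Y - Y'). The version below assumes the N - 2 divisor that SPSS uses for its "Std. Error of the Estimate" in simple regression; some textbooks use a different divisor or the equivalent form based on the standard deviation of Y and r. The data and the values of a and b are hypothetical.

```python
from math import sqrt

def standard_error_of_prediction(x, y, a, b):
    """SD of the residuals around Y' = a + bX, using an N - 2 divisor (assumption)."""
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    ss_residual = sum(e ** 2 for e in residuals)
    return sqrt(ss_residual / (len(x) - 2))

# Hypothetical scores and the least-squares line fitted to them
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6
s_yx = standard_error_of_prediction(x, y, a, b)
print(round(s_yx, 3))
```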
A predicted score (Y’) ± syx yields a range of scores within which a person’s true score on the predicted variable is likely to lie • If the standard error of prediction is interpreted as the standard deviation of the residuals, what is the probability that a person’s true score lies between Y’ - syx and Y’ + syx?
The standard error of prediction for percent body fat estimated using the skinfold method is ±3.5% • If a person has their percent body fat estimated at 12%, between what two values does their true body fat lie (95% probability)?
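One way to work through the two questions above, sketched in Python under the assumption that prediction errors are normally distributed. The function name is hypothetical. The exact multiplier for 95% is about 1.96; rounding it to 2 gives the simpler range of 5% to 19%.

```python
from statistics import NormalDist

def prediction_interval(predicted, see, confidence=0.95):
    """Range predicted ± z * SEE, assuming normally distributed errors."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return predicted - z * see, predicted + z * see

# Probability that the true score lies within ±1 standard error of Y'
within_one_see = NormalDist().cdf(1) - NormalDist().cdf(-1)
print(f"Within ±1 SEE: {within_one_see:.1%}")

# Percent body fat estimated at 12% with a standard error of 3.5%
low, high = prediction_interval(12.0, 3.5)
print(f"95% range: {low:.2f}% to {high:.2f}%")
```

Under these assumptions the ±1 standard error band covers about 68% of cases, and the 95% range for this person is roughly 5.1% to 18.9% body fat.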
Which of the following will more precisely predict job performance? A: r = 0.168 B: r = 0.686
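A quick way to compare the two options is to square each coefficient, since the coefficient of determination gives the proportion of variability explained. The sketch below is illustrative only; the function name is hypothetical.

```python
def coefficient_of_determination(r):
    """Proportion of variability in Y explained by X (r squared)."""
    return r ** 2

# The two correlations from the question above
for label, r in (("A", 0.168), ("B", 0.686)):
    r_squared = coefficient_of_determination(r)
    print(f"{label}: r = {r:.3f}, r^2 = {r_squared:.3f}")
```

Option A explains about 3% of the variability in job performance and option B about 47%, so B should give the more precise prediction.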
Sample SPSS Output • Here is the SPSS output for regressing Work Simulation Job Performance (Dependent Variable) against Supervisor Ratings (Independent Variable)
This information can be used to create a prediction (regression) equation for predicting work performance of future applicants from supervisor ratings Y’ = – 1.156 + 0.033 X
Work Simulation Job Performance may also be predicted from Arm Strength • Here is the SPSS output:
This information can be used to create a prediction (regression) equation for predicting work performance of future applicants from arm strength Y’ = – 4.095 + 0.055 X
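The two equations can be applied to a new applicant as in the Python sketch below. The coefficients come from the equations above; the function names and the input scores are hypothetical, since the slides do not give the measurement units or the range of either predictor.

```python
def predict_from_supervisor_rating(rating):
    """Y' = -1.156 + 0.033X, with X = supervisor rating."""
    return -1.156 + 0.033 * rating

def predict_from_arm_strength(strength):
    """Y' = -4.095 + 0.055X, with X = arm strength."""
    return -4.095 + 0.055 * strength

# Hypothetical applicant scores, chosen only to show the arithmetic
rating, strength = 100, 100
print(round(predict_from_supervisor_rating(rating), 3))
print(round(predict_from_arm_strength(strength), 3))
```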
We now have two regression equations for predicting Work Simulation Job Performance • Which is the better equation for accurate prediction? • To determine this, we must examine the standard error of prediction for each equation
Standard error of prediction using Supervisor Ratings: • Standard error of prediction using Arm Strength: • Which is the better equation?
Multiple Prediction • A prediction formula using a single measure X is usually not very accurate for predicting a person's score on measure Y • Multiple correlation-regression techniques allow us to predict score Y using several X scores
The general form of a two predictor multiple regression equation is: Y' = a + b₁X₁ + b₂X₂
An example of multiple correlation-regression is the prediction of percent body fat from multiple skinfold measurements, by way of an equation for body density (DB): DB (g/cc) = 1.0994921 - 0.0009929 (3SKF) + 0.0000023 (3SKF)² - 0.0001392 (age)
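The equation above can be sketched in Python as follows. Here 3SKF is assumed to be the sum of three skinfold measurements in millimetres and age is assumed to be in years; the slide does not define either. The conversion from body density to percent fat uses the Siri equation, which is a further assumption about the next step and is not stated on the slide. The function names and input values are hypothetical.

```python
def body_density(sum_3_skinfolds_mm, age_years):
    """Body density (g/cc) from the equation above; units are assumed."""
    s = sum_3_skinfolds_mm
    return (1.0994921
            - 0.0009929 * s
            + 0.0000023 * s ** 2
            - 0.0001392 * age_years)

def percent_fat_siri(density):
    """Percent body fat from body density using the Siri equation (assumption)."""
    return 495.0 / density - 450.0

# Hypothetical person: skinfold sum of 60 mm, age 25
db = body_density(60, 25)
print(round(db, 4))
print(round(percent_fat_siri(db), 1))
```

The squared skinfold term and the age term each act as an additional predictor, so this is a multiple regression with more than two X variables.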
Next Class • Chapters 9 & 11 • Mock Proposals in class!