Research Methods of Applied Linguistics and Statistics (11)
Correlation and multiple regression
By Qin Xiaoqing
Pearson Correlation
The Pearson correlation allows us to establish the strength of relationships between continuous variables.
Are the data points spread all over the place? This suggests a very low correlation.
Are all the points neatly arranged in a narrow cigar shape? This suggests quite a strong correlation.
Could you draw a straight line through the main cluster of points, or would a curved line better represent the points? If a curved line is evident (suggesting a curvilinear relationship), then Pearson correlation should not be used.
What is the shape of the cluster? Is it even from one end to the other, or does it start off narrow and then get fatter? If it fans out in this way, the data may be violating the assumption of homogeneity of variance (homoscedasticity).
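The warning about curved patterns can be illustrated with a small sketch. The data below are invented purely for illustration, and `pearson_r` is a hypothetical helper written out by hand rather than anything taken from the slides. It suggests why the scatterplot should be inspected first: a perfectly U-shaped relationship can yield a Pearson r of zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, invented for illustration only
xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # points fall exactly on a straight line
curved = [x ** 2 for x in xs]      # points fall exactly on a U-shaped curve

r_linear = pearson_r(xs, linear)
r_curved = pearson_r(xs, curved)
print(round(r_linear, 3), round(r_curved, 3))
```

In the second case Y is completely determined by X, yet r gives no hint of it, which is why a curvilinear pattern rules out Pearson correlation.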
When r = .60, the variance overlap between the two measures is r² = .36.
The overlap shows the extent to which the two measures provide similar information. In other words, the magnitude of r² indicates the proportion of variance in X that is accounted for by Y, or vice versa.
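As a quick check on the arithmetic, the overlap is simply the square of r. The sketch below uses a hypothetical helper name, `variance_overlap`, and a few arbitrary r values alongside the .60 from the slide.

```python
def variance_overlap(r):
    """Proportion of shared variance (r squared) for a correlation r."""
    return r ** 2

# .60 comes from the slide; the other values are arbitrary illustrations
for r in (0.30, 0.60, 0.90):
    print(f"r = {r:.2f} -> overlap = {variance_overlap(r):.2f}")

overlap_60 = variance_overlap(0.60)
```

Note how quickly the overlap shrinks: a correlation of .30 corresponds to less than a tenth of the variance being shared.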
Stevens (1996) recommends that ‘for social science research, about 15 subjects per predictor are needed for a reliable equation’.
Tabachnick and Fidell (1996, p. 132) give a formula for calculating sample size requirements, taking into account the number of independent variables that you wish to use: N > 50 + 8m (where m = number of independent variables). If you have five independent variables, you would need more than 90 cases (50 + 8 × 5 = 90).
More cases are needed if the dependent variable is skewed.
For stepwise regression there should be a ratio of forty cases for every independent variable.
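The three rules of thumb above can be compared side by side. This is only a sketch: the function name `sample_size_guidelines` and the dictionary keys are made up for illustration, and the skewness caveat is not modelled because the slides give no figure for it.

```python
def sample_size_guidelines(m):
    """Rule-of-thumb case counts for m independent variables (predictors)."""
    return {
        # Stevens (1996): about 15 subjects per predictor
        "stevens": 15 * m,
        # Tabachnick and Fidell (1996): N should exceed 50 + 8m
        "tabachnick_fidell_threshold": 50 + 8 * m,
        # Stepwise regression: 40 cases per independent variable
        "stepwise": 40 * m,
    }

guidelines = sample_size_guidelines(5)
for rule, n in guidelines.items():
    print(f"{rule}: {n}")
```

With five predictors the rules give quite different answers, so the choice of regression method matters when planning how many cases to collect.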
The closer r is to ±1, the smaller the error will be in predicting performance on one variable from performance on the other. The closer r is to 0, the greater the error.
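One common way to quantify this error is the standard error of estimate, SD_Y × sqrt(1 − r²). The slides do not state this formula, so treat the sketch below as an assumption about what is meant by "error"; the SD of 10 is a hypothetical value chosen only to keep the numbers easy to read.

```python
import math

def standard_error_of_estimate(sd_y, r):
    """Typical size of prediction errors for Y, given SD of Y and correlation r."""
    return sd_y * math.sqrt(1 - r ** 2)

sd_y = 10.0  # hypothetical standard deviation of Y
errors = [standard_error_of_estimate(sd_y, r) for r in (0.0, 0.3, 0.6, 0.9, 1.0)]
for r, see in zip((0.0, 0.3, 0.6, 0.9, 1.0), errors):
    print(f"r = {r:.1f} -> standard error of estimate = {see:.2f}")
```

At r = 0 the error is as large as the SD of Y itself, and it falls to zero only when the correlation is perfect.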
4 pieces of information are needed: They are
With this information, we can predict a subject’s (S’s) score on Y from his or her score on X on a mathematical basis. ‘Regressing’ Y on X is what makes it possible to predict Y from X.
Mean for X=8, SD=4.47; mean for Y=10.8, SD=2.96; r=.89
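Using these figures, a prediction can be sketched with the standard simple-regression relations: slope b = r × SD_Y / SD_X and intercept a = mean_Y − b × mean_X. The variable names and the X score of 10 are illustrative assumptions; only the means, SDs, and r come from the slide.

```python
# Summary statistics from the slide
mean_x, sd_x = 8.0, 4.47
mean_y, sd_y = 10.8, 2.96
r = 0.89

# Slope and intercept of the line for regressing Y on X
slope = r * sd_y / sd_x
intercept = mean_y - slope * mean_x

def predict_y(x):
    """Predicted Y score for a subject whose X score is x."""
    return intercept + slope * x

x_score = 10  # hypothetical subject's score on X
predicted = predict_y(x_score)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"predicted Y for X = {x_score}: {predicted:.2f}")
```

A subject scoring exactly at the mean of X is predicted to score at the mean of Y, which is a quick sanity check on any slope and intercept computed this way.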