Statistics Correlation and regression
Introduction • Some methods involve one variable • is Treatment A as effective in relieving arthritic pain as Treatment B? • Correlation and regression used to investigate relationships between variables • most commonly linear relationships • between two variables • is BMD related to dietary calcium level?
Contents • Coefficients of correlation • meaning • values • role • significance • Regression • line of best fit • prediction • significance
Introduction • Correlation • the strength of the linear relationship between two variables • Regression analysis • determines the nature of the relationship • Is there a relationship between the number of units of alcohol consumed and the likelihood of developing cirrhosis of the liver?
Pearson’s coefficient of correlation • r • Measures the strength of the linear relationship between one dependent and one independent variable • curvilinear relationships need other techniques • Values lie between +1 and -1 • perfect positive correlation r = +1 • perfect negative correlation r = -1 • no linear relationship r = 0
Pearson’s coefficient of correlation • [Figure: four example scatter plots showing r = +1, r = -1, r = 0 and r = 0.6]
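As a minimal sketch of how r can be computed from its definition (the slides leave the calculation to SPSS), the function below divides the sum of cross-products by the square root of the product of the sums of squares. The function name `pearson_r` and the data values are illustrative assumptions, not taken from the slides.

```python
import math

def pearson_r(x, y):
    """Pearson's r for paired samples x and y (equal length, not constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative values, not data from the slides
x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])   # perfect positive linear relationship
r_negative = pearson_r(x, [10, 8, 6, 4, 2])   # perfect negative linear relationship
r_curved = pearson_r(x, [2, 1, 0, 1, 2])      # clear V-shaped pattern, no linear relationship

print(r_positive, r_negative, r_curved)
```

The third data set shows why curvilinear relationships need other techniques: Y clearly depends on X, yet r is essentially zero.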
Scatter plot • BMD: dependent variable • make inferences about • Calcium intake: independent variable • make inferences from • controlled in some cases
Calculating r • The value and significance of r are calculated by SPSS
Interpreting correlation • A statistically significant r does not necessarily imply: • strong correlation • significance of r depends on sample size: with a large sample even a weak correlation can be significant • cause and effect • e.g. a strong correlation between the number of televisions sold and the number of cases of paranoid schizophrenia • does not mean that watching TV causes paranoid schizophrenia • may be due to an indirect relationship
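One way to see how sample size enters the interpretation of r is the usual t statistic for testing r against zero, t = r·sqrt((n - 2) / (1 - r²)) with n - 2 degrees of freedom. The sketch below is an illustration under stated assumptions: the value r = 0.3 and the sample sizes are hypothetical, and the critical values are the two-sided 5% values from standard t tables.

```python
import math

def t_statistic(r, n):
    """t statistic for testing 'no linear relationship', with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Two-sided 5% critical values from standard t tables
T_CRIT_DF8 = 2.306    # 8 degrees of freedom (n = 10)
T_CRIT_DF98 = 1.984   # 98 degrees of freedom (n = 100), approximate

r = 0.3  # hypothetical, fairly weak correlation
t_small = t_statistic(r, 10)
t_large = t_statistic(r, 100)

significant_small = abs(t_small) > T_CRIT_DF8
significant_large = abs(t_large) > T_CRIT_DF98

print(round(t_small, 2), significant_small)
print(round(t_large, 2), significant_large)
```

The same r is not significant with 10 pairs but is significant with 100, so significance alone says little about the strength of the relationship.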
Interpreting correlation • Variation in dependent variable due to: • relationship with independent variable: r² • random factors: 1 - r² • r² is the Coefficient of Determination • e.g. r = 0.661 • r² = 0.661² ≈ 0.44 • less than half of the variation in the dependent variable is due to the independent variable
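The arithmetic in the example can be checked in a couple of lines. This is only a worked calculation using the value r = 0.661 from the slide; the variable names are arbitrary.

```python
r = 0.661                      # value quoted on the slide
r_squared = r ** 2             # coefficient of determination
unexplained = 1 - r_squared    # share of variation attributed to random factors

print(round(r_squared, 2))     # 0.44
print(round(unexplained, 2))   # 0.56
```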
Agreement • Correlation should never be used to determine the level of agreement between repeated measures: • measuring devices • users • techniques • It measures the degree of linear relationship, not agreement • 1, 2, 3 and 2, 4, 6 are perfectly positively correlated, yet every second value is double the first
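The 1, 2, 3 versus 2, 4, 6 example can be made concrete with a short sketch. It computes r from its definition and then looks at the paired differences, which is one simple way to expose disagreement; the helper name `pearson_r` is an assumption for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson's r for paired samples x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

first = [1, 2, 3]    # e.g. readings from one device
second = [2, 4, 6]   # readings of the same items from another device

r = pearson_r(first, second)
differences = [b - a for a, b in zip(first, second)]
mean_difference = sum(differences) / len(differences)

print(r)                # 1.0
print(differences)      # [1, 2, 3]
print(mean_difference)  # 2.0
```

The correlation is perfect, but the two sets of readings never agree and the disagreement grows with the size of the measurement.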
Assumptions • Errors are the differences between the actual values of Y and the predicted values • To ascribe significance to r: • distribution of errors is Normal • variance of errors is the same for all values of independent variable X
Non-parametric correlation • Make no assumptions about the distribution of the data • Carried out on ranks • Spearman’s ρ (rho) • easy to calculate • Kendall’s τ (tau) • has some advantages over ρ • distribution has better statistical properties • easier to identify concordant / discordant pairs • Usually both lead to same conclusions
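Both rank-based coefficients can be sketched in a few lines. The version below is a minimal illustration, not a replacement for a statistics package: Spearman's coefficient is computed as Pearson's r on average ranks, and Kendall's coefficient is the simple tau-a form (concordant minus discordant pairs over all pairs) with no correction for ties. All function names and the data are assumptions made for the example.

```python
import math
from itertools import combinations

def average_ranks(values):
    """Ranks from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's coefficient: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Made-up illustrative data
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 5]
rho = spearman_rho(x, y)
tau = kendall_tau_a(x, y)
print(rho, tau)
```

For these values there are 7 concordant and 3 discordant pairs; the two coefficients differ in size but point in the same direction, in line with the remark that both usually lead to the same conclusions.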
Calculation of value and significance • Computer does it!
Role of regression • Shows how one variable changes with another • By determining the line of best fit • linear • curvilinear
Line of best fit • Simplest case linear • Line of best fit between: • dependent variable Y • BMD • independent variable X • dietary intake of Calcium • Y = a + bX • a: value of Y when X = 0 (intercept) • b: change in Y when X increases by 1 (slope)
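For the linear case, the line of best fit is usually found by least squares: b is the sum of cross-products divided by the sum of squares of X, and a follows from the means. The sketch below assumes that method (the slides do not name one), and the data are made-up values in arbitrary units, not BMD or calcium measurements.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx              # change in Y when X increases by 1
    a = mean_y - b * mean_x    # value of Y when X = 0
    return a, b

# Hypothetical illustrative data (arbitrary units)
x = [0, 1, 2, 3, 4]
y = [1.0, 3.1, 4.9, 7.2, 8.8]

a, b = fit_line(x, y)
print(round(a, 2), round(b, 2))
```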
Role of regression • Used to predict • the value of the dependent variable • when value of independent variable(s) known • within the range of the known data • extrapolation risky! • relation between age and bone age • Does not imply causality
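The warning about extrapolation can be built into a prediction helper. In this sketch the coefficients and the observed range of X are hypothetical, and the choice to raise an error outside the range is a design assumption for illustration; a warning would be an equally reasonable choice.

```python
def predict(a, b, x_new, x_min, x_max):
    """Predict Y = a + b * x_new, refusing to extrapolate outside [x_min, x_max]."""
    if not (x_min <= x_new <= x_max):
        raise ValueError(
            f"X = {x_new} is outside the range of the known data "
            f"[{x_min}, {x_max}]; extrapolation is risky"
        )
    return a + b * x_new

# Hypothetical fitted line and observed range of the independent variable
a, b = 1.06, 1.97
x_min, x_max = 0, 4

inside = predict(a, b, 2.5, x_min, x_max)   # within the known data
print(round(inside, 3))

try:
    predict(a, b, 10, x_min, x_max)         # far outside the known data
except ValueError as error:
    print(error)
```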
Assumptions • Only if statistical inferences are to be made • significance of regression • values of slope and intercept
Assumptions • If values of independent variable are randomly chosen then no further assumptions necessary • Otherwise • as in correlation, assumptions based on errors • balance out (mean=0) • variances equal for all values of independent variable • not related to magnitude of independent variable • seek advice / help
Multiple regression (often called multivariate regression) • More than one independent variable • BMD dependent on: • age • gender • calorific intake • etc
Logistic regression • The dependent variable is binary • yes / no • predict whether a patient with Type 1 diabetes will undergo limb amputation given history of prior ulcer, time diabetic etc • result is a probability • Can be extended to more than two categories • Outcome after treatment • recovered, in remission, died
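To show how a fitted logistic regression turns predictor values into a probability, the sketch below applies the logistic function to a linear combination of predictors. The intercept and coefficients are invented purely for illustration and are not clinical estimates; the function name `logistic_probability` is also an assumption.

```python
import math

def logistic_probability(intercept, coefficients, predictors):
    """Probability of the 'yes' outcome from a logistic regression model."""
    linear = intercept + sum(c * v for c, v in zip(coefficients, predictors))
    return 1 / (1 + math.exp(-linear))

# Invented coefficients, NOT clinical estimates:
# predictors are [prior ulcer (0 = no, 1 = yes), years with diabetes]
intercept = -3.0
coefficients = [1.5, 0.05]

p_no_ulcer = logistic_probability(intercept, coefficients, [0, 10])
p_ulcer = logistic_probability(intercept, coefficients, [1, 10])

print(round(p_no_ulcer, 3), round(p_ulcer, 3))
```

Whatever the predictor values, the result stays between 0 and 1, which is why the output can be read as a probability rather than as a predicted measurement.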
Summary • Correlation • strength of linear relationship between two variables • Pearson’s - parametric • Spearman’s / Kendall’s - non-parametric • Interpret with care! • Regression • line of best fit • prediction • multivariate • logistic