Statistics

### Statistics

Correlation and regression

Introduction
• Some methods involve one variable
• Is Treatment A as effective in relieving arthritic pain as Treatment B?
• Correlation and regression are used to investigate relationships between variables
• most commonly linear relationships
• between two variables
• Is BMD (bone mineral density) related to dietary calcium level?
Contents
• Coefficients of correlation
• meaning
• values
• role
• significance
• Regression
• line of best fit
• prediction
• significance
Introduction
• Correlation
• the strength of the linear relationship between two variables
• Regression analysis
• determines the nature of the relationship
• Is there a relationship between the number of units of alcohol consumed and the likelihood of developing cirrhosis of the liver?
Pearson’s coefficient of correlation
• r
• Measures the strength of the linear relationship between one dependent and one independent variable
• curvilinear relationships need other techniques
• Values lie between +1 and -1
• perfect positive correlation r = +1
• perfect negative correlation r = -1
• no linear relationship r = 0
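
The definition above can be sketched in a few lines of Python. This is only an illustrative calculation using the standard product-moment formula; the function name `pearson_r` and the small data sets are made up for the example. The third data set is deliberately curvilinear (Y depends exactly on X, but not linearly) to show that r = 0 only rules out a linear relationship.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]                       # made-up values
print(pearson_r(x, [2, 4, 6, 8, 10]))     # perfect positive linear relationship
print(pearson_r(x, [10, 8, 6, 4, 2]))     # perfect negative linear relationship
print(pearson_r(x, [4, 1, 0, 1, 4]))      # Y = (X - 3)^2: curvilinear, no linear trend
```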

[Figure: scatter plots illustrating r = +1, r = -1, r = 0 and r = 0.6]

Pearson’s coefficient of correlation

Scatter plot

• BMD: dependent variable
• Calcium intake: independent variable
• make inferences from
• controlled in some cases

Calculating r

• The value and significance of r are calculated by SPSS

Interpreting correlation

• A statistically significant r does not necessarily imply:
• strong correlation
• the significance of a given r increases with sample size
• cause and effect
• e.g. strong correlation between the number of televisions sold and the number of cases of paranoid schizophrenia
• does not mean that watching TV causes paranoid schizophrenia
• may be due to an indirect relationship
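
The effect of sample size can be made concrete with the usual t statistic for testing a correlation, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. A minimal sketch, with r = 0.3 and the two sample sizes chosen purely for illustration: the coefficient is the same weak value in both cases, but only the larger sample makes it statistically significant.

```python
import math

def t_statistic(r, n):
    """t statistic for testing 'no linear correlation', with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

r = 0.3  # illustrative value: the same weak correlation in both samples
for n in (10, 100):
    print(n, round(t_statistic(r, n), 2))

# Two-sided 5% critical values of t from standard tables:
# 2.306 for 8 degrees of freedom, about 1.98 for 98 degrees of freedom.
# The n = 10 statistic falls below its critical value; the n = 100 one exceeds it.
```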

Interpreting correlation

• Variation in dependent variable due to:
• relationship with independent variable: r²
• random factors: 1 - r²
• r² is the Coefficient of Determination
• e.g. r = 0.661
• r² = 0.661² ≈ 0.44
• less than half of the variation in the dependent variable is due to the independent variable
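
One way to see why the squared coefficient is read as a share of variation: for a straight line fitted by least squares, the squared correlation equals 1 minus the ratio of residual variation to total variation in Y. The sketch below checks this on a small made-up data set (the values are invented for the example) and also reproduces the arithmetic for r = 0.661.

```python
x = [1, 2, 3, 4, 5]   # made-up values
y = [2, 1, 4, 3, 5]   # made-up values

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)   # total variation in Y

r_squared = sxy ** 2 / (sxx * syy)

# Least-squares line and the variation left over around it
slope = sxy / sxx
intercept = mean_y - slope * mean_x
ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
explained_fraction = 1 - ss_residual / syy

print(r_squared, explained_fraction)   # the two quantities agree
print(round(0.661 ** 2, 2))            # the example value from the slide
```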

Agreement

• Correlation should never be used to determine the level of agreement between repeated measures:
• measuring devices
• users
• techniques
• It measures the degree of linear relationship, not agreement
• 1, 2, 3 and 2, 4, 6 are perfectly positively correlated, yet the two sets of readings do not agree
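
The 1, 2, 3 versus 2, 4, 6 example can be checked directly. In this sketch the names `device_a` and `device_b` are hypothetical labels for two sets of repeated measurements; the correlation is perfect even though no pair of readings matches.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

device_a = [1, 2, 3]   # hypothetical readings from one device
device_b = [2, 4, 6]   # hypothetical readings of the same subjects from another

r = pearson_r(device_a, device_b)
differences = [b - a for a, b in zip(device_a, device_b)]
mean_difference = sum(differences) / len(differences)

print(r)                 # perfect linear relationship
print(differences)       # yet every pair of readings differs
print(mean_difference)   # and the second device reads higher on average
```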

Assumptions

• Errors (residuals) are the differences between the actual values of Y and the predicted values
• To ascribe significance to r:
• distribution of errors is Normal
• variance is same for all values of independent variable X

Non-parametric correlation

• Make no assumptions about the distribution of the data
• Carried out on ranks
• Spearman’s ρ (rho)
• easy to calculate
• Kendall’s τ (tau)
• has some advantages over ρ
• distribution has better statistical properties
• easier to identify concordant / discordant pairs
• Usually both lead to the same conclusions
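
Both rank-based coefficients are simple enough to sketch by hand. The code below is an illustration, not a replacement for a statistics package: Spearman's coefficient is computed as Pearson's coefficient applied to the ranks, and Kendall's coefficient as the tau-a form (concordant minus discordant pairs, divided by the number of pairs), which assumes no tied values. The function names and data are made up for the example.

```python
import math
from itertools import combinations

def ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's coefficient: Pearson's coefficient calculated on the ranks."""
    return pearson_r(ranks(x), ranks(y))

def kendall_tau(x, y):
    """Kendall's tau-a; this simple form assumes there are no tied values."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs

x = [1, 2, 3, 4, 5]
cubes = [1, 8, 27, 64, 125]   # increases with x, but not linearly
print(pearson_r(x, cubes), spearman_rho(x, cubes), kendall_tau(x, cubes))

swapped = [2, 1, 3, 4, 5]     # one pair out of order
print(spearman_rho(x, swapped), kendall_tau(x, swapped))
```

The cubic data give a Pearson coefficient below 1 while both rank coefficients equal 1, because the ranks only record that Y always rises with X.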

Role of regression

• Shows how one variable changes with another
• By determining the line of best fit
• linear
• curvilinear

Line of best fit

• Simplest case linear
• Line of best fit between:
• dependent variable Y
• BMD
• independent variable X
• dietary intake of Calcium

Y = a + bX

• a: value of Y when X = 0
• b: change in Y when X increases by 1
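
The slides do not say how the line is found. A common choice, assumed here, is ordinary least squares, where the slope is the sum of cross-products divided by the sum of squares of X, and the line passes through the point of means. The calcium and BMD figures below are invented purely to make the sketch runnable.

```python
def fit_line(x, y):
    """Least-squares estimates (a, b) for the line Y = a + bX."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx               # change in Y when X increases by 1
    a = mean_y - b * mean_x     # value of Y when X = 0
    return a, b

# Invented illustration data, not real measurements
calcium_mg_per_day = [400, 600, 800, 1000, 1200]
bmd_g_per_cm2 = [0.80, 0.86, 0.90, 0.97, 1.02]

a, b = fit_line(calcium_mg_per_day, bmd_g_per_cm2)
print(f"BMD = {a:.3f} + {b:.6f} * calcium")
```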

Role of regression

• Used to predict
• the value of the dependent variable
• when value of independent variable(s) known
• within the range of the known data
• extrapolation risky!
• relation between age and bone age
• Does not imply causality
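
The warning about extrapolation can be built into a prediction helper. In this sketch the function `predict`, the coefficients and the observed range are all hypothetical; the point is only that a prediction is refused when the new X lies outside the range of the data used to fit the line.

```python
def predict(a, b, x_new, x_min, x_max):
    """Predict Y = a + b * X, refusing to extrapolate beyond the observed X range."""
    if not x_min <= x_new <= x_max:
        raise ValueError(
            f"X = {x_new} is outside the observed range [{x_min}, {x_max}]; "
            "extrapolation is risky"
        )
    return a + b * x_new

# Hypothetical fitted line and observed range of X (not real estimates)
a, b = 0.69, 0.000275
observed_min, observed_max = 400, 1200

print(predict(a, b, 900, observed_min, observed_max))   # inside the range

try:
    predict(a, b, 2000, observed_min, observed_max)     # outside the range
except ValueError as error:
    print(error)
```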

Assumptions

• Only if statistical inferences are to be made
• significance of regression
• values of slope and intercept

Assumptions

• If values of the independent variable are randomly chosen then no further assumptions are necessary
• Otherwise
• as in correlation, assumptions based on errors
• balance out (mean=0)
• variances equal for all values of independent variable
• not related to magnitude of independent variable
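
A rough way to look at these assumptions is to inspect the errors after fitting. The sketch below uses invented data and a deliberately crude check: it compares the average squared error in the lower and upper halves of the X values. Note that a least-squares line with an intercept always gives errors that average to zero, so the first figure is a sanity check rather than evidence; the comparison of spread is the informative part. A plot of errors against X would normally be used instead.

```python
# Invented illustration data
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
b = sxy / sxx
a = mean_y - b * mean_x

# Errors: actual Y minus the Y predicted by the fitted line
errors = [yi - (a + b * xi) for xi, yi in zip(x, y)]
mean_error = sum(errors) / n

# Compare the spread of errors for small and large values of X
ordered = [e for _, e in sorted(zip(x, errors))]
half = n // 2
spread_low = sum(e ** 2 for e in ordered[:half]) / half
spread_high = sum(e ** 2 for e in ordered[half:]) / (n - half)

print(f"mean error: {mean_error:.2e}")
print(f"mean squared error, lower half of X: {spread_low:.4f}")
print(f"mean squared error, upper half of X: {spread_high:.4f}")
```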

Multiple regression (often loosely called multivariate regression)

• More than one independent variable
• BMD dependent on:
• age
• gender
• calorific intake
• etc
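
With several independent variables the same least-squares idea applies, but the coefficients come from solving a small system of equations (the normal equations). The sketch below does this with plain Python for illustration; in practice a statistics package would be used. The helper names, the coding of gender as 0/1 and all data values are assumptions made for the example, not real measurements.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting.

    Fails with a division by zero if the system is singular, for example
    when two independent variables are perfectly collinear.
    """
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - known) / aug[r][r]
    return solution

def fit_multiple(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in rows]   # leading 1 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Invented data: (age in years, gender coded 0 or 1) and BMD
subjects = [(30, 0), (40, 1), (50, 0), (60, 1), (70, 0), (45, 1)]
bmd = [1.05, 1.08, 0.95, 0.98, 0.84, 1.06]

intercept, b_age, b_gender = fit_multiple(subjects, bmd)
print(intercept, b_age, b_gender)
```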

Logistic regression

• The dependent variable is binary
• yes / no
• predict whether a patient with Type 1 diabetes will undergo limb amputation given history of prior ulcer, time diabetic etc
• result is a probability
• Can be extended to more than two categories
• Outcome after treatment
• recovered, in remission, died
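
For the binary case, the model turns a weighted sum of the predictors into a probability between 0 and 1 using the logistic function. The sketch below shows only that final step. The coefficients are invented for illustration and are not clinical estimates; in practice they would be estimated from data by a statistics package.

```python
import math

def logistic_probability(intercept, coefficients, values):
    """Probability of the 'yes' outcome from a logistic regression model."""
    linear_predictor = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1 / (1 + math.exp(-linear_predictor))

# Invented coefficients, NOT clinical estimates:
# predictors are (prior ulcer: 0 or 1, years with diabetes)
intercept = -3.0
coefficients = [1.5, 0.05]

p_without_ulcer = logistic_probability(intercept, coefficients, [0, 10])
p_with_ulcer = logistic_probability(intercept, coefficients, [1, 10])

print(round(p_without_ulcer, 3))
print(round(p_with_ulcer, 3))
```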

Summary

• Correlation
• strength of linear relationship between two variables
• Pearson’s - parametric
• Spearman’s / Kendall’s - non-parametric
• Interpret with care!
• Regression
• line of best fit
• prediction
• multivariate
• logistic