
Presentation Transcript


  1. LINEAR REGRESSION

  2. Correlation & Linear Regression • Not the same, but they are related • Linear regression: • line that best predicts Y from X • Use when one of the variables is controlled • Correlation: • quantifies how X and Y vary together • Use when both X and Y are measured

  3. Correlation & Linear Regression • 3 characteristics of a relationship: • Direction • Positive (+) • Negative (-) • Degree of Association • Between -1 and +1 • Absolute values signify strength • Form • Linear • Non-linear

  4. Direction

  5. Degree of Association

  6. Form

  7. Linear Regression • If two variables are linearly related, it is possible to develop a simple equation to predict one from the other • The outcome (dependent) variable is designated Y • The predictor (independent) variable is designated X

  8. The Linear Equation • General form: Y = a + bX • Where: • a = intercept • b = slope • X = predictor • Y = outcome • Can use this equation to predict Y for any given value of X • a and b are constants in a given line; X and Y change
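
To make the equation concrete, here is a minimal Python sketch of predicting Y from X with Y = a + bX; the intercept and slope values are made up purely for illustration.

```python
# Predicting Y from X with the linear equation Y = a + bX.
# The intercept and slope below are made-up values for illustration only.
a = 2.0   # intercept: predicted Y when X = 0
b = 0.5   # slope: change in predicted Y per one-unit increase in X

def predict(x):
    """Return the predicted Y for a given X."""
    return a + b * x

for x in (0, 10, 20):
    print(f"X = {x:>2}  ->  predicted Y = {predict(x):.1f}")
```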

  9. The Linear Equation • Same a, different b’s...

  10. The Linear Equation • Same b, different a’s...

  11. The Linear Equation • Different a’s and b’s...

  12. Slope and Intercept

  13. Slope and Intercept

  14. Slope and Intercept • When there is no linear association (r = 0), the regression line is horizontal (b = 0)

  15. Slope and Intercept • When the correlation is perfect (r = ±1), all the points fall exactly along a straight line

  16. Slope and Intercept • When there is some linear association (0 < |r| < 1), the regression line fits as close to the points as possible

  17. Where did this line come from? • It is a straight line drawn through the scatterplot to summarise the relationship between X and Y • It is the line that minimises the sum of the squared deviations (Y′ - Y)² • We call these deviations “residuals”

  18. Regression Lines • Minimising the sum of the squared vertical distances (the “residuals”)
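
A short Python sketch of how such a line can be computed, using made-up X and Y values: the least-squares slope is the sum of cross-deviations divided by the sum of squared X-deviations, and the intercept then follows from the two means.

```python
import numpy as np

# Made-up paired observations; any measured X and Y would do.
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2, 5, 4, 7, 9, 8], dtype=float)

# Least-squares estimates:
#   b = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
#   a = ybar - b * xbar
# This is the line that minimises the sum of squared deviations (Y' - Y)^2.
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

y_hat = a + b * x          # predicted values Y'
residuals = y - y_hat      # the vertical distances whose squares are minimised
print(f"a = {a:.3f}, b = {b:.3f}, SS_residual = {np.sum(residuals ** 2):.3f}")
```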

  19. Example

  20. Example

  21. Regression: Analyzing the “fit” • How well does the regression line describe the data? • Assessing “fit” relies on analysis of residuals • Conduct an ANOVA to test the null hypothesis that an increase in X does not cause a change (positive or negative) in the value of Y

  22. Regression ANOVA • Need to partition out the variability • Total variability of Y = variability explained by the regression line + unexplained variability
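
A quick numerical check of this partition, again with made-up data: the total sum of squares for Y equals the sum of squares explained by the regression line plus the residual (unexplained) sum of squares.

```python
import numpy as np

# Made-up data; the point is the identity SS_total = SS_regression + SS_residual.
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2, 5, 4, 7, 9, 8], dtype=float)

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
y_hat = a + b * x

ss_total = np.sum((y - y.mean()) ** 2)      # total variability of Y
ss_reg   = np.sum((y_hat - y.mean()) ** 2)  # explained by the regression line
ss_resid = np.sum((y - y_hat) ** 2)         # unexplained variability

print(f"{ss_total:.3f} = {ss_reg:.3f} + {ss_resid:.3f}")  # the identity holds
```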

  23. Regression ANOVA

  24. Regression ANOVA: Example • Fill in the ANOVA table:

  25. Regression ANOVA: Example

  26. Regression ANOVA: Example

  27. Regression ANOVA: Example

  28. Regression ANOVA: Example • Fcrit(1,4) = 7.71 (from table) • Reject H0; an increase in X does cause a change in Y
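
A sketch of the same kind of F-test in Python; the data here are made up but chosen with n = 6 observations, so the degrees of freedom are (1, 4) and the critical value matches the 7.71 quoted above (scipy is used only to look that value up).

```python
import numpy as np
from scipy import stats

# Made-up data with n = 6, giving df_regression = 1 and df_residual = n - 2 = 4.
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2, 5, 4, 7, 9, 8], dtype=float)

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
y_hat = a + b * x

ss_reg   = np.sum((y_hat - y.mean()) ** 2)
ss_resid = np.sum((y - y_hat) ** 2)
df_reg, df_resid = 1, len(x) - 2

F = (ss_reg / df_reg) / (ss_resid / df_resid)   # MS_regression / MS_residual
F_crit = stats.f.ppf(0.95, df_reg, df_resid)    # ~7.71 for df = (1, 4)
print(f"F = {F:.2f}, F_crit = {F_crit:.2f}, reject H0: {F > F_crit}")
```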

  29. SPSS Linear Regression

  30. Linear Regression • Linear Regression uses one or more independent variables in an equation to best predict the value of the dependent variable • From the menus choose: Analyze → Regression → Linear

  31. Linear Regression • Select one dependent variable (numeric) and one or more independent variables (numeric) • The output will compute an ANOVA telling you whether the overall regression is significant • It will also calculate a value for the slope and intercept (coefficients)

  32. Example • Perform a regression analysis with “Years Since PhD” as the independent variable and “Publications” as the dependent variable • What is the equation of the straight line? • Plot the data and draw a regression line through the scatterplot
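
The same analysis can also be run outside SPSS; the statsmodels sketch below is one way to do it. Since the original data file is not reproduced here, the values for years since PhD and publications are made up.

```python
import numpy as np
import statsmodels.api as sm

# Made-up stand-ins for the "Years Since PhD" and "Publications" variables.
years_since_phd = np.array([1, 3, 5, 7, 10, 12, 15, 20], dtype=float)
publications    = np.array([2, 4, 9, 8, 15, 14, 22, 30], dtype=float)

X = sm.add_constant(years_since_phd)       # adds the intercept term
model = sm.OLS(publications, X).fit()

print(model.params)     # [intercept, slope]: the equation of the straight line
print(model.f_pvalue)   # overall ANOVA significance, as in the SPSS output
```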

  33. Results • SPSS coefficients output showing the intercept and slope

  34. Example • Scatter plot with regression line

  35. Multiple Regression – Example As cheese ages, various chemical processes take place that determine the taste of the final product. The dataset “Cheese” contains concentrations of various chemicals in 30 samples of mature cheddar cheese, and a subjective measure of taste for each sample. Use a multiple regression analysis to evaluate the effect of these three chemicals on the taste of cheese.
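
A hedged sketch of this multiple regression in Python; the file name and the column names (a taste score plus three chemical concentrations) are assumptions and would need to be adjusted to match the actual “Cheese” dataset.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file path and column names; edit to match the real dataset.
cheese = pd.read_csv("cheese.csv")

# Regress taste on the three chemical concentrations simultaneously.
model = smf.ols("taste ~ Acetic + H2S + Lactic", data=cheese).fit()
print(model.summary())   # coefficients, t-tests, and the overall ANOVA
```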

  36. Results

  37. CORRELATION

  38. Correlation • Statistical technique that measures and describes the degree of linear relationship between two variables

  39. Pearson’s r • Value ranging from -1 to +1 • Indicates strength and direction of the linear relationship • Absolute value indicates strength • +/- indicates direction

  40. Pearson’s r • r is an estimate of the population correlation ρ (rho)
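
A small Python sketch of computing Pearson’s r from paired measurements (made-up values): the covariation of X and Y divided by the square root of the product of their separate variations.

```python
import numpy as np

# Made-up paired measurements; both X and Y are measured, so correlation applies.
x = np.array([2, 4, 5, 6, 8, 10], dtype=float)
y = np.array([1, 3, 4, 4, 6, 7], dtype=float)

# Pearson's r: sum of cross-deviations scaled by the X and Y deviations.
r = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
    np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2)
)

print(f"r = {r:.3f}")                            # strength and direction
print(f"check: {np.corrcoef(x, y)[0, 1]:.3f}")   # same value via NumPy
```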

  41. Example • What is the correlation value? • r = 0.866

  42. Some issues with r • Outliers have strong effects • Restriction of range can suppress or augment r • Correlation is not causation • No linear correlation does not mean no association

  43. Outliers • Outliers can strongly affect the value of r
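
A tiny made-up demonstration of this: adding one extreme point to otherwise moderately related data changes r dramatically.

```python
import numpy as np

# Moderately related made-up data (r = 0.5).
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([3, 1, 4, 2, 5], dtype=float)
print(f"r without outlier: {np.corrcoef(x, y)[0, 1]:.3f}")

# Add a single extreme point and recompute.
x_out = np.append(x, 20.0)
y_out = np.append(y, 25.0)
print(f"r with outlier:    {np.corrcoef(x_out, y_out)[0, 1]:.3f}")
```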

  44. Restricted Range • The relationship you see between X and Y may depend on the range of X • E.g., the size of a child’s vocabulary has a strong positive association with the child’s age, but if all the children in your data set are in the same grade at school, you may not see much association

  45. Common Causes • Two variables might be associated because they share a common cause • There is a positive correlation between ice cream sales and drownings – This is because they both occur more often in summer, not because one is causing the other

  46. Non-Linearity • Some variables are not linearly related, though a relationship obviously exists

  47. Non-Linearity • Even though we find a significant correlation, the relationship may not be linear • Four sets of data with the same correlation of 0.816
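
The “four sets of data” with identical correlation are very likely the classic Anscombe quartet; the short sketch below loads a copy of it from seaborn’s bundled example datasets (an assumption, and it needs an internet connection on first use) and confirms that each set has r ≈ 0.816 even though only one is genuinely linear.

```python
import seaborn as sns

# Load Anscombe's quartet: columns "dataset", "x", "y".
anscombe = sns.load_dataset("anscombe")

# Every set has nearly the same r, which is why scatterplots must be inspected.
for name, grp in anscombe.groupby("dataset"):
    print(name, round(grp["x"].corr(grp["y"]), 3))
```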

  48. r-squared • r² is the Coefficient of Determination • It is the amount of covariation compared to the amount of total variation • The percent of total variance that is shared variance • E.g. if r = 0.80, then r² = (0.80)² = 0.64 • X explains 64% of the variation in Y (and vice versa)

  49. Hypothesis Testing with r • Use a t-test to test whether there is a significant correlation (relationship) between variables

  50. Previous Example • SEr = √((1 - r²) / (n - 2)); t = r / SEr; df = n - 2 • Is the correlation significant? • r = 0.866 • SEr = 0.25 • t = 3.464 • d.f. = 4 • tcrit = 2.776 (from table) • Reject null hypothesis – there is a significant relationship
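
A sketch of the same t-test in Python, using the numbers from the example above (r = 0.866 with d.f. = 4, which implies n = 6 pairs); scipy is used only to look up the two-tailed critical value at α = 0.05.

```python
import math
from scipy import stats

r, n = 0.866, 6                            # values from the example (n = df + 2)

se_r = math.sqrt((1 - r ** 2) / (n - 2))   # SEr = sqrt((1 - r^2) / (n - 2))
t = r / se_r                               # test statistic
df = n - 2
t_crit = stats.t.ppf(0.975, df)            # two-tailed critical value, alpha = .05

print(f"SEr = {se_r:.3f}, t = {t:.3f}, t_crit = {t_crit:.3f}")
print("Reject H0" if abs(t) > t_crit else "Fail to reject H0")
```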
