
Correlation and Simple Linear Regression


Presentation Transcript


  1. Correlation and Simple Linear Regression

  2. Correlation Analysis
  • Correlation analysis is used to describe the degree to which one variable is linearly related to another.
  • There are two measures for describing correlation:
  • The Coefficient of Correlation, also called the coefficient of linear correlation or Pearson’s coefficient of linear correlation (ρ / r / R)
  • The Coefficient of Determination (r² / R²)

  3. Correlation
  The correlation between two random variables, X and Y, is a measure of the degree of linear association between the two variables. The population correlation, denoted by ρ, can take on any value from -1 to 1.
  • ρ = -1 indicates a perfect negative linear relationship
  • -1 < ρ < 0 indicates a negative linear relationship
  • ρ = 0 indicates no linear relationship
  • 0 < ρ < 1 indicates a positive linear relationship
  • ρ = 1 indicates a perfect positive linear relationship
  The absolute value of ρ indicates the strength or exactness of the relationship.

  4. Illustrations of Correlation
  [Figure: six scatter plots illustrating ρ = -1, ρ = 0, and ρ = 1 (top row), and ρ = -0.8, ρ = 0, and ρ = 0.8 (bottom row)]

  5. The Coefficient of Correlation and the Sample Coefficient of Determination
  The sample coefficient of correlation is
  r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[ Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)² ]
  and the sample coefficient of determination is r².

  6. The Coefficient of Correlation or Karl Pearson’s Coefficient of Correlation The coefficient of correlation is the square root of the coefficient of determination. The sign of r indicates the direction of the relationship between the two variables X and Y.
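  The computations described on the last two slides can be sketched in a few lines of plain Python. This is an illustration added here, not part of the original slides, and the data values are made up for demonstration:

```python
# Sample coefficient of correlation r and coefficient of determination r^2,
# computed from the definitional formulas (plain Python, no libraries).
from math import sqrt

def pearson_r(xs, ys):
    """Sample Pearson coefficient of correlation between xs and ys."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # Σ(x-x̄)(y-ȳ)
    sxx = sum((x - mx) ** 2 for x in xs)                    # Σ(x-x̄)²
    syy = sum((y - my) ** 2 for y in ys)                    # Σ(y-ȳ)²
    return sxy / sqrt(sxx * syy)

# Hypothetical data showing a positive linear tendency.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
r = pearson_r(x, y)   # coefficient of correlation
r2 = r ** 2           # coefficient of determination
```

  Note that the sign of r comes out of the numerator Σ(x − x̄)(y − ȳ) automatically, which is why it matches the direction of the relationship.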

  7. Simple Linear Regression
  • Regression refers to the statistical technique of modeling the relationship between variables.
  • In simple linear regression, we model the relationship between two variables.
  • One of the variables, denoted by Y, is called the dependent variable and the other, denoted by X, is called the independent variable.
  • The model we will use to depict the relationship between X and Y will be a straight-line relationship.
  • A graphical sketch of the pairs (X, Y) is called a scatter plot.

  8. Using Statistics
  [Figure: scatterplot of Advertising Expenditures (X) and Sales (Y)]
  This scatterplot locates pairs of observations of advertising expenditures on the x-axis and sales on the y-axis. We notice that:
  • Larger (smaller) values of sales tend to be associated with larger (smaller) values of advertising.
  • The scatter of points tends to be distributed around a positively sloped straight line.
  • The pairs of values of advertising expenditures and sales are not located exactly on a straight line.
  • The scatter plot reveals a more or less strong tendency rather than a precise linear relationship.
  • The line represents the nature of the relationship on average.

  9. Examples of Other Scatterplots
  [Figure: six scatterplots showing various patterns of association between X and Y]

  10. Simple Linear Regression Model
  • The equation that describes how y is related to x and an error term is called the regression model.
  • The simple linear regression model is: y = a + bx + e
  where:
  • a and b are called parameters of the model; a is the intercept and b is the slope.
  • e is a random variable called the error term.

  11. Assumptions of the Simple Linear Regression Model
  • The relationship between X and Y is a straight-line relationship: E[Y] = β₀ + β₁X.
  • The errors εᵢ are normally distributed with mean 0 and variance σ²; that is, ε ~ N(0, σ²).
  • The errors are uncorrelated (not related) in successive observations.
  [Figure: identical normal distributions of errors, all centered on the regression line]
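  These assumptions can be made concrete by simulating data from the model. The sketch below (not from the slides; the parameter values are illustrative assumptions) generates observations as a straight line plus normal errors:

```python
# Simulating data under the simple linear regression model y = a + b*x + e,
# with e ~ N(0, sigma^2). Parameter values are illustrative, not from the slides.
import random

random.seed(42)                      # fixed seed for reproducibility
a, b, sigma = 2.0, 0.5, 1.0          # intercept, slope, error std. deviation

xs = list(range(1, 21))
# Each y is the line value plus an independent normal error.
ys = [a + b * x + random.gauss(0.0, sigma) for x in xs]
```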

  12. Errors in Regression
  [Figure: the error for an observation is the vertical distance between the observed point (Xᵢ, Yᵢ) and the regression line]

  13. SIMPLE REGRESSION AND CORRELATION
  Estimating Using the Regression Line
  First, let's look at the equation of a straight line:
  Y = a + bX
  where Y is the dependent variable, X is the independent variable, b is the slope of the line, and a is the Y-intercept.

  14. SIMPLE REGRESSION AND CORRELATION
  The Method of Least Squares
  To estimate the straight line we use the least squares method. This method minimizes the sum of squared errors between the estimated points on the line and the actual observed points.
  The sign of r will be the same as the sign of the coefficient b in the regression equation Y = a + bX.

  15. SIMPLE REGRESSION AND CORRELATION
  The estimating line is Ŷ = a + bX.
  Slope of the best-fitting regression line: b = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²
  Y-intercept of the best-fitting regression line: a = ȳ − b·x̄

  16. SIMPLE REGRESSION – EXAMPLE (Appliance store)
  Suppose an appliance store conducts a five-month experiment to determine the effect of advertising on sales revenue. The results are shown below. (File: PPT_Regr_example)
  Advertising Exp. ($100s) | Sales Rev. ($1000s)
  1 | 1
  2 | 1
  3 | 2
  4 | 2
  5 | 4
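  The least-squares formulas from the previous slide can be applied to the appliance-store data in a short Python sketch (added here for illustration, not part of the original slides):

```python
# Least-squares fit of the appliance-store data
# (advertising in $100s, sales revenue in $1000s).
x = [1, 2, 3, 4, 5]          # advertising expenditure
y = [1, 1, 2, 2, 4]          # sales revenue

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # Σ(x-x̄)(y-ȳ) = 7
sxx = sum((xi - mx) ** 2 for xi in x)                     # Σ(x-x̄)² = 10

b = sxy / sxx                # slope: 7 / 10 = 0.7
a = my - b * mx              # intercept: 2 - 0.7*3 = -0.1
# Estimating line: y-hat = -0.1 + 0.7x
```

  So every additional $100 of advertising is associated with about $700 of additional sales revenue on average.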

  17. SIMPLE REGRESSION AND CORRELATION
  • If the slope of the estimating line is positive, r is the positive square root of r², and the relationship between the two variables is direct.
  • If the slope of the estimating line is negative, r is the negative square root of r².

  18. Steps in Hypothesis Testing using SPSS
  • State the null and alternative hypotheses
  • Define the level of significance (α)
  • Calculate the actual significance: the p-value
  • Make a decision: reject the null hypothesis if p ≤ α (two-tailed test)
  • State the conclusion

  19. Summary of SPSS Regression Analysis Output

  20. Excel and SPSS Correlation Outputs

  21. Hypothesis Tests for the Correlation Coefficient
  H0: ρ = 0 (No significant linear relationship)
  H1: ρ ≠ 0 (Linear relationship is significant)
  Use the p-value for decision making.

  22. Analysis-of-Variance Table and an F Test of the Regression Model H0 : The regression model is not significant H1 : The regression model is significant
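  The F statistic in the ANOVA table can be reproduced by hand from the appliance-store data of slide 16. This is a plain-Python sketch added for illustration:

```python
# ANOVA F statistic for the appliance-store regression.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # 7
sxx = sum((xi - mx) ** 2 for xi in x)                     # 10
syy = sum((yi - my) ** 2 for yi in y)                     # total SS = 6
b = sxy / sxx                # slope = 0.7

ssr = b * sxy                # regression sum of squares = 4.9
sse = syy - ssr              # error sum of squares = 1.1
msr = ssr / 1                # regression mean square (1 df)
mse = sse / (n - 2)          # error mean square (n - 2 = 3 df)
F = msr / mse                # F statistic on (1, 3) degrees of freedom
```

  F comes out to about 13.36 on (1, 3) degrees of freedom, which corresponds to the p-value of 0.035 reported on the next slide.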

  23. The p-value is 0.035.
  Conclusion: There is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis; b is not equal to zero. Thus, the independent variable is linearly related to y, and this linear regression model is valid.

  24. Testing for the existence of linear relationship • We test the hypothesis: H0: b = 0 (the independent variable is not a significant predictor of the dependent variable) H1: b is not equal to zero (the independent variable is a significant predictor of the dependent variable). • If b is not equal to zero (if the null hypothesis is rejected), we can conclude that the Independent variable contributes significantly in predicting the Dependent variable.
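  The test of H0: b = 0 can be sketched the same way with a t statistic for the slope, again using the appliance-store data from slide 16 (illustrative code, not from the slides):

```python
# t test for the slope of the appliance-store regression: H0: b = 0.
from math import sqrt

x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # 7
sxx = sum((xi - mx) ** 2 for xi in x)                     # 10
syy = sum((yi - my) ** 2 for yi in y)                     # 6
b = sxy / sxx                # slope = 0.7

sse = syy - b * sxy          # error sum of squares = 1.1
mse = sse / (n - 2)          # residual variance estimate
se_b = sqrt(mse / sxx)       # standard error of the slope
t = b / se_b                 # t statistic on n - 2 = 3 degrees of freedom
```

  t comes out to about 3.66 on 3 degrees of freedom; note that t² equals the F statistic from the ANOVA table, so both tests give the same two-tailed p-value of 0.035.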

  25. Conclusion: Alternatively, the actual significance is 0.035, so we reject the null hypothesis. Advertising expense is a significant explanatory variable.
