
Lecture Notes


mcleodr

Presentation Transcript


  1. Lecture Notes: The Relation between Two Variables: Correlation and Regression. Prof. L Prado, OER - www.helpyourmath.com

  2. A mathematical model is a mathematical expression that represents some phenomenon. It can be a deterministic model or a probabilistic model. Models often describe the relationship between two variables.
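The following minimal Python sketch makes the distinction concrete. The coefficients 5 and 4 and the normally distributed error term are illustrative assumptions, not part of the slides. A deterministic model returns the same y for a given x every time. A probabilistic model adds a random error, so repeated observations at the same x scatter around the line.

```python
import random

# Hypothetical coefficients chosen only for illustration.
INTERCEPT = 5.0
SLOPE = 4.0


def deterministic_model(x: float) -> float:
    """Same input always gives the same output: y = 5 + 4x."""
    return INTERCEPT + SLOPE * x


def probabilistic_model(x: float, rng: random.Random, noise_sd: float = 2.0) -> float:
    """Same straight-line part plus a random error drawn from a normal
    distribution with mean 0 and standard deviation noise_sd (assumed)."""
    return INTERCEPT + SLOPE * x + rng.gauss(0.0, noise_sd)


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is reproducible
    for x in [1, 2, 3]:
        # deterministic value vs. one noisy observation at the same x
        print(x, deterministic_model(x), round(probabilistic_model(x, rng), 2))
```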

  3. Learning objectives • 1. Draw and interpret scatter diagrams • 2. Understand the properties of the linear correlation coefficient • 3. Compute and interpret the linear correlation coefficient

  4. 10.1. Scatter Diagrams and Correlation When dealing with two variables: • We try to see the relationship between the two variables • Sometimes a third variable that is not considered affects the results (a lurking variable). For example, shoe size does not cause height to change; age affects both variables • Therefore, we can’t conclude that variable A causes variable B • Some examples are: • Rainfall amounts and plant growth (possible lurking variable: sunlight) • Exercise and cholesterol levels for a group of people (possible lurking variable: diet) • Height and weight for a group of people • Height and the fastest speed you have ever driven a car • When we have two variables, they could be related in one of several different ways: • They could be unrelated • One variable (the explanatory or predictor variable) could be used to explain the other (the response or dependent variable) • One variable could be thought of as causing the other variable to change
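A short simulation illustrates the lurking-variable point above. This minimal Python sketch uses made-up data. The age range, coefficients, and noise levels are all assumptions. Shoe size and height are each generated from age alone, with no direct link between them, yet their sample correlation comes out strongly positive.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Sample correlation coefficient r (z-score form, n - 1 denominator)."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (n - 1)


def simulate_children(n: int = 200, seed: int = 42):
    """Hypothetical data: age drives BOTH shoe size and height; neither causes the other."""
    rng = random.Random(seed)
    ages = [rng.uniform(5, 15) for _ in range(n)]          # years (assumed range)
    shoe = [0.5 * a + rng.gauss(0, 0.5) for a in ages]     # depends on age only
    height = [80 + 5 * a + rng.gauss(0, 5) for a in ages]  # cm, depends on age only
    return ages, shoe, height


ages, shoe, height = simulate_children()
print("r(shoe size, height) =", round(pearson_r(shoe, height), 3))
```

The strong correlation here comes entirely from the shared dependence on age. That is why a correlation alone cannot establish that A causes B.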

  5. Scatter Diagrams The scatter diagram is a graph that visually shows the relationship between two quantitative variables. The explanatory variable is plotted on the horizontal axis (x-axis) and the response variable on the vertical axis (y-axis). The response variable is the variable whose value can be explained by the value of the explanatory variable.

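  6. Linear Correlation • The linear correlation coefficient is a measure of the strength and direction of the linear relation between two quantitative variables • The sample correlation coefficient r is $r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$, where $\bar{x}, \bar{y}$ are the sample means and $s_x, s_y$ are the sample standard deviations • This should be computed with software (and not by hand) whenever possible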
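Following the advice to let software do the work, here is a minimal Python sketch of the definition above. It uses only the standard library's `statistics.mean` and `statistics.stdev`; `stdev` uses the n − 1 denominator, which matches $s_x$ and $s_y$. The data are the four (x, y) pairs from the worked example on slide 10. The function name `sample_correlation` is illustrative. On Python 3.10 or later, `statistics.correlation` computes the same Pearson r directly.

```python
import statistics
from typing import Sequence


def sample_correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """r = (1/(n-1)) * sum(z_x * z_y), with z-scores built from the sample
    means and sample standard deviations (n - 1 denominator)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


x = [1, 1, 3, 5]
y = [2, 8, 6, 4]
print(round(sample_correlation(x, y), 3))  # -0.135
```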

  7. Answers ‘How Strong Is the Linear Relationship Between Two Variables?’ • Uses the coefficient of correlation • The population correlation coefficient is denoted ρ (rho); the sample coefficient is r • Values range from -1 to +1 • Measures the degree of association • The sign of r indicates the direction of the relationship: positive, the two variables tend to increase together; negative, as one variable increases, the other is likely to decrease • Used mainly for understanding

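  8. [Figure: scale of r running from -1.0 (perfect negative correlation) through -.5, 0 (no correlation), and +.5 to +1.0 (perfect positive correlation). Moving from 0 toward -1.0 shows an increasing degree of negative correlation; moving from 0 toward +1.0 shows an increasing degree of positive correlation.]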

  9. [Figure: example scatter diagrams labeled Strong negative (r = –.8), Strong positive (r = .8), Moderate negative (r = –.5), Moderate positive (r = .5), Very weak (r = .1), and Very weak (r = –.1).] • Examples of positive correlation • Examples of negative correlation • In general, if the correlation is visible to the eye, it is likely to be strong

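  10. Data (x, y): (1, 2), (1, 8), (3, 6), (5, 4). Shortcut formula: $r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}}$. Here $r = \frac{4(48) - (10)(20)}{\sqrt{4(36) - (10)^2}\,\sqrt{4(120) - (20)^2}} = \frac{-8}{59.329} \approx -0.135$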
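The shortcut formula needs only the five sums, which is why it was convenient for hand calculation. This minimal Python sketch (standard library only; the function name `r_shortcut` is illustrative) reproduces the slide's arithmetic for the same four data points.

```python
import math


def r_shortcut(xs, ys):
    """Shortcut (computational) formula for r, built from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y                  # 4(48) - (10)(20) = -8
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)       # sqrt(44)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))    # sqrt(80)
    return numerator / denominator


x = [1, 1, 3, 5]
y = [2, 8, 6, 4]
print(round(r_shortcut(x, y), 3))  # -0.135
```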

  11. Correlation is not causation! • Just because two variables are correlated does not mean that one causes the other to change • There is a strong correlation between shoe sizes and vocabulary sizes for grade school children • Clearly larger shoe sizes do not cause larger vocabularies • Clearly larger vocabularies do not cause larger shoe sizes • Often lurking variables result in confounding

  12. Summary: Chapter 10 – Section 1 • Visual methods • Scatter diagrams • Analogous to histograms for single variables • Numeric methods • Linear correlation coefficient • Analogous to mean and variance for single variables • Care should be taken in the interpretation of linear correlation (nonlinearity and causation) • Correlation between two variables can be described with both visual and numeric methods

  13. Chapter 10 – Section 2 • Learning objectives • 1. Find the least-squares regression line and use the line to make predictions and estimations • 2. Interpret the slope and the y-intercept of the least-squares regression line • 3. Compute the sum of squared residuals

  14. If we have two variables X and Y, we often would like to model the relation as a line • Draw a line through the scatter diagram • We want to find the line that “best” describes the linear relationship … the regression line (“The Best Fit”)

  15. Linear Equations • We want to use a linear model • Linear models can be written in several different (equivalent) ways • $y = mx + b$ • $y - y_1 = m(x - x_1)$ • $y = b_0 + b_1x$ • Because the slope and the intercept are important to analyze, we will use $y = b_0 + b_1x$
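The forms are equivalent because expanding the point-slope form gives the slope-intercept form directly. The slope is $b_1 = m$, and the intercept is $b_0 = b = y_1 - m x_1$. The point (2, 13) and slope 4 in the last line are illustrative values, not from the slides.

```latex
\begin{aligned}
y - y_1 &= m(x - x_1) \\
y &= (y_1 - m x_1) + m x \quad\Longrightarrow\quad b_0 = y_1 - m x_1,\; b_1 = m \\
\text{e.g. } y - 13 &= 4(x - 2) \;\Longrightarrow\; y = 5 + 4x
\end{aligned}
```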

  16. Linear Equations BMCC PROFESSOR

  17. The residual [Figure: scatter diagram showing the model line, the x value of interest, the observed value y, the predicted value $\hat{y}$, and the residual between them.] • One difference between math and statistics is that statistics assumes the measurements are not exact: there is an error, or residual • The formula for the residual is always • Residual = Observed – Predicted = $y - \hat{y}$ • On the scatter diagram, the residual is the vertical distance between the observed point and the line • The equation for the least-squares regression line is given by $\hat{y} = b_0 + b_1x$ • $b_1$ is the slope of the least-squares regression line (marginal change) • $b_0$ is the y-intercept of the least-squares regression line
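Residual = observed − predicted is a one-line computation. This minimal Python sketch applies it to the data and line from slide 18 (x = 1, 2, 4, 5; y = 4, 24, 8, 32; $\hat{y} = 5 + 4x$). The helper names `predict` and `residuals` are illustrative, not from the slides.

```python
def predict(b0, b1, x):
    """Predicted value y-hat on the line y-hat = b0 + b1 * x."""
    return b0 + b1 * x


def residuals(xs, ys, b0, b1):
    """Residual = observed - predicted, one value per data point."""
    return [y - predict(b0, b1, x) for x, y in zip(xs, ys)]


x = [1, 2, 4, 5]
y = [4, 24, 8, 32]
print(residuals(x, y, b0=5, b1=4))  # [-5, 11, -13, 7]
```

A positive residual means the point lies above the line, and a negative residual means it lies below.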

  18. Least-Squares Property A straight line satisfies this property if the sum of the squares of the residuals is the smallest sum possible. Example data: x = 1, 2, 4, 5 and y = 4, 24, 8, 32, with the line $\hat{y} = 5 + 4x$.
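The least-squares property can be checked numerically. In this minimal Python sketch, `sse` (an illustrative name) computes the sum of squared residuals for the slide's data. A brute-force search over a coarse grid of intercepts and slopes finds no line with a smaller sum than $\hat{y} = 5 + 4x$, whose sum is 364. The grid search is an assumption used only for this check; it is not how the regression line is actually computed.

```python
def sse(xs, ys, b0, b1):
    """Sum of squared residuals for the line y-hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


x = [1, 2, 4, 5]
y = [4, 24, 8, 32]

print(sse(x, y, 5, 4))  # 364

# Brute-force check: try every line on a coarse grid of intercepts and slopes
# (step 0.5) and keep the one with the smallest sum of squared residuals.
grid_b0 = [k * 0.5 for k in range(0, 21)]  # 0.0 .. 10.0
grid_b1 = [k * 0.5 for k in range(0, 17)]  # 0.0 .. 8.0
best = min((sse(x, y, b0, b1), b0, b1) for b0 in grid_b0 for b1 in grid_b1)
print(best)  # (364.0, 5.0, 4.0)
```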

  19. n(xy) – (x) (y) b1 = (slope) n(x2) – (x)2 b0 =y – b1x (y-intercept) (slope of the least-squares regression line) (Shorcut) calculators or computers can compute these values

  20. Finding the values of b1 and b0, by hand, is a very tedious process • You should use software for this • Finding the coefficients b1 and b0 is only the first step of a regression analysis • We need to interpret the slope b1 • We need to interpret the y-intercept b0 • We need to do quite a bit more statistical analysis … this is covered in Section 4.3 and also in Chapter 14

  21. Data x 1 2 1 8 3 6 5 4 y n(xy) – (x) (y) b1 = b0 =y – b1x 5 – (–0.181818)(2.5) = 5.45 n(x2) –(x)2 4(48) – (10) (20) b1 = 4(36) – (10)2 –8 b1 = = –0.181818 44 n= 4 x = 10 y = 20 x2 = 36 y2 = 120 xy = 48 Theestimated equation of the regression line is: ^ y= 5.45 – 0.182x

  22. Guidelines for Using The Regression Equation 1. If there is no significant linear correlation, don’t use the regression equation to make predictions. 2. When using the regression equation for predictions, stay within the scope of the available sample data. 3. A regression equation based on old data is not necessarily valid now. 4. Don’t make predictions about a population that is different from the population from which the sample data was drawn.
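Guideline 2 can be enforced mechanically. This hypothetical helper (not from the slides) computes $\hat{y}$ from the fitted line for slide 21's data, $\hat{y} = 5.45 - 0.182x$. It raises an error when asked to predict outside the range of observed x values (1 to 5). It does not address guideline 1, which would require a significance test for the correlation.

```python
def predict_within_scope(b0, b1, x_new, x_observed):
    """Hypothetical helper: predict y-hat = b0 + b1 * x_new, but refuse to
    extrapolate outside the range of x values in the sample (guideline 2)."""
    lo, hi = min(x_observed), max(x_observed)
    if not lo <= x_new <= hi:
        raise ValueError(f"x = {x_new} is outside the sample range [{lo}, {hi}]")
    return b0 + b1 * x_new


x_sample = [1, 1, 3, 5]
print(round(predict_within_scope(5.45, -0.182, 4, x_sample), 3))  # 4.722
```

Calling `predict_within_scope(5.45, -0.182, 10, x_sample)` raises `ValueError`, because x = 10 lies outside the observed range.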
