
Relationships

Presentation Transcript


  1. Relationships
     • If we are doing a study which involves more than one variable, how can we tell if there is a relationship between two (or more) of the variables?
     • Association Between Variables: Two variables measured on the same individuals are associated if some values of one variable tend to occur more often with some values of the second variable than with other values of that variable.
     • Response Variable: A response variable measures an outcome of a study.
     • Explanatory Variable: An explanatory variable explains or causes changes in the response variable.

  2. §2.1: Scatterplots
     • A scatterplot shows the relationship between two variables.
     • The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis.
     • Always plot the explanatory variable on the horizontal axis, and the response variable on the vertical axis.
       Example: If we are going to try to predict someone’s weight from their height, then the height is the explanatory variable, and the weight is the response variable.
     • The explanatory variable is often denoted by the variable x, and is sometimes called the independent variable.
     • The response variable is often denoted by the variable y, and is sometimes called the dependent variable.

  3. Scatterplots
     Example: Do you think that a father’s height would affect a son’s height? That is, given a father’s height, can we make any determinations about the son’s height?
     The explanatory variable is: the father’s height
     The response variable is: the son’s height
     Data Set (heights in inches):

       Father’s Height   Son’s Height
             64               65
             68               67
             68               70
             70               72
             72               75
             74               70
             75               73
             75               76
             76               77
             77               76
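
The plotting rule above (explanatory variable on the horizontal axis, response on the vertical axis) can be tried out with a rough text plot. This is only a sketch: `ascii_scatter` is a hypothetical helper written for this illustration, and it assumes whole-number heights with two-digit axis labels.

```python
# Father/son height pairs (inches), copied from the data set above.
fathers = [64, 68, 68, 70, 72, 74, 75, 75, 76, 77]
sons = [65, 67, 70, 72, 75, 70, 73, 76, 77, 76]

def ascii_scatter(xs, ys):
    """Text scatterplot: xs on the horizontal axis, ys on the vertical axis.

    Hypothetical helper for illustration; assumes integer data and
    two-digit axis labels.
    """
    points = set(zip(xs, ys))
    width = max(xs) - min(xs) + 1
    lines = []
    for y in range(max(ys), min(ys) - 1, -1):  # top row holds the largest y
        row = "".join("*" if (x, y) in points else " "
                      for x in range(min(xs), max(xs) + 1))
        lines.append(f"{y:3d} |{row}".rstrip())
    lines.append("    +" + "-" * width)
    # Label only the two ends of the horizontal axis.
    lines.append("     " + str(min(xs)) + " " * (width - 4) + str(max(xs)))
    return "\n".join(lines)

plot = ascii_scatter(fathers, sons)
print(plot)
```

Each `*` is one father/son pair; the points drift upward from left to right.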

  4. [Scatterplot of the father and son height data. Horizontal axis: Explanatory Variable (Father’s Height); vertical axis: Response Variable (Son’s Height); both axes marked at 64, 68, 72, 76.]

  5. [Scatterplot of the father and son height data with the axes labeled Father and Son; both axes marked at 64, 68, 72, 76.]

  6. Examining A Scatterplot
     • In any graph of data, look for the overall pattern and for striking deviations from that pattern.
     • You can describe the overall pattern of a scatterplot by the form, direction, and strength of the relationship.
     • Strength: How closely the points follow a clear form.
     • An important kind of deviation is an outlier, an individual that falls outside the overall pattern of the relationship.
     • Two variables are positively associated when above-average values of one tend to accompany above-average values of the other and below-average values also tend to occur together.
     • Two variables are negatively associated when above-average values of one accompany below-average values of the other, and vice versa.
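
The definitions of positive and negative association above can be checked directly on the father and son heights. A minimal sketch, assuming it is enough to count how many pairs fall on the same side of both averages (the names `same_side` and `opposite_side` are made up for this illustration):

```python
from statistics import mean

# Father/son height pairs (inches) from the earlier data set.
fathers = [64, 68, 68, 70, 72, 74, 75, 75, 76, 77]
sons = [65, 67, 70, 72, 75, 70, 73, 76, 77, 76]

x_bar, y_bar = mean(fathers), mean(sons)

# A pair supports a positive association when both heights are above
# average or both are below average (product of deviations > 0).
same_side = sum(1 for x, y in zip(fathers, sons)
                if (x - x_bar) * (y - y_bar) > 0)
# A pair supports a negative association when one height is above
# average and the other is below (product of deviations < 0).
opposite_side = sum(1 for x, y in zip(fathers, sons)
                    if (x - x_bar) * (y - y_bar) < 0)

print(same_side, opposite_side)
```

Most pairs land on the same side of both averages, which is what a positive association looks like in the numbers.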

  7. Direction
     Type of associations between X and Y.
     1. Two variables are positively associated if small values of X are associated with small values of Y, and if large values of X are associated with large values of Y. There is an upward trend from left to right.

  8. Positive Association [Figure: scatterplot of Y against X.]

  9. Direction Type of associations between X and Y. 2. Two variables are negatively associated if small values of one variable are associated with large values of the other variable, and vice versa. There is a downward trend from left to right.

  10. Negative Association [Figure: scatterplot of Y against X.]

  11. Form Describe the type of trend between X and Y. 1. Linear - points fall close to a straight line.

  12. Linear Association [Figure: scatterplot of Y against X.]

  13. Form Describe the type of trend between X and Y. 1. Linear - points fall close to a straight line. 2. Quadratic - points follow a parabolic pattern.

  14. Quadratic Association [Figure: scatterplot of Y against X.]

  15. Form Describe the type of trend between X and Y. 1. Linear - points fall close to a straight line. 2. Quadratic - points follow a parabolic pattern. 3. Exponential - points follow a curved pattern.

  16. Exponential Growth [Figure: scatterplot of Y against X.]

  17. Strength
      Measures the amount of scatter around the general trend.
      Linear: The closer the points fall to a straight line, the stronger the relationship between the two variables.

  18. Strong Association [Figure: scatterplot of Y against X.]

  19. Moderate Association [Figure: scatterplot of Y against X.]

  20. Weak Association [Figure: scatterplot of Y against X.]

  21. Examining A Scatterplot
      [Scatterplot of Son against Father; both axes marked at 64, 68, 72, 76.]
      Consider the previous scatterplot:
      Direction: Going up
      Form: Linear
      Association: Positive
      Strength: Strong
      Outliers: None

  22. Example: The following is a scatterplot of data collected from states about students taking the SAT. The question is whether the percentage of students from a state that takes the test will influence the state’s average scores. For instance, in California, 45% of high school graduates took the SAT and the mean verbal score was 495.
      Direction: Downward
      Form: Curved
      Association: Negative
      Strength: Strong
      Outliers: Maybe

  23. §2.2: Correlation
      Recall that a scatterplot displays the form, direction, and strength of the relationship between two quantitative variables. Linear relationships are important because they are the easiest to model, and are fairly common. We say a linear relationship is strong if the points lie close to a straight line, and weak if the points are widely scattered around the line.
      Correlation (r) measures the direction and the strength of the linear relationship between two quantitative variables.
      The sign (+ or -) denotes a positive or negative association.
      The numeric value shows the strength. If the relationship is strong, then r will be close to 1 or -1. If the relationship is weak, then r will be close to 0.

  24. Correlation Examples [Figure: six scatterplot panels titled Correlation = 0, Correlation = -0.3, Correlation = -0.7, Correlation = 0.5, Correlation = -0.99, Correlation = 0.9.]

  25. Which has the better correlation?

  26. Correlation
      So, how do we find the correlation?
      Suppose we have data on variables x and y for n individuals. The means and standard deviations of the two variables are $\bar{x}$ and $s_x$ for the x-values, and $\bar{y}$ and $s_y$ for the y-values.
      $r = \frac{1}{n-1} \sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$
      Question: Will outliers affect the correlation?

  27. Example: Recall the scatterplot data for the heights of fathers and their sons.

      Father’s Height   Son’s Height   Father’s Height   Son’s Height
            64               65              74               70
            68               67              75               73
            68               70              75               76
            70               72              76               77
            72               75              77               76

      We decided that the fathers’ heights were the explanatory variable and the sons’ heights were the response variable.
      The average of the x terms is 71.9 and the standard deviation is 4.25.
      The average of the y terms is 72.1 and the standard deviation is 4.07.
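
The averages and standard deviations quoted above can be reproduced with Python's standard `statistics` module. This is a sketch for checking the numbers, not part of the original slides; `stdev` is the sample standard deviation (divisor n - 1), which is the version that matches the quoted values.

```python
from statistics import mean, stdev

# Father/son height pairs (inches) from the example.
fathers = [64, 68, 68, 70, 72, 74, 75, 75, 76, 77]  # x, explanatory
sons = [65, 67, 70, 72, 75, 70, 73, 76, 77, 76]     # y, response

x_bar, s_x = mean(fathers), stdev(fathers)  # stdev divides by n - 1
y_bar, s_y = mean(sons), stdev(sons)

print(round(x_bar, 1), round(s_x, 2))
print(round(y_bar, 1), round(s_y, 2))
```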

  28. $r = \frac{1}{n-1} \sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$

      $x_i$   $x_i - \bar{x}$   $\frac{x_i - \bar{x}}{s_x}$   $y_i$   $y_i - \bar{y}$   $\frac{y_i - \bar{y}}{s_y}$
       64        -7.9              -1.86                 65        -7.1              -1.75
       68        -3.9              -0.92                 67        -5.1              -1.25
       68        -3.9              -0.92                 70        -2.1              -0.52
       70        -1.9              -0.45                 72        -0.1              -0.02
       72         0.1               0.02                 75         2.9               0.71
       74         2.1               0.49                 70        -2.1              -0.52
       75         3.1               0.73                 73         0.9               0.22
       75         3.1               0.73                 76         3.9               0.95
       76         4.1               0.96                 77         4.9               1.20
       77         5.1               1.20                 76         3.9               0.95

  29. Using the standardized values $\frac{x_i - \bar{x}}{s_x}$ and $\frac{y_i - \bar{y}}{s_y}$ from the table on the previous slide:
      $r = \frac{1}{10 - 1} \left[ (-1.86)(-1.75) + (-0.92)(-1.25) + \cdots + (1.20)(0.95) \right] = \frac{1}{9} \left[ (3.24) + (1.14) + \cdots + (1.14) \right] = 0.87$
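
The hand calculation above can be repeated in a few lines of Python. This is a minimal sketch of the standardize-multiply-average recipe; the function name `correlation` is chosen for this illustration.

```python
from statistics import mean, stdev

fathers = [64, 68, 68, 70, 72, 74, 75, 75, 76, 77]
sons = [65, 67, 70, 72, 75, 70, 73, 76, 77, 76]

def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

r = correlation(fathers, sons)
print(round(r, 2))
```

Because nothing is rounded along the way, the individual products differ slightly in the last digit from the hand-rounded table, but r still rounds to 0.87.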

  30. Shortcut Calculations

  31. Facts about Correlation
      • Correlation makes no distinction between explanatory and response variables. The correlation between x and y is the same as the correlation between y and x.
      • Correlation requires that both variables be quantitative. We cannot compute a correlation between a categorical variable and a quantitative variable or between two categorical variables.
      • r does not change when we change the units of measurement. The correlation between height and weight is the same whether height was measured in feet or centimeters or weight was measured in kilograms or pounds. This happens because all the observations are standardized in the calculation of correlation.
      • The correlation r itself has no unit of measurement; it is just a number.
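
Two of these facts are easy to confirm numerically on the father and son heights: swapping x and y leaves r unchanged, and so does converting inches to centimeters. A sketch, with `correlation` as a helper name assumed for this illustration:

```python
from statistics import mean, stdev

fathers = [64, 68, 68, 70, 72, 74, 75, 75, 76, 77]  # inches
sons = [65, 67, 70, 72, 75, 70, 73, 76, 77, 76]     # inches

def correlation(xs, ys):
    """Average product of standardized values, with divisor n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)

r_inches = correlation(fathers, sons)
r_swapped = correlation(sons, fathers)           # roles of x and y exchanged
fathers_cm = [2.54 * x for x in fathers]         # change of units only
r_cm = correlation(fathers_cm, sons)

print(round(r_inches, 4), round(r_swapped, 4), round(r_cm, 4))
```

All three values agree up to floating-point rounding, because standardizing removes both the units and the labels x and y.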

  32. Exercise
      What’s wrong with these statements?
      1. At AU there is no correlation between the ethnicity of students and their GPA.
      2. The correlation between height and weight of stat202 students
         (a) is 2.61
         (b) is 0.61 inches per pound
         (c) is 0.61, so the correlation between weight and height is -0.61
         (d) is 0.61 using inches and pounds, but converting inches to centimeters would make r > 0.61 (since an inch equals about 2.54 centimeters).

  33. §2.3: Least-Squares Regression
      • Correlation measures the direction and strength of a straight-line (linear) relationship between two quantitative variables.
      • We have tried to summarize the data by drawing a straight line through the data.
      • A regression line summarizes the relationship between two variables.
      • These can only be used in one setting: when one variable helps explain or predict the other variable.

  34. Regression Line
      • A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x. Regression, unlike correlation, requires that we have an explanatory variable and a response variable.
      • If a scatterplot displays a linear pattern, we can describe the overall pattern by drawing a straight line through the points.
      • This is called fitting a line to the data.
      • This is a mathematical model which we can use to make predictions based on the given data.

  35. Example: Recall the data we were using before where we compared the heights of fathers and sons (the same ten father and son pairs as before). The first thing we did was to plot the points.

  36. Example: Recall the data we were using before where we compared the heights of fathers and sons. The line which is closest to all the points is the regression line.
      [Scatterplot of Son against Father with a line drawn through the points; both axes marked at 64, 68, 72, 76.]

  37. Least-Squares Regression Line
      • The least-squares regression line of y on x is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.
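
To see what "as small as possible" means, the sum of squared vertical distances can be computed for a few candidate lines through the father and son data. This is a sketch: the first candidate is the line $\hat{y} = 12.47 + 0.8293x$ that the residuals example later in this transcript uses, and the other three candidates are arbitrary alternatives picked only for comparison.

```python
fathers = [64, 68, 68, 70, 72, 74, 75, 75, 76, 77]
sons = [65, 67, 70, 72, 75, 70, 73, 76, 77, 76]

def sum_squared_errors(a, b, xs, ys):
    """Sum of squared vertical distances from the points to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# label -> (intercept a, slope b); only the first is the fitted line.
candidates = {
    "y = 12.47 + 0.8293x (least-squares line, rounded)": (12.47, 0.8293),
    "y = x (son as tall as father)": (0.0, 1.0),
    "y = 13.47 + 0.8293x (shifted up one inch)": (13.47, 0.8293),
    "y = 72.1 (flat line at the mean son's height)": (72.1, 0.0),
}

sse = {label: sum_squared_errors(a, b, fathers, sons)
       for label, (a, b) in candidates.items()}
for label, value in sse.items():
    print(f"{value:8.2f}  {label}")

best = min(sse, key=sse.get)
print("smallest:", best)
```

Among these candidates the least-squares line has the smallest total, as the definition requires.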

  38. Equation of the Least-Squares Regression Line
      • Imagine we have data on an explanatory variable x and a response y for n individuals.
      • Assume the mean for the explanatory variable is $\bar{x}$ and the standard deviation is $s_x$.
      • Assume the mean for the response variable is $\bar{y}$ and the standard deviation is $s_y$.
      • Assume the correlation between x and y is r.

  39. Equation of the Least-Squares Regression Line
      • The equation of the least-squares regression line of y on x is: $\hat{y} = a + bx$
      • The slope is b: $b = r \frac{s_y}{s_x}$
      • The intercept is a: $a = \bar{y} - b\bar{x}$

  40. Interpretation of Regression Coefficients
      • The y-intercept, a, is the value of the response variable, y, when the explanatory variable, x, is zero.
      • The slope, b, is the change in the response variable, y, for a unit increase in the explanatory variable, x.

  41. Example: What if we want to find the least-squares regression line where we will predict the son’s height from the father’s height?
      Note: The fathers’ heights are the explanatory variable, and the sons’ heights are the response variable.
      We need the means and the standard deviations:
      The average of the x terms is 71.9 and the standard deviation is 4.25.
      The average of the y terms is 72.1 and the standard deviation is 4.07.
      The correlation between the two variables is 0.87.

  42. Given: r = 0.87, $\bar{x} = 71.9$, $s_x = 4.25$, $\bar{y} = 72.1$, $s_y = 4.07$
      The equation for the regression line is: $\hat{y} = a + bx$
      We need to find the slope: $b = r \frac{s_y}{s_x} = 0.87 \left( \frac{4.07}{4.25} \right) = 0.8331529$
      Next, find the intercept: $a = \bar{y} - b\bar{x} = 72.1 - (0.8331529)(71.9) = 12.196307$
      So, the equation for the regression line is: $\hat{y} = 12.2 + 0.83x$
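
The slope and intercept formulas can be evaluated twice: once with the rounded summary statistics used on this slide, and once with statistics computed from the ten data pairs without intermediate rounding. This is a sketch; variable names such as `b_rounded` and `b_full` are made up for the comparison.

```python
from statistics import mean, stdev

# 1) Rounded summary statistics, as quoted on the slide.
r, x_bar, s_x, y_bar, s_y = 0.87, 71.9, 4.25, 72.1, 4.07
b_rounded = r * s_y / s_x              # slope b = r * s_y / s_x
a_rounded = y_bar - b_rounded * x_bar  # intercept a = y_bar - b * x_bar

# 2) Same formulas, with every statistic computed from the raw data.
fathers = [64, 68, 68, 70, 72, 74, 75, 75, 76, 77]
sons = [65, 67, 70, 72, 75, 70, 73, 76, 77, 76]
n = len(fathers)
mx, my = mean(fathers), mean(sons)
sx, sy = stdev(fathers), stdev(sons)
r_full = sum(((x - mx) / sx) * ((y - my) / sy)
             for x, y in zip(fathers, sons)) / (n - 1)
b_full = r_full * sy / sx
a_full = my - b_full * mx

print(f"rounded inputs:   y-hat = {a_rounded:.2f} + {b_rounded:.4f}x")
print(f"unrounded inputs: y-hat = {a_full:.2f} + {b_full:.4f}x")
```

The two versions differ a little in both coefficients. That is why the residuals example later in the transcript writes the line as 12.47 + 0.8293x rather than 12.2 + 0.83x.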

  43. Making Predictions
      We can use the regression line to make some predictions.
      Example: Based on the previous data, we can predict the son’s height from the father’s height, using $\hat{y} = 12.2 + 0.83x$.
      Q: If the father’s height is 70 inches, what is our prediction for the son’s height?
      A: $\hat{y} = 12.2 + 0.83(70) = 70.3$
      Note: These predictions are only good on relevant data (for example, fathers’ heights near the observed range of 64 to 77 inches)!
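
A prediction is just the line evaluated at a chosen x. The sketch below also flags whether x lies inside the range of observed fathers' heights, which is one reading (an assumption of this sketch) of the slide's warning about relevant data. Both function names are hypothetical.

```python
# Observed fathers' heights (inches) from the example data.
fathers = [64, 68, 68, 70, 72, 74, 75, 75, 76, 77]

def predict_son_height(father_height, a=12.2, b=0.83):
    """Evaluate the regression line y-hat = a + b*x from the slide."""
    return a + b * father_height

def within_observed_range(father_height, observed=fathers):
    """True when x lies between the smallest and largest observed x."""
    return min(observed) <= father_height <= max(observed)

for height in (70, 90):
    where = "inside" if within_observed_range(height) else "outside"
    print(f"father {height} in -> predicted son "
          f"{predict_son_height(height):.1f} in ({where} the observed range)")
```

The line still returns a number for a 90-inch father, but no father that tall was observed, so nothing in the data supports that prediction.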

  44. [Scatterplot of Son against Father with the regression line $\hat{y} = 12.2 + 0.83x$; both axes marked at 64, 68, 72, 76.]

  45. Notes On Regression
      • The point $(\bar{x}, \bar{y})$ is always on the regression line.
      • $b = r \frac{s_y}{s_x}$. This equation says that a change of one standard deviation in x corresponds to a change of r standard deviations in y.
      • The square of the correlation, $r^2$, is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.
      Example: For the fathers’ and sons’ heights, $r^2 = (0.87)^2 = 0.7569$, so the straight-line relationship with the father’s height explains about 76% of the variation in the sons’ heights.
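
Both notes can be verified on the height data. The sketch below fits the line from unrounded statistics, checks that it passes through the point of means, and compares $r^2$ with the explained fraction computed directly as 1 - SSE/SST. The names `sse` and `sst` are labels chosen here for the leftover and total squared variation in y.

```python
from statistics import mean, stdev

fathers = [64, 68, 68, 70, 72, 74, 75, 75, 76, 77]
sons = [65, 67, 70, 72, 75, 70, 73, 76, 77, 76]

n = len(fathers)
x_bar, y_bar = mean(fathers), mean(sons)
s_x, s_y = stdev(fathers), stdev(sons)
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(fathers, sons)) / (n - 1)

b = r * s_y / s_x      # slope
a = y_bar - b * x_bar  # intercept

# Note 1: the line passes through the point of means.
passes_through_means = abs((a + b * x_bar) - y_bar) < 1e-9

# Note 2: r^2 equals the fraction of the variation in y that is explained.
sst = sum((y - y_bar) ** 2 for y in sons)                           # total
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(fathers, sons))    # leftover
explained_fraction = 1 - sse / sst

print(passes_through_means, round(r ** 2, 4), round(explained_fraction, 4))
```

With unrounded r the fraction comes out near 0.75, slightly below the 0.7569 obtained from r rounded to 0.87.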

  46. Residuals Analysis
      • A regression line is a mathematical model for the overall pattern of a linear relationship between an explanatory variable and a response variable.
      • The regression line is chosen so that the sum of the squared vertical distances from the points to the line is as small as possible.
      • A residual is the difference between an observed value of the response variable and the value predicted by the regression line.
      Residual = observed y - predicted y, that is, Residual = $y - \hat{y}$

  47. Example of Residuals
      Go back to our favorite example: the heights of the ten fathers and their sons.

  48. Example of Residuals
      Go back to our favorite example, the father and son height data.
      We found the regression line to be: $\hat{y} = 12.47 + 0.8293x$ (these coefficients come from carrying r, $s_x$, and $s_y$ at full precision; the rounded inputs used earlier gave $\hat{y} = 12.2 + 0.83x$).
      So, when the father’s height is 64 inches, we expect the son to be how tall?
      $\hat{y} = 12.47 + (0.8293)(64) = 65.5452$
      However, the actual height of the son is 65 inches, so the residual is: 65 - 65.5452 = -0.5452
      This tells us the point is 0.5452 units below the regression line.

  49. $\hat{y} = 12.47 + 0.8293x$, Residual = $y - \hat{y}$

      Son’s Height   Predicted Height   Residual
           65            65.5452         -0.5452
           67            68.8624         -1.8624
           70            68.8624          1.1376
           72            70.521           1.479
           75            72.1796          2.8204
           70            73.8382         -3.8382
           73            74.6675         -1.6675
           76            74.6675          1.3325
           77            75.4968          1.5032
           76            76.3261         -0.3261

      Average residual = 0.0033
      • Again, we could have drawn our line anywhere on the graph.
      • The least-squares regression line has the property that the mean of the least-squares residuals is always zero! (The average above is not exactly zero only because the coefficients of the line were rounded.)
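
The residual table can be recomputed in a few lines. This sketch uses the rounded line 12.47 + 0.8293x from the slide, then repeats the calculation with unrounded coefficients to show the mean residual collapsing to zero. The unrounded slope is written as a ratio of sums, which is algebraically the same as $r \frac{s_y}{s_x}$.

```python
from statistics import mean

fathers = [64, 68, 68, 70, 72, 74, 75, 75, 76, 77]
sons = [65, 67, 70, 72, 75, 70, 73, 76, 77, 76]

# Residual = observed y - predicted y, using the slide's rounded line.
a, b = 12.47, 0.8293
residuals = [y - (a + b * x) for x, y in zip(fathers, sons)]
print([round(e, 4) for e in residuals])
print("mean residual (rounded line):", round(mean(residuals), 4))

# Same calculation with unrounded least-squares coefficients.
x_bar, y_bar = mean(fathers), mean(sons)
b_exact = (sum((x - x_bar) * (y - y_bar) for x, y in zip(fathers, sons))
           / sum((x - x_bar) ** 2 for x in fathers))
a_exact = y_bar - b_exact * x_bar
exact_residuals = [y - (a_exact + b_exact * x) for x, y in zip(fathers, sons)]
print("mean residual (unrounded line):", abs(mean(exact_residuals)) < 1e-9)
```

The small nonzero average from the rounded line is a rounding effect, not a failure of the zero-mean property.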
