Linear Regression

alaina

Presentation Transcript


  1. Linear Regression Modeling with Data

  2. The BIG Question Did you prepare for today? If you did, mark yes and estimate the amount of time you spent preparing on your frequency log.

  3. Problem Suppose we are given the following data about father and son heights to analyze. What can we conclude about it?

  4. Connect Is there anything we have studied that can help you think about where to start? How about formulating a hypothesis to investigate, such as: Is there a correlation between a father’s height and his son’s height? Ha: There is a correlation between a father’s height and his son’s height. H0: There is no correlation between a father’s height and his son’s height.

  5. Definitions For a problem such as this one, we are trying to determine if there is a relationship between two variables. This is called a correlation. The data can be represented as ordered pairs (x, y). Does anyone recall what the x and y are called? The x-variable is the independent (or explanatory) variable and the y-variable is the dependent (or response) variable. This is similar to the concepts you have seen in algebra. In our example, the father’s height is the independent variable and the son’s height is the dependent variable.

  6. Scatter plot A scatter plot is a plot of the ordered pairs (x, y), used to see what kind of correlation two variables might have. Example 1: What kind of correlation would you guess these data sets to have? [Figure: four scatter plots labeled Negative Linear Correlation, Nonlinear Correlation, No Correlation, and Positive Linear Correlation]

  7. Father and Son Data Scatter plot Using SPSS, I loaded the father and son height data and generated a scatter plot. [Figure: scatter plot of the father and son height data] What kind of correlation does it look like it might have? It looks like a positive linear relationship.

  8. Question Is there a way we can calculate whether there is a correlation and how strong it might be? The correlation coefficient, denoted as r, gives us a measure of the strength and direction of a linear relationship between two variables. The population correlation coefficient is denoted as ρ. How do we calculate the correlation coefficient? The formula is: r = (nΣxy - (Σx)(Σy)) / (√(nΣx² - (Σx)²) · √(nΣy² - (Σy)²)), where n is the number of data pairs.
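As a minimal sketch of how the correlation coefficient can be computed by hand, the function below assumes the standard computational formula for the sample correlation coefficient, r = (nΣxy - ΣxΣy) / (√(nΣx² - (Σx)²) · √(nΣy² - (Σy)²)). The function name `pearson_r` and the sample pairs are illustrative assumptions; they are not the father and son data, which the transcript does not reproduce.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired data (x, y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")

    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    # The denominator is zero if either variable is constant; r is undefined then.
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Made-up illustrative pairs, NOT the father and son data from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 3))  # prints 0.775
```

A value near 1 points to a strong positive linear relationship, a value near -1 to a strong negative one, and a value near 0 to no linear relationship.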

  9. What is the correlation coefficient for the father and son data? Using SPSS we have the following output: [Figure: SPSS correlation output, with the correlation coefficient .668 highlighted] What is the range for the correlation coefficient? It ranges from -1 to 1. [Figure: number line from -1 to 1 marking about where .668 is] If r is close to 0, there is no linear correlation. If r = -1, there is a perfect negative correlation. If r = 1, there is a perfect positive correlation.

  10. Analysis Since the correlation coefficient is .668, there seems to be a positive linear relationship between a father’s height and his son’s height. However, is this sample relationship strong enough to conclude that the population correlation coefficient ρ is not zero? We would use r as the test statistic and could use the standardized test statistic t with n - 2 degrees of freedom. How do we calculate the t statistic here?
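One common way to get the standardized test statistic, sketched here under the assumption that the usual formula t = r / √((1 - r²) / (n - 2)) is the one intended, is shown below. The function name `t_statistic` is illustrative; r = .668 comes from the slides, and n = 11 is the sample size implied by the degrees of freedom the slides use (11 - 2 = 9).

```python
import math


def t_statistic(r, n):
    """Standardized test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need more than 2 data pairs")
    return r / math.sqrt((1 - r ** 2) / (n - 2))


# r is the value reported on the slides; n = 11 is implied by their 9 degrees of freedom.
t = t_statistic(0.668, 11)
print(round(t, 2))  # prints 2.69
```

This agrees with the SPSS t-value of 2.690 quoted in the slides; the small difference in the third decimal place would be explained by r being rounded to .668.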

  11. Hypothesis testing for significance To test the null hypothesis that there is no linear relationship between the independent and dependent variables, we would use the hypotheses: H0: ρ = 0, Ha: ρ ≠ 0, with α = .05. Degrees of freedom would be 11 - 2 = 9. Thus at the .05 level of significance, the two-tailed rejection regions are t < -t0 = -2.262 and t > t0 = 2.262.

  12. Calculate and Summarize By running a model analysis in SPSS, we find that the t-value is 2.690. At the .05 level of significance, the test statistic lies inside the rejection region, which starts at 2.262. Thus there is enough evidence to reject the null hypothesis and conclude there is a significant linear correlation between a father’s height and his son’s height.
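The decision rule used here can be written out as a short check. This is only a sketch: the helper name `is_significant` is hypothetical, and both numbers are taken directly from the slides rather than recomputed.

```python
def is_significant(t_stat, t_crit):
    """Two-tailed test: reject H0 when |t| exceeds the critical value."""
    return abs(t_stat) > t_crit


t_stat = 2.690  # t-value reported from SPSS on the slides
t_crit = 2.262  # critical value from the slides (alpha = .05, 9 degrees of freedom)
print(is_significant(t_stat, t_crit))  # prints True
```

Taking the absolute value covers both tails, so a t-value of -2.690 would lead to the same conclusion.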

  13. Finding the Regression Line Now that we know that there is a significant linear correlation between a father’s height and his son’s height, we can find the regression line. The regression line is the line that best models the data. It can be used to predict the value of y given a value of x. SPSS draws the regression line on the scatter plot: [Figure: SPSS scatter plot with the fitted regression line]

  14. Question Can we find the exact equation of the regression line? Yes, the equation is similar to the equation of a line from algebra. Who recalls the equation of a line?
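As a hedged sketch of where this question leads, the regression line can be written like the algebra line y = mx + b, with the slope and intercept chosen by least squares: m = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²) and b = ȳ - m·x̄. The function names and the sample pairs below are illustrative assumptions; they are not the father and son data or the SPSS output.

```python
def regression_line(xs, ys):
    """Least-squares slope m and intercept b for the line y = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")

    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # The slope is undefined (division by zero) if every x value is the same.
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * (sum_x / n)  # the line passes through (mean x, mean y)
    return m, b


def predict(m, b, x):
    """Predicted y for a given x on the fitted line."""
    return m * x + b


# Made-up illustrative pairs, NOT the father and son data from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, b = regression_line(xs, ys)
print(round(m, 2), round(b, 2))      # prints 0.6 2.2
print(round(predict(m, b, 6), 2))    # prints 5.8
```

Predictions are most trustworthy for x values inside, or close to, the range of the observed data.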
