Simple Linear Regression and Correlation: Inferential Methods Chapter 13 AP Statistics Peck, Olsen and Devore
Topic 2: Summary of Bivariate Data • In Topic 2 we discussed summarizing bivariate data • Specifically we were interested in summarizing linear relationships between two measurable characteristics • We summarized these linear relationships by performing a linear regression using the method of least squares
Least Squares Regression • Graphically display the data in a scatterplot: form, strength and direction • Calculate Pearson’s correlation coefficient: the strength of the linear association • Perform the least squares regression • Inspect the residual plot to determine if the model is appropriate: no patterns • Determine the coefficient of determination: how good is the model as a prediction tool • Use the model as a prediction tool
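The numerical steps in this checklist can be sketched in a few lines of Python. This is only an illustration: the five (x, y) pairs are hypothetical data made up for the example, and names such as `s_xx` and `r_squared` are our own choices, not notation from the slides.

```python
import math

# Hypothetical illustrative data (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products about the means.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = s_xy / math.sqrt(s_xx * s_yy)   # Pearson's correlation coefficient
b = s_xy / s_xx                     # least squares slope
a = y_bar - b * x_bar               # least squares intercept

# Residuals: observed y minus predicted y; plot these against x to check for patterns.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Coefficient of determination: share of the variation in y explained by the line.
r_squared = 1 - sum(res ** 2 for res in residuals) / s_yy

print(f"r = {r:.4f}, y-hat = {a:.2f} + {b:.2f}x, r^2 = {r_squared:.2f}")
```

The scatterplot and residual plot still have to be inspected by eye; the code only covers the arithmetic.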
Interpretation • Pearson’s correlation coefficient $r$ • Coefficient of determination $r^2$ • Variables in $\hat{y} = a + bx$ • Standard deviation of the residuals $s_e$
Simple Linear Regression Model • ‘Simple’ because we had only one independent variable • We interpreted $\hat{y}$ as a predicted value of y given a specific value of x • When $y = \alpha + \beta x$, we can describe this as a deterministic model. That is, the value of y is completely determined by a given value of x • That wasn’t really the case when we used our linear regressions. The value of y was equal to our predicted value +/- some amount. That is, $y = \alpha + \beta x + e$, where e is a random deviation. We call this a probabilistic model. • So, without e, the (x,y) pairs (observed points) would fall exactly on the regression line.
Now consider this … • How did we calculate the coefficients in our linear regression models? • We were actually estimating population parameters using a sample. That is, the least squares line $\hat{y} = a + bx$ is an estimate for the population regression line $y = \alpha + \beta x$ • We can consider $a$ and $b$ estimates for $\alpha$ and $\beta$
Basic Assumptions for the Simple Linear Regression Model • The distribution of e at any particular value of x has a mean value of 0. That is, $\mu_e = 0$ • The standard deviation of e is the same for any value of x. It is always denoted by $\sigma$ • The distribution of e at any value of x is normal • The random deviations are independent of one another.
Another interpretation of $\alpha + \beta x$ • Consider $y = \alpha + \beta x + e$, where the coefficients are fixed and e is distributed normally. For a given x, $\alpha + \beta x$ is a fixed number, and the sum of a fixed number and a normally distributed variable is normally distributed (Chapter 7). So y is normally distributed. • Now the mean of y will be equal to $\alpha + \beta x$ plus the mean of e, which is equal to 0 • So another interpretation is: the mean y value for a given x value = $\alpha + \beta x$
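Distribution of y • We can now see that, for a given x, y is distributed normally with a mean of $\alpha + \beta x$ • The variance of y is the same as the variance of e, which is $\sigma^2$ • An estimate for $\sigma$ is $s_e = \sqrt{\dfrac{\text{SSResid}}{n-2}}$, where $\text{SSResid} = \sum (y - \hat{y})^2$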
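As a rough sketch of how the estimate of the standard deviation of e is computed, the Python below fits the least squares line and then divides the residual sum of squares by n - 2. The data are hypothetical values chosen for illustration, and the variable names are our own.

```python
import math

# Hypothetical illustrative data (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope and intercept.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = s_xy / s_xx
a = y_bar - b * x_bar

# Residual sum of squares: squared vertical distances from the fitted line.
ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# n - 2 degrees of freedom, because two coefficients (a and b) were estimated.
s_e = math.sqrt(ss_resid / (n - 2))

print(f"SSResid = {ss_resid:.2f}, s_e = {s_e:.4f}")
```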
Assumption • The major assumption underlying all this is that the random deviation e is normally distributed. • We’ll talk more later about how to check whether this assumption is reasonable.
Inferences about the slope of the population regression line • Now we are going to make some inferences about the slope of the regression line. Specifically, we’ll construct a confidence interval and then perform a hypothesis test – a model utility test for simple linear regression
Just to repeat … • We said the population regression model is $y = \alpha + \beta x + e$ • The coefficients of this model are fixed but unknown (parameters) – so using the method of least squares, we estimate these parameters using a sample of data (statistics) and we get $\hat{y} = a + bx$
Sampling distribution of b • We use b as an estimate for the population coefficient $\beta$ in the simple regression model • b is therefore a statistic determined by a random sample, and it has a sampling distribution
Sampling distribution of b • When the four assumptions of the linear regression model are met: • The mean value of the sampling distribution of b is $\beta$. That is, $\mu_b = \beta$ • The standard deviation of the statistic b is $\sigma_b = \dfrac{\sigma}{\sqrt{S_{xx}}}$, where $S_{xx} = \sum (x - \bar{x})^2$ • The sampling distribution of b is normal.
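One way to get a feel for these properties is a small simulation: draw many samples from a known population model, compute the slope b each time, and compare the results with the stated mean and standard deviation. The sketch below assumes hypothetical population values (alpha = 2, beta = 0.5, sigma = 1) and fixed x values 1 through 10; none of these numbers come from the slides.

```python
import math
import random
import statistics

random.seed(13)  # fixed seed so the run is repeatable

# Assumed (hypothetical) population values for the simulation.
alpha, beta, sigma = 2.0, 0.5, 1.0
x = list(range(1, 11))  # fixed x values 1..10
n = len(x)
x_bar = sum(x) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)

def sample_slope():
    """Draw one sample from y = alpha + beta*x + e and return its slope b."""
    y = [alpha + beta * xi + random.gauss(0, sigma) for xi in x]
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / s_xx

slopes = [sample_slope() for _ in range(5000)]
mean_b = statistics.mean(slopes)
sd_b = statistics.stdev(slopes)
sigma_b = sigma / math.sqrt(s_xx)  # standard deviation of b stated above

print(f"mean of b: {mean_b:.4f} (beta = {beta})")
print(f"sd of b:   {sd_b:.4f} (sigma / sqrt(Sxx) = {sigma_b:.4f})")
```

A histogram of `slopes` should also look roughly bell-shaped, in line with the normality claim.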
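Estimates for … • The estimate for the standard deviation of b is $s_b = \dfrac{s_e}{\sqrt{S_{xx}}}$, where $s_e$ is the estimate for $\sigma$ and $S_{xx} = \sum (x - \bar{x})^2$ • When we standardize b using $s_b$, the result $t = \dfrac{b - \beta}{s_b}$ has a t distribution with n-2 degrees of freedom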
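Confidence Interval • Sample Statistic +/- Critical Value * Standard Deviation of the Statistic • For the slope: $b \pm (t \text{ critical value}) \cdot s_b$, where the t critical value is based on n-2 degrees of freedom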
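A minimal sketch of a 95% confidence interval for the slope follows, using hypothetical data. The critical value 3.182 is the tabled t value for 95% confidence with n - 2 = 3 degrees of freedom; it is entered by hand here because the Python standard library has no t distribution.

```python
import math

# Hypothetical illustrative data (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = s_xy / s_xx            # sample statistic: least squares slope
a = y_bar - b * x_bar

# Estimated standard deviation of b: s_e / sqrt(Sxx).
ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s_e = math.sqrt(ss_resid / (n - 2))
s_b = s_e / math.sqrt(s_xx)

# Assumption: t critical value read from a t table (95% confidence, df = 3).
t_crit = 3.182

lower = b - t_crit * s_b
upper = b + t_crit * s_b
print(f"b = {b:.3f}, s_b = {s_b:.4f}, 95% CI: ({lower:.3f}, {upper:.3f})")
```

With so few points the interval is wide and includes 0, so these particular data would not rule out a slope of zero.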
Hypothesis Test • We’re normally interested in the null hypothesis $H_0: \beta = 0$ because if we reject the null, the data suggest there is a useful linear relationship between our two variables • We call this the ‘Model Utility Test for Simple Linear Regression’
Summary of the Test • Test Statistic for $H_0: \beta = 0$ is $t = \dfrac{b - 0}{s_b} = \dfrac{b}{s_b}$, based on n-2 degrees of freedom • Assumptions are the same four as those for the simple linear regression model.
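The model utility test can be sketched as below, again on hypothetical data. The code compares the test statistic with a two-sided critical value from a t table (3.182 for a 0.05 significance level and 3 degrees of freedom) rather than computing a P-value; that choice, and the variable names, are assumptions made to keep the example within the standard library.

```python
import math

# Hypothetical illustrative data (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = s_xy / s_xx
a = y_bar - b * x_bar

ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s_e = math.sqrt(ss_resid / (n - 2))
s_b = s_e / math.sqrt(s_xx)

# Test statistic for H0: beta = 0 (hypothesized slope is 0).
t_stat = (b - 0) / s_b

# Assumption: two-sided critical value from a t table (alpha = 0.05, df = n - 2 = 3).
t_crit = 3.182
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.4f}, df = {n - 2}, reject H0 at 0.05: {reject_h0}")
```

Before trusting the result, the four model assumptions still need to be checked, for example with a residual plot and a normal probability plot of the residuals.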