Simple Linear Regression. Statistics 700 December 4-7, 2001.
Example for Illustration
The human body takes in more oxygen when exercising than when it is at rest. To deliver oxygen to the muscles, the heart must beat faster. Heart rate is easy to measure, but measuring oxygen uptake requires elaborate equipment. If oxygen uptake (VO2) can be accurately predicted from heart rate (HR), the predicted values may replace actually measured values for various research purposes. Unfortunately, not all human bodies are the same, so no single prediction equation works for all people. Researchers can, however, measure both HR and VO2 for one person under varying sets of exercise conditions and calculate a regression equation for predicting that person's oxygen uptake from heart rate.
Data From An Individual
Goals in this illustration:
• Scatterplot: is the relationship linear or not?
• Obtain the best-fitting line using least squares.
• Test whether the model is significant.
• Obtain a confidence interval for the regression coefficient.
• Obtain predictions.
The Scatterplot
[Figure: scatterplot of VO2 versus HR]
Simple Linear Regression Model
1. Conditional on X = x, the response variable Y has mean equal to $\mu(x) = \alpha + \beta x$.
2. $\alpha$ is the y-intercept, while $\beta$ is the slope of the regression line, which can be interpreted as the change in the mean value of Y per unit change in the independent variable.
3. For each X = x, the conditional distribution of Y is normal with mean $\mu(x)$ and variance $\sigma^2$.
4. $Y_1, Y_2, \ldots, Y_n$ are independent of each other.
Shorthand: $Y_i = \alpha + \beta x_i + \epsilon_i$ with $\epsilon_i$ IID $N(0, \sigma^2)$.
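The shorthand form of the model can be made concrete by simulating from it. The sketch below is only an illustration: the function name `simulate_responses`, the parameter values, and the design points are hypothetical and are not taken from the HR/VO2 data.

```python
import random


def simulate_responses(alpha, beta, sigma, xs, seed=None):
    """Draw Y_i = alpha + beta * x_i + e_i with e_i IID N(0, sigma^2)."""
    rng = random.Random(seed)  # private generator so a seed gives reproducible draws
    return [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical intercept, slope, error SD, and x-values (illustration only).
xs = [1, 2, 3, 4, 5]
ys = simulate_responses(alpha=2.0, beta=0.5, sigma=1.0, xs=xs, seed=1)
print(ys)
```

Setting `sigma=0.0` returns points lying exactly on the line, which is a quick way to see that the error term is the only source of scatter in this model.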
Least-Squares (LS) Regression
One of the goals in regression analysis is to estimate the parameters $\alpha$, $\beta$, and $\sigma^2$ of the regression model. Denote by
$$\hat{y} = a + bx$$
the estimate of the regression line, so that $a$ estimates $\alpha$ and $b$ estimates $\beta$. Then for the observed values of X, which are $x_1, x_2, \ldots, x_n$, we may obtain the predicted value of the response variable Y at each of these X-values. These are:
$$\hat{y}_i = a + b x_i, \quad i = 1, 2, \ldots, n.$$
Predicted Values
A good estimate of the regression line should produce predicted values that are close to the actual observed values of the response variable. That is, the deviations
$$y_i - \hat{y}_i = y_i - (a + b x_i), \quad i = 1, 2, \ldots, n,$$
should ideally be close (if not equal) to zero. These deviations between observed and predicted values are also called residuals.
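As a small illustration of predicted values and residuals, the sketch below evaluates one candidate line on a toy data set. The data, the candidate coefficients `a=2.0` and `b=0.5`, and the function name `residuals` are all hypothetical choices made for this example.

```python
def residuals(xs, ys, a, b):
    """Return the deviations y_i - (a + b * x_i) for a candidate line."""
    predicted = [a + b * x for x in xs]
    return [y - y_hat for y, y_hat in zip(ys, predicted)]


# Toy data (not the HR/VO2 measurements) and an arbitrary candidate line.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
res = residuals(xs, ys, a=2.0, b=0.5)
sum_sq = sum(r * r for r in res)  # sum of squared residuals for this line
print(res, sum_sq)
```

Trying other values of `a` and `b` and comparing `sum_sq` gives a feel for what "close to the observed values" means numerically.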
Principle of Least-Squares (LS)
In least-squares regression, the best-fitting regression line is the one that makes the sum of the squared deviations (residuals) as small as possible. Thus, the regression coefficients $a$ and $b$ are chosen to minimize the quantity
$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2.$$
Using calculus, the values of $a$ and $b$ that minimize this quantity are given by:
Least-Squares Solution
$$b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad a = \bar{y} - b\,\bar{x}.$$
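The closed-form least-squares solution can be sketched in a few lines of plain Python. The function name `least_squares` and the toy data are assumptions for illustration; they are not the data from the slides.

```python
def least_squares(xs, ys):
    """Return (a, b), the intercept and slope minimizing sum (y - a - b x)^2."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope estimate
    a = y_bar - b * x_bar  # intercept estimate
    return a, b


# Toy data (hypothetical).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares(xs, ys)
print(a, b)
```

The function divides by the sum of squares of the x-values about their mean, so it assumes the x-values are not all identical.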
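Estimating the Variance
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad SYY = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad SSR = SYY - SSE = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$
$$\hat{\sigma}^2 = MSE = \frac{SSE}{n-2}$$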
Interpretations of Quantities
• SSE: measures the variation not explained by the predictor variable.
• SSR: measures the amount of variation explained by the predictor variable.
• SYY: total variation in the Y-values. This is partitioned into SSR and SSE.
• $R^2$ = (SSR)/(SYY): coefficient of determination; indicates the proportion of variation in the Y-values explained by the predictor variable.
• MSE = (SSE)/(n-2): the mean-squared error. This provides an unbiased estimate of the common variance $\sigma^2$.
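The quantities listed above can be computed directly from a fitted line. The following is a minimal sketch on hypothetical toy data; the function name `variance_summary` and the dictionary keys are illustrative choices.

```python
def variance_summary(xs, ys):
    """Fit the LS line and return SSE, SSR, SYY, R^2, and MSE."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    syy = sum((y - y_bar) ** 2 for y in ys)                    # total
    ssr = syy - sse                                            # explained
    return {"SSE": sse, "SSR": ssr, "SYY": syy,
            "R2": ssr / syy, "MSE": sse / (n - 2)}


# Toy data (hypothetical).
summary = variance_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(summary)
```

Because MSE divides by n-2, the sketch needs at least three observations to be meaningful.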
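Sampling Distributions of Estimators
Under the model assumptions,
$$b \sim N\!\left(\beta,\ \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right), \qquad a \sim N\!\left(\alpha,\ \sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right]\right).$$
To estimate these variances, $\sigma^2$ is replaced by the MSE.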
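Testing Hypothesis
• To test the null hypothesis $H_0: \beta = \beta_0$ versus $H_1: \beta \neq \beta_0$, we use the t-statistic given by:
$$T_c = \frac{b - \beta_0}{\sqrt{MSE \big/ \sum_{i=1}^{n} (x_i - \bar{x})^2}},$$
which follows a t-distribution with degrees of freedom equal to n-2 under the null hypothesis. Thus, we reject $H_0$ if $|T_c| > t_{n-2;\,\alpha/2}$ (in this subscript, $\alpha$ denotes the significance level, not the intercept).
• Similarly, for testing $H_0: \alpha = \alpha_0$, we use:
$$T_c = \frac{a - \alpha_0}{\sqrt{MSE\left[\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right]}}.$$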
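A sketch of the slope test follows, assuming the usual standard error of the slope, the square root of MSE divided by the sum of squares of the x-values about their mean. The toy data and the function name `slope_t_statistic` are hypothetical, and the critical value 3.182 is the tabulated two-sided 5% point of the t-distribution with 3 degrees of freedom, supplied by hand rather than computed.

```python
import math


def slope_t_statistic(xs, ys, beta0=0.0):
    """Return the t-statistic for H0: beta = beta0 in simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    mse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    se_b = math.sqrt(mse / sxx)  # estimated standard error of the slope
    return (b - beta0) / se_b


# Toy data (hypothetical); n = 5, so the test has n - 2 = 3 degrees of freedom.
t_c = slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
t_crit = 3.182  # table value t_{3; 0.025}, entered manually
reject = abs(t_c) > t_crit
print(t_c, reject)
```

Passing the critical value in by hand keeps the sketch free of third-party libraries; a statistics package would normally supply it together with a p-value.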
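Confidence Interval for Mean and Predicting the Value of Y of a New Unit
Estimate of mean and predicted value at $x_0$:
$$\hat{\mu}(x_0) = \hat{Y}(x_0) = a + b x_0$$
Variance (estimated, for the mean and for a new observation respectively):
$$MSE\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right], \qquad MSE\left[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right]$$
CI for $\mu(x_0)$:
$$a + b x_0 \pm t_{n-2;\,\alpha/2} \sqrt{MSE\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right]}$$
CI for $Y(x_0)$ (prediction interval):
$$a + b x_0 \pm t_{n-2;\,\alpha/2} \sqrt{MSE\left[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right]}$$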
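The two intervals differ only by the extra "1 +" term in the variance of a new observation. The sketch below assumes the standard formulas for simple linear regression and uses hypothetical toy data; the function name `intervals_at` is illustrative, and the critical value 3.182 (t-distribution, 3 degrees of freedom, two-sided 5%) is taken from a table rather than computed.

```python
import math


def intervals_at(xs, ys, x0, t_crit):
    """Return (estimate, CI for the mean, prediction interval) at x0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    mse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    estimate = a + b * x0
    leverage = 1.0 / n + (x0 - x_bar) ** 2 / sxx
    half_mean = t_crit * math.sqrt(mse * leverage)          # for the mean
    half_pred = t_crit * math.sqrt(mse * (1.0 + leverage))  # for a new unit
    return (estimate,
            (estimate - half_mean, estimate + half_mean),
            (estimate - half_pred, estimate + half_pred))


# Toy data (hypothetical); t_crit is the table value t_{3; 0.025}.
est, ci_mean, pi_new = intervals_at([1, 2, 3, 4, 5], [2, 4, 5, 4, 5],
                                    x0=4, t_crit=3.182)
print(est, ci_mean, pi_new)
```

The prediction interval is always wider than the confidence interval for the mean at the same $x_0$, and both widen as $x_0$ moves away from the mean of the observed x-values.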
Results of Regression Analysis (using Minitab)
[Minitab output; annotations on the slide: P-value for regression, P-Value, (MSR)/(MSE)]
Fitted Line on the Scatterplot
Confidence Interval for Mean and Prediction Interval
Excel Implementation of Formulas