Chapter 12 Simple Linear Regression • Simple Linear Regression Model • Least Squares Method • Coefficient of Determination • Model Assumptions • Testing for Significance • Using the Estimated Regression Equation for Estimation and Prediction • Computer Solution • Residual Analysis: Validating Model Assumptions
Simple Linear Regression Model • The equation that describes how y is related to x and an error term is called the regression model. • The simple linear regression model is: y = β0 + β1x + ε • β0 and β1 are called parameters of the model. • ε is a random variable called the error term.
Simple Linear Regression Equation • The simple linear regression equation is: E(y) = β0 + β1x • Graph of the regression equation is a straight line. • β0 is the y-intercept of the regression line. • β1 is the slope of the regression line. • E(y) is the expected value of y for a given x value.
Simple Linear Regression Equation • Positive Linear Relationship: the graph of E(y) versus x is a regression line with intercept β0 and positive slope β1.
Simple Linear Regression Equation • Negative Linear Relationship: the graph of E(y) versus x is a regression line with intercept β0 and negative slope β1.
Simple Linear Regression Equation • No Relationship: the graph of E(y) versus x is a horizontal regression line with intercept β0 and slope β1 = 0.
Estimated Simple Linear Regression Equation • The estimated simple linear regression equation is: ŷ = b0 + b1x • The graph is called the estimated regression line. • b0 is the y-intercept of the line. • b1 is the slope of the line. • ŷ is the estimated value of y for a given x value.
Estimation Process • Regression Model: y = β0 + β1x + ε • Regression Equation: E(y) = β0 + β1x • Unknown Parameters: β0, β1 • Sample Data: (x1, y1), . . . , (xn, yn) • Estimated Regression Equation: ŷ = b0 + b1x • Sample Statistics: b0, b1 • The sample statistics b0 and b1 provide estimates of β0 and β1.
Least Squares Method • Least Squares Criterion: min Σ(yi - ŷi)² where: yi = observed value of the dependent variable for the ith observation ŷi = estimated value of the dependent variable for the ith observation
The Least Squares Method • Slope for the Estimated Regression Equation: b1 = [Σxiyi - (Σxi)(Σyi)/n] / [Σxi² - (Σxi)²/n]
The Least Squares Method • y-Intercept for the Estimated Regression Equation: b0 = ȳ - b1x̄ where: xi = value of independent variable for ith observation yi = value of dependent variable for ith observation x̄ = mean value for independent variable ȳ = mean value for dependent variable n = total number of observations
Example: Reed Auto Sales • Simple Linear Regression Reed Auto periodically has a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown on the next slide.
Example: Reed Auto Sales • Simple Linear Regression
Number of TV Ads    Number of Cars Sold
1                   14
3                   24
2                   18
1                   17
3                   27
Example: Reed Auto Sales • Slope for the Estimated Regression Equation b1 = [220 - (10)(100)/5] / [24 - (10)²/5] = 5 • y-Intercept for the Estimated Regression Equation b0 = 20 - 5(2) = 10 • Estimated Regression Equation ŷ = 10 + 5x
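A minimal Python sketch of this least squares calculation for the Reed Auto data, assuming only numpy; the array names x and y are illustrative:

import numpy as np

# Reed Auto sample data: number of TV ads (x) and number of cars sold (y)
x = np.array([1, 3, 2, 1, 3])
y = np.array([14, 24, 18, 17, 27])
n = len(x)

# Computational forms of the least squares slope and intercept shown above
b1 = (np.sum(x * y) - np.sum(x) * np.sum(y) / n) / (np.sum(x ** 2) - np.sum(x) ** 2 / n)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)   # 10.0 and 5.0, so y-hat = 10 + 5x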
Example: Reed Auto Sales • Scatter Diagram: cars sold plotted against the number of TV ads, with the estimated regression line ŷ = 10 + 5x.
The Coefficient of Determination • Relationship Among SST, SSR, SSE SST = SSR + SSE where: SST = total sum of squares = Σ(yi - ȳ)² SSR = sum of squares due to regression = Σ(ŷi - ȳ)² SSE = sum of squares due to error = Σ(yi - ŷi)²
The Coefficient of Determination • The coefficient of determination is: r² = SSR/SST where: SST = total sum of squares SSR = sum of squares due to regression
Example: Reed Auto Sales • Coefficient of Determination r² = SSR/SST = 100/114 = .8772 The regression relationship is very strong because about 88% of the variation in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.
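As a rough check of these figures, a small Python sketch (numpy assumed) that computes SST, SSR, SSE, and r² from the estimated regression equation:

import numpy as np

x = np.array([1, 3, 2, 1, 3])
y = np.array([14, 24, 18, 17, 27])
y_hat = 10 + 5 * x                      # estimated regression equation

sse = np.sum((y - y_hat) ** 2)          # 14, sum of squares due to error
sst = np.sum((y - y.mean()) ** 2)       # 114, total sum of squares
ssr = sst - sse                         # 100, sum of squares due to regression
r2 = ssr / sst                          # 0.8772
print(sse, sst, ssr, round(r2, 4))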
The Correlation Coefficient • Sample Correlation Coefficient rxy = (sign of b1)√r² = (sign of b1)√(coefficient of determination) where: b1 = the slope of the estimated regression equation
Example: Reed Auto Sales • Sample Correlation Coefficient The sign of b1 in the equation ŷ = 10 + 5x is “+”. rxy = +√.8772 = +.9366
Model Assumptions • Assumptions About the Error Term ε • The error ε is a random variable with mean of zero. • The variance of ε, denoted by σ², is the same for all values of the independent variable. • The values of ε are independent. • The error ε is a normally distributed random variable.
Testing for Significance • To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of β1 is zero. • Two tests are commonly used: • t Test • F Test • Both tests require an estimate of σ², the variance of ε in the regression model.
Testing for Significance • An Estimate of σ² The mean square error (MSE) provides the estimate of σ², and the notation s² is also used. s² = MSE = SSE/(n - 2) where: SSE = Σ(yi - ŷi)²
Testing for Significance • An Estimate of σ • To estimate σ we take the square root of s². • The resulting s is called the standard error of the estimate.
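Continuing the Reed Auto example, a short sketch (numpy assumed) that computes s² = MSE and the standard error of the estimate s:

import numpy as np

x = np.array([1, 3, 2, 1, 3])
y = np.array([14, 24, 18, 17, 27])
n = len(x)

y_hat = 10 + 5 * x
sse = np.sum((y - y_hat) ** 2)   # 14
mse = sse / (n - 2)              # s^2 = 4.667, the estimate of sigma^2
s = np.sqrt(mse)                 # 2.160, the standard error of the estimate
print(round(mse, 3), round(s, 3))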
Testing for Significance: t Test • Hypotheses H0: β1 = 0 Ha: β1 ≠ 0 • Test Statistic t = b1/s_b1 where s_b1 = s/√Σ(xi - x̄)²
Testing for Significance: t Test • Rejection Rule Reject H0 if t < -tα/2 or t > tα/2 where: tα/2 is based on a t distribution with n - 2 degrees of freedom
Example: Reed Auto Sales • t Test • Hypotheses H0: β1 = 0 Ha: β1 ≠ 0 • Rejection Rule For α = .05 and d.f. = 3, t.025 = 3.182 Reject H0 if t < -3.182 or t > 3.182
Example: Reed Auto Sales • t Test • Test Statistic t = b1/s_b1 = 5/1.08 = 4.63 • Conclusion t = 4.63 > 3.182, so we reject H0
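A sketch of this t test for the Reed Auto data (numpy and scipy assumed); it reproduces s_b1 ≈ 1.08, t ≈ 4.63, and the critical value 3.182:

import numpy as np
from scipy import stats

x = np.array([1, 3, 2, 1, 3])
y = np.array([14, 24, 18, 17, 27])
n, b1 = len(x), 5.0

y_hat = 10 + 5 * x
s = np.sqrt(np.sum((y - y_hat) ** 2) / (n - 2))   # standard error of the estimate
s_b1 = s / np.sqrt(np.sum((x - x.mean()) ** 2))   # ~1.08
t_stat = b1 / s_b1                                # ~4.63
t_crit = stats.t.ppf(0.975, df=n - 2)             # 3.182 for alpha = .05, d.f. = 3
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)   # two-tailed p-value, about .02
print(round(t_stat, 2), round(t_crit, 3), round(p_value, 3))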
Confidence Interval for β1 • We can use a 95% confidence interval for β1 to test the hypotheses just used in the t test. • H0 is rejected if the hypothesized value of β1 is not included in the confidence interval for β1.
Confidence Interval for β1 • The form of a confidence interval for β1 is: b1 ± tα/2 s_b1 where b1 is the point estimate tα/2 s_b1 is the margin of error tα/2 is the t value providing an area of α/2 in the upper tail of a t distribution with n - 2 degrees of freedom
Example: Reed Auto Sales • Rejection Rule Reject H0 if 0 is not included in the confidence interval for β1. • 95% Confidence Interval for β1 b1 ± tα/2 s_b1 = 5 ± 3.182(1.08) = 5 ± 3.44, or 1.56 to 8.44 • Conclusion 0 is not included in the confidence interval. Reject H0
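The same interval can be reproduced with a short sketch (scipy assumed), using b1 = 5 and s_b1 = 1.08 from the earlier calculations:

from scipy import stats

b1, s_b1, n = 5.0, 1.08, 5
t_half = stats.t.ppf(0.975, df=n - 2)   # 3.182
margin = t_half * s_b1                  # ~3.44, margin of error
print(round(b1 - margin, 2), round(b1 + margin, 2))   # ~1.56 to ~8.44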
Testing for Significance: F Test • Hypotheses H0: β1 = 0 Ha: β1 ≠ 0 • Test Statistic F = MSR/MSE
Testing for Significance: F Test • Rejection Rule Reject H0 if F > Fα where: Fα is based on an F distribution with 1 d.f. in the numerator and n - 2 d.f. in the denominator
Example: Reed Auto Sales • F Test • Hypotheses • H0: β1 = 0 Ha: β1 ≠ 0 • Rejection Rule • For α = .05 and d.f. = 1, 3: F.05 = 10.13 • Reject H0 if F > 10.13.
Example: Reed Auto Sales • F Test • Test Statistic • F = MSR/MSE = 100/4.667 = 21.43 • Conclusion • F = 21.43 > 10.13, so we reject H0.
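A sketch of the F test calculation (scipy assumed), using SSR = 100 and SSE = 14 from the Reed Auto example:

from scipy import stats

ssr, sse, n = 100.0, 14.0, 5
msr = ssr / 1             # regression d.f. = 1 in simple linear regression
mse = sse / (n - 2)       # 4.667
F = msr / mse             # ~21.43
F_crit = stats.f.ppf(0.95, dfn=1, dfd=n - 2)   # 10.13 for alpha = .05
print(round(F, 2), round(F_crit, 2))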
Some Cautions about the Interpretation of Significance Tests • Rejecting H0: β1 = 0 and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y. • Just because we are able to reject H0: β1 = 0 and demonstrate statistical significance does not enable us to conclude that the relationship between x and y is linear.
Using the Estimated Regression Equation for Estimation and Prediction • Confidence Interval Estimate of E(yp): ŷp ± tα/2 s_ŷp • Prediction Interval Estimate of yp: ŷp ± tα/2 s_ind where: the confidence coefficient is 1 - α and tα/2 is based on a t distribution with n - 2 degrees of freedom
Example: Reed Auto Sales • Point Estimation If 3 TV ads are run prior to a sale, we expect the mean number of cars sold to be: ŷ = 10 + 5(3) = 25 cars
Example: Reed Auto Sales • Confidence Interval for E(yp) The 95% confidence interval estimate of the mean number of cars sold when 3 TV ads are run is: 25 ± 4.61 = 20.39 to 29.61 cars
Example: Reed Auto Sales • Prediction Interval for yp The 95% prediction interval estimate of the number of cars sold in one particular week when 3 TV ads are run is: 25 ± 8.28 = 16.72 to 33.28 cars
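A sketch that reproduces these interval estimates (numpy and scipy assumed). The standard error expressions used for estimating E(yp) and for predicting an individual yp are the usual textbook formulas; the slides above do not show them explicitly, so treat them as assumed here:

import numpy as np
from scipy import stats

x = np.array([1, 3, 2, 1, 3])
y = np.array([14, 24, 18, 17, 27])
n, xp = len(x), 3

y_hat_p = 10 + 5 * xp                                     # point estimate: 25 cars
s = np.sqrt(np.sum((y - (10 + 5 * x)) ** 2) / (n - 2))    # standard error of the estimate
ssx = np.sum((x - x.mean()) ** 2)
t_half = stats.t.ppf(0.975, df=n - 2)                     # 3.182

s_yhat_p = s * np.sqrt(1 / n + (xp - x.mean()) ** 2 / ssx)     # std. error for estimating E(yp)
s_ind = s * np.sqrt(1 + 1 / n + (xp - x.mean()) ** 2 / ssx)    # std. error for predicting yp
print(y_hat_p, round(t_half * s_yhat_p, 2), round(t_half * s_ind, 2))   # 25, ~4.61, ~8.28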
Residual Analysis • Residual for Observation i: yi - ŷi • Standardized Residual for Observation i: (yi - ŷi)/s_(yi - ŷi) where: s_(yi - ŷi) = s√(1 - hi) and hi = 1/n + (xi - x̄)²/Σ(xj - x̄)²
Example: Reed Auto Sales • Residuals
TV Ads (x)   Cars Sold (y)   Predicted (ŷ = 10 + 5x)   Residual (y - ŷ)
1            14              15                        -1
3            24              25                        -1
2            18              20                        -2
1            17              15                        +2
3            27              25                        +2
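A sketch (numpy assumed) that computes the residuals above and, using the leverage formula from the previous slide, the standardized residuals for the Reed Auto data:

import numpy as np

x = np.array([1, 3, 2, 1, 3])
y = np.array([14, 24, 18, 17, 27])
n = len(x)

y_hat = 10 + 5 * x
resid = y - y_hat                                   # [-1, -1, -2, 2, 2]

s = np.sqrt(np.sum(resid ** 2) / (n - 2))           # standard error of the estimate
h = 1 / n + (x - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)   # leverage h_i
std_resid = resid / (s * np.sqrt(1 - h))            # standardized residuals
print(resid, np.round(std_resid, 2))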
Example: Reed Auto Sales • Residual Plot: residuals plotted against the number of TV ads x.
Residual Analysis • Residual Plot Against x: Good Pattern — the residuals scatter randomly in a horizontal band around zero.
Residual Analysis • Residual Plot Against x: Nonconstant Variance — the spread of the residuals around zero changes as x increases.
Residual Analysis • Residual Plot Against x: Model Form Not Adequate — the residuals show a systematic, curved pattern around zero.