Learn about simple linear regression analysis, a technique to examine the relationship between an outcome variable (Y) and explanatory variables (X), and how to estimate and predict the effect of X on Y. Explore hypothesis testing, interval estimation, and model comparison methods.
BA 275 Quantitative Business Methods Agenda • Simple Linear Regression • Inference for Regression • Inference for Prediction
Regression Analysis • A technique to examine the relationship between an outcome variable (dependent variable, Y) and a group of explanatory variables (independent variables, X1, X2, …, Xk). • The model allows us to understand (quantify) the effect of each X on Y. • It also allows us to predict Y based on X1, X2, …, Xk.
Types of Relationship • Linear Relationship • Simple Linear Relationship • Y = β0 + β1 X + ε • Multiple Linear Relationship • Y = β0 + β1 X1 + β2 X2 + … + βk Xk + ε • Nonlinear Relationship • Y = α0 exp(β1 X + ε) • Y = β0 + β1 X1 + β2 X1^2 + ε • … etc. • We will focus only on linear relationships.
Simple Linear Regression Model • Population: true effect of X on Y • Sample: estimated effect of X on Y • Key questions: 1. Does X have any effect on Y? 2. If yes, how large is the effect? 3. Given X, what is the estimated Y?
Least Squares Method • Least squares line: estimated Y = b0 + b1 X • It is a statistical procedure for finding the “best-fitting” straight line. • It minimizes the sum of squares of the deviations of the observed values of Y from those predicted by the line. • [Figure panel labels: “Sum of squares is minimized.” and “Bad fit.”]
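The procedure on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the course's own software output: the data values and the function names `least_squares_line` and `sum_of_squares` are made up for the example. It uses the standard closed-form solution for one explanatory variable and checks that a different line gives a larger sum of squares.

```python
# Hypothetical data, for illustration only (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

def least_squares_line(x, y):
    """Return (b0, b1) that minimize the sum of squared deviations of y from b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept
    return b0, b1

def sum_of_squares(x, y, b0, b1):
    """Sum of squared deviations of observed y from the line b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

b0, b1 = least_squares_line(x, y)
sse_best = sum_of_squares(x, y, b0, b1)
sse_other = sum_of_squares(x, y, b0, b1 + 0.5)  # a deliberately worse line
print(b0, b1, sse_best, sse_other)
```

Any other choice of intercept and slope, like the "worse" line above, yields a larger sum of squares on the same data.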
Initial Analysis • Summary statistics + Plots (e.g., histograms + scatter plots) + Correlations • Things to look for • Features of the data (e.g., data range, outliers) • Do not extrapolate outside the data range, because the relationship there is unknown (not established). • Summary statistics and graphs. • Is the assumption of linearity appropriate?
Correlation • ρ (rho): Population correlation (its value is most likely unknown). • r: Sample correlation (its value can be calculated from the sample). • Correlation is a measure of the strength of the linear relationship. • Correlation falls between –1 and 1. • No linear relationship if correlation is close to 0. • [Figure: scatter plots for r = –1, –1 < r < 0, r = 0, 0 < r < 1, r = 1]
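As a rough illustration of how the sample correlation r is calculated, here is a short Python sketch. The data and the function name `sample_correlation` are hypothetical; the formula is the usual Pearson correlation.

```python
import math

def sample_correlation(x, y):
    """Pearson sample correlation r between two equal-length lists."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r = sample_correlation(x, y)
r_perfect = sample_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # points exactly on a line
print(r, r_perfect)
```

Points that fall exactly on an upward-sloping line give r = 1, while the scattered data give a value strictly between 0 and 1.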
Correlation (r vs. ρ) • [Annotated software output; labels: sample size; P-value for H0: ρ = 0 vs. Ha: ρ ≠ 0; r = 0.9584]
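The P-value in such output is usually based on a t statistic with n – 2 degrees of freedom. The sketch below shows that calculation under the assumption that the software uses this standard test. The sample size did not survive on the slide, so `n_assumed = 15` is a hypothetical value, and `correlation_t_statistic` is an illustrative name.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0 vs. Ha: rho != 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = 0.9584       # sample correlation reported on the slide
n_assumed = 15   # hypothetical sample size; the real one is not shown
t = correlation_t_statistic(r, n_assumed)
print(t)
```

A large absolute t value corresponds to a small P-value, which is evidence against H0: ρ = 0.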
Fitted Model: Least Squares Line • Least squares line: estimated_Price = –15.1245 + 76.1745 Area • b0 = –15.1245 (intercept), b1 = 76.1745 (slope)
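Hypothesis Testing • Key Q1: Does X have any effect on Y? • H0: β1 = 0 vs. Ha: β1 ≠ 0 (β1 is the population slope, estimated by b1) • Test statistic: t = b1 / SEb1 • [Annotated regression output; labels: b0, SEb0, b1, SEb1] • Degrees of freedom = n – p – 1, where p = # of independent variables used.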
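The quantities labeled in the regression output can be reproduced by hand. The Python sketch below computes the slope, its standard error, the t statistic, and the degrees of freedom for simple linear regression (p = 1). The data and the name `slope_t_test` are hypothetical, and the standard-error formula is the textbook one, assumed to match what the software reports.

```python
import math

def slope_t_test(x, y):
    """Return (b1, se_b1, t_stat, df) for H0: slope = 0 in simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 1 - 1                   # n - p - 1 with p = 1 explanatory variable
    s = math.sqrt(sse / df)          # standard error of the estimate
    se_b1 = s / math.sqrt(s_xx)      # standard error of the slope
    return b1, se_b1, b1 / se_b1, df

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b1, se_b1, t_stat, df = slope_t_test(x, y)
print(b1, se_b1, t_stat, df)
```

The t statistic is then compared with a t distribution on `df` degrees of freedom to obtain the P-value.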
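Interval Estimation • Key Q2: If so, how large is the effect? • Confidence interval for the population slope β1: b1 ± t* × SEb1, where t* is the critical value of the t distribution • [Annotated regression output; labels: b0, SEb0, b1, SEb1] • Degrees of freedom = n – p – 1, where p = # of independent variables used.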
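A confidence interval for the slope takes the form estimate ± critical value × standard error. The sketch below assumes that standard form. The inputs are hypothetical (they are not the house-price output), and the critical value 3.182 is the two-sided 95% t value for 3 degrees of freedom, taken from a t table.

```python
def slope_confidence_interval(b1, se_b1, t_crit):
    """Return (low, high) for the interval b1 +/- t_crit * se_b1."""
    margin = t_crit * se_b1
    return b1 - margin, b1 + margin

# Hypothetical inputs: slope 0.6, standard error 0.2828, df = 3.
t_crit = 3.182  # two-sided 95% critical value of t with 3 degrees of freedom
low, high = slope_confidence_interval(0.6, 0.2828, t_crit)
print(low, high)
```

In this made-up example the interval contains 0, so the data would not rule out "no effect" at the 5% level, which ties the interval back to Key Q1.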
Prediction and Confidence Intervals • Key Q3: Given X, what is the estimated Y? • What is your estimated price for that 2000-sf house on 9th Street? • Quick answer: estimated price = –15.1245 + 76.1745 (2) = 137.2245 (Area is entered as 2, i.e., in thousands of square feet) • What is the average price of a house that occupies 2000 sf? • Quick answer: estimated price = –15.1245 + 76.1745 (2) = 137.2245 • What is the difference?
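The quick answer is just the fitted line evaluated at Area = 2. A small Python sketch, using the coefficients from the fitted model and assuming (as the calculation on the slide implies) that Area is measured in thousands of square feet; the function name `estimated_price` is illustrative.

```python
B0 = -15.1245   # intercept of the fitted least squares line
B1 = 76.1745    # slope of the fitted least squares line

def estimated_price(area_thousand_sf):
    """Point estimate of price from the fitted line; area assumed in thousands of sf."""
    return B0 + B1 * area_thousand_sf

price_2000_sf = estimated_price(2)
print(round(price_2000_sf, 4))
```

The same point estimate answers both questions on the slide; what differs is the interval placed around it.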
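Prediction and Confidence Intervals • Prediction interval: for the price of one particular house of a given size. • Confidence interval: for the average price of all houses of that size.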
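The two intervals share the same center but differ in width. The Python sketch below uses the standard textbook formulas for simple linear regression, assumed here to be what the software computes. The data, the name `interval_half_widths`, and the critical value 3.182 (two-sided 95%, 3 degrees of freedom) are illustrative.

```python
import math

def interval_half_widths(x, y, x_new, t_crit):
    """Return (y_hat, ci_half, pi_half) at x_new for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))                      # standard error of the estimate
    core = 1 / n + (x_new - x_bar) ** 2 / s_xx
    ci_half = t_crit * s * math.sqrt(core)            # interval for the mean of Y at x_new
    pi_half = t_crit * s * math.sqrt(1 + core)        # interval for one new Y at x_new
    return b0 + b1 * x_new, ci_half, pi_half

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hat, ci_half, pi_half = interval_half_widths(x, y, x_new=3.0, t_crit=3.182)
print(y_hat, ci_half, pi_half)
```

The prediction interval is always wider, because it adds the variability of a single observation to the uncertainty in the estimated mean.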
Model Comparison: A Good Fit? s = SS = Sum of Squares = ???
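One common way to judge fit, assuming s here denotes the standard error of the estimate and SS the usual sums of squares, is to split the total variation in Y into an explained part and an unexplained part. The Python sketch below shows that decomposition on hypothetical data; the name `fit_measures` and the inclusion of R² are additions for illustration, not taken from the slide.

```python
import math

def fit_measures(x, y):
    """Return (sse, ssr, sst, s, r_squared) for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # unexplained variation
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # explained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
    s = math.sqrt(sse / (n - 2))                             # n - p - 1 with p = 1
    return sse, ssr, sst, s, ssr / sst

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
sse, ssr, sst, s, r_squared = fit_measures(x, y)
print(sse, ssr, sst, s, r_squared)
```

A smaller s and a larger share of explained variation both point to a better fit when comparing models on the same data.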