880 likes | 1.29k Views
Statistics for Business and Economics. Chapter 10 Simple Linear Regression. Learning Objectives. Describe the Linear Regression Model State the Regression Modeling Steps Explain Least Squares Compute Regression Coefficients Explain Correlation Predict Response Variable.
E N D
Statistics for Business and Economics Chapter 10 Simple Linear Regression
Learning Objectives • Describe the Linear Regression Model • State the Regression Modeling Steps • Explain Least Squares • Compute Regression Coefficients • Explain Correlation • Predict Response Variable
What is Regression? • 1. Method of modeling relationships • between a response variable Y and one or more predictors X • 2. A way of “fitting line through data”
Example 1: Store Site Selection Model sales Y at existing sites as a function of demographic variables: X1 = Population in store vicinity X2 = Income in area X3 = Age of houses in area X4 = X5 = From equation, predict sales at new sites
Example 2:Marketing Research Model consumer response to a product on basis of product characteristics: Y = Taste score on soft drink X1 = Sugar level X2 = Carbonation level X3 = X4 =
Example 3:Operations/Quality Model product quality in plant as a function of manufacturing process characteristics: Y = Quality score of sheet metal X1 = Raw material purity score X2 = Molten aluminum temp X3 = Line speed X4 =
Example 4:Real Estate Pricing Y = Selling price of houses X1 = Square feet X2 = Taxes X3 = Lot acreage X4 = X5 = X6 =
One-Predictor Regression 25 Homes Sold in Essex County, New Jersey, 1996
One-Predictor Regression Regression Equation: Y= -39001 + 85.0 X
Regression Models • Answers ‘What is the relationship between the variables?’ • Equation used • One numerical dependent (response) variable • What is to be predicted • One or more numerical or categorical independent (explanatory) variables • Used mainly for prediction and estimation
Prediction Using Regression Y= -39001 + 85.0(4000)= $300,999
Regression Modeling Steps • Hypothesize deterministic component • Estimate unknown model parameters • Specify probability distribution of random error term • Estimate standard deviation of error • Evaluate model • Use model for prediction and estimation
Specifying the Model • Define variables • Conceptual (e.g., Advertising, price) • Empirical (e.g., List price, regular price) • Measurement (e.g., $, Units) • Hypothesize nature of relationship • Expected effects (i.e., Coefficients’ signs) • Functional form (linear or non-linear) • Interactions
Model Specification Is Based on Theory • Theory of field (e.g., Sociology) • Mathematical theory • Previous research • ‘Common sense’
Thinking Challenge: Which Is More Logical? Sales Sales Advertising Advertising Sales Sales Advertising Advertising
1 Explanatory 2+ Explanatory Variable Variables Multiple Simple Non- Non- Linear Linear Linear Linear Types of Regression Models RegressionModels
Linear Regression Model Relationship between variables is a linear function Population y-intercept Population Slope Random Error y x 0 1 Dependent (Response) Variable Independent (Explanatory) Variable
Line of Means y E(y) = β0 + β1x (line of means) Change in y β1 = Slope Change in x β0 =y-intercept x High school teacher
Linear Regression Model • 1. Relationship between variables is a linear function Population Y-intercept Population slope Independent (explanatory) variable Y X i 0 1 i i Dependent (response) variable Random error
Population Linear Regression Model Observedvalue i= Random error Observed value
Sample LinearRegression Model Y True Unknown Line X Observed value
Sample LinearRegression Model Y i= Random error ^ X Observed value
Regression Modeling Steps • Hypothesize deterministic component • Estimate unknown model parameters • Specify probability distribution of random error term • Estimate standard deviation of error • Evaluate model • Use model for prediction and estimation
y 60 40 20 0 x 0 20 40 60 Scattergram • Plot of all (xi, yi) pairs • Suggests how well model will fit
y 60 40 20 0 x 0 20 40 60 Thinking Challenge • How would you draw a line through the points? • How do you determine which line ‘fits best’?
Least Squares • ‘Best fit’ means difference between actual y values and predicted y values are a minimum • But positive differences off-set negative • Least Squares minimizes the Sum of the Squared Differences (SSE)
Least Squares Graphically y ^ e ^ 4 e 2 ^ e ^ e 1 3 x
Coefficient Equations Prediction Equation Slope y-intercept
Computation Table 2 2 xi yi xi xiyi yi 2 2 x1 y1 x1 x1y1 y1 2 2 x2 y2 x2 y2 x2y2 : : : : : 2 2 xn yn xn yn xnyn 2 2 Σxi Σyi Σxi Σxiyi Σyi
^ • Slope (1) • Estimated y changes by 1 for each 1unit increase in x • If 1 = 2, then Sales (y) is expected to increase by 2 for each 1 unit increase in Advertising (x) ^ ^ ^ • Y-Intercept (0) • Average value of y when x = 0 • If 0 = 4, then Average Sales (y) is expected to be 4 when Advertising (x) is 0 ^ Interpretation of Coefficients
Least Squares Example You’re a marketing analyst for Hasbro Toys. You gather the following data: Ad $Sales (Units)1 1 2 1 3 2 4 2 5 4 Find the least squares line relatingsales and advertising.
Scattergram Sales vs. Advertising Sales 4 3 2 1 0 0 1 2 3 4 5 Advertising
Parameter Estimation Solution Table 2 2 xi yi xi xiyi yi 1 1 1 1 1 2 1 4 1 2 3 2 9 4 6 4 2 16 4 8 5 4 25 16 20 15 10 55 26 37
Parameter Estimates Parameter Standard T for H0: Variable DF Estimate Error Param=0 Prob>|T| INTERCEP 1 -0.1000 0.6350 -0.157 0.8849 ADVERT 1 0.7000 0.1914 3.656 0.0354 Parameter Estimation Computer Output ^ 0 ^ 1
^ • Slope (1) • Sales Volume (y) is expected to increase by .7 units for each $1 increase in Advertising (x) ^ • Y-Intercept (0) • Average value of Sales Volume (y) is -.10 units when Advertising (x) is 0 • Difficult to explain to marketing manager • Expect some sales without advertising Coefficient Interpretation Solution
Regression Modeling Steps • Hypothesize deterministic component • Estimate unknown model parameters • Specify probability distribution of random error term • Estimate standard deviation of error • Evaluate model • Use model for prediction and estimation
Linear Regression Assumptions • Mean of probability distribution of error, ε, is 0 • Probability distribution of error has constant variance • Probability distribution of error, ε, is normal • Errors are independent
Error Probability Distribution ^ f( ) Y X 1 X 2 X
Error Probability Distribution ^ f( ) Y X 1 X 2 X
Error Probability Distribution ^ f( ) Y X 1 X 2 X
Estimating s2 Recall that: where is our estimator of the mean. Now substitute (1) Yi for Xi , (2) , and n-2 df for n-1:
Why are df = n - 2? 1. Two statistics are estimated to compute the regression line, b0 and b1 2. As a result, if I know any n-2 residuals, the other two can be computed from ei = 0 and SeiXi = 0. ^ ^ ^ ^
Regression Modeling Steps • Hypothesize deterministic component • Estimate unknown model parameters • Specify probability distribution of random error term • Estimate standard deviation of error • Evaluate model • Use model for prediction and estimation
Test of Slope Coefficient • Shows if there is a linear relationship between x and y • Involves population slope 1 • Hypotheses • H0: 1 = 0 (No Linear Relationship) • Ha: 1 0 (Linear Relationship) • Theoretical basis is sampling distribution of slope
All Possible Sample Slopes Sample 1: 2.5 Sample 2: 1.6 Sample 3: 1.8 Sample 4: 2.1 : :Very large number of sample slopes Sampling Distribution ^ S 1 ^ b 1 1 Sampling Distribution of Sample Slopes Sample 1 Line y Sample 2 Line Population Line x
Test of Slope Coefficient Example You’re a marketing analyst for Hasbro Toys. You find β0 = –.1,β1 = .7and s= .6055. Ad $Sales (Units)1 1 2 1 3 2 4 2 5 4 Is the relationship significantat the .05 level of significance? ^ ^