200 likes | 379 Views
Business Intelligence and Decision Modeling. Week 10 Predictive Modeling (1) with Linear Regression. Linear Regression Models. Linear regression ( interval target ) Causal/Explanations Rigorous Statistical Assumptions Normality Independence of observations (homocedasticity)
E N D
Business Intelligence and Decision Modeling Week 10 Predictive Modeling (1) with Linear Regression
Linear Regression Models • Linear regression (interval target) • Causal/Explanations • Rigorous Statistical Assumptions • Normality • Independence of observations (homocedasticity) • Independence of variables (multicolinearity) • Forecasting (Split Samples)
Y Dependent variable X Independent variable Simple Linear Regressionof Y on X { Deviation Unexplained { Deviation from the mean Explained by regression { { Mean Y
Simple Linear Regression Regression equation: Y = a + bX Y Estimate of Y from regression equation Actual value of Y Dependent variable b Value of X used to estimate Y a X Independent variable
Sales Promotion Month (000 units) (000 $) 1 264 2.5 2 116 1.3 3 165 1.4 4 101 1.0 5 209 2.0 Simple Linear Regression on Excel (Example) See Linear Regression.xls on syllabus page
Linear Regression on Excel • Using Formulas (Statistical) • =Slope (Y range, X range) • = Intercept (Y range, X range) • Scatter Chart with Trend Line • Data Analysis ToolPak • Excel Option Add-ins
300 — 250 — 200 — 150 — 100 — 50 Sales Advertising Month (000 units) (000 $) 1 264 2.5 2 116 1.3 3 165 1.4 4 101 1.0 5 209 2.0 a = – 8.136 b = 109.229X r = 0.98 r2 = 0.96 syx = 15.61 Sales (thousands of units) Y = – 8.136 + 109.229X | | | | 1.0 1.5 2.0 2.5 Advertising (thousands of dollars) Simple Linear Regression on Excel Regression equation: Y = a + bX
300 — 250 — 200 — 150 — 100 — 50 Sales Advertising Month (000 units) (000 $) 1 264 2.5 2 116 1.3 3 165 1.4 4 101 1.0 5 209 2.0 a = – 8.136 b = 109.229X r = 0.98 r2 = 0.96 syx = 15.61 Sales (thousands of units) Y = – 8.136 + 109.229X | | | | 1.0 1.5 2.0 2.5 Forecast if Promo is 1.75 X = $1,750, Y = – 8.136 + (109.229 * 1.75) Advertising (thousands of dollars) Simple Linear Regression on Excel (Forecasting)
300 — 250 — 200 — 150 — 100 — 50 Sales Advertising Month (000 units) (000 $) 1 264 2.5 2 116 1.3 3 165 1.4 4 101 1.0 5 209 2.0 a = – 8.136 b = 109.229X r = 0.98 r2 = 0.96 syx = 15.61 Sales (thousands of units) Y = – 8.136 + 109.229X | | | | 1.0 1.5 2.0 2.5 Forecast if Promo = 1.750 X = $1750, Y = 183.015, or 183,015 units Advertising (thousands of dollars) Linear Regression
Multiple Linear Regression • Simple Linear Regression • Y = a + bX • Multiple Linear Regression • Y = a + b1X1 + b2X2 + …. bnXn • When studying one variable, all other variables are kept constant,thus Y = k + biXi
Multiple Linear Regression on Excel Example 14
Multiple Linear Regression On Excel • Do a QD Regression and look at fit statistics • Examine variable distribution • Transform variables • Redo Regression and look at fit statistics
Multiple Linear Regression On Excel – Histogram Total Dollars
SPSS Video Tutorial Training File (1) • Exploratory Linear Regression • Examine Data Distribution • Transform all variables if necessary • Log_x = ln(x + 1) • Verify transformed data distribution • Run Linear Regression • Save xml scoring model • Save unstandardized predicted log value • Correlate actual and predicted values • Pre_Resp = exp(Predicted Log)-1 • Save Training file
Tutorial Validation File (2) • Open Validation File • Note that there is no DV in this file • Repeat same log transformation on other variables. • Utilities Scoring Wizard • Get xml scoring file • Anti-log on predicted log transformed value:pre_resp = exp (predicted)-1 • Create deciles with Visual Binning • Predicted response mean by deciles