Quantitative Business Analysis for Decision Making

Multiple Linear Regression Analysis. Outline: Multiple Regression Model; Estimation; Testing Significance of Predictors; Multicollinearity; Selection of Predictors; Diagnostic Plots.

### Quantitative Business Analysis for Decision Making

Multiple Linear Regression Analysis

• Multiple Regression Model

• Estimation

• Testing Significance of Predictors

• Multicollinearity

• Selection of Predictors

• Diagnostic Plots

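Multiple linear regression model:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$$

$\beta_1, \beta_2, \ldots, \beta_k$ are the slope coefficients of $X_1, X_2, \ldots, X_k$.

$\beta_i$ quantifies the amount of change in the mean response $Y$ for a unit change in $X_i$ when all other predictors are held fixed.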

In the model, $\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k$ is the mean of $Y$. The error term $\varepsilon$:

• contributes to the variation in Y values from their mean, and

• is assumed normally distributed with mean 0 and standard deviation $\sigma$.
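As a rough illustration of these assumptions, the sketch below draws a few Y values for one unit: a fixed mean determined by the predictors, plus a normally distributed error with mean 0. The coefficient values, the predictor values, and the helper name `simulate_response` are made up for the example and do not come from the slides.

```python
import random

def simulate_response(x, beta0, betas, sigma, rng):
    """Draw one Y = beta0 + sum(beta_i * x_i) + eps, with eps ~ N(0, sigma)."""
    mean_y = beta0 + sum(b * xi for b, xi in zip(betas, x))
    eps = rng.gauss(0.0, sigma)  # random error around the mean of Y
    return mean_y + eps

rng = random.Random(0)             # fixed seed so the sketch is repeatable
beta0, betas = 10.0, [2.0, -1.5]   # illustrative (hypothetical) coefficients
x = [3.0, 4.0]                     # predictor values for one unit

# Mean of Y at these predictor values: 10 + 2*3 - 1.5*4
mean_y = beta0 + sum(b * xi for b, xi in zip(betas, x))
draws = [simulate_response(x, beta0, betas, sigma=2.0, rng=rng) for _ in range(5)]

print("mean of Y at x:", mean_y)
print("simulated Y values:", [round(d, 2) for d in draws])
```

The simulated values scatter around the same mean; that scatter is the part of the variation in Y attributed to the error term.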

A random sample of n units is taken. Then, for each unit, k+1 measurements are made: Y, X1, X2, …, Xk.

Estimated multiple regression model is:

$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k$$

Expressions for the $b_i$ are cumbersome to write. Each $b_i$ is an estimate of $\beta_i$.
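Although the closed-form expressions for the estimates are cumbersome, software computes them by least squares. The following is a minimal sketch using NumPy's `numpy.linalg.lstsq` on synthetic data; the sample size, the "true" coefficients, and the noise level are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 2                                  # n units, k predictors
X = rng.normal(size=(n, k))                   # synthetic predictor values
true_beta = np.array([1.0, 2.0, -3.0])        # made-up beta0, beta1, beta2

# Design matrix: a column of ones for the intercept, then the predictors.
design = np.column_stack([np.ones(n), X])
y = design @ true_beta + rng.normal(scale=0.5, size=n)

# Least-squares estimates b0, b1, ..., bk of the model coefficients.
b, *_ = np.linalg.lstsq(design, y, rcond=None)
y_hat = design @ b                            # estimated mean of Y per unit

print("estimates b0, b1, b2:", np.round(b, 3))
```

With a reasonably large sample, the printed estimates should land close to the made-up coefficients used to generate the data.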

Sample standard deviation around the mean (estimated regression model) is:

$$s = \sqrt{\frac{\sum_{j=1}^{n} (Y_j - \hat{Y}_j)^2}{n - k - 1}}$$

It is an estimate of $\sigma$.
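The sample standard deviation around the estimated model can be computed directly from the residuals, dividing the sum of squared residuals by n-k-1. The sketch below assumes that definition; the observed and fitted values are hypothetical numbers picked so the arithmetic is easy to follow, not output from a real fit.

```python
import math

def residual_std(y, y_hat, k):
    """Standard deviation of residuals with n - k - 1 degrees of freedom."""
    n = len(y)
    if n - k - 1 <= 0:
        raise ValueError("need more than k + 1 observations")
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # sum of squared residuals
    return math.sqrt(sse / (n - k - 1))

y = [3.0, 5.0, 7.0, 10.0, 11.0]        # observed responses (hypothetical)
y_hat = [3.5, 4.5, 7.5, 9.0, 11.5]     # fitted values (hypothetical)

# Squared residuals: 0.25, 0.25, 0.25, 1.0, 0.25, so SSE = 2.0 and n - k - 1 = 2.
s = residual_std(y, y_hat, k=2)
print("s =", s)
```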

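The standard error of $b_i$ (for specified values of the predictors) is denoted by $s_{b_i}$.

For comparing $\beta_i$ with a reference value $\beta_{i0}$, the test statistic is:

$$t = \frac{b_i - \beta_{i0}}{s_{b_i}}$$

and for estimating $\beta_i$ by a confidence interval, compute

$$b_i \pm t_{\alpha/2,\,n-k-1}\; s_{b_i}$$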
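The sketch below works through a test statistic and a confidence interval, assuming the quantity of interest is a regression coefficient. The standard errors come from the usual least-squares formula (the square root of the diagonal of s² (XᵀX)⁻¹). The data are synthetic, the function name `coef_inference` is hypothetical, and the critical value is passed in by hand from a t table rather than computed.

```python
import numpy as np

def coef_inference(design, y, t_crit, ref=0.0):
    """Estimates, standard errors, t statistics, and confidence limits."""
    n, p = design.shape                       # p = k + 1 columns incl. intercept
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ b
    s2 = resid @ resid / (n - p)              # s squared, with n - k - 1 df
    se = np.sqrt(np.diag(s2 * np.linalg.inv(design.T @ design)))
    t_stat = (b - ref) / se                   # compare each estimate with ref
    return b, se, t_stat, b - t_crit * se, b + t_crit * se

rng = np.random.default_rng(1)
n = 23
X = rng.normal(size=(n, 2))
design = np.column_stack([np.ones(n), X])
y = 4.0 + 1.5 * X[:, 0] + rng.normal(size=n)  # second predictor has no real effect

# n - k - 1 = 20 degrees of freedom; 2.086 is the tabulated two-sided 95% value.
b, se, t_stat, lower, upper = coef_inference(design, y, t_crit=2.086)
for name, bi, ti, lo, hi in zip(["b0", "b1", "b2"], b, t_stat, lower, upper):
    print(f"{name}: estimate={bi:.3f}, t={ti:.2f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

A coefficient whose interval contains the reference value (here 0) is not significantly different from it at the chosen level.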

Coefficient of determination $R^2$ quantifies the percentage of variation in the Y-distribution that is accounted for by the predictors in the model.

• If $R^2$ = 80%, then 20% of the variation in the Y-distribution is due to factors other than those in the model.

• $R^2$ never decreases as predictors are added to the model, but each added predictor makes the model more complicated.
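To see the second point in action, the sketch below computes R² as 1 - SSE/SST for a one-predictor model and again after adding a predictor that is pure noise. The data are synthetic and the helper `r_squared` is a hypothetical name used only for this example.

```python
import numpy as np

def r_squared(design, y):
    """R^2 = 1 - SSE/SST for a least-squares fit that includes an intercept."""
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ b
    sse = resid @ resid                       # variation left unexplained
    sst = np.sum((y - y.mean()) ** 2)         # total variation in Y
    return 1.0 - sse / sst

rng = np.random.default_rng(2)
n = 40
x1 = rng.normal(size=n)
noise_pred = rng.normal(size=n)               # unrelated to Y by construction
y = 2.0 + 3.0 * x1 + rng.normal(size=n)

ones = np.ones(n)
r2_one = r_squared(np.column_stack([ones, x1]), y)
r2_two = r_squared(np.column_stack([ones, x1, noise_pred]), y)

print(f"R^2 with x1 only:        {r2_one:.4f}")
print(f"R^2 with x1 and a noise: {r2_two:.4f}")
```

The second value is at least as large as the first even though the added predictor carries no information, which is why R² alone is a poor guide for choosing predictors.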

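Null hypothesis: the predictors in the relationship have no predictive power to explain the variation in the Y-distribution, that is, $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$.

Test statistic: $F = \dfrac{SSR/k}{SSE/(n-k-1)}$, where SSR is the regression sum of squares and SSE is the error sum of squares. Under the null hypothesis it has an F-distribution with k and (n-k-1) degrees of freedom for the numerator and denominator.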
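Because R² = SSR/SST, the overall F statistic can also be written in terms of R² as (R²/k) / ((1 - R²)/(n-k-1)). The sketch below uses that identity; the values of R², n, and k are made-up inputs for illustration, and the function name `f_statistic` is hypothetical.

```python
def f_statistic(r2, n, k):
    """Overall F statistic and its degrees of freedom, computed from R^2."""
    df_num, df_den = k, n - k - 1
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must be in [0, 1)")
    if df_den <= 0:
        raise ValueError("need more than k + 1 observations")
    f = (r2 / df_num) / ((1.0 - r2) / df_den)
    return f, df_num, df_den

# Hypothetical fit: R^2 = 0.80 from n = 25 units and k = 4 predictors.
f, df_num, df_den = f_statistic(0.80, n=25, k=4)
print(f"F = {f:.2f} with {df_num} and {df_den} degrees of freedom")
```

The resulting F value is then compared with the critical value of the F-distribution with the printed degrees of freedom; a large F is evidence against the null hypothesis.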

• Multicollinearity occurs when predictors are highly correlated among themselves. In its presence $R^2$ may be high, but the estimates of individual coefficients are less reliable.

• A screening process (e.g. stepwise regression) can reduce multicollinearity by selecting only those predictors that are not strongly correlated among themselves.
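A simple first check for multicollinearity is to look at the pairwise correlations among the predictors. The sketch below flags pairs whose absolute correlation exceeds a cutoff. The cutoff of 0.9, the predictor names, and the synthetic data are assumptions for the example, and this check is not the stepwise procedure itself.

```python
import numpy as np

def highly_correlated_pairs(X, names, threshold=0.9):
    """Return (name_i, name_j, correlation) for strongly correlated predictors."""
    corr = np.corrcoef(X, rowvar=False)       # columns of X are the predictors
    k = corr.shape[0]
    return [(names[i], names[j], float(corr[i, j]))
            for i in range(k) for j in range(i + 1, k)
            if abs(corr[i, j]) > threshold]

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)            # nearly a copy of x1
x3 = rng.normal(size=n)                       # unrelated to the others
X = np.column_stack([x1, x2, x3])

pairs = highly_correlated_pairs(X, ["x1", "x2", "x3"])
for a, b_name, r in pairs:
    print(f"{a} and {b_name} are strongly correlated (r = {r:.3f})")
```

A flagged pair suggests keeping only one of the two predictors, or letting a selection procedure decide between them.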

• Residuals are used to diagnose the validity of the model assumptions.

• A scatter plot of the residuals against the predicted values can serve as a diagnostic tool.

• A diagnostic plot can identify outliers, unequal variability, and the need for a transformation to achieve homogeneity of variance.
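The quantities behind such a plot are easy to compute. The sketch below fits a model to synthetic data whose error spread deliberately grows with the predictor, then compares the residual spread in the lower and upper halves of the predicted values as a crude numeric stand-in for eyeballing the plot. The data and the half-split comparison are illustrative assumptions, not a formal test.

```python
import numpy as np

def fitted_and_residuals(design, y):
    """Predicted values and residuals from a least-squares fit."""
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ b
    return fitted, y - fitted

rng = np.random.default_rng(4)
n = 200
x = rng.uniform(1.0, 10.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.2 * x)   # error spread grows with x
design = np.column_stack([np.ones(n), x])

fitted, resid = fitted_and_residuals(design, y)

# Split the residuals by predicted value and compare their spread.
order = np.argsort(fitted)
spread_low = resid[order[: n // 2]].std()
spread_high = resid[order[n // 2:]].std()
print(f"residual spread, low predictions:  {spread_low:.3f}")
print(f"residual spread, high predictions: {spread_high:.3f}")

# To draw the diagnostic plot itself, pass (fitted, resid) to any plotting
# library, for example matplotlib's plt.scatter(fitted, resid).
```

A clearly larger spread at high predicted values is the fan shape that signals unequal variability and may call for a transformation of Y.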

• Indicator variables (also called dummy variables) are numerical codes that are used to represent qualitative variables.

• For example, 0 for men and 1 for women.

• For a qualitative variable with c categories, (c-1) indicator variables need to be defined.
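The (c-1) rule can be sketched as follows: one category is chosen as the baseline and gets all zeros, and each remaining category gets its own 0/1 column. The function name `indicator_columns` and the `regions` data are hypothetical, used only to illustrate the coding.

```python
def indicator_columns(values, baseline=None):
    """Code a qualitative variable with c categories as c - 1 indicator columns."""
    categories = sorted(set(values))
    if baseline is None:
        baseline = categories[0]              # default baseline: first in sorted order
    if baseline not in categories:
        raise ValueError("baseline must be one of the observed categories")
    kept = [c for c in categories if c != baseline]
    # One row per observation; a 1 marks the category the observation belongs to.
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows

regions = ["north", "south", "west", "south", "north"]   # c = 3 categories
kept, rows = indicator_columns(regions, baseline="north")

print("indicator columns:", kept)
for region, row in zip(regions, rows):
    print(f"{region:>5} -> {row}")
```

Using only c-1 columns avoids an exact linear dependence with the intercept; the coefficient of each indicator is then read as a difference from the baseline category.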

