Welcome to BUAD 310

Instructor: Kam Hamidieh

Lecture 23, Monday April 21, 2014

Agenda & Announcement
  • Today:
    • Continue with Multiple Regression
    • Talk about the Case Study due on Wednesday April 30th.
    • Pass back the exams & talk about the exam (time permitting)
  • Homework 7 will be posted soon. It is due Friday May 2, 5 PM.
  • Reading:
    • Read all of Chapter 23 carefully, but you can skip the path-diagram material.
    • Read all of Chapter 24, but you can lightly read the topic of VIF (Variance Inflation Factor).

BUAD 310 - Kam Hamidieh

some important dates
Some Important Dates
  • Case Study due on Wednesday April 30
  • Homework 7 due on Friday, May 2, 2014
  • Final Exam on Thursday May 8th, 11 AM – 1:00 PM, in room THH 101. See http://web-app.usc.edu/maps/ (I recommend you scope out the location before the exam.)

Some Fun Stuff

http://blogs.wsj.com/atwork/2014/04/15/best-jobs-of-2014-congratulations-mathematicians/?mod=e2fb (Jake S. and William C.)

http://fivethirtyeight.com/features/the-toolsiest-player-of-them-all/ (Joshua C.)

Multiple Regression Model
  • The observed response Y is linearly related to k explanatory variables X1, X2, …, Xk by the equation:

    Y = B0 + B1 X1 + B2 X2 + … + Bk Xk + ε

  • The values for B0, B1, …, Bk are estimated via the least squares method: pick b0, b1, …, bk so the quantity below is as small as possible:

    Σ [ yi – (b0 + b1 xi1 + b2 xi2 + … + bk xik) ]²
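As a sketch of how the least squares fit can be computed in practice (the data below are made up for illustration; this is not the course's subprime data set):

```python
import numpy as np

# Made-up data: n = 6 observations, k = 2 predictors.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0],
              [6.0, 5.0]])
y = np.array([5.1, 4.9, 9.2, 8.8, 13.1, 12.9])

# Prepend a column of ones so the intercept b0 is estimated too.
X1 = np.column_stack([np.ones(len(y)), X])

# Least squares: pick b = (b0, b1, b2) minimizing sum((y - X1 @ b)**2).
b, *_ = np.linalg.lstsq(X1, y, rcond=None)

fitted = X1 @ b
sse = np.sum((y - fitted) ** 2)   # the minimized quantity
```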

Model Residuals
  • Residuals are defined just like in the simple linear regression case: residual = observed – fitted.
  • The official formula: ei = yi – ŷi = yi – (b0 + b1 xi1 + … + bk xik)
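A quick numeric illustration (the coefficients and data here are invented):

```python
import numpy as np

# Hypothetical fitted coefficients [b0, b1, b2] and three observations.
b = np.array([2.0, 0.5, -1.0])
X = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0]])
y_obs = np.array([2.4, 2.1, -0.8])

y_fit = b[0] + X @ b[1:]      # fitted values
residuals = y_obs - y_fit     # observed - fitted
```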

Previous Example

The fitted model:

APR = 23.73 - 1.59(LTV) - 0.018(CreditScore) + 0.0004(StatedIncome) - 0.00075(HomeValue)

b0 ≈ 23.73, b1 ≈ -1.59, b2 ≈ -0.018, b3 ≈ 0.0004, b4 ≈ -0.00075

n – k – 1 = 372 – 4 – 1 = 367 (k = # of predictors)

MSE = 1.55, Se = 1.24 (estimate of σɛ), SSE = 567.80

Solution to In Class Exercise 1 from Lecture 21

Part (1)

(1) Response: Y = 10.07; Predictors: LTV = 0.942, Credit Score = 640, Stated Income = 100000, Home Value = 305000

(2) Y10 = 12.87, X2,4 = 450000, X11,3 = 70000

Part (2)

When stated income goes up by $1000, while holding all other predictors fixed, on average APR goes up by 0.0004%.

APR = 23.73 – 1.59(1/2) – 0.018(600) + 0.0004(10) – (0.00075)(200) ≈ 12%
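The plug-in arithmetic above is easy to check in code (coefficients from the fitted equation on the earlier slide; predictor values in the same units the slide uses):

```python
# Fitted subprime model coefficients (from the slides).
b0, b1, b2, b3, b4 = 23.73, -1.59, -0.018, 0.0004, -0.00075

# LTV = 1/2, CreditScore = 600, StatedIncome = 10, HomeValue = 200
apr = b0 + b1 * 0.5 + b2 * 600 + b3 * 10 + b4 * 200
print(round(apr, 2))  # 11.99, i.e. about 12%
```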

Solution to In Class Exercise 2 from Lecture 21

(1)

Y observed = 10.07

Fitted Y (APR) = 23.73 – 1.59(0.942) – 0.018(640) + 0.0004(100) – (0.00075)(305) = 10.52

Residual = 10.07 – 10.52 = -0.45

(2) Same as APR’s units, so in %.

(3) -0.45/1.24 ≈ -0.36, i.e., about 0.36 standard deviation units below the estimated equation.
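The same calculation in code (numbers from the slides; small differences from the slide's -0.36 come from rounding):

```python
# Fitted subprime model coefficients and Se (from the slides).
b0, b1, b2, b3, b4 = 23.73, -1.59, -0.018, 0.0004, -0.00075
se = 1.24    # estimate of sigma_epsilon
y_obs = 10.07

fitted = b0 + b1 * 0.942 + b2 * 640 + b3 * 100 + b4 * 305
residual = y_obs - fitted          # observed - fitted, in APR units (%)
standardized = residual / se       # roughly -0.36: s.d. units below the fit
```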

Partition of the Total Variability
  • Y values have variability.
  • One way to measure this variability is to see how your Y values vary from the overall mean of the Y’s.
  • It can be shown – not at all obvious! – that:

Total variation in the Y’s = variation accounted for by the regression (AKA the model) + leftovers (residuals or “errors”):

Σ(yi – ȳ)² = Σ(ŷi – ȳ)² + Σ(yi – ŷi)²
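The identity can be checked numerically on any small data set, as long as the model includes an intercept (made-up numbers below):

```python
import numpy as np

# Made-up data: 5 observations, 2 predictors plus an intercept column.
X = np.column_stack([np.ones(5),
                     [1.0, 2.0, 3.0, 4.0, 5.0],
                     [2.0, 1.0, 4.0, 3.0, 6.0]])
y = np.array([3.0, 3.5, 6.0, 6.2, 9.1])

b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

sst = np.sum((y - y.mean()) ** 2)       # total variation
ssr = np.sum((y_hat - y.mean()) ** 2)   # accounted for by the regression
sse = np.sum((y - y_hat) ** 2)          # leftovers / residuals

# SST = SSR + SSE (holds because the model includes an intercept)
assert np.isclose(sst, ssr + sse)
```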

Partition of the Total Variability

SST = SSR + SSE

  • SST = Sum of Squares Total: total variation in the Y values
  • SSR = Sum of Squares Regression: variation accounted for by the regression (SSM is used too!)
  • SSE = Sum of Squares Error: leftover variation

Summarizing Results in a Table

MSR = Mean Square (due to) Regression = SSR/k

MSE = Mean Square Error = SSE/(n – k – 1)

(Multiple) Coefficient of Determination
  • The coefficient of determination R2 is defined as: R2 = SSR/SST = 1 – SSE/SST
  • Its value tells us the percentage of variation in your response values accounted for (or explained by) the regression onto your predictor values.
  • What is the difference between r2 from simple linear regression and R2 from multiple regression?
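As a small numeric sketch (the sums of squares here are hypothetical):

```python
sst = 1000.0        # total variation (hypothetical)
sse = 540.0         # leftover variation (hypothetical)
ssr = sst - sse

r2 = ssr / sst      # equivalently, 1 - sse / sst
print(round(r2, 2))  # 0.46 -> 46% of the variation explained
```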

Summarizing Results in a Table

About 46% of the variation in the APR values is accounted for (or explained by) the regression onto the predictor variables LTV, …, Home Value.

Issues!
  • It can be shown that adding more variables to the model will always inflate R2. (See page 621 of your book for an intuitive discussion.)
  • Remedy: use the adjusted R2:

    adjusted R2 = 1 – (1 – R2)(n – 1)/(n – k – 1)

  • The adjusted R2 compensates for this issue. HOW/WHY?
  • The adjusted R2 also makes it easier to compare models. (More on this later.)
  • However, the “% variation accounted for” interpretation does not apply to the adjusted R2.
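In code, with illustrative numbers loosely matching the subprime example (n and k are from the slides; R2 = 0.46 is an assumed round value):

```python
# Adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1)
n, k, r2 = 372, 4, 0.46
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Slightly below R2; the penalty grows as k grows relative to n.
print(round(r2_adj, 3))  # 0.454
```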

Adjusted R Squared

Here it is:

    adjusted R2 = 1 – [SSE/(n – k – 1)] / [SST/(n – 1)]

Verify that this matches the formula 1 – (1 – R2)(n – 1)/(n – k – 1)!

The F-Test
  • If the multiple regression seems reasonable, one of the first “tests” you usually carry out is the “F-Test”:

    H0: B1 = B2 = … = Bk = 0
    Ha: At least one of the Bi’s ≠ 0

  • Informally, the null says “the predictors are useless” vs. the alternative “at least one of the predictors is useful.”

Regression ANOVA Table

ANOVA Table

Here are the F statistic & its p-value. Since the p-value < 0.05, we see that at least one of the predictors is significant.
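A sketch of the arithmetic behind the table: SSE and the degrees of freedom are from the earlier slide, while SSR here is a hypothetical value implied by R2 ≈ 0.46.

```python
n, k = 372, 4
sse = 567.80     # from the slides
ssr = 483.7      # hypothetical: SST*R2 with SST = SSE/(1 - R2), R2 = 0.46

mse = sse / (n - k - 1)   # mean square error; ~1.55, matching the slide
msr = ssr / k             # mean square regression
f_stat = msr / mse        # compare to an F(k, n - k - 1) distribution
```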

Many Thanks to…

Ronald Fisher

One of the “giants” of statistics. Many things are named after him.

From Wikipedia: Anders Hald called him “a genius who almost single-handedly created the foundations for modern statistical science,” while Richard Dawkins named him “the greatest biologist since Darwin.”

In Class Exercise 1
  • This will be handed out in class.

Looking at Individual Coefficients
  • We want to determine the statistical significance of a single predictor in the model. Why?
  • We want to test for jth predictor:H0: Bj = 0Ha: Bj ≠ 0
  • We have two options:
    • Get a p-value
    • Get a confidence interval for Bj

Looking at Individual Coefficients

For testing H0: Bj = 0 versus Ha: Bj ≠ 0

  • Use the output to get the test statistic t = bj / se(bj), then compute the p-value by looking at a t-distribution with df = n – k – 1, and compare it with your α
  • Create a 100(1 – α)% CI: bj ± tα/2 · se(bj), where tα/2 comes from a t-distribution with df = n – k – 1
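A minimal sketch, with a made-up coefficient and standard error (the critical value 1.97 approximates the two-sided 5% t value for df = 367; in practice it comes from a table or software):

```python
b_j, se_bj = -1.59, 0.30   # hypothetical coefficient and its standard error
t_crit = 1.97              # approx. two-sided 5% critical value, df = 367

t_stat = b_j / se_bj       # test statistic for H0: Bj = 0
ci = (b_j - t_crit * se_bj, b_j + t_crit * se_bj)   # 95% CI for Bj
```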

Our Example, P-Values

se(b1), t-statistic, and p-value for the LTV variable

se(b2), t-statistic, and p-value for the CreditScore variable

se(b3), t-statistic, and p-value for the StatedIncome variable

se(b4), t-statistic, and p-value for the HomeValue variable

How about 95% confidence intervals?

Looking at Individual Coefficients
  • Looking at the previous slide, we see that LTV and CreditScore are statistically significant predictors.
  • Should we throw away the non-significant predictors?
  • Important: The tests for the individual regression coefficients (or predictors) assess the statistical significance of each predictor variable assuming that all other predictors are included in the regression.
  • It’s possible that you throw away a non-significant predictor, and your results for other predictors change!

Variable Selection
  • Variable selection is intended to select the “best” subset of predictors.
  • Motivation:
    • We want to select the simplest model that gets the job done.
    • We can avoid “multicollinearity”. More on this later.
    • Practical matters! Like what?
  • Can we simplify our subprime model?

Variable Selection Methods
  • Entire books are written on variable selection!
  • Here’s the simplest method, called backward elimination:
    • Start with the largest model (has all the predictors).
    • Remove the predictor with the largest p-value greater than αcrit; αcrit is usually around 0.10 to 0.20. (Why not 0.05?)
    • Refit and repeat; stop when all non-significant predictors have been removed.
  • What happens in our example?
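The loop below sketches backward elimination. In real use each step refits the model; here a hypothetical lookup table stands in for “refit and read off the p-values,” with invented numbers chosen to mirror the subprime example.

```python
ALPHA_CRIT = 0.15  # cutoff somewhere in the usual 0.10-0.20 range

def pvalues(predictors):
    """Stand-in for refitting on `predictors` and returning p-values
    (all numbers below are made up)."""
    table = {
        frozenset({"LTV", "CreditScore", "StatedIncome", "HomeValue"}):
            {"LTV": 0.001, "CreditScore": 0.001,
             "StatedIncome": 0.62, "HomeValue": 0.25},
        frozenset({"LTV", "CreditScore", "HomeValue"}):
            {"LTV": 0.001, "CreditScore": 0.001, "HomeValue": 0.30},
        frozenset({"LTV", "CreditScore"}):
            {"LTV": 0.001, "CreditScore": 0.001},
    }
    return table[frozenset(predictors)]

predictors = {"LTV", "CreditScore", "StatedIncome", "HomeValue"}
while True:
    pvals = pvalues(predictors)
    worst = max(pvals, key=pvals.get)   # predictor with the largest p-value
    if pvals[worst] <= ALPHA_CRIT:
        break                           # everything left is significant
    predictors.remove(worst)            # drop it and refit

print(sorted(predictors))  # ['CreditScore', 'LTV']
```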

Backward Elimination

StatedIncome & HomeValue are removed.

Full Model (Left) vs. New Model (Right)

APR = 23.73 - 1.59(LTV) - 0.018(CreditScore) + 0.0004(StatedIncome) - 0.00075(HomeValue)

APR = 23.69 - 1.58(LTV) - 0.019(CreditScore)

In summary: the remaining coefficients in the new model do not change much, and Se and R2 change only slightly.

Other Variable Selection
  • Forward selection: add in variables with the lowest p-value first (opposite of backward)
  • Criterion based: pick the model with the best “criterion” such as adjusted R squared.
  • All subsets!!! Try out every single combination and pick the model with the best “criterion”. You can use adjusted R squared as an example.
  • The cutting edge seems to be LASSO = Least Absolute Shrinkage and Selection Operator (Take more stats)

In Class Exercise 2

This is just the continuation of in class exercise 1.
