## Welcome to BUAD 310


Agenda & Announcement

- Today:
  - Finish up the problem from last time and finish off simple linear regression
  - Start multiple regression, Chapter 23

- Homework 6 is due today at 5 PM.

BUAD 310 - Kam Hamidieh

About Exam II

- NO CELL PHONES ARE ALLOWED.
- Two cheat sheets allowed, both sides, hand written.
- In class this Wednesday April 16.
- Coversheet will be posted by Monday; the exam has 33 questions.
- Print the z and t tables and bring them with you.
- Coverage: Lecture 12 (March 3) through the end of Lecture 21 (April 9), minus multiple regression, plus HW 4, 5, & 6.
- All Exam II relevant material will be posted by tomorrow morning.
- Scantrons will be passed out Monday; fill yours out before the exam, and do not bend it!
- We will spend all of Monday reviewing.
- Extended office hours:
  - Monday April 14: 4-6 PM
  - Tuesday April 15: 2-6 PM


CI and Tests for B1

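To test H0: B1 = 0 vs. Ha: B1 ≠ 0, either:

(1) Build the 100(1-α)% confidence interval for B1:

$$b_1 \pm t_{\alpha/2}\,\mathrm{se}(b_1)$$

where $t_{\alpha/2}$ comes from a t distribution with df = n-2, and reject H0 if the interval does not contain 0.

Or (2) compute the test statistic:

$$t = \frac{b_1}{\mathrm{se}(b_1)}$$

then get the p-value from a t distribution with df = n-2.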
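Both approaches can be sketched in plain Python. This is only an illustration under stated assumptions: it uses the usual simple-regression formulas se(b1) = Se / sqrt(Σ(xi − x̄)²) and Se = sqrt(SSE / (n − 2)), the data and the function name `slope_inference` are hypothetical, and the critical value 3.182 is the t-table entry for α = 0.05 with df = 3.

```python
import math

def slope_inference(x, y, t_crit):
    """Return (b1, se_b1, t_stat, ci) for a simple linear regression.

    t_crit is t_{alpha/2} with df = n - 2, looked up in a t table.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # least squares slope
    b0 = y_bar - b1 * x_bar             # least squares intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))      # residual standard deviation
    se_b1 = s_e / math.sqrt(sxx)        # standard error of the slope
    t_stat = b1 / se_b1                 # test statistic for H0: B1 = 0
    ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return b1, se_b1, t_stat, ci

# Hypothetical data, n = 5, so df = 3.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b1, se_b1, t_stat, ci = slope_inference(x, y, t_crit=3.182)
print(f"b1 = {b1:.3f}, se(b1) = {se_b1:.4f}, t = {t_stat:.2f}")
print(f"95% CI for B1: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The p-value itself still has to come from a t table or from software; the sketch stops at the test statistic.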


CI for Mean Response

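The 100(1-α)% confidence interval for the mean of Y at $x_{new}$ is:

$$\hat{y} \pm t_{\alpha/2}\,\mathrm{se}(\hat{y})$$

where $\hat{y} = b_0 + b_1 x_{new}$ and $t_{\alpha/2}$ comes from a t distribution with df = n-2,

and

$$\mathrm{se}(\hat{y}) = s_e \sqrt{\frac{1}{n} + \frac{(x_{new} - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$$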

We will generally use software.
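For intuition about what the software does, here is a minimal sketch in plain Python. It assumes the standard simple-regression formula se(ŷ) = Se · sqrt(1/n + (xnew − x̄)² / Σ(xi − x̄)²); the data, the function name `mean_response_ci`, and the critical value 3.182 (α = 0.05, df = 3) are illustrative assumptions.

```python
import math

def mean_response_ci(x, y, x_new, t_crit):
    """CI for the mean of Y at x_new in a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x_new
    # The standard error grows as x_new moves away from x_bar.
    se_fit = s_e * math.sqrt(1 / n + (x_new - x_bar) ** 2 / sxx)
    return y_hat, se_fit, (y_hat - t_crit * se_fit, y_hat + t_crit * se_fit)

# Hypothetical data, n = 5, so df = 3.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
T_CRIT = 3.182

fit_mid, se_mid, ci_mid = mean_response_ci(x, y, 3, T_CRIT)    # at x_bar
fit_edge, se_edge, ci_edge = mean_response_ci(x, y, 5, T_CRIT)  # at the edge
print(f"x_new = 3: fit {fit_mid:.2f}, CI ({ci_mid[0]:.2f}, {ci_mid[1]:.2f})")
print(f"x_new = 5: fit {fit_edge:.2f}, CI ({ci_edge[0]:.2f}, {ci_edge[1]:.2f})")
```

The interval is narrowest at x̄ and widens toward the edges of the data.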


- “Outliers are observations that stand away from the rest of the data and appear distinct in a plot.” Imprecise!
- They can have very strong influence in your final results.

[Figure: five plot panels, each labeled with its r² and Se: r² = 0.80, Se = 3.28; r² = 0.25, Se = 10; r² = 0.29, Se = 9.7; r² = 0.92, Se = 3.2; r² = 0.26, Se = 6.1]

- There are NO hard and fast rules on how to deal with outliers except: you should not just throw out outliers without SOLID justification.
- Check for data entry errors. (Not always possible!)
- Examine the physical context.
- Report your results with and without outliers.
- Standardized residuals can help identify outliers too.
- Transformations can help. (This will be discussed when we cover multiple regression.)
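The advice to report results with and without outliers can be sketched in plain Python. The data are hypothetical, with one planted outlier, and `fit_line` is an illustrative helper. The scaled residual used here is simply residual / Se, which is a simplification: standardized residuals reported by software usually also adjust for leverage.

```python
import math

def fit_line(x, y):
    """Least squares line; returns (b0, b1, s_e, residuals)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s_e = math.sqrt(sum(e * e for e in resid) / (n - 2))
    return b0, b1, s_e, resid

# Hypothetical data: the last point is a planted outlier.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 30.0]

b0_all, b1_all, se_all, resid_all = fit_line(x, y)
b0_trim, b1_trim, se_trim, _ = fit_line(x[:-1], y[:-1])

# Residuals in units of Se (leverage is ignored in this rough version).
scaled = [e / se_all for e in resid_all]
suspect = max(range(len(scaled)), key=lambda i: abs(scaled[i]))

print(f"with outlier:    slope {b1_all:.2f}, Se {se_all:.2f}")
print(f"without outlier: slope {b1_trim:.2f}, Se {se_trim:.2f}")
print(f"largest scaled residual is observation {suspect + 1}")
```

Reporting both fits side by side lets the reader see how much one point drives the slope and Se.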


Multiple Regression

- Simple linear regression:
  - One Y and one X; fit a line that gives the mean of Y for a given X
- Multiple regression:
  - One Y and multiple X’s; you have multiple predictors


Multiple Regression Model

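The observed response Y is linearly related to k explanatory variables X1, X2, …, Xk by the equation:

$$y = B_0 + B_1 x_1 + B_2 x_2 + \dots + B_k x_k + \varepsilon$$

- A single value of the response (y) comes from a linear combination of the k variables plus an error, ε.
- The errors are iid normal: ε ~ N(0, σε).

Given fixed values of the X’s, the mean of Y is equal to a linear combination of the X’s at those fixed values:

$$\mu_{y \mid x_1, \dots, x_k} = B_0 + B_1 x_1 + B_2 x_2 + \dots + B_k x_k$$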
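The statement about the mean of Y at fixed X's can be checked by simulation. In this sketch the coefficients, σε, the seed, and the function name `simulate_response` are all made up for illustration, with k = 2 predictors.

```python
import random

# Hypothetical parameters: B0, B1, B2 and the error standard deviation.
B = [5.0, 2.0, -1.0]
SIGMA = 1.5

def simulate_response(x, rng):
    """Draw one Y at predictor values x: linear combination plus normal error."""
    mean_y = B[0] + sum(bj * xj for bj, xj in zip(B[1:], x))
    return mean_y + rng.gauss(0.0, SIGMA)

rng = random.Random(310)          # fixed seed so the run is repeatable
x_fixed = (3.0, 4.0)              # hold the predictors fixed
draws = [simulate_response(x_fixed, rng) for _ in range(20000)]

sample_mean = sum(draws) / len(draws)
model_mean = B[0] + B[1] * x_fixed[0] + B[2] * x_fixed[1]   # 5 + 6 - 4 = 7
print(f"model mean of Y: {model_mean:.2f}, simulated mean: {sample_mean:.2f}")
```

Individual draws scatter around the model mean with standard deviation σε; only their average settles on the linear combination.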


Assumption (Redundant Slide?)

- Constant Variance Assumption: The variance of the error terms, σε², is the same for every combination of values of x1, x2, …, xk.
- Normality Assumption: The error terms follow a normal distribution for every combination of values of x1, x2, …, xk.
- Independence Assumption: The values of the error terms are statistically independent of each other.


Simple versus Multiple

| | Simple regression | Multiple regression |
|---|---|---|
| Data | (x1, y1), (x2, y2), …, (xn, yn) | (y1, x11, x12, …, x1k), (y2, x21, x22, …, x2k), …, (yn, xn1, xn2, …, xnk) |
| Assumed model | yi = B0 + B1 xi + εi, with εi ~ iid N(0, σε) | yi = B0 + B1 xi1 + B2 xi2 + … + Bk xik + εi, with εi ~ iid N(0, σε) |
| Parameters | B0, B1, σε | B0, B1, B2, …, Bk, σε |


Example (Page 615)

- Defaults in the subprime housing market brought down several financial institutions in 2008 (Lehman, Bear Stearns, and AIG) and led to a massive bailout of the financial system.
- Goal: A bank regulator wants to know how lenders are using credit scores to determine the rate of interest paid by subprime borrowers.
- The variables of interest are:
  - Y = APR, the annual percentage rate on the loan
  - X1 = LTV, the loan-to-value ratio: the loan amount as a fraction of the property's value. Values near 0 are “good”; values near 1 are “bad”.
  - X2 = Credit Score. The higher the better.
  - X3 = Income, in thousands of dollars
  - X4 = Home Value, in thousands of dollars

- The data are n = 372 mortgages obtained from a credit bureau.
- There are 4 predictors: k = 4.


“Pairs Plot”

APR seems linearly dependent on LTV and Credit Score and not so much on the other two.

Looking at the relationship between predictors is a good idea too.


Pairwise Correlations

Highest correlations are APR with LTV and Credit score.

Why are some of the boxes empty?
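A small sketch of how a pairwise correlation table is built, in plain Python. The numbers are made up and are NOT the subprime data; `pearson_r` is an illustrative helper. Because corr(X, Y) = corr(Y, X), printing only one triangle of the table loses no information, which is one common reason such tables have blank cells.

```python
import math

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical values for three of the variables (six made-up loans).
data = {
    "APR": [12.1, 10.5, 11.8, 9.9, 13.0, 10.2],
    "LTV": [0.55, 0.80, 0.60, 0.95, 0.40, 0.85],
    "CreditScore": [580, 640, 600, 690, 560, 660],
}
names = list(data)

# Print the lower triangle only; the upper triangle would repeat it.
for i, row in enumerate(names):
    cells = []
    for j, col in enumerate(names):
        if j <= i:
            cells.append(f"{pearson_r(data[row], data[col]):6.2f}")
        else:
            cells.append(" " * 6)
    print(f"{row:12s}" + " ".join(cells))
```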


Least Squares

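The values of B0, B1, …, Bk are estimated by the method of least squares:

$$\sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_{i1} - b_2 x_{i2} - \dots - b_k x_{ik}\right)^2$$

Pick b0, b1, …, bk so that this is as small as possible.

But where is the line?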


Least Squares Method

One Response Y, two predictors X1 & X2.

Method of least squares minimizes the vertical distances between the points and a plane.

(Picture from An Introduction to Statistical Learning with Applications in R by James, Witten, Hastie, Tibshirani)
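To make the plane fit concrete, here is a minimal least squares sketch in plain Python for one response and two predictors. It solves the normal equations (X'X)b = X'y by Gaussian elimination, which is one standard way to get the minimizer; statistical software typically uses more numerically careful methods. The data and the helper names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z

def least_squares(rows, y):
    """rows: one tuple of predictor values per observation -> [b0, ..., bk]."""
    design = [[1.0] + list(r) for r in rows]         # column of 1s for b0
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

def sse(rows, y, b):
    """Sum of squared vertical distances from the points to the plane."""
    return sum((yi - (b[0] + sum(bj * xj for bj, xj in zip(b[1:], r)))) ** 2
               for r, yi in zip(rows, y))

# Hypothetical data scattered around the plane y = 1 + 2*x1 - 3*x2.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
y = [-2.9, 2.1, -5.2, 0.1, -6.8, -2.1]

b = least_squares(rows, y)
print("b0, b1, b2 =", [round(v, 3) for v in b])
print("SSE at the least squares fit:", round(sse(rows, y, b), 4))
```

Nudging any coefficient away from the fitted value can only increase the SSE, which is what "as small as possible" means.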


Example Continued

The estimated regression model now is:

APR = 23.73 - 1.59(LTV) - 0.018(CreditScore) + 0.0004(StatedIncome) - 0.00075(HomeValue)

Note: y-hat gives the mean APR for a given set of predictor values.


Interpretation

APR = 23.73 - 1.59(LTV) - 0.018(CreditScore) + 0.0004(StatedIncome) - 0.00075(HomeValue)

b0 = 23.73:

When LTV = Credit Score = Stated Income = Home Value = 0, the mean APR is 23.73%.

b1 = -1.59:

Holding all other x variables fixed, when LTV goes up by 0.1, then on average APR goes down by 0.159 percentage points (1.59 × 0.1).

b2 = -0.018:

Holding all other x variables fixed, when Credit Score goes up by 1 unit, then on average APR goes down by 0.018 percentage points.

etc…….


Example

Suppose we observe a subprime borrower with the following characteristics:

- LTV = 0.90
- Credit Score = 650
- Stated Income = $45,000
- Home Value = $400,000

Our estimated model says that on average such a customer gets:

APR = 23.73 - 1.59(0.90) - 0.018(650) + 0.0004(45) -0.00075(400)

APR ≈ 10.32%
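The arithmetic above can be reproduced with a short Python sketch. The coefficients are the ones quoted on the slides; the function name `estimated_apr` and its argument names are illustrative. Stated income and home value enter in thousands of dollars, matching the variable definitions.

```python
# Coefficients as quoted in the estimated regression model.
COEF = {
    "Intercept": 23.73,
    "LTV": -1.59,
    "CreditScore": -0.018,
    "StatedIncome": 0.0004,    # income in $1000s
    "HomeValue": -0.00075,     # home value in $1000s
}

def estimated_apr(ltv, credit_score, stated_income_k, home_value_k):
    """Estimated mean APR (in %) for the given predictor values."""
    return (COEF["Intercept"]
            + COEF["LTV"] * ltv
            + COEF["CreditScore"] * credit_score
            + COEF["StatedIncome"] * stated_income_k
            + COEF["HomeValue"] * home_value_k)

apr = estimated_apr(0.90, 650, 45, 400)
print(f"Estimated mean APR: {apr:.2f}%")   # about 10.32%

# Slope interpretation: raise LTV by 0.1, hold everything else fixed.
change = estimated_apr(1.00, 650, 45, 400) - apr
print(f"Change in estimated APR: {change:.3f} percentage points")
```

The second calculation returns 0.1 × b1 no matter what values the other predictors are held at, which is the sense of "holding all other x variables fixed".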


In Class Exercise 1

Part (1): Refer to slide 15.

- What are the predictor and response values for the 9th observation?
- What are the values of y10, x24, and x11,3?

Part (2): Refer to slide 25.

- Interpret the slope term for the stated income variable.
- What is the estimated mean APR for a customer with LTV = 0.50, Credit Score = 600, Stated Income = $10,000, Home Value = $200,000?


Model Residuals

- Residuals are defined just as in the simple linear regression case: residual = observed – fitted.
- The official formula: $e_i = y_i - \hat{y}_i = y_i - (b_0 + b_1 x_{i1} + b_2 x_{i2} + \dots + b_k x_{ik})$
- What is the “picture” for residuals?


Standard Deviation of Residuals

- Compute the standard deviation of the residuals: $s_e = \sqrt{\dfrac{\sum_{i=1}^{n} e_i^2}{n-k-1}} = \sqrt{\dfrac{SSE}{n-k-1}} = \sqrt{MSE}$
- It has the same interpretation as before: it tells how far away your observed points are from the “plane” on average.
- Se estimates σε.
- The value n – k – 1 is called the residual degrees of freedom.
- SSE = Sum of Squares (due to) Error
- MSE = Mean Square (due to) Error = SSE / (n – k – 1)


Summarizing Results in a Table

- Residual degrees of freedom: n – k – 1 = 372 – 4 – 1 = 367
- SSE = 567.80
- MSE = 1.55
- Se = 1.24
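These summary values are linked by MSE = SSE / (n – k – 1) and Se = sqrt(MSE). A quick Python check, using only the numbers quoted above (the variable names are illustrative):

```python
import math

n, k = 372, 4          # observations and predictors in the subprime example
sse = 567.80           # sum of squared residuals, as quoted

df_resid = n - k - 1   # residual degrees of freedom
mse = sse / df_resid   # mean squared error
s_e = math.sqrt(mse)   # standard deviation of the residuals

print(f"df = {df_resid}, MSE = {mse:.2f}, Se = {s_e:.2f}")
```

Se is in the units of the response, so here it is in percentage points of APR.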


In Class Exercise 2

Again, refer to the subprime example.

- What is the residual for the 9th observation?
- What are the units of Se?
- Referring to question 1, how many standard deviations does this observed value fall below or above the estimated equation? (This is relative to Se.)
