Welcome to BUAD 310

Instructor: Kam Hamidieh. Lecture 24, Wednesday, April 23, 2014. Agenda and announcements for today: In-Class Exercise 2 from last time, continue with multiple regression, and talk about Exam 2 (time permitting).

Presentation Transcript


  1. Welcome to BUAD 310

    Instructor: Kam Hamidieh. Lecture 24, Wednesday, April 23, 2014.
  2. Agenda & Announcement
    Today:
    - In-Class Exercise 2 from last time
    - Continue with multiple regression
    - Talk about Exam 2 (time permitting)
    BUAD 310 - Kam Hamidieh
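  3. From Last Time
    The coefficient of determination $R^2$ is defined as:
    $R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$
    Its value tells us the percentage of variation in the response accounted for by the regression on the predictors.
    The adjusted $R^2$ compensates for adding too many predictors, since adding a predictor never decreases $R^2$:
    $R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}$
    Here n is the number of observations and k is the number of predictors.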
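The two quantities recapped on this slide can be computed by hand. The sketch below assumes the usual least-squares setting, where R squared equals 1 - SSE/SST, with SSE the sum of squared residuals and SST the total sum of squares around the mean of Y. The data values and the function names `r_squared` and `adjusted_r_squared` are made up for illustration and do not come from the lecture.

```python
def r_squared(y, y_hat):
    """Return 1 - SSE/SST for observed values y and fitted values y_hat."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    return 1 - sse / sst


def adjusted_r_squared(r2, n, k):
    """Penalize R^2 for the number of predictors k, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical observed and fitted values (NOT the APR data from the lecture).
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]

r2 = r_squared(y, y_hat)
r2_adj = adjusted_r_squared(r2, n=len(y), k=1)
print(f"R^2 = {r2:.4f}, adjusted R^2 = {r2_adj:.4f}")
```

The adjusted value is never larger than the plain one, which is why it is the safer number to compare across models with different numbers of predictors.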
  4. From Last Time
    The F-Test:
    H0: B1 = B2 = … = Bk = 0 (None of the predictors is statistically significant.)
    Ha: At least one of the Bi's ≠ 0 (At least one predictor is statistically significant.)
    For testing H0: Bj = 0 versus Ha: Bj ≠ 0, the test statistic is $t = b_j / SE(b_j)$, and the confidence interval for Bj is $b_j \pm t^{*} \, SE(b_j)$.
    (Both use the t-distribution with df = n – k – 1.)
  5. From Last Time
    Variable selection is intended to select the “best” subset of predictors.
    Backward elimination:
    - Start with the largest model (the one with all the predictors).
    - Remove the predictor with the largest p-value greater than αcrit, which is usually around 0.10 to 0.20.
    - Refit and repeat. Stop when all non-significant predictors have been removed.
    Clarification: In the context of multiple regression I use “slopes” and “coefficients” synonymously.
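The backward elimination steps can be sketched as a short loop. This is only an illustration under stated assumptions: `fit_p_values` is a hypothetical callback that you would supply yourself (for example, a wrapper around whatever regression software you use) and that returns one p-value per predictor in the current model. The predictor names `Income` and `LoanSize` and all p-values below are invented for the demo.

```python
def backward_elimination(predictors, fit_p_values, alpha_crit=0.15):
    """Drop the least significant predictor until all p-values are <= alpha_crit.

    fit_p_values(names) is a HYPOTHETICAL user-supplied function that refits
    the model with the given predictors and returns {name: p_value}.
    """
    remaining = list(predictors)
    while remaining:
        p_values = fit_p_values(remaining)
        worst = max(remaining, key=lambda name: p_values[name])
        if p_values[worst] <= alpha_crit:
            break  # every remaining predictor is significant
        remaining.remove(worst)  # drop it, then refit on the next pass
    return remaining


# Stand-in for a real regression fit: fixed, made-up p-values.
# In a real analysis the p-values change every time the model is refit.
FAKE_P_VALUES = {"LTV": 0.001, "CreditScore": 0.0001, "Income": 0.60, "LoanSize": 0.25}


def fake_fit(names):
    return {name: FAKE_P_VALUES[name] for name in names}


selected = backward_elimination(list(FAKE_P_VALUES), fake_fit, alpha_crit=0.15)
print(selected)
```

Passing the fit as a callback keeps the selection logic separate from the regression software, so the same loop works whichever tool produces the p-values.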
  6. Our Example Continued
    APR = 23.69 - 1.58 (LTV) - 0.019 (CreditScore)
  7. Building a good multiple regression model is partly science and partly art!
    “All models are wrong but some are useful.” (G. E. P. Box)
  8. Assumptions
    - Y is a linear combination of the predictors.
    - Constant Variance Assumption: The variance of the error terms, $\sigma_\varepsilon^2$, is constant.
    - Normality Assumption: The error terms follow a normal distribution.
    - Independence Assumption: The values of the error terms are statistically independent of each other.
  9. Checking Assumptions
    Y is linear in each predictor:
    - Look at the plot of Y against each X.
    - Plot the residuals versus each X and versus the fitted values.
    Constant Variance Assumption:
    - Plot the residuals versus each X and versus the fitted values.
    Normality Assumption:
    - Look at the histogram of the residuals.
    - Look at the Q-Q plot of the residuals.
    Independence Assumption:
    - If the residuals have time or spatial dependency, you can plot them in order and look for a pattern. (Not covered.)
    You should also look for anything “unusual” while checking the assumptions!
  10. Checking Assumptions
    APR appears linear in both LTV and Credit Score.
  11. Checking Assumptions
    I do not see any patterns. There is one point that seems to stand out. You can run the regression without this point and see what happens.
  12. Removing the Unusual Point
    [Regression output: Before Removal, After Removal]
    Removing the point improves the quality of the regression. The removed point corresponds to the observation with the highest APR value. I decided to leave it in.
  13. Checking Assumptions
    The residuals are not normally distributed. Oh no!
  14. Assumptions Not Met?
  15. (A Detour on) Transformations
    Below is a plot of fuel efficiency (MPG) versus speed (MPH) for a sample of 60 cars. Is a linear relationship reasonable?
  16. Transformations
    Black line: MPG = (intercept) + (slope) × MPH
    Red line: “smoothed” function
    Comments?
  17. Transformations
    Left: MPG = (intercept) + (slope) × MPH
    Right: MPG = (intercept) + (slope) × Log(MPH)
  18. Transformations
    - Transformation: re-expression of a variable by applying a function to each observation.
    - Transformations allow the use of regression analysis to describe a curved pattern.
    - You can transform Y or X or both, but the interpretation becomes difficult.
    - A nonlinear transformation useful in business applications: logarithms.
  19. More on Transformations
    - A decent idea always worth trying: if your data show skewness, taking logarithms could improve your overall results!
    - The process of choosing the right transformation is usually iterative.
    - What is the interpretation of the coefficients after a log transformation? (When you see “ln”, it means natural logarithm.)
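  20. Interpretation of Slope
    We can check that: ln(1 + little bit) ≈ little bit (Try it!)
    Remember: ln(b) – ln(a) = ln(b/a)
    With the log transformation we have: $\hat{y} = b_0 + b_1 \ln(x)$
    We increase x by 1%: $\hat{y}_{new} = b_0 + b_1 \ln(1.01\,x)$
    The change in y is: $\hat{y}_{new} - \hat{y} = b_1 [\ln(1.01\,x) - \ln(x)] = b_1 \ln(1.01) \approx 0.01\, b_1$
    When MPH goes up by 1%, on average MPG goes up by 0.0787 mpg.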
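The “Try it!” on this slide is easy to do numerically. The sketch below checks that ln(1 + x) is close to x for small x, then applies the same idea to a slope on ln(MPH). The slope value 7.87 is an assumption: it is not stated in the transcript and is only backed out from the quoted 0.0787 mpg change (0.01 × 7.87). The function name `change_in_y` is likewise made up for illustration.

```python
import math

# ln(1 + little bit) is approximately the little bit itself.
for little_bit in (0.1, 0.01, 0.001):
    print(f"ln(1 + {little_bit}) = {math.log(1 + little_bit):.6f}")

# ASSUMED slope on ln(MPH), inferred from the 0.0787 figure; not given in the slides.
b1 = 7.87


def change_in_y(slope, pct_increase_in_x):
    """Exact change in predicted y when x rises by the given percent."""
    return slope * math.log(1 + pct_increase_in_x / 100)


exact = change_in_y(b1, 1)  # b1 * ln(1.01)
shortcut = 0.01 * b1        # the "little bit" approximation
print(f"exact change = {exact:.4f}, shortcut = {shortcut:.4f}")
```

The shortcut is only good for small percentage changes. For a 50% increase in x, b1 × ln(1.5) is noticeably smaller than 0.5 × b1.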
  21. Going Back to the Real Estate…
    When I checked out the pairs plots (Slide 10) it was hard for me to see any nonlinear patterns. The histogram of APR is slightly right skewed. Let’s try ln(APR)!
  22. Transforming APR….
    You can just add a new column to your data by taking the log of the APR column. Now ln(APR) becomes our new Y.
  23. Before (Left) and After (Right)
    Before: APR = 23.69 - 1.58 (LTV) - 0.019 (CreditScore)
    After: ln(APR) = 3.45 - 0.12 (LTV) - 0.0016 (CreditScore)
    Note that both R squared and adjusted R squared improve after the transformation.
    The Se values are not directly comparable since the Y units are different.
  24. QQ Plots Before (Left) and After (Right)
    There is some slight improvement but it is not dramatic.
  25. In Class Exercise 1
    The equation for the model with no transformation is:
    APR = 23.69 - 1.58 (LTV) - 0.019 (CreditScore)
    After the log transformation, we have:
    ln(APR) = 3.45 - 0.12 (LTV) - 0.0016 (CreditScore)
    - Use the model without the log transformation to predict a borrower’s APR when LTV = 0.5 and credit score = 600.
    - Do the same prediction using the model with the transformed APR. Be careful! When you plug your values into the equation, you’ll get ln(APR) and not APR!
    - (HARD! I know you can do it!) What is the interpretation of the estimated slope of LTV (which is -0.12)? (Hint: You need to think through this and derive it. Take a look at slide 20. For a change in LTV use 0.1.)
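One way to work through the exercise numerically is sketched below. It uses only the two fitted equations quoted on the slide; the function names are made up for illustration. The key step for the log model is to exponentiate the prediction to get back from ln(APR) to APR.

```python
import math


def predict_apr_linear(ltv, credit_score):
    """Model without transformation: the result is APR directly."""
    return 23.69 - 1.58 * ltv - 0.019 * credit_score


def predict_apr_log(ltv, credit_score):
    """Log model: the equation gives ln(APR), so undo the log with exp()."""
    ln_apr = 3.45 - 0.12 * ltv - 0.0016 * credit_score
    return math.exp(ln_apr)


apr_linear = predict_apr_linear(0.5, 600)
apr_log = predict_apr_log(0.5, 600)
print(f"no transformation: APR = {apr_linear:.2f}")
print(f"log model:         APR = {apr_log:.2f}")

# Slope interpretation: raising LTV by 0.1 changes ln(APR) by -0.12 * 0.1,
# which multiplies the predicted APR by exp(-0.012).
ratio = math.exp(-0.12 * 0.1)
print(f"APR is multiplied by {ratio:.4f}, about a {100 * (1 - ratio):.1f}% decrease")
```

Because the change acts on ln(APR), the effect of LTV in the log model is multiplicative: the same 0.1 increase in LTV lowers the predicted APR by roughly the same percentage whatever the credit score is.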
  26. Just for Fun!!!
    [Plots: No Transformation, Transformed]
  27. WHAT IS MY GRADE?
  28. A Suggestion….
    From our syllabus:
    Suppose your exam 1 = 84%, exam 2 = 65%, HW 1-6 = 90%, class participation = 100%.
    YOU (not me!) predict your scores for the case, final, and HW 7 as follows: case = 85%, final = 80%, HW 7 = 90%.
    Your grade is: 0.20(84) + 0.20(65) + 0.15(90) + 0.05(100) + 0.10(85) + 0.30(80) = 80.8%
    At worst, you get a B-.
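The grade calculation is a weighted sum, so it is easy to redo with your own predicted scores. The sketch below takes the weights from the formula on this slide (the syllabus table itself is not in the transcript); the dictionary keys are just labels chosen for illustration.

```python
# (weight, score in percent); weights are read off the slide's formula.
components = {
    "exam 1": (0.20, 84),
    "exam 2": (0.20, 65),
    "homework": (0.15, 90),        # HW 1-6 and predicted HW 7 are both 90%
    "participation": (0.05, 100),
    "case": (0.10, 85),            # predicted
    "final": (0.30, 80),           # predicted
}

total_weight = sum(weight for weight, _ in components.values())
grade = sum(weight * score for weight, score in components.values())

print(f"weights sum to {total_weight:.2f}")
print(f"predicted course grade = {grade:.1f}%")
```

Change the predicted scores for the case, final, and homework to see how much the final exam (30% of the grade) moves the result.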