
09 Multiple Regression




  1. 09 Multiple Regression

  2. Learning goals Multiple regression • Statistical model of multiple regression • Multiple regression in R, including: • Multicollinearity • Influential points • Interactions between variables • Categorical variables (factors) • Model validation and selection, information criterion (basic theory and R) • Nonlinear regression - examples

  3. Multiple linear regression

  4. Model and estimation

  5. Notation X is an n ✕ (p+1) matrix; we assume its rank is p+1 and that n > p+1

  6. Notation What is the dimension of: • Xβ • y - Xβ • (y-Xβ)T(y-Xβ) What would grade, experience, and salary look like in this notation? X is an n ✕ (p+1) matrix
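As a sketch of the notation (with made-up toy numbers for the grade/experience/salary example), the design matrix X that lm() builds internally can be inspected with model.matrix():

```r
# Sketch (assumed toy data): the design matrix X with a leading column of
# ones for the intercept, so X is n x (p+1).
df <- data.frame(
  salary     = c(50, 62, 75, 81, 90),
  grade      = c(3.1, 3.5, 3.8, 3.9, 4.0),
  experience = c(1, 4, 7, 10, 12)
)
X <- model.matrix(salary ~ grade + experience, data = df)
dim(X)   # n = 5 rows, p + 1 = 3 columns
y <- df$salary
# Dimensions: X %*% beta is n x 1, y - X %*% beta is n x 1,
# and t(y - X %*% beta) %*% (y - X %*% beta) is 1 x 1
# (a scalar: the residual sum of squares).
```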

  7. Estimate the parameter vector β

  8. Using the result from the previous slide: H is called the hat matrix, r is the vector of residuals, and I is the identity matrix. (Optional) The proof of 9.12 is based on the result above and matrix calculations; see: https://web.stanford.edu/~mrosenfe/soc_meth_proj3/matrix_OLS_NYU_notes.pdf For 9.13, first show that HH = H (idempotency)
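A minimal sketch in R of the closed-form OLS estimator and the hat matrix, reusing the toy data frame from the notation example above (the data values are assumed, not from the slides):

```r
# Sketch: beta_hat = (X'X)^{-1} X'y, the hat matrix H, and residuals r = (I - H) y.
df <- data.frame(
  salary     = c(50, 62, 75, 81, 90),
  grade      = c(3.1, 3.5, 3.8, 3.9, 4.0),
  experience = c(1, 4, 7, 10, 12)
)
X <- model.matrix(salary ~ grade + experience, data = df)
y <- df$salary

beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y   # (X'X)^{-1} X'y
H <- X %*% solve(t(X) %*% X) %*% t(X)          # hat matrix: y_hat = H y
r <- (diag(nrow(X)) - H) %*% y                 # residuals: r = (I - H) y

all.equal(H %*% H, H)   # idempotency: HH = H
# Same coefficients as R's own least-squares fit:
all.equal(as.vector(beta_hat),
          unname(coef(lm(salary ~ grade + experience, data = df))))
```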

  9. Closer look at the estimator of β

  10. What are the risks of multiple predictors? How to choose the best model?

  11. Validation of the model Why do we need adjusted R2?
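For reference, the usual definitions with n observations and p predictors; plain R² never decreases when a predictor is added, so the adjusted version penalizes model size:

```latex
R^2 = 1 - \frac{\mathrm{SS}_{\text{res}}}{\mathrm{SS}_{\text{tot}}},
\qquad
R^2_{\text{adj}} = 1 - \left(1 - R^2\right)\frac{n-1}{n-p-1}.
```

Both values appear in the output of summary(lm(...)) in R as "Multiple R-squared" and "Adjusted R-squared".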

  12. Example - model selection

  13. How to check for unnecessary predictors?
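A sketch of the standard R tools for spotting predictors that add little, using the built-in mtcars data and a hypothetical model lm1 (the variable choice here is an assumption for illustration):

```r
# Sketch (assumed model): t-tests, single-term deletions, and a partial F-test.
lm1 <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

summary(lm1)            # t-tests: large p-values suggest a predictor may be unnecessary
drop1(lm1, test = "F")  # F-test for dropping each term in turn

lm2 <- update(lm1, . ~ . - drat - qsec)  # candidate smaller model
anova(lm2, lm1)         # partial F-test comparing the nested models
```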

  14. Model 1 Model 5

  15. Information criterion An information criterion balances the goodness of fit of an estimated model against its complexity, measured by the number of parameters. We assume that the distribution of the variables follows a known distribution with an unknown parameter θ. In maximum likelihood estimation, the larger the likelihood L(θ̂) or, equivalently, the smaller the negative log-likelihood −log L(θ̂), the better the model fits.

  16. AIC and BIC The Akaike information criterion (AIC) and the Bayesian information criterion (BIC)
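The two criteria, for a model with k estimated parameters and n observations (smaller is better):

```latex
\mathrm{AIC} = 2k - 2\log L(\hat\theta),
\qquad
\mathrm{BIC} = k\log n - 2\log L(\hat\theta).
```

BIC penalizes extra parameters more heavily than AIC once log n > 2 (i.e. n ≥ 8). In R, both are available as AIC(model) and BIC(model), and step() performs stepwise selection by AIC.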

  17. Example: model selection

  18. Nonlinear regression: examples

  19. Nonlinear regression: examples

  20. Nonlinear regression: examples
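The slides' nonlinear-regression examples are not recoverable from the transcript; as one representative sketch (with simulated data, not the lecture's data), an exponential decay can be fitted in R with nls():

```r
# Sketch (simulated data): fit y = a * exp(-b * x) + noise with nonlinear
# least squares; the true values a = 3, b = 1.2 are assumptions of this demo.
set.seed(1)
x <- seq(0, 5, length.out = 50)
y <- 3 * exp(-1.2 * x) + rnorm(50, sd = 0.1)

fit <- nls(y ~ a * exp(-b * x), start = list(a = 1, b = 1))
summary(fit)
coef(fit)   # estimates should land near a = 3, b = 1.2
```

Unlike lm(), nls() needs starting values and iterates, so convergence depends on a reasonable choice of start.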

  21. Categorical variables in the linear regression

  22. What is the expected salary for a female Prof in discipline B, 10 years after PhD, 15 years of service? What is the expected salary for a female AsstProf, 0 years of service, 0 years after PhD, discipline A? What do you think about yrs.since.phd and yrs.service?

  23. Categorical variables in the linear regression salary = b0 + b1*grade + b2*years_of_experience + b3*gender + b4*field(humanities/science/art) + e Coding (dummy variables) salary = b0 + b1*grade + b2*years_of_experience + b3*is_man + b4*is_science + b5*is_art + e
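A sketch of how R performs this dummy coding automatically for factors (the data values here are made up for illustration):

```r
# Sketch (assumed toy data): model.matrix() shows the 0/1 indicator columns
# R creates for a factor; the factor's first level is the baseline absorbed
# into the intercept, and each remaining level gets its own dummy column.
df <- data.frame(
  salary = c(60, 70, 80, 65, 90, 75),
  gender = factor(c("female", "male", "male", "female", "male", "female")),
  field  = factor(c("humanities", "science", "art",
                    "science", "art", "humanities"))
)
model.matrix(salary ~ gender + field, data = df)
# lm() uses exactly this coding, so coefficients are contrasts vs. the baseline:
lm(salary ~ gender + field, data = df)
```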

  24. Multicollinearity Multicollinearity refers to predictors that are correlated with other predictors. Warning signs: • A regression coefficient is not significant even though the variable should be highly correlated with Y. • When you add or delete an X variable, the regression coefficients change dramatically. • You see a negative regression coefficient when your response should increase along with X. • You see a positive regression coefficient when the response should decrease as X increases. • Your X variables have high pairwise correlations.

  25. Multicollinearity: variance-inflation factors vif(model)
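A sketch of the VIF check (assuming the car package is installed, which provides vif(); the model on mtcars is an assumed example):

```r
# Sketch: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
# predictor j on all the other predictors.
library(car)

model <- lm(mpg ~ wt + hp + disp, data = mtcars)
vif(model)   # values above roughly 5-10 are a common warning sign

# The same quantity computed by hand for wt:
r2_wt <- summary(lm(wt ~ hp + disp, data = mtcars))$r.squared
1 / (1 - r2_wt)
```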

  26. Influential points summary(influence.measures(lm1))
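A sketch of the influence diagnostics around the slide's influence.measures() call, on an assumed model lm1 fitted to mtcars:

```r
# Sketch (assumed model): standard influence diagnostics in base R.
lm1 <- lm(mpg ~ wt + hp, data = mtcars)

infl <- influence.measures(lm1)
summary(infl)           # observations flagged as influential by any criterion

cooks.distance(lm1)     # Cook's distance per observation
hatvalues(lm1)          # leverages: the diagonal of the hat matrix H
plot(lm1, which = 4)    # Cook's distance plot
```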

  27. Interactions How can we recognize existing interactions between variables?
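One common way to recognize an interaction, sketched on mtcars (the variable choice is an assumption): fit the model with and without the interaction term and compare, and look for non-parallel group-wise regression lines.

```r
# Sketch: in R, x1 * x2 expands to x1 + x2 + x1:x2.
fit_add <- lm(mpg ~ wt + factor(am), data = mtcars)   # additive model
fit_int <- lm(mpg ~ wt * factor(am), data = mtcars)   # with interaction
anova(fit_add, fit_int)   # F-test: does the interaction improve the fit?

# Visual check: crossing or non-parallel mean profiles suggest an interaction.
interaction.plot(mtcars$cyl, mtcars$am, mtcars$mpg)
```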

  28. Summary Use multiple regression in R. Interpret the output (also for categorical variables). Choose the important predictors. Check for multicollinearity, influential points, and interactions. Discuss whether the linear model is an appropriate choice for modelling a given data set.

  29. Admin • Evaluation: Exercises Intro. to Statistics Gr. 2 (Nr. 17217) – survey link: https://qmsl.uzh.ch/de/79VXV Introduction to Statistics (Nr. 17216) – survey link: https://qmsl.uzh.ch/de/FX4UV • Part A test exam • No lecture on 23 April; the lecture on 30 April will be given by Roman Flury • No office hours on 30 April.
