
The Independent Sign Bias: Gaining Insight from Multiple Linear Regression

Michael J. Pazzani and Stephen D. Bay, University of California, Irvine


Presentation Transcript


  1. The Independent Sign Bias: Gaining Insight from Multiple Linear Regression • Michael J. Pazzani and Stephen D. Bay • University of California, Irvine

  2. Background • Knowledge Discovery in Databases: The process of identifying valid, novel, useful, and understandable patterns in data • “Drowning in data, but starving for knowledge”

  3. Modeling Salaries • Social Science Professors: Salary = 45,647 - 66 YaT + 1784 YsD - 346 YsH • Baseball Players: Salary = -180 + 10 runs + 5 hits + 0.9 obp + 15 hr + 14 rbi - 0.8 ave - 18 db - 39 tr • Do the models make sense?

  4. Applications • Credit Scoring • Explanations of credit rejection are important • Medical “algorithms” • Models for predicting dementia levels from diagnostic tests • Models must be acceptable to administrator

  5. Goals • Identify conditions under which linear models of data prove credible. • Produce linear models that • are as accurate as standard regression techniques • are more acceptable to people knowledgeable about the domain

  6. Outline • Why are the signs wrong? • The Independent Sign Bias and Constrained Regression • Accuracy • Subject Ratings • Conclusions

  7. Why are the signs wrong? • computational error • numerical (rounding, truncation) • variance in estimates • coefficients do not differ significantly from zero • multicollinearity: predictor variables are highly correlated • true sign is reversed when other variables are considered
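
The multicollinearity case can be reproduced on a few rows of made-up data. The sketch below is illustrative only: the arrays `x1`, `x2`, and `y` are invented for this example and do not come from the presentation. `x2` tracks `x1` closely, so it has a positive slope on its own, yet its coefficient turns negative once `x1` is also in the model.

```python
import numpy as np

# Invented toy data (not from the presentation): x2 is x1 plus a small wiggle,
# so the two predictors are highly correlated.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = x1 + np.array([0.5, -0.5, 0.5, -0.5, 0.5, -0.5])
y = 3.0 * x1 - 1.0 * x2

# Slope of y on x2 alone (simple regression of y on one predictor).
slope_alone, _ = np.polyfit(x2, y, 1)

# Coefficients of the joint model y = b0 + b1*x1 + b2*x2.
design = np.column_stack([np.ones_like(x1), x1, x2])
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)

print(f"x2 alone:                {slope_alone:+.2f}")
print(f"x2 with x1 in the model: {coefs[2]:+.2f}")
```

The two printed slopes have opposite signs even though the data contain no noise, which is the kind of reversal a domain expert may find hard to accept.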

  8. The Independent Sign Bias Hypothesis: Models are more acceptable when the sign of each variable in the regression equation is the same as the sign of the variable in isolation
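
A minimal sketch of how the hypothesis could be checked for a fitted model. The helper names `isolated_sign` and `sign_violations` are hypothetical. The coefficients are copied from the baseball models in this presentation, but the isolated signs are an assumption made here for illustration (every statistic taken to correlate positively with salary); the presentation does not list them.

```python
def isolated_sign(xs, ys):
    """Sign (+1, -1, or 0) of the covariance between one predictor and the target."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (v - mean_y) for x, v in zip(xs, ys))
    return (cov > 0) - (cov < 0)


def sign_violations(coefs, isolated):
    """Names of predictors whose coefficient sign opposes their isolated sign."""
    return [name for name, b in coefs.items() if b * isolated[name] < 0]


# Coefficients as given on the slides (intercepts omitted).
mlr = {"runs": 10, "hits": 5, "obp": 0.9, "hr": 15,
       "rbi": 14, "ave": -0.8, "db": -18, "tr": -39}
isr = {"runs": 15, "hits": 0.8, "hr": 11, "rbi": 11, "ave": 0.33, "db": 5}

# ASSUMPTION (not stated in the presentation): each statistic, taken alone,
# is positively related to salary. With real data, use isolated_sign per column.
assumed_signs = {name: +1 for name in mlr}

print("MLR violations:", sign_violations(mlr, assumed_signs))
print("ISR violations:", sign_violations(isr, assumed_signs))
```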

  9. Constrained Regression • Multiple Linear Regression: Salary = -180 + 10 runs + 5 hits + 0.9 obp + 15 hr + 14 rbi - 0.8 ave - 18 db - 39 tr • Independent Sign Regression (ISR): minimize the squared error subject to each coefficient keeping the sign its variable has in isolation: Salary = -207 + 15 runs + 0.8 hits + 11 hr + 11 rbi + 0.33 ave + 5 db
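
A rough sketch of sign-constrained least squares, assuming ISR is ordinary least squares with each slope restricted to the sign its predictor has in isolation (the reading suggested by the hypothesis on slide 8). The function name `independent_sign_regression`, the toy data, and the exhaustive search are choices made for this example, not the authors' routine. The search relies on the fact that the constrained optimum equals an unconstrained fit on the predictors left nonzero, so trying every subset and keeping the best feasible fit is exact, though it needs 2^p fits and only suits small p.

```python
from itertools import combinations

import numpy as np


def independent_sign_regression(X, y):
    """Least squares with each slope forced to match its predictor's isolated sign."""
    n, p = X.shape
    # Isolated sign: sign of each column's covariance with the target.
    signs = np.sign((X - X.mean(axis=0)).T @ (y - y.mean()))
    best_sse, best_intercept, best_coef = np.inf, None, None
    for k in range(p + 1):
        for subset in combinations(range(p), k):
            cols = np.array(subset, dtype=int)
            design = np.column_stack([np.ones(n), X[:, cols]])
            beta, *_ = np.linalg.lstsq(design, y, rcond=None)
            if np.any(beta[1:] * signs[cols] < 0):
                continue  # some slope has the wrong sign: infeasible subset
            sse = float(np.sum((y - design @ beta) ** 2))
            if sse < best_sse:
                coef = np.zeros(p)
                coef[cols] = beta[1:]
                best_sse, best_intercept, best_coef = sse, beta[0], coef
    return best_intercept, best_coef, signs


# Invented toy data: x2 is nearly a copy of x1, and y = 3*x1 - x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = x1 + np.array([0.5, -0.5, 0.5, -0.5, 0.5, -0.5])
X = np.column_stack([x1, x2])
y = 3.0 * x1 - x2

intercept, coef, signs = independent_sign_regression(X, y)
print("isolated signs:", signs)
print("intercept:", round(float(intercept), 3), "slopes:", np.round(coef, 3))
```

On this toy data the unconstrained fit would give `x2` a negative slope, so the constrained fit pins it at zero, much as the ISR baseball model drops `obp` and `tr`.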

  10. Forward Selection • Forward Selection: Salary = -114 + 16 runs + 17 rbi - 59 tr • Independent Sign Forward Regression (ISFR): add variables as long as the constraint is not violated: Salary = -148 + 15 runs + 15 rbi
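
The "add variables as long as the constraint is not violated" rule might look roughly like the greedy loop below. This is a sketch under stated assumptions: the names `fit` and `forward_select`, the `min_gain` stopping threshold, and the toy data are all invented here, and the presentation does not say which stopping criterion the authors used.

```python
import numpy as np


def fit(X, y, cols):
    """OLS with an intercept on the chosen columns; returns (beta, SSE)."""
    idx = np.array(cols, dtype=int)
    design = np.column_stack([np.ones(len(y)), X[:, idx]])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta, float(np.sum((y - design @ beta) ** 2))


def forward_select(X, y, enforce_signs=True, min_gain=1e-9):
    """Greedy forward selection; optionally reject fits with a wrong-signed slope."""
    p = X.shape[1]
    # Isolated sign: sign of each column's covariance with the target.
    signs = np.sign((X - X.mean(axis=0)).T @ (y - y.mean()))
    selected = []
    _, sse = fit(X, y, selected)
    while len(selected) < p:
        best = None
        for j in range(p):
            if j in selected:
                continue
            cols = selected + [j]
            beta, cand_sse = fit(X, y, cols)
            if enforce_signs and np.any(beta[1:] * signs[cols] < 0):
                continue  # adding j would violate the sign constraint
            if best is None or cand_sse < best[1]:
                best = (j, cand_sse)
        # ASSUMPTION: stop when no feasible candidate reduces SSE by min_gain.
        if best is None or sse - best[1] <= min_gain:
            break
        selected.append(best[0])
        sse = best[1]
    beta, _ = fit(X, y, selected)
    return selected, beta


# Invented toy data: x2 is nearly a copy of x1, and y = 3*x1 - x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = x1 + np.array([0.5, -0.5, 0.5, -0.5, 0.5, -0.5])
X = np.column_stack([x1, x2])
y = 3.0 * x1 - x2

fs_cols, fs_beta = forward_select(X, y, enforce_signs=False)
isfr_cols, isfr_beta = forward_select(X, y, enforce_signs=True)
print("plain forward selection keeps columns:", fs_cols)
print("sign-constrained selection keeps columns:", isfr_cols)
```

Plain forward selection adds `x2` with a negative slope, while the constrained variant stops after `x1`, mirroring how the ISFR baseball model omits the negatively weighted `tr` term.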

  11. Constrained Regression • Mean Coefficient Regression (MCR): Salary = -162 + 4 runs + 2 hits + 1.1 obp + 10 hr + 3 rbi + 1.2 ave + 9 db + 16 tr

  12. Accuracy of Regression • MLR violations: 7, 4, 3, 1, 3, 5

  13. Accuracy: Forward Selection • FS violations: 0, 1, 1, 0, 0, 2

  14. Baseball Salary Experiment • 47 subjects

  15. Experiment Results • F(4,184) = 22.11, p < 0.0001 • All differences significant with Tukey-Kramer test at 0.05 level • ISR > MLR • MCR > MLR • ISFR > SFR

  16. Biasing KDD to improve understandability • Related Work • Clark, P. & Matwin, S. (1993). Using Qualitative Models to Guide Inductive Learning. MLC 49-56. • Monotonicity Constraints: Pazzani, Subramani and Shankle. Proc. Cog Sci 1996.

  17. Conclusions • The independent sign bias affects the willingness of subjects to use linear models • new constrained regression routines • as accurate as unconstrained regression • more acceptable to users
