Practical Model Selection and Multi-model Inference using R

Presented by: Eric Stolen and Dan Hunt.

Presentation Transcript


  1. Practical Model Selection and Multi-model Inference using R Presented by: Eric Stolen and Dan Hunt

  2. Foundation: Theory, hypotheses, and models

  3. Theory • This is the link with science, which is about understanding how the world works

  4. Theory • “A set of propositions set out as an explanation.” • “Theories are generalizations.” • “Theories contain questions.” • “Theories continually change…” (Ford, E. D. 2000. Scientific Method for Ecological Research. Cambridge University Press.)

  5. Theory • Example 1 – Wading bird foraging: • Ideal Free Distribution • Marginal Value Theorem • Scramble Competition

  6. Theory • Example 2 – Indigo Snake Habitat selection • Animal perception • Evolutionary Biology • Population Demography

  7. Hypotheses • There are many views, which can be confusing! • A hypothesis is a statement, derived from scientific theory, that postulates something about how the world works • A testable hypothesis is one that can be falsified by a contradiction between a prediction derived from the hypothesis and data measured in the appropriate way

  8. Hypotheses • To use the information-theoretic toolbox, we must be able to state a hypothesis as a statistical model (more precisely, as an equation from which we can calculate the maximized likelihood of the model given the data)

  9. Multiple Working Hypotheses • We operate with a set of multiple alternative hypotheses (models) • The many advantages include safeguarding objectivity and allowing rigorous inference • Chamberlin (1890) • Strong Inference – Platt (1964) • Karl Popper (ca. 1960) – bold conjectures

  10. Deriving the model set • This is the tough part (but also the creative part) • much thought needed, so don’t rush • collaborate, seek outside advice, read the literature, go to meetings… • How and When hypotheses are better than What hypotheses (strive to predict rather than describe)

  11. Models – Indigo Snake example • Study of indigo snake habitat use • Response variable: home range size, ln(ha) • SEX • Land cover – 2-3 levels (lc2) • weeks = effort/exposure • Science question: “Is there a seasonal difference in habitat use between sexes?”

  12. Models – Indigo Snake example • SEX • land cover type (lc2) • weeks • SEX + lc2 • SEX + weeks • lc2 + weeks • SEX + lc2 + weeks • SEX + lc2 + SEX * lc2 • SEX + lc2 + weeks + SEX * lc2

  14. Models – Indigo Snake example • SEX • land cover • weeks • SEX + land cover • SEX + weeks • land cover + weeks • SEX + land cover + weeks • SEX + land cover + SEX * land cover • SEX + land cover + weeks + SEX * land cover

  15. Models – fish habitat use example • Study of fish habitat use in salt marsh • Response variable: density, ln(fish m⁻² + 1) • Habitat – vegetated or unvegetated • Site – 7 impoundments • Season – 4 seasons • Science questions: • “Is there evidence for a difference in density between habitats?” • “Is there a seasonal difference in habitat use by resident marsh fish?”

  16. Models – fish habitat use example • Site + Season + Habitat + Site*Habitat + Season*Habitat + Site*Season • Site + Season + Habitat + Site*Habitat + Season*Habitat • Site + Season + Habitat + Site*Season + Site*Habitat • Site + Season + Habitat + Site*Season + Season*Habitat • Site + Season + Habitat + Site*Habitat • Site + Habitat + Site*Habitat • Site + Season + Habitat + Season*Habitat • Season + Habitat + Season*Habitat • Site + Season + Habitat + Site*Season • Site + Season + Site*Season • Site + Season + Habitat • Site + Season • Site + Habitat • Season + Habitat • Site • Season • Habitat

  19. The importance of a priori thinking…You can’t go back home!

  20. Modeling • Trade-off between precision and bias • The goal is to derive knowledge and advance learning, not to “fit the data” • The sophistication of the model should be matched to the data (quantity and quality)

  21. Precision-Bias Trade-off [Figure: Bias² plotted against model complexity (increasing number of parameters)]

  22. Precision-Bias Trade-off [Figure: variance and Bias² plotted against model complexity (increasing number of parameters)]

  24. Kullback-Leibler Information • Basic concept from information theory • The information lost when a model is used to represent full reality • Can also think of it as the distance between a model and full reality (a directed distance: it is not symmetric, so it is not a true metric)
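A minimal sketch of K-L information for discrete distributions may help. It is written in Python rather than R purely for illustration; `truth`, `g1`, and `g2` are made-up probability vectors, and `kl_information` is a hypothetical helper, not a function from the talk. In practice truth is never known, which is the gap Akaike's work addresses.

```python
import math

def kl_information(f, g):
    """K-L information I(f, g) = sum_x f(x) * log(f(x) / g(x)), in nats.

    f: "true" probabilities; g: model probabilities over the same outcomes.
    Outcomes with f(x) == 0 contribute nothing.
    """
    if len(f) != len(g):
        raise ValueError("f and g must cover the same outcomes")
    total = 0.0
    for fx, gx in zip(f, g):
        if fx > 0:
            if gx <= 0:
                return math.inf  # the model rules out something that can happen
            total += fx * math.log(fx / gx)
    return total

truth = [0.5, 0.3, 0.2]   # hypothetical "full reality"
g1 = [0.45, 0.35, 0.2]    # hypothetical model close to truth
g2 = [0.2, 0.3, 0.5]      # hypothetical model far from truth

print(kl_information(truth, g1))     # small: little information lost
print(kl_information(truth, g2))     # larger: more information lost
print(kl_information(truth, truth))  # no information lost
```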

  25. Kullback-Leibler Information [Figure: truth/reality with candidate models G1 (best model in set), G2, and G3]

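  28. Kullback-Leibler Information [Figure: truth/reality with candidate models G1 (best model in set), G2, and G3] The relative difference between models is constant: every model's K-L distance contains the same unknown term that depends only on truth, so differences between models can be estimated even though absolute distances cannot.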
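The cancellation can be checked numerically. Writing I(f, g) = E_f[log f] − E_f[log g], the first term depends only on truth f, so it drops out of any difference between two models. The Python sketch below uses made-up discrete distributions; `expected_log` and `kl_information` are hypothetical helper names.

```python
import math

def expected_log(f, g):
    """E_f[log g(x)] = sum_x f(x) * log g(x); assumes g(x) > 0 wherever f(x) > 0."""
    return sum(fx * math.log(gx) for fx, gx in zip(f, g) if fx > 0)

def kl_information(f, g):
    """I(f, g) = E_f[log f] - E_f[log g]."""
    return expected_log(f, f) - expected_log(f, g)

truth = [0.5, 0.3, 0.2]   # hypothetical truth f
g1 = [0.45, 0.35, 0.2]    # hypothetical candidate models
g2 = [0.2, 0.3, 0.5]

# The absolute K-L distances need E_f[log f], a property of truth alone,
# but that term cancels in the difference between two models:
diff_kl = kl_information(truth, g2) - kl_information(truth, g1)
diff_without_truth_term = expected_log(truth, g1) - expected_log(truth, g2)
print(diff_kl, diff_without_truth_term)  # equal up to floating-point rounding
```

With real data f is unknown, so E_f[log g] itself must be estimated; that is the role the maximized log-likelihood plays in AIC.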

  29. Akaike’s Contributions • Figured out how to estimate the relative Kullback-Leibler distance between models in a set of models • Figured out how to link maximum likelihood estimation theory with expected K-L information • An Information Criterion (Akaike’s Information Criterion, AIC) • $\mathrm{AIC} = -2\log_e\big(\mathcal{L}(g_i \mid \text{data})\big) + 2K$, where K is the number of estimated parameters
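In R, the maximized log-likelihood and AIC of a fitted model are available from `logLik()` and `AIC()`. The Python sketch below only reproduces the arithmetic for a simple i.i.d. normal model (K = 2: mean and variance) on made-up data; `aic` and `normal_max_loglik` are hypothetical helper names.

```python
import math

def aic(log_likelihood, k):
    """AIC = -2 * log_e(L) + 2K, where log_likelihood is the maximized log-likelihood."""
    return -2.0 * log_likelihood + 2.0 * k

def normal_max_loglik(y):
    """Maximized log-likelihood of an i.i.d. normal model (mean and variance by ML)."""
    n = len(y)
    mu = sum(y) / n
    sigma2 = sum((v - mu) ** 2 for v in y) / n  # the ML variance divides by n, not n - 1
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)

y = [2.1, 2.9, 3.4, 2.5, 3.1, 2.8]  # hypothetical data
ll = normal_max_loglik(y)
print(aic(ll, k=2))  # K = 2: mean and variance
```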

  32. I-T mechanics $\mathrm{AIC}_{c,i} = -2\log_e\big(\mathcal{L}(g_i \mid \text{data})\big) + 2K\left(\frac{n}{n-K-1}\right)$, or equivalently $\mathrm{AIC}_{c,i} = \mathrm{AIC}_i + \frac{2K(K+1)}{n-K-1}$ (where K = the number of parameters estimated and n = the sample size)
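The two AICc forms on this slide are algebraically identical, since 2Kn/(n − K − 1) = 2K + 2K(K + 1)/(n − K − 1). The Python sketch below checks this numerically; the function names and the values of log L, K, and n are hypothetical.

```python
def aicc_from_loglik(log_likelihood, k, n):
    """AICc = -2 log L + 2K * n / (n - K - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= K + 1")
    return -2.0 * log_likelihood + 2.0 * k * n / (n - k - 1)

def aicc_from_aic(aic_value, k, n):
    """Equivalent form: AICc = AIC + 2K(K + 1) / (n - K - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= K + 1")
    return aic_value + 2.0 * k * (k + 1) / (n - k - 1)

ll, k, n = -35.2, 4, 30  # hypothetical maximized log-likelihood, K, and sample size
aic = -2 * ll + 2 * k
print(aicc_from_loglik(ll, k, n), aicc_from_aic(aic, k, n))  # the same number
```

As n grows relative to K, the correction term shrinks toward zero and AICc approaches AIC.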

  33. I-T mechanics $\mathrm{AIC}_{c,\min}$ = the AICc of the model with the lowest AICc value • $\Delta_i = \mathrm{AIC}_{c,i} - \mathrm{AIC}_{c,\min}$

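  34. I-T mechanics $w_i = \Pr\{g_i \mid \text{data}\}$, the Akaike weight or model probability, computed as $w_i = \exp(-\Delta_i/2) \big/ \sum_{r=1}^{R}\exp(-\Delta_r/2)$ for the R models in the set • evidence ratio of model i to model j = $w_i / w_j$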
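The Δ values, Akaike weights, and evidence ratios can be computed in a few lines. This Python sketch uses made-up AICc values for three hypothetical models; `akaike_weights` is an illustrative helper, not an R function.

```python
import math

def akaike_weights(aicc_values):
    """Return (deltas, weights) for a list of AICc values, one per model."""
    best = min(aicc_values)
    deltas = [a - best for a in aicc_values]        # Delta_i = AICc_i - AICc_min
    rel_lik = [math.exp(-d / 2.0) for d in deltas]  # relative likelihood of each model
    total = sum(rel_lik)
    weights = [r / total for r in rel_lik]          # w_i, sums to 1 over the model set
    return deltas, weights

aicc = [102.3, 104.1, 107.8]  # hypothetical AICc values for models g1, g2, g3
deltas, w = akaike_weights(aicc)
evidence_ratio_12 = w[0] / w[1]  # equals exp((Delta_2 - Delta_1) / 2)
print(deltas, w, evidence_ratio_12)
```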

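  35. I-T mechanics Least-squares regression: $\mathrm{AIC}_c = n\log_e(s^2) + 2K\left(\frac{n}{n-K-1}\right)$, where $s^2 = \mathrm{RSS}/n$. This drops the additive constant $n(1 + \log_e 2\pi)$ from $-2\log_e \mathcal{L}$; because that constant is the same for every model fit to the same data, it cancels in the $\Delta_i$.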
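To see why the constant can be dropped, the sketch below computes AICc for two hypothetical least-squares models both ways: from RSS alone and from the full maximized Gaussian log-likelihood. The RSS and K values are invented, and both function names are hypothetical; the two versions differ by the same n(1 + log_e 2π) for every model, so Δ values are unaffected.

```python
import math

def aicc_least_squares(rss, n, k):
    """AICc = n*log(s2) + 2K*n/(n-K-1), with s2 = RSS/n (additive constant omitted)."""
    s2 = rss / n
    return n * math.log(s2) + 2.0 * k * n / (n - k - 1)

def aicc_full_gaussian(rss, n, k):
    """The same model, using the full maximized Gaussian log-likelihood."""
    s2 = rss / n
    log_lik = -0.5 * n * (math.log(2 * math.pi * s2) + 1)
    return -2.0 * log_lik + 2.0 * k * n / (n - k - 1)

n = 25
models = {"small": (48.0, 3), "large": (41.5, 5)}  # hypothetical (RSS, K) pairs
for name, (rss, k) in models.items():
    short = aicc_least_squares(rss, n, k)
    full = aicc_full_gaussian(rss, n, k)
    # The gap is n*(1 + log(2*pi)) for every model fit to the same n observations.
    print(name, short, full, full - short)
```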

  36. I-T mechanics Counting parameters: K = number of parameters estimated • Least-squares regression: K = number of predictor coefficients + 2 (for the intercept and the residual variance s²)

  37. I-T mechanics Counting parameters: K = number of parameters estimated • Logistic regression: K = number of predictor coefficients + 1 (for the intercept)
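The counting rules can be written as a small hypothetical helper. The example applies them to the indigo snake model SEX + weeks, assuming it is fit by least squares to ln(home range) and that SEX enters as a single treatment-coded term; `count_k` and `n_slopes_for_factor` are illustrative names only.

```python
def count_k(n_predictor_coefs, family):
    """K = predictor coefficients + intercept (+ residual variance for least squares)."""
    if family == "least_squares":
        return n_predictor_coefs + 2  # intercept and residual variance s^2
    if family == "logistic":
        return n_predictor_coefs + 1  # intercept only; no separate variance parameter
    raise ValueError(f"unknown family: {family}")

def n_slopes_for_factor(n_levels):
    """A factor with L levels contributes L - 1 coefficients under treatment coding."""
    return n_levels - 1

# Model "SEX + weeks": SEX is a 2-level factor (1 coefficient), weeks is numeric (1).
k = count_k(n_slopes_for_factor(2) + 1, "least_squares")
print(k)  # 4
```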

  38. I-T mechanics Counting Parameters: Non-identifiable parameters

  39. Comparing Models

  40. Comparing Models Combined model weight = 0.995

  41. Comparing Models Evidence Ratio = 4.52

  42. Comparing Models

  43. Comparing Models Evidence Ratio = 3.03

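  44. Comparing Models Evidence ratio = 4.28: the summed weights of one group of models, (0.34 + 0.22 + 0.14 + 0.08), relative to those of another, (0.11 + 0.04 + 0.02 + 0.01) (weights shown rounded to two decimals)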
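Evidence ratios for groups of models are ratios of summed Akaike weights. The Python sketch below uses the rounded weights from this slide; with those rounded inputs the ratio comes out near 4.33 rather than 4.28, presumably because the slide was computed from unrounded weights. `summed_evidence_ratio` is a hypothetical helper.

```python
# Akaike weights as printed on the slide (rounded to two decimals)
group_a = [0.34, 0.22, 0.14, 0.08]
group_b = [0.11, 0.04, 0.02, 0.01]

def summed_evidence_ratio(weights_a, weights_b):
    """Evidence ratio for two groups of models: sum of w_i in A over sum of w_i in B."""
    return sum(weights_a) / sum(weights_b)

print(round(summed_evidence_ratio(group_a, group_b), 2))
```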

  45. Generalized Linear Models

  46. Mathematical details • Three parts to a GLM • Link function • linear equation • error distribution

  47. Mathematical details • General Linear Models – linear regression and ANOVA • Link function – identity link • Linear equation • Error distribution – normal (Gaussian) distribution: $Y = b_0 + b_1X_1 + b_2X_2 + e$

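  48. Mathematical details • Logistic regression • Link function – logit link: $\ln\big(p/(1-p)\big)$ • Linear equation • Error distribution – binomial distribution: $\operatorname{logit}(p) = b_0 + b_1X_1 + b_2X_2$ (no additive error term; the randomness comes from the binomial distribution of the response)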
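The three GLM parts can be sketched directly: a link function, a linear equation, and an error distribution that defines the likelihood. In R the fitting itself would be done with `glm(..., family = binomial)`; the Python below only evaluates the binomial log-likelihood for given coefficients, which are made up rather than fitted, so the result is not a maximized likelihood and should not be plugged into AIC as is. All function names are hypothetical.

```python
import math

def logit(p):
    """Logit link: ln(p / (1 - p)), maps (0, 1) onto the real line."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse logit: maps a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def gaussian_identity_mean(x1, x2, b0, b1, b2):
    """Identity link: the mean of Y is the linear equation itself."""
    return [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]

def bernoulli_loglik(y, x1, x2, b0, b1, b2):
    """Log-likelihood of 0/1 responses y under logit(p) = b0 + b1*x1 + b2*x2.

    Binomial (n = 1) error: there is no additive error term on the logit scale.
    """
    ll = 0.0
    for yi, a, b in zip(y, x1, x2):
        p = inv_logit(b0 + b1 * a + b2 * b)
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return ll

y = [1, 0, 1, 1, 0]               # hypothetical 0/1 responses
x1 = [0.5, 1.2, 2.3, 3.1, 0.8]    # hypothetical predictors
x2 = [1, 0, 1, 1, 0]
print(bernoulli_loglik(y, x1, x2, b0=-0.5, b1=0.8, b2=0.3))
```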

  49. Mathematical details • What types of models can be compared within a single I-T analysis? • The data must be fixed, including the response (e.g. not y in some models and ln(y) in others) • Must be able to calculate the maximum likelihood • (there are ways to deal with quasi-likelihood, e.g. QAIC) • Models do not need to be nested • In some cases AIC is additive

  50. Model Fitting Preliminaries • Understanding the data/variables • Avoid data dredging! • safe data screening practices • Detect outliers, scale issues, collinearity • Tools in R
