Model Evaluation and Selection

Presentation Transcript


  1. Model Evaluation and Selection

  2. Example Objective: Demonstrate how to evaluate a single model and how to compare alternative models.

  3. Evaluating the Sufficiency of a Single Model (follow-up to the Mediation Test example). When this model is run, a variety of measures of model fit will be generated. An important question is, "Is the fit of the model sufficiently good to yield reliable results?" The alternative model is one in which there is also an arrow from s_age to tcov. In other words, does fire severity explain the effect of stand age on cover, or is there another pathway of influence independent of fire severity?

  4. Finding Measures of Model Fit in Amos I. The model chi-square is the most commonly used measure of absolute model fit. It is always good to check the section of the output called “Notes for Model”. Here we can see that a minimum was achieved, along with the full p-value for the chi-square. A p-value greater than 0.05 suggests that we could accept this model (it indicates no major deviations between data and model).

  5. Further Considerations of Model Chi-square. It is well known that model p-values are not always the best way to decide if a model is adequate (in an absolute sense) or the best model (in a relative sense). This is a complex topic and one that lacks complete consensus. What is generally agreed upon is: (1) Chi-squares automatically increase with increasing sample size, and p-values reflect the increasing power for detecting deviations. (2) P-values for model chi-squares are reasonably useful when sample sizes are less than 200, especially for models that do not include latent variables with multiple indicators. (3) It is advisable to look at multiple measures of fit.

  6. Further Considerations (cont.). One useful way to evaluate model adequacy is to see whether adding a pathway causes the model chi-square to drop by more than 3.84 units, the 0.05 critical value of chi-square with one degree of freedom. This is the “single-degree-of-freedom chi-square test”. If adding a path reduces the chi-square by less than 3.84, it implies that the added path is not strongly supported by the data. In the current example, the model chi-square is 3.243, which tells us that adding a path from s_age to tcov could reduce the model chi-square by at most 3.243. This further indicates that our model could be considered adequate.
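The single-degree-of-freedom test described above can be checked by hand. The following is a minimal sketch, not Amos output: the function name `chi2_sf_df1` is made up for illustration, and it relies on the fact that a chi-square variate with one degree of freedom is the square of a standard normal variate.

```python
import math

def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    # With df = 1, chi-square is the square of a standard normal Z, so
    # P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))

CRITICAL_05 = 3.84        # threshold quoted on the slide
chi_square_drop = 3.243   # largest possible drop from adding s_age -> tcov (from the slide)

p_value = chi2_sf_df1(chi_square_drop)
path_supported = chi_square_drop > CRITICAL_05

print(f"p = {p_value:.3f}; added path supported at the 0.05 level: {path_supported}")
```

The p-value comes out at roughly 0.07, above 0.05, which matches the slide's reading that the extra path is not strongly supported.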

  7. Finding Measures of Model Fit in Amos II. “Cmin” means minimum chi-square. The Model Fit tab gives us several measures to consider.

  8. (continued) Clicking on labels gives additional info.

  9. (continued) The RMSEA indicates “close” fit, and also that a value of 0 (perfect fit) cannot be ruled out. The AIC for our model (the “default” model) is 13.243, which could only be reduced to 12.000 by saturating the model. The difference of 1.243 is less than the minimum recommended AIC difference of 2.0, suggesting that the two models are indistinguishable. BUT, AIC is often not a reliable measure.
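The AIC values on this slide can be reproduced with the form of AIC that Amos reports, the model chi-square plus twice the number of free parameters. In this sketch the parameter counts (5 for the default model, 6 for the saturated model) are assumptions not stated in the slides; they are chosen because three observed variables give six distinct variances and covariances, and because they reproduce the reported values. The function name `aic_from_chi_square` is made up for illustration.

```python
def aic_from_chi_square(chi_square: float, n_free_params: int) -> float:
    """AIC in the chi-square form: model chi-square plus 2 per free parameter."""
    return chi_square + 2 * n_free_params

# Chi-square values are from the slides; the parameter counts are assumed.
default_aic = aic_from_chi_square(3.243, 5)    # model without s_age -> tcov
saturated_aic = aic_from_chi_square(0.0, 6)    # saturated model fits perfectly

aic_difference = default_aic - saturated_aic
print(f"default = {default_aic:.3f}, saturated = {saturated_aic:.3f}, "
      f"difference = {aic_difference:.3f}")
```

Under these assumptions the difference is 1.243, below the 2.0 threshold mentioned on the slide.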

  10. (continued some more) The CAIC (consistent AIC) is generally viewed as a better measure than AIC. Here we see that the default model value is more than 2.0 units smaller than that of the saturated model, supporting the conclusion that our model is adequate.

  11. (and still some more) The BIC (Bayesian Information Criterion) is one of the more popular measures at the moment. In this case, the saturated model BIC is only 1.257 greater than the default model BIC, which is less than the 2.0 difference recommended for picking among models. This index tells us that while the evidence is better for the default model, the saturated model can’t be ruled out.

  12. (and even still some more) The Hoelter index relates back to our model chi-square and its p-values. It tells us that at a sample size of 106, we would have enough power to detect an additional path from s_age to tcov with a p-value less than 0.05. A sample size of 183 would be required to obtain a p-value less than 0.01.
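One common way to compute a critical N of this kind is to ask how large the sample could get, holding the discrepancy per observation fixed, before the model chi-square reaches the critical value. The sketch below follows that idea. The slides do not state the sample size, so `ASSUMED_N = 90` is a hypothetical value used only for illustration, and whether Amos uses exactly this formula and rounding is also an assumption.

```python
import math

# Standard critical values of chi-square with 1 degree of freedom.
CHI2_CRIT_DF1 = {0.05: 3.841459, 0.01: 6.634897}

def critical_n(chi_square: float, sample_size: int, alpha: float) -> int:
    """Largest sample size at which the model would still be accepted at `alpha`."""
    # Discrepancy per observation: chi-square = (N - 1) * f_min.
    f_min = chi_square / (sample_size - 1)
    return math.floor(CHI2_CRIT_DF1[alpha] / f_min + 1)

ASSUMED_N = 90           # hypothetical: not given in the slides
MODEL_CHI_SQUARE = 3.243  # from the slides

cn_05 = critical_n(MODEL_CHI_SQUARE, ASSUMED_N, 0.05)
cn_01 = critical_n(MODEL_CHI_SQUARE, ASSUMED_N, 0.01)
print(cn_05, cn_01)
```

With the assumed sample size, this sketch gives the same two values quoted on the slide, which is why that value was picked for the illustration.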

  13. AIC difference criteria
      AIC diff    support for equivalency of models
      0-2         substantial
      4-7         weak
      > 10        none
      Burnham, K.P. and Anderson, D.R. 2002. Model Selection and Multimodel Inference. Springer Verlag. (second edition), p 70.

  14. BIC difference criteria
      BIC diff    support for difference between models
      0-2         weak
      2-6         positive
      6-10        strong
      > 10        very strong
      Raftery, A.E. 1995. Sociological Methodology. 25:111-163, p 70
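The two sets of difference criteria can be applied mechanically to the values from this example. This is a minimal sketch with made-up function names; the treatment of boundary values and of the gaps in the AIC table (differences between 2 and 4, or between 7 and 10) is an assumption, since the slides do not address them.

```python
def aic_support_for_equivalency(diff: float) -> str:
    """Support for treating two models as equivalent, per the AIC table."""
    if 0 <= diff <= 2:
        return "substantial"
    if 4 <= diff <= 7:
        return "weak"
    if diff > 10:
        return "none"
    return "not covered by the table"  # gaps: 2-4 and 7-10

def bic_support_for_difference(diff: float) -> str:
    """Support for a real difference between two models, per the BIC table."""
    if diff <= 2:
        return "weak"
    if diff <= 6:
        return "positive"
    if diff <= 10:
        return "strong"
    return "very strong"

# Differences between the default and saturated models, from the slides.
aic_label = aic_support_for_equivalency(13.243 - 12.000)
bic_label = bic_support_for_difference(1.257)
print(f"AIC: {aic_label} support for equivalency; "
      f"BIC: {bic_label} support for a difference")
```

Both indices point the same way here: substantial support for treating the models as equivalent under AIC, and only weak support for a difference under BIC.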

  15. What do we conclude in this case? Given the data we have available, we could justify (in my view) omitting the pathway from s_age to tcov. However, we must recognize that this is an approximation of the truth. If we had more samples, would they lead us to decide that we needed to include a path from s_age to tcov? Without the additional samples we don’t really know. Comparing the path coefficients for the two models would allow us to decide the scientific consequences of our model choice.

  16. What is the SEM perspective on model selection? In SEM we use our scientific knowledge to guide our decisions, and this applies especially to model selection. Do we believe it serves our scientific purposes to omit the path from s_age to tcov? We certainly can present the results for the path in the following fashion if we think it merits discussion. [Figure: path diagram of s_age, fidx, and tcov with error terms e1 and e2; coefficients shown are 0.45, -0.35, and -0.19 (ns).]

  17. Final thought: "Statistical tests are aids to (hopefully wise) judgement, not two-valued logical declarations of truth or falsity". Abelson, R.P. 1995. Statistics as Principled Argument. Lawrence Erlbaum Associates, Hillsdale, NJ, USA.
