Model Evaluation and Selection
Example Objective: Demonstrate how to evaluate a single model and how to compare alternative models.
Evaluating the Sufficiency of a Single Model (follow-up to the Mediation Test example)
When this model is run, a variety of measures of model fit will be generated.
A question of importance is, "Is the fit of the model sufficiently good to yield reliable results?"
The alternative model is one in which there is also an arrow from s_age to tcov. In other words, does fire severity fully explain the effect of stand age on cover, or is there another pathway of influence independent of fire severity?
The model chi-square is the most commonly used measure of absolute model fit.
It is always good to check the section of the output called “Notes for Model”.
Here we can see that a minimum was achieved, along with the full p-value for the chi-square. A p-value greater than 0.05 suggests that we could accept this model (it indicates no major deviations between data and model).
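As a rough check on the reported p-value, the calculation can be sketched in a few lines of Python. This sketch assumes the model has a single degree of freedom (the one omitted path from s_age to tcov); the function name is hypothetical and the code is not part of the Amos output.

```python
import math

def chi_square_p_value_df1(chi_square: float) -> float:
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    # With df = 1 the chi-square variable is a squared standard normal,
    # so P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi_square / 2.0))

# Model chi-square reported in this example; df = 1 is an assumption.
model_chi_square = 3.243
p_value = chi_square_p_value_df1(model_chi_square)

print(f"chi-square = {model_chi_square}, p = {p_value:.3f}")
print("model acceptable at 0.05" if p_value > 0.05 else "model rejected at 0.05")
```

The p-value comes out at roughly 0.07, which is above 0.05 and so agrees with the reading given above.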
It is well known that model p-values are not always the best way to decide if a model is adequate (in an absolute sense) or the best model (in a relative sense). This is a complex topic and one that lacks complete consensus. What is generally agreed upon is:
(1) For a given degree of misfit, chi-squares automatically increase with increasing sample size, so p-values reflect increasing power for detecting deviations.
(2) P-values for model chi-squares are pretty useful when sample sizes are less than 200, especially for models that do not include latent variables possessing multiple indicators.
(3) It is recommended that folks look at multiple measures.
One useful way to evaluate model adequacy is to see if the addition of pathways causes the model chi-square to drop by more than 3.84 units. This is the “single-degree-of-freedom chi-square test”. If adding a path reduces the chi-square by less than 3.84, it implies that the added path is not strongly supported by the data.
In the current example, the chi-square is 3.243, which tells us that adding a path from s_age to tcov could reduce the model chi-square by at most 3.243. Because this is less than 3.84, it further indicates that our model could be considered adequate.
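The 3.84 threshold is the 0.05 critical value of a chi-square distribution with one degree of freedom. A minimal sketch of the single-degree-of-freedom test follows; the function names are hypothetical and it uses only the Python standard library.

```python
from statistics import NormalDist

def critical_chi_square_df1(alpha: float = 0.05) -> float:
    """Critical value of a 1-df chi-square at significance level alpha."""
    # A 1-df chi-square is a squared standard normal, so its critical value
    # is the square of the two-sided normal quantile.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z

def path_is_supported(chi_square_drop: float, alpha: float = 0.05) -> bool:
    """True if adding one path lowers the chi-square by more than the critical value."""
    return chi_square_drop > critical_chi_square_df1(alpha)

# Largest possible drop from adding s_age -> tcov in this example.
drop = 3.243
critical = critical_chi_square_df1()
print(f"critical value = {critical:.2f}, drop = {drop}")
print("path supported" if path_is_supported(drop) else "path not strongly supported")
```

A drop of 3.243 falls short of the critical value, so the added path is not strongly supported at the 0.05 level.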
“Cmin” means minimum chi-square.
The Model Fit tab gives us several measures to consider.
Clicking on the labels gives additional information.
The RMSEA indicates “close” fit, and also that a value of 0 (perfect fit) cannot be ruled out.
An AIC for our model (the “default” model) of 13.243 could only be reduced to a value of 12.000 by saturating our model. This difference of 1.243 is less than the minimum recommended AIC difference of 2.0, suggesting that the two models are indistinguishable.
BUT, AIC is often not a reliable measure.
The CAIC (consistent AIC) is generally viewed to be a better measure than AIC. Here we see that the default model value is more than 2.0 units smaller than the saturated model, supporting the conclusion that our model is adequate.
The BIC (Bayesian Information Criterion) is one of the more popular measures at the moment. In this case, the saturated model BIC is only 1.257 greater, which is less than the 2.0 difference recommended for picking among models. This index tells us that while the evidence is better for the default model, the saturated model can’t be ruled out.
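The AIC, BIC and CAIC comparisons can be reproduced with a short sketch. It assumes the common chi-square-based forms (AIC = chi-square + 2q, BIC = chi-square + q ln N, CAIC = chi-square + q(ln N + 1)), parameter counts of q = 5 and q = 6 inferred from the two AIC values above, and a sample size of N = 90, which is an assumption chosen because it reproduces the reported BIC difference of 1.257. None of these inputs are stated directly in the output shown here.

```python
import math

def information_criteria(chi_square: float, n_params: int, n_samples: int) -> dict:
    """Chi-square-based AIC, BIC and CAIC (assumed forms, see lead-in)."""
    log_n = math.log(n_samples)
    return {
        "AIC": chi_square + 2 * n_params,
        "BIC": chi_square + n_params * log_n,
        "CAIC": chi_square + n_params * (log_n + 1),
    }

N_SAMPLES = 90  # assumption: not given in the text, inferred from the BIC difference

# Default model: chi-square 3.243 with 5 free parameters (inferred from AIC = 13.243).
default = information_criteria(3.243, 5, N_SAMPLES)
# Saturated model: chi-square 0 with 6 free parameters (inferred from AIC = 12.000).
saturated = information_criteria(0.0, 6, N_SAMPLES)

for name in ("AIC", "BIC", "CAIC"):
    diff = saturated[name] - default[name]
    print(f"{name}: default = {default[name]:.3f}, "
          f"saturated = {saturated[name]:.3f}, saturated - default = {diff:.3f}")
```

Under these assumptions the saturated model is favored by AIC by less than 2.0 units, while BIC and CAIC favor the default model by roughly 1.3 and 2.3 units respectively, matching the pattern described above.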
The Hoelter index relates back to our model chi-square and its p-value. It tells us that, with the same degree of misfit, a sample size of about 106 would give us enough power to detect an additional path from s_age to tcov with a p-value less than 0.05. About 183 samples would be required to obtain a p-value less than 0.01.
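The link between the Hoelter values and the model chi-square can be sketched as follows. This assumes the usual form of Hoelter's critical N (critical chi-square divided by the minimum discrepancy chi-square/(N - 1), plus one), one degree of freedom, and a sample size of N = 90; the sample size is an assumption not stated in the text, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def hoelter_critical_n(chi_square: float, n_samples: int, alpha: float) -> int:
    """Hoelter's critical N for a 1-df model (assumed formula, see lead-in)."""
    # 1-df chi-square critical value = squared two-sided normal quantile.
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2
    # Minimum discrepancy per observation implied by the fitted chi-square.
    f_min = chi_square / (n_samples - 1)
    return math.floor(critical / f_min + 1)

N_SAMPLES = 90         # assumption: sample size is not given in the text
MODEL_CHI_SQUARE = 3.243

n_05 = hoelter_critical_n(MODEL_CHI_SQUARE, N_SAMPLES, alpha=0.05)
n_01 = hoelter_critical_n(MODEL_CHI_SQUARE, N_SAMPLES, alpha=0.01)
print(f"Hoelter critical N at 0.05: {n_05}")
print(f"Hoelter critical N at 0.01: {n_01}")
```

Under these assumptions the sketch gives the same two sample sizes quoted above, which is a useful consistency check on the assumed N.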
AIC diff    support for equivalency of models
> 10        none
Burnham, K.P. and Anderson, D.R. 2002. Model Selection and Multimodel Inference, second edition. Springer-Verlag, p. 70.
BIC diff    support for difference between models
> 10        very strong
Raftery, A.E. 1995. Bayesian model selection in social research. Sociological Methodology 25:111-163.
Given the data we have available, we could justify (in my view) omitting the pathway from s_age to tcov. However, we must recognize that this is an approximation of the truth. If we had more samples, would they lead us to decide that we needed to include a path from s_age to tcov? Without the additional samples we don’t really know. Comparing the path coefficients for the two models would allow us to decide the scientific consequences of our model choice.
In SEM we use our scientific knowledge to guide our decisions, and this applies especially to model selection. Do we believe it serves our scientific purposes to omit the path from s_age to tcov? We certainly can present the results for the path in the following fashion if we think it merits discussion.
"Statistical tests are aids to (hopefully wise) judgement, not two-valued logical declarations of truth or falsity." Abelson, R.P. 1995. Statistics as Principled Argument. Lawrence Erlbaum Associates, Hillsdale, NJ, USA.