
Performing Latent Class Analysis Using the CATMOD Procedure David M. Thompson



  1. Performing Latent Class Analysis Using the CATMOD Procedure. David M. Thompson, Department of Biostatistics and Epidemiology, College of Public Health, OUHSC. SUGI 31 - Contributed paper 201-31

  2. Latent class analysis (LCA)
     • LCA validates classification in the absence of a gold standard for decision-making.
     • LCA is unavailable in SAS.

  3. LCA and Patient Classification
     Patient classification is part of many clinical decisions.
     • Diagnosis
     • Prognosis

  4. Patient classification in the absence of a gold standard
     Diagnosis
     • Diagnostic categories may be emerging or unclear.
     Prognosis
     • Predicting rehabilitation outcomes
     • Counseling patients and families regarding expectations

  5. Latent class analysis (LCA)
     • LCA is a parallel to factor analysis, but for categorical responses.
     • Like factor analysis, LCA addresses the complex pattern of association that appears among observations…

  6. … and attributes the pattern to a set of latent (underlying, unobserved) factors or classes.

  7. What if no gold standard existed in cardiology to assess a pattern of “yes/no” signs and symptoms?
     Rindskopf, D., & Rindskopf, W. (1986). The value of latent class analysis in medical diagnosis. Statistics in Medicine, 5, 21-27.

  8. LCA predicts latent class membership such that the observed variables are independent within each latent class.

  9. LCA estimates
     • Latent class prevalences
     • Conditional probabilities: probabilities of a specific response, given class membership

  10. Conditional probabilities are analogous to sensitivities and specificities, but are calculated in the absence of a gold standard.

  11. LCA works on the unconditional contingency table (no information on latent class membership).

  12. LCA’s goal is to produce a complete (conditional) table that assigns counts for each latent class.

  13. Estimating LC parameters
      • Maximum likelihood approach
      • Because LC membership is unobserved, the likelihood function, and the likelihood surface, are complex.

  14. EM algorithm calculates L when some data (X) are unobserved
      • “M” step produces ML estimates from the complete table.
      • “E” step uses parameter estimates to update expected values for cell counts $n_{ijklt}$ in the complete contingency table.

  15. EM algorithm requires initial estimates
      • 1st “E” step: provide initial estimates to “fill in” missing information on LC membership
      • “M” step
      • “E” step

  16. EM algorithm in SAS
      • 1st “E” step: SAS DATA step that randomly assigns each response profile to one latent class
      • “M” step: PROC CATMOD
      • “E” step: SAS DATA step
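The first “E” step described above, which gives each response profile to one randomly chosen latent class, can be sketched outside SAS. The following is a minimal Python illustration, not the author's DATA step; the function name `initial_complete_table`, the coding 1 = yes / 2 = no, and the counts are all hypothetical.

```python
import random
from itertools import product

def initial_complete_table(observed, n_classes=2, seed=0):
    """Assign each response profile's whole count to one random latent class.

    observed: dict mapping a response profile (a, b, c, d) to its observed count.
    Returns a dict mapping (a, b, c, d, x) to a provisional count.
    """
    rng = random.Random(seed)
    complete = {}
    for profile, count in observed.items():
        chosen = rng.randrange(1, n_classes + 1)  # latent classes labeled 1..n_classes
        for x in range(1, n_classes + 1):
            # The chosen class receives the full count; the other classes receive zero.
            complete[profile + (x,)] = float(count) if x == chosen else 0.0
    return complete

# Hypothetical counts for the 16 yes/no profiles (1 = yes, 2 = no); illustrative only.
observed = {p: 5 + i for i, p in enumerate(product((1, 2), repeat=4))}
complete = initial_complete_table(observed)
```

An all-or-nothing assignment like this leaves half of the cells in the complete table empty, which is presumably one reason the CATMOD call on the next slide adds a small constant to every cell.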

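  17. “M” step
      /* Capture CATMOD's parameter estimates in data set mu */
      ods output estimates=mu;
      proc catmod order=data;
        weight count;  /* cell counts of the complete table */
        /* wls requests weighted least squares; addcell=.1 adds 0.1 to every cell count */
        model a*b*c*d*x=_response_ / wls addcell=.1;
        /* main effects plus each indicator's association with latent class x */
        loglin a b c d x a*x b*x c*x d*x;
      run;
      quit;
      ods output close;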
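For a complete table, the loglinear model with terms a*x, b*x, c*x and d*x has closed-form maximum likelihood estimates, so the idea of the “M” step can be illustrated without a model-fitting procedure. The Python sketch below swaps that closed form in for the CATMOD fit; it is not the author's code and will not reproduce the `wls addcell=.1` estimates exactly. The function name `m_step` and all parameter values for items B and D are hypothetical.

```python
from itertools import product

def m_step(complete, n_classes=2, n_items=4):
    """Closed-form ML estimates from a complete table {(a, b, c, d, x): count}.

    Returns (prevalence, cond) where prevalence[x] = P(X = x) and
    cond[(item, level, x)] = P(item = level | X = x); items are indexed 0..3.
    """
    classes = range(1, n_classes + 1)
    total = sum(complete.values())
    class_total = {x: 0.0 for x in classes}
    margin = {}  # margin[(item, level, x)] = count with that item level in class x
    for cell, n in complete.items():
        x = cell[-1]
        class_total[x] += n
        for item in range(n_items):
            key = (item, cell[item], x)
            margin[key] = margin.get(key, 0.0) + n
    prevalence = {x: class_total[x] / total for x in classes}
    cond = {key: n / class_total[key[2]] for key, n in margin.items()}
    return prevalence, cond

# Hypothetical complete table built from known parameters (1 = yes, 2 = no).
true_prev = {1: 0.5, 2: 0.5}
p_yes = [(0.9, 0.2), (0.8, 0.3), (0.1, 0.8), (0.7, 0.4)]  # P(yes | X=1), P(yes | X=2)
complete = {}
for *profile, x in product((1, 2), repeat=5):
    p = true_prev[x]
    for item, level in enumerate(profile):
        yes = p_yes[item][x - 1]
        p *= yes if level == 1 else 1 - yes
    complete[tuple(profile) + (x,)] = 200 * p

prevalence, cond = m_step(complete)
```

Because the example table is built exactly from the stated parameters, `m_step` returns those parameters, which makes the sketch easy to check by hand.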

  18. “E” step
      • DATA step that uses loglinear ML estimates from CATMOD
      • converts loglinear estimates into LC prevalences and conditional probabilities
      • calculates joint response probabilities within and summed across latent classes
      • calculates “posterior probabilities”, i.e. P(X=1|abcd)
      • constructs a new complete (conditional) contingency table
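The last three bullets of this “E” step can be sketched in a few lines. The Python below is an illustration under stated assumptions rather than the author's DATA step: it starts from prevalences and conditional probabilities (skipping the conversion from loglinear estimates), and the function name `e_step`, the parameter values for items B and D, and the observed counts are hypothetical.

```python
def e_step(observed, prevalence, cond):
    """Split each observed count across latent classes by posterior probability.

    observed: {(a, b, c, d): count}; prevalence: {x: P(X = x)};
    cond: {(item, level, x): P(item = level | X = x)} with items indexed 0..3.
    Returns (complete, posterior), both keyed by (a, b, c, d, x).
    """
    complete, posterior = {}, {}
    for profile, count in observed.items():
        joint = {}
        for x, prev in prevalence.items():
            p = prev
            for item, level in enumerate(profile):
                p *= cond[(item, level, x)]  # local independence within class x
            joint[x] = p                     # joint probability of profile and class
        marginal = sum(joint.values())       # summed across latent classes
        for x, p in joint.items():
            posterior[profile + (x,)] = p / marginal
            complete[profile + (x,)] = count * p / marginal
    return complete, posterior

# Hypothetical parameter values (1 = yes, 2 = no).
prevalence = {1: 0.5, 2: 0.5}
p_yes = [(0.9, 0.2), (0.8, 0.3), (0.1, 0.8), (0.7, 0.4)]  # P(yes | X=1), P(yes | X=2)
cond = {}
for item, (yes1, yes2) in enumerate(p_yes):
    for x, yes in ((1, yes1), (2, yes2)):
        cond[(item, 1, x)] = yes
        cond[(item, 2, x)] = 1 - yes

observed = {(1, 1, 2, 1): 40, (2, 2, 1, 2): 35, (1, 2, 1, 1): 10}  # hypothetical counts
complete, posterior = e_step(observed, prevalence, cond)
```

Each observed count is divided among the classes in proportion to the posterior probabilities, so the new complete table preserves the observed count for every response profile.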

  19. Results of a simulation study
      • simulate responses to four binary (yes-no) observed variables with known but unobservable (latent) group membership
      • evaluate whether an LCA approach using CATMOD accurately detects true parameters

  20. Figure panels:
      • Distribution of true LC prevalences from 1000 simulated samples where n=200 and E[P(X=1)] = 0.5
      • Parameter estimates from 406 successful runs using CATMOD

  21. Distribution of conditional probabilities from 1000 simulated samples
      • E[P(A=1|X=1)] = 0.9
      • E[P(A=1|X=2)] = 0.2
      • Parameter estimates from CATMOD

  22. Distribution of conditional probabilities from 1000 simulated samples
      • E[P(C=1|X=1)] = 0.1
      • E[P(C=1|X=2)] = 0.8
      • Parameter estimates from CATMOD
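One simulated sample of the kind summarized in the preceding slides might be generated as follows. This Python sketch makes several assumptions: it treats the stated expected values (0.5 for prevalence, 0.9/0.2 for A, 0.1/0.8 for C) as fixed parameters rather than draws from a distribution, it invents the values for B and D, and the function name `simulate_profiles` is hypothetical.

```python
import random
from collections import Counter

def simulate_profiles(n, prevalence_1, p_yes, seed=0):
    """Simulate n subjects and tabulate their response profiles.

    prevalence_1: P(X = 1); p_yes[item] = (P(yes | X=1), P(yes | X=2)).
    Returns a Counter mapping (a, b, c, d) to a count, with 1 = yes and 2 = no.
    The latent class x is drawn but deliberately not recorded in the table.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n):
        x = 1 if rng.random() < prevalence_1 else 2
        profile = tuple(1 if rng.random() < p[x - 1] else 2 for p in p_yes)
        counts[profile] += 1
    return counts

# A and C use the expected values shown on the slides; B and D are hypothetical.
p_yes = [(0.9, 0.2), (0.8, 0.3), (0.1, 0.8), (0.7, 0.4)]
observed = simulate_profiles(n=200, prevalence_1=0.5, p_yes=p_yes)
```

The resulting table is the unconditional contingency table that LCA takes as input; class membership is known to the simulation but hidden from the analysis.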

  23. Concluding remarks
      • LCA is a potentially valuable tool in clinical epidemiology for clarifying ill-defined diagnostic and prognostic classifications.
      • An approach using CATMOD brings LCA closer to SAS’ analytic framework.

  24. In any approach to LCA, sensitivity to initial estimates requires caution
      • E-M loop should iterate between 3 and 40 times
      • Initial estimates for LC prevalences should be at least 0.3
      • Approach should employ replicate estimates using different starting values
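The three cautions above translate naturally into a replicated EM run. The Python sketch below is one possible reading, not the author's procedure: it starts from random parameter values instead of a random assignment of profiles, uses closed-form updates in place of CATMOD, and keeps the replicate with the highest observed-data log likelihood. All function names and the data are hypothetical.

```python
import math
import random
from itertools import product

ITEMS, CLASSES = 4, (1, 2)

def joint(profile, x, prev, cond):
    """P(profile, X = x) under local independence."""
    p = prev[x]
    for item, level in enumerate(profile):
        p *= cond[(item, level, x)]
    return p

def log_likelihood(observed, prev, cond):
    """Observed-data log likelihood (latent class summed out)."""
    return sum(n * math.log(sum(joint(p, x, prev, cond) for x in CLASSES))
               for p, n in observed.items() if n > 0)

def random_start(rng, min_prev=0.3):
    """Random starting values; both prevalences are kept at or above min_prev."""
    p1 = rng.uniform(min_prev, 1 - min_prev)
    prev = {1: p1, 2: 1 - p1}
    cond = {}
    for item, x in product(range(ITEMS), CLASSES):
        yes = rng.uniform(0.1, 0.9)
        cond[(item, 1, x)], cond[(item, 2, x)] = yes, 1 - yes
    return prev, cond

def em(observed, prev, cond, n_iter=40):
    """Run a fixed number of E-M iterations and return the final estimates."""
    total = sum(observed.values())
    for _ in range(n_iter):
        weight = {x: 0.0 for x in CLASSES}   # expected class totals
        hits = {key: 0.0 for key in cond}    # expected item-by-class margins
        for profile, n in observed.items():  # E step: split counts by posterior
            js = {x: joint(profile, x, prev, cond) for x in CLASSES}
            marginal = sum(js.values())
            for x in CLASSES:
                share = n * js[x] / marginal
                weight[x] += share
                for item, level in enumerate(profile):
                    hits[(item, level, x)] += share
        prev = {x: weight[x] / total for x in CLASSES}           # M step
        cond = {key: hits[key] / weight[key[2]] for key in hits}
    return prev, cond

def best_of(observed, n_starts=10, seed=0):
    """Replicate EM from several starting values; keep the best-fitting run."""
    rng = random.Random(seed)
    fits = [em(observed, *random_start(rng)) for _ in range(n_starts)]
    return max(fits, key=lambda fit: log_likelihood(observed, *fit))

# Hypothetical expected counts for n = 200 (1 = yes, 2 = no); B and D values are invented.
p_yes = [(0.9, 0.2), (0.8, 0.3), (0.1, 0.8), (0.7, 0.4)]
true_cond = {(i, lv, x): (p[x - 1] if lv == 1 else 1 - p[x - 1])
             for i, p in enumerate(p_yes) for lv in (1, 2) for x in CLASSES}
observed = {pr: 200 * sum(joint(pr, x, {1: 0.5, 2: 0.5}, true_cond) for x in CLASSES)
            for pr in product((1, 2), repeat=ITEMS)}
prev, cond = best_of(observed)
```

The class labels are arbitrary, so two replicates can reach the same solution with classes 1 and 2 swapped; comparing replicates requires matching classes by their conditional probabilities rather than by label.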

  25. Parameter estimates from CATMOD
      • E[P(C=1|X=1)] = 0.1
      • E[P(C=1|X=2)] = 0.8
      • Replicated parameter estimates from CATMOD

  26. Thank you!

  27. Acknowledgements
      • Barbara R. Neas, Ph.D., and Willis Owen, Ph.D., Dept. of Biostatistics and Epidemiology, OUHSC
      • Gary Raskob, Ph.D., Dean, College of Public Health, OUHSC

  28. Assumptions of LCA
      • Exhaustiveness: $\pi^{ABCD}_{ijkl} = \sum_{t} \pi^{ABCDX}_{ijklt}$
      • Local independence: $\pi^{ABCDX}_{ijklt} = \pi^{ABCD|X}_{ijklt}\,\pi^{X}_{t} = \pi^{A|X}_{it}\,\pi^{B|X}_{jt}\,\pi^{C|X}_{kt}\,\pi^{D|X}_{lt}\,\pi^{X}_{t}$
      (Goodman’s probabilistic parameterization of a latent class model with four manifest indicators)

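  29. Local Independence (2)
      $\pi^{ABCDX}_{ijklt} = \pi^{A|X}_{it}\,\pi^{B|X}_{jt}\,\pi^{C|X}_{kt}\,\pi^{D|X}_{lt}\,\pi^{X}_{t}$
      $\ln \pi^{ABCDX}_{ijklt} = \lambda + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{C}_{k} + \lambda^{D}_{l} + \lambda^{X}_{t} + \lambda^{AX}_{it} + \lambda^{BX}_{jt} + \lambda^{CX}_{kt} + \lambda^{DX}_{lt}$
      (Haberman’s loglinear parameterization of a latent class model with four manifest indicators)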
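The two parameterizations are linked by simple algebra, which is what lets the “E” step convert loglinear estimates into prevalences and conditional probabilities. A sketch of that conversion, assuming $\pi$ denotes cell probabilities and $\lambda$ denotes the loglinear terms, and leaving aside whatever identifying constraints the fitting software places on the $\lambda$ terms:

```latex
\pi^{A|X}_{it}
  = \frac{\exp\!\left(\lambda^{A}_{i} + \lambda^{AX}_{it}\right)}
         {\sum_{i'} \exp\!\left(\lambda^{A}_{i'} + \lambda^{AX}_{i't}\right)},
\qquad
\pi^{X}_{t}
  = \frac{\exp\!\left(\lambda^{X}_{t}\right)
          \prod_{V \in \{A,B,C,D\}} \sum_{v} \exp\!\left(\lambda^{V}_{v} + \lambda^{VX}_{vt}\right)}
         {\sum_{t'} \exp\!\left(\lambda^{X}_{t'}\right)
          \prod_{V \in \{A,B,C,D\}} \sum_{v} \exp\!\left(\lambda^{V}_{v} + \lambda^{VX}_{vt'}\right)}
```

The conditional probabilities for B, C and D follow the same pattern as the one shown for A. The constant $\lambda$ cancels in both ratios, so it plays no role in the conversion.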

  30. EM algorithm
      • A way around the difficulty inherent in calculating L when some data (X) are unobserved.
      • The first “E” (expectation) step requires initial estimates, which essentially “fill in” missing information on LC membership.
      • The “M” step maximizes the likelihood for the complete but provisional data, then passes the associated parameter estimates to the next “E” step.
      • Given updated parameter estimates, the “E” step revises the expected values for cell counts $n_{ijklt}$ in the complete contingency table while preserving observed marginal counts $n_{ijkl}$.
      • The next “M” step finds new parameter estimates that maximize L.

  31. Prognostic classification
      • Professionals must classify patients even when information is limited and available only in ‘yes/no’ form.
      • Example of a challenge to prognostic classification:
        • able to ascend/descend flight of 3 stairs?
        • positive screening test for depression?
        • spouse living at home?
        • independent in using toilet and bath?

  32. Maximum likelihood approach to estimating LC parameters
      • The probability of obtaining observed count $n_{ijkl}$ for response profile {i,j,k,l} is $(\pi^{ABCDX}_{ijklt})^{n_{ijklt}}$.
      • The likelihood of obtaining a set of observed counts for all response profiles is
        $L = \prod_{i}\prod_{j}\prod_{k}\prod_{l}\prod_{t} (\pi^{ABCDX}_{ijklt})^{n_{ijklt}}$
        $\log L = \sum_{i}\sum_{j}\sum_{k}\sum_{l}\sum_{t} n_{ijklt} \ln(\pi^{ABCDX}_{ijklt})$
      • Because LC membership (X=t) is unobserved, the likelihood function is complicated.
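One way to see the complication is to write the log likelihood in terms of the counts that are actually observed. This is a sketch, assuming $\pi$ denotes probabilities, $n_{ijkl}$ the observed count for response profile {i,j,k,l}, and local independence within each class:

```latex
\log L_{\mathrm{obs}}
  = \sum_{i}\sum_{j}\sum_{k}\sum_{l} n_{ijkl}\,
    \ln\!\left( \sum_{t} \pi^{X}_{t}\,\pi^{A|X}_{it}\,\pi^{B|X}_{jt}\,\pi^{C|X}_{kt}\,\pi^{D|X}_{lt} \right)
```

The sum over latent classes sits inside the logarithm, so the expression does not separate into one term per parameter the way the complete-data log likelihood does. The EM algorithm works around this by alternating between the complete-data form and updated expected counts.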
