cross sectional mixture modeling n.
Skip this Video
Loading SlideShow in 5 Seconds..
Cross-Sectional Mixture Modeling PowerPoint Presentation
Download Presentation
Cross-Sectional Mixture Modeling

Loading in 2 Seconds...

play fullscreen
1 / 45

Cross-Sectional Mixture Modeling - PowerPoint PPT Presentation

  • Uploaded on

Cross-Sectional Mixture Modeling. Shaunna L. Clark Advanced Genetic Epidemiology Statistical Workshop October 23, 2012. Outline . What is a mixture? Introduction to LCA (LPA) Basic Analysis Ideas\Plan and Issues How to choose the number of classes How do we implement mixtures in OpenMx ?

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'Cross-Sectional Mixture Modeling' - elysia

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
cross sectional mixture modeling

Cross-Sectional Mixture Modeling

Shaunna L. Clark

Advanced Genetic Epidemiology Statistical Workshop

October 23, 2012

  • What is a mixture?
  • Introduction to LCA (LPA)
    • Basic Analysis Ideas\Plan and Issues
    • How to choose the number of classes
  • How do we implement mixtures in OpenMx?
  • Factor Mixture Model
  • What do classes mean for twin modeling?
homogeneity vs heterogeneity
Homogeneity Vs. Heterogeneity
  • Most models assume homogeneity
    • i.e. Individuals in a sample all follow the same model
    • What have seen so far (for the most part)
  • But not always the case
    • Ex: Sex, Age, Patterns of Substance Abuse
what is mixture modeling
What is Mixture Modeling

Used to model unobserved heterogeneity by identifying different subgroups of individuals

Ex: IQ, Religiosity

latent class analysis lca

Latent Class Analysis (LCA)

Also known as Latent Profile Analysis (LPA) if you have continuously distributed variables

latent class analysis
Latent Class Analysis

Introduced by Lazarsfeld & Henry, Goodman, Clogg, Dayton & Mcready

  • Setting
    • Cross-sectional data
    • Multiple items measuring a construct
      • 12 items measuring the construct of Cannabis Abuse/Dependence
    • Hypothesized construct represented as latent class variable (categorical latent variable)
      • Different categories of Cannabis Abuse\Dependence patterns
  • Aim
    • Identify items that indicate classes well
    • Estimateproportion of sample in each class (class probability)
    • Classify individuals into classes (posterior probabilities)
latent class analysis model
Latent Class Analysis Model

Dichotomous (0/1) indicators u: u1, u2, ... , ur

Categorical latent variable c: c = k ; k = 1, 2, ... , K

Marginal probability for item uj = 1,

(probability item uj =1 is the sum over all class of the product of the probability of being in class k and the probability of endorsing item uj given that you are in class k)

joint probabilities
Joint Probabilities
  • Joint probability of all u’s, assuming conditional independence:
  • Probability of observing a given response pattern is equal to the sum over all classes of the product of being in a given class and the probability of observing a response on item 1 given that you are in latent class k, . . . (repeat for each item)
posterior probabilities
Posterior Probabilities
  • Probability of being inclass k given your response pattern
  • Used to assign most likely class membership
    • Based on highest posterior probability
model testing
Model Testing
  • Log-likelihood ratio χ2 test (LLRT)
    • Overall test against the data with H1 being the unrestricted multinomial
    • Problem: Not distributed as χ2 due to boundary conditions
      • Don’t use it!!! (McLachlan& Peele, 2000)
  • Information Criteria
    • Akaike Information Criteria, AIC (Akaike,1974)

AIC = 2h-2ln(L)

    • Bayesian Information Criteria, BIC (Schwartz, 1978)

BIC = -2ln(L)+h*ln(n)

    • Where L = log-likelihood, h = number of parameters, n = sample size
    • Chose model with lowest value of IC
other tests
Other Tests
  • Since can’t do LLRT, use test which approximate the difference in LL values between k and k-1 class models.
    • Vuong-Lo-Mendell-Rubin, LMR-LRT (Lo, Mendell, & Rubin, 2001)
    • Parametric bootstrapped LRT, BLRT (McLachlan, 1987)
  • P-value is probability that H0 is true
    • H0: k-1 classes; H1: k classes
    • A low p-value indicates a preference for the estimated model (i.e. k classes)
  • Look for the first time the p-value is non-significant or greater than 0.05
analysis plan
Analysis Plan
  • Fit model with 1-class
    • Everyone in same class
    • Sometimes simple is better
  • Fit LCA models 2-K classes
  • Chose best number of classes

Seems simple right???

not really lots of known issues in mixture analysis
Not really . . .Lots of known issues in Mixture Analysis
  • Global vs. Local Maximum

Log Likelihood

Log Likelihood







  • Use multiple sets of random starting values to make sure have global solution. Make sure that best LL value has replicated
determining the number of classes class enumeration
Determining the number of classes: Class Enumeration
  • No agreed upon way to determine the correct number of latent classes
    • Statistical comparisons (i.e. ICs, LRTs)
    • Interpretability and usefulness of classes
      • Substantive theory
      • Relationship to auxiliary variables
      • Predictive validity of classes
      • Class size
    • Quality of Classifications (not my favorite)
      • Classification table based on posterior probabilities
      • Entropy - A value close to 1 indicates good classification in that many individuals have posterior probabilities close to 0 or 1
suggested strategy
Suggested Strategy
  • Nylund et al. (2007), Tofighi & Enders (2008), among others
  • Simulation studies comparing tests and information criteria described previously
  • Suggest:
    • Use BIC and LMR to narrow down the number of plausible models
    • Then run BLRT on those models because BLRT can be computationally intensive
mixtures in openmx
Mixtures in OpenMx
  • Specify class-specific models
    • Create MxModel objects for each class
  • Specify class probabilities
    • Create an MxMatrix of class probabilities\proportions
  • Specify model-wide objective function
    • Pull everything together in a parent model with data
    • Weighted sum of the class models
  • Estimate entire model

Note: One of potentially many ways to do this

class specific models
Class Specific Models

nameList <- names(<dataset>)

class1 <- mxModel("Class1",

mxMatrix("Iden", name = "R", nrow = nvar, ncol = nvar, free=FALSE),

mxMatrix("Full", name = "M", nrow = 1, ncol = nvar, free=FALSE),

mxMatrix("Full", name = "ThresholdsClass1", nrow = 1, ncol = nvar, dimnames = list("Threshold",nameList), free=TRUE),

mxFIMLObjective(covariance="R", means="M", dimnames=nameList, thresholds="ThresholdsClass1",vector=TRUE))

Repeat for every class in your model

Don’t be like me, make sure to change class numbers

define the model
Define the Model

lcamodel <- mxModel("lcamodel", class1, class2, mxData(vars, type="raw"),

Next, specify class membership probabilities

class membership probabilities
Class Membership Probabilities
  • When specifying need to remember:
    • Class probabilities must be positive
    • Must sum to a constant - 1

mxMatrix("Full", name = "ClassMembershipProbabilities",

nrow = nclass, ncol = 1, free=TRUE,

labels = c(paste("pclass", 1:nclass, sep=""))),

mxBounds(c(paste("pclass", 1:nclass, sep="")),0,1),

mxMatrix("Iden", nrow = 1, name = "constraintLHS"),


name = "constraintRHS"),

mxConstraint(constraintLHS == constraintRHS),

model wide objective function
Model-wide objective function
  • Weighted sum of individual class likelihoods
  • Weights are class probabilities
  • So for two classes:
model wide objective function cont d
Model Wide Objective Function Cont’d



+ pclass2%x%Class2.objective)), name="lca"),



Now we run the model:

model <- mxRun(lcamodel)

And we wait and wait and wait till it’s done.

profile plot
Profile Plot
  • One way to interpret the classes is to plot them.
  • In our example we had binary items, so the thresholds are what distinguishes between classes
    • Can plot the thresholds
  • Or you can plot the probabilities
    • More intuitive
    • Easier for non-statisticians to understand
profile plots in r openmx
Profile Plots in R\OpenMx

#Pulling out thresholds

class1T <- model@output$matrices$Class1.ThresholdsClass1

class2T <- model@output$matrices$Class2.ThresholdsClass2

#Converting threshold to probabilities



profile plots cont d
Profile Plots Cont’D

plot(class1P, type="o", col="blue",ylim=c(0,1),axes=FALSE, ann=FALSE)




lines(class2P,type="o", pch=22, lty=2, col="red")

title(main="LCA 2 Class Profile Plot", col.main="black",font.main=4)

title(xlab="DSM Items", col.lab="black")

title(ylab="Probability", col.lab="black")

legend("bottomright",c("Class 1","Class 2"), cex=0.8,


openmx exercise
OpenMx Exercise
  • Unfortunately, it takes long time for these to run so not feasible to do in this session
  • However, I’ve run the 2-, 3-, and 4- class LCA models for this data and (hopefully) the .Rdata files are posted on the website
  • Exercise: Using the .Rdata files
      • Determine which model is better according to AIC\BIC
        • Want the lowest value
      • Make a profile plot of the best solution and interpret the classes
        • What kind of substances users are there?
code to pull out ll and compute aic bic
Code to pull out LL and compute AIC\BIC

#Pull out LL

LL_2c <- model@output$Minus2LogLikelihood

LL_2cnsam = 1878


npar<- (nclass-1) + (nthresh*nvar*nclass


#Compute AIC & BIC

AIC_2c = 2*npar + LL_2c


BIC_2c = LL_2c + (npar*log(nsam))


problem with lca
Problem with LCA
  • Once in a class, everyone “looks” the same.
  • In the context of substance abuse, unlikely that every user will have the same patterns of use
    • Withdrawal, tolerance, hazardous use
    • There is variation within a latent class
      • Severity
  • One proposed solution is the factor mixture model
    • Uses a latent class variables to classify individuals and latent factor to model severity
factor mixture model
Factor Mixture Model














Classes can be indicated by item thresholds (categorical)\ item means (continuous) or factor mean and variance

general factor mixture model
General Factor Mixture Model

yik = Λkηik + εik,

ηik = αk+ ζik,


ζik~ N(0, Ψk)

  • Similar to the FA model, except many parameters can be class varying as indicated by the subscript k
  • Several variations of this model which differ in terms of the measurement invariance
    • Lubke & Neale (2005), Clark et al. (2012)
how do we do this in openmx
How do we do this in OpenMx?
  • You’ll have to wait till tomorrow!
  • Factor Mixture Model is a generalization of the Growth Mixture Model we’ll talk about tomorrow afternoon.
mixtures twin models

Mixtures & Twin Models

How do we combine the ACDE model and mixtures?

option 1 ace directly on the classes
Option 1: ACE Directly on the Classes

1.0 (MZ) / 0.5 (DZ)


















what would this look like for the fmm
What would this look like for the FMM?

1.0 (MZ) / 0.5 (DZ)


1.0 (MZ) / 0.5 (DZ)


























fmm ace cont d
  • One of many possible ways to do FMM & ACE in the same model
  • Can also have class specific ACE on the factors
    • Each class has own heritability

From Muthén et al. (2006)

issue with option 1
Issue with Option 1
  • Model is utilizes the liability threshold model to “covert” the latent categorical variable, C, to a latent normal variable
    • This requires that classes are ordered
      • Ex: high, medium, low users
  • Don’t always have nicely ordered classes
  • Models are VERY time intensive
    • Take a vacation for a week or two
option 2 three step method
Option 2: Three-Step Method
  • Estimate mixture model
  • Assign individuals into their most likely latent class based on the posterior probabilities of class membership
  • Use the observed, categorical variable of assigned class membership as the phenotype in a liability threshold model version of ACE analysis

Note: Requires ordered classes

option 2a
Option 2a
  • Contingency table analysis using most likely class membership
    • Concordance between twins in terms of most likely class membership
    • If your classes are not ordered
  • Odds Ratio
      • Excess twin concordance due to stronger genetic relationship can be represented by the OR for MZ twins compared to the OR for DZ twins.
  • Place restrictions on the contingency table to test specific hypotheses
    • Mendelian segregation, only shared environmental effects
    • Eaves (1993)
issues with option 2
Issues with Option 2
  • Potential for biased parameter estimates and underestimated standard errors
  • Assigned membership ignores fractional class membership suggested by posterior probabilities
  • Treat the classification as not having any sampling error
  • Good option when entropy is high\ well separated classes
selection of cross sectional mixture genetic analysis writings
Selection of Cross-Sectional Mixture Genetic Analysis Writings
  • Latent Class Analysis
    • Eaves, 1993; Muthén et al., 2006; Clark, 2010
  • Factor Mixture Analysis
    • Neale & Gillespie, 2005 (?); Clark, 2010; Clark et al. (in preparation)
  • Additional References
    • McLachlan, Do, & Ambroise, 2004
  • Mixtures in Substance Abuse
    • Gillespie (2011, 2012)
      • Great cannabis examples