- 195 Views
- Uploaded on

Download Presentation
## PowerPoint Slideshow about 'Cross-Sectional Mixture Modeling' - elysia

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

### Cross-Sectional Mixture Modeling

### Latent Class Analysis (LCA)

### OpenMX: LCA Example Script

### Mixtures & Twin Models

Shaunna L. Clark

Advanced Genetic Epidemiology Statistical Workshop

October 23, 2012

Outline

- What is a mixture?
- Introduction to LCA (LPA)
- Basic Analysis Ideas\Plan and Issues
- How to choose the number of classes
- How do we implement mixtures in OpenMx?
- Factor Mixture Model
- What do classes mean for twin modeling?

Homogeneity Vs. Heterogeneity

- Most models assume homogeneity
- i.e. Individuals in a sample all follow the same model
- What have seen so far (for the most part)
- But not always the case
- Ex: Sex, Age, Patterns of Substance Abuse

What is Mixture Modeling

Used to model unobserved heterogeneity by identifying different subgroups of individuals

Ex: IQ, Religiosity

Also known as Latent Profile Analysis (LPA) if you have continuously distributed variables

Latent Class Analysis

Introduced by Lazarsfeld & Henry, Goodman, Clogg, Dayton & Mcready

- Setting
- Cross-sectional data
- Multiple items measuring a construct
- 12 items measuring the construct of Cannabis Abuse/Dependence
- Hypothesized construct represented as latent class variable (categorical latent variable)
- Different categories of Cannabis Abuse\Dependence patterns
- Aim
- Identify items that indicate classes well
- Estimateproportion of sample in each class (class probability)
- Classify individuals into classes (posterior probabilities)

Latent Class Analysis Model

Dichotomous (0/1) indicators u: u1, u2, ... , ur

Categorical latent variable c: c = k ; k = 1, 2, ... , K

Marginal probability for item uj = 1,

(probability item uj =1 is the sum over all class of the product of the probability of being in class k and the probability of endorsing item uj given that you are in class k)

Joint Probabilities

- Joint probability of all u’s, assuming conditional independence:
- Probability of observing a given response pattern is equal to the sum over all classes of the product of being in a given class and the probability of observing a response on item 1 given that you are in latent class k, . . . (repeat for each item)

Posterior Probabilities

- Probability of being inclass k given your response pattern
- Used to assign most likely class membership
- Based on highest posterior probability

Model Testing

- Log-likelihood ratio χ2 test (LLRT)
- Overall test against the data with H1 being the unrestricted multinomial
- Problem: Not distributed as χ2 due to boundary conditions
- Don’t use it!!! (McLachlan& Peele, 2000)
- Information Criteria
- Akaike Information Criteria, AIC (Akaike,1974)

AIC = 2h-2ln(L)

- Bayesian Information Criteria, BIC (Schwartz, 1978)

BIC = -2ln(L)+h*ln(n)

- Where L = log-likelihood, h = number of parameters, n = sample size
- Chose model with lowest value of IC

Other Tests

- Since can’t do LLRT, use test which approximate the difference in LL values between k and k-1 class models.
- Vuong-Lo-Mendell-Rubin, LMR-LRT (Lo, Mendell, & Rubin, 2001)
- Parametric bootstrapped LRT, BLRT (McLachlan, 1987)
- P-value is probability that H0 is true
- H0: k-1 classes; H1: k classes
- A low p-value indicates a preference for the estimated model (i.e. k classes)
- Look for the first time the p-value is non-significant or greater than 0.05

Analysis Plan

- Fit model with 1-class
- Everyone in same class
- Sometimes simple is better
- Fit LCA models 2-K classes
- Chose best number of classes

Seems simple right???

Not really . . .Lots of known issues in Mixture Analysis

- Global vs. Local Maximum

Log Likelihood

Log Likelihood

Global

Global

Local

Local

Parameter

Parameter

- Use multiple sets of random starting values to make sure have global solution. Make sure that best LL value has replicated

Determining the number of classes: Class Enumeration

- No agreed upon way to determine the correct number of latent classes
- Statistical comparisons (i.e. ICs, LRTs)
- Interpretability and usefulness of classes
- Substantive theory
- Relationship to auxiliary variables
- Predictive validity of classes
- Class size
- Quality of Classifications (not my favorite)
- Classification table based on posterior probabilities
- Entropy - A value close to 1 indicates good classification in that many individuals have posterior probabilities close to 0 or 1

Suggested Strategy

- Nylund et al. (2007), Tofighi & Enders (2008), among others
- Simulation studies comparing tests and information criteria described previously
- Suggest:
- Use BIC and LMR to narrow down the number of plausible models
- Then run BLRT on those models because BLRT can be computationally intensive

LCA_example.R

Mixtures in OpenMx

- Specify class-specific models
- Create MxModel objects for each class
- Specify class probabilities
- Create an MxMatrix of class probabilities\proportions
- Specify model-wide objective function
- Pull everything together in a parent model with data
- Weighted sum of the class models
- Estimate entire model

Note: One of potentially many ways to do this

Class Specific Models

nameList <- names(<dataset>)

class1 <- mxModel("Class1",

mxMatrix("Iden", name = "R", nrow = nvar, ncol = nvar, free=FALSE),

mxMatrix("Full", name = "M", nrow = 1, ncol = nvar, free=FALSE),

mxMatrix("Full", name = "ThresholdsClass1", nrow = 1, ncol = nvar, dimnames = list("Threshold",nameList), free=TRUE),

mxFIMLObjective(covariance="R", means="M", dimnames=nameList, thresholds="ThresholdsClass1",vector=TRUE))

Repeat for every class in your model

Don’t be like me, make sure to change class numbers

Define the Model

lcamodel <- mxModel("lcamodel", class1, class2, mxData(vars, type="raw"),

Next, specify class membership probabilities

Class Membership Probabilities

- When specifying need to remember:
- Class probabilities must be positive
- Must sum to a constant - 1

mxMatrix("Full", name = "ClassMembershipProbabilities",

nrow = nclass, ncol = 1, free=TRUE,

labels = c(paste("pclass", 1:nclass, sep=""))),

mxBounds(c(paste("pclass", 1:nclass, sep="")),0,1),

mxMatrix("Iden", nrow = 1, name = "constraintLHS"),

mxAlgebra(sum(ClassMembershipProbabilities),

name = "constraintRHS"),

mxConstraint(constraintLHS == constraintRHS),

Model-wide objective function

- Weighted sum of individual class likelihoods
- Weights are class probabilities
- So for two classes:

Model Wide Objective Function Cont’d

mxAlgebra(

-2*sum(log(pclass1%x%Class1.objective

+ pclass2%x%Class2.objective)), name="lca"),

mxAlgebraObjective("lca"))

)

Now we run the model:

model <- mxRun(lcamodel)

And we wait and wait and wait till it’s done.

Profile Plot

- One way to interpret the classes is to plot them.
- In our example we had binary items, so the thresholds are what distinguishes between classes
- Can plot the thresholds
- Or you can plot the probabilities
- More intuitive
- Easier for non-statisticians to understand

Profile Plots in R\OpenMx

#Pulling out thresholds

class1T <- model@output$matrices$Class1.ThresholdsClass1

class2T <- model@output$matrices$Class2.ThresholdsClass2

#Converting threshold to probabilities

class1P<-t(1/(1+exp(-class1T)))

class2P<-t(1/(1+exp(-class2T)))

Profile Plots Cont’D

plot(class1P, type="o", col="blue",ylim=c(0,1),axes=FALSE, ann=FALSE)

axis(1,at=1:12,lab=nameList)

axis(2,las=1,at=c(0,0.2,0.4,0.6,0.8,1))

box()

lines(class2P,type="o", pch=22, lty=2, col="red")

title(main="LCA 2 Class Profile Plot", col.main="black",font.main=4)

title(xlab="DSM Items", col.lab="black")

title(ylab="Probability", col.lab="black")

legend("bottomright",c("Class 1","Class 2"), cex=0.8,

col=c("blue","red"),pch=21:22,lty=1:2)

OpenMx Exercise

- Unfortunately, it takes long time for these to run so not feasible to do in this session
- However, I’ve run the 2-, 3-, and 4- class LCA models for this data and (hopefully) the .Rdata files are posted on the website
- Exercise: Using the .Rdata files
- Determine which model is better according to AIC\BIC
- Want the lowest value
- Make a profile plot of the best solution and interpret the classes
- What kind of substances users are there?

Code to pull out LL and compute AIC\BIC

#Pull out LL

LL_2c <- model@output$Minus2LogLikelihood

LL_2cnsam = 1878

#parameters

npar<- (nclass-1) + (nthresh*nvar*nclass

npar

#Compute AIC & BIC

AIC_2c = 2*npar + LL_2c

AIC_2c

BIC_2c = LL_2c + (npar*log(nsam))

BIC_2c

Problem with LCA

- Once in a class, everyone “looks” the same.
- In the context of substance abuse, unlikely that every user will have the same patterns of use
- Withdrawal, tolerance, hazardous use
- There is variation within a latent class
- Severity
- One proposed solution is the factor mixture model
- Uses a latent class variables to classify individuals and latent factor to model severity

Factor Mixture Model

σ2F

F

C

λ3

λ1

λ2

λ4

λ5

x1

x2

x3

x4

x5

Classes can be indicated by item thresholds (categorical)\ item means (continuous) or factor mean and variance

General Factor Mixture Model

yik = Λkηik + εik,

ηik = αk+ ζik,

where,

ζik~ N(0, Ψk)

- Similar to the FA model, except many parameters can be class varying as indicated by the subscript k
- Several variations of this model which differ in terms of the measurement invariance
- Lubke & Neale (2005), Clark et al. (2012)

How do we do this in OpenMx?

- You’ll have to wait till tomorrow!
- Factor Mixture Model is a generalization of the Growth Mixture Model we’ll talk about tomorrow afternoon.

How do we combine the ACDE model and mixtures?

What would this look like for the FMM?

1.0 (MZ) / 0.5 (DZ)

1.0

1.0 (MZ) / 0.5 (DZ)

1.0

aA

aA

CA

CB

cB

eA

cA

aA

aB

eB

cA

cA

eA

eA

FB

FA

x1B

x1A

x2B

x2A

x3B

x3A

x4A

x4B

FMM & ACE CONT’D

- One of many possible ways to do FMM & ACE in the same model
- Can also have class specific ACE on the factors
- Each class has own heritability

From Muthén et al. (2006)

Issue with Option 1

- Model is utilizes the liability threshold model to “covert” the latent categorical variable, C, to a latent normal variable
- This requires that classes are ordered
- Ex: high, medium, low users
- Don’t always have nicely ordered classes
- Models are VERY time intensive
- Take a vacation for a week or two

Option 2: Three-Step Method

- Estimate mixture model
- Assign individuals into their most likely latent class based on the posterior probabilities of class membership
- Use the observed, categorical variable of assigned class membership as the phenotype in a liability threshold model version of ACE analysis

Note: Requires ordered classes

Option 2a

- Contingency table analysis using most likely class membership
- Concordance between twins in terms of most likely class membership
- If your classes are not ordered
- Odds Ratio
- Excess twin concordance due to stronger genetic relationship can be represented by the OR for MZ twins compared to the OR for DZ twins.
- Place restrictions on the contingency table to test specific hypotheses
- Mendelian segregation, only shared environmental effects
- Eaves (1993)

Issues with Option 2

- Potential for biased parameter estimates and underestimated standard errors
- Assigned membership ignores fractional class membership suggested by posterior probabilities
- Treat the classification as not having any sampling error
- Good option when entropy is high\ well separated classes

Selection of Cross-Sectional Mixture Genetic Analysis Writings

- Latent Class Analysis
- Eaves, 1993; Muthén et al., 2006; Clark, 2010
- Factor Mixture Analysis
- Neale & Gillespie, 2005 (?); Clark, 2010; Clark et al. (in preparation)
- Additional References
- McLachlan, Do, & Ambroise, 2004
- Mixtures in Substance Abuse
- Gillespie (2011, 2012)
- Great cannabis examples

Download Presentation

Connecting to Server..