This document provides a detailed overview of loglinear models (LLMs) for exploring associations among categorical variables. It covers two-way and three-way tables, conditional independencies, and the construction of multigraphs, including the maximum spanning tree. Practical examples illustrate how to apply LLMs in real-world scenarios, such as analyzing survey data on substance use among high school seniors in Dayton, Ohio. The guide emphasizes the interpretation of probabilistic relationships and the implications of collapsibility in statistical modeling.
The Multigraph for Loglinear Models
Harry Khamis
Statistical Consulting Center, Wright State University, Dayton, Ohio, USA
OUTLINE
1. LOGLINEAR MODEL (LLM)
   - two-way table
   - three-way table
   - examples
2. MULTIGRAPH
   - construction
   - maximum spanning tree
   - conditional independencies
   - collapsibility
3. EXAMPLES
Loglinear Model
Goal: identify the structure of associations among a set of categorical variables.
LLM: two variables
I × J contingency table of X (rows) by Y (columns):

                          Y
X         1      2      3     …      J     Total
---------------------------------------------------
1        n11    n12    n13    …     n1J     n1+
2        n21    n22    n23    …     n2J     n2+
⋮         ⋮      ⋮      ⋮             ⋮       ⋮
I        nI1    nI2    nI3    …     nIJ     nI+
Total    n+1    n+2    n+3    …     n+J     n
LLM: two variables
Example: Survey of High School Seniors in Dayton, Ohio
Collaboration: WSU Boonshoft School of Medicine and United Health Services of Dayton

                             Marijuana Use?
                           Yes      No    Total
------------------------------------------------
Cigarette Use?    Yes      914     581     1495
                  No        46     735      781
Total                      960    1316     2276
LLM: two variables
Two discrete variables, X and Y.
Model of independence: generating class is [X][Y].
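LLM: two variables
LLM of independence, where $m_{ij}$ is the expected frequency in cell (i, j):
$\log m_{ij} = \lambda + \lambda_i^X + \lambda_j^Y$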
LLM: two variables
Saturated LLM, generating class [XY]:
$\log m_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_{ij}^{XY}$
LLM: two variables
Interpretation           Generating Class    Probabilistic Model
------------------------------------------------------------------
X and Y independent      [X][Y]              $p_{ij} = p_{i+}p_{+j}$
X and Y dependent        [XY]                $p_{ij}$
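Under [X][Y] the maximum likelihood fitted counts have the closed form $m_{ij} = n_{i+}n_{+j}/n$, while the saturated model [XY] reproduces the observed counts exactly (so its G² is 0). The sketch below is plain Python; the function names are illustrative, not from any package. It fits [X][Y] to the cigarette-by-marijuana table, confirms that the fitted odds ratio is exactly 1, and computes the likelihood-ratio statistic G² on 1 df. A large G² says [X][Y] fits poorly.

```python
import math

# Cigarette use (rows: Yes, No) by marijuana use (columns: Yes, No),
# Dayton high school seniors, n = 2276 (counts from the two-way table above).
table = [[914, 581],
         [46, 735]]

def fit_independence(counts):
    """MLE of the expected counts under [X][Y]: m_ij = n_i+ * n_+j / n."""
    row = [sum(r) for r in counts]
    col = [sum(c) for c in zip(*counts)]
    n = sum(row)
    return [[ri * cj / n for cj in col] for ri in row]

def g2(observed, fitted):
    """G^2 = 2 * sum n_ij * log(n_ij / m_ij); empty cells contribute 0."""
    return 2 * sum(o * math.log(o / m)
                   for orow, mrow in zip(observed, fitted)
                   for o, m in zip(orow, mrow) if o > 0)

def odds_ratio(t):
    """Cross-product ratio of a 2x2 table given as nested lists."""
    return t[0][0] * t[1][1] / (t[0][1] * t[1][0])

fitted = fit_independence(table)
print(round(odds_ratio(table), 2))   # observed odds ratio
print(round(odds_ratio(fitted), 6))  # fitted odds ratio under [X][Y]
print(round(g2(table, fitted), 1))   # G^2 for [X][Y], 1 df
```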
LLM: three variables
Example: Dayton High School Data

                                  Marijuana Use
Alcohol Use    Cigarette Use      Yes       No
-----------------------------------------------
Yes            Yes                911      538
               No                  44      456
No             Yes                  3       43
               No                   2      279
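LLM: three variables
Saturated LLM, [XYZ]:
$\log m_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ} + \lambda_{ijk}^{XYZ}$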
LLM: three variables
Interpretation               Generating Class    Probabilistic Model
----------------------------------------------------------------------
mutual independence          [X][Y][Z]           $p_{ijk} = p_{i++}p_{+j+}p_{++k}$
joint independence           [XZ][Y]             $p_{ijk} = p_{i+k}p_{+j+}$
conditional independence     [XY][XZ]            $p_{ijk} = p_{ij+}p_{i+k}/p_{i++}$
homogeneous association*     [XY][XZ][YZ]        *
saturated model              [XYZ]               $p_{ijk}$
* nondecomposable model
Decomposable LLMs
- closed-form expressions for the MLEs
- closed-form expressions for asymptotic variances (Lee, 1977)
- the conditional G² statistic simplifies
- they allow for causal interpretations
- the LLM is easier to interpret
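As an illustration of the first point, the MLEs for the decomposable model [XY][XZ] come straight from observed margins, $m_{ijk} = n_{ij+}n_{i+k}/n_{i++}$, with no iterative fitting. The sketch below assumes X = alcohol, Y = cigarette, Z = marijuana purely to demonstrate the formula on the Dayton counts; it is not a claim that this model fits those data. The function name is hypothetical.

```python
# counts[i][j][k]: i = alcohol, j = cigarette, k = marijuana; index 0 = Yes, 1 = No.
# Dayton high school data from the three-way table above.
counts = [[[911, 538], [44, 456]],
          [[3, 43], [2, 279]]]

def fit_conditional_independence(n):
    """Closed-form MLE under [XY][XZ]: m_ijk = n_ij+ * n_i+k / n_i++."""
    I, J, K = len(n), len(n[0]), len(n[0][0])
    fitted = [[[0.0] * K for _ in range(J)] for _ in range(I)]
    for i in range(I):
        n_i = sum(sum(row) for row in n[i])                  # n_i++
        for j in range(J):
            n_ij = sum(n[i][j])                              # n_ij+
            for k in range(K):
                n_ik = sum(n[i][jj][k] for jj in range(J))   # n_i+k
                fitted[i][j][k] = n_ij * n_ik / n_i
    return fitted

fitted = fit_conditional_independence(counts)
for i, level in enumerate(("Yes", "No")):
    print("Alcohol", level, [[round(m, 1) for m in row] for row in fitted[i]])
```

The fitted values reproduce the observed XY and XZ margins, which is exactly the sufficient-statistic property of the MLE for this generating class.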
3 Categorical Variables: X, Y, and Z ([X⊗Y] denotes that X and Y are independent)
If [X⊗Y] and [Y⊗Z], then [X⊗Z]: FALSE!
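One counterexample is enough to refute the claim. In the hypothetical distribution below (made-up probabilities, not data from the talk), X and Z are the same fair coin and Y is a separate fair coin, so X⊗Y and Y⊗Z both hold while X and Z are perfectly associated. Exact arithmetic with Python's `fractions` module avoids floating-point near-misses.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution P(X=x, Y=y, Z=z): X and Z are the same fair
# coin, and Y is a separate fair coin independent of both.
p = {(x, y, z): (Fraction(1, 4) if x == z else Fraction(0))
     for x, y, z in product((0, 1), repeat=3)}

def marginal(p, axes):
    """Sum the joint distribution down to the variables at positions `axes`."""
    out = {}
    for cell, prob in p.items():
        key = tuple(cell[a] for a in axes)
        out[key] = out.get(key, 0) + prob
    return out

def independent(p, a, b):
    """True if the variables at positions a and b are (marginally) independent."""
    pab, pa, pb = marginal(p, (a, b)), marginal(p, (a,)), marginal(p, (b,))
    return all(pab[(u, v)] == pa[(u,)] * pb[(v,)] for (u, v) in pab)

X, Y, Z = 0, 1, 2
print(independent(p, X, Y), independent(p, Y, Z), independent(p, X, Z))
# prints: True True False
```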
LLM: three variables
Interpretation               Generating Class    Probabilistic Model
----------------------------------------------------------------------
mutual independence          [X][Y][Z]           $p_{ijk} = p_{i++}p_{+j+}p_{++k}$
joint independence           [XZ][Y]             $p_{ijk} = p_{i+k}p_{+j+}$
conditional independence     [XY][XZ]            $p_{ijk} = p_{ij+}p_{i+k}/p_{i++}$
homogeneous association      [XY][XZ][YZ]        $p_{ijk} = \psi_{ij}\phi_{ik}\omega_{jk}$
saturated model              [XYZ]               $p_{ijk}$
3 Categorical Variables: X, Y, and Z
If [Y⊗Z] at every level X = 1, 2, …, then [Y⊗Z] marginally: FALSE!
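Conditional independence of Y and Z given X does not survive collapsing over X when X is associated with both Y and Z. The hypothetical distribution below (made-up probabilities) is built in the [XY][XZ] form $p_{ijk} = p_{ij+}p_{i+k}/p_{i++}$, so Y and Z factor within each level of X by construction; the collapsed YZ table, however, does not factor.

```python
from fractions import Fraction
from itertools import product

# Hypothetical distribution p_ijk = P(X=i) P(Y=j | X=i) P(Z=k | X=i),
# i.e. the [XY][XZ] model, so Y and Z are independent within each level of X.
px = {0: Fraction(1, 2), 1: Fraction(1, 2)}
py1 = {0: Fraction(1, 5), 1: Fraction(4, 5)}  # P(Y=1 | X=x), made up
pz1 = {0: Fraction(1, 5), 1: Fraction(4, 5)}  # P(Z=1 | X=x), made up

def bern(p1, v):
    """P(V = v) for a binary V with P(V = 1) = p1."""
    return p1 if v == 1 else 1 - p1

p = {(x, y, z): px[x] * bern(py1[x], y) * bern(pz1[x], z)
     for x, y, z in product((0, 1), repeat=3)}

def factorizes(t):
    """True if a 2x2 table of nonnegative weights equals the product of its margins."""
    total = sum(t.values())
    ry = {y: (t[(y, 0)] + t[(y, 1)]) / total for y in (0, 1)}
    rz = {z: (t[(0, z)] + t[(1, z)]) / total for z in (0, 1)}
    return all(t[(y, z)] / total == ry[y] * rz[z] for y, z in t)

cells = list(product((0, 1), repeat=2))
partial = {x: {(y, z): p[(x, y, z)] for y, z in cells} for x in (0, 1)}
collapsed = {(y, z): p[(0, y, z)] + p[(1, y, z)] for y, z in cells}

print([factorizes(partial[x]) for x in (0, 1)], factorizes(collapsed))
# prints: [True, True] False
```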
3 Categorical Variables: X, Y, and Z
If [Y⊗Z] marginally, then [Y⊗Z] at every level X = 1, 2, 3, …: FALSE!
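The converse fails too. In this hypothetical construction (not from the talk), Y and Z are independent fair coins and X = Y XOR Z. Marginally Y and Z are independent, but once X is fixed, Z is completely determined by Y.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: Y and Z are independent fair coins and X = Y XOR Z.
p = {(x, y, z): (Fraction(1, 4) if x == (y ^ z) else Fraction(0))
     for x, y, z in product((0, 1), repeat=3)}

def factorizes(t):
    """True if a 2x2 table of nonnegative weights equals the product of its margins."""
    total = sum(t.values())
    ry = {y: (t[(y, 0)] + t[(y, 1)]) / total for y in (0, 1)}
    rz = {z: (t[(0, z)] + t[(1, z)]) / total for z in (0, 1)}
    return all(t[(y, z)] / total == ry[y] * rz[z] for y, z in t)

cells = list(product((0, 1), repeat=2))
collapsed = {(y, z): p[(0, y, z)] + p[(1, y, z)] for y, z in cells}
partial = {x: {(y, z): p[(x, y, z)] for y, z in cells} for x in (0, 1)}

print(factorizes(collapsed), [factorizes(partial[x]) for x in (0, 1)])
# prints: True [False, False]
```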
Which Treatment Is Better?
(Number cured, with the proportion cured in parentheses)

TRIAL 1                        CURED?
TREATMENT            Yes          No     Total
-----------------------------------------------
A                 40 (.20)       160       200
B                 30 (.15)       170       200

TRIAL 2                        CURED?
TREATMENT            Yes          No     Total
-----------------------------------------------
A                 85 (.85)        15       100
B                300 (.75)       100       400

Combine TRIALS 1 and 2:        CURED?
TREATMENT            Yes          No     Total
-----------------------------------------------
A                125 (.42)       175       300
B                330 (.55)       270       600

“Ask Marilyn”, PARADE section, DDN, pages 6-7, April 28, 1996
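A short check of the arithmetic: treatment A has the higher cure rate within each trial, yet B looks better once the trials are combined. Most of B's patients (400 of 600) are in Trial 2, where both treatments cure far more often. Combining the trials is the same as collapsing the three-way table over TRIAL.

```python
from fractions import Fraction

# (cured, total) by treatment within each trial, from the tables above.
trials = {
    "Trial 1": {"A": (40, 200), "B": (30, 200)},
    "Trial 2": {"A": (85, 100), "B": (300, 400)},
}

for name, arms in trials.items():
    rates = {t: Fraction(*arms[t]) for t in ("A", "B")}
    print(name, {t: float(r) for t, r in rates.items()})

# Combining the trials adds the counts, i.e. collapses over the TRIAL factor.
combined = {t: tuple(sum(vals) for vals in zip(*(arms[t] for arms in trials.values())))
            for t in ("A", "B")}
print("Combined", {t: round(float(Fraction(*combined[t])), 2) for t in combined})
# prints: Combined {'A': 0.42, 'B': 0.55}
```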
Florida Homicide Convictions Resulting in Death Penalty
ML Radelet and GL Pierce, Florida Law Review 43: 1-34, 1991

All victims                   Death Penalty
Defendant's Race             Yes           No
-----------------------------------------------
White                     53 (0.11)       430
Black                     15 (0.08)       176

White victim                  Death Penalty
Defendant's Race             Yes           No
-----------------------------------------------
White                     53 (0.11)       414
Black                     11 (0.23)        37

Black victim                  Death Penalty
Defendant's Race             Yes           No
-----------------------------------------------
White                      0 (0.00)        16
Black                      4 (0.03)       139
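Odds ratios make the reversal explicit. Collapsed over victim's race, the odds ratio (white versus black defendants) is above 1, yet within each victim's race it is below 1, so the table is not collapsible over victim's race. Plain Python, no package assumptions:

```python
# (death penalty: yes, no) by defendant's race within each victim's race,
# Radelet and Pierce (1991), as tabulated above.
by_victim = {
    "White victim": {"White": (53, 414), "Black": (11, 37)},
    "Black victim": {"White": (0, 16), "Black": (4, 139)},
}

def odds_ratio(t):
    """Odds of a death sentence for white relative to black defendants."""
    (a, b), (c, d) = t["White"], t["Black"]
    return (a * d) / (b * c)

partial = {v: round(odds_ratio(t), 2) for v, t in by_victim.items()}
# Collapse over victim's race by adding the two partial tables cell by cell.
collapsed = {race: tuple(x + y for x, y in zip(*(t[race] for t in by_victim.values())))
             for race in ("White", "Black")}
print(partial, round(odds_ratio(collapsed), 2))
# prints: {'White victim': 0.43, 'Black victim': 0.0} 1.45
```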
Multigraph Representation of LLMs
- Vertices: the generators of the LLM
- Multiedges: two vertices are joined by as many edges as the number of indices they share
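The construction rule is mechanical, so it is easy to sketch. This version assumes each factor is a single letter, so a generator such as "ACR" can be treated as a set of characters; `multigraph` is a hypothetical helper name. Each pair of generators maps to its edge multiplicity, and pairs with no shared factor are not joined.

```python
from itertools import combinations

def multigraph(generating_class):
    """Generator multigraph of an LLM: each generator is a vertex, and every
    pair of vertices is joined by as many edges as the factors they share.
    Generators are strings of single-letter factor names, e.g. "ACR"."""
    edges = {}
    for u, v in combinations(generating_class, 2):
        shared = len(set(u) & set(v))
        if shared:  # vertices with no common factor are not joined
            edges[(u, v)] = shared
    return edges

for (u, v), k in multigraph(["AS", "ACR", "MCS", "MAC"]).items():
    print(f"{u} -- {v}: {k} edge(s)")
```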
Multigraph: three variables
[XY][XZ]
(Figure: vertices XY and XZ, joined by one edge for the shared index X.)
Examples of Multigraphs
[AS][ACR][MCS][MAC]
(Figure: multigraph with vertices AS, ACR, MAC, MCS.)
Examples of Multigraphs
[ABCD][ACE][BCG][CDF]
(Figure: multigraph with vertices CDF, ABCD, ACE, BCG.)
Maximum Spanning Tree
The maximum spanning tree of a multigraph M:
- is a tree (a connected graph with no circuits)
- includes each vertex of M
- has the maximum possible total number of edges
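One standard way to find a maximum spanning tree is Kruskal's algorithm with edges taken in decreasing order of weight; here the weight of a pair of generators is its edge multiplicity. The sketch below is our own illustration (hypothetical function name, single-letter factors assumed), not the author's procedure. When weights tie, several maximum spanning trees can exist and only one is returned; a disconnected multigraph yields a spanning forest.

```python
from itertools import combinations

def maximum_spanning_tree(generating_class):
    """Kruskal's algorithm on the generator multigraph, edge weight = number of
    shared factors. Returns branches (u, v, weight). Python's sort is stable,
    so ties are broken by input order."""
    parent = {g: g for g in generating_class}

    def find(g):
        # Follow parent links to the representative of g's current tree.
        while parent[g] != g:
            g = parent[g]
        return g

    candidates = [(len(set(u) & set(v)), u, v)
                  for u, v in combinations(generating_class, 2)]
    tree = []
    for weight, u, v in sorted(candidates, key=lambda e: e[0], reverse=True):
        if weight == 0:
            break  # remaining pairs share no factor, so they are not edges
        ru, rv = find(u), find(v)
        if ru != rv:  # adding (u, v) does not close a circuit
            parent[ru] = rv
            tree.append((u, v, weight))
    return tree

tree = maximum_spanning_tree(["ABCD", "ACE", "BCG", "CDF"])
print(tree, sum(w for _, _, w in tree))
```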
Examples of maximum spanning trees
[XY][XZ]
(Figure: maximum spanning tree on vertices XY and XZ.)
Examples of maximum spanning trees
[AS][ACR][MCS][MAC]
(Figure: maximum spanning tree on vertices AS, ACR, MAC, MCS.)
Examples of maximum spanning trees
[ABCD][ACE][BCG][CDF]
(Figure: maximum spanning tree on vertices CDF, ABCD, ACE, BCG.)
Fundamental Conditional Independencies (FCIs) for a Decomposable LLM
1. Let S be the set of indices in a branch of the maximum spanning tree (the indices shared by the two vertices it joins).
2. Remove each factor of S from the multigraph M; the resulting multigraph is M/S.
3. An FCI is determined as
   $[C_1 \otimes C_2 \otimes \cdots \otimes C_k \mid S]$,
   where C1, C2, …, Ck are the sets of factors in the components of M/S.
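The three steps can be sketched directly. In this illustration the maximum spanning tree branch is passed in by hand, factors are assumed to be single letters, and `components`/`fci` are hypothetical helper names. Factors are listed alphabetically within each set, so the order can differ from the slides; the sets themselves are unordered.

```python
def components(vertices):
    """Union of factors in each connected component of a multigraph whose
    vertices are factor sets (two vertices are adjacent iff they share a factor)."""
    comps = []
    for v in vertices:
        merged, rest = set(v), []
        for c in comps:
            if c & merged:
                merged |= c   # v links this component to the one being built
            else:
                rest.append(c)
        comps = rest + [merged]
    return comps

def fci(generating_class, branch):
    """FCI [C1 ⊗ ... ⊗ Ck | S] for one branch (u, v) of a maximum spanning tree."""
    u, v = branch
    s = set(u) & set(v)                          # step 1: S = indices of the branch
    reduced = [set(g) - s for g in generating_class]
    reduced = [g for g in reduced if g]          # step 2: M/S, empty vertices dropped
    parts = sorted(sorted(c) for c in components(reduced))  # step 3
    return "[" + "⊗".join(",".join(p) for p in parts) + "|" + ",".join(sorted(s)) + "]"

print(fci(["XY", "XZ"], ("XY", "XZ")))               # [Y⊗Z|X]
print(fci(["EG", "CP", "RC", "PG"], ("CP", "RC")))   # [E,G,P⊗R|C]
```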
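FCIs for [XY][XZ]
The maximum spanning tree has a single branch joining XY and XZ, so S = {X}. Removing X from M leaves M/S with the isolated vertices Y and Z, so the FCI is [Y⊗Z|X].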
Collapsibility Conditions
Consider a conditional independence relationship of the form [C1 ⊗ C2|S]. If the levels of all factors in C1 are collapsed (summed over), then all relationships among the remaining factors are undistorted EXCEPT for relationships among factors in S.
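A small numerical check of the condition for [Y⊗Z|X], taking C1 = {Y}, C2 = {Z}, S = {X}: collapsing over Y should leave the association between X and Z undistorted (S has a single factor, so there is no relationship within S to distort). The probabilities below are made up for illustration; exact fractions show that the partial and collapsed XZ odds ratios coincide.

```python
from fractions import Fraction
from itertools import product

# Hypothetical distribution satisfying [Y⊗Z|X], built as
# p_ijk = P(X=i) P(Y=j | X=i) P(Z=k | X=i) from made-up probabilities.
px = {0: Fraction(3, 10), 1: Fraction(7, 10)}
py1 = {0: Fraction(1, 4), 1: Fraction(2, 3)}   # P(Y=1 | X=x)
pz1 = {0: Fraction(1, 5), 1: Fraction(3, 5)}   # P(Z=1 | X=x)

def bern(p1, v):
    """P(V = v) for a binary V with P(V = 1) = p1."""
    return p1 if v == 1 else 1 - p1

p = {(x, y, z): px[x] * bern(py1[x], y) * bern(pz1[x], z)
     for x, y, z in product((0, 1), repeat=3)}

def odds_ratio(t):
    """Odds ratio of a 2x2 table stored as {(row, col): weight}."""
    return t[(0, 0)] * t[(1, 1)] / (t[(0, 1)] * t[(1, 0)])

cells = list(product((0, 1), repeat=2))
# XZ odds ratio at each fixed level of Y ...
partial = [odds_ratio({(x, z): p[(x, y, z)] for x, z in cells}) for y in (0, 1)]
# ... and after collapsing over Y (summing over its levels).
collapsed = odds_ratio({(x, z): p[(x, 0, z)] + p[(x, 1, z)] for x, z in cells})
print(partial, collapsed)
# both partial odds ratios equal the collapsed one (6 for these numbers)
```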
Example: Ob-Gyn Study (Darrocca, et al., 1996)
n = 201 pregnant mothers
Variables:
- E: EGA (Early, Late)
- B: Bishop score (High, Low)
- T: Treatment (Prostin, Placebo)
Example: Ob-Gyn Study

                        BISHOP SCORE (B)
                     High               Low
                    EGA (E)            EGA (E)
TREATMENT (T)    Early   Late       Early   Late
-------------------------------------------------
Prostin            34     24          27     21
Placebo            22     16          35     22

Best-fitting model: [E][TB]
Example: Ob-Gyn Study
Generating class: [E][TB]
Multigraph: vertices E and TB, with no edge between them (they share no factor)
FCI: [E⊗T,B]
Example: Ob-Gyn Study
Collapsed table (collapse over EGA):

                     BISHOP SCORE (B)
TREATMENT (T)      High           Low    Total
-----------------------------------------------
Prostin          58 (0.55)        48      106
Placebo          38 (0.40)        57       95

P = 0.037
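Because E is independent of (T, B) under [E][TB], collapsing over EGA does not distort the association between treatment and Bishop score. The slide does not name the test behind P = 0.037; assuming Pearson's chi-square test of independence without continuity correction reproduces that value. The sketch uses only the standard library, relying on the fact that for 1 df the upper-tail probability equals erfc(sqrt(X²/2)); the function name is hypothetical.

```python
import math

# Collapsed Ob-Gyn table (rows: Prostin, Placebo; columns: Bishop High, Low).
table = [[58, 48],
         [38, 57]]

def pearson_chi2_2x2(t):
    """Pearson X^2 (no continuity correction) and its 1-df p-value."""
    (a, b), (c, d) = t
    n = a + b + c + d
    x2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(chi2 > x2) = P(|Z| > sqrt(x2)) = erfc(sqrt(x2 / 2)).
    return x2, math.erfc(math.sqrt(x2 / 2))

x2, p = pearson_chi2_2x2(table)
print(round(x2, 2), round(p, 3))
# prints: 4.35 0.037
```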
Example: WSU-United Way Study
Variables:
- M: Marijuana (No, Yes)
- A: Alcohol (No, Yes)
- C: Cigarettes (No, Yes)
- R: Race (Other, White)
- S: Sex (Female, Male)
Observed cell frequencies (n = 2,276): 12 0 19 2 1 0 23 23 117 1 218 13 17 1 268 405 17 0 18 1 8 1 19 30 133 1 201 28 17 1 228 453
Example: WSU-United Way Study
Generating class: [ACE][MAC][MCG] (E = Ethnic and G = Gender, the variables labeled R and S above)
(Figure: multigraph M with vertices ACE, MCG, MAC.)
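Example: WSU-United Way Study
For the branch joining ACE and MAC, S = {A,C}. Removing A and C from M leaves M/S with vertices E, M, and MG, whose components are {E} and {M,G}, so the FCI is [E⊗M,G|A,C].
A = Alcohol, C = Cigarette, E = Ethnic, G = Gender, M = Marijuana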
Example: WSU PASS Program
“Preparing for Academic Success”, for students with a GPA below 2.0 at the end of the first quarter
Example: WSU PASS Program
Variables (n = 972):
FACTOR                LABEL    LEVELS
-------------------------------------------------------------------------
Retention             R        1 = No, 2 = Yes
Cohort                C        1, 2, 3, 4
PASS Participation    P        1 = No, 2 = Yes
Ethnic Group          E        1 = Caucasian, 2 = African-American, 3 = Other
Gender                G        1 = Male, 2 = Female
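Example: WSU PASS Program
The best-fitting LLM has generating class [EG][CP][RC][PG].
(Figure: multigraph M, a chain EG, PG, CP, RC in which consecutive vertices are joined by one edge for the shared factor G, P, and C, respectively.)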
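Example: WSU PASS Program
For the branch joining CP and RC, S = {C}. Removing C from M leaves M/S with vertices EG, PG, P, and R; EG, PG, and P form one component and R is isolated, so the FCI is [E,G,P⊗R|C].
C = Cohort, E = Ethnic, G = Gender, P = PASS Participation, R = Retention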
Example: Affinal Relations in Bosnia-Herzegovina
Data courtesy of Dr. Keith Doubt, Department of Sociology, Wittenberg University, Springfield, Ohio.
N = 861 couples from Bosnia-Herzegovina were surveyed concerning affinal relations.
- M: Marriage Type (traditional, elopement)
- L: Location of Man and Wife (same, different)
- E: Ethnicity (Bosniak, Serb, Croat)
- S: Settlement (rural, urban)
Best-fitting model: [MLES]. Consider the structural associations among M, L, and S separately for each ethnic group (E).
Example: Affinal Relations in Bosnia-Herzegovina
- Bosniaks: [ML][LS]
- Serbs: [MS][SL]
- Croats: [M][L][S]
M: Marriage Type, L: Location of Man and Wife, S: Settlement
Conclusions
• The generator multigraph uses mathematical graph theory to analyze and interpret LLMs easily.
• Properties of the multigraph allow one to:
  • find all conditional independencies
  • determine all collapsibility conditions

REFERENCE
Khamis, H. J. (2011). The Association Graph and the Multigraph for Loglinear Models. SAGE series Quantitative Applications in the Social Sciences, No. 167.