This talk compares marginal risk differences and risk ratios between two groups for a vector of correlated binary variables. It addresses sparse and unbalanced data, where test statistics have discrete support and asymptotic theory is questionable.
Comparing Margins of Multivariate Binary Data
Bernhard Klingenberg
Assoc. Prof. of Statistics, Williams College, MA
www.williams.edu/~bklingen
Outline
• Setup:
  • Two indep. groups
  • Response: Vector of k correlated binary variables (multivariate binary)
• Goal:
  • Inference about k margins:
    • Marginal Risk Differences
    • Marginal Risk Ratios
• Challenges:
  • Associations of various degrees among binary variables
  • Simultaneous inference
  • Sparse and/or unbalanced data
  • Test statistics with discrete support
  • Asymptotic theory questionable
Outline • Motivating Examples • From drug safety or animal toxicity/carcinogenicity studies Source: http://us.gsk.com/products/assets/us_advair.pdf
Source: http://www.pfizer.com/files/products/uspi_lipitor.pdf
Outline
• Example: AEs from a vaccine trial (flu shot):

  > head(Y1) # ACTIVE Treatment n1=1971
  ID HEADACHE PAIN MYALGIA ARTHRALGIA MALAISE FATIGUE CHILLS
   2        1    1       1          1       1       1      1
   4        0    1       1          0       0       1      0
   5        1    0       0          0       0       0      0
   6        1    1       1          1       1       1      1
   7        0    0       0          0       0       1      0
   9        1    0       1          1       1       1      1

  > head(Y2) # PLACEBO Treatment n2=1554
  ID HEADACHE PAIN MYALGIA ARTHRALGIA MALAISE FATIGUE CHILLS
   1        0    0       0          0       0       0      0
   3        0    0       0          0       0       0      0
   8        0    0       0          0       1       0      0
  10        0    0       0          0       0       0      0
  11        0    0       0          0       0       0      0
  15        0    0       1          0       0       1      0
Notation and Setup
• k-dimensional response vectors (Group 1, Group 2)
• Random sample in each group (Group 1, Group 2)
• Joint distrib. in each group depends on 2^k − 1 parameters (Group 1, Group 2)
Comparing Margins
• Usually only interested in the k margins (Group 1, Group 2)
• With just two (k=2) adverse events: [2×2 tables of Headache by Pain, one for Group 1 and one for Group 2]
Comparing Margins
• Differences in marginal incidence rates between Group 1 (Treatment) and Group 2 (Control)

                       Group1  Group2    Diff
  HEADACHE             0.2603  0.2407  0.0196
  INJECTION SITE PAIN  0.6088  0.1384  0.4705
  MYALGIA              0.2588  0.1088  0.1500
  ARTHRALGIA           0.0893  0.0579  0.0314
  MALAISE              0.2085  0.1332  0.0753
  FATIGUE              0.2476  0.2098  0.0378
  CHILLS               0.0928  0.0463  0.0465
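A minimal sketch of how such a table of marginal rates could be computed in R, assuming `Y1` and `Y2` are 0/1 matrices with one column per adverse event. Only the six rows per group printed by `head()` earlier are typed in here, so the output illustrates the computation and will not match the full-trial rates.

```r
# First six subjects per group, copied from the head() output shown earlier.
# The full trial has n1 = 1971 and n2 = 1554; this subset is for illustration only.
ae <- c("HEADACHE", "PAIN", "MYALGIA", "ARTHRALGIA", "MALAISE", "FATIGUE", "CHILLS")
Y1 <- matrix(c(1,1,1,1,1,1,1,
               0,1,1,0,0,1,0,
               1,0,0,0,0,0,0,
               1,1,1,1,1,1,1,
               0,0,0,0,0,1,0,
               1,0,1,1,1,1,1),
             ncol = 7, byrow = TRUE, dimnames = list(NULL, ae))
Y2 <- matrix(c(0,0,0,0,0,0,0,
               0,0,0,0,0,0,0,
               0,0,0,0,1,0,0,
               0,0,0,0,0,0,0,
               0,0,0,0,0,0,0,
               0,0,1,0,0,1,0),
             ncol = 7, byrow = TRUE, dimnames = list(NULL, ae))

# Marginal incidence rate of each AE is the column mean of the 0/1 indicators.
p1 <- colMeans(Y1)
p2 <- colMeans(Y2)
margins <- cbind(Group1 = p1, Group2 = p2, Diff = p1 - p2)
print(round(margins, 4))
```

Applied to the full `Y1` and `Y2`, the same three lines at the end would give the rates and differences in the table.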
Family of Tests • j-th Null Hypothesis: • Unrestricted and restricted MLEs:
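As a rough illustration of a score statistic for one margin, the sketch below assumes the null value of each risk difference is zero. In that special case the restricted MLE of the common rate is the pooled proportion. For a nonzero null value the restricted MLEs need a constrained maximization that is not shown. The function name `score_stat` is hypothetical, and the rates are the rounded values from the marginal incidence table.

```r
n1 <- 1971; n2 <- 1554
ae <- c("HEADACHE", "PAIN", "MYALGIA", "ARTHRALGIA", "MALAISE", "FATIGUE", "CHILLS")
# Rounded marginal rates taken from the slide's table (unrestricted MLEs).
p1 <- setNames(c(0.2603, 0.6088, 0.2588, 0.0893, 0.2085, 0.2476, 0.0928), ae)
p2 <- setNames(c(0.2407, 0.1384, 0.1088, 0.0579, 0.1332, 0.2098, 0.0463), ae)

# Score statistic for H0: p1j - p2j = 0 (assumed null value).
score_stat <- function(p1, p2, n1, n2) {
  p0 <- (n1 * p1 + n2 * p2) / (n1 + n2)        # restricted MLE under H0: pooled rate
  (p1 - p2) / sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
}

z     <- score_stat(p1, p2, n1, n2)
raw_p <- 2 * pnorm(-abs(z))                    # unadjusted two-sided P-values
print(round(cbind(z = z, raw_p = raw_p), 4))
```

The P-values here are unadjusted. A multiplicity adjustment would compare each statistic with the joint distribution of all k statistics.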
Comparing Margins • Estimates of marginal incidence rates and test statistics comparing Group 1 (Treatment) and Group 2 (Control)
Asymptotic Test • Note: • Asymptotically, multivariate normal with covariance matrix determined by
Asymptotic Test
• Correlation Matrix:

  > round(cov2cor(Sigma),2)
       d1   d2   d3   d4   d5   d6   d7
  d1 1.00 0.04 0.29 0.26 0.38 0.41 0.27
  d2      1.00 0.18 0.09 0.08 0.10 0.01
  d3           1.00 0.46 0.35 0.36 0.30
  d4                1.00 0.33 0.33 0.32
  d5                     1.00 0.51 0.44
  d6                          1.00 0.37
  d7                               1.00

  > qmvnorm(0.95, tail="both.tails", corr=cov2cor(Sigma))
  $quantile
  [1] 2.656222
Asymptotic Test
• Correlation Matrix:

  > round(cov2cor(Sigma),2)
       d1   d2   d3   d4   d5   d6   d7
  d1 1.00 0.06 0.33 0.28 0.41 0.41 0.29
  d2      1.00 0.28 0.11 0.15 0.12 0.09
  d3           1.00 0.46 0.41 0.36 0.35
  d4                1.00 0.32 0.34 0.28
  d5                     1.00 0.50 0.47
  d6                          1.00 0.37
  d7                               1.00

  > qmvnorm(0.95, tail="both.tails", corr=cov2cor(Sigma))
  $quantile
  [1] 2.653783
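The slides do not show how `Sigma` was built, so the following is only a sketch under a stated assumption: for two independent groups, the covariance matrix of the vector of differences in sample proportions can be estimated by `cov(Y1)/n1 + cov(Y2)/n2`. The data are simulated (hypothetical rates, k = 3) because the trial data are not available here, and the helper `sim_group` is invented for the illustration. The `qmvnorm` call copies the one on the slide and runs only if the `mvtnorm` package is installed.

```r
set.seed(1)

# Hypothetical correlated binary data: a shared subject effect u induces
# positive association among the k outcomes of the same subject.
sim_group <- function(n, rates) {
  u <- rnorm(n)
  # u + noise ~ N(0, 2), so this threshold gives marginal rate r per outcome
  sapply(rates, function(r) as.integer(u + rnorm(n) > sqrt(2) * qnorm(1 - r)))
}
Y1 <- sim_group(500, c(0.30, 0.50, 0.20))   # "treatment" (assumed rates)
Y2 <- sim_group(400, c(0.20, 0.30, 0.10))   # "control" (assumed rates)

# Assumed estimator: covariance of the difference of two independent mean vectors.
Sigma <- cov(Y1) / nrow(Y1) + cov(Y2) / nrow(Y2)
R <- cov2cor(Sigma)
print(round(R, 2))

# Two-sided equicoordinate 95% quantile, as on the slide (needs mvtnorm).
if (requireNamespace("mvtnorm", quietly = TRUE)) {
  print(mvtnorm::qmvnorm(0.95, tail = "both.tails", corr = R)$quantile)
}
```

The resulting quantile lies between the unadjusted normal quantile and the Bonferroni one, and moves toward the former as the correlations grow.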
Permutation Approach
• When testing the k marginal null hypotheses, can use a permutation approach
• This assumes the two joint distributions are exchangeable (i.e. identical), a much stronger assumption than equality of the margins under the null
• Need two extra conditions:
  • Sequence of all 0's as or more likely to occur under Group 2 (Control)
  • Sequence of all 1's as or more likely to occur under Group 1 (Treatment)
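One common way to carry out such a permutation approach is to shuffle group labels over whole subjects and record the maximum absolute statistic across the k margins. The sketch below takes that route as an assumption about what is intended. It uses simulated data with identical distributions in both groups (hypothetical rates, k = 3), a pooled-variance score statistic, and the invented helper names `sim_group`, `score_stats` and `perm_max`.

```r
set.seed(2)

# Hypothetical data generated under exchangeability (same distribution in both groups).
sim_group <- function(n, rates) {
  u <- rnorm(n)  # shared subject effect gives correlated outcomes
  sapply(rates, function(r) as.integer(u + rnorm(n) > sqrt(2) * qnorm(1 - r)))
}
rates <- c(0.30, 0.50, 0.20)
Y1 <- sim_group(200, rates)
Y2 <- sim_group(150, rates)

# Score statistics for H0: difference = 0 in each margin (pooled variance).
score_stats <- function(Y1, Y2) {
  n1 <- nrow(Y1); n2 <- nrow(Y2)
  p1 <- colMeans(Y1); p2 <- colMeans(Y2)
  p0 <- (n1 * p1 + n2 * p2) / (n1 + n2)
  (p1 - p2) / sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
}

# Permute whole response vectors so the association among outcomes is preserved.
perm_max <- function(Y1, Y2, B = 499) {
  Y  <- rbind(Y1, Y2)
  n1 <- nrow(Y1)
  replicate(B, {
    idx <- sample.int(nrow(Y), n1)
    max(abs(score_stats(Y[idx, , drop = FALSE], Y[-idx, , drop = FALSE])))
  })
}

z_obs  <- score_stats(Y1, Y2)
max_t  <- perm_max(Y1, Y2)
c_perm <- unname(quantile(max_t, 0.95))   # permutation critical value at alpha = 0.05
adj_p  <- sapply(abs(z_obs), function(z) (1 + sum(max_t >= z)) / (length(max_t) + 1))
print(c_perm)
print(round(adj_p, 3))
```

Because rows are permuted as a unit, the pooled rate of each margin is the same in every permutation, so the statistics stay well defined as long as no pooled column is all 0's or all 1's.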
Permutation vs. Asymptotic
• Permutation vs. asymptotic distribution of the test statistic [figure: permutation distribution overlaid with asymptotic distribution]
• Critical values (α = 0.05): c_perm = 2.655, c_asympt = 2.654, c_Bonf = 2.690
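The Bonferroni critical value quoted above can be checked with base R, assuming it is the two-sided normal quantile at level α/k with k = 7 margins. The variable names are illustrative.

```r
alpha <- 0.05
k     <- 7                                  # number of margins (adverse events)

c_unadj <- qnorm(1 - alpha / 2)             # no multiplicity adjustment
c_bonf  <- qnorm(1 - alpha / (2 * k))       # Bonferroni: split alpha over k two-sided tests

print(round(c(unadjusted = c_unadj, Bonferroni = c_bonf), 3))
```

The permutation and asymptotic critical values (2.655 and 2.654) sit just below the Bonferroni value because they use the positive correlation among the statistics.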
Family of Tests • Results: Raw and Adjusted P-values
Simultaneous Confidence Intervals • Invert family of tests: • Confidence Region: • Simplifies to simultaneous confidence intervals if
Simultaneous Confidence Intervals
• Results: Inverting Score test

                diff       LB      UB
  HEADACHE    0.0196  -0.0196  0.0583
  PAIN        0.4705   0.4323  0.5069
  MYALGIA     0.1500   0.1162  0.1835
  ARTHRALGIA  0.0314   0.0078  0.0547
  MALAISE     0.0753   0.0416  0.1086
  FATIGUE     0.0378  -0.0002  0.0752
  CHILLS      0.0465   0.0239  0.0692
Simultaneous Confidence Intervals • We used (and recommend) score statistic • Could use Wald statistic instead • This is equivalent to fitting marginal model via GEE: • asympt. multiv. normal, with (sandwich) covariance matrix (same as before) • Use distribution of for multiplicity adjustment
Simultaneous Confidence Intervals
• Results: GEE approach (= inverting Wald test)

                diff       LB      UB
  HEADACHE    0.0196  -0.0194  0.0586
  PAIN        0.4705   0.4331  0.5078
  MYALGIA     0.1500   0.1164  0.1836
  ARTHRALGIA  0.0314   0.0082  0.0546
  MALAISE     0.0753   0.0419  0.1087
  FATIGUE     0.0378   0.0001  0.0755
  CHILLS      0.0465   0.0241  0.0689
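A sketch of the Wald-type interval, difference ± critical value × unpooled standard error, using only numbers printed on the slides: the rounded marginal rates, the group sizes, and the `qmvnorm` quantile 2.656222. Which of the two reported quantiles was actually used for the table is not stated, so that choice is an assumption, and no GEE software is called here.

```r
n1 <- 1971; n2 <- 1554
ae <- c("HEADACHE", "PAIN", "MYALGIA", "ARTHRALGIA", "MALAISE", "FATIGUE", "CHILLS")
# Rounded marginal rates from the slide's table.
p1 <- setNames(c(0.2603, 0.6088, 0.2588, 0.0893, 0.2085, 0.2476, 0.0928), ae)
p2 <- setNames(c(0.2407, 0.1384, 0.1088, 0.0579, 0.1332, 0.2098, 0.0463), ae)

c_crit <- 2.656222   # equicoordinate quantile printed on the slide (assumed to apply here)

diff <- p1 - p2
se   <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # unpooled (Wald) standard error
ci   <- cbind(diff = diff, LB = diff - c_crit * se, UB = diff + c_crit * se)
print(round(ci, 4))
```

With these rounded inputs the limits agree with the GEE table to about three decimal places. The score-based intervals differ slightly because their standard error is evaluated at the restricted MLEs.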