Case-control association techniques in genetic studies


March 10, 2011

Karen Curtin, Ph.D.

Division of Genetic Epidemiology and

HCI Pedigree & Population Resource (PPR)

- Background (genetics concepts)

- Basic case-control association

- Complex case-control association

- Genome-wide association

The Human Genome: 6 billion DNA bases (Adenine, Cytosine, Guanine, or Thymine)

License: Creative Commons Attribution 2.0

…AGCCAAATTGGATTC…

Genotype and Haplotype

At any locus (position on a chromosome):

- Read across both chromosomes: genotype CT
- Read along a chromosome: haplotypes C-A and T-G

If allele T can predict allele G, the two alleles are in Linkage Disequilibrium (LD)

90% of genomic variants are SNPs

Single Nucleotide Polymorphism (SNP)

Two alternate forms (alleles) that differ

in sequence at one point in a DNA segment

Source: David Hall, Creative Commons Attribution 2.5 license

Genetic variants: Germline vs. Somatic

- Germline variants/mutations
  - Inherited/in-born mutation
  - In all cells
  - In particular, in germline haploid cells
  - Heritable
  - Cell division: meiosis
- Somatic variants/mutations
  - Acquired mutation
  - Only in an isolated number of cells (tumor site)
  - Generally not heritable
  - Cell division: mitosis

- Background (genetics concepts)

- Basic case-control association

- Complex case-control association

- Genome-wide association

Genetic variants in association studies

Association: two characteristics (disease & genetic variant) occur more often together than expected by chance

- Direct Association / Causal: Functional variant → Disease
  - Functional variant is involved in disease
  - Functional variant is associated with the disease
- Indirect Association: Genetic variant ↔ Functional variant → Disease
  - Genetic variant (SNP) is associated/correlated with underlying functional variant
  - Functional variant is involved in disease
  - Genetic variant (marker) is associated with disease (an initial step; the ultimate goal is to discover the causal variant)

Genetic association study Designs

- Observational
- Exposure variables
- Genetic variants
- Environmental factors
- Classical association study designs
- Unit of interest is an individual
- Cohort study (cross-sectional or longitudinal)
- Case-control study
- Family-based association study
- Unit of interest is a family unit

- Sample individuals based on disease status and without knowledge of exposure status (e.g. genotype)
- CASES (with disease)
- CONTROLS (no disease)
- Usually balanced design (#cases = #controls)
- Retrospective
- Neither prevalence nor incidence can be estimated

Types of Case-Control Study

- Population-based
- Risk estimates can be extrapolated to the source population
- Could be nested in a cohort study
- Selected sampling
- Increases power to detect associations
- Antoniou & Easton (2003)
- Tests of independence are valid
- True positive risks are exaggerated
- Risk estimates cannot be extrapolated to the source population

Case-Control: Population-based

- Source population
- All individuals satisfying predefined criteria
- Source cohort
- A group that is ‘representative’ of the source population
- CASES and CONTROLS occur in relation to population prevalence
- CASES
- Cases selected are ‘representative’ of cases in the source cohort
- In particular, in terms of the exposure variables
- CONTROLS
- Controls selected are ‘representative’ of controls in the source cohort
- In particular, in terms of the exposure variables
- Odds Ratio (estimate of the relative risk) can be extrapolated back to the source population
- Population Attributable Risk (PAR)

Case-Control: Selected Sampling

- Source population
- All individuals satisfying predefined criteria
- Source cohort
- A group that is ‘representative’ of the source population
- CASES and CONTROLS occur in relation to population prevalence
- CASES
- Cases selected are in effect selectively sampled from cases in source cohort
- Family history of disease, severe disease, early onset,…
- CONTROLS
- Controls selected are in effect selectively sampled from controls in source cohort
- Screened negative, no family history,…
- Association analyses are still valid and power may be increased
- BUT…
- Odds Ratio (estimate of the relative risk) cannot be extrapolated back to the source population

Case-Control Study: Odds Ratio

| Disease | Exposure: Yes | Exposure: No |
|---|---|---|
| Cases (Yes) | a | b |
| Controls (No) | c | d |

$$\mathrm{OR} = \frac{a/b}{c/d} = \frac{a \times d}{b \times c}$$

H0: OR = 1 same risk (no association)

OR > 1 indicates increased risk

OR < 1 indicates decreased risk (protective)

95% confidence intervals for the Odds Ratio

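Lower and upper bounds for the risk estimates.

Two common methods:

1) $e^{\ln(\mathrm{OR}) - 1.96\,\mathrm{se}(\ln \mathrm{OR})},\ e^{\ln(\mathrm{OR}) + 1.96\,\mathrm{se}(\ln \mathrm{OR})}$

where $\mathrm{se}(\ln \mathrm{OR}) = \sqrt{1/a + 1/b + 1/c + 1/d}$

2) $\mathrm{OR}^{1 - 1.96/\chi},\ \mathrm{OR}^{1 + 1.96/\chi}$, where $\chi$ is the square root of the chi-square statistic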
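The two interval methods can be compared side by side. The following is a minimal sketch rather than a reference implementation: the function name and the counts are illustrative, the cell labels a, b, c, d follow the odds ratio table, and the chi-square used for the second method is assumed to be the uncorrected Pearson statistic for the 2×2 table.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """a, b: exposed and unexposed cases; c, d: exposed and unexposed controls."""
    odds_ratio = (a * d) / (b * c)
    log_or = math.log(odds_ratio)

    # Method 1: interval on the log scale, then exponentiate.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_based = (math.exp(log_or - z * se), math.exp(log_or + z * se))

    # Method 2: test-based interval, OR ** (1 +/- z / chi), where chi is the
    # square root of the Pearson chi-square for the 2x2 table. It is
    # undefined when the statistic is 0, i.e. when OR is exactly 1.
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    chi = math.sqrt(chi2)
    test_based = tuple(sorted((odds_ratio ** (1 - z / chi),
                               odds_ratio ** (1 + z / chi))))
    return odds_ratio, log_based, test_based

# Illustrative counts: 100 exposed and 120 unexposed cases,
# 60 exposed and 120 unexposed controls.
or_hat, ci_log, ci_test = odds_ratio_with_ci(100, 120, 60, 120)
print(round(or_hat, 2),
      [round(x, 2) for x in ci_log],
      [round(x, 2) for x in ci_test])
```

With moderate counts such as these the two intervals are close to each other; they tend to diverge when cells are small or the odds ratio is far from 1.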

chi-square test

Compares observed values (O) with those expected under independence between rows and columns

$$E = \frac{\text{row total} \times \text{column total}}{N}$$

chi-square statistic, with (rows-1) × (columns-1) degrees of freedom

$$\chi^2 = \sum \frac{(O - E)^2}{E} \sim \chi^2_{(\text{rows}-1)(\text{columns}-1)}$$

Test for Non-independence

H0: Disease and exposure (genotype)

are independent

chi-square tests: contingency tables

2×3 genotype table (2 df)

2×2 grouped genotype table (1 df)

- Dominant or recessive

2×3 ‘dose-dependent’ table

- Armitage test for trend (1 df)

2×2 allele table (1 df)

Modeling genetic exposures

- Exposure = genotype
- Single variant with 2 alleles (SNP)
- Three genotypes: CC, CT, TT
- 2×3 contingency table
- Chi-sq 2df
- Chi-sq 1df (impose a linear dependency between columns)

| | CC | CT | TT |
|---|---|---|---|
| Controls | | | |
| Cases | | | |

Mode of Expression / Inheritance

- Let allele C be disease causing
- Examples of modes of expression are:
- Dominant (TT vs. TC, CC)
- Individuals heterozygous or homozygous for the C allele develop the disease
- Recessive (TT, TC vs. CC)
- Only individuals homozygous for the C allele develop the disease
- Codominant (TT, TC, CC)
- All three genotypes can be distinguished phenotypically
- ‘Additive’ model: TC has r-fold risk, CC has 2r-fold risk

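chi-square test

Observed counts (O), as they appear in the numerators of the statistic:

| | CC | CT | TT |
|---|---|---|---|
| Controls | 120 | 40 | 20 |
| Cases | 120 | 60 | 40 |

Expected counts (E) and totals:

| | CC | CT | TT | Totals |
|---|---|---|---|---|
| Controls | 120 | 50 | 30 | 200 |
| Cases | 120 | 50 | 30 | 200 |
| Totals | 240 | 100 | 60 | 400 |

$$\chi^2 = \frac{(120-120)^2}{120} + \frac{(40-50)^2}{50} + \frac{(20-30)^2}{30} + \frac{(120-120)^2}{120} + \frac{(60-50)^2}{50} + \frac{(40-30)^2}{30} = 10.67$$

p-value = 0.0048 (for a chi-square distribution with 2 df)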
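The arithmetic of this example can be checked in a few lines. This is a minimal sketch that takes the observed and expected counts exactly as they appear in the formula, without recomputing the expected counts from the margins. The closed-form tail probability exp(-x/2) is specific to a chi-square distribution with 2 df.

```python
import math

# Observed (O) and expected (E) counts in the order used in the formula:
# controls CC, CT, TT, then cases CC, CT, TT.
observed = [120, 40, 20, 120, 60, 40]
expected = [120, 50, 30, 120, 50, 30]

chi_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# For exactly 2 degrees of freedom the chi-square survival function
# has the closed form exp(-x / 2).
p_value = math.exp(-chi_stat / 2)

print(round(chi_stat, 2), round(p_value, 4))  # 10.67 0.0048
```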

Genotypic relative risk

- Assess risk (OR) for each genotype relative to the homozygous common genotype

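$$\mathrm{OR}_{het} = \frac{a \times e}{b \times d}\ \text{(CT vs. CC)} \qquad \mathrm{OR}_{hzv} = \frac{a \times f}{c \times d}\ \text{(TT vs. CC)}$$

| Genotype (exposure) | CC | CT | TT |
|---|---|---|---|
| Controls | a | b | c |
| Cases | d | e | f |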

chi-square test / genotypic relative risk

For the same genotype table: Chi-statistic = 10.67, p-value = 0.0048 (for a chi-square distribution with 2 df)

OR het (CT vs. CC) = 1.5

OR hzv (TT vs. CC) = 2.0


Dominant model for exposure

Exposure = CT & TT genotypes; 2×2 test with 1 df

$$\mathrm{OR}_{dom} = \frac{a \times (e+f)}{d \times (b+c)} = 1.67$$

| Genotype | CC | CT or TT |
|---|---|---|
| Controls | a = 120 | (b+c) = 60 |
| Cases | d = 120 | (e+f) = 100 |

Recessive model for exposure

Exposure = TT genotype (vs. CC & CT); 2×2 test with 1 df

$$\mathrm{OR}_{rec} = \frac{(a+b) \times f}{(d+e) \times c} = 1.78$$

| Genotype | CC or CT | TT |
|---|---|---|
| Controls | (a+b) = 160 | c = 20 |
| Cases | (d+e) = 180 | f = 40 |
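The genotype-specific, dominant, and recessive odds ratios quoted in these slides can all be recomputed from one set of genotype counts. The sketch below assumes the observed counts are the ones appearing in the numerators of the chi-square example (controls 120/40/20, cases 120/60/40); the helper name `odds_ratio` is illustrative.

```python
# Observed genotype counts; cell labels a..f follow the genotype table.
controls = {"CC": 120, "CT": 40, "TT": 20}   # a, b, c
cases = {"CC": 120, "CT": 60, "TT": 40}      # d, e, f

def odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    """Odds of exposure in cases divided by odds of exposure in controls."""
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)

# Genotypic relative risks, each against the CC reference genotype.
or_het = odds_ratio(cases["CT"], cases["CC"], controls["CT"], controls["CC"])
or_hzv = odds_ratio(cases["TT"], cases["CC"], controls["TT"], controls["CC"])

# Dominant model: CT and TT pooled as "exposed".
or_dom = odds_ratio(cases["CT"] + cases["TT"], cases["CC"],
                    controls["CT"] + controls["TT"], controls["CC"])

# Recessive model: only TT is "exposed"; CC and CT are pooled as unexposed.
or_rec = odds_ratio(cases["TT"], cases["CC"] + cases["CT"],
                    controls["TT"], controls["CC"] + controls["CT"])

print(round(or_het, 2), round(or_hzv, 2), round(or_dom, 2), round(or_rec, 2))
# 1.5 2.0 1.67 1.78
```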


Armitage Trend Test (2×3 with 1 df)

Assess departures from a fitted trend

| | CC (x1=0) | CT (x2=1) | TT (x3=2) | Total |
|---|---|---|---|---|
| Controls | | | | |
| Cases | | | | R |
| Total | n1 | n2 | n3 | N |
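The statistic itself did not survive in this transcript. As a hedged illustration, the sketch below uses the commonly quoted form of the Cochran-Armitage trend chi-square, with R the total number of cases, n_i the column totals, N the grand total, and x_i the genotype scores. The counts are the observed counts from the numerators of the earlier chi-square example, and the function name is illustrative.

```python
def armitage_trend(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend chi-square (1 df) for a 2 x k table."""
    n = [r + s for r, s in zip(cases, controls)]      # column totals n_i
    N = sum(n)                                        # grand total
    R = sum(cases)                                    # total number of cases
    sum_rx = sum(r * x for r, x in zip(cases, scores))
    sum_nx = sum(m * x for m, x in zip(n, scores))
    sum_nx2 = sum(m * x * x for m, x in zip(n, scores))
    numerator = N * (N * sum_rx - R * sum_nx) ** 2
    denominator = R * (N - R) * (N * sum_nx2 - sum_nx ** 2)
    return numerator / denominator

# Genotype order: CC, CT, TT (scores 0, 1, 2 = number of T alleles).
trend_stat = armitage_trend(cases=[120, 60, 40], controls=[120, 40, 20])
print(round(trend_stat, 2))
```

Because the scores impose a linear dose effect, the test spends a single degree of freedom, which is where its power advantage over the 2 df genotype test comes from when the trend assumption is reasonable.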

Example – genotypic relative risk and trend test

Shephard et al. Cancer Res 2009


Allelic Test

- Exposure = Allele (T vs. C)
- 2 x 2 table (1 df) for a single SNP
- Count every allele (2 per person)
- Doubles the sample size

$$\mathrm{OR}_{allele} = \frac{(2a+b) \times (2f+e)}{(2c+b) \times (2d+e)}$$

| Allele | C | T |
|---|---|---|
| Controls | 2a+b | 2c+b |
| Cases | 2d+e | 2f+e |

OR = 1.633 (T vs. C allele)
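The allele table is obtained by counting two alleles per person. A minimal sketch, again assuming the observed genotype counts from the chi-square example; `allele_counts` is an illustrative helper name.

```python
controls = (120, 40, 20)   # CC, CT, TT counts: a, b, c
cases = (120, 60, 40)      # CC, CT, TT counts: d, e, f

def allele_counts(cc, ct, tt):
    """Return (number of C alleles, number of T alleles)."""
    return 2 * cc + ct, 2 * tt + ct

ctrl_c, ctrl_t = allele_counts(*controls)
case_c, case_t = allele_counts(*cases)

# Odds of carrying T (vs. C) in cases relative to controls.
or_allele = (ctrl_c * case_t) / (ctrl_t * case_c)
print(ctrl_c, ctrl_t, case_c, case_t, round(or_allele, 3))
# 280 80 300 140 1.633
```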

More flexible techniques

- If other factors may have an effect on disease status (affected/unaffected, case/control)
- We want to account for these as covariates
- We want to adjust for matching variables (age, sex, etc.)
- Logistic regression
- Logistic transformation (logit)
- ln(p/(1-p)) = α + β1x1 + β2x2 + ….
- Coefficients α and β’s are estimated using maximum likelihood estimation (MLE)
- Test H0: β=0 against H1: β=β̂ using a likelihood ratio test (LRT)
- Must decide on how to model the genetic exposure
- genotype categories (i.e. CC, CT,TT), dominant, recessive, additive (allele dose)..
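To make the estimation and testing steps concrete, here is a small sketch of a logistic regression with one genetic exposure and no covariates, fitted by Newton-Raphson on grouped data. Everything in it is an assumption for illustration: the dominant coding (x = 1 for CT or TT), the counts (taken from the dominant-model table), and the function names. In practice a statistics package would be used, especially once covariates are added.

```python
import math

def log_likelihood(groups, b0, b1):
    """groups: list of (x, n_cases, n_controls)."""
    ll = 0.0
    for x, n_cases, n_controls in groups:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        ll += n_cases * math.log(p) + n_controls * math.log(1.0 - p)
    return ll

def fit_logistic(groups, iterations=25):
    """Maximum likelihood fit of logit(p) = b0 + b1 * x by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, n_cases, n_controls in groups:
            n = n_cases + n_controls
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = n_cases - n * p          # score contribution
            w = n * p * (1.0 - p)            # information contribution
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) * score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Dominant coding: x = 0 for CC, x = 1 for CT or TT.
groups = [(0, 120, 120), (1, 100, 60)]
b0, b1 = fit_logistic(groups)

# Likelihood ratio test of H0: b1 = 0 (intercept-only model), 1 df.
total_cases = sum(g[1] for g in groups)
total_controls = sum(g[2] for g in groups)
null_b0 = math.log(total_cases / total_controls)
lrt = 2 * (log_likelihood(groups, b0, b1) - log_likelihood(groups, null_b0, 0.0))

print(round(math.exp(b1), 2), round(lrt, 2))
```

With a single binary exposure the fitted exp(b1) reproduces the 2×2 odds ratio, which is a convenient check before moving on to models with covariates.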

Example of logistic regression model with genetic exposure and covariates

Slattery et al. IJC 2010

Assumptions for Validity

- Independence of all individuals
- Independent and identically distributed (iid)
- Reasonable sample sizes
- Contingency tables
- Expected values all > 1 and 80% > 5
- Logistic regression
- Minimum of 15-20 individuals per group
- If violated
- Simulate the null distribution for testing
- Permutation test
- e.g. Fisher’s exact test is an exhaustive permutation test
- Monte Carlo simulation
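A Monte Carlo permutation test can be sketched as follows: shuffle the case/control labels many times, recompute the statistic each time, and see how often the shuffled value is at least as extreme as the observed one. The statistic (difference in mean T-allele dose), the number of permutations, the seed, and the data (expanded from the earlier example counts) are all illustrative choices.

```python
import random

def dose_difference(doses, labels):
    """Mean T-allele dose in cases minus mean dose in controls."""
    case = [g for g, y in zip(doses, labels) if y == 1]
    ctrl = [g for g, y in zip(doses, labels) if y == 0]
    return sum(case) / len(case) - sum(ctrl) / len(ctrl)

def permutation_p_value(doses, labels, n_perm=2000, seed=1):
    """Two-sided Monte Carlo p-value from shuffled case/control labels."""
    rng = random.Random(seed)
    observed = abs(dose_difference(doses, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(dose_difference(doses, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Doses 0/1/2 = number of T alleles; labels 1 = case, 0 = control.
doses = [0] * 120 + [1] * 40 + [2] * 20 + [0] * 120 + [1] * 60 + [2] * 40
labels = [0] * 180 + [1] * 220
perm_p = permutation_p_value(doses, labels)
print(perm_p)
```

The same loop works for any statistic, which is what makes simulation attractive when the asymptotic assumptions above are in doubt.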

- Background (genetics concepts)

- Basic case-control association

- Complex case-control association

- Genome-wide association

Performing haplotype analyses

- Single locus
- We observe genotypes, so testing is straightforward counting into a contingency table

| | CC | CT | TT |
|---|---|---|---|
| Controls | | | |
| Cases | | | |

Performing haplotype analyses

- Multi-locus
- Haplotypes are not directly observed
- But can be estimated (EM/Bayesian…)
- For some individuals, the haplotype pair can be inferred unambiguously
- For many individuals it cannot
- “Phase uncertainty”
- All analyses of haplotypes must take into account the phase uncertainty in the data
- Otherwise, increase in type 1 errors

Haplotypes / Genotypes

Two-locus Haplotypes:

…AGCTAAACTGGATT…

…AGCCAAACTGGATT…

The haplotype pair must be C-G and C-G: UNAMBIGUOUS

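Estimating haplotypes

| Genotype, locus 1 | Genotype, locus 2 | Haplotypes |
|---|---|---|
| CC | GG | C-G & C-G |
| CC | GA | C-G & C-A |
| CC | AA | C-A & C-A |
| CT | GG | C-G & T-G |
| CT | GA | ? (C-G & T-A) or (C-A & T-G) ? |
| CT | AA | C-A & T-A |
| TT | GG | T-G & T-G |
| TT | GA | T-G & T-A |
| TT | AA | T-A & T-A |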
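The table can be generated mechanically by listing every way of splitting the alleles at each locus between the two chromosomes. A minimal sketch; the function name and the string representation of genotypes are illustrative.

```python
from itertools import product

def compatible_haplotype_pairs(genotypes):
    """genotypes: one 2-character string per locus, e.g. ["CT", "GA"].

    Returns the set of unordered haplotype pairs consistent with them.
    """
    pairs = set()
    # At each locus, choose which of the two alleles sits on chromosome 1;
    # the other allele goes on chromosome 2.
    for choice in product((0, 1), repeat=len(genotypes)):
        hap1 = "-".join(g[c] for g, c in zip(genotypes, choice))
        hap2 = "-".join(g[1 - c] for g, c in zip(genotypes, choice))
        pairs.add(tuple(sorted((hap1, hap2))))
    return pairs

print(compatible_haplotype_pairs(["CC", "GA"]))        # one pair: phase known
print(len(compatible_haplotype_pairs(["CT", "GA"])))   # 2: phase ambiguous
```

Only individuals heterozygous at two or more loci yield more than one pair, and the number of candidate pairs doubles with each additional heterozygous locus.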

Estimating haplotypes

- Expectation-maximization (EM) algorithm
- SNPHAP (Johnson et al 2001)
- GCHap (Thomas 2003)
- Bayesian MCMC approach
- PHASE (Stephens et al 2001)
- Both approaches assume independent individuals
- Use to estimate
- Population haplotype frequencies estimated from a set of individuals
- Most likely haplotype pair for each individual
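For two biallelic loci the EM idea fits in a short function, because only double heterozygotes have uncertain phase. This is a sketch under stated assumptions (independent individuals, Hardy-Weinberg proportions, a fixed number of iterations) and is not the algorithm of SNPHAP, GCHap, or PHASE. The genotype coding and the counts are hypothetical.

```python
def em_two_locus(genotype_counts, iterations=50):
    """Estimate two-locus haplotype frequencies by EM.

    genotype_counts maps (g1, g2) -> number of individuals, where g1 and g2
    count copies of the second allele (0, 1 or 2) at loci 1 and 2.
    Haplotypes are coded (a1, a2) with a1, a2 in {0, 1}.
    """
    haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
    freq = {h: 0.25 for h in haps}
    n_chrom = 2 * sum(genotype_counts.values())
    for _ in range(iterations):
        counts = {h: 0.0 for h in haps}
        for (g1, g2), n in genotype_counts.items():
            if g1 == 1 and g2 == 1:
                # E-step: split double heterozygotes between the two
                # possible phases in proportion to their current likelihood.
                cis = freq[(0, 0)] * freq[(1, 1)]
                trans = freq[(0, 1)] * freq[(1, 0)]
                w = cis / (cis + trans) if cis + trans > 0 else 0.5
                counts[(0, 0)] += n * w
                counts[(1, 1)] += n * w
                counts[(0, 1)] += n * (1 - w)
                counts[(1, 0)] += n * (1 - w)
            else:
                # Phase is unambiguous when at most one locus is heterozygous.
                counts[(g1 // 2, g2 // 2)] += n
                counts[((g1 + 1) // 2, (g2 + 1) // 2)] += n
        # M-step: haplotype frequencies are expected counts per chromosome.
        freq = {h: counts[h] / n_chrom for h in haps}
    return freq

# Hypothetical sample: 30 double homozygotes (0, 0), 20 double
# heterozygotes, 10 double homozygotes (2, 2).
freqs = em_two_locus({(0, 0): 30, (1, 1): 20, (2, 2): 10})
print({h: round(f, 3) for h, f in freqs.items()})
```

The E-step weights computed here are also the per-individual weights that the likelihood and sampling methods for phase uncertainty rely on.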

Traditional methods for phase uncertainty

- Likelihood based approach
- Each individual can have multiple different haplotype pairs that are consistent with the genotype data
- Some pairs of haplotypes are more or less likely than others
- Each pair is given a weight
- All possible haplotype pairs are considered in the case-control analysis
- weighted by their probabilities

Simulation methods for phase uncertainty

- Sample over the observed data
- Instead of weighting all the possible haplotype pairs for every individual and incorporating all at once into the analysis
- Sample one pair for each individual
- Randomly and in proportion to the weights, select a haplotype pair for each individual
- Perform the analysis as if those were observed
- Repeat 1,000 times…
- Average
- SIMHAP (McCaskie et al.)

Simulation methods for phase uncertainty

- Monte Carlo testing
- Simulate the null –matched to the real data
- Instead of weighting all the possible haplotype pairs for every individual and incorporating all at once into the analysis
- Assign each individual their most likely haplotype pair
- Cases and controls separately
- Simulate null haplotype data
- Null: Convert haplotypes to genotypes
- Null: Estimate haplotypes
- Null: Assign each individual their most likely haplotype pair
- Real and null are matched
- Test real data (with most likely haplotype pairs assigned) against the simulated null
- hapMC (Thomas et al.)

Exponential explosion… high dimensional data

- 1 SNP
- 2 alleles → 1 test
- 3 genotypes → 1+ tests
- 2 SNP loci
- 4 haplotypes
- 3 SNP loci
- 8 haplotypes
- 10 SNP loci
- 1024 haplotypes → many tests..

Multi-locus… but how many, and which loci to test?

- For example…20 tSNPs
- Only perform single SNP analyses?
- Perform tests on all 20-locus haplotypes?
- Group all ‘rare’ haplotypes together
- Cluster to reduce dimension
- Multi-locus tests with subsets of 20 SNPs?
- Subsets of which SNPs?

Data mining approach to haplotype construction – hapConstructor (Abo et al.)

- Automatically builds haplotypes (or composite genotypes)
- Non-contiguous SNPs
- In a case-control framework
- All SNP haplotypes are phased during 1st stage and used in all subset analyses
- Starts with each single SNP locus
- Forward-backward process driven by significance thresholds
- Significance and false discovery rates (p-values and q-values) reported for the building process
- Computationally challenging, potentially time intensive

Multilocus haplotype association using hapConstructor

Curtin et al. BMC Med Genet 2010

Meta-association in case-control studies

- Association: two characteristics occur more often together than expected by chance
- Disease
- Genetic variants
- Meta-Association: study of association across case-control data collected by multiple study sites (collaborative effort)
- NARAC: North American Rheumatoid Arthritis Consortium
- BCAC: Breast Cancer Association Consortium

VS. “Meta-analysis of individual level data from participants in a systematically ascertained group of studies” (Petitti definition)

Meta-analysis of multi-study case-control data: general concepts

- simple pooling – combine individual level data from multiple studies and compute association statistics
- fixed effects models – inference is conditional on the studies actually done
- in genetic association, assumes same genetic effect size across studies
- random effects models – inference is based on assuming studies in the analysis are a ‘random sample’ of hypothetical population of studies

Fixed effects models

- Methods and effect measures
- Mantel-Haenszel: Odds ratio; also rate, risk ratio
- well-known method for calculating summary estimate of effect across strata (i.e. multiple studies)
- Peto: Ratio (can approximate odds ratio)
- modification of M-H method
- General variance-based: Ratio (all types) and rate differences

Mantel-Haenszel method (fixed effects)

where i is the ith stratum (study)

Mantel-Haenszel method (fixed effects): summary odds ratio

where weight_i = 1/variance_i, the variance component of effect size within studies only

Mantel-Haenszel method (fixed effects): summary odds ratio

- Strengths
- Optimal statistical properties (uniformly most powerful test)
- When the M-H estimate OR = 1, the M-H chi-square = 0 (mathematical connection of effect with summary statistic)

- Widely available in statistical software
- Limitations
- Requires data to complete 2x2 table for all studies (potential exclusion bias)
- ignores confounding not taken into account by study design (i.e. age, sex-matched controls)
- could use logistic regression estimate of OR to simultaneously model confounding variables and to adjust for study site
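The summary formula is not preserved in this transcript. As an illustration, the sketch below uses the standard Mantel-Haenszel pooled odds ratio, the sum over studies of a_i d_i / n_i divided by the sum of b_i c_i / n_i, with a, b, c, d labelled as in the odds ratio table. The study counts are hypothetical.

```python
def mantel_haenszel_or(strata):
    """strata: list of (a, b, c, d), one 2x2 table per study, with
    a, b = exposed and unexposed cases; c, d = exposed and unexposed controls."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical 2x2 tables from three study sites.
studies = [
    (100, 120, 60, 120),
    (45, 80, 30, 90),
    (200, 310, 150, 340),
]
or_mh = mantel_haenszel_or(studies)
per_study = [a * d / (b * c) for a, b, c, d in studies]
print(round(or_mh, 2), [round(x, 2) for x in per_study])
```

The pooled estimate is a weighted average of the study-specific odds ratios with weights b_i c_i / n_i, so it always lies between the smallest and largest of them.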

CMH chi-square general association test of independence (fixed-effect method)

- Extension of Cochran-Mantel-Haenszel (CMH) test to sets of (X by Y) contingency tables (i.e. studies)
- Formulas for the CMH statistics are more easily defined in terms of matrices (Landis and Koch 1978)
- Assumes study strata are independent, and that the marginal totals of each stratum are fixed
- H0 : there is no association between X (disease status) and Y (genotype) in any of the strata
- corresponding model is the multiple hypergeometric

Heterogeneity

- If H0: homogeneity is rejected, studies are not measuring effect of the same size
- Tests of Heterogeneity
- Q test ~Chisq. with d.f.= #studies – 1
- Mantel-Haenszel method:
- Logistic regression: add a term for interaction between study and genotypes in model (test using Wald or Likelihood Ratio)
- When heterogeneity is not extreme, fixed- and random- effects models yield similar results

Random effects models

- Methods and effect measures
- DerSimonian-Laird (1986): Ratio (all types) and difference
- Bagos and Nikolopoulos (2007): Odds ratio
- study-specific coefficient in logistic regression model representing deviation of study i’s true genotypic effect to overall mean effect
- incorporates between-study component of variance, CI’s at least as wide (wider) than fixed effects
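The heterogeneity Q test and the DerSimonian-Laird adjustment can be sketched together. The assumptions here are that the effect measure is the log odds ratio, that its within-study variance is 1/a + 1/b + 1/c + 1/d, and that the fixed-effects estimate is the inverse-variance weighted mean (the general variance-based method, not Mantel-Haenszel). The study counts and function names are hypothetical.

```python
import math

def log_or_and_variance(a, b, c, d):
    """Log odds ratio and its approximate within-study variance."""
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def pooled_estimates(studies):
    effects = [log_or_and_variance(*s) for s in studies]
    w = [1 / v for _, v in effects]                     # fixed-effects weights
    fixed = sum(wi * y for wi, (y, _) in zip(w, effects)) / sum(w)

    # Cochran's Q: weighted squared deviations from the pooled estimate,
    # compared with a chi-square on (number of studies - 1) df.
    q = sum(wi * (y - fixed) ** 2 for wi, (y, _) in zip(w, effects))
    k = len(studies)

    # DerSimonian-Laird moment estimate of the between-study variance.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)

    w_re = [1 / (v + tau2) for _, v in effects]         # random-effects weights
    random_eff = sum(wi * y for wi, (y, _) in zip(w_re, effects)) / sum(w_re)
    return {
        "fixed_or": math.exp(fixed), "se_fixed": math.sqrt(1 / sum(w)),
        "random_or": math.exp(random_eff), "se_random": math.sqrt(1 / sum(w_re)),
        "Q": q, "df": k - 1, "tau2": tau2,
    }

studies = [(100, 120, 60, 120), (45, 80, 30, 90), (200, 310, 150, 340)]
result = pooled_estimates(studies)
print({key: round(value, 3) for key, value in result.items()})
```

Because tau2 is never negative, the random-effects standard error is at least as large as the fixed-effects one, which is why the random-effects confidence interval is at least as wide.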

Fixed- vs. Random- Assumptions

- analysis under fixed model addresses the question:

Was there a genotype-phenotype association in the consortium of case-control studies used in the meta analysis?

- under the random model, question:

Will there be a genotype-phenotype association “on average?”

Independent individuals

- If study cases and controls are independent (unrelated) individuals,

meta-association is straightforward...

Straightforward...

- Adjust for ‘study site’ in a logistic regression
- Use Cochran Mantel Haenszel (CMH) techniques, controlling for study
- CMH test of association
- CMH test of trend
- meta odds ratio estimate

Cox et al, Nature Genetics (2007)

- Test of Ho: no association included terms for genotype and BCAC study
- Trend test included 1 parameter for allele dose and a term for BCAC study
- Genotype-specific risks estimated as ORs using logistic regression with BCAC study as a covariate (fixed-effects)
- Tested heterogeneity between studies by comparing logistic regression models with and without a genotype x study interaction term
- Data also analyzed using a random-effects model, test for heterogeneity

Meta Association – Related individuals

- But what if some study individuals (cases or controls) are related in multi-study collaborations (sibships, trios, pedigrees, or a mix, in families)?

Meta-analysis of data from multiple sites is more difficult..

Genie to the rescue..

Genie overview

- Allen-Brady et al. (2006), Curtin et al. (2007)
- Simulation-based technique
- Monte Carlo approach
- Null distribution is simulated for the statistic of interest matching the pedigree structure
- Equivalent to an empirical version of the variance correction method with prior probabilities
- Flexible in type of statistic that can be analyzed
- Classical association statistics and effect measure (OR)
- Meta association statistics (fixed-effects approach)
- Dichotomous and quantitative traits

http://www-genepi.med.utah.edu/Genie/index.html

Genie: Empirical null

- Generate the empirical null
- Using appropriate allele frequencies perform a gene-drop through the pedigree
- Null genotypic configuration
- Calculate the statistic of interest using the null data ignoring relatedness
- Null statistic
- Repeat thousands of times
- Empirical estimate of the null distribution
- Assess the significance of the observed statistic by assessing where it lies in the null distribution

Creating the Simulated Null Distribution

Population allele frequencies

Assign alleles randomly to pedigree founders

Gene drop: simulated Mendelian inheritance

Repeat

Null Genotype Configuration

Calculate NULL statistic

Empirical Null Distribution
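The flow above can be sketched in a few functions. This is an illustration of the gene-drop idea only and is not Genie's implementation: the pedigree format, the statistic (difference in allele frequency between cases and controls), the observed value, and all names are hypothetical.

```python
import random

def gene_drop(pedigree, allele_freq, rng):
    """Simulate one null genotype configuration.

    pedigree: list of (person, father, mother), parents listed before their
    children; founders have father = mother = None.
    Returns {person: (allele, allele)} with alleles coded 0 or 1.
    """
    genotypes = {}
    for person, father, mother in pedigree:
        if father is None:
            # Founders draw both alleles from the population frequency.
            genotypes[person] = tuple(int(rng.random() < allele_freq)
                                      for _ in range(2))
        else:
            # Mendelian inheritance: one random allele from each parent.
            genotypes[person] = (rng.choice(genotypes[father]),
                                 rng.choice(genotypes[mother]))
    return genotypes

def allele_freq_difference(genotypes, cases, controls):
    """Statistic of interest, computed ignoring relatedness."""
    def freq(ids):
        return sum(sum(genotypes[i]) for i in ids) / (2 * len(ids))
    return freq(cases) - freq(controls)

def empirical_p_value(pedigree, cases, controls, observed, allele_freq,
                      n_sim=1000, seed=1):
    """Locate the observed statistic in the simulated null distribution."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        null_genotypes = gene_drop(pedigree, allele_freq, rng)
        null_stat = allele_freq_difference(null_genotypes, cases, controls)
        if abs(null_stat) >= abs(observed):
            exceed += 1
    return (exceed + 1) / (n_sim + 1)

# Hypothetical nuclear family: two founders and two affected children.
pedigree = [("dad", None, None), ("mom", None, None),
            ("kid1", "dad", "mom"), ("kid2", "dad", "mom")]
p_emp = empirical_p_value(pedigree, cases=["kid1", "kid2"],
                          controls=["dad", "mom"], observed=0.5,
                          allele_freq=0.3)
print(p_emp)
```

Because every simulated configuration respects the pedigree structure, the null distribution already reflects the correlation between relatives, so the statistic itself can be computed as if individuals were independent.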

Genie Meta-association

- Fixed effects approach – assumes same genetic effect size across studies
- Generalized CMH approach – chi-square general association test of independence

extension to >2x2 tables across multiple studies

- CMH chi-square test of trend – mean score statistic where ordered genotypes (i.e. genotypes aa, aA, and AA) lie on an ordinal scale
- Meta ORs – M-H common odds ratio estimate for 2x2 tables (CT vs CC, TT vs CC)
- 95% CI estimated empirically

Empirical 95% Confidence Interval

Distribution of OR estimates from 1,000 configurations in PedGenie null

Why Genie Meta-association?

- Ability to combine family-based and independent case-control resources and use all available data
- Genie software corrects for relationships in family-based resources; all family members with phenotype and genotype data can be included
- increases the utility of pedigrees previously ascertained for linkage and can provide increased power to detect associations..

..particularly in stratified and subset analyses that may lead to small sample sizes in individual studies

- needs a logistic regression framework (underway)

Association of XRCC2 tag-SNPs with CRC in 4-study meta analysis

(Curtin et al. CEBP 2009)

*Empirical Cochran-Mantel-Haenszel χ2 test for trend or recessive model based on 10,000 simulations.

Association of XRCC2rs3218499G>C with CRC in 4-study meta analysis

*Empirical Cochran-Mantel-Haenszel χ2 test for recessive model based on 10,000 simulations.

Genomewide (case-control) Association GWA: an approach to the study of common diseases

- Complex architecture
- Multiple genes likely involved
- Multiple environmental factors
- Individually low risks
- Argument that the underlying variants may be common and of modest effect..
- Common variants (>0.05, >0.01)
- Not under intense negative selection
- Agnostic.. no hypothesis
- Hypothesis generating vs. hypothesis driven (candidate gene or pathway)

GWA: What is required?

- Large set of SNPs
- Stringent significance thresholds
- ~5 × 10⁻⁸
- Large case-control sample size
- Example
- Allele frequency 0.15
- OR=1.25
- 80% power
- 6,000 cases and 6,000 controls
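The slide does not say how these numbers were derived. One rough way to check them is a normal-approximation power calculation for an allelic test. The assumptions in this sketch are mine: the allele frequency of 0.15 is taken to be the control frequency, the odds ratio applies per allele, the test is two-sided at 5 × 10⁻⁸, and every person contributes two independent alleles.

```python
import math
from statistics import NormalDist

def allelic_test_power(p_control, odds_ratio, n_cases, n_controls, alpha):
    """Approximate power of a two-sided test comparing allele frequencies."""
    # Case allele frequency implied by the per-allele odds ratio.
    odds_case = odds_ratio * p_control / (1 - p_control)
    p_case = odds_case / (1 + odds_case)

    m_case, m_ctrl = 2 * n_cases, 2 * n_controls      # alleles per group
    p_bar = (m_case * p_case + m_ctrl * p_control) / (m_case + m_ctrl)
    se_null = math.sqrt(p_bar * (1 - p_bar) * (1 / m_case + 1 / m_ctrl))
    se_alt = math.sqrt(p_case * (1 - p_case) / m_case
                       + p_control * (1 - p_control) / m_ctrl)

    z_crit = -NormalDist().inv_cdf(alpha / 2)
    # Probability of exceeding the critical value in the expected direction
    # (the opposite tail is negligible and ignored).
    z = (abs(p_case - p_control) - z_crit * se_null) / se_alt
    return NormalDist().cdf(z)

power = allelic_test_power(p_control=0.15, odds_ratio=1.25,
                           n_cases=6000, n_controls=6000, alpha=5e-8)
print(round(power, 2))
```

Under these assumptions the result lands in the neighbourhood of the 80% quoted in the example; other test choices or thresholds would shift it.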

Large set of SNPs

- Linkage-disequilibrium (LD)-based
- Genomewide tag-SNP set
- Made possible by HAPMAP
- 500,000-1,000,000 SNPs
- High-density arrays with 2 million SNPs
- Not optimal for rare variants…
- tag-SNP methods ignore them

Stringent significance thresholds

- Very few ‘hits’ per study
- 1,3,4,5 significant hits per genome using GWA
- If we don’t correct and use nominal 0.05
- In 500,000 markers
- Can expect 25,000 false positives
- Need to use a correction for multiple testing
- significance threshold of ~5 × 10⁻⁸ (Dudbridge & Koeleman ASHG 2004)
- Good… but not great
- But we’re expecting many more genes to be found… right?
- Less stringency and instead use replication?
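The arithmetic behind these numbers is short. In the sketch below, the expected number of false positives follows directly from the figures on the slide. The Bonferroni correction is brought in only as a familiar illustration, and the figure of one million tests is an assumption chosen because it gives a threshold of the same order as the one cited.

```python
import math

n_markers = 500_000
nominal_alpha = 0.05

# With no correction, each null marker has a 5% chance of a false positive.
expected_false_positives = n_markers * nominal_alpha

# Bonferroni: divide the family-wise error rate by the number of tests.
assumed_independent_tests = 1_000_000
bonferroni_threshold = nominal_alpha / assumed_independent_tests

print(round(expected_false_positives), bonferroni_threshold)
```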

Multistage strategies in GWA

Hirschhorn & Daly Nature Reviews 2005

Interactions

- An increased (or decreased) effect of one exposure given another.
- Gene-environment interaction
- Risk (genotype AA / no smoke) = 4
- Risk (genotype AA / smoke)= 6
- Gene-gene interaction
- Epistasis
- Risk (genotype AA / genotype bb) = 4
- Risk (genotype AA / genotype Bb,BB) = 6

Statistical Interactions

- Multiplicative model
- Most commonly used
- Natural to a risk framework
- Logistic regression
- Independent loci
- multiply risk OR11=OR10×OR01
- Interaction
- OR11≠OR10×OR01

Multiplicative model

- Multiplicative risk for alleles at each locus
- First locus
- aa 1.00
- aA 2.20
- AA 4.84 (= 2.20²)
- Second locus
- bb 1.00
- bB 1.50
- BB 2.25 (= 1.50²)

Risk: Two Independent Loci (multiplicative)

OR11 = OR10 × OR01

Statistical Interactions

- Additive model
- Less popular
- Independent loci
- Add risks
- OR11= 1 + (OR10-1) + (OR01-1)
- Interaction
- OR11≠ 1+ (OR10-1) + (OR01-1)

Additive model

- Additive risk for alleles for each single locus
- First locus
- aa 1.00
- aA 2.20
- AA 3.40 (= 2 × 2.20 − 1)
- Second locus
- bb 1.00
- bB 1.50
- BB 2.00 (= 2 × 1.50 − 1)

Risk: Two Independent Loci (additive)

OR11=1+(OR10-1)+(OR01-1)
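The joint risk tables for two independent loci follow from the single-locus risks listed in the multiplicative and additive slides. A minimal sketch that builds both 3×3 tables; the function and variable names are illustrative.

```python
# Single-locus risks as listed for the multiplicative and additive models.
first_mult = {"aa": 1.00, "aA": 2.20, "AA": 4.84}
second_mult = {"bb": 1.00, "bB": 1.50, "BB": 2.25}
first_add = {"aa": 1.00, "aA": 2.20, "AA": 3.40}
second_add = {"bb": 1.00, "bB": 1.50, "BB": 2.00}

def joint_risks(first, second, model):
    """Joint risk for every two-locus genotype, assuming no interaction."""
    table = {}
    for g1, r1 in first.items():
        for g2, r2 in second.items():
            if model == "multiplicative":
                table[(g1, g2)] = r1 * r2                  # OR11 = OR10 x OR01
            else:
                table[(g1, g2)] = 1 + (r1 - 1) + (r2 - 1)  # excess risks add
    return table

mult = joint_risks(first_mult, second_mult, "multiplicative")
add = joint_risks(first_add, second_add, "additive")
for g1 in first_mult:
    print(g1, [round(mult[(g1, g2)], 2) for g2 in second_mult],
          [round(add[(g1, g2)], 2) for g2 in second_add])
```

Any observed joint risk that departs from the relevant table is what the corresponding model calls an interaction, so the same data can show interaction on one scale and none on the other.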

No main effects???

- No main effects
- Only interaction effects
- Problem:
- In a stepwise procedure, if you aren’t able to identify the main effects, then how do you know to test the interaction?
- HOWEVER… Thus far, no biological model has been put forth that supports the lack of main effects

Case-control design: ORs

- Testing in the Odds Ratio framework
- H0: OR11=OR10×OR01
- H0: IOR11=1.0

Case-control design: ORs

- IOR11 = OR11 / (OR10 × OR01)

- Under the null, IOR11 = 1
- Can do several IORs
- 11, 12, 21 and 22
- Can construct confidence intervals to test for a significant interaction
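As a small illustration, the interaction odds ratio can be computed from case and control counts in the four exposure groups defined by two binary factors. The counts are hypothetical, and the reference group is assumed to be the one exposed at neither locus.

```python
def interaction_odds_ratio(counts):
    """counts maps (g1, g2) -> (n_cases, n_controls), with g1, g2 in {0, 1}
    flagging exposure at each locus; (0, 0) is the reference group."""
    ref_cases, ref_controls = counts[(0, 0)]

    def odds_ratio(key):
        n_cases, n_controls = counts[key]
        return (n_cases * ref_controls) / (n_controls * ref_cases)

    or10, or01, or11 = odds_ratio((1, 0)), odds_ratio((0, 1)), odds_ratio((1, 1))
    return or11 / (or10 * or01)

# Hypothetical counts in which the joint effect is exactly multiplicative.
no_interaction = {(0, 0): (100, 100), (1, 0): (200, 100),
                  (0, 1): (150, 100), (1, 1): (300, 100)}
# Same single-locus effects, but a doubled joint effect.
with_interaction = dict(no_interaction)
with_interaction[(1, 1)] = (600, 100)

print(interaction_odds_ratio(no_interaction),
      interaction_odds_ratio(with_interaction))  # 1.0 2.0
```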

Case-control design: logistic regression

logit P(Y=1 | G1,G2) = α + β1G1 + β2G2 + β3G1×G2

- Parameter β3 is an estimator for ln(IOR) under a multiplicative model
- G1 and G2 can be modeled several ways
- Dominant
- Recessive
- Additive
- 3 levels

Methods: MDR

- Multifactor-Dimensionality Reduction (MDR)
- Ritchie et al (2001) Am J Hum Genet
- Combinatorial partitioning
- Data mining
- http://www.epistasis.org/software.html

MDR

- Divide sample into 10 equal partitions
- Model on 9/10 (1…9)
- Test on 1/10 of data (10)
- Repeat 10 times and average the misclassification
- Pick n loci from the total N SNPs
- Exhaustively assess all combinations
- All cells cases>controls (high-risk)
- All cells cases<controls (low-risk)
- Group
- Repeat for all possible n of N
- May be too many… doesn’t scale well

Machine learning

- Machine learning
- Classification trees (e.g. CART)
- Greedy algorithms
- Not optimal
- Cook et al (2004) Stat Med
- Artificial Neural Networks (ANNs)
- GPNN software
- Motsinger et al (2006) BMC Bioinformatics
- Support Vector Machine Approach
- Combinatorial optimization techniques
- Local search
- Genetic algorithms
- Weng et al (2007) Genet Epidemiol

Other approaches

- Logistic regression framework
- tagSNPs and powerful models for epistasis
- Chapman and Clayton (2007) Genet Epidemiol

- Case-control
- Haplotype interactions
- FAMHAP
- Becker et al (2005) Genet Epidemiol

..questions ?

Quantitative traits

- Simple comparisons
- 2 groups (e.g. alleles, dominant)
- Normal test: large sample sizes
- T-test: small sample sizes
- Mann-Whitney: non-parametric
- >2 groups (e.g. genotypes)
- ANOVA (F-test)
- Kruskal-Wallis: non-parametric
- Including Covariates
- Linear regression: y = α + βx
- Again, need to model genetic exposure

Family-Based Methods

- Parent-Offspring Trios
- Haplotype Relative Risk (HRR)
- Transmission/Disequilibrium Test (TDT)
- Quantitative TDT (QTDT)
- Generalized Estimating Equations (GEE)
- Nuclear Families
- Sibling TDT (STDT)
- FBAT
- QTDT
- GEEs

Family-Based Methods

- General Pedigrees (small to moderate size)
- PDT
- FBAT
- QTDT
- Variance correction (posterior probability)
- CCREL
- Extended Pedigrees
- Variance correction (prior probability)
- Quasi-Likelihood Score (QLS)
- PedGenie

Transmission/Disequilibrium Test (TDT)

- Transmission method
- Spielman et al (1993)
- Trio method
- Requires genotype data on all three individuals
- The statistic considers only {parent, affected-offspring} pairs from the trio for which the parent is heterozygous
- Compare the number of times each of the different alleles is transmitted to the affected offspring
- Is there evidence for preferential transmission of one allele over the other?
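The comparison of transmission counts is usually carried out with a McNemar-type chi-square on 1 df, which is the standard form of the TDT statistic even though the formula is not shown in this transcript. A minimal sketch with hypothetical counts; the closed-form p-value uses the identity between the 1 df chi-square tail and the complementary error function.

```python
import math

def tdt(transmitted_c, transmitted_t):
    """TDT from heterozygous (CT) parents of affected offspring.

    transmitted_c: number of heterozygous parents transmitting C
    transmitted_t: number of heterozygous parents transmitting T
    """
    stat = (transmitted_c - transmitted_t) ** 2 / (transmitted_c + transmitted_t)
    # Survival function of a chi-square with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical data: 60 transmissions of C and 40 of T.
tdt_stat, tdt_p = tdt(60, 40)
print(tdt_stat, round(tdt_p, 4))
```

Homozygous parents contribute nothing because they transmit the same allele regardless, which is why only heterozygous parents are counted.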

TDT: Validity

- H0: δ(1-2θ) = 0, where δ is the linkage disequilibrium and θ the recombination fraction
- A test for both association and linkage
- Robust to stratification

TDT

Two heterozygous parents (CT and CT) with offspring CT:

- One parent transmits C to offspring
- Other parent transmits T to offspring
