Statistical Analysis in Case-Control Studies
Summer International Workshop, Aug, 09, Beijing
Liu Tian, Genome Institute of Singapore, firstname.lastname@example.org
Outline: Introduction; Basic Statistical Methods of Case-Control Study; GWAS; A Novel Epistasis-Testing Procedure
Dramatic variation exists within a single species
Almost every biological phenomenon involves a genetic component
There is a keen need to identify the genetic variation related to complex traits.
A cohort study is a study in which a group of individuals is followed over time.
Cohort studies can be either prospective or retrospective
Case-control studies are used to identify factors that may contribute to a medical condition by comparing subjects who have that condition (the ‘cases’) with patients who do not have the condition but are otherwise similar (the ‘controls’)
Case-control studies are retrospective and non-randomized
Population-based cases: include all subjects or a random sample of all subjects with the disease at a single point or during a given period of time in the defined population.
All patients in a hospital department at a given time
Principles of Control Selection:
Study base: Controls can be used to characterise the distribution of exposure
Comparable-accuracy: Equal reliability in the information obtained from cases and controls (to avoid systematic misclassification)
Overcome confounding: Elimination of confounding through control selection (matching or stratified sampling)
General population controls:
registries, households, telephone sampling
costly and time consuming
potentially high non-response rate
Patients at the same hospital as the cases
Easy to identify; less recall bias; higher response rate
Not rare diseases
Prospective: Expensive and time consuming
Retrospective: inadequate records
Validity can be affected by losses to follow-up
Individuals are unrelated
To test if marker genotypes distribute differently between the cases and controls
By comparing cases with controls, we identify genetic factors correlated with a pre-defined phenotype
For one marker with two alleles, there can be three possible genotypes: AA, Aa, and aa
Hypothesis: all 3 different genotypes have different effects
Genotypic value is the expected phenotypic value of a particular genotype
AA vs. Aa vs. aa
Hypothesis: the genetic effects of AA and Aa are the same (assuming A is the minor allele)
AA and Aa vs. aa
AA vs. Aa and aa
Hypothesis: the genetic effects of allele A and allele a are different
A vs. a
- Genotypic test (AA vs. Aa vs. aa): df = 2
- Dominant gene action (AA and Aa vs. aa): df = 1
- Recessive gene action (AA vs. Aa and aa): df = 1
- Allelic test (A vs. a): df = 1
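The four tests above differ only in how the three genotype counts are collapsed before a Pearson chi-square test is applied. The following is a minimal sketch of that idea, not the workshop's own code: the function names and the genotype counts are hypothetical, and the p-values use the closed-form chi-square tail probabilities that exist for df = 1 and df = 2.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail chi-square probability (closed forms exist for df = 1 and 2)."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df = 1 or 2")


def pearson_chi2(table):
    """Pearson chi-square statistic and df for a 2 x k table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Raises ZeroDivisionError if a row or column total is 0.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, (len(table) - 1) * (len(col_totals) - 1)


def collapse(counts, model):
    """Collapse (n_AA, n_Aa, n_aa) into the columns compared by each test."""
    n_AA, n_Aa, n_aa = counts
    if model == "genotypic":   # AA vs. Aa vs. aa
        return [n_AA, n_Aa, n_aa]
    if model == "dominant":    # AA and Aa vs. aa
        return [n_AA + n_Aa, n_aa]
    if model == "recessive":   # AA vs. Aa and aa
        return [n_AA, n_Aa + n_aa]
    if model == "allelic":     # A vs. a, counting alleles instead of people
        return [2 * n_AA + n_Aa, 2 * n_aa + n_Aa]
    raise ValueError(model)


def association_tests(cases, controls):
    """Return {model: (chi2, df, p_value)} for the four tests."""
    results = {}
    for model in ("genotypic", "dominant", "recessive", "allelic"):
        table = [collapse(cases, model), collapse(controls, model)]
        stat, df = pearson_chi2(table)
        results[model] = (stat, df, chi2_sf(stat, df))
    return results


# Hypothetical genotype counts (n_AA, n_Aa, n_aa), for illustration only.
results = association_tests(cases=(30, 50, 20), controls=(20, 50, 30))
for model, (stat, df, p) in results.items():
    print(f"{model:10s} chi2 = {stat:.3f}  df = {df}  p = {p:.4f}")
```

Note that the allelic test counts chromosomes rather than individuals, so its table total is twice the sample size.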
A General Model: logit(pdisease) = log(pdisease / (1 − pdisease)) = β0 + β1X1 + β2X2 + … + βJXJ
pdisease is the probability that an individual has a particular disease.
β0 is the intercept
β1, β2 …βJ are the effects of genetic factors
X1, X2 …XJ are the dummy variables of genetic factors
Logistic regression describes the relationship between a dichotomous response variable and a set of explanatory variables.
The logit model is the only model under which the effect parameter β can be estimated from a retrospective study in the same way as from a prospective study.
If the sampling rate for cases is 10 times that for controls, the estimated intercept is log(10) ≈ 2.3 larger than the one estimated from a prospective study.
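The intercept shift can be checked numerically. The sketch below assumes a hypothetical prospective logit model (the values of `beta0`, `beta1` and the two sampling rates are made up) and applies Bayes' rule to get the disease probability among sampled subjects. The logit moves by log(case rate / control rate) at every value of x, while the slope is unchanged.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical prospective model: logit(p) = beta0 + beta1 * x
beta0, beta1 = -3.0, 0.7
# Hypothetical sampling rates: cases are sampled 10 times as often as controls.
case_rate, control_rate = 0.5, 0.05


def sampled_logit(x):
    """Logit of P(disease | x, subject was sampled), obtained by Bayes' rule."""
    p = expit(beta0 + beta1 * x)
    p_sampled = case_rate * p / (case_rate * p + control_rate * (1.0 - p))
    return logit(p_sampled)


for x in (0, 1, 2):
    shift = sampled_logit(x) - (beta0 + beta1 * x)
    print(f"x = {x}: intercept shift = {shift:.4f}")

# The slope recovered from the sampled population equals the prospective slope.
slope_sampled = sampled_logit(1) - sampled_logit(0)
print(f"slope in sampled data = {slope_sampled:.4f}, log(10) = {math.log(10):.4f}")
```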
Significance tests focus on: H0: βi = 0
The estimator exp(β̂i) is the estimated odds ratio for genetic factor i.
The sign of β̂i determines whether pdisease is increasing or decreasing when the effect of genetic factor i exists.
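For a single binary genetic factor, the link between the coefficient and the odds ratio can be made concrete: the maximum-likelihood logistic fit to a 2 by 2 table reproduces the observed log odds in each group, so exp(β1) equals the familiar cross-product odds ratio. A small sketch with hypothetical counts and function names:

```python
import math


def odds_ratio(case_carrier, case_noncarrier, control_carrier, control_noncarrier):
    """Cross-product odds ratio of a 2 x 2 case-control table."""
    return (case_carrier * control_noncarrier) / (case_noncarrier * control_carrier)


def logit_coefficients(case_carrier, case_noncarrier, control_carrier, control_noncarrier):
    """Coefficients of logit(p) = beta0 + beta1 * X for one binary X.

    With a single binary predictor the fitted model matches the observed
    log odds of disease in each group exactly.
    """
    beta0 = math.log(case_noncarrier / control_noncarrier)       # log odds when X = 0
    beta1 = math.log(case_carrier / control_carrier) - beta0     # change when X = 1
    return beta0, beta1


# Hypothetical counts: carriers vs. non-carriers among cases and controls.
counts = dict(case_carrier=80, case_noncarrier=20,
              control_carrier=70, control_noncarrier=30)

or_hat = odds_ratio(**counts)
beta0_hat, beta1_hat = logit_coefficients(**counts)
direction = "increases" if beta1_hat > 0 else "decreases"
print(f"odds ratio = {or_hat:.3f}, exp(beta1) = {math.exp(beta1_hat):.3f}")
print(f"beta1 = {beta1_hat:.3f}: disease probability {direction} for carriers")
```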
Fisher’s Exact Test:
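When the sample size is small, the asymptotic approximation of the null distribution is no longer valid. By performing Fisher's exact test, the exact significance of the deviation from a null hypothesis can be calculated.
For a 2 by 2 table with cell counts a, b, c, d and total n = a + b + c + d, the probability of the observed table under the null hypothesis, with the margins held fixed, is p = (a+b)! (c+d)! (a+c)! (b+d)! / (a! b! c! d! n!). The exact p-value is the sum of this probability over all tables with the same margins that are at least as extreme as the observed one.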
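As an illustration, the exact calculation can be sketched in a few lines. The cell names a, b, c, d and the function names are assumptions made for this example, and the two-sided rule used here (summing every table that is no more probable than the observed one) is one common convention, not necessarily the one used in the workshop.

```python
from math import comb


def table_probability(a, b, c, d):
    """Hypergeometric probability of the table [[a, b], [c, d]] with fixed margins."""
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided exact p-value: sum over tables no more probable than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    p_observed = table_probability(a, b, c, d)
    p_value = 0.0
    # x is the top-left cell; the fixed margins determine the other three cells.
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_probability(x, row1 - x, col1 - x, row2 - (col1 - x))
        if p <= p_observed * (1.0 + 1e-9):  # small tolerance for float ties
            p_value += p
    return min(p_value, 1.0)


# Hypothetical small table: rows are cases/controls, columns are allele A/a.
p_small = fisher_exact_two_sided(3, 1, 1, 3)
print(f"p = {p_small:.4f}")
```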
-- An advantage of the Cochran-Armitage test is that it does not assume Hardy-Weinberg equilibrium
-- Typically used to test a 2 × k contingency table, when the effects of AA, Aa, and aa are thought to be ordered.
-- In genome-wide association studies, the additive (or codominant) version of the test is often used.
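A minimal sketch of the additive Cochran-Armitage trend test follows. It assumes genotype weights (0, 1, 2) and hypothetical counts, and uses the standard closed form of the statistic, which is referred to a chi-square distribution with 1 df. The function name and arguments are illustrative only.

```python
import math


def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Trend test for a 2 x k table; returns (chi2, p_value) with 1 df.

    cases, controls: counts per genotype column, in the same order as weights.
    """
    col_totals = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls

    sum_t_r = sum(t * r for t, r in zip(weights, cases))
    sum_t_n = sum(t * c for t, c in zip(weights, col_totals))
    sum_t2_n = sum(t * t * c for t, c in zip(weights, col_totals))

    # chi2 = N * (N*sum(t*r) - R*sum(t*n))^2 / (R * S * (N*sum(t^2*n) - sum(t*n)^2))
    numerator = n * (n * sum_t_r - n_cases * sum_t_n) ** 2
    denominator = n_cases * n_controls * (n * sum_t2_n - sum_t_n ** 2)
    chi2 = numerator / denominator
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square upper tail, df = 1
    return chi2, p_value


# Hypothetical counts in the order (AA, Aa, aa).
trend_chi2, trend_p = cochran_armitage_trend(cases=(30, 50, 20), controls=(20, 50, 30))
print(f"trend chi2 = {trend_chi2:.3f}, p = {trend_p:.4f}")
```

Changing `weights` to (0, 1, 1) or (0, 0, 1) gives versions of the test in which the two higher-weight or the two lower-weight genotype columns are treated alike.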
-- Affymetrix: 500K; 1M chip arrives in early 2007.
-- Illumina: 550K chip costs (gene-based)
Single Nucleotide Polymorphism
-- Observed genotype proportions are not consistent with those expected from the observed allele frequencies
-- HWE p-value < 10^-4 to 10^-6
-- Sample genotype success rate < 95 to 97.5%
-- Greater proportion of heterozygous genotypes than expected
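The Hardy-Weinberg check in the list above can be sketched as a 1-df chi-square goodness-of-fit test. This is an illustrative version only: the function name, the counts, and the choice of 10^-4 as the cut-off (the upper end of the range quoted above) are assumptions, and GWAS pipelines often use an exact test instead.

```python
import math


def hwe_chi2_test(n_AA, n_Aa, n_aa):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (df = 1).

    Returns (chi2, p_value, expected_counts).
    Raises ZeroDivisionError for a monomorphic marker (an expected count of 0).
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square upper tail, df = 1
    return chi2, p_value, expected


HWE_THRESHOLD = 1e-4  # assumed cut-off for this example

# Hypothetical genotype counts showing a strong heterozygote deficit.
chi2_bad, p_bad, expected_bad = hwe_chi2_test(40, 20, 40)
print(f"chi2 = {chi2_bad:.1f}, p = {p_bad:.2e}, fails QC: {p_bad < HWE_THRESHOLD}")

# Hypothetical counts that match HWE proportions exactly.
chi2_ok, p_ok, _ = hwe_chi2_test(25, 50, 25)
print(f"chi2 = {chi2_ok:.1f}, p = {p_ok:.2f}, fails QC: {p_ok < HWE_THRESHOLD}")
```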
-- Based on pair-wise comparisons of similarity of genotypes
-- principal component analysis (Price's method)
-- cluster analyses (PLINK)
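One simple pairwise similarity measure that such analyses can start from is identity-by-state (IBS) sharing. The sketch below is an assumption-laden illustration rather than the method of any particular tool: genotypes are coded as the number of copies of one allele (0, 1, 2), missing calls are `None`, and the function names are hypothetical.

```python
def ibs_similarity(genotypes_1, genotypes_2):
    """Mean proportion of alleles shared identical-by-state by two individuals.

    Each genotype is 0, 1 or 2 (allele count) or None if the call is missing.
    """
    shared_alleles = 0
    markers_compared = 0
    for g1, g2 in zip(genotypes_1, genotypes_2):
        if g1 is None or g2 is None:
            continue  # skip markers missing in either individual
        shared_alleles += 2 - abs(g1 - g2)  # 2, 1 or 0 alleles shared
        markers_compared += 1
    if markers_compared == 0:
        raise ValueError("no marker genotyped in both individuals")
    return shared_alleles / (2 * markers_compared)


def ibs_matrix(samples):
    """Symmetric matrix of pairwise IBS similarities."""
    n = len(samples)
    return [[ibs_similarity(samples[i], samples[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical genotypes for three individuals at four markers.
samples = [
    [0, 1, 2, 2],
    [0, 1, 2, None],
    [2, 1, 0, 0],
]
matrix = ibs_matrix(samples)
for row in matrix:
    print(" ".join(f"{value:.2f}" for value in row))
```

A matrix like this can then be fed to a clustering or principal component routine to look for groups of unusually similar samples.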
-- Case/control, TDT, quantitative traits
-- LD and haplotype block analysis
-- tag SNP selection algorithm
-- visualization and plotting GWAS results from PLINK