
Associating Genomic Variations with Phenotypes

Associating Genomic Variations with Phenotypes: model comparison, rare variants, and analysis pipeline. Qunyuan Zhang, Division of Statistical Genomics & Genome Institute, Washington University School of Medicine.


Presentation Transcript


  1. Associating Genomic Variations with Phenotypes: Model comparison, rare variants, and analysis pipeline. Qunyuan Zhang, Division of Statistical Genomics & Genome Institute, Washington University School of Medicine

  2. Data & Question. Genotypes (X): SNP, insertion, deletion, duplication, inversion, translocation, … Phenotypes (Y): quantitative, categorical. Question: what is the relationship between X and Y?

  3. Linkage & Association. Association: (Y, X), phenotype and genotypes. Linkage: (Y, Q), where Q is the putative QTL and is unobservable. [Diagram labels: r1, Q, r2]

  4. A Fixed-effect Mixture Model For Linkage. Commonly used in plant genetics. Cross: P1 x P2, then F1, then F2. [Diagram labels: SNP A, SNP B; r1, Q, r2]

  5. A Variance-component Model For Linkage. Commonly used in human genetics. Components: QTL IBD matrix, background IBD matrix, diagonal unit matrix. [Diagram labels: SNP A, SNP B; r1, Q, r2]

  6. Variance-component Model = Random-effect Linear Model (random effects)

  7. From Linkage to Association. Linkage model: QTL effect(s). Family-based association model: marker effect(s) as fixed effect(s)

  8. A Simple Association Model For Unrelated Subjects
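The model equation on this slide did not survive in the transcript. As a minimal sketch, assuming the usual single-marker regression y = b0 + b1*x + e with additive genotype coding (x = 0, 1, or 2 copies of an allele), a test of H0: b1 = 0 could look like the following. The function name and the toy data are hypothetical and not taken from the slides.

```python
from math import sqrt

def simple_association(genotypes, phenotypes):
    """Least-squares fit of y = b0 + b1 * x for one marker.

    Returns the intercept, the marker effect, and the t statistic for b1.
    Additive 0/1/2 coding is an assumption of this sketch.
    """
    n = len(genotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(genotypes, phenotypes))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Residual sum of squares, used for the standard error of b1.
    rss = sum((y - b0 - b1 * x) ** 2
              for x, y in zip(genotypes, phenotypes))
    se_b1 = sqrt(rss / (n - 2) / sxx)
    return b0, b1, b1 / se_b1

# Toy data: six unrelated subjects, one marker, one quantitative phenotype.
genotypes = [0, 0, 1, 1, 2, 2]
phenotypes = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
b0, b1, t_stat = simple_association(genotypes, phenotypes)
```

The t statistic would be compared against a t distribution with n - 2 degrees of freedom; in practice the tools listed on the Tools slide (for example `lm()` in R or PROC REG in SAS) do this directly.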

  9. Covariate(s): Adjusting For Confounder(s). Observed confounders: age, sex, etc. Hidden confounders: population structure. Population structure can be estimated by: • PCA • Clustering • Admixture/ancestry

  10. Modeling Hidden Genetic Correlation Between Subjects. Model terms: marker fixed effect(s), covariate fixed effect(s), genetic background random effects. Family data, pedigree => IBD matrix. Population data, hidden, marker data => IBS matrix
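For population data, the slide derives an IBS matrix from marker data. A minimal sketch of one common definition (the proportion of alleles shared identical-by-state, averaged over markers) is shown below. The 0/1/2 genotype coding, the function name, and the toy data are assumptions made for illustration, and heterozygote pairs are counted as sharing two alleles by convention.

```python
def ibs_matrix(genotypes):
    """Pairwise IBS similarity between subjects.

    genotypes[i][m] is the allele count (0, 1, or 2) of subject i at marker m.
    For each marker, two subjects share 2 - |a - b| alleles identical-by-state.
    """
    n = len(genotypes)
    n_markers = len(genotypes[0])
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            shared = sum(2 - abs(a - b)
                         for a, b in zip(genotypes[i], genotypes[j]))
            matrix[i][j] = matrix[j][i] = shared / (2 * n_markers)
    return matrix

# Toy data: three subjects genotyped at four markers.
toy_genotypes = [
    [0, 1, 2, 1],
    [0, 1, 2, 1],
    [2, 1, 0, 1],
]
ibs = ibs_matrix(toy_genotypes)
```

A matrix like this would play the role of the covariance structure for the genetic background random effects; programs such as EMMA/EMMAX listed on the Tools slide compute their own kinship estimates, which may differ from this simple version.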

  11. Modeling Rare Variants. Common variants: tested individually, H0: β1 = 0; one p-value per variant. Rare variants: tested as an entire group (burden test), usually by gene, H0: β1 = β2 = … = βk = 0; one p-value per group of variants. • Can be combined with variable selection, using loose criteria • β can be treated as random effects (variance-components test) and can be weighted by prior information

  12. Collapsing Model: collapsing multiple variables into one
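The slide's formula is not in the transcript. One way to collapse multiple variants into one variable, sketched here under the assumption of an indicator coding (1 if a subject carries at least one rare allele in the group, otherwise 0), is shown below. The function name and data are hypothetical.

```python
def collapse(genotypes):
    """Collapse k rare variants into a single carrier indicator per subject.

    genotypes[i][m] is the rare-allele count of subject i at variant m.
    """
    return [1 if any(count > 0 for count in subject) else 0
            for subject in genotypes]

# Toy data: three subjects, three rare variants in one gene.
rare_genotypes = [
    [0, 0, 0],   # carries no rare allele
    [0, 1, 0],   # carries one rare allele
    [1, 0, 2],   # carries rare alleles at two variants
]
collapsed = collapse(rare_genotypes)
```

The collapsed variable can then be tested like a single marker, giving one p-value per group of variants.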

  13. Weighted Sum Model: weighted sum score
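As a minimal sketch of a weighted sum score, assuming each subject's score is the sum over variants of weight times allele count, the code below uses a frequency-based weight 1/sqrt(q(1-q)) so that rarer alleles count more. That particular weight, the add-one smoothing of q, and all names are assumptions chosen for illustration, not necessarily what the slide showed.

```python
from math import sqrt

def frequency_weights(genotypes):
    """One weight per variant, larger for rarer alleles."""
    n = len(genotypes)
    weights = []
    for m in range(len(genotypes[0])):
        # Add-one smoothing keeps the frequency strictly between 0 and 1.
        q = (sum(subject[m] for subject in genotypes) + 1) / (2 * n + 2)
        weights.append(1 / sqrt(q * (1 - q)))
    return weights

def weighted_sum_scores(genotypes, weights):
    """Score for each subject: sum of weight * allele count over variants."""
    return [sum(w * g for w, g in zip(weights, subject))
            for subject in genotypes]

# Toy data: five subjects, two variants; the first variant is rarer.
ws_genotypes = [[1, 0], [0, 0], [0, 1], [0, 1], [0, 0]]
weights = frequency_weights(ws_genotypes)
scores = weighted_sum_scores(ws_genotypes, weights)
```

With all weights set to 1 the score reduces to a plain count of rare alleles per subject.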

  14. Weighting Variants • Based on allele frequency: continuous or binary (0,1) weight, variable threshold; • Based on functional annotation/prediction; • Based on sequencing quality (coverage, mapping quality, genotyping quality, validated or not, etc.); • Data-driven: using both genotype and phenotype data, learning weights (including effect directions) from data, requiring a permutation test; • Any combination … Grouping Variants • By gene • By transcript • By exon • By gene set / pathway • By protein domain • ……

  15. Modeling More Data Types: Generalized Linear (Mixed) Model. Link function. For binary Y, logistic model
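The equations for this slide are missing from the transcript. In standard notation, a generalized linear mixed model and its logistic special case can be written as below; the symbols Z, u, and G for the random-effect design matrix, the random effects, and their covariance are assumptions of this sketch rather than the slide's own notation.

```latex
% Generalized linear (mixed) model with link function g
g\big(\mathrm{E}[Y \mid X, u]\big) = X\beta + Zu, \qquad u \sim N(0, G)

% Binary Y: logit link (logistic model), shown without random effects
\operatorname{logit}(p) = \log\frac{p}{1-p} = \beta_0 + \beta_1 X,
\qquad p = \Pr(Y = 1 \mid X)
```

Dropping the Zu term gives the ordinary generalized linear model; taking g as the identity recovers the linear (mixed) models of the earlier slides.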

  16. Longitudinal Data (quantitative). Time: • Fixed effect, time as covariate • Repeated measures, random effect, correlation within subjects

  17. Longitudinal Data (binary). Time: • Linear model, time as covariate • Survival analysis, CoxPH model, etc.

  18. Tools • SAS Procedures: REG, LOGISTIC, GENMOD, MIXED, HPMIXED, GLIMMIX, PHREG/LIFETEST • R Functions/Packages: lm(), glm(); gee, nlme, kinship2/coxme, lme4, survival • Other Programs: SOLAR, MMAP, EMMA, EMMAX, SKAT

  19. Pipeline. Input (data + options) goes to the job generating/submitting module, which works with the job number controlling module to submit job 1, job 2, …, job N through LSF bsub. For each job i, Options.jobi => self-programmed modules (SAS, R, …) or external program modules (MMAP, SKAT, …), producing Result 1, Result 2, …, Result N. The job status monitoring module checks whether all jobs are done: if no, wait …; if yes, run the result summarizing module.
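The control flow of this pipeline (submit jobs under a concurrency cap, wait until all are done, then summarize) can be sketched as follows. This is not the author's implementation: a local thread pool stands in for LSF `bsub`, and `run_job`, `run_pipeline`, and the option dictionary are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_job(job_id, options):
    """Stand-in for one analysis job (hypothetical).

    A real pipeline would hand the job to the cluster scheduler instead.
    """
    return {"job": job_id, "program": options["program"], "status": "done"}

def run_pipeline(n_jobs, options, max_jobs):
    """Submit n_jobs jobs, at most max_jobs at a time, then summarize."""
    results = {}
    # max_workers plays the role of the job number controlling module.
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        futures = {pool.submit(run_job, i, options): i
                   for i in range(1, n_jobs + 1)}
        # Job status monitoring: collect results as jobs finish.
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    all_done = len(results) == n_jobs
    # Result summarizing module: here, just an ordered list of results.
    summary = [results[i] for i in sorted(results)] if all_done else []
    return all_done, summary

all_done, summary = run_pipeline(n_jobs=5, options={"program": "SASGLM"},
                                 max_jobs=2)
```

On a cluster, the cap on concurrent jobs would typically be enforced by polling the scheduler's queue rather than by a local pool.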

  20. gwas.sh options.gwa

      Options file (options.gwa):

      [DATA]
      database=SAS
      genotype_dir=/dsg1/gwas/fhsgeno
      genotype_file=
      phenotype_file=fhs100
      markerinfo_file=mapall
      marker_selection=MAF>0.01
      pedigree_file=pediall
      subjectID=subject
      pedgreeID=famid
      markername=snp
      …
      [ANALYSIS]
      phenolist_file=
      pheno_list=bmi/qt
      covariates=
      program=SASGLM
      analysis=mixed
      [OUTPUT]
      output_dir=/dsguser/qunyuan/fhs/bmi
      output_file=
      output_replace=no
      [RUN]
      clusterjobname=bmimixed
      memsize=1000M
      maxjobn=300
      …

      Script (gwas.sh):

      #!/bin/sh
      OPFILE=$1
      ...
      …

      Phenotype list table:

      Pheno  type  covar    program  analysis  run
      Bmi    qt    age,sex  SASGLM   mixed     YES
      Obes   ql    NA       SASGLM   gee       YES
      HD     ql    age      SASGLM   gee       NO
      Age    …
      Sex    …
      …

      Program table:

      Program  language  location                 Maintainer
      SASGLM   SAS       /dsg1/code/sas/glm.sas   Q.Zhang
      GSTAT    R         /dsg1/code/R/gstat.R     Q.Zhang
      MMAPC              /dsg1/code/sas/mmap.sh   J. Czajkowski
      …
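The options file uses an INI-like layout of bracketed sections and key=value lines. The slide does not show how gwas.sh reads it, so the following is only a sketch of one way to parse such a file, using Python's standard `configparser`. The sample text is a subset of the options above with the elided lines left out, and `read_options` is a hypothetical helper.

```python
import configparser

# Subset of the options shown on the slide (elided lines omitted).
SAMPLE_OPTIONS = """\
[DATA]
database=SAS
genotype_dir=/dsg1/gwas/fhsgeno
genotype_file=
phenotype_file=fhs100
marker_selection=MAF>0.01

[ANALYSIS]
pheno_list=bmi/qt
program=SASGLM
analysis=mixed

[RUN]
clusterjobname=bmimixed
memsize=1000M
maxjobn=300
"""

def read_options(text):
    """Parse the options text into {section: {option: value}}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}

options = read_options(SAMPLE_OPTIONS)
max_jobs = int(options["RUN"]["maxjobn"])  # cap on concurrent cluster jobs
```

Options left blank in the file, such as `genotype_file=`, come back as empty strings, which a driver script could treat as "use the default".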

  21. Thanks !
