Sampling Design in Regional Fine Mapping of a Quantitative Trait

Sampling Design in Regional Fine Mapping of a Quantitative Trait. Banff International Research Station, Emerging Statistical Challenges and Methods, Session 7: GWAS and Beyond II, 25 June 2014. Shelley B. Bull, Lunenfeld-Tanenbaum Research Institute,


Presentation Transcript


  1. Sampling Design in Regional Fine Mapping of a Quantitative Trait. Banff International Research Station, Emerging Statistical Challenges and Methods, Session 7: GWAS and Beyond II, 25 June 2014. Shelley B. Bull, Lunenfeld-Tanenbaum Research Institute & Dalla Lana School of Public Health, University of Toronto. Co-authors: Zhijian Chen and Radu Craiu, Lunenfeld-Tanenbaum Research Institute & University of Toronto

  2. Overview
     • Setting
       • Studies designed to follow up associations detected in a GWAS
       • Fine-mapping of a candidate region by sequencing
       • Aim to identify a functional sequence variant
     • Approach
       • Phase I: Quantitative trait with GWAS data (e.g. N = 5000)
       • Phase II: Two-stage design
         • Stage 1 sample (n1): expensive sequencing to identify a smaller set of promising variants
         • Stage 2 sample (n2): cost-effective genotyping of selected variants in an independent group
       • Stratification in Stage 1 according to a promising GWAS tag SNP
       • Bayesian analysis in Stage 1, incorporating genetic model selection

  3. Two-phase Two-stage Design

  4. Background
     • Two-phase designs +/- stratification on tag SNP: Chen et al (2012), Schaid et al (2013), Thomas et al (2013). Earlier: case-cohort designs
     • Two-stage designs: Skol et al (2007), Thomas et al (2009), Stanhope & Skol (2012)
     • Bayesian approaches to genetic association: Stephens & Balding (2009), Wakefield (2009), WTCCC/Maller et al (2012)
     • Genetic model (mis)specification: Joo et al (2010), Spencer et al (2011), Vukcevic et al (2011), Faye et al (2013)

  5. Sampling Designs & Sample Allocation
     • Based on tag SNP genotype (AA, Aa, aa) from the GWAS:
       • Simple random sampling (SRS): ignores tag SNP information
       • Equal (ES) number from each stratum
       • Oversampled homozygous (HO): number larger than under SRS
     • Example: N = 5000, MAF = 0.2
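The allocation rules on this slide can be illustrated with a small sketch. It assumes Hardy-Weinberg proportions for the tag SNP and a stage 1 size of n1 = 600; neither is stated on the slide, and the function names are hypothetical. HO is left out because the slide does not say by how much the homozygous strata are oversampled.

```python
def hwe_stratum_counts(n_total, maf):
    """Expected genotype counts for 0, 1 and 2 copies of the minor allele.

    Assumes Hardy-Weinberg equilibrium (an assumption, not stated in the slides).
    """
    p = maf
    freqs = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    return [round(n_total * f) for f in freqs]


def srs_allocation(n1, strata):
    """Expected stage 1 count per stratum under simple random sampling (SRS)."""
    total = sum(strata)
    return [n1 * s / total for s in strata]


def equal_allocation(n1, strata):
    """Equal number per stratum (ES), capped at what each stratum can supply."""
    per_stratum = n1 // len(strata)
    return [min(per_stratum, s) for s in strata]


# Slide example: N = 5000, MAF = 0.2; n1 = 600 is an illustrative choice.
strata = hwe_stratum_counts(5000, 0.2)
srs = srs_allocation(600, strata)
es = equal_allocation(600, strata)
print(strata, srs, es)
```

Under these assumptions the rare homozygote stratum holds about 200 of the 5000 individuals, so SRS is expected to draw only about 24 of them into stage 1, while ES takes the whole stratum.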

  6. Quantitative Trait Model
     • QT model parameters: θ = (β0, β1, σ²)
     • Genetic models: M1 = additive, M2 = dominant, M3 = recessive
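The three genetic models differ only in how a genotype is coded as the regressor X. A minimal sketch of the usual codings follows, assuming genotypes are stored as the number of copies (0, 1, 2) of the minor allele and that dominance is defined with respect to that allele; the function name is hypothetical.

```python
import numpy as np


def code_genotype(g, model):
    """Code genotype counts (0, 1, 2) as the regressor X for one genetic model."""
    g = np.asarray(g)
    if model == "additive":      # M1: X = number of minor alleles
        return g.astype(float)
    if model == "dominant":      # M2: X = 1 if at least one minor allele
        return (g >= 1).astype(float)
    if model == "recessive":     # M3: X = 1 only for two minor alleles
        return (g == 2).astype(float)
    raise ValueError(f"unknown genetic model: {model}")


genotypes = [0, 1, 2]
for name in ("additive", "dominant", "recessive"):
    print(name, code_genotype(genotypes, name))
```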

  7. Bayesian Inference: Stage 1 sample
     • Specify priors for the genetic models and the regression parameters
       • p(Mj) = ⅓ and p(θ | Mj) = p(θ)
       • p(θ) = p(β0, β1 | σ²) p(σ²), normal-inverse-gamma (NIG)
     • Derive the model-specific posterior for the regression parameters for a functional sequence variant; this is analytic when the prior is NIG
     • Select a genetic model for each sequence variant according to the posterior probability wj = p(Mj | data)
     • Given the selected genetic model, compare all sequence variants in the region by computing the posterior probability that variant k is functional given all the data (using the Bayes factor), and rank them
       • p(1) ≥ p(2) ≥ … ≥ p(m)
     • Construct a 95% credible set that includes all variants such that
       • p(1) + p(2) + … + p(k) ≥ 0.95 for minimum k
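The steps above can be sketched in code. This is an illustration under stated assumptions, not the authors' implementation: the NIG hyperparameters (prior mean 0, scale `v0`, shape `a0`, rate `b0`) are arbitrary choices, the function names are hypothetical, and the marginal likelihood is the standard closed form for conjugate Bayesian linear regression. With equal model priors of ⅓, the weights wj are proportional to the marginal likelihoods.

```python
from math import lgamma

import numpy as np


def log_marginal_nig(y, x, v0=10.0, a0=1.0, b0=1.0):
    """Log marginal likelihood of y = beta0 + beta1 * x + error under an NIG prior.

    Prior (illustrative): beta | sigma2 ~ N(0, sigma2 * v0 * I),
    sigma2 ~ Inverse-Gamma(a0, b0).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    X = np.column_stack([np.ones(n), np.asarray(x, dtype=float)])
    prec0 = np.eye(2) / v0                      # prior precision
    prec_n = prec0 + X.T @ X                    # posterior precision
    mean_n = np.linalg.solve(prec_n, X.T @ y)   # posterior mean of beta
    a_n = a0 + n / 2.0
    b_n = b0 + 0.5 * (y @ y - mean_n @ prec_n @ mean_n)
    _, logdet0 = np.linalg.slogdet(prec0)
    _, logdet_n = np.linalg.slogdet(prec_n)
    return (-0.5 * n * np.log(2.0 * np.pi)
            + 0.5 * (logdet0 - logdet_n)
            + a0 * np.log(b0) - a_n * np.log(b_n)
            + lgamma(a_n) - lgamma(a0))


def model_weights(y, g):
    """Posterior probabilities w_j of the three genetic models, equal priors."""
    g = np.asarray(g)
    codings = {"additive": g, "dominant": g >= 1, "recessive": g == 2}
    logm = np.array([log_marginal_nig(y, c.astype(float)) for c in codings.values()])
    w = np.exp(logm - logm.max())               # subtract max for stability
    return dict(zip(codings, w / w.sum()))


def credible_set(post, level=0.95):
    """Indices of the smallest top-ranked set whose posterior mass reaches level."""
    post = np.asarray(post, dtype=float)
    order = np.argsort(-post)                   # rank variants, largest first
    cum = np.cumsum(post[order])
    k = min(int(np.searchsorted(cum, level)) + 1, len(post))
    return order[:k]


# Toy data simulated under an additive model (all values are illustrative).
rng = np.random.default_rng(1)
g = rng.binomial(2, 0.3, size=500)
y = 5.0 + 1.0 * g + rng.normal(0.0, 0.5, size=500)
weights = model_weights(y, g)
cset = credible_set([0.6, 0.3, 0.06, 0.04])
print(weights, cset)
```

In the toy credible-set call, the first two variants carry only 0.90 of the posterior mass, so the third is needed to reach 0.95 and the set has k = 3 members.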

  8. Criteria for a Good Design
     • Higher probability that the correct genetic model is identified for the sequence variant
     • Fewer sequence variants selected into the credible set (number and %) (cost)
     • Higher probability that the functional sequence variant is selected into the credible set (power)
     • Higher probability that the functional sequence variant is top ranked in the credible set
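Three of these criteria can be estimated directly from simulation replicates. The sketch below assumes each replicate yields a vector of posterior probabilities over the m variants and that the index of the functional variant is known; the function name and the returned keys are hypothetical.

```python
import numpy as np


def design_metrics(post_matrix, functional_idx, level=0.95):
    """Summarise credible-set size, P(Select) and P(Rank) over replicates.

    post_matrix has one row per simulation replicate and one column per variant.
    """
    post = np.asarray(post_matrix, dtype=float)
    sizes, selected, top_ranked = [], [], []
    for row in post:
        order = np.argsort(-row)                      # largest posterior first
        cum = np.cumsum(row[order])
        k = min(int(np.searchsorted(cum, level)) + 1, len(row))
        sizes.append(k)
        selected.append(functional_idx in order[:k])  # functional variant in set?
        top_ranked.append(order[0] == functional_idx) # functional variant ranked first?
    return {
        "mean_size": float(np.mean(sizes)),
        "p_select": float(np.mean(selected)),
        "p_rank": float(np.mean(top_ranked)),
    }


# Two toy replicates with four variants; variant 0 is the functional one.
toy = [[0.97, 0.01, 0.01, 0.01],
       [0.02, 0.50, 0.46, 0.02]]
metrics = design_metrics(toy, functional_idx=0)
print(metrics)
```

In the toy input the functional variant is captured and top ranked in the first replicate only, so both probabilities come out at one half.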

  9. Simulation Design (APOE gene region, 1KG)
     • Quantitative trait model: Y = β0 + β1 X + γ 1(X=1) + ϵ
     • Parameters: β0 = 5, β1 = 0.25, σ² = 0.1, 0.5, 1.5, giving σ/β1 = 1.3, 2.8, 4.9
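A minimal sketch of drawing trait values from this model follows. It reads 1(X=1) as an indicator for the heterozygous genotype, so that γ shifts the heterozygote mean away from the additive line; that reading, the default γ = 0, and the function name are assumptions. Under this reading γ = β1 would give dominant-type means and γ = -β1 recessive-type means.

```python
import numpy as np


def simulate_trait(g, beta0=5.0, beta1=0.25, gamma=0.0, sigma2=0.5, rng=None):
    """Draw Y = beta0 + beta1 * X + gamma * 1(X == 1) + eps, eps ~ N(0, sigma2)."""
    rng = np.random.default_rng() if rng is None else rng
    g = np.asarray(g, dtype=float)
    noise = rng.normal(0.0, np.sqrt(sigma2), size=g.shape)
    return beta0 + beta1 * g + gamma * (g == 1) + noise


# Check that the slide's variances match its noise-to-effect ratios.
ratios = [round(float(np.sqrt(s2)) / 0.25, 1) for s2 in (0.1, 0.5, 1.5)]
print(ratios)

# Illustrative draw for a variant with MAF = 0.2 (genotypes under HWE).
rng = np.random.default_rng(0)
genotypes = rng.binomial(2, 0.2, size=1000)
y = simulate_trait(genotypes, sigma2=0.5, rng=rng)
```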

  10. Simulation Results: Genetic model selection
      • Designs: SRS (solid line), ES (dashed line), HO (dotted line)
      • Data simulated under additive, dominant and recessive genetic models
      • Rate of selecting the true genetic model for the functional variant, using the strong criterion wj > 0.833
      • Common sequence variant (MAF = 0.2); 1000 simulations

  11. Simulation Results: Size of the 95% credible set
      • Designs: SRS (solid line), ES (dashed line), HO (dotted line)
      • Data simulated under additive, dominant and recessive genetic models
      • Upper panels: common variant (MAF = 0.2) with σ/β1 = 4.9 (m = 201)
      • Lower panels: low frequency variant (MAF = 0.02) with σ/β1 = 2.8 (m = 332)
      • 1000 simulations

  12. Simulation Results: Selection of functional variant
      • Designs: SRS (solid line), ES (dashed line), HO (dotted line)
      • Data simulated under additive, dominant and recessive genetic models
      • Upper panels: common variant (MAF = 0.2) with σ/β1 = 4.9 (m = 201)
      • Lower panels: low frequency variant (MAF = 0.02) with σ/β1 = 2.8 (m = 332)
      • 1000 simulations

  13. Simulation Results: Functional variant top ranked
      • Designs: SRS (solid line), ES (dashed line), HO (dotted line)
      • Data simulated under additive, dominant and recessive genetic models
      • Upper panels: common variant (MAF = 0.2) with σ/β1 = 4.9 (m = 201)
      • Lower panels: low frequency variant (MAF = 0.02) with σ/β1 = 2.8 (m = 332)
      • 1000 simulations

  14. Simulation Results: Model selection
      • Data simulated under additive, dominant and recessive genetic models
      • Cases without model selection (no MS) are analysed under an additive model
      • Common sequence variant (MAF = 0.2), σ/β1 = 4.9, n1 = 600; 1000 simulations

  15. Simulation Results: Cost Efficiency (CE)
      • A total of m sequence variants are identified in n1 individuals in stage 1, and a proportion q = m2 / m are genotyped in n2 = N - n1 individuals in stage 2
      • Cost depends on c1, the stage 1 per-individual sequencing cost, and on c2, the stage 2 per-individual, per-marker genotyping cost
      • e.g. if N = 5000, n1 = 500, c1 = $1000, n2 = 4500, m2 = 100, and c2 = $0.50, then the total two-stage cost is $500,000 + $225,000 = $725,000, compared to a one-stage cost of $5 million
      • CE is defined as "Power" / Cost, where "Power" is estimated by the probability that a functional variant falls within the 95% credible set
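The cost arithmetic on this slide can be reproduced with a few lines. The function names are hypothetical, and the one-stage cost is taken to mean sequencing all N individuals at c1 each, which matches the $5 million figure quoted.

```python
def two_stage_cost(n_total, n1, m2, c1, c2):
    """Sequencing n1 people at c1 each, then genotyping m2 markers in the rest."""
    n2 = n_total - n1
    return n1 * c1 + n2 * m2 * c2


def one_stage_cost(n_total, c1):
    """Sequencing everyone (assumed meaning of the one-stage comparison)."""
    return n_total * c1


def cost_efficiency(power, cost):
    """CE = "Power" / Cost, as defined on the slide."""
    return power / cost


# Slide example: N = 5000, n1 = 500, c1 = $1000, m2 = 100, c2 = $0.50.
two_stage = two_stage_cost(5000, 500, 100, 1000, 0.50)
one_stage = one_stage_cost(5000, 1000)
print(two_stage, one_stage)
```

Because CE divides by total cost, at equal power the two-stage figure in this example is about 6.9 times more cost efficient than sequencing everyone ($5,000,000 / $725,000).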

  16. Comments and Discussion
      • Incorporating Bayesian genetic model selection is worthwhile
      • Selection of informative individuals for expensive data collection can be a useful strategy in statistical genetic design and analysis
      • The simulations confirm the intuition that the efficiency of the tag-stratified sampling strategy increases with tag-seq correlation
      • Winner's curse effects propagate from the GWAS, but are more complicated
      • Cost-efficiency of a two-stage design depends on the relative costs of sequencing versus genotyping: will it remain practical?
      • Analysis of the sequence data was limited to low frequency and common variants; extensions to rare variants are needed
      • Other design options: trait-dependent sampling
      • How to conduct joint Bayesian inference for stages 1 and 2?

  17. Acknowledgements
      • Co-authors: Zhijian Chen, STAGE Post-doctoral Fellow; Radu Craiu, Dept of Statistical Sciences
      • Thanks to Laura Faye and Andrew Paterson for helpful discussions, and to referees for improvements to the paper
      • To appear in Genetic Epidemiology
      • Funding

  18. Thanks

  19. Simulation Results Summary
      • In stage 1, a total of m variants are sequenced in n1 = 500 individuals, with equal strata sampling (ES) and an additive genetic model
      • Size is the number m2 of sequence SNPs in the 95% credible set (% or count)
      • P(Select) is the probability that the functional variant is selected into the credible set
      • P(Rank) is the probability that the functional variant is top ranked in the credible set

  20. GWAS Sample Size

