Genotype Phasing and Imputation in 1x Sequencing Data. Warren W. Kretzschmar. DPhil Genomic Medicine and Statistics Wellcome Trust Centre for Human Genetics, Oxford , UK Supervisor: Jonathan Marchini. Major Depression.
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
Genotype Phasing and Imputation in 1x Sequencing Data
Warren W. Kretzschmar
DPhil Genomic Medicine and Statistics
Wellcome Trust Centre for Human Genetics, Oxford, UK
Supervisor: Jonathan Marchini
Top Ten causes of DALYs
DALY : Disability adjusted life year
: number of years lost due to ill-health, disability or early death
Genetics of Major Depression
Major Depressive Disorder Working Group of the Psychiatric GWAS Consortium (2012). A mega-analysis of genome-wide association studies for major depressive disorder. Molecular Psychiatry18.4:497-511.
(China, Oxford and VCU Experimental Research on Genetic Epidemiology)
59 hospitals, 45 cities, 21 provinces.
Genetically Homogeneous : All subjects are female and their grandparents are Han Chinese
6,000 cases : typically severe affected: 85% qualify for a diagnosis of melancholia by DSM-IV. >25% reported a family history of MD in one or more first-degree relatives
6,000 controls : patients undergoing minor surgical procedures.
Extensive Phenotyping: primary disorder of major depression, common comorbid disorders (e.g. generalized anxiety disorder, panic disorder), within disorder symptoms (e.g. suicidal ideation), disorder subtypes (e.g. melancholia, dysthymia), possible endophenotypes (e.g. neuroticism) and a range of risk factors (e.g. child abuse, stressful life events, social and marital relationships, parenting, post-natal depression, demographics).
Sequencing : mean depth 1.7X using llluminaHiSeqat Beijing Genomics Institute
Current status Sequencing finished. We have data on 12,000 samples. For now we have only considered ~13M sites polymorphic 1000 Genomes Asian samples. Analysis ongoing…
Sequence analysis pipeline
Phase 1: genotype likelihood estimation
One sample at a time
Phase 2: phasing and imputation
All samples together
2.7 CPU years
5 CPU years
4.6 CPU years
Example SNP chip data
Unphased: G/G A/T A/A T/T G/T A/T T/T A/A G/G G/C
Hap 1: G A A T T T T A G C
Hap 2: G T A T G A T A G G
Genotype Imputation from Haplotypes
J Marchiniand B Howie. Nature Rev. Genet. 2010
What is a Genotype Likelihood?
Genotype likelihoods (aka GL) are defined on a site by site basis.
GLs are conditional probabilities.
Genotype Likelihood = Pr( R | G )
R = Reads; also known as the “observed data”
G = Genotype; usually one of ref/ref, ref/alt, alt/alt
How are Genotype Likelihoods Useful?
Genotype likelihoods allow us to quantify how much the reads support each possible genotype independent of other information.
To determine the most likely genotype call, we need a genotype probability.
Genotype Probability = Pr ( G | R ) proportional to Pr( R | G ) * Pr( G )
Pr( G ) = prior probability of G.
May be determined through haplotype phasing and imputation approaches.
Genotype Likelihood Creation with SNPTools
Pr(R|G = ref/ref) = 0.06
Pr(R|G = alt/alt) = 10e-6
Pr(R|G = ref/alt) = 10e-3
Y Wang, J Lu, J Yu, RA Gibbs, FL Yu. Genome Research. 2013
Genotype Phasing using Genotype Likelihoods
Hap 1: G A A T T A C A G G
Hap 2: G T A T T A T A G G
Hap 3: G T A T G A C A G G
Hap 4: G T A T G A T A G C
Example GL data
Pr(ref/ref): G/G A/AA/A T/T G/G A/A T/T A/A G/G G/G
Pr(ref/alt): G/AA/TA/GT/A G/T A/T T/C A/G G/C G/C
Pr(alt/alt): A/A T/TG/GA/A T/TT/TC/C G/G C/C C/C
Plausible Haplotypes after Phasing
Hap 5: G A A T T A T A G C
Hap 6: G T A T T A T A G G
General MCMC Scheme for Phasing from GLs
The Tools/Languages I use
A Bioinformatician’s Best Practices
according to Nick Loman & Mick Watson. Nature Biotechnology. 2013
see also: W. S. Noble. PLoS Computational Biology. 2009
Good Directory Structure
according to W. S. Noble. PLoS Computational Biology. 2009
Thank you. Questions?