
Bayesian association of haplotypes and non-genetic factors to regulatory and phenotypic variation in human populations

Anitha Kannan and John Winn

Jim Huang*

Probabilistic and Statistical Inference Group, Edward S. Rogers Department of Electrical and Computer Engineering University of Toronto Toronto, ON, Canada

Microsoft Research Cambridge Machine Learning and Perception Group Cambridge, UK

ISMB/ECCB 2007


24/07/2007


Outline

  • Main contributions:

    • Joint Bayesian modelling of genetic variation data and quantitative trait measurements

    • Rich probabilistic model for genotype data

      • State-of-the-art results on predicting missing genotypes



Outline

Genotype: unordered pair of SNP alleles at each locus, one from each of the two chromosomes

Haplotype: ordered set of SNP alleles along a single chromosome

Recombination hotspots partition haplotypes into blocks [Daly et al., 2001]
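The genotype and haplotype definitions on this slide can be illustrated with a minimal sketch. The 0/1 allele coding and the function name are assumptions made for the example. It shows why phase is lost: two different haplotype pairs can produce the same genotype.

```python
from typing import List, Tuple

def genotype_from_haplotypes(h1: List[int], h2: List[int]) -> List[Tuple[int, ...]]:
    """Collapse two ordered haplotypes into per-locus unordered allele pairs.

    Alleles are coded 0/1 (an assumption for this example). Sorting each
    pair discards which chromosome the allele came from, i.e. the phase.
    """
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same loci")
    return [tuple(sorted(pair)) for pair in zip(h1, h2)]

# Two different haplotype pairs...
pair_a = ([0, 1, 1, 0], [1, 1, 0, 0])
pair_b = ([0, 1, 0, 0], [1, 1, 1, 0])

# ...that are indistinguishable once reduced to genotypes.
print(genotype_from_haplotypes(*pair_a))
print(genotype_from_haplotypes(*pair_a) == genotype_from_haplotypes(*pair_b))
```

Recovering the haplotypes from such genotypes is the phasing problem that the models on the following slides address.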



Part I: Learning haplotype block structure

  • Our model for genotype data should:

    • Account for phase & parent-child information

    • Account for uncertainty in ancestral haplotypes

    • Account for uncertainty in block structure

    • Account for population-specific haplotype block statistics

    • Allow for prior knowledge of haplotype block structure



Previous models for genotype data

  • Previous methods learn a low-dimensional representation of the genotype data:

  • HAPLOBLOCK (Greenspan, G. and Geiger, D. RECOMB 2003)

    • Hard partitioning of data into set of haplotype blocks using low-dimensional “ancestral” haplotypes

  • fastPHASE (Scheet P. and Stephens, M. Am J Hum Genet 2006)

    • Learn ancestral haplotypes from high-dimensional genotype data while accounting for uncertainty in haplotype blocks

  • Jojic, N., Jojic, V. and Heckerman, D. UAI 2004.



Probabilistic generative model for genotype data

[Figure: generative model relating a low-dimensional latent representation to the high-dimensional data; unsupervised learning via maximum likelihood]
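A toy sketch of what a low-dimensional latent representation can mean here, assuming a mosaic model in which each observed haplotype copies from one of a few ancestral haplotypes and may switch ancestor between loci. The function name, the 0/1 allele coding, the uniform choice of ancestor and the `noise` flip rate are illustrative assumptions, not the authors' model.

```python
import random

def sample_haplotype(ancestors, switch_prob, noise=0.0, rng=None):
    """Sample one haplotype as a mosaic of ancestral haplotypes.

    ancestors   : list of K equal-length 0/1 lists (the latent representation)
    switch_prob : switch_prob[k] is the chance of re-drawing the ancestor
                  just before locus k (entry 0 is unused)
    noise       : per-locus probability of flipping the copied allele
    Returns (haplotype, path), where path[k] is the ancestor used at locus k.
    """
    rng = rng or random.Random()
    n_loci = len(ancestors[0])
    state = rng.randrange(len(ancestors))
    haplotype, path = [], []
    for k in range(n_loci):
        if k > 0 and rng.random() < switch_prob[k]:
            state = rng.randrange(len(ancestors))  # possible recombination point
        allele = ancestors[state][k]
        if rng.random() < noise:
            allele = 1 - allele  # mutation or genotyping error
        haplotype.append(allele)
        path.append(state)
    return haplotype, path

ancestors = [[0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1]]
switch_prob = [0.0, 0.0, 0.0, 0.9, 0.0, 0.0]  # one likely switch before locus 3
hap, path = sample_haplotype(ancestors, switch_prob, rng=random.Random(1))
print(hap, path)
```

Loci between high-probability switch points tend to be copied from a single ancestor, which is what produces block-like structure in the sampled data.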



Predicting missing genotype data

  • Have we learned a good density model for genotype data?

  • Gains from

    • Accounting for uncertainty in haplotype block structure

    • Accounting for uncertainty in ancestral haplotypes

    • Accounting for parental relationships

  • Assess model using cross-validation/test prediction error



Predicting missing genotype data

  • Crohn’s/5q31 data set (Daly et al., 2001)

    • Crohn’s disease data from Chromosome 5q31 containing genotypes for 129 children + 258 parents across 103 loci (phases given for children)

  • For each test set, hide a fraction ρ of the data as missing

  • Keep the model parameters learned from the training data fixed, then draw 1000 samples over the missing data

  • Compute the fill-in error rate over the 1000 samples, across all missing data
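The evaluation protocol above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, the data are a list of rows of discrete values, and each sample is represented as a dictionary from hidden position to imputed value. How samples are drawn from the learned model is not shown.

```python
import random

def mask_entries(data, rho, rng):
    """Choose which (row, locus) positions to hide: a fraction rho of all entries."""
    positions = [(i, k) for i, row in enumerate(data) for k in range(len(row))]
    n_missing = round(rho * len(positions))
    return rng.sample(positions, n_missing)

def fill_in_error_rate(truth, missing, samples):
    """Fraction of imputed values that disagree with the held-out truth,
    pooled over every sample and every hidden position."""
    errors = 0
    total = 0
    for sample in samples:
        for (i, k) in missing:
            errors += int(sample[(i, k)] != truth[i][k])
            total += 1
    return errors / total if total else 0.0

truth = [[0, 1], [1, 0]]
missing = [(0, 0), (1, 1)]
# Two hand-written "samples" standing in for draws from a fitted model.
samples = [{(0, 0): 0, (1, 1): 0}, {(0, 0): 1, (1, 1): 0}]
print(fill_in_error_rate(truth, missing, samples))  # 0.25
```

In the experiment described on the slide there would be 1000 such samples per test set, with ρ controlling how many positions `mask_entries` hides.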



Prediction error for Crohn’s/5q31 data



Comparative performance for Crohn’s/5q31 data



Establishing haplotype block boundaries

  • Define the recombination prior γ on transition probabilities

    • Different values of γ correspond to different “blockiness” of the data

  • For each locus k, we can compute the probability of transition p_k

    • A threshold t on p_k then establishes the block boundaries

  • Once blocks are defined, we can assign block labels l_b = (m, n)
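The thresholding step above can be sketched as follows, assuming that `p[k]` is the probability of a transition between locus k-1 and locus k and that a boundary is declared wherever it exceeds t. Both conventions and the function names are assumptions for illustration. The assignment of block labels is not sketched because the slide does not define m and n.

```python
def block_boundaries(p, t):
    """Loci k >= 1 whose transition probability p[k] exceeds the threshold t.
    A boundary at k means a new block starts at locus k."""
    return [k for k in range(1, len(p)) if p[k] > t]

def blocks_from_boundaries(n_loci, boundaries):
    """Turn boundary positions into half-open (start, end) locus ranges."""
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [n_loci]
    return list(zip(starts, ends))

p = [0.0, 0.01, 0.6, 0.02, 0.03, 0.8, 0.01]
bounds = block_boundaries(p, t=0.5)
print(bounds)                                  # [2, 5]
print(blocks_from_boundaries(len(p), bounds))  # [(0, 2), (2, 5), (5, 7)]
```

Raising t merges neighbouring blocks and lowering it splits them, so t plays a role similar to γ in controlling how blocky the resulting partition is.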



Haplotype block structure in the ENm006 region

  • 573 SNP markers for 270 individuals from 3 sub-populations:

    • 90 Yoruba individuals (30 parent-parent-offspring trios) from Ibadan, Nigeria (YRI)

    • 90 individuals (30 trios) of European descent from Utah (CEU)

    • 45 Han Chinese individuals from Beijing (CHB) and 45 Japanese individuals from Tokyo (JPT), grouped as CHB+JPT




A model for linking haplotype structure to quantitative trait measurements

[Figure: five individuals are each assigned one of four labels in haplotype block 1 and in haplotype block 2. Each block's latent block profile is multiplied by a relevance variable (× 1.0 or × 0.0 in the example) and the weighted profiles are summed to give the observed quantitative trait profile.]



A Bayesian model for linking haplotype structure to quantitative measurements

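[Figure: graphical model with plates over blocks b = 1,…,B, quantitative traits g = 1,…,G and individuals j = 1,…,J. Nodes: T_bj; block label S_bj; relevance variable w_bg (hyperparameter π0); latent block profile μ_bg (hyperparameters τ0, μ0); noise precision ρ_g (hyperparameters α0, β0); observed trait z_gj.]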
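One plausible reading of this graphical model, written out as a generative process. The distributional forms (Bernoulli, Gaussian and Gamma priors with a Gaussian likelihood) and the indexing of the latent block profile by block label are assumptions chosen to match the hyperparameter names. They are not stated on the slides. T_bj is left out because its role is not given.

```latex
\begin{align*}
w_{bg} &\sim \mathrm{Bernoulli}(\pi_0)
  && \text{relevance of block } b \text{ to trait } g \\
\mu_{bg}^{(s)} &\sim \mathcal{N}\!\left(\mu_0,\ \tau_0^{-1}\right)
  && \text{latent profile for block label } s \\
\rho_g &\sim \mathrm{Gamma}(\alpha_0,\ \beta_0)
  && \text{noise precision for trait } g \\
z_{gj} \mid w, \mu, \rho, S &\sim
  \mathcal{N}\!\left(\sum_{b=1}^{B} w_{bg}\,\mu_{bg}^{(S_{bj})},\ \rho_g^{-1}\right)
  && \text{observed trait } g \text{ in individual } j
\end{align*}
```

Under this reading, a block with w_bg = 0 contributes nothing to trait g, which matches the × 0.0 and × 1.0 weights in the preceding illustration.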



Linking haplotype blocks to phenotype

[Figure: results for test cases (sorted) across test data splits]

  • 387 individuals with Crohn’s (+1) or non-Crohn’s (-1) phenotype

  • Link 10 haplotype blocks from 5q31 to phenotype

  • Average cross-validation error: 23.1% ± 3.45%

Haplotype blocks 2 and 10 most relevant to Crohn’s phenotype (p < 4.76 × 10^-5)



Linking haplotype blocks to gene expression

  • ENm006 data set:

    • 19 haplotype blocks (573 SNPs)

    • 28 gene expression profiles in ENm006 region (Stranger et al., 2007)



Addressing population stratification

The population variable affects phenotype/gene expression…

…whereas variation between individuals is the effect we’re interested in



Associations between haplotype blocks and gene expression

p < 3.33 × 10^-4

p < 2.5 × 10^-4

GDI1 - HapBlock2 (YRI)

GDI1 - HapBlock5 (CHB+JPT)



Summary

  • Enhanced version of Jojic et al. (UAI 2004) model for haplotype inference/discovering block structure

  • Novel Bayesian model for associating haplotype blocks to gene expression

  • We re-discover population-specific block structures across populations in the HapMap data

  • Predictions for Crohn’s disease from Chromosome 5q31 data

  • Cis-associations between blocks and gene expression in ENm006 in presence of non-genetic factors

  • Cis-association between HapBlocks 2 and 5 and GDI1



The road ahead…

  • Applying to larger portions of the HapMap data

  • Finding trans-associations

  • Non-linear models for associating block structure to quantitative traits

  • Joint learning of haplotype block structure and associations

  • Accounting for patterns of gene co-expression/similar phenotypes



Acknowledgements

  • Manolis Dermitzakis and Richard Durbin, Wellcome Trust Sanger Institute

  • Nebojsa Jojic, Microsoft Research Redmond

  • Paul Scheet, University of Michigan - Ann Arbor

  • US National Science Foundation (NSF)
