
Bayesian Multi-Population Haplotype Inference via a Hierarchical Dirichlet Process Mixture

Paper by E. Xing, K. Sohn, M. Jordan and Y. Teh, ICML 2006. Duke University Machine Learning Group. Presented by Kai Ni, August 24, 2006.


Presentation Transcript


  1. Bayesian Multi-Population Haplotype Inference via a Hierarchical Dirichlet Process Mixture • Paper by E. Xing, K. Sohn, M. Jordan and Y. Teh, ICML 2006 • Duke University Machine Learning Group • Presented by Kai Ni, August 24, 2006

  2. Outline • Background • Dirichlet process mixture • Hierarchical Dirichlet process mixture • Application to haplotype inference

  3. Motivation • Problem – Uncovering the haplotypes of single nucleotide polymorphisms (SNPs) within and between populations. • Methods – Coalescence, finite and infinite mixtures, and maximal parsimony. • Applications – Biological and medical analysis; genetic demography studies.

  4. Background • A SNP haplotype is a list of alleles at contiguous sites in a local region of a single chromosome. A haplotype is inherited as a unit. • For diploid organisms, two haplotypes together make up a genotype, which is a list of unordered pairs of alleles in a region. • Haplotype inference from genotype data can be formulated as a mixture model; this paper uses an HDP mixture.
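The relationship between haplotypes and genotypes described on this slide can be sketched in a few lines. This is only an illustration: the 0/1 allele coding and the function names `genotype_from_haplotypes` and `count_compatible_pairs` are assumptions made here, not part of the paper. The second function shows why inference is needed at all: with k heterozygous sites, 2^(k-1) distinct haplotype pairs explain the same genotype.

```python
def genotype_from_haplotypes(h0, h1):
    """Return the genotype as a list of unordered allele pairs, one per site.

    Alleles are coded 0/1 here purely for illustration.
    """
    if len(h0) != len(h1):
        raise ValueError("haplotypes must cover the same sites")
    # Sorting each pair discards which parent contributed which allele.
    return [tuple(sorted(pair)) for pair in zip(h0, h1)]


def count_compatible_pairs(genotype):
    """Number of unordered haplotype pairs that could produce this genotype."""
    het = sum(1 for a, b in genotype if a != b)  # heterozygous sites
    return 1 if het == 0 else 2 ** (het - 1)


g = genotype_from_haplotypes([0, 1, 1], [1, 1, 0])
print(g)                          # [(0, 1), (1, 1), (0, 1)]
print(count_compatible_pairs(g))  # 2
```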

  5. Dirichlet Processes • A single clustering problem can be modeled with a Dirichlet process (DP).

  6. DP mixture model • G can be viewed as a mixture model with infinitely many components.
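One common way to see the "infinitely many components" view in practice is the Chinese restaurant process, which samples cluster labels from a DP without fixing the number of clusters in advance. The following is a minimal sketch under that standard construction; the function name `crp_assignments` and the concentration parameter name `alpha` are assumptions, not notation from the slides.

```python
import random


def crp_assignments(n, alpha, rng):
    """Draw cluster labels for n items from a Chinese restaurant process.

    Item i joins existing cluster k with probability proportional to its
    current size, or opens a new cluster with probability proportional
    to alpha.
    """
    counts = []  # counts[k] = number of items currently in cluster k
    labels = []
    for _ in range(n):
        weights = counts + [alpha]  # last slot stands for "new cluster"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


labels, counts = crp_assignments(50, alpha=1.0, rng=random.Random(0))
print(len(counts), "clusters for", len(labels), "items")
```

Larger values of `alpha` tend to produce more clusters; the number of clusters grows with the data rather than being set beforehand.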

  7. DP-Haplotyper • denotes the genotype of T contiguous SNPs of individual i from ethnic group j. • The corresponding paternal/maternal haplotypes of the individual's genotype are denoted by • H is assumed to be a random perturbation of an ancestral haplotype A, or founder. • DP-Haplotyper is a DP mixture model for a single population group.

  8. Graph model of DP-Haplotyper

  9. Hierarchical Dirichlet Process • Each group is modeled as a DP Gj and the group-specific DPs are linked via a global DP G0. • G0 defines the set of mixture components used by all the groups. Different groups share the same set of mixture components (underlying clusters), but with different mixture proportions.

  10. HDP mixture model • An HDP can be used as the prior distribution over the factors for nested group data. • Consider a two-level DP: G0 links the child DPs Gj and forces them to share components. The Gj are conditionally independent given G0.
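In the standard HDP formulation, the two-level construction described above can be written as follows. The symbols here are assumptions taken from common HDP notation rather than from the slides: γ and α0 are concentration parameters, B is a base measure over mixture components, and φji is the factor (mixture component) associated with observation i in group j.

```latex
\begin{align*}
G_0 \mid \gamma, B &\sim \mathrm{DP}(\gamma, B), \\
G_j \mid \alpha_0, G_0 &\sim \mathrm{DP}(\alpha_0, G_0), \qquad j = 1, \dots, J, \\
\phi_{ji} \mid G_j &\sim G_j .
\end{align*}
```

Because G0 is discrete with probability one, every Gj places its mass on the same atoms, which is what forces the groups to share components while keeping group-specific proportions.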

  11. HDP – Chinese Restaurant Franchise • First level: within each group, a DP mixture • φj1, …, φj(i−1) are i.i.d. random variables distributed according to Gj; let ψj1, …, ψjTj be the values taken on by φj1, …, φj(i−1), and let njt be the number of φji′ = ψjt, 0 < i′ < i. • Second level: across groups, sharing clusters • The base measure G0 of each group is itself a draw from a DP. • Let θ1, …, θK be the values taken on by the ψjt, and let mk be the number of ψjt = θk, over all j, t.
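The two-level seating scheme can be simulated directly. The sketch below draws only the cluster ("dish") indices, not the component parameters, and follows the usual Chinese restaurant franchise rules: a customer joins table t in its group with probability proportional to njt or opens a new table with probability proportional to a group-level concentration; a new table picks dish k with probability proportional to mk or a new dish with probability proportional to a top-level concentration. The names `crf_sample`, `alpha0`, and `gamma` are assumptions for illustration.

```python
import random


def crf_sample(group_sizes, alpha0, gamma, rng):
    """Sample dish (shared cluster) indices for every customer in every group."""
    dish_tables = []  # m_k: number of tables, across all groups, serving dish k
    result = []
    for n_j in group_sizes:
        table_counts = []  # n_jt: customers at table t of this group
        table_dish = []    # dish served at table t of this group
        dishes = []
        for _ in range(n_j):
            # First level: choose an existing table or open a new one.
            t = rng.choices(range(len(table_counts) + 1),
                            weights=table_counts + [alpha0])[0]
            if t == len(table_counts):
                # Second level: the new table orders a dish from the franchise.
                k = rng.choices(range(len(dish_tables) + 1),
                                weights=dish_tables + [gamma])[0]
                if k == len(dish_tables):
                    dish_tables.append(0)
                dish_tables[k] += 1
                table_counts.append(0)
                table_dish.append(k)
            table_counts[t] += 1
            dishes.append(table_dish[t])
        result.append(dishes)
    return result, dish_tables


assignments, m = crf_sample([20] * 5, alpha0=1.0, gamma=1.0, rng=random.Random(1))
for j, dishes in enumerate(assignments):
    print(j, sorted(set(dishes)))  # dishes used by group j
```

Dishes that appear in more than one group correspond to mixture components shared across populations.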

  12. HDP-Haplotyper model

  13. Parameterization of the model • Underlying mixture component Ak := [Ak,1, … , Ak,T] – founding haplotype configuration • Base measure , where p(A) is a uniform distribution and p( ) is a beta distribution. • Inheritance model • Genotyping model
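The slide's formulas for the inheritance model are not recoverable from the transcript, so the following is only an illustrative stand-in for the idea that a haplotype is a random perturbation of a founder. It assumes binary alleles and a single per-site probability `theta` of copying the founder allele unchanged; both the parameter name and the function name `inherit` are hypothetical.

```python
import random


def inherit(founder, theta, rng):
    """Copy a founder haplotype site by site.

    Each allele is kept with probability theta and flipped otherwise
    (binary 0/1 alleles are assumed for this sketch).
    """
    return [a if rng.random() < theta else 1 - a for a in founder]


founder = [0, 1, 1, 0, 1]
rng = random.Random(42)
print(inherit(founder, theta=0.95, rng=rng))
```

With `theta` close to 1, most descendants are exact or near-exact copies of the founder, which is what lets haplotypes cluster around a small set of founders.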

  14. Gibbs Sampling • Gibbs sampling variants include: • The sampling scheme is similar to a two-level urn model:

  15. Simulated data • 100 individuals from 5 groups (20 each). Each group has 2 shared founders and 3 unique founders, for a total of 17 founders.
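The founder count follows from the design: 2 founders shared by all groups plus 3 unique founders for each of the 5 groups gives 2 + 5 × 3 = 17. A small sketch of that bookkeeping, with hypothetical founder labels and a hypothetical function name `founder_design`:

```python
def founder_design(n_groups=5, n_shared=2, n_unique=3):
    """Map each group to its founder labels: shared ones plus group-specific ones."""
    shared = [f"shared_{s}" for s in range(n_shared)]
    return {
        j: shared + [f"group{j}_unique_{u}" for u in range(n_unique)]
        for j in range(n_groups)
    }


design = founder_design()
all_founders = {f for founders in design.values() for f in founders}
print(len(all_founders))  # 17
```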

  16. Real data • International HapMap Project, containing genotypes from four populations.

  17. Conclusion • The authors proposed an HDP mixture model for haplotype inference across multiple populations. • The HDP prior couples multiple heterogeneous populations and facilitates sharing mixture components across multiple infinite mixture models. • In the future, longer SNP sequences will be considered. The HDP can also be generalized to problems in which the group labels are unknown and must be inferred.
