mStruct: A New Admixture Model for Population Structure Inference

This paper introduces mStruct, an admixture model that accounts for both genetic admixing and allele mutations in population structure analysis. The model applies to microsatellite and single nucleotide polymorphism (SNP) data and incorporates a mutation parameter for each locus. The authors present the generative process for mStruct and propose a variational inference algorithm that keeps inference tractable. Experiments on synthetic datasets and on HGDP microsatellite data show that mStruct captures population structure in the presence of both genetic admixture and allele mutation.

Presentation Transcript


  1. mStruct: A New Admixture Model for Inference of Population Structure in Light of Both Genetic Admixing and Allele Mutations • Suyash Shringarpure and Eric Xing, School of Computer Science, Carnegie Mellon University • ICML 2008 • Presented by Haojun Chen

  2. Outline • Background • Structure Model • mStruct Model • Experiment Results • Summary

  3. Background • Allele: one member of a pair or series of different forms of a gene • Population structure analysis aims to shed light on the evolutionary history of modern human populations • Microsatellite and single nucleotide polymorphism (SNP) data: the basis of population structure analysis • State-of-the-art method: Structure

  4. Structure Model • x: microsatellite alleles • A unique set of population-specific multinomial distributions • A vector of multinomial parameters, a.k.a. the allele frequency profile (AP), of the allele distribution at locus i in ancestral population k • Total number of observed marker alleles at locus i • Total number of marker loci • Total number of individuals • Individual-specific admixing coefficient vector

  5. Pitfall of Structure • There is no mutation model relating the alleles of modern individuals to common prototypes in the modern populations • Every unique allele in the modern population is assumed to have a distinct ancestral frequency, rather than possibly being a descendant of some common ancestral allele

  6. mStruct Model • Set of ancestral alleles • Mutation parameter associated with each locus • Frequencies of the ancestral alleles • Total number of ancestral alleles • Microsatellite mutation model • SNP mutation model
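To make the idea of a microsatellite mutation model concrete, here is a minimal Python sketch. It assumes a kernel in which the probability of observing repeat count b given ancestral repeat count a decays as delta ** |b - a|, normalised over a finite candidate set. That functional form, the function name `stepwise_mutation_probs`, and the parameter name `delta` are illustrative assumptions, not the formula from the slide.

```python
def stepwise_mutation_probs(ancestral, delta, candidates):
    """Return P(observed | ancestral) over candidate repeat counts.

    Assumed kernel (illustrative, not taken from the slides):
    weight delta ** |b - ancestral|, normalised over `candidates`.
    delta = 0 means the ancestral allele is always copied unchanged.
    """
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta must lie in [0, 1)")
    weights = {b: delta ** abs(b - ancestral) for b in candidates}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no candidate allele is reachable with this delta")
    return {b: w / total for b, w in weights.items()}


# Probability mass concentrates on the ancestral allele and falls off
# symmetrically with distance in repeat units.
probs = stepwise_mutation_probs(ancestral=10, delta=0.3, candidates=range(7, 14))
```

A larger `delta` spreads mass over more distant repeat counts, which is one way a per-locus mutation parameter could express how variable a locus is.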

  7. Generative Process • Generative process for Structure • Generative process for mStruct: step 2.2 of the Structure process is replaced by a step that accounts for allele mutation
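The two generative processes can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' specification: it assumes a Dirichlet prior on the admixing coefficients, two allele copies per locus, and a user-supplied `mutate` callable standing in for the mutation step. The names `generate_individual`, `sample_dirichlet`, and `one_step_mutation` are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_individual(alpha, alleles, freqs, rng, mutate=None):
    """Generate one genotype under an admixture model.

    alleles[i]  : candidate repeat counts at locus i
    freqs[k][i] : allele frequencies of ancestral population k at locus i
    mutate      : optional callable (allele, locus, rng) -> allele;
                  None gives the plain admixture (Structure-style) process
    """
    theta = sample_dirichlet(alpha, rng)           # admixing coefficients
    populations = list(range(len(alpha)))
    genotype = []
    for i, locus_alleles in enumerate(alleles):
        pair = []
        for _ in range(2):                         # assumed: two copies per locus
            k = rng.choices(populations, weights=theta)[0]
            allele = rng.choices(locus_alleles, weights=freqs[k][i])[0]
            if mutate is not None:                 # mutation-aware variant
                allele = mutate(allele, i, rng)
            pair.append(allele)
        genotype.append(tuple(pair))
    return theta, genotype


def one_step_mutation(allele, locus, rng, rate=0.2):
    """Hypothetical kernel: shift by one repeat unit with probability `rate`."""
    return allele + rng.choice([-1, 1]) if rng.random() < rate else allele


rng = random.Random(0)
alleles = [[8, 9, 10], [12, 14]]
freqs = [
    [[0.7, 0.2, 0.1], [0.9, 0.1]],   # ancestral population 0
    [[0.1, 0.2, 0.7], [0.2, 0.8]],   # ancestral population 1
]
theta, plain = generate_individual([1.0, 1.0], alleles, freqs, rng)
_, mutated = generate_individual([1.0, 1.0], alleles, freqs, rng,
                                 mutate=one_step_mutation)
```

With `mutate=None` every observed allele is drawn directly from a population's frequency profile. Passing a kernel lets an observed allele differ from the sampled ancestral allele, which is the distinction the slide draws between the two models.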

  8. mStruct Model Inference • MCMC: slow • Variational inference for the hidden variables • Variational EM for the hyperparameters

  9. Synthetic Data Twenty microsatellite genotype datasets with 100 individuals from 3 ancestral populations at 50 genotype loci

  10. HGDP Microsatellite Data • Model selection by BIC (Bayesian Information Criterion) score
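As a reminder of how BIC-based model selection works, the sketch below uses the common form BIC = k ln(n) - 2 ln(L), where smaller is better. Some papers use the opposite sign, and with variational inference the log-likelihood is often approximated by a lower bound, so which convention and which likelihood estimate the authors used is an assumption here. The function names and the numbers in `fits` are made up for illustration.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """BIC = k * ln(n) - 2 * ln(L); smaller is better under this convention."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_num_populations(fits, n_obs):
    """fits maps K -> (log_likelihood, n_params); return the K with lowest BIC."""
    return min(fits, key=lambda k: bic(fits[k][0], fits[k][1], n_obs))


# Made-up numbers purely for illustration; not results from the paper.
fits = {2: (-5200.0, 40), 3: (-5000.0, 60), 4: (-4990.0, 80)}
best_k = select_num_populations(fits, n_obs=1000)
```

In this toy example the fit improves only slightly from K = 3 to K = 4 while the parameter count keeps growing, so the penalty term favours K = 3.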

  11. HGDP Microsatellite Data • 1056 individuals from 52 populations at 377 autosomal microsatellite loci • am-spectrum: spectrums of different ancestral populations • gm-spectrum: spectrums of different geographical populations

  12. Contour of Mutation Rates

  13. Summary • mStruct takes into account both genetic admixture and allele mutation effects • mStruct is an extension of LDA that allows noisy observations • A variational inference algorithm developed for mStruct makes inference tractable • Other applications: images, text, and so on
