
GenXHC: A Probabilistic Generative Model for Cross-hybridization Compensation in High-density Genome-wide Microarray Data. Jim Huang (1). Joint work with Quaid Morris (1),(2) , Tim Hughes (2) and Brendan Frey (1),(2).


Presentation Transcript


  1. GenXHC: A Probabilistic Generative Model for Cross-hybridization Compensation in High-density Genome-wide Microarray Data • Jim Huang (1) • Joint work with Quaid Morris (1),(2), Tim Hughes (2) and Brendan Frey (1),(2) • (1) Probabilistic and Statistical Inference Group, University of Toronto • (2) Banting & Best Department of Medical Research, University of Toronto • ISMB 2005

  2. Genome-wide profiling using high-density microarrays • The move towards high-density arrays for genome-wide profiling presents challenges… [Figure labels: coding regions, genome, probes, conditions, expression]

  3. Cross-hybridization in high-density microarrays • As we move to higher-density arrays, cross-hybridization noise becomes significant and unavoidable [Figure: oligonucleotide probes binding an mRNA transcript, labelled "Hybridization" and "Cross-hybridization"]

  4. Cross-hybridization in high-density microarrays (cont’d) • Large cross-hybridization noise component in high-density data!

  5. Cross-hybridization compensation • State-of-the-art methods for cross-hybridization compensation are designed for Affymetrix GeneChips • Affymetrix MAS 5.0 • Robust Multi-array Analysis (RMA/GC-RMA) (1),(2) • (1) Wu, Z. and Irizarry, R.A. (2004) Stochastic models inspired by hybridization theory for short oligonucleotide arrays. Proc. Ninth International Conference on Research in Computational Molecular Biology (RECOMB), March 2004, pp. 98-106. • (2) Irizarry, R.A. et al. (2003) Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, 4, pp. 249-264.

  6. Bilinear model for cross-hybridization • Each probe is assigned a set of cross-hybridizing transcript expression profiles • Each transcript has a hybridization weight λ that determines its contribution [Figure labels: Λ, Z, X]

  7. The probabilistic generative model for cross-hybridization • Model the data probabilistically as X = ΛZ + V, where X = [x_1 x_2 … x_T] is N x T, Z = [z_1 z_2 … z_T] is M x T, Λ is the N x M hybridization matrix, and V is additive noise
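
The shapes in X = ΛZ + V can be checked with a small simulation. This is only a sketch: the toy sizes, the nonnegative exponential draws for Λ and Z, and the Gaussian noise are illustrative assumptions, not distributions stated in the slides. The binary mask `S` stands in for the sparsity structure described on the slide on sparsity of the Λ matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumed): N probes, M transcripts, T conditions.
N, M, T = 6, 3, 4

# Binary sparsity pattern: S[i, j] = 1 if probe i may bind transcript j.
S = (rng.random((N, M)) < 0.4).astype(float)

# Hybridization weights, forced to zero wherever S is zero.
# Exponential draws are an illustrative assumption.
Lam = S * rng.exponential(scale=1.0, size=(N, M))

# Latent transcript expression profiles, one column per condition.
Z = rng.exponential(scale=1.0, size=(M, T))

# Additive noise; Gaussian is assumed here purely for illustration.
V = 0.05 * rng.standard_normal((N, T))

# Observed probe intensities.
X = Lam @ Z + V
print(X.shape)
```

Each row of `X` is a weighted sum of the transcript profiles that the corresponding probe is allowed to bind, plus noise.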

  8. Sparsity of the Λ matrix • Force many of the weights λij to 0 • Denote by S the set of weights which are non-zero: the prior becomes [equation not preserved in transcript]

  9. The probabilistic generative model for cross-hybridization (cont’d) • The probabilistic model p(X,Z,Λ|S) for cross-hybridization is therefore [equation not preserved in transcript]

  10. Variational inference • To perform inference, minimize the KL-divergence with respect to a distribution q for the given probabilistic model p • The optimum is the posterior distribution q(Z,Λ) = p(Z,Λ|X,S) • Difficult to compute exactly! • Use a surrogate which approximates the true posterior
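
For reference, the quantity being minimized can be written out using the standard free-energy identity (in the style of Neal and Hinton), in the symbols of the slide. This is the generic textbook form, not an equation recovered from the talk.

```latex
\begin{align*}
\mathrm{KL}\big(q \,\|\, p(Z,\Lambda \mid X,S)\big)
  &= \int q(Z,\Lambda)\,
     \log\frac{q(Z,\Lambda)}{p(Z,\Lambda \mid X,S)}\; dZ\, d\Lambda \;\ge\; 0, \\
F(q)
  &= \int q(Z,\Lambda)\,
     \log\frac{q(Z,\Lambda)}{p(X,Z,\Lambda \mid S)}\; dZ\, d\Lambda \\
  &= \mathrm{KL}\big(q \,\|\, p(Z,\Lambda \mid X,S)\big) - \log p(X \mid S).
\end{align*}
```

Because log p(X|S) does not depend on q, minimizing F(q) is equivalent to minimizing the KL-divergence, and the divergence is zero exactly when q(Z,Λ) = p(Z,Λ|X,S). Restricting q to a tractable family gives the surrogate.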

  11. Variational EM for approximate inference and parameter estimation • Use exponential distributions parameterized by variational parameters for q • Minimize KL-divergence via variational EM (2),(3) to get the estimate βjt of the transcript expression profiles [variational E-step and M-step equations not preserved in transcript] • (2) Neal, R. M. and Hinton, G. E. (1998) A view of the EM algorithm that justifies incremental, sparse, and other variants. Learning in Graphical Models, Kluwer Academic Publishers, pp. 355-368. • (3) Jaakkola, T. and Jordan, M.I. (2000) Bayesian parameter estimation via variational methods. Statistics and Computing, 10:1, January 2000, pp. 25-37.

  12. Variational Expectation-Maximization algorithm [variational E-step and M-step update equations not preserved in transcript]
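
The E-step and M-step updates themselves are not reproduced in the transcript, so the sketch below does not implement GenXHC's variational EM. As a loose point-estimate stand-in for the same alternating structure, it uses the well-known multiplicative updates for nonnegative matrix factorization (Lee and Seung), with Λ initialized to zero outside S so those entries stay zero. The function name `masked_nmf`, the toy data, and all parameter values are hypothetical.

```python
import numpy as np

def masked_nmf(X, S, n_iter=200, eps=1e-9, seed=0):
    """Fit X ~ Lam @ Z with Lam zero outside the mask S.

    Point-estimate stand-in using multiplicative NMF updates;
    NOT the variational EM updates from the talk. Assumes X >= 0.
    """
    rng = np.random.default_rng(seed)
    N, T = X.shape
    M = S.shape[1]
    # Zeros outside S are fixed points of a multiplicative update.
    Lam = S * rng.random((N, M))
    Z = rng.random((M, T)) + 0.1
    errors = []
    for _ in range(n_iter):
        # Update the transcript profiles with the weights held fixed.
        Z *= (Lam.T @ X) / (Lam.T @ Lam @ Z + eps)
        # Update the hybridization weights with the profiles held fixed.
        Lam *= (X @ Z.T) / (Lam @ Z @ Z.T + eps)
        errors.append(np.linalg.norm(X - Lam @ Z))
    return Lam, Z, errors

# Toy nonnegative data generated from a masked bilinear model.
rng = np.random.default_rng(1)
S = (rng.random((8, 3)) < 0.5).astype(float)
X = (S * rng.random((8, 3))) @ rng.random((3, 5))

Lam, Z, errors = masked_nmf(X, S)
print(errors[0], errors[-1])
```

The point of the sketch is the alternation: each pass refines the latent profiles given the weights, then the weights given the profiles, while the sparsity pattern S is never violated. A variational treatment would track distributions over Z and Λ instead of single values.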

  13. Results • Agilent exon-tiling microarray data with 26,486 60-mer probes across 12 tissue pools • Matched each probe to full-length RefSeq cDNAs via BLAST search to determine the sparsity structure S • Resulting data set contains 9,904 probes matched to 2,905 mouse transcripts
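
As a minimal sketch of how a list of probe-to-transcript matches could be turned into the sparsity structure S, the snippet below builds a binary probe-by-transcript matrix. The identifiers in `hits` are hypothetical placeholders, and how BLAST hits were filtered in the original work is not stated in the slides.

```python
import numpy as np

# Hypothetical (probe, transcript) pairs, e.g. parsed from BLAST output.
hits = [("p1", "NM_A"), ("p1", "NM_B"), ("p2", "NM_B"), ("p3", "NM_C")]

# Fix a row order for probes and a column order for transcripts.
probes = sorted({p for p, _ in hits})
transcripts = sorted({t for _, t in hits})
row = {p: i for i, p in enumerate(probes)}
col = {t: j for j, t in enumerate(transcripts)}

# S[i, j] = 1 means probe i is allowed a non-zero weight on transcript j.
S = np.zeros((len(probes), len(transcripts)), dtype=int)
for p, t in hits:
    S[row[p], col[t]] = 1

print(S)
```

A probe with more than one non-zero entry in its row is one that can cross-hybridize.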

  14. Results (cont’d)

  15. Significance testing of inferred expression profiles • Randomly permute the rows of the S matrix and perform inference • Mean SNR significantly lower for permuted data compared to unpermuted data
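
The permutation step can be sketched as follows, assuming S is stored as a binary probe-by-transcript array. The helper name `permute_rows` and the toy matrix are hypothetical. The inference step and the SNR computation are left out because their definitions are not given in the slides.

```python
import numpy as np

def permute_rows(S, rng):
    """Return a copy of S with its rows in random order."""
    return S[rng.permutation(S.shape[0])]

rng = np.random.default_rng(0)
S = np.array([[1, 1, 0],
              [0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])

S_perm = permute_rows(S, rng)
print(S_perm)
```

Shuffling rows reassigns binding patterns to the wrong probes while keeping the number of probes per transcript unchanged, so any drop in quality can be attributed to the loss of the true probe-transcript structure.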

  16. Gene Ontology-Biological Process (GO-BP) enrichment using denoised data • Perform agglomerative hierarchical clustering and compute a hypergeometric p-value for each cluster to evaluate statistical significance of the clustering • The majority of clusters have increased significance in denoised data compared to clustering using noisy data
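
A hypergeometric enrichment p-value of the kind mentioned above can be computed with the standard library alone. The sketch uses the usual upper-tail definition. The function name and the example counts are hypothetical, and the slides do not say whether any multiple-testing correction was applied.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(at least k annotated genes in a cluster of size n).

    N: genes in the background set
    K: background genes carrying the GO-BP annotation
    n: genes in the cluster
    k: cluster genes carrying the annotation
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Hypothetical counts: 5 of 20 cluster genes annotated, 40 of 1000 overall.
p = hypergeom_pvalue(k=5, n=20, K=40, N=1000)
print(p)
```

A smaller p-value means the annotation is more over-represented in the cluster than random sampling from the background would explain.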

  17. Comparison to Robust Multi-array Analysis • Unlike RMA, GenXHC models the explicit sparse structure of the set of probe-transcript interactions • This increases statistical power when doing functional prediction

  18. Summary • Cross-hybridization compensation using prior knowledge about the transcript population doubles the number of probes on the array • The problem of inferring latent transcript profiles is one of variational inference • Functional annotation using denoised data yields functional categories with higher statistical significance than noisy expression data • Taking into account the set of probe-transcript binding interactions generally yields greater statistical power than ignoring them
