The Infinite Hierarchical Factor Regression Model

Piyush Rai and Hal Daume III, NIPS 2008
Presented by Bo Chen, March 26, 2009

Outline
• Introduction
• The Infinite Hierarchical Factor Regression Model
• Indian Buffet Process and Beta Process
• Experiments
• Summary

Introduction
• Benefits of a latent factor representation:
1. It discovers the latent process underlying the data.
2. It simplifies predictive modeling through a compact data representation. This matters for "large P, small N" problems (far more features P than samples N), where the common rule of thumb N ≥ 10 · d · C (d features, C classes) cannot be met.
• Fundamental advantages over the standard factor analysis (FA) model. The model does not assume:
1. a known number of factors;
2. independent factors;
3. that all features are relevant to the factor analysis.

Model and Algorithm

Graphical Model
$T$ is used to eliminate spurious genes or noise features: $T_p$ determines whether the $p$-th customer (gene) enters the restaurant to eat any dish.
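A minimal generative sketch of how the selection vector acts on the loadings, under several assumptions not taken from the slides: the usual factor model $X = AF + E$, a finite number of factors $K$ with a fixed hypothetical rate `z_rate` standing in for the IBP, a hypothetical selection rate `rho` for $T$, and Gaussian loadings, factor scores, and noise (the hierarchical factor prior is omitted). The function name is hypothetical. The only point is that $T_p = 0$ zeroes row $p$ of the loading matrix, so gene $p$ is pure noise.

```python
import numpy as np


def sample_ihfr_like_data(P=50, N=100, K=8, rho=0.8, z_rate=0.3,
                          noise_sd=0.1, seed=0):
    """Hypothetical finite-K sketch of X = A F + E with a gene-selection mask T."""
    rng = np.random.default_rng(seed)
    T = rng.random(P) < rho                      # T_p = 1: gene p enters the restaurant
    Z = rng.random((P, K)) < z_rate              # binary gene-factor matrix (stand-in for the IBP)
    Z[~T, :] = False                             # spurious genes take no dishes at all
    V = rng.standard_normal((P, K))              # assumed Gaussian loading weights
    A = Z * V                                    # sparse P x K factor loading matrix
    F = rng.standard_normal((K, N))              # assumed Gaussian K x N factor scores
    E = noise_sd * rng.standard_normal((P, N))   # assumed Gaussian observation noise
    return A @ F + E, A, T


X, A, T = sample_ihfr_like_data()
print(X.shape, int(T.sum()), "relevant genes")
```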

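Indian Buffet Process: from latent classes to latent features
• An Indian restaurant with countably infinitely many dishes.
• A finite feature model (Tom Griffiths, 2006): $\pi_k \mid \alpha \sim \mathrm{Beta}(\alpha/K, 1)$ and $z_{ik} \mid \pi_k \sim \mathrm{Bernoulli}(\pi_k)$ for $k = 1, \dots, K$; letting $K \to \infty$ yields the IBP.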
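The finite beta-Bernoulli model can be simulated directly. This sketch assumes feature probabilities drawn from $\mathrm{Beta}(\alpha/K, 1)$ and entries drawn from $\mathrm{Bernoulli}(\pi_k)$; the function name is hypothetical. The expected number of features per object is $K \cdot \frac{\alpha/K}{1+\alpha/K} = \frac{\alpha}{1+\alpha/K}$, which tends to $\alpha$ as $K \to \infty$, so each object keeps a finite number of active features even with infinitely many dishes.

```python
import numpy as np


def finite_feature_model(N, K, alpha, seed=0):
    """Finite latent feature model: pi_k ~ Beta(alpha/K, 1), z_ik ~ Bernoulli(pi_k)."""
    rng = np.random.default_rng(seed)
    pi = rng.beta(alpha / K, 1.0, size=K)         # one "popularity" per dish
    Z = (rng.random((N, K)) < pi).astype(int)     # customer i takes dish k w.p. pi_k
    return Z


Z = finite_feature_model(N=50, K=200, alpha=3.0)
print(Z.sum(axis=1).mean())   # random; its expectation is alpha / (1 + alpha / K)
```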

Differences between DP and IBP
Figure: IBP "class" matrix vs. DP class matrix.
Different styles match different problems: 1. latent features; 2. clustering; 3. others.
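To make the contrast concrete, here is a sketch that draws a DP class matrix through the Chinese restaurant process, a standard sequential construction of the DP partition that the slide does not itself show. Each object sits at exactly one table, so every row has a single 1, whereas an IBP matrix allows any number of ones per row. The function name and the value of `alpha` are illustrative.

```python
import numpy as np


def crp_class_matrix(N, alpha, seed=0):
    """Chinese restaurant process draw returned as an N x K one-hot class matrix."""
    rng = np.random.default_rng(seed)
    counts = []                                   # customers seated at each table
    labels = []
    for i in range(N):
        # existing table k w.p. n_k / (i + alpha), new table w.p. alpha / (i + alpha)
        probs = np.array(counts + [alpha], dtype=float) / (i + alpha)
        k = int(rng.choice(len(probs), p=probs))
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    Z = np.zeros((N, len(counts)), dtype=int)
    Z[np.arange(N), labels] = 1                   # one class per object
    return Z


Z_dp = crp_class_matrix(N=20, alpha=1.5)
print(Z_dp.sum(axis=1))   # every entry is 1
```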

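Two-Parameter Finite Model (Z. Ghahramani et al., 2006)
• The first customer samples $\mathrm{Poisson}(\alpha)$ dishes.
• The $i$-th customer samples each previously sampled dish $k$ with probability $m_k/(\beta + i - 1)$, where $m_k$ is the number of previous customers who took dish $k$, then samples $\mathrm{Poisson}(\alpha\beta/(\beta + i - 1))$ new dishes.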
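The restaurant construction translates almost line for line into a sampler. A sketch, assuming the two-parameter scheme with mass parameter $\alpha$ and concentration parameter $\beta$; with $\beta = 1$ it reduces to the one-parameter IBP. The function name is hypothetical.

```python
import numpy as np


def two_parameter_ibp(N, alpha, beta, seed=0):
    """Sequential (restaurant) draw from the two-parameter IBP; returns an N x K+ binary matrix."""
    rng = np.random.default_rng(seed)
    m = []        # m[k] = number of previous customers who took dish k
    rows = []
    for i in range(1, N + 1):
        denom = beta + i - 1
        # each previously sampled dish k is taken with probability m_k / (beta + i - 1)
        old = [int(rng.random() < mk / denom) for mk in m]
        # then Poisson(alpha * beta / (beta + i - 1)) new dishes; for i = 1 this is Poisson(alpha)
        n_new = int(rng.poisson(alpha * beta / denom))
        for k, z in enumerate(old):
            m[k] += z
        m.extend([1] * n_new)
        rows.append(old + [1] * n_new)
    Z = np.zeros((N, len(m)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z


Z = two_parameter_ibp(N=10, alpha=3.0, beta=2.0)
print(Z)
```

Each customer still takes $\mathrm{Poisson}(\alpha)$ dishes on average; a larger $\beta$ spreads those choices over more distinct dishes.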

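Beta Process vs. IBP
• Beta process: $B \sim \mathrm{BP}(c, B_0)$ with total mass $\gamma = B_0(\Omega)$, and $Z_i \mid B \sim \mathrm{BeP}(B)$. Integrating out $B$ gives the IBP:
• the first customer samples $\mathrm{Poisson}(\gamma)$ dishes;
• the $i$-th customer samples each previously sampled dish $k$ with probability $m_k/(c + i - 1)$, then samples $\mathrm{Poisson}(c\gamma/(c + i - 1))$ new dishes.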
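One way to see the connection numerically is a finite approximation of the beta process, assumed here in the form $\pi_k \sim \mathrm{Beta}(c\gamma/K,\ c(1-\gamma/K))$ for $k = 1, \dots, K$ with $K > \gamma$, followed by Bernoulli-process draws. Each $\pi_k$ has mean $\gamma/K$, so each row has $\gamma$ ones in expectation, matching the $\mathrm{Poisson}(\gamma)$ dishes of the first customer; as $K \to \infty$ the draws behave like the two-parameter IBP with $c$ in the role of $\beta$ and $\gamma$ in the role of $\alpha$. Names and parameter values are illustrative.

```python
import numpy as np


def finite_beta_process(N, K, c, gamma, seed=0):
    """K-atom approximation of BP(c, B0) with mass gamma, then N Bernoulli-process draws."""
    if K <= gamma:
        raise ValueError("need K > gamma so both Beta parameters are positive")
    rng = np.random.default_rng(seed)
    pi = rng.beta(c * gamma / K, c * (1.0 - gamma / K), size=K)  # atom weights, mean gamma / K
    Z = (rng.random((N, K)) < pi).astype(int)                    # z_ik ~ Bernoulli(pi_k)
    return pi, Z


pi, Z = finite_beta_process(N=50, K=200, c=1.0, gamma=3.0)
print(Z.sum(axis=1).mean())   # random; its expectation is gamma
```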

Hierarchical Factor Prior
• Kingman's coalescent: a distribution over the genealogy of a countably infinite set of individuals, used to construct a tree structure over the factors.
• Brownian diffusion: a Markov process on this tree; each node carries a message (mean and covariance).
Y. W. Teh, H. Daume III, and D. M. Roy. Bayesian Agglomerative Clustering with Coalescents. In NIPS, 2008.
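A generative sketch of both pieces, assuming the standard Kingman coalescent (with $m$ active lineages, the time to the next merger is exponential with rate $m(m-1)/2$ and the merging pair is chosen uniformly, time running backward from 0) and isotropic Brownian motion whose variance grows by the elapsed time along each branch. The $N(0, I)$ prior at the root and the function names are assumptions for illustration; the mean/covariance messages used during inference are not shown. At least two leaves are required.

```python
import numpy as np


def kingman_coalescent(n, seed=0):
    """Merges (left, right, parent, time) of a Kingman coalescent over leaves 0..n-1 at time 0."""
    rng = np.random.default_rng(seed)
    active = list(range(n))
    t, next_id, merges = 0.0, n, []
    while len(active) > 1:
        m = len(active)
        # each of the m(m-1)/2 pairs coalesces at rate 1; time runs backward (negative)
        t -= rng.exponential(1.0 / (m * (m - 1) / 2))
        i, j = rng.choice(m, size=2, replace=False)
        left, right = active[i], active[j]
        active = [a for a in active if a not in (left, right)] + [next_id]
        merges.append((left, right, next_id, t))
        next_id += 1
    return merges


def brownian_diffusion(merges, n, dim, seed=0):
    """Diffuse from the root down: child ~ N(parent, (t_child - t_parent) I); returns leaf values."""
    rng = np.random.default_rng(seed)
    times = {k: 0.0 for k in range(n)}                  # leaves sit at time 0
    for _, _, parent, t in merges:
        times[parent] = t
    root = merges[-1][2]
    values = {root: rng.standard_normal(dim)}           # assumed N(0, I) prior at the root
    for left, right, parent, _ in reversed(merges):     # parents are visited before children
        for child in (left, right):
            var = times[child] - times[parent]          # positive branch length
            values[child] = values[parent] + np.sqrt(var) * rng.standard_normal(dim)
    return np.stack([values[k] for k in range(n)])      # n x dim leaf values


merges = kingman_coalescent(n=6, seed=1)
leaves = brownian_diffusion(merges, n=6, dim=2, seed=1)
print(len(merges), leaves.shape)   # 5 (6, 2)
```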

Feature Selection Prior
• Some genes are spurious: before selecting dishes, these "spurious" customers should leave the restaurant.

Provided by Piyush Rai

Experimental results
• E. coli data: 100 samples, 50 genes, 8 underlying factors.
• Breast cancer data: 251 samples, 226 genes, 5 underlying factors.
• The hierarchy can be used to find factors in order of their prominence.
• Hierarchical modeling gives better predictive performance on the factor regression task.
• The factor hierarchy leads to faster convergence, since most unlikely configurations are never visited: they are ruled out by the constraints the hierarchy imposes.

Figure: Comparison of factor loading matrices learned by different methods. Panels: ground truth; NIPS method; sparse BPFA on factor score (VB); sparse BPFA on factor loading (VB).

Factor Regression
Training and test data are combined, and the test responses are treated as missing values to be imputed.
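The "test responses as missing values" idea can be illustrated without the full Bayesian machinery. This sketch swaps in a simpler technique, iterative low-rank SVD imputation (often called hard-impute), in place of the paper's factor model: the response is appended as an extra row that shares the factor scores with the genes, the test responses are masked, and the missing entries are repeatedly replaced by a rank-$K$ reconstruction. Data, sizes, and function names are synthetic and hypothetical.

```python
import numpy as np


def impute_low_rank(M, observed, rank, n_iter=200):
    """Fill entries where `observed` is False by iterated rank-`rank` SVD reconstruction.

    Illustrative stand-in (hard-impute), not the Bayesian factor regression sampler.
    """
    # initialise missing entries with the mean of the observed entries in their row
    row_means = np.nanmean(np.where(observed, M, np.nan), axis=1, keepdims=True)
    filled = np.where(observed, M, row_means)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        filled = np.where(observed, M, low_rank)   # observed entries stay fixed
    return filled


rng = np.random.default_rng(0)
P, N, K, n_train = 30, 60, 3, 40
F = rng.standard_normal((K, N))                    # factor scores shared by genes and response
X = rng.standard_normal((P, K)) @ F                # synthetic gene expression (P x N)
y = rng.standard_normal((1, K)) @ F                # synthetic response driven by the same factors
M = np.vstack([X, y]) + 0.01 * rng.standard_normal((P + 1, N))
observed = np.ones(M.shape, dtype=bool)
observed[-1, n_train:] = False                     # test responses are treated as missing
y_hat = impute_low_rank(M, observed, rank=K)[-1, n_train:]
print(np.abs(y_hat - M[-1, n_train:]).mean())      # mean absolute error on the test responses
```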

The Existing Similar FA Models
• Putting a binary matrix on the factor score matrix:
  David Knowles and Zoubin Ghahramani. Infinite Sparse Factor Analysis and Infinite Independent Components Analysis, ICA 2007.
  John Paisley et al. Nonparametric Factor Analysis with Beta Process Priors, in submission, 2009.
• Putting a binary matrix on the factor loading matrix:
  Piyush Rai and Hal Daume III. The Infinite Hierarchical Factor Regression Model, NIPS 2008.
Summary:
1. For "large P, small N" problems, the first approach is faster, since it only learns the small K×N factor score matrix. With an MCMC solution, the second approach has difficulty handling problems with tens of thousands of genes.
2. The second approach can explain the relationship between genes and factors (pathways).
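The cost argument in point 1 can be made concrete by counting the binary indicators each approach must sample in one MCMC sweep. The sizes below are hypothetical, chosen to resemble a "large P, small N" gene expression problem, and the count ignores every other parameter in the models.

```python
def binary_entries(P, N, K):
    """Number of binary indicators for each placement of the sparsity mask."""
    on_scores = K * N      # mask on the K x N factor score matrix
    on_loadings = P * K    # mask on the P x K factor loading matrix
    return on_scores, on_loadings


print(binary_entries(P=10_000, N=100, K=10))   # (1000, 100000)
```

With P much larger than N, the loading-side mask is P/N times larger, which is where the slower MCMC comes from.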

New Developments of the IBP
• F. Doshi, K. T. Miller, J. Van Gael and Y. W. Teh. Variational Inference for the Indian Buffet Process, AISTATS 2009.
• J. Van Gael, Y. W. Teh and Z. Ghahramani. The Infinite Factorial Hidden Markov Model, NIPS 2008.
• K. A. Heller and Z. Ghahramani. A Nonparametric Bayesian Approach to Modeling Overlapping Clusters, AISTATS 2007.
