Non-Negative Matrix Factorization for Statistical Analysis
Stan Young, Paul Fogel, Doug Hawkins
NISS
MPC Vienna, 11 July 2007

Outline
1. Introduction
2. Robust singular value decomposition
3. Non-negative matrix factorization
4. Inference with NMF

Data Blocks (Zoo): 3-Way, PCA, Linear Regression.
Two-way tables of data are ubiquitous (PCA).
One response with a table of predictors is common (linear regression).
Multiple two-way tables are becoming important:
gene expression, proteomics, metabolomics.
Factor analysis of multiple data matrices.
Horst 1961, 1965; Kettenring, J. 1971.
Pittman, Sacks, Young. 2001.
See also www.niss.org/PowerArray
Permute the rows and columns to find patterns.
X = λ · LHE′ · RHE + E (LHE, RHE: left- and right-hand eigenvectors; E: residual)
y = bx + e
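The rank-one model above can be sketched with NumPy's SVD, assuming a complete table with no missing cells. Here lhe and rhe are 1-D arrays, so np.outer(lhe, rhe) plays the role of LHE′ · RHE. Sorting rows and columns by their eigenvector scores is one simple way to carry out the permutation that exposes patterns. The helper names rank_one_svd and permute_by_eigenvectors are hypothetical, and the toy matrix is illustrative, not data from the talk.

```python
import numpy as np

def rank_one_svd(X):
    """Split X into lam * outer(lhe, rhe) + E using the leading singular triple."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    lam, lhe, rhe = s[0], U[:, 0], Vt[0, :]
    E = X - lam * np.outer(lhe, rhe)  # residual left after one component
    return lam, lhe, rhe, E

def permute_by_eigenvectors(X, lhe, rhe):
    """Reorder rows and columns by their scores on the leading eigenvectors."""
    row_order = np.argsort(lhe)
    col_order = np.argsort(rhe)
    return X[np.ix_(row_order, col_order)], row_order, col_order

# Toy table: an exact rank-one pattern with rows and columns shuffled.
rng = np.random.default_rng(0)
X = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 5.0])
X = X[rng.permutation(4)][:, rng.permutation(3)]

lam, lhe, rhe, E = rank_one_svd(X)
X_sorted, rows, cols = permute_by_eigenvectors(X, lhe, rhe)
```

For this exact rank-one table the residual E is zero up to rounding, and X_sorted is monotone along both rows and columns (increasing or decreasing, depending on the arbitrary sign of the singular vectors).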
The missing cell is colored yellow.
The plot suggests one or two components.
Judges are divided into the following groups: 1-3, 4-7, 8-11, 12-26, 27-32.
Wines are divided into the following groups: 1-4, 5-17, 18-27, 28-41, 42-46.
Most judges were consistent.
Three judges are at odds with the rest.
The wines divide into six classes; six wines group very well.
There is an apparent interaction between wines and judges.
One eigensystem captures most of the variance.
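How much of the variance one eigensystem captures is commonly read from the squared singular values, which is also what a scree plot displays. A minimal sketch follows; whether to center the columns first is an assumption left as a parameter, and variance_shares is a hypothetical name.

```python
import numpy as np

def variance_shares(X, center=True):
    """Fraction of the total sum of squares captured by each SVD component."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0) if center else X  # optional column centering
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, largest first
    return s**2 / np.sum(s**2)

# Toy table: rank-one signal plus a small perturbation (illustrative only).
X = np.outer([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 3.0])
X[0, 0] += 0.1
shares = variance_shares(X, center=False)
cumulative = np.cumsum(shares)
```

Here the first component carries nearly all of the sum of squares; cumulative gives the running total used to decide how many components to keep.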
The elements of the SVD right-hand eigenvector (RHE) come from a composite:
they are obtained by regression.
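The point that right-hand eigenvector elements come from regression can be illustrated with alternating least squares: holding the row scores fixed, each column's element is the no-intercept slope from regressing that column on the row scores (the y = bx + e form), and the row step is the mirror image. A missing cell, like the yellow one, is simply left out of the regressions that touch it. This sketch assumes ordinary least squares; a robust SVD would replace each regression with a robust fit. The function name als_rank_one is hypothetical.

```python
import numpy as np

def als_rank_one(X, n_iter=100):
    """Rank-one fit X ~ lam * outer(lhe, rhe) by alternating no-intercept regressions.

    NaN cells are treated as missing and skipped in every regression.
    """
    mask = ~np.isnan(X)
    Xf = np.where(mask, X, 0.0)   # missing cells contribute zero to the sums
    u = np.ones(X.shape[0])       # starting row scores
    for _ in range(n_iter):
        # Column step: v_j = sum_i u_i x_ij / sum_i u_i^2 over observed cells i.
        v = (Xf * u[:, None]).sum(axis=0) / (mask * u[:, None] ** 2).sum(axis=0)
        # Row step: u_i = sum_j v_j x_ij / sum_j v_j^2 over observed cells j.
        u = (Xf * v[None, :]).sum(axis=1) / (mask * v[None, :] ** 2).sum(axis=1)
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return nu * nv, u / nu, v / nv

# Toy table: exact rank-one pattern with one missing cell (illustrative only).
X = np.outer([1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0])
X[1, 2] = np.nan
lam, lhe, rhe = als_rank_one(X)
fitted = lam * np.outer(lhe, rhe)  # fitted[1, 2] fills in the missing cell (12.0 here)
```

Because the toy table is exactly rank one, the fit reproduces every observed cell and supplies 2 × 6 = 12 for the missing one.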
NMF commits one vector to each mechanism.
“For such databases there is a generative model in terms of ‘parts’, and NMF correctly identifies the ‘parts’.”
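To make the ‘parts’ idea concrete, the sketch below fits X ≈ W H with W and H non-negative using Lee and Seung's multiplicative updates for squared Frobenius error. This is one standard NMF algorithm, named here as an assumption; it is not necessarily the algorithm in the authors' software. Each column of W is one ‘part’ (one vector per mechanism), and H holds the non-negative weights. The function name nmf and its defaults are hypothetical.

```python
import numpy as np

def nmf(X, k, n_iter=500, eps=1e-10, seed=0):
    """Factor non-negative X (n x p) as W @ H with W (n x k) >= 0 and H (k x p) >= 0.

    Lee-Seung multiplicative updates for ||X - W H||_F^2; eps guards against
    division by zero. Entries stay non-negative because every factor is >= 0.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, p)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy data built from two non-overlapping "parts" (illustrative only).
X = np.array([[1.0, 1.0, 0.0, 0.0],
              [2.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 3.0, 3.0],
              [0.0, 0.0, 1.0, 1.0]])
W, H = nmf(X, k=2)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

Unlike SVD vectors, which mix positive and negative loadings, every entry of W and H here is non-negative, so each component can only add to the reconstruction.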
NMF clusters samples correctly.
It also finds an additional subgroup of ALL-B.
Brunet et al. (2004). PNAS 101, 4164–4169
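Brunet et al. (2004) assign each sample to the metagene (row of H, with X arranged genes × samples) on which it has the largest coefficient, and judge stability with a consensus matrix that averages sample-by-sample connectivity over several NMF runs. The sketch below shows only those two steps, starting from an H matrix; the function names and the toy H are hypothetical.

```python
import numpy as np

def cluster_samples(H):
    """Label each sample (column of H) by the metagene with the largest coefficient."""
    return np.argmax(H, axis=0)

def connectivity(labels):
    """C[i, j] = 1 if samples i and j fall in the same cluster, else 0."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(int)

def consensus(label_runs):
    """Average connectivity over runs (e.g. NMF fits from different random starts)."""
    return np.mean([connectivity(lab) for lab in label_runs], axis=0)

# Toy H for 2 metagenes x 4 samples (illustrative only).
H = np.array([[0.9, 0.8, 0.1, 0.0],
              [0.1, 0.2, 0.7, 0.9]])
labels = cluster_samples(H)
```

Because connectivity only compares labels for equality, runs whose cluster numbers are permuted give the same matrix, so averaging into a consensus matrix needs no label matching.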
Figure: ALL-B1 and ALL-B2 genes. Cluster 1 (ALL-B1) annotations: MHC class II; MHC class I & II. Cluster 3 (ALL-B2) annotations: DNA repair and …; cell growth and …. Gene counts shown: 10 genes (p = 0.00019), 16 genes. Other p-values shown: 0.00054, 0.00018, 0.00260, 0.01519.
ALL-B2 genes are up-regulated, pointing to higher rates of transcription and replication
and a more proliferative nature compared with ALL-B1.
Non-negative matrix factorization is used to group genes.
The testing alpha is allocated over these groups/vectors.
Within each group, genes are tested sequentially;
there is no multiple-testing adjustment within a group!
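A minimal sketch of the allocate-then-test-sequentially scheme. It assumes a weighted Bonferroni split of alpha across groups (equal by default) and that genes within each group are already in their testing order, for example sorted by loading on the group's NMF vector; both are assumptions, and the procedure in Fogel et al. (2007) may differ in detail. Within a group, testing stops at the first gene that fails, which is what lets each gene be tested at the group's full alpha without further adjustment. The function name and p-values are hypothetical.

```python
def allocate_and_test(groups, alpha=0.05, weights=None):
    """Split alpha across gene groups, then run a fixed-sequence test in each group.

    groups:  dict of group name -> list of (gene, p_value) in the chosen testing order.
    weights: optional dict of group name -> non-negative weight (default: equal split).
    Returns a dict of group name -> list of genes declared significant.
    """
    names = list(groups)
    if weights is None:
        weights = {g: 1.0 for g in names}
    total = sum(weights[g] for g in names)
    significant = {}
    for g in names:
        alpha_g = alpha * weights[g] / total  # this group's share of alpha
        significant[g] = []
        for gene, p in groups[g]:
            if p > alpha_g:                   # first non-rejection ends this group
                break
            significant[g].append(gene)       # tested at the full alpha_g, no adjustment
    return significant

# Hypothetical p-values for two NMF-derived groups, already in testing order.
groups = {
    "vector1": [("g1", 0.001), ("g2", 0.010), ("g3", 0.200), ("g4", 0.0001)],
    "vector2": [("g5", 0.030), ("g6", 0.001)],
}
result = allocate_and_test(groups, alpha=0.05)
```

With alpha = 0.05 split equally, each group is tested at 0.025: the first group yields g1 and g2 and stops at g3 (so g4 is never tested despite its small p-value), while the second group stops immediately at g5.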
Fogel et al. (2007) Bioinformatics
Genes 1-5: up-regulated by T1
Genes 6-10: up-regulated by T2
Genes 11-20: up-regulated by T1 and T2
NB: Genes within a mechanism are expected to be correlated.
SVD is the basis for most linear statistical systems.
Non-negative matrix factorization will become increasingly important.
Data sets are getting much bigger.
We are seeing complex, multi-block data sets.
We need good software to expand data analysis.
Stan Young
Paul Fogel
NMF Code and papers at www.niss.org/irMF
Analysis of “L” design: www.niss.org/PowerArray
NMF roundtable luncheon at JSM 2007.
See also: www.niss.org/PowerMV