Exploring Microarray Data
Javier Cabrera
Outline: Exploratory Analysis Steps. Microarray Data as Multivariate Data. Dimension Reduction. Correlation Matrix. Principal Components. Geometrical Interpretation. Linear Algebra Basics. How Many Principal Components? Biplots.
Linear algebra lets us write the computations in a convenient way. Since the number of genes (G) is very large, we need to write the computations so that we never generate any GxG matrices.
Notice that the rows of X are the genes = variables.
Singular Value Decomposition: X = U D V’
Dimensions: X is Gxp, U is Gxp, D is pxp, V’ is pxp.
In standard multivariate analysis X would be transposed so that the variables correspond to the columns of X. Done that way, however, D and V would both be GxG matrices, which is exactly what we are trying to avoid.
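The shapes above can be checked with a minimal sketch in Python with NumPy. The matrix sizes and the random data are assumptions made for illustration only; they are not taken from the slides. The thin SVD (`full_matrices=False`) returns a Gxp matrix U, the p singular values, and a pxp matrix V’, so no GxG matrix is ever formed.

```python
import numpy as np

# Hypothetical sizes: many genes (rows), few arrays (columns).
G, p = 2000, 10
rng = np.random.default_rng(0)
X = rng.normal(size=(G, p))  # synthetic stand-in for an expression matrix

# Thin SVD: U is G x p, d holds the p singular values, Vt is V' (p x p).
U, d, Vt = np.linalg.svd(X, full_matrices=False)
D = np.diag(d)

print(U.shape, D.shape, Vt.shape)

# The three factors reproduce X up to floating-point error.
X_rebuilt = U @ D @ Vt
print(np.allclose(X, X_rebuilt))
```

With `full_matrices=True` NumPy would instead return a GxG matrix U, which is the kind of object the slides are trying to avoid.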
                         Comp.1   Comp.2   Comp.3   Comp.4   Comp.5   Comp.6   Comp.7   Comp.8
Standard deviation      4.70972  4.50705  3.87907   1.8340   1.6120   1.5813   1.4073   1.3201
Proportion of Variance  0.24260  0.22217  0.16457   0.0367   0.0284   0.0273   0.0216   0.0190
Cumulative Proportion   0.24260  0.46477  0.62934   0.6661   0.6945   0.7219   0.7435   0.7626

                         Comp.9  Comp.10  Comp.11  Comp.12  Comp.13  Comp.14  Comp.15  Comp.16
Standard deviation      1.27977  1.21854  1.10437   1.0549   1.0238   0.9722   0.9511   0.9177
Proportion of Variance  0.01791  0.01623  0.01333   0.0121   0.0114   0.0103   0.0098   0.0092
Cumulative Proportion   0.78054  0.79678  0.81012   0.8222   0.8337   0.8440   0.8539   0.8632
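One common way to read such a table is to keep the smallest number of components whose cumulative proportion of variance reaches a chosen threshold. The sketch below uses the proportions reported above; the 80% threshold is an assumption chosen for illustration, not a rule stated in the slides.

```python
from itertools import accumulate

# Proportion of variance for Comp.1 to Comp.16, copied from the table above.
prop_var = [0.24260, 0.22217, 0.16457, 0.0367, 0.0284, 0.0273, 0.0216, 0.0190,
            0.01791, 0.01623, 0.01333, 0.0121, 0.0114, 0.0103, 0.0098, 0.0092]

# Running total of the proportions (the "Cumulative Proportion" row,
# up to rounding of the printed values).
cumulative = list(accumulate(prop_var))

threshold = 0.80  # assumed cut-off, for illustration only

# Smallest number of components whose cumulative proportion reaches the threshold.
k = next(i + 1 for i, c in enumerate(cumulative) if c >= threshold)
print(k)  # 11
```

The first three components already account for about 63% of the variance, which is consistent with plotting PC1, PC2 and PC3 together.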
Principal Components Graph: PC3 vs. PC2 vs. PC1
The four tumor
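From the SVD X = U D V’, keeping only the first two components gives the rank-2 approximation X2 = U2 D2 V2’.
Let A = U2 D2^a and B = V2 D2^b with a + b = 1, so that X2 = A B’.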
The biplot is a graphical display of X in which two sets of markers are plotted.
One set of markers a1,…,aG represents the rows of X
The other set of markers, b1,…, bp, represents the columns of X.
The biplot is the graph of A and B together in the same graph.
If the number of genes is too large, it is better to omit the gene markers from the biplot and plot them in a separate graph, or to invert the graph.
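The construction of the two sets of biplot markers can be sketched in Python with NumPy as follows. The data are synthetic and the sizes and the choice a = b = 0.5 are assumptions for illustration; the slides only require a + b = 1. Rows of A are the gene markers and rows of B are the column markers.

```python
import numpy as np

# Hypothetical sizes and synthetic data, for illustration only.
G, p = 500, 8
rng = np.random.default_rng(1)
X = rng.normal(size=(G, p))

# Thin SVD, then keep the first two components.
U, d, Vt = np.linalg.svd(X, full_matrices=False)
U2 = U[:, :2]        # G x 2
d2 = d[:2]           # two largest singular values
V2 = Vt[:2, :].T     # p x 2

# Split the singular values between the two sets of markers (a + b = 1).
a = 0.5              # assumed choice
b = 1.0 - a
A = U2 * d2**a       # same as U2 @ diag(d2**a): row markers a1,...,aG
B = V2 * d2**b       # same as V2 @ diag(d2**b): column markers b1,...,bp

# A B' reproduces the rank-2 approximation X2 = U2 D2 V2'.
X2 = U2 @ np.diag(d2) @ V2.T
print(A.shape, B.shape)
print(np.allclose(A @ B.T, X2))
```

Plotting the rows of A and the rows of B on the same axes gives the biplot; when G is large, the rows of A can be drawn in a separate panel.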
GGobi display finding four clusters of tumors using the PP index on the set of 63 cases. The main panel shows the two-dimensional projection selected by the PP index, with the four clusters in different colors and glyphs. The top left panel shows the main controls, and the bottom left panel displays the controls and the graph of the PP index that is being optimized. The graph shows the index value for a sequence of projections ending at the current one.