Exploring Microarray Data
Javier Cabrera

Outline:
- Exploratory Analysis Steps
- Microarray Data as Multivariate Data
- Dimension Reduction
- Correlation Matrix
- Principal Components
- Geometrical Interpretation
- Linear Algebra Basics
- How Many Principal Components
- Biplots
Exploring Microarray data
Microarray data as Multivariate Data
Dimension Reduction: gene subset selection
Principal Components: Geometrical Intuition
Linear algebra lets us write these computations compactly. Because the number of genes, G, is very large, the computations must be arranged so that no G×G matrix is ever formed. For example, with G = 10,000 genes, a single G×G matrix of 8-byte numbers takes 800 MB.
Note that the rows of X are the genes, which play the role of variables, and its p columns are the samples.
Singular Value Decomposition: $X = U D V'$, where $X$ is $G \times p$, $U$ is $G \times p$, $D$ is $p \times p$, and $V$ is $p \times p$.
In standard multivariate analysis, X would be transposed so that the variables correspond to its columns. With that orientation, D and V would both be G×G matrices, which is exactly what we are trying to avoid.
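Here is a minimal NumPy sketch of this computation. It assumes that X holds expression values with genes in rows and samples in columns, and that each gene is centered across the samples first. The slides do not state the preprocessing; for a correlation-matrix analysis, each row would also be scaled to unit variance. With `full_matrices=False`, `numpy.linalg.svd` returns the thin factors, so the largest array ever built is G×p. The helper name `pca_genes_by_samples` and the simulated 5000×63 matrix are hypothetical.

```python
import numpy as np

def pca_genes_by_samples(X):
    """PCA of a G x p expression matrix (rows = genes) via the thin SVD.

    No G x G matrix is formed: the SVD returns U (G x p),
    d (length p, the diagonal of D) and Vt = V' (p x p).
    """
    X = np.asarray(X, dtype=float)
    # Center each gene (row) across the p samples (an assumed preprocessing step).
    Xc = X - X.mean(axis=1, keepdims=True)
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Sample coordinates on the principal components: V D (p x p),
    # which equals Xc' U because U has orthonormal columns.
    scores = Vt.T * d
    return U, d, Vt, scores

# Hypothetical sizes: many genes, few arrays.
rng = np.random.default_rng(0)
G, p = 5000, 63
X = rng.normal(size=(G, p))
U, d, Vt, scores = pca_genes_by_samples(X)
print(U.shape, d.shape, Vt.shape, scores.shape)  # (5000, 63) (63,) (63, 63) (63, 63)
```

Because $X_c X_c' = U D^2 U'$, the component variances in gene space are $d_k^2/(p-1)$. They are therefore available without forming the G×G covariance matrix.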
Principal Components Table
                        Comp.1   Comp.2   Comp.3   Comp.4   Comp.5   Comp.6   Comp.7   Comp.8
Standard deviation      4.70972  4.50705  3.87907  1.8340   1.6120   1.5813   1.4073   1.3201
Proportion of Variance  0.24260  0.22217  0.16457  0.0367   0.0284   0.0273   0.0216   0.0190
Cumulative Proportion   0.24260  0.46477  0.62934  0.6661   0.6945   0.7219   0.7435   0.7626

                        Comp.9   Comp.10  Comp.11  Comp.12  Comp.13  Comp.14  Comp.15  Comp.16
Standard deviation      1.27977  1.21854  1.10437  1.0549   1.0238   0.9722   0.9511   0.9177
Proportion of Variance  0.01791  0.01623  0.01333  0.0121   0.0114   0.0103   0.0098   0.0092
Cumulative Proportion   0.78054  0.79678  0.81012  0.8222   0.8337   0.8440   0.8539   0.8632
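The three rows of this table follow directly from the singular values of the SVD. The NumPy sketch below computes them. It assumes the singular values come from a row-centered G×p matrix and uses the divisor p - 1 for the variances. Other conventions divide by p; the proportions are unchanged either way.

The sketch also applies one common heuristic for deciding how many components to keep: the smallest number whose cumulative proportion reaches a threshold. The 80% cutoff and the helper names `pca_summary` and `n_components_for` are illustrative assumptions, not the author's rule.

```python
import numpy as np

def pca_summary(singular_values, n_samples):
    """Standard deviation, proportion and cumulative proportion of variance.

    Assumes the singular values come from the SVD of a row-centered G x p
    matrix with p = n_samples, so component k has variance d_k**2 / (p - 1).
    """
    d = np.asarray(singular_values, dtype=float)
    variances = d ** 2 / (n_samples - 1)
    sd = np.sqrt(variances)                 # "Standard deviation" row
    prop = variances / variances.sum()      # "Proportion of Variance" row
    cum = np.cumsum(prop)                   # "Cumulative Proportion" row
    return sd, prop, cum

def n_components_for(cumulative, threshold=0.80):
    """Smallest number of components whose cumulative proportion reaches threshold."""
    hits = np.nonzero(np.asarray(cumulative) >= threshold)[0]
    if hits.size == 0:
        raise ValueError("threshold not reached by the components supplied")
    return int(hits[0]) + 1

# Cumulative proportions copied from the table (Comp.1 to Comp.16).
cum_table = [0.24260, 0.46477, 0.62934, 0.6661, 0.6945, 0.7219, 0.7435, 0.7626,
             0.78054, 0.79678, 0.81012, 0.8222, 0.8337, 0.8440, 0.8539, 0.8632]
print(n_components_for(cum_table, 0.80))  # 11
```

An 80% cutoff selects 11 components here. By contrast, the sharp drop in standard deviation after Comp.3 (3.87907 to 1.8340) is the kind of elbow a scree-plot reading would stop at. Which rule to use is a judgment call.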
Principal Components Graph: PC3 vs. PC2 vs. PC1
The four tumor
Biplots: Linear Algebra
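From the SVD, $X = U D V'$. Keeping only the first two singular values gives the rank-2 approximation $X_2 = U_2 D_2 V_2'$. Here $U_2$ ($G \times 2$) and $V_2$ ($p \times 2$) hold the first two columns of $U$ and $V$, and $D_2$ is the $2 \times 2$ diagonal matrix of the two largest singular values.
Let $A = U_2 D_2^{a}$ and $B = V_2 D_2^{b}$ with $a + b = 1$. Then $A B' = U_2 D_2^{a+b} V_2' = X_2$, so $X \approx A B'$.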
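The factorization can be checked numerically. The NumPy sketch below builds A and B from the first two singular triplets for several choices of a, and confirms that AB' reproduces the rank-2 approximation X_2. It assumes X has already been centered as in the principal-components step. The helper name `biplot_factors` and the simulated 200×10 matrix are hypothetical.

```python
import numpy as np

def biplot_factors(X, a=0.5):
    """Biplot markers from the first two singular triplets of X (G x p).

    Returns A = U2 D2^a (G x 2, one row per gene) and B = V2 D2^b with
    b = 1 - a (p x 2, one row per sample), so that A @ B.T equals the
    rank-2 approximation X2 = U2 D2 V2'.
    """
    U, d, Vt = np.linalg.svd(np.asarray(X, dtype=float), full_matrices=False)
    U2, d2, V2 = U[:, :2], d[:2], Vt[:2].T   # G x 2, length 2, p x 2
    b = 1.0 - a
    A = U2 * d2 ** a   # multiply column k of U2 by d_k^a
    B = V2 * d2 ** b   # multiply column k of V2 by d_k^b
    return A, B

# Hypothetical, row-centered data: 200 genes x 10 samples.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
X = X - X.mean(axis=1, keepdims=True)

U, d, Vt = np.linalg.svd(X, full_matrices=False)
X2 = (U[:, :2] * d[:2]) @ Vt[:2]            # rank-2 approximation U2 D2 V2'
for a in (0.0, 0.5, 1.0):
    A, B = biplot_factors(X, a)
    print(a, A.shape, B.shape, np.allclose(A @ B.T, X2))  # True for every a
```

The choice of a controls which set of markers carries the singular values:
- With a = 0, the sample markers $B = V_2 D_2$ are the samples' coordinates on the first two components, and the gene markers are the rows of $U_2$.
- With a = 1, the roles are reversed.
- With a = 1/2, the scaling is split evenly between the two sets.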
The biplot is a graphical display of X in which two sets of markers are plotted.
One set of markers, $a_1, \dots, a_G$ (the rows of A), represents the rows of X, that is, the genes.
The other set of markers, $b_1, \dots, b_p$ (the rows of B), represents the columns of X, that is, the samples.
The biplot shows A and B together on the same graph.
If the number of genes is too large, it is better to omit the gene markers and plot them in a separate graph, or to invert the graph.
Biplots of the first two principal components
GGobi display finding four clusters of tumors using the projection pursuit (PP) index on the set of 63 cases. The main panel shows the two-dimensional projection selected by the PP index, with the four clusters in different colors and glyphs. The top-left panel shows the main controls. The bottom-left panel displays the controls and the graph of the PP index being optimized. The graph shows the index value for a sequence of projections ending at the current one.
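The caption does not say which PP index was used or how GGobi optimizes it. The sketch below is therefore only a hedged illustration of the idea behind the display: score two-dimensional projections of sphered data with a PP index, and keep the projection with the best score.

The sketch makes three assumptions. The holes index is our choice, shown without its usual normalizing constant. A crude random search over orthonormal projections stands in for GGobi's guided optimization and does not reproduce it. The helper names and the simulated 63-case, four-cluster data are hypothetical.

```python
import numpy as np

def sphere(X):
    """Center and whiten an n x q data matrix (rows = cases).

    Uses the thin SVD Xc = U D V'; U * sqrt(n - 1) has mean zero and identity
    sample covariance (assumes Xc has full column rank).
    """
    Xc = X - X.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return U * np.sqrt(X.shape[0] - 1)

def holes_index(Y):
    """Holes index of projected sphered data Y (n x dim), without normalization.

    Equals 0 when every point sits at the origin; grows as the centre empties.
    """
    return 1.0 - np.mean(np.exp(-0.5 * np.sum(Y ** 2, axis=1)))

def random_search_pp(Z, n_tries=2000, dim=2, seed=0):
    """Keep the best of many random orthonormal projections (a crude optimizer)."""
    rng = np.random.default_rng(seed)
    best_P, best_val = None, -np.inf
    for _ in range(n_tries):
        # Q factor of a Gaussian matrix: q x dim with orthonormal columns.
        P = np.linalg.qr(rng.normal(size=(Z.shape[1], dim)))[0]
        val = holes_index(Z @ P)
        if val > best_val:
            best_P, best_val = P, val
    return best_P, best_val

# Hypothetical data: 63 cases in 10 dimensions, four clusters separated in the first two.
rng = np.random.default_rng(2)
centers = np.zeros((4, 10))
centers[:, :2] = [[3, 3], [3, -3], [-3, 3], [-3, -3]]
labels = np.repeat(np.arange(4), [16, 16, 16, 15])
X = centers[labels] + rng.normal(size=(63, 10))

P, val = random_search_pp(sphere(X))
print(P.shape, round(val, 3))
```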
Exploratory Analysis Steps