Modified Multi-Dimensional Scaling (MDS) Algorithm for Mining Gene Expression Patterns
X.J. Ge*, S. Yonamene*, Y.M. Mi*, S. Tsutsumi**, Y. Kobune**,
H. Aburatani** and S. Iwata*
*Research into Artifacts, Center for Engineering (RACE), The University of Tokyo,
Komaba 4-6-1, Meguro-ku, Tokyo 153-8904, Japan
**Department of Life Sciences, Research Center for Advanced Science and Technology (RCAST), The University of Tokyo, Komaba 4-6-1, Meguro-ku, Tokyo 153-8904, Japan
ABSTRACT: The dataset of Golub et al. is analyzed by using dimensionality-reduction techniques, including principal component analysis (PCA), multi-dimensional scaling (MDS) and a modified MDS algorithm. These methods produce snapshots that are helpful for class discovery.
Data Set 2: Golub et al., Science 286: 531 (1999).
Gene expression patterns can be considered as points in a multi-dimensional Euclidean space. Because the high dimensionality makes analysis difficult, it is helpful to have a low-dimensional representation that captures some characteristics of the raw dataset.
In principal component analysis (PCA), the raw data points are linearly projected onto the plane in which their variance is greatest, i.e., the plane spanned by the first two principal components.
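The projection described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used for the analysis: the function name `pca_project` and the random matrix standing in for the expression data are assumptions made for the example.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X (samples x genes) onto the top principal components."""
    Xc = X - X.mean(axis=0)  # center each gene (column) at zero
    # SVD of the centered matrix; the rows of Vt are the principal directions,
    # ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T  # coordinates of each sample in the PC plane

# Synthetic stand-in for an expression matrix (NOT the Golub et al. data).
rng = np.random.default_rng(0)
X = rng.normal(size=(38, 200))
Y = pca_project(X)  # one 2-D point per sample, ready for an x-y plot
```

Each row of `Y` is one sample; plotting the two columns against each other gives a scatter plot of the kind described for the PCA figure.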
In multi-dimensional scaling (MDS), data points are represented in a low-dimensional space such that the distances between points are preserved as closely as possible. Unlike PCA, the resulting mapping is nonlinear.
n-D data points
A linear projection of gene expression patterns using the first two principal components. Samples of acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML) are mapped into roughly separate clusters.
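MDS minimizes an objective function that measures the mismatch between two sets of pairwise distances: d_ij is the distance between points i and j in the x-y plot, and D_ij is the Euclidean distance between the corresponding gene expression patterns.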
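As a rough illustration of this minimization, the sketch below assumes the common raw-stress form of the objective, E = sum over pairs i<j of (d_ij - D_ij)^2, where d_ij is the distance in the plot and D_ij is the distance between expression patterns. That form, the function names, the step size and the synthetic data are all assumptions made for the example; the objective actually used may differ (for instance by a weighting or normalization).

```python
import numpy as np

def pairwise_dist(P):
    """Euclidean distance matrix between the rows of P."""
    diff = P[:, None, :] - P[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stress(Y, D):
    """Assumed raw stress: sum over pairs i<j of (d_ij - D_ij)^2."""
    iu = np.triu_indices(len(Y), k=1)
    return ((pairwise_dist(Y)[iu] - D[iu]) ** 2).sum()

def mds(D, n_components=2, n_iter=300, seed=0):
    """Minimize the raw stress by plain gradient descent from a random start."""
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    Y = rng.normal(size=(n, n_components))
    lr = 1.0 / (2 * n)  # conservative step size that keeps the iteration stable
    for _ in range(n_iter):
        diff = Y[:, None, :] - Y[None, :, :]                         # y_i - y_j
        d = np.maximum(np.sqrt((diff ** 2).sum(axis=-1)), 1e-12)      # avoid 0/0
        coef = (d - D) / d                                            # (d_ij - D_ij) / d_ij
        grad = 2.0 * (coef[:, :, None] * diff).sum(axis=1)            # dE/dy_i
        Y -= lr * grad
    return Y

# Synthetic stand-in for expression patterns (NOT the Golub et al. data).
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 50))
D = pairwise_dist(X)   # distances between the high-dimensional patterns
Y = mds(D)             # 2-D coordinates for the x-y plot
```

Comparing `stress(Y, D)` with the stress of the random starting layout, `stress(mds(D, n_iter=0), D)`, shows how much the iteration has reduced the mismatch. Like any gradient method on this objective, the result can depend on the starting layout.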
Mapping of gene expression patterns by multi-dimensional scaling (MDS). AML and two subtypes of ALL samples are found in different regions. However, classification remains difficult without clinical information.
Goal: Enlarge trans-cluster distances to make separation easier.
Physics background: condensation of atoms to form solids with minimum free energy.
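The text above states the goal of the modification but not its update rule, so the following is only a hypothetical illustration of what "enlarging trans-cluster distances" could look like, not the authors' algorithm. It stretches every target distance above an assumed threshold `tau` by an assumed factor `alpha` and leaves shorter distances unchanged; the function name and both parameters are invented for this example.

```python
import numpy as np

def enlarge_long_distances(D, tau, alpha=2.0):
    """Hypothetical preprocessing step: stretch target distances above tau.

    Distances <= tau (assumed to be within-cluster) are kept as they are;
    the part of a distance beyond tau is multiplied by alpha.
    """
    D_mod = np.array(D, dtype=float)          # float copy of the input
    far = D_mod > tau                         # pairs assumed to lie in different clusters
    D_mod[far] = tau + alpha * (D_mod[far] - tau)
    return D_mod

# Toy distance matrix: points 0 and 1 are close, point 2 is far from both.
D_toy = [[0, 1, 5],
         [1, 0, 4],
         [5, 4, 0]]
D_mod = enlarge_long_distances(D_toy, tau=2.0, alpha=2.0)
# With tau = 2 and alpha = 2: 5 -> 8, 4 -> 6, and the short distance 1 is unchanged.
```

A matrix modified in this way could then be handed to any MDS solver in place of the original distances, which pushes apart in the plot the points that were already far apart.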
Mapping of gene expression patterns by a modified multi-dimensional scaling (MDS) algorithm. AML and two subtypes of ALL samples are found in different regions.