
Integrative Analysis of multiple large-scale molecular biological data

Sri Priya Ponnapalli, Genomic Signal Processing Laboratory, The University of Texas at Austin.


Presentation Transcript


  1. Integrative Analysis of multiple large-scale molecular biological data. Sri Priya Ponnapalli, Genomic Signal Processing Laboratory, The University of Texas at Austin.

  2. Project Objectives. Specimen under analysis: the National Cancer Institute’s 60 cell lines (NCI60). Dataset #1: RNA expression profiles [Ross et al., 2000]. Dataset #2: proteomic profiles [Nishizuka et al., 2003]. Dataset #3: drug activity levels [Scherf et al., 2000]. The aim is to identify relationships among the three datasets, each of which records a different attribute of the NCI60: genome-scale expression, sensitivities to more than 70,000 chemical compounds and chemotherapeutics, and proteomic profiles. Chief objective: develop a method to analyse the relationships between multiple datasets.

  3. Initial Analysis: SVD. • All three datasets were processed using singular value decomposition (SVD) [Alter et al., 2000]. The results look interesting, but they are difficult to interpret, let alone to integrate across all three datasets. [Figure: plot of the first 5 sorted eigengenes; axis labels: Tumor Samples, Eigengenes]
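As a rough illustration of this step, the sketch below applies NumPy's SVD to a random matrix standing in for one dataset. The matrix size, the random data, and the variable names are assumptions made for the example and are not taken from the presentation; the rows of the right factor are labelled eigengenes following the terminology of Alter et al.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for one dataset: 200 genes (rows) x 60 cell lines (columns).
data = rng.standard_normal((200, 60))

# Thin SVD: data = u @ diag(s) @ vt, with singular values in decreasing order.
u, s, vt = np.linalg.svd(data, full_matrices=False)

# Each row of vt is a pattern across the 60 cell lines (an "eigengene").
eigengenes = vt[:5]

# Fraction of the total squared signal captured by each eigengene.
fraction = s**2 / np.sum(s**2)

print("first five fractions:", np.round(fraction[:5], 3))
```

Running the same decomposition on each dataset separately gives three unrelated sets of patterns, which is why the results are hard to integrate.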

  4. Analysis using GSVD. • Every pair of datasets was then processed using generalized singular value decomposition (GSVD). • Dataset1 = U1 E1 X • Dataset2 = U2 E2 X • If each dataset is thought of as a line, the GSVD of two datasets represents the point of intersection of the two lines, i.e. it highlights the similarities and dissimilarities between the two datasets. This simple fact suggests a method for studying the similarities and differences among multiple datasets.
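NumPy does not provide a GSVD routine, so the following is only a minimal sketch of one standard construction (a QR factorization of the stacked matrices followed by an SVD of the top block). It assumes both datasets have the same number of columns, each has at least as many rows as columns, and the stacked matrix has full column rank with no zero generalized singular values. The function name `gsvd_pair` and the random test matrices are hypothetical; this is not necessarily the procedure used in the presentation.

```python
import numpy as np

def gsvd_pair(a, b):
    """Factor a = u1 @ diag(c) @ x and b = u2 @ diag(s) @ x with a shared x."""
    m1 = a.shape[0]
    # Reduced QR of the stacked data: [a; b] = q @ r, with q having orthonormal columns.
    q, r = np.linalg.qr(np.vstack([a, b]))
    q1, q2 = q[:m1], q[m1:]
    # SVD of the top block gives u1, the cosines c, and the rotation w.
    u1, c, wt = np.linalg.svd(q1, full_matrices=False)
    # The columns of q2 @ w are mutually orthogonal; their norms are the sines s.
    q2w = q2 @ wt.T
    s = np.linalg.norm(q2w, axis=0)
    u2 = q2w / s
    # Shared right factor.
    x = wt @ r
    return u1, c, u2, s, x

rng = np.random.default_rng(1)
a = rng.standard_normal((50, 6))   # hypothetical dataset 1
b = rng.standard_normal((40, 6))   # hypothetical dataset 2
u1, c, u2, s, x = gsvd_pair(a, b)
```

In this construction `c**2 + s**2` equals 1 for every component, so the ratio `c / s` indicates whether a shared pattern (a row of `x`) is more significant in the first dataset or in the second.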

  5. Consider the case of finding similarities and dissimilarities among 3 datasets, taken in pairs (this can be extended to multiple datasets). • These 3 datasets may be thought of as representing 3 lines. • Any two non-parallel lines intersect at a point. • Three pairwise non-parallel lines form a triangle (unless they all share a common point, in which case all three vertices of the triangle converge to that point). • The goal is to express the three datasets in the form • Dataset1 = U1 E1 X • Dataset2 = U2 E2 X • Dataset3 = U3 E3 X

  6. If we compute the GSVD of every pair of datasets (i.e. find the point of intersection of every pair of lines), we get three matrices, each corresponding to a vertex of a triangle. We want a single matrix that best approximates these three matrices, i.e. a point that is closest to all three vertices simultaneously. This point is the centroid of the triangle, which is easily computed from the coordinates of the vertices. All of these results have to be interpreted in terms of matrices, which can be done by measuring the distance between two matrices with the Frobenius norm of their difference.
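The centroid idea can be sketched for matrices as follows. The example assumes that "closest to all three vertices" means minimizing the sum of squared Frobenius distances, in which case the minimizer is the element-wise mean of the three matrices. The random matrices stand in for the three pairwise GSVD results, and the function names are hypothetical.

```python
import numpy as np

def frobenius_distance(p, q):
    """Frobenius norm of the difference between two matrices."""
    return np.linalg.norm(p - q, ord="fro")

def centroid(matrices):
    """Element-wise mean of a list of equally shaped matrices."""
    return np.mean(matrices, axis=0)

rng = np.random.default_rng(2)
# Hypothetical stand-ins for the three matrices obtained from the pairwise GSVDs.
vertices = [rng.standard_normal((4, 4)) for _ in range(3)]
center = centroid(vertices)

def total_squared_distance(m):
    """Sum of squared Frobenius distances from m to the three vertices."""
    return sum(frobenius_distance(m, v) ** 2 for v in vertices)

# Moving away from the centroid in any direction increases the total.
shifted = center + 0.1 * rng.standard_normal((4, 4))
print(total_squared_distance(center) < total_squared_distance(shifted))
```

Under a different notion of closeness (for example, minimizing the largest of the three distances) the best point would generally not be the centroid, so the choice of criterion matters.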

  7. This method yields an approximation, but the best possible approximation. • It minimizes the error between each original dataset and the dataset reconstructed as the product of its three matrices. • It has been tried on the three datasets under study, and the results look promising. • Please read the paper for further details.
