EECS 730 Introduction to Bioinformatics Microarray

EECS 730 Introduction to Bioinformatics Microarray. Luke Huan Electrical Engineering and Computer Science http://people.eecs.ku.edu/~jhuan/. Administrative. Final exam: Dec 15 7:30-10:00. Model Based Subspace Clustering. Microarray Bi-clustering δ -clustering. MicroArray Dataset.



  1. EECS 730 Introduction to Bioinformatics: Microarray. Luke Huan, Electrical Engineering and Computer Science, http://people.eecs.ku.edu/~jhuan/

  2. Administrative
     • Final exam: Dec 15, 7:30-10:00

  3. Model Based Subspace Clustering
     • Microarray
     • Bi-clustering
     • δ-clustering

  4. MicroArray Dataset

  5. Gene Expression Matrix
     [Figure: gene expression matrices with genes as rows and conditions as columns; the conditions shown are time points and cancer tissues]

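  6. Data Mining: Clustering
     • K-means clustering minimizes the within-cluster sum of squared distances $\sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2$, where $\mu_k$ is the centroid (mean) of cluster $C_k$.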
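As a rough illustration of the quantity K-means tries to minimize (the within-cluster sum of squared distances to each cluster centroid), the sketch below evaluates that objective for a given assignment. The function name `kmeans_objective` and the toy points are assumptions made for this example; the slide does not specify an implementation.

```python
def kmeans_objective(points, labels, k):
    """Within-cluster sum of squared Euclidean distances to each centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


# Toy data (assumed): two tight groups far apart along the first axis.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = kmeans_objective(points, [0, 0, 1, 1], k=2)  # groups kept together
bad = kmeans_objective(points, [0, 1, 0, 1], k=2)   # groups split apart
```

A lower value means a tighter clustering, so `good` comes out smaller than `bad` here.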

  7. Clustering by Pattern Similarity (p-Clustering)
     • The micro-array “raw” data shows 3 genes and their values in a multi-dimensional space
     • Parallel Coordinates Plots
     • Difficult to find their patterns
     • “Non-traditional” clustering

  8. Clusters Are Clear After Projection

  9. Motivation
     • DNA microarray analysis

  10. Motivation

  11. Motivation
     • Strong coherence is exhibited by the selected objects on the selected attributes.
     • The objects are not necessarily close to each other but rather differ by a constant shift.
     • Object/attribute bias
     • bi-cluster
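The idea of objects that are far apart yet coherent can be sketched as a check for a constant shift between two objects on a chosen attribute subset. The helper name `shift_between`, the tolerance, and the sample vectors are assumptions for illustration only.

```python
def shift_between(a, b, attrs, tol=1e-9):
    """Return the constant offset b - a over attrs, or None if the offset varies."""
    diffs = [b[j] - a[j] for j in attrs]
    if max(diffs) - min(diffs) <= tol:
        return diffs[0]
    return None


# Assumed toy objects: far apart in value, but parallel on attributes 0..2.
a = [1.0, 5.0, 3.0, 9.0]
b = [11.0, 15.0, 13.0, 2.0]

on_subset = shift_between(a, b, attrs=[0, 1, 2])    # constant shift of 10
on_all = shift_between(a, b, attrs=[0, 1, 2, 3])    # no constant shift
```

The coherence only appears once the right attributes are selected, which is why both the object set and the attribute set have to be discovered.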

  12. Challenges
     • The set of objects and the set of attributes are usually unknown.
     • Different objects/attributes may possess different biases, and such biases
        • may be local to the set of selected objects/attributes
        • are usually unknown in advance
     • The data may have many unspecified entries

  13. Previous Work
     • Subspace clustering
        • Identifies a set of objects and a set of attributes such that the objects are physically close to each other in the subspace formed by the attributes.
     • Collaborative filtering: Pearson R
        • Only considers the global offset of each object/attribute.

  14. bi-cluster Terms
     • Consists of a (sub)set of objects and a (sub)set of attributes
        • Corresponds to a submatrix
     • Occupancy threshold α
        • Each object/attribute has to be filled by a certain percentage.
     • Volume: number of specified entries in the submatrix
     • Base: average value of each object/attribute (in the bi-cluster)
     • Biclustering of Expression Data, Cheng & Church, ISMB’00
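The terms on this slide (occupancy, volume, base) can be made concrete with a small sketch. It assumes unspecified entries are stored as `None`, that a bi-cluster is given as lists of row and column indices, and that the occupancy test uses "at least" the threshold; all function names are hypothetical.

```python
def specified(values):
    """Keep only the specified (non-None) entries."""
    return [v for v in values if v is not None]


def volume(data, rows, cols):
    """Number of specified entries in the submatrix."""
    return len(specified(data[i][j] for i in rows for j in cols))


def row_base(data, i, cols):
    """Average of the specified entries of object i inside the bi-cluster."""
    vals = specified(data[i][j] for j in cols)
    return sum(vals) / len(vals)


def col_base(data, rows, j):
    """Average of the specified entries of attribute j inside the bi-cluster."""
    vals = specified(data[i][j] for i in rows)
    return sum(vals) / len(vals)


def satisfies_occupancy(data, rows, cols, alpha):
    """True if every row and every column is filled to at least a fraction alpha."""
    rows_ok = all(len(specified(data[i][j] for j in cols)) / len(cols) >= alpha for i in rows)
    cols_ok = all(len(specified(data[i][j] for i in rows)) / len(rows) >= alpha for j in cols)
    return rows_ok and cols_ok


# Assumed toy matrix with two unspecified entries.
data = [[1, 2, None],
        [2, None, 4],
        [3, 4, 5]]
rows, cols = [0, 1, 2], [0, 1, 2]
vol = volume(data, rows, cols)
base_row0 = row_base(data, 0, cols)
base_col2 = col_base(data, rows, 2)
loose = satisfies_occupancy(data, rows, cols, alpha=0.6)
strict = satisfies_occupancy(data, rows, cols, alpha=0.7)
```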

  15. bi-cluster

  16. [Figure: expression data for 40 genes under 17 conditions]

  17. Motivation

  18. [Figure: expression data for 40 genes under 17 conditions]

  19. Motivation
     • Co-regulated genes

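  20. bi-cluster
     • Perfect δ-cluster
     • Imperfect δ-cluster
     • Residue of a specified entry: $r_{ij} = d_{ij} - d_{iJ} - d_{Ij} + d_{IJ}$, where $d_{iJ}$ is the base of object $i$, $d_{Ij}$ is the base of attribute $j$, and $d_{IJ}$ is the base of the whole bi-cluster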

  21. bi-cluster
     • The smaller the average residue, the stronger the coherence.
     • Objective: identify δ-clusters with residue smaller than a given threshold
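A minimal sketch of the average residue of a bi-cluster follows. It assumes the residue of a specified entry is the entry minus its row base, minus its column base, plus the base of the whole bi-cluster, and that the cluster residue is the mean absolute residue over the specified entries (Cheng and Church use a mean squared variant instead). Unspecified entries are assumed to be `None`, and every row and column is assumed to have at least one specified entry.

```python
def _mean(values):
    """Average of the specified (non-None) values."""
    vals = [v for v in values if v is not None]
    return sum(vals) / len(vals)


def average_residue(data, rows, cols):
    """Mean absolute residue over the specified entries of the submatrix."""
    row_base = {i: _mean(data[i][j] for j in cols) for i in rows}
    col_base = {j: _mean(data[i][j] for i in rows) for j in cols}
    all_base = _mean(data[i][j] for i in rows for j in cols)
    total, volume = 0.0, 0
    for i in rows:
        for j in cols:
            d = data[i][j]
            if d is None:
                continue  # unspecified entries contribute no residue
            total += abs(d - row_base[i] - col_base[j] + all_base)
            volume += 1
    return total / volume


# Assumed toy data: the first matrix is a pure shift pattern, the second is not.
shifted = [[1, 2, 3], [11, 12, 13], [5, 6, 7]]
noisy = [[1, 2, 3], [11, 12, 13], [5, 9, 1]]
r_shifted = average_residue(shifted, [0, 1, 2], [0, 1, 2])
r_noisy = average_residue(noisy, [0, 1, 2], [0, 1, 2])
```

The shift pattern yields a residue of (numerically) zero, while the perturbed matrix yields a strictly positive one, matching the "smaller residue, stronger coherence" reading.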

  22. Cheng-Church Algorithm
     • Find one bi-cluster.
     • Replace the data in the first bi-cluster with random data.
     • Find the second bi-cluster, and so on.
     • The quality of later bi-clusters degrades (smaller volume, higher residue) because of the inserted random data.
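The find, mask, repeat loop described on this slide can be sketched as below. The actual search for a single bi-cluster is not reproduced here: `find_one` is a caller-supplied function, and `toy_finder` is a hypothetical stand-in that always returns the same block. The uniform random range (`low`, `high`) is also an assumption.

```python
import random


def mask_and_repeat(data, find_one, how_many, low, high, seed=0):
    """Repeatedly find a bi-cluster, then overwrite it with random values."""
    rng = random.Random(seed)
    work = [row[:] for row in data]  # leave the caller's matrix untouched
    found = []
    for _ in range(how_many):
        rows, cols = find_one(work)
        found.append((list(rows), list(cols)))
        for i in rows:
            for j in cols:
                # Masking keeps the next search from rediscovering this block.
                work[i][j] = rng.uniform(low, high)
    return found, work


def toy_finder(matrix):
    # Placeholder only (assumption): a real finder would search the matrix.
    return [0, 1], [0, 1]


data = [[1.0, 2.0, 9.0], [2.0, 3.0, 7.0], [8.0, 1.0, 4.0]]
found, masked = mask_and_repeat(data, toy_finder, how_many=2, low=0.0, high=10.0)
```

Because each pass searches a matrix that already contains random values, later bi-clusters can only be as good as what the noise leaves intact, which is the degradation noted above.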

  23. The FLOC algorithm
     1. Generate initial clusters.
     2. Determine the best action for each row and each column.
     3. Perform the best action of each row and column sequentially.
     4. Improved? If yes (Y), return to step 2; if no (N), stop.
     • Yang et al., delta-Clusters: Capturing Subspace Correlation in a Large Data Set, ICDE’02

  24. The FLOC algorithm
     • Action: the change of membership of a row (or column) with respect to a cluster
     • M+N actions are performed at each iteration
     [Figure: example data matrix with N=3 rows and M=4 columns]

  25. The FLOC algorithm
     • Gain of an action: the residue reduction incurred by performing the action
     • Order of action:
        • Fixed order
        • Random order
        • Weighted random order
     • Complexity: O((M+N)MNkp)
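The gain of an action can be sketched as the residue before the membership change minus the residue after it. The code assumes a mean absolute residue, `None` for unspecified entries, and an action that toggles a row's membership in one cluster; the function names are hypothetical and this is not the authors' implementation.

```python
def _mean(values):
    vals = [v for v in values if v is not None]
    return sum(vals) / len(vals)


def residue(data, rows, cols):
    """Mean absolute residue of the submatrix; 0.0 for an empty cluster."""
    cells = [(i, j) for i in rows for j in cols if data[i][j] is not None]
    if not cells:
        return 0.0
    row_base = {i: _mean(data[i][j] for j in cols) for i in rows}
    col_base = {j: _mean(data[i][j] for i in rows) for j in cols}
    all_base = _mean(data[i][j] for i, j in cells)
    return sum(abs(data[i][j] - row_base[i] - col_base[j] + all_base)
               for i, j in cells) / len(cells)


def gain_of_row_action(data, rows, cols, i):
    """Residue reduction from toggling row i's membership in the cluster."""
    new_rows = [r for r in rows if r != i] if i in rows else rows + [i]
    return residue(data, rows, cols) - residue(data, new_rows, cols)


# Assumed toy data: rows 0 and 1 follow a shift pattern, row 2 does not.
data = [[1, 2, 3], [11, 12, 13], [5, 9, 1]]
gain_remove = gain_of_row_action(data, [0, 1, 2], [0, 1, 2], 2)  # drop row 2
gain_add = gain_of_row_action(data, [0, 1], [0, 1, 2], 2)        # add row 2
```

A positive gain marks an action that improves coherence; here removing the incoherent row is beneficial and adding it back is not.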

  26. The FLOC algorithm
     • Additional features
        • Maximum allowed overlap among clusters
        • Minimum coverage of clusters
        • Minimum volume of each cluster
     • These can be enforced by “temporarily blocking” an action during the mining process if that action would violate a constraint.
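"Temporarily blocking" an action can be sketched as filtering candidate actions before they are considered. The example handles only the minimum-volume constraint, assumes `None` marks unspecified entries, and treats an action as toggling a row's membership; `allowed_row_actions` is a hypothetical helper.

```python
def volume(data, rows, cols):
    """Number of specified entries in the submatrix."""
    return sum(1 for i in rows for j in cols if data[i][j] is not None)


def allowed_row_actions(data, rows, cols, candidates, min_volume):
    """Keep only row toggles that leave the cluster at or above min_volume."""
    allowed = []
    for i in candidates:
        new_rows = [r for r in rows if r != i] if i in rows else rows + [i]
        if volume(data, new_rows, cols) >= min_volume:
            allowed.append(i)  # the action respects the constraint
    return allowed


# Assumed toy cluster: rows 0 and 1 over all three columns, volume 5.
data = [[1, 2, 3], [4, None, 6], [7, 8, 9]]
ok = allowed_row_actions(data, [0, 1], [0, 1, 2], candidates=[0, 1, 2], min_volume=5)
```

Removing either current row would drop the volume below 5, so only the action that adds row 2 remains available.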

  27. Performance
     • Microarray data: 2884 genes, 17 conditions
     • The 100 bi-clusters with the smallest residue were returned.
     • Average residue = 10.34
     • The average residue of clusters found via the state-of-the-art method in the computational biology field is 12.54
     • The average volume is 25% bigger
     • The response time is an order of magnitude faster

  28. Concluding Remarks
     • The bi-cluster model is proposed to capture coherent objects in an incomplete data set.
        • base
        • residue
     • Many additional features can be accommodated (nearly for free).

  29. References
     • J. Yang, W. Wang, H. Wang, P. Yu, Delta-cluster: capturing subspace correlation in a large data set, Proceedings of the 18th IEEE International Conference on Data Engineering (ICDE), pp. 517-528, 2002.
     • H. Wang, W. Wang, J. Yang, P. Yu, Clustering by pattern similarity in large data sets, to appear in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), 2002.
     • S. Yoon, C. Nardini, L. Benini, G. De Micheli, Enhanced pClustering and its applications to gene expression data, Bioinformatics and Bioengineering, 2004.
     • J. Liu and W. Wang, OP-Cluster: clustering by tendency in high dimensional space, ICDM’03.