Generating Robust and Consensus Clusters from Gene Expression Data

This study compares different clustering algorithms for gene expression data and introduces methods for generating robust and consensus clusters. The methods are tested on two datasets and offer specific advantages for array-based gene expression analysis.

Presentation Transcript


  1. Generating Robust and Consensus Clusters from Gene Expression Data • Allan Tucker (a), Stephen Swift (a), Xiaohui Liu (a), Nigel Martin (b), Christine Orengo (c), Paul Kellam (c)

  2. Introduction • Many different clustering algorithms used for gene expression analysis • Little work on inter-method consistency or cross-comparison • Important due to differing results (each algorithm implicitly forces a structure on data) • Obtaining a consensus across methods should improve confidence

  3. The Talk • Compare a number of existing methods for clustering gene expression data • Algorithms for generating robust clusters and consensus clusters • Tested on a set of Amersham Scorecard data with known structure and on experimentally obtained virus B-cell data • Provides specific advantages in the analysis of array-based gene expression data

  4. Clustering Methods • Hierarchical Clustering (R) • PAM (R) • CAST (C++) • Simulated Annealing (C++)

  5. Datasets • Amersham Scorecard • 597 genes, 24 blocks with 32 columns and 12 rows under 30 experimental conditions • Repeated experiments which we assume should cluster together • B Cell Data • 1987 genes

  6. Comparison of Methods

  7. The Agreement Matrix

  8. Robust Clustering • Takes agreement matrix as input • Place all genes into robust clusters that have full agreement • Deterministic algorithm • Should give higher degree of confidence in clusters • Not all genes will be assigned
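The robust clustering step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the agreement matrix counts, for each pair of genes, how many clustering methods placed the pair in the same cluster, and that "full agreement" means that count equals the number of methods. The function names `agreement_matrix` and `robust_clusters` are hypothetical.

```python
from itertools import combinations


def agreement_matrix(clusterings):
    """Count, for every gene pair, how many methods co-cluster the pair.

    clusterings: list of label lists, one list per clustering method,
    each giving a cluster label for every gene (assumed input format).
    """
    n = len(clusterings[0])
    agree = [[0] * n for _ in range(n)]
    for labels in clusterings:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                agree[i][j] += 1
                agree[j][i] += 1
    return agree


def robust_clusters(agree, n_methods):
    """Group genes that every method agrees on; leave the rest unassigned."""
    n = len(agree)
    assigned = set()
    clusters = []
    for i in range(n):
        if i in assigned:
            continue
        # Genes that share a cluster with gene i under all methods.
        members = [i] + [j for j in range(i + 1, n)
                         if j not in assigned and agree[i][j] == n_methods]
        if len(members) > 1:  # singletons stay unassigned
            clusters.append(members)
            assigned.update(members)
    return clusters


# Three methods labelling five genes (made-up labels for illustration).
clusterings = [[0, 0, 1, 1, 2],
               [1, 1, 0, 0, 0],
               [0, 0, 0, 1, 2]]
A = agreement_matrix(clusterings)
print(robust_clusters(A, n_methods=len(clusterings)))
```

Because full agreement is transitive (two genes that always cluster with a third always cluster with each other), a single deterministic pass is enough, and genes without a full-agreement partner are simply left out, which matches the "not all genes will be assigned" point.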

  9. Robust Clustering

     Dataset                     ASC     B-cell
     No. of robust clusters      24      154
     % of variables assigned     79%     25%
     Max. robust cluster size    44      14
     Min. robust cluster size    2       2
     Mean robust cluster size    10.2    3.2

  10. Consensus Clustering • “Full agreement” requirement for robust clusters can be too restrictive • Algorithm for generating consensus clusters given minimum agreement parameter • Approximate stochastic algorithm
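One way to picture a stochastic consensus search with a minimum agreement parameter is the sketch below. The slides do not give the objective function or the search procedure, so both are assumptions here: a partition is scored by summing (agreement minus threshold) over all co-clustered gene pairs, and a simple random hill climber stands in for whatever stochastic search the authors used. The names `partition_score` and `consensus_clusters` are hypothetical.

```python
import random
from itertools import combinations


def partition_score(labels, agree, threshold):
    """Assumed objective: reward co-clustered pairs whose agreement
    exceeds the threshold and penalise pairs that fall below it."""
    return sum(agree[i][j] - threshold
               for i, j in combinations(range(len(labels)), 2)
               if labels[i] == labels[j])


def consensus_clusters(agree, threshold, iterations=2000, seed=0):
    """Random hill climbing over partitions (an illustrative stand-in)."""
    rng = random.Random(seed)
    n = len(agree)
    labels = list(range(n))  # start with every gene in its own cluster
    best = partition_score(labels, agree, threshold)
    for _ in range(iterations):
        gene = rng.randrange(n)
        old = labels[gene]
        labels[gene] = rng.randrange(n)  # propose moving one gene
        score = partition_score(labels, agree, threshold)
        if score > best:
            best = score        # keep strictly improving moves
        else:
            labels[gene] = old  # otherwise undo the move
    groups = {}
    for gene, lab in enumerate(labels):
        groups.setdefault(lab, []).append(gene)
    return sorted(m for m in groups.values() if len(m) > 1)


# Agreement counts for five genes across three methods (made-up values).
A = [[0, 3, 1, 0, 0],
     [3, 0, 1, 0, 0],
     [1, 1, 0, 2, 1],
     [0, 0, 2, 0, 1],
     [0, 0, 1, 1, 0]]
print(consensus_clusters(A, threshold=1.5))
print(consensus_clusters(A, threshold=2.5))
```

In this toy matrix, lowering the threshold from 2.5 to 1.5 lets genes 2 and 3, which two of the three methods co-cluster, form a consensus cluster in addition to the fully agreed pair 0 and 1. That is the sense in which a minimum agreement parameter relaxes the full-agreement requirement.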

  11. Consensus Clustering • Input Cluster Results → Agreement Matrix → Consensus Clusters

  12. Consensus Clustering • Figure panels: B-Cell Dataset, ASC Dataset

  13. Consensus Clustering

  14. Consensus Clustering

  15. Summary • Clustering biological data is very useful • Biases in clustering algorithms can mean that success in identifying patterns varies • Consensus algorithms are used in protein secondary structure prediction • We apply a similar strategy with robust and consensus clustering

  16. Conclusions • Robust clusters are good for identifying common transcriptional modules • Also for identifying genes with a common functional pathway • Useful for creating clusters of genes with high confidence • Can be restrictive in discarding genes that do not have full agreement

  17. Conclusions • Consensus clustering relaxes full agreement requirement • Resembles defined clusters in synthetic data very well • Reliably picks out features in the virus gene expression data • Fulfils desire not to rely on one clustering algorithm during gene expression analysis

  18. Acknowledgements • The Biotechnology and Biological Sciences Research Council (BBSRC), UK • The Engineering and Physical Sciences Research Council (EPSRC), UK
