
Fuzzy K means


lgately

Presentation Transcript


  1. Fuzzy K means

  2. Fuzzy K means • A gene can be assigned to several clusters • Each gene is assigned to a cluster with a membership value between 0 and 1 • The membership values of a gene add up to one • Genes with lower membership values are not well represented by the cluster centroid • The expression of genes with high membership values is close to the cluster centroid

  3. Centroid • During the centroid refinement in each clustering cycle, new centroids were calculated as the weighted mean of all the gene-expression patterns in the data set.
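As a hedged sketch, the weighted mean described above can be written in the standard fuzzy c-means form. The symbols ($V_j$ for the centroid of cluster $j$, $X_i$ for the expression pattern of gene $i$, $m_{ij}$ for the membership of gene $i$ in cluster $j$, $W_i$ for an optional gene weight) and the squared membership exponent are assumptions, not taken from the slide:

```latex
V_j = \frac{\sum_i m_{ij}^{2}\, W_i\, X_i}{\sum_i m_{ij}^{2}\, W_i}
```

With all $W_i = 1$ this reduces to the usual fuzzy c-means centroid with a fuzzifier of 2.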

  4. Membership Function • Each gene's membership m in each cluster is a continuous variable from 0 to 1.
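A minimal sketch of one common membership function, assuming the inverse squared distance form used in standard fuzzy c-means (fuzzifier of 2). The function name `memberships`, the use of Euclidean distance, and the `eps` guard are illustrative assumptions; the slide's own formula may differ.

```python
import numpy as np

def memberships(X, V, eps=1e-12):
    """Return an (n_genes, n_clusters) membership matrix whose rows sum to 1.

    X: (n_genes, n_conditions) expression patterns.
    V: (n_clusters, n_conditions) centroids.
    """
    # Squared Euclidean distance from every gene to every centroid.
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    # Closer centroids receive larger weight; eps avoids division by zero.
    inv = 1.0 / (d2 + eps)
    # Normalize per gene so each gene's memberships add up to one.
    return inv / inv.sum(axis=1, keepdims=True)
```

Each row of the result is one gene's membership across all clusters, which matches the property that a gene's membership values add up to one.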

  5. Fuzzy K means • The gene weight (used only in the second and third rounds) is defined empirically from the Pearson correlation between Xi and Xn and a correlation cutoff.

  6. Fuzzy K means • In each clustering cycle, the centroids were iteratively refined until the average change was < 0.001. • Around 85% of the centroids stabilized within approximately 15 iterations; some centroids required more, about 40 to 60 iterations, before stabilizing.
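The stopping rule above can be sketched as a refinement loop. This is an illustration under stated assumptions, not the presenter's implementation: the name `refine_centroids` is hypothetical, "average change" is taken to mean the mean absolute change in centroid coordinates, memberships use inverse squared Euclidean distance, and gene weights are omitted.

```python
import numpy as np

def refine_centroids(X, V, tol=1e-3, max_iter=100, eps=1e-12):
    """Alternate membership and centroid updates until centroids stop moving.

    Returns the refined centroids and the number of iterations performed.
    """
    for iteration in range(1, max_iter + 1):
        # Membership step: inverse squared distance, normalized per gene.
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        inv = 1.0 / (d2 + eps)
        m = inv / inv.sum(axis=1, keepdims=True)
        # Centroid step: mean of all genes weighted by squared membership
        # (assumed fuzzifier of 2).
        w = m ** 2
        V_new = (w.T @ X) / w.sum(axis=0)[:, None]
        # Assumed convergence measure: mean absolute coordinate change.
        change = np.abs(V_new - V).mean()
        V = V_new
        if change < tol:
            break
    return V, iteration
```

The returned iteration count makes it easy to check how many centroids stabilize early and how many need longer runs.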

  7. Fuzzy K means • After each clustering cycle, each centroid was compared to all other centroids in the set, and centroid pairs with a correlation > 0.9 were replaced by their average.
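A minimal sketch of this merging step, assuming Pearson correlation between centroid vectors and a simple greedy pass that merges one pair at a time and then rescans. The function name `merge_correlated_centroids` and the merge order are illustrative assumptions.

```python
import numpy as np

def merge_correlated_centroids(V, cutoff=0.9):
    """Replace centroid pairs with Pearson correlation > cutoff by their average."""
    centroids = [np.asarray(v, dtype=float) for v in V]
    merged = True
    while merged:
        merged = False
        for i in range(len(centroids)):
            for j in range(i + 1, len(centroids)):
                # Pearson correlation between the two centroid patterns.
                r = np.corrcoef(centroids[i], centroids[j])[0, 1]
                if r > cutoff:
                    # Keep the average in slot i and drop centroid j.
                    centroids[i] = (centroids[i] + centroids[j]) / 2.0
                    del centroids[j]
                    merged = True
                    break
            if merged:
                break  # rescan from the start after every merge
    return np.array(centroids)
```

Rescanning after every merge keeps the logic simple at the cost of extra comparisons, which is acceptable for the modest number of centroids involved.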

  8. Visualization Tools

  9. Cells respond to environment • Various external messages: heat, food supply • The cell responds to environmental conditions

  10. Genome is fixed – Cells are dynamic • A genome is static • Every cell in our body has a copy of the same genome • A cell is dynamic • Responds to external conditions • Saccharomyces cerevisiae cells follow a cell cycle, dividing by budding • Cells differentiate during development

  11. Gene regulation • Gene regulation is responsible for the dynamic behavior of the cell • Gene expression varies according to: • Cell type • External conditions

  12. Transcription Factors Binding to DNA • Transcription regulation: • Certain transcription factors bind DNA • Binding recognizes DNA substrings: • Regulatory motifs

  13. Regulation of Genes • Diagram labels: transcription factor (protein), RNA polymerase (protein), DNA, regulatory element, gene

  14. Regulation of Genes • Diagram labels: transcription factor (protein), RNA polymerase, DNA, regulatory element, gene

  15. Regulation of Genes • Diagram labels: new protein, RNA polymerase, transcription factor, DNA, regulatory element, gene

  16. The Challenges of Gene Expression Data • Many genes have expression patterns that are similar to those of multiple, distinct gene groups.

  17. Results of Clustering Gene Expression • CLUSTER is simple and easy to use • De facto standard for microarray analysis • Limitations: • Hierarchical clustering, and other clustering methods in general, are not robust • Genes may belong to more than one cluster

  18. A gene can be coexpressed with different gene groups in response to different conditions.

  19. Saccharomyces cerevisiae • The yeast Saccharomyces cerevisiae possesses sophisticated mechanisms to choreograph the expression of its 6200 genes in order to thrive, or at least to survive, in a wide range of environmental conditions.

  20. Gene expression of 40 Yap1p targets. These genes were coordinately induced in response to a subset of the conditions shown here (labeled in red).

  21. What is a microarray

  22. What is a microarray (2) • A 2D array of DNA sequences from thousands of genes • Each spot has many copies of the same gene • mRNAs from a sample are allowed to hybridize • The number of hybridizations per spot is measured

  23. Goal of Microarray Experiments • Measure the level of gene expression across many different conditions: • Expression matrix M: {genes} × {conditions}, where M_ij = |gene_i| in condition_j (the expression level of gene i in condition j) • Deduce gene function • Genes with similar function are expressed under similar conditions
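To make the expression matrix concrete, here is a small sketch with made-up numbers (three hypothetical genes measured under four conditions). It computes the gene-by-gene Pearson correlation, one common way to judge whether two genes are expressed under similar conditions.

```python
import numpy as np

# Hypothetical expression matrix: 3 genes (rows) x 4 conditions (columns).
M = np.array([
    [1.0, 2.0, 3.0, 4.0],   # gene A
    [2.0, 4.1, 5.9, 8.0],   # gene B: pattern similar to gene A
    [4.0, 3.0, 2.0, 1.0],   # gene C: opposite pattern
])

# np.corrcoef treats each row as one variable, so the result is a
# gene-by-gene matrix of Pearson correlations.
gene_corr = np.corrcoef(M)
```

Here genes A and B are strongly positively correlated, while gene C is anticorrelated with both.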

  24. Fuzzy K-Means clustering • Each gene can belong to many clusters • Soft (fuzzy) assignment of genes to clusters • Each gene has 1.0 membership units, allocated amongst clusters based on correlation with means • Cluster means are calculated by taking the weighted average of all the genes in the cluster

  25. Fuzzy K-Means clustering Algorithm: • Use PCA to initialize cluster means • 3 iterations of fuzzy k-means clustering, find k/3 clusters per iteration • In each iteration, start with brand new clusters and initializations • And a few more heuristic tricks

  26. Initialization • Use PCA to find a few eigenvectors for initialization • These features capture the directions of maximum variance • Must be orthonormal

  27. Example Initialization • k/3 centroids defined from k/3 first eigenvectors
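A minimal sketch of PCA-based initialization, assuming the eigenvectors are taken in condition space via a singular value decomposition of the centered expression matrix. The function name `pca_initial_centroids` is hypothetical, and how the original method maps eigenvectors to starting centroids is not stated on the slides.

```python
import numpy as np

def pca_initial_centroids(X, n_centroids):
    """Return the first n_centroids principal directions of X as rows.

    X: (n_genes, n_conditions) expression matrix.
    """
    # Center each condition (column) before the decomposition.
    Xc = X - X.mean(axis=0)
    # Rows of Vt are orthonormal principal directions, ordered by
    # decreasing variance explained.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:n_centroids]
```

For a run that seeks k clusters over three rounds, a caller would pass `n_centroids = k // 3` for each round.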

  28. Example • First iteration of clustering

  29. Iteration of the approach • Remove genes that have a Pearson correlation greater than 0.7 with a particular cluster • Intuition: the strong signal from these genes has already been accounted for • Repeat
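The removal step can be sketched as follows, assuming a gene is dropped when its Pearson correlation with any centroid exceeds the cutoff. The function name `remove_explained_genes` is hypothetical.

```python
import numpy as np

def remove_explained_genes(X, V, cutoff=0.7):
    """Drop genes whose Pearson correlation with any centroid exceeds cutoff.

    Returns the remaining genes and the boolean mask of kept rows.
    """
    n_genes = X.shape[0]
    # Stack genes and centroids so a single corrcoef call yields every
    # gene-versus-centroid correlation in the upper-right block.
    corr = np.corrcoef(np.vstack([X, V]))[:n_genes, n_genes:]
    keep = ~(corr > cutoff).any(axis=1)
    return X[keep], keep
```

The genes that remain are the input to the next round of clustering.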

  30. Removing Duplicate Centroids • Centroids with Pearson correlation > 0.9 will be averaged. • Allows selecting a large initial number of clusters, since duplicates will be removed

  31. Repeat 3 times • Output: • Cluster means • Gene assignments to clusters

  32. Regulatory systems that govern the expression of overlapping sets of genes in yeast.

  33. Fuzzy K means ADVANTAGES • The method can present overlapping clusters, revealing distinct features of each gene's function and regulation. • The resulting implications can be used to assign refined hypothetical functions to uncharacterized gene products and additional cellular roles to well-studied proteins.

  34. Fuzzy K means ADVANTAGES • It presents more comprehensive groups of conditionally coregulated genes. • It elucidates the environmental conditions that trigger changes in gene expression. • It requires no a priori information about the dataset.

  35. Fuzzy K means DISADVANTAGES • Assigning genes to a cluster requires a user-defined cutoff, and selecting a meaningful cutoff is a challenge. • Fuzzy K means failed to identify a small number of groups that were identified by hierarchical clustering.
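A short sketch of why the cutoff matters, assuming the cutoff is applied to membership values. The function name `assign_by_cutoff` and the example numbers are hypothetical.

```python
import numpy as np

def assign_by_cutoff(m, cutoff):
    """Return, for each gene, the indices of clusters with membership >= cutoff."""
    return [np.flatnonzero(row >= cutoff).tolist() for row in m]

# Hypothetical memberships: 2 genes x 3 clusters, each row sums to 1.
m = np.array([
    [0.50, 0.45, 0.05],
    [0.90, 0.05, 0.05],
])

loose = assign_by_cutoff(m, 0.4)    # first gene lands in two clusters
strict = assign_by_cutoff(m, 0.6)   # first gene lands in no cluster
```

The same membership matrix yields overlapping assignments under a loose cutoff and unassigned genes under a strict one, which is the difficulty the slide describes.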

  36. My opinion • The unique advantages of fuzzy K means clustering make the technique a valuable tool for gene expression analysis. Its flexibility can be used to reveal more complex correlations between gene expression patterns, promoting refined hypotheses about the role and regulation of gene expression changes.

  37. To overcome these limitations, combining hierarchical clustering with fuzzy K means can be useful.

  38. Thank you!
