
Making Sense of Complicated Microarray Data Part II Gene Clustering and Data Analysis

Gabriel Eichler, Boston University. Some slides adapted from: MeV documentation slides.


Presentation Transcript


  1. Making Sense of Complicated Microarray Data, Part II: Gene Clustering and Data Analysis. Gabriel Eichler, Boston University. Some slides adapted from: MeV documentation slides

  2. Why Cluster? • Clustering is a process by which you can explore your data in an efficient manner. • Visualization of data can help you review the data quality. • Assumption: Guilt by association – similar gene expression patterns may indicate a biological relationship.

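  3. Expression Vectors • Gene expression vectors encapsulate the expression of a gene over a set of experimental conditions or sample types. • Example numeric vector: (1.5, -0.8, 1.8, 0.5, -0.4, -1.3, 1.5, 0.8) [Figure: the same vector shown as a numeric vector, a line graph, and a heatmap; color scale from -2 to 2.]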

  4. Expression Vectors As Points in ‘Expression Space’

          t1     t2     t3
     G1  -0.8   -0.3   -0.7
     G2  -0.7   -0.8   -0.4
     G3  -0.4   -0.6   -0.8
     G4   0.9    1.2    1.3
     G5   1.3    0.9   -0.6

     [Figure: the genes plotted as points on the axes Experiment 1, Experiment 2 and Experiment 3; annotation: Similar Expression.]

  5. Distance and Similarity - The ability to calculate a distance (or similarity, its inverse) between two expression vectors is fundamental to clustering algorithms. - Distance between vectors is the basis upon which decisions are made when grouping similar patterns of expression. - Selection of a distance metric defines the concept of distance.

  6. Distance: a measure of similarity between gene expression.

              Exp 1   Exp 2   Exp 3   Exp 4   Exp 5   Exp 6
     Gene A   x1A     x2A     x3A     x4A     x5A     x6A
     Gene B   x1B     x2B     x3B     x4B     x5B     x6B

     • Some distances: (MeV provides 11 metrics)
     1. Euclidean: $\sqrt{\sum_{i=1}^{6} (x_{iA} - x_{iB})^2}$
     2. Manhattan: $\sum_{i=1}^{6} |x_{iA} - x_{iB}|$
     3. Pearson correlation
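The three distances named on this slide can be sketched in a few lines of plain Python. This is an illustration only, not MeV's implementation: the function names are hypothetical, the sample vectors are the G1 to G5 values from the 'Expression Space' slide, and turning Pearson correlation into a distance as 1 - r is one common convention assumed here.

```python
import math

# Expression vectors (t1, t2, t3) taken from the 'Expression Space' slide.
genes = {
    "G1": [-0.8, -0.3, -0.7],
    "G2": [-0.7, -0.8, -0.4],
    "G3": [-0.4, -0.6, -0.8],
    "G4": [0.9, 1.2, 1.3],
    "G5": [1.3, 0.9, -0.6],
}

def euclidean(a, b):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of the absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def pearson(a, b):
    """Pearson correlation coefficient r, between -1 and 1."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def pearson_distance(a, b):
    """Assumed convention: distance = 1 - r, so identical shapes give 0."""
    return 1.0 - pearson(a, b)

for other in ("G2", "G4"):
    print(
        "G1 vs", other,
        round(euclidean(genes["G1"], genes[other]), 3),
        round(manhattan(genes["G1"], genes[other]), 3),
        round(pearson_distance(genes["G1"], genes[other]), 3),
    )
```

Euclidean and Manhattan distance compare magnitudes, while a correlation-based distance compares the shape of the profile, so the choice of metric changes which genes count as similar.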

  7. Clustering Algorithms

  8. Clustering Algorithms • Be wary: confounding computational artifacts are associated with all clustering algorithms. You should always understand the basic concepts behind an algorithm before using it. • Anything will cluster! Garbage in means garbage out.

  9. Hierarchical Clustering • IDEA: Iteratively combine genes into groups based on similar patterns of observed expression. • By combining genes with genes OR genes with groups, the algorithm produces a dendrogram of the hierarchy of relationships. • Display the data as a heatmap and dendrogram. • Cluster genes, samples or both (HCL-1)
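The iterative merging described on this slide can be sketched as follows. This is a minimal pure-Python illustration, not the HCL code in MeV: the slide does not name a linkage rule, so average linkage with Euclidean distance is an assumption, and `hierarchical_cluster` is a hypothetical name. The data are the G1 to G5 vectors from the 'Expression Space' slide.

```python
import math
from itertools import combinations

genes = {
    "G1": [-0.8, -0.3, -0.7],
    "G2": [-0.7, -0.8, -0.4],
    "G3": [-0.4, -0.6, -0.8],
    "G4": [0.9, 1.2, 1.3],
    "G5": [1.3, 0.9, -0.6],
}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(members_a, members_b, data):
    """Mean pairwise distance between two groups (assumed linkage rule)."""
    total = sum(euclidean(data[i], data[j]) for i in members_a for j in members_b)
    return total / (len(members_a) * len(members_b))

def hierarchical_cluster(data):
    """Merge the two closest groups until one remains.

    Returns the dendrogram as nested tuples plus the list of merges,
    each recorded as (sorted member names, merge distance).
    """
    # Each cluster is (subtree, member names); every gene starts alone.
    clusters = [(name, [name]) for name in data]
    merges = []
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = average_linkage(clusters[i][1], clusters[j][1], data)
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        subtree = (clusters[i][0], clusters[j][0])
        members = clusters[i][1] + clusters[j][1]
        merges.append((sorted(members), round(d, 3)))
        # Replace the two merged groups with the new, larger group.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((subtree, members))
    return clusters[0][0], merges

tree, merges = hierarchical_cluster(genes)
for members, distance in merges:
    print(distance, members)
print(tree)
```

Genes merge with genes first, then genes with groups, and the nested tuples record the order of merging, which is what a dendrogram draws.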

  10-17. Hierarchical Clustering [Figure sequence over eight slides; leaf labels: Gene 1 through Gene 8.]

  18. Hierarchical Clustering [Figure; color scale labeled H to L.]

  19. Hierarchical Clustering: The Leaf Ordering Problem • Find the ‘optimal’ layout of branches for a given dendrogram architecture • $2^{N-1}$ possible orderings of the branches • For a small microarray dataset of 500 genes there are about $1.6 \times 10^{150}$ branch configurations [Figure: heatmap with axes labeled Samples and Genes.]
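The count on this slide can be checked directly, reading the exponent as N-1: a binary dendrogram with N leaves has N-1 internal nodes, and each one can be flipped independently without changing the tree. A short sketch (the function name is hypothetical):

```python
def leaf_orderings(n_leaves):
    """Number of branch layouts for a binary dendrogram with n_leaves leaves.

    Each of the n_leaves - 1 internal nodes can swap its two children,
    giving 2 ** (n_leaves - 1) layouts of the same tree.
    """
    return 2 ** (n_leaves - 1)

count = leaf_orderings(500)
# Python integers are exact, so the full 151-digit value is available;
# scientific notation gives roughly 1.6e150, as on the slide.
print(f"{count:.2e}")
print(len(str(count)), "digits")
```

Exhaustive search over that many layouts is not feasible, which is why leaf ordering is treated as an optimization problem in its own right.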

  20. Hierarchical Clustering The Leaf Ordering Problem:

  21. Hierarchical Clustering • Pros: • Commonly used algorithm • Simple and quick to calculate • Cons: • Real genes probably do not have a hierarchical organization

  22. Self-Organizing Maps (SOMs) • Idea: Place genes onto a grid so that genes with similar patterns of expression are placed on nearby squares. [Figure labels: A, B, C, D and a, b, c, d.]

  23. Self-Organizing Maps (SOMs) • Idea: Place genes onto a grid so that genes with similar patterns of expression are placed on nearby squares. [Figure labels: A, B, C, D and a, b, c, d.]
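A very small self-organizing map can illustrate the idea of placing genes on a grid. This is a sketch under stated assumptions, not the algorithm as implemented in MeV or GEDI: the 2x2 grid, the linearly decaying learning rate and neighborhood radius, the Gaussian neighborhood, and the names `train_som` and `best_matching_unit` are all choices made here for illustration. The data are the G1 to G5 vectors from the 'Expression Space' slide.

```python
import math
import random

genes = {
    "G1": [-0.8, -0.3, -0.7],
    "G2": [-0.7, -0.8, -0.4],
    "G3": [-0.4, -0.6, -0.8],
    "G4": [0.9, 1.2, 1.3],
    "G5": [1.3, 0.9, -0.6],
}

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_matching_unit(weights, vector):
    """Grid square whose weight vector is closest to the given vector."""
    return min(weights, key=lambda node: sq_dist(weights[node], vector))

def train_som(vectors, rows=2, cols=2, epochs=200, lr0=0.5, radius0=1.0, seed=0):
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    dim = len(vectors[0])
    # Start every grid square at a randomly chosen input vector.
    weights = {(r, c): list(rng.choice(vectors))
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        remaining = 1.0 - epoch / epochs
        lr = lr0 * remaining                      # learning rate shrinks
        radius = max(radius0 * remaining, 0.01)   # neighborhood shrinks
        order = vectors[:]
        rng.shuffle(order)
        for v in order:
            bmu = best_matching_unit(weights, v)
            for node, w in weights.items():
                # Squares near the winner on the grid move more.
                influence = math.exp(-sq_dist(node, bmu) / (2 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * influence * (v[k] - w[k])
    return weights

som = train_som(list(genes.values()))
placement = {name: best_matching_unit(som, vec) for name, vec in genes.items()}
for name, square in placement.items():
    print(name, "->", square)
```

Because neighboring squares are pulled toward the same inputs during training, genes with similar profiles tend to end up on the same or adjacent squares; the exact layout depends on the seed and the parameters.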

  24. Self-Organizing Maps (SOMs)

  25. Self-Organizing Maps (SOMs)

  26. The Gene Expression Dynamics Inspector (GEDI) • GEDI’s Features: • Allows for simultaneous analysis of several time courses or datasets • Displays the data in an intuitive and comparable, mathematically driven visualization • The same genes map to the same tiles [Figure: a genes-by-samples matrix (Gene 1 through Gene 6, ...; samples A1-A4, B1-B4 and C1-C4 in Groups A, B and C) alongside GEDI mosaics for Groups A, B and C, numbered 1 to 4; color scale labeled H to L.]

  27. Software Demonstrations MeV available at http://www.tigr.org/software/tm4/mev.html GEDI available at http://www.chip.org/~ge/gedihome.htm

  28. G.E.D.I. allows the direct visual assessment of the quality of conventional cluster analysis. Comparison of GEDI vs. Hierarchical Clustering. Hierarchical clustering of random data (GIGO). From: CreateGEP_Journal.wpd, random_A
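The garbage-in, garbage-out point can be reproduced with a short experiment: hierarchical clustering applied to pure noise still returns tidy-looking clusters. The sketch below uses only the standard library; the data are synthetic random numbers (not the `random_A` dataset named on the slide), and the average linkage, the 30 genes by 6 conditions, and the choice of 4 clusters are assumptions made for illustration.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cut_into_clusters(vectors, k):
    """Agglomerative clustering (average linkage, assumed) stopped at k groups.

    Returns a list of k lists of row indices into `vectors`.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(euclidean(vectors[i], vectors[j])
                            for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return clusters

# 30 synthetic 'genes' over 6 'conditions', drawn from pure Gaussian noise.
rng = random.Random(42)
random_genes = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(30)]

clusters = cut_into_clusters(random_genes, k=4)
print([len(c) for c in clusters])
```

The algorithm always delivers the requested number of groups, whether or not the data contain any structure, so the existence of clusters is not evidence of a biological relationship by itself.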

  29. Questions
