Making Sense of Complicated Microarray Data Part II Gene Clustering and Data Analysis

Gabriel Eichler, Boston University. Some slides adapted from the MeV documentation slides.

Why Cluster? • Clustering is a process by which you can explore your data in an efficient manner. • Visualization of data can help you review the data quality. • Assumption: Guilt by association – similar gene expression patterns may indicate a biological relationship.

Expression Vectors • Gene expression vectors encapsulate the expression of a gene over a set of experimental conditions or sample types. • Example numeric vector: 1.5 -0.8 1.8 0.5 -0.4 -1.3 1.5 0.8 [Figure: the same vector shown as a numeric vector, a line graph, and a heatmap; scale from -2 to 2]

Expression Vectors As Points in ‘Expression Space’
      t1    t2    t3
G1   -0.8  -0.3  -0.7
G2   -0.7  -0.8  -0.4
G3   -0.4  -0.6  -0.8
G4    0.9   1.2   1.3
G5    1.3   0.9  -0.6
[Figure: genes plotted as points on the axes Experiment 1, Experiment 2, and Experiment 3; a group of nearby points is labeled "Similar Expression"]

Distance and Similarity • The ability to calculate a distance (or similarity, its inverse) between two expression vectors is fundamental to clustering algorithms. • Distance between vectors is the basis upon which decisions are made when grouping similar patterns of expression. • Selection of a distance metric defines the concept of distance.

Distance: a measure of similarity between gene expression.
          Exp 1  Exp 2  Exp 3  Exp 4  Exp 5  Exp 6
Gene A    x1A    x2A    x3A    x4A    x5A    x6A
Gene B    x1B    x2B    x3B    x4B    x5B    x6B
Some distances (MeV provides 11 metrics):
1. Euclidean: $\sqrt{\sum_{i=1}^{6} (x_{iA} - x_{iB})^2}$
2. Manhattan: $\sum_{i=1}^{6} |x_{iA} - x_{iB}|$
3. Pearson correlation
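The three metrics named on this slide can be sketched in a few lines of Python. This is only an illustration, not MeV's implementation: the function names are made up for this example, the sample vectors are taken from the expression-space slide, and turning the Pearson correlation r into a distance as 1 - r is one common convention that is assumed here.

```python
import math

def euclidean(a, b):
    # Square root of the summed squared differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of the absolute differences
    return sum(abs(x - y) for x, y in zip(a, b))

def pearson_distance(a, b):
    # Assumption: distance = 1 - r, where r is the Pearson correlation
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)

# Vectors G1, G2, and G4 from the expression-space slide
g1 = [-0.8, -0.3, -0.7]
g2 = [-0.7, -0.8, -0.4]
g4 = [0.9, 1.2, 1.3]

print(euclidean(g1, g2), euclidean(g1, g4))
print(manhattan(g1, g2), manhattan(g1, g4))
print(pearson_distance(g1, g2), pearson_distance(g1, g4))
```

Under Euclidean and Manhattan distance, G1 sits much closer to G2 than to G4. Pearson distance ignores magnitude and offset, so it can rank pairs differently. This is why the choice of metric defines what "similar" means.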

Clustering Algorithms

Clustering Algorithms • Be wary: confounding computational artifacts are associated with all clustering algorithms. You should always understand the basic concepts behind an algorithm before using it. • Anything will cluster! Garbage in means garbage out.

Hierarchical Clustering • IDEA: Iteratively combine genes into groups based on similar patterns of observed expression. • By combining genes with genes OR genes with groups, the algorithm produces a dendrogram of the hierarchy of relationships. • Display the data as a heatmap and dendrogram. • Cluster genes, samples, or both. (HCL-1)
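The iterative merging described above might look like the following minimal sketch. It assumes Euclidean distance and average linkage, neither of which the slide specifies. The function names are hypothetical, and the nested tuple is just a simple stand-in for a dendrogram.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(members_1, members_2, data):
    # Mean pairwise distance between the genes of two clusters
    total = sum(euclidean(data[i], data[j]) for i in members_1 for j in members_2)
    return total / (len(members_1) * len(members_2))

def hierarchical_cluster(data):
    """data maps gene name -> expression vector; returns a nested-tuple tree."""
    # Each cluster is (subtree, list of member gene names)
    clusters = [(name, [name]) for name in data]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i][1], clusters[j][1], data)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        del clusters[j]  # delete the higher index first so i stays valid
        del clusters[i]
        clusters.append(merged)
    return clusters[0][0]

def leaves(tree):
    # Collect the gene names under a subtree
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])

# Values from the expression-space slide
genes = {
    "G1": [-0.8, -0.3, -0.7],
    "G2": [-0.7, -0.8, -0.4],
    "G3": [-0.4, -0.6, -0.8],
    "G4": [0.9, 1.2, 1.3],
    "G5": [1.3, 0.9, -0.6],
}
tree = hierarchical_cluster(genes)
print(tree)
```

Each pass merges the two closest clusters, whether they are single genes or groups, until one tree remains. The exhaustive pair search keeps the sketch short but would be slow on a real microarray dataset.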

Hierarchical Clustering [Figure: Gene 1 through Gene 8]

Hierarchical Clustering [Figure: color scale from H to L]

Hierarchical Clustering • The Leaf Ordering Problem: • Find the ‘optimal’ layout of branches for a given dendrogram architecture. • A dendrogram with N leaves has $2^{N-1}$ possible orderings of the branches. • For a small microarray dataset of 500 genes, there are about $1.6 \times 10^{150}$ branch configurations. [Figure: heatmap with axes Samples and Genes]
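The count on this slide can be checked directly. A binary dendrogram with N leaves has N-1 internal nodes, and each node can be flipped independently without changing the tree, so there are 2^(N-1) orderings. The helper name below is made up for this example.

```python
def leaf_orderings(n_leaves):
    # Each of the n_leaves - 1 internal nodes can be flipped independently
    return 2 ** (n_leaves - 1)

count = leaf_orderings(500)
print(f"{count:.1e}")  # prints 1.6e+150
```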

Hierarchical Clustering The Leaf Ordering Problem:

Hierarchical Clustering • Pros: • Commonly used algorithm • Simple and quick to calculate • Cons: • Real genes probably do not have a hierarchical organization

Self-Organizing Maps (SOMs) • IDEA: Place genes onto a grid so that genes with similar patterns of expression are placed on nearby squares. [Figure labels: A, B, C, D; a, b, c, d]
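The grid-placement idea could be sketched roughly as follows. This is a toy illustration, not the algorithm as implemented in MeV or GEDI. The 2x2 grid, learning rate, Gaussian neighborhood, linear decay schedule, and all function and parameter names are assumptions made for this example.

```python
import math
import random

def best_matching_unit(weights, vector):
    """Return the grid coordinate whose weight vector is closest to `vector`."""
    return min(weights, key=lambda node: math.dist(weights[node], vector))

def train_som(data, rows=2, cols=2, steps=200, lr0=0.5, radius0=1.0, seed=0):
    rng = random.Random(seed)
    dim = len(next(iter(data.values())))
    # One randomly initialised weight vector per grid square
    weights = {(r, c): [rng.uniform(-1, 1) for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    names = list(data)
    for step in range(steps):
        frac = 1.0 - step / steps              # decays linearly from 1 toward 0
        lr = lr0 * frac
        radius = max(radius0 * frac, 0.01)
        vec = data[rng.choice(names)]          # present one random gene
        bmu = best_matching_unit(weights, vec)
        for node, w in weights.items():
            # Squares near the winner on the grid are pulled more strongly
            grid_dist = math.dist(node, bmu)
            influence = math.exp(-(grid_dist ** 2) / (2 * radius ** 2))
            for k in range(dim):
                w[k] += lr * influence * (vec[k] - w[k])
    return weights

# Values from the expression-space slide
genes = {
    "G1": [-0.8, -0.3, -0.7],
    "G2": [-0.7, -0.8, -0.4],
    "G3": [-0.4, -0.6, -0.8],
    "G4": [0.9, 1.2, 1.3],
    "G5": [1.3, 0.9, -0.6],
}
weights = train_som(genes)
placement = {name: best_matching_unit(weights, vec) for name, vec in genes.items()}
print(placement)
```

After training, each gene is assigned to the square whose weight vector it matches best. Because neighboring squares are updated together, genes with similar expression tend to land on the same or adjacent squares.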


Self-organizing Maps (SOMs)


The Gene Expression Dynamics Inspector (GEDI) • GEDI’s features: • Allows for simultaneous analysis of several time courses or datasets • Displays the data in an intuitive, comparable, mathematically driven visualization • The same genes map to the same tiles [Figure: samples A1-A4, B1-B4, and C1-C4 in Groups A, B, and C; Genes 1 through 6; color scale from H to L]

Software Demonstrations MeV available at http://www.tigr.org/software/tm4/mev.html GEDI available at http://www.chip.org/~ge/gedihome.htm

G.E.D.I. allows the direct visual assessment of the quality of conventional cluster analysis • Comparison of GEDI vs. hierarchical clustering • Hierarchical clustering of random data (GIGO) • From: CreateGEP_Journal.wpd, random_A

Questions
