In previous lectures: -- Identifying differentially expressed genes from replicates

corozco

Presentation Transcript


  1. In previous lectures:
     -- Identifying differentially expressed genes from replicates
        Parametric tests:
        -- T-test (are the means of 2 samples different?)
        -- ANOVA (are the means of 2 or more samples different?)
        Bayesian methods: baySeq
     -- Bonferroni vs. q-value & other FDR corrections for multiple testing
        FDR (false discovery rate): gives the % of false positives in the SET of called genes
     -- Defining sensitivity and specificity
     baySeq issues??

  2. Now you have selected a subset of genes to focus on … But even then, there is often still an overwhelming amount of data. Need some strategies to simplify the analysis & visualization

  3. One Strategy: focus initially on groups of genes

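  4. [Figure: Gene X measured on Array 1, Array 2, and Array 3 gives the expression vector (X1, X2, X3); each array's value is plotted as one coordinate (x, y, or z) of a point in 3-D space.]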

  6. Practically speaking, the Pearson correlation R is a normalized sum, over all N arrays, of pairwise comparisons of the gene expression values in two gene expression vectors.

     Standard Pearson Correlation:

     $$R_{x,y} = \frac{1}{N} \sum_{i=1}^{N} \left(\frac{X_i - \bar{X}}{SD_x}\right) \left(\frac{Y_i - \bar{Y}}{SD_y}\right)$$

              Array 1  Array 2  Array 3  Array 4  Array 5
     Gene X:  X1       X2       X3       X4       X5
     Gene Y:  Y1       Y2       Y3       Y4       Y5

     Pearson correlation ranges from -1 (anticorrelated) through 0 (uncorrelated) to 1 (identical).
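The standard Pearson correlation can be sketched in a few lines of plain Python. This is a minimal illustration, not the lab software: the function name `pearson` and the expression values are hypothetical, and the standard deviations are assumed to be population SDs (dividing by N) so that they match the 1/N in the formula.

```python
import math

def pearson(x, y):
    """Standard (centered) Pearson correlation of two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population standard deviations (divide by N), an assumption that
    # matches the 1/N normalization in the formula.
    sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return products / (n * sd_x * sd_y)

# Hypothetical expression values for three genes across five arrays.
gene_x = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_y = [2.0, 4.0, 6.0, 8.0, 10.0]  # same pattern as gene_x, scaled
gene_z = [5.0, 4.0, 3.0, 2.0, 1.0]   # opposite pattern to gene_x

r_xy = pearson(gene_x, gene_y)  # close to 1 (correlated)
r_xz = pearson(gene_x, gene_z)  # close to -1 (anticorrelated)
```

A gene whose values are constant across all arrays has SD = 0, so this sketch would divide by zero; real data would need that case handled.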

  7. Practically speaking, the Pearson correlation R is a normalized sum, over all N arrays, of pairwise comparisons of the gene expression values in two gene expression vectors.

     Uncentered Pearson Correlation (set the means of X and Y to 0):

     $$R_{x,y} = \frac{1}{N} \sum_{i=1}^{N} \left(\frac{X_i}{\sqrt{\frac{1}{N}\sum_{j=1}^{N} X_j^2}}\right) \left(\frac{Y_i}{\sqrt{\frac{1}{N}\sum_{j=1}^{N} Y_j^2}}\right)$$

              Array 1  Array 2  Array 3  Array 4  Array 5
     Gene X:  X1       X2       X3       X4       X5
     Gene Y:  Y1       Y2       Y3       Y4       Y5

     Using Standard Pearson Correlation: same pattern + constant offset gives a Pearson correlation of 1.0
     Using Uncentered Pearson Correlation: same pattern + constant offset does not give 1.0
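The effect of a constant offset on the two measures can be checked with a small sketch. The function names and the expression values below are hypothetical and chosen only for illustration.

```python
import math

def pearson_centered(x, y):
    """Standard Pearson correlation: means are subtracted first."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return products / (n * sd_x * sd_y)

def pearson_uncentered(x, y):
    """Uncentered Pearson correlation: the means are treated as 0."""
    n = len(x)
    norm_x = math.sqrt(sum(xi * xi for xi in x) / n)
    norm_y = math.sqrt(sum(yi * yi for yi in y) / n)
    return sum(xi * yi for xi, yi in zip(x, y)) / (n * norm_x * norm_y)

# Hypothetical data: gene_y has the same pattern as gene_x plus an offset of 3.
gene_x = [1.0, -1.0, 2.0, -2.0, 0.0]
gene_y = [xi + 3.0 for xi in gene_x]

centered = pearson_centered(gene_x, gene_y)      # offset is ignored
uncentered = pearson_uncentered(gene_x, gene_y)  # offset lowers the score
```

With these values the centered correlation is 1.0 while the uncentered one is well below 1.0, which is why the uncentered form is useful when the magnitude of the offset is itself meaningful.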

  8. Sometimes, want to use the weighted Pearson correlation

     $$P_{x,y} = \frac{1}{N} \sum_{i=1}^{N} \left(\frac{X_i}{\sqrt{\frac{1}{N}\sum_{j=1}^{N} X_j^2}}\right) \left(\frac{Y_i}{\sqrt{\frac{1}{N}\sum_{j=1}^{N} Y_j^2}}\right)$$

              Array 1  Array 2  Array 3  Array 4  Array 5
     Gene X:  X1       X2       X3       X4       X5
     Gene Y:  Y1       Y2       Y3       Y4       Y5

     For example: if these arrays are identical, the data are over-represented 3X
     You will experiment with this in lab
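One way to picture array weighting is sketched below. The slide does not show where the weights enter the formula, so the form used here is an assumption: each array i gets a weight w_i that multiplies its term in every sum. The function name and data are hypothetical. Three identical arrays are each given weight 1/3 so that together they count as one array.

```python
import math

def weighted_uncentered_pearson(x, y, weights):
    """Uncentered Pearson correlation with one weight per array (assumed form)."""
    sum_xy = sum(w * xi * yi for w, xi, yi in zip(weights, x, y))
    sum_xx = sum(w * xi * xi for w, xi in zip(weights, x))
    sum_yy = sum(w * yi * yi for w, yi in zip(weights, y))
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical data: arrays 1-3 are identical, so they are over-represented 3X.
gene_x = [2.0, 2.0, 2.0, 1.0, -1.0]
gene_y = [1.0, 1.0, 1.0, -2.0, 3.0]

equal_weights = [1.0] * 5
down_weights = [1 / 3, 1 / 3, 1 / 3, 1.0, 1.0]  # triplicate counts once in total

unweighted = weighted_uncentered_pearson(gene_x, gene_y, equal_weights)
weighted = weighted_uncentered_pearson(gene_x, gene_y, down_weights)

# Same data with the triplicate collapsed to a single array, for comparison.
deduplicated = weighted_uncentered_pearson(
    [2.0, 1.0, -1.0], [1.0, -2.0, 3.0], [1.0] * 3
)
```

Under this assumed form, down-weighting the triplicate gives the same correlation as removing the duplicates, while the unweighted value is pulled toward the pattern of the repeated arrays.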

  9. Excellent review by J. Quackenbush, 2001, Nature Reviews Genetics

  10. Hierarchical clustering: The goal is to organize the entire dataset into one hierarchical arrangement. Known as a “bottom-up” or agglomerative clustering method. Two parts: 1) Calculating gene similarity; 2) Organizing genes such that similarly expressed genes are grouped together

  11. Two steps of hierarchical clustering. 1. Calculating the similarity matrix: calculate the Pearson correlation for every pair of genes

  12. Two steps of hierarchical clustering. 1. Calculating the similarity matrix: end up with a symmetrical table of Pearson correlations
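Building the similarity matrix can be sketched as follows. The gene names and expression values are hypothetical, and `similarity_matrix` is an illustrative helper rather than a function from any particular clustering package. Because R(x, y) = R(y, x), only the upper triangle is computed and then mirrored, which is what makes the table symmetrical.

```python
import math

def pearson(x, y):
    """Standard Pearson correlation of two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return products / (n * sd_x * sd_y)

def similarity_matrix(genes):
    """Return gene names and the symmetric matrix of pairwise correlations."""
    names = list(genes)
    n = len(names)
    matrix = [[1.0] * n for _ in range(n)]  # each gene correlates 1.0 with itself
    for i in range(n):
        for j in range(i + 1, n):
            r = pearson(genes[names[i]], genes[names[j]])
            matrix[i][j] = r  # upper triangle
            matrix[j][i] = r  # mirrored entry
    return names, matrix

# Hypothetical expression data: gene name -> values across four arrays.
genes = {
    "Gene 1": [1.0, 2.0, 3.0, 4.0],
    "Gene 2": [2.0, 4.0, 6.0, 9.0],
    "Gene 3": [4.0, 3.0, 2.0, 1.0],
}
names, matrix = similarity_matrix(genes)
```

For G genes this needs G(G - 1)/2 correlations, which is why the similarity step dominates the cost for large gene sets.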

  13. Two steps of hierarchical clustering. 1. Calculating the similarity matrix: find the largest Pearson correlation & join those genes (here, Gene 2 and Gene 5) together on a node

  14. Two steps of hierarchical clustering. 1. Calculating the similarity matrix: should Gene 10 get added onto the node joining Gene 2 and Gene 5?

  16. 4. Centroid linkage clustering

  17. 4. Centroid linkage clustering: each node is represented by its ‘centroid’ (average vector)
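Centroid linkage can be sketched as a loop that repeatedly joins the two most correlated items, where a joined node is represented by the average vector of its member genes. This is a simplified illustration under stated assumptions, not the algorithm of any specific tool: the gene names and values are hypothetical, similarity is the standard Pearson correlation, and the centroid is recomputed as the plain average over all member genes.

```python
import math

def pearson(x, y):
    """Standard Pearson correlation of two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return products / (n * sd_x * sd_y)

def centroid(vectors):
    """Average vector: the mean of each array (column) across the vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def centroid_linkage(genes):
    """Agglomerative clustering; returns the list of merges in order."""
    # Each cluster is (member gene names, centroid vector).
    clusters = [([name], vector) for name, vector in genes.items()]
    merges = []
    while len(clusters) > 1:
        best = None  # (correlation, index i, index j) of the closest pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                r = pearson(clusters[i][1], clusters[j][1])
                if best is None or r > best[0]:
                    best = (r, i, j)
        r, i, j = best
        merges.append((clusters[i][0], clusters[j][0], r))
        members = clusters[i][0] + clusters[j][0]
        merged = (members, centroid([genes[m] for m in members]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical expression data across five arrays.
genes = {
    "Gene 2": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Gene 5": [1.1, 2.0, 3.2, 3.9, 5.1],
    "Gene 10": [2.0, 2.5, 4.5, 4.0, 6.0],
    "Gene 7": [5.0, 4.0, 3.0, 2.5, 1.0],
}
merges = centroid_linkage(genes)
```

With these values Gene 2 and Gene 5 are joined first; Gene 10 is then compared against the centroid of that node rather than against either gene individually, and is joined next. Other linkage rules differ only in how a node is compared with the remaining items.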

  18. Visualization: Data are often converted to a colorimetric scale
      Each box: a transcript measurement
      Each row of boxes: transcript measurements for a given gene
      Each column of boxes: transcript measurements from a single array
      Red: higher transcript abundance in one sample
      Green: higher transcript abundance in the other sample
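The conversion to a color scale can be sketched as a function from one measurement to an RGB triple. Several things here are assumptions for illustration: that each measurement is a log ratio between the two samples, that 0 maps to black, and that colors saturate at a hypothetical limit of 3.0. The function name `ratio_to_rgb` is also hypothetical.

```python
def ratio_to_rgb(log_ratio, saturation=3.0):
    """Map a log ratio to (red, green, blue), each channel in 0..255."""
    # Clamp extreme values so that outliers do not dominate the scale.
    clipped = max(-saturation, min(saturation, log_ratio))
    intensity = round(255 * abs(clipped) / saturation)
    if clipped > 0:
        return (intensity, 0, 0)  # red: higher in one sample
    if clipped < 0:
        return (0, intensity, 0)  # green: higher in the other sample
    return (0, 0, 0)              # black: no difference

# Hypothetical data: rows are genes, columns are arrays.
log_ratios = [
    [3.0, 1.5, 0.0],
    [-5.0, -1.5, 0.3],
]
# One colored box per measurement, keeping the row/column layout.
heatmap = [[ratio_to_rgb(value) for value in row] for row in log_ratios]
```

A blue/yellow version would only change which channels are filled in; the row and column layout stays the same.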

  19. Unweighted Pearson correlation (red/green version) (blue/yellow version)
