
MINING THE GENE EXPRESSION MATRIX: INFERRING GENE RELATIONSHIPS FROM LARGE SCALE GENE EXPRESSION DATA

Patrik D'haeseleer, Xiling Wen, Stefanie Fuhrman, and Roland Somogyi

Information Processing in Cells and Tissues, pp. 203-212, 1998

Presented by Bin He


Motivations

  • it is necessary to determine large-scale temporal gene expression patterns

  • to decipher the logic of gene regulation, we should aim to monitor the expression levels of all genes simultaneously


Gene time series

  • assay the expression levels of large numbers of genes in a tissue at different time points

  • Gene time series

    the relative amounts of mRNA produced at these time points provide a gene expression time series for each gene


Gene Expression Matrix

  • Wen, X., Fuhrman, S., Michaels, G.S., Carr, D.B., Smith, S., Barker, J.L., and Somogyi, R., 1997, Large-scale temporal gene expression mapping of CNS development, Proc. Natl. Acad. Sci., in press


Previous Approach

  • Euclidean distance and information-theoretic measures were used to cluster genes with related expression time series

  • A significant problem with this approach is the variety of measures that can be used

  • Each measure produces a unique clustering of gene expression patterns
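The disagreement between measures is easy to see on toy data. The minimal sketch below uses made-up 9-point profiles (hypothetical, not data from the paper) and compares Euclidean distance with Pearson correlation; it only illustrates why different measures can group genes differently and is not the clustering procedure used in the earlier work.

```python
import math

def pearson_r(x, y):
    """Pearson linear correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical profiles at 9 time points (illustrative only).
g1 = [0.1, 0.3, 0.5, 0.9, 1.0, 0.8, 0.6, 0.4, 0.2]
scaled = [3 * v for v in g1]        # same shape, three times the amplitude
mirror = [1.0 - v for v in g1]      # opposite shape
noisy = [0.5, 0.55, 0.45, 0.5, 0.52, 0.48, 0.5, 0.53, 0.47]  # roughly flat

for name, other in [("scaled", scaled), ("mirror", mirror), ("noisy", noisy)]:
    print(f"{name:6s}  euclidean={math.dist(g1, other):.2f}  r={pearson_r(g1, other):+.2f}")
```

Here Euclidean distance ranks the roughly flat profile as the nearest neighbour of g1, while correlation pairs g1 with the scaled (r = +1) and mirrored (r = -1) profiles.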


Contributions

  • determining significant relationships between individual genes, based on:

    • linear correlation

    • rank correlation

    • information theory


Linear correlation: positive correlation

  • positive linear correlation


Linear correlation: negative correlation

  • negative linear correlation
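For reference, the coefficient behind the positive and negative correlation slides is the standard Pearson correlation over the n time points of two series x and y (the symbols x_t, y_t and n are notation assumed here, since the slide figures are not preserved):

```latex
\[
r = \frac{\sum_{t=1}^{n} (x_t - \bar{x})(y_t - \bar{y})}
         {\sqrt{\sum_{t=1}^{n} (x_t - \bar{x})^2}\,\sqrt{\sum_{t=1}^{n} (y_t - \bar{y})^2}},
\qquad -1 \le r \le 1
\]
```

r = +1 for a perfect positive and r = -1 for a perfect negative linear relationship, and r^2 is the fraction of the variance of one series explained by a linear fit on the other.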


Linear correlation: restriction

  • for 112 different genes, $112 \times 111 / 2 = 6216$ pairs of expression time series need to be examined

  • to restrict the number of relationships, we might test which correlations are significantly larger in magnitude than a given value
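A minimal sketch of the pairwise scan, assuming the series are held in a dict from gene name to a list of expression levels. The helper `strong_pairs` is hypothetical, and it applies a fixed |r| cutoff as a stand-in for the significance test; the profiles are made up for illustration.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson linear correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def strong_pairs(series, threshold):
    """Return (gene_a, gene_b, r) for every unordered pair whose |r| exceeds threshold."""
    hits = []
    for (name_a, x), (name_b, y) in combinations(series.items(), 2):
        r = pearson_r(x, y)
        if abs(r) > threshold:
            hits.append((name_a, name_b, r))
    return hits

# n genes give n * (n - 1) / 2 unordered pairs.
print(math.comb(112, 2))  # 6216

# Hypothetical 9-point profiles (illustrative only).
series = {
    "g1": [0.1, 0.3, 0.5, 0.9, 1.0, 0.8, 0.6, 0.4, 0.2],
    "g2": [0.2, 0.6, 1.0, 1.8, 2.0, 1.6, 1.2, 0.8, 0.4],  # 2 * g1
    "g3": [0.9, 0.7, 0.5, 0.1, 0.0, 0.2, 0.4, 0.6, 0.8],  # 1 - g1
    "g4": [0.5, 0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6],  # unrelated
}
for name_a, name_b, r in strong_pairs(series, threshold=0.96):
    print(name_a, name_b, round(r, 3))
```

Because the cutoff is on |r|, strongly negative pairs such as g1 and g3 are kept alongside positive ones.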


Linear correlation: restriction

  • For instance, to find those relationships in which at least 50% of the variance is explained by the correlation, i.e. $\rho^2 > 0.5$, we need $|r| > 0.96$ to reject, at the 1% significance level, the null hypothesis that $|\rho| < 0.7071$ (since $\sqrt{0.5} \approx 0.7071$)
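The slide does not say how the 0.96 cutoff was obtained. One standard way is the Fisher z-transform, under which atanh(r) is approximately normal with standard error 1/sqrt(n - 3). The sketch below assumes n = 9 time points and a two-sided 1% test; both are assumptions, not values stated on the slide, and `critical_r` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def critical_r(rho0, n, alpha=0.01, two_sided=True):
    """Smallest |r| that rejects H0: |rho| <= rho0, via the Fisher z approximation.

    atanh(r) is treated as normal with mean atanh(rho) and standard error 1/sqrt(n - 3).
    """
    if n <= 3:
        raise ValueError("need more than 3 observations")
    q = 1 - alpha / 2 if two_sided else 1 - alpha
    z_crit = NormalDist().inv_cdf(q)
    return math.tanh(math.atanh(rho0) + z_crit / math.sqrt(n - 3))

rho0 = math.sqrt(0.5)   # |rho| > 0.7071 is equivalent to rho^2 > 0.5
print(round(critical_r(rho0, n=9), 3))   # 0.959 under these assumptions
```

With few time points the standard error term dominates, which is why the required |r| sits far above 0.7071.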


Linear correlation: visualization

  • residual-variance-based distance measurement

    • $d = 1 - r^2$

    • $d = 0$ if perfectly correlated (positively or negatively), $d = 1$ if uncorrelated

  • multidimensional scaling

    • map time series into a two-dimensional plane
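A minimal sketch of the visualization step: build the d = 1 - r^2 distance matrix, then embed it in two dimensions. The slides do not name the MDS variant, so classical (Torgerson) MDS is assumed here, computed with power iteration and deflation to stay within the standard library; the helper names and profiles are hypothetical.

```python
import math
import random

def pearson_r(x, y):
    """Pearson linear correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def residual_variance_distances(series):
    """d_ij = 1 - r_ij**2: 0 for perfectly (anti-)correlated series, 1 for uncorrelated ones."""
    n = len(series)
    return [[1.0 - pearson_r(series[i], series[j]) ** 2 for j in range(n)] for i in range(n)]

def classical_mds(d, dims=2, iters=500, seed=0):
    """Classical MDS via power iteration with deflation on B = -1/2 * J D^2 J.

    Assumes the leading eigenvalues of B are positive; negative ones are clamped to 0.
    """
    n = len(d)
    sq = [[v * v for v in row] for row in d]
    mean = [sum(row) / n for row in sq]        # row means (= column means, D is symmetric)
    grand = sum(mean) / n
    b = [[-0.5 * (sq[i][j] - mean[i] - mean[j] + grand) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: eigenvalue estimate for the direction v
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        for i in range(n):
            coords[i][k] = math.sqrt(max(lam, 0.0)) * v[i]
        # Deflate so the next pass converges to the next eigenvector
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords

# Hypothetical profiles (illustrative only).
series = [
    [0.1, 0.3, 0.5, 0.9, 1.0, 0.8, 0.6, 0.4, 0.2],
    [0.2, 0.6, 1.0, 1.8, 2.0, 1.6, 1.2, 0.8, 0.4],   # 2 * first
    [0.9, 0.7, 0.5, 0.1, 0.0, 0.2, 0.4, 0.6, 0.8],   # 1 - first
    [0.5, 0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6],   # unrelated
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1],   # steady decline
]
xy = classical_mds(residual_variance_distances(series))
for point in xy:
    print([round(c, 3) for c in point])
```

Since d depends on r^2, the first three profiles (scaled and mirrored copies of one another) land on the same point of the plane.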


Linear correlation: visualization

  • Multidimensional scaling of 34 time series with high correlation


Nonlinear correlation: model

  • Spearman rank correlation, $r_s$

    • measures monotonic relationships

    • can be used for non-Gaussian distributions

  • 491 pairs of expression time series, involving 98 genes, have a significant $r_s$, ranging from $-0.979$ to $0.996$
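Spearman's r_s is the Pearson correlation of the ranks, so any strictly monotonic relationship scores 1 even when it is far from linear. A minimal sketch, assuming tied values receive the average of the ranks they span; the data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson linear correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values get the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1   # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman_rs(x, y):
    """Spearman rank correlation: Pearson r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [math.exp(v) for v in x]           # monotonic but strongly non-linear
print(round(spearman_rs(x, y), 3))     # 1.0
print(round(pearson_r(x, y), 3))       # noticeably below 1
```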


Nonlinear correlation: example

  • High rank correlation but low linear correlation between mGluR1 and GRa2


Information Theory: mutual information

  • if $H(A)$ and $H(B)$ are the entropies of sources $A$ and $B$ respectively, and $H(A,B)$ is their joint entropy, then the mutual information is $M(A,B) = H(A) + H(B) - H(A,B)$

  • the discrete form is much easier to use than the continuous one

  • We need to discretize the time series by partitioning the expression levels into bins
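A minimal sketch of the discrete computation. The binning scheme is not specified on the slides; equal-width bins between each series' own minimum and maximum are assumed here, and the helper names and data are hypothetical. Entropies are in bits.

```python
import math
from collections import Counter

def discretize(series, n_bins=3):
    """Equal-width binning (assumed scheme): map each value to a bin number 0..n_bins-1."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0] * len(series)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in series]

def entropy(symbols):
    """Shannon entropy, in bits, of the empirical distribution of `symbols`."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())

def mutual_information(a, b):
    """M(A,B) = H(A) + H(B) - H(A,B) for two discretized series of equal length."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

a = discretize([0.1, 0.3, 0.5, 0.9, 1.0, 0.8, 0.6, 0.4, 0.2])
b = discretize([0.2, 0.2, 0.4, 0.8, 0.9, 0.9, 0.5, 0.3, 0.1])
print(a, b, round(mutual_information(a, b), 3))
```

The joint entropy is taken over pairs of bin numbers at the same time point, so M(A,A) reduces to H(A) and two independent series give M = 0.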


Information Theory: bin size

  • The fewer bins we use to discretize the data, the more information about the original time series we ignore.

  • On the other hand, too fine a binning will leave us with too few points per bin to get a reasonable estimate of the frequency of each bin


Information Theory: mapping

  • Some time series map to the same discretized series

  • In total, from 112 unique continuous-valued time series we get 91 discretized time series
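Grouping genes by their discretized series shows how distinct continuous series can collapse onto the same discretized one. A minimal sketch, again assuming equal-width bins per series; `unique_discretized` and the three short profiles are hypothetical.

```python
def discretize(series, n_bins=3):
    """Equal-width binning (assumed scheme): map each value to a bin number 0..n_bins-1."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0] * len(series)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in series]

def unique_discretized(matrix, n_bins=3):
    """Group gene names by the discretized series their continuous series map to."""
    groups = {}
    for gene, series in matrix.items():
        groups.setdefault(tuple(discretize(series, n_bins)), []).append(gene)
    return groups

matrix = {
    "gA": [0.0, 0.5, 1.0],
    "gB": [0.1, 0.45, 0.9],   # different values, same rising pattern
    "gC": [1.0, 0.5, 0.0],
}
groups = unique_discretized(matrix)
print(len(matrix), "continuous series ->", len(groups), "discretized series")
for key, genes in groups.items():
    print(key, genes)
```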


Information Theory: mapping


Information Theory: mapping

  • eliminate one-to-one mappings, i.e. time series that become identical after permuting the bin numbers

    • for such a pair, $H(A) = H(B) = M(A,B)$

    • row 3 and row 4

  • replace each group of such time series by one single series, leaving us with a set of 77 unique, non-equivalent time series.
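One way to detect series that differ only by a permutation of bin numbers is to relabel bins in order of first appearance: two series are equivalent exactly when their relabelled forms match. A minimal sketch; `canonical`, `merge_equivalent` and the toy series are hypothetical, and it also checks the H(A) = H(B) = M(A,B) property for an equivalent pair.

```python
import math
from collections import Counter

def entropy(symbols):
    """Shannon entropy, in bits, of the empirical distribution of `symbols`."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())

def mutual_information(a, b):
    """M(A,B) = H(A) + H(B) - H(A,B) for two discretized series of equal length."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

def canonical(symbols):
    """Relabel bins in order of first appearance, e.g. [2, 0, 2, 1] -> (0, 1, 0, 2)."""
    mapping = {}
    out = []
    for s in symbols:
        if s not in mapping:
            mapping[s] = len(mapping)
        out.append(mapping[s])
    return tuple(out)

def merge_equivalent(discretized):
    """Keep the first gene name seen for each class of permutation-equivalent series."""
    representatives = {}
    for gene, symbols in discretized.items():
        representatives.setdefault(canonical(symbols), gene)
    return list(representatives.values())

series = {
    "A": [0, 1, 2, 2, 1, 0],
    "B": [2, 0, 1, 1, 0, 2],   # A with bins relabelled 0->2, 1->0, 2->1
    "C": [0, 0, 1, 2, 2, 1],
}
a, b = series["A"], series["B"]
print(entropy(a), entropy(b), mutual_information(a, b))  # three equal values, each about log2(3)
print(merge_equivalent(series))  # ['A', 'C']
```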


Information Theory: measurement

  • symmetric measures

    • $M(A,B) / \max(H(A), H(B))$

    • $M(A,B) / H(A,B)$

  • asymmetric measures

    • relative mutual information

      $R(A,B) = M(A,B) / H(B)$

    • $R(A,B) = 1.0$ means that all the information about time series B is contained in time series A
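The normalized measures follow directly from the entropies. A minimal sketch with hypothetical helper names; the toy series b is a function of a (bins 1 and 2 of A both map to bin 1 of B), so R(A,B) reaches 1 while R(B,A) does not.

```python
import math
from collections import Counter

def entropy(symbols):
    """Shannon entropy, in bits, of the empirical distribution of `symbols`."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())

def mutual_information(a, b):
    """M(A,B) = H(A) + H(B) - H(A,B) for two discretized series of equal length."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

def mi_over_max_entropy(a, b):
    """Symmetric: M(A,B) / max(H(A), H(B))."""
    return mutual_information(a, b) / max(entropy(a), entropy(b))

def mi_over_joint_entropy(a, b):
    """Symmetric: M(A,B) / H(A,B)."""
    return mutual_information(a, b) / entropy(list(zip(a, b)))

def relative_mi(a, b):
    """Asymmetric: R(A,B) = M(A,B) / H(B); raises ZeroDivisionError if B is constant."""
    return mutual_information(a, b) / entropy(b)

a = [0, 1, 2, 2, 1, 0]
b = [0, 1, 1, 1, 1, 0]               # determined by a
print(round(relative_mi(a, b), 3))   # 1.0: A determines B completely
print(round(relative_mi(b, a), 3))   # below 1: B does not determine A
```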


Conclusion

  • Linear correlation can be used very effectively to detect linear relationships

    • detect relationships not captured by Euclidean distance, such as high negative correlations

  • Rank correlation can be used to detect non-linear relationships

    • much more robust with respect to the distribution of expression levels

  • Information theory can be used to detect genes whose (binned) expression patterns share information

    • It will detect any mapping from time series A to B

