MINING THE GENE EXPRESSION MATRIX: INFERRING GENE RELATIONSHIPS FROM LARGE SCALE GENE EXPRESSION DATA

Patrik D'haeseleer, Xiling Wen, Stefanie Fuhrman, and Roland Somogyi

Information Processing in Cells and Tissues, pp. 203-212, 1998

Presented by Bin He


Motivations
  • it is necessary to determine large-scale temporal gene expression patterns
  • to decipher the logic of gene regulation, we should aim to be able to monitor the expression level of all genes simultaneously
Gene time series
  • assay the expression levels of large numbers of genes in a tissue at different time points
  • the relative amounts of mRNA produced at these time points provide a gene expression time series for each gene

Gene Expression Matrix
  • Wen, X., Fuhrman, S., Michaels, G.S., Carr, D.B., Smith, S., Barker, J.L., and Somogyi, R., 1997, Large-scale temporal gene expression mapping of CNS development, Proc. Natl. Acad. Sci., in press
Previous Approach
  • Euclidean distance and information-theoretic measures were used to cluster the genes into groups of related expression time series
  • A significant problem with this approach is the variety of measures that can be used
  • Each measure produces a unique clustering of gene expression patterns
Contributions
  • determining significant relationships between individual genes, based on:
    • linear correlation
    • rank correlation
    • information theory
Linear correlation ------restriction
  • for 112 different genes, 112 × 111 / 2 = 6216 pairs of expression time series need to be examined
  • to restrict the number of relationships, we might want to test which correlations are significantly larger than a certain value
Linear correlation ------restriction
  • For instance, to find those relationships in which at least 50% of the variance is explained by the correlation, i.e. rho^2 > 0.5, we need |r| > 0.96 to reject, at the 1% significance level, the null hypothesis that |rho| < 0.7071 (= sqrt(0.5))
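One way to arrive at a threshold of this kind is the Fisher z-transformation. The sketch below is an illustration, not necessarily the test the authors used: the function name `critical_r`, the two-sided critical value, and the series length `n = 9` are all assumptions that do not appear on the slides. Under those assumptions the result comes out close to the 0.96 quoted above.

```python
import math
from statistics import NormalDist


def critical_r(rho0: float, n: int, alpha: float = 0.01) -> float:
    """Smallest |r| that rejects H0: |rho| <= rho0 (Fisher z approximation).

    Assumes a two-sided critical value and n > 3 observations per series.
    """
    if n <= 3:
        raise ValueError("Fisher z approximation needs more than 3 points")
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided (assumption)
    z0 = math.atanh(rho0)                         # Fisher transform of rho0
    std_err = 1 / math.sqrt(n - 3)                # std. error of z for n points
    return math.tanh(z0 + z_crit * std_err)       # back-transform to r scale


rho0 = math.sqrt(0.5)             # rho^2 > 0.5  <=>  |rho| > 0.7071
threshold = critical_r(rho0, n=9)  # n = 9 time points is an assumption
print(f"reject H0 when |r| > {threshold:.3f}")
```

With so few time points the threshold sits far above 0.7071, which is why only very strong sample correlations survive the test.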
Linear correlation ------visualization
  • residual variance based distance measurement
    • d = 1 - r^2
    • d = 0 if perfectly correlated, d = 1 if uncorrelated
  • multidimensional scaling
    • map time series into a two-dimensional plane
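The distance d = 1 - r^2 can be computed for every pair of series to build the matrix that a multidimensional scaling routine would take as input. The following is a minimal sketch of that first step only (the scaling itself is not shown). The gene names and values are made up for illustration, and `pearson` and `residual_variance_distance` are hypothetical helper names.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residual_variance_distance(x, y):
    """d = 1 - r^2: 0 for perfect (anti)correlation, 1 for no correlation."""
    r = pearson(x, y)
    return 1.0 - r * r


# Made-up expression time series, for illustration only.
series = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [10, 8, 6, 4, 2],   # perfectly anti-correlated with geneA
    "geneC": [1, -1, 0, -1, 1],  # uncorrelated with geneA
}

distances = {
    (g, h): residual_variance_distance(series[g], series[h])
    for g, h in combinations(series, 2)
}
for pair, d in distances.items():
    print(pair, round(d, 3))
```

Because the correlation is squared, a strongly negative correlation gives the same small distance as a strongly positive one, so anti-correlated genes end up close together in the map.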
Linear correlation ------visualization
  • Multidimensional scaling of 34 time series with high correlation
Nonlinear correlation ------Model
  • Spearman rank correlation, rs
    • measurement for monotonic relationships
    • can be used for non-Gaussian distributions
  • result: 491 pairs of expression time series, involving 98 genes, have a significant rs, ranging from -0.979 to 0.996
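The Spearman coefficient is the Pearson correlation applied to the ranks of the values rather than the values themselves. A minimal sketch, using average ranks for ties, is given below. The helper names and the exponential toy data are assumptions for illustration, not data from the study.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values all receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy example: a monotonic but strongly non-linear relationship.
x = list(range(9))
y = [math.exp(v) for v in x]
r_linear = pearson(x, y)
r_rank = spearman(x, y)
print(f"linear r = {r_linear:.3f}, rank rs = {r_rank:.3f}")
```

The rank correlation is exactly 1 for any strictly increasing relationship, while the linear correlation drops as the curve bends away from a straight line.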
Nonlinear correlation ------Example
  • High rank correlation but low linear correlation between mGluR1 and GRa2
Information Theory ------mutual information
  • if H(A) and H(B) are the entropies of sources A and B respectively, and H(A,B) the joint entropy of the sources, then M(A,B) = H(A) + H(B) - H(A,B)
  • the discrete form is much easier to use
  • We need to discretize the time series by partitioning the expression levels into bins
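The definition M(A,B) = H(A) + H(B) - H(A,B) translates directly into code once the series are discretized. The sketch below assumes equal-width bins between each series' minimum and maximum; the slides do not say how the bins were chosen, so that choice, the function names, and the toy data are all illustrative.

```python
import math
from collections import Counter


def discretize(series, n_bins):
    """Map each value to a bin index 0..n_bins-1 (equal-width bins, assumed)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0] * len(series)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in series]


def entropy(symbols):
    """Shannon entropy in bits, estimated from observed frequencies."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def mutual_information(a, b):
    """M(A,B) = H(A) + H(B) - H(A,B) for two discretized series."""
    joint = list(zip(a, b))
    return entropy(a) + entropy(b) - entropy(joint)


binned = discretize([0.0, 1.0, 2.0, 3.0], n_bins=2)
print(binned)

# Toy discretized series: b is a relabeling of a, so they share all information.
a = [0, 0, 1, 1, 2, 2]
b = [2, 2, 0, 0, 1, 1]
mi = mutual_information(a, b)
print(f"H(A) = {entropy(a):.3f} bits, M(A,B) = {mi:.3f} bits")
```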
Information Theory ------Bin size
  • The fewer bins we use to discretize the data, the more information about the original time series we ignore.
  • On the other hand, too fine a binning will leave us with too few points per bin to get a reasonable estimate of the frequency of each bin
Information Theory ------Mapping
  • Some time series map to the same discretized series
  • In total, from 112 unique continuous-valued time series we get 91 discretized time series
Information Theory ------Mapping
  • eliminate time series that map one-to-one onto each other under a permutation of the bin numbers
    • for such pairs, H(A)=H(B)=M(A,B)
    • example: row 3 and row 4
  • replace such equivalent time series by one single series, leaving us with a set of 77 unique, non-equivalent time series.
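One simple way to find series that are equivalent up to a permutation of the bin numbers is to relabel every series in order of first appearance and compare the results. This is a sketch of that idea, not necessarily the procedure the authors followed. The function names and the three toy series are hypothetical.

```python
def canonical_form(symbols):
    """Relabel bins in order of first appearance: 0, 1, 2, ...

    Two series have the same canonical form exactly when one can be turned
    into the other by a one-to-one renaming of the bin numbers.
    """
    relabel = {}
    out = []
    for s in symbols:
        if s not in relabel:
            relabel[s] = len(relabel)
        out.append(relabel[s])
    return tuple(out)


def group_equivalent(series_by_name):
    """Group series names by canonical form of their discretized series."""
    groups = {}
    for name, symbols in series_by_name.items():
        groups.setdefault(canonical_form(symbols), []).append(name)
    return groups


# Made-up discretized series, for illustration only.
discretized = {
    "geneA": [0, 0, 1, 2, 2],
    "geneB": [2, 2, 0, 1, 1],  # geneA with bins renamed 0->2, 1->0, 2->1
    "geneC": [0, 1, 1, 2, 2],  # a different pattern
}
groups = group_equivalent(discretized)
for form, names in groups.items():
    print(form, names)
```

Keeping one representative per group gives the reduced set of unique, non-equivalent series.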
Information Theory ------Measurement
  • symmetric measures
    • M(A,B)/max(H(A),H(B))
    • M(A,B)/H(A,B)
  • asymmetric measures
    • Relative mutual information: R(A,B) = M(A,B)/H(B)
    • R(A,B) = 1.0 means that all the information about time series B is contained in time series A
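The three normalizations above can be compared on a small example in which A determines B but not the other way round. This is a minimal sketch with made-up discretized series. The function names are hypothetical, and the code assumes the entropies in the denominators are non-zero.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy in bits, estimated from observed frequencies."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def mutual_information(a, b):
    """M(A,B) = H(A) + H(B) - H(A,B)."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


def relative_mutual_information(a, b):
    """Asymmetric measure R(A,B) = M(A,B) / H(B); assumes H(B) > 0."""
    return mutual_information(a, b) / entropy(b)


def symmetric_measures(a, b):
    """The two symmetric normalizations: M/max(H(A),H(B)) and M/H(A,B)."""
    m = mutual_information(a, b)
    joint = entropy(list(zip(a, b)))
    return m / max(entropy(a), entropy(b)), m / joint


# Toy series: B is a coarser version of A, so A determines B completely.
a = [0, 1, 2, 3]
b = [0, 0, 1, 1]
r_ab = relative_mutual_information(a, b)
r_ba = relative_mutual_information(b, a)
sym_max, sym_joint = symmetric_measures(a, b)
print(f"R(A,B) = {r_ab:.2f}, R(B,A) = {r_ba:.2f}")
print(f"M/max(H) = {sym_max:.2f}, M/H(A,B) = {sym_joint:.2f}")
```

Here R(A,B) is 1.0 because B can be read off from A, while R(B,A) is lower because B only partly determines A. The symmetric measures cannot show that direction.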
Conclusion
  • Linear correlation can be used very effectively to detect linear relationships
    • detect relationships not captured by Euclidean distance, such as high negative correlations
  • Rank correlation can be used to detect non-linear (monotonic) relationships
    • much more robust with respect to the distribution of expression levels
  • Information theory can be used to detect genes whose (binned) expression patterns share information
    • It will detect any mapping from time series A to B