ECS 289A Presentation Jimin Ding

Presentation Transcript


  1. ECS 289A Presentation, Jimin Ding • Problem & Motivation • Two-component Model • Estimation of the Parameters in the Model • Defining Low and High Levels of Gene Expression • Comparing Expression Levels • Limitations of the Model and Method • Other Possible Solutions • References

  2. A Model for Measurement Error for Gene Expression Arrays, David Rocke & Blythe Durbin, Journal of Computational Biology, Nov. 2001

  3. Problem & Motivation • Standard statistical inference assumes normally distributed data with constant variance, so a hypothesis test for the difference between control and treatment requires equal variances that do not depend on the mean of the data. • Measurement error in gene expression rises in proportion to the expression level, so ordinary linear regression (which assumes constant variance) fails, and a log transformation has been tried instead. • However, for genes expressed at a low level or not at all, the measurement error does not shrink proportionately (example on slide 4). The log transformation therefore fails by inflating the variance of observations near background, and the two-component model is introduced to handle both regimes.

  4. Example: mice (from Bartosiewicz et al., 2000)

  5. From Durbin et al., 2002

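  6. Two-Component Model: $y = \alpha + \mu e^{\eta} + \epsilon$ • $y$ is the intensity measurement • $\mu$ is the expression level in arbitrary units • $\alpha$ is the mean intensity of unexpressed genes • Error terms: $\eta \sim N(0, \sigma_\eta^2)$ (multiplicative) and $\epsilon \sim N(0, \sigma_\epsilon^2)$ (additive)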
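
To make the model concrete, here is a minimal Python simulation sketch. It assumes $\eta$ and $\epsilon$ are independent normal draws and uses hypothetical parameter values ($\alpha = 100$, $\sigma_\eta = 0.2$, $\sigma_\epsilon = 10$) chosen only for illustration. `simulate_intensity` and `replicate_sd` are hypothetical helper names. The replicate standard deviation should stay near $\sigma_\epsilon$ for unexpressed genes and grow roughly in proportion to $\mu$ once $\mu$ is large.

```python
import math
import random
import statistics


def simulate_intensity(mu, alpha, sigma_eta, sigma_eps, rng):
    """One draw of y = alpha + mu * exp(eta) + eps from the two-component model.

    eta ~ N(0, sigma_eta^2) is the multiplicative error and eps ~ N(0, sigma_eps^2)
    the additive error; they are drawn independently (an assumption of this sketch).
    """
    eta = rng.gauss(0.0, sigma_eta)
    eps = rng.gauss(0.0, sigma_eps)
    return alpha + mu * math.exp(eta) + eps


def replicate_sd(mu, n=2000, alpha=100.0, sigma_eta=0.2, sigma_eps=10.0, seed=0):
    """Sample SD of n simulated replicates at expression level mu (hypothetical defaults)."""
    rng = random.Random(seed)
    ys = [simulate_intensity(mu, alpha, sigma_eta, sigma_eps, rng) for _ in range(n)]
    return statistics.stdev(ys)


for mu in (0.0, 10.0, 1000.0, 10000.0):
    print(f"mu = {mu:>7.0f}   SD(y) = {replicate_sd(mu):8.1f}")
```

The same seed is reused for every level, so the differences between rows come from $\mu$ alone rather than from different random draws.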

  7. Estimation of the background ($\alpha$) • Using negative controls • With replicate measurements (details on slide 8) • Without replicates

  8. Estimation of $\alpha$ with replicate measurements • Begin with a small subset of low-intensity genes (the lowest 10%) and estimate the background from them • Define a new subset consisting of the genes whose intensity values fall within the background band estimated from the current subset • Repeat the two steps until the set of genes no longer changes.
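
The iteration can be sketched in Python as follows. The slides give neither the width of the band nor the exact estimators, so this sketch makes three assumptions:

1. $\alpha$ is estimated by the mean of all replicate values of the genes in the current subset.
2. $\sigma_\epsilon$ is estimated by the pooled within-gene standard deviation of those genes.
3. The next subset contains the genes whose mean intensity lies within $\hat\alpha \pm k\hat\sigma_\epsilon$, with a hypothetical default $k = 2$.

The function name and the synthetic data are illustrative only.

```python
import random
import statistics


def estimate_background(replicates, start_frac=0.10, k=2.0, max_iter=100):
    """Iteratively estimate the background alpha and the additive SD sigma_eps.

    replicates: per-gene lists of replicate intensities (at least 2 per gene).
    start_frac: fraction of lowest-mean genes in the initial subset (10% on the slide).
    k:          half-width of the band in units of sigma_eps (hypothetical choice).
    max_iter:   safety cap in case the subset keeps changing.
    Returns (alpha_hat, sigma_eps_hat, final_subset_indices).
    """
    means = [statistics.fmean(r) for r in replicates]
    order = sorted(range(len(replicates)), key=lambda i: means[i])
    subset = set(order[: max(1, int(start_frac * len(replicates)))])

    for _ in range(max_iter):
        # alpha_hat: mean of every replicate value of the genes in the subset.
        alpha_hat = statistics.fmean([y for i in subset for y in replicates[i]])
        # sigma_eps_hat: pooled within-gene SD, weighted by degrees of freedom.
        ss = sum((len(replicates[i]) - 1) * statistics.variance(replicates[i]) for i in subset)
        df = sum(len(replicates[i]) - 1 for i in subset)
        sigma_hat = (ss / df) ** 0.5
        # Next subset: genes whose mean intensity lies within alpha_hat +/- k * sigma_hat.
        new_subset = {i for i, m in enumerate(means) if abs(m - alpha_hat) <= k * sigma_hat}
        if not new_subset or new_subset == subset:
            break  # converged (or the band emptied out)
        subset = new_subset
    return alpha_hat, sigma_hat, subset


# Synthetic example: 100 unexpressed genes (background 100, SD 10) and
# 100 expressed genes, 4 replicates each. All numbers are made up.
rng = random.Random(42)
genes = [[100.0 + rng.gauss(0, 10) for _ in range(4)] for _ in range(100)]
genes += [[100.0 + mu + rng.gauss(0, 10) for _ in range(4)] for mu in range(1000, 5000, 40)]
alpha_hat, sigma_hat, subset = estimate_background(genes)
print(f"alpha_hat = {alpha_hat:.1f}, sigma_eps_hat = {sigma_hat:.1f}, genes = {len(subset)}")
```

Starting from the lowest 10% biases the first estimate of $\alpha$ slightly downward. The later iterations widen the subset to all genes near background and remove most of that bias. This is also why the result can depend on the initial selection.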

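  9. Estimation of the high-level RSD • In the two-component model the variance of the intensity is $\operatorname{Var}(y) = \mu^2 S_\eta^2 + \sigma_\epsilon^2$, where $S_\eta^2 = e^{\sigma_\eta^2}\left(e^{\sigma_\eta^2} - 1\right)$ • At high expression levels only the multiplicative error term is noticeable: the standard deviation is approximately $\mu S_\eta$, so the ratio of the standard deviation to the mean (the RSD) is approximately constant • For each replicated gene at a high level, compute the mean and the standard deviation of $\ln y$ • Then use the pooled standard deviation of $\ln y$ to estimate $\sigma_\eta$, and hence $\hat S_\eta = \sqrt{e^{\hat\sigma_\eta^2}\left(e^{\hat\sigma_\eta^2} - 1\right)}$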
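
A sketch of the high-level RSD estimate in Python, under stated assumptions:

* It takes $\ln y$ directly, treating $\alpha$ as negligible at high levels.
* It pools the per-gene variances of $\ln y$, weighted by their degrees of freedom.
* It converts $\hat\sigma_\eta$ to $S_\eta = \sqrt{e^{\sigma_\eta^2}(e^{\sigma_\eta^2}-1)}$, the standard deviation of $e^{\eta}$.

The helper names and the example intensities are hypothetical.

```python
import math
import statistics


def pooled_log_sd(high_genes):
    """Estimate sigma_eta as the pooled SD of ln(y) over replicated high-level genes.

    Per-gene sample variances of ln(y) are weighted by their degrees of freedom.
    """
    ss, df = 0.0, 0
    for reps in high_genes:
        logs = [math.log(y) for y in reps]
        ss += (len(logs) - 1) * statistics.variance(logs)
        df += len(logs) - 1
    return math.sqrt(ss / df)


def s_eta(sigma_eta):
    """S_eta = sqrt(exp(s^2) * (exp(s^2) - 1)): the SD of exp(eta) for eta ~ N(0, s^2)."""
    v = sigma_eta ** 2
    return math.sqrt(math.exp(v) * (math.exp(v) - 1.0))


# Hypothetical replicate intensities for two high-level genes.
sigma_eta_hat = pooled_log_sd([[1000.0, 1200.0, 900.0], [5000.0, 4100.0, 5600.0]])
print(f"sigma_eta_hat = {sigma_eta_hat:.3f}, S_eta = {s_eta(sigma_eta_hat):.3f}")
```

For small $\sigma_\eta$, $S_\eta \approx \sigma_\eta$, so the pooled log-scale SD itself is already a close approximation to the high-level RSD.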

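  10. Defining "high" and "low" • Low expression level: most of the variance is due to the additive error component, so $y - \alpha \approx \mu + \epsilon$. 95% CI for $\mu$: $(y - \alpha) \pm 1.96\,\sigma_\epsilon$ • High expression level: most of the variance is due to the multiplicative error component, so $y - \alpha \approx \mu e^{\eta}$. 95% CI for $\mu$: $\left[(y - \alpha)e^{-1.96\sigma_\eta},\ (y - \alpha)e^{1.96\sigma_\eta}\right]$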
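
The two approximate 95% intervals for $\mu$ can be computed as below. This sketch relies on the usual approximations: $y - \alpha \approx \mu + \epsilon$ at low levels and $y - \alpha \approx \mu e^{\eta}$ at high levels. `crossover_level` is a hypothetical helper that returns the level where the two variance components are equal ($\mu^2 S_\eta^2 = \sigma_\epsilon^2$). That is one natural reading of "most of the variance", not a cutoff stated in the slides. All parameter values are illustrative.

```python
import math

Z95 = 1.96  # two-sided 95% standard normal quantile


def low_level_ci(y, alpha, sigma_eps):
    """Approximate 95% CI for mu when the additive error dominates (y - alpha ~ mu + eps)."""
    center = y - alpha
    return center - Z95 * sigma_eps, center + Z95 * sigma_eps


def high_level_ci(y, alpha, sigma_eta):
    """Approximate 95% CI for mu when the multiplicative error dominates
    (y - alpha ~ mu * exp(eta)); requires y > alpha."""
    center = y - alpha
    if center <= 0:
        raise ValueError("high-level interval needs y > alpha")
    return center * math.exp(-Z95 * sigma_eta), center * math.exp(Z95 * sigma_eta)


def crossover_level(sigma_eps, s_eta):
    """Hypothetical helper: level mu at which mu^2 * S_eta^2 equals sigma_eps^2."""
    return sigma_eps / s_eta


# Hypothetical parameter values, for illustration only.
print(low_level_ci(150.0, alpha=100.0, sigma_eps=10.0))
print(high_level_ci(1100.0, alpha=100.0, sigma_eta=0.2))
print(crossover_level(sigma_eps=10.0, s_eta=0.2))
```

The low-level interval has constant width, while the high-level interval is symmetric on the log scale, so its width grows with $y - \alpha$.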

  11. Comparing Expression Levels • Common method: a standard t-test on the ratio of treatment to control expression (low level), or on its logarithm (high level). • Problem: this is less effective when a gene is expressed at a low level in one condition and at a high level in the other, because the two conditions then follow different error regimes.
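
A first-order (delta-method) approximation suggests why the log scale misbehaves when one condition is near background. This is our own sketch, not a formula from the slides. Write $x = y - \alpha$, so that $E[x] = \mu e^{\sigma_\eta^2/2} \approx \mu$ and $\operatorname{Var}(x) = \mu^2 S_\eta^2 + \sigma_\epsilon^2$ with $S_\eta^2 = e^{\sigma_\eta^2}(e^{\sigma_\eta^2}-1)$. Treating the two conditions as independent gives:

```latex
\operatorname{Var}[\ln x] \approx \frac{\operatorname{Var}(x)}{(E[x])^2}
  \approx \frac{\mu^2 S_\eta^2 + \sigma_\epsilon^2}{\mu^2}
  = S_\eta^2 + \frac{\sigma_\epsilon^2}{\mu^2}

\operatorname{Var}\!\left[\ln\frac{x_T}{x_C}\right]
  \approx 2 S_\eta^2 + \sigma_\epsilon^2\left(\frac{1}{\mu_T^2} + \frac{1}{\mu_C^2}\right)
```

When both conditions are at high levels, the variance is close to the constant $2S_\eta^2$, so an ordinary t-test on log ratios is reasonable. When one condition is near background, the $\sigma_\epsilon^2/\mu^2$ term for that condition dominates, and the variance depends on the unknown level itself. As $\mu \to 0$ the approximation breaks down altogether, because $x$ can be negative and $\ln x$ is undefined.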

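  12. Solution: treat the control and treatment measurements as correlated • Model: the two-component model in each condition, with correlated errors • Variation: the background (additive) component and the high-level RSD (multiplicative) component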

  13. Hypothesis testing (comparison) • Assume the data have been adjusted • Test $H_0: \mu_C = \mu_T$, i.e. the gene has the same expression level in control and treatment • Then run a standard t-test on the log ratio of the raw data, using an approximate variance
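
One way to read this test in code is sketched below. It is not the authors' exact procedure; it works as follows:

* It assumes the treatment and control replicates are paired and have independent errors.
* It plugs the mean background-adjusted intensities ($x = y - \alpha$) into the delta-method variance $\operatorname{Var}[\ln(x_T/x_C)] \approx 2S_\eta^2 + \sigma_\epsilon^2(1/\mu_T^2 + 1/\mu_C^2)$, where $S_\eta^2 = e^{\sigma_\eta^2}(e^{\sigma_\eta^2}-1)$.
* It divides the mean log ratio by the resulting standard error.

Because the variance comes from the model rather than from the sample, the statistic behaves more like a z-statistic and would be compared with normal quantiles. Function names and numbers are hypothetical.

```python
import math
import statistics


def approx_log_ratio_var(mu_t, mu_c, sigma_eps, s_eta):
    """Delta-method variance of ln(x_T / x_C) with x = y - alpha.

    Assumes independent errors in the two conditions (no correlation term).
    """
    return 2.0 * s_eta ** 2 + sigma_eps ** 2 * (1.0 / mu_t ** 2 + 1.0 / mu_c ** 2)


def log_ratio_statistic(y_t, y_c, alpha, sigma_eps, s_eta):
    """Mean log ratio divided by a model-based standard error (z-type statistic).

    y_t[i] is paired with y_c[i] (an assumed pairing, e.g. the same array).
    """
    x_t = [y - alpha for y in y_t]
    x_c = [y - alpha for y in y_c]
    if min(x_t + x_c) <= 0:
        raise ValueError("background-adjusted intensities must be positive")
    log_ratios = [math.log(t / c) for t, c in zip(x_t, x_c)]
    # Plug-in estimates of the expression level in each condition.
    var = approx_log_ratio_var(statistics.fmean(x_t), statistics.fmean(x_c), sigma_eps, s_eta)
    return statistics.fmean(log_ratios) / math.sqrt(var / len(log_ratios))


# Hypothetical replicate intensities and parameters.
z = log_ratio_statistic([2100.0, 2500.0, 2300.0], [1100.0, 1000.0, 1250.0],
                        alpha=100.0, sigma_eps=10.0, s_eta=0.2)
print(f"z = {z:.2f}")
```

A correlated-error version, as suggested on slide 12, would subtract a covariance term from this variance. That term is left out here because its form is not recoverable from the slides.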

  14. Limitations • No theoretical results (consistency, asymptotic distribution) are given for the above estimators. • The cutoff between high and low expression levels is fairly arbitrary. • Convergence of the background estimate depends heavily on the data and on the initial subset.

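  15. Literature & Other Possible Solutions for Measurement Error • Chen et al. (1997): measurement error is normally distributed with a constant coefficient of variation (CV), in accord with experience • Ideker et al. (2000) introduce a normally distributed multiplicative error component • Newton et al. (2001) propose a gamma model for measurement error • Durbin et al. (2002) suggest the transformation $h(y) = \ln\left(y - \alpha + \sqrt{(y - \alpha)^2 + c}\right)$, where $c = \sigma_\epsilon^2 / S_\eta^2$ and $S_\eta^2 = e^{\sigma_\eta^2}\left(e^{\sigma_\eta^2} - 1\right)$ • Huber et al. (2002) introduce the transformation $h(y) = \operatorname{arsinh}(a + b y)$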
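
The Durbin et al. and Huber et al. transformations are closely related. Since $\operatorname{arsinh}(z) = \ln(z + \sqrt{z^2 + 1})$, substituting $z = (y - \alpha)/\sqrt{c}$ gives $\ln\left(y - \alpha + \sqrt{(y-\alpha)^2 + c}\right) = \operatorname{arsinh}(a + b y) + \tfrac{1}{2}\ln c$, with $a = -\alpha/\sqrt{c}$ and $b = 1/\sqrt{c}$. The Python sketch below checks this identity numerically. The function names and parameter values are illustrative.

```python
import math


def glog(y, alpha, c):
    """Durbin et al. (2002) form: ln(y - alpha + sqrt((y - alpha)^2 + c))."""
    x = y - alpha
    return math.log(x + math.sqrt(x * x + c))


def glog_via_arsinh(y, alpha, c):
    """The same values written as arsinh(a + b*y) + ln(sqrt(c)),
    with a = -alpha / sqrt(c) and b = 1 / sqrt(c)."""
    root_c = math.sqrt(c)
    a, b = -alpha / root_c, 1.0 / root_c
    return math.asinh(a + b * y) + math.log(root_c)


# Illustrative parameters: background 100, c = 2500.
for y in (0.0, 100.0, 150.0, 10_000.0):
    print(y, glog(y, 100.0, 2500.0), glog_via_arsinh(y, 100.0, 2500.0))
```

Near $y = \alpha$ the transformation is approximately linear in $y$. For large $y$ it approaches $\ln(2(y - \alpha))$, so it moves from the additive regime of the two-component model to the multiplicative one.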

  16. References • Durbin, B., Hardin, J., Hawkins, D., and Rocke, D. (2002). "A variance-stabilizing transformation for gene-expression microarray data", Bioinformatics (ISMB 2002). • Chen, Y., Dougherty, E.R., and Bittner, M.L. (1997). "Ratio-based decisions and the quantitative analysis of cDNA microarray images", J. Biomed. Opt., 2, 364–374. • Huber, W., von Heydebreck, A., and Vingron, M. (Dec. 2002). "Analysis of microarray gene expression data", preprint. • Huber, W., von Heydebreck, A., Sültmann, H., Poustka, A., and Vingron, M. (2002). "Variance stabilization applied to microarray data calibration and to the quantification of differential expression", Bioinformatics, 18 Suppl. 1:S96–S104 (ISMB 2002).
