Measurement Error Model for Gene Expression Analysis

ECS 289A Presentation, Jimin Ding
• Problem & Motivation
• Two-component Model
• Estimation of the Parameters in the Model
• Defining Low and High Levels of Gene Expression
• Comparing Expression Levels
• Limitations of the Model and Method
• Other Possible Solutions
• References

A Model for Measurement Error for Gene Expression Arrays. David Rocke & Blythe Durbin, Journal of Computational Biology, Nov. 2001

Problem & Motivation
• Standard statistical inference assumes normally distributed data with constant variance, so hypothesis testing for a difference between control and treatment requires equal variances that do not depend on the mean of the data.
• Measurement error for gene expression rises in proportion to the expression level, so linear regression fails and the log transformation has been tried.
• However, for genes that are expressed at a low level or not expressed at all, the measurement error does not go down proportionately. The log transformation therefore fails by inflating the variance of observations near background, which is why the two-component model is introduced.

Example: Mice (from Bartosiewicz et al., 2000)

From Durbin et al., 2002

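Two-Component Model
• Model: $Y = \alpha + \mu e^{\eta} + \varepsilon$
• $Y$ is the intensity measurement
• $\mu$ is the expression level in arbitrary units
• $\alpha$ is the mean intensity of unexpressed genes
• Error terms: $\eta \sim N(0, \sigma_\eta^2)$ (multiplicative) and $\varepsilon \sim N(0, \sigma_\varepsilon^2)$ (additive)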
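The behaviour of the two error terms can be checked with a small simulation. This is only a sketch: the parameter values (`alpha=100`, `sigma_eta=0.2`, `sigma_eps=30`) and the helper names are assumptions chosen for illustration, not values from the paper. It shows that the spread of the intensity stays near the additive standard deviation when the gene is unexpressed and grows roughly in proportion to the expression level when it is high.

```python
import math
import random
import statistics


def simulate_intensity(mu, alpha, sigma_eta, sigma_eps, rng):
    """Draw one intensity Y = alpha + mu * exp(eta) + eps."""
    eta = rng.gauss(0.0, sigma_eta)  # multiplicative error
    eps = rng.gauss(0.0, sigma_eps)  # additive error
    return alpha + mu * math.exp(eta) + eps


def empirical_sd(mu, n=20000, alpha=100.0, sigma_eta=0.2, sigma_eps=30.0, seed=0):
    """Sample standard deviation of n simulated intensities at expression level mu.

    All default parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    draws = [simulate_intensity(mu, alpha, sigma_eta, sigma_eps, rng) for _ in range(n)]
    return statistics.stdev(draws)


sd_unexpressed = empirical_sd(mu=0.0)   # dominated by the additive error
sd_high = empirical_sd(mu=10000.0)      # dominated by the multiplicative error
print(sd_unexpressed, sd_high, sd_high / 10000.0)
```

The last printed value is the relative standard deviation at the high level, which should be roughly constant for any large `mu`.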

Estimation of the Background (mean $\alpha$ and standard deviation $\sigma_\varepsilon$ of the additive error)
• Estimation of background using negative controls
• Estimation of background with replicate measurements
• Estimation of background without replicates

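Estimation of the background standard deviation $\sigma_\varepsilon$ with replicate measurements
• Begin with a small subset of genes with low intensity (10%)
• Define a new subset consisting of genes whose intensity values are in $[\hat\alpha - 2\hat\sigma_\varepsilon,\ \hat\alpha + 2\hat\sigma_\varepsilon]$, where $\hat\alpha$ and $\hat\sigma_\varepsilon$ are the mean and the pooled replicate standard deviation of the current subset
• Repeat the first and second steps until the set of genes does not change.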
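The iteration above can be sketched as follows. Several details are assumptions made for this illustration rather than statements of the authors' procedure: genes are ranked by their replicate mean, the subset is redefined as the genes whose mean lies within two estimated standard deviations of the estimated background mean, and a `max_iter` guard is added in case the subset oscillates. The function names and the synthetic data are hypothetical.

```python
import math
import random
import statistics


def pooled_sd(groups):
    """Pooled standard deviation across groups of replicate measurements."""
    num = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    den = sum(len(g) - 1 for g in groups)
    return math.sqrt(num / den)


def estimate_background(replicates, start_fraction=0.10, width=2.0, max_iter=100):
    """Iteratively estimate the background mean and additive-error SD.

    replicates: list of lists, one inner list of replicate intensities per gene.
    Returns (alpha_hat, sigma_hat, indices of the genes treated as background).
    """
    means = [statistics.mean(g) for g in replicates]
    order = sorted(range(len(replicates)), key=lambda i: means[i])
    n_start = max(2, int(start_fraction * len(replicates)))
    subset = set(order[:n_start])  # start from the lowest-intensity genes
    for _ in range(max_iter):
        alpha_hat = statistics.mean([means[i] for i in subset])
        sigma_hat = pooled_sd([replicates[i] for i in subset])
        lo = alpha_hat - width * sigma_hat
        hi = alpha_hat + width * sigma_hat
        new_subset = {i for i, m in enumerate(means) if lo <= m <= hi}
        if new_subset == subset or not new_subset:
            break  # converged (or nothing left to use)
        subset = new_subset
    return alpha_hat, sigma_hat, subset


def simulate_gene(mu, n_rep, rng, alpha=100.0, sigma_eta=0.2, sigma_eps=30.0):
    """Replicate intensities from the two-component model (illustrative parameters)."""
    return [alpha + mu * math.exp(rng.gauss(0.0, sigma_eta)) + rng.gauss(0.0, sigma_eps)
            for _ in range(n_rep)]


rng = random.Random(1)
data = [simulate_gene(0.0, 3, rng) for _ in range(200)]                      # unexpressed
data += [simulate_gene(rng.uniform(500.0, 5000.0), 3, rng) for _ in range(50)]  # expressed
alpha_hat, sigma_hat, background = estimate_background(data)
print(alpha_hat, sigma_hat, len(background))
```

On this synthetic data the estimates should land near the simulated background mean and additive standard deviation; on real arrays the result depends on the data and on the starting subset.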

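Estimation of the High-level RSD
• The variance of the intensity in the two-component model: $\mathrm{Var}(Y) = \mu^2 S_\eta^2 + \sigma_\varepsilon^2$, where $S_\eta^2 = e^{\sigma_\eta^2}\,(e^{\sigma_\eta^2} - 1)$ ($\mu$: expression level; $\sigma_\eta$, $\sigma_\varepsilon$: standard deviations of the multiplicative and additive errors)
• At high expression levels only the multiplicative error term is noticeable, so the ratio of the standard deviation to the mean is a constant, i.e. RSD $\approx S_\eta$
• For each replicated gene that is at a high level, compute the mean and the standard deviation of $\ln(Y - \hat\alpha)$, where $\hat\alpha$ is the estimated background mean
• Then use the pooled standard deviation to estimate $\sigma_\eta$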
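A minimal sketch of the pooled estimate, assuming the quantity pooled across genes is the replicate standard deviation of the log of the background-corrected intensity. The function names, the synthetic data, and the conversion from the log-scale standard deviation to the RSD via the lognormal variance formula are assumptions for illustration.

```python
import math
import random
import statistics


def pooled_sd(groups):
    """Pooled standard deviation across groups of replicate measurements."""
    num = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    den = sum(len(g) - 1 for g in groups)
    return math.sqrt(num / den)


def estimate_high_level_rsd(high_replicates, alpha_hat):
    """Return (sigma_eta_hat, s_eta_hat) from replicated high-level genes.

    high_replicates: list of lists of replicate intensities, all well above background.
    alpha_hat: estimated background mean, subtracted before taking logs.
    """
    logs = [[math.log(y - alpha_hat) for y in gene] for gene in high_replicates]
    sigma_eta_hat = pooled_sd(logs)
    v = math.exp(sigma_eta_hat ** 2)
    s_eta_hat = math.sqrt(v * (v - 1.0))  # RSD implied by the lognormal variance
    return sigma_eta_hat, s_eta_hat


# Synthetic high-level genes (all parameter values are illustrative assumptions).
rng = random.Random(2)
genes = []
for _ in range(300):
    mu = rng.uniform(5000.0, 20000.0)
    genes.append([100.0 + mu * math.exp(rng.gauss(0.0, 0.2)) + rng.gauss(0.0, 30.0)
                  for _ in range(3)])

sigma_eta_hat, s_eta_hat = estimate_high_level_rsd(genes, alpha_hat=100.0)
print(sigma_eta_hat, s_eta_hat)
```

Pooling within genes removes the gene-to-gene differences in expression level, so only the replicate-to-replicate variation contributes to the estimate.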

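Define “high” and “low”
• Low expression level: most of the variance is due to the additive error component. 95% CI for $\mu$: $(Y - \hat\alpha) \pm 1.96\,\hat\sigma_\varepsilon$
• High expression level: most of the variance is due to the multiplicative error component. 95% CI for $\ln\mu$: $\ln(Y - \hat\alpha) \pm 1.96\,\hat\sigma_\eta$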
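To make the distinction concrete, the sketch below computes the share of the variance that comes from the additive component and the two approximate intervals. It assumes the variance decomposition into a multiplicative part (expression level times RSD, squared) plus an additive part, a normal approximation on the raw scale at low levels, and a normal approximation on the log scale at high levels. Function names and numeric inputs are hypothetical.

```python
import math


def s_eta(sigma_eta):
    """High-level RSD implied by the log-scale standard deviation sigma_eta."""
    v = math.exp(sigma_eta ** 2)
    return math.sqrt(v * (v - 1.0))


def additive_share(mu, sigma_eta, sigma_eps):
    """Fraction of Var(Y) = mu^2 * S_eta^2 + sigma_eps^2 due to the additive error."""
    additive = sigma_eps ** 2
    multiplicative = (mu * s_eta(sigma_eta)) ** 2
    return additive / (additive + multiplicative)


def ci_low(y, alpha_hat, sigma_eps, z=1.96):
    """Approximate CI for the expression level when additive error dominates."""
    center = y - alpha_hat
    return center - z * sigma_eps, center + z * sigma_eps


def ci_high(y, alpha_hat, sigma_eta, z=1.96):
    """Approximate CI for the expression level when multiplicative error dominates."""
    center = y - alpha_hat
    return center * math.exp(-z * sigma_eta), center * math.exp(z * sigma_eta)


print(additive_share(0.0, 0.2, 30.0))      # unexpressed gene: all additive
print(additive_share(10000.0, 0.2, 30.0))  # strongly expressed gene: almost none
print(ci_low(130.0, 100.0, 30.0))
print(ci_high(10100.0, 100.0, 0.2))
```

The low-level interval is symmetric and can include zero, while the high-level interval is symmetric only on the log scale.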

Comparing Expression Levels
• Common method: a standard t-test on the ratio of expression for treatment and control (low level), or on its logarithm (high level).
• Problem: this is less effective when a gene is expressed at a low level in one condition and at a high level in the other.

Solution: treat the treatment and control measurements as correlated
• Model:
• Variation: Background: High-level RSD:

Hypothesis Testing (Comparison)
• Assume the data have been adjusted:
• Testing: (the gene has the same expression level in control and treatment)
• Then use the following approximate variance to perform a standard t-test on the log ratio of the raw data:

Limitations
• No theoretical results (consistency, asymptotic distribution) are available for the estimators above.
• The cutoff point between high and low levels is fairly artificial.
• The convergence of the background estimation depends heavily on the data and on the initial selection.

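Literature & Other Possible Solutions for Measurement Error
• Chen et al. (1997): measurement error is normally distributed with a constant coefficient of variation (CV), in accord with experience.
• Ideker et al. (2000) introduce a multiplicative error component (normal).
• Newton et al. (2001) propose a gamma model for measurement error.
• Durbin et al. (2002) suggest the transformation $h(y) = \ln\left(y - \alpha + \sqrt{(y - \alpha)^2 + c}\right)$, where $c = \sigma_\varepsilon^2 / S_\eta^2$ (the additive error variance divided by the squared high-level RSD).
• Huber et al. (2002) introduce the transformation $h(y) = \operatorname{arsinh}(a + b\,y)$.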
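As an illustration of what a variance-stabilizing transformation is meant to achieve, the sketch below applies a generalized-log transform of the form attributed to Durbin et al. (2002) to simulated two-component data. The simulation parameters and function names are assumptions, and the true parameter values are plugged in where a real analysis would use estimates.

```python
import math
import random
import statistics

ALPHA, SIGMA_ETA, SIGMA_EPS = 100.0, 0.2, 30.0  # illustrative values only


def glog(y, alpha, c):
    """Generalized log: ln(z + sqrt(z^2 + c)) with z = y - alpha."""
    z = y - alpha
    return math.log(z + math.sqrt(z * z + c))


def transformed_sd(mu, n=20000, seed=0):
    """Standard deviation of glog-transformed simulated intensities at level mu."""
    v = math.exp(SIGMA_ETA ** 2)
    c = SIGMA_EPS ** 2 / (v * (v - 1.0))  # additive variance / squared RSD
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        y = ALPHA + mu * math.exp(rng.gauss(0.0, SIGMA_ETA)) + rng.gauss(0.0, SIGMA_EPS)
        values.append(glog(y, ALPHA, c))
    return statistics.stdev(values)


sd_low = transformed_sd(mu=0.0)
sd_high = transformed_sd(mu=10000.0)
print(sd_low, sd_high)
```

Unlike the plain logarithm, this transform is defined for background-corrected intensities that are zero or negative, and the two printed standard deviations should be of similar size.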

References
• Durbin, B., Hardin, J., Hawkins, D. and Rocke, D. (2002) “A variance-stabilizing transformation for gene-expression microarray data”, Bioinformatics, ISMB 2002.
• Chen, Y., Dougherty, E.R. and Bittner, M.L. (1997) “Ratio-based decisions and the quantitative analysis of cDNA microarray images”, J. Biomed. Opt., 2, 364-374.
• Huber, W., von Heydebreck, A. and Vingron, M. (Dec. 2002) “Analysis of microarray gene expression data”, Preprint.
• Huber, W., von Heydebreck, A., Sültmann, H., Poustka, A. and Vingron, M. (2002) “Variance stabilization applied to microarray data calibration and to the quantification of differential expression”, Bioinformatics, 18 Suppl. 1:S96–S104, ISMB 2002.
