
Alex Lewin, Sylvia Richardson (IC Epidemiology), Tim Aitman (IC Microarray Centre)

Simultaneous Normalization and Differential Expression. Alex Lewin, Sylvia Richardson (IC Epidemiology), Tim Aitman (IC Microarray Centre). In collaboration with Anne-Mette Hein, Natalia Bochkina (IC Epidemiology), Helen Causton (IC Microarray Centre), Peter Green and Graeme Ambler (Bristol).


Presentation Transcript


  1. Simultaneous Normalization and Differential Expression Alex Lewin Sylvia Richardson (IC Epidemiology) Tim Aitman (IC Microarray Centre) In collaboration with Anne-Mette Hein, Natalia Bochkina (IC Epidemiology) Helen Causton (IC Microarray Centre) Peter Green and Graeme Ambler (Bristol)

  2. Expression level dependent normalization Many gene expression data sets need normalization which depends on expression level. Usually normalization is performed in a pre-processing step before the model for differential expression is used. These analyses ignore the fact that the expression level is measured with variability. Ignoring this variability leads to bias in the function used for normalization.

  3. Simultaneous normalization and differential expression We propose a Bayesian model which includes array effects (normalization) in the differential expression model. We show (on simulated data) that ignoring the variability in the expression level leads to a greater number of false positives.

  4. Bayesian hierarchical model for differential expression
     Data: $y_{gsr}$ = log gene expression for gene g, replicate r
     $\alpha_g$ = gene effect
     $\delta_g$ = differential effect for gene g between 2 conditions
     $\beta_{r(g)s}$ = array effect (expression-level dependent)
     $\sigma_{gs}^2$ = gene variance
     • 1st level
       $y_{g1r} \sim N(\alpha_g - \tfrac{1}{2}\delta_g + \beta_{r(g)1},\ \sigma_{g1}^2)$,
       $y_{g2r} \sim N(\alpha_g + \tfrac{1}{2}\delta_g + \beta_{r(g)2},\ \sigma_{g2}^2)$,
       $\sum_r \beta_{r(g)s} = 0$, $\beta_{r(g)s}$ = function of $\alpha_g$, parameters {a} and {b}
     • 2nd level
       Priors for $\alpha_g$, $\delta_g$, coefficients {a} and {b}
       $\sigma_{gs}^2 \sim \text{lognormal}(\mu_s, \tau_s)$

  5. Details of array effects (Normalization)
     Piecewise polynomial with unknown break points:
     $\beta_{r(g)s}$ = quadratic in $\alpha_g$ for $a_{rs(k-1)} \le \alpha_g \le a_{rs(k)}$, with coefficients $(b_{rsk}^{(1)}, b_{rsk}^{(2)})$, k = 1, …, #breakpoints
     Locations of break points not fixed
     Must do sensitivity checks on # break points
     Cubic fits well for the data we are interested in
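A piecewise quadratic of the kind described on this slide can be sketched as follows. This is only an illustration, not the authors' parameterization: the function name `array_effect`, the continuity-at-break-points construction, the `start` value, and the flat extrapolation outside the break-point range are all assumptions made here; the slide only states that each segment carries two coefficients.

```python
def array_effect(alpha, breaks, coeffs, start=0.0):
    """Evaluate a continuous piecewise quadratic at expression level alpha.

    breaks: sorted break points [a_0, a_1, ..., a_K] (hypothetical values)
    coeffs: K pairs (b1, b2), the linear and quadratic coefficients of segment k
    start:  value of the curve at a_0 (an assumption of this sketch)
    """
    value = start
    for k, (b1, b2) in enumerate(coeffs):
        left, right = breaks[k], breaks[k + 1]
        # Distance travelled inside segment k, capped at the segment's right end.
        x = min(alpha, right) - left
        if x <= 0:
            break
        value += b1 * x + b2 * x * x
    return value


# Two segments on [0, 5] and [5, 10] with made-up coefficients.
breaks = [0.0, 5.0, 10.0]
coeffs = [(1.0, 0.0), (0.0, 0.1)]
effects = [array_effect(a, breaks, coeffs) for a in (2.0, 5.0, 7.0)]
```

Because each segment is expressed relative to its left break point, the curve is continuous by construction, which is one way two coefficients per segment can suffice.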

  6. Mouse Data
     3 wildtype (normal) mice compared with 3 mice with Cd36 knocked out
     [Figure: 3 replicate arrays (wildtype mouse data). Model: posterior means $E(\beta_{r(g)s} \mid \text{data})$ v. $E(\alpha_g \mid \text{data})$. Data: $y_{gsr} - E(\alpha_g \mid \text{data})$]

  7. Simulated Data
     • 1000 genes with 3 replicates under 2 conditions
     • Expression levels $\alpha_g$ between 0 and 10 (log scale)
     • $\sigma_{g1}^2 \sim \text{logNormal}(-1.8, 1)$, $\sigma_{g2}^2 \sim \text{logNormal}(-2.2, 1)$
     • 900 genes: $\delta_g = 0$
     • 50 genes: $\delta_g \sim N(\log(3), 0.1^2)$
     • 50 genes: $\delta_g \sim N(-\log(3), 0.1^2)$
     • Array effects $\beta_{r(g)s}$ cubic functions of $\alpha_g$
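The simulation design on this slide can be sketched in plain Python. Several details are assumptions made for illustration and are not stated in the slides: expression levels are drawn uniformly on [0, 10], the cubic coefficients in `raw` are arbitrary small values, and the array effects are centred across replicates so that they sum to zero for each gene and condition.

```python
import math
import random


def simulate(n_reps=3, seed=0):
    """Simulate 1000 genes under 2 conditions following the slide's design."""
    rng = random.Random(seed)
    # Hypothetical cubic coefficients (c1, c2, c3) per condition and replicate.
    raw = [[(rng.uniform(-0.1, 0.1),
             rng.uniform(-0.02, 0.02),
             rng.uniform(-0.002, 0.002)) for _ in range(n_reps)]
           for _ in range(2)]

    def cubic(c, a):
        return c[0] * a + c[1] * a ** 2 + c[2] * a ** 3

    genes = []
    for g in range(1000):
        alpha = rng.uniform(0, 10)          # assumed uniform on the log scale
        if g < 900:
            delta = 0.0                     # not differentially expressed
        elif g < 950:
            delta = rng.gauss(math.log(3), 0.1)
        else:
            delta = rng.gauss(-math.log(3), 0.1)
        y = []
        for s, (sign, mu) in enumerate([(-0.5, -1.8), (0.5, -2.2)]):
            # The variance is log-normal, so the standard deviation is its square root.
            sd = math.sqrt(rng.lognormvariate(mu, 1.0))
            effects = [cubic(c, alpha) for c in raw[s]]
            centre = sum(effects) / n_reps  # enforce sum over replicates = 0
            y.append([rng.gauss(alpha + sign * delta + e - centre, sd)
                      for e in effects])
        genes.append({"alpha": alpha, "delta": delta, "y": y})
    return genes


genes = simulate()
```

Each entry holds the true `alpha` and `delta` alongside `y[s][r]`, so downstream code can compare calls against the truth.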

  8. Array Effects and Variability for Simulated Data

  9. Two-step method
     • Use loess smoothing to obtain array effects $\beta^{loess}_{r(g)s}$
     • Subtract loess array effects from data: $y^{loess}_{gsr} = y_{gsr} - \beta^{loess}_{r(g)s}$
     • Run our model on $y^{loess}_{gsr}$ with no array effects
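The pre-normalization step above might look roughly like the sketch below. It is not the authors' code: a simplified tricube-weighted local linear smoother stands in for a full loess implementation, and the choice to smooth each array's residuals (data minus the per-gene mean) against the per-gene mean is an assumption about how the array effects are obtained. The names `local_linear` and `two_step_normalize` are hypothetical.

```python
def local_linear(x, y, x0, frac=0.5):
    """Tricube-weighted local linear fit at x0 (simplified stand-in for loess)."""
    k = max(2, int(frac * len(x)))
    dists = sorted(abs(xi - x0) for xi in x)
    # Bandwidth: distance to the k-th nearest point, nudged up so it keeps weight > 0.
    h = dists[k - 1] * (1 + 1e-9) + 1e-12
    sw = swx = swy = swxx = swxy = 0.0
    for xi, yi in zip(x, y):
        u = abs(xi - x0) / h
        if u >= 1.0:
            continue
        w = (1.0 - u ** 3) ** 3
        sw += w
        swx += w * xi
        swy += w * yi
        swxx += w * xi * xi
        swxy += w * xi * yi
    denom = sw * swxx - swx * swx
    if abs(denom) < 1e-12:
        return swy / sw                     # fall back to a weighted mean
    slope = (sw * swxy - swx * swy) / denom
    return (swy - slope * swx) / sw + slope * x0


def two_step_normalize(y, frac=0.5):
    """y[g][r]: log expression for gene g on replicate array r (one condition)."""
    gene_means = [sum(row) / len(row) for row in y]
    normalized = [list(row) for row in y]
    for r in range(len(y[0])):
        resid = [row[r] - m for row, m in zip(y, gene_means)]
        for g, m in enumerate(gene_means):
            # Estimated array effect at this gene's level, subtracted from the data.
            normalized[g][r] = y[g][r] - local_linear(gene_means, resid, m, frac)
    return normalized


# Toy data: array effects exactly linear in the expression level.
toy = [[a * 1.1, a * 0.9, a * 1.0] for a in range(20)]
toy_normalized = two_step_normalize(toy)
```

Note that the smoother treats the per-gene mean as if it were the true expression level, which is exactly the simplification the full model is meant to avoid.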

  10. Two-step method
     • $y^{loess}_{gsr} = y_{gsr} - \beta^{loess}_{r(g)s}$
     • $y^{model}_{gsr} = y_{gsr} - E(\beta_{r(g)s} \mid \text{data})$
     • Results from 2 different two-step methods are much closer to each other than to full model results.

  11. Decision rules for selecting differentially expressed genes If $P(\delta_g > \delta_{cut} \mid \text{data}) > p_{cut}$ then gene g is called differentially expressed. We used $\delta_{cut} = \log(3)$, which corresponds to the null hypothesis. Various $p_{cut}$ can be used; choose this according to the acceptable error rate (e.g. False Discovery Rate).
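As a minimal sketch of this decision rule, the posterior probability can be estimated as the fraction of posterior draws of the differential effect that exceed the cut-off. The assumption here is that such draws are available (for example from a sampler); the slides do not say how the probability is computed, and the function names and toy draws are hypothetical.

```python
import math


def posterior_prob_exceeds(delta_draws, delta_cut):
    """Monte Carlo estimate of P(delta_g > delta_cut | data) from posterior draws."""
    return sum(d > delta_cut for d in delta_draws) / len(delta_draws)


def call_differential(draws_by_gene, delta_cut, p_cut):
    """Return {gene: True/False} according to the rule on the slide."""
    return {gene: posterior_prob_exceeds(draws, delta_cut) > p_cut
            for gene, draws in draws_by_gene.items()}


# Made-up posterior draws for two genes.
draws = {"g1": [1.2, 1.3, 0.9, 1.5], "g2": [0.0, 0.1, -0.1, 1.2]}
calls = call_differential(draws, delta_cut=math.log(3), p_cut=0.5)
```

Raising `p_cut` makes the rule stricter, so fewer genes are called.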

  12. Full model v. two-step method Plot observed False Discovery Rate against pcut (averaged over 5 simulations) Solid line for full model Dashed line for pre-normalized method
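The curve described on this slide can be computed along the following lines when the truth is known, as it is for simulated data. This is an illustrative sketch with hypothetical names; it assumes each simulation supplies one posterior probability per gene and a flag saying whether that gene was truly differentially expressed, and it reports an observed rate of 0 when no gene is called.

```python
def observed_fdr(post_probs, is_de, p_cut):
    """Fraction of called genes that are not truly differentially expressed."""
    called = [de for p, de in zip(post_probs, is_de) if p > p_cut]
    if not called:
        return 0.0                          # convention of this sketch: no calls, no false calls
    return sum(1 for de in called if not de) / len(called)


def mean_fdr_curve(simulations, p_cuts):
    """Average observed FDR over simulations, one value per p_cut.

    simulations: list of (post_probs, is_de) pairs, one per simulated data set.
    """
    return [sum(observed_fdr(p, t, c) for p, t in simulations) / len(simulations)
            for c in p_cuts]


# Made-up probabilities and truth flags for four genes.
probs = [0.9, 0.8, 0.6, 0.2]
truth = [True, False, True, False]
curve = mean_fdr_curve([(probs, truth), (probs, truth)], [0.5, 0.85, 0.95])
```

Running this once with probabilities from the full model and once with those from the pre-normalized method would give the two lines being compared.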

  13. Discussion • More false positives if normalization is carried out in a pre-processing step. • The larger the slope of the array effects, the larger the difference between the full and pre-normalized models. • Lewin, A., Richardson, S., Marshall, C., Glazier, A. and Aitman, T. (2004) Bayesian Modelling of Differential Gene Expression. (under revision), available at http://www.bgx.org.uk/
