Lecture Topic 5

reesep
Presentation Transcript


  1. Lecture Topic 5 Pre-processing AFFY data

  2. Probe Level Analysis • The Purpose • Calculate an expression value for each probe set (gene) from its 11-25 PM (perfect match) and MM (mismatch) intensities • Critical for later analysis: avoids GIGO (garbage in, garbage out)

  3. Difficulties • Large variability • Few measurements (11-25 at most) • MM is very complex: it is signal plus background • Signal has to be SCALED • Probe-level effects

  4. Different Methods • MAS 4 Affymetrix 1996 • MAS 5 Affymetrix 2002 • Robust Multichip Analysis (RMA) 2002 • GC-RMA 2004

  5. MAS 4 • A: the probe pairs selected

  6. Avg Diff • Calculated by taking the difference PM-MM for every probe pair and averaging over the probe pairs • Excluded OUTLIER pairs whose PM-MM was more than 3 SD from the mean • Was NOT a robust average • NOT log-transformed • COULD be negative (about 1/3 of the time)
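The Avg Diff calculation can be sketched in Python as follows. This is a minimal sketch, not Affymetrix's code: the name `avg_diff` is hypothetical, and it assumes that a pair counts as an outlier when its PM-MM difference lies more than 3 SD from the mean difference, with the mean and SD computed after setting aside the largest and smallest difference. The slide does not state that detail.

```python
from statistics import mean, stdev

def avg_diff(pm, mm):
    """Average PM-MM difference over probe pairs, skipping outlier pairs.

    pm, mm: equal-length sequences of intensities for one probe set.
    Assumption (not from the slide): mean and SD used for the 3 SD rule
    are computed after dropping the largest and smallest difference.
    """
    diffs = [p - m for p, m in zip(pm, mm)]
    if len(diffs) < 4:
        # Too few pairs to trim and still estimate an SD; use them all.
        return mean(diffs)
    trimmed = sorted(diffs)[1:-1]
    center, spread = mean(trimmed), stdev(trimmed)
    kept = [d for d in diffs if abs(d - center) <= 3 * spread]
    return mean(kept)

# One wild pair (PM = 1000) is excluded; the rest are averaged.
pm = [110, 120, 130, 125, 115, 118, 122, 128, 112, 119, 1000]
mm = [100] * 11
print(avg_diff(pm, mm))
```

The result is a plain arithmetic mean on the raw scale, so it is neither robust nor log-transformed, and it goes negative whenever MM tends to exceed PM.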

  7. MAS 5 • Signal = TukeyBiweight{log2(PMj-IMj)} • Discussed this earlier. • Requires calculating IM (the Ideal Mismatch) • The adjusted differences PM-IM are log-transformed, then averaged with the Tukey biweight, which is robust to outlying observations.
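A minimal Python sketch of the MAS 5 signal formula follows. The function names are hypothetical, the IM values are taken as given (their calculation is not shown here), and the tuning constants `c=5` and `eps=1e-4` are the commonly cited defaults for the one-step Tukey biweight, stated here as an assumption.

```python
import math
from statistics import median

def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight: a weighted mean that ignores far outliers."""
    m = median(values)
    mad = median([abs(v - m) for v in values])  # median absolute deviation
    weights = []
    for v in values:
        u = (v - m) / (c * mad + eps)           # scaled distance from median
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def mas5_log_signal(pm, im):
    """TukeyBiweight{log2(PMj - IMj)}; requires PMj > IMj for every pair."""
    return tukey_biweight([math.log2(p - i) for p, i in zip(pm, im)])

# The value 50 is far from the rest and gets weight 0.
print(tukey_biweight([10, 10.1, 9.9, 10, 50]))
```

Values more than `c` MADs from the median receive zero weight, which is what makes the average robust with only a handful of probe pairs.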

  8. Robust Multichip Analysis (RMA) • ONLY uses PM and ignores MM • SACRIFICES accuracy for major gains in PRECISION • Basic Steps: • 1. Calculate chip background (*BG) and subtract it from PM • 2. Carry out intensity-dependent normalization of PM-*BG • Lowess • Quantile Normalization (discussed before) • 3. Log-transform the normalized PM-*BG • 4. Summarize all probes in the set across chips using the Tukey median polish procedure. Signal is the antilog of the result.

  9. RMA- Step 1: Background Correction • Irizarry et al. (2003) • Estimates the TRUE signal as its conditional expectation given the observed signal (which is assumed to be the true signal plus noise) • E(s_i | s_i + b_i) • Here, s_i is assumed to follow an Exponential distribution with parameter θ • b_i is assumed to follow N(μ_e, σ_e²) • Estimate μ_e and σ_e as the mean and standard deviation of empty spots

  10. RMA- BG Corrected Value
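The formula on this slide did not survive in the transcript. As a hedged reconstruction, the sketch below computes the conditional expectation of the signal given the observed intensity under the exponential-plus-normal model of the previous slide, using the closed form usually quoted for RMA. It assumes the exponential parameter (called `theta` here) is a rate, and that the noise mean and SD (`mu_e`, `sigma_e`) are already estimated. All function and argument names are hypothetical.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def rma_bg_corrected(o, theta, mu_e, sigma_e):
    """E(s | s + b = o) for s ~ Exponential(rate theta), b ~ N(mu_e, sigma_e^2).

    With a = o - mu_e - theta * sigma_e^2 and b = sigma_e:
      E(s | o) = a + b * [phi(a/b) - phi((o-a)/b)]
                       / [Phi(a/b) + Phi((o-a)/b) - 1]
    """
    a = o - mu_e - theta * sigma_e ** 2
    b = sigma_e
    numerator = norm_pdf(a / b) - norm_pdf((o - a) / b)
    denominator = norm_cdf(a / b) + norm_cdf((o - a) / b) - 1
    return a + b * numerator / denominator

# Observed intensities above and below the background mean of 100.
for observed in (200.0, 90.0):
    print(observed, rma_bg_corrected(observed, theta=0.01, mu_e=100.0, sigma_e=20.0))
```

Unlike a plain subtraction of the background mean, the corrected value stays positive even when the observed intensity is below `mu_e`, so the later log transform is always defined.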

  11. RMA- Normalization • Use the background-corrected intensities B(PM) to carry out normalization • Lowess (for spatial effects) • Quantile Normalization (to allow comparability amongst replicate slides) • Normalized B(PM) are log-transformed
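Quantile normalization can be sketched in a few lines of plain Python. This is an illustrative version under simple assumptions: the function name is hypothetical, the input is a probes-by-arrays matrix of background-corrected intensities, and ties are broken by position rather than averaged.

```python
def quantile_normalize(matrix):
    """Force every array (column) to share the same intensity distribution.

    matrix[i][j] is the intensity of probe i on array j.
    Returns a new matrix; the input is left unchanged.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[matrix[i][j] for i in range(n_rows)] for j in range(n_cols)]
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean across arrays at each rank.
    rank_means = [sum(sc[r] for sc in sorted_cols) / n_cols for r in range(n_rows)]
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            result[i][j] = rank_means[rank]  # replace value by its rank's mean
    return result

data = [[5, 4, 3],
        [2, 1, 4],
        [3, 4, 6],
        [4, 2, 8]]
for row in quantile_normalize(data):
    print(row)
```

After this step every column contains exactly the same set of values, only in a different order, and the log transform is then applied to the normalized values.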

  12. RMA summarization • Use MEDIAN POLISH to fit a linear model • Given a MATRIX of data: • Data = overall effect + row effect + column effect + residual • Find row and column effects by alternately subtracting the row medians and the column medians until all the medians are smaller in magnitude than some epsilon • Gives estimated row, column and overall effects when done
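The median polish procedure can be sketched as below. This is a minimal sketch rather than the code of any particular package: the function name, the default `eps`, and the iteration cap `max_iter` are assumptions made for illustration.

```python
from statistics import median

def median_polish(data, eps=1e-6, max_iter=50):
    """Fit data[i][j] = overall + row_eff[i] + col_eff[j] + resid[i][j]."""
    resid = [list(row) for row in data]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        # Row sweep: move each row's median into its row effect.
        row_med = [median(r) for r in resid]
        for i in range(n_rows):
            row_eff[i] += row_med[i]
            resid[i] = [x - row_med[i] for x in resid[i]]
        shift = median(col_eff)          # keep column effects centred
        overall += shift
        col_eff = [c - shift for c in col_eff]
        # Column sweep: move each column's median into its column effect.
        col_med = [median([resid[i][j] for i in range(n_rows)]) for j in range(n_cols)]
        for j in range(n_cols):
            col_eff[j] += col_med[j]
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
        shift = median(row_eff)          # keep row effects centred
        overall += shift
        row_eff = [r - shift for r in row_eff]
        if max(abs(m) for m in row_med + col_med) < eps:
            break
    return overall, row_eff, col_eff, resid

# A purely additive table is recovered exactly.
table = [[10 + r + c for c in (-2, 0, 2)] for r in (-1, 0, 1)]
print(median_polish(table)[:3])
```

For a probe set laid out with probes in rows and arrays in columns, the log-scale expression estimate for array j would be `overall + col_eff[j]`.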

  13. Median Polish of RMA • For each probe set we have a matrix (probes in rows and arrays in columns) • We assume: • Log-transformed PM = probe affinity effect + log-scale expression level + error • Also assume the sum of probe affinities is 0 • Use MEDIAN polish to estimate the expression level in each array
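Written out, the assumed model for one probe set might look like the following. The symbols are not from the slides and are introduced only for illustration: $Y_{ij}$ is the log2 of the normalized, background-corrected PM intensity of probe $i$ on array $j$, $\alpha_i$ is the probe affinity effect, $\mu_j$ is the log-scale expression level on array $j$, and $\varepsilon_{ij}$ is the error.

```latex
Y_{ij} = \mu_j + \alpha_i + \varepsilon_{ij},
\qquad \sum_{i=1}^{I} \alpha_i = 0,
\qquad i = 1, \dots, I \ (\text{probes}), \quad j = 1, \dots, J \ (\text{arrays})
```

In median polish terms, $\mu_j$ corresponds to the overall effect plus the column effect for array $j$, and $\alpha_i$ to the row effect for probe $i$.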
