Affymetrix Data Pre-processing: Probe-level Analysis Methods

Lecture Topic 5 Pre-processing AFFY data

Probe Level Analysis • The purpose: calculate an expression value for each probe set (gene) from its 11-25 PM (perfect match) and MM (mismatch) intensities • Critical for later analysis: avoids GIGO (garbage in, garbage out)

Difficulties • Large variability • Few measurements (11-25 at most) • MM is very complex: it is signal plus background • Signal has to be SCALED • Probe-level effects

Different Methods • MAS 4 Affymetrix 1996 • MAS 5 Affymetrix 2002 • Robust Multichip Analysis (RMA) 2002 • GC-RMA 2004

MAS 4 A- probe pairs selected

Avg Diff • Calculated by taking the difference PM-MM for every probe pair and averaging over the probe pairs • Excluded OUTLIER pairs whose PM-MM was more than 3 SD from the mean • Was NOT a robust average • NOT log-transformed • COULD be negative (about 1/3 of the time)
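The Avg Diff rule above can be sketched in a few lines of Python. This is only an illustration, not the MAS 4 implementation: the function name `avg_diff`, the `n_sd` parameter, and the use of the sample standard deviation are assumptions made for the example.

```python
from statistics import mean, stdev

def avg_diff(pm, mm, n_sd=3.0):
    """Average PM-MM difference over the probe pairs of one probe set.

    Pairs whose difference lies more than n_sd standard deviations from
    the mean difference are dropped before averaging (assumed reading of
    the 3 SD outlier rule). Needs at least two probe pairs.
    """
    diffs = [p - m for p, m in zip(pm, mm)]
    centre = mean(diffs)
    spread = stdev(diffs)
    kept = [d for d in diffs if abs(d - centre) <= n_sd * spread]
    return mean(kept)

# One wild probe pair among 20 is excluded, so the result stays at 100.
pm = [200] * 19 + [10100]
mm = [100] * 20
print(avg_diff(pm, mm))

# When MM exceeds PM the average is negative, which has no sensible
# interpretation as an expression level and cannot be log transformed.
print(avg_diff([90, 95, 100], [120, 130, 110]))
```

Because the mean and SD are themselves inflated by outliers, a pair has to be very extreme before it is dropped, which is one way to see why this average is not robust.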

MAS 5 • Signal = TukeyBiweight{log2(PMj - IMj)} • Discussed this earlier. • Requires calculating IM (the ideal mismatch) • The adjusted differences PM-IM are log transformed, and the Tukey Biweight makes the average robust to outlying observations.
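A minimal sketch of the MAS 5 signal formula follows. The one-step Tukey biweight uses tuning constants c = 5 and epsilon = 0.0001, and the ideal mismatch uses contrast and scale constants 0.03 and 10 with a floor of 2^-20. All of these constants, and the function names, are assumptions for illustration and should be checked against the Affymetrix algorithm description. The final scaling step is omitted.

```python
import math
from statistics import median

def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight: a weighted mean that downweights points
    far from the median and ignores points beyond c MADs."""
    values = list(values)
    m = median(values)
    mad = median([abs(v - m) for v in values])
    weights = []
    for v in values:
        u = (v - m) / (c * mad + eps)
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def ideal_mismatch(pm, mm, sb, contrast_tau=0.03, scale_tau=10.0):
    """IM is MM when MM < PM; otherwise a value chosen to be below PM.
    sb is the probe-set-wide biweight of log2(PM) - log2(MM)."""
    if mm < pm:
        return mm
    if sb > contrast_tau:
        return pm / 2 ** sb
    return pm / 2 ** (contrast_tau / (1 + (contrast_tau - sb) / scale_tau))

def mas5_log_signal(pm, mm, delta=2 ** -20):
    """TukeyBiweight{log2(PMj - IMj)} for one probe set (log2 scale)."""
    sb = tukey_biweight([math.log2(p) - math.log2(m) for p, m in zip(pm, mm)])
    pv = [math.log2(max(p - ideal_mismatch(p, m, sb), delta))
          for p, m in zip(pm, mm)]
    return tukey_biweight(pv)

# The outlying value 100 gets zero weight, so the result stays near 2.6.
print(tukey_biweight([1, 2, 3, 4, 100]))
```

Since IM is always smaller than PM, the logarithm is defined for every probe pair, which removes the negative values that Avg Diff could produce.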

Robust Multichip Analysis • Uses ONLY PM and ignores MM • SACRIFICES accuracy but makes major gains in PRECISION • Basic Steps: • 1. Calculate chip background (*BG) and subtract it from PM • 2. Carry out intensity-dependent normalization of PM-*BG • Lowess • Quantile Normalization (Discussed before) • 3. Normalized PM-*BG are log transformed • 4. Robust multichip analysis of all probes in the set, using the Tukey median polish procedure. Signal is the antilog of the result.

RMA- Step 1: Background Correction • Irizarry et al. (2003) • Looks at finding the conditional expectation of the TRUE signal given the observed signal (which is assumed to be the true signal plus noise) • E(si | si+bi) • Here, si is assumed to follow an Exponential distribution with parameter θ. • bi is assumed to follow N(μe, σe²) • Estimate μe and σe as the mean and standard deviation of empty spots

RMA- BG Corrected Value
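Under the model above (Exponential signal plus Normal background), the conditional expectation has a closed form. Writing o for the observed intensity, a = o - mu - sigma^2 * rate and b = sigma, the form usually quoted is E(s | o) = a + b [phi(a/b) - phi((o-a)/b)] / [Phi(a/b) + Phi((o-a)/b) - 1], where phi and Phi are the standard Normal density and distribution function. The sketch below assumes the Exponential parameter is a rate (mean 1/rate) and takes the background mean and SD as given; the function and argument names are illustrative.

```python
import math

def _phi(x):
    """Standard Normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _Phi(x):
    """Standard Normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def rma_background_correct(o, mu, sigma, rate):
    """E(signal | observed = o) for signal ~ Exponential(rate) and
    background ~ Normal(mu, sigma^2), with signal constrained to [0, o]."""
    a = o - mu - sigma ** 2 * rate
    b = sigma
    num = _phi(a / b) - _phi((o - a) / b)
    # Phi(a/b) + Phi((o-a)/b) - 1, rewritten to avoid cancellation
    den = _Phi(a / b) - _Phi(-(o - a) / b)
    return a + b * num / den

# Bright probe: the correction is essentially o - mu - sigma^2 * rate.
print(rma_background_correct(1000.0, mu=100.0, sigma=20.0, rate=0.01))
# Dim probe below the background mean: the result is small but positive.
print(rma_background_correct(50.0, mu=100.0, sigma=20.0, rate=0.01))
```

The corrected value is always positive, so the later log transformation is defined for every probe, unlike a plain PM minus background subtraction.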

RMA- Normalization • Use the background corrected intensities B(PM) to carry out normalization • Lowess (for Spatial effects) • Quantile Normalization (to allow comparability amongst replicate slides) • Normalized B(PM) are log transformed
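Quantile normalization forces every array to share the same intensity distribution: each array is sorted, the values at each rank are averaged across arrays, and every intensity is replaced by the average for its rank. A minimal sketch, assuming each array is a plain list of background corrected intensities of equal length and ignoring special handling of ties:

```python
import math

def quantile_normalize(arrays):
    """arrays: list of equal-length lists, one list of intensities per chip.
    Returns new lists in which every chip has the same set of values."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean across chips of the r-th smallest value.
    reference = [sum(s[r] for s in sorted_arrays) / len(arrays)
                 for r in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices by rank
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized

chips = [[5, 2, 3], [4, 1, 6]]
norm = quantile_normalize(chips)
print(norm)  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]

# Log transform after normalization, as in the last bullet above.
log_norm = [[math.log2(v) for v in chip] for chip in norm]
```

Each probe keeps its rank within its own chip; only the value attached to that rank changes.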

RMA summarization • Use MEDIAN POLISH to fit a linear model • Given a MATRIX of data: • Data = overall effect + row effects + column effects + residual • Find the row and column effects by subtracting the row medians and the column medians alternately until all the medians are smaller than some epsilon • Gives the estimated row, column and overall effects when done

Median Polish of RMA • For each probe set we have a matrix (probes in rows and arrays in columns) • We assume: • Signal (log scale) = probe affinity effect + log-scale expression level + error • Also assume the sum of the probe affinities is 0 • Use MEDIAN polish to estimate the expression level in each array
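The median polish procedure and its use for one probe set can be sketched as follows. The code assumes the input is a matrix of log2, background corrected, normalized PM values with probes in rows and arrays in columns; the names `median_polish` and `rma_summarize`, the tolerance, and the iteration cap are illustrative choices. Note that median polish centres the probe effects so that their median is 0, a robust stand-in for the sum-to-zero constraint.

```python
from statistics import median

def median_polish(matrix, eps=1e-6, max_iter=20):
    """Fit data = overall + row effect + column effect + residual by
    alternately sweeping out row medians and column medians."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    resid = [list(row) for row in matrix]
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep out the median of each row.
        row_med = [median(row) for row in resid]
        for i in range(n_rows):
            row_eff[i] += row_med[i]
            resid[i] = [x - row_med[i] for x in resid[i]]
        shift = median(col_eff)
        overall += shift
        col_eff = [c - shift for c in col_eff]
        # Sweep out the median of each column.
        col_med = [median([resid[i][j] for i in range(n_rows)])
                   for j in range(n_cols)]
        for j in range(n_cols):
            col_eff[j] += col_med[j]
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
        shift = median(row_eff)
        overall += shift
        row_eff = [r - shift for r in row_eff]
        # Stop once every median just removed is negligible.
        if max(abs(m) for m in row_med + col_med) < eps:
            break
    return overall, row_eff, col_eff, resid

def rma_summarize(log_pm):
    """Log-scale expression estimate for each array (column)."""
    overall, _, col_eff, _ = median_polish(log_pm)
    return [overall + c for c in col_eff]

# Three probes with affinities -1, 0, +1 on two arrays expressed at 8 and 10.
probe_set = [[7, 9], [8, 10], [9, 11]]
print(rma_summarize(probe_set))  # [8.0, 10.0]
```

Because medians are used at every step, a single aberrant probe on one array ends up in the residuals instead of distorting that array's expression estimate.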
