1 / 71

Models and methods for summarizing GeneChip probe set data

Models and methods for summarizing GeneChip probe set data. Some Gene Expression Analysis Tasks. Detection of gene expression – presence calls. Differential expression detection – comparative calls. Measurement of gene expression. Objective:.

wilda
Download Presentation

Models and methods for summarizing GeneChip probe set data

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Models and methods for summarizing GeneChip probe set data

  2. Some Gene Expression Analysis Tasks • Detection of gene expression – presence calls. • Differential expression detection – comparative calls. • Measurement of gene expression.

  3. Objective: To compute probe set summaries which are good indicators of gene expression from background corrected, normalized, prefect match probe intensities for a set of arrays: PM*ijk i=1,…,I, J=1,…,J, k=1,…K Where i denotes probes in probe sets, j denotes arrays and k denotes probe sets.

  4. Affymetrix spike-in data set used for illustration - 14 genes spiked in at differentconcentrations into a common pool ofpancreas cRNA

  5. Affy comment on non-responding probes: • Affymetrix: “Certain probe pairs for 407_at and 36889_at do not work well. It is recommended that these two probe sets be excluded for final statistical tally.”

  6. To log or not to log? In addition to providing good expression values, we would like the model to be easy to understand and analyse – Would like to fit a standard linear model: • Homogeneity of variance • Additivity • Normality

  7. Homogeneity of variance Look at association between the variance and the mean of the intensities – plot IQR of PM* across 59 replicates against the median of PM* across 59 replicates for probe sets spanning the range of intensities. Repeat for log2(PM*).

  8. Intensity scale – ALL

  9. Log Intensity scale – ALL

  10. Additivity Look at log-log plots of PM* vs concentrations for 14 spike-in fragments

  11. PM.bgc.norm vs Conc log-log plot grp 1

  12. PM.bgc.norm vs Conc log-log plot grp 2

  13. Suggested additive model Log-log plots of PM* vs concentrations suggest the following model: log2(PM*ij) = pi + cj + ij (1) With pi a probe affinity effect, cj the log2 scale expression level for chip j, and ij an iid error term. For identifiability we fit with constraint i pi=0.

  14. Normality We can examine residuals from a least squares fit to model specified in (1) to verify the adequacy of the model in terms of additivity of effects and stability of variance. The shape of the distribution of the residuals can also be compared with a Gaussian distribution to see how far off we are from this ideal.

  15. Res vs chip effects - grp 1

  16. Res vs chip effects - grp 2

  17. Figure – qqnorm residuals form additive fit

  18. Res qqnorm - grp 1

  19. Res qqnorm - grp 2

  20. Analyzing the untransformed PM* values On the untransformed scale, one can fit a multiplicative model (Li-Wong): PM*ij = i·j + ij (2) The model is fitted by least squares by iteratively fitting the s and the s, regarding the other set as known. Fitting steps are interleaved with diagnostic checks used to exclude points from subsequent fits.

  21. PM vs logConc - grp 1

  22. Res vs chip effects - grp 1

  23. qq - grp 1

  24. Why robust? • Bad probes – probe outliers • Bad chips – chip outliers • Image artifacts – individual outliers We would like a fitting procedure which yields good estimates in the presence of various types of outliers – individual points, probes, and chips.

  25. Robust estimation • Huber, Hampel, Rousseeuw • Gross errors, round off errors, wrong model. • Distinction between approach based on identification and exclusion of outliers and the modeling approach.

  26. M estimators • A general class of robust estimators are obtained as solutions to: min i(Yi-Xi) Where  is a symmetric function. Or solving the following system: i(Yi-Xi)· Xi=0

  27. Robust fit by IRLS for each probe set Starting with robust fit, at each iteration: S = mad(rij)·c – robust estimate of scale of  uij = rij/S – rescaled residuals wij =(|uij|)/|uij| – weights used in next LS fit. Theoretical considerations can lead to specification of . In practice, one selects  function with desirable characteristics.

  28. Example  functions

  29. Options for fitting models to probe sets Recall model log2(PM*ij) = pi + cj + ij (1) Can fit by: • Least squares • Least absolute deviation ((x)=|x|) • IRLS using various  functions • Can also get a single chip robust probe set summary.

  30. Robust fit example A

  31. Actual vs fitted

  32. Starting weights

  33. Ending weights

  34. Robust analysis of multi way tables • Tukey & co – median polish. • Tukey – one degree of freedom test – additivity or not – no partial judgement. • Gentleman and Wilks – effect of one or two outliers on residuals. • C. Daniel** – estimate which cells are affected by interactions, and estimate the interactions in a set of cells.

  35. Modified weights Standard IRLS procedures determines weights from each cell of the two way table individually. We can also look at residuals across cells in a row (column), to determine a weighting adjustment for the entire row (column): rwi =(|u|i•)/|u|i• , cwj =(|u|•j)/|u|•j And get a composite weight for each cell” wwij = rwi· wij wwwij = cwj · rwi· wij

  36. Heuristic derivation of weights Consider the model with interactions: log2(PM*ij) = pi + cj + ij + ij Can think of Tij = |rij|/S as a test statistic for H0: ij = 0 vs H1: ij  0 and wij = (Tij)/ Tij as a transformation of this test statistic into a weight. Similarly, one could use Ti = |ri•|/S =madi/mad to test H0: ij = 0, j=1,…J vs H1: ij  0 for some j and map this statistic into a weight.

  37. See how it does • Look initial (individual) weights & fit vs adjusted weights and fit and then to convergence. • Also look at probe weights in all spike-in probe sets. • Column weights

  38. Starting weights

  39. Ending weights

  40. Robust fit – composite weights

  41. Probe Weights

  42. Low-weight Probes - 1

  43. Low-weight Probes - 2

  44. Low-weight Probes - 3

  45. Chip Weights

  46. Note on multi chip context Note that the residual variance in the model without probe effects, the single chip analysis set-up, is ~ 6x the residual variance in the model with probe effects. Ie. log2(PM*ij) = pi + cj + ij Vs. log2(PM*ij) = cj + ij

  47. Compare fits on sample probe sets

  48. Affy Probe Data Analysis – X hybridizing probe sets

  49. X-Hybe probe 3

  50. X-Hybe probe 3 – PM vs Phi, Theta

More Related