Applications of microarrays

Applications of microarrays: measuring transcript abundance (result of production versus degradation); mapping transcript structure (alternative splicing, TSSs, or degradation; UTRs); genotyping; estimating DNA copy number (CGH); DNA-protein interactions; …


Presentation Transcript


  1. Applications of microarrays
     o Measuring transcript abundance (result of production versus degradation)
     o Mapping transcript structure (alternative splicing, TSSs, or degradation; UTRs)
     o Genotyping
     o Estimating DNA copy number (CGH)
     o DNA-protein interactions
     o …

  2. Preprocessing, error models, quality assessment

  3. Abundance vs. transcription rate
     In principle, they are independent.
     If only passive degradation: [formula not preserved in the transcript]
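
A hedged sketch of the kind of relation this slide sets up. The slide's own formula is not preserved, so everything here is an assumption: a constant production (transcription) rate $k$, first-order passive degradation with rate constant $\lambda$, and abundance $x(t)$.

```latex
\frac{dx}{dt} = k - \lambda x
\quad\Longrightarrow\quad
x_{\text{steady state}} = \frac{k}{\lambda}
```

Under these assumptions, abundance tracks the transcription rate only when $\lambda$ is the same for the genes or conditions being compared.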

  4. Response curve. Lockhart et al., Nature Biotechnology 14 (1996)

  5. Compression. Yue et al. (Incyte Genomics), NAR 29 (2001) e41

  6. Log-ratio: which genes are differentially transcribed?
     [Figure panels: same-same; tumor-normal]

  7. Statistics 101: bias ↔ accuracy; precision ↔ variance

  8. Basic dogma of data analysis: you can always increase sensitivity at the cost of specificity, or vice versa; the art is to find the sweet spot. (It may also be possible to increase both by a better choice of method / model.)
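
The trade-off can be illustrated with a small threshold sweep. This is a minimal sketch: the scores and the truth labels are made-up values, and `sensitivity_specificity` is a hypothetical helper, not something from the presentation.

```python
def sensitivity_specificity(scores, is_positive, threshold):
    """Call an item positive when its score is >= threshold."""
    pairs = list(zip(scores, is_positive))
    tp = sum(1 for s, p in pairs if p and s >= threshold)
    fn = sum(1 for s, p in pairs if p and s < threshold)
    tn = sum(1 for s, p in pairs if not p and s < threshold)
    fp = sum(1 for s, p in pairs if not p and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical test statistics: the first five genes are truly changed.
scores = [4.1, 3.2, 2.8, 1.9, 1.1, 2.5, 1.4, 0.9, 0.3, 0.1]
truth = [True] * 5 + [False] * 5

for t in (1.0, 2.0, 3.0):
    sens, spec = sensitivity_specificity(scores, truth, t)
    print(f"threshold={t}: sensitivity={sens:.1f}, specificity={spec:.1f}")
```

Raising the threshold lowers sensitivity and raises specificity; only a better score (method / model) moves both in the right direction.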

  9. Ratios and fold changes
     Fold changes are useful to describe continuous changes in expression.
     [Figure: intensities of genes A, B and C in two conditions, with fold changes labeled x1.5, x3 and "?"]
     But what if the gene is “off” (below detection limit) in one condition?

  10. Fold change estimation and background correction
      • Many interesting genes will be off in some of the conditions of interest.
      • Due to unspecific hybridization and optical noise, measured values are always > 0.
      1. If you want the expression measure to be an unbiased estimator of abundance:
         • strong background correction, get many values ≤ 0
         • need something other than the (log-)ratio
      2. If you let the expression measure be biased (always > 0):
         • weak background correction, then can keep ratios
         • how do you choose the bias?
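
A small numerical sketch of the second option, assuming the bias acts as a constant added to every background-corrected intensity. The intensities and bias values are invented, and `fold_change` is a hypothetical helper.

```python
def fold_change(x_a, x_b, bias=0.0):
    """Ratio of two background-corrected intensities after adding a positive bias."""
    return (x_a + bias) / (x_b + bias)


# Hypothetical intensities: a threefold change, and a gene that is "off" in B.
for bias in (0.0, 100.0, 1000.0):
    on_on = fold_change(3000.0, 1000.0, bias)
    # Without bias the ratio 200 / 0 is undefined; report it as infinite.
    on_off = fold_change(200.0, 0.0, bias) if bias > 0 else float("inf")
    print(f"bias={bias}: 3000 vs 1000 -> {on_on:.2f}, 200 vs 0 -> {on_off:.2f}")
```

A larger bias keeps every ratio finite but shrinks all fold changes toward 1, which is why the choice of bias matters.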

  11. Raw data are not mRNA concentrations. The problem is less that these steps are ‘not perfect’; it is that they may vary from gene to gene, array to array, experiment to experiment.

  12. Sources of variation
      o amount of RNA in the sample
      o efficiencies of
        - RNA extraction
        - reverse transcription
        - labeling
        - fluorescent detection
      o probe purity and length distribution
      o cross-/unspecific hybridization
      o stray signal
      Systematic (handled by calibration):
      o similar effect on many measurements
      o corrections can be estimated from data
      Stochastic (handled by the error model):
      o too random to be explicitly accounted for
      o remain as “noise”

  13. Modeling ansatz: measured intensity = offset + gain × true abundance
      o $a_i$: per-sample offset
      o $\varepsilon_{ik} \sim N(0, b_i^2 \sigma_1^2)$: “additive noise”
      o $b_i$: per-sample normalization factor
      o $b_k$: sequence-wise probe efficiency
      o $\eta_{ik} \sim N(0, \sigma_2^2)$: “multiplicative noise”

  14. The two-component model: “additive” noise and “multiplicative” noise
      [Figure panels: raw scale; log scale]
      B. Durbin, D. Rocke, JCB 2001
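
A minimal simulation sketch of a two-component error model. The functional form used here (offset plus additive Gaussian noise plus gain times a log-normal factor times abundance) and all parameter values are assumptions chosen for illustration; `simulate` is a hypothetical helper.

```python
import math
import random
import statistics


def simulate(true_abundance, n, offset=50.0, gain=1.0,
             sd_add=20.0, sd_mult=0.2, seed=0):
    """Draw n intensities y = offset + eps + gain * exp(eta) * x."""
    rng = random.Random(seed)
    return [
        offset
        + rng.gauss(0.0, sd_add)                        # additive noise
        + gain * math.exp(rng.gauss(0.0, sd_mult)) * true_abundance  # multiplicative noise
        for _ in range(n)
    ]


for x in (0.0, 100.0, 10000.0):
    y = simulate(x, n=5000)
    print(f"x={x:>8}: mean={statistics.fmean(y):10.1f}  sd={statistics.stdev(y):8.1f}")
```

At low abundance the spread is dominated by the additive term and is roughly constant; at high abundance it grows in proportion to the signal.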

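  15. Variance stabilizing transformations
      $X_u$: a family of random variables with $E X_u = u$, $\operatorname{Var} X_u = v(u)$.
      Define $f$ [formula not preserved in the transcript] such that $\operatorname{Var} f(X_u)$ is independent of $u$.
      Derivation: linear approximation.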
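
The linear-approximation step can be sketched as follows. This is the standard delta-method argument, written out here as an assumption about what the slide's missing formula showed; the normalization of the stabilized variance to 1 is an arbitrary choice.

```latex
f(X_u) \approx f(u) + f'(u)\,(X_u - u)
\quad\Longrightarrow\quad
\operatorname{Var} f(X_u) \approx f'(u)^2 \, v(u)

f'(u)^2 \, v(u) = 1
\quad\Longrightarrow\quad
f(x) = \int^{x} \frac{du}{\sqrt{v(u)}}
```

The stabilization is only approximate, since it relies on the first-order expansion of $f$ around $u$.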

  16. Variance stabilization
      [Figure: f(x) plotted against x]

  17. Variance stabilizing transformations for four variance models:
      1.) constant variance (‘additive’)
      2.) constant CV (‘multiplicative’)
      3.) offset
      4.) additive and multiplicative
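
A numerical sketch of the four cases, assuming the stabilizing transformation is the integral of $1/\sqrt{v(u)}$. The parametric forms of $v(u)$, the closed-form candidates and the parameter values `s`, `c`, `u0` are assumptions made for this illustration, since the slide's formulas are not preserved.

```python
import math


def integrate(g, lo, hi, steps=20000):
    """Trapezoidal rule for the integral of g from lo to hi."""
    h = (hi - lo) / steps
    total = 0.5 * (g(lo) + g(hi)) + sum(g(lo + i * h) for i in range(1, steps))
    return total * h


# Assumed parameter values, chosen only for illustration.
s, c, u0 = 20.0, 0.2, 50.0

# name -> (variance function v(u), closed-form candidate f(u))
cases = {
    "additive": (lambda u: s ** 2, lambda u: u / s),
    "multiplicative": (lambda u: (c * u) ** 2, lambda u: math.log(u) / c),
    "offset": (lambda u: (c * (u + u0)) ** 2, lambda u: math.log(u + u0) / c),
    "additive and multiplicative": (
        lambda u: s ** 2 + (c * (u + u0)) ** 2,
        lambda u: math.asinh(c * (u + u0) / s) / c,
    ),
}

lo, hi = 10.0, 1000.0
for name, (v, f) in cases.items():
    numeric = integrate(lambda u: 1.0 / math.sqrt(v(u)), lo, hi)
    closed = f(hi) - f(lo)
    print(f"{name:28s} numeric={numeric:.4f}  closed-form={closed:.4f}")
```

The fourth case is where the inverse hyperbolic sine appears: it behaves linearly for small values and logarithmically for large ones.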

  18. The “glog” transformation
      [Figure legend: dashed line, $f(x) = \log(x)$; solid line, $h_s(x) = \operatorname{asinh}(x/s)$]
      P. Munson, 2001; D. Rocke & B. Durbin, ISMB 2002
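
A short sketch comparing asinh(x/s) with the ordinary logarithm. The scale value `s = 100` is an arbitrary assumption, and `glog` is a name chosen here for illustration.

```python
import math


def glog(x, s=1.0):
    """Generalized log: asinh(x / s) = log(x/s + sqrt((x/s)**2 + 1))."""
    return math.asinh(x / s)


s = 100.0  # assumed scale parameter, for illustration only
for x in (-50.0, 0.0, 50.0, 1e3, 1e5):
    # The ordinary log is undefined for x <= 0; glog is not.
    shifted_log = math.log(2.0 * x / s) if x > 0 else float("nan")
    print(f"x={x:>9}: glog={glog(x, s):8.4f}  log(2x/s)={shifted_log:8.4f}")
```

For large x the two curves differ only by the constant log(2/s), so differences of glog values behave like log-ratios there, while near zero the glog stays finite and roughly linear.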

  19. Variance stabilization
      Variance: constant part, proportional part.
      Scale and corresponding comparison measure:
      o raw scale: difference
      o log: log-ratio
      o glog: generalized log-ratio
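
A minimal sketch of a generalized log-ratio, taken here to be the difference of two asinh-transformed values. The scale `s` and the intensities are assumed, and `generalized_log_ratio` is a hypothetical helper.

```python
import math


def generalized_log_ratio(x1, x2, s):
    """Difference of glog-transformed values: asinh(x1/s) - asinh(x2/s)."""
    return math.asinh(x1 / s) - math.asinh(x2 / s)


s = 100.0  # assumed scale parameter
# High intensities: close to the ordinary log-ratio log(3).
print(generalized_log_ratio(30000.0, 10000.0, s), math.log(3.0))
# Low intensities: the same 3x ratio is shrunk toward 0.
print(generalized_log_ratio(30.0, 10.0, s))
# Values <= 0 (after background correction) still give a finite result.
print(generalized_log_ratio(20.0, -5.0, s))
```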

  20. Parameter estimation
      o Maximum likelihood estimator: straightforward, but sensitive to deviations from normality.
      o The model holds for genes that are unchanged; differentially transcribed genes act as outliers.
      o Robust variant of the ML estimator, à la least trimmed sum of squares regression.
      o Works well as long as <50% of genes are differentially transcribed (and may still work otherwise).

  21. Least trimmed sum of squares regression (P. Rousseeuw, 1980s)
      Minimize: [objective not preserved in the transcript]
      [Figure legend: least sum of squares fit; least trimmed sum of squares fit]
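
To illustrate the trimming idea, here is a sketch for the simplest case, estimating a single location instead of a regression line. That simplification is an assumption of this example; the data and the helper `lts_location` are hypothetical.

```python
import statistics


def lts_location(values, h):
    """Least trimmed squares estimate of location.

    Minimizes the sum of the h smallest squared residuals. In one dimension
    the optimal subset is a run of h consecutive order statistics, so it is
    enough to scan all such windows.
    """
    xs = sorted(values)
    best_mean, best_ss = None, float("inf")
    for start in range(len(xs) - h + 1):
        window = xs[start:start + h]
        m = statistics.fmean(window)
        ss = sum((x - m) ** 2 for x in window)
        if ss < best_ss:
            best_mean, best_ss = m, ss
    return best_mean


# Hypothetical log-ratios: most genes unchanged (near 0), three outliers.
data = [-0.2, -0.1, 0.0, 0.0, 0.1, 0.1, 0.2, 3.0, 3.5, 4.0]
print(statistics.fmean(data))     # ordinary least squares fit, pulled up by the outliers
print(lts_location(data, h=6))    # trimmed fit, stays near the bulk of the data
```

The regression case replaces the window mean with a fitted line and usually needs an iterative search over subsets, but the objective is the same.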

  22. Evaluation: effects of different data transformations
      [Figure axes: difference red-green vs. rank(average)]

  23. glog

  24. o For Affymetrix data, the weak background correction method of RMA and the glog(-ratio) of vsn give very similar results.
      o vsn is also useful for other array platforms (e.g. spotted two-color).
      o Don't be afraid of the "glog": it is equivalent to weak (= biased) background correction and normal log!
      o vsn package (see vignette). Ref.: Huber, von Heydebreck et al., Bioinformatics 2002

  25. Evaluation: sensitivity / specificity in detecting differential abundance
      o Data: paired tumor/normal tissue from 19 kidney cancers, in color-flip duplicates on 38 cDNA slides of 4000 genes each.
      o 6 different strategies for normalization and quantification of differential abundance.
      o Calculate for each gene and each method: t-statistics, permutation p-values.
      o For threshold $\alpha$, compare the number of genes the different methods find, $\#\{p_i \mid p_i \le \alpha\}$.
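
A sketch of a per-gene t-statistic with a permutation p-value for paired data. Using sign flips of the paired differences as the permutation scheme is an assumption of this example, since the slide does not say which scheme was used. The differences and the helper names are hypothetical.

```python
import itertools
import math
import statistics


def paired_t(diffs):
    """One-sample t-statistic of the paired (tumor minus normal) differences."""
    n = len(diffs)
    return statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


def sign_flip_p(diffs):
    """Two-sided permutation p-value from all 2**n sign flips of the pairs."""
    observed = abs(paired_t(diffs))
    count = total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        flipped = [s * d for s, d in zip(signs, diffs)]
        total += 1
        # Small tolerance so that the observed labeling always counts itself.
        if abs(paired_t(flipped)) >= observed - 1e-12:
            count += 1
    return count / total


# Hypothetical log-ratio differences for one gene in six patients.
diffs = [0.8, 1.1, 0.9, 1.3, 0.7, 1.0]
print(paired_t(diffs), sign_flip_p(diffs))
```

Full enumeration is only practical for small numbers of pairs; with 19 pairs there are 2**19 sign patterns per gene, so a random subset of flips would be the usual shortcut.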

  26. Evaluation: comparison of methods
      [Figure panels: one-sided test for up; one-sided test for down]
      More accurate quantification of differential expression ⇒ higher sensitivity / specificity

  27. Evaluation: a benchmark for Affymetrix GeneChip expression measures
      o Data:
        - Spike-in series: from Affymetrix, 59 x HGU95A, 16 genes, 14 concentrations, complex background
        - Dilution series: from GeneLogic, 60 x HGU95Av2, liver & CNS cRNA in different proportions and amounts
      o Benchmark: 15 quality measures regarding
        - reproducibility
        - sensitivity
        - specificity
      Put together by Rafael Irizarry (Johns Hopkins): http://affycomp.biostat.jhsph.edu

  28. ROC curves

  29. affycomp results
      [Figure labels: good; bad]

  30. Probe Set Summarization

  31. Probe set summarization: data and notation
      $PM_{ijg}$, $MM_{ijg}$ = intensities for perfect match and mismatch probe j for gene g in chip i
      o i = 1, …, n: one to hundreds of chips
      o j = 1, …, J: usually 11 or 16 probe pairs
      o g = 1, …, G: 6…30,000 probe sets
      Tasks:
      o calibrate (normalize) the measurements from different chips (samples)
      o summarize for each probe set the probe-level data, i.e., 16 PM and MM pairs, into a single expression measure
      o compare between chips (samples) for detecting differential expression

  32. Expression measures: MAS 4.0
      Affymetrix GeneChip MAS 4.0 software uses AvDiff, a trimmed mean:
      o sort $d_j = PM_j - MM_j$
      o exclude highest and lowest value
      o J := those pairs within 3 standard deviations of the average
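
The steps above can be sketched as follows. Two points are assumptions of this example and not stated on the slide: the final measure is taken to be the plain mean of the differences in J, and the average and standard deviation are computed after dropping the extremes. The probe intensities are invented and `avdiff` is a hypothetical helper.

```python
import statistics


def avdiff(pm, mm):
    """AvDiff-style trimmed mean of PM - MM differences for one probe set."""
    d = sorted(p - m for p, m in zip(pm, mm))
    trimmed = d[1:-1]                          # exclude lowest and highest value
    center = statistics.fmean(trimmed)
    spread = statistics.stdev(trimmed)
    # J: all pairs whose difference lies within 3 SD of the trimmed average.
    kept = [x for x in d if abs(x - center) <= 3 * spread]
    return statistics.fmean(kept)


# Hypothetical probe set with 11 pairs; the last pair is an outlier.
pm = [110, 112, 111, 113, 109, 111, 110, 112, 111, 110, 600]
mm = [100] * 11
print(avdiff(pm, mm))
```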

  33. Expression measures: MAS 5.0
      Instead of MM, use a "repaired" version CT:
        CT = MM                          if MM < PM
           = PM / "typical log-ratio"    if MM >= PM
      "Signal" = Tukey.Biweight(log(PM - CT)) (… median)
      Tukey biweight: B(x) = (1 - (x/c)^2)^2 if |x| < c, 0 otherwise
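
A sketch of a one-step Tukey biweight average built on the weight function B above. Centering on the median, scaling by the median absolute deviation, and the defaults `c = 5` and `eps = 1e-4` are assumptions of this example; the CT computation is not covered. The input values are invented.

```python
import statistics


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step biweight: weighted mean with weights (1 - u**2)**2 for |u| < 1."""
    m = statistics.median(values)
    mad = statistics.median([abs(v - m) for v in values])
    weights = []
    for v in values:
        u = (v - m) / (c * mad + eps)   # distance from the median in units of c * MAD
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical log(PM - CT) values for one probe set, with one outlier.
log_values = [8.0, 8.1, 7.9, 8.2, 7.8, 15.0]
print(tukey_biweight(log_values), statistics.fmean(log_values))
```

Values far from the median get weight zero, so the outlier does not move the estimate, while values near the center contribute almost fully.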

  34. Expression measures: Li & Wong
      dChip fits a model for each gene [model formula not preserved in the transcript], where
      • $\theta_i$: expression index for the gene in chip i
      • $\phi_j$: probe sensitivity
      The maximum likelihood estimate of the MBEI (model-based expression index) is used as the expression measure of the gene in chip i.
      Need at least 10 or 20 chips.
      Current version works with PMs only.
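
A sketch of how such a model could be fitted, assuming the multiplicative form "probe signal in chip i, probe j ≈ expression index of chip i × sensitivity of probe j" with the sensitivities normalized so that their squares sum to the number of probes. The alternating least squares scheme, the helper `fit_expression_index` and the data are illustrative assumptions, not dChip's actual implementation (which also handles outliers).

```python
import math


def fit_expression_index(y, iterations=50):
    """Alternating least squares for y[i][j] ~ theta[i] * phi[j], sum(phi**2) = J."""
    n_chips, n_probes = len(y), len(y[0])
    theta = [sum(row) / n_probes for row in y]   # start from per-chip means
    phi = [1.0] * n_probes
    for _ in range(iterations):
        # Given theta, each phi_j is a regression through the origin.
        denom = sum(t * t for t in theta)
        phi = [sum(y[i][j] * theta[i] for i in range(n_chips)) / denom
               for j in range(n_probes)]
        # Rescale phi to satisfy the normalization constraint.
        scale = math.sqrt(n_probes / sum(p * p for p in phi))
        phi = [p * scale for p in phi]
        # Given phi, update theta the same way.
        denom = sum(p * p for p in phi)
        theta = [sum(y[i][j] * phi[j] for j in range(n_probes)) / denom
                 for i in range(n_chips)]
    return theta, phi


# Hypothetical noise-free data: 3 chips, 4 probes.
true_theta = [100.0, 200.0, 400.0]
raw_phi = [0.5, 1.0, 1.5, 1.0]
norm = math.sqrt(len(raw_phi) / sum(p * p for p in raw_phi))
true_phi = [p * norm for p in raw_phi]
y = [[t * p for p in true_phi] for t in true_theta]

theta_hat, phi_hat = fit_expression_index(y)
print(theta_hat)
print(phi_hat)
```

Because the probe sensitivities are shared across chips, they can only be estimated well when enough chips are available, which matches the slide's remark about needing many chips.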

  35. Robust expression measures. RMA: Irizarry et al. (2002)
      o AvDiff-like, with A a set of “suitable” pairs.
      o Li-Wong-like: additive model.
      o Estimate RMA = $a_i$ for chip i using the robust method median polish (successively remove row and column medians, accumulate terms, until convergence).
      o Works with d >= 2.
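
Median polish as described (remove row and column medians in turn, accumulate them, stop at convergence) can be sketched like this. The layout with chips as rows and probes as columns, the stopping rule and the example matrix are assumptions of this illustration.

```python
import statistics


def median_polish(matrix, max_iter=20, tol=1e-9):
    """Fit matrix[i][j] ~ overall + row[i] + col[j] by sweeping out medians."""
    z = [list(r) for r in matrix]               # residuals, updated in place
    n_rows, n_cols = len(z), len(z[0])
    overall, row, col = 0.0, [0.0] * n_rows, [0.0] * n_cols
    for _ in range(max_iter):
        # Row sweep: move each row's median into the row effect.
        rmed = [statistics.median(r) for r in z]
        for i in range(n_rows):
            row[i] += rmed[i]
            z[i] = [v - rmed[i] for v in z[i]]
        shift = statistics.median(col)
        col = [c - shift for c in col]
        overall += shift
        # Column sweep: move each column's median into the column effect.
        cmed = [statistics.median([z[i][j] for i in range(n_rows)])
                for j in range(n_cols)]
        for j in range(n_cols):
            col[j] += cmed[j]
            for i in range(n_rows):
                z[i][j] -= cmed[j]
        shift = statistics.median(row)
        row = [r - shift for r in row]
        overall += shift
        if max(abs(m) for m in rmed + cmed) < tol:
            break
    return overall, row, col, z


# Hypothetical log2 intensities: 3 chips (rows) x 5 probes (columns).
chip_effect = [8.0, 9.0, 10.0]
probe_effect = [-1.0, 0.0, 1.0, 0.5, -0.5]
log_pm = [[a + b for b in probe_effect] for a in chip_effect]

overall, row, col, resid = median_polish(log_pm)
print([overall + r for r in row])   # per-chip expression estimates
```

The per-chip estimate is the overall term plus the chip's row effect; medians make it insensitive to a few aberrant probes.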

  36. Expression measures. RMA: Irizarry et al. (2002)
      o Estimate one global background value b = mode(MM). No probe-specific background!
      o Assume: $PM = s_{\text{true}} + b$. Estimate $s \ge 0$ from PM and b as a conditional expectation $E[s_{\text{true}} \mid PM, b]$.
      o Use $\log_2(s)$.
      o Nonparametric nonlinear calibration ('quantile normalization') across a set of chips.
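
The last step, quantile normalization, can be sketched as below. Breaking ties by position is a simplification made here, and the chip values are invented; `quantile_normalize` is a hypothetical helper.

```python
import statistics


def quantile_normalize(columns):
    """Give every column (chip) the same empirical distribution.

    Each value is replaced by the mean, across chips, of the values that
    share its rank. Ties are broken by position, which is a simplification.
    """
    n = len(columns[0])
    sorted_cols = [sorted(c) for c in columns]
    reference = [statistics.fmean([sc[r] for sc in sorted_cols]) for r in range(n)]
    result = []
    for c in columns:
        order = sorted(range(n), key=lambda i: c[i])   # indices from smallest to largest
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        result.append(out)
    return result


# Hypothetical intensities for three chips, four probes each.
chips = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 3.0, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
for normalized in quantile_normalize(chips):
    print(normalized)
```

After this step all chips have identical sorted values, while the rank order of probes within each chip is preserved.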

  37. Affymetrix: $I_{PM} = I_{MM} + I_{\text{specific}}$ ?
      [Figure: log(PM/MM), with a reference line at 0]
      From: R. Irizarry et al., Biostatistics 2002
