1. 10/11/04 Affymetrix Designing M-estimators for expression analysis: PLIER
Earl Hubbell
Principal Statistician
Affymetrix
2. 10/11/04 Affymetrix Outline
Drive our intuition (basic data)
Formalize the intuition
Check functionality
Look at results
Bonus tricks and stunts
3. 10/11/04 Affymetrix Wafers, Chips, and Features
We talk a lot about "wafers", "chips", and "features". This is a graphical representation.
We have borrowed photolithography, the manufacturing technology of the semiconductor industry, and applied it to the life sciences.
We do manufacture on wafers. These wafers are 5-inch-by-5-inch pieces of glass.
In our whole-genome products we get 49 individual chips out of each wafer. Each chip carries over 1,300,000 unique features, and each feature holds millions of identical DNA probes.
4. 10/11/04 Affymetrix Expression Probes
5. 10/11/04 Affymetrix Components of Stray Signal
6. 10/11/04 Affymetrix Components of Bound Target Signal and Noise
7. 10/11/04 Affymetrix Hybridization is mostly linear, with some stray signal & saturation
8. 10/11/04 Affymetrix One probe (pair): PM-MM* reduces bias
9. 10/11/04 Affymetrix Probes not very informative about concentration near background!
10. 10/11/04 Affymetrix Probes have systematic differences
11. 10/11/04 Affymetrix Affinity compensates for first-order probe differences
12. 10/11/04 Affymetrix Likelihood summarizes knowledge of expression
13. 10/11/04 Affymetrix A pause before jumping into equations
statistics, whatever their mathematical sophistication and elegance, cannot make bad variables into good ones.
H.T. Reynolds, Analysis of Nominal Data
14. 10/11/04 Affymetrix Fun with Statistics
Money: What should I estimate?
M-estimators: Statistics by Optimization
Model: Linking Intensity to Concentration
Mismatches: Faking Subtraction
Mayhem: Does it work?
More: Tricks!
15. 10/11/04 Affymetrix Estimator Goals
Handle zero/near-zero concentrations
Handle arithmetic noise at low end
Minimum bias (avoid sample trouble)
[can always variance stabilize later]
Resist outliers
Avoid lots of parameters!
16. 10/11/04 Affymetrix How to estimate?
(-5+373+473)/3 = 280.3 (Mean)
280.3 is the value minimizing (x+5)^2+(x-373)^2+(x-473)^2
median(-5,373,473) = 373
373 is the value minimizing |x+5|+|x-373|+|x-473|
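The two minimization claims above can be checked numerically. This is a minimal sketch using a brute-force grid search; the grid range and step are arbitrary choices made for illustration, not part of the original slides.

```python
from statistics import mean, median

data = [-5, 373, 473]

def sum_sq(x, values):
    """Sum of squared deviations of the values from x."""
    return sum((x - v) ** 2 for v in values)

def sum_abs(x, values):
    """Sum of absolute deviations of the values from x."""
    return sum(abs(x - v) for v in values)

# Candidate estimates from -10.0 to 500.0 in steps of 0.1 (arbitrary grid).
grid = [i / 10 for i in range(-100, 5001)]

best_sq = min(grid, key=lambda x: sum_sq(x, data))    # lands next to the mean
best_abs = min(grid, key=lambda x: sum_abs(x, data))  # lands on the median

print(best_sq, mean(data))
print(best_abs, median(data))
```

Changing the function being minimized changes the estimator: squared error gives the mean, absolute error gives the median.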
17. 10/11/04 Affymetrix M-estimator
Optimizes some function of the data, sum(f(y, xi)), over y
y is then an estimate of some interesting property of the data (we hope)
Looks like Maximum Likelihood estimates (but can tune for utility)
18. 10/11/04 Affymetrix Designing the M-estimator PLIER M-estimator minimizes some function of the data and the estimator(s)
Our case: sum( f(PM,MM, a,c,z) )
Choose f to model reasonable error
Choose tail of f to handle outliers
PLIER: Probe Logarithmic Intensity ERror
19. 10/11/04 Affymetrix Assumptions (approximations?)
Concentration never negative! c>=0
linear link between true signal & concentration: T~a*c
Background (not constant) adds to signal: I ~ T+B
Background same for PM and MM
Multiplicative intensity error log(I) ~ normal(log(T+B),s^2)
20. 10/11/04 Affymetrix Assumption: Multiplicative Error
Widely agreed that replicate observations of probes (PM, MM) are approximately log-normal
I.e. PM varies by 10% of PM
Does not imply that derived quantities (PM-MM or PM-B) are also log-normal!
I.e. PM-MM varies by ~7% of (PM+MM) not by 10% of (PM-MM)!
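A small simulation illustrates the point about derived quantities. It is only a sketch: the intensities (1000 and 900), the 10% error level, and the independence of the PM and MM errors are assumptions chosen for illustration, and the ~7% figure applies when PM and MM are of similar size.

```python
import math
import random
import statistics

random.seed(0)

PM_TRUE, MM_TRUE = 1000.0, 900.0  # hypothetical noise-free intensities
SIGMA = 0.10                       # roughly 10% multiplicative error per probe

def noisy(value):
    """Draw an intensity with log(I) ~ normal(log(value), SIGMA^2)."""
    return value * math.exp(random.gauss(0.0, SIGMA))

# Independent errors on PM and MM (an assumption of this sketch).
diffs = [noisy(PM_TRUE) - noisy(MM_TRUE) for _ in range(20000)]
sd_diff = statistics.stdev(diffs)

rel_to_sum = sd_diff / (PM_TRUE + MM_TRUE)   # close to 0.07
rel_to_diff = sd_diff / (PM_TRUE - MM_TRUE)  # far larger than 0.10

print(round(rel_to_sum, 3), round(rel_to_diff, 2))
```

The spread of PM-MM is set by the sizes of PM and MM themselves, so the relative error of the difference grows without bound as PM approaches MM.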
21. 10/11/04 Affymetrix No obvious need for arithmetic noise for raw intensities
22. 10/11/04 Affymetrix Simplified model [PM-MM]
PM = a*c + MM
MM = a2*c + B
If B can vary wildly (experiment to experiment, probe to probe), we are left with
PM - MM = a*c
Incorporating multiplicative error:
e1*PM - e2*MM = a*c
23. 10/11/04 Affymetrix Key concept: good fits have small multiplicative errors
Trying to minimize log(e1)^2+log(e2)^2
The actual minimum is a complicated function, so we (I) don't want to solve for it
And we don't have to: M-estimators can be chosen for computational convenience
Therefore, let log(e1)^2=log(e2)^2
24. 10/11/04 Affymetrix How good is the fit?
Two possible choices:
log(e1) = log(e2) (log transform): no solution for MM > PM, and always a worse fit than
log(e1) = -log(e2) (PLIER)
=> e = [a*c + sqrt((a*c)^2 + 4*PM*MM)] / (2*PM)
log(e) exists for any PM,MM>0, any a,c
effective error model changes from arithmetic near zero to multiplicative far from zero
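For reference, the closed form for e follows directly from the error model e1*PM - e2*MM = a*c once log(e1) = -log(e2) is imposed, i.e. e1 = e and e2 = 1/e. A short derivation, writing y = a*c:

```latex
\begin{aligned}
e\,\mathrm{PM} - \frac{\mathrm{MM}}{e} &= y, \qquad y = a\,c \ge 0 \\
\mathrm{PM}\,e^{2} - y\,e - \mathrm{MM} &= 0 \\
e &= \frac{y + \sqrt{y^{2} + 4\,\mathrm{PM}\,\mathrm{MM}}}{2\,\mathrm{PM}}
\quad \text{(positive root, since } e > 0\text{)}
\end{aligned}
```

As a sanity check, setting y = PM - MM gives sqrt(y^2 + 4*PM*MM) = PM + MM and therefore e = 1, a perfect fit with residual log(e) = 0.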
25. 10/11/04 Affymetrix PM - MM Goodness of fit
26. 10/11/04 Affymetrix Define center of f
Residual r=log(e)
Under log-normal assumption, fit for least r^2
But we should fix the tails (where outliers show up, and the approximation breaks down)
27. 10/11/04 Affymetrix Robustness
Want to discount outliers compared to sum-of-squares
Off-the-shelf: Geman-McClure transformation f(r,z) = r^2/(1+r^2/z)
Looks like least-squares for r small
Bounds the contribution of any single residual to at most z
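Both properties of the Geman-McClure transformation are easy to see numerically. A minimal sketch follows; the value z = 4 is an arbitrary illustration, not the tuning constant used in PLIER, and `influence` is simply the derivative of f with respect to r.

```python
def geman_mcclure(r, z):
    """Robust loss f(r, z) = r^2 / (1 + r^2 / z)."""
    return r * r / (1.0 + r * r / z)

def influence(r, z):
    """Derivative of f with respect to r: 2r / (1 + r^2 / z)^2."""
    return 2.0 * r / (1.0 + r * r / z) ** 2

z = 4.0  # hypothetical tuning constant

small = geman_mcclure(0.01, z)   # almost exactly 0.01 ** 2 (least-squares regime)
huge = geman_mcclure(1000.0, z)  # just below z (bounded contribution)

print(small, huge)
print(influence(1.0, z), influence(1000.0, z))  # the outlier has almost no pull
```

Because the derivative shrinks toward zero for large residuals, a gross outlier barely moves the fit, whereas under plain least squares its pull would grow linearly with r.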
28. 10/11/04 Affymetrix Transformation f(r) and its Influence Function
29. 10/11/04 Affymetrix
30. 10/11/04 Affymetrix PLIER: on a t-shirt
y = a*c
e = [y + sqrt(y^2 + 4*PM*MM)] / (2*PM)
r = log(e)
f(r,z) = r^2/(1+r^2/z)
argmin(sum(f(r,z))) over all a,c >=0
yields PLIER estimate of affinity and concentration
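The t-shirt formulas translate almost line for line into code. The sketch below only evaluates the objective for given affinities and concentrations; it is not the Affymetrix SDK implementation. The function names, the `pm[i][j]` layout (probe i on chip j), and the default z = 4 are assumptions made for illustration.

```python
import math

def plier_residual(a, c, pm, mm):
    """r = log(e) with e = [y + sqrt(y^2 + 4*PM*MM)] / (2*PM), y = a*c."""
    y = a * c
    e = (y + math.sqrt(y * y + 4.0 * pm * mm)) / (2.0 * pm)
    return math.log(e)

def geman_mcclure(r, z):
    """f(r, z) = r^2 / (1 + r^2 / z)."""
    return r * r / (1.0 + r * r / z)

def plier_objective(affinities, concentrations, pm, mm, z=4.0):
    """Sum of f(r, z) over all probes i and chips j (z is a hypothetical default)."""
    total = 0.0
    for i, a in enumerate(affinities):
        for j, c in enumerate(concentrations):
            r = plier_residual(a, c, pm[i][j], mm[i][j])
            total += geman_mcclure(r, z)
    return total

# Noise-free toy data: PM = a*c + MM, so the true parameters fit perfectly.
a_true, c_true = [0.5, 2.0], [0.0, 50.0, 400.0]
mm = [[200.0, 210.0, 190.0], [180.0, 220.0, 205.0]]
pm = [[a * c + mm[i][j] for j, c in enumerate(c_true)]
      for i, a in enumerate(a_true)]

print(plier_objective(a_true, c_true, pm, mm))          # essentially zero
print(plier_objective([1.0, 1.0], [0.0] * 3, pm, mm))   # worse fit, larger value
```

The residual is defined for any PM, MM > 0 and any a, c >= 0, including c = 0, where it reduces to 0.5*log(MM/PM).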
31. 10/11/04 Affymetrix Optimizing (finding minima)
Many ways to find best fit
Easiest to explain is cyclic coordinate descent, a.k.a. "polishing" the data
Can start anywhere (but best to start with a good guess)
32. 10/11/04 Affymetrix Finding affinity/concentration
[Don't I need to know one to start?]
33. 10/11/04 Affymetrix Observed PM/MM values
34. 10/11/04 Affymetrix Compare observed to predicted (find where to improve predictions)
35. 10/11/04 Affymetrix Polishing the table
Guess initial values (a = 1.0, c=0)
Find best concentrations (with current affinities)
Find best affinities (for current concentrations)
Repeat until minimized (or bored)
[remember: values non-negative!]
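The polishing steps above can be sketched as a cyclic coordinate search. This is an illustration under stated assumptions, not the production optimizer: each one-dimensional update uses a golden-section search (a choice made here for simplicity), the search bounds `c_hi` and `a_hi` and the constant z = 4 are hypothetical, and an update is kept only if it lowers the loss.

```python
import math

def loss(a, c, pm, mm, z=4.0):
    """Geman-McClure loss of the PLIER residual for one probe on one chip."""
    y = a * c
    r = math.log((y + math.sqrt(y * y + 4.0 * pm * mm)) / (2.0 * pm))
    return r * r / (1.0 + r * r / z)

def total_loss(a, c, pm, mm):
    return sum(loss(a[i], c[j], pm[i][j], mm[i][j])
               for i in range(len(a)) for j in range(len(c)))

def golden_min(func, lo, hi, iters=80):
    """Golden-section search for a local minimum of func on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = func(x1), func(x2)
    for _ in range(iters):
        if f1 < f2:                       # minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g * (hi - lo)
            f1 = func(x1)
        else:                             # minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g * (hi - lo)
            f2 = func(x2)
    return 0.5 * (lo + hi)

def polish(pm, mm, cycles=30, a_hi=100.0):
    """Alternate between best concentrations and best affinities (all >= 0)."""
    n_probes, n_chips = len(pm), len(pm[0])
    a = [1.0] * n_probes                        # initial guess: a = 1.0
    c = [0.0] * n_chips                         # initial guess: c = 0
    c_hi = 10.0 * max(max(row) for row in pm)   # assumed upper search bound
    for _ in range(cycles):
        for j in range(n_chips):                # concentrations, affinities fixed
            def column_loss(x, j=j):
                return sum(loss(a[i], x, pm[i][j], mm[i][j]) for i in range(n_probes))
            best = golden_min(column_loss, 0.0, c_hi)
            if column_loss(best) < column_loss(c[j]):
                c[j] = best
        for i in range(n_probes):               # affinities, concentrations fixed
            def row_loss(x, i=i):
                return sum(loss(x, c[j], pm[i][j], mm[i][j]) for j in range(n_chips))
            best = golden_min(row_loss, 0.0, a_hi)
            if row_loss(best) < row_loss(a[i]):
                a[i] = best
    return a, c

# Noise-free toy data (hypothetical values): PM = a*c + MM with MM = 50.
a_true = [0.5, 1.0, 2.0]
c_true = [0.0, 10.0, 100.0, 1000.0]
mm = [[50.0 for _ in c_true] for _ in a_true]
pm = [[ai * cj + 50.0 for cj in c_true] for ai in a_true]

a_fit, c_fit = polish(pm, mm)
start = total_loss([1.0] * 3, [0.0] * 4, pm, mm)
end = total_loss(a_fit, c_fit, pm, mm)
print(start, end)
```

Only the products a*c are determined by the data, so the fitted affinities and concentrations may differ from the true ones by a common scale factor; fixing that scale (for example by normalizing the affinities) is left out of this sketch.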
36. 10/11/04 Affymetrix How does it work on real data?
Gold-standard data generated by spiking in known transcripts
Example is one of the transcripts (6th)
Look at residuals to find outliers
37. 10/11/04 Affymetrix Latin Square Experimental Design
38. 10/11/04 Affymetrix Model fit: A=1.0, C=0.0
39. 10/11/04 Affymetrix Residuals: Fit Concentration
40. 10/11/04 Affymetrix Residuals: Fit Probes
41. 10/11/04 Affymetrix Fit concentration and affinities - data fits except for outliers [clearly revealed]
42. 10/11/04 Affymetrix What are the outliers?
43. 10/11/04 Affymetrix Final results (value)
44. 10/11/04 Affymetrix Know everything (approx)
45. 10/11/04 Affymetrix Trick: P/A calls by fit
46. 10/11/04 Affymetrix Trick: models are good for residuals!
47. 10/11/04 Affymetrix Optimization (harder to illustrate)
Current implementation uses descent optimization (Newton-like)
Start with a good initial guess (median polish)
Improve by descent
Try jumps to escape local minima
48. 10/11/04 Affymetrix Evaluating performance
MvA plots (unbiased/biased)
Receiver Operating Characteristic (ROC)
Area Under Curve (AUC) (global/stratified)
Benchmark results
49. 10/11/04 Affymetrix MvA plots
Scatterplot turned 45°
Plotting sample A vs sample B
M = log(A)-log(B) (difference)
A (average) = (log(A)+log(B))/2
Allows easy visualization of changes
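Computing the MvA coordinates takes only a few lines. A minimal sketch: the function name `mva` is made up for illustration, and base-2 logarithms are an assumption, since the slide does not specify the base.

```python
import math

def mva(a_values, b_values):
    """Return (M, A) coordinates for paired signals from samples A and B."""
    m_values, avg_values = [], []
    for a, b in zip(a_values, b_values):
        log_a, log_b = math.log2(a), math.log2(b)  # log base 2 is an assumption
        m_values.append(log_a - log_b)             # M: log ratio (difference)
        avg_values.append((log_a + log_b) / 2.0)   # A: average log signal
    return m_values, avg_values

m, avg = mva([8.0, 100.0], [2.0, 100.0])
print(m, avg)
```

Unchanged signals sit on the horizontal line M = 0, so departures from that line are easier to see than departures from the diagonal of an ordinary scatterplot.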
50. 10/11/04 Affymetrix MvA (bias added for stabilization)
51. 10/11/04 Affymetrix Receiver Operating Characteristic
ROC curves measure separation of distributions for two states
Changed or unchanged between pair(s) of experiments
Depends on the variation of the signal within an experiment, and the separation between the two states
Note that just measuring variation or just measuring separation can be misleading!
One popular method of defining "changed" is a fold-change threshold
ROC curves can be summarized by area under curve
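The area under an ROC curve equals the probability that a randomly chosen "changed" item scores higher than a randomly chosen "unchanged" one. A minimal sketch of that pairwise computation follows; the function name and the example scores are hypothetical.

```python
def roc_auc(changed_scores, unchanged_scores):
    """AUC as the fraction of (changed, unchanged) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for s in changed_scores:
        for u in unchanged_scores:
            if s > u:
                wins += 1.0
            elif s == u:
                wins += 0.5
    return wins / (len(changed_scores) * len(unchanged_scores))

# Hypothetical change scores (e.g. absolute log fold-change) for the two states.
print(roc_auc([3.0, 4.0], [1.0, 2.0]))       # perfectly separated
print(roc_auc([1.0, 2.0], [1.0, 2.0]))       # indistinguishable
print(roc_auc([0.9, 2.5, 3.1], [0.2, 1.0]))  # partially overlapping
```

An AUC of 1.0 means the two distributions are fully separated and 0.5 means the score carries no information, which is why both the spread and the separation of the signal matter.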
52. 10/11/04 Affymetrix Overall performance good (ROC)
53. 10/11/04 Affymetrix Specific performance regimes are of interest
Low, medium, high concentrations
Relatively small fold-changes (2-fold, 4-fold)
Thresholds defined by fold-change
Thresholds defined by change relative to variation (t-like statistic)
54. 10/11/04 Affymetrix
55. 10/11/04 Affymetrix Output characteristics of some standard methods
MAS 5.0: not variance stabilized(*), some bias, runs on single chips
PLIER: not variance stabilized(*), minimal bias, reduced variance, runs on multiple chips
RMA: variance stable, noticeable bias, low variance, runs on multiple chips
(*) [Can always apply stabilizing transformation]
56. 10/11/04 Affymetrix
57. 10/11/04 Affymetrix
58. 10/11/04 Affymetrix
59. 10/11/04 Affymetrix
60. 10/11/04 Affymetrix
61. 10/11/04 Affymetrix Works fine on U133 too
62. 10/11/04 Affymetrix Bonus: M-estimator tricks
Handle PM-only (PM+MM, PM-B) just fine by replacing error model in f
Play Bayesian games (affinity penalties, concentration penalties)
63. 10/11/04 Affymetrix M-estimator: PM only
PM-B = a*c
[background estimate perfect]
e*PM-B = a*c
e= (a*c+B)/PM
proceed in the same framework using e
Note that B can be zero for (a*c>0)
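In code, switching to the PM-only error model only changes how the residual is computed; the robust loss and the optimization stay the same. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
import math

def pm_only_residual(a, c, pm, b):
    """r = log(e) with e = (a*c + B) / PM; needs a*c + B > 0 and PM > 0."""
    return math.log((a * c + b) / pm)

# Hypothetical values: a perfect fit gives a zero residual.
print(pm_only_residual(2.0, 50.0, 150.0, 50.0))   # (100 + 50) / 150 = 1 -> 0.0
# With B = 0 the residual is still defined as long as a*c > 0.
print(pm_only_residual(2.0, 50.0, 150.0, 0.0))    # log(100 / 150)
```

The result can then be passed through the same f(r, z) as in the PM-MM case.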
64. 10/11/04 Affymetrix PM-only: global background biased
65. 10/11/04 Affymetrix Can play Bayesian games
Probe affinities likely to be log-normal distributed
Add a penalty term to avoid overweighting any single probe
Good when insufficient data
sum(log(e)^2) + (penalty)*sum(log(a)^2)
[Can do the same for concentration]
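The penalized objective is a direct sum of the fit term and the prior term. A minimal sketch with hypothetical names; note that log(a) requires strictly positive affinities, so this form assumes a > 0.

```python
import math

def penalized_objective(residuals, affinities, penalty):
    """sum(log(e)^2) + penalty * sum(log(a)^2), with residuals r = log(e)."""
    fit_term = sum(r * r for r in residuals)
    prior_term = penalty * sum(math.log(a) ** 2 for a in affinities)
    return fit_term + prior_term

# Affinities of 1.0 incur no penalty; affinities far from 1.0 are discouraged.
print(penalized_objective([0.1, -0.2], [1.0, 1.0], penalty=0.5))
print(penalized_objective([0.1, -0.2], [1.0, 8.0], penalty=0.5))
```

A larger penalty pulls the affinities toward 1, which matters most when there are too few chips to estimate each probe's affinity reliably.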
66. 10/11/04 Affymetrix Bayesian prior on probes
67. 10/11/04 Affymetrix PLIER
M-estimators form a very flexible framework for analysis
Can handle PM-B, PM-MM, PM-only approaches in same framework
Handles zero/near-zero concentration & affinities in model directly
Seems to produce good results
68. 10/11/04 Affymetrix PLIER: obtaining an implementation
The PLIER algorithm SDK is now available under a GPL open source license.
The code is available as C++ without Windows dependencies. Documentation is included at the site.
All of us at Affymetrix hope that releasing PLIER in this manner promotes all of the values that the Bioconductor community embraces.
http://www.affymetrix.com/support/developer/index.affx
69. 10/11/04 Affymetrix Thanks
David Kulp
Sejal Shah
Simon Cawley
David Finkelstein
Mike Lelivelt
Teresa Webster
Rui Mei
Suzanne Dee
Stefan Bekiranov
Xiaojun Di
Alex Cheung
Steve Lincoln
Many, many others!