Designing M-estimators for expression analysis: PLIER

Presentation Transcript


    1. 10/11/04 Affymetrix Designing M-estimators for expression analysis: PLIER. Earl Hubbell, Principal Statistician, Affymetrix

    2. Outline: drive our intuition (basic data); formalize the intuition; check functionality; look at results; bonus tricks and stunts.

    3. Wafers, Chips, and Features: We talk a lot about "wafers", "chips", and "features"; this is a graphical representation. We have adapted photolithography, a technology from the semiconductor industry, to the life sciences, and we manufacture on wafers. These wafers are 5-inch-by-5-inch pieces of glass. In our whole-genome products we get 49 individual chips out of each wafer; each chip carries over 1,300,000 unique features, and each feature holds millions of identical DNA probes.

    4. Expression Probes

    5. Components of Stray Signal

    6. Components of Bound Target Signal and Noise

    7. Hybridization is mostly linear, with some stray signal & saturation

    8. One probe (pair): PM-MM* reduces bias

    9. Probes not very informative about concentration near background!

    10. Probes have systematic differences

    11. “Affinity” compensates for first-order probe differences

    12. “Likelihood” summarizes knowledge of expression

    13. A pause before jumping into equations: “… statistics, whatever their mathematical sophistication and elegance, cannot make bad variables into good ones.” (H.T. Reynolds, “Analysis of Nominal Data”)

    14. Fun with Statistics. Money: what should I estimate? M-estimators: statistics by optimization. Model: linking intensity to concentration. Mismatches: faking subtraction. Mayhem: does it work? More: tricks!

    15. Estimator Goals: handle zero/near-zero concentrations; handle “arithmetic” noise at the low end; minimum bias (avoid sample trouble) [can always variance-stabilize later]; resist outliers; avoid lots of parameters!

    16. How to estimate? (-5+373+473)/3 = 280.3 (“mean”), and 280.3 is the value minimizing (x+5)^2+(x-373)^2+(x-473)^2. median(-5,373,473) = 373, and 373 is the value minimizing |x+5|+|x-373|+|x-473|.
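
A quick numerical check of these claims, as a minimal Python sketch: scan candidate values of x on a fine grid (the range and the 0.01 step are arbitrary choices) and keep the one that minimizes each criterion.

```python
import statistics

data = [-5, 373, 473]

def sum_of_squares(x):
    # Least-squares criterion: (x+5)^2 + (x-373)^2 + (x-473)^2
    return sum((x - d) ** 2 for d in data)

def sum_of_abs(x):
    # Least-absolute-deviation criterion: |x+5| + |x-373| + |x-473|
    return sum(abs(x - d) for d in data)

# Candidate values from -10.00 to 600.00 in steps of 0.01
grid = [i / 100 for i in range(-1000, 60001)]

best_sq = min(grid, key=sum_of_squares)  # grid point closest to the mean
best_abs = min(grid, key=sum_of_abs)     # the median itself

print(statistics.mean(data), best_sq)
print(statistics.median(data), best_abs)
```

Swapping the criterion is the whole trick: the same "minimize a sum" machinery yields the mean or the median.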

    17. M-estimator: choose y to optimize some function of the data, sum_i f(y, x_i). The optimal y is then an estimate of some interesting property of the data (we hope). This looks like a “maximum likelihood” estimate (but f can be tuned for utility).

    18. Designing the M-estimator: the PLIER M-estimator minimizes some function of the data and the estimator(s). Our case: sum( f(PM, MM, a, c, z) ). Choose f to model “reasonable” error; choose the tail of f to handle outliers. PLIER: “Probe Logarithmic Intensity ERror”.

    19. Assumptions (approximations?): concentration is never negative (c >= 0); linear link between true signal and concentration (T ~ a*c); background (not constant) adds to signal (I ~ T+B); background is the same for PM and MM; multiplicative intensity error (log(I) ~ normal(log(T+B), s^2)).
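
To make these assumptions concrete, here is a minimal simulation sketch of one probe pair. It borrows the MM response a2*c from the simplified model on slide 22; the function name and all numeric values (a = 2.0, a2 = 0.3, B = 100, log-scale error s = 0.1) are hypothetical. With the transcript absent (c = 0), every simulated intensity is positive, yet PM - MM still comes out negative about half the time.

```python
import math
import random

def simulate_pair(a, a2, c, background, s, rng):
    """Draw one (PM, MM) pair: intensity = (true signal + background) * log-normal error."""
    pm = (a * c + background) * math.exp(rng.gauss(0.0, s))   # T = a*c for the PM probe
    mm = (a2 * c + background) * math.exp(rng.gauss(0.0, s))  # MM sees less target (a2 < a)
    return pm, mm

rng = random.Random(2004)  # fixed seed for reproducibility
# Transcript absent (c = 0): PM and MM share the same expected intensity, B
absent = [simulate_pair(2.0, 0.3, 0.0, 100.0, 0.1, rng) for _ in range(1000)]
diffs = [pm - mm for pm, mm in absent]
print(min(min(pm, mm) for pm, mm in absent) > 0)  # True: raw intensities stay positive
print(sum(d < 0 for d in diffs))                  # roughly half of PM - MM values are negative
```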

    20. Assumption: Multiplicative Error. It is widely agreed that replicate observations of probes (PM, MM) are approximately log-normal, e.g. PM varies by 10% of PM. This does not imply that derived quantities (PM-MM or PM-B) are also log-normal! E.g. when PM ≈ MM, PM-MM varies by ~7% of (PM+MM), not by 10% of (PM-MM)!
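
The ~7% figure follows from adding variances: with independent 10% errors, Var(PM-MM) = (0.1*PM)^2 + (0.1*MM)^2, so when PM = MM the standard deviation of PM-MM is 0.1*sqrt(2)*PM = (0.1/sqrt(2))*(PM+MM), about 0.0707*(PM+MM). The sketch below checks this against a small Monte Carlo simulation with log-normal errors; the intensity level (1000), the seed, and the helper name are arbitrary.

```python
import math
import random
import statistics

def sd_of_difference(pm, mm, cv=0.10):
    # Independent errors with coefficient of variation cv on each probe:
    # Var(PM - MM) = (cv*PM)^2 + (cv*MM)^2
    return cv * math.sqrt(pm ** 2 + mm ** 2)

pm = mm = 1000.0
analytic = sd_of_difference(pm, mm) / (pm + mm)  # 0.1/sqrt(2), about 0.0707

# Monte Carlo check: log-normal errors with log-scale sd 0.1 (CV about 10%)
rng = random.Random(11)
draws = [pm * math.exp(rng.gauss(0, 0.1)) - mm * math.exp(rng.gauss(0, 0.1))
         for _ in range(100_000)]
simulated = statistics.stdev(draws) / (pm + mm)
print(round(analytic, 4), round(simulated, 4))
```

Here 10% of (PM-MM) would be zero, while the actual spread is about 141 intensity units.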

    21. No obvious need for arithmetic noise for raw intensities

    22. Simplified model [PM-MM]: PM = a*c + MM and MM = a2*c + B. If B can vary wildly (experiment to experiment, probe to probe), we are left with PM - MM = a*c. Incorporating multiplicative error: e1*PM - e2*MM = a*c.

    23. Key concept: good fits have small multiplicative errors. We are trying to minimize log(e1)^2 + log(e2)^2. The actual minimum is a complicated function, so I don’t want to solve for it, and we don’t have to: M-estimators can be chosen for computational convenience. Therefore, let log(e1)^2 = log(e2)^2.

    24. How good is the fit? Two possibilities: log(e1) = log(e2) (“log transform”) has no solution for MM > PM and always fits worse than log(e1) = -log(e2) (“PLIER”), which gives e = [a*c + sqrt((a*c)^2 + 4*PM*MM)]/(2*PM). log(e) exists for any PM, MM > 0 and any a, c, and the effective error model changes from “arithmetic” near zero to “multiplicative” far from zero.
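
A minimal Python sketch of the two error-factor choices (function names are hypothetical). The PLIER form takes e1 = e and e2 = 1/e, so e*PM - MM/e = a*c, i.e. PM*e^2 - (a*c)*e - MM = 0, whose positive root is the formula above; the denominator is the whole product 2*PM.

```python
import math

def plier_e(pm, mm, y):
    """PLIER error factor: positive root of e*PM - MM/e = y, where y = a*c."""
    return (y + math.sqrt(y * y + 4.0 * pm * mm)) / (2.0 * pm)

def log_transform_e(pm, mm, y):
    """Alternative with e1 = e2 = e, i.e. e*(PM - MM) = y; fails when PM == MM, negative when MM > PM."""
    return y / (pm - mm)

print(plier_e(1000.0, 200.0, 800.0))         # 1.0: a perfect fit, so r = log(e) = 0
e = plier_e(200.0, 1000.0, 0.0)              # MM > PM still yields e > 0 (here sqrt(5))
print(e, e * 200.0 - 1000.0 / e)             # second value is ~0: the defining equation holds
print(log_transform_e(200.0, 1000.0, 50.0))  # -0.0625: log(e) is undefined
```

Algebraically, log(e) = asinh(a*c/(2*sqrt(PM*MM))) + log(MM/PM)/2. Since asinh(x) is close to x for small x and close to log(2x) for large x, the residual behaves arithmetically near zero concentration and multiplicatively far from it.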

    25. PM “-” MM Goodness of fit

    26. Define center of f: residual r = log(e). Under the log-normal assumption, fit for least r^2. But we should fix the tails (where outliers show up and the approximation breaks down).

    27. Robustness: we want to “discount” outliers compared to sum-of-squares. Off the shelf: the Geman-McClure transformation f(r,z) = r^2/(1+r^2/z). It looks like least squares for small r and bounds the contribution of any single residual to at most z.
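
A small Python sketch of the Geman-McClure transformation and its derivative (the influence function, up to a constant); the value z = 1.0 and the function names are arbitrary choices. The derivative 2r/(1+r^2/z)^2 peaks at |r| = sqrt(z/3) and then falls back toward zero, so gross outliers barely move the fit.

```python
import math

def geman_mcclure(r, z):
    """Robust loss f(r, z) = r^2 / (1 + r^2/z): ~r^2 for small r, approaches z for large |r|."""
    return r * r / (1.0 + r * r / z)

def influence(r, z):
    """Derivative df/dr = 2r / (1 + r^2/z)^2; it redescends toward 0 for large |r|."""
    return 2.0 * r / (1.0 + r * r / z) ** 2

z = 1.0
for r in (0.01, 0.5, 2.0, 10.0):
    # Compare plain least squares (r^2) with the bounded loss and its influence
    print(r, r * r, geman_mcclure(r, z), influence(r, z))
```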

    28. Transformation f(r) and its Influence Function

    30. PLIER “on a t-shirt”: y = a*c; e = [y + sqrt(y^2 + 4*PM*MM)]/(2*PM); r = log(e); f(r,z) = r^2/(1+r^2/z). argmin(sum(f(r,z))) over all a, c >= 0 yields the PLIER estimates of affinity and concentration.
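
The “t-shirt” objective translates almost line by line into code. A minimal Python sketch, assuming the data are stored as probe-by-chip tables pm[i][j] and mm[i][j] with one affinity a[i] per probe and one concentration c[j] per chip (this layout, the function names, z = 0.5, and the toy numbers are assumptions, not the Affymetrix SDK):

```python
import math

def plier_residual(pm, mm, y):
    # r = log(e), with e the positive root of e*PM - MM/e = y
    return math.log((y + math.sqrt(y * y + 4.0 * pm * mm)) / (2.0 * pm))

def geman_mcclure(r, z):
    return r * r / (1.0 + r * r / z)

def plier_objective(pm, mm, a, c, z):
    """Sum of f(r, z) over the table: pm[i][j], mm[i][j], affinity a[i], concentration c[j]."""
    if min(a) < 0 or min(c) < 0:
        raise ValueError("affinities and concentrations must be non-negative")
    return sum(
        geman_mcclure(plier_residual(pm[i][j], mm[i][j], a[i] * c[j]), z)
        for i in range(len(a))
        for j in range(len(c))
    )

# Hypothetical 2-probe x 2-chip table generated exactly from PM = a*c + MM
a_true, c_true = [1.0, 3.0], [100.0, 400.0]
mm = [[150.0, 160.0], [140.0, 170.0]]
pm = [[a_true[i] * c_true[j] + mm[i][j] for j in range(2)] for i in range(2)]
print(plier_objective(pm, mm, a_true, c_true, z=0.5))              # 0.0 at the true values
print(plier_objective(pm, mm, [1.0, 1.0], [100.0, 400.0], z=0.5))  # > 0 for a wrong affinity
```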

    31. Optimizing (finding minima): there are many ways to find the best fit. The easiest to explain is cyclic coordinate descent, aka “polishing” the data. We can start anywhere (but it is best to start with a good guess).

    32. Finding affinity/concentration [Don’t I need to know one to start?]

    33. Observed PM/MM values

    34. Compare observed to predicted (find where to improve predictions)

    35. “Polishing” the table: guess initial values (a = 1.0, c = 0); find the best concentrations (with current affinities); find the best affinities (for current concentrations); repeat until minimized (or bored) [remember: values are non-negative!]
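
A minimal sketch of the polishing loop in Python, under stated assumptions: each one-dimensional step is solved by a coarse grid scan followed by golden-section refinement (a deliberately naive choice; slide 47 describes the actual implementation as Newton-like descent from a median-polish start), z = 10 and 30 sweeps are arbitrary, and all function names are hypothetical. The search interval for each value is safe because once a*c exceeds PM - MM in every cell involved, every residual is positive and growing.

```python
import math

def plier_residual(pm, mm, y):
    # r = log(e), where e is the positive root of e*PM - MM/e = y and y = a*c
    return math.log((y + math.sqrt(y * y + 4.0 * pm * mm)) / (2.0 * pm))

def geman_mcclure(r, z):
    # Robust loss: ~r^2 for small residuals, bounded by z for large ones
    return r * r / (1.0 + r * r / z)

def minimize_1d(fn, hi, n_grid=100, tol=1e-12, max_iter=200):
    """Minimize fn on [0, hi]: coarse grid scan, then golden-section refinement."""
    xs = [hi * k / n_grid for k in range(n_grid + 1)]
    k = min(range(n_grid + 1), key=lambda m: fn(xs[m]))
    lo, hi = xs[max(k - 1, 0)], xs[min(k + 1, n_grid)]
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = hi - invphi * (hi - lo), lo + invphi * (hi - lo)
    f1, f2 = fn(x1), fn(x2)
    for _ in range(max_iter):
        if hi - lo <= tol * (1.0 + abs(lo) + abs(hi)):
            break
        if f1 <= f2:  # minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - invphi * (hi - lo)
            f1 = fn(x1)
        else:  # minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + invphi * (hi - lo)
            f2 = fn(x2)
    return (lo + hi) / 2.0

def plier_polish(pm, mm, z=10.0, sweeps=30):
    """Cyclic coordinate descent on sum_ij f(r_ij, z) with a, c >= 0."""
    n_probes, n_chips = len(pm), len(pm[0])
    a = [1.0] * n_probes  # initial guesses from the slide: a = 1.0, c = 0
    c = [0.0] * n_chips
    for _ in range(sweeps):
        for j in range(n_chips):  # best concentrations for current affinities
            hi = 2.0 * max((pm[i][j] / a[i] for i in range(n_probes) if a[i] > 0), default=0.0)
            c[j] = minimize_1d(lambda cj: sum(
                geman_mcclure(plier_residual(pm[i][j], mm[i][j], a[i] * cj), z)
                for i in range(n_probes)), hi)
        for i in range(n_probes):  # best affinities for current concentrations
            hi = 2.0 * max((pm[i][j] / c[j] for j in range(n_chips) if c[j] > 0), default=0.0)
            a[i] = minimize_1d(lambda ai: sum(
                geman_mcclure(plier_residual(pm[i][j], mm[i][j], ai * c[j]), z)
                for j in range(n_chips)), hi)
    return a, c

# Hypothetical noise-free data generated from PM = a*c + MM, MM = B + a2*c
a_true = [0.5, 1.0, 1.5, 2.0]
c_true = [20.0, 50.0, 200.0, 800.0]
mm = [[100.0 + 0.2 * at * ct for ct in c_true] for at in a_true]
pm = [[at * ct + mm_ij for ct, mm_ij in zip(c_true, row)] for at, row in zip(a_true, mm)]
a_hat, c_hat = plier_polish(pm, mm)
# a and c are identified only up to a common scale factor, so compare products a*c
for at, ah in zip(a_true, a_hat):
    print([round(ah * ch, 1) for ch in c_hat], [at * ct for ct in c_true])
```

Because the loss redescends, the grid scan before each golden-section step guards against settling in a poor local minimum of the one-dimensional problem; on real data with outliers, a better starting point (as on slide 47) matters more.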

    36. How does it work on real data? Gold-standard data were generated by spiking in known transcripts. The example is one of those transcripts (the 6th). Look at the residuals to find outliers.

    37. Latin Square Experimental Design

    38. Model fit: A = 1.0, C = 0.0

    39. Residuals: Fit Concentration

    40. Residuals: Fit Probes

    41. Fit concentrations and affinities: the data fit except for outliers [clearly revealed]

    42. What are the outliers?

    43. Final results (value)

    44. Know everything (approx)

    45. Trick: P/A calls by fit

    46. Trick: models are good for residuals!

    47. Optimization (harder to illustrate): the current implementation uses descent optimization (Newton-like). Start with a good initial guess (median polish), improve by descent, and try jumps to escape local minima.

    48. Evaluating performance: MvA plots (unbiased/biased); Receiver Operating Characteristic (ROC); Area Under Curve (AUC) (global/stratified); benchmark results.

    49. MvA plots: a scatterplot rotated 45 degrees. Comparing samples X and Y: M = log(X) - log(Y) and A = (log(X) + log(Y))/2 (“average”). This allows easy visualization of changes.
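
A minimal Python sketch of the M and A computation for two samples x and y; base-2 logs are an assumption here (a common convention, so that M = 1 means a 2-fold change), and the function name and intensities are made up.

```python
import math

def mva(x, y):
    """Return (M, A) lists for paired positive intensities x[k], y[k] (log base 2 assumed)."""
    m_vals = [math.log2(xk) - math.log2(yk) for xk, yk in zip(x, y)]
    a_vals = [(math.log2(xk) + math.log2(yk)) / 2.0 for xk, yk in zip(x, y)]
    return m_vals, a_vals

# Made-up intensities: 2-fold up, unchanged, 4-fold up
m_vals, a_vals = mva([200.0, 50.0, 1000.0], [100.0, 50.0, 250.0])
print(m_vals)  # approximately [1.0, 0.0, 2.0]
print(a_vals)
```

Plotting M against A puts unchanged items along the horizontal line M = 0 at every intensity level, which is what makes changes easy to spot.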

    50. MvA (bias added for stabilization)

    51. Receiver Operating Characteristic: ROC curves measure the separation of the distributions for two states, “changed” or “unchanged” between pair(s) of experiments. This separation depends on the variation of the signal within an experiment and on the distance between the two states; note that measuring just the variation or just the distance can be misleading! One popular way of defining “changed” is a fold-change threshold. ROC curves can be summarized by the area under the curve (AUC).
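
The AUC has a convenient rank interpretation: it equals the probability that a randomly chosen “changed” item scores higher than a randomly chosen “unchanged” one, with ties counted as one half (the Mann-Whitney statistic). A minimal Python sketch, using made-up |M| values as the score and a hypothetical function name:

```python
def roc_auc(changed_scores, unchanged_scores):
    """AUC as P(changed score > unchanged score), counting ties as 1/2."""
    wins = 0.0
    for s in changed_scores:
        for t in unchanged_scores:
            if s > t:
                wins += 1.0
            elif s == t:
                wins += 0.5
    return wins / (len(changed_scores) * len(unchanged_scores))

# Made-up |M| (absolute log2 fold change) for spiked (changed) vs. unchanged probe sets
changed = [1.1, 0.9, 2.0, 0.4]
unchanged = [0.1, 0.3, 0.05, 0.5, 0.2]
print(roc_auc(changed, unchanged))  # 0.95
```

Only the single pair (0.4 vs. 0.5) is misordered, so the AUC is 19/20.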

    52. Overall performance good (ROC)

    53. Specific performance regimes are of interest: low, medium, and high concentrations; relatively small fold-changes (2-fold, 4-fold); thresholds defined by fold-change; thresholds defined by change relative to variation (“t-like statistic”).

    55. Output characteristics of some standard methods. MAS 5.0: not variance-stabilized(*), some bias, runs on single chips. PLIER: not variance-stabilized(*), minimal bias, reduced variance, runs on multiple chips. RMA: variance-stable, noticeable bias, low variance, runs on multiple chips. (*)[Can always apply a stabilizing transformation]

    61. Works fine on U133 too

    62. Bonus: M-estimator tricks. Handle PM-only (PM+MM, PM-B) just fine by replacing the error model in f; play Bayesian games (affinity penalties, concentration penalties).

    63. M-estimator: PM only. PM-B = a*c [background estimate “perfect”]; with error, e*PM - B = a*c, so e = (a*c+B)/PM. Proceed in the same framework using e. Note that B can be zero (for a*c > 0).
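
In code, the PM-only variant only swaps the error factor; the residual r = log(e) can then go through the same Geman-McClure f. A minimal Python sketch with a hypothetical function name; how the background B is estimated is left outside the sketch.

```python
import math

def pm_only_residual(pm, background, y):
    """PM-only error factor from e*PM - B = y (y = a*c): e = (y + B)/PM, residual r = log(e)."""
    e = (y + background) / pm
    return math.log(e)  # defined whenever y + B > 0, so B may be 0 if y > 0

print(pm_only_residual(600.0, 100.0, 500.0))  # 0.0: perfect fit
print(pm_only_residual(600.0, 0.0, 300.0))    # log(0.5): the model under-predicts PM
```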

    64. PM-only: global background biased

    65. Can play “Bayesian” games: probe affinities are likely to be “log-normal” distributed, so add a penalty term to avoid overweighting any single probe (good when there is insufficient data): sum(log(e)^2) + (penalty)*sum(log(a)^2) [can do the same for concentration].

    66. Bayesian prior on probes

    67. PLIER M-estimators form a very flexible framework for analysis: they can handle PM-B, PM-MM, and PM-only approaches in the same framework, handle zero/near-zero concentrations and affinities directly in the model, and seem to produce good results.

    68. PLIER: obtaining an implementation. The PLIER algorithm SDK is now available under a GPL open-source license. The code is available as C++ without Windows dependencies, and documentation is included at the site. All of us at Affymetrix hope that releasing PLIER in this manner promotes all of the values that the Bioconductor community embraces. http://www.affymetrix.com/support/developer/index.affx

    69. Thanks: David Kulp, Sejal Shah, Simon Cawley, David Finkelstein, Mike Lelivelt, Teresa Webster, Rui Mei, Suzanne Dee, Stefan Bekiranov, Xiaojun Di, Alex Cheung, Steve Lincoln, and many, many others!
