Challenges In Progressing Biomarkers To Clinical Use Proteomic Experiences

Challenges In Progressing Biomarkers To Clinical Use Proteomic Experiences. Chris Harbron Technical Lead For High Dimensional Data AstraZeneca FDA Industry Statistics Workshop September 2006. Gap Between Published Biomarkers And Biomarkers Being Approved For Use.


Presentation Transcript


  1. Challenges In Progressing Biomarkers To Clinical Use: Proteomic Experiences. Chris Harbron, Technical Lead For High Dimensional Data, AstraZeneca. FDA Industry Statistics Workshop, September 2006

  2. Gap Between Published Biomarkers And Biomarkers Being Approved For Use

  3. Why Might This Be? Challenges
     • Pressures from the contextual environment
     • High quality data is essential
     • These are new technologies, not simple to use or analyse
     • Robust study design, including:
       • Consistent sample collection and processing
       • Need to understand reproducibility between and within labs, and within subjects
     • Failure leads to poor data quality, frequently dominated by nuisance factors
     • Rigorous validation is also essential
     • Occurs at many levels
     • Avoid overfitting data
     • Omics may not do it alone
     • Applications will require combining -omics with other data types

  4. Example: Case-Control Study
     • Interest in identifying a peptidomic profile that could predict an adverse event
     • Potential use as a personalised medicine predictive marker
     • Blood samples taken from subjects at start of treatment
     • Subjects monitored for adverse event using a rigorous definition
     • Subjects entered in cohorts
     • Samples processed in batches within cohorts
     • Analysed on an LC-MS/MS platform

  5. LC-MS/MS Proteomics [Figure: workflow. Clinical plasma samples undergo preparation and digestion to give peptides. Liquid chromatography separates peptides by retention time. Mass spectrometry separates by mass/charge ratio and measures ion intensity. MS/MS fragment spectra (relative abundance against m/z, roughly 200 to 1400) are used for protein identification.]

  6. Distribution Of Average Intensities
     • ~5,500,000 RT / MZ / intensity measurements per sample
     • Pre-processing: alignment of retention times, scaling, binning
     • ~25,000 common peaks per sample
     [Figure: average intensity, from low to high, plotted over retention time and mass-charge ratio]
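The slide names scaling and binning as pre-processing steps without giving details. The following is a minimal sketch of what such steps could look like, not the procedure used in the study. The function names, the bin widths, and the choice of median scaling are all assumptions made for illustration.

```python
import numpy as np

def bin_measurements(rt, mz, intensity, rt_width=1.0, mz_width=1.0):
    """Sum intensities that fall in the same (retention time, m/z) bin.

    rt_width and mz_width are hypothetical bin widths, not values from the slides.
    """
    rt_bin = np.floor(np.asarray(rt) / rt_width).astype(int)
    mz_bin = np.floor(np.asarray(mz) / mz_width).astype(int)
    binned = {}
    for key, value in zip(zip(rt_bin.tolist(), mz_bin.tolist()), intensity):
        binned[key] = binned.get(key, 0.0) + float(value)
    return binned

def scale_to_median(binned):
    """Divide every binned intensity by the sample median (assumed scaling rule)."""
    median = float(np.median(list(binned.values())))
    return {key: value / median for key, value in binned.items()}
```

Binning reduces millions of raw measurements to a grid of cells that can be matched across samples, which is one way a set of common peaks per sample could be obtained.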

  7. Proteomic Data: Exploratory Analysis - PCA
     • Considerable batch-to-batch variation
     [Figure: PCA plot. Legend: Control, Case, Non-Index Case; Cohort 1, Cohort 2, Cohort 3, Cohort 4]

  8. Proteomic Data: Exploratory Analysis - PCA. Within all batches with both cases and controls, there is separation of cases and controls

  9. Univariate Analyses Within Batches: Histograms Of t-Test p-Values

  10. Global Test Of Agreement Between Batches Using A Permutation Test
     • Identify peaks where direction of effect agrees in all 3 batches
     • Summarise by maximum p-value
     • Global test of expected level due to multiple testing by permutation
     [Figure: observed versus permuted]
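The three steps on this slide can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the analysis code from the study. To stay dependency-light it scores each peak by the smallest absolute Welch t statistic across batches, used here as a stand-in for the maximum p-value (the two orderings only coincide when the batches have similar group sizes). The function names, the `threshold` of 2.0, and the choice to permute case/control labels within each batch are assumptions.

```python
import numpy as np

def welch_t(x, labels):
    """Welch t statistic per peak (column): cases (1) versus controls (0)."""
    cases, controls = x[labels == 1], x[labels == 0]
    se = np.sqrt(cases.var(axis=0, ddof=1) / len(cases)
                 + controls.var(axis=0, ddof=1) / len(controls))
    return (cases.mean(axis=0) - controls.mean(axis=0)) / se

def consistent_peak_score(x, labels, batches):
    """Smallest |t| across batches where the direction agrees in all, else 0."""
    t = np.array([welch_t(x[batches == b], labels[batches == b])
                  for b in np.unique(batches)])
    same_direction = np.all(t > 0, axis=0) | np.all(t < 0, axis=0)
    return np.where(same_direction, np.abs(t).min(axis=0), 0.0)

def global_agreement_test(x, labels, batches, threshold=2.0, n_perm=200, seed=0):
    """Compare the observed count of consistent peaks with a permutation null."""
    rng = np.random.default_rng(seed)
    observed = int((consistent_peak_score(x, labels, batches) > threshold).sum())
    null_counts = []
    for _ in range(n_perm):
        permuted = labels.copy()
        for b in np.unique(batches):
            idx = np.where(batches == b)[0]
            # Shuffle labels inside the batch so batch structure is preserved.
            permuted[idx] = rng.permutation(labels[idx])
        score = consistent_peak_score(x, permuted, batches)
        null_counts.append(int((score > threshold).sum()))
    null_counts = np.array(null_counts)
    p_global = (1 + int((null_counts >= observed).sum())) / (n_perm + 1)
    return observed, null_counts, p_global
```

Permuting within batches keeps the batch-to-batch variation intact, so the null distribution reflects how many peaks would appear to agree across batches through multiple testing alone.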

  11. Typical Highly Significant Peak
     • Within each batch, cases are highly expressed compared to controls
     • Not possible to define a global cut-off between cases and controls
     [Figure: intensity by batch. Legend: Case, Control, NIC]

  12. Multivariate Analyses
     • Identified consistent effect
     • BUT, may be difficult to use as a predictive biomarker in a clinical setting due to batch variation
     • Would a combination of markers, a peptidomic profile, work as a predictive biomarker?
     • Use Random Forests to generate multivariate predictive models
     • Assess predictive power using a nested cross-validation
     • Within and between batch prediction

  13. Modelling Process [Flowchart: The data consist of control-only batches and mixed case-control batches. Batches are excluded in turn, and observations are excluded by leave-one-out (LOO), to form a training set and a test set. Within the training set: analyse each peak within each batch, take the maximum p-value for each peak, rank peaks by p-value, and build a model with the top n peaks (n = number of peaks). The model is then tested in the test set.]
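The key property of this process is that peak ranking is repeated inside every training set, so the held-out data never influence which peaks enter the model. The sketch below illustrates only the batch-exclusion loop, under several assumptions: it ranks peaks by the smallest within-batch |t| as a stand-in for the maximum p-value, and it uses a simple nearest-centroid rule in place of Random Forests. All function names are hypothetical. A random forest implementation, such as scikit-learn's `RandomForestClassifier`, could be wrapped to match the `predict` signature.

```python
import numpy as np

def rank_peaks(x, y, batches):
    """Order peaks by the weakest within-batch |t| (analogue of max p-value)."""
    scores = []
    for b in np.unique(batches):
        xb, yb = x[batches == b], y[batches == b]
        cases, controls = xb[yb == 1], xb[yb == 0]
        if len(cases) < 2 or len(controls) < 2:
            continue  # control-only batches carry no case-control contrast
        se = np.sqrt(cases.var(axis=0, ddof=1) / len(cases)
                     + controls.var(axis=0, ddof=1) / len(controls))
        scores.append(np.abs(cases.mean(axis=0) - controls.mean(axis=0)) / se)
    # Assumes at least one mixed case-control batch remains in the training set.
    return np.argsort(-np.min(scores, axis=0))

def nearest_centroid_predict(x_train, y_train, x_test):
    """Stand-in classifier: assign each test row to the closer class mean."""
    mu_case = x_train[y_train == 1].mean(axis=0)
    mu_control = x_train[y_train == 0].mean(axis=0)
    d_case = ((x_test - mu_case) ** 2).sum(axis=1)
    d_control = ((x_test - mu_control) ** 2).sum(axis=1)
    return (d_case < d_control).astype(int)

def leave_one_batch_out(x, y, batches, n_top=10, predict=nearest_centroid_predict):
    """Exclude each batch in turn; select peaks using the training batches only."""
    predictions = np.empty_like(y)
    for held_out in np.unique(batches):
        test = batches == held_out
        train = ~test
        top = rank_peaks(x[train], y[train], batches[train])[:n_top]
        predictions[test] = predict(x[train][:, top], y[train], x[test][:, top])
    return predictions
```

The leave-one-out step for within-batch prediction would follow the same pattern, with a single observation held out instead of a whole batch.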

  14. Leave One Out Cross Validation: Proteomic Model Predictions [Figure legend: Leave One Out Training Set Batches - Cases; Leave One Out Training Set Batches - Controls; Other Mixed Batch - Cases; Other Mixed Batch - Controls; Other Batches - Controls]

  15. Mask Data By Restricting To High Quality Regions Of Proteomic Space
     • TECHNICALLY: region of focus for instrument
     • EMPIRICALLY: lowest residual variability, highest average intensity
     [Figure axes: mass-charge ratio, retention time]
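The empirical criteria on this slide could be applied along the following lines. This is a hedged sketch only: the slides do not say how residual variability was computed or where the cut-offs were set. Here residuals are taken after removing each group's mean (for example, each batch), and the quantile cut-offs and the function name `quality_mask` are assumptions.

```python
import numpy as np

def quality_mask(x, groups, intensity_quantile=0.5, variability_quantile=0.5):
    """Flag peaks (columns) with high average intensity and low residual spread.

    groups labels the rows (for example by batch); both quantiles are
    hypothetical cut-offs chosen for illustration.
    """
    x = np.asarray(x, dtype=float)
    mean_intensity = x.mean(axis=0)
    residuals = x.copy()
    for g in np.unique(groups):
        rows = groups == g
        residuals[rows] -= x[rows].mean(axis=0)  # remove the group mean per peak
    residual_sd = residuals.std(axis=0, ddof=1)
    bright = mean_intensity >= np.quantile(mean_intensity, intensity_quantile)
    stable = residual_sd <= np.quantile(residual_sd, variability_quantile)
    return bright & stable
```

Columns where the mask is `True` are the peaks retained for downstream analysis; the technical criterion (the instrument's region of focus) would be applied as a separate restriction on retention time and mass-charge ratio.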

  16. Analysis Of Unmasked Peaks • Batch Effects Still Dominate • Consistent Case-Control Effect Can Identify Peaks Separating Cases & Controls Across Batches

  17. Cross-Validation Predictions: Unmasked Peaks
     • Good Predictions Within Same Batch
     • Prediction Rate Falls When Extrapolated To Other Batches
     • Need To Prospectively Test In Another Set Of Patients
     [Figure legend: Leave One Out Same Batch - Cases; Leave One Out Same Batch - Controls; Other Mixed Batch - Cases; Other Mixed Batch - Controls; Other Batches - Controls]

  18. How To Combine Other Non-omic Information Into A Biomarker?
     • Combining different data types is challenging
     • The "bigger" data type will dominate the modelling
     • Greater signal in data, but doesn't extrapolate as well
     • Exploring options turning the random part of random forests to our advantage
     [Figure labels: Known Clinical Prognostic; Proteomic Peaks]

  19. Proteomic Quality Control Consortium?
     • MAQC recently reported a reproducibility study for microarrays
     • Wealth of valuable information
     • Mammoth effort
     • Could we do the same for proteomics?
     • Less mature technology
     • Greater diversity of platforms
     • Diversity of pre-processing methodologies
     • Issues of identification making large scale comparisons challenging

  20. Conclusions
     • Complicated new technologies
     • Many challenges
     • Technical, Data Quality, Data Analysis, Practical
     • Essential role for statistics
     • Need to integrate statistical approaches with understanding of technologies and biology
     • Great potential
     • Better treatments for patients
     • Improved use of compounds
     • Greater biological understanding