
Designing a high quality metabolomics experiment


Presentation Transcript


  1. Designing a high quality metabolomics experiment Grier P Page, Ph.D., Senior Statistical Geneticist, RTI International Atlanta Office, gpage@rti.org, 770-407-4907

  2. Metabolomics is Powerful and Central

  3. Designing a good study

  4. Errors Errors Everywhere

  5. UMSA Analysis (figure: UMSA plots for Day 1 and Day 2, Insulin Resistant vs. Insulin Sensitive)

  6. Primary consideration of good experimental design • Understand the strengths and weaknesses of each step of the experiments. • Take these strengths and weaknesses into account in your design.

  7. From Drug Discov Today. 2005 Sep 1;10(17):1175-82.

  8. State the Question and Articulate the Goals

  9. The Myth That Metabolomics does not need a Hypothesis • There always needs to be a biological question behind the experiment. If there is not even a question, don't bother. • The question can be nebulous: what happens to the metabolome of this tissue when I apply Drug A? • The purpose of the question is to drive the experimental design. • Make sure the samples can answer the question: cause vs. effect.

  10. Design Issues • Known sources of non-biological error (not exhaustive) that must be addressed • Technician / post-doc • Reagent lot • Temperature • Protocol • Date • Location • Cage / field positions

  11. Experimental Design

  12. Biological replication is essential. • Two types of replication • Biological replication – samples from different individuals are analyzed • Technical replication – same sample measured repeatedly • Technical replicates allow only the effects of measurement variability to be estimated and reduced, whereas biological replicates allow this to be done for both measurement variability and biological differences between cases. Almost all experiments that use statistical inference require biological replication.
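  A minimal simulation sketch of this distinction (the variance components and sample counts below are illustrative assumptions, not figures from the talk): averaging technical replicates shrinks only the measurement variance, while the biological variance between subjects remains and can only be reduced by adding biological replicates.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_tech_reps = 10, 3
sigma_bio, sigma_tech = 1.0, 0.5   # assumed biological and technical SDs

# Each subject has a true biological level; each injection adds measurement noise.
bio = rng.normal(0.0, sigma_bio, size=n_subjects)
tech = rng.normal(0.0, sigma_tech, size=(n_subjects, n_tech_reps))
measured = bio[:, None] + tech

# Averaging technical replicates reduces the measurement term (sigma_tech^2 / n_tech_reps)
# but leaves the biological variance untouched.
subject_means = measured.mean(axis=1)
print("observed variance of subject means:", subject_means.var(ddof=1))
print("expected (approached as n grows):  ", sigma_bio**2 + sigma_tech**2 / n_tech_reps)
```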

  13. How many replicates? • Controlled experiments – cell lines, mice, rats 8-12 per group. • Human studies – discovery 20+ per group • For predictive models – 100+ per group, need model building and validation sets • The more the better, always.
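  As one hedged illustration of where numbers in this range can come from (the effect size, alpha, and power below are assumed values, not the presenter's), a standard two-group power calculation can be run with statsmodels:

```python
from statsmodels.stats.power import TTestIndPower

# Assumed design: detect a large 2-SD per-metabolite difference with 80% power,
# at a Bonferroni-style alpha of 0.05 / 100 = 0.0005 (~100 metabolites tested).
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=2.0, alpha=0.0005, power=0.80,
                                   ratio=1.0, alternative='two-sided')
print(f"samples needed per group: {n_per_group:.1f}")
```

  Under these assumed settings the answer lands near 10 per group; smaller effects or stricter multiple-testing corrections push the required number up quickly.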

  14. Experimental Conduct All experiments are subject to non-biological variability that can confound any study

  15. Control Everything! • Know what you are doing • Practice! • Practice!

  16. What if you can’t control or make all things uniform? • Randomize • Orthogonalize

  17. What are Orthogonalization and Randomization? • Orthogonalization – spreading the biological sources of error evenly across the non-biological sources of error. • Maximally powerful for known sources of error. • Randomization – spreading the biological sources of error at random across the non-biological sources of error. • Useful for controlling for unknown sources of error.
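  A minimal sketch of both ideas, assuming a simple two-group study run over two days (the group sizes and batch structure are invented for illustration): randomization shuffles the run order, while orthogonalization (blocking) forces each day to contain a balanced mix of the groups.

```python
import numpy as np

rng = np.random.default_rng(1)
samples = [f"case_{i}" for i in range(8)] + [f"control_{i}" for i in range(8)]

# Randomization: shuffle the run order so unknown drifts are spread at random.
run_order = list(rng.permutation(samples))

# Orthogonalization (blocking): assign equal numbers of cases and controls to
# each day, so the group effect is not confounded with the day effect.
day1 = list(rng.permutation(samples[:4] + samples[8:12]))
day2 = list(rng.permutation(samples[4:8] + samples[12:16]))

print("randomized run order:", run_order)
print("day 1 (balanced):", day1)
print("day 2 (balanced):", day2)
```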

  18. Examples of Orthogonalization and Randomization (figure: the experiment laid out with randomized vs. orthogonalized sample assignment)

  19. Statistical analyses have assumptions too

  20. Statistical analyses • Supervised analyses – linear models, etc. • Assume IID (independently and identically distributed) data • Normality • Sometimes can rely on the central limit theorem • ‘Weird’ variances • Using fold change alone as a statistic is not valid. • ‘Shrinkage’ and/or the use of Bayes methods can be a good thing. • False-discovery rate is a good alternative to conventional multiple-testing approaches. • Pathway testing is desirable.
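  As a hedged sketch of the false-discovery-rate point (the simulated abundances below are illustrative, not data from the presentation), per-metabolite tests can be corrected with the Benjamini-Hochberg procedure:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(2)
n_metabolites, n_per_group = 200, 10

# Simulated log-abundances: two groups, with the first 20 metabolites truly shifted.
group_a = rng.normal(0.0, 1.0, size=(n_metabolites, n_per_group))
group_b = rng.normal(0.0, 1.0, size=(n_metabolites, n_per_group))
group_b[:20] += 1.5

# One t-test per metabolite, then Benjamini-Hochberg FDR control at 5%.
_, pvals = stats.ttest_ind(group_a, group_b, axis=1)
reject, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print("metabolites significant at 5% FDR:", int(reject.sum()))
```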

  21. Classification • Supervised classification • Supervised-classification procedures require independent cross-validation. • See MAQC-II recommendations: Nat Biotechnol. 2010 Aug;28(8):827–838. doi:10.1038/nbt.1665. • Wholly separate model-building and validation stages. Can be three-stage, with multiple models tested. • Unsupervised classification • Unsupervised classification should be validated using resampling-based procedures.
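  A minimal scikit-learn sketch of keeping model building and validation wholly separate (the data, model, and split sizes are assumptions, not the MAQC-II pipeline): cross-validate within a training set to build the model, then score it once on a held-out validation set.

```python
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 50))     # illustrative: 120 samples x 50 metabolites
y = rng.integers(0, 2, size=120)   # illustrative binary phenotype

# Hold out a validation set that is never touched during model building.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Cross-validate within the training set only (model-building stage).
cv_acc = cross_val_score(model, X_train, y_train, cv=5).mean()

# Fit on the full training set, then report performance once on the held-out set.
model.fit(X_train, y_train)
print(f"CV accuracy: {cv_acc:.2f}, held-out accuracy: {model.score(X_valid, y_valid):.2f}")
```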

  22. Unsupervised classification - continued • Unsupervised analysis methods • Cluster analysis • Principal components • Separability analysis • All have assumptions and input parameters, and changing them can result in very different answers.
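  One way to see the sensitivity to assumptions and input parameters (data and settings are purely illustrative): cluster the same samples under different choices of k or linkage and compare the assignments.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(4)
X = rng.normal(size=(60, 30))   # illustrative: 60 samples x 30 metabolites

# The same data clustered under different assumptions can disagree substantially.
km3 = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
km5 = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
ward = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)

print("k=3 vs k=5 agreement (ARI):", round(adjusted_rand_score(km3, km5), 2))
print("k-means vs Ward agreement (ARI):", round(adjusted_rand_score(km3, ward), 2))
```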

  23. Sample size estimation for metabolomics studies

  24. There is strength in numbers: power and sample size • Unsupervised analyses • Principal components, clustering, heat maps and variants • These are actually data transformations or data displays rather than hypothesis tests, so it is unclear whether sample size estimation is appropriate or even possible. • Stability of clustering may be worth thinking about; Garge et al. (2005) suggested 50+ samples for any stability.
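  A hedged sketch of checking cluster stability by resampling (the subsampling scheme below is an assumption, in the spirit of, not a reproduction of, the Garge et al. procedure): recluster random subsamples and measure agreement with the full-data clustering.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(5)
X = rng.normal(size=(80, 30))   # illustrative: 80 samples x 30 metabolites
full = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Recluster random 80% subsamples and compare with the full-data clustering;
# consistently low agreement suggests the clusters are not stable.
scores = []
for seed in range(20):
    idx = rng.choice(len(X), size=int(0.8 * len(X)), replace=False)
    sub = KMeans(n_clusters=3, n_init=10, random_state=seed).fit_predict(X[idx])
    scores.append(adjusted_rand_score(full[idx], sub))
print("mean subsample ARI:", round(float(np.mean(scores)), 2))
```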

  25. Sample size in supervised experiments • Supervised analyses • Linear models and variants • Methods are still evolving, but we suggest the approach we developed for microarrays may be appropriate for metabolomics (being evaluated)

  26. Metabolomics does not reveal everything and different technologies show different things

  27. Technology and detection evolves over time.

  28. Technologies are not in perfect agreement

  29. The human urine metabolome

  30. Sample, Image and Data Quality Checking

  31. Metabolite quality • Still evolving field • RTI is one of the Metabolomics Reference Standards Synthesis Centers

  32. Know your data - what should it look like?

  33. These are OK

  34. These are not OK

  35. One bad sample can contaminate an experiment

  36. Histogram of p-values

  37. Potentially Bad Data

  38. Histogram of p-values with bad data removed
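  A minimal sketch of this diagnostic on simulated data (the "bad" sample here is invented for illustration): with no true group differences the p-value histogram should be roughly flat, a single aberrant sample distorts it, and removing that sample restores the expected shape.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 16))         # illustrative: 100 metabolites x 16 samples, no true signal
X[:, 0] += 3.0                         # one globally shifted ("bad") sample in group 0
groups = np.array([0] * 8 + [1] * 8)   # first 8 samples vs. last 8 samples

def pval_hist(data, labels):
    """Per-metabolite two-group t-tests, binned into a 10-bin p-value histogram."""
    _, p = stats.ttest_ind(data[:, labels == 0], data[:, labels == 1], axis=1)
    counts, _ = np.histogram(p, bins=10, range=(0, 1))
    return counts

# Under the null the histogram should be roughly flat; the bad sample skews it.
print("with bad sample:   ", pval_hist(X, groups))
print("bad sample removed:", pval_hist(X[:, 1:], groups[1:]))
```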
