Sample Size Selection for Microarray based Gene Expression Studies

Gregory R. Warnes, Pfizer Global R&D

Fasheng Li, Smith Hanley Consulting Group

Outline
  • What is the context?
  • What is the problem?
  • What are possible approaches?
  • What approach was chosen and why?
  • How was the approach implemented?
  • What do the results look like?
  • Future plans?
  • References

Industry/FDA Statistics Workshop: September 18-19, 2003

What is Pfizer Global R&D?
  • What do we do? Lots!
    • Pharmaceutical research and development
    • Associated basic science, medical, and technological research
  • How are we doing? Very Well
    • 2003 R&D budget: $7.1 billion
    • 33 major research projects across 10 major therapeutic categories
    • 12,000 employees
    • 6 Major Research Sites

How are we using Gene Expression Technologies?
  • Determine regulatory and metabolic pathways
  • Identify potential biomarkers
  • Identify potential targets
  • Determine mechanism of action (desired and undesired)
  • Evaluate / predict safety
  • Determine mechanism of toxicity

What is the problem?
  • Gene expression assays are expensive
    • ~ $2,000 per sample for Affymetrix experiments
  • Good experimental design is important
  • A huge number of variables measured on each experimental unit
    • 9,300 variables for the Affymetrix S98 Yeast Genechip™
    • 16,000 variables for the Affymetrix RAE230a Rat Genechip™
    • 23,000 + 23,000 = 46,000 variables for the Affymetrix U133A and U133B Human Genechips™
  • Sample size calculations are hard

Standard sample size calculation

For a single outcome variable, given

  • a simple design (e.g., two-sample t-test)
  • effect size δ (ideally, the minimum difference of practical significance)
  • population variance σ²
  • significance level α (probability of a false positive when there is no true effect)
  • power 1-β (probability of a true positive given the defined effect size)

it is straightforward to compute the required sample size n (see, e.g., Cochran & Cox (1957)).
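As a rough illustration of the single-variable calculation above, the sketch below uses the textbook normal approximation n ≈ 2(z₁₋α/₂ + z₁₋β)² σ² / δ² per group. This is an assumption made for illustration, not the method used in the talk: an exact t-based calculation (such as R's power.t.test) gives slightly larger n. The function name n_per_group and its defaults are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample
    comparison of means (normal approximation, equal group sizes).

    delta : effect size (difference in means) to detect
    sd    : common population standard deviation
    alpha : significance level
    power : desired power, 1 - beta
    """
    z = NormalDist()                      # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-sided critical value
    z_beta = z.inv_cdf(power)             # quantile matching the target power
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return math.ceil(n)                   # round up to a whole sample


# Detecting a difference of one standard deviation at alpha = 0.05, power = 80%
print(n_per_group(delta=1.0, sd=1.0))
```

Halving the effect size roughly quadruples the required n, which is why the choice of δ dominates the answer.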

Gene expression sample size calculation

When there are thousands of outcome variables which are not independent, many problems arise:

  • How to handle multiple comparisons?
  • How to deal with dependencies?
  • One effect size or many?
  • One power or many?
  • Many variables, how to get a single answer?

What are possible approaches?

Two extremes:

  • Treat each variable (gene) as a separate and independent problem, then summarize
    • + easy to set up, understand, explain
    • + available data can be used
    • - may not be sufficiently realistic, hence accuracy may suffer
  • Model the entire system, including realistic error structure and interdependencies
    • + may be more accurate (if model is good)
    • - more initial work to set up / compute
    • - may require substantial new data to be realistic
    • - may be hard to understand, explain

What approach was chosen and why?
  • We chose to treat each variable (gene) as a separate and independent problem, then summarize
  • Why?
    • First approximations usually yield useful information with minimal effort.
    • Answers were needed immediately.
    • At best, results would only be used for general guidance.
    • A more realistic error model didn’t work: we tried fitting the model from Zien et al. (2002), which requires high-dimensional numerical integration via MCMC or equivalent. However, the model appears to be non-identifiable.

How was the approach implemented?
  • Compute the variance of each gene (variable) from existing studies
  • Assume a two-sample t-test on log(expression)
  • Bonferroni-adjust the significance level: αᵢ = α / #variables
  • Generate plots of the cumulative number of genes:
    • Fixed αᵢ, δ, 1-β vs. sample size (e.g., n = 5/group, 6/group, …)
    • Fixed αᵢ, δ, n vs. power (e.g., 1-β = 60%, 70%, 80%, …)
    • Fixed αᵢ, 1-β, n vs. effect size (δ = 1.5x, 2.0x, 2.5x, …)
  • Run twice:
    • ‘candidate’ genes (less stringent Bonferroni adjustment)
    • all genes
  • Implemented in R [Ihaka & Gentleman, 1996] using the power.t.test function.
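The per-gene procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' R code: it replaces power.t.test with a normal approximation (optimistic at small n), assumes expression is on a log2 scale so that a fold change maps to δ = log2(fold change), and uses made-up standard deviations. The names approx_power and genes_powered are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal


def approx_power(n, delta, sd, alpha):
    """Normal-approximation power of a two-sided, two-sample test
    with n samples per group (the opposite-tail term is ignored)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = delta / (sd * math.sqrt(2 / n))   # standardized effect
    return Z.cdf(shift - z_crit)


def genes_powered(sds, n, fold_change, alpha=0.05, target_power=0.80):
    """Count genes reaching target_power at n per group,
    using a Bonferroni-adjusted per-gene significance level."""
    alpha_i = alpha / len(sds)                # Bonferroni adjustment
    delta = math.log2(fold_change)            # assumes log2(expression)
    return sum(approx_power(n, delta, sd, alpha_i) >= target_power
               for sd in sds)


# Hypothetical per-gene standard deviations of log2 expression
sds = [0.1, 0.2, 0.3, 0.5, 0.8]

# One point per sample size on a "cumulative number of genes vs. n" curve
for n in (3, 5, 8, 12):
    print(n, genes_powered(sds, n, fold_change=2.0))
```

Varying target_power or fold_change instead of n gives the other two families of curves listed above.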

What do the results look like? Standard Deviations: Focus Group

What do the results look like? Fixed αᵢ, δ, 1-β vs. Sample Size: Focus Group

What do the results look like? Fixed αᵢ, δ, n vs. Power: Focus Group

What do the results look like? Fixed αᵢ, 1-β, n vs. Fold Change: Focus Group

What do the results look like? Standard Deviations: All Genes

What do the results look like? Fixed αᵢ, δ, 1-β vs. Sample Size: All Genes

What do the results look like? Fixed αᵢ, δ, n vs. Power: All Genes

What do the results look like? Fixed αᵢ, 1-β, n vs. Fold Change: All Genes

Future plans?
  • Provide a web-applet backed by R to perform the calculations
  • Use a library of gene variation information in normal samples (structured by organism, Affymetrix chip type, cell type, normalization/scaling method)
  • Extend to more complicated designs (2-way ANOVA, repeated measures, etc.)
  • Other types of multiple comparison adjustments (FDR)
  • Develop models that deal with correlations between genes.

References
  • Two-sample t-test sample size:
    • Cochran WG, Cox GM (1957). Experimental Designs (2nd Ed). 17-28.
  • General sample size calculations:
    • Chow SC, Liu JP (1998). Design and Analysis of Clinical Trials: Concepts and Methodologies. Wiley-Interscience. Chapter 10, 424-482.
    • Chow SC, Shao J, Wang H (2003). Sample Size Calculation in Clinical Research. Marcel Dekker. [New, looks interesting]
  • Gene expression experiments sample size:
    • Zien A, Fluck J, Zimmer R, Lengauer T (2002). Microarrays: How Many Do You Need? RECOMB02, Myers G, Hannenhalli S, Istrail S, Pevzner P, Waterman M, eds. 321-330.
  • Statistical analysis software:
    • Ihaka R, Gentleman R, et al. (2003). http://www.r-project.org [web site]
    • Ihaka R, Gentleman R (1996). R: A Language for Data Analysis and Graphics. Journal of Computational and Graphical Statistics, Vol 5, Number 3: 299-314.
  • Web applet software:
    • Warnes GR (2003). http://www.analytics.washington.edu/Zope/projects/RSessionDA/ [web site]
  • Me:
    • http://www.warnes.net