Statistical Design and Analysis of Microarray Experiments Peng Liu 6/15/2010

Microarray Technology
  • Microarray technology allows measuring expression levels (abundance of mRNA transcripts) of thousands of genes simultaneously.
  • Two types of platforms:
    • Affymetrix (single-color)
    • Two-color microarray
Wild-type vs. Myostatin Knockout Mice
  • Belgian Blue cattle have a mutation in the myostatin gene.
  • Design of Affymetrix experiment: one sample → one chip

Designing 2-color microarray (3 layers)

From Churchill, 2002, Nature Genetics

Example I: Sawers et al, 2007, BMC Bioinformatics

[Figure labels: M, B, V; bundle sheath strands; mesophyll protoplasts]
Example I: Sawers et al, 2007, BMC Bioinformatics
  • The establishment of C4 photosynthesis in maize is associated with differential accumulation of gene transcripts and proteins between bundle sheath and mesophyll photosynthetic cell types.
  • Goal: To detect genes that are differentially expressed between bundle sheath (B) and mesophyll (M) cells.
Example I: Sawers et al, 2007, BMC Bioinformatics
  • A simple method: isolate the two cell types and perform a microarray experiment to compare gene expression between them (the two treatments).

Example I: Sawers et al, 2007, BMC Bioinformatics
  • A complication: the procedures for extracting mRNA from the two cell types are different, and the one used for M cells introduces stress.
  • Solution: add two more treatment groups, namely samples containing both M and B cells whose mRNA is extracted with and without stress.
  • This gives 4 treatment groups: B, M, Stress, and Total.

Direct comparison vs indirect comparison
  • Direct: comparison within slide
  • Indirect: comparison between slides
  • Suppose we want to compare gene expression levels between treatment 1 and treatment 2.

[Figure: Direct Comparison (samples 1 and 2 on the same slide) and Indirect Comparison (samples 1 and 2 each paired with a reference sample R)]
Comments about 2-color Microarray Designs
  • A unique and powerful feature of 2-color microarrays is that two samples can be compared directly on the same slide.
  • Because the two samples are paired on one slide, the variation due to slide can be accounted for.
  • When possible, it is more efficient to use direct comparison.
  • However, it is sometimes not practical to make direct comparisons of all possible pairs.
Efficiency of comparison
  • The efficiency of comparisons between 2 samples is determined by the length and the number of paths connecting them.
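
As a rough illustration of the path rule, assume that each slide yields one log-ratio with independent error variance sigma2 (an assumption of this sketch, not something stated on the slide). A path of L slides then contributes variance L * sigma2, and independent parallel paths combine by inverse-variance weighting, like resistors in parallel. This shortcut only holds for simple designs such as the ones shown here, and the function names are hypothetical.

```python
def path_variance(n_slides, sigma2=1.0):
    """Variance of a comparison made along one path of n_slides slides."""
    return n_slides * sigma2


def combine_parallel(variances):
    """Combine independent estimates of the same difference (inverse-variance weighting)."""
    return 1.0 / sum(1.0 / v for v in variances)


# Direct comparison with dye-swap: two slides, each a path of length 1.
direct = combine_parallel([path_variance(1), path_variance(1)])

# Indirect comparison through a reference R: one path of length 2 (1-R, R-2).
indirect = path_variance(2)

print(direct, indirect)  # 0.5 2.0
```

With the same two slides, the direct comparison has a quarter of the variance of the indirect one under these assumptions, which is why shorter and more numerous paths are preferred.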

[Figure: Direct Comparison (dye-swap) between samples 1 and 2, and Indirect Comparison of samples 1 and 2 through a reference sample R]
Reference vs Loop design

[Figure: Reference Design (samples 1, 2, and 3 each paired with a reference sample R) and Loop Design (samples 1, 2, and 3 paired with one another)]

Designing experiment for example I

[Figure: design diagram connecting the four treatment groups B, Total, Stress, and M]

With 6 biological replicates
After the bench work…

[Figure: Affymetrix Gene Chip image and 2-color microarray image]
Pre-normalization analysis
  • Image processing
    • obtain the intensity measurement of the signal
  • Background correction
    • remove local background that might be due to non-specific binding and obtain the target sample intensity
  • Filtration
    • remove unreliable spots and reduce the dimension of data
  • Transformation
    • convert data into a format that makes data analysis valid or easier
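
The last three steps can be sketched for a single channel as follows. This is a minimal illustration: subtracting the background, dropping spots whose corrected signal is not positive, and taking log base 2 are common choices assumed here, and the readings, the threshold, and the function name are hypothetical.

```python
import math


def preprocess_spot(foreground, background, min_signal=0.0):
    """Return the log2 background-corrected intensity, or None if the spot is filtered out."""
    corrected = foreground - background   # background correction
    if corrected <= min_signal:           # filtration: drop unreliable spots
        return None
    return math.log2(corrected)           # transformation


# Hypothetical (foreground, background) readings for three spots.
spots = [(1100.0, 76.0), (90.0, 95.0), (320.0, 64.0)]
log_intensities = [preprocess_spot(fg, bg) for fg, bg in spots]
print(log_intensities)  # [10.0, None, 8.0]
```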
Normalization
  • Normalization describes the process of removing (or minimizing) non-biological variation in measured signal intensity levels so that biological differences in gene expression can be appropriately detected.
  • Aim: remove sources of systematic variation
  • Example of non-biological variation: dye difference for 2-color microarray
Normalization: M vs. A Plot (45° rotation)

Log Red-Log Green = M

(Log Green+Log Red)/2 = A
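
The two definitions above translate directly into code. The sketch below assumes base-2 logarithms (the slide only says "Log") and uses a hypothetical function name and intensities.

```python
import math


def ma_values(red, green):
    """Return (M, A) for one spot from its red and green intensities."""
    m = math.log2(red) - math.log2(green)        # M: log ratio
    a = (math.log2(green) + math.log2(red)) / 2  # A: average log intensity
    return m, a


print(ma_values(1024.0, 256.0))  # (2.0, 9.0)
```

A spot with equal red and green intensities has M = 0, so a systematic departure of M from 0 across the range of A points to a dye effect rather than to biology.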

LOWESS Fit

[Figure: LOWESS curve fitted to Log Red-Log Green plotted against (Log Green+Log Red)/2]
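
The idea of the fit can be sketched with a simplified LOWESS-style smoother: a tricube-weighted local linear regression of M on A without the robustness iterations of full LOWESS. The normalized M is then the residual from the fitted curve. The window fraction, the data, and the function names are assumptions of this sketch, not the settings used in the presentation.

```python
import math


def local_linear_fit(a_values, m_values, frac=0.5):
    """Fitted M at each A from a tricube-weighted local linear regression."""
    n = len(a_values)
    k = max(3, math.ceil(frac * n))  # number of neighbours used for each local fit
    fitted = []
    for a0 in a_values:
        # Indices of the k points closest to a0 along the A axis.
        nearest = sorted(range(n), key=lambda i: abs(a_values[i] - a0))[:k]
        d_max = max(abs(a_values[i] - a0) for i in nearest) or 1.0
        w = [(1 - (abs(a_values[i] - a0) / d_max) ** 3) ** 3 for i in nearest]
        sw = sum(w)
        a_bar = sum(wi * a_values[i] for wi, i in zip(w, nearest)) / sw
        m_bar = sum(wi * m_values[i] for wi, i in zip(w, nearest)) / sw
        sxx = sum(wi * (a_values[i] - a_bar) ** 2 for wi, i in zip(w, nearest))
        sxy = sum(wi * (a_values[i] - a_bar) * (m_values[i] - m_bar)
                  for wi, i in zip(w, nearest))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(m_bar + slope * (a0 - a_bar))
    return fitted


def normalize_m(a_values, m_values, frac=0.5):
    """Subtract the fitted intensity-dependent trend from M."""
    fit = local_linear_fit(a_values, m_values, frac)
    return [m - f for m, f in zip(m_values, fit)]


# Hypothetical spots whose M values contain only an intensity-dependent dye trend.
a = [6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0]
m = [0.5 + 0.1 * x for x in a]
normalized = normalize_m(a, m)
print(max(abs(v) for v in normalized))  # close to zero: the trend has been removed
```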

After normalization

[Figure: Normalized M plotted against A]

Statistical Inference
  • Data notation for normalized signal intensities (NSI): Yijk for each gene (g)
    • i: treatment index
    • j: dye index
    • k: slide index
  • For example, Y114 is the NSI for treatment 1 with dye 1 on slide 4, and Y224 is the NSI for treatment 2 with dye 2 on slide 4.
Fitting linear models to microarray data
  • After the normalization, we have one observation (normalized signal intensity) for each gene on each channel (a combination of dye and array).
  • Together, the data is an array with each row for one gene and each column for one channel or one chip.
  • We will fit a statistical model for each gene separately.
Mean expressions for 4 treatment groups

Treatment means (the stress effect is written δ here)

  • M (M cell with stress) μ+v2+δ
  • B (B cell without stress) μ+v1
  • TO (both cells without stress) μ+c*v2+(1-c)*v1
  • ST (both cells with stress) μ+c*v2+(1-c)*v1+δ
  • Note that c is the proportion of M cells in the total leaf sample with both cells.
  • We are interested in testing H0: v1 = v2 to decide whether a given gene is differentially expressed between M and B cells.
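
One way to see why the two extra groups help is to combine the four means so that the stress effect cancels: (M - B) - (ST - TO) equals v2 - v1 for any value of c. The sketch below checks this numerically. The name delta for the stress effect and all parameter values are assumptions made for illustration.

```python
def treatment_means(mu, v1, v2, delta, c):
    """Mean expression of the four treatment groups; delta is the assumed stress effect."""
    total = mu + c * v2 + (1 - c) * v1
    return {
        "M": mu + v2 + delta,   # M cells, extracted with stress
        "B": mu + v1,           # B cells, extracted without stress
        "TO": total,            # both cell types, without stress
        "ST": total + delta,    # both cell types, with stress
    }


means = treatment_means(mu=8.0, v1=1.0, v2=2.5, delta=0.75, c=0.25)
contrast = (means["M"] - means["B"]) - (means["ST"] - means["TO"])
print(contrast)  # 1.5, which is v2 - v1
```

Comparing M with B alone would give v2 - v1 + delta, so the cell-type difference would be confounded with the stress introduced by the extraction procedure.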
Fixed effects
  • The parameters on the previous slide (v1, v2, and the stress effect δ) specify fixed effects.
  • Fixed effects are used to specify the mean of the response variable.
  • A factor is fixed if the levels of the factor were selected by the investigator with the purpose of comparing the effects of the levels to one another.
  • The fixed effects included in the model depend on the experimental design.
Random effects
  • There are some random effects that are unknown:
    • slide effects
    • other effects introduced in the experiment (such as biological replicate effects)
    • residual random effects that include any sources of variation unaccounted for by other terms

[Figure: design diagram for the four treatment groups B, Total, Stress, and M]
Random effects
  • Random factors are used to specify the correlation structure among the response variable observations.
    • e.g., observations on the same slide are more correlated than observations from different slides.
  • The random effects included in the model also depend on the experimental design.
  • A model that has both fixed and random effects is called a mixed model.
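
As an illustration only, one possible mixed model for a single gene uses a fixed treatment mean, a fixed dye effect, a random slide effect, and a residual. The symbols d_j, s_k, e_ijk and the variance components below are assumptions of this sketch, not the exact model from the presentation.

```latex
Y_{ijk} = \mu_{i} + d_{j} + s_{k} + e_{ijk}, \qquad
s_{k} \sim N(0, \sigma_s^2), \quad
e_{ijk} \sim N(0, \sigma_e^2), \quad
s_{k} \text{ and } e_{ijk} \text{ independent}

\operatorname{Cov}(Y_{ijk}, Y_{i'j'k'}) =
\begin{cases}
\sigma_s^2 & \text{if } k = k' \text{ (two channels on the same slide)} \\
0 & \text{if } k \neq k' \text{ (different slides)}
\end{cases}
```

Under this sketch the correlation between the two observations on one slide is \sigma_s^2 / (\sigma_s^2 + \sigma_e^2), which is how the random slide effect encodes that observations on the same slide are more alike than observations on different slides.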
Detecting differentially expressed genes
  • Construct a statistical test for the parameters of interest, e.g., the difference in gene expression (v1 - v2).
    • v1 - v2 ≠ 0 means differential expression.
  • Model the random effects and perform tests or construct confidence intervals.
  • Perform a test for each gene and obtain a p-value.
    • An empirical Bayes test that borrows information across genes is often used because of its higher power.
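
As a simplified stand-in for the mixed-model test, the sketch below computes an ordinary one-sample t statistic per gene from hypothetical per-replicate estimates of v1 - v2. It is not the empirical Bayes test mentioned above, which would additionally shrink each gene's variance estimate toward a value pooled across genes. The gene names and numbers are invented for illustration.

```python
import math
from statistics import mean, stdev


def t_statistic(differences):
    """One-sample t statistic for H0: mean difference = 0 (needs non-constant data)."""
    n = len(differences)
    return mean(differences) / (stdev(differences) / math.sqrt(n))


# Hypothetical estimates of v1 - v2 on the log scale, six replicates per gene.
genes = {
    "gene_a": [1.1, 0.9, 1.3, 0.8, 1.0, 0.9],
    "gene_b": [0.2, -0.3, 0.1, -0.1, 0.0, 0.1],
}
for name, diffs in genes.items():
    print(name, round(t_statistic(diffs), 2))
```

A p-value would then come from a t distribution with n - 1 degrees of freedom, one gene at a time.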
[Figure: distribution of p-values; 2536 p-values are below 0.05]

We would expect around 0.05*40000 = 2000 p-values to be less than 0.05 by chance if no genes were differentially expressed.
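
The arithmetic behind this comparison, using only the numbers given above (the variable names are hypothetical):

```python
n_genes = 40000
alpha = 0.05
observed = 2536  # p-values below 0.05

# Under the null hypothesis p-values are uniform, so a fraction alpha falls below alpha.
expected_by_chance = round(alpha * n_genes)
excess = observed - expected_by_chance
print(expected_by_chance, excess)  # 2000 536
```

Only about 536 of the 2536 small p-values exceed what chance alone would produce, so a cutoff of 0.05 on unadjusted p-values would admit many false positives.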

Possible Errors in Testing ONE gene
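  • Type I Error: false positive (declaring a gene differentially expressed when it is not)
  • Type II Error: false negative (missing a gene that is differentially expressed); its probability is 1 - power
  • Power: probability of a true positive (detecting a gene that is differentially expressed)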
Error Rate in Multiple Testing

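Outcomes when testing m genes (Benjamini and Hochberg, 1995): V is the number of false positives and R is the total number of genes declared differentially expressed.

  • Family-wise error rate: FWER = Pr(V > 0)
  • False discovery rate: FDR = E(V/R | R > 0) * Pr(R > 0)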
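
A minimal sketch of how the two error rates are typically controlled: the Bonferroni correction for FWER and the Benjamini-Hochberg step-up procedure for FDR. The p-values are hypothetical and the function names are chosen for this example.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return True for each hypothesis rejected while controlling FDR at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


def bonferroni(p_values, alpha=0.05):
    """Return True for each hypothesis rejected while controlling FWER at level alpha."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


p = [0.2, 0.001, 0.9, 0.012, 0.02]
print(benjamini_hochberg(p))  # [False, True, False, True, True]
print(bonferroni(p))          # [False, True, False, False, False]
```

Controlling FDR is less stringent than controlling FWER, which is why it rejects more hypotheses here and is usually preferred when thousands of genes are tested.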

Clustering
  • Grouping genes into different “clusters” based on their expression profile
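
One simple way to do such a grouping is average-linkage hierarchical clustering with a correlation-based distance, sketched below. The choice of method, the gene names, and the profiles are assumptions for illustration; the presentation does not say which clustering algorithm was used. Profiles must not be constant, since the correlation is undefined in that case.

```python
from itertools import combinations
from statistics import mean


def correlation_distance(x, y):
    """1 - Pearson correlation between two expression profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1 - sxy / (sxx * syy) ** 0.5


def cluster_profiles(profiles, n_clusters):
    """Merge the two closest clusters (average linkage) until n_clusters remain."""
    clusters = [[name] for name in profiles]

    def average_distance(pair):
        first, second = pair
        return mean(correlation_distance(profiles[g], profiles[h])
                    for g in first for h in second)

    while len(clusters) > n_clusters:
        first, second = min(combinations(clusters, 2), key=average_distance)
        clusters.remove(first)
        clusters.remove(second)
        clusters.append(first + second)
    return clusters


# Hypothetical expression profiles across four conditions.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.5, 4.0, 2.0],
}
clusters = cluster_profiles(profiles, n_clusters=2)
print(sorted(sorted(c) for c in clusters))  # [['g1', 'g2'], ['g3', 'g4']]
```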


Other analyses
  • Relating gene expression to biological functional categories → Gene Enrichment Test
  • Connecting microarray data with other kinds of data such as survival data.
  • More …
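
A gene enrichment test is often carried out as a one-sided hypergeometric (Fisher's exact) test: given how many genes belong to a functional category, how surprising is the overlap with the list of differentially expressed genes? The sketch below assumes that formulation, and the counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_genes, n_in_category, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn at random from n_genes."""
    total = comb(n_genes, n_selected)
    upper = min(n_in_category, n_selected)
    tail = sum(
        comb(n_in_category, k) * comb(n_genes - n_in_category, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# 10 genes in total, 4 in the category, 3 differentially expressed, 2 of those in the category.
print(enrichment_p_value(10, 4, 3, 2))  # 0.3333333333333333
```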
Assigned References
  • Nettleton, D. (2006) A discussion of statistical methods for design and analysis of microarray experiments for plant scientists. The Plant Cell, 18, 2112–2121.