Statistical Design and Analysis of Microarray Experiments Peng Liu 6/15/2010

Microarray Technology

- Microarray technology allows measuring expression levels (abundance of mRNA transcripts) of thousands of genes simultaneously.
- Two types of platforms:
  - Affymetrix (single-color)
  - Two-color microarrays

Wild-type vs. Myostatin Knockout Mice

Belgian Blue cattle have a mutation in the myostatin gene.

Design of an Affymetrix experiment: one sample per chip

Designing a 2-color microarray experiment (3 layers)

From Churchill, 2002, Nature Genetics

Example I: Sawers et al, 2007, BMC Bioinformatics

- The establishment of C4 photosynthesis in maize is associated with differential accumulation of gene transcripts and proteins between bundle sheath and mesophyll photosynthetic cell types.
- Goal: to detect genes that are differentially expressed between bundle sheath (B) and mesophyll (M) cells.

Example I: Sawers et al, 2007, BMC Bioinformatics

- A simple method:
Isolate the cells and perform a microarray experiment to compare gene expression between the two cell types (treatments).

Example I: Sawers et al, 2007, BMC Bioinformatics

- A complication:
The procedures for extracting mRNA from the two cell types are different. The procedure used to extract mRNA from M cells introduces stress.

- Solution:
Add two more treatment groups: samples containing both M and B cells, with mRNA extracted with and without stress.

B, M, Stress and Total (4 treatment groups)

Direct comparison vs indirect comparison

- Direct: comparison within slide
- Indirect: comparison between slides
- Suppose we want to compare gene expression levels between treatment 1 and treatment 2.

Figure: design diagrams for direct comparison of treatments 1 and 2, and for indirect comparison of treatments 1 and 2 through a reference sample R.

Comments about 2-color Microarray Designs

- A unique and powerful feature of 2-color microarrays is the ability to compare two samples directly on the same slide.
- For paired samples, the variation due to slide can be accounted for.
- When possible, it is more efficient to use direct comparison.
- However, it is sometimes not practical to make direct comparisons of all possible pairs.

Efficiency of comparison

- The efficiency of comparisons between 2 samples is determined by the length and the number of paths connecting them.

Figure: design diagrams for direct comparison with dye-swap (treatments 1 and 2) and for indirect comparison of treatments 1 and 2 through a reference sample R.
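
The effect of path length can be illustrated with a small least-squares calculation. This is only a sketch under simplifying assumptions that are not in the slides: each slide yields one log ratio with independent error of variance sigma^2, the indirect design goes through a common reference R, and dye and gene-specific effects are ignored. The function name `contrast_variance` is hypothetical.

```python
import numpy as np

def contrast_variance(design, contrast):
    """Variance of a contrast estimate, in units of sigma^2
    (the assumed variance of a single-slide log ratio)."""
    X = np.asarray(design, dtype=float)
    c = np.asarray(contrast, dtype=float)
    # Least-squares theory: Var(c' beta_hat) = sigma^2 * c' (X'X)^{-1} c
    return float(c @ np.linalg.inv(X.T @ X) @ c)

# Direct comparison with dye-swap: two slides, both measuring d = t1 - t2.
# The second slide has the dyes swapped, so its log ratio is -(t1 - t2).
direct = contrast_variance([[1.0], [-1.0]], [1.0])

# Indirect comparison: slide 1 measures a = t1 - R, slide 2 measures b = t2 - R.
# The contrast of interest is a - b = t1 - t2.
indirect = contrast_variance([[1.0, 0.0], [0.0, 1.0]], [1.0, -1.0])

print(direct, indirect)
```

With two slides in each design, the direct dye-swap comparison has variance sigma^2/2, while the path through the reference has variance 2*sigma^2 under these assumptions.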

Performing the experiment (Nature Cell Biol. 2001, 3:8)

Pre-normalization analysis

- Image processing
  - obtain the intensity measurement of the signal

- Background correction
  - remove local background that might be due to non-specific binding and obtain the target sample intensity

- Filtration
  - remove unreliable spots and reduce the dimension of the data

- Transformation
  - convert the data into a format that makes data analysis valid or easier
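
The last three steps can be sketched for one channel as follows. The intensity values are made up for illustration, and the specific choices (subtracting local background, dropping spots at or below background, log2 transformation) are common options assumed here rather than the procedure prescribed in the slides.

```python
import numpy as np

# Hypothetical foreground and local background intensities for four spots.
foreground = np.array([1200.0, 85.0, 5400.0, 310.0])
background = np.array([100.0, 90.0, 150.0, 110.0])

# Background correction: subtract the local background from each spot.
corrected = foreground - background

# Filtration: keep only spots whose corrected signal is positive.
keep = corrected > 0

# Transformation: log2 makes fold changes additive and stabilizes the spread.
log_signal = np.log2(corrected[keep])

print(keep, log_signal)
```

Here the second spot is removed because its foreground is below its local background.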

Normalization

- Normalization describes the process of removing (or minimizing) non-biological variation in measured signal intensity levels so that biological differences in gene expression can be appropriately detected.
- Aim: remove sources of systematic variation
- Example of non-biological variation: dye difference for 2-color microarray
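
As a minimal illustration of removing a dye difference, the sketch below centers the log ratios of one slide at zero. Global median centering is a deliberately simple choice assumed for this example; the slides do not specify which normalization method is used, and the intensities are invented.

```python
import numpy as np

# Hypothetical red and green intensities for four spots on one slide.
# The red channel is systematically about twice as bright (a dye effect).
red = np.array([1000.0, 2000.0, 500.0, 4000.0])
green = np.array([520.0, 980.0, 260.0, 1900.0])

# Log ratio for each spot.
M = np.log2(red) - np.log2(green)

# Global median normalization: assume most genes are not differentially
# expressed, so the median log ratio should be zero.
M_normalized = M - np.median(M)

print(np.median(M), np.median(M_normalized))
```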

Figure from Dudoit et al, 2002, Statistica Sinica

Self-self experiment

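Statistical Inference

- Data notation for normalized signal intensities (NSI): Yijk for each gene (g)
  - i: treatment index
  - j: dye index
  - k: slide index
- Example: Y224 is the observation for treatment 2 with dye 2 on slide 4, and Y114 is the observation for treatment 1 with dye 1 on the same slide.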

Fitting linear models to microarray data

- After normalization, we have one observation (normalized signal intensity) for each gene on each channel (a combination of dye and array).
- Together, the data form a matrix with one row per gene and one column per channel or chip.
- We will fit a statistical model for each gene separately.
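
Fitting the same model to every gene can be sketched with ordinary least squares. The data matrix and the two-group design below are hypothetical, and the model has fixed effects only, so it is a simplification of the kind of model discussed in these slides.

```python
import numpy as np

# Rows are genes, columns are channels (hypothetical normalized log intensities).
Y = np.array([[8.0, 8.2, 10.1, 9.9],
              [6.0, 6.1, 6.0, 5.9]])

# Design matrix: intercept plus an indicator for the second treatment.
# The first two channels are treatment 1, the last two are treatment 2.
X = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [1.0, 1.0],
              [1.0, 1.0]])

# lstsq solves one regression per column of Y.T, that is, one per gene.
coef, *_ = np.linalg.lstsq(X, Y.T, rcond=None)

treatment_effect = coef[1]  # estimated treatment difference for each gene
print(treatment_effect)
```

The first gene shows a difference of about 1.9 on the log scale, and the second gene shows almost none.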

Mean expressions for 4 treatment groups

Treatment means

- M (M cells with stress): μ + v2 + δ
- B (B cells without stress): μ + v1
- TO (both cell types without stress): μ + c*v2 + (1-c)*v1
- ST (both cell types with stress): μ + c*v2 + (1-c)*v1 + δ
- Here δ denotes the stress effect, and c is the proportion of M cells in the total leaf sample containing both cell types.
- We are interested in testing H0: v1 = v2, that is, whether a given gene is differentially expressed between M and B cells.
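
One way to see why the two extra groups help is to combine the four means into a contrast. The following is a sketch in which $\delta$ is a label assumed here for the stress term added to the M and ST means, and the stress effect is taken to be the same in M cells and in the mixed sample:

$$
\begin{aligned}
\text{M} - \text{B} &= (\mu + v_2 + \delta) - (\mu + v_1) = (v_2 - v_1) + \delta,\\
\text{ST} - \text{TO} &= \delta,\\
(\text{M} - \text{B}) - (\text{ST} - \text{TO}) &= v_2 - v_1 .
\end{aligned}
$$

The difference M − B alone confounds the cell-type difference with stress, while subtracting ST − TO removes the stress term.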

Fixed effects

- The parameters on the previous slide (v1, v2, and the stress effect) specify fixed effects.
- Fixed effects are used to specify the mean of the response variable.
- A factor is fixed if the levels of the factor were selected by the investigator with the purpose of comparing the effects of the levels to one another.
- The fixed effects included in the model depend on the experimental design.

Random effects

- There are some random effects that are unknown:
  - slide effects
  - other effects introduced in the experiment (such as biological replicate effects)
  - residual random effects that include any sources of variation unaccounted for by other terms

Figure: design diagram for the four treatment groups (B, Total, Stress, M).

Random effects

- Random factors are used to specify the correlation structure among the response variable observations.
  - For example, observations on the same slide are more correlated than observations from different slides.

- The random effects included in the model also depend on the experimental design.
- A model that has both fixed and random effects is called a mixed model.
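
The correlation induced by a random slide effect can be written out for a toy layout. This sketch assumes two slides with two channels each and made-up variance components (`sigma2_slide`, `sigma2_resid`). It only builds the implied covariance matrix and does not fit a mixed model.

```python
import numpy as np

sigma2_slide = 0.5   # assumed variance of the random slide effect
sigma2_resid = 0.25  # assumed residual variance

# Four observations: the first two are on slide 0, the last two on slide 1.
slide_of_obs = np.array([0, 0, 1, 1])
Z = np.eye(2)[slide_of_obs]  # 4 x 2 indicator matrix linking observations to slides

# Covariance of the observations: shared slide effect plus independent residuals.
V = sigma2_slide * Z @ Z.T + sigma2_resid * np.eye(4)

sd = np.sqrt(np.diag(V))
corr = V / np.outer(sd, sd)
print(corr.round(3))
```

Observations on the same slide have correlation sigma2_slide / (sigma2_slide + sigma2_resid), which is 2/3 here, while observations on different slides are uncorrelated.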

Detecting differentially expressed genes

- Construct statistical tests for the parameters of interest, e.g., what is the difference in gene expression (v1 - v2)?
v1 - v2 ≠ 0 means differential expression.

- Model the random effects and perform tests or construct confidence intervals.
- Perform a test for each gene and obtain a p-value.
- Empirical Bayes tests that borrow information across genes are often used because of their higher power.
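
As a simplified stand-in for the per-gene test, the sketch below computes a one-sample t statistic on within-slide log ratios for each gene. It is neither the mixed-model test nor an empirical Bayes test, and the log ratios are invented. A p-value would then come from a t distribution with n − 1 degrees of freedom.

```python
import numpy as np

# Hypothetical log ratios (treatment 1 minus treatment 2): rows are genes,
# columns are slides.
log_ratios = np.array([[1.0, 1.2, 0.8, 1.0],
                       [0.1, -0.1, 0.2, -0.2]])

n_slides = log_ratios.shape[1]
mean = log_ratios.mean(axis=1)
# Sample standard deviation per gene (ddof=1 gives the unbiased variance).
sd = log_ratios.std(axis=1, ddof=1)

# One t statistic per gene for H0: mean log ratio = 0.
t_stat = mean / (sd / np.sqrt(n_slides))
print(t_stat)
```

The first gene has a large t statistic, and the second is close to zero.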

2536 p-values are below 0.05. We would expect around 0.05*40000 = 2000 p-values to be less than 0.05 by chance if no genes were differentially expressed.
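
The arithmetic behind this comparison, and a rough reading of what it implies, can be sketched as follows. The last quantity is only a crude upper bound on the share of false positives if every gene with p < 0.05 were declared significant. It assumes, pessimistically, that all 40000 genes are null.

```python
n_genes = 40000
alpha = 0.05
n_observed = 2536  # p-values below 0.05, as reported above

# Number of p-values expected below alpha if no gene were differentially expressed.
expected_by_chance = alpha * n_genes

# Excess over what chance alone would produce.
excess = n_observed - expected_by_chance

# Crude bound on the fraction of the 2536 calls that could be false positives.
rough_false_fraction = expected_by_chance / n_observed

print(expected_by_chance, excess, round(rough_false_fraction, 2))
```

Only about 536 more small p-values are observed than expected by chance, so a threshold of 0.05 on raw p-values would yield mostly false positives here.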

Possible Errors in Testing ONE gene

- Type I error: a false positive
- Type II error: a false negative (its probability is 1 - power)
- Power: the probability of a true positive

Error Rate in Multiple Testing

Outcomes when testing m genes

(Benjamini and Hochberg, 1995)

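Family-wise error rate: FWER = Pr(V > 0)

False discovery rate: FDR = E(V/R | R > 0) * Pr(R > 0)

Here V is the number of false positives (true null hypotheses that are rejected) and R is the total number of genes declared differentially expressed.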
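
A common way to control the FDR is the step-up procedure from the cited Benjamini and Hochberg (1995) paper. The slides do not show the procedure itself, so the sketch below is an illustration with invented p-values, and the function name `benjamini_hochberg` is chosen for this example.

```python
import numpy as np

def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a boolean array marking the hypotheses rejected at FDR level alpha."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                          # indices of p-values, smallest first
    thresholds = alpha * np.arange(1, m + 1) / m   # i * alpha / m for rank i
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])           # largest rank meeting its threshold
        rejected[order[:k + 1]] = True             # reject everything up to that rank
    return rejected

pvals = [0.001, 0.025, 0.029, 0.035, 0.6]
rejected = benjamini_hochberg(pvals)
print(rejected)
```

The second p-value (0.025) exceeds its own threshold of 0.02 but is still rejected, because the fourth-ranked p-value (0.035) falls below its threshold of 0.04. This is the step-up behavior of the procedure.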

Clustering

- Grouping genes into different “clusters” based on their expression profiles
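
A minimal sketch of grouping expression profiles is given below. The slides do not name a clustering algorithm, so k-means with fixed starting centers is used here purely as an example, and the four profiles are invented.

```python
import numpy as np

def kmeans(X, init_centers, n_iter=10):
    """Plain k-means: alternate between assigning profiles and updating centers."""
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(n_iter):
        # Euclidean distance from every gene profile to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(len(centers)):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Rows are genes, columns are conditions (hypothetical log expression values).
profiles = np.array([[1.0, 2.0, 3.0],
                     [1.1, 2.1, 2.9],
                     [3.0, 2.0, 1.0],
                     [2.9, 1.9, 1.2]])

# Start from the first and third profiles so the result is deterministic.
labels, centers = kmeans(profiles, init_centers=profiles[[0, 2]])
print(labels)
```

The two increasing profiles end up in one cluster and the two decreasing profiles in the other.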

Other analyses

- Relating gene expression to biological functional categories (gene enrichment tests)
- Connecting microarray data with other kinds of data such as survival data.
- More …

Assigned References

- Nettleton, D. (2006). A discussion of statistical methods for design and analysis of microarray experiments for plant scientists. The Plant Cell, 18, 2112–2121.
