Model-based analysis of oligonucleotide arrays, dChip software

Cheng Li

(Joint work with Wing Wong)

Statistics and Genomics – Lecture 4

Department of Biostatistics

Harvard School of Public Health

January 23-25, 2002


Source: Affymetrix website


Custom software: raw image


Custom software: getting representative value of a probe cell


Normalization is needed to minimize non-biological variation between arrays


Normalization methods

  • Current software uses linear normalization

  • Nonlinear curve fitting based on the scatter plot is still inadequate because 1) the effects of differentially expressed genes may themselves be “normalized” away, and 2) the fit is subject to the regression phenomenon and asymmetry


Regression phenomenon and asymmetry


Invariant set normalization method

  • A set of points (x_i, y_i) is said to be order-preserving if y_i < y_j whenever x_i < x_j

  • The maximal order-preserving subset can be obtained by dynamic programming

  • If a gene is really differentially expressed, its cells tend not to be included in a large order-preserving subset

  • Our method is based on an approximately order-preserving subset, called the “invariant set”
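
The dynamic-programming step mentioned above can be sketched as a longest-increasing-subsequence search after sorting by x. This is only an illustration of the exact maximal order-preserving subset, assuming all x values are distinct; the function name is hypothetical, and the approximate “invariant set” selection used for normalization is not shown.

```python
def max_order_preserving_subset(points):
    """Return a largest subset of (x, y) points in which y increases with x.

    Assumes the x values are distinct (an assumption of this sketch).
    """
    pts = sorted(points)              # sort by x
    n = len(pts)
    if n == 0:
        return []
    best = [1] * n                    # best[j]: largest subset ending at point j
    prev = [-1] * n                   # back-pointers for reconstruction
    for j in range(n):
        for i in range(j):
            if pts[i][1] < pts[j][1] and best[i] + 1 > best[j]:
                best[j] = best[i] + 1
                prev[j] = i
    k = max(range(n), key=best.__getitem__)
    subset = []
    while k != -1:                    # walk the back-pointers
        subset.append(pts[k])
        k = prev[k]
    subset.reverse()
    return subset


# Hypothetical intensities: the point (2, 5) breaks the ordering and is dropped.
example = [(1, 1), (2, 5), (3, 2), (4, 3), (5, 4)]
kept = max_order_preserving_subset(example)
```

The quadratic loop is kept for readability; a binary-search variant would be preferable for a full array of probe cells.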


Fig. 2.9 Normalization of a pair of replicated arrays


Figure 2.10. Two different samples. The smoothing spline in (A) is affected by several points at the lower-right corner, which might belong to differentially expressed genes. The “invariant set”, by contrast, does not include these points when determining the normalization curve, leading to a different normalization relationship at the high end.


A pair of split-sample replicate arrays


Source: Affymetrix website


Data for one probe set, one array

PM/MM differences eliminate background and cross-hybridization signals


Validation experiments suggest that Average Differences are linear in mRNA concentration within a certain dynamic range

Lockhart et al. (1996) Nature Biotechnology, Vol 14: 1675-1680


Data for one gene in many arrays


Box plot showing array and probe effects


Modeling probe effects

1) Probe sequences differ in hybridization efficiency

2) Cross-hybridization, SNPs, alternative splicing

3) Probe position effect, 3’ bias

Probe effects can dominate the biological variation of interest

Previous method: use multiple probes and average them to reduce “noise”

Our methods: statistical models for probe effects, “meta-analysis”, learning algorithms, and estimation of expression level conditional on knowledge of the probe effects


Principal component analysis (42 points in 20-space) suggests the data matrix has approx. rank 1
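
One way to check a claim of this kind is to look at how much of the total sum of squares the first singular value carries. The sketch below uses an uncentered SVD on simulated data (42 arrays by 20 probes); the numbers are made up for illustration and are not the data behind the slide.

```python
import numpy as np


def leading_component_share(data):
    """Fraction of the total sum of squares captured by the first singular value."""
    s = np.linalg.svd(np.asarray(data, dtype=float), compute_uv=False)
    return float(s[0] ** 2 / np.sum(s ** 2))


# Simulated example: rank-1 signal (array effect x probe effect) plus noise.
rng = np.random.default_rng(0)
theta = rng.uniform(100, 1000, size=42)   # hypothetical array-level expression
phi = rng.uniform(0.2, 2.0, size=20)      # hypothetical probe sensitivities
noisy = np.outer(theta, phi) + rng.normal(0, 20, size=(42, 20))
share = leading_component_share(noisy)
```

A share close to 1 indicates that a single product of an array effect and a probe effect explains nearly all of the variation.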


Model for one gene in multiple arrays
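
For reference, the model as published in Li and Wong (2001, PNAS) can be written as below for array i and probe pair j. The notation is taken from that paper and is an assumption about what the slide displayed.

```latex
\begin{aligned}
MM_{ij} &= \nu_j + \theta_i \alpha_j + \varepsilon, \\
PM_{ij} &= \nu_j + \theta_i \alpha_j + \theta_i \phi_j + \varepsilon
\end{aligned}
```

Here θ_i is the expression index of the gene in array i, ν_j is the baseline response of probe pair j, α_j is the rate of increase of the MM response, and φ_j is the additional rate of increase of the PM response.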


Figure 1.1. Black curves are the PM and MM data of gene A in the first 6 arrays. Light curves are the fitted values to model (1). Probe pairs are labeled 1 to 20 on the horizontal axis.


Using PM/MM Differences

  • PM/MM differences eliminate most background and cross-hybridization signals

  • Affymetrix’s GeneChip software uses average differences as the basis for determining fold changes, and their validation showed that average differences are linear in mRNA concentration within a certain dynamic range


Model for PM/MM differences (1.2)
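
In the notation of Li and Wong (2001, PNAS), which is assumed here, the model for the PM/MM differences of one gene, with arrays i = 1, ..., I and probe pairs j = 1, ..., J, reads:

```latex
y_{ij} = PM_{ij} - MM_{ij} = \theta_i \phi_j + \varepsilon_{ij},
\qquad \sum_{j} \phi_j^2 = J,
\qquad \varepsilon_{ij} \sim N(0, \sigma^2)
```

θ_i is the model-based expression index in array i and φ_j is the sensitivity of probe pair j; the constraint on the φ_j makes the two sets of parameters identifiable.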


Figure 1.2. Black curves are the PM-MM difference data of gene A in the first 6 arrays. Light curves are the fitted values to model (2).


Residuals of the fitting


Model fitting amounts to fixing the θ’s and regressing to estimate the φ’s, and vice versa
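
A minimal sketch of such an alternating fit for the difference model y_ij = θ_i φ_j + ε_ij with Σ φ_j² = J is given below. It assumes a complete data matrix and omits the outlier exclusion and standard-error computation that the software performs; the function name and iteration count are illustrative.

```python
import numpy as np


def fit_rank_one(y, n_iter=50):
    """Alternating least squares for y[i, j] ~ theta[i] * phi[j].

    Rows are arrays, columns are probe pairs; phi is rescaled so that
    sum(phi**2) equals the number of probe pairs J.
    """
    y = np.asarray(y, dtype=float)
    n_probes = y.shape[1]
    phi = np.ones(n_probes)                      # starting value, satisfies the constraint
    for _ in range(n_iter):
        theta = y @ phi / (phi @ phi)            # fix phi, regress each array on phi
        phi = theta @ y / (theta @ theta)        # fix theta, regress each probe on theta
        phi *= np.sqrt(n_probes) / np.linalg.norm(phi)   # re-impose sum(phi**2) = J
    theta = y @ phi / (phi @ phi)                # final theta given the constrained phi
    return theta, phi


# Hypothetical noise-free example with 3 arrays and 4 probe pairs.
theta_true = np.array([100.0, 250.0, 400.0])
phi_true = np.array([0.5, 1.0, 1.5, 0.8])
phi_true *= np.sqrt(phi_true.size) / np.linalg.norm(phi_true)
theta_hat, phi_hat = fit_rank_one(np.outer(theta_true, phi_true))
```

Each half-step is an ordinary least-squares regression through the origin, which is why the fit reduces to repeated simple regressions.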


Fig. 1.5 Array outlier: large standard error of θ_4


Fig. 1.6 Probe outlier: large standard error of φ_17

Also see gene 6898


Fig. 1.4 Array outlier image shows that the model automatically handles image contamination


Compare Model-based expression with Average Difference

  • Array set 5 has 29 pairs of arrays replicated at the split-mRNA level

  • The differences between the replicated arrays provide an opportunity to assess different expression calculation methods


Figure 2.5. Log (base 10) expression indexes of a pair of replicate arrays (array 1 and 2 of array set 5) for MBEI method (A) and AD method (B). The center line is y=x, and the flanking lines indicate the difference of a factor of two.


Figure 2.6. Boxplots of average absolute log (base 10) ratios between replicate arrays stratified by presence proportion for (A) MBEI method, (B) AD method.


Source: Affymetrix website


Finding Confidence Interval of Fold Change


Table 2.1 Using expression levels and associated standard errors to determine confidence intervals of fold changes
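
One common way to turn two expression levels and their standard errors into an interval for the fold change is a Fieller-type construction, sketched below. The slide does not spell out the computation, so the approach, the function name, and the default z = 1.645 (a two-sided 90% interval under normality) are assumptions of this example.

```python
import math


def fold_change_interval(theta1, se1, theta2, se2, z=1.645):
    """Fieller-type interval for the ratio r = theta1 / theta2.

    Collects all r with (theta1 - r*theta2)^2 <= z^2 * (se1^2 + r^2 * se2^2).
    Returns (lower, upper), or None when theta2 is within z standard
    errors of zero, in which case the set of r is not a bounded interval.
    """
    a = theta2 ** 2 - (z * se2) ** 2
    if a <= 0:
        return None
    b = -2.0 * theta1 * theta2
    c = theta1 ** 2 - (z * se1) ** 2
    root = math.sqrt(b * b - 4.0 * a * c)   # non-negative whenever a > 0
    return ((-b - root) / (2 * a), (-b + root) / (2 * a))


# Hypothetical expression levels and standard errors.
lower, upper = fold_change_interval(200.0, 10.0, 100.0, 5.0)
```

Reporting the lower bound rather than the point estimate gives a conservative fold change when the expression levels are imprecise.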


Resampling hierarchical clustering using standard errors of model-based expression


Incorporate biological knowledge and database when analyzing microarray data

Right picture: “Gene Ontology: tool for the unification of biology”, Nature Genetics 25, p. 25 (2000)


Functionally significant clusters

Found 13 structural protein genes out of a 49-cluster (all: 198/2622, PValue: 1.00e+000)
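
A count of this kind is often assessed with a one-sided hypergeometric tail probability: the chance of seeing at least 13 annotated genes in a cluster of 49 when 198 of 2622 genes carry the annotation. The slide does not say how its p-value was computed, so the test below is an assumption and its result need not match the printed value.

```python
from math import comb


def enrichment_p_value(hits, cluster_size, annotated, total):
    """P(X >= hits) when cluster_size genes are drawn without replacement
    from `total` genes, of which `annotated` carry the annotation."""
    upper = min(cluster_size, annotated)
    tail = sum(
        comb(annotated, k) * comb(total - annotated, cluster_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(total, cluster_size)


# Counts taken from the slide: 13 of 49 in the cluster, 198 of 2622 overall.
p = enrichment_p_value(13, 49, 198, 2622)
```

Exact integer arithmetic is used for the binomial coefficients, so no statistics library is required.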


  • Statistical analysis of high-density oligonucleotide arrays: a multiplicative noise model

  • R. Sasik and J. Corbeil (UCSF)

Problems with LWR model:

  • LWR model:

  • The expression index can still be negative.

  • Genes with negative index can still be classified as present.

Slides prepared by Xuemin Fang


Statistical model:

  • Based on the same assumption as the LW model: PM intensity is directly proportional to the concentration c_i of the transcript. Write the relation in the form

  • Our model is

  • where

  • Least-squares estimation of the parameters.

  • Constraint:


Algorithm for analyzing a batch of n_s samples:

  1) Normalize all samples to the first one on the list by requiring the sum of all PM intensities to be the same as that of the first sample.

  2) Select the background probes using Naef’s method (MM is used in this step).

  3) Subtract the median of the background probe intensities from every PM probe in the array.

  4) Eliminate probes that become negative.

  5) Fit the model and eliminate the probes that contribute most to the sum of squares.

  6) Normalize again and repeat steps 1-5 until the distribution of residuals is Gaussian.
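
The normalization, background-subtraction, and negative-probe elimination steps above can be sketched as follows. The background probes are assumed to be given as a boolean mask, since their selection by Naef’s method is not described here; the function name and array layout are illustrative, and the model fit and iteration are left out.

```python
import numpy as np


def preprocess_batch(pm, background_mask):
    """Normalize, background-subtract and filter PM intensities.

    pm: (n_samples, n_probes) PM intensities.
    background_mask: boolean array of the same shape, True where a probe
        was selected as a background probe (selection not shown).
    Returns the corrected intensities with eliminated probes set to NaN.
    """
    pm = np.asarray(pm, dtype=float)
    mask = np.asarray(background_mask, dtype=bool)
    # Scale each sample so its total PM intensity equals that of the first sample.
    scale = pm[0].sum() / pm.sum(axis=1, keepdims=True)
    normalized = pm * scale
    # Subtract each array's median background-probe intensity.
    medians = np.array([np.median(row[m]) for row, m in zip(normalized, mask)])
    corrected = normalized - medians[:, None]
    # Probes that become negative are eliminated (marked as NaN here).
    corrected[corrected < 0] = np.nan
    return corrected


# Hypothetical batch: two samples, four probes, first two probes are background.
pm_example = [[10, 20, 30, 40], [20, 40, 60, 80]]
mask_example = [[True, True, False, False]] * 2
cleaned = preprocess_batch(pm_example, mask_example)
```

Marking eliminated probes as NaN keeps the array shape fixed, so later steps can skip them with NaN-aware reductions.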


Bias, variance and fit for three measures of expression: AvDiff, Li & Wong's, and AvLog(PM - BG)

Rafael Irizarry, Terry Speed (Johns Hopkins)

Slides prepared by Xuemin Fang


A background plus signal model:

  • The background term represents signal caused by optical noise and non-specific binding.

  • It consists of a mean background level and a random component.

  • The transcript signal contains a probe affinity effect, the log expression measure, and an error term.

  • Both error terms are independent standard normal.


Expression index:

  • A naïve estimate of the log expression measure is given by the average over probes of log2(PM − BG), with log2(BG) the mode of the log2(MM) distribution.

  • An estimate of this distribution is obtained using a kernel density estimate.
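
A rough sketch of this estimate is shown below, using a Gaussian kernel with Silverman's rule-of-thumb bandwidth. The kernel, the bandwidth rule, the grid search for the mode, and the decision to skip probes with PM at or below the background are all assumptions of the example rather than details given on the slide.

```python
import numpy as np


def kde_mode(values, grid_size=512):
    """Mode of a Gaussian kernel density estimate (Silverman's rule bandwidth).

    Assumes the values are not all identical.
    """
    values = np.asarray(values, dtype=float)
    q75, q25 = np.percentile(values, [75, 25])
    spread = min(values.std(ddof=1), (q75 - q25) / 1.34)
    bandwidth = 0.9 * spread * values.size ** (-0.2)
    grid = np.linspace(values.min(), values.max(), grid_size)
    z = (grid[:, None] - values[None, :]) / bandwidth
    density = np.exp(-0.5 * z ** 2).sum(axis=1)   # unnormalized density on the grid
    return float(grid[np.argmax(density)])


def avlog_pm_bg(pm_probe_set, mm_array):
    """Average of log2(PM - BG) over one probe set.

    log2(BG) is the KDE mode of log2(MM) over the whole array; probes with
    PM <= BG are skipped (an assumption of this sketch).
    """
    pm = np.asarray(pm_probe_set, dtype=float)
    log_bg = kde_mode(np.log2(np.asarray(mm_array, dtype=float)))
    signal = pm - 2.0 ** log_bg
    return float(np.mean(np.log2(signal[signal > 0])))


# Simulated array: background near 2**7 = 128, one probe set with signal 1024.
rng = np.random.default_rng(1)
mm_sim = 2.0 ** rng.normal(7.0, 0.3, size=2000)
pm_sim = np.full(16, 128.0 + 1024.0)
estimate = avlog_pm_bg(pm_sim, mm_sim)
```

Using the mode rather than the mean of log2(MM) makes the background estimate less sensitive to MM probes that pick up real signal.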


Acknowledgement

Data source:

Stan Nelson (UCLA), Sven de Vos (UCLA), Dan Tang (DFCI), Andy Bhattacharjee (DFCI), Richardson Andresa (DFCI), Allen Fienberg (Rockefeller)