MotifBooster – A Boosting Approach for Constructing TF-DNA Binding Classifiers

Pengyu Hong

10/06/2005


Motivation

  • Understand transcriptional regulation

  • Model transcriptional regulatory networks

[Figure labels: regulators (TF), binding sites, genes (Gene X), mRNA transcript]


Motivation

Previous work on motif finding

  • AlignACE (Hughes et al 2000)

  • ANN-Spec (Workman et al 2000)

  • BioProspector (Liu et al 2001)

  • Consensus (Hertz et al 1999)

  • Gibbs Motif Sampler (Lawrence et al 1993)

  • LogicMotif (Keles et al 2004)

  • MDScan (Liu et al 2002)

  • MEME (Bailey and Elkan 1995)

  • Motif Regressor (Conlon et al 2003)

  • … …


Motivation

A widely used model – Motif Weight Matrix (Stormo et al 1982)

Example site: • • • A A C A T C C G • • •

Position      1      2      3      4      5      6      7      8
A          0.19   1.11  -0.17   1.65  -2.65  -2.66  -1.98   0.92
C         -0.14  -0.49   1.89  -1.81   1.70   2.32   2.14  -2.07
G         -1.39   0.25  -1.22  -1.07  -2.07  -2.07  -2.07   1.13
T          0.86  -1.39  -2.65  -2.65   0.41  -2.65  -1.16  -1.80

Score of the site = 0.19 + 1.11 + 1.89 + 1.65 + 0.41 + 2.32 + 2.14 + 1.13 = 10.84, which is compared against a threshold.

A sequence is a target if it contains a binding site (score > threshold).

Computational << Molecular
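
The weight matrix scoring rule can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's code: the function names and the threshold value are assumptions, only one strand is scanned, and the matrix values are the ones shown on the slide.

```python
# Weight matrix from the slide; list index = motif position 1..8.
WEIGHT_MATRIX = {
    "A": [0.19, 1.11, -0.17, 1.65, -2.65, -2.66, -1.98, 0.92],
    "C": [-0.14, -0.49, 1.89, -1.81, 1.70, 2.32, 2.14, -2.07],
    "G": [-1.39, 0.25, -1.22, -1.07, -2.07, -2.07, -2.07, 1.13],
    "T": [0.86, -1.39, -2.65, -2.65, 0.41, -2.65, -1.16, -1.80],
}
MOTIF_WIDTH = 8


def site_score(site):
    """Sum the matrix entries selected by the base at each position."""
    return sum(WEIGHT_MATRIX[base][pos] for pos, base in enumerate(site))


def best_site_score(sequence):
    """Score every window of motif width and return the highest score."""
    windows = (
        sequence[i:i + MOTIF_WIDTH]
        for i in range(len(sequence) - MOTIF_WIDTH + 1)
    )
    return max((site_score(w) for w in windows), default=float("-inf"))


def is_target(sequence, threshold):
    """A sequence is a target if its best site scores above the threshold.

    The threshold is a free parameter here (an assumption of this sketch).
    """
    return best_site_score(sequence) > threshold


print(round(site_score("AACATCCG"), 2))  # 10.84, as on the slide
```

With a hypothetical threshold of 5.0, `is_target("TTTAACATCCGTT", 5.0)` is true because the sequence contains the example site.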


Motivation

Non-linear binding effects, e.g., different binding modes.

Consensus: • • • CA C/T CC A/G TACAT • • •

Preferred binding

  • Mode 1: • • • CACCCATACAT • • •

  • Mode 2: • • • CATCCGTACAT • • •

Non-preferred binding

  • Mode 3: • • • CACCCGTACAT • • •

  • Mode 4: • • • CATCCATACAT • • •
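
Why a single weight matrix struggles here can be checked numerically. The four modes differ only at the two degenerate positions, and a weight matrix adds one independent term per position, so score(Mode 1) + score(Mode 2) always equals score(Mode 3) + score(Mode 4). No matrix can therefore rank both preferred modes above both non-preferred ones. The sketch below assumes Modes 1 and 2 are the preferred pair and tries random matrices; the helper names are illustrative only.

```python
import random

MODES = {
    "mode1": "CACCCATACAT",  # preferred
    "mode2": "CATCCGTACAT",  # preferred
    "mode3": "CACCCGTACAT",  # non-preferred
    "mode4": "CATCCATACAT",  # non-preferred
}


def random_weight_matrix(width, rng):
    """Draw an arbitrary additive weight matrix (one weight per base/position)."""
    return {base: [rng.uniform(-3, 3) for _ in range(width)] for base in "ACGT"}


def site_score(matrix, site):
    """Additive weight matrix score of a site."""
    return sum(matrix[base][pos] for pos, base in enumerate(site))


def separates_modes(matrix):
    """True if both preferred modes outscore both non-preferred modes."""
    s = {name: site_score(matrix, site) for name, site in MODES.items()}
    return min(s["mode1"], s["mode2"]) > max(s["mode3"], s["mode4"])


rng = random.Random(0)
matrices = [random_weight_matrix(11, rng) for _ in range(1000)]
print(sum(separates_modes(m) for m in matrices))  # no matrix separates the modes
```

This is the same situation as the XOR problem for a linear classifier, which motivates combining several base classifiers.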


Modeling

Model a TF-DNA binding classifier as an ensemble model: a weighted combination of base classifiers.

[Equation not recovered; its annotations were: ensemble model, weight, base classifier]


Modeling

The mth base classifier: qm(Si)

Sequence scoring function: hm(Si) [equation not recovered]

fm(sik) is a site scoring function (weight matrix + threshold).

The scoring function considers

(a) the number of matching sites

(b) the degree of matching
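
The exact sequence scoring formula did not survive in this transcript, so the following is only one plausible way to satisfy properties (a) and (b), not the formula from the talk. It assumes a site function equal to the weight matrix score minus a threshold, and sums the positive parts over all windows. The toy matrix, threshold, and function names are hypothetical.

```python
# Hypothetical 3-position weight matrix whose best site is "ACG".
TOY_MATRIX = {
    "A": [1.0, -1.0, -1.0],
    "C": [-1.0, 1.0, -1.0],
    "G": [-1.0, -1.0, 1.0],
    "T": [-1.0, -1.0, -1.0],
}
TOY_THRESHOLD = 0.5  # hypothetical


def site_function(site, matrix=TOY_MATRIX, threshold=TOY_THRESHOLD):
    """Weight matrix score minus threshold; positive means the site matches."""
    return sum(matrix[base][pos] for pos, base in enumerate(site)) - threshold


def sequence_score(sequence, width=3):
    """Sum the positive site scores over all windows of the sequence.

    More matching sites add more terms (property a), and a stronger match
    adds a larger term (property b). Non-matching windows contribute 0.
    """
    total = 0.0
    for i in range(len(sequence) - width + 1):
        total += max(0.0, site_function(sequence[i:i + width]))
    return total


print(sequence_score("TTACTTT"))  # one weak match
print(sequence_score("TTACGTT"))  # one strong match
print(sequence_score("ACGTACG"))  # two strong matches
```

The three calls give increasing scores, in the order weak match, strong match, two strong matches.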


Training – Boosting

Modify the confidence-rated boosting (CRB) algorithm (Schapire et al. 1999) to train ensemble models:

(a) Decide the number of base classifiers.

(b) Learn the parameters of each base classifier and its weight.


Why Boosting?

Boosting is a Newton-like technique that iteratively adds base classifiers to minimize the upper bound on the training error.

[Figure labels: margin of training samples, training error, generalization error]

(Schapire et al. 1998)


Challenges

  • Sequences are labeled, but not the sites in the sequences.

  • Positive and negative sequences cannot be well separated by the weight matrix model (linear).

  • Number of negative sequences >> number of positive sequences.

[Figure legend: • Positive sequences – targets of a TF, • Negative sequences]


Boosting

Initialization

  • Total weight of the positive samples = total weight of the negative samples.

  • Since the motif must be an enriched pattern in the positive sequences, use Motif Regressor to find a seed motif matrix W0.

[Figure legend: • Positive, • Negative]
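
The class-balancing part of the initialization can be sketched as follows. The slide only states that the two classes receive equal total weight; spreading the weight uniformly within each class and normalizing the overall total to 1 are assumptions of this sketch, as is the function name.

```python
def initial_weights(labels):
    """Give positives and negatives the same total weight (0.5 each).

    labels: list of +1 (positive sequence) or -1 (negative sequence).
    Within each class the weight is spread uniformly (an assumption).
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == -1)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    return [0.5 / n_pos if y == 1 else 0.5 / n_neg for y in labels]


# One positive and four negatives: the single positive carries as much
# weight as all four negatives together.
print(initial_weights([1, -1, -1, -1, -1]))  # [0.5, 0.125, 0.125, 0.125, 0.125]
```

Balancing the totals keeps the far more numerous negative sequences from dominating the first round of training.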


Boosting

Train a base classifier (BC)

  • Use the seed matrix W0 to initialize the mth base classifier qm() and let m=1.

  • Refine the weight of the mth base classifier and the parameters of qm() to minimize [equation not recovered], where yi is the label of Si and dim is the weight of Si in the mth round.

  • Negative information is explicitly used to train qm() and its weight.

[Figure: positive and negative samples with base classifier BC 1]


Boosting

Adjust the sample weights, giving higher weights to previously misclassified samples.

[Weight update equation not recovered]

  • yi is the label of Si.

  • dim is the weight of Si in the mth round.

  • dim+1 is the new weight of Si.

[Figure: positive and negative samples with base classifier BC 1]
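
The update equation is missing from this transcript. As a hedged illustration, the sketch below uses the standard confidence-rated boosting update, in which each weight is multiplied by exp(-alpha * yi * qm(Si)) and the weights are then renormalized. The talk describes a modified CRB algorithm, so its actual update may differ; the name `alpha` for the base classifier weight is an assumption.

```python
import math


def update_weights(weights, labels, predictions, alpha):
    """Standard confidence-rated reweighting (not necessarily the talk's variant).

    weights:     current sample weights (dim in the slide's notation)
    labels:      +1 / -1 labels (yi)
    predictions: real-valued base classifier outputs for each sample
    alpha:       weight of the current base classifier (assumed symbol)
    """
    unnormalized = [
        d * math.exp(-alpha * y * q)
        for d, y, q in zip(weights, labels, predictions)
    ]
    z = sum(unnormalized)  # normalizer so the new weights sum to 1
    return [u / z for u in unnormalized]


weights = [0.25, 0.25, 0.25, 0.25]
labels = [1, 1, -1, -1]
predictions = [0.8, -0.5, -0.9, 0.3]  # samples 2 and 4 are misclassified
new_weights = update_weights(weights, labels, predictions, alpha=1.0)
print([round(w, 3) for w in new_weights])
```

Samples whose prediction has the wrong sign end up with larger weights, so the next base classifier concentrates on them.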


Boosting

Add a new base classifier

[Figure: positive and negative samples with base classifiers BC 1 and BC 2 and the resulting decision boundary]


Boosting

Adjust sample weights again

[Figure: positive and negative samples with the current decision boundary]


Boosting

Add one more base classifier

[Figure: positive and negative samples with base classifier BC 3 and the resulting decision boundary]


Boosting

Stop if the result is perfect or the performance on the internal validation sequences drops.

[Figure: positive and negative samples with the final decision boundary]
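
The outer loop walked through on the Boosting slides can be summarized as a short sketch. The three callables are hypothetical placeholders for training one base classifier, measuring training error, and scoring the ensemble on the internal validation sequences. The `max_rounds` cap is an added assumption, and training and reweighting are hidden inside `train_round`.

```python
def boost(train_round, training_error, validation_score, max_rounds=10):
    """Add base classifiers until the stopping rule from the slide fires.

    train_round(ensemble)      -> new base classifier (hypothetical callable)
    training_error(ensemble)   -> error on the training sequences
    validation_score(ensemble) -> performance on internal validation sequences
    """
    ensemble = []
    best_validation = float("-inf")
    for _ in range(max_rounds):
        candidate = ensemble + [train_round(ensemble)]
        score = validation_score(candidate)
        if score < best_validation:
            break  # validation performance dropped: keep the previous ensemble
        ensemble, best_validation = candidate, score
        if training_error(ensemble) == 0:
            break  # result is perfect on the training data
    return ensemble


# Toy run with stub callables: validation improves for two rounds, then drops.
toy_validation = {1: 0.70, 2: 0.80, 3: 0.75}
result = boost(
    train_round=lambda ens: "BC %d" % (len(ens) + 1),
    training_error=lambda ens: 1,
    validation_score=lambda ens: toy_validation[len(ens)],
)
print(result)  # ['BC 1', 'BC 2']
```

In the toy run the third base classifier lowers the validation score, so the loop returns the two-classifier ensemble.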


Results

Data: ChIP-chip data of Saccharomyces cerevisiae (Lee et al. 2002)

  • Positive sequences

    • p-value < 0.001

    • Number of positive sequences ≥ 25.

  • Negative sequences

    • p-value ≥ 0.05 & ratio ≤ 1

This yields 40 TFs.


Results

Leave-one-out test results: boosted models vs. seed weight matrices

[Figure: vertical axis: improvement in specificity; horizontal axis: TFs]


Results

Capture Position-Correlation: RAP1

[Figure: RAP1 weight matrix compared with the boosted model's base classifiers 1, 2, and 3]


Results

Capture Position-Correlation: REB1

[Figure: REB1 weight matrix compared with the boosted model's base classifiers 1 and 2]

