Optimization of SVM parameters in caspase cleavage sites prediction using grid-computing Lawrence Wee



What are caspases?

Caspases are downstream effectors in apoptosis 1

[Figure: extrinsic and intrinsic pathways of apoptosis]

As the final effectors of apoptosis, caspases cleave many protein substrates.

1. Hengartner MO. The biochemistry of apoptosis. Nature. 2000 Oct 12;407(6805):770-6.

Caspases are proteases

Caspase Cleavage of Substrates1

Caspases are cysteine proteases.

Caspases recognize a tetrapeptide sequence on their substrates (P4-P3-P2-P1).

P4   P3   P2   P1   |   P1’   P2’
D    E    V    D    |   T     Y

Cleave after canonical Asp (D) residue at the P1 position.
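The P4-P1 rule above can be illustrated with a short sketch. This is only a naive enumeration of candidate sites (every Asp with enough flanking residues), not the prediction method of the talk; the function name `candidate_sites` and the example sequence are hypothetical.

```python
def candidate_sites(sequence):
    """Return (p1_index, tetrapeptide, p1_prime) for every Asp (D) that has
    three residues before it (P4-P2) and at least one residue after it (P1')."""
    sites = []
    for i, residue in enumerate(sequence):
        if residue == "D" and i >= 3 and i + 1 < len(sequence):
            tetrapeptide = sequence[i - 3:i + 1]   # P4-P3-P2-P1
            p1_prime = sequence[i + 1]             # first residue after the cut
            sites.append((i, tetrapeptide, p1_prime))
    return sites


# Hypothetical sequence containing the DEVD motif followed by T-Y.
example = "MKDEVDTYLA"
sites = candidate_sites(example)
print(sites)
```

Most Asp residues in a proteome are not cleaved, which is why a classifier is needed to rank such candidates.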

- 1. Fuentes-Prior et al. Biochem J. 2004 Dec 1;384(Pt 2):201-32.
- 2. Thornberry et al. J Biol Chem. 1997 Jul 18;272(29):17907-11.

Caspases are proteases

The Enormous Range of Caspase Substrates1

Caspase substrates:

- Apoptotic regulators
- Cytoskeletal proteins
- Organelle proteins
- DNA-associated proteins
- RNA-associated proteins
- Cell signaling proteins
- Cell cycle proteins
- Viral proteins
- Other proteins ???

More than 400 caspase substrates experimentally determined to date.1 Many more await discovery.

1. Wee LJ, Tong JC, Tan TW, Ranganathan S. A multi-factor model for caspase degradome prediction. BMC Genomics. 2009, 10:S6.

Computational prediction of caspase cleavage sites

- Identification of caspase substrates is important for elucidating biological function of caspases.
- Refine our understanding of apoptotic and other caspase-dependent signaling pathways.
- Wet-laboratory efforts can be laborious.
- Consider computational prediction of caspase cleavage sites?

- Support Vector Machines (SVM)
  - A type of machine learning algorithm
  - Works very well for several biological problems
  - Can be computationally expensive when the data are high-dimensional or when many parameter values have to be optimized.

Prediction of caspase cleavage sites

Support Vector Machines: A Brief Introduction1

Data-points belonging to 2 distinct classes are represented as vectors.

A set of “learning” or “training” data-points belong to 2 classes (green and orange).

Each data-point is described by its own set of attributes, represented as a vector.

1. Cortes,C. and Vapnik,V. (1995) Support vector networks. Machine Learning, 20, 273–293.

Prediction of caspase cleavage sites

Support Vector Machines: A Brief Introduction1

The SVM algorithm constructs a “classifier” to discriminate the two classes.

The classifier is a maximal margin hyperplane that separates the two classes (green and orange).

[Figure labels: maximal margin hyperplane, support vectors]


Prediction of caspase cleavage sites

SVM: A Brief Introduction1

The SVM algorithm classifies new unseen data into one of two classes.

The classifier assigns the new data-point to one of the two classes according to which side of the hyperplane it falls on.

New data-point assigned to “orange” class.
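As a minimal sketch of this assignment step, assuming a linear hyperplane with hypothetical weights and bias, the class is read off the sign of the score. Which class sits on the positive side is an arbitrary choice made here for illustration.

```python
def classify(weights, bias, point):
    """Assign a point to a class by the side of the hyperplane it falls on.

    The hyperplane is the set of points where weights . point + bias == 0.
    Mapping the non-negative side to "orange" is an assumption for this sketch.
    """
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return "orange" if score >= 0 else "green"


# Hypothetical hyperplane in two dimensions.
weights, bias = [1.0, -1.0], 0.5
label_a = classify(weights, bias, [2.0, 1.0])   # score = 1.5
label_b = classify(weights, bias, [0.0, 2.0])   # score = -1.5
print(label_a, label_b)
```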


Prediction of caspase cleavage sites

SVM: A Brief Introduction1

SVM Decision Function with RBF kernel:

2 Parameters: C and gamma
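The formula from the slide is not preserved in this transcript. For orientation, the standard textbook form of the soft-margin SVM decision function with an RBF kernel is given below; the notation (support vectors x_i, labels y_i, multipliers alpha_i, offset b) is assumed here and may differ from the slide.

```latex
f(\mathbf{x}) = \operatorname{sign}\!\left( \sum_{i=1}^{n} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b \right),
\qquad
K(\mathbf{x}_i, \mathbf{x}) = \exp\!\left( -\gamma \, \lVert \mathbf{x}_i - \mathbf{x} \rVert^{2} \right),
\qquad
0 \le \alpha_i \le C
```

In this form gamma sets the width of the kernel, and C bounds the multipliers, trading margin width against training errors.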


Prediction of caspase cleavage sites

Computational issues

[Diagram: training dataset (390 sequences); leave-one-out cross-validation; SVM classifier]

Predicting caspase cleavage sites

Computational issues

Leave-one-out cross-validation for a set of C and gamma values:

[Diagram: training set (5 sequences: Seq 1 to Seq 5); leave-one-out sets (Set 1 to Set 5); trained classifier]
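The leave-one-out procedure can be sketched as follows. To stay self-contained, the sketch uses a 1-nearest-neighbour rule as a stand-in for the SVM, and the five data points are made up; only the splitting and scoring logic is the point here. All function names are hypothetical.

```python
def leave_one_out_splits(n):
    """Return n (train_indices, test_index) pairs, one per held-out example."""
    return [([j for j in range(n) if j != i], i) for i in range(n)]


def loocv_accuracy(fit, predict, X, y):
    """Train on all examples but one, test on the held-out one, and repeat."""
    correct = 0
    for train_idx, test_idx in leave_one_out_splits(len(X)):
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        if predict(model, X[test_idx]) == y[test_idx]:
            correct += 1
    return correct / len(X)


# Stand-in classifier (1-nearest neighbour), NOT an SVM.
def fit_1nn(X, y):
    return list(zip(X, y))


def predict_1nn(model, x):
    def sqdist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(model, key=lambda pair: sqdist(pair[0], x))[1]


# Made-up data: five one-dimensional points in two classes.
X = [[0.0], [0.1], [0.2], [1.0], [1.1]]
y = [0, 0, 0, 1, 1]
splits = leave_one_out_splits(len(X))
accuracy = loocv_accuracy(fit_1nn, predict_1nn, X, y)
print(len(splits), accuracy)
```

With n training examples the classifier is trained n times, so a dataset of 390 sequences means 390 trainings for each (C, gamma) pair.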

Prediction of caspase cleavage sites

Computational issues

[Diagram: training dataset (390 sequences); leave-one-out cross-validation; SVM classifier]

For C=0.1, g=0.1, accuracy = 70%.

Prediction of caspase cleavage sites

Grid-based (brute force) optimization of SVM parameters

Two Computational Issues

1. Leave-one-out cross-validation is computationally expensive.

With a dataset of 390 training examples, one leave-one-out cross-validation run for a single pair of parameter values (C and gamma) takes ~12 s on an Intel 2.66 GHz Core 2 Duo processor with 4 GB RAM.

Challenge: How fast will grid computers complete the same computation?

Two Computational Issues

2. Brute-force optimization is computationally expensive.

Challenge: How fast will grid computers complete the same computation when it is repeated 100 times, each time with a different pair of C and gamma values?
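A rough sketch of the brute-force search, under stated assumptions: the grid values are hypothetical, `evaluate` is a toy placeholder rather than a real cross-validation run, and a local thread pool stands in for grid nodes (a real grid would submit jobs through its own scheduler). At ~12 s per run, 100 runs take about 1200 s (20 minutes) in serial, and the runs are independent, so they can be dispatched in parallel.

```python
import itertools
import math
from concurrent.futures import ThreadPoolExecutor

SECONDS_PER_RUN = 12  # approximate single-run time quoted in the slides

# Hypothetical log-spaced grid: 10 values of C x 10 values of gamma = 100 pairs.
c_values = [1e-3, 1e-2, 1e-1, 1.0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6]
gamma_values = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0, 1e1, 1e2, 1e3]
grid = list(itertools.product(c_values, gamma_values))


def evaluate(params):
    """Placeholder for one leave-one-out run at a given (C, gamma).

    Returns a toy score that peaks at C=1, gamma=0.1; it is NOT a real accuracy.
    """
    c, gamma = params
    return -(abs(math.log10(c)) + abs(math.log10(gamma) + 1.0))


# Each (C, gamma) pair is independent, so the runs can be farmed out.
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = list(pool.map(evaluate, grid))

best_score, best_params = max(zip(scores, grid))
estimated_serial_seconds = len(grid) * SECONDS_PER_RUN
print(best_params, estimated_serial_seconds)
```

Because no run depends on another, the ideal wall-clock time with N workers is roughly the serial time divided by N, before scheduling and data-transfer overheads.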

Practical