CS 6890 Offered by Charles Yan Presented by: Jyothi Sankuri

Application of support vector machines for T-cell epitopes prediction
By Yingdong Zhao, Clemencia Pinilla, Danila Valmori, Roland Martin and Richard Simon


Overview

  • Introduction
    • Problem
    • Why ?
    • T-Cell epitopes
    • SVMs
    • Results
  • Support Vector Machines( SVMs)
    • SVM Principle
    • Kernel Function
  • Systems and Methods
  • Results
  • Discussions & Conclusions
  • Remarks & Future Work
  • References
  • Problem

Training of SVMs for the prediction of T-cell epitopes

  • T-Cell epitopes

Antigenic determinants recognized and bound by the T-cell receptor. Epitopes are the antigenic determinants of an antigen by which the immune system recognizes it as an "antigen".

  • SVMs

Support Vector Machines

  • Why prediction of T-cell epitopes ?

Prediction of T-cell Epitopes

  • The T-cell receptor
    • Together with a major histocompatibility complex (MHC) molecule, it plays a major role in the process of antigen-specific T-cell activation.
    • One receptor may recognize thousands of different peptides.
  • Deciphering the patterns of peptides that elicit a MHC restricted T-cell response is critical for vaccine development.
    • A crucial step in the design of subunit vaccines is the identification of T-cell epitopes in sets of disease-specific gene products.
  • Identifying characteristic patterns of immunogenic peptide epitopes can provide fundamental information for understanding disease pathogenesis and etiology, and for therapeutics such as vaccine development.
Why SVMs?
  • Because of the ability of SVMs to build effective predictive models when the dimensionality of the data is high and the number of observations is limited.
  • SVMs are based on a strong theoretical foundation for avoiding over-fitting training data.
  • SVMs are one of the most powerful new techniques and have been effective in DNA sequence analysis, protein structure prediction, and gene expression pattern discovery.
Different methods used for T-cell Epitope Prediction :
  • ANNs
  • Decision Tree Classifiers
  • Score Matrix Based Approach
  • SVMs
Systems and Methods

  • Definition
  • Training and Test Data sets
  • Training SVM
    • Structural Risk Minimization Induction Principle
    • Kernel functions
    • Leave-One-Out Cross-Validation
  • Reasons for using SVMs for T-cell epitope prediction
SVM Principle

  • Support Vector Machines are based on the concept of decision planes that define decision boundaries. A decision plane is one that separates a set of objects having different class memberships.
  • An SVM performs classification by constructing an N-dimensional hyperplane that optimally separates the data into two categories.
  • A set of features that describes one case is called a vector.
  • The goal of SVM modeling is to find the optimal hyperplane that separates clusters of vectors in such a way that cases with one category of the target variable are on one side of the plane and cases with the other category are on the other side of the plane. The vectors nearest the hyperplane are the support vectors.
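The separating-hyperplane idea above can be sketched on made-up 2-D data (not the paper's peptide data), using scikit-learn's SVC as an illustrative stand-in for the SVM software used in the paper:

```python
# Hypothetical toy illustration: a linear SVM separates two 2-D point
# clouds; the support vectors are the points lying closest to the
# separating hyperplane.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Two well-separated clusters, labelled -1 and +1.
X = np.vstack([rng.randn(20, 2) - 2.0, rng.randn(20, 2) + 2.0])
y = np.array([-1] * 20 + [+1] * 20)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print("weights w:", clf.coef_[0])
print("bias b:", clf.intercept_[0])
print("number of support vectors:", len(clf.support_vectors_))
print("training accuracy:", clf.score(X, y))
```

Only the few support vectors determine the learned plane; the other training points could be removed without changing it.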
General SVM function

u = Σ_i a_i y_i K(x_i, x) − b    (1)

  • u : SVM output
  • a_i : weights on the training examples (Lagrange multipliers)
  • y_i in {−1, +1} : desired output
  • b : threshold
  • x_i : stored training example (support vector)
  • x : input vector
  • K : kernel function measuring the similarity of x_i to x
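Equation (1) can be sketched directly in a few lines of numpy; all numbers below are made-up illustrations, not values from the paper:

```python
# Minimal sketch of Eq. (1): the SVM output u for an input x is a
# kernel-weighted vote of the stored support vectors, minus a threshold.
import numpy as np

def svm_output(x, support_vecs, alphas, labels, b, kernel):
    """u = sum_i a_i * y_i * K(x_i, x) - b; sign(u) is the predicted class."""
    return sum(a * y * kernel(xi, x)
               for a, y, xi in zip(alphas, labels, support_vecs)) - b

linear_kernel = lambda xi, x: float(np.dot(xi, x))  # inner-product kernel

support_vecs = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]
alphas = [0.5, 0.5]          # a_i: learned weights (illustrative values)
labels = [+1, -1]            # y_i in {-1, +1}
b = 0.0                      # threshold

u = svm_output(np.array([2.0, 3.0]), support_vecs, alphas, labels, b, linear_kernel)
print(u)                      # 0.5*1*2 + 0.5*(-1)*(-2) - 0 = 2.0
print(+1 if u > 0 else -1)    # predicted class
```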
Training and Test Data sets
  • Dividing the peptides (data) into positive and negative groups.
  • Randomly sampling each group.
  • 80% forms the training set and 20% the test set.
  • Combining the two groups, separately, into the training and test sets.
  • Using pairwise Pearson correlation coefficients to ensure peptides were dissimilar.
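A minimal sketch of the split described above, with hypothetical peptide identifiers (the Pearson-dissimilarity filter is omitted here):

```python
# Sample each class separately so both the training set (80%) and the
# test set (20%) contain positives and negatives, then combine them.
import random

def split_group(items, train_frac=0.8, seed=0):
    items = list(items)
    random.Random(seed).shuffle(items)   # random sampling within the group
    cut = int(len(items) * train_frac)
    return items[:cut], items[cut:]

positives = [f"pos_{i}" for i in range(10)]   # hypothetical positive peptides
negatives = [f"neg_{i}" for i in range(10)]   # hypothetical negative peptides

pos_train, pos_test = split_group(positives)
neg_train, neg_test = split_group(negatives)
train = pos_train + neg_train   # 16 peptides
test = pos_test + neg_test      # 4 peptides
print(len(train), len(test))    # 16 4
```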
Training SVM

  • Training is done using SVMlight.
  • 100 input values with class values of +1 and −1.
  • Classes are separated by maximizing the margin.
  • An SVM analysis finds the line (or, in general, hyperplane) oriented so that the margin between the support vectors is maximized. In the accompanying figure, the line in the right panel is superior to the line in the left panel.
  • Implementing the Structural Risk Minimization Principle.
  • The SVM classification function used here is f(x) = Σ_i a_i y_i K(x_i, x) − b, with the class given by the sign of f(x).
  • For a linear SVM, the inner-product kernel is used.
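As a hedged sketch of this training step (scikit-learn's linear SVC standing in for SVMlight, on synthetic 100-dimensional inputs with class labels +1/−1), the margin width 2/||w|| can be read off the learned weights:

```python
# Synthetic stand-in for the 100-input training setup; the labels are
# made up (a linear function of two features), not real peptide data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(1)
n, d = 60, 100                               # more features than examples
X = rng.randn(n, d)
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)   # synthetic +1/-1 labels

clf = SVC(kernel="linear", C=1.0).fit(X, y)
w = clf.coef_[0]
# Maximizing the margin is equivalent to minimizing ||w||: the
# geometric margin between the two support-vector planes is 2/||w||.
print("margin width 2/||w|| =", 2.0 / np.linalg.norm(w))
print("training accuracy:", clf.score(X, y))
```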

Kernel Function

  • The kernel function may transform the data into a higher dimensional space to make it possible to perform the separation.
  • The concept of a kernel mapping function is very powerful. It allows SVM models to perform separations even with very complex boundaries.
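The higher-dimensional mapping can be made concrete for a quadratic kernel on 2-D inputs (an illustrative kernel, not the one used in the paper): (x·z)² equals an ordinary inner product after an explicit 3-D feature map φ.

```python
# For 2-D inputs, the quadratic kernel (x.z)^2 matches the inner product
# of the explicit feature map phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2),
# so the kernel implicitly works in a higher-dimensional space.
import math
import numpy as np

def phi(v):
    x1, x2 = v
    return np.array([x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

kernel_value = np.dot(x, z) ** 2          # (1*3 + 2*4)^2 = 121
feature_space = np.dot(phi(x), phi(z))    # same value, computed in 3-D
print(kernel_value, feature_space)        # 121.0 121.0
```

The kernel never constructs φ(x) explicitly, which is what makes complex separation boundaries affordable.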
  • If f(x) is positive then sample is predicted to be in class +1 else in -1.
  • The {ai} and ‘b’ coefficients are determined by ‘learning’ the data.
  • During learning on the 80% training set, leave-one-out cross-validation was used.
  • Training and testing were repeated ten times for randomly determined training/test set partitions.
  • Comparison with other methods is based on PPV, sensitivity values, and the area under the ROC curve.
    • Sensitivity is the proportion of all positive peptides that are correctly identified; it indicates the ability of the classifier to detect real epitopes.
    • PPV (positive predictive value) is the probability that a peptide predicted to be positive actually does stimulate the TCC.
      • PPV reflects the efficiency of the method. A classifier with low PPV will generate numerous non-stimulatory peptides for the next rounds of testing.
    • The Receiver Operating Characteristic (ROC) curve is a plot of sensitivity against 1 − specificity.
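These three measures can be sketched with made-up predictions (not the paper's results): sensitivity and PPV from confusion counts, and a simple rank-based area under the ROC curve.

```python
# Illustrative evaluation metrics on fabricated labels/scores.
def sensitivity_ppv(y_true, y_pred):
    """Sensitivity = TP/(TP+FN); PPV = TP/(TP+FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    return tp / (tp + fn), tp / (tp + fp)

def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == -1]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, -1, -1, -1]    # made-up ground truth
y_pred = [1, 1, -1, -1, -1, 1]    # made-up hard predictions
scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.6]   # made-up SVM outputs

sens, ppv = sensitivity_ppv(y_true, y_pred)
print(sens, ppv)                  # 2/3 and 2/3
print(roc_auc(y_true, scores))    # 7/9
```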
Results

b: The SVM model was trained on A*0201-restricted MHC binding data from the SYFPEITHI database.
c: The SVM model was trained on A*0201-restricted MHC binding data from the MHCPEP database.

The area under the averaged ROC curve was 0.919 for the SVM model and 0.833 for the score-matrix-based approach.

  • The ANN model had many more parameters than the SVM and required a larger number of training peptides for equivalent performance.
  • Decision tree classifiers are easy to overfit and require large training sets. The optimal decision tree classifier had a sensitivity considerably lower than that of the SVM.
  • In the development of an SVM, the input variables, the kernel function, and the learning parameters all play vital roles. Here the simple linear kernel performed best on the data set, compared to the polynomial and radial basis kernel functions.
  • Leave-one-out cross-validation was used to optimize the tuning parameters, which are in general chosen arbitrarily.
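A sketch of leave-one-out cross-validation for choosing a tuning parameter; the soft-margin constant C is an illustrative choice here, not necessarily the parameter tuned in the paper, and the data are synthetic.

```python
# Each candidate C is scored by training on all examples but one and
# predicting the held-out example, repeated over every example.
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC

rng = np.random.RandomState(2)
X = np.vstack([rng.randn(15, 4) - 1.0, rng.randn(15, 4) + 1.0])
y = np.array([-1] * 15 + [1] * 15)

def loo_accuracy(C):
    hits = 0
    for train_idx, test_idx in LeaveOneOut().split(X):
        clf = SVC(kernel="linear", C=C).fit(X[train_idx], y[train_idx])
        hits += int(clf.predict(X[test_idx])[0] == y[test_idx][0])
    return hits / len(X)

scores = {C: loo_accuracy(C) for C in (0.01, 1.0, 100.0)}
best_C = max(scores, key=scores.get)
print(scores, "best C:", best_C)
```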
Discussion
  • The comparisons clearly show that our SVM approach to predicting T-cell epitopes is superior to publicly available methods such as SVMHC.


  • The SVM model greatly improved prediction accuracy (as measured by the area under the ROC curve).
  • SVMs can be effectively used for predicting T-cell epitopes. Using physical property factors to encode the candidate peptides enables SVM classifiers to achieve good performance with modest amounts of synthesized peptide training data.
  • This makes for an efficient process of prediction and synthesis of additional peptides because positive peptides are most informative.
  • A support vector machine (SVM) was developed for T-cell epitope prediction with an MHC class I restricted T-cell clone for the first time.
  • SVMs can be trained on relatively small data sets to provide prediction more accurate than those based on MHC binding.
  • Further investigations of the use of SVMs for T-cell epitope prediction are warranted as a potentially efficient and powerful method for defining candidate autoantigens, finding the antigenic targets and molecular mimics in complex infectious organisms, and developing vaccines for infectious diseases and cancers.

  • Support Vector Machine (SVM) is one of the best statistical learning methods.
  • Its performance is significantly better than that of competing methods.
  • The goal is to provide biologists with a friendly tool to test their hypotheses.
Remarks & Future Work
  • Extending the development of the support vector machine (SVM) to T-cell epitope prediction with an MHC class II restricted T-cell clone.
  • Here a simple linear kernel function was used; future work could examine SVMs using polynomial and radial basis kernel functions.
  • Using different techniques to optimize the tuning parameters.
About Authors
  • Dr. Yingdong Zhao

He works at the National Cancer Institute, Biometric Research Branch, Division of Cancer Treatment and Diagnosis.

  • Clemencia Pinilla, Ph.D.

Torrey Pines Institute for Molecular Studies, San Diego, CA 92121, USA.

  • Danila Valmori

Ludwig Institute Clinical Trial Center, New York, NY, and Ludwig Institute for Cancer Research, Lausanne, Switzerland.

  • Roland Martin

Division of Clinical Onco-Immunology, Ludwig Institute for Cancer Research, University Hospital (CHUV), Lausanne, Switzerland.

  • Richard Simon

He works at the National Cancer Institute, Biometric Research Branch, Division of Cancer Treatment and Diagnosis.