
A discriminative method for protein remote homology detection based on N-Gram





A discriminative method for protein remote homology detection based on N-Gram

Reporter: Xie Sifa

Mentor: Zou Quan



Outline

Introduction

Method

Improve P&R

Conclusion




Introduction

Protein homology detection

detect 10%~30%

protein structure

Remote homology detection

...ATTATCCGACGGCCGCCT...

...TCATCTGCACGGCCTCAC...

Similarity < 25%

-- "Fundamentals of Bioinformatics" (《生物信息学基础》), Sun Xiao, Lu Zuhong, Xie Jianming



Process

Data Set

Feature Extraction

Classify



Data Set

Benchmark (Liao and Noble, 2003)

4352 proteins

54 families

Similarity < 10^-25

[Figure: for each family i, the train set is labeled "same superfamily" and "different family"; the test set is labeled "same family" and "different family".]



Ngram

1Gram: 20

2Gram: 400

3Gram: 8000
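The counts above follow from the 20 standard amino acids: there are 20^n possible n-grams. A minimal sketch of turning one sequence into a fixed-length n-gram feature vector is given below. The function names and the choice of normalized frequencies are assumptions for illustration; the slides do not say how the features were weighted.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def ngram_vocabulary(n):
    """All 20**n possible n-grams, in a fixed (lexicographic) order."""
    return ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]


def ngram_features(sequence, n):
    """Normalized n-gram frequencies (assumed weighting, not from the slides)."""
    windows = max(len(sequence) - n + 1, 0)
    counts = Counter(sequence[i:i + n] for i in range(windows))
    total = max(windows, 1)  # avoid division by zero for short sequences
    return [counts[g] / total for g in ngram_vocabulary(n)]


for n in (1, 2, 3):
    print(n, len(ngram_vocabulary(n)))
```

Concatenating the 1-gram, 2-gram and 3-gram vectors would give 8420 features per protein, though whether the presenters concatenated them is not stated.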

"A Closer Look at Skip-gram Modelling"

--David Guthrie, Ben Allison et al.

Skip-Ngram:

"I hit the tennis ball"

3-grams: "I hit the", "hit the tennis", "the tennis ball"

Skip-gram: "hit the ball" !!!

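The tennis-ball example can be reproduced with a short sketch. It assumes the k-skip-n-gram convention in which at most k tokens are skipped in total; the function name `skip_ngrams` is illustrative, and the slides do not say which k was used on protein sequences.

```python
from itertools import combinations


def skip_ngrams(tokens, n, k):
    """All n-grams that keep token order and skip at most k tokens in total."""
    grams = []
    for start in range(len(tokens) - n + 1):
        # The remaining n-1 items are drawn from the next n-1+k positions,
        # so the whole gram spans at most n+k tokens.
        window = range(start + 1, min(start + n + k, len(tokens)))
        for rest in combinations(window, n - 1):
            grams.append(tuple(tokens[i] for i in (start,) + rest))
    return grams


sentence = "I hit the tennis ball".split()
print(skip_ngrams(sentence, 3, 0))  # plain 3-grams
print(("hit", "the", "ball") in skip_ngrams(sentence, 3, 1))
```

With k = 0 this reduces to ordinary n-grams; with k = 1 the sentence yields seven grams, including "hit the ball". The same function works on a protein sequence if a string of residues is passed as `tokens`.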


Random Forest

Ensemble !!!
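As a rough illustration of the ensemble idea only, the sketch below trains many weak classifiers on bootstrap samples, each using one randomly chosen feature, and combines them by majority vote. It uses decision stumps instead of full trees, so it is a simplified stand-in for Random Forest and not the presenters' implementation; all names and parameters are assumptions.

```python
import random


def train_stump(X, y, feature):
    """Best single-threshold split on one feature: (feature, t, left, right)."""
    best = None
    for t in sorted({row[feature] for row in X}):
        for left, right in ((0, 1), (1, 0)):
            errors = sum(
                (left if row[feature] <= t else right) != label
                for row, label in zip(X, y)
            )
            if best is None or errors < best[0]:
                best = (errors, (feature, t, left, right))
    return best[1]


def predict_stump(stump, row):
    feature, t, left, right = stump
    return left if row[feature] <= t else right


def train_ensemble(X, y, n_trees=25, seed=0):
    """Bagging plus a random feature per learner (toy Random-Forest-like setup)."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample
        feature = rng.randrange(len(X[0]))        # random feature choice
        stumps.append(
            train_stump([X[i] for i in idx], [y[i] for i in idx], feature)
        )
    return stumps


def predict_ensemble(stumps, row):
    """Majority vote over all stumps."""
    votes = sum(predict_stump(s, row) for s in stumps)
    return 1 if 2 * votes > len(stumps) else 0
```

In practice a library implementation of Random Forest would be used on the n-gram feature vectors; the point here is only that the final label comes from many votes, not from one tree.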



Result

the area under the ROC curve

up to first 50 false positives
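The score described above, the area under the ROC curve truncated at the first 50 false positives, is often called ROC50. A minimal sketch follows, assuming binary labels (1 = positive, 0 = negative) and no tied scores; the function name `roc_n` is illustrative.

```python
def roc_n(scores, labels, n=50):
    """Normalized area under the ROC curve up to the first n false positives."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    positives = sum(labels)
    tp = fp = area = 0
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
            area += tp  # true positives ranked above this false positive
            if fp == n:
                break
    if positives == 0 or fp == 0:
        return 0.0
    # With fewer than n negatives this equals the ordinary ROC area.
    return area / (fp * positives)


print(roc_n([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))  # 0.75
```

A score of 1.0 means every positive is ranked above the first n false positives.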





Improving Recall and Precision

Unbalanced data set

Trade-off
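The trade-off can be seen on a small made-up example with few positives and many negatives: lowering the decision threshold raises recall but lowers precision. The scores and labels below are invented for illustration and are not taken from the benchmark.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall of the positive class at a score threshold."""
    tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
    fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
    fn = sum(1 for s, l in zip(scores, labels) if s < threshold and l == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical unbalanced set: 2 positives, 8 negatives.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]

print(precision_recall(scores, labels, 0.85))  # (1.0, 0.5)
print(precision_recall(scores, labels, 0.55))  # (0.5, 1.0)
```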



Improving Recall and Precision

One family one threshold



Improving Recall and Precision

Train set

0.98+, 0.95+, 0.93+, 0.92+, 0.90-, 0.87-, 0.85+, 0.84-, 0.81+, 0.79+, 0.77-, 0.75-, 0.73-, 0.69+, 0.65-, 0.62-, 0.58-, 0.55-, 0.53-

F value

0.88, 0.85, 0.82, 0.79, 0.78, 0.76, 0.75, 0.72, 0.70, 0.68, 0.67, 0.63, 0.60, 0.57, 0.56, 0.54, 0.51, 0.49, 0.48

0.79

[Figure labels: New train, New test, F value]

no value but position!
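One way to read this slide: rank the train-set scores, compute the F value at every cut position, and keep the best position for that family. The sketch below does this for the 19 train-set labels listed on the slide. How the position was carried over to the test set is not stated, so `apply_position` (same fraction of the ranked list) is purely a hypothetical choice, as are all function names.

```python
def best_cut_position(ranked_labels):
    """Return (position, F) of the top-k cut with the highest F value.

    ranked_labels: 1 = positive, 0 = negative, sorted by descending score.
    """
    positives = sum(ranked_labels)
    best_pos, best_f, tp = 0, 0.0, 0
    for k, label in enumerate(ranked_labels, start=1):
        tp += label
        # F = 2PR/(P+R) with P = tp/k and R = tp/positives.
        f = 2 * tp / (k + positives)
        if f > best_f:
            best_pos, best_f = k, f
    return best_pos, best_f


def apply_position(test_scores, position, train_size):
    """Hypothetical transfer: label the same top fraction of the test ranking."""
    k = round(position / train_size * len(test_scores))
    order = sorted(range(len(test_scores)), key=lambda i: -test_scores[i])
    chosen = set(order[:k])
    return [1 if i in chosen else 0 for i in range(len(test_scores))]


# Train-set labels from the slide in ranked order ("+" = 1, "-" = 0).
TRAIN = "++++--+-++---+-----"
train_labels = [1 if c == "+" else 0 for c in TRAIN]
position, f_value = best_cut_position(train_labels)
print(position, round(f_value, 3))  # 10 0.778
```

On these labels the best cut is position 10, which is the sample scored 0.79, with F = 14/18. The F values printed on the slide are different numbers and are not reproduced by this calculation.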




Conclusion

1. The N-gram model is successfully used to detect protein remote homology. The result on the benchmark is satisfactory.

2. A novel method is proposed to improve the recall and precision of positive samples. This method yields a mean recall of 0.86752 and a mean precision of 0.56470.

