
A discriminative method for protein remote homology detection based on N-Gram



Reporter: Xie sifa. Mentor: Zou quan. Outline: Introduction, Method, Improve P&R, Conclusion.


Presentation Transcript


A discriminative method for protein remote homology detection based on N-Gram

Reporter: Xie sifa

Mentor: Zou quan



Outline

Introduction

Method

Improving Recall and Precision

Conclusion



Introduction

Protein homology detection

detect 10%~30%

protein structure

Remote homology detection

...ATTATCCGACGGCCGCCT...

...TCATCTGCACGGCCTCAC...

Similarity < 25%

-- "Fundamentals of Bioinformatics" (《生物信息学基础》), Sun Xiao, Lu Zuhong, Xie Jianming
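The similarity criterion above can be made concrete as a percent-identity calculation. The following is only a minimal sketch: it assumes the two sequences are already aligned to the same length, and the function name `percent_identity` and the toy sequences are hypothetical, not taken from the slides.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two aligned, equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


# Hypothetical toy sequences: 2 of 10 positions agree.
toy_a = "MKVLAAGIVG"
toy_b = "MRTQEEGLLS"
identity = percent_identity(toy_a, toy_b)
is_remote_candidate = identity < 25.0  # the "Similarity < 25%" criterion
```

In practice the sequences would first be aligned by a dedicated tool; this sketch only shows the arithmetic behind the percentage.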



Process

Data Set

Feature Extraction

Classify



Data Set

Benchmark (Liao and Noble, 2003)

4352 proteins, similarity < 10^-25

54 families

For family i:

Train set: same superfamily; different family

Test set: same family; different family



N-gram

1-gram: 20

2-gram: 400

3-gram: 8000
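The counts 20, 400 and 8000 are the number of possible 1-, 2- and 3-grams over an alphabet of 20 amino acids (20^n). As a minimal sketch of the feature extraction step, the code below turns a protein sequence into an n-gram frequency vector. The function names and the choice to normalise counts by the number of windows are assumptions for illustration; the slides do not say how the counts are scaled.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def ngram_vocabulary(n):
    """All 20**n possible n-grams, in a fixed lexicographic order."""
    return ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]


def ngram_features(sequence, n):
    """Frequency of every possible n-gram in `sequence` (assumed normalisation)."""
    windows = [sequence[i:i + n] for i in range(len(sequence) - n + 1)]
    counts = Counter(windows)
    total = max(len(windows), 1)  # avoid division by zero for short sequences
    # Counter returns 0 for n-grams that never occur.
    return [counts[gram] / total for gram in ngram_vocabulary(n)]


features = ngram_features("ACAC", 2)  # windows: AC, CA, AC
```

Each protein becomes a fixed-length vector (20, 400 or 8000 entries), which is what a standard classifier expects as input.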

"A Closer Look at Skip-gram Modelling"

--David Guthrie,Ben Allison et al

Skip-N-gram:

"I hit the tennis ball"

3-grams: "I hit the", "hit the tennis", "the tennis ball"

Skip-gram: "hit the ball" !!!

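The skip-gram idea from the tennis example can be sketched as follows: an n-gram is allowed to skip up to k tokens in total. The function name `skip_ngrams` is hypothetical, and this follows the general k-skip-n-gram definition rather than any implementation shown in the slides.

```python
from itertools import combinations


def skip_ngrams(tokens, n, k):
    """All n-grams that skip at most k tokens in total, ordered by start position."""
    grams = []
    for start in range(len(tokens)):
        # The remaining n-1 tokens must lie within the next n-1+k positions.
        window_end = min(start + n + k, len(tokens))
        for rest in combinations(range(start + 1, window_end), n - 1):
            grams.append(tuple(tokens[i] for i in (start,) + rest))
    return grams


words = "I hit the tennis ball".split()
trigrams = skip_ngrams(words, n=3, k=0)   # ordinary 3-grams
one_skip = skip_ngrams(words, n=3, k=1)   # also includes ("hit", "the", "ball")
```

Because the function only indexes into `tokens`, it can be applied to a protein sequence string in the same way, with each residue acting as a token.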


Random Forest

Ensemble !!!
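The ensemble idea behind a random forest is that many weak learners, each trained on a bootstrap sample with a randomly chosen feature, vote on the final label. The toy below is only a sketch of that idea: it uses one-threshold decision stumps instead of full trees, the function names and data are hypothetical, and it is not the classifier configuration used in the slides.

```python
import random
from collections import Counter


def fit_stump(X, y, feature):
    """Best single-threshold rule on one feature: `above` if x > t, else `below`."""
    best = None
    for t in sorted({row[feature] for row in X}):
        for above, below in ((1, 0), (0, 1)):
            errors = sum((above if row[feature] > t else below) != label
                         for row, label in zip(X, y))
            if best is None or errors < best[0]:
                best = (errors, t, above, below)
    _, t, above, below = best
    return feature, t, above, below


def fit_ensemble(X, y, n_members=25, seed=0):
    """Train stumps on bootstrap samples, each restricted to one random feature."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        while True:
            idx = [rng.randrange(len(X)) for _ in X]   # bootstrap sample
            if len({y[i] for i in idx}) == 2:          # keep both classes present
                break
        feature = rng.randrange(len(X[0]))             # random feature choice
        members.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return members


def predict(members, row):
    """Majority vote over all ensemble members."""
    votes = Counter(above if row[f] > t else below for f, t, above, below in members)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: class 1 has high values on both features.
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_ensemble(X, y)
```

A real random forest grows full decision trees and samples a feature subset at every split; a library implementation would normally be used for the 20- to 8000-dimensional n-gram vectors.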



Result

The area under the ROC curve up to the first 50 false positives (ROC50)
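This metric can be computed directly from a ranked list of predictions. The sketch below assumes higher scores mean "more likely positive", ignores tied scores, and normalises by the number of false positives actually seen when fewer than n negatives exist; the function name `roc_n_score` and that last convention are assumptions, not details from the slides.

```python
def roc_n_score(scores, labels, n=50):
    """Area under the ROC curve up to the first n false positives, scaled to [0, 1]."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(1 for _, label in ranked if label)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    true_pos = 0
    false_pos = 0
    area = 0
    for _, label in ranked:
        if label:
            true_pos += 1
        else:
            false_pos += 1
            area += true_pos  # positives ranked above this false positive
            if false_pos == n:
                break
    if false_pos == 0:
        return 1.0  # no negatives at all: every positive is ranked first
    return area / (false_pos * total_pos)


example = roc_n_score([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

With n at least as large as the number of negatives, the value coincides with the ordinary area under the ROC curve.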



Improving Recall and Precision

Unbalanced data set

Trade-off between recall and precision
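The trade-off is easiest to see by computing precision, recall and the F value at different score cutoffs. This is a minimal sketch with hypothetical scores for an unbalanced set (2 positives, 6 negatives); the function name is illustrative only.

```python
def precision_recall_f(scores, labels, threshold):
    """Precision, recall and F value when samples scoring >= threshold are called positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Hypothetical unbalanced data: few positives, many negatives.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 0, 0, 0]

strict = precision_recall_f(scores, labels, 0.85)  # high precision, low recall
loose = precision_recall_f(scores, labels, 0.5)    # lower precision, full recall
```

Raising the cutoff removes false positives but also loses true positives, so neither precision nor recall alone is a sufficient target.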



One family one threshold
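One way to read "one family one threshold" is that each family gets its own score cutoff instead of a single global one. The slides do not spell out the selection rule, so the sketch below assumes the cutoff is the training score that maximises the F value; the function names, family names and scores are all hypothetical.

```python
def f_value(scores, labels, threshold):
    """F value when samples scoring >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, labels):
    """Try every observed score as a cutoff; return (threshold, F value) of the best."""
    best_t, best_f = None, -1.0
    for t in sorted(set(scores), reverse=True):
        f = f_value(scores, labels, t)
        if f > best_f:  # ties keep the higher (stricter) threshold
            best_t, best_f = t, f
    return best_t, best_f


def thresholds_per_family(training_scores):
    """Map each family name to its own best cutoff."""
    return {family: best_threshold(scores, labels)[0]
            for family, (scores, labels) in training_scores.items()}


# Hypothetical training scores for two families.
training_scores = {
    "family_a": ([0.9, 0.8, 0.6, 0.4], [1, 1, 0, 0]),
    "family_b": ([0.5, 0.4, 0.3, 0.2], [1, 0, 1, 0]),
}
cutoffs = thresholds_per_family(training_scores)
```

Here the two families end up with different cutoffs, which a single global threshold could not provide.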



Train set (scores, each marked + or -):

0.98+, 0.95+, 0.93+, 0.92+, 0.90-, 0.87-, 0.85+, 0.84-, 0.81+, 0.79+, 0.77-, 0.75-, 0.73-, 0.69+, 0.65-, 0.62-, 0.58-, 0.55-, 0.53-

F value:

0.88, 0.85, 0.82, 0.79, 0.78, 0.76, 0.75, 0.72, 0.70, 0.68, 0.67, 0.63, 0.60, 0.57, 0.56, 0.54, 0.51, 0.49, 0.48

0.79

New train, New test: F value

No value, but position!



Conclusion

1. The N-gram model is successfully used to detect protein remote homology. The results on the benchmark are satisfactory.

2. A novel method is proposed to improve the recall and precision of positive samples. This method yields values of 0.86752 and 0.56470 for mean recall and mean precision, respectively.

