
Protein Local 3D Structure Prediction by Super Granule Support Vector Machines (Super GSVM)

Dr. Bernard Chen

Assistant Professor

Department of Computer Science

University of Central Arkansas

Fall 2009


Goal of the Dissertation

  • The main purpose is to discover and extract protein sequence motif information that is universally conserved across protein family boundaries.

  • This information is then used for protein local 3D structure prediction.


Research Flow

  • Part 1: Bioinformatics Knowledge and Dataset Collection

  • Part 2: Discovering Protein Sequence Motifs

  • Part 3: Motif Information Extraction

  • Part 4: Protein Local Tertiary Structure Prediction


Data Set


HSSP Matrix: 1b25

[Figure: HSSP matrix for 1b25, shown across three slides]


Representation of Segment

  • Sliding window size: 9

  • Each window corresponds to a sequence segment, which is represented by a 9 × 20 matrix plus nine corresponding secondary structure annotations obtained from DSSP.

  • More than 560,000 segments (413 MB) are generated by this method.

  • DSSP: provides the secondary structure information
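The sliding-window representation described above can be sketched as follows. This is only an illustration under assumptions: the input is taken to be a per-residue profile of 20 values (as in an HSSP matrix) and a secondary structure string of the same length, and the names `make_segments`, `toy_profile`, and `toy_secondary` are hypothetical, not from the presentation.

```python
from typing import List, Tuple

WINDOW = 9  # sliding window size stated on the slide

Profile = List[List[float]]  # one row of 20 values per residue (assumed layout)


def make_segments(profile: Profile, secondary: str,
                  window: int = WINDOW) -> List[Tuple[Profile, str]]:
    """Slide a window along one protein.

    Returns one (window x 20 matrix, secondary-structure substring) pair
    per window position.
    """
    if len(profile) != len(secondary):
        raise ValueError("profile and secondary structure must have equal length")
    segments = []
    for start in range(len(profile) - window + 1):
        rows = [row[:] for row in profile[start:start + window]]
        segments.append((rows, secondary[start:start + window]))
    return segments


# Toy protein of 12 residues with a uniform profile (made-up data).
toy_profile = [[0.05] * 20 for _ in range(12)]
toy_secondary = "HHHHEEEECCCC"
segments = make_segments(toy_profile, toy_secondary)
print(len(segments))    # 12 - 9 + 1 = 4 windows
print(segments[0][1])   # HHHHEEEEC
```

A protein of length n yields n - 9 + 1 overlapping segments, which is why a few thousand proteins can produce several hundred thousand segments.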


Granular Computing Model

  • Original dataset

  • Fuzzy C-Means Clustering splits the dataset into Information Granule 1 ... Information Granule M

  • New Improved or Greedy K-means Clustering is applied to each information granule

  • Join Information

  • Final Sequence Motifs Information
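The first stage of the model above, splitting the dataset into overlapping information granules with fuzzy c-means, can be sketched in plain Python. This is a minimal sketch under assumptions, not the presenter's implementation: it uses the textbook FCM update rules with Euclidean distance, takes the first points as initial centers for determinism, and assigns a point to every granule whose membership reaches a threshold. The threshold value, the toy data, and all function names are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance (an assumed choice of metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def memberships(points: List[Vector], centers: List[Vector],
                m: float = 2.0) -> List[List[float]]:
    """FCM membership update: u[i][j] = 1 / sum_k (d_ij / d_ik)^(2/(m-1))."""
    u = []
    for p in points:
        d = [max(dist(p, c), 1e-12) for c in centers]  # avoid division by zero
        u.append([1.0 / sum((dj / dk) ** (2.0 / (m - 1.0)) for dk in d)
                  for dj in d])
    return u


def fuzzy_c_means(points: List[Vector], n_granules: int, m: float = 2.0,
                  iterations: int = 50) -> Tuple[List[Vector], List[List[float]]]:
    """Alternate membership and center updates for a fixed number of iterations."""
    centers = [p[:] for p in points[:n_granules]]  # assumed initialisation
    for _ in range(iterations):
        u = memberships(points, centers, m)
        for j in range(n_granules):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            centers[j] = [sum(w * p[k] for w, p in zip(weights, points)) / total
                          for k in range(len(points[0]))]
    return centers, memberships(points, centers, m)


def to_granules(u: List[List[float]], threshold: float) -> List[List[int]]:
    """Put each point into every granule whose membership reaches the threshold."""
    granules: List[List[int]] = [[] for _ in u[0]]
    for i, row in enumerate(u):
        for j, value in enumerate(row):
            if value >= threshold:
                granules[j].append(i)
    return granules


# Two tight groups plus one point (index 4) halfway between them.
points = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [0.9, 1.0], [0.5, 0.5]]
centers, u = fuzzy_c_means(points, n_granules=2)
granules = to_granules(u, threshold=0.3)
print(granules)  # the halfway point lands in both granules
```

Each granule can then be clustered independently, which is what allows the per-granule K-means runs to work on smaller inputs.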


Reduce Time Complexity

  • Wei’s method: 1285968 sec (15 days) × 6 = 7715568 sec (90 days)

  • Granular model: 154899 sec (FCM execution time) + 231720 sec (2.7 days) × 6 = 1545219 sec (18 days)
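The arithmetic behind these totals can be checked with a few lines of Python. The figures are taken from the slide; the variable names are illustrative only. Note that 6 × 1285968 evaluates to 7715808 sec, 240 sec more than the 7715568 sec shown on the slide, which may be a transcription slip in one of the two numbers; it does not change the rounded figure of about 90 days.

```python
SECONDS_PER_DAY = 24 * 60 * 60
FACTOR = 6  # the multiplier shown on the slide

wei_single = 1285968   # seconds, Wei's method (from the slide)
fcm_time = 154899      # seconds, FCM execution time (from the slide)
granule_run = 231720   # seconds, granular model term multiplied by 6

wei_total = wei_single * FACTOR
granular_total = fcm_time + granule_run * FACTOR

print(wei_total, round(wei_total / SECONDS_PER_DAY, 1))            # 7715808 89.3
print(granular_total, round(granular_total / SECONDS_PER_DAY, 1))  # 1545219 17.9
print(round(wei_total / granular_total, 1))                        # 5.0
```

Under these figures the granular model needs roughly one fifth of the running time.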


Comparison of Quality Measures


Super GSVM-FE Motivation

  • First, the information we try to generate is about sequence motifs, but the original input data are derived from whole protein sequences by a sliding window technique.

  • Second, fuzzy c-means clustering can assign one segment to more than one information granule.


Super GSVM-FE

  • Original dataset

  • Fuzzy C-Means Clustering, producing Information Granule 1 ... Information Granule M

  • Five iterations of traditional K-means (one branch per information granule)

  • Greedy K-means Clustering (one branch per information granule)

  • For each cluster: Ranking SVM Feature Elimination

  • For each cluster: Collect Survived Segments

  • Greedy K-means Clustering

  • Join Information

  • Final Sequence Motifs Information

  • Part of the diagram is labeled "Additional Portion"
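The "Collect Survived Segments" step can be illustrated with a small sketch. It assumes each cluster member already has a score from a ranking model (the scores below are made-up stand-ins for Ranking SVM output) and that a fixed fraction of the lowest-ranked members is dropped; the 20% default follows the "Eliminate 20% lower rank members" box on the Super GSVM slide. The function names are hypothetical.

```python
from typing import Dict, List, Sequence


def surviving_members(scores: Sequence[float],
                      drop_fraction: float = 0.2) -> List[int]:
    """Return the indices kept after dropping the lowest-scored fraction."""
    n_drop = int(len(scores) * drop_fraction)
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # lowest first
    return sorted(order[n_drop:])


def collect_survivors(cluster_scores: Dict[str, Sequence[float]],
                      drop_fraction: float = 0.2) -> Dict[str, List[int]]:
    """Apply the elimination independently to every cluster."""
    return {name: surviving_members(scores, drop_fraction)
            for name, scores in cluster_scores.items()}


# Hypothetical ranking scores for the members of two clusters.
toy_scores = {
    "cluster_a": [0.9, 0.1, 0.5, 0.7, 0.3],
    "cluster_b": [0.2, 0.8, 0.6, 0.4, 0.95, 0.05, 0.5, 0.7, 0.3, 0.1],
}
survivors = collect_survivors(toy_scores)
print(survivors["cluster_a"])  # [0, 2, 3, 4]
print(survivors["cluster_b"])  # [0, 1, 2, 3, 4, 6, 7, 8]
```

The surviving segments from all clusters are what the later clustering step would operate on.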


Extracted Motif Information


3D Information

  • 3D information is obtained from the PDB (Protein Data Bank).

  • Example: the PDB file for 1a3c


Testing Data

  • The latest release of PISCES includes 4345 PDB files.

  • Compared with the dataset used in our experiment, 2419 of these PDB files are not in our dataset.

  • Therefore, we use our 2710 protein files as the training dataset and the 2419 protein files as the independent testing dataset.


Testing Data (continued)

  • We convert the testing dataset using the approach introduced earlier.

  • More than 490,000 segments are generated as the testing dataset.


Super GSVM

  • Training dataset

  • Fuzzy C-Means Clustering, producing Information Granule 1 ... Information Granule M

  • Five iterations of traditional K-means (one branch per information granule)

  • Greedy K-means Clustering (one branch per information granule)

  • For each cluster: train a Ranking SVM, then eliminate the 20% lowest-ranked members

  • Collect all extracted clusters and Ranking SVMs (all sequence clusters, all Ranking SVMs)

  • Independent testing dataset: for each segment, find the closest cluster within a given distance threshold

  • Feed the segment to the SVM belonging to that cluster

  • If the rank belongs to the cluster, predict the local 3D structure

  • If not, find the next closest cluster
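The testing half of this flow (closest cluster within a threshold, check by that cluster's SVM, fall back to the next closest cluster) can be sketched as a simple loop. Everything here is an assumption made for illustration: the city-block distance, the `Cluster` class, the `accepts` callable standing in for a trained Ranking SVM, and the idea that each cluster carries one representative structure label.

```python
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


class Cluster:
    """One sequence cluster with its centroid and acceptance check."""

    def __init__(self, name: str, centroid: Vector, structure: str,
                 accepts: Callable[[Vector], bool]) -> None:
        self.name = name
        self.centroid = centroid
        self.structure = structure  # assumed: representative local 3D structure
        self.accepts = accepts      # stand-in for the cluster's Ranking SVM


def city_block(a: Vector, b: Vector) -> float:
    """City-block (L1) distance; the metric is an assumed choice."""
    return sum(abs(x - y) for x, y in zip(a, b))


def predict(segment: Vector, clusters: List[Cluster],
            threshold: float) -> Optional[str]:
    """Try clusters from closest to farthest; stop once the threshold is exceeded."""
    for cluster in sorted(clusters, key=lambda c: city_block(segment, c.centroid)):
        if city_block(segment, cluster.centroid) > threshold:
            break  # every remaining cluster is at least this far away
        if cluster.accepts(segment):
            return cluster.structure
    return None  # segment is not covered by any cluster


# Toy clusters with made-up acceptance rules.
clusters = [
    Cluster("A", [0.0, 0.0], "structure-A", lambda s: s[0] < 0.2),
    Cluster("B", [1.0, 1.0], "structure-B", lambda s: True),
]
print(predict([0.1, 0.1], clusters, threshold=2.0))  # structure-A
print(predict([0.3, 0.3], clusters, threshold=2.0))  # structure-B (A rejects it)
print(predict([5.0, 5.0], clusters, threshold=2.0))  # None
```

Segments that return `None` receive no prediction, which is why accuracy and coverage are reported as separate measures.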


Prediction Accuracy


Prediction Coverage


Future Works

  • Incorporate the Chou-Fasman parameter for SVM training


Future Works

  • For each cluster, we build a decision tree (DT) instead of an SVM model.

  • Training dataset

  • Fuzzy C-Means Clustering, producing Information Granule 1 ... Information Granule M

  • Five iterations of traditional K-means (one branch per information granule)

  • Greedy K-means Clustering (one branch per information granule)

  • For each cluster: build a decision tree

  • Collect all extracted clusters and Ranking-SVMs (all sequence clusters)

  • Independent testing dataset: for each segment, find the closest cluster within a given distance threshold

  • Feed the segment to the DT belonging to that cluster (test by DT)

  • If the rank belongs to the cluster, predict the local 3D structure

  • If not, find the next closest cluster