My Research Work and Clustering

Bernard Chen 2009


Outline

  • Introduction

  • Experimental Setup

  • Clustering

  • Future Works


Central Dogma of Molecular Biology


Amino Acids, the subunits of proteins


Protein Primary, Secondary, and Tertiary Structure


Protein 3D Structure


Protein Sequence Motif

  • Although there are 20 amino acids, protein primary structures are not built by choosing randomly among them

  • Sequence Motif:

    A relatively small number of functionally or structurally conserved sequence patterns that occur repeatedly in a group of related proteins.


Protein Sequence Motif

These biologically significant regions or residues are usually:

  • Enzyme catalytic sites

  • Prosthetic group attachment sites (heme, pyridoxal-phosphate, biotin…)

  • Amino acids involved in binding a metal ion

  • Cysteines involved in disulfide bonds

  • Regions involved in binding a molecule (ATP/ADP, GDP/GTP, Ca, DNA…)


Goals of our group

  • The main purpose is to extract protein sequence motifs that are universally conserved across protein family boundaries.

  • To study the relation between protein primary structure and tertiary structure


Experimental Setup: HSSP matrix of 1b25


Segment Representation

  • Sliding window size: 9

  • Each window corresponds to a sequence segment, represented by a 9 × 20 matrix plus nine additional values giving the corresponding secondary structure, obtained from DSSP.

  • More than 560,000 segments (413 MB) are generated by this method.

  • DSSP: used to obtain secondary structure information
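The segment representation above can be sketched as follows. This is a minimal illustration, not the group's code: it assumes the HSSP profile has already been parsed into one row of 20 amino-acid frequencies per residue, and that the DSSP output has been reduced to a string with one secondary-structure letter per residue. The function name `make_segments` is hypothetical.

```python
from typing import List, Tuple

WINDOW = 9  # sliding window size stated on the slide

def make_segments(profile: List[List[float]], ss: str,
                  window: int = WINDOW) -> List[Tuple[List[List[float]], str]]:
    """Cut a per-residue profile (assumed: one 20-value row per residue) and
    its secondary-structure string (assumed: one letter per residue) into
    overlapping windows. Each segment is a (window x 20) block of profile
    rows plus the matching `window` secondary-structure labels."""
    if len(profile) != len(ss):
        raise ValueError("profile and secondary-structure string must align")
    segments = []
    # Slide one residue at a time: a protein of length L yields L - window + 1 segments.
    for start in range(len(profile) - window + 1):
        rows = profile[start:start + window]
        labels = ss[start:start + window]
        segments.append((rows, labels))
    return segments

# Toy usage: 12 residues with uniform frequencies.
toy_profile = [[1.0 / 20] * 20 for _ in range(12)]
segs = make_segments(toy_profile, "HHHHEEEECCCC")
print(len(segs))    # 4
print(segs[0][1])   # HHHHEEEEC
```

Applied to every protein in the dataset, this kind of windowing is what produces a large segment count from a modest number of sequences.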


Clustering Algorithms

  • We used two clustering algorithms in our approach:

    • K-means clustering

    • Fuzzy C-means clustering


K-means Clustering
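For reference, a minimal sketch of standard K-means (Lloyd's algorithm) in plain Python. It uses squared Euclidean distance and random initial centroids; the distance measure, the initialization, and the function names are assumptions for illustration and may differ from what was actually run on the segment data.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points: List[Vector], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Plain Lloyd's K-means. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialize centroids with k points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return centroids, labels

points = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
centroids, labels = kmeans(points, 2)
print(labels[0] != labels[3])  # True
```

Each point belongs to exactly one cluster here, which is the main contrast with fuzzy C-means.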


Fuzzy C-means Clustering
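A minimal sketch of standard fuzzy C-means, for comparison with K-means: every point receives a membership in each cluster (each row of memberships sums to 1), and the centers are membership-weighted means. The fuzzifier m = 2, the stopping tolerance, and the function name are assumptions, not parameters taken from the slides.

```python
import random
from typing import List, Tuple

Vector = List[float]

def fuzzy_cmeans(points: List[Vector], c: int, m: float = 2.0,
                 iters: int = 100, tol: float = 1e-6,
                 seed: int = 0) -> Tuple[List[Vector], List[List[float]]]:
    """Standard fuzzy C-means. Returns (centers, U), where U[i][j] is the
    membership of point i in cluster j."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalized so each row sums to 1.
    U = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        U.append([u / s for u in row])
    centers: List[Vector] = []
    for _ in range(iters):
        # Center update: mean of all points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [U[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        new_U = []
        for p in points:
            # Floor the distance to avoid dividing by zero when a point sits on a center.
            dists = [max(sum((a - b) ** 2 for a, b in zip(p, v)) ** 0.5, 1e-12)
                     for v in centers]
            new_U.append([1.0 / sum((dists[j] / dk) ** (2 / (m - 1)) for dk in dists)
                          for j in range(c)])
        shift = max(abs(a - b) for ru, rn in zip(U, new_U) for a, b in zip(ru, rn))
        U = new_U
        if shift < tol:
            break  # memberships have stabilized
    return centers, U

points = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
centers, U = fuzzy_cmeans(points, 2)
```

The membership matrix `U` is what allows one segment to be associated with more than one cluster.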


Granular Computing Model

  • Original dataset → Fuzzy C-Means Clustering → Information Granule 1, ..., Information Granule M

  • K-means Clustering is applied to each information granule

  • Join Information → Final Sequence Motifs Information
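The split and join steps of the model can be sketched as below. How segments were assigned to granules is not stated on the slide; this sketch assumes a segment joins every FCM cluster in which its membership reaches a hypothetical `threshold`, and treats "join" as simply pooling the clusters found in all granules. `cluster_fn` stands in for the per-granule K-means run; all names here are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

def split_into_granules(U: Sequence[Sequence[float]],
                        threshold: float) -> List[List[int]]:
    """For each FCM cluster j, list the indices of segments whose membership
    in j is at least `threshold` (assumed rule). A segment may fall into
    several granules, or into none."""
    num_clusters = len(U[0])
    return [[i for i, row in enumerate(U) if row[j] >= threshold]
            for j in range(num_clusters)]

def granular_cluster(points: Sequence[object],
                     U: Sequence[Sequence[float]],
                     threshold: float,
                     cluster_fn: Callable[[List[object]], List[int]]) -> List[List[int]]:
    """Cluster each granule separately with `cluster_fn` (standing in for
    K-means) and join the results into one list of index groups."""
    joined: List[List[int]] = []
    for members in split_into_granules(U, threshold):
        if not members:
            continue  # skip empty granules
        labels = cluster_fn([points[i] for i in members])  # one label per member
        groups: Dict[int, List[int]] = {}
        for idx, label in zip(members, labels):
            groups.setdefault(label, []).append(idx)
        joined.extend(groups.values())  # "join": pool clusters from all granules
    return joined

U = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9], [0.5, 0.5]]
print(split_into_granules(U, 0.5))  # [[0, 1, 4], [2, 3, 4]]
print(granular_cluster(list(range(5)), U, 0.5, lambda pts: [0] * len(pts)))  # [[0, 1, 4], [2, 3, 4]]
```

Because each K-means run only sees one granule, every run works on a fraction of the full dataset, which is the intuition behind the space and time savings.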


Motivation


Reduce Space Complexity

Table 1: Summary of results obtained by FCM


Reduce Time Complexity

Wei’s method: 1285968 sec (15 days) × 6 = 7715568 sec (90 days)

Granular model: 154899 sec (FCM execution time) + 231720 sec (2.7 days) × 6 = 1545219 sec (18 days)
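The granular model timing can be checked with a few lines of arithmetic. The constants are copied from the slide; reading 231720 sec as the term annotated "2.7 days" and 154899 sec as the FCM execution time is an interpretation of the slide layout.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def to_days(seconds: float) -> float:
    return seconds / SECONDS_PER_DAY

fcm_time = 154_899        # FCM execution time (from the slide)
kmeans_stage = 231_720    # term multiplied by 6, annotated 2.7 days (from the slide)
granular_total = fcm_time + kmeans_stage * 6
wei_total = 7_715_568     # total reported for Wei's method (from the slide)

print(granular_total)                        # 1545219
print(round(to_days(kmeans_stage), 1))       # 2.7
print(round(to_days(granular_total), 1))     # 17.9
print(round(wei_total / granular_total, 2))  # 4.99
```

With these figures the granular model needs roughly one fifth of the total time reported for Wei's method.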


HSSP-BLOSUM62 Measure


Future Works

