My Research Work and Clustering

Bernard Chen 2009


Outline

  • Introduction

  • Experimental Setup

  • Clustering

  • Future Works


Central Dogma of Molecular Biology


Amino Acids, the Subunits of Proteins


Protein Primary, Secondary, and Tertiary Structure


Protein 3D Structure


Protein Sequence Motif

  • Although there are 20 amino acids, protein primary structures are not built by choosing among them at random.

  • Sequence Motif:

    A relatively small number of functionally or structurally conserved sequence patterns that occur repeatedly in a group of related proteins.


Protein Sequence Motif

These biologically significant regions or residues are usually:

  • Enzyme catalytic site

  • Prosthetic group attachment sites

    (heme, pyridoxal-phosphate, biotin…)

  • Amino acids involved in binding a metal ion

  • Cysteines involved in disulfide bonds

  • Regions involved in binding a molecule (ATP/ADP, GDP/GTP, Ca, DNA…)


Goal of Our Group

  • The main purpose is to obtain and extract protein sequence motifs that are universally conserved across protein family boundaries.

  • To discuss the relation between protein primary structure and tertiary structure.



Experimental Setup: HSSP matrix: 1b25


HSSP matrix: 1b25


Representation of Segment

  • Sliding window size: 9

  • Each window corresponds to a sequence segment, which is represented by a 9 × 20 matrix plus the nine corresponding secondary structure assignments obtained from DSSP.

  • More than 560,000 segments (413 MB) are generated by this method.

  • DSSP: used to obtain secondary structure information
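The segment representation above can be sketched as a sliding window over one protein. This is a minimal illustration, not the original pipeline: it assumes each protein's HSSP profile is already parsed into one row of 20 values per residue and its DSSP secondary structure into a string with one label per residue. The function name `make_segments` and the input format are hypothetical.

```python
from typing import List, Sequence, Tuple

WINDOW = 9            # sliding window size from the slide
NUM_AMINO_ACIDS = 20  # columns of the HSSP profile

# (9 x 20 profile matrix, 9 secondary structure labels)
Segment = Tuple[List[List[float]], str]


def make_segments(profile: Sequence[Sequence[float]],
                  secondary: str,
                  window: int = WINDOW) -> List[Segment]:
    """Cut one protein into overlapping window-sized segments.

    Assumed (hypothetical) input format: `profile` has one row of 20 values
    per residue, `secondary` has one DSSP label per residue.
    """
    if len(profile) != len(secondary):
        raise ValueError("profile and secondary structure must have equal length")
    for row in profile:
        if len(row) != NUM_AMINO_ACIDS:
            raise ValueError("each profile row must have 20 entries")
    segments: List[Segment] = []
    # A protein of length L yields L - window + 1 segments (none if L < window).
    for start in range(len(profile) - window + 1):
        matrix = [list(row) for row in profile[start:start + window]]
        labels = secondary[start:start + window]
        segments.append((matrix, labels))
    return segments


if __name__ == "__main__":
    length = 12
    toy_profile = [[1.0 / NUM_AMINO_ACIDS] * NUM_AMINO_ACIDS for _ in range(length)]
    toy_ss = "HHHHEEEECCCC"
    segs = make_segments(toy_profile, toy_ss)
    print(len(segs))  # 12 - 9 + 1 = 4 segments
    print(len(segs[0][0]), len(segs[0][0][0]), segs[0][1])
```

Overlapping windows (stride 1) are what make the segment count large: every residue position that can start a full window produces one segment.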



Clustering Algorithms

  • We used two clustering algorithms in our approach:

    • K-means Clustering

    • Fuzzy C-means Clustering


K-means Clustering
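A minimal sketch of standard (Lloyd's) K-means over generic feature vectors, for example flattened 9 × 20 segment matrices. It assumes plain Euclidean distance and random initial centers drawn from the data; the slides do not state the distance measure or initialization used here, so both, and the function names, are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (assumed measure for this sketch)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Sequence[float]], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Lloyd's K-means: returns (centers, cluster label of each point)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Initialize the k centers with k distinct data points.
    centers = [list(p) for p in rng.sample(list(points), k)]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]]
    centers, labels = kmeans(pts, 2)
    print(labels)
    print(centers)
```

Each point belongs to exactly one cluster, which is the "hard" assignment that distinguishes K-means from fuzzy C-means.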



Fuzzy C-means Clustering
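A minimal sketch of standard fuzzy C-means (FCM), where every point receives a membership degree in every cluster instead of a single label. It assumes Euclidean distance, a fuzzifier m = 2 and random initial memberships; these settings and the function name are illustrative assumptions, not parameters taken from this work. Here u_ij is the membership of point j in cluster i, stored as `u[j][i]`.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def fuzzy_c_means(points: Sequence[Sequence[float]], c: int, m: float = 2.0,
                  max_iter: int = 100, tol: float = 1e-6,
                  seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Standard FCM. Returns (centers, u); every row u[j] sums to 1."""
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    if not 1 <= c <= len(points):
        raise ValueError("c must be between 1 and the number of points")
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships, normalized so each point's row sums to 1.
    u: Matrix = []
    for _ in points:
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers: Matrix = [[0.0] * dim for _ in range(c)]
    exponent = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Center update: mean of all points weighted by u_ij ** m.
        for i in range(c):
            weights = [row[i] ** m for row in u]
            wsum = sum(weights)
            if wsum > 0.0:
                centers[i] = [sum(w * p[d] for w, p in zip(weights, points)) / wsum
                              for d in range(dim)]
        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj) ** (2 / (m - 1)).
        max_change = 0.0
        for j, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, v)) ** 0.5 for v in centers]
            zeros = [i for i, dist in enumerate(dists) if dist == 0.0]
            if zeros:
                # Point sits exactly on one or more centers: share membership among them.
                new_row = [1.0 / len(zeros) if i in zeros else 0.0 for i in range(c)]
            else:
                new_row = [1.0 / sum((dists[i] / dists[k]) ** exponent for k in range(c))
                           for i in range(c)]
            max_change = max(max_change, max(abs(a - b) for a, b in zip(new_row, u[j])))
            u[j] = new_row
        if max_change < tol:
            break  # memberships stabilized
    return centers, u


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]]
    centers, u = fuzzy_c_means(pts, 2)
    for row in u:
        print([round(x, 3) for x in row])
```

Because memberships are graded, a point lying between clusters can have substantial membership in more than one of them.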



Granular Computing Model

Original dataset → Fuzzy C-Means Clustering → Information Granule 1, ..., Information Granule M → K-means Clustering (one run per information granule) → Join Information → Final Sequence Motifs Information
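The granular model's flow (FCM splits the dataset into information granules, K-means clusters each granule, and the results are joined) can be sketched as follows. This is a hedged illustration under stated assumptions: the FCM membership matrix is taken as given, a segment is assumed to join every granule in which its membership reaches a hypothetical `threshold`, each granule uses the same hypothetical `k_per_granule`, and "join" is modeled as simply concatenating the per-granule clusters. None of these choices is confirmed by the slides.

```python
import random
from typing import Dict, List, Sequence


def split_into_granules(memberships: Sequence[Sequence[float]],
                        threshold: float) -> List[List[int]]:
    """For each FCM cluster, list the indices of points with membership >= threshold.

    A point may fall into several granules, or into none, depending on the threshold.
    """
    granules: List[List[int]] = [[] for _ in range(len(memberships[0]))]
    for j, row in enumerate(memberships):
        for g, u in enumerate(row):
            if u >= threshold:
                granules[g].append(j)
    return granules


def small_kmeans(points: Sequence[Sequence[float]], k: int,
                 iters: int = 50, seed: int = 0) -> List[int]:
    """Compact Lloyd's K-means (Euclidean) returning one label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sum((a - b) ** 2
                                                      for a, b in zip(p, centers[c])))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def granular_model(points: Sequence[Sequence[float]],
                   memberships: Sequence[Sequence[float]],
                   threshold: float, k_per_granule: int) -> List[List[int]]:
    """Cluster each granule separately and join all clusters (as original point indices)."""
    if k_per_granule < 1:
        raise ValueError("k_per_granule must be at least 1")
    joined: List[List[int]] = []
    for g, idx in enumerate(split_into_granules(memberships, threshold)):
        if not idx:
            continue  # empty granule: nothing to cluster
        sub_points = [points[j] for j in idx]
        labels = small_kmeans(sub_points, min(k_per_granule, len(idx)), seed=g)
        # Map granule-local labels back to original point indices.
        clusters: Dict[int, List[int]] = {}
        for j, lab in zip(idx, labels):
            clusters.setdefault(lab, []).append(j)
        joined.extend(clusters[lab] for lab in sorted(clusters))
    return joined
```

Splitting first means each K-means run only sees one granule, which is the source of the space and time savings discussed next.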


Motivation


Reduce Space Complexity

Table 1: Summary of results obtained by FCM


Reduce Time Complexity

Wei’s method: 1,285,968 sec (≈15 days) × 6 = 7,715,808 sec (≈90 days)

Granular Model: 154,899 sec (FCM execution time) + 231,720 sec (≈2.7 days) × 6 = 1,545,219 sec (≈18 days)
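The time comparison can be recomputed from the per-run figures. This small sketch assumes, from the slide layout, that 154,899 sec is the one-time FCM cost and 231,720 sec is the cost repeated in each of the 6 runs; the variable names are illustrative.

```python
SECONDS_PER_DAY = 86_400
RUNS = 6  # the "* 6" factor on the slide

wei_single = 1_285_968       # one run of Wei's method (sec)
fcm_once = 154_899           # FCM execution time (sec), paid once (assumed)
granular_per_run = 231_720   # per-run cost in the granular model (sec, assumed)

wei_total = wei_single * RUNS
granular_total = fcm_once + granular_per_run * RUNS

for name, sec in [("Wei's method", wei_total), ("Granular Model", granular_total)]:
    print(f"{name}: {sec:,} sec = {sec / SECONDS_PER_DAY:.1f} days")
print(f"speed-up: {wei_total / granular_total:.2f}x")
```

Under these assumptions the granular model is roughly five times faster, because the FCM cost is paid once while the per-run cost drops from about 15 days to about 2.7 days.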


HSSP-BLOSUM62 Measure


Future Works

