Protein Sequence Motif Extraction through Clustering Analysis

My Research Work and Clustering Bernard Chen 2009

Outline • Introduction • Experimental Setup • Clustering • Future Works

Central Dogma of Molecular Biology

Amino Acids, the subunit of proteins

Protein Primary, Secondary, and Tertiary Structure

Protein 3D Structure

Protein Sequence Motif • Although there are 20 amino acids, protein primary structure is not built by choosing among them at random • Sequence motifs: a relatively small number of functionally or structurally conserved sequence patterns that occur repeatedly in a group of related proteins.

Protein Sequence Motif These biologically significant regions or residues are usually: • Enzyme catalytic sites • Prosthetic group attachment sites (heme, pyridoxal-phosphate, biotin…) • Amino acids involved in binding a metal ion • Cysteines involved in disulfide bonds • Regions involved in binding a molecule (ATP/ADP, GDP/GTP, Ca, DNA…)

Goal of our group • The main purpose is to extract protein sequence motifs that are universally conserved across protein family boundaries. • Discuss the relation between protein primary structure and tertiary structure

Experiment setup: HSSP matrix: 1b25

HSSP matrix: 1b25

Representation of Segment • Sliding window size: 9 • Each window corresponds to a sequence segment, which is represented by a 9 × 20 matrix plus the nine corresponding secondary structure labels obtained from DSSP. • More than 560,000 segments (413 MB) are generated by this method. • DSSP: source of the secondary structure information
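The windowing step described above can be sketched as follows. This is a minimal illustration, not the group's code: it assumes the HSSP profile is already loaded as one row of 20 amino-acid frequencies per residue and the DSSP assignment as a string of one label per residue. The names `sliding_segments`, `toy_profile` and `toy_secondary` are hypothetical.

```python
from typing import List, Tuple

WINDOW = 9  # window size stated on the slide

Profile = List[List[float]]  # one row of 20 amino-acid frequencies per residue


def sliding_segments(profile: Profile, secondary: str,
                     window: int = WINDOW) -> List[Tuple[Profile, str]]:
    """Return every contiguous window as (window x 20 profile rows, structure labels)."""
    if len(profile) != len(secondary):
        raise ValueError("profile and secondary structure must have equal length")
    segments = []
    for start in range(len(profile) - window + 1):
        rows = profile[start:start + window]      # 9 x 20 frequency matrix
        labels = secondary[start:start + window]  # 9 secondary structure labels
        segments.append((rows, labels))
    return segments


# Toy input: 12 residues with uniform frequencies and made-up structure labels.
toy_profile = [[0.05] * 20 for _ in range(12)]
toy_secondary = "HHHHEEEECCCC"
toy_segments = sliding_segments(toy_profile, toy_secondary)
```

A protein of length n yields n - 8 segments with a window of 9, so the 12-residue toy chain gives 4 segments.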

Clustering Algorithms • We used two clustering algorithms in our approach: • K-means Clustering • Fuzzy C-means Clustering

K-means Clustering
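For reference, a minimal sketch of textbook K-means (Lloyd's algorithm) in plain Python. It assumes Euclidean distance and caller-supplied initial centers; the transcript does not say which distance measure or initialization the group used, so both are assumptions. All names here are illustrative.

```python
from typing import List, Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], centers: List[List[float]],
           iterations: int = 20) -> List[int]:
    """Lloyd's algorithm: alternate nearest-center assignment and mean update."""
    centers = [list(c) for c in centers]  # copy so the caller's list is untouched
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(len(centers)),
                      key=lambda j: squared_distance(p, centers[j]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


toy_points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
toy_labels = kmeans(toy_points, centers=[[0.0, 0.0], [5.0, 5.0]])
```

In the setting of these slides each point would be a flattened 9 × 20 segment, so every point has 180 coordinates instead of 2.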

Fuzzy C-means Clustering
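A minimal sketch of standard Fuzzy C-means, in which each point receives a degree of membership in every cluster instead of a single label. The fuzzifier m = 2, the Euclidean distance and the fixed iteration count are assumptions for illustration; the transcript does not give the group's settings.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def memberships(points: List[Point], centers: List[Point],
                m: float = 2.0) -> List[List[float]]:
    """Return u[j][i], the degree to which point j belongs to cluster i."""
    power = 2.0 / (m - 1.0)
    result = []
    for p in points:
        dists = [distance(p, c) for c in centers]
        if 0.0 in dists:
            # A point sitting exactly on a center belongs fully to it.
            hits = dists.count(0.0)
            row = [1.0 / hits if d == 0.0 else 0.0 for d in dists]
        else:
            row = [1.0 / sum((d_i / d_k) ** power for d_k in dists)
                   for d_i in dists]
        result.append(row)
    return result


def update_centers(points: List[Point], u: List[List[float]],
                   m: float = 2.0) -> List[List[float]]:
    """Each center is the mean of all points weighted by membership ** m.

    Assumes every cluster receives some nonzero membership.
    """
    dim = len(points[0])
    centers = []
    for i in range(len(u[0])):
        weights = [row[i] ** m for row in u]
        total = sum(weights)
        centers.append([sum(w * p[d] for w, p in zip(weights, points)) / total
                        for d in range(dim)])
    return centers


def fuzzy_c_means(points: List[Point], centers: List[Point], m: float = 2.0,
                  iterations: int = 50) -> Tuple[List[List[float]], List[List[float]]]:
    for _ in range(iterations):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return memberships(points, centers, m), centers


toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
toy_u, toy_centers = fuzzy_c_means(toy_points, [[1.0, 1.0], [9.0, 9.0]])
```

Each row of `toy_u` sums to 1, and the soft memberships are what allow one segment to be shared between clusters.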

Granular Computing Model • Original dataset → Fuzzy C-Means Clustering → Information Granule 1 ... Information Granule M → K-means Clustering on each granule → Join Information → Final Sequence Motifs
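The flow of the model can be sketched as below. The transcript does not say how the fuzzy memberships are turned into granules, so the membership `threshold` used here is an assumption, and `cluster_fn` is a hypothetical stand-in for the per-granule K-means run.

```python
from typing import Callable, List, Tuple


def split_into_granules(u: List[List[float]],
                        threshold: float) -> List[List[int]]:
    """Granule i holds the indices of all segments whose membership in
    fuzzy cluster i is at least `threshold` (assumed rule, not from the slides)."""
    n_granules = len(u[0])
    return [[j for j, row in enumerate(u) if row[i] >= threshold]
            for i in range(n_granules)]


def cluster_granules(granules: List[List[int]],
                     cluster_fn: Callable[[List[int]], List[int]]
                     ) -> List[Tuple[int, int, int]]:
    """Cluster each granule separately, then join the results into one list
    of (granule id, local cluster id, segment index) records."""
    joined = []
    for g, members in enumerate(granules):
        local_labels = cluster_fn(members)
        for idx, lab in zip(members, local_labels):
            joined.append((g, lab, idx))
    return joined


# Toy membership matrix: 3 segments, 2 fuzzy clusters.
toy_u = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
toy_granules = split_into_granules(toy_u, threshold=0.3)
# Placeholder clustering that puts every member of a granule in cluster 0.
toy_joined = cluster_granules(toy_granules, lambda members: [0] * len(members))
```

With this rule a segment with mixed membership, such as segment 1 in the toy data, lands in more than one granule.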

Motivation

Reduce Space Complexity • Table 1: Summary of results obtained by FCM

Reduce Time Complexity • Wei’s method: 1285968 sec (15 days) * 6 = 7715568 sec (90 days) • Granular Model: 154899 sec (FCM execution time) + 231720 sec (2.7 days) * 6 = 1545219 sec (18 days)
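The arithmetic behind this comparison can be rechecked with a few lines of Python, using only the figures quoted on the slide. The variable names are illustrative, and reading the factor of 6 as a number of repeated runs is an assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60

wei_single_run = 1285968   # sec, Wei's method, one run
fcm_time = 154899          # sec, FCM execution time
kmeans_per_run = 231720    # sec, K-means on the granules, one run
runs = 6                   # factor taken from the slide


def days(seconds: float) -> float:
    return seconds / SECONDS_PER_DAY


wei_total = wei_single_run * runs                 # 7715808 sec, about 89.3 days
granular_total = fcm_time + kmeans_per_run * runs  # 1545219 sec, about 17.9 days
speedup = wei_total / granular_total              # just under 5
```

The granular total matches the slide exactly. The product for Wei's method comes to 7715808 sec, 240 sec more than the 7715568 sec shown, so one of the two quoted figures presumably contains a transcription slip. Either way the total is about 89 days, which the slide gives as 90 (15 days * 6), and the granular model is close to five times faster.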

HSSP-BLOSUM62 Measure

Future Works
