Explore protein sequence motifs using clustering algorithms like K-means and Fuzzy C-means. Extract universality and relation to protein structures. Reduce time and space complexity for motif identification.
My Research Work and Clustering Bernard Chen 2009
Outline • Introduction • Experimental Setup • Clustering • Future Work
Protein Sequence Motif • Although there are 20 amino acids, protein primary structure is not built by choosing randomly among them • Sequence Motif: one of a relatively small number of functionally or structurally conserved sequence patterns that occur repeatedly in a group of related proteins.
Protein Sequence Motif These biologically significant regions or residues are usually: • Enzyme catalytic sites • Prosthetic group attachment sites (heme, pyridoxal-phosphate, biotin…) • Amino acids involved in binding a metal ion • Cysteines involved in disulfide bonds • Regions involved in binding a molecule (ATP/ADP, GDP/GTP, Ca, DNA…)
Goal of Our Group • The main purpose is to extract protein sequence motifs that are universally conserved across protein family boundaries. • Discuss the relation between protein primary structure and tertiary structure
Representation of Segment • Sliding window size: 9 • Each window corresponds to a sequence segment, which is represented by a 9 × 20 matrix plus nine corresponding secondary structure annotations obtained from DSSP. • More than 560,000 segments (413 MB) are generated by this method. • DSSP: source of the secondary structure information
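The windowing step above can be sketched as follows. This is a minimal illustration, not the group's actual pipeline: it assumes each protein is already available as a list of rows with 20 values per position (one per amino acid) plus a string of per-position DSSP codes of the same length. The function name `sliding_segments` and the toy inputs are hypothetical.

```python
WINDOW_SIZE = 9  # sliding window size stated on the slide


def sliding_segments(profile, secondary, window=WINDOW_SIZE):
    """Cut one protein into overlapping segments.

    profile   -- list of rows, 20 values per sequence position (assumed input format)
    secondary -- string of per-position secondary structure codes, same length
    Returns a list of (window x 20 matrix, window-long code string) pairs.
    """
    if len(profile) != len(secondary):
        raise ValueError("profile and secondary structure must have equal length")
    segments = []
    # A protein of length n yields n - window + 1 segments (none if n < window).
    for start in range(len(profile) - window + 1):
        end = start + window
        segments.append((profile[start:end], secondary[start:end]))
    return segments


# Toy protein with 12 positions; the values are placeholders, not real data.
toy_profile = [[1.0 / 20] * 20 for _ in range(12)]
toy_secondary = "HHHHEEEETTTT"  # H, E, T are DSSP codes; this string is made up

segments = sliding_segments(toy_profile, toy_secondary)
print(len(segments))   # number of windows for a 12-residue toy protein
print(segments[1][1])  # secondary structure codes of the second window
```

Each returned pair matches the slide's description of a segment: a 9 × 20 matrix together with nine secondary structure annotations.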
Clustering Algorithms • We use two clustering algorithms in our approach: • K-means Clustering • Fuzzy C-means Clustering
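The difference between the two algorithms is easiest to see in how a single point is assigned to clusters. The sketch below contrasts the hard assignment of K-means with the standard Fuzzy C-means membership formula. It assumes Euclidean distance and a fuzzifier m = 2, neither of which is stated on the slides; the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance (an assumption; the slides do not name the metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_assign(point, centers):
    """K-means: the point belongs to exactly one cluster, the nearest center."""
    distances = [euclidean(point, c) for c in centers]
    return distances.index(min(distances))


def fcm_memberships(point, centers, m=2.0):
    """Fuzzy C-means: the point gets a membership degree for every cluster.

    u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)), and the degrees sum to 1.
    """
    distances = [euclidean(point, c) for c in centers]
    zeros = distances.count(0.0)
    if zeros:
        # The point sits exactly on one or more centers: share membership among them.
        return [1.0 / zeros if d == 0.0 else 0.0 for d in distances]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_j / d_k) ** exponent for d_k in distances)
        for d_j in distances
    ]


centers = [[0.0, 0.0], [4.0, 0.0]]
point = [1.0, 0.0]

hard = kmeans_assign(point, centers)    # index of the single winning cluster
soft = fcm_memberships(point, centers)  # one degree per cluster
print(hard, soft)
```

Because FCM memberships are graded, a point near a boundary can contribute to more than one cluster, which is what makes FCM usable for splitting a dataset into overlapping groups.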
Granular Computing Model • Original dataset → Fuzzy C-Means Clustering → Information Granule 1 ... Information Granule M → K-means Clustering on each information granule → Join Information → Final Sequence Motifs Information
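The flow of the granular model can be sketched in a few functions. This is a rough illustration under stated assumptions, not the authors' implementation: the FCM memberships are taken as given (toy values written by hand here), a point joins every granule where its membership reaches a hypothetical `threshold`, K-means uses Euclidean distance, and "joining" is taken to mean concatenating the per-granule cluster centers. All names are hypothetical.

```python
import math
import random


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    return [sum(column) / len(points) for column in zip(*points)]


def kmeans(points, k, iterations=20, seed=0):
    """Plain K-means; returns the k cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: distance(p, centers[j]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [mean(g) if g else centers[j] for j, g in enumerate(groups)]
    return centers


def split_into_granules(points, memberships, threshold):
    """Place each point in every granule where its FCM membership >= threshold."""
    granules = [[] for _ in memberships[0]]
    for p, row in zip(points, memberships):
        for g, u in enumerate(row):
            if u >= threshold:
                granules[g].append(p)
    return granules


def granular_centers(points, memberships, threshold, k_per_granule):
    """FCM granules -> K-means inside each granule -> joined list of centers."""
    joined = []
    for granule in split_into_granules(points, memberships, threshold):
        k = min(k_per_granule, len(granule))
        if k > 0:
            joined.extend(kmeans(granule, k))
    return joined


# Toy data: five 2-D points and hand-written memberships for M = 2 granules.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [5.0, 5.0]]
memberships = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9], [0.5, 0.5]]

granules = split_into_granules(points, memberships, threshold=0.3)
joined = granular_centers(points, memberships, threshold=0.3, k_per_granule=1)
print([len(g) for g in granules], joined)
```

Since each K-means run only sees one granule rather than the whole dataset, the per-run input is smaller, which is the intuition behind the space and time reductions reported on the following slides.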
Reduce Space Complexity • Table 1: Summary of results obtained by FCM
Reduce Time Complexity • Wei's method: 1285968 sec (15 days) × 6 = 7715568 sec (90 days) • Granular Model: 154899 sec (FCM exe time) + 231720 sec (2.7 days) × 6 = 1545219 sec (18 days)
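The timing figures can be checked with a few lines of arithmetic. The sketch below uses only the numbers printed on the slide; the variable names are hypothetical and the meaning of the factor 6 is not stated in the source. Note that 1285968 × 6 evaluates to 7715808, which is 240 sec more than the total printed on the slide; the source does not indicate which figure is the typo, so the printed total is used as-is for the ratio.

```python
SECONDS_PER_DAY = 24 * 60 * 60
REPEATS = 6  # factor of 6 from the slide; its meaning is not stated there

wei_per_run = 1285968      # sec, from the slide
wei_total_slide = 7715568  # sec, total as printed on the slide
fcm_time = 154899          # sec, FCM execution time
kmeans_per_run = 231720    # sec, K-means time in the granular model

# Granular model: one FCM pass plus the repeated K-means runs.
granular_total = fcm_time + kmeans_per_run * REPEATS
granular_days = round(granular_total / SECONDS_PER_DAY, 1)

# Ratio of the two totals as printed on the slide.
speedup = round(wei_total_slide / granular_total, 1)

print(granular_total, granular_days, speedup)
print(wei_per_run * REPEATS)  # product of the printed per-run figure
```

On these figures the granular model takes roughly one fifth of the time of Wei's method.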