1 / 1

HIERARCHICAL CLUSTERING OF HOMOLOG GROUPS BASED ON OVERLAP OF SECONDARY STRUCTURE

Figure 4. Simple and Average politics of hierarchical clustering for Hydrogenase maturation factor group. Figure 5. Simple and Average politics of hierarchical clustering for Transcription elongation factor.

lindsay
Download Presentation

HIERARCHICAL CLUSTERING OF HOMOLOG GROUPS BASED ON OVERLAP OF SECONDARY STRUCTURE

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Figure 4. Simple and Average politics of hierarchical clustering for Hydrogenase maturation factor group. Figure 5. Simple and Average politics of hierarchical clustering for Transcription elongation factor. Figure 1. Superoxide dismutase clusters using a simple and average linkage politic of hierarchical clustering. Figure 6. Seed Linkage in Ferredoxin goup. Conclusions Thus, hierarchical clustering of secondary structure represents a novel and more stringent clustering procedure than COG, Seed Linkage and PSI-BLAST although not so stringent as UniRef50 procedure. These pictures below show two representants of the superoxide dismutases group that have resolved structures in the PDB database. These proteins are aparently similar, but they are in a different clusters. Figure 2. Ferredoxin clusters using a simple and average linkage politic of hierarchical clustering. Figure 3. Peroxiredoxin clusters using a simple and average linkage politic of hierarchical clustering. Figure 7. Two proteins of Superoxide dismutases group that are in different clusters.. HIERARCHICAL CLUSTERING OF HOMOLOG GROUPS BASED ON OVERLAP OF SECONDARY STRUCTURE Oto Coelho Jr., Lucas Santos, Adriano Silva, Izabella Pena,J Miguel Ortega¹ otojr@dcc.ufmg.br, lucass@ufmg.br, adrianoufpe@yahoo.com.br, ipena@ufmg.br, miguel@icb.ufmg.br INTRODUCTION Alignment of primary structure is the basis of detection of putative homologous proteins. The software BLAST is the most popular and efficient tool for calculation of these alignments. This software prints a value (called by E-value) that represents the number of alignments which scores equal or better than those obtained without homology relationship. However, when score is under a certain limit (< 100), the E-value will not achieve a trustable value. In this work we used the pair-wise overlapping of secondary structure to build clusters of homolog groups, using as a model the COG of Superoxide dismutase (COG0605) bearing 74 sequences, enriched with UniRef50 members. Metodology A total of 551 sequences were submitted to the prediction of secondary structure with the software SSPro4, resulting in output files using the alphabet of H: alpha-helix, E: beta-strand and C: unstructured coil. Predictions were pair-wised aligned (Clustal/W) and used to calculate the Structural overlap (SOV). The SOV result is a percentage of secondary structure overlap. After this, a local implementation of hierarchical clustering (Average and Simple Linkage politic) was used to create clusters (named CSO after clusters of structural overlap) over the filtered similarity values in a cutoff of 70%. Results In a Simple Linkage politic (considering the maximum SOV value between sequences) was able to create 15 clusters containing from two up to 224 members. In a Average Linkage politic (considering the average SOV between sequences) Using Sov values, a local implementation of hierarchical clustering was able to create 6 clusters containing from one up to 510 members We evaluated the clustering procedure by analyzing the CSO in respect of their UniRef_50 clusters composition. We found that CSO0605_1 (arrow) gathered members of 22 different UniRef50 clusters. Conversely, Seed Linkage software and PSI-BLAST grouped, respectively, 541 and 531 sequences. We also calculated the hierarchical clustering in simple and average politics for four other groups of proteins: Ferredoxin, Peroxiredoxin, Hydrogenase Maturation Factor and Transcription Elongation Factor. Supported by UFMG • Laboratório de Biodados. Departamento de Bioquímica e Imunologia, ICB-UFMG

More Related