This presentation covers the structural validation of homology using the Dali Domain Dictionary, a numerical taxonomy of the known protein domain structures in the PDB. As of September 2000, the dictionary covered 10,532 PDB entries and 17,101 protein chains, organized into 5 supersecondary structure motifs (attractors), 1375 fold types, 2582 functional families, and 3724 domain sequence families. It also presents a global representation of protein fold space, in which a matrix of pairwise similarity scores between folds is converted into a metric matrix whose eigendecomposition defines orthogonal axes passing through the geometric centroid of the observed folds.
Structural Validation of Homology: Adenylate Kinase and Guanylate Kinase (19% Seq ID, Z = 12.2)
Dali Domain Dictionary
Dietmann, Park, Notredame, Heger, Lappe, and Holm, Nucleic Acids Res. 29: 55-57 (2001)
• Dali Domain Dictionary is a numerical taxonomy of all known domain structures in the PDB
• Evolved from the Dali / FSSP Database, Holm & Sander, Nucleic Acids Res. 25: 231-234 (1997)
• Dali Domain Dictionary, Sept 2000:
  • 10,532 PDB entries
  • 17,101 protein chains
  • 5 supersecondary structure motifs (attractors)
  • 1375 fold types
  • 2582 functional families
  • 3724 domain sequence families
Most proteins in biology have been produced by the duplication, divergence and recombination of the members of a small number of protein families. (Courtesy of C. Chothia)
Cadherins (courtesy of C. Chothia)
A Global Representation of Protein Fold Space
Hou, Sims, Zhang, Kim, PNAS 100: 2386-2390 (2003)
Database of 498 SCOP “Folds” or “Superfamilies”
The overall pairwise comparisons of 498 folds lead to a 498 x 498 matrix of similarity scores $S_{ij}$, where $S_{ij}$ is the alignment score between the $i$th and $j$th folds. An appropriate method for handling such data matrices as a whole is metric matrix distance geometry. We first convert the similarity score matrix $[S_{ij}]$ to a distance matrix $[D_{ij}]$ by using $D_{ij} = S_{\max} - S_{ij}$, where $S_{\max}$ is the maximum similarity score among all pairs of folds. We then transform the distance matrix to a metric (or Gram) matrix $[M_{ij}]$ by using $M_{ij} = (D_{i0}^2 + D_{j0}^2 - D_{ij}^2)/2$, where $D_{i0}$ is the distance between the $i$th fold and the geometric centroid of all $N = 498$ folds. The eigenvectors of the metric matrix define an orthogonal system of axes, called factors. These axes pass through the geometric centroid of the points representing all observed folds and are ordered by decreasing eigenvalue, that is, by decreasing amount of information each factor represents.
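The conversion from similarity scores to factors can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code. The function name `fold_space_factors` and the parameter `n_factors` are hypothetical. The sketch assumes the standard metric-matrix form $M_{ij} = (D_{i0}^2 + D_{j0}^2 - D_{ij}^2)/2$ with $D_{i0}^2 = \frac{1}{N}\sum_j D_{ij}^2 - \frac{1}{N^2}\sum_{j<k} D_{jk}^2$. It also assumes that $S_{\max}$ is taken over pairs of distinct folds and that self-distances are set to zero.

```python
import numpy as np

def fold_space_factors(S, n_factors=3):
    """Map a symmetric similarity matrix to coordinates along the leading factors.

    Hypothetical helper for illustration; S[i, j] is the alignment score
    between fold i and fold j.
    """
    S = np.asarray(S, dtype=float)
    n = S.shape[0]

    # Step 1: similarity -> distance, D_ij = S_max - S_ij.
    # Assumption: S_max is the largest score between two distinct folds,
    # and the distance of a fold to itself is defined as zero.
    off_diag = ~np.eye(n, dtype=bool)
    s_max = S[off_diag].max()
    D = s_max - S
    np.fill_diagonal(D, 0.0)
    D2 = D ** 2

    # Step 2: squared distance of each fold to the geometric centroid.
    # D2.sum() counts every pair twice, hence the factor 2 in the denominator.
    d0_sq = D2.mean(axis=1) - D2.sum() / (2.0 * n * n)

    # Step 3: metric (Gram) matrix, M_ij = (D_i0^2 + D_j0^2 - D_ij^2) / 2.
    M = 0.5 * (d0_sq[:, None] + d0_sq[None, :] - D2)

    # Step 4: eigendecomposition; keep the factors with the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1][:n_factors]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Coordinates along each factor; negative eigenvalues (non-Euclidean
    # distances) are clipped to zero before taking the square root.
    coords = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return coords, eigvals
```

Calling `coords, eigvals = fold_space_factors(S, n_factors=3)` on a 498 x 498 score matrix would return one three-dimensional point per fold, centered on the centroid, together with the three largest eigenvalues. When the distances are not exactly Euclidean, some eigenvalues of the metric matrix are negative, and the clipping step simply discards those directions.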