Appendix: Automated Methods for Structure Comparison

althea

Presentation Transcript


  1. Appendix: Automated Methods for Structure Comparison
     • Basic problem: how are any two given structures to be automatically compared in a meaningful way?
     • How are distant relationships to be recognized?
     Program  Method
     DALI     distance matrix comparison (basis for FSSP structural classification)
     SSAP     dynamic programming (used in CATH to classify topologies)
     VAST     convert secondary structures to vectors and align vectors

  2. Structure comparison is pretty easy when two proteins are very similar
     • When two proteins are so similar that their sequences can be reliably aligned (say, >35% identical), structure comparison can proceed from the sequence alignment:
       1. Align the sequences:
          sequence 1: YIREV-GKL
          sequence 2: YITQVRNKA
       2. Superpose the structures to minimize the RMSD over the equivalent residue pairs in the alignment.
     (Note: the structures shown on this slide do not correspond to the sequences above.)
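The two steps above can be sketched in Python. This is a minimal illustration, not code from any tool named in this appendix: `equivalent_pairs` and `percent_identity` are hypothetical helpers that read residue correspondences off an alignment (identity is taken over ungapped columns, one of several conventions), and `superposed_rmsd` uses the Kabsch SVD method, one standard way to find the rotation that minimizes RMSD; the slide does not name a method. It assumes NumPy is installed and that the caller supplies the paired C-alpha coordinates.

```python
import numpy as np


def equivalent_pairs(aln1: str, aln2: str) -> list[tuple[int, int]]:
    """0-based residue index pairs for alignment columns with no gap."""
    pairs: list[tuple[int, int]] = []
    i = j = 0  # next residue index in each sequence
    for a, b in zip(aln1, aln2):
        if a != "-" and b != "-":
            pairs.append((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def percent_identity(aln1: str, aln2: str) -> float:
    """Percent identical residues over the ungapped columns."""
    cols = [(a, b) for a, b in zip(aln1, aln2) if a != "-" and b != "-"]
    return 100.0 * sum(a == b for a, b in cols) / len(cols)


def superposed_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between paired N x 3 coordinate arrays after optimal superposition.

    Kabsch method: center both sets, then take the rotation from the SVD of
    the 3 x 3 covariance matrix, correcting for a possible reflection.
    """
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # rotation taking P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

For the example alignment, `equivalent_pairs("YIREV-GKL", "YITQVRNKA")` pairs residue 5 (G) of sequence 1 with residue 6 (N) of sequence 2, because the gap shifts the numbering, and `percent_identity` gives 50.0 (4 identical residues out of 8 aligned pairs).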

  3. It is harder when the proteins are very different...
     • If one cannot align the sequences reliably, how does one establish which residues, if any, play equivalent structural roles in the two proteins?
     • The answer is to align the structures directly, in such a way that the structural equivalences between the two proteins are revealed.
     • We will discuss how the distance-matrix-based algorithm of DALI solves this problem.

  4. Distance Matrices
     • A 2D representation of a 3D structure
     • Plot the sequence against itself
     • Identify pairs of residues that are close to each other in space
     • Usually the distance between C-alpha atoms is used
     • Residues that are close together show up as dark regions of the matrix
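A distance matrix and its thresholded "dark cell" view take only a few lines to compute. This sketch assumes C-alpha coordinates in ångströms given as (x, y, z) tuples; the helper names and the 8 Å contact cutoff are illustrative choices, not values taken from DALI.

```python
import math

Coord = tuple[float, float, float]


def distance_matrix(ca: list[Coord]) -> list[list[float]]:
    """Entry [i][j] is the distance between C-alpha atoms i and j."""
    return [[math.dist(a, b) for b in ca] for a in ca]


def contact_map(dm: list[list[float]], cutoff: float = 8.0) -> list[list[bool]]:
    """True where two residues lie closer than `cutoff` (the 'dark' cells)."""
    return [[d < cutoff for d in row] for row in dm]


def render(cmap: list[list[bool]]) -> str:
    """Text picture of the matrix: '#' for close pairs, '.' otherwise."""
    return "\n".join("".join("#" if c else "." for c in row) for row in cmap)


if __name__ == "__main__":
    # Hypothetical straight chain, 3.8 Å between consecutive C-alphas.
    chain = [(3.8 * i, 0.0, 0.0) for i in range(5)]
    print(render(contact_map(distance_matrix(chain))))
```

For this straight toy chain the picture is a band along the main diagonal; real folds add off-diagonal patches wherever distant parts of the sequence pack together.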

  5. Distance matrices

  6. Different substructures, such as secondary or supersecondary structures, give rise to distinct patterns in the matrix
     • e.g. antiparallel vs. parallel beta-sheets
     • In principle, one could recognize structural similarity between two proteins by comparing the patterns in their distance matrices, but it’s not that simple.
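The two sheet patterns can be told apart from contact positions alone: for parallel strands the index difference j - i stays roughly constant (a streak parallel to the main diagonal), while for antiparallel strands the sum i + j does (a streak perpendicular to it). The classifier below is a hypothetical toy that assumes the residue contacts between two strands are already known; it is not how any particular program assigns sheets.

```python
def spread(values: list[int]) -> int:
    """Range of a list of integers."""
    return max(values) - min(values)


def pairing_direction(contacts: list[tuple[int, int]]) -> str:
    """Classify contacts (i on one strand, j on the other) by their matrix pattern.

    Parallel pairing keeps j - i nearly constant (a streak parallel to the main
    diagonal); antiparallel pairing keeps i + j nearly constant (a streak
    perpendicular to it).
    """
    diffs = [j - i for i, j in contacts]
    sums = [i + j for i, j in contacts]
    if spread(diffs) < spread(sums):
        return "parallel"
    if spread(sums) < spread(diffs):
        return "antiparallel"
    return "ambiguous"
```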

  7. Problem: two structures with the same topology may differ in the precise location of secondary structure elements along the sequence, i.e. loop lengths may differ.
     (Figure: same fold, different matrices)

  8. Or two proteins with a common architecture may differ in connectivity (topology)...
     (Figure: both are three-stranded antiparallel beta-sheets)
     How might we compare their distance matrices to reveal this similarity?

  9. DALI algorithm
     • It is not useful to compare entire matrices
     • Instead, chop the distance matrices into all possible 6x6 submatrices (each covering a pair of six-residue fragments)
     • Compare these sets of submatrices for similar patterns rather than comparing the entire matrices
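Chopping a matrix into every 6x6 block and comparing blocks across two proteins can be sketched as below. This is a brute-force toy under stated assumptions: the block-similarity measure here is a plain mean absolute distance difference standing in for DALI's own, more elaborate similarity score, and `max_diff` (in Å) and all helper names are hypothetical.

```python
Matrix = list[list[float]]
FRAG = 6  # fragment length from the slide: 6x6 submatrices


def submatrices(dm: Matrix, size: int = FRAG) -> dict[tuple[int, int], Matrix]:
    """Every size x size block of a distance matrix, keyed by its corner (i, j).

    Block (i, j) holds the distances between fragment i..i+size-1 and
    fragment j..j+size-1.
    """
    n = len(dm)
    return {
        (i, j): [row[j:j + size] for row in dm[i:i + size]]
        for i in range(n - size + 1)
        for j in range(n - size + 1)
    }


def block_difference(a: Matrix, b: Matrix) -> float:
    """Mean absolute difference between two blocks; 0.0 means identical patterns."""
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def matching_blocks(dm_a: Matrix, dm_b: Matrix, max_diff: float = 1.0):
    """All (difference, corner in A, corner in B) below max_diff, best first."""
    blocks_b = submatrices(dm_b)
    hits = []
    for corner_a, block_a in submatrices(dm_a).items():
        for corner_b, block_b in blocks_b.items():
            d = block_difference(block_a, block_b)
            if d < max_diff:
                hits.append((d, corner_a, corner_b))
    return sorted(hits)
```

Because each block only looks at local pairs of fragments, a longer loop elsewhere in the chain shifts block corners but leaves the blocks themselves intact, which is why this works where comparing whole matrices does not.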

  10. 1. Identify a pair of matching submatrices, one in each of the two matrices, and make an initial sequence alignment from this match...

  11. 2. Identify a second pair that overlaps the first (contains one common structural element).

  12. 3. Combine the overlapping pairs into an overall alignment of structurally equivalent sequence regions.

  13. 4. Rearrange and “collapse” the matrix according to the aligned regions of the sequence: now the common structural elements are aligned, as are the structurally equivalent residues in the sequence!

  14. All together now...

  15. The Power of DALI
     • DALI is quite powerful because it can recognize architectural similarities even when the topologies differ.
     • It is also flexible, because it can be made more topologically restrictive (i.e. no swapping of segments along the chain allowed) to focus on closer relationships.
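One way to express the "no swapping of segments" restriction is to require that an alignment preserve residue order in both chains. These hypothetical helpers assume an alignment stored as a dict from residue index in protein A to residue index in protein B; they illustrate the idea and are not DALI's own code.

```python
def is_sequential(alignment: dict[int, int]) -> bool:
    """True if aligned residues occur in the same order along both chains."""
    b_order = [alignment[a] for a in sorted(alignment)]
    return all(x < y for x, y in zip(b_order, b_order[1:]))


def aligned_segments(alignment: dict[int, int]) -> list[tuple[int, int, int]]:
    """Runs contiguous in both chains, as (start in A, start in B, length)."""
    runs: list[list[tuple[int, int]]] = []
    for a in sorted(alignment):
        b = alignment[a]
        if runs and runs[-1][-1] == (a - 1, b - 1):
            runs[-1].append((a, b))  # extends the current run in both chains
        else:
            runs.append([(a, b)])
    return [(run[0][0], run[0][1], len(run)) for run in runs]
```

An alignment that fails `is_sequential` is the kind a topologically restrictive search would reject; `aligned_segments` shows which segments appear in swapped order.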

  16. FSSP uses DALI alignments to classify structures
     • all PDB entries (8320)
     • eliminate similar sequences → representative set of structures (947)
     • divide into domains → representative set of domains (1484)
     • align domains with DALI! Group domains into fold types (clusters of similar structures) and make a set of representatives of each fold (540)
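The slide does not say how FSSP forms its fold clusters, so the sketch below uses single-linkage clustering on pairwise DALI Z-scores (via union-find) as an assumed stand-in. Domain names, scores, and the threshold `z_min` are hypothetical inputs, and picking the first member as a fold's representative is an arbitrary choice.

```python
def fold_clusters(domains: list[str], z: dict[tuple[str, str], float], z_min: float) -> list[list[str]]:
    """Single-linkage clusters: domains are joined whenever a pairwise Z-score reaches z_min."""
    parent = {d: d for d in domains}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), score in z.items():
        if score >= z_min:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for d in domains:
        groups.setdefault(find(d), []).append(d)
    return sorted(groups.values())


def representatives(clusters: list[list[str]]) -> list[str]:
    """One representative per fold type (here simply the first member)."""
    return [c[0] for c in clusters]
```

Single linkage means two domains can share a fold type through a chain of intermediates even if they never score well against each other directly; a stricter scheme would change how many fold types come out.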

  17. Judging DALI alignments
     • Z-score: how much better than average the alignment is, i.e. how many standard deviations it lies above the mean of a distribution of alignments of random pairs of proteins. >16: very close; 8-16: pretty close; <8: not so close.
     • RMSD: root-mean-square deviation of the alpha carbons over the matching portion of the structures.
     • LALI: length of the alignment (the recognizably matching portion of the structures).
     • LSEQ2: total length of the sequence being matched.
     • %IDE: percent sequence identity between the two sequences.
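Following the slide's definition, a Z-score measures how far an alignment score sits above scores for random protein pairs. In this sketch the random-pair scores are a caller-supplied list and the helper names are hypothetical; DALI's own normalization may differ in detail. The bins use the slide's cut-offs, treating exactly 8 and exactly 16 as "pretty close".

```python
import statistics


def z_score(score: float, random_scores: list[float]) -> float:
    """Standard deviations by which `score` exceeds the mean of random-pair scores."""
    mean = statistics.mean(random_scores)
    sd = statistics.stdev(random_scores)
    return (score - mean) / sd


def closeness(z: float) -> str:
    """Bins from the slide: >16 very close, 8-16 pretty close, <8 not so close."""
    if z > 16:
        return "very close"
    if z >= 8:
        return "pretty close"
    return "not so close"
```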

  18. If you go into FSSP and search for a particular structure, you’ll get an output of its best DALI alignments with other structures:
     STRID2  Z     RMSD  LALI  LSEQ2  %IDE  PROTEIN
     1plc    24.4  0.0   99    99     100   Plastocyanin (cu2+, ph 6.0)
     2pcy    23.4  0.2   99    99     100   Apo-plastocyanin (pH 6.0)
     1bqk    12.1  2.0   89    124    29    pseudoazurin
     1aac    11.0  1.9   84    104    24    amicyanin
     1ibzA   9.1   2.5   83    111    19    nitrosocyanin
     1qhqA   8.3   2.4   87    139    29    auracyanin
     1rcy    8.2   2.5   90    151    17    rusticyanin biological_unit
     1qniA   7.7   2.2   78    572    19    nitrous-oxide reductase
     1kcw    7.1   2.4   81    1017   17    ceruloplasmin biological_unit
     2cuaA   7.0   2.2   80    122    15    cua fragment
     1nwpA   6.7   3.1   85    128    24    azurin
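A listing like the one above is easy to filter once it is parsed. This sketch copies the rows from the slide into a string and assumes whitespace-separated columns with the protein name last; `Hit` and `parse_hits` are hypothetical names, not part of any FSSP tool. It keeps hits that are "pretty close" or better by the Z-score guide and computes how much of each second structure is aligned (LALI / LSEQ2).

```python
from typing import NamedTuple

# Rows copied from the FSSP listing on the slide.
TABLE = """\
STRID2  Z     RMSD  LALI  LSEQ2  %IDE  PROTEIN
1plc    24.4  0.0   99    99     100   Plastocyanin (cu2+, ph 6.0)
2pcy    23.4  0.2   99    99     100   Apo-plastocyanin (pH 6.0)
1bqk    12.1  2.0   89    124    29    pseudoazurin
1aac    11.0  1.9   84    104    24    amicyanin
1ibzA   9.1   2.5   83    111    19    nitrosocyanin
1qhqA   8.3   2.4   87    139    29    auracyanin
1rcy    8.2   2.5   90    151    17    rusticyanin biological_unit
1qniA   7.7   2.2   78    572    19    nitrous-oxide reductase
1kcw    7.1   2.4   81    1017   17    ceruloplasmin biological_unit
2cuaA   7.0   2.2   80    122    15    cua fragment
1nwpA   6.7   3.1   85    128    24    azurin
"""


class Hit(NamedTuple):
    strid2: str
    z: float
    rmsd: float
    lali: int
    lseq2: int
    pct_ide: int
    protein: str


def parse_hits(text: str) -> list[Hit]:
    """One Hit per data row; the protein name is everything after the sixth column."""
    hits = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        f = line.split(maxsplit=6)
        hits.append(Hit(f[0], float(f[1]), float(f[2]), int(f[3]), int(f[4]), int(f[5]), f[6]))
    return hits


hits = parse_hits(TABLE)
close = [h.strid2 for h in hits if h.z >= 8]  # "pretty close" or better
coverage = {h.strid2: round(h.lali / h.lseq2, 2) for h in hits}  # aligned fraction of structure 2
```

The coverage values make one feature of the listing explicit: ceruloplasmin (1kcw) scores Z = 7.1 even though only 81 of its 1017 residues are aligned.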