Explore methods for finding structurally similar proteins, including RMSD, sequence alignment, dynamic programming, and a Monte Carlo approach to choosing optimal pairings. Understand how tools like SAP, Dali, and Pfam support protein conformation analysis.
Doug Raiford Lesson 18 Protein Structure Searches
Problem definition • Given a protein conformation, can we find other structurally similar proteins? • Might have a database of structures (like the PDB)
If we have a predicted and a known structure… • Can do a simple RMSD to compare the two conformations • We know precisely which aa’s correspond to which
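A minimal sketch of the RMSD comparison, assuming both conformations are given as equally long lists of (x, y, z) coordinates in the same residue order and already superimposed. The function name `rmsd` and the sample coordinates are illustrative, not taken from the lecture.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b):
        raise ValueError("conformations must have the same number of points")
    if not coords_a:
        raise ValueError("need at least one point")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the two positions of the same residue.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical data: the "known" structure is the prediction shifted 1 unit in z.
predicted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
known = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print(rmsd(predicted, known))  # 1.0
```

This only works because the residue correspondence is known in advance: point k in one list is compared with point k in the other.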
What if the sequences are not identical? • Must map aa’s from one to aa’s in the other • How might you do this? • Sequence similarity • MSAs
Have we seen this before? • 3D PSSM • Sequence alignment integrated with 3D alignment • Stored in a profile (position-specific similarity profile) • Generates 1D profiles first (from MSAs) • Then uses a structural alignment program (SAP) to augment the profiles with structural similarity
SAP (structural alignment program) • Aligning secondary structures
How? • What do you think of when you hear that you need to align two things? • Dynamic programming
Scoring • Three components • AA similarity (substitution matrix): are the associated aa’s similar sequence-wise (e.g. both glycines)? • Local structure: are they both in a similar local structure (e.g. both members of an alpha helix)? • Solvent exposure: are they both buried or both exposed to solvent?
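The alignment and the three scoring components can be sketched as a small global dynamic-programming alignment. This is an illustration only, not SAP's actual scoring: the residue representation `(aa, secondary_structure, buried)`, the weights, and the linear gap penalty `GAP` are all assumptions, and the identity test stands in for a real substitution matrix.

```python
GAP = -2  # assumed linear gap penalty

def residue_score(r1, r2):
    """Sum of the three components for one pair of residues."""
    aa1, ss1, buried1 = r1
    aa2, ss2, buried2 = r2
    score = 2 if aa1 == aa2 else -1            # stand-in for a substitution matrix
    score += 1 if ss1 == ss2 else 0            # same local structure (H, E, or C)
    score += 1 if buried1 == buried2 else 0    # both buried or both exposed
    return score

def align(seq_a, seq_b):
    """Global alignment; returns (best score, list of aligned index pairs)."""
    n, m = len(seq_a), len(seq_b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = i * GAP
    for j in range(1, m + 1):
        table[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            table[i][j] = max(
                table[i - 1][j - 1] + residue_score(seq_a[i - 1], seq_b[j - 1]),
                table[i - 1][j] + GAP,   # gap in seq_b
                table[i][j - 1] + GAP,   # gap in seq_a
            )
    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if table[i][j] == table[i - 1][j - 1] + residue_score(seq_a[i - 1], seq_b[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif table[i][j] == table[i - 1][j] + GAP:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return table[n][m], pairs

# Hypothetical residues: (amino acid, secondary structure, buried?)
protein_a = [("G", "C", False), ("A", "H", True), ("L", "H", True), ("K", "H", False)]
protein_b = [("A", "H", True), ("L", "H", True), ("K", "H", False)]
best_score, aligned_pairs = align(protein_a, protein_b)
print(best_score, aligned_pairs)  # 10 [(1, 0), (2, 1), (3, 2)]
```

The leading glycine in `protein_a` is left unpaired (one gap), and the three helix residues line up with full marks on all three components.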
Benefits • SAP (structure alignment) allows a profile to be influenced by secondary structure • Useful to 3D PSSM for threading decisions (which aa’s match to a profile) • Homology-based conformation prediction is enhanced by making better decisions on where to insert gaps and varying-length loops
Another already seen • Pfam • Has hidden Markov models for protein families • Sequences that match a model have a high probability of matching its conformation • Even though we are not comparing structures (query to target), we are matching a sequence to its most probable structure (Logos: Pfam, HMMER)
What about finding similar structure in an alternative way? • Can’t really align • How else might it work?
Dali (distance matrix alignment) • How might two distance matrices look? • All pairwise distances from each aa to all other aa’s • If the proteins are identical, the matrices would be almost identical • Low-distance region in the matrix if hairpin (anti-parallel) • Low-distance region in the matrix if parallel
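A minimal sketch of building and comparing distance matrices, assuming one (x, y, z) point per amino acid (for example the alpha carbon). The helper names `distance_matrix` and `max_abs_difference` and the coordinates are illustrative. It shows why distance matrices are convenient: moving the whole protein does not change them.

```python
import math

def distance_matrix(coords):
    """All pairwise distances from each point to every other point."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def max_abs_difference(matrix_a, matrix_b):
    """Largest element-wise difference between two equally sized matrices."""
    return max(
        abs(a - b)
        for row_a, row_b in zip(matrix_a, matrix_b)
        for a, b in zip(row_a, row_b)
    )

# Hypothetical coordinates, and the same structure moved elsewhere in space.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
shifted = [(x + 10.0, y - 2.0, z + 7.0) for x, y, z in coords]

dm = distance_matrix(coords)
dm_shifted = distance_matrix(shifted)
print(dm[0][2])                            # 5.0
print(max_abs_difference(dm, dm_shifted))  # 0.0
```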
How to turn this into a similarity score? • Find the optimum set of similar sub-structures • Even if in different 1D locations • Find amino acid equivalences • Once we have equivalences, can easily compare structural similarity • E.g. with RMSD
Approach • Break each matrix into a set of overlapping sub-matrices • Do an all-pairwise comparison • Sub-matrices that naturally extend each other are merged • Must find the pairings of sub-matrices that yield the best overall score
How to optimize the choice of pairings? • Monte Carlo approach • Randomly generate pairings • Calculate overall similarity • Multiple solutions in parallel • Slowly improve each by randomly altering pairings (like a random search) • Have some probability of keeping a solution that is worse than the previous one
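The Monte Carlo steps above can be sketched on a toy problem. This is not Dali's implementation: it assumes a pairing is a one-to-one assignment of sub-structures in protein A to sub-structures in protein B, that a precomputed `similarity` table scores each possible pair, and that worse solutions are kept with a Metropolis-style probability. The function names, `temperature`, `walkers`, and the table values are all hypothetical.

```python
import math
import random

def total_score(pairing, similarity):
    """Overall similarity of a pairing: fragment a in A is paired with pairing[a] in B."""
    return sum(similarity[a][b] for a, b in enumerate(pairing))

def monte_carlo_pairing(similarity, steps=2000, temperature=0.5, walkers=4, seed=0):
    """Random search over pairings; needs at least two fragments."""
    rng = random.Random(seed)
    n = len(similarity)
    best_pairing, best_score = None, float("-inf")
    for _ in range(walkers):                 # several independent solutions
        current = list(range(n))
        rng.shuffle(current)                 # random starting pairing
        current_score = total_score(current, similarity)
        if current_score > best_score:
            best_pairing, best_score = current[:], current_score
        for _ in range(steps):
            i, j = rng.sample(range(n), 2)   # randomly alter the pairing
            current[i], current[j] = current[j], current[i]
            new_score = total_score(current, similarity)
            delta = new_score - current_score
            # Always keep improvements; sometimes keep a worse solution.
            if delta >= 0 or rng.random() < math.exp(delta / temperature):
                current_score = new_score
                if current_score > best_score:
                    best_pairing, best_score = current[:], current_score
            else:
                current[i], current[j] = current[j], current[i]  # undo the swap
    return best_pairing, best_score

# Hypothetical similarity table for 4 fragments in each protein.
similarity = [
    [9, 1, 2, 0],
    [1, 8, 0, 2],
    [2, 0, 7, 1],
    [0, 2, 1, 6],
]
best_pairing, best_score = monte_carlo_pairing(similarity)
print(best_pairing, best_score)
```

Occasionally accepting a worse pairing is what lets a solution escape a local optimum; lowering `temperature` makes the search greedier.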
Once we have aa associations… • Can determine similarity • How?
Have to minimize aa distances • Must perturb XYZ (translation) and roll, pitch, and yaw (rotation) of one of the proteins to minimize RMSD • Like linear regression • Can’t do this until we know which aa’s are associated
Have to minimize aa distances • Some numeric methods start by fixing between 2 and 4 amino acids • Some shortcuts • Center of gravity is the average of all coordinate vectors • Translate by ave(p1) - ave(p2) • Singular value decomposition to find the rotation (like eigenvectors)
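A minimal sketch of the translation shortcut only, assuming paired lists of (x, y, z) points with a known aa correspondence. The helper names and coordinates are illustrative. The rotation step (the SVD mentioned above) is deliberately left out, so this fully superimposes two structures only when they differ by a pure translation.

```python
import math

def centroid(coords):
    """Center of gravity: the average of all coordinate vectors."""
    n = len(coords)
    return tuple(sum(p[k] for p in coords) / n for k in range(3))

def translate_onto(moving, fixed):
    """Shift `moving` so that its centroid coincides with the centroid of `fixed`."""
    c_moving, c_fixed = centroid(moving), centroid(fixed)
    shift = tuple(c_fixed[k] - c_moving[k] for k in range(3))  # ave(fixed) - ave(moving)
    return [tuple(p[k] + shift[k] for k in range(3)) for p in moving]

def rmsd(coords_a, coords_b):
    """RMSD over paired points (same order, same length)."""
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Hypothetical structures: p2 is p1 moved by (5, -2, 1) with no rotation.
p1 = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (4.0, 3.0, 0.0)]
p2 = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in p1]

rmsd_before = rmsd(p1, p2)
rmsd_after = rmsd(p1, translate_onto(p2, p1))
print(rmsd_before, rmsd_after)
```

Before the shift the RMSD equals the length of the offset, sqrt(30); after it, the RMSD is zero up to rounding. With a rotated copy, a residual would remain for the rotation step to remove.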
Score more complex so… • Requires double dynamic programming • If n×m matrix, then n times m different matrices are generated, pinning the return path to each aa pair • Used to generate a position-specific score, which is then used in aa similarity scoring • Reduces the constraint that two particular aa’s are equivalent …
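A heavily simplified sketch of the double dynamic programming idea, not SAP's actual scoring. Assumptions: each residue's structural environment is reduced to its distances to the other residues, gaps are free, and the whole lower-level path score for a pinned pair (i, j) is used as that pair's upper-level score. All function names and coordinates are hypothetical.

```python
import math

def distance_matrix(coords):
    return [[math.dist(p, q) for q in coords] for p in coords]

def best_path_score(score, rows, cols):
    """Best score of an order-preserving set of (row, col) matches; gaps are free."""
    prev = [0.0] * (len(cols) + 1)
    for r in rows:
        cur = [0.0]
        for k, c in enumerate(cols, start=1):
            cur.append(max(prev[k - 1] + score(r, c), prev[k], cur[k - 1]))
        prev = cur
    return prev[-1]

def pinned_score(dist_a, dist_b, i, j):
    """Lower-level DP: best alignment of everything else, given that i pairs with j."""
    def env(k, l):
        # High when k sits relative to i (in A) as l sits relative to j (in B).
        return 1.0 / (1.0 + abs(dist_a[i][k] - dist_b[j][l]))
    n, m = len(dist_a), len(dist_b)
    before = best_path_score(env, range(i), range(j))
    after = best_path_score(env, range(i + 1, n), range(j + 1, m))
    return before + after

def double_dp(coords_a, coords_b):
    """Upper-level DP over the n x m pinned scores; returns aligned index pairs."""
    dist_a, dist_b = distance_matrix(coords_a), distance_matrix(coords_b)
    n, m = len(dist_a), len(dist_b)
    upper = [[pinned_score(dist_a, dist_b, i, j) for j in range(m)] for i in range(n)]
    table = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            table[i][j] = max(table[i - 1][j - 1] + upper[i - 1][j - 1],
                              table[i - 1][j], table[i][j - 1])
    pairs, i, j = [], n, m
    while i > 0 and j > 0:                   # trace back the best path
        if table[i][j] == table[i - 1][j - 1] + upper[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif table[i][j] == table[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs

# Hypothetical chain and a translated copy of it.
chain_a = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 5.0), (0.0, 4.0, 5.0)]
chain_b = [(x + 10.0, y - 2.0, z + 7.0) for x, y, z in chain_a]
equivalences = double_dp(chain_a, chain_b)
print(equivalences)  # [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
```

One lower-level alignment is computed per (i, j) pair, which is where the n times m matrices come from; the cost grows roughly as n²m², so this form only suits small examples.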