
PSI-BLAST and Multiple Sequence Alignments

Doug Raiford. Lesson 5: PSI-BLAST and Multiple Sequence Alignments. Left off… Dynamic programming methods: Needleman-Wunsch (global alignment), Smith-Waterman (local alignment), BLAST. Fixed: best. Linear: next best. Polynomial (n^2): not bad. Exponential (3^n): very bad. But…


Presentation Transcript


  1. Doug Raiford. Lesson 5: PSI-BLAST and Multiple Sequence Alignments

  2. Left off… • Dynamic programming methods • Needleman-Wunsch (global alignment) • Smith-Waterman (local alignment) • BLAST. Running time: fixed: best; linear: next best; polynomial (n^2): not bad; exponential (3^n): very bad

  3. But… • BLAST is fast (linear) • But not as sensitive. Trade-off: speed vs. sensitivity

  4. How to improve sensitivity? • Similarity matrix • Especially useful with amino acids • Some amino acids have similar chemical characteristics • Similarity of each query word to all 8,000 (20^3) 3-mers is calculated • Usually ~50 are above a threshold • All of these ~50 are considered hits when searching • Matrices • PAM (Point Accepted Mutation) • Built from observed substitution rates in closely related proteins • BLOSUM (BLOcks SUbstitution Matrix) • Built from observed substitution rates in evolutionarily divergent proteins
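The neighborhood-word idea on this slide can be sketched as follows. This is a minimal illustration, not BLAST's actual implementation: the scoring function `toy_score`, the chemical groupings in `GROUPS`, and the threshold value are all hypothetical stand-ins for a real PAM or BLOSUM matrix, chosen only to keep the example short.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Hypothetical chemical groupings, for illustration only (not PAM or BLOSUM).
GROUPS = ["AVLIM", "FWY", "ST", "NQ", "DE", "KRH", "C", "G", "P"]
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def toy_score(a, b):
    """Toy similarity: identical = 5, same chemical group = 2, otherwise -2."""
    if a == b:
        return 5
    return 2 if GROUP_OF[a] == GROUP_OF[b] else -2


def neighborhood(word, threshold):
    """Return every word whose summed score against `word` meets the threshold."""
    hits = []
    for candidate in product(AMINO_ACIDS, repeat=len(word)):
        score = sum(toy_score(a, b) for a, b in zip(word, candidate))
        if score >= threshold:
            hits.append(("".join(candidate), score))
    return hits


# All possible 3-mers over 20 amino acids: 20^3 = 8,000.
all_words = list(product(AMINO_ACIDS, repeat=3))

# Words similar enough to the query word "LKD" to count as hits.
neighbors = neighborhood("LKD", threshold=11)
```

With a real substitution matrix and BLAST's default threshold, the neighborhood is larger (the slide quotes about 50 words); the toy scores above yield far fewer.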

  5. Build own matrix on the fly • PSI-BLAST (Position-Specific Iterated BLAST) • Align using a default similarity matrix • At each query position, build a Position-Specific Scoring Matrix (PSSM) from the observed search and alignment results • Repeat with the new matrix until the results no longer change. PSI-BLAST builds sensitivity by specifying the allowed similarity at each position. Slower, but still faster than local alignment
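The build-search-repeat loop can be sketched on a toy scale. This is a simplified illustration under several assumptions: hits are ungapped and the same length as the query, the background distribution is uniform, pseudocounts are a flat constant, and the "database", `threshold`, and function names are made up. Real PSI-BLAST also weights sequences and uses E-value cutoffs, none of which is shown here.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_hits, pseudocount=1.0):
    """Log2-odds score per residue per column, from equal-length ungapped hits."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for column in zip(*aligned_hits):
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, segment):
    """Sum the position-specific scores along a segment."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


def psi_iterate(query, database, threshold=2.0, max_rounds=10):
    """Rebuild the PSSM from the current hits until the hit list stops changing."""
    hits = [query]
    for _ in range(max_rounds):
        pssm = build_pssm(hits)
        found = {s for s in database if score(pssm, s) >= threshold}
        new_hits = sorted(found | {query})
        if new_hits == hits:
            break
        hits = new_hits
    return hits


# Made-up toy database of 4-residue segments.
database = ["MKVL", "MRVL", "MKIL", "MRIL", "GGGG", "WWPW"]
final_hits = psi_iterate("MKVL", database)
```

In this toy run, `MRIL` differs from the query at two positions and misses the threshold in the first round; it is picked up in the second round, once the matrix has learned from `MRVL` and `MKIL` that R and I are tolerated at those positions.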

  6. Importance of sequence alignment • Central to bioinformatics • Needed for • Phylogeny • Protein function • Protein structure • Structure → function • Drug discovery

  7. Conserved regions • Some parts of proteins are very important to maintain function • Must be similar from species to species • Can we spot these regions through alignment?
     atgccgca-actgccgcaggagatcaggactttcatgaatatcatcatgcgtggga-ttcag
     acctcgatacgtgccgcaggagatcaggactttcacct--tggatcatgcgaccgtacctac
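One simple way to spot candidate conserved regions in the pairwise alignment above is to look for long runs of identical, ungapped columns. The sketch below does only that; the function name and the `min_length` cutoff are illustrative choices, and identity is a cruder measure than the similarity scoring discussed earlier.

```python
def conserved_runs(aligned_a, aligned_b, min_length=5):
    """Return (start, text) for each run of identical, ungapped columns."""
    end = min(len(aligned_a), len(aligned_b))
    runs, start = [], None
    for i in range(end):
        same = aligned_a[i] == aligned_b[i] and aligned_a[i] != "-"
        if same and start is None:
            start = i  # a new run begins
        elif not same and start is not None:
            if i - start >= min_length:
                runs.append((start, aligned_a[start:i]))
            start = None
    # Close a run that reaches the end of the alignment.
    if start is not None and end - start >= min_length:
        runs.append((start, aligned_a[start:end]))
    return runs


# The two aligned sequences from the slide.
seq_a = "atgccgca-actgccgcaggagatcaggactttcatgaatatcatcatgcgtggga-ttcag"
seq_b = "acctcgatacgtgccgcaggagatcaggactttcacct--tggatcatgcgaccgtacctac"

runs = conserved_runs(seq_a, seq_b)
longest = max(runs, key=lambda run: len(run[1]))
```

Here `longest` is the 24-column block `tgccgcaggagatcaggactttca`, starting at 0-based column 11.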

  8. Why is this important? • Conserved regions are often near active sites • Ligand binding sites (docking) • Protein-to-protein interfaces • Regions important for tertiary structure. Ligand: small molecule, target of a protein, e.g. O2 is the ligand for hemoglobin. Substrate: a molecule upon which an enzyme acts

  9. How can we improve detection? • What if we look at more proteins? • Does that increase our confidence? • But how do we go about performing multiple sequence alignment?
     atgccgca-actgccgcaggagatcaggactttcatgaatatcatcatgcgtggga-ttcag
     acctccatacgtgccccaggagatctggactttcacc---tggatcatgcgaccgtacctac
     t-atgg-t-cgtgccgcaggagatcaggactttca-gt--g-aatcatctgg-cgc--c-aa
     t--tcgt-ac-tgccccaggagatctggactttcaaa---ca-atcatgcgcc-g-tc-tat
     aattccgtacgtgccgcaggagatcaggactttcag-t--a-tatcatctgtc-ggc--tag

  10. Exhaustively • Hyper-dimensional dynamic programming • Becomes exponential with respect to the number of sequences • O(n^L), with n = sequence length and L = number of sequences
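To get a feel for how quickly O(n^L) grows, the short sketch below counts table cells for sequences of length 100. The helper names are illustrative. The table size is on the order of n^L, and each cell in an L-dimensional table has up to 2^L - 1 predecessor cells to consider, one for every non-empty subset of sequences that advance.

```python
def dp_cells(n, num_sequences):
    """Approximate number of cells in the dynamic programming table."""
    return n ** num_sequences


def predecessors_per_cell(num_sequences):
    """Predecessor cells examined per cell in an L-dimensional table."""
    return 2 ** num_sequences - 1


# Growth for sequences of length 100 as more sequences are added.
for num_sequences in range(2, 7):
    print(num_sequences,
          dp_cells(100, num_sequences),
          predecessors_per_cell(num_sequences))
```

Two sequences need about 10^4 cells; six sequences of the same length already need about 10^12.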

  11. Progressive approach • Determine all pair-wise distances • Fast: number of l-mer matches • Slower: full global alignments • Start with the closest pair and align them • Then align the next closest sequence to those two • And so on… ClustalW: cluster-alignment
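The ordering step of the progressive approach can be sketched as below. This is a simplification that follows the slide's description literally: distances come from shared l-mers (here one minus the Jaccard similarity of 3-mer sets, which is one possible choice), and sequences are added greedily by distance to anything already chosen. ClustalW itself builds a guide tree rather than a flat order, and the alignment step is not shown. The sequences and names are made up.

```python
from itertools import combinations


def kmers(seq, k=3):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=3):
    """Fast distance: 1 - (shared k-mers / all k-mers seen in either sequence)."""
    ka, kb = kmers(a, k), kmers(b, k)
    return 1.0 - len(ka & kb) / len(ka | kb)


def progressive_order(seqs, k=3):
    """Order in which sequences would be added to the growing alignment."""
    names = list(seqs)
    dist = {
        frozenset(pair): kmer_distance(seqs[pair[0]], seqs[pair[1]], k)
        for pair in combinations(names, 2)
    }
    # Start with the closest pair.
    order = sorted(min(dist, key=dist.get))
    remaining = [name for name in names if name not in order]
    # Then repeatedly add the sequence closest to anything already included.
    while remaining:
        nxt = min(
            remaining,
            key=lambda name: min(dist[frozenset((name, m))] for m in order),
        )
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Made-up example sequences.
seqs = {
    "a": "ATGCCGCAGG",
    "b": "ATGCCGCAGC",
    "c": "ATGCCTTAGC",
    "d": "TTTTAAAACC",
}
order = progressive_order(seqs)
```

Sequences `a` and `b` share 7 of 9 distinct 3-mers, so they are aligned first; `c` joins next and the unrelated `d` last.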

  12. Aligning to a set of previously aligned sequences • Profile: matrix of real values, representing the probability of each amino acid at each position in a corresponding multiple sequence alignment • A modification of the Smith-Waterman algorithm • The degree to which an amino acid is preferred is the degree of match between the profile and the sequence
     Consensus  1 M.ERS.HLPEG.PFAAALSGARFAAQSSGN.ASVL..DWNVLP.E 38
                  | : : : || : ::::: : |: | ::|: : | :
     OPSD_XENLA 1 MNG.GTE..EGPN.NFYVP.PMS...SN.NKTGVVRSP.P..PFD 33
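A profile and a profile-based local alignment can be sketched as follows. The profile is the per-column residue probability described on the slide. The mapping from probability to score (`2 * p - 1`, so a residue never seen in a column scores -1 and one always seen scores +1), the linear gap penalty, and the toy DNA alignment are assumptions made for this example; real tools typically use log-odds scores and affine gaps.

```python
def profile_from_msa(msa, alphabet):
    """Per-column probability of each residue, ignoring gap characters."""
    profile = []
    for column in zip(*msa):
        residues = [c for c in column if c != "-"]
        if residues:
            profile.append({a: residues.count(a) / len(residues) for a in alphabet})
        else:
            profile.append({a: 0.0 for a in alphabet})  # all-gap column
    return profile


def align_to_profile(profile, seq, gap=-1.0):
    """Smith-Waterman-style local alignment score of seq against a profile."""
    rows, cols = len(profile) + 1, len(seq) + 1
    table = [[0.0] * cols for _ in range(rows)]
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            # Assumed scoring: map probability p in [0, 1] to 2p - 1 in [-1, 1].
            match = 2 * profile[i - 1].get(seq[j - 1], 0.0) - 1
            table[i][j] = max(
                0.0,                          # start a new local alignment
                table[i - 1][j - 1] + match,  # residue aligned to profile column
                table[i - 1][j] + gap,        # profile column aligned to a gap
                table[i][j - 1] + gap,        # residue aligned to a gap
            )
            best = max(best, table[i][j])
    return best


# Made-up toy alignment; the third column is G in two sequences and C in one.
msa = ["ACGT", "ACGT", "ACCT"]
profile = profile_from_msa(msa, "ACGT")

good = align_to_profile(profile, "TTACGTTT")  # contains the motif
poor = align_to_profile(profile, "GGGG")      # matches one column weakly
```

The only change from ordinary Smith-Waterman is that the match score depends on the profile column rather than on a fixed pair of residues.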

  13. Issues • Mistakes made early in a progressive approach are propagated throughout the process • Once aligned, sequences are not revisited • Iterative methods have been devised to revisit earlier alignments • Newest version of ClustalW (version 2) includes iteration • Other MSA apps • T-Coffee • PSalign • DIALIGN • MUSCLE

  14. Visualizing with a motif logo • Height of letter represents how prevalent that letter is at that position
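Letter heights for one column of a logo can be sketched as below. This assumes the common sequence-logo convention, in which the whole stack is as tall as the column's information content in bits and each letter takes a share proportional to its frequency; the slide does not say which convention its logo uses. The function name is illustrative and no small-sample correction is applied.

```python
import math


def logo_heights(column, alphabet="acgt"):
    """Height in bits of each letter for one alignment column."""
    residues = [c for c in column if c in alphabet]  # gaps are ignored
    freqs = {a: residues.count(a) / len(residues) for a in alphabet}
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    information = math.log2(len(alphabet)) - entropy  # total stack height
    return {a: p * information for a, p in freqs.items()}


conserved = logo_heights("aaaa")  # fully conserved column
mixed = logo_heights("acgt")      # every letter equally likely
```

A fully conserved DNA column gives a single letter 2 bits tall, while a column where all four letters are equally likely has zero height.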

  15. Bit Scores • Raw scores are affected by sequence lengths • To compare scores across different query lengths, they need to be normalized • The term “bit” comes from the fact that probabilities are stored as log2 values (binary, bit) • This is done so that values can be added along the length of the sequence instead of multiplied. Database Searches
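The reason for storing log2 values can be shown in a few lines: adding logs gives the same answer as multiplying probabilities, and it keeps working on long sequences where the raw product underflows. The probabilities below are made-up numbers for illustration.

```python
import math

# Made-up per-position probabilities.
probs = [0.5, 0.25, 0.125]

product = math.prod(probs)                      # multiply probabilities
bit_sum = sum(math.log2(p) for p in probs)      # add log2 values instead

# The two views agree: 2 ** (sum of log2 values) equals the product.
agrees = math.isclose(2 ** bit_sum, product)

# On a long sequence the raw product underflows to zero, the log sum does not.
long_probs = [0.05] * 300
long_product = math.prod(long_probs)
long_bit_sum = sum(math.log2(p) for p in long_probs)
```

For the three-position example the sum is exactly -6 bits, matching a product of 1/64.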
