
EX3


miette

Presentation Transcript


  1. EX3 Sequence Alignment: BLAST and PSI-BLAST

  2. Outline • Pairwise alignment: • Alignment with gaps • Global alignment • Local alignment • BLAST: • NCBI BLAST web server • NCBI PSI-BLAST web server • BLAST through Chimera

  3. Introduction The Limits of Sequence Similarity

  4. Introduction Example: Aligning Two Globins Human Hemoglobin (HH): VLSPADKTNVKAAWGKVGAHAGYEG Sperm Whale Myoglobin (SWM): VLSEGEWQLVLHVWAKVEADVAGHG

  5. Introduction Example: Aligning Two Globins (HH) VLSPADKTNVKAAWGKVGAHAGYEG (SWM) VLSEGEWQLVLHVWAKVEADVAGHG • No Gaps: • Percent identity: 36 • Percent similarity: 40

  6. Introduction Example: Aligning Two Globins (HH) VLSPADKTNVKAAWGKVGAH-AGYEG (SWM) VLSEGEWQLVLHVWAKVEADVAGH-G • With Gaps: • Gaps: 2 • Percent identity: 45.833 (instead of 36 without gaps) • Percent similarity: 54.167 (instead of 40 without gaps)
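The identity figures on the two globin slides can be reproduced with a short sketch. It assumes that columns containing a gap are excluded from the denominator, which is the convention that yields 45.833 for the gapped alignment; other tools divide by the full alignment length instead. The function name `percent_identity` is illustrative, not taken from the slides.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns that contain no gap (assumed convention)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only the columns where both sequences have a residue.
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


ungapped = percent_identity("VLSPADKTNVKAAWGKVGAHAGYEG",
                            "VLSEGEWQLVLHVWAKVEADVAGHG")
gapped = percent_identity("VLSPADKTNVKAAWGKVGAH-AGYEG",
                          "VLSEGEWQLVLHVWAKVEADVAGH-G")
print(round(ungapped, 3), round(gapped, 3))  # prints 36.0 45.833
```

Percent similarity is not computed here because it depends on which substitution matrix defines "similar" residues, and the slides do not say which one was used.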

  7. Introduction How are gaps created? Indels (insertions and deletions) are rare in evolution. They vary in size from one base pair to a section of a chromosome.

  8. Introduction Types of Gap Penalties • Once a gap is created, it is easy to extend: • Gap open – penalty for the first residue in a gap • Gap extension – penalty for each additional residue in a gap. Conclusion: gap opening and gap extension should be penalized differently; gap opening gets the higher penalty.
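The two penalty types can be combined into a single cost per gap, usually called an affine gap penalty. The sketch below follows the slide's definition (the opening penalty covers the first residue, the extension penalty each additional one). The default values are illustrative assumptions, not values given in the slides.

```python
def affine_gap_penalty(length: int, gap_open: float = 10.0,
                       gap_extend: float = 0.5) -> float:
    """Cost of one gap of `length` residues under an affine scheme.

    gap_open is charged for the first gapped residue and gap_extend for
    every additional one. The defaults are illustrative assumptions.
    """
    if length <= 0:
        return 0.0
    return gap_open + (length - 1) * gap_extend


# One gap of three residues is much cheaper than three separate gaps.
print(affine_gap_penalty(3))        # 11.0
print(3 * affine_gap_penalty(1))    # 30.0
```

Because opening costs more than extending, an aligner using this scheme prefers a few long gaps over many short ones, which matches the observation that indels are rare events.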

  9. Introduction Protein scoring matrices

  10. Introduction Protein scoring matrices • Approximate equivalences, ordered from closer sequences to distant sequences: • PAM100 ≈ BLOSUM90 • PAM120 ≈ BLOSUM80 • PAM160 ≈ BLOSUM60 • PAM200 ≈ BLOSUM52 • PAM250 ≈ BLOSUM45

  11. Introduction Scoring • The final alignment score is the sum of the positive scores and the penalty scores: • From the scoring matrix: + identities, + similarities, − substitutions • From the gap penalties: − gap insertions, − gap extensions
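As a rough illustration of how the matrix scores and gap penalties add up, the sketch below scores an existing gapped alignment. The substitution function is a toy stand-in (+2 for identity, −1 otherwise) rather than a real PAM or BLOSUM matrix, and the gap values are likewise assumptions chosen to keep the arithmetic easy to follow.

```python
from typing import Callable


def score_alignment(seq_a: str, seq_b: str,
                    substitution: Callable[[str, str], float],
                    gap_open: float = 3.0, gap_extend: float = 1.0,
                    gap: str = "-") -> float:
    """Sum substitution scores and subtract gap penalties, column by column."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    score = 0.0
    previous_gap = None  # which sequence was gapped in the previous column
    for a, b in zip(seq_a, seq_b):
        current_gap = "a" if a == gap else "b" if b == gap else None
        if current_gap is None:
            score += substitution(a, b)      # identity / similarity / substitution
        elif current_gap == previous_gap:
            score -= gap_extend              # continuing an existing gap
        else:
            score -= gap_open                # first residue of a new gap
        previous_gap = current_gap
    return score


def toy_substitution(a: str, b: str) -> float:
    """Hypothetical scoring: +2 for identical residues, -1 otherwise."""
    return 2 if a == b else -1


# 9 identities (+18), 5 substitutions (-5), 1 gap opening (-3)
print(score_alignment("ADLGAVFALCDRYFQ", "ADLGRTQN-CDRYYQ", toy_substitution))  # 10.0
```

With a real matrix, `toy_substitution` would be replaced by a lookup that also rewards similar (not only identical) residue pairs.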

  12. Pairwise Alignment Local vs. Global • Global alignment – finds the best alignment across the full length of both sequences:
          ADLGAVFALCDRYFQ
          ||||     |||| |
          ADLGRTQN-CDRYYQ
      • Local alignment – finds regions of similarity within parts of the sequences:
          ADLG   CDRYFQ
          ||||   |||| |
          ADLG   CDRYYQ

  13. Pairwise Alignment Global: Needleman & Wunsch (1970) • Involves an iterative matrix method of calculation Needleman, S. B. and Wunsch, C. D., 1970
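The "iterative matrix method" can be sketched as a dynamic-programming table in which each cell holds the best score for aligning two prefixes. This is a minimal version that assumes a simple match/mismatch score and a linear gap penalty. Production tools such as EMBOSS needle use a substitution matrix and separate open/extend penalties, so scores will differ.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1):
    """Global alignment with a linear gap penalty (simplifying assumption)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


total, top, bottom = needleman_wunsch("ACGT", "AGT")
print(total, top, bottom)  # 2 ACGT A-GT
```

The table has (n+1) × (m+1) cells, so time and memory grow with the product of the two sequence lengths.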

  14. Pairwise Alignment http://www.ebi.ac.uk/Tools/psa/emboss_needle/

  15. Pairwise Alignment http://www.ebi.ac.uk/Tools/psa/emboss_needle/

  16. Pairwise Alignment Local: Smith & Waterman (1981) • Finds the optimal alignment of the best segment of similarity between two sequences • Suited to sequences that contain highly similar regions • Use when one sequence is short and the other is very long • Can return a number of aligned segments Smith, T.F. and Waterman, M.S., 1981
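A minimal sketch of local alignment differs from the global case in two ways: cell scores are never allowed to drop below zero, and the traceback starts at the highest-scoring cell rather than the corner. As before, the match/mismatch values and the linear gap penalty are assumptions for illustration, and only the single best segment is returned.

```python
def smith_waterman(a: str, b: str, match: int = 2,
                   mismatch: int = -1, gap: int = -2):
    """Best local alignment with a linear gap penalty (simplifying assumption)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0,                          # start a new local alignment
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until the score falls to zero.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("GGGGACGTCCCC", "TTACGTAA"))  # (8, 'ACGT', 'ACGT')
```

Returning several aligned segments, as the slide mentions, would require repeating the traceback from other high-scoring cells; that is left out of this sketch.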

  17. Pairwise Alignment http://www.ebi.ac.uk/Tools/psa/emboss_water/

  18. Pairwise Alignment http://www.ebi.ac.uk/emboss/align/

  19. BLAST/PSI-BLAST • BLAST: searches your sequence against a sequence database • PSI-BLAST: searches with a PSSM against a sequence database

  20. BLAST (Basic Local Alignment Search Tool) • Goal: a fast search for homologues in a huge database • BLAST is a heuristic method: it avoids an explicit search of the entire alignment matrix by discarding most irrelevant sequences. • Key concept: homologous sequences are expected to contain short ungapped segments, with substitutions but without gaps. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990) “Basic local alignment search tool.” J. Mol. Biol. 215: 403-410
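The heuristic can be illustrated with a heavily simplified seeding step: index the short words of the query, then keep only database sequences that share at least one word with it. This sketch uses exact word matches only. The real BLAST algorithm also accepts similar "neighborhood" words above a score threshold and then extends each hit, so the function names and the word size of 3 here are illustrative assumptions.

```python
from collections import defaultdict


def index_words(sequence: str, word_size: int = 3) -> dict:
    """Map every word of length word_size to its start positions."""
    index = defaultdict(list)
    for pos in range(len(sequence) - word_size + 1):
        index[sequence[pos:pos + word_size]].append(pos)
    return index


def find_seeds(query: str, subject: str, word_size: int = 3) -> list:
    """Return (query_pos, subject_pos) pairs where an identical word occurs."""
    query_index = index_words(query, word_size)
    seeds = []
    for s_pos in range(len(subject) - word_size + 1):
        word = subject[s_pos:s_pos + word_size]
        for q_pos in query_index.get(word, []):
            seeds.append((q_pos, s_pos))
    return seeds


def candidate_subjects(query: str, database: dict, word_size: int = 3) -> list:
    """Discard database sequences that share no word with the query."""
    return [name for name, seq in database.items()
            if find_seeds(query, seq, word_size)]


database = {"hit": "AAASPADKAAA", "miss": "GGGGGGGG"}  # toy database
print(find_seeds("VLSPADKTNV", database["hit"]))   # [(2, 3), (3, 4), (4, 5)]
print(candidate_subjects("VLSPADKTNV", database))  # ['hit']
```

Only the surviving candidates would then go through the more expensive extension and alignment steps, which is where the speed-up over a full Smith-Waterman search comes from.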

  21. PSI-BLAST 1. Run a standard protein-protein BLAST search. 2. Build a position-specific scoring matrix (PSSM, or profile) from a multiple alignment of the sequences returned with low Expect values (E-values). 3. Run a BLAST search with the PSSM as the query. 4. Refine the PSSM by adding the new database sequences. 5. Stop when no more matches to new database sequences are found; otherwise, return to step 3.
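The control flow of this iteration can be sketched independently of any real search engine. In the sketch below, `build_profile` and `search_with_profile` are hypothetical callables supplied by the caller, and `max_rounds` is an assumed safety limit; nothing here calls an actual BLAST implementation.

```python
from typing import Callable, Iterable, Set, Tuple


def iterate_profile_search(initial_hits: Iterable[str],
                           build_profile: Callable[[Set[str]], object],
                           search_with_profile: Callable[[object], Iterable[str]],
                           max_rounds: int = 5) -> Tuple[Set[str], int]:
    """Repeat profile building and searching until no new sequences appear."""
    included = set(initial_hits)          # hits from the initial standard search
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        profile = build_profile(included)             # build or refine the PSSM
        hits = set(search_with_profile(profile))      # search with the PSSM
        new_hits = hits - included
        if not new_hits:                              # converged: nothing new found
            break
        included |= new_hits
    return included, rounds


# Toy stand-ins: each sequence "finds" one more distant relative.
relatives = {"q": {"a"}, "a": {"b"}, "b": {"c"}, "c": set()}


def toy_search(profile):
    found = set(profile)
    for member in profile:
        found |= relatives[member]
    return found


print(iterate_profile_search({"q", "a"}, frozenset, toy_search))
```

In the toy run, the search reaches `c` only through `b`, which mirrors how PSI-BLAST can pick up distant homologues that the original query alone would miss.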

  22. PSSM: Position-Specific Scoring Matrix • Given a query sequence: • Align all sequences above a certain similarity • Each cell (i,j) represents the probability of residue i being at position j of the multiple alignment.
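A simplified version of such a matrix can be built by counting residues in each alignment column. The sketch below assumes a uniform pseudocount so that unseen residues do not get probability zero, and it skips gap characters. PSI-BLAST itself uses sequence weighting and more elaborate pseudocounts and converts the result to log-odds scores, none of which is reproduced here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_probability_matrix(alignment: list, pseudocount: float = 1.0) -> dict:
    """Return matrix[residue][position] = estimated probability of the residue.

    Uses a uniform pseudocount (an assumption, not PSI-BLAST's scheme).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    matrix = {aa: [0.0] * length for aa in AMINO_ACIDS}
    for j in range(length):
        # Count residues in column j, ignoring gaps and unknown symbols.
        counts = Counter(seq[j] for seq in alignment if seq[j] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        for aa in AMINO_ACIDS:
            matrix[aa][j] = (counts[aa] + pseudocount) / total
    return matrix


pssm = build_probability_matrix(["VLS", "VLT", "VMS", "V-S"])
print(round(pssm["V"][0], 3), round(pssm["L"][1], 3))
```

Each column sums to 1, and a fully conserved position (here the leading V) gets a much higher probability for its residue than a variable one.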

  23. PSI-BLAST Outline

  24. General Issues • Where? (to find homologues) • Structural templates: search against the PDB • Sequence homologues: search against SwissProt or UniProt • How long? (length of homologues) • Fragments: short homologues (less than 50-60% of the query’s length) give a relatively bad alignment • Ensure your sequences contain the wanted domain(s) • N- and C-termini tend to vary in length between homologues
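The length criterion is easy to apply once hits have been collected. The sketch below drops hits shorter than a chosen fraction of the query length; the 0.6 cut-off is one assumed reading of the "50-60%" guideline, and the hit names and sequences are made up for illustration.

```python
def filter_fragments(query_length: int, hits: dict,
                     min_fraction: float = 0.6) -> dict:
    """Keep hits whose length is at least min_fraction of the query length."""
    threshold = min_fraction * query_length
    return {name: seq for name, seq in hits.items() if len(seq) >= threshold}


# Hypothetical hits for a 273-residue query.
hits = {"full_length": "A" * 270, "fragment": "A" * 100}
print(list(filter_fragments(273, hits)))  # ['full_length']
```

A length filter does not check that the wanted domain is actually covered, so the remaining hits still need to be inspected in the alignment.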

  25. General Issues • From whom? (which species the sequence belongs to) • Don’t care, all homologues are welcome • Orthologues/paralogues may be helpful • Sequences from distant/close species provide different types of information • Which method? (BLAST/PSI-BLAST) • Depends…

  26. General Issues • Which method? (BLAST/PSI-BLAST) • BLAST: • identify the query sequence • find protein sequences similar to the query • PSI-BLAST: • find very distantly related proteins • find new members of a protein family • build a custom position-specific scoring matrix • use when BLAST gives poor results

  27. No “miracle solution” → each protein is a different story → adjust parameters: • BLAST: E-value, substitution matrix, gap penalties, database, word length… • PSI-BLAST: BLAST parameters + PSSM inclusion threshold (or choose sequences manually), number of rounds…

  28. The Query Protein Name: Dihydrodipicolinate reductase Enzyme reaction: Molecular process: Lysine biosynthesis (early stages) Organism: E. coli Sequence length: 273 aa

  29. The Query Protein Query: DAPB_ECOLI <DAPB_ECOLI MHDANIRVAIAGAGGRMGRQLIQAALALEGVQLGAALEREGSSLLGSDAGELAGAGKTGVTVQSSLDAVKDDFDVFIDFTRPEGTLNHLAFCRQHGKGMVIGTTGFDEAGKQAIRDAAADIAIVFAANFSVGVNVMLKLLEKAAKVMGDYTDIEIIEAHHRHKVDAPSGTALAMGEAIAHALDKDLKDCAVYSREGHTGERVPGTIGFATVRAGDIVGEHTAMFADIGERLEITHKASSRMTFANGAVRSALWLSGKESGLFDMRDVLDLNNL

  30. BLAST

  31. BLAST NCBI http://www.ncbi.nlm.nih.gov/blast/Blast.cgi BLASTp

  32. BLAST NCBI http://www.ncbi.nlm.nih.gov/blast/Blast.cgi Query Sequence Database BLASTp Run

  33. BLAST NCBI As many as possible • E-value threshold • Matrix

  34. BLAST NCBI http://www.ncbi.nlm.nih.gov/blast/Blast.cgi Mark all Mark only wanted

  35. BLAST NCBI http://www.ncbi.nlm.nih.gov/blast/Blast.cgi

  36. BLAST NCBI http://www.ncbi.nlm.nih.gov/blast/Blast.cgi

  37. PSI-BLAST

  38. PSI-BLAST NCBI http://www.ncbi.nlm.nih.gov/blast/Blast.cgi Query Sequence Database Run PSI-BLAST

  39. PSI-BLAST NCBI http://www.ncbi.nlm.nih.gov/blast/Blast.cgi Pre-calculated PSSM Threshold for inclusion in PSSM

  40. PSI-BLAST NCBI http://www.ncbi.nlm.nih.gov/blast/Blast.cgi Run next round Not found in previous round Include sequence in the PSSM
