
BLAST


neo

Presentation Transcript


  1. BLAST Basic Local Alignment Search Tool

  2. For successful fishing, you need to choose a rod, bait and a body of water that suit the biological question. BLAST is the rod: BLAST (Basic Local Alignment Search Tool) allows rapid comparison of a query sequence (nucleotides or amino acids), the bait on the rod, against a database, the big sea.

  3. Comparing the query sequence to known sequences in databases is fundamental to understanding the relatedness of any query sequence to other known proteins or DNA sequences. What for? Applications include: • Identifying shared similarities with sequences already deposited in the databanks (orthologs and paralogs?) • Discovering new genes or proteins (ascertaining the existence of a putative ORF) • Discovering variants of genes or proteins • Identifying functional motifs shared with other proteins • Investigating expressed sequence tags (ESTs) • Exploring protein structure and function

  4. Why use local alignment for database searches? Local alignment is a useful approach to DB searching because many query sequences have domains, active sites or other motifs that have local but not global regions of similarity to other sequences.

  5. BLAST (1) For the query, find the list of high-scoring words of length w. For each word in the query sequence (of length L), find the list of words that score at least T when scored using a pair-score matrix (e.g. PAM250, BLOSUM).
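Step (1) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the 7-letter alphabet and the scores (+5 identical, +2 within a made-up similarity group, -1 otherwise) are hypothetical stand-ins for a real matrix such as PAM250 or BLOSUM, and the names `neighborhood` and `query_word_list` are invented for this example.

```python
from itertools import product

# Hypothetical toy scoring (NOT PAM250 or BLOSUM): +5 for identical residues,
# +2 for residues in the same made-up similarity group, -1 otherwise.
ALPHABET = "ILVKRDE"
GROUPS = (set("ILV"), set("KR"), set("DE"))

def pair_score(a: str, b: str) -> int:
    if a == b:
        return 5
    if any(a in g and b in g for g in GROUPS):
        return 2
    return -1

def neighborhood(word: str, t: int) -> list[str]:
    """All words of the same length that score at least T against `word`."""
    return ["".join(c) for c in product(ALPHABET, repeat=len(word))
            if sum(pair_score(a, b) for a, b in zip(word, c)) >= t]

def query_word_list(query: str, w: int, t: int) -> dict[int, list[str]]:
    """Map each start position in the query (length L) to the neighborhood
    of the w-letter word starting there; there are L - w + 1 such words."""
    return {i: neighborhood(query[i:i + w], t) for i in range(len(query) - w + 1)}

print(neighborhood("IKD", 12))  # ['IKD', 'IKE', 'IRD', 'LKD', 'VKD']
```

Lowering T enlarges each word list, which makes the search more sensitive but slower; raising T does the opposite.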

  6. BLAST (cont.) (2) Compare the word list to the database and identify exact matches of words from the word list in the database sequences. (3) For each word match, extend the alignment in both directions to find alignments that score greater than a threshold value S: maximal segment pairs (MSPs).
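Steps (2) and (3) can be sketched as a scan for exact word hits followed by an ungapped extension in both directions. This is a simplified sketch, not the NCBI implementation: the +5/-4 scoring is hypothetical, the word list holds only the query's own words (no T-neighborhood), and the "X-drop" stopping rule (stop once the running score falls more than `x_drop` below the best seen) is an assumption about how "extend" is bounded. The function names are invented for illustration.

```python
# Hypothetical scoring for this sketch: +5 identity, -4 mismatch.
def pair_score(a: str, b: str) -> int:
    return 5 if a == b else -4

def index_words(query: str, w: int) -> dict[str, list[int]]:
    """Word list: every w-letter query word -> its start positions.
    Simplification: exact query words only, no T-neighborhood."""
    idx: dict[str, list[int]] = {}
    for i in range(len(query) - w + 1):
        idx.setdefault(query[i:i + w], []).append(i)
    return idx

def best_extension(pairs, x_drop: int) -> tuple[int, int]:
    """Extend along aligned residue pairs; return (best gain, length at best).
    Stop once the running score falls more than x_drop below the best so far."""
    best = best_len = run = 0
    for n, (a, b) in enumerate(pairs, start=1):
        run += pair_score(a, b)
        if run > best:
            best, best_len = run, n
        elif best - run > x_drop:
            break
    return best, best_len

def find_msps(query: str, subject: str, w: int = 3, s: int = 20, x_drop: int = 10):
    """Return (query_start, subject_start, length, score) for ungapped
    segment pairs built from exact word hits and scoring greater than S."""
    idx = index_words(query, w)
    found = set()
    for j in range(len(subject) - w + 1):
        for i in idx.get(subject[j:j + w], []):  # (2) exact word match
            seed = sum(pair_score(a, b) for a, b in zip(query[i:i + w], subject[j:j + w]))
            right, r_len = best_extension(zip(query[i + w:], subject[j + w:]), x_drop)
            left, l_len = best_extension(zip(reversed(query[:i]), reversed(subject[:j])), x_drop)
            score = seed + left + right  # (3) extended in both directions
            if score > s:
                found.add((i - l_len, j - l_len, l_len + w + r_len, score))
    return sorted(found)

print(find_msps("MKTAYIAKQR", "GGMKTAYIAKQRGG"))  # [(0, 2, 10, 50)]
```

Every word hit along the same diagonal extends to the same segment, which is why the results are collected in a set.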

  7. BLAST is a heuristic algorithm. Rather than comparing the full query sequence to the full length of each sequence in the database (the search space), it performs a partial search based on an approximation. Speed vs. sensitivity: it does not find ALL best matches!!! False negatives. How should we evaluate the results we get?
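The false negatives can be illustrated by contrasting the seeding requirement with an exhaustive local alignment. The sketch below uses Smith–Waterman, the full dynamic-programming local alignment that BLAST avoids for speed, with a hypothetical +5/-4 scoring and a linear gap penalty of -8. The two example sequences are made up: they clearly share a similar region, yet contain no identical 3-letter word, so a search that requires an exact 3-letter word hit never starts an alignment between them.

```python
def pair_score(a: str, b: str) -> int:
    return 5 if a == b else -4  # hypothetical scoring

def smith_waterman(q: str, s: str, gap: int = -8) -> int:
    """Best local alignment score over all cells (linear gap penalty)."""
    prev = [0] * (len(s) + 1)
    best = 0
    for i in range(1, len(q) + 1):
        cur = [0] * (len(s) + 1)
        for j in range(1, len(s) + 1):
            cur[j] = max(0,
                         prev[j - 1] + pair_score(q[i - 1], s[j - 1]),  # (mis)match
                         prev[j] + gap,                                 # gap in s
                         cur[j - 1] + gap)                              # gap in q
            best = max(best, cur[j])
        prev = cur
    return best

def shares_word(q: str, s: str, w: int) -> bool:
    """True if q and s have at least one identical w-letter word (a seed)."""
    words = {q[i:i + w] for i in range(len(q) - w + 1)}
    return any(s[j:j + w] in words for j in range(len(s) - w + 1))

query, hit = "MKTAYIAK", "MKWAYWAK"
print(shares_word(query, hit, 3), smith_waterman(query, hit))
```

The exhaustive method reports a positive local score while no 3-letter seed exists; a smaller word size would catch this pair, at the cost of many more random word hits.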

  8. The raw score "S" of an alignment is usually calculated by summing the scores for matches, mismatches and gaps in the alignment. Normalized score (bits): bit scores from different alignments, even those employing different scoring matrices, can be compared. The higher the score, the better the alignment, BUT the significance of an alignment cannot be deduced from the score alone.
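The normalization to bits uses the standard Karlin–Altschul conversion S' = (λS − ln K) / ln 2, where λ and K are statistical parameters that depend on the scoring matrix and gap penalties. The parameter values below are made up for illustration; real values come from the scoring system in use.

```python
import math

def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized score S' = (lambda * S - ln K) / ln 2, in bits."""
    return (lam * raw_score - math.log(k)) / math.log(2)

# Two hypothetical scoring systems with made-up lambda and K values:
a = bit_score(50, lam=0.30, k=0.1)
b = bit_score(100, lam=0.15, k=0.1)
print(round(a, 2), round(b, 2))
```

Raw scores of 50 and 100 look very different, but under these two scoring systems they correspond to the same bit score, which is why bit scores are the comparable quantity.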

  9. E-value (Expectation value) • An E-value of 10 for a match means that, in a database of the current size, one might expect to see 10 matches with a similar or better score simply by chance. • The E-value is the most commonly used threshold in database searches. Only hits with E-values smaller than the set threshold are reported in the output. • Increasing the E-value threshold lets you see sequences that may be biologically related but are not statistically significant.
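In bit-score terms the expected number of chance hits is E = m · n · 2^(−S'), equivalent to K · m · n · e^(−λS), where m is the query length and n the total database length. The sketch below ignores the effective-length corrections that BLAST applies, and the hit names and scores are hypothetical.

```python
def evalue_from_bits(bits: float, m: int, n: int) -> float:
    """E = m * n * 2**(-S'): expected chance hits scoring at least S' bits."""
    return m * n * 2.0 ** (-bits)

def report(hits: list[tuple[str, float]], m: int, n: int, threshold: float = 10.0):
    """Keep (name, E) only for hits whose E-value is smaller than the threshold."""
    out = []
    for name, bits in hits:
        e = evalue_from_bits(bits, m, n)
        if e < threshold:
            out.append((name, e))
    return out

print(evalue_from_bits(10, m=32, n=32))  # 1.0
print(evalue_from_bits(10, m=32, n=64))  # 2.0
```

The same bit score gives twice the E-value in a database twice as large, which is why the E-value is defined relative to "a database of current size".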

  10. To evaluate the alignment: • Examine statistical parameters: normalized score, E-value, % identity, % similarity, % gaps • Examine the alignment itself. • Use biological common sense. Don't rely only on statistical significance!!!
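The percentage parameters can be computed from the two rows of a pairwise alignment. A minimal sketch, assuming "-" marks a gap and using made-up similarity groups in place of the positive scores of a real substitution matrix; percentages are taken over the full alignment length including gaps, which is one common convention.

```python
# Hypothetical similarity groups standing in for positive matrix scores.
GROUPS = (set("ILVM"), set("KR"), set("DE"), set("FYW"), set("ST"), set("NQ"))

def similar(a: str, b: str) -> bool:
    return a == b or any(a in g and b in g for g in GROUPS)

def alignment_stats(top: str, bottom: str) -> dict[str, float]:
    """% identity, % similarity and % gaps over the alignment length."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    pairs = list(zip(top, bottom))
    ident = sum(a == b and a != "-" for a, b in pairs)
    simil = sum(a != "-" and b != "-" and similar(a, b) for a, b in pairs)
    gaps = sum(a == "-" or b == "-" for a, b in pairs)
    n = len(pairs)
    return {"identity": 100 * ident / n,
            "similarity": 100 * simil / n,
            "gaps": 100 * gaps / n}

print(alignment_stats("IKD-E", "LKDRE"))
# {'identity': 60.0, 'similarity': 80.0, 'gaps': 20.0}
```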

  11. What can we do if there are too many matches? We can't see the forest for the trees: too many repeats of the same highly significant sequences, so sequences with lower similarity, which may also be interesting, are not seen.

  12. Limit the DB • Limit the organism • Filter reported entries by keyword • (Limit to a specific domain) • Change the matrix and/or gap penalties • Lower the E-value threshold • Add a filter for low-complexity regions (counting the different possibilities)
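A low-complexity filter looks for regions made of only a few different residues. BLAST uses the SEG algorithm for proteins and DUST for nucleotides; the sketch below is a simplified stand-in, not either of those, that measures Shannon entropy (in bits) in a sliding window and masks windows below a cutoff. The window size of 12 and the 2.0-bit cutoff are arbitrary illustrative choices.

```python
import math
from collections import Counter

def shannon_entropy(window: str) -> float:
    """Entropy in bits of the residue composition of `window`."""
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in Counter(window).values())

def mask_low_complexity(seq: str, window: int = 12,
                        min_entropy: float = 2.0, mask: str = "X") -> str:
    """Mask every residue in any window whose entropy is below min_entropy."""
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < min_entropy:
            masked[start:start + window] = mask * window
    return "".join(masked)

print(mask_low_complexity("ACDEFGHIKLMN" + "Q" * 12 + "PRSTVWY"))
```

A poly-Q stretch has entropy 0 and is masked, so it can no longer seed the flood of spurious hits that makes the output hard to read.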

  13. What can we do if there are hardly any matches?

  14. Check choice of DB • Check choice of organism • Remove filter for low complexity • Change matrix or gap penalties • Increase E-value

  15. DNA vs. protein searches. If we have a nucleotide sequence, should we search the DNA databases only? Or should we translate it to protein and search protein databases? Translating causes loss of information, but protein sequence is more conserved than DNA sequence. It is therefore advisable to translate a nucleotide sequence to protein and search protein databases for homology.
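One reason protein comparisons are more sensitive is codon redundancy: two DNA sequences can differ at synonymous positions while encoding the same protein. A minimal sketch using the standard genetic code, translating only frame +1; the two DNA sequences are made up, and the identity measure is a plain position-by-position count that assumes equal lengths and no gaps.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna: str) -> str:
    """Translate frame +1; unknown codons become 'X', a trailing partial codon is dropped."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))

def percent_identity(a: str, b: str) -> float:
    """Ungapped identity of two equal-length sequences."""
    return 100 * sum(x == y for x, y in zip(a, b)) / len(a)

dna1, dna2 = "CTGAAAGGC", "TTAAAGGGT"
print(round(percent_identity(dna1, dna2), 1))                        # 55.6
print(translate(dna1), translate(dna2))                              # LKG LKG
print(percent_identity(translate(dna1), translate(dna2)))            # 100.0
```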

  16. Why use a nucleotide sequence after all? • No ORF found. • No similar protein sequences were found • Specific DNA databases are available (EST) • To find duplicated genes in a genome • To find pseudogenes • To find the location of non-protein coding genes in the genome (siRNA etc.)

  17. BLAST flavors • BlastN: nucleotide query versus nucleotide database • BlastP: protein query versus protein database • BlastX: translated nucleotide query (6 frames) versus protein database • tBlastN: protein query versus translated nucleotide database (6 frames) • tBlastX: translated nucleotide query versus translated nucleotide database (both in 6 frames)
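The "6 frames" are the three reading frames of the sequence plus the three of its reverse complement. A minimal sketch using the standard genetic code; labeling the reverse-strand frames -1, -2 and -3 by their offset into the reverse complement is one common convention, assumed here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(dna: str) -> str:
    """Translate from position 0; a trailing partial codon is dropped."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))

def six_frames(dna: str) -> dict[str, str]:
    """Translations of frames +1..+3 (forward) and -1..-3 (reverse complement)."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames

print(six_frames("ATGAAATAG"))
```

Programs that translate the query (BlastX, tBlastX) compare each of these six translations against the database.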

  18. Uses of BLAST programs. BLASTx compares a nucleotide query sequence, translated in all reading frames, against a protein sequence database (DNA query vs. protein database). If you have a DNA sequence and want to know what protein (if any) it encodes, you can perform a BLASTx search.

  19. tBLASTn compares a protein query sequence against a nucleotide sequence database translated in all reading frames (protein query vs. DNA database). You can use this program to ask whether a DNA or EST database contains a nucleotide sequence encoding a protein that matches your protein of interest.

  20. tBLASTx translates the DNA query and compares it to a database of DNA sequences, all translated in all reading frames (DNA query vs. DNA database). The nr database cannot be used because it is too large. tBLASTx is used to determine whether an entire DNA database contains genes that encode proteins similar to your query (if BLASTx or tBLASTn fail).
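The choice among the five flavors in slide 17 depends only on the query type, the database type and whether both sides should be translated. A hypothetical helper that encodes that list; the lowercase names match the NCBI BLAST+ program names, but the function itself is invented for illustration and does not call BLAST.

```python
# (query type, database type) -> BLAST flavor; "nt" = nucleotide, "prot" = protein.
FLAVORS = {
    ("nt", "nt"): "blastn",
    ("prot", "prot"): "blastp",
    ("nt", "prot"): "blastx",
    ("prot", "nt"): "tblastn",
}

def choose_flavor(query_type: str, db_type: str, translate_both: bool = False) -> str:
    """Pick a flavor; translate_both=True requests tblastx (nt vs nt, 6 frames each)."""
    if translate_both:
        if (query_type, db_type) != ("nt", "nt"):
            raise ValueError("tblastx needs a nucleotide query and a nucleotide database")
        return "tblastx"
    return FLAVORS[(query_type, db_type)]

print(choose_flavor("nt", "prot"))                      # blastx
print(choose_flavor("nt", "nt", translate_both=True))   # tblastx
```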

  21. E-value
