
BLAST Tutorial: Exploring Sequence Databases and Analyzing Similarity Searches

Learn about the Basic Local Alignment Search Tool (BLAST) and how it can be used to explore sequence databases, develop hypotheses, and analyze similarities between sequences. Optimize your searches, interpret BLAST results, and understand scoring and E-values.

mcleveland


Presentation Transcript


  1. Tutorial 3 BLAST • What is BLAST? • Basic Local Alignment Search Tool • A set of similarity search programs designed to explore sequence databases. • What are similarity searches good for? • One sequence by itself is not informative; it must be compared against existing sequence databases to develop hypotheses about its relatives and function. [Diagram: Query and Database feed into the BLAST program]

  2. BLAST Databases

  3. http://www.ncbi.nlm.nih.gov/BLAST/

  4. Place Query Choose Database ?

  5. BLASTN Databases http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#nucleotide_databases

  6. Place Query • Choose Database • Optimize similarity level of the search • Limit output size • Threshold for results significance • Primary word match (16-64 nt) • Reward and penalty for matching and mismatching bases • Cost to create and extend a gap • Remove low information content • Limit search to specific organism
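
The web-form settings listed on this slide have counterparts in the command-line BLAST+ `blastn` program. The sketch below only assembles an argument list and does not run a search. The helper name `build_blastn_args` and the file name `query.fa` are hypothetical, and the flag names should be checked against `blastn -help` for the installed version, since allowed reward/penalty and gap-cost combinations are restricted.

```python
from typing import List, Optional


def build_blastn_args(
    query_fasta: str,
    database: str,
    evalue: float = 1e-6,                   # threshold for results significance
    max_target_seqs: Optional[int] = None,  # limit output size
    word_size: Optional[int] = None,        # primary word match
    reward: Optional[int] = None,           # reward for a matching base
    penalty: Optional[int] = None,          # penalty for a mismatching base
    gapopen: Optional[int] = None,          # cost to create a gap
    gapextend: Optional[int] = None,        # cost to extend a gap
    dust: Optional[str] = None,             # "yes"/"no": low-complexity filter
    entrez_query: Optional[str] = None,     # limit search to an organism (remote only)
) -> List[str]:
    """Assemble a blastn argument list; options left as None use blastn defaults."""
    args = ["blastn", "-query", query_fasta, "-db", database, "-evalue", str(evalue)]
    optional = {
        "-max_target_seqs": max_target_seqs,
        "-word_size": word_size,
        "-reward": reward,
        "-penalty": penalty,
        "-gapopen": gapopen,
        "-gapextend": gapextend,
        "-dust": dust,
    }
    for flag, value in optional.items():
        if value is not None:
            args += [flag, str(value)]
    if entrez_query is not None:
        # Entrez restrictions apply to searches sent to NCBI, hence -remote.
        args += ["-remote", "-entrez_query", entrez_query]
    return args


example = build_blastn_args(
    "query.fa", "nt", word_size=28, dust="yes",
    entrez_query="Gallus gallus[Organism]",
)
print(" ".join(example))
```

The resulting list could be passed to `subprocess.run` on a machine where BLAST+ is installed; that step is left out here so the sketch has no external dependencies.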

  7. Search for homologs of the chick "olfactory receptor 6" gene

  8. Global Alignments • Local Alignments • Query sequence • Matched areas of database sequences

  9. Sequence description • E value • Score (bits) • Sequence identifier • Identity • Coverage

  10. Score and E value • Identities and gaps • Strand

  11. Multiple hits on the same subject

  12. Design of the BLAST survey • Consider your research question: • Are you looking for a particular gene in a particular species? BLAST against the genome of that species. • Are you looking for additional members of a gene family across all species? BLAST against the gene collection database. • Are you looking for exact motif matches? Increase the gap penalty or use megablast.

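  13. Score and E-value • Score (S): (identities + mismatches) - gaps; depends on the scoring system • Bit score (S'): $S' = (\lambda S - \ln K)/\ln 2$ • E-value: $E = K \, m \, n \, e^{-\lambda S}$; depends on the search space, where $m$ is the query length (bp) and $n$ is the database length (bp)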

  14. Score and E-value • The score is a measure of the similarity of the query to the sequence shown. • The E-value is a measure of the reliability of the score. • Definition of the E-value: the number of alignments with a score of at least S that are expected to occur by chance in a search of this size.

  15. Score and E-value • The size of the E-value • The typical threshold for a good E-value from a BLAST search is $E = 10^{-6}$ (written 1e-6) or lower. • The reason for such low values: a per-sequence chance probability of 0.001 in a million-entry database would still leave about 1000 entries due to chance, whereas $10^{-6}$ would leave only about one.

  16. Exercise: Calculate S, S' and E for the following BLAST hit:
      ACGTCGATCGAGCT
      |||||||| |||||
      AGGTCGTC-GAGGT
      Given the following parameters: query length: 150; $\lambda = 1.37$; $K = 0.711$; average sequence length in database: 270; number of sequences in database: 4,554,026.
      S = (Id + MM) - GP = 13 - 1 = 12
      $S' = (\lambda S - \ln K)/\ln 2 = (1.37 \times 12 - \ln 0.711)/\ln 2 = (16.44 + 0.341)/0.693 \approx 24.2$
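
The arithmetic on this slide can be checked with a short Python sketch. It uses the slide's simplified score (aligned positions minus gap positions), not a real BLAST reward/penalty scheme, and the function names `raw_score` and `bit_score` are made up for illustration.

```python
import math


def raw_score(query_aln: str, subject_aln: str) -> int:
    """Slide's simplified score: (identities + mismatches) - gaps."""
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned sequences must have the same length")
    # A column counts as a gap if either sequence has '-' there.
    gaps = sum(1 for q, s in zip(query_aln, subject_aln) if "-" in (q, s))
    aligned = len(query_aln) - gaps  # identities + mismatches
    return aligned - gaps


def bit_score(s: float, lam: float, k: float) -> float:
    """Bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * s - math.log(k)) / math.log(2)


s = raw_score("ACGTCGATCGAGCT", "AGGTCGTC-GAGGT")
print(s, round(bit_score(s, lam=1.37, k=0.711), 1))  # prints: 12 24.2
```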

  17. Exercise (continued): for the same BLAST hit and parameters as slide 16 (S = 12, $\lambda = 1.37$, $K = 0.711$, query length $m = 150$, database length $n = 270 \times 4{,}554{,}026$):
      $E = K \, m \, n \, e^{-\lambda S} = 0.711 \times 150 \times 270 \times 4{,}554{,}026 \times e^{-1.37 \times 12}$
      $E = 131{,}135{,}455{,}683 \times 7.25 \times 10^{-8}$
      $E \approx 9504.3$
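
As a rough check of this slide, the E-value can be recomputed in Python. The sketch assumes, as the slide does, that the database length is the average sequence length times the number of sequences; the function name `e_value` is illustrative only.

```python
import math


def e_value(k: float, query_len: int, db_len: int, lam: float, s: float) -> float:
    """E = K * m * n * exp(-lambda * S), with m = query length, n = database length."""
    return k * query_len * db_len * math.exp(-lam * s)


db_len = 270 * 4_554_026  # average sequence length x number of sequences
e = e_value(k=0.711, query_len=150, db_len=db_len, lam=1.37, s=12)
print(f"E = {e:.1f}")  # on the order of 9.5e3, far from significant
```

An E-value in the thousands means that thousands of equally good alignments are expected by chance, so a 14-column hit like this one carries no evidence of homology in a database of this size.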

  18. Exercise: What is the minimal score needed to achieve a significant E-value ($10^{-6}$)?
      $131{,}135{,}455{,}683 \, e^{-1.37 S} = 10^{-6}$
      $\ln(131{,}135{,}455{,}683 \, e^{-1.37 S}) = \ln(10^{-6})$
      $\ln(131{,}135{,}455{,}683) + \ln(e^{-1.37 S}) = -13.81$
      $25.6 - 1.37 S = -13.81$
      $S = (-13.81 - 25.6)/(-1.37)$
      $S \approx 28.77$
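
The same rearrangement, $S = \ln(K m n / E)/\lambda$, can be sketched in Python using the parameters from the exercise. The function name `min_score` is hypothetical, and rounding up to a whole number assumes integer scores, as in the slide's scoring scheme.

```python
import math


def min_score(k: float, query_len: int, db_len: int, lam: float, e_target: float) -> float:
    """Solve K * m * n * exp(-lambda * S) = E for S."""
    return math.log(k * query_len * db_len / e_target) / lam


db_len = 270 * 4_554_026  # average sequence length x number of sequences
s_min = min_score(k=0.711, query_len=150, db_len=db_len, lam=1.37, e_target=1e-6)
print(round(s_min, 2), math.ceil(s_min))  # prints: 28.77 29
```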

  19. 1. Search for sequences homologous to the human CFTR gene

  20. 2. Additional family members of the CFTR gene found in other organisms

  21. 3. Search for additional genes that are members of the ABC transporters family
