Tutorial 3 BLAST - PowerPoint PPT Presentation

Presentation Transcript

  1. Tutorial 3 BLAST

  2. BLAST tutorial • How to use BLAST • Score vs. E-value • Exercise • Cool story of the day: How Alzheimer's disease is studied in yeast

  3. What is BLAST? • Basic Local Alignment Search Tool • A set of similarity search programs for exploring sequence databases. [Diagram labels: Database, Query, BLAST program]

  4. Why perform a similarity search? • Find genes/proteins with possibly similar function • Find the origin of a sequence (what organism it is taken from) • Different degrees of similarity can be found in a database search

  5. BLAST Databases • Genomic: A T G C • Proteomic: G A S T C V L I M P F Y W D E N Q H K R • Translated genomic: the query is genomic and is translated to protein using the 6 possible reading frames, e.g. ATGCCGTTC -> MPF, CR, AV (the three forward frames)
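
The six-frame translation mentioned on this slide can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard genetic code and an input containing only A, C, G and T; the function names are made up for this example and are not part of BLAST. Incomplete trailing codons are dropped.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons only; a trailing partial codon is dropped."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Return the translations of the 3 forward and 3 reverse-complement frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


frames = six_frame_translation("ATGCCGTTC")
for name in sorted(frames):
    print(name, frames[name])
```

The forward frames reproduce the slide's example (MPF, CR, AV); the three remaining frames come from the reverse complement GAACGGCAT.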

  6. http://blast.ncbi.nlm.nih.gov/Blast.cgi

  7. Query and DB parameters • Place the query • Job title: helpful when running multiple searches • Choose the database • Optionally restrict the search to a specific organism • Optionally exclude specific sequences

  8. How to choose the database? It depends on what you're looking for. • nr/nt (non-redundant nucleotide): a good place to start if you don't know what you're looking for

  9. Alignment parameters • Optimize the search for the expected level of similarity

  10. Alignment parameters • Threshold for result significance • Primary word match (16-64 nt) • Scores for matching and mismatching bases • Cost to create and extend a gap
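
To make the roles of the match/mismatch scores and gap costs concrete, here is a small sketch that computes the raw score of an alignment that has already been built. The default values (match +2, mismatch -3, gap open 5, gap extend 2) are illustrative placeholders, and the convention that a gap of length k costs `gap_open + k * gap_extend` is an assumption of this sketch; consecutive gap columns are treated as one gap regardless of which sequence they fall in.

```python
def raw_alignment_score(query, subject, match=2, mismatch=-3,
                        gap_open=5, gap_extend=2):
    """Score two aligned strings of equal length; '-' marks a gap position."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    in_gap = False
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            if not in_gap:
                score -= gap_open      # pay once when a gap is created
            score -= gap_extend        # pay for every gapped position
            in_gap = True
        else:
            score += match if q == s else mismatch
            in_gap = False
    return score


# 5 matches, 1 mismatch, one gap of length 2
print(raw_alignment_score("ACGTACGT", "ACG--CGA"))
```

Raising the gap costs or the mismatch penalty makes the search stricter, which is why these parameters are tuned together with the expected similarity level.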

  11. How to interpret BLAST results?

  12. Search for homologs of the chick “olfactory receptor 6” gene

  13. Search results

  14. Graphic Summary • Query sequence • Matched sequences from the DBs

  15. Descriptions • Sequence identifier + link • Sequence description • Score (bits) • % Coverage • E-value • % Identity

  16. Alignments • Query info • Alignment info • Alignment

  17. It is possible to get multiple hits per sequence

  18. E-values and scores

  19. Score vs. E-value • The score is a measure of the similarity of the query to a sequence from the database. • The E-value is a measure of the reliability of the score. • Definition of the E-value: the number of alignments with this score or higher that are expected to occur by chance.

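  20. Score vs. E-value • Raw score: $S = \sum(\text{identities} + \text{mismatches}) - \sum \text{gaps}$ • Bit score: $S' = (\lambda S - \ln K)/\ln 2$, where $\lambda$ and $K$ depend on the scoring system • E-value: $E = m \cdot n \cdot 2^{-S'}$, where $m \cdot n$ is the search space, $m$ is the query length (bp) and $n$ is the effective length (total number of bases) of the database (bp) • E-values cannot be compared across different DBs, even if the score is the same.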
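
The dependence of the E-value on the search space can be checked numerically. The sketch below assumes the standard Karlin-Altschul relations, S' = (λS − ln K) / ln 2 and E = m · n · 2^(−S'), where m is the query length and n the effective database length. The values of `lam` and `k` used in the example are arbitrary placeholders, not the parameters of any real BLAST scoring system.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalize a raw score S to a bit score S' (lam and k depend on the scoring system)."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits, query_len, db_len):
    """Expected number of chance alignments with at least this bit score."""
    return query_len * db_len * 2.0 ** (-bits)


# Placeholder statistical parameters, chosen only for illustration.
bits = bit_score(raw_score=40, lam=1.3, k=0.5)

small_db = e_value(bits, query_len=200, db_len=1_000_000)
large_db = e_value(bits, query_len=200, db_len=100_000_000)
print(f"bit score: {bits:.1f}")
print(f"E-value in the small DB: {small_db:.3g}")
print(f"E-value in the large DB: {large_db:.3g}")
```

The same bit score gives an E-value 100 times larger in the database that is 100 times bigger, which is why E-values from different databases are not directly comparable.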

  21. Intuition for “significance” • Think of the query as a ball in which each color represents a part of the sequence. • The DB is a pool of colored balls. • If the ball has many colors (a longer query), there is a higher probability of seeing one of its colors in the pool by chance. • If the pool of balls is very big, there is also a higher probability of seeing one of the ball's colors in the pool by chance.
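
The ball-and-pool intuition can be turned into a rough calculation. The sketch below estimates how many exact word matches are expected purely by chance, under the simplifying assumption (made for this example only) that both query and database are random sequences with the four bases equally likely. The function name is hypothetical.

```python
def expected_chance_word_hits(query_len, db_len, word_size):
    """Expected number of exact word matches between two random DNA sequences."""
    query_words = max(query_len - word_size + 1, 0)   # word start positions in the query
    db_words = max(db_len - word_size + 1, 0)         # word start positions in the DB
    return query_words * db_words / 4 ** word_size    # each pair matches with prob. 4^-w


for query_len in (100, 1_000):
    for db_len in (1_000_000, 1_000_000_000):
        hits = expected_chance_word_hits(query_len, db_len, word_size=11)
        print(f"query={query_len:>5} db={db_len:>13,} expected chance hits={hits:.3g}")
```

Both a longer query and a larger database increase the number of matches expected by chance, which is the effect the E-value accounts for.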

  22. E-value Threshold • The typical threshold for a good E-value from a BLAST search is E = 10^-6 (written 1e-6 in BLAST output) or lower. • This does not mean that hits with higher E-values have no biological significance. http://www.youtube.com/watch?v=Z7ek7UoP7Bg&src_vid=nO0wJgZRZJs&feature=iv&annotation_id=annotation_234259
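
Applying such a threshold to a list of hits might look like the sketch below. It assumes the results were saved in the 12-column tabular format produced by command-line BLAST+ with `-outfmt 6`; the web interface shown in this tutorial offers a similar hit table download. The sample rows are invented for illustration and are not real search results.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def significant_hits(tabular_text, max_evalue=1e-6):
    """Return hits with E-value <= max_evalue, best (lowest) E-value first."""
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] <= max_evalue:
            hits.append(hit)
    return sorted(hits, key=lambda h: h["evalue"])


# Invented example rows (tab-separated), not real BLAST output.
SAMPLE = (
    "query1\tsubjB\t85.0\t120\t18\t0\t10\t129\t5\t124\t4e-20\t102\n"
    "query1\tsubjA\t98.5\t200\t3\t0\t1\t200\t51\t250\t2e-95\t350\n"
    "query1\tsubjC\t78.0\t50\t11\t0\t30\t79\t400\t449\t0.003\t38.2\n"
)

kept = significant_hits(SAMPLE)
for hit in kept:
    print(hit["sseqid"], hit["evalue"], hit["bitscore"])
```

The third invented hit (E = 0.003) is filtered out by the 1e-6 cutoff; as the slide notes, that alone does not prove it is biologically meaningless, so the threshold is best treated as a tunable parameter.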

  23. E-value vs. P-value • The P-value is the probability that an event will happen by chance. • The E-value is a correction of the P-value that takes the DB size into account. • So if the probability of finding a sequence by chance is 0.001, then in a DB of 1,000,000 entries the expected number of alignments found by chance is 1,000! http://homepages.ulb.ac.be/~dgonze/TEACHING/stat_scores.pdf http://www.ncbi.nlm.nih.gov/BLAST/tutorial/
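
The arithmetic on this slide, and the way back from an E-value to a probability, can be sketched as follows. The conversion P = 1 − exp(−E) assumes that chance hits follow a Poisson distribution, which is the usual assumption in BLAST statistics; the function names are made up for this example.

```python
import math


def expected_hits(p_single, db_entries):
    """Expected number of chance hits: per-entry probability times DB size."""
    return p_single * db_entries


def p_at_least_one(e):
    """Probability of at least one chance hit, assuming a Poisson count with mean e."""
    return -math.expm1(-e)  # numerically stable form of 1 - exp(-e)


print(round(expected_hits(0.001, 1_000_000)))   # the slide's example
for e in (10.0, 1.0, 0.001):
    print(f"E = {e:<6} P(at least one chance hit) = {p_at_least_one(e):.6f}")
```

For small values the two measures nearly coincide (E = 0.001 gives P ≈ 0.001), while for large E the probability saturates at 1 and only the E-value remains informative.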

  24. Exercise

  25. Find homologs for the CFTR gene in human • You can enter the gene ID rather than the sequence • Human DB only • We'll start with high similarity

  26. Now change to more distinct sequences

  27. We get more results

  28. Find homologs for the CFTR gene in other organisms • Not only human sequences

  29. Where to run a nucleotide sequence: blastn or blastx? • blastn (genomic vs. genomic) • blastx (translated genomic vs. proteomic) • ncRNA • If you know your sequence codes for a protein, blastx is better, since you will get more reliable results.

  30. Cool story of the day: How Alzheimer's disease is studied in yeast

  31. Alzheimer's disease (AD) • Alzheimer's disease leads to nerve cell death and tissue loss throughout the brain. • Symptoms can include confusion, aggression, trouble with language, and long-term memory loss. Gradually, bodily functions are lost, ultimately leading to death. • There are no available treatments that stop or reverse the progression of the disease. • The disease is associated with plaques and tangles in the brain. http://www.alz.org/braintour/alzheimers_changes.asp http://en.wikipedia.org/wiki/Alzheimer's_disease

  32. How can AD be studied in yeast? Yeast cells lack the specialized processes of neuronal cells and the cell-cell communications that modulate neuropathology. However, the most fundamental features of eukaryotic cell biology evolved before the split between yeast and metazoans. Treusch et al. Science (2011) http://lindquistlab.wi.mit.edu/

  33. Beta-amyloid (Aβ) peptide is one of the hypothesized causes of AD. • Susan Lindquist's lab showed that it is toxic when expressed in yeast. Later they tested the effect of this peptide on rat neurons and on C. elegans neurons. Treusch et al. Science (2011) http://lindquistlab.wi.mit.edu/

  34. The researchers looked for suppressor genes that have homologs in human and C. elegans. Treusch et al. Science (2011)