
BLAST

BLAST. Introduction to BLAST. Prepared by Nurul Annasuha Binti Mohd Baharudin. Basic Local Alignment Search Tool (BLAST). An algorithm for comparing primary biological sequence information (e.g. amino-acid sequences of different proteins, nucleotides of DNA sequences).


Presentation Transcript


  1. BLAST

  2. Introduction to BLAST Prepared by Nurul Annasuha Binti Mohd Baharudin

  3. Basic Local Alignment Search Tool (BLAST) • An algorithm for comparing primary biological sequence information (e.g. amino-acid sequences of different proteins, nucleotides of DNA sequences). • A widely used bioinformatics program. • Available on the World Wide Web through a large server at the NCBI (National Center for Biotechnology Information) and at many other sites.

  4. BLAST is more time-efficient than FASTA • It searches only for the more significant patterns in the sequences, yet with comparable sensitivity. • Freely available to run on many computer platforms.

  5. Background • Designed by Eugene Myers, Stephen Altschul, Warren Gish, David J. Lipman and Webb Miller. • Published in the Journal of Molecular Biology in 1990. • The Smith-Waterman algorithm was used before fast algorithms such as BLAST and FASTA were developed.

  6. Uses of BLAST • Identifying species - To identify a species and find homologous species, e.g. when working with a DNA sequence from an unknown species. • Locating domains - To locate known domains within the sequence of interest, e.g. when working with a protein sequence as input.

  7. DNA mapping - To compare the chromosomal position of the sequence of interest with relevant sequences in the database, e.g. when working with a known species and looking to sequence a gene at an unknown location. • Comparison - To locate common genes in two related species; it can also be used to map annotations from one organism to another, e.g. when working with genes.

  8. Input • Input sequences: • FASTA format • GenBank format

  9. FASTA • Also known as Pearson format • A text-based format for representing: • nucleotide sequences (nucleic acids) • peptide sequences (amino acids) • Residues are expressed by single-letter codes. • Advantage: sequences are easy to manipulate and parse using text-processing tools and scripting languages like Python, Ruby, and Perl.

  10. Nucleotide Sequences • Nucleic acid codes supported: • 'N' refers to an unknown nucleic acid residue. • Such codes are treated as mismatches in nucleotide alignment. • Too many of them in an input sequence will cause it to be rejected. • '-' is not included in the query. • To represent gaps, use 'N'.

  11. Peptide Sequences • Amino acid codes supported: • 'U' is replaced by 'X'. • 'X' refers to an unknown amino acid residue. • Too many nucleotide-like codes in an input sequence will cause it to be rejected. • '-' is not included in the query. • To represent gaps, use 'X'.

  12. Example: • Greater-than symbol • The description line usually starts with the '>' symbol. • Useful for multiple-sequence FASTA format (concatenating several single-sequence FASTA files). • Labels: description line, sequence data. • Recommendation: all lines of text should be shorter than 80 characters. • Blank lines are not allowed in the middle of FASTA input. • A single hyphen or dash ('-') can be used to represent a gap of indeterminate length.

  13. Example: Also a starter symbol; this line will be ignored by the software. • Lower-case letters are accepted and are mapped into upper-case.

  14. Bare sequence : The sequence portion of a GenBank/GenPept flatfile report.

  15. Example: Multi Sequence FASTA file
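The FASTA rules on the slides above (a '>' description line followed by sequence data, several records concatenated in one file, lower-case mapped to upper-case) can be illustrated with a small parser. This is a minimal sketch, not the parser BLAST itself uses; the function name `parse_fasta` and the two example records are hypothetical.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) tuples from FASTA text."""
    records = []
    description = None   # description line of the record being read
    chunks = []          # sequence lines of the record being read
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # Assumption: blank lines are skipped here, although BLAST
            # input should not contain them in the middle of a record.
            continue
        if line.startswith(">"):
            # A new description line closes the previous record.
            if description is not None:
                records.append((description, "".join(chunks)))
            description = line[1:].strip()
            chunks = []
        elif description is not None:
            # Lower-case letters are mapped into upper-case.
            chunks.append(line.upper())
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


# Hypothetical multi-sequence FASTA input.
example = """>seq1 hypothetical first record
acgtacgt
ACGT
>seq2 hypothetical second record
ttgaca
"""
records = parse_fasta(example)
for description, sequence in records:
    print(description, len(sequence))
```

Each record comes back as one description string and one joined sequence string, so a sequence wrapped over many short lines is handled the same way as a single long line.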

  16. GenBank • The GenBank sequence database is an open-access, annotated collection of all publicly available nucleotide sequences and their protein translations. • Contained over 65 billion nucleotide bases in more than 61 million sequences.

  17. Example: Legal input of GenBank identifiers: CAA89576, CAA89576.1, 1015707, gi|1015707. e.g. Mega BLAST: 1015707 129295

  18. Exercise: The following are illegal inputs of GenBank identifiers. Why are they illegal?

  19. Answer: There should be NO space; it must be removed.
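The rule from the exercise can be written as a one-line check. This is only a sketch of the "no space" rule stated on the slide; it does not validate the identifier against GenBank, and the function name `has_no_space` and the two illegal examples are hypothetical.

```python
def has_no_space(identifier):
    """Return True if a non-empty identifier contains no whitespace."""
    return identifier != "" and not any(ch.isspace() for ch in identifier)


# Legal inputs taken from the slide, plus two hypothetical illegal ones.
legal = ["CAA89576", "CAA89576.1", "1015707", "gi|1015707"]
illegal = ["gi| 1015707", "CAA89576 .1"]

print([has_no_space(x) for x in legal])
print([has_no_space(x) for x in illegal])
```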

  20. Output HTML: NCBI default format • A graphical format showing the hits found • A table showing sequence identifiers for the hits with scoring-related data, as well as alignments for the sequence of interest and the hits received, with corresponding BLAST scores.

  21. Types of BLAST Prepared by Nurrul Shafiqah Binti Abdullah

  22. Nucleotide BLAST • Protein BLAST • blastx • tblastn • tblastx

  23. Nucleotide BLAST • Search a nucleotide database using a nucleotide query • Algorithms: • blastn • megablast • discontiguous megablast

  24. blastn • compares a nucleotide query sequence against a nucleotide sequence database

  25. megablast • One way to identify an unknown sequence is to see whether that sequence already exists in a public database.

  26. Protein BLAST • Algorithms: • blastp • psi-blast • phi-blast • delta-blast

  27. blastp • compares an amino acid query sequence against a protein sequence database

  28. blastx • compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database

  29. Six-frame conceptual translation
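A six-frame conceptual translation reads the nucleotide sequence in three frames on the forward strand and three on the reverse complement. The sketch below assumes the standard genetic code and an input containing only A, C, G and T; the function names and the short example sequence are hypothetical, and this is not how BLAST implements the step internally.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(sequence):
    """Return the reverse complement of an A/C/G/T sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


def translate(sequence):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    usable = len(sequence) - len(sequence) % 3
    return "".join(CODON_TABLE[sequence[i:i + 3]] for i in range(0, usable, 3))


def six_frame_translation(sequence):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to peptides."""
    sequence = sequence.upper()
    reverse = reverse_complement(sequence)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(sequence[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


frames = six_frame_translation("ATGGCC")
for label, peptide in sorted(frames.items()):
    print(label, peptide)
```

Trailing bases that do not fill a complete codon are dropped, which is why the shifted frames of a short sequence yield shorter peptides.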

  30. tblastn • compares a protein query sequence against a nucleotide sequence database dynamically translated in all six reading frames (both strands).

  31. tblastx • compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
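The five programs described above differ only in the molecule type of the query, the molecule type of the database, and which side is translated. The lookup below restates those slides as data; the dictionary layout and the function name `programs_for` are illustrative assumptions, not part of any BLAST interface.

```python
# program: (query type, database type, query translated?, database translated?)
PROGRAMS = {
    "blastn":  ("nucleotide", "nucleotide", False, False),
    "blastp":  ("protein",    "protein",    False, False),
    "blastx":  ("nucleotide", "protein",    True,  False),
    "tblastn": ("protein",    "nucleotide", False, True),
    "tblastx": ("nucleotide", "nucleotide", True,  True),
}


def programs_for(query_type, database_type):
    """Return the program names that accept this query/database pairing."""
    return sorted(
        name
        for name, (query, database, _, _) in PROGRAMS.items()
        if query == query_type and database == database_type
    )


print(programs_for("nucleotide", "protein"))
print(programs_for("nucleotide", "nucleotide"))
```

A nucleotide query against a nucleotide database matches two programs: blastn compares the sequences directly, while tblastx compares their six-frame translations.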

  32. Advantages/Disadvantages of BLAST On Net/Computer

  33. On Net http://www.ncbi.nlm.nih.gov/BLAST/

  34. On Computer UNIX/MacOS/Windows ftp://ncbi.nlm.nih.gov/blast/executables/
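Once the executables are installed locally, a search can be started from a script. The sketch below only assembles a command line and assumes the NCBI BLAST+ command-line programs, whose `-query`, `-db`, `-out` and `-evalue` options are used here; the file names, the database name and the helper names are hypothetical.

```python
import shutil
import subprocess


def build_blast_command(program, query_path, database, out_path, evalue=10.0):
    """Assemble the argument list for a local BLAST+ search."""
    return [
        program,
        "-query", query_path,     # FASTA file holding the query sequence(s)
        "-db", database,          # name of a locally formatted BLAST database
        "-out", out_path,         # file that receives the report
        "-evalue", str(evalue),   # expectation-value threshold
    ]


def run_blast(command):
    """Run the command if the executable is on PATH; return True if it ran."""
    if shutil.which(command[0]) is None:
        return False              # BLAST+ is not installed on this machine
    subprocess.run(command, check=True)
    return True


# Hypothetical paths and database name; nothing is executed here.
command = build_blast_command("blastn", "query.fasta", "my_nt_db", "hits.txt")
print(" ".join(command))
```

Calling `run_blast(command)` would start the search only if the program is found on the PATH; passing the arguments as a list avoids shell quoting problems with file names.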

  35. Question: What is the significance of translated BLAST (blastx, tblastn and tblastx)? • Protein sequences are better conserved evolutionarily than nucleotide sequences. • It produces more reliable and accurate results when dealing with coding DNA. • It lets one see the function of the protein sequence directly, since translating the sequence of interest before searching often gives annotated protein hits.

  36. Common Databases for Use with BLAST available at NCBI Prepared by Roslan Bin Armina

  37. A BLAST search has four components: 1. query 2. database 3. program 4. search purpose/goal • BLAST databases are grouped into protein and nucleotide databases according to their content. These are shown in the table below.

  38. Protein sequence databases

  39. Combination of several databases into one • Examples of other databases used in BLAST.

  40. NCBI • You can choose which database to use: a combined database (nr) or a specific type of database (refseq_protein, swissprot, pat, etc.)

  41. Nucleotide databases

  42. NCBI also provides specialized BLAST databases. 1. vector screening database 2. variety of genome databases for different organisms. 3. trace databases
