
BLAST PowerPoint PPT Presentation





BLAST



BLAST

  • Basic Local Alignment Search Tool

  • Developed in 1990; gapped BLAST and PSI-BLAST followed in 1997 (S. Altschul et al.)

  • A heuristic method for performing local alignments by searching for high-scoring segment pairs (HSPs)

  • First to use statistics to predict the significance of initial matches, which saves on false leads

  • Offers both sensitivity and speed



BLAST

  • Looks for clusters of nearby or locally dense “similar or homologous” k-tuples

  • Uses “look-up” tables to shorten search time

  • Uses larger “word size” than FASTA to accelerate the search process

  • Performs local alignment only (it does not compute global alignments)

  • Fastest and most frequently used sequence alignment tool -- THE STANDARD
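The look-up table idea on this slide can be illustrated with a minimal sketch. It assumes exact word matches only, whereas BLAST also seeds on similar "neighbourhood" words that score above the threshold T; the function names and the toy sequences are hypothetical.

```python
from collections import defaultdict


def build_lookup(query: str, k: int = 3) -> dict:
    """Map every k-letter word in the query to the positions where it starts."""
    table = defaultdict(list)
    for i in range(len(query) - k + 1):
        table[query[i:i + k]].append(i)
    return dict(table)


def find_seeds(query: str, subject: str, k: int = 3) -> list:
    """Return (query_pos, subject_pos) pairs where a k-letter word matches exactly."""
    table = build_lookup(query, k)
    seeds = []
    for j in range(len(subject) - k + 1):
        # One dictionary look-up per subject word replaces a scan of the query.
        for i in table.get(subject[j:j + k], []):
            seeds.append((i, j))
    return seeds


# Toy example: three overlapping seeds on the same diagonal (subject_pos - query_pos == 1).
seeds = find_seeds("KIQIYGTG", "AAIQIYGAA")
```

Seeds that cluster on one diagonal, as here, are the "locally dense" k-tuples that would then be extended into an HSP.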



BLAST Access

  • NCBI BLAST

  • http://www.ncbi.nlm.nih.gov/BLAST/

  • Canadian Bioinformatics Resource BLAST

  • http://cbr-rbc.nrc-cnrc.gc.ca/blast/

  • European Bioinformatics Institute BLAST

  • http://www.ebi.ac.uk/blastall/

  • http://www.ebi.ac.uk/blast2/



Different Flavours of BLAST

  • BLASTP - protein query against protein DB

  • BLASTN - DNA/RNA query against nucleotide DB (GenBank)

  • BLASTX - 6-frame translated DNA query against protein DB

  • TBLASTN - protein query against 6-frame translated GenBank

  • TBLASTX - 6-frame translated DNA query against 6-frame translated GenBank

  • PSI-BLAST - protein ‘profile’ query against protein DB

  • PHI-BLAST - protein pattern against protein DB
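BLASTX, TBLASTN and TBLASTX all rely on six-frame translation. The sketch below shows what that means, assuming the standard genetic code and an uppercase A/C/G/T input; the function names and the frame labels (+1 to -3) are illustrative choices, not BLAST's own.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse the strand."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict:
    """Return the three forward and three reverse-strand translations."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frames("ATGGCCTAA")
```

Each of the six protein strings is then searched like an ordinary protein query, which is why the translated searches cost several times more than BLASTP.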



Other BLAST Services

  • MEGABLAST - for comparison of large sets of long DNA sequences

  • RPS-BLAST - Conserved Domain Detection

  • BLAST 2 Sequences - for performing pairwise alignments for 2 chosen sequences

  • Genomic BLAST - for alignments against select human, microbial or malarial genomes

  • VecScreen - for detecting cloning vector contamination in sequenced data



Running NCBI BLAST



MT0895

  • MMKIQIYGTGCANCQMLEKNAREAVKELGIDAEFEKIKEMDQILEAGLTALPGLAVDGELKIMGRVASKEEIKKILS



Running NCBI BLAST

  • Paste in sequence (FASTA format, raw sequence or type in GI or accession number)

>Mysequence MT0895
KIQIYGTGCANCQMLEKNAREAVKELGIDAEFEKIKEMDQILEAGLTALPGLAVDGELKIDS

OR

>
KIQIYGTGCANCQMLEKNAREAVKELGIDAEFEKIKEMDQILEAGLTALPGLAVDGELKIDS

OR

KIQIYGTGCANCQMLEKNAREAVKELGIDAEFEKIKEMDQILEAGLTALPGLAVDGELKIDS
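The three accepted input forms (FASTA with a title, FASTA with an empty title line, raw sequence) can be normalised with a few lines of code. This is a sketch of one way to do it, not how the NCBI form parses input; `read_query` is a hypothetical helper.

```python
def read_query(text: str) -> tuple:
    """Return (title, sequence) from FASTA-formatted or raw sequence text."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    title = ""
    if lines and lines[0].startswith(">"):
        # A FASTA definition line: everything after '>' is the title.
        title = lines[0][1:].strip()
        lines = lines[1:]
    # Join wrapped sequence lines and drop stray spaces.
    sequence = "".join(lines).replace(" ", "").upper()
    return title, sequence


SEQ = "KIQIYGTGCANCQMLEKNAREAVKELGIDAEFEKIKEMDQILEAGLTALPGLAVDGELKIDS"
with_title = read_query(">Mysequence MT0895\n" + SEQ)
empty_title = read_query(">\n" + SEQ)
raw = read_query(SEQ)
```

All three calls return the same sequence; only the title differs.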



Running NCBI BLAST

  • Choose a range of interest in the sequence “set subsequences” (not usually used)

  • Select the database from pull-down menu (usually choose nr = non-redundant)

  • Keep CD Search “check box” on

  • Leave “Options” unchanged (use defaults)

  • Go to “Format” menu and adjust Number of descriptions and alignments as desired



Running NCBI BLAST

Select Database



Conserved Domain Database

  • Contains a collection of pre-identified functional or structural domains

  • Derived from Pfam and Smart databases as well as other sources

  • Uses Reverse Position Specific BLAST (RPS-BLAST) to perform search

  • Query sequence is compared to a PSSM derived from each of the aligned domains



Running NCBI BLAST

Click BLAST!



Formatting Results



BLAST Format Options



BLAST Output



BLAST Parameters

  • Identities - No. & % exact residue matches

  • Positives - No. & % identical or similar residue matches

  • Gaps - No. & % gaps introduced

  • Score - Summed HSP score (S)

  • Bit Score - a normalized score (S’)

  • Expect (E) - Expected no. of chance HSP alignments

  • P - Probability of getting a score > X

  • T - Minimum word or k-tuple score (Threshold)
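The relationship between Score (S), Bit Score (S’), Expect (E) and P follows the Karlin-Altschul statistics: S’ = (λS - ln K) / ln 2, E = m·n·2^(-S’), and P = 1 - e^(-E). The sketch below assumes λ = 0.267 and K = 0.041 (values commonly reported for gapped BLOSUM62 with 11/1 gap costs) and ignores BLAST's effective-length correction; the search-space sizes are made up for illustration.

```python
import math

# Assumed statistical parameters; real values are printed in the BLAST report footer.
LAMBDA_ASSUMED = 0.267
K_ASSUMED = 0.041


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalise a raw HSP score S into a bit score S'."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits: float, query_len: int, db_len: int) -> float:
    """Expected number of chance HSPs with at least this bit score."""
    return query_len * db_len * 2.0 ** (-bits)


def p_value(e_value: float) -> float:
    """Probability of seeing at least one chance HSP, given E."""
    return -math.expm1(-e_value)


bits = bit_score(100, LAMBDA_ASSUMED, K_ASSUMED)
e = expect_value(bits, query_len=62, db_len=1_000_000)
p = p_value(e)
```

For small E the P value is almost identical to E, which is why BLAST reports only E.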



BLAST - Rules of Thumb

  • Expect (E-value) is the number of BLAST alignments with a given score or better that are expected to be seen simply due to chance

  • Don’t trust a BLAST alignment with an Expect value > 0.01 (grey zone is between 0.01 and 1)

  • Expect and Score are related, but Expect contains more information. Note that % identities is more useful than the bit score

  • Recall Doolittle’s Curve (%ID vs. length, next slide): %ID > 30 - numres/50

  • If uncertain about a hit, perform a PSI-BLAST search
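These rules of thumb are simple enough to encode directly. The sketch below takes the cut-offs (0.01 and 1) and the Doolittle formula exactly as written on the slide; the category labels and function names are hypothetical.

```python
def classify_expect(e_value: float) -> str:
    """Apply the slide's E-value cut-offs: <= 0.01, the 0.01 - 1 grey zone, > 1."""
    if e_value <= 0.01:
        return "trusted"
    if e_value <= 1.0:
        return "grey zone"
    return "untrusted"


def doolittle_min_identity(num_residues: int) -> float:
    """Minimum %ID suggested by the slide's rule: 30 - numres/50."""
    return 30.0 - num_residues / 50.0


def passes_doolittle(percent_id: float, num_residues: int) -> bool:
    """True when %ID lies above the slide's threshold for this alignment length."""
    return percent_id > doolittle_min_identity(num_residues)


verdict = classify_expect(0.5)
threshold_100 = doolittle_min_identity(100)
```

A hit in the grey zone that also fails the %ID check is a natural candidate for the PSI-BLAST follow-up suggested above.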


Doolittle’s Curve

[Figure: Doolittle’s Curve, %ID vs. length, with the Twilight Zone marked]



Getting the Most from BLAST



BLAST Options



  • Composition-based statistics (Yes)

  • Sequence Complexity Filter (Yes)

  • Expect (E) value (10)

  • Word Size (3)

  • Substitution or Scoring Matrix (Blosum62)

  • Gap Insertion Penalty (11)

  • Gap Extension Penalty (1)
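For readers who run BLAST locally rather than through the web form, the defaults listed above map onto command-line options. The sketch below only builds the argument list and does not run anything; it assumes the NCBI BLAST+ `blastp` program, and the query file name and the choice of `1` for composition-based statistics are illustrative. Check `blastp -help` for the options your installed version accepts.

```python
import shlex


def blastp_command(query_path: str, database: str) -> list:
    """Build a blastp argument list mirroring the defaults on the slide."""
    options = {
        "-query": query_path,          # hypothetical input file
        "-db": database,
        "-comp_based_stats": "1",      # composition-based statistics: on
        "-seg": "yes",                 # low-complexity filter: on
        "-evalue": "10",               # Expect threshold
        "-word_size": "3",
        "-matrix": "BLOSUM62",
        "-gapopen": "11",
        "-gapextend": "1",
    }
    args = ["blastp"]
    for flag, value in options.items():
        args += [flag, value]
    return args


cmd = blastp_command("query.fasta", "nr")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run` would execute the search, provided BLAST+ and the database are installed.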



Composition Statistics

  • Recent addition to BLAST algorithm

  • Permits calculated E (Expect) values to account for amino acid composition of queries and database hits

  • Improves accuracy and reduces false positives

  • Effectively conducts a different scoring procedure for each sequence in database



LCRs (low-complexity regions)

  • Watch out for…

    • transmembrane or signal peptide regions

    • coiled-coil regions

    • short amino acid repeats (collagen, elastin)

    • homopolymeric repeats

  • BLAST uses SEG to mask amino acids

  • BLAST uses DUST to mask bases
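To see why such regions are masked, the sketch below flags windows whose residue composition has low Shannon entropy and replaces them with X. This is a deliberately simplified stand-in, not the SEG or DUST algorithm; the window length and entropy threshold are assumptions chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Entropy in bits of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 2.2) -> str:
    """Replace every residue inside a low-entropy window with 'X'."""
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = "X"
    return "".join(masked)


diverse = mask_low_complexity("ACDEFGHIKLMNPQRSTVWY")
repeat = mask_low_complexity("ACDEFGHIKLMN" + "Q" * 12)
```

A homopolymeric run such as the twelve Q residues is masked, while the compositionally diverse sequence passes through unchanged.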



Scoring Matrices

  • BLOSUM Matrices

    • Developed by Henikoff & Henikoff (1992)

    • BLOcks SUbstitution Matrix

    • Derived from the BLOCKS database

  • PAM Matrices

    • Developed by Schwartz and Dayhoff (1978)

    • Point Accepted Mutation

    • Derived from manual alignments of closely related proteins



How to Make Your Own Matrix

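[Figure: three-step workflow: Perform Alignment, Calculate Frequencies, Fill Sub Matrix]

Step 1, aligned sequences:

ACDEFGH..
ACDEFGK..
AADEFGH..
GCDEFGH..
ACAEYGK..
ACAEFAH..

Step 2, frequencies:

f(A,A) = #A(obs) / #A(exp)

f(C,A) = #C/A(obs) / (#A(exp) + #C(exp))

Step 3, substitution matrix:

      A     C     D    ...
A    0.8    --    --
C    0.2   0.8    --
D    0.0   0.3   1.0
E    --     --    --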
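The three steps named on this slide (perform alignment, calculate frequencies, fill substitution matrix) can be sketched in code using the six aligned sequences shown. The slide's exact definition of the expected counts is not recoverable, so this sketch substitutes the BLOSUM-style log-odds calculation: observed pair frequency divided by the frequency expected from the residue composition, on a log2 scale. The function name is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def log_odds_matrix(alignment: list) -> dict:
    """Return {(a, b): log2(observed / expected)} for residue pairs seen in columns."""
    # Step 2a: count every residue pair within each alignment column.
    pair_counts = Counter()
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            pair_counts[tuple(sorted((a, b)))] += 1
    total_pairs = sum(pair_counts.values())

    # Step 2b: residue background frequencies implied by the pair counts.
    residue_freq = Counter()
    for (a, b), n in pair_counts.items():
        residue_freq[a] += n / (2 * total_pairs)
        residue_freq[b] += n / (2 * total_pairs)

    # Step 3: fill the matrix with log-odds scores.
    scores = {}
    for (a, b), n in pair_counts.items():
        observed = n / total_pairs
        expected = residue_freq[a] * residue_freq[b] * (1 if a == b else 2)
        scores[(a, b)] = math.log2(observed / expected)
    return scores


ALIGNMENT = ["ACDEFGH", "ACDEFGK", "AADEFGH", "GCDEFGH", "ACAEYGK", "ACAEFAH"]
scores = log_odds_matrix(ALIGNMENT)
```

Pairs never observed in a column (for example E with anything else) have no entry, which corresponds to the "--" cells in the slide's matrix.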


PAM versus BLOSUM

  • PAM

    • First useful scoring matrix for protein

    • Assumed a Markov model of evolution (i.e. all sites equally mutable and independent)

    • Derived from small, closely related proteins with ~15% divergence

  • BLOSUM

    • Much later entry to matrix “sweepstakes”

    • No evolutionary model is assumed

    • Built from PROSITE-derived sequence blocks

    • Uses much larger, more diverse set of protein sequences (30% - 90% ID)


PAM versus BLOSUM

  • PAM

    • Higher PAM numbers to detect more remote sequence similarities

    • Lower PAM numbers to detect high similarities

    • 1 PAM ~ 1 million years of divergence

    • Errors in PAM 1 are scaled 250X in PAM 250

  • BLOSUM

    • Lower BLOSUM numbers to detect more remote sequence similarities

    • Higher BLOSUM numbers to detect high similarities

    • Sensitive to structural and functional substitution

    • Errors in BLOSUM arise from errors in alignment


PAM Matrices

  • PAM 40 - prepared by multiplying PAM 1 by itself a total of 40 times; best for short alignments with high similarity

  • PAM 120 - prepared by multiplying PAM 1 by itself a total of 120 times; best for general alignment

  • PAM 250 - prepared by multiplying PAM 1 by itself a total of 250 times; best for detecting distant sequence similarity
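"Multiplying PAM 1 by itself" is ordinary matrix exponentiation of a mutation-probability matrix. The sketch below uses a made-up 3-letter alphabet in which each residue has a 99% (or 99.2%) chance of staying unchanged per PAM unit; the real PAM 1 is a 20 x 20 matrix estimated from data, and the numbers here are purely illustrative.

```python
def mat_mul(a: list, b: list) -> list:
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(m: list, n: int) -> list:
    """Return m multiplied by itself n times (the identity when n == 0)."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


# Toy mutation matrix: row i gives the probabilities of residue i becoming each residue.
PAM1_TOY = [
    [0.990, 0.006, 0.004],
    [0.006, 0.990, 0.004],
    [0.004, 0.004, 0.992],
]

pam40_toy = mat_power(PAM1_TOY, 40)
pam250_toy = mat_power(PAM1_TOY, 250)
```

The diagonal entries shrink as the exponent grows, reflecting the falling chance that a residue is still unchanged at greater evolutionary distance. This is also why small errors in PAM 1 are magnified in PAM 250.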


BLOSUM Matrices

  • BLOSUM 90 - prepared from BLOCKS alignments after clustering sequences with at least 90% sequence ID; best for short alignments with high similarity

  • BLOSUM 62 - prepared from BLOCKS alignments after clustering sequences with at least 62% sequence ID; best for general alignment (default)

  • BLOSUM 30 - prepared from BLOCKS alignments after clustering sequences with at least 30% sequence ID; best for detecting weak local alignments



Scraping the Bottom of the Barrel with Psi-BLAST



PSI-BLAST Algorithm

  • Step 1: Perform initial alignment with BLAST using BLOSUM 62 substitution matrix

  • Step 2: Construct a multiple alignment from matches

  • Step 3: Prepare position specific scoring matrix (PSSM)

  • Step 4: Use PSSM profile as the scoring matrix for a second BLAST run against database

  • Step 5: Repeat steps 2-4 until convergence (no new matches are found)
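The step that turns a multiple alignment into a position specific scoring matrix can be sketched as follows. This assumes a uniform background frequency and a flat pseudocount; PSI-BLAST itself uses sequence weighting and data-dependent pseudocounts, so treat the numbers as illustrative only. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment: list, pseudocount: float = 1.0) -> list:
    """Return one {residue: log-odds score} dictionary per alignment column."""
    background = 1.0 / len(AMINO_ACIDS)   # assumed uniform background
    pssm = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_segment(pssm: list, segment: str) -> float:
    """Score an ungapped segment of standard residues against the profile."""
    return sum(col[res] for col, res in zip(pssm, segment))


pssm = build_pssm(["KIQIY", "KIQVY", "KLQIY"])
good = score_segment(pssm, "KIQIY")
poor = score_segment(pssm, "AAAAA")
```

Unlike BLOSUM 62, the same residue can score differently at each position, which is what lets later iterations pick up more distant family members.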



PSI-BLAST


Press Iterate!

PSI-BLAST

  • For Protein Sequences ONLY

  • Much more sensitive than BLAST

  • Slower (iterative process)

  • Often yields results that are as good as many common threading methods

  • SHOULD BE YOUR FIRST CHOICE IN ANALYZING A NEW SEQUENCE



BLAST against PDB



Still Confused?

http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html



Conclusions

  • BLAST is the most important program in bioinformatics (maybe all of biology)

  • BLAST is based on sound statistical principles (key to its speed and sensitivity)

  • A basic understanding of its principles is key for using/interpreting BLAST output

  • Use BLASTN or MEGABLAST for DNA

  • Use PSI-BLAST for protein searches

