Sequence alignment

BI420 – Introduction to Bioinformatics. Sequence alignment. Gabor T. Marth, Department of Biology, Boston College (marth@bc.edu).

Presentation Transcript


  1. BI420 – Introduction to Bioinformatics. Sequence alignment. Gabor T. Marth, Department of Biology, Boston College, marth@bc.edu

  2. Biologically significant alignment: hba_human and hbb_human. http://artedi.ebc.uu.se/programs/pairwise.html

  3. Biologically plausible alignment

  4. Spurious alignment (BRCA1 variant). Examples from: Biological sequence analysis. Durbin, Eddy, Krogh, Mitchison

  5. Alignment types. How do we align the words CRANE and FRAME?

       CRANE
        || |
       FRAME

     3 matches, 2 mismatches.

     How do we align words that differ in length?

       COELACANTH
         || |||
       P-ELICAN--

       COELACANTH
         || |||
       -PELICAN--

     5 matches, 2 mismatches, 3 gaps. If we assign +1 for each match and -1 for each mismatch or gap, we get 5 x 1 + 2 x (-1) + 3 x (-1) = 0. This is the alignment score. Examples from: BLAST. Korf, Yandell, Bedell
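The scoring rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `alignment_score` and its keyword parameters are hypothetical, and both rows are assumed to be already aligned, with `-` marking a gap.

```python
def alignment_score(top, bottom, match=1, mismatch=-1, gap=-1):
    """Score two pre-aligned rows of equal length, column by column."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    score = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            score += gap        # a gap in either row
        elif x == y:
            score += match      # identical letters
        else:
            score += mismatch   # different letters
    return score


print(alignment_score("CRANE", "FRAME"))            # 3 matches, 2 mismatches
print(alignment_score("COELACANTH", "P-ELICAN--"))  # 5 matches, 2 mismatches, 3 gaps
```

Passing other penalties (for example `match=1, mismatch=-5, gap=-10`) rescores the same alignment under a stricter scheme.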

  6. Finding the “best” alignment

       COELACANTH
          | |||
       PE-LICAN--      S=-2

       COELACANTH
         ||
       P-EL-ICAN-      S=-6

       COELACANTH

       PELICAN---      S=-10

       COELACANTH
         || |||
       P-ELICAN--      S=0

  7. Global alignment – Needleman-Wunsch. Aligning words: SHAKE and SPEARE. Example from: Higgs and Attwood
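As a rough sketch of what a Needleman-Wunsch global alignment computes, the dynamic-programming recurrence can be written as below. The function name and the +1/-1/-1 scoring are assumptions for illustration (the slide's own scoring scheme for SHAKE and SPEARE is not given in the transcript), and only the optimal score is returned, not the traceback.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of a and b with a linear gap penalty."""
    # prev[j] holds the best score for aligning the prefix of a seen so far
    # with b[:j]; the first row aligns the empty prefix of a against gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]  # a[:i] aligned entirely against gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = cur[j - 1] + gap   # gap in a
            cur.append(max(diag, up, left))
        prev = cur
    return prev[-1]


print(needleman_wunsch_score("COELACANTH", "PELICAN"))
```

Because the whole of both sequences must be aligned, the score is read from the last cell of the table.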

  8. Local alignment – Smith-Waterman. Example from: Higgs and Attwood
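A minimal sketch of Smith-Waterman local alignment scoring follows. The function name and the +1/-1/-1 scoring are illustrative assumptions, not taken from the slide. Compared with a global alignment, every cell is floored at zero, so an alignment may start anywhere, and the result is the best cell anywhere in the table.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score between any substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)  # first row: nothing aligned yet
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets a new local alignment start at this cell.
            cell = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


print(smith_waterman_score("COELACANTH", "PELICAN"))
```

With these scores the best local alignment of COELACANTH and PELICAN is ELACAN against ELICAN (five matches, one mismatch).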

  9. Visualizing pair-wise alignments

  10. Sequence similarity and scoring. Match-mismatch-gap penalties, e.g. Match = 1, Mismatch = -5, Gap = -10. Scoring matrices.

  11. Multiple alignments: clustalW

  12. Anchored multiple alignment

  13. Similarity searching vs. alignment. [Figure labels: Alignment; Similarity search (query, database)]

  14. The BLAST algorithms

  15. BLAST report

  16. BLAST report gi|7428631 http://www.ncbi.nih.gov/BLAST/

  17. The BLAST algorithm. Sequence alignment takes place in a 2-dimensional space where diagonal lines represent regions of similarity. Gaps in an alignment appear as broken diagonals. The search space is sometimes considered as 2 sequences and sometimes as query x database.
      • Global alignment vs. local alignment: BLAST is local
      • Maximum scoring pair (MSP) vs. high-scoring pair (HSP): BLAST finds HSPs (usually the MSP too)
      • Gapped vs. ungapped: BLAST can do both

  18. The BLAST algorithm
      • Speed gained by minimizing search space
      • Alignments require word hits
      • Neighborhood words
      • W and T modulate speed and sensitivity
      BLOSUM62 neighborhood of RGD (T=12):
        RGD 17
        KGD 14
        QGD 13   RGE 13
        EGD 12   HGD 12   NGD 12   RGN 12
        AGD 11   MGD 11   RAD 11   RGQ 11   RGS 11   RND 11   RSD 11   SGD 11   TGD 11
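To illustrate how the threshold T trims the neighborhood, the sketch below filters the word scores listed on this slide. The dictionary simply copies the slide's numbers rather than recomputing them from BLOSUM62, the function name is hypothetical, and it assumes a word is kept when its score is at least T.

```python
# Word scores against RGD, copied from the slide (not recomputed here).
RGD_NEIGHBORHOOD = {
    "RGD": 17, "KGD": 14, "QGD": 13, "RGE": 13,
    "EGD": 12, "HGD": 12, "NGD": 12, "RGN": 12,
    "AGD": 11, "MGD": 11, "RAD": 11, "RGQ": 11, "RGS": 11,
    "RND": 11, "RSD": 11, "SGD": 11, "TGD": 11,
}


def neighborhood_words(scores, threshold):
    """Return words scoring at least `threshold`, best first."""
    kept = [w for w, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda w: (-scores[w], w))


for t in (13, 12, 11):
    words = neighborhood_words(RGD_NEIGHBORHOOD, t)
    print(f"T={t}: {len(words)} words")
```

Lowering T admits more neighborhood words, which means more word hits to examine: greater sensitivity at the cost of speed.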

  19. Word length

  20. 2-hit seeding
      • Alignments tend to have multiple word hits.
      • Isolated word hits are frequently false leads.
      • Most alignments have large ungapped regions.
      • Requiring 2 word hits on the same diagonal (within 40 aa of each other, for example) greatly increases speed at a slight cost in sensitivity.
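The two-hit idea can be sketched as follows. This is a simplified illustration rather than BLAST's actual implementation: the function name, the `(query_pos, subject_pos)` hit format, and the rule that the two hits must not overlap are assumptions; the 40-residue window comes from the slide's example.

```python
from collections import defaultdict


def two_hit_seeds(hits, word_size=3, window=40):
    """Find pairs of word hits on the same diagonal within `window` residues.

    hits: iterable of (query_pos, subject_pos) word-hit start positions.
    Returns (diagonal, first_query_pos, second_query_pos) tuples.
    """
    by_diagonal = defaultdict(list)
    for q, s in hits:
        by_diagonal[s - q].append(q)  # hits on one diagonal share s - q

    seeds = []
    for diagonal, q_positions in by_diagonal.items():
        q_positions.sort()
        for first, second in zip(q_positions, q_positions[1:]):
            distance = second - first
            # Assumed rule: hits must not overlap and must lie within the window.
            if word_size <= distance <= window:
                seeds.append((diagonal, first, second))
    return seeds


hits = [(0, 10), (5, 15), (20, 3), (100, 110)]
print(two_hit_seeds(hits))
```

In this toy input only the hits at query positions 0 and 5 qualify: they share a diagonal and lie close together, while the other hits are isolated or too far apart.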

  21. Extension of the seed alignments
      • Alignments are extended from seeds in each direction.
      • Extension is terminated when the score drops more than X below the maximum score seen so far.
      Text example (match +1, mismatch -1, no gaps):
        The quick brown fox jumps over the lazy dog.
        The quiet brown cat purrs when she sees him.
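The text example on this slide can be traced with a short sketch of ungapped extension in one direction. The function name is hypothetical, and the stopping convention (stop once the running score falls more than X below the best score so far, then report the best-scoring prefix) is an assumption about how the slide's X is applied.

```python
def extend_right(a, b, x_drop, match=1, mismatch=-1):
    """Ungapped extension from position 0; returns (best_score, best_length)."""
    score = best = best_len = 0
    for length, (x, y) in enumerate(zip(a, b), start=1):
        score += match if x == y else mismatch
        if score > best:
            best, best_len = score, length   # new maximum: remember where
        elif best - score > x_drop:
            break                            # fallen too far below the maximum
    return best, best_len


A = "The quick brown fox jumps over the lazy dog."
B = "The quiet brown cat purrs when she sees him."

score, length = extend_right(A, B, x_drop=5)
print(score, repr(A[:length]), repr(B[:length]))
```

With a smaller X the extension gives up at the first mismatches ("quick" vs. "quiet") and never reaches the matching "brown", which shows how X trades speed against sensitivity.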

  22. BLAST statistics

      >gi|23098447|ref|NP_691913.1| (NC_004193) 3-oxoacyl-(acyl carrier protein) reductase [Oceanobacillus iheyensis]
      Length = 253

      Score = 38.9 bits (89), Expect = 3e-05
      Identities = 17/40 (42%), Positives = 26/40 (64%)
      Frame = -1

      Query: 4146 VTGAGHGLGRAISLELAKKGCHIAVVDINVSGAEDTVKQI 4027
                  VTGA  G+G+AI+   A +G  + V D+N  GA+  V++I
      Sbjct: 10   VTGAASGMGKAIATLYASEGAKVIVADLNEEGAQSVVEEI 49

      How significant is this similarity?

  23. Scoring the alignment

      Query: 4146 VTGAGHGLGRAISLELAKKGCHIAVVDINVSGAEDTVKQI 4027
                  VTGA  G+G+AI+   A +G  + V D+N  GA+  V++I
      Sbjct: 10   VTGAASGMGKAIATLYASEGAKVIVADLNEEGAQSVVEEI 49

      4 -1 4 S (score)

  24. The Karlin-Altschul equation

        E = K m n e^(-λS)

      E: expected number of alignments (the “Expect” or “E-value”)
      K: a minor constant
      m: length of query
      n: length of database
      mn: search space
      λ: scaling factor
      S: raw score
      λS: normalized score

      The “P-value”: P = 1 - e^(-E)
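The quantities labelled on this slide fit the standard form of the Karlin-Altschul equation, E = K m n e^(-λS), with P = 1 - e^(-E). The sketch below evaluates it in Python. The values λ = 0.267 and K = 0.041 are assumptions: they are the commonly quoted parameters for gapped BLOSUM62 searches (gap open 11, extend 1), not values given in the lecture, and the query and database lengths are made up for illustration.

```python
import math

# Assumed Karlin-Altschul parameters (gapped BLOSUM62, gap open 11 / extend 1).
LAMBDA = 0.267
K = 0.041


def bit_score(raw_score, lam=LAMBDA, k=K):
    """Convert a raw score S to bits: S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, m, n, lam=LAMBDA, k=K):
    """Expected number of chance alignments: E = K * m * n * exp(-lambda*S)."""
    return k * m * n * math.exp(-lam * raw_score)


def p_value(e):
    """Probability of at least one such chance alignment: P = 1 - exp(-E)."""
    return -math.expm1(-e)


# Raw score 89 is taken from the BLAST report earlier in the transcript.
print(round(bit_score(89), 1))

# Hypothetical search space: query length m and database length n are made up.
e = e_value(89, m=120, n=1_000_000)
print(e, p_value(e))
```

Under these assumed parameters a raw score of 89 converts to roughly 38.9 bits, in line with the "Score = 38.9 bits (89)" line of the report. For small E the P-value is nearly equal to E.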

  25. The sum-statistics. Sum statistics increase the significance (decrease the E-value) for groups of consistent alignments.

  26. The sum-statistics The sum score is not reported by BLAST!
