
BIOMETRICS

Module Code: CA641. Week 11: Pairwise Sequence Alignment. Outline: why should one do sequence comparison? Dynamic programming alignment algorithms: global alignment (Needleman-Wunsch) and local alignment (Smith-Waterman). Heuristic alignment algorithms: FASTA, BLAST.


Presentation Transcript


  1. BIOMETRICS. Module Code: CA641. Week 11: Pairwise Sequence Alignment

  2. Outline
     • Why should one do sequence comparison?
     • Dynamic programming alignment algorithms
       • global alignment (Needleman-Wunsch)
       • local alignment (Smith-Waterman)
     • Heuristic alignment algorithms: FASTA, BLAST
     Acknowledgement: this presentation was created following Prof. Liviu Ciortuz, "Computer Science" Faculty, "Al. I. Cuza" University, Iasi, Romania: http://profs.info.uaic.ro/~ciortuz/

  3. Why should one do sequence comparison? First success stories
     • 1984 (Russell Doolittle): similarity between the newly discovered v-sis oncogene in monkeys and a previously known gene, PDGF (platelet-derived growth factor), which is involved in the controlled growth of the cell, suggested that the oncogene does the right job at the wrong time.
     • 1989: cystic fibrosis, a disease marked by an abnormally high content of sodium in the mucus in the lungs. Biologists suggested that it is due to a mutant gene (still unknown at that time, but narrowed down to a 1 MB region) on chromosome 7. Comparison with the then-known genes found similarity with a gene that encodes an ATP (adenosine triphosphate) binding protein, which spans the cell membrane as an ion transport channel. Cystic fibrosis was then shown to be caused by the deletion of a triplet in the CFTR (cystic fibrosis transmembrane conductance regulator) gene.

  4. Model organism
     • A non-human species that is extensively studied to understand particular biological phenomena.
     • Used to research human disease.
     • Examples: Escherichia coli, Drosophila melanogaster, Zea mays, laboratory mice.
     Image source: http://en.wikipedia.org/wiki/Model_organism

  5. How to rank alignments: Scoring systems. Notations
     • Consider two sequences x and y of lengths n and m, respectively; xi is the i-th symbol ("residue") in x and yj is the j-th symbol in y.
     • For DNA, the alphabet is {A, C, G, T}. For proteins, the alphabet consists of the 20 amino acids.

  6. Sequence matching/alignment
     • Given two DNA or protein sequences
       • X = x1x2x3…xn
       • Y = y1y2y3…ym,
       where xi, yj are from the alphabet (for DNA, Alf = {A, C, G, T}), what is their similarity?
     • Sequence alignment = a way to arrange sequences of DNA, RNA or protein so that regions of similarity can be identified.
     • Example of sequence alignment (figure in the original slide).
     • Gaps may be opened to improve the alignment.

  7. A dot plot example
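A dot plot can be sketched in a few lines: one sequence runs along the rows, the other along the columns, and a cell is marked wherever the two residues are equal. Diagonal runs of marks then hint at similar regions. The function names `dot_plot` and `render` and the example sequences are illustrative assumptions, not taken from the slide.

```python
def dot_plot(x: str, y: str) -> list[list[bool]]:
    """Return a len(x) by len(y) grid that is True wherever x[i] == y[j]."""
    return [[xi == yj for yj in y] for xi in x]


def render(x: str, y: str) -> str:
    """Draw the dot plot as text: '*' for equal residues, '.' otherwise."""
    rows = ["  " + " ".join(y)]  # header row carries the residues of y
    for xi, row in zip(x, dot_plot(x, y)):
        rows.append(xi + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(rows)


if __name__ == "__main__":
    # Hypothetical toy sequences chosen only for illustration.
    print(render("GATTACA", "GCATGCA"))
```

This sketch compares single residues only; practical dot plots usually compare short windows to suppress noise, which is left out here.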

  8. Gap Penalties
     • Types of gap penalties:
       • Linear score: γ(g) = -gd
       • Affine score: γ(g) = -d - (g-1)e
       where:
       • d > 0 is the gap-open penalty,
       • e > 0 is the gap-extension penalty,
       • g ∈ ℕ is the length of the gap.
     • Note: usually e < d, so long insertions and deletions are penalised less than they would be under a linear gap cost.
     • Examples from this presentation use d = 8, i.e. each gap position costs 8 under the linear score.
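The two gap scores can be compared directly in code. The sketch below follows the formulas on the slide; the default values d = 12 and e = 2 used for the affine case are assumptions chosen for illustration, not values from the presentation.

```python
def linear_gap(g: int, d: int = 8) -> int:
    """Linear gap score: gamma(g) = -g*d for a gap of length g >= 1."""
    if g < 1:
        raise ValueError("gap length must be at least 1")
    return -g * d


def affine_gap(g: int, d: int = 12, e: int = 2) -> int:
    """Affine gap score: gamma(g) = -d - (g-1)*e, with open penalty d and extension penalty e."""
    if g < 1:
        raise ValueError("gap length must be at least 1")
    return -d - (g - 1) * e


if __name__ == "__main__":
    # With the same opening penalty, the two scores agree for g = 1
    # and the affine score falls more slowly as the gap grows (since e < d).
    for g in (1, 2, 5, 10):
        print(g, linear_gap(g, d=12), affine_gap(g, d=12, e=2))
```

For g = 5 and d = 12, the linear score is -60 while the affine score with e = 2 is -20, which is the effect described in the note above.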

  9. DNA substitution matrices: a simple matrix, and a matrix used in genome alignments (both shown as figures in the original slides)

  10. Dynamic programming
      • Better alignments have a higher score, so the aim is to maximize the score in order to find the optimal alignment.
      • Sometimes alignments are instead assigned costs or edit distances, and the alignment with the minimum cost is sought.
      • Both approaches have been used in biological sequence comparison.
      • Dynamic programming applies to both cases: the only difference is whether the minimum or the maximum score of the alignment is searched for.
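The symmetry between the two formulations can be sketched with one dynamic programming routine that is told whether to pick the minimum or the maximum. The helper name `dp_align` and the unit costs are assumptions for illustration: unit-cost edit distance is minimised, and the same costs with flipped signs are maximised as a similarity score.

```python
def dp_align(x, y, sub, gap, pick):
    """Fill a DP table and return its last cell.

    sub(a, b) scores aligning residue a to b, gap scores one gap position,
    and pick is min (for costs) or max (for similarity scores).
    """
    n, m = len(x), len(y)
    T = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        T[i][0] = i * gap
    for j in range(1, m + 1):
        T[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            T[i][j] = pick(
                T[i - 1][j - 1] + sub(x[i - 1], y[j - 1]),  # align x_i with y_j
                T[i - 1][j] + gap,                          # x_i against a gap
                T[i][j - 1] + gap,                          # y_j against a gap
            )
    return T[n][m]


def edit_distance(x, y):
    """Minimum number of substitutions, insertions and deletions."""
    return dp_align(x, y, lambda a, b: 0 if a == b else 1, 1, min)


def similarity(x, y):
    """Same problem as a maximisation: every cost becomes a negative score."""
    return dp_align(x, y, lambda a, b: 0 if a == b else -1, -1, max)


if __name__ == "__main__":
    print(edit_distance("ACGT", "AGT"), similarity("ACGT", "AGT"))
```

Under these particular scores the maximum similarity is always the negated edit distance, so both searches find the same alignments.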

  11. The score matrix for the following examples (from BLOSUM50). Examples from this presentation use d = 8.

  12. Global Alignment: Needleman-Wunsch Algorithm (1970)
      • Problem: find the optimal global alignment of two sequences x = x1..xn and y = y1..ym.
      • Idea: use dynamic programming.
      • Initialisation: F(0,0) = 0, F(i,0) = -id, F(0,j) = -jd, for i = 1..n, j = 1..m.
      • Recurrence: there are three ways in which the best score F(i,j) of an alignment up to xi, yj could be obtained: xi is aligned to yj, xi is aligned to a gap, or yj is aligned to a gap. Hence
        F(i,j) = max { F(i-1,j-1) + s(xi,yj), F(i-1,j) - d, F(i,j-1) - d },
        where s(a,b) is the substitution score for aligning residues a and b. Fill in the matrix F from top left to bottom right.
      • Optimal score: F(n,m).
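The initialisation and recurrence of the global alignment algorithm can be sketched as follows. Because the BLOSUM50 matrix is not reproduced in this transcript, the sketch assumes a toy substitution score (`simple_score`: +1 for a match, -1 for a mismatch); only the linear gap penalty d = 8 comes from the presentation. The function names are illustrative.

```python
def simple_score(a: str, b: str) -> int:
    """Toy substitution score (an assumption): +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1


def nw_matrix(x: str, y: str, score=simple_score, d: int = 8):
    """Fill the Needleman-Wunsch matrix F with a linear gap penalty d > 0."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):      # first column: x_1..x_i aligned to gaps
        F[i][0] = -i * d
    for j in range(1, m + 1):      # first row: y_1..y_j aligned to gaps
        F[0][j] = -j * d
    for i in range(1, n + 1):      # fill from top left to bottom right
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),  # x_i aligned to y_j
                F[i - 1][j] - d,                              # x_i aligned to a gap
                F[i][j - 1] - d,                              # y_j aligned to a gap
            )
    return F


if __name__ == "__main__":
    F = nw_matrix("ACGT", "AGT")
    for row in F:
        print(row)
    print("optimal global score:", F[-1][-1])
```

The table has (n+1)(m+1) cells and each is computed in constant time, so both time and memory are proportional to n times m.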

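  13. Filling in the dynamic programming matrix: F(i,j) is computed from its three neighbours, F(i-1,j-1) by adding s(xi,yj), and F(i-1,j) and F(i,j-1) by adding -d each.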

  14. Global Alignment: Example. Computing the dynamic programming matrix

  15. Global Alignment: Example (cont'd). Finding an optimal alignment
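An optimal alignment is read off a filled matrix by tracing back from F(n,m) to F(0,0), at each cell stepping to a neighbour from which the cell's value could have been derived. The sketch below assumes a toy score of +1 for a match and -1 for a mismatch with d = 8; the small matrix `F` in the usage part was filled by hand under those assumptions for the hypothetical sequences ACGT and AGT, and is not the BLOSUM50 example from the slides.

```python
def traceback(F, x, y, score, d):
    """Recover one optimal global alignment from a filled matrix F."""
    i, j = len(x), len(y)
    ax, ay = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1])   # x_i aligned to y_j
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] - d:
            ax.append(x[i - 1]); ay.append("-")        # x_i aligned to a gap
            i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1])        # y_j aligned to a gap
            j -= 1
    # The alignment was collected back to front, so reverse it.
    return "".join(reversed(ax)), "".join(reversed(ay))


if __name__ == "__main__":
    toy = lambda a, b: 1 if a == b else -1   # assumed toy substitution score
    F = [                                     # hand-filled matrix for ACGT vs AGT, d = 8
        [0, -8, -16, -24],
        [-8, 1, -7, -15],
        [-16, -7, 0, -8],
        [-24, -15, -6, -1],
        [-32, -23, -14, -5],
    ]
    top, bottom = traceback(F, "ACGT", "AGT", toy, 8)
    print(top)
    print(bottom)
```

When several neighbours explain a cell equally well, this sketch prefers the diagonal step, so it returns only one of possibly many optimal alignments.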

  16. Local Alignment: Smith-Waterman Algorithm
      • Problem: find the best local alignment of two sequences x and y, allowing gaps.
      • Idea: modify the global alignment algorithm so that an alignment is ended, and a new one may start, whenever its score would become negative.
      • Initialisation: F(0,0) = 0, F(i,0) = F(0,j) = 0, for i = 1..n, j = 1..m.
      • Recurrence: in addition to the three ways in which the best score F(i,j) of an alignment up to xi, yj could be obtained in the global case, a new alignment may start at (i,j) with score 0. Hence
        F(i,j) = max { 0, F(i-1,j-1) + s(xi,yj), F(i-1,j) - d, F(i,j-1) - d }.
        Fill in the matrix F from top left to bottom right.
      • Optimal score: the maximum of F(i,j) over all i, j.
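A minimal sketch of the local alignment algorithm, including the traceback, is given below. The default scores (match = 1, mismatch = -1, gap penalty d = 3) mirror the workshop question of this presentation; the function name and the example sequences in the usage part are assumptions made for illustration.

```python
def smith_waterman(x, y, match=1, mismatch=-1, d=3):
    """Return (best score, aligned part of x, aligned part of y)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(0, F[i - 1][j - 1] + s, F[i - 1][j] - d, F[i][j - 1] - d)
            if F[i][j] > best:                 # remember the first maximal cell
                best, best_pos = F[i][j], (i, j)

    # Trace back from the best cell until a cell with score 0 is reached.
    i, j = best_pos
    ax, ay = [], []
    while i > 0 and j > 0 and F[i][j] > 0:
        s = match if x[i - 1] == y[j - 1] else mismatch
        if F[i][j] == F[i - 1][j - 1] + s:
            ax.append(x[i - 1]); ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] - d:
            ax.append(x[i - 1]); ay.append("-")
            i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


if __name__ == "__main__":
    # Hypothetical sequences sharing the short region ACG.
    print(smith_waterman("TTACGTT", "GGACGGG"))
```

Unlike the global case, the traceback starts at the highest-scoring cell rather than at F(n,m) and stops at the first zero, so residues outside the best-matching region are simply left out of the result.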

  17. Note. For the local alignment algorithm to work:
      • Some s(a,b) must be positive, otherwise the algorithm will not find any alignment at all.
      • The expected score of a random alignment should be negative, otherwise long matches between unrelated sequences will have a high score based on their length alone, and a true local alignment would be masked by a longer but incorrect one.
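Both conditions can be checked numerically for a given scoring scheme. The expected score of aligning two random residues is the sum over all pairs (a, b) of p(a) p(b) s(a,b), where p gives the background residue frequencies. The sketch below assumes uniform DNA frequencies and simple match/mismatch scores; the function names are illustrative.

```python
from itertools import product


def expected_score(freqs, score):
    """Expected score of one random residue pair: sum of p(a) * p(b) * s(a, b)."""
    return sum(freqs[a] * freqs[b] * score(a, b) for a, b in product(freqs, repeat=2))


def suitable_for_local_alignment(freqs, score):
    """True if some s(a,b) is positive and the expected random score is negative."""
    has_positive = any(score(a, b) > 0 for a, b in product(freqs, repeat=2))
    return has_positive and expected_score(freqs, score) < 0


if __name__ == "__main__":
    uniform = {base: 0.25 for base in "ACGT"}      # assumed background frequencies
    plus1_minus1 = lambda a, b: 1 if a == b else -1
    plus3_minus1 = lambda a, b: 3 if a == b else -1
    # match = 1, mismatch = -1: 0.25 * 1 + 0.75 * (-1) = -0.5, so the scheme is usable.
    print(expected_score(uniform, plus1_minus1), suitable_for_local_alignment(uniform, plus1_minus1))
    # match = 3, mismatch = -1: 0.25 * 3 + 0.75 * (-1) = 0, so the second condition fails.
    print(expected_score(uniform, plus3_minus1), suitable_for_local_alignment(uniform, plus3_minus1))
```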

  18. Local Alignment: Example

  19. Additional links • http://thor.info.uaic.ro/~ciortuz/SLIDES/pairAlign.pdf

  20. Biometric Workshop, page 11
      • Q.3 Given the two sequences GCCGAGCTAG and CCTAGCAAG:
        • Build the matrix of maximal similarity between them according to the Smith-Waterman algorithm (local alignment), using the scores match = 1, mismatch = -1, gap = -3.
        • What is the optimal alignment?
