Understanding Minimum Edit Distance in Computational Biology and Sequence Alignment

Minimum edit distance is a key concept in computational biology, particularly in sequence alignment. Sequence alignment compares DNA sequences, for example genes or regions from different species, to find important regions, determine gene function, uncover evolutionary forces, and identify mutations. Two widely used alignment algorithms are Needleman-Wunsch, which solves the global alignment problem, and Smith-Waterman, which solves the local alignment problem. Variants of the same dynamic program also detect overlaps, which is what assembling DNA fragments requires.

lora

Presentation Transcript


  1. Minimum Edit Distance Minimum Edit Distance in Computational Biology

  2. Sequence Alignment
       AGGCTATCACCTGACCTCCAGGCCGATGCCC
       TAGCTATCACGACCGCGGTCGATTTGCCCGAC

       -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
       TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC

  3. Why sequence alignment?
     • Comparing genes or regions from different species
       • to find important regions
       • to determine function
       • to uncover evolutionary forces
     • Assembling fragments to sequence DNA
     • Comparing individuals to look for mutations

  4. Alignments in two fields
     • In Natural Language Processing
       • We generally talk about distance (minimized)
       • And weights
     • In Computational Biology
       • We generally talk about similarity (maximized)
       • And scores
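The distance view can be sketched as follows. This is a minimal illustration, assuming unit weights for insertion, deletion and substitution (the slide does not give weights); the function name `edit_distance` and its parameters are hypothetical. The similarity view keeps the same table but replaces `min` with `max` and costs with scores.

```python
def edit_distance(x: str, y: str, ins: int = 1, dele: int = 1, sub: int = 1) -> int:
    """Minimum edit distance between x and y (weights are assumed, not from the slides)."""
    n, m = len(x), len(y)
    # D[i][j] = cheapest way to turn x[:i] into y[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * dele          # delete every character of x[:i]
    for j in range(1, m + 1):
        D[0][j] = j * ins           # insert every character of y[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if x[i - 1] == y[j - 1] else sub
            D[i][j] = min(
                D[i - 1][j] + dele,      # deletion
                D[i][j - 1] + ins,       # insertion
                D[i - 1][j - 1] + cost,  # match or substitution
            )
    return D[n][m]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```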

  5. The Needleman-Wunsch Algorithm
     • Initialization:
       $$D(i,0) = -i \cdot d, \qquad D(0,j) = -j \cdot d$$
     • Recurrence Relation:
       $$D(i,j) = \max \begin{cases} D(i-1,j) - d \\ D(i,j-1) - d \\ D(i-1,j-1) + s[x(i),y(j)] \end{cases}$$
     • Termination: D(N,M) is distance
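The initialization, recurrence and termination above can be sketched in a few lines. This is a minimal sketch, not a reference implementation: the substitution score s is assumed to be a simple match/mismatch pair (`match`, `mismatch`), and the function name is hypothetical. The slides only specify the gap penalty d and a generic s[x(i), y(j)].

```python
def needleman_wunsch(x: str, y: str, match: int = 1, mismatch: int = -1, d: int = 1) -> int:
    """Global alignment score D(N, M) with linear gap penalty d (scores are assumptions)."""
    n, m = len(x), len(y)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    # Initialization: D(i,0) = -i*d and D(0,j) = -j*d
    for i in range(n + 1):
        D[i][0] = -i * d
    for j in range(m + 1):
        D[0][j] = -j * d
    # Recurrence: best of gap in y, gap in x, or aligning x(i) with y(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            D[i][j] = max(
                D[i - 1][j] - d,
                D[i][j - 1] - d,
                D[i - 1][j - 1] + s,
            )
    # Termination: the bottom-right cell holds the optimal global score
    return D[n][m]


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "ACGT"))
```

With `match=0`, `mismatch=-1` and `d=1`, the returned score is the negated unit-cost edit distance, which shows how the maximized similarity and the minimized distance describe the same table.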

  6. The Needleman-Wunsch Matrix
     [Figure: matrix indexed by x1 … xM and y1 … yN. Note that the origin is at the upper left.]
     Slide adapted from Serafim Batzoglou

  7. A variant of the basic algorithm:
     Maybe it is OK to have an unlimited # of gaps in the beginning and end:
       ----------CTATCACCTGACCTCCAGGCCGATGCCCCTTCCGGC
       GCGAGTTCATCTATCAC--GACCGC--GGTCG--------------
     • If so, we don’t want to penalize gaps at the ends
     Slide from Serafim Batzoglou

  8. Different types of overlaps
     Example: 2 overlapping “reads” from a sequencing project
     Example: Search for a mouse gene within a human chromosome
     Slide from Serafim Batzoglou

  9. The Overlap Detection variant
     [Figure: matrix indexed by x1 … xM and y1 … yN]
     Changes:
     • Initialization: for all i, j,
       $$F(i,0) = 0, \qquad F(0,j) = 0$$
     • Termination:
       $$F_{OPT} = \max \begin{cases} \max_i F(i,N) \\ \max_j F(M,j) \end{cases}$$
     Slide from Serafim Batzoglou
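The two changes above (zero initialization, and a maximum over the last row and last column at termination) can be sketched like this. The recurrence is assumed to be the unchanged Needleman-Wunsch one, and the match/mismatch scores and the function name `overlap_score` are assumptions for illustration.

```python
def overlap_score(x: str, y: str, match: int = 1, mismatch: int = -1, d: int = 1) -> int:
    """Best overlap score F_OPT; gaps at the ends of either sequence are free."""
    M, N = len(x), len(y)
    # Initialization: F(i,0) = 0 and F(0,j) = 0, so leading gaps cost nothing
    F = [[0] * (N + 1) for _ in range(M + 1)]
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j] - d,
                F[i][j - 1] - d,
                F[i - 1][j - 1] + s,
            )
    # Termination: best cell in the last column (all of y used)
    # or in the last row (all of x used), so trailing gaps are free too
    best_last_column = max(F[i][N] for i in range(M + 1))
    best_last_row = max(F[M][j] for j in range(N + 1))
    return max(best_last_column, best_last_row)


if __name__ == "__main__":
    # Suffix "CGT" of the first read overlaps prefix "CGT" of the second
    print(overlap_score("AAACGT", "CGTTTT"))
```

The same function covers both overlap types from the previous slide: a suffix of one read matching a prefix of another, and a short sequence contained inside a longer one.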

  10. The Local Alignment Problem
      Given two strings x = x1……xM, y = y1……yN
      Find substrings x’, y’ whose similarity (optimal global alignment value) is maximum
        x = aaaacccccggggtta
        y = ttcccgggaaccaacc
      Slide from Serafim Batzoglou

  11. The Smith-Waterman algorithm
      Idea: Ignore badly aligning regions
      Modifications to Needleman-Wunsch:
      Initialization:
        $$F(0,j) = 0, \qquad F(i,0) = 0$$
      Iteration:
        $$F(i,j) = \max \begin{cases} 0 \\ F(i-1,j) - d \\ F(i,j-1) - d \\ F(i-1,j-1) + s(x_i,y_j) \end{cases}$$
      Slide from Serafim Batzoglou

  12. The Smith-Waterman algorithm
      Termination:
      • If we want the best local alignment…
        $$F_{OPT} = \max_{i,j} F(i,j)$$
        Find $F_{OPT}$ and trace back
      • If we want all local alignments scoring > t
        ?? For all i, j find F(i, j) > t, and trace back?
        Complicated by overlapping local alignments
      Slide from Serafim Batzoglou
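The "best local alignment" case can be sketched as follows: fill the matrix with the zero floor, remember the highest-scoring cell, and trace back until a zero is reached. The match/mismatch scores, the tie-breaking (first maximal cell in row order, diagonal moves preferred during traceback) and the function name are assumptions of this sketch, not part of the slides.

```python
def smith_waterman(x: str, y: str, match: int = 1, mismatch: int = -1, d: int = 1):
    """Return (F_OPT, aligned part of x, aligned part of y) for one best local alignment."""
    M, N = len(x), len(y)
    F = [[0] * (N + 1) for _ in range(M + 1)]  # F(i,0) = F(0,j) = 0
    best, best_pos = 0, (0, 0)
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(0, F[i - 1][j] - d, F[i][j - 1] - d, F[i - 1][j - 1] + s)
            if F[i][j] > best:  # strict '>' keeps the first maximal cell
                best, best_pos = F[i][j], (i, j)

    # Trace back from the best cell until the score drops to 0
    i, j = best_pos
    ax, ay = [], []
    while i > 0 and j > 0 and F[i][j] > 0:
        s = match if x[i - 1] == y[j - 1] else mismatch
        if F[i][j] == F[i - 1][j - 1] + s:    # x_i aligned with y_j
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif F[i][j] == F[i - 1][j] - d:      # gap in y
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:                                 # gap in x
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


if __name__ == "__main__":
    print(smith_waterman("XXACGTXX", "YYACGTYY"))
```

Reporting every alignment scoring above a threshold t would need extra bookkeeping to handle overlapping tracebacks, which is the complication the slide points out; this sketch does not attempt it.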

  13. Local alignment example
      X = ATCAT
      Y = ATTATC
      Let: m = 1 (1 point for match)
           d = 1 (-1 point for del/ins/sub)

  14. Local alignment example X = ATCAT Y = ATTATC

  15. Local alignment example X = ATCAT Y = ATTATC

  16. Local alignment example X = ATCAT Y = ATTATC
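The score matrix for this example can be recomputed with a short sketch. It assumes the Smith-Waterman recurrence with the example's parameters (m = 1 for a match, -1 for a substitution, d = 1 for an insertion or deletion); the function names are hypothetical. Rows follow X and columns follow Y, with row 0 and column 0 as the zero border.

```python
def local_alignment_matrix(x: str, y: str, m: int = 1, d: int = 1):
    """Fill the local alignment matrix F for x (rows) and y (columns)."""
    rows, cols = len(x), len(y)
    F = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            s = m if x[i - 1] == y[j - 1] else -d  # substitution costs d as well
            F[i][j] = max(0, F[i - 1][j] - d, F[i][j - 1] - d, F[i - 1][j - 1] + s)
    return F


def best_cells(F):
    """Return the maximum score and every (i, j) cell that reaches it."""
    top = max(max(row) for row in F)
    cells = [(i, j) for i, row in enumerate(F) for j, v in enumerate(row) if v == top]
    return top, cells


if __name__ == "__main__":
    X, Y = "ATCAT", "ATTATC"
    F = local_alignment_matrix(X, Y)
    print("      " + "  ".join(Y))
    for label, row in zip(" " + X, F):
        print(label, " ".join(f"{v:2d}" for v in row))
    print(best_cells(F))
```

For X = ATCAT and Y = ATTATC the best score is 3 and it is reached in two cells: (3, 6), which traces back to ATC aligned with ATC, and (5, 5), which traces back to ATCAT aligned with ATTAT.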

  17. Minimum Edit Distance Minimum Edit Distance in Computational Biology
