
# Sequence Alignment – Scoring Functions, N-W and S-W Affine Gap Penalties

Saurabh Sinha, 02/05/2008. Department of Computer Science, University of Illinois Urbana-Champaign. Scribed by: Chandrasekar Ramachandran.



Contents:

• Introduction
• Interpretations
• Types of Alignments
• Techniques for Solving
  • Dynamic Programming
  • Probabilistic Methods
• Scoring Functions
• N-W and S-W Affine Gap Penalties

Sequence alignment: a way of arranging one sequence (DNA, RNA, or protein) against another to determine whether a region has been conserved in evolution or has a common evolutionary origin.

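Representations:

• Strings of letters
• Matrix representation (one row per sequence, one column per aligned position):

Sequence 1: - G G C C A G G A T T
Sequence 2: G G G C C - G G - T T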

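Mismatches?

Point mutations: replacement of a single nucleotide base.

These are categorized as transitions (purine ↔ purine or pyrimidine ↔ pyrimidine) and transversions (purine ↔ pyrimidine).

Gaps?

Indels, i.e. insertion/deletion mutations, introduced in one or both lineages.

In coding regions, an indel produces a frameshift mutation unless its length is a multiple of 3.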

What about amino acids?

The degree of similarity estimates conservation.

If a region is highly conserved (changes little between the sequences), this indicates a region of high importance.

Similar functional roles can be estimated by assessing similarity of base pairing.

Dynamic Programming:

• Initialization
• Matrix fill (scoring)
• Traceback

Probabilistic Methods:

• Bayesian methods for HMMs
• Likelihood derivatives and Fisher scores
• Training and model comparison

Scores for aligned characters are specified by a similarity matrix.

Example:

Sequence 1: -CCGCTTACCTA

Sequence 2: TTCCGCTTATTA

Possible alignments:

Sequence 1: -CCGCTTACCTA

Sequence 2: -CCGCTTA----

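Score matches, mismatches, and gaps (indels) separately.

The scoring matrix is called the F-matrix; its (i, j) entry, denoted $F(i,j)$, is the score of the best alignment of the first i characters of one sequence with the first j characters of the other.

Running time: for sequences of lengths a and b, filling the matrix takes O(ab) time, since each entry is computed in constant time.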

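Summary:

• Initialization: fill in the base cases in the topmost row and leftmost column.
• Matrix fill: fill in the scores of partial alignments, recording in a pointer matrix which neighboring cell each score came from.
• Traceback: trace back through the pointer matrix to the initial cell to get the best solution.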
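The initialization, matrix fill, and traceback steps summarized above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: it assumes a linear gap penalty and hypothetical scores (match = +1, mismatch = -1, gap = -2), and the function name `needleman_wunsch` is chosen here for illustration.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_x, aligned_y). The default scores are
    illustrative assumptions, not values from the lecture.
    """
    n, m = len(x), len(y)
    # F[i][j]: best score for aligning x[:i] with y[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    # ptr[i][j]: which neighboring cell F[i][j] was computed from.
    ptr = [[None] * (m + 1) for _ in range(n + 1)]

    # Initialization: topmost row and leftmost column align a prefix to gaps.
    for i in range(1, n + 1):
        F[i][0], ptr[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        F[0][j], ptr[0][j] = j * gap, "left"

    # Matrix fill: best of diagonal (match/mismatch), up (gap in y), left (gap in x).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j], ptr[i][j] = max(
                [(F[i - 1][j - 1] + s, "diag"),
                 (F[i - 1][j] + gap, "up"),
                 (F[i][j - 1] + gap, "left")],
                key=lambda option: option[0],  # on ties, the earlier option wins
            )

    # Traceback: follow pointers from the bottom-right cell back to (0, 0).
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        move = ptr[i][j]
        if move == "diag":
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif move == "up":
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))
```

When several alignments tie for the best score, this sketch prefers the diagonal move, then the vertical one, so it returns one optimal alignment rather than all of them.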

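Local alignment (Smith-Waterman):

• Involves stretches shorter than the entire sequence length.
• Generally used for sequences that are significantly dissimilar overall.
• Negative scoring-matrix cells are set to zero.
• Backtracking starts at the highest-scoring cell and continues until a cell with score zero is reached.
• Prerequisite: the expected score of aligning random sequences must be negative; otherwise high-scoring local alignments tend to extend across the whole sequences and behave like global alignments.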
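A minimal Smith-Waterman sketch following the rules above: negative cells are set to zero, and traceback starts at the highest-scoring cell and stops at a zero cell. The scores (match = +2, mismatch = -1, gap = -2) and the name `smith_waterman` are hypothetical; with these values and uniform DNA base frequencies, a random aligned pair has expected score 0.25 · 2 + 0.75 · (-1) = -0.25, so the negative-expectation prerequisite holds.

```python
def smith_waterman(x, y, match=2, mismatch=-1, gap=-2):
    """Local alignment with a linear gap penalty; returns (score, aligned_x, aligned_y)."""
    n, m = len(x), len(y)
    # H[i][j]: best score of a local alignment ending at x[i-1], y[j-1].
    # Row 0 and column 0 stay 0: a local alignment may start anywhere.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            # Negative scores are set to zero (start a new local alignment here).
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback from the highest-scoring cell until a zero cell is reached.
    ax, ay = [], []
    i, j = best_pos
    while H[i][j] > 0:
        s = match if x[i - 1] == y[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))
```

For example, `smith_waterman("AAACGTAAA", "CCCACGTCCC")` finds the shared core `ACGT` and ignores the dissimilar flanks, which is exactly where a global alignment would accumulate penalties.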

Scoring function: given two sequences, a number is associated with each alignment. For example, with matches scored +x, mismatches −y, and gaps −z:

$$\text{Score} = x \cdot \#\text{matches} - y \cdot \#\text{mismatches} - z \cdot \#\text{gaps}$$
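The scoring function above can be evaluated directly on a finished alignment. This sketch assumes both aligned strings use `-` for gaps and that each gap position counts once toward #gaps (a linear penalty); the default x, y, z values and the name `alignment_score` are illustrative only.

```python
def alignment_score(a, b, x=1, y=1, z=2):
    """Score = x * #matches - y * #mismatches - z * #gaps for two aligned strings."""
    if len(a) != len(b):
        raise ValueError("aligned strings must have the same length")
    matches = mismatches = gaps = 0
    for p, q in zip(a, b):
        if p == "-" or q == "-":
            gaps += 1  # each gap position is counted (linear penalty)
        elif p == q:
            matches += 1
        else:
            mismatches += 1
    return x * matches - y * mismatches - z * gaps
```

For `GATTACA` aligned to `GA-TACG` there are 5 matches, 1 mismatch, and 1 gap, so with the defaults the score is 5 - 1 - 2 = 2.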

Alignment scores: the sum of substitution scores and gap penalties.

Residue-based substitution matrices:

• Protein
• Evolutionary
• Express how one character in a sequence changes into other character states.
• N × N matrix, where N = 4 for DNA and N = 20 for amino acids.

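Another way is to classify A and G as purines and T and C as pyrimidines.

Transversions (purine ↔ pyrimidine substitutions) are less likely to occur than transitions (purine ↔ purine or pyrimidine ↔ pyrimidine substitutions), so they can be scored more harshly.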
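Using the purine/pyrimidine classification, a 4 × 4 DNA substitution matrix can score transversions more harshly than transitions. The three score values below and the name `dna_substitution_matrix` are hypothetical placeholders chosen only to show the structure; they are not taken from the lecture or from any published matrix.

```python
PURINES = {"A", "G"}  # every other DNA base (C, T) is a pyrimidine


def dna_substitution_matrix(match=2, transition=-1, transversion=-2):
    """Return a dict mapping (base1, base2) -> substitution score."""
    matrix = {}
    for a in "ACGT":
        for b in "ACGT":
            if a == b:
                matrix[a, b] = match
            elif (a in PURINES) == (b in PURINES):
                # Same chemical class: a transition (A<->G or C<->T).
                matrix[a, b] = transition
            else:
                # Purine <-> pyrimidine: a transversion, scored more harshly.
                matrix[a, b] = transversion
    return matrix


scores = dna_substitution_matrix()
for a in "ACGT":
    print(a, [scores[a, b] for b in "ACGT"])
```

A dictionary keyed by base pairs keeps lookups readable (`scores["A", "G"]`); the same function shape extends to a 20 × 20 amino-acid table by swapping in a different alphabet and score source.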

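Minimum entropy score: the sum of entropy scores computed for each column,

$$S = \sum_i S_i, \qquad S_i = -\sum_a c_{ia} \log p_{ia}$$

Here,

• i is a column;
• $c_{ia}$ is the count of letter a in column i;
• $p_{ia}$ is the inferred probability of letter a in column i, e.g. $p_{ia} = c_{ia} / \sum_{a'} c_{ia'}$.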
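A sketch of the minimum entropy score for a multiple alignment given as equal-length strings, one per sequence. It assumes $p_{ia}$ is estimated as $c_{ia}$ divided by the column size, uses the natural logarithm (another base only rescales the score), and counts the gap character `-` as an ordinary symbol; the function name is illustrative. Lower scores mean more conserved columns, and a fully conserved column contributes 0.

```python
import math
from collections import Counter


def minimum_entropy_score(alignment):
    """Sum over columns i of -sum_a c_ia * log(p_ia)."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    score = 0.0
    for column in zip(*alignment):
        counts = Counter(column)  # c_ia for every letter a in column i
        total = len(column)
        for c in counts.values():
            p = c / total  # inferred probability p_ia
            score -= c * math.log(p)
    return score
```

For example, a column containing one `A` and one `C` contributes 2 ln 2, while a column of identical letters contributes nothing.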

Gap characters are treated as residue symbols.

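• Gaps are more likely to occur in groups, so one long gap should cost less than several separate gaps of the same total length.

• Examples of gap functions that reflect this:

• Convex gap scoring functions

• Affine gap functions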

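• Convex gap scoring functions:

• The penalty for each additional gap position decreases as gaps get longer; that is, for all n, $\gamma(n+1) - \gamma(n) \le \gamma(n) - \gamma(n-1)$.

• The recurrence now becomes:

$$F(i,j) = \max \begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ \max_{k=0,\dots,i-1} F(k,j) - \gamma(i-k) \\ \max_{k=0,\dots,j-1} F(i,k) - \gamma(j-k) \end{cases}$$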
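The recurrence above translates directly into code. This sketch computes only the optimal global score for an arbitrary gap penalty function `gamma` and substitution function `s`; both parameter names, the function name, and the base case $F(0,0) = 0$ (with every other boundary cell derived from the same recurrence) are assumptions for illustration. The inner loops over k are what make the general case cubic.

```python
def align_general_gap(x, y, s, gamma):
    """Optimal global alignment score with an arbitrary gap penalty gamma(n)."""
    n, m = len(x), len(y)
    NEG = float("-inf")
    F = [[NEG] * (m + 1) for _ in range(n + 1)]
    F[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = NEG
            if i > 0 and j > 0:
                # x_i aligned with y_j.
                best = F[i - 1][j - 1] + s(x[i - 1], y[j - 1])
            # A gap of length i - k ending at x_i (x[k:i] against gaps).
            for k in range(i):
                best = max(best, F[k][j] - gamma(i - k))
            # A gap of length j - k ending at y_j (y[k:j] against gaps).
            for k in range(j):
                best = max(best, F[i][k] - gamma(j - k))
            F[i][j] = best
    return F[n][m]
```

With a linear `gamma = lambda n: 2 * n` this reproduces the ordinary linear-gap score, and any other penalty function, convex or not, can be passed in unchanged.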

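• Shortcomings of a general gap penalty function:

• Each additional gap position can carry a different penalty, so every cell must consider every possible gap length.

• Updating all entries therefore takes cubic time, O(ab(a+b)) for sequences of lengths a and b, instead of O(ab).

• Affine gap penalty example:

• The first gap position (gap opening) is penalized differently from subsequent positions (gap extension), which are penalized linearly: $\gamma(n) = d + (n-1)e$.

• Three matrices are computed simultaneously, which brings the running time back to O(ab).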
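A sketch of the three-matrix affine-gap computation, assuming the penalty γ(n) = d + (n - 1)e and a common formulation in which M holds alignments ending in an aligned pair, Ix alignments ending with x_i against a gap, and Iy alignments ending with y_j against a gap. For brevity it returns only the optimal score (a traceback would keep pointers as in the global case) and does not let a gap in one sequence be directly followed by a gap in the other; the names `gotoh_affine`, `s`, `d`, and `e` are illustrative.

```python
def gotoh_affine(x, y, s, d, e):
    """Optimal global alignment score with affine gap penalty gamma(n) = d + (n - 1) * e."""
    n, m = len(x), len(y)
    NEG = float("-inf")
    # M: x_i aligned to y_j; Ix: x_i aligned to a gap; Iy: y_j aligned to a gap.
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    # Boundaries: a leading gap of length k costs d + (k - 1) * e.
    for i in range(1, n + 1):
        Ix[i][0] = -d - (i - 1) * e
    for j in range(1, m + 1):
        Iy[0][j] = -d - (j - 1) * e

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Aligned pair: extend the best alignment of the shorter prefixes.
            M[i][j] = s(x[i - 1], y[j - 1]) + max(
                M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1]
            )
            # Gap in y: either open it (pay d) or extend an existing gap (pay e).
            Ix[i][j] = max(M[i - 1][j] - d, Ix[i - 1][j] - e)
            # Gap in x: the same choice, moving along y.
            Iy[i][j] = max(M[i][j - 1] - d, Iy[i][j - 1] - e)
    return max(M[n][m], Ix[n][m], Iy[n][m])
```

Each cell of the three matrices is updated in constant time, so the cost is O(ab) rather than the cubic cost of a general gap penalty.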

References:

• http://webcourse.cs.technion.ac.il/236522/Winter2005-2006/ho/WCFiles/tutorial03.ppt

• http://engr.smu.edu/~saad/courses/cse8354/lectures/lecture6.pdf

• http://www.bioinfo.org.cn/lectures/index-13.html

• Needleman, S.B. and Wunsch, Ch.D. (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol., 48, 443-453.

• Smith, T.F. and Waterman, M.S. (1981) Comparison of Biosequences. Adv. Appl. Math., 2, 482-489.

• Dayhoff,M.O., Barker,W.C. and Hunt,L.T. (1983) Establishing Homologies in Protein Sequences. Methods Enzymol., 91, 524-545.

• Gotoh, O. (1982) An Improved Algorithm for Matching Biological Sequences. J. Mol. Biol., 162, 705-708.