
Sequence Alignment – Scoring Functions, N-W and S-W Affine Gap Penalties

Saurabh Sinha

02/05/2008

Department of Computer Science

University of Illinois Urbana-Champaign

Scribed By: Chandrasekar Ramachandran

### Contents

Introduction

Interpretations

Types of Alignments

Techniques for Solving

Dynamic Programming

Probabilistic Methods

Scoring Functions

N-W and S-W Affine Gap Penalties

### Introduction

Sequence Alignment:

A way of arranging one sequence (DNA, RNA, or protein) against another to determine whether a region has been conserved in evolution or has a common evolutionary origin

Sequences are represented as strings of letters

Matrix Representation:

Sequence 1: - G G C C A G G A T T

Sequence 2: G G G C C - G G - T T

### Interpretations

Mismatches?

Point Mutations: Replacement of a Single Base Nucleotide

Categorized as Transitions and Transversions

Gaps?

Indels or Insertion/Deletion Mutations

Can Produce Frameshift Mutations Unless the Indel Length is a Multiple of 3

Introduced in one or both lineages

### Interpretations (Contd.)

Degree of Similarity

Estimates Conservation

If Conservation is High (Few or Only Conservative Substitutions):

Indicates Region of High Structural or Functional Importance

Estimating Similar Functional Roles:

By Assessing Conservation of Base Pairs

### Solving Sequence Alignment Problems

Dynamic Programming

Initialization

Matrix Fill or Scoring

Traceback

Probabilistic Methods

Bayesian Methods for HMM

Likelihood Derivatives and Fisher Scores

Training and Model Comparison

### Needleman-Wunsch Algorithm (Global Alignment)

Scores for Aligned Characters are Specified by a Similarity Matrix

Example:

Sequence 1: -CCGCTTACCTA

Sequence 2: TTCCGCTTATTA

Possible Alignments:

Sequence 1: -CCGCTTACCTA

Sequence 2: -CCGCTTA----

Score Matches, Mismatches and Indels Separately

### Global Alignment (Contd.)

The Scoring Matrix is Called the F-Matrix

Each (i, j) entry is denoted by $F_{ij}$

Running Time:

For Sequences of Sizes a and b: O(ab)

Summary:

Initialization: Fill in Base Cases in Topmost Row and Leftmost Column

Matrix Fill: Compute Scores of Partial Alignments

Traceback: Follow the Pointer Matrix Back to the Initial Cell to Get the Best Solution
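The three steps summarized above can be sketched in Python. This is a minimal illustration, not the lecture's own code: the function name `needleman_wunsch` and the scores (match +1, mismatch -1, linear gap penalty -1) are assumptions, and the traceback re-derives each move from the F-matrix rather than storing a separate pointer matrix.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, gap=-1):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_x, aligned_y). Scores are illustrative defaults.
    """
    n, m = len(x), len(y)

    # Initialization: base cases in the topmost row and leftmost column.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap

    # Matrix fill: best score of each partial alignment, O(n * m) time.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,   # align x[i-1] with y[j-1]
                          F[i - 1][j] + gap,     # x[i-1] against a gap
                          F[i][j - 1] + gap)     # y[j-1] against a gap

    # Traceback: walk from F[n][m] back to F[0][0], re-deriving each move.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if x[i - 1] == y[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                ax.append(x[i - 1]); ay.append(y[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-")
            i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


score, row1, row2 = needleman_wunsch("ACGT", "AGT")
print(score)
print(row1)
print(row2)
```

When several alignments tie for the best score, this sketch returns only one of them; which one depends on the order in which the three moves are checked.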

### Smith-Waterman Algorithm (Local Alignment)

Involving Stretches Shorter than the Entire Sequence Length

Generally involves Sequences which are significantly dissimilar

Negative Scoring Matrix Cells are Set to Zero

Backtracking starts at highest scoring cell and continues to a cell with zero score

Prerequisite: The Expected Score of a Random Alignment Must Be Negative
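The differences from global alignment listed above (cells clamped at zero, traceback from the highest-scoring cell down to a zero cell) can be sketched as follows. The function name `smith_waterman` and the scores (match +2, mismatch -1, gap -1) are illustrative assumptions, chosen so that a random alignment has a negative expected score.

```python
def smith_waterman(x, y, match=2, mismatch=-1, gap=-1):
    """Local alignment with a linear gap penalty.

    Returns (score, aligned_x_segment, aligned_y_segment).
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best, bi, bj = 0, 0, 0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            # Negative cells are set to zero: an alignment may start anywhere.
            F[i][j] = max(0,
                          F[i - 1][j - 1] + s,
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
            if F[i][j] > best:
                best, bi, bj = F[i][j], i, j

    # Traceback starts at the highest-scoring cell and stops at a zero cell.
    ax, ay = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and F[i][j] > 0:
        s = match if x[i - 1] == y[j - 1] else mismatch
        if F[i][j] == F[i - 1][j - 1] + s:
            ax.append(x[i - 1]); ay.append(y[j - 1])
            i -= 1; j -= 1
        elif F[i][j] == F[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-")
            i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


print(smith_waterman("GGGACGTCCC", "TTTACGTAAA"))
```

In this example the two sequences share only the stretch `ACGT`, so the local alignment reports that segment and ignores the dissimilar flanks.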

### Scoring Functions - Overview

Given sequences, a number is associated with each alignment

E.g. Matches: +x, Mismatches: -y, Gaps: -z

Scoring Function: $(x \times \#\text{matches}) - (y \times \#\text{mismatches}) - (z \times \#\text{gaps})$

Alignment Scores:

Sum of Substitution Scores and Gap Penalties

Residue-Based

Substitution Matrices:

Protein

Evolutionary
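As a small worked example of the linear scoring function on this slide (x per match, y per mismatch, z per gap), the sketch below scores a fixed alignment column by column. The function name `alignment_score`, the default weights, and the example rows are assumptions made for illustration.

```python
def alignment_score(row1, row2, x=1, y=1, z=1):
    """Score = x * #matches - y * #mismatches - z * #gaps for two aligned rows."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = gaps = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            gaps += 1          # a column containing a gap character
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return x * matches - y * mismatches - z * gaps


# 8 matching columns, 0 mismatches, 3 gap columns.
print(alignment_score("-GGCCAGGATT", "GGGCC-GG-TT"))        # 8 - 0 - 3 = 5
print(alignment_score("-GGCCAGGATT", "GGGCC-GG-TT", z=2))   # 8 - 0 - 6 = 2
```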

### Simple Substitution Matrices

Expresses How Likely One Character in a Sequence is to Change to Each Other Character State

N × N Matrix where N = 4 for DNA and N = 20 for Amino Acids

Another way would be to consider A, G as Purines and T, C as Pyrimidines

Transitions (Purine to Purine or Pyrimidine to Pyrimidine) are More Likely than Transversions (Purine to Pyrimidine or Vice Versa)
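A purine/pyrimidine-aware substitution matrix for DNA can be sketched as below. The specific scores (match +1, transition -1, transversion -2) and the names `substitution_score` and `build_matrix` are assumptions for illustration; the only idea taken from the slide is that substitutions within a class are penalized differently from substitutions across classes.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_score(a, b, match=1, transition=-1, transversion=-2):
    """Score one aligned pair of DNA bases."""
    if a == b:
        return match
    # A transition stays within one class; a transversion crosses classes.
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return transition if same_class else transversion


def build_matrix(alphabet="ACGT", **scores):
    """Build the N x N matrix as a dict keyed by (base, base); N = 4 for DNA."""
    return {(a, b): substitution_score(a, b, **scores)
            for a in alphabet for b in alphabet}


M = build_matrix()
print(M[("A", "A")], M[("A", "G")], M[("A", "C")])
```

A dictionary keyed by character pairs keeps lookups readable inside an alignment loop; a nested list indexed by integer codes would be an equally valid design.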

### Minimum Entropy Scoring Function

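Minimum Entropy Score:

Sum of Entropy Scores Computed For Each Column: $S(m) = \sum_i S(m_i)$, with $S(m_i) = -\sum_a c_{ia} \log p_{ia}$

Here,

$i$ is a column

$c_{ia}$ is the count of letter $a$ in column $i$

$p_{ia}$ is the inferred probability of letter $a$ in column $i$, e.g. $p_{ia} = c_{ia} / \sum_{a'} c_{ia'}$

Gap Characters: Can be Treated as Additional Residue Symbols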
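The column-wise computation can be sketched as follows, assuming the common definition in which column i contributes $-\sum_a c_{ia} \log p_{ia}$ and $p_{ia}$ is estimated as the observed frequency $c_{ia} / \sum_{a'} c_{ia'}$. The function names, the use of the natural logarithm, and the choice to count `-` as an ordinary symbol are assumptions of this sketch.

```python
import math
from collections import Counter


def column_entropy_score(column):
    """-sum_a c_a * log(p_a), with p_a estimated as c_a / (column height)."""
    counts = Counter(column)          # gap characters count as symbols here
    total = len(column)
    return -sum(c * math.log(c / total) for c in counts.values())


def minimum_entropy_score(rows):
    """Sum the entropy scores of all columns of a multiple alignment."""
    columns = zip(*rows)              # transpose rows into columns
    return sum(column_entropy_score(col) for col in columns)


# A fully conserved column scores 0; more variable columns score higher.
print(minimum_entropy_score(["AAA", "AAA"]))
print(minimum_entropy_score(["AC", "AG"]))
```

Lower totals correspond to more conserved alignments, which is why the score is minimized.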

### Gap Functions

• Gaps More Likely to Occur in Groups

• Examples:

• Convex Gap Scoring Functions

• Affine Gap Functions

• Convex Gap Scoring Functions:

• The Penalty for Each Additional Gap Position Does Not Increase as the Gap Gets Longer

• $\gamma(n)$: for all $n$, $\gamma(n+1) - \gamma(n) \le \gamma(n) - \gamma(n-1)$

• Now

$$F(i,j) = \max \begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ \max_{k=0,\dots,i-1} F(k,j) - \gamma(i-k) \\ \max_{k=0,\dots,j-1} F(i,k) - \gamma(j-k) \end{cases}$$
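The recurrence above can be transcribed almost directly. In this sketch the substitution score `s` and the gap function `gamma` are passed in as callables; the function name `align_general_gap` and the example gap functions are assumptions. Because every cell scans a whole row and a whole column, the running time is cubic.

```python
import math


def align_general_gap(x, y, s, gamma):
    """Best global alignment score under an arbitrary gap function gamma(n)."""
    n, m = len(x), len(y)
    F = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    F[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = -math.inf
            if i > 0 and j > 0:
                best = F[i - 1][j - 1] + s(x[i - 1], y[j - 1])
            for k in range(i):            # gap of length i - k ending at row i
                best = max(best, F[k][j] - gamma(i - k))
            for k in range(j):            # gap of length j - k ending at column j
                best = max(best, F[i][k] - gamma(j - k))
            F[i][j] = best
    return F[n][m]


def simple_score(a, b):
    return 1 if a == b else -1


# Linear gaps: gamma(n) = n.
print(align_general_gap("ACGT", "AGT", simple_score, lambda n: n))
# Flat gaps: any gap costs 3 regardless of length, so one long gap is cheap.
print(align_general_gap("AAAA", "AATTTTAA", simple_score, lambda n: 3))
```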

### Affine Gap Functions

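• Shortcomings of a General Gap Penalty Function:

• Different Penalties for Each Additional Gap Position

• Cubic Running Time, Since Each Entry Update Scans a Whole Row and Column

• Affine Gap Penalty:

• The First Gap Position is Penalized Differently (Gap Opening); Each Subsequent Position Adds a Constant Penalty (Gap Extension), so the Penalty Grows Linearly

• 3 Matrices are Computed Simultaneously, Restoring Quadratic Running Time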
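A sketch of the three-matrix computation is given below. The matrix names `M`, `Ix`, `Iy`, the function name `affine_gap_score`, and the penalty values (opening 3, extension 1) are assumptions; the gap function being modelled is gap_open for the first position plus gap_extend for each further position.

```python
import math


def affine_gap_score(x, y, match=1, mismatch=-1, gap_open=3, gap_extend=1):
    """Best global alignment score under an affine gap penalty, in O(n * m)."""
    n, m = len(x), len(y)
    NEG = -math.inf
    # M: x[i-1] aligned with y[j-1]; Ix: x[i-1] against a gap; Iy: y[j-1] against a gap.
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = -gap_open - (i - 1) * gap_extend
    for j in range(1, m + 1):
        Iy[0][j] = -gap_open - (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1]) + s
            Ix[i][j] = max(M[i - 1][j] - gap_open,       # open a new gap
                           Ix[i - 1][j] - gap_extend,    # extend the current gap
                           Iy[i - 1][j] - gap_open)
            Iy[i][j] = max(M[i][j - 1] - gap_open,
                           Iy[i][j - 1] - gap_extend,
                           Ix[i][j - 1] - gap_open)
    return max(M[n][m], Ix[n][m], Iy[n][m])


print(affine_gap_score("ACGT", "AGT"))          # 3 matches, one gap of length 1
print(affine_gap_score("AAAA", "AATTTTAA"))     # 4 matches, one gap of length 4
```

Keeping a separate matrix per alignment state lets each cell know whether it would be opening or extending a gap, which is what removes the inner scan over k.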

### References

• http://webcourse.cs.technion.ac.il/236522/Winter2005-2006/ho/WCFiles/tutorial03.ppt