Saurabh Sinha

02/05/2008

Department of Computer Science

University of Illinois Urbana-Champaign

Scribed By: Chandrasekar Ramachandran

Sequence Alignment – Scoring Functions, N-W and S-W Affine Gap Penalties


Contents

Introduction

Interpretations

Types of Alignments

Techniques for Solving

Dynamic Programming

Probabilistic Methods

Scoring Functions

N-W and S-W Affine Gap Penalties


Introduction

Sequence Alignment:

Ways of arranging one sequence (DNA, RNA, protein) against another to determine whether a region has been conserved in evolution or has a common evolutionary origin

Strings of Letters

Matrix Representation:

- G G C C A G G A T T
G G G C C - G G - T T


Interpretations

Mismatches?

Point Mutations: Replacement of a Single Nucleotide Base

Categorized as Transitions and Transversions

Gaps?

Indels or Insertion/Deletion Mutations

Can Produce Frameshift Mutations Unless the Indel Length Is a Multiple of 3

Introduced in one or both lineages


Interpretations (Contd.)

What about Amino Acids?

Degree of Similarity

Estimates Conservation

If Conservation is High (Few or Only Conservative Substitutions):

Indicates Region of High Structural or Functional Importance

Estimating Similar Functional Roles (DNA and RNA):

By Assessing Conservation of Base Pairs


Solving Sequence Alignment Problems

Dynamic Programming

Initialization

Matrix Fill or Scoring

Traceback

Probabilistic Methods

Bayesian Methods for HMM

Likelihood Derivatives and Fisher Scores

Training and Model Comparison


Needleman-Wunsch Algorithm (Global Alignment)

Scores for Aligned Characters Specified by a Similarity Matrix

Example:

Sequence 1: -CCGCTTACCTA

Sequence 2: TTCCGCTTATTA

Possible Alignments:

Sequence 1: -CCGCTTACCTA

Sequence 2: -CCGCTTA----

Score Matches, Mismatches and Indels (Gaps) Separately


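Global Alignment (contd.)

The Scoring Matrix is Called the F-Matrix

Each (i,j) Entry is Denoted by F(i,j)

Running Time:

For Sequences of Size a and b, O(ab)

Summary:

Initialization: Fill in Base Cases in Topmost Row and Leftmost Column

Filling Partial Alignments: Compute Each Remaining Entry from Its Upper, Left and Upper-Left Neighbours, Recording a Pointer to the Chosen Neighbour

Traceback: Follow the Pointer Matrix from the Final Cell Back to the Initial Cell to Get the Best Solution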
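
The three steps summarized above (initialization, matrix fill, traceback) can be sketched in Python as follows. This is a minimal illustration rather than the lecture's own code: the function name `needleman_wunsch` and the default scores (match +1, mismatch -1, gap -2) are assumptions chosen for the example, and a simple match/mismatch rule stands in for a full similarity matrix.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, gap=-2):
    """Global alignment sketch. Returns (score, aligned_x, aligned_y).

    The scores are illustrative assumptions, not values from the slides.
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]       # F-matrix
    ptr = [[None] * (m + 1) for _ in range(n + 1)]  # pointer matrix

    # Initialization: leftmost column and topmost row are all-gap prefixes.
    for i in range(1, n + 1):
        F[i][0], ptr[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        F[0][j], ptr[0][j] = j * gap, "left"

    # Matrix fill: each entry comes from its diagonal, upper or left neighbour.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            candidates = [
                (F[i - 1][j - 1] + s, "diag"),
                (F[i - 1][j] + gap, "up"),
                (F[i][j - 1] + gap, "left"),
            ]
            F[i][j], ptr[i][j] = max(candidates, key=lambda c: c[0])

    # Traceback: follow pointers from the final cell back to (0, 0).
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        move = ptr[i][j]
        if move == "diag":
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif move == "up":
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


# Example: needleman_wunsch("ACGT", "AGT") returns (1, "ACGT", "A-GT")
```

Both loops visit each of the (a+1)(b+1) cells once and do constant work per cell, which is where the O(ab) running time comes from.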


Smith-Waterman Algorithm (Local Alignment)

Aligns Stretches Shorter than the Entire Sequence Length

Generally Used for Sequences that are Significantly Dissimilar Overall but May Share Similar Regions

Negative Scoring Matrix Cells are Set to Zero

Traceback starts at the highest-scoring cell and continues until a cell with zero score is reached

Prerequisite: The Expected Score of Aligning Random Sequences Must Be Negative
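
A minimal Python sketch of the local-alignment rules listed above: negative cells are clamped to zero, and the traceback runs from the highest-scoring cell until a zero cell is reached. The function name `smith_waterman` and the default scores (match +2, mismatch -1, gap -2) are assumptions made for this example only.

```python
def smith_waterman(x, y, match=2, mismatch=-1, gap=-2):
    """Local alignment sketch. Returns (score, aligned_x, aligned_y).

    Scores are illustrative assumptions; with these values a random
    alignment is expected to score below zero, as the method requires.
    """
    n, m = len(x), len(y)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    # Matrix fill: same recurrence as global alignment, but never below zero.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback: start at the best cell, stop at the first zero cell.
    ax, ay = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if x[i - 1] == y[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


# Example: smith_waterman("TTTACGTTT", "GGGACGTGGG") returns (8, "ACGT", "ACGT")
```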


Scoring Functions - Overview

Given sequences, a number is associated with each alignment

E.g. Matches: +x, Mismatches: -y, Gaps: -z

Scoring Function: (x × #Matches) - (y × #Mismatches) - (z × #Gaps)

Alignment Scores:

Sum of Substitution Scores and Gap Penalties

Residue-Based

Substitution Matrices:

Protein

Evolutionary
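
The simple scoring function on this slide, (x × #Matches) - (y × #Mismatches) - (z × #Gaps), can be evaluated directly on a finished alignment. The sketch below assumes the alignment is given as two equal-length strings with `-` marking a gap; the function name `alignment_score` and the default weights are hypothetical.

```python
def alignment_score(row1, row2, x=1, y=1, z=2):
    """Score an alignment as x*#matches - y*#mismatches - z*#gaps.

    row1 and row2 are the two rows of the alignment ('-' marks a gap).
    The default weights are illustrative assumptions.
    """
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have equal length")
    matches = mismatches = gaps = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            gaps += 1          # a column containing a gap character
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return x * matches - y * mismatches - z * gaps


# 8 matching columns and 3 gap columns: 8*1 - 0*1 - 3*2 = 2
print(alignment_score("-GGCCAGGATT", "GGGCC-GG-TT"))
```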


Simple Substitution Matrices

Expresses How Likely One Character in a Sequence Is to Change into Each Other Character State

N × N Matrix where N = 4 for DNA and 20 for Amino Acids

Another way would be to group A, G as Purines and T, C as Pyrimidines

Transversions (Purine ↔ Pyrimidine) Are Less Likely to Occur than Transitions (Purine ↔ Purine or Pyrimidine ↔ Pyrimidine)
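
As an illustration of a 4 × 4 DNA substitution matrix built from the purine/pyrimidine grouping, the sketch below penalizes a change across the two groups more heavily than a change within one group. The names `substitution_score` and `build_matrix` and the numeric scores are assumptions for the example, not values from the lecture.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_score(a, b, match=1, transition=-1, transversion=-2):
    """Score one aligned pair of DNA bases (illustrative scores)."""
    if a == b:
        return match
    # Same chemical class (purine/purine or pyrimidine/pyrimidine): transition.
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return transition if same_class else transversion


def build_matrix(alphabet="ACGT", **scores):
    """Return the N x N matrix as a dict keyed by (base, base) pairs."""
    return {(a, b): substitution_score(a, b, **scores)
            for a in alphabet for b in alphabet}


matrix = build_matrix()
# matrix[("A", "G")] is the transition score, matrix[("A", "C")] the transversion score
```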


Minimum Entropy Scoring Function

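Minimum Entropy Score:

Sum of Entropy Scores Computed for Each Column: $S_i = -\sum_a c_{ia} \log p_{ia}$

Here,

i is a column

$c_{ia}$ is the count of letter a in column i

$p_{ia}$ is the inferred probability of letter a in column i

Gap Characters: Can Be Treated as Additional Residue Symbols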
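
A small Python sketch of the column-wise entropy score, assuming it takes the standard form of minus the sum over letters a of c_ia × log p_ia, with p_ia estimated as the observed frequency c_ia divided by the column height. The frequency estimate, the base-2 logarithm and the function names are assumptions for illustration. Gap characters are counted like any other symbol.

```python
import math
from collections import Counter


def column_entropy_score(column):
    """Entropy score of one alignment column: -sum_a c_a * log2(p_a).

    p_a is taken to be the observed frequency c_a / (column height),
    which is an assumption made for this sketch.
    """
    counts = Counter(column)
    total = len(column)
    return -sum(c * math.log2(c / total) for c in counts.values())


def minimum_entropy_score(rows):
    """Sum the column scores over a multiple alignment given as equal-length rows."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows must have the same, non-zero count and length")
    return sum(column_entropy_score(col) for col in zip(*rows))


# A fully conserved column scores 0; more varied columns score higher.
```

Lower totals indicate better-conserved columns, which is why the alignment that minimizes this score is preferred.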


Gap Functions

  • Gaps More Likely to Occur in Groups

  • Examples:

    • Convex Gap Scoring Functions

    • Affine Gap Functions

  • Convex Gap Scoring Functions:

    • Penalty for Each Additional Gap Position Does Not Increase as Gaps Get Longer

    • γ(n): for all n, γ(n + 1) - γ(n) ≤ γ(n) - γ(n - 1)

    • Now

      $$F(i,j) = \max \begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ \max_{k=0,\dots,i-1} \; F(k,j) - \gamma(i-k) \\ \max_{k=0,\dots,j-1} \; F(i,k) - \gamma(j-k) \end{cases}$$
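
The recurrence above can be transcribed almost literally. The sketch below returns only the optimal score for an arbitrary gap function γ; the function name `align_general_gap`, the example scoring rule and the logarithmic gap function are assumptions for illustration. Because every cell scans a whole row segment and column segment, the work per cell is linear in the sequence length.

```python
import math


def align_general_gap(x, y, s, gamma):
    """Optimal global alignment score with an arbitrary gap penalty gamma(n).

    s(a, b) scores an aligned pair; gamma(n) is the (positive) penalty
    for a gap of length n. Both are supplied by the caller.
    """
    n, m = len(x), len(y)
    neg_inf = float("-inf")
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = neg_inf
            if i > 0 and j > 0:                       # x_i aligned to y_j
                best = F[i - 1][j - 1] + s(x[i - 1], y[j - 1])
            for k in range(i):                        # gap of length i-k ending at x_i
                best = max(best, F[k][j] - gamma(i - k))
            for k in range(j):                        # gap of length j-k ending at y_j
                best = max(best, F[i][k] - gamma(j - k))
            F[i][j] = best
    return F[n][m]


def pair_score(a, b):
    """Hypothetical substitution score: +1 for a match, -1 otherwise."""
    return 1 if a == b else -1


def log_gap(k):
    """Hypothetical convex gap penalty: increments shrink as k grows."""
    return 2 + math.log(k)


score = align_general_gap("ACGT", "AT", pair_score, log_gap)
```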


Affine Gap Functions

  • Shortcomings of a general gap penalty function:

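    • Each Additional Gap Position Can Carry a Different Penalty, So Every Possible Gap Length Must Be Checked

    • Cubic Running Time Overall: Each Entry Takes Linear Time to Update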

  • Example:

  • First Gap Position Penalized Differently (Gap Opening), Subsequent Gap Positions Penalized Linearly (Gap Extension)

  • 3 Matrices Computed Simultaneously
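
One common way to realize the three-matrix scheme is sketched below, assuming an affine penalty of the form γ(n) = d + (n - 1)e with gap-open cost d and gap-extend cost e. The symbols d and e, the matrix names `M`, `Ix`, `Iy`, and the default values are assumptions for this example and do not come from the slides. Each cell now needs only constant work, so the running time returns to quadratic.

```python
def align_affine(x, y, s, d=3, e=1):
    """Optimal global alignment score with affine gaps: gamma(n) = d + (n-1)*e.

    M[i][j]  : best score with x_i aligned to y_j
    Ix[i][j] : best score with x_i aligned to a gap
    Iy[i][j] : best score with y_j aligned to a gap
    """
    n, m = len(x), len(y)
    neg_inf = float("-inf")
    M = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    Ix = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    Iy = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):                 # leading gap in y
        Ix[i][0] = -d - (i - 1) * e
    for j in range(1, m + 1):                 # leading gap in x
        Iy[0][j] = -d - (j - 1) * e

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = s(x[i - 1], y[j - 1]) + max(
                M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            # Open a new gap (cost d) or extend the current one (cost e).
            Ix[i][j] = max(M[i - 1][j] - d, Iy[i - 1][j] - d, Ix[i - 1][j] - e)
            Iy[i][j] = max(M[i][j - 1] - d, Ix[i][j - 1] - d, Iy[i][j - 1] - e)
    return max(M[n][m], Ix[n][m], Iy[n][m])


def pair_score(a, b):
    """Hypothetical substitution score: +1 for a match, -1 otherwise."""
    return 1 if a == b else -1


# "ACGT" vs "AT": two matches and one gap of length 2 -> 2 - (3 + 1) = -2
score = align_affine("ACGT", "AT", pair_score)
```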


References

  • http://webcourse.cs.technion.ac.il/236522/Winter2005-2006/ho/WCFiles/tutorial03.ppt

  • http://engr.smu.edu/~saad/courses/cse8354/lectures/lecture6.pdf

  • http://www.bioinfo.org.cn/lectures/index-13.html

  • Needleman, S.B. and Wunsch, C.D. (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol., 48, 443-453.

  • Smith, T.F. and Waterman, M.S. (1981) Comparison of biosequences. Adv. Appl. Math., 2, 482-489.

  • Dayhoff, M.O., Barker, W.C. and Hunt, L.T. (1983) Establishing homologies in protein sequences. Methods Enzymol., 91, 524-545.

  • Gotoh, O. (1982) An improved algorithm for matching biological sequences. J. Mol. Biol., 162, 705-708.

