Saurabh Sinha
02/05/2008
Department of Computer Science
University of Illinois Urbana-Champaign
Scribed By: Chandrasekar Ramachandran
Sequence Alignment – Scoring Functions, NW and SW, Affine Gap Penalties
Contents:
Introduction
Interpretations
Types of Alignments
Techniques for Solving
Dynamic Programming
Probabilistic Methods
Scoring Functions
NW and SW Affine Gap Penalties
Sequence Alignment:
A Way of Arranging One Sequence (DNA, RNA or Protein) Against Another to Determine Whether a Region Has Been Conserved in Evolution or Has a Common Evolutionary Origin
Strings of Letters
Matrix Representation:
[Figure: matrix representation of an alignment of two nucleotide sequences; the cell layout was lost in extraction]
Mismatches?
Point Mutations: Replacement of a Single Base Nucleotide
Categorized as Transitions and Transversions
Gaps?
Indels or Insertion/Deletion Mutations
Can Produce Frameshift Mutations Unless the Indel Length is a Multiple of 3
Introduced in one or both lineages
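The two kinds of point mutation and the frameshift rule above can be sketched in a few lines. This is only an illustration: the function names `classify_substitution` and `causes_frameshift` are hypothetical, and the frameshift check assumes the indel lies inside a coding region.

```python
# Purines and pyrimidines for DNA; a substitution within one class is a
# transition, a substitution across the two classes is a transversion.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(a: str, b: str) -> str:
    """Label a pair of aligned DNA bases as match, transition or transversion."""
    a, b = a.upper(), b.upper()
    if a == b:
        return "match"
    same_class = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"


def causes_frameshift(indel_length: int) -> bool:
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0
```

For example, A aligned with G is a transition, A aligned with C is a transversion, and an indel of length 3 leaves the reading frame intact.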
What about Amino Acids?
Degree of Similarity
Estimates Conservation
If Conservation is High:
Indicates Region of High Importance
Estimating Similar Functional Roles:
By Assessing Similarity of Base Pairing
Dynamic Programming
Initialization
Matrix Fill or Scoring
Traceback
Probabilistic Methods
Bayesian Methods for HMM
Likelihood Derivatives and Fisher Scores
Training and Model Comparison
Scores for Aligned Characters Specified by a Similarity Matrix
Example:
Sequence 1: CCGCTTACCTA
Sequence 2: TTCCGCTTATTA
Possible Alignments:
Sequence 1:CCGCTTACCTA
Sequence 2:CCGCTTA   
Score Matches, Mismatches and Indels (Gaps) Separately
The Scoring Matrix is Called the F Matrix
Each (i, j) Entry is Denoted by $F_{ij}$
Running Time:
For Sequences of size a and b, O(ab)
Summary:
Initialization: Fill in Base Cases in Topmost Row and Leftmost Column
Matrix Fill: Fill in Scores for Partial Alignments
Traceback: Trace Back Through the Pointer Matrix to the Initial Cell to Get the Best Solution
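The three steps in the summary (initialization, matrix fill, traceback) can be sketched as a small global alignment routine in the style of NW. This is a minimal sketch, not the lecture's own code: the function name `needleman_wunsch` and the scores (match +1, mismatch -1, gap -1) are assumptions, and instead of storing a separate pointer matrix the traceback re-derives each pointer from the F matrix.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def s(i, j):
        # Substitution score for a[i-1] against b[j-1] (1-based matrix indices).
        return match if a[i - 1] == b[j - 1] else mismatch

    # Initialization: base cases in the leftmost column and topmost row.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap

    # Matrix fill: each cell is the best of diagonal, up and left moves.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(i, j),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)

    # Traceback: walk from the bottom-right cell back to the origin.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

Calling `needleman_wunsch("CCGCTTACCTA", "TTCCGCTTATTA")` aligns the two example sequences end to end. The two nested loops over the F matrix are where the O(ab) running time comes from.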
Local Alignment (SW):
Involves Stretches Shorter than the Entire Sequence Length
Generally Used for Sequences which are Significantly Dissimilar
Negative Scoring Matrix Cells are Set to Zero
Traceback Starts at the Highest Scoring Cell and Continues to a Cell with Zero Score
Prerequisite: The Expected Score of a Random Alignment Must be Negative
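The rules above (negative cells set to zero, traceback from the highest scoring cell down to a zero cell) can be sketched as a local alignment routine in the style of SW. This is a minimal sketch under assumed scores (match +2, mismatch -1, gap -2); the name `smith_waterman` is hypothetical. With these values a mismatch or gap costs something and only matches gain, which is in the spirit of the negative expectation prerequisite.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def s(i, j):
        # Substitution score for a[i-1] against b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    H = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # The extra 0 option resets any negative cell to zero.
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s(i, j),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback: start at the highest scoring cell, stop at a zero cell.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + s(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))
```

Two sequences with nothing in common, such as `"AAAA"` and `"TTTT"`, give a score of 0 and an empty alignment, because every cell is clamped to zero.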
Given sequences, a number is associated with each alignment
E.g. Matches: +x, Mismatches: −y, Gaps: −z
Scoring Function: (x × #Matches) − (y × #Mismatches) − (z × #Gaps)
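The scoring function above can be evaluated directly on a given alignment. This is a small sketch: the name `alignment_score` is hypothetical, gaps are assumed to be written as `-`, and every gap column is counted once (a linear gap penalty).

```python
def alignment_score(top, bottom, x=1, y=1, z=1):
    """Score = x * #matches - y * #mismatches - z * #gaps for two aligned rows."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = gaps = 0
    for p, q in zip(top, bottom):
        if p == "-" or q == "-":
            gaps += 1          # a column containing a gap character
        elif p == q:
            matches += 1
        else:
            mismatches += 1
    return x * matches - y * mismatches - z * gaps
```

For instance, with x = y = z = 1 the alignment `G-ATTACA` over `GCA-TGCU` has 4 matches, 2 mismatches and 2 gaps, so its score is 4 − 2 − 2 = 0.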
Alignment Scores:
Sum of Substitution Scores and Gap Penalties
Residue-Based
Substitution Matrices:
Protein
Evolutionary
Expresses How one Character in a Sequence Changes with Other Character States
N X N Matrix where: N=4 for DNA and 20 for Amino Acids
Another way would be to consider A,G as Purines and T,C as Pyrimidines
Substitutions Between the Two Classes (Transversions) are Generally Less Likely than Substitutions Within a Class (Transitions)
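A minimal sketch of an N × N substitution matrix for DNA (N = 4) that uses the purine/pyrimidine grouping mentioned above. The numeric scores and the name `dna_substitution_matrix` are illustrative assumptions, not values from the lecture.

```python
BASES = "ACGT"
PURINES = {"A", "G"}  # T and C are the pyrimidines


def dna_substitution_matrix(match=1, transition=-1, transversion=-2):
    """Return a dict mapping (base, base) pairs to substitution scores."""
    matrix = {}
    for a in BASES:
        for b in BASES:
            if a == b:
                score = match
            elif (a in PURINES) == (b in PURINES):
                score = transition      # both purines or both pyrimidines
            else:
                score = transversion    # one purine, one pyrimidine
            matrix[a, b] = score
    return matrix
```

The result has 4 × 4 = 16 entries and is symmetric; a protein matrix would be built the same way over 20 amino acids, usually from empirical substitution data rather than two hand-picked values.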
Minimum Entropy Score:
Sum of Entropy Scores Computed For Each Column
Here,
i is a column
$c_{ia}$: the count of letter a in column i
$p_{ia}$: the inferred probability of letter a in column i
Gap Characters: Residue Symbols
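The formula for the per-column score did not survive in these notes. A common form that matches the symbols above is $S(m_i) = -\sum_a c_{ia} \log p_{ia}$, and the sketch below assumes that form. It also assumes that $p_{ia}$ is estimated as the observed frequency of letter a in column i and that gap characters are counted like any other residue symbol. The function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy_score(column):
    """Assumed form: -sum_a c_ia * log(p_ia), with p_ia = c_ia / column size."""
    counts = Counter(column)           # gap characters count as symbols too
    total = sum(counts.values())
    return -sum(c * math.log(c / total) for c in counts.values())


def minimum_entropy_score(alignment):
    """Sum of column scores over a list of equal-length aligned rows."""
    return sum(column_entropy_score(col) for col in zip(*alignment))
```

A fully conserved column scores 0, and the score grows as a column becomes more mixed, so under this assumed form a lower total indicates a better-conserved alignment.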
$\max_{k=0,\dots,i-1} F(k,j) - \gamma(i-k)$
$\max_{k=0,\dots,j-1} F(i,k) - \gamma(j-k)$
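The two terms above are the gap cases of a recurrence for F(i, j) with a general gap function γ. The sketch below combines them with a diagonal match/mismatch term and base cases F(i, 0) = −γ(i), F(0, j) = −γ(j); those two additions follow the usual textbook formulation and are assumptions here, as are the names `align_score_general_gap` and `affine_gap` and the default scores. The affine penalty is taken to be γ(g) = d + (g − 1)e for a gap of length g.

```python
def affine_gap(d=2, e=1):
    """Affine gap function: opening cost d, extension cost e per extra position."""
    return lambda g: d + (g - 1) * e


def align_score_general_gap(a, b, gamma, match=1, mismatch=-1):
    """Best global alignment score under an arbitrary gap function gamma(length)."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -gamma(i)            # a single leading gap of length i
    for j in range(1, m + 1):
        F[0][j] = -gamma(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            best = F[i - 1][j - 1] + s
            for k in range(i):         # gap of length i - k ending at row i
                best = max(best, F[k][j] - gamma(i - k))
            for k in range(j):         # gap of length j - k ending at column j
                best = max(best, F[i][k] - gamma(j - k))
            F[i][j] = best
    return F[n][m]
```

Because every cell scans a whole row and column, this general version takes cubic time for sequences of similar length. For the affine case the usual refinement keeps separate matrices for gap states and brings the cost back to O(ab); that refinement is not shown here.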