Inverse Alignment
CS 374, Bahman Bahmani, Fall 2006
The Papers To Be Presented
Sequence Comparison - Alignment
An alignment can be thought of as two sequences that differ because of mutations that occurred during evolution.
AGGCTATCACCTGACCTCCAGGCCGATGCCC
TAGCTATCACGACCGCGGTCGATTTGCCCGAC
(Alignment figure: columns marked | are matches, the column marked x is a mismatch.)
-1 + (-1) + (-2) + 5 + 7 + 3 = 11
BLOSUM50 Scoring Matrix
Separate penalties for gap opening and gap extension
Opening: The cost to introduce a gap
Extension: The cost to elongate a gap
Opening a gap is costly, while extending a gap is cheap
Unlike substitution scoring matrices, there are no commonly agreed-upon gap penalties
-5 -1 -1 -1
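The opening/extension scheme can be sketched in a few lines of Python. This is only an illustration, assuming that the penalties -5 -1 -1 -1 mean the first gapped position pays the opening cost and every later position pays the extension cost; the names `affine_gap_penalty`, `gap_lengths`, `gap_open`, and `gap_extend` are made up for this sketch.

```python
def affine_gap_penalty(length, gap_open=-5, gap_extend=-1):
    """Score of one gap of `length` positions under an affine model.

    Assumption: the first gapped position pays `gap_open` and every
    further position pays `gap_extend`.
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0
    return gap_open + (length - 1) * gap_extend


def gap_lengths(aligned_row):
    """Lengths of the maximal runs of '-' in one row of an alignment."""
    runs, current = [], 0
    for ch in aligned_row:
        if ch == "-":
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs
```

Under this assumption a gap of four positions scores -5 + 3 × (-1) = -8, so one long gap is cheaper than four separate gaps of length one (4 × -5 = -20).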
Definition (Inverse Optimal Alignment):
INPUT: alignments A_1, A_2, …, A_k of strings,
an alignment scoring function f_w with parameters w = (w_1, w_2, …, w_p).
OUTPUT: values x = (x_1, x_2, …, x_p) for w
GOAL: each input alignment is an optimal alignment of its strings under f_x.
ATTENTION: This problem may have no solution!
where A* is the optimal alignment of S under f.
INPUT: alignments A_i
scoring function f
OUTPUT: parameter values x
GOAL: each alignment A_i is ε-optimal under f_x.
The smallest possible ε can be found to within a given accuracy by repeated calls to the algorithm.
for every alignment B of S other than A.
INPUT: alignments A_i
scoring function f
OUTPUT: parameter values x
GOAL: each alignment A_i is δ-unique under f_x.
The largest possible δ can be found to within a given accuracy by repeated calls to the algorithm.
where each f_i measures one of the features of A.
g(A) = number of gaps
l(A) = total length of gaps
s(A) = total score of all substitutions
h_ab(A) = number of substitutions in A replacing a by b,
where a and b range over all letters in the alphabet
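The features g, l, and the substitution counts can be read directly off an alignment, and a linear scoring function is then a weighted sum of them. The sketch below assumes an alignment is given as two equal-length rows with '-' for gap positions, that identities (a = b) are counted as substitutions too, and that a gap in one row immediately followed by a gap in the other row counts as two gaps; the function names and the weight arguments are illustrative.

```python
from collections import Counter
from itertools import groupby


def alignment_features(row_a, row_b):
    """Return g (number of gaps), l (total gap length), h (pair counts)."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    kinds = []          # one label per column
    subs = Counter()    # subs[(a, b)] = columns aligning letter a to letter b
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gap characters")
        if a == "-":
            kinds.append("gap_in_a")
        elif b == "-":
            kinds.append("gap_in_b")
        else:
            kinds.append("sub")
            subs[(a, b)] += 1
    # A gap is a maximal run of columns with the gap in the same row.
    gap_runs = [len(list(group)) for kind, group in groupby(kinds)
                if kind != "sub"]
    return {"g": len(gap_runs), "l": sum(gap_runs), "h": subs}


def linear_score(features, w_gap, w_len, w_sub):
    """Weighted sum of the features: linear in (w_gap, w_len, w_sub)."""
    substitution_part = sum(w_sub[pair] * count
                            for pair, count in features["h"].items())
    return w_gap * features["g"] + w_len * features["l"] + substitution_part
```

Because the score is linear in the weights, the statement "alignment A scores at least as well as alignment B" becomes a linear inequality in the weights once the features of A and B are fixed.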
a system of linear inequalities in x
a linear objective function in x
OUTPUT: assignment of real values to x
GOAL: satisfy all the inequalities and minimize the objective
In general, the program can be infeasible, bounded, or unbounded.
The number of alignments of a pair of strings of length n grows exponentially in n, hence the linear program has exponentially many inequalities in p variables. Also, there is no specific objective function.
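To get a feel for how many inequalities that is, the alignments of two strings can be counted with a short recurrence. The sketch assumes an alignment is a sequence of columns, each one a substitution, an insertion, or a deletion, and that two alignments are different whenever their column sequences differ; the function name `count_alignments` is illustrative.

```python
def count_alignments(m, n):
    """Number of alignments of two strings of lengths m and n.

    table[i][j] counts the alignments of a length-i prefix with a
    length-j prefix; the last column is a deletion, an insertion, or a
    substitution, which gives the three-term recurrence below.
    """
    table = [[1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = (table[i - 1][j]        # last column: deletion
                           + table[i][j - 1]      # last column: insertion
                           + table[i - 1][j - 1]) # last column: substitution
    return table[m][n]


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, count_alignments(n, n))
```

For equal lengths n = 1, …, 5 the counts are 3, 13, 63, 321, 1683, so writing out one inequality per competing alignment is hopeless for realistic sequence lengths.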
INPUT: rational coefficients c specifying the objective
OUTPUT: a point x in P minimizing cx, or determining that P is empty.
INPUT: a point y
OUTPUT: rational coefficients w and b such that wx ≤ b for all points x in P, but wy > b (a violated inequality), or a determination that y is in P.
That is, for bounded rational polyhedra:
add the violated inequality to S and loop back to step (2).
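The loop of "solve the current program, ask for a violated inequality, add it, repeat" can be illustrated on a toy problem with a single parameter. This is not the algorithm from the papers, only a sketch under strong assumptions: scores are maximized, the score of an alignment is (matches - mismatches) - x · (total gap length), the only unknown is the gap penalty x in a bounded interval [lo, hi], and the "linear program" is therefore just an interval whose midpoint is taken as the current solution. All names are illustrative.

```python
from fractions import Fraction


def features(row_a, row_b):
    """Return (m, l): matches minus mismatches, and total gap length."""
    m = l = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            l += 1
        else:
            m += 1 if a == b else -1
    return m, l


def best_alignment_features(s, t, x):
    """Separation oracle helper: (m, l) of an optimal alignment under x."""
    # Each cell holds (score, m, l) with score = m - x * l.
    prev = [(-x * j, 0, j) for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [(-x * i, 0, i)]
        for j in range(1, len(t) + 1):
            d = 1 if s[i - 1] == t[j - 1] else -1
            sc, m, l = prev[j - 1]
            diag = (sc + d, m + d, l)
            sc, m, l = prev[j]
            up = (sc - x, m, l + 1)
            sc, m, l = cur[j - 1]
            left = (sc - x, m, l + 1)
            cur.append(max(diag, up, left))
        prev = cur
    return prev[-1][1], prev[-1][2]


def inverse_alignment_1d(row_a, row_b, lo=Fraction(0), hi=Fraction(10)):
    """Find x in [lo, hi] making the input alignment optimal, else None."""
    s, t = row_a.replace("-", ""), row_b.replace("-", "")
    m_a, l_a = features(row_a, row_b)
    while lo <= hi:
        x = (lo + hi) / 2                     # a point satisfying all cuts so far
        m_b, l_b = best_alignment_features(s, t, x)
        if m_b - x * l_b <= m_a - x * l_a:    # no violated inequality: done
            return x
        # Violated inequality: m_a - x*l_a >= m_b - x*l_b. Add it as a cut.
        if l_b == l_a:
            return None                       # violated for every x
        bound = Fraction(m_b - m_a, l_b - l_a)
        if l_b > l_a:
            lo = max(lo, bound)               # cut reads x >= bound
        else:
            hi = min(hi, bound)               # cut reads x <= bound
    return None                               # cuts are inconsistent
```

With more than one parameter the interval update would be replaced by a call to a linear programming solver, but the roles stay the same: the dynamic program acts as the separation oracle, and each optimal alignment that beats the input alignment contributes one new inequality.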
1. the alignment scoring function is linear
2. the parameter values can be bounded
3. for any fixed parameter choice, an optimal alignment can be found in polynomial time.
Inverse Unique-Optimal Alignment can be solved in polynomial time if in addition:
3’. for any fixed parameter choice, a next-best alignment can be found in polynomial time.
where s is the minimum of all non-identity substitution scores and i is the maximum of all identity scores.
x_{1/2} is expected to generalize better to alignments outside the training set.
INPUT: two sequences x and y
OUTPUT: the alignment a of x and y that maximizes P(a|x,y;w)
RUNNING TIME: O(|x|·|y|)
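A dynamic program with this running time can be sketched as follows. It is a simplification, not the CONTRAlign model: it assumes P(a|x,y;w) is proportional to exp of a weighted feature sum, so that the most probable alignment is the one with the highest weighted sum, and it uses only three illustrative weights (`w_match`, `w_mismatch`, `w_gap`) with no separate opening and extension terms.

```python
def viterbi_alignment(x, y, w_match=1.0, w_mismatch=-1.0, w_gap=-2.0):
    """Return (row_x, row_y, score) for a highest-scoring alignment."""
    n, m = len(x), len(y)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], back[i][0] = i * w_gap, "up"
    for j in range(1, m + 1):
        score[0][j], back[0][j] = j * w_gap, "left"
    # Fill the table: constant work per cell, hence O(|x| * |y|) overall.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = w_match if x[i - 1] == y[j - 1] else w_mismatch
            options = [
                (score[i - 1][j - 1] + pair, "diag"),  # align x[i-1], y[j-1]
                (score[i - 1][j] + w_gap, "up"),       # x[i-1] against a gap
                (score[i][j - 1] + w_gap, "left"),     # y[j-1] against a gap
            ]
            score[i][j], back[i][j] = max(options)
    # Trace back from the bottom-right corner to recover the alignment.
    row_x, row_y = [], []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "diag":
            row_x.append(x[i - 1])
            row_y.append(y[j - 1])
            i, j = i - 1, j - 1
        elif move == "up":
            row_x.append(x[i - 1])
            row_y.append("-")
            i -= 1
        else:
            row_x.append("-")
            row_y.append(y[j - 1])
            j -= 1
    return "".join(reversed(row_x)), "".join(reversed(row_y)), score[n][m]
```

For example, `viterbi_alignment("ACGT", "AGT")` aligns C against a gap, since three matches and one gap (score 1) beat every alignment that introduces a mismatch.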
where w is a real-valued parameter vector not necessarily corresponding to log-probabilities
where P(w) is a Gaussian prior on w, to prevent over-fitting.
1. Hydropathy-based gap context features (CONTRAlignHYDROPATHY)
2. External Information:
2.1. Secondary structure (CONTRAlignDSSP)
2.2. Solvent accessibility (CONTRAlignACCESSIBILITY)
For each conservation range, the uncolored bars give the accuracies for MAFFT (L-INS-i), T-Coffee, CLUSTALW, MUSCLE, and PROBCONS (Bali), in that order, and the colored bar indicates the accuracy for CONTRAlign.