Why do we need alignment? To predict function of proteins or RNAs. Complication: function evolves! To predict structure of proteins or RNAs, a.k.a.
1. Alignment basics
3. Function Prediction Function prediction by homology
A gene or protein is compared against other genes or proteins in a database
If a database sequence shows statistically significant similarity to the query, the function of the unknown gene or protein is inferred from that match
Gives a first-order approximation of the molecular function of the proteins encoded in a genome
Helps prioritize experimental investigation
4. Inference of function
5. Complications Many proteins belong to large families.
Families are composed of subfamilies that arise through gene duplication events
Gene duplication allows one copy to assume a new biological role through mutation
Hence, subfamilies often differ in their biological functionality yet still exhibit a high degree of sequence similarity.
Other complications
Ignoring the multi-domain organization of proteins.
Error propagation
Insufficient masking of low complexity regions
Alternative splicing
Recombination, “gene conversion”
7. How Is It Possible? The structure of a protein is uniquely determined by its amino acid sequence (but sequence is sometimes not enough):
prions
pH, ions, cofactors, chaperones
Structure is conserved much longer than sequence in evolution.
Degree of conservation: Structure > Function >> Sequence
8. How Is It Done? Identify template(s) – initial alignment
Improve alignment
Backbone generation
Loop modelling
Side chains
Refinement
Validation
9. Inference of structure by comparative (homology) modeling
10. CASP competition The main goal of CASP is to obtain an in-depth and objective assessment of our current abilities and inabilities in the area of protein structure prediction. To this end, participants will predict as much as possible about a set of soon to be known structures. These will be true predictions, not “post-dictions” made on already known structures.
11. Critical residue prediction
12. Domain identification
15. Molecular evolution Questions like…
Where did sequence X originate?
What is the phylogeny of X, Y and Z?
Does genome G contain any horizontally transferred sequences?
Are there any duplicated genes? What are their orthology/paralogy relationships?
What are the [relative] rates of various kinds of mutations (synonymous, nonsynonymous, frame-preserving, etc.)?
16. How to make alignments? Visual inspection
dotplots
Manual editing
alignment editors
Automated methods
scoring schemes
dynamic programming algorithms
17. Dotplots
18. Dotplot vs self: repetitive sequence
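A dotplot can be sketched in a few lines. The version below is only an illustration: the function names `dotplot` and `render` and the `window` parameter are assumptions, not part of the slides. It marks a cell whenever the window of residues starting at position i in one sequence equals the window starting at position j in the other. Plotting a repetitive sequence against itself gives the main diagonal plus parallel off-diagonals, one per repeat offset.

```python
def dotplot(seq_a, seq_b, window=1):
    """Return a grid of booleans; cell [i][j] is True when the window
    starting at seq_a[i] is identical to the window starting at seq_b[j]."""
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    return [
        [seq_a[i:i + window] == seq_b[j:j + window] for j in range(cols)]
        for i in range(rows)
    ]


def render(grid):
    """Draw the grid as text: '*' for a matching window, '.' otherwise."""
    return "\n".join(
        "".join("*" if cell else "." for cell in row) for row in grid
    )


# Self-comparison of a short tandem repeat (hypothetical example sequence).
repeat = "ACGACGACG"
print(render(dotplot(repeat, repeat, window=3)))
```

A larger `window` suppresses chance single-residue matches at the cost of missing short or diverged similarities.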
19. Pairwise alignments: terminology Substitutions & insertions/deletions (“indels”)
Collinear alignment
Place gap characters (“-” or “.”) in the sequence so that homologous residues are aligned
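To make the terminology concrete, here is a minimal sketch that takes two already-gapped rows of a collinear alignment and counts matches, substitutions and indel columns. The function name `classify_columns` and the example rows are hypothetical.

```python
def classify_columns(row_a, row_b, gap="-"):
    """Count matches, substitutions and indel columns in a pairwise alignment.

    Both rows must already contain gap characters and have equal length.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    counts = {"match": 0, "substitution": 0, "indel": 0}
    for a, b in zip(row_a, row_b):
        if a == gap and b == gap:
            raise ValueError("a column cannot consist of two gaps")
        if a == gap or b == gap:
            counts["indel"] += 1          # residue aligned to a gap
        elif a == b:
            counts["match"] += 1          # identical residues
        else:
            counts["substitution"] += 1   # different residues
    return counts


# Hypothetical alignment of two short DNA sequences.
print(classify_columns("ACGTCA-G", "AC-TTAAG"))
# {'match': 5, 'substitution': 1, 'indel': 2}
```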
20. Pairwise alignment (DNA)
21. Pairwise alignment (DNA-protein)
22. Alignment path graph
23. Global vs local alignment Global alignment
Entirety of sequences must match
Local alignment
Best match between subsequences
“Semi-local” etc.
Global w.r.t. query, local w.r.t. target
Alignment path graph views
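The difference between global and local alignment shows up directly in the scoring recurrence. The sketch below computes only the best local score, using a Smith-Waterman-style recurrence (a technique the slides do not name at this point). The function name and the default scores are assumptions for illustration. The key point is the `0` in the `max`: a local alignment may start afresh at any cell, so poorly matching flanks are never charged.

```python
def local_alignment_score(a, b, match=5, mismatch=-4, gap=-6):
    """Best score over all pairs of subsequences of a and b
    (Smith-Waterman-style recurrence, linear gap penalty)."""
    best = 0
    prev = [0] * (len(b) + 1)              # row for the previous residue of a
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # The 0 lets the alignment restart here instead of dragging
            # along a negative prefix; a global alignment has no such option.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# Only the shared core "ACGA" contributes; the unrelated flanks are ignored.
print(local_alignment_score("TTTTACGATTTT", "CCCACGACCC"))  # 20
```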
24. Scoring schemes Substitution scores
Simple, e.g. +5 for match, -4 for mismatch
More differentiated
e.g. different scores for transitions/transversions
Most flexible: substitution matrix
Gap penalties
Linear: fixed penalty per gap column
Affine: scores for opening & extending gaps
Local: gaps at the ends are “free”
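A rough sketch of how such a scheme scores a given alignment. The +5/-4 substitution scores come from the slide; the gap values (-10 to open, -1 to extend) and the function name are assumptions. Conventions also differ on whether the first gap column costs the opening penalty alone or opening plus extension; this sketch charges the opening penalty alone. Setting `gap_open == gap_extend` reduces it to a linear penalty. Free end gaps are not modelled here.

```python
def score_alignment(row_a, row_b, match=5, mismatch=-4,
                    gap_open=-10, gap_extend=-1, gap="-"):
    """Score a gapped pairwise alignment with affine gap penalties.

    The first column of a gap run costs gap_open; each further column
    of the same run costs gap_extend (assumed convention).
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    score = 0
    prev_gap_row = None                    # row that held the gap in the previous column
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            gap_row = "a" if a == gap else "b"
            score += gap_extend if gap_row == prev_gap_row else gap_open
            prev_gap_row = gap_row
        else:
            score += match if a == b else mismatch
            prev_gap_row = None
    return score


# Affine penalties prefer one long gap over two short ones ...
print(score_alignment("ACGTTA", "A--TTA"))   # 9
print(score_alignment("ACGTTA", "A-G-TA"))   # 0
# ... whereas a linear penalty (-4 per gap column) cannot tell them apart.
print(score_alignment("ACGTTA", "A--TTA", gap_open=-4, gap_extend=-4))  # 12
print(score_alignment("ACGTTA", "A-G-TA", gap_open=-4, gap_extend=-4))  # 12
```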
25. Example: edit distance Concept from information theory
Minimum number of edit operations required to change one string into another
Hamming distance
Each substituted character scores -1
Levenshtein distance
Each substituted, inserted or deleted character scores -1
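Both distances are short to write down. In the sketch below each edit costs +1, so the returned distance is the negative of the score as phrased on the slide. The function names are illustrative, and the Levenshtein version uses the usual row-by-row dynamic programming.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Minimum number of substitutions, insertions and deletions
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))         # distances from "" to each prefix of b
    for i, x in enumerate(a, start=1):
        curr = [i]                         # distance from a[:i] to ""
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete x
                curr[j - 1] + 1,           # insert y
                prev[j - 1] + (x != y),    # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


print(hamming("karolin", "kathrin"))       # 3
print(levenshtein("kitten", "sitting"))    # 3
```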
26. How might we make edit distance more realistic? Each inserted or deleted character scores -3
DNA
Transition (A:G or C:T) scores -1
Transversion scores -2
English text
Vowel:vowel or consonant:consonant scores -1
Vowel:consonant or consonant:vowel scores -2
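These weights drop straight into the same dynamic programming as plain edit distance. The sketch below is one possible reading of the slide, with costs expressed as positive numbers (indel 3, transition or same-class letter 1, transversion or cross-class letter 2). All function names are hypothetical, and the English variant simply treats every character outside "aeiou" as a consonant.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VOWELS = set("aeiou")


def dna_substitution_cost(x, y):
    """0 for identity, 1 for a transition (A:G or C:T), 2 for a transversion."""
    if x == y:
        return 0
    same_class = ({x, y} <= PURINES) or ({x, y} <= PYRIMIDINES)
    return 1 if same_class else 2


def english_substitution_cost(x, y):
    """0 for identity, 1 within vowels or within consonants, 2 across classes.
    Simplification: anything that is not in 'aeiou' counts as a consonant."""
    x, y = x.lower(), y.lower()
    if x == y:
        return 0
    return 1 if (x in VOWELS) == (y in VOWELS) else 2


def weighted_edit_distance(a, b, sub_cost=dna_substitution_cost, indel_cost=3):
    """Minimum total cost of edits turning a into b under the given weights."""
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel_cost,            # delete x
                curr[j - 1] + indel_cost,        # insert y
                prev[j - 1] + sub_cost(x, y),    # substitute or keep
            ))
        prev = curr
    return prev[-1]


print(weighted_edit_distance("AC", "GT"))        # 2: two transitions
print(weighted_edit_distance("ACGT", "AGT"))     # 3: one deletion
print(weighted_edit_distance("cat", "cut", sub_cost=english_substitution_cost))  # 1
```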
27. Needleman-Wunsch
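A compact sketch of Needleman-Wunsch global alignment, for orientation only. It uses the simple +5/-4 substitution scores from the scoring-schemes slide and a linear gap penalty; the value -6 and the function name are assumptions. The table is filled so that `score[i][j]` holds the best score for aligning the first i residues of `a` with the first j residues of `b`, and a traceback then recovers one optimal alignment.

```python
def needleman_wunsch(a, b, match=5, mismatch=-4, gap=-6):
    """Global alignment with a linear gap penalty.
    Returns (score, gapped_a, gapped_b) for one optimal alignment."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table; the first row and column are all-gap prefixes.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),   # align two residues
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to the origin.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            row_a.append(a[i - 1]); row_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-")
            i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


print(needleman_wunsch("ACGT", "AGT"))   # (9, 'ACGT', 'A-GT')
```

When several alignments tie for the best score, this traceback prefers the diagonal move, so it reports only one of them.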