alignment basics

1. Alignment basics

3. Function Prediction. Function prediction by homology: a gene or protein is compared against other genes or proteins in a database. If a sequence can be detected whose similarity is statistically significant, the function of the unknown gene or protein is inferred from it. This gives a first-order approximation of the molecular function of the proteins encoded in a genome and helps prioritize experimental investigation.

4. Inference of function

5. Complications. Many proteins belong to large families composed of subfamilies that arose through gene duplication events. Gene duplication allows one copy to assume a new biological role through mutation. Hence, subfamilies often differ in their biological function yet still exhibit a high degree of sequence similarity. Other complications: ignoring the multi-domain organization of proteins; error propagation; insufficient masking of low-complexity regions; alternative splicing; recombination and "gene conversion".

7. How Is It Possible? The structure of a protein is uniquely determined by its amino acid sequence (but sequence is sometimes not enough: prions, pH, ions, cofactors, chaperones). Structure is conserved much longer than sequence in evolution: Structure > Function >> Sequence.

8. How Is It Done? Identify template(s) and build an initial alignment; improve the alignment; backbone generation; loop modelling; side chains; refinement; validation.

9. Inference of structure by comparative (homology) modeling

10. CASP competition. The main goal of CASP is to obtain an in-depth and objective assessment of our current abilities and inabilities in the area of protein structure prediction. To this end, participants will predict as much as possible about a set of soon-to-be-known structures. These will be true predictions, not "post-dictions" made on already known structures.

11. Critical residue prediction

12. Domain identification

15. Molecular evolution. Questions like: Where did sequence X originate? What is the phylogeny of X, Y and Z? Does genome G contain any horizontally transferred sequences? Are there any duplicated genes? What are their orthology/paralogy relationships? What are the [relative] rates of various kinds of mutations (synonymous, nonsynonymous, frame-preserving, etc.)?

16. How to make alignments? Visual inspection: dotplots. Manual editing: alignment editors. Automated methods: scoring schemes and dynamic programming algorithms.

17. Dotplots

18. Dotplot vs self: repetitive sequence
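The dotplot idea from slides 17 and 18 can be sketched in a few lines. This is only an illustrative sketch: the function names and the `window` parameter (the word length that must match exactly before a dot is drawn) are assumptions, not something the slides specify.

```python
def dotplot(seq_a, seq_b, window=1):
    """Return a boolean matrix: cell (i, j) is True when the word of
    length `window` starting at seq_a[i] equals the one at seq_b[j]."""
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    return [
        [seq_a[i:i + window] == seq_b[j:j + window] for j in range(cols)]
        for i in range(rows)
    ]


def render(matrix):
    """Draw the matrix as text, one '*' per matching cell."""
    return "\n".join(
        "".join("*" if cell else "." for cell in row) for row in matrix
    )
```

Calling `render(dotplot(s, s, window=3))` on a repetitive sequence such as `"ACGACGACG"` plots the sequence against itself: the main diagonal is always filled, and each repeat shows up as an additional diagonal parallel to it.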

19. Pairwise alignments: terminology. Substitutions and insertions/deletions ("indels"). Collinear alignment: place gap characters ("-" or ".") in the sequences so that homologous residues are aligned.

20. Pairwise alignment (DNA)

21. Pairwise alignment (DNA-protein)

22. Alignment path graph

23. Global vs local alignment. Global alignment: the entirety of both sequences must match. Local alignment: the best match between subsequences. "Semi-local" etc.: global w.r.t. the query, local w.r.t. the target. Alignment path graph views.
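One way to see the difference between global and local alignment is to compute both scores with the same dynamic programming recurrence. The sketch below is illustrative only: the function name is hypothetical, the gap penalty of -6 is an assumed value, and the local mode uses the common trick of flooring every cell at zero, which the slides do not spell out.

```python
def alignment_score(a, b, match=5, mismatch=-4, gap=-6, local=False):
    """Best alignment score of a and b under a linear gap penalty.

    local=False: global alignment, both sequences aligned end to end.
    local=True:  local alignment, best-scoring pair of subsequences.
    The gap value is an assumption made for this sketch.
    """
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            if local:
                cell = max(cell, 0)      # a local alignment may restart anywhere
                best = max(best, cell)   # and may end anywhere
            curr.append(cell)
        prev = curr
    return best if local else prev[-1]
```

For two sequences that share only a short core, the local score rewards the shared core alone, while the global score is pulled down by the unmatched flanks.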

24. Scoring schemes. Substitution scores: simple, e.g. +5 for match, -4 for mismatch; more differentiated, e.g. different scores for transitions/transversions; most flexible: a substitution matrix. Gap penalties: linear (fixed penalty per gap column); affine (scores for opening and extending gaps); local (gaps at the ends are "free").
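To make the gap penalty options concrete, the following sketch scores an alignment that has already been written out as two gapped rows. The +5/-4 defaults come from the slide; the function name and the gap values (-10 to open, -1 to extend) are assumptions chosen for illustration.

```python
def score_alignment(row_a, row_b, match=5, mismatch=-4,
                    gap_open=-10, gap_extend=-1):
    """Score two equal-length gapped rows column by column.

    Affine penalty: a gap run of length k costs
    gap_open + (k - 1) * gap_extend.
    Setting gap_open == gap_extend gives a linear penalty per gap column.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    total = 0
    prev_gap = None  # which row had a gap in the previous column, if any
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            raise ValueError("column with gaps in both rows")
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            # Continuing a gap in the same row is an extension, else an opening.
            total += gap_extend if prev_gap == side else gap_open
            prev_gap = side
        else:
            total += match if x == y else mismatch
            prev_gap = None
    return total
```

With these assumed values, `score_alignment("AC--GT", "ACTTGT")` charges one opening and one extension for the two-column gap, whereas passing `gap_open=-6, gap_extend=-6` charges the same amount for each gap column.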

25. Example: edit distance. A concept from information theory: the minimum number of edit operations required to change one string into another. Hamming distance: each substituted character scores -1. Levenshtein distance: each substituted, inserted or deleted character scores -1.
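Both distances can be sketched briefly. The code below returns the distances as non-negative counts, which is the negative of the score described on the slide; the function names are chosen for this sketch.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Minimum number of substitutions, insertions and deletions
    needed to turn a into b, computed row by row."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, x in enumerate(a, start=1):
        curr = [i]                  # distance from a[:i] to ""
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j - 1] + (x != y),  # substitute (free if equal)
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
            ))
        prev = curr
    return prev[-1]
```

Hamming distance only allows substitutions, so it is defined only for strings of equal length; Levenshtein distance also allows indels and therefore handles strings of different lengths.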

26. How might we make edit distance more realistic? Each inserted or deleted character scores -3. DNA: a transition (A:G or C:T) scores -1; a transversion scores -2. English text: vowel:vowel or consonant:consonant scores -1; vowel:consonant or consonant:vowel scores -2.
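The DNA weighting on this slide can be plugged into the same dynamic programming scheme as plain edit distance. The sketch below expresses the slide's scores as positive costs (indel 3, transition 1, transversion 2); the function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def dna_substitution_cost(x, y):
    """0 for identical bases, 1 for a transition, 2 for a transversion."""
    if x == y:
        return 0
    pair = {x, y}
    is_transition = pair <= PURINES or pair <= PYRIMIDINES
    return 1 if is_transition else 2


def weighted_edit_distance(a, b, sub_cost=dna_substitution_cost, indel_cost=3):
    """Minimum total cost of turning a into b with weighted operations."""
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j - 1] + sub_cost(x, y),  # substitute or keep
                prev[j] + indel_cost,          # delete x
                curr[j - 1] + indel_cost,      # insert y
            ))
        prev = curr
    return prev[-1]
```

Passing a different `sub_cost` function, for example one that distinguishes vowels from consonants, would give the English-text variant from the same slide.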

27. Needleman-Wunsch
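A minimal sketch of Needleman-Wunsch global alignment is given below, assuming the simple +5/-4 substitution scores from slide 24 and a linear gap penalty of -6 (the gap value is an assumption). It fills the score matrix and then traces back one optimal path; when several alignments tie, the one returned depends on the order in which the traceback checks the three moves.

```python
def needleman_wunsch(a, b, match=5, mismatch=-4, gap=-6):
    """Global alignment of a and b; returns (score, row_a, row_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # prefix of a against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # prefix of b against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # a[i-1] against a gap
                score[i][j - 1] + gap,      # b[j-1] against a gap
            )

    # Trace back from the bottom-right corner to recover one alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                row_a.append(a[i - 1])
                row_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))
```

For example, `needleman_wunsch("ACGT", "AGT")` places a single gap opposite the `C`, since every end-to-end alignment of these two sequences needs at least one gap column.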
