
alignment basics

Why do we need alignment? To predict the function of proteins or RNAs. Complication: function evolves! To predict the structure of proteins or RNAs, a.k.a. …

Jims

Presentation Transcript


    1. Alignment basics

    3. Function Prediction. Function prediction by homology: a gene or protein is compared against other genes or proteins in a database; if a sequence can be detected whose similarity is statistically significant, the function of the unknown gene or protein is inferred. This gives a first-order approximation of the molecular function of the proteins encoded in a genome and helps prioritize experimental investigation.

    4. Inference of function

    5. Complications. Many proteins belong to large families, composed of subfamilies that arise by gene duplication events. Gene duplication allows one copy to assume a new biological role through mutation; hence, subfamilies often differ in their biological function yet still exhibit a high degree of sequence similarity. Other complications: ignoring the multi-domain organization of proteins; error propagation; insufficient masking of low-complexity regions; alternative splicing; recombination and “gene conversion”.

    7. How Is It Possible? The structure of a protein is uniquely determined by its amino acid sequence (but sequence is sometimes not enough: prions; pH, ions, cofactors, chaperones). Structure is conserved much longer than sequence in evolution: Structure > Function >> Sequence.

    8. How Is It Done? Identify template(s) and make an initial alignment; improve the alignment; backbone generation; loop modelling; side chains; refinement; validation.

    9. Inference of structure by comparative (homology) modeling

    10. CASP competition. The main goal of CASP is to obtain an in-depth and objective assessment of our current abilities and inabilities in the area of protein structure prediction. To this end, participants will predict as much as possible about a set of soon-to-be-known structures. These will be true predictions, not “post-dictions” made on already known structures.

    11. Critical residue prediction

    12. Domain identification

    15. Molecular evolution. Questions like: Where did sequence X originate? What is the phylogeny of X, Y and Z? Does genome G contain any horizontally transferred sequences? Are there any duplicated genes? What are their orthology/paralogy relationships? What are the [relative] rates of various kinds of mutations (synonymous, nonsynonymous, frame-preserving, etc.)?

    16. How to make alignments? Visual inspection: dotplots. Manual editing: alignment editors. Automated methods: scoring schemes and dynamic programming algorithms.

    17. Dotplots

    18. Dotplot vs self: repetitive sequence
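A dotplot marks every pair of positions where two sequences match, so repeats show up as diagonals parallel to the main diagonal when a sequence is plotted against itself. The following is a minimal text-based sketch, not the tool used in the slides; the function name `dotplot`, the `window` parameter, and the example sequence are illustrative assumptions.

```python
def dotplot(seq_a, seq_b, window=1):
    """Return text rows with '*' where the windows starting at (i, j) are identical."""
    rows = []
    for i in range(len(seq_a) - window + 1):
        row = ""
        for j in range(len(seq_b) - window + 1):
            same = seq_a[i:i + window] == seq_b[j:j + window]
            row += "*" if same else "."
        rows.append(row)
    return rows


# Hypothetical repetitive sequence (three copies of "ACG") plotted against itself.
repeat = "ACGACGACG"
plot = dotplot(repeat, repeat, window=3)
for row in plot:
    print(row)
```

The main diagonal is always filled in a self-comparison; the extra diagonals, offset by the repeat length of 3, are the signature of the repetitive sequence. A larger `window` suppresses chance single-character matches at the cost of sensitivity.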

    19. Pairwise alignments: terminology. Substitutions and insertions/deletions (“indels”). Collinear alignment: place gap characters (“-” or “.”) in the sequences so that homologous residues are aligned.

    20. Pairwise alignment (DNA)

    21. Pairwise alignment (DNA-protein)

    22. Alignment path graph

    23. Global vs local alignment. Global alignment: the entirety of the sequences must match. Local alignment: best match between subsequences. “Semi-local” etc.: global w.r.t. query, local w.r.t. target. Alignment path graph views.
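The difference between global and local alignment can be sketched with one dynamic programming routine that only changes its boundary conditions. This is an illustrative sketch, not the slides' own code: the function name `alignment_score` is hypothetical, the +5/-4 substitution scores are the simple scheme from the scoring slide, and the linear gap penalty of -6 is an assumption. The local variant clamps every cell at zero and reports the best cell anywhere in the matrix, in the style of Smith-Waterman.

```python
def alignment_score(a, b, match=5, mismatch=-4, gap=-6, local=False):
    """Best alignment score of a and b; global by default, local if local=True."""
    # First row: leading gaps are penalised globally, free locally.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                       prev[j] + gap,      # gap in b
                       curr[j - 1] + gap)  # gap in a
            if local:
                cell = max(cell, 0)        # a local alignment may restart anywhere
                best = max(best, cell)
            curr.append(cell)
        prev = curr
    return best if local else prev[-1]


# Hypothetical sequences sharing the subsequence "ACGTACGT".
query, target = "TTTTACGTACGT", "ACGTACGTCCCC"
print("global:", alignment_score(query, target))
print("local: ", alignment_score(query, target, local=True))
```

The local score rewards only the shared eight-residue region, while the global score also has to pay for the unrelated flanks, so it can never exceed the local one.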

    24. Scoring schemes. Substitution scores: simple, e.g. +5 for match, -4 for mismatch; more differentiated, e.g. different scores for transitions/transversions; most flexible: substitution matrix. Gap penalties: linear (fixed penalty per gap column); affine (scores for opening and extending gaps); local (gaps at the ends are “free”).
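To make the linear and affine gap penalties concrete, the sketch below scores an already-gapped pairwise alignment. The function name `score_alignment` and the gap values are assumptions chosen for illustration; only the +5/-4 substitution scores come from the slide. One convention is assumed here: the first column of a gap run costs `gap_open` and each further column costs `gap_extend`, so setting both to the same value gives a linear penalty.

```python
def score_alignment(row_a, row_b, match=5, mismatch=-4,
                    gap_open=-10, gap_extend=-1):
    """Score two equal-length alignment rows that use '-' as the gap character."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    total = 0
    in_gap_a = in_gap_b = False  # are we currently inside a gap run in row a / row b?
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if x == "-":
            total += gap_extend if in_gap_a else gap_open
            in_gap_a, in_gap_b = True, False
        elif y == "-":
            total += gap_extend if in_gap_b else gap_open
            in_gap_a, in_gap_b = False, True
        else:
            total += match if x == y else mismatch
            in_gap_a = in_gap_b = False
    return total


# Six matches and one gap run of length three.
affine = score_alignment("ACGTTTACG", "ACG---ACG")
linear = score_alignment("ACGTTTACG", "ACG---ACG", gap_open=-6, gap_extend=-6)
print(affine, linear)
```

With these assumed values the affine scheme charges -12 for the single three-column gap and the linear scheme charges -18, which illustrates why affine penalties favour a few long gaps over many short ones.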

    25. Example: edit distance. Concept from information theory: the minimum number of edit operations required to change one string into another. Hamming distance: each substituted character scores -1. Levenshtein distance: each substituted, inserted or deleted character scores -1.
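Both distances can be sketched in a few lines. The code below counts edits as positive costs, which is the same as negating the -1 scores on the slide; the function names and example strings are illustrative choices rather than anything from the presentation.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))           # distance from "" to each prefix of b
    for i, x in enumerate(a, start=1):
        curr = [i]                           # distance from a[:i] to ""
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,             # delete x
                            curr[j - 1] + 1,         # insert y
                            prev[j - 1] + (x != y))) # substitute, or keep if equal
        prev = curr
    return prev[-1]


print(hamming("GATTACA", "GACTATA"))     # 2
print(levenshtein("kitten", "sitting"))  # 3
```

Hamming distance only allows substitutions, so it is defined only for strings of equal length; Levenshtein distance also allows indels and therefore handles strings of different lengths.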

    26. How might we make edit distance more realistic? Each inserted or deleted character scores -3. DNA: a transition (A:G or C:T) scores -1; a transversion scores -2. English text: vowel:vowel or consonant:consonant scores -1; vowel:consonant or consonant:vowel scores -2.
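The DNA weighting above drops straight into the Levenshtein recurrence by replacing the unit costs. The sketch below uses positive costs (3 per indel, 1 per transition, 2 per transversion) as the negation of the slide's scores; the names `substitution_cost` and `weighted_edit_distance` are hypothetical, and the code assumes uppercase A, C, G, T input.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_cost(x, y):
    """0 for identity, 1 for a transition (A:G or C:T), 2 for a transversion."""
    if x == y:
        return 0
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return 1 if same_class else 2


def weighted_edit_distance(a, b, indel=3):
    """Minimum total cost of edits turning DNA string a into DNA string b."""
    prev = [j * indel for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * indel]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + indel,                         # delete x
                            curr[j - 1] + indel,                     # insert y
                            prev[j - 1] + substitution_cost(x, y)))  # substitute
        prev = curr
    return prev[-1]


print(weighted_edit_distance("A", "G"))       # 1: transition
print(weighted_edit_distance("A", "C"))       # 2: transversion
print(weighted_edit_distance("ACGT", "AGT"))  # 3: one deletion
```

Because an indel costs more than any substitution, this scheme prefers explaining a difference by a substitution whenever the lengths allow it.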

    27. Needleman-Wunsch
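The slide gives only the algorithm's name, so the following is a minimal sketch of Needleman-Wunsch global alignment under stated assumptions rather than the presenter's version: the +5/-4 substitution scores come from the scoring-schemes slide, the linear gap penalty of -6 is an assumption, and the function name and example sequences are illustrative.

```python
def needleman_wunsch(a, b, match=5, mismatch=-4, gap=-6):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # substitution or match
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a

    # Trace back from the bottom-right corner, preferring the diagonal move.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                row_a.append(a[i - 1])
                row_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


total, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(total)
print(top)
print(bottom)
```

Several alignments can share the optimal score; which one is returned depends on the tie-breaking order in the traceback, which here prefers the diagonal move, then a gap in `b`, then a gap in `a`.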
