Pairwise Sequence Analysis-I

Pairwise Sequence Analysis-I • Analogous Concepts • Domain Concepts • Dot-matrix Analysis • Pairwise Sequence Analysis • Why is it useful? • What are the underlying concepts? • What algorithms are used? • Limitations/Open questions Lecture 2 CS566

Concepts • Pairwise “everything” analysis • Humans are experts at pairwise comparison • “Peas of a pod” • Basis for pecking order in society • Basis for notion of fairness (“He got a better deal than I did!”) • Basis for grouping • Basis for sorting/ranking algorithms • Basis for search strategies Lecture 2 CS566

Concepts • Identity versus similarity • Brand name versus generic products • Coincidence versus plagiarism • Co-discoveries versus “Microsoft Explorer” • “What you see is not what you got” • Process not evident from product • “If it ain’t broke, don’t fix it” • Why messy code that works is left to remain messy Lecture 2 CS566

Domain Concepts • Sequences change over generations • More changes in DNA than in proteins (Why?) • Causes of change • Error rates of replication ~ 1 in a billion • Exchange • Legitimate: Recombination • Illegitimate: Lateral transfer (a la Napster) • Mutation • Exposure to chemicals Lecture 2 CS566

Dot-matrix Analysis Lecture 2 CS566

Why (Motivation) • Given sequence A, what are its properties? • Find any sequence(s) B similar to A, where B  {Sequences with known properties} • Given sequences A & B, are they • Identical? • Similar? • Evolutionary related? • Given sequence A, is this a new discovery? • Given a set of sequences, can they be clustered into families of related sequences? Lecture 2 CS566

Components • Compare the two sequences and assign score • Evaluate the score • Use of a scoring scheme to assign a value to a candidate alignment • Finding the best alignment between two sequences • Use of a probability model to assess the significance of the similarity (coincidence or plagiarism?) Lecture 2 CS566

Concepts - Alignment • Sequences may be aligned ‘locally’ (look for regions/subsequences that are similar) or ‘globally’ (align along entire length) • The result of a local alignment may differ from that a global alignment (Why?) • Potentially large number of alignments to be considered • Each alignment is a path through a fully connected dot matrix graph, i.e., a subset of the set of all diagonal edges, connected by horizontal or vertical lines. Lecture 2 CS566

Concepts – Scoring scheme • Scores are based on amino-acid substitution matrices (log-odds ratio of observed versus expected random substitutions) • PAM (Percent amino acid substitution) matrices: Based on evolutionary model. E.g., PAM 1%, PAM 250%. • BLOSUM (Blocks substitution) matrices: Based on percent identity. E.g., Blosum50, Blosum62. Lecture 2 CS566

Concepts – Scoring scheme Identity based scoring (match XOR non-match) Isawa --- dancer Isawatapdancer Isawa ---- danc-er Isawapinkpanther Similarity based scoring (partial matching) Eyl dkv Lecture 2 CS566

Algorithms • Optimal/Exact solutions: • Take longer time • Typically used for comparing a small number of sequences • (Needleman-Wunsch; Smith-Waterman) • Heuristic: • Frequently close to the optimal solution • Rapid • Typically used for 1:n searches involving large number of sequences • (BLAST, FASTA) Lecture 2 CS566

Open Questions/Limitations Highly similar sequences or highly dissimilar sequences are easily found BUT • Gray area lies in weak similarity; probability distribution is continuous • Other forms of similarity (e. g., structural or species similarity) may be necessary reach a decision • Sequence similarity per se does not guarantee functional equivalence; small difference can be responsible for large difference in function Lecture 2 CS566

Pairwise Sequence Analysis-I

Pairwise Sequence Analysis-I

Presentation Transcript

Pairwise sequence alignments

Pairwise Sequence Alignment

Pairwise Sequence Alignment

Pairwise sequence Alignment

Pairwise Sequence Analysis-V

Pairwise Sequence Alignment

Pairwise Sequence Alignment

Pairwise sequence Alignment

Pairwise sequence alignment

Pairwise Sequence Alignment

Pairwise Sequence Alignments

Pairwise Sequence Comparison

Pairwise Sequence Alignment

Pairwise sequence alignment

Pairwise Sequence Alignment

Pairwise sequence comparison

Pairwise sequence alignments

Pairwise sequence Alignment

Pairwise Sequence Alignment

Pairwise Sequence Alignment (I)

Pairwise Sequence Alignment

Pairwise sequence alignment