
ProbCons: Probabilistic consistency-based multiple sequence alignment

Genome Research 2005. By Serafim Batzoglou et al., Stanford University. Majid Kazemian.


Presentation Transcript


  1. ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Research 2005. By Serafim Batzoglou et al., Stanford University. Majid Kazemian

  2. Multiple Sequence Alignment • Biologists need accurate tools for multiple sequence alignment to study evolution • Conserved stretches of amino acids often indicate preserved 3D protein structure • Obtaining an accurate alignment is difficult • High computational cost • Lack of a proper objective function for measuring alignment quality • A variety of heuristic strategies have been proposed, including genetic algorithms (GA), simulated annealing (SA), dynamic programming (DP), and greedy approaches

  3. ClustalW • The most popular heuristic strategies involve tree-based progressive alignment (e.g., ClustalW) • Construct pairwise alignments • Build a guide tree • Align progressively, following the guide tree • Post-process with iterative refinement • Ad hoc sum-of-pairs scheme for scoring • Errors in the early stages of alignment propagate to the final alignment

  4. Consistency: prevention is the best medicine • Every multiple alignment induces pairwise alignments that are necessarily consistent with one another • We want to incorporate multiple-sequence information to guide each pairwise alignment • E.g., adjust the score for an xi-yj residue pairing according to support from residues zk that align to both xi and yj

  5. Algorithm overview • Step 1: Computation of posterior-probability matrices • Step 2: Computation of expected accuracies • Step 3: Probabilistic consistency transformation • Re-estimate the posterior probabilities (PP) by incorporating the other sequences • Step 4: Computation of guide tree • Step 5: Progressive alignment • Post-processing: iterative refinement

  6. Step 1: Computation of posterior probabilities • Uses a pair-HMM to model the alignment of two sequences x and y • State M emits two letters, one from each sequence • State Ix emits a letter from x that aligns to a gap • State Iy emits a letter from y that aligns to a gap
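A minimal sketch of how match posteriors can be obtained from a three-state pair-HMM (M, Ix, Iy) with the forward and backward recursions. The transition values `delta` (gap open) and `eps` (gap extend), the toy emission model, and the function name are all assumptions made for illustration; they are not ProbCons's trained parameters, and no separate end state is modeled.

```python
def pair_hmm_posteriors(x, y, delta=0.1, eps=0.3, p_same=0.8, alphabet=4):
    """Return post[i][j] = P(x[i] aligned to y[j] | x, y) under a toy pair-HMM."""
    n, m = len(x), len(y)

    def emit_pair(a, b):
        # Hypothetical emission model for state M: identical letters are favored.
        if a == b:
            return p_same / alphabet
        return (1 - p_same) / (alphabet * (alphabet - 1))

    emit_gap = 1.0 / alphabet  # hypothetical uniform emission for Ix and Iy

    def zeros():
        return [[0.0] * (m + 1) for _ in range(n + 1)]

    # Forward pass: probability of emitting the prefixes and ending in each state.
    fM, fX, fY = zeros(), zeros(), zeros()
    fM[0][0] = 1.0  # the start behaves like state M
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:
                fM[i][j] = emit_pair(x[i - 1], y[j - 1]) * (
                    (1 - 2 * delta) * fM[i - 1][j - 1]
                    + (1 - eps) * (fX[i - 1][j - 1] + fY[i - 1][j - 1]))
            if i > 0:
                fX[i][j] = emit_gap * (delta * fM[i - 1][j] + eps * fX[i - 1][j])
            if j > 0:
                fY[i][j] = emit_gap * (delta * fM[i][j - 1] + eps * fY[i][j - 1])
    total = fM[n][m] + fX[n][m] + fY[n][m]

    # Backward pass: probability of emitting the suffixes given the current state.
    bM, bX, bY = zeros(), zeros(), zeros()
    bM[n][m] = bX[n][m] = bY[n][m] = 1.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue
            diag = emit_pair(x[i], y[j]) * bM[i + 1][j + 1] if i < n and j < m else 0.0
            down = emit_gap * bX[i + 1][j] if i < n else 0.0
            right = emit_gap * bY[i][j + 1] if j < m else 0.0
            bM[i][j] = (1 - 2 * delta) * diag + delta * (down + right)
            bX[i][j] = (1 - eps) * diag + eps * down
            bY[i][j] = (1 - eps) * diag + eps * right

    # Posterior that x[i] and y[j] are emitted together by state M.
    return [[fM[i][j] * bM[i][j] / total for j in range(1, m + 1)]
            for i in range(1, n + 1)]
```

Calling `pair_hmm_posteriors("ACGT", "ACGT")` returns a 4 x 4 matrix in which each row sums to at most 1, since a residue can be matched at most once in any alignment.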

  7. Step 2: Computation of expected accuracies • Compute the maximum expected accuracy alignment of each sequence pair • That is, report the alignment that maximizes expected accuracy under the pairwise posterior probabilities
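The maximization in Step 2 can be sketched as a Needleman-Wunsch-style dynamic program in which the score of matching xi with yj is its posterior probability and gaps score zero. This is an illustrative sketch: the function name is hypothetical, and normalizing by the shorter sequence length is an assumption about how expected accuracy is defined.

```python
def max_expected_accuracy(post):
    """post[i][j] = P(x_i ~ y_j | x, y). Returns (expected accuracy, matched pairs)."""
    n, m = len(post), len(post[0])
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + post[i - 1][j - 1],  # match x_i with y_j
                best[i - 1][j],                           # x_i against a gap
                best[i][j - 1],                           # y_j against a gap
            )
    # Trace back to recover which residue pairs were matched.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j - 1] + post[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best[n][m] / min(n, m), pairs
```

The returned accuracy is the quantity that can then serve as a similarity measure between two sequences when the guide tree is built.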

  8. Step 3: Probabilistic consistency transformation • Re-estimate the match quality scores (PP) by applying the probabilistic consistency transformation
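One way to sketch the transformation: for each pair (x, y), average the products of the posterior matrices that pass through every sequence z in the set, treating a sequence's alignment to itself as the identity matrix. The data layout (a dictionary keyed by sequence-name pairs) and all function names are assumptions for illustration, and dense matrices are used for simplicity.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(u * v for u, v in zip(row, col)) for col in cols] for row in a]

def consistency_transform(post, lengths):
    """post[(x, y)]: len(x)-by-len(y) posterior matrix; lengths: name -> length.

    Returns new matrices P'_xy = (1 / |S|) * sum over z of P_xz * P_zy.
    """
    def lookup(x, y):
        if x == y:
            return identity(lengths[x])       # a sequence aligned to itself
        if (x, y) in post:
            return post[(x, y)]
        return transpose(post[(y, x)])        # reuse the stored orientation

    transformed = {}
    for (x, y) in post:
        total = [[0.0] * lengths[y] for _ in range(lengths[x])]
        for z in lengths:
            product = matmul(lookup(x, z), lookup(z, y))
            total = [[t + p for t, p in zip(t_row, p_row)]
                     for t_row, p_row in zip(total, product)]
        transformed[(x, y)] = [[v / len(lengths) for v in row] for row in total]
    return transformed
```

With three one-residue sequences where a and b each align to c with probability 1.0 but to each other with only 0.5, the transformed a-b score rises to 2/3, which is the kind of third-sequence support described on the consistency slide.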

  9. Step 4: Computation of guide tree • Constructs a guide tree for the sequence set S by hierarchical clustering, similar to UPGMA • Uses expected accuracy as the measure of similarity between two sequences • Defines the similarity of two clusters as the weighted average of pairwise similarities between the sequences of the clusters
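A small sketch of UPGMA-style clustering on similarities: repeatedly merge the two most similar clusters, then define the merged cluster's similarity to every other cluster as a size-weighted average. The nested-tuple tree representation and the function name are illustrative assumptions.

```python
def build_guide_tree(names, similarity):
    """names: sequence names; similarity[(a, b)]: pairwise expected accuracy.

    Returns the guide tree as nested tuples, e.g. (("a", "b"), "c").
    """
    sizes = {name: 1 for name in names}  # cluster (tree node) -> number of leaves
    sim = {frozenset(pair): value for pair, value in similarity.items()}

    while len(sizes) > 1:
        nodes = list(sizes)
        # Pick the most similar pair of current clusters.
        a, b = max(
            ((u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]),
            key=lambda pair: sim[frozenset(pair)],
        )
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        # Size-weighted average similarity from the merged cluster to the rest.
        for c in sizes:
            sim[frozenset((merged, c))] = (
                size_a * sim[frozenset((a, c))] + size_b * sim[frozenset((b, c))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b

    return next(iter(sizes))
```

The tree is read bottom-up during progressive alignment: the innermost tuples are aligned first.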

  10. Step 5: Progressive alignment & Iterative refinement • Aligns sequences hierarchically in the order given by the guide tree, using the transformed match quality scores • Iterative refinement: randomly partitions the alignment into two groups of sequences and realigns them
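The refinement loop can be sketched as follows. Only the random partitioning is shown concretely; the group-to-group realignment is passed in as a hypothetical `realign` callable, because its details are not given on the slide. The number of rounds and the seed are arbitrary illustrative defaults.

```python
import random

def strip_gap_only_columns(rows):
    """Drop columns that contain only gaps after a group is split off."""
    keep = [c for c in range(len(rows[0])) if any(row[c] != "-" for row in rows)]
    return ["".join(row[c] for c in keep) for row in rows]

def random_split(alignment, rng):
    """alignment: name -> gapped row. Returns two non-empty sub-alignments."""
    names = list(alignment)
    rng.shuffle(names)
    cut = rng.randint(1, len(names) - 1)  # both groups keep at least one sequence
    groups = []
    for group in (names[:cut], names[cut:]):
        rows = strip_gap_only_columns([alignment[name] for name in group])
        groups.append(dict(zip(group, rows)))
    return groups

def refine(alignment, realign, rounds=10, seed=0):
    """Repeatedly split the alignment at random and realign the two groups.

    `realign(left, right)` is a caller-supplied (hypothetical) function that
    aligns two sub-alignments and returns a single name -> row mapping.
    """
    rng = random.Random(seed)
    for _ in range(rounds):
        left, right = random_split(alignment, rng)
        alignment = realign(left, right)
    return alignment
```

Each split preserves the residues of every sequence; only the gap placement within and between the two groups is open to change when they are realigned.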

  11. Some additional features • Estimates the reliability of alignment columns based on pairwise posterior probabilities • ProbCons-ext • Adds an extra pair of insertion states to model long or terminal insertions

  12. Results & performance of ProbCons • Benchmark alignment databases • BAliBASE 2.01 (Thompson et al. 1999a) • PREFAB 3.0 (Edgar 2004) • 1932 alignments averaging 49 sequences of length 240 • SABMARK 1.63 (Van Walle et al. 2004) • Measures of alignment accuracy • SP (sum-of-pairs score) • CS (column score)
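The two accuracy measures can be sketched as follows, assuming SP is the fraction of residue pairs aligned in the reference that are also aligned in the test alignment, and CS is the fraction of reference columns reproduced exactly. Benchmark suites may restrict scoring to annotated core regions, which this sketch does not do; the function names are illustrative.

```python
def columns(alignment):
    """alignment: list of equal-length gapped rows.

    Returns each column as a tuple of residue indices (None where a row has a gap).
    """
    counters = [0] * len(alignment)
    result = []
    for c in range(len(alignment[0])):
        column = []
        for s, row in enumerate(alignment):
            if row[c] == "-":
                column.append(None)
            else:
                column.append(counters[s])
                counters[s] += 1
        result.append(tuple(column))
    return result

def aligned_pairs(cols):
    """All (seq a, residue, seq b, residue) pairs that share a column."""
    pairs = set()
    for col in cols:
        for a in range(len(col)):
            for b in range(a + 1, len(col)):
                if col[a] is not None and col[b] is not None:
                    pairs.add((a, col[a], b, col[b]))
    return pairs

def sp_score(test, reference):
    ref_pairs = aligned_pairs(columns(reference))
    return len(ref_pairs & aligned_pairs(columns(test))) / len(ref_pairs)

def cs_score(test, reference):
    ref_cols = columns(reference)
    test_cols = set(columns(test))
    return sum(col in test_cols for col in ref_cols) / len(ref_cols)
```

For the reference `["AC-G", "ACTG"]` and the test alignment `["ACG-", "ACTG"]`, two of the three reference residue pairs and two of the four reference columns are recovered, so SP is 2/3 and CS is 0.5.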

  13. Results on BAliBASE

  14. Estimated Column reliabilities

  15. PREFAB results

  16. Discussion & Conclusion • Dramatically improves alignment accuracy • Uses a simple model of sequence similarity (HMM) • Does not incorporate biological knowledge such as position-specific gap scoring or evolutionary tree construction; it could be refined further by incorporating these features • Competitive, but still has a high computational cost for long sequences • Can be used in RNA structure alignment and prediction
