Comprehensive Guide to Multiple Sequence Alignment in Bioinformatics
Learn about approaches, algorithms, and applications of Multiple Sequence Alignment (MSA). Understand ClustalW, progressive alignment, genetic algorithms, and more. Explore the limitations and benefits of MSA.
Presentation Transcript
Multiple Sequence Alignment Urmila Kulkarni-Kale Bioinformatics Centre University of Pune urmila@bioinfo.ernet.in
Approaches: MSA • Dynamic programming • Progressive alignment: ClustalW • Genetic algorithms: SAGA
Progressive alignment approach • Align the most closely related sequences first • Add less related sequences to the initial alignment • Perform pairwise alignments of all sequences • Use the alignment scores to build a guide tree (an approximate phylogenetic tree) • Align sequences sequentially, guided by the tree • In progressive methods, gaps are added to an existing profile
Steps in ClustalW Algorithm • Pairwise alignment: calculate the distance matrix • Unrooted neighbor-joining (NJ) tree • Rooted NJ tree • Sequence weights • Progressive alignment using the guide tree
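The first steps of this pipeline (pairwise alignment, distance matrix, guide tree) can be sketched in a few lines. This is a minimal illustration, not the ClustalW implementation: the scoring values (match=1, mismatch=-1, gap=-2), the function names and the toy sequences are assumptions, and a simple average-linkage clustering stands in for neighbor-joining to keep the example short.

```python
from itertools import combinations

def nw_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b (Needleman-Wunsch, linear gap cost) and return
    the fraction of aligned columns that hold identical residues.
    Assumes at least one sequence is non-empty; scores are illustrative."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path, counting identical columns.
    i, j, same, cols = n, m, 0, 0
    while i > 0 or j > 0:
        cols += 1
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            same += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return same / cols

def guide_tree(seqs):
    """Build a nested-tuple guide tree from a dict {name: sequence}.
    Uses average-linkage clustering as a stand-in for neighbor-joining."""
    names = list(seqs)
    dist = {frozenset(p): 1.0 - nw_identity(seqs[p[0]], seqs[p[1]])
            for p in combinations(names, 2)}
    clusters = [(name, [name]) for name in names]  # (subtree, leaf names)

    def avg(c1, c2):
        pairs = [(x, y) for x in c1[1] for y in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: avg(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]

# Toy sequences (hypothetical): A and B are close, C is divergent.
toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TTTTGGGG"}
tree = guide_tree(toy)
print(tree)
```

The nested tuples give the order of alignment: the innermost pair is aligned first, and the remaining sequence is added to that alignment afterwards.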
ClustalW: sequence weights • Groups of closely related sequences receive lower weights • Highly divergent sequences without any close relatives receive higher weights
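One way to obtain weights with this behaviour is to derive them from the branch lengths of the rooted guide tree, sharing each branch equally among the sequences below it. The sketch below assumes that scheme; the tree representation, the function names and the branch lengths are hypothetical and chosen only for illustration.

```python
def leaves(node):
    """Return the leaf names under a node.
    A leaf is a string; an internal node is a list of (child, branch_length)."""
    if isinstance(node, str):
        return [node]
    return [leaf for child, _ in node for leaf in leaves(child)]

def sequence_weights(node, inherited=0.0, weights=None):
    """Weight of a leaf = sum over the branches on its root-to-leaf path of
    (branch length / number of leaves sharing that branch)."""
    if weights is None:
        weights = {}
    if isinstance(node, str):
        weights[node] = inherited
        return weights
    for child, length in node:
        share = length / len(leaves(child))
        sequence_weights(child, inherited + share, weights)
    return weights

# Hypothetical rooted tree: A and B are close relatives, C has no close relative.
toy_tree = [([("A", 0.1), ("B", 0.1)], 0.4), ("C", 0.6)]
w = sequence_weights(toy_tree)
print(w)
```

In this toy tree A and B split the 0.4 branch they share, so each ends up with a lower weight than C, which keeps its whole branch to itself.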
ClustalW: affine gap penalty • GOP: Gap Opening Penalty • GEP: Gap Extension Penalty • Heuristics in calculating gap penalty: position-specific penalty • Gap at position? Yes: lower GOP and GEP. No, but gap within 8 residues: increase GOP • Stretch of hydrophilic residues? Yes: lower GOP. No: use residue-specific gap propensities • Once a gap, always a gap
[Figure: Variation in local GOP. Labels: initial GOP; highest GOP in ‘gapped regions’; lowest GOP in hydrophilic regions]
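The affine penalty and the position-specific rules can be written down as two small functions. This is a rough sketch under stated assumptions: the cost convention GOP + n × GEP is one common choice, the default values and the scaling factors (0.5 and 1.5) are hypothetical, and the order of the checks is one reading of the decision rules on the slide rather than the exact ClustalW procedure.

```python
def affine_gap_cost(length, gop=10.0, gep=0.2):
    """Cost of a single gap spanning `length` residues: pay GOP once to open
    the gap and GEP for each residue it covers. Values are illustrative."""
    if length <= 0:
        return 0.0
    return gop + gep * length

def local_gop(initial_gop, gap_here, gap_within_8, hydrophilic_stretch,
              residue_factor=1.0, lower_factor=0.5, raise_factor=1.5):
    """Adjust the gap opening penalty at one alignment position.
    The factors are hypothetical placeholders, not ClustalW's values."""
    if gap_here:                 # a gap already exists at this position
        return initial_gop * lower_factor
    if gap_within_8:             # no gap here, but one within 8 residues
        return initial_gop * raise_factor
    if hydrophilic_stretch:      # run of hydrophilic residues
        return initial_gop * lower_factor
    return initial_gop * residue_factor  # residue-specific gap propensity

# One gap of length 4 costs less than two separate gaps of length 2.
one_long = affine_gap_cost(4)
two_short = 2 * affine_gap_cost(2)
print(one_long, two_short)
```

The comparison at the end shows why an affine scheme favours a few long gaps over many short ones: the opening penalty is paid once per gap, not once per residue.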
MSA helps detect similarity • Hemoglobin: human, chimpanzee, goat, pig, horse and mouse
Applications of MSA • Detecting diagnostic patterns • Phylogenetic analysis • Primer design • Prediction of protein secondary structure • Finding novel relationships between genes: similar genes conserved across organisms; same or similar function • Simultaneous alignment of similar genes yields: regions subject to mutation; regions of conservation; mutations or rearrangements causing change in conformation or function
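Regions of conservation and regions subject to mutation can be read off a finished alignment column by column. The sketch below scores each column by the fraction of sequences that share its most common residue; the scoring rule, the function name and the toy alignment are assumptions made for illustration, and real tools typically use more refined measures.

```python
from collections import Counter

def column_conservation(alignment, gap="-"):
    """For each column of an alignment (list of equal-length strings), return
    the fraction of sequences carrying the most common non-gap residue."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != gap]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top / len(column))
    return scores

# Toy alignment (hypothetical sequences).
toy_alignment = ["ACG-T",
                 "ACGAT",
                 "ATG-T"]
scores = column_conservation(toy_alignment)
conserved = [i for i, s in enumerate(scores) if s == 1.0]
print(scores, conserved)
```

Columns with a score of 1.0 are fully conserved, while low-scoring columns mark positions that vary between sequences or are mostly gaps.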
Limitations of the progressive alignment approach • Greedy nature • Any errors in the initial alignment are carried through to the final alignment • More effective for closely related sequences than for divergent sequences