
Sequence alignment is central to bioinformatics!


austin

Presentation Transcript


  1. Sequence alignment is central to bioinformatics!
     • Phylogenetic trees and molecular evolution
     • Identifying genes in a genome
     • Predicting function of unknown genes
     • Predicting protein structure
     • Assembling genome sequences

  2. What does it mean to “align” DNA sequences?
     GCCTACGACCTCCAGAC
     GCGTTGG--CTCCAGAC
     [Figure: globin gene family tree along an axis of evolutionary time. Labels: α1, α2, α-globin, ε (embryonic), δ (fetal), β (adult), β-globin, ancestral globin gene, leghemoglobin (legumes), myoglobin (muscle)]

  3. A hemoglobin alignment

  4. Terms for related sequences
     [Figure labels: paralogs, orthologs]
     • Homologous sequences – same evolutionary origin
     • Similar sequences – need not share a common origin
     • Orthologs – homologous genes in different species (arising by speciation)
     • Paralogs – homologous genes within a species (arising by duplication)

  5. What can we learn from this pairwise alignment?
     species 1  GCCTACGACCTCCAGAC
     species 2  GCGTTGG--CTCCAGAC

  6. How did the OYOP mutation detection algorithm work? Did you feel that it wasn’t fully satisfactory?
     What about inserting any combination of any number of gaps at any position until the best score is obtained?

     GCCTAC   GCCTAC   GCCTAC   GCCTAC
     GCT---   G-CT--   G--CT-   G---CT

     GCCTAC   GCCTAC   GCCTAC   GCCTAC
     G-C-T-   G-C--T   -G-C-T   --G-CT
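To make the brute-force idea on this slide concrete, here is a minimal Python sketch (not part of the original slides). It pads the shorter sequence GCT with gaps in every possible way and counts matches against GCCTAC. The function names and the plain match-count score are assumptions chosen for illustration, and gaps are only placed in the shorter sequence.

```python
from itertools import combinations
from math import comb

def gapped_versions(short, length):
    """Yield every way to pad `short` with '-' up to `length` columns."""
    for positions in combinations(range(length), len(short)):
        row = ["-"] * length
        for pos, base in zip(positions, short):
            row[pos] = base
        yield "".join(row)

def count_matches(row1, row2):
    """Count columns where both rows carry the same base (gaps never match)."""
    return sum(a == b and a != "-" for a, b in zip(row1, row2))

reference = "GCCTAC"
candidates = list(gapped_versions("GCT", len(reference)))
best = max(candidates, key=lambda row: count_matches(reference, row))

print(len(candidates), "gap placements tried")
print(reference)
print(best, "matches:", count_matches(reference, best))

# The number of placements grows combinatorially with sequence length,
# e.g. padding a 90-base sequence to 100 columns:
print(comb(100, 90))
```

Even for this toy case there are 20 placements to score, and the count explodes for realistic lengths, which is why exhaustive gap insertion is not a practical strategy.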

  7. Dot-matrix alignment: a simple algorithm
     [Figure labels: window 1, window 2]
     • “Sliding window” of fixed length
     • Dot when window 1 matches window 2
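The sliding-window rule can be sketched in a few lines of Python. This is an illustrative sketch, not the slides' own code: it assumes a window "matches" only when the two substrings are identical, whereas real dot-plot tools often accept a match above some threshold. The names `dot_matrix` and `render` are hypothetical.

```python
def dot_matrix(seq1, seq2, window=3):
    """Return a grid of booleans.

    grid[i][j] is True when the window of seq1 starting at i is identical
    to the window of seq2 starting at j (exact matching is an assumption).
    """
    rows = len(seq1) - window + 1
    cols = len(seq2) - window + 1
    return [
        [seq1[i:i + window] == seq2[j:j + window] for j in range(cols)]
        for i in range(rows)
    ]

def render(grid):
    """Draw the grid as text: '*' for a dot, '.' for no dot."""
    return "\n".join(
        "".join("*" if cell else "." for cell in row) for row in grid
    )

# The two sequences from the slides, with the gaps removed.
grid = dot_matrix("GCCTACGACCTCCAGAC", "GCGTTGGCTCCAGAC", window=3)
print(render(grid))
```

Runs of dots along a diagonal mark regions that are similar in both sequences; a shift from one diagonal to another hints at an insertion or deletion.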

  8. Dot-matrix alignment
     • Uses:
        • See similar or different regions
        • Look for repeats, insertions, deletions
     • Drawbacks:
        • Noisy unless sequences are very similar
        • Does not show how the sequences align
        • Does not produce a score

  9. Scoring an alignment
     species 1  GCCTACGACCTCC    GCCTACGACCTCC
     species 2  GCGTTGG--CTCC    GCGTT-GGC-TCC
     • Considerations:
        • Measure percent identity
        • Scoring: match, mismatch, gap
        • Two different alignments may give the same score
     Is a gap worse than a mismatch? Why? Is a longer gap worse?
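The two alignments on this slide can be scored with a short Python sketch. The parameter values (match +1, mismatch 0, gap -1) and the choice of alignment length as the denominator for percent identity are assumptions for illustration; the slides do not fix either.

```python
def score_alignment(row1, row2, match=1, mismatch=0, gap=-1):
    """Sum per-column scores for two aligned rows of equal length."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    score = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            score += gap          # a column containing a gap
        elif a == b:
            score += match        # identical bases
        else:
            score += mismatch     # different bases
    return score

def percent_identity(row1, row2):
    """Identical columns as a percentage of alignment length (one convention)."""
    matches = sum(a == b and a != "-" for a, b in zip(row1, row2))
    return 100.0 * matches / len(row1)

species1 = "GCCTACGACCTCC"
alignment_a = "GCGTTGG--CTCC"   # one gap of length 2
alignment_b = "GCGTT-GGC-TCC"   # two gaps of length 1

print(score_alignment(species1, alignment_a))   # 8 matches, 2 gap columns
print(score_alignment(species1, alignment_b))   # 8 matches, 2 gap columns
print(round(percent_identity(species1, alignment_a), 1))
```

Under this scheme both alignments have 8 matches, 3 mismatches and 2 gap columns, so they score the same. Telling them apart requires a scheme that treats one long gap differently from two short ones.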

  10. Needleman-Wunsch algorithm
     • Compares two sequences
     • Global alignment
     • Considers matches, mismatches and gaps
     • Provides an optimal alignment and score for the chosen scoring scheme
        • Not necessarily the biologically “correct” alignment
     • Efficient: uses dynamic programming
        • Break a problem into manageable sub-problems
        • Assemble sub-problems to solve the original problem
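A minimal Python sketch of Needleman-Wunsch with a linear gap penalty follows. It is a teaching sketch under assumed parameters (match +1, mismatch 0, gap -1), not a reference implementation. When several alignments tie for the best score, the traceback below returns just one of them.

```python
def needleman_wunsch(s1, s2, match=1, mismatch=0, gap=-1):
    """Global alignment; returns (score, aligned_s1, aligned_s2)."""
    n, m = len(s1), len(s2)
    # F[i][j] = best score for aligning s1[:i] with s2[:j] (the sub-problems).
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub,   # align the two bases
                          F[i - 1][j] + gap,       # gap in s2
                          F[i][j - 1] + gap)       # gap in s1

    # Trace back from the bottom-right cell to recover one optimal alignment.
    row1, row2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and s1[i - 1] == s2[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub:
            row1.append(s1[i - 1]); row2.append(s2[j - 1])
            i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            row1.append(s1[i - 1]); row2.append("-")
            i -= 1
        else:
            row1.append("-"); row2.append(s2[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(row1)), "".join(reversed(row2))

score, top, bottom = needleman_wunsch("GCCTACGACCTCCAGAC", "GCGTTGGCTCCAGAC")
print(score)
print(top)
print(bottom)
```

The table has (n+1) × (m+1) cells and each is filled in constant time, so the work grows with the product of the two sequence lengths rather than with the number of possible gap placements.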

  11. Needleman-Wunsch parameters
     • Match score (“match bonus”)
     • Mismatch score – often zero, could penalize mismatch
     • Gap penalty (linear or affine)
     • Various scoring methods can be used with the basic algorithm
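The difference between linear and affine gap penalties can be shown with a small Python sketch. The penalty values and the convention used here (opening a gap costs `gap_open`, each additional position costs `gap_extend`) are assumptions; other tools charge the extension cost for every position including the first.

```python
import re

def linear_gap_cost(length, per_position=-1):
    """Linear: every gap position costs the same."""
    return per_position * length

def affine_gap_cost(length, gap_open=-2, gap_extend=-1):
    """Affine: opening a gap is expensive, extending it is cheaper."""
    return gap_open + gap_extend * (length - 1)

def gap_run_lengths(row):
    """Lengths of each run of consecutive '-' in one aligned row."""
    return [len(run) for run in re.findall(r"-+", row)]

def total_gap_cost(row1, row2, cost):
    """Apply a gap cost function to every gap run in both rows."""
    return sum(cost(k) for row in (row1, row2) for k in gap_run_lengths(row))

species1 = "GCCTACGACCTCC"
one_long_gap = "GCGTTGG--CTCC"
two_short_gaps = "GCGTT-GGC-TCC"

for name, cost in (("linear", linear_gap_cost), ("affine", affine_gap_cost)):
    print(name,
          total_gap_cost(species1, one_long_gap, cost),
          total_gap_cost(species1, two_short_gaps, cost))
```

With the linear penalty both alignments pay the same for their gaps; with the affine penalty the single two-base gap is cheaper than two separate one-base gaps, reflecting the idea that one insertion or deletion event is more plausible than two.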

  12. Semi-global (“glocal”) alignments
     • Align a gene with a genome
     • Align a domain with a protein
     • Align start of one sequence with end of another
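One common way to obtain a semi-global alignment is to keep the global dynamic-programming recurrence but stop charging for gaps at the ends of either sequence. The Python sketch below returns only the score and assumes the same illustrative parameters as before (match +1, mismatch 0, gap -1); the function name is hypothetical.

```python
def semiglobal_score(s1, s2, match=1, mismatch=0, gap=-1):
    """Best alignment score when leading and trailing end gaps are free."""
    n, m = len(s1), len(s2)
    # First row and column stay at zero: leading end gaps cost nothing.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub,
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # Trailing end gaps are free too: take the best cell in the last row
    # or last column instead of insisting on the bottom-right corner.
    return max(max(F[n]), max(row[m] for row in F))

# A short "gene" found inside a longer "genome".
print(semiglobal_score("CTCC", "GCGTTGGCTCCAGAC"))

# The end of one sequence overlapping the start of another.
print(semiglobal_score("AAAGCT", "GCTTTT"))
```

A strictly global alignment of the same pairs would be dominated by the penalty for the unmatched flanking bases, which is why the free end gaps matter for these use cases.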

  13. Local alignments
     • Partial match between sequences:
        • Allow for introns
        • Find shared domains within distinct proteins
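The slides do not name an algorithm for local alignment; the standard choice is Smith-Waterman, sketched below in Python. It differs from the global recurrence in two ways: cell scores are never allowed to drop below zero, and the traceback starts at the highest-scoring cell and stops at a zero. The parameters (match +2, mismatch -1, gap -2) are assumptions; a negative mismatch score is needed so that poorly matching flanks are excluded.

```python
def smith_waterman(s1, s2, match=2, mismatch=-1, gap=-2):
    """Local alignment; returns (score, aligned_part_of_s1, aligned_part_of_s2)."""
    n, m = len(s1), len(s2)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            H[i][j] = max(0,                       # start a fresh local alignment
                          H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score falls to zero.
    row1, row2 = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if s1[i - 1] == s2[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            row1.append(s1[i - 1]); row2.append(s2[j - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            row1.append(s1[i - 1]); row2.append("-")
            i -= 1
        else:
            row1.append("-"); row2.append(s2[j - 1])
            j -= 1
    return best, "".join(reversed(row1)), "".join(reversed(row2))

# Two otherwise unrelated sequences that share the region GCCTAC.
print(smith_waterman("TTTTGCCTACAAAA", "GGGGGCCTACGGGG"))
```

Only the shared region is reported; the unrelated flanks are left out of the alignment, which is the behavior wanted when looking for a shared domain in otherwise distinct proteins.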
