1 / 1

Novel computational methods for large scale genome comparison

Novel computational methods for large scale genome comparison Todd James Treangen

zoey
Download Presentation

Novel computational methods for large scale genome comparison

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Novel computational methods for • large scale genome comparison • Todd James Treangen • The current wealth of available genomic data embodies a wide variety of species, spanning all domains of life, thus providing an unprecedented opportunity to compare and contrast evolutionary histories of closely and distantly related organisms. The identification of homologous DNA is a fundamental building block of comparative genomic and molecular evolution studies. To date, sequence alignment has proven to be a versatile tool for comparing closely and distantly related organisms, nucleotide at a time. For pairwise comparisons, optimal global and local alignment for a given scoring scheme requires O (m x n) time and space with respect to the lengths m,n of the genomes. This computational bottleneck becomes exponential for multiple genomes, O (nk) for k genomes, making large scale multiple genome alignment a daunting task using traditional methods. • The focus of this dissertation is on developing novel algorithms and software for efficient global and local comparison of multiple genomes and the application of these methods for a biologically relevant case study. The thesis research is organized following the main focus into three successive phases, specifically: (1) multiple genome alignment of closely related species, (2) local multiple alignment of interspersed repeats, and finally, (3) a comparative genomics case study of DNA uptake sequences (DUS) in Nesseria. • In Phase 1, we first develop an efficient algorithm and data structure for maximal substring match search in multiple genome sequences. Our approach for searching for maximal unique substrings yields significant improvements, in both time and space, over existing methods when dealing with multiple genome sequences. Specifically, given S1 … Sm genome sequences (where S1 is the smallest genome), we are able to find the maximal unique substrings among all of the sequence in linear time O(|S1|+ … + Sm|) and linear space O(|S1|). We then implement these contributions into an interactive multiple genome comparison and alignment tool, M-GCAT, which can efficiently construct multiple genome comparison frameworks in closely related species. In Phase 2, we present a novel computational method for local multiple alignment of interspersed repeats. Our method for local alignment of interspersed repeats features a novel method for gapped extensions of chained seed matches, joining global multiple alignment with a homology test based on a hidden Markov model. We implement our method for local alignment of interspersed repeats into the open source procrastAlign software. In Phase 3, using the results from the previous two phases we perform a case study of neisserial genomes by tracking the propagation of repeat sequence elements in attempt to understand why the important pathogens of the neisserial group have sexual exchange of DNA by natural transformation. • In conclusion, our global contributions in this dissertation have focused on comparing and contrasting evolutionary histories of related organisms via their genomes. We have proposed novel data structures, algorithms, and software for active areas of research in comparative genomics. We have experimentally demonstrated the efficiency and accuracy of the computational methods presented in this thesis for multiple alignment, and have implemented all data structures and algorithms in freely available software. Novel computational methods for large scale genome comparison Novel computational methods for large scale genome comparisons of related species DOCTORAL THESIS, 2008 Todd James Treangen PhD Director: Dr. Xavier Messeguer Departament de LlenguatgesiSistemesInformàtics UniversitatPolitècnica de Catalunya

More Related