Reconstruction of Ancestral Gene Order after Segmental Duplication and Gene Loss



Introduction

Identifying homologous chromosome segments in different organisms is an important problem in comparative genomics. The task is challenging when the genomes are distantly related (particularly if they have undergone large-scale duplication and gene loss [4]) and when the markers mapped in common between species are sparse relative to the number of rearrangement events. It has been proposed that multiple genome alignments can improve the power to detect segmental homologies over the more common pairwise alignment approach [1, 2], but methods for multiple genome alignment appropriate for large, sparsely mapped nuclear genomes have received little study. One approach to the problem is to reconstruct marker order and content in a common ancestor of the genomes under study, since it should generally be easier to detect segmental homologies between the most recent common ancestor and each of its contemporary descendants than among the contemporary descendants themselves.

Here we propose a heuristic algorithm for inferring ancestral gene order in a set of related genomes. We evaluate the accuracy of the reconstructions obtained with this algorithm on simulated genomes that evolve under large-scale duplication, gene loss and a variety of chromosomal rearrangement events. We used two measures of reconstruction quality: normalized breakpoint (BP) distance and coverage. A contig's normalized BP distance is its induced breakpoint distance (relative to the original genome) divided by the contig's length. Coverage is the total number of distinct markers in all contigs divided by the number of markers in the original genome. We compared the quality of eAssembler contigs with that of the segments produced by a pairwise local genome alignment method alone (FISH [3]); the result is presented in Figure 1. The quality of reconstruction as a function of the total number of rearrangements is presented in Figure 2.
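
As a minimal sketch of the two quality measures, the Python below computes a contig's normalized BP distance and the coverage of a set of contigs. It assumes unsigned markers (an adjacency counts as preserved if the two markers are neighbours in either order) and an original genome given as a single marker sequence; the function names are hypothetical and not taken from eAssembler.

```python
from typing import FrozenSet, Hashable, Iterable, Sequence, Set


def adjacencies(seq: Sequence[Hashable]) -> Set[FrozenSet[Hashable]]:
    """Unordered neighbouring pairs in a marker sequence (assumption: unsigned markers)."""
    return {frozenset(pair) for pair in zip(seq, seq[1:])}


def induced_bp_distance(p: Sequence[Hashable], q: Sequence[Hashable]) -> int:
    """Restrict P and Q to their shared markers, then count adjacencies of P' missing from Q'."""
    shared = set(p) & set(q)
    p_sub = [m for m in p if m in shared]
    q_sub = [m for m in q if m in shared]
    return len(adjacencies(p_sub) - adjacencies(q_sub))


def normalized_bp_distance(contig: Sequence[Hashable], original: Sequence[Hashable]) -> float:
    """Induced BP distance to the original genome divided by the contig length (BP/marker)."""
    return induced_bp_distance(contig, original) / len(contig)


def coverage(contigs: Iterable[Sequence[Hashable]], original: Sequence[Hashable]) -> float:
    """Distinct markers across all contigs divided by the number of markers in the original genome."""
    distinct = set()
    for contig in contigs:
        distinct.update(contig)
    return len(distinct) / len(set(original))


original = list(range(1, 11))
contigs = [[3, 2, 1, 7, 8], [4, 5, 6]]
print(induced_bp_distance(contigs[0], original))     # 1: only the 1-7 adjacency is broken
print(normalized_bp_distance(contigs[0], original))  # 0.2
print(coverage(contigs, original))                   # 0.8
```

Because both restricted sequences contain the same markers, they have the same number of adjacencies, so counting in either direction gives the same induced distance.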

eAssembler Overview

eAssembler (Evolutionary Assembler) is an algorithm that iteratively detects homologies from sequence data and assembles them into larger segments that more closely resemble the ancestral genome sequence. The three major steps are as follows:

• Homologous segments are detected using an existing pairwise local genome alignment algorithm.
• A clustering algorithm determines which segments are to be assembled and in what order:
  • Initially, the algorithm places each pair of homologous segments in a separate cluster.
  • The algorithm then iteratively joins two existing clusters P and Q that satisfy two conditions, governed by parameters t and k:
    • P and Q must share at least t markers.
    • There must exist a median m whose distance to every segment in the two clusters is at most k.
  • When more than one join is possible, the join with the maximal number of shared markers is preferred.
• An optimal median is computed and reported for each cluster using a parallelized version of [5].

References

[1] G. Blanc, K. Hokamp, and K. H. Wolfe. Genome Res, 13:137–144, 2003.
[2] J. E. Bowers, B. A. Chapman, J. Rong, and A. H. Paterson. Nature, 422:433–438, 2003.
[3] P. P. Calabrese, S. Chakravarty, and T. J. Vision. Bioinformatics, 19:i74–i80, 2003.
[4] H.-M. Ku, T. Vision, J. Liu, and S. D. Tanksley. Proc Natl Acad Sci USA, 97:9121–9126, 2000.
[5] D. Sankoff, D. Bryant, M. Deneault, B. F. Lang, and G. Burger. J Comput Biol, 7:521–536, 2000.
[6] L.-S. Wang and T. Warnow. In Proc. 33rd Annual ACM Symp. on Theory of Computing. ACM Press, 2001.

Conclusions and Future Work

The simulation results show that long contigs with only minor rearrangements relative to the ancestral order can be obtained with the eAssembler algorithm when the rearrangement frequency is below roughly 0.2 to 0.3 rearrangements per marker. Such contigs could substantially improve the detection sensitivity of local genome alignment algorithms [2].

In our current work, we are optimizing the assembly parameters to improve the quality of the reconstructions, and applying eAssembler to biological datasets to test whether such reconstructions allow more sensitive detection of segmental homologies.

Experimental Study

• We applied eAssembler to synthetic datasets to see how well it could reconstruct ancestral marker order.
• A multi-chromosome genome is obtained by simulation:
  • Initially, a single-chromosome genome is created by generating a marker sequence of length n.
  • The following events are then applied to the genome according to a generalized Nadeau-Taylor model [6]:
    • Deletion (a randomly selected single marker is deleted)
    • Inversion (a randomly selected substring is reversed)
    • Transposition (a randomly selected substring is moved to a new location)
    • Reciprocal translocation (random substrings are mutually exchanged between two marker sequences)
    • Whole genome duplication

Figure 1. Left: distribution of normalized BP distances (BP/marker). Right: length distribution, in number of markers, of segments (FISH) and contigs (eAssembler). Mean normalized BP distance: 0.260 (FISH) and 0.316 (eAssembler). Mean length: 10.6 markers (FISH segments) and 19.9 markers (eAssembler contigs).

Figure 2. Normalized BP distance (left) and coverage (right) under varying numbers of rearrangement events, using representative parameters. The three lines show results for different fixed proportions of deletion : inversion : transposition : translocation, namely 12:6:1:1 (solid), 11:3:3:3 (dashed) and 9:9:1:1 (dot-dashed). Ten genomes were simulated and assembled for each point. Vertical bars show one standard deviation.

Breakpoint Distance

Given a sequence of markers P = p1, p2, ..., pn and a permutation Q of P, the breakpoint distance between P and Q is the number of adjacencies of P that are broken in Q.
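
The breakpoint distance just defined, together with the simulation protocol described under Experimental Study, can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' simulator, and it only loosely follows the generalized Nadeau-Taylor model [6]. Assumptions: markers are unsigned integers; each event type is drawn with fixed relative weights (deletion : inversion : transposition : translocation, as in the proportions of Figure 2); a single whole genome duplication is applied just before a chosen event index; and the names `simulate`, `random_span`, `weights` and `wgd_at` are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Genome = List[List[int]]  # chromosomes as lists of marker ids
EVENTS = ("deletion", "inversion", "transposition", "translocation")


def breakpoint_distance(p: Sequence[int], q: Sequence[int]) -> int:
    """Adjacencies of P that are broken in Q (assumption: unsigned markers)."""
    adj_q = {frozenset(pair) for pair in zip(q, q[1:])}
    return sum(frozenset(pair) not in adj_q for pair in zip(p, p[1:]))


def random_span(chrom: List[int], rng: random.Random) -> Tuple[int, int]:
    """A random non-empty slice [i, j) of the chromosome."""
    i, j = sorted(rng.sample(range(len(chrom) + 1), 2))
    return i, j


def simulate(n: int, num_events: int, weights=(12, 6, 1, 1),
             wgd_at: Optional[int] = None, seed: int = 0) -> Genome:
    rng = random.Random(seed)
    genome: Genome = [list(range(1, n + 1))]  # single ancestral chromosome
    for step in range(num_events):
        if step == wgd_at:
            genome += [list(chrom) for chrom in genome]  # whole genome duplication (hypothetical timing)
        event = rng.choices(EVENTS, weights=weights)[0]
        if event == "translocation":
            if len(genome) >= 2:  # needs two chromosomes
                a, b = rng.sample(genome, 2)
                i, j = random_span(a, rng)
                u, v = random_span(b, rng)
                a[i:j], b[u:v] = b[u:v], a[i:j]  # mutually exchange the two substrings
            continue
        chrom = rng.choice(genome)
        if event == "deletion":
            if len(chrom) > 1:  # keep chromosomes non-empty
                del chrom[rng.randrange(len(chrom))]
        elif event == "inversion":
            i, j = random_span(chrom, rng)
            chrom[i:j] = chrom[i:j][::-1]
        else:  # transposition: cut a substring and reinsert it at a random position
            i, j = random_span(chrom, rng)
            block = chrom[i:j]
            del chrom[i:j]
            pos = rng.randrange(len(chrom) + 1)
            chrom[pos:pos] = block
    return genome


ancestor = list(range(1, 21))
evolved = simulate(20, 5, weights=(0, 1, 1, 0), seed=1)  # rearrangements only, no gene loss
print(breakpoint_distance(ancestor, evolved[0]))
```

With deletions or duplication enabled the evolved chromosomes are no longer permutations of the ancestor, so comparisons then need the induced breakpoint distance rather than the plain one.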

Given two marker sequences P = p1, p2, ..., pn and Q = q1, q2, ..., qm (with no duplicated markers within P or within Q), let P' (Q') be the subsequence of P (Q) consisting only of the markers shared by P and Q. The induced breakpoint distance (BP) between P and Q is the breakpoint distance between P' and Q'.

Given two sequences P = p1, p2, ..., pn and Q = q1, q2, ..., qm, a median of P and Q is a permutation of the set of markers occurring in P or Q. An optimal median for a set S of sequences is a median that minimizes the sum of induced breakpoint distances to the members of S. By analogy to similar problems in sequence assembly, an optimal median is also referred to as a contig.

Jun Huan¹, Jan Prins¹, Wei Wang¹, Todd Vision²
¹Department of Computer Science and ²Department of Biology, University of North Carolina at Chapel Hill
Contact: huan@cs.unc.edu
This work is partially supported by NSF grant DBI-0227314 and a Bioinformatics and Computational Biology Graduate Fellowship from UNC.
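
The clustering step described in the eAssembler Overview, and the optimal median (contig) defined above, can be illustrated with a small brute-force sketch. It is not the authors' implementation: the published method computes medians with a parallelized version of [5], whereas this sketch enumerates every ordering of a cluster's markers and is only usable for a handful of markers. It also assumes unsigned markers, reads "share at least t markers" as the intersection of the two clusters' marker sets, and keeps the first pair found when several joins tie on shared markers. All function names (`assemble`, `has_median_within`, `optimal_median`, ...) are hypothetical.

```python
from itertools import combinations, permutations
from typing import Iterator, List, Sequence, Tuple

Segment = Tuple[int, ...]
Cluster = List[Segment]


def adjacencies(seq: Sequence[int]) -> set:
    """Unordered neighbouring pairs (assumption: unsigned markers)."""
    return {frozenset(pair) for pair in zip(seq, seq[1:])}


def induced_bp(p: Sequence[int], q: Sequence[int]) -> int:
    """Breakpoint distance between P and Q restricted to their shared markers."""
    shared = set(p) & set(q)
    p_sub = [m for m in p if m in shared]
    q_sub = [m for m in q if m in shared]
    return len(adjacencies(p_sub) - adjacencies(q_sub))


def markers(cluster: Cluster) -> set:
    return set().union(*cluster)


def candidate_medians(cluster: Cluster) -> Iterator[Segment]:
    """Every ordering of the cluster's marker set (exponential: tiny inputs only)."""
    return permutations(sorted(markers(cluster)))


def has_median_within(cluster: Cluster, k: int) -> bool:
    """Join condition: some median is within induced BP distance k of every segment."""
    return any(all(induced_bp(m, s) <= k for s in cluster)
               for m in candidate_medians(cluster))


def optimal_median(cluster: Cluster) -> Segment:
    """Median with minimal summed induced BP distance to the cluster (its contig)."""
    return min(candidate_medians(cluster),
               key=lambda m: sum(induced_bp(m, s) for s in cluster))


def assemble(pairs: List[Tuple[Segment, Segment]], t: int, k: int) -> List[Segment]:
    clusters: List[Cluster] = [list(pair) for pair in pairs]  # one cluster per homologous pair
    while True:
        best = None  # (shared marker count, i, j) of the best qualifying join so far
        for i, j in combinations(range(len(clusters)), 2):
            shared = len(markers(clusters[i]) & markers(clusters[j]))
            if shared < t or (best is not None and shared <= best[0]):
                continue  # too few shared markers, or cannot beat the current best join
            if has_median_within(clusters[i] + clusters[j], k):
                best = (shared, i, j)
        if best is None:
            break  # no pair of clusters satisfies both conditions
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return [optimal_median(c) for c in clusters]


pairs = [((1, 2, 3), (1, 2, 3)), ((3, 4, 5), (3, 5, 4)), ((7, 8), (8, 7))]
print(assemble(pairs, t=1, k=1))  # [(7, 8), (1, 2, 3, 4, 5)]
```

Note the two different objectives: the join test asks only that some median lie within distance k of every segment, while the contig reported for a finished cluster minimizes the summed induced distance.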
