ABSTRACT
We have conducted an extensive computational analysis of the Culex quinquefasciatus genome to find and annotate a specific subfamily of transposable elements (TEs), the Class-I non-long terminal repeat retrotransposons (non-LTRs), by building a semi-automated pipeline [9].
Initially, we conducted BLAST searches to find sequences similar to known non-LTRs, using amino acid sequences of the reverse transcriptase (RT) of known non-LTRs as the starting queries [5,6]. The resulting BLAST hits (DNA sequences) were then combined and extracted using Perl scripts to obtain Culex non-LTR candidates. These sequences were then assembled using the SeqMan module of DNASTAR, manually trimmed, adjusted, and annotated.
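The hit-combining step can be sketched as follows. This is a minimal Python illustration, not the Perl scripts used in the pipeline. It assumes BLAST output in the standard 12-column tabular format, and the function names and the `max_gap` value are hypothetical choices made for the example.

```python
from collections import defaultdict


def parse_blast_tabular(lines):
    """Yield (subject_id, start, end, strand) from 12-column tabular BLAST lines.

    Assumed column order: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        sseqid = fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        # Minus-strand hits are reported with sstart > send.
        strand = "+" if sstart <= send else "-"
        yield sseqid, min(sstart, send), max(sstart, send), strand


def merge_hits(hits, max_gap=100):
    """Combine hits on the same contig and strand that overlap or lie
    within max_gap bases of each other (max_gap is a hypothetical value)."""
    by_contig = defaultdict(list)
    for sseqid, start, end, strand in hits:
        by_contig[(sseqid, strand)].append((start, end))

    merged = []
    for (sseqid, strand), intervals in sorted(by_contig.items()):
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end + max_gap:
                cur_end = max(cur_end, end)  # extend the current region
            else:
                merged.append((sseqid, strand, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((sseqid, strand, cur_start, cur_end))
    return merged


def extract_region(sequence, start, end):
    """Return the 1-based inclusive region [start, end] of a contig sequence."""
    return sequence[start - 1:end]
```

Each merged region is a candidate to be extracted from the contig sequence and passed on to assembly. Minus-strand regions would still need to be reverse-complemented, which this sketch leaves out.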
Annotation was done in two steps: (I) we compared all the sequences by BLAST against the nr database (NCBI) and identified some of the Culex non-LTR consensus sequences as belonging to known non-LTR families; (II) we conducted phylogenetic analysis on all Culex non-LTRs, which allowed us to further annotate our consensus sequences. Some of the elements were too degraded to be assigned to a specific clade.
Upon completing the preliminary annotation, we determined the copy number of each element in the genome within the threshold. A comparison of Aedes aegypti, Anopheles gambiae, and Culex quinquefasciatus showed different non-LTR clade compositions, suggesting different evolutionary histories for these species.
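Counting copies within a threshold could look roughly like the sketch below. The poster does not state which threshold was used, so the identity and coverage cutoffs here are hypothetical placeholders, as are the function name and the input layout.

```python
from collections import Counter


def count_copies(hits, consensus_lengths, min_identity=80.0, min_coverage=0.5):
    """Count genomic copies per element.

    hits: iterable of (element_name, percent_identity, alignment_length).
    consensus_lengths: dict mapping element_name to consensus length in bp.
    min_identity and min_coverage are hypothetical thresholds, not values
    taken from the original analysis.
    """
    counts = Counter()
    for element, pident, aln_len in hits:
        coverage = aln_len / consensus_lengths[element]
        if pident >= min_identity and coverage >= min_coverage:
            counts[element] += 1
    return counts
```

A hit is counted only when it passes both cutoffs, so short fragments and highly diverged matches are excluded from the copy number.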
Phylogram produced with the PhyloDraw [7] visualization tool, using as input a multiple alignment file created by ClustalX [8] (neighbor-joining algorithm).
Comparison of the number of elements per clade in the three mosquito genomes.
Culex quinquefasciatus is an important vector of human pathogens in the United States and worldwide, including those that cause West Nile encephalitis and lymphatic filariasis. Genomic analysis can help us better understand how this mosquito adapts to various climatic environments and to the parasite. A significant part of any eukaryotic genome consists of various types of repeats, including DNA and RNA transposable elements (TEs). The repetitive nature and mobile activity of TEs make genomes difficult to assemble. Annotating and characterizing TEs is therefore one of the essential tasks of any genome project. The recent Culex quinquefasciatus genome sequencing project gave us an opportunity to identify and annotate non-LTR retrotransposons.
Comparative contribution of TEs to mosquito genome sizes.
Bioinformatic detection and annotation of non-LTR retrotransposons in the Culex quinquefasciatus mosquito genome.
Maria F. Unger×, Ryan C. Kennedy*, Jenica L. Abrudan×, Peter Arensburger¤, Greg Madey*×, Frank H. Collins×*
×Eck Institute of Global Health, University of Notre Dame
*Department of Computer Science & Engineering, University of Notre Dame
¤Department of Entomology, University of California, Riverside
When only protein sequences were used as starting queries in our semi-automated pipeline, a large portion of the elements (those for which protein sequences were not available from TEfam [6] or Repbase [5]) was overlooked. We fixed this problem by adding DNA sequences as BLAST queries to the pipeline, which allowed us to identify and classify most of the overlooked elements.
There is a rich diversity of non-LTRs in the Culex quinquefasciatus genome. Although there is no evidence of members of the Outcast and R4 clades in the C. quinquefasciatus genome, it contains CM-gag, a unique Gag-only non-LTR retrotransposon, and LOA (which is not present in A. gambiae).
Non-LTR clades vary widely in copy number. Jockey, CR1, and CM-gag have thousands of copies, while I, L2, LOA, Loner, and R1 have only hundreds. Jockey contributes more to the genome size than any other non-LTR clade, 1.76% of the genome. In total, non-LTRs make up 4.8% of the Culex genome.
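A clade's share of the genome is the number of bases its copies cover divided by the genome size. The sketch below shows one way to compute this so that overlapping copies are not counted twice. The function name and the toy coordinates in the usage note are illustrative assumptions, not data from this study.

```python
from collections import defaultdict


def genome_percentage(copies, genome_size):
    """Percentage of the genome covered by a set of element copies.

    copies: iterable of (contig, start, end), 1-based inclusive coordinates.
    genome_size: total assembly length in bp.
    Overlapping copies on the same contig are merged before summing.
    """
    by_contig = defaultdict(list)
    for contig, start, end in copies:
        by_contig[contig].append((start, end))

    covered = 0
    for intervals in by_contig.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                cur_end = max(cur_end, end)  # overlapping copy, extend
            else:
                covered += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        covered += cur_end - cur_start + 1
    return 100.0 * covered / genome_size
```

For example, with toy copies `[("c1", 1, 50), ("c1", 41, 60), ("c2", 1, 40)]` in a hypothetical 1,000 bp genome, 100 bases are covered and the function returns 10.0.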
Culex quinquefasciatus non-LTR:
Using a semi-automated pipeline approach, we identified 9 non-LTR clades in the Culex quinquefasciatus genome. Phylogenetic analysis classifies representatives of the C. quinquefasciatus non-LTR clades in the same way as the semi-automated pipeline does, which supports the correctness of the pipeline.
The L1, CR1, and Jockey clades have a wide variety of elements and a high copy number in the genome, which suggests recent non-LTR activity.
Identify all possible protein sequences of the elements and conduct phylogenetic analysis.
Identify, if possible, active non-LTRs in the Culex quinquefasciatus genome.
1. R. Holt, et al., The Genome Sequence of the Malaria Mosquito Anopheles gambiae. Science, 298:129-149, 2002.
2. J. Biedler, Z. Tu, Non-LTR Retrotransposons in the African Malaria Mosquito, Anopheles gambiae: Unprecedented Diversity and Evidence of Recent Activity. Molecular Biology and Evolution, 20(11):1811-1825, 2003.
3. V. Nene, et al., Genome Sequence of Aedes aegypti, a Major Arbovirus Vector. Science, 316:1718, 2007.
4. D. Lawson, et al., VectorBase: a data resource for invertebrate vector genomics. Nucleic Acids Research, 37:D583-D587, 2009.
5. Repbase. http://www.girinst.org/repbase/index.html.
6. TEfam. http://tefam.biochem.vt.edu.
7. PhyloDraw. http://pearl.cs.pusan.ac.kr/phylodraw/#test
8. ClustalX2: clustalx-2.0.10-win
9. VectorBase. http://vectorbase.org
We thank James Biedler, Vladimir Kapitonov, Scott Christley, Karine Mouline, members of the Frank H. Collins and Nora J. Besansky labs, and VectorBase for helpful discussions and support.
This work was supported by the US National Institute of Allergy and Infectious Diseases (NIAID) contract HHSN266200400039C.
Fig. 1. Phylogenetic analysis classifies Culex quinquefasciatus non-LTR clades in the same way as the semi-automated pipeline does. (C. quinquefasciatus non-LTRs are indicated as light green leaves.)