Analyses of ORFans in microbial and viral genomes

Analyses of ORFans in microbial and viral genomes. Journal club presentation on Mar. 14 by Albert Yu. ORFan definition: an ORF with no detectable sequence similarity to other ORFs in the database considered. Nearly all genomes have ORFans (df %).

Presentation Transcript


  1. Analyses of ORFans in microbial and viral genomes. Journal club presentation on Mar. 14, Albert Yu.

  2. ORFan. Definition: an ORF with no detectable sequence similarity to other ORFs in the database considered. Nearly all genomes have ORFans (df %). The more genomes are sequenced, the more ORFans are found. Most are annotated as hypothetical proteins of unknown function (no exp.).

  3. ORFan (continued). More data suggest real, functional proteins: 3D structure; conserved in closely related species (Ka/Ks). Origin of ORFans? Laterally transferred viral genes (especially from phages)? Diagram labels: viral genome, microbial genome.

  4. Diagram: viral genome, microbial genome.

  5. Question: the origin of ORFans. Hypothesis to test: ORFans have been acquired through lateral gene transfer from viruses. Approach: search for homologs of these microbial ORFans in the virus sequence database.

  6. Genome-wide quantitative study • BLASTP • 277 microbial genomes • 1456 viral genomes • H(g): the number of genomes having at least one homolog of ORFan g • U(g): uniqueness, the genomic distance between the genomes with ORFan g
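As a rough illustration of H(g) from slide 6, the sketch below counts the distinct genomes that hold at least one BLASTP hit for each query ORF. The hit-tuple layout and the name `h_values` are assumptions made for this example, not part of the presentation. The slides do not state whether the ORF's own genome is counted; slide 7 uses H=1 for ORFans with no homolog elsewhere, so the sketch counts it.

```python
from collections import defaultdict


def h_values(hits):
    """Return {orf_id: H}, the number of distinct genomes with a homolog.

    `hits` is an iterable of (query_orf, query_genome, subject_genome)
    tuples: a hypothetical layout for BLASTP hits that already passed
    whatever significance cutoff is in use.
    """
    genomes = defaultdict(set)
    for query_orf, query_genome, subject_genome in hits:
        # Assumption: the ORF's own genome counts, so a self-hit gives H = 1.
        genomes[query_orf].add(query_genome)
        genomes[query_orf].add(subject_genome)
    return {orf: len(found) for orf, found in genomes.items()}


# Toy data, invented for illustration only.
hits = [
    ("orfA", "genome1", "genome1"),  # self-hit only
    ("orfB", "genome1", "genome2"),
    ("orfB", "genome1", "genome3"),
]
h = h_values(hits)
print(h)
```

U(g) is left out on purpose: the slides name it a genomic distance but do not say how that distance is computed.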

  7. Classification of ORFans • Singleton: no homolog anywhere (H=1, BLASTP=1) • Paralogous: homologs only in the same genome (H=1, BLASTP>1) • Orthologous: homologs within very closely related microbial genomes (H>1, U <= 0.1, cutoff chosen by observation)
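The three rules on slide 7 can be written as a small decision function. This is a minimal sketch under two assumptions: that "BLASTP" on the slide means the number of BLASTP hits including the ORF's self-hit, and that an ORF with H>1 and U above the cutoff is not an ORFan. The function and parameter names are hypothetical.

```python
def classify_orfan(h, blastp_hits, u=None, u_cutoff=0.1):
    """Classify one ORF using the thresholds listed on slide 7.

    h           -- H value: number of genomes with at least one homolog
    blastp_hits -- number of BLASTP hits, self-hit included (assumption)
    u           -- U value (genomic distance); only consulted when h > 1
    u_cutoff    -- 0.1, the cutoff quoted on the slide
    """
    if h == 1:
        # Homologs, if any, are confined to the ORF's own genome.
        return "singleton" if blastp_hits == 1 else "paralogous"
    if u is not None and u <= u_cutoff:
        # Homologs exist elsewhere, but only in very closely related genomes.
        return "orthologous"
    # Assumption: anything else is treated as a non-ORFan.
    return "non-ORFan"


print(classify_orfan(h=1, blastp_hits=1))
print(classify_orfan(h=1, blastp_hits=3))
print(classify_orfan(h=4, blastp_hits=5, u=0.05))
print(classify_orfan(h=4, blastp_hits=5, u=0.5))
```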

  8. The U-value for all ORFs in prokaryote genomes. In total: ORFs: 818906; ORFans: 110186; S (singleton): 64324 (7.8%); P (paralogous): 10419 (1.3%); O (orthologous): 35443 (4.3%). Plot labels: S or P; 0.64; O.
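The counts on slide 8 can be checked with a few lines of arithmetic, reading S, P and O as the singleton, paralogous and orthologous classes from slide 7 and taking each percentage relative to all 818906 ORFs (an interpretation, though the numbers support it). The three classes add up to the quoted ORFan total, and the computed shares agree with the slide to within about 0.1 percentage point.

```python
total_orfs = 818906
counts = {"singleton": 64324, "paralogous": 10419, "orthologous": 35443}

# The three classes should account for every ORFan quoted on the slide.
total_orfans = sum(counts.values())

# Share of each class relative to all ORFs, in percent (unrounded).
shares = {name: 100 * n / total_orfs for name, n in counts.items()}

print("ORFans in total:", total_orfans)
for name, share in shares.items():
    print(f"{name}: {share:.2f}% of all ORFs")
print(f"all ORFans: {100 * total_orfans / total_orfs:.2f}% of all ORFs")
```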

  9. ORFans-VH% (OVH): % of ORFans having homologs in viruses (0% ~ 63.8%) • Non-ORFans-VH% (NOVH): % of non-ORFans having homologs in viruses (4.1% ~ 18.2%) • The strength of the hypothesis = the difference between these two VH% values
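To make the two percentages on slide 9 concrete, here is a minimal sketch that computes OVH and NOVH for one genome from per-ORF flags. The input layout (pairs of booleans) and the function names are assumptions for illustration, and the toy data is invented.

```python
def vh_percent(flags):
    """Percentage of True values in `flags`; 0.0 for an empty input."""
    flags = list(flags)
    if not flags:
        return 0.0
    return 100.0 * sum(flags) / len(flags)


def ovh_novh(orfs):
    """Return (OVH, NOVH) for one genome.

    `orfs` is an iterable of (is_orfan, has_viral_homolog) boolean pairs,
    a hypothetical per-ORF summary of the BLASTP results.
    """
    orfs = list(orfs)
    ovh = vh_percent(viral for is_orfan, viral in orfs if is_orfan)
    novh = vh_percent(viral for is_orfan, viral in orfs if not is_orfan)
    return ovh, novh


# Toy genome: 4 ORFans (1 with a viral homolog), 5 non-ORFans (2 with one).
toy = [(True, True)] + [(True, False)] * 3 + [(False, True)] * 2 + [(False, False)] * 3
ovh, novh = ovh_novh(toy)
print(f"OVH = {ovh:.1f}%, NOVH = {novh:.1f}%, difference = {ovh - novh:.1f}")
```

Under the hypothesis being tested, OVH would be expected to exceed NOVH; a negative difference, as in this toy genome, would count against it.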

  10. Percentages of microbial ORFs with homologs in viruses. Figure labels: red: OVH; blue: NOVH; 24 phylogenetic clades; Bacteria; Gamma-proteobacteria; Firmicutes; Archaea.

  11. The average % of OVH and NOVH in various groups: 6.6% vs 0.8% (148); 10% vs 9% (63); 8.5% vs 2.7% (66)

  12. Conclusion • Most OVH << NOVH: current evidence supporting the hypothesis is weak • Firmicutes and Gamma-proteobacteria have the highest number of homologs in viruses (the viral database is biased) • Viral database bias: 1456 viruses, of which 280 are phages (109 Gamma; 102 Firmicutes; 69 others) • Sampling?

  13. Diagram: viral genome, microbial genome.

  14. 277 microbial genomes • 1456 viruses; All-virus-DB: 43566 ORFs • 280 phages (20%); Phage-DB: 18368 ORFs (42%) • ORFans, all-virus: 13078 (30%) (vs. all-virus-DB); 8200 (vs. all nr, env-nr) • ORFans, all-phage: 6765 (vs. all-virus-DB); 7047 (vs. phage-DB)

  15. Some characteristics of ORFans • Bacterial ORFans are shorter than non-ORFans on average • Bacterial ORFans have significantly lower GC3 content than non-ORFans

  16. The length of viral ORFans and non-ORFans. Length: non-ORFans > ORFans

  17. Length: ORFans < non-ORFans GC3%: ORFans < non-ORFans

  18. The number of ORFs per genome in 1456 viruses Focusing on phage: higher %

  19. The growth in the number of phage ORFans (consistent). Keeps increasing; 38.4%; drop to 0?

  20. Each microbial species is a host for at least 10 phage species, so phage diversity is at least 10 times higher than microbial diversity • Only 280 phage genomes in the database (low phage sampling)

  21. Virus sampling bias between and within groups. Figure label: fewer than 5 phages.

  22. The H-value percentages for all phage ORFs and prokaryotic ORFs. Figure labels: prokaryotes; phages; 38.4% ORFans; 9.1% ORFans; 32.4% ortho; 11.3% ortho.

  23. The H-value percentages of phage ORFs

  24. 4397 (61.5%) / 7150 (63.8%) / 11212 (prophage / prokaryotic homologs / phage non-ORFans) • 589 (44.7%) / 1317 (18.7%) / 7047 (prophage / prokaryotic homologs / phage ORFans) • 4987 (58.9%) / 8467 (46.4%) / 18248 (prophage / prokaryotic homologs / phage ORFs)
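The slide does not say what each percentage on slide 24 is relative to. The arithmetic below suggests one consistent reading, offered as an inference rather than the author's stated definition: the first percentage is prophage homologs as a share of prokaryotic homologs, and the second is prokaryotic homologs as a share of the row total.

```python
# (prophage homologs, prokaryotic homologs, total ORFs in the row), from slide 24.
rows = {
    "phage non-ORFans": (4397, 7150, 11212),
    "phage ORFans": (589, 1317, 7047),
    "all phage ORFs": (4987, 8467, 18248),
}


def pct(part, whole):
    """Percentage rounded to one decimal, as printed on the slide."""
    return round(100 * part / whole, 1)


results = {}
for label, (prophage, prokaryotic, total) in rows.items():
    # Inferred reading: prophage / prokaryotic, then prokaryotic / total.
    results[label] = (pct(prophage, prokaryotic), pct(prokaryotic, total))
    print(label, results[label])
```

Under this reading all six percentages on the slide are reproduced. The third row is not an exact sum of the first two as transcribed (4397 + 589 = 4986 and 11212 + 7047 = 18259), so the totals are used here as given.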
