
Grid enabling phylogenetic inference on virus sequences using BEAST - a possibility?

EUAsiaGrid Workshop, 4-6 May 2010. Chanditha Hapuarachchi, Environmental Health Institute, National Environment Agency.


Presentation Transcript


  1. Grid enabling phylogenetic inference on virus sequences using BEAST - a possibility? EUAsiaGrid Workshop 4-6 May 2010 Chanditha Hapuarachchi Environmental Health Institute National Environment Agency

  2. Outline • Work scope • Analytical approach • Current limitations • What is expected from Grid-enabling?

  3. Work scope • Understanding the molecular epidemiology of vector-borne infectious diseases in Singapore, with a view to using the information in disease control operations • Objectives • To determine the routes of pathogen migration (mainly Dengue and Chikungunya viruses) • To understand the evolutionary dynamics of pathogens • To understand the outbreak potential of pathogens within the country

  4. What phylogenetic inferences are made? Molecular epidemiology of DENV & CHIKV: • Phylogenetic relationships (trees) (BEAST, MEGA) • Spatio-temporal distribution of viruses (BEAST, NETWORK) • Evolutionary dynamics (evolutionary rates, selection pressure, recombination, etc.) (BEAST, HyPhy, etc.) • Population dynamics (Bayesian skyline plots) (BEAST) • BEAST is a multi-task software package

  5. CHIKV whole genome tree with spatial model (Figure: tree with locations India, Sri Lanka, Singapore, Malaysia, Indian Ocean Islands, Kenya; axis: Time (yrs))

  6. Spatial distribution of different lineages of DENV in Singapore

  7. However... BEAST analysis is time-consuming and requires substantial computing power

  8. Limitations of the BEAST approach? • Size of dataset: length of sequences, number of sequences • E.g., analyzing a dataset of ~90 CHIKV whole genomes (11.8 kb) takes several days, depending on the available computing power

  9. Limitations… • Analytical parameters • A basic analysis takes ~0.3 hrs per million states (Core 2 Duo, 2.1 GHz, 4 GB RAM, >50% CPU) • A general run involves a chain of at least 100 million states (≈30 hrs) • The duration increases substantially with changing parameters • Incorporating a spatial model (7 states) alone increases the runtime to ~0.4 hrs per million states • The ultimate duration depends on Effective Sample Size (ESS) values (general requirement: >200)
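
The runtime figures on this slide follow from simple arithmetic: chain length in millions of states multiplied by the measured hours per million states. A minimal sketch, assuming the per-million rates quoted on the slide hold across the whole run (the function and constant names are illustrative, not part of BEAST):

```python
# Rates quoted on the slide (Core 2 Duo, 2.1 GHz, 4 GB RAM).
HOURS_PER_MILLION_BASIC = 0.3    # basic analysis
HOURS_PER_MILLION_SPATIAL = 0.4  # with the 7-state spatial model


def estimated_runtime_hours(chain_length_states, hours_per_million_states):
    """Estimate wall-clock hours for a chain of the given length.

    Assumes the cost per state is constant over the run, which is a
    simplification; real runs vary with the data and the model.
    """
    return chain_length_states / 1_000_000 * hours_per_million_states


chain_length = 100_000_000  # "at least 100 million" states

basic = estimated_runtime_hours(chain_length, HOURS_PER_MILLION_BASIC)
spatial = estimated_runtime_hours(chain_length, HOURS_PER_MILLION_SPATIAL)

print(f"basic analysis: about {basic:.0f} hrs")
print(f"with spatial model: about {spatial:.0f} hrs")
```

At these rates, adding the spatial model to a 100-million-state chain moves the estimate from about 30 hrs to about 40 hrs, before any extension needed to reach the ESS requirement.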

  10. BEAST Tracer output window
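
Tracer reports an ESS for each logged parameter. As a rough illustration of what that number measures, the sketch below estimates ESS from a list of sampled values using the common formula N / (1 + 2 Σ ρ_k), summing autocorrelations until the first non-positive one. This is an assumption-laden teaching example and not Tracer's own algorithm; the function name and truncation rule are choices made here.

```python
ESS_THRESHOLD = 200  # general requirement quoted in the talk


def effective_sample_size(samples):
    """Estimate ESS as n / (1 + 2 * sum of positive-lag autocorrelations).

    The sum is truncated at the first non-positive autocorrelation,
    a simple heuristic; other tools may use different rules.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    var = sum((v - mean) ** 2 for v in samples) / n
    if var == 0:
        raise ValueError("ESS is undefined for a constant trace")

    rho_sum = 0.0
    for lag in range(1, n):
        cov = sum(
            (samples[i] - mean) * (samples[i + lag] - mean)
            for i in range(n - lag)
        ) / n
        rho = cov / var
        if rho <= 0:
            break  # stop at the first non-positive autocorrelation
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


def is_adequately_sampled(samples, threshold=ESS_THRESHOLD):
    """Return True if the estimated ESS exceeds the threshold."""
    return effective_sample_size(samples) > threshold
```

A strongly autocorrelated trace yields an ESS far below the number of logged samples, which is why a chain may need to run well beyond its planned length before every parameter passes the >200 requirement.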

  11. Limitations… • Number of parallel runs & users • More runs & users → lower analytical efficiency • A single run takes up >50% of CPU power

  12. Why Grid-enable BEAST? • Enables efficient data analysis: parallel runs, multiple users, expanded datasets • Enhances data interpretation
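
One way to picture the "parallel runs" item is as a set of independent replicate chains of the same analysis, each started with a different random seed and dispatched to a separate worker. The sketch below only builds the per-run command lines; it does not submit anything. It assumes a `beast` executable on the worker's path and uses BEAST's `-seed` option; the XML file name, the `run_N` directory layout, and the function name are hypothetical, and the grid submission step is left out because the talk does not name a middleware.

```python
from pathlib import Path


def build_replicate_commands(xml_file, n_runs, base_seed=1, beast_cmd="beast"):
    """Return (working_directory, command) pairs for independent runs.

    Each run gets its own directory (so log files do not collide) and
    its own seed (so the chains are independent replicates).
    """
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")
    xml_path = Path(xml_file).resolve()  # absolute, usable from any run dir
    jobs = []
    for i in range(n_runs):
        workdir = Path(f"run_{i + 1}")   # hypothetical layout
        seed = base_seed + i
        command = [beast_cmd, "-seed", str(seed), str(xml_path)]
        jobs.append((workdir, command))
    return jobs


# Example: four replicate chains of a hypothetical analysis file.
for workdir, command in build_replicate_commands("chikv_spatial.xml", 4):
    print(workdir, " ".join(command))
```

Each pair could then be handed to whatever scheduler or grid job description is in use, so that four chains occupy four workers instead of competing for one desktop CPU.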

  13. Can Grid-enabling help to improve the existing performance?
