
Developing a Bioinformatics Grid-aware Client



Presentation Transcript


  1. Developing a Bioinformatics Grid-aware Client • Steven Stones-Havas (1) and Allen Rodrigo (2) • (1) Biomatters Ltd and (2) The Bioinformatics Institute (New Zealand)

  2. Talk Overview • Overview of BeSTGRID and NZBioGrid • Motivation and design philosophy • The Geneious platform • Workflows • Future developments

  3. KAREN and BeSTGRID

  4. Kiwi Advanced Research and Education Network (KAREN) • Established by REANNZ Ltd • “REANNZ (Research and Education Advanced Network New Zealand Ltd) is the Crown-owned company set up to establish, own and operate a high-speed telecommunications network for the research and education sectors.” • www.karen.net.nz • High-speed connection (up to 10 Gbits/sec) between NZ universities and CRIs • High-speed connection to Australia (up to XX Mbits/sec) and the rest of the world

  5. Broadband enabled Science and Technology GRID (BeSTGRID) • “BeSTGRID started in 2006 as a Tertiary Education Commission Innovation and Development Fund Project 2006-2008, focused on how to make eResearch work, to create a fully-functional eResearch ecosystem for New Zealand. BeSTGRID delivered mechanisms, methods and tools that facilitate collaboration on shared information, sharing of computational resources and online visualization of instruments and experiments.” www.bestgrid.org • Consists of • Data Grid • Collaboration Grid • Computational Grid

  6. NZ BioGrid

  7. NZBioGrid • To implement a grid-enabled platform for biological science researchers that will deliver access to biologically relevant databases and applications. • Funded by the Ministry of Research, Science and Technology and TelstraClear Ltd, through REANNZ. • Project started in May 2007. • The Bioinformatics Institute (New Zealand), University of Auckland, in partnership with: • Biomatters Ltd • NetValue Ltd

  8. So what exactly is “Bioinformatics”? • The computational organization and analysis of biological information • Bioinformatics is an interdisciplinary science. It integrates: • Biology • Computer Science • Mathematics • Statistics

  9. 2002 NCBI, National Library of Medicine, NIH www.ncbi.nlm.nih.gov “There are approximately 65,369,091,950 bases in 61,132,599 sequence records in the traditional GenBank divisions and 80,369,977,826 bases in 17,960,667 sequence records in the WGS division as of August 2006.”

  10. Sub-$10,000 Genome Sequencing and Population Genomics

  11. New Sequencing Technologies • Roche, Illumina, and Applied Biosystems have released next-generation sequencers that produce large quantities of sequence information. • Millions of shotgun fragments, each 25 to 250 nt long • 10^6 to 10^9 nt in a single run (within days/weeks) • Other technologies will follow.

  12. Databases • Nucleotide Sequence Databases • RNA sequence databases • Protein sequence databases • Structure Databases • Genomics Databases (non-vertebrate) • Metabolic and Signaling Pathways • Human and other Vertebrate Genomes • Human Genes and Diseases • Microarray Data and other Gene Expression Databases • Proteomics Resources • Other Molecular Biology Databases • Organelle databases • Plant databases • Immunological databases • Total: 1062 • Source: NAR Database Categories List

  13. www.biomirror.org.nz

  14. NZBioGrid -- Motivation • 21st century biology relies on bioinformatics. • Many biologists are mathematically or computationally challenged, but • They need access to data • They need access to tools • They have solved these problems in an ad hoc manner • Many computational biologists and bioinformaticists (who are not so challenged) write programs that • Are difficult to run (e.g., command line input) • Have different input/output formats • Focus on one or a few analyses • Many computational tasks take a great deal of time to execute.

  15. NZBioGrid – Design Philosophy • To develop a tool that • Is easy to use • Can sit on a desktop • Is available for different OSs (principally, Windows and MacOS) • Has a GUI • Can integrate I/O across different analyses • Can be extended as more analyses/software become available • Can use the resources on BeSTGRID to relieve computational burden on individual computers • Requires minimal “culture change”: focuses on OUTCOMES as well as ANALYSES

  16. NZBioGrid and Geneious • The Bioinformatics Institute has teamed up with Biomatters Ltd to deliver a grid-enabled platform for computational analysis. • The platform is built on Biomatters’ existing product, Geneious. • Written in Java • E-mail-like interface • Standard bioinformatics tools • Consistent GUI • API permits plug-ins • Runs on a desktop, but is internet-aware

  17. Comprehensive toolset • Sequence and structure alignment • Primer design and restriction analysis • Phylogenetic and taxonomic tree building • Contig assembly • Publication searching • Automatic search agents • Collaboration

  18. Grid-enabled Geneious client -- Development • Grid Plug-in • Plug-ins for existing programs on BeSTGRID • Plug-ins for generic command-line programs on BeSTGRID • Workflows

  19. Getting Started • You need a security certificate • GRIX (http://grix.vpac.org/downloads/)

  20. Software Available with Native Plug-ins • Now available • ClustalW • MrBayes • LAMARC • PAUP* • Soon to be added • BLAST • BEAST

  21. Command-line Programs: Command Line Interface Creator (CLIC) • As more programs are added, there needs to be a facility that permits these programs to be integrated into the platform. • CLIC is a plug-in that permits a user to specify command line syntax and switches for any program that permits command-line input • CLIC generates an XML script that Geneious uses to create a dialog box.
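
The CLIC idea on slide 21 can be illustrated with a minimal sketch. The talk does not show the XML format CLIC actually produces, so the element names (`program`, `option`), the attributes, and the example program `myaligner` below are all hypothetical. The sketch only shows the general pattern: describe a program's switches once as XML, then combine that description with values a user would enter in a generated dialog to form the command line.

```python
import shlex
import xml.etree.ElementTree as ET


def build_descriptor(program, options):
    """Return an XML string describing a program and its switches.

    The schema here is an assumption made for illustration; it is not
    the format used by CLIC or Geneious.
    """
    root = ET.Element("program", name=program)
    for opt in options:
        ET.SubElement(
            root,
            "option",
            flag=opt["flag"],
            label=opt["label"],
            default=str(opt["default"]),
        )
    return ET.tostring(root, encoding="unicode")


def build_command(descriptor_xml, user_values):
    """Combine the descriptor with dialog values into an argument list.

    Switches the user did not set fall back to the declared default.
    """
    root = ET.fromstring(descriptor_xml)
    argv = [root.get("name")]
    for opt in root.findall("option"):
        flag = opt.get("flag")
        value = user_values.get(flag, opt.get("default"))
        argv.extend([flag, str(value)])
    return argv


# "myaligner" and its switches are made-up placeholders.
descriptor = build_descriptor(
    "myaligner",
    [
        {"flag": "--gap-open", "label": "Gap open penalty", "default": 10},
        {"flag": "--output", "label": "Output file", "default": "out.aln"},
    ],
)
command = build_command(descriptor, {"--gap-open": 12})
print(shlex.join(command))
```

Keeping the description in data rather than code is what would let new programs be added without writing a dedicated plug-in for each one; the `label` attribute is where a dialog generator would take its field captions from.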

  22. Workflows • A great deal of work has been done on workflows • “Support basic research in computer science to create a science of workflows.” Recommendation by the NSF-sponsored Workshop on the Challenges of Scientific Workflows, May 06 • In biology, there are two broad types of workflows: • Repetitive application of routine tasks • Well-defined, generally accepted workflows • “Program-splicing” • Permits different combinations of programs • Need to allow for user to interact with workflow

  23. Opportunity for user to review alignment with summary diagnostics. • Development of alignment quality scores that permit automatic progression through workflow.
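
One way to picture an alignment quality score that gates a workflow is sketched below. The talk does not say which score is planned, so the measure used here (mean column identity, scaled down by the overall gap fraction) and the threshold of 0.8 are assumptions chosen only to make the example concrete.

```python
def column_identity(column):
    """Fraction of non-gap residues that match the column's most common residue."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    most_common = max(set(residues), key=residues.count)
    return residues.count(most_common) / len(residues)


def alignment_quality(sequences):
    """Illustrative score in [0, 1]: mean column identity times (1 - gap fraction)."""
    if not sequences or not sequences[0]:
        raise ValueError("alignment must contain at least one non-empty sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(*sequences))
    mean_identity = sum(column_identity(col) for col in columns) / length
    gap_fraction = sum(s.count("-") for s in sequences) / (length * len(sequences))
    return mean_identity * (1.0 - gap_fraction)


def next_step(sequences, threshold=0.8):
    """Proceed automatically on a good score, otherwise hand back to the user."""
    return "continue" if alignment_quality(sequences) >= threshold else "review"


print(next_step(["ACGT", "ACGT", "ACGT"]))
print(next_step(["ACGT", "A--T", "TCGA"]))
```

The second alignment falls below the threshold, so the workflow would pause for user review rather than passing a poor alignment to the next program.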

  24. Program-splicing • We want to use an outcome-driven approach to workflow development. • We want the users to tell us what type of data they have, and what they want to get out of the data. • Different from analysis-driven approach. • Plan to create “The Advisor” • A simple expert system that will design workflows based on user’s input. • Input delivered using questions/answers about outcomes. • Output is a workflow.
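
The outcome-driven approach behind "The Advisor" can be sketched as a small rule table. The Advisor is described only as a plan, so everything below is a hypothetical illustration: the question wording, the rule keys, and the pairing of outcomes with tools are assumptions. The tool names themselves (ClustalW, MrBayes, BLAST) come from slide 20, where BLAST is listed as still to be added.

```python
# Hypothetical rules: (type of data, desired outcome) -> ordered workflow steps.
RULES = {
    ("unaligned sequences", "phylogenetic tree"): [
        "ClustalW",
        "review alignment",
        "MrBayes",
    ],
    ("unaligned sequences", "alignment"): ["ClustalW", "review alignment"],
    ("aligned sequences", "phylogenetic tree"): ["MrBayes"],
    ("single sequence", "similar sequences"): ["BLAST"],
}


def advise(data_type, outcome):
    """Return a workflow for the user's answers, or raise if no rule applies."""
    key = (data_type.strip().lower(), outcome.strip().lower())
    workflow = RULES.get(key)
    if workflow is None:
        raise LookupError(f"no workflow known for {key}")
    return list(workflow)


# Answers to the two questions: what data do you have, and what do you want?
print(" -> ".join(advise("Unaligned sequences", "Phylogenetic tree")))
```

The user never names a program; they state their data and the result they want, and the rule table supplies the analyses. A real expert system would need richer questions and rules than a flat lookup.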

  25. Future plans • More plug-ins • More workflows • Work with NetValue Ltd • Delivery of rapid database searching based on their SlimSearch platform (several orders of magnitude faster than BLAST) • The Advisor • Publicity blitz

  26. What we have learnt so far • User uptake depends on: • Ease of use • Ease of access • Reliability of grid services • Publicity • Need two-tier platform: • For biologists (focus on outcomes) • For bioinformaticists/computational biologists (focus on analyses)

  27. Acknowledgements • Ministry of Research, Science and Technology • TelstraClear • REANNZ • The NZ BioGrid Design Team: David Bryant Alexei Drummond Stephane Guindon Howard Ross www.bioinformatics.org.nz
