
High performance bioinformatics

Presentation Transcript


  1. Group May09-06: Bryan McCoy, Kinit Patel, Tyson Williams • High Performance Bioinformatics

  2. Problem/Need Statement • Current ways to solve bioinformatics problems are either slow or very expensive. • There is a need for a lower-cost computer system that still delivers high performance on bioinformatics problems.

  3. What is Bioinformatics? • Genetic sequencing. • Massive amounts of data. • Simple operations, but very many of them. • Well suited to distributed computing.

  4. Proposed Solution • Use a cluster of PlayStation 3s (PS3s), each with an embedded Cell processor.

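  5. Cell Broadband Engine • Has 1 central PowerPC Processor Element (PPE). • Has 8 surrounding Synergistic Processor Elements (SPEs). • The PPE and the SPEs are connected via the Element Interconnect Bus (EIB). • On the PS3, only 6 of the SPEs are available to Linux applications.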

  6. Cell Broadband Engine

  7. Functional Requirements • FR1. The ported applications shall run on the Cell B.E. • FR2. The results returned shall be the same as those of the original program. • FR3. The applications shall report their runtime. • FR4. The applications shall execute in parallel on multiple Cell B.E.s.

  8. Non-Functional Requirements • NF1. All Cell systems shall run the Linux OS. • NF2. The ported applications shall run faster than the original applications. • NF3. The ported applications shall be written in the C language.

  9. Operating Environment • Use the Fedora 9 OS, since it is supported by the IBM Cell SDK 3.1. • Use the command line as the user interface. • Use the IBM XLC compiler and/or the current GCC compiler.

  10. Market Survey • The survey results point to large speedups for computationally intensive programs. • Dr. Gaurav Khanna at the University of Massachusetts Dartmouth used a cluster of 8 PS3s to replace a supercomputer. • In 2007, Universitat Pompeu Fabra in Barcelona deployed PS3GRID, a BOINC-based system for collaborative biological computing.

  11. Deliverables • The Source Code. • Compiled Executable. • Runtime Comparisons. • Project Final Report. • Project Poster. • Project Final Presentation.

  12. Work Breakdown Structure

  13. Costs • Time: approximately 555 man-hours total, freely donated. Total cost: $0. • Equipment: 3 PS3s and a crossbar router, provided by the client. Total cost: $0.

  14. Resource Requirements • 3 PlayStation 3s. • High performance network switch. • Books on distributed computing on Cell. • Time.

  15. Work Schedule • Gantt chart

  16. Risk Assessment • Slow network speed. • Software support. • Limited RAM (256 MB of main memory per PS3). • Hardware failure. • Lower-quality consumer entertainment hardware. • Limited prior experience. • Software development schedule.

  17. Design • Divide the application further into multiple threads that run on the SPEs of multiple PS3s, modify the program logic as needed, and vectorize the code where possible.

  18. Software Decomposition Diagram

  19. System Requirements • SR1. The system shall allow the user to input multiple DNA sequences in FASTA format through a file interface. • SR2. The system shall output all of the most parsimonious trees implied by the input data to the screen. • SR3. The system shall share computational work among the PPE and SPEs available to each client/server process. • SR4. The front-end shall share computational work with available back-end processes. • SR5. The front-end shall be able to connect to at least 2 back-end processes via a high performance router.

  20. System Analysis • The key is data flow, which is broken into 3 stages. • Stage 1: DNA sequences are distributed from the front-end to the PPEs and then down to the SPEs. • Stage 2: Each SPE searches its share of the possible parsimony trees for the best score, using branch and bound, an exact method that prunes trees that cannot score as well as the best tree found so far. • Stage 3: The results are aggregated back to the main PPE and output.

  21. Specifications • Input • DNA sequence files in FASTA format. • Output • Runtime of the application. • The most parsimonious phylogenetic tree. • The parsimony score of the phylogenetic tree.

  22. Specifications • User Interface • No changes to the user interface. • Uses a command line interface.

  23. Specifications • Hardware • 3 PlayStation 3s • High-performance crossbar network switch.

  24. Specifications • Software • Fedora 9 with the Linux 2.6.25 kernel for PowerPC • IBM Cell SDK 3.1 • IBM XLC 9.0 and GCC 4.3 compilers • DNAPenny 3.6 (from the PHYLIP package) • BioPerf suite

  25. Specifications • Testing • Benchmark runtimes over several iterations and inputs, and average them. • Compare these averages with the previous group's runtimes on a single Cell processor. • Compare these averages with the previous group's runtimes on a high-performance server (quad-core Intel Xeon, 3.0 GHz, 6 GB RAM).

  26. Acknowledgements • May08-24 group • Kyle Byerly • Shannon McCormick • Matt Rohlf • Bryan Venteicher • BioPerf developers • David A. Bader, Georgia Tech • Yue Li, Univ. of Florida • Tao Li, Univ. of Florida • Vipin Sachdeva, IBM Austin

  27. Questions?

  28. Previous Results and Projected Results

  29. Summary • Cost: $0; equipment provided. • Time: approximately 555 man-hours, freely donated. • Results: 4x the performance of a similarly priced system.
