
Protein Explorer: A Petaflops Special Purpose Computer System for Molecular Dynamics Simulations

Protein Explorer: A Petaflops Special Purpose Computer System for Molecular Dynamics Simulations. David Gobaud Computational Drug Discovery Stanford University 7 March 2006. Outline. Overview Background Delft Molecular Dynamics Processor GRAPE Protein Explorer Summary MDGRAPE-3 Chip


Presentation Transcript


  1. Protein Explorer: A Petaflops Special Purpose Computer System for Molecular Dynamics Simulations David Gobaud Computational Drug Discovery Stanford University 7 March 2006

  2. Outline • Overview • Background • Delft Molecular Dynamics Processor • GRAPE • Protein Explorer Summary • MDGRAPE-3 Chip • Force Calculation Pipeline • J-Particle Memory and Control Units • System Architecture • Software • Cost • Questions

  3. Overview • Protein Explorer • Petaflop special-purpose computer system for molecular dynamics simulations • High-precision screening for drug design • Large-scale simulations of huge proteins/complexes • PC cluster with special-purpose engines to perform the most time-consuming calculations • Dedicated LSI MDGRAPE-3 chip performs force calculations at 165 Gflops or higher • ETA 2006

  4. Background • PCs are universal machines • Various applications • Hardware can be designed independently of applications • Obstacles to high performance • Memory bandwidth bottleneck • Heat dissipation problem • Can be overcome by developing specialized architectures

  5. Delft Molecular Dynamics Processor (DMDP) • Pioneered high-performance special-purpose systems • Not able to achieve effective cost-performance • Demanded too much time and money in the development stage • Speed of development is a crucial factor affecting cost-performance because electronic device technology continues to develop rapidly • Almost all calculations were performed by the DMDP, making the hardware very complex

  6. GRAPE (GRAvity PipE) • One of the most successful attempts to develop high-performance special-purpose systems • Specialized for simulations of classical particles • Most time spent on calculation of long-range forces (gravitational, Coulomb, and van der Waals) • Thus special hardware only performs these calculations • Hardware very simple and cost-effective

  7. GRAPE (GRAvity PipE) • In 1995, first machine to break the teraflops barrier in nominal peak performance • Since 2001, the performance leader has been the Molecular Dynamics Machine at RIKEN at 78 Tflops • In 2002, a 64-Tflops GRAPE-6 was completed at the University of Tokyo • Protein Explorer launched based on the 2002 University of Tokyo success

  8. Protein Explorer Summary • Host PC cluster with special-purpose boards attached • Boards calculate only nonbonded forces • Very simple hardware and software • No detailed knowledge of hardware needed to write programs • Communication time between host and boards is proportional to number of particles • Calculation time proportional to • N^2 for direct summation of long-range forces • N*Nc for short-range forces, where Nc is the average number of particles within the cutoff radius • 0.25 byte/1000 operations
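The scaling claims on this slide can be sketched in a few lines. This is an illustrative count of work units only, not a model of the real hardware: the function names and the value of `n_c` are assumptions made for the example. It shows why a slow host link can suffice: the work done per communicated particle grows with N for direct summation and stays near Nc with a cutoff.

```python
def direct_sum_cost(n: int) -> int:
    """Pairwise evaluations for direct summation: proportional to N^2."""
    return n * n


def cutoff_cost(n: int, n_c: int) -> int:
    """Pairwise evaluations with a cutoff: proportional to N * Nc."""
    return n * n_c


def communication_cost(n: int) -> int:
    """Host-to-board traffic: proportional to the number of particles."""
    return n


if __name__ == "__main__":
    n_c = 200  # hypothetical average neighbour count inside the cutoff radius
    for n in (1_000, 10_000, 100_000):
        per_particle_direct = direct_sum_cost(n) // communication_cost(n)
        per_particle_cutoff = cutoff_cost(n, n_c) // communication_cost(n)
        print(n, per_particle_direct, per_particle_cutoff)
```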

  9. MDGRAPE-3 Chip - Force Calculation Pipeline • 3 subtractor units • 6 adder units • 8 multiplier units • 1 function-evaluation unit • Can perform ~33 equivalent operations per clock cycle when it calculates the Coulomb force

  10. MDGRAPE-3 Chip - Force Calculation Pipeline

  11. MDGRAPE-3 Chip - Force Calculation Pipeline • Most operations done in 32-bit single precision floating point format • Force accumulation is 80-bit fixed point format • Can be converted to 64-bit double precision floating point • Coordinates stored in 40-bit fixed-point format • Makes implementation of periodic boundary condition easy
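One way to see why fixed-point coordinates make periodic boundaries easy: if the box length is mapped onto the full 40-bit range, coordinate differences wrap around automatically, and reading the wrapped difference as a signed value gives the nearest-image displacement. The sketch below illustrates that idea in Python; the scaling convention and the helper names are assumptions for illustration, not a description of the chip's actual data path.

```python
BITS = 40
MOD = 1 << BITS  # number of representable 40-bit values


def to_fixed(x: float, box: float) -> int:
    """Map a coordinate in [0, box) onto a 40-bit unsigned integer (assumed scaling)."""
    return int(round((x / box) * MOD)) % MOD


def min_image_delta(a: int, b: int) -> int:
    """Wrapped difference a - b, interpreted as a signed 40-bit value."""
    d = (a - b) % MOD
    if d >= MOD // 2:
        d -= MOD
    return d


def to_float(d: int, box: float) -> float:
    """Convert a fixed-point displacement back to real units."""
    return d * box / MOD


if __name__ == "__main__":
    box = 10.0
    a = to_fixed(9.5, box)
    b = to_fixed(0.5, box)
    # The raw separation is 9.0, but across the periodic boundary the
    # nearest image is only about 1.0 away, in the negative direction.
    print(to_float(min_image_delta(a, b), box))
```

No explicit comparison against the box size is needed; the wraparound of the integer arithmetic does the work.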

  12. MDGRAPE-3 Chip - Force Calculation Pipeline • Function Evaluator • Most important part of pipeline • Allows calculation of an arbitrary smooth function • Has a memory unit which contains a table of polynomial coefficients and exponents, and a hardwired pipeline for fourth-order polynomial evaluation • Interpolates an arbitrary smooth function g(x) using segmented fourth-order polynomials evaluated by Horner's method
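A minimal software sketch of segmented fourth-order interpolation with Horner's method follows. Several things here are assumptions for illustration: the segments are uniform in x (the slide does not describe the chip's segmentation scheme), the coefficients come from a Taylor expansion about each segment centre, and g(x) = x^(-3/2) is chosen only as a typical smooth function (it appears when the Coulomb force is written in terms of x = r^2).

```python
def horner(coeffs, t):
    """Evaluate c0 + c1*t + ... + cn*t^n; coeffs are ordered low to high."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * t + c
    return acc


def taylor_coeffs(p, center, order=4):
    """Taylor coefficients of x**p about `center`, up to the given order."""
    coeffs = []
    binom = 1.0  # generalised binomial coefficient C(p, k)
    for k in range(order + 1):
        coeffs.append(binom * center ** (p - k))
        binom *= (p - k) / (k + 1)
    return coeffs


def build_table(p, x_min, x_max, n_segments):
    """Precompute one (centre, coefficients) entry per uniform segment."""
    width = (x_max - x_min) / n_segments
    table = []
    for i in range(n_segments):
        center = x_min + (i + 0.5) * width
        table.append((center, taylor_coeffs(p, center)))
    return table


def evaluate(table, x, x_min, x_max):
    """Look up the segment containing x and evaluate its polynomial."""
    n = len(table)
    width = (x_max - x_min) / n
    idx = min(max(int((x - x_min) / width), 0), n - 1)
    center, coeffs = table[idx]
    return horner(coeffs, x - center)


if __name__ == "__main__":
    table = build_table(-1.5, 1.0, 4.0, 64)
    x = 2.0
    print(evaluate(table, x, 1.0, 4.0), x ** -1.5)
```

Horner's method needs only four multiply-add steps per evaluation, which is what makes a fixed hardwired pipeline practical; changing the table contents changes the function without changing the pipeline.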

  13. MDGRAPE-3 Chip - J-Particle Memory and Control Units • 20 Force Calculation Pipelines • j-Particle Memory Unit • 32,768 bodies • “Main Memory” • 6.6 Mbits built from static RAM • Cell-Index Controller • Controls j-Particle memory – generates addresses • Force Summation Unit • Master Controller • Manages timings and inputs/outputs of the chip

  14. MDGRAPE-3 Chip • 2 virtual pipelines per physical pipeline • Physical bandwidth of j-particle unit is 2.5 Gbytes/sec, but virtual bandwidth will reach 100 Gbytes/sec • 340 arithmetic units • 20 function-evaluator units which work simultaneously • 165 Gflops at 250 MHz
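The headline chip figures can be cross-checked with simple arithmetic. The sketch below assumes the ~33 equivalent operations quoted for the Coulomb force are counted per pipeline per clock cycle, and that the virtual bandwidth is the physical bandwidth multiplied by the number of physical and virtual pipelines; both are readings that reproduce the slide's numbers rather than statements taken from the slides. The constant names are made up for the example.

```python
PIPELINES = 20                # force calculation pipelines per chip
OPS_PER_CYCLE = 33            # equivalent operations per pipeline (Coulomb force)
CLOCK_MHZ = 250               # chip clock
VIRTUAL_PER_PHYSICAL = 2      # virtual pipelines per physical pipeline
PHYSICAL_BW_GBYTES = 2.5      # j-particle memory bandwidth
UNITS_PER_PIPELINE = 3 + 6 + 8  # subtractors + adders + multipliers


def peak_gflops() -> float:
    """Peak performance: pipelines x operations per cycle x clock rate."""
    return PIPELINES * OPS_PER_CYCLE * CLOCK_MHZ / 1000


def virtual_bandwidth_gbytes() -> float:
    """Each j-particle word read is reused by every (virtual) pipeline."""
    return PHYSICAL_BW_GBYTES * PIPELINES * VIRTUAL_PER_PHYSICAL


def arithmetic_units() -> int:
    """Arithmetic units on the chip, excluding the function evaluators."""
    return PIPELINES * UNITS_PER_PIPELINE


if __name__ == "__main__":
    print(peak_gflops(), virtual_bandwidth_gbytes(), arithmetic_units())
```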

  15. MDGRAPE-3 Chip

  16. MDGRAPE-3 Chip • Chip made by Hitachi • 6M gates • 10M bits of memory • Chip size is ~220 mm^2 • Dissipates 20 watts at a core voltage of +1.2 V • 0.12 W/Gflops, much better than a 3 GHz P4 at 14 W/Gflops

  17. System Architecture • Host PC cluster will use Itanium or Opteron CPUs • 256 nodes with 512 CPUs in total • Performance of each node is 3.96 Tflops • Total reaches a petaflops • Requires a 10 Gbit/sec network • InfiniBand, 10G Ethernet, or future Myrinet • Network topology will be a 2D hyper-crossbar • Each node has 24 MDGRAPE-3 chips • MDGRAPE-3 chips connected via 2 PCI-X buses at 133 MHz • One 19” rack can house 6 nodes • 43 racks total • Power dissipation ~150 kW • Occupies 100 m^2
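The system-level numbers on this slide follow from the per-chip figures. The short sketch below recomputes them; the 165 Gflops and 20 W per-chip values are taken from the MDGRAPE-3 chip slides, and the constant and function names are made up for the example. The chip-only power estimate ignores the host PCs and network, so it is a lower bound on the ~150 kW system figure rather than a derivation of it.

```python
import math

CHIP_GFLOPS = 165      # per-chip peak, from the chip slides
CHIP_WATTS = 20        # per-chip power, from the chip slides
CHIPS_PER_NODE = 24
NODES = 256
NODES_PER_RACK = 6


def node_gflops() -> int:
    """Peak performance of one node's MDGRAPE-3 chips."""
    return CHIP_GFLOPS * CHIPS_PER_NODE


def total_gflops() -> int:
    """Peak performance of all special-purpose chips in the system."""
    return node_gflops() * NODES


def racks_needed() -> int:
    """Racks required when each rack holds NODES_PER_RACK nodes."""
    return math.ceil(NODES / NODES_PER_RACK)


def chip_power_kw() -> float:
    """Power drawn by the MDGRAPE-3 chips alone."""
    return NODES * CHIPS_PER_NODE * CHIP_WATTS / 1000


if __name__ == "__main__":
    print(node_gflops() / 1000, "Tflops per node")
    print(total_gflops() / 1e6, "Pflops total")
    print(racks_needed(), "racks")
    print(chip_power_kw(), "kW for the chips alone")
```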

  18. System Architecture

  19. Protein Explorer Board

  20. Software • Very easy to create programs for • All computational abilities provided in a library • No special knowledge of device needed

  21. Cost • $20 million including labor • Less than $10/Gflop • At least ten times better than general-purpose computers even when compared with relatively cheap BlueGene/L ($140/Gflop)

  22. Questions • What is Myrinet? • What is a two-dimensional hyper-crossbar network topology? • How does this compare to massive distributed computing such as Folding@Home? • Advantages? • Disadvantages?
