Protein Folding Landscapes in a Distributed Environment

All Hands Meeting, 2001. University of Virginia: Andrew Grimshaw, Anand Natrajan. Scripps (TSRI): Charles L. Brooks III, Michael Crowley. SDSC: Nancy Wilkins-Diehr. Outline: CHARMM, Issues, Legion, The Run, Results, Lessons, AmberGrid.

Presentation Transcript


  1. Protein Folding Landscapes in a Distributed Environment • All Hands Meeting, 2001 • University of Virginia: Andrew Grimshaw, Anand Natrajan • Scripps (TSRI): Charles L. Brooks III, Michael Crowley • SDSC: Nancy Wilkins-Diehr

  2. Outline • CHARMM • Issues • Legion • The Run • Results • Lessons • AmberGrid • Summary

  3. CHARMM • Routine exploration of folding landscapes helps in search for protein folding solution • Understanding folding critical to structural genomics, biophysics, drug design, etc. • Key to understanding cell malfunctions in Alzheimer’s, cystic fibrosis, etc. • CHARMM and Amber benefit majority (>80%) of bio-molecular scientists • Structural genomic & protein structure predictions

  4. Folding Free Energy Landscape • Molecular Dynamics Simulations • 100-200 structures to sample (r, Rgyr) space • [Figure: free energy surface with axes r and Rgyr]
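To make the idea of a free energy surface over (r, Rgyr) concrete, here is a minimal sketch that bins sampled (r, Rgyr) pairs and converts relative populations to free energies with F = -kT ln(P / P_max). This plain histogram estimate is a simplification chosen for illustration and is not necessarily how the authors built their landscape; the function name, bin widths, and sample values are all hypothetical.

```python
import math
from collections import Counter

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def free_energy_surface(samples, bin_width, temperature=300.0):
    """Estimate relative free energy per (r, Rgyr) bin from sampled points.

    samples: iterable of (r, rgyr) pairs (illustrative data, not from the talk)
    bin_width: (width_r, width_rgyr), an assumed binning
    Returns {bin_index: free energy in kcal/mol}, zero at the most populated bin.
    """
    counts = Counter(
        (math.floor(r / bin_width[0]), math.floor(rg / bin_width[1]))
        for r, rg in samples
    )
    most = max(counts.values())
    kt = K_B * temperature
    return {b: -kt * math.log(n / most) for b, n in counts.items()}

# Made-up samples: four in one bin, one in another.
samples = [(0.1, 10.2)] * 4 + [(0.9, 8.1)]
surface = free_energy_surface(samples, bin_width=(0.5, 1.0))
```

Bins that are visited less often come out higher in free energy, which is why many structures along and near the path need to be sampled.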

  5. Application Characteristics • Parameter-space study • Parameters correspond to structures along & near folding path • Path unknown - could be many or broad • Many places along path sampled for determining local low free energy states • Path is valley of lowest free energy states from high free energy state of unfolded protein to lowest free energy state (folded native protein)

  6. Folding of Protein L • Immunoglobulin-binding protein • 62 residues (small), 585 atoms • 6500 water molecules, total 20085 atoms • Each parameter point requires O(10^6) dynamics steps • Typical folding surfaces require 100-200 sampling runs • CHARMM using most accurate physics available for classical molecular dynamics simulation • PME, 9 Å cutoff, heuristic list update, SHAKE • Multiple 16-way parallel runs - maximum efficiency
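The system size quoted on this slide can be checked with a little arithmetic. The sketch below assumes three atoms per water molecule (a three-site water model); that assumption is not stated on the slide, but it reproduces the quoted total.

```python
PROTEIN_ATOMS = 585        # from the slide
WATER_MOLECULES = 6500     # from the slide
ATOMS_PER_WATER = 3        # assumption: three-site water model

def total_atoms(protein_atoms, water_molecules, atoms_per_water):
    """Total atom count of the solvated system."""
    return protein_atoms + water_molecules * atoms_per_water

print(total_atoms(PROTEIN_ATOMS, WATER_MOLECULES, ATOMS_PER_WATER))  # 20085
```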

  7. Application Characteristics • Many independent runs • 200 sets of data to be simulated in two sequential runs • Equilibration (4-8 hours) • Production/sampling (8 to 16 hours) • Each point has task name, e.g., pl_1_2_1_e
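A rough cost estimate follows from these figures. The sketch below assumes the quoted hours are wall-clock times for one 16-way run (the run width given on the Protein L slide) and that all processors are equally fast and perfectly packed; both are simplifications. The 128-processor pool is only an illustrative value.

```python
def processor_hours(points, procs_per_run, hours_per_point):
    """Total processor-hours for a set of independent sampling points."""
    return points * procs_per_run * hours_per_point

def wall_clock_days(total_processor_hours, processors):
    """Idealised wall-clock time in days on a fixed pool of processors."""
    return total_processor_hours / processors / 24

# 200 points, each equilibration (4-8 h) followed by production (8-16 h).
low = processor_hours(200, 16, 4 + 8)
high = processor_hours(200, 16, 8 + 16)
print(low, high)                                              # 38400 76800
print(wall_clock_days(low, 128), wall_clock_days(high, 128))  # 12.5 25.0
```

Under these assumptions a single 128-processor machine would be busy for weeks, which is the motivation for spreading the independent runs across many sites.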

  8. Scientists Using Legion • Binaries for each type • Script for dispatching jobs • Script for keeping track of results • Script for running binary at site (optional feature in Legion) • Abstract interface to resources (queues, accounting, firewalls, etc.) • Binary transfer (with caching) • Input file transfer • Job submission • Status reporting • Output file transfer

  9. Legion Complete, Integrated Infrastructure for Secure Distributed Resource Sharing

  10. Grid OS Requirements • Wide-area • High Performance • Complexity Management • Extensibility • Security • Site Autonomy • Input / Output • Heterogeneity • Fault-tolerance • Scalability • Simplicity • Single Namespace • Resource Management • Platform Independence • Multi-language • Legacy Support

  11. Transparent System

  12. npacinet

  13. The Run

  14. Computational Issues • Provide improved response time • Access large set of resources transparently • geographically distributed • heterogeneous • different organisations • Scale: 5 organisations, 7 systems, 9 queues, 5 architectures, ~1000 processors

  15. Resources Available • IBM SP3, UMich, 375 MHz Power3, 24/24 • HP SuperDome, CalTech, 440 MHz PA-8700, 128/128 • DEC Alpha, UVa, 533 MHz EV56, 32/128 • IBM Blue Horizon, SDSC, 375 MHz Power3, 512/1184 • Sun HPC 10000, SDSC, 400 MHz SMP, 32/64 • IBM Azure, UTexas, 160 MHz Power2, 32/64

  16. Scientists Using Legion • Binaries for each type • Script for dispatching jobs • Script for keeping track of results • Script for running binary at site (optional feature in Legion) • Abstract interface to resources (queues, accounting, firewalls, etc.) • Binary transfer (with caching) • Input file transfer • Job submission • Status reporting • Output file transfer

  17. Mechanics of Runs • Register binaries • Dispatch equilibration & production • Create task directories & specification • Dispatch equilibration • [Diagram label: Legion]
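The two-stage structure of each run (equilibration, then production) can be sketched as a small dispatch loop. This is a local simulation only: `Task`, `run_stage`, `run_point`, and the `point_NNN` task names are hypothetical stand-ins, and nothing here calls real Legion commands.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str                                   # hypothetical task name
    stages_done: list = field(default_factory=list)

def run_stage(task, stage):
    """Stand-in for submitting one stage to a remote queue; always succeeds."""
    task.stages_done.append(stage)
    return True

def run_point(task, submit=run_stage):
    """Run equilibration, then production only if equilibration succeeded."""
    for stage in ("equilibration", "production"):
        if not submit(task, stage):
            return False
    return True

# 200 independent sampling points, each processed in two sequential stages.
tasks = [Task(f"point_{i:03d}") for i in range(1, 201)]
completed = [t for t in tasks if run_point(t)]
print(len(completed))  # 200
```

Because the points are independent, a real dispatcher could send each one to whichever site has free processors; only the two stages within a point must stay in order.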

  18. Distribution of CHARMM Work

  19. Problems Encountered • Network slowdowns • Slowdown in the middle of the run • 100% loss for packets of size ~8500 bytes • Site failures • LoadLeveler restarts • NFS/AFS failures • Legion • No run-time failures • Archival support lacking • Must address binary differences • [Diagram: Legion, UMich, SDSC, UVa]

  20. Successes • Science accomplished faster • 1 month on 128 SGI Origins @Scripps • 1.5 days on national grid with Legion • Transparent access to resources • User didn’t need to log on to different machines • Minimal direct interaction with resources • Problems identified • Legion remained stable • Other Legion users unaware of large runs • Large grid application run at powerful resources by one person from local resource • Collaboration between natural and computer scientists
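The turnaround improvement quoted above can be expressed as a ratio. The sketch assumes "1 month" means 30 days, which the slide does not specify.

```python
def speedup(baseline_days, grid_days):
    """Ratio of baseline turnaround time to grid turnaround time."""
    return baseline_days / grid_days

# Assumption: 1 month taken as 30 days; 1.5 days is the figure from the slide.
print(speedup(30, 1.5))  # 20.0
```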

  21. AmberGrid Easy Interface to Grid

  22. Legion GUIs • Simple point-and-click interface to Grids • Familiar access to distributed file system • Enables & encourages sharing • Application portal model for HPC • AmberGrid • RenderGrid • Accounting • Transparent Access to Remote Resources • Intended Audience is Scientists

  23. Logging in to npacinet

  24. View of contexts (Distributed File System)

  25. Control Panel

  26. Running Amber

  27. Run Status (Legion) • Graphical View (Chime)

  28. Summary • CHARMM Run • Succeeded in starting big runs • Encountered problems • Learnt lessons for future • Let’s do it again! • more processors, systems, organisations • AmberGrid • Showed proof-of-concept - grid portal • Need to resolve licence issues
