Maui High Performance Computing Center

Presentation Transcript


  1. Maui High Performance Computing Center Open System Support An AFRL, MHPCC and UH Collaboration December 18, 2007 Mike McCraney MHPCC Operations Director

  2. Agenda • MHPCC Background and History • Open System Description • Scheduled and Unscheduled Maintenance • Application Process • Additional Information Required • Summary and Q/A

  3. An AFRL Center • An Air Force Research Laboratory Center • Operational since 1993 • Managed by the University of Hawaii • Subcontractor Partners – SAIC / Boeing • A DoD High Performance Computing Modernization Program (HPCMP) Distributed Center • Task Order Contract – Maximum Estimated Ordering Value = $181,000,000 • Performance Dependent – 10 Years • 4 Year Base Period with 2, 3-Year Term Awards

  4. A DoD HPCMP Distributed Center • Program oversight: Director, Defense Research and Engineering → DUSD (Science and Technology) → High Performance Computing Modernization Program • Major Shared Resource Centers • Aeronautical Systems Center (ASC) • Army Research Laboratory (ARL) • Engineer Research and Development Center (ERDC) • Naval Oceanographic Office (NAVO) • Distributed Centers • Allocated Distributed Centers • Army High Performance Computing Research Center (AHPCRC) • Arctic Region Supercomputing Center (ARSC) • Maui High Performance Computing Center (MHPCC) • Space and Missile Defense Command (SMDC) • Dedicated Distributed Centers • ATC • AFWA • AEDC • AFRL/IF • Eglin • FNMOC • JFCOM/J9 • NAWC-AD • NAWC-CD • NUWC • RTTC • SIMAF • SSCSD • WSMR

  5. MHPCC HPC History • 1994 - IBM P2SC Typhoon Installed • 1996 - 2000 IBM P2SC • 2000 - IBM P3 Tempest Installed • 2001 - IBM Netfinity Huinalu Installed • 2002 - IBM P2SC Typhoon Retired • 2002 - IBM P4 Tempest Installed • 2004 - LNXi Evolocity II Koa Installed • 2005 - Cray XD1 Hoku Installed • 2006 - IBM P3 Tempest Retired • 2007 - IBM P4 Tempest Reassigned • 2007 - Dell PowerEdge Jaws Installed

  6. Hurricane Configuration Summary Current Hurricane Configuration: • Eight 32-processor/32GB IBM P690 POWER4 “nodes” • Jobs may be scheduled across nodes for a total of 288p • Shared memory jobs can span up to 32p and 32GB • 10TB Shared Disk available to all nodes • LoadLeveler Scheduling • One job per node – 32p chunks – can only support 8 simultaneous jobs • Issues: • Old technology, reaching end of life, upgradability issues • Cost prohibitive – constant power draw, ~$400,000 annual power cost
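
Hurricane work goes through LoadLeveler in whole-node 32p chunks, so a run is described in a LoadLeveler job command file and submitted with llsubmit. The sketch below is a minimal illustration, not taken from the slides: the class name and the executable (my_app, launched with IBM's POE) are placeholders, and real class names and limits would come from the MHPCC helpdesk.

    # hurricane.cmd – minimal LoadLeveler job command file (illustrative sketch)
    # The class name below is an assumption; actual class names are site-specific.
    # @ job_name         = hurricane_test
    # @ job_type         = parallel
    # @ class            = standard
    # @ node             = 1
    # @ tasks_per_node   = 32
    # @ wall_clock_limit = 01:00:00
    # @ output           = $(job_name).$(jobid).out
    # @ error            = $(job_name).$(jobid).err
    # @ queue
    # Launch the (placeholder) executable across all 32 tasks with IBM's POE
    poe ./my_app

Submit with "llsubmit hurricane.cmd" and check status with "llq"; because scheduling is per-node, each such job occupies one of the eight 32p slots.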

  7. Dell Configuration Summary Proposed Shark Configuration: • 40 4-processor/8GB “nodes” with Intel 3.0GHz dual-core Woodcrest processors • Jobs may be scheduled across nodes for a total of 160p • Shared memory jobs can span up to 8p and 16GB • 10TB Shared Disk available to all nodes • LSF Scheduler • One job per node – 8p chunks – can support up to 40 simultaneous jobs • Features/Issues: • Shared use as Open system and TDS (test and development system) • Much lower power cost – Intel power management • System already maintained and in use • System covered 24x7 UPS, generator • Possible short-notice downtime
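
On Shark the scheduler is Platform LSF, again one job per node but in 8p chunks, so work is submitted with bsub. The script below is a minimal sketch, not from the slides: the job and binary names are placeholders, and mpirun.lsf is shown only as a typical LSF HPC launcher wrapper – the exact launch command on the real system may differ.

    #!/bin/sh
    # shark_job.lsf – minimal LSF submission script (illustrative sketch)
    # Submit with:  bsub < shark_job.lsf
    #BSUB -J shark_test              # job name (placeholder)
    #BSUB -n 8                       # request 8 slots = one full node
    #BSUB -R "span[ptile=8]"         # keep all 8 slots on a single node
    #BSUB -W 01:00                   # wall-clock limit, hh:mm
    #BSUB -o %J.out                  # stdout, %J expands to the job ID
    #BSUB -e %J.err                  # stderr

    mpirun.lsf ./my_app              # placeholder executable

Because each node is an 8p chunk, up to 40 such jobs can run at once; they can be monitored with bjobs and cancelled with bkill.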

  8. Jaws Architecture • Simulation engine: 1,280 batch nodes (5,120 cores) plus 3 interactive nodes (12 cores), 10 blades per chassis • Storage: DDN 200 TB behind 24 Lustre I/O nodes and 1 MDS; CFS Lustre filesystem – shared access, high performance, using the Infiniband fabric • Networks: DREN, Cisco 6500 core, 10 Gig-E Ethernet, Fibre Channel, Cisco Infiniband (copper); Gig-E nodes with 10 Gig-E uplinks, 40 nodes per uplink; localized Infiniband network and private Ethernet • Head node for system administration, “build” nodes, and running parallel tools (pdsh, pdcp, etc.); SSH communications between nodes • Dell Remote Access Controllers on a private Ethernet – remote power on/off, temperature reporting, operability status, alarms • User webtop access to the simulation engine and storage
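
The head node's parallel tools (pdsh, pdcp) push commands and files out to the compute nodes over the SSH setup described above. The invocations below are illustrative only – the n[001-040] hostname pattern is an assumption, not the real Jaws naming scheme.

    # Run a command on 40 nodes in parallel (hostnames are placeholders)
    pdsh -w n[001-040] uptime

    # Copy a file to the same set of nodes
    pdcp -w n[001-040] /etc/ntp.conf /etc/ntp.conf

    # Fold identical output together for a quick health summary
    pdsh -w n[001-040] 'cat /proc/loadavg' | dshbak -c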

  9. Shark Software • Systems Software • Red Hat Enterprise Linux v4 • 2.6.9 Kernel • Infiniband • Cisco software stack • MVAPICH – MPICH 1.2.7 over IB library • GNU 3.4.6 C/C++/Fortran • Intel 9.1 C/C++/Fortran • Platform LSF HPC 6.2 • Platform Rocks
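
With this stack, MPI codes are built against MVAPICH (MPICH 1.2.7 over IB) through the usual MPICH wrapper compilers, backed by either GNU 3.4.6 or Intel 9.1. The lines below are a minimal sketch, assuming the wrappers are on the default PATH; the source files and process count are placeholders, and the generic mpirun shown here may in practice be mpirun_rsh or an LSF launcher wrapper on the real system.

    # Compile C and Fortran MPI codes with the MPICH/MVAPICH wrapper compilers
    mpicc  -O2 -o hello  hello.c
    mpif90 -O2 -o solver solver.f90

    # Launch 8 ranks (match the slot count requested from LSF with bsub -n 8)
    mpirun -np 8 ./hello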

  10. Maintenance Schedule • Current • 2:00pm – 4:00pm • 2nd and 4th Thursday (as necessary) • Check website (mhpcc.hpc.mil) for maintenance notices • New Proposed Schedule • 8:00am – 5:00pm • 2nd and 4th Wednesdays (as necessary) • Check website for maintenance notices • Only take maintenance on scheduled systems • Check on Mondays before submitting jobs

  11. Account Applications and Documentation • Contact Helpdesk or website for application information • Documentation Needed: • Account names, systems, special requirements • Project title, nature of work, accessibility of code • Nationality of applicant • Collaborative relevance with AFRL • New Requirements • “Case File” information • For use in AFRL research collaboration • Future AFRL applicability • Intellectual property shared with AFRL • Annual Account Renewals • September 30 is final day of the fiscal year

  12. Summary • Anticipated migration to Shark • Should be more productive and able to support a wider range of jobs • Cutting-edge technology • Cost savings relative to Hurricane (~$400,000 annually) • Stay tuned for the timeline – likely end of January or early February

  13. Mahalo
