
MINERVA USER GROUP MEETING 3 July 2012



Presentation Transcript


  1. MINERVA USER GROUP MEETING 3 July 2012 MUG - Mid April - June Period

  3. Minerva Operational Statistics

  4. Minerva Operational Statistics

  5. Minerva Usage By User

  7. Remaining Users CPU Hours

  8. Minerva Usage By Group

  11. Minerva Utilization, Mid-April - June

  12. Minerva Utilization, May - June

  13. Minerva Scratch Usage: /scratch, /projects

  14. Other Plans/Projects
      • Archival Storage
        • Ordered: tape library with 4 tape transports
        • 350 TB tape capacity
        • Anticipated start of service: 1 Sep 2012
      • GPGPU
        • Chassis with 2 Fermi-based Tesla cards ordered
        • Target availability date: 1 Aug 2012
      • Checkpoint/Restart (BLCR)
        • Partially installed; needs a reboot of systems and testing
      • Monthly Training Meetings
        • Third Tuesday of the month
        • Alternate between basic and advanced

  15. Hiccups
      • Scheduler Failure
        • Problem: The June job count was triple the previous count, and a scheduler database table overflowed.
        • Resolution: We set limits on the number of jobs per user in Torque and Moab.
        • Long Term: Move to a newer version of Torque and Moab, and to a SQL database.
      • InfiniBand / MPI Issues
        • Problem: The Mellanox driver buffer was overflowing because of the 64-core systems.
        • Resolution: We built a custom version of the Mellanox driver.
        • Long Term: Working with Mellanox to add the changes to the mainline code.
      • AMD 64-core understanding + performance
        • Problem: Each system has 32 FPUs, not 64, which had been misunderstood. Also, the ACML library is not tuned for the FFTW library.
        • Resolution: Changed scheduling to allow blocks of 32 and job-exclusive nodes.
        • Long Term: AMD is creating a new ACML library with tuned FFT sizes.
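
The slides do not give the actual Torque/Moab settings behind "blocks of 32 and job exclusive nodes". As a rough illustration only, the arithmetic might look like the Python sketch below, which assumes (this is an assumption, not something the slides state) that a core request is rounded up to whole 32-core blocks on 64-core nodes. The function names are hypothetical.

```python
CORES_PER_NODE = 64  # from the slide: 64-core AMD systems
BLOCK_SIZE = 32      # from the slide: scheduling in blocks of 32 (one per FPU)


def round_to_blocks(requested_cores):
    """Round a core request up to a whole number of 32-core blocks.

    Assumption: rounding up is how a block policy would treat odd sizes.
    """
    if requested_cores < 1:
        raise ValueError("requested_cores must be positive")
    # Integer ceiling division, then scale back to cores.
    blocks = -(-requested_cores // BLOCK_SIZE)
    return blocks * BLOCK_SIZE


def nodes_needed(requested_cores):
    """Number of 64-core nodes needed to hold the rounded-up request."""
    return -(-round_to_blocks(requested_cores) // CORES_PER_NODE)
```

Under these assumptions a 33-core request would occupy two blocks, that is, one full node.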

  16. Open Forum: Requested/Suggested Topics
      • Bioconductor R site-library
        • Should we put all Bioconductor R packages in one library? (module load bioconductor)
      • Epilogue report
        • Report job resource usage to syserr?
      • PM Schedule
        • Can we reduce PMs to monthly?
      • Fairshare
        • Comments? Feedback?
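
For the epilogue report topic, a minimal Python sketch of what such a script could look like follows. It assumes the positional arguments that Torque is documented to pass to an epilogue (job id, user, group, job name, session id, requested resources, resources used, queue, account, exit status); check this order against the installed Torque version. The report layout and the function names are hypothetical, and whether the epilogue's stderr reaches the job's error file depends on the site configuration.

```python
#!/usr/bin/env python
import sys

# Assumed order of the positional arguments Torque passes to an epilogue.
EPILOGUE_FIELDS = [
    "job_id", "user", "group", "job_name", "session_id",
    "resources_requested", "resources_used", "queue", "account",
    "exit_status",
]


def parse_resources(text):
    """Turn a string such as 'cput=00:01:02,mem=1024kb' into a dict."""
    result = {}
    for item in text.split(","):
        if "=" in item:
            key, value = item.split("=", 1)
            result[key.strip()] = value.strip()
    return result


def format_report(argv):
    """Build a short usage report from the epilogue arguments."""
    fields = dict(zip(EPILOGUE_FIELDS, argv))
    used = parse_resources(fields.get("resources_used", ""))
    lines = ["Job %s (user %s, queue %s) exit status %s" % (
        fields.get("job_id", "?"), fields.get("user", "?"),
        fields.get("queue", "?"), fields.get("exit_status", "?"))]
    for key in sorted(used):
        lines.append("  %-10s %s" % (key, used[key]))
    return "\n".join(lines)


if __name__ == "__main__":
    # Write the report to stderr, as the slide proposes.
    sys.stderr.write(format_report(sys.argv[1:]) + "\n")
```

The parsing is kept separate from the output so the report format can be changed without touching the argument handling.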
