Summer Internship

Presentation Transcript

  1. Summer Internship Douglas Drobny Idaho National Laboratory High Performance Computing

  2. Who I worked for • Idaho National Laboratory • Idaho Falls • High Performance Computing group • Manages ~4 different clusters • Supports and maintains software for large research projects • User Support group

  3. Clusters • Fission • 12,512 processors • 25 TBytes of memory • Icestorm • 2048 processors • 4 TBytes of memory • Quark • Eos

  4. Compute Manager • Jobs are currently submitted from the command line • Goals • Web interface for the PBS scheduler • Easy to use • Behaves the same as current job submissions • Improved error message handling

  5. Setup • Application Services • On the server head nodes • Receives web requests • Submits jobs • Compute Manager • On the web server • Creates web forms • Sends results to App. Services • Displays results
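
A minimal sketch of the "receive a web request, submit a job" step, assuming the service turns form fields into a PBS job script. The function name, the field names (`job_name`, `nodes`, `cpus_per_node`, `walltime`), and the job-name pattern are hypothetical and not taken from Compute Manager; the `#PBS` directives follow PBS Professional conventions.

```python
import re

def build_pbs_script(form, command):
    """Render a PBS job script from web form fields.

    `form` holds user input; `command` is assumed to come from the
    application definition on the server, never from the user.
    """
    name = form["job_name"]
    # Conservative pattern for job names (an assumption, not a PBS rule).
    if not re.fullmatch(r"[A-Za-z0-9_.-]{1,15}", name):
        raise ValueError("invalid job name: %r" % name)
    walltime = form["walltime"]
    if not re.fullmatch(r"\d{2}:\d{2}:\d{2}", walltime):
        raise ValueError("walltime must look like HH:MM:SS")
    lines = [
        "#!/bin/bash",
        "#PBS -N %s" % name,
        "#PBS -l select=%d:ncpus=%d" % (int(form["nodes"]), int(form["cpus_per_node"])),
        "#PBS -l walltime=%s" % walltime,
        "cd $PBS_O_WORKDIR",  # start in the directory the job was submitted from
        command,
    ]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    demo = {"job_name": "demo", "nodes": "1", "cpus_per_node": "4", "walltime": "00:10:00"}
    print(build_pbs_script(demo, "./run_simulation"))
```

The generated text is what a user would otherwise write by hand and pass to `qsub`, which is how the web interface can behave the same as a command-line submission.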

  6. What I did • Installed Compute Manager and AIF on Eos • Created test cases for PBS features • Created test cases for user inputs • Submitted feedback / bug reports with PBS • Documented the process for future implementations / troubleshooting

  7. Results • Good • Easy to create different application forms • Instant job monitoring • Restrict input values • Default input values • Secure file transfer
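
The "restrict input values" and "default input values" points can be illustrated with a small validator. This is a sketch under assumptions: the field table and its limits are made up for illustration and do not describe how Compute Manager stores its form definitions.

```python
# Hypothetical form definition: each field has a default and an allowed range.
FIELDS = {
    "nodes": {"default": 1, "min": 1, "max": 4},
    "cpus_per_node": {"default": 1, "min": 1, "max": 16},
}

def clean_form(raw):
    """Return validated integers for every field, applying defaults for blanks."""
    cleaned = {}
    for name, spec in FIELDS.items():
        text = raw.get(name, "").strip()
        # int() raises ValueError for non-numeric input, which also rejects it.
        value = spec["default"] if text == "" else int(text)
        if not spec["min"] <= value <= spec["max"]:
            raise ValueError(
                "%s must be between %d and %d" % (name, spec["min"], spec["max"])
            )
        cleaned[name] = value
    return cleaned

if __name__ == "__main__":
    print(clean_form({"nodes": "2"}))
```

Rejecting bad values before submission is one way a web form can give clearer errors than a failed command-line job.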

  8. Results • Bad • Easy to put results in insecure location • Always copies the input files • Missing a form entry can result in lost output files • Spams the sudo log • “Fixed in next version (Week after I leave)”

  9. Updating HPC Wiki • MoinMoin wiki (Python) • Upgraded from 1.8.8 to 1.9.4 • Used a temporary virtual machine to test the update and fix issues • Added support for viewing reports • Deployed on hpcweb • Note: Learn what type of service monitoring is being used before taking down a system.

  10. Wiki Reports • Automatically generate a visual report of an XML document each month • Created the XSL • Putting data into charts • Automation ('Right' way vs. Working way) • Editing to reduce transcription errors • <script/>
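
A rough sketch of the "putting data into charts" step, written in Python rather than XSL: it reads an XML document and sums a value per category so the result can be handed to a charting tool. The element and attribute names (`report`, `job`, `cluster`, `cpu_hours`) and the sample numbers are invented for illustration; the real report schema is not shown in the slides.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up sample document; the real monthly XML is not shown in the slides.
SAMPLE = """<report>
  <job cluster="Fission" cpu_hours="10"/>
  <job cluster="Quark" cpu_hours="2.5"/>
  <job cluster="Fission" cpu_hours="5"/>
</report>"""

def cpu_hours_by_cluster(xml_text):
    """Sum the cpu_hours attribute per cluster and return sorted (name, total) pairs."""
    root = ET.fromstring(xml_text)
    totals = defaultdict(float)
    for job in root.iter("job"):
        totals[job.get("cluster")] += float(job.get("cpu_hours"))
    return sorted(totals.items())

if __name__ == "__main__":
    for cluster, hours in cpu_hours_by_cluster(SAMPLE):
        print(cluster, hours)
```

Reading the numbers straight from the XML, instead of retyping them, is the same idea as the "editing to reduce transcription errors" bullet.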

  11. XSL/XML • Goal: Display XSL/XML pages inside of a wiki page • Problems • MoinMoin uses an outdated XSL library • XSL can contain JavaScript (XSS) • Solution • Created a wiki macro to convert XML with a specific XSL stylesheet on the server
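
One way to read "a specific XSL stylesheet on the server" is that the macro only accepts stylesheet names from a fixed list, so a wiki editor cannot supply XSL that carries JavaScript. The sketch below shows only that lookup step. The directory, the stylesheet name, and the function are hypothetical, and the transform itself (which needs an XSLT library) is left out.

```python
import os

# Hypothetical location and allow-list; the real macro's settings are not in the slides.
STYLESHEET_DIR = "/srv/wiki/xsl"
ALLOWED_STYLESHEETS = {"monthly_report": "monthly_report.xsl"}

def resolve_stylesheet(name):
    """Map a macro argument to a server-side stylesheet path, rejecting anything else."""
    try:
        filename = ALLOWED_STYLESHEETS[name]
    except KeyError:
        raise ValueError("unknown stylesheet: %r" % name)
    return os.path.join(STYLESHEET_DIR, filename)

if __name__ == "__main__":
    print(resolve_stylesheet("monthly_report"))
```

Because the argument is a key into the list rather than a path, inputs such as `../../other.xsl` are rejected before any file is opened.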

  12. Intel Compiler Issue (ICC) • Issue • Compile times on Quark are much longer than Fission (head nodes) • Quark should be faster (hardware wise) • 17 minutes on Quark • 8 minutes on Fission

  13. Intel Compiler Steps • Create test cases • Determine affected systems • Enable debugging • Strace • Wireshark • Hardware Test Environment
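
A repeatable timing test helps with the first two steps, since the same command can be run on each head node and compared. The helper below is a sketch, not the test cases used during the internship; `time_command` is a hypothetical name, and the demo times a trivial Python process as a stand-in for a compiler invocation.

```python
import subprocess
import sys
import time

def time_command(argv, runs=3):
    """Run a command several times and return the wall-clock duration of each run."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            argv,
            check=True,  # fail loudly if the command itself breaks
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations

if __name__ == "__main__":
    # Placeholder command; on a cluster this would be the compile being investigated.
    times = time_command([sys.executable, "-c", "pass"])
    print("min %.3fs  max %.3fs" % (min(times), max(times)))
```

Running the same script on Quark and Fission with an identical source file would show whether the slowdown is tied to the system rather than to the code being compiled.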

  14. ICC Solution • License files were resolved in this order • License manager • User's home directory • /opt/intel • /apps/intel/..../license • 'Errors' in the license file cause the system to check all of the sources

  15. ICC Solution • The /opt/intel license files pointed to the license manager • This caused additional requests to the license manager (takes time) • Quark's /opt/intel had the most license files pointing to the license servers • Removed the /opt/intel/license folder to fix the problem
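
The reasoning on the last two slides can be written as a toy model: if every source in the search order is checked, each entry that points back to the license manager costs one more network request. This is an illustration only, not how the Intel license client is implemented, and the number of redirecting files in /opt/intel (three here) is made up.

```python
def manager_requests(search_path):
    """Count entries across all sources that result in a license manager request."""
    requests = 0
    for _location, entries in search_path:
        for entry in entries:
            if entry == "manager":
                requests += 1
    return requests

# Search order from the slides; the entries per source are hypothetical.
before = [
    ("license manager", ["manager"]),
    ("user's home directory", []),
    ("/opt/intel", ["manager", "manager", "manager"]),
    ("/apps/intel/..../license", ["local"]),
]
# Removing the /opt/intel source drops its redirecting entries.
after = [source for source in before if source[0] != "/opt/intel"]

if __name__ == "__main__":
    print("requests before:", manager_requests(before))
    print("requests after: ", manager_requests(after))
```

In this model the fix works because it removes redundant round trips to the license manager, not because it changes the compiler or the hardware.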

  16. Things Learned • Python • XSL • Creating and Signing SSL Keys • Unix permissions • Strace • Testing • Refactoring • Monitoring • Vim!