
Open Science Grid: More compute power Alan De Smet chtc@cs.wisc



Presentation Transcript


  1. Open Science Grid: More compute power. Alan De Smet, chtc@cs.wisc.edu

  2. CHTC Cores In Use (CPU days each day, averaged over one month): 1,500

  3. OSG Cores In Use (CPU days each day, averaged over one month): 60,000

  4. Open Science Grid

  5. CHTC and OSG usage (CPU days each day)

  6. Challenges Solved
     We worry about all of this. You don’t have to.
     • Authentication
       • X.509 certificates, certificate authorities, VOMS
     • Interface
       • Globus, GridFTP, Grid universe
     • Validation
       • Linux distribution, glibc version, basic libraries

  7. Using OSG
     • Before
         universe = vanilla
         executable = myjob
         log = myjob.log
         queue

  8. Using OSG
     • After
         universe = vanilla
         executable = myjob
         log = myjob.log
         +WantGlidein = true
         queue

  9. Challenge: Opportunistic
     • OSG computers go away without notice
     • Solutions
       • Condor restarts automatically
       • Sub-hour jobs
       • Self-checkpointing
       • Automated checkpointing
         • Condor’s standard universe
         • DMTCP http://dmtcp.sourceforge.net/
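The self-checkpointing idea from the slide above can be sketched in Python. This is a minimal illustration, not CHTC's or Condor's implementation: the file name `checkpoint.json`, the functions `load_state`, `save_state`, and `run`, and the `stop_after` parameter (which only simulates an eviction) are all hypothetical. How the checkpoint file reaches a restarted job depends on the pool's file transfer setup, which the slides do not cover.

```python
import json
import os
import tempfile

CHECKPOINT = "checkpoint.json"  # hypothetical file name, not from the slides


def load_state(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"next_step": 0, "total": 0}


def save_state(path, state):
    """Write the checkpoint to a temp file, then rename it into place.

    os.replace is atomic, so an eviction during the write cannot leave
    a half-written checkpoint behind.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(path, steps, every=10, stop_after=None):
    """Sum the integers 0..steps-1, checkpointing every `every` steps.

    `stop_after` simulates the machine going away: the function returns
    None after that many steps without saving any further progress.
    """
    state = load_state(path)
    done_this_run = 0
    while state["next_step"] < steps:
        if stop_after is not None and done_this_run >= stop_after:
            return None  # simulated eviction
        state["total"] += state["next_step"]
        state["next_step"] += 1
        done_this_run += 1
        if state["next_step"] % every == 0:
            save_state(path, state)
    save_state(path, state)
    return state["total"]
```

If the first call is interrupted, a second call to `run` with the same path resumes from the last saved step instead of starting over, so at most `every` steps of work are lost per eviction.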

  10. Challenge: Local Software

  11. Challenge: Local Software
      • Bare-bones Linux systems
      • Solution
        • Bring everything with you
        • CHTC-provided MATLAB and R packages
        • RunDagEnv/mkdag

  12. Challenge: Erratic Failures
      • Complex systems fail sometimes
      • Solution
        • Expect failures and automatically retry
        • DAGMan for retries
        • DAGMan POST scripts to detect problems
        • RunDagEnv/mkdag
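A POST script of the kind mentioned above might look like the following Python sketch. The script name `check_output.py`, the node name `A`, the file names, and the "output must exist and be non-empty" rule are assumptions for illustration; a real check would depend on what the job is supposed to produce. DAGMan treats a non-zero POST script exit code as a node failure, and a `RETRY` line in the DAG file then reruns the node.

```python
import os
import sys

# Hypothetical DAG file lines that would wire this script in:
#   JOB A myjob.sub
#   SCRIPT POST A check_output.py $RETURN myjob.out
#   RETRY A 3
# $RETURN is the DAGMan macro for the job's exit code.


def check_output(return_code, output_path):
    """Return 0 if the job looks successful, 1 if it should be retried."""
    if return_code != 0:
        return 1  # the job itself reported failure
    if not os.path.isfile(output_path):
        return 1  # output file never appeared
    if os.path.getsize(output_path) == 0:
        return 1  # output file is empty, likely a silent failure
    return 0


def main(argv):
    """Parse POST script arguments: RETURN_CODE OUTPUT_FILE."""
    if len(argv) != 3:
        print("usage: check_output.py RETURN_CODE OUTPUT_FILE", file=sys.stderr)
        return 2
    return check_output(int(argv[1]), argv[2])

# When installed as a POST script, finish with: sys.exit(main(sys.argv))
```

Checking the output as well as the exit code catches jobs that exit cleanly on a misbehaving machine but produce nothing useful.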

  13. Challenge: Bandwidth
      • Solutions
        • Only send what you need
        • Store large, shared files in our web cache
        • Read small amounts of data on the fly
          • Condor’s standard universe
          • Parrot http://www.cse.nd.edu/~ccl/software/parrot/
