Open Science Grid: More compute power
Alan De Smet  chtc@cs.wisc.edu
CHTC Cores In Use (CPU days each day, averaged over one month): 1,500
OSG Cores In Use (CPU days each day, averaged over one month): 60,000
CHTC and OSG usage (CPU days each day)
Challenges Solved
We worry about all of this. You don’t have to.
• Authentication
  • X.509 certificates, certificate authorities, VOMS
• Interface
  • Globus, GridFTP, Grid universe
• Validation
  • Linux distribution, glibc version, basic libraries
Using OSG
• Before:

  universe = vanilla
  executable = myjob
  log = myjob.log
  queue
Using OSG
• After:

  universe = vanilla
  executable = myjob
  log = myjob.log
  +WantGlidein = true
  queue
Challenge: Opportunistic
• OSG computers go away without notice
• Solutions
  • Condor restarts automatically
  • Sub-hour jobs
  • Self-checkpointing
  • Automated checkpointing (see the sketch after this list)
    • Condor’s standard universe
    • DMTCP http://dmtcp.sourceforge.net/
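For a job using Condor’s automated checkpointing, the submit file changes very little. A minimal sketch, assuming the program (here called myjob, a placeholder) has been relinked with condor_compile so the standard universe can checkpoint it and resume it elsewhere after an eviction:

  # Relink the program against Condor's checkpointing libraries first:
  #   condor_compile gcc -o myjob myjob.c

  universe = standard
  executable = myjob
  log = myjob.log
  queue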
Challenge: Local Software
• Bare-bones Linux systems
• Solution
  • Bring everything with you (see the sketch after this list)
  • CHTC-provided MATLAB and R packages
  • RunDagEnv/mkdag
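One way to bring everything with you is to have Condor transfer a self-contained software bundle along with the job. A minimal sketch (the wrapper script and tarball names are placeholders, not the actual CHTC MATLAB/R packages):

  universe = vanilla
  executable = run_job.sh
  # Ship the software bundle and the analysis code with the job.
  transfer_input_files = r-bundle.tar.gz, my_analysis.R
  should_transfer_files = YES
  when_to_transfer_output = ON_EXIT
  log = myjob.log
  +WantGlidein = true
  queue

The wrapper script unpacks r-bundle.tar.gz on the worker node and then runs the analysis, so the job never depends on software installed at the OSG site.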
Challenge: Erratic Failures
• Complex systems fail sometimes
• Solution
  • Expect failures and automatically retry
  • DAGMan for retries (see the sketch after this list)
  • DAGMan POST scripts to detect problems
  • RunDagEnv/mkdag
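The retry-and-verify pattern maps directly onto a DAGMan input file. A minimal sketch (node, submit-file, and script names are placeholders):

  # Re-run the node up to 3 times; the POST script inspects the
  # output and returns non-zero if something looks wrong.
  JOB analyze analyze.submit
  SCRIPT POST analyze check_output.sh
  RETRY analyze 3

Submitting this with condor_submit_dag gives automatic retries without changing the job itself.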
Challenge: Bandwidth
• Solutions
  • Only send what you need
  • Store large, shared files in our web cache (see the sketch after this list)
  • Read small amounts of data on the fly
    • Condor’s standard universe
    • Parrot http://www.cse.nd.edu/~ccl/software/parrot/
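As one illustration of pulling a large shared input through a web cache, a submit file can name that file by URL so an HTTP proxy near the execute site can serve it. A minimal sketch, assuming a Condor version with URL file transfer and a hypothetical cache URL:

  universe = vanilla
  executable = myjob
  # The large shared file comes via the web cache; the small
  # per-job input travels with the job as usual.
  transfer_input_files = http://cache.example.wisc.edu/shared/reference.tar.gz, params.txt
  should_transfer_files = YES
  when_to_transfer_output = ON_EXIT
  log = myjob.log
  +WantGlidein = true
  queue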