This presentation covers setting up and managing the UMD Tier-3 cluster: submitting Condor and CRAB jobs, running CMSSW, configuring gLite-UI and CRAB, using BeStMan as a storage element, monitoring with Ganglia and RSV, transferring data with PhEDEx, and tips and tricks for system administration.
UMD T3 experiences UMD Tier-3 experiences Malina Kirn
What are your service needs?
• Basic cluster:
  • Submit Condor jobs to cluster
  • Submit CRAB jobs to grid
  • Run CMSSW
  • Download data registered in DBS (PhEDEx & srm client)
• Computing element:
  • Service CRAB production jobs
• Storage element:
  • Service all CRAB jobs
UMD cluster basics
• Configuration
  • 1 head node (HN), 8 worker nodes (WNs), ~9 TB disk array
  • HN = Rocks HN, CE & SE (obviously not scalable)
  • WNs = 7 interactive WNs + 1 PhEDEx WN
  • Disk array: RAID-6, XFS, logical volume, network-mounted from the HN (direct-attached storage)
• Cluster management: Rocks
  • Free, with software rolls such as Ganglia and Condor
  • “Clean reinstall” model for WN management
• Network
  • All nodes have internal and external network connections
  • Scalable, but some view it as risky
gLite-UI & CRAB
• gLite-UI (EDG utils) is somewhat necessary for CRAB
  • CRAB now offers CrabServer, which does not have to be installed at your site (direct users to set server_name=bari in crab.cfg)
• gLite-UI cannot be installed on a Rocks HN, and probably not on an OSG CE or SE
• gLite-UI configuration is a challenge; work from an example (not the template)
• Links:
  • gLite-UI: 1, 2, 3, 4
  • YAIM
  • CRAB
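A minimal sketch of the server_name setting mentioned above, using Python's standard configparser to add it to an existing crab.cfg. The slide only gives the key and value; placing it in a [CRAB] section is an assumption here, and the sample file contents are illustrative.

```python
import configparser
import io


def set_crab_server(cfg_text, server="bari", section="CRAB"):
    """Return crab.cfg text with server_name set.

    The section name "CRAB" is an assumption; the slide only states
    that users should set server_name=bari in crab.cfg.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(cfg_text)
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, "server_name", server)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    # Illustrative starting file, not taken from the slides.
    sample = "[CRAB]\njobtype = cmssw\nscheduler = glite\n"
    print(set_crab_server(sample))
```

Existing options are left in place; only server_name is added or overwritten.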
CMSSW
• CMSSW versions can be installed and removed automatically via OSG utilities
  • ‘Production releases’ of CMSSW
  • Email Bockjoo Kim
• Alternatively, install manually, create a link named <OSG APP>/cmssoft/cms & edit <OSG APP>/etc/grid3-locations.txt
• Frontier DB queries require a Squid web proxy
• Support for CRAB jobs requires site-local-config.xml & storage.xml (examples)
Site planning
BeStMan storage element (SE)
• Lightweight, easy to install, configure, and use
• Will manage files for you or provide a gateway to your existing file system
• OSG also supports BeStMan on top of XrootdFS (requires two additional nodes, minimum)
• An OSG guide for BeStMan on XrootdFS is coming out; there is already an OSG guide for just BeStMan (you will want to set your own configuration options)
• Getting BeStMan to work with the FNAL srm-client requires using special tags in calls or editing $SRM_CONFIG
  • webservice_path=srm/v2/server (.wsdl?)
  • access_latency=ONLINE
  • pushmode=true
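A rough sketch of the "special tags in calls" option, building an srm-client command line from the three settings above. The client name srmcp, the -key=value flag form, and both URLs are assumptions for illustration; the slide itself is unsure whether webservice_path needs a .wsdl suffix, so the value is copied without it. The command is only assembled and printed, not run.

```python
import shlex

# Settings as listed on the slide.
BESTMAN_TAGS = {
    "webservice_path": "srm/v2/server",
    "access_latency": "ONLINE",
    "pushmode": "true",
}


def build_srm_command(source_url, dest_url, tags=None, client="srmcp"):
    """Return an argument list for an srm-client copy.

    Assumes the client accepts options as -key=value; check your
    client's documentation before relying on this form.
    """
    tags = BESTMAN_TAGS if tags is None else tags
    args = [client]
    args += [f"-{key}={value}" for key, value in tags.items()]
    args += [source_url, dest_url]
    return args


if __name__ == "__main__":
    # Hypothetical endpoint and paths, for illustration only.
    cmd = build_srm_command(
        "srm://se.example.edu:8443/store/user/somefile.root",
        "file:////tmp/somefile.root",
    )
    print(shlex.join(cmd))
```

Keeping the tags in one dictionary makes it easy to move them into $SRM_CONFIG later, so users do not have to repeat them on every call.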
Monitoring
• Ganglia for cluster monitoring, comes with Rocks
• RSV for OSG monitoring, comes with OSG
• We don’t use SAM
  • SAM tests CMS-specific details, very nice! We use CRAB.
  • Enables participation in official production
  • SAM tests for BeStMan SEs are under development
PhEDEx
• You will probably want a “PhEDEx node” in addition to your OSG CE & SE node(s)
• Transfer public DBS data to or from your site
  • To site: does not require SE
  • To site & shown as host in DBS: requires SE
  • From site: requires dCache SE or a special PhEDEx client just for you at the receiving site
• PhEDEx can run atop gLite-UI
  • gLite-UI required for advanced protocols
  • Otherwise uses srm
• Also requires storage.xml, which can be different from CMSSW’s storage.xml
Tricks
• Always back up your OSG installation before any upgrade! pacman allows easy rollback of software from backup.
  • Use cp -p: permissions in the OSG directory are important
  • Use soft links on your first install, then you can move it around for upgrades and fixes
• Set shell for grid users to /bin/true
• Deter brute-force ssh attacks (we use DenyHosts)
• Keep a detailed log
• Write a user guide
• Train a backup admin
• Email OSG CMS Tier-3 hypernews to get help
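The backup and soft-link tricks can be sketched in Python as follows. This is an illustration only, not the procedure used at UMD: the directory names are hypothetical, and shutil.copytree preserves permission bits and timestamps but not file ownership, which cp -p also keeps when run as root.

```python
import os
import shutil
import tempfile
from pathlib import Path


def backup_install(install_dir, backup_root, label):
    """Copy an installation tree, keeping permission bits and timestamps.

    copytree copies files with shutil.copy2 by default, which carries
    over file metadata; symlinks=True keeps soft links as links.
    """
    dest = Path(backup_root) / label
    shutil.copytree(install_dir, dest, symlinks=True)
    return dest


def repoint_link(link_path, new_target):
    """Point a soft link at a new target by renaming a temporary link."""
    tmp = Path(str(link_path) + ".tmp")
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    tmp.symlink_to(new_target)
    os.replace(tmp, link_path)  # rename replaces the old link in one step


if __name__ == "__main__":
    # Hypothetical layout in a scratch directory.
    root = Path(tempfile.mkdtemp())
    install = root / "osg-1"
    install.mkdir()
    script = install / "setup.sh"
    script.write_text("# placeholder\n")
    script.chmod(0o750)

    saved = backup_install(install, root / "backups", "pre-upgrade")
    repoint_link(root / "osg", install)  # users and services follow the link
    print(oct((saved / "setup.sh").stat().st_mode & 0o777))
    print(os.readlink(root / "osg"))
```

Because everything refers to the link rather than the real directory, rolling back after a failed upgrade is a matter of repointing the link at the restored copy.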