
Discussion Points on Deployment, Support, Validation, Monitoring, Optimization, Maintenance, and Information Systems

This discussion covers deployment, Tier 2 support, validation, monitoring, optimization, maintenance, and information systems for the LFC and DPM from a site perspective. Topics include local catalogs, YAIM configuration, validation and monitoring tools, optimization strategies, maintenance considerations, and Glue information publishing for Tier 2 SEs.

Presentation Transcript


  1. DPM and LFC: Discussion Points from a Site Perspective Graeme Stewart, GridPP, University of Glasgow

  2. Outline of Discussion Points • Deployment • Supporting Tier 2s • Validation and Monitoring • Does it work? • Optimisation • Does it work well? • Maintenance and Utilities • Will it stay working? • Information Systems • How do I tell you what I’m offering?

  3. Deployment: LFC Local Catalogs • Deployed through the YAIM installer at most (Tier2) sites • Configuration very easy: LFC_HOST=my-lfc.$DOMAIN LFC_DB_PASSWORD=XXXXXXX LFC_CENTRAL="" LFC_LOCAL="" • Installing by hand is easy – twiki instructions • Add your LFC to BDII_REGIONS on the CE to publish your catalog (sketched below)
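
As an illustration (the region name, port and base DN are assumptions and may vary with the YAIM/BDII version in use), publishing a local LFC typically means adding an LFC entry to the site BDII configuration in site-info.def on the CE:

    BDII_REGIONS="CE SE LFC"     # append LFC to the existing region list
    BDII_LFC_URL="ldap://my-lfc.$DOMAIN:2170/mds-vo-name=local,o=grid"
    # port 2170 if the LFC runs a resource BDII, 2135 if it runs a GRIS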

  4. Deployment: DPM • YAIM install for Tier2s. Functionality improved in 2.7.0 – now handles DPM disk servers with multiple filesystems. DPMDATA="/gridstore0" DPMMGR=dpmmgr DPMUSER_PWD=XXXX DPMFSIZE=200M DPM_HOST="dpm-admin.$MY_DOMAIN" DPMPOOL=myPool DPMPOOL_NODES="svr1.$MY_DOMAIN:/gridstore0 svr1.$MY_DOMAIN:/gridstore1 svr2.$MY_DOMAIN:/bigdisk" • Conversion of a Classic SE is possible – an excellent migration route for smaller sites. • Improvements: • Simplify YAIM (remove DPMDATA), rename DPMPOOL_NODES to DPM_FILESYSTEMS. • The service is now straightforward to set up.
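
For reference, the pool layout that YAIM creates can also be built or checked by hand with the DPM administration commands (pool, server and filesystem names below are placeholders, and this assumes DPM_HOST is set and your DN is registered as a DPM manager):

    dpm-addpool --poolname myPool --def_filesize 200M
    dpm-addfs --poolname myPool --server svr1.$MY_DOMAIN --fs /gridstore0
    dpm-addfs --poolname myPool --server svr1.$MY_DOMAIN --fs /gridstore1
    dpm-addfs --poolname myPool --server svr2.$MY_DOMAIN --fs /bigdisk
    dpm-qryconf     # lists pools, filesystems, capacity and free space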

  5. Deployment: Tier 2 Support • Deployment of SRMs is not trivial – especially if things go wrong! (daemon headcount: Classic SE 1, DPM 6) • Good support relationship between Tier2s and Tier1 essential • Tier1 can have experts to help Tier2s • Tier2s also build up a support community • UKI Example • Storage group at RAL • Mailing list and weekly phone conferences • Wiki to collect documentation
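
As a rough illustration of that daemon headcount (service names are typical for a DPM of this era; the exact set differs between head node and disk server), a quick health check loops over the DPM services:

    for svc in dpm dpnsdaemon srmv1 srmv2 rfiod dpm-gsiftp; do
        service $svc status
    done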

  6. Validation • Local LFC • Not yet monitored in any SFT • Nameserver functionality can be tested • Sites need a tool to check registrations: lcg-cr --catalog=MY_LFC ? • DPM • Monitored through SFTs, but only indirectly • Sites can conduct tests by hand (e.g., http://wiki.gridpp.ac.uk/wiki/DPM_Testing; see the sketch below) • Often difficult to tell where an SFT rm test actually fails • Multiple SEs?
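
A minimal by-hand check of both services (VO, host names and paths below are placeholders) is to copy-and-register a test file against the local SE and catalogue, list its replicas, and clean up; the local LFC is selected through the LFC_HOST environment variable:

    export LFC_HOST=my-lfc.$DOMAIN
    lcg-cr --vo dteam -d dpm-admin.$MY_DOMAIN \
           -l lfn:/grid/dteam/tests/site-check-$$ file:/tmp/testfile
    lcg-lr --vo dteam lfn:/grid/dteam/tests/site-check-$$       # list replicas via the LFC
    lcg-del --vo dteam -a lfn:/grid/dteam/tests/site-check-$$   # remove replicas and catalogue entry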

  7. Optimisation • The problem of deploying DPM is mostly solved. • But can Tier2s provide the performance the experiments want through DPM? • Do we know what this is? srm_write_rate(KSI2K, VO) (a rough probe is sketched below) • Wrong choices at deployment are more costly to recover from once the service is established: • Running the DPM head node too hard can cause FTS failures • Better to isolate SRM daemons from disk servers
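
No agreed metric exists yet, but a very rough write-rate probe (purely illustrative; the VO, SE host and 1 GB test file are assumptions) can be scripted around lcg-cr:

    dd if=/dev/zero of=/tmp/probe-1g bs=1M count=1024
    START=$(date +%s)
    lcg-cr --vo dteam -d dpm-admin.$MY_DOMAIN file:/tmp/probe-1g
    END=$(date +%s)
    echo "approx write rate: $(( 1024 / (END - START) )) MB/s"   # crude, single-stream figure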

  8. Maintenance • LFC and DPM both hold metadata in their databases – different, e.g., from the Classic SE • Databases must be backed up (an example dump is sketched below). • Frequency/recovery time? • What are the experiments' expectations of T2 storage? • Are we saying anything about the T2s' hardware yet? • For both services, ensuring certificates remain valid is still a headache
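
As an illustration (database names below are the usual MySQL defaults, cns_db for the nameserver and dpm_db for the DPM pool/request state, but may differ per site), a nightly dump plus a certificate expiry check covers the two headaches above:

    mysqldump -u root -p --databases cns_db dpm_db > /backup/dpm-$(date +%F).sql   # DPM head node
    mysqldump -u root -p --databases cns_db > /backup/lfc-$(date +%F).sql          # LFC server
    openssl x509 -in /etc/grid-security/hostcert.pem -noout -enddate               # host cert expiry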

  9. DPM: Care and Feeding As DPM rolls out to T2s we move from deployment to service maintenance. What tools do sites need from the DPM developers? • Filesystem draining utilities • Allows reconfiguration of the DPM: removal of servers for maintenance, etc. • Per-VO quotas • Reserved pools are probably not flexible enough • Database tools • Removal of files from a dead filesystem • Load balancing • Better filesystem selection algorithms to express hardware differences known to the sites
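
Until dedicated draining tools appear, one interim workaround (assuming the filesystem status flag behaves as documented for your DPM release) is to mark a filesystem read-only so no new files land on it, then migrate its contents by hand:

    dpm-modifyfs --server svr1.$MY_DOMAIN --fs /gridstore0 --st RDONLY
    dpm-qryconf     # confirm the filesystem now shows as read-only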

  10. Tier2 SEs and SRM: Glue sticks it together • Currently Tier2s publish 1 SA per VO. • The GlueSAType attribute refers only to the SRM storage model (permanent, durable, volatile). • GlueSEArchitecture hints at the underlying hardware (disk, multidisk, tape) but is too vague for the experiments. And it may vary for each SA! • Should we publish SEs as abstract Glue services? Might help with debugging? • Should we add additional fields describing the storage's "durability" (low, medium, high, archival?)? This will need to be per SA – no such field exists in Glue 1.2. (ref: GDB presentations from Jeff and Laurence). • Can T2s usefully advertise volatile space to the experiments?
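
For reference (the BDII host, port and base DN are placeholders; a site BDII normally publishes under mds-vo-name=<SITE>,o=grid), the attributes discussed above can be inspected with an LDAP query:

    ldapsearch -x -LLL -H ldap://site-bdii.$MY_DOMAIN:2170 -b "mds-vo-name=local,o=grid" \
        '(objectClass=GlueSA)' GlueSAType GlueSARoot
    ldapsearch -x -LLL -H ldap://site-bdii.$MY_DOMAIN:2170 -b "mds-vo-name=local,o=grid" \
        '(objectClass=GlueSE)' GlueSEArchitecture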
