
McFarm Improvements and Re-processing Integration

D. Meyer for The UTA Team
DØ SAR Workshop, Oklahoma University, 9/26 - 9/27/2003
http://www-hep.uta.edu/~d0race/McFarm/McFarm.html


Presentation Transcript


  1. McFarm Improvements and Re-processing Integration
     D. Meyer for The UTA Team
     DØ SAR Workshop, Oklahoma University, 9/26 - 9/27/2003
     http://www-hep.uta.edu/~d0race/McFarm/McFarm.html

  2. Reasons for Using McFarm
     • McFarm is DØ Monte Carlo (MC) control software developed at UTA and used in six farms
     • Simplifies Monte Carlo production
     • Manages the cluster with minimum labor
     • Manages the cluster efficiently
     • Minimizes the impact of changes to SAM, mc_runjob, and other DØ software
     • User-oriented
     McFarm Improvements; D. Meyer, UTA DØ SAR Workshop, OU

  3. McFarm Software Integration
     • DØ Binaries - minitars or full release
     • SAM - declaration, storage, retrieval
     • mc_runjob - job and metadata construction
     • NFS - access to binaries, minbias database
     • NIS - account management
     • ssh - intra-cluster monitoring and control
     • Batch Queues - PBS and Condor

  4. Improvements: Procedural Changes
     • Request monitoring, including closing requests, is now handled by the McFarm monitor
     • "check_sam" is now obsolete, replaced by the archive daemon and store verification
     • Mechanism to handle too-large reco tasks: run only the pythia/d0g/sim stages (PDS jobs) and let the requester run reco on CAB

  5. Bug Fixes
     • Event count is now correct when not all events are done
     • Available space is now reported correctly on NFS-mounted disks (df command)
     • No longer attempts to patch metadata for bad keywords
     • Others

  6. Enhancements
     • Two-day grace period for final tmb merge now configurable: FARM_MERGE_GRACE_HOURS
     • Also FARM_MERGE_MAX_EVENTS and FARM_MERGE_MAX_FILES
     • Monitor reassurance can be turned off: FARM_MONITOR_REASSURE='NO'
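
The settings on this slide could be read from the environment roughly as follows. This is only a sketch, not McFarm's actual code: the function names are hypothetical, the 48-hour default is taken from the "two-day grace period" wording, and it is an assumption that the grace period counts hours waited before the final merge is forced. No defaults are known for the two MAX settings, so they are left unset here.

```python
import os


def read_merge_settings(env=None):
    """Collect the merge/monitor settings named on the slide (hypothetical helper)."""
    env = os.environ if env is None else env

    def optional_int(name):
        # Defaults for the MAX settings are not given on the slide, so use None.
        raw = env.get(name)
        return int(raw) if raw not in (None, "") else None

    return {
        # Assumed default: the slide describes a two-day (48 h) grace period.
        "grace_hours": float(env.get("FARM_MERGE_GRACE_HOURS", 48)),
        "max_events": optional_int("FARM_MERGE_MAX_EVENTS"),
        "max_files": optional_int("FARM_MERGE_MAX_FILES"),
        # Reassurance stays on unless explicitly set to NO.
        "reassure": env.get("FARM_MONITOR_REASSURE", "YES").strip("'\"").upper() != "NO",
    }


def grace_expired(hours_waited, settings):
    """True once the (assumed) waiting time for the final tmb merge has run out."""
    return hours_waited >= settings["grace_hours"]
```

Passing a plain dictionary instead of the real environment, for example `read_merge_settings({"FARM_MERGE_GRACE_HOURS": "12"})`, makes the behaviour easy to check without touching a running farm.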

  7. Enhancements - 2
     • FARM_SAM_VERIFY_STORE_HOURS makes storing and purging separate events in McFarm, handled by the "Archive" daemon
     • SAM store retry improved - will undeclare and cancel-store as necessary
     • New one-time command: bin/onetime/re-store full-job-dir-name
     • SAM gather will get to merger files periodically even if busy with regular stores
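
The improved store retry can be pictured as a loop that cleans up after each failed attempt before trying again. The sketch below is an illustration under stated assumptions, not the McFarm implementation: `store`, `cancel_store`, and `undeclare` are hypothetical callables standing in for the corresponding SAM operations, failures are assumed to surface as `RuntimeError`, and the cleanup is done unconditionally here although the slide says "as necessary".

```python
def store_with_retry(store, cancel_store, undeclare, max_attempts=3):
    """Attempt a store, cleaning up and retrying after each failure.

    All three callables are hypothetical stand-ins for SAM operations.
    Returns the attempt number that succeeded, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            store()
            return attempt
        except RuntimeError:
            # Clean up the failed attempt so the next store starts from scratch.
            cancel_store()
            undeclare()
    return None
```

A caller could treat a `None` result as the signal to leave the job directory in place for a later manual re-store.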

  8. Enhancements - 3
     • Request life-cycle monitoring is handled by the McFarm monitor to improve turn-around
     • Number of events now recorded in gather.log
     • Execute daemon detects a reco stall caused by over-swapping and kills the job
     • Execute daemon retains job history even when stopped/restarted (job.hist file)

  9. Enhancements - 4
     • launch_request accepts PDS (and S) jobs to handle unwieldy reco requests - it stores the sim files
     • purge_job accepts a "--d0phase=mcpNN" argument to purge archives by D0 phase, including merger archives
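
As an illustration of the `--d0phase=mcpNN` option, the following sketch shows how such an argument might be parsed and validated. It is not the real purge_job code: the parser layout is hypothetical, and treating `NN` as exactly two digits is an assumption based on the slide's notation.

```python
import argparse
import re


def d0phase(value):
    """Accept values of the form mcpNN (two digits assumed)."""
    if not re.fullmatch(r"mcp\d{2}", value):
        raise argparse.ArgumentTypeError(f"expected mcpNN, got {value!r}")
    return value


def build_parser():
    """Hypothetical command-line parser for a purge_job-like tool."""
    parser = argparse.ArgumentParser(prog="purge_job")
    parser.add_argument(
        "--d0phase",
        type=d0phase,
        default=None,
        help="purge archives belonging to this D0 phase, e.g. mcp14",
    )
    return parser
```

With this parser, `build_parser().parse_args(["--d0phase=mcp14"])` yields a namespace whose `d0phase` attribute is the string `mcp14`, while a malformed value makes argparse exit with an error message.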

  10. Re-Processing Real Data
     • Basic approach is to run the d0reco binary only, using the reprocessing framework rcp, with a raw or reco file as input
     • Joel Snow traced the rcp usage in the UMICH job
     • Mark Sosebee has processed a sample by hand and analyzed the histograms - so far so good
     • Dave Evans has included re-processing support in version 06-00-02 of mc_runjob

  11. Re-Processing Real Data - 2
     • I have used mc_runjob v06 to manually re-process both MC reco files and raw files. Testing is continuing; at present there are some problems with metadata declaration.
     • The bad news: mc_runjob v06 contains substantial changes to job structure and execution that will require days of work to integrate into McFarm and to test all code

  12. Re-Processing Real Data - 3
     • Our approach is to have the Request_NNNN.py file include a SAM dataset definition of the files to be re-processed, and feed it into McFarm just like a Monte Carlo request
     • Some of McFarm is ready (SAM acquire), some is not (launch_request RT, v06 adaptation, switch from events to files)
     • Dave Evans is leaving

  13. Re-Processing Real Data - 4
     • Key contents of Request_REPROC02.py: 'Reconstructed':{'datasetdefname':'reco_14.02.00_raw_2_files', 'frameworkrcpname':'runD0recoSAM_data_reprocess_p13dst.rcp',},
     • launch_request REPROC02 /home/mctest 0 RT
     • job UTATEST-RT-ReqREPROC02-03265220712
     • It runs under mc_runjob v05 / McFarm v10.04, but does not yet produce proper metadata.
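
Written out as ordinary Python, the request fragment above could look like the dictionary below. Only the `'Reconstructed'` section and its two keys come from the slide; the enclosing variable name `request`, the `REQUIRED_KEYS` tuple, and the checking helper are hypothetical additions for illustration, and the rest of a real Request_NNNN.py file is not shown here.

```python
# Hypothetical container; only the 'Reconstructed' section is taken from the slide.
request = {
    "Reconstructed": {
        "datasetdefname": "reco_14.02.00_raw_2_files",
        "frameworkrcpname": "runD0recoSAM_data_reprocess_p13dst.rcp",
    },
}

# Keys assumed to be needed before a re-processing request can be launched.
REQUIRED_KEYS = ("datasetdefname", "frameworkrcpname")


def missing_reprocessing_keys(req):
    """Return the required keys that are absent or empty in the request."""
    section = req.get("Reconstructed", {})
    return [key for key in REQUIRED_KEYS if not section.get(key)]
```

A check like `missing_reprocessing_keys(request)` returning an empty list would be one simple guard to run before handing the request to launch_request.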

  14. Re-Processing Real Data: To Be Done
     • mc_runjob v06 must be debugged and released
     • McFarm must be adapted to v06 and debugged
     • Metadata must be stabilized and accepted by SAM
     • Re-processing authority should use an MC-like Request_NNNN.py to invoke re-reco

  15. Conclusions
     • McFarm has morphed significantly since its creation to accommodate:
        • Enhanced error handling
        • Enhanced monitoring
        • Other improvements
     • Re-processing capability is in the works, despite some worries about schedule and support
     • IAC's use and comments prompted McFarm improvements (Thank you everyone!!)
     • Comments always appreciated
