
p20 Reprocessing

p20 Reprocessing. Tibor Kurča IPN Lyon. Introduction Computing Resources Architecture Operational Model Technical Issues Operational Issues - Status. Introduction. Goal: reprocess ~500 M RunIIb events (83 TB) with newly calibrated detector & improved

Presentation Transcript


  1. p20 Reprocessing
     Tibor Kurča, IPN Lyon
     • Introduction
     • Computing Resources
     • Architecture
     • Operational Model
     • Technical Issues
     • Operational Issues - Status
     T.Kurca - D0 Meeting FNAL

  2. Introduction
     • Goal: reprocess ~500 M RunIIb events (83 TB) with the newly calibrated detector & improved reconstruction software by the end of March '07
     • Where: SAMGrid/OSG & native SAMGrid (CCIN2P3)
     • Issues: SAMGrid-OSG is a new environment
       - missing experience
       - fast problem identification → solution?
       - importance of organization and preparation → efficiency?
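As a rough consistency check of the goal, the figures above can be turned into a throughput budget. This is a back-of-envelope sketch, not part of the original plan: it assumes a 90-day production window (1 January to 31 March 2007, reading the "early January" start on the status slide and the end-of-March deadline literally), the 2000 CPUs from the resources slide, and decimal units (1 TB = 10^12 bytes).

```python
# Back-of-envelope throughput budget for the p20 reprocessing goal.
# From the slides: ~500 M events, 83 TB, ~2000 CPUs needed.
# ASSUMPTION: 90-day production window (1 Jan to 31 Mar 2007), decimal units.

EVENTS = 500e6        # RunIIb events to reprocess
DATA_BYTES = 83e12    # 83 TB
CPUS = 2000           # CPUs stated as needed
WINDOW_DAYS = 90      # assumed production window


def budget(events, data_bytes, cpus, days):
    """Average event size, required event rate, CPU time per event, input rate."""
    seconds = days * 86400
    return {
        "event_size_kB": data_bytes / events / 1e3,
        "events_per_s": events / seconds,
        "cpu_s_per_event": cpus * seconds / events,
        "input_MB_per_s": data_bytes / seconds / 1e6,
    }


if __name__ == "__main__":
    for key, value in budget(EVENTS, DATA_BYTES, CPUS, WINDOW_DAYS).items():
        print(f"{key:>16}: {value:.1f}")
```

Under these assumptions the average event is about 166 kB, and all 2000 CPUs running continuously leave roughly 31 CPU-seconds per event; downtime or CPUs that are not actually delivered shrink that budget proportionally. The average raw input rate, about 10.7 MB/s (roughly 85 Mbit/s), is well below a single 1 Gb link, although it ignores output and any files shipped with each job.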

  3. Computing Resources
     • Needs: 2000 CPUs, 4 TB disk cache, 1 Gb links
     • Available:
       Site         #CPU (not guaranteed)    Disk cache
       Oklahoma U   250                      1 TB
       Indiana U    250                      1 TB
       Nebraska U   250
       CMS-FNAL     250
       Fermilab                              4 TB
       NERSC        250
       SPRACE       250
       Purdue       ?
       Florida      ?
       CCIN2P3      500 (non-OSG, SAMGrid)
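The resource table can be tallied against the stated needs. The sketch below only encodes the slide's numbers (0 where the slide lists nothing for a column, None where it shows "?"); the data layout and the `tally` function are illustrative, not an existing tool.

```python
# Tally the resources listed on the "Computing Resources" slide against the needs.
# 0 = nothing listed for that column; None = shown as "?" on the slide.

NEEDED_CPUS = 2000
NEEDED_DISK_TB = 4

# (site, CPUs offered (not guaranteed), disk cache in TB)
SITES = [
    ("Oklahoma U", 250, 1),
    ("Indiana U", 250, 1),
    ("Nebraska U", 250, 0),
    ("CMS-FNAL", 250, 0),
    ("Fermilab", 0, 4),
    ("NERSC", 250, 0),
    ("SPRACE", 250, 0),
    ("Purdue", None, 0),
    ("Florida", None, 0),
    ("CCIN2P3", 500, 0),  # non-OSG, native SAMGrid
]


def tally(sites):
    """Sum the known CPU and disk figures; list sites whose CPU count is still "?"."""
    cpus = sum(c for _, c, _ in sites if c is not None)
    disk_tb = sum(d for _, _, d in sites)
    open_sites = [name for name, c, _ in sites if c is None]
    return cpus, disk_tb, open_sites


if __name__ == "__main__":
    cpus, disk_tb, open_sites = tally(SITES)
    print(f"CPUs listed:       {cpus} of {NEEDED_CPUS} needed")
    print(f"Disk cache listed: {disk_tb} TB of {NEEDED_DISK_TB} TB needed")
    print("CPU count open:   ", ", ".join(open_sites))
```

The listed CPUs add up to exactly the 2000 needed, but only if every non-guaranteed offer, including the 500 CCIN2P3 CPUs, is fully delivered; Purdue and Florida are the only listed sites that could add margin.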

  4. Basic Architecture SAM-Grid / OSG
     [Diagram labels: Flow of Job Submission; Offers services to …; SAM-Grid; Forwarding Node; OSG; New SAM; SAM-Grid VO-Specific Services]
     • Main issues to track down:
       • Accessibility of the services
       • Usability of the resources
       • Scalability

  5. Current Configuration
     [Diagram: job flow through forwarding nodes (SAM-Grid FW, FNAL FW) to clusters (C) at SPRACE, CMS, UNL, IU, NERSC and OU, with SAM stagers (stg) and VO-services (S). Legend: network boundaries, forwarding node, LCG cluster, SAM stager, VO-service (SAM), job flow, offers service (new SAM)]

  6. Operational Model
     • Production & merging:
       - production: reconstruction at each site → unmerged TMBs at FNAL
       - merging preferentially at FNAL
     • Organization:
       - define submitter teams of 2 people each
       - assign datasets to each team
       - define the primary and secondary OSG clusters to which each team should submit
       - submission from a central UI installed on d0mino
     • Solving operational problems:
       - multilevel expertise
       - identify the problem: SAMGrid or OSG
       - SAMGrid: contact d0_reprocessing
       - OSG, official way: open a ticket at GOC-Indiana
       - contact local administrators directly (if the first way does not work)
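The organization rules (teams of two, datasets assigned per team, a primary and a secondary OSG cluster per team) can be modeled in a few lines, e.g. to keep a submission plan consistent. This is a hypothetical sketch: the `Team` class, the round-robin dataset assignment and the example cluster pairings are assumptions for illustration; the slide does not say how datasets or clusters were actually distributed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Team:
    """A submitter team: two people, their OSG clusters and the datasets they own."""
    members: tuple[str, str]
    primary: str                 # primary OSG cluster for this team
    secondary: str               # fallback OSG cluster
    datasets: list[str] = field(default_factory=list)

    def target_cluster(self, usable: set[str]) -> Optional[str]:
        """Submit to the primary cluster if usable, else the secondary, else nowhere."""
        if self.primary in usable:
            return self.primary
        if self.secondary in usable:
            return self.secondary
        return None  # neither cluster usable: report the problem instead


def assign_datasets(datasets: list[str], teams: list[Team]) -> None:
    """Round-robin assignment (an assumption) so every dataset has one owning team."""
    for i, dataset in enumerate(datasets):
        teams[i % len(teams)].datasets.append(dataset)


if __name__ == "__main__":
    # Hypothetical members, datasets and cluster pairings, for illustration only.
    teams = [
        Team(("submitter A", "submitter B"), primary="OU", secondary="IU"),
        Team(("submitter C", "submitter D"), primary="NERSC", secondary="SPRACE"),
    ]
    assign_datasets(["ds-01", "ds-02", "ds-03"], teams)
    usable = {"IU", "NERSC", "SPRACE"}  # e.g. OU temporarily unusable
    for team in teams:
        print(team.members, team.datasets, "->", team.target_cluster(usable))
```

Giving each dataset exactly one owning team avoids duplicate submissions, and the explicit secondary cluster lets a team keep working when its primary site is down without negotiating a new target each time.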

  7. Multilevel Expertise
     • Submitters/shifters: submit, check logfiles → report problems
       - Mandy ROMINSKY, Sohrab HOSSAIN (University of Oklahoma)
       - Joseph STEELE (Louisiana Tech University)
       - Yanwen LIU (University of Science and Technology of China)
       - Dag GILLBERG, Zhiyi LIU (Simon Fraser University, Canada)
     • Experienced people: first aid
       - Daniel, Joel, Mike, Tibor … & others
     • SAMGrid experts: problem solving, intervention
       - Andrew, Gabriele, Parag
     • OSG experts / local administrators: OSG-related issues
       - … we have established contacts; … goc@opensciencegrid.org (subject should mention dzero reprocessing)
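The escalation levels, combined with the problem-solving rules of the operational model, can be written down as a small lookup, e.g. for a shifter checklist. This is a hypothetical encoding: `Layer` and `escalation_path` are illustrative names, and the assumption that every problem passes through shifters and first aid before reaching SAMGrid or OSG experts is a reading of the slides, not a stated rule.

```python
from enum import Enum


class Layer(Enum):
    """Where a failure was traced to: SAMGrid or OSG."""
    SAMGRID = "SAMGrid"
    OSG = "OSG"


# ASSUMPTION: every problem passes through these levels before being routed by layer.
FIRST_LEVELS = [
    "submitters/shifters: check logfiles, report the problem",
    "experienced people: first aid",
]


def escalation_path(layer: Layer) -> list[str]:
    """Ordered list of who handles a problem the previous level could not solve."""
    path = list(FIRST_LEVELS)
    if layer is Layer.SAMGRID:
        path.append("SAMGrid experts: contact d0_reprocessing")
    else:
        path.append("OSG, official way: ticket at GOC-Indiana, goc@opensciencegrid.org "
                    "(subject should mention dzero reprocessing)")
        path.append("local administrators, contacted directly if the ticket does not work")
    return path


if __name__ == "__main__":
    for layer in Layer:
        print(layer.value)
        for step, who in enumerate(escalation_path(layer), start=1):
            print(f"  {step}. {who}")
```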

  8. Technical Issues
     • Tests done:
       - OSG clusters & CCIN2P3 tested
       - deployment of storage queues done → working
     • To be done:
       - central UI installation on the d0mino node (jim_client, d0repro tools …)
       - most urgent: UI for all submitters
     • Issue: the binary input RTE file is ~600 MB and has to be shipped with each job, i.e. 2× the raw data file size. Reducing it is very desirable.
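A quick estimate shows why the RTE file size matters. This sketch assumes one raw data file per job (not stated on the slide), takes the raw file size as half the RTE file (from "2× the raw data file size") and uses the 83 TB total from the introduction, in decimal units.

```python
# Estimate the data shipped to worker nodes because of the ~600 MB RTE file.
# From the slides: RTE ~600 MB per job, ~2x the raw data file size, 83 TB total.
# ASSUMPTION: one raw data file per job; decimal units (1 TB = 1e6 MB).

RTE_MB = 600
RAW_FILE_MB = RTE_MB / 2   # "2x raw data file size" -> ~300 MB per raw file
TOTAL_RAW_TB = 83


def rte_overhead(total_raw_tb, raw_file_mb, rte_mb):
    """Return (number of jobs, TB of RTE shipped, RTE share of all input shipped)."""
    jobs = total_raw_tb * 1e6 / raw_file_mb
    rte_tb = jobs * rte_mb / 1e6
    share = rte_tb / (rte_tb + total_raw_tb)
    return jobs, rte_tb, share


if __name__ == "__main__":
    jobs, rte_tb, share = rte_overhead(TOTAL_RAW_TB, RAW_FILE_MB, RTE_MB)
    print(f"jobs: {jobs:,.0f}, RTE shipped: {rte_tb:.0f} TB, RTE share of input: {share:.0%}")
```

Under these assumptions about 277,000 jobs would move roughly 166 TB of RTE files on top of the 83 TB of raw data, so two thirds of the input bytes would be the RTE file itself.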

  9. Operational Issues - Status
     • Relevant information at http://www-d0.fnal.gov/computing/reprocessing/p20/
     • Grid certificates:
       - most of the submitters have their DOEGrids certificates
       - user account at NERSC: procedure started
     • Test runs:
       - submitter training → hands-on experience
         - job submission using d0repro tools
         - where to look for logs
         → to be done! … this Thursday? d0mino UI?
       - large-scale test (scalability issues?): early next week
     • Production start: early January?
