
Computing for CDF Status and Requests for 2003




Presentation Transcript


  1. Computing for CDF: Status and Requests for 2003. Stefano Belforte, INFN – Trieste

  2. The CDF-Italy Computing Plan • Presented on June 24, 2002 • Referees (and CSN1) postponed discussion/approval until November 2002: decide based on experience • Collecting experience now • No reason to modify the plan so far • Today: • Status report on the analysis farm at FNAL • Update on work toward de-centralization • GRID - CNAF • Progress toward MOU/MOF • Rationale for 2003 requests Stefano Belforte – INFN Trieste CDF computing

  3. Status of CAF • FNAL Central Analysis Farm (CAF): a big success so far • Easy to use • Effective • Convenient • Measure of success • 100% used now • Upgrade in progress • Many institutions spending their $$$ there • Cloning started (Korea)

  4. CDF Central Analysis Farm • Compile/link/debug everywhere • Submit from everywhere • Execute @ FNAL • Submission of N parallel jobs with a single command • Access data from CAF disks now • Access tape data via a transparent cache: soon → now • Get job output everywhere • Store small output on a local scratch area for later analysis • Access to the scratch area from everywhere • IT WORKS NOW • [Diagram: my desktop / my favorite computer → FNAL gateway → CAF: N job sections, scratch server, local data servers (NFS, rootd, ftp, switch), "a pile of PC’s"]

  5. Tape to Disk to CPU • [Plot: data read per day from disk (~2 TB/day) and from tape, days in September 2002] • “Spec. from 2000 review”: the disk cache should satisfy 80% of all data requests
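The 80% cache target above can be illustrated with a toy LRU simulation. This is illustrative only; the real CAF cache (DCACHE) is far more elaborate, and the request stream below is invented to mimic a few "hot" datasets dominating the access pattern.

```python
from collections import OrderedDict

def cache_hit_fraction(requests, cache_size):
    """Simulate an LRU disk cache over a stream of file requests and
    return the fraction served from disk (i.e. without a tape access)."""
    cache = OrderedDict()
    hits = 0
    for f in requests:
        if f in cache:
            hits += 1
            cache.move_to_end(f)           # mark as most recently used
        else:
            cache[f] = True                # stage from tape to disk
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used file
    return hits / len(requests)

# Skewed access pattern: 10 hot files read repeatedly, plus 10 cold ones.
requests = [f"file{i % 10}" for i in range(90)] + [f"cold{i}" for i in range(10)]
print(cache_hit_fraction(requests, cache_size=20))  # → 0.8
```

With a cache large enough to hold the hot files, exactly the repeated reads hit disk, reproducing the 80% figure for this synthetic workload.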

  6. Made in Italy: CAF promise fulfilled • Giorgio Chiarelli runs 100-section jobs and integrates 120x7x24x3% = 600 CPU hours in a few days, using up to more than half of the full CAF at the same time • Goes through 1 TB of data in a few hours • All of this with one single few-line script that automatically divides the input among the various job sections

  7. Made in Italy: monitoring jobs and sections on the Web

  8. Made in Italy: managing users’ areas on the CAF, O(100 GB)

  9. Made in Italy: CAF this summer • CAF Stage 1 saved the day for the summer conferences • 61 duals (10 INFN, 16 Pitt/CMU) • 15 fileservers (4 INFN, 1 MIT) • CPU usage ~90% since June • Users happy

  10. Made in Italy: CAF today • Wait times are getting longer • Users want more • Ready for Stage 2 • New hardware ready this fall for the ski conferences

  11. CAF Stage 2 (Stage 1 × 4) • FNAL/CD centralized bid ~ twice a year • CDF procurement for Stage 2 this summer • Just in time to catch the INFN funds released in June (x3) • Bids are in • Hope for hardware up and running in November • CSN1 → users = 6 months • Many others will join the CAF in Stage 2 • KEK-Japan: 2 fileservers, 38 duals • Korea: 0.5 fileserver (+ 2 later) • Spain: 1 fileserver • Canada: 1 fileserver • US (8 universities): 10 fileservers, 4 duals • More to come

  12. Why is the CAF a success? • The CAF is more than a pile of PC’s • Integrated hw/sw design for the farm and the access tools • Designed for optimized access to data • Lots of disk-resident data • Large transparent disk cache in front of the tape robot • Tuning of disk access (data striping, minimal NFS, …) • Designed for users’ convenience • Simple GUIs, Kerberos-based authentication, large local user areas • Professional system management and a closed loop with vendors • Several hw/firmware/sw problems solved so far • RAID controller, defective RAM, file system or kernel bugs … • Plus the normal failure rate of disks, power supplies, etc. • 2 FTE on CAF infrastructure

  13. Will the CAF success last? • User community: • ramping up these days: 20 → 200 • From the pioneers to the masses • Exposure to all kinds of access patterns • Hardware expansion: • up to a factor 10 over the next 2 years • Only experience will tell • The CAF is built with the cheapest hardware • Will have to learn to live with 10~20% of the hardware broken at any given time
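Living with 10~20% of the hardware broken matters most for many-section jobs: assuming sections are placed on nodes independently (a simplification invented here for the estimate), the chance that at least one section lands on a bad node grows quickly with the section count.

```python
# P(at least one of N sections hits a broken node) = 1 - (1 - p)^N,
# assuming independent placement -- an illustrative simplification.
def p_job_hits_bad_node(p_broken, n_sections):
    return 1 - (1 - p_broken) ** n_sections

print(round(p_job_hits_bad_node(0.10, 10), 2))   # 10 sections, 10% broken → 0.65
print(round(p_job_hits_bad_node(0.10, 100), 2))  # 100 sections → near certainty
```

This is why per-section resubmission and fault tolerance, rather than perfect hardware, are the realistic goal for a farm built from the cheapest components.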

  14. Beyond the CAF • FERMILAB wants to join the GRID • FNAL will be a Tier 1 for CMS-US • Foreign CDF institutions want to integrate their local farms • Spain, Korea, UK, Germany, Canada, Italy • In many cases to exploit LHC/GRID hardware • So far no big offer of help for common work, unlike D0 • Exception: Canada: 224 nodes “now” for CDF MC • No software tool to do this integration “transparently” yet • Not clear how much this will help CDF analysis

  15. Decentralizing analysis computing • FNAL-CD working hard to promote SAM for remote work • SAM: metadata catalog + distributed disk caches • Run analysis locally • Copy data as needed (only the 1st time) • Works in Trieste (as in other places) • SAM to become “the” CDF data access tool • SAM integration with (EuroData)GRID being tried • CDF working on “packaging the CAF for export” • Decentralized CAFs • Each handling data independently • Cloning the FNAL CAF is the easiest way (Korea’s choice) • Remote farms = extra costs for FNAL

  16. CDF computing outside the US (approx.)

  17. MOU/MOF • Moving toward a way to recognize foreign contributions • IFC and Scrutiny Group to work on this • INFN present in both • Issues being talked about: • Computing will have to enter the MOF somehow • Allow and encourage contributions • Take into account history and the present situation • No indication of a “crisis” that has to be dumped on the collaborators for help

  18. 2003 requests detailed: 5 items • Stick to the June plan: • Invest the majority of resources in the FNAL CAF • Modest growth in Italy for interactive work • Summer experience: needs do not scale down with luminosity • No reason to expect large variations from the June numbers • Requested resources well within the June forecast • Nevertheless, a prudent, incremental approach (referees) • New in 2003: • Start MC • Interactive work at FNAL • Start the transition to CNAF

  19. Tevatron keeps us busy • By next summer, tune analysis to the same level as Run 1: • Alignments, precision tracking, secondary vertices, B-tag • Jet energy corrections, underlying event • Do interesting physics in the meanwhile • Example: all-Italian D→hh • By end of year (100 pb-1): • 10^6 events in the mass peak, 10^7 in the histogram • 4 TB of data by spring, 16 TB by end 2003 • This channel alone saturates the disk financed so far (15 TB) • Learning field for B→hh

  20. Monte Carlo • CDF has talked about central production • But no overall estimate of needs yet • Next year’s safe bet: everybody on his/her own • Just the same as Run 1 • Italian groups starting on this now • Plan for a capacity of 10^7 events/month • Modest hw need: 10 dual-CPU nodes • Adequate for most analyses (10x a given dataset) • Future growth should be small • Further requests only on the basis of clear “cases”
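A back-of-envelope check of the sizing above: 10 dual-CPU nodes producing 10^7 events/month implies roughly 5 CPU seconds available per event (assuming a 100% duty cycle, which is an optimistic assumption added here, not a figure from the slides).

```python
# Implied per-event CPU budget for the Monte Carlo plan:
# 10 dual-CPU nodes, 10^7 events/month, assumed 100% duty cycle.
cpus = 10 * 2
seconds_per_month = 30 * 24 * 3600
events_per_month = 1e7
budget = cpus * seconds_per_month / events_per_month  # CPU seconds per event
print(round(budget, 1))  # → 5.2
```

Any real inefficiency (job startup, broken nodes, contention) eats into those ~5 seconds, so the "modest" hardware figure only holds if event generation stays cheap.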

  21. Interactive work at FNAL • When at FNAL, one cannot run ROOT on machines in Italy • Need “some” “better than desktop” PCs (cf. June’s talk) • Referees asked for central management: • Defined a total cap of 10 “power PCs” • Asked for 5 in 2003 • 4 full-time physicists doing analysis at FNAL: P. Azzi, R. Carosi, S. Giagu, M. Rescigno • Explore a central alternative in 2003 • Interactive login pool in the CAF • Some ideas so far, will try and see

  22. Moving CAF to CNAF • PRO’s: • Spend money in Italy • Join the INFN effort in building a world-class computing center • Easier access to 3rd data and/or interactive resources (GARR vs WAN) • Tap on the GRID/LHC hardware pool for peak needs • Import here the tools and experience learnt on the CAF • CON’s: • Not an “experiment need”: the FNAL CAF may be enough • Costs more • Poor access to the main data repository (FNAL tapes) • Need to replicate the ease of use and operation of the FNAL CAF • Different hardware = different problems • Have to divert time and effort from data analysis

  23. Moving CAF to CNAF: the proposal • Start with limited but significant hardware • 2003 at CNAF ≈ ½ of the private share of the CAF in 2002 • 7 TB of disk and 29 dual-processor nodes, estimated on the basis of the expected data needs for top6j and Zbbar • Explore the effectiveness of the work environment • Don’t give up on CAF features • Look for added value • Will need help (manpower) • Will try and see; the decision to leave FNAL will have to be based on proof of existence of a valid alternative here

  24. Summary of requests: analysis at FNAL, per the June 24 “plan” and after CSN1’s June decision • FNAL CAF: 22 TB disk + 63 dual nodes = 132+173 = 306 KEu • Monte Carlo: 10 dual nodes = 28 KEu (FNAL price) • CNAF: 7 TB disk + 29 dual nodes = 70+96 = 166 KEu • Interactive FNAL: 5 “power PCs” = 22.5 KEu • Interactive Italy: disk and CPU for Pd/Pi/Rm/Ts/… = 50 KEu total

  25. SPARE: spare slides from here on

  26. Working on the CDF CAF is easy • Pick a dataset by name • Decide how many parallel execution threads (sections) • Prepare 1 executable, 1 tcl and 1 script file • Submit from anywhere via a simple GUI • Query CAF status at any time via the web monitor • Retrieve log/data anywhere via a simple GUI • 2-step submission of 100 sections:
  1) In the script:
     setenv TOT_SECT 100
     @ section = $1 - 1
     setenv CAF_SECTION $section
  2) In the tcl file (only one tcl file):
     module talk DHInput
     include dataset bhmu03
     setInput cache=DCACHE
     splitInput slots=$env(TOT_SECT) this=$env(CAF_SECTION)
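The `splitInput slots=N this=i` directive above gives each section its own share of the dataset's files. Its effect can be sketched as a simple round-robin partition; this is an illustration of the idea, not the actual DHInput implementation, and the file names are invented.

```python
def section_files(files, tot_sect, this_sect):
    """Round-robin share of the input files for one CAF section:
    section i of N takes every file whose index k satisfies k % N == i.
    (Illustrative sketch of splitInput's effect, not the real code.)"""
    return [f for k, f in enumerate(files) if k % tot_sect == this_sect]

# Hypothetical file list for a dataset split across 4 sections.
dataset = [f"bhmu03_{n:03d}.root" for n in range(10)]
print(section_files(dataset, 4, 0))
# → ['bhmu03_000.root', 'bhmu03_004.root', 'bhmu03_008.root']
```

Because each file index falls in exactly one residue class, every file is processed by exactly one section and no file twice, which is what lets a single tcl file drive all 100 sections.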

  27. Working on the CAF is effective • Quickly go through any CDF dataset (on disk or tape) • Create personalized output and store it locally • Run on that output (data file or ROOT ntuple) • Locally on CAF nodes • Remotely via rootd (e.g. ROOT from the desktop)

  28. The CAF is convenient: you can work from anywhere • All code and tools needed for CDF offline via anonymous ftp or simply from /afs/infn.it • Everything runs on plain RedHat 6.x, 7.x • even on the GRID testbed • no need for a customized system install • Need a Kerberos ticket to talk to FNAL, but… • One-click install of the Kerberos client from the web • No need for a system manager • Just type “kinit” and your Fermilab password • Many people work from their laptop!

  29. CAF future

  30. Little data? No way! • DAQ runs at full speed • Typical luminosity better than Run 1 • The 2-track trigger from SVT is full of charm • We are refocusing attention on samples that in the default scenario would have been limited in statistics • Low-Pt jets (20 GeV) and leptons (8 GeV) • Charm • Interesting for physics • improve on the PDG in the charm sector • Fundamental control samples • Particle ID on D→hh as a learning field for B→hh • Heavy flavor content in jets • b-jet tagging • Jet resolution • …
