
The D0 NIKHEF Farm



  1. The D0 NIKHEF Farm. Ton Damen, Willem van Leeuwen, Kors Bos. Fermilab, May 23, 2001

  2. Layout of this talk • D0 Monte Carlo needs • The NIKHEF D0 farm • The data we produce • The SAM database • A Grid intermezzo • The network • The next steps. Fermilab, May 23, 2001

  3. D0 Monte Carlo needs • The D0 trigger rate is 100 Hz and a year has 10^7 live seconds → 10^9 events/yr • We want 10% of that to be simulated → 10^8 events/yr • Simulating 1 QCD event takes ~3 minutes (size ~2 MByte) on an 800 MHz PIII • So 1 CPU can produce ~10^5 events/yr (~200 GByte), assuming a 60% overall efficiency • So our 100-CPU farm can produce ~10^7 events/yr (~20 TByte) • And this is only 10% of the goal we set ourselves (not counting the Nijmegen D0 farm yet) • So we need another 900 CPUs • UTA (50), Lyon (200), Prague (10), BU (64), Nijmegen (50), Lancaster (200), Rio (25),
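  A back-of-the-envelope check of this arithmetic, using the slide's own numbers (3 min/event, ~2 MByte/event, 60% efficiency, 100 CPUs); the ~3.15 × 10^7 wall-clock seconds per year is an assumption of the sketch:

```python
# Back-of-the-envelope D0 Monte Carlo capacity, using the slide's numbers.
SECONDS_PER_YEAR = 3.15e7     # wall-clock seconds in a year (assumption of this sketch)
EFFICIENCY = 0.60             # overall farm efficiency (slide)
SECONDS_PER_EVENT = 3 * 60    # ~3 minutes per QCD event on an 800 MHz PIII (slide)
EVENT_SIZE_MB = 2             # ~2 MByte per event (slide)
N_CPUS = 100                  # the NIKHEF farm (slide)

events_per_cpu = EFFICIENCY * SECONDS_PER_YEAR / SECONDS_PER_EVENT
events_farm = N_CPUS * events_per_cpu
print(f"per CPU: ~{events_per_cpu:.0f} events/yr, ~{events_per_cpu * EVENT_SIZE_MB / 1e3:.0f} GByte")
print(f"farm   : ~{events_farm:.2g} events/yr, ~{events_farm * EVENT_SIZE_MB / 1e6:.0f} TByte")
# -> ~1e5 events/yr (~200 GByte) per CPU and ~1e7 events/yr (~20 TByte) for the farm,
#    i.e. roughly 10% of the 1e8 events/yr goal.
```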

  4. How it looks

  5. The NIKHEF D0 Farm [Diagram: farm layout. The farm nodes hang off a switch at 100 Mbit/s each; the file server (1.5 TB disk cache) and the farm server (a SAM station) sit on the NIKHEF network with 1 Gbit/s links; the NIKHEF network connects over SURFnet to the SARA network (tape robot, SAM station) and to the metadata at Fermilab, with wide-area links of 155 Mbit/s and 1 Gbit/s shown in the diagram.]

  6. 50 farm nodes (100 CPUs): Dell Precision Workstation 220 • Dual Pentium III 800 MHz, 256 kB cache each • 512 MB PC800 ECC RDRAM • 40 GB (7200 rpm) ATA-66 disk drive • No screen, no keyboard, no mouse • Wake-on-LAN functionality

  7. The farm server: Dell Precision 620 workstation • Dual Pentium III Xeon 1 GHz • 512 MB RDRAM • 72.8 GByte SCSI disk • Will also serve as D0 software server for the NIKHEF/D0 people. The file server: Elonex EIDE server • Dual Pentium III 700 MHz • 512 MB SDRAM • 20 GByte EIDE disk • 1.2 TByte in 75 GB EIDE disks • 2 x Gigabit Netgear GA620 network cards

  8. Software on the farm • Boot via the network • Standard Red Hat Linux 6.2 • ups/upd on the server • D0 software on the server • FBSNG on the server, daemons on the nodes • SAM on the file server • Used to test new machines …

  9. What we run on the farm • Particle generator: Pythia or Isajet • GEANT detector simulation: d0gstar • Digitization, adding min. bias: psim • Check the data: mc_analyze • Reconstruction: preco • Analysis: reco_analyze
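  A minimal sketch of how these stages could be chained for one job. The executable names are the ones on the slide, but the command-line options are invented for illustration; the real D0 programs are configured via mc_runjob (see slide 14), not like this:

```python
import subprocess

# Illustrative pipeline for one Monte Carlo job: generator -> d0gstar -> psim,
# with mc_analyze as a sanity check. All flags below are hypothetical.
stages = [
    ["isajet",     "--events=300",  "--out=gen.dat"],                     # particle generation
    ["d0gstar",    "--in=gen.dat",  "--out=d0g.dat"],                     # GEANT detector simulation
    ["psim",       "--in=d0g.dat",  "--minbias=2.5", "--out=sim.dat"],    # digitization + min. bias
    ["mc_analyze", "--in=sim.dat"],                                       # check the data
]

for cmd in stages:
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)   # stop the chain if any stage fails
```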

  10. Example: min. bias • Did a run with 1000 events on all CPUs • Took ~2 min/event, so ~1.5 days for the whole run • Output file size ~575 MByte • We left those files on the nodes (a reason to have enough local disk space) • Intend to repeat that "sometimes"
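  A one-line check of the run length, assuming 1000 events per CPU at ~2 minutes each with all CPUs running in parallel:

```python
# 1000 events per CPU at ~2 minutes/event, all CPUs running in parallel:
events = 1000
minutes_per_event = 2
print(f"{events * minutes_per_event / (60 * 24):.1f} days")   # ~1.4 days, close to the ~1.5 quoted
```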

  11. Output data • -rw-r--r-- 1 a03 computer 298 Nov 5 19:25 RunJob_farm_qcdJob308161443.params • -rw-r--r-- 1 a03 computer 1583995325 Nov 5 10:35 d0g_mcp03_pmc03.00.01_nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-none_p1.1_308161443_2000 • -rw-r--r-- 1 a03 computer 791 Nov 5 19:25 d0gstar_qcdJob308161443.params • -rw-r--r-- 1 a03 computer 809 Nov 5 19:25 d0sim_qcdJob308161443.params • -rw-r--r-- 1 a03 computer 47505408 Nov 3 16:15 gen_mcp03_pmc03.00.01_nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-none_p1.1_308161443_2000 • -rw-r--r-- 1 a03 computer 1003 Nov 5 19:25 import_d0g_qcdJob308161443.py • -rw-r--r-- 1 a03 computer 912 Nov 5 19:25 import_gen_qcdJob308161443.py • -rw-r--r-- 1 a03 computer 1054 Nov 5 19:26 import_sim_qcdJob308161443.py • -rw-r--r-- 1 a03 computer 752 Nov 5 19:25 isajet_qcdJob308161443.params • -rw-r--r-- 1 a03 computer 636 Nov 5 19:25 samglobal_qcdJob308161443.params • -rw-r--r-- 1 a03 computer 777098777 Nov 5 19:24 sim_mcp03_psim01.02.00_nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-poisson-2.5_p1.1_308161443_2000 • -rw-r--r-- 1 a03 computer 2132 Nov 5 19:26 summary.conf

  12. Output data translated • Big data files: gen_* ~0.047 GByte, d0g_* ~1.5 GByte, sim_* ~0.7 GByte • Small bookkeeping files: import_gen_*.py, import_d0g_*.py, import_sim_*.py, isajet_*.params, RunJob_farm_*.params, d0gstar_*.params, d0sim_*.params, samglobal_*.params, summary.conf • 12 files for generator + d0gstar + psim, but of course only 3 big ones • Total ~2 GByte
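  A small sketch of the bookkeeping this slide summarizes, adding up the three big data files of one job; the byte counts are taken from the directory listing on the previous slide:

```python
# Sizes (bytes) of the three big data tiers from one 2000-event job,
# taken from the directory listing on the previous slide.
big_files = {
    "gen": 47_505_408,       # generator output
    "d0g": 1_583_995_325,    # GEANT hits
    "sim": 777_098_777,      # digitized output with min. bias overlay
}

for tier, size in big_files.items():
    print(f"{tier}_*  {size / 1e9:.3f} GByte")
print(f"total  {sum(big_files.values()) / 1e9:.1f} GByte")   # ~2.4 GByte, roughly the ~2 GByte per job quoted
```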

  13. Data management [Diagram: the NIKHEF D0 farm ships generator data (Import_gen.py), GEANT data / hits (Import_d0g.py), sim data / digis (Import_sim.py) and reconstructed data (Import_reco.py), together with the job parameters, via SAM to d0mino at Fermilab and TERAS at SARA.]
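  For illustration only: a sketch of the kind of metadata an import_*.py script has to attach to a file before it goes into the datastore. The declare_to_sam helper and the field names are hypothetical stand-ins, not the real SAM API; only the file name comes from the listing on slide 11:

```python
# Hypothetical illustration of the metadata behind an import_d0g_*.py-style script.
# Field names and declare_to_sam() are made up for this sketch; the real SAM API differs.
def declare_to_sam(metadata):
    print("declaring:", metadata)   # placeholder for the real SAM declaration call

metadata = {
    "file_name": "d0g_mcp03_pmc03.00.01_nikhef.d0farm_isajet_qcd-incl-PtGt2.0"
                 "_mb-none_p1.1_308161443_2000",
    "data_tier": "d0g",             # GEANT hits
    "event_count": 2000,
    "application": "d0gstar",
    "production_site": "nikhef.d0farm",
}
declare_to_sam(metadata)
```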

  14. Automation • mc_runjob (modified) prepares the MC jobs (gen + sim + reco + anal), e.g. 300 events per job/CPU, repeated e.g. 500 times • Submits them into the batch system (FBS); they run on the nodes • After completion the output is copied to the file server • A separate batch job on the file server submits the files into SAM • SAM does the file transfers to Fermilab and SARA • Runs for a week …
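  A schematic sketch of that loop. prepare_mcc_job and fbs_submit are hypothetical stand-ins for what mc_runjob and FBS actually provide; only the numbers (500 jobs of 300 events) come from the slide:

```python
# Schematic automation loop, mirroring the slide: prepare N jobs, submit each
# to the batch system, and let follow-up stages copy the output to the file
# server and declare it to SAM. All helpers are hypothetical.
N_JOBS = 500
EVENTS_PER_JOB = 300

def prepare_mcc_job(job_id, n_events):
    # stand-in for mc_runjob: write the job scripts/params and return a job description
    return {"id": job_id, "events": n_events,
            "stages": ["mcc", "rcp_to_fileserver", "sam_store"]}

def fbs_submit(job):
    # stand-in for an FBS submission; in reality the FBS job has the three
    # sections shown on the next slide (mcc, rcp, sam)
    print(f"submitting job {job['id']}: {job['events']} events, stages {job['stages']}")

for job_id in range(N_JOBS):
    fbs_submit(prepare_mcc_job(job_id, EVENTS_PER_JOB))
```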

  15. [Diagram: job flow. An mcc request on the farm server becomes an FBS job with three sections: (1) fbs(mcc) runs the Monte Carlo on a farm node (50 nodes plus control, 40 GB local disk each), (2) fbs(rcp) copies the output to the 1.2 TB file server, (3) fbs(sam) stores the data in the datastore at FNAL and SARA and the metadata in the SAM DB.]

  16. [Diagram: the NIKHEF, KUN (Nijmegen) and IN2P3 D0 farms, all connected via SAM to d0mino at Fermilab and TERAS at SARA.] This is a grid!

  17. The Grid • Not just D0, but for the LHC experiments • Not just SAM, but for any database • Not just farms, but any CPU resource • Not just SARA, but any mass storage • Not just FBS, but any batch system • Not just HEP, but any science: EO, …

  18. European DataGrid Project • 3-year project for 10 M€ • Manpower to develop grid tools • CERN, IN2P3, INFN, PPARC, ESA, FOM • NIKHEF + SARA + KNMI • Farm management • Mass storage management • Network management • Testbed • HEP & EO applications

  19. LHC regional centres [Diagram: CERN as Tier 0; KEK, INFN, BNL, IN2P3, NIKHEF/SARA, RAL and FNAL as Tier 1; Utrecht, Vrije Univ., Nijmegen, Amsterdam, Brussel and Leuven as Tier 2 sites connected via SURFnet, down to department and desktop level; for Atlas, LHCb and possibly Alice.]

  20. DataGrid test bed sites [Map: HEP and ESA sites across Europe, including NIKHEF, CERN, Lyon, RAL, Edinburgh, Manchester, Oxford, Bristol, QMW, Berlin, Dubna, Moscow, Lund, Prague, Brno, Paris, Grenoble, Marseille, Santander, Madrid, Barcelona, Valencia, Lisboa, Milano, Torino, PD-LNL, BO-CNAF, Pisa, Roma, Catania, Estec, KNMI, IPSL and ESRIN.]

  21. The NL-Datagrid Project

  22. NL-Datagrid goals • National test bed for middleware development (WP4, WP5, WP6, WP7, WP8, WP9) • To become an LHC Tier-1 center: ATLAS, LHCb, Alice • To use it for the existing program: D0, Antares • To use it for other sciences: EO, astronomy, biology • For tests with other (transatlantic) grids: D0, PPDG, GriPhyN

  23. NL-Datagrid testbed sites • Univ. Amsterdam (Atlas) • Vrije Univ. (LHCb) • Nijmegen Univ. (Atlas) • Univ. Utrecht (Alice) • CERN • RAL • FNAL • ESA

  24. Dutch Grid topology [Diagram: NIKHEF and SARA at the centre, connected over SURFnet to KNMI and the university sites (Free Univ., Utrecht Univ., Nijmegen Univ.) running Atlas, LHCb, Alice and D0, and externally to CERN (Geneva), FNAL and ESA D-PAF (München).]

  25. End of the Grid intermezzo. Back to the NIKHEF D0 farm and Fermilab: the network

  26. Network bandwidth • NIKHEF → SURFnet: 1 Gbit/s • SURFnet, Amsterdam → Chicago: 622 Mbit/s • ESnet, Chicago → Fermilab: 155 Mbit/s ATM • But ftp gives us ~4 Mbit/s • bbftp gives us ~25 Mbit/s • bbftp processes in parallel: ~45 Mbit/s • For 2002: • NIKHEF → SURFnet: 2.5 Gbit/s • SURFnet, Amsterdam → Chicago: 622 Mbit/s • SURFnet, Amsterdam → Chicago: 2.5 Gbit/s optical • Chicago → Fermilab: ? but more …

  27. ftp++ • ftp gives you 4 Mbit/s to Fermilab • bbftp: increased buffer, # streams • gsiftp: with security layer, increased buffer, … • grid_ftp: increased buffer, # streams, # sockets, fail-over protection, security • bbftp → ~20 Mbit/s • grid_ftp → ~25 Mbit/s • Multiple ftp in parallel → factor 2 seen • Should get to > 100 Mbit/s, or ~1 GByte/minute
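  A small conversion to put these rates next to the ~2 GByte produced per job; the job size is the one from slide 12, the rest is unit arithmetic:

```python
# Convert link throughput (Mbit/s) into GByte/minute and into the time
# needed to ship one ~2 GByte Monte Carlo job to Fermilab.
JOB_SIZE_GB = 2.0   # ~2 GByte per generator+d0gstar+psim job (slide 12)

for mbit_per_s in (4, 25, 45, 100):
    gbyte_per_min = mbit_per_s / 8 * 60 / 1000          # Mbit/s -> GByte/min
    minutes_per_job = JOB_SIZE_GB / gbyte_per_min
    print(f"{mbit_per_s:4d} Mbit/s  ~{gbyte_per_min:.2f} GByte/min  "
          f"~{minutes_per_job:.0f} min per 2 GByte job")
# Above 100 Mbit/s this approaches the ~1 GByte/minute quoted on the slide.
```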

  28. SURFnet5 access capacity [Chart: access capacity per year, 1999-2002, on a scale from 10 Mbit/s to 100 Gbit/s, comparing SURFnet4 and SURFnet5; access capacity grows from the 100-155 Mbit/s range towards multi-Gbit/s by 2002.]

  29. Transatlantic access capacity [Diagram: European research networks (UK SuperJANET4, NL SURFnet, IT GARR-B, FR Renater) linked via GEANT from Geneva across the Atlantic to New York and to Chicago (STAR-LIGHT / STAR-TAP), connecting to Abilene, ESnet and MREN, with link capacities of 2.5 Gbit/s and 622 Mbit/s.]

  30. Network load last week • Needed for 100 MC CPUs: ~10 Mbit/s (200 GB/day) • Available to Chicago: 622 Mbit/s • Available to FNAL: 155 Mbit/s • Needed next year (double capacity): ~25 Mbit/s • Available to Chicago: 2.5 Gbit/s, a factor 100 more !! • Available to FNAL: ??

  31. New nodes for D0 • In a 2U 19" mounting • Dual 1 GHz PIII • 1 GByte RAM • 40 GByte disk • 100 Mbit Ethernet • Cost ~k$2 • The Dell machines were ~k$4 (tax incl.), so a FACTOR 2 cheaper!! • Assembly time ~1 hour per machine • 1 switch: k$2.5 (24 ports) • 1 rack: k$2 (46U high) • Requested for 2001: k$60 • 22 dual CPUs • 1 switch • 1 19" rack

  32. The End Kors Bos Fermilab, May 23 2001
