
1. BaBar Tier A @ CC-IN2P3
Jean-Yves Nief, CC-IN2P3, Lyon
HEPiX-HEPNT, Fermilab, October 22nd – 25th, 2002

2. Talk's outline
• Overview of BaBar: motivation for a Tier A.
• Hardware available for the CC-IN2P3 Tier A (servers, storage, batch workers, network).
• Software issues (maintenance, data import).
• Resource usage (CPU used, …).
• Problems encountered (hardware, software).
• BaBar-Grid and future developments.

3. BaBar: a short overview
• Study of CP violation using B mesons; the experiment is located at SLAC.
• Since 1999, more than 88 million B-B̄ events collected; ~ 660 TB of data stored (real data + simulation).
How is it handled?
• Object-oriented techniques: C++ software and an OO database system (Objectivity).
• For data analysis @ SLAC: 445 batch workers (500 CPUs), 127 Objy servers + ~ 50 TB of disk + HPSS.
• But: heavy user demand (> 500 physicists) => saturation of the system, and collaborators spread world-wide (America, Europe).
• Idea: creation of mirror sites where data analysis / simulation production can be done.

4. CC-IN2P3 Tier A: hardware (I)
• 19 Objectivity servers (Sun machines):
  - 8 Sun Netra 1405T (4 CPUs).
  - 2 Sun 4500 (4 CPUs).
  - 1 Sun 1450 (4 CPUs).
  - 8 Sun 250 (2 CPUs).
• 9 servers for data access by analysis jobs.
• 2 database catalog servers.
• 6 servers for handling database transactions.
• 1 server for Monte-Carlo production.
• 1 server for data import/export.
• 20 TB of disk.

5. Hardware (II): storage system
• Mass storage system: only ~ 20% of the data is available on disk => automatic staging required (see the sketch below).
• Storage for private use:
  - Temporary storage: 200 GB of NFS space.
  - Permanent storage:
    - For small files (log files, …): Elliot archiving system.
    - For large files (ntuples, …) > 20 GB: HPSS (2% of the total occupancy).
• > 100 TB in HPSS.
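The staging policy above can be pictured with a minimal sketch, assuming a hypothetical disk-cache layout and helper names (this is not BaBar code; the rfcp call merely stands in for whatever RFIO staging command the site actually used):

```python
# Hypothetical sketch of the "only ~20% on disk" policy: an analysis job
# checks the disk cache first and falls back to a staging request from HPSS.
# Paths and helper names are invented for illustration.
import os
import subprocess

DISK_CACHE = "/babar/diskcache"        # assumed local disk pool (illustrative)
HPSS_PREFIX = "/hpss/in2p3.fr/babar"   # assumed HPSS namespace (illustrative)

def open_database(rel_path):
    """Return a local path to the requested database file, staging it
    from HPSS if it is not already in the disk cache."""
    cached = os.path.join(DISK_CACHE, rel_path)
    if os.path.exists(cached):
        return cached                   # the ~20% already resident on disk
    os.makedirs(os.path.dirname(cached), exist_ok=True)
    stage_in(os.path.join(HPSS_PREFIX, rel_path), cached)
    return cached

def stage_in(hpss_path, local_path):
    # Placeholder for the site-specific staging call (RFIO copy or similar).
    subprocess.run(["rfcp", hpss_path, local_path], check=True)
```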

6. Hardware (III): the network
• Massive data import from SLAC (~ 80 TB in one year).
• Data needs to be available in Lyon within a short time (max: 24 - 48 hours).
• Large bandwidth between SLAC and IN2P3 required (a rough capacity estimate is sketched below).
• 2 routes:
  - CC-IN2P3 → Renater → US: 100 Mb/s.
  - CC-IN2P3 → CERN → US: 155 Mb/s (until this week).
  - CC-IN2P3 → Geant → US: 1 Gb/s (from now on).
=> Full potential never reached (not understood).
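A back-of-the-envelope check of these numbers, assuming ideal link utilisation (which, as noted on the slide, was never reached in practice):

```python
# ~80 TB/year imported from SLAC, to be available in Lyon within 24-48 hours.
# Pure arithmetic on the nominal link speeds quoted above.
TB = 1e12                        # bytes
daily_volume = 80 * TB / 365     # ~0.22 TB of new data per day on average

def transfer_hours(volume_bytes, link_mbit_per_s):
    """Ideal transfer time in hours for a given nominal link speed."""
    bytes_per_s = link_mbit_per_s * 1e6 / 8
    return volume_bytes / bytes_per_s / 3600

for name, mbps in [("Renater 100 Mb/s", 100),
                   ("CERN 155 Mb/s", 155),
                   ("Geant 1 Gb/s", 1000)]:
    print(f"{name}: {transfer_hours(daily_volume, mbps):.1f} h per day of data")

# Even at 100 Mb/s an average day's data fits in ~5 h of ideal transfer time,
# so meeting the 24-48 h target is a question of link utilisation rather than
# raw capacity.
```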

7. Hardware (IV): the batch and interactive farms
• The batch farm (shared):
  - 20 Sun Ultra 60, dual processor.
  - 96 Linux PIII-750 MHz, dual processor (NetFinity 4000R).
  - 96 Linux PIII-1 GHz, dual processor (IBM X-series).
  => 424 CPUs in total (see the check below).
• The interactive farm (shared):
  - 4 Sun machines.
  - 12 Linux machines.
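The 424-CPU figure is just the sum over the dual-processor nodes listed above; a quick check:

```python
# All batch machines listed on the slide are dual-processor.
farm = {
    "Sun Ultra 60": 20,
    "Linux PIII-750 MHz (NetFinity 4000R)": 96,
    "Linux PIII-1 GHz (IBM X-series)": 96,
}
total_cpus = sum(n_hosts * 2 for n_hosts in farm.values())
print(total_cpus)   # 424
```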

8. Software (I): BaBar releases, Objectivity
• BaBar releases:
  - Need to keep up with the evolution of the BaBar software at SLAC => new BaBar software releases have to be installed as soon as they are available.
• Objectivity and related issues:
  - Development of tools:
    - to monitor the servers' activity, HPSS and batch resources;
    - to survey the Objectivity processes on the servers ("sick" daemons, transaction locks, …) - a sketch of such a check follows this slide.
  - Maintenance: software upgrades, load balancing of the servers.
  - Debugging Objy problems on both the client and server side.
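As an illustration of the kind of survey tool mentioned above, here is a minimal sketch that flags long-lived Objectivity daemons. The daemon names and the age threshold are assumptions, and the real site tools were certainly more elaborate:

```python
# Flag Objectivity-related daemons that have been alive suspiciously long
# (a possible "sick" daemon holding transaction locks).
import subprocess

WATCHED = {"ooams", "oolockserver"}   # illustrative daemon names
MAX_AGE_S = 6 * 3600                  # flag anything older than 6 h (assumption)

def long_lived_daemons():
    # 'etimes' (elapsed time in seconds) is procps/Linux syntax.
    out = subprocess.run(["ps", "-eo", "pid,etimes,comm"],
                         capture_output=True, text=True, check=True).stdout
    suspects = []
    for line in out.splitlines()[1:]:
        pid, age, comm = line.split(None, 2)
        if comm.strip() in WATCHED and int(age) > MAX_AGE_S:
            suspects.append((int(pid), int(age), comm.strip()))
    return suspects

if __name__ == "__main__":
    for pid, age, comm in long_lived_daemons():
        print(f"possible sick daemon: {comm} (pid {pid}, up {age // 3600} h)")
```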

9. Software (II): data import mechanism
• Two transfer routes: (1) SLAC → CERN → IN2P3; (2) SLAC → Renater → IN2P3.
• Data catalog available to users through a MySQL database.
• Average size of the dbs: ~ 500 MB.
• Multi-stream transfers using bbftp (designed for big files).
• Extraction at SLAC when new or updated dbs are available.
• Import in Lyon launched when the extraction @ SLAC is finished (flow sketched below).
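A sketch of this import loop, under stated assumptions: the catalog schema, host name, account and bbftp options are illustrative, and sqlite3 stands in for the MySQL catalog. Only the overall flow (query the catalog for new or updated dbs, then pull each one with a multi-stream bbftp transfer) comes from the slide:

```python
import subprocess
import sqlite3   # stand-in for the MySQL data catalog actually used

def databases_to_fetch(catalog_path):
    """Return the database files marked as new/updated at SLAC
    (table name and columns are assumptions)."""
    con = sqlite3.connect(catalog_path)
    rows = con.execute(
        "SELECT path FROM dbs WHERE status IN ('new', 'updated')").fetchall()
    con.close()
    return [r[0] for r in rows]

def bbftp_get(remote_path, local_path, streams=10):
    """Multi-stream transfer of one large database file
    (bbftp was designed for big files); options are illustrative."""
    cmd = ["bbftp",
           "-u", "babar",                        # assumed account name
           "-p", str(streams),                   # number of parallel streams
           "-e", f"get {remote_path} {local_path}",
           "bbftp.slac.stanford.edu"]            # assumed server name
    subprocess.run(cmd, check=True)

for db in databases_to_fetch("import_catalog.db"):
    bbftp_get(db, "/babar/objy/" + db.split("/")[-1])
```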

10. Resource usage (I)
• Tier A officially opened last fall.
• ~ 200 - 250 analysis jobs running in parallel (the batch system can handle up to 600 parallel jobs).
• ~ 60 - 70 MC production jobs running in parallel; already ~ 50 million events produced in Lyon => now represents ~ 10-15% of the total weekly BaBar MC production.
• ~ 1/3 of the running jobs are BaBar jobs.
• Up to 4500 jobs in the queue during the busiest periods.

11. Resource usage (II) (*)
(*) 1 unit = 1/8 hour on a 1 GHz PIII (conversion sketched below).
• BaBar: top CPU-consuming group over the last 4 months at IN2P3.
• Second CPU consumer since the beginning of the year.
• MC production represents 25 - 30% of the total CPU time used.
=> ~ 25 - 30% of the analysis CPU is used by remote users.
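With the footnote's definition, converting site accounting units into CPU hours is a single division (the example value is arbitrary):

```python
# 1 unit = 1/8 hour on a 1 GHz PIII, so units -> hours is a division by 8.
def units_to_piii_hours(units):
    return units / 8.0          # hours of 1 GHz PIII CPU time

print(units_to_piii_hours(1_000_000))   # e.g. 1e6 units -> 125000 CPU hours
```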

12. Resource usage (III)
• 20% of the data on disk => dynamic staging via HPSS (RFIO interface).
• ~ 80 s per staging request.
• Up to 3000 staging requests possible per day.
• Not a limitation for CPU efficiency (rough estimate below).
• Needs less disk space, which saves money.
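A rough estimate supporting the "not a limitation" claim, assuming the staging requests are spread evenly over the parallel analysis jobs (that even spread is an assumption; the 80 s latency, 3000 requests/day and 200-250 parallel jobs come from these slides):

```python
stage_latency_s = 80
requests_per_day = 3000
parallel_jobs = 225            # middle of the 200-250 range quoted earlier

wait_per_job_s = stage_latency_s * requests_per_day / parallel_jobs
overhead = wait_per_job_s / (24 * 3600)
print(f"~{wait_per_job_s / 60:.0f} min of staging wait per job slot per day "
      f"(~{overhead:.1%} of wall time)")
# ~18 min per slot per day, i.e. around 1% overhead even at peak staging load.
```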

13. Problems encountered
• A few problems with data availability in Lyon due to the complexity of the export/import procedure.
• Network bandwidth for data import a bit erratic; the maximum was never reached.
• Objectivity-related bugs (most of them due to Objy server problems).
• Some HPSS outages when the system was overloaded (software-related + hardware limitations): solved => better performance now.
• During peak activity (e.g. before the summer conference), a huge backlog on the batch system.

14. The Tier A and the outer world: BaBar Grid @ IN2P3
• BaBar is committed to using Grid technologies.
• Storage Resource Broker (SRB) and MetaCatalog (MCAT) software installed and tested @ IN2P3:
  - gives access to data sets and resources based on their attributes rather than their physical locations (toy illustration below);
  => the future for data distribution between SLAC and IN2P3.
• Tests @ IN2P3 of the EDG software using BaBar analysis applications: it is possible to remotely submit a job from IN2P3 to RAL and SLAC.
  => Prototype of a tool for remote job submission: December 2002.
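A toy illustration of attribute-based data selection (not SRB/MCAT code; the catalogue entries and fields are invented): the user queries by attributes, and the catalogue resolves the physical replicas.

```python
catalog = [
    {"run": 101, "type": "real", "skim": "BtoJpsiKs",
     "replicas": ["slac:/store/r101.db", "in2p3:/babar/r101.db"]},
    {"run": 102, "type": "mc", "skim": "generic",
     "replicas": ["slac:/store/r102.db"]},
]

def find(**attrs):
    """Return replica lists for every dataset matching the given attributes."""
    return [d["replicas"] for d in catalog
            if all(d.get(k) == v for k, v in attrs.items())]

print(find(type="real", skim="BtoJpsiKs"))   # the user never gives a path
```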

15. CC-IN2P3 Tier A: future developments
• 2 new Objy servers + new disks (near future):
  - 1 allocated to MC production => goal: double the MC production;
  - fewer staging requests to HPSS.
• 72 new Linux batch workers (PIII, 1.4 GHz) => CPU power increased by 50% (shared with other groups).
• Compression of the databases on disk (client- or server-side decompression on the fly) => HPSS load decreased (see the sketch below).
• Installation of a dynamic load-balancing system on the Objy servers => more efficient (next year).
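A generic zlib sketch of the "compress on disk, decompress on the fly" idea (not the actual Objectivity mechanism, just the concept):

```python
import zlib

def store_compressed(path, payload: bytes):
    """Write the database payload to disk in compressed form."""
    with open(path, "wb") as f:
        f.write(zlib.compress(payload, 6))   # moderate compression level

def read_decompressed(path) -> bytes:
    """Read it back, decompressing on the fly.  Either the client or the
    server can do this step, trading a little CPU for disk space and
    fewer HPSS accesses."""
    with open(path, "rb") as f:
        return zlib.decompress(f.read())
```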

16. Conclusion
• The BaBar Tier A in Lyon is running at full steam.
• ~ 25 - 30% of the CPU consumed by analysis jobs is used by remote users.
• Significant resources at CC-IN2P3 are dedicated to BaBar (CPU: 2nd biggest user this year; HPSS: first staging requester).
• The contribution to the overall BaBar effort is increasing thanks to:
  - new Objy servers and disk space;
  - new batch workers (72 new Linux machines this year, ~ 200 next year);
  - new HPSS tape drives;
  - database compression and dynamic load balancing of the servers.
