
BABAR: The New Computing Model

J. Walsh, INFN-Pisa. Topics: Computer Model Working Group; Analysis Model; Event Store Technologies; Computing/Analysis Sites; Implementation/Migration.


Presentation Transcript


  1. BABAR: The New Computing Model
     • Computer Model Working Group
     • Analysis Model
     • Event Store Technologies
     • Computing/Analysis Sites
     • Implementation/Migration
     J. Walsh, INFN-Pisa
     Note: The new Babar Computing Model is still under development. Everything I present here today is still under discussion within Babar.

  2. Need for new computing model
     • Babar Computing Model established end of 2000
     • changes in computing environment → require review
     • Babar Computing Review: April 2002
       • large luminosity of PEP-II puts big burden on Babar computing resources
       • Analysis Groups produce huge ntuples that essentially duplicate micro-dst information → not scalable with luminosity
       • two event store technologies, Root/IO (Kanga/Root) and Objectivity → burden on Computing Group
       • several other issues raised: role of remote sites, etc.
     • Computer Model Working Group 2: update Babar Computing Model. So far work has concentrated in two main areas:
       • New proposal for analysis model
       • Event store technology: Root vs. Objectivity debate
     • Plus, will comment on remote sites: Tier A, Tier C, etc.

  3. Luminosity Projection
     • design luminosity (3×10^33 cm^-2 s^-1) already exceeded in 2001
     • expect 3 times the current data sample in 2005
     [Figure: luminosity projections made in 2002, with a marker labelled "where we are now"]

  4. Computing Model Working Group 2: Members
     • Members: David Brown (LBL), Claudio Campagnari, Andreas Hoecker, Hassan Jawahery (co-chair), Yury Kolomensky (co-chair), David Lange, Mike Roney, Aaron Roodman, Anders Ryd, Bernhard Spaan, John Walsh, Fergus Wilson
     • Technical Experts: Jacek Becla, Fabrizio Bianchi, Nicole Chevalier, Nicolo De Groot, Gregory Dubois-Felsmann, Peter Elmer, Steffen Luitz, Mauro Morandin
     • Ex-Officio: Stephen Gowdy (CC), Rainer Bartoldus (DepCC), Richard Mount (SCS), Dominique Boutigny (Tier-A Rep.), Livio Lanceri (DepPAC)

  5. Current Computing Model - Jargon
     • Analysis:
       • performed exclusively on micro format
       • Beta/Framework is the analysis program, C++ based
       • Central Skims produce subsets of data pertinent to various analyses
       • Analysis Working Groups (AWGs) often produce ntuples (or rootuples) that are large and redundant (contain same info as micro)
       • Fortran (or Root scripts) analyze ntuples (rootuples)
     • Event Store:
       • Originally only in Objectivity
       • Kanga/Root developed for micro format only, to enhance analysis performance and data distribution
       • micro currently maintained in Objy and Kanga/Root
     • Computing/Analysis Sites:
       • Full copy of micro data available at SLAC, IN2P3, RAL
       • Production reprocessing in place at INFN-Padova
       • Numerous institutes produce MC at remote sites
     [Slide callout: Tier A Sites]

  6. New Analysis Model: Key Points
     • Data formats:
       • Mini: new format with detailed detector information available
       • New Micro: upgraded form of current micro, allowing faster access and customizable content
       • Tag (nano-dst): no major changes, general cleanup
       • Note: large RECO and RAW objects no longer written to event store
     • Skims:
       • New Micro will make centralized skims more useful and efficient
     • Large ntuple production:
       • Hopefully rendered obsolete, freeing up resources (CPU, disk space and manpower)

  7. Mini Format
     • New format introduced over the last year:
       • contains essentially full detector information
         • track hits, calorimeter crystals, DRC hits, etc.
       • efficiently packed to optimize space
         • about 8 kBytes per event (compare to micro: 2 kBytes/event)
       • increased analysis capability w.r.t. micro, e.g.:
         • track extrapolation through detector material
         • follow changing conditions (e.g. SVT alignment)
         • event display
         • etc.
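
As a rough illustration of what the per-event sizes quoted above imply for disk space, here is a minimal sketch. The per-event sizes come from the slide; the event count is a purely hypothetical placeholder, not a Babar figure.

```python
# Per-event sizes quoted on the slide (approximate).
MINI_KB_PER_EVENT = 8
MICRO_KB_PER_EVENT = 2


def store_size_tb(n_events, kb_per_event):
    """Return the size of a sample in terabytes, taking 1 TB = 1e9 kB."""
    return n_events * kb_per_event / 1e9


# Hypothetical sample size, chosen only to make the arithmetic easy to follow.
n_events = 1_000_000_000

mini_tb = store_size_tb(n_events, MINI_KB_PER_EVENT)
micro_tb = store_size_tb(n_events, MICRO_KB_PER_EVENT)
print(f"mini:  {mini_tb:.1f} TB")
print(f"micro: {micro_tb:.1f} TB")
print(f"ratio: {mini_tb / micro_tb:.0f}x")
```

Whatever the real event count, the ratio stays at 4, which is why the mini is expected to need a staging system rather than living entirely on disk.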

  8. Mini Format - II
     • Additional characteristics:
       • customization
       • larger and slower than micro-dst:
         • develop coherent staging system to retrieve events from tape. Target access times:
           • < 1 hr for small (< 100 events) samples
           • < 1 day for medium (< 1 k events) samples
           • < 2-4 weeks for large samples
       • exact use pattern of mini not really predictable
         • need to remain flexible on implementation
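
The target access times above amount to a simple lookup on sample size. The sketch below encodes them as stated on the slide; the function name and the treatment of boundary values (a sample of exactly 100 events counts as "medium") are assumptions made for illustration.

```python
def target_access_time(n_events):
    """Return the staging target quoted on the slide for a sample of n_events.

    Thresholds (100 and 1000 events) are taken from the slide; how the
    boundaries themselves are classified is an assumption.
    """
    if n_events < 100:
        return "< 1 hr"        # small sample
    if n_events < 1000:
        return "< 1 day"       # medium sample
    return "< 2-4 weeks"       # large sample


for n in (50, 500, 5_000_000):
    print(n, "events ->", target_access_time(n))
```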

  9. New-Micro Format
     • Radical improvement w.r.t. current micro-dst
     • Dual usage:
       • regular framework/Beta job (current Babar norm)
       • interactive use with Root
     • Customizable content
       • option to store detector info or not
       • additional user info can be added
         • composite candidates
         • track fits with different mass hypotheses, etc.
     • High speed: the aim is to reach 1 kHz with framework/Beta → will require Beta development
       • current rate: few tens of Hz
       • higher read rates envisioned with Root access (at cost of reduced functionality)
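
To see what the jump from "few tens of Hz" to 1 kHz means in practice, here is a small back-of-the-envelope sketch. The 1 kHz target is from the slide; the 30 Hz figure is an assumed stand-in for "few tens of Hz", and the sample size is hypothetical.

```python
def processing_hours(n_events, rate_hz):
    """Wall-clock hours for one job to read n_events at rate_hz events per second."""
    return n_events / rate_hz / 3600.0


n_events = 100_000_000   # hypothetical sample size
current_rate_hz = 30     # assumed value for "few tens of Hz"
target_rate_hz = 1000    # target quoted on the slide

print(f"current: {processing_hours(n_events, current_rate_hz):.0f} h")
print(f"target:  {processing_hours(n_events, target_rate_hz):.0f} h")
```

Under these assumptions a single pass over the sample drops from weeks of job time to about a day, which is the motivation for the Beta development mentioned above.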

  10. New Micro Format - II
     • Impact on users:
       • much analysis in Babar is done at ntuple level
       • ntuple analysis code will have to be adapted/converted to new-micro (use of PAW/Fortran discouraged) → potentially disruptive
     • Comment on Objectivity:
       • since interactive Root access is a basic feature of the new-micro → Objectivity event store is not an option
       • new-micro will be in Root/IO format

  11. Event Store Debate
     • Current system: hybrid, with Objectivity at SLAC and IN2P3 and Kanga/Root at RAL and INFN-Padova
     • Problems with Objectivity:
       • lock collisions
       • Prompt Reconstruction and Simulation Production performance issues
       • poor record of scaling with luminosity: every jump in data sample has been accompanied by Objy problems
       • distribution difficulties → getting data samples to Tier C sites
       • other HEP experiments have dropped Objy as an option
       • concerns about viability of the Objectivity company
         • we don't have source code
         • how much expertise will be around in 2007?

  12. Event Store Debate - II
     • Kanga/Root
       • much easier maintenance
       • easier to export
       • smaller event size (although Objy event size is decreasing with deployment of compression and redesign of navigational info)
       • more efficient CPU usage
       • becoming HEP standard: easy to attract manpower to support Kanga/Root

  13. Event Store Debate - III
     • So, why not drop Objy and go with a Kanga/Root system?
     • Cost of migration: effort and disruption. Estimates ranged from 1 to 2 years to achieve migration → most in Babar agree a switch that takes more than 2 years is probably not worth doing.
     • Kanga/Root has some technical issues that need to be addressed:
       • file server to handle huge number of files
       • lack of transactions
       • lack of cross-file associations (e.g. mini-to-micro navigation)
       • bookkeeping
       • staging system
     • Political/human issues.
     • Note: conditions database implemented in Objectivity
       • too costly (estimate 2-3 years) to convert to Root-based DB
       • Babar relationship with Objy will continue in any case
     Work to address these issues is ongoing (not just in Babar context) → probably no show-stoppers, but it is work.

  14. Event Store Debate - IV
     • Alternative to Kanga/Root-only system: a hybrid system where:
       • new-micro in Kanga/Root format only
       • everything else (event reconstruction, simulation production, mini format) in Objectivity
     • Hybrid system has the advantage of an easier, less disruptive migration, but we still need to support 2 event store technologies
     • Final decision/recommendation on event store coming soon

  15. Computing/Analysis Sites
     • The Working Group is just starting on this subject → just present the issues
     • Role of Tier A sites: large sites that significantly reduce the computing burden at SLAC
       • Primarily analysis: IN2P3, RAL
       • Production: INFN-Padova
       • Issues:
         • data replication at Tier A's
         • data partitioning at Tier A's (micro, mini, beam data, MC)
         • transparent access to data across Tier A's (BabarGrid)
         • specialization of Tier A's: skimming, (re-)processing, etc.
     • Role of Tier C sites: smaller sites at remote institutes
       • main contribution so far in MC production (majority of MC events produced away from SLAC)
       • analysis at Tier C's has been difficult due to problems with data distribution → need to resolve with new Computing Model

  16. Implementation
     • Mini
       • Already implemented in Objectivity (minor fixes, improvements ongoing)
       • Feasibility of Root implementation has been studied → could be ready by early 2003
     • New-micro
       • Dual usage (Beta and Root) prototype implementation has been achieved. Additional development needed:
         • customization
         • persistent composites
         • Beta/Framework optimization

  17. Migration
     • Essential requirement: minimal disturbance to Babar's capability to produce physics results
     • Mini
       • currently reprocessing all data in Padova, producing mini format output
       • the mini is a "new" feature, so disruption of ongoing analyses is minimal
     • New-micro
       • exploit Babar's data replication to ease migration
         • maintain old-micro at SLAC and IN2P3 sites
         • introduce new-micro at RAL site
         • users have choice of format during transition period
     • Dependence on other parts of Computing Model
       • use of Tier A sites
       • choice of event store technology, etc.

  18. Summary
     • Babar is currently updating its computing model to be able to deal with the large increase in data set in the coming years
     • A new analysis model, based on the new-micro and mini data formats, has been proposed and largely agreed to
       • the mini will permit more in-depth analysis
       • the new-micro will eliminate largely wasteful/redundant ntuples
     • The working group is also considering the future of event store technologies employed in Babar
       • Should the Objectivity event store be dropped in favor of Root-based technology?
       • Is Kanga/Root ready to be used as a full-scale event store?
       • Does a hybrid system do enough to alleviate the problems of Objy?
     • An important part of the new model will be how best to use remote computing/analysis sites: Tier A and Tier C
       • Work starting on this subject within the Working Group

  19. The following are backup slides

  20. Skims with New-Micro
     • The customization features of the new-micro make it an attractive tool to use with Centralized Skims
     • The idea is that each Analysis Working Group (AWG) will provide the appropriate event selection and content customization to the Central Skim group
     • Small skims will be encouraged: deep copies, which provide fast access, will be possible for small skims (< few % selection rate)
     • In addition to skims, a generic new-micro containing all physics events will be available → important for new analyses
     • Aiming for increased frequency for Central Skims: every 3 months
       • feasibility being evaluated

  21. Tag (nano-dst) Format
     • The Tag format will continue to be maintained
     • Optimization/cleanup to remove unused or redundant information: should get a factor of 2 size reduction

  22. Deep Copy vs. Pointer Skims
     • Deep copy
       • copy full event to new location
       • faster read rate
       • more disk space
     • Pointer (shallow) copy
       • write pointer only
       • slower read rate
       • less disk space
     [Figure: from a store holding Ev 1 to Ev 4, the deep copy holds full copies of Ev 2 and Ev 4, while the shallow copy holds only pointers Ptr 2 and Ptr 4]
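
The trade-off between the two skim types can be sketched with a toy in-memory event store. This is only an illustration of the idea; the `Event` class, the function names, and the dictionary-based store are hypothetical and do not reflect the actual Babar event store or skim code.

```python
import copy
from dataclasses import dataclass


@dataclass
class Event:
    """Toy event: an identifier plus some payload standing in for event data."""
    event_id: int
    payload: list


def deep_copy_skim(store, selected_ids):
    """Deep-copy skim: duplicate each selected event in full.

    Reading the skim never touches the original store (fast),
    but every selected event is stored twice (more disk space).
    """
    return [copy.deepcopy(store[i]) for i in selected_ids]


def pointer_skim(selected_ids):
    """Pointer skim: record only which events were selected (little space)."""
    return list(selected_ids)


def read_pointer_skim(store, pointers):
    """Reading a pointer skim requires going back to the original store."""
    return [store[p] for p in pointers]


# Four events in the store; the selection keeps events 2 and 4, as in the figure.
store = {i: Event(i, [0.0] * 4) for i in range(1, 5)}
selected = [2, 4]

deep = deep_copy_skim(store, selected)
ptrs = pointer_skim(selected)

print([e.event_id for e in deep])
print(ptrs)
print([e.event_id for e in read_pointer_skim(store, ptrs)])
```

In this toy version the extra indirection of the pointer skim is a dictionary lookup; in a real event store it would mean reading scattered events from the original files, which is where the slower read rate comes from.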

  23. Use Cases
     • A mature analysis (like sin2β) could create a Mini skim of a relatively small number of events and work from that.
     • An analysis with loose skim cuts (2-body charmless) would customize a new-micro skim, dropping unneeded info and saving B candidates. The Mini could be used near the end of the analysis, when the number of events is sufficiently reduced.
     • A new analysis would use the allEvents generic new-micro to explore the concept and define cuts. The final analysis could require a customized new-micro skim or a mini skim (if the event sample is small enough).
     • An AWG could produce a skim that serves many analyses. Specific analyses could make pointer skims of the skim, or deep-copy skims if small enough.
     • etc.
