
BABAR Operational Experience with Objectivity ODBMS


Presentation Transcript


  1. BABAR Operational Experience with Objectivity ODBMS
     David R. Quarrie
     Lawrence Berkeley National Laboratory for BABAR Experiment
     DRQuarrie@LBL.GOV

  2. Database Goals
     • Provide storage and access for event data
       • Event store
     • Provide storage and access for detector conditions data
       • Environmental conditions that vary with time
       • Conditions & Ambient databases
     • Configuration Management
       • Keyed access to unique configurations
         • Trigger
         • Detector setpoints
       • Configuration Database
       • Not production management
     • Handle distribution and access across whole collaboration
       • Wide area as well as local area access
     David R. Quarrie: BaBar Operational Experience with Objectivity ODBMS

  3. Experiment Characteristics

  4. Performance Requirements
     • Online Prompt Reconstruction
       • Baseline of 200 processing nodes
       • 100 Hz total (physics plus backgrounds)
         • 30 Hz of Hadronic Physics
           • Fully reconstructed
         • 70 Hz of backgrounds, calibration physics
           • Not necessarily fully reconstructed
     • Physics Analysis
       • DST Creation
         • 2 users at 10^9 events in 10^6 secs (1 month)
       • DST Analysis
         • 20 users at 10^8 events in 10^6 secs
       • Interactive Analysis
         • 100 users at 100 events/sec
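
As a rough check of what these requirements imply, the slide's figures can be turned into aggregate rates. This is a back-of-envelope sketch that is not part of the original talk. It assumes the event counts and durations are powers of ten (10^9, 10^8 and 10^6) and that the prompt reconstruction load is spread evenly over the nodes; all variable and function names are illustrative.

```python
# Back-of-envelope rates implied by the performance requirements slide.
# Names are illustrative; the numbers are the ones quoted on the slide.

OPR_NODES = 200      # baseline number of processing nodes
OPR_RATE_HZ = 100    # total input rate (physics plus backgrounds)

# Average wall-clock budget per event on one node, assuming an even spread.
seconds_per_event_per_node = OPR_NODES / OPR_RATE_HZ


def aggregate_rate(users, events, seconds):
    """Total events/sec the event store must serve for `users` concurrent users."""
    return users * events / seconds


dst_creation = aggregate_rate(users=2, events=10**9, seconds=10**6)
dst_analysis = aggregate_rate(users=20, events=10**8, seconds=10**6)
interactive = 100 * 100  # 100 users at 100 events/sec each

print(seconds_per_event_per_node)               # 2.0
print(dst_creation, dst_analysis, interactive)  # 2000.0 2000.0 10000
```

Under these assumptions each analysis category asks the event store for thousands of events per second in aggregate, compared with 100 Hz written by prompt reconstruction.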

  5. Functionality Summary
     • Basic design/functionality ok
       • No performance or scaling problems with conditions, ambient and configuration databases
       • Security and data protection APIs added
         • Internal to a federation
         • Access to different federations
     • Problems
       • Significant performance/scaling problems with event store
         • Online Prompt Reconstruction
         • Physics Analysis
       • Data Distribution problems
         • Internal within SLAC
         • External to/from remote Institutions
       • Focus of the remainder of the talk

  6. Computing Review 2-4 Aug 1999
     • Identified database performance as major technical concern
     • Recommended database reviews in Feb and Aug 2000
     • Recommended development of limited-function short-term non-Objy solution for micro-DST analysis
     • Recommended setting up of a dedicated Objectivity testbed in order to perform detailed scaling and performance tests

  7. Production Federations
     • Two groups
       • Physics
         • Online
         • Analysis
         • Reprocessing
       • Simulation
         • Generation
         • Analysis
         • Reprocessing
     • Motivations
       • Minimization of interference (particularly with online)
       • Increase the available number of databases
     • Operational experience caused the Online to be split
       • IR2
       • OPR

  8. SLAC Design Hardware Configuration

  9. SLAC Configuration at time of Review
     [Diagram not reproduced; six items are marked with an X]

  10. Testbed Hardware Configuration
      • Testbed hardware available from about 7th August
        • Two datamovers (450)
        • 100+ bronco clients (Ultra-5)
        • Conditions & catalog servers (250)
        • Journal servers (250)
        • Lock servers
      • Two sets of tests
        • Online Prompt Reconstruction (OPR)
        • Physics Analysis
      • Initial tests have focussed on OPR
        • Already well instrumented
        • Expect any performance improvements to apply to analysis as well
        • Dedicated analysis performance tests later

  11. Baseline Configuration
      • We baselined the testbed against the production system to ensure that we started off with the same performance
      • Turned off filtering
        • All input events are being fully reconstructed
        • Easier to understand event rate
        • Will turn it back on again later on in the testing
      • Some of the tests are preliminary and we need to go back & redo them
        • Don’t fully understand all the numbers yet
      • The tests are still underway
        • Numbers are not final

  12. Baseline Results at time of Review
      [Plot not reproduced; annotations: “Asymptotic limit”, “Production set point”]

  13. Knobs to twiddle (tests so far)
      • Minimize catalog operations
      • Separate conditions DB server
      • Separate catalog server
      • Tune AMS server
      • Client file descriptors
      • Client cache sizes
      • Initial container sizes
      • Transaction lengths
      • TCP configuration
      • Multiple AMS processes
      • Database clustering
      • Autonomous partitions
      • Disable filters
      • Singleton Federations
      • Veritas Filesystem optimization
      • Decrease payload per event
      • LM starvation?
      • Loadbalance across datamovers
      • More datamovers
      • Database pre-creation
      • Gigabit lockserver
      • Caching handles
      • Local bootfile
      • Unlock instead of mini-transaction
      • Run OPR with no output
      • Run on shire to bypass AMS

  14. Results so far
      [Plot not reproduced; annotation: “4 datamovers”]

  15. Significant Items
      • Minimize Catalog operations
        • e.g. Named containers
      • Linkable AMS server slow (~3-4 Mbytes/sec)
        • Not the normal AMS - the special one allowing migration/staging
        • Inefficiency in handling 16k file descriptors
          • Located in Objy code
        • First improvement by Andy Hanushevsky
        • Probably more improvements to come
      • Extending containers is expensive
        • During persistent object creation
        • Contrary to advice from Objy engineer
          • For a single process it’s low overhead
        • Causes locking
        • Presize to 50% of average final size
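
The effect of presizing containers can be illustrated with a toy model. The sketch below is not from the talk and does not reflect Objectivity's actual allocation policy: it simply assumes a container grows by a fixed number of pages each time it is extended, and counts how many extend operations (each of which, per the slide, involves locking) are needed before it reaches its final size. The function name and all numbers are hypothetical.

```python
import math


def extensions_needed(final_pages, initial_pages, growth_pages):
    """Count extend operations before a container reaches final_pages.

    Toy model: every extension adds a fixed growth_pages (an assumption).
    """
    if growth_pages <= 0:
        raise ValueError("growth_pages must be positive")
    shortfall = max(0, final_pages - initial_pages)
    return math.ceil(shortfall / growth_pages)


average_final = 1000  # hypothetical average final container size, in pages
growth = 50           # hypothetical pages added per extension

small_start = extensions_needed(average_final, initial_pages=10, growth_pages=growth)
presized = extensions_needed(
    average_final, initial_pages=average_final // 2, growth_pages=growth
)

print(small_start, presized)  # 20 10
```

In this model, presizing to half the average final size halves the number of extensions for an average container, and with many processes writing at once each avoided extension is one less locking operation.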

  16. Significant Items (2)
      • Database clusters
        • Grouping of nodes to databases
        • Reduce the number of processes accessing each database
        • Undocumented locking operation to extend containers
      • Multiple AMS processes per server
        • Currently single threaded
        • Definite improvement with 4 - we’ll try 8
          • N.B. Most servers have 4 cpus
        • Won’t be necessary in 5.2 - the AMS is (finally) multi-threaded
      • Veritas filesystem configuration
        • Single-threaded tests show 40MB/sec read & write
        • Random-write tests (non-Objy) show 7MB/sec throughput
          • We’re seeing about this with 180 nodes
        • Work in progress on optimization
          • Managed 8 MB/sec
      • More datamovers
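
The database clustering idea (group nodes so that fewer processes touch each database) can be sketched as follows. The figure of 180 nodes is the one quoted on the slide; the cluster count of 6, the round-robin assignment, the node names and the function name are all assumptions made for illustration and are not taken from the BABAR implementation.

```python
from collections import defaultdict


def assign_clusters(node_names, n_clusters):
    """Round-robin nodes into clusters; each cluster writes to its own databases."""
    if n_clusters <= 0:
        raise ValueError("n_clusters must be positive")
    clusters = defaultdict(list)
    for index, node in enumerate(node_names):
        clusters[index % n_clusters].append(node)
    return dict(clusters)


nodes = [f"node{i:03d}" for i in range(180)]     # 180 nodes, as quoted on the slide
clusters = assign_clusters(nodes, n_clusters=6)  # 6 clusters is an arbitrary choice

# Without clustering, all 180 processes could contend for the same database.
writers_per_database = max(len(members) for members in clusters.values())

print(len(clusters), writers_per_database)  # 6 30
```

With these illustrative numbers, each database is accessed by 30 processes rather than 180, which is the reduction in contention that the slide describes.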

  17. Problem - Payload per event
      • Problem is our poor implementation, not Objectivity overhead
      • Work is underway to redesign/reimplement this

  18. Future Prompt Reconstruction Tests
      • Reduce payload per event
      • Autonomous partitions
        • Slight hint of lock server saturation (cpu load)
      • Veritas filesystem optimization
      • TCP configuration
      • About to try 250 nodes
      • The bottom line:
        • We’ve met the design goals (with filtering re-enabled)
        • Still lots of possibilities for improvements

  19. Physics Analysis
      • No quantitative tests yet
        • Expect that improvements shown by prompt reconstruction will also improve performance for physics analysis
        • Also expect to find and apply read-only optimizations
      • 3 “typical” jobs being set up
        • CPU bound
        • Medium cpu “skim”
        • Fast physics analysis
      • Testing about to start
        • Also using shire (E10000) as database server
      • Objy 5.2 (with SLAC extensions) will support dynamic load-balancing across multiple servers
        • 20MB/sec per server?

  20. Data Distribution Issues
      • Internal to SLAC
        • Sweeps of data between production federations
        • Database id allocation scheme works well
        • HPSS catalog used as primary location
        • Shadowing of databases as well as copying
        • Bookkeeping is biggest outstanding problem
          • Getting better but a ways to go…
      • External to SLAC
        • Use of 10GB databases has caused major problems
          • Lots of unexpected infrastructure problems (perl, tcsh, etc.)
          • Bugs in size calculation have caused some nominally 2 GB databases to exceed this limit
          • Fix being installed into production now
        • Bandwidth of tools
          • File copies between computers at SLAC

  21. Scaling Problems
      • Total number of database files
        • Being addressed by longRefs in future release
          • Avoids the current need for database files >2GB
            • Cause significant infrastructure problems
          • Timescale “6-9 months”
      • Number of nodes for parallel loading
        • We’re essentially there
        • In process of applying lessons from testbed to production
      • Administration tools operate slower
        • Still an issue
      • Update “starvation”
        • Administration problem since multiple read accesses prevent updates from being applied
        • MROW access expected to solve this

  22. Reliability Problems
      • Lock collisions
        • Better understanding of lock management
        • Avoid leaving lock trails behind
          • Automatic cleanup at end of job
          • Automatic cleanup at regular intervals
      • Separate Online and OPR federations
        • Separated for reliability & OPR lock “firestorms”
          • Unable to provide full calibration feedback
        • Firestorms not in fact an interference between Online and OPR
          • Solved by Objy bug fix and lock optimization
        • New design allows closed loop calibration feedback with separate federations
      • We’re gaining operational experience in production
        • Earlier tests (e.g. MDC2) didn’t scale
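
The “automatic cleanup at end of job” idea can be sketched generically. The code below does not use the Objectivity API: `LockTable` is a hypothetical in-memory stand-in for a lock server, and `job` is a hypothetical wrapper that releases whatever locks a job still holds when it finishes, whether it ends normally or raises an exception.

```python
from contextlib import contextmanager


class LockTable:
    """Hypothetical stand-in for a lock server's table: lock name -> owning job."""

    def __init__(self):
        self.held = {}

    def acquire(self, name, job_id):
        owner = self.held.setdefault(name, job_id)
        if owner != job_id:
            raise RuntimeError(f"{name} is locked by {owner}")

    def release_all(self, job_id):
        """Drop every lock owned by job_id and return how many were dropped."""
        stale = [name for name, owner in self.held.items() if owner == job_id]
        for name in stale:
            del self.held[name]
        return len(stale)


@contextmanager
def job(table, job_id):
    """Run a job and release every lock it still holds, even if it fails."""
    try:
        yield job_id
    finally:
        table.release_all(job_id)


table = LockTable()
try:
    with job(table, "opr-17") as job_id:
        table.acquire("container/events", job_id)
        raise ValueError("simulated failure mid-job")
except ValueError:
    pass

print(table.held)  # {}
```

A process that is killed outright never reaches the `finally` clause, which is consistent with the slide also listing cleanup at regular intervals as a separate measure.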

  23. Lack of automation problems
      • Goal was to achieve understanding and hence reliability using manual procedures, then install automatic procedures
        • Automatic procedures only work once we understand the issues and achieve reliable operation
      • Most of underlying tools now in place
        • e.g. Sweeping of data from one federation to another
        • Still lack necessary bookkeeping
      • Automatic procedures and logging mechanisms (e.g. web pages) slowly being put into place
        • More personnel now available to work on this
      • Still a lot of learning to be done in this area

  24. Risk Analysis - Alternatives to Objectivity?
      • Should we be looking into an alternative?
      • We have attempted to minimize direct dependency on Objectivity
        • Successful for reconstruction/analysis code
        • Not successful for infrastructure
          • Makefiles
          • Administration tools
          • Data distribution
      • MicroDST based on ROOT I/O
        • Takes advantage of Converters & Modules classes for Objy.

  25. Objectivity Usage Statistics
      • >30 sites using Objectivity
        • USA, UK, France, Italy, Germany
      • ~650 licensees
        • People who have signed the license agreement
      • ~400 users
        • People who have created a test federation
      • >100 simultaneous users
        • Monitoring distributed oolockmon statistics
      • 60 developers
        • Have created or modified a persistent class
        • A wide range of expertise
          • 10-15 experts
      • 485 persistent classes

  26. Conclusions
      • Basic design and technology ok
      • Serious performance/scaling problems at startup
        • Lots of learning about how to manage production environment
      • Dedicated testbed has demonstrated good results
        • Prompt Reconstruction now achieving design performance
        • Similar improvements in physics analysis expected
      • Not all these improvements have been fed back into production environments
        • Underway now
      • Is Objectivity suitable for use within HEP?
        • Yes
      • Is it the only solution?
        • No
