
Persistency Framework News for ATLAS

Persistency Framework News for ATLAS. Andrea Valassi (IT-ES), for the Persistency Framework team. ATLAS Database Meeting, 14th March 2012. Outline and summary: COOL validation on Oracle 11g; recent releases (since Oct 2011 talk at ATLAS sw week).


Presentation Transcript


  1. Persistency Framework News for ATLAS • Andrea Valassi (IT-ES), for the Persistency Framework team • ATLAS Database Meeting, 14th March 2012

  2. Outline and summary • COOL validation on Oracle 11g • Recent releases (since Oct 2011 talk at ATLAS sw week) • LCG 60x (COOL, CORAL, POOL): oldest supported version • LCG 61x (COOL, CORAL, POOL): 2012 production • Maintain binary compatibility to COOL 2.8 and CORAL 2.3 • LCG 62x (COOL, CORAL; no POOL): 2012 development • Eventually break binary compatibility in COOL 2.9 and CORAL 2.4 • WLCG Technical Evolution Groups • DB (CORAL/COOL) and DM (POOL) TEGs • Work in progress

  3. COOL on Oracle 11g servers • Validation of COOL performance has been completed • Problem seen in Oct 2011 (task #23366) is now understood to be caused by a bug in Oracle 11.2.0.2 server (Oracle bug 10405897) • Confirmed by enabling/disabling Oracle patch in a private 11.2.0.2 DB • Introduced in 11.2.0.2 (was not there in 11.2.0.1!) • Earliest COOL tests on 11g had seen no issue because they used 11.1.0.7… • Bad news: even a minor server patch can break performance! • Fixed in 11.2.0.3 – good news: we should no longer worry now • No need to downgrade to 10.2.0.5 optimizer (see my Oct 2011 talk) • Performance on 11.2.0.3 is as good as on 10.2.0.5 • Note that the corresponding best execution plans look different (but most likely the algorithm is exactly the same – only their display differs) • Thanks a lot to IT-DB and ATLAS DBAs for their help! • [Figure panels: 11.2.0.2, 11.2.0.3]

  4. COOL performance tests on Oracle • Test script was improved and procedure was documented • See https://twiki.cern.ch/twiki/bin/view/Persistency/CoolPerformanceTests • A detailed performance report can be created with one command • Covering 9 common use cases for querying COOL data • Showing queries, hints, performance plots and execution plans • e.g. https://twiki.cern.ch/twiki/pub/Persistency/CoolPerformanceTests/ALL-11.2.0.3-full.pdf • [Figure panels: 11.2.0.3, 11.2.0.2]

  5. 10g vs. 11g execution plans • With hints: only one (good) plan on 10.2.0.5 and 11.2.0.3 • Plan looks different but is probably exactly the same algorithm • Using INDEX RANGE SCAN (MIN/MAX) • With hints: 3 (bad) plans on 11.2.0.2 • Depending on statistics and bind variables (no exec plan stability!) • All of them are bad and involve INDEX FULL SCAN • [Figure panels: 11.2.0.3 (good); 10.2.0.5 (good); 11.2.0.2 (bad), 1st of 3 plans]
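
The good/bad distinction on this slide can be checked mechanically. The following is a minimal sketch, assuming the execution plan is available as plain text (for example copied from a performance report). The function name `classify_plan` is hypothetical and not part of COOL or its test scripts, and the rule is only the heuristic stated on the slide.

```python
def classify_plan(plan_text: str) -> str:
    """Classify an Oracle execution plan dump as 'good', 'bad' or 'unknown'.

    Heuristic taken from the slide (not a general Oracle rule):
    the bad plans involve INDEX FULL SCAN, while the good plan
    uses INDEX RANGE SCAN (MIN/MAX).
    """
    # Normalise case and collapse whitespace so column padding does not matter.
    text = " ".join(plan_text.upper().split())
    if "INDEX FULL SCAN" in text:
        return "bad"
    if "INDEX RANGE SCAN (MIN/MAX)" in text:
        return "good"
    return "unknown"
```

A check like this could flag a regression after a server patch without anyone reading the plans by hand.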

  6. Other issues on 11g servers • New discoveries and switches/defaults in CORAL & COOL • Disabled SQL plan baselines by default in CORAL • Disabled adaptive cursor sharing in COOL queries • COOL uses hints to stabilize execution plans; the two features above, by contrast, were found to lead to very confusing results • One problem specific to nightly tests has also been fixed • ORA-01466 errors when querying (in RO transactions) test tables that have just been created (bug #87935) • Also present in 9i and 10g servers, more frequent in 11g servers • Workaround: sleep 1s if DDL on the test tables has just happened
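
The ORA-01466 workaround amounts to waiting out the remainder of one second after the last DDL before starting a read-only query. The sketch below illustrates that idea only. The class `DdlGuard` and its methods are hypothetical names, not the code used in the COOL nightly tests, and the clock and sleep functions are injectable purely so the logic can be exercised without real delays.

```python
import time


class DdlGuard:
    """Remember when DDL last ran and wait before the next read-only query."""

    def __init__(self, min_delay=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay  # seconds to leave between DDL and a query
        self._clock = clock
        self._sleep = sleep
        self._last_ddl = None       # no DDL seen yet

    def note_ddl(self):
        """Call right after CREATE/ALTER/DROP on a test table."""
        self._last_ddl = self._clock()

    def wait_before_query(self):
        """Sleep for whatever remains of min_delay; return the seconds slept."""
        if self._last_ddl is None:
            return 0.0
        remaining = self.min_delay - (self._clock() - self._last_ddl)
        if remaining > 0:
            self._sleep(remaining)
            return remaining
        return 0.0
```

Sleeping only for the remaining time, rather than a fixed second on every query, keeps the cost confined to tests that query a table immediately after creating it.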

  7. Oracle client – “11.2.0.1.0p3” • Many bugs relevant to CORAL are fixed in 11.2.0.1.0p3 • Production version for ATLAS and LHCb since June 2011 • See /afs/cern.ch/sw/lcg/external/oracle/11.2.0.1.0p3/doc/README_11.2.0.1.0p3.txt • SELinux issues on SLC5 (bug #45238) • 11.2.0.1 libraries rebuilt with Oracle patches (completed in 11.2.0.1.0p2) • Crash on AMD Opteron quadcore (bug #62194) • 11.2.0.1 libraries rebuilt with Oracle patch (as of 11.2.0.1.0p2) • Remove ~/oradiag directory dump (bug #58917) and workaround for bug in Oracle libraries redefining Kerberos5 symbols (bug #76988) • Custom sqlnet.ora configuration file (completed in 11.2.0.1.0p3) • A newer client 11.2.0.3 is available but I have not tested it yet • IIRC I had tested 11.2.0.2 but the SELinux and AMD fixes were missing • Upgrading to the 11.2.0.3 client is on my (low priority) to-do list • If I discover that the patches are not there, I would stick to 11.2.0.1.0p3 rather than reapplying the patches on top of 11.2.0.3.0 (to be discussed)

  8. LCG 61b for LHCb (Oct 2011) • Motivation: urgent Xrootd bug fix in ROOT (5.30.04) • Not sure if this was used by ATLAS (which requested 61c later on) • POOL 2.9.18 • Minor fixes in PersistencySvc • CORAL 2.3.19 • Minor fixes to help in the analysis of Oracle 11g server performance • Environment variable CORAL_ORA_OPTIMIZER_FEATURES • COOL 2.8.11a • Rebuild of previous COOL 2.8.11 (for ATLAS in LCG 61a) • For full details see the release notes on TWiki

  9. LCG 60e for ATLAS (Nov 2011) • Motivation: urgent CINT bug fix in ROOT (5.28.00h) • Essentially no other changes • POOL 2.9.16a • Rebuild of previous POOL 2.9.16 (for ATLAS in LCG 60d) • CORAL 2.3.17a • Rebuild of previous CORAL 2.3.17 (for ATLAS in LCG 60d) • COOL 2.8.10c • Rebuild of previous COOL 2.8.10b (for ATLAS in LCG 60d) • For full details see the release notes on TWiki

  10. LCG 62 for ATLAS (Dec 2011) • Motivation: major upgrade in ROOT (5.32) and Boost (1.48) • This is the first release without POOL! • First release on gcc46 (on SLC5 – prepare for the move to SLC6) • Complete 11g move (and other changes) in CORAL and COOL • CORAL 2.3.20 • Useful changes to improve analysis on 11g servers • Disable SQL plan baselines by default (unless CORAL_ORA_USE_SQL_PLAN_BASELINES is set) • Allow selective control over optimizer features and bug fixes by the CORAL_ORA_FIX_CONTROL environment variable • Fixes in the simple expression parser • COOL 2.8.12 • Useful changes to improve analysis on 11g servers • Environment variable COOL_ORA_OPTIMIZER_FEATURES • Disable adaptive cursor sharing (add the NO_BIND_AWARE hint) • For full details see the release notes on TWiki
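
The switches listed on this slide are plain environment variables, so they can be set for a single test process without touching the rest of the session. The sketch below shows one way to build such an environment. Only the variable names come from the slide. The helper `oracle_tuning_env`, its parameters, and the value `"1"` used to mark a variable as set are assumptions; the value syntax each variable accepts is not given here and should be taken from the release notes.

```python
import os


def oracle_tuning_env(base=None, optimizer_features=None,
                      fix_control=None, use_sql_plan_baselines=False):
    """Return a copy of an environment with the 11g analysis switches applied.

    Variable names are from the slide; accepted value formats are NOT
    specified there, so whatever strings are passed in are used verbatim.
    """
    env = dict(os.environ if base is None else base)
    if optimizer_features is not None:
        env["COOL_ORA_OPTIMIZER_FEATURES"] = optimizer_features
    if fix_control is not None:
        env["CORAL_ORA_FIX_CONTROL"] = fix_control
    if use_sql_plan_baselines:
        # Baselines are disabled unless this variable is set;
        # "1" is an assumed value, the slide only says "is set".
        env["CORAL_ORA_USE_SQL_PLAN_BASELINES"] = "1"
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` so that only the child process sees the modified settings.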

  11. LCG 61c for ATLAS (Dec 2011) • Motivation: urgent CINT bug fix in ROOT (5.30.05) • Complete port to 11g in CORAL and COOL (rebuild LCG62 tags) • POOL 2.9.19 • Fixes and improvements in collection packages • CORAL 2.3.20a • Rebuild of previous CORAL 2.3.20 (for ATLAS in LCG 62) • COOL 2.8.12a • Rebuild of previous COOL 2.8.12 (for ATLAS in LCG 62) • For full details see the release notes on TWiki

  12. LCG 62a for ATLAS (Dec 2011) • Motivation: fix LCG62 installation procedure • Also move to frontier client 2.8.5 (with several bug fixes) • CORAL 2.3.20 • No rebuild, move last good installation (for ATLAS in LCG 62) • COOL 2.8.12 • No rebuild, move last good installation (for ATLAS in LCG 62) • For full details see the release notes on TWiki

  13. CORAL 2.3.21 for CMS (Feb 2012) • Upgrade from CORAL 2.3.12 • Pick up many changes prepared over roughly the last year • CORAL 2.3.21 • Fix for ORA-25408 during a transaction rollback (bug #87164) • More fixes in the simple expression parser (bug #91075) • Please remember: the usage of lowercase names and of reserved Oracle words (e.g. SELECT) as column names is strongly discouraged! • Fix memory leaks in OracleAccess • For full details see the release notes on TWiki
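
The naming advice on this slide is easy to enforce before a schema is created. The following is a minimal sketch of such a check. The function `column_name_problems` is a hypothetical helper, not part of CORAL, and `RESERVED_SUBSET` is only a small illustrative subset of the Oracle reserved words.

```python
# Small illustrative subset of Oracle reserved words; NOT the full list.
RESERVED_SUBSET = frozenset({
    "SELECT", "FROM", "WHERE", "TABLE", "ORDER",
    "GROUP", "DATE", "NUMBER", "LEVEL", "SIZE",
})


def column_name_problems(name):
    """Return a list of reasons why a column name is discouraged (empty if none)."""
    problems = []
    if name != name.upper():
        problems.append("contains lowercase characters")
    if name.upper() in RESERVED_SUBSET:
        problems.append("is an Oracle reserved word")
    return problems
```

For example, `column_name_problems("select")` reports both problems, while a name such as `IOV_SINCE` passes cleanly.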

  14. WLCG TEGs • Data Management TEG – POOL support • LHCb (like CMS previously) is essentially no longer using POOL • Replaced by direct ROOT; only Gaudi (not for long) still needs POOL • ATLAS will no longer need POOL support as of LCG62, where a custom package derived from it is built and maintained by ATLAS • Database TEG – CORAL and COOL usage and support • Review of conditions data handling in ATLAS, CMS and LHCb • Survey of COOL usage in ATLAS will be useful (AndreaF’s talk) • Experiment requests for COOL, CORAL and CoralServer support • More details in Dario’s talk

  15. Other issues and work in progress • More work on CORAL connection management • Work on network glitch handling should converge quite soon • Better understanding of CORAL interaction with Oracle TAF (Transparent Application Failover) – e.g. ORA-25408 errors • Port to gcc46 and clang30 on SLC6 is now complete • Will eventually be included in the releases • Cleaning up CORAL and (mainly) COOL API extensions • Will be in LCG62x – or more likely in LCG63 with ROOT 5.34 • Not for ATLAS production usage in 2012 • Example: COOL “vector payload”

  16. Conclusions • COOL migration to 11g servers is now complete • Performance affected by bug in 11.2.0.2, fixed in 11.2.0.3 • Detailed performance reports can now be easily produced • Several COOL and CORAL releases • LCG61 will be the ATLAS production version in 2012 • LCG62 will be the ATLAS development version in 2012 • With API extensions in CORAL & COOL; and without POOL • WLCG Technical Evolution Groups • DM: agreement to move POOL to ATLAS as of LCG62 • DB: review of usage and support model for CORAL & COOL
