CORAL network glitch

Andrea Valassi (IT-ES), IT-ES Persistency Team meeting, 19th November 2010.


Presentation Transcript


  1. CORAL network glitch. Andrea Valassi (IT-ES), IT-ES Persistency Team meeting, 19th November 2010

  2. “Network glitch” overview
     • Reported by all experiments in various cases
       • “A transaction is not active” in CORAL server (bug #65597)
       • ORA-24327 “need explicit attach” in ATLAS/CMS (bug #24327)
       • OracleAccess crash after losing session in LHCb (bug #73334)
     • What should CORAL do? Many different scenarios
       • e.g. non-serializable R/O transaction: should reconnect and restart it
       • e.g. DDL not committed in update transaction: cannot do anything
     • What is CORAL doing now?
       • Correctly reconnecting in some cases (existing useful features)
       • Not doing anything in other cases (missing useful features)
       • Reconnecting in the wrong way in other cases (bugs!)
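The "many different scenarios" on slide 2 can be read as a decision table. The following is a minimal sketch of such a table, not CORAL code: `TxKind`, `Recovery`, `TxSnapshot` and `chooseRecovery` are hypothetical names. Only two rows come from the slide (non-serializable read-only: reconnect and restart; uncommitted DDL in an update transaction: nothing can be done). The other rows are assumptions, marked as such in the comments.

```cpp
#include <cassert>

// Hypothetical types for illustration; none of these exist in CORAL.
enum class TxKind { None, ReadOnly, SerializableReadOnly, Update };
enum class Recovery { ReconnectOnly, ReconnectAndRestartTx, Fail };

// What we remembered about the transaction before the glitch.
struct TxSnapshot {
    TxKind kind;
    bool hasUncommittedChanges;  // e.g. DDL not yet committed
};

inline Recovery chooseRecovery(const TxSnapshot& tx) {
    switch (tx.kind) {
    case TxKind::None:
        // Assumption: with no transaction open, reconnecting is enough.
        return Recovery::ReconnectOnly;
    case TxKind::ReadOnly:
        // From the slide: non-serializable R/O can be restarted.
        return Recovery::ReconnectAndRestartTx;
    case TxKind::SerializableReadOnly:
        // Assumption: the original snapshot cannot be reproduced.
        return Recovery::Fail;
    case TxKind::Update:
        // From the slide: uncommitted DDL cannot be recovered.
        // Assumption: an update transaction with no changes yet may restart.
        return tx.hasUncommittedChanges ? Recovery::Fail
                                        : Recovery::ReconnectAndRestartTx;
    }
    return Recovery::Fail;  // unreachable; keeps compilers quiet
}

// Usage example covering the two scenarios named on the slide.
inline void exampleChooseRecovery() {
    assert(chooseRecovery(TxSnapshot{TxKind::ReadOnly, false})
           == Recovery::ReconnectAndRestartTx);
    assert(chooseRecovery(TxSnapshot{TxKind::Update, true})
           == Recovery::Fail);
}
```

Writing the policy as one pure function makes each scenario easy to catalogue and to test in isolation.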

  3. General directions
     1. Catalog the different scenarios
     2. Prepare tests for each different scenario
        • Using CppUnit…
     3. Prototype the implementation changes
        • ConnectionSvc and/or plugins?

  4. Connection, session, transaction
     • A network glitch causes a loss of many states:
       • The state of the connection
       • The state of the session
       • The state of the transaction
     • We must separately keep track of each ‘old’ state
       • And then separately restore each state (only if possible/correct)
     • Example: two sessions over a shared connection
       • We must reconnect once, restart two logical sessions, and then restart up to two transactions if possible
     • It may be appropriate to restore/refresh the states in the three separate classes
       • Connection, Session, Transaction
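The two-sessions example on slide 4 can be sketched as a toy model. This is an illustration under stated assumptions, not the CORAL implementation: `Connection`, `Session` and `Transaction` here are plain hypothetical structs. Each keeps a `requested` flag (the 'old' state the user asked for) next to a `live` flag (what the backend currently has), and the `restartable` flag stands in for whatever rule decides that a transaction may be restarted.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy model; these are not the CORAL classes.
struct Transaction {
    bool requested = false;   // 'old' state: the user started it
    bool live = false;        // current state on the backend
    bool restartable = true;  // assumption: e.g. non-serializable read-only
};

struct Session {
    bool requested = false;
    bool live = false;
    Transaction tx;
};

struct Connection {
    bool live = false;
    int reconnectCount = 0;
    std::vector<Session*> sessions;  // logical sessions sharing one connection
};

// A glitch wipes every live state but leaves the remembered states alone.
inline void glitch(Connection& c) {
    c.live = false;
    for (Session* s : c.sessions) {
        s->live = false;
        s->tx.live = false;
    }
}

// Restore each level separately, and only where it is possible/correct.
inline void recover(Connection& c) {
    c.live = true;
    ++c.reconnectCount;  // one physical reconnect for all sessions
    for (Session* s : c.sessions) {
        if (!s->requested) continue;  // never open what was not open before
        s->live = true;
        if (s->tx.requested && s->tx.restartable) s->tx.live = true;
    }
}

// Usage example: two sessions over one connection, one with a transaction.
inline void exampleSharedConnection() {
    Session a, b;
    a.requested = a.live = true;
    a.tx.requested = a.tx.live = true;
    b.requested = b.live = true;  // session b has no transaction
    Connection c;
    c.live = true;
    c.sessions.push_back(&a);
    c.sessions.push_back(&b);

    glitch(c);
    recover(c);
    assert(c.reconnectCount == 1);
    assert(a.live && b.live);
    assert(a.tx.live && !b.tx.live);
}
```

Keeping the remembered state separate from the live state is what lets recovery reconnect once while still deciding session by session, and transaction by transaction, what to restore.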

  5. Detecting a network glitch
     • ‘I am not connected’ does not mean ‘I lost the connection’
       • We need a method/mechanism separate from “isConnected”, “isUserSessionActive” or “Transaction::isActive”
       • for instance: connectionWasLost(), sessionWasLost(), transactionWasLost()
     • Again: we must keep track of the old state…
       • Example: we should NOT start a new transaction if there was no transaction active before the glitch!
       • Add some tests also for some similar scenarios…
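The distinction on slide 5 between "not active" and "lost" can be sketched with one small class. `TrackedTransaction` and `onGlitch()` are hypothetical names invented for this sketch. `transactionWasLost()` takes its name from the slide's suggestion, but the signature and semantics shown here are an assumption, not an existing CORAL API.

```cpp
#include <cassert>

// Hypothetical class for illustration; not the CORAL transaction interface.
class TrackedTransaction {
public:
    TrackedTransaction() : m_requested(false), m_live(false) {}

    void start()  { m_requested = true;  m_live = true; }
    void commit() { m_requested = false; m_live = false; }

    // Assumed hook, called when the backend reports a dropped connection.
    void onGlitch() { m_live = false; }

    // 'Not active' alone does not say why the transaction is not active.
    bool isActive() const { return m_live; }

    // True only if a transaction the user started was killed by a glitch.
    bool transactionWasLost() const { return m_requested && !m_live; }

    // Restart only what was lost; never start a transaction nobody asked for.
    bool restoreIfLost() {
        if (!transactionWasLost()) return false;
        m_live = true;
        return true;
    }

private:
    bool m_requested;  // remembered 'old' state
    bool m_live;       // current state
};

// Usage example: the slide's "should NOT start a new transaction" case.
inline void exampleDetectLoss() {
    TrackedTransaction idle;  // no transaction before the glitch
    idle.onGlitch();
    assert(!idle.isActive() && !idle.transactionWasLost());
    assert(!idle.restoreIfLost());

    TrackedTransaction busy;
    busy.start();
    busy.onGlitch();
    assert(!busy.isActive() && busy.transactionWasLost());
    assert(busy.restoreIfLost() && busy.isActive());
}
```

Both objects report `isActive() == false` after the glitch; only the remembered state tells them apart.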

  6. Recovering from the glitch
     • In general: refresh instances rather than create new ones
       • Previous CORAL was closing the session and creating a new one
         • This leads to segmentation faults and other problems
       • Better approach: keep existing C++ instances and refresh them
     • Add new methods specific to refreshing the states
       • Separately for the three (or more) classes
       • Encapsulate all loops in those methods
         • e.g. for how long should we retry to reconnect?
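A minimal sketch of the "refresh in place, with the retry loop encapsulated" idea from slide 6 follows. `RefreshableSession`, `RetryPolicy` and the injected `tryConnect` callback are hypothetical names, and the attempt count and delay are placeholder answers to the slide's open question about how long to retry.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical retry settings; the right values are an open question.
struct RetryPolicy {
    int maxAttempts;
    std::chrono::milliseconds delay;
};

// Hypothetical class for illustration; not a CORAL class.
class RefreshableSession {
public:
    explicit RefreshableSession(std::function<bool()> tryConnect)
        : m_tryConnect(std::move(tryConnect)), m_live(false), m_attempts(0) {}

    // Refresh this same instance, so pointers held by user code stay valid.
    // The retry loop lives here and nowhere else.
    bool refresh(const RetryPolicy& policy) {
        for (int i = 0; i < policy.maxAttempts; ++i) {
            ++m_attempts;
            if (m_tryConnect()) {
                m_live = true;
                return true;
            }
            if (i + 1 < policy.maxAttempts)
                std::this_thread::sleep_for(policy.delay);
        }
        m_live = false;
        return false;
    }

    bool isLive() const { return m_live; }
    int attempts() const { return m_attempts; }

private:
    std::function<bool()> m_tryConnect;  // stand-in for the real reconnect
    bool m_live;
    int m_attempts;  // total attempts over the object's lifetime
};

// Usage example: the backend comes back on the third attempt.
inline void exampleRefresh() {
    int calls = 0;
    RefreshableSession session([&calls]() { return ++calls >= 3; });
    RefreshableSession* held = &session;  // pointer kept by user code
    assert(session.refresh(RetryPolicy{5, std::chrono::milliseconds(0)}));
    assert(held->isLive());
    assert(session.attempts() == 3);
}
```

Because the caller only sees `refresh()` succeed or fail, the retry count and delay can change later without touching user code.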

  7. Generic coding conventions
     • Long discussions last year and some hints on twiki
       • But not completely formalised (sorry…)
     • Please avoid
       • names that are not clear/relevant
       • files that contain classes with different names
         • e.g. ScopedTransactionStatus in QueryMgr
     • Please do
       • keep it simple whenever possible!
       • avoid very general approaches to solve simple specific issues
       • avoid adding classes/headers that are not relevant