OraMonPlans 08/04



  1. OraMonPlans 08/04

  2. Topics
     • Enhancements
        • OraMon DB redundancy layer
        • Compare and fix OraMon configurations
        • Expiry of historical data
        • Saving disk space
     • OraMonArch
     • Bugs
     • Others
        • OraMon OO development with Together
        • OraMon changes for Maciej’s alarm interfacing system?

  3–4. OraMon DB redundancy layer
     Requirements:
     • OraMon should retry connecting after losing its DB connection.
       Currently (as of OraMon 0.0.3), upon a DB connection failure, OraMon issues a [FATAL] log entry (including the failure kind) and stops.
     • OraMon should support a ‘Do(Not)InsertSamples’ command.
       Currently, OraMon inserts samples or not according to the value of the environment variable MR_READONLY.
     • OraMon should have a ‘HeartBeat’ command.
       Currently, one may check whether an OraMon instance is alive by issuing an MR API query to it (via lemon-utils/lemon-cli.pl).
     Approaches to satisfying ‘retry connect’ and ‘Do(Not)InsertSamples’:
     • ‘External’: (set the relevant variables and) start OraMon.
       Pros: simple to implement; no internal changes to OraMon.
       Con: a few minutes of downtime.
     • ‘Internal’: change OraMon to satisfy the requirements by adding specific code.
       Pros and cons are the opposite of those of ‘External’.

  5. OraMon DB redundancy layer: ‘External’ solutions
     • Retry connecting after losing the DB connection
       A simple service (similar to restart-oramon) that runs
         /etc/rc.d/init.d/OraMon start
       after OraMon stops, if the ‘failure kind’ belongs to a TBD failure set.
     • ‘InsertSamples’ command to OraMon
       Restart OraMon after setting or unsetting MR_READONLY:
       • Do insert: unset MR_READONLY ; /etc/rc.d/init.d/OraMon restart
       • Do not insert: export MR_READONLY=yes ; /etc/rc.d/init.d/OraMon restart
     • OraMon ‘HeartBeat’
       Check for a sane response to a lemon-cli.pl query. The response should not be:
         Failed to MRs_getSamples() : #-1 : Connection refused
       Example:
         perl lemon-utils/lemon-cli.pl --metrics="10002" --nodes="lcgmon002d" --remote-server="http://ccs002d:12510"
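
The decision logic of the ‘External’ watchdog and heartbeat check could be sketched in Python as below. This is a minimal illustration under assumptions that are not on the slide: the failure kind is taken to be the last word of the final [FATAL] line in the OraMon log, and the Oracle error codes in RESTARTABLE_FAILURES merely stand in for the TBD failure set. All function names are hypothetical.

```python
import subprocess

# Placeholder for the slide's "TBD failure set": example Oracle errors for a
# lost connection (ORA-03113, ORA-03114) and a missing listener (ORA-12541).
RESTARTABLE_FAILURES = {"ORA-03113", "ORA-03114", "ORA-12541"}

INIT_SCRIPT = "/etc/rc.d/init.d/OraMon"


def last_failure_kind(log_lines):
    """Return the failure kind of the last [FATAL] line, or None.

    Assumption: the failure kind is the last word of that line.
    """
    for line in reversed(log_lines):
        if "[FATAL]" in line:
            return line.split()[-1]
    return None


def should_restart(log_lines, restartable=RESTARTABLE_FAILURES):
    """Restart only if OraMon stopped with a failure kind from the set."""
    return last_failure_kind(log_lines) in restartable


def insert_mode_env(base_env, insert_samples):
    """Copy of base_env with MR_READONLY unset (insert) or set to 'yes'."""
    env = dict(base_env)
    if insert_samples:
        env.pop("MR_READONLY", None)
    else:
        env["MR_READONLY"] = "yes"
    return env


def heartbeat_ok(cli_output):
    """A sane lemon-cli.pl answer must not contain the refusal message."""
    return "Connection refused" not in cli_output


def run_init_script(action, env=None):
    """Equivalent of running '/etc/rc.d/init.d/OraMon <action>'."""
    return subprocess.run([INIT_SCRIPT, action], env=env).returncode
```

A watchdog loop would read /var/log/OraMon.log after OraMon stops and call run_init_script("start") only when should_restart() is true; a ‘DoNotInsertSamples’ request would become run_init_script("restart", insert_mode_env(os.environ, False)).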

  6. OraMon DB redundancy layer: ‘Internal’ solutions
     • Retry connecting after losing the DB connection
       Change the OraMon code: when an SQL command fails because of a failure in a TBD failure set, do not fail; instead, first try to reconnect (a few times, sleeping between attempts).
     • ‘InsertSamples’ command to OraMon
       Reuse and extend the existing proprietary ‘insert samples’ protocol:
       • Define a (set of) ‘pseudo’ metricId(s) that OraMon interprets as commands rather than as metrics to be inserted.
       • Commands arrive on a dedicated port or on the samples port.
       • Commands may be added to the ‘metrics configuration’ (or a similar configuration).
     • OraMon ‘HeartBeat’: the same as in the ‘External’ solutions (previous slide).
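
The two ‘Internal’ ideas, reconnect-and-retry around an SQL command and pseudo metricIds interpreted as commands, might look roughly like this. It is a sketch only: OraMon itself is not written in Python, TransientDBError stands in for the TBD failure set, run_sql/reconnect are hypothetical callables, and the reserved metricIds 900001 to 900003 are invented for illustration.

```python
import time


class TransientDBError(Exception):
    """Stands in for a DB failure from the (TBD) retryable failure set."""


def execute_with_reconnect(run_sql, reconnect, attempts=3, delay_s=5.0,
                           sleep=time.sleep):
    """Run run_sql(); on a transient failure, sleep, reconnect and retry.

    When all attempts fail, the last error is re-raised, so the existing
    behaviour ([FATAL] log entry and stop) still applies.
    """
    for attempt in range(1, attempts + 1):
        try:
            return run_sql()
        except TransientDBError:
            if attempt == attempts:
                raise
            sleep(delay_s)
            try:
                reconnect()
            except TransientDBError:
                pass  # still down; the next run_sql() attempt decides what happens


# Hypothetical reserved 'pseudo' metricIds that are commands, not samples.
COMMANDS = {900001: "InsertSamples", 900002: "DoNotInsertSamples",
            900003: "HeartBeat"}


class SampleReceiver:
    def __init__(self):
        self.insert_samples = True
        self.inserted = []  # stands in for rows written to the DB

    def handle(self, metric_id, value):
        command = COMMANDS.get(metric_id)
        if command == "InsertSamples":
            self.insert_samples = True
        elif command == "DoNotInsertSamples":
            self.insert_samples = False
        elif command == "HeartBeat":
            return "alive"
        elif self.insert_samples:
            self.inserted.append((metric_id, value))
        return None
```

Routing commands through the normal sample path means no new port or protocol is strictly needed; the cost is that ordinary metric IDs must never collide with the reserved range.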

  7. Changing metrics configuration
     Related OraMon documentation: Changing metrics configuration
     German’s email of 19/7, "[Lemon] changes in metric data fields":
     • Changes (adding/removing/changing data fields) to latestOnly metrics: OK.
       David:
       - OK.
       - When a new configuration is applied, all (TBD: changed) latest tables and views will be dropped automatically.
     • Changes to latestOnly metrics that have a historical table defined but no longer used (reconfigured from 'latestOnly=false' to 'true'): drop the historical table altogether.
       David:
       - OK.
       - Should the tables of removed metrics also be dropped?
       - (Also, is archiving required for tables that are about to be dropped?)

  8. Changing metrics configuration (cont.)
     • Changes to 'historical' metrics (not latestOnly):
       - Added data fields: OK.
         David: TBD: OK if and only if adding fields does not complicate restoring old data that lack the new fields.
       - Removed and changed data fields: drop the historical values in the DB, or refuse (a global OraMon configuration Boolean parameter).
         David: I doubt that dropping historical data will solve the potential problems of restoring older data. Assuming this is correct, ‘refuse’ will always be applied.
     • Changes where historical values should be preserved: define a new metric ID. I don't think any conversion magic is appropriate; to be consistent, it would also have to be applied to all historical data already archived into CASTOR, which is far from trivial.
       David: As a rule of thumb, I suggest avoiding changes to archived data.
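
The rules from slides 7 and 8 can be condensed into one decision function. This is a hedged sketch: a metric configuration is modeled (by assumption) as a latestOnly flag plus (name, type) field pairs, the action names are hypothetical, and the latestOnly=true to false case is not discussed on the slides, so it is simply treated as "apply" here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Action(Enum):
    APPLY = "apply the new configuration"
    DROP_LATEST_OBJECTS = "drop (and recreate) latest tables and views"
    DROP_HISTORICAL_TABLE = "drop the historical table"
    DROP_HISTORICAL_VALUES = "drop historical values"
    REFUSE = "refuse the change"


@dataclass(frozen=True)
class MetricConfig:
    latest_only: bool
    fields: Tuple[Tuple[str, str], ...]  # (field name, field type) pairs


def plan_change(old, new, drop_on_incompatible=False):
    """Return the actions for one metric whose configuration changed.

    drop_on_incompatible models the proposed global Boolean parameter;
    David's comment suggests it would in practice stay False ('refuse').
    """
    old_fields, new_fields = dict(old.fields), dict(new.fields)
    if new.latest_only:
        actions = []
        if old_fields != new_fields:
            actions.append(Action.DROP_LATEST_OBJECTS)
        if not old.latest_only:  # reconfigured from latestOnly=false to true
            actions.append(Action.DROP_HISTORICAL_TABLE)
        return actions or [Action.APPLY]
    if old.latest_only:
        # latestOnly=true -> false is not covered on the slides; assume no
        # historical data exists yet, so nothing can clash.
        return [Action.APPLY]
    removed_or_changed = any(
        new_fields.get(name) != ftype for name, ftype in old_fields.items()
    )
    if removed_or_changed:
        return [Action.DROP_HISTORICAL_VALUES if drop_on_incompatible
                else Action.REFUSE]
    return [Action.APPLY]  # no change, or only added fields (TBD per David)
```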

  9. Changing metrics configuration: David’s suggestions
     • Observation: for OraMon, adding a field is about as complex as applying other ‘compatible’ changes (removing a field, changing a field length).
     • To avoid clashes between existing OraMon data schemas and previously archived data, I suggest that:
       • each change to a metricClass gets new metricIds;
       • previous metricIds are marked ‘obsolete’ by a new metadata field;
       • previous metricIds may have a ‘replaced by metricId’ metadata field.
     • To preserve older data while allowing data schema changes, I suggest that when a ‘compatible’ change is applied to a metricClass, its existing historical table is renamed to the new name, and OraMon applies the required fixes automatically.
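
David’s metricId scheme (new ID per change, ‘obsolete’ flag, ‘replaced by’ link, historical table renamed) could be modeled as follows. A sketch only: the class and field names are hypothetical, and so is the HIST_<metricId> table naming; only the Oracle `ALTER TABLE ... RENAME TO ...` form is standard SQL syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MetricIdInfo:
    metric_id: int
    metric_class: str
    obsolete: bool = False              # proposed new metadata field
    replaced_by: Optional[int] = None   # proposed 'replaced by metricId' field


@dataclass
class MetricRegistry:
    ids: Dict[int, MetricIdInfo] = field(default_factory=dict)

    def add(self, metric_id, metric_class):
        self.ids[metric_id] = MetricIdInfo(metric_id, metric_class)

    def change_class(self, old_id, new_id):
        """A metricClass change gets a new metricId; the old one becomes obsolete."""
        old = self.ids[old_id]
        if new_id in self.ids:
            raise ValueError(f"metricId {new_id} already exists")
        self.add(new_id, old.metric_class)
        old.obsolete = True
        old.replaced_by = new_id

    def current(self, metric_id):
        """Follow 'replaced by' links to the metricId currently in use."""
        while self.ids[metric_id].replaced_by is not None:
            metric_id = self.ids[metric_id].replaced_by
        return metric_id


def rename_historical_table_ddl(old_id, new_id):
    # Hypothetical table naming scheme; Oracle syntax: ALTER TABLE ... RENAME TO ...
    return f"ALTER TABLE HIST_{old_id} RENAME TO HIST_{new_id}"
```

Because old metricIds are never reused, previously archived data keep pointing at a schema that still exists in the metadata, which is the clash this proposal aims to avoid.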

  10. Expiry of historical data
     4162: expiry of historical data
     "To be discussed at CERN" (2004-Jul-19 12:14, jveldik)

  11. Saving disk space
     • Compress partitions
       - How: the OraMon partitions thread compresses partitions that are at least one day old.
       - TBD: may cause unexpected complications.
       - Saving space is important, but not urgent.
     • Make numbers (and strings) smaller
       - May be applied after all ‘Changing metrics configuration’ items have been applied.
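
Selecting the partitions to compress is simple once each partition’s upper time bound is known; a possible sketch follows. It assumes (hypothetically) range partitions by sample time, with the exclusive ‘VALUES LESS THAN’ bound available per partition; table and partition names are invented. In Oracle, basic compression takes effect when a segment is rewritten (e.g. by `MOVE ... COMPRESS`), not for ordinary row-by-row inserts, which fits compressing only partitions that no longer receive samples. Moving a partition also generally leaves its index partitions unusable until rebuilt, which may be one of the ‘unexpected complications’.

```python
from datetime import datetime, timedelta


def partitions_to_compress(partitions, now, min_age=timedelta(days=1)):
    """Pick uncompressed partitions whose upper time bound is at least min_age old.

    partitions: iterable of (name, upper_bound, compressed) tuples, where
    upper_bound is the partition's exclusive 'VALUES LESS THAN' timestamp.
    """
    return [name for name, upper_bound, compressed in partitions
            if not compressed and now - upper_bound >= min_age]


def compress_partition_ddl(table, partition):
    # Oracle basic compression: rewriting the partition stores its rows compressed.
    return f"ALTER TABLE {table} MOVE PARTITION {partition} COMPRESS"
```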

  12. OraMonArch
     OraMonArch documentation
     • If ‘archive without dropping’ is required, the implementation must be enhanced, since the current implementation drops the data as it returns it.
     • Two OraMonArch instances, continuous and non-continuous: non-continuous requests cannot be queued.
     • OraMonArch transaction error when it stops/crashes after a DDL command and before updating the relevant checkpoint.
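
Oracle commits implicitly around DDL statements, so a DDL command and the checkpoint update cannot share one transaction. One common way around this, sketched below, is to commit an ‘intent’ record before the DDL and, on restart, reconcile any pending step with the actual DB state. This is not OraMonArch’s current design: the in-memory set and dict merely stand in for the data dictionary and a hypothetical checkpoint table.

```python
class Archiver:
    """In-memory stand-ins: `partitions` for the data dictionary,
    `checkpoint` for a (hypothetical) checkpoint table."""

    def __init__(self, partitions):
        self.partitions = set(partitions)
        self.checkpoint = {"done": [], "pending": None}

    def drop_partition(self, name, crash_after_ddl=False):
        self.checkpoint["pending"] = name   # 1. record intent and commit it
        self.partitions.discard(name)       # 2. DDL (implicitly committed)
        if crash_after_ddl:
            raise RuntimeError("simulated stop/crash after the DDL")
        self._complete(name)                # 3. update the checkpoint

    def _complete(self, name):
        self.checkpoint["done"].append(name)
        self.checkpoint["pending"] = None

    def recover(self):
        """On startup, reconcile a pending step with the real DB state."""
        pending = self.checkpoint["pending"]
        if pending is None:
            return
        if pending not in self.partitions:
            self._complete(pending)             # DDL ran: finish the checkpoint
        else:
            self.checkpoint["pending"] = None   # DDL never ran: redo it later
```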

  13. Bug reports

  14. Bugs found while installing OraMon 0.0.3
     • OraMon views show a time that is one hour later than the real time.
     • OraMonArch/Cont service script (/etc/rc.d/init.d/OraMonArchContCtl): it returns only after completing its work, but it should return immediately. This may cause the computer to hang at reboot.
     • Probable problem: metric validation errors at lcgmon002d differ from those at ccs002d.
     • To be addressed to German: recognizing a metric configuration change by its date causes an rpm update to fail by mistake. Suggested fix: a hard-coded date attribute.
     • To be checked: I suspect that logrotate does not work at ccs002d for /var/log/OraMon.log, because the file had grown to 66M as of 27/7.
     • OraMonArch transaction error when it stops/crashes after a DDL command and before updating the relevant checkpoint (see slide 12).
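
For the service-script bug, the usual pattern is for the ‘start’ action to launch the long-running worker detached and return at once. The real OraMonArchContCtl is an init script, presumably shell; the Python sketch below only illustrates the pattern, and the worker command line and log path are hypothetical.

```python
import subprocess


def start_in_background(argv, log_path="/dev/null"):
    """Start a long-running worker detached from the caller and return at once.

    The caller (e.g. a 'start' action) can record proc.pid in a pid file and
    exit immediately instead of waiting for the work to finish.
    """
    with open(log_path, "ab") as log:
        proc = subprocess.Popen(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # own session, detached from the caller's terminal
            close_fds=True,
        )
    # The child keeps its own duplicate of the log descriptor, so closing ours is safe.
    return proc
```

With the work running in its own session, the boot sequence no longer blocks on the archiver; the matching ‘stop’ action would then signal the recorded PID.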
