Derived Metrics Prototyping

Derived Metrics Prototyping. Dennis.Waldron@cern.ch, IT-FIO-FS. Objective. Initially: how to calculate derived metrics in a global context? Expanded: evaluation of the Heidelberg Fault Tolerance package in the context of a global framework of recovery.

Presentation Transcript


  1. Derived Metrics Prototyping Dennis.Waldron@cern.ch IT-FIO-FS

  2. Objective
     • Initially: how to calculate derived metrics in a global context?
     • Expanded: evaluation of the Heidelberg Fault Tolerance package in the context of a global framework of recovery.
     • Note: derived metrics are also referred to as:
        • Global correlations
        • Combined metrics

  3. Motivation
     • Produce a solution that is:
        • Simple
        • Extensible
        • Highly configurable
        • Powerful
     • A prototype sensor, ‘CMsensor’, exists:
        • Coded in PERL
        • Adheres to the WP4 architecture
        • Can be used in both local and global contexts
        • Limitations apply to the global context

  4. Architecture
     [Diagram. Labelled components: User; System Boundary; Measurement Repository (1…*) with Database, Cache and Subscription; MR Server (OraMonServer); MR API; Node with Collector Agent (MSA), Sensors and Configuration File(s); CMsensor with MRs, sensorAPI, Interpreter and XML files.]

  5. CMsensor
     • Responsible for:
        • Configuration Management
           • Autoload of configuration changes on the fly: instantiation, removal, alteration
        • Subscription Management
           • Manages active subscriptions, re-subscription requests, shutdowns, reconnects, etc.
        • Metric Triggers
           • Calculation of derived metrics can be triggered by:
              • MSA scheduling (GET request)
              • Subscription callback
        • Data Caching
        • Rule Evaluation and Error Handling
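The "autoload of configuration changes on the fly" item can be pictured with a small sketch. The slides do not say how CMsensor detects changes, so the approach below (comparing file modification times between scans) is an assumption, and `scan_dir` and `diff_config` are hypothetical helper names, not part of CMsensor.

```perl
use strict;
use warnings;

# Hypothetical helper: collect modification times of the *.xml files
# in a configuration directory (filename => mtime).
sub scan_dir {
    my ($dir) = @_;
    my %mtime;
    for my $file (glob("$dir/*.xml")) {
        my @st = stat($file) or next;   # skip files that vanished mid-scan
        $mtime{$file} = $st[9];         # element 9 of stat() is the mtime
    }
    return \%mtime;
}

# Hypothetical helper: compare the previous scan with the current one.
# Returns three array refs: files added, files changed, files removed,
# which map onto instantiation, alteration and removal of metrics.
sub diff_config {
    my ($seen, $current) = @_;
    my (@added, @changed, @removed);
    for my $file (sort keys %$current) {
        if (!exists $seen->{$file}) {
            push @added, $file;
        }
        elsif ($seen->{$file} != $current->{$file}) {
            push @changed, $file;
        }
    }
    for my $file (sort keys %$seen) {
        push @removed, $file unless exists $current->{$file};
    }
    return (\@added, \@changed, \@removed);
}
```

A sensor loop could call `scan_dir` periodically and pass the previous and current results to `diff_config` to decide which derived metrics to instantiate, reload or drop.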

  6. CMsensor cont.
     • One XML file exists for each derived metric. It defines:
        • Metric name and description
        • Subscription requirements
        • Metric processing code (rule)

     Example 1:

        <sensorMetric name="example1" description="this is an example metric">
          <metric>10002</metric>
          <rule>
            # PERL code here
            $value = &getMetric($host, $metric) * 50;
            # return value to MR
            &storeSample(03, $mid, 0, $host, $value);
          </rule>
        </sensorMetric>

     Note: $host, $metric and $mid (re-injection id) are locally scoped at execution.
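How a rule might be executed with `$host`, `$metric` and `$mid` in scope can be sketched as follows. This is only an illustration under assumptions: the slides do not show the interpreter, so `run_rule` is a hypothetical name, the use of string `eval` is a guess at the mechanism, and `getMetric` and `storeSample` are stubs backed by made-up data rather than the real sensor functions.

```perl
use strict;
use warnings;

# Made-up repository contents and a list that records stored samples.
my %fake_mr = ('node01' => { 10002 => 3 });
my @stored;

# Stubs standing in for the sensor's functions (behaviour assumed).
sub getMetric   { my ($host, $metric) = @_; return $fake_mr{$host}{$metric} }
sub storeSample { push @stored, [@_] }

# Hypothetical evaluator: the rule text is compiled and run inside this
# sub, so it sees the lexicals $host, $metric, $mid and $value.
# A fatal error inside the rule is trapped and logged, not propagated.
sub run_rule {
    my ($rule, $host, $metric, $mid) = @_;
    my $value;                          # the slide's rule assigns to $value
    my $ok = eval "$rule\n; 1";         # newline guards against a trailing comment
    warn "rule failed: $@" unless $ok;
    return $ok ? 1 : 0;
}

# Rule body taken from Example 1.
my $rule = q{
    $value = &getMetric($host, $metric) * 50;
    &storeSample(03, $mid, 0, $host, $value);
};

run_rule($rule, 'node01', 10002, 20001);
```

With the made-up value 3 for metric 10002, the stub records a sample whose last field is 150. A rule that dies makes `run_rule` return 0 after writing a warning.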

  7. CMsensor cont.

     Example 2:

        <sensorMetric name="daemonUp" description="Does at least 1 instance of a daemon exist across a defined list of nodes">
          <rule>
            # PERL code here
            my $retval = 0;
            my $node;
            my @node_list = split(/ /, $params);
            foreach $node (sort @node_list) {
              if (&getMetric($node, 12345) == 1) {
                $retval = 1;
                last;
              }
            }
            &storeSample(01, $mid, 0, $retval);
          </rule>
        </sensorMetric>
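The logic of Example 2 can be tried outside the sensor with a stubbed `getMetric`. In this sketch the node names and metric values are made up, `daemon_up` is a hypothetical wrapper around the rule body, and the assumption that a missing sample counts as 0 is mine.

```perl
use strict;
use warnings;

# Made-up sample data: metric 12345 per node (1 = daemon instance found).
my %fake_mr = (
    'node01' => { 12345 => 0 },
    'node02' => { 12345 => 1 },
    'node03' => { 12345 => 0 },
);

# Stub standing in for the sensor's getMetric; a missing sample is
# treated as 0 here, which is an assumption.
sub getMetric {
    my ($node, $metric) = @_;
    return $fake_mr{$node}{$metric} // 0;
}

# Same logic as the rule: 1 if any listed node reports the daemon, else 0.
sub daemon_up {
    my ($params) = @_;                  # space-separated node list
    my @node_list = split / /, $params;
    for my $node (sort @node_list) {
        return 1 if getMetric($node, 12345) == 1;
    }
    return 0;
}

print daemon_up('node01 node02 node03'), "\n";
print daemon_up('node01 node03'), "\n";
```

The first call prints 1 because node02 reports the daemon; the second prints 0.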

  8. CMsensor cont.
     • A sensorAPI PERL module implementing the latest ASCII MSA – MS protocol (v1.3) now exists.
     • The sensor uses the new simplified PERL API for MR access.
     • No hard-coded metrics!
     • All fatal messages are trapped and logged, and appropriate action is triggered.
     • The system allows access to sources of information other than the central measurement repository, e.g. LSF.

  9. Limitations
     • Large-volume re-injection causes performance issues with the MSA.
     • The impact of large-volume insertions on the MR is unknown.
     • Derived metrics requiring a lot of processing can cause a backlog of metrics to be processed, hence some scheduled executions may be skipped.
     • The extraction of large volumes of data takes a long time.
        • Can retrieve approx. 266 metrics/second through the simplified API, compared with 4668 through direct SQL calls.
     • Current MR queries are limited to 12,000 values.
        • EDG Bugzilla 2380, 2381
     • Segmentation faults in the subscription mechanism via PERL MRs.
        • EDG Bugzilla 2320, 2366
     • CMsensor lacks some functionality.
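To put the two retrieval rates in proportion, the following back-of-the-envelope calculation uses only the figures quoted above. It assumes that one "value" in the query limit corresponds to one "metric" in the rate figures and that the rates stay constant, neither of which the slides confirm; `seconds_for` is a hypothetical helper.

```perl
use strict;
use warnings;

my $api_rate = 266;      # metrics/second via the simplified API (from the slide)
my $sql_rate = 4668;     # metrics/second via direct SQL calls (from the slide)
my $values   = 12_000;   # current MR query limit (from the slide)

# Time in seconds to fetch $n values at a constant $rate.
sub seconds_for {
    my ($n, $rate) = @_;
    return $n / $rate;
}

printf "direct SQL vs simplified API: %.1fx faster\n", $sql_rate / $api_rate;
printf "simplified API: %.1f s for %d values\n",
    seconds_for($values, $api_rate), $values;
printf "direct SQL:     %.1f s for %d values\n",
    seconds_for($values, $sql_rate), $values;
```

Under those assumptions, direct SQL is about 17.5 times faster, and a full 12,000-value query would take about 45.1 s through the simplified API against about 2.6 s through direct SQL.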

  10. Questions?