
Import XRootD monitoring data from MonALISA

Sergey Belov, JINR, Dubna (Sergey.Belov@cern.ch). DNG section meeting, 30.10.2012.



Presentation Transcript


  1. Import XRootD monitoring data from MonALISA Sergey Belov, JINR, Dubna Sergey.Belov@cern.ch DNG section meeting, 30.10.2012

  2. Motivation
  • XRootD federation monitoring is important for ALICE, ATLAS and CMS
  • It is now more convenient for the experiments to collect the initial monitoring data on their side
  • Two collector types feeding MonALISA are in use:
    • Individual transfer statistics plus some server statistics (in ALICE, based on ALICE developments)
    • Server statistics (in CMS and ATLAS, fed with information from the UCSD collector)
  • Goal: to have this information in Dashboard

  3. What could we get from ML?
  • ALICE: transfers
    • Individual transfer summaries over 60-second intervals
    • Server name, client IP
    • Read/written MB
    • NO transfer ID!
  • CMS, ATLAS: servers
    • Incoming and outgoing traffic
    • Current number of connections and total connections ever
    • Counts of authenticated and unauthenticated logins, number of authentication failures
    • Redirection count
  • For all these parameters the rates (Hz) are also provided

  4. How should it be done? Requirements:
  • Standard way: send information via message brokers in JSON format (ML acts only as a transport)
  • Reliability
    • In message handling along the whole chain
    • No information loss on failures
  • Reasonable behavior when sending messages
    • Send only consistent information
    • Respect connection frequency, authorization, timeouts
    • A few big messages instead of hundreds of small ones
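
The "few big messages instead of hundreds of small ones" requirement can be sketched as a simple batching step. A minimal illustration; the field names below are placeholders, not the final message schema:

```python
import json

def batch_records(records, max_per_message=500):
    """Fold many small monitoring records into a few large JSON messages.
    Illustrative sketch: the envelope layout and batch size are assumptions."""
    for i in range(0, len(records), max_per_message):
        yield json.dumps({"body": {"transfers": records[i:i + max_per_message]}})

# 1200 small records become just three broker messages (500 + 500 + 200)
msgs = list(batch_records([{"read_mb": "1.2"}] * 1200, max_per_message=500))
```

Batching by count is only one option; the real cut-off could equally be message size or a flush timeout.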

  5. Dumping data from MonALISA (1) Steps to get the data:
  • Set up an ML repository
  • Subscribe it to the appropriate monitoring groups (alice, xrootd_cms, xrootd_atlas)
  • Configure ML to consume only the required parameters, but not to store anything
  • Set a custom filter (= handler) that exports the data: the dumper

  6. Dumping data from MonALISA (2) ML result object structure:
  • "farm", "cluster", "node"
    • "node" → xrootd server name
    • The site name can be derived from "farm" or "cluster"
  • timestamp
  • arrays of parameter names and values
  • Most common case: the result object "decays" into objects with just a single parameter name and value in the corresponding arrays
  • Transfer information therefore has to be gathered piece by piece
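
The piece-by-piece gathering of "decayed" results can be sketched as follows (a Python sketch, although the real dumper is a Java ML filter; the field set is illustrative, not the exact ML parameter names):

```python
# A transfer record is complete once all expected fields have arrived for
# a given (node, timestamp) pair; each incoming result carries only one
# parameter name/value. Field names below are assumptions.
EXPECTED = {"client_ip", "read_mb", "written_mb"}

def reassemble(results):
    """results: iterable of (node, timestamp, param_name, param_value)."""
    pending, complete = {}, []
    for node, ts, name, value in results:
        record = pending.setdefault((node, ts), {})
        record[name] = value
        if EXPECTED <= set(record):          # all pieces gathered
            complete.append((node, ts, pending.pop((node, ts))))
    return complete

done = reassemble([
    ("xr1.example.ch", 1123456789, "client_ip", "12.34.56.78"),
    ("xr1.example.ch", 1123456789, "read_mb", "1.234"),
    ("xr1.example.ch", 1123456789, "written_mb", "2.345"),
])
```

Records that never complete would stay in `pending`, which is why the aggregator later needs a timeout/cleanup policy for incomplete chunks.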

  7. Dumping data from MonALISA (3) The dumper:
  • Is called each time the repository receives results from the subscriptions
  • Should be fast enough not to slow everything down (calls for incoming results are consecutive)
  • If message handling or sending were done here, there would be no hope of a reliable or stable solution

  8. Proposed information handling chain
  xrootd servers → collectors → MonALISA → dumper → local (directory) queues → aggregator → local queue → Messaging Transfer Agent(s) → message brokers → Dashboard

  9. Technical solutions (1)
  • ML filter (Dumper)
    • Java class catching incoming results from ML
    • Initial data transformation (decoding IPs, etc.)
    • Stores data in local directory queues
  • Aggregator
    • Python 2.4 program aggregating the Dumper's queues and preparing the final messages to be sent by the MTA
    • Reads/writes messages from/to local directory queues
    • Does message aggregation and grouping
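
The Dumper/Aggregator exchange through a directory queue follows an add / iterate / lock / get / remove pattern. Below is a tiny stdlib stand-in illustrating that pattern; it is not the real ch.cern.dirq or python-dirq API, just a sketch of the idea:

```python
import os, tempfile, uuid

class MiniDirQueue:
    """Toy directory queue: one file per message, atomic rename for
    enqueueing, a .lck rename as the consumer's exclusive lock."""
    def __init__(self, path):
        self.path = path
        os.makedirs(path, exist_ok=True)

    def add(self, data):
        name = uuid.uuid4().hex
        tmp = os.path.join(self.path, name + ".tmp")
        with open(tmp, "w") as f:
            f.write(data)
        os.rename(tmp, os.path.join(self.path, name))  # atomic enqueue
        return name

    def __iter__(self):
        return iter(sorted(n for n in os.listdir(self.path)
                           if not n.endswith((".tmp", ".lck"))))

    def lock(self, name):
        try:  # rename keeps two consumers from grabbing the same message
            os.rename(os.path.join(self.path, name),
                      os.path.join(self.path, name + ".lck"))
            return True
        except OSError:
            return False

    def get(self, name):
        with open(os.path.join(self.path, name + ".lck")) as f:
            return f.read()

    def remove(self, name):
        os.remove(os.path.join(self.path, name + ".lck"))

q = MiniDirQueue(tempfile.mkdtemp())
q.add('{"read_mb": "1.234"}')
for name in q:
    if q.lock(name):
        msg = q.get(name)   # consume exactly once
        q.remove(name)
```

The one-file-per-message, rename-based design is what makes directory queues safe for multiple writers and readers without a daemon in between.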

  10. Technical solutions (2)
  • Directory queue libraries
    • Java implementation: ch.cern.dirq class (by Massimo Paladin)
    • python-dirq (available in the EPEL repository)
  • Messaging Transfer Agent
    • stompclt: a flexible tool to consume and dispatch messages between different sources in a configurable and reliable way (by Lionel Cons); available in the CERN SW repository, in EPEL soon
    • For now the STOMP protocol is enough (AMQP protocol support is on the way with the amqpclt tool)
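
For this setup the MTA role could look roughly like the fragment below. A hypothetical invocation: the paths and broker URI are placeholders, and the option names should be verified against the stompclt documentation for the installed version:

```shell
# Drain the aggregator's outgoing directory queue and dispatch each
# message reliably to a STOMP broker (placeholder host and path).
stompclt --reliable \
         --incoming-queue path=/var/spool/xrootd/outgoing \
         --outgoing-broker uri=stomp://broker.example.ch:6163 \
         --daemon
```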

  11. Adding more reliability with supervision
  • Proven concept (Erlang/OTP)
    • Workers do their work
    • Supervisors monitor workers
    • All are defined in a supervision tree
  • Flexible implementation available (simplevisor)
    • Non-intrusive
    • Handles service evolution
  • See "Messaging Services and Client Software", Lionel Cons and Massimo Paladin, EGI Technical Forum, Prague, 18 September 2012

  12. Aggregator's internals
  • Accumulates statistics on xrootd servers (per timestamp) and groups them by hostname
  • Reconstructs transfer statistics from subsequent messages; aggregates transfers by server and timestamp
  • Passes a bunch of messages (grouped by type) in one large message to the MTA
  • Removes all involved local-queue messages once the aggregated message has been successfully sent
  • All semi-complete information chunks are sent on timeout; all (hopelessly) incomplete ones are wiped out
  • Three threads in the process:
    • Main (control)
    • Worker (periodically consumes, aggregates, republishes for the MTA)
    • Cleanup (removes temporary stuff in the directory queues involved)
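
The grouping step can be sketched as follows (an illustrative reduction of the Aggregator's logic; the field names mirror the message formats on the next slides but are otherwise assumptions):

```python
from collections import defaultdict

def aggregate(transfer_records):
    """Fold per-client transfer records into one record per
    (server_host, timestamp) pair, each holding a 'clients' list."""
    grouped = defaultdict(list)
    for rec in transfer_records:
        key = (rec["server_host"], rec["timestamp"])
        grouped[key].append({k: v for k, v in rec.items()
                             if k not in ("server_host", "timestamp")})
    return [{"server_host": s, "timestamp": t, "clients": c}
            for (s, t), c in grouped.items()]

msgs = aggregate([
    {"server_host": "xr.cern.ch", "timestamp": "1123456789",
     "client_ip": "12.34.56.78", "read_mb": "1.234"},
    {"server_host": "xr.cern.ch", "timestamp": "1123456789",
     "client_ip": "12.34.56.79", "read_mb": "0.5"},
])
```

Two per-client records for the same server and timestamp collapse into a single message with two entries in `clients`, which is exactly what keeps the broker traffic down.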

  13. Message formats: xrootd transfers

  Per-server transfer message:

  {
    "message_id": "05b179bb….",
    "server_host": "xr.cern.ch",
    "timestamp": "1123456789",
    "clients": [
      {
        "client_ip": "12.34.56.78",
        "read_mb": "1.234",
        "written_mb": "2.345",
        "transfer_speed_mb": "3.582"
      },
      …..
    ]
  }

  Envelope sent to the broker:

  {
    "header": {
      "message_id": "6061d13b….",
      "mon_service_fqdn": "mon.x.ch",
      "timestamp": "1223456789",
      "vo": "alice"
    },
    "body": {
      "transfers": [ transfer messages ]
    }
  }

  * Need VO, or just send to different queues?
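
Building the broker envelope from this slide is a one-liner around `json.dumps`. A sketch; the `vo` and `mon_fqdn` defaults are placeholders:

```python
import json, time, uuid

def make_transfer_envelope(transfers, vo="alice", mon_fqdn="mon.x.ch"):
    """Wrap per-server transfer messages into the header/body envelope:
    header carries message_id, mon_service_fqdn, timestamp and vo."""
    return json.dumps({
        "header": {
            "message_id": uuid.uuid4().hex,
            "mon_service_fqdn": mon_fqdn,
            "timestamp": str(int(time.time())),
            "vo": vo,
        },
        "body": {"transfers": transfers},
    })

env = make_transfer_envelope([{"server_host": "xr.cern.ch", "clients": []}])
```

If the per-VO-queue option on the slide were chosen instead, the `vo` field could be dropped and the queue name would carry that information.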

  14. Message formats: xrootd servers (1)

  Per-server statistics message:

  {
    "message_id": "25e3c2f8….",
    "timestamp": "1123456789",
    "server_host": "example.cern.ch",
    "link_in": "5048475",
    "link_in_R": "5.1234",
    "link_out": "10493857",
    "link_out_R": "7.2345",
    "link_tot": "16949274",
    "xrootd_lgn_af_R": "0.123",
    "xrootd_lgn_au_R": "2.345",
    "xrootd_lgn_ua_R": "0.5",
    ….
  }

  Envelope sent to the broker:

  {
    "header": {
      "message_id": "0d502ae9….",
      "timestamp": "122356789",
      "mon_host_fqdn": "mon.x.ch",
      "vo": "atlas|cms"
    },
    "body": {
      "servers_stats": [ stats messages are here ]
    }
  }

  * Need VO, or just send to different queues?
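
Since the "send only consistent information" requirement applies here too, a consumer (or the aggregator) might sanity-check each record before use. A minimal sketch; the field list follows the example on this slide, which is truncated there, so it is not exhaustive:

```python
def validate_server_stats(msg):
    """Check that the required fields are present and that the traffic
    counters/rates parse as numbers. Illustrative, not the real check."""
    required = ("message_id", "timestamp", "server_host",
                "link_in", "link_out", "link_tot")
    if not all(key in msg for key in required):
        return False
    try:
        for key in ("link_in", "link_in_R", "link_out",
                    "link_out_R", "link_tot"):
            if key in msg:
                float(msg[key])
    except ValueError:
        return False
    return True

ok = validate_server_stats({
    "message_id": "25e3c2f8aa", "timestamp": "1123456789",
    "server_host": "example.cern.ch", "link_in": "5048475",
    "link_in_R": "5.1234", "link_out": "10493857",
    "link_out_R": "7.2345", "link_tot": "16949274",
})
```

Records failing the check would be candidates for the "hopelessly incomplete" cleanup path described for the Aggregator.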

  15. Message formats: xrootd servers (2)

  16. Current state of development
  • The ML dumper filter is ready and works fine
    • Produces intermediate JSON messages to be consumed by the aggregator; no performance limits observed
  • The aggregator is ready and being tested
  • The chosen technical solution (directory queue libraries, stompclt) has proven to be appropriate, fast and scalable

  17. Further steps
  • Tests of the full message-processing chain (including stress tests)
  • A consumer on the Dashboard's side
  • Tuning the settings of the ML dumper, aggregator and stompclt
  • Supervision of all components (ML repo, aggregator, MTA) with simplevisor
  • Packaging the dumper, aggregator and all the configurations into RPMs within the Dashboard

  18. Thanks for your attention!
