
CASTOR logging at RAL

This text discusses the implementation of CERN's new CASTOR logging system at RAL, exploring different approaches and off-the-shelf solutions. It highlights the benefits and challenges of each approach and outlines future plans for testing and scalability.





  1. CASTOR logging at RAL Rob Appleyard, James Adams and Kashyap Manjusha

  2. The plan • Implement CERN’s new CASTOR logging system at RAL • Start work at the source of messages and work through to the destination. • Simple, right?

  3. simple-log-producer • Easy to set up; ran well. • We weren’t sure why custom code was written for this, since we are aware of off-the-shelf products that do the same thing.

  4. Apollo/ActiveMQ • CERN planned to move from Apollo to ActiveMQ, so we decided to go straight to the final destination. • The ActiveMQ broker proved difficult to set up. • Arcane config. • Dept. experience – they got it running once and hoped never to touch it again! • ActiveMQ seemed like overkill. • It is a heavyweight bit of software. • Our use case is extremely simple: take messages from multiple sources and forward them on. • Is there a simpler way of doing this?

  5. Message Broker: The solution! • Replace the ActiveMQ broker with some rsyslog config that does the same thing. • We use rsyslog for all other Tier 1 logging at RAL. • Lightweight solution that does what we needed. • Simply send unprocessed log messages over TCP. • A couple of lines in rsyslog.conf were all that was necessary.
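The slides don’t show the actual rsyslog broker config, but a minimal sketch of what those “couple of lines” might look like follows. The hostname, port, and module choices here are illustrative assumptions, not RAL’s real configuration.

```
# Sketch only: receive unprocessed log messages over TCP (imtcp module)
$ModLoad imtcp
$InputTCPServerRun 5140

# Forward everything on, unprocessed, over TCP ('@@' means TCP forwarding)
*.* @@viewer.example.ac.uk:5140
```

This keeps the broker a pure pass-through: no parsing, no queueing semantics beyond what rsyslog already provides.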

  6. But… • The simple-log-producer is not simply a forwarding mechanism. Messages are processed locally before transmission. • We need to replicate the processing done by slp somewhere downstream. • Combine slp and the consumer scripts into one script that runs on the ‘viewer’ node. • We could also eliminate the rsyslog broker, and just send directly to the viewer.
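The slides don’t detail what processing slp does, but CASTOR daemons log space-separated key=value pairs, so a downstream consumer would need at least a tokeniser along these lines. This is an illustrative sketch only; the field names in the sample line are made up.

```python
import re

# Match key=value pairs, where the value may be a double-quoted string
# containing spaces, or a single unquoted token.
PAIR_RE = re.compile(r'(\w+)=(".*?"|\S+)')

def tokenise(line):
    """Turn one raw CASTOR-style log line into a dict of its key=value fields."""
    return {k: v.strip('"') for k, v in PAIR_RE.findall(line)}

# Hypothetical log line for illustration
sample = 'LVL=Info TID=42 MSG="Request processed" REQID=abc123'
print(tokenise(sample))
```

A combined viewer-node script would run something like this over each message received from the rsyslog broker before handing it to the consumers.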

  7. smooshed-log-producer • Attempt to combine simple-log-producer with the consumers. • James spent several days working on this. • Thought that the system could more easily be re-implemented using standard software. • Let’s have a go!

  8. The Off-the-Shelf Approach • Use logstash feeding ElasticSearch and Kibana. • All three components are affiliated open-source products. • <1 day to produce a working prototype. • We already have a solution for long-term archival of log messages; central loggers that capture all Tier 1 messages.

  9. Logstash • Open source log management tool. • Input -> Filters -> Output • Can interface with… more or less anything.

  10. Logstash • Our setup receives syslog messages over TCP, tokenises them, and forwards them to ElasticSearch. • RAL has experimented with it in the past for other applications. • See: logstash.net
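The pipeline described above can be sketched as a logstash config. This is a hedged illustration, not RAL’s actual file: the port number and the grok/kv split are assumptions about how the syslog-wrapped key=value messages would be handled.

```
input {
  tcp {
    port => 5140
    type => "syslog"
  }
}
filter {
  grok {
    # Separate the syslog header from the CASTOR message body
    match => { "message" => "%{SYSLOGLINE}" }
  }
  # Tokenise the key=value pairs in the message body
  kv { }
}
output {
  elasticsearch {
    host => "localhost"
  }
}
```

The input/filter/output structure mirrors the “Input -> Filters -> Output” model from the previous slide.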

  11. ElasticSearch • Distributed real-time search and analysis tool. • Based on Apache Lucene (JSON document-based search) • Horizontal construction – need more capacity? Just add more nodes. • Currently running on a two-machine cluster. • Accepting messages from the preproduction instance.
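Once messages are indexed, queries are plain JSON bodies POSTed to the index’s `_search` endpoint. The index name (`castor-preprod`) and field names below are hypothetical, assumed to follow logstash’s tokenisation; they are not RAL’s actual schema.

```json
{
  "query": { "query_string": { "query": "LVL:Error" } },
  "size": 10,
  "sort": [ { "@timestamp": { "order": "desc" } } ]
}
```

This would return the ten most recent error-level messages, the kind of lookup DLF would need a database query for.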

  12. Kibana

  13. Kibana • Web FE for ElasticSearch. • Index and full text of every CASTOR log message. All tokenised and searchable. • Search on any message field. • Arbitrary queries • Lots of graphs and analytics. • Much faster than DLF, at least with preprod. • No Oracle or MySQL database required. • Current implementation: LINK!

  14. The Result • We have a system that appears to be capable of fulfilling our needs better than DLF. • Faster. • Able to run arbitrary queries. • Components are all off-the-shelf. • Correctly handles all CASTOR messages that DLF did (which isn’t everything…) • Needs some help to interpret and deal with a few anomalies. • The xroot logs are weird.

  15. Future Plans • Currently working against preprod during 2.1.14 stress testing. • During stress testing, received 16 GB/day of CASTOR logs. • No dependency on 2.1.14 • We aim to start testing against our production instances before Christmas. • Scalability? • The plan is to have one message index per CASTOR instance. • Possible future development: • Reconfigure rsyslog on source nodes to send JSON to logstash rather than syslog (should be pretty trivial).
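The possible JSON reconfiguration mentioned above might look like the sketch below, using an rsyslog template on the source nodes. The template fields, escaping options, and target host are illustrative assumptions and would depend on the rsyslog version deployed.

```
# Sketch: emit each message as a JSON object and send it straight to logstash
$template CastorJson,"{\"host\":\"%hostname%\",\"timestamp\":\"%timereported:::date-rfc3339%\",\"message\":\"%msg:::json%\"}\n"
*.* @@logstash.example.ac.uk:5141;CastorJson
```

This would let logstash skip syslog parsing entirely and ingest structured events directly.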

  16. Questions?
