
IT Monitoring WG IT/CS Monitoring System

Virginie Longo. September 14th 2011.


Presentation Transcript


  1. IT Monitoring WG IT/CS Monitoring System • Virginie Longo • September 14th 2011

  2. Summary
     • CS Monitoring Systems
       • Spectrum CA
       • Performance Analysis
       • Other Tools
     • Data storage
     • Requirements
       • NMS Status
       • Requirements
       • Research

  3. CS Monitoring systems

  4. Spectrum CA
     • Description:
       • Commercial tool
       • Fault-management-oriented system
       • Root cause analysis / alarm correlation
       • Topology view
       • Service Manager => relation with SLS view
       • Basic performance manager
     • Volumes:
       • ~3000 devices monitored
       • Supports 3K Laser devices for simple alarms (UP/DOWN)
       • Thousands of attributes polled and analyzed
       • 6 GB of event data over 30 days
     • Monitoring protocols:
       • SNMP and ICMP
       • Information fed by SNMP only (no remote agent)
       • A few other protocols supported: DNS / DHCP / TRACEROUTE / NTP / HTTP
       • A few home-made scripts for DHCP and web monitoring
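The root cause analysis / alarm correlation idea on this slide can be illustrated with a minimal sketch. It is not Spectrum's algorithm: it assumes a simple tree topology where each device has a single upstream parent, and it reports a down device as a root cause only if nothing upstream of it is also down. The device names and the `root_causes` helper are hypothetical.

```python
from typing import Dict, Optional, Set


def root_causes(parent: Dict[str, Optional[str]], down: Set[str]) -> Set[str]:
    """Return the down devices that have no down device upstream of them."""
    roots = set()
    for device in down:
        upstream = parent.get(device)
        seen = {device}
        suppressed = False
        # Walk toward the monitoring station; stop at the top or on a cycle.
        while upstream is not None and upstream not in seen:
            if upstream in down:
                suppressed = True  # an upstream failure explains this alarm
                break
            seen.add(upstream)
            upstream = parent.get(upstream)
        if not suppressed:
            roots.add(device)
    return roots


# Hypothetical topology: core router -> distribution switch -> two access switches.
topology = {
    "core-1": None,
    "dist-1": "core-1",
    "access-1": "dist-1",
    "access-2": "dist-1",
}
unreachable = {"dist-1", "access-1", "access-2"}
print(sorted(root_causes(topology, unreachable)))  # ['dist-1']
```

With three devices unreachable, only the distribution switch is reported; the two access switches behind it are treated as symptoms.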

  5. Alarm Monitoring: Spectrum Architecture (storage system)
     [Architecture diagram. Labels shown: Devices Info; SNMP; Spectrum DB (models, topology, current polling values, alarms); MySQL (events); remote MySQL (Service Manager); Alarm Notifier; SSLogger; Oracle alarm history (LANDB); Oracle stats (CSR); SLS. Legend: Spectrum system / non-Spectrum system.]

  6. Performance Analysis
     Statistics architecture:
       - Mix of a home-made system and the Spectrum tool
       - Data extracted from Spectrum into an Oracle DB
       - Data consolidated into RRDs
       - Displayed on the Netstat website (PHP)
     Volumes:
       - ~9000 models (ports + devices) for 24K RRDs
       - 36 metrics
       - 157 attributes
       - ~160K entries loaded into the Oracle DB per 5-minute poll
       - Data kept for 1 month in Oracle
       - 2 years of consolidated data in RRDs
     Note: a metric is a group of attributes, e.g. Bandwidth = in/out bits and in/out packets.
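To give a sense of scale for the volumes above, the sketch below works out how many rows the Oracle DB would hold, and shows what average-based consolidation of 5-minute samples looks like. It assumes "1 month" means 30 days, and the `consolidate` function is a plain-Python illustration of the idea, not the RRD tooling actually used.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

POLL_INTERVAL_MIN = 5        # from the slide: one poll every 5 minutes
ENTRIES_PER_POLL = 160_000   # from the slide: ~160K entries per poll
RETENTION_DAYS = 30          # assumption: "1 month" taken as 30 days

polls_per_day = 24 * 60 // POLL_INTERVAL_MIN               # 288
rows_per_day = ENTRIES_PER_POLL * polls_per_day            # 46,080,000
rows_retained = rows_per_day * RETENTION_DAYS              # 1,382,400,000
insert_rate = ENTRIES_PER_POLL / (POLL_INTERVAL_MIN * 60)  # ~533 rows/s on average


def consolidate(samples: Iterable[Tuple[int, float]], step_s: int) -> List[Tuple[int, float]]:
    """Average (timestamp, value) samples into fixed buckets of step_s seconds."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % step_s].append(value)  # align to the bucket start
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Hypothetical 5-minute samples consolidated to hourly averages.
raw = [(0, 10.0), (300, 20.0), (3600, 30.0)]
print(consolidate(raw, 3600))  # [(0, 15.0), (3600, 30.0)]
print(f"{rows_retained:,} rows retained, about {insert_rate:.0f} inserts/s")
```

Under these assumptions the one-month window is on the order of 1.4 billion rows, which is why only consolidated data is kept for the longer two-year period.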

  7. Performance Analysis

  8. Other Tools
     Syslog event recording:
       - Gathers all logs from network devices
       - Stored in an Oracle DB
       - Accessible from CSDB
       - Filtering and propagation by notification
     LHCOPN: perfSONAR tool
       - Decentralized network tool
       - Regular OWD (one-way delay), latency and throughput tests
       - Other tools such as traceroute
       - LHCOPN network analysis
     Implementation ongoing: testing phase with a 1 Gb link; security tests not yet complete. (www.perfosnar.net)

  9. Data storage

  10. Data Storage
     Summary:
       • Spectrum proprietary DBs for core and alarms
       • MySQL database for events and Service Manager
       • Oracle database for stats (CSR) and alarm history (LANDB)
       • Oracle database for Syslog info
       • Standalone MySQL database for perfSONAR tools
       • Too many different types of storage
       • Missing correlation between Syslog and SNMP
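The missing Syslog/SNMP correlation noted above could, in its simplest form, be a join on device name within a time window. The sketch below is only an illustration under that assumption; the `Event` record, the 60-second window, and the sample messages are hypothetical and do not reflect the actual database schemas.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    source: str       # "syslog" or "snmp"
    device: str       # device name shared by both systems (assumption)
    timestamp: float  # seconds since epoch
    message: str


def correlate(syslog: List[Event], snmp: List[Event],
              window_s: float = 60.0) -> List[Tuple[Event, Event]]:
    """Pair each SNMP alarm with syslog lines from the same device within window_s."""
    pairs = []
    for alarm in snmp:
        for line in syslog:
            same_device = line.device == alarm.device
            close_in_time = abs(line.timestamp - alarm.timestamp) <= window_s
            if same_device and close_in_time:
                pairs.append((alarm, line))
    return pairs


# Hypothetical sample data.
syslog_events = [
    Event("syslog", "sw-1", 1000.0, "Interface Gi0/1 changed state to down"),
    Event("syslog", "sw-2", 5000.0, "Configuration saved"),
]
snmp_events = [Event("snmp", "sw-1", 1030.0, "linkDown")]

for alarm, line in correlate(syslog_events, snmp_events):
    print(f"{alarm.device}: {alarm.message} <-> {line.message}")
```

The nested loop is quadratic and only suitable for small batches; at the volumes described earlier, an indexed join inside a single database would be the more realistic choice.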

  11. Requirements

  12. NMS Status
     • Advantages:
       • Efficient root cause analysis
       • Correct event/alarm management
       • High availability
       • Really good topology views (useful for the intervention group)
       • Support for NICE users
       • Very good level of filtering (topology, alarms)
       • Notification support
     • Negative points / weaknesses:
       • Expensive
       • Polling limit is almost reached (a new version with a complete redesign of the polling system will arrive in 2 years)
       • Not a performance system: cannot handle 50K statistics
       • Integrating devices from non-certified manufacturers is complex
       • Data collection mostly limited to SNMP (changes ongoing)

  13. Requirements
     • Mandatory:
       • Root cause analysis
       • High-frequency polling: every 1-2 min for critical nodes, 3-5 min for others
       • Network topology representation
       • Notifications (SMS / mail / XMPP) and general console
       • Distributed environment
       • High-availability system
       • Complete performance management
       • IPv6 support
     • Nice to have:
       • Autodiscovery system
       • Mobile version
       • Oracle centralized database
     • Numbers and storage time:
       • Polling capacity for at least 5K nodes
       • Performance statistics for 56K ports
       • Data lifetime: 1 month without aggregation, maximum with aggregation
       • Device alarms: around 2 years
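The polling requirements above translate into a sustained poll rate that a candidate tool has to support. The sketch below estimates it. The node and port counts come from the slide; the share of critical nodes (500) and the 5-minute statistics interval are assumptions made only for illustration, and the fastest interval of each range (1 min and 3 min) is used as the worst case.

```python
def polls_per_second(targets: int, interval_s: int) -> float:
    """Average polls per second needed to visit every target once per interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return targets / interval_s


TOTAL_NODES = 5_000       # from the slide: at least 5K nodes
TOTAL_PORTS = 56_000      # from the slide: statistics for 56K ports
CRITICAL_NODES = 500      # assumption: the slides do not give this split
STATS_INTERVAL_S = 300    # assumption: same 5-minute cycle as the current system

node_rate = (
    polls_per_second(CRITICAL_NODES, 60)                   # critical nodes every 1 min
    + polls_per_second(TOTAL_NODES - CRITICAL_NODES, 180)  # other nodes every 3 min
)
port_rate = polls_per_second(TOTAL_PORTS, STATS_INTERVAL_S)

print(f"node status polls: {node_rate:.1f}/s")
print(f"port statistics polls: {port_rate:.1f}/s")
```

These are averages for evenly spread polls; a real scheduler would also need headroom for retries and timeouts.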

  14. Research
     • Tools that fit best:
       • Icinga: Nagios-like (fork) (not yet tested)
       • Zabbix: large polling scale, open source, notifications, Oracle database, distributed (not yet tested)
         (http://www.zabbix.com/features.php)
       • SolarWinds: commercial, but includes performance management and is less expensive (not yet tested)
       • OpenNMS:
         • Open source, completely customizable
         • High-frequency polling with distributed environment
         • Event correlation, alarm management, notifications
         • Many data collection methods supported (SNMP, HTTP, JMX, JDBC, NAGIOS-NSCLIENT)
         (http://www.opennms.org/about/)
     • Links:
       • http://en.wikipedia.org/wiki/Comparison_of_network_monitoring_systems
       • http://www.slac.stanford.edu/xorg/nmtf/nmtf-tools.html

  15. Thanks • Questions?
