Nagios On-Call Rotation

James Clark

banditbbs@gmail.com

Topics Discussed
  • About Me / My Monitoring History
  • Monitoring History at Current Company
  • Prerequisites
  • Current Company Setup
  • Scripts
  • Nagios Configuration
About Me
  • Have been in the IT industry since 1988
  • In 2004 became server group manager
  • Have been using Nagios since ~2003
  • Switched to XI ~2010 (And loved every part of it)
  • Changed jobs in August 2012 and quickly convinced new company to purchase XI
About Me
  • Private web page is http://www.bandits-home-on-the-web.com
  • On that page you will find some of the Nagios modifications I have done
History of Monitoring and Alerting at new job
  • Many monitoring applications spread throughout the IT department
    • CCSS for iSeries
    • Foglight for DB
    • SCOM for Windows
    • Three separate Nagios Core servers
    • IBM NetCool
    • Many departments had no monitoring
  • All of the applications forward to NetCool and NetCool then forwards alerts to AlarmPoint (xMatters)
  • AlarmPoint holds the on-call schedule for the many different groups in the IT department
History of Monitoring and Alerting at new job
  • CCSS for iSeries
    • Partial conversion to XI started
  • Foglight for DB
    • Complete conversion to XI hopeful
  • SCOM for Windows
    • Will either develop a custom script to communicate back and forth or use WMI. Currently testing WMI and hoping to use that instead
  • Three separate Nagios Core servers
    • Converted to a single XI server
History of Monitoring and Alerting at new job
  • IBM NetCool
    • Removing from company
  • AlarmPoint
    • More than likely, removing from company
  • One primary XI server currently with 3 mod_gearman workers
  • One XI server for monitoring primary XI and a few other devices
  • One XI server in our DR web data center
History of Monitoring and Alerting at new job
  • Besides AlarmPoint, the on-call schedule is also kept in a separate MS SharePoint site that DC Operations uses.
  • No full-time administrator for either NetCool or AlarmPoint.
  • When everything has been switched to NagiosXI, a significant savings will be realized.
  • One of the main hurdles to the switch is on-call rotation for alerting.
On Call Data - Prerequisites
  • On-call information stored in some application
  • On-call information able to be exported from the application in a specific format
  • A job scheduler to run the jobs
On Call Data – Our Setup
  • SharePoint site to store on-call schedule
  • SharePoint admin created an application to export the data needed and send the files to an FTP server.
  • Two files are sent, one for primary and one for secondary.
  • We use Control-M to schedule the above program and the two Linux scripts.
  • The job runs daily at 8am. Our on-call rotation changes Mondays at 8am.
  • If changes to the on-call schedule need to take effect immediately, the job is run manually. Otherwise, they can wait until the next day's 8am run.
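
The ordering matters here: the download has to succeed before the config files are rebuilt. A minimal sketch of a wrapper that a scheduler could call is shown below. The function name `run_steps` and the script names in the commented usage line are hypothetical, not taken from the slides.

```shell
#!/bin/bash
# Run each step in order and stop at the first failure, so a failed
# download never reaches the config-generation step.
run_steps() {
    local step
    for step in "$@"; do
        echo "running: $step"
        if ! "$step"; then
            echo "step failed: $step" >&2
            return 1
        fi
    done
}

# Hypothetical script names; substitute the real paths.
# run_steps ./fetch_oncall.sh ./build_oncall_groups.pl
```

Because the same wrapper works for a scheduled run and a manual run, an immediate schedule change only needs the one command.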
On Call Data – Our Setup

Added ID to contacts table.

Added short name to On-Call Groups table.

Set the SharePoint site to alert me when any changes are made to those two tables so they can be mirrored in Nagios.

The scripts do handle blanks. This will be shown in a later slide.

On Call Data – Example files

Networking,network,smithj
System p Administration,aix_admins,doej
AE Direct,aed_infra,user1
Database,dba,clarks
System i Administration,system_i_admin,walenciejs
Wintel Administration,wintel_admins,hilderbrandr
System i Applications,system_i_apps,brownr
Client Server Applications,client_server,yatesp
DataWarehouse/Enterprise Rpts,datawarehouse,connerys
Store Applications,store_apps,probstj

The first field is what is displayed on the SharePoint site and is the alias assigned in Nagios. The second field is the name given to the contact groups. The third field is of course the ID of the user.
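
As an illustration of that three-field layout, the sketch below splits one export line on commas and prints how each field would be used for a primary group. The function name `parse_oncall_line` is made up for this example. The `_oncall_pri` suffix is the one the data manipulation script applies.

```shell
#!/bin/bash
# Split one export line into its three comma-separated fields and show
# how each field is used in the generated primary contact group.
parse_oncall_line() {
    local group short id
    IFS=, read -r group short id <<EOF
$1
EOF
    printf 'alias=%s\n' "$group"
    printf 'contactgroup_name=%s_oncall_pri\n' "$short"
    printf 'members=%s\n' "$id"
}

parse_oncall_line 'Networking,network,smithj'
```

Only the comma is used as a separator, so display names that contain spaces, such as "System p Administration", stay intact.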

On Call Data – FTP Script

#!/bin/sh
HOST=xxxxxxx   # FTP server's host name or IP address
USER=xxxxxxx   # FTP user that has access to the server
PASS=xxxxxxx   # Password for the FTP user

# -i: no prompting on multiple transfers, -n: no auto-login, -v: verbose
ftp -inv "$HOST" << EOF
user $USER $PASS
cd /nagiosftp
get primaryOnCall.txt
get secondaryOnCall.txt
delete primaryOnCall.txt
delete secondaryOnCall.txt
bye
EOF

exit 0
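
The FTP script exits 0 whether or not the transfers succeeded, and the data manipulation script removes the old config files before it opens its input. One possible safeguard, sketched here as an assumption rather than part of the original setup, is to confirm that both files arrived and are non-empty before continuing. The function name `files_ready` is hypothetical.

```shell
#!/bin/bash
# Return success only if every named file exists and is non-empty.
files_ready() {
    local f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
    done
}

# Example use between the FTP step and the data manipulation step:
# files_ready primaryOnCall.txt secondaryOnCall.txt || exit 1
```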

On Call Data – Data Manipulation Script

#!/usr/bin/perl

# Remove old config files (everything except the xi*, esc* and aed_* files).
# -exec ... + behaves like xargs but does not fail when nothing matches.
system("find /usr/local/nagios/etc/static -type f -not -name 'xi*' -not -name 'esc*' -not -name 'aed_*' -exec rm -f {} +");

# Process primary on-call file
open(my $in, '<', 'primaryOnCall.txt') or die "Cannot open primaryOnCall.txt: $!";
while (<$in>) {
    chomp;
    my ($group, $alias, $id) = split /,/;
    # Skip lines that have a blank field
    if (($alias ne '') && ($group ne '') && ($id ne '')) {
        my $file = "/usr/local/nagios/etc/static/${alias}_oncall_pri.cfg";
        open(my $out, '>', $file) or die "Cannot write $file: $!";
        print $out "define contactgroup{\n";
        print $out "contactgroup_name ${alias}_oncall_pri\n";
        print $out "alias $group\n";
        print $out "members $id\n";
        print $out "}\n";
        close($out);
    }
}
close($in);

On Call Data – Data Manipulation Script(cont…)

# Process secondary on-call file
open(my $in2, '<', 'secondaryOnCall.txt') or die "Cannot open secondaryOnCall.txt: $!";
while (<$in2>) {
    chomp;
    my ($group, $alias, $id) = split /,/;
    # Skip lines that have a blank field
    if (($alias ne '') && ($group ne '') && ($id ne '')) {
        my $file = "/usr/local/nagios/etc/static/${alias}_oncall_sec.cfg";
        open(my $out, '>', $file) or die "Cannot write $file: $!";
        print $out "define contactgroup{\n";
        print $out "contactgroup_name ${alias}_oncall_sec\n";
        print $out "alias $group\n";
        print $out "members $id\n";
        print $out "}\n";
        close($out);
    }
}
close($in2);

On Call Data – Data Manipulation Script(cont…)

# Change ownership and permissions of config files
system("sudo /bin/chown apache:nagios /usr/local/nagios/etc/static/*.cfg");
system("sudo /bin/chmod 777 /usr/local/nagios/etc/static/*.cfg");

# Delete data files
unlink('primaryOnCall.txt', 'secondaryOnCall.txt');

# Restart Nagios
system("sudo su -l nagios -c 'cd /usr/local/nagiosxi/scripts/ && ./reconfigure_nagios.sh'");

# Exit clean
exit 0;

On Call Data – List of Files Created

Due to a blank for secondary on-call in the file, only the primary file for datawarehouse exists.
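
To make the blank handling concrete, here is a small shell re-creation of the same rule: a line produces a file only when all three fields are filled in. This is an illustrative sketch, not the author's script. The function name `write_groups` and its argument order are assumptions.

```shell
#!/bin/bash
# Write one contact group file per complete line; skip lines with a blank field.
# $1 = input file, $2 = suffix (pri or sec), $3 = output directory
write_groups() {
    local infile=$1 suffix=$2 outdir=$3 group short id
    # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline.
    while IFS=, read -r group short id || [ -n "$group" ]; do
        [ -n "$group" ] && [ -n "$short" ] && [ -n "$id" ] || continue
        {
            echo "define contactgroup{"
            echo "    contactgroup_name ${short}_oncall_${suffix}"
            echo "    alias ${group}"
            echo "    members ${id}"
            echo "}"
        } > "${outdir}/${short}_oncall_${suffix}.cfg"
    done < "$infile"
}

# Example: write_groups secondaryOnCall.txt sec /tmp/oncall-test
```

A secondary file containing the line `DataWarehouse/Enterprise Rpts,datawarehouse,` has an empty third field, so no `datawarehouse_oncall_sec.cfg` is written.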

NagiosXI Configuration
  • No contacts or contact groups are assigned to the hosts or services, unless someone should always receive alerts (e.g., someone who needs to be alerted but is not a member of the specific on-call group).
  • Users receive permissions to see hosts and services by having an escalation for them.
  • Escalations must be created for both hosts and services. Services do not inherit escalations the way they inherit notifications.
NagiosXI Configuration(cont…)
  • Escalations created as static config files.
    • Otherwise Nagios would error on the empty contact groups.
    • All members of groups go into an ALL group. This will be used to give users permissions
    • The group manager goes into a BOSS group. This is used for alerting the manager after on-call individuals fail to acknowledge an issue
Static Configuration Example - Hosts

# network_oncall_pri: created by script
define hostescalation{
    hostgroup_name network_oncall
    contact_groups network_oncall_pri
    first_notification 1
    last_notification 0
    notification_interval 15
}

# network_oncall_sec: created by script
define hostescalation{
    hostgroup_name network_oncall
    contact_groups network_oncall_sec
    first_notification 2
    last_notification 0
    notification_interval 15
}

# network_boss: created in XI and manager of group assigned as member
define hostescalation{
    hostgroup_name network_oncall
    contact_groups network_boss
    first_notification 4
    last_notification 0
    notification_interval 15
}

# network_all: created in XI and all members of group assigned as members
define hostescalation{
    hostgroup_name network_oncall
    contact_groups network_all
    first_notification 3
    last_notification 0
    notification_interval 15
}
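
Since every group follows the same four-stanza pattern, the static file could be generated from the group's short name. The sketch below is one way to do that. The function name `host_escalations` is hypothetical, and it assumes every group uses the naming scheme from the network example (`<short>_oncall`, `<short>_oncall_pri`, `<short>_oncall_sec`, `<short>_all`, `<short>_boss`).

```shell
#!/bin/bash
# Print the four host escalation stanzas for one group short name.
# Notification levels follow the example: pri=1, sec=2, all=3, boss=4.
host_escalations() {
    local short=$1 level=1 suffix
    for suffix in oncall_pri oncall_sec all boss; do
        cat <<EOF
define hostescalation{
    hostgroup_name        ${short}_oncall
    contact_groups        ${short}_${suffix}
    first_notification    ${level}
    last_notification     0
    notification_interval 15
}
EOF
        level=$((level + 1))
    done
}

host_escalations network
```

Redirecting the output to a file in the static directory (for example one named with the `esc` prefix that the cleanup step preserves) would keep it out of the daily delete.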

Static Configuration Example - Services

define serviceescalation{
    hostgroup_name network_oncall
    service_description *
    contact_groups network_oncall_pri
    first_notification 1
    last_notification 0
    notification_interval 15
}

define serviceescalation{
    hostgroup_name network_oncall
    service_description *
    contact_groups network_oncall_sec
    first_notification 2
    last_notification 0
    notification_interval 15
}

define serviceescalation{
    hostgroup_name network_oncall
    service_description *
    contact_groups network_all
    first_notification 3
    last_notification 0
    notification_interval 15
}

define serviceescalation{
    hostgroup_name network_oncall
    service_description *
    contact_groups network_boss
    first_notification 4
    last_notification 0
    notification_interval 15
}

As we set it up, each service escalation uses the same hostgroup as the host escalations and a wildcard for service_description, so it covers all services.

This could get very complicated if different groups or individuals needed to be notified for different services on the same host.
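
Because host and service escalations have to be maintained separately, it is easy to add one and forget the other. The following sketch lists hostgroups that appear in a `hostescalation` block but in no `serviceescalation` block. Both function names are hypothetical, and the parsing assumes one `hostgroup_name` value per block, as in the examples above.

```shell
#!/bin/bash
# List the hostgroup_name values used by one escalation type in the given files.
escalated_hostgroups() {
    local type=$1
    shift
    awk -v type="$type" '
        $1 == "define" { inblock = ($2 ~ ("^" type "[{]?$")) }
        $1 == "}"      { inblock = 0 }
        inblock && $1 == "hostgroup_name" { print $2 }
    ' "$@" | sort -u
}

# Print hostgroups that have host escalations but no service escalations.
missing_service_escalations() {
    local covered
    covered=$(escalated_hostgroups serviceescalation "$@")
    # grep exits 1 when nothing is missing, so mask that status.
    escalated_hostgroups hostescalation "$@" | grep -vxF -e "$covered" || true
}

# Example: missing_service_escalations /usr/local/nagios/etc/static/esc*.cfg
```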

Static Configuration Example - Services

define serviceescalation{
    host_name *
    servicegroup_name dba_oncall
    contact_groups dba_oncall_pri,dba_oncall_sec
    first_notification 1
    last_notification 0
    notification_interval 15
}

define serviceescalation{
    host_name *
    servicegroup_name dba_oncall
    contact_groups dba
    first_notification 500
    last_notification 0
    notification_interval 15
}
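
The `first_notification` values translate into elapsed time once the interval is known. As a rough worked example, assuming Nagios' default `interval_length` of 60 seconds (so `notification_interval 15` means 15 minutes) and a problem that stays unacknowledged, notification number n goes out (n - 1) × 15 minutes after the first one. The function name below is made up for this sketch.

```shell
#!/bin/bash
# Minutes between the first notification and notification number N,
# assuming a fixed notification interval (default 15 minutes).
minutes_until_notification() {
    local n=$1 interval=${2:-15}
    echo $(( (n - 1) * interval ))
}

for n in 1 2 3 4 500; do
    echo "notification $n: $(minutes_until_notification "$n") minutes after the first"
done
```

Under those assumptions, a group with `first_notification 4` is reached 45 minutes after the first alert, while `first_notification 500` works out to 7485 minutes, a little over five days.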

Static Configuration Example - Services

The service escalations can be as simple as the last slide or as complex as you can imagine. The attached file is a great example of the complexity that is possible.


Questions?

James Clark

Systems Monitoring Administrator

banditbbs@gmail.com