nagios

Monitoring account creation/password resets

It's taken a while, but we are finally monitoring the Sybase tables that manage Unity account creation, renames, and password resets. In the past, if our account management code in either Novell or AD land failed for some reason, the first indication we would get would be furious screaming from the DBAs,

https://sysnews.ncsu.edu/tools-bin/server-status?info=ncsu-accounts-ad

Down to 8 servers that can't be monitored

Whoopie, huzzah, etc!
We're down to exactly 8 hosts that the "new" monitoring subnets can't see, and they're all in the same 152.1.64.0/24 subnet.
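
For anyone who wants to double-check a host list against that subnet, here is a minimal Python sketch. The function name and the sample addresses are made up for illustration; only the 152.1.64.0/24 network comes from the post.

```python
import ipaddress

# The subnet named in the post.
SUBNET = ipaddress.ip_network("152.1.64.0/24")

def hosts_in_subnet(addresses, subnet=SUBNET):
    """Return the addresses (as given) that fall inside the subnet."""
    return [a for a in addresses if ipaddress.ip_address(a) in subnet]

# Made-up sample addresses, not real hosts from our inventory.
sample = ["152.1.64.10", "152.1.64.200", "152.1.65.7"]
print(hosts_in_subnet(sample))
```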

Firewall contexts set to allow new nagios access

As we work to bring the new Nagios system on-line, there are a lot of firewall changes that need to be made.

ComTech has established a range of addresses for all OIT devices meant for monitoring, and the intention is to open this address range (referred to as "OIT-Monitor") for incoming access for monitoring protocols on all datacenter subnets. This will make firewall settings easier and quicker for ComTech, and mean less back and forth for everyone when setting up firewall rules for new VLANs.

license02 software firewall adjusted for flexLM monitoring

FYI, I've adjusted the software firewall on license02 so that the new Nagii can monitor it properly.

Windows servers now checked every 90 min for backup agents

We've added a new check, "backup agent", to the "windows-hosted" servers.

This check tests to see if either the NetBackup OR Avamar agent is loaded. If one is, then it returns green; if neither is found, or on any error, a yellow alert is generated.
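
The decision logic above can be sketched roughly as follows. This is an illustration in Python, not the check we actually deployed: the service names and the `list_services` callable are hypothetical stand-ins, and it assumes the usual Nagios mapping of green to exit code 0 (OK) and yellow to exit code 1 (WARNING).

```python
# Standard Nagios plugin exit codes: 0 shows green, 1 shows yellow.
OK, WARNING = 0, 1

# Hypothetical service names, for illustration only; the names the
# real agents register under may differ.
BACKUP_AGENTS = ("NetBackup Client Service", "Avamar Backup Agent")

def backup_agent_status(list_services):
    """Return (exit_code, message).

    list_services is a hypothetical callable that returns the names of
    the services currently loaded on the host.
    """
    try:
        loaded = set(list_services())
    except Exception as exc:
        # Any error while querying counts as yellow, same as "not found".
        return WARNING, f"WARNING - could not query services: {exc}"
    found = [name for name in BACKUP_AGENTS if name in loaded]
    if found:
        return OK, "OK - backup agent loaded: " + ", ".join(found)
    return WARNING, "WARNING - no backup agent found"
```

Calling `backup_agent_status(lambda: ["Avamar Backup Agent"])` would come back with the OK code, while an empty service list or a failing query would come back with WARNING.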

We'll be deploying this on other Windows machines shortly.

Clearing errors on Windows 'uptime' and 'load' checks in Nagios

We had a couple of Windows hosts give the error "Could not get value" when checking uptime, and the extra mysterious "ERROR: Could not get data for 1 perhaps we don't collect data this far back?" while checking load.

Turns out the Windows performance counters had gotten sidewise, and needed to be rebuilt. With some help from the event log and http://eventid.net, we discovered

Nagios tests

Checks that can be done on our Nagios Systems, including information about what tests ("commands") are available, and how to configure them.

Nagios tests - check_mssql_health

This plugin allows you to monitor a Microsoft SQL Server by checking more than 20 metrics.

Nagios Server Side config

In the old Microsys repo there is a RHEL5 package for the Nagios server side of this plugin. Note that this plugin uses the Perl module DBD::Sybase.

Install the package check_mssql_health and/or put it in the kickstart file for the Nagios server for automatic installation.

Uninstalling the older NSClient++ plugin for Nagios (Windows Servers)

We've run into several machines that can't upgrade to the latest NSClient++ plugin (the software that implements NRPE, the Nagios Remote Plugin Executor "protocol") because the installed version doesn't appear in Add/Remove Programs.

Bizarre!

To uninstall it manually, assuming it's located in "C:\Program Files\NC State University\NSClientPP", issue the following commands from a cmd prompt:

"C:\Program Files\NC State University\NSClientPP\nsclient++.exe" /uninstall
rd /s /q "C:\Program Files\NC State University\NSClientPP"

Windows Terminal Server specific nagios config

To monitor Windows Terminal Server Licenses, we do the following