  • Using NSClient++ from nagios

    TODO: I will write up some information on how to use NSClient++ from Nagios here (in the meantime, some quick thoughts and such).

    Choosing a mode of operation

    NSClient++ has several modes of operation that you can use with Nagios:

    1. NSClient (check_nt): Only has some basic checks and is intended for backwards compatibility.
    2. NRPE (check_nrpe): This is what I would think of as the "normal" or preferred way to use NSClient++. Most examples are intended to be used in this mode.
    3. NSCA (nsca server): If you are an "advanced" Nagios user you might want to do passive checking (which NSClient++ supports). If you don't know what NSCA is, you probably don't want to do this.

    I would recommend that Nagios beginners starting out with NSClient++ go with NSClient (since it is the simplest to set up) and that everyone else go with NRPE (unless you have specific needs, in which case you most likely know enough to choose for yourself).

    NSClient (check_nt)

    This is the simplest and most locked-in way to use NSClient++: you are limited to a handful of checks and there is no way to exploit the full power of NSClient++ from here. The good part is that it is very simple to set up and use, so it might be a good way to start. It is also the "only" mode with password protection. Note, however, that since there is no encryption the password is sent as clear text, so if your network is compromised it will be easy to find. Also, since check_nt is distributed in the "normal plugin kit", you undoubtedly already have everything you need on the Nagios side.

    Nagios has its own guide for setting this up here: http://nagios.sourceforge.net/docs/3_0/monitoring-windows.html

    using from the command line

    I tend to always start with this, as it is a good way to eliminate errors and you won't have to bother with restarting or waiting on Nagios when you need to make changes. To access NSClient++ from the Nagios server via the NSClient protocol you use a program called check_nt (it comes with the default plugins).

    check_nt -H <client ip> -p <port> -v <command> ...
    
    • client ip = the IP of the server you want to monitor (i.e. where NSClient++ is installed).
    • port = the port you are using for the NSClientListener (defaults to 12489).
    • command = the thing you want to monitor. The various commands all take different additional arguments, which are all shown in the help.

    To check the CPU load you can, for instance, run the following (assuming your Windows server has 10.0.0.1 as its IP address):

    check_nt -H 10.0.0.1 -p 12489 -v CPULOAD -w 80 -c 90 -l 5,80,90,10,80,90
    CPU Load 0% (5 min average) 0% (10 min average) |   '5 min avg Load'=0%;80;90;0;100 '10 min avg Load'=0%;80;90;0;100
    

    If you instead got the following, don't worry; we will solve that in the next section:

    CRITICAL - Socket timeout after 10 seconds
    

    NSClient++ configuration

    The first thing you need to do is decide which modules you want to use. NSClient++ is modular by design, which means you only use the features you want (and if you want, you can use all of them). The modules can be roughly divided into two kinds:

    1. check commands
    2. protocols (and utility modules).

    The first kind is the one you *use*: it responds to your commands and "finds" monitored data for you. The second kind is the one that allows you to talk to the first kind. For the NSClient mode you will need the following modules:

    • FileLogger.dll - Logs errors to a file so you can see what is going on.
    • CheckSystem.dll - Handles many system checks (CPU, MEMORY, COUNTER etc)
    • CheckDisk.dll - Handles Disk related checks (USEDDISKSPACE)
    • NSClientListener.dll - Listens and responds to incoming requests from nagios.

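    To enable modules, edit the [modules] section in the NSC.ini file. Your section should look something like this (NRPEListener.dll is not needed for the NSClient mode; it is used by the NRPE mode described further down):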

    [modules]
    FileLogger.dll
    CheckSystem.dll
    CheckDisk.dll
    NSClientListener.dll
    NRPEListener.dll
    

    The other thing you need to configure is who is allowed to ask questions (which IP addresses). This is done either under the [Settings] section (globally) or under the [NSClient] section (locally). I would recommend using the [Settings] section as it will simplify things when you start using NRPE. The keys you need to change are allowed_hosts and password, and the values should be:

    • allowed_hosts = A list of addresses that are allowed to ask questions (i.e. your Nagios IP).
    • password = The password to use.

    The result should look like this (assuming you don't use a password and the nagios ip address is 10.0.0.2):

    [Settings]
    ;password=secret-password
    allowed_hosts=10.0.0.2
    

    Notice that since you don't use a password that key is commented out (;).

    Don't forget to restart NSClient++ after you make changes to the NSC.ini file.

    nsclient++ /stop
    nsclient++ /start
    ... or ...
    net stop nsclientpp
    net start nsclientpp
    

    Now feel free to try the command line client again; hopefully things should work out perfectly. Run the following command from your Nagios server:

    check_nt -H 10.0.0.1 -p 12489 -v CPULOAD -w 80 -c 90 -l 5,80,90,10,80,90
    CPU Load 0% (5 min average) 0% (10 min average) |   '5 min avg Load'=0%;80;90;0;100 '10 min avg Load'=0%;80;90;0;100
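
    Following the usual Nagios plugin convention, the part of the output before the "|" is the human-readable status text and the part after it is performance data. As a minimal sketch (the helper names plugin_text and plugin_perfdata are made up for this example; they are not part of check_nt or NSClient++), the two halves of such a line can be separated in a shell like this:

```shell
# Print the human-readable part of a plugin output line (everything before "|").
plugin_text() {
    printf '%s\n' "$1" | sed 's/[[:space:]]*|.*$//'
}

# Print the performance data part (everything after the first "|").
# Prints nothing when the line carries no performance data.
plugin_perfdata() {
    printf '%s\n' "$1" | sed -n 's/^[^|]*|[[:space:]]*//p'
}

# The output line shown above, used as sample input.
line="CPU Load 0% (5 min average) 0% (10 min average) |   '5 min avg Load'=0%;80;90;0;100 '10 min avg Load'=0%;80;90;0;100"
plugin_text "$line"
plugin_perfdata "$line"
```

    Each performance data entry reads 'label'=value;warn;crit;min;max, which is why the thresholds 80 and 90 from the command line show up again there.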
    

    Finding and solving problems

    A good way to find and solve problems is to run NSClient++ in "test" mode. This is done by stopping the service and starting it in "test" mode.

    nsclient++ /stop
    nsclient++ /test
    ... test mode ... (quit with: exit)
    nsclient++ /start
    

    When in test mode you will get a lot of interesting log messages when things are happening so it is fairly simple to figure out what is wrong.

    nagios configuration

    Nagios comes pre-configured for many of the NSClient checks. In windows.cfg you will find many entries along the lines of:

    define service{
    	use			generic-service
    	host_name		winserver
    	service_description	NSClient++ Version
    	check_command		check_nt!CLIENTVERSION
    }
    

    The interesting part here is 'check_nt!CLIENTVERSION', which runs the check_nt command with CLIENTVERSION as its first argument ($ARG1$). In commands.cfg the check_nt command is defined like so:

    # 'check_nt' command definition
    define command{
    	command_name	check_nt
    	command_line	$USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$
    }
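
    Nagios splits the check_command value on "!": the first field names the command and the remaining fields fill $ARG1$, $ARG2$ and so on. The following shell function is only an illustration of that substitution for the check_nt definition above (expand_check_nt is a made-up helper, not something Nagios ships, and the plugin directory in the example call is an assumption standing in for $USER1$):

```shell
# Illustrate how a check_command value is turned into a command line.
# usage: expand_check_nt <plugin dir> <host address> <check_command value>
# The body runs in a subshell so the IFS and "set -f" changes do not leak out.
expand_check_nt() (
    plugin_dir=$1     # stands in for $USER1$
    host=$2           # stands in for $HOSTADDRESS$
    set -f            # no filename globbing while splitting
    IFS='!'
    set -- $3         # now $1 = command name, $2 = $ARG1$, $3 = $ARG2$
    cmd="$plugin_dir/$1 -H $host -p 12489 -v ${2:-}"
    if [ -n "${3:-}" ]; then
        cmd="$cmd $3"
    fi
    printf '%s\n' "$cmd"
)

# Example call with an assumed plugin directory:
expand_check_nt /usr/local/nagios/libexec 10.0.0.1 'check_nt!CLIENTVERSION'
```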
    

    As you can see, most things are already set up for you, so it is quite simple to get started. The more "advanced" checks (which take parameters) look like this, if you recall the CPULOAD check we tried from the command line:

    define service{
    	use			generic-service
    	host_name		winserver
    	service_description	CPU Load
    	check_command		check_nt!CPULOAD!-l 5,80,90
    }
    

    The check_command is now 'check_nt!CPULOAD!-l 5,80,90', which translates directly into:

    <plugin dir>/check_nt -H <ip of client> -p 12489 -v CPULOAD -l 5,80,90
    

    This, if you recall, is essentially what we used when we tried the command from the command line (there with extra -w/-c options and a second, 10-minute interval). If you want to add a password, the simplest way is to add it in commands.cfg (if you want to have the same password on all your clients) like so:

    # 'check_nt' command definition
    define command{
    	command_name	check_nt
    	command_line	$USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -s <password> -v $ARG1$ $ARG2$
    }
    

    And now you are all set. (sort of)...

    And remember: if you experience problems, don't "debug" from Nagios. Run your command from the command line while NSClient++ is running in /test mode and you should be fine!
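
    When running checks by hand it helps to see the state Nagios would derive from the plugin's exit code (by plugin convention 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN). A small sketch of a wrapper for that follows; run_check is a made-up name for this example and not part of Nagios or NSClient++:

```shell
# Run a plugin command and print the Nagios state its exit code maps to.
# usage: run_check <command> [arguments...]
run_check() {
    # Capture the output and the exit code without tripping "set -e".
    output=$("$@") && code=0 || code=$?
    case "$code" in
        0) state=OK ;;
        1) state=WARNING ;;
        2) state=CRITICAL ;;
        3) state=UNKNOWN ;;
        *) state=INVALID ;;
    esac
    printf '%s (exit code %s): %s\n' "$state" "$code" "$output"
    return "$code"
}

# Example call (needs check_nt on the PATH and a reachable NSClient++ host):
# run_check check_nt -H 10.0.0.1 -p 12489 -v CPULOAD -l 5,80,90
```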

    NRPE (check_nrpe)

    NRPE is the preferred way and, if you ask me, you get the most out of NSClient++ by choosing this mode. NRPE works much like NRPE for Unix (if you are familiar with it); in short, it relays a plugin request to a remote server. NRPE acts like a simple transport layer allowing remote execution. The difference between regular NRPE and NSClient++ is that NSClient++ has built-in checks, so with NSClient++ you get a lot of ready-to-use checks that won't require you to have scripts. But if you choose, you can disable all "modules" and stick with a pure NRPE installation and only external scripts.

    using from the command line

    NRPE requires you to install a special plug-in on your Nagios server, also called NRPE. The Unix side of NRPE consists of a server and a client; on the Nagios server you only need the client, so you can skip any "servers" or whatnot that it wants to start when you install it.

    The client is (generally) called check_nrpe and works like so:

    ./check_nrpe -H <nsclient++ server ip> -c <command> [-a <a> <list> <of> <arguments>]
    
    • <command> = The command (script) you want to run (often this is a pre-built command from within NSClient++)
    • <a> <list> <of> <arguments> = a list of arguments for the command.

    The simplest way to see if things are working is to run it without a command; you should get a response specifying the version of "NRPE" (in this case NSClient++) like so:

    ./check_nrpe -H <nsclient++ server ip>
    I (0.3.3.19 2008-07-02) seem to be doing fine...
    

    And again, as in the NSClient example above, don't worry if you get a timeout here, since we have to configure NSClient++ before it actually works.

    A first Nagios command to think about is one that checks the installed version (the same query as above):

    # 'check_nrpeVersion' command definition
    define command{
    	command_name	check_nrpeVersion
    	command_line	$USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 
    }
    

    NSClient++ configuration

    Configuring NRPE is a bit more involved but not overly so. The first thing you need to do to get things working is add the NRPEListener module.

    [modules]
    ...
    NRPEListener.dll
    ...
    

    If you have not already done so (above), you also need to set which computers are allowed to query the agent. This is set either under the [Settings] section (globally) or under the [NRPE] section (locally). If you set this globally when you configured NSClient above, you are already good to go. If not, the key you need to change is allowed_hosts. There is no password for NRPE.

    • allowed_hosts = A list of addresses that are allowed to ask questions (i.e. your Nagios IP).

    The result should look like this (assuming your nagios server ip address is 10.0.0.2):

    [Settings]
    allowed_hosts=10.0.0.2
    

    After this, restart the service:

    nsclient++ /stop
    nsclient++ /start
    ... or ...
    net stop nsclientpp
    net start nsclientpp
    

    Now feel free to try the command line client again; hopefully things should work out perfectly. Run the following command from your Nagios server:

    ./check_nrpe -H 10.0.0.1
    I (0.3.3.19 2008-07-02) seem to be doing fine...
    

    Finding and solving problems

    A good way to find and solve problems is to run NSClient++ in "test" mode. This is done by stopping the service and starting it in "test" mode.

    nsclient++ /stop
    nsclient++ /test
    ... test mode ... (quit with: exit)
    nsclient++ /start
    

    When in test mode you will get a lot of interesting log messages when things are happening so it is fairly simple to figure out what is wrong.

    NSClient++ configuration (revisited)

    As we said before, NRPE is a bit more involved to configure, and yet thus far it has actually been simpler. This is because we have not really configured anything yet. NRPE has a few more keys, and I shall go over the most important ones here:

    • use_ssl (Boolean): If this is 1 (true), SSL encryption is used when communicating. Notice that this flag has to be the same on both ends or you will end up with strange errors. On check_nrpe it is controlled with the -n option (if you use -n, no SSL will be used).
    • allow_arguments: Since arguments can be potentially dangerous (they allow your users to control the execution), there is a flag (which defaults to off) to enable arguments. So if you plan on controlling NSClient++ from the Nagios end, you need to enable this. But be warned: this is a security issue you need to think about. If you do not want to allow arguments, you can instead configure all checks in the NSC.ini file and just execute the aliases from Nagios.
    • allow_nasty_meta_chars: This flag allows arguments to contain "dangerous" characters such as redirection and pipe (<>|) and makes things a tad more dangerous. If you decide to use arguments you most likely want to use this flag as well, but again, this is a security risk.
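
    Since these flags decide what the Nagios side is allowed to do, it can be handy to check what a given NSC.ini actually sets. The sketch below reads one key from one section of a copy of the file. It assumes the keys live in the [NRPE] section and are written as key=value with no spaces around the "="; ini_get is a made-up helper, not an NSClient++ tool:

```shell
# Print the value of <key> inside [<section>] of an INI file such as NSC.ini.
# usage: ini_get <file> <section> <key>
# Lines starting with ";" are comments and therefore never match.
ini_get() {
    awk -v section="[$2]" -v key="$3" '
        { sub(/\r$/, "") }                            # tolerate Windows line endings
        /^\[/ { in_section = ($0 == section); next }  # a new section starts here
        in_section && index($0, key "=") == 1 {       # line starts with "key="
            print substr($0, length(key) + 2)
            exit
        }
    ' "$1"
}

# Example call against a local copy of the file (the path is an assumption):
# ini_get ./NSC.ini NRPE allow_arguments
```

    An empty result means the key is missing or commented out in that section, in which case the built-in default applies.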

    using from the command line (revisited)

    Now that we have the agent up and running (if not, you probably want to go back over the previous sections before reading on), what can we do with it? From here on we will assume you have allow_arguments and allow_nasty_meta_chars enabled since it makes it simpler to try things out. As we stated before, check_nrpe is a lot more powerful than the "old" check_nt, and there are a lot of built-in commands as well as a lot of external ones you can use. The built-in ones are listed below.

    Let's start with a simple one, CheckCPU, and see how to use it.

    Its documentation has an example like so:

    checkCPU warn=80 crit=90 time=20m time=10s time=4
    CPU Load ok.|'20m average'=11%;80;90; '10s average'=7%;80;90; '4 average'=10%;80;90;
    

    Now this is an "NSClient++ /test mode command", so it is not usable as is; instead you need to change it slightly. The first word is the command and the rest are arguments. check_nrpe has two options for setting these, commands (-c) and arguments (-a), and is used like so:

    check_nrpe ... -c <command> [-a <argument> <argument> <argument>]
    

    In this case (CheckCPU) this translates to:

    check_nrpe ... -c CheckCPU -a <the argument list as-is>
    check_nrpe ... -c CheckCPU -a warn=80 crit=90 time=20m time=10s time=4
    CPU Load ok.|'20m average'=11%;80;90; '10s average'=7%;80;90; '4 average'=10%;80;90;
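
    The translation from a /test mode line to a check_nrpe call is mechanical: the first word goes after -c and everything else goes after -a. A minimal sketch of that rule (to_check_nrpe is a made-up helper that only prints the command line, it does not run anything):

```shell
# Turn a "/test mode" command line into the matching check_nrpe invocation.
# usage: to_check_nrpe <nsclient++ server ip> "<command> [arguments...]"
to_check_nrpe() {
    host=$1
    test_line=$2
    cmd_name=${test_line%% *}    # first word = the command
    case "$test_line" in
        *" "*)
            # Everything after the first space is passed on as-is after -a.
            printf 'check_nrpe -H %s -c %s -a %s\n' "$host" "$cmd_name" "${test_line#* }"
            ;;
        *)
            # No arguments: only -c is needed.
            printf 'check_nrpe -H %s -c %s\n' "$host" "$cmd_name"
            ;;
    esac
}

to_check_nrpe 10.0.0.1 "checkCPU warn=80 crit=90 time=20m time=10s time=4"
```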
    

    And that is as hard as it gets: all you need to do is figure out which arguments you want to use for the command and stack them all in one long line.

    nagios configuration

    ... TODO ...

    NSCA (nsca-server)

    ... TODO ...

    using from the command line

    ... TODO ...

    NSClient++ configuration

    ... TODO ...

    nagios configuration

    ... TODO ...