
CentOS 6.5 - Nagios XI

My CentOS 6.5 x64 server is running Nagios XI 2014 R2.0. I've freshly deployed it after downloading it from the Nagios Library.

Install Mod Gearman

  • Type cd /tmp and press Enter
  • Type wget http://assets.nagios.com/downloads/mod_gearman/rpms/mod_gearman-1.5.0b1-1.el6.x86_64.rpm and press Enter
  • Wait while the file is downloaded
  • Type yum -y localinstall --nogpgcheck mod_gearman-1.5.0b1-1.el6.x86_64.rpm and press Enter
  • Wait while the dependencies are downloaded/installed and mod gearman is installed
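
The install steps above could be collected into a small script. This is only a sketch: the helper name install_mod_gearman and the RUN_INSTALL switch are made up for illustration, and nothing is downloaded or installed unless that switch is set.

```shell
#!/bin/sh
# Package details copied from the steps above.
rpm="mod_gearman-1.5.0b1-1.el6.x86_64.rpm"
url="http://assets.nagios.com/downloads/mod_gearman/rpms/$rpm"

# Hypothetical helper: download the RPM into /tmp and install it with yum.
# Each step only runs if the previous one succeeded.
install_mod_gearman() {
    cd /tmp &&
    wget "$url" &&
    yum -y localinstall --nogpgcheck "$rpm"
}

# Nothing is installed unless RUN_INSTALL=1 is set (an assumed safety switch).
if [ "${RUN_INSTALL:-0}" = "1" ]; then
    install_mod_gearman
else
    echo "Would download $url"
fi
```

Run it as root with RUN_INSTALL=1 to perform the installation; without the switch it only prints the download URL.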

At this point:
  • Gearman (gearmand) is installed and stopped
  • Mod Gearman Job Server is installed, but Nagios is not yet configured to use it
  • Mod Gearman Worker (mod_gearman_worker) is installed and stopped

Configure Firewall Rules

This becomes important when you start using workers on other servers (external workers).
  • Type iptables -I INPUT -p tcp --destination-port 4730 -j ACCEPT and press Enter
  • Type service iptables save and press Enter


Gearman Configuration

Gearman and Mod Gearman: what's the difference? The Mod Gearman project builds on another open source project called Gearman. Gearman provides a generic application framework to farm out work to other machines or processes that are better suited to do the work. This is one of those situations where there's no point in reinventing the wheel.

We need to configure gearmand to run on boot.
  • Type chkconfig gearmand on and press Enter


Job Server Configuration

The default configuration file for the Job Server is /etc/mod_gearman/mod_gearman_neb.conf

The defaults are fine for now, with one exception described below. The workers will contact this server to let the Job Server know they are ready to do work.

The one setting that should change is the server address, as this avoids some known issues: use "127.0.0.1" instead of "localhost".
  • Type nano /etc/mod_gearman/mod_gearman_neb.conf and press Enter
  • Find the setting:
    • server=localhost:4730
  • Change this to:
    • server=127.0.0.1:4730
  • Press Ctrl + x
  • Type Y
  • Press Enter to save changes
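
If you would rather not edit the file by hand, the same change can be sketched with sed. The helper name use_loopback_ip is hypothetical, and the demo runs against a throwaway file rather than the real /etc/mod_gearman/mod_gearman_neb.conf, so treat it as an illustration only.

```shell
#!/bin/sh
# Hypothetical helper: replace server=localhost:4730 with server=127.0.0.1:4730
# in the given file. sed keeps a backup copy with a .bak suffix.
use_loopback_ip() {
    sed -i.bak 's/^server=localhost:4730/server=127.0.0.1:4730/' "$1"
}

# Demo on a throwaway file that stands in for mod_gearman_neb.conf.
demo_conf=$(mktemp)
printf 'debug=0\nserver=localhost:4730\n' > "$demo_conf"
use_loopback_ip "$demo_conf"

# Show the resulting server line.
grep '^server=' "$demo_conf"
```

To apply it for real you would pass the actual config path to the helper (as root) and check the .bak copy if anything looks wrong.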


Worker Configuration

The default configuration file for the Worker is /etc/mod_gearman/mod_gearman_worker.conf

The defaults are fine for now, with one exception described below. The worker will contact the Job server to let it know that it is ready to do work. But how does the worker know the address of the Job server?
  • server=localhost:4730
This setting is how the worker knows who to talk to. In this case it's a local worker talking to the local Job server, which will be fine for our initial testing.

The one setting that should change is this server address, as using "127.0.0.1" instead of "localhost" avoids some known issues.
  • Type nano /etc/mod_gearman/mod_gearman_worker.conf and press Enter
  • Find the setting:
    • server=localhost:4730
  • Change this to:
    • server=127.0.0.1:4730
  • Press Ctrl + x
  • Type Y
  • Press Enter to save changes
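
A quick way to confirm which Job server a worker config points at is to print its server= lines. The helper name configured_servers is made up for this sketch, and the demo uses a temporary file in place of /etc/mod_gearman/mod_gearman_worker.conf.

```shell
#!/bin/sh
# Hypothetical helper: print the address from every line that starts with server=
configured_servers() {
    sed -n 's/^server=//p' "$1"
}

# Demo on a throwaway file that stands in for mod_gearman_worker.conf.
demo_conf=$(mktemp)
printf '# worker settings\nserver=127.0.0.1:4730\n' > "$demo_conf"
configured_servers "$demo_conf"
```

Commented-out lines are skipped because the pattern only matches lines that begin with server=.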

Nagios Configuration

Finally we need to tell Nagios to use Mod Gearman for distributing checks.

  • Type nano /usr/local/nagios/etc/nagios.cfg and press Enter
  • Add the following line to the config file, anywhere is fine (I'm adding it under the NDOUtils module)
  • broker_module=/usr/lib64/mod_gearman/mod_gearman.o config=/etc/mod_gearman/mod_gearman_neb.conf
  • Press Ctrl + x
  • Type Y
  • Press Enter to save changes
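
The same edit can be sketched as a script that appends the broker_module line only when it is missing, so running it twice does no harm. The helper name add_broker_module is hypothetical, and the demo works on a temporary file instead of the real nagios.cfg.

```shell
#!/bin/sh
# The line from the step above, kept in one place.
broker_line='broker_module=/usr/lib64/mod_gearman/mod_gearman.o config=/etc/mod_gearman/mod_gearman_neb.conf'

# Hypothetical helper: append the line unless an identical line already exists.
add_broker_module() {
    grep -qxF "$broker_line" "$1" || printf '%s\n' "$broker_line" >> "$1"
}

# Demo on a throwaway file that stands in for nagios.cfg.
demo_cfg=$(mktemp)
printf 'log_file=/usr/local/nagios/var/nagios.log\n' > "$demo_cfg"
add_broker_module "$demo_cfg"
add_broker_module "$demo_cfg"   # second call finds the line and does nothing

# Count the broker_module lines in the demo file.
grep -c '^broker_module=' "$demo_cfg"
```

The design choice here is idempotence: a duplicated broker_module line is the kind of mistake that is easy to make when a step is repeated.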


Start / Stop Sequence

You must start and stop everything in sequence for it all to work correctly.

Go ahead and start everything (in this case Nagios is already running, so you can restart it instead of starting it).

  • Service Start Order:
    • Type service gearmand start and press Enter
    • Type service mod_gearman_worker start and press Enter
    • Type service nagios start and press Enter
  • Service Stop Order:
    • Type service nagios stop and press Enter
    • Type service mod_gearman_worker stop and press Enter
    • Type service gearmand stop and press Enter
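
The ordering rule above is simply "start in one order, stop in the reverse order", which can be sketched as follows. The helper name service_sequence is made up, and it only prints the commands rather than running them, so it is safe to try anywhere.

```shell
#!/bin/sh
# Start order from the guide; the stop order is the same list reversed.
START_ORDER="gearmand mod_gearman_worker nagios"

# Hypothetical helper: print the service commands for "start" or "stop"
# in the required order. It does not execute anything.
service_sequence() {
    action="$1"
    order="$START_ORDER"
    if [ "$action" = "stop" ]; then
        # Build the reversed list by prepending each service in turn.
        order=""
        for svc in $START_ORDER; do
            order="$svc $order"
        done
    fi
    for svc in $order; do
        echo "service $svc $action"
    done
}

service_sequence start
service_sequence stop
```

Once the printed commands look right, the output could be piped to sh as root to actually run them.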

Is It Working?

First check to make sure Nagios is still running. If there was a problem, then the Nagios service would have stopped.
  • Type service nagios status and press Enter
Now check the Job Server Queue
  • Type gearman_top and press Enter
  • You'll get something like this:
    [Screenshot: gearman_top]
  • Right now this is a pretty quiet server, so not much is happening
  • If you went into Nagios and did a "Schedule immediate check for all services on this host" for localhost you might see this screen update (once again, it's not doing a lot)
  • Press Ctrl + c to exit gearman_top

  • For a bit of fun, let's stop the worker and see what happens
  • Type service mod_gearman_worker stop and press Enter
  • Type gearman_top and press Enter
  • Go into Nagios and initiate a "Schedule immediate check for all services on this host" for localhost
  • Now gearman_top will show jobs queuing up under "Jobs Waiting", because there is no worker available to run them
  • Note that the worker is not yet configured to start on boot, so the same scenario will occur after a reboot
  • Press Ctrl + c to exit gearman_top
  • Type service mod_gearman_worker start and press Enter
  • Type gearman_top and press Enter
  • You'll notice everything is back to normal now
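
The queue figures can also be read without the interactive screen. gearmand answers the plain-text admin command status on port 4730 with one tab-separated line per queue (name, total jobs, running jobs, available workers), ending with a line containing a single dot. The sketch below assumes that "waiting" means total minus running; the helper name waiting_jobs and the sample numbers are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read gearmand "status" output on stdin and print
# each queue name with its number of waiting jobs (total minus running).
# The terminating "." line has only one field and is skipped.
waiting_jobs() {
    awk -F '\t' 'NF == 4 { printf "%s %d\n", $1, $2 - $3 }'
}

# Made-up sample in the status format: name, total, running, workers.
printf 'host\t0\t0\t1\nservice\t5\t1\t1\n.\n' | waiting_jobs
```

On a live system the status text would come from the Job server itself, for example by sending the line status to 127.0.0.1 port 4730 with a tool such as nc, if it is installed.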

Hold on ... why do I have workers running on my job server? That's just the default configuration on installation and separating this out to a different (external) server will be the next step.

Workers & localhost

Up until this point we have been using a local worker. Now it's time to configure some worker servers. However before doing that, a configuration change is required on the Job server first.

Our Nagios server has a handful of "localhost" checks which are specific to the Nagios host. If we configure external workers and send them jobs for localhost, those checks will run against whichever worker picks them up, not against the Nagios server. So how can this be avoided?

The Job server can be configured to ignore Nagios host and service objects. This is done by putting these objects into Nagios host and service groups and then configuring the Job server to use those groups for its "ignore list".

In Nagios, create a host group and a service group similar to the following:

define hostgroup {
    hostgroup_name                        hosts_ignored_by_mod_gearman
    alias                                 hosts_ignored_by_mod_gearman
    members                               localhost
}

define servicegroup {
    servicegroup_name                     services_ignored_by_mod_gearman
    alias                                 services_ignored_by_mod_gearman
    members                               localhost,PING,localhost,Root Partition,localhost,Current Users,localhost,Total Processes,localhost,Current Load,localhost,Swap Usage,localhost,SSH,localhost,HTTP
}

Make sure you restart Nagios to ensure the configuration is active.
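
The servicegroup members value is a flat list of host,service pairs, which is tedious to type by hand. As a rough sketch, it could be generated from a host name and a list of service descriptions; the helper name build_members is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: build a servicegroup "members" value.
# $1 is the host name; every further argument is a service description.
build_members() {
    host="$1"
    shift
    members=""
    for svc in "$@"; do
        # Add a comma separator only when the list is not empty.
        members="${members:+$members,}$host,$svc"
    done
    printf '%s\n' "$members"
}

# Quote each description so names with spaces stay in one piece.
build_members localhost "PING" "Root Partition" "Current Users"
```

The output can be pasted after the members directive in the servicegroup definition.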

Now to configure the Job server. What we are doing here is telling the Job Server to:
  • Ignore any hosts in the hostgroup "hosts_ignored_by_mod_gearman" and let Nagios execute these checks locally
  • Ignore any services in the servicegroup "services_ignored_by_mod_gearman" and let Nagios execute these checks locally
To make the change:
  • Type nano /etc/mod_gearman/mod_gearman_neb.conf and press Enter
  • Find the setting:
    • localhostgroups=
  • Change this to:
    • localhostgroups=hosts_ignored_by_mod_gearman
  • Find the setting:
    • localservicegroups=
  • Change this to:
    • localservicegroups=services_ignored_by_mod_gearman
  • Press Ctrl + x
  • Type Y
  • Press Enter to save changes
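
Both settings can also be changed with a small key=value helper. This is a sketch under stated assumptions: the helper name set_conf_key is hypothetical, it expects the key to already exist at the start of a line, and the demo edits a temporary file rather than the real mod_gearman_neb.conf.

```shell
#!/bin/sh
# Hypothetical helper: set key=value on an existing "key=" line.
# $1 = file, $2 = key, $3 = value. sed keeps a .bak backup of the file.
set_conf_key() {
    sed -i.bak "s|^$2=.*|$2=$3|" "$1"
}

# Demo on a throwaway file that stands in for mod_gearman_neb.conf.
demo_conf=$(mktemp)
printf 'localhostgroups=\nlocalservicegroups=\n' > "$demo_conf"
set_conf_key "$demo_conf" localhostgroups hosts_ignored_by_mod_gearman
set_conf_key "$demo_conf" localservicegroups services_ignored_by_mod_gearman

# Show the edited settings.
cat "$demo_conf"
```

The | delimiter in the sed expression is fine for group names like these, but a value containing | would need a different delimiter.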

Now restart Nagios, the Job server and the worker in the required sequence.
  • Type service nagios stop and press Enter
  • Type service mod_gearman_worker stop and press Enter
  • Type service gearmand restart and press Enter
  • Type service mod_gearman_worker start and press Enter
  • Type service nagios start and press Enter
That's all there is to configuring it.

How can you test if this is working?
  • In the example earlier, when the mod_gearman_worker service was stopped, jobs simply queued up under "Jobs Waiting" in gearman_top
  • With that in mind, if you stop the mod_gearman_worker service now and force checks of localhost (host and service objects), you will see that they are still executed and do not queue up under "Jobs Waiting" in gearman_top


External Workers

Now it's time to add some external workers. Follow the steps here and then come back to this guide when those steps have been completed.

In this scenario, two workers were just deployed and are pointing to this Job server. Here's what it looks like from gearman_top:

[Screenshot: gearman_top showing three workers]

What you can see here is three workers: the Job server itself is a worker, along with the two external worker servers.

Remove Local Worker

If you do not want your local Nagios server to be a worker, you can simply stop the worker and prevent it from starting on boot:
  • Type service mod_gearman_worker stop and press Enter
  • Type chkconfig mod_gearman_worker off and press Enter
  • The next time you reboot the server you will not see the Job server in the gearman_top worker list


Keep Local Worker

If you want to keep your local Nagios server as a worker, you can simply configure the worker to start on boot:
  • Type chkconfig mod_gearman_worker on and press Enter

It's as simple as that.