Service dependencies allow you to suppress notifications and active checks of services based on the status of one or more other services. Why would you want this? When a host goes down, doesn't this happen anyway? Yes and no.

    define host {
        use                     windows-server
        host_name               host1
        alias                   host1
        address                 10.25.14.51
    }

    define host {
        use                     windows-server
        host_name               host2
        alias                   host2
        address                 10.25.14.52
    }

    define hostgroup {
        hostgroup_name          nrpe_hosts_windows
        alias                   nrpe_hosts_windows
        members                 host1,host2
    }

    define command {
        command_name            check_nrpe_status
        command_line            $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30
    }

    define command {
        command_name            check_nrpe
        command_line            $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$ -a $ARG2$
    }

    define service {
        use                     local-service
        hostgroup_name          nrpe_hosts_windows
        service_description     NRPE Status
        check_command           check_nrpe_status
        max_check_attempts      2
        check_interval          1
        retry_interval          1
    }

    define service {
        use                     local-service
        hostgroup_name          nrpe_hosts_windows
        service_description     CPU Load
        check_command           check_nrpe!CheckCPU!warn=80 crit=90 time=1m time=5m time=15m ShowAll
        max_check_attempts      4
        check_interval          1
        retry_interval          1
    }

    define service {
        use                     local-service
        hostgroup_name          nrpe_hosts_windows
        service_description     Disk Usage - C:
        check_command           check_nrpe!CheckDriveSize!ShowAll MinWarn=10G MinCrit=5G Drive=C:
        max_check_attempts      4
        check_interval          1
        retry_interval          1
    }

One of these services is called NRPE Status. For this check, NSClient++ simply returns a response like "I (0,4,1,105 2014-04-28) seem to be doing fine..." with an OK status. If NSClient++ fails to respond, check_nrpe will return a response like "CHECK_NRPE: Socket timeout after 30 seconds." with a CRITICAL status. When this happens, we don't want any other NRPE-based checks to run, or to be notified about them, until the original problem is resolved.

The Most Important Settings

One of the most overlooked settings for services is the combination of
max_check_attempts values across the services. In the definitions above, the NRPE Status service has max_check_attempts set to 2, while the CPU Load and Disk Usage - C: services have it set to 4.

Why is this important? It means that the NRPE Status service is guaranteed to go down BEFORE the other services, and at this point the service dependency will take effect. If they all had the same max_check_attempts value, then it is very possible that one of the other services could go down BEFORE the NRPE Status service, and then notifications would be sent.

Of course, check_interval and retry_interval also need to be taken into consideration. In this case they both have the value of 1 to keep the example simple.

A service dependency for a single host is a good starting example. The definition is as follows:

    define servicedependency {
        inherits_parent                 1
        execution_failure_criteria      u,c,p
        notification_failure_criteria   u,c,p
        dependency_period               24x7
    }

What does this mean?
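In short: inherits_parent 1 tells Nagios that this dependency also inherits the dependencies of the master service. The failure criteria u,c,p apply the dependency while the master service is UNKNOWN, CRITICAL or PENDING. execution_failure_criteria stops active checks of the dependent service, and notification_failure_criteria stops its notifications. dependency_period 24x7 keeps the dependency in force at all times.

A dependency also has to name the master service and the dependent service. As a sketch only, assuming NRPE Status on host1 is the master and CPU Load on host1 is the dependent service, a complete definition might look like the one below. The first four directives and their values are my assumption, built from the host and service names defined earlier.

```cfg
define servicedependency {
    host_name                       host1           ; assumed: host running the master service
    service_description             NRPE Status     ; assumed: master service
    dependent_host_name             host1           ; assumed: host running the dependent service
    dependent_service_description   CPU Load        ; assumed: service to suppress
    inherits_parent                 1               ; also honour the master's own dependencies
    execution_failure_criteria      u,c,p           ; no active checks while master is UNKNOWN/CRITICAL/PENDING
    notification_failure_criteria   u,c,p           ; no notifications in those same states
    dependency_period               24x7            ; timeperiod during which the dependency applies
}
```

A second definition with dependent_service_description set to Disk Usage - C: would cover the other NRPE-based check in the same way.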
Now let's make the second host (host2) use the same dependencies. You can just add the host to the host_name directive, separating the host names with a comma:

    define servicedependency {
        inherits_parent                 1
        execution_failure_criteria      u,c,p
        notification_failure_criteria   u,c,p
        dependency_period               24x7
    }

What does this mean?
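The intent is that each host's NRPE-based checks depend only on that same host's NRPE Status service. A hedged sketch of the two-host form is shown below. It assumes the "same host dependencies" behaviour described in the Nagios Core object-tricks documentation, where leaving out dependent_host_name makes each dependent service depend on the master service of its own host. Listing both hosts in host_name and dependent_host_name would instead pair every host with every other host. The identifying directives here are assumptions, not recovered from the original.

```cfg
define servicedependency {
    host_name                       host1,host2     ; assumed: both hosts, comma-separated
    service_description             NRPE Status     ; assumed: master service on each host
    dependent_service_description   CPU Load        ; assumed: dependent service on the same host
    inherits_parent                 1
    execution_failure_criteria      u,c,p
    notification_failure_criteria   u,c,p
    dependency_period               24x7
}
```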
The previous example allowed the dependency to be applied to multiple hosts. However, this can become an administrative overhead: each time you add a new host to be monitored, you would need to update the dependency to include it. Using a hostgroup instead is a much simpler way to achieve this. Since you need the same named services on each host, it's more than likely you'll be using hostgroups to apply services to multiple hosts, as in this example. I defined a hostgroup earlier called nrpe_hosts_windows, so we need to use the hostgroup_name directive:

    define servicedependency {
        inherits_parent                 1
        execution_failure_criteria      u,c,p
        notification_failure_criteria   u,c,p
        dependency_period               24x7
    }

What does this mean?
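With a hostgroup, every current and future member of nrpe_hosts_windows picks up the dependency without further edits. The sketch below again assumes that omitting dependent_host_name and dependent_hostgroup_name yields same-host dependencies. The service names are carried over from the earlier definitions as an assumption.

```cfg
define servicedependency {
    hostgroup_name                  nrpe_hosts_windows  ; hostgroup defined earlier
    service_description             NRPE Status         ; assumed: master service on each member host
    dependent_service_description   CPU Load            ; assumed: dependent service on the same host
    inherits_parent                 1
    execution_failure_criteria      u,c,p
    notification_failure_criteria   u,c,p
    dependency_period               24x7
}
```

Before reloading, it is worth running the Nagios pre-flight check (nagios -v followed by the path to your main configuration file) to confirm that the dependencies were created as intended.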