Monitor Type: nagios
Accepts Endpoints: No
Multiple Instances Allowed: Yes
Wrapper that runs existing Nagios status check scripts through the SignalFx agent, which plays the role of NRPE or SNMP exec.
It runs the script set in the `command` parameter and sends the state of the check based on the exit code of the command.
It is very similar to the `telegraf/exec` monitor configured with `dataFormat: nagios`, but:
- it does not retrieve perfdata metrics, only the state of the script, for alerting purposes.
- it overrides the state if the exit code is 0 but the output string starts with `warn`, `crit` or `unkn` (case insensitive).
The main advantage and purpose of this monitor is to add more context to the status check state through SignalFx events. In addition to the state metric, it sends an event that includes the output and the error caught from the command execution.
This should make troubleshooting more efficient and let the user stay in SignalFx, without having to connect to the machine to understand what is happening when the state is abnormal. It also makes it possible to create a dashboard similar to what Nagios users are accustomed to.
Note: the last sent event is cached in memory to avoid sending the same event again at every collection interval. Restarting the agent erases this cache, so an event that was already sent will be sent again. If your check normally produces a different output on each run, as the `uptime` check does, you can use the `ignoreStdOut: true` parameter to exclude stdout from the comparison.
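As an illustration of the note above, a monitor entry for an uptime-style check might look like the following sketch. The plugin path and the `service` value are assumptions chosen for the example; the relevant part is the `ignoreStdOut` option from the configuration table.

```yaml
monitors:
  - type: nagios
    # Assumed plugin location; adjust to wherever the check lives on your host
    command: /usr/lib/nagios/plugins/check_uptime
    # Hypothetical service name, used only for this example
    service: uptime
    # stdout differs on every run, so exclude it from the event comparison
    ignoreStdOut: true
```

With this setting, a change in stdout alone should no longer cause a new event to be sent at each collection interval.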
To activate this monitor in the Smart Agent, add the following to your agent config:
monitors:  # All monitor config goes under this key
 - type: nagios
   ...  # Additional config
For a list of monitor options that are common to all monitors, see Common Configuration.
Config option | Required | Type | Description |
---|---|---|---|
`command` | yes | string | The command to exec with any arguments, like: `"LC_ALL=\"en_US.utf8\" /usr/lib/nagios/plugins/check_ntp_time -H pool.ntp.typhon.net -w 0.5 -c 1"` |
`service` | yes | string | Corresponds to the Nagios `service` column and allows aggregating all instances of the same service (when calling the same check script with different arguments) |
`timeout` | no | integer | The maximum execution time allowed, in seconds, before sending `SIGKILL` (default: `9`) |
`ignoreStdOut` | no | bool | If `false`, a new event is sent when stdout differs from the last event (default: `false`) |
`ignoreStdErr` | no | bool | If `false`, a new event is sent when stderr differs from the last event (default: `false`) |
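To show how these options fit together, here is a minimal sketch that runs the same check script twice with different arguments under one `service`, as described for the `service` option. The first command is the one quoted in the table; the second host (`time.example.com`), the `service` name and the `timeout` value are assumptions made for the example.

```yaml
monitors:
  - type: nagios
    # Command taken from the option description above
    command: 'LC_ALL="en_US.utf8" /usr/lib/nagios/plugins/check_ntp_time -H pool.ntp.typhon.net -w 0.5 -c 1'
    # Hypothetical service name shared by both instances so they can be aggregated
    service: ntp_time
    # Allow up to 15 seconds instead of the default 9 before the command is killed
    timeout: 15
  - type: nagios
    # Same script, different arguments; time.example.com is a placeholder host
    command: 'LC_ALL="en_US.utf8" /usr/lib/nagios/plugins/check_ntp_time -H time.example.com -w 0.5 -c 1'
    service: ntp_time
    timeout: 15
```

Because multiple instances of this monitor are allowed, each entry runs independently. The `command` dimension tells the two instances apart, while the shared `service` dimension groups them.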
These are the metrics available for this monitor. Metrics that are categorized as container/host (default) are in bold and italics in the list below.
- `nagios.state` (gauge): Nagios status check state.
To emit metrics that are not default, you can add those metrics in the generic monitor-level `extraMetrics` config option. Metrics that are derived from specific configuration options that do not appear in the above list of metrics do not need to be added to `extraMetrics`.
To see a list of metrics that will be emitted you can run `agent-status monitors` after configuring this monitor in a running agent instance.
The following dimensions may occur on metrics emitted by this monitor. Some dimensions may be specific to certain metrics.
Name | Description |
---|---|
`command` | The configured command for this monitor. |
`plugin` | The name of this monitor: `nagios`. |
`service` | The configured service for this monitor. |