I use Nagios to monitor a Java process, and I want to restart the process automatically when it goes down. Does Nagios provide an API or listener for this, and is it possible with Nagios at all?
Any help on this would be useful.
thanks
Lokesh
Yes, this is achieved through the use of Event Handlers in Nagios.
http://nagios.sourceforge.net/docs/3_0/eventhandlers.html
Use event handlers.
Here is a config I use:
define service{
    host_name mysql_host
    service_description mysql
    max_check_attempts 2
    event_handler mysql_bounce
}
define command{
    command_name mysql_bounce
    command_line /opt/nagios/scripts/mysql_bounce
}
The script /opt/nagios/scripts/mysql_bounce contains the command that bounces (restarts) MySQL.
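Nagios runs an event handler on every state change, including soft ones, so it is common to restart only once the service has reached a hard CRITICAL state (here, after max_check_attempts 2). A minimal sketch of such a script, assuming the command_line is extended to pass the standard macros $SERVICESTATE$ and $SERVICESTATETYPE$ as the first two arguments (the config above passes none). The function name is hypothetical and the restart step is only a placeholder:

```shell
#!/bin/sh
# Sketch of an event handler body. Assumes it is invoked as:
#   mysql_bounce $SERVICESTATE$ $SERVICESTATETYPE$
# (an assumption; the command_line in the config above passes no arguments).

# Print "restart" when the service should be bounced, "ignore" otherwise.
decide_action() {
    state="$1"       # OK, WARNING, UNKNOWN or CRITICAL
    state_type="$2"  # SOFT or HARD
    if [ "$state" = "CRITICAL" ] && [ "$state_type" = "HARD" ]; then
        echo "restart"
    else
        echo "ignore"
    fi
}

# Only act when the script was called with both arguments.
if [ "$#" -ge 2 ] && [ "$(decide_action "$1" "$2")" = "restart" ]; then
    # Placeholder: put the real restart command for your system here.
    echo "would restart the service now"
fi
```

The same pattern should work for a Java process: point event_handler at a script like this and replace the placeholder with whatever command starts the process on your host.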
First of all, sorry for my English.
I am trying to configure a check in Nagios that monitors log files and notifies me when it finds a string such as "Exception" or "Error".
I use Nagios with Centreon.
When I execute my command:
$USER1$/check_log -F path/for/log.files -q /Exception/
Nagios returns: "Log check error: Log file path/for/log.files does not exist!"
When I check the path on my server, the file exists and everyone (root, group and other) can read it, so the problem does not seem to be a permissions issue.
The monitored client runs CentOS. I have already installed the NRPE client and configured the allowed hosts, etc.
I looked everywhere for someone who had the same error but found nothing.
If someone could help me, that would be great!
If you need further information to help me, please don't hesitate to ask; I'm not sure I have explained my problem well.
Regards.
On the Nagios server side, you must define something like this:
define service {
    service_description service_name
    host_name your_remote_hostname
    use your_template
    check_command ext_check!check_log!-f path/for/log.files -g /Exception/
}
In commands.cfg, or a similar file, on the Nagios server:
define command{
    command_name ext_check
    command_line $USER1$/check_nrpe -t 30 -H $HOSTADDRESS$ -p 5666 -c $ARG1$ -a $ARG2$
}
On the client (monitored) host, you must edit the nrpe.cfg file:
command[check_log]=/opt/nagios/plugins/check_log $ARG1$
After that, you must restart the NRPE service and reload the Nagios configuration.
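Because this setup forwards arguments with -a, NRPE on the monitored host also has to accept command arguments: it must be built with command-argument support, and nrpe.cfg must contain dont_blame_nrpe=1. A small sketch for checking the second condition. The function name is hypothetical, and the path you pass depends on your installation:

```shell
#!/bin/sh
# Return 0 if the given nrpe.cfg enables command arguments
# (an uncommented "dont_blame_nrpe=1" line), 1 otherwise.
nrpe_args_enabled() {
    cfg="$1"
    # An unreadable or missing file counts as "not enabled".
    [ -r "$cfg" ] || return 1
    grep -Eq '^[[:space:]]*dont_blame_nrpe[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$cfg"
}
```

For example, nrpe_args_enabled /etc/nagios/nrpe.cfg && echo "arguments allowed" (the path here is an assumption; use the nrpe.cfg your NRPE daemon actually loads).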
I'm getting a (No output returned from plugin) from a host and cannot understand why:
Service on monitor server:
# Check Clamd availability
define service {
    hostgroup_name clamd-servers
    service_description ClamAV Daemon
    check_command check_nrpe!check_clamd
    use generic-service
    notification_interval 0 ; set > 0 if you want to be renotified
}
Hosts on monitor:
# Clamd Servers
define hostgroup {
    hostgroup_name clamd-servers
    alias ClamAV servers
    members fsmvps
}
nrpe_local.cfg on host fsmvps:
command[check_clamd]=/usr/lib/nagios/plugins/check_clamd -H /var/run/clamav/clamd.ctl
Running the command /usr/lib/nagios/plugins/check_clamd -H /var/run/clamav/clamd.ctl on the host will produce the following output as clam is up and running:
CLAMD OK - 0.000 second response time on socket /var/run/clamav/clamd.ctl [PONG]|time=0.000219s;;;0.000000;10.000000
I'm clueless at the moment as to why no output is returned, as I'm a beginner with Nagios.
Perhaps your NRPE service wasn't set up correctly (sometimes it complains about SSL).
Running something like this on your monitor server (as the nagios user) might help diagnose things:
/usr/lib/nagios/plugins/check_nrpe -H fsmvps -c check_clamd
It might be one of the following:
Permissions: can the nagios user on fsmvps read /var/run/clamav/clamd.ctl?
check_nrpe needs the -n flag (to disable SSL) or a different port.
NRPE was not restarted on the fsmvps server after you edited its config.
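When running the check by hand, the exit status is worth looking at as well as the text, since Nagios derives the service state from the plugin's exit code. A minimal helper for reading it; the function name is hypothetical:

```shell
#!/bin/sh
# Translate a plugin exit code into the Nagios state it stands for.
nagios_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "OUT OF BOUNDS" ;;
    esac
}
```

Run the check_nrpe command, then nagios_state $? on the next line. An exit code outside 0 to 3 usually means the plugin itself could not be run rather than that the check failed.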
I want to check my supervisord status with Nagios. I have two servers: one runs Nagios and the other is the client server. supervisord is running on the client server.
I have put my check_supervisord.py file in /usr/local/nagios/libexec and added this to my services.cfg file:
define service {
    use generic-service
    host_name ubuntuserver
    service_description supervisord
    check_command check_supervisord!80!hduser!password
}
But it shows me a "plugin missing" error.
Since your other plugins are running successfully, I would guess this is a permission issue.
cd /usr/local/nagios/libexec
chmod 755 check_supervisord.py
chown root:nagios check_supervisord.py
Try that and see if the plugin works. If it doesn't, check what permissions supervisord needs when called from a script, or compare the script's permissions with those of the other plugins that are working on your system.
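The comparison with working plugins can be scripted. A minimal sketch that reports whether a plugin file exists and is executable by the current user, so it gives the most useful answer when run as the nagios user. The function name is hypothetical:

```shell
#!/bin/sh
# Report why a plugin file might not be runnable by the current user.
plugin_status() {
    plugin="$1"
    if [ ! -e "$plugin" ]; then
        echo "missing"
    elif [ ! -x "$plugin" ]; then
        echo "not executable"
    else
        echo "ok"
    fi
}
```

For example, plugin_status /usr/local/nagios/libexec/check_supervisord.py. If this prints "ok" and Nagios still reports a missing plugin, it is worth checking that a command named check_supervisord is actually defined and points at this file.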
I've got two NRPE checks in nagios that are failing as undefined in the nagios web interface.
Which is odd, because:
1) I have the check defined in the config file for this host on the nagios server. In /usr/local/nagios/etc/objects/web2.cfg I have these two definitions.
# Define a service to check Cassandra reads on the web2 machine.
define service{
    use generic-service
    host_name web2
    service_description Check NRPE Cassandra JMX
    contact_groups linux-admins
    check_command check_nrpe!check_cassandra_jmx
    notifications_enabled 1
}
# Define a service to check Cassandra reads on the web2 machine.
define service{
    use generic-service
    host_name web2
    service_description Check NRPE Cassandra Heap
    contact_groups linux-admins
    check_command check_nrpe!check_cassandra_heap
    notifications_enabled 1
}
2) I have the check defined in the nrpe.cfg on the host I'm checking:
[root@web2:~] #egrep "heap|jmx" /usr/local/nagios/etc/nrpe.cfg
command[check_cassandra_jmx]=/usr/local/nagios/libexec/check_jmx -U service:jmx:rmi:///jndi/rmi://beta.jokefire.com:7199/jmxrmi -O java.lang:type=Memory -A HeapMemoryUsage -K used -I HeapMemoryUsage -J used -vvvv -w 10737418240 -c 20401094656
command[check_cassandra_heap]=/usr/local/nagios/libexec/cassandra.pl
3) If I go back to the nagios host and run the commands via check_nrpe on the command line, it succeeds:
[root@monitor1:~] #/usr/local/nagios/libexec/check_nrpe -H web2.mydomain.com -c check_cassandra_jmx
JMX OK HeapMemoryUsage.used=142913536{committed=526385152;init=536870912;max=526385152;used=142913536}
[root@monitor1:~] #/usr/local/nagios/libexec/check_nrpe -H web2.mydomain.com -c check_cassandra_heap
CASSANDRA OK - | heap_mem=27.46
Other checks for this host are working fine in the web interface. Does anyone out there have some ideas about what's wrong in this case?
Thanks!
I am looking for a little help in configuring NAGIOS for NRPE. I am quite new at Linux and seem to be having some trouble getting this working.
I am running Ubuntu 11.10 with Nagios Core 3.3.1, Nagios plugins 1.4.15 and NRPE 2.13.
Currently I am trying to get the Nagios Exchange plugin check_be.exe to work with Nagios. I followed the check_be.txt for the setup on my nagios server and windows backup exec server.
Currently if I run
root@PERSES:/usr/local/nagios/libexec# ./check_nrpe -H 192.168.1.10 -t200 -c check_be
I will get
Job: Daily Backup, Success, Date:17/4/2012
From Nagios, all I get is "No output returned from plugin".
Windows.cfg has the following entry
# Service for Backup Exec agent
define service {
    use template-backupexec
    service_description BackupExec - Daily DAT backup ; specific display name, if you need
    host_name cmbssrv.cmbs.local
}
Templates.cfg has this entry – I have tried to modify it to avoid the socket timeout
define service{
    name template-backupexec
    use generic-service
    service_description BackupExec Job Check ; default display name in Nagios
    check_command check_nrpe! -t 240 -c check_be ; same name as in the nsclient++ nsc.ini command defini$
    normal_check_interval 60 ; your check intervals here
    retry_check_interval 60
    register 0 ; this is a template
}
Commands.cfg:
# 'check_nrpe' command definition
define command{
    command_name check_nrpe
    command_line $USER1$/check_nrpe -H $HOSTADDESS$ -p 5666 -v $ARG1$
}
Any ideas would be greatly appreciated
This looks wrong
check_command check_nrpe! -t 240 -c check_be
I think those extra arguments need to be in the define command block.
Also change the name of the check_command. You may confuse the check_nrpe executable command (runs in terminal) with your check_command of the same name (which is unknown to the terminal shell).
Here's a working example much like what you are doing.
On the main nagios machine:
define command {
    command_name check_nrpe_cart
    command_line /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -p 6565 -c $ARG1$
}
define service{
    use clientcritical
    host_name cartbox
    service_description email
    normal_check_interval 15
    check_command check_nrpe_cart!check_postfix
}
On cartbox in /etc/nagios/nrpe_local.cfg
command[check_postfix]=/usr/lib/nagios/plugins/check_procs -w 1:1 -c 1:1 -C master
You should read the PDF at the following link; it covers most of the likely causes
of NRPE/Nagios communication problems.
http://assets.nagios.com/downloads/nagiosxi/docs/NRPE_Troubleshooting_and_Common_Solutions.pdf