I'm trying to execute the following command:
/usr/lib/nagios/plugins/check_nrpe -H xxxxxxx -c check_disk -a 60 80 /dev/sda1
but I got the following message:
CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages.
When I consult the log in the remote host, I found that:
anonymous rsyslogd-2359: action 'action 17' resumed (module 'builtin:ompipe') [try http://www.rsyslog.com/e/2359 ]
The command /usr/lib/nagios/plugins/check_disk -a 60 80 /dev/sda1 works fine on the remote host.
A couple of things to check:
On the machine where you have NRPE installed, check the nrpe.cfg file. Make sure the allowed_hosts= line is uncommented and that its value includes the IP address of the Nagios server that's trying to connect to it. Make sure the dont_blame_nrpe line is uncommented and its value is set to 1, i.e. dont_blame_nrpe=1. Without that setting, NRPE rejects requests that pass command arguments with -a, as your command does.
Restart nrpe.
If you have firewall rules enabled on this machine, make sure there's a rule to allow connections to tcp port 5666.
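If you'd rather script the nrpe.cfg checks above than eyeball the file, here is a minimal Python sketch. The function names are hypothetical, and it assumes the file consists of simple key=value lines with allowed_hosts given as a comma-separated list of IP addresses or CIDR ranges. It only looks at the two settings mentioned above.

```python
import ipaddress


def _matches(entry, ip):
    """True if ip equals entry or falls inside entry when it is a CIDR range."""
    try:
        return ipaddress.ip_address(ip) in ipaddress.ip_network(entry, strict=False)
    except ValueError:
        # Hostname entries are only compared as plain strings in this sketch.
        return entry == ip


def parse_nrpe_cfg(text):
    """Return a dict of key -> value for uncommented key=value lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_nrpe_cfg(text, nagios_ip):
    """Return a list of problems with allowed_hosts and dont_blame_nrpe."""
    settings = parse_nrpe_cfg(text)
    problems = []
    allowed = [h.strip() for h in settings.get("allowed_hosts", "").split(",") if h.strip()]
    if not any(_matches(entry, nagios_ip) for entry in allowed):
        problems.append("allowed_hosts does not include {0}".format(nagios_ip))
    if settings.get("dont_blame_nrpe") != "1":
        problems.append("dont_blame_nrpe is not set to 1")
    return problems
```

You would call it with the contents of your nrpe.cfg and the IP address of your Nagios server. An empty list means both settings look right.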
First I got the error "connect to address 10.0.0.102 and port 12489: Connection refused". Following advice from forums, I edited the nsclient.ini file and adjusted the allowed hosts and the password.
After that, the Status Information on the server side changed. Now I get this error in the Nagios admin panel:
NSClient - ERROR: Invalid password.
However, I checked the password on the Nagios XI server and it's the same as in the nsclient.ini file on the client side.
I used this command to check:
/usr/local/nagios/libexec/check_nt -H 'hostname' -s nagpasswd -p 12489 -v CPULOAD -w 80 -c 90 -l 5,80,90,10,80,9 NSClient
What might be the issue? Any help would be perfect.
After changing the password, you need to restart your Nagios service.
I installed Nagios to my local server, and am monitoring a CentOS server.
All the plugins (Nagios plugins and NRPE) are installed too, and they work locally, but not via my server. Generic services are monitored fine, but the others (local services) aren't working. Status Information shows: CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
I've installed nrpe in my remote host and added commands in nrpe.cfg.
In my nagios server, I defined those commands in my server's configuration file.
When I run those commands on my CentOS server, they work well.
For example, when I type:
./check_procs -w 250 -c 300
prompt shows:
PROCS AVERTISSEMENT: 284 processus | procs=284;250;300;0;
Or the command: ./check_nrpe -H localhost
It shows: NRPE v3.2.1
Everything seems to be working, but if I try ./check_nrpe -H monitoredserver, it doesn't work.
Also, in nagios web interface, every local service in monitored server shows: CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Please check for these common mistakes:
Your NRPE daemon is not started on remote server
Run service nrpe status on remote server and verify NRPE state.
Test your network connection
Run telnet monitoredserver 5666 from the Nagios server to test your connection. If this command fails, a firewall between these servers is probably blocking the connection.
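If telnet isn't installed on the Nagios server, the same test can be sketched in Python. The function name probe_tcp is hypothetical. As a rule of thumb (not a guarantee), "refused" usually means the host is reachable but nothing is listening on the port, while "timeout" often points to a firewall silently dropping packets, which matches the "Socket timeout" message above.

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Try one TCP connect and return 'open', 'refused', 'timeout' or 'error: ...'."""
    try:
        # create_connection resolves the name and applies the timeout to the connect.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as ex:
        # Name resolution failures, unreachable networks, and so on.
        return "error: {0}".format(ex)
```

For example, probe_tcp("monitoredserver", 5666) run from the Nagios server plays the role of the telnet test.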
I just finished installing Nagios 3 in Ubuntu server and I'm not sure how I can add a third party plugin into it.
The plugin is available here.
Thanks in advance for your help
You didn't mention any information about the server that you want to monitor with Nagios.
I'm going to assume it's an Ubuntu Linux server and it's not the same server as the machine you installed Nagios on.
On the server to be monitored:
Ensure that NRPE (Nagios Remote Plugin Executor) is installed. Here's a link to instructions for installing NRPE on the Ubuntu operating system.
http://tecadmin.net/install-nrpe-on-ubuntu/
After you install NRPE on the server to be monitored, it's very important that you edit the nrpe.cfg file (most likely found at /etc/nagios/nrpe.cfg, but this can differ based on your installation method).
You need to modify the allowed_hosts configuration line to include the IP address of your Nagios server. If you don't, NRPE will refuse connection attempts from Nagios and you won't be able to run your Nagios plugin or report results back to Nagios.
Be sure to restart NRPE after you've modified nrpe.cfg.
Next you'll need to download the Nagios plugin to the server being monitored. For example:
wget --directory-prefix=/usr/lib/nagios/plugins/ https://github.com/thehunmonkgroup/nagios-plugin-file-ages-in-dirs/archive/v1.1.tar.gz
cd to your nagios plugins directory and extract the tar-gzipped archive you just downloaded:
cd /usr/lib/nagios/plugins/
tar zxvf v1.1.tar.gz
ls -al /usr/lib/nagios/plugins/nagios-plugin-file-ages-in-dirs-1.1/check_file_ages_in_dirs
Be sure to give the nagios plugin script execute permissions:
chmod a+x /usr/lib/nagios/plugins/nagios-plugin-file-ages-in-dirs-1.1/check_file_ages_in_dirs
With the nagios plugin now residing on your server to be monitored, you will need to define some command definitions on that same server.
First you need to find the path that NRPE will search for new command definitions that you manually add to the system.
To do this, grep your nrpe.cfg file for the term "include_dir".
For example:
grep include_dir /etc/nagios/nrpe.cfg
include_dir=/etc/nrpe.d/
If no result for "include_dir" is returned from your grep, add the above "include_dir" configuration to your nrpe.cfg file. Ensure that the /etc/nrpe.d/ folder is created.
Create a new file in your include_dir named check_file_ages_in_dirs.cfg. Add to check_file_ages_in_dirs.cfg a command definition for check_file_ages_in_dirs pointing to the path of your Nagios plugin and including the arguments necessary to execute it.
For example:
echo "command[check_file_ages_in_dirs]=/usr/lib/nagios/plugins/nagios-plugin-file-ages-in-dirs-1.1/check_file_ages_in_dirs -d \"/tmp\" -w 24 -c 48" >> /etc/nrpe.d/check_file_ages_in_dirs.cfg
cat /etc/nrpe.d/check_file_ages_in_dirs.cfg
command[check_file_ages_in_dirs]=/usr/lib/nagios/plugins/nagios-plugin-file-ages-in-dirs-1.1/check_file_ages_in_dirs -d "/tmp" -w 24 -c 48
For the above, I hard-coded the warning and critical thresholds as 24 hours and 48 hours. I've also hard-coded the directory to check as "/tmp".
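To double-check what NRPE will actually run for a given command name, you can parse the command[...] lines yourself. This is a rough Python sketch with hypothetical names; it assumes one definition per line in the command[name]=command line form shown above and is not NRPE's own parser.

```python
import re
import shlex

# Matches lines such as: command[check_foo]=/path/to/plugin -w 1 -c 2
COMMAND_RE = re.compile(r"^command\[(?P<name>[^\]]+)\]\s*=\s*(?P<line>.+)$")


def parse_command_definitions(text):
    """Return a dict mapping each command name to its argument list."""
    commands = {}
    for raw in text.splitlines():
        match = COMMAND_RE.match(raw.strip())
        if match:
            # shlex.split honours the quoting used in the definition.
            commands[match.group("name")] = shlex.split(match.group("line"))
    return commands
```

Feeding it the contents of check_file_ages_in_dirs.cfg should give one entry whose first element is the plugin path, followed by the -d, -w and -c arguments.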
Attempt to execute the nagios plugin script locally to confirm it's working correctly:
/usr/lib/nagios/plugins/nagios-plugin-file-ages-in-dirs-1.1/check_file_ages_in_dirs -d "/tmp" -w 24 -c 48
OK: 1 dir(s) -- /tmp: 1 files
Ensure the nrpe user has read permissions on your check_file_ages_in_dirs.cfg file:
chmod a+r /etc/nrpe.d/check_file_ages_in_dirs.cfg
Restart your nrpe service, as per the instructions in http://tecadmin.net/install-nrpe-on-ubuntu/
You also need to ensure that if you have any firewall rules in place, they allow tcp traffic to port 5666.
On your Nagios server:
From your Nagios server, you'll need to manually run check_nrpe against your host to be monitored so as to verify correct functioning of the Nagios plugin and correct NRPE configuration.
Find the location of your check_nrpe file. On my installation, it's located at /usr/local/nagios/libexec/check_nrpe, but this could be different for your installation.
find / -name "check_nrpe" -type f
/usr/local/nagios/libexec/check_nrpe
If you don't have check_nrpe, you'll need to install it on your Nagios server.
apt-get install nagios-nrpe-plugin
First execute check_nrpe against your server to be monitored with no remote command arguments. This is just to confirm that NRPE is running on your remote server and it's correctly configured to allow connections from your Nagios server.
Note: For this example I'll pretend the IP address of the host I want to monitor is 10.0.0.1. Replace this with the IP address of the host you want to monitor.
/usr/local/nagios/libexec/check_nrpe -H 10.0.0.1
NRPE v2.14
The check_nrpe command above should return the version number of the NRPE agent running on the remote host if it's configured correctly.
Next attempt to manually invoke the Nagios plugin via NRPE:
/usr/local/nagios/libexec/check_nrpe -H 10.0.0.1 -c check_file_ages_in_dirs
OK: 1 dir(s) -- /tmp: 1 files
If you get output similar to the above, then it's time to move on to defining hosts, services, and commands on your Nagios server.
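If you want to script this manual test, keep in mind that Nagios judges a check by the plugin's exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), not by the text it prints. Here is a small Python sketch; run_check is a hypothetical helper, and the check_nrpe path is the one from my installation.

```python
import subprocess

# Standard Nagios plugin exit codes.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(argv, timeout=60):
    """Run a plugin command and return (state, first line of its output)."""
    try:
        done = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as ex:
        # Missing binary, permission problem or a hung plugin.
        return "UNKNOWN", str(ex)
    state = NAGIOS_STATES.get(done.returncode, "UNKNOWN")
    lines = done.stdout.splitlines()
    return state, (lines[0] if lines else "")
```

For example: run_check(["/usr/local/nagios/libexec/check_nrpe", "-H", "10.0.0.1", "-c", "check_file_ages_in_dirs"]).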
It would be cleaner to define separate configuration files for host, service, and command definitions. But that's outside of the scope of this post.
For now, we'll define these things in the default Nagios configuration file (nagios.cfg).
First locate your nagios.cfg file:
find / -name "nagios.cfg" -type f
/usr/local/nagios/etc/nagios.cfg
Edit the nagios.cfg file.
Add a host definition for the server you wish to monitor:
define host {
host_name Remote-Host
alias Remote-Host
address 10.0.0.1
use linux-server
contact_groups admins
notification_interval 0
notification_period 24x7
notifications_enabled 1
register 1
}
Add a command definition for the remote execution of check_file_ages_in_dirs:
define command {
command_name check_file_ages_in_dirs
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_file_ages_in_dirs
register 1
}
Add a service definition that will reference the check_file_ages_in_dirs command:
define service {
service_description check_file_ages_in_dirs
use generic-service
check_command check_file_ages_in_dirs
host_name Remote-Host
contact_groups admins
notification_interval 0
notification_period 24x7
notifications_enabled 1
flap_detection_enabled 1
register 1
}
Save and exit your nagios.cfg file.
Validate your Nagios configuration file:
nagios -v /usr/local/nagios/etc/nagios.cfg
If no errors are reported, restart your Nagios service.
Check the Nagios Web UI, and you should see your check_file_ages_in_dirs service monitoring your remote host.
Given an IP Address and port number, is it possible to check if the machine with that IP address has Postgresql listening on the specified port? If so, how?
I just want to obtain a boolean value of whether Postgresql is listening on the specified port of the specified machine.
You can use, for example, nmap tool:
=$ sudo nmap -v -p 5930 127.0.0.1
Starting Nmap 6.00 ( http://nmap.org ) at 2013-06-25 19:28 CEST
Initiating SYN Stealth Scan at 19:28
Scanning localhost (127.0.0.1) [1 port]
Discovered open port 5930/tcp on 127.0.0.1
Completed SYN Stealth Scan at 19:28, 0.03s elapsed (1 total ports)
Nmap scan report for localhost (127.0.0.1)
Host is up (0.000045s latency).
PORT STATE SERVICE
5930/tcp open unknown
Read data files from: /usr/bin/../share/nmap
Nmap done: 1 IP address (1 host up) scanned in 0.08 seconds
Raw packets sent: 1 (44B) | Rcvd: 2 (88B)
Alternatively you can just "SELECT 1" with psql, and check output:
=$ psql -h 127.0.0.1 -p 5930 -c "select 1"
?column?
----------
1
(1 row)
=$ psql -h 127.0.0.1 -p 5940 -c "select 1"
psql: could not connect to server: Connection refused
Is the server running on host "127.0.0.1" and accepting
TCP/IP connections on port 5940?
I think you need to define what you're trying to achieve better. Do you just want to know if anything is listening on a certain port? If PostgreSQL is listening on a given port? If PostgreSQL is running and actually accepting connections? If you can connect to PostgreSQL, authenticate successfully and issue queries?
One option is to invoke psql to connect to it and check the result code. Do not attempt to parse the output text, since that's subject to translation into different languages.
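As a rough sketch of the psql approach in Python (the helper names are hypothetical): build the psql command line, run it, and look only at the exit status. It assumes psql is on the PATH and that authentication works without a prompt, for example via a .pgpass file, because -w tells psql never to ask for a password.

```python
import subprocess


def psql_argv(host, port, dbname="postgres"):
    """Build a psql command line that runs SELECT 1 and never prompts."""
    # -w: no password prompt, -t and -A: bare output, -c: run a single statement
    return ["psql", "-h", host, "-p", str(port), "-d", dbname,
            "-w", "-tA", "-c", "SELECT 1"]


def command_succeeds(argv, timeout=10):
    """Return True only if the command ran and exited with status 0."""
    try:
        done = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return done.returncode == 0
```

command_succeeds(psql_argv("127.0.0.1", 5930)) gives you the boolean you asked for, but a False result doesn't tell you whether the cause was the network, authentication or the query itself.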
Better, use the client library for the language of your choice: psycopg2 for Python, PgJDBC for Java, the Pg gem for Ruby, DBD::Pg for Perl, Npgsql for C#, etc. This is the approach I'd recommend. The SQLSTATE or exception details from any connection error will tell you more about why the connection failed, so you'll be able to tell the difference between the server not listening, an authentication failure, and so on. For example, in Python:
import psycopg2

try:
    conn = psycopg2.connect("host=localhost dbname=postgres")
    conn.close()
except psycopg2.OperationalError as ex:
    print("Connection failed: {0}".format(ex))
There are exception details in ex.pgcode (the SQLSTATE) to tell you more about errors that are generated server-side, like authentication failures; it'll be empty for client-side errors.
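To illustrate how the SQLSTATE can be turned into something readable, here's a small sketch. describe_failure is a hypothetical helper that takes the pgcode value as a string or None. The table covers only a handful of codes from the PostgreSQL error code list, and whether your driver actually fills in pgcode for a particular connection failure is something to verify yourself.

```python
# A few SQLSTATE codes that commonly show up when connecting.
SQLSTATE_HINTS = {
    "28P01": "password authentication failed",
    "28000": "authorization rejected (for example by pg_hba.conf)",
    "3D000": "database does not exist",
    "53300": "too many connections",
    "57P03": "server is starting up or shutting down",
}


def describe_failure(pgcode):
    """Turn an SQLSTATE (or None) into a short human-readable reason."""
    if not pgcode:
        # No SQLSTATE means the error never came from the server.
        return "client-side failure (no SQLSTATE), e.g. nothing listening or a network error"
    return SQLSTATE_HINTS.get(pgcode, "server-side error with SQLSTATE {0}".format(pgcode))
```

In the except block above you would call describe_failure(ex.pgcode).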
If you just want to see if something is listening on a given IP and TCP port, you can use netcat (*nix only), or a simple script in the language of your choice that creates a socket and does a connect() then closes the socket if it gets a successful response. For example, the following trivial Python script:
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s.connect(('localhost', 5432))
    s.close()
except socket.error as ex:
    print("Connection failed with errno {0}: {1}".format(ex.errno, ex.strerror))
The same approach applies in any programming language, just the details of the socket library and error handling vary.
For some purposes it can also be useful to use the netstat tool to passively list which processes are listening on which network sockets. The built-in netstat on Windows is pretty brain-dead so you have to do more parsing of the output than with netstat for other platforms, but it'll still do the job. The presence of a socket in netstat doesn't mean that connecting to it will succeed, though; if the process has failed in some way that leaves it broken but still running (stuck in an infinite loop, blocked by a debugger, SIGSTOPed, etc) then it won't respond to an actual connection attempt.
In brief
In detail
The fastest way is to use netcat (nc) with its timeout option.
An exit status of 0 means Postgres is working; 1 means it is not.
echo 'QUIT' | nc -w SECONDS YOUR_HOST PORT; echo $?
# eg
echo 'QUIT' | nc -w 1 localhost 5432; echo $?
Another fast way that works for me is to use telnet:
echo -e '\x1dclose\x0d' | telnet YOUR_HOST PORT
# eg
echo -e '\x1dclose\x0d' | telnet localhost 5432
I have started the pgpool using the command
sudo pgpool -n &
it started giving the following message on the terminal:
2012-05-04 10:54:29 LOG: pid 4109: pgpool-II successfully started. version 2.3.2.1 (tomiteboshi)
But when I try to run the following command:
createdb -p 9999 bench_replication
I get the following error message:
createdb: could not connect to database postgres: could not connect to server: No such file or directory.
Is the server running locally and accepting connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.9999"?
When I change the port from 9999 to 5432, the database bench_replication is created on the local node only, not on the slave nodes. But the tutorial says to specify port 9999 in this command in order to create the database bench_replication on all slave nodes through pgpool.
To confirm whether pgpool was really running, I stopped it and got the following output:
2012-05-04 10:58:50 LOG: pid 4109: received smart shutdown request
stop request sent to pgpool. waiting for termination...done.
[1]+ Done sudo -A pgpool -n
which confirms that pgpool was actually running. What am I doing wrong? I have changed all my pgpool configuration files as described in the standard tutorials on the net.
Try this command:
createdb -p 9999 -h 127.0.0.1 bench_replication
By default, PostgreSQL client tools try to connect through the Unix domain socket. Passing -h 127.0.0.1 forces a TCP connection instead.
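To see which socket file the client is looking for, you can rebuild the path from the error message. In this small Python sketch the function names are hypothetical and the default directory is simply the one from the error message above. If the file is missing, pgpool may be creating its socket in another directory (see socket_dir in pgpool.conf), which would explain why only the TCP connection works.

```python
import os


def unix_socket_path(port, socket_dir="/var/run/postgresql"):
    """Build the socket file name the client looks for: <dir>/.s.PGSQL.<port>."""
    return "{0}/.s.PGSQL.{1}".format(socket_dir.rstrip("/"), port)


def socket_exists(port, socket_dir="/var/run/postgresql"):
    """Return True if that socket file is present on this machine."""
    return os.path.exists(unix_socket_path(port, socket_dir))
```

socket_exists(9999) checks the default directory, and socket_exists(9999, "/tmp") checks /tmp.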
Late response but useful for future generations:
When you run
createdb -p 9999 bench_replication
under root, this generates the following error in log:
no pg_hba.conf entry for host "12.34.56.78", user "root", database
"postgres", SSL off
This means you should explicitly specify the username, like this:
createdb -U postgres -p 9999 bench_replication
Then you will get another error:
no pg_hba.conf entry for host "12.34.56.78", user "postgres", database
"postgres", SSL off
So you were blocked by the second node at the HBA level.
In this case, you should either allow access from the first node on the second node (in pg_hba.conf):
host all postgres 12.34.56.77/32 trust
or you should set a password and enter it when prompted (-W forces a password prompt; it does not accept the password as an argument):
createdb -U postgres -p 9999 -W bench_replication
If this is not clear enough, just check your logs.
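To make the HBA matching above concrete, here is a deliberately simplified Python sketch of how a single "host" line is compared against a connection attempt. The function names are hypothetical. It assumes the address is written in CIDR form and ignores special keywords such as sameuser or replication, group names, host names and the separate netmask column, so treat it as an illustration rather than a reimplementation of PostgreSQL's rules.

```python
import ipaddress


def parse_hba_host_line(line):
    """Parse 'host database user address method' into its fields."""
    fields = line.split()
    if len(fields) < 5 or fields[0] != "host":
        raise ValueError("not a simple 'host' line: {0!r}".format(line))
    _, database, user, address, method = fields[:5]
    return database, user, ipaddress.ip_network(address, strict=False), method


def hba_line_allows(line, database, user, client_ip):
    """Return True if the line matches this database, user and client address."""
    hba_db, hba_user, network, _method = parse_hba_host_line(line)
    db_ok = hba_db == "all" or database in hba_db.split(",")
    user_ok = hba_user == "all" or user in hba_user.split(",")
    return db_ok and user_ok and ipaddress.ip_address(client_ip) in network
```

With a line that allows user postgres from 12.34.56.77, a connection as root from that address doesn't match, and neither does a connection as postgres from 12.34.56.78. Those are the two failures shown in the logs above.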