Nagios: CRITICAL - Socket timeout after 10 seconds

I've been running nagios for about two years, but recently this problem started appearing with one of my services.
I'm getting
CRITICAL - Socket timeout after 10 seconds
for a check_http -H my.host.com -f follow -u /abc/def check, which used to work fine. No other services are reporting this problem. The remote site is up and healthy, and I can do a wget http://my.host.com/abc/def from the Nagios server, which downloads the response just fine. A check_http -H my.host.com -f follow also works fine; it's only when I use the -u argument that things break. I also tried passing it a different user agent string: no difference. I tried increasing the timeout: no luck. I tried with -v, but all I get is:
GET /abc/def HTTP/1.0
User-Agent: check_http/v1861 (nagios-plugins 1.4.11)
Connection: close
Host: my.host.com
CRITICAL - Socket timeout after 10 seconds
... which does not tell me what's going wrong.
Any ideas how I could resolve this?
Thanks!

Try using the -N option of check_http.
I ran into similar problems, and in my case the web server didn't terminate the connection after sending the response (https was working, http wasn't). check_http tries to read from the open socket until the server closes the connection. If that doesn't happen then the timeout occurs.
The -N option tells check_http to receive only the header, but not the content of the page / document.
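For the check in the question, that would look like this (same arguments as before, with -N added):
check_http -H my.host.com -f follow -u /abc/def -N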

I tracked my issue down to the security providers configured in the most recent version of openSUSE.
From summaries of other web pages, it appears to be a problem with an attempt to use the TLSv2 protocol, which does not appear to work correctly, or is missing something in the default configuration to allow it to work.
To overcome the problem I commented out the security provider in question from the JRE security configuration file.
#security.provider.10=sun.security.pkcs11.SunPKCS11
The security.provider. value may be different in your configuration, but essentially the SunPKCS11 provider is at issue.
This configuration is normally found in
$JAVA_HOME/lib/security/java.security
of the JRE that you are using.
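If you want to script the change, a one-liner along these lines would do it (a sketch; adjust the provider number to match your java.security):
sudo sed -i 's|^security.provider.10=sun.security.pkcs11.SunPKCS11|#&|' "$JAVA_HOME/lib/security/java.security"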

Fixed with this URL in nrpe.cfg (on Debian 6.0 Squeeze using nagios-nrpe-server):
command[check_http]=/usr/lib/nagios/plugins/check_http -H localhost -p 8080 -N -u /login?from=%2F

For whoever is interested: I stumbled into this problem too, and it ended up being caused by mod_itk on the web server.
A patch is available, although it does not seem to be included in the current CentOS or Debian packages:
https://lists.err.no/pipermail/mpm-itk/2015-September/000925.html

In my case, the /etc/postfix/main.cf file was not configured correctly. My mail relay host was not defined, and the relay restrictions were too strict. I had to add:
relayhost = mailrelay.ext.example.com
smtpd_relay_restrictions = permit_mynetworks permit_sasl_authenticated defer_unauth_destination
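After editing main.cf, reload Postfix so the new settings take effect:
sudo postfix reload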

Related

Failed to load resource: net::ERR_CONNECTION_TIMED_OUT on remote but works fine on localhost

I have a React with ASP.NET Core website. It worked fine on localhost, but when published on the remote IIS server, the timeout error occurs.
The front end (React client) and back end (ASP.NET Core Web API) work independently.
Before uploading, I changed the following in Program.cs in the Web API:
UseUrls("https://localhost:4000")
to UseUrls("https://www.virtualcollege.pk:4000")
I also changed the front-end base URL similarly.
Moreover, the connection strings in appsettings.json are correct for both databases.
I added migrations and updated the databases successfully.
The website is live, but the timeout error occurs:
virtualcollege.pk
I also tried the URL with "https://myip-address:4000".
Thanks in advance for any help.
If I remove the port number from the URL and publish to a local folder, then upload it to the remote server, webapi.exe runs on the local machine.
You have to open incoming requests on port 4000. Try one of the methods below.
Windows Server
Open port 4000 in Windows Firewall.
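For example, using the built-in netsh command (the rule name is arbitrary):
netsh advfirewall firewall add rule name="Open port 4000" dir=in action=allow protocol=TCP localport=4000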
Ubuntu/Debian
sudo ufw allow 4000/tcp
sudo ufw status   # check status
CentOS
First, you should disable SELinux by editing the file /etc/sysconfig/selinux so it looks like this:
SELINUX=disabled
SELINUXTYPE=targeted
Save file and restart system.
Then you can add the new rule to iptables:
iptables -A INPUT -m state --state NEW -p tcp --dport 4000 -j ACCEPT
and restart iptables with /etc/init.d/iptables restart
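Note that /etc/init.d/iptables restart reloads the rule set saved in /etc/sysconfig/iptables, so save the new rule first or it will be lost:
service iptables save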

HTTP Server: Connection closed by foreign host

I'm attempting to get an HTTP web server I found online running after downloading its source files (source: Webserver; the files are located at the bottom of the web page).
I attempted to compile it using its Makefile, but there were some errors; I just needed to #include some extra headers. However, once I got it compiled and running, I tested it using telnet:
telnet localhost <port number>
I get the following:
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
Then after 5 seconds or so it displays the following:
Connection closed by foreign host.
I'm not sure if the person who wrote it still maintains it, so I figured I'd ask here. Any ideas as to why the connection closes?
I'm running this from a Windows machine connected to a Unix server, and as the program's site states, it should run correctly on Unix machines.
In the file reqhead.c, in the function Get_Request(), there is a timed call to select().
You can change the timeout value (currently 5 seconds), or replace the timeout parameter with NULL (although replacing it with NULL means that once a connection is established, the code would wait forever).
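Conceptually, the pattern looks something like this (a sketch only; the actual variable names in reqhead.c will differ):
#include <sys/select.h>

/* Wait up to 5 seconds for request data on the accepted client socket. */
static int wait_for_request(int sockfd)
{
    fd_set readfds;
    struct timeval tv;

    FD_ZERO(&readfds);
    FD_SET(sockfd, &readfds);

    tv.tv_sec = 5;   /* raise this to extend the timeout */
    tv.tv_usec = 0;

    /* returns 0 on timeout, >0 when readable, -1 on error;
       passing NULL instead of &tv would block indefinitely */
    return select(sockfd + 1, &readfds, NULL, NULL, &tv);
}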
First, check the Ubuntu system log with this command:
sudo gedit /var/log/syslog
If you see the error "execv( /usr/sbin/tcpd ) failed: No such file or directory"
then run this command:
sudo apt-get install tcpd
That should solve your problem (if not, you will need to search for your system's error on Google).

[Error]: Error - Could not complete SSL handshake. ON LOCALHOST

All of the questions about this error show people running check_nrpe -H [some_remote_ip], in contrast to an error-free run on localhost.
I, however, can't even get this to run on localhost:
$> ./check_nrpe -H localhost
CHECK_NRPE: Error - Could not complete SSL handshake.
The service does appear to be up and running:
$> sudo netstat -apn | grep :5666
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN 5847/nrpe
tcp6 0 0 :::5666 :::* LISTEN 10216/nrpe
And the daemon returns no errors
$> sudo service nagios-nrpe-server status
* nagios-nrpe is running
My nrpe.cfg file has allowed_hosts set correctly:
allowed_hosts=127.0.0.1,10.0.1.2,0.0.0.0
Contents of /var/log/syslog with debugging turned on:
Nov 1 22:54:44 <MYHOST> nrpe[11156]: Connection from ::1 port 6601
Nov 1 22:54:44 <MYHOST> nrpe[11156]: Host ::1 is not allowed to talk to us!
Nov 1 22:54:44 <MYHOST> nrpe[11156]: Connection from ::1 closed.
Does anyone have any idea what's going on? This seems almost nonsensical. Thanks!
Note that my example may be different from yours.
First, change to the folder containing your nrpe command and run:
./nrpe --version
The output from that command will look something like this:
NRPE - Nagios Remote Plugin Executor
Copyright (c) 1999-2008 Ethan Galstad (nagios@nagios.org)
Version: nrpe-3.0
Last Modified: 07-12-2016
License: GPL v2 with exemptions (-l for more info)
SSL/TLS Available, OpenSSL 0.9.6 or higher required
Notice that the last line tells you that SSL is indeed supported by this build of NRPE. If that line is not there, you'll have to install a version that was compiled with SSL support (which may mean compiling one for yourself, depending on where you got it). The docs for the source code are pretty clear on how this is done.
If you DO have the SSL line above, look at the required version on the line and check your system to be sure that at least that version has been installed. I used this command:
rpm -qa | grep openssl
And received output looking like this:
libopenssl1_0_0-32bit-1.0.1k-2.39.1.x86_64
openssl-1.0.1k-2.39.1.x86_64
Both openssl and libopenssl are required for NRPE's SSL support to function correctly. If these are not installed, I strongly recommend using your system's package installer (apt-get, yum, zypper, ...) to fetch and install them. If they are already installed but you still get errors, then you likely have a configuration issue in:
/etc/ssl/openssl.cnf
Fixing that is well outside the scope/space available here. I recommend upgrading both of these via a working online package repository - such packages always include a default configuration which should work fine with NRPE, assuming the version is equal to or higher than required.
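For example, on the openSUSE system shown above, the install would be (package names vary by distribution and version):
sudo zypper install openssl libopenssl1_0_0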
I think that check_nrpe is trying to use IPv6.
The IPv6 localhost IP is ::1, so adding it to the allowed_hosts= line in nrpe.cfg and restarting nrpe will tick this box for you.
Alternatively as another responder replied you can just add -4 to your check_nrpe command to force it to stick to IPv4.
I was having the same issue and it's only when I saw the ::1 in the question it dawned on me what was happening.
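For example, starting from the allowed_hosts line shown in the question:
allowed_hosts=127.0.0.1,::1,10.0.1.2
sudo service nagios-nrpe-server restart
Or, to force IPv4 from the client side:
./check_nrpe -4 -H localhost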
I am not sure if it is still relevant, but I had the same issue and discovered someone had changed the /etc/hosts.allow file, blocking the access. Somehow this results in the following errors:
Client: Connection refused by TCP wrapper
Server: Error: (nerrs = 0)(!log_opts) Could not complete SSL handshake with <Client IP> : rc=-1 SSL-error=5
Changing the /etc/hosts.allow file solved the issue.
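For reference, an /etc/hosts.allow entry along these lines would permit local NRPE connections again (the daemon name is usually nrpe, and IPv6 addresses are bracketed; see hosts_access(5) for the exact syntax on your system):
nrpe : 127.0.0.1 [::1]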

Google compute engine returned 399 internal server error

The question Google compute engine console return 399 error code already asks my question, but the solution suggested there does not work for me. Since that URL is a little old, I am starting a new thread.
I am trying to do a wget using:
wget https://console.developers.google.com/m/cloudstorage/b/m-lab/o/ndt/2012/05/23/20120523T000000Z-mlab1-ams01-ndt-0000.tgz
I see the error:
Resolving console.developers.google.com (console.developers.google.com)... 216.239.32.27
Connecting to console.developers.google.com (console.developers.google.com)|216.239.32.27|:443... connected.
HTTP request sent, awaiting response... 399 Internal Server Error
2014-08-26 20:02:18 ERROR 399: Internal Server Error.
I am new to Linux commands so wanted to know if am missing something obvious.
The address works when I use the Chrome downloader, but it fails with wget for me as well.
I have never seen this behaviour before.
You can also use cURL to download files. I used the -v switch and got a DNS error (no idea why):
curl -v http://console.developers.googlO.com/m/cloudstorage/b/m-lab/o/ndt/2012/05/23/20120523T000000Z-mlab1-ams01-ndt-0000.tgz
You cannot download these files with traditional tools; you have to use the gsutil utility provided by Google, which also makes automation possible.
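For example, a sketch using gsutil cp (the bucket and object path are taken from the URL in the question):
gsutil cp gs://m-lab/ndt/2012/05/23/20120523T000000Z-mlab1-ams01-ndt-0000.tgz .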
You need to use the following URI pattern:
http://storage.googleapis.com/<bucket>/<object>
In this case, you can download that file using the command:
wget http://storage.googleapis.com/m-lab/ndt/2012/05/23/20120523T000000Z-mlab1-ams01-ndt-0000.tgz

MacPorts Apache2 Stopped Launching on Boot

Something that I've noticed recently on two different machines is that Apache2 installed via MacPorts seems to have stopped launching when I boot up. The honest truth is that I can't swear it did so before, but it's something I think I'd notice because installing the LaunchDaemon is part of my install process. In fact, if I try to reload the LaunchDaemon, it fails:
$ sudo launchctl load -w /Library/LaunchDaemons/org.macports.apache2.plist
org.macports.apache2: Already loaded
Unless I start Apache manually (using sudo apachectl restart), grep'ing for either "apache2" or "httpd" in my process list only produces this:
$ sudo ps -ef | egrep "apache2|httpd"
0 52 1 0 0:00.06 ?? 0:00.08 /opt/local/bin/daemondo --label=apache2 --start-cmd /opt/local/etc/LaunchDaemons/org.macports.apache2/apache2.wrapper start ; --stop-cmd /opt/local/etc/LaunchDaemons/org.macports.apache2/apache2.wrapper stop ; --restart-cmd /opt/local/etc/LaunchDaemons/org.macports.apache2/apache2.wrapper restart ; --pid=none
1410639199 6960 6792 0 0:00.00 ttys001 0:00.00 egrep apache2|httpd
Looks like the daemon itself is in place, but no executable. As far as I know/can tell, the relevant executables (httpd and apachectl) are executable by everyone.
Has anyone else noticed this? Any ideas?
UPDATE
As requested below, I did execute launchctl list. The list is long and I'm not sure how to snip it, but suffice to say that no org.macports.* items are listed. That in itself is interesting because my MySQL daemon is loaded the same way. It works, but also doesn't appear in the list. Let me know if the entire output is really needed.
UPDATE
I assumed that I had executed launchctl list under sudo, but prompted by mipadi's comment below, I tried again, making sure that I did, and found that I had assumed incorrectly. When executed under sudo, the MacPorts items appear:
51 - org.macports.mysql5
52 - org.macports.apache2
I'm not sure whether that will help, but it's a little more info nonetheless.
UPDATE
I've asked a different, but related, question at LaunchDaemons and Environment Variables. I'll update both questions as I learn more.
UPDATE
Today, based on mailing list input, I tried using a wildcard home directory. Academically, it's a little more inclusive than I'd like, but the practical reality is that I'm the only one using this computer; certainly the only one who'd have Apache config files lying around.
Include "/Users/*/Dropbox/Application Support/apache/conf.d.osx/*.conf"
Include "/Users/*/Library/Application Support/MacPorts/apache/conf.d/*.conf"
Unfortunately...
httpd: Syntax error on line 512 of /opt/local/apache2/conf/httpd.conf: Wildcard patterns not allowed in Include /Users/*/Dropbox/Application Support/apache/conf.d.osx/*.conf
I found my answer to this problem here:
https://trac.macports.org/ticket/36101
"I apparently fixed this when changing my local dnsmasq config. In /etc/hosts I added my servername (gala) to the loopback entry:
127.0.0.1 localhost gala
and then I changed ServerName in /opt/local/apache2/conf/httpd.conf to match:
ServerName gala
Apache now starts at boot for me."
Since I now know why Apache has stopped loading on startup, I'm going to articulate that answer and mark this question as answered. The reason Apache has stopped launching on boot is that I'm trying to share an httpd.conf file across systems. The config file needs to Include files from directories that exist within my home directory. Since the home directory is different on each machine, I was trying to reference the ${HOME} environment variable.
This works fine when manually starting after the machine is booted, but fails on startup because the environment variable isn't yet set. As mentioned above, see this question for more information.
Rob:
Had the same problem: "sudo launchctl load -w ..." started Apache2 while I was logged in, but did not work during startup (the "-w" should have taken care of that). Also, as you noticed, the daemon seems to be registered with launchctl: it shows up with "sudo launchctl list", and another "sudo launchctl load ..." results in the error message.
I played with "sudo port load apache2" and "sudo port unload apache2", but could not get httpd running on reboot.
In the end, I got rid of the MacPorts startup item: "sudo port unload apache2", checked with "sudo launchctl list" that org.macports.apache2 is no longer registered for startup.
Afterwards, I followed the steps on http://diymacserver.com > Docs > Tiger > Starting Apache. I only had to adapt the path from /usr/local/... to /opt/local/...
Now the MacPorts Apache2 is starting fine with every reboot.
Good luck, Klaus
I found that my MacPorts apache2 was not starting on boot because of an “error” in my httpd.conf.
I was using
Listen 127.0.0.1:80
Listen 192.168.2.1:80
Listen 123.123.123.123:80 # Example IP, not the one I was really using
And in Console.app I was seeing
4/8/12 4:59:06.208 PM org.macports.apache2: (49)Can't assign requested address: make_sock: could not bind to address 192.168.2.1:80
4/8/12 4:59:06.208 PM org.macports.apache2: no listening sockets available, shutting down
4/8/12 4:59:06.208 PM org.macports.apache2: Unable to open logs
I tried adjusting permissions on all the log folders (despite the fact that logs were being written just fine when I manually started apache2) and that didn't help.
Even though the Apache Documentation for Listen clearly states
Multiple Listen directives may be used to specify a number of addresses and ports to listen to. The server will respond to requests from any of the listed addresses and ports.
I decided to try switching back to just using
Listen 80
And after doing so apache2 is starting on boot with no errors or warnings.
If you're using Subversion with Apache, you may find that Apache is not starting because the mod_dav_svn.so file has moved to /opt/local/libexec. You'll need to adjust your Apache startup files to account for the new location of this file.
In newer versions of MacPorts you can run sudo port load apache2 to instruct MacPorts to take care of the launchctl setup and automatically start the process. To stop the process, run sudo port unload apache2.
After loading check /opt/local/apache2/logs/error_log for errors, including configuration issues.
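A typical session would look like this:
sudo port load apache2      # set up launchctl and start Apache now
tail /opt/local/apache2/logs/error_log
sudo port unload apache2    # stop Apache and remove it from launchctl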
In addition to my previous answer, I have also found that sometimes Apache fails to start because something else in the system is not yet ready.
On one OS X Server machine I also use DNS to create an “internal only” DNS name for the machine, and that name is used in my Apache configuration. Sometimes, when Apache tries to start, the DNS server is not yet ready, and Apache fails to load because the hostname isn't valid.
I have also seen this on other non-Server systems without local DNS, where something else required by Apache must not have been ready yet.
One thing that has worked is to edit the apache2.wrapper located at /opt/local/etc/LaunchDaemons/org.macports.apache2/apache2.wrapper that MacPorts’ daemondo uses to start up Apache.
Edit the Start() function to add a sleep command to wait a bit before launching Apache.
Original (Lines 14-17 on my machine)
Start()
{
[ -x /opt/local/apache2/bin/apachectl ] && /opt/local/apache2/bin/apachectl start > /dev/null
}
With wait time added
Start()
{
[ -x /opt/local/apache2/bin/apachectl ] && sleep 10 && /opt/local/apache2/bin/apachectl start > /dev/null
}
