Nagios: Return code of 7 is out of bounds

Services are up and running on the remote nodes. CLI execution returns OK, but in the UI it returns CRITICAL with Status Information: 'Return code of 7 is out of bounds'.
nagios-xxxxxxxx:~# /usr/lib/nagios/plugins/check_tcp -H hostname -p <port> -w 5 -c 10 -t 60
TCP OK - 0.002 second response time on hostname port XXXXXXX|time=0.001642s;5.000000;10.000000;0.000000;60.000000
Can someone help me fix it?
Nagios log:
[XXXXXXX] Warning: Return code of 7 for check of service 'XXXXXXX' on host was out of bounds.
[XXXXXXX] Warning: Return code of 7 for check of service 'XXXXXXX' on host was out of bounds.
[XXXXXXX] Warning: Return code of 7 for check of service 'XXXXXXX' on host was out of bounds.
[XXXXXXX] Warning: Return code of 7 for check of service 'XXXXXXX' on host was out of bounds.
[XXXXXXX] Warning: Return code of 7 for check of service 'XXXXXXX' on host was out of bounds.

I fixed these issues. They were caused by duplicated service configs on the Nagios server, under /etc/nagios4/objects/services/.
I cleared the duplicate service configs from that location and reloaded the Nagios service.
Issues cleared.
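For anyone hitting the same thing, one way to spot duplicate definitions is to compare service descriptions across the config files and then verify the configuration before reloading (a sketch, assuming the Debian-style paths above; adjust to your layout):
grep -rh service_description /etc/nagios4/objects/services/ | sort | uniq -d
nagios4 -v /etc/nagios4/nagios.cfg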

I reproduced this problem on my systems. I have 620 hosts and 7000 services.
When the number of services exceeds 6189, all plugins become unusable with "Return code of 7 is out of bounds", even if the check command is just /bin/true.
The main solution is to set this in nagios.cfg:
enable_environment_macros=0
I resisted doing this for a long time, because one of my plugins uses the Nagios environment variables when building an HTML e-mail for notifications.
But I found a way to keep it running: you need to manually set the environment variables that particular plugin needs, like this:
define command{
command_name notify-html-service
command_line NAGIOS_NOTIFICATIONTYPE='$NOTIFICATIONTYPE$' NAGIOS_SERVICEATTEMPT='$SERVICEATTEMPT$' NAGIOS_SERVICESTATE='$SERVICESTATE$' NAGIOS_CONTACTGROUPNAME='$CONTACTGROUPNAME$' NAGIOS_HOSTNAME='$HOSTNAME$' NAGIOS_SERVICEDESC='$SERVICEDESC$' NAGIOS_LONGSERVICEOUTPUT='$LONGSERVICEOUTPUT$' NAGIOS_HOSTADDRESS='$HOSTADDRESS$' NAGIOS_HOSTGROUPNAMES='$HOSTGROUPNAMES$' NAGIOS_HOSTALIAS='$HOSTALIAS$' NAGIOS_SERVICEOUTPUT='$SERVICEOUTPUT$' NAGIOS_LONGDATETIME='$LONGDATETIME$' NAGIOS_SERVICEDURATION='$SERVICEDURATION$' NAGIOS_NOTIFICATIONRECIPIENTS='$NOTIFICATIONRECIPIENTS$' NAGIOS_SERVICEGROUPALIAS='$SERVICEGROUPALIAS$' NAGIOS_HOSTALIAS='$HOSTALIAS$' NAGIOS_NOTIFICATIONAUTHOR='$NOTIFICATIONAUTHOR$' NAGIOS_NOTIFICATIONCOMMENT='$NOTIFICATIONCOMMENT$' NAGIOS_CONTACTEMAIL='$CONTACTEMAIL$' NAGIOS_SERVICEATTEMPT='$SERVICEATTEMPT$' /usr/bin/perl '$USER7$/send.notify' http://192.168.1.1/nagios 2>/tmp/send.log
}
define command{
command_name notify-html-host
command_line NAGIOS_NOTIFICATIONTYPE='$NOTIFICATIONTYPE$' NAGIOS_HOSTSTATE='$HOSTSTATE$' NAGIOS_CONTACTGROUPNAME='$CONTACTGROUPNAME$' NAGIOS_HOSTNAME='$HOSTNAME$' NAGIOS_HOSTADDRESS='$HOSTADDRESS$' NAGIOS_HOSTGROUPNAMES='$HOSTGROUPNAMES$' NAGIOS_HOSTALIAS='$HOSTALIAS$' NAGIOS_LONGDATETIME='$LONGDATETIME$' NAGIOS_NOTIFICATIONRECIPIENTS='$NOTIFICATIONRECIPIENTS$' NAGIOS_SERVICEGROUPALIAS='$SERVICEGROUPALIAS$' NAGIOS_LONGHOSTOUTPUT='$LONGHOSTOUTPUT$' NAGIOS_HOSTALIAS='$HOSTALIAS$' NAGIOS_HOSTOUTPUT='$HOSTOUTPUT$' NAGIOS_HOSTDURATION='$HOSTDURATION$' NAGIOS_NOTIFICATIONAUTHOR='$NOTIFICATIONAUTHOR$' NAGIOS_NOTIFICATIONCOMMENT='$NOTIFICATIONCOMMENT$' NAGIOS_CONTACTEMAIL='$CONTACTEMAIL$' NAGIOS_SERVICEATTEMPT='' /usr/bin/perl '$USER7$/send.notify' http://192.168.1.1/nagios 2>/tmp/send.log
}
This worked for me. Originally it was a single command for both notification types, with the different host/service environment variables preset by Nagios:
define command{
command_name notify-html
command_line /usr/bin/perl $USER2$/send.notify http://192.168.1.1/nagios 2>/tmp/send.log
}
By the way, the Nagios documentation recommends against setting enable_environment_macros=1:
Enabling this is a very bad idea for anything but very small setups,
as it means plugins, notification scripts and eventhandlers may run
out of environment space. It will also cause a significant increase
in CPU- and memory usage and drastically reduce the number of checks
you can run.
P.S. My answer was edited because I needed to split the notify-html command into notify-html-host and notify-html-service. I started to receive wrong host notifications due to errors in the macro definitions (service macros are absent in host notification events), and when I traced the Nagios debug log I saw lots of 'WARNING: An error occurred processing macro' messages.
Good Luck.

I had this exact same issue, but it seems it was due to the number of services tied to a single Servicegroup. Once the Servicegroup had more than nine services reporting, they would return:
[XXXXXXX] Warning: Return code of 7 for check of service 'XXXXXXX' on host was out of bounds.
I reorganized my services into a few separate Servicegroups and all the checks functioned normally again without any further adjustment.
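For illustration, the split can look something like this (a sketch with hypothetical group and member names; servicegroup members are comma-separated host,service pairs):
define servicegroup{
servicegroup_name app-checks-1
alias Application checks (1 of 2)
members server1,Uptime,server2,Uptime
}
define servicegroup{
servicegroup_name app-checks-2
alias Application checks (2 of 2)
members server3,Uptime,server4,Uptime
}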

DBus : Can't get match rules for my user's session bus

I'm trying to use dbus/tools/GetAllMatchRules.py to get diagnostic information. When I run it without parameters as my regular user I get "GetConnectionMatchRules failed: did you enable the Stats interface?"
I modified GetAllMatchRules to print the specific exception details. It now says
GetConnectionMatchRules failed: did you enable the Stats interface?: org.freedesktop.DBus.Error.AccessDenied: The caller does not have the necessary privileged to call this method
So then I'm wondering: does it work at all? So I sudo su and run it again, and it gives me the kind of information I'd expect to see, just not for the right bus. Oddly, if I use the --system parameter, even root gets org.freedesktop.DBus.Error.AccessDenied.
The repository claims, in bus/example-session-disable-stats.conf.in, that:
"If the Stats interface was enabled at compile-time, users can use it on the session bus by default. Systems providing isolation of processes with LSMs might want to restrict this. This can be achieved by copying this file in #EXPANDED_SYSCONFDIR#/dbus-1/session.d/"
But that's clearly not the case because my user can NOT access this information.
I even tried a brute force approach to disabling (commenting out) ALL deny statements at /usr/share/dbus-1/system.conf and reloading and it still doesn't work. I also tried a full system restart in case I wasn't reloading correctly. I also did a system-wide search for system.conf in case it's actually using some other conf file that I'm not seeing, which would mean I'm modifying the wrong thing. I got a big hint that that's not the case when I had a typo (-- instead of --> for commenting out) and it failed to reload, but did reload once I fixed the typo.
I'm ok with the possibility that I can only do this signed in as root, so I also tried modifying GetAllMatchRules to use dbus.bus.BusConnection(), and force-feeding it the session address (unix:path=/run/user/1000/bus) which results in
"org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken."
Incidentally, this is the same issue that happens if I leave the code alone but use sudo -E su instead of just sudo su (the -E option in this case means that the $DBUS_SESSION_BUS_ADDRESS variable is retained)
I'm not sure what to try next...
It turns out there isn't currently a solution; the privilege error is simply the code that was chosen to indicate that the method is an unimplemented stub.

How can I run dracut commands from non-root C code?

I'm developing a tool that modifies LUKS partitions and disks.
Everything is working very well. Until now...
To handle disks properly as a non-root user, I added some polkit rules to change passwords, open partitions, change crypttab, and many other actions.
But I'm seeing problems when I change crypttab and need to run dracut to apply some dracut modules (dracut --force). Especially that last step.
My user is part of the admin group, and I added a rule to the sudoers file so that no sudo password is requested when my application runs.
So, I decided to use this code:
/* needs <unistd.h>, <sys/wait.h> and <glib.h> */
pid_t child;
gchar *dracut[] = {"/usr/bin/sudo", "/usr/bin/dracut", "--force", NULL};

if ((child = fork()) > 0) {
    waitpid(child, NULL, 0);   /* parent: wait for sudo/dracut to finish */
} else if (child == 0) {
    execvp("/usr/bin/sudo", dracut);
    _exit(127);                /* only reached if execvp() fails */
}
It is not working because SELinux is preventing this command from running:
SELinux is preventing /usr/bin/sudo from getattr access on the chr_file /dev/hpet.
***** Plugin catchall (100. confidence) suggests **************************
If you believe that sudo should be allowed getattr access on the hpet chr_file by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do allow this access for now by executing:
# ausearch -c 'sudo' --raw | audit2allow -M my-sudo
# semodule -X 300 -i my-sudo.pp
Additional Information:
Source Context system_u:system_r:xdm_t:s0-s0:c0.c1023
Target Context system_u:object_r:clock_device_t:s0
Target Objects /dev/hpet [ chr_file ]
Source sudo
Source Path /usr/bin/sudo
Port <Unknown>
Host <Unknown>
Source RPM Packages sudo-1.8.25p1-4.el8.x86_64
Target RPM Packages
Policy RPM selinux-policy-3.14.1-61.el8.noarch
Selinux Enabled True
Policy Type targeted
Enforcing Mode Enforcing
Host Name jcfaracco#hostname
Platform Linux jcfaracco#hostname 4.18.0-80.el8.x86_64 #1
SMP Wed Mar 13 12:02:46 UTC 2019 x86_64 x86_64
Alert Count 9
First Seen 2019-06-14 19:32:42 -03
Last Seen 2019-06-14 19:42:46 -03
Local ID 772b2c41-2302-4ee0-8886-52789eb63e22
Raw Audit Messages
type=AVC msg=audit(1560552166.658:199): avc: denied { getattr } for pid=2291 comm="sudo" path="/dev/hpet" dev="devtmpfs" ino=10776 scontext=system_u:system_r:xdm_t:s0-s0:c0.c1023 tcontext=system_u:object_r:clock_device_t:s0 tclass=chr_file permissive=0
type=SYSCALL msg=audit(1560552166.658:199): arch=x86_64 syscall=stat success=no exit=EACCES a0=7ffd4a6dffb0 a1=7ffd4a6def20 a2=7ffd4a6def20 a3=7fe845a73181 items=0 ppid=1756 pid=2291 auid=4294967295 uid=982 gid=980 euid=0 suid=0 fsuid=0 egid=980 sgid=980 fsgid=980 tty=tty1 ses=4294967295 comm=sudo exe=/usr/bin/sudo subj=system_u:system_r:xdm_t:s0-s0:c0.c1023 key=(null)ARCH=x86_64 SYSCALL=stat AUID=unset UID=gnome-initial-setup GID=gnome-initial-setup EUID=root SUID=root FSUID=root EGID=gnome-initial-setup SGID=gnome-initial-setup FSGID=gnome-initial-setup
Hash: sudo,xdm_t,clock_device_t,chr_file,getattr
Do you know how to fix this issue? Any other idea for calling dracut from C code is welcome too, in case there is a smarter way to do this.

apache2 FastCGI comm with dynamic server aborted first read idle timeout

Summary: unable to run even the simplest “Hello World” FastCGI script; every request ends in a timeout. There seems to be no communication at all between the server and the FastCGI scripts (using dynamic FastCGI scripts).
The environment
Ubuntu Precise (12.04)
Package apache2.2-bin
Package apache2-mpm-prefork
Package libapache2-mod-fastcgi
Package libfcgi-perl
Package python-flup
Multiple sites configured as virtual hosts on 127.0.0.1
There exists a /var/lib/apache2/fastcgi directory, owned by www-data, readable by all (owner, group and others)
There exists a /var/lib/apache2/fastcgi/dynamic directory, owned by www-data, which is restricted to the owner (readable, writable and accessible by www-data only)
There exists an inode/socket file in the /var/lib/apache2/fastcgi/ directory
The FastCGI relevant configurations:
The directory /etc/apache2/mods-enabled/ holds a reference to fastcgi.conf and fastcgi.load (mod_fastcgi is enabled).
The file fastcgi.conf contains the following (left untouched, I did not edit it):
<IfModule mod_fastcgi.c>
AddHandler fastcgi-script .fcgi
#FastCgiWrapper /usr/lib/apache2/suexec
FastCgiIpcDir /var/lib/apache2/fastcgi
</IfModule>
The relevant configuration file in /etc/apache2/sites-enabled/ contains the following (there is nothing more anywhere else about FastCGI specific configuration):
<DirectoryMatch /fcgi-bin>
Options +ExecCGI
<FilesMatch "^[^\.]+$">
SetHandler fastcgi-script
</FilesMatch>
</DirectoryMatch>
The test materials on the test virtual host:
There exists a fcgi-bin/test-perl.fcgi whose content is as follows (the file is executable by all, and readable by owner and group):
#!/usr/bin/perl
use CGI::Fast qw(:standard);
$COUNTER = 0;
while (new CGI::Fast) {
print header;
print start_html("Fast CGI Rocks");
print
h1("Fast CGI Rocks"),
"Invocation number ",b($COUNTER++),
" PID ",b($$),".",
hr;
print end_html;
}
There exists a fcgi-bin/test-python.fcgi whose content is as follows (the file is executable by all, and readable by owner and group):
#!/usr/bin/python
def myapp(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return ['Hello World!\n']

try:
    from flup.server.fcgi import WSGIServer
    WSGIServer(myapp).run()
except:
    import sys, traceback
    traceback.print_exc(file=open("errlog.txt", "a"))
The issue
Although both fcgi-bin/test-perl.fcgi and fcgi-bin/test-python.fcgi run normally when executed from the command line, neither seems to work when invoked over the web, e.g. as http://test.loc/fcgi-bin/test-perl.fcgi or http://test.loc/fcgi-bin/test-python.fcgi.
Nothing at all happens, and after some delay I get an Error 500, and the Apache error log contains multiple entries looking like:
[<date>] [error] [client <IP>] FastCGI: comm with (dynamic) server "/<…>/fcgi-bin/<script>.fcgi" aborted: (first read) idle timeout (30 sec), referer: <referrer>
[<date>] [error] [client <IP>] FastCGI: incomplete headers (0 bytes) received from server "<…>/fcgi-bin/<script>.fcgi", referer: <referrer>
I've spent hours searching the web trying to understand why it does not work, and finally decided to give up and ask for help here.
Any pointers and checklists are welcome. Feel free to ask for any missing details you feel may be relevant or worth checking.
Enjoy a nice day.
-- edit --
Issue update
In my reply to my own question, I mentioned a weird case where things suddenly looked fine for no apparent reason. I later discovered this was only partly true.
In the same virtual host, so with the exact same server configuration, some scripts, which are exactly the same (and have the exact same access rights), fail depending on their location.
As a reminder, here is what's in the site configuration:
<DirectoryMatch /fcgi-bin>
Options +ExecCGI
<FilesMatch "^[^\.]+$">
SetHandler fastcgi-script
</FilesMatch>
</DirectoryMatch>
With the above, only scripts in /fcgi-bin are handled as FastCGI script. But I also have some elsewhere (still for testing): one in /cgi-bin and one in / (i.e. in the public_html directory). For this purpose, .htaccess contains this entry:
Options +ExecCGI
AddHandler fastcgi-script .fcgi
So the two other FastCGI scripts should work the same as the one in /fcgi-bin, but they don't; for the time being, they invariably terminate with a connection timeout, just like the one in /fcgi-bin did at first.
This makes me feel something may be wrong with the mod_fastcgi module (a known bug? something else?). So far, this module seems to act rather randomly.
-- edit 2 --
The above, in the first edit, was an error of mine: the group was wrong on the other scripts; it had to be www-data, but it was not. So if something is wrong, stick to the answer I gave, that is, look at the FastCgiConfig directive and see if it solves anything, or at least whether it honours the timeout options.
I will answer my own question, as it seems to be working now. However, the epilogue still looks weird.
Although the default configuration should be OK, I still wanted to review the “Module mod_fastcgi” documentation again. As I only wanted dynamic FastCGI, I focused on the FastCgiConfig directive only, on purpose not going into the FastCgiServer and FastCgiExternalServer directives.
As there was no FastCgiServer at all in the default fastcgi.conf file, I started to set up my own. For a first test, I wanted to use the -appConnTimeout option, at least to ask the server not to wait so long before returning an Error 500.
So I just added this to the site configuration (I did not touch fastcgi.conf), in the same file where the virtual hosts are configured:
FastCgiConfig -appConnTimeout 2
This was to tell the server to wait no more than 2 seconds instead of the 30 seconds it was waiting. I then invoked a FastCGI script to see if at least this configuration was working. I expected to get an error after a 2-second delay, but instead the script ran without error.
What's weird is that I then tried to remove this option, to check whether that addition alone was what had been missing to make the FastCGI scripts work. But after I commented out the option, it was still working, and the same after a full reboot.
I can't tell more; it looks weird, but this is the only thing I did, and I did not edit anything else. I can only suggest that people who encounter a similar issue try the above.
Sorry that I can't explain what it did exactly. I really would like to know. It just works now, but I don't know why.
#############
fastcgi.conf
FastCgiWrapper Off
peng.rl's answer solved my problem.
My Ceph radosgw couldn't get Apache's input at all. After setting FastCgiWrapper Off, I can capture the data in Wireshark.

Why won't mariadb listen on port 3306 after a macports update?

At some point after a mariadb port update, she refused to listen on 3306 upon startup.
I made sure there were no skip-networking directives, and even tried adding one with "=OFF", which did nothing... but the odd thing was it had been working, and "I haven't changed anything".
Yet when I run:
/opt/local/lib/mariadb/bin/mysqladmin variables -u root -p | grep skip_networking
I see skip-networking as being ON.
My config has this:
[mysqld]
port = 3306
bind-address = 127.0.0.1
and no skip-networking setting at all.
Even passing the port and bind-address via command line will not make it listen.
After a grep of /opt/local/etc, it turns out there is a default config, and inside that there's a skip-networking directive:
cat /opt/local/etc/mariadb/macports-default.cnf
This was only picked up because after reading /etc/my.cnf, apparently the /opt/local/etc/mariadb/my.cnf file is also read. (I'd used /etc/my.cnf, never having edited the other, but something changed-- maybe I'd edited the default and it was overwritten with the update, though I don't remember doing so.)
Commenting out the include of macports-default.cnf in /opt/local/etc/mariadb/my.cnf once again has her listening.
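For reference, the change could look something like this (a sketch; it assumes the MacPorts my.cnf pulls in the defaults with an !include directive, so check your file for the exact line):
# /opt/local/etc/mariadb/my.cnf
# disabled: this file sets skip-networking
#!include /opt/local/etc/mariadb/macports-default.cnf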
Pretty clear solution in retrospect I guess, but I was a bit stumped, as I swear "I changed nothing!"... Regardless-- For posterity, and key word searches!
I can't comment yet, but wanted to add:
If you have other versions of MySQL or mariaDB installed via MacPorts, be sure to check out their config files too because MariaDB reads them.
Locations:
/opt/local/etc/mysql${mysqlVersion}/my.cnf
/opt/local/etc/mariadb-${mariadbVersion}/my.cnf
I have mariadb-10.1-server installed. There are two configs:
/opt/local/etc/mariadb/my.cnf
/opt/local/etc/mariadb-${mariadbVersion}/my.cnf
Additionally, some info about default config files from the MariaDB documentation (this is not MacPorts specific):
/etc/my.cnf
/etc/mysql/my.cnf
my.cnf in the DEFAULT_SYSCONFDIR specified during the compilation
my.cnf in the path specified in the environment variable MYSQL_HOME (if any)
the file specified in --defaults-extra-file (if any)
user-home-dir/.my.cnf
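To see which option files a given server build actually reads, and in what order, something like this should work (the mysqld path is an assumption based on the MacPorts layout above):
/opt/local/lib/mariadb/bin/mysqld --verbose --help 2>/dev/null | grep -A 1 "Default options"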
Alternatively to commenting out the defaults file, the value can be overridden in my.cnf:
[mysqld]
...
skip_networking=0

setting up passive checks on nagios

Hello board, this question may be a little green, but here goes.
I've been trying to set up Nagios NSCA for passive checks on a local Ubuntu box as a prototype.
For those in the know: my NSCA is listening on 5667 and send_nsca is on the same Ubuntu computer (localhost, 127.0.0.1). I've been reading and testing object definitions and service templates; however, I keep getting config errors when I try to access the Nagios web UI after modifications.
I hope to get clearer instructions on how I can create the service (directories/configurations) to process passive checks in Nagios 3 on Ubuntu.
There are a few things to consider: firstly, that localhost is defined as a host, and secondly, that the check actually exists as it would for any other check, but with a command that doesn't actually do anything. For example, I've created a passiveservices.cfg file with services defined as follows:
define service{
use generic-service,service-pnp
host_name Server1,Server2
service_description Uptime
active_checks_enabled 1
passive_checks_enabled 1
check_command check_null
check_freshness 1
check_period none
}
define service{
use generic-service,service-pnp
host_name Server1,Server2
service_description Drive space
active_checks_enabled 1
passive_checks_enabled 1
check_command check_null
check_freshness 1
check_period none
}
Note that the check command is check_null; it's not actually doing anything, and passive_checks_enabled is 1.
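check_null isn't one of the standard plugins, so it needs its own command definition somewhere in the object configuration. A minimal sketch, assuming a no-op like /bin/true is acceptable:
define command{
command_name check_null
command_line /bin/true
}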
There are two lines within nagios.cfg which you need to enable:
accept_passive_host_checks
accept_passive_service_checks
It's also a good idea to enable the following two lines as well:
check_service_freshness
check_host_freshness
If a server doesn't poll in after a set amount of time, it'll trigger a script (I trigger an email within my config).
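How long "a set amount of time" is comes from the freshness_threshold directive on each service; a sketch of the two relevant directives (the 3600-second value is just an example):
check_freshness 1
freshness_threshold 3600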
Lastly, enable the following two lines:
log_external_commands
log_passive_checks
They'll help with debugging if this doesn't work. Nagios writes them out to /var/log/syslog on Ubuntu (well, it does on mine).
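Once the service and the nagios.cfg options are in place, a passive result can be submitted from the client with send_nsca. A sketch, assuming the Ubuntu nsca-client paths and the tab-separated host/service/return-code/output input format:
printf "Server1\tUptime\t0\tOK - submitted passively\n" | /usr/sbin/send_nsca -H 127.0.0.1 -p 5667 -c /etc/send_nsca.cfg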
