How to catch "Nrpe unable to read output" when occured? - nagios

I'm trying to catch "nrpe unable to read output" output from plugin and send an email when this one occurs and I'm a little bit stuck :) . Thing is there are different return codes when this error occurs on different plugin:
Return code Service status
0 OK
1 WARNING
2 CRITICAL
3 UNKNOWN
Is there a way either to unify return codes of all plugins I use(that there always will be 2[CRITICAL] when this problem occurs), or any other way to catch those alerts? I want to keep return codes for different situations as is(i.e. filesystem /home will be warning(return code 1) for 95% and critical(return code 2) for 98%

Most folks would rather not have this error sending alert emails, because it does not represent an actual failed check. Basically it means nothing more than:
The command/plugin (local or remote) was ran by NRPE, but
failed to return any usable status and/or text back to nrpe.
This most often means something went wrong with the command/plugin and it hasn't done the job it was expected to perform. You don't want alerts being thrown for checks, when the check wasn't actually performed - as this would be very misleading. It's also important to note that the Return Code is not even be coming from the command/plugin.
In my experience, the number one cause of this error is a bad check. And as the docs for NPRE state, you should run the check (with all its options!) to make sure it runs correctly. Do yourself a favor and test both working AND not working states. About 75% of the time, this has happened because the check only works correctly when it has OK results, and blows up when something not-OK must be reported.
Another issue that causes these are network glitches. NRPE connects and runs the check; but the connection is closed before any response is seen. Once again, not a true check result.
For a production Nagios monitoring system, these should be very rare errors. If they are happening frequently, then you likely have other issues that need to be fixed.
And as far as I can tell, all built-in Nagios plugins use the exact same set of return codes. Are you certain this isn't a 'custom' check?

Ok, I think I've found the solution for my problems-I will try to check nagios.log on each node for those errors.

Related

How to decide warning and critical levels in nagios?

Can someone please tell on what basis should I decide warning and critical level for a check in nagios . I want to have notification for warning state before the check sends out critical . The check I am performing is check_es_logs that is Elastic Search logs.Performance data is below:
Warning and Critical thresholds are plugin-specific, which means that they are set in your check command. The plugin decides what state it should return (technically which exit code) based on the thresholds that you pass with flags to it, normally -w and -c.
If you want a more specific answer you need to ask a more specific question. For example:
I want to have notification for warning state before the check sends out critical
This doesn't make a lot of sense to me since you would just end up with two notifications, one saying Warning and one saying Critical.
You can do additional filtering, for example by saying that a contact should only receive notifications of a certain state, but since I don't know what you want your exact end result to be I'm not sure what to suggest.
Also, Icinga is a very different product from Nagios Core for example, so if this is an Icinga specific question it should not be tagged Nagios as the syntax for config files etc. will likely differ a lot.

Understanding Build Error: Method code too large

When sending Andoird Build to server I get the following build error:
Error! Failed to transform some classes java.lang.RuntimeException:
Method code too large! at
net.orfjackal.retrolambda.asm.MethodWriter.getSize(MethodWriter.java:2036)
at
net.orfjackal.retrolambda.asm.ClassWriter.toByteArray(ClassWriter.java:827)
at
net.orfjackal.retrolambda.Transformers.transform(Transformers.java:121)
at
net.orfjackal.retrolambda.Transformers.transform(Transformers.java:106)
at
net.orfjackal.retrolambda.Transformers.backportClass(Transformers.java:46)
at net.orfjackal.retrolambda.Retrolambda.run(Retrolambda.java:72) at
net.orfjackal.retrolambda.Main.main(Main.java:26)
I must confess I'm not sure why this is occurring as I do not references these classes. Could someone please explain how to track down the cause and fix it? I have not added any new imports since the last successful build :/ My project is also set to use Java 8. Not sure where to go from here to be honest.
there is a hard limit on the size of methods in a class file of 64k. You have at least one big method that you need to split up. It may have been coming in just under the limit for the initial compilation but the retrolambda conversion just pushed it over. You need to split these methods into smaller methods.
This error doesnt really give you a clue as to which methods are problematic but you can probably eyeball it.

Extend Store class to always execute a function after load on ExtJS

I am working on a project where we were asked to "patch" (they don't want a lot of time spent on development as they soon will replace the system) a system implemented under ExtJS 4.1.0.
That system is used under a very slow and non-stable network connection. So sometimes the stores don't get the expected data.
First two things that come to my mind as patches are:
1. Every time a store is loaded for the first time, wait 5 seconds and try again. Most times, a page refresh fix the problem of stores not loading.
Somehow, check detect that no data was received after loading a store and, try to get it again.
This patches should be executed only once to avoid infinite loops or unnecessary recursivity, given that it's ok that some times, it's ok that stores don't get any data back.
I don't like this kind of solutions but it was requested by the client.
This link should help with your question.
One of the posters suggests adding the below in an overrides.js file which is loaded in between the ExtJs source code and your applications code.
Ext.util.Observable.observe(Ext.data.Connection);
Ext.data.Connection.on('requestexception', function(dataconn, response, options){
if (response.responseText != null) {
window.document.body.innerHTML = response.responseText;
}
});
Using this example, on any error instead of echoing the error in the example you could log the error details for debugging later and try to load again. I would suggest adding some additional logic into this so that it will only retry a certain number of times otherwise it could run indefinitely while the browser window is open and more than likely crash the browser and put additional load on your server.
Obviously the root cause of the issue is not the code itself, rather your slow connection. I'd try to address this issue rather than any other.

"conn 0x7f7d6c001610 error: i=-2 errno=11 state=4 rc=3 br=721" Appearing in nxweb log

I am writing a custom handler and wanted to see how stable it is and just let my browser request the same URL over and over. It doesn't crash but I get some messages like the one in the title.
I.e.
conn 0x7f7d6c001610 error: i=-2 errno=11 state=4 rc=3 br=721
This also happens when I execute the "hello" example but far less often.
Could you give me any pointers as to why this happens? Do I have to fix this?
This error report is not critical. This means connection to client dropped for some reason before request has been fulfilled.

Problem with performance counters on Vista

I'm running into a strange issue on Vista with the Performance monitoring API. I'm currently using code that worked fine on XP/2k, based around PdhGetFormattedCounterValue(). I start out using PdhExpandWildCardPath to expand the counters (I'm interested in overall network statistics), the counters I'm looking at are:
\\Network Interface(*)\\Bytes Received/sec
\\Network Interface(*)\\Bytes Sent/sec
\\Processor(_Total)\\% Processor Time
The problem is that on their first call they return PDH_INVALID_DATA, I don't think this is a problem, since if I query it again I will start getting data without the error. The problem is this - while the processor time is worked exactly as expected, neither of the network interface counters are returning anything - just 0 all the time. I verified using Perfmon that they are reporting data normally, so I'm at a loss as to what might be the issue. I caught this at MS:
http://support.microsoft.com/?scid=kb%3Ben-us%3B287159&x=11&y=9
But I'm not interested in multi-language for my task, so I don't think this is relevant. I will see if I can come up with some basic code showing exactly what I'm doing, but nothing is returning anything strange, and it worked on XP/2k, so I suspect something changed under the hood. Thanks!
It turns out the issue was that the network interfaces are both wildcards, whereas the Processor one is actually already rolled up by the performance monitoring. What I didn't realize was that it PdhExpandWildCardPath didn't return something directly usable by PdhAddCounter. By this I mean that if ExpandWildCard returns 3 expanded matches, they come back as a null separated strings - I understood this, but I had assumed that AddCounter would be effectively create a counter containing all three. Nope, reality is I needed to break up each path and request it individually from AddCounter, then roll up the results manually when I get them.
Hopefully this helps someone else to avoid the same mistake I made with less frustration. ;)

Resources