How to handle exit codes with Camel Exec? - apache-camel

I'm using Camel Exec for automated shutdowns on some of our devices.
The shutdown command is pretty simple, and it mostly works fine:
from(START_DEEP_SLEEP)
.setBody(constant(null)) // we don't want stdin for exec
.setHeader(ExecBinding.EXEC_COMMAND_ARGS, constant("""shutdown $shutdownDelay "starting deep sleep shutdown" """))
.to("exec:sudo")
Obviously, this command will send a shutdown to the application executing it. That too isn't much of an issue, except that sometimes this produces an exit value of 143. I know the meaning of the return value, and it makes sense to see it here, but this only happens on some devices. Most others just return 0. They are all of the same type, so I really don't know where this discrepancy comes from, but it's not even that big an issue. The shutdown works none the less.
The problem is that camel exec logs this as an error:
ERROR 549 --- [Camel (camel-1) thread #1 - seda://start-deepsleep] o.a.camel.component.exec.ExecProducer : The command ExecCommand [args=[shutdown, now, starting deep sleep shutdown], executable=sudo, timeout=9223372036854775807, outFile=null, workingDir=null, useStderrOnEmptyStdout=false] returned exit value 143
This produces undesired noise in our monitoring, and I would rather not have it logged.
The core issue here is that Camel Exec does not throw, so there's no exception I could handle. It just logs the error, which then gets picked up by our log analysis.
I would like to handle that exit code gracefully without camel Exec logging an error. The return value is already logged separately anyways. How can I do that?

According to the docu http://camel.apache.org/exec.html there is a header ExecBinding.EXEC_EXIT_VALUE filled with the error number. Yours should be 143 (the docu states that this depends on the OS).
That could be a "hook" to handle the log entry, e.g. deleting the last entry with the same error number.
Of course this is only a cosmetic fix. The implementation could be like this:
from(START_DEEP_SLEEP)
.setBody(constant(null)) // we don't want stdin for exec
.setHeader(ExecBinding.EXEC_COMMAND_ARGS, constant("""shutdown $shutdownDelay "starting deep sleep shutdown" """))
.to("exec:sudo")
.when(header(ExecBinding.EXEC_EXIT_VALUE))
.to("direct:edit_the_log")
Please note that I did not test that code. Maybe u access that header with
.when(header(EXEC_EXIT_VALUE))
instead.
Please, inform me if that could be a proper solution or not.

Related

DBus : Can't get match rules for my user's session bus

I'm trying to use dbus/tools/GetAllMatchRules.py to get diagnostic information. When I run it without parameters as my regular user I get "GetConnectionMatchRules failed: did you enable the Stats interface?"
I modified GetAllMatchRules to print the specific exception details. It now says
GetConnectionMatchRules failed: did you enable the Stats interface?: org.freedesktop.DBus.Error.AccessDenied: The caller does not have the necessary privileged to call this method
So then I'm wondering, does it work at all? So I sudo su and run it again and it gives me the kind of information I'd expect to see, just not for the right bus. Oddly, if I use the --system parameter, even root gets org.freedesktop.DBus.Error.AccessDenied .
The repository claims, in bus/example-session-disable-stats.conf.in , that
"If the Stats interface was enabled at compile-time, users can use it on
the session bus by default. Systems providing isolation of processes
with LSMs might want to restrict this. This can be achieved by copying
this file in #EXPANDED_SYSCONFDIR#/dbus-1/session.d/
"
But that's clearly not the case because my user can NOT access this information.
I even tried a brute force approach to disabling (commenting out) ALL deny statements at /usr/share/dbus-1/system.conf and reloading and it still doesn't work. I also tried a full system restart in case I wasn't reloading correctly. I also did a system-wide search for system.conf in case it's actually using some other conf file that I'm not seeing, which would mean I'm modifying the wrong thing. I got a big hint that that's not the case when I had a typo (-- instead of --> for commenting out) and it failed to reload, but did reload once I fixed the typo.
I'm ok with the possibility that I can only do this signed in as root, so I also tried modifying GetAllMatchRules to use dbus.bus.BusConnection(), and force-feeding it the session address (unix:path=/run/user/1000/bus) which results in
"org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken."
Incidentally, this is the same issue that happens if I leave the code alone but use sudo -E su instead of just sudo su (the -E option in this case means that the $DBUS_SESSION_BUS_ADDRESS variable is retained)
I'm not sure what to try next...
Turns out there isn't currently a solution, the privilege error is simply the code that was chosen to indicate that the method is an unimplemented stub method

How to abort a macports portfile on an error condition?

I working on a version bump on the cc65 and encountered a problem with the linuxdoc-tools. Since I can't fix the linuxdoc-tools and there is a simple workaround possible I decided to add an if statement to inform the user together with the workaround:
if {! [file exists ${prefix}/bin/perl] } {
ui_error "
«${prefix}/bin/perl» is missing but the linuxdoc-tools depends on it.
Please create an appropriate symbolic link for linuxdoc-tools to work.
"
exit 1
}
Crude but the best I can do since I'm neither the perl5 nor the linuxdoc-tools maintainer and I don't want to spend to much time on a version bump.
However, the MacPorts doesn't understand exit 1 and ui_error won't stop execution on its own.
How do I stop the execution so not to waste the users time on a build which will otherwise fail right at the end.
Use return -code error "error message", or the shorthand for the same thing, error "error message".
Note that you should use ui_error before that to print a human-readable message for the user – while the error message is also being printed, it can sometimes get lost in the output.
Additionally, note that $prefix/bin/perl is a build dependency of linuxdoc-tools. If it is also needed at runtime, you should submit a pull request that adds depends_run path:bin/perl:perl5 to the port rather than attempting to fix this bug in your port.

generating trap/segfault messages in dmesg

I had a program that was segfaulting.
When I went to investigate and ran dmesg I could see lines like this:
[955.915050] traps: foo_bar[123] general protection ip:7f5fcc2d4306 sp:7ffd9e5868b8 ...
Now the program has been fixed and I'm trying to write some analysis scripts across different systems to find similar messages and was hoping to induce a line in the dmesg log to get a baseline for what to look for and see if there's a difference between, say, a sigbus(10) and a sigill(4)
I tried to do it via kill -11 on the command line . No entry in dmesg
I tried to do it via signal(getpid(), 11) in the code. No entry in dmesg
I tried to do it via signal 11 after attaching in gdb . No entry in dmesg
I tried to do it via writing bad code and it worked for SEGV, but I can't figure out how to trigger a SIGBUS (for example)
I'm guessing that there is more than one path for handling the signal depending on how it occurs and my attempts above just aren't doing it the right way.
How can I trigger/send a signal to my program that'll get a line in dmesg? Is there some kernel or log configuration I can twiddle to get those lines?
Update:
" __builtin_trap: when to use it? " shows how to get a SIGILL but alas doesn't have a signal-agnostic solution)

Debugging crash during app exit (WPF)

I'm trying to figure out why an WPF-app won't exit imediately on closing it. Using Process Explorer I hade found out that WerFault.exe is started while exiting which seem to indicate that something crashes during the teardown, perhaps some destructor or dispose that fails. This started happening when I recently switched to VS2015. I am running Windows 8.
My question is: How can I find out what the real problem is? Any way of finding a crash log for WerFault.exe? I have hundreds of destructors and dispose-methods so it's a bit hard to put breakpoints in all of them. Any other way of capturing these kinds of errors in VS?
The exit code is -1073740791 which "indicate a bug in the executed software that causes stack overflow, leading to abnormal termination of the software". But where?
Some more info from the event log:
Faulting module name: ucrtbase.DLL, version: 10.0.10240.16390, time stamp: 0x55a5b718
Exception code: 0xc0000409
Fault offset: 0x0000000000065a4e
You could try enabling user mode dumps:
Create the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps
Within LocalDumps, create a key that is the name of your executable
Within the key you just created, set the values of DumpFolder, DumpCount, DumpType, and CustomDumpFlags as needed (you should definitely set DumpType to 2 for full dumps, otherwise I don't think that enough information will be captured to debug a managed dump).
Once you have done this, whenever your executable crashes a dump file will be created in the folder specified by DumpFolder (or %LOCALAPPDATA%\CrashDumps by default).

Hiredis timeout with large number of concurrent requests

I'm using a redis integration plugin for Varnish called libvmod-redis. I'm seeing an issue where if I get a large number of concurrent requests, around 350, redis starts timing out and I eventually get a segfault in Varnish.
I get these errors:
varnishd[27892]: Child (27893) said redis error (connect): Connection timed out
varnishd[27892]: Child (27893) said redis error (command): err=1 errstr=Connection timed out
varnishd[19528]: Child (19529) said redis error (command): err=1 errstr=select(2): Invalid argument
varnishd[19528]: Child (19529) said redis error (command): err=1 errstr=Connection timed out
varnishd[19528]: last message repeated 9 times
varnishd[19528]: Child (19529) said redis error (command): err=1 errstr=select(2): Invalid argument
varnishd[19528]: Child (19529) said redis error (connect): fcntl(F_GETFL): Bad file descriptor
varnishd[19528]: Child (19529) said redis error (command): err=1 errstr=fcntl(F_GETFL): Bad file descriptor
kernel: [282284.005658] varnishd[19727] general protection ip:7f1f9dea1427 sp:7f1f4123c120 error:0 in libhiredis.so.0.10[7f1f9de9f000+9000]
My timeout is 1 second, and I'm using an ElastiCache node for Redis. I'm wondering what exactly could be failing here. I'm not an expert in C, so I feel like I'm missing something.
It would be interesting to see the code, especially around the select.
Since you have a lot of concurrent requests you better check the validity of your file descriptors: maybe on of them is closed during the process...
You may try the following:
To start, try with redis-cli --latency if the problem is the Redis server. If redis-cli has no latency issues from its point of view, the problem is the client.
If the problem is the server, you may want to follow this guide: http://redis.io/topics/latency

Resources