I am writing a daemon application for Debian Sid. It works perfectly most of the time, but dies silently after I suspend (or hibernate) my laptop. So I have a couple of questions:
What should I Google for solutions?
Do you have any idea what is going on?
Try strace-ing the daemon to see why it dies silently. Generally, suspend/hibernate alone should have no effect on user processes.
The daemon's loop was sitting in a blocking read call, and suspend (hibernate) interrupts it: the call fails with -1 and errno set to EINTR. So I should check errno more carefully.
Fixed by adding:
if ( errno == EINTR ) continue;
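For anyone hitting the same problem, here is a minimal sketch of that fix wrapped in a helper. The name read_retry is made up for illustration; it only assumes that an interrupted read() returns -1 with errno set to EINTR, which is standard POSIX behavior.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: behaves like read(), but retries the call when a
 * signal interrupts it before any data was transferred (EINTR). */
ssize_t read_retry(int fd, void *buf, size_t count)
{
    ssize_t n;

    do {
        n = read(fd, buf, count);
    } while (n == -1 && errno == EINTR);

    return n; /* bytes read, 0 on end of file, or -1 on a real error */
}
```

Any other errno value is still returned to the caller, so genuine failures are not hidden by the retry.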
Let's say I get myself into a situation where I do not know how to recover. What would be the best way to self-restart the process? What I'm looking for is something similar to closing itself and launching itself again. On some Arduinos I can call NVIC_SystemReset, however I'd like a function for Windows, Mac and Linux.
I was thinking perhaps the only way is to execute a detached process and let myself shut down? With ShellExecute on Windows and execl on Linux?
As mentioned by @Lundin, this is for a processor, not a microcontroller.
This answer might not be correct in your case.
1) Create a proxy process that delegates to the main process.
2) This proxy process starts the main process and redirects to it.
3) If the main process fails or returns abnormally for any reason, restart the main process. Otherwise, if the main program exits normally, end the proxy process.
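On POSIX systems the three steps above can be sketched roughly as follows. This is only an illustration under assumptions: supervise, demo_ok and demo_fail are hypothetical names, the "main process" is modeled as a function run in a forked child, and a restart limit is added so a program that always fails cannot loop forever. Windows would need a different mechanism.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for the real main program. */
int demo_ok(void)   { return 0; }  /* normal exit */
int demo_fail(void) { return 1; }  /* abnormal exit */

/* Hypothetical proxy: run run_main in a child process and restart it
 * after a crash or non-zero exit. Returns the number of restarts that
 * were needed before a clean exit, or -1 on giving up or on error. */
int supervise(int (*run_main)(void), int max_restarts)
{
    for (int restarts = 0; restarts <= max_restarts; restarts++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(run_main());          /* child: act as the main process */

        int status;
        while (waitpid(pid, &status, 0) < 0) {
            if (errno != EINTR)         /* retry only if interrupted */
                return -1;
        }
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return restarts;            /* normal exit: proxy ends too */
        /* crash, fatal signal, or non-zero exit: loop and restart */
    }
    return -1;
}
```

A real proxy would more likely exec the actual program binary in the child instead of calling a function.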
Some background:
I have an application that relies on third party hardware and a closed source driver. The driver currently has a bug in it that causes the device to stop responding after a random period of time. This is caused by an apparent deadlock within the driver and interrupts proper functioning of my application, which is in an always-on 24/7 highly visible environment.
What I have found is that attaching GDB to the process, and immediately detaching GDB from the process results in the device resuming functionality. This was my first indication that there was a thread locking issue within the driver itself. There is some kind of race condition that leads to a deadlock. Attaching GDB was obviously causing some reshuffling of threads and probably pushing them out of their wait state, causing them to re-evaluate their conditions and thus breaking the deadlock.
The question:
My question is simply this: is there a clean way for an application to trigger all threads within the program to interrupt their wait state? One thing that definitely works (at least on my implementation) is to send a SIGSTOP followed immediately by a SIGCONT from another process (e.g. from bash):
kill -19 `cat /var/run/mypidfile` ; kill -18 `cat /var/run/mypidfile`
This triggers a spurious wake-up within the process and everything comes back to life.
I'm hoping there is an intelligent method to trigger a spurious wake-up of all threads within my process. Think pthread_cond_broadcast(...) but without having access to the actual condition variable being waited on.
Is this possible, or is relying on a program like kill my only approach?
The way you're doing it right now is probably the most correct and simplest. There is no "wake all waiting futexes in a given process" operation in the kernel, which is what you would need to achieve this more directly.
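If you would rather not shell out to kill, the same stop/continue pair can be sent from C. The sketch below is an assumption-laden illustration: nudge_process is a made-up name, and it has to run in a separate process (a small watchdog, say), because a process that sends SIGSTOP to itself has nobody left to send the SIGCONT. Using the symbolic names also avoids hard-coding 19 and 18, which are not the same on every platform.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper for a separate watchdog process: stop the target
 * and immediately continue it. Returns 0 on success, -1 on failure. */
int nudge_process(pid_t pid)
{
    if (pid == getpid())
        return -1;          /* refuse to stop ourselves: nobody would resume us */
    if (kill(pid, SIGSTOP) < 0)
        return -1;
    return kill(pid, SIGCONT);
}
```

Whether this actually unblocks the driver is specific to your setup; it only reproduces what the two kill commands already do.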
Note that if the failure-to-wake "deadlock" is in pthread_cond_wait but interrupting it with a signal breaks out of the deadlock, the bug cannot be in the application; it must actually be in the implementation of pthread condition variables. glibc has known unfixed bugs in its condition variable implementation; see http://sourceware.org/bugzilla/show_bug.cgi?id=13165 and related bug reports. However, you might have found a new one, since I don't think the existing known ones can be fixed by breaking out of the futex wait with a signal. If you can report this bug to the glibc bug tracker, it would be very helpful.
I have a system running embedded Linux and it is critical that it runs continuously. Basically it is a process for communicating with sensors and relaying that data to a database and a web client.
If a crash occurs, how do I restart the application automatically?
Also, there are several threads doing polling (e.g. sockets and UART communications). How do I ensure none of the threads get hung up or exit unexpectedly? Is there an easy-to-use watchdog that is threading friendly?
You can seamlessly restart your process as it dies with fork and waitpid as described in this answer. It does not cost any significant resources, since the OS will share the memory pages.
Which leaves only the problem of detecting a hung process. You can use any of the solutions pointed out by Michael Aaron Safyan for this, but an even easier solution would be to call alarm repeatedly, having the signal terminate the process (use sigaction accordingly). As long as you keep calling alarm (i.e. as long as your program is making progress) it will keep running. Once you stop, the signal will fire.
That way, no extra programs needed, and only portable POSIX stuff used.
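A minimal sketch of the alarm idea, assuming a single main loop that can call a "kick" function regularly. The names watchdog_start and watchdog_kick are hypothetical, and the timeout is whatever your loop can guarantee.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical: arm the watchdog. SIGALRM is given its default action,
 * which terminates the process when the timer expires. */
int watchdog_start(unsigned timeout_s)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_DFL;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) < 0)
        return -1;

    alarm(timeout_s);
    return 0;
}

/* Hypothetical: call this from the main loop. Re-arms the timer and
 * returns how many seconds were left on the previous one. */
unsigned watchdog_kick(unsigned timeout_s)
{
    return alarm(timeout_s);
}
```

If the loop hangs and stops calling watchdog_kick, SIGALRM kills the process and the parent doing waitpid can restart it. Note that with several threads this only watches whichever thread does the kicking.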
The gist of it is:
You need to detect if the program is still running and not hung.
You need to (re)start the program if the program is not running or is hung.
There are a number of different ways to do #1, but two that come to mind are:
Listening on a UNIX domain socket, to handle status requests. An external application can then inquire as to whether the application is still ok. If it gets no response within some timeout period, then it can be assumed that the application being queried has deadlocked or is dead.
Periodically touching a file with a preselected path. An external application can look at the timestamp of the file, and if it is stale, then it can assume that the application is dead or deadlocked.
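The file-touching variant could look something like this. It is a sketch only: heartbeat_touch and heartbeat_is_stale are hypothetical names, the path and the maximum age are up to you, and it assumes the application and the checker see the same clock and file system.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical, called by the monitored application: create the file if
 * needed and set its timestamps to the current time. */
int heartbeat_touch(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    int rc = futimens(fd, NULL);   /* NULL means "set both times to now" */
    close(fd);
    return rc;
}

/* Hypothetical, called by the external checker: returns 1 if the file is
 * missing or older than max_age_s seconds, 0 if it is fresh. */
int heartbeat_is_stale(const char *path, time_t max_age_s)
{
    struct stat st;

    if (stat(path, &st) < 0)
        return 1;
    return time(NULL) - st.st_mtime > max_age_s;
}
```

The checker would treat a stale result as "dead or deadlocked" and go on to step 2.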
With respect to #2, killing the previous PID and using fork+exec to launch a new process is typical. You might also consider making your application that runs "continuously", into an application that runs once, but then use "cron" or some other application to continuously rerun that single-run application.
Unfortunately, watchdog timers and getting out of deadlock are non-trivial issues. I don't know of any generic way to do it, and the few that I've seen are pretty ugly and not 100% bug-free. However, tsan (ThreadSanitizer) can help detect potential deadlock scenarios and other threading issues by analyzing the program while it runs.
You could create a CRON job to check if the process is running with start-stop-daemon from time to time.
Use this script for running your application:
#!/bin/bash
while ! /path/to/program # Loop until the program exits successfully.
do
echo "restarting" # Non-zero exit status, so run it again.
done
You can also put this script in /etc/init.d/ in order to start it as a daemon.
Any idea on how to capture closing the terminal window that my program is running in?
While I'm at it, any way to capture when the computer is shutting down but the program is still running, or if the user logs off?
If on Unix/Linux: Did you have a look at SIGTERM? This is at least the one sent to you during shutdown.
You could try the atexit() function ? (see comments)
Or look at this post here: Signals received by bash when terminal is closed
Try catching SIGTERM. Note that you cannot catch SIGKILL, which might be what is sent during shutdown after a certain amount of time. I found this really nice post that explains some differences too.
[update] Longshot here but what about testing if std-in/out is still open and good? When the terminal dies those file descriptors should be scrapped. Disclaimer, this is a guess at best.
From my tests, the signal that my program receives when the terminal is closed is signal 1, i.e. SIGHUP.
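Putting the answers together, a minimal sketch that catches both SIGHUP (terminal closed) and SIGTERM (shutdown) might look like this. The names install_exit_handlers and last_signal are made up; the handler only records the signal number so the main loop can clean up and exit.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical flag: the last signal seen, 0 if none yet. */
volatile sig_atomic_t last_signal = 0;

/* Keep the handler trivial: just remember which signal arrived. */
static void on_signal(int signo)
{
    last_signal = signo;
}

/* Hypothetical: install the handler for SIGHUP and SIGTERM. */
int install_exit_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);

    if (sigaction(SIGHUP, &sa, NULL) < 0)
        return -1;
    if (sigaction(SIGTERM, &sa, NULL) < 0)
        return -1;
    return 0;
}
```

The main loop would check last_signal periodically and shut down cleanly when it becomes non-zero. SIGKILL still cannot be caught this way.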
We have a small daemon application written in C for several UNIX platforms (this problem is happening on SunOS 5.10) that basically just opens a serial port and then listens for information coming in via said port.
In this particular instance, the daemon appears to read a single transmission (like a file's worth of data) sent over via the serial port, then it receives a SIGINT. This happens every time. Other customers use this setup very similarly without receiving the SIGINT. Quite obviously, the users are NOT pressing Ctrl-C. We have a relatively simple signal handler in place, so we definitely know that that is what is happening.
What else could possibly be causing this? Googling around and looking through the questions here, I couldn't find much explanation as to other things that could cause a SIGINT. I also looked through the code and found no calls to raise() and only a single call to kill(pid, 0) which wouldn't send a SIGINT anyway.
Any thoughts or insight would definitely be appreciated.
If you do not want the serial port to become the controlling terminal for the process, open it using the open flag O_NOCTTY. If it is the controlling terminal, data from the serial port may be interpreted as an interrupt or other special character.
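As a small sketch of that suggestion (open_serial is a hypothetical wrapper, and the device path is whatever your platform uses):

```c
#include <fcntl.h>

/* Hypothetical wrapper: open a serial device for reading and writing
 * without letting it become the controlling terminal of the process. */
int open_serial(const char *device)
{
    return open(device, O_RDWR | O_NOCTTY);
}
```

With O_NOCTTY set, a byte on the line that happens to match the interrupt character (0x03 by default) should no longer be able to turn into a SIGINT for the daemon through controlling-terminal handling.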
You didn't say how your signal handler is attached, but if you're able to attach it using sigaction(2) so as to get a siginfo_t then it looks like that would include the pid that sent the signal (si_pid).
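Something along these lines, as a rough sketch (trace_sigint and sigint_sender are names I made up). One caveat I am fairly sure of: si_pid is only meaningful when the signal was sent by another process, e.g. with kill(); for a SIGINT generated by the terminal driver it will not identify a sender.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>   /* getpid(), handy for comparing against the sender */

/* Hypothetical: pid of whoever sent the last SIGINT, 0 if none yet. */
volatile sig_atomic_t sigint_sender = 0;

/* Three-argument handler form, selected by SA_SIGINFO below. */
static void on_sigint(int signo, siginfo_t *info, void *context)
{
    (void)signo;
    (void)context;
    sigint_sender = (sig_atomic_t)info->si_pid;
}

/* Hypothetical: install the tracing handler for SIGINT. */
int trace_sigint(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigint;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);

    return sigaction(SIGINT, &sa, NULL);
}
```

After the next SIGINT, log sigint_sender from the main loop; info->si_code can be recorded the same way to tell a kill() from a terminal-generated signal.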
I found an interesting blog post about debugging a problem with similar symptoms. While I doubt it's the same issue, it's got some very useful debugging tips for tracing the origin of signals.