Signals and libraries - C

Are there any conventions or design patterns for using signals and signal handlers in library code? Because signals are directed at the whole process rather than at a specific thread or library, I feel there may be some issues.
Let's say I'm writing a shared library that will be used by other applications, and I want to use the alarm() and setitimer() functions and trap the SIGALRM signal to do some processing at specific times.
I see some problems with it:
1) If the application code (which I have no control over) also uses SIGALRM and I install my own signal handler for it, this may overwrite the application's signal handler and thus disable its functionality. Of course I can make sure to call the previous signal handler (retrieved via the signal() function) from my own handler, but there is still the reverse problem: the application code can overwrite my signal handler and thus disable my library's functionality.
2) Even worse, the application developer may link against another shared library from another vendor that also uses SIGALRM, in which case neither I nor the application developer has any control over it.
3) Calling alarm() or setitimer() overwrites the previous timer used by the process, so the application could overwrite the timer I set in the library, or vice versa.
I'm something of a novice at this, so I'm wondering whether there is already some convention for handling this. (For example, if every piece of code were well behaved, it would call the previous signal handler from its own handler and would structure its alarm code to honor previously set timers before overwriting them with its own.)
Or should I avoid using signal handlers and alarm() in a library altogether?

Or should I avoid using signal handlers and alarm() in a library altogether?
Yes. For the reasons you've identified, you can't depend on signal disposition for anything, unless you control all code in the application.
You could document that your library requires that the application neither use SIGALRM nor call alarm(), but the application developer may not have any control over that anyway, so it's in your best interest to avoid imposing such restrictions in the first place.
If your library can work without SIGALRM (perhaps with reduced functionality), you can also make this feature optional, perhaps controlled by some environment variable. Then, if it is discovered that there is some code that interferes with your signal handling, you can tell the end-user to set your environment variable to disable that part of your library (which beats having to rebuild and supply a new version of it).
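As a rough illustration, here is a minimal sketch of that opt-out combined with handler chaining. The environment variable name MYLIB_DISABLE_ALARM and the function names are invented for the example, and the chaining ignores SA_SIGINFO-style handlers for brevity; none of this makes SIGALRM use in a library safe, it only makes the feature easier to turn off:

    #include <signal.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Previous disposition, saved so we can chain to it. */
    static struct sigaction prev_action;

    static void mylib_alarm_handler(int signo)
    {
        /* ... the library's own async-signal-safe processing ... */

        /* Chain to whatever handler the application had installed
         * (ignores SA_SIGINFO handlers for brevity). */
        if (prev_action.sa_handler != SIG_DFL &&
            prev_action.sa_handler != SIG_IGN)
            prev_action.sa_handler(signo);
    }

    int mylib_init(void)
    {
        /* Opt-out: end users can disable the SIGALRM feature. */
        if (getenv("MYLIB_DISABLE_ALARM") != NULL)
            return 0;  /* run with reduced functionality */

        struct sigaction sa;
        sa.sa_handler = mylib_alarm_handler;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        if (sigaction(SIGALRM, &sa, &prev_action) == -1)
            return -1;

        alarm(5);  /* example: request a SIGALRM in 5 seconds */
        return 0;
    }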
P.S. Your question and this answer apply to any library, whether shared or archive.

Related

How to reset handlers registered by pthread_atfork

Some libraries might register some handlers with pthread_atfork(). I don't need them as I only use fork() together with exec(). Also, they can cause trouble in some cases. So, is there a way to reset the registered handler list?
Related: calling fork() without the atfork handlers, fork() async signal safety.
POSIX does not document any mechanism for fork handlers installed by pthread_atfork() to be removed, short of termination of the process or replacing the process image. If you don't want them, then don't install them. If they are installed by a third-party library, as you describe, then your options are to find a way to avoid that behavior of the library (possibly by avoiding the library altogether) or to live with it.
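One practical escape hatch for the fork()-then-exec() pattern is posix_spawn(). On glibc, posix_spawn() is implemented with clone()/vfork() and does not run pthread_atfork() handlers; note that this is implementation behavior, not a POSIX guarantee. A minimal sketch:

    #include <spawn.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    extern char **environ;

    /* Replace fork()+exec() with posix_spawn(); on glibc this bypasses
     * the pthread_atfork() handlers (an implementation detail). */
    int run_child(void)
    {
        pid_t pid;
        char *argv[] = { "/bin/echo", "hello", NULL };

        int err = posix_spawn(&pid, "/bin/echo", NULL, NULL, argv, environ);
        if (err != 0) {
            fprintf(stderr, "posix_spawn failed: %d\n", err);
            return -1;
        }

        int status;
        waitpid(pid, &status, 0);
        return status;
    }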

User-level threads context switching: How to detect when a thread is blocking in C?

As the title suggests, is there a way in C to detect when a user-level thread running on top of a kernel-level thread e.g., pthread has blocked (or about to block) for I/O?
My use case is as follows: I need to execute tasks in a multithreaded environment (on top of kernel threads e.g., pthreads). The tasks are basically user functions that can be synchronized and may use blocking operations within. I need to hide latency in my implementation. So, I am exploring the idea of implementing the tasks as user-level threads for better control of their execution context such that, when a task blocks or synchronizes, I context-switch to other ready tasks (i.e., implementing my own scheduler for the user-level threads). Consequently, almost the full use of the OS’s time quantum per kernel thread can be achieved.
There used to be code that did this, for example GNU pth. It's generally been abandoned because it just doesn't work very well and we have much better options now. You have two choices:
1) If you have OS help, you can use the OS mechanisms. Windows provides OS help for this; IOCP dispatching uses it.
2) If you have no OS help, then you have to convert all blocking operations into non-blocking ones that call your dispatcher rather than blocking. So, for example, if someone calls socket, you intercept that call and set the socket non-blocking. When they call read, you intercept that call and if they get a "would block" indication, you arrange to resume when the operation might succeed and schedule another thread.
You can look at GNU pth to see how you might make option 2 work. But be warned, GNU pth is full of reported bugs that have never been fixed since it was abandoned. It will give you an idea of how to implement things like mutexes and sleeps in a cooperative user-space threading environment. But don't actually use the code.
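To make option 2 concrete, here is a rough sketch of intercepting read() in an LD_PRELOAD interposer via dlsym(RTLD_NEXT). It assumes the descriptor was already put into non-blocking mode when it was created; scheduler_yield_until_readable() is a hypothetical hook into your user-level scheduler, and the interposer must be linked with -ldl:

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <errno.h>
    #include <unistd.h>

    /* Hypothetical hook: park the current user-level thread until fd
     * is readable, running other ready tasks in the meantime. */
    extern void scheduler_yield_until_readable(int fd);

    ssize_t read(int fd, void *buf, size_t count)
    {
        /* Look up the real read() the first time through. */
        static ssize_t (*real_read)(int, void *, size_t);
        if (!real_read)
            real_read = (ssize_t (*)(int, void *, size_t))
                            dlsym(RTLD_NEXT, "read");

        for (;;) {
            ssize_t n = real_read(fd, buf, count);
            if (n >= 0 || errno != EAGAIN)  /* EAGAIN == EWOULDBLOCK on Linux */
                return n;  /* data, end-of-file, or a real error */

            /* Would block: run another user-level thread instead. */
            scheduler_yield_until_readable(fd);
        }
    }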

atexit considered harmful?

Are there inherent dangers in using atexit in large projects such as libraries?
If so, what is it about the technical nature behind atexit that may lead to problems in larger projects?
The main reason I would avoid using atexit in libraries is that any use of it involves global state. A good library should avoid having global state.
However, there are also other technical reasons:
Implementations are only required to support a small number (32, I think) of atexit handlers. After that, it's possible that all calls to atexit fail, or that they succeed or fail depending on resource availability. Thus, you have to deal with what to do if you can't register your atexit handler, and there might not be any good way to proceed.
Interaction of atexit with dlopen or other methods of loading libraries dynamically is not defined. A library which has registered atexit handlers cannot safely be unloaded, and the ways different implementations deal with this situation can vary.
Poorly written atexit handlers could interact with one another, or simply behave badly, in ways that prevent the program from exiting properly. For instance, an atexit handler might attempt to obtain a lock that is held by another thread and that cannot be released because of the program's state at the time exit was called.
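A minimal sketch of defensive registration, checking the return value as suggested above (what to do on failure is the library's policy decision):

    #include <stdio.h>
    #include <stdlib.h>

    static void cleanup(void)
    {
        /* Must return normally: no exit(), no longjmp(). */
        fputs("releasing library resources\n", stderr);
    }

    int lib_init(void)
    {
        /* atexit() may fail: the C standard only guarantees room
         * for 32 registered functions. */
        if (atexit(cleanup) != 0) {
            fputs("warning: cleanup will not run at exit\n", stderr);
            return -1;  /* the caller decides whether this is fatal */
        }
        return 0;
    }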
CERT's Secure Coding standard has an entry about atexit when it is not used correctly:
ENV32-C. All atexit handlers must return normally
https://www.securecoding.cert.org/confluence/display/seccode/ENV32-C.+All+atexit+handlers+must+return+normally

How to trigger spurious wake-up within a Linux application?

Some background:
I have an application that relies on third party hardware and a closed source driver. The driver currently has a bug in it that causes the device to stop responding after a random period of time. This is caused by an apparent deadlock within the driver and interrupts proper functioning of my application, which is in an always-on 24/7 highly visible environment.
What I have found is that attaching GDB to the process, and immediately detaching GDB from the process results in the device resuming functionality. This was my first indication that there was a thread locking issue within the driver itself. There is some kind of race condition that leads to a deadlock. Attaching GDB was obviously causing some reshuffling of threads and probably pushing them out of their wait state, causing them to re-evaluate their conditions and thus breaking the deadlock.
The question:
My question is simply this: is there a clean way for an application to trigger all threads within the program to interrupt their wait state? One thing that definitely works (at least on my implementation) is to send a SIGSTOP followed immediately by a SIGCONT from another process (e.g., from bash):
kill -19 `cat /var/run/mypidfile` ; kill -18 `cat /var/run/mypidfile`
This triggers a spurious wake-up within the process and everything comes back to life.
I'm hoping there is an intelligent method to trigger a spurious wake-up of all threads within my process. Think pthread_cond_broadcast(...) but without having access to the actual condition variable being waited on.
Is this possible, or is relying on a program like kill my only approach?
The way you're doing it right now is probably the most correct and simplest. There is no "wake all waiting futexes in a given process" operation in the kernel, which is what you would need to achieve this more directly.
Note that if the failure-to-wake "deadlock" is in pthread_cond_wait but interrupting it with a signal breaks out of the deadlock, the bug cannot be in the application; it must actually be in the implementation of pthread condition variables. glibc has known unfixed bugs in its condition variable implementation; see http://sourceware.org/bugzilla/show_bug.cgi?id=13165 and related bug reports. However, you might have found a new one, since I don't think the existing known ones can be fixed by breaking out of the futex wait with a signal. If you can report this bug to the glibc bug tracker, it would be very helpful.
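If you would rather trigger the wake-up from a small C watchdog than from the shell, the equivalent of the kill one-liner above is simply two kill() calls; how you obtain the target PID and decide when to fire is up to you:

    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>

    /* Stop and immediately continue the target process. The SIGCONT
     * forces the blocked futex waits in that process to be restarted,
     * producing the spurious wake-up described above. */
    int poke_process(pid_t pid)
    {
        if (kill(pid, SIGSTOP) == -1) {
            perror("kill(SIGSTOP)");
            return -1;
        }
        if (kill(pid, SIGCONT) == -1) {
            perror("kill(SIGCONT)");
            return -1;
        }
        return 0;
    }

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <pid>\n", argv[0]);
            return 1;
        }
        return poke_process((pid_t)atol(argv[1])) ? 1 : 0;
    }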

Shared POSIX objects cleanup on process end / death

Is there any way to clean up POSIX shared synchronization objects, especially on a process crash? Unblocking locked POSIX semaphores is the most desired feature, but automatically 'collected' queues and shared memory regions would be nice too. Another thing to keep an eye on: we can't in general rely on signal handlers, because SIGKILL cannot be caught.
I see only one alternative: an external daemon that accepts subscriptions and 'keep-alive' requests and acts as a watchdog, so that when it stops receiving notifications about some object it can close or unlock that object according to a registered policy.
Does anyone have a better alternative or proposal? I have never worked seriously with POSIX shared objects before (sockets were enough for all my needs and are much more useful, in my opinion) and I did not find any applicable article. I'd gladly use sockets here but can't, for historical reasons.
Rather than using semaphores, you could use file locking to coordinate your processes. The big advantage of file locks is that they are released if the process terminates. You can map each semaphore onto a lock for one byte in a shared file and know that the locks will be released on exit; in most versions of Unix the bytes you lock don't even have to exist. There is code for this in Marc Rochkind's book Advanced Unix Programming, 1st edition; I don't know whether it's in the 2nd edition.
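A minimal sketch of that mapping, using fcntl() record locks; the lock-file path and the one-byte-per-semaphore layout are just illustrative choices. Note that fcntl() locks are owned by the process (not the thread) and are dropped if the process closes any descriptor for the file:

    #include <fcntl.h>
    #include <unistd.h>

    /* "Acquire semaphore i": write-lock byte i of a shared lock file.
     * The kernel releases the lock automatically when the process
     * dies, even on SIGKILL. */
    int sem_file_lock(int fd, int i, int wait)
    {
        struct flock fl = {
            .l_type   = F_WRLCK,
            .l_whence = SEEK_SET,
            .l_start  = i,   /* byte i stands in for semaphore i */
            .l_len    = 1,   /* the byte need not exist on disk  */
        };
        return fcntl(fd, wait ? F_SETLKW : F_SETLK, &fl);
    }

    int sem_file_unlock(int fd, int i)
    {
        struct flock fl = {
            .l_type   = F_UNLCK,
            .l_whence = SEEK_SET,
            .l_start  = i,
            .l_len    = 1,
        };
        return fcntl(fd, F_SETLK, &fl);
    }

    /* Usage: int fd = open("/tmp/mylocks", O_RDWR | O_CREAT, 0666);
     *        sem_file_lock(fd, 3, 1);   ... critical section ...
     *        sem_file_unlock(fd, 3);                              */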
I know this question is old, but another great solution is POSIX robust mutexes. They automatically unlock and enter an "inconsistent" state when their owner dies, and the next thread that attempts to lock the mutex gets an EOWNERDEAD error but succeeds in becoming the new owner. It can then clean up whatever state the mutex was protecting (which could be badly inconsistent due to the asynchronous termination of the previous owner!) and mark the mutex consistent again before unlocking it.
See the documentation on robust mutexes here:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutex_lock.html
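A minimal sketch of a process-shared robust mutex living in shared memory, including the EOWNERDEAD recovery path; the shm_open()/mmap() setup is omitted, and on Linux you link with -lpthread:

    #include <errno.h>
    #include <pthread.h>

    /* One-time initialization of a mutex placed in shared memory
     * (obtained elsewhere via shm_open()/mmap(), not shown). */
    int robust_mutex_init(pthread_mutex_t *m)
    {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
        int err = pthread_mutex_init(m, &attr);
        pthread_mutexattr_destroy(&attr);
        return err;
    }

    int robust_mutex_lock(pthread_mutex_t *m)
    {
        int err = pthread_mutex_lock(m);
        if (err == EOWNERDEAD) {
            /* The previous owner died holding the lock. We now own
             * it, but the protected state may be torn: repair it,
             * then mark the mutex consistent before proceeding. */
            /* ... recover shared state here ... */
            pthread_mutex_consistent(m);
            return 0;
        }
        return err;
    }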
The usual way is to work with signal handlers. Just catch the signals and call the cleanup functions.
But your watchdog daemon has some merit, too. It would surely make the system simpler to understand and manage. To make it simpler to administer, your application should start the daemon when it's not running, and the daemon should be able to clean up any residue from the last crash.
