Exiting gracefully from a multithreaded process


I'm running a multi-threaded C program (process?) that makes use of semaphores and pthreads. The threads keep interacting, blocking, waking and printing prompts on stdout continuously, without any human intervention. I want to be able to exit this process gracefully (after printing a message and shutting down all threads, not via a crude CTRL+C SIGINT) by pressing a keyboard character like #.
What are my options for getting such an input from the user?
What more relevant information could I provide that will help to solve this problem?
Edit:
All your answers sound interesting, but my primary question remains. How do I get user input, when I don't know which thread is currently executing? Also, semaphore blocking using sem_wait() breaks if signalled via SIGINT, which may cause a deadlock.

There is no difference in reading standard input from threads except if more than one thread is trying to read it at the same time. Most likely your threads are not all calling functions to read standard input all the time, though.
If you regularly need to read input from the user you might want to have one thread that just reads this input and then sets flags or posts events to other threads based on this input.
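A minimal sketch of such a dedicated input thread, assuming '#' is the quit character from the question. The names quit_requested and input_watcher are made up for illustration, and the descriptor is passed in so that the same code works on stdin (descriptor 0) or on a pipe.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <unistd.h>

/* Set by the input thread, polled by the worker threads. */
static atomic_int quit_requested = 0;

/* Thread body: reads bytes from the descriptor passed in arg until the
 * quit character '#' or end of input is seen. */
static void *input_watcher(void *arg)
{
    int fd = *(int *)arg;
    char c;

    while (read(fd, &c, 1) == 1) {
        if (c == '#') {
            atomic_store(&quit_requested, 1);
            break;
        }
    }
    return NULL;
}
```

The workers would test atomic_load(&quit_requested) at convenient points in their loops. Note that a terminal in its default (canonical) mode only hands over input after Enter is pressed, so the '#' is seen once the line is complete.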
If the kill character is the only input you need, or if this is only for debugging, you probably want to poll standard input occasionally for new data. One way is to set standard input to non-blocking mode and try to read from it now and then. If read fails with EAGAIN (or EWOULDBLOCK), no key was pressed; a return value of 0 means end of input, not "nothing yet". This method has some problems, though. I've never used stdio.h functions on a FILE * after setting the underlying file descriptor (an int) to non-blocking, but I suspect they may act oddly. You can avoid that by calling read directly instead of the stdio functions. There is also an issue I once read about: the blocking/non-blocking flag can be changed by another process if you fork and exec a new program that has access to a copy of that file descriptor. I'm not sure whether this is a problem on all systems. Non-blocking mode can be set or cleared with an fcntl call.
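As a rough sketch of the non-blocking approach, using read directly rather than stdio. The helper names set_nonblocking and try_read_char are invented for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Turn on O_NONBLOCK for fd; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Try to read one byte without blocking.
 * Returns 1 and stores the byte in *out if one was available,
 * 0 if nothing is waiting (EAGAIN/EWOULDBLOCK), and -1 on end of
 * input or any other error. */
static int try_read_char(int fd, char *out)
{
    ssize_t n = read(fd, out, 1);

    if (n == 1)
        return 1;
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return -1;
}
```

A thread would call set_nonblocking(0) once and then try_read_char(0, &c) from time to time, shutting down when it returns 1 with c == '#'.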
Alternatively, you can use one of the polling functions with a zero timeout to see whether there is data ready. The poll system call is probably the simplest, but there is also select. Various operating systems have other polling functions.
#include <poll.h>
...
/* Returns 0 if no data is available on stdin,
 *         > 0 if there is data ready,
 *         < 0 if there is an error.
 */
int poll_stdin(void) {
    struct pollfd pfd = { .fd = 0, .events = POLLIN };
    /* Since we only ask for POLLIN we assume that it is the only thing
     * the kernel would have put in pfd.revents */
    return poll(&pfd, 1, 0);
}
You can call this function from one of your threads, and as long as it returns 0 you just keep going. When it returns a positive number, you need to read a character from stdin to see what it was. Note that if you are using the stdio functions on stdin elsewhere, there could already be other characters buffered in front of the new character. poll tells you that the operating system has something new for you, not what C's stdio has.
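Putting the zero-timeout poll and the follow-up read together might look like the sketch below. The function name check_for_quit and the '#' quit character are assumptions taken from the question, and read is used instead of stdio to sidestep the buffering issue just described.

```c
#include <poll.h>
#include <unistd.h>

/* Returns 1 if the quit character '#' was read from fd, 0 if no input
 * was waiting or a different byte was read, and -1 on error or end of
 * input. Never blocks when fd refers to a pipe or terminal. */
static int check_for_quit(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    char c;
    int ready = poll(&pfd, 1, 0);      /* zero timeout: return at once */

    if (ready < 0)
        return -1;
    if (ready == 0)
        return 0;
    if (read(fd, &c, 1) != 1)          /* bypasses stdio buffering */
        return -1;
    return c == '#';
}
```

One of the worker threads could call check_for_quit(0) once per loop iteration and start the shutdown when it returns 1.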
If you are regularly reading from standard input in other threads then things just get messy. I'm assuming you aren't doing that (because if you are and it works correctly you probably wouldn't be asking this question).

You would have a thread listening for keyboard input, and then it would join() the other threads when receiving # as input.
Another way is to trap SIGINT and use it to handle the shutdown of your application.

The way I would do it is to keep a global int "should_die" or something, whose range is 0 or 1, and another global int "died," which keeps track of the number of threads terminated. should_die and died are both initially zero. You'll also need two semaphores to provide mutex around the globals.
At a certain point, a thread checks the should_die variable (after acquiring the mutex, of course). If it should die, it acquires the died_mutex, ups the died count, releases the died_mutex, and dies.
The main initial thread periodically wakes up, checks that the number of threads that have died is less than the number of threads, and goes back to sleep. The main thread dies when all the other threads have checked in.
If the main thread doesn't spawn all the threads itself, a small modification would be to have "threads_alive" instead of "died". threads_alive is incremented when a thread is created, and decremented when the thread dies.
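The scheme above could be sketched roughly as follows. This version swaps the two semaphores for a single pthread mutex guarding both globals, and NUM_WORKERS and the 1 ms naps are placeholders rather than anything from the description.

```c
#include <pthread.h>
#include <time.h>

#define NUM_WORKERS 4            /* hypothetical thread count */

static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static int should_die = 0;       /* 0 = keep running, 1 = shut down */
static int died = 0;             /* workers that have checked in */

static void nap(void)
{
    struct timespec ts = { 0, 1000000L };   /* 1 ms, stand-in for work */
    nanosleep(&ts, NULL);
}

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&state_lock);
        if (should_die) {
            died++;                          /* check in before exiting */
            pthread_mutex_unlock(&state_lock);
            return NULL;
        }
        pthread_mutex_unlock(&state_lock);
        nap();
    }
}

/* Start the workers, request shutdown, then wake up periodically until
 * every worker has checked in. Returns the final value of died. */
static int run_and_shut_down(void)
{
    pthread_t tids[NUM_WORKERS];
    int done = 0;

    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&tids[i], NULL, worker, NULL);

    pthread_mutex_lock(&state_lock);
    should_die = 1;
    pthread_mutex_unlock(&state_lock);

    while (done < NUM_WORKERS) {
        nap();
        pthread_mutex_lock(&state_lock);
        done = died;
        pthread_mutex_unlock(&state_lock);
    }
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(tids[i], NULL);
    return done;
}
```

This only works if every worker gets back to the check regularly; a worker that is blocked in sem_wait() never reaches it and has to be woken some other way.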
In general, terminating a multithreaded operation cleanly is a pain in the butt, and besides special cases where you can use things like the semaphore barrier design pattern, this is the best I've heard of. I'd love to hear it if you find a better, cleaner one.
~anjruu

In general, I have threads waiting on a set of events and one of those events is the termination event.
In the main thread, when I have triggered the termination event, I then wait on all the threads having exited.

SIGINT is actually not that difficult to handle and is often used for graceful termination. You need a signal handler and a way to tell all the threads that it's time to stop. One global flag that threads check in their loops and the signal handler sets might do. Same approach works for "on user command" termination, though you need a way to get the input from the terminal - either poll in a dedicated thread, or again, set the terminal to generate a signal for you.
The tricky part is to unblock waiting threads. You have to carefully design the notification protocol of who tells who to stop and what they need to do - put dummy message into a queue, set a flag and signal a cv, etc.
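One possible shape for the "set a flag and signal a cv" protocol is sketched below. The work counter pending_items is a hypothetical stand-in for a real queue, and all names are invented for the example.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_wake = PTHREAD_COND_INITIALIZER;
static bool stopping = false;
static int pending_items = 0;    /* hypothetical stand-in for a work queue */
static int processed = 0;

/* Worker: sleeps on the condition variable until there is work or a stop
 * request, drains remaining work, then exits. */
static void *waiting_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&q_lock);
    for (;;) {
        while (!stopping && pending_items == 0)
            pthread_cond_wait(&q_wake, &q_lock);
        if (pending_items > 0) {
            pending_items--;
            processed++;             /* "process" one item */
            continue;
        }
        break;                       /* stopping and nothing left to do */
    }
    pthread_mutex_unlock(&q_lock);
    return NULL;
}

static void submit_item(void)
{
    pthread_mutex_lock(&q_lock);
    pending_items++;
    pthread_cond_signal(&q_wake);
    pthread_mutex_unlock(&q_lock);
}

/* Wake every blocked worker so that each one sees the stop flag. */
static void request_stop(void)
{
    pthread_mutex_lock(&q_lock);
    stopping = true;
    pthread_cond_broadcast(&q_wake);
    pthread_mutex_unlock(&q_lock);
}
```

The broadcast matters here: a single pthread_cond_signal would wake only one of the blocked workers and leave the rest waiting forever.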

Related

Cancelling calculation early using pthreads

I have a program in c where I want to do some calculations which may or may not take a very long time. It is hard to know beforehand how much time the calculations will take. The program has a cli so right now I usually do something like this
./program
do calculation 243
and it starts calculating. If I want to cancel it because it takes too much time I press ctrl+c and restart the program with another calculation. Now I would like the program to cancel the calculation itself after either q has been pressed or, for example, 10 seconds have passed.
I have found a way which seems to do what I expect using pthreads. I'm however wondering if this is recommended or if there are for example any memory leaks or other things that can happen.
The following is my code
void *pthread_getc(void *ptr) {
    char c = '\0';
    while (c != 'q')
        c = getc(stdin);
    pthread_cancel((pthread_t)ptr);
    return NULL;
}
void *pthread_sleep(void *ptr) {
    sleep(10);
    pthread_cancel((pthread_t)ptr);
    return NULL;
}
void pthread_cancellable(void *(*ptr)(void *), struct arg *arg) {
    pthread_t thread_main, thread_getc, thread_sleep;
    pthread_create(&thread_main, NULL, ptr, (void *)arg);
    pthread_create(&thread_getc, NULL, pthread_getc, (void *)thread_main);
    pthread_create(&thread_sleep, NULL, pthread_sleep, (void *)thread_main);
    pthread_join(thread_main, NULL);
    pthread_cancel(thread_getc);
    pthread_cancel(thread_sleep);
    pthread_join(thread_getc, NULL);
    pthread_join(thread_sleep, NULL);
}
the idea being that both pthread_getc and pthread_sleep can cancel main, and once main is cancelled so are these two. Then I simply call pthread_cancellable where the first argument is a function doing the calculation and the second argument is the arguments to the calculating function.
Can something go wrong with memory leaks here or something else? Is there an easier/better way to do this in c?
What happens if main is cancelled twice, or if a thread gets cancelled when it's already done?

Can something go wrong with memory leaks here or something else?
If the program is going to terminate after aborting the computation then there is no issue with memory leaks. The system does not rely on processes to clean up after themselves -- it will reclaim all memory allotted to the process no matter how the process used it.
But your code violates the #1 rule of pthread_cancel(): never call pthread_cancel(). And although monitoring stdin for a q keystroke could work, that's a bit odd, and it potentially gets in the way of using stdin for something else you want to add to your program later.
Is there an easier/better way to do this in c?
Yes. In the first place, if the objective is simply to terminate the program at timeout / user interrupt, then do that. That is, have any thread call exit() when you want to terminate. You do not need to cancel any threads for that.
In the second place, I don't see what is gained by implementing a custom keyboard action (type 'q' to abort) when the standard interrupt signal sent by Ctrl-C works fine, and you even get the latter for free. If you want or need to perform some kind of extra behavior in response to an interrupt signal (before or instead of terminating), then register a handler for it.
There are multiple ways you could implement the early termination behavior, but here are outlines of two I like:
No-frills abortion upon timeout (or Ctrl-C):
Only the program's initial thread is needed.
Before it launches the computation, it creates and starts an interval timer (timer_create()) to count down the timeout. Configure the timer to raise SIGINT when it expires.
That's it. You get termination via the keyboard (albeit with Ctrl-C as you already do, not 'q') and the same termination behavior as far as an external observer can see in the event of a timeout.
optional addition 1:
If desired, you can install a handler for SIGINT to get extra or different behavior upon cancellation than you otherwise would. Note, however, that there are significant limits on what a signal handler may do. For example, maybe you want to emit a message to stderr (use write(), not fprintf() for such things), or you want to exit() with non-zero status instead of terminating (directly) because of the signal.
optional addition 2:
If the program reaches a point where it is not finished but it no longer wants to be terminated when the timeout is reached then it may at that point use timer_delete() to disable the timer.
With-frills abortion upon timeout (or Ctrl-C):
If you want to perform work in response to abort of the computation that is unsuited for a signal handler (too much, needs to call functions that are not async-signal-safe, ...) then you need a thread to do that in, and additional control structures and mechanisms. This is one way to do it:
1. Create and initialize a mutex, a condition variable, and a flag of type sig_atomic_t, all at file scope. The contract for these is that the flag may be accessed (read or write) only by a thread that currently holds the mutex locked, and that the mutex is the same that will be associated with all waits on the CV.
2. Install a signal handler for SIGINT that
   - locks the mutex
   - provided that the flag does not indicate completion, updates it to indicate cancellation
   - unlocks the mutex
   - broadcasts to the CV
3. The last thing the computational thread will do after completing its work is (with the mutex locked) set the flag to a value indicating completion, and then broadcast to the CV.
4. The initial thread will then do this:
   - Setup as described in the previous points
   - lock the mutex
   - Create / start an interval timer (timer_create()) that raises SIGINT when it expires, after the chosen timeout period.
   - Start the computational thread
   - loop while the flag indicates ongoing computation. In the loop body
     - perform a wait on the CV
   - The computation having either completed successfully or been canceled at this point, perform whatever final actions are appropriate and then terminate, either by returning from main() or by calling exit().
That's still pretty clean, gets you both timeout-based and keyboard-based cancellation (albeit the latter with Ctrl-C instead of 'q'), puts all the cancellation handling in one place, and requires only one thread in addition to the computational one.
optional addition: abort in response to 'q'
Although I do not recommend it, if you really must have that termination by typing 'q', then you can set up another thread that monitors for that keypress / character, and performs a raise(SIGINT) if it sees it.

Will signals be delivered to a program blocked on POSIX semaphore?

This really is two questions, but I suppose it's better they be combined.
We're working on a client that uses an asynchronous TCP connection. The idea is that the program will block until a certain message is received from the server, which will invoke a SIGPOLL handler. We are using a busy-waiting loop, basically:
var = 1;
while (var) usleep(100);
//...and somewhere else
void sigpoll_handler(int signum){
    ......
    var = 0;
    ......
}
We would like to use something more reliable instead, like a semaphore. The thing is, when a thread is blocked on a semaphore, will the signal get through still? Especially considering that signals get delivered when it switches back to user level; if the process is off the runqueue, how will it happen?
Side question (just out of curiosity):
Without the "usleep(100)" the program never progresses past the while loop, although I can verify the variable was set in the handler. Why is that? Printing changes its behaviour too.
Cheers!

[too long for a comment]
Accessing var from inside the signal handler invokes undefined behaviour (at least for a POSIX conforming system).
From the related POSIX specification:
[...] if the process is single-threaded and a signal handler is executed [...] the behavior is undefined if the signal handler refers to any object [...] with static storage duration other than by assigning a value to an object declared as volatile sig_atomic_t [...]
So var shall be defined:
volatile sig_atomic_t var;
The busy-waiting while loop can be replaced by a single call to a blocking pause(), as it will return on reception of the signal.
From the related POSIX specification:
The pause() function shall suspend the calling thread until delivery of a signal whose action is either to execute a signal-catching function or to terminate the process.
Using pause(), btw, makes any global flag like var redundant, not to say needless.

Short answer: yes, the signal will get through fine with a good implementation.
If you're going to be using a semaphore to control the flow of the program, you'll want to have the listening be on one child with the actual data processing be on another. This will then put the concurrency fairness in the hands of the OS which will make sure your signal listening thread gets a chance to check for a signal with some regularity. It shouldn't ever be really "off the runqueue," but cycling through positions on the runqueue instead.
If it helps you to think about it, what you have right now seems to basically be a very rough implementation of a semaphore on its own -- a shared variable whose value will stop one block of code from executing until another code block clears it. There isn't anything inherently paralyzing about a semaphore on a system level.
I kind of wonder why whatever function you're using to listen for the SIGPOLL isn't doing its own blocking, though. Most of those utilities that I've seen will stop their calling thread until they return a value. Basically they handle the concurrency for you and you can code as if you were dealing with a normal synchronous program.
With regard to the usleep loop: I'd have to look at what the optimizer's doing, but I think there are basically two possibilities. I think it's unlikely, but it could be that the no-body loop is compiling into something that isn't actually checking for a value change and is instead just looping. More likely to me would be that the lack of any body steps is messing up the underlying concurrency handling, and the loop is executing so quickly that nothing else is getting a chance to run -- the queue is being flooded by loop iterations and your signal processing can't get a word in edgewise. You could try just watching it for a few hours to see if anything changes; theoretically if it's just a concurrency problem then the random factor involved could clear the block on its own with a few billion chances.

halting a client server program

I am sorry for the basicness of this question, but I am having an issue here. I have a client-server program. I don't know beforehand how many connections will come, but they are not infinite. At the end, after all connections are closed, some results are output. The problem I am having is that connections are accepted in an infinite while loop; how can it be stopped so the results can be output?
Thanks

You need some condition to break out of your loop. In your case a timeout would probably work best: if you don't get any new clients for x seconds, you stop looking for clients. The same goes for any form of connection error.
Anything more requires looking at the code you are using.

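Treating an EINTR error from accept(2) as the cue to terminate the program, and then hitting ^C, usually works (this needs a SIGINT handler installed, so that accept is interrupted rather than the process killed).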

You could install a handler for the SIGTERM signal which would set a global volatile sig_atomic_t variable, and test that variable in your multiplexing loop (probably around poll or select). Remember that signal handlers cannot call many functions (only the async-signal-safe ones).
Handling SIGTERM gracefully is expected of most Linux or POSIX servers.
You could consider using an event handling library like libev, libevent etc.

Although my background is with Windows NT the function "names" are ones that name generic threading or process functions that should be available in any multi-threading environment.
If the main thread can determine when the child thread in question should terminate, it can do this either by having the child thread loop on a boolean, such as "terminate_condition", or by terminating the thread through its handle.
// child thread
terminate_condition=FALSE;
while (!terminate_condition)
{
    // accept connections
}
child_thread_done=TRUE;
// output results
exit_thread ();

// main thread
child_thread_done=FALSE;
child_thread=create_thread (...);
// monitor connections to determine when done
terminate_condition=TRUE;
while (!child_thread_done)
{
    sleep (1);
}
// or maybe output results here?
exit_process ();
This controlled termination solution requires that only one thread writes to the child_thread_done boolean and that any other thread only reads.
Or
// child thread
while (1)
{
    // accept connections
}

// main thread
child_thread=create_thread (...);
// monitor connections to determine when done
kill_thread (child_thread);
// output results
exit_process ();
The second form is messier since it simply kills the child thread. In general it is better to have the child thread perform a controlled termination, especially if it has allocated resources (which become the responsibility of the process as a whole rather than just the allocating thread).
If there are many child threads working with connections a synchronized termination mechanism is necessary: either a struct with as many members as there are child threads (a terminating thread sets its "terminated" boolean to true, terminates and the main thread monitors the struct to make sure all child "terminated" booleans are true before proceeding) or a counter containing the number of child threads operating (when a child is about to terminate it takes exclusive control of the counter via a spinlock, decrements it and frees the lock before terminating: the main thread doesn't do anything before the counter contains zero).

Multi-threaded reads from one pipe

I'm implementing a system that runs game servers. I have a process (the "game controller") that creates two pipe pairs and forks a child. The child dups its STDIN to one pipe, and dups its STDOUT and STDERR to the other, then runs execlp() to run the game code.
The game controller will have two threads. The first will block in accept() on a named UNIX socket receiving input from another application, and the second thread is blocking read()ing the out and error pipe from the game server.
Occasionally, the first thread will receive a command to send a string to the stdin pipe of the game server. At this point, somehow I need to stop the second thread from read()ing so that the first thread can read the reply from the out and error pipe.
(It is worth noting that I will know how many characters/lines long the reply is, so I will know when to stop reading and let the second thread resume reading, resetting the process.)
How can I temporarily switch the read control to another thread, as above?

There are a couple of options. One would be to have the second thread handle all of the reading, and give the first thread a way to signal it to tell it to pass the input back. But this will be somewhat complicated; you will need to set up a method for signalling between the threads, and make sure that the first thread tells the second thread that it wants the input before the second thread reads it and handles it itself. There will be potential for various kinds of race conditions that could make your code unpredictable.
Another option is to avoid using threads at all. Just use select(2) (or poll(2)) to allow one thread to wait for activity on several file descriptors at once. select allows you to indicate the set of file descriptors that you are interested in. As soon as any activity happens on one of them (a connection is available to accept, data is available to read), select will return, indicating the set of file descriptors that are ready. You can then accept or read on the appropriate descriptors, and when you are done, loop and call select again to wait for the next I/O event.
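A sketch of the select step for two descriptors, for instance the listening socket and the read end of the game server's output pipe. The function name and the bitmask return convention are choices made for this example.

```c
#include <sys/select.h>
#include <unistd.h>

/* Block until fd_a or fd_b is readable.
 * Returns a bitmask: bit 0 set if fd_a is ready, bit 1 set if fd_b is
 * ready, or -1 on error. */
static int wait_for_either(int fd_a, int fd_b)
{
    fd_set readable;
    int maxfd = fd_a > fd_b ? fd_a : fd_b;

    FD_ZERO(&readable);
    FD_SET(fd_a, &readable);
    FD_SET(fd_b, &readable);
    if (select(maxfd + 1, &readable, NULL, NULL, NULL) == -1)
        return -1;
    return (FD_ISSET(fd_a, &readable) ? 1 : 0)
         | (FD_ISSET(fd_b, &readable) ? 2 : 0);
}
```

The single thread would loop on this, calling accept() when bit 0 is set and read() when bit 1 is set. Since only one thread ever reads the pipe, there is no handover of "read control" to coordinate.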

Forcing a function to end using SIGALRM in C

Right now I have a function connected to SIGALRM that goes off after 1 second and re-arms itself to go off in another second every time. There's a test in the logic of the SIGALRM function I wrote to see if a certain timeout has been reached, and when it has I need it to kill a function that's running. Does anybody know how I can do this?
I forgot to mention: in the function that needs to be killed it waits on scanf() and the function needs to die even if scanf() hasn't returned yet.

One approach that might be worth looking into is using select to poll stdin and see if any data is ready. select lets you wait for some period of time on a file descriptor, controlling when you can be interrupted and by what, and seems like it's perfect here. You could just sit in a loop waiting for up to a second, then failing gracefully if no data is available. That way, SIGALRM wouldn't need to kill the function; it would take care of that all by itself.
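As a sketch of that idea, with read() standing in for scanf() (mixing select with stdio buffering is awkward). The name read_with_timeout and the millisecond parameter are assumptions for the example.

```c
#include <sys/select.h>
#include <unistd.h>

/* Wait up to timeout_ms milliseconds for input on fd, then read what is
 * there. Returns the number of bytes read (> 0), 0 on timeout, and -1 on
 * error or end of input. */
static ssize_t read_with_timeout(int fd, char *buf, size_t size, long timeout_ms)
{
    fd_set readable;
    struct timeval tv;
    int ready;
    ssize_t n;

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    FD_ZERO(&readable);
    FD_SET(fd, &readable);

    ready = select(fd + 1, &readable, NULL, NULL, &tv);
    if (ready == -1)
        return -1;
    if (ready == 0)
        return 0;                 /* timed out: the caller can give up */
    n = read(fd, buf, size);
    return n > 0 ? n : -1;
}
```

The text that comes back can then be parsed with sscanf() once a full line has arrived.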

Not sure exactly what you're asking or what the structure of the program is. If I understand correctly: some function is running and you want to terminate it if it's been running for X time. You have a SIGALRM wake up every second, and that will check the running time of the other function and do the termination.
How do you plan to kill the function? Is it a function in the same process, or is it a separate process. Is your question how to terminate it or how to tell when it needs to be terminated?
I've done something which I believe is similar. I had a multi-threaded application with a structure which contained information about the threads I wished to monitor. The structure contained a member variable "startTime". My monitoring (SIGALRM) function had access to a list of threads. When the monitor woke up it would traverse the list, compare the current time to each thread's startTime, and send a message to the function if it had exceeded its allowed runtime.
Does this help at all?

You could use a (global) variable to communicate between the signal handler and the function that should be stopped. The function then would check that variable to see if it should still continue running or if it should exit.
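Something like this:
#include <signal.h>

/* volatile sig_atomic_t is the type a signal handler can safely write to */
volatile sig_atomic_t worker_expired = 0;

void worker(void) {
    while (!worker_expired) {
        // ...
    }
}

/* signal handlers take the signal number as their argument */
void sig_alrm(int signo) {
    (void)signo;
    worker_expired = 1;
}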

If you want the signal to terminate IO operations, you need to make sure it's an interrupting signal handler. On modern systems, system calls interrupted by signals automatically restart unless you specify otherwise. Use the sigaction function rather than the signal function to setup your signal handlers if you want control over things like this. With sigaction, unless you specify SA_RESTART, signal handlers can interrupt.
If you're using file-descriptor IO functions like read, you should now get the effects you want.
If you're using stdio functions like fscanf, getting interrupted by a signal will put the FILE into an error state that can only be cleared by clearerr, and will lose any partial input in the buffer. Interrupting signals do not mix very well with stdio unless you just want to abort all operations on the file and close it when a signal is received.

So ... to restate slightly: it isn't so much that you want to kill the function as that you want any pending i/o to terminate and the function to exit.
I would either:
Use select() to periodically wake up and check a flag set by the signal handler. If the flag isn't set and there's no input pending, then loop and call select() again.
I suspect that your SIGALRM handler is doing more than just checking this one timer, so using pselect() to check for I/O or SIGALRM is probably not an option for you. I wonder if you could grab a user-defined signal and pass that in pselect(); then your alarm handler would send that user-defined signal.
Regarding choice 1, if SIGALRM is waking every second then you can adjust the time that select() sleeps to be within your maximum error latency. In other words, assume that the timeout occurs immediately after the call to select(); then it will take until select() wakes up to detect the flag set by the SIGALRM handler. So if select() wakes up 10 times per second, it could take up to 1/10 second to detect the setting of the "give up" flag (set by the SIGALRM handler).
