In Linux, if two threads are created and both of them are running, when one of them calls recv() or any IO syscall that blocks when no data is available, what would happen to the whole process?
Will the other thread block also? I guess this depends on how threading is implemented. If thread library is in user space and kernel totally unaware of the threads within process, then process is the scheduling entity and thus both threads got blocked.
Further, if the other thread doesn't block because of this, can it then send() data via the same socket, which is blocking the recv thread? Duplexing?
Any ideas?
You're absolutely right that the blocking behavior will depend on if the thread is implemented in kernel space, or in user space. If threading is implemented purely in user space (that is, the kernel is completely uninvolved with the threading), then any blocking entry point into the kernel will need to be wrapped with some non-blocking variant that can simulate blocking semantics to its calling "thread" (e.g. using AIO to send / recv data instead of blocking, and the completion callback makes the thread runnable, again).
In Linux (and every other extant major OS I can think of), threading is implemented at the kernel level, or similar, and a blocking call into the kernel will not cause all other threads to block.
Yes, you can send() to a socket for which another thread is blocked on recv().
Blocking calls in one thread should not affect other threads.
If the blocked thread locks a mutex prior to entering the blocked call and the second thread attempts to lock the same mutex, then they the second thread would need to wait for the blocking call to finish and for the first thread to release the lock.
It's completely possible to implement threads in user-space such that one thread can proceed while another thread block on I/O.
The non-blocked thread should be able to send on the socket while the other thread is blocking on it (I've written such code).
Related
I can't seem to find a useful definition for "blocking" (or for that matter "non-blocking") when used in relation to POSIX C functions.
For example read() may be called in blocking or non-blocking mode on a FIFO pipe. If called in blocking mode, it will block until it's opened elsewhere for writing.
Will this blocking just seize up the thread? Or the process? Or will it pause the rendering of the multiverse?
Blocking means that the thread is de-scheduled off the CPU while waiting for an event to happen. When a thread is de-scheduled it doesn't consume any CPU cycles and allows other threads to make progress or put the CPU in a lower power state if there are no other threads waiting to run.
One thread blocking doesn't affect other threads you may have in the process. A blocking call only blocks the calling thread.
For example, read blocks when there is no data in the pipe to read. When data arrives it "unblocks" and the read call returns.
In the kernel each file description and other objects one can block on (e.g. mutex or condition_variable) have a list of waiting threads. When a thread blocks on an object it is appended to that object's wait list and de-scheduled off the CPU. Whenever an event for the object occurs the kernel checks the wait list for waiting threads for such an event and if there are any one or multiple threads get scheduled again and the blocking calls eventually return.
In non-blocking mode such calls do not block but return immediately an error code with errno being set to EWOULDBLOCK or EAGAIN, which are nowadays two different names for the same errno value. (pthread calls do not set errno but return the error value directly).
This is kind of generic question - however I met this problem several times already and I still haven't found the best possible solution.
Let's imagine you have program (e.g. HTTP application server) that is multithreaded and that communicates over sockets (TCP, Unix, ...). Main thread is using asynchronous IO and select() or poll() POSIX calls to dispatch traffic from/to sockets. There are also worker threads that process requests and provides responses. To send response back to the client, worker thread synchronises with main thread (that polls) 'somehow'. Core of the questions is 'how' - in terms of what is efficient. I can use pipe() - socket based IPC mechanism - but this seems to me as quite huge overhead. I tend to use some pthread IPC techniques like mutex, condition variables etc. … but these will not work with select() or poll().
Is there a common technique in POSIX (and surroundings) that address this conflict?
I guess on Windows there is WaitForMultipleObjects() function that allows that.
Example program is crafted to illustrate an issue, I know that I can design master/worker pattern in a different way but this is not what I'm asking for. I have other cases where I'm in the same situation.
You could use a signal to poke the worker thread, which will interrupt the select() call and return EINTR. This gets even easier to do with pselect().
For this to work:
decide on a signal (or allocate a real-time signal)
attach an empty handler function to it (if the signal were ignored, the system call would be automatically restarted)
block the signal, at least in the worker thread.
use the signal mask argument in pselect() to unblock the signal while waiting.
Between threads, you can use pthread_kill to deliver the signal to the worker thread specifically. When another process should send the signal, you can either make sure the signal is blocked in all but the worker thread (so it will be delivered there), or use the signal handler to find out whether the signal was sent to the worker thread, and use pthread_kill to forward it explicitly (the worker thread still doesn't need to do anything in the signal handler).
Due to laziness on my part, I don't have a source code viewer online, but you can clone the LibreVISA git tree, and take a look at src/messagepump.cpp, where this method is used to poke the worker thread after another thread added a file descriptor to the watch list.
Simon Richthers answer is v good.
Another alternative might be to make main thread only responsible for listening for new connections and starting up a worker thread with the connection information so that the worker is responsible for all subsequent ‘transactions’ from this source.
My understanding is:
Main thread uses select.
Worker threads processes requests forwarded to it by main thread.
So need to synchronize between workers and main thread e.g. when
worker finishes a transaction need to send response back to main
thread which in turn forwards the response back to the source.
Why don't you remove the problem of having to synchronize between the worker thread and the main thread by making the worker thread responsible for all transactions from a particular connection?
Thus the main thread is only responsible for listening for new connections and starting up a worker thread with the connection information i.e. the file descriptor for the new connection.
First of all, the way to wake another thread is to use the pthread_cond_wait / pthread_cond_timedwait calls in thread A to wait, and for thread B to use pthread_cond_broadcast / pthread_cond_signal to pick it up. So, for instance if B is a producer and A is the consumer, the producer might add items to a linked list protected with a mutex. There would be an associated conditional variable such that after the addition of the item, it could wake thread B such that it went to see if any new items had arrived on the list, and if so removed them. I say 'associated' as then the same mutex can be associated with the condition variable as protects the list.
So far so good. Now you mention asynchronous I/O. What I've wanted to do several times is select() or poll() on a set of FDs and a set of condition variables, so the select(), poll() is interrupted when the condition variable is broadcasted to. There is no easy way of doing this directly; you cannot simply mix and match.
You thus need to do one of two things. Either:
work around the problem (for instance, use a self-connected pipe() to send one byte to wake the select() up either instead of the condition variable, as well as the condition variable, or from some additional thread waiting on the condition variable; or
convert to a more threaded model. IE use one thread for sending, one thread for receiving, and use a producer / consumer model, so the sender thread simply removes from a list / buffer and sends (blocking if necessary), and the received waits for I/O (blocking if necessary) and adds it to the list (this is what you put in italics at the end).
The second is a major design change for those of us brought up on asynchronous I/O, and the first is ugly. You are not the first to be dismayed by this, but I've not found an easy way around it. Re the first an inefficiency, if you only write one character to wake the select loop to the self-pipe, I don't think you are going to see too much inefficiency.
What is the meaning of "blocking system call"?
In my operating systems course, we are studying multithreaded programming. I'm unsure what is meant when I read in my textbook "it can allow another thread to run when a thread make a blocking system call"
A blocking system call is one that must wait until the action can be completed. read() would be a good example - if no input is ready, it'll sit there and wait until some is (provided you haven't set it to non-blocking, of course, in which case it wouldn't be a blocking system call). Obviously, while one thread is waiting on a blocking system call, another thread can be off doing something else.
For a blocking system call, the caller can't do anything until the system call returns. If the system call may be lengthy (e.g. involve file IO or networking IO) this can be a bad thing (e.g. imagine a frustrated user hammering a "Cancel" button in an application that doesn't respond because that thread is blocked waiting for a packet from the network that isn't arriving). To get around that problem (to do useful work while you wait for a blocking system call to return) you can use threads - while one thread is blocked the other thread/s can continue doing useful work.
The alternative is non-blocking system calls. In this case the system call returns (almost) immediately. For lengthy system calls the result of the system call is either sent to the caller later (e.g. as some sort of event or message or signal) or polled by the caller later. This allows you to have a single thread waiting for many different lengthy system calls to complete at the same time; and avoids the hassle of threads (and locking, race conditions, the overhead of thread switches, etc). However, it also increases the hassle involved with getting and handling the system call's results.
It is (almost always) possible to write a non-blocking wrapper around a blocking system call; where the wrapper spawns a thread and returns (almost) immediately, and the spawned thread does the blocking system call and either sends the system call's results to the original caller or stores them where the original caller can poll for them.
It is also (almost always) possible to write a blocking wrapper around a non-blocking system call; where the wrapper does the system call and waits for the results before it returns.
I would suggest having a read on this very short text:
http://files.mkgnu.net/files/upstare/UPSTARE_RELEASE_0-12-8/manual/html-multi/x755.html
In particular you can read there why blocking system calls can be a worry with threads, not just with concurrent processes:
This is particularly problematic for multi-threaded applications since
one thread blocking on a system call may indefinitely delay the update
of the code of another thread.
Hope it helps.
A blocking system call is a system call by means of which any process is requesting some service from the system but that service is not currently available. So that particular system call blocks the process.
If you want to make it clear in context with multi threading you can go through the link...
I am writing a simple multi-client server communication program using POSIX threads in C. I am creating a thread every time a new client is connected, i.e. after the accept(...) routine in main().
I have put the accept(...) and the pthread_create(...) inside a while(1) loop, so that server continues to accept clients forever. Now, where should I write the pthread_join(...) routine after a thread exits.
More Info: Inside the thread's "start routine", I have used poll() & then recv() functions, again inside a while(1) loop to continuously poll for availability of client and receive the data from client, respectively. The thread exits in following cases:
1) Either poll() returns some error event or client hangs up.
2) recv() returns a value <= 0.
Language: C
Platform: Suse Linux Enterprise Server 10.3 (x86_64)
First up starting a new thread for each client is probably wasteful and surely won't scale very far. You should try a design where a thread handles more than one client (i.e. calls poll on more than one socket). Indeed, that's what poll(2), epoll etc were designed for.
That being said, in this design you likely needn't join the threads at all. You're not mentioning any reason why the main thread would need information from a thread that finished. Put another way, there's no need for joining.
Just set them as "detached" (pthread_detach or pthread_attr_setdetachstate) and they will be cleaned up automatically when their function returns.
The problem is that pthread_join blocks the calling thread until the thread exits. This means you can't really call it and hope the thread have exited as then the main thread will not be able to do anything else until the thread have exited.
One solution is that each child thread have a flag that is polled by the main thread, and the child thread set that flag just before exiting. When the main thread notices the flag being set, it can join the child thread.
Another possible solution, is if you have e.g. pthread_tryjoin_np (which you should have since you're on a Linux system). Then the main thread in its loop can simply try to join all the child threads in a non-blocking way.
Yet another solution may be to detach the child threads. Then they will run by themselves and do not need to be joined.
Ah, the ol' clean shutdown problem.
Assuming that you may want to cleanly disconnect the server from all clients under some circumstance or other, your main thread will have to tell the client threads that they're to disconnect. So how could this be done?
One way would be to have a pipe (one per client thread) between the main thread and client thread. The client thread includes the file descriptor for that pipe in its call to poll(). That way the main thread can easily send a command to the client thread, telling it to terminate. The client thread reads the command when poll() tells it that the pipe has become ready for reading.
So your main thread can then send some sort of command through the pipe to the client thread and then call pthread_join() waiting for the client thread to tidy itself up and terminate.
Similarly another pipe (again one per client thread) can be used by the client thread to send information to the main thread. Instead of being stuck in a call to accept(), the main thread can be using poll() to wait for a new client connection and for messages from the existing client threads. A timeout on poll() also allows the main thread to do something periodically.
Welcome to the world of the actor model of concurrent programming.
Of course, if you don't need a clean shut down then you can just let threads terminate as and when they want to, and just ctrl c the program to close it...
As other people have said getting the balance of work per thread is important for efficient scaling.
Without keeping a list of current threads, I'm trying to see that a realtime signal gets delivered to all threads in my process. My idea is to go about it like this:
Initially the signal handler is installed and the signal is unblocked in all threads.
When one thread wants to send the 'broadcast' signal, it acquires a mutex and sets a global flag that the broadcast is taking place.
The sender blocks the signal (using pthread_sigmask) for itself, and enters a loop repeatedly calling raise(sig) until sigpending indicates that the signal is pending (there were no threads remaining with the signal blocked).
As threads receive the signal, they act on it but wait in the signal handler for the broadcast flag to be cleared, so that the signal will remain masked.
The sender finishes the loop by unblocking the signal (in order to get its own delivery).
When the sender handles its own signal, it clears the global flag so that all the other threads can continue with their business.
The problem I'm running into is that pthread_sigmask is not being respected. Everything works right if I run the test program under strace (presumably due to different scheduling timing), but as soon as I run it alone, the sender receives its own signal (despite having blocked it..?) and none of the other threads ever get scheduled.
Any ideas what might be wrong? I've tried using sigqueue instead of raise, probing the signal mask, adding sleep all over the place to make sure the threads are patiently waiting for their signals, etc. and now I'm at a loss.
Edit: Thanks to psmears' answer, I think I understand the problem. Here's a potential solution. Feedback would be great:
At any given time, I can know the number of threads running, and I can prevent all thread creation and exiting during the broadcast signal if I need to.
The thread that wants to do the broadcast signal acquires a lock (so no other thread can do it at the same time), then blocks the signal for itself, and sends num_threads signals to the process, then unblocks the signal for itself.
The signal handler atomically increments a counter, and each instance of the signal handler waits until that counter is equal to num_threads to return.
The thread that did the broadcast also waits for the counter to reach num_threads, then it releases the lock.
One possible concern is that the signals will not get queued if the kernel is out of memory (Linux seems to have that issue). Do you know if sigqueue reliably informs the caller when it's unable to queue the signal (in which case I would loop until it succeeds), or could signals possibly be silently lost?
Edit 2: It seems to be working now. According to the documentation for sigqueue, it returns EAGAIN if it fails to queue the signal. But for robustness, I decided to just keep calling sigqueue until num_threads-1 signal handlers are running, interleaving calls to sched_yield after I've sent num_threads-1 signals.
There was a race condition at thread creation time, counting new threads, but I solved it with a strange (ab)use of read-write locks. Thread creation is "reading" and the broadcast signal is "writing", so unless there's a thread trying to broadcast, it doesn't create any contention at thread-creation.
raise() sends the signal to the current thread (only), so other threads won't receive it. I suspect that the fact that strace makes things work is a bug in strace (due to the way it works it ends up intercepting all signals sent to the process and re-raising them, so it may be re-raising them in the wrong way...).
You can probably get round that using kill(getpid(), <signal>) to send the signal to the current process as a whole.
However, another potential issue you might see is that sigpending() can indicate that the signal is pending on the process before all threads have received it - all that means is that there is at least one such signal pending for the process, and no CPU has yet become available to run a thread to deliver it...
Can you describe more details of what you're aiming to achieve? And how portable you want it to be? There's almost certainly a better way of doing it (signals are almost always a major headache, especially when mixed with threads...)
In multithreaded program raise(sig) is equivalent to pthread_kill(pthread_self(), sig).
Try kill(getpid(), sig)
Given that you can apparently lock thread creation and destruction, could you not just have the "broadcasting" thread post the required updates to thread-local-state in a per-thread queue, which each thread checks whenever it goes to use the thread-local-state? If there's outstanding update(s), it first applies them.
You are trying to synchronize a set of threads.
From a design pattern point of view the pthread native solution for your problem would be a pthread barrier.