Implementing blocking between pthreads without conditional variables - c

I'm implementing a boss/worker design pattern using pthreads on Linux. I want to have a boss thread that constantly checks for work, and if there is work, then wakes up a sleeping worker to do the work. My question is: what type of IPC synchronization/mechanism should I use to achieve the least latency between my boss thread handing off to my worker, and my worker waking up?
The easy solution is to use Pthread conditional variables and call pthread_cond_signal in the boss thread, and pthread_cond_wait in each of the worker threads, but I'm wondering
is there something faster that I can use to implement the blocking and signaling? For example, how would using pipes between the boss and worker threads fare?
how can I measure the performance of one type of IPC versus another? For example, I see benchmarks for pipe()'s and fork()'s, but nothing for using pipe()'s as an interthread communication.
Let me know if I can clarify anything in my questions!
EDIT
As an example of how I would use pipe()'s to implement blocking between my worker and boss threads, the worker thread would read() a pipe, and since it's empty would then block on that read call until the boss calls write() on it.

The glibc implementation of pthreads uses the low-level "futex" locks to implement pthread_cond_wait() / pthread_cond_signal(). Futexes were designed to be a fast synchronisation primitive, so these are likely to outperform pipes or similar methods (at the very least, using pipes requires copying a byte to and from kernel space that isn't needed for futexes).
If pthread_cond_wait() / pthread_cond_signal() map well onto your problem (and it sounds like they do), then the only way to outperform them is likely to be to implement something on futexes yourself (for example, you could eliminate the handling of thread cancellation if you do not use that).
It is probably worthwhile benchmarking your application - unless your work units are very small indeed, then the condition variable wakeup latency is unlikely to dominate.

What you should do first is being sure you need something faster. Since pthread signaling is implemented using futex, where futex stands for fast user space mutex, I don't think you can out perform them.
If you have waiting threads, by definition you will have to wake them up, and this round trip through the kernel will be the source of your unwanted latency.
But what you should do is really think about your problem :
if you constantly have work to do, then your worker thread is always busy. Work will be done when previous work is finished, and you don't care about the latency.
If what matters is the latency between the boss detecting an event and the worker starting to work, then why do you use a boss -> worker pattern ?
My advice would be to look for a faster thing when you really need it, at this time you will probably have a much mre detailed question to ask. Maybe I am wrong, but it looks like you are trying to optimize preemptively, which as you perhaps know is the root of all evil. Of course, bad design can lead to massive rework, but here you are dealing with a very small detail of your real design decision which is using a boss / worker pattern.
Implement your design with pthread_signal, or perhaps semp_post() / sem_wait(), and then look where your latency really is, and if it is really a problem.

I would guess signal and wait would be the best. Most OS recognize threads and can have them just idle until the interrupt comes. With pipes the worker would have to keep waking up and checking the pipe for output. The best testing I've found for efficiency has usually been using the unix command to get the running time from start to finish(assuming the program isn't meant to keep running in the background), set up a script to do it a few times and compare.

Related

How does a process know that semaphore is available

I have a very basic doubt.
when a process is waiting on a semaphore , it goes into sleep state.
So no way it can poll the semaphore value.
Does kernel poll the semaphore value and if available sends a signal to all process waiting for it ? If so, wont it be too much overhead for the kernel.
Or does the signal() call internally notifies all the process waiting for the semaphore.
Please let me know on this.
The operating system schedules the process once more when the operating system is told by another process that it has done with the semaphore.
Semaphores are just one of the ways of interacting with the OS scheduler.
The kernel doesn't poll the semaphore; it doesn't need to. Every time a process calls sem_post() (or equivalent), that involves interaction with the kernel. What the kernel does during the sem_post() is look up whatever processes have previously called sem_wait() on the same semaphore. If one or more processes have called sem_wait(), it picks the process with the highest priority and schedules it. This shows up as that sem_wait() finally returning and that process carries on executing.
How This is Implemented Under the Hood
Fundamentally the kernel needs to implement something called an "atomic test and set". That is an operation where by the value of some variable can be tested and, if a certain condition is met (such as the value == 0) the variable value is altered (e.g. value = 1). If this succeeds, the kernel will do one thing, (like schedule a process), if this does not (because the condition value==0 was false) the kernel will do something difference (like put a process on the do-not-schedule list). The 'atomic' part is that this decision is made without anything else being able to look at and change the same variable at the same time.
There's several ways of doing this. One is to suspend all processes (or at least all activity within the kernel) so that nothing else is testing the value of the variable at the same time. That's not very fast.
For example, the Linux kernel once had something called the Big Kernel Lock. I don't know if this was used to process semaphore interactions, but that's the kind of thing that OSes used to have for atomic test & sets.
These days CPUs have atomic test & set op codes, which is a lot faster. The good ole' Motorola 68000 had one of these a long time ago; it took CPUs like the PowerPC and the x86 many, many years to get the same kind of instruction.
If you root around inside linux you'll find mention of futexes. a futex is a fast mutex - it relies on a CPU's test/set instruction to implement a fast mutex semaphore.
Post a Semaphore in Hardware
A variation is a mailbox semaphore. This is a special variation on a semaphore that is extremely useful in some system types where hardware needs to wake up a process at the end of a DMA transfer. A mailbox is a special location in memory which when written to will cause an interrupt to be raised. This can be turned into a semaphore by the kernel because when that interrupt is raised, it goes through the same motions as it would had something called sem_post().
This is incredibly handy; a device can DMA a large amount of data to some pre-arranged buffer, and top that off with a small DMA transfer to the mail box. The kernel handles the interrupt, and if a process has previously called sem_wait() on the mailbox semaphore the kernel schedules it. The process, which also knows about this pre-arranged buffer, can then process the data.
On a real time DSP systems this is very useful, because it's very fast and very low latency; it allows a process to receive data from some device with very little delay. The alternative, to have a full up device driver stack that uses read() / write() to transfer data from the device to the process is incredibly slow by comparison.
Speed
The speed of semaphore interactions depends entirely on the OS.
For OSes like Windows and Linux, the context switch time is fairly slow (in the order of several microseconds, if not tens of microseconds). Basically this means that when a process calls something like sem_post(), the kernel is doing a lot of different things whilst it has the opportunity before finally returning control to the process(es). What it's doing during this time could be, well, almost anything!
If a program has made use of a lot threads, and they're all rapidly interacting between themselves using semaphores, quite a lot of time is lost to the sem_post() and sem_wait(). This places an emphasis on doing a decent amount of work once a process has returned from sem_wait() before calling the next sem_post().
However on OSes like VxWorks, the context switch time is lightning fast. That is there's very little code in the kernel that gets run when sem_post() is called. The result is that a semaphore interaction is a lot more efficient. Moreover, and OS like VxWorks is written in such a way so as to guarantee that the time take to do all this sem_post() / sem_wait() work is constant.
This influences the architecture of one's software on these systems. On VxWorks, where a context switch is cheap, there's very little penalty in having a large number of threads all doing quite small tasks. On Windows / Linux there's more of an emphasis on the opposite.
This is why OSes like VxWorks are excellent for hard real time applications, and Windows / Linux are not.
The Linux PREEMPT_RT patch set in part aims to improve the latency of the linux kernel during operations like this. For example, it pushes a lot of device interrupt handlers (device drivers) up into kernel threads; these are scheduled almost just like any other thread. The idea is to reduce the amount of work that is being done by the kernel (and have more done by kernel threads), so that the work it still has to do itself (such as handling sem_post() / sem_wait()) takes less time and is more consistent about how long this takes. It still not a hard guarantee of latency, but it's a pretty good improvement. This is what we call a soft-realtime kernel. The impact though is that overall throughput of the machine can be lower.
Signals
Signals are nasty, horrible things that really get in the way of using things like sem_post() and sem_wait(). I avoid them like the plague.
If you are on a Linux platform and you do have to use signals, take a serious long look at signalfd (man page). This is a far better way of dealing with signals because you can choose to accept them at a convenient time (simply by called read()), instead of having to handle them as soon as they occur. Certainly if you're using epoll() or select() anywhere at all in a program then signalfd is the way to go.

Check if pthread is still alive in Linux C

I know similar questions have been asked, but I think my situation is little bit different. I need to check if child thread is alive, and if it's not print error message. Child thread is supposed to run all the time. So basically I just need non-block pthread_join and in my case there are no race conditions. Child thread can be killed so I can't set some kind of shared variable from child thread when it completes because it will not be set in this case.
Killing of child thread can be done like this:
kill -9 child_pid
EDIT: alright, this example is wrong but still I'm sure there exists way to kill a specific thread in some way.
EDIT: my motivation for this is to implement another layer of security in my application which requires this check. Even though this check can be bypassed but that is another story.
EDIT: lets say my application is intended as a demo for reverse engineering students. And their task is to hack my application. But I placed some anti-hacking/anti-debugging obstacles in child thread. And I wanted to be sure that this child thread is kept alive. As mentioned in some comments - it's probably not that easy to kill child without messing parent so maybe this check is not necessary. Security checks are present in main thread also but this time I needed to add them in another thread to make main thread responsive.
killed by what and why that thing can't indicate the thread is dead? but even then this sounds fishy
it's almost universally a design error if you need to check if a thread/process is alive - the logic in the code should implicitly handle this.
In your edit it seems you want to do something about a possibility of a thread getting killed by something completely external.
Well, good news. There is no way to do that without bringing the whole process down. All ways of non-voluntary death of a thread kill all threads in the process, apart from cancellation but that can only be triggered by something else in the same process.
The kill(1) command does not send signals to some thread, but to a entire process. Read carefully signal(7) and pthreads(7).
Signals and threads don't mix well together. As a rule of thumb, you don't want to use both.
BTW, using kill -KILL or kill -9 is a mistake. The receiving process don't have the opportunity to handle the SIGKILL signal. You should use SIGTERM ...
If you want to handle SIGTERM in a multi-threaded application, read signal-safety(7) and consider setting some pipe(7) to self (and use poll(2) in some event loop) which the signal handler would write(2). That well-known trick is well explained in Qt documentation. You could also consider the signalfd(2) Linux specific syscall.
If you think of using pthread_kill(3), you probably should not in your case (however, using it with a 0 signal is a valid but crude way to check that the thread exists). Read some Pthread tutorial. Don't forget to pthread_join(3) or pthread_detach(3).
Child thread is supposed to run all the time.
This is the wrong approach. You should know when and how a child thread terminates because you are coding the function passed to pthread_create(3) and you should handle all error cases there and add relevant cleanup code (and perhaps synchronization). So the child thread should run as long as you want it to run and should do appropriate cleanup actions when ending.
Consider also some other inter-process communication mechanism (like socket(7), fifo(7) ...); they are generally more suitable than signals, notably for multi-threaded applications. For example you might design your application as some specialized web or HTTP server (using libonion or some other HTTP server library). You'll then use your web browser, or some HTTP client command (like curl) or HTTP client library like libcurl to drive your multi-threaded application. Or add some RPC ability into your application, perhaps using JSONRPC.
(your putative usage of signals smells very bad and is likely to be some XY problem; consider strongly using something better)
my motivation for this is to implement another layer of security in my application
I don't understand that at all. How can signal and threads add security? I'm guessing you are decreasing the security of your software.
I wanted to be sure that this child thread is kept alive.
You can't be sure, other than by coding well and avoiding bugs (but be aware of Rice's theorem and the Halting Problem: there cannot be any reliable and sound static source code program analysis to check that). If something else (e.g. some other thread, or even bad code in your own one) is e.g. arbitrarily modifying the call stack of your thread, you've got undefined behavior and you can just be very scared.
In practice tools like the gdb debugger, address and thread sanitizers, other compiler instrumentation options, valgrind, can help to find most such bugs, but there is No Silver Bullet.
Maybe you want to take advantage of process isolation, but then you should give up your multi-threading approach, and consider some multi-processing approach. By definition, threads share a lot of resources (notably their virtual address space) with other threads of the same process. So the security checks mentioned in your question don't make much sense. I guess that they are adding more code, but just decrease security (since you'll have more bugs).
Reading a textbook like Operating Systems: Three Easy Pieces should be worthwhile.
You can use pthread_kill() to check if a thread exists.
SYNOPSIS
#include <signal.h>
int pthread_kill(pthread_t thread, int sig);
DESCRIPTION
The pthread_kill() function shall request that a signal be delivered
to the specified thread.
As in kill(), if sig is zero, error checking shall be performed
but no signal shall actually be sent.
Something like
int rc = pthread_kill( thread_id, 0 );
if ( rc != 0 )
{
// thread no longer exists...
}
It's not very useful, though, as stated by others elsewhere, and it's really weak as any type of security measure. Anything with permissions to kill a thread will be able to stop it from running without killing it, or make it run arbitrary code so that it doesn't do what you want.

Linux Scheduling: OS vs "virtual"

How does one implement a multithreaded single process model in linux fedora under c where a single scheduler is used on a "main" core reading i/o availability (ex. tcp/ip, udp) then having a single-thread-per-core (started at init), the "execution thread", parse the data then update a small amount of info update to shared memory space (it is my understanding pthreads share data under a single process).
I beleive my options are:
Pthreads or the linux OS scheduler
I have a naive model in mind consisting of starting a certain number of these execution threads a single scheduler thread.
What is the best solution one could think when I know that I can use this sort of model.
Completing Benoit's answer, in order to communicate between your master and your worker threads, you could use conditional variable. The workers do something like:
while (true)
{
pthread_mutex_lock(workQueueMutex);
while (workQueue.empty())
pthread_cond_wait(workQueueCond, workQueueMutex);
/* if we get were then (a) we have work (b) we hold workQueueMutex */
work = pop(workQueue);
pthread_mutex_unlock(workQueueMutex);
/* do work */
}
and the master:
/* I/O received */
pthread_mutex_lock(workQueueMutex);
push(workQueue, work);
pthread_cond_signal(workQueueCond);
pthread_mutex_unlock(workQueueMutex);
This would wake up one idle work to immediately process the request. If no worker is available, the work will be dequeued and processed later.
Modifying the Linux scheduler is quite a tough work. I would just forget about it. Pthread is usually prefered. If I understand well, you want to have one core dedicated to the control plan, and a pool of other cores dedicated to the data plan processing? Then create a pool of threads from your master thread and setup core affinity for these slave threads with pthread_setaffinity_np(...).
Indeed threads of a process share the same address-space, and global variables are accessible by any threads of that process.
It looks to me that you have a version of the producer-consumer problem with a single consumer aggregating the results of n producers. This is a pretty standard problem, so I definitely think that pthread is more than enough for you. You don't need to go and mess around with the scheduler.
As one of the answer's states, a thread safe queue like the one described here works nicely for this sort of issue. Your original idea of spawning a bunch of threads is a good idea. You seem to be worried that the ability of the threads to share global state will cause you problems. I don't think that this is an issue if you keep shared state to a minimum and use sane locking discipline. Sharing state is fine as long as you do so responsibly.
Finally, unless you really know what you're doing, I would advise against manually messing with thread affinity. Just spawn the threads and let the scheduler handle when and on what core a thread runs. The thing to optimize is the number of threads you use. One for each core may not actually be the fastest approach if other threads are running.
Generally speaking, this is more or less exactly what the posix select and linux specific epoll functions are for.

Exclusive access to resource from multiple threads

Is there something equivalent to SIGSTOP and SICONT for threads? Am using pthreads.
Thanks
An edit:
I am implementing a crude form of file access syncronization among threads. So if a file is already opened by a thread, and another thread wants to open it again, I need to halt or pause the second thread at that point of its execution. When the first thread has completed its work it will check what other threads wanted to use a file it released and "wake" them up. The second thread then resumes execution from exactly that point. I use my own book keeping datastructures.
I'm going to tell you how to do things instead of answering the question. (Look up the "X Y problem".)
You are trying to prevent two threads from accessing the same file at the same time. In other words, access is MUTually EXclusive. A "mutex" is designed to do this. In general, it is easier to find help if you search for what you are trying to do (prevent two threads from accessing the same resource simultaneously) rather than searching for how you want to do it (make one thread wait for the other).
Edit: It sounds like you actually want many readers but one writer. This is probably the second most common synchronization problem (after the "producer-consumer" problem). Use a pthread_rwlock: readers call pthread_rdlock and writers call pthread_wrlock.
If you're doing something this sophisticated, you really should start reading the relevant literature. If you think you can do multithreaded programming some serious reading, you are much smarter than me and you don't need my help. I recommend "The Little Book of Semaphores" which is a free download (source). It's not about pthreads, but it's good stuff. The readers-writers problem you are asking about is found under §4.2 in the chapter "Classical Synchronization Problems" (heck, this problem is even mentioned in the blurb).
Multithreaded programing is HARD with capital letters and a bold font.
Well, there is pthread_kill.
But you almost certainly do not want to do this. What if the other thread holds (e.g.) a mutex for the heap, and you try to call new while it is stopped?
Since you do not know what the runtime is doing with mutexes, there is no way to avoid this kind of problem in general unless you completely avoid the standard library.
[edit]
Actually, come to think of it, I am not sure what happens if you target a specific thread with SIGSTOP, since that signal usually affects the whole process.
So to update my answer, I do not believe there is any standard mechanism for suspending a thread asynchronously... And for the reason mentioned above, I do not think you want one.
Depending on your application, Pthreads supports what can be considered more refined mechanisms, such as http://www.unix.com/man-page/all/3t/pthread_suspend/ and Mutex mechnisms

Is there a way to ‘join’ (block) in POSIX threads, without exiting the joinee?

I’m buried in multithreading / parallelism documents, trying to figure out how to implement a threading implementation in a programming language I’ve been designing.
I’m trying to map a mental model to the pthreads.h library, but I’m having trouble with one thing: I need my interpreter instances to continue to exist after they complete interpretation of a routine (the language’s closure/function data type), because I want to later assign other routines to them for interpretation, thus saving me the thread and interpreter setup/teardown time.
This would be fine, except that pthread_join(3) requires that I call pthread_exit(3) to ‘unblock’ the original thread. How can I block the original thread (when it needs the result of executing the routine), and then unblock it when interpretation of the child routine is complete?
Use a pthread_cond_t; wait on it on one thread and signal or broadcast it in the other.
Sounds like you actually want an implementation of the Thread Pool Pattern. It makes for a fairly simple conceptual model, without repeated thread creation & tear down costs. Some OS's directly support it, on others it should be reasonably simple to implement using a queue and a semaphore.

Resources