Suppose two processes (or threads) both call write on a pipe/socket/terminal whose buffer is full, thus blocking. Is there any guarantee as to who gets to write first when buffer space becomes available? Is it FIFO order? Globally, or within a given priority level, and ordered first by priority? Or is it completely random/indeterminate?
What about starved reads? Will the first to call read get the data when it becomes available?
I'm asking specifically on Linux and as far as I know POSIX has nothing to say about these issues, but I'd also be interested if I'm wrong on that and POSIX does mandate particular behaviors.
Within the kernel, the pipe_wait() function is used by both pipe readers and writers to block. This function uses the DEFINE_WAIT() macro to define a wait queue, which sets the .flags member of the wait queue to zero.
They are woken up with a call to wake_up_interruptible_sync_poll(), which calls down to __wake_up_common(). You can see that if the .flags member does not have the WQ_FLAG_EXCLUSIVE bit set (as in this case), then all waiters are unceremoniously made runnable.
The scheduler will then use its normal heuristics to pick the runnable processes to run next. In particular, a later waiter with a higher priority will get to go first - but note that if you have more than one processor core available, multiple waiters can start to run simultaneously, and which one actually gets to touch the pipe first depends entirely on which one manages to grab the pipe lock first.
Short answer:
Higher priority processes will likely get it first, but applications should not make assumptions on this behavior.
Additional info:
When data is available on a pipe/socket, a race condition can occur, in which a seemingly random process can grab the lock first. In general, higher priority processes will get the lock first, but this should not be depended on, as many other factors can contribute to this, such as the number of processor cores and active threads.
In general, user-level apps can assume that higher priority will ensure more frequent I/O access, but should not expect or assume consistent behavior defined beyond these generalities.
Related
I have a little paging problem on my realtime system, and wanted to know how exactly linux should behave in my particular case.
Among various other things, my application spawns 2 threads using pthread_create(), which operate on a set of shared buffers.
The first thread, let's call it A, reads data from a device, performs some calculations on it, and writes the results into one of the buffers.
Once that buffer is full, thread B will read all the results and send them to a PC via ethernet, while thread A writes into the next buffer.
I have noticed that each time thread A starts writing into a previously unused buffer, i miss some interrupts and lose data (there is an id in the header of each packet, and if that increments by more than one, i have missed interrupts).
So if i use n buffers, i get exactly n bursts of missed interrupts at the start of my data acquisition (therefore the problem is definitely caused by paging).
To fix this, i used mlock() and memset() on all of the buffers to make sure they are actually paged in.
This fixed my problem, but i was wondering where in my code would be the correct place do this. In my main application, or in one/both of the threads? (currently i do it in both threads)
According to the libc documentation (section 3.4.2 "Locked Memory Details"), memory locks are not inherited by child processes created using fork().
So what about pthreads? Do they behave the same way, or would they inherit those locks from my main process?
Some background information about my system, even though i don't think it matters in this particular case:
It is an embedded system powered by a SoC with a dual-core Cortex-A9 running Linux 4.1.22 with PREEMPT_RT.
The interrupt frequency is 4kHz
The thread priorities (as shown in htop) are -99 for the interrupt, -98 for thread A (both of which are higher than the standard priority of -51 for all other interrupts) and -2 for thread B
EDIT:
I have done some additional tests, calling my page locking function from different threads (and in main).
If i lock the pages in main(), and then try to lock them again in one of the threads, i would expect to see a large amount of page faults for main() but no page faults for the thread itself (because the pages should already be locked). However, htop tells a different story: i see a large amount of page faults (MINFLT column) for each and every thread that locks those pages.
To me, that would suggest that pthreads actually do have the same limitation as child processes spawned using fork(). And if this is the case, locking them in both threads (but not in main) would be the correct procedure.
Threads share the same memory management context. If a page is resident for one thread, it's resident for all threads in the same process.
The implication of this is that memory locking is per-process, not per-thread.
You are probably still seeing minor faults on the first write because a fault is used to mark the page dirty. You can avoid this by also writing to each page after locking.
I have written code for two threads where is one is assigned priority 20 (lower) and another on 10 (higher). Upon executing my code, 70% of the time I get expected results i.e high_prio (With priority 10) thread executes first and then low_prio (With priority 20).
Why is my code not able to get 100 % correct result in all the executions? Is there any conceptual mistake that I am doing?
void *low_prio(){
Something here;
}
void *high_prio(){
Something here;
}
int main(){
Thread with priority 10 calls high_prio;
Thread with priority 20 calls low_prio;
return 0;
}
Is there any conceptual mistake that I am doing?
Yes — you have an incorrect expectation regarding what thread priorities do. Thread priorities are not meant to force one thread to execute before another thread.
In fact, in a scenario where there is no CPU contention (i.e. where there are always at least as many CPU cores available as there are threads that currently want to execute), thread priorities will have no effect at all -- because there would be no benefit to forcing a low-priority thread not to run when there is a CPU core available for it to run on. In this no-contention scenario, all of the threads will get to run simultaneously and continuously for as long as they want to.
The only time thread priorities may make a difference is when there is CPU contention -- i.e. there are more threads that want to run than there are CPU cores available to run them. At that point, the OS's thread-scheduler has to make a decision about which thread will get to run and which thread will have to wait for a while. In this instance, thread priorities can be used to indicate to the scheduler which thread it should prefer allow to run.
Note that it's even more complicated than that, however -- for example, in your posted program, both of your threads are calling printf() rather a lot, and printf() invokes I/O, which means that the thread may be temporarily put to sleep while the I/O (e.g. to your Terminal window, or to a file if you have redirected stdout to file) completes. And while that thread is sleeping, the thread-scheduler can take advantage of the now-available CPU core to let another thread run, even if that other thread is of lower priority. Later, when the I/O operation completes, your high-priority thread will be re-awoken and re-assigned to a CPU core (possibly "bumping" a low-priority thread off of that core in order to get it).
Note that inconsistent results are normal for multithreaded programs -- threads are inherently non-deterministic, since their execution patterns are determined by the thread-scheduler's decisions, which in turn are determined by lots of factors (e.g. what other programs are running on the computer at the time, the system clock's granularity, etc).
I have a very basic doubt.
when a process is waiting on a semaphore , it goes into sleep state.
So no way it can poll the semaphore value.
Does kernel poll the semaphore value and if available sends a signal to all process waiting for it ? If so, wont it be too much overhead for the kernel.
Or does the signal() call internally notifies all the process waiting for the semaphore.
Please let me know on this.
The operating system schedules the process once more when the operating system is told by another process that it has done with the semaphore.
Semaphores are just one of the ways of interacting with the OS scheduler.
The kernel doesn't poll the semaphore; it doesn't need to. Every time a process calls sem_post() (or equivalent), that involves interaction with the kernel. What the kernel does during the sem_post() is look up whatever processes have previously called sem_wait() on the same semaphore. If one or more processes have called sem_wait(), it picks the process with the highest priority and schedules it. This shows up as that sem_wait() finally returning and that process carries on executing.
How This is Implemented Under the Hood
Fundamentally the kernel needs to implement something called an "atomic test and set". That is an operation where by the value of some variable can be tested and, if a certain condition is met (such as the value == 0) the variable value is altered (e.g. value = 1). If this succeeds, the kernel will do one thing, (like schedule a process), if this does not (because the condition value==0 was false) the kernel will do something difference (like put a process on the do-not-schedule list). The 'atomic' part is that this decision is made without anything else being able to look at and change the same variable at the same time.
There's several ways of doing this. One is to suspend all processes (or at least all activity within the kernel) so that nothing else is testing the value of the variable at the same time. That's not very fast.
For example, the Linux kernel once had something called the Big Kernel Lock. I don't know if this was used to process semaphore interactions, but that's the kind of thing that OSes used to have for atomic test & sets.
These days CPUs have atomic test & set op codes, which is a lot faster. The good ole' Motorola 68000 had one of these a long time ago; it took CPUs like the PowerPC and the x86 many, many years to get the same kind of instruction.
If you root around inside linux you'll find mention of futexes. a futex is a fast mutex - it relies on a CPU's test/set instruction to implement a fast mutex semaphore.
Post a Semaphore in Hardware
A variation is a mailbox semaphore. This is a special variation on a semaphore that is extremely useful in some system types where hardware needs to wake up a process at the end of a DMA transfer. A mailbox is a special location in memory which when written to will cause an interrupt to be raised. This can be turned into a semaphore by the kernel because when that interrupt is raised, it goes through the same motions as it would had something called sem_post().
This is incredibly handy; a device can DMA a large amount of data to some pre-arranged buffer, and top that off with a small DMA transfer to the mail box. The kernel handles the interrupt, and if a process has previously called sem_wait() on the mailbox semaphore the kernel schedules it. The process, which also knows about this pre-arranged buffer, can then process the data.
On a real time DSP systems this is very useful, because it's very fast and very low latency; it allows a process to receive data from some device with very little delay. The alternative, to have a full up device driver stack that uses read() / write() to transfer data from the device to the process is incredibly slow by comparison.
Speed
The speed of semaphore interactions depends entirely on the OS.
For OSes like Windows and Linux, the context switch time is fairly slow (in the order of several microseconds, if not tens of microseconds). Basically this means that when a process calls something like sem_post(), the kernel is doing a lot of different things whilst it has the opportunity before finally returning control to the process(es). What it's doing during this time could be, well, almost anything!
If a program has made use of a lot threads, and they're all rapidly interacting between themselves using semaphores, quite a lot of time is lost to the sem_post() and sem_wait(). This places an emphasis on doing a decent amount of work once a process has returned from sem_wait() before calling the next sem_post().
However on OSes like VxWorks, the context switch time is lightning fast. That is there's very little code in the kernel that gets run when sem_post() is called. The result is that a semaphore interaction is a lot more efficient. Moreover, and OS like VxWorks is written in such a way so as to guarantee that the time take to do all this sem_post() / sem_wait() work is constant.
This influences the architecture of one's software on these systems. On VxWorks, where a context switch is cheap, there's very little penalty in having a large number of threads all doing quite small tasks. On Windows / Linux there's more of an emphasis on the opposite.
This is why OSes like VxWorks are excellent for hard real time applications, and Windows / Linux are not.
The Linux PREEMPT_RT patch set in part aims to improve the latency of the linux kernel during operations like this. For example, it pushes a lot of device interrupt handlers (device drivers) up into kernel threads; these are scheduled almost just like any other thread. The idea is to reduce the amount of work that is being done by the kernel (and have more done by kernel threads), so that the work it still has to do itself (such as handling sem_post() / sem_wait()) takes less time and is more consistent about how long this takes. It still not a hard guarantee of latency, but it's a pretty good improvement. This is what we call a soft-realtime kernel. The impact though is that overall throughput of the machine can be lower.
Signals
Signals are nasty, horrible things that really get in the way of using things like sem_post() and sem_wait(). I avoid them like the plague.
If you are on a Linux platform and you do have to use signals, take a serious long look at signalfd (man page). This is a far better way of dealing with signals because you can choose to accept them at a convenient time (simply by called read()), instead of having to handle them as soon as they occur. Certainly if you're using epoll() or select() anywhere at all in a program then signalfd is the way to go.
I have a project with some soft real-time requirements. I have two processes (programs that I've written) that do some data acquisition. In either case, I need to continuously read in data that's coming in and process it.
The first program is heavily threaded, and the second one uses a library which should be threaded, but I have no clue what's going on under the hood. Each program is executed by the user and (by default) I see each with a priority of 20 and a nice value of 0. Each program uses roughly 30% of the CPU.
As it stands, both processes have to contended with a few background processes, and I want to give my two programs the best shot at the CPU as possible. My main issue is that I have a device that I talk to that has a 64 byte hardware buffer, and if I don't read from it in time, I get an overflow. I have noted this condition occurring once every 2-3 hours of run time.
Based on my research (http://oreilly.com/catalog/linuxkernel/chapter/ch10.html) there appear to be three ways of playing around with the priority:
Set the nice value to a lower number, and therefore give each process more priority. I can do this without any modification to my code (or use the system call) using the nice command.
Use sched_setscheduler() for the entire process to a particular scheduling policy.
Use pthread_setschedparam() to individually set each pthread.
I have run into the following roadblocks:
Say I go with choice 3, how do I prevent lower priority threads from being starved? Is there also a way to ensure that shared locks cause lower priority threads to be promoted to a higher priority? Say I have a thread that's real-time, SCHED_RR and it shared a lock with a default, SCHED_OTHER thread. When the SCHED_OTHER thread gets the lock, I want it to execute # higher priority to free the lock. How do I ensure this?
If a thread of SCHED_RR creates another thread, is the new thread automatically SCHED_RR, or do I need to specify this? What if I have a process that I have set to SCHED_RR, do all its threads automatically follow this policy? What if a process of SCHED_RR spawns a child process, is it too automatically SCHED_RR?
Does any of this matter given that the code only uses up 60% of the CPU? Or are there still issues with the CPU being shared with background processes that I should be concerned with and could be caused my buffer overflows?
Sorry for the long winded question, but I felt it needed some background info. Thanks in advance for the help.
(1) pthread_mutex_setprioceiling
(2) A newly created thread inherits the schedule and priority of its creating thread unless it's thread attributes (e.g. pthread_attr_setschedparam / pthread_attr_setschedpolicy) are directed to do otherwise when you call pthread_create.
(3) Since you don't know what causes it now it is in fairness hard for anyone say with assurance.
I'm extending the functionality of a semaphore. I ran into a roadblock when I realized I don't know the implementation of an actual semaphore and to make sure my code ran correctly, I needed to know this.
I know a semaphore works by blocking threads that are waiting on it when they call sem_wait() and another thread currently has it locked. The thread is then blocked and then put into a wait list for that semaphore.
My question relates to what happens on a sem_post(). Is the next thread pulled off the waiting list, set as the locking thread, and allowed to be unblocked? Or is the scheme for posting completely different?
Thanks!
The next thread to unblock on it's sem_wait() will be whatever thread the OS decides is the next one to context switch into. Nobody makes any guarantee of ordering; it depends on your OS's scheduling strategy. It might be the thread that has been off the CPU for the longest, or the one that has been assigned the highest "priority", or the one that has historically had certain resource-usage statistics, or whatever.
Most likely, your current thread (the one that called sem_post()) will continue running for a while, until it either starts waiting for user input, blocks on another semaphore, or runs out of its os-allotted time slice. Then, the OS will switch in some totally unrelated process to run for a fraction of a second (probably Firefox or something), then go off and handle some network traffic, get itself a cup of tea, and, finally, when it gets around to it, pick whichever of your other threads it feels like, based on something like whether it feels based on past history that the particular thread is more CPU or I/O-bound.
In many OSes, priority is given to I/O-bound processes that haven't been around for very long. The theory is that new processes might be short-lived (if it's been around for five hours already, odds are it won't be finishing up in the next 1ms) so we might as well get them over with. I/O-bound processes are likely to continue to be I/O-bound, which means that chances are they are going to switch off the CPU shortly while waiting for other resources. Basically, the OS wants to find the process that it's going to be able to be done with ASAP, so it can get back to sipping its tea and running your malware.
Semaphores have two operations:
P() To acquire the semaphore (you seem to call this sem_wait)
V() To release the semaphore (you seem to call this sem_post)
Semaphores also have an integer associated to them, which is the number of concurrent threads allowed to pass P() without blocking. Other calls to P() will block until V() is called to free up spots.
That is the classic definition of a semaphore.
Edit: Semaphores do not make any guarantee of order. They don't have to actually use a queue or other FIFO structure. When only one thread is allowed at a time, when it calls V(), another (possibly random) thread will then return from its P() call and continue.
According to the IEEE standards, the behavior of POSIX semaphores:
If the semaphore value resulting from this operation is positive, then no threads were blocked waiting for the semaphore to become unlocked; the semaphore value is simply incremented.
If the value of the semaphore resulting from this operation is zero, then one of the threads blocked waiting for the semaphore shall be allowed to return successfully from its call to sem_wait(). If the Process Scheduling option is supported, the thread to be unblocked shall be chosen in a manner appropriate to the scheduling policies and parameters in effect for the blocked threads. In the case of the schedulers SCHED_FIFO and SCHED_RR, the highest priority waiting thread shall be unblocked, and if there is more than one highest priority thread blocked waiting for the semaphore, then the highest priority thread that has been waiting the longest shall be unblocked. If the Process Scheduling option is not defined, the choice of a thread to unblock is unspecified.
If the Process Sporadic Server option is supported, and the scheduling policy is SCHED_SPORADIC, the semantics are as per SCHED_FIFO above."