The disadvantages of using sleep() in C

In C, if I want to coordinate two concurrently executing processes, I can use sleep(). However, I heard that sleep() is not a good way to enforce the order of events between processes. Are there any reasons for this?

sleep() is not a coordination function. It never has been. sleep() makes your process do just that - go to sleep, not running at all for a certain period of time.
You have been misinformed. Perhaps your source was referring to what is known as a backoff after an acquisition of a lock fails, in which case a randomized sleep may be appropriate.
The way one generally establishes a relative ordering of events between processes (i.e., creates a happens-before edge) is to use a concurrency-control structure: for example a condition variable, which is only signaled at a certain point, or a somewhat more involved barrier, which makes each thread that reaches it wait until all the others have also reached that point in the program.

Using sleep() hurts both latency and CPU load. Let's say you sleep for 1 ms and then check some shared atomic variable: since the event can happen at any point in that window, the average latency will be at least 0.5 ms. You will also be consuming CPU cycles in this otherwise idle thread just to poll the shared variable. Finally, there is often no guarantee about how long the sleep actually lasts.
The OS provides services to communicate and synchronize between threads/processes. Those have low latency, consume fewer CPU cycles, and often come with other guarantees, so those are the ones you should use (e.g. condition variables, events, semaphores, etc.). When you use them, the thread/process does not need to poll: it blocks, and the kernel wakes it up when needed.
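A minimal sketch of the condition-variable approach, assuming POSIX threads (the names `ready`, `producer` and `wait_for_ready` are illustrative). The waiting thread blocks inside pthread_cond_wait() instead of sleeping and re-checking, so it uses no CPU while waiting and wakes as soon as the other side signals. The sketch uses threads; for separate processes, the same mutex and condition variable would have to live in shared memory and be initialized with the PTHREAD_PROCESS_SHARED attribute.

```c
#include <pthread.h>
#include <stdbool.h>

/* Shared state, protected by `lock`. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready_cv = PTHREAD_COND_INITIALIZER;
static bool ready = false;
static int shared_value = 0;

/* Producer: publishes a value, then signals the waiter. */
static void *producer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    shared_value = 42;
    ready = true;                    /* the event we want to order on */
    pthread_cond_signal(&ready_cv);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Waiter: blocks (no polling) until `ready` is true, then reads the value.
   The while loop guards against spurious wakeups. */
static int wait_for_ready(void)
{
    pthread_mutex_lock(&lock);
    while (!ready)
        pthread_cond_wait(&ready_cv, &lock);
    int v = shared_value;
    pthread_mutex_unlock(&lock);
    return v;
}

/* Starts the producer and returns the value observed by the waiter. */
int run_condvar_demo(void)
{
    pthread_t t;
    pthread_create(&t, NULL, producer, NULL);
    int v = wait_for_ready();
    pthread_join(t, NULL);
    return v;
}
```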
There are some rare situations where polling is the best solution for thread/process synchronization, e.g. a spinlock, usually when the overhead of going through the kernel is larger than the time spent polling.
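For that rare case, a spinlock can be sketched with a C11 atomic_flag (the `simple_spinlock` type and helper names are hypothetical; POSIX also offers a ready-made pthread_spinlock_t). The waiting thread never enters the kernel: it keeps retrying test-and-set, which only pays off when the lock is held for a very short time.

```c
#include <pthread.h>
#include <stdatomic.h>

typedef struct {
    atomic_flag held;
} simple_spinlock;

static void spin_lock(simple_spinlock *l)
{
    /* Busy-wait in user space until this thread flips the flag from clear to set. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* spin */
}

static void spin_unlock(simple_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static simple_spinlock counter_lock = { ATOMIC_FLAG_INIT };
static long counter = 0;

static void *add_many(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        spin_lock(&counter_lock);
        counter++;               /* very short critical section */
        spin_unlock(&counter_lock);
    }
    return NULL;
}

/* Two threads increment a shared counter under the spinlock. */
long run_spinlock_demo(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, add_many, NULL);
    pthread_create(&b, NULL, add_many, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```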

Sleep would not be a very robust way to handle event ordering between processes as there are so many things that can go wrong.
What if your sleep() is interrupted?
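To illustrate the interruption problem: POSIX sleep() returns the number of unslept seconds when a signal handler interrupts it, and nanosleep() fails with EINTR and reports the remaining time. A minimal sketch (the helper name `sleep_at_least_ms` is hypothetical) that resumes the sleep after an interruption:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Sleeps for at least `ms` milliseconds even if signals interrupt the call.
   nanosleep() returns -1 with errno == EINTR when a signal handler runs and
   stores the unslept time in `rem`, so we go back to sleep for that much. */
int sleep_at_least_ms(long ms)
{
    struct timespec req = { ms / 1000, (ms % 1000) * 1000000L };
    struct timespec rem;
    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;           /* a real error, e.g. EINVAL */
        req = rem;               /* interrupted: sleep for what is left */
    }
    return 0;
}
```

Even with this loop, the call only bounds how long the thread is paused; it still says nothing about what the other process has done in the meantime.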
You need to be a bit more specific about what you mean by "implement the order of events between processes".

In my case, I was using the equivalent call in a Celery task: time.sleep(10). It worked fine when celery_task was called once or twice per minute, but it created chaos in one case.
Suppose celery_task is called 1000 times.
I had 4 Celery workers, so those 1000 calls were queued for execution.
The first 4 calls were executed by the 4 workers and the remaining 996 stayed in the queue.
The workers were busy with those 4 tasks for 10 seconds, and only then picked up the next 4. Going on this way, the whole batch takes around 1000 / 4 * 10 = 2500 seconds.
Eventually, we had to remove time.sleep, as it was blocking each worker for 10 seconds.

Related

Is there ever a valid reason to call pthread_yield() when running under a modern/pre-emptive scheduler?

pthread_yield is documented as "causes the calling thread to relinquish the CPU", but on a modern OS/scheduler, the relinquishing of the CPU happens automatically at the appropriate times (i.e. whenever the thread calls a blocking operation, and/or when the thread's quantum has expired). Is pthread_yield() therefore vestigial/useless except in the special case of running under a co-operative-only task scheduler? Or are there some use-cases where calling it would still be correct/useful even under a modern pre-emptive scheduler?
pthread_yield() gives you a chance to do a short sleep -- not a timed sleep. You relinquish the remainder of time slice to some other thread or process, but you don't put the thread in a wait queue.
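Note that pthread_yield() is a non-standard GNU extension; glibc 2.34 deprecated it in favor of the POSIX sched_yield(). A minimal sketch, assuming POSIX threads and C11 atomics (all names hypothetical), of a wait loop that gives up the rest of its time slice on each miss. The thread stays runnable rather than sitting in a wait queue, so it still consumes CPU whenever nothing else is ready to run.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool done = false;

static void *worker(void *arg)
{
    (void)arg;
    atomic_store_explicit(&done, true, memory_order_release);
    return NULL;
}

/* Waits for `done`, relinquishing the CPU on every failed check.
   sched_yield() does not block: the thread goes back to the run queue. */
static void wait_done_yielding(void)
{
    while (!atomic_load_explicit(&done, memory_order_acquire))
        sched_yield();
}

int run_yield_demo(void)
{
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);
    wait_done_yielding();
    pthread_join(t, NULL);
    return atomic_load(&done);
}
```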
Also, a while ago I read about how schedulers prioritize interactive processes. These are the processes the user interacts with directly, and whose sluggishness you feel most (your system feels much less slow if its UI is responsive). One property of interactive processes is that they have little to do and mostly don't use their entire time slice. So if a process keeps yielding before its time slice is up, the scheduler assumes it is interactive and boosts its priority. There were exploits that used this trick to effectively use 99% of the CPU while the offending process was shown as using 0%.

Minimum time quantum needed in nanosleep(), usleep() to yield the CPU

In concurrent code in my workplace, there are several occurrences of nanosleep() or usleep() with a non-zero constant to free up the CPU without relying on futex(), or a sleeping synchronization primitive to put the thread to sleep (for instance, when waiting for an element from a concurrent queue). The code claims to prevent pathological cases where threads consume CPU without doing any actual work when other threads are available to get scheduled on that CPU. This sounds reasonable by itself assuming the cooperation between the sleep functions and the kernel thread scheduler is correct.
Is there a concept in Linux where a minimum duration passed to nanosleep(), usleep(), et al. is known to put the calling thread to sleep and run another thread in its place on the same core when cores are oversubscribed? And if the duration is smaller than that, does the thread not actually yield the CPU but continue spinning? Such a threshold is what the constant passed to the sleep functions relies on to make them behave like a coarse yield.
I realize that sched_yield() is probably better suited for what the code is doing, but I wanted to understand the behavior of the Linux sleep functions before benchmarking a replacement for, or improvement on, the existing code.
The man page makes it clear that it no longer busy-waits.
"In order to support applications requiring much more precise pauses (e.g., in order to control some time-critical hardware), nanosleep() would handle pauses of up to 2 milliseconds by busy waiting with microsecond precision when called from a thread scheduled under a real-time policy like SCHED_FIFO or SCHED_RR. This special extension was removed in kernel 2.5.39, and is thus not available in Linux 2.6.0 and later kernels."
@stark has answered your question as written, but to elaborate: don't do that. If you're waiting for an event to happen, perform an operation that waits for the event, like pthread_cond_wait, sem_wait, poll, read, etc., rather than sleeping and retrying. This avoids wasting lots of CPU time, and it also discourages erroneous programming models full of data races (because normally the same primitive that waits also ensures exclusive access/synchronization).
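As a sketch of waiting with poll/read instead of a sleep-and-retry loop (assuming POSIX; the names are hypothetical), a pipe can act as a minimal queue: the consumer blocks in poll() until the producer writes, with a timeout as a safety net. The producer's short nanosleep() only simulates work. The same pattern works between processes if the pipe is created before fork().

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <pthread.h>
#include <time.h>
#include <unistd.h>

static int fds[2];   /* fds[0] is the read end, fds[1] the write end */

static void *producer(void *arg)
{
    (void)arg;
    struct timespec work = { 0, 20 * 1000000L };   /* pretend to work for 20 ms */
    nanosleep(&work, NULL);
    char item = 'x';
    ssize_t n = write(fds[1], &item, 1);           /* wakes the blocked reader */
    (void)n;
    return NULL;
}

/* Blocks in the kernel until a byte arrives or `timeout_ms` passes.
   Returns the byte read, or -1 on timeout/error. */
static int wait_for_item(int timeout_ms)
{
    struct pollfd p = { .fd = fds[0], .events = POLLIN };
    if (poll(&p, 1, timeout_ms) != 1)   /* 0 = timeout, -1 = error */
        return -1;
    char item;
    if (read(fds[0], &item, 1) != 1)
        return -1;
    return item;
}

int run_pipe_demo(void)
{
    if (pipe(fds) == -1)
        return -1;
    pthread_t t;
    pthread_create(&t, NULL, producer, NULL);
    int got = wait_for_item(1000);
    pthread_join(t, NULL);
    close(fds[0]);
    close(fds[1]);
    return got;
}
```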

Priority based multithreading?

I have written code for two threads, where one is assigned priority 20 (lower) and the other priority 10 (higher). When I execute my code, 70% of the time I get the expected result, i.e. the high_prio thread (with priority 10) executes first and then low_prio (with priority 20).
Why is my code not able to get 100 % correct result in all the executions? Is there any conceptual mistake that I am doing?
void *low_prio(void *arg){
    /* Something here */
    return NULL;
}
void *high_prio(void *arg){
    /* Something here */
    return NULL;
}
int main(){
    /* Thread with priority 10 calls high_prio */
    /* Thread with priority 20 calls low_prio */
    return 0;
}
Is there any conceptual mistake that I am doing?
Yes — you have an incorrect expectation regarding what thread priorities do. Thread priorities are not meant to force one thread to execute before another thread.
In fact, in a scenario where there is no CPU contention (i.e. where there are always at least as many CPU cores available as there are threads that currently want to execute), thread priorities will have no effect at all -- because there would be no benefit to forcing a low-priority thread not to run when there is a CPU core available for it to run on. In this no-contention scenario, all of the threads will get to run simultaneously and continuously for as long as they want to.
The only time thread priorities may make a difference is when there is CPU contention, i.e. when there are more threads that want to run than there are CPU cores available to run them. At that point, the OS's thread scheduler has to decide which thread gets to run and which thread has to wait for a while. In this situation, thread priorities tell the scheduler which thread it should prefer to run.
Note that it's even more complicated than that, however -- for example, in your posted program, both of your threads are calling printf() rather a lot, and printf() invokes I/O, which means that the thread may be temporarily put to sleep while the I/O (e.g. to your Terminal window, or to a file if you have redirected stdout to file) completes. And while that thread is sleeping, the thread-scheduler can take advantage of the now-available CPU core to let another thread run, even if that other thread is of lower priority. Later, when the I/O operation completes, your high-priority thread will be re-awoken and re-assigned to a CPU core (possibly "bumping" a low-priority thread off of that core in order to get it).
Note that inconsistent results are normal for multithreaded programs -- threads are inherently non-deterministic, since their execution patterns are determined by the thread-scheduler's decisions, which in turn are determined by lots of factors (e.g. what other programs are running on the computer at the time, the system clock's granularity, etc).
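If high_prio's work really must happen before low_prio's, the ordering has to come from synchronization rather than from priorities. A minimal sketch, assuming Linux/POSIX unnamed semaphores (sem_init() is not supported on macOS) and hypothetical demo names: low_prio blocks on a semaphore that high_prio posts when it is done. The sketch also joins both threads, because returning from main() ends the whole process, including any threads still running.

```c
#include <pthread.h>
#include <semaphore.h>

static sem_t high_done;     /* posted by high_prio when its work is finished */
static char order[2];       /* records which thread's work ran first */
static int n_recorded;

static void *high_prio(void *arg)
{
    (void)arg;
    order[n_recorded++] = 'H';    /* "Something here" */
    sem_post(&high_done);         /* now let low_prio proceed */
    return NULL;
}

static void *low_prio(void *arg)
{
    (void)arg;
    sem_wait(&high_done);         /* blocks until high_prio has posted */
    order[n_recorded++] = 'L';    /* "Something here" */
    return NULL;
}

/* Returns 1 if high_prio's work ran before low_prio's. The low-priority
   thread is created first on purpose: the semaphore, not the scheduler,
   decides the order. */
int run_ordering_demo(void)
{
    pthread_t lo, hi;
    n_recorded = 0;
    sem_init(&high_done, 0, 0);   /* shared between threads, initial value 0 */
    pthread_create(&lo, NULL, low_prio, NULL);
    pthread_create(&hi, NULL, high_prio, NULL);
    pthread_join(lo, NULL);       /* wait for both threads before returning */
    pthread_join(hi, NULL);
    sem_destroy(&high_done);
    return order[0] == 'H' && order[1] == 'L';
}
```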

What could produce this bizarre behavior with two threads sleeping at the same time?

There are two threads. One is an events thread, and the other does rendering. The rendering thread uses variables from the events thread. There are mutex locks, but they are irrelevant, since I noticed the behavior is the same even if I remove them completely (for testing).
If I do a sleep() in the rendering thread alone, for 10 milliseconds, the FPS is normally 100.
If I do no sleep at all in the rendering thread and a sleep in the events thread, the rendering thread does not slow down at all.
But, if I do a sleep of 10 milliseconds in the rendering thread and 10 in the events thread, the FPS is not 100, but lower, about 84! (notice it's the same even if mutex locks are removed completely)
(If neither of them sleeps, the FPS normally goes much higher.)
What could produce this behavior?
--
The sleep function used is Windows' Sleep() or SDL_Delay() (which probably ends up calling Sleep() on Windows).
I believe I have found the answer myself:
Sleeping is not guaranteed to last for exactly the requested period; because of OS scheduling, the actual wait can be longer.
A better approach is to measure the time that has actually passed, and only proceed when the required interval has elapsed.
The threads run asynchronously unless you synchronise them, and will be scheduled according to the OS's scheduling policy. I would suggest that the behaviour will at best be non-deterministic (unless you were running on an RTOS perhaps).
You might do better to have one thread trigger another by some synchronisation mechanism such as a semaphore, then only have one thread Sleep, and the other wait on the semaphore.
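A minimal sketch of that suggestion, using POSIX threads and semaphores in place of the Win32 calls (on Windows, the analogous semaphore calls are CreateSemaphore, ReleaseSemaphore and WaitForSingleObject). The names and the 10 ms period are illustrative. Only the timing thread sleeps; the other thread handles a fixed number of frames, each one started by a sem_post() from the timer.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <semaphore.h>
#include <time.h>

#define FRAMES 5

static sem_t tick;               /* posted once per frame by the timing thread */
static int frames_handled = 0;   /* only written by frame_thread */

/* The only thread that sleeps: it paces the frames. */
static void *timer_thread(void *arg)
{
    (void)arg;
    struct timespec period = { 0, 10 * 1000000L };   /* 10 ms */
    for (int i = 0; i < FRAMES; i++) {
        nanosleep(&period, NULL);
        sem_post(&tick);         /* trigger the other thread for this frame */
    }
    return NULL;
}

/* Never sleeps for a fixed time: it blocks until it is triggered. */
static void *frame_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < FRAMES; i++) {
        sem_wait(&tick);
        frames_handled++;        /* per-frame work goes here */
    }
    return NULL;
}

int run_tick_demo(void)
{
    pthread_t timer, worker;
    sem_init(&tick, 0, 0);
    pthread_create(&worker, NULL, frame_thread, NULL);
    pthread_create(&timer, NULL, timer_thread, NULL);
    pthread_join(timer, NULL);
    pthread_join(worker, NULL);
    sem_destroy(&tick);
    return frames_handled;
}
```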
I do not know what your "Events" thread does, but given its name, perhaps it would be better to wait on the events themselves rather than simply sleeping and then polling for events (if that is what it does). Making the rendering periodic probably makes sense, but for handling events, waiting on the events themselves is the better approach.
The behavior will vary depending on many factors such as the OS version (e.g. Win7 vs. Win XP) and number of cores. If you have two cores and two threads with no synchronization objects they should run concurrently and Sleep() on one thread should not impact the other (for the most part).
It sounds like you have some other synchronization between the threads because otherwise when you have no sleep at all in your rendering thread you should be running at >100FPS, no?
In case there is absolutely no synchronization, then depending on how much processing happens in the two threads, having them both call Sleep() may increase the probability of contention on a single-core system. If only one thread calls Sleep(), it is generally likely to be given the next quantum once it wakes up, and, assuming it does very little processing (i.e. yields right away), that behavior will continue. If two threads call Sleep(), there is some probability that they wake up in the same quantum, and if at least one of them needs to do any amount of processing, the other will be delayed and the observed frequency will be lower. This should only apply if there is a single core available to run the two threads on.
If you want to maintain a 100 FPS update rate, you should keep track of the next scheduled update time and only Sleep() for the remaining time. This ensures that even if your thread gets bumped by some other thread for a CPU quantum, you will still be able to keep the rate (assuming there is enough CPU time for all processing). Something like:
#include <windows.h>

void frame_loop(void)
{
    DWORD next_frame_time = GetTickCount(); // Milliseconds. Note the resolution of GetTickCount()
    while (1)
    {
        next_frame_time += 10; // Time of next frame update in ms
        // How much time remains until the next update. DWORD is unsigned, so if we are
        // already late, this wraps around to a huge value and the test below skips Sleep()
        DWORD wait_for = next_frame_time - GetTickCount();
        if (wait_for < 11) // A simplistic test for the case where we're already too late
        {
            Sleep(wait_for);
        }
        // Do periodic processing here
    }
}
Depending on the target OS and your accuracy requirements you may want to use a higher resolution time function such as QueryPerformanceCounter(). The code above will not work well on Windows XP where the resolution of GetTickCount() is ~16ms but should work in Win7 - it's mostly to illustrate my point rather than meant to be copied literally in all situations.
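On POSIX systems, the same "schedule against the next deadline" idea can be sketched with clock_nanosleep() and TIMER_ABSTIME on CLOCK_MONOTONIC (the function names below are hypothetical). Because each deadline is derived from the previous deadline rather than from the current time, processing time and late wakeups do not accumulate as drift; if a deadline has already passed, the call returns immediately.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Adds `ns` nanoseconds to `t`, keeping tv_nsec in [0, 1e9). */
static void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= 1000000000L) {
        t->tv_nsec -= 1000000000L;
        t->tv_sec++;
    }
}

/* Runs `frames` iterations of `process` at a fixed period. */
void run_fixed_rate(int frames, long period_ns, void (*process)(void))
{
    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < frames; i++) {
        timespec_add_ns(&next, period_ns);   /* next deadline, from the last one */
        /* Sleep until the absolute time `next`. clock_nanosleep() returns the
           error number directly; retry if a signal interrupted the sleep. */
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) == EINTR)
            ;
        process();                            /* periodic processing goes here */
    }
}
```

As with the GetTickCount() version, a long stall makes the loop run several frames back-to-back until it catches up with the schedule.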

How to setup and manage persistent multiple threads?

I have POSIX in mind for implementation, though this question is more about architecture.
I am starting from an update loop that has several main jobs to do. I can group those jobs into four or five main tasks that have common memory access requirements. It's my idea to break off those jobs into their own threads and have them complete one cycle of "update" and sleep until the next frame.
But how should I synchronize them? Should I detach four or five threads at the start of each cycle, have them run once and die, and then detach another 4-5 threads on the next pass? That sounds expensive.
It sounds more reasonable to create these threads once and have them sleep until a synchronized call wakes them up.
Is this a wise approach? I'm open to accepting responses from just ideas to implementations of any kind.
EDIT: based on the answers so far, I'd like to add:
concurrency is desired
these worker threads are intended to run for very short durations (<250ms)
the work done by each thread will always be the same
I'm considering 4-5 threads, with 20 as a hard limit.
That depends on the granularity of the tasks that the threads are performing. If they're doing long tasks (e.g. a second or longer), then the cost of creating and destroying threads is negligible compared to the work the threads are doing, so I'd recommend keeping things simple and creating the threads on demand.
Conversely, if you have very short tasks (e.g. less than 10-100 ms or so), you will definitely start to notice the cost of creating and destroying lots of threads. In that case, yes, you should create the threads only once and have them sleep until work arrives for them. You'll want to use some sort of condition variable (e.g. pthread_cond_t) for this: the thread waits on the condition variable, and when work arrives, you signal the condition variable.
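A minimal sketch of such a persistent worker, assuming POSIX threads (the `worker_state` type and function names are hypothetical). The thread is created once, sleeps in pthread_cond_wait() while there is nothing to do, and is woken by pthread_cond_signal() when work is posted; a `quit` flag lets it drain outstanding work and exit. Only one worker is started here, but several could share the same state.

```c
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    int  pending;   /* work items posted but not yet started */
    int  done;      /* work items finished (for the demo) */
    bool quit;      /* set once no more work will be posted */
} worker_state;

static void *worker_main(void *arg)
{
    worker_state *w = arg;
    pthread_mutex_lock(&w->lock);
    for (;;) {
        while (w->pending == 0 && !w->quit)
            pthread_cond_wait(&w->cv, &w->lock);   /* blocked: uses no CPU */
        if (w->pending == 0)                       /* quit requested, nothing left */
            break;
        w->pending--;
        pthread_mutex_unlock(&w->lock);
        /* ... the per-cycle work would run here, outside the lock ... */
        pthread_mutex_lock(&w->lock);
        w->done++;
    }
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

static void post_work(worker_state *w)
{
    pthread_mutex_lock(&w->lock);
    w->pending++;
    pthread_cond_signal(&w->cv);   /* wake one sleeping worker */
    pthread_mutex_unlock(&w->lock);
}

int run_worker_demo(int items)
{
    worker_state w = { .pending = 0, .done = 0, .quit = false };
    pthread_mutex_init(&w.lock, NULL);
    pthread_cond_init(&w.cv, NULL);
    pthread_t t;
    pthread_create(&t, NULL, worker_main, &w);     /* created once, then reused */
    for (int i = 0; i < items; i++)
        post_work(&w);
    pthread_mutex_lock(&w.lock);
    w.quit = true;                                 /* drain remaining items, then exit */
    pthread_cond_broadcast(&w.cv);
    pthread_mutex_unlock(&w.lock);
    pthread_join(t, NULL);
    pthread_cond_destroy(&w.cv);
    pthread_mutex_destroy(&w.lock);
    return w.done;
}
```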
If you always have the same work to do every cycle, and you need to wait for all the work to finish before the next cycle starts, then you're thinking about the right solution.
You'll need some synchronization objects: a "start of frame semaphore", an "end of frame semaphore", and an "end of frame event". If you have n independent tasks each frame, start n threads, with loops that look like this (pseudocode):
while true:
    wait on "start of frame semaphore"
    <do work>
    enter lock
    decrement "worker count"
    if "worker count" = 0 then set "end of frame event"
    release lock
    wait on "end of frame semaphore"
You can then have a controller thread run:
while true:
    set "worker count" to n
    increment "start of frame semaphore" by n
    wait on "end of frame event"
    increment "end of frame semaphore" by n
This will work well for small n. If the number of tasks you need to complete each cycle becomes large, you will probably want to use a thread pool coupled with a task queue, so that you don't overwhelm the system with threads. But that solution brings more complexity, and with threading, complexity is the enemy.
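For comparison, here is a sketch of the same start-of-frame / end-of-frame structure using POSIX barriers (pthread_barrier_t, available on Linux but not on macOS) in place of the hand-rolled semaphores, counter and event. This is a swapped-in primitive, not the answer's exact scheme: each barrier is sized for the n workers plus the controller, and it resets itself automatically between frames. The worker and frame counts are arbitrary demo values.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>

#define N_WORKERS 4
#define N_FRAMES  3

static pthread_barrier_t start_of_frame;  /* N_WORKERS + controller */
static pthread_barrier_t end_of_frame;    /* N_WORKERS + controller */
static int work_done[N_WORKERS];          /* each worker writes only its own slot */

static void *worker(void *arg)
{
    int id = (int)(intptr_t)arg;
    for (int f = 0; f < N_FRAMES; f++) {
        pthread_barrier_wait(&start_of_frame);  /* sleep until the frame starts */
        work_done[id]++;                        /* <do work> */
        pthread_barrier_wait(&end_of_frame);    /* report that this task is done */
    }
    return NULL;
}

/* Controller: runs N_FRAMES frames and returns the total number of task runs. */
int run_frames(void)
{
    pthread_t t[N_WORKERS];
    pthread_barrier_init(&start_of_frame, NULL, N_WORKERS + 1);
    pthread_barrier_init(&end_of_frame, NULL, N_WORKERS + 1);
    for (int i = 0; i < N_WORKERS; i++)   /* threads are created only once */
        pthread_create(&t[i], NULL, worker, (void *)(intptr_t)i);
    for (int f = 0; f < N_FRAMES; f++) {
        pthread_barrier_wait(&start_of_frame);  /* release all workers for frame f */
        pthread_barrier_wait(&end_of_frame);    /* returns once every worker is done */
        /* the whole frame's work is complete at this point */
    }
    for (int i = 0; i < N_WORKERS; i++)
        pthread_join(t[i], NULL);
    pthread_barrier_destroy(&start_of_frame);
    pthread_barrier_destroy(&end_of_frame);
    int total = 0;
    for (int i = 0; i < N_WORKERS; i++)
        total += work_done[i];
    return total;
}
```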
The best is probably to use a task queue.
Task queues can be seen as threads waiting for a job to be submitted to them. If there are many sent at once, they are executed in FIFO order.
That way, you maintain 4-5 threads, and each of them executes the job you feed them, without needing to detach a new thread for each job.
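A minimal sketch of a fixed-size pool with a FIFO task queue in plain C with POSIX threads, for comparison with the libraries mentioned (the `pool`/`job` types and `pool_*` functions are hypothetical, not Grand Central Dispatch's API). Idle workers sleep on a condition variable; each submitted job is appended to a linked list and run by whichever worker dequeues it first.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

#define POOL_MAX_THREADS 8

typedef struct job {
    void (*fn)(void *);           /* the task to run */
    void *arg;
    struct job *next;
} job;

typedef struct {
    pthread_mutex_t lock;         /* protects everything below */
    pthread_cond_t has_work;
    job *head, *tail;             /* FIFO queue: pop at head, push at tail */
    bool shutting_down;
    pthread_t threads[POOL_MAX_THREADS];
    int nthreads;
} pool;

static void *pool_worker(void *arg)
{
    pool *p = arg;
    for (;;) {
        pthread_mutex_lock(&p->lock);
        while (p->head == NULL && !p->shutting_down)
            pthread_cond_wait(&p->has_work, &p->lock);   /* idle: sleeps */
        if (p->head == NULL) {                            /* shutting down, queue drained */
            pthread_mutex_unlock(&p->lock);
            return NULL;
        }
        job *j = p->head;                                 /* dequeue in FIFO order */
        p->head = j->next;
        if (p->head == NULL)
            p->tail = NULL;
        pthread_mutex_unlock(&p->lock);
        j->fn(j->arg);                                    /* run the job outside the lock */
        free(j);
    }
}

void pool_start(pool *p, int nthreads)
{
    if (nthreads > POOL_MAX_THREADS)
        nthreads = POOL_MAX_THREADS;
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->has_work, NULL);
    p->head = p->tail = NULL;
    p->shutting_down = false;
    p->nthreads = nthreads;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&p->threads[i], NULL, pool_worker, p);
}

int pool_submit(pool *p, void (*fn)(void *), void *arg)
{
    job *j = malloc(sizeof *j);
    if (j == NULL)
        return -1;
    j->fn = fn;
    j->arg = arg;
    j->next = NULL;
    pthread_mutex_lock(&p->lock);
    if (p->tail)
        p->tail->next = j;
    else
        p->head = j;
    p->tail = j;
    pthread_cond_signal(&p->has_work);   /* wake one idle worker */
    pthread_mutex_unlock(&p->lock);
    return 0;
}

/* Lets queued jobs finish, then joins all workers. */
void pool_stop(pool *p)
{
    pthread_mutex_lock(&p->lock);
    p->shutting_down = true;
    pthread_cond_broadcast(&p->has_work);
    pthread_mutex_unlock(&p->lock);
    for (int i = 0; i < p->nthreads; i++)
        pthread_join(p->threads[i], NULL);
    pthread_cond_destroy(&p->has_work);
    pthread_mutex_destroy(&p->lock);
}
```

Jobs are dequeued in FIFO order, but with several workers they may finish in any order.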
The only problem is that I don't know many implementations of task queues in C. Apple has Grand Central Dispatch, which does just that, and FreeBSD has an implementation of it too. Apart from those, I don't know of any others. (I didn't look very hard, though.)
Your idea is known as a thread pool. Thread pools are found in WinAPI, Intel TBB and Visual Studio's ConcRT. I don't know much about POSIX and therefore cannot help you there, but thread pools are an excellent structure with many desirable properties, such as excellent scaling if the work being posted can be split up.
However, I wouldn't trivialize the time the work takes. If you have five tasks, and you have a performance issue so desperate that multiple threads are the key, then creating the threads is almost certainly a negligible problem.
