I am studying how to write a shell in C, and I have come across a method that uses a "busy loop around the sleep function when implementing the wait command". The loop in question is a while(1) loop. I suppose it loops unconditionally, and hence takes up some processing time and space? What exactly is the purpose of a busy loop? Also, if the only objective of a busy loop is to loop unconditionally, couldn't we use any other form of loop, like for(;;), instead of while(1)?
A busy loop is a loop which purposely wastes time waiting for something to happen. Normally, you would want to avoid busy loops at all costs, as they consume CPU time doing nothing and therefore are a waste of resources, but there are rare cases in which they might be needed.
One of those cases is indeed when you need to sleep for a long amount of time and you have things like signal handlers installed that could interrupt sleeping. However, a "sleep busy loop" is hardly a busy loop at all, since almost all the time is spent sleeping.
You can build a busy loop with any loop construct you prefer; after all, for, while, do ... while and goto are all interchangeable constructs in C, given the appropriate control code.
Here's an example using clock_nanosleep:
// I want to sleep for 10 seconds, but I cannot do it just with a single
// syscall as it might get interrupted, I need to continue requesting to
// sleep until the entire 10 seconds have elapsed.
struct timespec requested = { .tv_sec = 10, .tv_nsec = 0 };
struct timespec remaining;
int err;

for (;;) {
    err = clock_nanosleep(CLOCK_MONOTONIC, 0, &requested, &remaining);

    if (err == 0) {
        // We're done sleeping
        break;
    }

    if (err != EINTR) {
        // Some error occurred, check the value of err
        // Handle err somehow
        break;
    }

    // err == EINTR, we did not finish sleeping all the requested time
    // Just keep going...
    requested = remaining;
}
An actual busy loop would look something like the following, where var is supposedly some sort of atomic variable set by somebody else (e.g. another thread):
while (var != 1);

// or equivalent

while (1) {
    if (var == 1)
        break;
}
Needless to say, this is the kind of loop that you want to avoid, as it continuously checks a condition, wasting CPU time. A better implementation would use signals, pthread condition variables, semaphores, etc. There are usually plenty of different ways to avoid busy looping.
Finally, note that in the above case, as @einpoklum says in the comments, the compiler may "optimize" the entire loop body away by dropping the check for var, unless it has some idea that it might change. A volatile qualifier can help, but it really depends on the scenario; don't take the above code as anything other than a silly example.
Related
My question is about exactly when the I/O thread in an asynchronous I/O call returns when a call back function is involved. Specifically, given this very general code for reading a file ...
#include <stdio.h>
#include <aio.h>
...

// callback function:
void finish_aio(sigval_t sigval) {
    /* do stuff ... maybe close the file */
}

int main() {
    struct aiocb my_aiocb;
    int aio_return;
    ...
    //Open file, take care of any other prelims, then
    //Fill in file-specific info for my_aiocb, then
    //Fill in callback information for my_aiocb:
    my_aiocb.aio_sigevent.sigev_notify = SIGEV_THREAD;
    my_aiocb.aio_sigevent.sigev_notify_function = finish_aio;
    my_aiocb.aio_sigevent.sigev_notify_attributes = NULL;
    my_aiocb.aio_sigevent.sigev_value.sival_ptr = &info_on_file;
    // then read the file:
    aio_return = aio_read(&my_aiocb);
    // do stuff that doesn't need data that is being read ...
    // then block execution until read is complete:
    while (aio_error(&my_aiocb) == EINPROGRESS) {}
    // etc.
}
I understand that the callback function is called as soon as the read of the file is completed. But what exactly happens then? Does the I/O thread start running the callback finish_aio()? Or does it spawn a new thread to handle that callback, while it returns to the main thread? Another way to put this would be: When does aio_error(&my_aiocb) stop returning EINPROGRESS? Is it just before the call to the callback, or when the callback is completed?
I understand that the callback function is called as soon as the read of the file is completed. But what exactly happens then?
What happens is that when the IO finishes it "behaves as if" it started a new thread (similar to calling pthread_create(&ignored, NULL, finish_aio, &info_on_file)).
When does aio_error(&my_aiocb) stop returning EINPROGRESS?
I'd expect that aio_error(&my_aiocb) stops returning EINPROGRESS as soon as the IO finishes, then the system (probably the standard library) either begins creating a new thread to call finish_aio() or "unblocks" a "previously created without you knowing" thread. However, I don't think the exact order is documented anywhere ("implementation defined") because it doesn't make much sense to call aio_error(&my_aiocb) from anywhere other than the finish_aio() anyway.
More specifically: if you're using polling (my_aiocb.aio_sigevent.sigev_notify = SIGEV_NONE) then you'd repeatedly check aio_error(&my_aiocb) yourself, and you can't care whether you're notified before or after this happens because you're not notified at all; and if you aren't using polling, you'd wait until you are notified (via a new thread or a signal) that there's a reason to check aio_error(&my_aiocb).
In other words, your finish_aio() would look more like this:
void finish_aio(sigval_t sigval) {
    struct aiocb *my_aiocb = sigval.sival_ptr;
    int status;

    status = aio_error(my_aiocb);
    /* Figure out what to do (handle the error or handle the file's data) */
}
And for main(), the while(aio_error(&my_aiocb) == EINPROGRESS) loop (which may waste a huge amount of CPU time for nothing) would be deleted and/or possibly replaced with something else (e.g. a pthread_cond_wait() to wait until the code in finish_aio() does a pthread_cond_signal() to tell the main thread it can continue).
To understand this, let's take a look at what pure polling would look like:
int main() {
    struct aiocb my_aiocb;
    int aio_return;
    sigval_t sv;                                      /* ADDED! */
    ...
    //Open file, take care of any other prelims, then
    //Fill in file-specific info for my_aiocb, then
    my_aiocb.aio_sigevent.sigev_notify = SIGEV_NONE;  /* CHANGED! */
    // my_aiocb.aio_sigevent.sigev_notify_function = finish_aio;
    // my_aiocb.aio_sigevent.sigev_notify_attributes = NULL;
    // my_aiocb.aio_sigevent.sigev_value.sival_ptr = &info_on_file;
    // then read the file:
    aio_return = aio_read(&my_aiocb);
    // do stuff that doesn't need data that is being read ...
    // then block execution until read is complete:
    while (aio_error(&my_aiocb) == EINPROGRESS) {}
    sv.sival_ptr = &info_on_file;                     /* ADDED! */
    finish_aio(sv);                                   /* ADDED! */
}
In this case it behaves almost the same as your original code, except that there's no extra thread (and you can't care if the "thread that doesn't exist" is started before or after aio_error(&my_aiocb) returns a value other than EINPROGRESS).
The problem with pure polling is that the while(aio_error(&my_aiocb) == EINPROGRESS) could waste a huge amount of CPU time constantly checking when nothing has happened yet.
The main purpose of using my_aiocb.aio_sigevent.sigev_notify = SIGEV_THREAD is to avoid wasting a possibly huge amount of CPU time polling when nothing changed (not forgetting that in some cases wasting CPU time polling like this can prevent other threads, including the finish_aio() thread, from getting CPU time). In other words, you want to delete the while(aio_error(&my_aiocb) == EINPROGRESS) loop, so you used SIGEV_THREAD so that you can delete that polling loop.
The new problem is that (if the main thread has to wait until the data is ready) you need some other way for the main thread to wait until the data is ready. However, typically it's not "the aio_read() completed" that you actually care about, it's something else. For example, maybe the raw file data is a bunch of values in a text file (like "12, 34, 56, 78") and you want to parse that data and create an array of integers, and want to notify the main thread that the array of integers is ready (and don't want to notify the main thread if you're starting to parse the file's data). It might be like:
int parsed_file_result = 0;

void finish_aio(sigval_t sigval) {
    struct aiocb *my_aiocb = sigval.sival_ptr;
    int status;

    status = aio_error(my_aiocb);
    close(my_aiocb->aio_fildes);
    if (status == 0) {
        /* Read was successful */
        parsed_file_result = parse_file_data();  /* Create the array of integers */
    } else {
        /* Read failed, handle the error somehow */
        parsed_file_result = -1;  /* Tell the main thread we failed to create the array of integers */
    }
    /* Tell the main thread it can continue somehow */
}
One of the best ways to tell the main thread it can continue (at the end of finish_aio()) is to use pthread condition variables (e.g. pthread_cond_signal() called at the end of finish_aio(), with pthread_cond_wait() in the main thread). In this case the main thread will simply block (the kernel/scheduler will not give it any CPU time until pthread_cond_signal() is called), so it wastes no CPU time polling.
Sadly, pthread condition variables aren't trivial (they require a mutex, initialization, etc.), and teaching/showing their use here would stray a little too far from the original topic. Fortunately, you shouldn't have much trouble finding a good tutorial elsewhere.
The important part is that if you used SIGEV_THREAD (so that you can delete that awful while(aio_error(&my_aiocb) == EINPROGRESS) polling loop) you're left with no reason to call aio_error(&my_aiocb) until after the finish_aio() has already been started; and no reason to care if aio_error(&my_aiocb) would've been changed (or not) before finish_aio() is started.
I have a thread running in a while loop with a condition, and it sleeps for 2 minutes on each iteration, i.e.:
while (condition) {
    //do something
    sleep(120);
}
To terminate the thread gracefully, I set the while condition to false (e.g. someflag = 0) and used pthread_join().
This works to terminate the thread, but if the thread is sleeping, it doesn't terminate until it finishes sleeping.
This is the problem I need to avoid; I need the thread to come out early even if it is sleeping.
None of the above. Instead of while (condition) sleep(120); you should be using a condition variable:
while (condition) {
    ...
    pthread_cond_timedwait(&condvar, &mutex, &abstime);
    ...
}
I chose pthread_cond_timedwait assuming you actually need to wake up and do something every 120 seconds even if nobody signals you, but if not you could just use pthread_cond_wait instead. The signaling thread needs to call pthread_cond_signal(&condvar) after changing the condition, and of course all access (reads and writes) to the state the condition depends on need to be protected by a mutex, mutex. You have to hold the mutex while calling pthread_cond_[timed]wait. If you have further questions on how to use condition variables, search the existing questions/answers (there are lots) or ask a follow-up.
This may not be the right answer, but I can suggest a workaround: break the 120-second sleep() into smaller intervals, such as 2 seconds, inside a loop. Each time the inner loop executes, you can check the condition, e.g.
while (condition)
{
    //do something
    int i = 0;
    while (condition && (60 > i))
    {
        sleep(2);
        i++;
    }
}
I hope someone will post a better answer.
I'm getting started with C Windows programming, in particular threads, concurrency and synchronization.
To experiment, I'm writing a C program that accepts N parameters.
Each parameter indicates a path to a file system directory tree and the program has to compare the content of all directories to decide whether all directories have the same content or not.
The main runs a "reading" thread for each parameter while a single "comparison" thread compares the name of all the entries found. For each file/directory found, "reading" threads synchronize themselves by activating the "comparison" thread.
I wrote the program with Semaphore objects and now I'm trying with Event objects.
The idea is to use N auto-reset Events and a single manual-reset Event.
The N events are used by the N "reading" threads to signal the "comparison" thread, which waits in WaitForMultipleObjects for an INFINITE time. When all the signals are available, it starts comparing the entries and then performs a SetEvent() on the manual-reset object.
The "reading" threads wait for this set, then Reset the event and continue working on the next entry.
Some code for the N reading threads:
void ReadingTraverseDirectory(LPTSTR StartPathName, DWORD i) {
    //variables and some work
    do {
        //take the next entry and put it in current_entry;
        gtParams[i].entry = current_entry;                //global var for comparison
        SetEvent(glphReadingEvent[i]);                    //signal the comparison thread
        WaitForSingleObject(ghComparisonEvent, INFINITE); //wait for the signal to restart working
        ResetEvent(ghComparisonEvent);                    //reset the event
        if (current_entry == TYPE_DIR) {
            ReadingTraverseDirectory(current_entry, i);   //recurse to explore the next dir
        }
    } while (FindNextFile(SearchHandle, &FindData));      //while there are still entries
    //
    return;
}
Some code for the comparison thread:
DWORD WINAPI CompareThread(LPVOID arg) {
    while (entries are equal) {
        WaitForMultipleObjects(N, glphReadingEvent, TRUE, 1000);
        for (r = 0; r < nworkers - 1; r++) {
            if (_tcscmp(entries) != 0) {
                //entries are different. exit and close.
            }
        }
        SetEvent(ghComparisonEvent);
    }
}
The problem:
Sometimes one reading thread manages to keep working without respecting the synchronization with the other threads. If I put a printf() or a Sleep(1) between the Wait and the Set in the comparison thread, the program works perfectly.
My opinion:
I think the manual-reset Event is not safe for this kind of (barrier) synchronization.
A reading thread may be too quick to ResetEvent(): if the scheduler slows down the other threads, some of them risk staying blocked while the one that performed the Reset continues its work. However, if that were the case, the comparison thread should block in WaitForMultipleObjects, causing a deadlock... yet there is no deadlock; one thread is simply able to cycle more times than the others.
What I'm trying to understand is why a simple Sleep(1) can solve the issue. Is it a matter of scheduling, or a wrong implementation of the synchronization?
Thank you.
This isn't a technical question, but a conceptual one. My program needs to handle several tasks in background. In my case, I consider threads more appropriate than processes for several reasons :
Background tasks aren't heavy, but they have to be processed regularly.
All threads need to manipulate a shared resource. Complete processes would require setting up a shared memory segment, which isn't appropriate in my case (the resource doesn't have a fixed size). Of course, this resource is protected by a mutex.
Another thing I take into consideration is that the main() function needs to be able to end all background tasks whenever it wants to (which means joining threads).
Now, here are two implementations :
1 thread, looping inside.
void *my_thread_func(void *shared_ressource)
{
    while (1) {
        do_the_job();
        sleep(5);
    }
}

// main()
pthread_create(&my_thread, NULL, my_thread_func, (void *)&shared_ressource);
pthread_kill(my_thread, 15);
// pthread_cancel(my_thread);
pthread_join(my_thread, NULL);
Note : In this case, main() needs to signal (or cancel) the thread before joining, otherwise it'll hang. This can be dangerous if the thread doesn't get time to sem_post before it gets terminated.
n threads, looping outside.
void *my_thread_func(void *shared_ressource)
{
    do_the_job();
}

// main()
while (1) {
    pthread_create(&my_thread, NULL, my_thread_func, (void *)&shared_ressource);
    pthread_join(my_thread, NULL);
    sleep(5);
}
Note : In this case, main() wouldn't naturally hang on pthread_join, it would just have to kill its own continuous loop (using a "boolean" for instance).
Now, I would like some help comparing those two. Threads are lightweight structures, but is repeatedly spawning them too heavy in the second implementation? Or is the infinite loop holding the thread when it shouldn't? At the moment, I prefer the second implementation because it protects the semaphore: threads do not terminate before they sem_post it. My concern here is optimisation, not functionality.
Having your background threads continuously spawning and dying tends to be inefficient. It is usually much better to have some number of threads stay alive, servicing the background work as it becomes available.
However, it's often better to avoid thread cancellation, too. Instead, I advise using a condition variable and exit flag:
void *my_thread_func(void *shared_resource)
{
    struct timespec timeout;

    pthread_mutex_lock(&exit_mutex);
    do
    {
        pthread_mutex_unlock(&exit_mutex);

        do_the_job();

        clock_gettime(CLOCK_REALTIME, &timeout);
        timeout.tv_sec += 5;

        pthread_mutex_lock(&exit_mutex);
        if (!exit_flag)
            pthread_cond_timedwait(&exit_cond, &exit_mutex, &timeout);
    } while (!exit_flag);
    pthread_mutex_unlock(&exit_mutex);
}
When the main thread wants the background thread to exit, it sets the exit flag and signals the condition variable:
pthread_mutex_lock(&exit_mutex);
exit_flag = 1;
pthread_cond_signal(&exit_cond);
pthread_mutex_unlock(&exit_mutex);
pthread_join(my_thread, NULL);
(You should actually strongly consider using CLOCK_MONOTONIC instead of the default CLOCK_REALTIME, because the former isn't affected by changes to the system clock. This requires using pthread_condattr_setclock() and pthread_cond_init() to set the clock used by the condition variable.)
I am just starting to look into multi-threaded programming and thread safety. I am familiar with busy-waiting and after a bit of research I am now familiar with the theory behind spin locks, so I thought I would have a look at OSSpinLock's implementation on the Mac. It boils down to the following function (defined in objc-os.h):
static inline void ARRSpinLockLock(ARRSpinLock *l)
{
again:
    /* ... Busy-waiting ... */
    thread_switch(THREAD_NULL, SWITCH_OPTION_DEPRESS, 1);
    goto again;
}
(Full implementation here)
After doing a bit of digging, I now have an approximate idea of what thread_switch's parameters do (this site is where I found it). My interpretation of what I have read is that this particular call to thread_switch will switch to the next available thread, and decrease the current thread's priority to an absolute minimum for 1 cycle. 'Eventually' (in CPU time) this thread will become active again and immediately execute the goto again; instruction which starts the busy-waiting all over again.
My question though, is why is this call actually necessary? I found another implementation of a spin-lock (for Windows this time) here and it doesn't include a (Windows-equivalent) thread switching call at all.
You can implement a spin lock in many different ways. If you find another spin lock implementation for Windows, you'll see another algorithm for that (it may involve SetThreadPriority, Sleep or SwitchToThread).
The default implementation of ARRSpinLockLock is clever enough: after one first spinning cycle it "depresses" the thread priority for a while, which has the following advantages:
it gives more opportunities to the thread that owns the lock to release it;
it wastes less CPU time (and power!) performing NOP or PAUSE.
The Windows implementation doesn't do this because the Windows API doesn't offer that opportunity (there is no equivalent of the thread_switch() function, and multiple calls to SetThreadPriority could be less efficient).
I actually don't think they're that different. In the first case:
static inline void ARRSpinLockLock(ARRSpinLock *l)
{
    unsigned y;
again:
    if (__builtin_expect(__sync_lock_test_and_set(l, 1), 0) == 0) {
        return;
    }
    for (y = 1000; y; y--) {
#if defined(__i386__) || defined(__x86_64__)
        asm("pause");
#endif
        if (*l == 0) goto again;
    }
    thread_switch(THREAD_NULL, SWITCH_OPTION_DEPRESS, 1);
    goto again;
}
We try to acquire the lock. If that fails, we spin in the for loop, and if the lock becomes available in the meantime we immediately try to reacquire it; if not, we relinquish the CPU.
In the other case:
inline void Enter(void)
{
    int prev_s;
    do
    {
        prev_s = TestAndSet(&m_s, 0);
        if (m_s == 0 && prev_s == 1)
        {
            break;
        }

        // relinquish current timeslice (can only
        // be used when OS available and
        // we do NOT want to 'spin')
        // HWSleep(0);
    }
    while (true);
}
Note the comment below the if, which actually says that we could either spin or relinquish the CPU if the OS gives us that option. In fact the second example seems to just leave that part up to the programmer [insert your preferred way of continuing the code here], so in a sense it's not a complete implementation like the first one.
My take on the whole thing (commenting on the first snippet) is that they're trying to strike a balance between being able to get the lock fast (within 1000 iterations) and not hogging the CPU too much (hence we eventually switch away if the lock does not become available).