Unpredictable results with two threads calling a function - c

I'm hoping someone can help me solve some unpredictable behaviour in a C program I need to fix:
I have two Xenomai real-time tasks (threads) that wait until they receive an incoming message from one of two CAN buses.
Each task calls a function checkMessageNumber() however I'm getting unpredictable results.
Please note that I am using a priority based, single-threaded system. One thread has priority over the other, however one thread could be part-way through executing when the other thread takes priority.
In the future it is possible that the hardware could be upgraded to a multi-threading system, however this part of the program would still be confined to a single thread (one CPU core).
It is my understanding that each thread would invoke its own instance of this function, so I don't know what's happening.
int getMessageIndex(unsigned int msg_number)
{
unsigned int i = 0;
while(i < global_number_message_boxes)
{
if (global_message_box[i].id == msg_number)
return i; // matched the msg number, so return the index number
i++;
}
return -1; // no match found
}
Originally this function was highly unpredictable, and as messages streamed in and were processed by the two tasks (depending on which hardware bus the message came from), this function would sometimes return -1 even though the incoming 'msg_number' did match an 'id' in the 'global_message_box' struct.
I was able to make it work better by setting 'global_number_message_boxes' to an integer:
eg. while(i < 50)
however the function still sometimes returns -1 even though there should be a match.
I am only reading global variables, so why are they getting corrupted? What do I need to learn about this?
My idea is to simplify things so the incoming 'msg_number' simply just is the 'id' in the 'global_message_box'.
Each thread will then write to the struct directly without having to check which 'id' to write to.
How important is it to use a mutex? Due to the system design, each thread will never write to the same part of the struct, so I am unsure whether it's important.
Thanks.

This likely comes down to a lack of thread synchronisation around the global struct. You say this function is just reading. Sure, but what if another thread calls another function that writes global_number_message_boxes or global_message_box? In a system where you have globals and multiple threads accessing them, the safest rule is: put a lock around every access. Maybe the platform you use even supports read/write locks, so multiple threads can read at the same time as long as none is writing.

Locks and semaphores will be your friends here. Writing data using two threads is going to cause any number of problems.
When the thread enters the function, you will have to BLOCK the other threads and UNBLOCK those threads at exit. This will ensure thread-safe operations and produce consistent results.

Related

How to use sched_yield() properly?

For an assignment, I need to use sched_yield() to synchronize threads. I understand a mutex lock/condition variables would be much more effective, but I am not allowed to use those.
The only functions we are allowed to use are sched_yield(), pthread_create(), and pthread_join(). We cannot use mutexes, locks, semaphores, or any type of shared variable.
I know sched_yield() is supposed to relinquish access to the thread so another thread can run. So it should move the thread it executes on to the back of the running queue.
The code below is supposed to print 'abc' in order and then the newline after all three threads have executed. I looped sched_yield() in functions b() and c() because it wasn't working as I expected, but I'm pretty sure all that is doing is delaying the printing because a function is running so many times, not because sched_yield() is working.
The server it needs to run on has 16 CPUs. I saw somewhere that sched_yield() may immediately assign the thread to a new CPU.
Essentially I'm unsure of how, using only sched_yield(), to synchronize these threads given everything I could find and troubleshoot with online.
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#include <sched.h>
void* a(void*);
void* b(void*);
void* c(void*);
int main( void ){
pthread_t a_id, b_id, c_id;
pthread_create(&a_id, NULL, a, NULL);
pthread_create(&b_id, NULL, b, NULL);
pthread_create(&c_id, NULL, c, NULL);
pthread_join(a_id, NULL);
pthread_join(b_id, NULL);
pthread_join(c_id, NULL);
printf("\n");
return 0;
}
void* a(void* ret){
printf("a");
return ret;
}
void* b(void* ret){
for(int i = 0; i < 10; i++){
sched_yield();
}
printf("b");
return ret;
}
void* c(void* ret){
for(int i = 0; i < 100; i++){
sched_yield();
}
printf("c");
return ret;
}
There are 4 cases:
a) the scheduler doesn't use multiplexing (e.g. doesn't use "round robin" but uses "highest priority thread that can run does run", or "earliest deadline first", or ...) and sched_yield() does nothing.
b) the scheduler does use multiplexing in theory, but you have more CPUs than threads so the multiplexing doesn't actually happen, and sched_yield() does nothing. Note: With 16 CPUs and 2 threads, this is likely what you'd get for "default scheduling policy" on an OS like Linux - the sched_yield() just does a "Hrm, no other thread I could use this CPU for, so I guess the calling thread can keep using the same CPU!").
c) the scheduler does use multiplexing and there's more threads than CPUs, but to improve performance (avoid task switches) the scheduler designer decided that sched_yield() does nothing.
d) sched_yield() does cause a task switch (yielding the CPU to some other task), but that is not enough to do any kind of synchronization on its own (e.g. you'd need an atomic variable or something for the actual synchronization - maybe like "while( atomic_variable_not_set_by_other_thread ) { sched_yield(); }"). Note that with an atomic variable (introduced in C11) it'd work without sched_yield() - the sched_yield() (if it does anything) merely makes busy waiting less awful/wasteful.
Essentially I'm unsure of how, using only sched_yield(), to
synchronize these threads given everything I could find and
troubleshoot with online.
That would be because sched_yield() is not well suited to the task. As I wrote in comments, sched_yield() is about scheduling, not synchronization. There is a relationship between the two, in the sense that synchronization events affect which threads are eligible to run, but that goes in the wrong direction for your needs.
You are probably looking at the problem from the wrong end. You need each of your threads to wait to execute until it is their turn, and for them to do that, they need some mechanism to convey information among them about whose turn it is. There are several alternatives for that, but if "only sched_yield()" is taken to mean that no library functions other than sched_yield() may be used for that purpose then a shared variable seems the expected choice. The starting point should therefore be how you could use a shared variable to make the threads take turns in the appropriate order.
Flawed starting point
Here is a naive approach that might spring immediately to mind:
/* FLAWED */
void *b(void *data){
char *whose_turn = data;
while (*whose_turn != 'b') {
// nothing?
}
printf("b");
*whose_turn = 'c';
return NULL;
}
That is, the thread executes a busy loop, monitoring the shared variable to await it taking a value signifying that the thread should proceed. When it has done its work, the thread modifies the variable to indicate that the next thread may proceed. But there are several problems with that, among them:
1. Supposing that at least one other thread writes to the object designated by *whose_turn, the program contains a data race, and therefore its behavior is undefined. As a practical matter, a thread that once entered the body of the loop in that function might loop infinitely, notwithstanding any action by other threads.
2. Without making additional assumptions about thread scheduling, such as a fairness policy, it is not safe to assume that the thread that will make the needed modification to the shared variable will be scheduled in bounded time.
3. While a thread is executing the loop in that function, it prevents any other thread from executing on the same core, yet it cannot make progress until some other thread takes action. To the extent that we can assume preemptive thread scheduling, this is an efficiency issue and contributory to (2). However, if we assume neither preemptive thread scheduling nor the threads being scheduled each on a separate core then this is an invitation to deadlock.
Possible improvements
The conventional and most appropriate way to do that in a pthreads program is with the use of a mutex and condition variable. Properly implemented, that resolves the data race (issue 1) and it ensures that other threads get a chance to run (issue 3). If that leaves no other threads eligible to run besides the one that will modify the shared variable then it also addresses issue 2, to the extent that the scheduler is assumed to grant any CPU to the process at all.
But you are forbidden to do that, so what else is available? Well, you could make the shared variable _Atomic. That would resolve the data race, and in practice it would likely be sufficient for the wanted thread ordering. In principle, however, it does not resolve issue 3, and as a practical matter, it does not use sched_yield(). Also, all that busy-looping is wasteful.
But wait! You have a clue in that you are told to use sched_yield(). What could that do for you? Suppose you insert a call to sched_yield() in the body of the busy loop:
/* (A bit) better */
void* b(void *data){
char *whose_turn = data;
while (*whose_turn != 'b') {
sched_yield();
}
printf("b");
*whose_turn = 'c';
return NULL;
}
That resolves issues 2 and 3, explicitly affording the possibility for other threads to run and putting the calling thread at the tail of the scheduler's thread list. Formally, it does not resolve issue 1 because sched_yield() has no documented effect on memory ordering, but in practice, I don't think it can be implemented without a (full) memory barrier. If you are allowed to use atomic objects then combining an atomic shared variable with sched_yield() would tick all three boxes. Even then, however, there would still be a bunch of wasteful busy-looping.
Final remarks
Note well that pthread_join() is a synchronization function, thus, as I understand the task, you may not use it to ensure that the main thread's output is printed last.
Note also that I have not spoken to how the main() function would need to be modified to support the approach I have suggested. Changes would be needed for that, and they are left as an exercise.

Race conditions can also occur in traditional, single-threaded programs - Clarity

I have read a few books on parallel programming over the past few months and I decided to close it off with learning about POSIX threads.
I am reading "PThreads programming - A Posix standard for better multiprocessing nutshell-handbook". In chapter 5 ( Pthreads and Unix ) the author talks about handling signals in multi-threaded programs. In the "Threadsafe Library Functions and System Calls" section, the author made a statement that I have not seen in most books that I have read on parallel programming. The statement was:
Race conditions can also occur in traditional, single-threaded programs that use signal handlers or that call routines recursively. A single-threaded program of this kind may have the same routine in progress in various call frames on its process stack.
I find this statement a little hard to decipher. Does the race condition in the recursive function occur when the recursive function keeps an internal structure by using the static storage type?
I would also love to know how signal handlers can cause a race condition in single-threaded programs.
Note: I am not a computer science student, so I would really appreciate simplified terms.
I don't think one can call it a race condition in the classical meaning. Race conditions have a somewhat stochastic behavior, depending on the scheduler policy and timings.
The author is probably talking about bugs that can arise when the same object/resource is accessed from multiple recursive calls. But this behavior is completely deterministic and manageable.
Signals, on the other hand, are a different story: they occur asynchronously and can interrupt some data processing in the middle and trigger other processing on the same data, corrupting it by the time control returns to the interrupted task.
A signal handler can be called at any time without warning, and it potentially can access any global state in the program.
So, suppose your program has some global flag, that the signal handler sets in response to,... I don't know,... SIGINT. And your program checks the flag before each call to f(x).
if (! flag) {
f(x);
}
That's a data race. There is no guarantee that f(x) will not be called after the signal happens because the signal could sneak in at any time, including right after the "main" program tests the flag.
First it is important to understand what a race condition is. The definition given by Wikipedia is:
Race conditions arise in software when an application depends on the sequence or timing of processes or threads for it to operate properly.
The important thing to note is that a program can behave both properly and improperly based on timing or ordering of execution.
We can fairly easily create "dummy" race conditions in single threaded programs under this definition.
bool isnow(time_t then) {
time_t now = time(0);
return now == then;
}
The above function is a very dumb example and while mostly it will not work, sometimes it will give the correct answer. The correct vs. incorrect behavior depends entirely on timing and so represents a race condition on a single thread.
Taking it a step further we can write another dummy program.
void printHello() {
sleep(10);
printf("Hello\n");
}
The expected behavior of the above program is to print "Hello" after waiting 10 seconds.
If we send a SIGINT signal 11 seconds after calling our function, everything behaves as expected. If we send a SIGINT signal 3 seconds after calling our function, the program behaves improperly and does not print "Hello".
The only difference between the correct and incorrect behavior was the timing of the SIGINT signal. Thus, a race condition was introduced by signal handling.
I'm going to give a more general answer than you asked for. And this is my own, personal, pragmatic answer, not necessarily one that hews to any official, formal definition of the term "race condition".
Me, I hate race conditions. They lead to huge classes of nasty bugs that are hard to think about, hard to find, and sometimes hard to fix. So I don't like doing programming that's susceptible to race conditions. So I don't do much classically multithreaded programming.
But even though I don't do much multithreaded programming, I'm still confronted by certain classes of what feel to me like race conditions from time to time. Here are the three I try to keep in mind:
The one you mentioned: signal handlers. Receipt of a signal, and calling of a signal handler, is a truly asynchronous event. If you have a data structure of some kind, and you're in the middle of modifying it when a signal occurs, and if your signal handler also tries to modify that same data structure, you've got a race condition. If the code that was interrupted was in the middle of doing something that left the data structure in an inconsistent state, the code in the signal handler might be confused. Note, too, that it's not necessarily code right in the signal handler, but any function called by the signal handler, or called by a function that's called by the signal handler, etc.
Shared OS resources, typically in the filesystem: If your program accesses (or modifies) a file or directory in the filesystem that's also being accessed or modified by another process, you've got a big potential for race conditions. (This is not surprising, because in a computer science sense, multiple processes are multiple threads. They may have separate address spaces meaning they can't interfere with each other that way, but obviously the filesystem is a shared resource where they still can interfere with each other.)
Non-reentrant functions like strtok. If a function maintains internal, static state, you can't have a second call to that function if another instance is active. This is not a "race condition" in the formal sense at all, but it has many of the same symptoms, and also some of the same fixes: don't use static data; do try to write your functions so that they're reentrant.
The author of the book you quoted seems to be defining the term "race condition" in an unusual manner, or maybe he's just used the wrong term.
By the usual definition, no, recursion does not create race conditions in single-threaded programs because the term is defined with respect to the respective actions of multiple threads of execution. It is possible, however, for a recursion to produce exposure to non-reentrancy of some of the functions involved. It's also possible for a single thread to deadlock against itself. These do not reflect race conditions, but perhaps one or both of them is what the author meant.
Alternatively, maybe what you read is the result of a bad editing job. The text you quoted groups functions that employ signal handling together with recursive functions, and signal handlers indeed can produce data races, just as multiple threads can, because execution of a signal handler has the relevant characteristics of execution of a separate thread.
Race conditions absolutely happen in single-threaded programs once you have signal handlers. Look at the Unix manual page for pselect().
One way it happens is like this: You have a signal handler that sets some global flag. You check your global flag and because it is clear you make a system call that suspends, confident that when the signal arrives the system call will exit early. But the signal arrives just after you check the global flag and just before the system call takes place. So now you're hung in a system call waiting for a signal that has already arrived. In this case, the race is between your single-threaded code and an external signal.
Well, consider the following code:
#include <pthread.h>
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
int num = 2;
void lock_and_call_again() {
pthread_mutex_lock(&mutex);
if(num > 0) {
--num;
lock_and_call_again();
}
}
int main(int argc, char** argv) {
lock_and_call_again();
}
(Compile with gcc -pthread thread-test.c if you save the code as thread-test.c)
This is clearly single-threaded, isn't it?
Nevertheless, it will deadlock, because you try to lock an already locked mutex.
That's basically what is meant within the paragraph you cited, IMHO:
It does not matter whether it is done in several threads or one single thread: if you try to lock an already locked mutex, your program will end in a deadlock.
If a function calls itself, like lock_and_call_again above, that is called a recursive call.
Just as james large explains, a signal can occur at any time, and if a signal handler is registered for that signal, it will be called at unpredictable times. If no measures are taken, that can happen even while the same handler is already executing, yielding a kind of implicit recursive execution of the signal handler.
If this handler acquires some kind of lock, you end up in a deadlock, even without a function calling itself explicitly.
Consider the following function:
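pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
/* Deliberately unsafe: pthread_mutex_lock() is not async-signal-safe. */
void my_handler(int s) {
    (void)s;
    pthread_mutex_lock(&mutex);
    sleep(10);
    pthread_mutex_unlock(&mutex);
}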
Now if you register this function for a particular signal, it will be called whenever the signal is caught by your program. If the handler has been called and sleeps, it might get interrupted, the handler called again, and the handler try to lock the mutex that is already locked.
Regarding the wording of the citation:
"A single-threaded program of this kind may have the same routine in progress in various call frames on its process stack."
When a function gets called, some information is stored on the process's stack - e.g. the return address. This information is called a call frame. If you call a function recursively, like in the example above, this information gets stored on the stack several times - several call frames are stored.
It's stated a little bit clumsily, I admit...

How to synchronize threads without blocking?

As far as I know, a mutex is used to synchronize threads that share the same data: while one thread is using the data, every other thread that tries to use that common resource is blocked until the mutex is unlocked. Recently I saw a blog post with code explaining this concept, and some people commented that blocking all the threads while one thread is accessing the resource is a very bad idea that goes against the concept of threading, which seems somewhat true. So my question is: how do you synchronize threads without blocking?
Here is the link of that blogpost
http://www.thegeekstuff.com/2012/05/c-mutex-examples/
You cannot synchronize threads without blocking by the very definition of synchronization. However, good synchronization technique will limit the scope of where things are blocked to the absolute minimum. To illustrate, and point out exactly why the article is wrong consider the following:
From the article:
pthread_t tid[2];
int counter;
pthread_mutex_t lock;
void* doSomeThing(void *arg)
{
pthread_mutex_lock(&lock);
unsigned long i = 0;
counter += 1;
printf("\n Job %d started\n", counter);
for(i=0; i<(0xFFFFFFFF);i++);
printf("\n Job %d finished\n", counter);
pthread_mutex_unlock(&lock);
return NULL;
}
What it should be:
pthread_t tid[2];
int counter;
pthread_mutex_t lock;
void* doSomeThing(void *arg)
{
unsigned long i = 0;
pthread_mutex_lock(&lock);
counter += 1;
int myJobNumber = counter;
pthread_mutex_unlock(&lock);
printf("\n Job %d started\n", myJobNumber);
for(i=0; i<(0xFFFFFFFF);i++);
printf("\n Job %d finished\n", myJobNumber);
return NULL;
}
Notice that in the article, the work being done (the pointless for loop) is done while holding the lock. This is complete nonsense, since the work is supposed to be done concurrently. The reason the lock is needed is only to protect the counter variable. Thus the threads only need to hold the lock when changing that variable as in the second example.
Mutex locks protect the critical section of code, which are those areas of code which only 1 thread at a time should touch - and all the other threads must block if trying to access the critical section at the same time. However, if thread 1 is in the critical section, and thread 2 is not, then it's perfectly fine for both to run concurrently.
The term you are looking for is lock free data structures.
General idea is that the state shared between threads is contorted into one of those.
Implementations of those vary and often are compiler or platform specific. For example MSVC has a set of _Interlocked* functions to perform simple atomic operations.
blocking all the threads while one thread is accessing the resources is a very bad idea and it goes against the concept of threading which is true somehow
This is a fallacy. Locks block only contending threads, allowing all non-contending threads to run concurrently. Running the work that's the most efficient to run at any particular time rather than forcing any particular ordering is not against the concept of threading at all.
Now if so many of your threads contend so badly that blocking contending threads is harming performance, there are two possibilities:
Most likely you have a very poor design and you should fix it. Don't blame the locks for a high-contention design.
You are in the rare case where other synchronization mechanisms are more appropriate (such as lock-free collections). But this requires significant expertise and analysis of the specific use case to find the best solution.
Generally, if your use case is a perfect fit for atomics, use them. Otherwise, mutexes (possibly in combination with condition variables) should be your first thought. That will cover 99% of the cases a typical multi-threaded C programmer will face.
You can use pthread_mutex_trylock() to attempt a lock. If that fails then you know you would have blocked. You can't do what you want to do, but your thread is not blocked, so it can attempt to do something else. I think most of the comments on that blog are about avoiding contention between threads though, i.e. that maximising multi-threaded performance is about avoiding threads working on the same resource at the same time. If you avoid that by design then by design you don't need locks as you never have contention.
There are a number of tricks that can be used to avoid concurrent bottlenecks.
Immutable Data Structures. The idea here is that concurrent reads are okay, but writes are not. To implement something like this you basically need to think of business units as factories to these immutable data structures which are used by other business units.
Asynchronous-Callbacks. This is the essence of event-driven development. If you have concurrent tasks, use the observer pattern to execute some logic when a resource becomes available. Basically we execute some code up until a shared resource is needed then add a listener for when the resource becomes available. This typically results in less readable code and heavier strain on the stack, but you never block a thread waiting on a resource. If you have the tasks ready to keep the CPUs running hot, this pattern will do it for you.
Even with these tools, you'll never completely remove the need for some synchronization (counters come to mind).

Do I really need mutex lock in this case?

Suppose we have three threads and a bool status_flag[500] array, in one of the following situations:
Two threads only write to the status_flag array, each at a different index, while the third thread only reads, at any index.
All three threads write at different indices, and all three threads read at any index.
The write operation only ever sets a flag; it never resets it:
status_flag [i] = true;
In reading operation we are doing something like that :
for(;;){ //spinning to get flag true
if(status_flag [i] == true){
//do_something ;
break;
}
}
What happens if the compiler optimizes the code (branch prediction)?
I have read a lot about locks but am still too confused to reach a conclusion. Please help me decide.
POSIX is quite clear on this:
Applications shall ensure that access to any memory location by more than one thread of control (threads or processes) is restricted such that no thread of control can read or modify a memory location while another thread of control may be modifying it.
So without locking you're not allowed to read memory that some other thread may be writing. Furthermore, that part of POSIX describes which function will synchronize the memory between the threads. Before both threads have called any of the functions listed in there, you have no guarantee that the changes made by one thread will be visible to the other thread.
If all the threads are operating on different index value, then you do not need a lock. Basically it is equivalent to using different variables.
In your code, the value of the variable i is never set or modified, so it reads only one particular index of the flag array. For writing you use a different index, so in this case there is no need for a lock.

How to get threads to execute in a certain order?

Almost every resource that I have looked up has talked about how to enforce mutual exclusion, or deal with the producer/consumer problem.
The problem is that I need to get certain threads to execute before other threads, but can't figure out how. I am trying to use semaphores, but don't really see how they can help in my case.
I have
a read thread,
N number of search threads, and
a write thread.
The read thread fills a buffer with data, then the search threads parse the data and outputs it to a different buffer, which the write thread then writes to a file.
Any idea as to how I would accomplish this?
I can post the code I have so far if anyone thinks that would help.
I think what you're looking for is a monitor
I would use a few condition variables.
You have read buffers. Probably two. If the search threads are slow you want read to wait rather than use all the memory on buffers.
So: A read_ready condition variable and a read_ready_mutex. Set to true when there is an open buffer.
Next: A search_ready condition variable and a search_ready_mutex. Set to true when there is a complete read buffer.
Next: A write_ready variable and a write_ready mutex. Set to true when there is work to do for the write thread.
Instead of true/false you could use integers of the number of buffers that are ready. As long as you verify the condition while the mutex is held and only modify the condition while the mutex is held, it will work.
[Too long for a comment]
Cutting this down to two assumptions:
Searching cannot be done before reading has finished.
Writing cannot be done before searching has finished.
I conclude:
Do not use threads for reading and writing, but do it from the main thread.
Just do the searching in parallel using threads.
Generally speaking, threads are used precisely when we don't care about the order of execution.
If you want to execute some statements S1, S2, ... , SN in that order, then you catenate them into a program that is run by a single thread: { S1; S2; ...; SN }.
The specific problem you describe can be solved with a synchronization primitive known as a barrier (implemented as the POSIX function pthread_barrier_wait).
A barrier is initialized with a number, the barrier count N. Threads which call the barrier wait operation are suspended, until N threads accumulate. Then they are all released. One of the threads receives a return value which tells it that it is the "serial thread".
So for instance, suppose we have N threads doing this read, process-in-parallel, and write sequence. It goes like this (pseudocode):
i_am_serial = barrier.wait(); # at first, everyone waits at the barrier
if (i_am_serial) # serial thread does the reading, preparing the data
do_read_task();
barrier.wait(); # everyone rendezvous at the barrier again
do_parallel_processing(); # everyone performs the processing on the data
i_am_serial = barrier.wait(); # rendezvous after parallel processing
if (i_am_serial)
do_write_report_task(); # serialized integration and reporting of results

Resources