Dynamically creating threads and passing an integer - c

I have a process that creates a number of threads based on an argument passed to the process.
pthread_t producer_threads[num_threads];
for (id = 0; id < num_threads; id++)
{
    printf("%d\n", id);
    pthread_create(&producer_threads[id], NULL, &produce, (void *) &id);
}
Each thread goes into a produce function and stores the id as a local variable
void* produce (void* args)
{
    int my_id = * (int*) args;
    printf("Thread %d started to produce\n", my_id);
    return NULL;
}
However the output I receive is as shown
0
1
Thread <n> started to produce
Thread <n> started to produce
where each n is randomly 0, 1, or 2. I am not sure what is causing the problem, unless it is because the global variable is being updated before it is assigned locally, or because the "local variable" is shared between threads.

The problem is that you're passing pointers to the same variable to each thread. This creates a race condition, whereby the value of the variable as seen by each thread depends on the exact timing.
If you were to pass the thread argument by value rather than by pointer, this would fix the problem.
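As a minimal sketch of passing by value, the id can be carried inside the pointer argument itself, assuming the platform provides intptr_t (POSIX systems do). NUM_THREADS, seen_ids and run_by_value_demo are hypothetical names added for the illustration, and error checks are omitted:

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_THREADS 4               /* hypothetical thread count for the sketch */

static int seen_ids[NUM_THREADS];   /* slot k is written only by thread k */

/* The id arrives inside the pointer value itself, so nothing is dereferenced. */
static void *produce(void *args)
{
    int my_id = (int)(intptr_t)args;
    printf("Thread %d started to produce\n", my_id);
    seen_ids[my_id] = my_id + 1;    /* record that this id was received */
    return NULL;
}

/* Starts the threads, joins them, and returns how many ids arrived intact. */
int run_by_value_demo(void)
{
    pthread_t producer_threads[NUM_THREADS];
    int id, ok = 0;

    for (id = 0; id < NUM_THREADS; id++)
        pthread_create(&producer_threads[id], NULL, produce,
                       (void *)(intptr_t)id);
    for (id = 0; id < NUM_THREADS; id++)
        pthread_join(producer_threads[id], NULL);
    for (id = 0; id < NUM_THREADS; id++)
        ok += (seen_ids[id] == id + 1);
    return ok;
}
```

Calling run_by_value_demo() from main should report every id exactly once, in whatever order the scheduler picks.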

Each thread needs its own integer, for example a heap-allocated one, rather than one shared stack variable. Since every thread receives a pointer to the same memory location on the stack, your results will depend on timing (i.e. there is a race condition). You need to pass a different variable to each pthread_create call.
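One way this could look, as a sketch rather than the only option: allocate a separate int for every thread and let the thread free it once it has copied the value. NUM_THREADS, started and run_heap_arg_demo are hypothetical names used only for this example:

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_THREADS 4              /* hypothetical thread count for the sketch */

static int started[NUM_THREADS];   /* slot k is written only by thread k */

static void *produce(void *args)
{
    int my_id = *(int *)args;      /* copy the id out of the private block */
    free(args);                    /* the thread owns the block, so it frees it */
    printf("Thread %d started to produce\n", my_id);
    started[my_id] = 1;
    return NULL;
}

/* Returns the number of threads that reported in with their own id. */
int run_heap_arg_demo(void)
{
    pthread_t producer_threads[NUM_THREADS];
    int created = 0, count = 0, id;

    for (id = 0; id < NUM_THREADS; id++) {
        int *arg = malloc(sizeof *arg);   /* one allocation per thread */
        if (arg == NULL)
            break;
        *arg = id;
        if (pthread_create(&producer_threads[id], NULL, produce, arg) != 0) {
            free(arg);
            break;
        }
        created++;
    }
    for (id = 0; id < created; id++)
        pthread_join(producer_threads[id], NULL);
    for (id = 0; id < NUM_THREADS; id++)
        count += started[id];
    return count;
}
```

An array with one element per thread would work just as well, as long as it outlives the threads that read it.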

Related

Do all threads have the same global variable?

I have a general question that occurred to me while trying to implement a thread synchronization problem with semaphores. I do not want to get into much (unrelated) detail, so I am going to give the code that I think is important to clarify my question.
sem_t *mysema;
volatile int counter;
struct my_info {
    pthread_t t;
    int id;
};
struct my_info *tinfo;
void *barrier(void *arg) {
    struct my_info *a = arg;
    int thrid = a->id;
    while (counter > 0) {
        do_work(&mysema[thrid]);
        sem_wait(&mysema[thrid]);
        display_my_work(arg);
        counter--;
        sem_post(&mysema[thrid + 1]);
    }
    return NULL;
}
int main(int argc, char *argv[]) {
    int i;
    int M = atoi(argv[1]);
    mysema = malloc(M * sizeof(*mysema));
    tinfo = malloc(M * sizeof(*tinfo));
    counter = 50;
    /* semaphore initialisations */
    for (i = 0; i < M; i++) {
        sem_init(&mysema[i], 0, 0);
    }
    for (i = 0; i < M; i++) {
        tinfo[i].id = i;
    }
    for (i = 0; i < M; i++) {
        pthread_create(&tinfo[i].t, NULL, barrier, &tinfo[i]);
    }
    /* init wakes up the first semaphore */
    sem_post(&mysema[0]);
    .
    .
    .
We have an array of M semaphores initialised to 0, where M is given on the command line by the user.
I know I am done when all M threads in total have done some necessary computations 50 times.
Each thread blocks itself until the previous thread "sem_post's" it. The very first thread will be woken up by init.
My question is whether the threads will stop when counter = 0. Do they all see the same variable - counter? (It is a global one, initialised in the main).
If thread zero makes counter = 49 the very first time, do all the other threads (thread 1, 2, ..., M-1) see that?
These are different questions:
Do [the threads] all see the same variable - counter? (It is a global one, initialised in the main).
If thread zero makes counter = 49 the very first time, do all the other threads (thread 1, 2, ..., M-1) see that?
The first is fairly simple: yes. An object declared at file scope and without storage class specifier _Thread_local is a single object whose storage duration is the entire run of the program. Wherever that object's identifier is in-scope and visible, it identifies the same object regardless of which thread is accessing it.
The answer to the second question is more complicated. In a multi-threaded program there is the potential for data races, and the behavior of a program containing a data race is undefined. The volatile qualifier does not protect against these; instead, you need proper synchronization for all accesses to each shared variable, both reads and writes. This can be provided by a semaphore or more often a mutex, among other possibilities.
Your code's decrement of counter may be adequately protected, but I suspect not, on account of the threads using different semaphores. If this allows for multiple different threads to execute the ...
display_my_work(arg);
counter--;
... lines at the same time then you have a data race. Even if your protection is adequate there, however, the read of counter in the while condition clearly is not properly synchronized, and you definitely have a data race there.
One of the common manifestations of the undefined behavior brought on by data races is that threads do not see each others' updates, so not only does your program's undefined behavior generally mean that threads 1 ... M-1 may not see thread 0's update of counter, it also specifically makes such a failure comparatively probable.
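To illustrate what "proper synchronization for all accesses" could look like, here is a sketch in which both the test and the decrement of the shared counter happen under one mutex. It deliberately leaves out the round-robin semaphore hand-off from the question; take_one, worker, work_done, NUM_WORKERS and run_counter_demo are hypothetical names for this example only:

```c
#include <pthread.h>

#define NUM_WORKERS 4              /* hypothetical thread count for the sketch */

static int counter = 50;           /* shared by every thread */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int work_done[NUM_WORKERS]; /* slot k is written only by thread k */

/* Claims one unit of work; returns 1 on success, 0 when none is left.
 * The read and the write of counter are both made while holding the lock. */
static int take_one(void)
{
    int got = 0;

    pthread_mutex_lock(&counter_lock);
    if (counter > 0) {
        counter--;
        got = 1;
    }
    pthread_mutex_unlock(&counter_lock);
    return got;
}

static void *worker(void *arg)
{
    int *mine = arg;

    while (take_one())
        (*mine)++;                 /* stand-in for the real computation */
    return NULL;
}

/* Runs the workers and returns the total number of units they completed. */
int run_counter_demo(void)
{
    pthread_t t[NUM_WORKERS];
    int i, total = 0;

    for (i = 0; i < NUM_WORKERS; i++)
        pthread_create(&t[i], NULL, worker, &work_done[i]);
    for (i = 0; i < NUM_WORKERS; i++)
        pthread_join(t[i], NULL);
    for (i = 0; i < NUM_WORKERS; i++)
        total += work_done[i];
    return total;
}
```

Because no thread ever touches counter outside the lock, the total is 50 regardless of how the work is split between the threads.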

Per-thread state vs. shared state in threads

I'm trying to understand the details in the TCB (thread control block) and the differences between per-thread state and shared state. My book has its own implementation of pthread, so it gives an example with this mini C program (I've not typed the whole thing out)
#include <stdio.h>
#include "thread.h"
#define NTHREADS 10
static void go(int n);
static thread_t threads[NTHREADS];
int main(int argc, char **argv) {
    int i;
    long exitValue;
    for (i = 0; i < NTHREADS; i++) {
        thread_create(&threads[i], &go, i);
    }
    for (i = 0; i < NTHREADS; i++) {
        exitValue = thread_join(threads[i]);
    }
    printf("Main thread done.\n");
    return 0;
}
void go(int n) {
    printf("Hello from thread %d\n", n);
    thread_exit(100 + n);
}
What would the variables i and exitValue (in the main() function) be examples of? They're not shared state since they're not global variables, but I'm not sure if they're per-thread state either. The i is used as the parameter for the go function when each thread is being created, so I'm a bit confused about it. The exitValue's scope is limited only to main() so that seems like it would just be stored on the process' stack. The int n as the parameter for the void go() would be a per-thread variable because its value is independent for each thread. I don't think I fully understand these concepts so any help would be appreciated! Thanks!
Short Answer
All of the variables in your example program are automatic variables. Each time one of them comes into scope, storage for it is allocated, and when it leaves its scope it is no longer valid. This concept is independent of whether the variable is shared or not.
Longer Answer
The scope of a variable determines where in the program it can be accessed; for an automatic variable, the lifetime follows the enclosing block as well. In your program the variables i and exitValue are scoped to the main function. Typically a compiler will allocate space on the stack to store the values of these variables.
The variable n in function go is a parameter to the function, so it also acts as a local variable in go. Each time go is executed, space is allocated in the stack frame for n (although the compiler may be able to optimize by keeping local variables in registers rather than actually allocating stack space). As a parameter, however, n is initialized with whatever value the function was called with (its actual parameter).
To make this more concrete, here is what the values of the variables in the program would be after the first loop has completed 2 iterations (assuming that the spawned threads haven't finished executing).
Main thread: i = 2, exitValue = indeterminate (not yet assigned)
Thread 0: n = 0
Thread 1: n = 1
The thing to note is that there are multiple independent copies of the variable n. And that n gets a copy of the value in i when thread_create is executed, but that the values of i and n are independent after that.
Finally, I'm not certain what is supposed to happen with the statement exitValue = thread_join(threads[i]); since this is a variation of pthreads. What probably happens is that thread_exit makes its argument available to whichever thread calls thread_join. In that way you do get some data sharing between threads, but the sharing is synchronized by the thread_join call.
They're objects with automatic storage, casually known as "local variables" although the latter is ambiguous since C and C++ both allow objects with local scope but that only have one global instance via the static keyword.
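A small sketch of that distinction, using two hypothetical functions: both variables below have local scope, but only the first is recreated on every call.

```c
/* Automatic storage: a fresh `calls` object exists for each invocation,
 * so every caller (and every thread) gets its own copy. */
int count_automatic(void)
{
    int calls = 0;

    calls++;
    return calls;
}

/* Static storage: the name is still local to the function, but there is
 * a single `calls` object for the whole program, shared by all callers. */
int count_static(void)
{
    static int calls = 0;

    calls++;
    return calls;
}
```

count_automatic() returns 1 every time, while count_static() keeps counting across calls. Since the static object is shared, calling count_static() from several threads without synchronization would be a data race.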

Safe way to pass parameters into a thread

Can you clarify why the following code is a safe way to pass parameters into the new thread:
//Listing 5.3 Passing a Value into a Created Thread
for ( int i=0; i<10; i++ )
pthread_create( &thread, 0, &thread_code, (void *)i );
And the following code isn't:
//Listing 5.4 Erroneous Way of Passing Data to a New Thread
for ( int i=0; i<10; i++ )
pthread_create( &thread, 0, &thread_code, (void *)&i );
Quote from the book regarding the code:
It is critical to realize that the child thread can start executing at any point after the call, so the pointer must point to something that still exists and still retains the same value. This rules out passing in pointers to changing variables as well as pointers to information held on the stack (unless the stack is certain to exist until after the child thread has read the value).
A third method is good as given below:
static int args[10];
for ( int i=0; i<10; i++ ) {
args[i] = i;
pthread_create( &thread, 0, &thread_code, (void *)&args[i] );
}
If you want the same variable shared across all the threads, use a local variable in main or, preferably, a static or global variable.
Issues with method 1 and method 2:
Method 1: You are casting an int to void * and then back to int, which is risky because the sizes of int and void * may be different. If you plan to cast the void * to int *, it is even worse and is UB.
Method 2: You are passing the same address to all threads. When i is changed by the main thread or by any of the 10 worker threads, the same value is seen everywhere, which may not be your intention. Moreover, the scope of i ends after the for loop, so the threads may end up dereferencing a dangling pointer, which is UB (undefined behaviour).
Why is the second example wrong?
As your citation says, you must not pass a pointer to the iteration variable because it gets changed quickly. You never know when exactly the concurrent thread will use the pointer and dereference it.
// Listing 5.4 Erroneous Way of Passing Data to a New Thread
for ( int i=0; i<10; i++ )
pthread_create( &thread, 0, &thread_code, (void *)&i );
Imagine the very first call to pthread_create(). The new thread receives a pointer to i and will at some point dereference the pointer and read the value. Your value is supposed to be 0 at that time. But your main thread (the one with the for loop) may have already changed i from 0 to 1. That is called a race condition because the result depends on whether one thread is faster to change the value or the other is faster to read it.
There's a second race condition as well, as your i variable goes out of scope at the end of the loop. If the threads are slow to start or to read the pointer target, the address on the stack may already have been reused for something else. You must not dereference pointers to variables that no longer exist.
Why the first doesn't have the same problem?
The first example uses the value of i, not its address. That is good, as pthread_create() will just hold the value and pass it to the thread.
// Listing 5.3 Passing a Value into a Created Thread
for ( int i=0; i<10; i++ )
pthread_create( &thread, 0, &thread_code, (void *)i );
But pthread_create() only accepts void * (a generic pointer). The example uses a special trick where you cast the integer value to a pointer value. It is expected that the thread function will do the reverse (will cast the pointer back to integer).
This trick is often used to store an integer value where an object is expected, as it avoids having to allocate and deallocate the object. Whether such a technique is good or bad practice is out of scope of a factual answer. It's being used in frameworks like GLib but I guess many programmers will scorn it.
Final notes
The examples in the book are clearly not solutions for real problems but just motivating examples. In actual code, you would rarely pass just an integer value, and you might want to join the thread at some point. Therefore, in a simple scenario you would have to allocate the thread arguments, fill them in, start the workers, join the workers, retrieve the results and free the allocations.
In a more complicated scenario you would communicate with the threads, and therefore you wouldn't be limited to feeding them at their creation and retrieving the results after joining them. You could even just let the workers run and reuse them whenever you need them.
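The simple scenario described in these final notes (allocate, fill in, start, join, collect, free) might be sketched as follows. struct job, square_worker, NUM_WORKERS and run_jobs are hypothetical names, and squaring a number merely stands in for real work:

```c
#include <pthread.h>
#include <stdlib.h>

#define NUM_WORKERS 4   /* hypothetical worker count for the sketch */

/* Per-thread argument block: input goes in, result comes back out. */
struct job {
    int input;
    int result;
};

static void *square_worker(void *arg)
{
    struct job *j = arg;

    j->result = j->input * j->input;
    return NULL;
}

/* Returns the sum of the squares of 0..NUM_WORKERS-1, or -1 on failure. */
int run_jobs(void)
{
    pthread_t threads[NUM_WORKERS];
    struct job *jobs = malloc(NUM_WORKERS * sizeof *jobs);   /* allocate */
    int started = 0, sum = 0, i;

    if (jobs == NULL)
        return -1;
    for (i = 0; i < NUM_WORKERS; i++) {
        jobs[i].input = i;                                   /* fill in */
        jobs[i].result = 0;
        if (pthread_create(&threads[i], NULL, square_worker, &jobs[i]) != 0)
            break;                                           /* start */
        started++;
    }
    for (i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);                      /* join */
        sum += jobs[i].result;                               /* retrieve */
    }
    free(jobs);                                              /* free */
    return started == NUM_WORKERS ? sum : -1;
}
```

Each worker gets a pointer to its own struct job, and the array is freed only after every started thread has been joined, so no thread can see a dangling or shared argument.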

Why is threadID unique?

I've used POSIX threads a few times in C and I never thought about this until the other day: why is the variable taken from arg given to pthread_create() private, given that all the threads call the same function when they start and run the same code to initialise the same variable (most likely a thread ID)? For example, the code:
#include <stdio.h>
#include <pthread.h>
void* threadMethod(void* arg)
{
int threadID = (int) arg;
printf("Thread %d reporting in\n", threadID);
}
int main()
{
pthread_t threads[8];
for (int i = 0; i < 8; i++)
pthread_create(&threads[i], NULL, threadMethod, (void*) i);
for (int i = 0; i < 8; i++)
pthread_join(threads[i], NULL);
}
threadID has a unique value to each thread but I don't understand why, given that it's the same variable in the same method that all threads execute. Shouldn't threads be overwriting each others' value of it? I think it's something to do with stacks. Could someone please clarify what exactly is going on here?
The question should be, "Why do all 8 threads get their own argument?" (private means something else).
The answer to that is that you are passing by value. The content of the variable is copied into a register (or onto the stack, depending on the calling convention) and is then copied further into the local argument variable (arg), which lives in memory belonging to that thread alone (its own stack or registers).
pthread_create is a C function, so there is no concept of private. The reason why the argument to your thread function is a "void*" is that void* is a generic pointer that can point to any type of memory. What that memory is, is between the thread function and the function creating the thread. You are free to use this for a threadID, but it really can be anything, since each thread may be created using a different startup function and different data.
The reason for the warning is that void* is 64 bits on a 64-bit machine but int is typically 32 bits. The compiler is warning you that you may lose data in the cast. Using a size_t (or, more precisely, an intptr_t) instead of an int should remove the warning.

thread parameter passing c

My question regards parameter passing to a thread.
I have a function Foo that operates on an array, say arrayA. To speed things up, Foo is coded to operate at both directions on the array. So, Foo takes arrayA and an integer X as parameters. Depending on the value of X, it operates in forward or reverse direction.
I'm looking to avoid making global use of "arrayA" and "X". So, I'm after passing "arrayA" and "X" as parameters to Foo, and creating two threads to run Foo-- one in each direction. Here's what I did:
typedef struct { int *arrayA; int X; } TP; // arrayPack = TP
void Foo(void *tP) {
    TP *tp = (TP *)tP; // cast the parameter tP back to what it is and assign to pointer tp
    int x;
    printf("\nX: %d", tp->X);
    printf("\n arrayA: ");
    for (x = 0; x < MSIZE; x++)
        printf("%d ", tp->arrayA[x]);
} // end Foo
void callingRouting() {
    int arrayA[MSIZE] = {3, 5, 7, 9};
    TP tp;
    tp.arrayA = arrayA;
    tp.X = 0; _beginthread(Foo, 0, (void *)&tp); // process -- forward
    tp.X = 1; _beginthread(Foo, 0, (void *)&tp); // process -- reverse
}
The values aren't passed: my array is printing empty and I'm not getting the value of X printed right. What am I missing?
I would also appreciate suggestions on some readings on this-- passing parameters to threads-- especially on passing the resources shared by the threads. Thanks.
You're passing the address of a stack variable to your thread function; once callingRouting exits, the TP structure no longer exists. It needs to be either global or allocated on the heap.
However, you'll also need a separate copy of the TP for each thread, as the change tp.X=1 may be visible to both threads.
There are problems there but how you see them depends on how the OS decides to schedule the threads on each execution.
The first thing to remember is that you have a thread that is starting up two other threads. Since you do not have any control over the processor time slices and how they are allocated, you cannot be sure when the two other threads will start, or even the order in which they will start.
Since you are using an array on the stack that is local to the function callingRouting(), as soon as that function returns its local variables are effectively out of scope and can no longer be depended on.
So there are a couple of ways to do this.
The first is to use global or static memory variables for these data items being passed to the threads.
The other is to start both threads and then wait for both to complete before continuing.
Since you do not know when or the order of the threads being started, you really should use two different TP type variables, one for each thread. Otherwise you run the risk of the time slice allocation to be such that both threads will have the same TP data.
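Putting those suggestions together, here is a sketch that gives each thread its own TP and waits for both threads before returning. It uses POSIX threads instead of _beginthread so that joining is straightforward, which is a substitution and not the asker's API. The visited field, the callingRoutine name and its output parameters are hypothetical additions for the illustration:

```c
#include <pthread.h>

#define MSIZE 4

typedef struct {
    const int *arrayA;   /* shared, read-only input */
    int X;               /* 0 = forward, 1 = reverse */
    int visited[MSIZE];  /* elements in the order this thread saw them */
} TP;

static void *Foo(void *arg)
{
    TP *tp = arg;

    for (int n = 0; n < MSIZE; n++)
        tp->visited[n] = tp->arrayA[tp->X ? MSIZE - 1 - n : n];
    return NULL;
}

/* Runs Foo in both directions; reports the first element each thread saw.
 * Returns 0 on success, -1 if a thread could not be created. */
int callingRoutine(int *forward_first, int *reverse_first)
{
    static const int arrayA[MSIZE] = {3, 5, 7, 9};
    TP forward = { arrayA, 0, {0} };   /* one TP per thread, never shared */
    TP reverse = { arrayA, 1, {0} };
    pthread_t t_forward, t_reverse;

    if (pthread_create(&t_forward, NULL, Foo, &forward) != 0)
        return -1;
    if (pthread_create(&t_reverse, NULL, Foo, &reverse) != 0) {
        pthread_join(t_forward, NULL);
        return -1;
    }
    /* Both TP objects live on this stack frame, so the function must not
     * return until both threads are done with them. */
    pthread_join(t_forward, NULL);
    pthread_join(t_reverse, NULL);
    *forward_first = forward.visited[0];
    *reverse_first = reverse.visited[0];
    return 0;
}
```

Because the function joins both threads before its locals disappear, passing the addresses of stack variables is safe here; without the joins, the TP objects would have to be static or heap-allocated instead.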

Resources