Can you clarify, why the following code is a safe way to pass parameters into the new thread:
//Listing 5.3 Passing a Value into a Created Thread
for ( int i=0; i<10; i++ )
pthread_create( &thread, 0, &thread_code, (void *)i );
And the following code isn't:
//Listing 5.4 Erroneous Way of Passing Data to a New Thread
for ( int i=0; i<10; i++ )
pthread_create( &thread, 0, &thread_code, (void *)&i );
Quote from the book,regarding the code:
It is critical to realize that the child thread can start executing at any point after the call, so the pointer must point to something that still exists and still retains the same value. This rules out passing in pointers to changing variables as well as pointers to information held on the stack (unless the stack is certain to exist until after the child thread has read the value).
A third method is good as given below:
static int args[10];
for ( int i=0; i<10; i++ ) {
args[i] = i;
pthread_create( &thread, 0, &thread_code, (void *)&args[i] );
}
If you want same variable shared across all the threads, make a local variable in main or preferably and static or global variable.
Issues with method 1 and method 2:
Method 1 You are casting an int to void * and then back to int which is bad as the size of int and void * may be different. If you plan to cast void * to int *, it is even worse and an UB. Also read this post.
Method 2 You are passing same address to all threads. When i is changed from main thread of any of the 10 worker threads same value would be reflected everywhere which may not be your intention. Moreover scope of i ends after the for loop, and you may end up accessing dangling pointers in threads. and would cause UB. (undefined behaviour)
Why is the second example wrong?
As your citation says, you must not pass a pointer to the interation variable because it gets changed quickly. You never know when exactly the concurrent thread will use the pointer and dereference it.
// Listing 5.4 Erroneous Way of Passing Data to a New Thread
for ( int i=0; i<10; i++ )
pthread_create( &thread, 0, &thread_code, (void *)&i );
Imagine the very first call to pthread_create(). It receives a pointer to i and will probably dereference the pointer and read the value. Your value is supposed to be 0 at the time. But your main thread (the one with the for loop) may have already changed i from 0 to 1. That is called a race condition because your program depends on whether one thread is faster to change the value or the other is faster to get it.
There's a second race condition as well, as your i variable will get out of scope at the end of the loop. If the threads were slow to start or to read the pointer target, the address on the stack can already be allocated to something else. You must not dereference pointers to variables that no longer exist.
Why the first doesn't have the same problem?
The first example uses the value of i, not it's address. That is good, as pthread_create() will just hold the value and pass it to the thread.
// Listing 5.3 Passing a Value into a Created Thread
for ( int i=0; i<10; i++ )
pthread_create( &thread, 0, &thread_code, (void *)i );
But pthread_create() only accepts void * (a generic pointer). The example uses a special trick where you cast the integer value to a pointer value. It is expected that the thread function will do the reverse (will cast the pointer back to integer).
This trick is often used to store an integer value where an object is expected, as it avoids having to allocate and deallocate the object. Whether such a technique is good or bad practice is out of scope of a factual answer. It's being used in frameworks like GLib but I guess many programmers will scorn it.
Final notes
The examples in the book are clearly not solutions for real problems but just motivation examples. In actual code, you would rarely pass just an integer value and you might want to join the thread at some point of time. Therefore in a simple scenario you would have to allocate the thread arguments, fill them in, start the workers, join the workers, retrieve the results and free the allocations.
In a more complicated scenario you would communicate with the threads and therefore you wouldn't be limited to feeding them at their creation and retreiving the results after joining them. You could even just let the workers run and reuse them whenever you need them.
Related
I have a general question that occured to me while trying to implement a thread sychronization problem with sempaphores. I do not want to get into much (unrelated) detail so I am going to give the code that I think is important to clarify my question.
sem_t *mysema;
violatile int counter;
struct my_info{
pthread_t t;
int id;
};
void *barrier (void *arg){
struct my_info *a = arg;
arg->id = thrid;
while(counter >0){
do_work(&mysem[thrid])
sem_wait(&mysema[third])
display_my_work(arg);
counter--;
sem_post(&mysema[thrid+1])
}
return NULL;
}
int main(int argc, char *argv[]){
int N = atoi(argv[1]);
mysema = mallon(N*(*mysema));
counter = 50;
/*semaphore intialisations */
for(i=0; i<M; i++){
sem_init(&mysema[i],0,0);
}
for(i=0; i<M; i++){
mysema[i].id = i;
}
for(i=0; i<M; i++){
pthread_create(&mysema.t[i], NULL, barrier, &tinfo[i])
}
/*init wake up the first sempahore */
sem_post(&mysema[0]);
.
.
.
We have an array made of M semaphores intialised in 0 , where M is defined in command line by the user.
I know I am done when all M threads in total have done some necessary computations 50 times.
Each thread blocks itself, until the previous thread "sem_post's" it. The very first thread will be waken up by init.
My question is whether the threads will stop when '''counter = 0 '''. Do they all see the same variable - counter? (It is a global one, initialised in the main).
If thread zero , makes the very first time ```counter = 49''' do all the other threads( thread 1, 2, ...M-1) see that ?
These are different questions:
Do [the threads] all see the same variable - counter? (It is a global one, initialised in the main).
If thread zero , makes the very first time ```counter = 49''' do all the other threads( thread 1, 2, ...M-1) see that ?
The first is fairly simple: yes. An object declared at file scope and without storage class specifier _Thread_local is a single object whose storage duration is the entire run of the program. Wherever that object's identifier is in-scope and visible, it identifies the same object regardless of which thread is accessing it.
The answer to the second question is more complicated. In a multi-threaded program there is the potential for data races, and the behavior of a program containing a data race is undefined. The volatile qualifier does not protect against these; instead, you need proper synchronization for all accesses to each shared variable, both reads and writes. This can be provided by a semaphore or more often a mutex, among other possibilities.
Your code's decrement of counter may be adequately protected, but I suspect not, on account of the threads using different semaphores. If this allows for multiple different threads to execute the ...
display_my_work(arg);
counter--;
... lines at the same time then you have a data race. Even if your protection is adequate there, however, the read of counter in the while condition clearly is not properly synchronized, and you definitely have a data race there.
One of the common manifestations of the undefined behavior brought on by data races is that threads do not see each others' updates, so not only does your program's undefined behavior generally mean that threads 1 ... M-1 may not see thread 0's update of counter, it also specifically makes such a failure comparatively probable.
I've used POSIX threads a few times in C and I never thought about this until the other day: why is the variable taken from arg given to pthread_create() private, given that all the threads call the same function when they start and run the same code to initialise the same variable (most likely a thread ID)? For example, the code:
#include <stdio.h>
#include <pthread.h>
void* threadMethod(void* arg)
{
int threadID = (int) arg;
printf("Thread %d reporting in\n", threadID);
}
int main()
{
pthread_t threads[8];
for (int i = 0; i < 8; i++)
pthread_create(&threads[i], NULL, threadMethod, (void*) i);
for (int i = 0; i < 8; i++)
pthread_join(threads[i], NULL);
}
threadID has a unique value to each thread but I don't understand why, given that it's the same variable in the same method that all threads execute. Shouldn't threads be overwriting each others' value of it? I think it's something to do with stacks. Could someone please clarify what exactly is going on here?
The question should be, "Why do all 8 thread get their own argument"
(private means something else)
The answer to that is, that you are passing by value.
The content of the variable is copied into a register
(or the stack depending on calling convention)
and is then copied further into the local argument variable
(arg), witch lives in thread-local memory.
pthread_create is a C function so there is no concept of private. The reason why the argument to your thread function is a "void*" is because void* is a generic pointer that can point to any type of memory. What that memory is, is between the thread function and the function creating the thread. You are free to use this for a threadId but it really can be anything. Since each thread may be created using a different startup function and using different data.
The reason for the warning is that void* is 64 bits on a 64 bit machine but in is typically 32 bits. The compiler is warning you that you may lose data in the cast. Using a size_t instead of an int should remove the warning.
I don't know what am I doing wrong in the following function. I am trying to return an array but it Seg Faults at while loop in the compute_average. I am using a structure to initialize the begin and the last word and used in the function "compute_average" You can assume the compute_average function works fine which was already there but I changed the code when using the structure or "first" and "second" value.
typedef struct parallel{
unsigned long first, second;
}Parallel;
return final_array;
}
Your argument to pthread_join is wrong. You are, in effect, doing this:
pthread_join(tid, NULL);
You need to supply the address of a void * variable (and technically you should capture the return value of pthread_join itself, in case it fails, although here it won't):
error = pthread_join(tid, &ary1);
If error==0 after this, ary1 will contain the value returned from the thread, and array1 = ary1 will suffice to make it useful (no cast is needed there).
(As ldav1s noted, the size is wrong too. You may want to augment the code to return more than just one pointer. Or you could fill in a data structure supplied by the thread's caller. [I'm making various guesses here about how you might implement multiple threads, in the end. The guesses may be wrong. :-) ])
This:
unsigned long *array1;
...
size = (sizeof(array1)/sizeof(array1[0]));
Looks extremely suspicious, unless size is supposed to be 1 or 2. array1 is a pointer, not an array.
pthread_join(tid, ary1);
should be:
pthread_join(tid, &ary1);
If you want to get any data back from compute_prime at all. You also need to "tack on" some kind of size in the data returned from compute_prime to tell you how many elements you've got.
You're also leaking memory allocated in compute_prime, better free(ary1).
My question regards parameter passing to a thread.
I have a function Foo that operates on an array, say arrayA. To speed things up, Foo is coded to operate at both directions on the array. So, Foo takes arrayA and an integer X as parameters. Depending on the value of X, it operates in forward or reverse direction.
I'm looking to avoid making global use of "arrayA" and "X". So, I'm after passing "arrayA" and "X" as parameters to Foo, and creating two threads to run Foo-- one in each direction. Here's what I did:
typedef struct {int* arrayA[MSIZE]; int X; } TP; //arrayPack=TP
void Foo (void *tP) {
TP *tp = (TP*)tP; // cast the parameter tP back to what it is and assign to pointer *tp
int x;
printf("\nX: %d", tp->X);
printf("\n arrayA: "); for (x=0; x<tp->arrayA.size(); printf("%d ", aP->arrayA[x]), x++);
} // end Foo
void callingRouting () {
int* arrayA[MSIZE] = {3,5,7,9};
TP tp; tp.arrayA=arrayA;
tp.X=0; _beginthread(Foo, 0, (void*)&tp); // process -- forward
tp.X=1; _beginthread(Foo, 0, (void*)&tp); // process -- reverse
}
The values aren`t passed-- my array is printing empty and I'm not getting the value of X printed right. What am i missing ?
I would also appreciate suggestions on some readings on this-- passing parameters to threads-- especially on passing the resources shared by the threads. Thanks.
You're passing the address of a stack variable to your thread function, once callingRouting exits the TP structure no longer exists. They need to be either globals or allocated on the heap.
However you'll need two copies of the TP for each thread as the change tp.X=1 may be visible to both threads.
There are problems there but how you see them depends on how the OS decides to schedule the threads on each execution.
The first thing to remember is that you have a thread that is starting up two other threads. Since you do not have any control over the processor time slice and how that is allocated, you can not be sure when the two other threads will start and may be not even the order that they will start in.
Since you hare using an array on the stack that is local to the function callingRouting () as soon as that function returns the local variables allocated will basically be out of scope and can no longer be depended on.
So there are a couple of ways to do this.
The first is to use global or static memory variables for these data items being passed to the threads.
The other is to start both threads and then wait for both to complete before continuing.
Since you do not know when or the order of the threads being started, you really should use two different TP type variables, one for each thread. Otherwise you run the risk of the time slice allocation to be such that both threads will have the same TP data.
I have a process that creates a number of threads based on an argument passed to the process.
producer_threads[num_threads];
for (id = 0; id < num_threads; id++)
{
printf("%d\n", id);
pthread_create(&producer_threads[id], NULL, &produce, (void *) &id);
}
Each thread goes into a produce function and stores the id as a local variable
void* produce (void* args)
{
int my_id = * (int*) args;
printf("Thread %d started to produce\n", my_id);
}
However the output I receive is as shown
0
1
Thread <n> started to produce
Thread <n> started to produce
and n are randomly either 0, 1, or 2. I am not sure what is causing the problem unless it is because the global variable is being updated before it is assigned locally. Or because the "local variable" is shared between threads.
The problem is that you're passing pointers to the same variable to each thread. This creates a race condition, whereby the value of the variable as seen by each thread depends on the exact timing.
If you were to pass the thread argument by value rather than by pointer, this would fix the problem.
The integer needs to be an alloc'd variable instead of a stack variable. Since you're passing a pointer to a memory location on the stack, your results will depend on timing (i.e. are a race condition). You need to pass different variables to each pthread_create call.