Why is threadID unique? - c

I've used POSIX threads a few times in C and I never thought about this until the other day: why is the variable taken from arg given to pthread_create() private, given that all the threads call the same function when they start and run the same code to initialise the same variable (most likely a thread ID)? For example, the code:
#include <stdio.h>
#include <pthread.h>
void* threadMethod(void* arg)
{
    int threadID = (int) arg;
    printf("Thread %d reporting in\n", threadID);
    return NULL;
}
int main()
{
    pthread_t threads[8];
    for (int i = 0; i < 8; i++)
        pthread_create(&threads[i], NULL, threadMethod, (void*) i);
    for (int i = 0; i < 8; i++)
        pthread_join(threads[i], NULL);
}
threadID has a unique value to each thread but I don't understand why, given that it's the same variable in the same method that all threads execute. Shouldn't threads be overwriting each others' value of it? I think it's something to do with stacks. Could someone please clarify what exactly is going on here?

The question should really be, "Why do all 8 threads get their own argument?"
("private" means something else).
The answer is that you are passing by value.
The value of i is copied into a register (or onto the stack, depending on the calling convention),
handed to the new thread, and then copied into the parameter arg,
which lives on that thread's own stack (or in its registers). Each thread has its own stack, so each thread has its own copy.

pthread_create is a C function, so there is no concept of private. The argument to your thread function is a void* because void* is a generic pointer that can point to any type of object. What that memory is, is between the thread function and the function creating the thread. You are free to use it for a thread ID, but it really can be anything, since each thread may be created using a different start function and different data.
The reason for the warning is that void* is 64 bits on a typical 64-bit machine while int is typically 32 bits. The compiler is warning you that you may lose data in the cast. Using an integer type as wide as a pointer (size_t on most platforms, or intptr_t from <stdint.h>) instead of int should remove the warning.

Related

Do all threads have the same global variable?

I have a general question that occurred to me while trying to implement a thread synchronization problem with semaphores. I do not want to get into much (unrelated) detail, so I am going to give only the code that I think is important to clarify my question.
sem_t *mysema;
volatile int counter;
struct my_info {
    pthread_t t;
    int id;
};
struct my_info *tinfo;
int M;
void *barrier(void *arg) {
    struct my_info *a = arg;
    int thrid = a->id;
    while (counter > 0) {
        do_work(&mysema[thrid]);
        sem_wait(&mysema[thrid]);
        display_my_work(arg);
        counter--;
        sem_post(&mysema[thrid + 1]);
    }
    return NULL;
}
int main(int argc, char *argv[]) {
    int i;
    M = atoi(argv[1]);
    mysema = malloc(M * sizeof(*mysema));
    tinfo = malloc(M * sizeof(*tinfo));
    counter = 50;
    /* semaphore initialisations */
    for (i = 0; i < M; i++) {
        sem_init(&mysema[i], 0, 0);
    }
    for (i = 0; i < M; i++) {
        tinfo[i].id = i;
    }
    for (i = 0; i < M; i++) {
        pthread_create(&tinfo[i].t, NULL, barrier, &tinfo[i]);
    }
    /* init wakes up the first semaphore */
    sem_post(&mysema[0]);
    .
    .
    .
We have an array of M semaphores initialised to 0, where M is given on the command line by the user.
I know I am done when the M threads in total have done some necessary computations 50 times.
Each thread blocks itself until the previous thread sem_post's it. The very first thread will be woken up by init.
My question is whether the threads will stop when counter = 0. Do they all see the same variable - counter? (It is a global one, initialised in the main).
If thread zero makes counter = 49 the very first time, do all the other threads (thread 1, 2, ... M-1) see that?
These are different questions:
Do [the threads] all see the same variable - counter? (It is a global one, initialised in the main).
If thread zero makes counter = 49 the very first time, do all the other threads (thread 1, 2, ... M-1) see that?
The first is fairly simple: yes. An object declared at file scope and without storage class specifier _Thread_local is a single object whose storage duration is the entire run of the program. Wherever that object's identifier is in-scope and visible, it identifies the same object regardless of which thread is accessing it.
The answer to the second question is more complicated. In a multi-threaded program there is the potential for data races, and the behavior of a program containing a data race is undefined. The volatile qualifier does not protect against these; instead, you need proper synchronization for all accesses to each shared variable, both reads and writes. This can be provided by a semaphore or more often a mutex, among other possibilities.
Your code's decrement of counter may be adequately protected, but I suspect not, on account of the threads using different semaphores. If this allows for multiple different threads to execute the ...
display_my_work(arg);
counter--;
... lines at the same time then you have a data race. Even if your protection is adequate there, however, the read of counter in the while condition clearly is not properly synchronized, and you definitely have a data race there.
One of the common manifestations of the undefined behavior brought on by data races is that threads do not see each others' updates, so not only does your program's undefined behavior generally mean that threads 1 ... M-1 may not see thread 0's update of counter, it also specifically makes such a failure comparatively probable.

Per-thread state vs. shared state in threads

I'm trying to understand the details in the TCB (thread control block) and the differences between per-thread state and shared state. My book has its own implementation of pthread, so it gives an example with this mini C program (I've not typed the whole thing out):
#include <stdio.h>
#include "thread.h"
#define NTHREADS 10
static void go(int n);
static thread_t threads[NTHREADS];
int main(int argc, char **argv) {
    int i;
    long exitValue;
    for (i = 0; i < NTHREADS; i++) {
        thread_create(&threads[i], &go, i);
    }
    for (i = 0; i < NTHREADS; i++) {
        exitValue = thread_join(threads[i]);
    }
    printf("Main thread done.\n");
    return 0;
}
void go(int n) {
    printf("Hello from thread %d\n", n);
    thread_exit(100 + n);
}
What would the variables i and exitValue (in the main() function) be examples of? They're not shared state since they're not global variables, but I'm not sure if they're per-thread state either. The i is used as the parameter for the go function when each thread is being created, so I'm a bit confused about it. The exitValue's scope is limited only to main() so that seems like it would just be stored on the process' stack. The int n as the parameter for the void go() would be a per-thread variable because its value is independent for each thread. I don't think I fully understand these concepts so any help would be appreciated! Thanks!
Short Answer
All of the variables in your example program are automatic variables. Each time one of them comes into scope, storage for it is allocated, and when it leaves its scope it is no longer valid. This concept is independent of whether the variable is shared or not.
Longer Answer
The scope of a variable is the region of the program in which its name can be used; its lifetime (storage duration) is the period during which storage for it exists. In your program the variables i and exitValue are scoped to the main function and have automatic storage duration. Typically a compiler will allocate space on the stack to store the values of these variables.
The variable n in function go is a parameter to the function, so it also acts as a local variable in go. Each time go is executed, space for n is set aside in that call's stack frame (although the compiler may be able to optimize and keep local variables in registers rather than actually allocating stack space). As a parameter, n is initialized with whatever value the function was called with (its actual argument).
To make this more concrete, here is what the values of the variables in the program would be after the first loop has completed 2 iterations (assuming that the spawned threads haven't finished executing).
Main thread: i = 2, exitValue = (not yet assigned)
Thread 0: n = 0
Thread 1: n = 1
The thing to note is that there are multiple independent copies of the variable n. And that n gets a copy of the value in i when thread_create is executed, but that the values of i and n are independent after that.
Finally I'm not certain what is supposed to happen with the statement exitValue = thread_join(threads[i]); since this is a variation of pthreads. But what probably happens is that it makes the value available when another thread calls thread_join. So in that way you do get some data sharing between threads, but the sharing is synchronized by the thread_join command.
They're objects with automatic storage, casually known as "local variables" although the latter is ambiguous since C and C++ both allow objects with local scope but that only have one global instance via the static keyword.

Safe way to pass parameters into a thread

Can you clarify, why the following code is a safe way to pass parameters into the new thread:
//Listing 5.3 Passing a Value into a Created Thread
for ( int i=0; i<10; i++ )
pthread_create( &thread, 0, &thread_code, (void *)i );
And the following code isn't:
//Listing 5.4 Erroneous Way of Passing Data to a New Thread
for ( int i=0; i<10; i++ )
pthread_create( &thread, 0, &thread_code, (void *)&i );
Quote from the book, regarding the code:
It is critical to realize that the child thread can start executing at any point after the call, so the pointer must point to something that still exists and still retains the same value. This rules out passing in pointers to changing variables as well as pointers to information held on the stack (unless the stack is certain to exist until after the child thread has read the value).
A third method is good as given below:
static int args[10];
for ( int i=0; i<10; i++ ) {
args[i] = i;
pthread_create( &thread, 0, &thread_code, (void *)&args[i] );
}
If you want the same variable shared across all the threads, use a local variable in main or, preferably, a static or global variable.
Issues with method 1 and method 2:
Method 1: You are casting an int to void * and then back to int, which is bad because the sizes of int and void * may differ. Treating that void * as an int * and dereferencing it would be even worse: it is undefined behaviour.
Method 2: You are passing the same address to all threads. When i is changed by the main thread or by any of the 10 worker threads, the change is visible to all of them, which may not be your intention. Moreover, the scope of i ends after the for loop, so the threads may end up dereferencing a dangling pointer, which is UB (undefined behaviour).
Why is the second example wrong?
As your citation says, you must not pass a pointer to the iteration variable because it changes quickly. You never know exactly when the concurrent thread will use the pointer and dereference it.
// Listing 5.4 Erroneous Way of Passing Data to a New Thread
for ( int i=0; i<10; i++ )
pthread_create( &thread, 0, &thread_code, (void *)&i );
Imagine the very first call to pthread_create(). The new thread receives a pointer to i and will at some point dereference it to read the value. The value is supposed to be 0 at that time, but your main thread (the one with the for loop) may already have changed i from 0 to 1. That is called a race condition, because the result depends on whether one thread changes the value before the other reads it.
There's a second race condition as well: your i variable goes out of scope at the end of the loop. If the threads are slow to start or to read the pointer target, the address on the stack may already have been reused for something else. You must not dereference pointers to variables that no longer exist.
Why doesn't the first have the same problem?
The first example uses the value of i, not its address. That is good, as pthread_create() will just hold the value and pass it to the thread.
// Listing 5.3 Passing a Value into a Created Thread
for ( int i=0; i<10; i++ )
pthread_create( &thread, 0, &thread_code, (void *)i );
But pthread_create() only accepts void * (a generic pointer). The example uses a special trick where you cast the integer value to a pointer value. It is expected that the thread function will do the reverse (will cast the pointer back to integer).
This trick is often used to store an integer value where an object is expected, as it avoids having to allocate and deallocate the object. Whether such a technique is good or bad practice is out of scope of a factual answer. It's being used in frameworks like GLib but I guess many programmers will scorn it.
Final notes
The examples in the book are clearly not solutions for real problems, just motivating examples. In actual code you would rarely pass just an integer value, and you might want to join the thread at some point. Therefore, in a simple scenario you would allocate the thread arguments, fill them in, start the workers, join the workers, retrieve the results and free the allocations.
In a more complicated scenario you would communicate with the threads, so you wouldn't be limited to feeding them at creation and retrieving the results after joining them. You could even just let the workers run and reuse them whenever you need them.

Understanding pthread_create arguments in C

In the example at the link below
https://computing.llnl.gov/tutorials/pthreads/samples/hello.c
in the statement rc = pthread_create(&threads[t], NULL, PrintHello, (void *)t); the coder has just passed a variable as the 4th argument without passing the address of that variable. Is this code correct? If yes, how can we cast a variable to void *?
The above link seems to be popular, as it is listed first in Google for pthreads.
Well it is a bit weird, but it does what it is supposed to.
The fourth argument is sent as argument to the PrintHello function/routine. It has to be passed as a void *.
Typically you have a pointer to a dynamically allocated object that you cast to void *. But here he defines a long t, casts it to void * (address) and sends it in. Then he casts it back to a long in PrintHello, so all is fine, but a bit ugly and could have gone "horribly" wrong if he would have cast it to a pointer and tried to access the memory it pointed to.
Yes this code is correct, if you don't try to access the memory pointed to by the parameter in the thread. Just convert it to a long in the thread.
tid = (long)threadid;
It converts the pointer value to a long but never touches the memory that the pointer "points to", which is most likely junk; dereferencing it would probably cause an access violation.
For example if you did:
tid = *(long *)threadid;
That would most likely cause an access violation because you are trying to access the memory at the location pointed to by threadid.
If you would rather pass the pointer to a long integer you could do something like this.
...
long* pint = (long*)malloc(sizeof(long));
*pint = t;
rc = pthread_create(&threads[t], NULL, PrintHello, (void *)pint);
void *PrintHello(void *threadid)
{
    long* tid;
    tid = (long*)threadid;
    printf("Hello World! It's me, thread #%ld!\n", *tid);
    free(tid);
    pthread_exit(NULL);
}
But that requires the use of malloc and free.
Keep in mind that on common platforms a pointer is represented as a 32- or 64-bit address. You can store an integer value in a pointer with a cast; just don't try to access the memory it "points to".
Hope that helps,
-Dave
Actually, the 4th argument is the parameter to be passed to the thread: if a value needs to be passed from the main thread to the newly created one, it goes through this 4th argument. For example:
Let's say I have a thread being created from the main loop:
int l_rc = pthread_create(&l_updatethread, NULL, Thread, &l_filter); /* returns 0 on success, an error number otherwise */
Note that I'm passing the address of a value that is going to be used in the thread being created (so l_filter must stay valid for as long as the thread uses it), in the following way:
void* Thread(void *p_parameter)
{
    int *l_thread_filter = (int *)p_parameter;
    /* ... then play around with this variable ... */
    return NULL;
}

Dynamically creating threads and passing an integer

I have a process that creates a number of threads based on an argument passed to the process.
pthread_t producer_threads[num_threads];
for (id = 0; id < num_threads; id++)
{
    printf("%d\n", id);
    pthread_create(&producer_threads[id], NULL, &produce, (void *) &id);
}
Each thread goes into a produce function and stores the id as a local variable
void* produce (void* args)
{
    int my_id = * (int*) args;
    printf("Thread %d started to produce\n", my_id);
    return NULL;
}
However the output I receive is as shown
0
1
Thread <n> started to produce
Thread <n> started to produce
where n is randomly either 0, 1, or 2. I am not sure what is causing the problem, unless it is because the global variable is being updated before it is assigned locally, or because the "local variable" is shared between threads.
The problem is that you're passing pointers to the same variable to each thread. This creates a race condition, whereby the value of the variable as seen by each thread depends on the exact timing.
If you were to pass the thread argument by value rather than by pointer, this would fix the problem.
Each thread needs its own integer that stays alive until the thread has read it, for example one allocated per thread, instead of a single stack variable that the loop keeps changing. Since you're passing a pointer to one memory location on the stack, your results depend on timing (i.e. there is a race condition). You need to pass a different variable to each pthread_create call.

Resources