Usually, when I have two concurrently running threads, and each thread calls the same function, there are two instances of the function running in parallel, one in each thread's stack memory. No race condition.
My question is: what if I have a global struct with a function pointer?
If I run that function in each thread, from the global struct, is that a race condition? Is there only one copy of the function's variables in the application stack?
Do I need a mutex/semaphore?
I suspect it is not a race condition, because the act of calling a function through a function pointer should be effectively the same as calling the function directly.
Your suspicion is correct. If thread A calls some function, then the activation record for that call (i.e., where the local variables for the call reside) will be on thread A's stack. If thread B simultaneously calls the same function, then that activation record will be on thread B's stack.
It does not matter how either of the two threads knew which function to call. It's the same regardless of whether the address of the function was hard-wired into the code, or whether they got the function address from a "function pointer" variable in a struct.
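A minimal sketch of that situation, with invented names (the struct, work, and run are made up for this example): both threads call the same function through the pointer in the global struct, and each call's local lives on the calling thread's own stack, so there is nothing to race on.

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

struct dispatch {
    void (*fn)(int);                 /* function pointer shared via a global struct */
};

static void work(int id)
{
    int local = id * 10;             /* one instance per call, on the caller's stack */
    printf("thread %d: local = %d\n", id, local);
}

static struct dispatch table = { work };   /* the single global struct */

static void *run(void *arg)
{
    table.fn((int)(intptr_t)arg);    /* calling through the pointer shares no locals */
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, run, (void *)(intptr_t)1);
    pthread_create(&b, NULL, run, (void *)(intptr_t)2);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}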
If I run that function simultaneously in each thread, from the global struct, is that a race condition?
The rule is: if two (or more) threads can both access the same data, and at least one of the threads might modify the data, then there is a race condition. In that situation, you can avoid the race condition by having each thread lock a mutex before accessing the data, and unlock the mutex afterwards, so that it is guaranteed that no other thread will modify the data while that thread is reading and/or modifying it.
If both threads are only reading the data and will never modify it, then no mutex is necessary.
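As a minimal sketch of that rule with pthreads (the counter and the thread function are invented for the example): both threads modify shared_count, so every access goes through the same mutex; if they only ever read it, the mutex could be dropped.

#include <pthread.h>

static int shared_count = 0;                              /* shared and modified by both threads */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *increment(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);     /* no other thread can modify shared_count here */
        shared_count++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, increment, NULL);
    pthread_create(&b, NULL, increment, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    /* shared_count is reliably 200000 because every access was protected */
    return 0;
}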
Is there only one copy of the function's variables in the application stack?
I don't know what "the application stack" is, but if you are asking if there is only one copy of the global struct, the answer is yes.
If by "the function's variables" you mean local variables that are declared inside the function body -- those are located separately in each thread's stack and not shared across threads (although if the local variable is a pointer, the object it points to might be shared).
While experimenting with C11 threads.h and mutexes to synchronise a network thread and a main thread, I started using the following procedure:
- define a mutex in the main function:
mtx_t mutex_network_acqbuffer;
- initialize it with:
mtx_init(&mutex_network_acqbuffer, mtx_plain);
- assign a pointer to this mutex to a member of a heap-allocated struct passed as the starting argument into my network thread
- lock the mutex in either the main thread or the network thread to make sure that some data in the heap is not accessed simultaneously.
But I'm not sure if this is the proper way to do it, or if I'm just lucky that my compiler does not break my code.
I thought that the mutex resides in the stack of the main thread, so the child thread should not be able to access it, since it should only be able to access heap-allocated stuff or global variables.
But nevertheless, the synchronisation seems to work.
Is there some magic trickery involved inside mtx_init which places the mutex on the heap?
Or is this just implementation-dependent?
Should I malloc the mutex in the main thread to be on the safe side, or make it a global variable?
In C11, whether objects on the stack are accessible from different threads or not is implementation-defined.
I personally don't know of any implementation that doesn't allow access from other threads, though.
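For reference, here is a minimal sketch of the pattern described in the question, assuming a C11 implementation that provides <threads.h>; the struct and field names are invented for the example. The mutex lives in main's stack frame, and the network thread reaches it only through the pointer stored in the heap-allocated argument:

#include <stdlib.h>
#include <threads.h>

struct net_ctx {                 /* heap-allocated argument for the network thread */
    mtx_t *acq_lock;             /* points at a mutex living on main's stack */
    int   *acq_buffer;           /* heap data protected by the mutex */
};

static int network_thread(void *arg)
{
    struct net_ctx *ctx = arg;
    mtx_lock(ctx->acq_lock);     /* locking through the pointer into main's stack frame */
    ctx->acq_buffer[0] = 42;
    mtx_unlock(ctx->acq_lock);
    return 0;
}

int main(void)
{
    mtx_t mutex_network_acqbuffer;                 /* lives in main's stack frame */
    mtx_init(&mutex_network_acqbuffer, mtx_plain);

    struct net_ctx *ctx = malloc(sizeof *ctx);
    ctx->acq_lock = &mutex_network_acqbuffer;
    ctx->acq_buffer = calloc(16, sizeof *ctx->acq_buffer);

    thrd_t t;
    thrd_create(&t, network_thread, ctx);

    mtx_lock(&mutex_network_acqbuffer);            /* main takes the same lock */
    /* ... read or write ctx->acq_buffer ... */
    mtx_unlock(&mutex_network_acqbuffer);

    thrd_join(t, NULL);
    mtx_destroy(&mutex_network_acqbuffer);
    free(ctx->acq_buffer);
    free(ctx);
    return 0;
}

Because main joins the thread before its own stack frame goes away, the pointer to the mutex stays valid for the thread's whole lifetime.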
I know it's a very specific question and it's not very interesting for a high-level programmer, but I would like to know when exactly the local variables of a thread function are allocated. In other words, after
pthread_create(&thread, NULL, function, ...)
is executed, can I say that they exist in memory or not (considering that the scheduler might not have executed the thread yet)?
I tried to search in the POSIX library code, but it's not easy to understand. I get as far as the clone function, written in assembly, but then I cannot find the code of the system-call service routine sys_clone to understand what exactly it does. I see in the clone code the invocation of the thread function, but I think this should happen only in the created thread (which might not have been run by the scheduler yet when pthread_create returns) and not in the creator.
in other words after
pthread_create(&thread, NULL, function, ...)
is executed, can I say that they exist in memory or not (considering that the scheduler might not have executed the thread yet)?
POSIX does not give you any reason for confidence that the local variables of the initial call to function function() in the created thread will have been allocated by the time pthread_create() returns. They might or might not have been, and indeed, the answer might not even be well defined inasmuch as different threads do not necessarily have a consistent view of machine state.
There is no special significance to the local variables of a thread's start function relative to the local variables of any other function called in that thread. Moreover, although pthread_create() will not return successfully until the new thread has been created, that's a separate question from whether the start function has even been entered, much less whether its local variables have been allocated.
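As a consequence, if you need to know that the start function has actually been entered (and therefore that its locals exist), you have to synchronise on that yourself. A minimal sketch of one way to do that, using a mutex and condition variable with an invented flag:

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
static bool started = false;

static void *function(void *arg)
{
    int local = 0;                 /* exists only once this call is actually active */
    (void)arg; (void)local;

    pthread_mutex_lock(&m);        /* announce that the start function is running */
    started = true;
    pthread_cond_signal(&c);
    pthread_mutex_unlock(&m);

    /* ... real work ... */
    return NULL;
}

int main(void)
{
    pthread_t thread;
    pthread_create(&thread, NULL, function, NULL);

    pthread_mutex_lock(&m);        /* wait until the thread has entered function() */
    while (!started)
        pthread_cond_wait(&c, &m);
    pthread_mutex_unlock(&m);

    pthread_join(thread, NULL);
    return 0;
}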
The title is the question: when a thread exits, does its cached memory get flushed to the main memory?
I am wondering because a common case is that the main thread creates some worker threads, they do some work on independent parts of an array (no data dependencies between each other), the main thread joins all the worker threads, and then does more calculations with the array values that result from the workers' computations. Does the array need to be declared volatile for the main thread to see those side effects?
The pthreads specification requires that pthread_join() is one of the functions that "synchronizes memory with respect to other threads", so in the case of pthreads you are OK - after pthread_join() has returned, the main thread will see all updates to shared memory made by the joined thread.
Assuming you are doing this in C: if the array is global, or you have passed the threads a structure containing a pointer to the array and the indices each thread should compute on, then the array need not be volatile for the main thread to see the changes, since the array memory is shared between the worker threads and the main thread.
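A sketch of the pattern the question describes, assuming pthreads (the array, slice struct, and worker are invented for the example): the result array is a plain int array, not volatile, and the main thread reads it only after pthread_join() has returned for every worker.

#include <pthread.h>
#include <stdio.h>

#define N 8
#define NTHREADS 2

static int results[N];                 /* plain int array: no volatile required */

struct slice { int first, last; };     /* each worker gets an independent range */

static void *worker(void *arg)
{
    struct slice *s = arg;
    for (int i = s->first; i < s->last; i++)
        results[i] = i * i;
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    struct slice slices[NTHREADS] = { {0, N / 2}, {N / 2, N} };

    for (int t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, worker, &slices[t]);
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);    /* after this, all workers' writes are visible */

    for (int i = 0; i < N; i++)
        printf("%d ", results[i]);
    printf("\n");
    return 0;
}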
I have declared some local variables in one function like this:
void *thread_function(void *parameter)
{
    struct parameter *thread_data = (struct parameter *)parameter;
    char buffer[20];
    int temp;

    /* ... use thread_data, buffer and temp here ... */
    return NULL;
}
Here, if I have created two threads, then if buffer and temp are updated in one thread, will it affect the other thread?
I mean, if there are two threads, will there be two copies of all the local variables?
EDIT: Then in which case do I need to use thread-specific data? I mean pthread_setspecific and all such stuff.
These variables are allocated on the stack, and each thread has its own stack: these variables are private to each thread (they are not shared). (See this answer for more details.)
If you assign thread_data to a global pointer, for example, other threads will be able to access thread_data via the global pointer.
Thread specific data (e.g. pthread_setspecific) is used to create variables that are global, but still specific to each thread (not shared): They are thread-specific global variables.
You need to use thread specific variables when you want global variables, but don't want to share them between threads.
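A minimal sketch of thread-specific data with pthreads (the key and the per-thread value are invented for the example): the key is a single global, but each thread associates its own value with it and gets its own value back.

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_key_t key;              /* one key, but a separate value per thread */

static void *thread_main(void *arg)
{
    int *slot = malloc(sizeof *slot);  /* per-thread storage */
    *slot = (int)(intptr_t)arg;
    pthread_setspecific(key, slot);    /* associate it with this thread only */

    /* any function running in this thread can now fetch its own value */
    int *mine = pthread_getspecific(key);
    printf("thread sees %d\n", *mine);
    return NULL;
}

int main(void)
{
    pthread_key_create(&key, free);    /* destructor frees each thread's slot on exit */

    pthread_t a, b;
    pthread_create(&a, NULL, thread_main, (void *)(intptr_t)1);
    pthread_create(&b, NULL, thread_main, (void *)(intptr_t)2);
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    pthread_key_delete(key);
    return 0;
}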
It's not that each thread has its own copy, it's that each instance of a function invocation has its own copy of all automatic (i.e. local non-static) variables, regardless of whether the instances are in the same thread or different threads. This is true if the instances come into existence due to invocation in different threads, recursive invocation, mutual/indirect recursion, or even invocation from an asynchronous signal handler. Note that while the C standard does not specify threads, the relevant section in the standard is probably 5.2.3 Signals and interrupts:
Functions shall be implemented such that they may be interrupted at any time by a signal, or may be called by a signal handler, or both, with no alteration to earlier, but still active, invocations' control flow (after the interruption), function return values, or objects with automatic storage duration. All such objects shall be maintained outside the function image (the instructions that compose the executable representation of a function) on a per-invocation basis.
This makes it explicit that each invocation must have its own storage for automatic variables.
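A small illustration of that guarantee within a single thread, using recursion (the function is invented for the example): each invocation of descend keeps its own depth_label, untouched by the deeper calls.

#include <stdio.h>

/* Each recursive invocation gets its own copy of depth_label;
   the outer calls' copies are untouched by the inner calls. */
static void descend(int depth)
{
    int depth_label = depth;           /* automatic: one instance per invocation */
    if (depth < 3)
        descend(depth + 1);
    printf("returning from depth %d\n", depth_label);  /* still this call's value */
}

int main(void)
{
    descend(0);
    return 0;
}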
Local variables are stored in stack memory, which is private to a thread.
Therefore they are not shared between threads: there will be an independent copy of each variable in each thread.
Update
Whether you would want to share data between threads really boils down to a design question: what are your threads doing? Are their efforts coordinated, or are they simply workers processing a queue?
The main thing to consider is synchronization of shared data. Variables that are shared between threads are variables that can change value unexpectedly (within a single thread) and so need to be treated as such. I would suggest that you err on the side of not sharing, unless you have a specific reason to do so.
How does one modify a thread's data from outside that thread?
If a thread is running a function that loops for the runtime of the application, how can its data be set or changed?
How does one call functions which modify a specific thread's data?
Where do these functions belong?
The advantage and disadvantage of threads is that they share the memory space with every other thread in the process. You can use any form of data transfer you would use in single-threaded applications to pass data between segments of your application. However, in a multi-threaded application you must use some type of synchronization to assure data integrity and prevent deadlocks.
If the "thread's data" you want to modify from outside is in the form of local variables in a function running in the thread, or thread-specific data created with the __thread extension, then the only way you can modify them from outside (modulo code with UB that's technically just trashing memory) is by having the thread take the addresses of its variables and store them somewhere other threads can see them (either in a global variable, or at a location passed in via the thread start function's void * argument).
Also note that, as rerun pointed out, you have to use some method of synchronization if multiple threads are accessing the same data. The only standard/portable synchronization methods are the pthread ones: pthread_mutex_lock etc., but you can also use assembly or compiler intrinsics (like __sync_* in gcc).
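Putting both points together, here is a minimal sketch (the control struct and its fields are invented for the example): the looping thread only ever touches its shared state under a mutex, so the main thread can safely change that state from outside.

#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

struct control {                        /* shared between the worker and the outside world */
    pthread_mutex_t lock;
    bool stop;
    int  setting;
};

static void *worker(void *arg)
{
    struct control *ctl = arg;
    for (;;) {                          /* (a real loop would block or sleep, not spin) */
        pthread_mutex_lock(&ctl->lock);
        bool stop = ctl->stop;          /* snapshot the shared state under the lock */
        int setting = ctl->setting;
        pthread_mutex_unlock(&ctl->lock);

        if (stop)
            break;
        /* ... use setting for this iteration of the loop ... */
        (void)setting;
    }
    return NULL;
}

int main(void)
{
    struct control ctl = { .stop = false, .setting = 0 };
    pthread_mutex_init(&ctl.lock, NULL);

    pthread_t t;
    pthread_create(&t, NULL, worker, &ctl);

    pthread_mutex_lock(&ctl.lock);      /* "modify the thread's data" from outside */
    ctl.setting = 42;
    pthread_mutex_unlock(&ctl.lock);

    sleep(1);

    pthread_mutex_lock(&ctl.lock);
    ctl.stop = true;                    /* ask the loop to finish */
    pthread_mutex_unlock(&ctl.lock);

    pthread_join(t, NULL);
    pthread_mutex_destroy(&ctl.lock);
    return 0;
}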