If I look at the implementation of pthread_equal it looks as follows:
int
__pthread_equal (pthread_t thread1, pthread_t thread2)
{
  return thread1 == thread2;
}
weak_alias (__pthread_equal, pthread_equal)
with
typedef unsigned long int pthread_t;
POSIX documentation says pthread_equal is thread-safe. But the call to pthread_equal copies, and therefore accesses, the thread1 and thread2 variables. If those variables are global and changed at that moment by another thread, this would lead to undefined behavior.
So shouldn't pthread_t be an atomic type? Or is atomic behaviour ensured in some other way?
This implementation of pthread_equal (which is specific to the POSIX implementation it ships with) does not access any variables except local ones. In C, arguments are always passed by value. thread1 and thread2 are local variables in the function that take their values from the expressions the caller uses when making the call.
With that said, even if they weren't, there would be no problem of thread-unsafety here. Rather, if the caller accessed pthread_t objects that could be changed concurrently by other threads without using appropriate synchronization mechanisms to preclude that, it's a data race bug in the caller, not a thread-safety matter in the callee.
When we say an operation is thread safe, we make certain assumptions on the code that invokes that operation. Those assumptions include making sure the inputs to the operation are stable and that their values remain valid and stable until the operation completes. Otherwise, nothing would be thread safe.
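If the caller does read a pthread_t that another thread may modify, the synchronization belongs in the caller. A minimal sketch of that idea; the names worker_id and id_lock are hypothetical:

#include <pthread.h>

pthread_t worker_id;                                   /* may be written by another thread */
pthread_mutex_t id_lock = PTHREAD_MUTEX_INITIALIZER;

int is_current_worker(void)
{
    pthread_mutex_lock(&id_lock);
    pthread_t copy = worker_id;                        /* take a stable snapshot under the lock */
    pthread_mutex_unlock(&id_lock);
    return pthread_equal(copy, pthread_self());        /* compare the stable copies */
}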
The function putenv is not a thread-safe function, so I thought that by calling pthread_mutex_lock before calling putenv I could make putenv "thread safe" in this way.
I tried it, but when I run it I get a segmentation fault.
Here is the code:
#include "apue.h"
#include <pthread.h>
pthread_mutex_t envlock = PTHREAD_MUTEX_INITIALIZER;
void *thread_func(void *arg){
    pthread_mutex_lock(&envlock);
    char env[100];
    sprintf(env,"hhh=%s",(char*)arg);
    putenv(env);
    pthread_mutex_unlock(&envlock);
    return NULL;
}
int main(){
    pthread_t thread0, thread1, thread2;
    void *shit;
    int err;

    char name0[]="thread0";
    err=pthread_create(&thread0,NULL,thread_func,(void*)name0);
    if(err!=0)
        exit(-1);

    char name1[]="thread1";
    err=pthread_create(&thread1,NULL,thread_func,(void*)name1);
    if(err!=0)
        exit(-1);

    char name2[]="thread2";
    err=pthread_create(&thread2,NULL,thread_func,(void*)name2);
    if(err!=0)
        exit(-1);

    pthread_join(thread0,&shit);
    pthread_join(thread1,&shit);
    pthread_join(thread2,&shit);

    char *hhh=getenv("hhh");
    printf("hhh is =%s\n",hhh);
    return 0;
}
putenv is reentrant in newer versions of glibc. The problem is that putenv does not copy the string given to it, so you cannot hand it a buffer that lives on your stack. Try keeping your char env[100] somewhere it will not be destroyed when the function returns.
The putenv() function is not required to be reentrant, and the one in
glibc 2.0 is not, but the glibc 2.1 version is.
...
Since version 2.1.2, the glibc implementation conforms to SUSv2: the
pointer string given to putenv() is used. In particular, this string
becomes part of the environment; changing it later will change the
environment. (Thus, it is an error to call putenv() with an
automatic variable as the argument, then return from the calling
function while string is still part of the environment.)
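A minimal sketch of the fix suggested above, assuming the question's envlock mutex and the glibc putenv() semantics just quoted: give putenv() storage that outlives the function, for example heap memory (which is intentionally never freed while the string is part of the environment).

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

extern pthread_mutex_t envlock;          /* the mutex from the question */

void *thread_func(void *arg)
{
    pthread_mutex_lock(&envlock);
    char *env = malloc(100);             /* heap storage survives the return */
    if (env != NULL) {
        snprintf(env, 100, "hhh=%s", (char *)arg);
        putenv(env);                     /* env is now part of the environment; do not free it */
    }
    pthread_mutex_unlock(&envlock);
    return NULL;
}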
In general, protecting a function with a locking mechanism does not automatically make it reentrant. Reentrancy means precisely that: the function can be entered again, in the middle of a call to itself, without any risk to the internal data it manages in the active call. For that to occur, the function must operate only on its own stack frame, or be given pointers as parameters (on the stack, or in saved registers) to any external data objects it must act upon. This is what makes reentrancy possible.
Now I'll describe one scenario where a non-reentrant function (with the locking mechanism you propose) still fails:
Assume a function f() is executing when a signal (or an interrupt) arrives, and the signal handler itself has to call f(). Because the function entry is locked, the signal handler blocks on entry to f(), so the handler never returns, and the main program cannot continue its execution of f() to release the lock. In most cases interrupts are handled on the same stack as the interrupted function (I know FreeBSD uses a separate context for interrupt handlers, but I don't know whether that applies to user-mode processes), so there is no chance to release the lock until the handler has returned, and it cannot return because it is waiting for the lock to be released. This kind of reentrancy is not handled by your routine.
How can this problem be avoided? Block the signals that invoke this handler while you are inside the shared region (but then, why lock it at all?) so they are delivered only after the function call completes; a sketch follows at the end of this answer.
Of course, if you need to call it from several threads (each with its own stack), then you do need the lock after all.
Conclusion:
The lock merely prevents re-entering f(); it doesn't make f() reentrant.
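Here is the sketch referred to above: instead of relying on the lock alone, block the signal whose handler also calls f() while inside the critical section. SIGUSR1 and the function names are assumptions for illustration only.

#include <pthread.h>
#include <signal.h>

void f(void);                                    /* the non-reentrant function under discussion */

void call_f_safely(void)
{
    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);                  /* assumed: the handler for SIGUSR1 calls f() */

    pthread_sigmask(SIG_BLOCK, &block, &old);    /* the handler cannot run in here */
    f();
    pthread_sigmask(SIG_SETMASK, &old, NULL);    /* restore the previous signal mask */
}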
If I have multiple threads and more than one of them would like to wait for the same thread to finish, why is this undefined according to pthread_join()?
For example, the code below shows threads 1 and 2 waiting for thread 0:
pthread_t thread0;   /* assumed to be filled in by pthread_create elsewhere */

void* thread1(void* t1){
    /* ... */
    void *retval1;
    pthread_join(thread0, &retval1);
    return NULL;
}

void* thread2(void* t2){
    /* ... */
    void *retval2;
    pthread_join(thread0, &retval2);
    return NULL;
}
Why would this behaviour be undefined, or in other words, not possible?
The pthread_t value will generally refer to some allocated data pertaining to the thread. Among this data will generally be the return value of the thread. pthread_join will read the return value and free the thread data.
If you pthread_join the same pthread ID twice, you essentially have a case of a double free. The pointed-to data might be invalid on the second join, but it could also have been reused in an unpredictable fashion by another thread.
The results are difficult/impossible to reason about. Hence the UB.
Basically, the retval of thread0 is stored until one thread calls pthread_join. After that, the retval is no longer available.
C++ has std::shared_future which explicitly is designed so multiple threads can wait for a single result, which shows your idea isn't strange.
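One common workaround, sketched here under the assumption that several threads need thread0's result: have exactly one thread perform the single legal join and publish the result through a mutex/condition-variable pair. All names below are hypothetical.

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t result_lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  result_ready = PTHREAD_COND_INITIALIZER;
static bool  have_result = false;
static void *thread0_result;

/* Exactly one thread calls this; it performs the one allowed join. */
void collect_result(pthread_t thread0)
{
    void *ret;
    pthread_join(thread0, &ret);
    pthread_mutex_lock(&result_lock);
    thread0_result = ret;
    have_result = true;
    pthread_cond_broadcast(&result_ready);   /* wake every waiter */
    pthread_mutex_unlock(&result_lock);
}

/* Any number of threads may call this to wait for the published result. */
void *wait_for_result(void)
{
    pthread_mutex_lock(&result_lock);
    while (!have_result)
        pthread_cond_wait(&result_ready, &result_lock);
    void *ret = thread0_result;
    pthread_mutex_unlock(&result_lock);
    return ret;
}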
pthread_join is a resource-identifier-freeing operation, like close on file descriptors, free on allocated memory, unlink on filenames, etc. All such operations are inherently subject to "double-free"/"use-after-free" bugs if the identifier (here, the particular pattern of bits making up the pthread_t value) can be used again in the future. As such, the behavior has to be either completely undefined or subject to conditions that make it a serious programming error, to the point where it might as well be undefined.
Note that in your mental model you might be thinking of both pthread_join calls "starting before the thread exits", and thereby taking place during the time when the pthread_t identifier is still valid. However formally there is no ordering between them, nor can there be. There is no observable distinction between the state of "already blocked in pthread_join" and "just about to call pthread_join".
I've never had the chance to play with the pthreads library before, but I am reviewing some code involving pthread mutexes. I checked the documentation for pthread_mutex_lock and pthread_mutex_init, and my understanding from reading the man pages for both these functions is that I must call pthread_mutex_init before I call pthread_mutex_lock.
However, I asked a couple colleagues, and they think it is okay to call pthread_mutex_lock before calling pthread_mutex_init. The code I'm reviewing also calls pthread_mutex_lock without even calling pthread_mutex_init.
Basically, is it safe and smart to call pthread_mutex_lock before ever calling pthread_mutex_init (if pthread_mutex_init even gets called)?
EDIT: I also see some examples where pthread_mutex_lock is called when pthread_mutex_init is not used, such as this example
EDIT #2: Here is specifically the code I'm reviewing. Please note that the configure function acquires and attaches to some shared memory that does not get initialized. The Java code later on will call lock(), with no other native functions called in-between. Link to code
Mutexes are variables containing state (information) that the locking functions need to do their job. If no information were needed, the routine wouldn't need a variable. Likewise, the routine can't possibly function properly if you feed it random garbage.
Most platforms do accept a mutex object filled with zero bytes. This is usually what pthread_mutex_init and PTHREAD_MUTEX_INITIALIZER produce. As it happens, the C language also guarantees that variables with static storage duration (such as uninitialized globals) are zeroed out when the program starts. So it may appear that you don't need to initialize pthread_mutex_t objects, but this is not the case: objects that live on the stack or the heap, in particular, usually won't be zeroed.
Calling pthread_mutex_init after pthread_mutex_lock is certain to have undesired consequences: it overwrites the variable. Potential results:
The mutex gets unlocked.
A race condition with another thread attempting to get the lock, leading to a crash.
Resources leaked in the library or kernel (but will be freed on process termination).
The POSIX standard says:
If mutex does not refer to an initialized mutex object, the behavior
of pthread_mutex_lock(), pthread_mutex_trylock(), and
pthread_mutex_unlock() is undefined.
So you do need to initialise the mutex. This can be done either by a call to pthread_mutex_init(), or, if the mutex has static storage duration, by using the static initializer PTHREAD_MUTEX_INITIALIZER, e.g.:
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
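For a mutex that does not have static storage duration (for example one living in malloc'd memory), a sketch of the dynamic path with pthread_mutex_init() and pthread_mutex_destroy():

#include <pthread.h>
#include <stdlib.h>

void demo(void)
{
    pthread_mutex_t *m = malloc(sizeof *m);
    if (m == NULL)
        return;

    pthread_mutex_init(m, NULL);      /* NULL = default attributes */
    pthread_mutex_lock(m);
    /* ... critical section ... */
    pthread_mutex_unlock(m);

    pthread_mutex_destroy(m);         /* destroy before releasing the storage */
    free(m);
}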
Here is the text from the link I posted in a comment:
Mutual exclusion locks (mutexes) prevent multiple threads
from simultaneously executing critical sections of code that
access shared data (that is, mutexes are used to serialize
the execution of threads). All mutexes must be global. A
successful call for a mutex lock by way of mutex_lock()
will cause another thread that is also trying to lock the
same mutex to block until the owner thread unlocks it by way
of mutex_unlock(). Threads within the same process or
within other processes can share mutexes.
Mutexes can synchronize threads within the **same process** or
in ***other processes***. Mutexes can be used to synchronize
threads between processes if the mutexes are allocated in
writable memory and shared among the cooperating processes
(see mmap(2)), and have been initialized for this task.
Initialize
Mutexes are either intra-process or inter-process, depending upon the argument passed implicitly or explicitly to the initialization of that mutex. A statically allocated mutex does not need to be explicitly initialized; by default, a statically allocated mutex is initialized with all zeros and its scope is set to be within the calling process.
For inter-process synchronization, a mutex needs to be allocated in memory shared between these processes. Since the memory for such a mutex must be allocated dynamically, the mutex needs to be explicitly initialized using mutex_init().
Also, for inter-process synchronization, besides the requirement to be allocated in shared memory, the mutexes must use the attribute PTHREAD_PROCESS_SHARED; otherwise, accessing the mutex from a process other than its creator results in undefined behaviour (see linux.die.net/man/3/pthread_mutexattr_setpshared):
The process-shared attribute is set to PTHREAD_PROCESS_SHARED to permit a
mutex to be operated upon by any thread that has access to the memory
where the mutex is allocated, even if the mutex is allocated in memory that is shared by multiple processes
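A minimal sketch of initializing such a process-shared mutex in shared memory. The anonymous mmap region used here is just one way to get memory shared with a forked child; error handling is abbreviated.

#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

/* Place the mutex in an anonymous shared mapping so a forked child can use it. */
pthread_mutex_t *make_shared_mutex(void)
{
    pthread_mutex_t *m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return NULL;

    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return m;
}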
Is it possible to read the registers or thread local variables of another thread directly, that is, without requiring to go to the kernel? What is the best way to do so?
You can't read the registers, which wouldn't be useful anyway. But reading thread local variables from another thread is easily possible.
Depending on the architecture (e.g. strong memory ordering like on x86_64) you can even do it safely without synchronization, provided that the read value doesn't affect the thread it belongs to in any way. A scenario would be displaying a thread-local counter or similar.
Specifically on Linux on x86_64, as you tagged, you could do it like this:
// A thread-local variable. __thread is a GCC extension, but since C11/C++11
// thread-local storage is actually part of the language.
__thread int some_tl_var;

// The pointer to the thread-local. It is itself NOT thread-local, because it
// will be read from the outside world.
struct thread_data {
    int *psome_tl_var;
    /* ... */
};

// The function started by pthread_create. The pointer needs to be initialized
// here, inside the new thread, and NOT where the thread_data objects are
// allocated (otherwise it would point to the controlling thread's thread-local).
void *thread_run(void *arg) {
    struct thread_data *pdata = arg;
    pdata->psome_tl_var = &some_tl_var;
    // Now do some work...
    // ...
    return NULL;
}

void start_threads(void) {
    // ...
    struct thread_data other_thread_data[NTHREADS];
    pthread_t thread_ids[NTHREADS];
    for (int i = 0; i < NTHREADS; ++i) {
        pthread_create(&thread_ids[i], NULL, thread_run, &other_thread_data[i]);
    }
    // Once a thread has set its pointer, you can read its some_tl_var as:
    // int value = *(other_thread_data[i].psome_tl_var);
    // ...
}
I have used something similar for displaying statistics about worker threads. It is even easier in C++: if you create objects around your threads, just make the pointer to the thread-local a field in your thread class and access it with a member function.
Disclaimer: This is non-portable, but it works on x86_64, Linux, gcc, and may work on other platforms too.
There's no way to do it without involving the kernel, and in fact I don't think it could be meaningful to read them anyway without some sort of synchronization. If you don't want to use ptrace (which is ugly and non-portable) you could instead choose one of the realtime signals to use for a "send me your registers/TLS" message. The rough idea is:
Lock a global mutex for the request.
Store the information on what data you want (e.g. a pthread_key_t or a special value meaning registers) from the thread in global variables.
Signal the target thread with pthread_kill.
In the signal handler (which should have been installed with sigaction and SA_SIGINFO) use the third void * argument to the signal handler (which really points to a ucontext_t) to copy that ucontext_t to the global variable used to communicate back to the requesting thread. This will give it all the register values, and a lot more. Note that TLS is a bit more tricky since pthread_getspecific is not async-signal-safe and technically not legal to run in this context...but it probably works in practice.
The signal handler posts a semaphore (this is the ONLY async-signal-safe synchronization function offered by POSIX) indicating to the requesting thread that it's done, and returns.
The requesting thread finishes by waiting on the semaphore, then reads the data and unlocks the request mutex.
Note that this will involve at least 1 transition to kernelspace (pthread_kill) in the requesting thread (and maybe another in sem_wait), and 1-3 in the target thread (1 for returning from the signal handler, one for entering the signal handler if it was not already sleeping in kernelspace, and possibly one for sem_post). Still it's probably faster than mucking around with ptrace which is not designed for high-performance usage...
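A rough sketch of the scheme above. SIGRTMIN is one possible choice of realtime signal, the handler must be installed with sigaction() and SA_SIGINFO as described, all names are illustrative, and error handling is omitted.

#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <string.h>
#include <ucontext.h>

static pthread_mutex_t req_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t req_done;                     /* initialized to 0 in install_dump_handler() */
static ucontext_t req_context;             /* filled in by the target thread's handler */

/* Runs in the target thread when SIGRTMIN is delivered. */
static void dump_handler(int sig, siginfo_t *info, void *uctx)
{
    (void)sig; (void)info;
    memcpy(&req_context, uctx, sizeof req_context);   /* copy the register state */
    sem_post(&req_done);                              /* async-signal-safe notification */
}

void install_dump_handler(void)
{
    struct sigaction sa = {0};
    sa.sa_sigaction = dump_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGRTMIN, &sa, NULL);
    sem_init(&req_done, 0, 0);
}

/* Called by the requesting thread. */
void request_context(pthread_t target, ucontext_t *out)
{
    pthread_mutex_lock(&req_lock);         /* one request at a time */
    pthread_kill(target, SIGRTMIN);
    sem_wait(&req_done);                   /* wait for the handler to post */
    *out = req_context;
    pthread_mutex_unlock(&req_lock);
}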
I have declared some local variables in one function like this:
void* thread_function (void* parameter)
{
    struct parameter * thread_data = (struct parameter *)parameter;
    char buffer[20];
    int temp;
    /* ... */
    return NULL;
}
Here, if I have created two threads, then if buffer and temp are updated in one thread, will it affect the other thread? I mean, if there are two threads, will there be two copies of all the local variables?
EDIT: Then in which cases do I need to use thread-specific data? I mean pthread_setspecific and all such stuff.
These variables are allocated on the stack, and each thread has its own stack: these variables are private to each thread (they are not shared). (See this answer for more details.)
If you assign thread_data to a global pointer, for example, other threads will be able to access thread_data via the global pointer.
Thread specific data (e.g. pthread_setspecific) is used to create variables that are global, but still specific to each thread (not shared): They are thread-specific global variables.
You need to use thread specific variables when you want global variables, but don't want to share them between threads.
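A minimal sketch of thread-specific data with the pthread_key_* API; the key and the per-thread counter here are purely illustrative.

#include <pthread.h>
#include <stdlib.h>

static pthread_key_t  counter_key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    /* free() is called on each thread's value when that thread exits. */
    pthread_key_create(&counter_key, free);
}

/* Each thread gets its own counter, reachable from any function it calls. */
int *my_counter(void)
{
    pthread_once(&key_once, make_key);
    int *p = pthread_getspecific(counter_key);
    if (p == NULL) {
        p = calloc(1, sizeof *p);             /* first use in this thread */
        pthread_setspecific(counter_key, p);
    }
    return p;
}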
It's not that each thread has its own copy, it's that each instance of a function invocation has its own copy of all automatic (i.e. local non-static) variables, regardless of whether the instances are in the same thread or different threads. This is true if the instances come into existence due to invocation in different threads, recursive invocation, mutual/indirect recursion, or even invocation from an asynchronous signal handler. Note that while the C standard does not specify threads, the relevant section in the standard is probably 5.2.3 Signals and interrupts:
Functions shall be implemented such that they may be interrupted at any time by a signal, or may be called by a signal handler, or both, with no alteration to earlier, but still active, invocations' control flow (after the interruption), function return values, or objects with automatic storage duration. All such objects shall be maintained outside the function image (the instructions that compose the executable representation of a function) on a per-invocation basis.
This makes it explicit that each invocation must have its own storage for automatic variables.
Local variables are stored in stack memory, which is private to a thread.
Therefore they are not shared between threads: there will be an independent copy of each variable in each thread.
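To illustrate, a small sketch of a thread start routine: each thread running it gets its own local counter, and neither thread ever observes the other's value.

void *worker(void *arg)
{
    (void)arg;
    int local = 0;                    /* a fresh, independent copy per invocation */
    for (int i = 0; i < 1000; ++i)
        ++local;                      /* no data race: nothing here is shared */
    return (void *)(long)local;       /* each thread reports its own 1000 */
}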
Update
Whether you would want to share data between threads really boils down to a design question: what are your threads doing? Are their efforts coordinated, or are they simply workers processing a queue?
The main thing to consider is synchronization of shared data. Variables that are shared between threads are variables that can change value unexpectedly (within a single thread) and so need to be treated as such. I would suggest that you err on the side of not sharing, unless you have a specific reason to do so.