I ran into trouble with pthread_exit(). I know that pthread_exit() must not be used like this:
pthread_exit(&some_local_variable);
As I understand it, we always need to use pthread_exit() like this:
pthread_exit("Thread Exit Message or something necessary information");
I once wrote a simple program for testing purposes.
I wrote four thread functions for addition, subtraction, multiplication and division of two integers, respectively. Then, while performing these operations on four different threads, I tried to return the result of each operation through pthread_exit(). What I mean is something like:
pthread_exit(&add_result);
When I ran the code on CentOS 6, I got the expected result (i.e., garbage values from all the threads), since pthread_exit() cannot be used like that. But I got confused, because I had first run that code on Ubuntu 11.10 and got three absolutely correct results (the correct result of the operation) from three threads and a garbage value from one thread. Why are three threads giving the correct result?
Moreover, I used different sleep times for those threads. I found that the thread with the shortest sleep time gave the garbage value.
Since gcc is the compiler on both of these operating systems, why does one system have a bug like this?
It confuses novice programmers like me. If it is not a bug, can anyone explain why this is happening?
I think your answer is in the pthread_exit documentation. You say that you returned a pointer to add_result, which seems to be a local variable.
Here is the quote from the documentation that might answer it:
After a thread has terminated, the result of access to local (auto) variables of the thread is undefined. Thus, references to local variables of the exiting thread should not be used for the pthread_exit() value_ptr parameter value.
You can use the void* argument of the thread function to pass in a pointer to a structure, and have the thread store the actual result of the operation in that structure.
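A minimal sketch of that idea, assuming the result only needs to be read after the thread has been joined. The names add_job, add_worker and add_on_thread are made up for illustration and are not from the question's code:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical job record: the inputs and the result live in storage
 * owned by the creating thread, which outlives the worker. */
struct add_job {
    int a;
    int b;
    int result;
};

/* Thread start function: writes the sum into the caller-owned struct. */
static void *add_worker(void *arg)
{
    struct add_job *job = arg;
    job->result = job->a + job->b;
    return NULL;                    /* nothing thread-local is handed back */
}

/* Runs one addition on a separate thread and returns the sum. */
static int add_on_thread(int a, int b)
{
    struct add_job job = { a, b, 0 };
    pthread_t tid;

    int rc = pthread_create(&tid, NULL, add_worker, &job);
    assert(rc == 0);
    rc = pthread_join(tid, NULL);   /* job must stay valid until the join returns */
    assert(rc == 0);
    (void)rc;
    return job.result;              /* safe: the worker has finished */
}
```

Calling add_on_thread(2, 3) from main() should give 5. The struct is local to the creating function here, which is fine only because that function joins the thread before returning.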
pthread_exit just takes a pointer to a void. If you pass the address of a variable local to the thread, sometimes that memory will have been reused for something else. Sometimes it will still be there. There's no guarantee that after a thread exits, some part of the system will go and make sure that all of the memory it was using is set to garbage values.
It's not a bug - the system is doing exactly what you ask it.
Bonus related answer - Can a local variable's memory be accessed outside its scope?
The only requirement for pthread_exit(foo) is that foo points to something which lives long enough. Local variables don't, malloc'ed memory does.
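For example, here is a rough sketch of the malloc route, assuming the joining thread takes ownership of the buffer and frees it. mul_worker and multiply_on_thread are hypothetical names used only for this illustration:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Thread start function: returns its result in heap memory, which
 * stays valid after the thread (and its stack) is gone. */
static void *mul_worker(void *arg)
{
    const int *operands = arg;
    int *product = malloc(sizeof *product);

    if (product == NULL)
        pthread_exit(NULL);         /* signal failure to the joiner */
    *product = operands[0] * operands[1];
    pthread_exit(product);          /* pointer to heap memory, not to a local */
}

/* Returns 0 and stores the product in *out on success, -1 on failure. */
static int multiply_on_thread(int a, int b, int *out)
{
    int operands[2] = { a, b };
    pthread_t tid;
    void *ret = NULL;

    int rc = pthread_create(&tid, NULL, mul_worker, operands);
    assert(rc == 0);
    rc = pthread_join(tid, &ret);   /* ret receives the pthread_exit() value */
    assert(rc == 0);
    (void)rc;

    if (ret == NULL)
        return -1;
    *out = *(int *)ret;
    free(ret);                      /* the joiner owns the buffer now */
    return 0;
}
```

The operands array is a local of the creating function, but that is fine because the function does not return until the worker has been joined.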
Details about the issue leading to the question
We're facing a SIGSEGV under extremely rare circumstances when using the lmdb database library, and it is not easily reproducible. The best we got out of it is a stack trace from the core dump that looks like this:
#0 in mdb_env_reader_dest (ptr=...) at .../mdb.c: 4935
#1 in __nptl_deallocate_tsd () at pthread_create.c:301
...
The line the stack-trace is pointing to is this (it's not the same because we attempted some changes to fix this).
Having tracked the issue for a while, the only explanation I have is that, when closing the lmdb environment from a thread other than the main thread, there's some kind of race between this line and this line, where the latter deletes the key, calls the custom destructor mdb_env_reader_dest, which causes SIGSEGV according to the documentation of lmdb, when attempting to use resources after freeing them.
The question
The documentation of pthread_key_create and pthread_key_delete is ambiguous to me, personally, as to whether it is talking about the calls to pthread_key_create and pthread_key_delete or about the data underneath the pointers. This is what the docs say:
The associated destructor functions used to free thread-specific data at thread exit time are only guaranteed to work correctly when called in the thread that allocated the thread-specific data.
So the question is, can we call mdb_env_open and mdb_env_close from two different threads, leading to pthread_key_create and pthread_key_delete from different threads, and expect correct behavior?
I couldn't find such a requirement in the lmdb documentation. The closest I could find is this, which doesn't seem to reference the mdb_env_open function.
Are we allowed to call pthread_key_create and pthread_key_delete from different threads?
Yes.
However, a key must be created via pthread_key_create before it can be used by any thread, and that is not, itself, inherently thread safe. The key creation is often synchronized by performing it before starting any (other) threads that will use the key, but there are other alternatives.
Similarly, a key must not be deleted before all threads are finished with it, and the deletion is not, itself, inherently thread safe. TSD keys often are not deleted at all, and when they are deleted, that is often synchronized by first joining all (other) threads that may use the key. But again, there are other alternatives.
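A minimal sketch of the common pattern described above (create the key before starting the threads, join them all, then delete the key). Everything here is illustrative; buf_key, free_buf and the 64-byte buffer are assumptions, not lmdb code:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

enum { NUM_WORKERS = 4 };

static pthread_key_t buf_key;           /* hypothetical TSD key */
static atomic_int destructor_calls;     /* counts destructor invocations */

/* TSD destructor: runs in each exiting thread whose value is non-NULL. */
static void free_buf(void *p)
{
    free(p);
    atomic_fetch_add(&destructor_calls, 1);
}

static void *worker(void *arg)
{
    (void)arg;
    void *buf = malloc(64);
    if (buf != NULL && pthread_setspecific(buf_key, buf) != 0)
        free(buf);                      /* could not attach it, so free it here */
    return NULL;                        /* free_buf runs as this thread exits */
}

/* Returns how many times the destructor ran. */
static int run_key_lifecycle(void)
{
    pthread_t tids[NUM_WORKERS];
    int rc;

    atomic_store(&destructor_calls, 0);

    /* 1. Create the key before any thread that uses it exists. */
    rc = pthread_key_create(&buf_key, free_buf);
    assert(rc == 0);

    for (int i = 0; i < NUM_WORKERS; i++) {
        rc = pthread_create(&tids[i], NULL, worker, NULL);
        assert(rc == 0);
    }

    /* 2. Join every user of the key ... */
    for (int i = 0; i < NUM_WORKERS; i++) {
        rc = pthread_join(tids[i], NULL);
        assert(rc == 0);
    }

    /* 3. ... and only then delete it. */
    rc = pthread_key_delete(buf_key);
    assert(rc == 0);
    (void)rc;

    return atomic_load(&destructor_calls);
}
```

If every malloc succeeds, the destructor should have run once per worker by the time the joins return. Nothing here requires the create and the delete to happen on the same thread; what matters is the ordering relative to the threads that use the key.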
The documentation of pthread_key_create and pthread_key_delete is
ambiguous to me, personally, as to whether it is talking
about the calls to pthread_key_create and pthread_key_delete or about the
data underneath the pointers. This is what the docs say:
The associated destructor functions used to free thread-specific data
at thread exit time are only guaranteed to work correctly when called
in the thread that allocated the thread-specific data.
The destructor functions those docs are talking about are the ones passed as the second argument to pthread_key_create().
And note well that that text is drawn from the Rationale section of the docs, not the Description section. It is talking about why the TSD destructor functions are not called by pthread_key_delete(), not trying to explain what the function does. That particular point is that TSD destructor functions must run in each thread carrying non-NULL TSD, as opposed to in the thread that calls pthread_key_delete().
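A small sketch of that last point, assuming a single thread and a hypothetical destructor named count_call. It sets a non-NULL value, deletes the key, and reports how often the destructor ran during the delete:

```c
#include <assert.h>
#include <pthread.h>

static int calls;                       /* only touched by one thread here */

/* Hypothetical destructor that just records that it was invoked. */
static void count_call(void *p)
{
    (void)p;
    calls++;
}

/* Returns the number of destructor calls made by pthread_key_delete(). */
static int destructor_calls_during_delete(void)
{
    static int value = 42;
    pthread_key_t key;

    calls = 0;
    int rc = pthread_key_create(&key, count_call);
    assert(rc == 0);
    rc = pthread_setspecific(key, &value);   /* non-NULL TSD in this thread */
    assert(rc == 0);
    rc = pthread_key_delete(key);            /* does not invoke count_call */
    assert(rc == 0);
    (void)rc;
    return calls;
}
```

The function should return 0: any cleanup of data still attached to the key at that point is the application's job.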
So the question is, can we call mdb_env_open and mdb_env_close from
two different threads, leading to pthread_key_create and
pthread_key_delete from different threads, and expect correct
behavior?
The library's use of thread-specific data does not imply otherwise. However, you seem to be suggesting that there is a race between two different lines in the same function, mdb_env_close0, which can be the case only if that function is called in parallel by two different threads. The MDB docs say of mdb_env_close() that "Only a single thread may call this function." I would guess that they mean that to be scoped to a particular MDB environment. In any case, if you really have the race you think you have, then it seems to me that your program must be calling mdb_env_close() from multiple threads, contrary to specification.
So, as far as I know or can tell, the thread that calls mdb_env_close() does not need to be the same one that called mdb_env_open() (or mdb_env_create()), but it does need to be the only one that calls it.
Is there a difference between the first thread and other threads created at runtime? I ask because I have a program where longjmp is used to abort, and a thread should be able to terminate the program (exit or abort don't work in my case). Could I safely use pthread_kill_other_threads_np and then longjmp?
I'm not sure what platform you're talking about, but pthread_kill_other_threads_np is not a standard function and not a remotely reasonable operation, any more than free_all_malloced_memory would be. Process termination inherently involves the termination of all threads atomically with respect to each other (they don't see each other terminate).
As for longjmp, while there is nothing wrong with longjmp, you cannot use it to jump to a context in a different thread.
It sounds like you have an XY problem here; you've asked about whether you can use (or how to use) particular tools that are not the right tool for whatever it is you want, without actually explaining what your constraints are.
I know it's a very specific question and not very interesting for a high-level programmer, but I would like to know when exactly the local variables of a thread function are allocated. In other words, after
pthread_create(&thread, &function, ...)
is executed, can I say that they exist in memory or not (considering that the scheduler might not have run the thread yet)?
I tried to search the POSIX library code, but it's not easy to understand. I got as far as the clone function, written in assembly, but then I could not find the code of the system call service routine sys_clone to understand exactly what it does. I see the invocation of the thread function in the clone code, but I think this should happen only in the created thread (which might never have been run by the scheduler by the time pthread_create returns) and not in the creator.
in other words after
pthread_create(&thread, &function, ...)
is executed, can I say that they exist in memory or not (considering
that the scheduler might not have run the thread yet)?
POSIX does not give you any reason for confidence that the local variables of the initial call to function function() in the created thread will have been allocated by the time pthread_create() returns. They might or might not have been, and indeed, the answer might not even be well defined inasmuch as different threads do not necessarily have a consistent view of machine state.
There is no special significance to the local variables of a thread's start function relative to the local variables of any other function called in that thread. Moreover, although pthread_create() will not return successfully until the new thread has been created, that's a separate question from whether the start function has even been entered, much less whether its local variables have been allocated.
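If the creator actually needs to know that the start function has been entered, it has to arrange that itself. A minimal sketch using a mutex and condition variable as a handshake; start_gate, start_fn and create_and_wait_for_entry are hypothetical names for illustration:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical handshake object shared by creator and new thread. */
struct start_gate {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             entered;
};

static void *start_fn(void *arg)
{
    struct start_gate *gate = arg;
    int local = 0;                      /* this local certainly exists by now */

    pthread_mutex_lock(&gate->lock);
    gate->entered = 1;                  /* tell the creator we are running */
    pthread_cond_signal(&gate->cond);
    pthread_mutex_unlock(&gate->lock);

    (void)local;
    return NULL;
}

/* Returns 1 once the new thread has reported entering start_fn. */
static int create_and_wait_for_entry(void)
{
    struct start_gate gate;
    pthread_t tid;
    int rc;

    rc = pthread_mutex_init(&gate.lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&gate.cond, NULL);
    assert(rc == 0);
    gate.entered = 0;

    rc = pthread_create(&tid, NULL, start_fn, &gate);
    assert(rc == 0);
    /* At this point nothing is known about start_fn's locals. */

    pthread_mutex_lock(&gate.lock);
    while (!gate.entered)               /* loop guards against spurious wakeups */
        pthread_cond_wait(&gate.cond, &gate.lock);
    int entered = gate.entered;
    pthread_mutex_unlock(&gate.lock);

    rc = pthread_join(tid, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_cond_destroy(&gate.cond);
    pthread_mutex_destroy(&gate.lock);
    return entered;
}
```

Only after the wait loop finishes does the creator know that the new thread got as far as the body of start_fn.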
I know I am supposed to use mutexes, but with the way I currently use pthreads it would overly complicate the program...
Anyway, I basically have a variable which I use to denote whether a thread is currently performing work or not. In the main thread I check it in a while loop to see which threads are no longer busy. Now, obviously, my thread can write to this same variable once it is done.
Is it allowed to read and write the same variable from 2 different threads, if 1 thread is ONLY reading and 1 thread is ONLY writing? Reading an old value is not much of a concern, since it will just read the correct one on the next iteration.
So is it safe to do something like that?
In general, NO.
The following article explains why:
http://www.domaigne.com/blog/computing/mutex-and-memory-visibility/
Here is a list of API functions that act as memory barriers:
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_11
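For what it's worth, guarding the flag with a mutex need not complicate things much. A rough sketch, assuming one worker and one polling thread; worker_state, set_busy, is_busy and run_and_poll are made-up names:

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical per-worker state: the busy flag and the mutex guarding it. */
struct worker_state {
    pthread_mutex_t lock;
    int busy;
};

static void set_busy(struct worker_state *w, int busy)
{
    pthread_mutex_lock(&w->lock);
    w->busy = busy;
    pthread_mutex_unlock(&w->lock);
}

static int is_busy(struct worker_state *w)
{
    pthread_mutex_lock(&w->lock);
    int busy = w->busy;
    pthread_mutex_unlock(&w->lock);
    return busy;
}

static void *worker(void *arg)
{
    struct worker_state *w = arg;
    /* ... the real work would go here ... */
    set_busy(w, 0);                     /* the ONLY writer after startup */
    return NULL;
}

/* Starts one worker, polls until it reports idle, returns the final flag. */
static int run_and_poll(void)
{
    struct worker_state w;
    pthread_t tid;

    int rc = pthread_mutex_init(&w.lock, NULL);
    assert(rc == 0);
    w.busy = 1;                         /* set before the thread exists */

    rc = pthread_create(&tid, NULL, worker, &w);
    assert(rc == 0);

    while (is_busy(&w))                 /* the ONLY reader */
        sched_yield();

    rc = pthread_join(tid, NULL);
    assert(rc == 0);
    (void)rc;

    int final_state = is_busy(&w);
    pthread_mutex_destroy(&w.lock);
    return final_state;
}
```

The lock and unlock calls are among the functions in the list linked above, so the reader is guaranteed to see the writer's update eventually. A plain unsynchronized int gives no such guarantee.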
I am working on some C code and am having a problem with locking a mutex. The code calls a function, and this function locks a mutex to ensure a file pointer doesn't get overwritten. This works fine for several calls, probably about 10-20 separate calls of the function, but on the next call pthread_mutex_lock returns 22. I then passed this result to strerror() and got back "Invalid argument".
What does "invalid argument" mean here? Thanks for any help you can provide.
22 is the EINVAL error code, which means invalid argument. Make sure that you have initialized your mutex, and check whether it has been destroyed somewhere at some point.
Also man pthread_mutex_lock says:
EINVAL
The mutex was created with the protocol attribute having the
value PTHREAD_PRIO_PROTECT and the calling thread's priority is higher
than the mutex's current priority ceiling.
I don't quite understand this, but it probably means that you need to change the thread's priority. I am not sure. Maybe someone else can shed light on it.
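As a starting point for checking the initialization side, here is a minimal sketch that initializes a mutex explicitly and checks every return code. lock_checked and demo_lock_cycle are hypothetical helpers, not part of the question's code:

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Locks the mutex and, on failure, logs where it happened, the mutex
 * address and the error text. pthread functions return the error
 * number directly rather than setting errno. */
static int lock_checked(pthread_mutex_t *m, const char *where)
{
    assert(m != NULL);
    int rc = pthread_mutex_lock(m);
    if (rc != 0)
        fprintf(stderr, "%s: pthread_mutex_lock(%p) failed: %s\n",
                where, (void *)m, strerror(rc));
    return rc;
}

/* Init, lock, unlock, destroy; returns 0 if every step succeeded. */
static int demo_lock_cycle(void)
{
    pthread_mutex_t m;

    int rc = pthread_mutex_init(&m, NULL);   /* must happen before any lock */
    if (rc != 0)
        return rc;

    rc = lock_checked(&m, "demo_lock_cycle");
    if (rc == 0)
        rc = pthread_mutex_unlock(&m);

    int destroy_rc = pthread_mutex_destroy(&m);  /* no locking after this */
    return rc != 0 ? rc : destroy_rc;
}
```

Logging the mutex address on failure makes it easier to see whether the pointer passed on the failing call is the same one that worked on the earlier calls.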
Sounds like you have a threading problem or a wild pointer somewhere else in your program. Try printing the value of the mutex pointer. Try having another thread that simply locks the mutex, prints the time to a log file along with a note that the lock was successful, and then unlocks the mutex. I suspect the problem is not where you are looking.
Also, as others have said here, your best bet is to create a very small test program that demonstrates the problem and post it here. Chances are you won't be able to get that small program to demonstrate the error. Then slowly add all of your original code into the small program until the error returns. If it returns, you now know what caused the problem. If it doesn't return, you're done.